Abstract
Deep neural network-based methods have achieved outstanding results in text classification. However, most existing methods do not thoroughly investigate the text–label and label–label relationships, and they incur excessive computational and memory overhead for large-scale classification. To address these challenges, we propose a novel framework that combines metric learning and knowledge distillation. We first project texts and labels into the same embedding space by applying symmetric metric learning to both text-centric and label-centric relationships. A distillation component is then introduced to learn text representation features with a deep module. Finally, we use the distilled module to encode new texts and make predictions against the label embeddings in the metric space. Experimental results on four real-world datasets show that our model achieves highly competitive prediction accuracy while improving training and prediction efficiency.
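The prediction step described above — encoding a new text with the distilled module and scoring it against label embeddings in the shared metric space — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: `predict_top_k`, the random label embeddings, and the pre-computed text vector are all hypothetical stand-ins (the real text vector would come from the distilled deep encoder, and the label embeddings from training).

```python
import numpy as np

def cosine_scores(text_vec, label_embs):
    """Score one text embedding against every label embedding
    in the shared metric space via cosine similarity."""
    t = text_vec / np.linalg.norm(text_vec)
    L = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return L @ t

def predict_top_k(text_vec, label_embs, k=3):
    """Return indices of the k labels closest to the text."""
    scores = cosine_scores(text_vec, label_embs)
    return np.argsort(-scores)[:k]

# Toy example: 5 labels embedded in a 4-dimensional metric space.
rng = np.random.default_rng(0)
label_embs = rng.normal(size=(5, 4))
# A text vector lying close to label 2 in the metric space.
text_vec = label_embs[2] + 0.05 * rng.normal(size=4)
print(predict_top_k(text_vec, label_embs, k=1))  # label 2 should rank first
```

Because labels live in the same space as texts, inference reduces to a nearest-neighbor lookup over label embeddings, which is what makes the approach attractive at large label scales.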
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig6_HTML.png)
Data availability
The dataset used during the current study is publicly available, and the available links have been given in the manuscript.
Notes
Our source code is available at https://github.com/qsw-code/LMSD.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62062066, 61962061), and partially supported by the Key Program of Basic Research of Yunnan Province (202201AS070015), the Yunnan Provincial Foundation for Leaders of Disciplines in Science and Technology (202005AC160005), the Top Young Talents of the "Ten Thousand Plan" in Yunnan Province, and the Program for Excellent Young Talents of Yunnan University.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Qin, S., Wu, H., Zhou, L. et al. Learning metric space with distillation for large-scale multi-label text classification. Neural Comput & Applic 35, 11445–11458 (2023). https://doi.org/10.1007/s00521-023-08308-3