Abstract
Deep neural network-based methods have achieved outstanding results in text classification. However, most existing methods do not thoroughly investigate the text–label and label–label relationships, and they incur excessive computational and memory overhead for large-scale classification. To address these challenges, we propose a novel framework that combines metric learning and knowledge distillation. We first project texts and labels into the same embedding space by applying symmetric metric learning to both text-centric and label-centric relationships. A distillation component is then introduced to learn text representation features with a deep module. Finally, we use the distilled module to encode new texts and make predictions against the label embeddings in the metric space. Experimental results on four real-world datasets show that our model achieves highly competitive prediction accuracy while improving training and prediction efficiency.
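The prediction step described above — encoding a new text with the distilled module and scoring it against label embeddings in the shared metric space — can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: `predict_top_k`, the random label embeddings, and the pre-computed text vector are all hypothetical stand-ins (the real text vector would come from the distilled deep encoder, and the label embeddings from training).

```python
import numpy as np

def cosine_scores(text_vec, label_embs):
    """Score one text embedding against every label embedding
    in the shared metric space via cosine similarity."""
    t = text_vec / np.linalg.norm(text_vec)
    L = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return L @ t

def predict_top_k(text_vec, label_embs, k=3):
    """Return indices of the k labels closest to the text."""
    scores = cosine_scores(text_vec, label_embs)
    return np.argsort(-scores)[:k]

# Toy example: 5 labels embedded in a 4-dimensional metric space.
rng = np.random.default_rng(0)
label_embs = rng.normal(size=(5, 4))
# A text vector lying close to label 2 in the metric space.
text_vec = label_embs[2] + 0.05 * rng.normal(size=4)
print(predict_top_k(text_vec, label_embs, k=1))  # label 2 should rank first
```

Because labels live in the same space as texts, inference reduces to a nearest-neighbor lookup over label embeddings, which is what makes the approach attractive at large label scales.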
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00521-023-08308-3/MediaObjects/521_2023_8308_Fig6_HTML.png)
Data availability
The dataset used during the current study is publicly available, and the available links have been given in the manuscript.
Notes
Our source code is available at https://github.com/qsw-code/LMSD.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62062066, 61962061), and partially supported by the Key Program of Basic Research of Yunnan Province (202201AS070015), the Yunnan Provincial Foundation for Leaders of Disciplines in Science and Technology (202005AC160005), the Top Young Talents of the "Ten Thousand Plan" in Yunnan Province, and the Program for Excellent Young Talents of Yunnan University.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Qin, S., Wu, H., Zhou, L. et al. Learning metric space with distillation for large-scale multi-label text classification. Neural Comput & Applic 35, 11445–11458 (2023). https://doi.org/10.1007/s00521-023-08308-3