Learning metric space with distillation for large-scale multi-label text classification

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Deep neural network-based methods have achieved outstanding results in text classification. However, most existing methods do not thoroughly investigate the text–label and label–label relationships, and they incur excessive computational and memory overhead for large-scale classification. To address these challenges, we propose a novel framework that combines metric learning and knowledge distillation. We first project texts and labels into the same embedding space by applying symmetric metric learning to both text-centric and label-centric relationships. A distillation component is then introduced to learn the text representation features with a deep module. Finally, we use the distilled module to encode new texts and make predictions against the label embeddings in the metric space. Experimental results on four real-world datasets show that our model achieves very competitive prediction accuracy while improving training and prediction efficiency.
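To make the pipeline concrete, the following is a minimal sketch of the prediction step in a shared text–label metric space. It is not the authors' implementation: the cosine-similarity scoring rule, the mean-pooled student encoder standing in for the distilled deep module, and all names and dimensions are illustrative assumptions.

import numpy as np

# Minimal sketch, not the paper's implementation. Assumed: cosine
# similarity as the metric and a mean-pooling "student" encoder in
# place of the distilled deep module; dimensions are illustrative.
rng = np.random.default_rng(0)
num_labels, vocab_size, dim = 1000, 5000, 128

# Label embeddings, assumed to have been learned jointly with texts
# so that both live in the same metric space.
label_emb = rng.normal(size=(num_labels, dim)).astype(np.float32)

# A lightweight text encoder: mean-pooled word vectors followed by a
# linear projection into the shared space.
word_emb = rng.normal(size=(vocab_size, dim)).astype(np.float32)
proj = rng.normal(size=(dim, dim)).astype(np.float32)

def encode_text(token_ids):
    # Map a sequence of token ids to a point in the metric space.
    return word_emb[token_ids].mean(axis=0) @ proj

def predict_topk(token_ids, k=5):
    # Score every label by cosine similarity to the text embedding
    # and return the indices of the k best-matching labels.
    t = encode_text(token_ids)
    t = t / np.linalg.norm(t)
    labels = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return np.argsort(-(labels @ t))[:k]

print(predict_topk(rng.integers(0, vocab_size, size=30)))

In the full framework, training would additionally optimize the symmetric metric-learning objective over text-centric and label-centric relationships and a distillation loss that pushes the lightweight encoder toward a deeper teacher; at prediction time only a pass through the distilled encoder and a nearest-label search are needed, which is where the efficiency gain comes from.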

Data availability

The datasets used in this study are publicly available; the links are given in the manuscript (see the Notes below).

Notes

  1. https://archive.ics.uci.edu/ml/datasets/Reuters-21578+Text+Categorization+Collection.

  2. http://www.ke.tu-darmstadt.de/resources/eurlex/eurlex.html.

  3. http://manikvarma.org/downloads/XC/XMLRepository.html.

  4. https://github.com/js05212/citeulike-t.

  5. https://scikit-learn.org/.

  6. Our source code is available at https://github.com/qsw-code/LMSD.

  7. https://nlp.stanford.edu/projects/glove.

  8. https://www.tensorflow.org.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62062066, 61962061), and partially supported by the Key Program of Basic Research of Yunnan Province (202201AS070015), the Yunnan Provincial Foundation for Leaders of Disciplines in Science and Technology (202005AC160005), the Top Young Talents of the "Ten Thousand Talents Plan" of Yunnan Province, and the Program for Excellent Young Talents of Yunnan University.

Author information

Corresponding authors

Correspondence to Hao Wu or Lihua Zhou.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qin, S., Wu, H., Zhou, L. et al. Learning metric space with distillation for large-scale multi-label text classification. Neural Comput & Applic 35, 11445–11458 (2023). https://doi.org/10.1007/s00521-023-08308-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-023-08308-3
