Fitting and sharing multi-task learning

Abstract

Multi-Task Learning is an effective approach for learning cross-task knowledge. However, existing methods do not treat each task fairly: their shared components tend to keep fitting new tasks, which degrades the performance on previously learned tasks. In this paper, we propose the Fitting-Sharing Multi-Task Learning method to address this problem. In the Fitting step, a group of indicator parameters is trained to extract task-specific features and store them in an in-task template matrix. After all models converge, the indicators and templates are frozen to protect the learned knowledge. In the Sharing step, a group of connector parameters is trained to acquire information from the other tasks' templates and to reason over cross-task knowledge. Because the learning and sharing processes are separate, each model can acquire knowledge learned by other tasks without affecting them, and the problem of imbalanced cross-task knowledge is naturally avoided. Experimental results on public datasets show that the proposed method consistently improves performance compared with existing methods.
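
To make the two-step procedure concrete, the sketch below shows one way the Fitting and Sharing steps could be organised in PyTorch. It is a minimal illustration assuming per-task models with an attention-style read over template matrices; all class, method, and parameter names (TaskModel, fit_forward, freeze_task_knowledge, share_forward, template_slots, and so on) are hypothetical and are not taken from the authors' implementation.

```python
# Minimal sketch of the Fitting/Sharing idea described in the abstract.
# All names below are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class TaskModel(nn.Module):
    def __init__(self, hidden_dim: int, template_slots: int, num_classes: int):
        super().__init__()
        # Fitting step: indicator parameters extract task-specific features.
        self.indicator = nn.Linear(hidden_dim, hidden_dim)
        # In-task template matrix that stores the learned task knowledge.
        self.template = nn.Parameter(torch.randn(template_slots, hidden_dim))
        # Sharing step: connector parameters read from other tasks' templates.
        self.connector = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def fit_forward(self, x: torch.Tensor) -> torch.Tensor:
        """Fitting step: use only the in-task indicator and template."""
        h = torch.tanh(self.indicator(x))
        # Attend over this task's own template slots.
        attn = torch.softmax(h @ self.template.t(), dim=-1)
        h = h + attn @ self.template
        return self.classifier(h)

    def freeze_task_knowledge(self) -> None:
        """After convergence, freeze indicator and template to protect them."""
        self.indicator.requires_grad_(False)
        self.template.requires_grad_(False)

    def share_forward(self, x: torch.Tensor,
                      other_templates: list[torch.Tensor]) -> torch.Tensor:
        """Sharing step: only the connector (and classifier) are trained."""
        h = torch.tanh(self.indicator(x))
        q = self.connector(h)
        for tpl in other_templates:       # frozen templates of other tasks
            attn = torch.softmax(q @ tpl.t(), dim=-1)
            h = h + attn @ tpl.detach()   # read other templates without modifying them
        return self.classifier(h)
```

In this reading of the abstract, each task first optimises fit_forward on its own data; freeze_task_knowledge is then called, so the Sharing step updates only the connector and classifier, and gradients from new tasks can never alter the frozen indicators and templates.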

Notes

  1. http://archive.ics.uci.edu/ml/datasets/Youtube+cookery+channels+viewers+comments+in+Hinglish

  2. https://www.kaggle.com/kabirnagpal/flipkart-customer-review-and-rating

  3. https://www.kaggle.com/surajkum1198/twitterdata

  4. https://www.kaggle.com/lishaoshao/tweet-sentiment-extraction-wpf?select=test.csv

  5. https://www.kaggle.com/ishantjuyal/emotions-in-text

  6. https://www.kaggle.com/shoumikgoswami/annotated-gmb-corpus

Author information

Corresponding author

Correspondence to Chengkai Piao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Piao, C., Wei, J. Fitting and sharing multi-task learning. Appl Intell 54, 6918–6929 (2024). https://doi.org/10.1007/s10489-024-05549-0
