
Granular Syntax Processing with Multi-Task and Curriculum Learning

Research · Published in Cognitive Computation

Abstract

Syntactic processing techniques are the foundation of natural language processing (NLP), supporting many downstream NLP tasks. In this paper, we conduct pair-wise multi-task learning (MTL) on syntactic tasks of different granularity, namely Sentence Boundary Detection (SBD), text chunking, and Part-of-Speech (PoS) tagging, to investigate the extent to which they complement each other. We propose a novel soft parameter-sharing mechanism to share the local and global dependency information learned from both target tasks. We also propose a curriculum learning (CL) mechanism to improve MTL with non-parallel labeled data. Using non-parallel labeled data in MTL is common practice, yet it has received little attention; for example, our PoS tagging data carry no text chunking labels. When learning PoS tagging and text chunking together, the proposed CL mechanism selects complementary samples from the two tasks to update the parameters of the MTL model within the same training batch, which yields better performance and learning stability. We conclude that fine-grained tasks provide complementary features to coarse-grained ones, while the coarsest-grained task, SBD, provides useful information for the finest-grained one, PoS tagging. Additionally, the text chunking task achieves state-of-the-art performance when learned jointly with PoS tagging. Our analytical experiments also demonstrate the effectiveness of the proposed soft parameter-sharing and CL mechanisms.
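To make the two mechanisms concrete, below is a minimal, hypothetical PyTorch sketch of (1) soft parameter sharing, where each task keeps its own encoder and a learned gate blends in a summary of the partner task's representations, and (2) a curriculum that orders each task's non-parallel samples by current loss so that paired batches of comparable difficulty feed the same joint update. The names (GatedSoftSharing, PairwiseTagger, curriculum_order) and the pooled-summary gating are illustrative assumptions of ours, not the authors' released implementation.

```python
# Hypothetical sketch of pair-wise MTL with soft parameter sharing and a
# loss-based curriculum over non-parallel labeled data. Names and design
# details are illustrative, not the paper's exact mechanism.
import torch
import torch.nn as nn


class GatedSoftSharing(nn.Module):
    """Soft sharing: a sigmoid gate decides, per token, how much of the
    partner task's pooled representation to blend into the target task's."""
    def __init__(self, hidden: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, own: torch.Tensor, partner_summary: torch.Tensor) -> torch.Tensor:
        # own: (B, T, H); partner_summary: (H,) pooled over the partner
        # batch, so non-parallel batches need no token-level alignment.
        partner = partner_summary.expand_as(own)
        g = torch.sigmoid(self.gate(torch.cat([own, partner], dim=-1)))
        return g * own + (1 - g) * partner


class PairwiseTagger(nn.Module):
    """Two task-specific BiLSTM encoders (e.g. PoS tagging and chunking)
    coupled only through the soft-sharing gate."""
    def __init__(self, vocab: int, hidden: int, tags_a: int, tags_b: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.enc_a = nn.LSTM(hidden, hidden // 2, batch_first=True, bidirectional=True)
        self.enc_b = nn.LSTM(hidden, hidden // 2, batch_first=True, bidirectional=True)
        self.share = GatedSoftSharing(hidden)
        self.head_a = nn.Linear(hidden, tags_a)
        self.head_b = nn.Linear(hidden, tags_b)

    def forward(self, batch_a: torch.Tensor, batch_b: torch.Tensor):
        ha, _ = self.enc_a(self.embed(batch_a))  # (B_a, T_a, H)
        hb, _ = self.enc_b(self.embed(batch_b))  # (B_b, T_b, H)
        sum_a = ha.mean(dim=(0, 1))              # pooled summary, shape (H,)
        sum_b = hb.mean(dim=(0, 1))
        return self.head_a(self.share(ha, sum_b)), self.head_b(self.share(hb, sum_a))


def curriculum_order(per_sample_loss, samples, batch_size):
    """Order one task's samples easy-to-hard by current per-sample loss and
    batch them; pairing the i-th batch of each task then approximates
    selecting complementary samples for the same joint update."""
    scored = sorted(samples, key=per_sample_loss)  # low loss = "easy" first
    return [scored[i:i + batch_size] for i in range(0, len(scored), batch_size)]
```

Pooling the partner batch into a single summary vector is one simple way to exchange information between encoders when the two tasks' batches are non-parallel, since it sidesteps any need for token or sentence alignment.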


Data Availability

No datasets were generated or analysed during the current study.


Author information


Contributions

X.L. and R.M. both contributed to the conceptualization and methodology of the study. X.L. conducted the experiments and wrote the manuscript. R.M. revised the manuscript. E.C. supervised the study and edited and approved the final version of the manuscript.

Corresponding author

Correspondence to Erik Cambria.

Ethics declarations

Ethics Approval

This paper does not contain any studies with human participants or animals performed by any of the authors.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, X., Mao, R. & Cambria, E. Granular Syntax Processing with Multi-Task and Curriculum Learning. Cogn Comput (2024). https://doi.org/10.1007/s12559-024-10320-1

