
Curriculum pre-training for stylized neural machine translation

Published in Applied Intelligence

Abstract

Stylized neural machine translation (NMT) aims to translate sentences of one style into sentences of another style; it is essential for applying machine translation in real-world scenarios. Most existing methods employ an encoder-decoder structure to understand, translate, and transform style simultaneously, which increases the learning difficulty of the model and leads to poor generalization. To address these issues, we propose a curriculum pre-training framework to improve stylized NMT. Specifically, we design four pre-training tasks of increasing difficulty that help the model extract the features essential for stylized translation. We further propose a stylized-token aligned data augmentation method that expands the pre-training corpus to alleviate the data-scarcity problem. Experiments show that our method achieves competitive results on the MTFC and Modern-Classical translation datasets.
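As a rough illustration of the curriculum idea, the PyTorch-style sketch below runs a sequence of pre-training tasks in order of increasing difficulty before the model is fine-tuned on stylized translation. The task names, step counts, and the model.loss(batch, task=...) interface are assumptions made for this example, not the authors' released implementation.

    # Hypothetical curriculum pre-training loop (PyTorch-style). Task names,
    # ordering, and the model API are illustrative assumptions, not taken
    # from the paper's code.
    from typing import Callable, Iterable, Tuple

    def curriculum_pretrain(model, optimizer,
                            tasks: Iterable[Tuple[str, Callable, int]]) -> None:
        """Pre-train on each task in order, from easiest to hardest."""
        for name, make_batches, num_steps in tasks:
            batches = make_batches()                 # iterator over training batches
            for _, batch in zip(range(num_steps), batches):
                loss = model.loss(batch, task=name)  # task-specific objective
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    # One possible ordering, easiest first (assumed, for illustration only):
    # tasks = [
    #     ("masked_lm",            mono_batches,     50_000),   # plain language modelling
    #     ("style_masked_lm",      styled_batches,   50_000),   # recover masked stylized tokens
    #     ("translation",          parallel_batches, 100_000),  # ordinary sentence translation
    #     ("stylized_translation", stylized_batches, 100_000),  # translate and transfer style
    # ]
    # curriculum_pretrain(model, optimizer, tasks)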


Notes

  1. https://github.com/clab/fast_align

  2. https://github.com/MarkWuNLP/Data4StylizedS2S

  3. https://github.com/raosudha89/GYAFC-corpus

  4. https://github.com/NiuTrans/Classical-Modern

  5. https://github.com/kpu/kenlm


Funding

The research work described in this paper has been supported by the National Natural Science Foundation of China (No. 62376019, 61976015, 61976016, 61876198 and 61370130) and the Youth Research Fund Project of Beijing Wuzi University (No. 2023XJQN10). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this paper.

Author information

Corresponding author

Correspondence to Jinan Xu.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zou, A., Wu, X., Li, X. et al. Curriculum pre-training for stylized neural machine translation. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05586-9

  • DOI: https://doi.org/10.1007/s10489-024-05586-9
