
Curriculum pre-training for stylized neural machine translation

Published in Applied Intelligence

Abstract

Stylized neural machine translation (NMT) aims to translate sentences of one style into sentences of another style; it is essential for applying machine translation in real-world scenarios. Most existing methods employ an encoder-decoder structure to understand, translate, and transform style simultaneously, which increases the learning difficulty of the model and leads to poor generalization. To address these issues, we propose a curriculum pre-training framework to improve stylized NMT. Specifically, we design four pre-training tasks of increasing difficulty that help the model extract the features essential for stylized translation. We further propose a stylized-token aligned data augmentation method that expands the pre-training corpus to alleviate the data-scarcity problem. Experiments show that our method achieves competitive results on the MTFC and Modern-Classical translation datasets.
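As a rough illustration of the curriculum idea, the PyTorch-style sketch below runs a sequence of pre-training tasks in order of increasing difficulty before the model is fine-tuned on stylized translation. The task names, step counts, and the model.loss(batch, task=...) interface are assumptions made for this example, not the authors' released implementation.

    # Hypothetical curriculum pre-training loop (PyTorch-style). Task names,
    # ordering, and the model API are illustrative assumptions, not taken
    # from the paper's code.
    from typing import Callable, Iterable, Tuple

    def curriculum_pretrain(model, optimizer,
                            tasks: Iterable[Tuple[str, Callable, int]]) -> None:
        """Pre-train on each task in order, from easiest to hardest."""
        for name, make_batches, num_steps in tasks:
            batches = make_batches()                 # iterator over training batches
            for _, batch in zip(range(num_steps), batches):
                loss = model.loss(batch, task=name)  # task-specific objective
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

    # One possible ordering, easiest first (assumed, for illustration only):
    # tasks = [
    #     ("masked_lm",            mono_batches,     50_000),   # plain language modelling
    #     ("style_masked_lm",      styled_batches,   50_000),   # recover masked stylized tokens
    #     ("translation",          parallel_batches, 100_000),  # ordinary sentence translation
    #     ("stylized_translation", stylized_batches, 100_000),  # translate and transfer style
    # ]
    # curriculum_pretrain(model, optimizer, tasks)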


Notes

  1. https://github.com/clab/fast_align

  2. https://github.com/MarkWuNLP/Data4StylizedS2S

  3. https://github.com/raosudha89/GYAFC-corpus

  4. https://github.com/NiuTrans/Classical-Modern

  5. https://github.com/kpu/kenlm


Funding

The research work described in this paper has been supported by the National Natural Science Foundation of China (No. 62376019, 61976015, 61976016, 61876198 and 61370130) and the Youth Research Fund Project of Beijing Wuzi University (No. 2023XJQN10). The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this paper.

Author information

Corresponding author

Correspondence to Jinan Xu.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zou, A., Wu, X., Li, X. et al. Curriculum pre-training for stylized neural machine translation. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05586-9

  • DOI: https://doi.org/10.1007/s10489-024-05586-9
