Abstract
Despite the steadily improving quality of machine translation, machine-translated texts still require corrections. Automatic post-editing models were introduced to perform these corrections without human intervention, but no system has yet been able to fully automate the post-editing process. Moreover, while numerous translation tools benefit from translators’ input, human–computer interaction has been underexplored in post-editing. This study discusses automatic post-editing models and suggests that they could be improved in more interactive scenarios, as has previously been done in machine translation. Although some attempts have been made to update automatic post-editing models incrementally, most relied on synthetic corpora, which is likely to affect performance. To address this issue, automatic post-editing models trained in a traditional setting were developed as part of this project and then updated in both batch and online modes without synthetic resources, with a view to analysing the performance of incremental adaptation across different systems, domains and language pairs. Although the interaction with the translator was simulated, an interactive functionality allowing for dynamic post-editing was included for demonstration purposes. The results showed that none of the models outperformed the baseline and that the online models systematically yielded lower performance. Moreover, a human evaluation identified recurrent error patterns. These outcomes confirm the difficulty of the automatic post-editing task. Based on the results, several recommendations are put forward for further research, including experiments with more data (possibly synthetic corpora) and different environmental variables.
Notes
1. WMT: Workshop on Machine Translation. Although the research community continues to use this acronym, WMT is now a well-established international conference series (Conference on Machine Translation).
2. TF-IDF: Term Frequency-Inverse Document Frequency. This measure was proposed by Salton et al. (1975) in their theory of term importance, which showed that the relevance of a word depends not only on its frequency but also on its specificity to a document.
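The idea behind note 2 can be illustrated with a minimal sketch. It assumes one common textbook variant of the weighting (relative term frequency multiplied by the natural logarithm of the inverse document frequency), which is not necessarily the formulation of Salton et al. (1975) or the one used in the chapter. The function name `tf_idf` and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenised on whitespace.
docs = [
    "the model edits the translation".split(),
    "the translator corrects the output".split(),
    "the baseline".split(),
]
scores = tf_idf(docs)
```

In this toy corpus, "the" occurs in every document, so its inverse document frequency is ln(3/3) = 0 and its weight vanishes however often it appears, whereas a word confined to a single document, such as "model", keeps a positive weight.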
References
Alabau, Vicent, Ragnar Bonk, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes García-Martínez, Jesús González, Philipp Koehn, Luis Leiva, Bartolomé Mesa-Lao, Daniel Ortiz, Herve Saint-Amand, Germán Sanchis, and Chara Tsoukala. 2013. CASMACAT: An open source workbench for advanced computer aided translation. The Prague Bulletin of Mathematical Linguistics 100 (1): 101–112.
Allen, Jeffrey, and Christopher Hogan. 2000. Toward the development of a post editing module for raw machine translation output: A controlled language perspective. In Proceedings of the 3rd International Controlled Language Applications Workshop (CLAW-00), 62–71. Seattle, Washington, USA.
Alves, Fabio, Arlene Koglin, Bartolomé Mesa-Lao, Mercedes García-Martínez, Norma B. de Lima Fonseca, Arthur de Melo Sá, José Luiz Gonçalves, Karina Sarto Szpak, Kyoko Sekino, and Marceli Aquino. 2016a. Analysing the impact of interactive machine translation on post-editing effort. In New Directions in Empirical Translation Process Research, ed. Michael Carl, Srinivas Bangalore, and Moritz Schaeffer, 77–94. Cham: Springer.
Alves, Fabio, Karina Sarto Szpak, José Luiz Gonçalves, Kyoko Sekino, Marceli Aquino, Rodrigo Araújo e Castro, Arlene Koglin, Norma B. de Lima Fonseca, and Bartolomé Mesa-Lao. 2016b. Investigating cognitive effort in post-editing: A relevance-theoretical approach. In Eyetracking and Applied Linguistics, ed. Silvia Hansen-Schirra, and Sambor Grucza, 109–142. Berlin: Language Science Press.
Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. Preprint.
Bar-Hillel, Yehoshua. 1960. The present status of automatic translation of languages. Advances in Computers 1: 91–163.
Bojar, Ondřej, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, and Marco Turchi. 2015. Findings of the 2015 workshop on statistical machine translation. In Proceedings of the 10th Workshop on Statistical Machine Translation, 1–46. Lisboa, Portugal.
Bojar, Ondřej, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, and Marcos Zampieri. 2016. Findings of the 2016 conference on machine translation (WMT16). In Proceedings of the 1st Conference on Machine Translation, Volume 2: Shared Task Papers, 131–198. Berlin, Germany.
Bojar, Ondřej, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Huang Shujian, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, and Marco Turchi. 2017. Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Conference on Machine Translation, Volume 2: Shared Task Papers, 169–214. Copenhagen, Denmark.
Castaño, Asuncion, and Francisco Casacuberta. 1997. A connectionist approach to machine translation. In Proceedings of the 5th European Conference on Speech Communication and Technology, 221–229. Rhodes, Greece.
Castilho, Sheila. 2020. Document-level machine translation evaluation project: Methodology, effort and inter-annotator agreement. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 475–476. Lisbon, Portugal.
Castilho, Sheila, Joss Moorkens, Federico Gaspari, Iacer Calixto, John Tinsley, and Andy Way. 2017. Is neural machine translation the new state of the art? The Prague Bulletin of Mathematical Linguistics 108 (1): 109–120.
Chatterjee, Rajen, Marion Weller, Matteo Negri, and Marco Turchi. 2015. Exploring the planet of the APEs: A comparative study of state-of-the-art methods for MT automatic post-editing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Volume 2: Short Papers, 156–161. Beijing, China.
Chatterjee, Rajen, Amin Farajian, Matteo Negri, Marco Turchi, Ankit Srivastava, and Santanu Pal. 2017a. Multi-source neural automatic post-editing: FBK’s participation in the WMT 2017 APE shared task. In Proceedings of the Conference on Machine Translation (WMT), Volume 2: Shared Task Papers, 630–638. Copenhagen, Denmark.
Chatterjee, Rajen, Gebremedhen Gebremelak, Matteo Negri, and Marco Turchi. 2017b. Online automatic post-editing for MT in a multi-domain translation environment. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, 525–535. Valencia, Spain.
Chatterjee, Rajen, Matteo Negri, Raphael Rubino, and Marco Turchi. 2018. Findings of the WMT 2018 shared task on automatic post-editing. In Proceedings of the 3rd Conference on Machine Translation (WMT), Volume 2: Shared Task Papers, 723–738. Brussels, Belgium.
Chatterjee, Rajen, Christian Federmann, Matteo Negri, and Marco Turchi. 2019. Findings of the WMT 2019 shared task on automatic post-editing. In Proceedings of the 4th Conference on Machine Translation (WMT), Volume 3: Shared Task Papers (Day 2), 13–30. Florence, Italy.
Chatterjee, Rajen, Markus Freitag, Matteo Negri, and Marco Turchi. 2020. Findings of the WMT 2020 shared task on automatic post-editing. In Proceedings of the 5th Conference on Machine Translation (WMT), 646–659. Online.
Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1724–1734. Doha, Qatar.
do Carmo, Félix, Dimitar Shterionov, Joss Moorkens, Joachim Wagner, Murhaf Hossari, Eric Paquin, Dag Schmidtke, Declan Groves, and Andy Way. 2020. A review of the state-of-the-art in automatic post-editing. Machine Translation 35: 101–143.
Domingo, Miguel, Mercedes García-Martínez, Álvaro Peris, Alexandre Helle, Amando Estela, Laurent Bié, Francisco Casacuberta, and Manuel Herranz. 2019. Incremental adaptation of NMT for professional post-editors: A user study. In Proceedings of Machine Translation Summit XVII, Volume 2: Translator, Project and User Tracks, 219–227. Dublin, Ireland.
Domingo, Miguel, Mercedes García-Martínez, Álvaro Peris, Alexandre Helle, Amando Estela, Laurent Bié, Francisco Casacuberta, and Manuel Herranz. 2020. A user study of the incremental learning in NMT. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 319–328. Online.
Escribe, Marie, and Ruslan Mitkov. 2021. Interactive models for post-editing. In Proceedings of TRITON (Translation and Interpreting Technology Online), 167–173. Online.
Esteban, José, José Lorenzo, Antonio S. Valderrábanos, and Guy Lapalme. 2004. TransType2—An innovative computer-assisted translation system. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, 94–97. Barcelona, Spain.
Farajian, M. Amin, Marco M. Turchi, Matteo Negri, and Marcello Federico. 2017. Multi-domain neural machine translation through unsupervised adaptation. In Proceedings of the Conference on Machine Translation (WMT), Volume 1: Research Papers, 127–137. Copenhagen, Denmark.
Foster, George, Pierre Isabelle, and Pierre Plamondon. 1997. Target-text mediated interactive machine translation. Machine Translation 12 (1): 175–194.
Foster, George, Philippe Langlais, and Guy Lapalme. 2002. User-friendly text prediction for translators. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), 148–155. Philadelphia, Pennsylvania, USA.
Green, Spence, Jeffrey Heer, and Christopher D. Manning. 2015. Natural language translation at the intersection of AI and HCI. Communications of the ACM 58 (9): 46–53.
Guerreiro, Nuno M., Elena Voita, and André F. T. Martins. 2022. Looking for a needle in a haystack: A comprehensive study of hallucinations in neural machine translation. Preprint.
Hassan, Hany, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. Achieving human parity on automatic Chinese to English news translation. Microsoft AI & Research.
Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. Preprint.
Hutchins, John. 2005. The First Public Demonstration of Machine Translation: The Georgetown-IBM System, 7th January 1954.
Junczys-Dowmunt, Marcin. 2018. Are we experiencing the golden age of automatic post-editing? In Proceedings of the AMTA 2018 Workshop on Translation Quality Estimation and Automatic Post-Editing, 144–206. Boston, Massachusetts, USA.
Karimova, Sariya, Patrick Simianer, and Stefan Riezler. 2018. A user-study on online adaptation of neural machine translation to human post-edits. Machine Translation 32: 309–324.
Kingma, Diederik P., and Jimmy Lei Ba. 2015. Adam: A method for stochastic optimisation. Preprint.
Knight, Kevin, and Ishwar Chander. 1994. Automated post-editing of documents. In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI), 779–784. Seattle, Washington, USA.
Knowles, Rebecca, and Philipp Koehn. 2016. Neural interactive translation prediction. In Proceedings of the Association for Machine Translation in the Americas, 107–120. Austin, Texas, USA.
Koehn, Philipp. 2010. Statistical Machine Translation. Cambridge: Cambridge University Press.
Lagarda, Antonio L., Daniel Ortiz-Martínez, Vicent Alabau, and Francisco Casacuberta. 2015. Translating without in-domain corpus: Machine translation post-editing with online learning techniques. Computer Speech and Language 32 (1): 109–134.
Lagoudaki, Elina. 2008. The value of machine translation for the professional translator. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas (AMTA), 262–269. Waikiki, Hawaii, USA.
Licklider, Joseph C.R. 1960. Man-computer symbiosis. IRE Transactions on Human Factors in Electronics 1: 4–11.
Mitkov, Ruslan. 2021. Translation memory. In The Routledge Handbook of Translation and Memory, ed. Sue-Ann Deane-Cox, and Anneleen Spiessens. Basingstoke: Routledge.
Ñeco, Ramón P., and Mikel L. Forcada. 1997. Asynchronous translations with recurrent neural nets. In Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 2535–2540. Houston, Texas, USA.
Negri, Matteo, Marco Turchi, Nicola Bertoldi, and Marcello Federico. 2018a. Online neural automatic post-editing for neural machine translation. In Proceedings of the 5th Italian Conference on Computational Linguistics (CLIC-IT 2018), 288–293. Torino, Italy.
Negri, Matteo, Marco Turchi, Rajen Chatterjee, and Nicola Bertoldi. 2018b. eSCAPE: A large-scale synthetic corpus for automatic post-editing. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), 24–30. Miyazaki, Japan.
Nishida, Fujio, Shinobu Takamatsu, Tadaaki Tani, and Tsunehisa Doi. 1988. Feedback of correcting information in post-editing to a machine translation system. In Proceedings of the 12th International Conference on Computational Linguistics, vol. 2, 476–481. Budapest, Hungary.
Ortiz-Martínez, Daniel, and Francisco Casacuberta. 2014. The new Thot toolkit for fully-automatic and interactive statistical machine translation. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, 45–48. Gothenburg, Sweden.
Ortiz-Martínez, Daniel, Ismael García-Varea, and Francisco Casacuberta. 2010. Online learning for interactive statistical machine translation. In Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, 546–554. Los Angeles, California, USA.
Pal, Santanu, Sudip Kumar Naskar, Mihaela Vela, Qun Liu, and Josef van Genabith. 2017. Neural automatic post-editing using prior alignment and reranking. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 349–355. Valencia, Spain.
Pal, Santanu, Nico Herbig, Antonio Krüger, and Josef van Genabith. 2018. A transformer-based multi-source automatic post-editing system. In Proceedings of the 3rd Conference on Machine Translation: Shared Task Papers, 827–835. Brussels, Belgium.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311–318. Philadelphia, Pennsylvania, USA.
Peris, Álvaro, and Francisco Casacuberta. 2019. Online learning for effort reduction in interactive neural machine translation. Computer Speech and Language 58 (1): 98–126.
Phaholphinyo, Sitthaa, Teerapong Modhiran, Nattapol Kritsuthikul, and Thepchai Supnithi. 2005. A practical of memory-based approach for improving accuracy of MT. In Proceedings of the MT Summit X, 41–46. Phuket, Thailand.
Poibeau, Thierry. 2022. On “human parity” and “super human performance” in machine translation evaluation. In Proceedings of the Language Resource and Evaluation Conference, 6018–6023. Marseille, France.
Salton, Gerard, Chung-Shu Yang, and Clement T. Yu. 1975. A theory of term importance in automatic text analysis. Journal of the American Society for Information Science 26: 33–44.
Santy, Sebastin, Sandipan Dandapat, Monojit Choudhury, and Kalika Bali. 2019. INMT: Interactive neural machine translation prediction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, 103–108. Hong Kong, China.
Shterionov, Dimitar, Félix do Carmo, Joss Moorkens, Murhaf Hossari, Joachim Wagner, Eric Paquin, Dag Schmidtke, Declan Groves, and Andy Way. 2020. A roadmap to neural automatic post-editing: An empirical approach. Machine Translation 34 (2): 67–96.
Simard, Michel, and George Foster. 2013. PEPr: Post-edit propagation using phrase-based statistical machine translation. In Proceedings of the XIV Machine Translation Summit, 191–198. Nice, France.
Simard, Michel, Cyril Goutte, and Pierre Isabelle. 2007. Statistical phrase-based post-editing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference (NAACL HLT), 508–515. Rochester, New York, USA.
Snover, Matthew, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation of the Americas, 223–231. Cambridge, Massachusetts, USA.
Su, Keh-Yih, Jing-Shin Chang, and Yu-Ling Una Hsu. 1995. A corpus-based statistics-oriented two-way design for parameterised MT systems: Rationale, architecture and training issues. In Proceedings of the 6th International Conference on Theoretical and Methodological Issues in Machine Translation, 334–353. Leuven, Belgium.
Toral, Antonio, Sheila Castilho, Ke Hu, and Andy Way. 2018. Attaining the unattainable? Reassessing claims of human parity in neural machine translation. In Proceedings of the 3rd Conference on Machine Translation: Research Papers, 113–123. Brussels, Belgium.
Toselli, Alejandro Héctor, Enrique Vidal, and Francisco Casacuberta. 2011. Multimodal Interactive Pattern Recognition and Applications. London: Springer Science & Business Media.
Underwood, Nancy, Bartolomé Mesa-Lao, Mercedes García Martínez, Michael Carl, Vicent Alabau, Jesús González-Rubio, Luis A. Leiva, Germán Sanchis-Trilles, Daniel Ortiz-Martínez, and Francisco Casacuberta. 2014. Evaluating the effects of interactivity in a post-editing workbench. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), 553–559. Reykjavik, Iceland.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st Annual Conference on Advances in Neural Information Processing Systems (NIPS), 5998–6008. Long Beach, California, USA.
Zaninello, Andrea, and Alexandra Birch. 2020. Multiword expression aware neural machine translation. In Proceedings of the 12th Language Resources and Evaluation Conference, 3816–3825. Marseille, France.
Zhechev, Ventsislav. 2012. Machine translation infrastructure and post-editing performance at Autodesk. In Proceedings of the Workshop on Post-editing Technology and Practice. San Diego, California, USA.
Acknowledgements
We would like to express our sincere gratitude to the volunteers who kindly accepted to work on the evaluation tasks, in particular:
• the English–Spanish team: Lucía Bellés-Calvera (Universitat Jaume I), Rocío Caro Quintana (University of Wolverhampton), Ana Isabel Cespedosa Vázquez (Universidad de Córdoba) and Ana Isabel Martínez Hernández (Universitat Jaume I).
• the German–English team: Anne Eschenbrücher (University of Wolverhampton), Lydia Körber (University of Potsdam, Free University of Berlin) and Alistair Plum (University of Wolverhampton).
• the English–Chinese team: Chien-yu Chen (University of Barcelona), Jacinda Chen (Hong Kong Polytechnic University), Meng Chunyu (Hong Kong Baptist University), Zhujun Zhang (Soochow University), Hellen Zheng (Anastacio Overseas Inc.) and Ruiqi Zhou (Hong Kong Baptist University).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Escribe, M., Mitkov, R. (2023). Applying Incremental Learning to Post-editing Systems: Towards Online Adaptation for Automatic Post-editing Models. In: Pan, J., Laviosa, S. (eds) Corpora and Translation Education. New Frontiers in Translation Studies. Springer, Singapore. https://doi.org/10.1007/978-981-99-6589-2_3
DOI: https://doi.org/10.1007/978-981-99-6589-2_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6588-5
Online ISBN: 978-981-99-6589-2
eBook Packages: Education (R0)