Mismatching-aware unsupervised translation quality estimation for low-resource languages

  • Original Paper
Language Resources and Evaluation

Abstract

Translation Quality Estimation (QE) is the task of predicting the quality of machine translation (MT) output without any reference translation. The task has gained increasing attention as an important component of practical MT applications. In this paper, we first propose XLMRScore, a cross-lingual counterpart of BERTScore computed with the XLM-RoBERTa (XLMR) model. This metric can serve as a simple unsupervised QE method, but it faces two issues: first, untranslated tokens lead to unexpectedly high translation scores; second, the greedy matching in XLMRScore produces mismatching errors between source and hypothesis tokens. To mitigate these issues, we propose, respectively, replacing untranslated words with the unknown token, and cross-lingually aligning the pre-trained model so that aligned words are represented closer to each other. We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task, as well as on a new English→Persian (En-Fa) test dataset introduced in this paper. Experiments show that our method achieves results comparable to the supervised baseline in two zero-shot scenarios, with less than 0.01 difference in Pearson correlation, while outperforming unsupervised rivals on all the low-resource language pairs by more than 8% on average.
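The greedy matching behind XLMRScore, and the untranslated-token problem it runs into, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: token embeddings come from a hypothetical 3-dimensional toy table (`EMB`) rather than contextual XLM-R vectors, precision and recall follow the BERTScore greedy-matching definition without IDF weighting, and a hypothesis token is treated as untranslated when it appears verbatim in the source, a heuristic assumed here for illustration only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Hypothetical toy embedding table. Real XLMRScore would use contextual
# XLM-RoBERTa token vectors; these static 3-d vectors are an assumption
# chosen only to keep the arithmetic easy to follow.
EMB: Dict[str, List[float]] = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "le": [0.9, 0.1, 0.0],
    "chat": [0.1, 0.9, 0.0],
    "<unk>": [0.0, 0.0, 1.0],
}


def embed(tokens: List[str]) -> List[List[float]]:
    """Look up each token in the toy table."""
    return [EMB[t] for t in tokens]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_f1(src: Sequence[Vector], hyp: Sequence[Vector]) -> float:
    """BERTScore-style greedy-matching F1, with the source in place of a reference."""
    if not src or not hyp:
        return 0.0
    # Precision: match every hypothesis token to its most similar source token.
    precision = sum(max(cosine(h, s) for s in src) for h in hyp) / len(hyp)
    # Recall: match every source token to its most similar hypothesis token.
    recall = sum(max(cosine(s, h) for h in hyp) for s in src) / len(src)
    if precision + recall <= 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mask_untranslated(src_tokens: List[str], hyp_tokens: List[str], unk: str = "<unk>") -> List[str]:
    """Replace hypothesis tokens copied verbatim from the source with the unknown token.

    Treating an exact source/hypothesis string match as "untranslated" is an
    illustrative assumption, not necessarily the paper's detection rule.
    """
    src_vocab = set(src_tokens)
    return [unk if tok in src_vocab else tok for tok in hyp_tokens]


if __name__ == "__main__":
    src = ["the", "cat"]
    copied = ["the", "cat"]  # hypothesis that leaves the source untranslated
    good = ["le", "chat"]    # hypothesis that is an actual translation

    print(f"{greedy_f1(embed(src), embed(copied)):.3f}")  # 1.000
    print(f"{greedy_f1(embed(src), embed(good)):.3f}")    # 0.994
    masked = mask_untranslated(src, copied)
    print(masked, f"{greedy_f1(embed(src), embed(masked)):.3f}")  # ['<unk>', '<unk>'] 0.000
```

With these toy vectors, a hypothesis that merely copies the source scores higher than a genuine translation, which is the failure mode described in the abstract; masking the copied tokens with `<unk>` removes that advantage. Note that an exact-match heuristic would also mask tokens that legitimately stay the same across languages, such as numbers or names, so a practical system would need a more careful test.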

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. Arrows indicate translation directions.

  2. https://wmt-qe-task.github.io/

  3. https://github.com/fatemeh-azadi/Unsupervised-QE

  4. https://www.faraazin.ir/

  5. https://marian-nmt.github.io/

  6. https://github.com/jhclark/tercom

  7. https://github.com/sheffieldnlp/mlqe-pe

  8. https://www.statmt.org/wmt20/parallel-corpus-filtering.html

  9. https://www.cfilt.iitb.ac.in/iitb_parallel/

  10. https://data.statmt.org/news-crawl/

  11. https://github.com/moses-smt/giza-pp

  12. https://github.com/sheffieldnlp/qe-eval-scripts

  13. https://github.com/facebookresearch/LASER

  14. https://www.sbert.net/

  15. http://web.eecs.umich.edu/~mihalcea/wpt05/

References

  • Aldarmaki, H., & Diab, M. (2019). Context-aware cross-lingual mapping. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp 3906–3911.

  • Banerjee, S., & Lavie, A. (2005). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp 65–72.

  • Cao, S., Kitaev, N., & Klein, D. (2019). Multilingual alignment of contextual word representations. In: International Conference on Learning Representations.

  • Chen, Y., Su, C., Zhang, Y., et al. (2021). HW-TSC’s participation at WMT 2021 quality estimation shared task. In: Proceedings of the Sixth Conference on Machine Translation, pp 890–896.

  • Conneau, A., & Lample, G. (2019). Cross-lingual language model pretraining. Advances in Neural Information Processing Systems, 32.

  • Conneau, A., Khandelwal, K., Goyal, N., et al. (2020). Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, pp 8440–8451.

  • Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2292.

  • Devlin, J., Chang, M.W., Lee, K., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171–4186.

  • do Carmo, F., Shterionov, D., Moorkens, J., et al. (2021). A review of the state-of-the-art in automatic post-editing. Machine Translation, 35(2), 101–143.

  • Edelsbrunner, H., & Morozov, D. (2012). Persistent homology: Theory and practice. In: Proceedings of the European Congress of Mathematics. European Mathematical Society, pp 31–50.

  • Etchegoyhen, T., Garcia, E.M., & Azpeitia, A. (2018). Supervised and unsupervised minimalist quality estimators: Vicomtech’s participation in the WMT 2018 quality estimation task. In: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp 782–787.

  • Fomicheva, M., Sun, S., Fonseca, E., et al. (2022). MLQE-PE: A multilingual quality estimation and post-editing dataset. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, pp 4963–4974.

  • Fomicheva, M., Sun, S., Yankovskaya, L., et al. (2020). Unsupervised quality estimation for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 539–555.

  • Guzmán, F., Chen, P.J., Ott, M., et al. (2019). The FLORES evaluation datasets for low-resource machine translation: Nepali–English and Sinhala–English. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 6098–6111.

  • Huang, H., Liang, Y., Duan, N., et al. (2019). Unicoder: A universal language encoder by pre-training with multiple cross-lingual tasks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 2485–2494.

  • Ive, J., Blain, F., & Specia, L. (2018). deepQuest: A framework for neural-based quality estimation. In: Proceedings of the 27th International Conference on Computational Linguistics, pp 3146–3157.

  • Jabbari, F., Bakshaei, S., Ziabary, S.M.M., et al. (2012). Developing an open-domain English-Farsi translation system using AFEC: Amirkabir Bilingual Farsi-English Corpus. In: Fourth Workshop on Computational Approaches to Arabic-Script-based Languages, pp 17–23.

  • Junczys-Dowmunt, M., Grundkiewicz, R., Dwojak, T., et al. (2018). Marian: Fast neural machine translation in C++. In: Proceedings of ACL 2018, System Demonstrations. Association for Computational Linguistics, Melbourne, Australia, pp 116–121.

  • Karthikeyan, K., Wang, Z., Mayhew, S., et al. (2020). Cross-lingual ability of multilingual BERT: An empirical study. In: 8th International Conference on Learning Representations.

  • Kepler, F., Trénous, J., Treviso, M., et al. (2019a). Unbabel’s participation in the WMT19 translation quality estimation shared task. In: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2). Association for Computational Linguistics, Florence, Italy, pp 78–84.

  • Kepler, F., Trénous, J., Treviso, M., et al. (2019b). OpenKiwi: An open source framework for quality estimation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp 117–122.

  • Kim, H., Jung, H.Y., Kwon, H., et al. (2017). Predictor-Estimator: Neural quality estimation based on target word prediction for machine translation. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 17(1), 1–22.

  • Kim, H., Lim, J.H., Kim, H.K., et al. (2019). QE BERT: Bilingual BERT using multi-task learning for neural quality estimation. In: Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pp 85–89.

  • Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations (ICLR).

  • Koehn, P., Hoang, H., Birch, A., et al. (2007). Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions. Association for Computational Linguistics, Prague, Czech Republic, pp 177–180.

  • Koehn, P., Chaudhary, V., El-Kishky, A., et al. (2020). Findings of the WMT 2020 shared task on parallel corpus filtering and alignment. In: Proceedings of the Fifth Conference on Machine Translation. Association for Computational Linguistics, Online, pp 726–742.

  • Kulshreshtha, S., Redondo Garcia, J.L., & Chang, C.Y. (2020). Cross-lingual alignment methods for multilingual BERT: A comparative study. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, pp 933–942.

  • Kunchukuttan, A., Mehta, P., & Bhattacharyya, P. (2018). The IIT Bombay English-Hindi parallel corpus. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan.

  • Lee, D. (2020). Two-phase cross-lingual language model fine-tuning for machine translation quality estimation. In: Proceedings of the Fifth Conference on Machine Translation, pp 1024–1028.

  • Liu, Q., McCarthy, D., Vulić, I., et al. (2019). Investigating cross-lingual alignment methods for contextualized embeddings with token-level evaluation. In: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics, Hong Kong, China, pp 33–43.

  • Moura, J., Vera, M., van Stigt, D., et al. (2020). IST-Unbabel participation in the WMT20 quality estimation shared task. In: Proceedings of the Fifth Conference on Machine Translation, pp 1029–1036.

  • Och, F.J., & Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1), 19–51.

  • Papineni, K., Roukos, S., Ward, T., et al. (2002). BLEU: A method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp 311–318.

  • Ranasinghe, T., Orăsan, C., & Mitkov, R. (2020). TransQuest: Translation quality estimation with cross-lingual transformers. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 5070–5081.

  • Sabet, M.J., Dufter, P., Yvon, F., et al. (2020). SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp 1627–1643.

  • Snover, M., Dorr, B., Schwartz, R., et al. (2006). A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, pp 223–231.

  • Specia, L., Turchi, M., Cancedda, N., et al. (2009). Estimating the sentence-level quality of machine translation systems. In: Proceedings of the 13th Annual Conference of the European Association for Machine Translation.

  • Specia, L., Shah, K., De Souza, J.G., et al. (2013). QuEst - A translation quality estimation framework. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp 79–84.

  • Specia, L., Blain, F., Fomicheva, M., et al. (2020). Findings of the WMT 2020 shared task on quality estimation. In: Proceedings of the Fifth Conference on Machine Translation. Association for Computational Linguistics, Online, pp 743–764.

  • Specia, L., Blain, F., Fomicheva, M., et al. (2021). Findings of the WMT 2021 shared task on quality estimation. In: Proceedings of the Sixth Conference on Machine Translation. Association for Computational Linguistics, pp 684–725.

  • Tavakoli, L., & Faili, H. (2014). Phrase alignments in parallel corpus using bootstrapping approach. International Journal of Information and Communication Technology Research.

  • Tuan, Y.L., El-Kishky, A., Renduchintala, A., et al. (2021). Quality estimation without human-labeled data. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp 619–625.

  • Wang, J., Fan, K., Li, B., et al. (2018). Alibaba submission for WMT18 quality estimation task. In: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pp 809–815.

  • Wang, J., Wang, K., Chen, B., et al. (2021). QEMind: Alibaba’s submission to the WMT21 quality estimation shared task. In: Proceedings of the Sixth Conference on Machine Translation. Association for Computational Linguistics, pp 948–954.

  • Wang, Y., Che, W., Guo, J., et al. (2019). Cross-lingual BERT transformation for zero-shot dependency parsing. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 5721–5727.

  • Wu, S., & Dredze, M. (2019). Beto, Bentz, Becas: The surprising cross-lingual effectiveness of BERT. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 833–844.

  • Wu, S., & Dredze, M. (2020). Do explicit alignments robustly improve multilingual encoders? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, pp 4471–4482.

  • Zerva, C., van Stigt, D., Rei, R., et al. (2021). IST-Unbabel 2021 submission for the quality estimation shared task. In: Proceedings of the Sixth Conference on Machine Translation. Association for Computational Linguistics, Online, pp 961–972.

  • Zerva, C., Blain, F., Rei, R., et al. (2022). Findings of the WMT 2022 shared task on quality estimation. In: Proceedings of the Seventh Conference on Machine Translation (WMT). Association for Computational Linguistics, pp 69–99.

  • Zhang, T., Kishore, V., Wu, F., et al. (2020). BERTScore: Evaluating text generation with BERT. In: International Conference on Learning Representations.

  • Zhou, L., Ding, L., & Takeda, K. (2020). Zero-shot translation quality estimation with explicit cross-lingual patterns. In: Proceedings of the Fifth Conference on Machine Translation. Association for Computational Linguistics, Online, pp 1068–1074.

Acknowledgements

We acknowledge partial support from the Institute for Research in Fundamental Sciences (IPM) under grant number CS1403-04-192, and from the Iran National Science Foundation (INSF) under grant number 4002438.

Author information

Correspondence to Heshaam Faili.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Azadi, F., Faili, H. & Dousti, M.J. Mismatching-aware unsupervised translation quality estimation for low-resource languages. Lang Resources & Evaluation (2024). https://doi.org/10.1007/s10579-024-09727-x

  • DOI: https://doi.org/10.1007/s10579-024-09727-x
