Pivot language approach for phrase-based statistical machine translation

Wu, Hua; Wang, Haifeng

doi:10.1007/s10590-008-9041-6

Published: 23 September 2008

Volume 21, pages 165–181 (2007)

Machine Translation

Abstract

This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language. To translate between languages $L_s$ and $L_t$ with limited bilingual resources, we bring in a third language, $L_p$, called the pivot language. For the language pairs $L_s$–$L_p$ and $L_p$–$L_t$, there exist large bilingual corpora. Using only $L_s$–$L_p$ and $L_p$–$L_t$ bilingual corpora, we can build a translation model for $L_s$–$L_t$. The advantage of this method lies in the fact that we can perform translation between $L_s$ and $L_t$ even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, with a small $L_s$–$L_t$ bilingual corpus available, our method can further improve translation quality by using the additional $L_s$–$L_p$ and $L_p$–$L_t$ bilingual corpora.
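
The abstract states that an $L_s$–$L_t$ model can be built from the two pivot corpora alone, without giving the mechanics. One common way to realise this idea is phrase-table triangulation. The following is a minimal Python sketch of it under stated assumptions. The target phrase is treated as conditionally independent of the source phrase given the pivot phrase, so probabilities are summed over shared pivot phrases. The tables are plain dictionaries, and `triangulate`, `interpolate` and the `weight` parameter are hypothetical names. Linear interpolation with a table trained on a small direct corpus is likewise just one simple way to combine the two models and is not necessarily the paper's exact method.

```python
from collections import defaultdict

# Illustrative layout (an assumption, not the paper's format):
# table[x][y] holds the conditional phrase probability phi(y | x).
PhraseTable = dict[str, dict[str, float]]


def triangulate(src_piv: PhraseTable, piv_tgt: PhraseTable) -> PhraseTable:
    """Induce an L_s -> L_t phrase table from L_s -> L_p and L_p -> L_t tables.

    Assumes the target phrase t is independent of the source phrase s given
    the pivot phrase p, so phi(t | s) = sum over p of phi(t | p) * phi(p | s).
    """
    src_tgt: PhraseTable = {}
    for s, pivots in src_piv.items():
        scores = defaultdict(float)
        for p, p_given_s in pivots.items():
            # A pivot phrase absent from the L_p -> L_t table contributes nothing.
            for t, t_given_p in piv_tgt.get(p, {}).items():
                scores[t] += t_given_p * p_given_s
        if scores:  # keep only source phrases that reached some target phrase
            src_tgt[s] = dict(scores)
    return src_tgt


def interpolate(pivot_tm: PhraseTable, direct_tm: PhraseTable, weight: float) -> PhraseTable:
    """Linearly mix the pivot table with one trained on a small L_s-L_t corpus.

    `weight` (hypothetical, to be tuned on held-out data) is the share given to
    the pivot table; a pair missing from one table counts as probability 0 there.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    merged: PhraseTable = {}
    for s in pivot_tm.keys() | direct_tm.keys():
        a, b = pivot_tm.get(s, {}), direct_tm.get(s, {})
        merged[s] = {
            t: weight * a.get(t, 0.0) + (1.0 - weight) * b.get(t, 0.0)
            for t in a.keys() | b.keys()
        }
    return merged


if __name__ == "__main__":
    # Toy, invented probabilities: German source, English pivot, French target.
    src_piv = {"das Haus": {"the house": 0.7, "the home": 0.3}}
    piv_tgt = {
        "the house": {"la maison": 0.9, "le foyer": 0.1},
        "the home": {"le foyer": 0.6, "la maison": 0.4},
    }
    pivot_tm = triangulate(src_piv, piv_tgt)
    print({t: round(v, 3) for t, v in pivot_tm["das Haus"].items()})
    # {'la maison': 0.75, 'le foyer': 0.25}
```

Only one translation direction is shown. The inverse direction follows by swapping the roles of the tables. A complete phrase-based system would also need lexical weights, reordering models and pruning of the triangulated table, all of which are omitted here.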

References

Alshawi H, Bangalore S, Douglas S (2000) Learning dependency translation models as collections of finite-state head transducers. Comput Linguist 26(1): 45–60
Aswani N, Gaizauskas R (2005) Aligning words in English–Hindi parallel corpora. In: Proceedings of the ACL 2005 workshop on building and using parallel texts: data-driven machine translation and beyond, Ann Arbor, MI, pp 115–118
Borin L (2000) You’ll take the high road and I’ll take the low road: using a third language to improve bilingual word alignment. In: Proceedings of the 18th international conference on computational linguistics: COLING 2000 in Europe, Saarbrücken, Germany, pp 97–103
Brown PF, Della Pietra SA, Della Pietra VJ, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2): 263–311
Callison-Burch C, Koehn P, Osborne M (2006) Improved statistical machine translation using paraphrases. In: Proceedings of the human language technology conference of the North American chapter of the association for computational linguistics, New York, NY, pp 17–24
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: 43rd annual meeting of the association for computational linguistics, Ann Arbor, MI, pp 263–270
Diab M, Resnik P (2002) An unsupervised method for word sense tagging using parallel corpora. In: 40th annual meeting of the association for computational linguistics, Philadelphia, PA, pp 255–262
Gollins T, Sanderson M (2001) Improving cross language information retrieval with triangulated translation. In: Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval, New Orleans, LA, pp 90–95
Kishida K, Kando N (2003) Two-stage refinement of query translation in a pivot language approach to cross-lingual information retrieval: an experiment at CLEF 2003. In: Proceedings of cross-language evaluation forum, CLEF 2003, Trondheim, Norway, pp 253–262
Koehn P (2004) Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In: Machine translation: from real users to research; 6th conference of the association for machine translation in the Americas, AMTA 2004, Washington, DC. Springer, Berlin, Germany, pp 115–124
Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: MT summit X, the tenth machine translation summit, Phuket, Thailand, pp 79–86
Koehn P, Monz C (2006) Manual and automatic evaluation of machine translation between European languages. In: Proceedings of the HLT/NAACL 2006 workshop on statistical machine translation, New York, NY, pp 102–121
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: HLT-NAACL: human language technology conference of the North American chapter of the association for computational linguistics, Edmonton, AB, Canada, pp 127–133
Lopez A, Resnik P (2005) Improved HMM alignment models for languages with scarce resources. In: Proceedings of ACL 2005 workshop on building and using parallel texts: data-driven machine translation and beyond, Ann Arbor, MI, pp 83–86
Lopez A, Resnik P (2006) Word-based alignment, phrase-based translation: what’s the link? In: AMTA 2006, proceedings of the 7th conference of the association for machine translation in the Americas: visions for the future of machine translation, Boston, MA, pp 90–99
Martin J, Mihalcea R, Pedersen T (2005) Word alignment for languages with scarce resources. In: Proceedings of the ACL-2005 workshop on building and using parallel texts: data-driven machine translation and beyond, Ann Arbor, MI, pp 65–74
Melamed D (2004) Statistical machine translation by parsing. In: 42nd annual meeting of the association for computational linguistics, proceedings of the conference, Barcelona, Spain, pp 653–660
Mellebeek B, Owczarzak K, Groves D, Van Genabith J, Way A (2006) A syntactic skeleton for statistical machine translation. In: 11th annual conference of the European association for machine translation, proceedings, Oslo, Norway, pp 195–202
Nießen S, Ney H (2004) Statistical machine translation with scarce resources using morpho-syntactic information. Comput Linguist 30(2): 181–204
Och FJ (2003) Minimum error rate training in statistical machine translation. In: 41st annual meeting of the association for computational linguistics, Sapporo, Japan, pp 160–167
Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: 40th annual meeting of the association for computational linguistics, Philadelphia, PA, pp 295–302
Och FJ, Ney H (2004) The alignment template approach to statistical machine translation. Comput Linguist 30(4): 417–449
Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: a method for automatic evaluation of machine translation. In: 40th annual meeting of the association for computational linguistics, Philadelphia, PA, pp 311–318
Quirk C, Menezes A, Cherry C (2005) Dependency treelet translation: syntactically informed phrasal SMT. In: 43rd annual meeting of the association for computational linguistics, Ann Arbor, MI, pp 271–279
Resnik P, Smith NA (2003) The web as a parallel corpus. Comput Linguist 29(3): 349–380
Schafer C, Yarowsky D (2002) Inducing translation lexicons via diverse similarity measures and bridge languages. In: COLING 2002: the 19th international conference on computational linguistics, proceedings, Taipei, Taiwan, pp 1–7
Tufiş D, Ion R, Ceausu A, Ştefănescu D (2005) Combined word alignments. In: Proceedings of the ACL 2005 workshop on building and using parallel texts: data-driven machine translation and beyond, Ann Arbor, MI, pp 107–110
Utiyama M, Isahara H (2007) A comparison of pivot methods for phrase-based statistical machine translation. In: Proceedings of human language technology: the conference of the North American chapter of the association for computational linguistics, Rochester, NY, pp 484–491
Vandeghinste V, Schuurman I, Carl M, Markantonatou S, Badia T (2006) METIS-II: machine translation for low-resource languages. In: Proceedings of the fifth international conference on language resources and evaluation, Genoa, Italy, pp 1284–1289
Wang H, Wu H, Liu Z (2006) Word alignment for languages with scarce resources using bilingual corpora of other language pairs. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics main conference poster sessions, Sydney, Australia, pp 874–881
Wu D (1997) Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Comput Linguist 23(3): 377–403
Yamada K, Knight K (2001) A syntax-based statistical translation model. In: Association for computational linguistics, 39th annual meeting and 10th conference of the European chapter, proceedings of the conference, Toulouse, France, pp 523–530

Author information

Authors and Affiliations

Toshiba (China) Research and Development Center, 501, Tower W2, Oriental Plaza, No. 1, East Chang An Ave., Dong Cheng District, Beijing, 100738, China
Hua Wu & Haifeng Wang

Corresponding author

Correspondence to Hua Wu.

About this article

Cite this article

Wu, H., Wang, H. Pivot language approach for phrase-based statistical machine translation. Machine Translation 21, 165–181 (2007). https://doi.org/10.1007/s10590-008-9041-6

Received: 01 April 2008
Accepted: 11 August 2008
Published: 23 September 2008
Issue Date: September 2007
DOI: https://doi.org/10.1007/s10590-008-9041-6
