Abstract
Recent advances in statistical machine translation have used approximate beam search for NP-complete inference within probabilistic translation models. We present an alternative approach: sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so, we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum-probability translation and performing minimum-risk training and decoding.
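To make the sample-based decoding idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that some sampler has already produced a list of translations drawn from the posterior; the `samples` list is made-up placeholder data, the function names are hypothetical, and a simple unigram F1 score stands in for a BLEU-like gain function. The posterior probability of each translation is approximated by its relative frequency among the samples.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy gain function: F1 over bag-of-words overlap (a stand-in for BLEU)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def max_probability_translation(samples):
    """Estimate the most probable translation as the most frequent sample."""
    return Counter(samples).most_common(1)[0][0]


def mbr_translation(samples, gain=unigram_f1):
    """Pick the sampled translation with the highest expected gain,
    where the expectation is taken over the empirical sample distribution."""
    counts = Counter(samples)
    total = len(samples)

    def expected_gain(candidate):
        return sum(c / total * gain(candidate, other)
                   for other, c in counts.items())

    return max(counts, key=expected_gain)


# Placeholder "samples" (illustrative strings, not real system output).
samples = ["x y z"] * 3 + ["a b c"] * 2 + ["a b d"] + ["a b e"]
print(max_probability_translation(samples))
print(mbr_translation(samples))
```

In this toy data the most frequent sample and the minimum-risk choice differ: the minimum-risk criterion favours a translation that is similar to many other samples, even if no single string in that cluster is the most frequent.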
References
Arun A, Dyer C, Haddow B, Blunsom P, Lopez A, Koehn P (2009) Monte Carlo inference and maximization for phrase-based translation. In: Proceedings of CoNLL, Association for Computational Linguistics, Boulder, Colorado, pp 102–110
Blunsom P, Cohn T, Osborne M (2008) A discriminative latent variable model for statistical machine translation. In: Proceedings of ACL-08: HLT, Association for Computational Linguistics, Columbus, Ohio, pp 200–208
Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: Proceedings of the fourth workshop on statistical machine translation, Association for Computational Linguistics, Athens, Greece, pp 1–28
Casacuberta F, Higuera CDL (2000) Computational complexity of problems on probabilistic grammars and transducers. Springer-Verlag, London, UK
DeNero J, Bouchard-Côté A, Klein D (2008) Sampling alignment structure under a Bayesian translation model. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Association for Computational Linguistics, Honolulu, Hawaii, pp 314–323
Eisner J, Tromble RW (2006) Local search with very large-scale neighborhoods for optimal permutations in machine translation. In: Proceedings of the HLT-NAACL workshop on computationally hard problems and joint inference in speech and language processing, New York, pp 57–75
Finkel JR, Manning CD, Ng AY (2006) Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines. In: Proceedings of the 2006 conference on empirical methods in natural language processing, Association for Computational Linguistics, Sydney, Australia, pp 618–626
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6: 721–741
Germann U, Jahr M, Knight K, Marcu D, Yamada K (2001) Fast decoding and optimal decoding for machine translation. In: Proceedings of 39th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Toulouse, France, pp 228–235
Green PJ (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732
Johnson H, Martin J, Foster G, Kuhn R (2007a) Improving translation quality by discarding most of the phrasetable. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), Association for Computational Linguistics, Prague, Czech Republic, pp 967–975
Johnson M, Griffiths T, Goldwater S (2007b) Bayesian inference for PCFGs via Markov Chain Monte Carlo. In: Human language technologies 2007: the conference of the North American chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, Association for Computational Linguistics, Rochester, New York, pp 139–146
Koehn P, Hoang H (2007) Factored translation models. In: Proceedings of EMNLP, Association for Computational Linguistics, Prague, Czech Republic, pp 868–876
Koehn P, Och F, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of HLT-NAACL, Morristown, NJ, USA, pp 48–54
Kumar S, Byrne W (2004) Minimum Bayes-risk decoding for statistical machine translation. In: Dumais S, Marcu D, Roukos S (eds) HLT-NAACL 2004: main proceedings, Association for Computational Linguistics, Boston, Massachusetts, USA, pp 169–176
Langlais P, Gotti F, Patry A (2007) A greedy decoder for phrase-based statistical machine translation. In: 11th international conference on theoretical and methodological issues in machine translation (TMI 2007), Skövde, Sweden, pp 104–113
Li Z, Eisner J, Khudanpur S (2009) Variational decoding for statistical machine translation. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, pp 593–601
Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45(3): 503–528
Marcu D, Wong W (2002) A phrase-based, joint probability model for statistical machine translation. In: EMNLP ’02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Association for Computational Linguistics, Morristown, NJ, USA, pp 133–139
Metropolis N, Ulam S (1949) The Monte Carlo method. J Am Stat Assoc 44(247): 335–341
Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Sapporo, Japan, pp 160–167
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of 40th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp 311–318
Schraudolph NN (1999) Local gain adaptation in stochastic gradient descent. Technical Report IDSIA-09-99, IDSIA
Smith DA, Eisner J (2006) Minimum risk annealing for training log-linear models. In: Proceedings of the COLING/ACL 2006 main conference poster sessions, Sydney, Australia, pp 787–794
Zens R, Hasan S, Ney H (2007) A systematic comparison of training criteria for statistical machine translation. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 524–532
Additional information
This paper extends work presented in Arun et al. (2009).
Cite this article
Arun, A., Haddow, B., Koehn, P. et al. Monte Carlo techniques for phrase-based translation. Machine Translation 24, 103–121 (2010). https://doi.org/10.1007/s10590-010-9080-7
DOI: https://doi.org/10.1007/s10590-010-9080-7