Abstract
Explicit length modelling has previously been explored with success in statistical pattern recognition. This paper presents two length models, together with two parameter estimation methods, for statistical machine translation (SMT). More precisely, we incorporate explicit length modelling into a state-of-the-art log-linear SMT system as an additional feature function in order to assess the contribution of length information. Promising experimental results are reported on a reference SMT task.
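To illustrate what adding a length model "as an additional feature function" to a log-linear system can look like, here is a minimal sketch. It is not the authors' model: the Gaussian distribution over target length given source length, its parameters (`ratio`, `sigma`), the feature names (`tm`, `lm`, `length`), and all scores and weights are assumptions chosen only for illustration.

```python
import math


def length_log_prob(src_len, tgt_len, ratio=1.0, sigma=2.0):
    """Hypothetical length model: log N(tgt_len; ratio * src_len, sigma^2)."""
    mean = ratio * src_len
    norm = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return norm - (tgt_len - mean) ** 2 / (2 * sigma ** 2)


def log_linear_score(features, weights):
    """Log-linear score: weighted sum of feature function values."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(source, hypotheses, weights):
    """Return the hypothesis text with the highest log-linear score."""
    src_len = len(source.split())

    def score(hyp):
        text, base_features = hyp
        features = dict(base_features)
        # The length model enters as one more feature function.
        features["length"] = length_log_prob(src_len, len(text.split()))
        return log_linear_score(features, weights)

    return max(hypotheses, key=score)[0]


# Toy data (invented): each hypothesis carries made-up translation-model
# ("tm") and language-model ("lm") log-scores.
source = "la casa verde es muy grande"
hypotheses = [
    ("the green house", {"tm": -2.0, "lm": -3.0}),
    ("the green house is very big", {"tm": -2.5, "lm": -3.2}),
]

weights_without = {"tm": 1.0, "lm": 1.0, "length": 0.0}
weights_with = {"tm": 1.0, "lm": 1.0, "length": 1.0}

print(best_hypothesis(source, hypotheses, weights_without))
print(best_hypothesis(source, hypotheses, weights_with))
```

In this toy setting the length feature penalises the too-short hypothesis, so giving it a non-zero weight changes which candidate is selected. In a real system the feature weights would be tuned on held-out data rather than set by hand.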
Work supported by the EC (FEDER/FSE) and the Spanish MEC/MICINN under the MIPRCV “Consolider Ingenio 2010” program (CSD2007-00018) and iTrans2 (TIN2009-14511) projects. Also supported by the Spanish MITyC under the erudito.com (TSI-020110-2009-439) project and by the Generalitat Valenciana under grant Prometeo/2009/014 and GV/2010/067, and by the “Vicerrectorado de Investigación de la UPV” under grant 20091027.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Silvestre-Cerdà, J.A., Andrés-Ferrer, J., Civera, J. (2011). Explicit Length Modelling for Statistical Machine Translation. In: Vitrià, J., Sanches, J.M., Hernández, M. (eds) Pattern Recognition and Image Analysis. IbPRIA 2011. Lecture Notes in Computer Science, vol 6669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21257-4_34
DOI: https://doi.org/10.1007/978-3-642-21257-4_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21256-7
Online ISBN: 978-3-642-21257-4
eBook Packages: Computer Science (R0)