Abstract
History shows that a machine translation (MT) system supported by only a few linguistic rules is not realistic. A few rules are not sufficient to capture the wide variety that a natural language exhibits in its diverse use. This leads us to argue for a corpus-based machine translation (CBMT) system that relies on a large amount of linguistic data, information, examples, and rules retrieved from corpora. The first benefit of a CBMT system is the development of algorithms for aligning a bilingual text corpus (BTC), an essential part of an MT system. A BTC generates a new kind of translation support resource that helps in learning through trial, verification, and validation. A CBMT system begins with an analysis of translations produced by humans in order to understand and define the internal structures of a BTC, completely or partially, and to design strategies for machine learning. Analysis of a BTC contributes heavily to the development of translation aids, as we do not expect an MT system to ‘produce’ exact translations but to ‘understand’ how translations are actually produced with linguistic and extralinguistic information. The use of a BTC in CBMT is justified on the ground that the data and information acquired from a BTC are richer than those from a monolingual corpus with regard to contextual equivalence between the languages. Thus, a CBMT system earns a unique status by combining features of example-based machine translation (EBMT) and statistics-based machine translation (SBMT), keeping a mutual interface between the two.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Dash, N.S., Ramamoorthy, L. (2019). Corpus and Machine Translation. In: Utility and Application of Language Corpora. Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_12
DOI: https://doi.org/10.1007/978-981-13-1801-6_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1800-9
Online ISBN: 978-981-13-1801-6
eBook Packages: Social Sciences, Social Sciences (R0)