Corpus and Machine Translation

Dash, Niladri Sekhar; Ramamoorthy, L.

doi:10.1007/978-981-13-1801-6_12

Abstract

History shows that a machine translation (MT) system supported by only a few linguistic rules is not realistic. A few rules cannot capture the wide variety that a natural language exhibits in its diverse use. This leads us to argue for a corpus-based machine translation (CBMT) system that relies on a large amount of linguistic data, information, examples, and rules retrieved from corpora. The first benefit of a CBMT system is the development of algorithms for aligning a bilingual text corpus (BTC), an essential part of an MT system. A BTC yields a new kind of translation support resource that helps in learning through trial, verification, and validation. A CBMT system begins with the analysis of translations produced by humans to understand and define the internal structures of the BTC, completely or partially, in order to design strategies for machine learning. Analysis of the BTC contributes heavily to the development of translation aids, as we do not expect an MT system to ‘produce’ exact translations but to ‘understand’ how translations are actually produced with linguistic and extralinguistic information. The use of a BTC in CBMT is justified on the ground that the data and information acquired from a BTC are richer than those from a monolingual corpus with regard to contextual equivalence between the languages. Thus, a CBMT system earns a unique status by combining features of example-based machine translation (EBMT) and statistics-based machine translation (SBMT), keeping a mutual interface between the two.

Author information

Authors and Affiliations

Linguistic Research Unit, Indian Statistical Institute, Kolkata, West Bengal, India
Niladri Sekhar Dash
Linguistic Data Consortium-Indian Languages, Central Institute of Indian Languages, Mysore, Karnataka, India
L. Ramamoorthy

Corresponding author

Correspondence to Niladri Sekhar Dash.

About this chapter

Cite this chapter

Dash, N.S., Ramamoorthy, L. (2019). Corpus and Machine Translation. In: Utility and Application of Language Corpora. Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_12

DOI: https://doi.org/10.1007/978-981-13-1801-6_12
Published: 14 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1800-9
Online ISBN: 978-981-13-1801-6
eBook Packages: Social Sciences (R0)
