A review of machine transliteration, translation, evaluation metrics and datasets in Indian Languages

Multimedia Tools and Applications

Abstract

In today’s global scenario, frequent international and domestic interactions necessitate Machine Transliteration and Translation systems to overcome the language barrier. This paper reviews Natural Language Processing (NLP) techniques such as Machine Translation (MT) and Machine Transliteration (MTn), provides an analytical study of evaluation metrics such as the BLEU (BiLingual Evaluation Understudy) score, and discusses the datasets available for MT and MTn systems in Indian languages. The paper is unique in providing a detailed review of every step in the NLP system development pipeline, from the creation and collection of data, through the development of the system, to its evaluation and analysis. It also comments on the validity and viability of various evaluation metrics for Indian languages. The development of MT and MTn systems is an evolving field of computational linguistics and is considered incredibly challenging. The lack of readily available grammatical rules, of a clear distinction between proper and common nouns, and of large datasets, together with greater linguistic complexity than many other languages, makes developing such systems for Indian languages even more complicated. The paper explores statistics-oriented, example-oriented, and neural-network-oriented techniques applied to MT tasks, and provides insight into the work carried out so far for Indian languages. It determines the current status of available datasets and of MT and MTn systems, and assesses the validity of currently available evaluation metrics such as BLEU for Indian languages. Finally, the review discusses the scope for future research in this field and suggests a direction in which further research for Indian languages should ideally be headed.
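As an illustration of the BLEU metric named in the abstract, the following is a minimal sketch of a sentence-level BLEU computation following the general formulation of Papineni et al. (2002): clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It is not taken from the article. The function names `ngrams` and `sentence_bleu`, the whitespace tokenization, the single reference, and the absence of smoothing are all assumptions made for brevity; established toolkits handle tokenization and smoothing more carefully.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference, unsmoothed sentence-level BLEU in [0, 1].

    Assumption: both inputs are plain strings tokenized on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed: any zero precision gives BLEU = 0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which penalizes short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions with uniform weights.
    return bp * math.exp(sum(log_precisions) / max_n)
```

In this sketch a candidate that contains exactly the reference words in a different order can score zero, because no higher-order n-grams match. This sensitivity to word order is one reason the suitability of BLEU is questioned for languages with flexible word order.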

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.

Funding

No funds, grants, or other support were received by the authors for the submitted work.

Author information

Contributions

Conceptualization: Abhinav Jha; Literature search and data analysis: Abhinav Jha; Drafting and revisions: Abhinav Jha; Supervision: Hemprasad Yashwant Patil.

Corresponding author

Correspondence to Hemprasad Yashwant Patil.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jha, A., Patil, H.Y. A review of machine transliteration, translation, evaluation metrics and datasets in Indian Languages. Multimed Tools Appl 82, 23509–23540 (2023). https://doi.org/10.1007/s11042-022-14273-1

  • DOI: https://doi.org/10.1007/s11042-022-14273-1
