Ensemble Method for Identification and Automatic Production of Related Words for Historical Linguistics

  • Conference paper
Sustainable Communication Networks and Application

Abstract

Language change across time and space is one of the central concerns of historical linguistics. This paper presents methods for studying language evolution, intended to support researchers in the field. First, a method is described for determining whether two words are cognates. An algorithm based on linguistic information is proposed to extract cognates from online dictionaries. A dataset of related words is then built, and machine learning techniques that rely on spelling are used to classify cognates. Aligned subsequences are used to identify rules of language change in the newer languages, and they serve as features for the classification algorithms that distinguish cognates from non-cognates. Second, the method is extended to a finer level: identifying the type of relationship between the words. Discriminating between cognates and borrowings gives insight into a language's history and allows a clearer understanding of linguistic relationships. The spelling features prove discriminative, and the linguistic factors underlying this classification task are analyzed. To the authors' knowledge, this is the first such effort. Third, a machine learning technique is developed for producing related words automatically. The focus is on proto-word reconstruction, along with two related problems: generating modern word forms and generating cognates. Proto-word reconstruction aims to recreate the words of an ancient language from the corresponding words in its modern daughter languages. The method relies on the regularity of word change and combines information from several modern languages in an ensemble method for proto-word reconstruction. The method is applied to multiple datasets, with the aim of improving on previously reported accuracies.
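The abstract does not say which alignment procedure or feature set the paper uses, so the following is only a minimal sketch of the general idea of deriving spelling features from aligned subsequences. It assumes a standard Needleman-Wunsch global alignment with illustrative scores (match 1, mismatch -1, gap -1) and treats every aligned character bigram that differs between the two words as a feature. The function names `align` and `mismatch_features`, the scoring values, and the example word pair are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Tuple

GAP = "-"  # symbol used for an insertion/deletion in the alignment


def align(a: str, b: str, match: int = 1, mismatch: int = -1,
          gap: int = -1) -> Tuple[str, str]:
    """Global (Needleman-Wunsch) alignment of two spellings.

    The scores are illustrative defaults, not values from the paper.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring substitutions.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def mismatch_features(al_a: str, al_b: str, n: int = 2) -> List[str]:
    """Return aligned character n-grams that differ between the two words."""
    feats = []
    for k in range(len(al_a) - n + 1):
        x, y = al_a[k:k + n], al_b[k:k + n]
        if x != y:
            feats.append(f"{x}>{y}")
    return feats


if __name__ == "__main__":
    # Hypothetical example pair: German "nacht" and English "night".
    al_a, al_b = align("nacht", "night")
    print(al_a, al_b)
    print(mismatch_features(al_a, al_b))
```

Features of this kind could then be counted over a labelled set of word pairs and passed to any standard classifier; which classifier the paper uses is not stated in the abstract.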



Author information

Corresponding author

Correspondence to Jagadish S. Kallimani.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sa**i, G., Kallimani, J.S. (2021). Ensemble Method for Identification and Automatic Production of Related Words for Historical Linguistics. In: Karuppusamy, P., Perikos, I., Shi, F., Nguyen, T.N. (eds) Sustainable Communication Networks and Application. Lecture Notes on Data Engineering and Communications Technologies, vol 55. Springer, Singapore. https://doi.org/10.1007/978-981-15-8677-4_40

  • DOI: https://doi.org/10.1007/978-981-15-8677-4_40

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-8676-7

  • Online ISBN: 978-981-15-8677-4

  • eBook Packages: Engineering, Engineering (R0)
