Ensemble Method for Identification and Automatic Production of Related Words for Historical Linguistics

Sa**i, G.; Kallimani, Jagadish S.

doi:10.1007/978-981-15-8677-4_40

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 55)

Abstract

Language change across time and space is one of the central concerns of historical linguistics. This paper presents methods that support researchers and domain experts in studying language evolution. First, a method is introduced for determining whether two words are cognates. An algorithm based on linguistic information is proposed to extract cognates from online dictionaries. A dataset of related words is then built, and machine learning techniques that operate on spelling are applied to identify cognates. Aligned subsequences serve as features for the classification algorithms, both to infer rules of language change for words entering new languages and to distinguish cognates from non-cognates. Second, the method is extended to identify the type of relationship between words. Discriminating between cognates and borrowings gives insight into a language’s history and allows a clearer understanding of linguistic relatedness. Spelling features are shown to be discriminative, and the linguistic factors underlying this classification task are analyzed. To the authors’ knowledge, this is the first such effort. Third, a machine learning technique is developed for automatically producing related words. The focus is on proto-word reconstruction, together with two related sub-problems: producing modern word forms and producing cognates. Proto-word reconstruction consists of recreating the words of an ancient language from its modern daughter languages. The method relies on the regularity of word change and combines knowledge from several modern languages into an ensemble method for proto-word reconstruction. Applied to multiple datasets, the method improves on previously reported accuracies.
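The abstract does not say which alignment procedure or feature scheme is used, so the following is only a minimal sketch of how aligned subsequences could be turned into classifier features. It assumes a Needleman-Wunsch global alignment with unit scores and character bigram windows; the function names `align` and `aligned_ngrams`, the scoring values, and the `xy>zw` feature format are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol; not taken from the chapter


def align(a: str, b: str, match: int = 1, mismatch: int = -1,
          gap: int = -1) -> Tuple[str, str]:
    """Globally align two spellings (Needleman-Wunsch, illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_ngrams(a: str, b: str, n: int = 2) -> List[str]:
    """Return 'xy>zw' features for every alignment window that differs."""
    al_a, al_b = align(a, b)
    feats = []
    for k in range(len(al_a) - n + 1):
        wa, wb = al_a[k:k + n], al_b[k:k + n]
        if wa != wb:  # identical windows carry no information about change
            feats.append(f"{wa}>{wb}")
    return feats


if __name__ == "__main__":
    # Toy strings, not data from the chapter.
    print(align("abc", "ac"))
    print(aligned_ngrams("abc", "ac"))
```

Features of this kind could then be fed, for example as a bag of strings, to any standard classifier to separate cognate pairs from non-cognate pairs; which classifier the chapter actually uses is not stated in the abstract.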

Author information

Authors and Affiliations

Department of Computer Science and Engineering, M S Ramaiah Institute of Technology, Bangalore, India and affiliated to Visvesvaraya Technological University, Belagavi, Karnataka, India
G. Sa**i & Jagadish S. Kallimani

Corresponding author

Correspondence to Jagadish S. Kallimani.

Editor information

Editors and Affiliations

Department of EEE, Shree Venkateshwara Hi-Tech Engineering, Erode, Tamil Nadu, India
P. Karuppusamy
Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
Isidoros Perikos
College of Information and Engineering, Wenzhou Medical University, Wenzhou, China
Fuqian Shi
Department of Computer Science, Purdue University Fort Wayne, Fort Wayne, IN, USA
Tu N. Nguyen

About this paper

Cite this paper

Sa**i, G., Kallimani, J.S. (2021). Ensemble Method for Identification and Automatic Production of Related Words for Historical Linguistics. In: Karuppusamy, P., Perikos, I., Shi, F., Nguyen, T.N. (eds) Sustainable Communication Networks and Application. Lecture Notes on Data Engineering and Communications Technologies, vol 55. Springer, Singapore. https://doi.org/10.1007/978-981-15-8677-4_40

DOI: https://doi.org/10.1007/978-981-15-8677-4_40
Published: 26 January 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8676-7
Online ISBN: 978-981-15-8677-4
eBook Packages: Engineering, Engineering (R0)
