Abstract
Language change across time and space is one of the central concerns of historical linguistics. This paper presents methods for studying language evolution, intended to support researchers and domain experts. Firstly, a method is introduced to determine whether two words are cognates. An algorithm that uses linguistic information is proposed to extract cognates from online dictionaries. A dataset of related words is then built, and machine learning techniques based on spelling are used to classify cognates. Aligned subsequences serve as features for the classification algorithms, both to infer rules of language change as words enter new languages and to distinguish cognates from non-cognates. Secondly, the method is extended to identify the type of relationship between those words. Distinguishing cognates from borrowings gives insight into a language's history and allows a clearer understanding of linguistic relatedness. Spelling features are shown to be discriminative, and the linguistic factors underlying this classification task are analyzed. To the authors' knowledge, this is the first such effort. Thirdly, a machine learning technique is developed for producing related words automatically. The focus is on proto-word reconstruction, and two related problems are also addressed: generating modern word forms and generating cognates. Proto-word reconstruction consists of recreating the words of an ancient language from its modern daughter languages. The method relies on the regularity of word change and combines knowledge from several modern languages into an ensemble method for proto-word reconstruction. It is applied to multiple datasets and improves on previously reported accuracies.
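The abstract does not give the details of the alignment or of the feature extraction, so the following is only a minimal sketch of how aligned subsequences around spelling mismatches could be turned into classifier features. It assumes a unit-cost Needleman-Wunsch alignment and character bigrams; the function names `align` and `mismatch_features`, the gap symbol, and the `left>right` feature format are hypothetical choices made for illustration, not the authors' implementation.

```python
def align(a: str, b: str, gap: str = "-") -> tuple[str, str]:
    """Globally align two spellings (Needleman-Wunsch with unit costs; an assumption)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal edit cost of aligning a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out_a.append(a[i - 1])
            out_b.append(gap)
            i -= 1
        else:
            out_a.append(gap)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def mismatch_features(a: str, b: str, n: int = 2) -> list[str]:
    """Return aligned n-gram pairs that contain at least one mismatch (hypothetical format)."""
    x, y = align(a, b)
    feats = []
    for k in range(len(x) - n + 1):
        gx, gy = x[k:k + n], y[k:k + n]
        if gx != gy:
            feats.append(f"{gx}>{gy}")
    return feats


# Illustrative word pair chosen for this sketch only.
print(mismatch_features("nocte", "notte"))  # ['oc>ot', 'ct>tt']
```

Feature strings of this kind could be counted per word pair and passed to any standard classifier. The bigram width and the unit costs are arbitrary here and would need tuning against real data.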
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sa**i, G., Kallimani, J.S. (2021). Ensemble Method for Identification and Automatic Production of Related Words for Historical Linguistics. In: Karuppusamy, P., Perikos, I., Shi, F., Nguyen, T.N. (eds) Sustainable Communication Networks and Application. Lecture Notes on Data Engineering and Communications Technologies, vol 55. Springer, Singapore. https://doi.org/10.1007/978-981-15-8677-4_40
DOI: https://doi.org/10.1007/978-981-15-8677-4_40
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8676-7
Online ISBN: 978-981-15-8677-4
eBook Packages: Engineering, Engineering (R0)