Abstract
The origin of Native American peoples (the Indians) has been a topic of research for many years. Although DNA research has begun to make progress, several competing theories have yet to be disproved. One area that may reveal possible relationships between Native Americans and other peoples is the study of their languages. In this paper, we present a clustering analysis of n = 815 commonly used words in eight languages: five Western languages (English, German, French, Spanish, Italian), an ancient language (Latin), an Asian language (Japanese), and the language of a Native American tribe (Ojibwa). We established several similarity measures for the clustering analysis, using both the word spellings and their phonetic equivalents (metaphones), in an attempt to discover any underlying relationships between these languages. The results tend to support one of the leading theories describing how Native American tribes, specifically the Ojibwa people, arrived in North America.
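The kind of analysis the abstract describes can be illustrated with a minimal sketch. The abstract does not state which similarity measures or clustering method were used, so everything here is an assumption: a length-normalized Levenshtein edit distance on spellings, averaged over aligned word lists, followed by plain average-linkage agglomerative clustering. The four-word lists are a toy stand-in for the 815-word data set, Ojibwa is omitted rather than guessing its word forms, and the function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def language_distance(words_a, words_b):
    """Mean edit distance over aligned word pairs, each scaled to [0, 1]."""
    total = 0.0
    for a, b in zip(words_a, words_b):
        total += levenshtein(a, b) / max(len(a), len(b))
    return total / len(words_a)


def average_linkage(names, dist):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [(n,) for n in names]
    merges = []

    def d(c1, c2):
        pairs = [dist[frozenset((x, y))] for x in c1 for y in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: d(*p))
        merges.append((c1, c2, d(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy data (illustrative only): water, night, mother, three.
WORDS = {
    "English": ["water", "night", "mother", "three"],
    "German": ["wasser", "nacht", "mutter", "drei"],
    "French": ["eau", "nuit", "mère", "trois"],
    "Spanish": ["agua", "noche", "madre", "tres"],
    "Italian": ["acqua", "notte", "madre", "tre"],
    "Latin": ["aqua", "nox", "mater", "tres"],
    "Japanese": ["mizu", "yoru", "haha", "san"],  # romanized
}

# Pairwise language distances, keyed by the unordered pair of names.
DIST = {
    frozenset((p, q)): language_distance(WORDS[p], WORDS[q])
    for p, q in combinations(WORDS, 2)
}

merges = average_linkage(list(WORDS), DIST)
for left, right, height in merges:
    print(f"{height:.3f}  {' + '.join(left)}  |  {' + '.join(right)}")
```

A phonetic variant of the same sketch would replace each spelling with its metaphone code before computing distances; that encoding step is not shown here.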
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Daniels, J., Nye, D., Hu, G. (2016). Cluster Analysis for Commonalities Between Words of Different Languages. In: Lee, R. (ed.) Applied Computing & Information Technology. Studies in Computational Intelligence, vol 619. Springer, Cham. https://doi.org/10.1007/978-3-319-26396-0_6
DOI: https://doi.org/10.1007/978-3-319-26396-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26394-6
Online ISBN: 978-3-319-26396-0
eBook Packages: Engineering (R0)