Abstract
Information extraction is a key task in natural language processing that supports knowledge discovery by extracting facts from semi-structured text such as natural language. Named entity recognition is one of the subtasks of information extraction. In this work, we use a recurrent sequence model, Long Short-Term Memory (LSTM), for named entity recognition in the Tamil language, with words represented through distributed word representations. For this work, we created a Tamil named entity recognition corpus by crawling Wikipedia, and we also used the openly available FIRE-2018 Information Extractor for Conversational Systems in Indian Languages (IECSIL) shared task corpus.
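To make the idea of an LSTM tagging a sentence word by word more concrete, the following is a minimal sketch of one forward pass, written in plain Python. It is not the authors' implementation: the tag set, the dimensions, the random (untrained) weights, the random stand-in embeddings, and the example sentence are all assumptions chosen for illustration. In practice the embeddings would come from a trained distributed representation and the weights would be learned from the annotated corpus.

```python
import math
import random

random.seed(0)

TAGS = ["O", "PERSON", "LOCATION"]  # hypothetical tag set, not taken from the paper
EMB_DIM, HID_DIM = 4, 3             # toy sizes chosen only for illustration


def rand_matrix(rows, cols):
    """Small random matrix standing in for trained weights."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# One weight matrix per gate (input, forget, output, candidate),
# each applied to the concatenation [x_t ; h_{t-1}].
GATE_NAMES = ("input", "forget", "output", "candidate")
WEIGHTS = {name: rand_matrix(HID_DIM, EMB_DIM + HID_DIM) for name in GATE_NAMES}
BIASES = {name: [0.0] * HID_DIM for name in GATE_NAMES}
W_OUT = rand_matrix(len(TAGS), HID_DIM)  # projects the hidden state to tag scores


def lstm_step(x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h  # list concatenation of the word vector and the previous hidden state
    pre = {
        name: [a + b for a, b in zip(matvec(WEIGHTS[name], z), BIASES[name])]
        for name in GATE_NAMES
    }
    i_gate = [sigmoid(v) for v in pre["input"]]
    f_gate = [sigmoid(v) for v in pre["forget"]]
    o_gate = [sigmoid(v) for v in pre["output"]]
    cand = [math.tanh(v) for v in pre["candidate"]]
    c_new = [f * c_old + i * g for f, c_old, i, g in zip(f_gate, c, i_gate, cand)]
    h_new = [o * math.tanh(cv) for o, cv in zip(o_gate, c_new)]
    return h_new, c_new


def tag_sentence(tokens, embeddings):
    """Run the LSTM left to right and pick the highest-scoring tag per token."""
    h, c = [0.0] * HID_DIM, [0.0] * HID_DIM
    tags = []
    for tok in tokens:
        h, c = lstm_step(embeddings[tok], h, c)
        scores = matvec(W_OUT, h)
        tags.append(TAGS[scores.index(max(scores))])
    return tags


# Illustrative Tamil sentence; the embeddings are random placeholders.
tokens = ["சென்னை", "தமிழ்நாட்டின்", "தலைநகரம்"]
embeddings = {t: [random.uniform(-1, 1) for _ in range(EMB_DIM)] for t in tokens}
predicted = tag_sentence(tokens, embeddings)
print(list(zip(tokens, predicted)))
```

Because the weights are untrained, the printed tags are arbitrary. The sketch only shows how each word vector updates the hidden state and how one tag is emitted per token.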
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hariharan, V., Anand Kumar, M., Soman, K.P. (2019). Named Entity Recognition in Tamil Language Using Recurrent Based Sequence Model. In: Saini, H., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. Lecture Notes in Networks and Systems, vol 74. Springer, Singapore. https://doi.org/10.1007/978-981-13-7082-3_12
DOI: https://doi.org/10.1007/978-981-13-7082-3_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7081-6
Online ISBN: 978-981-13-7082-3
eBook Packages: Engineering, Engineering (R0)