Named Entity Recognition in Tamil Language Using Recurrent Based Sequence Model

  • Conference paper
Innovations in Computer Science and Engineering

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 74)

Abstract

Information extraction is a key task in natural language processing that supports knowledge discovery by extracting facts from semi-structured text such as natural language. Named entity recognition is one of the subtasks of information extraction. In this work, we use a recurrent sequence model, Long Short-Term Memory (LSTM), for named entity recognition in the Tamil language, with words represented through distributed word representations. For this work, we created a Tamil named entity recognition corpus by crawling Wikipedia, and we also used the openly available corpus from the FIRE-2018 Information Extractor for Conversational Systems in Indian Languages (IECSIL) shared task.
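The setup described in the abstract (word vectors fed to an LSTM that emits one entity tag per token) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TinyLSTMTagger`, the dimensions, the random untrained weights, and the tag set `O`/`PERSON`/`LOCATION` are all assumptions made for illustration, and the randomly initialised embedding table only stands in for pretrained distributed word representations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMTagger:
    """Hypothetical, untrained LSTM tagger: forward pass only."""

    def __init__(self, vocab, tags, emb_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.tags = list(tags)
        self.hidden_dim = hidden_dim
        # Embedding table: one row per known word, plus a final row for unknown words.
        # In practice these rows would come from pretrained word vectors.
        self.emb = rand_matrix(len(vocab) + 1, emb_dim, rng)
        # One weight matrix and bias per gate, applied to [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(hidden_dim, emb_dim + hidden_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}
        # Linear layer mapping each hidden state to one score per tag.
        self.W_out = rand_matrix(len(self.tags), hidden_dim, rng)

    def step(self, x, h, c):
        z = x + h  # list concatenation of input vector and previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]    # input gate
        f = [sigmoid(v) for v in pre["f"]]    # forget gate
        o = [sigmoid(v) for v in pre["o"]]    # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell values
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

    def tag(self, tokens):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        predicted = []
        for tok in tokens:
            x = self.emb[self.vocab.get(tok, len(self.vocab))]
            h, c = self.step(x, h, c)
            scores = matvec(self.W_out, h)
            predicted.append(self.tags[scores.index(max(scores))])
        return predicted


# Illustrative Tamil tokens ("Chennai", "a", "city"); the tag set is an assumption.
tokens = ["சென்னை", "ஒரு", "நகரம்"]
model = TinyLSTMTagger(vocab=tokens, tags=["O", "PERSON", "LOCATION"])
predicted = model.tag(tokens)
print(list(zip(tokens, predicted)))
```

Because the weights are random, the printed tags carry no meaning; the sketch only shows how each token's vector and the running LSTM state produce one tag per token. A real system would train these parameters on an annotated corpus.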

Author information

Corresponding author

Correspondence to V. Hariharan.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Hariharan, V., Anand Kumar, M., Soman, K.P. (2019). Named Entity Recognition in Tamil Language Using Recurrent Based Sequence Model. In: Saini, H., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. Lecture Notes in Networks and Systems, vol 74. Springer, Singapore. https://doi.org/10.1007/978-981-13-7082-3_12

  • DOI: https://doi.org/10.1007/978-981-13-7082-3_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-7081-6

  • Online ISBN: 978-981-13-7082-3

  • eBook Packages: Engineering, Engineering (R0)
