Chunker for Gujarati Language Using Hybrid Approach

  • Conference paper
Rising Threats in Expert Applications and Solutions

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1187))


Abstract

Chunking is challenging for free-word-order languages because their phrase structures are relatively unrestricted, and a robust chunker benefits other NLP applications. This paper presents a hybrid chunker for the Gujarati language. Contextual information, in the form of the last two Unicode characters of each word and the part-of-speech (POS) tags, serves as the key feature set for developing the chunker with a machine learning approach. Four statistical techniques, namely SVM, CRF, Naïve Bayes, and HMM, were implemented to identify the most appropriate one for chunking Gujarati text. Linguistic rules were then designed to improve performance further. The final system achieves an accuracy of 98.21%, with precision, recall, and F1 score of 96.42%, 95.62%, and 96.02%, respectively.
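The feature description in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `token_features`, the feature keys, the one-token context window, the `<PAD>` marker, and the example sentence with its POS tags are all assumptions made for illustration, since the abstract does not give the actual feature templates.

```python
from typing import Dict, List, Tuple

# Hypothetical example: a short Gujarati sentence as (word, POS) pairs.
# The POS tags are illustrative labels, not taken from the paper.
sentence: List[Tuple[str, str]] = [
    ("હું", "PRP"),
    ("ઘરે", "NN"),
    ("જાઉં", "VM"),
    ("છું", "VAUX"),
]

PAD = "<PAD>"  # assumed marker for positions outside the sentence


def token_features(
    sent: List[Tuple[str, str]], i: int, window: int = 1
) -> Dict[str, str]:
    """Build a feature dict for token i from its own and its neighbours'
    last two Unicode code points and POS tags (window size is an assumption)."""
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(sent):
            word, pos = sent[j]
            # Python strings are indexed by code point, so word[-2:] is
            # the last two Unicode code points (or the whole word if shorter).
            features[f"suffix2[{offset}]"] = word[-2:]
            features[f"pos[{offset}]"] = pos
        else:
            features[f"suffix2[{offset}]"] = PAD
            features[f"pos[{offset}]"] = PAD
    return features


if __name__ == "__main__":
    for idx in range(len(sentence)):
        print(sentence[idx][0], token_features(sentence, idx))
```

Feature dictionaries of this shape could be passed to a classifier or sequence labeller. Note that slicing by code point can split a Gujarati consonant from its dependent vowel sign; whether the paper treats such combinations as one unit is not stated in the abstract.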



Author information


Corresponding author

Correspondence to Chetana Tailor.


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Tailor, C., Patel, B. (2021). Chunker for Gujarati Language Using Hybrid Approach. In: Rathore, V.S., Dey, N., Piuri, V., Babo, R., Polkowski, Z., Tavares, J.M.R.S. (eds) Rising Threats in Expert Applications and Solutions. Advances in Intelligent Systems and Computing, vol 1187. Springer, Singapore. https://doi.org/10.1007/978-981-15-6014-9_10
