Chunker for Gujarati Language Using Hybrid Approach

Tailor, Chetana; Patel, Bankim

doi:10.1007/978-981-15-6014-9_10

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1187)

Abstract

Chunking is particularly challenging for free word order languages because their phrase structures are relatively unrestricted. A robust chunker benefits other NLP applications. This paper presents a hybrid chunker for the Gujarati language. Contextual information, in the form of the last two Unicode characters of the word and the part-of-speech (POS) tag, is used as the key feature set in developing the chunker with a machine learning approach. Four statistical techniques, namely SVM, CRF, Naïve Bayes, and HMM, have been implemented to identify the most appropriate technique for chunking Gujarati text. To further improve performance, linguistic rules have been designed. The final accuracy achieved is 98.21%, with precision, recall, and F1 score of 96.42%, 95.62%, and 96.02%, respectively.
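The feature set described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name token_features, the one-token context window, the "<PAD>" marker, the feature key names, and the sample POS tags are all assumptions made for illustration, and "Unicode characters" is read here as Unicode code points.

```python
from typing import Dict, List, Tuple

# A sentence is a list of (word, POS tag) pairs.
Sentence = List[Tuple[str, str]]


def token_features(sentence: Sentence, i: int, window: int = 1) -> Dict[str, str]:
    """Build features for token i from its own and its neighbours'
    last two Unicode code points and POS tags.

    The window size is an assumption; the abstract only says that
    contextual information is used.
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(sentence):
            word, pos = sentence[j]
            # Python strings are sequences of code points, so this slice
            # takes the last two code points (or the whole word if shorter).
            feats[f"suffix2[{offset}]"] = word[-2:]
            feats[f"pos[{offset}]"] = pos
        else:
            # Positions outside the sentence get a padding marker.
            feats[f"suffix2[{offset}]"] = "<PAD>"
            feats[f"pos[{offset}]"] = "<PAD>"
    return feats


# Illustrative sentence; the POS tags are hypothetical examples.
sentence: Sentence = [("રામ", "NNP"), ("ઘરમાં", "NN"), ("છે", "VAUX")]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Each resulting dictionary could then be passed to a statistical sequence labeller, such as the SVM, CRF, Naïve Bayes, or HMM taggers the abstract compares, with linguistic rules applied afterwards to correct its output.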

Author information

Authors and Affiliations

Shrimad Rajachandra Institute of Management and Computer Applications, Uka Tarsadia University, Bardoli, Surat, India
Chetana Tailor & Bankim Patel

Corresponding author

Correspondence to Chetana Tailor.

Editor information

Editors and Affiliations

IIS Deemed to be University, Jaipur, Rajasthan, India
Vijay Singh Rathore
Techno India College of Engineering, Kolkata, West Bengal, India
Nilanjan Dey
Department of Computer Science, University of Milan, Milano, Italy
Vincenzo Piuri
Porto Accounting and Business School, Polytechnic Institute of Porto, Porto, Portugal
Rosalina Babo
Jan Wyzykowski University, Polkowice, Poland
Zdzislaw Polkowski
Faculty of Engineering, University of Porto, Porto, Portugal
João Manuel R. S. Tavares

About this paper

Cite this paper

Tailor, C., Patel, B. (2021). Chunker for Gujarati Language Using Hybrid Approach. In: Rathore, V.S., Dey, N., Piuri, V., Babo, R., Polkowski, Z., Tavares, J.M.R.S. (eds) Rising Threats in Expert Applications and Solutions. Advances in Intelligent Systems and Computing, vol 1187. Springer, Singapore. https://doi.org/10.1007/978-981-15-6014-9_10

DOI: https://doi.org/10.1007/978-981-15-6014-9_10
Published: 02 October 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-6013-2
Online ISBN: 978-981-15-6014-9
eBook Packages: Intelligent Technologies and Robotics (R0)