A Light Arabic POS Tagger Using a Hybrid Approach

  • Conference paper
Digital Technologies and Applications (ICDTA 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 211)

Abstract

Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. It is a useful preprocessing step in many natural language processing (NLP) applications. In this paper, we present a new Arabic POS tagger that combines two main modules: a first-order Markov model and a decision tree model. Together, these modules improve on existing POS taggers by making it possible to tag unknown words. The tagger uses an elementary tag set of four tags {noun, verb, particle, punctuation}, which is sufficient for some NLP applications and greatly helps increase accuracy. The POS tagger was trained on the NEMLAR corpus. The experimental results demonstrate its efficiency, with an overall accuracy of 98% for the full system.
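
The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a bigram (first-order Markov) tagger decoded with Viterbi, and it substitutes a small hand-written affix rule tree for the learned decision tree that handles unknown words. The training sentences are invented toy examples in Buckwalter transliteration, not NEMLAR data, and the names `train`, `guess_unknown`, and `viterbi` are hypothetical.

```python
import math
from collections import Counter, defaultdict

TAGS = ["noun", "verb", "particle", "punctuation"]

# Toy, hand-made training data in Buckwalter transliteration (assumption:
# invented for illustration only, NOT taken from the NEMLAR corpus).
TRAIN = [
    [("ktb", "verb"), ("Alwld", "noun"), ("fy", "particle"),
     ("Almdrsp", "noun"), (".", "punctuation")],
    [("yktb", "verb"), ("Alwld", "noun"), ("Aldrs", "noun"),
     (".", "punctuation")],
    [("xrj", "verb"), ("Alwld", "noun"), ("mn", "particle"),
     ("Albyt", "noun"), (".", "punctuation")],
]

def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in sentences:
        prev = "START"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def guess_unknown(word):
    """Hand-written affix rules standing in for a learned decision tree."""
    if not any(ch.isalnum() for ch in word):
        return "punctuation"
    if word.startswith("Al"):              # definite article prefix
        return "noun"
    if len(word) >= 4 and word[0] in "ytn":  # common imperfective prefixes
        return "verb"
    if len(word) <= 2:
        return "particle"
    return "noun"

def viterbi(words, trans, emit):
    """Most likely tag sequence under the bigram model."""
    if not words:
        return []
    vocab = {w for counts in emit.values() for w in counts}

    def log_trans(prev, tag):
        counts = trans[prev]
        # Add-one smoothing over the four tags.
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(TAGS)))

    def log_emit(tag, word):
        if word in vocab:
            total = sum(emit[tag].values()) or 1
            p = emit[tag][word] / total
        else:
            # Unknown word: trust the affix rules instead of the counts.
            p = 1.0 if guess_unknown(word) == tag else 0.0
        return math.log(max(p, 1e-6))  # floor avoids log(0)

    best = {t: (log_trans("START", t) + log_emit(t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        step = {}
        for t in TAGS:
            score, path = max(
                ((best[p][0] + log_trans(p, t), best[p][1]) for p in TAGS),
                key=lambda item: item[0],
            )
            step[t] = (score + log_emit(t, word), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

trans, emit = train(TRAIN)
# "Alrjl" never appears in TRAIN, so its tag comes from the affix rules.
print(viterbi("yktb Alrjl fy Albyt .".split(), trans, emit))
# → ['verb', 'noun', 'particle', 'noun', 'punctuation']
```

In this sketch the Markov model supplies context through the transition counts, while the affix rules only take over for words absent from the training vocabulary, which mirrors the division of labour the abstract describes.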

Notes

  1. https://www.internetworldstats.com

  2. https://www.nemlar.org

  3. Transliterated using Buckwalter [6].

  4. Affixes comprise prefixes, infixes, and suffixes: substrings that occur at the beginning, in the middle, and at the end of a word, respectively.

  5. Corpus available at https://github.com/kdarwish/Farasa.

  6. http://arabic.emi.ac.ma/safar/

  7. http://arabic.emi.ac.ma:8080/SafarWeb/

References

  1. Al Shamsi FG (2006) A hidden Markov model-based POS tagger for Arabic. In: Proceedings of the 8th international conference on the statistical analysis of textual data, France, pp 31–42

  2. Albared MO (2009) Arabic part of speech disambiguation, pp 517–532

  3. Attia MM (2005) Specifications of the Arabic Written Corpus produced within the NEMLAR project

  4. Atwell ES (2008) Development of tag sets for part-of-speech tagging

  5. Atwell MS (2013) A standard tag set expounding traditional morphological features for Arabic language part-of-speech tagging. Edinburgh University Press

  6. Buckwalter Arabic Transliteration (n.d.) https://www.qamus.org/transliteration.htm. Accessed 20 Oct 2020

  7. Darwish K, Mubarak H, Abdelali A, Eldesouki M (2017) Arabic POS tagging: don’t abandon feature engineering just yet. In: Proceedings of the third Arabic natural language processing workshop, pp 130–137. https://doi.org/10.18653/v1/W17-1316

  8. Diab M, Hacioglu K, Jurafsky D (2004) Automatic tagging of Arabic text: from raw text to base phrase chunks. In: Proceedings of HLT-NAACL 2004: short papers. Association for Computational Linguistics, pp 149–152

  9. Dinçer BT, Karaoğlan B (2004) The effect of part-of-speech tagging on IR performance for Turkish. In: Aykanat C, Dayar T, Körpeoğlu İ (eds) Computer and Information Sciences—ISCIS 2004, Springer, pp 771–778. https://doi.org/10.1007/978-3-540-30182-0_77

  10. Habash NF (2009) Syntactic annotation in Columbia Arabic Treebank. In: 2nd International Conference on Arabic Language Resources & Tools MEDAR. Cairo

  11. Hammo B, Abu-Salem H, Lytinen SL, Evens M (2002) QARAB: a question answering system to support the Arabic language. In: Proceedings of the ACL-02 workshop on computational approaches to Semitic languages, July 2002

  12. Imad Zeroual AL (2017) Towards a standard Part of Speech tagset for the Arabic language. J King Saud Univ Comput Inf Sci 171–178

  13. Albared M, T-M O-S-A (2005) Probabilistic Arabic part of speech tagger with unknown words handling. J Theor Appl Inf Technol

  14. Maamouri MA (2004) Developing an Arabic treebank: methods, guidelines, procedures, and tools. In: Proceedings of the 20th international conference on computational linguistics

  15. Rabiner L, Juang B (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4–16

  16. Salameh S (2018) A review of part of speech tagger for Arabic Language. International Journal of Computation and Applied Sciences IJOCAAS, 4, 4–5, June 2018. Darwish K, Mubarak H (n.d.). Farasa: A New Fast and Accurate Arabic Word Segmenter. 5

  17. Jaafar Y, Bouzoubaa K (2015) Arabic natural language processing from software engineering to complex pipelines. In: CICLing 2015, Cairo, Egypt, April 2015

  18. Zouaghi A, Merhbene L, Zrigui M (2012) Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38:257–269. https://doi.org/10.1007/s10462-011-9249-3

Corresponding author

Correspondence to Khalid Tnaji.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Tnaji, K., Bouzoubaa, K., Aouragh, S.L. (2021). A Light Arabic POS Tagger Using a Hybrid Approach. In: Motahhir, S., Bossoufi, B. (eds) Digital Technologies and Applications. ICDTA 2021. Lecture Notes in Networks and Systems, vol 211. Springer, Cham. https://doi.org/10.1007/978-3-030-73882-2_19
