A Comparative Study of Standard Part-of-Speech Taggers

  • Conference paper
Advanced Intelligent Systems for Sustainable Development (AI2SD’2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 915)

Abstract

Part-of-Speech (PoS) tagging resolves ambiguity during text processing by assigning a morphosyntactic tag to each word according to its context. It is an essential task in several fields, particularly corpus linguistics and Natural Language Processing (NLP). Several PoS taggers and tools are already available, either as open-source software or as commercial solutions, and their performance deserves deeper investigation, especially for under-resourced languages such as Arabic. Some well-known statistical methods have been adapted for PoS tagging, such as Hidden Markov Models (HMMs), Support Vector Machines (SVMs), and Decision Trees (DTs). Based on these methods, the language-independent PoS taggers TnT, SVMTool, and TreeTagger, respectively, have been developed. This article addresses two complementary topics: on the one hand, the adaptation of these standard PoS taggers to the Arabic language; on the other hand, a comprehensive comparative study and evaluation of their performance. Because Arabic PoS taggers are highly sensitive to the size of the tagset used and to the form of the text processed, four different tagsets and two text forms (i.e., Classical Arabic and Modern Standard Arabic) have been used.
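To illustrate the HMM approach behind taggers such as TnT, the following sketch trains a first-order (bigram) HMM by counting tag transitions and word emissions, then decodes a sentence with the Viterbi algorithm. It is a simplified, hypothetical example rather than the authors' setup: TnT itself uses a second-order (trigram) model with linear-interpolation smoothing and suffix analysis for unknown words, whereas this sketch assumes add-one smoothing, a tiny invented English corpus with a three-tag tagset, and illustrative names (`train_hmm`, `viterbi`, `accuracy`). The `accuracy` helper computes token-level tagging accuracy, one common measure in comparative evaluations; the abstract does not state which metric the paper reports.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train_hmm(tagged_sents):
    """Count tag bigrams and (tag, word) emissions from lists of (word, tag) pairs."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one (Laplace) smoothed log-probability of key (assumed smoothing scheme)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for a non-empty list of words."""
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words
    # best[i][t] = (log-score, backpointer) of the best path ending in tag t at position i
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_prob(trans[p], t, n_tags), p)
                              for p in tags)
            column[t] = (score + log_prob(emit[t], words[i], n_words), prev)
        best.append(column)
    # Follow backpointers from the highest-scoring final tag
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def accuracy(gold_sents, model):
    """Token-level accuracy of the tagger against gold (word, tag) sentences."""
    correct = total = 0
    for sent in gold_sents:
        predicted = viterbi([w for w, _ in sent], *model)
        correct += sum(p == g for p, (_, g) in zip(predicted, sent))
        total += len(sent)
    return correct / total


# Hypothetical toy training corpus
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]
model = train_hmm(train)
print(viterbi(["a", "dog", "sleeps"], *model))  # ['DET', 'NOUN', 'VERB']
print(accuracy(train, model))                   # 1.0
```

In a real comparison, the same `accuracy` loop would be run on held-out text for each tagger, tagset, and text form, rather than on the training data as in this toy usage.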

Author information

Correspondence to Imad Zeroual.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zeroual, I., Lakhouaja, A. (2019). A Comparative Study of Standard Part-of-Speech Taggers. In: Ezziyyani, M. (ed.) Advanced Intelligent Systems for Sustainable Development (AI2SD’2018). AI2SD 2018. Advances in Intelligent Systems and Computing, vol. 915. Springer, Cham. https://doi.org/10.1007/978-3-030-11928-7_75
