Abstract
Part-of-Speech (PoS) tagging is the task of assigning a morphosyntactic tag to each word in a text according to its context, which requires resolving ambiguity during text processing. It is an essential task in several fields, particularly corpus linguistics and Natural Language Processing (NLP). Several PoS taggers and tools are already available, either as open source or as commercial solutions. A deeper investigation of their performance is therefore required, especially for under-resourced languages such as Arabic. Some well-known probabilistic methods have been adapted for PoS tagging, such as Hidden Markov Models (HMMs), Support Vector Machines (SVMs), and Decision Trees (DTs). Based on these methods, language-independent PoS taggers have been developed, namely TnT, SVMTool, and TreeTagger. This article addresses, on the one hand, the adaptation of these standard PoS taggers to the Arabic language and, on the other hand, a comparative study and evaluation of the adapted taggers. Because Arabic PoS taggers are very sensitive to the tagset used and to the form of the text processed, four different tagsets and two text forms (i.e., Classical and Modern Standard Arabic) have been used.
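To make the idea of context-dependent tagging concrete, the following is a minimal sketch of a first-order (bigram) HMM tagger decoded with the Viterbi algorithm. It is not the implementation of any tagger evaluated in the paper: TnT, for instance, is a trigram tagger with smoothing and unknown-word handling, none of which is reproduced here. The three-tag tagset, the English toy vocabulary, every probability value, and the names `viterbi`, `START`, `TRANS`, `EMIT`, and `UNSEEN` are assumptions made up for illustration.

```python
import math

# Hypothetical toy model: all tags, words, and probabilities are invented.
TAGS = ["DET", "NOUN", "VERB"]

# P(tag at sentence start)
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}

# P(next tag | previous tag)
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}

# P(word | tag); "book" is ambiguous between NOUN and VERB.
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"book": 0.4, "flight": 0.6},
    "VERB": {"book": 0.9},
}

UNSEEN = 1e-6  # crude floor for (tag, word) pairs missing from EMIT


def emit(tag, word):
    """Emission probability with a small floor for unseen pairs."""
    return EMIT[tag].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []

    # scores[i][t]: best log-probability of any tag path ending in t at word i
    scores = [{t: math.log(START[t]) + math.log(emit(t, words[0])) for t in TAGS}]
    backpointers = []

    for word in words[1:]:
        prev = scores[-1]
        current, pointer = {}, {}
        for t in TAGS:
            # Pick the previous tag that maximises the path score into t.
            best_prev = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            current[t] = (
                prev[best_prev]
                + math.log(TRANS[best_prev][t])
                + math.log(emit(t, word))
            )
            pointer[t] = best_prev
        scores.append(current)
        backpointers.append(pointer)

    # Follow the backpointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda t: scores[-1][t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


print(viterbi(["book", "the", "flight"]))  # ['VERB', 'DET', 'NOUN']
print(viterbi(["the", "book"]))            # ['DET', 'NOUN']
```

In this toy model the same word form, "book", receives a different tag depending on its neighbours, which is the kind of ambiguity a PoS tagger has to resolve. A real tagger would estimate these probabilities from an annotated corpus rather than hard-coding them.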
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zeroual, I., Lakhouaja, A. (2019). A Comparative Study of Standard Part-of-Speech Taggers. In: Ezziyyani, M. (eds) Advanced Intelligent Systems for Sustainable Development (AI2SD’2018). AI2SD 2018. Advances in Intelligent Systems and Computing, vol 915. Springer, Cham. https://doi.org/10.1007/978-3-030-11928-7_75
DOI: https://doi.org/10.1007/978-3-030-11928-7_75
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11927-0
Online ISBN: 978-3-030-11928-7
eBook Packages: Intelligent Technologies and Robotics (R0)