Abstract
In this work, we present a new approach to POS tagging, implemented for the Arabic language. In Arabic there are numerous cases where determining the morpho-syntactic state of a word depends on the states of the subsequent words. This observation is the theoretical foundation of the approach: how to take the future elements into account in addition to the past ones. We then show that POS tagging in its statistical form, the HMM, relies mainly on the past elements, and how to combine a direct and a reverse tagger so that the same sequence of words is tagged in both directions. We then propose a hypothesis for selecting the result. In the practical part, we give an overview of the resource used and the changes made to it. We then explain the steps of the experiment and the parameters collected, which are presented in graphs and discussed before reaching the final conclusion.
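The idea of tagging the same sequence in both directions can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the tag names, the add-one smoothing, and the function names (`train`, `viterbi`, `tag_bidirectional`) are all assumptions made for illustration. In particular, the selection rule used here (keep the direction whose Viterbi path has the higher log-probability) is only a placeholder, since the abstract does not state the selection hypothesis.

```python
import math
from collections import defaultdict

# Toy transliterated corpus (hypothetical data, for illustration only).
CORPUS = [
    [("lA", "PART"), ("rajul", "NOUN"), ("qA}m", "ADJ")],
    [("rajul", "NOUN"), ("qA}m", "ADJ")],
]

def train(tagged_sentences):
    """Collect bigram transition and emission counts for an HMM tagger."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    tags, vocab = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab

def log_p(counts, given, event, n_outcomes):
    """Add-one-smoothed log P(event | given); smoothing is an assumption."""
    total = sum(counts[given].values())
    return math.log((counts[given][event] + 1) / (total + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag path and its log-probability."""
    if not words:
        return [], 0.0
    trans, emit, tags, vocab = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # Each column maps tag -> (best log-prob ending in that tag, backpointer).
    cols = [{t: (log_p(trans, "<s>", t, n_tags)
                 + log_p(emit, t, words[0], n_words), None) for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            score, back = max(
                (cols[-1][p][0] + log_p(trans, p, t, n_tags), p) for p in tags)
            col[t] = (score + log_p(emit, t, w, n_words), back)
        cols.append(col)
    last = max(tags, key=lambda t: cols[-1][t][0])
    path = [last]
    for col in reversed(cols[1:]):   # follow backpointers right to left
        path.append(col[path[-1]][1])
    return path[::-1], cols[-1][last][0]

def tag_bidirectional(words, direct_model, reverse_model):
    """Tag left-to-right and right-to-left, then keep one of the two results."""
    d_tags, d_score = viterbi(words, direct_model)
    r_tags, r_score = viterbi(words[::-1], reverse_model)
    r_tags = r_tags[::-1]            # restore the original word order
    # Placeholder selection rule (assumption, not the paper's hypothesis).
    if d_score >= r_score:
        return d_tags, "direct"
    return r_tags, "reverse"

direct_model = train(CORPUS)
# The reverse tagger is trained on the same sentences read right to left.
reverse_model = train([sent[::-1] for sent in CORPUS])
tags, direction = tag_bidirectional(["lA", "rajul", "qA}m"],
                                    direct_model, reverse_model)
```

The reverse tagger is an ordinary HMM trained on reversed sentences, so its "past" context is the original sentence's "future" context. That is what lets the tags of later words influence the tag of an earlier word.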
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig1_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig2_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig3_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig4_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig5_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig6_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig7_HTML.gif)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10772-015-9303-7/MediaObjects/10772_2015_9303_Fig8_HTML.gif)
Notes
For this function of lA, unlike the first one, we can say: لا رجلٌ قائمٌ بل رجلان lA rajulN qA}mN bal rajulAni (It is not one man who is standing but two) (Ibn 'Aqil 2002).
References
Al Shamsi, F., & Guessoum, A. (2006, April). A hidden Markov model-based POS tagger for Arabic. Proceedings of the 8th International Conference on the Statistical Analysis of Textual Data. France, pp. 31–42. Available: http://lexicometrica.univ-paris3.fr/jadt/jadt2006/PDF/004.pdf
Albared, M., Omar, N., & Ab Aziz, M. J. (2009). Classifiers combination to Arabic morphosyntactic disambiguation. International Conference on Electrical Engineering and Informatics, 1, 163–171.
Alkhalil, I. A. (1985). Aljomal fi annahw (Sentences in the Arabic grammar) (1st ed., p. 208). Beirut: Moeassasat Arrisala.
Bar-Haim, R., Sima’an, K., & Winter, Y. (2005, June). Choosing an optimal architecture for segmentation and POS-tagging of Modern Hebrew. Proceedings of the ACL workshop on computational approaches to Semitic languages. Association for Computational Linguistics, pp. 39–46. Available: http://dl.acm.org/ft_gateway.cfm?id=1621796&type=pdf&CFID=451781378&CFTOKEN=21655173
Brill, E. (2000). Part-of-speech tagging. In R. Dale, H. Moisl, & H. Somers (Eds.), Handbook of natural language processing (pp. 403–414). Boca Raton: CRC Press.
Diab, M., Hacioglu, K., & Jurafsky, D. (2007). Automated methods for processing Arabic text: From tokenization to base phrase chunking. Arabic Computational Morphology: Knowledge-based and Empirical Methods. Dordrecht: Kluwer/Springer.
Elhadj, Y. O. (2009). Statistical part-of-speech tagger for traditional Arabic texts. Journal of Computer Science, 5(11), 794–800. Available: http://thescipub.com/PDF/jcssp.2009.794.800.pdf
El-Jihad, A., Yousfi, A., & Si-Lhoussain, A. (2011). Morpho-syntactic tagging system based on the patterns words for Arabic texts. The International Arab Journal of Information Technology, 8(4), 350–354. Available: http://iajit.org/PDF/vol.8,no.4/2-763.pdf
Greene, B. B., & Rubin, G. M. (1971). Automated grammatical tagging of English.
Habash, N., Rambow, O., & Roth, R. (2009, April). MADA + TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Proceedings of the 2nd international conference on Arabic Language Resources and Tools (MEDAR), Cairo (pp. 102–109).
Ibn ‘Aqil. (2002). Charh Ibn Aqile eala alfiat Ben Malik (explanation of Alfiat Ibn Malik). Tahqiq Mohyi Adin AbdAlhamid, almaktaba aleasria, vol. 1, Beirut, Lebanon.
Ibn Hicham, A. (2013). Moghni allabib ean kotobi al aearib (what suffices the thinker from the books of Arabic traditional grammar). Almaktaba alassrya, vol. 1, Sidon, Lebanon, ISBN: 9953-400-37-7, pp. 102–108.
Jurafsky, D., & Martin, J. H. (2000). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall, ISBN-10: 0131873210, pp. 1024.
Kim, J. H. (1993). Korean Part-of-Speech Tagging by Using a Fuzzy net. Proceedings of the 5th national conference on Korean Information Processing.
Klein, S., & Simmons, R. F. (1963). A computational approach to grammatical coding of English words. Journal of the ACM (JACM), 10(3), 334–347.
Kübler, S., & Mohamed, E. (2012). Part of speech tagging for Arabic. Natural Language Engineering, 18(04), 521–548.
Maegaard, B., Choukri, K., Mokbel, C., & Yaseen, M. (2005). Language Technology for Arabic. ISBN 87-90708-15-6, © NEMLAR, Center for Sprogteknologi, University of Copenhagen, Denmark. Available: http://medar.info/The_Nemlar_Project/Publications/Arabic_LT.pdf
Mohamed, E., & Kübler, S. (2010). Arabic Part of Speech Tagging. LREC, Valetta, Malta. Available: http://cl.indiana.edu/~skuebler/papers/arab.pdf
Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., & Roth, R. M. (2014). Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland. Available: http://www.lrec-conf.org/proceedings/lrec2014/pdf/593_Paper.pdf
Raja, F., Tasharofi, S., & Oroumchian, F. (2007). Statistical POS tagging experiments on Persian text. Proceedings of the Second Workshop on Computational Approaches to Arabic Script-based Languages. California, pp. 21–22. Available: http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1005&context=dubaipapers
Valli, A., & Véronis, J. (1999). Etiquetage grammatical des corpus de parole: problèmes et perspectives. Revue française de linguistique appliquée, 4(2), 113–133. Available: http://sites.univ-provence.fr/~veronis/pdf/1999rfla.pdf
Voutilainen, A. (2003). Part-of-speech tagging. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 219–232). New York: Oxford University Press.
Yaseen, M., Attia, M., Maegaard, B., Choukri, K., Paulsson, N., Haamid, S., Krauwer, S., Bendahman, C., Fersøe, H., Rashwan, M., Haddad, B., Mukbel, C., Mouradi, A., Al-Kufaishi, A., Shahin, M., Chenfour, N., & Ragheb, A. (2006). Building annotated written and spoken Arabic LR’s in NEMLAR project. Proceedings of LREC. pp. 533–538. Available: https://uop.edu.jo/download/research/members/202_1544_Yase.pdf
Cite this article
Kadim, A., Lazrek, A. Bidirectional HMM-based Arabic POS tagging. Int J Speech Technol 19, 303–312 (2016). https://doi.org/10.1007/s10772-015-9303-7
DOI: https://doi.org/10.1007/s10772-015-9303-7