Bidirectional HMM-based Arabic POS tagging

International Journal of Speech Technology

Abstract

In this work, we present a new approach to POS tagging and implement it for the Arabic language. In Arabic, there are numerous cases where determining the morpho-syntactic state of a word depends on the states of the subsequent words. This observation is the theoretical foundation of the approach: considering future elements in addition to past ones. We then show that statistical POS tagging with a hidden Markov model (HMM) relies mainly on past elements, and we show how to combine a direct tagger and a reverse tagger to tag the same sequence of words in both directions. We then propose a hypothesis for selecting the result. In the practical part, we give an overview of the resource used and the changes made to it. We then explain the steps of the experiment and the parameters collected, present them in graphs, and discuss them to reach the final conclusion.
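The idea of tagging one sequence in both directions can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram HMM, the add-one smoothing, the toy English corpus, the function names (`train`, `viterbi`, `tag_bidirectional`) and, above all, the rule that keeps the output with the higher Viterbi score are assumptions made for illustration. The abstract does not state the selection hypothesis used in the article.

```python
import math
from collections import defaultdict

def train(sentences):
    """Count tag-transition and word-emission frequencies from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counts, given, event, n_events):
    """Add-one smoothed log P(event | given); the smoothing is an assumption."""
    row = counts.get(given, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_events))

def viterbi(words, model):
    """Return the most probable tag path for a non-empty word list and its log score."""
    trans, emit, vocab = model
    tags = sorted(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words
    # Each column maps a tag to (best log score ending in that tag, previous tag).
    best = [{t: (log_prob(trans, "<s>", t, n_tags)
                 + log_prob(emit, t, words[0], n_words), None) for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_prob(trans, p, t, n_tags), p)
                              for p in tags)
            col[t] = (score + log_prob(emit, t, w, n_words), prev)
        best.append(col)
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for col in reversed(best[1:]):  # follow backpointers from the last position
        path.append(col[path[-1]][1])
    return path[::-1], best[-1][last][0]

def tag_bidirectional(words, corpus):
    """Tag with a direct and a reverse HMM, then pick one output."""
    direct = train(corpus)
    reverse = train([sent[::-1] for sent in corpus])  # reverse tagger sees reversed text
    fwd_tags, fwd_score = viterbi(words, direct)
    bwd_tags, bwd_score = viterbi(words[::-1], reverse)
    bwd_tags = bwd_tags[::-1]  # restore the original word order
    # Assumed selection rule (not the article's hypothesis): keep the higher score.
    chosen = fwd_tags if fwd_score >= bwd_score else bwd_tags
    return fwd_tags, bwd_tags, chosen

# Hypothetical toy corpus; it is not the resource used in the article.
corpus = [
    [("the", "DET"), ("man", "NOUN"), ("stands", "VERB")],
    [("the", "DET"), ("men", "NOUN"), ("stand", "VERB")],
    [("no", "NEG"), ("man", "NOUN"), ("stands", "VERB")],
]

if __name__ == "__main__":
    print(tag_bidirectional(["the", "man", "stands"], corpus))
```

In this sketch the reverse tagger is simply the same model trained on reversed sentences, so its transition probabilities condition each tag on the tag of the following word in the original order.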

Notes

  1. For this function of lA, we can say: لا رجلٌ قائمٌ بل رجلان lA rajulN qA}mN bal rajulAni (It is not one man who is standing but two), contrary to the first function (Ibn ‘Aqil 2002).

References

  • Al Shamsi, F., & Guessoum, A. (2006, April). A hidden Markov model-based POS tagger for Arabic. Proceedings of the 8th International Conference on the Statistical Analysis of Textual Data. France, pp. 31–42. Available: http://lexicometrica.univ-paris3.fr/jadt/jadt2006/PDF/004.pdf

  • Albared, M., Omar, N., & Ab Aziz, M. J. (2009). Classifiers combination to Arabic morphosyntactic disambiguation. International Conference on Electrical Engineering and Informatics, 1, 163–171.

  • Alkhalil, I. A. (1985). Aljomal fi annahw (Sentences in the Arabic grammar) (1st ed., p. 208). Beirut: Moeassasat Arrisala.

  • Bar-Haim, R., Sima’an, K., & Winter, Y. (2005, June). Choosing an optimal architecture for segmentation and POS-tagging of Modern Hebrew. Proceedings of the ACL workshop on computational approaches to Semitic languages. Association for Computational Linguistics, pp. 39–46. Available: http://dl.acm.org/ft_gateway.cfm?id=1621796&type=pdf&CFID=451781378&CFTOKEN=21655173

  • Brill, E. (2000). Part-of-speech tagging. In R. Dale, H. Moisl, & H. Somers (Eds.), Handbook of natural language processing (pp. 403–414). CRC Press: Boca Raton.

  • Diab, M., Hacioglu, K., & Jurafsky, D. (2007). Automated methods for processing Arabic text: From tokenization to base phrase chunking. Arabic Computational Morphology: Knowledge-based and Empirical Methods. Dordrecht: Kluwer/Springer.

  • Elhadj, Y. O. (2009). Statistical part-of-speech tagger for traditional Arabic texts. Journal of Computer Science, 5(11), 794–800. Available: http://thescipub.com/PDF/jcssp.2009.794.800.pdf

  • El-Jihad, A., Yousfi, A., & Si-Lhoussain, A. (2011). Morpho-syntactic tagging system based on the patterns words for Arabic texts. The International Arab Journal of Information Technology, 8(4), 350–354. Available: http://iajit.org/PDF/vol.8,no.4/2-763.pdf

  • Greene, B. B., & Rubin, G. M. (1971). Automated grammatical tagging of English.

  • Habash, N., Rambow, O., & Roth, R. (2009, April). MADA + TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Proceedings of the 2nd international conference on Arabic Language Resources and Tools (MEDAR), Cairo (pp. 102–109).

  • Ibn ‘Aqil. (2002). Charh Ibn Aqile eala alfiat Ben Malik (explanation of Alfiat Ibn Malik). Tahqiq Mohyi Adin AbdAlhamid, almaktaba aleasria, vol. 1, Beirut, Lebanon.

  • Ibn Hicham, A. (2013). Moghni allabib ean kotobi al aearib (what suffices the thinker from the books of Arabic traditional grammar). Almaktaba alassrya, vol. 1, Sidon, Lebanon, ISBN: 9953-400-37-7, pp. 102–108.

  • Jurafsky, D., & Martin, J. H. (2000). Speech and language processing: An introduction to natural language processing, Computational Linguistics and Natural Language Processing. Prentice Hall, ISBN-10: 0131873210, pp. 1024.

  • Kim, J. H. (1993). Korean Part-of-Speech Tagging by Using a Fuzzy net. Proceedings of the 5th national conference on Korean Information Processing.

  • Klein, S., & Simmons, R. F. (1963). A computational approach to grammatical coding of English words. Journal of the ACM (JACM), 10(3), 334–347.

  • Kübler, S., & Mohamed, E. (2012). Part of speech tagging for Arabic. Natural Language Engineering, 18(04), 521–548.

  • Maegaard, B., Choukri, K., Mokbel, C., & Yaseen, M. (2005). Language Technology for Arabic. ISBN 87-90708-15-6. © NEMLAR, Center for Sprogteknologi, University of Copenhagen, Denmark. Available: http://medar.info/The_Nemlar_Project/Publications/Arabic_LT.pdf

  • Mohamed, E., & Kübler, S. (2010). Arabic Part of Speech Tagging. LREC, Valletta, Malta. Available: http://cl.indiana.edu/~skuebler/papers/arab.pdf

  • Pasha, A., Al-Badrashiny, M., Diab, M., El Kholy, A., Eskander, R., Habash, N., Pooleery, M., Rambow, O., & Roth, R. M. (2014). Madamira: A fast, comprehensive tool for morphological analysis and disambiguation of Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC), Reykjavik, Iceland. Available: http://www.lrec-conf.org/proceedings/lrec2014/pdf/593_Paper.pdf

  • Raja, F., Tasharofi, S., & Oroumchian, F. (2007). Statistical POS tagging experiments on Persian text. Proceedings of the Second Workshop on Computational Approaches to Arabic Script-based Languages. California, pp. 21–22. Available: http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1005&context=dubaipapers

  • Valli, A., & Véronis, J. (1999). Etiquetage grammatical des corpus de parole: problèmes et perspectives. Revue française de linguistique appliquée, 4(2), 113–133. Available: http://sites.univ-provence.fr/~veronis/pdf/1999rfla.pdf

  • Voutilainen, A. (2003). Part-of-speech tagging. In R. Mitkov (Ed.), The Oxford handbook of computational linguistics (pp. 219–232). New York: Oxford University Press.

  • Yaseen, M., Attia, M., Maegaard, B., Choukri, K., Paulsson, N., Haamid, S., Krauwer, S., Bendahman, C., Fersøe, H., Rashwan, M., Haddad, B., Mukbel, C., Mouradi, A., Al-Kufaishi, A., Shahin, M., Chenfour, N., & Ragheb, A. (2006). Building annotated written and spoken Arabic LR’s in NEMLAR project. Proceedings of LREC. pp. 533–538. Available: https://uop.edu.jo/download/research/members/202_1544_Yase.pdf

Author information

Correspondence to Ayoub Kadim.

About this article

Cite this article

Kadim, A., Lazrek, A. Bidirectional HMM-based Arabic POS tagging. Int J Speech Technol 19, 303–312 (2016). https://doi.org/10.1007/s10772-015-9303-7

  • DOI: https://doi.org/10.1007/s10772-015-9303-7
