
A Hybrid Approach for Persian Named Entity Recognition

  • Research Paper
  • Iranian Journal of Science and Technology, Transactions A: Science

Abstract

Named entity recognition (NER) is an information extraction subtask that aims to locate named entities in unstructured text and classify them into predefined categories such as the names of people, organizations, and locations. Machine learning approaches, such as the hidden Markov model (HMM), as well as hybrid methods, have recently been widely used for NER. To the best of our knowledge, no publicly available data set exists for machine learning-based Persian NER. Because of the inherent weaknesses of HMMs, in this paper we combine a hidden Markov model with a rule-based method to recognize named entities in Persian texts; combining the rule-based and machine learning methods yields highly accurate recognition. The machine learning component of the proposed system uses an HMM decoded with the Viterbi algorithm, and the rule-based component employs a set of lexical resources and pattern bases to recognize the names of people, locations, and organizations. During this study, we annotated our own training and test data sets for use in the respective phases. On an annotated Persian test corpus of 32,606 tokens, our hybrid approach achieves 89.73% precision, 82.44% recall, and an F-measure of 85.93%.
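
The abstract names an HMM decoded with the Viterbi algorithm but gives no model details. What follows is a minimal sketch of how Viterbi decoding picks the most probable tag sequence. It works in log space so that long sentences do not underflow. It rests on several assumptions that are hypothetical and are not the authors' trained model:

- a BIO tag set;
- romanized Persian tokens;
- hand-set toy probabilities.

```python
import math

# Toy HMM NER tagger. Everything here is HYPOTHETICAL: the BIO tag set,
# the romanized Persian tokens and all probabilities are invented for
# illustration and are not the paper's trained model.
STATES = ["O", "B-PER", "I-PER", "B-LOC"]

# P(first tag)
START = {"O": 0.7, "B-PER": 0.2, "I-PER": 0.0, "B-LOC": 0.1}

# P(next tag | current tag); each row sums to 1
TRANS = {
    "O":     {"O": 0.7, "B-PER": 0.15, "I-PER": 0.0, "B-LOC": 0.15},
    "B-PER": {"O": 0.4, "B-PER": 0.0,  "I-PER": 0.6, "B-LOC": 0.0},
    "I-PER": {"O": 0.8, "B-PER": 0.0,  "I-PER": 0.2, "B-LOC": 0.0},
    "B-LOC": {"O": 0.9, "B-PER": 0.05, "I-PER": 0.0, "B-LOC": 0.05},
}

# P(word | tag) for a few known words; unseen words get a small floor
EMIT = {
    "O":     {"dar": 0.3, "zendegi": 0.2, "mikonad": 0.2},
    "B-PER": {"ali": 0.5},
    "I-PER": {"rezaei": 0.5},
    "B-LOC": {"tehran": 0.6},
}
UNSEEN = 1e-6


def log(p):
    """Log probability, mapping 0 to -inf so impossible paths never win."""
    return math.log(p) if p > 0 else -math.inf


def emit(state, word):
    return EMIT[state].get(word, UNSEEN)


def viterbi(tokens):
    """Return the most probable tag sequence for tokens under the toy HMM."""
    # best[t][s]: log prob of the best path that ends in tag s at position t
    best = [{s: log(START[s]) + log(emit(s, tokens[0])) for s in STATES}]
    back = [{}]  # back[t][s]: predecessor tag on that best path
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: best[t - 1][p] + log(TRANS[p][s]))
            best[t][s] = (best[t - 1][prev] + log(TRANS[prev][s])
                          + log(emit(s, tokens[t])))
            back[t][s] = prev
    # Trace back from the best final tag
    tag = max(STATES, key=lambda s: best[-1][s])
    path = [tag]
    for t in range(len(tokens) - 1, 0, -1):
        tag = back[t][tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    # "ali rezaei dar tehran zendegi mikonad" ~ "Ali Rezaei lives in Tehran"
    sentence = ["ali", "rezaei", "dar", "tehran", "zendegi", "mikonad"]
    print(viterbi(sentence))
    # → ['B-PER', 'I-PER', 'O', 'B-LOC', 'O', 'O']
```

The zero transition from O to I-PER is one way an HMM can encode the BIO constraint that an inside tag must follow a beginning tag. The small UNSEEN floor hints at the weakness a rule-based component can cover: a name absent from the training data gets almost no emission evidence.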


Fig. 1


Notes

  1. Surrounding words are the words that appear around a named entity (usually before it) and help identify it.

  2. http://www.mehrnews.com/.
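
Note 1 describes surrounding words as cues that usually precede a named entity. The sketch below shows one way such cues and a lexicon could be combined with HMM output. The following are hypothetical illustrations, not the paper's published rule base:

- the romanized trigger words;
- the gazetteer entries;
- the merge policy, in which rules only relabel tokens that the HMM tagged O.

```python
# HYPOTHETICAL rule base, romanized Persian. Trigger ("surrounding") words
# map to the entity type of the word that usually follows them.
TRIGGERS = {
    "agha-ye": "PER",      # "Mr."
    "khanom-e": "PER",     # "Ms."
    "shahr-e": "LOC",      # "the city of"
    "daneshgah-e": "ORG",  # "the university of"
}

# HYPOTHETICAL gazetteer standing in for the lexical resources
GAZETTEER = {"tehran": "LOC", "tabriz": "LOC"}


def rule_tags(tokens):
    """Tag tokens using only the gazetteer and trigger-word patterns."""
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER:
            tags[i] = "B-" + GAZETTEER[tok]
        elif i > 0 and tokens[i - 1] in TRIGGERS:
            tags[i] = "B-" + TRIGGERS[tokens[i - 1]]
    return tags


def merge(hmm_tags, rules):
    """Assumed hybrid policy: a rule label replaces an HMM label only where
    the HMM predicted O, so spans the HMM already found stay intact."""
    return [r if h == "O" else h for h, r in zip(hmm_tags, rules)]


if __name__ == "__main__":
    # "agha-ye karimi be tehran raft" ~ "Mr. Karimi went to Tehran"
    tokens = ["agha-ye", "karimi", "be", "tehran", "raft"]
    # Pretend HMM output: it knows "tehran" but misses the unseen name "karimi"
    hmm = ["O", "O", "O", "B-LOC", "O"]
    print(merge(hmm, rule_tags(tokens)))
    # → ['O', 'B-PER', 'O', 'B-LOC', 'O']
```

Letting rules fill only the HMM's misses is one design choice. A rule-priority policy, in which high-precision lexicon hits override the HMM, is an equally plausible alternative that the abstract does not rule out.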


Author information

Correspondence to Hamed Moradi.


About this article


Cite this article

Moradi, H., Ahmadi, F. & Feizi-Derakhshi, MR. A Hybrid Approach for Persian Named Entity Recognition. Iran J Sci Technol Trans Sci 41, 215–222 (2017). https://doi.org/10.1007/s40995-017-0209-x



  • DOI: https://doi.org/10.1007/s40995-017-0209-x
