Abstract
Text summarization is the process of producing a shorter version of a given text. Automatic summarization techniques have been applied to various domains, such as the medical, political, news, and legal domains, showing that adapting domain-relevant features can improve summarization performance. Although much research exists on domain-based summarization in English and other languages, such work is lacking for Arabic because of the shortage of knowledge bases. This paper presents a hybrid, single-document text summarization approach (abbreviated as ASDKGA). The approach combines domain knowledge, statistical features, and genetic algorithms to extract the important points of Arabic political documents. The ASDKGA approach was tested on two corpora: the KALIMAT corpus and the Essex Arabic Summaries Corpus (EASC). The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) framework was used to compare the summaries generated automatically by the ASDKGA approach with summaries generated by humans. The approach was also compared against three other Arabic text summarization approaches. The ASDKGA approach demonstrated promising results when summarizing Arabic political documents, with an average F-measure of 0.605 at a compression ratio of 40%.
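To make the evaluation terms in the abstract concrete, the following is a minimal sketch of a unigram-overlap (ROUGE-1 style) recall, precision, and F-measure, together with a helper that turns a compression ratio into a sentence budget. It is not the authors' implementation and not the official ROUGE package: the function names `rouge_1` and `summary_length`, the whitespace tokenization, and the reading of "compression ratio of 40%" as "keep about 40% of the sentences" are all assumptions made for illustration.

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    """Return (recall, precision, f_measure) from clipped unigram overlap.

    Hypothetical helper: a simplified ROUGE-1, without the stemming or
    stop-word options of the official package.
    """
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


def summary_length(num_sentences, compression_ratio=0.4):
    """Sentence budget for a summary (assumed meaning of the ratio)."""
    return max(1, round(num_sentences * compression_ratio))


if __name__ == "__main__":
    system = "a b c d".split()      # tokens of an automatic summary
    reference = "a b e".split()     # tokens of a human summary
    print(rouge_1(system, reference))
    print(summary_length(10))
```

In this sketch a system summary is scored against a single human reference; averaging the F-measure over all documents in a corpus would give a figure comparable in kind to the one reported above.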
References
Lloret E, Palomar M. Text summarization in progress: a literature review. Artif Intell Rev. 2010;37(1):1–41.
Radev D, Hovy E, McKeown K. Introduction to the special issue on summarization. Comput linguist. 2002;28(4):399–408.
Ježek, K. and Steinberger, J. Automatic text summarization (the state of the Art 2007 and new challenges). In: the conference Znalosti, Bratislava, Slovakia 2008; p 1–12.
Saggion H. Automatic summarization: an overview. Rev Fr Linguist Appl. 2008;13(1):63–81.
Luhn H. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):159–65.
Reeve L, Han H, Brooks A. The use of domain-specific concepts in biomedical text summarization. Inf Process Manag. 2007;43(6):1765–76.
Chen Y, Foong O, Yong S, Kurniawan I. Text summarization for oil and gas drilling topic. Int J Comput Electr Autom Control Inf Eng World Acad Sci Technol. 2008;2(6):1799–802.
Yeh J, Ke H, Yang W, Meng I. Text summarization using a trainable summarizer and latent semantic analysis. Inf Process Manag. 2005;41(1):75–95.
Moens, M., Uyttendaele, C., and Dumortier, J. Abstracting of legal cases: the SALOMON experience. In: the 6th International Conference on Artificial Intelligence and Law (ICAIL97), Melbourne, Australia. 1997; p 114–122.
De Hollander, G. and Marx, M. Summarization of meetings using word clouds. In: the Computer Science and Software Engineering (CSSE) CSI International Symposium, Tehran 2011; p 54–61.
Summers, E. and Stephens, K. Politwitics: summarization of political tweets. 2012. Retrieved Mar. 10, 2015 from the World Wide Web: http://bid.berkeley.edu/cs294-1-spring13/images/3/34/Politwitics_report.pdf.
Chong L, Chen Y. Text summarization for oil and gas news article. Int J Comput Electr Autom Control Inf Eng World Acad Sci Technol. 2009;3(5):1282–5.
Sarkar K. Using domain knowledge for text summarization in medical domain. Int J Recent Trends Eng. 2009;1(1):200–5.
Imam I, Hamouda A, Khalek H. An ontology-based summarization system for Arabic documents (OSSAD). Int J Comput Appl. 2013;74(17):38–43.
Jr S, Pappa C, Freitas A, Kaestner C. Automatic text summarization with genetic algorithm-based attribute selection. Adv Artif Intell–IBERAMIA Springer. 2004:305–14.
Qazvinian V, Hassanabadi L, Halavati R. Summarising text with a genetic algorithm-based sentence extraction. Int J Knowl Manag Stud. 2008;2(4):426–44.
Fattah M, Ren F. Automatic text summarization. Int J Comput Electr Autom Control Inf Eng. 2008;2(1):90–3.
Litvak M, Last M, Friedman M. A new approach to improving multilingual summarization using genetic algorithms. In: The 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden; 2010. p. 927–36.
Nandhini K, Balasundaram S. Use of genetic algorithms for cohesive summary extraction to assist reading difficulties. Appl Comput Intell Soft Comput. 2013;2013:1–11.
Hammo B, Abu-Salem H, Evens M. A hybrid Arabic text summarization technique based on text structure and topic identification. Int J Comput Process Lang. 2011;23(01):39–65.
Al-Omour M. Extractive-based Arabic text summarization approach. M.Sc Thesis: Department of Computer Science, Yarmouk University, Irbid, Jordan; 2012.
Ibrahim A, Elghazaly T, Gheith M. A novel Arabic text summarization model based on rhetorical structure theory and vector space model. Int J Comput Linguist Nat Lang Process. 2013;2(8):480–4.
Douzidia, F. and Lapalme, G. Lakhas, an Arabic summarization system. In: the Document Understanding Conference (DUC), Boston, USA. 2004; p 128–135.
Bawakid, A., and Oussalah, M. A semantic summarization system: the University of Birmingham at TAC 2008. In: the first text analysis conference (TAC), Maryland, USA 2008; p 1–6.
Al-Radaideh Q, Afif M. Arabic text summarization using aggregate similarity. In: The international Arab Conference on Information Technology (ACIT’2009). Yemen; 2009. p. 1–8.
Sobh I. An optimized dual classification system for Arabic extractive generic text summarization. M.Sc Thesis: Department of Computer Engineering, Cairo University, Giza, Egypt; 2009.
Hamodeh, A. and Mousa, M. Automatic system for summarizing Arabic comments on social media networks. Al-Majala Al-Dawlia Lelitesalat, Al-Jameia Al-Arabia Lelhasibat. Special Issue. 2013; p 44–56. (In Arabic).
Al-Taani, A. and Al-Rousan, S. Arabic multi-document text summarization. In: the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2016), Turkey 2016.
Oufaida H, Nouali O, Blache P. Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization. J King Saud Univ-Comput Inf Sci. 2014;26(4):450–61.
Al-Khawaldeh F, Samawi V. Lexical cohesion and entailment-based segmentation for Arabic text summarization (LCEAS). World Comput Sci Inf Technol J (WCSIT). 2015;5(03):51–60.
Tran HN, Cambria E, Hussain A. Towards GPU-based common-sense reasoning: using fast subgraph matching. Cogn Comput. 2016;8(6):1074–86.
Xia Y, Cambria E, Hussain A, Zhao H. Word polarity disambiguation using Bayesian model and opinion-level features. Cogn Comput. 2015;7(3):369–80.
Li Y, Pan Q, Yang T, Wang S, Tang J, Cambria E. Learning word representations for sentiment analysis. Cogn Comput. 2017;9(6):843–51.
Al-Radaideh Q, Gh A-Q. Application of rough set-based feature selection for Arabic sentiment analysis. Cogn Comput. 2017;9(4):346–445.
Recupero D, Presutti V, Consoli S, Gangemi A, Nuzzolese A. Sentilo: frame-based sentiment analysis. Cogn Comput. 2015;7(2):211–25.
Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah A, Gelbukh A, et al. Multilingual sentiment analysis: state-of-the-art and independent comparison of techniques. Cogn Comput. 2016;8:757–71.
Mukhtar N, Khan MA, Chiragh N. Effective use of evaluation measures for the validation of best classifier in Urdu sentiment analysis. Cogn Comput. 2017;9(4):446–56.
Lo SL, Cambria E, Chiong R, Cornforth D. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artif Intell Rev. 2017;48(4):499–527.
Duwairi R, El-Orfali M. A study of the effects of preprocessing strategies on sentiment analysis for Arabic text. J Inf Sci. 2014;40(4):501–13.
El-Khair I. Effects of stop words elimination for Arabic information retrieval: a comparative study. Int J Comput Inf Sci. 2006;4(3):119–33.
Green, S. and Manning, C. Better Arabic parsing: baselines, evaluations, and analysis. In: the 23rd International Conference on Computational Linguistics (COLING), Beijing, China. 2010; p 394–402.
Mustafa S. Word stemming for Arabic information retrieval: the case for simple light stemming. Abhath Al-Yarmouk: Sci Eng Ser. 2012;21(1):123–44.
Singh J, Gupta V. An efficient corpus-based stemmer. Cogn Comput. 2017;9(5):671–88.
Edmundson H. New methods in automatic extracting. J Assoc Comput Mach. 1969;16(2):264–85.
Perumal K, Chaudhuri B. Language independent sentence extraction based text summarization. In: The 9th international conference on natural language processing (ICON), Chennai, India; 2011. p. 213–7.
Kumar Y, Salim N. Automatic multi document summarization approaches. J Comput Sci. 2011;8(1):133–40.
Gupta V, Lehal G. A Survey of text summarization extractive techniques. J Emerg Technol Web Intell. 2010;2(3):258–68.
Miller B, Goldberg D. Genetic algorithms, tournament selection, and the effects of noise. Complex Syst. 1995;9(3):193–212.
El-Haj, M. and Koulali, R. KALIMAT: a multipurpose Arabic corpus. In the Second Workshop on Arabic Corpus Linguistics, Lancaster University, UK. 2011b; p 22–25. http://sourceforge.net/projects/kalimat/.
El-Haj M., Kruschwitz U., and Fox C. Using mechanical Turk to create a corpus of Arabic summaries. In: The 7th international language resources and evaluation conference (LREC), Valletta, Malta. 2010; p 36–39.
Lin, C. ROUGE: a package for automatic evaluation of summaries. In: the ACL Workshop on Text Summarization Branches out, Barcelona, Spain. 2004; p 74–81.
El-Haj M, Kruschwitz U, Fox C. Experimenting with automatic text summarisation for Arabic. Hum Lang Technol Chall Comput Sci Linguist Springer. 2011a:490–9.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki declaration of 1975, as revised in 2008 [15].
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
About this article
Cite this article
Al-Radaideh, Q.A., Bataineh, D.Q. A Hybrid Approach for Arabic Text Summarization Using Domain Knowledge and Genetic Algorithms. Cogn Comput 10, 651–669 (2018). https://doi.org/10.1007/s12559-018-9547-z
DOI: https://doi.org/10.1007/s12559-018-9547-z