Automatic Arabic Text Summarization Using Analogical Proportions

Cognitive Computation

Abstract

Automatic text summarization is the process of generating or extracting a brief representation of an input text. The literature offers several extractive summarization algorithms tested on datasets in English and other languages; however, only a few extractive Arabic summarizers exist, owing to the lack of large Arabic collections. This paper proposes and assesses new extractive single-document summarization approaches based on analogical proportions, which are statements of the form “a is to b as c is to d”. The goal is to study the capability of analogical proportions to represent the relationship between documents and their corresponding summaries. For this purpose, we suggest two algorithms that quantify the relevance or irrelevance of a keyword extracted from the input text in order to build its summary. In the first algorithm, the analogical proportion representing this relationship only checks, in a binary way, whether the keyword exists in a document or summary, without considering the keyword's frequency in the text, whereas the analogical proportion of the second algorithm takes this frequency into account. We have assessed and compared these two algorithms with four language-independent summarizers (LexRank, TextRank, Luhn and LSA (Latent Semantic Analysis)) using our large corpus ANT (Arabic News Texts) and a small test collection, EASC (Essex Arabic Summaries Corpus), by computing the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (BiLingual Evaluation Understudy) metrics. The best results achieved are ROUGE-1 = 0.96 and BLEU-1 = 0.65, obtained on educational documents from the EASC collection, which outperform the best LexRank algorithm. The proposed algorithms are also compared with three other Arabic extractive summarizers on the EASC collection and achieve better results, with ROUGE-1 = 0.75 and BLEU-1 = 0.47 for the first algorithm, and ROUGE-1 = 0.74 and BLEU-1 = 0.49 for the second one.
Experimental results demonstrate the value of analogical proportions for text summarization. In particular, the analogical summarizers significantly outperform three of the four language-independent summarizers in terms of BLEU-1 on the ANT collection, and they are not significantly outperformed by any other summarizer on the EASC collection.
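
As a rough illustration of the statement “a is to b as c is to d” in its binary form, the following Python sketch implements the standard Boolean definition of an analogical proportion and solves the equation for an unknown fourth term. It is not the paper's summarization algorithm, which is not given in this abstract. The function names and the reading of the four terms as keyword presence in a document or summary are assumptions made only for this example.

```python
from itertools import product
from typing import Optional


def analogical_proportion(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Boolean analogical proportion a : b :: c : d.

    It holds when a differs from b in the same way that c differs from d.
    """
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def valid_patterns() -> list:
    """List every 4-tuple of bits for which the proportion holds."""
    return [
        bits
        for bits in product((0, 1), repeat=4)
        if analogical_proportion(*(bool(x) for x in bits))
    ]


def solve_for_d(a: int, b: int, c: int) -> Optional[int]:
    """Return d such that a : b :: c : d holds, or None if no solution exists.

    The equation is solvable only when a equals b or a equals c.
    """
    if a == b:
        return c
    if a == c:
        return b
    return None


if __name__ == "__main__":
    # Hypothetical reading (an assumption for illustration only):
    # a = keyword present in a known document, b = present in its summary,
    # c = keyword present in a new document, d = should it be in the new summary?
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", solve_for_d(a, b, c))
```

Of the 16 possible Boolean patterns, only six satisfy the proportion, which is what makes it usable as a constraint: knowing three terms either determines the fourth or shows that no analogy applies.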

Notes

  1. http://www.extractor.com

  2. See its language generation sample at: https://talktotransformer.com/

  3. The morpheme is the smallest meaningful unit in a language.

  4. https://antcorpus.github.io/

  5. http://www.alarabiya.net/ar/

  6. http://www.bbc.com/arabic/

  7. https://arabic.cnn.com/

  8. http://www.france24.com/ar/

  9. http://skynewsarabia.com/

  10. RSS feeds allow users to get updates from online content. They are written as standard XML files.

  11. https://sourceforge.net/projects/easc-corpus/

Acknowledgements

We thank the anonymous reviewers for their constructive comments, which significantly improved the quality of this manuscript during the reviewing process. We also thank Mr. Shehab Abbas, who revised the paper and improved the English.

Author information

Corresponding author

Correspondence to Bilel Elayeb.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Elayeb, B., Chouigui, A., Bounhas, M. et al. Automatic Arabic Text Summarization Using Analogical Proportions. Cogn Comput 12, 1043–1069 (2020). https://doi.org/10.1007/s12559-020-09748-y

  • DOI: https://doi.org/10.1007/s12559-020-09748-y
