Automatic Arabic Text Summarization Using Analogical Proportions

Elayeb, Bilel; Chouigui, Amina; Bounhas, Myriam; Khiroun, Oussama Ben

doi:10.1007/s12559-020-09748-y

Published: 19 August 2020

Volume 12, pages 1043–1069, (2020)

Cognitive Computation

Bilel Elayeb^1,2,
Amina Chouigui^2,
Myriam Bounhas^1,3 &
…
Oussama Ben Khiroun^2,4

Abstract

Automatic text summarization is the process of generating or extracting a brief representation of an input text. Several extractive summarization algorithms in the literature have been tested on datasets in English and other languages; however, only a few extractive Arabic summarizers exist, owing to the lack of large Arabic collections. This paper proposes and assesses new extractive single-document summarization approaches based on analogical proportions, which are statements of the form “a is to b as c is to d”. The goal is to study the capability of analogical proportions to represent the relationship between documents and their corresponding summaries. For this purpose, we suggest two algorithms that quantify the relevance or irrelevance of a keyword extracted from the input text in order to build its summary. In the first algorithm, the analogical proportion representing this relationship only checks, in a binary way, whether the keyword exists in a document or summary, without considering keyword frequency in the text, whereas the analogical proportion of the second algorithm takes this frequency into account. We have assessed and compared these two algorithms with four language-independent summarizers (LexRank, TextRank, Luhn and LSA (Latent Semantic Analysis)) using our large corpus ANT (Arabic News Texts) and a small test collection, EASC (Essex Arabic Summaries Corpus), by computing the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (BiLingual Evaluation Understudy) metrics. The best results achieved are ROUGE-1 = 0.96 and BLEU-1 = 0.65, obtained on educational documents from the EASC collection; these results outperform the best LexRank algorithm. The proposed algorithms are also compared with three other Arabic extractive summarizers on the EASC collection and show better results, with ROUGE-1 = 0.75 and BLEU-1 = 0.47 for the first algorithm, and ROUGE-1 = 0.74 and BLEU-1 = 0.49 for the second one.
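To make the statement form “a is to b as c is to d” concrete, the following minimal sketch evaluates the standard Boolean analogical proportion and one commonly used graded extension for values in [0, 1]. The function names and the toy inputs are assumptions made for illustration only. This is not the scoring procedure of either proposed algorithm, which the abstract does not spell out; it only shows the difference between a binary (presence/absence) view and a graded (for example, normalized frequency) view.

```python
def boolean_proportion(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Boolean analogical proportion a:b::c:d.

    It holds when a differs from b exactly as c differs from d.
    """
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def graded_proportion(a: float, b: float, c: float, d: float) -> float:
    """Degree in [0, 1] to which a:b::c:d holds, for inputs in [0, 1].

    One commonly used multiple-valued extension (an assumption here,
    not necessarily the variant used in the paper).
    """
    same_direction = (a >= b and c >= d) or (a <= b and c <= d)
    if same_direction:
        # Compare the two signed differences.
        return 1.0 - abs((a - b) - (c - d))
    # The changes go in opposite directions: penalize the larger one.
    return 1.0 - max(abs(a - b), abs(c - d))


# Hypothetical indicators: keyword present in (doc1, summary1, doc2, summary2)?
print(boolean_proportion(True, True, True, True))   # True
print(boolean_proportion(True, False, True, True))  # False

# Hypothetical normalized frequencies in the same four texts.
print(graded_proportion(0.75, 0.25, 0.5, 0.0))      # 1.0
```

On inputs restricted to 0 and 1, the graded version agrees with the Boolean one, which is why it can be read as a frequency-aware generalization of the binary check.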
Experimental results show the value of analogical proportions for text summarization. In particular, the analogical summarizers significantly outperform three of the four language-independent summarizers in terms of BLEU-1 on the ANT collection, and they are not significantly outperformed by any other summarizer on the EASC collection.
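For readers unfamiliar with the two reported metrics, the sketch below computes a unigram ROUGE-1 recall and a BLEU-1 score (clipped unigram precision times the brevity penalty) against a single reference summary. The function names, the whitespace tokenization and the English toy sentences are assumptions for illustration; the abstract does not state the exact ROUGE/BLEU configuration or the Arabic preprocessing used in the experiments.

```python
import math
from collections import Counter


def clipped_overlap(candidate, reference):
    """Count candidate unigrams that also occur in the reference,
    clipping each token at its frequency in the reference."""
    cand, ref = Counter(candidate), Counter(reference)
    return sum(min(count, ref[token]) for token, count in cand.items())


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams covered by the candidate."""
    if not reference:
        return 0.0
    return clipped_overlap(candidate, reference) / len(reference)


def bleu1(candidate, reference):
    """Clipped unigram precision multiplied by the brevity penalty."""
    if not candidate:
        return 0.0
    precision = clipped_overlap(candidate, reference) / len(candidate)
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c).
    if len(candidate) > len(reference):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(reference) / len(candidate))
    return penalty * precision


# Toy example with whitespace tokenization (an assumption).
reference = "the cat sat on the mat".split()
candidate = "the cat the cat".split()
print(rouge1_recall(candidate, reference))  # 0.5
print(bleu1(candidate, reference))
```

ROUGE-1 here is recall-oriented (how much of the reference is recovered), while BLEU-1 is precision-oriented with a length penalty, which is one reason the two scores reported for the same system can differ considerably.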

Notes

http://www.extractor.com
See its language generation sample at: https://talktotransformer.com/
The morpheme is the smallest meaningful unit in a language.
https://antcorpus.github.io/
http://www.alarabiya.net/ar/
http://www.bbc.com/arabic/
https://arabic.cnn.com/
http://www.france24.com/ar/
http://skynewsarabia.com/
RSS feeds allow users to receive updates from online content. They are written as standard XML files.
https://sourceforge.net/projects/easc-corpus/

Acknowledgements

We thank the anonymous reviewers for their constructive comments, which significantly improved the quality of this manuscript during the reviewing process. The authors wish to thank Mr. Shehab Abbas who revised the paper and improved the English.

Author information

Authors and Affiliations

Emirates College of Technology, Abu Dhabi, United Arab Emirates
Bilel Elayeb & Myriam Bounhas
RIADI Research Laboratory, ENSI, Manouba University, Manouba, Tunisia
Bilel Elayeb, Amina Chouigui & Oussama Ben Khiroun
LARODEC Research Laboratory, ISGT, Tunis University, Tunis, Tunisia
Myriam Bounhas
National Engineering School of Sousse, Sousse University, Sousse, Tunisia
Oussama Ben Khiroun

Corresponding author

Correspondence to Bilel Elayeb.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Elayeb, B., Chouigui, A., Bounhas, M. et al. Automatic Arabic Text Summarization Using Analogical Proportions. Cogn Comput 12, 1043–1069 (2020). https://doi.org/10.1007/s12559-020-09748-y

Received: 29 November 2019
Accepted: 10 June 2020
Published: 19 August 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s12559-020-09748-y
