Abstract
Sentiment analysis is a common and challenging task in natural language processing (NLP). It is widely studied because it helps capture public opinion about a topic, product, or service. Much of this research targets English, while research on Arabic lags behind that on high-resource languages. Recently, models such as bidirectional encoder representations from transformers (BERT) and generative pre-trained transformer (GPT) have been widely used in NLP and have significantly improved performance on many tasks, especially sentiment analysis. However, Arabic was not a priority in their development. Several Arabic-focused models, such as ARBERT and MARBERT, have recently begun to bring these technologies to Arabic. We used multiple datasets for training and testing: ASAD (A Twitter-based Benchmark Arabic Sentiment Analysis Dataset), ArSarcasm-v2, and SemEval-2017. We propose an ensemble learning approach that combines a multilingual model (XLM-T) and a monolingual model (MARBERT) to handle intricacies of the Arabic language that are difficult to address with a single model. The approach also addresses the problem of imbalanced data by combining focal loss with label smoothing. The experiments showed that our ensemble learning approach outperforms state-of-the-art models on all of the datasets used.
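The two ingredients named in the abstract, an ensemble of two classifiers and a loss that combines focal loss with label smoothing, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the loss form (a focal weighting applied to each term of a label-smoothed cross-entropy), the combination rule (a weighted average of class probabilities), the hyperparameter defaults, and the logit values, which are invented stand-ins for the outputs of an XLM-T-style and a MARBERT-style model.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(target, num_classes, epsilon):
    """Soften a one-hot target by spreading epsilon uniformly over all classes."""
    off = epsilon / num_classes
    return [off + (1.0 - epsilon if c == target else 0.0) for c in range(num_classes)]


def focal_loss_smoothed(logits, target, gamma=2.0, epsilon=0.1):
    """Focal loss against a label-smoothed target distribution.

    Assumed form (not confirmed by the source):
        -sum_c q_c * (1 - p_c)^gamma * log(p_c)
    where q is the smoothed target and p the predicted distribution.
    """
    probs = softmax(logits)
    soft = smooth_labels(target, len(logits), epsilon)
    return -sum(q * (1.0 - p) ** gamma * math.log(p) for q, p in zip(soft, probs))


def ensemble_probs(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two models' class probabilities (soft voting)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


# Hypothetical logits for one tweet over (negative, neutral, positive).
multilingual_logits = [0.2, 1.1, 0.9]  # stand-in for an XLM-T-style model
monolingual_logits = [0.1, 0.4, 2.0]   # stand-in for a MARBERT-style model

combined = ensemble_probs(softmax(multilingual_logits), softmax(monolingual_logits))
predicted_class = max(range(len(combined)), key=combined.__getitem__)
```

With these made-up logits, the multilingual stand-in alone favors the neutral class, while the averaged distribution favors the positive class, which is the kind of disagreement an ensemble is meant to resolve. Setting `gamma=0` and `epsilon=0` reduces `focal_loss_smoothed` to ordinary cross-entropy, so the two parameters can be read as how strongly easy examples are down-weighted and how much the target is softened.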
Acknowledgements
This research is supported by the Vector Scholarship in Artificial Intelligence, provided through the Vector Institute.
About this article
Cite this article
Mohamed, O., Kassem, A.M., Ashraf, A. et al. An ensemble transformer-based model for Arabic sentiment analysis. Soc. Netw. Anal. Min. 13, 11 (2023). https://doi.org/10.1007/s13278-022-01009-0
DOI: https://doi.org/10.1007/s13278-022-01009-0