Abstract
Emotion recognition has a wide range of applications. We focus on E-learning systems because of the benefits of learning anywhere and anytime. Despite important advances, students' emotions still influence the learning process, in traditional classrooms as well as in E-learning. Emotions can limit or block our ability to learn, think, and solve problems; conversely, they can drive us to success by boosting our innate mental abilities when we love what we do and are carried by happiness and excitement. In recent years, many studies have addressed emotion recognition based on different modalities, but the information provided by a single modality, such as the face, is often insufficient, and choosing between two candidate affective states can be ambiguous. To resolve these ambiguities, we propose an intelligent affective tutoring system named the Multimodal Intelligent Tutoring Emotion Recognition System (MITERS), which fuses three modalities simultaneously: face, text, and speech. MITERS operates in real time, detecting students' emotions and providing appropriate feedback. It relies on deep learning techniques: (a) a Deep Convolutional Neural Network (DCNN) detects emotion from the face modality, (b) a Bidirectional Long Short-Term Memory network (BiLSTM) predicts emotion from text, and (c) a Convolutional Neural Network (CNN) detects emotion from speech. Compared experimentally with several well-known approaches, the proposed MITERS performs well, reaching a classification accuracy of 97% on the multimodal MELD dataset.
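The abstract describes merging per-modality classifiers (face, text, speech) into a single emotion decision. The article does not publish its fusion code; the sketch below is a minimal, illustrative example of decision-level fusion by majority voting (a strategy cited in the reference list), where the class ordering, tie-break rule, and function name are our own assumptions, not the authors' implementation.

```python
import numpy as np

# Seven emotion classes, ordered alphabetically here for illustration
# (MELD uses these seven labels, but not necessarily in this order).
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

def fuse_majority(face_probs, text_probs, speech_probs):
    """Decision-level fusion: each modality classifier votes for its
    top class; ties are broken by the summed class probabilities
    (a hypothetical tie-break rule)."""
    probs = np.vstack([face_probs, text_probs, speech_probs])  # shape (3, 7)
    votes = probs.argmax(axis=1)                # one vote per modality
    counts = np.bincount(votes, minlength=len(EMOTIONS))
    tied = np.flatnonzero(counts == counts.max())
    if len(tied) == 1:
        return EMOTIONS[tied[0]]
    # All three modalities disagree: pick the tied class with the
    # highest total probability mass across modalities.
    summed = probs.sum(axis=0)
    return EMOTIONS[tied[np.argmax(summed[tied])]]

# Example: face and text both favour "joy", speech favours "neutral".
label = fuse_majority(
    [0, 0, 0, 0.8, 0.2, 0, 0],
    [0, 0, 0, 0.6, 0.4, 0, 0],
    [0, 0, 0, 0.3, 0.7, 0, 0],
)
print(label)  # -> joy
```

Decision-level fusion keeps the three networks independent, so any one of them can be retrained or replaced without touching the others; feature-level fusion (concatenating embeddings before a joint classifier) is the usual alternative when the modalities are always available together.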
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig13_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig14_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig15_HTML.png)
Data availability
All data generated or analyzed during this study are included in this published article.
References
Akputu OK, Inyang UG, Msugh O, Mughal FT, Usoro A (2022) Recognizing facial emotions for educational learning settings. IAES Int J Robot Autom 11(1):21
Litman D, Forbes K (2003) Recognizing emotions from student speech in tutoring dialogues. In 2003 IEEE workshop on automatic speech recognition and understanding (IEEE Cat. No. 03EX721) (pp 25–30), 2003, November, IEEE
Reimers F, Schleicher A, Saavedra J, Tuominen S (2020) Supporting the continuation of teaching and learning during the COVID-19 Pandemic. Oecd 1(1):1–38
Hazarika D, Boruah A, Puzari R (2022) Growth of Edtech market in India: a study on pre-pandemic and ongoing pandemic situation. J Posit School Psychol 6(3):5291–5303
Khediri N, Ammar MB, Kherallah M (2023) Deep-Learning Based Approach to Facial Emotion Recognition Through Convolutional Neural Network. Int J Comput Inf Eng 17(2):132–136
Choi JH, Lee JS (2019) EmbraceNet: A robust deep learning architecture for multimodal classification. Inf Fusion 51:259–270
Cristinacce D, Cootes T (2008) Automatic feature localisation with constrained local models. J Pattern Recognit 41(10):3054–3067
De Carolis B, D’Errico F, Macchiarulo N, Paciello M, Palestra G (2021) Recognizing cognitive emotions in e-learning environment. International Workshop on Higher Education Learning Methodologies and Technologies Online. Springer, Cham, pp 17–27
D’Mello SK, Dowell N, Graesser AC (2011) Does It Really Matter Whether Students’ Contributions Are Spoken versus Typed in an Intelligent Tutoring System with Natural Language? J Exp Psychol Appl 17(1):1–17
Le TH, Tran HN, Nguyen PD, Nguyen HQ, Nguyen TB, Tran TH, Vu H, Tran TT, Le TL (2022) Spatial and temporal hand-raising recognition from classroom videos using locality, relative position-aware non-local networks and hand tracking. Vietnam J Comput Sci 1–29
Filali H, Riffi J, Boulealam C, Mahraz MA, Tairi H (2022) Multimodal Emotional Classification Based on Meaningful Learning. Big Data Cognit Comput 6(3):95
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: ICML, pp 807-814
Ghosal D, Majumder N, Gelbukh A, Mihalcea R, Poria S (2020) COSMIC: COmmonSense knowledge for eMotion Identification in Conversations. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, pp 2470–2481
Veni S, Anand R, Mohan D, Paul E (2021) Feature fusion in multimodal emotion recognition system for enhancement of human-machine interaction. In: IOP conference series: materials science and engineering (vol 1084, no 1, p 012004). IOP Publishing, March, 2021
Cao S, Guo D, Cao L, Li S, Nie J, Singh AK, Lv H (2022) VisDmk: visual analysis of massive emotional danmaku in online videos. Vis Comput, pp.1-18
Hua A, Litman DJ, Forbes-Riley K, Rotaru M, Tetreault J, Purandare A (2006) Using system and user performance features to improve emotion detection in spoken tutoring dialogs. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2:797–800
Poria S, Hazarika D, Majumder N, Naik G, Cambria E, Mihalcea R (2018) MELD: a multimodal multi-party dataset for emotion recognition in conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp 527–536. Association for Computational Linguistics. arXiv preprint arXiv:1810.02508
Khediri N, Ammar MB, Kherallah M (2022) A new deep learning fusion approach for emotion recognition based on face and text. In Computational collective intelligence: 14th International conference, (ICCCI 2022), Hammamet, Tunisia, 28-30 Sept 2022, proceedings (vol 13501, p 75, Springer Nature)
Wang H, Tlili A, Huang R, Cai Z, Li M, Cheng Z, Yang D, Li M, Zhu X, Fei C (2023) Examining the applications of intelligent tutoring systems in real educational contexts: A systematic literature review from the social experiment perspective, Education and Information Technologies, pp.1-36
Lin HCK, Wang CH, Chao CJ, Chien MK (2012) Employing Textual and Facial Emotion Recognition to Design an Affective Tutoring System. Turkish Online J Educ Technol-TOJET 11(4):418–426
Muthamilselvan T, Brindha K, Senthilkumar S, Chatterjee JM, Hu YC (2022) Optimized face-emotion learning using convolutional neural network and binary whale optimization. Multimed Tools Appl 1–24
Alim SA, Rashid NKA (2018) Some commonly used speech feature extraction algorithms. In From natural to artificial intelligence- algorithms and applications. London, United Kingdom: IntechOpen, 2018 [Online]. Available: https://www.intechopen.com/chapters/63970. https://doi.org/10.5772/intechopen.80419
Liu M, Yu D (2022) Towards intelligent E-learning systems. Educ Inf Technol. https://doi.org/10.1007/s10639-022-11479-6
Ben Ammar M, Neji M, Alimi AM, Gouarderes G (2010) The Affective Tutoring System. Expert Systems with Applications 37:3013–3023
Khediri N, Ben Ammar M, Kherallah M (2017) Towards an online Emotional Recognition System for Intelligent Tutoring Environment, The International Arab Conference on Information Technology, ACIT’2017, Yassmine Hammamet, Tunisia, December 22–24
D'Errico F, Paciello M, De Carolis B, Vattanid A, Palestra G, Anzivino G (2018) Cognitive emotions in e-learning processes and their potential relationship with students' academic adjustment. Int J Emotional Educ 10(1):89–111 (Special issue, ISSN 2073-7629, April 2018)
Luna-Jiménez C, Griol D, Callejas Z, Kleinlein R, Montero JM, Fernández-Martínez F (2021) Multimodal emotion recognition on ravdess dataset using transfer learning. Sensors 21(22):7665
Khediri N, Ben Ammar M, Kherallah M (2021) Comparison of image segmentation using different color spaces. In: 2021 IEEE 21st International Conference on Communication Technology (ICCT 2021), Tianjin, China, 13–16 October 2021
Ma W, Adesope OO, Nesbit JC, Liu Q (2014) Intelligent tutoring systems and learning outcomes: A meta-analysis. J Educ Psychol 106(4):901–918
Maatuk AM, Elberkawi EK, Aljawarneh S et al (2022) The COVID-19 pandemic and E-learning: challenges and opportunities from the perspective of students and instructors. J Comput High Educ 34:21–38. https://doi.org/10.1007/s12528-021-09274-2
Lam L, Suen CY (1994) A theoretical analysis of the application of majority voting to pattern recognition. In Proceedings of the 12th IAPR international conference on pattern recognition, vol. 3-conference C: signal processing (Cat. No. 94CH3440-5) (vol 2, pp 418–420), October. IEEE
Mousavinasab E, Zarifsanaiey N, Niakan Kalhori SR, Rakhshan M, Keikha L, Ghazi Saeedi M (2021) Intelligent tutoring systems: a systematic review of characteristics, applications, and evaluation methods. Interact Learn Environ 29(1):142–163
Petrakos M, Benediktsson JA, Kanellopoulos I (2001) The effect of classifier agreement on the accuracy of the combined classifier in decision level fusion. IEEE Trans Geosci Remote Sens 39(11):2539–2546
Tang K, Tie Y, Yang T, Guan L (2014) Multimodal emotion recognition (MER) system. In 2014 IEEE 27th Canadian conference on electrical and computer engineering (CCECE) (pp 1–6). IEEE
Petrovica S, Anohina-Naumeca A, Ekenel HK (2017) Emotion recognition in affective tutoring systems: Collection of ground-truth data. Procedia Comput Sci 104:437–444
Ratnadeep D, Kishori G (2015) Feature Extraction Techniques for Speech Recognition: A Review. Int J Sci Eng Res 6:143–147
Bahreini K, Nadolski R, Westera W (2016) Data fusion for real-time multimodal emotion recognition through webcams and microphones in e-learning. Int J Human-Comput Inter 32(5):415–430
Siddiqui HUR, Zafar K, Saleem AA, Raza MA, Dudley S, Rustam F, Ashraf I (2023) Emotion classification using temporal and spectral features from IR-UWB-based respiration data. Multimed Tools Appl 82(12):18565–18583
Cassano F, Piccinno A, Roselli T, Rossano V (2019) Gamification and learning analytics to improve engagement in university courses. In Methodologies and intelligent systems for technology enhanced learning, 8th international conference 8 pp 156–63. Springer international publishing
Sekkate S, Khalil M, Adib A (2022) A statistical feature extraction for deep speech emotion recognition in a bilingual scenario. Multimed Tools Appl 1–18
Namrata D (2013) Feature extraction methods LPC, PLP and MFCC in speech recognition. Int J Adv Res Eng Technol (ISSN 2320-6802, volume 1)
Siriwardhana S, Kaluarachchi T, Billinghurst M, Nanayakkara S (2020) Multimodal emotion recognition with transformer-based self supervised feature fusion. IEEE Access 8:176274–176285
Nandi A, Xhafa F, Subirats L, Fort S (2020) A survey on multimodal data stream mining for e-learner’s emotion recognition. In: 2020 International conference on omni-layer intelligent systems (COINS). pp 1–6
Xie B, Sidulova M, Park CH (2021) Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion. Sensors 21(14):4913. https://doi.org/10.3390/s21144913
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number “NBU-FFR-2023-0133”.
Ethics declarations
Competing Interests
The authors have no competing interests to declare relevant to this article’s content.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Khediri, N., Ben Ammar, M. & Kherallah, M. A Real-time Multimodal Intelligent Tutoring Emotion Recognition System (MITERS). Multimed Tools Appl 83, 57759–57783 (2024). https://doi.org/10.1007/s11042-023-16424-4