Abstract
Emotion recognition has a wide range of applications. We focus on E-learning systems because of the benefits of learning anywhere and anytime. Despite important advances, students' emotions still influence the learning process, in traditional classrooms as well as in E-learning. Emotions can limit or block our ability to learn, think, and solve problems; conversely, they can drive us to success by boosting our innate mental abilities when we love what we do and are carried by happiness and excitement. In recent years, many studies have addressed emotion recognition based on different modalities, but the information provided by a single modality, such as the face, is often insufficient, and choosing between two candidate affective states can be ambiguous. To resolve these ambiguities, we propose an intelligent affective tutoring system named the Multimodal Intelligent Tutoring Emotion Recognition System (MITERS), which fuses three modalities simultaneously: face, text, and speech. MITERS operates in real time, detecting students' emotions and providing appropriate feedback. It relies on deep learning techniques: (a) a Deep Convolutional Neural Network (DCNN) detects emotion from the face modality, (b) a Bidirectional Long Short-Term Memory network (BiLSTM) predicts emotion from text, and (c) a Convolutional Neural Network (CNN) detects emotion from speech. Compared experimentally with several well-known approaches, the proposed MITERS performs well, reaching a classification accuracy of 97% on the multimodal MELD dataset.
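The abstract describes merging per-modality classifiers (face, text, speech) into a single emotion decision. The article does not publish its fusion code; the sketch below is a minimal, illustrative example of decision-level fusion by majority voting (a strategy cited in the reference list), where the class ordering, tie-break rule, and function name are our own assumptions, not the authors' implementation.

```python
import numpy as np

# Seven emotion classes, ordered alphabetically here for illustration
# (MELD uses these seven labels, but not necessarily in this order).
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

def fuse_majority(face_probs, text_probs, speech_probs):
    """Decision-level fusion: each modality classifier votes for its
    top class; ties are broken by the summed class probabilities
    (a hypothetical tie-break rule)."""
    probs = np.vstack([face_probs, text_probs, speech_probs])  # shape (3, 7)
    votes = probs.argmax(axis=1)                # one vote per modality
    counts = np.bincount(votes, minlength=len(EMOTIONS))
    tied = np.flatnonzero(counts == counts.max())
    if len(tied) == 1:
        return EMOTIONS[tied[0]]
    # All three modalities disagree: pick the tied class with the
    # highest total probability mass across modalities.
    summed = probs.sum(axis=0)
    return EMOTIONS[tied[np.argmax(summed[tied])]]

# Example: face and text both favour "joy", speech favours "neutral".
label = fuse_majority(
    [0, 0, 0, 0.8, 0.2, 0, 0],
    [0, 0, 0, 0.6, 0.4, 0, 0],
    [0, 0, 0, 0.3, 0.7, 0, 0],
)
print(label)  # -> joy
```

Decision-level fusion keeps the three networks independent, so any one of them can be retrained or replaced without touching the others; feature-level fusion (concatenating embeddings before a joint classifier) is the usual alternative when the modalities are always available together.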
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig13_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig14_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-023-16424-4/MediaObjects/11042_2023_16424_Fig15_HTML.png)
Data availability
All data generated or analyzed during this study are included in this published article.
References
Akputu OK, Inyang UG, Msugh O, Mughal FT, Usoro A (2022) Recognizing facial emotions for educational learning settings. IAES Int J Robot Autom 11(1):21
Litman D, Forbes K (2003) Recognizing emotions from student speech in tutoring dialogues. In 2003 IEEE workshop on automatic speech recognition and understanding (IEEE Cat. No. 03EX721) (pp 25–30), 2003, November, IEEE
Reimers F, Schleicher A, Saavedra J, Tuominen S (2020) Supporting the continuation of teaching and learning during the COVID-19 Pandemic. Oecd 1(1):1–38
Hazarika D, Boruah A, Puzari R (2022) Growth of Edtech market in India: a study on pre-pandemic and ongoing pandemic situation. J Posit School Psychol 6(3):5291–5303
Khediri N, Ammar MB, Kherallah M (2023) Deep-Learning Based Approach to Facial Emotion Recognition Through Convolutional Neural Network. Int J Comput Inf Eng 17(2):132–136
Choi JH, Lee JS (2019) EmbraceNet: A robust deep learning architecture for multimodal classification. Inf Fusion 51:259–270
Cristinacce D, Cootes T (2008) Automatic feature localisation with constrained local models. J Pattern Recognit 41(10):3054–3067
De Carolis B, D’Errico F, Macchiarulo N, Paciello M, Palestra G (2021) Recognizing cognitive emotions in e-learning environment. International Workshop on Higher Education Learning Methodologies and Technologies Online. Springer, Cham, pp 17–27
D’Mello SK, Dowell N, Graesser AC (2011) Does It Really Matter Whether Students’ Contributions Are Spoken versus Typed in an Intelligent Tutoring System with Natural Language? J Exp Psychol Appl 17(1):1–17
Le TH, Tran HN, Nguyen PD, Nguyen HQ, Nguyen TB, Tran TH, Vu H, Tran TT, Le TL (2022) Spatial and temporal hand-raising recognition from classroom videos using locality, relative position-aware non-local networks and hand tracking. Vietnam J Comput Sci 1–29
Filali H, Riffi J, Boulealam C, Mahraz MA, Tairi H (2022) Multimodal Emotional Classification Based on Meaningful Learning. Big Data Cognit Comput 6(3):95
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: ICML, pp 807-814
Ghosal D, Majumder N, Gelbukh A, Mihalcea R, Poria S (2020) COSMIC: COmmonSense knowledge for eMotion Identification in Conversations. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, pp 2470–2481
Veni S, Anand R, Mohan D, Paul E (2021) Feature fusion in multimodal emotion recognition system for enhancement of human-machine interaction. In: IOP conference series: materials science and engineering (vol 1084, no 1, p 012004). IOP Publishing, March, 2021
Cao S, Guo D, Cao L, Li S, Nie J, Singh AK, Lv H (2022) VisDmk: visual analysis of massive emotional danmaku in online videos. Vis Comput, pp.1-18
Hua A, Litman DJ, Forbes-Riley K, Rotaru M, Tetreault J, Purandare A (2006) Using system and user performance features to improve emotion detection in spoken tutoring dialogs. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2:797–800
Poria S, Hazarika D, Majumder N, Naik G, Cambria E, Mihalcea R (2018) MELD: a multimodal multi-party dataset for emotion recognition in conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp 527–536. Association for Computational Linguistics. arXiv preprint arXiv:1810.02508
Khediri N, Ammar MB, Kherallah M (2022) A new deep learning fusion approach for emotion recognition based on face and text. In Computational collective intelligence: 14th International conference, (ICCCI 2022), Hammamet, Tunisia, 28-30 Sept 2022, proceedings (vol 13501, p 75, Springer Nature)
Wang H, Tlili A, Huang R, Cai Z, Li M, Cheng Z, Yang D, Li M, Zhu X, Fei C (2023) Examining the applications of intelligent tutoring systems in real educational contexts: A systematic literature review from the social experiment perspective, Education and Information Technologies, pp.1-36
Lin HCK, Wang CH, Chao CJ, Chien MK (2012) Employing Textual and Facial Emotion Recognition to Design an Affective Tutoring System. Turkish Online J Educ Technol-TOJET 11(4):418–426
Muthamilselvan T, Brindha K, Senthilkumar S, Chatterjee JM, Hu YC (2022) Optimized face-emotion learning using convolutional neural network and binary whale optimization. Multimed Tools Appl 1–24
Alim SA, Rashid NKA (2018) Some commonly used speech feature extraction algorithms. In From natural to artificial intelligence- algorithms and applications. London, United Kingdom: IntechOpen, 2018 [Online]. Available: https://www.intechopen.com/chapters/63970. https://doi.org/10.5772/intechopen.80419
Liu M, Yu D (2022) Towards intelligent E-learning systems. Educ Inf Technol. https://doi.org/10.1007/s10639-022-11479-6
Ben Ammar M, Neji M, Alimi AM, Gouarderes G (2010) The Affective Tutoring System. Expert Systems with Applications 37:3013–3023
Khediri N, Ben Ammar M, Kherallah M (2017) Towards an online Emotional Recognition System for Intelligent Tutoring Environment, The International Arab Conference on Information Technology, ACIT’2017, Yassmine Hammamet, Tunisia, December 22–24
D'Errico F, Paciello M, De Carolis B, Vattanid A, Palestra G, Anzivino G (2018) Cognitive emotions in e-learning processes and their potential relationship with students' academic adjustment. Int J Emotional Educ 10(1):89–111 (Special issue, ISSN 2073-7629, April 2018)
Luna-Jiménez C, Griol D, Callejas Z, Kleinlein R, Montero JM, Fernández-Martínez F (2021) Multimodal emotion recognition on ravdess dataset using transfer learning. Sensors 21(22):7665
Khediri N, Ben Ammar M, Kherallah M (2021) Comparison of image segmentation using different color spaces. In: 2021 IEEE 21st International Conference on Communication Technology (ICCT 2021), Tianjin, China, 13–16 October 2021
Ma W, Adesope OO, Nesbit JC, Liu Q (2014) Intelligent tutoring systems and learning outcomes: A meta-analysis. J Educ Psychol 106(4):901–918
Maatuk AM, Elberkawi EK, Aljawarneh S et al (2022) The COVID-19 pandemic and E-learning: challenges and opportunities from the perspective of students and instructors. J Comput High Educ 34:21–38. https://doi.org/10.1007/s12528-021-09274-2
Lam L, Suen CY (1994) A theoretical analysis of the application of majority voting to pattern recognition. In Proceedings of the 12th IAPR international conference on pattern recognition, vol. 3-conference C: signal processing (Cat. No. 94CH3440-5) (vol 2, pp 418–420), October. IEEE
Mousavinasab E, Zarifsanaiey N, Niakan Kalhori SR, Rakhshan M, Keikha L, Ghazi Saeedi M (2021) Intelligent tutoring systems: a systematic review of characteristics, applications, and evaluation methods. Interact Learn Environ 29(1):142–163
Petrakos M, Benediktsson JA, Kanellopoulos I (2001) The effect of classifier agreement on the accuracy of the combined classifier in decision level fusion. IEEE Trans Geosci Remote Sens 39(11):2539–2546
Tang K, Tie Y, Yang T, Guan L (2014) Multimodal emotion recognition (MER) system. In 2014 IEEE 27th Canadian conference on electrical and computer engineering (CCECE) (pp 1–6). IEEE
Petrovica S, Anohina-Naumeca A, Ekenel HK (2017) Emotion recognition in affective tutoring systems: Collection of ground-truth data. Procedia Comput Sci 104:437–444
Ratnadeep D, Kishori G (2015) Feature Extraction Techniques for Speech Recognition: A Review. Int J Sci Eng Res 6:143–147
Bahreini K, Nadolski R, Westera W (2016) Data fusion for real-time multimodal emotion recognition through webcams and microphones in e-learning. Int J Human-Comput Inter 32(5):415–430
Siddiqui HUR, Zafar K, Saleem AA, Raza MA, Dudley S, Rustam F, Ashraf I (2023) Emotion classification using temporal and spectral features from IR-UWB-based respiration data. Multimed Tools Appl 82(12):18565–18583
Cassano F, Piccinno A, Roselli T, Rossano V (2019) Gamification and learning analytics to improve engagement in university courses. In Methodologies and intelligent systems for technology enhanced learning, 8th international conference 8 pp 156–63. Springer international publishing
Sekkate S, Khalil M, Adib A (2022) A statistical feature extraction for deep speech emotion recognition in a bilingual scenario. Multimed Tools Appl 1–18
Namrata D (2013) Feature extraction methods LPC, PLP and MFCC in speech recognition. Int J Adv Res Eng Technol (ISSN 2320-6802, volume 1)
Siriwardhana S, Kaluarachchi T, Billinghurst M, Nanayakkara S (2020) Multimodal emotion recognition with transformer-based self supervised feature fusion. IEEE Access 8:176274–176285
Nandi A, Xhafa F, Subirats L, Fort S (2020) A survey on multimodal data stream mining for e-learner’s emotion recognition. In: 2020 International conference on omni-layer intelligent systems (COINS). pp 1–6
Xie B, Sidulova M, Park CH (2021) Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion. Sensors 21(14):4913. https://doi.org/10.3390/s21144913
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number “NBU-FFR-2023-0133”.
Ethics declarations
Competing Interests
The authors have no competing interests to declare relevant to this article’s content.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Khediri, N., Ben Ammar, M. & Kherallah, M. A Real-time Multimodal Intelligent Tutoring Emotion Recognition System (MITERS). Multimed Tools Appl 83, 57759–57783 (2024). https://doi.org/10.1007/s11042-023-16424-4