Abstract
Emotions can be identified from a person's speech, and research on expressing emotion through the voice continues to evolve. This study explores Speech Emotion Recognition using the SAVEE and IEMOCAP datasets. The SAVEE dataset comprises seven emotions, while four of the eleven emotions in the IEMOCAP dataset are considered. Zero-crossing rate (ZCR), Mel-frequency cepstral coefficients (MFCC), fundamental frequency (F0), and root-mean-square energy (RMS) are extracted from the raw audio files, and their means are fed as input features for training the models. The study presents a comparative analysis of emotion detection on both datasets, employing RNN, LSTM, Bi-LSTM, Random Forest (RF), Rotation Forest, and fuzzy models. Among the trained models, RF and Bi-LSTM achieve the highest accuracies on the SAVEE dataset, at 76% and 72%, respectively. The fuzzy and Rotation Forest implementations leave room for improvement through further optimization. Additionally, a diagnostic user interface is developed for analyzing audio, loading datasets, extracting features, training models, and classifying human emotions from audio using the trained models.
Data availability
SAVEE dataset: http://kahlan.eps.surrey.ac.uk/savee/, IEMOCAP dataset: https://sail.usc.edu/iemocap/index.html.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest in this work.
This article is part of the topical collection “SWOT to AI-embraced Communication Systems (SWOT-AI)” guest edited by Somnath Mukhopadhyay, Debashis De, Sunita Sarkar and Celia Shahnaz.
About this article
Cite this article
Nath, S., Shahi, A.K., Martin, T. et al. Speech Emotion Recognition Using Machine Learning: A Comparative Analysis. SN COMPUT. SCI. 5, 390 (2024). https://doi.org/10.1007/s42979-024-02656-0