Abstract
Emotions can be identified from a person's speech, and research on expressing emotion through the voice continues to evolve. This study explores Speech Emotion Recognition using the SAVEE and IEMOCAP datasets. The SAVEE dataset comprises seven emotions, while four of the eleven emotions in the IEMOCAP dataset are considered. Zero-crossing rate (ZCR), Mel-frequency cepstral coefficients (MFCC), fundamental frequency (F0), and root-mean-square energy (RMS) are extracted from the raw audio files, and their means are fed as input features for training the models. The study presents a comparative analysis of emotion detection on both datasets, employing RNN, LSTM, Bi-LSTM, Random Forest (RF), Rotation Forest, and fuzzy models. Among the trained models, RF and Bi-LSTM achieve the highest accuracies on the SAVEE dataset, at 76% and 72%, respectively. The fuzzy and Rotation Forest implementations leave room for improvement through further optimization. Additionally, a diagnostic user interface is developed for analyzing audio, loading datasets, extracting features, training models, and classifying human emotions from audio using the trained models.
Data availability
SAVEE dataset: http://kahlan.eps.surrey.ac.uk/savee/, IEMOCAP dataset: https://sail.usc.edu/iemocap/index.html.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest in this work.
This article is part of the topical collection “SWOT to AI-embraced Communication Systems (SWOT-AI)” guest edited by Somnath Mukhopadhyay, Debashis De, Sunita Sarkar and Celia Shahnaz.
About this article
Cite this article
Nath, S., Shahi, A.K., Martin, T. et al. Speech Emotion Recognition Using Machine Learning: A Comparative Analysis. SN COMPUT. SCI. 5, 390 (2024). https://doi.org/10.1007/s42979-024-02656-0