
Speech Emotion Recognition Using Machine Learning: A Comparative Analysis

  • Original Research
  • Published in: SN Computer Science

Abstract

Emotions can be identified from a person's speech, and research on recognizing emotions expressed through the voice continues to evolve. This study explores Speech Emotion Recognition (SER) using the SAVEE and IEMOCAP datasets. The SAVEE dataset contains seven emotions, while four of the eleven emotions in the IEMOCAP dataset are considered. The features ZCR, MFCC, F0, and RMS are extracted from the raw audio files, and their means are computed and fed as input for training the models. The study presents a comparative analysis of emotion detection on both datasets, employing RNN, LSTM, Bi-LSTM, RF (Random Forest), Rotation Forest, and Fuzzy models. Among the trained models, the RF and Bi-LSTM models achieve the highest accuracies on the SAVEE dataset, at 76% and 72%, respectively. The Fuzzy and Rotation Forest models are also implemented and could be improved with further optimization techniques. Additionally, a diagnostic User Interface is developed for analyzing audio, loading datasets, extracting features, training models, and classifying human emotions from audio using the trained models.



Data availability

SAVEE dataset: http://kahlan.eps.surrey.ac.uk/savee/, IEMOCAP dataset: https://sail.usc.edu/iemocap/index.html.


Author information

Corresponding author

Correspondence to Nupur Choudhury.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest in this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “SWOT to AI-embraced Communication Systems (SWOT-AI)” guest edited by Somnath Mukhopadhyay, Debashis De, Sunita Sarkar and Celia Shahnaz.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nath, S., Shahi, A.K., Martin, T. et al. Speech Emotion Recognition Using Machine Learning: A Comparative Analysis. SN COMPUT. SCI. 5, 390 (2024). https://doi.org/10.1007/s42979-024-02656-0

