Deep Learning Based Emotion Recognition from Chinese Speech

  • Conference paper

Inclusive Smart Cities and Digital Health (ICOST 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9677)

Abstract

Emotion recognition is challenging and important for understanding people and enhancing human-computer interaction experiences. In this paper, we explore deep belief networks (DBN) to classify six emotional states: anger, fear, joy, neutral, sadness, and surprise, using different feature fusions. Several kinds of speech features, such as Mel-frequency cepstral coefficients (MFCC), pitch, and formants, were extracted and combined in different ways to examine the relationship between feature combinations and emotion recognition performance. We adjusted different DBN parameters to achieve the best performance for each emotion. Both gender-dependent and gender-independent experiments were conducted on the Chinese Academy of Sciences emotional speech database. The highest accuracy was 94.6%, achieved using multi-feature fusion. The experimental results show that the DBN-based approach has good potential for practical emotion recognition, and that suitable multi-feature fusion improves the performance of speech emotion recognition.
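The multi-feature fusion idea described in the abstract can be sketched as follows. This is a hypothetical, simplified illustration, not the authors' pipeline: short-time energy and zero-crossing rate stand in for the MFCC, pitch, and formant features actually used in the paper, and the per-frame descriptors are simply concatenated into one fused vector per frame, which would then be fed to a classifier such as a DBN.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def fused_features(signal, frame_len=400, hop=160):
    """Concatenate per-frame descriptors into one fused vector per frame."""
    feats = []
    for frame in frame_signal(signal, frame_len, hop):
        feats.append([short_time_energy(frame), zero_crossing_rate(frame)])
    return feats

# Synthetic 1-second, 16 kHz tone standing in for a speech utterance.
sr = 16000
signal = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
features = fused_features(signal)
print(len(features), len(features[0]))  # → 98 2 (frames, fused dimension)
```

Adding more descriptors per frame (e.g., MFCCs and pitch estimates) only widens the fused vector, which is how different feature combinations can be compared against recognition performance.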


Notes

  1. http://www.datatang.com/data/39277.


Acknowledgement

This research was supported by the International S&T Cooperation Program of China (ISTCP, 2013DFA10980).

Author information

Corresponding author

Correspondence to Weishan Zhang.



Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, W., Zhao, D., Chen, X., Zhang, Y. (2016). Deep Learning Based Emotion Recognition from Chinese Speech. In: Chang, C., Chiari, L., Cao, Y., Jin, H., Mokhtari, M., Aloulou, H. (eds) Inclusive Smart Cities and Digital Health. ICOST 2016. Lecture Notes in Computer Science, vol 9677. Springer, Cham. https://doi.org/10.1007/978-3-319-39601-9_5

  • DOI: https://doi.org/10.1007/978-3-319-39601-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39600-2

  • Online ISBN: 978-3-319-39601-9

  • eBook Packages: Computer Science (R0)
