Abstract
Emotion recognition is challenging yet essential for understanding people and enhancing human-computer interaction experiences. In this paper, we explore deep belief networks (DBNs) to classify six emotional states: anger, fear, joy, neutral, sadness, and surprise, using different feature fusions. Several kinds of speech features, such as Mel-frequency cepstral coefficients (MFCC), pitch, and formants, were extracted and combined in different ways to examine the relationship between feature combinations and emotion recognition performance. We adjusted the DBN parameters to achieve the best performance for each emotion. Both gender-dependent and gender-independent experiments were conducted on the Chinese Academy of Sciences emotional speech database. The highest accuracy, 94.6%, was achieved using multi-feature fusion. The experimental results show that the DBN-based approach has good potential for practical emotion recognition, and that suitable multi-feature fusion improves speech emotion recognition performance.
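The pipeline described above (extract several speech feature groups, concatenate them into a fused vector, then classify with a DBN) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature extractors are hypothetical stand-ins generating synthetic MFCC/pitch/formant vectors, and a single RBM-pretrained layer feeding a logistic classifier (scikit-learn's `BernoulliRBM`) is used as a one-layer approximation of a DBN.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

EMOTIONS = ["anger", "fear", "joy", "neutral", "sadness", "surprise"]

rng = np.random.default_rng(0)

# Hypothetical per-utterance feature extractors; in practice these would be
# computed from the speech signal (e.g. 13 MFCCs, pitch statistics, formants).
def fake_mfcc(n):    return rng.normal(size=(n, 13))
def fake_pitch(n):   return rng.normal(size=(n, 2))
def fake_formant(n): return rng.normal(size=(n, 3))

n = 300
# Multi-feature fusion: concatenate the feature groups per utterance.
X = np.hstack([fake_mfcc(n), fake_pitch(n), fake_formant(n)])
y = rng.integers(0, len(EMOTIONS), size=n)

# RBM pretraining feeding a logistic classifier: a one-layer stand-in
# for the DBN; a real DBN would stack several RBM layers.
model = Pipeline([
    ("scale", MinMaxScaler()),  # BernoulliRBM expects inputs in [0, 1]
    ("rbm", BernoulliRBM(n_components=32, learning_rate=0.05,
                         n_iter=10, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(X.shape)  # fused feature matrix: (300, 18)
```

Comparing runs of this pipeline with different subsets of the fused columns mirrors the paper's experiment relating feature combinations to recognition performance.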
Acknowledgement
This research was supported by the International S&T Cooperation Program of China (ISTCP, 2013DFA10980).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Zhang, W., Zhao, D., Chen, X., Zhang, Y. (2016). Deep Learning Based Emotion Recognition from Chinese Speech. In: Chang, C., Chiari, L., Cao, Y., Jin, H., Mokhtari, M., Aloulou, H. (eds.) Inclusive Smart Cities and Digital Health. ICOST 2016. Lecture Notes in Computer Science, vol. 9677. Springer, Cham. https://doi.org/10.1007/978-3-319-39601-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39600-2
Online ISBN: 978-3-319-39601-9
eBook Packages: Computer Science, Computer Science (R0)