Abstract
This chapter discusses the use of vocal tract information for recognizing emotions. Linear prediction cepstral coefficients (LPCCs) and mel-frequency cepstral coefficients (MFCCs) are used as correlates of vocal tract information. In addition to LPCCs and MFCCs, formant-related features are also explored for recognizing emotions from speech. Extraction of these spectral features is discussed in brief, followed by their extraction from sub-syllabic regions such as consonants, vowels, and consonant-vowel transition regions, and through pitch synchronous analysis. The basic philosophy and use of Gaussian mixture models (GMMs) for classifying emotions is also discussed. The emotion recognition performance obtained with the different vocal tract features is compared. The proposed spectral features are evaluated on Indian and Berlin emotion databases, and the performance of GMMs in classifying emotional utterances from vocal tract features is compared with that of neural network models.
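The pipeline the abstract outlines, spectral feature extraction followed by maximum-likelihood classification with one GMM per emotion class, can be sketched as below. This is an illustrative reconstruction, not the chapter's configuration: the frame size, mel filter count, cepstral order, mixture order, and the random "utterances" standing in for real speech are all assumptions for the sake of a self-contained example.

```python
# Sketch (assumed parameters throughout): MFCC-like features via NumPy/SciPy,
# then per-emotion GMMs classifying an utterance by maximum average log-likelihood.
import numpy as np
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers equally spaced on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, sr=8000, frame=200, hop=80, n_fft=256, n_filters=26, n_ceps=13):
    # Frame + window, power spectrum, mel filterbank energies, log, DCT.
    frames = [signal[i:i + frame] * np.hamming(frame)
              for i in range(0, len(signal) - frame, hop)]
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    energies = np.log(spec @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]

rng = np.random.default_rng(0)
# Toy two-class setup; real use would pool MFCC frames per emotion class
# from labeled training utterances.
train = {e: mfcc(rng.standard_normal(4000) + shift)
         for e, shift in [('anger', 0.0), ('neutral', 2.0)]}
models = {e: GaussianMixture(n_components=4, covariance_type='diag',
                             random_state=0).fit(f)
          for e, f in train.items()}

test_utt = mfcc(rng.standard_normal(4000) + 2.0)  # drawn like 'neutral'
scores = {e: m.score(test_utt) for e, m in models.items()}
print(max(scores, key=scores.get))
```

The same decision rule applies unchanged when the features come from specific sub-syllabic regions or from pitch synchronous analysis; only the frames fed to `mfcc` change.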
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Rao, K.S., Koolagudi, S.G. (2013). Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features. In: Robust Emotion Recognition using Spectral and Prosodic Features. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6360-3_2
Print ISBN: 978-1-4614-6359-7
Online ISBN: 978-1-4614-6360-3