Abstract
This chapter discusses the use of vocal tract information for recognizing emotions. Linear prediction cepstral coefficients (LPCCs) and mel-frequency cepstral coefficients (MFCCs) are used as correlates of vocal tract information. In addition to LPCCs and MFCCs, formant-related features are also explored for recognizing emotions from speech. Extraction of these spectral features is discussed in brief, followed by their extraction from sub-syllabic regions such as consonants, vowels, and consonant-vowel transition regions, and through pitch synchronous analysis. The basic philosophy and use of Gaussian mixture models (GMMs) for classifying emotions is also discussed. The emotion recognition performance obtained with the different vocal tract features is compared. The proposed spectral features are evaluated on Indian and Berlin emotion databases, and the performance of GMMs in classifying emotional utterances from vocal tract features is compared with that of neural network models.
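The pipeline the abstract outlines, spectral feature extraction followed by maximum-likelihood classification with one GMM per emotion class, can be sketched as below. This is an illustrative reconstruction, not the chapter's configuration: the frame size, mel filter count, cepstral order, mixture order, and the random "utterances" standing in for real speech are all assumptions for the sake of a self-contained example.

```python
# Sketch (assumed parameters throughout): MFCC-like features via NumPy/SciPy,
# then per-emotion GMMs classifying an utterance by maximum average log-likelihood.
import numpy as np
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers equally spaced on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, sr=8000, frame=200, hop=80, n_fft=256, n_filters=26, n_ceps=13):
    # Frame + window, power spectrum, mel filterbank energies, log, DCT.
    frames = [signal[i:i + frame] * np.hamming(frame)
              for i in range(0, len(signal) - frame, hop)]
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    energies = np.log(spec @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]

rng = np.random.default_rng(0)
# Toy two-class setup; real use would pool MFCC frames per emotion class
# from labeled training utterances.
train = {e: mfcc(rng.standard_normal(4000) + shift)
         for e, shift in [('anger', 0.0), ('neutral', 2.0)]}
models = {e: GaussianMixture(n_components=4, covariance_type='diag',
                             random_state=0).fit(f)
          for e, f in train.items()}

test_utt = mfcc(rng.standard_normal(4000) + 2.0)  # drawn like 'neutral'
scores = {e: m.score(test_utt) for e, m in models.items()}
print(max(scores, key=scores.get))
```

The same decision rule applies unchanged when the features come from specific sub-syllabic regions or from pitch synchronous analysis; only the frames fed to `mfcc` change.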
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Rao, K.S., Koolagudi, S.G. (2013). Robust Emotion Recognition using Pitch Synchronous and Sub-syllabic Spectral Features. In: Robust Emotion Recognition using Spectral and Prosodic Features. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6360-3_2
Print ISBN: 978-1-4614-6359-7
Online ISBN: 978-1-4614-6360-3