Commonalities of Glottal Sources and Vocal Tract Shapes Among Speakers in Emotional Speech

Li, Yongwei; Sakakibara, Ken-Ichi; Morikawa, Daisuke; Akagi, Masato

doi:10.1007/978-3-030-00126-1_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10733)

Included in the following conference series: International Seminar on Speech Production

Abstract

This paper explores the commonalities of glottal source waves and vocal tract shapes among four speakers in emotional speech (the vowel /a/ produced with neutral, joy, anger, and sadness), based on a source-filter model with the proposed precise estimation scheme. The results are as follows. Compared with the spectral tilts of the glottal source waves for neutral speech: (1) in the 200 to 700 Hz range, those for anger and joy increased and those for sadness decreased; (2) in the 700 to 2000 Hz range, those for anger increased, those for joy decreased, and those for sadness were the same as for neutral; and (3) above 2000 Hz, all spectral tilts showed the same tendency. For front vocal tract shapes, the area function for anger was the largest, that for sadness was the smallest, and those for joy and neutral were in between.
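The band-wise comparison of spectral tilts can be illustrated with a minimal sketch. It does not reproduce the paper's estimation scheme: it only measures tilt on a waveform that is assumed to be an already-estimated glottal source signal. The tilt definition used here (least-squares slope of dB magnitude against log2 frequency, in dB per octave) is an assumption, since the abstract does not state how tilt was computed, and the names `BANDS_HZ`, `magnitude_spectrum_db`, and `band_tilt_db_per_octave` are hypothetical.

```python
import cmath
import math

# Frequency bands (Hz) named in the abstract. The upper edge of the last band
# is an assumption: the abstract only says "over 2000 Hz".
BANDS_HZ = [(200.0, 700.0), (700.0, 2000.0), (2000.0, 8000.0)]


def magnitude_spectrum_db(signal, sample_rate):
    """Return (freqs_hz, mags_db) for DFT bins 1 .. n//2 - 1 (plain O(n^2) DFT)."""
    n = len(signal)
    freqs, mags_db = [], []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        # Small offset avoids log10(0) for empty bins.
        mags_db.append(20.0 * math.log10(abs(acc) + 1e-12))
    return freqs, mags_db


def band_tilt_db_per_octave(freqs, mags_db, f_lo, f_hi):
    """Least-squares slope of dB magnitude against log2(frequency) in [f_lo, f_hi)."""
    pts = [(math.log2(f), m) for f, m in zip(freqs, mags_db) if f_lo <= f < f_hi]
    if len(pts) < 2:
        raise ValueError("need at least two spectral bins inside the band")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

Under these assumptions, one would compute the tilt per band for each emotion and subtract the neutral value for the same speaker; the sign of that difference in each band is what the three numbered findings describe.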

Acknowledgements

This study was supported by a Grant-in-Aid for Scientific Research (A) (No. 25240026) and by the China Scholarship Council (CSC).

Author information

Authors and Affiliations

Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Yongwei Li & Masato Akagi
Health Sciences University of Hokkaido, Hokkaido, Japan
Ken-Ichi Sakakibara
Toyama Prefectural University, Toyama, Japan
Daisuke Morikawa

Corresponding author

Correspondence to Masato Akagi.

Editor information

Editors and Affiliations

Chinese Academy of Social Sciences, Beijing, China
Qiang Fang
JAIST, Nomi, Japan
Jianwu Dang
Grenoble Alpes University, Saint-Martin-d'Hères, France
Pascal Perrier
Tianjin University, Tianjin, China
Jianguo Wei
Tianjin University, Tianjin, China
Longbiao Wang
Shenzhen Institute of Advanced Technology, Shenzhen, China
Nan Yan

About this paper

Cite this paper

Li, Y., Sakakibara, K.-I., Morikawa, D., Akagi, M. (2018). Commonalities of Glottal Sources and Vocal Tract Shapes Among Speakers in Emotional Speech. In: Fang, Q., Dang, J., Perrier, P., Wei, J., Wang, L., Yan, N. (eds) Studies on Speech Production. ISSP 2017. Lecture Notes in Computer Science, vol 10733. Springer, Cham. https://doi.org/10.1007/978-3-030-00126-1_3

DOI: https://doi.org/10.1007/978-3-030-00126-1_3
Published: 11 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00125-4
Online ISBN: 978-3-030-00126-1
eBook Packages: Computer Science, Computer Science (R0)
