Speaker-Related Robustness Issues

Zheng, Thomas Fang; Li, Lantian

doi:10.1007/978-981-10-3238-7_3

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSIGNAL)

Abstract

Speaker-dependent factors, such as gender, physical condition (e.g., a cold or laryngitis), speaking style (emotional state, speech rate, etc.), cross-language use, accent, and session variations, are major concerns in speech signal processing. How these factors correlate with one another, and which of them are key in speech realization, are important research questions [1]. Current mainstream research can be divided into five directions, which are described in the following subsections.

References

Huang C, Chen T, Li SZ et al (2001) Analysis of speaker variability. In: INTERSPEECH. pp 1377–1380
Cumani S, Glembek O, Brümmer N et al (2012) Gender independent discriminative speaker recognition in i-vector space. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 4361–4364
McLaren M, van Leeuwen DA (2012) Gender-independent speaker recognition using source normalization. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 4373–4376
Tull RG, Rutledge JC (1996) Analysis of “cold-affected” speech for inclusion in speaker recognition systems. J Acoust Soc Am 99(4):2549–2574
Tull RG, Rutledge JC (1996) “Cold speech” for automatic speaker recognition. In: Acoustical Society of America 131st Meeting Lay Language Papers
Tull RG, Rutledge JC, Larson CR (1996) Cepstral analysis of “cold-speech” for speaker recognition: a second look. Dissertation, ASA
Tull RG (1999) Acoustic analysis of cold-speech: implications for speaker recognition technology and the common cold. Northwestern University
Kwon OW, Chan K, Hao J et al (2003) Emotion recognition by speech signals. In: INTERSPEECH
Juang BH (1991) Speech recognition in adverse environments. Comput Speech Lang 5(3):275–294
Lippmann R, Martin E, Paul D (1987) Multi-style training for robust isolated-word speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87, vol 12. IEEE, pp 705–708
Bie F, Wang D, Zheng TF et al (2013) Emotional speaker verification with linear adaptation. In: 2013 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 91–94
Zetterholm E (1998) Prosody and voice quality in the expression of emotions. In: ICSLP
Wu T, Yang Y, Wu Z (2005) Improving speaker recognition by training on emotion-added models. In: International Conference on Affective Computing and Intelligent Interaction. Springer, Berlin, Heidelberg, pp 382–389
Pereira C, Watson CI (1998) Some acoustic characteristics of emotion. In: ICSLP
Scherer KR, Johnstone T, Klasmeyer G et al (2000) Can automatic speaker verification be improved by training the algorithms on emotional speech? In: INTERSPEECH. pp 807–810
Scherer KR, Grandjean D, Johnstone T et al Acoustic correlates of task load and stress. In: INTERSPEECH
Shahin I (2009) Speaker identification in emotional environments. Iran J Electr Comput Eng 8(1):41–46
Bie F, Wang D, Zheng TF et al (2013) Emotional adaptive training for speaker verification. In: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, pp 1–4
Wu W, Zheng TF, Xu MX et al (2006) Study on speaker verification on emotional speech. In: INTERSPEECH
Shan Z, Yang Y (2008) Learning polynomial function based neutral-emotion GMM transformation for emotional speaker recognition. In: 19th International Conference on Pattern Recognition, 2008, ICPR 2008. IEEE, pp 1–4
Atal BS (1976) Automatic recognition of speakers from their voices. Proc IEEE 64(4):460–475
Rozi A, Li L, Wang D et al (2016) Feature transformation for speaker verification under speaking rate mismatch condition. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, pp 1–4
van Heerden CJ, Barnard E (2007) Speech rate normalization used to improve speaker verification. In: Proceedings of the Symposium of the Pattern Recognition Association of South Africa. pp 2–7
Erman B, Warren B (2000) The idiom principle and the open choice principle. Text-Interdisc J Study Discourse 20(1):29–62
Makkai A (1972) Idiom structure in English. Walter de Gruyter
Cacciari C, Glucksberg S (1991) Understanding idiomatic expressions: the contribution of word meanings. Adv Psychol 77:217–240
Leech G, Garside R, Bryant M (1994) CLAWS4: the tagging of the British National Corpus. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 622–628
Doddington GR (2001) Speaker recognition based on idiolectal differences between speakers. In: INTERSPEECH. pp 2521–2524
Kajarekar SS, Ferrer L, Shriberg E et al (2005) SRI’s 2004 NIST speaker recognition evaluation system. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005 (ICASSP’05), vol 1. IEEE, pp I/173–I/176
Tur G, Shriberg E, Stolcke A, Kajarekar S (2007) Duration and pronunciation conditioned lexical modeling for speaker verification
Andrews WD, Kohler MA, Campbell JP et al (2002) Gender-dependent phonetic refraction for speaker recognition. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1. IEEE, pp I-149–I-152
Navrátil J, Jin Q, Andrews WD et al (2003) Phonetic speaker recognition using maximum-likelihood binary-decision tree models. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03), vol 4. IEEE, p IV-796
Jin Q, Navrátil J, Reynolds DA et al (2003) Combining cross-stream and time dimensions in phonetic speaker recognition. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03). IEEE, p IV-800
Hatch AO, Peskin B, Stolcke A (2005) Improved phonetic speaker recognition using lattice decoding. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005 (ICASSP’05), vol 1. IEEE, pp I/169–I/172
Auckenthaler R, Carey MJ, Mason JSD (2001) Language dependency in text-independent speaker verification. In: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001 (ICASSP’01), vol 1. IEEE, pp 441–444
Ma B, Meng H (2004) English-Chinese bilingual text-independent speaker verification. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004 (ICASSP’04), vol 5. IEEE, p V-293
Askar R, Wang D, Bie F et al (2015) Cross-lingual speaker verification based on linear transform. In: 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 519–523
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605
van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245
Wang J, Johnson MT (2013) Vocal source features for bilingual speaker identification. In: 2013 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 170–173
Akbacak M, Hansen JHL (2007) Language normalization for bilingual speaker recognition systems. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV-257–IV-260
Nagaraja BG, Jayanna HS (2013) Combination of features for multilingual speaker identification with the constraint of limited data. Int J Comput Appl 70(6)
Lu L, Dong Y, Zhao X et al (2009) The effect of language factors for robust speaker recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, ICASSP 2009. IEEE, pp 4217–4220
Kersta LG (1962) Voiceprint identification. J Acoust Soc Am 34(5):725
Furui S (1997) Recent advances in speaker recognition. Pattern Recogn Lett 18(9):859–872
Bonastre JF, Bimbot F, Boë LJ et al (2003) Person authentication by voice: a need for caution. In: INTERSPEECH
Mishra P (2012) A vector quantization approach to speaker recognition. In: Proceedings of the International Conference on Innovation & Research in Technology for sustainable development (ICIRT 2012), vol 1. p 152
Kato T, Shimizu T (2003) Improved speaker verification over the cellular phone network using phoneme-balanced and digit-sequence-preserving connected digit patterns. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03), vol 3. IEEE, p II-57
Hébert M (2008) Text-dependent speaker recognition. In: Springer handbook of speech processing. Springer, Berlin, Heidelberg, pp 743–762
Bimbot F, Bonastre JF, Fredouille C et al (2004) A tutorial on text-independent speaker verification. EURASIP J Appl Signal Process, 430–451
Markel J, Davis S (1979) Text-independent speaker recognition from a large linguistically unconstrained time-spaced data base. IEEE Trans Acoust Speech Signal Process 27(1):74–82
Beigi H (2009) Effects of time lapse on speaker recognition results. In: 16th International Conference on Digital Signal Processing, 2009. IEEE, pp 1–6
Beigi H (2011) Fundamentals of speaker recognition. Springer Science & Business Media
Lamel LF, Gauvain JL (2000) Speaker verification over the telephone. Speech Commun 31(2):141–154
Kelly F, Harte N (2011) Effects of long-term ageing on speaker verification. In: European Workshop on Biometrics and Identity Management. Springer, Berlin, Heidelberg, pp 113–124
Kelly F, Drygajlo A, Harte N (2012) Speaker verification with long-term ageing data. In: 2012 5th IAPR International Conference on Biometrics (ICB). IEEE, pp 478–483
Wang L, Wang J, Li L et al (2016) Improving speaker verification performance against long-term speaker variability. Speech Commun 79:14–29
Wang L, Zheng TF (2010) Creation of time-varying voiceprint database. In: Proc. Oriental-COCOSDA

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Division of Technical Innovation and Development, Department of Computer Science and Technology, Center for Speech and Language Technologies, Research Institute of Information Technology, Tsinghua University, Beijing, 100084, China
Thomas Fang Zheng & Lantian Li

About this chapter

Cite this chapter

Zheng, T.F., Li, L. (2017). Speaker-Related Robustness Issues. In: Robustness-Related Issues in Speaker Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-10-3238-7_3

DOI: https://doi.org/10.1007/978-981-10-3238-7_3
Published: 07 April 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3237-0
Online ISBN: 978-981-10-3238-7
eBook Packages: Engineering, Engineering (R0)