Abstract
Speaker-dependent factors, such as gender, physical condition (e.g., a cold or laryngitis), speaking style (emotional state, speech rate, etc.), cross-language use, accent, and session variations, are major concerns in speech signal processing. How these factors correlate with one another, and which of them are key in speech realization, are important research questions [1]. Current mainstream research can be divided into five directions, which are described in the following subsections.
References
Huang C, Chen T, Li SZ et al (2001) Analysis of speaker variability. In: INTERSPEECH. pp 1377–1380
Cumani S, Glembek O, Brümmer N et al (2012) Gender independent discriminative speaker recognition in i-vector space. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 4361–4364
McLaren M, van Leeuwen DA (2012) Gender-independent speaker recognition using source normalization. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 4373–4376
Tull RG, Rutledge JC (1996) Analysis of “cold-affected” speech for inclusion in speaker recognition systems. J Acoust Soc Am 99(4):2549–2574
Tull RG, Rutledge JC (1996) ‘Cold speech’ for automatic speaker recognition. In: Acoustical Society of America 131st Meeting Lay Language Papers
Tull RG, Rutledge JC, Larson CR (1996) Cepstral analysis of “cold-speech” for speaker recognition: a second look. Dissertation, ASA
Tull RG (1999) Acoustic analysis of cold-speech: implications for speaker recognition technology and the common cold. Northwestern University
Kwon OW, Chan K, Hao J et al (2003) Emotion recognition by speech signals. In: INTERSPEECH
Juang BH (1991) Speech recognition in adverse environments. Comput Speech Lang 5(3):275–294
Lippmann R, Martin E, Paul D (1987) Multi-style training for robust isolated-word speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87, vol 12. IEEE, pp 705–708
Bie F, Wang D, Zheng TF et al (2013) Emotional speaker verification with linear adaptation. In: 2013 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 91–94
Zetterholm E (1998) Prosody and voice quality in the expression of emotions. In: ICSLP
Wu T, Yang Y, Wu Z (2005) Improving speaker recognition by training on emotion-added models. In: International Conference on Affective Computing and Intelligent Interaction. Springer, Berlin, Heidelberg, pp 382–389
Pereira C, Watson CI (1998) Some acoustic characteristics of emotion. In: ICSLP
Scherer KR, Johnstone T, Klasmeyer G et al (2000) Can automatic speaker verification be improved by training the algorithms on emotional speech? In: INTERSPEECH. pp 807–810
Scherer KR, Grandjean D, Johnstone T et al Acoustic correlates of task load and stress. In: INTERSPEECH
Shahin I (2009) Speaker identification in emotional environments. Iran J Electr Comput Eng 8(1):41–46
Bie F, Wang D, Zheng TF et al (2013) Emotional adaptive training for speaker verification. In: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, pp 1–4
Wu W, Zheng TF, Xu MX et al (2006) Study on speaker verification on emotional speech. In: INTERSPEECH
Shan Z, Yang Y (2008) Learning polynomial function based neutral-emotion GMM transformation for emotional speaker recognition. In: 19th International Conference on Pattern Recognition, 2008, ICPR 2008. IEEE, pp 1–4
Atal BS (1976) Automatic recognition of speakers from their voices. Proc IEEE 64(4):460–475
Rozi A, Li L, Wang D et al (2016) Feature transformation for speaker verification under speaking rate mismatch condition. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, pp 1–4
van Heerden CJ, Barnard E (2007) Speech rate normalization used to improve speaker verification. In: Proceedings of the Symposium of the Pattern Recognition Association of South Africa. pp 2–7
Erman B, Warren B (2000) The idiom principle and the open choice principle. Text-Interdisc J Study Discourse 20(1):29–62
Makkai A (1972) Idiom structure in English. Walter de Gruyter
Cacciari C, Glucksberg S (1991) Understanding idiomatic expressions: the contribution of word meanings. Adv Psychol 77:217–240
Leech G, Garside R, Bryant M (1994) CLAWS4: the tagging of the British National Corpus. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 622–628
Doddington GR (2001) Speaker recognition based on idiolectal differences between speakers. In: INTERSPEECH. pp 2521–2524
Kajarekar SS, Ferrer L, Shriberg E et al (2005) SRI’s 2004 NIST speaker recognition evaluation system. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005 (ICASSP’05), vol 1. IEEE, pp I/173–I/176
Tur G, Shriberg E, Stolcke A, Kajarekar S (2007) Duration and pronunciation conditioned lexical modeling for speaker verification
Andrews WD, Kohler MA, Campbell JP et al (2002) Gender-dependent phonetic refraction for speaker recognition. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1. IEEE, pp I-149–I-152
Navrátil J, ** Q, Andrews WD et al (2003) Phonetic speaker recognition using maximum-likelihood binary-decision tree models. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03), vol 4. IEEE, p IV-796
** Q, Navrátil J, Reynolds DA et al (2003) Combining cross-stream and time dimensions in phonetic speaker recognition. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03). IEEE, p IV-800
Hatch AO, Peskin B, Stolcke A (2005) Improved phonetic speaker recognition using lattice decoding. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005 (ICASSP’05), vol 1. IEEE, pp I/169–I/172
Auckenthaler R, Carey MJ, Mason JSD (2001) Language dependency in text-independent speaker verification. In: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001 (ICASSP’01), vol 1. IEEE, pp 441–444
Ma B, Meng H (2004) English-Chinese bilingual text-independent speaker verification. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004 (ICASSP’04), vol 5. IEEE, p V-293
Askar R, Wang D, Bie F et al (2015) Cross-lingual speaker verification based on linear transform. In: 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 519–523
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605
van der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245
Wang J, Johnson MT (2013) Vocal source features for bilingual speaker identification. In: 2013 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 170–173
Akbacak M, Hansen JHL (2007) Language normalization for bilingual speaker recognition systems. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV-257–IV-260
Nagaraja BG, Jayanna HS (2013) Combination of features for multilingual speaker identification with the constraint of limited data. Int J Comput Appl 70(6)
Akbacak M, Hansen JHL (2007) Language normalization for bilingual speaker recognition systems. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV-257–IV-260
Lu L, Dong Y, Zhao X et al (2009) The effect of language factors for robust speaker recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, ICASSP 2009. IEEE, pp 4217–4220
Kersta LG (1962) Voiceprint identification. J Acoust Soc Am 34(5):725
Furui S (1997) Recent advances in speaker recognition. Pattern Recogn Lett 18(9):859–872
Bonastre JF, Bimbot F, Boë LJ et al (2003) Person authentication by voice: a need for caution. In: INTERSPEECH
Mishra P (2012) A vector quantization approach to speaker recognition. In: Proceedings of the International Conference on Innovation & Research in Technology for sustainable development (ICIRT 2012), vol 1. p 152
Kato T, Shimizu T (2003) Improved speaker verification over the cellular phone network using phoneme-balanced and digit-sequence-preserving connected digit patterns. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03), vol 3. IEEE, p II-57
Hébert M (2008) Text-dependent speaker recognition. In: Springer handbook of speech processing. Springer, Berlin, Heidelberg, pp 743–762
Bimbot F, Bonastre JF, Fredouille C et al (2004) A tutorial on text-independent speaker verification. EURASIP J Appl Signal Process, 430–451
Markel J, Davis S (1979) Text-independent speaker recognition from a large linguistically unconstrained time-spaced data base. IEEE Trans Acoust Speech Signal Process 27(1):74–82
Beigi H (2009) Effects of time lapse on speaker recognition results. In: 16th International Conference on Digital Signal Processing, 2009. IEEE, pp 1–6
Beigi H (2011) Fundamentals of speaker recognition. Springer Science & Business Media
Lamel LF, Gauvain JL (2000) Speaker verification over the telephone. Speech Commun 31(2):141–154
Kelly F, Harte N (2011) Effects of long-term ageing on speaker verification. In: European Workshop on Biometrics and Identity Management. Springer, Berlin, Heidelberg, pp 113–124
Kelly F, Drygajlo A, Harte N (2012) Speaker verification with long-term ageing data. In: 2012 5th IAPR International Conference on Biometrics (ICB). IEEE, pp 478–483
Wang L, Wang J, Li L et al (2016) Improving speaker verification performance against long-term speaker variability. Speech Commun 79:14–29
Wang L, Zheng TF (2010) Creation of time-varying voiceprint database. In: Proc. Oriental-COCOSDA
Copyright information
© 2017 The Author(s)
About this chapter
Cite this chapter
Zheng, T.F., Li, L. (2017). Speaker-Related Robustness Issues. In: Robustness-Related Issues in Speaker Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-10-3238-7_3
DOI: https://doi.org/10.1007/978-981-10-3238-7_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3237-0
Online ISBN: 978-981-10-3238-7
eBook Packages: Engineering, Engineering (R0)