Abstract
Speaker recognition is a biometric modality that uses a speaker's speech segments to recognize identity, determining whether a test speaker belongs to the set of enrolled speakers. To improve the robustness of the i-vector framework under cross-channel conditions and to explore a novel way of applying deep learning to speaker recognition, stacked auto-encoders are applied to obtain an abstract representation of the i-vector instead of applying PLDA. After pre-processing and feature extraction, speaker- and channel-independent speech is used for UBM training. The UBM is then used to extract the i-vectors of the enrollment and test utterances. Unlike the traditional i-vector framework, which uses linear discriminant analysis (LDA) to reduce dimensionality and increase the discrimination between speaker subspaces, this research uses stacked auto-encoders to reconstruct the i-vector in a lower dimension, after which different classifiers can be chosen for the final classification. The experimental results show that the proposed method achieves better performance than the state-of-the-art method.
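The dimensionality-reduction step described in the abstract can be illustrated with a minimal sketch: a stacked auto-encoder trained greedily, layer by layer, to compress i-vectors into a lower-dimensional code that a downstream classifier can consume. This is not the authors' implementation; the tied-weight design, sigmoid activations, layer sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

def train_autoencoder(X, hidden_dim, lr=0.1, epochs=200, seed=0):
    """Train one tied-weight auto-encoder layer by gradient descent.

    Encoder: H = sigmoid(X @ W + b); decoder: R = H @ W.T + c.
    Minimizes the squared reconstruction error ||R - X||^2.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden_dim))
    b = np.zeros(hidden_dim)
    c = np.zeros(d)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        H = sig(X @ W + b)            # encode
        R = H @ W.T + c               # decode (tied weights)
        err = R - X                   # reconstruction error
        dZ = (err @ W) * H * (1 - H)  # backprop through the encoder
        gW = X.T @ dZ + err.T @ H     # W appears in encoder and decoder
        W -= lr * gW / n
        b -= lr * dZ.sum(axis=0) / n
        c -= lr * err.sum(axis=0) / n
    return W, b

def stacked_encode(X, layer_dims):
    """Greedy layer-wise training; returns the final low-dimensional code."""
    H = X
    for h in layer_dims:
        W, b = train_autoencoder(H, h)
        H = 1.0 / (1.0 + np.exp(-(H @ W + b)))
    return H

# Hypothetical usage: compress 20-dim "i-vectors" to a 5-dim code.
ivectors = np.random.default_rng(1).normal(size=(50, 20))
code = stacked_encode(ivectors, [10, 5])
```

The resulting `code` (here 50 vectors of dimension 5) plays the role that LDA-projected i-vectors play in the traditional pipeline and can be fed to any classifier for the final decision.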
Acknowledgements
This research was supported by National Natural Science Foundation of China (No.61901165, 61501199), Science and Technology Research Project of Hubei Education Department (No. Q20191406), Hubei Natural Science Foundation (No. 2017CFB683), and self-determined research funds of CCNU from the colleges’ basic research and operation of MOE (No. CCNU20ZT010).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Wang, Z., Zeng, C., Duan, S., Ouyang, H., Xu, H. (2021). Robust Speaker Recognition Based on Stacked Auto-encoders. In: Barolli, L., Li, K., Enokido, T., Takizawa, M. (eds) Advances in Networked-Based Information Systems. NBiS 2020. Advances in Intelligent Systems and Computing, vol 1264. Springer, Cham. https://doi.org/10.1007/978-3-030-57811-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57810-7
Online ISBN: 978-3-030-57811-4
eBook Packages: Intelligent Technologies and Robotics (R0)