Abstract
The main objective of speaker recognition is to identify the voice of an authenticated and authorized individual by extracting features from their voices. The number of published techniques for speaker recognition algorithms is text-dependent. On the other hand, text-independent speech recognition appears to be more advantageous since the user can freely interact with the system. Several scholars have suggested a variety of strategies for detecting speakers, although these systems were difficult and inaccurate. Relying on WOA and Bi-LSTM, this research suggested a text-independent speaker identification algorithm. In presence of various degradation and voice effects, the sample signals were obtained from a available dataset. Following that, MFCC features are extracted from these signals, but only the most important characteristics are chosen from the available features by utilizing WOA to build a single feature set. The Bi-LSTM network receives this feature set and uses it for training and testing. In the MATLAB simulation software, the proposed model’s performance is assessed and compared to that of the standard model. Various dependent factors, like accuracy, sensitivity, specificity, precision, recall, and Fscore, were used to calculate the simulated outputs. The findings showed that the suggested model is more efficient and precise at recognizing speaker voices.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Zilovic, M.S., Ramachandran, R.P., Mammone, R.J.: Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. IEEE Trans. Speech Audio Process. 6, 260–267 (1998)
Tranter, S., Reynolds, D.: An overview of automatic speaker diarization systems. IEEE Trans. Audio Speech Lang. Process. 14, 1557–1565 (2006)
Alexander, A., Botti, F., Dessimoz, D., Drygajlo, A.: The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications. Forensic Sci. Int. 146S, 95–99 (2004)
Hansen, J., Hasan, T.: Speaker recognition by machines and humans: a tutorial review. Sign. Process. Mag. IEEE 32, 74–99 (2015)
Jothilakshmi, S., Gudivada, V.N.: Large scale data enabled evolution of spoken language research and applications. Elsevier 35, 301–340 (2016)
Kekre, H., Kulkarni, V.: Closed set and open set Speaker Identification using amplitude distribution of different transforms. In: 2013 International Conference on Advances in Technology and Engineering, pp. 1–8 (2013)
Mathu, S., et al.: Speaker recognition system and its forensic implications. Open Access Scientific Reports (2013)
Imdad, M.N., et al.: Speaker recognition in noisy environment. Int. J. Adv. Res. Comput. Sci. Electron. Eng. 1, 52–57 (2012)
Imam, S.A., et al.: Review: speaker recognition using automated systems. AGU Int. J. Eng. Technol. 5, 31–39 (2017)
Dhakal, P., Damacharla, P., Javaid, A.Y., Devabhaktuni, V.: A near real-time automatic speaker recognition architecture for voice-based user interface. Mach. Learn. Knowl. Extr. 1, 504–520 (2019)
Varun, S., Bansal, P.K.: A review on speaker recognition approaches and challenges. Int. J. Eng. Res. Technol. (IJERT) 2, 1581–1588 (2013)
Niemi-Laitinen, T., Saastamoinen, J., Kinnunen, T., Fränti, P.: Applying MFCC-based automatic speaker recognition to GSM and forensic data. In: Proceedings of the Second Baltic Conference on Human Language Technologies, pp. 317–322 (2005)
Pfister, B., Beutler, R.: Estimating the weight of evidence in forensic speaker verification. In: Proceedings of the 8th European Conference on Speech Communication and Technology, pp. 701–704 (2003)
Thiruvaran, T., Ambikairajah, E., Epps, J.: FM features for automatic forensic speaker recognition. In: Proceedings of the Interspeech 2008, pp. 1497–1500 (2008)
Hebert, M.: Text-dependent speaker recognition. Springer handbook of speech processing. Springer Verlag, pp. 743–762, 2008. https://doi.org/10.1007/978-3-540-49127-9_37
Nayana, P.K., Mathew, D., Thomas, A.: Comparison of text independent speaker identification systems using GMM and i-Vector methods. Procedia Comput. Sci. 115, 47–54 (2017)
El-Moneim, S., Nassar, M., Dessouky, M.I., Ismail, N., El-Fishawy, A., Abd El-Samie, F.: Text-independent speaker recognition using LSTM-RNN and speech enhancement. Multimedia Tools Appl. (2020). https://doi.org/10.1007/s11042-019-08293-7
Zhao, X., Wei, Y.: Speaker recognition based on deep learning. In: 2019 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 283–287 (2019)
Nammous, M.K., Saeed, K., Kobojek, P.: Using a small amount of text-independent speech data for a BiLSTM large-scale speaker identification approach. J. King Saud Univ.- Comput. Inf. Sci. (2020)
Mobin, A., Najarian, M.: Text-independent speaker verification using long short-term memory networks. ar**v:1805.00604 (2018)
Shon, S., Tang, H., Glass, J.: Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model. In: 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 1007–1013 (2018)
Jagiasi, R., Ghosalkar, S., Kulal, P., Bharambe, A.: CNN based speaker recognition in language and text-independent small scale system. In: 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 176–179 (2019)
Mokgonyane, T.B., Sefara, T.J., Modipa, T.I., Mogale, M.M., Manamela, M.J., Manamela, P.J.: Automatic speaker recognition system based on machine learning algorithms. In: 2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA), pp. 141–146 (2019)
Hourri, S., Kharroubi, J.: A deep learning approach for speaker recognition. Int. J. Speech Technol. 23(1), 123–131 (2019). https://doi.org/10.1007/s10772-019-09665-y
Mohammadi, M., Mohammadi, H.R.S.: Weighted I-vector based text-independent speaker verification system. In: 2019 27th Iranian Conference on Electrical Engineering (ICEE), pp. 1647–1653 (2019)
Huang, D., Mao, Q., Ma, Z., et al.: Latent discriminative representation learning for speaker recognition. Front Inform. Technol. Electron. Eng. 22, 697–708 (2021)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Goyal, A., Kapil, A., Sharma, S., Jaiswal, G., Sharma, A. (2022). Deep Learning Approaches for Speech Analysis: A Critical Insight. In: Dev, A., Agrawal, S.S., Sharma, A. (eds) Artificial Intelligence and Speech Technology. AIST 2021. Communications in Computer and Information Science, vol 1546. Springer, Cham. https://doi.org/10.1007/978-3-030-95711-7_7
Download citation
DOI: https://doi.org/10.1007/978-3-030-95711-7_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95710-0
Online ISBN: 978-3-030-95711-7
eBook Packages: Computer ScienceComputer Science (R0)