Deep Learning Approaches for Speech Analysis: A Critical Insight

Goyal, Alisha; Kapil, Advikaa; Sharma, Sparsh; Jaiswal, Garima; Sharma, Arun

doi:10.1007/978-3-030-95711-7_7


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1546)

Included in the following conference series:

International Conference on Artificial Intelligence and Speech Technology


Abstract

The main objective of speaker recognition is to identify the voice of an authenticated and authorized individual by extracting features from their speech. Most published speaker recognition techniques are text-dependent. Text-independent speaker recognition, on the other hand, appears more advantageous because the user can interact with the system freely. Several researchers have proposed a variety of strategies for identifying speakers, but these systems were complex and inaccurate. This research proposes a text-independent speaker identification algorithm based on the Whale Optimization Algorithm (WOA) and a bidirectional long short-term memory (Bi-LSTM) network. Speech samples containing various degradations and voice effects were obtained from an available dataset. Mel-frequency cepstral coefficient (MFCC) features were then extracted from these signals, and WOA was used to select only the most important of them, forming a single feature set. This feature set was fed to the Bi-LSTM network for training and testing. The performance of the proposed model was evaluated in MATLAB and compared with that of a standard model using accuracy, sensitivity, specificity, precision, recall, and F-score. The findings showed that the proposed model recognizes speaker voices more efficiently and accurately.
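The abstract places WOA between MFCC extraction and the Bi-LSTM as a feature-selection step. The chapter's WOA settings, fitness function, and binarization scheme are not given here, so the following is only a minimal sketch of a generic binary WOA wrapper in pure Python, under stated assumptions: each whale's continuous position is mapped to a keep/drop mask by a sigmoid with a 0.5 threshold (one common choice among several), lower fitness is better, and `binary_woa_select`, `toy_fitness`, and `TARGET` are hypothetical names introduced for illustration. The toy fitness stands in for what would normally be a classifier's validation error on the kept MFCC dimensions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def binary_woa_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    n_whales: int = 10,
    n_iter: int = 30,
    b: float = 1.0,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical binary WOA feature selection that minimizes `fitness`."""
    rng = random.Random(seed)

    def to_mask(pos: Sequence[float]) -> List[int]:
        # Assumed transfer rule: sigmoid(x) > 0.5, which holds exactly when x > 0.
        return [1 if x > 0.0 else 0 for x in pos]

    # Continuous whale positions, initialised uniformly in [-1, 1].
    whales = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_whales)]
    best_pos: List[float] = list(whales[0])
    best_fit = math.inf
    for pos in whales:
        f = fitness(to_mask(pos))
        if f < best_fit:
            best_fit, best_pos = f, list(pos)

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # control parameter, decreases linearly from 2 toward 0
        for i in range(n_whales):
            pos = whales[i]
            A = 2.0 * a * rng.random() - a  # |A| >= 1 favours exploration
            C = 2.0 * rng.random()
            p = rng.random()                # selects encircling/search vs. spiral update
            l = rng.uniform(-1.0, 1.0)      # spiral shape parameter
            rand_pos = whales[rng.randrange(n_whales)]
            new = []
            for d in range(n_features):
                if p < 0.5:
                    # Encircle the best whale (exploitation) or a random whale (exploration).
                    ref = best_pos if abs(A) < 1.0 else rand_pos
                    x = ref[d] - A * abs(C * ref[d] - pos[d])
                else:
                    # Logarithmic-spiral "bubble-net" move around the best whale.
                    dist = abs(best_pos[d] - pos[d])
                    x = dist * math.exp(b * l) * math.cos(2.0 * math.pi * l) + best_pos[d]
                new.append(max(-4.0, min(4.0, x)))  # keep positions bounded
            whales[i] = new
            f = fitness(to_mask(new))
            if f < best_fit:
                best_fit, best_pos = f, list(new)

    return to_mask(best_pos), best_fit


# Hypothetical toy objective: pretend MFCC dimensions 0, 2 and 5 are the informative ones.
# A real wrapper would instead score a classifier trained on the kept feature columns.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_fitness(mask: Sequence[int]) -> float:
    # Fraction of positions where the mask disagrees with the assumed "useful" set.
    return sum(m != t for m, t in zip(mask, TARGET)) / len(TARGET)


mask, fit = binary_woa_select(len(TARGET), toy_fitness, seed=42)
print("selected mask:", mask, "fitness:", fit)
```

In a pipeline like the one the abstract describes, the returned mask would index the MFCC columns passed on to the classifier; the threshold transfer keeps the sketch deterministic, whereas many binary WOA variants use a stochastic sigmoid or V-shaped transfer instead.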



Author information

Authors and Affiliations

Indira Gandhi Delhi Technical University for Women, Delhi, India
Alisha Goyal, Garima Jaiswal & Arun Sharma
Sanskriti School, Chanakyapuri, New Delhi, India
Advikaa Kapil
Delhi Technological University, Delhi, India
Sparsh Sharma


Corresponding author

Correspondence to Arun Sharma.

Editor information

Editors and Affiliations

Indira Gandhi Delhi Technical University for Women, Delhi, India
Amita Dev
Kamrah Institute of Information Technology, Gurgaon, India
S. S. Agrawal
Indira Gandhi Delhi Technical University for Women, Delhi, India
Arun Sharma


About this paper

Cite this paper

Goyal, A., Kapil, A., Sharma, S., Jaiswal, G., Sharma, A. (2022). Deep Learning Approaches for Speech Analysis: A Critical Insight. In: Dev, A., Agrawal, S.S., Sharma, A. (eds) Artificial Intelligence and Speech Technology. AIST 2021. Communications in Computer and Information Science, vol 1546. Springer, Cham. https://doi.org/10.1007/978-3-030-95711-7_7


DOI: https://doi.org/10.1007/978-3-030-95711-7_7
Published: 29 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95710-0
Online ISBN: 978-3-030-95711-7
eBook Packages: Computer Science, Computer Science (R0)
