Abstract
Automated speaker identification is an important topic in recent research on advanced technologies, as it helps to identify and analyze speakers in emergency situations. Various approaches exist for speaker identification, but existing systems cannot distinguish background noises such as music and traffic, which cause signal distortion. Moreover, existing speaker identification techniques lack strong learning ability and incur high computational complexity. They also fail to recognize the speaker's condition because they cannot extract suitable keywords from noisy speech. To overcome these issues, the proposed study introduces a new deep-learning mechanism for effective speaker identification in emergency situations. The input audio files are first collected, and pre-processing is performed to reduce noise using the original speech separation process. Keywords are then spotted in the pre-processed data using the proposed Mel-Frequency Cepstral Coefficients with Binary Weighted Network (MFCC-BWN). This keyword extraction step improves recognition accuracy because it captures the most significant attributes. Finally, an Adaptive Improved LSTM (AILSTM) identifies the speaker's situation from the audio files. Simulation results show that the proposed model outperforms comparable methods.
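The MFCC-BWN front end described above builds on standard Mel-Frequency Cepstral Coefficient extraction: frame the signal, window it, take the power spectrum, apply a mel filterbank, and decorrelate the log energies with a DCT. The following is a minimal NumPy/SciPy sketch of plain MFCC extraction only — the Binary Weighted Network and AILSTM stages are not part of it, and all parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 26 mel bands, 13 coefficients) are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCC features."""
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)

    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3. Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        if center > left:
            fbank[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fbank[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)

    # 4. Log mel energies, then DCT-II to get cepstral coefficients.
    mel_energy = np.maximum(power @ fbank.T, 1e-10)
    return dct(np.log(mel_energy), type=2, axis=1, norm='ortho')[:, :n_ceps]
```

With these settings, one second of 16 kHz audio yields 98 frames of 13 coefficients each; in a keyword-spotting pipeline such feature matrices would be fed to the downstream classifier.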
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig7_HTML.png)
![Fig. 8](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig8_HTML.png)
![Fig. 9](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig9_HTML.png)
![Fig. 10](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig10_HTML.png)
![Fig. 11](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig11_HTML.png)
![Fig. 12](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig12_HTML.png)
![Fig. 13](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig13_HTML.png)
![Fig. 14](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig14_HTML.png)
![Fig. 15](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig15_HTML.png)
![Fig. 16](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig16_HTML.png)
Data availability statement
Data sharing is not applicable to this article.
Funding
No funding was received for the preparation of this manuscript.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All the authors involved have agreed to participate in this submitted article.
Consent to publish
All the authors involved in this manuscript give full consent for publication of this submitted article.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Deka, A., Kalita, N. An effective speaker adaption using deep learning for the identification of speakers in emergency situation. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19373-8