Abstract
Automated speaker identification is an important topic in recent research on advanced technologies, as it helps to identify and analyze speakers in emergency situations. Various approaches exist for speaker identification, but existing systems cannot distinguish background noises such as music and traffic, which cause signal distortion. Moreover, existing speaker identification techniques lack strong learning ability and incur high computational complexity. They also fail to recognize the speaker's condition because they cannot extract suitable keywords from noisy speech. To overcome these issues, the proposed study introduces a new deep-learning mechanism for effective speaker identification in emergency situations. The input audio files are first collected, and pre-processing is performed to reduce noise using the original speech separation process. Keywords are then spotted in the pre-processed data using the proposed Mel-Frequency Cepstral Coefficients with Binary Weighted Network (MFCC-BWN). This keyword extraction step improves recognition accuracy because it captures the most significant attributes. Finally, an Adaptive Improved LSTM (AILSTM) identifies the speaker's situation from the audio files. Simulation results show that the proposed model outperforms comparable methods.
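The MFCC-BWN front end described above builds on standard Mel-Frequency Cepstral Coefficient extraction: frame the signal, window it, take the power spectrum, apply a mel filterbank, and decorrelate the log energies with a DCT. The following is a minimal NumPy/SciPy sketch of plain MFCC extraction only — the Binary Weighted Network and AILSTM stages are not part of it, and all parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 26 mel bands, 13 coefficients) are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCC features."""
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + max(0, len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)

    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 3. Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        if center > left:
            fbank[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fbank[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)

    # 4. Log mel energies, then DCT-II to get cepstral coefficients.
    mel_energy = np.maximum(power @ fbank.T, 1e-10)
    return dct(np.log(mel_energy), type=2, axis=1, norm='ortho')[:, :n_ceps]
```

With these settings, one second of 16 kHz audio yields 98 frames of 13 coefficients each; in a keyword-spotting pipeline such feature matrices would be fed to the downstream classifier.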
![Fig. 1](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig1_HTML.png)
![Fig. 2](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig2_HTML.png)
![Fig. 3](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig3_HTML.png)
![Fig. 4](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig4_HTML.png)
![Fig. 5](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig5_HTML.png)
![Fig. 6](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig6_HTML.png)
![Fig. 7](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig7_HTML.png)
![Fig. 8](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig8_HTML.png)
![Fig. 9](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig9_HTML.png)
![Fig. 10](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig10_HTML.png)
![Fig. 11](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig11_HTML.png)
![Fig. 12](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig12_HTML.png)
![Fig. 13](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig13_HTML.png)
![Fig. 14](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig14_HTML.png)
![Fig. 15](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig15_HTML.png)
![Fig. 16](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11042-024-19373-8/MediaObjects/11042_2024_19373_Fig16_HTML.png)
Data availability statement
Data sharing is not applicable to this article.
Funding
No funding was received for the preparation of this manuscript.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All the authors involved have agreed to participate in this submitted article.
Consent to publish
All the authors involved in this manuscript give full consent for publication of this submitted article.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Deka, A., Kalita, N. An effective speaker adaption using deep learning for the identification of speakers in emergency situation. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19373-8