
An effective speaker adaption using deep learning for the identification of speakers in emergency situation

Published in: Multimedia Tools and Applications

Abstract

Automated speaker identification is an important research topic in recent advanced technologies, as it helps to analyze speakers in emergency situations. Various existing approaches are used for speaker identification, but they cannot distinguish background noises such as music and traffic, which cause signal distortion. Moreover, existing speaker identification techniques lack strong learning ability and suffer from high computational complexity; they also fail to recognize the speaker's condition because they cannot extract suitable keywords from noisy speech. To overcome these issues, this study introduces a new deep-learning mechanism for effective speaker identification in emergency situations. The input audio files are first collected, and pre-processing is performed to reduce noise through an original speech separation process. Keywords are then spotted from the pre-processed data using the proposed Mel-Frequency Cepstral Coefficients with Binary Weighted Network (MFCC-BWN). This keyword extraction step improves recognition accuracy because it retains the most significant attributes. Finally, an Adaptive Improved LSTM (AILSTM) identifies the speaker's situation from the audio files. Simulation results show that the proposed model outperforms comparable methods.
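The pipeline described above rests on standard MFCC feature extraction (framing, windowing, a mel filterbank over the power spectrum, then a DCT). The following numpy-only sketch illustrates that standard computation; it is not the paper's MFCC-BWN or AILSTM, and the frame length, hop size, and filter counts are typical assumed values, not parameters taken from the article.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_coeffs=13):
    # Frame the signal, apply a Hamming window, take the power spectrum,
    # apply the mel filterbank, take the log, then a DCT-II to decorrelate.
    frames = np.array([signal[s:s + frame_len]
                       for s in range(0, len(signal) - frame_len + 1, hop)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    fb = mel_filterbank(n_filters, frame_len, sr)
    log_energies = np.log(power @ fb.T + 1e-10)
    # DCT-II as an explicit matrix product (orthonormal scaling omitted).
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_energies @ dct.T

# Example: one second of a synthetic 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 13): 98 frames, 13 coefficients each
```

In a full system such as the one the abstract outlines, a matrix like this (one 13-dimensional vector per 10 ms frame) would be the per-frame feature sequence fed into the recurrent classifier.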


Figures 1–16 appear in the full article.


Data availability statement

Data sharing is not applicable to this article.



Funding

No funding was received for the preparation of this manuscript.

Author information


Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Aniruddha Deka.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Deka, A., Kalita, N. An effective speaker adaption using deep learning for the identification of speakers in emergency situation. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19373-8



  • DOI: https://doi.org/10.1007/s11042-024-19373-8
