Abstract
Speech is a fundamental means of human interaction. Speaker Identification (SI) plays a crucial role in various applications, such as authentication systems, forensic investigation, and personal voice assistance. However, achieving robust and secure SI in both open and closed environments remains challenging. To address this issue, researchers have explored new techniques that enable computers to better understand and interact with humans. Smart systems leverage Artificial Neural Networks (ANNs) to mimic the human brain in identifying speakers. However, speech signals often suffer from interference, leading to signal degradation. The performance of a Speaker Identification System (SIS) is influenced by environmental factors, such as noise and reverberation in open and closed environments, respectively. This paper investigates SI using Mel-Frequency Cepstral Coefficients (MFCCs) and polynomial coefficients, with an ANN serving as the classifier. To tackle the challenges posed by environmental interference, we propose a novel approach based on symmetric comb filters for modeling. In closed environments, we study the effect of reverberation on speech signals, which arises from multiple reflections, and model this effect with comb filters. We explore different domains for feature extraction, including the time, Discrete Wavelet Transform (DWT), Discrete Cosine Transform (DCT), and Discrete Sine Transform (DST) domains, to determine the best combination for SI in reverberant environments. Simulation results reveal that the DWT outperforms the other transforms, leading to a recognition rate of 93.75% at a Signal-to-Noise Ratio (SNR) of 15 dB. Additionally, we investigate the concept of cancelable SI to ensure user privacy, while maintaining high recognition rates. Our simulation results show a recognition rate of 97.5% at 0 dB using features extracted from speech signals and their DCTs.
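The reverberation model described above can be illustrated with a short sketch. This is not the authors' exact pipeline; the delay, gain, and the single-level Haar wavelet are illustrative assumptions. A feedforward comb filter adds a delayed, attenuated copy of the signal (one reflection), and a one-level Haar DWT then splits the reverberant signal into approximation and detail coefficients usable as features:

```python
import numpy as np

def comb_reverb(x, delay=80, gain=0.6):
    """Model a single reflection with a feedforward comb filter:
    y[n] = x[n] + gain * x[n - delay]."""
    y = x.astype(float).copy()
    y[delay:] += gain * x[:-delay]
    return y

def haar_dwt(x):
    """One-level Haar DWT: orthonormal split into approximation
    (low-pass) and detail (high-pass) coefficients."""
    n = len(x) - len(x) % 2          # truncate to an even length
    even, odd = x[:n:2], x[1:n:2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)            # stand-in for a speech frame
reverberant = comb_reverb(clean, delay=80, gain=0.6)
approx, detail = haar_dwt(reverberant)
features = np.concatenate([approx, detail])  # candidate DWT feature vector
```

Because the Haar transform is orthonormal, the energy of the signal is preserved across the approximation and detail coefficients, which makes the DWT features a lossless re-description of the reverberant frame.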
For open environments, we implement a robust Automatic Speaker Identification (ASI) system that is capable of handling noise and interference. In this system, we apply Discrete Transforms (DTs), namely the DCT, DST, and DWT, to degraded speech signals to extract robust features. The proposed system incorporates enhancement techniques, such as Spectral Subtraction (SS), Wiener Filtering (WF), Adaptive Wiener Filtering (AWF), and wavelet de-noising, to improve identification accuracy. The results demonstrate the effectiveness of the proposed SIS, even under challenging conditions such as low SNR and significant music interference. Leveraging features extracted from signals and their DWTs proves to be highly beneficial, achieving a recognition rate of 97.5% at 15 dB. Furthermore, wavelet de-noising contributes significantly to eliminating noise, while preserving the essential signal components, resulting in improved performance. Additionally, we conduct a thorough investigation of the system sensitivity to telephone channel degradations, as well as the impact of interference and noise. By employing the DWT and innovative modeling techniques, our research contributes to advancing robust SISs, with promising applications in domains such as security, personal assistance, and forensics.
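Spectral subtraction, the first of the enhancement techniques listed above, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the frame length, the synthetic sinusoid standing in for speech, and the non-overlapping rectangular framing are simplifying assumptions. The noise magnitude spectrum is estimated from a noise-only reference, subtracted from each noisy frame, and the frame is resynthesized with the noisy phase:

```python
import numpy as np

def spectral_subtraction(noisy, noise_ref, frame_len=256):
    """Frame-wise magnitude spectral subtraction: subtract an estimated
    noise magnitude spectrum from each frame and resynthesize using the
    noisy phase."""
    noise_mag = np.abs(np.fft.rfft(noise_ref[:frame_len]))
    out = np.zeros_like(noisy, dtype=float)
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        spec = np.fft.rfft(noisy[start:start + frame_len])
        # Half-wave rectification: clip negative magnitudes to zero
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out[start:start + frame_len] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), frame_len)
    return out

rng = np.random.default_rng(1)
t = np.arange(1024)
clean = np.sin(2 * np.pi * 0.03 * t)         # stand-in for a voiced segment
noisy = clean + 0.5 * rng.standard_normal(1024)
noise_ref = 0.5 * rng.standard_normal(1024)  # noise-only reference
enhanced = spectral_subtraction(noisy, noise_ref)
```

A practical system would use overlapping windowed frames and a spectral floor instead of hard clipping to reduce musical-noise artifacts, which is one motivation for the Wiener and wavelet-based alternatives also studied in the paper.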
Data availability
All data are available upon request from the corresponding author.
Abbreviations
- ACF: Auto-correlation Function
- AMDF: Average Magnitude Difference Function
- ANN: Artificial Neural Network
- ASI: Automatic Speaker Identification
- ASR: Automatic Speaker Recognition
- AWGN: Additive White Gaussian Noise
- AWF: Adaptive Wiener Filter
- DCT: Discrete Cosine Transform
- DFT: Discrete Fourier Transform
- DST: Discrete Sine Transform
- DT: Discrete Transform
- DWT: Discrete Wavelet Transform
- EMD: Empirical Mode Decomposition
- ENV: Envelope
- GD: Gradient Descent
- GMM: Gaussian Mixture Model
- IMF: Intrinsic Mode Function
- LMMSE: Linear Minimum Mean Square Error
- MFCC: Mel-Frequency Cepstral Coefficient
- MLP: Multi-Layer Perceptron
- MSE: Mean Square Error
- NPF: Normalized Pitch Frequency
- PL: Pooling Layer
- PLDA: Probabilistic Linear Discriminant Analysis
- PR: Perfect Reconstruction
- RASTA-PLP: Relative Spectral Transform Perceptual Linear Prediction
- RR: Recognition Rate
- SCG: Scaled Conjugate Gradient back-propagation
- SG: Savitzky-Golay
- SIS: Speaker Identification System
- SNR: Signal-to-Noise Ratio
- SS: Spectral Subtraction
- SVD: Singular Value Decomposition
- SVM: Support Vector Machine
- TFS: Temporal Fine Structure
- WF: Wiener Filter
Acknowledgements
The authors are grateful to all institutions listed in the affiliations for their support in carrying out this research work. The authors would like to thank Prince Sultan University for its support.
Funding
The authors did not receive support from any organization for the submitted work.
Contributions
All authors equally contributed.
Ethics declarations
Ethics approval
All authors contributed to this work and approved its submission.
Consent to participate
All authors contributed to this work and approved its submission.
Consent to publish
All authors agreed to submit and publish this work.
Competing interests
The authors have neither relevant financial nor non-financial interests to disclose.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shafik, A., Monir, M., El-Shafai, W. et al. Secure speaker identification in open and closed environments modeled with symmetric comb filters. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-023-16463-x