Defending Adversarial Attacks Against ASV Systems Using Spectral Masking

Sreekanth, Sankala; Sri Rama Murty, Kodukula

doi:10.1007/s00034-024-02665-7

Published: 22 April 2024

Circuits, Systems, and Signal Processing

Abstract

Automatic speaker verification (ASV) is the task of authenticating the claimed identity of a speaker from his/her voice characteristics. Despite the improved performance of deep neural network (DNN)-based ASV systems, recent investigations have exposed their vulnerability to adversarial attacks. Although the literature has suggested a few defense strategies to mitigate this threat, most works do not explain the characteristics of adversarial noise or its effect on speech signals. Understanding the effect of adversarial noise on signal characteristics helps in devising effective defense strategies. A closer analysis of adversarial noise reveals that the adversary predominantly manipulates the low-energy regions in the time–frequency representation of the test speech signal to overturn the ASV system's decision. Inspired by this observation, we employ spectral masking techniques to block the information flow from the low-energy regions of the magnitude spectrogram. We observe that an ASV system trained on masked spectral features is more robust to adversarial examples than one trained on raw features. In addition, the proposed spectral masking strategy is compared with adversarial training, the most widely used defense. The proposed method offers relative improvements of 17.6% and 23.7% over the adversarial training defense for attacks at 48 dB and 33 dB SNR, respectively. Finally, a feature sensitivity analysis is performed to demonstrate the robustness of the proposed approach against adversarial attacks.
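
The masking idea described in the abstract can be illustrated with a short sketch. It assumes a simple rule that is not taken from the paper: a time–frequency bin of the magnitude spectrogram is set to zero when it lies more than a fixed number of decibels (a hypothetical default of 40 dB) below the loudest bin of the utterance. The second helper shows how an attack strength such as "48 dB SNR" is commonly measured: the energy of the clean signal relative to the energy of the adversarial perturbation. Both function names are hypothetical, and the paper's actual masking rule, threshold, and feature pipeline may differ.

```python
import math
from typing import List, Sequence

# Rows are frames, columns are frequency bins; values are magnitudes |X(t, f)|.
Spectrogram = List[List[float]]


def spectral_mask(mag: Spectrogram, threshold_db: float = -40.0) -> Spectrogram:
    """Zero every bin more than |threshold_db| dB below the utterance's peak bin.

    ASSUMPTION: one global threshold relative to the peak magnitude is a
    hypothetical stand-in for the paper's masking rule, which is not given here.
    """
    peak = max(max(row) for row in mag)
    if peak <= 0.0:
        # Silent input: nothing to keep.
        return [[0.0 for _ in row] for row in mag]
    # 20*log10(v / peak) >= threshold_db  <=>  v >= peak * 10**(threshold_db / 20)
    floor = peak * 10.0 ** (threshold_db / 20.0)
    return [[v if v >= floor else 0.0 for v in row] for row in mag]


def perturbation_snr_db(clean: Sequence[float], adversarial: Sequence[float]) -> float:
    """SNR in dB of the perturbation delta = adversarial - clean, relative to clean."""
    if len(clean) != len(adversarial):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((a - s) ** 2 for s, a in zip(clean, adversarial))
    if noise_energy == 0.0:
        return math.inf  # no perturbation at all
    if signal_energy == 0.0:
        return -math.inf  # perturbation added to a silent signal
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    mag = [[1.0, 0.001, 0.5], [0.02, 0.2, 0.0001]]
    print(spectral_mask(mag))  # [[1.0, 0.0, 0.5], [0.02, 0.2, 0.0]]
    print(round(perturbation_snr_db([1.0] * 4, [1.1] * 4), 2))  # 20.0
```

For scale, a 48 dB perturbation has an RMS amplitude of about 10^(-48/20) ≈ 0.4% of the signal's RMS, and a 33 dB perturbation about 10^(-33/20) ≈ 2.2%. Masking against a single global peak is only one possible design; a per-frame or per-frequency threshold would retain more detail in quiet frames.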

Data availability

The data sets generated and/or analyzed during the current study are available in the VoxCeleb1 repository, https://www.robots.ox.ac.uk/~vgg/data/voxceleb. The noises used for data augmentation are available at https://www.openslr.org/17 and https://www.openslr.org/28.
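
The noise augmentation mentioned above can be sketched as additive mixing at a chosen signal-to-noise ratio. OpenSLR resources 17 and 28 are the MUSAN corpus and the room impulse response and noise database; this sketch covers only additive noise, not reverberation with impulse responses. `mix_at_snr` and `mean_power` are hypothetical names, and the SNR values and noise-selection policy the authors used are not stated here.

```python
import math
import random
from typing import List, Optional, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(
    speech: Sequence[float],
    noise: Sequence[float],
    snr_db: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Add `noise` to `speech` so that the speech-to-noise ratio equals `snr_db`.

    Hypothetical helper: the noise is tiled (wrapped around) from a random
    offset when it is shorter than the speech. The SNR range and offset
    policy used in the paper are assumptions left to the caller.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    if rng is None:
        rng = random.Random()
    start = rng.randrange(len(noise))
    segment = [noise[(start + i) % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(segment)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Choose gain g so that p_speech / (g**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, segment)]
```

In a typical recipe the target SNR is drawn at random for each training utterance, so the model sees a range of noise levels; that range is a design choice outside the scope of this sketch.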

Notes

Python WebRTC VAD interface: https://github.com/wiseman/py-webrtcvad.
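
The py-webrtcvad package referenced in this note classifies one frame at a time through `Vad.is_speech(buf, sample_rate)`, which expects 16-bit mono PCM frames of exactly 10, 20, or 30 ms at 8, 16, 32, or 48 kHz. The sketch below splits a recording into such frames and keeps only those flagged as speech. How the authors applied the VAD (aggressiveness mode, frame length, any smoothing) is not stated, so `pcm_frames` and `keep_speech` are hypothetical helpers; the detector is passed in as an argument so the sketch also runs without the package installed.

```python
from typing import Iterator, Protocol

# Input constraints documented by py-webrtcvad for Vad.is_speech().
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


class SpeechDetector(Protocol):
    def is_speech(self, buf: bytes, sample_rate: int) -> bool: ...


def pcm_frames(pcm: bytes, sample_rate: int, frame_ms: int = 30) -> Iterator[bytes]:
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame length: {frame_ms} ms")
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def keep_speech(pcm: bytes, sample_rate: int, vad: SpeechDetector, frame_ms: int = 30) -> bytes:
    """Concatenate only the frames the detector labels as speech (hypothetical helper)."""
    return b"".join(
        frame
        for frame in pcm_frames(pcm, sample_rate, frame_ms)
        if vad.is_speech(frame, sample_rate)
    )


# Example with the real package (pip install webrtcvad):
#   import webrtcvad
#   vad = webrtcvad.Vad(2)  # aggressiveness from 0 (least) to 3 (most)
#   voiced = keep_speech(pcm_bytes, 16000, vad)
```

Dropping non-speech frames outright is the simplest policy; a practical front end often pads or smooths the frame decisions so that short pauses inside words are not cut.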

Acknowledgements

This work was supported by the DST National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS), Technology Innovation Hub on Autonomous Navigation and Data Acquisition Systems: TiHAN Foundations at the Indian Institute of Technology (IIT) Hyderabad.

Author information

Authors and Affiliations

Speech Information Processing Lab, Department of Electrical Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India
Sankala Sreekanth & Kodukula Sri Rama Murty

Corresponding author

Correspondence to Sankala Sreekanth.

Ethics declarations

Conflict of interest

We declare that we have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sreekanth, S., Sri Rama Murty, K. Defending Adversarial Attacks Against ASV Systems Using Spectral Masking. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02665-7

Received: 26 July 2023
Revised: 09 March 2024
Accepted: 12 March 2024
Published: 22 April 2024
DOI: https://doi.org/10.1007/s00034-024-02665-7
