Abstract
Deep neural networks (DNNs) exhibit powerful feature extraction capabilities, making them highly advantageous in numerous tasks, and DNN-based techniques have been widely adopted in speaker recognition. However, imperceptible adversarial perturbations can severely disrupt the decisions made by DNNs. In addition, researchers have identified universal adversarial perturbations, single perturbations that attack deep neural networks efficiently and effectively across many inputs. In this paper, we propose an algorithm for conducting effective universal adversarial attacks by investigating the dominant features in the speaker recognition task. Through experiments in various scenarios, we find that our perturbations are not only more effective and harder to detect but also exhibit a certain degree of transferability across different datasets and models.
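To make the notion of a universal adversarial perturbation concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It assumes a toy linear softmax classifier over synthetic feature vectors that stand in for speaker utterances, and it crafts one L-infinity-bounded perturbation shared by all inputs using sign-gradient ascent on the mean cross-entropy loss. All names (`universal_perturbation`, `mean_loss`, `eps`, `step`) and all data are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def mean_loss(W, b, X, y):
    # Mean cross-entropy of the linear classifier on (X, y).
    logp = log_softmax(X @ W + b)
    return -logp[np.arange(len(y)), y].mean()


def universal_perturbation(W, b, X, y, eps=0.3, step=0.02, iters=50):
    # One perturbation vector shared by every row of X (untargeted attack).
    delta = np.zeros(X.shape[1])
    onehot = np.eye(W.shape[1])[y]
    for _ in range(iters):
        probs = np.exp(log_softmax((X + delta) @ W + b))
        # Gradient of the mean cross-entropy with respect to the input,
        # averaged over all samples so the update serves the whole set.
        grad = ((probs - onehot) @ W.T).mean(axis=0)
        # Ascend the loss, then project back into the L-infinity ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta


# Synthetic stand-in data (assumption): 4 "speakers", 16-dim features.
rng = np.random.default_rng(0)
n_speakers, dim = 4, 16
means = rng.normal(size=(n_speakers, dim))
y = rng.integers(0, n_speakers, size=200)
X = means[y] + 0.3 * rng.normal(size=(200, dim))

# Nearest-mean classifier written as a linear model (assumption).
W = means.T
b = -0.5 * (means ** 2).sum(axis=1)

delta = universal_perturbation(W, b, X, y, eps=0.3)


def accuracy(inputs):
    return ((inputs @ W + b).argmax(axis=1) == y).mean()


print("clean accuracy:    ", accuracy(X))
print("perturbed accuracy:", accuracy(X + delta))
```

The same `delta` is added to every input, which is what distinguishes a universal perturbation from a per-example one; the bound `eps` plays the role of the imperceptibility constraint in this toy setting.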
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-024-01274-3/MediaObjects/11280_2024_1274_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-024-01274-3/MediaObjects/11280_2024_1274_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-024-01274-3/MediaObjects/11280_2024_1274_Figa_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-024-01274-3/MediaObjects/11280_2024_1274_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11280-024-01274-3/MediaObjects/11280_2024_1274_Fig4_HTML.png)
Availability of data and materials
All datasets can be accessed through their respective official websites or mirrors. The VCTK dataset can be accessed at https://datashare.ed.ac.uk/handle/10283/3443. The TIMIT dataset can be accessed at https://catalog.ldc.upenn.edu/LDC93S1. The LibriSpeech dataset can be accessed at https://www.openslr.org/12.
Ethics declarations
Ethics approval
Not applicable
About this article
Cite this article
Liu, X., Tan, H., Zhang, J. et al. Transferable universal adversarial perturbations against speaker recognition systems. World Wide Web 27, 33 (2024). https://doi.org/10.1007/s11280-024-01274-3
DOI: https://doi.org/10.1007/s11280-024-01274-3