Abstract
Speech consists of voiced and unvoiced segments that differ in their production process and exhibit different characteristics. In this paper, we investigate the spectral differences between bona fide and spoofed speech for voiced and unvoiced speech segments. We observe that the largest spectral differences lie in the 0–4 kHz band of voiced speech. Based on this observation, we propose a low-complexity pre-processing stage that subsamples voiced frames prior to spoofing detection. The proposed pre-processing stage is applied to two systems, LFCC+GMM and IA/IF+KNN, which differ entirely in the features and classifier used for spoofing detection. Our results show that both systems improve in detecting the ASVspoof 2019 A17 voice conversion attack, which is recognized as having one of the highest spoofing capabilities. We also show improvements on the A18 and A19 voice conversion attacks for the IA/IF+KNN system. The resulting A17 EERs are lower than those of all other reported systems for which A17 is the worst-case attack, with the exception of the Capsule Network. Finally, we note that the proposed pre-processing stage reduces the speech data by more than 4× by subsampling and using only voiced frames, while maintaining a pooled EER similar to that of the baseline systems, which may be advantageous for resource-constrained spoofing detectors.
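The pre-processing stage described above (keep only voiced frames, then subsample so that only the 0–4 kHz band survives) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The 16 kHz input rate matches ASVspoof 2019 audio, but the 20 ms non-overlapping frames, the energy plus zero-crossing-rate voicing detector and its thresholds, and the 101-tap windowed-sinc anti-aliasing filter are all hypothetical choices made here for illustration. The helper names `lowpass_fir`, `voiced_mask` and `preprocess` are likewise hypothetical.

```python
import numpy as np

FS = 16_000       # ASVspoof 2019 audio is sampled at 16 kHz
FRAME_LEN = 320   # hypothetical: 20 ms non-overlapping frames
DECIM = 2         # 16 kHz -> 8 kHz keeps only the 0-4 kHz band


def lowpass_fir(num_taps: int, cutoff_hz: float, fs: float) -> np.ndarray:
    """Windowed-sinc (Hamming) low-pass FIR filter with unit gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs  # normalised cutoff in cycles/sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()


def voiced_mask(frames: np.ndarray, energy_db_floor: float = -30.0,
                zcr_max: float = 0.15) -> np.ndarray:
    """Hypothetical voicing detector: a frame counts as voiced if its energy is
    within `energy_db_floor` dB of the loudest frame and its zero-crossing rate
    (crossings per sample) is below `zcr_max`."""
    energy = np.sum(frames ** 2, axis=1) + 1e-12
    energy_db = 10 * np.log10(energy / energy.max())
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return (energy_db > energy_db_floor) & (zcr < zcr_max)


def preprocess(x: np.ndarray, fs: int = FS, frame_len: int = FRAME_LEN,
               decim: int = DECIM) -> np.ndarray:
    """Return the voiced frames of `x`, low-pass filtered to fs/(2*decim) and
    subsampled by `decim`, as an array of shape (n_voiced, frame_len // decim)."""
    if frame_len % decim:
        raise ValueError("frame length must be divisible by decim")
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    mask = voiced_mask(frames)

    # Filter the whole utterance once (avoids edge effects at frame boundaries),
    # then keep every `decim`-th sample. The cutoff sits at the new Nyquist
    # frequency; a production decimator would leave a guard band.
    h = lowpass_fir(101, fs / (2 * decim), fs)
    y = np.convolve(x, h, mode="same")[::decim]

    # Frame i of the subsampled signal covers the same time span as frame i of x.
    sub_len = frame_len // decim
    y_frames = y[: n_frames * sub_len].reshape(n_frames, sub_len)
    return y_frames[mask]


if __name__ == "__main__":
    # Synthetic utterance: 0.5 s harmonic "voiced" tone, 0.5 s white noise
    # (unvoiced-like), 0.5 s silence.
    t = np.arange(FS // 2) / FS
    voiced = (np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
              + 0.25 * np.sin(2 * np.pi * 450 * t))
    noise = 0.3 * np.random.default_rng(0).standard_normal(FS // 2)
    x = np.concatenate([voiced, noise, np.zeros(FS // 2)])

    out = preprocess(x)
    print(out.shape, x.size / out.size)  # (25, 160) 6.0
```

Because subsampling by 2 halves the sample count and only voiced frames are kept, the overall reduction factor in this sketch is `DECIM` divided by the fraction of frames that are voiced. It therefore exceeds 4× whenever fewer than half of the frames are voiced; in the synthetic example one third of the frames are voiced, giving 6×.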
Acknowledgement
This publication has emanated from research supported in part by a Grant from Science Foundation Ireland under Grant numbers 19/FFP/6775 and 13/RC/2077_P2.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Muttathu Sivasankara Pillai, A.S., De Leon, P.L., Roedig, U. (2022). Detection of Voice Conversion Spoofing Attacks Using Voiced Speech. In: Reiser, H.P., Kyas, M. (eds) Secure IT Systems. NordSec 2022. Lecture Notes in Computer Science, vol 13700. Springer, Cham. https://doi.org/10.1007/978-3-031-22295-5_9
DOI: https://doi.org/10.1007/978-3-031-22295-5_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22294-8
Online ISBN: 978-3-031-22295-5
eBook Packages: Computer Science, Computer Science (R0)