Detection of Voice Conversion Spoofing Attacks Using Voiced Speech

Conference paper
Secure IT Systems (NordSec 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13700)


Abstract

Speech consists of voiced and unvoiced segments that differ in their production process and exhibit different characteristics. In this paper, we investigate the spectral differences between bonafide and spoofed speech for voiced and unvoiced speech segments. We observe that the largest spectral differences lie in the 0–4 kHz band of voiced speech. Based on this observation, we propose a low-complexity pre-processing stage that subsamples voiced frames prior to spoofing detection. The proposed pre-processing stage is applied to two systems, LFCC+GMM and IA/IF+KNN, which differ entirely in the features and classifier used for spoofing detection. With both systems, our results show improved detection of the ASVspoof 2019 A17 voice conversion attack, which is recognized as having one of the highest spoofing capabilities. We also show improvements in detecting the A18 and A19 voice conversion attacks with the IA/IF+KNN system. The resulting A17 equal error rates (EERs) are lower than those of all reported systems for which A17 is the worst attack, with the exception of the Capsule Network. Finally, we note that the proposed pre-processing stage reduces the speech data by more than 4× through subsampling and the use of only voiced frames, while maintaining a pooled EER similar to that of the baseline systems, which may be advantageous for resource-constrained spoofing detectors.
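The abstract does not specify the voicing detector, frame sizes, or anti-aliasing filter, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes 16 kHz input (the ASVspoof 2019 LA sampling rate), 25 ms frames with a 10 ms hop, a crude energy/zero-crossing-rate voicing rule standing in for a proper voiced/unvoiced classifier, and a Hamming-windowed-sinc low-pass filter followed by 2× decimation. Halving a 16 kHz rate moves the Nyquist frequency to 4 kHz, so the retained frames keep exactly the 0–4 kHz band highlighted above. All function names and thresholds (`frame_signal`, `is_voiced`, `lowpass_decimate`, `preprocess`, `energy_ratio`, `zcr_max`) are hypothetical.

```python
import numpy as np

# Assumed parameters (not taken from the paper):
FS = 16_000      # ASVspoof 2019 LA audio is sampled at 16 kHz
FRAME_LEN = 400  # 25 ms frames at 16 kHz
HOP = 160        # 10 ms hop at 16 kHz


def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def is_voiced(frames, energy_ratio=0.1, zcr_max=0.1):
    """Hypothetical voicing rule: relatively high energy and low zero-crossing rate."""
    energy = np.sum(frames ** 2, axis=1)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    return (energy > energy_ratio * energy.max()) & (zcr < zcr_max)


def lowpass_decimate(x, factor=2, num_taps=101):
    """Hamming-windowed-sinc low-pass filter, then keep every `factor`-th sample."""
    cutoff = 0.5 / factor  # cycles/sample; 0.25 -> 4 kHz when fs = 16 kHz
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(x, h, mode="same")[::factor]


def preprocess(x, frame_len=FRAME_LEN, hop=HOP, factor=2):
    """Keep voiced frames, subsample them, and report the data reduction factor."""
    frames = frame_signal(x, frame_len, hop)
    kept = frames[is_voiced(frames)]
    if kept.shape[0] == 0:
        return np.empty((0, frame_len // factor)), float("inf")
    sub = np.stack([lowpass_decimate(f, factor) for f in kept])
    # Samples in all full-rate frames vs. samples in the retained subsampled frames.
    reduction = frames.size / sub.size
    return sub, reduction


if __name__ == "__main__":
    t = np.arange(FS) / FS
    tone = 0.5 * np.sin(2 * np.pi * 150 * t)               # 1 s "voiced" tone
    noise = np.random.default_rng(0).normal(0.0, 0.3, FS)  # 1 s "unvoiced" noise
    x = np.concatenate([tone, noise, np.zeros(FS)])        # plus 1 s of silence
    sub, reduction = preprocess(x)
    print(sub.shape, f"{reduction:.1f}x less data")
```

In this sketch, 2× decimation alone halves the data; any reduction beyond 2× comes from discarding unvoiced and silent frames, so the overall factor depends on how much of each utterance is voiced.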

Acknowledgement

This publication has emanated from research supported in part by a grant from Science Foundation Ireland under Grant numbers 19/FFP/6775 and 13/RC/2077_P2.

Author information

Correspondence to Arun Sankar Muttathu Sivasankara Pillai.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Muttathu Sivasankara Pillai, A.S., De Leon, P.L., Roedig, U. (2022). Detection of Voice Conversion Spoofing Attacks Using Voiced Speech. In: Reiser, H.P., Kyas, M. (eds) Secure IT Systems. NordSec 2022. Lecture Notes in Computer Science, vol 13700. Springer, Cham. https://doi.org/10.1007/978-3-031-22295-5_9

  • DOI: https://doi.org/10.1007/978-3-031-22295-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22294-8

  • Online ISBN: 978-3-031-22295-5

  • eBook Packages: Computer Science, Computer Science (R0)
