Detection of Voice Conversion Spoofing Attacks Using Voiced Speech

Arun Sankar Muttathu Sivasankara Pillai, Phillip L. De Leon, Utz Roedig

doi:10.1007/978-3-031-22295-5_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13700)

Included in the following conference series:

Nordic Conference on Secure IT Systems

Abstract

Speech consists of voiced and unvoiced segments that differ in their production process and exhibit different characteristics. In this paper, we investigate the spectral differences between bona fide and spoofed speech for voiced and unvoiced speech segments. We observe that the largest spectral differences lie in the 0–4 kHz band of voiced speech. Based on this observation, we propose a low-complexity pre-processing stage that subsamples voiced frames prior to spoofing detection. We apply the proposed pre-processing stage to two systems, LFCC+GMM and IA/IF+KNN, which differ entirely in the features and classifier used for spoofing detection. Our results show that both systems improve in detecting the ASVspoof 2019 A17 voice conversion attack, which is recognized as having one of the highest spoofing capabilities. We also show improvements on the A18 and A19 voice conversion attacks for the IA/IF+KNN system. The resulting A17 EERs are lower than those of all reported systems for which A17 is the worst-performing attack, except the Capsule Network. Finally, we note that, because of subsampling and the use of only voiced frames, the proposed pre-processing stage reduces the speech data by more than 4× while maintaining a pooled EER similar to that of the baseline systems, which may be advantageous for resource-constrained spoofing detectors.
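The abstract describes the pre-processing stage only at a high level, so the following is one possible reading rather than the authors' implementation. It assumes 16 kHz input (the ASVspoof 2019 sampling rate), 20 ms frames with a 10 ms hop, and a hypothetical voicing detector based on short-time energy and zero-crossing rate; decimating by 2 then keeps only the 0–4 kHz band that the abstract identifies as most discriminative. All function names and thresholds (`voiced_mask`, `energy_ratio`, `zcr_max`, `voiced_subsampled_frames`, and so on) are illustrative assumptions.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, dropping any incomplete tail."""
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def voiced_mask(frames, energy_ratio=0.1, zcr_max=0.25):
    """Hypothetical voicing heuristic (an assumption, not the paper's detector):
    a frame counts as voiced if its energy is high and its zero-crossing rate is low."""
    if frames.shape[0] == 0:
        return np.zeros(0, dtype=bool)
    energy = np.sum(frames ** 2, axis=1)
    signs = np.signbit(frames)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return (energy > energy_ratio * energy.max()) & (zcr < zcr_max)


def lowpass_decimate(x, factor=2, num_taps=101):
    """Anti-alias with a windowed-sinc FIR low-pass, then keep every `factor`-th sample."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    cutoff = 0.5 / factor  # new Nyquist frequency, in cycles per sample
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h /= h.sum()  # normalise to unit gain at DC
    return np.convolve(x, h, mode="same")[::factor]


def voiced_subsampled_frames(x, fs=16000, frame_ms=20, hop_ms=10, factor=2):
    """Return voiced frames at fs / factor (the 0-4 kHz band for 16 kHz input, factor 2)."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    # Voicing decisions are made on full-band frames at the original rate.
    mask = voiced_mask(frame_signal(x, frame_len, hop))
    # Frames at the reduced rate span the same time intervals as the originals.
    frames_low = frame_signal(lowpass_decimate(x, factor), frame_len // factor, hop // factor)
    n = min(len(mask), len(frames_low))
    return frames_low[:n][mask[:n]], fs // factor
```

Decimating by 2 halves the sample count on its own, and discarding unvoiced and silent frames removes more, so the overall reduction in this sketch depends on the voiced fraction of each utterance. The retained frames would then be passed to whatever front end the detector uses (for example, LFCC extraction for the LFCC+GMM system).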




Acknowledgement

This publication has emanated from research supported in part by a grant from Science Foundation Ireland under Grant numbers 19/FFP/6775 and 13/RC/2077_P2.

Author information

Authors and Affiliations

School of Computer Science and Information Technology, Cork, Ireland
Arun Sankar Muttathu Sivasankara Pillai & Utz Roedig
Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces, NM, USA
Phillip L. De Leon

Corresponding author

Correspondence to Arun Sankar Muttathu Sivasankara Pillai.

Editor information

Editors and Affiliations

Reykjavik University, Reykjavik, Iceland
Hans P. Reiser
Reykjavik University, Reykjavik, Iceland
Marcel Kyas

About this paper

Cite this paper

Muttathu Sivasankara Pillai, A.S., L. De Leon, P., Roedig, U. (2022). Detection of Voice Conversion Spoofing Attacks Using Voiced Speech. In: Reiser, H.P., Kyas, M. (eds) Secure IT Systems. NordSec 2022. Lecture Notes in Computer Science, vol 13700. Springer, Cham. https://doi.org/10.1007/978-3-031-22295-5_9

DOI: https://doi.org/10.1007/978-3-031-22295-5_9
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22294-8
Online ISBN: 978-3-031-22295-5
eBook Packages: Computer Science, Computer Science (R0)
