Transfer Learning Using Whisper for Dysarthric Automatic Speech Recognition

  • Conference paper
Speech and Computer (SPECOM 2023)

Abstract

Dysarthria is a motor speech disorder that affects an individual’s ability to articulate words, making speech recognition a challenging task. Automatic Speech Recognition (ASR) technologies have the potential to greatly benefit individuals with dysarthria by providing them with a means of communication through computing and portable digital devices. These technologies can serve as an interaction medium, enabling dysarthric patients to communicate with other people and with computers. In this paper, we propose a transfer learning approach using the Whisper model to develop a dysarthric ASR system. Whisper (Web-scale Supervised Pretraining for Speech Recognition) is a multitask model trained on various speech-related tasks, such as multilingual speech transcription, speech translation, voice activity detection, and language identification, using 680,000 hours of labeled audio data. Using the proposed Whisper-based approach with a Bi-LSTM classifier, we obtained an average word recognition accuracy of 59.78% on 155 words of the UA-Speech corpus.
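The abstract does not spell out the classifier's configuration, so the following is only a minimal sketch of the general idea: frame-level features from a frozen pretrained encoder are fed to a Bi-LSTM that predicts one of 155 word classes. The class name `BiLSTMWordClassifier`, the hidden size, the feature width of 512 (the encoder width of a Whisper "base"-sized model), and the random tensors standing in for real encoder outputs are all assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

NUM_WORDS = 155    # number of word classes reported in the abstract
FEATURE_DIM = 512  # assumption: encoder width of a Whisper "base"-sized model


class BiLSTMWordClassifier(nn.Module):
    """Hypothetical Bi-LSTM head over frame-level encoder features."""

    def __init__(self, feature_dim=FEATURE_DIM, hidden_dim=128, num_words=NUM_WORDS):
        super().__init__()
        self.lstm = nn.LSTM(
            feature_dim, hidden_dim, batch_first=True, bidirectional=True
        )
        self.head = nn.Linear(2 * hidden_dim, num_words)

    def forward(self, features):
        # features: (batch, frames, feature_dim)
        _, (h_n, _) = self.lstm(features)
        # h_n: (2, batch, hidden_dim) for one bidirectional layer;
        # index 0 is the forward direction, index 1 the backward direction.
        utterance = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.head(utterance)  # (batch, num_words) logits


torch.manual_seed(0)
model = BiLSTMWordClassifier()

# Random stand-ins for features extracted by a frozen pretrained encoder
# (assumption: 4 utterances, 50 frames each) and for their word labels.
features = torch.randn(4, 50, FEATURE_DIM)
labels = torch.randint(0, NUM_WORDS, (4,))

logits = model(features)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()  # only the Bi-LSTM head receives gradients here
print(logits.shape, float(loss))
```

In a transfer learning setup of this kind, the pretrained encoder would typically be kept frozen and only the small classifier head trained on the dysarthric speech data; whether the authors froze or fine-tuned Whisper is not stated in the abstract.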



Acknowledgments

The authors would like to express their sincere appreciation to the Ministry of Electronics and Information Technology (MeitY), New Delhi, Govt. of India, for supporting the project ‘Speech Technologies in Indian Languages BHASHINI’ (Grant ID: 11(1)2022-HCC (TDIL)).

Corresponding author

Correspondence to Siddharth Rathod.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Rathod, S., Charola, M., Patil, H.A. (2023). Transfer Learning Using Whisper for Dysarthric Automatic Speech Recognition. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_46

  • DOI: https://doi.org/10.1007/978-3-031-48309-7_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48308-0

  • Online ISBN: 978-3-031-48309-7

  • eBook Packages: Computer Science, Computer Science (R0)
