Abstract
Dysarthria is a motor speech disorder that impairs an individual’s ability to articulate words, which makes recognizing dysarthric speech a challenging task. Automatic Speech Recognition (ASR) technologies have the potential to greatly benefit individuals with dysarthria by giving them a means of communication through computers and portable digital devices. Such technologies can serve as an interaction medium that enables dysarthric patients to communicate with other people and with computers. In this paper, we propose a transfer learning approach that uses the Whisper model to develop a dysarthric ASR system. Whisper (Web-scale Supervised Pretraining for Speech Recognition) is a multitask model trained on 680,000 h of labeled audio data for several speech-related tasks, including multilingual speech transcription, speech translation, voice activity detection, and language identification. Using the proposed Whisper-based approach with a Bi-LSTM classifier, we obtain an average word recognition accuracy of \(59.78\%\) on 155 words of the UA-Speech Corpus.
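The pipeline summarized in the abstract (features from a pretrained speech model, followed by a Bi-LSTM classifier over a 155-word vocabulary) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `BiLSTMWordClassifier`, the feature width of 512, the hidden size of 128, and the use of the final hidden states as an utterance summary are all assumptions made for illustration, and a random tensor stands in for the pretrained encoder's output.

```python
import torch
from torch import nn

NUM_WORDS = 155    # vocabulary size stated in the abstract
FEATURE_DIM = 512  # assumption: width of the pre-extracted encoder features


class BiLSTMWordClassifier(nn.Module):
    """Hypothetical isolated-word classifier over pre-extracted speech features."""

    def __init__(self, feature_dim=FEATURE_DIM, hidden_dim=128, num_words=NUM_WORDS):
        super().__init__()
        # A single bidirectional LSTM layer reads the frame sequence in both directions.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Both directions are concatenated, hence 2 * hidden_dim inputs.
        self.head = nn.Linear(2 * hidden_dim, num_words)

    def forward(self, features):
        # features: (batch, frames, feature_dim)
        _, (h_n, _) = self.lstm(features)
        # h_n: (2, batch, hidden_dim); index 0 is the forward direction's
        # final state, index 1 the backward direction's final state.
        summary = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.head(summary)  # unnormalized scores, one per word


if __name__ == "__main__":
    model = BiLSTMWordClassifier()
    # Random stand-in for encoder outputs: 4 utterances, 50 frames each.
    dummy_features = torch.randn(4, 50, FEATURE_DIM)
    logits = model(dummy_features)
    print(logits.shape)  # torch.Size([4, 155])
```

In a transfer learning setup of this kind, the pretrained model supplies the features and only the comparatively small classifier is trained on the dysarthric speech data; whether the pretrained weights are frozen or fine-tuned is not stated in the abstract.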
References
Agarap, A.F.: Deep learning using rectified linear units (ReLU). CoRR abs/1803.08375 (2018). http://arxiv.org/abs/1803.08375. Accessed 6 Feb 2023
Bock, S., Weiß, M.: A proof of local convergence for the ADAM optimizer. In: 2019 International Joint Conference on Neural Networks, IJCNN, Budapest, Hungary, pp. 1–8 (2019)
Iwamoto, Y., Shinozaki, T.: Unsupervised spoken term discovery using Wav2Vec 2.0. In: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Tokyo, Japan, pp. 1082–1086 (2021)
Kim, H., et al.: Dysarthric speech database for universal access research. In: INTERSPEECH, Brisbane, Australia, pp. 1741–1744 (2008)
Lieberman, P.: Primate vocalizations and human linguistic ability. J. Acoust. Soc. Am. (JASA) 44(6), 1574–1584 (1968)
Lin, Y.Y., et al.: A speech command control-based recognition system for dysarthric patients based on deep learning technology. Appl. Sci. 11(6), 2477 (2021)
O’Shea, K., Nash, R.: An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015). Accessed 25 Feb 2023
Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356 (2022). Accessed 6 Mar 2023
Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
Sehgal, S., Cunningham, S.: Model adaptation and adaptive training for the recognition of dysarthric speech. In: Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies, Dresden, Germany, pp. 65–71 (2015)
Shahamiri, S.R.: Speech vision: an end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 852–861 (2021). https://doi.org/10.1109/TNSRE.2021.3076778
Torrey, L., Shavlik, J.: Transfer learning. In: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, pp. 242–264. IGI Global (2010)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (NIPS), vol. 30, Long Beach, USA (2017)
Zhang, Z., Sabuncu, M.: Generalized cross entropy loss for training deep neural networks with noisy labels. In: Advances in NIPS, vol. 31, Montreal, Canada (2018)
Zhao, Y., Kuruvilla-Dugdale, M., Song, M.: Voice conversion for persons with amyotrophic lateral sclerosis. IEEE J. Biomed. Health Inform. 24(10), 2942–2949 (2019)
Acknowledgments
The authors would like to express their sincere appreciation to the Ministry of Electronics and Information Technology (MeitY), New Delhi, Govt. of India, for the project ‘Speech Technologies in Indian Languages BHASHINI’, (Grant ID: 11(1)2022-HCC (TDIL)) for their support.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rathod, S., Charola, M., Patil, H.A. (2023). Transfer Learning Using Whisper for Dysarthric Automatic Speech Recognition. In: Karpov, A., Samudravijaya, K., Deepak, K.T., Hegde, R.M., Agrawal, S.S., Prasanna, S.R.M. (eds) Speech and Computer. SPECOM 2023. Lecture Notes in Computer Science, vol 14338. Springer, Cham. https://doi.org/10.1007/978-3-031-48309-7_46
DOI: https://doi.org/10.1007/978-3-031-48309-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48308-0
Online ISBN: 978-3-031-48309-7
eBook Packages: Computer Science (R0)