
Modeling Source and System Features Through Multi-channel Convolutional Neural Network for Improving Intelligibility Assessment of Dysarthric Speech

Published in: Circuits, Systems, and Signal Processing

Abstract

This paper investigates the spectral envelope attributes arising from the vocal-tract resonance structure and the fine-level excitation source features present in short-term Fourier transform (STFT) magnitude spectra for the assessment of dysarthria. A single-channel convolutional neural network (CNN) operating on time-frequency representations such as the STFT spectrogram (STFT-SPEC) or Mel-spectrogram (MEL-SPEC) cannot be guaranteed to capture source and system information simultaneously, because its filtering operation uses a fixed-size convolution filter. Building on this observation, this study first examines the significance of convolution filter size for a CNN-based automated dysarthria assessment system. An approach is then introduced to capture the resonance structure and fine-level features effectively through a multi-channel CNN. In the proposed approach, the STFT-SPEC is decomposed using a one-level discrete wavelet transform (DWT) to separate the slowly varying spectral structure from the fine-level features. The resulting coefficients in four directions are fed to a multi-channel CNN that captures the source and system features by employing convolution filters of different sizes. Experimental results on the UA-Speech corpus validate the efficacy of the proposed multi-channel CNN. The proposed approach yields a notable improvement in accuracy and F1 score (60.86% and 48.52%) over a single-channel CNN using STFT-SPEC (46.45% and 40.97%), MEL-SPEC (48.86% and 38.20%), and MEL-SPEC appended with delta and delta-delta coefficients (52.40% and 42.84%) for the assessment of dysarthria in a speaker-independent and text-independent mode.
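The one-level DWT step described above splits the spectrogram into four directional sub-bands: an approximation band that retains the slowly varying spectral envelope (system information) and three detail bands that retain fine-level structure (source information). The sketch below illustrates this decomposition with a plain Haar wavelet in NumPy; the specific wavelet and any normalization used by the authors are not stated in the abstract, so this is an illustrative assumption, not the paper's exact pipeline.

```python
import numpy as np

def haar_dwt2(spec):
    """One-level 2-D Haar DWT of a (freq x time) magnitude spectrogram.

    Returns four sub-bands, each half the size of the input:
    LL (approximation), LH, HL (directional details), HH (diagonal detail).
    Illustrative sketch only; the paper's wavelet choice is not specified here.
    """
    # Trim to even dimensions so non-overlapping 2x2 blocks tile the array.
    f, t = spec.shape
    x = spec[: f - f % 2, : t - t % 2]

    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # slowly varying envelope (system features)
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # fine-level detail (source features)
    return ll, lh, hl, hh

# Toy 4x4 "spectrogram" just to show the shapes of the four outputs.
spec = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(spec)
print(ll.shape)  # (2, 2)
```

Because the Haar transform above is orthonormal, the four sub-bands jointly preserve the energy of the input, so no information is discarded before the four channels are passed to the CNN branches with their different filter sizes.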


Fig. 1
Fig. 2
Fig. 3
Fig. 4


Data Availability

The speech database used for the experiments reported in this paper can be accessed via the following link: http://www.isle.illinois.edu/sst/data/UASpeech/. The data can be obtained by sending a request email to Mark Hasegawa-Johnson, specifying your username and affiliated institution.



Funding

The authors declare that they received no financial support from any source for the completion of this work.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to Gayadhar Pradhan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Ahmad, M.T., Pradhan, G. & Singh, J.P. Modeling Source and System Features Through Multi-channel Convolutional Neural Network for Improving Intelligibility Assessment of Dysarthric Speech. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02739-6


