Abstract
This paper investigates the spectral envelope characteristics arising from vocal-tract resonance structure, together with fine-level excitation-source features, within short-time Fourier transform (STFT) magnitude spectra for the assessment of dysarthria. A single-channel convolutional neural network (CNN) operating on time-frequency representations such as the STFT spectrogram (STFT-SPEC) or Mel-spectrogram (MEL-SPEC) cannot be guaranteed to capture source and system information simultaneously, because its convolution uses a fixed filter size. Motivated by this observation, this study first examines the influence of convolution filter size on a CNN-based automated dysarthria assessment system. An approach is then introduced to capture both resonance structure and fine-level features through a multi-channel CNN. In the proposed approach, the STFT-SPEC is decomposed by a one-level discrete wavelet transform (DWT) to separate the slowly varying spectral structure from the fine-level features. The resulting decomposed coefficients in four directions are taken as the inputs to a multi-channel CNN that captures source and system features using convolution filters of different sizes. Experimental results on the UA-Speech corpus validate the efficacy of the proposed multi-channel CNN approach, which achieves notable gains in accuracy and F1 score (60.86% and 48.52%) over a single-channel CNN using STFT-SPEC (46.45% and 40.97%), MEL-SPEC (48.86% and 38.20%), or MEL-SPEC appended with delta and delta-delta coefficients (52.40% and 42.84%) for dysarthria assessment in a speaker-independent and text-independent mode.
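The front-end described in the abstract can be illustrated with a minimal sketch (not the authors' code): a magnitude STFT spectrogram is computed, and a one-level 2-D DWT splits it into an approximation sub-band (the slowly varying spectral envelope) and three directional detail sub-bands (fine structure), which together form the four input channels of the multi-channel CNN. PyWavelets is assumed for the DWT; window length, hop size, and the Haar wavelet are illustrative choices, not values taken from the paper.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def stft_spec(x, n_fft=256, hop=128):
    # Magnitude STFT spectrogram via framed FFT with a Hann window.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

def dwt_channels(spec, wavelet="haar"):
    # One-level 2-D DWT: approximation coefficients carry the slowly
    # varying resonance (system) structure; the horizontal, vertical,
    # and diagonal detail coefficients carry fine-level (source) cues.
    cA, (cH, cV, cD) = pywt.dwt2(spec, wavelet)
    # Stack the four directional sub-bands as CNN input channels.
    return np.stack([cA, cH, cV, cD])

x = np.random.randn(16000)          # 1 s stand-in signal at 16 kHz
spec = stft_spec(x)                 # (129 freq bins, 124 frames)
channels = dwt_channels(spec)
print(channels.shape)               # 4 sub-bands, each halved in both axes
```

Each sub-band can then be routed to its own CNN branch with a filter size matched to the scale of the information it carries (larger filters for the envelope channel, smaller ones for the detail channels).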
Data Availability
The speech database employed for experimental assessments in this paper can be accessed via the following link: http://www.isle.illinois.edu/sst/data/UASpeech/. The data can be obtained by sending a request email to Mark Hasegawa-Johnson, specifying your username and affiliated institution.
Funding
The authors declare that they received no financial support from any source for this work.
Author information
Contributions
All authors contributed equally
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ahmad, M.T., Pradhan, G. & Singh, J.P. Modeling Source and System Features Through Multi-channel Convolutional Neural Network for Improving Intelligibility Assessment of Dysarthric Speech. Circuits Syst Signal Process (2024). https://doi.org/10.1007/s00034-024-02739-6