Abstract
Deep learning methods are data-driven, but for pathological voice detection it is difficult to obtain high-quality labeled data. In this work, a weakly supervised learning method is presented to improve the quality of existing datasets by learning sample weights and fine-grained labels. First, a convolutional neural network (CNN) is devised as the basic architecture for detecting pathological voice. Then, the proposed self-training algorithm is run iteratively to learn the sample weights and fine-grained labels automatically. The learned sample weights and fine-grained labels are then used to train the CNN model from scratch. Experimental results on the Saarbruecken Voice Database show that diagnosis accuracy improved from 75.7% to 82.5%, an absolute gain of 6.8 percentage points over CNN models trained on the original dataset. This work demonstrates that the weakly supervised learning method can significantly improve classification performance in distinguishing pathological from healthy voice.
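The abstract does not give the update rules of the self-training algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. A small logistic regression stands in for the CNN. The sample weight is assumed to be the model's agreement with the given label, and the fine-grained label is assumed to be a blend of the hard label and the model's prediction. All function names, the `blend` parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def predict_proba(X, theta, bias):
    """Probability of the positive (pathological) class."""
    return 1.0 / (1.0 + np.exp(-(X @ theta + bias)))

def train_weighted(X, y_soft, w, epochs=200, lr=0.5):
    """Train a stand-in classifier from scratch with per-sample weights
    and soft targets (weighted cross-entropy, plain gradient descent)."""
    theta = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        err = w * (predict_proba(X, theta, bias) - y_soft)
        theta -= lr * (X.T @ err) / w.sum()
        bias -= lr * err.sum() / w.sum()
    return theta, bias

def self_train(X, y_hard, rounds=3, blend=0.5):
    """Iteratively learn sample weights and soft labels (assumed rules),
    then retrain from scratch with the learned values."""
    weights = np.ones(len(y_hard))
    y_soft = y_hard.astype(float)
    for _ in range(rounds):
        theta, bias = train_weighted(X, y_soft, weights)
        p = predict_proba(X, theta, bias)
        # Assumption: weight = model agreement with the original label.
        agreement = np.where(y_hard == 1, p, 1.0 - p)
        weights = np.clip(agreement, 0.05, 1.0)
        # Assumption: fine-grained label = blend of hard label and prediction.
        y_soft = blend * y_hard + (1.0 - blend) * p
    theta, bias = train_weighted(X, y_soft, weights)
    return theta, bias, weights, y_soft

# Synthetic two-class data with 10% of the labels deliberately flipped.
rng = np.random.default_rng(0)
y_true = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 2)) + np.where(y_true[:, None] == 1, 2.0, -2.0)
flipped = np.zeros(200, dtype=bool)
flipped[rng.choice(200, size=20, replace=False)] = True
y_hard = np.where(flipped, 1 - y_true, y_true)

theta, bias, weights, y_soft = self_train(X, y_hard)
print("mean weight, flipped labels:", weights[flipped].mean())
print("mean weight, clean labels:  ", weights[~flipped].mean())
```

In this toy setting, samples whose labels were flipped end up with lower weights than clean samples, which is the kind of effect a learned sample weighting is meant to produce on a noisy dataset.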
Acknowledgements
This work was supported by the National Key R&D Program of China (2019YFC1711800) and NSFC (62171138).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wei, W., Wen, L., Qian, J., Shan, Y., Wang, J., Li, W. (2023). Improving Pathological Voice Detection: A Weakly Supervised Learning Method. In: Shao, X., Qian, K., Wang, X., Zhang, K. (eds) Proceedings of the 9th Conference on Sound and Music Technology. Lecture Notes in Electrical Engineering, vol 923. Springer, Singapore. https://doi.org/10.1007/978-981-19-4703-2_9
DOI: https://doi.org/10.1007/978-981-19-4703-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4702-5
Online ISBN: 978-981-19-4703-2
eBook Packages: Engineering, Engineering (R0)