Improving Pathological Voice Detection: A Weakly Supervised Learning Method

  • Conference paper
Proceedings of the 9th Conference on Sound and Music Technology

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 923)

Abstract

Deep learning methods are data-driven, but high-quality labeled data are difficult to obtain for pathological voice detection. In this work, a weakly supervised learning method is presented that improves the quality of existing datasets by learning sample weights and fine-grained labels. First, a convolutional neural network (CNN) is devised as the basic architecture for detecting pathological voice. Then, a proposed self-training algorithm is run iteratively to learn the sample weights and fine-grained labels automatically. The learned sample weights and fine-grained labels are then used to train the CNN model from scratch. Experimental results on the Saarbruecken Voice Database show that diagnostic accuracy improved from 75.7% to 82.5%, a gain of 6.8 percentage points over CNN models trained on the original dataset. This work demonstrates that the weakly supervised learning method can significantly improve classification performance in distinguishing pathological from healthy voice.
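The abstract does not give the details of the self-training algorithm, so the following is only a minimal sketch of the general idea of iteratively learning sample weights, not the authors' method. It makes several assumptions that are not in the paper: a weighted logistic regression stands in for the CNN, the data are synthetic two-dimensional points with some deliberately flipped labels, the weight of each sample is set to the model's confidence in its given label, and the learning of fine-grained labels is omitted entirely. All function names are hypothetical.

```python
import numpy as np

def fit_weighted_logreg(X, y, w, lr=0.5, steps=300):
    """Weighted logistic regression by gradient descent (stand-in for the CNN)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    theta = np.zeros(Xb.shape[1])              # always start from scratch
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))
        grad = Xb.T @ (w * (p - y)) / w.sum()  # gradient of weighted cross-entropy
        theta -= lr * grad
    return theta

def predict_proba(theta, X):
    """Probability of the positive (here: 'pathological') class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ theta))

def self_train_weights(X, y, rounds=5):
    """Assumed update rule: weight = model confidence in the given label."""
    w = np.ones(len(y))
    for _ in range(rounds):
        theta = fit_weighted_logreg(X, y, w)
        p = predict_proba(theta, X)
        agree = np.where(y == 1, p, 1.0 - p)   # confidence in the given label
        w = np.clip(agree, 0.05, 1.0)          # keep every sample slightly active
    theta = fit_weighted_logreg(X, y, w)       # final model trained from scratch
    return theta, w

# Synthetic data (an assumption for illustration): two clusters, 10% labels flipped.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(-2, 1, (n, 2)), rng.normal(2, 1, (n, 2))])
y_true = np.concatenate([np.zeros(n), np.ones(n)])
y_noisy = y_true.copy()
flip = rng.choice(2 * n, size=40, replace=False)
y_noisy[flip] = 1 - y_noisy[flip]

theta, w = self_train_weights(X, y_noisy)
clean = np.ones(2 * n, dtype=bool)
clean[flip] = False
accuracy = ((predict_proba(theta, X) > 0.5) == (y_true == 1)).mean()
print("mean weight, flipped labels:", w[flip].mean())
print("mean weight, clean labels:  ", w[clean].mean())
print("accuracy against true labels:", accuracy)
```

In this toy setting the samples with flipped labels end up with lower weights than the clean ones, which is the kind of effect a learned sample weight is meant to capture. How the paper actually computes weights and fine-grained labels may differ.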



Acknowledgements

This work was supported by the National Key R&D Program of China (2019YFC1711800) and NSFC (62171138).

Corresponding author

Correspondence to Wei Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wei, W., Wen, L., Qian, J., Shan, Y., Wang, J., Li, W. (2023). Improving Pathological Voice Detection: A Weakly Supervised Learning Method. In: Shao, X., Qian, K., Wang, X., Zhang, K. (eds) Proceedings of the 9th Conference on Sound and Music Technology. Lecture Notes in Electrical Engineering, vol 923. Springer, Singapore. https://doi.org/10.1007/978-981-19-4703-2_9

  • DOI: https://doi.org/10.1007/978-981-19-4703-2_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-4702-5

  • Online ISBN: 978-981-19-4703-2

  • eBook Packages: Engineering, Engineering (R0)
