Abstract
Voice pathology covers a wide range of diseases that affect the human voice. Moreover, surgery in the vicinity of the vocal apparatus may temporarily impair the voice. Although voice pathology is a common health issue, most diagnostic methods are either invasive or depend on the interpretation of a medical professional, which may vary between individuals. This article presents features that have traditionally been used to aid voice pathology diagnosis, as well as a modern approach to voice pathology detection based on image analysis and classification.
Acknowledgements
Jan Vrba acknowledges his specific university research grant JIGA 445-85-2222. Jakub Steinbach acknowledges his specific university grant (IGA) 445-88-2202.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Steinbach, J., Mazúr, R., Vrba, J. (2023). Trends in Voice Recording Classification - Comparison of Conventional Features and Image Analysis Approach. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Data Science and Algorithms in Systems. CoMeSySo 2022. Lecture Notes in Networks and Systems, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-031-21438-7_51
DOI: https://doi.org/10.1007/978-3-031-21438-7_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21437-0
Online ISBN: 978-3-031-21438-7
eBook Packages: Intelligent Technologies and Robotics (R0)