Trends in Voice Recording Classification - Comparison of Conventional Features and Image Analysis Approach

Steinbach, Jakub; Mazúr, Richard; Vrba, Jan

doi:10.1007/978-3-031-21438-7_51

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 597)

Included in the following conference series:

Proceedings of the Computational Methods in Systems and Software

Abstract

Voice pathology covers a wide range of diseases that affect the human voice. In addition, surgery near the vocal apparatus can temporarily impair the voice. Although voice pathology is a common health issue, most diagnostic methods are either invasive or depend on the interpretation of a medical professional, which may vary between individuals. This article presents features that have traditionally been used to aid voice pathology diagnosis, as well as a modern approach to voice pathology detection based on image analysis and classification.

Acknowledgements

Jan Vrba acknowledges his specific university research grant JIGA 445-85-2222. Jakub Steinbach acknowledges his specific university grant (IGA) 445-88-2202.

Author information

Authors and Affiliations

Department of Computing and Control Engineering, University of Chemistry and Technology in Prague, Technická 1905/5, 166 28, Praha 6, Czech Republic
Jakub Steinbach & Jan Vrba
Faculty of Arts, Institute of Phonetics, Charles University, náměstí Jana Palacha 2, 116 38, Praha 1, Czech Republic
Richard Mazúr

Corresponding author

Correspondence to Jakub Steinbach.

Editor information

Editors and Affiliations

Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Radek Silhavy
Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Petr Silhavy
Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Zdenka Prokopova

Cite this paper

Steinbach, J., Mazúr, R., Vrba, J. (2023). Trends in Voice Recording Classification - Comparison of Conventional Features and Image Analysis Approach. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Data Science and Algorithms in Systems. CoMeSySo 2022. Lecture Notes in Networks and Systems, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-031-21438-7_51

DOI: https://doi.org/10.1007/978-3-031-21438-7_51
Published: 04 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21437-0
Online ISBN: 978-3-031-21438-7
eBook Packages: Intelligent Technologies and Robotics (R0)
