Abstract
A significant proportion of speech communication occurs when speakers and listeners are within face-to-face proximity of one another. In noisy and reverberant environments with multiple sound sources, auditory-visual (AV) speech communication takes on increased importance because it offers the best chance for successful communication. This chapter reviews AV processing for speech understanding by normal-hearing individuals. Auditory, visual, and AV factors that influence intelligibility, such as the speech spectral regions that are most important for AV speech recognition, complementary and redundant auditory and visual speech information, AV integration efficiency, the time window for auditory (across spectrum) and AV (cross-modality) integration, and the modulation coherence between auditory and visual speech signals are each discussed. The knowledge gained from understanding the benefits and limitations of visual speech information as it applies to AV speech perception is used to propose a signal-based model of AV speech intelligibility. It is hoped that the development and refinement of quantitative models of AV speech intelligibility will increase our understanding of the multimodal processes that function every day to aid speech communication, as well as guide advances in future-generation hearing aids and cochlear implants for individuals with sensorineural hearing loss.
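One of the signal-based quantities named above is the modulation coherence between the auditory and visual speech streams. The sketch below, which is illustrative only and not the authors' model, shows one plausible way to estimate that coherence: compare the low-frequency acoustic amplitude envelope with a lip-aperture trace in the syllabic modulation band. Both input signals here are synthetic stand-ins; in practice one would substitute a recorded speech waveform and a tracked mouth-opening signal.

```python
# Illustrative sketch (assumed pipeline, not the chapter's model): estimate the
# modulation coherence between an acoustic amplitude envelope and a visual
# lip-aperture trace. Inputs are synthetic placeholders.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, coherence, resample

fs_audio = 16000          # audio sample rate (Hz)
fs_env = 100              # common rate for envelope comparison (Hz)
duration = 3.0            # seconds
t = np.arange(int(fs_audio * duration)) / fs_audio

# Synthetic "speech": noise whose amplitude is modulated at a syllabic rate (~4 Hz).
syllabic_mod = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
audio = np.random.randn(t.size) * syllabic_mod

# Acoustic amplitude envelope: Hilbert magnitude, low-pass filtered below ~16 Hz,
# then resampled to the common envelope rate.
env = np.abs(hilbert(audio))
b, a = butter(4, 16 / (fs_audio / 2), btype="low")
env = filtfilt(b, a, env)
env = resample(env, int(fs_env * duration))

# Synthetic lip-aperture trace: follows the same syllabic rhythm with a small lag.
t_v = np.arange(int(fs_env * duration)) / fs_env
lip = 0.5 * (1 + np.sin(2 * np.pi * 4 * (t_v - 0.05))) + 0.1 * np.random.randn(t_v.size)

# Magnitude-squared coherence between the two envelopes in the 2-16 Hz
# modulation band, where AV temporal correspondence is strongest.
f, Cxy = coherence(env, lip, fs=fs_env, nperseg=128)
band = (f >= 2) & (f <= 16)
print(f"Mean AV modulation coherence, 2-16 Hz: {Cxy[band].mean():.2f}")
```

A coherence near 1 in this band indicates that the visible mouth movements track the acoustic envelope closely, which is the kind of AV correspondence a signal-based intelligibility model could exploit.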
Cite this chapter
Grant, K.W., Bernstein, J.G.W. (2019). Toward a Model of Auditory-Visual Speech Intelligibility. In: Lee, A., Wallace, M., Coffin, A., Popper, A., Fay, R. (eds) Multisensory Processes. Springer Handbook of Auditory Research, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-030-10461-0_3