Toward a Model of Auditory-Visual Speech Intelligibility

Chapter in Multisensory Processes, part of the book series Springer Handbook of Auditory Research (SHAR, volume 68).

Abstract

A significant proportion of speech communication occurs when speakers and listeners are in face-to-face proximity of one another. In noisy and reverberant environments with multiple sound sources, auditory-visual (AV) speech communication takes on increased importance because it offers the best chance for successful communication. This chapter reviews AV processing for speech understanding by normal-hearing individuals. Auditory, visual, and AV factors that influence intelligibility are each discussed: the speech spectral regions most important for AV speech recognition, complementary and redundant auditory and visual speech information, AV integration efficiency, the time window for auditory (across-spectrum) and AV (cross-modality) integration, and the modulation coherence between auditory and visual speech signals. The knowledge gained from understanding the benefits and limitations of visual speech information as it applies to AV speech perception is used to propose a signal-based model of AV speech intelligibility. It is hoped that the development and refinement of quantitative models of AV speech intelligibility will increase our understanding of the multimodal processes that function every day to aid speech communication, as well as guide advances in future-generation hearing aids and cochlear implants for individuals with sensorineural hearing loss.
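To make the kind of signal-based modeling described above concrete, the sketch below computes an auditory-only intelligibility index as an importance-weighted sum of per-band audibilities (in the style of the Articulation Index / Speech Intelligibility Index) and then combines it with a visual-only index. The band importances, the SNR-to-audibility mapping, and the probability-style AV combination rule are illustrative assumptions for this toy example, not the model proposed in the chapter.

```python
# Toy sketch of an SII-style auditory-only prediction plus a simple
# auditory-visual combination. All parameter values here are
# illustrative assumptions, not the chapter's actual model.

def band_audibility(snr_db):
    """Map a per-band SNR (dB) to audibility in [0, 1], linear over the
    -15 to +15 dB range used by AI/SII-style indices."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def auditory_index(snrs_db, importances):
    """Importance-weighted sum of band audibilities (an SII-like index).
    Importances are assumed to sum to 1."""
    return sum(w * band_audibility(s) for w, s in zip(importances, snrs_db))

def av_index(a_index, v_index):
    """Probability-style combination: visual cues contribute most when
    audition carries least (a complementary-cue assumption)."""
    return a_index + v_index * (1.0 - a_index)

# Hypothetical 4-band example: low bands audible, high bands masked.
importances = [0.2, 0.3, 0.3, 0.2]
snrs = [10.0, 0.0, -10.0, -15.0]

a = auditory_index(snrs, importances)
av = av_index(a, 0.3)  # assume a visual-only index of 0.3
print(round(a, 3), round(av, 3))
```

Note how the combination rule captures the qualitative finding that visual benefit is largest at poor SNRs: as the auditory index falls, the visual term `v_index * (1 - a_index)` contributes a larger share of the AV prediction.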



Author information


Correspondence to Ken W. Grant.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Grant, K.W., Bernstein, J.G.W. (2019). Toward a Model of Auditory-Visual Speech Intelligibility. In: Lee, A., Wallace, M., Coffin, A., Popper, A., Fay, R. (eds) Multisensory Processes. Springer Handbook of Auditory Research, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-030-10461-0_3
