Contribution of speech rhythm to understanding speech in noisy conditions: Further test of a selective entrainment hypothesis

Smith, Toni M.; Shen, Yi; Williams, Christina N.; Kidd, Gary R.; McAuley, J. Devin

doi:10.3758/s13414-023-02815-0

Published: 27 November 2023

Volume 86, pages 627–642, (2024)

Attention, Perception, & Psychophysics

Toni M. Smith¹,
Yi Shen²,
Christina N. Williams²,
Gary R. Kidd³ &
J. Devin McAuley¹

Abstract

Previous work by McAuley et al. (2020, 2021) showed that disrupting the natural rhythm of target (attended) speech worsens speech recognition in the presence of competing background speech or noise (a target-rhythm effect), whereas disrupting the rhythm of background speech improves target recognition (a background-rhythm effect). Although these results were interpreted as support for the role of rhythmic regularities in facilitating target-speech recognition amidst competing backgrounds (in line with a selective entrainment hypothesis), questions remain about the factors that contribute to the target-rhythm effect. Experiment 1 ruled out the possibility that the target-rhythm effect relies on a decrease in intelligibility of the rhythm-altered keywords. Sentences from the Coordinate Response Measure (CRM) paradigm were presented in a background of speech-shaped noise, and the rhythm of the initial portion of these target sentences (the target rhythmic context) was altered while, critically, the target Color and Number keywords were left intact. Results showed a target-rhythm effect, evidenced by poorer keyword recognition when the target rhythmic context was altered, even though the keywords themselves were not rhythmically manipulated. Experiment 2 examined the influence of the relative onset asynchrony between target and background keywords. Results showed a significant target-rhythm effect that was independent of the effect of target-background keyword onset asynchrony. Experiment 3 provided additional support for the selective entrainment hypothesis by replicating the target-rhythm effect with a set of speech materials that were less rhythmically constrained than the CRM sentences.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability

Not applicable.

References

Allen, K., Carlile, S., & Alais, D. (2008). Contributions of talker characteristics and spatial location to auditory streaming. The Journal of the Acoustical Society of America, 123(3), 1562–1570.
Assmann, P. F., & Summerfield, Q. (1989). Modeling the perception of concurrent vowels: Vowels with the same fundamental frequency. The Journal of the Acoustical Society of America, 85(1), 327–338.
Assmann, P. F., & Summerfield, Q. (1990). Modeling the perception of concurrent vowels: Vowels with different fundamental frequencies. The Journal of the Acoustical Society of America, 88(2), 680–697.
Aubanel, V., Davis, C., & Kim, J. (2016). Exploring the role of brain oscillations in speech perception in noise: intelligibility of isochronously retimed speech. Frontiers in Human Neuroscience, 10, 430.
Auditech. (2015). Multitalker Noise—20 Talkers (Frank Version) [Audio recording]. https://auditec.com/2015/08/04/multitalker-noise-20-talkers-frank-version/
Baese-Berk, M. M., Dilley, L. C., Henry, M. J., Vinke, L., & Banzina, E. (2019). Not just a function of function words: Distal speech rate influences perception of prosodically weak syllables. Attention, Perception, & Psychophysics, 81(2), 571–589.
Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41, 254–311.
Bolia, R. S., Nelson, W. T., Ericson, M. A., & Simpson, B. D. (2000). A speech corpus for multitalker communications research. Journal of the Acoustical Society of America, 107, 1065–1066.
Bregman, A. S. (1990). Auditory scene analysis. MIT Press.
Brokx, J. P. L., & Nooteboom, S. G. (1982). Intonation and the perceptual separation of simultaneous voices. Journal of Phonetics, 10(1), 23–36.
Darwin, C. J. (1981). Perceptual grouping of speech components differing in fundamental frequency and onset-time. The Quarterly Journal of Experimental Psychology Section A, 33(2), 185–207.
Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation of a mistuned component. The Journal of the Acoustical Society of America, 91(6), 3381–3390.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51–62.
Desjardins, J. L., & Doherty, K. A. (2013). Age-related changes in listening effort for various types of masker noises. Ear and Hearing, 34(3), 261–272.
Dilley, L. C., & McAuley, J. D. (2008). Distal prosodic context affects word segmentation and lexical processing. Journal of Memory and Language, 59, 294–311.
Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19, 158.
Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854–11859.
Ding, N., & Simon, J. Z. (2014). Cortical entrainment to continuous speech: functional roles and interpretations. Frontiers in Human Neuroscience, 8, 311.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.
Friston, K. (2018). Does predictive coding have a future? Nature Neuroscience, 21(8), 1019–1021.
Ghitza, O. (2011). Linking speech perception and neurophysiology: Speech decoding guided by cascaded oscillators locked to the input rhythm. Frontiers in Psychology, 2, 130.
Giraud, A. L., & Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15, 511.
Golumbic, E. M. Z., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., Simon, J. Z., Poeppel, D., & Schroeder, C. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party.” Neuron, 77, 980–991.
Goswami, U. (2019). Speech rhythm and language acquisition: An amplitude modulation phase hierarchy perspective. Annals of the New York Academy of Sciences, 1453(1), 67–78.
Henry, M. J., & Herrmann, B. (2014). Low-frequency neural oscillations support dynamic attending in temporal context. Timing & Time Perception, 2(1), 62–86.
Humes, L. E., Kidd, G. R., & Fogerty, D. (2017). Exploring use of the coordinate response measure in a multitalker babble paradigm. Journal of Speech, Language, and Hearing Research, 60(3), 741–754.
Johnson, T. A., Cooper, S., Stamper, G. C., & Chertoff, M. (2017). Noise exposure questionnaire: A tool for quantifying annual noise exposure. Journal of the American Academy of Audiology, 28(1), 14–35.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83, 323–355.
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459–491.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science, 13, 313–319.
Kollmeier, B., Warzybok, A., Hochmuth, S., Zokoll, M. A., Uslar, V., Brand, T., & Wagener, K. C. (2015). The multilingual matrix test: Principles, applications, and comparison across languages: A review. International Journal of Audiology, 54(Suppl. 2), 3–16.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106, 119–159.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics, 5(3), 253–263.
McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: A comparison of interval and entrainment approaches to short-interval timing. Journal of Experimental Psychology: Human Perception and Performance, 29, 1102–1125.
McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: Life span development of timing and event tracking. Journal of Experimental Psychology: General, 135, 348–367.
McAuley, J. D., Shen, Y., Dec, S., & Kidd, G. (2020). Altering the rhythm of target and background talkers differentially affects speech understanding: Support for a selective-entrainment hypothesis. Attention, Perception, & Psychophysics, 82, 3222–3233.
McAuley, J. D., Shen, Y., Smith, T., & Kidd, G. R. (2021). Effects of speech-rhythm disruption on selective listening with a single background talker. Attention, Perception, & Psychophysics, 83(5), 2229–2240. https://doi.org/10.3758/s13414-021-02298-x
Miller, J. E., Carlson, L. A., & McAuley, J. D. (2013). When what you hear influences when you see: Listening to an auditory rhythm influences the temporal allocation of visual attention. Psychological Science, 24(1), 11–18.
Milne, A. E., Bianco, R., Poole, K. C., Zhao, S., Oxenham, A. J., Billig, A. J., & Chait, M. (2021). An online headphone screening test based on dichotic pitch. Behavior Research Methods, 53(4), 1551–1562.
Morrill, T. H., Dilley, L. C., McAuley, J. D., & Pitt, M. A. (2014). Distal rhythm influences whether or not listeners hear a word in continuous speech: Support for a perceptual grouping hypothesis. Cognition, 131, 69–74.
Noble, W., Jensen, N. S., Naylor, G., Bhullar, N., & Akeroyd, M. A. (2013). A short form of the Speech, Spatial and Qualities of Hearing scale suitable for clinical use: The SSQ12. International Journal of Audiology, 52(6), 409–412.
Peng, Z. E., Waz, S., Buss, E., Shen, Y., Richards, V., Bharadwaj, H.,..., Venezia, J. H. (2022). Remote testing for psychological and physiological acoustics. The Journal of the Acoustical Society of America, 151(5), 3116–3128.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
Reips, U. D. (2002). Standards for Internet-based experimenting. Experimental Psychology, 49(4), 243.
Riecke, L., Formisano, E., Sorger, B., Baskent, D., & Gaudrain, E. (2018). Neural entrainment to speech modulates speech intelligibility. Current Biology, 28, 161–169.
Rosen, S., Souza, P., Ekelund, C., & Majeed, A. A. (2013). Listening to speech in a background of other talkers: Effects of talker number and noise vocoding. The Journal of the Acoustical Society of America, 133(4), 2431–2443.
Schütt, H. H., Harmeling, S., Macke, J. H., & Wichmann, F. A. (2016). Painfree and accurate Bayesian estimation of psychometric functions for (potentially) overdispersed data. Vision Research, 122, 105–123.
Shen, Y., & Richards, V. M. (2012). A maximum-likelihood procedure for estimating psychometric functions: Thresholds, slopes, and lapses of attention. The Journal of the Acoustical Society of America, 132(2), 957–967.
Tilsen, S., & Arvaniti, A. (2013). Speech rhythm analysis with decomposition of the amplitude envelope: characterizing rhythmic patterns within and across languages. The Journal of the Acoustical Society of America, 134(1), 628–639.
Turgeon, M., Bregman, A. S., & Roberts, B. (2005). Rhythmic masking release: Effects of asynchrony, temporal overlap, harmonic relations, and source separation on cross-spectral grouping. Journal of Experimental Psychology: Human Perception and Performance, 31(5), 939.
Vuust, P., & Witek, M. A. (2014). Rhythmic complexity and predictive coding: A novel approach to modeling rhythm and meter perception in music. Frontiers in Psychology, 5, 1111.
Wang, M., Kong, L., Zhang, C., Wu, X., & Li, L. (2018). Speaking rhythmically improves speech recognition under “cocktail-party” conditions. The Journal of the Acoustical Society of America, 143, EL255–EL259.

Acknowledgments

The authors thank Frank Dolecki and Kyle Oliver for their assistance with data collection and helpful insights, Dylan V. Pearson at Indiana University for assistance with stimulus generation, and members of the Timing, Attention and Perception Lab at Michigan State University for their helpful suggestions and comments at various stages of this project. NIH Grant R01DC013538 (PIs: Gary R. Kidd and J. Devin McAuley), NIH Grant R01017988 (PI: Yi Shen), and the University of Washington Mary Gates Undergraduate Research Scholarship (Awardee: Christina N. Williams; Mentor: Yi Shen) supported this research.

Funding

This work was supported by NIH Grant R01DC013538 (PIs: Gary R. Kidd and J. Devin McAuley), NIH Grant R01017988 (PI: Yi Shen), and the University of Washington Mary Gates Undergraduate Research Scholarship (Awardee: Christina N. Williams; Mentor: Yi Shen).

Author information

Authors and Affiliations

Department of Psychology, Michigan State University, East Lansing, MI, USA
Toni M. Smith & J. Devin McAuley
Department of Speech and Hearing Sciences, University of Washington, Seattle, WA, USA
Yi Shen & Christina N. Williams
Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
Gary R. Kidd

Corresponding author

Correspondence to Toni M. Smith.

Ethics declarations

Conflicts of interest

The authors have no relevant financial or nonfinancial interests to disclose.

Ethics approval

This study was approved by the institutional review boards at Michigan State University and the University of Washington. The procedures of this study adhere to the principles of the Declaration of Helsinki.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent for publication

Not applicable.

Additional information

Open practices statement

The data and materials for all experiments are available upon request. None of the experiments was preregistered.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1

Experiment 1: Supplementary results

Learning analysis

At the suggestion of a reviewer, we investigated whether there were any learning effects on performance. Since intact and rhythm-altered conditions were alternated from one block to the next and there was no interaction between rhythm condition and block, F(7, 98) = 1.88, p = .082, η² = 0.118, we averaged proportion correct data over pairs of blocks. There was a small but significant learning effect, F(7, 98) = 2.86, p = .009, η² = 0.170. Follow-up Bonferroni-corrected paired t tests showed that the only comparison that approached significance was between the first (M = 0.39, SD = 0.09) and last (M = 0.49, SD = 0.12) blocks, t(14) = −3.62, p = .077, 95% CI [−0.22, 0.01]. No other differences approached significance (all p > .10).
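As a rough check on the Bonferroni-corrected comparison above, the sketch below recomputes a corrected p value from the reported t(14) = −3.62. It assumes that the block factor had 8 levels (inferred from the 7 numerator degrees of freedom) and that all 28 pairwise comparisons formed the correction family; neither figure is stated explicitly in the text. The helper names `t_pdf`, `two_sided_p`, and `bonferroni` are illustrative, and the t distribution is integrated numerically so that only the standard library is needed.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability that |T| <= |t|
    return max(0.0, 1.0 - central_mass)


def bonferroni(p_raw: float, n_comparisons: int) -> float:
    """Bonferroni adjustment: multiply by the family size and cap at 1."""
    return min(1.0, p_raw * n_comparisons)


# Assumption: 8 levels of the block factor, so C(8, 2) = 28 pairwise tests.
n_levels = 8
n_comparisons = math.comb(n_levels, 2)

p_raw = two_sided_p(-3.62, 14)  # reported first-versus-last comparison
p_adj = bonferroni(p_raw, n_comparisons)
print(f"uncorrected p = {p_raw:.4f}, corrected p = {p_adj:.3f}")
```

Under these assumptions the corrected value lands close to the reported p = .077, which suggests, but does not confirm, that the family comprised all pairwise comparisons.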

NEQ and SSQ

Supplementary analyses considered the possible relation between individual scores on the NEQ and SSQ and overall speech-in-noise performance (as well as any relation to the magnitude of the target-rhythm effect) in Experiment 1. Estimates of annual noise exposure (ANE) were calculated for each participant from their NEQ responses, based on the procedure outlined in Johnson et al. (2017). The overall SSQ score was calculated by averaging responses across all items. Separate scores were also calculated for the speech, spatial, and qualities subscales of the SSQ by averaging individual responses across only the items within a given subscale. The correlation between ANE and SSQ was significantly positive, r(13) = 0.56, p = .029. We found no relationship between estimates of ANE and overall speech-in-noise performance (averaged across target rhythm intact and target rhythm altered conditions), r(13) = −0.22, p = .44, nor between overall speech-in-noise performance and overall SSQ scores, r(13) = 0.33, p = .24. There also was no reliable correlation between overall speech-in-noise performance and any of the SSQ subscales (speech: r(13) = 0.33, p = .24; spatial: r(13) = 0.16, p = .58; qualities: r(13) = 0.30, p = .28).

The magnitude of the target-rhythm effect (performance in the rhythm intact condition minus performance in the rhythm altered condition) was similarly unrelated to ANE, r(13) = −0.08, p = .77, and to SSQ scores, r(13) = 0.28, p = .31. There was also no correlation between the magnitude of the rhythm effect and any of the three SSQ subscales (speech: r(13) = 0.26, p = .35; spatial: r(13) = 0.27, p = .33; qualities: r(13) = 0.18, p = .53).
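The significance of correlations such as those reported for Experiment 1 can be checked by converting r to a t statistic with the same degrees of freedom, t = r·√(df / (1 − r²)). The following is a minimal sketch of that conversion, not the authors' analysis code; the function names are illustrative, and the p value comes from numerical integration of the t density rather than from a statistics package.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2 * total * h / 3)


def r_to_t(r: float, df: int) -> float:
    """Convert a Pearson correlation with df = N - 2 into a t statistic."""
    return r * math.sqrt(df / (1 - r * r))


# Reported ANE-SSQ correlation: r(13) = 0.56
t_stat = r_to_t(0.56, 13)
p_value = two_sided_p(t_stat, 13)
print(f"t(13) = {t_stat:.2f}, two-sided p = {p_value:.3f}")
```

Because the published r is rounded to two decimals, the recomputed p value can differ from the reported one in the third decimal place.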

Finally, there was no evidence of a relationship between formal music training and speech-in-noise performance when overall performance was compared between participants who did not report having any formal music training (M = 0.45, SD = 0.08) and those who did (M = 0.46, SD = 0.08), t(13) = −0.42, p = .68, Cohen’s d = −0.22, 95% CI [−0.11, 0.07]. Similarly, the magnitude of the target-rhythm effect was not significantly different for participants with formal music training (M = 0.11, SD = 0.06) and without formal music training (M = 0.06, SD = 0.04), t(13) = −1.86, p = .09, Cohen’s d = −0.96, 95% CI [−0.10, 0.01].
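For an independent-samples t test with a pooled standard deviation, Cohen's d is related to t by d = t·√(1/n₁ + 1/n₂). The sketch below applies that identity to the two reported t values. The group sizes are not given in the text; the values 7 and 8 are hypothetical, chosen only because they sum to the 15 participants implied by t(13).

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d (pooled SD) recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical group sizes: only their sum (15) follows from the reported df.
n_without_training, n_with_training = 7, 8

for label, t in (("overall performance", -0.42), ("target-rhythm effect", -1.86)):
    d = cohens_d_from_t(t, n_without_training, n_with_training)
    print(f"{label}: d = {d:.2f}")
```

With these assumed group sizes the recovered effect sizes round to −0.22 and −0.96, in line with the reported values; a different split of the 15 participants would give slightly different numbers.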

Experiment 2: Supplementary results

Supplementary analyses considered the potential relations between estimated ANE from the NEQ, SSQ scores, formal music training, and overall speech-in-noise performance. To create a composite speech-in-noise score, proportion correct data were averaged for each participant across OA and rhythm intact/altered conditions. Since there were different SNR groups, we conducted partial correlations between ANE, overall SSQ score, years of formal music training, and speech-in-noise performance, controlling for SNR. Three participants were excluded from these analyses due to data quality issues with their survey responses (e.g., responding “10” to all SSQ items, or having multiple inconsistent responses on the NEQ that may have led to underestimates of ANE). There was no correlation between overall performance and ANE, r(33) = 0.244, p = .16, between overall performance and overall SSQ score, r(33) = −0.116, p = .51, or between overall performance and years of formal music training, r(33) = −0.04, p = .80. When SSQ responses were broken down into three subscales, there was still no correlation with overall proportion correct scores (speech: r(33) = 0.038, p = .83; spatial: r(33) = −0.08, p = .67; qualities: r(33) = −0.193, p = .27).
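A first-order partial correlation of the kind described above, relating two variables while controlling for a third such as SNR, can be computed from the three pairwise correlations. The sketch below shows the standard formula; the input correlations are made-up illustrative values, not data from this study, and `partial_corr` is a hypothetical helper name.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Illustrative values only: x = survey score, y = performance, z = SNR group.
r_xy, r_xz, r_yz = 0.30, 0.50, 0.40
print(f"partial r = {partial_corr(r_xy, r_xz, r_yz):.3f}")
```

When the covariate is uncorrelated with both variables the partial correlation reduces to the ordinary correlation, and when the shared dependence on the covariate fully accounts for r_xy it falls to zero.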

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Smith, T.M., Shen, Y., Williams, C.N. et al. Contribution of speech rhythm to understanding speech in noisy conditions: Further test of a selective entrainment hypothesis. Atten Percept Psychophys 86, 627–642 (2024). https://doi.org/10.3758/s13414-023-02815-0

Accepted: 03 November 2023
Published: 27 November 2023
Issue Date: February 2024
DOI: https://doi.org/10.3758/s13414-023-02815-0
