Introduction

Deep artificial neural networks and deep learning (DL) tools can be remarkably accurate in assisting or performing human tasks and can therefore be used to replicate the output of human reasoning in a computer simulation. Such DL “in-silico” models are increasingly used in neuroscience to address some of the complex neural mechanisms that subserve high-order cognitive functions in humans1,2. This can be done empirically by applying the same information as input (under the same or a different modality) to both the artificial network and the biological system and then extracting suitable “explanation” parameters from the former to predict the neural responses measured in the latter3. In the field of natural language processing (NLP), DL models have been successfully combined with neuroimaging techniques to recognize and localize some specific neural mechanisms putatively subserving language processing in the human brain4,5,6,7.

Among the available NLP models based on DL, Schrimpf et al.7 showed that the Generative Pre-trained Transformer version 2 (GPT-2)8 achieves the best performance in encoding neural signals. The architecture of the GPT-2 is based on the so-called “attention mechanism”9 and, compared to other high-performing attention-based DL models, such as BERT10, it uses only the left-side context to predict the next word in the text8. Recently, Goldstein et al.5 used electrocorticography (ECoG) to demonstrate that the GPT-2 shares similar computational principles with the human brain when processing text.

Prediction of the next word from a given sequence also occurs in the human brain, which, during natural listening, is continuously engaged in generating meaningful linguistic structures from the auditory stream of words perceived up to that moment6,11,12. In addition, it is an established principle of sensory perception that an attention mechanism is also needed in the human brain to improve the prediction mechanism via synergistic modulation of input signals13,14. Thus, if a similar principle applies to sequences of tokens (i.e., words, punctuation marks, etc.), feeding the GPT-2 with a given text and analyzing its internal processing might help to explain some of the neural processes elicited in the brains of human listeners to whom the same text is narrated.

Previous studies have mostly focused on abstracting the neural representation of specific linguistic content from the embeddings of the GPT-2 model or on empirically estimating next-word probabilities from the model output4,5,6. A preliminary study by Kumar et al.15 showed that the cascade of cortical computations during language comprehension can be explained by feeding a text to the BERT model10 and leveraging the output of its so-called attention heads, the essential internal components of an attention-based DL model that operate directly on the input words15. However, no previous study has considered the model’s “reasoning”, i.e., how the model evaluates the input words and assigns each of them responsibility for generating a given output16. To address this problem, “input saliency” methods16 can be used to generate so-called saliency scores: weights assigned to each word of the input sequence reflecting how important that word was for the model in predicting the next word16,17.

The aim of this work is to evaluate the possibility of using metrics derived from a DL model, including metrics that aim to explain its internal reasoning, to provide additional insights into the neural mechanisms underpinning language comprehension in humans. We show that the saliency scores, as obtained by feeding a pre-trained GPT-2 with the transcription of a 12-min spoken narrative, significantly explained the neural signals, as measured with functional MRI (fMRI), in the brains of human listeners. To this purpose, a previously published naturalistic fMRI data set11, from a group of Italian participants listening to a story in both forward (FW) and backward (BW) conditions, was re-analyzed using two GPT-2-derived metrics: the negative logarithm of the next-word probability (i.e., the surprisal18) and the saliency scores associated with all input words in segments of text (i.e., the context words). These measures were calculated by feeding the GPT-2 model with input contexts of varying duration, from a minimum of 15 s to a maximum of 90 s of text. The analyses revealed that the GPT-2 surprisal significantly explained the fMRI signals from an extended network of language-related areas across all time windows, whereas the saliency scores were highly selective with respect to the length of context and significantly explained the neural data only for longer time windows, especially in the superior temporal cortex. Thereby, the GPT-2 appears capable of robustly explaining brain activations associated with context-related word prediction, highlighting a mechanism that is likely pivotal to language comprehension in humans.

Materials and methods

Participants

The raw experimental data, processed and analyzed in this work, were acquired in a previous naturalistic fMRI experiment on the neural correlates of linguistic prediction during spoken narrative listening. Full details can be found in the original paper11.

All volunteers enrolled in the experiment (21 females, mean age 24.5 ± 4.5 years) were native Italian speakers with no known psychiatric or neurological conditions, normal or corrected-to-normal vision, and no hearing, developmental, or language-related problems. All participants self-reported being right-handed and were naive to the purpose of the experiment. The study was approved by the Ethics Committee of the University of Salerno and performed in accordance with the Declaration of Helsinki. Each participant signed a written informed consent to participate in the study. No part of the study procedures was pre-registered prior to the research being conducted.

Stimuli and experimental procedure

Participants listened to a short story in both its original and reversed versions while in the MRI scanner. The reversed audio waveform was selected as the control condition because it lacks meaning and linguistic content while remaining comparable to forward speech in terms of auditory characteristics11,12. To reduce possible biases due to previous knowledge of the story, a short narrative by an amateur writer was chosen as the stimulus; indeed, all subjects declared that they had no prior knowledge of the story.

Technical details of the stimuli and of the experimental procedure are reported in the Supplementary Materials.

Image acquisition and functional MRI pre-processing

MRI acquisition was performed with a 3 T scanner (Magnetom Skyra, Siemens Healthcare, Germany). Full details on the sequence parameters can be found in11.

MRI data were pre-processed using FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) and the Data Processing Assistant for Resting-State fMRI toolkit (DPARSF 5.0, http://www.rfmri.org), which is implemented in MATLAB (The MathWorks, Inc., Natick, MA, www.mathworks.com) and based on SPM12 (Wellcome Department of Imaging Neuroscience, London, UK, http://www.fil.ion.ucl.ac.uk/spm/). More details are reported in the Supplementary Materials.

Estimation of the surprisal and saliency scores

In this work, a version of the GPT-2 model for the Italian language, called GePpeTto19, was used. Although GePpeTto includes 12 layers, thus corresponding to the smallest version of GPT-2, to the best of our knowledge it is the only freely available GPT-2 model for Italian that has been trained from scratch on a large corpus encompassing different sources and styles, thus ensuring training on a mix of both standard and less-standard Italian. The model was used without additional fine-tuning. More details about the model can be found in the Supplementary Materials, whereas a complete description of the model training and testing, its parameters, and its performance can be found in19.
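As an illustration only, a minimal sketch (not the authors' exact code) of loading a pre-trained GePpeTto checkpoint with the HuggingFace Transformers library could look as follows; the model identifier is an assumption and should be replaced with the actual checkpoint used.

```python
# Minimal sketch: loading a GePpeTto (Italian GPT-2) checkpoint for inference.
# The model identifier below is an assumption, not taken from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier for GePpeTto

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()  # inference only: no additional fine-tuning, as in the study
```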

The narrative text was first tokenized (i.e., split into smaller units, such as individual words or portions of words; see Supplementary Materials for more details), and these tokens were then used to estimate the word surprisal and the corresponding saliency scores. In particular, moving one token at a time, the whole set of tokens was sampled using a sliding-window approach, with windows spanning from a minimum of 15 s to a maximum of 90 s in steps of 15 s. An interval of 15 s was selected as the window size and step because it reflects the period over which a hemodynamic event evolves20. Moreover, this size allowed us to provide the GPT-2 model with enough input to produce a reasonably reliable output: a recent study5 reported a correlation of around 0.75 between the GPT-2’s predictions and human predictions when using the average number of tokens contained in 15 s (see Table 1). Finally, an incremental step of 15 s yielded a range of six time windows covering not only short-term linguistic phenomena but also linguistic phenomena unfolding across sentences and paragraphs.
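A minimal sketch of this time-based sliding-window sampling is given below, assuming each token has been aligned to an onset time (in seconds) in the audio; the alignment step, the example text, and all variable names are illustrative assumptions rather than the authors' pipeline.

```python
# Minimal sketch of time-based sliding-window sampling of the tokenized narrative.
from transformers import AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

text = "C'era una volta un piccolo paese in riva al mare."  # illustrative snippet, not the actual narrative
token_ids = tokenizer.encode(text)

# Illustrative onsets: tokens spaced 0.4 s apart. In the study, onsets come from
# the alignment of the transcript with the audio recording.
onsets = [0.4 * i for i in range(len(token_ids))]

WINDOW_SIZES = [15, 30, 45, 60, 75, 90]  # window sizes in seconds (15 s steps)

def context_indices(target_idx: int, window_s: float) -> list[int]:
    """Indices of the context tokens preceding the target token whose onset
    falls within the chosen time window."""
    t_target = onsets[target_idx]
    return [i for i in range(target_idx) if onsets[i] >= t_target - window_s]

# Example: context tokens for the last token using the shortest (15 s) window.
ctx = context_indices(len(token_ids) - 1, WINDOW_SIZES[0])
```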

Table 1 Descriptive values of each input window.

Both the surprisal and the saliency scores of each token were estimated by providing the GPT-2 with all the previous tokens (i.e., the context tokens) contained in that specific fixed time interval.

The notion of surprisal is based on the assumption that the language comprehension system, after processing the first t−1 items (i.e., in our case, the context tokens), will be in a state that implicitly assigns a conditional probability to each potentially upcoming item21. The surprisal of a token is therefore defined as:

$$\text{surprisal}\left(\text{token}\right) = -\log_{10}\left(P\left(\text{token} \mid \text{context tokens}\right)\right)$$
(1)

If the conditional probability of the observed item is one, it means that, given its left-side context, there is no other possible item than the actual one, and thus the surprise in observing it is null. Previous studies have shown that the surprisal is parametrically linked to language-related cognitive effort or linguistic processing difficulty22,23 and that it can be successfully used to predict neural responses during narrative listening11,12.
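As a purely illustrative numerical example of Eq. (1), a fully predictable token carries zero surprisal, whereas a token assigned a probability of 0.01 carries a surprisal of 2:

$$P(\text{token} \mid \text{context}) = 1 \;\Rightarrow\; \text{surprisal} = -\log_{10}(1) = 0, \qquad P(\text{token} \mid \text{context}) = 0.01 \;\Rightarrow\; \text{surprisal} = -\log_{10}(0.01) = 2$$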

In this study, the conditional probability in formula (1) was estimated by (i) feeding the context tokens to the GPT-2, (ii) extracting the output tensor associated with the last context token and normalizing it via softmax, and (iii) selecting the probability associated with the actual next token in the narrative. The extracted tensor contained the probability of occurrence, at that point, of each token in the model vocabulary. Because of the sliding-window approach, the surprisal of the first N tokens, where N is the number of tokens spanned by the time window, was estimated by considering only the available previous tokens.

This procedure was repeated for each token and for all the six time windows.
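The three steps above can be summarized in a minimal sketch, again assuming a HuggingFace GPT-2 checkpoint for GePpeTto (the model identifier and the example sentence are assumptions, not the authors' code):

```python
# Minimal sketch of the surprisal estimation (Eq. 1) from GPT-2 output logits.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def surprisal(context_ids: list[int], next_id: int) -> float:
    """Base-10 surprisal of the next token given its left-side context."""
    with torch.no_grad():
        logits = model(torch.tensor([context_ids])).logits
    # (ii) normalize (softmax) the logits of the last context position ...
    probs = torch.softmax(logits[0, -1, :], dim=-1)
    # (iii) ... and select the probability of the actual next token.
    return -math.log10(probs[next_id].item())

# Illustrative usage on a short Italian sentence (not the actual narrative).
ids = tokenizer.encode("C'era una volta un piccolo paese")
print(surprisal(ids[:-1], ids[-1]))
```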

A similar procedure was used to estimate the saliency scores. However, while the surprisal estimation yielded a single value per token, the estimation of the saliency scores returned a vector of N values reflecting the importance (the “weight”) of each context token for the prediction of the current token, thus aiming to fulfil the need of explaining the model output16. While in the NLP field the term “saliency” is quite commonly used to describe the marginal effect of each input token on the prediction16, in linguistics it is used to indicate a diverse range of phenomena24. In this study, we adopted the interpretation more closely aligned with the NLP usage, i.e., that a salient token exerts an influence on the next-token prediction by making certain upcoming input more expected24 (for a detailed review of the associations among surprisal, attention, and salience in language processing, see24).

The saliency scores vector for each token, indicating the importance of each context token in the prediction of the current token, was obtained with the “GradientXInput” method25, using the following formula:

$$\text{Saliency vector} = \left\| \text{Grad}_{X_i}\left(f_c\left(X_{1:n}\right)\right) \odot X_i \right\|_2$$
(2)

where Xi is the embedding vector of the token at position i, X1:n denotes the embeddings of the whole input sequence, fc is the score assigned by the model to the predicted token, and Grad is the back-propagated gradient of fc with respect to Xi.
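A minimal sketch of this GradientXInput computation is shown below, using PyTorch and a HuggingFace GPT-2 checkpoint (the model identifier is an assumption); following common input-saliency implementations, the score fc is taken here as the pre-softmax logit assigned to the actual next token, which may differ in detail from the authors' implementation.

```python
# Minimal sketch of the "GradientXInput" saliency computation (Eq. 2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def saliency_scores(context_ids: list[int], next_id: int) -> torch.Tensor:
    """One saliency score per context token: L2 norm of (gradient x embedding),
    where the gradient is back-propagated from the score of the actual next
    token to each context token's input embedding."""
    input_ids = torch.tensor([context_ids])
    embeds = model.get_input_embeddings()(input_ids).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    score = logits[0, -1, next_id]             # score f_c of the actual next token
    score.backward()                           # back-propagate to the input embeddings
    grad_x_input = embeds.grad[0] * embeds[0]  # element-wise product, one row per token
    return grad_x_input.norm(dim=-1).detach()  # saliency score for each context token

# Illustrative usage (not the actual narrative).
ids = tokenizer.encode("C'era una volta un piccolo paese")
print(saliency_scores(ids[:-1], ids[-1]))
```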

The surprisal values and saliency scores were then used to build predictors of the fMRI signal in general linear model (GLM) analyses, an approach commonly adopted30 in naturalistic fMRI experiments. An ROI-based analysis was selected (over a voxel-based analysis) to mitigate the computational costs (especially in the case of the saliency score analysis). For both GLM analyses, only the time points during which the audio stimulus was present were retained, resulting in a set of 709 functional volumes, and the contrast between the FW and BW conditions was evaluated.

All statistical analyses were repeated six times, each time using the surprisal values and saliency scores associated with a specific time window. Results were considered significant at p < 0.05, Bonferroni-corrected for n = 1000 comparisons; significant values were then stored in a volumetric map (with each value assigned to the corresponding atlas’ parcellation) that was projected onto an inflated brain surface in the MNI space for visualization purposes using CAT12 (https://neuro-jena.github.io/cat/).

More details on both the workflow used to create suitable fMRI predictors from the raw surprisal and saliency values and the statistical analyses are reported in the Supplementary Materials.
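While the exact predictor-construction pipeline is described in the Supplementary Materials, the sketch below illustrates, under stated assumptions (a generic double-gamma canonical HRF, an illustrative TR, and made-up onsets and values), one typical way to turn token-level values such as surprisal into an fMRI predictor; it is not the authors' implementation.

```python
# Minimal sketch: from token-level values to an HRF-convolved fMRI predictor.
import numpy as np
from scipy.stats import gamma

def canonical_hrf(dt: float, duration: float = 32.0) -> np.ndarray:
    """Double-gamma canonical HRF sampled every dt seconds (SPM-like shape)."""
    t = np.arange(0, duration, dt)
    hrf = gamma.pdf(t, 6) - 0.1667 * gamma.pdf(t, 16)  # positive peak minus undershoot
    return hrf / hrf.sum()

def build_predictor(onsets_s, values, n_scans, tr=2.0, dt=0.1):
    """Weighted stick function at token onsets, convolved with the HRF and
    resampled at the scan times (one value per fMRI volume)."""
    grid = np.zeros(int(n_scans * tr / dt) + 1)
    for onset, val in zip(onsets_s, values):
        grid[int(onset / dt)] += val
    conv = np.convolve(grid, canonical_hrf(dt))[: len(grid)]
    scan_idx = (np.arange(n_scans) * tr / dt).astype(int)
    return conv[scan_idx]

# Illustrative usage with made-up onsets (s) and surprisal values.
predictor = build_predictor(onsets_s=[0.5, 1.1, 1.8], values=[2.3, 4.1, 3.0], n_scans=10)
```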

A graphical overview of the workflow is presented in Fig. 1.

Figure 1
figure 1

Graphical description of the analysis workflow. The text of the audiobook was first tokenized and then given in input to the GPT-2 model. For each window, the surprisal value associated with the next token in the text and the saliency scores associated with each context token were estimated. The resulting values were first processed to generate suitable predictors and then used to encode the fMRI signal elicited by both listening conditions (FW and BW) in a mixed-effects GLM group analysis (a). The conditional probability used in formula (1) to estimate the surprisal was estimated by providing the context tokens in input to the model, extracting and normalizing via softmax the tensor associated with the last context token and, finally, selecting from it the value associated with the actual next token in the narrative (b). The saliency scores associated with each context token used to predict the next token were estimated using the “GradientXInput” method by multiplying the embedding vector of the token to predict with the back-propagated gradient of the score assigned to it by the model (c). Credit for the icons from NounProject.com contained in the figure: “Sound wave” by Rabee Balakrishnan; “Audiobook” by Vectors Market; “Document” by Brian Gonzalez; “Brain” by Clockwise; “Neural Network” by Chenyu Wang.

Results

Neural data modeling with the GPT-2

The tokenization procedure resulted in 2512 tokens, of which 1589 were full words. Both surprisal and saliency scores were estimated for all six time windows; the GPT-2 was thus fed with a minimum of 53.52 ± 7.2 (mean ± std) tokens for the shortest time window (15 s) and a maximum of 301.59 ± 69.23 tokens for the largest time window (90 s). The number of tokens fed to the GPT-2 varied not only across time windows (i.e., longer time windows contained more tokens) but also across tokens because of intrinsic narrative factors such as word length, speaking rate, and speaker intonation (Table 1). The variable size of the GPT-2 input influenced the estimation of the surprisal of the next token: a significant positive correlation was observed between the average number of available tokens for each time window and the corresponding average surprisal (r = 0.93, p = 0.007), suggesting a higher uncertainty of the model with a larger number of input tokens, possibly due to greater heterogeneity of the input itself.

Finally, the increasing window size, combined with the sliding-window approach, also affected the number of fMRI time points usable for the saliency score analysis: with longer time windows, more fMRI time points were excluded from the analysis than with shorter ones (Table 1).

Surprisal analysis

Across the time windows, the average surprisal ranged from a minimum of 3.69 ± 1.71 (mean ± std), obtained with the 30 s time window, to a maximum of 3.99 ± 2.02, obtained with the 90 s time window (Table 1).

Regional activation patterns elicited by the surprisal analysis were spatially consistent across all six time windows, thereby corroborating most previous works that leveraged the outputs of the GPT-2 model with different input context lengths4,5,6,7. In total, 144 out of 1000 ROIs were significantly activated (p < 0.05, Bonferroni corrected) in at least one time window, 40 of which were significantly activated in all time windows (see Supplementary Table 1 for more details). The brain patterns associated with the different time windows differed in the number of activated ROIs, ranging from a minimum of 46 ROIs (60 s time window) to a maximum of 121 ROIs (15 s time window) (Supplementary Table 1).

The six brain patterns overlapped in the middle and superior temporal gyri (bilaterally), the anterior and posterior cingulate cortex, and the left prefrontal cortex. In line with previous findings4,6,11,12, the strongest effects for the surprisal were observed in ROIs located in the left superior and middle temporal gyrus (Fig. 2; see Supplementary Table 1 for more details), thus confirming the putative role of these ROIs in human language comprehension via the word prediction mechanism31. The involvement of higher-order areas in the frontal and parietal lobes could be further explained by the use of time windows that spanned many seconds of text and were therefore large enough to include sentences and/or paragraphs30.

Figure 2
figure 2

Regional statistical map of the contrast FW versus BW projected on a surface in the standard MNI space (left and right hemisphere). Color-coded areas indicate the atlas’ parcellations that yielded a significantly higher correlation with the surprisal predictor in the FW speech condition compared to the BW speech condition (p < 0.05, Bonferroni corrected). A left lateralized regional pattern, encompassing parcellations mainly in the temporal and frontal areas, was observed for all the time windows. Consistently across the time windows, the highest statistical values were observed in parcellations located in the left superior/middle temporal gyrus.

Saliency score analysis

Significant effects of the saliency scores were observed only for the three longest time windows (i.e., 60, 75, and 90 s). In total, 17 out of 1000 ROIs (p < 0.05, Bonferroni corrected) were significantly activated in at least one of these three windows. Most of these ROIs were activated for the 60 s time window, while 12 ROIs were activated in only one specific time window (see Supplementary Table 2 for all details). Across the time windows, the strongest effects (i.e., highest statistical values) were observed in ROIs located in the left superior and middle temporal gyri (Fig. 3, Supplementary Table 2).

Figure 3
figure 3

Regional statistical map of the contrast FW versus BW projected on a surface in the standard MNI space (left and right hemispheres). Color-coded areas indicate the atlas’ parcellations that yielded a significantly higher correlation with the saliency score predictor in the FW speech condition compared to the BW speech condition (p < 0.05, Bonferroni corrected). A left-lateralized regional pattern, encompassing parcellations mainly in the temporal areas, was observed only for the three largest time windows (i.e., 60, 75, and 90 s), whereas no significant atlas’ parcellation was observed for the shorter time windows (i.e., 15, 30, and 45 s). Consistently across the time windows where significant parcellations were observed, the highest statistical values were elicited in regions located in the superior/middle temporal gyrus.

These findings indicate that the saliency scores derived from the GPT-2 model selectively capture specific neuromodulatory processes occurring in the temporal cortex of the listeners. Considering the role of these scores in the artificial network, the salience-related effects would signal the activation of a similar weighing mechanism in both the artificial network and the brain. To the extent that this mechanism provides a window into how the artificial network accounted for the relevance of previous context words, our findings suggest that the sensory prediction model internal to the brain is similarly updated: in practice, this mechanism allows accounting for the different share of responsibility of previously stored words when collecting new sensory evidence for the generation of a prediction error14. Thus, while the magnitude of the prediction error is indexed by the surprisal, the weighting of previous words is indexed by the saliency scores. A similar process had already been postulated as necessary for word prediction to enable text comprehension in humans24 and, according to our data, it would putatively be hosted in the superior and middle temporal areas. The fact that the surprisal also showed its strongest effects in these areas further supports the idea that context-related word evaluation promotes the neural encoding of word prediction where (and probably how) it takes place.

In contrast to the surprisal analysis, the neural patterns associated with the saliency scores were not consistent across the six time windows.

Discussion

Recent studies have shown that DL language models based on the transformer architecture32 can capture key aspects of language processing in the human brain. Notably, the computations highlighted by the saliency scores do not appear to recruit a broad, domain-general network but are instead carried out by language-selective brain areas33,34 that, in our case, are the ones relatively earlier in the processing hierarchy. The lack of significant ROIs for the three shortest time windows (i.e., 15, 30, and 45 s) suggests that this weighing mechanism becomes manifest only when it would be most effective, i.e., over longer time windows, when there is a real advantage in accessing an internal model of an evolving discourse24,35. Thus, over shorter time windows, the mechanisms the brain uses to predict the incoming word are not indexed by the varying saliency scores.

In general, considering that the architecture of artificial neural networks was originally inspired by the same principles as biological neural networks, it might not be at all surprising that some specific dynamics observed in the former are somehow reflected in the functioning of the latter, albeit at a different physical scale and/or via different modalities of information exchange (e.g., reading vs. listening). Nonetheless, such evidence cannot be taken for granted in every circumstance, especially when the depth and complexity of the artificial network reach levels similar to those of the GPT-2 considered here (or beyond), and no one-to-one correspondence of computational units is assumed between the model and the brain2,36,37. The very architecture of DL models based on multi-head self-attention modules was not entirely inspired by (and does not clearly map onto) neural computations in biological networks38. Thus, we cannot interpret the computational model as a general cognitive model, nor can we reversely infer critical human features, such as abstraction and generalization, for the model2. Nevertheless, it is equally important not to dismiss the potentially useful exchange of information between neuroscience and AI tools emerging from the analysis of neuroimaging data, especially under naturalistic conditions, where even the most detailed a priori cognitive model might be difficult to apply in the explanation of neural responses1,37.

Our results will likely inspire new lines of research and applications in both the neuroscientific and AI fields. In general, the idea of using a metric, such as the saliency score, that “explains” the reasoning of the DL model to map brain functioning could be applied to improving AI models, e.g., to make human-AI interaction more useful and effective or to lead the model to take a more human-like approach39. On the other hand, by inverting the paradigm, another possible application would be the extraction of the salient features of the stimulus from the brain signal40. Finally, the possibility of generating informative predictors using parameters derived from AI models could have relevant clinical applications. For example, the presented methodology could be applied to investigate language dysfunctions in psychiatric and neurodevelopmental disorders, such as schizophrenia41 or autism spectrum disorder (ASD)42.

Our work comes with some limitations. First, although the GPT-2 model is one of the most biologically plausible models7, it has been suggested that it does not reflect the way human beings learn and manage language44. Finally, although fMRI has been successfully used in previous studies to map language processing4,15,30,43, high-temporal-resolution neuroimaging techniques, such as MEG and EEG, could provide finer-grained results, as their more time-resolved signals may be better suited to a word-based metric such as the saliency scores5,6.

Conclusions

In this work, the neural correlates of narrative comprehension in a naturalistic fMRI experiment were mapped with two metrics estimated with a state-of-the-art DL language model (GPT-2): the surprisal and the saliency scores. The results of the surprisal analysis confirm previous studies and further support the use of DL language models to explain the fMRI signal during spoken narrative listening. The analysis of the saliency scores revealed a weighing mechanism operating on the heard words that takes place in the superior and middle temporal cortex. As this mechanism explains the performance of an NLP tool in predicting new words from the current segment of text, this finding establishes a novel link between the ways the human brain and the chosen DL language model (GPT-2) build up inferences about the next incoming word. This approach works by unpacking the internal reasoning of the artificial neural network and, despite the architectural differences, appears to reveal a similar mechanism of neural processing in the brain, which may inspire novel strategies for addressing human performance in complex cognitive tasks, including natural language comprehension.