Introduction

Deep artificial neural networks and deep learning (DL) tools can be remarkably accurate in assisting or performing human tasks and can therefore be used to replicate the output of human reasoning in a computer simulation. Such DL “in-silico” models are increasingly used in neuroscience to address some of the complex neural mechanisms that subserve high-order cognitive functions in humans1,2. This can be done empirically by applying the same information as input (under the same or a different modality) to both the artificial network and the biological system and then extracting suitable “explanation” parameters from the former to predict the neural responses measured in the latter3. In the field of natural language processing (NLP), DL models have been successfully combined with neuroimaging techniques to recognize and localize some specific neural mechanisms putatively subserving language processing in the human brain4,5,6,7.

Among the available NLP models based on DL, Schrimpf et al.7 showed that the Generative Pre-trained Transformer version 2 (GPT-2)8 achieves the best performance in encoding neural signals. The architecture of the GPT-2 is based on the so-called “attention mechanism”9 and, compared to other high-performing attention-based DL models, such as BERT10, it uses only the left-side context to predict the next word in the text8. Recently, Goldstein et al.5 used electrocorticography (ECoG) to demonstrate that the GPT-2 shares similar computational principles with the human brain when processing text.

Prediction of the next word from a given sequence also occurs in the human brain, which, during natural listening, is continuously engaged in generating meaningful linguistic structures from the auditory stream of words perceived up to that moment6,11,12. In addition, it is an established principle of sensory perception that an attention mechanism is also needed in the human brain to improve the prediction mechanism via synergistic modulation of input signals13,14. Thus, if a similar principle applies to sequences of tokens (i.e., words, punctuation marks, etc.), feeding the GPT-2 with a given text and analyzing its internal processing might help to explain some of the neural processes elicited in the brains of human listeners to whom the same text is narrated.

Previous studies have mostly focused on abstracting the neural representation of specific linguistic content from the embeddings of the GPT-2 model or on empirically estimating next-word probabilities from the model output4,5,6. A preliminary study by Kumar et al.15 showed that the cascade of cortical computations during language comprehension can be explained by feeding a text to the BERT model10 and leveraging the output of its so-called attention heads, the essential internal components of an attention-based DL model that operate directly on the input words15. However, no previous study has considered the model’s “reasoning”, i.e., how the model evaluates the input words and assigns each of them responsibility for generating a given output16. To address this problem, “input saliency” methods16 can be used to generate so-called saliency scores: weights assigned to each word of the input sequence reflecting how important that word was for the model in predicting the next word16,17.

The aim of this work is to evaluate the possibility of using metrics derived from a DL model, including metrics that aim to explain its internal reasoning, to provide additional insights into the neural mechanisms underpinning language comprehension in humans. We show that the saliency scores, as obtained by feeding a pre-trained GPT-2 with the transcription of a 12-min spoken narrative, significantly explained the neural signals, as measured with functional MRI (fMRI), in the brains of human listeners. To this purpose, a previously published naturalistic fMRI data set11, from a group of Italian participants listening to a story in both forward (FW) and backward (BW) conditions, was re-analyzed using two GPT-2-derived metrics: the negative logarithm of the next-word probability (i.e., the surprisal18) and the saliency scores associated with all input words in segments of text (i.e., the context words). These measures were calculated by feeding the GPT-2 model with input contexts of varying duration, from a minimum of 15 s to a maximum of 90 s of text. The analyses revealed that the GPT-2 surprisal significantly explained the fMRI signals from an extended network of language-related areas across all time windows, whereas the saliency scores were highly selective with respect to the length of context and significantly explained the neural data only for longer time windows, especially in the superior temporal cortex. Thereby, the GPT-2 appears capable of robustly explaining brain activations associated with context-related word prediction, highlighting a mechanism that is likely pivotal to language comprehension in humans.

Materials and methods

Participants

The raw experimental data, processed and analyzed in this work, were acquired in a previous naturalistic fMRI experiment on the neural correlates of linguistic prediction during spoken narrative listening. Full details can be found in the original paper11.

All volunteers enrolled in the experiment (21 females, mean age 24.5 ± 4.5 years) were native Italian speakers with no known psychiatric or neurological conditions, normal or corrected-to-normal vision, and no hearing, developmental, or language-related problems. All participants self-reported being right-handed and were naive to the purpose of the experiment. The study was approved by the Ethics Committee of the University of Salerno and performed in accordance with the Declaration of Helsinki. Each participant signed a written informed consent to participate in the study. No part of the study procedures was pre-registered prior to the research being conducted.

Stimuli and experimental procedure

Participants listened to a short story in both its original and reversed versions while in the MRI scanner. The reversed audio waveform was selected as the control condition because it lacks meaning and linguistic content while remaining comparable to forward speech in terms of auditory characteristics11,12. To reduce possible biases due to previous knowledge of the story, a short narrative by an amateur writer was chosen as the stimulus; indeed, all subjects declared that they had no prior knowledge of the story.

Technical details of the stimuli and of the experimental procedure are reported in the Supplementary Materials.

Image acquisition and functional MRI pre-processing

MRI acquisition was performed with a 3 T scanner (Magnetom Skyra, Siemens Healthcare, Germany). Full details on the sequence parameters can be found in11.

MRI data were pre-processed using FSL (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) and the Data Processing Assistant for Resting-State fMRI toolkit (DPARSF 5.0, http://www.rfmri.org), which is implemented in MATLAB (The MathWorks, Inc., Natick, MA, www.mathworks.com) and based on SPM12 (Wellcome Department of Imaging Neuroscience, London, UK, http://www.fil.ion.ucl.ac.uk/spm/). More details are reported in the Supplementary Materials.

Estimation of the surprisal and saliency scores

In this work, a version of the GPT-2 model for the Italian language, called GePpeTto19, was used. Although GePpeTto includes 12 layers, thus corresponding to the smallest version of GPT-2, to the best of our knowledge it is the only freely available GPT-2 model for Italian that has been trained from scratch on a large corpus encompassing different sources and styles, thus ensuring training on a mix of both standard and less-standard Italian. The model was used without additional fine-tuning. More details about the model can be found in the Supplementary Materials, whereas a complete description of the model training and testing, its parameters, and its performance can be found in19.
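As an illustration only, a minimal sketch (not the authors' exact code) of loading a pre-trained GePpeTto checkpoint with the HuggingFace Transformers library could look as follows; the model identifier is an assumption and should be replaced with the actual checkpoint used.

```python
# Minimal sketch: loading a GePpeTto (Italian GPT-2) checkpoint for inference.
# The model identifier below is an assumption, not taken from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier for GePpeTto

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()  # inference only: no additional fine-tuning, as in the study
```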

The narrative text was first tokenized (i.e., split into smaller units, such as individual words or portions of words; see Supplementary Materials for more details), and these tokens were then used to estimate the word surprisal and the corresponding saliency scores. In particular, moving one token at a time, the whole set of tokens was sampled using a sliding-window approach, with windows spanning from a minimum of 15 s to a maximum of 90 s in steps of 15 s. An interval of 15 s was selected as the window size and step because it reflects the period over which a hemodynamic event evolves20. Moreover, this size allowed us to provide the GPT-2 model with enough input to produce a reasonably reliable output: a recent study5 reported a correlation of around 0.75 between the GPT-2’s predictions and human predictions when using the average number of tokens contained in 15 s (see Table 1). Finally, an incremental step of 15 s yielded a range of six time windows covering not only short-term linguistic phenomena but also linguistic phenomena unfolding across sentences and paragraphs.
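A minimal sketch of this time-based sliding-window sampling is given below, assuming each token has been aligned to an onset time (in seconds) in the audio; the alignment step, the example text, and all variable names are illustrative assumptions rather than the authors' pipeline.

```python
# Minimal sketch of time-based sliding-window sampling of the tokenized narrative.
from transformers import AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

text = "C'era una volta un piccolo paese in riva al mare."  # illustrative snippet, not the actual narrative
token_ids = tokenizer.encode(text)

# Illustrative onsets: tokens spaced 0.4 s apart. In the study, onsets come from
# the alignment of the transcript with the audio recording.
onsets = [0.4 * i for i in range(len(token_ids))]

WINDOW_SIZES = [15, 30, 45, 60, 75, 90]  # window sizes in seconds (15 s steps)

def context_indices(target_idx: int, window_s: float) -> list[int]:
    """Indices of the context tokens preceding the target token whose onset
    falls within the chosen time window."""
    t_target = onsets[target_idx]
    return [i for i in range(target_idx) if onsets[i] >= t_target - window_s]

# Example: context tokens for the last token using the shortest (15 s) window.
ctx = context_indices(len(token_ids) - 1, WINDOW_SIZES[0])
```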

Table 1 Descriptive values of each input window.

Both the surprisal and the saliency scores of each token were estimated by providing the GPT-2 with all the previous tokens (i.e., the context tokens) contained in that specific fixed time interval.

The notion of surprisal is based on the assumption that the language comprehension system, after processing the first t−1 items (i.e., in our case, the context tokens), will be in a state that implicitly assigns a conditional probability to each potentially upcoming item21. The surprisal of a token is therefore defined as:

$$\text{surprisal}\left(\text{token}\right) = -\log_{10}\left(P\left(\text{token} \mid \text{context tokens}\right)\right)$$
(1)

If the conditional probability of the observed item is one, it means that, given its left-side context, there is no other possible item than the actual one, and thus the surprise in observing it is null. Previous studies have shown that the surprisal is parametrically linked to language-related cognitive effort or linguistic processing difficulty22,23 and that it can be successfully used to predict neural responses during narrative listening11,12.
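As a purely illustrative numerical example of Eq. (1), a fully predictable token carries zero surprisal, whereas a token assigned a probability of 0.01 carries a surprisal of 2:

$$P(\text{token} \mid \text{context}) = 1 \;\Rightarrow\; \text{surprisal} = -\log_{10}(1) = 0, \qquad P(\text{token} \mid \text{context}) = 0.01 \;\Rightarrow\; \text{surprisal} = -\log_{10}(0.01) = 2$$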

In this study, the conditional probability in formula (1) was estimated by (i) feeding the context tokens to the GPT-2, (ii) extracting the output tensor associated with the last context token and normalizing it via softmax, and (iii) selecting the probability associated with the actual next token in the narrative. The extracted tensor contained the probability of occurrence, at that point, of each token in the model vocabulary. Because of the sliding-window approach, the surprisal of the first N tokens, where N is the number of tokens spanned by the time window, was estimated by considering only the available previous tokens.

This procedure was repeated for each token and for all the six time windows.
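The three steps above can be summarized in a minimal sketch, again assuming a HuggingFace GPT-2 checkpoint for GePpeTto (the model identifier and the example sentence are assumptions, not the authors' code):

```python
# Minimal sketch of the surprisal estimation (Eq. 1) from GPT-2 output logits.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def surprisal(context_ids: list[int], next_id: int) -> float:
    """Base-10 surprisal of the next token given its left-side context."""
    with torch.no_grad():
        logits = model(torch.tensor([context_ids])).logits
    # (ii) normalize (softmax) the logits of the last context position ...
    probs = torch.softmax(logits[0, -1, :], dim=-1)
    # (iii) ... and select the probability of the actual next token.
    return -math.log10(probs[next_id].item())

# Illustrative usage on a short Italian sentence (not the actual narrative).
ids = tokenizer.encode("C'era una volta un piccolo paese")
print(surprisal(ids[:-1], ids[-1]))
```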

A similar procedure was used to estimate the saliency scores. However, while the surprisal estimation yielded a single value per token, the estimation of the saliency scores returned a vector of N values reflecting the importance (the “weight”) of each context token for the prediction of the current token, thus aiming to fulfil the need of explaining the model output16. While in the NLP field the term “saliency” is quite commonly used to describe the marginal effect of each input token on the prediction16, in linguistics it is used to indicate a diverse range of phenomena24. In this study, we adopted the interpretation more closely aligned with the NLP usage, i.e., that a salient token exerts an influence on the next-token prediction by making certain upcoming input more expected24 (for a detailed review of the associations among surprisal, attention, and salience in language processing, see24).

The saliency scores vector for each token, indicating the importance of each context token in the prediction of the current token, was obtained with the “GradientXInput” method25, using the following formula:

$$\text{Saliency vector} = \left\| \text{Grad}_{X_i}\left(f_c\left(X_{1:n}\right)\right) \odot X_i \right\|_2$$
(2)

where Xi is the embedding vector of the token at position i, X1:n denotes the embeddings of the whole input sequence, fc is the score assigned by the model to the predicted token, and Grad is the back-propagated gradient of fc with respect to Xi.
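A minimal sketch of this GradientXInput computation is shown below, using PyTorch and a HuggingFace GPT-2 checkpoint (the model identifier is an assumption); following common input-saliency implementations, the score fc is taken here as the pre-softmax logit assigned to the actual next token, which may differ in detail from the authors' implementation.

```python
# Minimal sketch of the "GradientXInput" saliency computation (Eq. 2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "LorenzoDeMattei/GePpeTto"  # assumed HuggingFace identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def saliency_scores(context_ids: list[int], next_id: int) -> torch.Tensor:
    """One saliency score per context token: L2 norm of (gradient x embedding),
    where the gradient is back-propagated from the score of the actual next
    token to each context token's input embedding."""
    input_ids = torch.tensor([context_ids])
    embeds = model.get_input_embeddings()(input_ids).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    score = logits[0, -1, next_id]             # score f_c of the actual next token
    score.backward()                           # back-propagate to the input embeddings
    grad_x_input = embeds.grad[0] * embeds[0]  # element-wise product, one row per token
    return grad_x_input.norm(dim=-1).detach()  # saliency score for each context token

# Illustrative usage (not the actual narrative).
ids = tokenizer.encode("C'era una volta un piccolo paese")
print(saliency_scores(ids[:-1], ids[-1]))
```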

The surprisal values and saliency scores were then used to build predictors of the fMRI signal in general linear model (GLM) analyses, an approach commonly adopted30 in naturalistic fMRI experiments. An ROI-based analysis was selected (over a voxel-based analysis) to mitigate the computational costs (especially in the case of the saliency score analysis). For both GLM analyses, only the time points during which the audio stimulus was present were retained, resulting in a set of 709 functional volumes, and the contrast between the FW and BW conditions was evaluated.

All statistical analyses were repeated six times, each time using the surprisal values and saliency scores associated with a specific time window. Results were considered significant at p < 0.05, Bonferroni-corrected for n = 1000 comparisons; significant values were then stored in a volumetric map (with each value assigned to the corresponding atlas’ parcellation) that was projected onto an inflated brain surface in the MNI space for visualization purposes using CAT12 (https://neuro-jena.github.io/cat/).

More details on both the workflow used to create suitable fMRI predictors from the raw surprisal and saliency values and the statistical analyses are reported in the Supplementary Materials.
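While the exact predictor-construction pipeline is described in the Supplementary Materials, the sketch below illustrates, under stated assumptions (a generic double-gamma canonical HRF, an illustrative TR, and made-up onsets and values), one typical way to turn token-level values such as surprisal into an fMRI predictor; it is not the authors' implementation.

```python
# Minimal sketch: from token-level values to an HRF-convolved fMRI predictor.
import numpy as np
from scipy.stats import gamma

def canonical_hrf(dt: float, duration: float = 32.0) -> np.ndarray:
    """Double-gamma canonical HRF sampled every dt seconds (SPM-like shape)."""
    t = np.arange(0, duration, dt)
    hrf = gamma.pdf(t, 6) - 0.1667 * gamma.pdf(t, 16)  # positive peak minus undershoot
    return hrf / hrf.sum()

def build_predictor(onsets_s, values, n_scans, tr=2.0, dt=0.1):
    """Weighted stick function at token onsets, convolved with the HRF and
    resampled at the scan times (one value per fMRI volume)."""
    grid = np.zeros(int(n_scans * tr / dt) + 1)
    for onset, val in zip(onsets_s, values):
        grid[int(onset / dt)] += val
    conv = np.convolve(grid, canonical_hrf(dt))[: len(grid)]
    scan_idx = (np.arange(n_scans) * tr / dt).astype(int)
    return conv[scan_idx]

# Illustrative usage with made-up onsets (s) and surprisal values.
predictor = build_predictor(onsets_s=[0.5, 1.1, 1.8], values=[2.3, 4.1, 3.0], n_scans=10)
```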

A graphical overview of the workflow is presented in Fig. 1.

Figure 1
figure 1

Graphical description of the analysis workflow. The text of the audiobook was first tokenized and then given in input to the GPT-2 model. For each window, the surprisal value associated with the next token in the text and the saliency scores associated with each context token were estimated. The resulting values were first processed to generate suitable predictors and then used to encode the fMRI signal elicited by both listening conditions (FW and BW) in a mixed-effects GLM group analysis (a). The conditional probability used in formula (1) to estimate the surprisal was estimated by providing the context tokens in input to the model, extracting and normalizing via softmax the tensor associated with the last context token and, finally, selecting from it the value associated with the actual next token in the narrative (b). The saliency scores associated with each context token used to predict the next token were estimated using the “GradientXInput” method by multiplying the embedding vector of the token to predict with the back-propagated gradient of the score assigned to it by the model (c). Credit for the icons from NounProject.com contained in the figure: “Sound wave” by Rabee Balakrishnan; “Audiobook” by Vectors Market; “Document” by Brian Gonzalez; “Brain” by Clockwise; “Neural Network” by Chenyu Wang.

Results

Neural data modeling with the GPT-2

The tokenization procedure resulted in 2512 tokens, of which 1589 were full words. Both surprisal and saliency scores were estimated for all six time windows; the GPT-2 was thus fed with a minimum of 53.52 ± 7.2 (mean ± std) tokens for the shortest time window (15 s) and a maximum of 301.59 ± 69.23 tokens for the largest time window (90 s). The number of tokens fed to the GPT-2 varied not only across time windows (i.e., longer time windows contained more tokens) but also across tokens because of intrinsic narrative factors such as word length, speaking rate, and speaker intonation (Table 1). The variable size of the GPT-2 input influenced the estimation of the surprisal of the next token: a significant positive correlation was observed between the average number of available tokens for each time window and the corresponding average surprisal (r = 0.93, p = 0.007), suggesting a higher uncertainty of the model with a larger number of input tokens, possibly due to greater heterogeneity of the input itself.

Finally, the increasing window size, combined with the sliding-window approach, also affected the number of fMRI time points usable for the saliency score analysis: with longer time windows, more fMRI time points were excluded from the analysis than with shorter ones (Table 1).

Surprisal analysis

Across the time windows, the average surprisal ranged from a minimum of 3.69 ± 1.71 (mean ± std), obtained with the 30 s time window, to a maximum of 3.99 ± 2.02, obtained with the 90 s time window (Table 1).

Regional activation patterns elicited by the surprisal analysis were spatially consistent across all six time windows, thereby corroborating most previous works that leveraged the outputs of the GPT-2 model with different input context lengths4,5,6,7. In total, 144 out of 1000 ROIs were significantly activated (p < 0.05, Bonferroni corrected) in at least one time window, 40 of which were significantly activated in all time windows (see Supplementary Table 1 for more details). The brain patterns associated with the different time windows differed in the number of activated ROIs, ranging from a minimum of 46 ROIs (60 s time window) to a maximum of 121 ROIs (15 s time window) (Supplementary Table 1).

The six brain patterns overlapped in the middle and superior temporal gyri (bilaterally), the anterior and posterior cingulate cortex, and the left prefrontal cortex. In line with previous findings4,6,11,12, the strongest effects for the surprisal were observed in ROIs located in the left superior and middle temporal gyrus (Fig. 2; see Supplementary Table 1 for more details), thus confirming the putative role of these ROIs in human language comprehension via the word prediction mechanism31. The involvement of higher-order areas in the frontal and parietal lobes could be further explained by the use of time windows that spanned many seconds of text and were therefore large enough to include sentences and/or paragraphs30.

Figure 2
figure 2

Regional statistical map of the contrast FW versus BW projected on a surface in the standard MNI space (left and right hemisphere). Color-coded areas indicate the atlas’ parcellations that yielded a significantly higher correlation with the surprisal predictor in the FW speech condition compared to the BW speech condition (p < 0.05, Bonferroni corrected). A left lateralized regional pattern, encompassing parcellations mainly in the temporal and frontal areas, was observed for all the time windows. Consistently across the time windows, the highest statistical values were observed in parcellations located in the left superior/middle temporal gyrus.

Saliency score analysis

Significant effects of the saliency scores were observed only for the three longest time windows (i.e., 60, 75, and 90 s). In total, 17 out of 1000 ROIs (p < 0.05, Bonferroni corrected) were significantly activated in at least one of these three windows. Most of these ROIs were activated for the 60 s time window, while 12 ROIs were activated in only one specific time window (see Supplementary Table 2 for all details). Across the time windows, the strongest effects (i.e., highest statistical values) were observed in ROIs located in the left superior and middle temporal gyri (Fig. 3, Supplementary Table 2).

Figure 3
figure 3

Regional statistical map of the contrast FW versus BW projected on a surface in the standard MNI space (left and right hemispheres). Color-coded areas indicate the atlas’ parcellations that yielded a significantly higher correlation with the saliency score predictor in the FW speech condition compared to the BW speech condition (p < 0.05, Bonferroni corrected). A left-lateralized regional pattern, encompassing parcellations mainly in the temporal areas, was observed only for the three largest time windows (i.e., 60, 75, and 90 s), whereas no significant atlas’ parcellation was observed for the shorter time windows (i.e., 15, 30, and 45 s). Consistently across the time windows where significant parcellations were observed, the highest statistical values were elicited in regions located in the superior/middle temporal gyrus.

These findings indicate that the saliency scores derived from the GPT-2 model selectively capture specific neuromodulatory processes occurring in the temporal cortex of the listeners. Considering the role of these scores in the artificial network, the salience-related effects would signal the activation of a similar weighing mechanism in both the artificial network and the brain. To the extent that this mechanism provides a window into how the artificial network accounted for the relevance of previous context words, our findings suggest that the sensory prediction model internal to the brain is similarly updated: in practice, this mechanism allows accounting for the different share of responsibility of previously stored words when collecting new sensory evidence for the generation of a prediction error14. Thus, while the magnitude of the prediction error is indexed by the surprisal, the weighting of previous words is indexed by the saliency scores. A similar process had already been postulated as necessary for word prediction to enable text comprehension in humans24 and, according to our data, it would putatively be hosted in the superior and middle temporal areas. The fact that the surprisal also showed its strongest effects in these areas further supports the idea that context-related word evaluation promotes the neural encoding of word prediction where (and probably how) it takes place.

In contrast to the surprisal analysis, the neural patterns associated with the saliency scores were not consistent across the six time windows.

Discussion

Recent studies have shown that DL language models based on the transformer architecture32 can capture key aspects of language processing in the human brain. Notably, the computations highlighted by the saliency scores do not appear to recruit a broad, domain-general network but are instead carried out by language-selective brain areas33,34 that, in our case, are the ones relatively earlier in the processing hierarchy. The lack of significant ROIs for the three shortest time windows (i.e., 15, 30, and 45 s) suggests that this weighing mechanism becomes manifest only when it would be most effective, i.e., over longer time windows, when there is a real advantage in accessing an internal model of an evolving discourse24,35. Thus, over shorter time windows, the mechanisms the brain uses to predict the incoming word are not indexed by the varying saliency scores.

In general, considering that the architecture of artificial neural networks was originally inspired by the same principles as biological neural networks, it might not be at all surprising that some specific dynamics observed in the former are somehow reflected in the functioning of the latter, albeit at a different physical scale and/or via different modalities of information exchange (e.g., reading vs. listening). Nonetheless, such evidence cannot be taken for granted in every circumstance, especially when the depth and complexity of the artificial network reach levels similar to those of the GPT-2 considered here (or beyond), and no one-to-one correspondence of computational units is assumed between the model and the brain2,36,37. The very architecture of DL models based on multi-head self-attention modules was not entirely inspired by (and does not clearly map onto) neural computations in biological networks38. Thus, we cannot interpret the computational model as a general cognitive model, nor can we reversely infer critical human features, such as abstraction and generalization, for the model2. Nevertheless, it is equally important not to dismiss the potentially useful exchange of information between neuroscience and AI tools emerging from the analysis of neuroimaging data, especially under naturalistic conditions, where even the most detailed a priori cognitive model might be difficult to apply in the explanation of neural responses1,37.

Our results will likely inspire new lines of research and applications in both the neuroscientific and AI fields. In general, the idea of using a metric, such as the saliency score, that “explains” the reasoning of the DL model to map brain functioning could be applied to improving AI models, e.g., to make human-AI interaction more useful and effective or to lead the model to take a more human-like approach39. On the other hand, by inverting the paradigm, another possible application would be the extraction of the salient features of the stimulus from the brain signal40. Finally, the possibility of generating informative predictors using parameters derived from AI models could have relevant clinical applications. For example, the presented methodology could be applied to investigate language dysfunctions in psychiatric and neurodevelopmental disorders, such as schizophrenia41 or autism spectrum disorder (ASD)42.

Our work comes with some limitations. First, although the GPT-2 model is one of the most biologically plausible models7, it has been suggested that it does not reflect the way human beings learn and manage language44. Finally, although fMRI has been successfully used in previous studies to map language processing4,15,30,43, high-temporal-resolution neuroimaging techniques, such as MEG and EEG, could provide finer-grained results, as their more time-resolved signals may be better suited to a word-based metric such as the saliency scores5,6.

Conclusions

In this work, the neural correlates of narrative comprehension in a naturalistic fMRI experiment were mapped with two metrics estimated with a state-of-the-art DL language model (GPT-2): the surprisal and the saliency scores. The results of the surprisal analysis confirm previous studies and further support the use of DL language models to explain the fMRI signal during spoken narrative listening. The analysis of the saliency scores revealed a weighing mechanism operating on the heard words that takes place in the superior and middle temporal cortex. As this mechanism explains the performance of an NLP tool in predicting new words from the current segment of text, this finding establishes a novel link between the ways the human brain and the chosen DL language model (GPT-2) build up inferences about the next incoming word. This approach works by unpacking the internal reasoning of the artificial neural network and, despite the architectural differences, appears to reveal a similar mechanism of neural processing in the brain, which may inspire novel strategies for addressing human performance in complex cognitive tasks, including natural language comprehension.