Introduction

Episodic memory changes over time. Converging lines of evidence from lesion studies in rodents1,2, human neuroimaging studies3,4,5 and studies in amnesic patients6,7 indicate that episodic memories undergo a time-dependent neural reorganization. While memories are initially dependent on the hippocampus, they become more dependent on neocortical structures, such as the ventromedial prefrontal cortex (vmPFC)8,9,10, inferior frontal gyrus (IFG)4,5, anterior cingulate cortex (aCC)2,11,12,13, angular gyrus and precuneus14,15, as time after encoding proceeds. Whether remote memories become entirely independent of the hippocampus is still debated16,17,18 and, intriguingly, initial evidence points to the possibility of a time-dependent reorganization of memories within the hippocampus, from anterior to posterior parts19,20. Critically, the neural reorganization of memory is thought to be accompanied by a transformation from a detailed episodic memory trace to a more gist-like representation16,17. Such qualitative changes over time are a fundamental aspect of memory and may promote the building of abstract knowledge networks4. Moreover, they have highly relevant implications, for instance, for eyewitness testimony or the generalized memory for aversive events in mental disorders.

The nature of these qualitative changes of memories over time remains, however, elusive. One possible mechanism is a perceptual transformation, in which a detailed, perceptually rich episodic trace evolves over time into a less specific trace that contains knowledge of general perceptual features of the original event (e.g. ‘I remember the painting contained a lot of red and brown’). Indeed, the hippocampus is critically implicated in remembering perceptual details21 and the perceptual transformation perspective may be close to the common view that memories fade away and simply lose (perceptual) detail over time22. Alternatively, with time, memories may not be just a perceptually degraded version of the original trace but become semantically transformed into representations that carry the semantic gist, with only minimal (detailed or generalized) perceptual information (e.g. ‘I remember the painting showed an apple on a table’). This semantization of memories over time may provide a better explanation of how episodic experiences are integrated into abstract knowledge structures than a mere decay of (perceptual) features of a memory trace. While prominent theoretical accounts appear to favor the semantic transformation view16,17, there is a lack of clear empirical evidence for a semantic transformation of memory over time. Previous studies on time-dependent memory transformation in humans or rodents used test stimuli that were both semantically and perceptually similar to the original event and could thus not distinguish between different mechanisms of transformation. Thus, whether the transformation of memory over time is perceptual or semantic in nature (or both) remains unclear.

In the present experiment, we aimed to elucidate the nature and neural signature of time-dependent memory transformation. Specifically, we sought to determine whether there is a semantic or a perceptual transformation of the original memory over time. Moreover, because emotional arousal has been shown, on the one hand, to enhance memory for the gist of the event at the cost of reduced memory for peripheral features23,24,25 but, on the other hand, to increase memory specificity in the long run2,26, we further tested whether the nature of memory transformation over time, as well as its neural underpinnings, would differ depending on the level of emotionality of the encoded material. To this end, we tested participants’ recognition memory for emotionally neutral and negative pictures either 1d or 28d after encoding. As the neural reorganization of memories can be expected to be much further progressed 28d compared to 1d after encoding2,19, varying the delay between encoding and recognition testing allowed us to probe time-dependent memory transformation. Critically, this recognition test included, in addition to initially encoded and entirely new pictures, also lures that were either perceptually or semantically related to the original stimuli. Encoding as well as memory testing took place in an MRI scanner, enabling us to analyze time-dependent changes in the reinstatement of encoding patterns and the specificity of memory representations during memory testing by leveraging multivariate fMRI-analysis approaches. A perceptual transformation would be indicated if, with increasing delay after encoding, perceptually related, but not semantically related, items are endorsed as ‘old’. Conversely, a semantic transformation would be indicated if participants endorse semantically related, but not perceptually related, items as ‘old’.

Here, we show that episodic memories are semantically transformed over time, while we obtained no credible evidence for a perceptual transformation. This time-dependent semantization of memories was further enhanced for emotionally negative compared to neutral stimuli. At the neural level, the time-dependent transformation of memories was reflected in semantic, gist-like representations of remote memories in prefrontal as well as parietal neocortical storage sites. The anterior hippocampus was associated with distinct representations of encoded events that declined with increasing delay after encoding. Posterior hippocampal memory reinstatement increased over time and was associated with less specific memory representations that were linked to the semantic gist of the original memory, again without evidence for a reliable effect of the perceptual gist.

Results

To elucidate whether episodic memories are semantically or perceptually transformed over time and whether this process is equally evident for emotionally neutral and negative pictures, we performed a 3-day study: Day 1—encoding of emotionally neutral or negative pictures in the MRI scanner; Day 2 (either 1d or 28d after Day 1)—recognition testing in the MRI scanner; Day 3—individual assessment of the semantic and perceptual relatedness of the stimulus material. In order to dissociate semantic and perceptual mechanisms of time-dependent memory transformation, the recognition test included, in addition to original and entirely novel items, items that were either perceptually or semantically related to the original pictures. Each originally encoded picture corresponded precisely to one semantically related, one perceptually related and one unrelated picture, matching the original picture in terms of the level of emotionality and other relevant features (see methods section). The semantic and perceptual relatedness of each originally encoded item to its corresponding semantically related, perceptually related, or unrelated lure was tested in an independent behavioral pilot study (n = 32 participants), which confirmed that semantically related items were rated as significantly more semantically related but significantly less perceptually related to the original items than perceptually related items (see Supplementary Fig. 1).

On the first experimental day, 52 healthy, right-handed young adults (26 females, 26 males, age: M = 24.29 years, SEM = 0.55 years) encoded 60 pictures (30 emotionally neutral, 30 emotionally negative) in an MRI scanner, each presented for 3 s in each of three consecutive runs (see Fig. 1). To control for alertness during encoding, participants were instructed to respond with a button press as soon as a fixation cross appeared between trials. On average, participants missed only 1.48 (SEM = 0.43) responses across all trials and runs, indicating that participants were attentive during encoding, without statistically significant differences between 1d- and 28d-groups (main effect delay: F(1, 50) = 1.46, p = 0.233, \({\eta }_{p}^{2}\) = 0.03, 95% Confidence Interval: [9e–05, 0.18]; delay × run: F(1.87, 93.71) = 0.84, p = 0.429, \({\eta }_{p}^{2}\) = 0.02, 95% Confidence Interval: [0.001, 0.12]; mixed ANOVA). To ensure that the 1d- and 28d-groups did not differ in initial encoding, we asked participants to recall as many of the pictures as possible immediately after the encoding session. In this immediate free recall test, participants recalled on average 50.99% (SEM = 2.21%) of the 60 previously encoded items. A mixed ANOVA with the between-subjects factor delay (1d vs. 28d) and the within-subject factor emotion (neutral vs. negative) did not indicate a statistically significant difference between delay groups in immediate memory performance (main effect delay: F(1, 50) = 0.17, p = 0.678, \({\eta }_{p}^{2}\) = 0.003, 95% Confidence Interval: [2e–05, 0.11]; delay × emotion: F(1, 50) = 1.13, p = 0.293, \({\eta }_{p}^{2}\) = 0.02, 95% Confidence Interval: [6e–05, 0.16]). As expected, participants recalled significantly more negative (M = 58.78%, SEM = 2.30%) than neutral pictures (M = 43.21%, SEM = 2.49%; main effect emotion: F(1, 50) = 69.33, p = 5e−11, \({\eta }_{p}^{2}\) = 0.58, 95% Confidence Interval: [0.42, 0.72]; Supplementary Fig. 2), indicating an enhancement of immediate memory performance due to the emotionality of the encoded material, in line with previous reports27,28.

Fig. 1: Experimental paradigm.

On the first experimental day (Day 1), participants encoded 30 emotionally neutral and 30 negative pictures, each presented once in each of three consecutive runs. After a delay of 1d or 28d (Day 2), participants were presented with the encoded pictures, lures that were perceptually or semantically related to the old pictures or entirely novel, unrelated material in a recognition test. Both encoding and memory testing were conducted in an MRI scanner. On the third experimental day (Day 3), participants rated the individually perceived semantic and perceptual relatedness between each old image and its corresponding semantically related, perceptually related or unrelated lure. All depicted images are licensed under Creative Commons BY-SA License: image representing emotionally negative item at encoding (fire) is courtesy of Sylvain Pedneault (https://commons.wikimedia.org/wiki/File:Fire_inside_an_abandoned_convent_in_Massueville,_Quebec,_Canada.jpg; edited), image representing ‘old’ item is courtesy of W. Bulach (https://commons.wikimedia.org/wiki/File:00_2141_Bicycle-sharing_systems_-_Sweden.jpg; edited), image representing ‘semantically related’ item is courtesy of Matti Blume (https://commons.wikimedia.org/wiki/File:Bike_share_2019,_Berlin_(P1080139).jpg; edited), image representing ‘perceptually related’ item is courtesy of Ivy Main (https://fi.m.wikipedia.org/wiki/Tiedosto:Bottled_water_in_supermarket.JPG; edited), image representing ‘unrelated’ item is courtesy of Hannes Drexl (https://commons.wikimedia.org/wiki/File:Autokran_Seite.jpg?uselang=de; unchanged).

Memories are semantically transformed over time

On experimental Day 2 (either 1d or 28d after initial encoding), participants underwent a recognition test in which they were instructed to indicate for each of the presented pictures whether it had been presented on Day 1 (‘old’) or not (‘new’). Critically, this recognition test included, in addition to original and entirely novel, unrelated items, lures that were either semantically or perceptually related to the old items, thus enabling us to examine the nature of time-dependent memory transformation. As expected, the hit rate was significantly higher in the 1d-group (M = 91.86%, SEM = 1.12%) than in the 28d-group (M = 75.58%, SEM = 2.45%; main effect delay: F(1, 50) = 20.72, p = 3e−05, \({\eta }_{p}^{2}\) = 0.29, 95% Confidence Interval: [0.11, 0.49]; Fig. 2a and Supplementary Table 1). Notably, this delay-dependent decrease in memory performance was dependent on the emotionality of the stimuli (emotion × delay: F(1, 50) = 9.23, p = 0.004, \({\eta }_{p}^{2}\) = 0.16, 95% Confidence Interval: [0.02, 0.36]; main effect emotion: F(1, 50) = 4.52, p = 0.038, \({\eta }_{p}^{2}\) = 0.08, 95% Confidence Interval: [0.002, 0.27]): the decrease in hits for the 28d- compared to the 1d-group was significantly lower for emotionally negative compared to emotionally neutral pictures (interaction contrast: t(50) = 3.04, p = 0.004, d = 0.40, 95% Confidence Interval = [0.14, 0.66]). Accordingly, the hit rate for negative pictures after 28d was significantly higher than for neutral pictures (paired t-test: t(50) = −3.65, p = 6e–04, d = −0.66, 95% Confidence Interval = [−1.01, −0.31]), while there was no statistically significant difference in the hit rate for emotionally negative and neutral pictures when tested 1d after encoding (paired t-test: t(50) = 0.64, p = 0.522, d = 0.14, 95% Confidence Interval = [–0.28, 0.56]). The latter finding may be due to the overall very high memory performance on the recognition test 1d after encoding.

Fig. 2: Memory performance during recognition testing based on stimulus categories.

a Left: The decrease in hits from 1d to 28d after encoding (main effect delay: p = 0.003) was significantly higher for emotionally neutral than negative items (delay × emotion: p = 0.004; mixed ANOVA). Right: The increase in false alarms (FAs) from 1d to 28d after encoding (main effect delay: p = 0.012) was significantly higher for lures that were semantically related to the encoded pictures, compared to perceptually related (interaction contrast: p = 2e−04) or unrelated lures (interaction contrast: p = 0.030; delay × lure type: p = 7e−04). This semantization of memories over time was significantly higher for emotionally negative compared to neutral items (interaction contrast: p = 0.006; delay × lure type × emotion: p = 0.017; mixed ANOVA). All n = 52 participants. Bars represent mean ± SEM. Individual data points indicate the percentage of the 30 items per participant, emotion and item type which were correctly (left) or incorrectly (right) endorsed as ‘old’. b Individual items were significantly more likely to be semantically transformed (main effect delay: p = 0.030), but not significantly more likely to be perceptually transformed in the 28d- compared to the 1d-group (all p > 0.293). Accordingly, detailed memory decreased with increasing delay after encoding (main effect delay: p = 1e–07). Moreover, emotionally negative memories were more robust against forgetting over time (delay × emotion: p = 0.003), but, again, more often semantically transformed than neutral ones (delay × emotion: p = 0.014; binomial generalized linear mixed models; all n = 52 participants). Bars represent mean ± SEM. Connected dots represent individual data points. All post-hoc tests were applied on estimated marginal means with Šidák correction for multiple comparisons. All reported p-values are two-tailed. Source data are provided as Source Data file. *p < 0.050; **p < 0.010; ***p < 0.001.

To assess the nature of memory transformation over time, the key question of this study, we analyzed participants’ false alarms (FAs) to unrelated (i.e., entirely novel), semantically related and perceptually related lures by means of a mixed ANOVA with the between-subjects factor delay (1d vs. 28d) and the within-subject factors emotion (neutral vs. negative) and lure type (unrelated vs. semantically related vs. perceptually related). This analysis showed a time-dependent increase in FA rates depending on the lure type (delay × lure type: F(1.55, 77.43) = 9.33, p = 7e–04, \({\eta }_{p}^{2}\) = 0.16, 95% Confidence Interval: [0.05, 0.32]; main effect lure type: F(1.55, 77.43) = 42.90, p = 2e–11, \({\eta }_{p}^{2}\) = 0.46, 95% Confidence Interval: [0.32, 0.60]; main effect delay: F(1, 50) = 6.79, p = 0.012, \({\eta }_{p}^{2}\) = 0.12, 95% Confidence Interval: [6e–03, 0.29]). As shown in Fig. 2a, a striking increase in the FA rate for the 28d- compared to the 1d-group was observed selectively for semantically related lures (two-sample t-test: t(50) = −3.32, p = 0.002, d = −1.09, 95% Confidence Interval = [–1.73, –0.45]), which was significantly higher than for perceptually related (interaction contrast: t(50) = −4.29, p = 2e–04, d = −0.58, 95% Confidence Interval = [−0.85, −0.32]; two-sample t-test: t(50) = −1.22, p = 0.226, d = −0.26, 95% Confidence Interval = [−0.69, 0.16]) or entirely novel, unrelated lures (interaction contrast: t(50) = −2.68, p = 0.030, d = −0.47, 95% Confidence Interval = [−0.82, −0.13]; two-sample t-test: t(50) = −3.32, p = 0.002, d = −1.09, 95% Confidence Interval = [−1.73, −0.45]). Thus, after a delay of 28d, 52.78% of all new pictures which were incorrectly endorsed as ‘old’ were semantically related, while only 23.14% and 24.08% were perceptually related or unrelated to the encoded pictures, respectively. This pattern of results suggests a semantic memory transformation over time. Our results did not suggest a statistically significant difference in FAs for perceptually related items compared to unrelated items at either 1d (paired t-test: t(50) = −2.31, p = 0.073, d = −0.34, 95% Confidence Interval = [−0.62, −0.05]) or 28d after encoding (paired t-test: t(50) = −0.88, p = 0.767, d = −0.11, 95% Confidence Interval = [−0.37, 0.14]).
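
As an illustration of how such an analysis can be set up, the following is a minimal sketch in R using the afex and emmeans packages; the data frame and column names (fa_data, subject, fa_rate, etc.) are hypothetical, and this sketches the analysis logic rather than the authors’ actual code:

```r
# Mixed ANOVA on false-alarm rates: delay is between-subjects,
# emotion and lure type are within-subjects (all names hypothetical).
library(afex)
library(emmeans)

fit <- aov_ez(
  id      = "subject",
  dv      = "fa_rate",
  data    = fa_data,
  between = "delay",
  within  = c("emotion", "lure_type")
)
# afex applies a Greenhouse-Geisser correction by default, consistent
# with the non-integer degrees of freedom reported above.
print(fit)

# Post-hoc interaction contrasts on estimated marginal means,
# Sidak-corrected for multiple comparisons
emm <- emmeans(fit, ~ delay * lure_type)
summary(contrast(emm, interaction = "pairwise"), adjust = "sidak")
```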

Interestingly, this semantization over time was significantly more pronounced for emotionally negative compared to neutral pictures (delay × emotion × lure type: F(1.96, 97.98) = 4.27, p = 0.017, \({\eta }_{p}^{2}\) = 0.08, 95% Confidence Interval: [0.01, 0.21]), resulting in a significantly higher difference in FAs between emotionally negative and neutral semantically related lures at 28d (paired t-test: t(50) = −2.72, p = 0.009, d = −0.58, 95% Confidence Interval = [–1.00, –0.16]), compared to 1d (interaction contrast: t(50) = 2.88, p = 0.006, d = 0.52, 95% Confidence Interval = [0.17, 0.88]; paired t-test: t(50) = 1.36, p = 0.181, d = 0.25, 95% Confidence Interval = [–0.11, 0.6]). To follow up on this three-way interaction, we further analyzed the FAs by a separate ANOVA per lure type, each with the factors delay and emotion. These analyses confirmed a significant emotionality-dependent increase in the FA rate in the 28d-group compared to the 1d-group selectively for semantically related lures (delay × emotion: F(1, 50) = 8.30, p = 0.006, \({\eta }_{p}^{2}\) = 0.14, 95% Confidence Interval: [0.02, 0.34]) and did not indicate a statistically significant interaction effect for unrelated lures (delay × emotion: F(1, 50) = 0.54, p = 0.467, \({\eta }_{p}^{2}\) = 0.01, 95% Confidence Interval: [3e–05, 0.13]) or perceptually related lures (delay × emotion: F(1, 50) = 0.23, p = 0.637, \({\eta }_{p}^{2}\) = 0.003, 95% Confidence Interval: [2e–05, 0.11]).

Weighting the FAs by level of confidence (×1 = ‘rather old’, ×2 = ‘definitely old’) before analyzing them by means of a mixed ANOVA with the factors delay (1d vs. 28d), lure type (unrelated vs. semantically related vs. perceptually related) and emotion (neutral vs. negative), did not change our pattern of results regarding delay-dependent effects on memory specificity (delay × lure type × emotion: F(1.96, 98.12) = 5.57, p = 0.005, \({\eta }_{p}^{2}\) = 0.10, 95% Confidence Interval: [0.02, 0.24]; delay × lure type: F(1.50, 75.19) = 8.83, p = 0.001, \({\eta }_{p}^{2}\) = 0.15, 95% Confidence Interval: [0.04, 0.32]; main effect lure type: F(1.50, 75.19) = 37.45, p = 3e–10, \({\eta }_{p}^{2}\) = 0.43, 95% Confidence Interval: [0.28, 0.58]; main effect delay: F(1, 50) = 5.45, p = 0.024, \({\eta }_{p}^{2}\,\)= 0.10, 95% Confidence Interval: [0.004, 0.29]; see Supplementary Fig. 3), indicating that our finding of an emotionally enhanced memory semantization in the course of time-dependent memory transformation was not significantly influenced by the confidence of FAs. Moreover, analyzing the confidence associated with FAs by means of binomial generalized linear mixed models (LMMs) did not reveal any significant main effect or interaction of the predictors delay and emotion, neither for semantically related (all p > 0.455), perceptually related (all p > 0.131) nor for unrelated lures (all p > 0.448; see Supplementary Table 2).

While the previous analyses showed a time-dependent increase in FAs depending on the lure type, the correspondence of each originally encoded picture to precisely one perceptually related and one semantically related lure during memory testing furthermore allowed us to analyze the response pattern at the level of each individual set of related stimuli to assess the extent of detailed, semantically transformed, perceptually transformed or entirely forgotten memories19. For this, we categorized the responses for each of the 60 related stimulus sets as either detailed, semantically transformed, perceptually transformed, or forgotten and analyzed the occurrence of each specificity category by means of binomial generalized LMMs with delay (1d vs. 28d), emotion (neutral vs. negative) and their interactions as fixed effects and the random intercept of participants and stimulus sets (see the sketch below). Memories were classified as detailed when participants endorsed solely the originally encoded pictures as ‘old’ but not the semantically or perceptually related lures. If participants endorsed the semantically related lures but not the perceptually related lures, the respective memories were classified as being semantically transformed. Conversely, if participants endorsed the perceptually related lures but not the semantically related lures, the memories were classified as perceptually transformed. If participants endorsed neither the old nor the semantically or perceptually related items, the respective memories were classified as ‘forgotten’. Thus, for each specificity category, all 60 stimulus sets per participant were included in the analysis, except for trials in which participants failed to indicate their memory for the presented item (missed responses), which on average led to only 0.95% (SEM = 0.44%) of missing data points per participant (no significant difference between delay groups; two-sample t-test: t(31.20) = −1.07, p = 0.294, d = −0.30, 95% Confidence Interval = [–0.86, 0.26]; see Supplementary Table 3 for an overview of the number of stimulus sets per category). Compared to the 1d-group, participants of the 28d-group had significantly fewer detailed (main effect delay: z = −5.29, p = 1e–07, β = −1.51, 95% Confidence Interval: [–2.07, –0.95]) and more forgotten memories (main effect delay: z = 5.75, p = 9e–09, β = 1.79, 95% Confidence Interval: [1.18, 2.41]; see Fig. 2b). Importantly, the 28d-group showed also significantly more semantically transformed memories than the 1d-group (main effect delay: z = 2.17, p = 0.030, β = 0.64, 95% Confidence Interval: [0.06, 1.22]) without a statistically significant increase in perceptually transformed memories (all p > 0.293; see Supplementary Table 4). Again, the nature of the time-dependent changes in memory was critically dependent on the emotionality of the items: Over time, significantly fewer emotionally negative pictures were forgotten than neutral ones (delay × emotion: z = −3.00, p = 0.003, β = −0.75, 95% Confidence Interval: [−1.25, −0.26]).
Even more importantly, emotionally negative pictures were significantly more often semantically transformed over time (z-test: z = −4.31, p = 2e–05, d = −1.31, 95% Confidence Interval: [−1.90, −0.71]) than neutral ones (z-test: z = −2.17, p = 0.030, d = −0.64, 95% Confidence Interval: [−1.22, −0.06]; delay × emotion: z = 2.46, p = 0.014, β = 0.66, 95% Confidence Interval: [0.14, 1.19]), in line with findings suggesting that superior memory for emotional material, indicated here by a slower forgetting rate, may come at the cost of reduced memory specificity19,23,24,25.
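
The classification of each stimulus set and one of the category-wise gLMMs could look as follows; a minimal sketch in R (dplyr/lme4), with the data frame resp and all column names hypothetical:

```r
# Classify each stimulus set by which of its items were endorsed 'old';
# 'resp' holds one row per participant and stimulus set with logical
# columns old_endorsed, sem_endorsed and perc_endorsed (hypothetical).
library(dplyr)
library(lme4)

resp <- resp %>%
  mutate(category = case_when(
    old_endorsed & !sem_endorsed & !perc_endorsed  ~ "detailed",
    sem_endorsed & !perc_endorsed                  ~ "semantically_transformed",
    perc_endorsed & !sem_endorsed                  ~ "perceptually_transformed",
    !old_endorsed & !sem_endorsed & !perc_endorsed ~ "forgotten",
    TRUE                                           ~ "unclassified"
  ))

# One binomial gLMM per specificity category, e.g. semantic transformation,
# with random intercepts for participants and stimulus sets
fit_sem <- glmer(
  I(category == "semantically_transformed") ~ delay * emotion +
    (1 | subject) + (1 | stimulus_set),
  data = resp, family = binomial
)
summary(fit_sem)  # z- and p-values for delay, emotion and delay x emotion
```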

Participants’ relatedness ratings on Day 3 confirmed the results of our behavioral pilot study (see methods section and Supplementary Fig. 1) that semantically related lures were perceived as being significantly more semantically related (M = 9.20, SEM = 0.10) to the corresponding old picture than perceptually related (M = 2.30, SEM = 0.16; paired t-test: t(50) = −33.41, p < 9e–99, d = −4.59, 95% Confidence Interval = [−4.86, −4.32]) and unrelated lures (M = 1.72, SEM = 0.16; paired t-test: t(50) = −36.87, p < 9e–99, d = −5.02, 95% Confidence Interval = [–5.28, –4.75]; main effect lure type on semantic relatedness: F(1.22, 60.96) = 1157.08, p = 1e–43, \({\eta }_{p}^{2}\) = 0.96, 95% Confidence Interval: [0.94, 0.97]; see Fig. 3a, Supplementary Fig. 4 and Supplementary Table 5). Perceptually related lures were perceived as being significantly more perceptually related (M = 6.09, SEM = 0.21) to their corresponding old picture than unrelated lures (M = 1.74, SEM = 0.17; paired t-test: t(50) = −22.4, p < 9e–99, d = −3.05, 95% Confidence Interval = [–3.32, –2.78]; main effect lure type on perceptual relatedness: F(1.45, 72.71) = 201.81, p = 6e–32, \({\eta }_{p}^{2}\) = 0.80, 95% Confidence Interval: [0.74, 0.85]). As expected, semantically related lures were also rated higher in perceptual relatedness to their corresponding old picture (M = 5.50, SEM = 0.20) compared to unrelated lures (paired t-test: t(50) = −16.00, p < 9e–99, d = −2.18, 95% Confidence Interval = [−2.44, −1.91]). Importantly, perceptually related lures were rated as significantly higher in perceptual than in semantic relatedness to their corresponding old image (paired t-test: t(51) = 16.67, p = 3e–22, d = 3.25, 95% Confidence Interval = [2.66, 3.83]) while semantically related items were rated as significantly more semantically than perceptually related to their corresponding old image (paired t-test: t(51) = −16.38, p = 6e–22, d = −2.83, 95% Confidence Interval = [−3.37, −2.28]).

Fig. 3: Individually perceived relatedness and memory specificity.

a Participants’ relatedness ratings confirmed that semantically related items were perceived as significantly more semantically related to the corresponding old picture than perceptually related (paired t-test: p < 9e–99) and unrelated lures (paired t-test: p < 9e–99; main effect lure type on semantic relatedness: p = 1e–43) and that perceptually related lures were perceived as significantly more perceptually related to their corresponding old picture than unrelated lures (paired t-test: p < 9e–99; main effect lure type on perceptual relatedness: p = 6e–32; mixed ANOVAs; all n = 52 participants). Bars represent mean ± SEM. Connected dots represent individual data points. b Taking these individual relatedness ratings into account when analyzing false alarms (FAs) by means of a binomial generalized linear mixed model (gLMM) confirmed that the delay-dependent increase in FAs (main effect delay: p = 0.016) was primarily driven by the semantic relatedness, specifically for emotionally negative stimuli (delay × semantic relatedness × emotion: p = 0.018). N = 52 participants. Lines represent predicted probabilities for FAs as estimated by the binomial gLMM, with error bands indicating the 95% Confidence Interval for these predicted probabilities. All post-hoc tests were applied on estimated marginal means with Šidák correction for multiple comparisons. All reported p-values are two-tailed. Source data are provided as Source Data file. *p < 0.050; ***p < 0.001.

The individual stimulus relatedness ratings on Day 3 further allowed us to analyze FAs by means of a binomial generalized LMM with the factors delay (1d vs. 28d), emotion (neutral vs. negative), semantic relatedness rating, perceptual relatedness rating and their interactions as fixed effects and the random intercept of participants and stimuli. This analysis showed, in line with the categorical analyses above, a time-dependent increase in FAs that was primarily driven by the semantic relatedness, which affected the probability of a FA in particular for emotionally negative stimuli (delay × semantic relatedness × emotion: z = 2.36, p = 0.018, β = 0.12, 95% Confidence Interval = [0.02, 0.21]; main effect delay: z = 2.40, p = 0.016, β = 0.85, 95% Confidence Interval = [0.16, 1.55]; main effect semantic relatedness: z = 2.04, p = 0.041, β = 0.07, 95% Confidence Interval = [0.003, 0.13]; Fig. 3b). We obtained no statistically significant effect of the individual perceptual relatedness ratings on FAs and their increase over time (all p > 0.127; see Supplementary Table 6).
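
A minimal sketch of this trial-level model in R (lme4); the data frame lure_trials and its columns are hypothetical, and the exact fixed-effects structure is an assumption based on the factors listed above:

```r
# Trial-level gLMM: probability of a false alarm as a function of delay,
# emotion and the individual relatedness ratings (names hypothetical).
library(lme4)

fit_fa <- glmer(
  fa ~ delay * emotion * (sem_rating + perc_rating) +
    (1 | subject) + (1 | stimulus),
  data = lure_trials, family = binomial
)
summary(fit_fa)  # includes the delay x semantic relatedness x emotion term
```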

As semantically related items are usually also high in perceptual relatedness to original stimuli, we additionally analyzed whether the delay-dependent increase in FAs for semantically related items was equally evident in semantically related lures low (≤ 5) vs. high (> 5) in perceptual relatedness. A generalized LMM with the factors perceptual relatedness level (low vs. high), delay (1d vs. 28d) and emotion (neutral vs. negative) and the random intercept of participants and stimuli confirmed our previous finding of an emotionally enhanced increase in the probability of a FA for semantically related lures over time (delay × emotion: z = 2.19, p = 0.029, β = 0.93, 95% Confidence Interval = [0.09, 1.85]). This analysis did not indicate any influence of the level of perceptual relatedness of a semantically related stimulus to its corresponding original item on FAs (all p > 0.215; see Supplementary Table 7).

In sum, our behavioral data demonstrate that memories are semantically transformed over time, while we found no statistically significant evidence for a perceptual memory transformation. This time-dependent semantization of memories was, moreover, consistently more pronounced for emotionally negative than for neutral stimuli.

Distinct pattern representations of encoded events in the anterior hippocampus decrease over time

In order to examine the neural mechanisms involved in the semantic transformation of memories over time, we leveraged model-based Representational Similarity Analyses (RSAs)19,29,30 assessing how the similarity between activation patterns of encoded items and different lure types (semantically related vs. perceptually related vs. unrelated) at memory testing changes in the course of memory transformation. Here, neural representational similarity matrices (RSMs) were compared to three conceptual model RSMs (see Fig. 4a), each predicting different similarity patterns between old items and the different lure types at memory testing: (i) similar representations for old pictures that are distinct from patterns for all novel stimuli (model 1: ‘old items are distinct from all lures’), (ii) similar representations between old items and semantically related lures which are distinct from perceptually related and unrelated lures (model 2: ‘old and semantically related items are similar’) and (iii) similar representations between old items and perceptually related lures, which are distinct from semantically related and unrelated lures (model 3: ‘old and perceptually related items are similar’). Note that for all models we expected old items to be represented more similarly, as they should equally initiate recognition processes in neural areas relevant for memory representations that, in case of recent, specific memory, should be distinct from all lures (model 1), or, in case of transformed memory representations, similar to either semantically (model 2), or perceptually (model 3) related lures. Based on recent evidence19, we hypothesized that the anterior hippocampus is particularly relevant for the specificity of recent memories while the posterior hippocampus represents remote, semantically transformed memories. Accordingly, the anterior hippocampus should reflect distinct representations (model 1) at a short delay, but this representation should decrease over time, while we expected the posterior hippocampus to represent semantically transformed memory that should increase over time (model 2). A mixed ANOVA with the factors delay (1d vs. 28d), emotion (neutral vs. negative), model (1: ‘old items are distinct from all lures’ vs. 2: ‘old and semantically related items are similar’ vs. 3: ‘old and perceptually related items are similar’) and hippocampal long axis (anterior vs. posterior) revealed a significant delay × model × long axis interaction (F(1.55, 75.88) = 5.36, pcorr = 0.024, \({\eta }_{p}^{2}\) = 0.10, 95% Confidence Interval: [0.02, 0.25]) and a delay × model × long axis × emotion interaction (F(1.68, 82.19) = 4.64, pcorr = 0.034, \({\eta }_{p}^{2}\) = 0.09, 95% Confidence Interval: [0.01, 0.23]; see Fig. 4b). Note that one extreme outlier (28d-group) was excluded from this analysis. Post-hoc tests confirmed a significant decrease in recognition processes for the encoded material (model 1) over time in the anterior hippocampus (two-sample t-test: t(49) = 2.42, p = 0.020, d = 0.36, 95% Confidence Interval = [0.07, 0.64]), which was significant for emotionally negative items (two-sample t-test: t(49) = 2.40, p = 0.021, d = 0.46, 95% Confidence Interval = [0.08, 0.83]), while emotionally neutral items did not show a statistically significant decrease in model fit over time (two-sample t-test: t(49) = 1.05, p = 0.297, d = 0.25, 95% Confidence Interval = [−0.22, 0.73]).
Interestingly, the anterior hippocampus also showed a delay-dependent decrease in perceptually similar memory representations for neutral items (model 3; two-sample t-test: t(49) = 2.40, p = 0.003, d = 0.94, 95% Confidence Interval = [0.34, 1.54]) indicating a time-dependent decrease in the representation of perceptual details in the anterior hippocampus for those items. Neither the anterior hippocampus (model 2: t(49) = −0.01, p = 0.989, d = −3e–03, 95% Confidence Interval = [−0.39, 0.38]) nor the posterior hippocampus (t(49) = 1.11, p = 0.271, d = 0.23, 95% Confidence Interval = [−0.17, 0.62]) showed a statistically significant delay-dependent change in the fit to the model reflecting semantically transformed pattern representations.
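
To make the three conceptual models concrete, such model RSMs could be constructed as in the following R sketch; the trial ordering and the within-lure-category predictions are assumptions for illustration (the original model matrices are depicted in Fig. 4a):

```r
# Conceptual model RSMs for 120 recognition trials per emotion category,
# assumed ordered as 30 old, 30 semantically related, 30 perceptually
# related and 30 unrelated items (ordering hypothetical).
type <- rep(c("old", "sem", "perc", "unrel"), each = 30)

model_rsm <- function(similar_to_old) {
  in_set <- type == "old" | type %in% similar_to_old
  m <- outer(in_set, in_set, "&") * 1  # 1 = predicted similar, 0 = distinct
  diag(m) <- NA                        # self-similarities are not modeled
  m
}

m1 <- model_rsm(character(0))  # model 1: old items distinct from all lures
m2 <- model_rsm("sem")         # model 2: old + semantically related similar
m3 <- model_rsm("perc")        # model 3: old + perceptually related similar
```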

Fig. 4: Computational approach for model-based RSA analyses and results along the hippocampal anterior-posterior axis.

a Schematic overview of the creation of a neural RSM for emotionally neutral items with exemplary correlation values. Each neural RSM per region of interest (ROI), emotion category and subject was compared to three conceptual models. All depicted images are licensed under Creative Commons BY-SA License: image representing ‘old’ item is courtesy of W. Bulach (https://commons.wikimedia.org/wiki/File:00_2141_Bicycle-sharing_systems_-_Sweden.jpg; edited), image representing ‘semantically related’ item is courtesy of Matti Blume (https://commons.wikimedia.org/wiki/File:Bike_share_2019,_Berlin_(P1080139).jpg; edited), image representing ‘perceptually related’ item is courtesy of Ivy Main (https://fi.m.wikipedia.org/wiki/Tiedosto:Bottled_water_in_supermarket.JPG; edited), image representing ‘unrelated’ item is courtesy of Hannes Drexl (https://commons.wikimedia.org/wiki/File:Autokran_Seite.jpg?uselang=de; unchanged). b In the left anterior hippocampus, distinct representations of encoded pictures (model 1; two-sample t-test: p = 0.019), specifically for negative items (two-sample t-test: p = 0.021), and perceptually similar representations (model 3; two-sample t-test: p = 0.002), specifically for emotionally neutral items (two-sample t-test: p = 0.003), decreased with increasing delay after encoding (delay × long axis × model: pcorr = 0.024; delay × emotion × long axis × model: pcorr = 0.034; mixed ANOVA; n = 51 participants). Bars represent mean ± SEM. If analyses were repeated for both hemispheres, Bonferroni-corrected p-values (pcorr) are reported. All reported p-values are two-tailed. All post-hoc tests were applied on estimated marginal means with Šidák correction for multiple comparisons. Regions of interest are visualized on a sagittal section of a T1-weighted template82 in MNI-152 space. Source data are provided as Source Data file. *p < 0.050; **p < 0.010.

Together, these data indicate that the anterior hippocampus represents recently encoded events in a detailed manner, including perceptual features, and that these anterior hippocampal representations decrease over time, while our results did not yield reliable evidence for a more gist-like, transformed pattern representation in the anterior hippocampus at either the 1d- or the 28d-delay.

Semantically transformed representations of encoded events increase in prefrontal and parietal cortices over time

While the hippocampus has been suggested to be particularly important for recently encoded and specific memories in previous studies16,18, neocortical regions are assumed to become more relevant for remote memory9,16,18,31. Specifically, the vmPFC8,9,10, IFG4,5, aCC2,11,12,13, angular gyrus and the precuneus14,15 have been associated with the formation of long-term memories. Thus, we analyzed time-dependent memory transformation processes in these neocortical long-term memory regions. We first performed a delay (1d vs. 28d) × model (1: ‘old items are distinct from all lures’ vs. 2: ‘old and semantically related items are similar’ vs. 3: ‘old and perceptually related items are similar’) × emotion (neutral vs. negative) ANOVA using a combined mask, including the vmPFC, IFG, aCC, angular gyrus and precuneus, as we expected a similar increase in transformed memory representations over time in all of those neocortical regions. This analysis showed a significant increase in representational similarity between old items and semantically related lures in the 28d- compared to the 1d-group in the neocortex (model 2; two-sample t-test: t(50) = −2.04, p = 0.047, d = −0.62, 95% Confidence Interval = [−1.21, −0.02]), and no statistically significant delay-dependent change in the fits to models reflecting distinct (model 1; two-sample t-test: t(50) = –1.62, p = 0.111, d = −0.38, 95% Confidence Interval = [−0.84, 0.08]) or perceptually similar memory representations (model 3; two-sample t-test: t(50) = 0.83, p = 0.412, d = 0.13, 95% Confidence Interval = [–0.18, 0.45]; delay × model: F(1.48, 74.03) = 7.1, p = 0.004, \({\eta }_{p}^{2}\) = 0.12, 95% Confidence Interval: [0.03, 0.29]; main effect model: F(1.48,74.03) = 19.11, p = 3e–06, \({\eta }_{p}^{2}\,\)= 0.28, 95% Confidence Interval: [0.13, 0.44]; see Fig. 5). Accordingly, neocortical activity patterns during memory testing showed a significantly higher fit to model 2 (‘old and semantically related items are similar’) than to both other models in the 28d-group (paired t-tests; model 1: t(50) = −4.02, p = 6e–04, d = −0.61, 95% Confidence Interval = [−0.91, −0.31]; model 3: t(50) = 5.50, p = 4e–06, d = 0.93, 95% Confidence Interval = [0.60, 1.26]) while there was no statistically significant difference in fit to either model in the 1d-group (paired t-tests; model 1: t(50) = −2.15, p = 0.106, d = −0.27, 95% Confidence Interval = [−0.52, −0.02]; model 3: t(50) = 1.36, p = 0.446, d = 0.19, 95% Confidence Interval = [−0.08, 0.46]). Thus, this analysis indicates the formation of semantically transformed representations of encoded events in the neocortex over time.

Fig. 5: Model-based RSA results in neocortical long-term memory storage sites.

Upper panel: Pattern representations in a combined ROI including long-term memory cortices (vmPFC, IFG, aCC, angular gyrus and precuneus) were semantically (model 2; two-sample t-test: p = 0.047) transformed over time, while there was no statistically significant effect for the model testing for perceptually transformed representation patterns (model 3; two-sample t-test: p = 0.412; delay × model: p = 0.004; mixed ANOVA). Lower panel: Post-hoc testing revealed that this time-dependent semantization of pattern representations (model 2) was specific to the vmPFC (main effect delay: p = 0.046) and right angular gyrus (main effect delay: pcorr = 0.010; mixed ANOVAs; n = 52 participants). Bars represent mean ± SEM. If analyses were repeated for both hemispheres, Bonferroni-corrected p-values (pcorr) are reported. All reported p-values are two-tailed. All post-hoc tests were applied on estimated marginal means with Šidák correction for multiple comparisons. Regions of interest (ROIs) are visualized on sagittal (prefrontal ROIs) and axial (parietal ROIs) sections of a T1-weighted template82 in MNI-152 space. Source data are provided as Source Data file. +p < 0.060; *p < 0.050.

To investigate whether this time-dependent memory semantization was equally evident in all individual neocortical regions, we analyzed the fit of the RSM for each individual neocortical ROI to the model reflecting semantically transformed pattern representations (model 2) by means of mixed ANOVAs with the factors delay and emotion (see Fig. 5). This analysis confirmed a delay-dependent increase in representational similarity between old items and semantically related lures in the vmPFC (main effect delay: F(1, 50) = 4.19, p = 0.046, \({\eta }_{p}^{2}\) = 0.08, 95% Confidence Interval: [0.001, 0.26]) and right angular gyrus (main effect delay: F(1, 50) = 8.34, pcorr = 0.011, \({\eta }_{p}^{2}\) = 0.14, 95% Confidence Interval: [0.02, 0.34]). This analysis did not indicate a statistically significant delay-dependent change in similarity between model 2 and pattern representations in the precuneus (main effect delay: F(1, 50) = 3.93, p = 0.053, \({\eta }_{p}^{2}\) = 0.07, 95% Confidence Interval: [0.00, 0.25]; delay × emotion: F(1, 50) = 0.01, p = 0.943, \({\eta }_{p}^{2}\) = 1e–04, 95% Confidence Interval: [2e–05, 0.10]), IFG (main effect delay: F(1, 50) = 2.25, p = 0.140, \({\eta }_{p}^{2}\) = 0.04, 95% Confidence Interval: [2e–04, 0.21]; delay × emotion: F(1, 50) = 1.8, p = 0.186, \({\eta }_{p}^{2}\) = 0.03, 95% Confidence Interval: [1e–04, 0.19]), and aCC (main effect delay: F(1, 50) = 1.15, p = 0.289, \({\eta }_{p}^{2}\) = 0.02, 95% Confidence Interval: [6e–05, 0.16]; delay × emotion: F(1, 50) = 1.41, p = 0.240, \({\eta }_{p}^{2}\) = 0.03, 95% Confidence Interval: [8e–05, 0.18]).

Furthermore, we repeated this model-based RSA in the bilateral occipital pole and Heschl’s gyrus as neocortical control regions for which we did not expect any statistically significant increase in transformed memory representations over time. Analyzing activation patterns in those regions by means of delay (1d vs. 28d) × model (1: ‘old items are distinct from all lures’ vs. 2: ‘old and semantically related items are similar’ vs. 3: ‘old and perceptually related items are similar’) × emotion (neutral vs. negative) ANOVAs did not indicate a statistically significant time-dependent change in fit of pattern representations, neither in the occipital pole (delay × model: F(1.83, 91.39) = 0.87, p = 0.415, \({\eta }_{p}^{2}\) = 0.02, 95% Confidence Interval: [0.001, 0.12]; delay × emotion × model: F(2.00, 99.82) = 0.47, p = 0.624, \({\eta }_{p}^{2}\) = 9e–03, 95% Confidence Interval: [0.001, 0.10]) nor in Heschl’s gyrus (delay × model: F(1.38, 68.96) = 0.32, p = 0.645, \({\eta }_{p}^{2}\) = 6e–03, 95% Confidence Interval: [2e–04, 0.08]; delay × emotion × model: F(1.69, 84.5) = 0.48, p = 0.587, \({\eta }_{p}^{2}\) = 1e–02, 95% Confidence Interval: [5e–04, 0.10]). Interestingly, activation patterns in the occipital pole showed an overall higher fit to model 3 (‘old and perceptually related items are similar’) compared to model 1 (paired t-test: t(50) = −4.87, p = 3e–05, d = −0.5, 95% Confidence Interval = [−0.70, −0.30]) as well as model 2 (paired t-test: t(50) = −3.85, p = 0.001, d = −0.49, 95% Confidence Interval = [−0.73, −0.24]) without a statistically significant effect of temporal delay (main effect model: F(1.83, 91.39) = 13.26, p = 2e–05, \({\eta }_{p}^{2}\,\)= 0.21, 95% Confidence Interval: [0.09, 0.36]). This finding most likely reflects the processing of overlapping visual features in old and perceptually related images in this region.

Our model-based analyses thus indicate that semantically transformed representations of previously encoded events emerge in prefrontal and posterior parietal cortices in the course of memory transformation while we did not observe any credible evidence for a perceptual transformation in these regions.

Posterior hippocampal memory reinstatement increases over time

While our model-based approach assessed the time-dependent change in representational similarity between encoded and new item categories at memory testing, we further analyzed the reactivation of individual items during the memory test, i.e., Encoding-Retrieval Similarity (ERS), as a measure of trial-specific memory reinstatement20,70,71.

Quantification and statistical analysis

For our MRI data analysis, each trial of the encoding and recognition task was modeled as an individual regressor convolved with a hemodynamic response function along with six session-constants in one GLM per subject using SPM12. To increase reliability by normalizing for noise72, the resulting beta-values were transformed into t-statistics. Data were further subjected to RSAs29 using custom scripts in MATLAB Version 2020b (The Mathworks, Inc, Natick, USA). Note that for our neural analyses, activation patterns of all trials of relevant item types were included. We opted for an analysis at the category level instead of relying on participants’ correct or incorrect responses because (i) we were interested in how the encoding-retrieval delay and lure type affected the similarity between representational patterns as an indicator of the specificity of the neural representational patterns rather than the underlying neural patterns of a specific behavioral response; (ii) (multivariate) neural data are much more sensitive to fine-grained changes in memory representations compared to behavioral data that are merely based on dichotomous ‘yes’ vs. ‘no’ (i.e. ‘old’ vs. ‘new’) responses; (iii) restricting analyses to incorrectly endorsed lures (FAs) would have resulted in an insufficient number of trials for the fMRI analyses, while (iv) focusing solely on correctly endorsed items (hits) would exclude items that are particularly low in memory specificity, which are of particular interest when investigating the neural underpinnings of memory transformation over time.
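
Conceptually, this normalization divides each trial’s beta estimate by its standard error. The following is a simplified R sketch for a single voxel under ordinary-least-squares assumptions, ignoring the temporal autocorrelation that SPM12 accounts for during model estimation; all names are hypothetical:

```r
# Simplified beta-to-t conversion for one voxel: y is the voxel time
# series, X the trial-wise design matrix (time points x regressors).
beta_to_t <- function(X, y) {
  XtX_inv <- solve(t(X) %*% X)
  beta    <- XtX_inv %*% t(X) %*% y                        # OLS estimates
  sigma2  <- sum((y - X %*% beta)^2) / (nrow(X) - ncol(X)) # residual variance
  se      <- sqrt(sigma2 * diag(XtX_inv))                  # SE per regressor
  as.vector(beta) / se                                     # t-value per trial
}
```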

Model-based retrieval-similarity analysis

We analyzed time-dependent changes of representational similarities between the different stimulus-types at recognition testing by applying a model-comparison RSA29,30,73. This approach, i.e. comparing multivariate representational patterns of all experimental trials (irrespective of the correctness of the response) to conceptual models, allows inferences about the structure of neural representations29,30,73, has been successfully employed in previous studies to characterize memory representations, even at longer delays after encoding19,74,75,76,77, and is thus highly suitable for investigating changes in memory quality over time.

Here, separately for both emotionality categories, each trial’s activation pattern across voxels was correlated (Pearson’s r) with the activation patterns of each other trial during memory testing. Next, we computed the mean pattern similarity for comparisons within each of the three runs and for each between-run combination (run 1 and run 2, run 2 and run 3 or run 3 and run 1). Those run-related pattern similarities were then subtracted from each correlation estimate of the corresponding run-combination to account for inflated correlations as a function of temporal proximity between scans78,79. In the resulting 120 × 120 RSMs, each combination of trials was placed in the respective cells, ordered by stimulus type (Fig. 4a, left panel). The resulting neural RSMs were compared to three theoretical model RSMs (Fig. 4a, right panel), each predicting different similarity patterns between the four stimulus categories at recognition testing: similar representations for old pictures that are distinct from patterns for all novel stimuli (model 1: ‘old items are distinct from all lures’), similar representations between old items and semantically related lures which are distinct from perceptually related and unrelated lures (model 2: ‘old and semantically related items are similar’) and a model that expects similar representations between old items and perceptually related lures which are distinct from semantically related and unrelated lures (model 3: ‘old and perceptually related items are similar’). Note that for all models we expected old items to be represented more similarly, as they should equally initiate recognition processes in neural areas relevant for specific (model 1) or transformed (model 2 and model 3) memory representations. We computed Spearman’s rank correlation coefficient for each single-subject RSM and the conceptual models as we did not assume a direct linear match between the compared RSMs29. The resulting rho-values were further Fisher z-transformed and subjected to mixed ANOVAs with the factors delay (1d vs. 28d), emotion (neutral vs. negative) and a-priori model (1: ‘old items are distinct from all lures’ vs. 2: ‘old and semantically related items are similar’ vs. 3: ‘old and perceptually related items are similar’) in R. As we expected a time-dependent differentiation along the anterior-posterior hippocampal long axis, we additionally included the factor long axis (anterior vs. posterior) in the analysis regarding the hippocampus. For the neocortex, we predicted a comparable increase in semantically transformed memory representations (model 2) with increasing delay in each of our prefrontal (aCC, IFG, vmPFC) and parietal (precuneus, angular gyrus) long-term memory ROIs. We therefore first performed a mixed ANOVA with the between-subjects factor delay (1d vs. 28d) and the within-subject factors emotion (neutral vs. negative) and model RSM (model 1 vs. model 2 vs. model 3) using a combined mask that included all of these prefrontal and parietal ROIs. To confirm whether the resulting effect in model 2 was equally evident in the individual neocortical storage sites, we repeated this delay × emotion ANOVA with the neural RSM of each neocortical ROI. Where analyses were repeated for both hemispheres, resulting p-values were Bonferroni corrected (pcorr) to account for multiple comparisons.
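
A condensed R sketch of this procedure for one ROI and emotion category; here patterns is assumed to be a trials × voxels matrix of t-values ordered by stimulus type, run the run label of each trial, and model_rsm one of the three conceptual model matrices (all object names hypothetical):

```r
# Trial-by-trial neural RSM with run-wise correction and model comparison
n   <- nrow(patterns)
rsm <- cor(t(patterns))               # Pearson r between all trial patterns

# Demean every within-run and between-run combination to correct for
# inflated correlations between temporally close scans
run_pair <- outer(run, run, paste)
off_diag <- !diag(n)
for (p in unique(as.vector(run_pair))) {
  idx <- (run_pair == p) & off_diag
  rsm[idx] <- rsm[idx] - mean(rsm[idx])
}

# Rank-correlate the neural RSM with a conceptual model RSM and
# Fisher z-transform for the subsequent mixed ANOVAs in R
lt  <- lower.tri(rsm)
rho <- cor(rsm[lt], model_rsm[lt], method = "spearman")
z   <- atanh(rho)
```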

Memory reinstatement analysis

Additionally, we assessed ERS as a measure of trial-specific memory reinstatement20,32,33,34,35,36,37. Due to the important role of the hippocampus in the reinstatement of episodic memories38,39, we focused specifically on the hippocampus and the differentiation along its anterior-posterior axis in the analyses of ERS. We computed the similarity (Pearson’s r) between activation patterns across all encoding runs as a reliable indicator of encoding-related activation patterns on experimental Day 1 and activation patterns of the same item during memory testing on Day 2 (see also20). Note that contrasting this ERS measure with ERS measures based on each individual encoding run, i.e. run 1, run 2, run 3, on a trial-by-trial level yielded a very similar pattern of results and no differences in anterior (all p > 0.333) or posterior hippocampal ERS (all p > 0.165) between different ERS measures. Resulting correlation estimates were Fisher z-transformed before statistical analyses in R were conducted. First, time-dependent changes in item-specific hippocampal ERS were analyzed by means of trial-wise LMMs with the factors delay (1d vs. 28d), emotion (neutral vs. negative), long axis (anterior vs. posterior) and their interactions as fixed effects and the random intercept of participants and stimuli. As this analysis was repeated for both hemispheres, resulting p-values were Bonferroni corrected (pcorr) to account for multiple comparisons. Further, we followed up on whether the observed delay-dependent increase in left posterior hippocampal ERS was associated with a decrease in specificity of the reinstated memories. To this end, we analyzed the occurrence of a FA for a semantically related or perceptually related lure by means of binomial generalized LMMs with emotion (neutral vs. negative), delay (1d vs. 28d), ERS and their interactions as fixed effects and the random intercept of participants and stimuli.
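
A minimal R sketch of the ERS computation and the trial-wise LMM; enc and ret are assumed to be item × voxel pattern matrices for one participant and ROI (encoding patterns averaged across the three runs), and ers_data a pooled trial-wise data frame; all names are hypothetical:

```r
library(lme4)
library(lmerTest)  # adds p-values for LMM fixed effects

# Per participant and ROI: Fisher z-transformed Pearson correlation between
# each item's (run-averaged) encoding pattern and its retrieval pattern
ers <- atanh(diag(cor(t(enc), t(ret))))

# Trial-wise LMM on the pooled data frame 'ers_data' (columns: ers, delay,
# emotion, long_axis, subject, stimulus)
fit_ers <- lmer(
  ers ~ delay * emotion * long_axis + (1 | subject) + (1 | stimulus),
  data = ers_data
)
summary(fit_ers)
```

The follow-up model predicting the occurrence of a FA from ERS would analogously use glmer(…, family = binomial), as in the behavioral sketches above.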

While ERS is computed by correlating pattern representations of individual items during encoding and memory test, i.e. ‘old’ items, we furthermore assessed the similarity elicited by perceptually or semantically related items at memory test and corresponding old items during encoding as a possible indicator for a reinstatement of the perceptual or semantic gist of the original memory. The resulting Fisher transformed r-values were again subjected to LMMs with delay (1d vs. 28d), emotion (neutral vs. negative), long axis (anterior vs. posterior) and their interactions as fixed effects and the random effects of subjects and stimuli. Furthermore, we explored delay-dependent changes in memory reinstatement, i.e. ERS, and reinstatement by related material in our neocortical long-term memory ROIs as well as in our sensory control ROIs by means of LMMs with the fixed effects of delay (1d vs. 28d), emotion (neutral vs. negative) and their interactions.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.