The retrieval of an episodic autobiographical memory (EAM), or a memory of a specific life event (Tulving, 1983), is thought to be a dynamic process, requiring engagement of self-initiated and goal-directed mechanisms (Cabeza & St. Jacques, 2007; Conway, Pleydell-Pearce, & Whitecross, 2001; Conway et al., 1999; Greenberg & Rubin, 2003; Monge, Wing, Stokes, & Cabeza, 2018). EAM retrieval is initiated with reconstruction, or the mental operation that searches through autobiographical knowledge and selects a memory (Addis, Wong, & Schacter, 2007; Conway & Pleydell-Pearce, 2000; Svoboda, McKinnon, & Levine, 2006). During this initial phase, it is hypothesized that abstract personal knowledge is usually accessed earlier in the retrieval process, cueing specific memories that commonly come later in reconstruction (Haque & Conway, 2001). However, specific memories can come to mind relatively instantaneously, such that episodic memory is accessed directly (Conway & Pleydell-Pearce, 2000). After an EAM is identified during the reconstruction phase, it can undergo elaboration, during which sensory-perceptual and other event-specific details that facilitate a sense of “reliving” are typically retrieved (Suddendorf & Corballis, 1997; Tulving, 1983, 2002). Both phases are critical components for EAM retrieval, where reconstruction guides access to the collection of autobiographical experiences and elaboration helps recall event-specific information that can be used for various purposes (e.g., reminiscence or problem-solving).

Episodic autobiographical memory reconstruction and elaboration in normal cognitive aging

Accumulating evidence indicates that normal cognitive aging is associated with a qualitative shift from episodic to semantic content (Addis, Wong, & Schacter, 2008; Devitt, Addis, & Schacter, 2017; Spreng et al., 2018; Turner & Spreng, 2015). In the context of autobiographical memory, this shift is commonly seen in studies examining aspects of reconstruction and elaboration. For instance, on cued recall and narrative tasks, in comparison with young adults, cognitively normal older adults less often reconstruct, or settle on, memories that are unique in place and time (Ford, Rubin, & Giovanello, 2014; Piolino et al., 2006; Ros, Latorre, & Serrano, 2010; Ros, Latorre, Serrano, & Ricarte, 2017; St. Jacques, Rubin, & Cabeza, 2012), turning instead toward autobiographical memories that are reflective of coarser, more semantic contexts, including lifetime periods and general events (i.e., events that are unfolding over extended periods or events that are repeated; Burgess & Shallice, 1996; Conway, 1996). Likewise, there appears to be a “semantic shift” in elaboration of EAMs, where in comparison with young adults, older adults typically retrieve fewer episodic (or internal) details while describing the unfolding of a specific event, which often is accompanied by the production of more semantic or other external details (Addis et al., 2008; Levine, Svoboda, Hay, Winocur, & Moscovitch, 2002). There is also growing appreciation for the potential effects of an age-related semantic shift on other forms of self-referential cognition that are believed to involve reconstruction and elaboration, including future thinking (Addis, Musicaro, Pan, & Schacter, 2010; Addis et al., 2008). 
Although EAM and episodic future thinking are proposed to have important adaptive utilities, such as aiding problem-solving and decision-making (Addis & Schacter, 2008; Schacter et al., 2012; Vandermorris, Sheldon, Winocur, & Moscovitch, 2013), cognitive sources of this move toward semantic knowledge in older adults remain unclear, and there are a number of hypothesized contributors including changes in cognition, narrative style, and motivational strategies (Andrews-Hanna, Grilli, & Irish, 2019; Gaesser, Sacchetti, Addis, & Schacter, 2011; Turner & Spreng, 2015).

Retrieval routes of episodic autobiographical memory reconstruction and normal cognitive aging: An unexplored relationship

While reconstruction and elaboration are partners in EAM retrieval, reconstruction ultimately “sets the stage” for which autobiographical memories are retrieved, and may also shape how EAMs are elaborated (Addis, Knapp, Roberts, & Schacter, 2012). From this viewpoint, it is noteworthy that very little is known about the actual unfolding of the cognitive reconstruction process among older adults, and how it might differ from young adults. In other words, to what degree is the path to reconstruction taken by older adults guided by semantic memory, and in what ways might their reconstruction journey diverge from young adults? Relatedly, could alterations in reconstruction due to age influence elaboration?

Two cognitive mechanisms that capture the path to reconstruction, and might lead to overgeneral memory, are direct and generative retrieval (Conway, 2001, 2005; Conway & Pleydell-Pearce, 2000). Direct retrieval can be likened to a bottom-up process and is thought to occur when details associated with a specific spatiotemporal context are accessed immediately, imbedding an event in a particular place and time (Conway & Pleydell-Pearce, 2000; Harris, O’Connor, & Sutton, 2015). In contrast, generative retrieval is believed to be a top-down, iterative process, as semantic memories are initially recalled and used as cues to focus the mental search toward a specific event. The EAM reconstruction process is therefore a combination of the functionality of both retrieval routes, such that EAMs can be called to mind instantaneously, or one can follow a semantic-to-episodic path to retrieval. However, despite the importance of direct and generative retrieval to EAM reconstruction, to our knowledge, no study has investigated whether older adults’ tendency to settle on overgeneral autobiographical memories is a consequence of altered direct retrieval, generative retrieval, or both.

Current theoretical models and empirical studies suggest that direct retrieval may be affected by normal aging. An important feature of direct retrieval is that this route likely places high demands on the efficient binding of episodic details, enabling a unique event to be rapidly reconstructed in the mind’s eye. In support of this viewpoint, prior research has shown that the hippocampus, a region that is critical for relational processing of episodic details (Cohen & Eichenbaum, 1993; Cohen, Poldrack, & Eichenbaum, 1997; Eichenbaum & Cohen, 2001; Yonelinas, 2014), is more strongly activated by direct retrieval relative to generative retrieval (Addis et al., 2012). The link between direct retrieval and binding is critical, because normal aging is associated with poorer relational processing (Chalfonte & Johnson, 1996; Lyle, Bloise, & Johnson, 2006; Mitchell, Johnson, Raye, & D’Esposito, 2000; Naveh-Benjamin, 2000; Old & Naveh-Benjamin, 2008). Less efficient binding, therefore, may be associated with a relatively lowered reliance on direct retrieval. Indeed, this may contribute to why older adults demonstrate decreased activation of the medial temporal lobe during EAM reconstruction relative to young adults (Addis, Roberts, & Schacter, 2011).

Generative retrieval also may be vulnerable to an age-related semantic shift. For generative retrieval, theoretical models have emphasized the importance of executive resources as autobiographical memory stores are iteratively filed through and verified (Burgess & Shallice, 1996; Conway & Pleydell-Pearce, 2000). Indeed, in one study, working memory was positively associated with the specificity of events chosen during reconstruction in both young and older adults (Ros et al., 2017). Addis et al. (2012) also found that in young adults, lateral prefrontal and temporal cortical regions showed stronger activation early during generative retrieval, suggesting that utilizing this route may require coactivation of executive resources and semantic memory. Relatedly, Klein and colleagues (Klein, German, Cosmides, & Gabriel, 2004) posited that disruptions in inhibition could cause a “cueing cascade,” where semantic or episodic information provokes the retrieval of other events in a disorganized fashion, resulting in incoherent recollection. Considering age-related effects on executive functioning (Glisky, 2007; Glisky & Glisky, 2008), there are a few ways in which generative retrieval may be altered. For instance, the starting point may shift, with older adults beginning reconstruction at highly abstract semantic autobiographical knowledge. Alternatively, or in addition to this shift in how reconstruction begins, older adults may remain within semantic memories longer and/or experience a disorganized cueing cascade leading to the termination of reconstruction without reaching an EAM.

Linking age-effects on reconstruction and elaboration

Studying age-related differences in direct and generative retrieval during EAM reconstruction is not only important for identifying the cognitive mechanisms underlying overgeneral autobiographical memory but also critical for understanding whether age-related effects are consistent across the entire EAM retrieval process (i.e., reconstruction to elaboration). One possibility is that there is a core set of cognitive processes that are operative across all parts of retrieval, guiding the trajectory from cue presentation to elaboration. For example, relational processing, working memory, and/or motivation may drive the nature of reconstruction and elaboration. From this viewpoint, individuals who demonstrate highly efficient reconstruction should also be highly episodically detailed during elaboration. Alternatively, cognitive factors that contribute to early mechanisms of reconstruction may diverge from those that alter the quality of elaboration, eliciting a weak relationship between the two. Investigating the extent to which individual differences in young and older adults’ abilities to internally navigate early reconstruction efficiently via direct and generative retrieval are related to their abilities to elaborate in rich episodic (or internal) detail could help close this gap in knowledge, and in the process shed light on the connection between age-related effects on different aspects of EAM retrieval.

Present study

In the present study, we examined whether there is an age-related semantic shift to direct or generative EAM retrieval. As a secondary aim, we explored whether there is a link between reconstruction and elaboration. We adopted a “think-aloud” paradigm (D’Argembeau & Mathy, 2011; Uzer, Lee, & Brown, 2012) to assess the efficiency of direct and generative retrieval routes in cognitively healthy older and young adults. This design required participants to verbalize their mental reconstruction for specific events in response to cue words. Each think-aloud trial was scored for different types of autobiographical and general semantic content that correspond to their specificity of spatiotemporal context. This scoring provided a fine-grained analysis of the level at which participants started and ended reconstruction, as well as the types of memories accessed in between. To investigate the link with elaboration, participants revisited some of the EAMs generated during the think-aloud portion and described them in detail, as is commonly done in autobiographical memory studies.

We hypothesized that there would be age-related differences in the efficiency of both direct and generative retrieval modes. Consistent with a semantic shift view, we predicted that older adults would less often engage in direct retrieval relative to young adults. We also predicted that in instances of generative retrieval, older adults would less often follow a funneled iterative process to a specific event (i.e., they would terminate reconstruction at semantic memories), and they would demonstrate less “episodic efficiency” in generative retrieval, as evidenced by accessing more memories during reconstruction, starting at more abstract memories, and taking longer to complete retrieval. Finally, we predicted that if there is a common set of cognitive processes that contribute to age-related alterations in EAM reconstruction and elaboration, higher rates of retrieving a specific event via direct retrieval and generative retrieval would be correlated with higher internal detail generation.

Method

Participants

Cognitively normal young (age range 23 to 30 years) and older adults (age range 65 to 78 years) were recruited from the Tucson community. To be eligible for the current study, participants had to score within normal limits for depressive symptoms (<16) on the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977). Older adults also had to score within normal limits for cognition on the Montreal Cognitive Assessment (MoCA; Nasreddine et al., 2005), using a cutoff with good sensitivity and specificity for mild cognitive impairment (i.e., less than 25; Roalf et al., 2013). Both young adults (mostly graduate students) and older adults had to deny a history of learning disabilities or attention problems and neurological conditions. All participants provided informed consent in accordance with the Institutional Review Board of the University of Arizona.

We aimed to enroll 20 participants in each age group. We justified this sample size because it was comparable to other studies of age differences in EAM retrieval (Addis et al., 2008; Ford et al., 2014; Levine et al., 2002; St. Jacques & Levine, 2007). We also referenced the effect size for the difference between generative and direct reconstruction frequencies from an earlier study with young adults (D’Argembeau & Mathy, 2011, \( {\eta}_p^2 \) = .45) to conduct an a priori power analysis. With 20 participants in each age group, the partial eta squared value from D’Argembeau and Mathy (2011) gave us high power (.99) to detect differences in direct and generative retrieval rates with an alpha level of .05 (two-tailed). In a sensitivity power analysis, these group sizes also gave us adequate power (.80) to detect medium to large effects for our comparisons of primary interest, namely, a group difference in direct retrieval rates, and a Group × Ending Memory Type interaction in generative retrieval (d ~ 0.91 and \( {\eta}_p^2 \) ~ .18, respectively, two-tailed). A sample size of 40 total participants also gave us adequate power (.80) to detect medium size relationships (\( {r}^2 \) ~ .16) between reconstruction and elaboration performance (two-tailed).
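
The sensitivity figure for the group comparison can be sanity-checked with a short sketch. This is not the software used for the reported power analysis (the text does not name it); it relies on a normal approximation to the two-tailed independent-samples t test, and the function name `approx_power_two_sample` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed independent-samples t test.

    Assumption: the test statistic is treated as normal rather than
    noncentral t, so the result is only a rough guide for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # two-tailed critical value
    noncentrality = d * sqrt(n_per_group / 2)  # expected z for effect size d
    # Probability of landing in either rejection region.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


# Values taken from the text: d ~ 0.91 with 20 participants per group.
power = approx_power_two_sample(d=0.91, n_per_group=20)
print(round(power, 2))
```

Under this approximation the result lands close to the reported .80; the normal approximation tends to run slightly optimistic relative to an exact noncentral t calculation.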

In total, 49 participants were recruited, but six were excluded because they reported elevated depressive symptoms on the CES-D, and three older adults were excluded based on low performance on the MoCA. Therefore, as planned, there were 20 participants in each group, which were matched on self-reported education, t(38) = 0.99, p = .33, 95% CI [−1.37, 0.47], and gender (young adults: eight males/12 females; older adults: seven males/13 females). Consistent with prior research (Grilli & Verfaellie, 2015; Grilli, Wank, & Verfaellie, 2018), both young and older adults were given the Vocabulary subtest of the Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV; Wechsler, 2008), given that autobiographical memory tasks can place heavy demands on verbal skills. Older (M = 13.95, SD = 1.36) and young adults (M = 13.70, SD = 2.58) demonstrated similar levels of verbal intelligence (Vocabulary scaled scores) when correcting for age, t(28.77) = 0.38, p = .70, 95% CI [−1.08, 1.58]. See Table 1 for demographic and related sample data.

Table 1 Demographics and neuropsychological performance

Materials and procedure

Reconstruction experimental task

To assess reconstruction, we adopted a “think-aloud” paradigm similar to D’Argembeau and Mathy (2011) and Uzer et al. (2012), where, in response to a cue word, participants were instructed to retrieve specific autobiographical events (or EAMs) that occurred within the past 5 years. In our instructions to participants, a specific autobiographical event was defined as an event that occurred at a particular time and place (i.e., within a single day) and that was personally experienced, as opposed to an event heard about from others. Participants were then instructed to immediately begin speaking aloud after seeing the cue, to capture the entire reconstruction process. The participants were also reminded that they should try to say everything that comes to mind, even if it is not related to the cue word. Finally, participants were told that the specific event they retrieve did not have to be related to the cue word and that the cue should be used as a starting point.

After these introductory instructions, participants were presented with one demonstration cue word and two practice cue words with feedback, in which the experimenter clarified the definition of specific events if one was not given and reminded participants to continue speaking aloud as best they could. If participants retrieved specific events during practice items, the experimenter confirmed that the chosen events were consistent with the main goal of the task. Test items included 20 cue words (e.g., restaurant, bird, storm) and were chosen from the Clark and Paivio (2004) extended norms. Cue words were high in Thorndike–Lorge frequency (M = 1.70, SD = 0.40), imageability (M = 6.50, SD = 0.31), and concreteness (M = 6.70, SD = 0.36), and reflected comparable means and standard deviations reported by Addis et al. (2011). For each participant, the cue words were presented on a computer screen and in a random order using DMDX (Forster & Forster, 2003). Participants were given up to 2 minutes to retrieve a specific event (Addis & Schacter, 2008), but were not required to speak for the entire time allotted and were not encouraged to come up with a more specific event. If at any point during a trial the participant remained silent for more than a few seconds without indicating that they completed thinking aloud (e.g., verbal or nonverbal cues that they were finished), they were reminded to describe everything that was coming to mind. No other forms of prompting were provided, regardless of whether the participants retrieved specific or nonspecific memories, because we did not want to contaminate or alter remaining think-aloud trials. All responses were manually written down during the task administration so that experimenters could identify think-aloud trials when a specific event was recalled. From this list of specific events established by the experimenter, participants later chose six to describe during elaboration (see Elaboration Experimental Task section). 
Their responses were also audio recorded and later transcribed for formal scoring. The experimenters manually recorded time to retrieval of a specific event after the cue was presented.

Reconstruction scoring protocol

A coding scheme inspired by a hierarchical representation of autobiographical memory (Conway, 2005; Conway & Pleydell-Pearce, 2000) was used to identify different levels of autobiographical memory utilized by participants during their reconstruction process (D’Argembeau & Mathy, 2011; Haque & Conway, 2001). Five memory types were used to score reconstruction content, ranging from abstract to specific. These were (1) metacomments, or metacognitive statements, editorializing, or other content unrelated to reconstruction (e.g., “I can’t remember”); (2) abstract facts, or semantic facts about world knowledge or personal semantic facts about themselves or other people in their lives that are not tied to a specific time or place (e.g., “I am a runner”); (3) lifetime periods, or content describing time periods lasting at least 1 month (e.g., “We’ve been having some very exciting storms this summer”); (4) general events, or events lasting longer than 24 hours, but shorter than 1 month in duration (extended, e.g., “Well, we were on vacation in [place] again . . .”) and repeated instances of the same event that occur within 24 hours (repeated, e.g., “. . . I take her every Thursday to her physical therapy appointment”); and (5) specific events, or singular events that occur within a 24-hour period (e.g., “. . . about 3 weeks ago, my girlfriends and I drove over to [place] to pick peaches . . .”). Each think-aloud trial was scored such that we identified the starting memory type (the first memory retrieved) and ending memory type (the last memory type retrieved) using the operational definitions. Direct retrieval trials were categorized as think-aloud trials that began (and ended) with a specific event as scoring ceased once a specific event was retrieved. The remaining were identified as generative retrieval trials. We also scored all memories that were retrieved in between the starting and ending memories for generative retrieval trials. 
Our scoring procedure deviated from D’Argembeau and Mathy (2011) in that we scored for multiple, sequential memories that were the same type (e.g., a general event that follows a general event would be scored as two different memories; see Table 2 for examples). The average time (in seconds) to retrieve a specific event was also calculated.
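
The trial-level scoring rules described above can be sketched as follows. This is an illustration only, not the scoring software used in the study: the label strings, the function names `score_trial` and `direct_proportion`, and the example trials are all hypothetical, and each trial is assumed to contain at least one scored memory.

```python
# Memory types in the order given by the scoring protocol (abstract to specific).
MEMORY_TYPES = [
    "metacomment",
    "abstract fact",
    "lifetime period",
    "general event",
    "specific event",
]


def score_trial(memories):
    """Summarize one think-aloud trial from its ordered memory-type labels."""
    # Scoring ceases once a specific event is retrieved.
    if "specific event" in memories:
        memories = memories[: memories.index("specific event") + 1]
    start, end = memories[0], memories[-1]
    return {
        "start": start,
        "end": end,
        # Direct retrieval: the first memory is already a specific event.
        "route": "direct" if start == "specific event" else "generative",
        "n_memories": len(memories),
        "successful": end == "specific event",
    }


def direct_proportion(trials):
    """Proportion of valid trials that were scored as direct retrieval."""
    scored = [score_trial(t) for t in trials]
    return sum(s["route"] == "direct" for s in scored) / len(scored)


# Hypothetical valid trials for one participant.
example_trials = [
    ["specific event"],
    ["abstract fact", "general event", "specific event"],
    ["general event", "general event"],  # same type twice counts as two memories
    ["metacomment", "abstract fact"],
]
print(score_trial(example_trials[1]))
print(direct_proportion(example_trials))
```

Keeping repeated labels as separate entries mirrors the stated deviation from D’Argembeau and Mathy (2011), where sequential memories of the same type are counted individually.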

Table 2 Scoring examples of participant responses

Consistent with prior research (Grilli, Wank, & Verfaellie, 2018; Verfaellie, Bousquet, & Keane, 2014), a primary scorer (A.A.W.) scored all think-aloud trials. To determine interrater reliability for our reconstruction phase scoring protocol, the primary scorer and a secondary scorer (M.D.G.) scored a practice set of five participants randomly selected from another study administering this think-aloud task. For each participant, we calculated separate proportions based on the number of items for which they started and ended the think-aloud trial with each memory type out of all valid trials. We also calculated the total number of each memory type that was scored and the average number of memory types accessed during the think-aloud trials for each participant. Interrater reliability was good to excellent for all starting memory type proportions, memory type totals, and average memory types accessed per think-aloud trial (Cronbach α range = .86 to 1). The ending memory type proportions for metacomments, general events, and specific events showed excellent reliability (Cronbach α range = .98 to 1). We were unable to calculate ending memory reliability for abstract facts and lifetime periods due to lack of variance and relative infrequency for these memory types, as scored by both raters (mean frequencies for the two raters ranged from 0 to .05 for these memory types).

Follow-up questions for each reconstruction trial

Participants were asked to rate the emotional quality and age of all specific memories retrieved to determine whether these variables were comparable between groups, given that memory remoteness and emotionality have been shown to influence other aspects of EAM retrieval (Grilli, Wank, Bercel, & Ryan, 2018; St. Jacques & Levine, 2007). We had no specific hypotheses about differences between groups on these two variables. For emotional quality, participants were asked to rate each specific event retrieved on a scale of 1 (extremely negative) to 5 (extremely positive). Participants also indicated when each of these specific events occurred: (1) within the past year, (2) between 1 and 2 years ago, (3) between 2 and 3 years ago, (4) between 3 and 4 years ago, and (5) between 4 and 5 years ago. Finally, as a manipulation check of the immediacy of the reconstruction process, after completing the reconstruction task and the emotional quality and age ratings, experimenters provided participants with a sheet of paper listing all of the cue words and asked them to circle any words for which they did not share a memory that came to mind. We did not ask participants to provide a reason, but some provided an explanation on their own.

Elaboration experimental task

After the reconstruction task, participants were instructed to select six of the 20 EAMs (or of the total number of EAMs generated by the participant and identified by the experimenter) to describe in further detail. Once selected from the list of specific events curated by the experimenter, they were asked to describe the event in as much detail as possible in concordance with the standard Autobiographical Interview (AI) procedure (Levine et al., 2002). They were given up to 5 minutes to describe each EAM and were provided a general probe asking for more details if they ended their elaboration before 5 minutes. If a participant was unable to generate at least six EAMs (this was the case for two older adults), the maximum number of available specific events was used for elaboration. There were a few instances where the experimenter marked a think-aloud trial as having a specific event recalled, but the memory was later determined to be nonspecific in the formal scoring. These elaboration trials were removed from the analyses. The elaboration responses were audio recorded and transcribed.

Elaboration scoring protocol

The AI scoring procedure (Levine et al., 2002) was utilized to score elaborations. This protocol breaks up the narrative into two main categories, “internal,” or episodic, and “external,” or semantic and other details. These categories were further divided into more specific detail types. For the internal category, we scored for details relating to the unfolding of the event, locations, time references, thoughts and emotions had during the event, and perceptual experiences. Semantic details were coded for world knowledge and semantic information provided about the self. Other details included metacognitions, repetition of information, and external components (episodic events outside the chosen event). Internal and external details were totaled separately within each narrative, and then averaged across the total number of events elaborated.

One rater coded all narratives from each participant. This rater was trained using sample narratives provided by Levine et al. (2002), and achieved excellent reliability on a second set of training narratives from our laboratory that were scored by an expert AI scorer (Cronbach α’s > .90).

Statistical procedures

Reconstruction

For direct and generative retrieval analyses, we used proportions of all valid trials because, for eight older adults and nine young adults, one to five trials were removed for the following reasons: (1) an event was retrieved from more than 5 years ago, (2) the same event was provided for a cue word previously presented, (3) a future event was provided, (4) a memory was retrieved from the same day as testing, (5) a specific event was recalled for which they were not personally involved, or (6) the audio recorder did not record part of the think-aloud trial for an item.

To determine whether the current study replicated previous findings of overgeneral autobiographical memory in older adults, we conducted independent-samples t tests of total specific events retrieved across all valid retrieval trials (regardless of whether they were direct or generative) between older and young adults. However, since there were differences in the number of valid trials across participants, we also conducted independent-samples t tests to examine age differences in the proportion of trials that ended with specific events out of all valid think-aloud trials.

Direct retrieval analyses used the proportion of trials for which participants began reconstruction with a specific event (e.g., out of the 20 trials for a given individual, how many had a specific event as the first memory?). We also calculated the average time (in seconds) to retrieve specific events via direct retrieval once the cue word was presented. Independent-samples t tests were used to assess for age differences in direct retrieval proportions and average reaction time.

Generative retrieval analyses focused only on valid trials for which the first memory retrieved was nonspecific (i.e., we removed all direct retrieval trials). To analyze where in the autobiographical memory hierarchy participants initiated reconstruction, we created proportions of generative retrieval think-aloud trials beginning with each nonspecific memory type (i.e., metacomments, abstract facts, lifetime periods, and general events) divided by total valid generative retrieval trials. Similarly, we calculated proportions for the number of generative retrieval trials that ended with each memory type out of total generative trials. We also calculated the average number of memories accessed per generative retrieval trial when they retrieved a specific event (“successful”) and, separately, when they did not retrieve a specific event (“unsuccessful”). Average word count and average time (in seconds) variables were created for generative retrieval trials that were successful and unsuccessful. Lastly, we attempted to capture lateral or backwards “movement” through the autobiographical memory hierarchy, which may indicate difficulty in retrieving a specific event (e.g., retrieving multiple abstract facts in a row or recalling a lifetime period, then a general event, only to move back to a lifetime period). For this analysis, we calculated the proportion of successful generative retrieval trials for which a lateral or backwards move was made out of total successful generative retrieval trials. We also created the same variable for unsuccessful generative retrieval trials.
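
One way to operationalize the lateral or backwards "movement" measure is sketched below. The numeric levels are an assumption that follows the abstract-to-specific ordering of the scoring protocol (including treating metacomments as the most abstract level); the text does not state exactly how moves involving metacomments were handled, and the function names are hypothetical.

```python
# Assumed ordinal position of each memory type in the hierarchy.
LEVEL = {
    "metacomment": 1,
    "abstract fact": 2,
    "lifetime period": 3,
    "general event": 4,
    "specific event": 5,
}


def has_lateral_or_backward_move(memories):
    """True if any memory is at the same or a more abstract level than the one before it."""
    levels = [LEVEL[m] for m in memories]
    return any(later <= earlier for earlier, later in zip(levels, levels[1:]))


def proportion_with_move(trials):
    """Proportion of the given generative trials containing at least one such move."""
    return sum(has_lateral_or_backward_move(t) for t in trials) / len(trials)


# Hypothetical successful generative trials for one participant.
successful_trials = [
    ["abstract fact", "abstract fact", "specific event"],                     # lateral
    ["lifetime period", "general event", "lifetime period", "specific event"],  # backward
    ["abstract fact", "general event", "specific event"],                     # funneled
    ["general event", "specific event"],                                      # funneled
]
print(proportion_with_move(successful_trials))
```

The same function would be applied separately to the unsuccessful generative trials to obtain the second variable.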

Starting and ending memory type proportions in generative retrieval trials were compared between older and young adults with a 2 (age group: older/young) × 4 (starting memory type: metacomment/abstract fact/lifetime period/general event) mixed analysis of variance (ANOVA) and a 2 (age group) × 5 (ending memory type: metacomment/abstract fact/lifetime period/general event/specific event) mixed ANOVA, respectively. Main and interaction effects were followed up with post hoc t tests. We used a Bonferroni correction for main effects of starting and ending memory types because we did not have specific predictions about the relative frequency at which memory types would be retrieved. Independent-samples t tests were conducted to examine age-related differences in average memories accessed, average word count, proportion of lateral or backward moves, and average reaction time in both successful and unsuccessful generative retrieval trials.

Follow-up questions for each reconstruction trial

We calculated the average emotional valence rating and age of specific memories retrieved to compare between older and young adults using independent-samples t tests. The average number of think-aloud trials for participants who thought of a memory but did not share it was also calculated and compared between age groups using an independent-samples t test.

Elaboration

A 2 (age group) × 2 (detail type: internal/external) mixed-measures ANOVA and post hoc t tests were conducted to examine age differences in episodic compared with nonepisodic detail generation while elaborating on specific events. In addition, we examined age-related differences in episodic specificity (i.e., the average number of internal details divided by total details) to be consistent with procedures from prior work (Grilli, Wank, Bercel, & Ryan, 2018; Levine et al., 2002). Finally, Pearson correlations between direct and successful generative retrieval proportions and average internal detail production from elaborations were conducted to determine whether there were similarities in the retrieval process across retrieval phases. We also calculated the difference between the correlation values of internal detail production with direct and successful generative retrieval proportions across the entire sample (Diedenhofen & Musch, 2015; Steiger, 1980).
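
The episodic specificity ratio and the reconstruction-elaboration correlation can be illustrated with a small sketch. The data are invented for the example, the function names are hypothetical, and the ratio is computed from per-participant averages, which is one reading of "the average number of internal details divided by total details".

```python
from math import sqrt


def episodic_specificity(internal_counts, external_counts):
    """Average internal details divided by average total details across narratives."""
    mean_internal = sum(internal_counts) / len(internal_counts)
    mean_external = sum(external_counts) / len(external_counts)
    return mean_internal / (mean_internal + mean_external)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical participant: detail counts for two elaborated events.
print(episodic_specificity([30, 20], [10, 20]))

# Hypothetical sample: direct retrieval proportions and mean internal details.
direct_props = [0.10, 0.25, 0.40, 0.55]
internal_means = [18.0, 22.5, 30.0, 41.0]
print(pearson_r(direct_props, internal_means))
```

Comparing the two dependent correlations (internal details with direct versus successful generative proportions) requires a dedicated test such as the one cited in the text and is not reproduced here.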

Results

Reconstruction

Ending memory type of all think-aloud trials

Replicating previous research, we found that older adults (M = 12.15, SD = 4.80) produced fewer specific events compared with young adults (M = 16.25, SD = 2.61) when collapsing across direct and generative retrieval trials, t(29.34) = 3.35, p = .002, 95% CI [−6.60, −1.60], d = 1.06. When controlling for differences across participants in the number of valid think-aloud trials, older adults (M = .63, SD = .24) also had a lower proportion of trials where specific events were retrieved in comparison with young adults (M = .84, SD = .13), t(29.17) = 3.56, p = .001, 95% CI [−.34, −.09], d = 1.12.
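
The first comparison above can be approximately reproduced from the reported summary statistics. The fractional degrees of freedom suggest an unequal-variance (Welch) t test, but that is an inference rather than something the text states; the effect size formula (difference divided by the root of the averaged variances) is likewise an assumption, and `welch_from_summary` is a hypothetical helper.

```python
from math import sqrt


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t, Welch-Satterthwaite df, and Cohen's d from group means and SDs."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2      # squared standard errors
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    d = (m1 - m2) / sqrt((sd1 ** 2 + sd2 ** 2) / 2)  # assumes equal group sizes
    return t, df, d


# Reported values: young adults first, older adults second, 20 per group.
t, df, d = welch_from_summary(16.25, 2.61, 20, 12.15, 4.80, 20)
print(round(t, 2), round(df, 2), round(d, 2))
```

Small discrepancies from the published values are expected because the inputs are rounded to two decimals.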

Direct retrieval

As represented in Fig. 1, older adults (M = .23, SD = .17) less often engaged in direct retrieval in comparison with young adults (M = .36, SD = .21), t(38) = 2.08, p = .04, 95% CI [−.25, −.003], d = 0.66. Older adults (M = 5.83, SD = 3.02) took longer (in seconds) than young adults (M = 3.87, SD = 2.19) to retrieve a specific event in direct retrieval trials, t(36) = 2.31, p = .03, 95% CI [0.24, 3.68], d = 0.75.

Fig. 1

Direct retrieval proportions. * represents statistical significance at the p < .05 level. Lines within boxplots represent medians, and triangles represent means. Horizontal edges represent first and third quartiles. Whiskers represent 1.5 standard deviations above the upper quartile and below the lower quartile. Figure created using RStudio (R Core Team, 2019) and the ggplot2 package (Wickham, 2016)

Generative retrieval

Starting memory type. Results from a two-way 2 (age group) × 4 (starting memory type) mixed ANOVA with the Greenhouse–Geisser correction included a significant main effect of starting memory, F(2.10, 79.88) = 34.75, p < .001, \( {\eta}_p^2 \) = .48, indicating that, as shown in Fig. 2, starting memory types differed across participants regardless of age. However, there was no interaction, F(2.10, 79.88) = 0.31, p = .74, \( {\eta}_p^2 \) = .008, or effect of age group (as we examined proportional data). Given that we did not have specific predictions about the relative frequency of starting memory types, we used a Bonferroni correction for the post hoc t tests (adjusted alpha = .008). The most common starting memory type was abstract facts (M = .50, SD = .20), followed by metacomments (M = .21, SD = .19) and general events (M = .21, SD = .15), and then lifetime periods (M = .08, SD = .07). All contrasts of starting memory type proportions were significant, ts > 3.75, ps < .001, ds > 0.93, except for the comparison of metacomments and general events, t(40) = .12, p = .91, d = .03.

Fig. 2

Generative retrieval: Starting memory type. Proportions of generative think-aloud trials that began with each memory type. Lines within boxplots represent medians, and triangles represent means. Horizontal edges represent first and third quartiles. Whiskers represent 1.5 standard deviations above the upper quartile and below the lower quartile. Figure created using RStudio (R Core Team, 2019) and the ggplot2 package (Wickham, 2016)

Ending memory type. A two-way 2 (age group) × 5 (ending memory type) mixed ANOVA with the Greenhouse–Geisser correction revealed a main effect of ending memory, F(1.43, 54.38) = 79.36, p < .001, \( {\eta}_p^2 \) = .68, suggesting that there were differences in memory types that ended think-aloud trials across all participants. The analysis also found an interaction between age group and ending memory type, F(1.43, 54.38) = 5.22, p = .02, \( {\eta}_p^2 \) = .12, indicating that young and older adults differed in the memory types ending think-aloud trials. Critically, as shown in Fig. 3, older adults ended at specific events (M = .52, SD = .29) less often than young adults (M = .73, SD = .24), t(38) = 2.52, p = .02, 95% CI [−.38, −.04], d = 0.80, whereas older adults ended at general events (M = .27, SD = .19) more often than young adults (M = .15, SD = .15), t(38) = 2.28, p = .03, 95% CI [.01, .23], d = 0.72. Young and older adults did not statistically differ in how often they ended their reconstruction at metacomments, abstract facts, or lifetime periods, ts < 1.82, ps > .08, ds < 0.58. As expected, there was no effect of group, given that we investigated proportional data.

Fig. 3

Generative retrieval: Ending memory type. Proportions of generative think-aloud trials that ended with each memory type. * represents statistical significance at the p < .05 level. Lines within boxplots represent medians, and triangles represent means. Horizontal edges represent first and third quartiles. Whiskers represent 1.5 standard deviations above the upper quartile and below the lower quartile. Figure created using RStudio (R Core Team, 2019) and the ggplot2 package (Wickham, 2016)

For the post hoc t tests for ending memory type, we again used a Bonferroni correction, as we did not have specific predictions about the relative frequencies (adjusted alpha level = .005). Specific events were the most common end point among young adults, ts > 6.77, ps < .001, ds > 2.94. Although young participants ended think-aloud trials more often with general events compared with lifetime periods, t(19) = 4.36, p < .001, d = 1.05, they did not statistically differ in how often they ended trials with other memory types, ts < 3.03, ps > .007. The specific event level was also the most common end point among older adults compared with metacomments, abstract facts, and lifetime periods, ts > 4.62, ps < .001, ds > 1.98. The contrast between the proportions of think-aloud trials ending at specific events and at general events in this age group was also significant at the uncorrected level, t(19) = 2.40, p = .02, d = 1.02, but did not survive correction for multiple comparisons (Bonferroni-corrected alpha value = .005). Ending think-aloud trials at general events was also more common than ending at metacomments, abstract facts, and lifetime periods in the older adult group, ts > 4.39, ps < .001, ds > 0.96. There were no differences in the proportions of think-aloud trials on which older adults ended at metacomments, abstract facts, and lifetime periods, ts < 2.48, ps > .02, ds < 0.67.
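The adjusted alpha levels reported for the starting (.008) and ending (.005) memory types are consistent with dividing a familywise alpha of .05 by the number of pairwise contrasts among the memory types. The sketch below shows that arithmetic; reading the correction this way is our inference from the reported values, and the function name is illustrative.

```python
from math import comb


def bonferroni_alpha(n_levels, family_alpha=0.05):
    """Per-test alpha when every pair of levels is contrasted once."""
    n_contrasts = comb(n_levels, 2)  # number of unique pairwise contrasts
    return family_alpha / n_contrasts


print(round(bonferroni_alpha(4), 3))  # 4 starting memory types -> 6 contrasts
print(round(bonferroni_alpha(5), 3))  # 5 ending memory types -> 10 contrasts
```

A p value is then treated as significant only if it falls below the per-test alpha, which is why a contrast with p = .02 does not survive correction.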

Additional analyses of generative retrieval. Older adults (M = 3.14, SD = 0.92) did not significantly differ in average memories accessed during generative retrieval trials that ended with a specific event (i.e., successful generative retrieval) compared with young adults (M = 2.99, SD = 0.79), t(38) = 0.55, p = .59, 95% CI [−0.40, 0.70], d = 0.17. On trials of generative retrieval ending at nonspecific memory types (i.e., unsuccessful generative retrieval), there was a marginal effect where older adults (M = 4.23, SD = 2.14) retrieved more memories during think-aloud trials on average than young adults (M = 3.07, SD = 1.47), t(30.15) = 1.90, p = .07, 95% CI [−0.09, 2.41], d = 0.63. Regarding word count, which may generally reflect the amount of content retrieved during generative retrieval, we found that average words did not differ between older (M = 38.84, SD = 22.24) and young adults (M = 31.26, SD = 22.55) for generative think-aloud trials ending at specific events, t(38) = 1.07, p = .29, 95% CI [−6.75, 21.92], d = 0.34. Interestingly, for generative think-aloud trials ending at nonspecific event levels, older adults (M = 107.13, SD = 55.54) provided significantly more words on average than young adults (M = 66.14, SD = 42.49), t(34) = 2.49, p = .02, 95% CI [7.49, 74.49], d = 0.83. Two older adults and two younger adults were excluded from the generative retrieval analyses of ending at nonspecific events because they provided a specific event on every generative retrieval trial.

To provide a clearer picture on the efficiency of generative retrieval, we sought to determine whether, in comparison with young adults, older adults made more lateral or backward moves during all generative reconstruction trials (e.g., two or more abstract facts in a row or moving from a lifetime period to an abstract fact), which could reflect reconstruction that is less episodically goal consistent (i.e., “cascaded cueing,” as Klein et al., 2004, might suggest). For each participant, we calculated a proportion of cue words for which at least one lateral or backward move was made divided by total valid trials. A two-tailed, independent-samples t test revealed no significant differences between young (M = .40, SD = .32) and older adults (M = .39, SD = .29) when examining both generative think-aloud trials ending at specific events, t(38) = 0.08, p = .93, 95% CI [−.20, .19], and trials ending at nonspecific events (young adults: M = .54, SD = .36; older adults: M = .69, SD = .31), t(33) = 1.38, p = .18, 95% CI [−.07, .39]. Finally, the time taken (in seconds) from when the cue word was presented to reaching a specific event in generative retrieval think-aloud trials did not statistically differ between young (M = 19.92, SD = 12.13) and older adults (M = 26.40, SD = 12.11), t(38) = 1.69, p = .10, 95% CI [−1.28, 14.24], d = 0.53.
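The lateral or backward move measure can be sketched as a simple check over the ordered sequence of memory types coded in each trial. The ranking below (abstract fact, lifetime period, general event, specific event) is an assumption based on the examples given above, the choice to skip metacomments when comparing levels is also an assumption, and the trial data are hypothetical.

```python
# ASSUMED ordering from most abstract (0) to most specific (3).
LEVEL = {
    "abstract fact": 0,
    "lifetime period": 1,
    "general event": 2,
    "specific event": 3,
}


def has_lateral_or_backward_move(trial):
    """True if any step stays at the same level or returns to a more abstract one."""
    # Codes without a rank (e.g., metacomments) are skipped: an assumption.
    ranks = [LEVEL[m] for m in trial if m in LEVEL]
    return any(nxt <= prev for prev, nxt in zip(ranks, ranks[1:]))


def proportion_with_move(trials):
    """Trials with at least one such move divided by the total number of trials."""
    return sum(has_lateral_or_backward_move(t) for t in trials) / len(trials)


# Hypothetical coded trials for one participant
trials = [
    ["abstract fact", "abstract fact", "specific event"],   # lateral move
    ["lifetime period", "abstract fact", "general event"],  # backward move
    ["abstract fact", "general event", "specific event"],   # forward only
    ["metacomment", "general event", "specific event"],     # forward only
]
print(proportion_with_move(trials))
```

The per-participant proportions produced this way would then be compared between age groups with an independent-samples t test, as described above.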

Follow-up questions for each reconstruction trial

The mean emotional valence rating for the EAMs retrieved by the young adults (M = 3.47, SD = 0.36) and older adults (M = 3.58, SD = 0.52) was between neutral and somewhat positive, and did not significantly differ between groups, t(38) = 0.78, p = .44, 95% CI [−0.18, 0.40]. Regarding the age of the EAMs, the mean response was not significantly different between the age groups, t(38) = 0.06, p = .95, 95% CI [−0.29, 0.27], and was within the past 2 years for both the young adult group (M = 1.73, SD = 0.46) and the older adult group (M = 1.72, SD = 0.42). Finally, 15 young adults and 11 older adults reported first nonverbally retrieving a memory that they ultimately decided not to share before providing the recorded response. For these individuals, deciding not to report the memory that first came to mind occurred for 3.60 (SD = 2.47) cue words on average in the young adults and 3.36 (SD = 2.94) cue words on average in the older adults, t(24) = 0.22, p = .83, 95% CI [−2.43, 1.96]. Nine of these participants volunteered that the memory first retrieved, but not ultimately shared, did not fit the constraints of the instructions. Notably, all generative retrieval results presented in previous sections remained the same with these trials removed from the analyses (ending memory main effect and interaction: ps < .05; starting memory main effect: p < .001); however, the age effect on direct retrieval became nonsignificant, t(38) = 1.72, p = .09, 95% CI [−.24, .02].

Supplemental data on reconstruction

Please see Supplemental Fig. S1 for a depiction of the frequency at which different types of memory were retrieved during reconstruction.

Elaboration

Age comparison of detail composition

One older adult did not complete the elaboration portion of the task because of time constraints. A mixed-measures 2 (age group) × 2 (detail type) ANOVA revealed both a main effect of detail type, F(1, 37) = 201.34, p < .001, \( {\eta}_p^2 \) = .85, and an interaction of age group and detail type, F(1, 37) = 4.92, p = .03, \( {\eta}_p^2 \) = .12. Post hoc t tests found a marginal group difference where young adults (M = 48.27, SD = 19.50), on average, provided a greater number of internal details compared with older adults (M = 37.24, SD = 16.49), t(37) = 1.90, p = .06, 95% CI [−22.78, 0.72], d = 0.61. There were no differences between young (M = 9.27, SD = 4.29) and older adults (M = 8.78, SD = 4.80) for external detail production, t(37) = 0.34, p = .74, 95% CI [−3.44, 2.46], d = 0.11 (see Footnote 2). We also examined a proportional score for episodic specificity, which revealed that young adults (M = .84, SD = .04) demonstrated a higher degree of specificity compared with older adults (M = .81, SD = .06), t(37) = 2.08, p = .04, 95% CI [−.06, −.001], d = 0.67.
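The episodic specificity score is the number of internal details divided by the total number of details (internal plus external). The sketch below applies that ratio to the reported group means purely as an illustration. In the analysis itself the ratio would be computed for each participant and then averaged, and a ratio of group means need not equal the mean of individual ratios, although here the two agree to two decimals. The function name is illustrative.

```python
def episodic_specificity(internal, external):
    """Proportion of details that are internal (episodic)."""
    total = internal + external
    if total == 0:
        raise ValueError("specificity is undefined when no details were produced")
    return internal / total


# Reported group means for internal and external details (illustration only)
young = episodic_specificity(48.27, 9.27)
older = episodic_specificity(37.24, 8.78)
print(round(young, 2), round(older, 2))
```

Because the score is a proportion, it can differ between groups even when neither raw detail count differs significantly on its own.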

Direct and generative retrieval associations with internal details

Across the entire sample, the average number of internal details provided in elaborations was not related to direct retrieval proportions, r(37) = −.12, p = .46, but was significantly and positively correlated with successful generative retrieval proportions, r(37) = .61, p < .001. There was a significant difference between the magnitudes of these correlations, z = 3.42, p < .001. A similar pattern emerged when examining each age group separately. Internal detail production was not significantly related to direct retrieval in young, r(18) = −.29, p = .22, or older adults, r(17) = −.14, p = .58, but was positively associated with successful generative retrieval in each group, young: r(18) = .60, p = .005, older: r(17) = .54, p = .02. These correlations are shown in Fig. 4.

Fig. 4

Reconstruction and elaboration correlations. Average internal details across events per participant correlated with proportions of direct retrieval and successful generative retrieval. Dotted lines represent the regression relationships for entire sample. Figure created using RStudio (R Core Team, 2019) and the ggplot2 package (Wickham, 2016)

Discussion

Evidence from prior research suggests that older adults exhibit a tendency to be less specific in both the selection of autobiographical memories (Ford et al., 2014; Piolino et al., 2006; Ros et al., 2010; Ros et al., 2017; St. Jacques et al., 2012) and the elaboration of EAMs (Addis et al., 2008; Levine et al., 2002). Our study was the first to investigate direct and generative retrieval in the context of age-related differences in mental reconstruction and how they affect memories selected during the retrieval process. Additionally, we explored the relationship between reconstruction success via the two retrieval routes and elaboration performance. Four main findings emerged: (1) older adults produced fewer EAMs, or specific events, during reconstruction compared with younger adults, replicating prior work; (2) older adults engaged in direct retrieval less often than young adults; (3) older adults ended generative retrieval think-aloud trials with a specific event less often than young adults, and instead ended with nonspecific general events more often than young adults; and (4) the average number of internal details produced during elaborations was significantly and positively related to generative retrieval success (i.e., ending generative retrieval trials with a specific event) in both young and older adults, but was not significantly correlated with direct retrieval.

The main goal of this study was to determine the extent to which older adults exhibited a semantic shift (Devitt et al., 2017; Spreng et al., 2018; Turner & Spreng, 2015) in their reconstruction journey by focusing on two cognitive mechanisms that might contribute to overgeneral memory selection—namely, direct and generative retrieval. Consistent with prior research (Ford et al., 2014; Piolino et al., 2006; Ros et al., 2010; Ros et al., 2017; St. Jacques et al., 2012), we found that older adults selected fewer specific events relative to young adults on a cue word task. By employing a think-aloud paradigm, the current study provided novel evidence that this age-related semantic shift in EAM retrieval is, at least in part, a product of disruptions in both direct and generative retrieval. Specifically, older adults engaged in bottom-up reconstruction (i.e., directly accessing specific events) less often and more slowly, and therefore demonstrated a greater reliance on top-down reconstruction (i.e., first accessing autobiographical memory stores via generative retrieval). Generative retrieval, however, also was less effective at cueing specific events in older adults, as evidenced by a greater tendency to terminate reconstruction at nonspecific general events. In other words, our findings reveal that in comparison with young adults, older adults reconstruct less specific autobiographical memories because direct retrieval of such memories is rarer, and generative retrieval less frequently leads to a specific event.

Regarding the pattern of reconstruction that characterized generative retrieval, there were additional differences, but also some similarities, between young and older adults. Relative to young adults, older adults generated more words while they unsuccessfully attempted to reconstruct EAMs, and they accessed more nonspecific memories on these trials, although this latter difference was only marginal. These findings could be an instantiation of the cueing cascade put forth by Klein et al. (2004), as nonspecific memories led to the retrieval of more nonspecific memories in the absence of retrieving a specific event. This pattern of results was not found for generative retrieval trials that ended with a specific event, suggesting that when older adults are able to successfully complete generative retrieval, reconstruction processes are not remarkably different from those of young adults. Also, the age-related semantic shift in generative retrieval appeared not to affect other aspects of reconstruction. Compared with younger adults, the older group did not significantly differ in the nonspecific autobiographical memory types initially accessed, in the time to reach specific events, or in the proportions of think-aloud trials with lateral or backward moves in the autobiographical memory hierarchy during generative reconstruction.

Another question the current study aimed to address was whether the ability to successfully reconstruct EAMs through direct and generative routes might share a common set of cognitive processes with the ability to provide a high degree of episodic precision during elaboration of one’s memories. Previous research partly addressed this question by instructing young adults to write down and elaborate specific memories in response to cue words (Kyung, Yanes-Lukin, & Roberts, 2016). They found no relationship between the proportion of specific memories and memory detail, but did not identify which memories were retrieved via direct or generative retrieval, nor did they examine this effect in older adults. When examining direct and generative retrieval trials separately, the current study found that for both young and older adults, individuals who were better at successfully navigating the generative reconstruction process were also those who tended to generate the most internal details during elaboration of their memories. Individual differences in elaboration, however, were not significantly related to individual differences in direct retrieval in either young or older adults. Overall, these results suggest similar cognitive processes underlie both EAM generative retrieval and elaboration, whereas direct retrieval may depend on cognitive processes that are distinct and largely unrelated to those of elaboration. One possibility is that direct retrieval relies on scene construction (Hassabis, Kumaran, & Maguire, 2007; Hassabis & Maguire, 2007, 2009) or perceptual components of memory (Sheldon, Fenerci, & Gurguryan, 2019) more than generative reconstruction and elaboration. In contrast, generative reconstruction and later elaboration could be, at least in part, supported by executive functions like working memory and inhibition to keep in mind and maintain reconstruction/elaboration goals (Addis et al., 2008). 
Another possibility is that direct retrieval is an outcome of rapid pattern completion from the given cue word, whereas generative retrieval, similar to elaboration, involves searching autobiographical memory stores until an appropriate cue for pattern completion is found (Horner, Bisby, Bush, Lin, & Burgess, 2015; Sheldon & Levine, 2016). Lastly, while the present study hints at the possibility that reconstruction may shape elaboration, a future study can investigate this idea more directly by allowing think-aloud trials to naturally transition from reconstruction to elaboration.

In addition to age differences in reconstruction, we found that during elaboration, older adults had lower episodic specificity than young adults, a highly replicated result (Addis et al., 2010; Addis et al., 2008; Gaesser et al., 2011; Levine et al., 2002; Martinelli et al., 2013; St. Jacques & Levine, 2007; St. Jacques et al., 2012). However, when examining age effects on average internal and external details, older adults were marginally lower in internal details with no difference in external detail production compared with young adults. These results deviate to some extent from previous research, which has found more prominent age-related reductions in internal detail, often accompanied by elevated use of external details. The slightly different pattern of results in the present study may be due to our experimental separation of reconstruction and elaboration. Specifically, the AI or other like designs may unintentionally capture some or all of reconstruction in elaboration trials, which in older adults would include more semantic memories than in young adults, given our reconstruction results. In support of this idea, we note that episodic specificity for young and older adults in the present study seems to be slightly higher than what is commonly reported in the literature.

Our use of a think-aloud design may hint at another potential source of age differences typically observed in overgeneral autobiographical memory (Ford et al., 2014; Piolino et al., 2006; Ros et al., 2010; Ros et al., 2017; St. Jacques et al., 2012). Although we stopped scoring once a participant reached a specific event, in some instances participants continued thinking aloud, either by describing the specific event in more detail or by resuming reconstruction in search of another memory. It is interesting to consider whether older adults, compared with young adults, would be more likely to resume reconstruction with more abstract knowledge after retrieving specific events. An age-associated tendency to resume reconstruction might lead to inflated estimates of overgeneral memory, as the only memory recorded in non-think-aloud paradigms is the last one retrieved.

More broadly, a semantic shift in reconstruction and elaboration of EAM has the potential to be both detrimental and adaptive. On one hand, it may lead to decreased flexibility in the use of various types of autobiographical content. Sheldon et al. (2019) proposed that conceptual and perceptual subsystems facilitate episodic memory, with the former possibly mapping onto reconstruction and general events, and the latter onto elaboration and specific events. Sheldon et al. (2019) also posit that decision-making relies on both conceptual and perceptual forms of remembering such that the sole use of one over the other could lead to faulty decisions. According to this framework, older adults might demonstrate some rigidity in decision-making due to reduced direct retrieval and elaboration of episodic details in conjunction with a greater reliance on generative retrieval of conceptual autobiographical components. On the other hand, older adults may wish to convey different goals or messages when accessing their autobiographical memory, possibly reflecting a distinct narrative style from that of young adults who tend to provide precise and event-specific recollections (James, Burke, Austin, & Hulme, 1998). Although the current study does not specifically address this view, in the context of mental reconstruction, it is possible that a purposeful and generative process is preferred by older adults to choose and elaborate a memory that subserves these different communicative goals. To better understand this issue, future studies could implement a design that does not require verbal descriptions of thought processes, such as a button-press procedure (Addis et al., 2012; Harris et al., 2015; Jeunehomme & D’Argembeau, 2016; Uzer et al., 2012) or a narrative task that does not require memory retrieval (Gaesser et al., 2011).

The findings from our self-report measures of EAM retrieval speak against emotional valence or memory age explanations of the observed age-related difference in direct or generative retrieval. Critically, the selected memories, which were commonly rated as neutral to somewhat positive and occurring within the past 2 years, did not vary by group for emotional valence or age of memory. It is plausible, however, that the 5-year constraint on memory retrieval in this task may have affected the age groups differently. The time frame for young adults overlaps with the reminiscence bump, whereas older adults are decades removed from those self-defining, specific memories. That said, studies examining episodic specificity of elaboration using the AI (Levine et al., 2002) do not necessarily find that memories from the reminiscence bump are more detailed, and memories from the past few years still show age effects (Acevedo-Molina, Matijevic, & Grilli, 2020; Levine et al., 2002). Nonetheless, similar to prior research, we cannot rule out that older adults tend to select EAMs that were inherently less specific to begin with, regardless of time period. The commonly used memory cueing methodology limits our ability to explore this possibility because we cannot determine whether specific events chosen for elaboration were equally detailed when they occurred across age groups.

Although it is important to examine verbal reconstruction and elaboration of EAM retrieval because of its occurrence in social conversations outside of the laboratory (Wank et al., 2020), there are limitations of the think-aloud design to consider. It is possible that participants felt uncomfortable providing their thought process during memory reconstruction, which may have affected our direct retrieval observations. We attempted to get a sense of how often this occurred by asking participants whether there were any cue words for which a memory came to mind, but they decided not to share it aloud. We found that on a small proportion of think-aloud trials, most participants reported thinking of, but deciding not to share an initial EAM. Participants who chose to disclose why they did not share those memories explained that the recalled events generally did not fit within the constraints of the task. After removing trials for which participants endorsed this phenomenon and reanalyzing the data, the generative retrieval results remained the same. However, the age effect of direct retrieval became nonsignificant, which could mean that older adults, more so than young adults, engaged in direct retrieval after first retrieving an unshared memory. Another possible influence on direct retrieval identification is whether there is an acceptable duration threshold between cue-word presentation and specific event verbalization. Consistent with prior research (D’Argembeau & Mathy, 2011; Harris et al., 2015; Uzer et al., 2012), we did not set a time threshold for direct retrieval, although both older and younger adults tended to complete direct retrieval within a few seconds (means of 5.83 and 3.87 seconds, respectively). 
We thought it was important to score trials as direct retrievals, regardless of delay, as long as the first memory recalled out loud was a specific event, as we cannot rule out that no retrieval process had begun prior to the first verbalization (i.e., the mind could be “blank”). That said, it is possible that some of these trials were actually covert generative retrievals, meaning that the participant was actively reconstructing without thinking aloud. Our attempts to prompt participants to think aloud when there were prolonged periods of silence likely helped mitigate covert retrieval. Our follow-up introspective question and analyses also might have partly addressed this issue (i.e., did you retrieve a memory that you did not share?). Nonetheless, future research could include a posttrial question querying mental processes during periods of silence, to ensure that these trials were not filled with covert generative retrieval.

As mentioned, another factor that could affect how one approaches the think-aloud task is narrative style, which may shift in older age. From this view, some of our age-related outcomes may better capture verbalized retrieval of EAM in the real world, with internal mental reconstruction possibly following a different pattern. Despite the potential influence of narrative style, our think-aloud design captured the well-documented overgeneral autobiographical memory effect in older adults, suggesting that any interaction between narrative style and think-aloud designs has a negligible effect on the end result of reconstruction. Additionally, the present study cannot address how spontaneously and intentionally retrieved memories relate to direct and generative retrieval (Barzykowski & Staugaard, 2017; Berntsen, 2010; Berntsen & Jacobsen, 2008). It will be important to merge these two lines of research in future studies, as it is also possible that there is a shared process or set of processes facilitating intentional (e.g., direct retrieval) and unintentional recollection or other forms of thought (e.g., task-unrelated thoughts) that are believed to be affected with increasing age (Berntsen, Rubin, & Salgado, 2015; Maillet & Schacter, 2016).

In summary, we demonstrated that two cognitive mechanisms of reconstruction, direct and generative retrieval, experience age-related semantic shifts, contributing to overgeneral autobiographical memory in older adults. We also found that individual differences in generative retrieval were related to individual differences in the tendency to be episodically specific during elaboration, such that individuals who tended to recall specific events via generative retrieval also elaborated on their memories with more internal detail. These results highlight the importance of investigating EAM reconstruction in older adults, as they indicate that the ability to reconstruct a specific event is diverted from an optimal trajectory early in the retrieval process. Future research could investigate whether EAM reconstruction can be modulated among older adults and how alterations in reconstruction could impact the adaptive utility of such retrieval.