Background

Alzheimer’s disease (AD) is the most common cause of dementia [1]. It is characterized by the insidious accumulation of beta-amyloid and tau proteins, with ensuing damage to neurons, and is accompanied by progressive cognitive and behavioral changes. The AD continuum comprises three phases: (a) preclinical, (b) mild cognitive impairment (MCI) and (c) AD dementia. One condition that has received increasing attention as an indicator of preclinical AD is subjective cognitive decline (SCD), described as the perception (by oneself or a close contact) of a worsening of one's mental abilities, despite seemingly unimpaired performance on objective tests [2]. SCD has been associated with an increased risk of future objective cognitive decline [3], as well as an increased likelihood of biomarker abnormalities consistent with AD pathology [4]. In the intermediate stage between SCD and AD dementia, MCI patients present objective impairment in one or more cognitive domains [5], but their cognitive changes are mild enough that they require minimal aid or assistance and retain independence of function in daily life. AD dementia, by contrast, is associated with more significant impairments in at least two cognitive domains, which in this case interfere with independence and activities of daily living [6]. Typical amnestic AD dementia most prominently affects the learning of new information (episodic memory), but deficits can also be observed in language, visuospatial or executive functions, and through behavioral abnormalities or personality changes. Measuring cognitive decline is therefore central to assessing individuals on the AD continuum. To establish cognitive decline, clinicians often rely on self- and relative-reported changes, as well as comparisons to demographically adjusted norms of cognitive performance in healthy individuals.
Another method is to compare current abilities to an estimate of one’s baseline abilities before they were affected by the disease, often referred to as premorbid abilities.

Historically and across many countries, one of the ways to estimate premorbid abilities in patients has been the administration of word reading tests [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. This method relies on the assumptions that (a) the reading ability reached by a normal adult is related to their general intelligence and (b) once reading becomes a highly practiced and overlearned skill, it can be maintained at a high level despite deterioration in other areas of intellectual functioning [23]. These assumptions are consistent with the Cattell-Horn theory of intelligence, which divides general intelligence into two distinct but correlated categories: crystallized and fluid intelligence. Crystallized intelligence refers to learned abilities and accumulated knowledge, of which word reading is an example, whereas fluid intelligence refers to more innate mental abilities such as reasoning, memory span and processing speed [24, 25]. As its name suggests, crystallized intelligence is understood to remain relatively stable across the lifespan [26], whilst fluid intelligence is vulnerable to the effects of normal ageing [27]. Performance on tasks implicating crystallized intelligence, such as reading, has therefore been used in older adults or adults with acquired cognitive impairment to estimate general baseline abilities.

In 1978, Nelson and O’Connell introduced the first irregular word reading test, the New (later changed to “National”) Adult Reading Test (NART) [28]. The logic behind using irregular rather than regular word reading to estimate premorbid intelligence is that irregular word reading relies on familiarity with specific words that have exceptional spellings. For example, “pint” can only be read correctly by a person who knows the word and recognises it. Its pronunciation cannot be guessed by applying common rules of grapheme-phoneme correspondence, as that would result in reading it to rhyme with “mint”. Therefore, the accurate reading of less frequent irregular words would indicate a larger premorbid vocabulary, which would in turn be related to a higher premorbid intelligence quotient (IQ). This assumption has been verified on many occasions in healthy adults, most recently when the NART was standardized against the Wechsler Adult Intelligence Scale IV, the two tests correlating at r = 0.69 [29]. In addition to this measure of validity, the reliability of NART-like tests was found to be excellent, with an estimated Cronbach’s α of around 0.93 [30].

While the validity and reliability of the NART as a measure of premorbid intelligence have been clearly demonstrated in cognitively unimpaired adults, the situation might be different in neurodegenerative disorders, in which disease-related cognitive impairments could affect irregular word reading performance. In 1996, Taylor and colleagues pointed out that if estimates of premorbid IQ in patients with neurodegenerative disorders are to be considered valid and accurate, they should (a) not differ significantly from those of demographically matched control subjects (i.e., cognitively unimpaired older adults) and (b) not change significantly as the disease progresses in severity.

Regarding Taylor’s first criterion, conflicting evidence has been reported. While many cross-sectional studies support the use of NART-like tests in estimating the premorbid IQ of AD patients [18, 22, 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47], many others have observed significant differences in NART-like test scores between demographically matched healthy control (HC) and AD participants, thus supporting the theory that irregular word reading might be affected in AD dementia and that this widely used test does not give an accurate estimate of premorbid intelligence in this population [48,49,50,51,52,53,54,55,56,57,58,59,60,61]. Even more importantly, regarding the second criterion, several longitudinal studies observed a significant decline in NART-like performance in AD participants over time [62,63,64,65,66,67,68,69,70], suggesting that these tests are sensitive to dementia-related cognitive impairments.

These conflicting results could stem from several factors. A first problem with many of the aforementioned studies is that they were conducted in the 1990s and early 2000s, when concepts such as SCD and MCI did not yet exist. The same can be said of the AD dementia criteria, which were not as well developed at the time [6]. In those older studies, it is possible that SCD or MCI participants were classified as normal controls or that other types of dementia were diagnosed as AD dementia. Of note, SCD patients are absent from all the aforementioned studies, whilst only three included MCI participants (that is, [22, 42, 70]). Previous studies were also conducted with relatively small samples, most often with fewer than 50 AD participants. This raises particular concern about the studies supporting the accuracy of irregular word reading premorbid IQ estimates in AD dementia because, in some of them, nonsignificant differences suggested that a larger sample size would reveal statistical and clinical significance in control-AD comparisons [35, 36, 47]. Moreover, when focusing on studies with larger sample sizes and/or longitudinal (vs. cross-sectional) designs, the evidence seems to weigh against the use of irregular word reading as a marker of premorbid IQ in AD dementia. It is also notable that even in studies supporting their use, NART-like tests were often found to be accurate only at certain, earlier stages of the AD continuum, whilst becoming inaccurate at more severe stages. The stage at which inaccuracies appear varies from study to study, ranging from MCI to moderately severe AD.

As an alternative to the view that irregular word reading is a measure of premorbid intelligence in AD dementia, some studies suggest that its impairment might reflect semantic decline [49,50,51,52, 56, 62, 63, 67, 68, 71], understood as the loss of general/encyclopedic knowledge. This hypothesis is in line with models of reading that assign semantic processes a core influence on irregular word reading [72,73,74,75,76,77,78,79]. Consistent with this idea, AD performance on reading and writing tasks that rely to a lesser extent on semantic processing (e.g., reading or writing of words with regular grapheme-phoneme correspondence) appears to be qualitatively more similar to normal performance than divergent from it, in contrast with tasks requiring semantic processing such as exception word reading [56]. This is further supported by a co-occurring and proportionally similar decline in semantic performance (as measured, for instance, by picture naming) and irregular word reading [67]. Thus, it would appear that a core semantic memory deficit may be the mechanism underlying impaired irregular word reading in AD dementia, in line with a large body of work suggesting that semantic memory impairments are an early and predominant symptom in MCI and AD dementia [80,81,82,83,84]. Consistent with this hypothesis, the left anterior temporal lobe (ATL) region, involved in semantic processing [85], seems to play a critical role in irregular word reading tasks [86,87,88,89] and shows atrophy in patients on the AD continuum [90,91,92]. Nonetheless, the hypothesis that a semantic deficit causes irregular word reading deficits in the AD continuum remains debated, and more evidence is needed to draw solid conclusions regarding the underlying cognitive and neural mechanisms of irregular word reading in these patients.

The aim of the present article is to assess, in a large, well-characterized sample representative of the AD continuum, whether irregular word reading performance (a) differs significantly between diagnostic categories across this continuum and (b) is linked to general cognitive impairment / dementia severity. We hypothesize (1) that demographically matched MCI and AD participants will perform significantly worse than controls on irregular word reading and (2) that irregular word reading will be correlated with general cognitive impairment / dementia severity. If these two hypotheses are supported by our results, indicating that irregular word reading performance is not maintained across the stages of AD, we will investigate three additional aims, namely whether performance on irregular word reading is linked (c) to semantic neuropsychological tests; (d) to psycholinguistic variables associated with semantic processes (but not to psycholinguistic variables associated with phonological processes) and (e) to brain volumes in regions associated with semantic processing. These analyses will help clarify the underlying cognitive and neural mechanisms of irregular word reading deficits. We hypothesize that (3) we will observe a stronger correlation between irregular word reading and tests of semantic processes (e.g., picture naming) than with other tests (e.g., executive functions or episodic memory); (4) the accuracy of single items of the irregular word reading test will be associated with the lexicosemantic variables of the words (e.g., number of senses, semantic neighborhood, concreteness or age of acquisition) as opposed to phonological variables (e.g., number of phonemes, syllables or phonological neighborhood) and (5) neural correlates of semantics, namely the left ATL, will be related to irregular word reading performance.

Methods

The data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI began in 2004 as a public–private partnership under the leadership of Dr. Michael W. Weiner. The primary goal of ADNI has been to detect AD dementia at the earliest possible stage (pre-dementia) and identify ways to track the disease progression. To that end, data from magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers as well as clinical and neuropsychological assessments have been collected to test if they can be combined to measure the progression of the various stages of the AD continuum. The initial five-year study (ADNI-1) was extended by two years in 2009 by a Grand Opportunities grant (ADNI-GO), and in 2011 and 2016 by further competitive renewals of the ADNI-1 grant (ADNI-2, and ADNI-3, respectively). For up-to-date information, see www.adni-info.org.

Participants

Participants over all ADNI studies (1, GO, 2 and 3) who had American National Adult Reading Test (AmNART) scores available at their baseline assessment were included in this study. All participants, aged between 54 and 91 years (inclusive), had completed a minimum of six years of education and did not have vascular dementia, depression, sensory disturbances, or other medical conditions that could interfere with the study. A study-partner who had frequent contact with the participant (an average of 10 h per week or more) also accompanied them to visits and filled out questionnaires.

Participants were divided into five categories: healthy control (HC), subjective cognitive decline (SCD), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI) and Alzheimer’s disease (AD) dementia.

The HC status was reserved for participants free of memory complaints beyond what one would expect for age, as verified by a study partner, and with normal memory function documented by scoring at or above education-adjusted cutoffs on the Logical Memory II subscale (LM II; delayed paragraph recall) of the Wechsler Memory Scale-Revised (WMS-R): (a) ≥ 9 for 16 or more years of education; (b) ≥ 5 for 8–15 years of education; and (c) ≥ 3 for 0–7 years of education. Additionally, HC participants had a Mini-Mental State Examination (MMSE) score between 24 and 30 (inclusive), a Clinical Dementia Rating (CDR) of 0, and no significant impairment in activities of daily living. There was no criterion regarding memory complaints.

Participants classified as SCD met the same criteria as HC participants on the WMS-R LM II, MMSE and CDR, and presented no significant impairment in activities of daily living. Unlike their HC counterparts, SCD participants presented a significant subjective memory concern as reported by the subject, study partner, or clinician, confirmed by a Cognitive Change Index score ≥ 16.

Participants were classified as EMCI if they presented subjective memory concerns as reported by the subject, their study-partner or clinician, had abnormal memory function documented by scoring within the education adjusted ranges on the WMS-R LM II, scoring inclusively (a) 9–11 for 16 or more years of education; (b) 5–9 for 8–15 years of education; and (c) 3–6 for 0–7 years of education, an MMSE score between 24 and 30 (inclusive) and a CDR score = 0.5. Their general cognition and functional performance were sufficiently preserved so that a diagnosis of AD could not be made.

Participants were classified as LMCI if they presented subjective memory concerns as reported by the subject, their study-partner or clinician, had abnormal memory function documented by scoring within the education adjusted ranges on the WMS-R LM II, scoring (a) ≤ 8 for 16 or more years of education; (b) ≤ 4 for 8–15 years of education; and (c) ≤ 2 for 0–7 years of education, an MMSE score between 24 and 30 (inclusive) and a CDR score = 0.5. Their general cognition and functional performance were sufficiently preserved so that a diagnosis of AD could not be made.

Diagnosis of AD was made in participants with a memory complaint confirmed by a study partner (or reported only by the study-partner), with abnormal memory function documented by scoring within the education adjusted ranges on the WMS-R LM II, scoring (a) ≤ 8 for 16 or more years of education; (b) ≤ 4 for 8–15 years of education; and (c) ≤ 2 for 0–7 years of education, an MMSE score between 20 and 26 (inclusive), with a CDR score = 0.5 or 1, and who met the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association criteria for probable AD.
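For readers who find the group definitions easier to follow as decision rules, the thresholds listed above can be sketched as follows. This is only an illustrative reading of the criteria as stated in this section, not ADNI's actual procedure: the `Visit` record, its field names and the `probable_ad` flag (standing in for the clinician's NINCDS-ADRDA judgement) are hypothetical, functional impairment in daily living is not modelled, and combinations not covered by the text are returned as "unclassified".

```python
from dataclasses import dataclass


def education_band(years: int) -> str:
    """Map years of education onto the three bands used for the LM II cutoffs."""
    if years >= 16:
        return "16+"
    if years >= 8:
        return "8-15"
    return "0-7"


# Logical Memory II delayed-recall thresholds, copied from the criteria above.
HC_MIN = {"16+": 9, "8-15": 5, "0-7": 3}                 # HC and SCD
EMCI_RANGE = {"16+": (9, 11), "8-15": (5, 9), "0-7": (3, 6)}
LOW_MAX = {"16+": 8, "8-15": 4, "0-7": 2}                # LMCI and AD


@dataclass
class Visit:
    """Hypothetical baseline record; field names are illustrative only."""
    education_years: int
    lm2_delayed: int
    mmse: int
    cdr: float
    memory_concern: bool
    cci: int = 0               # Cognitive Change Index
    probable_ad: bool = False  # clinician judgement (assumed input)


def classify(v: Visit) -> str:
    band = education_band(v.education_years)
    low_memory = v.lm2_delayed <= LOW_MAX[band]

    # AD is checked first because its LM II, MMSE and CDR ranges overlap LMCI.
    if (v.probable_ad and v.memory_concern and low_memory
            and 20 <= v.mmse <= 26 and v.cdr in (0.5, 1.0)):
        return "AD"

    if v.cdr == 0.5 and v.memory_concern and 24 <= v.mmse <= 30:
        if low_memory:
            return "LMCI"
        lo, hi = EMCI_RANGE[band]
        if lo <= v.lm2_delayed <= hi:
            return "EMCI"

    if v.cdr == 0 and 24 <= v.mmse <= 30 and v.lm2_delayed >= HC_MIN[band]:
        if v.memory_concern and v.cci >= 16:
            return "SCD"
        if not v.memory_concern:
            return "HC"

    return "unclassified"
```

For instance, `classify(Visit(12, 7, 27, 0.5, True))` falls in the 8–15 years band with an LM II score inside the 5–9 range and is therefore labelled "EMCI" under these assumptions.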

Clinical diagnoses were therefore used to classify patients in the current study. Nonetheless, cerebrospinal fluid (CSF) amyloid- and tau-positivity rates in each group are reported in Table 1. ADNI-specific cutoffs, described elsewhere, were as follows: amyloid positive ≤ 977 pg/ml [93], phosphorylated tau positive ≥ 24 pg/ml [94].

Table 1 Demographics, neuropsychological and language data for all groups

In addition to the general ADNI inclusion and group classification criteria, we applied two study-specific criteria. First, participants had to be native English speakers (excluded N = 33). Second, total AmNART scores had to be consistent with single item-level data on this test, when available in the ADNI database (excluded N = 52). Of the original 2097 participants, 2012 remained after these exclusions: 681 HC, 104 SCD, 290 EMCI, 589 LMCI and 348 AD. Demographics of this final sample are provided in the Results section.

Procedure

Cognitive assessments

AmNART

To measure irregular word reading abilities in an American population, the AmNART (sometimes called ANART) was used. This test is an adaptation of the original British NART [11, 12], developed specifically for the American English population to estimate premorbid intelligence through irregular word reading [10]. The version used by ADNI comprises a list of 50 irregular words, about half of which are identical to those of the NART. Irregular words, also known as exception words, are words whose actual pronunciation differs from what would be predicted by applying grapheme-to-phoneme correspondence rules (e.g., pint, cellist). They are intended to be printed in order of increasing difficulty and are relatively short to avoid the possible adverse effect of stimulus complexity. With no time limit, the subject is instructed to read aloud down the list of words; errors made in pronouncing each word are then tallied into an “error score”. Participants are allowed to self-correct but are not prompted to do so unless it was difficult to hear what was said and it is necessary to determine whether the pronunciation was correct. If they hesitate between two different pronunciations, one correct and the other incorrect, they are asked which one they think is best.

To assess the involvement of psycholinguistic variables in the successful reading of irregular words, we extracted characteristics for each of the 50 AmNART irregular words using the English Lexicon Project (ELP; [95]) and WordNet [96] data sets, prioritizing ELP and using WordNet when data were not otherwise available. As control variables, we used (a) word length (number of letters), (b) objective lexical frequency, (c) orthographic neighborhood density and (d) summed bigram frequencies by position. The measure of lexical frequency was obtained from ELP and is the log10 of the number of times the word appears in the corpus + 1. The measure of orthographic neighborhood density was the orthographic Levenshtein distance to the 20 closest neighbors in the lexicon (OLD20, [97]). Put simply, it is a measure of similarity and proximity to other words of the lexicon. Specifically, the OLD20 of a given word is computed as the mean of the string edit distances from this word to its 20 closest orthographic neighbors in the lexicon. The edit distance used, Levenshtein distance (LD), corresponds to the number of operations (letter deletion, insertion, or substitution) needed to change one word into another: for example, the LD from smile to similes is 2 (two insertions: I and S). The summed bigram frequency by position, where a bigram is defined as a sequence of two letters, was obtained from ELP. It is a measure of bigram frequency that is sensitive to position within words, taking into account the letter positions where the bigram occurs. For example, the bigram frequency for DO in DOG counts DO bigrams only when they appear in the first two positions of a word in the corpus. As lexicosemantic variables, we used (a) age of acquisition, (b) concreteness, (c) number of senses and (d) semantic neighborhood density.
The measure of age of acquisition was obtained from ELP and was originally recorded by Kuperman and colleagues [98] as the estimated age at which a word was learned. Age of acquisition has been shown to have larger effects in tasks involving semantic information (e.g., picture naming and lexical decision) than in tasks where semantic information is less involved (e.g., reading aloud; [99, 100]). The measure of concreteness was obtained from ELP and is described by Brysbaert and colleagues (2014) [101] as evaluating the degree to which the concept denoted by a word refers to a perceptible, relatable entity. The measure of number of senses was obtained from WordNet and is described by Miller [96] as the number of contexts in which the word can be used, expressing the number of possible meanings it has. The measure of semantic neighborhood density was obtained from ELP and is described by Mirman and Magnuson [102] as the number and/or proximity of neighboring representations, density referring to how tightly packed the words in the neighborhood are [103]. For phonological variables, we used (a) the number of syllables, (b) the number of phonemes and (c) phonological neighborhood. The measure of phonological neighborhood was obtained from ELP and is, similarly to the aforementioned OLD20, the mean phonological LD to the 20 closest neighbors (PLD20).
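The Levenshtein-based neighborhood measure described above can be illustrated with a short sketch. The values used in this study were taken from ELP rather than recomputed, so the code below is only a minimal illustration under stated assumptions: the function names and the toy lexicon are hypothetical, and reading the frequency measure as log10(count + 1) is our interpretation of the description.

```python
import math
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a letter of a
                            curr[j - 1] + 1,      # insert a letter of b
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def mean_distance_to_closest(word: str, lexicon: Iterable[str], k: int = 20) -> float:
    """OLD20-style score: mean edit distance to the k closest other words."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:k]
    if not closest:
        raise ValueError("lexicon must contain at least one other word")
    return sum(closest) / len(closest)


def log_frequency(count: int) -> float:
    """Assumed reading of the frequency measure: log10(count + 1)."""
    return math.log10(count + 1)


# Toy lexicon (hypothetical); a real computation would use the full ELP word list.
toy_lexicon = ["mint", "pine", "paint", "pit", "zebra"]
old_like = mean_distance_to_closest("pint", toy_lexicon, k=3)
```

Here `levenshtein("smile", "similes")` returns 2, matching the example in the text. Applying the same function to phoneme strings instead of letter strings would give a PLD20-style score.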

Mini-mental state exam and Montreal cognitive assessment

To measure general cognitive impairment / dementia severity, we used scores obtained by participants on the Mini-Mental State Examination (MMSE; [104]) and the Montreal Cognitive Assessment (MoCA; [105]), two tests that are routinely used to screen a wide range of cognitive functions and identify patients on the AD continuum, as well as to determine disease severity.

Boston naming test

To measure lexicosemantic abilities, the Boston Naming Test (BNT) was used [106]. It measures the ability to orally label (name) drawings of objects. Participants have 20 seconds to name what the drawing represents after being presented with the image. A semantic cue is given if the participant fails to recognize the picture (e.g., answering bench instead of tree) or states that they do not know what the picture represents. The semantic cue is either a short explanation about the item (e.g., for a mask: “it’s part of a carnival fantasy”) or a superordinate category (e.g., for a beaver: “it’s a kind of animal”). The test presents objects in order of frequency, from most to least common, and is discontinued after 6 consecutive failures. ADNI administers only the odd-numbered items of the standard 60-item BNT, which gives a maximum score of 30.

Trail making part-B

To measure executive functioning, the Trail Making Test was used. More specifically, we used scores obtained on part B of the test, which depends on visuomotor and perceptual-scanning skills and requires considerable cognitive flexibility in shifting between number and letter sets under time pressure [107]. The participant is presented with 25 circles containing the numbers 1 through 13 and the letters A through L, scattered across the given medium. The participant must connect the circles in ascending order while alternating between numbers and letters (e.g., 1 to A; A to 2; 2 to B; B to 3). They have up to 300 seconds to complete the test, and their completion time (in seconds) is recorded as their score.

Rey auditory verbal learning test (30-min delay)

To measure episodic memory, we used the Rey Auditory Verbal Learning Test (RAVLT) [108]. Over five learning trials, participants are read a list of 15 words (list A) and asked to recall them immediately, in any order. After the fifth learning trial, the same task is done with an interference list (list B). List A is then recalled immediately after and again 30 minutes after the administration of list B, this time without first being read. Scores from the 30-min delay test were used as our measure of episodic memory.

Neuroimaging

All participants received T1-weighted (T1w) MRIs (see http://adni.loni.usc.edu/methods/mri-tool/mri-analysis/ for the detailed MRI acquisition protocols). T1w scans for each participant were pre-processed through our standard pipeline including denoising [109], intensity inhomogeneity correction [110] and intensity normalization into range [0–100]. The pre-processed images were then both linearly (9 parameters: 3 translation, 3 rotation, and 3 scaling; [111]) and nonlinearly [112] registered to a population appropriate average template generated based on 150 ADNI participants. The quality of all the image processing steps, including the linear and nonlinear registrations, was visually verified by an experienced rater (MD). Deformation-based morphometry (DBM) was performed to measure the local anatomical differences in the brains of the participants by estimating the Jacobian determinant of the inverse of the estimated nonlinear deformation field as a proxy of atrophy [113]. DBM values reflect the relative volume of the voxel with respect to the template, i.e., a value of 1 indicates similar volume to the same region in the template, values lower than one indicate volumes smaller than the corresponding region in the template, while values higher than one indicate volumes that are larger than the corresponding region in the template. Therefore, lower DBM values can be interpreted as reduction in the structure volume, i.e., regional atrophy. Voxel-wise DBM maps were used to assess the relationship between brain atrophy and AmNART scores at a voxel level. In addition, mean DBM values within a region of interest (ROI) including the left anterior temporal lobe were used to assess the relationship between atrophy in the left anterior temporal lobe and AmNART scores.
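To make the interpretation of DBM values concrete, the following toy sketch estimates a Jacobian determinant by central finite differences. It is not the registration pipeline used here: a simple two-dimensional mapping (`shrink`, hypothetical) stands in for the three-dimensional nonlinear deformation field, and the direction of the mapping is an assumption chosen so that values below 1 correspond to local volume smaller than the template.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def jacobian_determinant(transform: Callable[[float, float], Point],
                         x: float, y: float, h: float = 1e-3) -> float:
    """Central-difference estimate of det(J) of a 2-D mapping at the point (x, y)."""
    fx_plus, fx_minus = transform(x + h, y), transform(x - h, y)
    fy_plus, fy_minus = transform(x, y + h), transform(x, y - h)
    du_dx = (fx_plus[0] - fx_minus[0]) / (2 * h)
    dv_dx = (fx_plus[1] - fx_minus[1]) / (2 * h)
    du_dy = (fy_plus[0] - fy_minus[0]) / (2 * h)
    dv_dy = (fy_plus[1] - fy_minus[1]) / (2 * h)
    return du_dx * dv_dy - du_dy * dv_dx


def shrink(x: float, y: float) -> Point:
    """Hypothetical mapping that compresses space by 10% along x and 20% along y."""
    return 0.9 * x, 0.8 * y


# Local area is scaled by 0.9 * 0.8 = 0.72, i.e. smaller than the reference.
local_value = jacobian_determinant(shrink, 10.0, 5.0)
```

In this toy case the determinant is about 0.72 everywhere, which would be read as regional shrinkage relative to the template; an identity mapping would give 1.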

Statistical analyses

Behavioral analyses

To describe the sample, Pearson’s chi-square test was used to assess sex as well as amyloid- and tau-positivity differences. One-way analysis of variance (ANOVA) and Tukey post-hoc testing were used for all other variables.

To test whether AmNART scores are insensitive to dementia and related to semantic abilities, we fitted a series of multiple linear regression models predicting the AmNART total error score from (1) diagnostic category, extracting an ANOVA table to test the factor as a whole, (2) tests of severity (MMSE, MoCA) and (3) neuropsychological tests (BNT, Trail making part-B, RAVLT), controlling for sex, age, and education. The MMSE was also used as a control variable for disease severity when assessing the relation to neuropsychological tests; it was favored over the MoCA because MoCA results were not available for the whole sample. Epsilon squared (ε²; [114]) was used as a measure of effect size. Stein’s formula was used to calculate adjusted R² [115].
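The two summary statistics named above can be sketched numerically as follows. These are the standard textbook forms, and we assume rather than know that they match the exact implementations behind the reported values: epsilon squared is recovered from an F ratio and its degrees of freedom, and Stein's formula shrinks R² according to sample size n and number of predictors k. Function names are illustrative.

```python
def epsilon_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """(Partial) epsilon squared from an F ratio:
    eps^2 = df_effect * (F - 1) / (F * df_effect + df_error)."""
    return df_effect * (f_value - 1) / (f_value * df_effect + df_error)


def stein_adjusted_r2(r2: float, n: int, k: int) -> float:
    """Stein's cross-validation estimate of R^2 for n cases and k predictors."""
    shrinkage = ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n)
    return 1 - shrinkage * (1 - r2)


# Check against a value reported in the Results: age, F(4, 2007) = 12.15.
eps_age = round(epsilon_squared_from_f(12.15, 4, 2007), 2)
```

With these inputs `eps_age` evaluates to 0.02, in line with the effect size reported for age; the same function applied to F(4, 2004) = 52.20 gives 0.09.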

To assess the involvement of each psycholinguistic variable extracted from the AmNART on irregular word reading, we analyzed single-item accuracy with a generalized logistic mixed-effects model using the lme4 package [116]. This analysis was conducted on a subsample of participants who had single item-level AmNART data available, as opposed to only having total AmNART score available (195 HC, 323 LMCI, 156 AD). Single-item accuracy was predicted by length, lexical frequency, orthographic neighborhood, bigram frequencies by position, age of acquisition, concreteness, number of senses, semantic neighborhood density, number of syllables, number of phonemes and phonological neighborhood as fixed effects, with by-item and by-subject random intercepts as random effects. |z| values beyond 1.96 were deemed as significant [117]. Bigram frequencies by position and number of senses were logarithmically transformed to normalize these variables. 20 words with missing values in age of acquisition, objective lexical frequency, concreteness, and/or phonological neighborhood had to be excluded from this analysis. The remaining 30 words were ache, aisle, algae, asthma, blatant, bouquet, cellist, chord, courteous, debt, deny, depot, epitome, façade, gauge, heir, hiatus, hyperbole, naïve, nausea, papyrus, pint, placebo, scion, sieve, simile, subtle, superfluous, thyme and zealot.
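The |z| > 1.96 criterion corresponds to a two-sided test at the 0.05 level under a standard normal approximation. A minimal check of that correspondence is sketched below; it is not part of the lme4 analysis itself, and the helper name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p value of a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_at_threshold = two_sided_p(1.96)  # just under 0.05
```

Any fixed effect whose Wald z statistic exceeds 1.96 in absolute value therefore has a two-sided p value below 0.05 under this approximation.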

Neuroimaging analyses

Similar linear regression models were used to assess the relationship between AmNART scores and voxel-wise DBM values in the subset of the participants that had MRI information available (N = 1863), controlling for age, sex, and level of education. A second set of models were also run with diagnostic category as an additional covariate. Voxel-wise results were corrected for multiple comparisons using False Discovery Rate (FDR) controlling technique, with a significance threshold of 0.05.
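The text does not name the specific FDR procedure. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up rule, which is the most common choice for voxel-wise maps; treating it as the procedure used here is an assumption. The p values in the example are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return, for each test, whether it is declared significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Step-up rule: every test ranked at or below the cutoff is rejected.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p values for four voxels.
example = benjamini_hochberg([0.02, 0.024, 0.03, 0.5], q=0.05)
```

In this example the smallest p value (0.02) exceeds its own threshold of 0.0125, yet it is still rejected because the second and third p values fall under theirs; this is the step-up behaviour that distinguishes the procedure from a simple per-test cutoff.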

Second, we conducted a ROI-based analysis to test the specific hypothesis of a relationship between the volume in the left ATL and irregular word reading on a subsample of participants who had neuroimaging data available (N = 1863). To do so, we modeled a multiple linear regression that predicts AmNART total error score based on the DBM in the left ATL, controlling for sex, age, education, with and without including diagnostic category as a covariate in the models, similar to the voxel level analyses. The ATL ROI was selected from a previous study [118, 119].

All statistical analyses were performed using R Statistical Software (version 4.2.1; [120]).

Results

Demographic characteristics of the 2012 participants are shown in Table 1. Groups differed with regard to sex (χ² (4) = 44.94, p < 0.001), age (F (4, 2007) = 12.15, p < 0.001, ε² = 0.02) and education (F (4, 2007) = 12.57, p < 0.001, ε² = 0.02). All following analyses were therefore controlled for sex, age and education. As expected, groups differed with regard to CSF amyloid-positivity (χ² (4) = 242.86, p < 0.001) and tau-positivity (χ² (4) = 214.85, p < 0.001). Amyloid-positivity rates were 36% in HC, 32% in SCD, 44% in EMCI, 72% in LMCI and 88% in AD participants with available CSF data. Tau-positivity rates were 29% in HC, 35% in SCD, 36% in EMCI, 61% in LMCI and 82% in AD participants with available CSF data. Neuropsychological and language evaluations broadly revealed the expected patterns of impairment across the AD continuum. First, measures of severity worsened along the continuum of disease progression stages. Second, episodic memory deficits were predominant, but cognitive decline gradually extended to other cognitive domains.

Irregular word reading across the AD continuum

When controlling for sex, age and education, AmNART total error score differed significantly between diagnoses (F (4, 2004) = 52.20, p < 0.001, partial ε² = 0.09; Fig. 1). Overall, patient groups with more advanced disease progression on the AD continuum made more errors on irregular word reading. Specifically, as seen in Fig. 1, AD dementia participants showed significantly lower performance than all other groups. In addition, HC scores differed significantly from those of EMCI and LMCI participants. Means and standard deviations of AmNART total scores, as well as significant differences, are presented in Table 1 (more detailed t ratios, p values and effect sizes for each contrast are presented in Supplementary Table 1).

Fig. 1

Relation between AmNART error score and diagnostic category

Of note, we identified 18 outlier participants whose AmNART total error score deviated by more than 3.29 standard deviations from the mean of their respective diagnostic group, namely 12 HC, 3 EMCI and 3 LMCI. However, excluding these participants did not change the results of any of the analyses.
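The within-group outlier rule can be sketched as follows. The scores below are invented for illustration, the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def flag_outliers(scores_by_group: Dict[str, Sequence[float]],
                  threshold: float = 3.29) -> Dict[str, List[int]]:
    """Indices of scores lying more than `threshold` SDs from their own group mean."""
    flagged: Dict[str, List[int]] = {}
    for group, scores in scores_by_group.items():
        mu, sd = mean(scores), stdev(scores)  # sample SD (assumed)
        flagged[group] = [i for i, s in enumerate(scores)
                          if sd > 0 and abs(s - mu) / sd > threshold]
    return flagged


# Invented error scores: twenty participants at 10 errors and one at 40.
toy_scores = {"HC": [10] * 20 + [40], "AD": [18, 22, 25, 27, 30]}
outliers = flag_outliers(toy_scores)
```

Here only the last HC score is flagged (its z score is roughly 4.4), while no AD score exceeds the threshold. Standardizing within each diagnostic group, rather than across the whole sample, prevents the generally higher error scores of the AD group from being flagged merely for belonging to that group.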

Association between irregular word reading and general cognitive impairment / severity

Whole-sample and group-specific partial correlations between AmNART and measures of disease severity/global cognition (MoCA and MMSE) are presented in Fig. 2. These correlations control for sex, age and education. Both measures of severity were significantly correlated with total AmNART scores, in all diagnostic groups as well as across the whole sample, further supporting a strong link between AD progression and impaired irregular word reading.
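
For readers less familiar with partial correlations, the idea of correlating two measures after removing the linear contribution of covariates can be sketched as below. This is an illustrative sketch under stated assumptions, not the analysis pipeline used in the study: the helper names `pearson` and `partial_corr` and the toy vectors are hypothetical, and the recursive first-order formula is used here only because it needs no external library.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation between x and y controlling for every covariate.

    Applies the standard recursive formula, removing one covariate
    at a time.
    """
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (made up): two covariates and two mutually unrelated noise terms.
cov_a = [1, 1, 1, 1, -1, -1, -1, -1]
cov_b = [1, -1, -1, 1, 1, -1, -1, 1]
noise_x = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]
x = [2 * a + b + e for a, b, e in zip(cov_a, cov_b, noise_x)]
y = [3 * a - b + e for a, b, e in zip(cov_a, cov_b, noise_y)]

raw = pearson(x, y)                            # inflated by shared covariates
adjusted = partial_corr(x, y, [cov_a, cov_b])  # covariates removed
```

In this toy example the raw correlation is sizeable only because both variables depend on the same covariates; once these are controlled for, the association vanishes. The correlations reported in Fig. 2 are the kind that remain after such an adjustment.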

Fig. 2 A Relation between MMSE and AmNART error score relative to diagnostic category. B Relation between MoCA and AmNART error score relative to diagnostic category

Association between irregular word reading and lexicosemantic, executive functioning and episodic memory performance

Whole-sample and group-specific partial correlations between AmNART and the chosen neuropsychological tests (BNT, Trail making part-B and RAVLT delayed recall) are presented in Fig. 3. These correlations control for sex, age, education and severity as measured by the MMSE. Total AmNART irregular word reading scores were significantly and moderately correlated with BNT scores (measuring picture naming or lexicosemantic abilities), weakly but significantly correlated with the Trail making part-B (measuring executive functioning), and only poorly correlated with the RAVLT delayed recall (measuring episodic memory), for which the correlation reached significance only in the EMCI group (p < 0.001) and across the whole sample (p < 0.05).

Fig. 3 A Relation between Boston Naming Test and AmNART error score relative to diagnostic category. B Relation between Trail making part-B and AmNART error score relative to diagnostic category. C Relation between RAVLT delayed recall and AmNART error score relative to diagnostic category

The model created to distinguish between the involvement of lexicosemantic, executive and memory functions in irregular word reading is presented in Table 2. Consistent with the correlational analyses, we observed that the BNT provides a strong contribution to the model (standardized β = -0.31, p < 0.001), the Trail making part-B provides a weak but significant contribution (standardized β = -0.06, p < 0.001) and the RAVLT delayed recall does not provide a significant contribution (p = 0.887).
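
As a reminder of what a standardized β expresses, the sketch below computes standardized regression coefficients from correlations alone, by solving R_xx β = r_xy, where R_xx holds the correlations among predictors and r_xy their correlations with the outcome. It is a toy illustration under stated assumptions: the function names and data are hypothetical, and the study's model (Table 2) also includes covariates that would simply enter as additional predictors.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a @ coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def standardized_betas(predictors, outcome):
    """Standardized OLS coefficients for `outcome` regressed on `predictors`."""
    r_xx = [[pearson(p, q) for q in predictors] for p in predictors]
    r_xy = [pearson(p, outcome) for p in predictors]
    return solve(r_xx, r_xy)


# Toy data (made up): two correlated predictors plus unrelated noise.
base = [1, 1, 1, 1, -1, -1, -1, -1]
extra = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [1, 1, -1, -1, 1, 1, -1, -1]
pred_1 = base
pred_2 = [a + b for a, b in zip(base, extra)]
outcome = [p + q + e for p, q, e in zip(pred_1, pred_2, noise)]

betas = standardized_betas([pred_1, pred_2], outcome)
```

Because all variables are put on a common scale, standardized coefficients can be compared across predictors within the same model, which is what allows the BNT contribution to be described as larger than that of the Trail making part-B.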

Table 2 Multiple regression predicting AmNART total error score using neuropsychological tests results

Association between irregular words and psycholinguistic variables (lexicosemantic and phonological)

To better understand which properties of the AmNART irregular words relate to correct reading, we first selected the subsample for whom item-level AmNART data were available (195 HC, 323 LMCI, 156 AD). The model used to predict irregular word item success based on the words' psycholinguistic variables is presented in Table 3. While none of the phonological variables had a significant effect on irregular word reading accuracy, there was a significant effect of age of acquisition (β = -0.42, z = -5.62).
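
To make the reported coefficient easier to interpret, the short sketch below converts it to an odds ratio and recovers an approximate standard error and p-value. It assumes that β is a fixed-effect coefficient on the log-odds scale and that z is a Wald statistic (β divided by its standard error); this is the usual output of logistic mixed-effects software, but it is not stated in the text. The function name `wald_summary` is hypothetical.

```python
from math import erfc, exp, sqrt


def wald_summary(beta, z_stat):
    """Summarize a logistic-regression coefficient.

    Assumptions: `beta` is on the log-odds scale and `z_stat` is a Wald
    statistic, i.e. beta divided by its standard error.
    """
    odds_ratio = exp(beta)                       # multiplicative change in odds
    std_error = beta / z_stat                    # implied standard error
    p_two_sided = erfc(abs(z_stat) / sqrt(2))    # standard normal, two-sided
    return odds_ratio, std_error, p_two_sided


# Values reported in the text for age of acquisition.
odds_ratio, std_error, p_value = wald_summary(beta=-0.42, z_stat=-5.62)
```

Under these assumptions, each one-unit increase in age of acquisition (in whatever units the predictor was entered into the model) multiplies the odds of reading the word correctly by roughly two thirds, so words acquired later in life are less likely to be read correctly.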

Table 3 Generalized logistic mixed-effects model predicting irregular word successful reading using psycholinguistic variables

Link between irregular word reading and brain volumes

Figure 4 shows the significant associations between voxel-wise DBM maps and AmNART scores, with age, sex, and education level as covariates, after correction for multiple comparisons (FDR). At a voxel-wise whole-brain level, we observed significant correlations with bilateral medial temporal lobe regions, including the hippocampi, as well as with the ATL, the inferior and middle temporal gyri, and the fusiform gyrus, predominantly in the left hemisphere. However, no voxels survived FDR correction after including diagnostic group as a covariate. At the ROI level, ATL DBM values were significantly associated with AmNART scores when including age, sex, and education as covariates (standardized β = -0.11, p < 0.001). Furthermore, this association remained significant after including diagnostic group as an additional covariate (standardized β = -0.05, p < 0.05). The model used to predict AmNART error score based on brain volumes in the ATL is presented in Table 4.
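
The notion of voxels "surviving" FDR correction can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not specify which FDR procedure was applied, so the choice of Benjamini-Hochberg, the 0.05 level, the function name and the toy p-values below are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test survives FDR control.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= k / m * alpha, then keep the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    survives = [False] * m
    for idx in order[:k_max]:
        survives[idx] = True
    return survives


# Toy p-values (made up), one per hypothetical voxel.
toy_p = [0.039, 0.001, 0.205, 0.008, 0.041, 0.042, 0.06, 0.074]
kept = benjamini_hochberg(toy_p)
```

In this toy example, several p-values fall below 0.05 but only the two smallest survive. The same mechanism explains how associations can disappear at the voxel level once an additional covariate, such as diagnostic group, weakens the individual test statistics.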

Fig. 4 Relation between voxel-wise DBM maps and AmNART error score. Axial, coronal and sagittal slices showing the t-statistic maps reflecting the significant patterns of brain volume changes in the sample. Colour gradient indicates shrinkage of the tissue (i.e., atrophy). X, Y and Z values indicate MNI coordinates for the displayed slice

Table 4 Multiple regression predicting AmNART total error score using anterior temporal lobe volume

Discussion

The present study aimed to assess, in a large and well-characterized sample of participants on the AD continuum, whether irregular word reading performance is an accurate indicator of premorbid intelligence or a marker of general cognitive and semantic deficits in this population. Results showed that EMCI, LMCI and AD patients make significantly more errors in reading irregular words than HC, and that AD patients also make significantly more errors than all other groups. Across the whole AD continuum, as well as within each diagnostic group, irregular word reading abilities were further significantly correlated with measures of general cognitive impairment / dementia severity. This suggests that irregular word reading performance declines throughout the AD continuum, and that even at a finer grain beyond diagnostic categories, a strong link exists between dementia severity and irregular word reading difficulties. Furthermore, results indicated a significant moderate association between irregular word reading and neuropsychological tests of lexicosemantics, as opposed to a weak association with executive function and no association with episodic memory. At the item level, none of the phonological variables had a significant effect on irregular word reading accuracy, whilst age of acquisition, a semantic variable, provided a significant contribution. Finally, the whole-brain neuroimaging analysis pointed to hippocampal and left ATL volume loss as the main contributors to decreased irregular word reading performance. These results are consistent with the view that irregular word reading impairments are an indicator of disease severity and semantic decline, rather than an indicator of premorbid IQ, in the AD continuum population, and they pave the way for further investigation on the matter.

Consistent with our first hypothesis, MCI and AD participants performed significantly worse than controls in reading irregular words, controlling for sex, age and education. EMCI, LMCI and AD participants correctly read an average of 2.9, 3.8 and 7.4 fewer words, respectively, than HC. These figures are comparable to those of Weinborn and colleagues [70] who, using the Wechsler Test of Adult Reading (WTAR, another test of 50 irregular words), found that MCI and AD participants read on average 3.0 and 7.4 fewer words, respectively, than HC. Consistent with hypothesis 2, results indicate that irregular word reading is correlated with general cognitive impairment / dementia severity. This relationship was present in controls as well as in each diagnostic category, although, as expected, it became stronger further along the AD continuum, where larger variations in impairment appeared. Taken together, these two sets of results indicate that irregular word reading performance, although relatively stable throughout normal ageing [121], declines along the AD continuum as early as the EMCI stage. Therefore, the assessment of premorbid IQ with the AmNART in participants on the AD continuum violates the criteria set by Taylor and colleagues in 1996 [68], namely that an accurate estimate of premorbid IQ should (a) not differ significantly from that of demographically matched control subjects and (b) not change significantly as the disease progresses in severity. This has major implications for clinicians and researchers, who could be led to underestimate cognitive changes in people with memory complaints, to underdiagnose AD continuum conditions, or to underestimate disease progression in those already diagnosed with one of these conditions.
Therefore, it seems preferable for clinicians to rely on comparisons to demographically-adjusted norms of cognitive performance to establish cognitive decline, as well as on repeated measures over time.

Consistent with hypothesis 3, lexicosemantic abilities were the second-best predictor of irregular word reading performance, just after education but well above dementia severity and other cognitive functions (executive functions and episodic memory). The importance of lexicosemantic abilities in irregular word reading was further in line with hypothesis 4, as AmNART item success rate was significantly predicted by the age of acquisition of irregular words, which has been associated with semantic representations [99, 100]. This is consistent with the fact that NART-like tests are intended to bypass phonemic decoding by relying more heavily on a person’s knowledge of the exceptional spellings associated with irregular words. Overall, this set of results highlights the strong association between irregular word reading and semantic abilities, as suggested by Strain and colleagues in 1998 [67], particularly but not exclusively in the AD continuum population. These results are also consistent with models of reading that posit a core influence of semantic abilities on correct reading aloud of irregular words, as emphasized by Taylor and colleagues in their 2015 review [78]. Although not all semantic psycholinguistic variables significantly predicted correct reading, the significant involvement of age of acquisition is consistent with the idea that words acquired at a younger age and used more frequently within the population might be more strongly stored in semantic memory, enhancing the likelihood of successful reading. These results also show that executive functioning, episodic memory and phonology do not seem to be as crucial to irregular word reading performance. Overall, these findings suggest that irregular word reading serves as a reliable marker of semantic decline in patients within the AD continuum. 
Interestingly, earlier studies have suggested that baseline AmNART scores could predict longitudinal cognitive decline in individuals with AD, attributing this effect to the protective nature of premorbid intelligence [66]. However, the current results offer an alternative explanation for this phenomenon: our study aligns more closely with several studies demonstrating that baseline semantic memory impairments predict future cognitive decline in AD [87,88,89]. These observations in AD patients are not dissimilar to ATL atrophy in semantic dementia, which is also accompanied by irregular word reading deficits [79]. However, single word reading tasks like the AmNART have been hypothesized not to be demanding enough on the ATL [77], which could explain the small effect size. Interestingly, the neural correlates of AmNART identified in the current study are notably more specific than the diffuse pattern of regions (both anterior and posterior) typically reported in studies on crystallized intelligence [130, 131]. This provides additional support for the idea that the AmNART may be more sensitive to semantic decline than suited to assessing premorbid intellectual abilities in individuals within the AD continuum.

While the current study fills many gaps in the literature (large sample size, well-characterized participants at four different stages on the AD continuum, investigation of underlying cognitive and neural mechanisms of irregular word reading), these results also need to be considered within the context of several limitations. Firstly, the ideal study design would involve obtaining a measure of intelligence before disease onset to correlate with the AmNART score collected during the disease. Unfortunately, these data were unavailable in the ADNI dataset. Secondly, the cross-sectional design of the study cannot confirm that irregular word reading declines over time in participants on the AD continuum, as a longitudinal design would. Thirdly, the ADNI cohort is not population-based and underrepresents ethnoculturally diverse populations; its participants are also highly educated and have fewer comorbidities compared to other cohorts [132]. ADNI results must therefore be interpreted with the caveat that they may have limited external validity and generalizability for more diverse populations. This is a significant problem, especially because racial/ethnic disparities in timeliness and comprehensiveness of dementia diagnosis have already been highlighted [133]. The results of the current study suggest that the AmNART captures semantic decline and might therefore underestimate premorbid intelligence in patients on the AD continuum. The assessment of premorbid function using tools that are not validated in diverse populations might therefore contribute to or even amplify disparities in the timeliness of dementia diagnosis. Generalizing to other populations is further complicated by differences in how irregular words are experienced in other languages with more transparent spelling-to-sound correspondences. Italian, for example, is more transparent than English and is characterized by regular spelling-to-sound correspondence [54]. 
The same can be said for languages that incorporate phonograms or ideograms (e.g., Chinese, Japanese, Korean or Vietnamese), which could be more context-dependent or invoke greater imageability in reading. Results may therefore differ significantly in NART-like tests developed for more “transparent” languages. Fourth, item-level analyses were conducted on a subsample of participants for whom item-level AmNART data (as opposed to only total AmNART score) were available (195 HC, 323 LMCI, 156 AD). Additionally, 20 words with missing values in age of acquisition, objective lexical frequency, concreteness, and/or phonological neighborhood had to be excluded from item-level analyses. Fifth, the use of the BNT as the sole semantic test has certain limits, as picture naming involves distinct cognitive processes that are not limited to semantics. These include visual analysis of the picture; recognition of the stimulus as familiar; activation of the semantic representation of the object via the semantic system; a lexical-semantic process which directs selection and retrieval of semantic information in a task-appropriate way; modality-independent lexical access to the phonological word form of the object, that is to say the speech sounds used in the word; and the motor programming and articulation required for saying the word [134, 135]. This is important because reading models posit the involvement of both lexical and semantic processing in correct reading of irregular words [77]; these results should therefore not be interpreted as reflecting an involvement of semantics alone. Sixth, it is important to note that the AD dementia population recruited in the ADNI study is at relatively early stages of the disease (i.e. MMSE score between 20 and 26 and CDR score = 0.5 or 1). Future studies could investigate irregular word reading in later stages of the disease.

Conclusions

Measuring cognitive decline can be particularly challenging for clinicians when diseases as insidious as AD dementia may be involved. Cognitive decline will more often than not have to be estimated post hoc, blind to an individual’s objective baseline performance. The first assessment, where only one time point is available, could prove critical to any intervention against the disease and its progression. Currently, clinicians have to rely on subjective complaints, demographically-adjusted norms of cognitive performance and repeated measures. The results of this study lend support to the idea that irregular word reading tests do not provide an accurate estimate of premorbid IQ in MCI and AD populations, as irregular word reading performance appears to decline significantly in these populations and to be related to semantic impairments associated with hippocampal and ATL volume loss. Relying on these estimates could lead clinicians to underestimate cognitive decline in people with those conditions. Premorbid estimates should rely on more crystallized forms of intelligence that are uncorrelated with disease severity, as evidenced by longitudinal studies in clinically diverse populations.