Background

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that gradually impairs memory, cognition, and other vital functions [1]. Individuals along the AD continuum exhibit markedly heterogeneous progression rates as the disease advances [2, 3]. Both linear and non-linear progression of cognitive decline have been documented in AD [4, 5], with distinct progression profiles found among individuals [2, 3]. Still, the mechanisms that underlie the heterogeneity in AD progression rates remain incompletely understood.

The neuropathological hallmarks of AD are amyloid-beta (Aβ) plaques and neurofibrillary tangles of hyperphosphorylated tau, which are believed to precede structural neurodegenerative changes in the brain [6, 7]. Links between cognitive decline in AD and biomarker levels for Aβ [8], tau [9], and atrophy/neurodegeneration [10] have been reported in the literature. However, with few exceptions, studies have focused on individual biomarkers rather than examining their synergies and combined contribution to progressive cognitive decline along the AD continuum. An accurate characterization of the mechanisms leading to heterogeneity in progression rates would benefit from considering biomarkers for Aβ (‘A’), tau (‘T’), and neurodegeneration (‘N’) together, consistent with the recently proposed AT(N) framework [11, 12]. Yet combining AT(N) biomarkers in a single model is not trivial, given their complex, non-linear relationships with one another and with cognitive decline [13, 14]. A modeling approach based on deep learning is a natural solution to this problem, given its ability to model complex and non-linear mappings [15, 16]. Deep learning models have recently emerged as a powerful tool for related tasks, such as differentiating between individuals with dementia and controls [17, 18], and classifying stable vs. progressive mild cognitive impairment (MCI) [15, 19,20,21].

In the current study, we propose a model-driven approach, based on AT(N) biomarkers, for stratifying progression rates along the AD continuum and delineating their underlying mechanisms. Notably, this work focuses on heterogeneity of cognitive decline along the AD continuum, unlike previous studies that examined MCI progression [22, 23]. We first employ data-driven clustering of cognitive assessments to define individuals with prodromal or clinical AD as either Fast Decliners (FD) or Moderate Decliners (MD) (Fig. 1A). These progression phenotypes are then used to train, validate, and test a deep learning model using baseline biomarkers for A (CSF Aβ 1–42), T (CSF p-tau 181), and N (MRI images and FDG-PET) (Fig. 1B). The model was trained with and without Aβ, tau, and neurodegeneration biomarkers, allowing us to compare the relative contribution of biomarker synergies, particularly amyloid- and tau-mediated neurodegeneration, to progression rates along the AD continuum. We additionally examined the extent to which the cognitive progression phenotypes predicted by our model reflected variation in regional atrophy characteristics (Fig. 1C), which are commonly used for subtyping AD [24,25,26,27]. This allowed us to examine whether our model-based framework reflected patterns of neurodegeneration captured by other commonly used approaches.

Fig. 1

Study setup. A Clustering of MMSE scores to classify subjects as Fast and Moderate Decliners (FD and MD, respectively). B Baseline AT(N) biomarkers, including CSF Aβ (A), CSF p-tau (T), and FDG-PET along with T1-weighted images (N), from a cohort of subjects with prodromal and clinical AD (n = 321, augmented to 1104) were used to train the deep learning models for FD/MD prediction. C The predicted cognitive progression phenotypes in the test set (n test = 97) were also examined for overlap with putative atrophy-based AD subtypes.

Methods

Participants and data acquisition

Data used in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu; clinical trial registration: NCT00106899). The ADNI was launched in 2003 as a public-private partnership, led by the Principal Investigator Michael W. Weiner, M.D. The primary goal of the ADNI has been to test whether serial MRI, positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. In the current study, participants from the ADNI-1, ADNI-2/Go, and ADNI-3 cohorts were included if they (1) were diagnosed with AD at baseline or within 1 year of their first diagnosis (i.e., MCI subjects were included if they were later diagnosed as AD), (2) had valid cognitive evaluations (Mini-Mental State Examination (MMSE) scores) performed at a minimum of t = 4 time points during the 24 months following baseline, (3) had T1-weighted MRI images taken using 3T scanners based on either an inversion recovery-fast spoiled gradient recalled (IR-SPGR) or a magnetization-prepared rapid gradient-echo (MP-RAGE) sequence, and (4) were determined to be amyloid-positive within the study’s timeline according to published criteria (CSF Aβ < 976.6 pg/mL or 18F-florbetapir-PET uptake ratio > 1.11). The 18F-florbetapir-PET uptake ratios, provided in ADNI as a derived variable, were calculated by extracting weighted cortical retention means from frontal, cingulate, parietal, and temporal regions, after co-registering the PET and MRI scans. These data were used to calculate standardized uptake value ratios (SUVRs), normalized by a whole cerebellum reference region. Subjects with SUVRs above the positivity threshold of 1.11 were then identified [28, 29].
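The amyloid-positivity criterion in (4) can be expressed as a simple rule. The following is a minimal sketch, assuming each subject has at most one CSF Aβ value and one florbetapir SUVR available; the function and constant names are hypothetical and are not taken from ADNI or from the study code.

```python
from typing import Optional

# Thresholds quoted in the inclusion criteria above.
CSF_ABETA_CUTOFF_PG_ML = 976.6   # positive if CSF Abeta is below this value
FLORBETAPIR_SUVR_CUTOFF = 1.11   # positive if the PET uptake ratio is above this value


def is_amyloid_positive(csf_abeta: Optional[float] = None,
                        suvr: Optional[float] = None) -> bool:
    """Return True if either available measurement meets its positivity criterion.

    Missing measurements are passed as None and are skipped (an assumption
    about how missing data might be handled, not a documented rule).
    """
    csf_positive = csf_abeta is not None and csf_abeta < CSF_ABETA_CUTOFF_PG_ML
    pet_positive = suvr is not None and suvr > FLORBETAPIR_SUVR_CUTOFF
    return csf_positive or pet_positive
```

Under this rule, a subject with a CSF Aβ above the cutoff can still qualify through the PET criterion.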

In total, 321 unique subjects were identified using these criteria. Of these, 310 subjects were determined to be amyloid-positive using the CSF Aβ cutoff, while 11 subjects satisfied the PET uptake ratio criterion. All subjects provided written informed consent, and the procedures were approved by the Institutional Review Boards of participating centers.

MRI images and their processing

T1-weighted SPGR or MPRAGE images were acquired using 3T scanners (full details of the image acquisition protocols can be found online at http://adni.loni.usc.edu/methods/documents/mri-protocols/). T1-weighted images were used for training of the deep learning models. The cohort with valid MRI and cognitive assessments was split between the training (n train = 224) and testing (n test = 97) datasets.

CSF Aβ 1-42 (amyloid-beta) and p-tau181 (tau) biomarkers along with MRI + fluorodeoxyglucose (FDG)-PET data (neurodegeneration) were used as the A, T, and N biomarkers respectively in our AT(N)-centered analytical framework. CSF samples used in this study were collected and processed previously (see, [30]; http://adni.loni.usc.edu/methods). CSF Aβ and p-tau were measured with the fully automated Elecsys immunoassay (Roche Diagnostics, Basel, Switzerland) by the ADNI biomarker core (University of Pennsylvania, Philadelphia, PA). Processed (see, http://adni.loni.usc.edu/methods) FDG PET images were averaged, with uptake values from angular, temporal, and posterior cingulate cortices serving as one of our two biomarkers for neurodegeneration (along with MRI) [31]. This average FDG PET was previously obtained using a series of steps to mitigate inter-scanner variability and normalized in spatial resolution and intensity range for further analysis [32]. Each MRI image was standardized to 0 mean and unit standard deviation. Similarly, other AT(N) biomarkers were standardized before being used as input in the deep learning model.
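The standardization step can be illustrated as follows. This is a sketch only: it assumes the population standard deviation, and the text does not state which variant was used or whether the statistics for the non-imaging biomarkers were computed on the training set alone. The function name is hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale values to zero mean and unit (population) standard deviation.

    For an MRI volume, `values` would be the flattened voxel intensities of a
    single image; for a CSF biomarker, the values across subjects.
    """
    mean = fmean(values)
    std = pstdev(values)
    if std == 0.0:
        raise ValueError("cannot standardize a constant sequence")
    return [(v - mean) / std for v in values]
```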

Unsupervised clustering of cognitive measurements

To characterize longitudinal change in cognition, we used 2-year follow-up MMSE scores. In addition, other cognitive assessments over the same duration were used for validation purposes, including the Alzheimer’s Disease Assessment Scale, 13-Item Subscale (ADASCog13), Clinical Dementia Rating Sum of Boxes (CDR-SB), and Functional Assessment Questionnaire (FAQ). These tests were administered as described online (http://www.adni-info.org).

We used time-series clustering based on the dynamic time-warping (DTW) method [33] to identify cognitive phenotypes in a data-driven manner. Clustering is typically applied to partition a heterogeneous set of samples into more homogeneous clusters based on some similarity measure. For time-series data, a DTW-based similarity measure is more widely applicable than conventional Euclidean or other spatial distance measures [34, 35]. DTW is able to find the optimal global alignment between sequences of different shapes. The shape-based DTW method is particularly well-suited to dynamic time-series data with potential temporal drift, showing better accuracy than linear models [34, 36]. We used DTW to cluster the MMSE scores of our cohort using t = 4 time points, collected over 2 years from baseline, using Hierarchical Agglomerative Clustering with Ward’s linkage [37]. Clustering was repeated with other linkage methods, such as Ward1 and the unweighted pair-group method using arithmetic averages (UPGMA), to examine the similarity of cluster labels [38, 39]. Other cognitive assessments such as ADASCog13, CDR-SB, and FAQ were used for validation purposes, testing whether the phenotypes based on MMSE scores also differ in other measures of cognition in AD. Further, to determine the optimal number of cognitive decline clusters in our cohort, we used silhouette analysis to compare the average silhouette width for k = 2, 3, and 4 clusters.
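The two building blocks of this procedure, the DTW distance and the average silhouette width, can be sketched in plain Python. This is an illustrative sketch rather than the study's implementation: it assumes an absolute-difference local cost and no warping-window constraint, neither of which is specified in the text, and it omits the hierarchical clustering step itself. Both function names are hypothetical.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def mean_silhouette(dist: List[List[float]], labels: Sequence[int]) -> float:
    """Average silhouette width from a precomputed distance matrix.

    Requires at least two distinct cluster labels.
    """
    clusters = sorted(set(labels))
    widths = []
    for i, own in enumerate(labels):
        same = [dist[i][j] for j, lab in enumerate(labels) if lab == own and j != i]
        if not same:  # singleton cluster: silhouette is defined as 0
            widths.append(0.0)
            continue
        a_i = sum(same) / len(same)  # mean distance to own cluster
        b_i = min(                   # mean distance to nearest other cluster
            sum(dist[i][j] for j, lab in enumerate(labels) if lab == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != own
        )
        denom = max(a_i, b_i)
        widths.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)
```

In use, one would compute `dtw_distance` for every pair of MMSE trajectories, pass the resulting matrix to a hierarchical clustering routine, and compare `mean_silhouette` across candidate values of k.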

Deep learning model architecture and training

Deep learning models have been extensively used for AD classification [17, 18] and predicting progression of MCI [15, 21, 40]. Deep learning models are typically compared against linear or non-linear Support-Vector Machine (SVM), logistic regression, or random forest classifiers, where SVM has been shown to outperform the latter two [41]. We first calibrated our deep learning model’s performance using a similar comparison with SVM. Our deep learning model used a parameter-efficient architecture similar to that previously proposed for classification of MCI [42]. The Parameter-Efficient Network model, designated as PENet, takes a combination of baseline AT(N) biomarkers including MRI images and FDG-PET (N), CSF p-tau (T), and CSF Aβ (A) and learns to predict the subject’s cognitive decline status (FD vs MD) using these baseline measurements only. The multi-modal feature extractor implemented in the model uses a series of convolutional blocks, or conv blocks, to process MRI tensors. These conv blocks are composed of a convolutional layer followed by batch normalization and an exponential linear unit (ELU) transformation. The model also makes use of separable convolution blocks, or sep-conv blocks, which perform the operation of a convolution block but with far fewer parameters, hence reducing the risk of over-fitting. PENet uses 2 conv blocks followed by 3 sep-conv blocks with an increasing number of filters (Fig. 3A). Non-imaging biomarkers are processed by dense, or fully connected (FC), blocks.
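The parameter saving offered by separable convolutions can be made concrete with a short calculation. The sketch below assumes a depthwise-separable layer with a depth multiplier of 1 and a single bias vector on the output, and it ignores batch-normalization parameters; the kernel size and channel counts in the usage note are hypothetical, since PENet's exact layer dimensions are not given here.

```python
def conv_params(kernel_voxels: int, in_channels: int, out_channels: int) -> int:
    """Weights plus biases of a standard convolution layer."""
    return kernel_voxels * in_channels * out_channels + out_channels


def separable_conv_params(kernel_voxels: int, in_channels: int,
                          out_channels: int) -> int:
    """One spatial filter per input channel, then a 1x1 pointwise mix, plus biases."""
    depthwise = kernel_voxels * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise + out_channels
```

For an illustrative 3 × 3 × 3 kernel (27 voxels) mapping 32 channels to 64, `conv_params(27, 32, 64)` gives 55,360 parameters while `separable_conv_params(27, 32, 64)` gives 2,976, roughly an 18-fold reduction.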

Implementation

Experiments were conducted using Python version 3.6. The implementation was developed using the Keras deep learning library with a TensorFlow backend. The model was trained on Ubuntu 18.04 on a single NVIDIA Tesla V100 GPU with 16 GB of memory, using a batch size of 25 for 50 epochs, after which the model showed stable dynamics (Fig. S2). Training was performed using the Stochastic Gradient Descent algorithm with an initial learning rate of 8 × 10⁻⁴ and exponential decay with a drop rate of 0.5. The FC layers used in the model were regularized using L2 regularization with a penalty coefficient of 5 × 10⁻⁴.
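One possible form of the learning-rate schedule is sketched below. The initial rate and drop rate come from the text, but the interval over which the rate is halved is not reported, so `EPOCHS_PER_DROP` is purely an assumption chosen for illustration, as is the continuous (rather than stepwise) form of the decay.

```python
INITIAL_LR = 8e-4       # from the text
DROP_RATE = 0.5         # from the text
EPOCHS_PER_DROP = 10.0  # ASSUMPTION: the decay interval is not reported


def learning_rate(epoch: int, epochs_per_drop: float = EPOCHS_PER_DROP) -> float:
    """Exponential decay: the rate is multiplied by DROP_RATE every
    `epochs_per_drop` epochs, interpolated smoothly in between."""
    return INITIAL_LR * DROP_RATE ** (epoch / epochs_per_drop)
```

A function of this shape could be passed to a per-epoch scheduler callback in a training loop; a stepwise variant would use integer division in the exponent instead.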

Data augmentation and validation framework

The implemented model was trained and validated using 5-fold cross-validation stratified by class phenotypes. All qualifying subjects from ADNI-1, ADNI-2/Go, and ADNI-3 were used in our experiments, yielding a total of n = 321 subjects (n train = 224, MD = 136, FD = 88; n test = 97, MD = 58, FD = 39). To improve model generalizability, we augmented the training dataset through a combination of image rotation (random angle in [−90°, 90°]), translation (random shift in [0, 0.5]), and flipping operations, resulting in 1104 training images. Special care was taken to use the test dataset only after all steps of augmentation, model selection, and hyperparameter tuning were completed, ensuring no data leakage.
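Stratified fold assignment can be sketched without external libraries. The version below is a minimal illustration, assuming a simple shuffle-and-deal scheme with a fixed seed; the function name is hypothetical and the study's actual splitting routine is not described.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds so each fold keeps the class proportions."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so per-class fold sizes differ by at most one.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds
```

With the training set sizes reported above (136 MD and 88 FD), each of the five folds receives 27 or 28 MD subjects and 17 or 18 FD subjects; each fold then serves once as the validation set while the remaining four are used for training.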

Analysis of atrophy-based AD subtypes

MRI images for the test dataset were processed (http://adni.loni.usc.edu/) using Freesurfer (http://surfer.nmr.mgh.harvard.edu/) to extract region of interest (ROI)-based gray matter (GM) volume. The following processing steps were performed: (1) motion-correction and skull-stripping based on a watershed deformation method [43], (2) image registration to the Talairach brain template, (3) estimation and labeling of the gray matter-white matter (GM-WM) boundary using a tessellation step, and (4) registration of the volume to an atlas to acquire volume and surface statistics for each ROI. Using these extracted volumes, we investigated the potential association between the model-based cognitive progression phenotypes and atrophy-based AD subtypes, as previously identified [24]. Subtypes were identified using the ratio of hippocampal volume (HV) to cortical total volume (CTV). Following the same procedure as described previously [24], subjects in the test dataset with an HV:CTV ratio above the 75th percentile were identified as belonging to the Hippocampal-Sparing AD (HpSp) subtype, those with an HV:CTV ratio below the 25th percentile as belonging to the Limbic-Predominant subtype (LP), and the rest were designated as typical-AD (tAD).
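The percentile rule for subtype assignment can be sketched as follows. This is an illustration under assumptions: the percentile interpolation method is not specified in the text, so the default of Python's `statistics.quantiles` is used here, and the function name is hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence


def assign_subtypes(hv_ctv_ratios: Sequence[float]) -> List[str]:
    """Label each subject by where its HV:CTV ratio falls among the cohort quartiles."""
    q25, _, q75 = quantiles(hv_ctv_ratios, n=4)
    labels = []
    for ratio in hv_ctv_ratios:
        if ratio > q75:
            labels.append("HpSp")  # hippocampal-sparing
        elif ratio < q25:
            labels.append("LP")    # limbic-predominant
        else:
            labels.append("tAD")   # typical AD
    return labels
```

Because the thresholds are cohort percentiles, roughly a quarter of subjects fall into each of the HpSp and LP groups by construction, with the remaining half labeled tAD.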

To visualize the spatial extent of atrophy, the subtypes HpSp, LP, and tAD were also contrasted against age-matched controls (n = 30) to extract voxel-wise contrast maps using FSL's optimized voxel-based morphometry (VBM) (http://www.fmrib.ox.ac.uk/fsl/) [44]. The FSL VBM processing pipeline involved brain extraction of T1-weighted images followed by segmentation into WM, GM, and CSF volume probability maps. Next, a random subset of each compared cohort was used to create the average study-specific GM template by registration to MNI152 space using the FSL FLIRT tool. This was followed by non-linear registration of all GM images in the native image space to the average GM template. Subsequently, these registered images were smoothed with a full-width at half-maximum (FWHM) of 6 mm and their voxel-wise GM volumes were contrasted using a general linear model (GLM) formulation. To identify significant differences between the compared groups, non-parametric statistics were performed using the ‘randomise’ FSL function (5000 permutations) with FWE correction set at p < 0.05, based on threshold-free cluster enhancement (TFCE).

Statistical analysis

For comparisons between two groups, unpaired two-sided t tests or Wilcoxon rank-sum tests were used. For testing significant differences in MMSE scores of the MD and FD phenotypes, we used the selective inference method [33, 37], suitable for shape-based clustering of dynamic time-varying observations [34, 35]. The longitudinal MMSE scores of the entire cohort were clustered to reveal 2 different progression phenotypes identified as moderate (MD: n = 194; ages 73.8 ± 7.28; Supplementary Table 1) and fast (FD: n = 127; ages 73.2 ± 8.02) decline (Fig. 2A and B). Other linkage methods such as Ward1 [38] and UPGMA resulted in very similar clustering solutions [38, 39]. The 2 clusters did not exhibit any significant differences in age, education, gender, total cortical volume, or APOE e4 status (all p > 0.05; Supplementary Table 1 and Fig. S3). Silhouette analysis, used to determine the optimal number of clusters, resulted in maximal silhouette width for k = 2 clusters (Supplementary Table 2). The MD and FD phenotypes showed, as expected, significant differences in MMSE profiles (p = 6.73 × 10⁻³), revealed using a method developed for post-clustering comparisons [24] based solely on patterns of neurodegeneration. We found no significant differences in the distribution of AD subtypes among the two cognitive progression phenotypes. Thus, the model-based phenotypes identified here may not be readily detectable using atrophy-based methods. Data-driven studies on AD subtyping revealed neurodegeneration patterns similar to those found here [24,25,26], but atrophy-based methods do not always result in distinct cognitive progression phenotypes [27]. Additional work is needed to better reconcile atrophy- and cognitive-based subtyping of AD and its progression.

Limitations

Several limitations should be noted. First, we acknowledge that any study investigating the effects of AT(N) biomarkers in AD should ideally test longitudinal changes in these biomarkers in the same cohort. However, this was not applicable in the current study due to missing data in several of the biomarkers. Second, while being beyond the scope of the current study, an examination of the spatiotemporal characteristics of amyloid and tau deposition using PET-based markers can provide useful information about the progression of cognitive decline along the AD continuum. Future work focusing on spatiotemporal changes in the synergy between biomarkers for Aβ, tau, and neurodegeneration as it relates to progression rates is thus warranted.

Conclusions

To conclude, our study combined data- and model-driven methods to uncover the role of AT(N) biomarkers in the progression of cognitive decline along the AD continuum. The results converge to support a more complex, synergistic relationship between AT(N) biomarkers in determining this progression. Our findings further demonstrate the utility of using modeling approaches to study the complex multifaceted mechanisms that underlie disease progression in AD.