Introduction

In recent decades, numerous randomized clinical trials (RCTs) of disease-modifying therapies (DMTs) have been conducted, aimed at detecting the impact of therapeutic interventions on the clinical progression of Alzheimer’s disease (AD) (1). In RCTs, treatment effects are assessed using validated cognitive and functional scales, as required by regulators (2). The gold standard for determining the efficacy of a treatment is to evaluate whether a statistically significant difference in mean change from baseline to end of study is observed between treatment and placebo groups. Following several decades of failures, three anti-amyloid monoclonal antibody phase III clinical trials (EMERGE, CLARITY AD and TRAILBLAZER-ALZ 2) met their primary endpoints (3–5), paving the way to accelerated and full approvals of the first DMTs (6, 7).

While these contemporary RCTs produced statistically significant results, the interpretation of the observed score changes on clinical outcome assessments (COAs) utilized in these studies is still a matter of debate, due to the relatively small absolute size of the benefit measured. Determining whether treatment effects represent clinically meaningful benefits to patients and their partners is central to advancing the field of disease-modifying AD therapeutics. To address these issues, the EU-US CTAD Task Force met in Boston on October 24th, 2023. Presentations from academic and industry researchers and trial methodologists, followed by a general discussion, focused on state-of-the-art knowledge for the evaluation of the clinical meaningfulness of AD trial results, including the point of view of patients and study partners on what they consider clinically meaningful. In this manuscript, we provide a summary of key topics (Panel 1) and emerging solutions, while also highlighting knowledge gaps and potential future research directions.

Panel 1 Questions addressed during the CTAD Task Force meeting

What is meaningful for patients and care partners?

The US Food and Drug Administration (FDA) has indicated that a clinically meaningful treatment should be determined on the basis of whether the treatment has a positive and significant effect on how an individual feels, functions or survives (8). Thus, a prerequisite for determining clinical meaningfulness within this conceptual framework is to identify the priorities of patients and their carers or partners, in order to ensure that validated outcome measures, widely used in RCTs for AD, capture these concepts. These considerations are particularly timely and important, as the current primary outcomes did not incorporate direct patient and caregiver perspectives in their development.

Several qualitative studies aiming to elucidate patients’ and care partners’ perspectives were presented. The “What Matters Most” study was a qualitative interview study conducted with 60 patients and their caregivers, on the relevance of individual symptoms and impacts of the disease in cognitively unimpaired individuals and patients with mild cognitive impairment (MCI) or dementia due to AD (9). In the MCI and mild dementia groups, the most frequently reported bothersome or challenging issues were impairment of memory and forgetfulness. The MCI group noted other concerns such as impairment of concentration and orientation. In the mild dementia group, changes in personality and behavior, uncertainty about the future, and apprehension about becoming a burden were disclosed. The most frequently reported impacts of the disease on everyday life in the MCI clinical stage were changes in mood and behavior and decreases in social and leisure activities. In mild dementia, decreasing abilities in daily activities and increasing reliance on the care partner were frequently mentioned, in addition to those described by the MCI group (9).

The Real world Outcomes across the AD spectrum for better care: Multimodal data Access Platform (ROADMAP) project was a large European initiative aimed at creating a framework for real-world evidence in AD research (10). One aspect was a systematic literature review on meaningful outcomes from the perspective of patients, caregivers, and healthcare professionals. This review showed that these stakeholders prioritized outcomes typically assessed in RCTs related to cognition and autonomy, but also revealed the importance of other aspects such as maintenance of identity and personality, avoidance of caregiver burden, availability of information on the disease, and access to healthcare services. A second part of the systematic review attempted to address the question of a meaningful delay in progression of AD across the disease continuum, from the perspective of these stakeholders. However, the literature review between years 2008 and 2017 revealed only minimal research published on this topic and was insufficient to draw meaningful conclusions (10).

The Patient-Reported Outcome Consortium Cognition Working Group conducted a qualitative interview study in 25 patients with amnestic MCI and their informants on their most relevant disease-related issues. These data were subsequently analyzed by the Critical Path Institute’s Coalition Against Major Diseases (CAMD) (11). The most frequently reported issues involved cognition, mental health, social interactions, and functional abilities. In a second step, the group mapped the concerns derived from qualitative interviews on basic domains of cognition and neuropsychiatric symptoms. In a third step, these were mapped onto established instruments for assessing the effects of treatments in AD clinical trials. This approach allowed a translation of data obtained on standard instruments in studies to meaningful endpoints in everyday life as perceived by those with amnestic MCI (11).

Research on the patients’ and study partners’ perspectives has been limited to a small number of qualitative studies. Based on the available evidence, most (e.g., memory, autonomy in daily life activities) but not all the concepts highlighted as being meaningful, are indeed captured by the primary COAs included in current and ongoing clinical trials (e.g., the Clinical Dementia Rating Scale (CDR) and the Integrated Alzheimer’s Disease Rating Scale (iADRS)). Furthermore, RCTs employ a holistic measurement strategy spanning primary, secondary, and exploratory outcomes; what matters to patients and care partners may not necessarily be captured by primary outcomes, focused solely on cognition and function (Table 1). Thus, secondary and exploratory outcomes, in addition to supporting the observed effects on cognition and function, may also evaluate the impact of treatment on other concepts emphasized by patients and care partners, such as quality of life, neuropsychiatric symptoms, caregiver burden, and resource utilization.

Table 1 Clinical outcome assessments in some recent phase III anti-amyloid RCTs

The state-of-the-art knowledge for interpreting the clinical meaningfulness of AD clinical trial results

Shifting the focus to meaningful within-person changes

In recent draft guidance to industry, the FDA highlights the importance of establishing thresholds on a COA that reflect a clinically meaningful change in the concept(s) being measured in a specific target population (12). Several terms are used interchangeably in the literature to refer to meaningful score changes on a COA (e.g., minimal clinically important difference, minimal clinically relevant change, minimum detectable change, meaningful score difference) (13). The previous emphasis on establishing thresholds to interpret between-group (placebo versus intervention) differences has evolved towards determining score ranges that reflect “meaningful within-person change” (MWPC) (12). MWPC thresholds are intended to represent changes on a given COA beyond which score change is considered to reflect meaningful impacts on an individual patient.

A variety of approaches for supporting the interpretation of COA-based endpoints have been described in the recent FDA guidance (12). Anchor-based methods link the change in the COA-based endpoint-of-interest to a conceptually related relevant external variable(s) for which meaningful change is more intuitive or already well-established. Distribution-based methods that derive effect sizes or other statistical properties related to the observed variation on the target COA in the clinical trial population (e.g., proportion of the standard deviation or standard error of measurement) are employed as a supportive approach to other empirical methods. Additional approaches that merit consideration for determining thresholds of clinically meaningful within-person change include qualitative or vignette-based approaches. Among qualitative approaches, semi-structured interviews conducted independently from a clinical trial, or exit interviews performed within a clinical trial, can provide key information on what constitutes a clinically meaningful change on target COAs directly from patients and/or care partners (14). Vignette-based approaches, such as bookmarking/standard-setting and scale judgment, are particularly useful if reliable anchor variables are lacking. The bookmarking/standard-setting approach utilizes judgment to determine severity cut-points for a condition through the evaluation of clinical vignettes that include the target COA (15, 16). Similarly, scale judgment methods estimate thresholds of meaningful change by assessing clinical vignettes that reflect the amount of change on the target COA that was experienced after receiving treatment and can provide important perspectives on an individual’s threshold for meaningful change in relation to their current status (17).
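The distribution-based quantities mentioned above can be illustrated with a minimal sketch. It assumes that the threshold is taken as a fraction of the baseline standard deviation (the 0.5 default is a common convention, not a value from the guidance) and that the standard error of measurement is computed as SD × √(1 − reliability). The function name, parameters and reliability value are hypothetical and serve only as an illustration.

```python
import math
import statistics


def distribution_based_thresholds(baseline_scores, reliability, sd_fraction=0.5):
    """Return simple distribution-based quantities for a COA.

    baseline_scores: baseline COA scores in the trial population (illustrative).
    reliability: assumed reliability coefficient of the COA, between 0 and 1.
    sd_fraction: fraction of the SD used as a threshold (0.5 is an assumption).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sd = statistics.stdev(baseline_scores)  # sample standard deviation
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    return {"sd": sd, "fraction_of_sd": sd_fraction * sd, "sem": sem}


# Hypothetical baseline scores; not taken from any study.
result = distribution_based_thresholds([1, 2, 3, 4, 5], reliability=0.75)
print(result)
```

As noted in the text, such estimates describe score variability only and are best read alongside anchor-based or qualitative evidence.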

MWPC thresholds can be useful to identify meaningful “progressors” (i.e., individuals that meaningfully progress on a given COA) to support within-patient analyses of clinical trial data. For example, the percentage of “progressors” can be compared between the placebo and treatment arms. These within-patient analyses help to address questions like: “How many patients meaningfully progressed (by the established threshold(s)) on treatment compared to placebo at the end of study?” or “What is the likelihood of meaningful progression (by the established threshold(s)) on treatment compared to placebo?” This approach, recently applied in the TRAILBLAZER-ALZ 2 study (NCT04437511) (5), can support the interpretation of treatment efficacy over the course of an RCT. Conversely, MWPC thresholds are not intended to inform the required magnitude or evaluate the meaningfulness of between-group differences in mean change from baseline, for instance in attempting to address the question: “Is the magnitude of difference between treatment and placebo groups meaningful?” Applying MWPC thresholds in this way sets unrealistic expectations for emerging DMTs, which aim to slow disease progression in the context of a progressive neurodegenerative disease. For example, considering the average placebo decline on CDR-Sum of boxes (CDR-SB) observed in studies of recently approved DMTs (on average 1.5–2 points over 18 months), in order to exceed available MWPC thresholds for the CDR-SB (18) at the between-group level, a treatment would need to show either complete stabilization or improvement from baseline, on average. COAs are designed to quantify the severity of a specific aspect of the disease, across an entire population and multi-year span of disease. Therefore, it is not expected that any patient will transition from a minimum to a maximum score in the course of a study. For this reason, MWPC thresholds would constitute only a small portion of the range of the outcome scale (e.g., 1.5–2 points out of an 18-point scale).
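The within-patient “progressor” comparison described above can be sketched as follows. This is an illustration only: it assumes that worsening corresponds to an increase in score (as on the CDR-SB), that a single threshold is applied to each patient’s change from baseline, and that the score changes are hypothetical. It is not the analysis used in any of the cited trials.

```python
def progressor_summary(placebo_changes, treatment_changes, threshold):
    """Compare the proportion of meaningful progressors between two arms.

    Each list holds per-patient change from baseline, where a higher value
    means more worsening (assumption). A patient counts as a progressor
    when the change meets or exceeds the MWPC threshold.
    """
    def proportion(changes):
        return sum(1 for change in changes if change >= threshold) / len(changes)

    p_placebo = proportion(placebo_changes)
    p_treatment = proportion(treatment_changes)
    return {
        "placebo": p_placebo,
        "treatment": p_treatment,
        "risk_difference": p_treatment - p_placebo,
        # Relative risk is undefined when no placebo patient progresses.
        "relative_risk": p_treatment / p_placebo if p_placebo > 0 else None,
    }


# Hypothetical change scores and threshold, for illustration only.
summary = progressor_summary(
    placebo_changes=[0.5, 1.0, 2.0, 2.5],
    treatment_changes=[0.0, 0.5, 1.0, 2.0],
    threshold=1.5,
)
print(summary)
```

The threshold is applied to each individual’s change, and only the resulting proportions are compared between arms; the between-group difference in mean change is never compared with the threshold.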

In summary, a variety of promising approaches to support the interpretation of COAs are available and in use. Determining a range of thresholds that reflect MWPC on widely used AD COAs can support the interpretation of emerging clinical trial results when correctly applied. Finally, such thresholds should not be used as requirements for meaningful group-level differences in change from baseline, leading to unrealistic expectations for what constitutes a clinically meaningful treatment benefit for emerging DMTs.

Clinical Relevance and Interpretation of Meaningful Within-Person Change on CDR

The CDR is a widely used measure to clinically stage AD (according to the CDR-Global Score (CDR-GS)) and provide a granular assessment of changes in cognition and function (via the CDR-SB) (19). The CDR-GS is computed by a standard algorithm and directly captures AD severity: 0 (unimpaired individuals), 0.5 (MCI), 1 (mild dementia), 2 (moderate dementia), 3 (severe dementia). The CDR-SB score is the sum score of 6 domains (i.e., memory, orientation, judgment, community affairs, home and hobbies, personal care) each scored between 0 and 3, such that the CDR-SB score ranges from 0 to 18 (19). The CDR-SB score is commonly used and widely accepted by regulators as a primary endpoint in early AD RCTs. As previously described, the CDR captures changes in cognition and function that are considered meaningful to patients and their caregivers (e.g., changes in memory, orientation, autonomy, leisure activity etc.) (9, 10, 11).
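The CDR-SB computation described above is a simple sum, sketched below. The domain keys are hypothetical labels, and the validation checks only the 0 to 3 range stated in the text, not the instrument’s discrete rating options. The CDR-GS algorithm is deliberately not reproduced here.

```python
CDR_DOMAINS = (
    "memory",
    "orientation",
    "judgment",
    "community_affairs",
    "home_and_hobbies",
    "personal_care",
)


def cdr_sum_of_boxes(box_scores):
    """Sum the six CDR box scores into a CDR-SB score (range 0 to 18)."""
    missing = [domain for domain in CDR_DOMAINS if domain not in box_scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in CDR_DOMAINS:
        if not 0 <= box_scores[domain] <= 3:
            raise ValueError(f"{domain} score must lie between 0 and 3")
    return sum(box_scores[domain] for domain in CDR_DOMAINS)


# Hypothetical ratings: every domain rated 0.5.
print(cdr_sum_of_boxes({domain: 0.5 for domain in CDR_DOMAINS}))
```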

Two studies aiming to determine MWPC thresholds for CDR-SB in early AD were presented. An analysis of data from the National Alzheimer’s Coordinating Center (18), a longitudinal prospective observational database (20), estimated the MWPC thresholds for a variety of COAs, including the CDR-SB, using clinician-rated anchor-based and distribution-based approaches. The mean changes in CDR-SB across consecutive annual visits were described, stratified by whether the clinician indicated there was a meaningful decline in scores from the previous visit (anchor-based approach). The clinically meaningful decline was defined based on a variable representing the clinician’s assessment of whether there was a meaningful decline since the previous visit. The magnitude of change on the CDR-SB associated with clinician assessment of meaningful decline increased with disease severity, ranging from 1–2 points: 0.98 for the MCI subgroup, 1.63 for mild AD, and 2.3 among those with moderate-severe AD. Results from the distribution-based methods, as well as those using an alternative anchor (change in CDR-GS), were consistent with the primary anchor-based approach (18). The fact that the patient population did not have confirmed amyloid pathology and was not restricted to individuals at the early stage of the disease represents a potential limitation to the generalizability of these findings to a contemporary biomarker-confirmed early AD population. Moreover, given the limited treatment options for AD, treating clinicians might underestimate the progression noted by patients or care partners.

Another study aimed to incorporate the caregiver perspective by using caregiver-rated anchors to evaluate meaningful changes on the CDR-SB (21) in biomarker-confirmed patients from the TAURIEL study (NCT03289143), a phase II clinical trial aimed to assess the safety and efficacy of semorinemab in early AD. Across time points and anchors, the mean CDR-SB change associated with the “somewhat worse” category on the Caregiver Global Impression of Change – Alzheimer’s Disease (CaGI-Alz) anchor items ranged from 1.5–2.5 points: 1.5–2.1 in the combined early AD sample, 1.1–2.1 in the MCI subgroup and 1.8–2.3 in mild AD. A 1-point change was more readily associated with the “no change” anchor category and distribution-based estimates (21).

Taken together, the threshold ranges reported across these studies and others (18, 21, 22) using anchor- and distribution-based approaches are broadly aligned, despite differences in study populations (observational versus clinical trials) and anchor types and raters (various clinician- versus caregiver-rated anchors of change and/or severity), providing confidence in their robustness and applicability to early AD populations. Importantly, these initial studies suggest that thresholds may differ according to disease severity (e.g., MCI vs mild AD) highlighting the importance of calibrating thresholds of meaningful within-person change to the target population.

Clinical Relevance and Interpretation of Meaningful Within-Person Change on iADRS

The iADRS is a clinical outcome assessment that measures the impact of cognitive loss on the ability to conduct everyday activities (23). The iADRS was developed as a single integrated measure of cognition and function incorporating inputs from multiple sources (i.e., patient and caregiver) collected across 31 items from 2 widely used and accepted scales: the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog13) and the Alzheimer’s Disease Cooperative Study-Instrumental Activities of Daily Living (ADCS-iADL) (23). In the phase II and III studies of donanemab, TRAILBLAZER-ALZ (NCT03367403) and TRAILBLAZER-ALZ 2 (NCT04437511), the iADRS was used as the primary outcome measure. Psychometric analyses were implemented using Rasch Measurement Theory (RMT) to evaluate how well items within the iADRS perform as a set, and to provide an integrated measurement of global disease severity. RMT results evaluating iADRS item-level performance demonstrated overlapping contributions of cognitive and functional items to the iADRS total score throughout the range of disease severity (24). In early stages of symptomatic AD, cognitive impairments in episodic memory were associated with declines on items assessing instrumental functions relying on memory (e.g., talking about recent experiences) (24). In a post-hoc analysis, donanemab notably impacted the episodic memory and memory-dependent activities of daily living items (25). RMT evidence generated regarding the relative order and clinical stage at which specific cognitive and functional deficits tend to emerge offers valuable insights into the potential impact and clinical meaningfulness of DMTs (24). Depending on an individual’s current level of functioning along the disease continuum, recent findings have shown the functions that are likely to be maintained or exhibit slowed progression when treated with donanemab.

Decline on the iADRS has been associated with outcomes of disease progression such as measures of patient quality of life, caregiver burden, and health costs (26). MWPC thresholds for iADRS were estimated using data from two RCTs (AMARANTH (NCT02245737) and EXPEDITION3 (NCT01900665)). Using anchor-based, distribution-based, regression analyses, and cumulative distribution function plots, a 9-point worsening for individuals with mild dementia and 5 points for patients with MCI were identified as a clinically meaningful change (27).

These studies (24, 26) provided quantitative evidence in support of the iADRS as a fit-for-purpose integrated measure of cognition and daily function in early symptomatic AD and demonstrated that slowing of clinical decline with donanemab treatment translates into meaningful benefits for patients. MWPC thresholds for iADRS might serve as suitable estimates for evaluating the likelihood of meaningful progression as well as the proportion of patients that meaningfully progress. Further research is needed to validate these thresholds.

Challenges, perspectives and additional considerations

The implementation of patient-reported outcomes in AD clinical trials

Clinical outcome assessments (COAs) is an umbrella term for different COA types, including performance-based outcomes, clinician-reported outcomes, observer-reported (often care partner) outcomes, and patient-reported outcomes (PROs). PROs and Patient Reported Outcome Measures (PROMs) are directly and subjectively reported by the patient (28). While they play a key role in many disease areas, for example 79% of all European Medicines Agency New Marketing Authorizations in oncology between 2017 and 2021 (excluding biosimilar and generics) contained PROMs in their filings (28), there are challenges to their use in clinical trials for symptomatic AD. Importantly, the validity of self-reporting in individuals with cognitive impairment may be questionable due to loss of insight, and result in unreliable responses.

There are few examples of PROMs in AD trials, often limited to Quality of Life (QoL) scales included as exploratory endpoints. Quality of Life is a broad construct that includes emotional, social, and physical aspects (29). The Centers for Disease Control and Prevention (CDC) defines Health-related Quality of Life (HRQoL) as one’s perception of how one’s well-being is affected by a disease, disability, or disorder (30). HRQoL measures offer direct patient and/or care partner perceptions of the impact of a disease (31, 32). Several scales exist for measuring HRQoL, and they can be general or disease-specific, single-domain or multidimensional (28). Few scales are AD-specific and some are intended for specific clinical stages of AD from mild to severe disease (33, 34). The Quality of Life in Alzheimer’s Disease (QoL-AD) for mild to moderate dementia and the Dementia Quality of Life (DemQoL) for all dementia stages cover all HRQoL domains (symptoms, physical function, psychological well-being, social functioning) and are completed by both the patient and the proxy (34). There are no validated HRQoL instruments specifically for MCI. Several scales allow proxy respondents, raising important questions such as: who is the appropriate judge of the quality of life, and when is one unable to judge one’s own quality of life? Of note, the FDA discourages proxy-reported outcomes for cognitively impaired populations (35). Proxy-reported outcomes differ from observer-reported outcomes (such as the ADCS-ADL), which are currently utilized in several RCTs, in that the observer, in addition to reporting their observation, may also interpret or provide an opinion based on the observation (35).

The results from the CLARITY AD trial (NCT03887455) showed a 49% and 56% reduction in decline, as measured by patient-rated EQ-5D-5L and QOL-AD, respectively, in patients treated with lecanemab compared to placebo (36). If QOL-AD was rated by the study partner, there was a 23% reduction in decline observed in patients treated with lecanemab compared to placebo at 18 months (36). These results provide key learnings. Firstly, both general and AD-specific scales were sensitive in demonstrating decline over 18 months and detecting treatment effects. Secondly, disease-specific QoL scales showed a greater treatment effect versus HRQoL measures non-specific for AD. Finally, in this study, patient-reported responses were more sensitive to treatment effects than proxy-reported responses. Further research is needed to understand the generalizability of these findings.

In summary, a broader conceptualization and implementation of PROMs in early AD trials and treatment (registries) could permit understanding patient benefit beyond the primary endpoints of cognition and activities of daily living. However, concerns regarding the reliability of self-report measures over time due to declining cognition and loss of patient insight have led to the widespread use of observer-(often caregiver-) reported outcomes. HRQoL measures may contribute to evidence for clinical meaningfulness by providing the patient’s perspective on what matters to them and appear to be sensitive to decline and to treatment effects in early AD. AD-specific HRQoL tools may be most relevant and generally include a broader range of concepts. The use of proxy measures to evaluate QoL has caveats and caution is required in broad assumptions that individuals with AD cannot provide meaningful QoL responses. Indeed, self-reported measures of QoL (and other symptoms/impacts) may be essential in the very early disease stages, where subtle changes are noticeable to people living with AD but not readily observed by others.

Novel measures to capture the patient’s voice

Other measures to capture the patient’s voice should be considered. The Goal Attainment Scale advocated by Kenneth Rockwood (37) attempted to individualize patient and care partner goals related to cognition, function, leisure, behavior and social interaction. This approach has the potential to be modified for use in early-stage disease trials. Digital tools may be appropriate for capturing nuances of change in cognition and function associated with independent activities of daily living (38–40). Passive home monitoring and smartphones that monitor sleep, location, voice and gait might also be considered as composite metrics of change in future trials (41).

The “time saved” approach

Disease-modifying effects and slower disease progression can result in preserved function and cognition, manifested as delayed milestones of decline. These effects are expected to accumulate while on treatment and be maintained even when treatment is discontinued. In contrast, symptomatic effects provide a temporary benefit while on treatment that is lost when treatment is discontinued. Time component tests (TCTs) translate the mean changes between placebo and treatment groups from the units of the outcome scale to a time metric (42–44). This approach (i.e., the “time-saved” approach) can facilitate the understanding of clinical trial findings by expressing treatment effects in terms of the time (i.e., months/years) by which cognitive or functional loss is delayed. For example, TCTs were employed to determine the time saved with donanemab treatment in the phase II TRAILBLAZER-ALZ study. At week 76, disease progression was delayed on average by 5.3 months and 5.2 months as measured by the iADRS and the CDR-SB, respectively (42).
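The idea of converting a score difference into time can be illustrated with a simplified sketch. It assumes that the placebo trajectory is available as mean change from baseline at each visit and that the time saved is read off by linear interpolation: the time at which the placebo group reached the treatment group’s end-of-study decline is subtracted from the study duration. This is a didactic simplification with hypothetical numbers, not the statistical model used in the cited TCT analyses.

```python
def time_saved(visit_months, placebo_mean_change, treatment_change_at_end):
    """Estimate time saved by interpolating along the placebo trajectory.

    visit_months: visit times in months, in increasing order.
    placebo_mean_change: placebo mean change from baseline at each visit.
    treatment_change_at_end: treatment-arm mean change at the final visit.
    """
    end_of_study = visit_months[-1]
    points = list(zip(visit_months, placebo_mean_change))
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        low, high = min(y0, y1), max(y0, y1)
        if y0 != y1 and low <= treatment_change_at_end <= high:
            # Time at which placebo showed the same amount of decline.
            t_match = t0 + (treatment_change_at_end - y0) * (t1 - t0) / (y1 - y0)
            return end_of_study - t_match
    raise ValueError("treatment change lies outside the placebo trajectory")


# Hypothetical trajectory on a scale where higher change means worsening.
months_saved = time_saved(
    visit_months=[0, 6, 12, 18],
    placebo_mean_change=[0.0, 0.5, 1.0, 1.6],
    treatment_change_at_end=1.2,
)
print(round(months_saved, 2))
```

In this hypothetical example, the placebo group reaches the treated group’s 18-month decline at about month 14, which corresponds to roughly 4 months saved.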

DMTs are expected to impact all aspects of the disease. Correlations between changes in different outcomes are small, making it challenging to observe consistent patterns or improvements across multiple aspects unless there is a robust treatment effect. Combining TCTs for the iADRS and CDR-SB, which is feasible because both measures can be converted onto a time scale, reduces variability. Thus, combining TCTs across different outcomes can better estimate the time saved. For example, an intervention that can change the ADAS-Cog, ADCS-iADL, and CDR-SB simultaneously has evidence of a much stronger treatment effect than if it affected only one of these measures. Using data from the LipiDiDiet study, TCTs were employed to determine the time saved with Souvenaid treatment. At 24 months, disease progression was delayed by 9 months using a combination estimate of time savings including a 5-item composite Neuropsychological Test Battery (NTB), the CDR-SB and hippocampal atrophy (43).
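Why pooling time-saved estimates reduces variability can be shown with a minimal sketch. It uses inverse-variance weighting and assumes that the estimates are independent, which is a strong simplification for outcomes measured in the same patients and is not necessarily the method used in the cited analyses. The two point estimates are those reported above for the iADRS and CDR-SB, while the standard errors are hypothetical.

```python
def combine_time_saved(estimates, standard_errors):
    """Pool time-saved estimates by inverse-variance weighting.

    Assumes independent estimates (a simplification). Returns the pooled
    estimate and its standard error, both in the input time unit.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Point estimates (months) from the text; standard errors are hypothetical.
pooled, pooled_se = combine_time_saved([5.3, 5.2], [2.0, 2.0])
print(round(pooled, 2), round(pooled_se, 2))
```

Under these assumptions the pooled standard error is smaller than either input standard error, which is the sense in which combining outcomes on a common time scale can sharpen the estimate.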

Providing information on treatment effects in terms of time saved may be easier for patients and care partners to understand and may be more clinically interpretable to clinicians, than the difference in point change on a COA. Furthermore, this approach can facilitate the comparison of results between trials in which different outcome measures are employed because time becomes a common metric in these analyses.

Does amyloid-removal provide disease-modifying benefit?

The previous discussion of establishing clinical meaningfulness focuses on the treatment effects at the end of a trial (i.e., within a prespecified, relatively short time interval). However, if an intervention alters disease pathology and thereby changes the disease trajectory, a different approach may be appropriate. When treatments alter underlying pathophysiology, treatment benefits may be expected to continue beyond the termination of therapy, and continued treatment may enlarge the placebo-treatment difference. For instance, if an 18-month trial demonstrates a 5-month time-saving benefit, extending the treatment for an additional 18 months may be expected to yield a total time-saving of 10 months.

What is the evidence that amyloid-removal alters the course of AD? While formal delayed-start analyses are not available for lecanemab and donanemab, modeling of trial results (45) and post-trial follow-up data (46) support a disease-modifying effect. Emerging evidence of treatment effects on biomarkers presumably downstream from amyloid accumulation (e.g., plasma neurofilament light chain (NfL), Glial Fibrillary Acidic Protein (GFAP), phospho-tau (p-tau) species (e.g., p-tau181 and p-tau217), and neurofibrillary tangle burden as measured by Positron Emission Tomography) supports the notion of altered pathobiology (47). The biological changes persist after completion of the treatment course (46, 48, 49). The evidence seems sufficient to justify consideration of disease-slowing benefits beyond the documented 18-month drug-placebo differences in pivotal trials when assessing the clinical meaningfulness of treatment.

Conclusions

The EU-US CTAD Task Force identified several general principles that should be considered when evaluating the meaningfulness of clinical trial results (summarized in Panel 2).

Panel 2 Key points about Clinical Meaningfulness in Alzheimer’s Disease Clinical Trials

The most commonly used primary outcomes (e.g., CDR-SB and iADRS) capture concepts identified as meaningful for patients and care partners and reflect the core manifestations of disease (e.g., memory, autonomy). More broadly, RCTs employ secondary and exploratory outcomes that capture additional important aspects that matter to patients beyond cognition and autonomy in daily life activities, such as behavioral symptoms, caregiver burden, and quality of life. Secondary and exploratory outcomes support the relevance and meaningfulness of observed effects on the primary outcomes. Observer-reported outcomes are widely used in AD to represent the patient’s perspective. While these outcomes are crucial in trials of early AD, where the reliability of self-report may be questioned due to the characteristic cognitive impairment and potential loss of insight over time, implementing patient-reported outcomes in the earlier stages of disease could enable a more accurate representation of patients’ perspectives.

Three phase III anti-amyloid RCTs have demonstrated statistically significant differences in mean change from baseline to end of study between treatment and placebo groups. The consistent findings across multiple RCTs, considering different agents, enhance the robustness of these results. The clinical meaningfulness of emerging therapies has been the topic of debate and discussion. Importantly, multiple approaches to evaluate the clinical meaningfulness of treatment benefit are available and should be considered. For example, a growing number of studies are now attempting to establish consensus thresholds for meaningful within-person change on outcomes widely used in AD clinical trials to support within-patient analyses of trial data. These thresholds may be useful to support the interpretation of clinical trial results by informing progressor and/or time-to-event analyses but are not fit-for-purpose for application to between-group differences in change from baseline. Using such thresholds to determine the meaningfulness of these between-group differences, as has been widely cited in the literature, may set unrealistic expectations for progressive diseases whereby emerging DMTs aim to delay or slow progression (rather than demonstrating improvement), and can lead to erroneous conclusions. Additional promising approaches to describe clinical trial results in a more tangible way for patients and their care partners (e.g., time-saved approach) are emerging and should be further explored. It is important to highlight that data on clinical meaningfulness derived from RCT analyses may not be automatically applicable to the populations that will receive these drugs in real-world clinical settings. Indeed, RCTs involve patients selected on the basis of very specific criteria and may not be fully representative of the broader population that will be treated. Continued efforts to assess the clinical meaningfulness of DMTs in clinical practice (phase IV of drug development) will be necessary and will provide data on clinical meaningfulness that double-blind RCTs may not fully capture. Finally, in view of a mechanism of action that targets primary disease pathology, evidence of downstream biomarker changes and post-trial clinical observations support the need to consider long-term effects on disease trajectory in weighing clinical meaningfulness of therapy.