Introduction

Randomized controlled trials (RCTs) are designed to assess drug efficacy and safety, but they tend to enroll healthier, younger individuals [1,2,3]. Consequently, the generalizability of RCT findings to the broader population may be questionable [1, 3]. Real-world evidence (RWE) can be generated using post-approval observational data from patient registries, medical records, or claims data; some argue that pragmatic RCTs also generate RWE. RWE plays a crucial role in understanding the effectiveness and safety of medications in populations not well represented in clinical trials [4,5,6], and it informs clinical guidelines, prescribing practices, and boxed warnings. However, without the benefit of randomization, observational RWE studies are subject to confounding [4,5,6].

Channeling bias is one source of confounding. It arises when measured and unmeasured prognostic factors “channel” a patient toward a certain treatment, particularly when both newly approved and established treatments exist for the same clinical indication [6,7,8]. Newer medications may be prescribed to persons with complicated or refractory disease, those who experienced adverse effects of prior treatments, individuals with greater healthcare utilization, or persons with higher incomes [6,7,8,9,10]. These differences in prescribing may attenuate with time [8, 11, 12], but if not addressed, they can lead to biased estimates of, and incorrect conclusions about, medication effectiveness and safety [6,7,8]. This is particularly important when a drug is studied shortly after market entry, because the findings can shape how the new drug is used.

A propensity score is the conditional probability of receiving a medication given measured covariates [13, 14]. Propensity scores are estimated from measured characteristics and then accounted for through adjustment, stratification, standardization, weighting, or matching [13,14,15]. They are a popular means to control confounding, including confounding due to channeling bias [6], and if time since market entry of a drug is considered when modeling the propensity score, it may also be possible to address channeling bias. Non-overlapping propensity scores indicate that individuals receiving one treatment are not comparable to those receiving the other treatment [8, 12, 15,16,17]. In a common practice known as “trimming,” researchers remove individuals with non-overlapping propensity scores from the final analytical dataset [8, 12, 15,16,17,18,19]. Although trimming reduces bias and improves the internal validity of RWE studies, it can reduce statistical precision and limit generalizability [12, 14,15,16,17,18,19].
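To make the trimming rule concrete, the sketch below flags individuals inside the region of common support. It assumes that “non-overlap” means a propensity score below the larger of the two group minima or above the smaller of the two group maxima. This is one common definition, and other trimming rules exist. The function names and toy scores are hypothetical; this is an illustration, not the authors' SAS code.

```python
from typing import List, Sequence


def trim_nonoverlap(ps: Sequence[float], treated: Sequence[bool]) -> List[bool]:
    """Return a keep-flag per individual: True if inside the common-support region.

    Assumed definition: the region runs from the larger of the two group minima
    to the smaller of the two group maxima (bounds inclusive).
    """
    ps_treated = [p for p, t in zip(ps, treated) if t]
    ps_comparator = [p for p, t in zip(ps, treated) if not t]
    lower = max(min(ps_treated), min(ps_comparator))
    upper = min(max(ps_treated), max(ps_comparator))
    return [lower <= p <= upper for p in ps]


def proportion_trimmed(keep: Sequence[bool], treated: Sequence[bool], group: bool) -> float:
    """Share of one treatment group (treated=True or comparator=False) that is trimmed."""
    flags = [k for k, t in zip(keep, treated) if t == group]
    return sum(1 for k in flags if not k) / len(flags)


# Toy example: four comparator (established drug) and three treated (newer drug) users
ps = [0.05, 0.10, 0.30, 0.40, 0.25, 0.35, 0.60]
treated = [False, False, False, False, True, True, True]
keep = trim_nonoverlap(ps, treated)
print(keep)  # [False, False, True, True, True, True, False]
print(proportion_trimmed(keep, treated, False))  # 0.5
```

In this toy case, the established-drug group loses its lowest scores and the newer-drug group loses its highest score. Reporting the trimmed proportion separately by group makes that asymmetry visible.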

Emerging treatments for many neurological disorders mean that many RWE studies of effectiveness and safety, which may be affected by channeling bias, will soon be submitted for peer review and publication. If so, an unintended consequence may be the publication of studies with unrecognized limitations. In response, the aims of this study were (1) to examine channeling by measuring differences between individuals prescribed more recently approved versus established medications for three common neurological disorders and (2) to assess, using propensity score distributions and overlap, how treatment group comparability can vary over time. Our hypotheses were that (1) individuals who received recently approved, as compared with established, neuropsychiatric medications would have hallmarks of complicated or refractory disorders, and (2) the measured characteristics of those prescribed more recently approved and established medications would become more similar over time.

Methods

Study Approval

The Office of Regulatory Affairs of the University of Pennsylvania (Philadelphia, PA) granted Institutional Review Board exemption for this study.

Data Source

We used Optum’s de-identified Clinformatics® Data Mart Database data from January 2005 through June 2019. Optum data contains demographic, enrollment, healthcare claims, and pharmacy claims data for commercially insured and Medicare Advantage individuals [20, 21]. Enrollees are representative of the commercially insured population in the United States (US) [20].

Neurological Disorders and Medication Pairs

To explore differences in prescribing newer versus established medications, we selected three common neurological disorders, each with a recently approved and an established treatment option that would likely be compared in an RWE study: diabetic peripheral neuropathy, Parkinson disease (PD) psychosis, and epilepsy. To be included in the analysis, individuals had to be new users of either the more recently approved or the established treatment for a given indication [22]. To be a “new user,” an individual needed continuous insurance coverage during a 6-month look-back period and no prescription fills for the respective medication during that period [22]. To avoid trial prescriptions, we also required an initial dispensing of at least a 14-day supply of the designated medication. We identified medications using the National Drug Codes (NDCs) for each active ingredient, obtained from Cerner Multum Lexicon Plus (Denver, CO), a regularly updated database of NDCs for all medications approved in the US [23].
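The new-user criteria above can be expressed as a small rule over one person's fills and enrollment periods. The sketch below is hypothetical. The `Fill` structure, the 183-day approximation of 6 months, and the treatment of enrollment as inclusive date intervals are all assumptions, not details taken from the study's SAS code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Assumptions (not from the source): the 6-month look-back is approximated as
# 183 days, and enrollment is a list of inclusive (start, end) coverage intervals.
LOOKBACK_DAYS = 183
MIN_DAYS_SUPPLY = 14  # minimum initial supply used to exclude trial prescriptions


@dataclass
class Fill:
    """One pharmacy claim for the medication of interest (hypothetical structure)."""
    fill_date: date
    days_supply: int


def covers(intervals: List[Tuple[date, date]], start: date, end: date) -> bool:
    """True if the union of inclusive intervals spans every day in [start, end]."""
    current = start  # earliest day not yet known to be covered
    for begin, finish in sorted(intervals):
        if begin > current:
            return False  # gap: later intervals start even later
        if finish >= current:
            current = finish + timedelta(days=1)
        if current > end:
            return True
    return current > end


def new_user_index_date(fills: List[Fill],
                        enrollment: List[Tuple[date, date]]) -> Optional[date]:
    """Return the earliest fill date meeting the new-user criteria, or None."""
    for fill in sorted(fills, key=lambda f: f.fill_date):
        window_start = fill.fill_date - timedelta(days=LOOKBACK_DAYS)
        # 1) No fills of this medication during the look-back window
        if any(window_start <= f.fill_date < fill.fill_date for f in fills):
            continue
        # 2) Initial dispensing of at least a 14-day supply
        if fill.days_supply < MIN_DAYS_SUPPLY:
            continue
        # 3) Continuous coverage over the whole look-back window
        if not covers(enrollment, window_start, fill.fill_date - timedelta(days=1)):
            continue
        return fill.fill_date
    return None


enrollment = [(date(2017, 1, 1), date(2018, 12, 31))]
print(new_user_index_date([Fill(date(2017, 9, 1), 30)], enrollment))  # 2017-09-01
```

Under this rule, a 7-day trial fill followed shortly by a 30-day fill yields no index date, because the second fill has a prior fill in its look-back window.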

Diabetic Peripheral Neuropathy

We examined adult (≥ 18 years of age) new users of either gabapentin (1993 FDA approval) or pregabalin (2004 FDA approval). Although gabapentin is not FDA-approved for diabetic peripheral neuropathy, it is used as a first-line treatment for this indication [24]. We excluded individuals if their new user date occurred before both medications were available on the market. Because pregabalin was approved on December 30, 2004, we excluded individuals with new user dates prior to January 2005.

Diabetic peripheral neuropathy diagnoses were identified in the 2 years prior to the initial medication dispensing date using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) code 357.2 (diabetic polyneuropathy) [25], chosen based on expert knowledge, and the corresponding 10th Revision (ICD-10-CM) codes E08.42, E09.42, E10.42, E11.42, or E13.42, obtained via the general equivalence forward mappings from the US Centers for Medicare & Medicaid Services (CMS) [26].

Parkinson Disease Psychosis

For this indication, we restricted to new users who started either quetiapine (1997 FDA approval) or pimavanserin (2016 FDA approval), the two preferred antipsychotics for treating PD psychosis, from May 2016 onward [27]. We selected quetiapine as the established comparator instead of clozapine, as a previous study among individuals with PD in this dataset showed very limited clozapine use [28]. Because there are no specific diagnostic codes for PD psychosis, we required both a claim for PD (ICD-9-CM 332.0 or ICD-10-CM G20 by a physician or advanced practice provider) plus an antiparkinsonian medication (eTable 1) within 6 months of the claim (positive predictive value = 86.4%) [29]. Individuals must have met both criteria in the 2 years prior to the new user date to be considered as new users of antipsychotics for PD psychosis treatment. We further excluded those < 40 years of age, as PD is extremely rare before the fourth decade [30].

Epilepsy

Lastly, starting in 2016, we examined adult (≥ 18 years of age) new users of levetiracetam (1999 FDA approval) and brivaracetam (2016 FDA approval), two broad-spectrum epilepsy medications [31]. We used the previously validated approach detailed by Moura et al. (2017) with a positive predictive value of 84% to identify epilepsy (ICD-9-CM codes, 345, 333.2, 779, 780.2, 780.3; ICD-10-CM codes, G40, G41, R56) in the 2 years prior to the new user date [32].

Covariates

The following sociodemographic factors were measured at the time of drug dispensing: age, sex, race, region of residence (Northeast, Midwest, South, West), insurance product type (health maintenance organization, preferred provider organization, point of service, other), zip code-based income level (< $40 thousand [K], $40–49 K, $50–59 K, $60–74 K, $75–100 K, ≥ $100 K, unknown/missing), and zip code-based education level (less than bachelor’s degree, bachelor’s degree or greater, unknown/missing). Sparse missing values and small “other” categories (sex [≤ 0.04% for a given drug], region [≤ 0.47% for a given drug]) were reassigned to the modal category.

All other covariates were assessed over the 6 months prior to initial dispensing. The presence of individual comorbid medical conditions (depression, anxiety, bipolar disorder, psychotic disorders, epilepsy, migraine, dementia, hypertension, diabetes, congestive heart failure, atrial fibrillation, chronic kidney disease, end-stage renal disease/hemodialysis, stroke/transient ischemic attack, and chronic obstructive pulmonary disease) was determined by any claim with a previously validated ICD-9-CM or ICD-10-CM diagnosis code from the CMS Chronic Conditions Warehouse [33]. We also calculated a combined Charlson-Elixhauser comorbidity index score [34]. Prescription utilization was measured as the monthly average number of distinct medications filled; healthcare utilization intensity was captured by the presence (yes/no) of an inpatient admission, outpatient visit, emergency department visit, or neurologist visit. For each drug pair, we also examined previous or concurrent use of another medication for the same clinical indication as a proxy for complicated/refractory disease or possible adverse effects from prior treatment. Specifically, we considered other nerve pain medications (e.g., tricyclic antidepressants, duloxetine, or opioid analgesics) for diabetic peripheral neuropathy, other atypical antipsychotics for PD psychosis, and other antiepileptic medications for epilepsy (eTable 1).

Statistical Analysis

Analyses were conducted in SAS v9.4 (Cary, NC). For each indication, we used bivariate analysis to examine differences in sociodemographics, clinical characteristics, and prescription/healthcare utilization intensity between new users of the more recently approved and the established medication. We considered a difference meaningful when the absolute standardized difference was ≥ 0.10; unlike significance tests, standardized differences do not depend on sample size [35, 36].
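Standardized differences can be computed from group summaries with the commonly used pooled-standard-deviation formulas sketched below. For a continuous variable, the difference in means is divided by sqrt((s_t² + s_c²)/2). For a binary variable, the same form is used with Bernoulli variances p(1 − p). Multi-category variables such as income level require a multivariate generalization, which is not shown. The inputs are the rounded summary statistics reported for the diabetic peripheral neuropathy comparison, so the output is approximate and may differ slightly from tabulated values.

```python
import math


def stddiff_continuous(mean_t: float, sd_t: float, mean_c: float, sd_c: float) -> float:
    """Difference in means divided by the pooled SD, sqrt((sd_t^2 + sd_c^2) / 2)."""
    return (mean_t - mean_c) / math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)


def stddiff_binary(p_t: float, p_c: float) -> float:
    """Difference in proportions (0-1 scale) over the pooled Bernoulli SD."""
    return (p_t - p_c) / math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)


def is_meaningful(d: float, threshold: float = 0.10) -> bool:
    """Apply the |stddiff| >= 0.10 rule."""
    return abs(d) >= threshold


# Reported summaries for pregabalin (t) vs gabapentin (c) new users
d_age = stddiff_continuous(65.5, 11.6, 68.4, 11.1)
d_prior = stddiff_binary(0.739, 0.387)  # prior neuropathic pain drug
print(round(d_age, 2), round(d_prior, 2))  # -0.26 0.76
```

Both differences exceed the 0.10 threshold. The sign only records which group has the higher value.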

We fit a propensity score model for each calendar year using logistic regression that included all a priori identified covariates. The indication for each drug pair (e.g., epilepsy for levetiracetam and brivaracetam) was not included as a comorbidity covariate in that pair's models. A few rarely occurring covariates were omitted from certain models to allow model convergence: end-stage renal disease/hemodialysis (PD psychosis, epilepsy), migraine (PD psychosis), and congestive heart failure (epilepsy). We plotted the propensity score distributions and determined the amount of non-overlap in the propensity scores (i.e., individuals who would ordinarily be trimmed) for each drug pair and year [8, 11, 12]. We examined how the amount of non-overlap changed over time and used a Cochran-Armitage test for trend to examine whether the proportion of individuals trimmed differed over time.
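The Cochran-Armitage trend test can be sketched as follows. The sketch assumes ordered groups (calendar years) scored 0, 1, 2, ... and the usual normal approximation without continuity correction. The counts in the example are invented for illustration and are not study data. In SAS, a test of this kind is available through the TREND option of PROC FREQ; this sketch is not claimed to reproduce that procedure's output in every detail.

```python
import math
from typing import Optional, Sequence, Tuple


def cochran_armitage(trimmed: Sequence[int], totals: Sequence[int],
                     scores: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Cochran-Armitage test for trend in binomial proportions across ordered groups.

    trimmed[i] / totals[i] is the proportion trimmed in group i (e.g., calendar year i).
    Returns (z statistic, two-sided p-value) under the normal approximation.
    """
    if scores is None:
        scores = range(len(totals))  # equally spaced scores: first year -> 0, next -> 1, ...
    n = sum(totals)
    p_bar = sum(trimmed) / n                                    # pooled proportion trimmed
    x_bar = sum(t * x for t, x in zip(totals, scores)) / n      # weighted mean score
    # Numerator: sum_i r_i * (x_i - x_bar)
    t_stat = sum(r * (x - x_bar) for r, x in zip(trimmed, scores))
    # Null variance: p_bar * (1 - p_bar) * sum_i n_i * (x_i - x_bar)^2
    variance = p_bar * (1 - p_bar) * sum(t * (x - x_bar) ** 2 for t, x in zip(totals, scores))
    if variance == 0:
        raise ValueError("trend statistic is undefined (no variation)")
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical counts (not study data): individuals trimmed / total, by calendar year
z, p = cochran_armitage([30, 20, 10], [100, 100, 100])
print(round(z, 2), p < 0.01)  # -3.54 True
```

A negative z indicates that the proportion trimmed decreases across later years. Non-monotone patterns, such as a dip followed by a rebound, can still yield a significant linear trend, so plotting the yearly proportions remains informative.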

To examine the impact of non-overlap and trimming on generalizability, we examined the drug pair with the most substantial non-overlap, levetiracetam-brivaracetam. We created a propensity score model pooling all years of data, with calendar year included as a covariate. After trimming, we re-fit the model [19], sorted propensity scores in descending order, and conducted 1:1 greedy matching without replacement with a caliper of 0.2 times the standard deviation of the logit of the propensity score [14]. The resulting matched sample is the one that would be used in a hypothetical comparative effectiveness or safety study. We performed a bivariate analysis of the matched sample using standardized differences to examine both prognostic balance and whether this trimmed sample differed from the larger, untrimmed sample of new users.
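The matching step can be sketched in Python as below. The sketch assumes the caliper is 0.2 times the pooled standard deviation of the logit propensity score; the source states only “the standard deviation of the logit of the propensity scores.” It also assumes that treated (brivaracetam) users are processed in descending propensity score order and that each takes the nearest still-unmatched comparator. Function names and inputs are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def greedy_caliper_match(ps_treated: Sequence[float], ps_comparator: Sequence[float],
                         caliper_multiplier: float = 0.2) -> List[Tuple[int, int]]:
    """1:1 greedy nearest-neighbor matching without replacement on the logit PS.

    Treated individuals are processed in descending propensity score order; each
    takes the closest still-unmatched comparator if it lies within the caliper.
    Assumption: caliper = multiplier x pooled SD of the logit PS of the two groups.
    Returns (treated index, comparator index) pairs.
    """
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_comparator]
    pooled_sd = math.sqrt((statistics.variance(lt) + statistics.variance(lc)) / 2)
    caliper = caliper_multiplier * pooled_sd

    available = set(range(len(lc)))
    pairs = []
    for i in sorted(range(len(lt)), key=lambda k: ps_treated[k], reverse=True):
        # Closest unmatched comparator; sorted() makes tie-breaking deterministic
        j = min(sorted(available), key=lambda c: abs(lt[i] - lc[c]), default=None)
        if j is not None and abs(lt[i] - lc[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # without replacement
    return pairs


print(greedy_caliper_match([0.30, 0.50], [0.25, 0.31, 0.90, 0.10]))  # [(0, 1)]
```

In the example, the treated individual with a score of 0.50 is left unmatched because no comparator falls within the caliper. This is how greedy caliper matching can shrink the analytic sample beyond what trimming alone removes. Greedy results also depend on processing order, which is why the sort order is reported.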

Results

Diabetic Peripheral Neuropathy

For the diabetic peripheral neuropathy comparison, there were 17,548 and 43,935 new users of pregabalin and gabapentin, respectively, who met study criteria. Table 1 displays sociodemographic and clinical characteristics of pregabalin and gabapentin users pooled across all years. New users of pregabalin were slightly younger (mean age = 65.5 [SD = 11.6]) than gabapentin new users (mean age = 68.4 [SD = 11.1]). In the prior 6 months, pregabalin users also had more unique prescription fills per month (5.6 [SD = 3.0] vs. 4.7 [SD = 2.7]) and more often had at least one outpatient visit (84.2% vs. 77.8%), neurologist visit (20.8% vs. 14.9%), and prior neuropathic pain drug prescription (73.9% vs. 38.7%), as compared to their gabapentin counterparts. Uptake of both drugs increased with time.

Table 1 Characteristics of persons with diabetic peripheral neuropathy who were new users of pregabalin or gabapentin (2005–2019; n = 61,483)

As shown in eFig. 1, propensity score overlap was good for this drug pair. In 2005, 15.6% of the gabapentin group and 3.1% of the pregabalin group (overall = 12.4%) were trimmed due to propensity score non-overlap. The proportion trimmed differed by year (Cochran-Armitage p-value < 0.01), with minimal trimming after the first year pregabalin became available.

Parkinson Disease Psychosis

Between 2016 and 2019, 4081 individuals with PD met study criteria (pimavanserin n = 961, quetiapine n = 3120). Meaningful differences between pimavanserin and quetiapine users were present with respect to comorbidities and healthcare utilization (Table 2). Documentation of a psychotic disorder was more common among pimavanserin users (21.2% versus 13.4%). Pimavanserin users less frequently had a recent history of inpatient admission or emergency department visit; however, they more often had neurologist care (Table 2). In total, 41.1% of pimavanserin users had taken a prior atypical antipsychotic, compared to only 14.0% of quetiapine users. Pimavanserin users were more often in the highest income brackets (stddiff = 0.130).

Table 2 Characteristics of persons with Parkinson disease who were new users of pimavanserin or quetiapine (2016–2019; n = 4081)

eFigure 2 displays the propensity score distributions for pimavanserin and quetiapine users for 2016 through 2019. The greatest amount of trimming due to non-overlap was in the first year (quetiapine = 7.2%, pimavanserin = 2.0%; overall = 6.1%). The proportion trimmed differed by year (Cochran-Armitage test p < 0.01).

Epilepsy

There were 602 new users of brivaracetam and 28,491 new users of levetiracetam between 2016 and 2019. Brivaracetam users were younger (mean = 48.0 [SD = 17.7]) than levetiracetam users (mean = 63.1 [SD = 18.2]) and more often female (59.8% versus 53.3%) (Table 3). Individuals in the brivaracetam group were also more often White (61.3% versus 56.2%), in a point of service (POS) health plan (35.4% versus 19.8%), and from a higher income bracket. The levetiracetam group had a higher Charlson-Elixhauser comorbidity score (mean = 3.9 [SD = 3.8]) as compared to the brivaracetam group (mean = 1.5 [SD = 2.3]). New users of brivaracetam were more likely to have an outpatient visit but less likely to have an inpatient admission, emergency department visit, or neurologist care. Brivaracetam new users had more medication fills; almost all (93.4%) of the brivaracetam new users had previously used an antiepileptic drug, as compared to 32.1% of levetiracetam new users.

Table 3 Characteristics of persons with epilepsy who were new users of brivaracetam or levetiracetam (2016–2019; full sample n = 29,093; restricted sample n = 1198)

Figure 1 displays extensive non-overlap between these two antiepileptic medications; the proportion trimmed differed by year (Cochran-Armitage p-value < 0.01). In 2016, 43.6% of levetiracetam users were subject to trimming due to non-overlap; this proportion was 29.9% in 2017, 11.6% in 2018, and 22.4% in 2019. For brivaracetam, the corresponding proportions were 2.9%, 2.2%, 0.5%, and 0.9%. The overall proportion trimmed in 2016 was 43.2%.

Fig. 1 Propensity score overlap for apparent new users of brivaracetam or levetiracetam, by year

After trimming, re-fitting the propensity score model, and matching, the analytic sample for the epilepsy pair included 599 brivaracetam new users and 599 levetiracetam new users (mean propensity score = 0.17 [SD = 0.14] for both groups; stddiff = 0.001). Among the variables included in the propensity score model, income remained imbalanced (stddiff = 0.111) (Table 3). Almost all new users in the analytic sample had prior antiepileptic drug use (brivaracetam = 93.3%, levetiracetam = 94.2%), and their characteristics resembled those of the brivaracetam group in the original sample (e.g., younger age, lower comorbidity score, fewer acute care visits).

Discussion

Several treatments have recently been developed for both common (e.g., multiple sclerosis, epilepsy, migraine) [37,38,39,40] and rare (e.g., Duchenne muscular dystrophy, spinal muscular atrophy) neurological disorders [41, 42]. RWE studies comparing newer versus established treatments are surely forthcoming and are essential for informing clinical care and providing the evidence base for clinical guidelines. However, researchers conducting such studies should consider how differential prescribing patterns may affect the study design, approach, and analysis.

The first aim of this study was to examine differential prescribing patterns for three more recently approved neuropsychiatric drugs (pregabalin, pimavanserin, and brivaracetam) compared with their respective established treatments. Pregabalin, pimavanserin, and brivaracetam were prescribed to patients whose sociodemographic, clinical, and healthcare utilization characteristics differed from those of users of the established drugs for the same clinical indication. Across all three pairs, the most pronounced difference was that users of the more recently approved drugs were more likely to have a prior prescription for a medication for the same indication. Consistent with our hypothesis, prior or concurrent medication use in particular suggests that these more recently approved medications were channeled to individuals with complicated or refractory disease, or to those who experienced adverse effects with the standard medication. This finding could also reflect formulary restrictions and step therapy requirements. These findings have implications for RWE study design. For all three drug pairs, RWE studies using an incident new-user design (i.e., including only treatment-naïve individuals) would exclude the majority of actual new users of pregabalin, pimavanserin, and brivaracetam. These drugs entered the market much later than their respective comparators, so a large proportion of previous users of these comparators could have switched to the new treatment. Therefore, a prevalent new-user design (i.e., one that also includes individuals who switched medications) [43] is likely to produce more generalizable RWE for neuropsychiatric drugs that also reflects clinical practice. An incident new-user design in an observational cohort study, while straightforward, works best only when evaluating users who are naïve to both treatments. Such a design is most useful when comparing two drugs with a similar clinical indication that enter the market at the same time, or when the goal is to exclude persons who switch medications [43].

Although differences in prescribing varied by drug pair, younger age, higher socioeconomic status, and greater specialty care have been consistently associated with receipt of newer drugs [9]. Pimavanserin new users were less likely to have several comorbidities but more likely to be diagnosed with psychotic disorders. This difference likely reflects stricter prescribing requirements for pimavanserin, which is available only through specialty pharmacies. Surprisingly, despite RCT data suggesting that brivaracetam is associated with a reduced risk of depression and anxiety [44], individuals with these psychiatric disorders were not channeled to brivaracetam.

The second aim of this study was to examine shifts in prescribing over time. As hypothesized, propensity score overlap improved over time for all three drug pairs, although not in a linear fashion; the proportion of non-overlap could fluctuate unpredictably during the first few years after the newer drug became available. For the diabetic peripheral neuropathy and PD psychosis pairs, non-overlap was minimal after the first year both drugs were available. In contrast, non-overlap was pronounced and persistent for the comparison of brivaracetam and levetiracetam. For this drug pair, individuals in the sample created using standard propensity score trimming and matching overwhelmingly had prior antiepileptic drug use and were substantially younger than the full sample. These characteristics of the restricted, trimmed sample suggest that an internally valid RWE study of brivaracetam versus levetiracetam will necessarily have limited generalizability and sample size [8, 12, 14,15,16,17,18,19].

In the absence of existing guidelines, and given these data and the need to maximize the generalizability of comparative effectiveness and safety studies of new neuropsychiatric drugs, we suggest two steps. First, researchers should postpone RWE studies of more recently approved medications until it can be demonstrated that such drugs have been available long enough to achieve comparable prescribing [8, 11, 12]. Second, they should consider sequential cohort studies, in which RWE studies are conducted iteratively over time, to understand the comparative safety and effectiveness of drugs as prescribing shifts [8, 11]. When studies comparing newer and established treatments are critically needed as soon as new treatments enter the market, researchers should recognize the potential for channeling bias and implement appropriate methods to address it. Propensity score analysis is a practical approach for evaluating and minimizing potential channeling bias [6, 11]. For example, researchers conducting RWE studies should display baseline differences between groups, report the proportion trimmed, and explore time trends in propensity score overlap; the last of these is particularly important when studying newly approved medications. Researchers may also consider alternatives to propensity score matching plus trimming, such as overlap weighting [15, 45]. Overlap weighting assigns the greatest weight to individuals with propensity scores near 0.5 and lower weights to those with extreme propensity scores, eliminating the need to trim [45]. Regardless of approach, comparative analyses reported without the additional data elements suggested above are not fully interpretable. Although few studies have addressed the potential for channeling bias in neurology, this issue has been recognized or addressed in comparative studies of diabetes [11], cardiovascular [46], pain [47], and chronic obstructive pulmonary disease [48] treatments.
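Overlap weighting can be sketched in a few lines. The sketch assumes the standard definition, in which each treated individual (newer drug) receives weight 1 − e and each comparator receives weight e, where e is the estimated propensity score. Under this scheme, the implied target population is weighted by e(1 − e). That function peaks at 0.5 and falls toward zero at the extremes, so no separate trimming step is needed. The helper names and toy data are hypothetical.

```python
from typing import List, Sequence


def overlap_weights(ps: Sequence[float], treated: Sequence[bool]) -> List[float]:
    """Overlap weights: 1 - e for treated individuals, e for comparators."""
    return [1 - e if t else e for e, t in zip(ps, treated)]


def weighted_group_mean(values: Sequence[float], weights: Sequence[float],
                        treated: Sequence[bool], group: bool) -> float:
    """Weighted mean of `values` within one treatment group."""
    num = sum(v * w for v, w, t in zip(values, weights, treated) if t == group)
    den = sum(w for w, t in zip(weights, treated) if t == group)
    return num / den


# Toy data: propensity scores, treatment flags, and one covariate (age)
ps = [0.2, 0.5, 0.9, 0.4]
treated = [True, False, True, False]
ages = [40.0, 55.0, 70.0, 60.0]
w = overlap_weights(ps, treated)
print([round(x, 2) for x in w])  # [0.8, 0.5, 0.1, 0.4]
print(round(weighted_group_mean(ages, w, treated, True), 2))  # 43.33
```

When e comes from a logistic regression model, overlap weighting is known to give exact balance in the weighted means of every covariate in that model. The arbitrary toy scores here do not have that property, so the weighted group means differ.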

A major strength of this study is that the use of a large administrative dataset of commercially insured individuals provided an adequate sample size to identify differential prescribing, overall and over time, for several common neurological disease treatments in the US, raising awareness of the potential for channeling bias in neurology studies. However, our use of medical claims data limited our ability to examine health plan, provider, or patient characteristics that may function as channeling factors, such as formularies, disease severity, differing drug effectiveness, and physician/patient preferences [49]. A priori selection of propensity score covariates, although common in clinical research, may result in covariate misspecification [14, 15, 18]. For this reason, algorithmic covariate selection via the high-dimensional propensity score (hdPS) method is increasingly preferred in pharmacoepidemiology research [50]. Our trimming approach focused on areas of non-overlap. Other trimming approaches remove individuals with extreme propensity scores, who may have been treated contrary to prediction, or apply asymmetric trimming with the goal of targeting unmeasured confounding [15, 17,18,19, 45]. These approaches would have increased the proportion trimmed for all pairs. Although we used a 6-month baseline period to identify new users of the medications of interest, some individuals may have used these medications before this baseline period; hence, they were only “apparent” new users. Additionally, we assessed some chronic diseases in the 2 years prior to the new user date but did not require continuous insurance coverage during that period. Some diagnosis codes could therefore have been missed due to gaps in coverage, resulting in underestimation of the proportions of certain diseases. Moreover, using antipsychotic prescriptions as a proxy for PD psychosis is not ideal; however, a previous study showed that hallucinations were underestimated using diagnosis claims in Medicare data [51]. Lastly, our data include only individuals with commercial or Medicare Advantage insurance in the US, who are not representative of the entire US population, and studies in younger populations might yield different results.

Conclusions

Physicians, patients, insurers, and other decision-makers constantly seek to evaluate the safety and the clinical and economic benefits of newer treatments for neurological conditions, particularly those that may have advantages over existing drugs. This constant evaluation, combined with the surge in newly available and forthcoming therapies for neurological diseases, will likely lead to an increase in observational studies that assess the clinical and economic benefits of new drug therapies, even though these types of data are prone to the effects of differential prescribing. Confounding due to channeling bias is critical to understand because it can lead to biased conclusions in RWE studies. We demonstrated evidence of channeling in the prescribing of new treatments for neurological disorders. We also showed that RWE studies of medication effectiveness and safety in neurology may have limited generalizability if initiated too soon after a new medication is marketed. Additionally, researchers may want to consider prevalent new-user designs if these new medications are frequently prescribed to those with prior treatment. Overall, researchers should report the methodological approaches used to explore and address these issues.