Key Points for Decision Makers

A growing body of research is investigating how rheumatoid arthritis treatments such as biologics may be targeted to those who would benefit the most.

Despite research and development being directed towards prescribing algorithms to target medicines such as biologics, whether patients or potential patients are receptive to these new approaches and what drives their preferences remain unknown.

On average, patients and members of the public preferred the stratified approach.

1 Introduction

Although there have been developments in the treatment of rheumatoid arthritis (RA), significant heterogeneity in the rates of response to both cheap and expensive therapies remains [1]. Second-line treatment with biologics, including abatacept, tocilizumab and rituximab as well as anti-tumour necrosis factors such as etanercept, adalimumab, golimumab, certolizumab-pegol and infliximab, may also be associated with some common adverse drug reactions (ADRs) affecting 10–20 of 100 people treated. Relevant ADRs include injection-site reactions and infections [2]; whilst many reactions are minor, serious ADRs also occur and include sepsis or pneumonia [3].

The risk of ADRs, the high non-response rates and the associated delay to the start of an effective treatment mean that prescribing biologics can be harmful to some individuals [4]. As a consequence, there is considerable interest in trying to reduce these harms by developing mechanisms to target treatments, such as biologics, for RA. These targeting strategies require an understanding of clinical and demographic variables in addition to biologic factors (biomarkers) that drive the heterogeneity in treatment response. There are ongoing research programmes that aim to develop targeted (personalised medicine) approaches to the use of biologics to select treatments that improve the probability of response and minimise the risk of an ADR and thereby improve the health of people with RA, in addition to using healthcare resources effectively [1, 5].

Current research suggests that no single biomarker will predict a safe and effective response to a biologic [6]. Instead, targeting biologics in RA is likely to be achieved through multiple genetic tests and/or other factors such as proteins, transcriptomes or patient characteristics combined in an algorithm to guide prescribing (a prescribing algorithm) [7, 8]. Prescribing algorithms for targeting safe and effective healthcare are characterised by their accuracy in predicting who will and will not safely and effectively respond (positive predictive value [PPV] and negative predictive value [NPV]). The predictive value of a prescribing algorithm may be improved by incorporating more variables; however, this could involve additional tests (with an associated financial cost), which could potentially delay the start of treatment. Researchers must therefore decide when a prescribing algorithm is ‘sufficient’, that is, when the marginal benefit of adding further information is outweighed by or equal to the marginal cost of collecting and processing such information. Likewise, patients (current and future) must compare the prescribing algorithm (a personalised medicine approach) with conventional approaches to selecting treatment and dosage and balance the associated benefits and risks.
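To make the two predictive values concrete, the following minimal sketch computes PPV and NPV from a cross-tabulation of predicted versus observed response. The function name and the counts are hypothetical and were invented purely for illustration; they are not data from this study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from counts of predictions versus observed outcomes.

    tp: predicted to respond and did respond
    fp: predicted to respond but did not respond
    tn: predicted not to respond and did not respond
    fn: predicted not to respond but did respond
    """
    ppv = tp / (tp + fp)  # share of predicted responders who actually respond
    npv = tn / (tn + fn)  # share of predicted non-responders who do not respond
    return ppv, npv


# Hypothetical counts for 100 patients (illustrative assumption only).
ppv, npv = predictive_values(tp=45, fp=15, tn=30, fn=10)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these invented counts, 45 of the 60 patients predicted to respond did so, and 30 of the 40 predicted not to respond did not.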

It is currently unknown how individuals feel about the benefits and risks associated with a personalised approach to treatment in RA. Discrete choice experiments (DCEs) [9,10,11] have been used to quantify preferences for the benefit and risk trade-offs [12] associated with biologics such as the balance between frequency and mode of administration [13,14,15,16,17,18,19,20] and for personalised (or stratified) approaches to healthcare more generally [21, 22].

The primary aim of this study was to understand preferences for the benefit–risk trade-offs of a prescribing algorithm-based personalised approach to the selection and dose of biologic for the treatment of RA. A secondary aim was to estimate the acceptability of the algorithm at different predictive values to inform research into and development of stratified approaches.

2 Method

This study used an online DCE to elicit the preferences of a sample of people with RA and members of the public (potentially representing future patients) for a hypothetical prescribing algorithm to target the selection and dose of a biologic for RA (hereafter ‘biologic calculator’) or the conventional approach. The biologic calculator was presented as an option that provided the clinician with more information than is usually available when making a prescribing decision. The study was designed and reported in line with published recommendations for healthcare DCEs [23, 24]. In the experiment, respondents were asked to choose between two algorithm-based approaches (biologic calculator A and B) and an opt-out alternative of ‘conventional prescribing’. The term ‘biologic calculator’ was used as a more user-friendly term than ‘prescribing algorithm’.

2.1 Survey Design

The survey was formatted and presented online using Sawtooth software [25]. The final survey comprised three sections: training materials to explain the purpose of the survey, including a description of a biologic calculator and relevant attributes and levels; the choice questions; and questions about the respondent’s background. The final survey can be found in electronic supplementary material (ESM)-1. Approval for the study was obtained from The University of Manchester’s Research Ethics Committee (project ID: 2576).

2.2 Attributes and Levels

A preliminary set of attributes was initially identified by a literature review to understand the benefits and risks presented in other studies quantifying preferences in RA [13, 14, 16,17,18,19, 26], although none of the studies reviewed looked at preferences for personalised approaches to prescribing specifically. The final set of attributes was selected through an iterative process of consultations with clinicians and patients, including interviews with three clinical experts and meetings at five patient support groups (attended by 51 individuals in total) in England and Scotland.

The patient groups were used to ensure that the preliminary attributes identified and confirmed as clinically relevant by the experts were relevant to their treatment choices. These patient group discussions also used materials from a published qualitative study [4] exploring patient views about predictive testing to guide treatment. The qualitative study suggested patients were concerned about delays to starting treatment, the benefits and risks of tests, and accuracy of tests. These initial themes, including additional delay to starting treatment, the accuracy of the tests, the ability of the tests to predict response and risk of infection, provided a starting point to guide discussion with the patient groups and were used to bring meaning to different terms (specifics about ‘accuracy’) to develop these ideas into attributes for use in the DCE.

While collecting views about the attributes with patients, a suitable range of levels was also explored (and later verified through consultation with experts developing the stratified approaches and an appraisal of published data to ensure clinical meaningfulness).

The patient groups in this study were initially identified through the MATURA (MAximizing Therapeutic Utility in Rheumatoid Arthritis) network. The original patient group comprised individuals with RA interested in research seeking to personalise prescribing approaches. After initial discussions with this group, we contacted other patient organisations, including the National Rheumatoid Arthritis Society, and discussions were held with patients across England and Scotland.

The meetings with patient groups occurred a few weeks apart to allow for the materials to be developed based on feedback from the previous group discussion. Formal qualitative data collection did not occur, and no analysis took place. Instead, field notes were discussed within the research team to make changes to the DCE design. Issues raised included changing the term ‘personalised medicine’ as this seemed to imply that the approach was precise or could ‘perfectly’ predict response. Some participants expressed concerns about small issues not salient to the majority of patients, so these were explained in the DCE training materials rather than as attributes in the study. Examples of these included time to discuss treatment choices with the doctor and also family and what happens to data collected by the calculator (that it is securely stored with medical records and not available to other organisations). As the results were specific to patient groups, additional pre-testing was undertaken to ensure final attributes were generalisable to the DCE sample (see Sect. 2.6).

Eventually, five attributes were selected: additional delay to the start of treatment; PPV; NPV; risk of a serious infection; and cost saving to the UK National Health Service (NHS). Each attribute was assigned four levels (Table 1).

Table 1 Attributes, attribute definitions and levels

The levels for additional delay to starting treatment with the biologic calculator were allowed to vary from zero (no delay) to 30 additional days to reflect the variation in the number of tests potentially required for the biologic calculator. For example, next-generation sequencing of four or more genes takes around 30 days (the maximum) [27,28,29], whereas smoking status or body mass index can be determined on the same day in the clinic (the minimum). The opt-out (conventional approach) option assumed there was no additional delay to starting treatment.

No biologic calculator currently exists, so the plausible levels for PPV and NPV were assigned with the assistance of three experts currently involved in research programmes developing prescribing algorithms to target biologics. The opt-out (conventional prescribing) option assumed that no predictive information about the probability of a safe or effective response was available to the prescriber. The level assigned to the conventional approach was therefore defined as equivalent to a 50% chance of a correct prescribing decision.

The levels for risk of infection from a biologic were determined from a literature review; we identified that the risk of serious infection for patients with RA was between 2 and > 7.5% (for combination biologics) depending on the treatment [30]. The maximum risk of 10% was chosen for the opt-out option to reflect the possibility that the biologic calculator could predict those who were more likely to experience an infection (e.g., by looking at their age or immune profile). The attributes and levels for risk of an ADR and PPV/NPV were explained using both percentages and icon arrays.

Biologic therapies used to treat autoimmune conditions are expensive (around £4000–10,000 per patient per year [1]), so the levels would be very high if an individual had to pay for their treatment. The attribute ‘cost saving to the NHS’ allowed a monetary valuation of the prescribing approaches without the use of individual willingness to pay. The values for the cost attribute were developed after consultation with three experts on the potential savings minus the cost of the additional tests required for the biologic calculator [31].

2.3 Experimental Design

Presenting all possible combinations of attributes and levels (see Table 1) into two alternatives would result in 1,048,576 possible scenarios. A subset of these scenarios (a fractional factorial) was identified using experimental design methods (minimising the D-error) with Ngene software [32]. A design was generated to allow the estimation of ‘main effects’, which means the design allows the analysis to focus on the direct effect of each attribute rather than interactions between attribute levels. The experimental design incorporated conjectured priors to indicate the expected direction of the attribute (i.e., that cost saving would be positive, that infection risk would be negative). Detailed priors from pilot work were not incorporated in the study design to allow for replication of the study with different patient samples without changing the experimental design. The optimum number of choice sets was guided by the pilot work. The final design comprised five choice sets (see Fig. 1), and an additional choice set was added to use as a ‘dominance check’ to verify the respondents were answering in line with economic theory. Each respondent thus completed six choice sets in the survey.
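The size of the full factorial quoted above can be checked by enumeration. This short sketch assumes only what Sect. 2.2 states (five attributes, four levels each, two designed alternatives per choice set); the variable names are illustrative.

```python
from itertools import product

N_ATTRIBUTES = 5  # delay, PPV, NPV, risk of serious infection, cost saving
N_LEVELS = 4      # levels per attribute

# Every possible profile of a single biologic calculator.
profiles = list(product(range(N_LEVELS), repeat=N_ATTRIBUTES))
n_profiles = len(profiles)  # 4 ** 5

# Every ordered pairing of profiles for calculators A and B.
n_pairs = n_profiles ** 2

print(n_profiles, n_pairs)
```

The count of pairs equals 4 to the power of 10, which is why a fractional factorial design is needed to keep the number of choice sets manageable.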

Fig. 1 Example choice question. NHS UK National Health Service

2.4 Training Materials

Training materials are used to describe the background to the choice questions and provide information for respondents to enable them to make choices in the subsequent DCE. In this study, all respondents needed to understand the potential role of a prescribing algorithm for biologics, for which this study used the more layperson-friendly term ‘biologic calculator’. The biologic calculator was described in the training materials in terms of its component characteristics, including the levels of predictive values (PPV and NPV) and risk of an adverse reaction to a biologic and potential cost saving to the NHS. Animated training materials were developed after discussions with three clinical experts and programmers who implemented the online animation based at MindBytes© (https://mindbytesplatform.be/demos/stratified-biologics-RA-choices/).

The animation included a scripted storyline [33] where an avatar (called ‘Alex’), presented as a gender-neutral stick figure, described key concepts relevant to the choice task. The setting of the story was dynamic, with different backgrounds to indicate the location (‘in hospital’ or ‘at home’). The story had a simplistic structure and is described in more detail in Vass et al. [34]. Briefly, the central character (Alex) is followed from having symptoms and receiving a diagnosis of RA to visiting their doctor who explains how first-line treatments may fail and that in these cases, biologics may be tried. The story goes on to explain that the choice of biologic and the starting dose will be made by a clinician who can use a new technology (the biologic calculator) to guide their decision. The biologic calculator provides the clinician with additional information to that usually available when making a prescribing decision. The final part of the story explains the relevant attributes that describe the biologic calculator and that choosing a prescribing approach requires trade-offs of benefits and risks. The attributes were explained using graphics and visuals such as risk grids to aid understanding of probabilistic information [35].

2.5 Background Questions

Respondents were asked to complete the choice questions and then a series of background questions about themselves, including a self-reported measure of health status (EQ-5D-5L). The responses to the EQ-5D-5L were translated into a score representing current health status (where zero equates to death and one to perfect health) using a published tariff of preference weights for a population in the UK [36]. Key sociodemographic characteristics (age, sex, employment status, etc.) were also recorded. People with a diagnosis of RA were asked specific questions about their disease history, including time since diagnosis and experience of biologics.

2.6 Piloting

The DCE survey went through extensive piloting conducted in two phases to refine questions and terminology. In phase 1, a sample of respondents from a patient group (n = 7) involved in the identification of attributes and levels completed the survey independently online and provided feedback using free-text comments followed by a group discussion at a subsequent patient meeting. In phase 2, a quantitative pre-test was conducted with members of the public (n = 100) recruited via an internet panel.

2.7 Study Population and Sample

The link to the online survey was sent to a sample of the public (potential future patients) and current patients both recruited through an internet panel provider, ResearchNow® (now called Dynata®). A ‘current patient’ was defined as a person aged ≥ 18 years reporting a diagnosis of RA and residing in the UK. No restrictions were placed on date of diagnosis, disease activity or treatment experiences. In the public sample, respondents had to be aged ≥ 18 years and residing in the UK without a diagnosis of RA. Sample size calculations are a challenge in DCE studies and typically require estimates of the effect sizes that were unknown in this study [37]. A sample of ~ 150 patients and ~ 150 members of the public was chosen based on the results of another comparable study [19].

2.8 Analysis of Data

The choice data were analysed within a random utility framework [38] using a mixed logit model. The analysis aimed to quantify the relative importance of each attribute in the individual’s utility function, which was specified as in Eq. 1:

$$ U_{nj} = \beta_{\text{conv}} + \beta_{1}\,\text{Delay}_{nj} + \beta_{2}\,\text{PPV}_{nj} + \beta_{3}\,\text{NPV}_{nj} + \beta_{4}\,\text{Risk}_{nj} + \beta_{5}\,\text{Cost}_{nj} + \varepsilon_{nj}, \tag{1} $$

where \(U_{nj}\) represents the indirect utility of individual \(n\) for alternative \(j\); \(\beta_{\text{conv}}\) is an alternative-specific constant (ASC) for the opt-out (conventional) option (this ASC captures differences in the mean of the distribution of the unobserved effects in the random component, \(\varepsilon_{nj}\), between the opt-out [conventional approach] and the other alternatives [biologic calculators]); and \(\beta_{1}\) to \(\beta_{5}\) are preference weights associated with each of the five attributes in the DCE.
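As an illustration of how Eq. 1 maps attribute levels to choices, the sketch below evaluates the deterministic part of the utility for each alternative in one choice set and converts the utilities into logit choice probabilities. It assumes independent type I extreme value errors (the standard logit assumption) and fixed coefficients. All preference weights and the attribute levels for calculators A and B are hypothetical; they are not the estimates in Table 3 or the levels in Table 1. Only the conventional option's zero delay, 50% predictive value and 10% risk follow Sect. 2.2.

```python
import math

# Hypothetical preference weights (illustrative assumptions, not Table 3).
BETA = {"conv": -1.0, "delay": -0.02, "ppv": 0.03,
        "npv": 0.02, "risk": -0.10, "cost": 0.0005}
ATTRIBUTES = ("delay", "ppv", "npv", "risk", "cost")


def utility(alt):
    """Deterministic part of Eq. 1; the ASC applies only to the opt-out."""
    v = BETA["conv"] * alt.get("conv", 0)
    for name in ATTRIBUTES:
        v += BETA[name] * alt[name]
    return v


def choice_probabilities(alternatives):
    """Logit probabilities, assuming i.i.d. type I extreme value errors."""
    utilities = [utility(a) for a in alternatives]
    largest = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(v - largest) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


choice_set = [
    {"delay": 7, "ppv": 80, "npv": 70, "risk": 5, "cost": 1000},   # calculator A
    {"delay": 30, "ppv": 90, "npv": 90, "risk": 2, "cost": 500},   # calculator B
    {"conv": 1, "delay": 0, "ppv": 50, "npv": 50, "risk": 10, "cost": 0},  # conventional
]
probs = choice_probabilities(choice_set)
print([round(p, 3) for p in probs])
```

Under these invented weights the conventional option has the lowest predicted share, reflecting both its negative constant and its lower predictive values.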

Mixed logit models are a type of discrete choice model that account for the nature of the data, including a binary dependent variable (set to one where the respondent chose the option in the choice set and to zero where the respondent did not choose the option in the choice set). Mixed logit models also recognise the panel structure of the data, in which each respondent made multiple choices (completing six choice sets each), and can account for unobserved preference heterogeneity by estimating a distribution around the estimated mean of each attribute (preference parameter). In this study, the random parameters used in the mixed logit model were assumed to be normally distributed. The attributes were assumed to be continuous and modelled as linear. Models for the patient and public samples were estimated separately to avoid issues of scale heterogeneity (see McFadden [38] and Vass et al. [39]).
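The role of the normally distributed random parameters can be sketched by simulation: draw a coefficient vector for a hypothetical respondent, compute logit probabilities conditional on that draw, and average over many draws. This is a simplified illustration for a single choice set, not the estimation routine used in the study; it ignores the panel structure (the product over each respondent's six choices), and every mean, standard deviation and attribute level for calculators A and B is an invented assumption.

```python
import math
import random

# Hypothetical means and standard deviations of the random parameters,
# ordered as: ASC (conventional), delay, PPV, NPV, risk, cost saving.
MEANS = [-1.0, -0.02, 0.03, 0.02, -0.10, 0.0005]
SDS = [1.5, 0.01, 0.02, 0.01, 0.08, 0.0005]


def logit_probs(betas, alternatives):
    """Logit choice probabilities conditional on one coefficient draw."""
    utilities = [sum(b * x for b, x in zip(betas, alt)) for alt in alternatives]
    largest = max(utilities)
    exps = [math.exp(v - largest) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_probs(alternatives, n_draws=2000, seed=0):
    """Average conditional logit probabilities over normal coefficient draws."""
    rng = random.Random(seed)
    sums = [0.0] * len(alternatives)
    for _ in range(n_draws):
        betas = [rng.gauss(mu, sd) for mu, sd in zip(MEANS, SDS)]
        for j, p in enumerate(logit_probs(betas, alternatives)):
            sums[j] += p
    return [s / n_draws for s in sums]


# Rows: [is_conventional, delay, PPV, NPV, risk, cost saving] (hypothetical levels).
choice_set = [
    [0, 7, 80, 70, 5, 1000],   # calculator A
    [0, 30, 90, 90, 2, 500],   # calculator B
    [1, 0, 50, 50, 10, 0],     # conventional prescribing
]
probs = simulated_probs(choice_set)
print([round(p, 3) for p in probs])
```

A large standard deviation on the constant, as assumed here, means some simulated respondents favour conventional prescribing even though the mean constant is negative, which is the kind of heterogeneity the mixed logit is designed to capture.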

2.9 Balancing Benefits and Risks

The balance between benefits and risks, represented by different attributes describing the biologic calculator, was quantified by estimating marginal rates of substitution (MRS), representing how much more of one attribute respondents are ready to tolerate in exchange for higher levels of another. Confidence intervals around the mean MRS estimates were approximated using the Delta method [41].
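A minimal sketch of this calculation follows, assuming a linear utility function so that the MRS is the negative ratio of two coefficients and the Delta method uses a first-order approximation of the variance of that ratio. The function name, coefficients, variances and covariance are hypothetical values chosen for illustration; they are not estimates from this study.

```python
import math


def mrs_with_ci(b_num, b_den, var_num, var_den, cov, z=1.96):
    """MRS = -b_num / b_den with a Delta-method confidence interval.

    b_num, b_den: coefficients of the benefit and the risk attribute
    var_num, var_den, cov: their variances and covariance
    """
    mrs = -b_num / b_den
    # Gradient of -b_num / b_den with respect to (b_num, b_den).
    g_num = -1.0 / b_den
    g_den = b_num / b_den ** 2
    variance = (g_num ** 2 * var_num
                + g_den ** 2 * var_den
                + 2.0 * g_num * g_den * cov)
    se = math.sqrt(variance)
    return mrs, (mrs - z * se, mrs + z * se)


# Hypothetical inputs: PPV coefficient 0.03 (SE 0.01), risk coefficient
# -0.10 (SE 0.02), zero covariance assumed for simplicity.
mrs, (lower, upper) = mrs_with_ci(0.03, -0.10, 0.01 ** 2, 0.02 ** 2, 0.0)
print(f"MRS = {mrs:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these invented values, respondents would accept 0.3 percentage points of additional risk per one-point increase in PPV, or 3 points per 10-point increase. In practice the covariance term is taken from the estimated variance-covariance matrix of the model rather than set to zero.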

3 Results

The final study sample comprised 142 members of the public and 151 patients who completed the survey between 14 and 17 November 2017. Sociodemographic data for the study sample are shown in Table 2. There were clinically meaningful differences in self-reported health status: patients with RA had a much lower level of health (mean EQ-5D score 0.650; standard deviation 0.312) than those in the public sample (mean EQ-5D score 0.902; standard deviation 0.151) [36]. ESM 2 presents the distributions of the health status scores, and ESM 3 provides additional summary statistics for the study sample.

Table 2 Sample characteristics

3.1 Results of the Mixed Logit Models

The results of the mixed logit models estimated for each sample (patient and public) are shown in Table 3. In the public sample, the directions of all attributes aligned with a priori expectations: delay and risk were seen as negative, whereas PPV, NPV and cost saving to the NHS were viewed as positive. All attributes were statistically significant at the 10% level except cost saving to the NHS. The large, negative and statistically significant ASC indicates that the public sample would prefer personalised approaches over conventional prescribing; however, the standard deviation suggests statistically significant preference heterogeneity. There was also statistically significant heterogeneity in preferences for three specific attributes (PPV, risk and cost).

Table 3 Results of the mixed logit models

The direction of attributes in the patient sample also aligned with a priori expectations. In the patient sample, the attributes delay, NPV and cost were not statistically significant. Similar to the public sample, the ASC indicated the patient sample would prefer personalised approaches over conventional prescribing, all else equal, and there was significant heterogeneity for the attributes PPV, risk and cost.

3.2 Marginal Rates of Substitution

The coefficients for the patient and public sample estimated from the mixed logit models (Table 3) may not be directly comparable because of scale heterogeneity [39, 40]. The ratios of the estimated coefficients, representing the marginal rate of substitution, can be directly compared across the two samples [39]. Assuming a linear and continuous specification for each attribute, marginal rates of substitution are presented in Table 4. For test accuracy, members of the public were willing to accept the largest increase in risk (2.1%) for a 10% increase in NPV, whereas an increase in NPV was not statistically significant for patients, who were only willing to accept an increase in risk (of 0.65%) for a 10% increase in PPV. The overlapping confidence intervals show little difference in the preferences of patients and the public.

Table 4 Marginal rates of substitution

4 Discussion

The results of this DCE suggested the individuals who took part in this study, representing current or potential future patients, were influenced by the predictive value of the prescribing algorithm (‘biologic calculator’). Although both PPV and NPV were statistically significant in the public sample, only PPV was statistically significant in the patient sample. However, the large, negative and statistically significant constant implies patients preferred the personalised approach regardless of the level of NPV. The finding that individuals would prefer personalised approaches (using a prescribing algorithm) over conventional approaches, all else held equal, indicates that there was nothing about the prescribing algorithm that the individuals sampled inherently disliked, despite some concerns in the literature about perceptions that the public and patients may have about ‘artificial intelligence’-driven decision making replacing clinical decision making in healthcare [42].

The attribute ‘cost saving to the NHS’ was not statistically significant for either patients or the public sampled in this study. The valuation elicited for the prescribing algorithm in this study was not a traditional willingness to pay from the individual’s perspective but instead a willingness to save money for the healthcare system. We are not aware of other studies eliciting willingness to save for the health system; however, ‘traditional’ estimates of willingness to pay for relief of RA symptoms have been around £550 per month [43,44,45]. Framing cost as a willingness to save for the healthcare system provides a valuation from a ‘socially inclusive’ perspective rather than the individual’s ‘personal’ valuation acquired when cost is framed as a charge or out-of-pocket expense [46]. The socially inclusive perspective is relatively less common and limits the comparability of the monetary valuations estimated in this research.

The results of this quantitative study mirror some of the published qualitative investigations into patients’ views of predictive testing to stratify RA treatments [4]. A focus group-based qualitative study found that many patients welcomed a new personalised approach to prescribing. Reducing delays to an effective treatment was an important consideration. Delay was an important attribute to the public in this quantitative study, but it was not statistically significant in the patient sample.

Some examples of studies considering stratifying technologies in other therapeutic areas exist [47]. For example, Powell et al. [21] identified patterns of importance that were similar to those observed for a biologic calculator in a study eliciting preferences for pharmacogenetic testing in epilepsy. Najafzadeh et al. [48] used a DCE administered to the public and patients to elicit their preferences for a genomic test to predict drug response for cancer treatment and found that a delay in receiving treatment caused by the test turnaround time was valued lowest ($650). Najafzadeh et al. [48] found some statistically significant differences between the preferences of patients and the public for a genetic test to guide cancer treatment. The preference weight for the highest level of sensitivity was larger for patients than for the public. Patients were also more likely to opt out of testing.

Limitations of the study include the use of internet recruitment for the final survey, which may restrict the generalisability of the results because the sample may not be representative of the general public or the average patient with RA. Although patient groups were utilised in the survey development, internet panels were chosen to acquire a survey sample quickly and relatively inexpensively; however, the respondents to internet panel DCEs are likely to be computer literate. The patient sample also relied on self-reported diagnoses of RA. The patient sample had substantially lower EQ-5D scores than the public (see ESM 2), as expected for respondents with a chronic condition. Although popular in healthcare DCEs [9], the advantages and disadvantages of using internet panels for DCEs have yet to be thoroughly explored. There is some emerging evidence from the health stated-preference literature that suggests internet panels generate preference data comparable to those from mail-based surveys [49] but are superior in terms of response rate [50]. Other health-valuation studies have found they provide good-quality data compared with other methods such as postal surveys and telephone interviews [51].

Patient groups were used because we felt the members would be confident to articulate views relevant to the research as they were somewhat familiar with the content. For example, we could talk about the personalised approach to prescribing without first having to explain treatment with biologics as even patients on methotrexate were familiar with these medicines. However, patients engaged in research and patient groups were not purposefully recruited for the final study, and the views of the members could differ from the views of respondents.

The study also relied on patients’ self-reported diagnosis of RA. Members of the patient groups and those completing the survey online may erroneously believe that they have RA when they actually have another type of arthritis or inflammatory disorder. The survey materials were carefully developed to explain the disease area, and we hoped that those who did not identify with the patient scenario presented would exit the survey. Of course, this cannot be guaranteed and is therefore a limitation of the study.

All attributes were continuous and were thus modelled as linear in the utility functions. With the available sample size, investigations into alternative specifications were not feasible. As with many survey studies, the findings from this sample may not be generalisable to the wider population. The study focus was limited to preferences for stratifying treatments with biologics in RA, but subsequent studies could use a similar method in other disease areas such as psoriasis or systemic lupus erythematosus. Furthermore, it remains unclear whether patients are the actual demanders of healthcare, particularly in stratified medicine where there are many key stakeholders. For example, patients often rely on their clinicians’ advice, and this may drive their healthcare choices [52]. In addition, service commissioners deciding whether to introduce new technology into the clinic may not believe patient or public preferences offer the most appropriate viewpoint for their decisions [53].

5 Conclusion

The results of this DCE suggest that individuals are open to personalised approaches to treatment with biologics using a prescribing algorithm. This study suggested that preferences for the personalised approach, including PPV, risk and cost saving, were heterogeneous in samples of patients and the public. Researchers developing new approaches to personalised medicine should pay close attention to the predictive value (both positive and negative) of the mechanism being developed to target therapy.