INTRODUCTION

The goal of primary care is to provide continuous, comprehensive, and coordinated care.1, 2 However, the addition of a population health perspective, along with increased specialization of care, can impede achievement of those goals.3 The number of clinical issues addressed during primary care visits has grown,4 along with the complexity of diagnostic testing and prescribing, increasing the need to coordinate with multiple providers.5 To reflect changes in what primary care providers (PCPs) do, new models of healthcare delivery have been implemented.6, 7 The Centers for Medicare and Medicaid Services (CMS) Innovation Center was created to test new models of healthcare delivery to improve care quality while lowering costs.8 The Innovation Center’s advanced delivery model for primary care, Comprehensive Primary Care Plus (CPC+), was launched in 2017.

The goal of the CPC+ model is to promote coordinated, patient-centered care. To do so, the model employs innovative methods to measure and improve access and quality, including expanded use of the electronic health record (EHR) and patient-reported outcome measures (PROMs).9 The digitization of healthcare has provided opportunities to improve the patient-centeredness and quality of care10,11,12, including incorporating PROM data into the medical record by collecting these data directly from patients electronically and merging the data into EHRs.13, 14

PROMs provide reliable information coming directly from patients about what they are able to do (i.e., their functioning) and how they feel (i.e., their symptoms). Interest in incorporating PROMs into clinical practice is based on growing evidence that this information can help clinicians and patients to improve patient outcomes by supplementing information provided by traditional clinical measures.15,16,17,18,19 PROMs also have the potential to serve as a measure of healthcare quality, such as the amount of improvement in functioning, or reduction in symptoms, that occurs over a period of time.20, 21 Moreover, deployment of patient-reported outcome performance measures (PRO-PMs) could enable a shift to providing the goal-oriented care especially needed for people with multiple chronic conditions (MCCs).22

Recognizing the potential value of incorporating PROMs into an enhanced primary care delivery model, the Innovation Center sought to identify a short list of PROMs to consider for use in the CPC+ model. The research we detail here was designed to help the Innovation Center determine the one or two best PROMs for use in primary care to enhance patient care and for developing PRO-PMs to evaluate performance. We had four research objectives:

  1. Identify high-priority, patient-reported outcome (PRO) domains that would provide useful information to guide PCPs, with the emphasis on health-related domains important to patients with MCCs

  2. Identify existing PROMs for those domains

  3. Identify criteria to evaluate and compare the candidate PROMs

  4. Select, using the criteria, the one or two best PROMs for each PRO domain

METHODS

Overview

The sheer volume of possible PRO domains (for example, the Patient-Reported Outcome Measurement Information System [PROMIS] alone includes 97 PRO domains) and possible PROMs to assess them required a comprehensive, yet result-tailored, approach. We based our recommendations for PRO domains and PROMs on a synthesis of qualitative and quantitative data gathered from key informant groups and an environmental scan (Tables 1 and 2).

Table 1 Key Informant Group Data Collection and Analysis Methods
Table 2 Most Frequently Mentioned PRO Domains Across Key Informant Group Input and Environmental Scan of the Literature
Table 3 Criteria Used To Evaluate PROMs

The purpose of the key informant groups was to identify the priority topics that stakeholders in primary care quality assessment thought should be addressed, factors they would advise us to consider when selecting a PROM, and their experiences with specific PROMs. The purpose of the environmental scan was to identify recommendations in the published and gray literature regarding (1) PRO domains for assessment in primary care, (2) PROMs to assess those PRO domains, and (3) criteria to apply to identify the best PROMs. After identifying a small subset of PRO domains and ten candidate PROMs, we conducted a targeted review of the evidence base to summarize the measurement properties of each candidate PROM. The relationship between our data sources and our research objectives is summarized in online appendix 1 and detailed below.

Key Informant Data Collection Methods

We tailored our methods for collecting key informant data to the preferences of each type of key informant group. The methods used to analyze the data were tailored to the nature of the data. Table 1 summarizes the characteristics of each key informant group and the characteristics of the data collection and analysis methods for each.

Methods for Targeted Environmental Scan, Evaluation of Yield, and Data Synthesis

We briefly describe below how we tailored targeted environmental scan strategies to each research objective. Details about the search strategies are available in online appendix 2.

Objective 1: Identify High-Priority PRO Domains for Primary Care

Before we could select candidate PROMs, we had to identify a subset of high-priority PRO domains that should be measured. To do so, we searched professional society websites, selected patient group websites, and published and gray literature describing primary care patient and clinician priorities.23,24,25,26,27,28,29 We also conducted searches in Google Scholar using published search strategies developed for a similar purpose.30 Next, we took a census of the PRO domains that emerged from these searches and identified those that were common across the sources. Finally, we compared this subset of domains to those identified via key informant data collection to select the subset of domains common to all or most sources.
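The census step described above can be illustrated with a minimal sketch. It assumes each source is reduced to the set of PRO domains it mentions; the source names, the domain assignments, and the `min_sources` cutoff are hypothetical placeholders, not data or rules from the study.

```python
from collections import Counter

# Hypothetical placeholder data: source labels and domain assignments are
# illustrative only and do not reproduce the study's findings.
domains_by_source = {
    "source_a": {"depression", "physical function", "pain"},
    "source_b": {"depression", "physical function", "sleep"},
    "source_c": {"depression", "pain", "fatigue"},
}

# Count how many sources mention each domain (each source counts once per domain).
counts = Counter(
    domain for domains in domains_by_source.values() for domain in domains
)

# Keep domains mentioned by at least `min_sources` sources (assumed cutoff).
min_sources = 2
common_domains = sorted(d for d, n in counts.items() if n >= min_sources)
print(common_domains)
```

A tally of this kind only ranks domains by breadth of support; the final selection would still depend on comparing the result with the key informant data.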

Objective 2: Identify Candidate PROMs for PRO Domains

Once a subset of four high-priority PRO domains was selected, we developed a targeted PubMed search strategy for each, conducted additional searches in EMBASE and PsycINFO, and searched reference lists of articles. For three of four PRO domains (ability to participate, depressive symptoms, and physical functioning), our initial searches for PROMs yielded far too many articles to review. For those domains, we searched for review articles published in the past 5 years. We did not find a published, comprehensive review of multiple PROMs for self-efficacy; however, we were able to obtain an unpublished review from the researcher (Dr. L. Shulman) who developed the self-efficacy measures for PROMIS. We supplemented this review with a search of the PROMIS measure database, as the Shulman review did not contain any PROMs appropriate for primary care. We excluded from consideration PROMs that would not be appropriate for patient self-report to assess primary care performance in the USA. Specific reasons for excluding PROMs from candidacy are provided below.

PROMs Excluded

Table 4 Scores for Candidate PROMs, by Criterion and Total

• For which there was no English language translation

• That had never been used in the USA

• For which modes of administration did not include self-report

• That were specific to a particular condition

We merged information on PROM candidates with information identified via key informant data collection.

Objective 3: Identify Criteria to Evaluate PROMs

To identify PROM selection criteria, we conducted environmental scans31, 32 based on an initial set of 15 guidelines from standard-setting bodies detailing evaluation criteria for PROMs and quality measures. The criteria included in each document, along with citations to the documents, are presented in online appendix 4. We then searched the reference lists of that initial set to identify additional sources. Next, we conducted a content analysis of the yield of this review and the qualitative data obtained from key informants to identify a consensus set of evaluation criteria. Finally, we compared the resulting set to the measure evaluation criteria put forth by the National Quality Forum.

Objective 4: Select the Best PROMs for Primary Care

We developed a schema in which we assigned a PROM one point for each criterion for which there was supporting evidence. When the evidence for a criterion was mixed, the PROM received a half point. When there was no evidence or when the only evidence found was unfavorable to the PROM, the PROM received no point for that criterion. We did not use previously published methods to “score” the appropriateness of PROMs, because they lacked criteria specific to the application of the PROMs in primary care or performance measurement.33, 34
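The scoring schema can be sketched in a few lines of code. This is a minimal illustration, assuming each criterion has already been given one of four evidence ratings; the rating labels, criterion names, and example ratings are hypothetical and are not taken from Table 4.

```python
from typing import Dict

# Points per evidence rating, following the schema described above.
# The rating labels themselves are assumed for illustration.
POINTS = {
    "supporting": 1.0,   # evidence supports the PROM on this criterion
    "mixed": 0.5,        # evidence for the criterion is mixed
    "none": 0.0,         # no evidence was found
    "unfavorable": 0.0,  # the only evidence found was unfavorable
}


def score_prom(ratings: Dict[str, str]) -> float:
    """Sum the points a PROM earns across all rated criteria."""
    return sum(POINTS[rating] for rating in ratings.values())


# Hypothetical ratings for an illustrative PROM (not data from the study).
example_ratings = {
    "internal_consistency": "supporting",
    "test_retest": "mixed",
    "content_validity": "supporting",
    "responsiveness": "none",
}
example_total = score_prom(example_ratings)
print(example_total)  # 2.5
```

Because every criterion contributes at most one point, the total is an unweighted count, which is the simplification acknowledged in the Limitations.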

RESULTS

Objective 1: PRO Domains

The row headings in Table 2 provide a list of all the PRO domains that were identified in the thematic analysis of interview and focus group results, confirmed in the survey of payers, or included in gray or published literature (see online appendix 3 for additional detail). The last column in Table 2 shows the degree of consensus, across different sources, on which PROs may be important to patient care. The PRO domains with the greatest support across stakeholders and literature were as follows:

  • Ability to participate in social roles

  • Depression

  • Pain

  • Physical function

  • Self-efficacy for managing one’s health/chronic condition

We proceeded with identifying PROMs for each of these PRO domains except pain due to the current controversy over the potential unintended consequence of a mandate to assess pain.35

Objective 2: Candidate PROMs

Initially, we identified a total of 503 PROMs for the four PRO domains. We eliminated approximately 85% of these (n = 429) because they did not meet selection criteria. We reviewed 74 PROMs in greater depth. Of these, we eliminated 64 because the PROM questions did not match the PRO domain or because using the PROM would not be feasible (that is, the PROM was too long or too difficult to obtain). Our PROM screening procedure is illustrated in Fig. 1.
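The screening counts reported above can be checked with simple arithmetic. The sketch below uses only the numbers stated in the text; the variable names are ours.

```python
identified = 503        # PROMs initially identified across the four PRO domains
failed_criteria = 429   # eliminated because they did not meet selection criteria

# PROMs that remained for in-depth review.
reviewed = identified - failed_criteria

eliminated_on_review = 64  # domain mismatch or not feasible to use

# PROMs carried forward as candidates.
candidates = reviewed - eliminated_on_review

# Share of the initial pool removed at the first screening step.
share_eliminated = failed_criteria / identified

print(reviewed, candidates, round(share_eliminated * 100))  # 74 10 85
```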

Figure 1

Flow of published evidence, by domain, through the review process. The asterisk indicates the most common reason for exclusion. US, United States of America.

After applying all the exclusions documented in Fig. 1, there were 10 PROMs for which we searched the primary literature to obtain additional evidence:

Ten PROMs Selected for Further Analysis

Ability to participate

• Keele Assessment of Participation (KAP)

• PROMIS Ability To Participate in Social Roles and Activities Short Forms

Depressive symptoms

• Patient Health Questionnaire-9 (PHQ-9)

• Center for Epidemiologic Studies Depression Scale—Revised (CESD-R)

• Patient-Reported Outcome Measurement Information System (PROMIS) Depression Short Forms (SFs) and Computer Adaptive Tests (CATs)

Physical function

• PF-10 (Short Form-36’s physical functioning 10-item subscale)

• PROMIS Physical Function Short Forms and CAT

Self-efficacy

• Self-Efficacy to Manage Chronic Disease Scale (SEMCD)

• PROMIS Self-Efficacy for Managing Symptoms Short Forms and CAT

• PROMIS Self-Efficacy for Managing Medications and Treatments Short Forms and CAT

Objective 3: PROM Selection Criteria

We identified 15 measure standards documents and related resources which provide PROM evaluation criteria (summarized in Table 3). A content analysis of the evaluation criteria across these 15 documents is presented in online appendix 4. Online appendix 4 also details how the PROM selection criteria described in the 15 measure standards documents relate to the five broad categories of the NQF quality measure evaluation criteria: (1) importance, (2) scientific acceptability, (3) feasibility, (4) usability and use, and (5) related and competing measures. We describe key considerations related to each NQF criterion below as well as additional guidance regarding these criteria provided by key informant groups.

Importance

Primary care PRO domains should be relevant and meaningful to stakeholders. Practice representatives and patients expressed the desire to choose PROMs that meet specific health-related patient needs. Clinical thought leaders, practice representatives, and payer representatives identified the importance of selecting a PROM whose results lead to specific clinical actions, are useful for tracking patients’ health, and identify gaps in care. They recommended that (1) PROM and PRO-PM scores be sensitive to a change in clinical practice and (2) scores distinguish between high and low performing primary care practices.

Scientific Acceptability

Each of the 15 standards documents included scientific acceptability as a PROM selection criterion. Aspects of scientific acceptability included (1) availability of a conceptual and measurement model for the PROM; (2) empirical evidence for the reliability, validity, and responsiveness of PROM scores; and (3) availability of aids to support score interpretation. The availability of interpretive aids improves feasibility and usability as well. For example, clinical thought leaders indicated that a PROM would be useful if it produced results that were easy to understand and had interpretive aids for PROM scores such as graphs.

Feasibility

Feasibility refers to ease of implementing the PROM in primary care practice. For example, PROM length, availability of different formats, including electronic, and whether there is guidance to practitioners about how to use the PROM data will determine feasibility. Key informant groups emphasized the need to keep response time to less than 10 min, offer multiple data collection formats (including computer adaptive tests), and make the instrument accessible to patients with impairments. In addition, to reduce administrative burden, payer representatives recommended that the PROM include methods to reduce missing data and enhance data quality, such as those typically available in electronic formats.

Usability and Use

For this criterion, we focused on the Innovation Center’s goal of developing PRO-PMs to evaluate the quality of care under the CPC+ model. We looked for evidence that the PROM was currently widely used in primary care or that there already was a PRO-PM based on the PROM.

Related and Competing Measures

For any given PROM, multiple PRO-PMs might be specified. Three of 15 sources referenced the need to consider related and competing PRO-PM measures. We did not evaluate related or competing PRO-PMs at this stage because there currently are so few PROMs for which there is even one PRO-PM.

Objective 4: Selection of the Best PROMs

Table 4 presents the results of our assessment of each of the 10 PROMs identified in the four priority domains, sorted by total score. We based the scores on a subset of the criteria identified in our analysis of the 15 standards documents (see online appendix 4) for which there was the most published evidence. For example, the most common evidence for the scientific acceptability criterion refers to the reliability and validity of the PROMs. The most common types of reliability evidence published are for internal consistency (Cronbach’s alpha) and test-retest reliability. The most common types of validity evidence published are for content and for construct validity. For construct validity, we included the specific instances of structural validity (i.e., factor analysis) because that was by far the most common analysis done, and responsiveness because the ability to detect change is important to informing patient care and to evaluating the performance of practices. Other types of construct validity evidence were collapsed into the category “other construct.” For feasibility, we included whether the PROM could be administered electronically and using computer adaptive software because this would shorten the administration time.

Electronic administration also can build in features to improve data quality in real time and immediately populate the PROM database without the need to enter data manually. The Importance criterion is not included in Table 4 because the domains represented in Table 4 columns were those already determined to be important through the analysis shown in Table 2.

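We recommended those PROMs with consistently positive evidence for at least 70% (8 of 11) of the selection criteria:

• PHQ-9 (depression)

• PROMIS Depression (depression)

• PF-10 (physical function)

• PROMIS Physical Function (physical function)

• PROMIS Self-Efficacy for Managing Medications and Treatments (self-efficacy)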

Detailed descriptions of these PROMs are available in online appendix 5.

DISCUSSION

Although the term “patient-centered” was introduced over 50 years ago, most measures used in primary care are clinician-centered, such as vital signs and laboratory tests based on biomedical and physical science.61 PROM data can complement conventional clinical measures.10 Exciting efforts to do this at the healthcare system scale are now taking shape in advanced models of primary care like CPC+. Our systematic process of stakeholder engagement and evidence-based review resulted in our identifying (1) four high-priority, health-related domains for primary care (ability to participate in social roles, depression, physical function, and self-efficacy for managing one’s health); (2) criteria for use in selecting PROMs for those domains; and (3) the five best existing PROMs to consider for measuring three of the four high-priority domains: PHQ-9 or PROMIS Depression (for depression), PF-10 or PROMIS-PF (for physical function), and PROMIS Self-Efficacy for Managing Medications and Treatments (for self-efficacy for managing one’s health). The paucity of evidence supporting PROMs for ability to participate in social roles suggests the need for further research and development of PROMs to address this important domain.

Choosing among these recommended PROMs will require comparing the benefits and drawbacks of each. For example, the content of the two recommended depression measures differs in that the PHQ-9 includes questions that are not about depressed mood but are associated with depressed mood (e.g., questions about sleep, fatigue, appetite, concentration, and behavior). In contrast, the PROMIS Depression items are all specific to sad thoughts and mood. The PHQ-9’s mixed content represents symptoms characteristic of the depression “syndrome” as specified by the DSM-IV. This diversity of content results in lower internal consistency reliability for the PHQ-9 than that observed for PROMIS Depression. Factor analysis shows that the PHQ-9 does not measure a single phenomenon, but that PROMIS Depression questions measure depressed mood exclusively. In addition, unlike PROMIS, there is no computerized adaptive test for the PHQ-9, although there is a two-stage screening process. However, the PHQ-9 has an extensive history of use as a performance measure in clinics, providing information about its feasibility and usefulness. We provide additional comparative information about measures for each of the other PRO domains in online appendix 5.

Considerable research shows that comorbid depression increases the impact of chronic disease on patient outcomes, which supports the routine assessment of depression in primary care practices.62,63,64,65 Unlike depressive symptoms, however, physical functioning is not often measured systematically in primary care.66 In our study, stakeholder support for measuring physical functioning may have been due to the large percentage of patients with musculoskeletal disorders (e.g., arthritis, back pain) seen by primary care providers. For such patients, routine use of a physical function PROM applicable across conditions could aid in screening, in decision-making and monitoring, and in accounting for variations in clinical performance. Our research showed that both the PF-10 and the PROMIS-PF have strong evidence supporting their use for this purpose. The advantage of the PROMIS-PF is computer adaptive technology, which leads to increased precision and, thus, ability to detect improvement or decline over time.44, 67

Self-efficacy for managing one’s health also has not been routinely assessed in primary care perhaps because improving this outcome is a recent and aspirational goal of healthcare.68, 69 Yet, a major focus of CMS’ advanced primary care models is to include services such as patient education and coaching to help patients manage their chronic conditions.70, 71 The effectiveness of advanced primary care models may be judged, in part, on evidence of patient self-efficacy for managing their own health. Thus, primary care model evaluation research may provide data to describe the measurement properties of self-efficacy PROMs and guide their future use.

One could argue that improving patients’ abilities to participate in valued social roles is the ultimate goal of healthcare; yet this remains a challenging topic for measurement. The ability to participate in social roles is a function of many factors external to healthcare such as opportunity and interest. To the extent that outcomes like depressive symptoms and physical function help determine one’s ability to participate in social roles, the PROMs we recommend for those topics may help to address social role participation as well.

Limitations

The volume of information about PRO domains and PROMs required that we set boundaries on our search for evidence. A different approach to the environmental scan may have surfaced alternative or additional PRO domains and PROMs. Similarly, the volume of evidence that we obtained required that we use a strategy to screen the PRO domains and PROMs. While we described and justified our strategy, the application of different criteria in a different way may have changed the results. For example, the burden associated with each PROM was inferred from the number of items. However, other PROM characteristics also influence burden, such as the reading level required to comprehend the text, the potential sensitivity of the topic, and the way the PROM is formatted. Additionally, we considered the quality of evidence supporting the PROM (online appendix 5), but did not use a formal method such as GRADE.72 Moreover, we used a simple method of scoring PROMs, and criteria were not weighted by importance. For example, evidence for internal consistency reliability for the PROM scores was given the same weight as the existence of a PRO-PM based on the PROM. Our choice of the 70% level of positive support across criteria was arbitrary. If, instead, we had required 75% support, none of the self-efficacy PROMs would qualify, which would mean that just two PRO domains would be measured. On the other hand, the PROMIS Ability to Participate in Social Roles and Activities met 68% of our criteria. If the level of positive support were reduced to 65%, there would be at least one recommended PROM for each of the four PRO domains. Finally, we selected PRO domains according to the number of stakeholder groups that recommended them and based on the literature review. Table 2 shows that patients also mentioned cognitive ability, sleep, fatigue, and sexual functioning as important.
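The sensitivity of the recommendations to the cutoff can be illustrated with a short sketch. The point totals below are inferred rather than quoted: 8 of 11 follows from the 70% threshold together with the statement that the self-efficacy PROM would fail at 75%, and 7.5 of 11 is our reading of the reported 68% under half-point scoring. The dictionary keys are hypothetical labels.

```python
N_CRITERIA = 11  # number of selection criteria scored


def meets_threshold(points: float, threshold: float) -> bool:
    """Return True if the share of criteria met reaches the threshold."""
    return points / N_CRITERIA >= threshold


# Inferred totals (assumptions, see the paragraph above): 8/11 is about 73%,
# 7.5/11 is about 68%.
scores = {
    "self_efficacy_prom": 8.0,
    "ability_to_participate_prom": 7.5,
}

# For each candidate cutoff, list the PROMs that would still qualify.
qualifying_by_threshold = {
    threshold: [name for name, pts in scores.items() if meets_threshold(pts, threshold)]
    for threshold in (0.65, 0.70, 0.75)
}

for threshold, names in qualifying_by_threshold.items():
    print(f"{threshold:.0%}: {names}")
```

Under these assumed totals, both PROMs qualify at 65%, only the self-efficacy PROM at 70%, and neither at 75%, which matches the pattern described above.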

Future research with patients should be conducted to evaluate the replicability of our results and supplement them. We focused on identifying consensus PRO domains but did not investigate how the meaning of a PRO domain might differ across stakeholder groups; this is another topic worthy of further research.

Conclusions

We identified two strong candidate measures for each of the core health domains of depression and physical function. Problems in these domains are highly prevalent in primary care patients, especially those with MCCs. Additionally, we identified a strong candidate PROM for a domain important to people living with chronic conditions: self-efficacy for managing their medications and treatments. PCPs can coach and support patients in developing the required skills. Existing evidence suggests that it should be feasible to incorporate these tools into PRO information systems and EHRs to support primary care practice and performance measures. Further studies will be needed to provide evidence that these PROMs produce information that is useful to clinicians, patients, and practices. If this evidence supports the effectiveness of these PROMs and performance measures, they can be used to promote patient centeredness and convey the value and quality of primary care.