Introduction

Most, but not all, patients with rheumatoid arthritis (RA) have multimorbidity. Prior studies found that between 50 and 84% of patients with RA had some comorbidity, with a mean of 2 comorbidities based on the Charlson Index [1, 2]. In addition, patients with RA develop comorbidities at an increased rate after the diagnosis of RA compared with matched controls [3]. Some of the excess morbidities may be directly related to RA (e.g., interstitial lung disease) and others are likely part of the systemic inflammatory milieu caused by RA. Comorbidities are important in RA as they are strongly associated with disease activity, response to treatment, and overall mortality [4, 5].

Several prior studies have used machine learning (ML) to identify clusters of comorbidities in RA [6, 7], and similar clustering analyses have been pursued across other rheumatic diseases [8, 9]. Developing clusters of comorbidities through the use of ML is relatively easy to achieve, but it is important to consider the purpose of the clustering: do clusters of comorbidities (versus individual comorbidities) provide new insights or act to predict or possibly explain clinical outcomes? While prior comorbidity cluster studies have created clusters, it has not always been clear what motivated them. In addition, prior comorbidity cluster studies have been derived from single academic medical centers with unclear generalizability. They also have defined comorbidity clusters using data at one point in time without respect to the longitudinal accumulation of comorbidities.

Relatively little work has focused on determining the value of comorbidity clusters in the longitudinal modeling of clinical outcomes. We compared results for different ML algorithms employed to cluster patients based on comorbidities among RA patients in CorEvitas, a large US-based registry. We assessed how clusters of patients with given comorbidities predict future outcomes, including physical function and RA disease activity, and compared the prediction of outcomes using comorbidity clusters versus individual comorbidities. We hypothesized that clusters derived from the unsupervised ML algorithms would predict outcomes as well as individual comorbidities.

Methods

Study population and design

We used the CorEvitas RA registry to identify a cohort of potentially eligible patients. From this group, patients were required to have at least 6 years of follow-up in the registry between 2011 and 2021, although patients could have entered CorEvitas before 2011. The first visit in the CorEvitas RA registry was considered baseline, with follow-up through the last visit in the registry. The full longitudinal dataset was used to identify patients in comorbidity clusters during the first phase of these analyses. For the second phase, the comorbidity clusters were assessed using the first 3 years of consecutive available data, with the next 3 consecutive years used to determine clinical outcomes.

Comorbidities of interest

The comorbidities of interest are collected at baseline and then updated in CorEvitas. These include conditions summarized in Supplemental Table 1. The list of comorbidities included is quite similar to what has been reported in prior papers examining frequent comorbidities in RA [2, 10]; this grouping of comorbidities has been found to be associated with relevant clinical outcomes in RA.

Comorbidities are recorded at the time of enrollment in the registry and updated by patients and clinicians at subsequent visits that typically occur twice per year. We focused on chronic comorbidities, i.e., comorbidities that accumulate over time. In other words, if one of these chronic comorbidities (e.g., diabetes or coronary artery disease) was reported, then it was assumed to be ongoing at subsequent visits.
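The carry-forward rule for chronic comorbidities can be illustrated with a minimal Python sketch. This is not the registry's or the authors' code; the function name `carry_forward`, the per-visit dictionary layout, and the example values are assumptions made for illustration only.

```python
from typing import Dict, List


def carry_forward(visits: List[Dict[str, bool]]) -> List[Dict[str, bool]]:
    """Mark a chronic comorbidity as present at every visit after it is first reported.

    `visits` is a chronologically ordered list of reports for one patient,
    each mapping a comorbidity name to whether it was reported at that visit
    (hypothetical layout, chosen for this example).
    """
    ever_reported = set()
    cumulative = []
    for report in visits:
        # Add anything newly reported at this visit to the running set.
        ever_reported.update(name for name, present in report.items() if present)
        names = set(report) | ever_reported
        cumulative.append({name: name in ever_reported for name in sorted(names)})
    return cumulative


# Invented example: hypertension at visit 1, diabetes first reported at visit 2.
visits = [
    {"diabetes": False, "hypertension": True},
    {"diabetes": True, "hypertension": False},
    {"diabetes": False, "hypertension": False},
]
result = carry_forward(visits)
```

Under this rule a condition never reverts to absent, so the per-patient comorbidity count can only stay the same or increase over follow-up.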

Specific questions on comorbidities in CorEvitas changed in 2011. To determine the impact of changes in the collection of comorbidities, a secondary analysis was conducted only using participants who entered in 2011 or after (see Supplemental Table 2). The reporting of comorbidities appeared similar to the total cohort. Thus, this sub-analysis was not pursued further.

Outcomes

The first phase of analyses focused on deriving comorbidity clusters using ML algorithms; therefore, the clusters were the outcomes. The second phase focused on whether comorbidity clusters were associated with future clinical outcomes. The clinical outcomes of interest in phase two were disease activity as measured by the Clinical Disease Activity Index (CDAI) and function as measured by the Health Assessment Questionnaire-Disability Index (HAQ-DI) [11, 12].

CDAI and HAQ-DI are measured at almost all visits in CorEvitas. CDAI is a continuous scale from 0 to 76 with well-accepted thresholds for different levels of disease activity [11]. CDAI includes four components: patient global arthritis activity (0–10), physician (assessor) global arthritis activity (0–10), tender joint count (0–28), and swollen joint count (0–28). Since we assessed the outcomes during the final 3 years of the study period, the time-averaged CDAI from those years was used as the primary disease activity outcome. The time-averaged CDAI was calculated as a weighted average of the CDAI, using the number of months between visits as the weighting factor. In other words, the CDAI at a given visit was multiplied by the number of months until the next visit; each segment (CDAI × months) was added together, and the sum was divided by 36 months. A secondary outcome was the change in time-averaged CDAI between the first 3 years and the second 3 years of the study period.
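The weighting described above can be sketched in a few lines of Python. This is an illustration rather than the authors' implementation (the analyses used R and SAS). The function name `time_averaged`, the `(month, score)` input layout, and the example visit schedule are assumptions. The sketch also assumes the first visit falls at the start of the 36-month window and that the last score is carried to the end of the window.

```python
from typing import List, Tuple


def time_averaged(visits: List[Tuple[float, float]], window_months: float = 36.0) -> float:
    """Time-weighted average of a score over a fixed window.

    `visits` holds (month_of_visit, score) pairs sorted by month, with months
    counted from the start of the window. Each score is weighted by the number
    of months until the next visit (or until the end of the window).
    """
    if not visits:
        raise ValueError("at least one visit is required")
    total = 0.0
    for i, (month, score) in enumerate(visits):
        next_month = visits[i + 1][0] if i + 1 < len(visits) else window_months
        total += score * (next_month - month)  # one (score x months) segment
    return total / window_months


# Invented example: CDAI of 12, 8, 10 and 4 at months 0, 6, 18 and 30.
example_visits = [(0.0, 12.0), (6.0, 8.0), (18.0, 10.0), (30.0, 4.0)]
avg_cdai = time_averaged(example_visits)
```

In this example the segments are 12 × 6, 8 × 12, 10 × 12 and 4 × 6, which sum to 312 and give a time-averaged CDAI of 312 / 36. The same function applies unchanged to HAQ-DI scores.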

The HAQ-DI encompasses 20 items across eight domains (dressing and grooming, arising, eating, walking, hygiene, reach, grip, and activities), with each item scored 0–3 based on how much help is required to complete a given task [12]. The average score for each domain is calculated, and then the average across the eight domains is used as a summary. The same method was used for the HAQ-DI to assess outcomes during the final 3 years of the study period, using a time-averaged HAQ-DI. Just as with the CDAI, change in time-averaged HAQ-DI was considered a secondary outcome.
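As a small worked example of the summary score as it is described in this section (domain averages, then the average across domains), the following Python sketch may help. The function name `haq_di_summary` and the item responses are invented for illustration, and the scoring follows the text here rather than any particular published scoring manual.

```python
from typing import Dict, List


def haq_di_summary(domain_items: Dict[str, List[int]]) -> float:
    """Average the item scores within each domain, then average across domains."""
    domain_means = []
    for domain, items in domain_items.items():
        if not items or any(score not in (0, 1, 2, 3) for score in items):
            raise ValueError(f"invalid item scores for domain {domain!r}")
        domain_means.append(sum(items) / len(items))
    return sum(domain_means) / len(domain_means)


# Invented responses for one visit: 20 items over eight domains, each scored 0-3.
responses = {
    "dressing and grooming": [0, 1],
    "arising": [1, 1],
    "eating": [0, 0, 0],
    "walking": [0, 0],
    "hygiene": [1, 0, 2],
    "reach": [0, 1],
    "grip": [0, 0, 0],
    "activities": [1, 1, 1],
}
haq = haq_di_summary(responses)
```

Here the domain averages are 0.5, 1, 0, 0, 1, 0.5, 0 and 1, so the summary score is 4 / 8 = 0.5.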

Statistical analyses

We assessed patient characteristics at baseline and at year 3 of follow-up and then examined the comorbidity distribution across the population throughout 6 years of longitudinal follow-up. During the first phase of this work, the results of five different ML algorithms for clustering the patients’ comorbidities over the 6-year period were examined; the ML algorithms included K-modes, K-means, divisive analysis hierarchical clustering (DIANA), agglomerative nesting hierarchical clustering (AGNES), and model-based clustering (VarSelLCM) [13, 14]. Three, four, five, and six clusters were each assessed. We chose 5 as the number of clusters for all clustering algorithms based on the “elbow” method from the K-modes clustering [15]. The data were clustered by patient. (For K-means, centers = 5; for K-modes, modes = 5; for AGNES and DIANA, the tree was cut at k = 5. For VarSelLCM, we selected all comorbidity variables and chose the highest probability group among 5 groups as the patient’s cluster group.)
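To make the K-modes and elbow steps concrete, the following is a minimal pure-Python sketch on invented binary comorbidity data. It is not the authors' code and does not reproduce any R package. The deterministic initialisation (most frequent rows as starting modes), the toy data, and the second-difference `elbow` heuristic are all assumptions chosen to keep the example short and reproducible.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Simple matching dissimilarity: number of variables that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows: List[Row], k: int, max_iter: int = 50) -> Tuple[List[int], List[Row], int]:
    """Return (labels, modes, total within-cluster dissimilarity)."""
    # Assumption: start from the k most frequent distinct rows.
    modes = [row for row, _ in Counter(rows).most_common(k)]
    if len(modes) < k:
        raise ValueError("fewer distinct rows than clusters")
    labels = [0] * len(rows)
    for _ in range(max_iter):
        # Assignment step: nearest mode by simple matching distance.
        new_labels = [min(range(k), key=lambda j: hamming(row, modes[j])) for row in rows]
        # Update step: most common value of each variable within each cluster.
        new_modes = []
        for j in range(k):
            members = [row for row, lab in zip(rows, new_labels) if lab == j]
            if members:
                new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
            else:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
        converged = new_labels == labels and new_modes == modes
        labels, modes = new_labels, new_modes
        if converged:
            break
    cost = sum(hamming(row, modes[lab]) for row, lab in zip(rows, labels))
    return labels, modes, cost


def elbow(costs: Dict[int, int]) -> int:
    """Pick the k where the drop in cost slows the most (largest second difference)."""
    ks = sorted(costs)
    return max(ks[1:-1], key=lambda k: (costs[k - 1] - costs[k]) - (costs[k] - costs[k + 1]))


# Invented data: 15 patients, 4 binary comorbidity indicators, three obvious groups.
rows = ([(0, 0, 0, 0)] * 5 + [(1, 1, 0, 0)] * 4 + [(1, 1, 1, 0)]
        + [(0, 0, 1, 1)] * 4 + [(0, 1, 1, 1)])
costs = {k: k_modes(rows, k)[2] for k in range(1, 6)}
best_k = elbow(costs)
labels3 = k_modes(rows, 3)[0]
```

On real registry data the cost curve is rarely this clean, so the elbow is usually judged from a plot of cost against the number of clusters rather than from a single heuristic.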

In the second phase of this work, we compared the performance of the different clustering algorithms, with respect to their association with clinical outcomes. For all ML algorithms, the five-cluster solution was chosen based on statistical methods that look for an inflection point in the sum of squares [13] (see Supplemental Fig. 1). The two clinical outcomes selected were the time-averaged CDAI and time-averaged HAQ-DI. The clusters were defined using data from the first 3 years of follow-up and the clinical outcomes defined in the next 3 years.

To understand the value of the different clusters, we compared the model fit for three sets of models. These included the following as independent variables: (a) only demographics and RA variables; (b) demographics, RA variables, and each comorbidity; (c) demographics, RA variables, and the clusters. This was repeated for each of the ML clustering algorithms. Sensitivity analyses considered sex-stratified models, models with only comorbidities recorded since baseline, and the secondary outcome (change in CDAI or HAQ-DI).
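The comparison of model fit relies on two summary statistics, R² and RMSE, computed for each set of models. A minimal Python sketch of both follows. The observed and predicted values are invented, and the labels `a`, `b`, and `c` simply mirror the three model sets listed above. No regression is fitted here, so this shows only how the fit statistics are derived from predictions.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Proportion of outcome variance explained: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented time-averaged CDAI values and predictions from three hypothetical models.
observed = [10.0, 12.0, 8.0, 14.0, 6.0]
predictions = {
    "a": [9.0, 11.0, 9.0, 12.0, 9.0],    # demographics and RA variables only
    "b": [10.0, 11.0, 8.0, 13.0, 8.0],   # plus individual comorbidities
    "c": [10.0, 11.0, 9.0, 13.0, 8.0],   # plus comorbidity clusters
}
fit: Dict[str, Tuple[float, float]] = {
    name: (r_squared(observed, pred), rmse(observed, pred))
    for name, pred in predictions.items()
}
```

A higher R² and a lower RMSE indicate better fit. In these invented numbers, models b and c are close to each other and both improve on model a, which is the kind of pattern the comparison is designed to detect.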

R (version 4.3.0) and SAS (version 9.4) statistical computing packages were used for all analyses.

Results

Among all RA patients in the CorEvitas RA registry, nearly 12,000 patients had accumulated the minimum 6 years of longitudinal data. Characteristics of the study cohort at baseline and year 3 are shown in Table 1. At baseline, patients were on average 59 (SD 12) years of age, 77% were women, CDAI was 11.3 (SD 11.9, moderate disease activity), HAQ-DI was 0.32 (SD 0.42), and disease duration was 10.8 (SD 9.9) years. Almost all reported current use of a DMARD at both years 1 and 3. In addition, the use of medications for common comorbidities was frequent. Table 2 shows the percentage of patients that reported a comorbidity over the 6-year study period. The median number of comorbidities at baseline was 2 (IQR 1, 3). As anticipated, cardiovascular comorbidities (i.e., coronary artery disease, hypertension, diabetes, and hyperlipidemia) are all common. Osteoporosis is reported in almost one-quarter of patients, acute kidney injury in one-sixth, and mental health issues in over half.

Table 1 Characteristics of patients with rheumatoid arthritis from the CorEvitas registry included in the analyses, at baseline and after 3 years of follow-up
Table 2 Comorbidities of patients with rheumatoid arthritis from the CorEvitas registry included in the analyses, at baseline and during follow-up

Clusters of patients were generated based on their comorbidities using five different ML algorithms; the five-cluster results were the focus, as they appeared to best describe the data (Supplemental Fig. 1 and Supplemental Table 3a-e) [13]. Using 5 clusters, 24 comorbidities, and 11,883 subjects, each ML algorithm provided different clusters. K-modes and K-means gave similar results: both generated one cluster that had few patients who had few comorbidities; both generated one cluster with many patients having cardiovascular comorbidities; and both generated a cluster with mental health issues and fibromyalgia. The model-based clustering algorithm generated clusters with a broad distribution of comorbidities. Lymphoma and skin cancers were relatively more frequent in one cluster and fibromyalgia and mental health issues in another cluster. The hierarchical methods (DIANA and AGNES) gave similar answers to each other that differed from those of the first three methods. The DIANA and AGNES algorithms each created one cluster with much higher frequencies of all comorbidities.

To better understand the potential role of the different ML clustering algorithms in clinical research, we examined their relationship with two different outcomes: CDAI and HAQ-DI. The distributions of CDAI and HAQ-DI during the first 3 years and the subsequent 3 years were assessed (Fig. 1). At baseline, approximately 50% of CDAI scores were in the remission or low disease activity range and the remainder were evenly split between moderate and high disease activity. At follow-up, these proportions remained stable. For the HAQ-DI, at baseline, approximately 90% had scores of 1 or below (little or no assistance with typical activities). This remained stable at follow-up.

Fig. 1

Distribution of CDAI and HAQ-DI. Time-averaged value of CDAI at baseline (A) and during the final 3 years of study follow-up for patients in CorEvitas (B). Time-averaged value of HAQ-DI at baseline (C) and during the final 3 years of study follow-up for patients in CorEvitas (D)

The regression models for time-averaged CDAI are shown in Table 3. The first model (far left) includes only demographics and RA variables and no comorbidities; the R² was 0.30 and the RMSE 7.07. Adding all comorbidities individually yielded a slightly higher R² (0.33) and a slightly lower RMSE (6.95). When the five clusters from K-modes were substituted for the individual comorbidities, the model fit hardly changed. The same was observed for clusters generated by regression and the DIANA hierarchical algorithm. The same pattern was observed for the time-averaged HAQ-DI endpoint (Table 4). The sensitivity analyses (sex-stratified analyses, baseline versus post-baseline comorbidities, and change in time-averaged CDAI or HAQ-DI) found very similar results (see Supplemental Tables 4, 5, and 6). However, the models with a change in time-averaged outcome had slightly better model fit than the models with the primary outcome.

Table 3 Multivariable regression models comparing models for time-averaged CDAI outcome, individual comorbidities versus comorbidity clusters
Table 4 Multivariable regression models comparing models for time-averaged HAQ-DI outcome, individual comorbidities versus comorbidity clusters

Discussion

It has become popular to examine how patients cluster based on their comorbidities in rheumatic diseases [6,7,8]. This is often achieved with some form of ML clustering algorithm. While clustering of patients based on comorbidities is intended to provide a deeper understanding of the heterogeneity of diseases such as RA, the clusters are not always very interpretable; further, it is hard to gauge whether the clusters have provided more information than the individual comorbidities. To examine this issue, we used a very large longitudinal RA registry to characterize comorbidity clusters using ML algorithms. The clusters varied based on the ML algorithm used. To examine whether the different algorithms provided new information beyond the individual comorbidities, clinical outcomes models were assessed for CDAI and HAQ-DI. Model results demonstrated that the clusters performed similarly to each other and similarly to models with individual comorbidities; this was true across outcomes.

Two recent studies, both using ML, have examined whether informative patient clusters based on comorbidities could be identified among patients with RA. The first used data from a single-center registry and developed principal components, which were then clustered using K-means clustering [7]. From 1443 patients, 5 clusters were determined that differed in disease activity, comorbidity scores, and outcomes such as infection. This study was limited in several important ways. In addition to comprising only one academic rheumatology practice, it applied cluster analyses to baseline comorbidities only, without accounting for the development of additional comorbid conditions over time. Further, it was not clear what the clusters of comorbidities added incrementally over considering each comorbid condition individually. The second study, from the Mayo Clinic, included 1409 patients with RA [16]. Several ML algorithms were used, including hierarchical clustering, network analysis, and latent class analysis. Different methods yielded different numbers of clusters.

Machine learning algorithms provide relatively easy methods to cluster many variables across hundreds or thousands of patients. These methods have become popular for various types of high-dimensional data, such as proteomics, transcriptomics, and genomics. It is natural for clinical researchers to import such methods into analyses of clinical variables; however, it is not clear whether these methods add much over more traditional analyses. While the current analyses did not suggest that comorbidity clusters explained much of the variation in CDAI or HAQ-DI, these clusters may be more useful in explaining other clinical outcomes, e.g., treatment response. Our findings suggest that ML clustering algorithms can be used on comorbidity data to define groups of patients based on the many varied conditions patients have other than RA. However, the resulting clusters can be difficult to interpret. Not surprisingly, the clusters derived from these ML algorithms are not better predictors than individual comorbidities, but they do produce clinical models with similar overall fit. Similar fit with the clusters is impressive, and since the clusters collapse 24 variables into five, this approach is statistically more efficient. However, the clusters cannot be produced without knowing the individual comorbidities. Thus, the value of clustering comorbidities in clinical analyses is not perfectly clear.

The fact that the clusters have similar value to the individual comorbidities suggests that the 24 different comorbidities may not need to be collected if the clusters are known. However, since the clusters do not have clear face validity, it is not apparent that clinicians can recognize patients that occupy one cluster or another. Clustering algorithms can be used to describe phenotypes of heterogeneous diseases like RA; however, comorbidity data may not be that helpful for defining these sub-phenotypes. Further, the different ML algorithms defined different clusters, suggesting that the clustering did not describe a “biologic” truth; rather, it likely represented a statistical phenomenon.

This study has several important strengths, including a very large sample size with longitudinal data. In addition, the cohort is derived from many practices across the USA, including both community-based and academic rheumatology practices. Limitations include the fact that comorbidity reporting may be incomplete and not consistently defined across clinicians. Also, some of the comorbid conditions are self-reported and thus there is likely misclassification.

Conclusions

In conclusion, we defined clusters of RA patients based on comorbidities, using several ML algorithms. Different algorithms produced different clusters, many of which were hard to understand clinically. However, in clinical outcomes models, the clusters performed similarly to each other and to the individual comorbidities. Comorbidity clusters seem to be useful in clinical outcomes models due to their statistical efficiency. However, it is not clear that they provide new insights beyond the individual variables. While ML clustering algorithms have a clear role in multi-dimensional biologic data, their role in clinical research in rheumatology needs continued assessment. We recommend that future comorbidity clustering studies be designed with a clear purpose in mind for the clustering, such as identifying a small number of clusters that best predict future outcomes.