Background

Chronic pain is one of the leading causes of disability, affecting more than 30% of people worldwide [1]. Depression is also a leading cause of disability, affecting approximately 5% of adults worldwide [2, 3]. It is generally understood that chronic pain and depression are commonly co-morbid disorders [1, 4]. Indeed, research suggests that chronic pain increases the risk of depression, and depression increases the risk of chronic pain [5, 6]. However, the prevalence of depression among people living with chronic pain remains unclear [1, 7]. Previous studies have reported prevalence estimates ranging from about 15% to 85% [8,9,10]. There are several possible reasons for such variation across studies. Firstly, measures and definitions of depression vary considerably across studies. For example, some studies measure current depression, while others measure lifetime depression [10]. Secondly, the extent of pain varies (e.g., regional vs widespread pain). Thirdly, the demographic and health characteristics of the populations sampled vary. For example, people with chronic pain who are female, have additional chronic health conditions, or have a lower socioeconomic status are thought to be at higher risk of depression [11].

A clinical prediction model estimates the risk of a particular endpoint for an individual patient by combining multiple predictors. Such a model could be a useful way to estimate the probability that a patient with chronic pain has depression based on their individual characteristics [12]. Recently published methodological papers have provided a framework for the development of valid clinical prediction models [13,14,15].

The selection of the appropriate dataset is important for the development of a valid clinical prediction model. Among potentially suitable datasets, we selected the UK Biobank dataset for the following reasons. Firstly, at its baseline visit, the UK Biobank recruited about 0.5 million participants across the UK, which provided a large sample size to start a study. Secondly, the “experience of pain” questionnaire (2019–2020) provides a comprehensive assessment of chronic pain, including regional or widespread pain, neuropathic or non-neuropathic pain, and pain location that bothers you most. Thirdly, the validity of the measurement of depression in the “online mental health self-assessment” questionnaire (2016–2017) is supported by a dual approach that includes both secondary care record linkage (i.e., diagnosis by a professional) and self-reporting of symptoms [16]. Using this dataset, we aimed to develop and internally validate clinical prediction models of depression among individuals with chronic pain.

Methods

Study sample

This study used data from the UK Biobank, a large-scale biomedical database that recruited approximately 500,000 people in the UK at its initial enrollment (from 13 March 2006 to 1 October 2010). A subset of these participants received follow-up surveys. For example, about 157,000 participants received the “online mental health self-assessment” questionnaire from 13 July 2016 to 27 July 2017, and about 167,000 participants received the “experience of pain” questionnaire from 9 January 2019 to 18 April 2020 [17]. More details about the UK Biobank can be found in the registry online protocol: http://www.ukbiobank.ac.uk. The North West Multi-centre Ethics Committee granted ethical approval to access data from the UK Biobank, and all participants provided written informed consent.

To define chronic pain, we selected the “experience of pain” questionnaire (2019–2020) rather than the baseline visit (2006–2010) for the following reasons. Firstly, the number of pain types in the “experience of pain” questionnaire was much higher than at the baseline visit (i.e., 15 in the “experience of pain” questionnaire compared with 8 at the baseline visit). Secondly, the “experience of pain” questionnaire collected a number of additional pain-related variables (e.g., neuropathic pain or not, and the pain area that bothers you the most). To align the timing of the chronic pain and depression measurements, the analysis sample was restricted to participants who reported having pain for more than 5 years in the “experience of pain” questionnaire (2019–2020) and who completed the “online mental health self-assessment” questionnaire (2016–2017). With this restriction, the reported onset of pain preceded the mental health assessment. Based on the International Classification of Diseases 11th Revision definitions for chronic pain and the data availability of UK Biobank, chronic pain was classified as widespread pain (through the question “have you experienced pain or discomfort all over the body?”) and regional pain (i.e., leg pain, chest pain, feet pain, hand pain, arm pain, knee pain, hip pain, stomach or abdominal pain, back pain, neck or shoulder pain, facial pain, and headache) [18].

Although previous literature suggested that multisite pain is strongly related to mood disorders and played an important role in the development of chronic pain, UK Biobank has created a new question, “the pain area that bothers you the most,” in consideration of the fact that many people have multiple pains [19, 20]. Therefore, we included the pain area that bothers you the most as one of the predictors. We also collected the nature of pain (neuropathic and non-neuropathic pain) as one pain-related characteristic [21]. Details for defining pain can be found in Supplementary A.

Outcomes

We followed the framework proposed by the UK Biobank team to define depression [16]. The primary outcome was a “lifetime” history of depression rather than present depression, because many mental disorders (e.g., depression) can fluctuate; including those with a “lifetime” history therefore captures those with the condition more comprehensively. A dual approach was used to define a “lifetime” history of depression, which included both secondary care record linkage (i.e., diagnosed by a professional) and self-report of symptoms through the Composite International Diagnostic Interview-Short Form (CIDI-SF), depression module, lifetime version. The CIDI-SF is a simplified version of the full CIDI [22], which is a fully structured diagnostic interview, and one previous validation study showed that the CIDI-SF had comparable accuracy for diagnosing major depressive episodes when compared to the CIDI [23]. Two reasons justified the choice of the dual approach. Firstly, a traditional full-version diagnostic interview is too expensive to be implemented in a cohort with a large sample size (e.g., UK Biobank). Secondly, secondary care record linkage can fail to identify patients with less severe illnesses, as these patients are less likely to seek professional help than patients with more severe illnesses [24]. Through this dual approach, all participants were classified as having no “lifetime” history of depression, a “lifetime” history of subthreshold depressive symptoms, or a “lifetime” history of depression.

Following the framework proposed by the UK Biobank team, the secondary outcome was present depression [16]. It is worth noting that the UK Biobank team identified present depression only among participants with a history of depression, but did not provide clear justification for this approach. Readers should be aware of this point when interpreting the results. Present depression was defined through the Patient Health Questionnaire 9-question version (PHQ-9). The PHQ-9 is a validated tool that comprises nine short screening questions and is widely used in screening for depression [25].
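For readers unfamiliar with the PHQ-9, the following Python sketch shows how the instrument is conventionally scored: nine items, each rated 0 to 3, summed to a total of 0 to 27. The function names are hypothetical, and the cut-off of 10 is a commonly used screening threshold assumed here for illustration only; it is not necessarily the rule applied in the UK Biobank algorithm, which is documented in the R code referenced in the next paragraph.

```python
def phq9_total(responses):
    """Sum the nine PHQ-9 item scores (each item is rated 0, 1, 2, or 3)."""
    if len(responses) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each PHQ-9 item must be scored 0, 1, 2, or 3")
    return sum(responses)


def screens_positive(total, cutoff=10):
    """Flag a total score at or above the cut-off.

    ASSUMPTION: cutoff=10 is a widely used screening convention, not
    necessarily the definition of present depression used in this study.
    """
    return total >= cutoff


# Hypothetical respondent, for illustration only.
example_responses = [2, 1, 2, 1, 1, 2, 1, 0, 0]
example_total = phq9_total(example_responses)
example_flag = screens_positive(example_total)
```

Keeping the cut-off as a parameter makes it easy to substitute whichever rule a given study actually specifies.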

The detailed algorithms and the corresponding R code used to define the above outcomes were provided by the UK Biobank team and are available at https://data.mendeley.com/datasets/kv677c2th4/3.

Covariates

Previous systematic reviews have identified factors that are known to increase the risk of depression [11, 26, 27]. Based on these findings and data availability in the UK Biobank and in daily practice, we considered the following variables as covariates: demographic characteristics (age, gender, ethnicity, and Townsend deprivation score, which reflects socioeconomic status), body mass index (BMI), lifestyle behaviors (smoking status, alcohol consumption, and physical activity), comorbidities as identified in the recent international consensus on the definition of multimorbidity for research purposes (i.e., stroke, coronary artery disease, heart failure, peripheral artery disease, diabetes, Addison’s disease, cystic fibrosis, chronic obstructive pulmonary disease, asthma, Parkinson’s disease, epilepsy, multiple sclerosis, paralysis, solid organ cancers, hematological cancers, metastatic cancers, dementia, schizophrenia, connective tissue disease, chronic liver disease, inflammatory bowel disease, chronic kidney disease, end-stage kidney disease, and HIV/AIDS) [28], and regular opioid use. For participants with chronic regional pain, nature of pain and pain location that bothers you most were also added. Definition details can be found in Supplementary B. Other pain severity-related variables were not included as predictors because of concerns about potential measurement bias. For example, pain intensity was measured through the question “Thinking about the last 24 hours, how would you rate your pain on a 0-10 scale, where 0 is ‘no pain’ and 10 is ‘pain as bad as it could be’,” which may not align with the timeline of when patients completed the mental health questionnaire.

Statistical analysis

Baseline characteristics for participants with chronic pain were shown by depression status. Overall and subgroup prevalence of having: (1) a “lifetime” history of depression among participants with chronic widespread pain; (2) a “lifetime” history of depression among participants with chronic regional pain; (3) present depression among participants with chronic widespread pain; (4) present depression among participants with chronic regional pain were provided. Subgroup analyses were performed based on the “one covariate at a time” principle by each of the variables mentioned in the covariates section. Wald statistic was used to assess whether the prevalence differed by each covariate [29].
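As an illustration of the prevalence calculation, the following Python sketch computes a prevalence and a 95% confidence interval from event counts. It is not the authors' R code: the function name `prevalence_with_wilson_ci` is hypothetical, and the Wilson score interval is an assumed choice, since the study does not state an interval method for prevalence. The example counts are those reported in the Results for a “lifetime” history of depression among participants with chronic widespread pain (1716/3757).

```python
import math
from statistics import NormalDist


def prevalence_with_wilson_ci(events, total, level=0.95):
    """Return (prevalence, lower, upper) using the Wilson score interval.

    ASSUMPTION: the Wilson interval is used for illustration; the study
    reports point prevalences and does not specify an interval method.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    p = events / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Counts taken from the Results (chronic widespread pain, primary outcome).
prevalence, lower, upper = prevalence_with_wilson_ci(1716, 3757)
print(f"{prevalence:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

The same function can be applied to each subgroup count in turn to mirror the “one covariate at a time” subgroup summaries.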

Prediction models (through logistic regression) to estimate the probability of depression for individuals with chronic pain were developed. The choice of logistic regression was based on its ease of understanding and communication, as well as its ability to handle binary outcomes [30]. To ensure precise predictions and prevent overfitting, the maximum number of candidate predictor parameters was estimated based on the criteria proposed (details in Supplementary C) by Riley et al. [31]. To minimize the influence of sparse data from binary predictors, we excluded predictors if the number of events in one level of the predictor was less than 10. If the remaining predictors were still more than the estimated maximum number, we excluded predictors with an insignificant Wald statistic. Considering most covariates have a small quantity of missing data (details in Table 1), a single imputation through the transcan function (i.e., a nonlinear additive transformation and imputation function) was used [29].
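The sparse-data rule described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's R code: the function name, the input layout (event counts per level of each binary predictor), and the example counts are all hypothetical.

```python
def drop_sparse_binary_predictors(events_by_level, min_events=10):
    """Split predictors into (kept, dropped) using the minimum-events rule.

    events_by_level maps a predictor name to a dict of
    {level: number of outcome events observed in that level}.
    A predictor is dropped if any level has fewer than min_events events.
    """
    kept, dropped = [], []
    for name, level_events in events_by_level.items():
        if min(level_events.values()) < min_events:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# HYPOTHETICAL counts, for illustration only (not taken from the study).
example_counts = {
    "asthma": {"yes": 120, "no": 1596},
    "cystic_fibrosis": {"yes": 2, "no": 1714},
    "heart_failure": {"yes": 10, "no": 1706},
}
kept, dropped = drop_sparse_binary_predictors(example_counts)
print("kept:", kept)
print("dropped:", dropped)
```

Predictors that survive this filter would then be compared against the maximum number of candidate parameters before any further exclusion.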

Table 1 Baseline characteristics for participants with chronic pain stratified by depression status

The modeling strategy we used was adapted from Harrell’s Regression Modeling Strategies (detailed in Fig. 1) [29]. The full model, including all pre-specified predictors without variable selection, was considered the gold standard. However, clinicians may have insufficient resources (e.g., time) to collect all these predictors. Thus, a simplified model may be needed in daily practice. One significant benefit of Harrell’s simplified model is that it offers varying degrees of parsimony to clinicians based on their specific needs. This is achieved by estimating the contribution of each predictor. In our study, we provide two examples. Firstly, the simplified model (reported as equations and nomograms) retains at least 95% of the performance of the full model. Secondly, we assume that the clinician only wants to collect the three most important predictors.

Fig. 1 Summary of the modeling strategy

Model performance was assessed by discrimination (through the optimism-corrected C statistic) and calibration (through a calibration plot) [12]. Optimism is defined as a bias due to overfitting and was estimated with the bootstrap, a class of resampling methods that draws samples from the original dataset with replacement. In each bootstrap replicate, the model was refitted in the bootstrap sample, and the optimism was estimated as the C statistic of that refitted model in the bootstrap sample minus its C statistic in the original sample. In our study, this process was repeated 1000 times to get an average optimism. The final reported optimism-corrected C statistic equals the C statistic from the original sample minus the average optimism [29]. In addition, the C statistic with its 95% confidence interval from 10-fold cross-validation was provided. We checked whether the two continuous variables (age and BMI) should be modeled through splines, and the results showed that they could be analyzed in their original form. Based on clinical knowledge and other literature, we assessed the potential interaction between age and ethnicity, and the results showed that we did not need to include this interaction term in the model [32]. Details of the modeling can be found in Supplementary D.
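The bootstrap optimism correction can be sketched in a few lines. The Python example below is a simplified illustration, not the study's R implementation: it fits a small logistic model by plain gradient ascent, uses simulated data, and runs only a few dozen bootstrap replicates instead of 1000. All function names, the simulated predictors, and the tuning values (`steps`, `rate`, `n_boot`) are assumptions made for the sketch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, steps=200, rate=0.5):
    """Fit intercept and slopes by gradient ascent on the mean log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            resid = y - sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        weights = [w + rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, rows):
    return [sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x))) for x in rows]


def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = sum(1.0 if e > n else 0.5 if e == n else 0.0
                     for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))


def optimism_corrected_c(rows, outcomes, n_boot=30, seed=1):
    """Return (apparent C, optimism-corrected C)."""
    rng = random.Random(seed)
    apparent = c_statistic(predict(fit_logistic(rows, outcomes), rows), outcomes)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        b_rows = [rows[i] for i in idx]
        b_y = [outcomes[i] for i in idx]
        if len(set(b_y)) < 2:
            continue  # skip degenerate resamples with a single outcome class
        w = fit_logistic(b_rows, b_y)
        # Optimism: performance in the bootstrap sample minus in the original sample.
        optimism.append(c_statistic(predict(w, b_rows), b_y)
                        - c_statistic(predict(w, rows), outcomes))
    if not optimism:
        raise ValueError("no usable bootstrap replicates")
    return apparent, apparent - sum(optimism) / len(optimism)


def simulate(n=150, seed=0):
    """SIMULATED data: one informative predictor and one pure-noise predictor."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    outcomes = [1 if rng.random() < sigmoid(1.5 * x[0]) else 0 for x in rows]
    return rows, outcomes
```

Calling `optimism_corrected_c(*simulate())` returns the apparent and corrected C statistics for the simulated data. In practice the whole model-building procedure, including any predictor selection, would be repeated inside each bootstrap replicate.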

For chronic regional pain, although one prediction model may not work well for different categories, we did not develop a clinical prediction model for each category as the sample size may be insufficient. To explore the robustness of the prediction model for the overall chronic regional pain, we performed an additional analysis by evaluating its model performance for each category of chronic regional pain.

We reported this study in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) and Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statements [33, 34]. All statistical analyses were performed in R, version 4.2.2 (R Foundation for Statistical Computing).

Results

Of the UK Biobank participants, 24,405 participants with chronic pain were included: 912 (3.7%) had present depression, 7952 (32.6%) had a “lifetime” history of depression, 5317 (21.8%) had a “lifetime” history of subthreshold depressive symptoms, and 11,137 (45.6%) had no “lifetime” history of depression. Figure 2 shows the selection process. Table 1 reports the participants’ baseline characteristics. Among included participants, 9228 (37.8%) were men, the mean (SD) age was 64.1 (7.5) years, and 23,706 (97.1%) were white. Univariate associations of the covariates with depression outcomes could be found in Supplementary E.

Fig. 2 Flowchart of study participants

Primary outcome

Among participants with chronic widespread pain, the prevalence of having a “lifetime” history of depression was 45.7% (1716/3757) (Table 2). Subgroup analyses revealed that the prevalence ranged from 25.0 to 66.7% (Table 2). Twenty-six predictors were included in the initial full prediction model (Supplementary F). The final simplified model (Supplementary G) with nine predictors (age, BMI, smoking status, physical activity, Townsend deprivation score, gender, history of asthma, history of heart failure, and history of peripheral artery disease) was built, with its equation in Supplementary H and the nomogram in Fig. 3. The prediction model showed moderate discrimination (optimism-corrected C statistic was 0.66; C statistic from the 10-fold cross-validation: 0.67, 95% confidence interval [CI] 0.65 to 0.69) and good calibration (on the calibration plot) (Supplementary I). Age (as age increases by one year, the odds of having a “lifetime” history of depression decrease: odds ratio [OR] 0.94, 95% CI 0.93 to 0.95), gender (compared to females, males were less likely to have a “lifetime” history of depression: OR 0.56, 95% CI 0.47 to 0.65), and BMI (as the value of BMI increases by one, the odds of having a “lifetime” history of depression also increase: OR 1.02, 95% CI 1.01 to 1.03) were the three most important predictors.

Table 2 Overall and subgroup prevalence of depression among participants with chronic pain
Fig. 3 Nomogram for estimating the probability of having a “lifetime” history of depression for individuals with chronic widespread pain. Gender: male = 1 and female = 0. History of one comorbidity: yes = 1 and no = 0. Instructions for the use of the nomogram: (1) locate the answer for each predictor, (2) draw a straight line upward to the point axis and record the score, (3) calculate the total score for all predictors and locate the score in the total points axis, (4) draw a straight line downward to the probability axis to estimate the individual’s probability of having a “lifetime” history of depression.

Among participants with chronic regional pain, the prevalence of having a “lifetime” history of depression was 30.2% (6235/20,648) (Table 2). Subgroup analyses revealed that the prevalence ranged from 21.4 to 70.6% (Table 2). Thirty predictors were included in the initial full prediction model (Supplementary F). The final simplified model (Supplementary G) with eight predictors (age, gender, nature of pain, smoking status, regular opioid use, history of asthma, pain location that bothers you most, and BMI) was built, with its equation in Supplementary H and the nomogram in Fig. 4. The prediction model showed moderate discrimination (optimism-corrected C statistic was 0.65; C statistic from the 10-fold cross-validation: 0.66, 95% CI 0.65 to 0.66) and good calibration (on the calibration plot) (Supplementary I). Age (as age increases by one year, the odds of having a “lifetime” history of depression decrease: OR 0.96, 95% CI 0.96 to 0.96), gender (compared to females, males were less likely to have a “lifetime” history of depression: OR 0.53, 95% CI 0.50 to 0.57), and nature of pain (compared with patients with non-neuropathic pain, patients with neuropathic pain were more likely to have a “lifetime” history of depression: OR 1.47, 95% CI 1.36 to 1.58) were the three most important predictors.

Fig. 4 Nomogram for estimating the probability of having a “lifetime” history of depression for individuals with chronic regional pain. Gender: male = 1 and female = 0. Nature of pain: neuropathic pain = 1 and non-neuropathic pain = 0. Regular opioid use: yes = 1 and no = 0. History of asthma: yes = 1 and no = 0. Pain location that bothers you most: arm pain = a, back pain = b, chest pain = c, facial pain = d, feet pain = e, hand pain = f, headache = g, hip pain = h, knee pain = i, leg pain = j, neck or shoulder pain = k, and stomach or abdominal pain = l. Instructions for the use of the nomogram: (1) locate the answer for each predictor, (2) draw a straight line upward to the point axis and record the score, (3) calculate the total score for all predictors and locate the score in the total points axis, (4) draw a straight line downward to the probability axis to estimate the individual’s probability of having a “lifetime” history of depression.

Secondary outcome

Among participants with chronic widespread pain, the prevalence of having present depression was 10.5% (396/3757) (Table 2). Subgroup analyses revealed that the prevalence ranged from 4.5 to 33.3% (Table 2). In total, 13 predictors were included in the initial full prediction model (Supplementary F). The final simplified model (Supplementary G) with seven predictors (age, BMI, smoking status, physical activity, Townsend deprivation score, history of peripheral artery disease, and history of chronic kidney disease) was built, with its equation in Supplementary H and the nomogram in Supplementary J. The prediction model showed moderate discrimination (optimism-corrected C statistic was 0.75; C statistic from the 10-fold cross-validation: 0.76, 95% CI 0.74 to 0.79) and good calibration (on the calibration plot) (Supplementary I). Age (as age increases by one year, the odds of having present depression decrease: OR 0.91, 95% CI 0.90 to 0.93), BMI (as the value of BMI increases by one, the odds of having present depression also increase: OR 1.04, 95% CI 1.02 to 1.06), and smoking status (compared to current smokers, both former [OR 0.62, 95% CI 0.44 to 0.86] and never [OR 0.47, 95% CI 0.34 to 0.65] smokers were less likely to have present depression) were the three most important predictors.

Among participants with chronic regional pain, the prevalence of having present depression was 2.5% (516/20,648) (Table 2). Subgroup analyses revealed that the prevalence ranged from 1.4 to 26.7% (Table 2). In total, 17 predictors were included in the initial full prediction model (Supplementary F). The final simplified model (Supplementary G) with 10 predictors (age, BMI, nature of pain, pain location that bothers you most, Townsend deprivation score, regular opioid use, physical activity, smoking status, history of diabetes, and history of chronic obstructive pulmonary disease) was built, with its equation in Supplementary H and the nomogram in Supplementary J. The prediction model showed moderate discrimination (optimism-corrected C statistic was 0.74; C statistic from the 10-fold cross-validation: 0.75, 95% CI 0.73 to 0.77) and good calibration (on the calibration plot) (Supplementary I). Age (as age increases by one year, the odds of having present depression decrease: OR 0.93, 95% CI 0.92 to 0.94), BMI (as the value of BMI increases by one, the odds of having present depression also increase: OR 1.06, 95% CI 1.04 to 1.07), and nature of pain (compared with patients with non-neuropathic pain, patients with neuropathic pain were more likely to have present depression: OR 1.71, 95% CI 1.40 to 2.10) were the three most important predictors.
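To help interpret the per-unit odds ratios reported above, the following Python sketch shows two standard pieces of odds-ratio arithmetic: rescaling a per-unit OR to a larger increment, and applying an OR to a baseline probability. The function names are hypothetical, and the baseline probability used in the example (the overall 2.5% prevalence of present depression in chronic regional pain) is an illustrative stand-in, not the model's reference-group risk, so the resulting probability is not a prediction from the study's model.

```python
def scale_odds_ratio(or_per_unit, increment):
    """Odds ratio for a change of `increment` units, given the per-unit OR.

    In a logistic model the log-odds are linear in the predictor, so the
    per-unit OR is raised to the power of the increment.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return or_per_unit ** increment


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a baseline probability to odds, multiply by the OR, convert back."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    odds = baseline_probability / (1 - baseline_probability) * odds_ratio
    return odds / (1 + odds)


# Age OR of 0.93 per year (from the text), expressed per 10 years of age.
or_per_decade = scale_odds_ratio(0.93, 10)

# ILLUSTRATIVE ONLY: neuropathic pain OR of 1.71 applied to an assumed
# baseline probability of 2.5%.
illustrative_probability = apply_odds_ratio(0.025, 1.71)

print(f"OR per 10 years of age: {or_per_decade:.2f}")
print(f"Illustrative probability: {illustrative_probability:.1%}")
```

Because odds ratios act on odds and not on probabilities, the change in probability for a given OR depends on the baseline risk.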

Additional analyses

For the primary outcome (i.e., a “lifetime” history of depression), the results showed that the model developed for overall chronic regional pain also worked well for all categories of chronic regional pain (optimism-corrected C statistics: 0.62 to 0.67) except for stomach pain (optimism-corrected C statistic: 0.59). For the secondary outcome (i.e., present depression), the results showed that the model developed for overall chronic regional pain worked well for all categories of chronic regional pain (optimism-corrected C statistics: 0.69 to 0.78) except for chest pain (optimism-corrected C statistic: 0.64), feet pain (optimism-corrected C statistic: 0.65), hand pain (optimism-corrected C statistic: 0.66), and headache (optimism-corrected C statistic: 0.60).

Discussion

Key results

We found that there was substantial variability in the prevalence of having a “lifetime” history of depression among patients with chronic pain. Among participants with chronic widespread pain, the prevalence of having a “lifetime” history of depression was 45.7%; subgroup analyses indicated that the prevalence ranged from 25.0 to 66.7%.

This study developed and evaluated clinical prediction models to estimate the probability of having a “lifetime” history of depression among patients with chronic pain. Among participants with chronic widespread pain, the final clinical prediction model consisted of nine predictors, including age, BMI, smoking status, physical activity, Townsend deprivation score, gender, history of asthma, history of heart failure, and history of peripheral artery disease. Among participants with chronic regional pain, the final clinical prediction model consisted of eight predictors, including age, gender, nature of pain, smoking status, regular opioid use, history of asthma, pain location that bothers you most, and BMI.

Comparison with previous studies

Using the terms “chronic pain,” “depression,” and “UK Biobank” in PubMed (from inception to March 1, 2024), we found 13 studies that examined chronic pain and depression using UK Biobank data [19, 20, 35,36,37,38,39,40,41,42,43,44,45]. Of the 13 studies, five focused on genetic information [35, 36, 39, 41, 45], five were association analyses [19, 37, 38, 42, 44], one examined the role of coffee in the association between chronic pain and depression [43], one was a clinical prediction model for the development and spread of chronic pain [20], and one assessed risk factors for facial pain [40]. We also extended the search to clinical prediction models based on other datasets and found no other relevant studies. Therefore, to our knowledge, this is the first study to develop prediction models that estimate individuals’ probability of experiencing depression among participants with chronic pain. Our models, reported in accordance with the TRIPOD guidelines, showed moderate discrimination and good calibration.

Although our study could not answer the question of bidirectional causality between chronic pain and depression, readers should bear in mind the complex interplay between chronic pain and depression when interpreting the results. Previous studies have reported the role of depression in the chronicity of pain, especially the nociplastic type of pain (fibromyalgia), and the role of chronic pain in the development of depression [20, 46]. Repeated measurements of both pain and depression could facilitate a deeper exploration into whether chronic pain predisposes patients to depression or vice versa [47].

Limitations

Several limitations should be noted. Firstly, the difference in the measurement time for chronic pain and depression status might introduce bias. Although we restricted the analysis sample to those whose pain duration was more than 5 years, we could not totally exclude the influence of recall bias [48]. We also did not find any formal analysis assessing the reliability of this retrospective way of defining chronic pain; further studies should be performed to assess the accuracy of the estimate. Secondly, genetic information (e.g., polygenic risk scores) may add value in predicting depression among individuals with chronic pain, as previous studies have found a genetic relationship between pain and depression [49, 50]. However, the data we requested for this project did not include genetic data. This should be investigated in further studies. Thirdly, we did not include the number of pain sites as a predictor because of measurement issues in the relevant UK Biobank questionnaire. This variable may nevertheless provide useful information and should be collected in future studies with more accurate pain questionnaires. Fourthly, although an external validation would be beneficial, a suitable dataset for comparison with the UK Biobank was not found. Further validation studies that prospectively collect data with a comprehensive assessment of chronic pain and depression status are therefore still needed. Fifthly, participants in this study were from the UK and aged 47 to 80 years, meaning results may not be generalizable to other countries or age groups. Finally, non-white patients were grouped into one category (i.e., ethnicity was treated as a binary variable: white and non-white) in this study to facilitate analysis. However, this approach might mask differences among these non-white patients, which should be explored in future studies.

Implications for clinical practice and future research

Results from this study can support clinicians in deciding upon treatment priorities for the patient. Importantly, the predictors included are easily collected by clinicians. To further enhance the model, future researchers should focus on improving the quality of the measurement instruments and look to objective assessment when possible. They should also consider other potentially important predictors to improve the predictive accuracy of the model, such as genetic information. Finally, external validation should take place. As Riley et al. mentioned in their new methodological paper, researchers should focus on the target population and setting in which the model is planned to be implemented, especially when the intended population or setting is different from the one in which the model was developed (e.g., UK Biobank) [14].

Conclusions

There was substantial variability in the prevalence of depression among patients with chronic pain. Clinically relevant factors were selected to develop prediction models. Clinicians can use these models to assess patients’ treatment needs. These predictors are convenient to collect during daily practice, making it easy for busy clinicians to use them.