figure b

Introduction

Insulin action has been quantitatively defined as its ability to regulate glucose disposal [1]. Insulin resistance (IR) is a physiological state whereby glucose disposal is impaired and accompanied by a compensatory hyperinsulinaemia [1]. IR is a primary risk factor for the development of type 2 diabetes and non-alcoholic fatty liver disease [1]. Through the promotion of atherogenic dyslipidaemia and hypertension, IR is a strong risk factor for atherosclerotic CVD (ASCVD) [1].

The euglycaemic–hyperinsulinaemic clamp (EIC) is widely accepted as the reference standard for directly assessing insulin sensitivity [2, 3]. During an EIC, the plasma insulin concentration is acutely raised and maintained at a constant level by a primed continuous infusion of insulin [2, 3]. The plasma glucose concentration is held constant at basal levels by a variable rate glucose infusion using the negative feedback principle [2, 3]. Under these steady-state conditions of euglycaemia, the glucose infusion rate (‘M value’) equals glucose uptake by all the tissues in the body and is therefore a measure of tissue sensitivity to exogenous insulin [2, 3]. Insulin sensitivity as estimated by the EIC has been linked to incident type 2 diabetes [4], ASCVD [5, 6] and heart failure [7], but is invasive, labour-intensive and expensive to perform [2, 3]. For this reason, the test is invariably substituted in epidemiological studies by simpler surrogate indexes including fasting insulin, the HOMA-IR index, the QUICKI or OGTT-based measures [8].

The prevalence of IR worldwide is increasing at an alarming rate, secondary to the obesity pandemic and decreasing levels of physical activity [9]. Thus, a critical need exists for more accurate diagnostic tests of insulin sensitivity that are able to detect those at greatest risk of adverse metabolic consequences of IR [10, 11]. Surrogate measures of IR possess suboptimal diagnostic sensitivity, especially among people without obesity, and are hampered by the lack of standardisation of the insulin assay [11, 12]. Diagnostic approaches leveraging blood-based signatures derived from the measurement of multiple biomarkers have shown promise and may allow for the more reliable identification of individuals at high cardiometabolic risk [13]. Here, we assess the utility of this approach in explaining the variability in insulin sensitivity as estimated by the M value using high-throughput plasma proteomics in two of the largest studies to date that have implemented the EIC: the Relationship between Insulin Sensitivity and Cardiovascular disease (RISC) [14] and the Uppsala Longitudinal Study of Adult Men (ULSAM) [15].

Methods

Study populations

The RISC and ULSAM studies are described extensively elsewhere [14, 15]. Briefly, RISC was a prospective observational cohort study of 1037 healthy people aged 30–60 years from 19 centres in 14 European countries whose main aim was to examine whether insulin sensitivity independently predicts CVD risk over 3–10 years’ follow-up [14].

The ULSAM study is an ongoing longitudinal epidemiological study based on all available men born between 1920 and 1924 and living in Uppsala County, Sweden [15]. The men were interviewed and examined at the ages of 50, 60, 70, 77, 82, 88 and 93 years, but the EIC was performed once at the age 70 visit in the early 1990s [6, 7].

Measurement of protein biomarkers

We measured a total of 828 proteins in the plasma of RISC and ULSAM participants using the proximity extension assay (PEA) developed by Olink [16]. A detailed description is available in the electronic supplementary material (ESM Methods).

We used plasma obtained at the baseline visits between 2002 and 2004 for 1037 RISC participants and at the 1991–1995 visits for 954 ULSAM participants. For RISC, blood used for protein analysis was drawn within 1 month of the day of the EIC while, for ULSAM, blood was drawn on the same day as the EIC.

Outcome measure and covariates

In each study, the M value, the rate of glucose disposal, was calculated as the amount of glucose taken up during the EIC study and was transformed to milligrams per kilogram body weight per minute. More details are provided in ESM Methods.

From the clinical databases of each study, we extracted covariates related to IR that were collected using standardised study protocols at the same time that the EIC was performed to include in our multivariable analyses, including age, sex (where applicable), BMI, systolic blood pressure (SBP) and standardised measures of cholesterol [4, 14].

Statistical analyses

Figure 1 summarises the different data sources used and our analytical approach. We applied the least absolute shrinkage and selection operator (LASSO) method to develop regression models incorporating multiple protein levels [17]. The M value was analysed as a continuous variable, and we also analysed IR as a binary variable, comparing people below the first quartile of the M value to the 75% above it. This quartile threshold has been previously linked to a sharp increase in adverse health outcomes related to IR [18]. In the RISC and ULSAM cohorts separately, models were trained on a randomly selected 70% of the cohort and tested on the remaining 30%. In addition, we assessed the portability of the RISC model to the ULSAM cohort and vice versa.

Fig. 1
figure 1

Flow chart showing the different data sources and design used in this study

We applied LASSO regression in models that included three sets of clinical predictors: (1) recruitment centre, age, sex, BMI (basic clinical predictor model); (2) recruitment centre, age, sex, BMI, lipids, SBP (standard clinical predictor model); and (3) BMI, lipids and SBP or the subset of covariates whose distribution overlapped substantially across both cohorts (shared clinical predictor model). The lipids covariate was in form of the triglyceride to HDL-cholesterol ratio, which is arguably the simplest and most closely linked measure to IR [19]. For each of these sets of covariates, we also trained a model that additionally included proteins. A fourth protein-only model aimed to assess the performance of plasma proteins in the absence of any clinical covariates. Finally, we combined the HOMA-IR index with the standard covariates, again with and without proteins. As plasma insulin assays are not standardised, this final model was not tested across cohorts.

The standard clinical predictor model without and then with proteins was our a priori primary model of interest, as it establishes the added incremental value of protein measurements over and above all routinely available anthropomorphic and laboratory measurements for the assessment of IR in clinical practice. The basic clinical predictor model was formed by restricting the standard clinical predictor model to covariates believed to be causally associated with IR. We performed sensitivity analyses excluding proteins with a high proportion of measures flagged as being below the limit of detection (LOD). The standard clinical predictor model trained in ULSAM was also tested in the subset of men only in RISC. We generated predicted M values with our LASSO models and calculated standard diagnostic test statistics summarising the ability of the linear models to correctly classify whether an individual’s M value fell within the lowest quartile of measured M values.

In our final analysis we ran the randomised LASSO stability selection algorithm as presented by Meinshausen and Bühlmann [20] and implemented in R package stabs with the improved error bounds introduced by Shah and Samworth [21, 22]. We implemented this algorithm not only on the raw M values but also on residuals of the M values after progressively regressing out clinical covariates of interest, including centre (where applicable), age, sex (where applicable), BMI, lipids and blood pressure. We did not implement this algorithm on the binary M values. A detailed description is available in ESM Methods.

Results

Cohort characteristics

The baseline characteristics of the RISC and ULSAM participants included in the analyses are shown in Table 1. The cohort characteristics are fully described in ESM Results. Protein levels and distributions in the RISC and ULSAM cohorts using the relative Normalised Protein eXpression (NPX ) scale are shown in ESM Table 1.

Table 1 Characteristics of RISC and ULSAM participants at the time of EIC assessment

Standard linear regression analyses and replication of marginal effects of proteins

A total of 359 and 317 proteins for RISC and ULSAM, respectively, were significant at a false discovery rate of <0.05 in the single protein association tests including age, sex and centre covariates. When BMI was also included, these numbers decreased to 271 and 241, respectively. The number of overlap** proteins between the two cohorts was 168 and 72 for the two sets of covariates, respectively. Specifically, among the significant proteins in RISC, 46.8% (168/359) and 26.6% (72/271) were replicated in ULSAM and, among the significant proteins in ULSAM, 53.0% (168/317) and 29.9% (72/241) were replicated in RISC. Corresponding statistics using a more stringent Bonferroni correction are provided in ESM Results. Full results are shown in ESM Table 2.

LASSO regression models

Consistent with previous reports [23, 24], standard clinical variables alone (age, sex, centre, BMI, lipids and SBP) explained 0.237–0.258 of the M value R2 in RISC and 0.381–0.446 of the M value variance in ULSAM when validation was performed within the same cohort (Fig. 2a,b). LASSO regression models selected 39–53 proteins in RISC and 34–67 proteins in ULSAM. Compared with covariate models alone, protein models resulted in absolute increases of R2 ranging from 0.201 to 0.238 for RISC and from 0.123 to 0.217 for ULSAM. Importantly, the 95% CIs of R2 for a model including proteins did not overlap the 95% CIs for the corresponding model without proteins. Models with proteins alone explained a similar proportion of M value variance to models with proteins and clinical variables. Specifically for our primary model of interest, A standard LASSO model incorporating 53 proteins in addition to routinely available clinical variables increased the M value R2 from 0.237 (95% CI 0.178, 0.303) to 0.456 (0.372, 0.536) in RISC and from 0.443 (0.360, 0.530) to 0.632 (0.569, 0.698) in ULSAM with the addition of 61 proteins. Findings for the binary IR variable generally mirrored those for the continuous M value. Adding proteins to a set of covariates increased the AUC by 0.035–0.072 for RISC and by 0.036–0.051 for ULSAM, but the 95% CIs had a substantial amount of overlap.

Fig. 2
figure 2

Variance explained (R2) using M as a continuous variable, and the AUC statistic using M as a binary variable. (a, b) Models performed using the training and the test datasets from the same cohort: (a) RISC and (b) ULSAM. (c, d) Models performed using the training dataset from one cohort and the testing dataset from the other cohort: (c) RISC vs ULSAM and (d) ULSAM vs RISC. In ULSAM, age is a limited covariate (69–73 years) and the variables centre and sex are invariant as there was only one centre and all participants were males. aA priori main model. bModels with common covariates in both cohorts. BMI was selected in all of the only-covariates models performed. Lipids were additionally selected in the only-covariates models in ULSAM only when M was used as a continuous variable

We observed a similar but less marked degree of improvement in the proportion of the M value variance explained by the addition of proteins when we trained models in one cohort and tested them in the other cohort (Fig. 2c,d). Standard clinical variables alone (BMI, lipids and SBP) explained 0.357 of M value variance in ULSAM, and 0.251 of M value variance in RISC. Proteins alone led to an increase of 0.160 over the model of shared covariates alone and 0.134 when combined with shared covariates in ULSAM (Fig. 2c), while the respective increases in RISC were 0.118 and 0.094 (Fig. 2d). As in the within-cohort analyses, the 95% CIs of R2 for a model including proteins did not overlap the 95% CIs for the corresponding model without proteins.

Findings for the within and cross-cohort analyses of the binary IR variable generally mirrored those for the continuous M value, with absolute improvements in the AUC observed in the models incorporating proteins of 0.028–0.041 over clinical predictor models (Fig. 2a–c). The one exception was a 0.018 reduction in the AUC when comparing the protein-only model with the shared covariate model ported from ULSAM to RISC (Fig. 2d). However, 95% CIs for absolute measures of AUC showed substantial overlap between corresponding models. Restricting the testing cohort to male participants in RISC did not improve performance of the model trained on the male-only ULSAM cohort (ESM Table 3).

The results of LASSO models that included HOMA-IR as an additional clinical variable increased R2 in RISC from 0.237 to 0.330 (ESM Table 4). However, no appreciable increase in R2 was observed in ULSAM. When tested within RISC, LASSO models selected HOMA-IR and BMI, which together explained 0.330 of M value variance, while BMI, lipids and HOMA-IR explained 0.430 of M value variance in ULSAM. Adding proteins as covariates nevertheless increased R2 by 0.139 in RISC and 0.136 in ULSAM, an increase comparable to that seen in models that did not include HOMA-IR.

Comparing predicted vs actual M value classifications using our linear predictor we observed an increase of ~10% to 25% for sensitivity and ~7% to 15% for balanced accuracy across the various models that included proteins compared with analogous models restricted to clinical predictors alone (Table 2). Full cross-tabulations of observed and predicted M values by class (lowest quartile = case, rest = non-case) as well as additional diagnostic test proportions are provided in ESM Table 5.

Table 2 Selected diagnostic test characteristics for all linear models to detect an M value in the lowest quartile

In sensitivity analyses, excluding proteins flagged as being below the LOD using three different LOD cut-offs (25%, 10% and 3%) gave similar R2 and AUC results to the main analyses (ESM Table 6).

The cross-validation mean squared error plots, including upper and lower SE bars, along the λ sequence for the RISC and ULSAM cohorts are shown in ESM Figs 14. The root mean squared errors (RMSEs) corresponding to each LASSO regression model obtained are shown in ESM Fig. 5.

Proteins consistently selected by standard LASSO and randomised LASSO stability selection analyses

ESM Tables 79 list the proteins selected by standard LASSO analyses for each model run, the number of times a protein was selected among a set of models, and the correlation among proteins selected and not selected by LASSO. A total of 135 proteins were selected by LASSO in one or more models. Six proteins were selected in ten or more of the 16 main LASSO models run. These were insulin-like growth factor-binding protein 2 (IGFBP2), leptin (LEP), reticulon-4 receptor (RTN4R), adhesion G protein-coupled receptor G1 (ADGRG1), inhibin beta C chain (INHBC) and lipoprotein lipase (LPL). While selected proteins were less correlated than non-selected proteins, ~25% to 30% of pairwise correlations still had an r>0.2 compared with 30–40% among non-selected proteins.

Our randomised LASSO stability selection analyses selected subgroups of ten proteins in RISC and seven proteins in ULSAM in up to six M value models tested per cohort (Table 3). Four proteins were selected in both cohorts: fatty acid-binding protein 4 (FABP4), IGFBP2, LEP and RTN4R. The number of proteins selected decreased progressively as more covariates were regressed out of the M value, with only two proteins selected for fully regressed models. IGFBP2 was the only protein selected in all models in both cohorts. Improvements in R2 for stability selection proteins were intermediate to those observed with standard LASSO regression models in both RISC (range 0.352–0.366) and ULSAM (range 0.472–0.587) but still substantially higher than in models restricted to covariates alone. R2 for cross-cohort analyses remained largely similar in both directions (RISC to ULSAM range 0.444–0.524; ULSAM to RISC range 0.345–0.348) to R2 for standard LASSO models (RISC to ULSAM range 0.491–0.517; ULSAM to RISC range 0.345–0.369).

Table 3 Proteins selected by randomised LASSO stability selection analyses when regressed on raw M values or on the M value residuals after regressing out clinical predictors of insulin sensitivity

Discussion

We aimed to develop a novel blood-based proteomic signature to reliably estimate a direct measure of insulin sensitivity in men and women with normoglycaemia or impaired glucose tolerance using a high-throughput platform able to measure reliably hundreds of low-abundance proteins in plasma. Our principal findings are fourfold.

First, a large proportion of the 823 proteins measured with the PEA platform were statistically associated with IR in both the RISC and the ULSAM cohorts. Approximately half of the significant protein associations in one cohort were replicated in the other cohort and vice versa. While such observations may, in part, be driven by the correlation structure of the proteins measured in plasma, we noted several well-established proteins among the significant associations, as well as several novel ones. These results serve as a compelling starting point for further inquiry into the role of these proteins in the pathophysiology of IR.

Second, protein models derived by standard LASSO regression of the M value explained a notably larger R2 than models restricted to commonly available clinical variables related to IR. This increase was most evident when applying the standard LASSO within each cohort but was also present when the model from one cohort was applied to the other cohort, despite dramatic differences between the two in the distribution of anthropometric characteristics and health status at the time of the clamp. The increased variance explained persisted even in the face of an increasing degree of stringency in the threshold for inclusion of proteins with values below the LOD, suggesting substantial redundancy in the information provided by measuring hundreds of proteins in the plasma. While patterns were consistent when the M value was transformed into a binary indicator of IR, 95% CIs for AUCs overlapped substantially, probably reflecting a loss of power from discretising the outcome, as well as suboptimal inference for testing the null hypothesis of a delta AUC not equal to zero [25].

Third, a substantially higher proportion of the R2 of the M value was explained by proteins alone than by clinical variables alone in each cohort, even after including HOMA-IR among the clinical variables. Furthermore, a protein-only model performed as well as models that combined clinical variables and proteins together in most cases. These findings suggest that plasma protein signatures for IR derived from the proteins we measured not only provide incremental value to clinical variables but can also replace predictive clinical variables when they are not available. Consistent with this hypothesis is the observation that ~25% of proteins significantly associated with the M value when adjusting for age and sex alone lost their significance when BMI was added as a covariate.

Fourth, a stability selection algorithm including randomised LASSO selected a substantially smaller number of proteins and reduced the proportion of R2 explained within a cohort to an intermediate degree between that observed with clinical variables alone and that observed with clinical variables combined with the proteins selected by a standard LASSO approach. Differences in R2 were less marked for cross-cohort analyses. In these randomised LASSO cross-cohort analyses, models including only a small number of stable proteins achieved nearly the same performance as the standard LASSO cross-cohort analyses. For models trained in RISC and tested in ULSAM, the two ‘stable’ proteins in addition to clinical covariates achieved an R2 of 0.444, while the 51 standard LASSO proteins achieved an R2 of 0.491. Similarly, for models trained in ULSAM and tested in RISC, two ‘stable’ proteins achieved an R2 of 0.345, while 67 standard LASSO proteins achieved an R2 of 0.369. These findings suggest the presence of unstable features leading to some degree of model overfitting within a cohort despite our use of testing and training sets and cross-validations. They also suggest that proteins selected by the randomised LASSO method are likely to be the most generalisable and robust if implemented clinically to help predict IR across multiple populations, even when the baseline characteristics of those populations are quite divergent.

Few studies have reported on the utility of high-throughput proteomics of plasma to more reliably estimate IR [13]. Several studies have focused on the identification of prevalent type 2 diabetes or the prediction of incident type 2 diabetes while others have examined surrogate measures of IR [13]. To the best of our knowledge, this is the first study to combine high-throughput methodology with a direct measure of insulin sensitivity. Our findings highlight the potential of using a proteomic signature to estimate a direct measure of insulin sensitivity over and above the use of clinical variables and across a wide range of age and baseline health states. Such analyses may provide opportunities to markedly improve the identification of insulin-resistant individuals at risk of IR-related adverse health consequences [1, 10].

Several proteins selected repeatedly by LASSO for both continuous and binary outcomes in our primary analyses have long-standing and well-established connections to IR. The two most frequently selected included IGFBP2 and LEP, which also possessed the largest multivariate effect sizes per SD increase in measured protein. IGFBP2 is associated with multiple cardiovascular risk factors related to the metabolic syndrome and IR and is further regulated by LEP [26,27,28,29]. LEP regulates food intake, body weight and glucose metabolism, is associated with IR independent of body fat mass, and lowers blood glucose and insulin even in the absence of weight loss in mice [30]. LPL, the third most frequently selected protein, is also associated with obesity and other metabolic disorders related to energy balance, insulin action and body weight regulation [31]. More specifically, LPL is an important enzyme in the metabolism of triacylglycerol-rich lipoproteins associated with IR, including triglycerides, chylomicrons and very low-density lipoproteins. By facilitating adhesion of these lipoproteins to the vascular endothelium of specific tissues, LPL plays an important role in their targeted delivery to and use by insulin-sensitive tissues, such as muscle and liver, and their clearance from the blood [31]. Overexpression of LPL in target tissues such as muscle causes muscle-specific IR by promoting dysfunction in insulin signalling and action [32]. These three proteins were also selected by our stability selection algorithm.

Three proteins selected frequently by LASSO with less established roles in IR include RTN4R, ADGRG1 and INHBC. RTN4R has important roles in regulating axon regeneration and neuronal plasticity in the central nervous system, but the canonical ligand for RTN4R (reticulin 4) is also expressed in non-parenchymal cells within the liver where it blocks diet-induced hepatic lipid accumulation, steatosis and IR [33]. ADGRG1 is a receptor that facilitates adhesion of cells to collagen matrix, especially in the develo** brain, but it is also the most highly expressed G protein-coupled receptor in human and mouse pancreatic islets, with a high correlation between expression and many genes essential for beta cell function [34]. Lastly, INHBC is a member of the TGF-β family involved in the regulation of the secretion of follicle-stimulating hormone by the pituitary gland along with other inhibins and activins, and thus regulates hypothalamic and pituitary hormone secretion as well as the secretion of gonadal hormones and insulin [35, 36]. The INHBC gene is also highly expressed in the liver [37]. These three proteins were also selected by our stability selection algorithm. Two additional proteins with no obvious prior link to IR, secretoglobin family 3A member 2 (SCGB3A2) and integrin subunit alpha V (ITGAV), were selected by all stability selection algorithm models for RISC and ULSAM, respectively, and are worth highlighting. SCGB3A2 is a small secretory protein that is predominantly expressed in airway epithelial Club cells, It has anti-inflammatory, growth factor, anti-fibrotic and anti-cancer activities that influence various lung diseases including asthma [38]. ITGAV may regulate angiogenesis and cancer progression [39].

A key strength of our study is the profiling of many low-abundance proteins in plasma from two of the largest studies conducted to date that have made direct measurements of insulin sensitivity using the reference standard, the EIC, as well as undertaking comprehensive and standardised documentation of clinical risk factors related to IR. Our study is therefore the most comprehensive plasma proteomics study to date on IR. We note that the absolute incremental improvement in variance explained by the proteomic signature was comparable in the two cohorts despite the vastly different cardiometabolic risk profiles at baseline. This large difference in baseline health state in the two cohorts is likely to be responsible for the substantial differences in the proportion of variance explained by the clinical variables alone.

Our study also has weaknesses that are worth noting. First, the plasma used for the RISC study was not collected on the day of the EIC but rather within 1 month of the EIC. However, a recent study demonstrated remarkable stability and reproducibility of most protein measurements using the PEA over a period of 1 year [40]. Furthermore, any misclassification related to biological variability is likely to be non-differential and bias our results towards the null. Second, we did not assess the entire plasma proteome. Our limited assessment was also not completely unbiased as several of the panels we measured were designed specifically for the study of cardiometabolic disease. This semi-targeted platform thus might limit the utility of additional proteins, but this assumes that knowledge of predictors is already saturated, which is unlikely given that at least 5000 additional canonical and non-redundant proteins have been catalogued in the Human Plasma PeptideAtlas build 2021-07 [41]. Third, we were limited in our ability to demonstrate portability of the proteomic signature to people of non-European race and ethnicity with a differing prevalence of and predisposition to IR. Thus, further study in these populations is necessary to ensure that implementation of this approach does not further promote health disparities. Fourth, the cost of complete proteomic profiling remains very high, but the availability of custom panels has the potential to cut costs and facilitate the clinical implementation of this technology. Lastly, both the RISC and ULSAM studies are too restricted in their size and follow-up to have the power to clearly demonstrate a benefit of the M value in predicting adverse outcomes over modestly correlated proxy measures. By extension, this limitation also applies to the assessment of the incremental benefit of plasma proteomics. However, the relative utility of proteomic signatures for M values vs proxies of IR could soon be tested in large-scale population studies such as UK Biobank using the same technology [42].

In summary, our results suggest that plasma proteomic profiling has the potential to improve individual assessments of insulin sensitivity based on a reference measure. The measurement of additional proteins in combination with other -omics profiling should be explored to determine whether an even more accurate estimation of an individual’s insulin-mediated glucose disposal/uptake and subsequent risk of health consequences can be made and whether this approach can be successfully implemented in clinical practice.