Introduction

Type 2 diabetes mellitus (T2D) is a metabolic disease characterized by raised fasting glucose levels due to insulin resistance and impaired insulin production. It is a leading cause of cardiovascular disease, blindness and kidney failure1. The 2017 global estimate of 425 million persons with diabetes is projected to increase by 48% to 629 million by 20452. A continuing challenge is the identification of persons at high risk of T2D, particularly in the absence of established risk factors such as obesity and poor diet3,4. This need is underscored by a 2017 survey by the charity Diabetes UK, where a top research priority for persons affected by T2D was to "identify people at high risk of type 2 diabetes and help to prevent the condition from developing"5. Another challenge has been the identification of currently unknown molecular mechanisms of T2D that could act as novel treatment targets6.

Non-targeted (or untargeted) metabolomics describes the assessment of small molecules (< 1,500 Daltons in molecular weight) in biological specimens and comprises a broad range of peptides, carbohydrates, lipids and nucleic acids. Non-targeted methods such as ultra-performance liquid chromatography coupled to quadrupole-time-of-flight mass spectrometry (UPLC-QTOFMS) capture all metabolite signals detectable by the method at hand without a priori selection. In an electrospray ionization source (ESI), effluents from the liquid chromatography system are nebulized at atmospheric pressure and ionization occurs through the application of a strong electric field on the surface of the effluent droplets as they elute from the nebulizer. The size of the charged droplets diminishes as the formed molecular ions and molecular adducts travel towards the mass spectrometer for analysis, collision induced dissociation, and mass detection. The accurate mass, mass spectra, and retention time of each molecular ion are matched to metabolites by comparison to internal and external standards or public databases7,8,9. Serum and plasma metabolomics have been used to discover biomarkers and improve risk prediction for insulin resistance and T2D10,11,12,13. Far less attention has been paid to the urinary metabolome. A genome-wide association study showed about a two-thirds overlap between urinary and plasma metabolite loci in the genome14,15. One study in ~ 3,900 healthy persons found correlations between five-year change in glycated hemoglobin levels and baseline levels of urinary metabolites such as betaine and trimethylamine16. A cross-sectional study reported 94 metabolites in plasma, urine or saliva samples that differed between persons with and without T2D17. We are unaware of any published study that uses large-scale non-targeted urinary metabolomics for biomarker discovery or risk prediction of incident T2D.

Here, we use non-targeted UPLC-MS urinary metabolomics in two community-based cohorts of > 1,400 Swedish adults to discover metabolites associated with prevalent T2D and to assess whether urinary metabolomics improves risk prediction of incident T2D beyond an established clinical risk score.

Results

We included 789 participants of the PIVUS study (108 prevalent cases of T2D) and 635 participants of the ULSAM study (89 cases of prevalent T2D). Figure 1 shows the study flow, and baseline characteristics are displayed in Table 1.

Figure 1

Study design.

Table 1 Participant characteristics.

In the discovery sample PIVUS, 7 out of 62 preliminarily annotated metabolites measured in both cohorts were associated with prevalent T2D after adjustment for sex, age and urinary creatinine at a false discovery rate (FDR) < 0.05: 3-methylxanthine (odds ratio, OR, per standard deviation increase and 95% confidence interval, CI, 0.70 [0.59, 0.84], P = 7.93 × 10^−5), 2-hepteneoglycine (OR, 0.70, 95% CI [0.58, 0.84], P = 1.31 × 10^−4), nonanoylcarnitine (OR, 0.71, 95% CI [0.57, 0.88], P = 1.44 × 10^−3), L-tyrosine (OR, 1.36, 95% CI [1.12, 1.66], P = 2.05 × 10^−3), irinotecan metabolite NPC (OR, 0.76, 95% CI [0.63, 0.91], P = 3.53 × 10^−3), 3-hydroxyundecanoyl-carnitine (OR, 0.78, 95% CI [0.65, 0.93], P = 4.71 × 10^−3) and vildagliptin (OR, 0.79, 95% CI [0.67, 0.93], P = 5.47 × 10^−3) (Table 2).
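As a hedged illustration of how the reported odds ratios relate to the underlying logistic regression coefficients, the sketch below back-calculates the log-odds coefficient, its standard error and an approximate two-sided Wald P value from an OR and its 95% CI. The function name `wald_from_or` is hypothetical, and the normal approximation with a critical value of 1.96 is an assumption rather than a description of the authors' software. Because the inputs are rounded to two decimals, the result only approximates the P values given in the text.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover beta, SE, z and a two-sided Wald P value from an OR and its 95% CI.

    Assumes the CI was built on the log-odds scale as beta +/- z_crit * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Rounded values reported above for 3-methylxanthine in PIVUS.
beta, se, z, p = wald_from_or(0.70, 0.59, 0.84)
print(f"beta={beta:.3f} SE={se:.3f} z={z:.2f} P={p:.1e}")
```

An OR below 1 corresponds to a negative coefficient, that is, lower odds of prevalent T2D per standard deviation increase in the metabolite level.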

Table 2 Associations between urinary metabolite levels and prevalent T2D in the discovery and replication samples in logistic regression adjusted for age, sex (for PIVUS only) and urinary creatinine per standard deviation unit increase in metabolite level.

Two of these preliminarily annotated metabolites were associated with prevalent T2D in the replication sample ULSAM at the nominal significance level: 3-hydroxyundecanoyl-carnitine (OR, 0.61, 95% CI [0.47, 0.79], P = 1.56 × 10^−4) and nonanoylcarnitine (OR, 0.71, 95% CI [0.56, 0.89], P = 3.11 × 10^−3). In-depth manual annotation (Supplementary Text and Supplementary Figures 110) confirmed the annotation of 3-hydroxyundecanoyl-carnitine. The second compound was annotated as the sodium adduct of nonanoylcarnitine (Supplementary Figures 813). Figure 2 shows the associations of the two replicated metabolite features in the combined cohorts with and without additional adjustment for T2D risk factors in the Framingham Offspring Study (FOS) diabetes model.

Figure 2

Associations of the replicated urinary metabolites and prevalent T2D in the combined sample (n = 1,424). Results from logistic regression adjusted for age, sex, cohort and urinary creatinine (red color) and with additional adjustment for BMI, HDL-cholesterol, triglycerides, systolic and diastolic blood pressure, hypertension and family history of diabetes (blue color). Error bars denote 95% CI around odds ratios per standard deviation increase in urinary metabolite level.

To assess associations with incident T2D, we combined both cohorts after excluding prevalent cases of T2D at baseline (n = 1,227) and randomly split the sample into a two-thirds training (n = 818) and one-third test set (n = 409). Over a maximum of 12 years' follow-up (mean 6.32 ± 3.1 years), there were 36 and 10 incident cases of T2D in the training and test sets, respectively. LASSO regression in the training set that forced cohort, age, sex, urinary creatinine and the FOS variables into the model selected six out of the 62 metabolites as the optimal parsimonious model to predict risk of T2D (C5H14S, indoleacrylic acid, sotalol, tranexamic acid, trans-ferulic acid, (3a,5b,7a,12a)-24-[(carboxymethyl)amino]-1,12-dihydroxy-24-oxocholan-3-yl-b-D-glucopyranosiduronic acid). In the holdout test set, the baseline model C statistic was 0.866 (95% CI, 0.786–0.946, Nagelkerke's pseudo-R2 0.271), and the baseline-plus-metabolite model C statistic was 0.892 (95% CI, 0.812–0.972, pseudo-R2 0.354; change in model fit likelihood ratio test, P = 0.276). The Hosmer–Lemeshow test in the test sample did not reject the null hypothesis of good fit (baseline model P = 0.398, baseline-plus-metabolite model P = 0.257). In contrast, calibration plots of observed and predicted risk indicated that while the baseline model was well calibrated, the baseline-plus-metabolites model showed signs of underestimation of risk (Fig. 3). This discrepancy may be due to the small number of cases, particularly in the test set, which resulted in low statistical power of the formal calibration test and does not allow reliable conclusions about the merits of the model.
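To make the discrimination measure concrete, the following minimal sketch computes a C statistic for a binary outcome as the proportion of case/non-case pairs in which the case received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy data are hypothetical and are not taken from the study; the sketch also ignores censoring and does not compute confidence intervals.

```python
def c_statistic(risks, events):
    """Concordance for a binary outcome: P(risk of a case > risk of a non-case)."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    non_cases = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for non_case_risk in non_cases:
            if case_risk > non_case_risk:
                concordant += 1.0
            elif case_risk == non_case_risk:
                concordant += 0.5  # ties contribute one half
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = incident T2D), for illustration only.
toy_risks = [0.05, 0.10, 0.20, 0.40, 0.80]
toy_events = [0, 0, 1, 0, 1]
print(round(c_statistic(toy_risks, toy_events), 3))
```

With 10 incident cases among 409 test-set participants, such a statistic rests on 10 × 399 = 3,990 case/non-case pairs, which is one way to see why the confidence intervals reported above are wide.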

Figure 3

Calibration plots in the baseline FOS model (left panel) and the baseline-plus-metabolites model (right panel) in the test sample (n = 409).

Discussion

In 1,424 Swedish adults enrolled in two community-based cohorts, we discovered associations between prevalent T2D and lower urinary levels of 3-hydroxyundecanoyl-carnitine and the sodiated adduct of nonanoyl-carnitine (Supplementary Text). We also found indications of improved risk prediction for incident T2D over an average 6-year follow-up period after adding six urinary metabolites to an established diabetes risk score, although the improvement did not reach statistical significance. The small number of cases demands cautious interpretation of the prediction results for incident T2D.

Association between urinary 3-hydroxyundecanoyl-carnitine level and T2D

3-hydroxyundecanoyl carnitine (C18H35NO5, HMDB0061637) belongs to the group of acylcarnitines, organic compounds composed of a fatty acid whose carboxyl group is attached to carnitine by an ester bond and that are essential intermediates in fatty acid metabolism. This odd-numbered C11-carnitine occurs with relatively low abundance in the circulation and tissues when compared to even-chain acylcarnitines. The principal origin of odd-numbered medium-chain acylcarnitines remains elusive; odd-chain acylcarnitines originate both from branched-chain amino acid catabolism and, to a lesser extent, from the peroxisomal process of fatty acid alpha-oxidation18. Despite the low abundance of odd-numbered acylcarnitines in biological matrices, the use of mass spectrometric methods has prompted the detection of C11-carnitine and other odd-numbered acylcarnitines in animal and human plasma19,20,21, urine20,22, as well as liver and kidney23,24. However, the distribution of odd-numbered acylcarnitines and other acylcarnitines between tissues, plasma, and renal excretion remains poorly understood25.

Whilst levels of various acylcarnitines in the circulation18,19 and urine20 have been associated with increased risk of T2D and insulin resistance21, there is a dearth of evidence specifically linking the odd-numbered medium-chain C11-carnitine to diabetes. In a comprehensive analysis including > 110 acylcarnitines in plasma and urine of leptin-deficient (db/db) mice, an accumulation of plasma medium- and long-chain acylcarnitines was accompanied by a decrease in urinary odd-numbered (C7, C9, and C11) medium-chain acylcarnitine levels20. None of the other seven acylcarnitines (variants of C5, C6, C8 and C10-carnitine) among the 62 automatically annotated metabolites in our study were statistically significantly associated with prevalent T2D (Supplementary Text).

Our study is the first in human participants to report an association between lower levels of urinary C11-carnitine and prevalent T2D. Lack of power to detect associations with other carnitine metabolites cannot be excluded, as our sample size was limited. Annotation certainty of this metabolite in our sample at Metabolomics Standards Initiative (MSI) confidence level 2 is comparatively good (Supplementary Text and Supplementary Figures 13), although our inability to synthesize authentic standards for external validation of the annotation leaves some uncertainty.

Associations between another urinary carnitine metabolite and T2D

Lower urinary levels of C-571 were associated with prevalent T2D both before and after additional adjustment for established T2D risk markers (Fig. 2). This metabolite was initially computationally annotated as N-jasmonoylisoleucine, but review of the spectral data strongly suggests that this signal is the sodiated adduct of nonanoyl-carnitine (Supplementary Text, Supplementary Figures 1015). The precursor molecular ion (M + H) of nonanoyl-carnitine was not associated with any of the outcomes in our study, so the association signal for the sodiated adduct could, in our opinion, be a statistical artefact. We are therefore unable to further explore the possible biology behind this association but provide detailed information on the annotation in the Supplementary Text and Supplementary Figures.
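The distinction between a protonated molecular ion and a sodiated adduct can be illustrated with a short calculation. The sketch below derives a monoisotopic mass from a molecular formula and the expected m/z of the singly charged [M+H]+ and [M+Na]+ ions. It uses the formula C18H35NO5 stated above for 3-hydroxyundecanoyl-carnitine, because the text does not give a formula for nonanoyl-carnitine. The function names are hypothetical, the element and ion masses are standard reference values rounded to about seven decimals, and the calculation is an illustration only, not the annotation procedure used in the study.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes (standard reference values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647       # H+ (hydrogen atom minus one electron)
SODIUM_ION = 22.98922070  # Na+ (sodium atom minus one electron)


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C18H35NO5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def adduct_mz(formula):
    """Expected m/z of singly charged protonated and sodiated ions."""
    mass = monoisotopic_mass(formula)
    return {"[M+H]+": mass + PROTON, "[M+Na]+": mass + SODIUM_ION}


mz = adduct_mz("C18H35NO5")
for ion, value in mz.items():
    print(f"{ion}: {value:.4f}")
print(f"spacing: {mz['[M+Na]+'] - mz['[M+H]+']:.4f}")
```

The constant spacing of roughly 21.98 m/z units between the two ions of the same compound is the kind of evidence that can point to an adduct rather than a distinct metabolite.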

Strengths and Limitations

We report the first epidemiological study of non-targeted urinary metabolomics to assess the risk of prevalent and incident T2D in two independent community-based cohorts. Strict statistical controls for multiple testing, a discovery/replication design, over 10 years of follow-up in the ULSAM cohort, and the unbiased non-targeted metabolomics method are strengths of our study. Limitations include low power for the incident T2D analysis and annotation uncertainties. The ULSAM cohort included only men, whilst the PIVUS cohort had a balanced sex ratio (all analyses were adjusted for sex). Our study used deep-frozen urine samples collected several years before the UPLC-MS technology became available, necessitating analysis of spot urine samples in PIVUS and 24-h urine collections in ULSAM. Analyses were adjusted for type of sample collection and difference in urine concentration (using creatinine levels as a proxy), but the different sampling methods may have affected the results. In the absence of external validation and reanalysis of the samples (which were used up in the analysis), our annotation of metabolites remains unconfirmed. Future studies should strive for more controlled settings with regard to the collection of urine samples.

Conclusion

In our metabolomics study in over 1,400 adults, lower urinary levels of 3-hydroxyundecanoyl-carnitine were associated with prevalent T2D. We were unable to assign a definite molecular identity to another T2D-associated signal, but provide extensive discussion of the mass spectral characteristics and possible identities. We report our complete results despite remaining annotation uncertainties as a pioneering effort to study non-targeted urinary metabolomics and T2D without a priori selection of potential metabolites or biomarkers of interest, and because our explanations of the analytical pipeline make an innovative and informative contribution to the field of human metabolism research. The field of non-targeted metabolomics is young and the growing availability of comparison structures in molecular databases will improve the identification of metabolites in the future.

Methods

Participants

Uppsala Longitudinal Study of Adult Men (ULSAM)

Between 1970 and 1973, ULSAM enrolled 2,322 (81.7%) of all 2,841 men born between 1920 and 1924 who were residents of Uppsala county, Sweden26. Regular biomedical assessments have been carried out ever since as detailed here (https://www.pubcare.uu.se/ulsam/). The current study used data and a 24-h urine collection at age 77 years. Participants were followed up until assessment at 93 years of age or death according to the Swedish Death Register. Urine metabolomics data were available from 635 of the 839 individuals who attended the assessment (missing individuals are due to missing urine samples or insufficient sample quality, as metabolomics was carried out in the 2010s on biobank samples obtained at assessment age 77 years in the early 1990s).

Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS)

In 2001, the PIVUS study (https://www.medsci.uu.se/pivus/) enrolled 50% (n = 1,016) of a random sample of Uppsala community residents aged 70 years with the aim of comparing different measures of arterial compliance27. The current study is based on the assessment at age 75 years, at which spot urine samples were collected, and participants were followed until re-assessment at 80 years of age or death. We included urine metabolomics data from 789 participants who had deep-frozen urine samples of sufficient quality available at the point of analysis in the 2010s.

Non-targeted metabolomics

Non-targeted metabolomics profiling of urine samples was carried out by ultra-performance liquid chromatography (UPLC) on a Waters Acquity UPLC system coupled to a Waters Xevo G2-Time-Of-Flight-Mass Spectrometry (TOFMS) platform at Colorado State University (Fort Collins, CO, USA). Data acquisition in the positive electrospray ion mode with a mass-to-charge ratio (m/z) range of 50–1,200 at 5 Hz was alternately performed at collision energies of 6 V and 15–30 V without discrimination or pre-selection. Cohorts were analyzed independently in multiple batches. Every set of 20 authentic samples was analyzed in triplicate, interspersed with procedural blanks and control samples. There were 3,352 injection samples in ULSAM and 3,158 samples in PIVUS. Following data processing in XCMS28, there were 4,406 features in ULSAM and 3,615 features in PIVUS (including the control samples). LOESS curve normalization to correct for shift in intensity over time and probabilistic quotient normalization (PQN) to normalize samples based on dilution factors obtained from the median intensity across samples were carried out. XCMS parameters used for peak detection were nSlaves = 200, method = centWave, ppm = 25, peakwidth = c(2:25), snthresh = 11, mzCenterFun = wMean, integrate = 2, mzdiff = 0.01, prefilter = c(1,100). Peak alignment by retention time was carried out with the obiwarp method. Quality control included manual inspection of plots of total ion counts per sample and plots of peak number by retention time. Peaks were grouped with the parameters bw = 2, minfrac = 0.10, max = 1,000, mzwid = 0.01, sleep = 0.0001. Features with correlations across triplicate injections below 0.2 were removed; the triplicate peaks of features passing quality control were averaged. The final number of features was 4,084 in ULSAM and 3,178 in PIVUS.

RAMClustR (version 1.0.4)29 was used to cluster features into spectra, interpretMSSpectrum30 was used to infer the molecular ion, and MS-Finder31 was used to annotate metabolites. Only annotated, quality-controlled metabolite features measured in both PIVUS and ULSAM were included in this study. Because this data-driven non-manual annotation can be liable to statistical artefacts, we refer to it as "preliminary/initial annotation" in the text. For all outcome-associated features, we went back to the original UPLC-MS data and carried out manual in-depth review to verify or refute the preliminary annotation. We present these validation steps for the main results of this study in the Supplementary Text and Supplementary Figures.
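The probabilistic quotient normalization step mentioned above can be sketched as follows. This is a minimal illustration of the commonly described form of PQN, assuming a per-feature median across samples as the reference spectrum; the exact implementation and any preceding scaling used in the study are not specified in the text. The function name `pqn_normalize` and the toy intensities are hypothetical.

```python
from statistics import median


def pqn_normalize(samples):
    """Probabilistic quotient normalization (sketch).

    samples: list of equal-length lists of positive feature intensities,
    one inner list per sample.
    """
    n_features = len(samples[0])
    # Reference spectrum: per-feature median across all samples.
    reference = [median(s[j] for s in samples) for j in range(n_features)]
    normalized = []
    for s in samples:
        # Quotient of each feature against the reference; the median quotient
        # is taken as the estimate of the sample's dilution factor.
        quotients = [s[j] / reference[j] for j in range(n_features) if reference[j] > 0]
        dilution = median(quotients)
        normalized.append([x / dilution for x in s])
    return normalized


# Hypothetical intensities: the same profile at three different urine concentrations.
toy = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 80.0],  # twice as concentrated
    [5.0, 10.0, 15.0, 20.0],   # half as concentrated
]
for row in pqn_normalize(toy):
    print(row)
```

Using the median quotient rather than the total signal makes the estimated dilution factor less sensitive to a few features that differ strongly between samples.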

Outcome definition

In ULSAM, diabetes was defined as fasting plasma glucose ≥ 7 mmol/L, glycated hemoglobin HbA1c ≥ 6.5% (48 mmol/mol), use of anti-diabetic medication according to the Swedish Prescribed Drug Register ATC code A10, and/or diagnosis of T2D according to the National Patient Register. In PIVUS, diabetes was defined as fasting plasma glucose concentration ≥ 7 mmol/L, use of anti-diabetic medication, and/or diagnosis of T2D according to validated hospital records (whole blood glucose values were transformed to plasma concentrations by adding 11%). HbA1c measurements were not available in PIVUS. More information on the cohorts and assessments is available at https://www.pubcare.uu.se/ulsam/ (ULSAM) and https://www.medsci.uu.se/pivus/ (PIVUS).
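The outcome definition above amounts to an any-of rule, which the following sketch expresses in code. The function and argument names are hypothetical, register and medication look-ups are reduced to boolean flags, and "adding 11%" is interpreted as multiplying whole blood glucose by 1.11. Passing `hba1c_percent=None` corresponds to the PIVUS situation in which HbA1c was not available.

```python
WHOLE_BLOOD_TO_PLASMA = 1.11  # "adding 11%", interpreted here as a multiplicative factor


def plasma_glucose(value_mmol_l, whole_blood=False):
    """Return a plasma-equivalent glucose concentration in mmol/L."""
    return value_mmol_l * WHOLE_BLOOD_TO_PLASMA if whole_blood else value_mmol_l


def has_diabetes(fasting_glucose_mmol_l, whole_blood=False, hba1c_percent=None,
                 on_antidiabetic_medication=False, register_diagnosis=False):
    """Any-of rule: glucose, HbA1c (if measured), medication or recorded diagnosis."""
    glucose = plasma_glucose(fasting_glucose_mmol_l, whole_blood)
    return (
        glucose >= 7.0
        or (hba1c_percent is not None and hba1c_percent >= 6.5)
        or on_antidiabetic_medication
        or register_diagnosis
    )


# A whole blood value of 6.4 mmol/L crosses the plasma threshold after conversion.
print(has_diabetes(6.4, whole_blood=True))
print(has_diabetes(6.4, whole_blood=False))
```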

Statistical analysis

For association analyses with T2D, we included all 62 preliminarily annotated metabolites present in both cohorts and excluded all features that could not be annotated or were present in only one of the cohorts. Log2-transformed metabolite signals were adjusted by ANOVA-type normalization within each injection run for winter season (an indicator variable for sampling during November to March) and storage time between sampling and UPLC-MS analysis, followed by averaging across injections.

We divided the association analysis into two parts (Fig. 1): In part 1, we used logistic regression adjusted for age, sex and urinary creatinine (measured with a colorimetric assay IL Test Creatinine 181672–00 on a Monarch 2000 centrifugal analyzer [Instrumentation Laboratories, Lexington, MA, USA] in ULSAM; and with a modified Jaffe reaction on an Architect ci8200 analyzer [Reagent 3L81, Abbott, Abbott Park, IL, USA] in PIVUS) to test associations between each urinary metabolite (scaled to standard deviation units) and prevalent T2D at baseline. Urinary creatinine was included as a covariate because it was strongly associated with the dominant principal components in principal component analysis (implemented as part of the XCMS normalization steps), and to control for between-sample variation in urine concentration and sampling method (24 h versus spot collection). Metabolites associated at a false discovery rate (FDR) < 0.05 in the discovery sample PIVUS were tested in the replication sample ULSAM. In part 2, we used LASSO L1-regularised logistic regression to select urinary metabolites that together improved risk prediction for incident T2D when added to the risk factors in the Framingham Offspring Study (FOS) diabetes risk score32. We combined both cohorts, excluded all cases of prevalent T2D at baseline and randomly split the dataset into a 2/3 training and 1/3 holdout test set. The training dataset was used to develop the LASSO model by tenfold bootstrapped internal cross-validation and the test set was used only once to evaluate performance of the selected model with regard to risk discrimination (C statistic), calibration (plots of observed against predicted risk), goodness-of-fit (Hosmer–Lemeshow test) and explained variance (Nagelkerke's pseudo-R2).
To develop the model in the training set, we forced cohort status and the FOS variables (age, sex, parental history of diabetes, body mass index, blood pressure, fasting glucose, HDL-cholesterol, triglycerides) into the model and allowed free shrinkage on all 6 urinary metabolite regression coefficients. Analyses were carried out in R version 3.3.3.
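The FDR threshold used in part 1 can be illustrated with a short sketch. It assumes the Benjamini–Hochberg procedure, which the text does not name, so this is an illustration of one common FDR method rather than a statement of the authors' exact approach. The function name `benjamini_hochberg` is hypothetical. The seven P values are those reported for PIVUS in the Results; the remaining 55 of the 62 tests are not reported in this section and are replaced by the placeholder value 1.0, which is an assumption made only so that the example runs.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# P values reported for the seven FDR-significant metabolites in PIVUS.
reported_p = [7.93e-5, 1.31e-4, 1.44e-3, 2.05e-3, 3.53e-3, 4.71e-3, 5.47e-3]
# PLACEHOLDER: the other 55 P values are unknown here and set to 1.0.
all_p = reported_p + [1.0] * (62 - len(reported_p))

q_values = benjamini_hochberg(all_p)
n_significant = sum(q < 0.05 for q in q_values)
print(n_significant, [round(q, 4) for q in q_values[:7]])
```

Under these assumptions the seven reported P values remain below an FDR of 0.05 with 62 tests, which is consistent with the results described in the text.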

Study approval

All participants provided written informed consent. The study was approved by the Regional Ethical Review Board of Uppsala University (Dnr. 251/90; 97/329; 2/605 and 2007/338 for ULSAM; Dnr. 00,419; 2005/M-079 and 2011/045 for PIVUS) and has been carried out in accordance with the principles of the Declaration of Helsinki as revised in 2008. Data handling since May 2018 has been in accordance with the EU General Data Protection Regulation 2016/679 ("GDPR").

Data and resource availability

Individual level data from ULSAM and PIVUS are not deposited in the public domain, as existing ethical permits and Swedish/EU data protection regulations do not allow this. Full datasets are made available to researchers who meet the criteria for confidential data access as stipulated by participant informed consent and institutional review board/ethics committee permission at Uppsala University (Uppsala, Sweden). Data access in ULSAM is granted through the Interdisciplinary Collaboration Team on Uppsala Longitudinal Studies (ICTUS; https://www2.pubcare.uu.se/ULSAM/res/proposal.htm; contact: vilmantas.giedraitis@pubcare.uu.se). Data from the PIVUS study can be applied for at the PIVUS steering committee (https://www.medsci.uu.se/pivus/; contact: lars.lind@medsci.uu.se).

De-identified raw mass spectrometry data (without phenotype or other identifying information) and the analysis code can be obtained without prior ethical or legal approval from the main author (christoph.nowak@ki.se).