Background

Metabolomics is an important tool in the identification of new etiological pathways associated with chronic diseases, including breast cancer [1,2,3,4,5,6,7,8], as the metabolome reflects both endogenous parameters and exogenous exposures [9]. Prospective studies using targeted metabolomics (analyses of a pre-defined panel of metabolites) or untargeted metabolomics approaches have reported novel associations of pre-diagnostic blood concentrations of endogenous metabolites with breast cancer risk. These metabolites include lysophosphatidylcholine a C18:0 [8], 16a-hydroxy-DHEA-3-sulfate [4, 5], various carnitines [4, 5], caprate (10:0) [6], histidine, glycerol, N-acetyl-glycoprotein [7], acetone, glycerol-derived compounds, other amino acids, and lipids [2, 3], suggesting new potential avenues of research and possible additional targets for prevention.

In a previous case-control study nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort, we investigated the association between blood concentrations of endogenous metabolites, measured by targeted metabolomics, and risk of breast cancer [1]. We reported a positive association between acetylcarnitine (C2) and breast cancer risk and negative associations of arginine, asparagine, phosphatidylcholines acyl-alkyl (PCs ae) C36:3, C34:2, C36:2, C38:2, and phosphatidylcholine diacyl (PC aa) C36:3 with breast cancer risk, among women not using exogenous hormones at blood collection.

To further assess how these findings can inform breast cancer prevention research, a better understanding of potentially modifiable determinants of blood levels of these metabolites is needed. Towards this aim, we report here the results of a cross-sectional analysis nested in the EPIC cohort investigating associations of a wide range of lifestyle and anthropometric variables with concentrations of acetylcarnitine, arginine, asparagine, PC aa C36:3, and PCs ae C34:2, ae C36:2, ae C36:3, and ae C38:2.

Methods

The EPIC study

EPIC is an ongoing multi-center cohort study including approximately 520,000 participants recruited between 1992 and 2000 from ten European countries [10]. Female participants (n = 367,903) were aged 35–75 years at recruitment. Detailed information was collected on dietary, lifestyle, reproductive, medical, and anthropometric data at inclusion [10]. Around 246,000 women from all countries provided a baseline blood sample. Blood was collected according to a standardized protocol in France, Germany, Greece, Italy, the Netherlands, Norway, Spain, and the UK [10]. Serum (except in Norway), plasma, erythrocytes, and buffy coat aliquots were stored in liquid nitrogen (−196°C) in a centralized biobank at IARC. In Denmark, blood fractions were stored locally in the vapor phase of liquid nitrogen containers (−150°C), and in Sweden, they were stored locally at −80°C in standard freezers. All participants provided written informed consent to participate in the EPIC study. This study was approved by the ethics committee of the International Agency for Research on Cancer (IARC) and all centers.

Study population and cross-sectional design

This study included all female EPIC participants (1) who provided a blood sample; (2) who were previously included in one of six case-control studies on cancer etiology nested within the EPIC cohort (on breast [1], endometrial [11], colorectal [12], kidney [13], liver [14], and gallbladder cancers) with available blood concentrations of acetylcarnitine, arginine, asparagine, PCs aa C36:3, ae C34:2, ae C36:2, ae C36:3, and ae C38:2 measured by the same targeted metabolomics approach; (3) who were included as control participants in these studies (i.e., free of cancer (except non-melanoma skin cancer) at the time of the diagnosis of the cases, using incidence-density sampling, and matched to cases by age, sex, study center, time of blood collection, fasting status at blood collection (except for kidney cancer study), menopausal status and exogenous hormone use at blood collection (for breast, endometrial, liver, and gallbladder studies), and phase of menstrual cycle (for breast and endometrial cancer studies)); and (4) whose samples were included in an analytical batch including at least 10 samples, to ensure proper normalization of metabolite concentrations (see the “Statistical analyses” section) (N = 3163).

We then excluded women who declared use of hormones at blood collection (n = 768), and those whose hormone use status at blood collection was unknown (n = 37), because associations between the studied metabolites and breast cancer risk were limited to hormone non-users [1]. The current analysis included data from 2358 participants.

The 2358 participants were split into a discovery set (N = 1572, 66.7% of the population) and a validation set (N = 786, 33.3% of the population). Metabolites of interest were those found to be associated with breast cancer risk, and this observed association could result from associations between metabolites and some of the correlates under study in the present work. Thus, the discovery set included all controls from the breast cancer study (n = 1079), and randomly selected controls from the other nested case-control studies (n = 493), while the validation set did not include participants from the breast cancer study. This way, associations identified on the discovery set and further validated on the validation set are guaranteed not to be driven by the breast cancer study only.
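The allocation rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `(participant_id, study)` tuple layout, the `"breast"` label, and the seed are all hypothetical, and the two-thirds target is inferred from the reported set sizes rather than stated as a design parameter.

```python
import random


def split_discovery_validation(participants, discovery_fraction=2 / 3, seed=0):
    """Split controls into discovery and validation sets.

    participants: list of (participant_id, study) tuples (hypothetical layout).
    All controls from the breast cancer study go to the discovery set; controls
    from the other studies are sampled at random to reach the target size.
    """
    breast = [p for p in participants if p[1] == "breast"]
    others = [p for p in participants if p[1] != "breast"]

    # Target discovery size (assumed here to be two thirds of all participants).
    target = round(len(participants) * discovery_fraction)
    n_extra = min(len(others), max(0, target - len(breast)))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    extra = rng.sample(others, n_extra)
    extra_ids = {p[0] for p in extra}

    discovery = breast + extra
    # The validation set never contains breast cancer study controls.
    validation = [p for p in others if p[0] not in extra_ids]
    return discovery, validation
```

With the numbers reported here (2358 participants, 1079 of them breast cancer study controls), this rule would draw 493 controls from the other studies and leave 786 for validation.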

Laboratory measurements

Before exclusion of hormone users, a total of 3179 samples were available for 3163 women. All samples, plasma (in 95.1% of samples) or serum, were assayed by liquid chromatography-mass spectrometry using the AbsoluteIDQ p180 commercial kit (Biocrates Life Sciences AG, Innsbruck, Austria). A total of 2289 (72.0%) samples were assayed at the laboratory of the Biomarkers Group at IARC (breast, colorectal, kidney, and liver cancer studies); 851 (26.8%) at Imperial College London; and 39 (1.2%) at the Helmholtz Zentrum, München, Germany. At IARC, analyses were run on QTRAP5500 (breast, kidney, and liver cancer studies) and TQ4500 (colorectal cancer study) mass spectrometers (AB Sciex, Framingham, MA, USA), while at Imperial College London and the Helmholtz Zentrum, analyses were run using an API4000TQ (endometrial and gallbladder cancer studies). All analyses for a given study were performed using the same instrument. Sixteen participants had samples analyzed in two different studies, at IARC and at the Helmholtz Zentrum; for these participants, metabolite concentrations were averaged over the two measurements.

Arginine concentrations could not be quantified in five of the 3179 samples because they were below the lower limit of quantification (LLOQ); these values were imputed as half the LLOQ, consistent with previous work [1].
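As a minimal sketch of this imputation rule, assuming that unquantified values are encoded as `None` (a hypothetical convention; the function name and the example LLOQ value are likewise illustrative only):

```python
def impute_below_lloq(concentrations, lloq):
    """Replace values below the LLOQ (encoded as None) with half the LLOQ."""
    return [lloq / 2 if c is None else c for c in concentrations]


# Example with an arbitrary LLOQ of 2.0 (not a value from the study).
print(impute_below_lloq([10.0, None, 7.5], lloq=2.0))
```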

Covariate data

Details of data collection in EPIC are described elsewhere [10]. Lifestyle and medical factors were assessed in the baseline questionnaire. Usual dietary intakes were assessed using center- or country-specific validated questionnaires covering the previous 12 months and matched to the US Department of Agriculture food composition database to estimate macronutrient intakes [15]. Glycemic index and glycemic load were computed. In all EPIC centers, except France, Oxford, and Norway, height, weight, and waist and hip circumference were measured on all participants using similar protocols (in Umeå (Sweden), only weight and height were measured). In France and Oxford, weight, height, and waist and hip circumferences were measured in a sub-set of participants, but self-reported weight and height were obtained from all individuals, and validation studies showed high correlations between self-reported and measured values (r ≥ 0.90) [16, 17]. In Oxford, self-reported measurements also included waist and hip circumferences. In Norway, only self-reported height and weight were available.

Dietary data were used to compute the inflammatory score of the diet (ISD) [18] (reflecting the inflammatory potential of the diet based on 28 dietary components), the modified Mediterranean diet score [19] (a 9-component score indicating the degree of adherence to the traditional Mediterranean diet; 0 minimal adherence to 9 maximal adherence), and the Diet Quality Index-International (DQI-I; a 17-component score based on general nutritional guidelines [20, 21]; 0 to 100, minimal to maximal diet quality). Dietary and lifestyle data were combined to calculate the Healthy Lifestyle Index (HLI) [22], designed to reflect five components of lifestyle factors (smoking, alcohol consumption, diet (cereal fibers, red and processed meat, the ratio of polyunsaturated to saturated fatty acids, margarine, glycemic load, and fruits and vegetables), physical activity, and body mass index; ranging from 0, least healthy, to 20). Furthermore, we calculated the World Cancer Research Fund/American Institute for Cancer Research score, which reflects recommendations for cancer prevention on weight maintenance, physical activity, intake of food and drinks which promote weight gain, of plant-based foods, of animal-based foods, of alcohol, and breastfeeding [23] (from 0, low adherence to recommendation, to 7 for women).

Statistical analyses

Normalization of metabolite concentrations

A specific statistical pipeline was developed [24] and applied on raw metabolite concentrations (before exclusion of hormone users) to adequately pool measures obtained from different studies, instruments, and laboratories. This pipeline was shown to be efficient in removing unwanted variability and improving the comparability of measurements acquired across different nested studies. Log-transformed concentrations of the metabolites of interest were normalized to remove effects of analytical batch and study, which were estimated as random effects in mixed-effects linear models correcting for possible heteroscedasticity. Corrected metabolite concentrations analyzed in this work correspond to residuals from the model.

Missing data

When missing values on covariates represented less than 5% of the overall values, they were imputed to the mode value (categorical variables: number of full-term pregnancies, ever use of oral contraceptive, ever use of hormones for menopause (by menopausal status), education level, physical activity, smoking status, fasting status) or median (continuous variables: age at menarche, age at first full-term pregnancy (among parous women), duration of breastfeeding among women who breastfed, waist circumference, hip circumference, waist/hip ratio, time at blood collection). When missing values represented more than 5% of values for a variable, this variable was categorized, and a “missing” category was created (phase of menstrual cycle at blood collection for pre- and perimenopausal women, breastfeeding, lifetime alcohol consumption, Healthy Lifestyle Index, WCRF/AICR score).
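The decision rule above can be sketched as follows. This is an illustration under stated assumptions: missing values are encoded as `None`, the function name is hypothetical, quartile binning is assumed for continuous variables that exceed the threshold (the text only says such variables were categorized), and a variable with exactly 5% missing values, a case the text does not specify, is treated here as exceeding the threshold.

```python
import statistics


def handle_missing(values, kind, threshold=0.05):
    """Impute or flag missing covariate values (None).

    kind: "categorical" or "continuous".
    Below the threshold: impute the mode (categorical) or median (continuous).
    Otherwise: return categories including an explicit "missing" level.
    """
    observed = [v for v in values if v is not None]
    missing_fraction = (len(values) - len(observed)) / len(values)

    if missing_fraction < threshold:
        if kind == "categorical":
            fill = statistics.mode(observed)
        else:
            fill = statistics.median(observed)
        return [fill if v is None else v for v in values]

    if kind == "categorical":
        return ["missing" if v is None else v for v in values]

    # Continuous variable: bin into quartiles (an assumed choice of categories).
    cuts = statistics.quantiles(observed, n=4)

    def label(v):
        return "Q" + str(1 + sum(v > c for c in cuts))

    return ["missing" if v is None else label(v) for v in values]
```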

Identification of correlates

Participants’ characteristics were described using frequencies for categorical variables and mean (standard deviation) for continuous variables. We calculated partial Pearson’s correlations between metabolite concentrations (adjusted for center and age) and between metabolites and age (adjusted for center).
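To illustrate what adjustment for center means in these partial correlations, the sketch below computes a Pearson correlation after removing center means from both variables, which is equivalent to correlating the residuals of each variable regressed on center indicators. It covers the center-only adjustment (as used for metabolite and age); additional adjustment for age would require residualizing on age as well and is not shown. All function names are hypothetical.

```python
import math
from collections import defaultdict


def center_within_groups(values, groups):
    """Subtract the group (center) mean from each value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr_given_center(x, y, center):
    """Partial Pearson correlation of x and y, adjusted for center."""
    return pearson(center_within_groups(x, center),
                   center_within_groups(y, center))
```

In a toy example where two centers differ in their average levels, the crude correlation can be strongly positive while the center-adjusted correlation is negative, which is why the adjustment matters when pooling data across centers.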

Analyses were first run in the discovery set. For each metabolite of interest and each lifestyle variable, a linear regression model was built with metabolite concentration as a dependent variable. Models were adjusted for center of recruitment, age at blood collection, menopausal status (premenopausal, perimenopausal, postmenopausal [25]), phase of the menstrual cycle for premenopausal women (follicular, ovulatory, luteal, missing), time of the day, and fasting status at blood collection (“No”: < 3 h since last meal (< 4 h in Umeå), “In between”: 3–6 h (4–8 h in Umeå), and “Yes”: > 6 h (> 8 h in Umeå)). Models that examined age as exposure were not adjusted for age, and models with menopausal status as main exposure were not adjusted for phase of menstrual cycle, as this variable is defined in premenopausal women only.

Variables tested as possible correlates were age at blood collection (continuous), age at menarche (continuous), total duration of menstrual cycles (quartiles/missing), pregnancy (ever/never), number of full-term pregnancies (continuous), age at first full-term pregnancy (nulliparous/quartiles), breastfeeding (ever/never/missing), duration of breastfeeding (nulliparous/quartiles/missing), use of oral contraceptive (ever/never; current users excluded), menopausal status at blood collection (premenopausal/perimenopausal/postmenopausal), use of hormones for menopause (ever/never; current users are excluded), education level (no schooling or primary/technical, professional or secondary/longer education), physical activity (Cambridge Index [26]: inactive/moderately inactive/moderately active/active), smoking status (never/former/current), smoking status combined with intensity (never/current, 1–15 cigarettes/day/current, 16+ cigarettes/day/current, pipe/cigar/occasional/former, quit for ≤10 years/former, quit 11–20 years/former, quit > 20 years), baseline alcohol consumption (continuous, g/day), lifetime alcohol consumption (non-drinker/former drinker/current > 0–3 g/day/> 3–12 g/day/> 12–24 g/day/> 24 g/day/missing), BMI (continuous, kg/m2), waist circumference (continuous, cm), hip circumference (continuous, cm), waist/hip ratio (continuous), height (continuous, cm), total energy intake (continuous, kcal/day), and the following food components estimated as residuals on total energy intake (continuous, g/day): protein, carbohydrate, starch, sugar, fiber, fat (total), fatty acids (monounsaturated, polyunsaturated, saturated, trans, trans-monoenoic, trans-polyenoic), glycemic index (continuous), glycemic load (continuous), Healthy Lifestyle Index (0–10/11–15/16–20), WCRF/AICR score (quartiles/missing), modified Mediterranean diet score (continuous), diet quality index (continuous), and inflammatory score of the diet (continuous).

For each metabolite, P-values from F-tests for each variable were collected and were corrected for multiple testing by controlling for family-wise error rate at α = 0.05 by permutation-based stepdown minP adjustment of P-values, a method which accounts for dependencies between tests [27].
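The core of a step-down minP adjustment can be sketched as below. This is a generic illustration of the procedure, not the implementation used in the study: it assumes that P-values for every test have already been recomputed on each permuted dataset and are supplied as a list of rows (`permuted`), and it uses the plain proportion of permutations without the "+1" correction that some implementations apply. Names are hypothetical.

```python
def stepdown_minp(observed, permuted):
    """Step-down minP adjusted P-values.

    observed: list of m raw P-values.
    permuted: list of B rows, each holding the m P-values obtained
              from one permutation of the data (same test order).
    """
    m = len(observed)
    # Ranks of the tests, from smallest to largest observed P-value.
    order = sorted(range(m), key=lambda j: observed[j])

    counts = [0] * m
    for row in permuted:
        running_min = 1.0
        # Successive minima, walking from the largest observed P-value down.
        for rank in range(m - 1, -1, -1):
            running_min = min(running_min, row[order[rank]])
            if running_min <= observed[order[rank]]:
                counts[rank] += 1

    adjusted = [0.0] * m
    running_max = 0.0
    for rank in range(m):
        # Enforce monotonicity of the adjusted P-values.
        running_max = max(running_max, counts[rank] / len(permuted))
        adjusted[order[rank]] = running_max
    return adjusted
```

Because the minima are taken jointly over all tests within each permutation, the correlation structure between tests is carried into the adjusted P-values, which is the property referred to above.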

Validation

All statistically significant associations in the discovery set (based on P-values corrected for multiple tests ≤0.05) were assessed in the validation set, using the same model and categories of variables as in the discovery set. In this validation set, a more conservative approach was chosen for controlling for multiple tests [28], i.e., the Bonferroni correction based on the number of tests run for each metabolite.
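For comparison, the Bonferroni correction used in the validation step is simple to write down. The sketch below assumes it is applied separately to the set of P-values tested for one metabolite; the function name and example values are illustrative only.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values for the tests run on one metabolite."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Example: three associations carried forward for one metabolite.
adjusted = bonferroni([0.01, 0.04, 0.5])
validated = [p <= 0.05 for p in adjusted]
```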

For all variables showing a significant association with the metabolites of interest in both the discovery and validation sets, continuous variables were categorized (quartiles) and means of metabolites, with 95% confidence intervals, were estimated in each category, using the overall dataset (n = 2358).

Interactions

For each metabolite and each variable examined as potential correlate, we investigated interaction with fasting status (no/in between/yes), menopausal status at blood collection (pre-/peri-/postmenopausal), and BMI (18.5–24.9/25–29.9/≥30 kg/m2, excluding n = 15 participants with BMI < 18.5 kg/m2), in the discovery set. To do so, an interaction term was added in the model and the P-value associated with this term was evaluated, after correction for multiple testing using the permutation minP algorithm.

Sensitivity analyses

We conducted sensitivity analyses (1) excluding participants from the liver and gallbladder studies (n = 128), for which the blood fraction analyzed was serum and not plasma, and (2) excluding participants with self-reported diabetes (n = 71) or with missing data on diabetes status (n = 160) at recruitment.

Results

Participants’ characteristics overall and from the discovery and validation sets are shown in Table 1. Overall, 39.7% of the participants were not fasting at blood collection while 44.4% were considered fasting (more than 6 h since last meal (8 h in Umeå)). Around 30% of participants were premenopausal. Overall, participant characteristics were similar in the discovery and validation sets (Table 1). Of note, the mean age (standard deviation (SD)) at blood collection was 55.5 (8.1) years in the validation set and 53.1 (8.6) years in the discovery set. Consequently, the proportion of postmenopausal women was 61.8% in the validation set and 51.4% in the discovery set. In the validation set, 42.0% of participants had ever used oral contraceptives (vs 50.3% in the discovery set), 53.3% of women had no schooling or primary education only (vs 47.3% in the discovery set), 29.9% were physically inactive (vs 24.7% in the discovery set), 16.9% were current smokers (vs 21.6% in the discovery set), and 26.3% were alcohol non-consumers (vs 19.2% in the discovery set).

Table 1 Main characteristics of women included (hormone non-users only), overall and in discovery and validation sets

In all participants (N = 2358), strong correlations were observed between acyl-alkyl PCs (Fig. 1, Pearson’s correlation coefficients 0.61 to 0.92), while moderate correlations were observed between acyl-alkyl PCs and PC aa C36:3 (0.41 to 0.55). Arginine was weakly correlated with all metabolites except acetylcarnitine (C2), with an observed correlation of 0.19 with asparagine and correlations ranging from 0.11 to 0.13 with PCs. Asparagine showed similarly low correlations (0.12 to 0.15) with PCs and a negative correlation with C2 (−0.17). C2 showed the greatest correlation with age (0.23), followed by PC aa C36:3 (0.19), while for other metabolites correlations with age ranged from −0.09 to 0.07.

Fig. 1

Partial Pearson correlations between metabolites identified as associated with breast cancer risk, and age (N = 2358). Metabolite concentrations were log-transformed and normalized as described in the “Methods” section. Coefficients are shown only for significant correlations (P-value < 0.05). Correlations between metabolite concentrations are adjusted for center and age, and correlations between metabolites and age are adjusted for center. Abbreviations: C2, acetylcarnitine; PC aa, phosphatidylcholine diacyl; PC ae, phosphatidylcholine acyl-alkyl

In the discovery set, 104 associations (31% of the 336 associations tested, 8 metabolites × 42 variables) had P-values ≤0.05 (Supplementary Table 1, see Additional file 1). After correction of P-values for multiple testing, 57 of these associations remained significant (Table 2), which did not include any associations with arginine. Thirty associations were replicated in the validation set (same direction as in the discovery set, Supplementary Table 1, see Additional file 1) after Bonferroni correction of P-values, which did not include any associations with PC aa C36:3 (Table 2).

Table 2 P-values for associations between metabolites and selected variables

Figure 2 represents means of the metabolite concentrations across categories of variables in the overall population (n = 2358), for metabolites and variables for which a significant association was detected in both the discovery and validation sets. Asparagine concentration was negatively associated with BMI, waist and hip circumferences, and waist/hip ratio. C2 was positively associated with age but not with the other factors. PCs ae C36:2 and ae C38:2 were negatively associated with BMI, waist and hip circumferences, and waist/hip ratio. Negative associations with BMI, waist circumference, and waist/hip ratio were also observed for PCs ae C34:2 and ae C36:3. PCs ae C34:2, C36:2, and C36:3 were additionally positively associated with total fat intake, and with saturated fatty acid intake, which was also positively associated with PC ae C38:2. For PC ae C36:2, additional associations were observed with alcohol intake at recruitment and over lifetime (negative) and with HLI and WCRF/AICR score (positive).

Fig. 2

Adjusted means of metabolite concentrations by categories of correlates (N = 2358). Only metabolites and variables for which a significant association was detected in the discovery and validation sets are shown. Adjusted means and their 95% confidence intervals were obtained from linear regression models adjusted for fasting status, center, age, date and time at blood collection, menopausal status, and phase of menstrual cycle at blood collection. Dotted lines indicate the overall means of metabolite concentration. *Residuals on total energy intake. Abbreviations: Asn, asparagine; AICR, American Institute for Cancer Research; BMI, body mass index; C2, acetylcarnitine; PC aa, phosphatidylcholine diacyl; PC ae, phosphatidylcholine acyl-alkyl; WCRF, World Cancer Research Fund

Analyses of interactions with BMI, menopausal, and fasting status (Supplementary Table 2, see Additional file 1) did not suggest any significant interaction with these variables in the associations reported above. The only interactions with significant P-values after correction for multiple testing were with menopausal status for the association between asparagine and age (P-int = 0.04) and with fasting status for the association of height and PC ae C38:2 (P-int = 0.03).

When serum samples were excluded (restricting the analysis to plasma samples) from both the discovery (n = 40) and validation (n = 88) sets, results were largely consistent with those of the main analyses (data not shown), except for generally larger P-values (due to the lower statistical power), which rendered the following associations non-significant in the discovery set: asparagine and WCRF/AICR score, PC aa C36:3 and age and BMI, and PC ae C38:2 and trans-polyenoic fatty acid intake. In the validation set, associations between asparagine and hip and waist circumferences were no longer statistically significant. However, estimates were very close in direction and magnitude to the ones obtained overall (before exclusion of serum samples).

After exclusion of participants with self-reported diabetes at blood collection (discovery set, n = 45; validation set, n = 26) or with missing information on diabetes (discovery set, n = 86; validation set, n = 74), associations were very similar in direction and magnitude to those observed in the whole dataset, although sometimes not significant in the validation set (data not shown), such as asparagine and hip circumference and waist/hip ratio, and PC ae C36:3 and BMI and total and saturated fat intakes.

Discussion

In this study, we identified several lifestyle and anthropometric correlates of blood metabolites which have been previously associated with breast cancer risk in women not taking exogenous hormones at blood collection. Concentrations of PCs ae C34:2, ae C36:2, ae C36:3, and ae C38:2 showed negative associations with adiposity and positive associations with total (except for PC ae C38:2) and saturated fat intakes. PC ae C36:2 also showed a negative association with alcohol consumption and positive associations with the WCRF/AICR score and the Healthy Lifestyle Index. Asparagine concentrations were negatively associated with adiposity, and arginine concentrations were not associated with any of the variables examined. Acetylcarnitine concentrations were positively associated with age but not with any of the other factors. We did not identify any correlate of the only diacyl PC (PC aa C36:3) associated with breast cancer risk. These associations were consistent across different BMI, fasting status, and menopausal status categories.

Acyl-alkyl phosphatidylcholines have been previously associated with various lifestyle and dietary factors. In our work, concentrations of acyl-alkyl PCs were negatively associated with measures of adiposity (including BMI and waist circumference). This observation is consistent with the global pattern of negative associations between PCs ae and BMI previously reported in EPIC [12], in particular for PCs ae C38:2 [29] and ae C36:2 [30], and in the EPIC-Potsdam sub-cohort [31]. PCs ae C38:2 and C34:2 were also associated with weight loss in an intervention study (n = 17 participants) [12]. In the EPIC-Potsdam sub-cohort [32], a negative association of several PCs ae was reported with risk of type 2 diabetes, as well as a positive correlation with circulating high-density lipoprotein cholesterol. In an analysis of two studies of Japanese and American men and women [33], PCs ae C34:2, C36:3, and C38:2 were negatively associated with metabolic syndrome (in particular with high-density lipoprotein cholesterol and triglycerides), but not with elevated waist circumference. Among 200 Canadian adults younger than 55 years, concentrations of PCs ae C34:2, C36:2, and C36:3 were lower in obese participants with metabolic syndrome than in obese participants without metabolic syndrome and in normal weight participants [34], while an opposite trend was reported for several PCs aa. These results support an association of PCs with obesity or metabolic health that deserves further investigation.

Lower concentrations of PCs were reported in vegetarian and vegan men than in meat eaters [35]. Moreover, analyses in colorectal cancer patients (60% males) indicated positive associations of several PCs, mostly acyl-alkyl, with Western and carnivore dietary patterns [36]. These results are consistent with the positive association we report with saturated fat intake. However, few studies have been conducted in women, and an analysis conducted among healthy participants from the KarMeN study, not using exogenous hormones, suggested differences in plasma concentrations of some PCs between men and women, although PCs were not the most important components for predicting sex [37]. A recent metabolomic study of plasma lipid-related profiles and diet quality in the Nurses’ Health Study [38] reported that PC C36:2 plasmalogen was associated with unhealthy components of the Alternate Healthy Eating Index.

A negative association between PCs and alcohol consumption, in particular for PC ae C36:2, has been reported in EPIC, in both men and women [39]. A negative association with PC ae C36:2 was also observed separately in men and women from the KORA F4 study when comparing moderate-to-heavy drinkers (≥ 20 g/day for women, ≥ 40 g/day for men) with light drinkers (< 20 g/day for women, < 40 g/day for men) [40], and in the CARLA study (men and women combined) [41].

The positive associations reported between PC ae C36:2 and the WCRF/AICR and HLI scores, which integrate alcohol and body weight components, likely reflect inverse associations of this metabolite with alcohol consumption and adiposity, as shown in the analyses of single correlates. These associations are in line with a recent study conducted in EPIC on metabolic signatures of a healthy lifestyle, assessed by the WCRF/AICR score [42]. In that work, PCs ae C36:2 and C38:2 were among the endogenous metabolites with the greatest loadings (> 100 examined) in the signature of the WCRF/AICR score. This metabolic signature showed the greatest correlations with the recommendations regarding normal weight maintenance and alcohol avoidance, in line with the associations we report. In contrast, a study in colorectal cancer patients indicated negative associations between several PCs ae and aa and the WCRF/AICR score [36]. However, the score was restricted to its dietary components, therefore not considering the body weight component.

Metabolomics studies on aging reported increasing circulating concentrations of acylcarnitines, mostly long-chain, with age [43, 44], which could reflect loss in mitochondrial function [45]. In a study [46] comparing metabolites in serum samples obtained 7 years apart from the same individuals (KORA S4 and KORA F4), acetylcarnitine and several other acylcarnitines increased in the follow-up samples compared with baseline samples. Associations of similar direction were observed in their validation study on samples collected 4 years apart, although not statistically significant after accounting for multiple testing. Acylcarnitines have also been associated with impaired glucose metabolism and insulin resistance, but these associations were most often reported for long-chain or odd short-chain acylcarnitines [47,48,49,50,51], although associations with acetylcarnitine (which is an even short-chain acylcarnitine) have also been reported [52]. In our previous work, this metabolite was the only one to show a positive association with breast cancer risk in age-matched cases and controls, suggesting that its association with age does not fully explain the association with breast cancer. In the present work, we did not observe any association of acetylcarnitine with anthropometric factors likely associated with metabolic health, in contrast with a positive association with BMI reported in the EPIC Norfolk cohort [53].

A negative association between circulating asparagine and obesity has been recently reported in different populations, including Europeans [50, 53], obese Iranian adults [54], and Japanese [55]. Negative associations with diabetes and coronary artery disease have also been reported [50, 53], in lean as well as in obese subjects [49]. However, most studies exploring the associations between amino acids and obesity showed significant associations only with branched-chain amino acids (which do not include asparagine) [49, 56]. Asparagine was also part of the metabolic signature of a healthy lifestyle derived in EPIC [42] and of the metabolic signature of BMI, waist circumference, and waist/hip ratio [12].

In our study, arginine was not associated with any of the factors investigated. This result contrasts with those of several studies reporting negative associations of arginine with age [46] and with obesity and alcohol intake, as well as a positive association with smoking in the EPIC Norfolk cohort [53], which, however, did not exclude hormone users. Arginine has also been negatively associated with hemoglobin concentrations and with insulin-like growth factor 1 and estradiol [57] in premenopausal women not using exogenous hormones. These observations suggest that arginine concentrations may be regulated more tightly by endogenous metabolism than by lifestyle exposures.

Major strengths of this work include the wide variety of data collected, which enabled us to investigate many potential correlates of the metabolites associated with breast cancer risk, and the sample size, which is large relative to other metabolomics studies, a field in which large studies are essential [58]. With the detailed information available on characteristics of women at blood collection, we were also able to exclude hormone users from our analysis, which is important as hormone use could affect concentrations of some metabolites [59].

A first limitation to this work is the cross-sectional design, which prevents us from drawing any conclusions on the timing or causality of the associations. Another limitation is that the large sample size was achieved by pooling data from different previous studies, rather than by initial design, therefore adding methodological complexity because of analyses performed by different laboratories, with different instruments, and on different biological matrices. However, the analytical protocol used has shown high inter-laboratory reproducibility [60], and we addressed potential heterogeneity in metabolite concentrations by developing a dedicated pipeline [24] applied to the data prior to statistical analyses. In addition, for all metabolites included (except asparagine, not evaluated), high correlations were reported between measures in serum and in plasma (r ≥ 0.78, except for arginine, r = 0.50), although concentrations were generally higher in serum than in plasma, in particular for arginine [61]. Good reliability of measurements was also reported for both matrices (intra-class correlations for the metabolites of interest ≥0.58 in plasma, ≥0.67 in serum) [62]. Furthermore, exclusion of serum samples did not substantially modify the results. A third limitation is the heterogeneity of fasting status of participants. However, variables to determine fasting status were carefully recorded, therefore enabling us to test the effect of this variable on the results, and we found no evidence of heterogeneity in the associations by fasting status. Dietary intakes were assessed using food frequency questionnaires adapted to local habits. These questionnaires were validated through a calibration approach using a common 24-h diet recall [63] to adjust for possible systematic misclassification in dietary measurements, and a validation study using 24-h urine samples was conducted [64].
Despite these methodological efforts, however, potential measurement error may persist because of recall bias, misreporting of consumption for certain foods, or errors related to the food composition tables used (despite careful matching [15]). Nevertheless, several cross-sectional studies showing good correlations [65, 66] between intakes measured by food questionnaires and expected specific biomarkers suggest that data from food frequency questionnaires can be used for the purposes of the present work. Finally, the applied technology for PC measurement does not allow for precise identification of the compounds measured, since the signal observed is not specific and may correspond to different structural isomers. Further work is needed to investigate specifically associations with lipid compounds.

Conclusions

In conclusion, this cross-sectional analysis identified several modifiable correlates of blood concentrations of metabolites associated with breast cancer risk. These associations may indicate possible mechanisms underlying associations between lifestyle and anthropometric factors and risk of breast cancer. Dedicated approaches such as mediation analysis appear promising for understanding how our results could improve current knowledge on the association between lifestyle factors and breast cancer risk. Intervention studies would be required to evaluate the possible causality of the associations observed with modifiable factors and to assess whether concentrations of these specific metabolites could be modified through lifestyle changes.