Background

More than 1.9 million new colorectal cancer (CRC) cases and 935,000 deaths from CRC were estimated to occur in 2020 worldwide, ranking third and second in incidence and mortality, respectively [1]. Despite the considerable reduction in incidence and mortality ascribed to screening and improved treatment, CRC is often diagnosed at advanced clinical stages. Therefore, identifying and reducing modifiable risk factors are attractive primary prevention strategies to counter the rising global burden of CRC.

The potential health effects of plant-based diets have been increasingly recognized and ascribed to their environmental sustainability benefits [2, 3]. However, not all plant-based foods are beneficial with respect to CRC risk. High intakes of whole grains, fruits, vegetables, and fiber were associated with a lower risk of CRC [27].

We calculated the global polygenic risk score (PRS) for CRC based on an up-to-date genome-wide association study reporting 95 single-nucleotide polymorphisms (SNPs) significantly associated with CRC in participants of European descent [21]. The effect size of each SNP (β-coefficient) and other related information are shown in Additional file 1: Table S4. The PRS for CRC was calculated by summing the number of risk alleles at each SNP, weighted by the SNP's effect size on CRC: PRS = (β1 × SNP1 + β2 × SNP2 + … + βn × SNPn) × (N / sum of the β-coefficients), where SNPn is the number of risk alleles at the n-th SNP.
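The weighted sum in the PRS formula can be illustrated with a short sketch. This is not the study's code: the text does not define N, so the sketch assumes N is the number of SNPs in the score, and the SNP names, effect sizes, and genotypes below are hypothetical.

```python
from typing import Mapping


def polygenic_risk_score(
    betas: Mapping[str, float],
    risk_allele_counts: Mapping[str, float],
) -> float:
    """Weighted risk allele sum rescaled by N / sum(beta).

    Assumption: N is the number of SNPs in the score (not defined in the text).
    """
    if not betas:
        raise ValueError("at least one SNP is required")
    weighted_sum = 0.0
    for snp, beta in betas.items():
        count = risk_allele_counts[snp]
        # A diploid genotype carries between 0 and 2 copies of the risk allele.
        if not 0 <= count <= 2:
            raise ValueError(f"{snp}: risk allele count must be between 0 and 2")
        weighted_sum += beta * count
    n_snps = len(betas)
    return weighted_sum * n_snps / sum(betas.values())


# Hypothetical effect sizes and genotypes, for illustration only.
example_betas = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.05}
example_counts = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
example_prs = polygenic_risk_score(example_betas, example_counts)
```

Under this assumption, the rescaling puts the score on the scale of a risk allele count: if all β-coefficients were equal, the PRS would reduce to the unweighted number of risk alleles.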

Covariates

Sociodemographic factors (age at the last dietary assessment, sex, ethnicity, and educational qualifications) and lifestyle factors (alcohol intake frequency, smoking status, and physical activity) were self-reported at the baseline assessment. The Townsend deprivation index was used as an indicator of socioeconomic status, with higher scores indicating greater socioeconomic deprivation [28]. Alcohol intake frequency was classified as daily or almost daily, three or four times a week, once or twice a week, one to three times a month, special occasions only, and never. Smoking status was categorized as current smoker, former smoker, and non-smoker. Three levels of physical activity (low, moderate, and high) were defined based on the International Physical Activity Questionnaire guidelines [29]. Body mass index (BMI) was calculated as weight (kg) divided by the square of height (m) and classified as < 18.5, 18.5 to 24.9, 25.0 to 29.9, and ≥ 30.0 kg/m². Total energy intake (TEI) was calculated from participants' answers to the dietary questionnaire [30].
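As a small illustration of the BMI definition and categories above, the following sketch computes BMI and assigns one of the four classes. The function names are hypothetical, and the handling of values between the printed bounds (for example 24.95) is an assumption, since the text gives the bounds to one decimal place only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Assign the four BMI classes listed in the text.

    Assumption: cut points are applied at 18.5, 25.0, and 30.0, so values
    between the printed bounds fall into the lower class.
    """
    if value < 18.5:
        return "< 18.5"
    if value < 25.0:
        return "18.5 to 24.9"
    if value < 30.0:
        return "25.0 to 29.9"
    return ">= 30.0"


# Hypothetical participant: 70 kg, 1.75 m.
example_bmi = bmi(70.0, 1.75)
example_class = bmi_category(example_bmi)
```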

Statistical analyses

The PDI, hPDI, and uPDI scores were sorted in ascending order and classified into quartiles (Q1–Q4) using three breakpoints, i.e., P25, P50, and P75. We estimated the associations of the three categorical PDIs with CRC incidence and mortality using cause-specific Cox proportional hazards regression models with time-to-event as the timescale. The results are presented as hazard ratios (HRs) and 95% confidence intervals (CIs). The proportional hazards assumption was tested using Schoenfeld residuals and was satisfied. Missing covariate values were coded as a separate category using dummy variables. We successively adjusted for age and sex, ethnicity, education, Townsend deprivation index, BMI, alcohol frequency, smoking status, physical activity, TEI, PRS for CRC, first 10 principal components of ancestry, and genotype measurement batch. The PDIs were also treated as continuous variables, and HRs per 10-score increment were reported. To investigate the dose-response association between PDIs and CRC risk, we used restricted cubic splines (RCS) within Cox proportional hazards regression to flexibly model CRC risk across the range of PDIs. We further investigated the association between PDIs and the incidence of CRC at different anatomical subsites.
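The quartile classification described above can be sketched as follows. This is an illustration rather than the study's procedure: the percentile interpolation method and the assignment of scores that equal a breakpoint to the lower group are both assumptions, and the function name is hypothetical.

```python
import statistics
from bisect import bisect_left
from typing import List, Sequence


def quartile_groups(scores: Sequence[float]) -> List[str]:
    """Label each score Q1 to Q4 using the P25, P50, and P75 breakpoints.

    Assumptions: percentiles use the 'inclusive' interpolation method, and a
    score equal to a breakpoint is placed in the lower group.
    """
    p25, p50, p75 = statistics.quantiles(scores, n=4, method="inclusive")
    cuts = [p25, p50, p75]
    # bisect_left counts the breakpoints strictly below the score.
    return [f"Q{bisect_left(cuts, s) + 1}" for s in scores]


# Hypothetical index scores, for illustration only.
example_groups = quartile_groups([8, 1, 5, 4])
```

The resulting group label can then enter a regression model as a categorical variable with Q1 as the reference level.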

We estimated the associations of PRS with CRC risk using a cause-specific Cox proportional hazards regression model. We then conducted analyses stratified by CRC-PRS tertiles to assess the associations between PDI tertiles and CRC risk among individuals with different genetic risks. Multiplicative interactions were tested by including a PDIs × PRS term in the fully adjusted model. We also estimated the joint association of PDIs and genetic risk with CRC by defining a combined variable according to the tertiles of genetic risk and PDIs (9 categories).
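The combined variable for the joint analysis can be pictured as the cross-classification of two three-level factors. The sketch below uses hypothetical level names and labels; it only shows how 3 × 3 tertile combinations yield the 9 categories.

```python
from itertools import product

# Hypothetical names for the tertiles of genetic risk and of a diet index.
LEVELS = ("low", "intermediate", "high")


def joint_category(prs_tertile: str, pdi_tertile: str) -> str:
    """Combine a PRS tertile and a PDI tertile into one categorical label."""
    if prs_tertile not in LEVELS or pdi_tertile not in LEVELS:
        raise ValueError(f"tertiles must be one of {LEVELS}")
    return f"PRS {prs_tertile} / PDI {pdi_tertile}"


# All 3 x 3 = 9 combinations of genetic risk and diet index tertiles.
ALL_JOINT_CATEGORIES = [joint_category(g, d) for g, d in product(LEVELS, LEVELS)]
```

One of the nine categories serves as the reference level in the model, and the remaining eight each receive their own HR.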

We conducted subgroup analyses stratified by sex in the incidence and mortality analysis, and further by age, Townsend deprivation index, BMI, alcohol frequency, smoking status, and physical activity in the incidence analysis. Multiplicative interactions were tested by including a “PDIs × covariates” term in the fully adjusted model.

For secondary analyses, we (1) conducted sensitivity analyses by excluding individuals with less than 2 years of follow-up to minimize reverse causality and by using sub-distribution hazard models to account for competing risks; (2) examined the overall and sex-stratified associations of three food categories (healthy plant foods, less healthy plant foods, and animal foods) with CRC risk by adding the values in each food category together to understand which food category played a key role; and (3) examined the PDIs-CRC associations after modifying the PDI and hPDI by assigning a positive score to the beneficial animal foods (dairy products and seafood), identified by their inverse associations with CRC reported in the previous literature [31, 32].

All analyses were performed using SAS version 9.4 (SAS Institute, USA) and R software (The R Foundation, http://www.r-project.org, version 4.0.2). Two-sided P values < 0.05 were considered statistically significant.

Results

Characteristics of study population

The main baseline characteristics of participants by PDI, hPDI, and uPDI groups are shown in Table 1, Additional file 1: Tables S5 and S6, respectively. Among 186,675 cancer-free participants at baseline, the PDI ranged from 24 to 77, the hPDI ranged from 29 to 82, and the uPDI ranged from 28 to 79. Participants with higher PDI and hPDI but lower uPDI tended to be older, female, well-educated, non-current smokers, physically active, and with lower alcohol intake, TEI, and BMI.

Table 1 Baseline characteristics of 186,675 participants by PDI groups

Association between PDIs and CRC incidence

During a median of 9.5 years of follow-up (interquartile range [IQR], 9.4–10.3 years), 2163 CRC cases were documented. We did not observe significant departures from linearity in the associations of PDIs with the incidence of CRC (Pnon−linearity >0.05; Additional file 1: Fig. S2). Compared with the lowest quartile, the multivariable-adjusted HRs of CRC incidence in the highest quartile were 0.87 (95% CI, 0.77–0.99) and 0.85 (95% CI, 0.75–0.97) for PDI and hPDI, respectively, and those in the second and highest quartiles of uPDI were 1.18 (95% CI, 1.04–1.33) and 1.14 (95% CI, 1.01–1.30), respectively (Fig. 1 and Additional file 1: Table S7). Additionally, each 10-score increment in PDI and hPDI was associated with a 12% and 9% lower risk of CRC incidence, respectively.
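The link between an HR and a statement such as "12% lower risk" is simple arithmetic, sketched below. The function names are hypothetical, and the per-unit coefficient in the example is back-calculated for illustration rather than taken from the study; an HR of 0.88 is used only because it is the value that corresponds to a 12% lower hazard.

```python
import math


def hr_per_increment(beta_per_unit: float, increment: float = 10.0) -> float:
    """HR for a given increment, from a per-unit Cox log-hazard coefficient."""
    return math.exp(beta_per_unit * increment)


def percent_change_in_hazard(hr: float) -> float:
    """Percent change in hazard implied by an HR (negative means lower risk)."""
    return (hr - 1.0) * 100.0


# Illustrative value: a per-unit coefficient chosen so the per-10 HR is 0.88.
example_beta = math.log(0.88) / 10.0
example_hr = hr_per_increment(example_beta)
example_change = percent_change_in_hazard(example_hr)
```

The same conversion applies to the quartile contrasts: an HR of 0.87 for Q4 versus Q1 corresponds to a 13% lower hazard, and an HR of 1.18 to an 18% higher hazard.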

Fig. 1

Associations of PDI, hPDI, and uPDI with risk of CRC incidence. The models were adjusted for age (continuous), sex (female, male), ethnicity (White, mixed, Asian, Black, Chinese, others, or unknown), education (college or university, vocational qualification, upper secondary, lower secondary, others, or unknown), Townsend deprivation index (in quintiles), body mass index (< 18.5, 18.5–24.9, 25.0–29.9, or ≥ 30 kg/m²), alcohol frequency (daily or almost daily, 3 or 4 times a week, 1 or 2 times a week, 1 to 3 times a month, special occasions only, never, or unknown), smoking status (never, former, current, or unknown), physical activity (low, moderate, high, or unknown), total energy intake (continuous), polygenic risk score for CRC (continuous), first 10 principal components of ancestry (in units, continuous), and genotype measurement batch (continuous). Abbreviations: CI, confidence interval; CRC, colorectal cancer; hPDI, healthful plant-based diet index; HR, hazard ratio; PDI, plant-based diet index; uPDI, unhealthful plant-based diet index.

Concerning different anatomical subsites of CRC, the highest quartile (Q4) of hPDI was inversely associated (HR: 0.77 [95% CI, 0.60–0.98]) and that of uPDI positively associated (HR: 1.30 [95% CI, 1.02–1.65]) with the risk of distal colon cancer (Table 2). Higher PDI (Ptrend = 0.0093) and hPDI (Ptrend = 0.0330) were associated with a reduced risk of rectal cancer. None of the three PDIs was associated with the risk of proximal colon cancer.

Table 2 Association between plant-based diet indices and risk of CRC incidence classified by anatomical subsites

The modification by genetic risk on the PDIs-CRC associations

We found no evidence of a non-linear relationship between PRS and CRC incidence (Pnon−linearity >0.05; Additional file 1: Fig. S3), and each SD increment in PRS was associated with a 45% increased risk of CRC incidence.

In stratified analyses by genetic risk, we observed a reduced risk of CRC incidence conferred by hPDI in subjects with low genetic risk and by PDI in those with intermediate and high genetic risk (Additional file 1: Table S8). In addition, no interaction between PDIs and PRS for CRC incidence was observed (Pinteraction >0.05).

The joint analysis showed a risk gradient with increasing genetic risk and decreasing PDI quality (Fig. 2). Compared with individuals at the highest PRS and lowest PDI/hPDI category, the multivariable-adjusted HRs for CRC risk were 0.41 (95% CI, 0.34–0.50) among those at the lowest PRS and highest PDI category, and 0.37 (95% CI, 0.30–0.46) among those at the lowest PRS and highest hPDI category. Compared with those with the lowest PRS and lowest uPDI, the multivariable-adjusted HR for CRC risk was 2.35 (95% CI, 1.92–2.87) among those with the highest PRS and highest uPDI.

Fig. 2

Joint associations of PDI, hPDI, and uPDI and PRS with risk of CRC incidence. The models were adjusted for age (continuous), sex (female, male), ethnicity (White, mixed, Asian, Black, Chinese, others, or unknown), education (college or university, vocational qualification, upper secondary, lower secondary, others, or unknown), Townsend deprivation index (in quintiles), body mass index (< 18.5, 18.5–24.9, 25.0–29.9, or ≥ 30 kg/m²), alcohol frequency (daily or almost daily, 3 or 4 times a week, 1 or 2 times a week, 1 to 3 times a month, special occasions only, never, or unknown), smoking status (never, former, current, or unknown), physical activity (low, moderate, high, or unknown), total energy intake (continuous), first 10 principal components of ancestry (in units, continuous), and genotype measurement batch (continuous). Abbreviations: CI, confidence interval; CRC, colorectal cancer; hPDI, healthful plant-based diet index; HR, hazard ratio; PDI, plant-based diet index; PRS, polygenic risk score; uPDI, unhealthful plant-based diet index.

Association between PDIs and CRC incidence stratified by subgroups

In the fully adjusted models, a significant association of the Q2 (HR: 1.37 [95% CI, 1.14–1.65]) and Q4 (HR: 1.29 [95% CI, 1.05–1.58]) levels of uPDI (Ptrend =0.0472) with an increased risk of CRC incidence was observed in females, whereas a reduced risk of CRC incidence conferred by higher PDI (HRQ4: 0.78 [95% CI, 0.66–0.92], Ptrend =0.0028) and hPDI (HRQ4: 0.79 [95% CI, 0.67–0.95], Ptrend =0.0069) was reported only in males (Additional file 1: Table S9).

We observed an inverse association of PDI with CRC incidence in participants who had lower Townsend deprivation index and normal BMI, drank alcohol frequently and had moderate physical activity (Additional file 1: Table S10). The negative association of hPDI with CRC incidence was revealed in older participants, who were less deprived and overweight, drank less alcohol, and never smoked (Additional file 1: Table S11). Meanwhile, we observed an interaction between hPDI and age (Pinteraction =0.0238). For uPDI, the positive association was restricted to older adults, non-smokers, and those with normal BMI and less alcohol intake (Additional file 1: Table S12).

Association between PDIs and CRC mortality

A total of 466 CRC deaths occurred after a median of 9.9 years of follow-up (IQR, 9.5–10.4 years). We did not observe a non-linear relationship between PDIs and CRC mortality (Pnon−linearity >0.05; Additional file 1: Fig. S4). As presented in Fig. 3 and Additional file 1: Table S13, the age-sex adjusted model showed a decreased risk of CRC mortality with the highest PDI (HR: 0.71 [95% CI, 0.55–0.92]), which was eliminated after additional adjustment for all covariates. However, the inverse association of PDI with CRC mortality was still present among males (Additional file 1: Table S14). Interestingly, hPDI showed a protective tendency in the male population (Ptrend =0.0388).

Fig. 3

Associations of PDI, hPDI, and uPDI with risk of CRC mortality. The models were adjusted for age (continuous), sex (female, male), ethnicity (White, mixed, Asian, Black, Chinese, others, or unknown), education (college or university, vocational qualification, upper secondary, lower secondary, others, or unknown), Townsend deprivation index (in quintiles), body mass index (< 18.5, 18.5–24.9, 25.0–29.9, or ≥ 30 kg/m²), alcohol frequency (daily or almost daily, 3 or 4 times a week, 1 or 2 times a week, 1 to 3 times a month, special occasions only, never, or unknown), smoking status (never, former, current, or unknown), physical activity (low, moderate, high, or unknown), total energy intake (continuous), polygenic risk score for CRC (continuous), first 10 principal components of ancestry (in units, continuous), and genotype measurement batch (continuous). Abbreviations: CI, confidence interval; CRC, colorectal cancer; hPDI, healthful plant-based diet index; HR, hazard ratio; PDI, plant-based diet index; uPDI, unhealthful plant-based diet index.

Additionally, the null association between PDIs and CRC mortality was independent of genetic risk, and no significant interaction was found (Pinteraction >0.05; Additional file 1: Table S15).

Secondary analyses

The inverse association of hPDI with CRC risk disappeared when further excluding participants with less than two years of follow-up. The PDIs-CRC associations remained largely unchanged when using sub-distribution hazard models for competing risk (Additional file 1: Table S16). In addition, we observed a negative association between the intake of healthy food groups and CRC risk in males (Additional file 1: Table S17).

We further modified the PDI and hPDI, first by assigning a positive score to dairy products (as beneficial components, HR: 0.96 [95% CI, 0.94–0.99]) and second by assigning positive scores to both dairy products and seafood (as potentially beneficial components, HR: 0.97 [95% CI, 0.92–1.02]). We did not observe any non-linearity in the associations of the modified PDI/hPDI with CRC risk (all Pnon−linearity >0.05; Additional file 1: Fig. S5). The results of both sensitivity analyses remained stable (Additional file 1: Table S18).

Discussion

In this large prospective study, we found that independent of genetic predisposition, greater adherence to PDI and hPDI was associated with a lower risk of CRC, predominantly distal CRC. The inverse association of PDI and hPDI with the risk of CRC incidence and mortality was more pronounced in males, but uPDI was positively associated with CRC incidence risk only among females. In the joint analysis, we observed a gradually decreased CRC risk ascribed to higher PDIs quality combined with lower genetic risk.

Over the years, following a plant-based diet has become increasingly popular, and studies have linked vegetarian diets to CRC risk. A meta-analysis of 3,059,009 subjects demonstrated that diets rich in plant-based food were associated with a lower risk of digestive system cancers, especially CRC [33]. Subsequently, two large-scale cohort studies from the UK Biobank concluded that low meat-eaters, even vegetarians, had a decreased risk of CRC compared with regular meat-eaters [34, 35]. However, adherence to a strict vegetarian or vegan diet has long been challenging. Furthermore, these diets did not distinguish between healthier and lower-quality plant-based foods [36]. Therefore, Satija et al. proposed the PDIs, which consider the quality of plant-based foods [25]. However, previous evidence on associations between plant-based diets and CRC risk has been inconclusive. A case-control study in China observed an inverse association of hPDI but a positive association of uPDI with CRC risk [14]. A recent study in the Nurses’ Health Study (NHS) and the Health Professionals Follow-up Study (HPFS) obtained similar results and found a negative association of hPDI, especially with KRAS-wildtype CRC [15]. However, a prospective cohort of women aged 26–45 years in the NHSII and another study of subjects in the HPFS, NHS and NHSII found that the three PDIs were not associated with CRC risk [16, 17]. The latest study from the UK explored the associations of hPDI and uPDI with risk of mortality and major chronic diseases and only found a positive association of Q2 and Q3 levels of uPDI with CRC risk [18]. Here, we examined the associations between the three PDIs and CRC-specific outcomes more comprehensively in a larger sample and found that the inverse associations of PDI and hPDI and the positive association of uPDI with CRC risk remained significant in the final model and sensitivity analyses. These findings support evidence-based preventive interventions and highlight the potential importance of the quality of plant-based foods for CRC prevention.

The hypothesis of gene-diet interactions in the etiology of CRC has long been supported [37]. A Danish nested study of 1038 cases and 1857 controls showed that CCAT2 rs6983267 T-allele carriers had a lower relative risk of CRC associated with red and processed meat intake than GG homozygotes [38]. Another case-control study of 9243 participants observed that red and processed meat intake increased CRC risk regardless of PRS levels [39]. The interplay between the overall genetic risk and the whole diet quality (e.g., PDIs) for CRC has not been reported. In the present study, we found that both PRS and PDIs could independently predict CRC risk. Moreover, the inverse associations of PDI and hPDI and the positive association of uPDI with CRC risk were independent of genetic predisposition, with no evidence of interaction, which suggests that people at all levels of genetic risk should value the quality of plant foods.

Studies have explored the specific associations of plant-based diets and even vegetarianism with the anatomical subsites of CRC; however, these varied depending on the study design [33]. A previous meta-analysis of cohort studies reported no significant association between vegetarianism and colon and rectal cancer risk [40]. In contrast, our stratified analysis by CRC localization found that the effect of PDIs was more concentrated in the distal CRC, which was consistent with the results from the Multiethnic Cohort Study [19]. This might be ascribed to different distributions of the intestinal microbiome in various parts of the gut [41], and compared with the colon, the rectum is more susceptible to genotoxic and cytotoxic damage due to its longer transit time and the large accumulation of feces prior to defecation [42]. The present findings emphasized the role of plant-rich diets in the prevention of distal CRC.

Sex differences were observed in our results. Generally, females consume more plant foods and fewer animal foods than males [14]. In our study population, females ate more healthy plant foods and fewer unhealthy plant foods, so they may gain no further benefit from healthy plant foods while still incurring the harms of unhealthy plant foods. In addition, males had a higher risk of CRC than females [43], suggesting that a plant-based diet may offer more benefits for males than for females in reducing risk.

The protective association of a high-quality plant-based diet with CRC could be partly attributable to food components and nutrients with antioxidant and anti-inflammatory properties. Nutrients abundant in healthy plant foods (e.g., polyphenols, such as proanthocyanidins and anthocyanin 3-glucosides in fruits and vegetables) were reported to act as antioxidants to inhibit the production of pro-inflammatory cytokines [44, 45] and have protective activities against CRC [46]. High levels of antioxidant micronutrients, such as vitamin E, vitamin C, carotenoids, and phytochemicals present in healthy plant-based diets, were related to lower levels of inflammation, while low-quality plant-based foods and meat could be pro-inflammatory [36, 47]. Furthermore, dietary fiber from whole grains, fruits, and vegetables possessed protective activity against CRC by regulating prebiotic microbiota and fermentation rate [7]. These features of healthy plant-based diets might contribute to the prevention of CRC and should be taken into account in dietary recommendations for the general population.

The prospective study design and the large sample size were the two main strengths of this study. To our knowledge, this was the first longitudinal study to comprehensively investigate the association of plant-based diets with risks of CRC incidence and mortality considering genetic predisposition in the general population. Several limitations should be mentioned. First, because the participation rate in the UK Biobank was only 5.5%, recruitment was subject to selection bias [48]. Studies have demonstrated that the lack of representativeness in the UK Biobank does not materially affect the associations between diets and health outcomes [49], but rather distorts genetic associations and downstream analyses [50]. Therefore, with respect to the analysis of genetic data, our study population may not be completely representative of the UK population. Second, the dietary assessment was based on 24-hour recall, which might be subject to measurement error and lead to misclassification. Third, only 17 food groups were used to construct the PDIs because data on vegetable oils, which were included in the original paper describing the PDIs by Satija et al. [25], were unavailable in the current study. Fourth, the PDIs treat all animal-based foods equally, without discrimination, by assigning them opposite scores, which may ignore benefits from some food components, such as dairy products and seafood. However, the results of our sensitivity analyses remained stable when dairy products and seafood were considered healthful food groups. Fifth, we could not further subdivide meat into red and white meats, the latter of which may be associated with a reduced CRC risk [51]. Sixth, even though we controlled for the majority of confounders, residual confounding from unmeasured or unknown factors might remain. Finally, our analyses were conducted among Europeans, limiting the extrapolation of our findings to other ethnic groups.

Conclusions

Our results suggested that adherence to higher-quality plant-based diets was associated with a lower risk of CRC incidence, particularly in distal CRC (distal colon and rectal cancer). Increased quality of plant-based diets combined with decreased genetic risk may have more benefits against CRC. These findings provided suggestions for future research on the importance of food quality when adhering to a plant-based dietary pattern for the prevention of CRC in the general population with different genetic predispositions.