Background

The human gut microbiome, which consists of multi-kingdom microbes including prokaryotes, viruses, protists and fungi, is essential to human health [1]. Current research mainly focuses on the prokaryotic and viral components of the gut ecology [2,3,4]. However, the complicated associations of other types of microorganisms, particularly fungi, with human health remain largely unknown. Although the fungal community, also known as the mycobiome, comprises less than 1% of the entire human gut microbiome [5], it has been shown to be involved in disease pathogenesis and to profoundly influence the host immune system [6, 7]. For example, Candida albicans can cause infections in immunocompromised human hosts [8], and alterations of the gut mycobiome composition have been reported in multiple human diseases [9, 10]. While fine-grained fungal taxonomic markers associated with certain phenotypes have been reported [9, 11, 12], the overall structure of the gut mycobiome and the inter-individual variation in fungal composition remain unclear.

Enterotypes, which have been proposed to summarize human gut microbial characteristics, are effective in stratifying populations and providing a global overview of the inter-individual variation in gut microbial composition [13, 14]. Multiple studies have consistently identified bacterial enterotypes, which are independent of the distribution of the hosts’ age, geography, and gender [13,14,15,16]. Defined based on prokaryotic compositional patterns, enterotypes could enhance understanding of human health and facilitate intervention [17]. As an integral part of the human gut multi-kingdom microbiome, fungi share microhabitats with the prokaryotic microbiome in the gut through different types of interactions, such as mutualism, commensalism, and competition [18]. Notably, several synergistic fungi-bacteria interactions within the human gut have been reported to be associated with human diseases. For instance, Hoarau et al. [19] found a positive inter-kingdom correlation between Candida tropicalis and two bacterial species, Serratia marcescens and Escherichia coli, in individuals with Crohn’s disease. The physical interactions among these three species resulted in the formation of robust biofilms, which can potentially damage host tissue and trigger specific immune responses [20]. Hence, the interactions between fungi and bacteria within the human gut play important roles in shaping the ecology of the intestinal microbial community [18, 21]. However, the landscape of the human gut mycobiome and whether fungal enterotype-like structures exist in the human gut are unclear.

In this study, we collected 3,363 fungal sequencing samples from 16 cohorts across Europe, North America, and Asia, including 572 newly sequenced samples from China. Four fungal enterotypes were identified independently of cohorts and geographical regions and were closely correlated with bacterial enterotypes. We observed strong effects of host phenotypes (including age and diseases) on the fungal enterotypes. Notably, the Candida (Can_type) enterotype, enriched in the elderly population, showed a higher prevalence in patients with multiple diseases, even beyond the influence of age, and was associated with a severely compromised intestinal barrier. Furthermore, a Can_type-enriched aerobic respiration pathway mediated the association between the compromised intestinal barrier and aging. Overall, our findings elucidate the highly structured nature of the gut mycobiome and its clinical relevance to human health.

Results

Landscape of human gut mycobiome composition and diversity

To characterize the human gut fungal diversity and composition, we collected internal transcribed spacer (ITS) sequencing data from 15 published projects (Supplementary Table S1) [12, 22,23,24,25,26,27,28,29,30]. In addition, we recruited 572 Chinese participants (Chinese Gut Mycobiome cohort, or CHGM) aged 18 to 89 years and profiled their fecal mycobiome with ITS1 sequencing. In total, 3363 fecal samples with ITS1 (960 samples; hereafter referred to as ‘ITS1-combined’) or ITS2 (2403 samples; hereafter referred to as ‘ITS2-combined’) sequencing data from 16 cohorts covering three continents (Europe, North America, and Asia) were included in our study (Fig. 1a).

Fig. 1

Composition and diversity of the human gut mycobiome across studies and geographic sites. a Geographic distribution of study populations and associated fungal enterotypes, where the datasets are sequenced with either ITS1 or ITS2 barcodes. b Genus-level gut mycobiome composition across the three continents (North America, Europe, and Asia). c Cumulative curves of the number of detected genera according to the number of sequenced samples from different study populations. d The distribution of fungal Shannon diversity across study populations. The Venn diagram shows the number of fungal genera detected by ITS1- and ITS2-based amplification. e The correlation between the Shannon index of bacteria and that of fungi in the Zuo et al. [22] cohort, with the shaded region representing the 95% confidence interval of the linear regression

The combined dataset (3363 samples) contained a total of 1,120 genus-level taxonomic groups, of which 354 were present in at least 10 samples, and the sequencing depth of most cohorts was sufficient to capture the diversity of the gut mycobiome (Figure S2). With sample rarefaction analysis, we noticed that the number of detected genera in the German (Andrea et al. [28]) and Chinese (CHGM) populations increased dramatically with an increasing number of samples, and the number of fungal genera detected in our CHGM cohort far exceeded that of other cohorts (Fig. 1c, Figure S1b). However, the observed number of fungal genera was still considerably below the estimated saturation level, even when combining all datasets (Figure S1c), suggesting that a further increase in sample size is required to characterize gut fungal diversity comprehensively. At the genus level, Saccharomyces and Candida were the most abundant genera across all samples, followed by Penicillium and Aspergillus (Fig. 1b). These genera are also the most common commensal fungi in other human body sites, including skin, lung, and oral cavity [58]. To better unveil the colonization of fungi in the gut, profiling of the active fungal community by ITS cDNA analysis is needed in future work. Secondly, the interactions between the bacteria and fungi were not explored here. The landscape of multi-kingdom interactions can provide insights into the mechanisms underlying the gut mycobiome structure and its association with host physiological conditions. Finally, we explored the functions of gut fungi based on the metagenomic data. However, metagenomic data are dominated by bacteria, which leads to under-representation of the gut mycobiome in functional profiling. Fungi-enriched metagenomic sequencing may help to infer the complete functional profile of the mycobiome in the future.

Conclusions

In this study, we characterized the human gut fungal community structures with a broad spectrum of ITS sequencing samples from 16 cohorts across 11 countries worldwide, including 572 newly ITS-profiled and metagenomically sequenced samples from China. We confirmed the existence of four fungal enterotypes that varied in taxonomic and functional compositions. These enterotypes showed close associations with both age and diseases, with the Candida-dominated enterotype being particularly enriched in the elderly population and associated with multiple human diseases accompanied by a compromised intestinal barrier. Bidirectional mediation analysis further revealed that the Can_type-associated fungi-contributed aerobic respiration pathway could mediate the association between aging and the compromised intestinal barrier. These findings reveal both the biological and clinical significance of fungal enterotypes and offer a new perspective on host-microbe interactions.

Materials and methods

Data collection

We downloaded ITS sequencing data of fecal samples from public databases including the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) and the China National GeneBank database (CNGBdb). Samples with fewer than 10,000 reads were discarded. Because the human gut mycobiome of infants is unstable and differs markedly from that of older individuals, we excluded samples from infants. Metadata including demographics (e.g., age, gender, BMI, country) and human disease phenotypes were also retrieved from the corresponding publications or databases. As a result, we collected a total of 2791 public samples from 11 countries covering multiple human disease phenotypes including Clostridium difficile infection (CDI), alcohol use disorder (AUD), coronavirus disease 2019 (COVID-19), type 2 diabetes (T2D), irritable bowel syndrome (IBS), alcoholic hepatitis (ALHP), Crohn’s disease (CD), and melanoma. The details for each project, including the number of samples, country, associated disease phenotype and amplicon target, are listed in Supplementary Table S1.

We additionally collected human fecal samples from 572 newly recruited Chinese volunteers (CHGM cohort) aged 18 to 89 years, whose fecal mycobiome was profiled with ITS1 amplification. Of these samples, 74 were collected from subjects with Alzheimer’s disease (AD) enrolled in Shanghai Sixth People’s Hospital, whereas the others were obtained from healthy volunteers recruited in Wuhan, Shanghai, and Zhengzhou. Subjects who had taken antibiotics, antifungals or probiotics within 1 month prior to sample collection were excluded from this study. The study protocol was approved by the Human Ethics Committee of the School of Life Science of Fudan University (No. BE1940) and the Ethics Committee of the Tongji Medical College of Huazhong University of Science and Technology. All subjects provided informed consent before participation and were asked to complete questionnaires. In total, the combined dataset consisted of 3363 samples from 16 cohorts and covered 11 countries from three continents, including Europe (615 samples), North America (344 samples) and Asia (2404 samples); among these, the fungal compositions of six and nine cohorts were determined by ITS1 (960 samples) and ITS2 (2403 samples) sequencing, respectively.

DNA extraction from fecal samples

After sample collection, the fecal samples from the CHGM cohort were immediately stored on dry ice and transferred to a −80 ℃ freezer within 5 h. Total DNA was extracted from fecal samples using the semi-automated DNeasy PowerSoil HTP 96 Kit (Qiagen, 12955-4) according to the manufacturer’s instructions. The purified DNA was quality-checked on a 1% agarose gel, and DNA concentration and purity were determined with a NanoDrop 2000 UV–vis spectrophotometer (Thermo Scientific, Wilmington, USA).

ITS sequencing and processing

The mycobiome of the CHGM cohort was profiled by sequencing of the internal transcribed spacer (ITS), and the ITS1 hypervariable region was amplified with the primer pair ITS1F (5′-CTTGGTCATTTAGAGGAAGTAA-3′) and ITS2R (5′-GCTGCGTTCTTCATCGATGC-3′) [59] on an ABI GeneAmp® 9700 PCR thermocycler (ABI, CA, USA). The PCR amplification was conducted as follows: initial denaturation at 95 ℃ for 3 min, followed by 27 cycles of denaturing at 95 ℃ for 30 s, annealing at 55 ℃ for 30 s, and elongation at 72 ℃ for 45 s, and a final extension at 72 ℃ for 10 min. The PCR mixtures (20 μL total volume) contained 4 μL of 5 × FastPfu buffer, 2 μL of 2.5 mM dNTPs, 0.8 μL of each primer (5 μM concentration), 0.4 μL of FastPfu DNA Polymerase and 10 ng of template DNA. The PCR products were extracted from a 2% agarose gel and purified using the AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, Union City, CA, USA) according to the manufacturer’s instructions, and further quantified using a Quantus™ Fluorometer (Promega, USA). Purified amplicons were pooled and paired-end sequenced on the Illumina MiSeq PE300 platform (Illumina, San Diego, USA) according to the standard protocols by Majorbio Bio-Pharm Technology Co. Ltd. (Shanghai, China).

The raw ITS reads were first demultiplexed, quality-filtered by fastp version 0.20.0 [60] and merged by FLASH version 1.2.7 [61] with the following criteria: (i) the 300 bp reads were truncated at any site with an average quality score < 20 over a 50-bp sliding window, and truncated reads shorter than 50 bp were discarded; (ii) only overlapping sequences longer than 10 bp were assembled according to their overlapped sequence, and the maximum mismatch ratio of the overlap region was 0.2. QIIME2 (version 2019.7) was used for the downstream analysis [62]. The quality-filtered ITS reads were then denoised into amplicon sequence variants (ASVs) using DADA2 [63], and chimeric sequences were identified and removed. Then the Naïve Bayes classifier trained on the UNITE reference database [64] was used for taxonomy assignment of individual ASVs. \(\alpha\)- and \(\beta\)-diversity analysis was conducted on samples at the sampling depth of 10,000 by utilizing the R packages “vegan” (version 2.5–7) [65] and “phyloseq” (version 1.34.0) [66]. \(\alpha\)-diversity was estimated by the Shannon index (evenness and richness of the community within a sample), Simpson index (a quantitative measure of community diversity that accounts for both the number and the abundance of features), Faith’s phylogenetic diversity (or Faith’s PD; a qualitative measure of community diversity that incorporates the phylogenetic relationships among the observed features) and richness (observed number of features). Fungal genera present in fewer than 10 samples were excluded from downstream analysis.
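As an illustration of the \(\alpha\)-diversity measures named above, the following is a minimal Python sketch and not the pipeline used in this study (which relied on QIIME2 and the R packages “vegan” and “phyloseq”). It assumes a sample is a plain list of per-genus read counts, rarefies it to a fixed depth without replacement, and computes the Shannon index with the natural logarithm and the Simpson index in the 1 − Σp² form. The function names, the random seed and the toy counts are hypothetical.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample a count vector to `depth` reads without replacement."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for i in picked:
        rarefied[i] += 1
    return rarefied


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over features with p > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) form (assumed convention)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def richness(counts):
    """Observed number of features with at least one read."""
    return sum(1 for c in counts if c > 0)


# Hypothetical genus-level read counts for one sample (12,000 reads).
toy_counts = [6000, 3600, 2400, 0]
rarefied = rarefy(toy_counts, depth=10_000)
print(shannon(rarefied), simpson(rarefied), richness(rarefied))
```

Faith’s PD is omitted from the sketch because it additionally requires a phylogenetic tree of the observed features.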

Metagenomics sequencing and processing

The fecal bacterial microbiome of the CHGM cohort was profiled by whole-genome shotgun sequencing on the Illumina HiSeq 2000 platform (Novogene, Beijing, China). DNA libraries were prepared as described previously [67]. The raw sequencing reads were quality-filtered using fastp version 0.20.0, followed by the use of Bowtie2 [68] to remove host-derived reads by mapping to the human reference genome (hg38). Quantitative profiling of the taxonomic composition of the microbial communities was performed via MetaPhlAn2 [69]. Profiling of microbial pathways was performed with HUMAnN2 v2.8.1 [70] by mapping reads to the UniRef90 [71] and MetaCyc [72] reference databases. The abundance outputs of both MetaPhlAn2 and HUMAnN2 were normalized to relative abundance. We extracted the metabolic pathways of gut fungi for downstream analysis. Metabolic pathways or bacterial species present in fewer than 10 samples were excluded from downstream analysis. To estimate the percentage of human DNA contents (HDCs) within the CHGM cohort, we aligned the clean reads to the human reference genome with Bowtie2, and the HDC of each sample was calculated as the percentage of mapped reads among the total number of clean reads.
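The normalization, prevalence filtering and HDC calculation described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the code used in the study: each sample is assumed to be a dictionary mapping a feature name to its abundance, a feature counts as present when its abundance is greater than zero, and all function names and toy values are hypothetical.

```python
def relative_abundance(profile):
    """Scale one sample's abundances so that they sum to 1."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no abundance to normalize")
    return {feature: value / total for feature, value in profile.items()}


def prevalence_filter(samples, min_samples=10):
    """Keep features with abundance > 0 in at least `min_samples` samples."""
    prevalence = {}
    for profile in samples.values():
        for feature, value in profile.items():
            if value > 0:
                prevalence[feature] = prevalence.get(feature, 0) + 1
    keep = {f for f, n in prevalence.items() if n >= min_samples}
    return {
        sample_id: {f: v for f, v in profile.items() if f in keep}
        for sample_id, profile in samples.items()
    }


def human_dna_content(mapped_reads, clean_reads):
    """HDC as the percentage of clean reads that map to the human genome."""
    if clean_reads <= 0:
        raise ValueError("clean_reads must be positive")
    return 100.0 * mapped_reads / clean_reads


# Toy data with a lowered threshold (2 instead of 10) to keep the example small.
toy = {
    "s1": {"pathway_a": 3.0, "pathway_b": 1.0},
    "s2": {"pathway_a": 2.0, "pathway_b": 0.0},
}
normalized = {sid: relative_abundance(p) for sid, p in toy.items()}
filtered = prevalence_filter(normalized, min_samples=2)
hdc = human_dna_content(mapped_reads=1_500, clean_reads=1_000_000)
print(filtered, hdc)
```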

16S rRNA sequencing data processing

The 16S rRNA sequencing data available for four cohorts, including Lemoinne et al. [27], Vitali et al. [73], Prochazkova et al. [30], and Zuo et al. [22], were downloaded from NCBI SRA. Raw 16S reads were quality-filtered, denoised into ASVs and taxonomically annotated using QIIME2 (version 2019.7) as described above. The taxonomies of ASVs were annotated using the SILVA database [74]. \(\alpha\)- and \(\beta\)-diversity analysis was conducted on samples at the sampling depth of 25,000. Bacterial genera present in fewer than 10 samples were excluded from consideration.

Fungal enterotype clustering

The fecal samples profiled by ITS1 and ITS2 amplification were separately clustered into fungal enterotypes using the partitioning around medoids (PAM) clustering method [75], as previously described for bacterial enterotype discovery [13, 14]. Briefly, the samples were grouped into clusters with PAM based on the between-sample Bray–Curtis distance calculated at the genus level, and three other widely used distance metrics, namely Jaccard, Kulczynski, and Jensen-Shannon distance (JSD), were also considered to validate the robustness of the fungal enterotypes (Figure S4a). The optimal number of clusters was determined by the silhouette index. The driver genus of each enterotype was defined as the genus with the highest relative abundance in that enterotype.
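The clustering procedure above can be illustrated with a small, self-contained Python sketch. It is not the implementation used in this study: it replaces the full PAM build-and-swap search with a simplified alternating k-medoids update and a deterministic farthest-first initialization, both of which are assumptions made for brevity. The function names and the toy genus-level profiles are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den > 0 else 0.0


def k_medoids(dist, k, max_iter=100):
    """Simplified alternating k-medoids (not the full PAM swap search)."""
    n = len(dist)
    # Deterministic start: most central sample, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])
            else:
                updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids), medoids


def mean_silhouette(dist, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    n = len(dist)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy genus-level relative abundances: three samples dominated by the first
# genus and three dominated by the second.
profiles = [
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10], [0.20, 0.70, 0.10], [0.15, 0.75, 0.10],
]
dist = [[bray_curtis(p, q) for q in profiles] for p in profiles]
scores = {k: mean_silhouette(dist, k_medoids(dist, k)[0]) for k in range(2, 4)}
chosen_k = max(scores, key=scores.get)
labels, medoids = k_medoids(dist, chosen_k)
print(chosen_k, labels)
```

Selecting the number of clusters by the highest mean silhouette width mirrors the criterion stated above; on real data the candidate range of k would be wider than in this toy example.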

We further validated the structural similarity of the fungal enterotypes obtained separately from the ITS1- and ITS2-combined fungal datasets. Specifically, we performed cross-dataset validation between the ITS1 and ITS2 datasets, with one dataset used for training a LASSO logistic regression model [76] to predict the fungal enterotype in the other dataset. To determine whether the fungal enterotypes reflect the overall community structure and not only the differences in the driver genera, we further removed the driver genera (Candida, Saccharomyces, Aspergillus, Saccharomyces sp and Ascomycota sp) from the data and repeated the cross-dataset validation as described above.

Gut aging index

We calculated the gut aging index (GAI) by using the relative abundance of 21 age-associated gut fungal genera. Subjects with diseases or aged below 18 years were excluded from this analysis. To identify age-associated fungi, we applied multivariate linear regression analysis to 531 healthy subjects aged 18 to 90 years from four cohorts (i.e., the CHGM cohort, Gao et al. [23], Limon et al. [12], and Zuo et al. [22]) to examine the associations between age and the relative abundance of fungal genera, adjusting for gender and cohort. Fungal genera with p < 0.05 in the linear regression test were considered “age-associated”. We grouped these age-associated gut fungal genera into two sets \({M}_{P}\) and \({M}_{N}\), where \({M}_{P}\) was the set of fungal genera positively associated with age and \({M}_{N}\) the set negatively associated with age. We then combined these two sets of fungal genera in a computational procedure (see below) to define a gut aging index (GAI) for a mycobiome sample. The GAI of sample i is defined as

$$\mathrm{GAI}_{i}={\log }_{10}\left(\frac{\frac{{R}_{{M}_{P,i}}}{|{M}_{P}|}{\sum }_{j\in {M}_{P}}{x}_{j,i}}{\frac{{R}_{{M}_{N,i}}}{|{M}_{N}|}{\sum }_{j\in {M}_{N}}{x}_{j,i}}\right),$$

where \({R}_{{M}_{P,i}}\) denotes the richness of \({M}_{P}\) in sample i (i.e., the number of fungal genera of \({M}_{P}\) present in sample i), \(|{M}_{P}|\) is the size of set \({M}_{P}\) (i.e., the overall number of fungal genera in \({M}_{P}\)), and \({x}_{j,i}\) denotes the relative abundance of fungal genus j in sample i; \({R}_{{M}_{N,i}}\) and \(|{M}_{N}|\) are defined analogously. The calculation of GAI considers both the richness and the relative abundance of age-associated gut fungal genera to quantify the balance between \({M}_{P}\) and \({M}_{N}\). Because the sizes of \({M}_{P}\) and \({M}_{N}\) differ, we used the proportion of the fungi of each set present in a sample (\(\frac{{R}_{{M}_{P}}}{|{M}_{P}|}\) and \(\frac{{R}_{{M}_{N}}}{|{M}_{N}|}\)) instead of the richness \({R}_{{M}_{P}}\) and \({R}_{{M}_{N}}\). As such, a higher GAI, or GAI > 0, indicates that the fungal profile of a sample is dominated by age-positive rather than age-negative fungi, and thus suggests a higher degree of intestinal aging.
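The GAI definition can be checked on a toy example with the following Python sketch. It assumes a sample is a dictionary of genus-level relative abundances and that a genus is present when its abundance is greater than zero. The genus names are placeholders, and returning `None` when either weighted sum is zero is an assumption, since the handling of such samples is not specified here.

```python
import math


def gut_aging_index(abundance, pos_genera, neg_genera):
    """Compute log10 of the ratio between the age-positive and age-negative
    terms, each being (fraction of the set present) * (summed abundance)."""

    def weighted_sum(genera):
        present = sum(1 for g in genera if abundance.get(g, 0.0) > 0)
        total = sum(abundance.get(g, 0.0) for g in genera)
        return (present / len(genera)) * total

    pos = weighted_sum(pos_genera)
    neg = weighted_sum(neg_genera)
    if pos <= 0 or neg <= 0:
        return None  # assumption: the index is left undefined in this case
    return math.log10(pos / neg)


# Hypothetical sets of age-associated genera and one toy sample.
m_p = ["genus_a", "genus_b"]             # positively associated with age
m_n = ["genus_c", "genus_d", "genus_e"]  # negatively associated with age
toy_sample = {"genus_a": 0.20, "genus_c": 0.01, "genus_d": 0.04}

gai = gut_aging_index(toy_sample, m_p, m_n)
print(gai)
```

In this toy sample the age-positive term is (1/2) × 0.20 = 0.10 and the age-negative term is (2/3) × 0.05 ≈ 0.033, so the ratio is 3 and the GAI is \({\log }_{10}3\approx 0.48\), which is greater than 0.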

Statistical analysis

All statistical analyses were conducted using R version 4.0.5 within RStudio, and all figures were generated with the “ggplot2” package version 3.3.5 [77]. The variance explained by factors (e.g., continent, amplicon target) was quantified using permutational multivariate analysis of variance (PERMANOVA, permutations = 999, distance = “bray”) as implemented by the “adonis” function in the R package “vegan”. Correlation between \(\alpha\)-diversity and chronological age was assessed with Spearman’s correlation. Comparisons of enterotype characteristics (e.g., diversity), host phenotypes (e.g., BMI, age, gender, disease) and health-related indices (e.g., HDCs, GAI, and GMHI) across fungal enterotypes were performed using Fisher’s exact test or the chi-square test for categorical variables and Wilcoxon rank-sum tests for continuous variables. The pathways enriched in each enterotype were determined by using a Wilcoxon rank-sum test, where the other three enterotypes were combined into a single group. The bi-directional mediation analysis was performed using the “mediate” function within the R package “mediation” (version 4.5.0) [78] with 1000 bootstrap resamples to infer the causal role of aging in contributing to the compromised intestinal barrier through the fungi-contributed aerobic respiration pathway. For analyses involving multiple comparisons, the Benjamini–Hochberg false discovery rate (adjusted p) [79] was employed to correct for multiple testing. Results with adjusted p < 0.05 were considered significant unless stated otherwise.
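For readers unfamiliar with the Benjamini–Hochberg correction, the adjustment can be sketched in Python as follows. This is an illustrative re-implementation of the standard step-up procedure, not the R code used in this study, and the p values shown are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values for four hypothetical tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p, significant)
```

With these four values only the first test remains below the 0.05 threshold after adjustment, although three of the raw p values were below it.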