Introduction

The tea plant, Camellia sinensis (L.) O. Kuntze, is a popular horticultural crop cultivated extensively across the world. It is used to produce non-alcoholic beverages and belongs to the Theaceae family, which includes evergreen woody species (Jia et al. 2020; Jiang et al. 2021). Tea contains several metabolites that impart unique flavors such as floral, fruity, sweet, bitter, and astringent (Qin et al. 2020; Wang et al. 2020a). Tea metabolites are not only related to sensory quality and medicinal value but also provide resistance to abiotic and biotic challenges in tea plants (Hu et al. 2022; Liu et al. 2020; Ren et al. 2020; Wang et al. 2017; Way et al. 2009; Zeng et al. 2021). Consequently, the investigation of tea metabolites has emerged as a primary focus of tea research. Traditionally, research on tea plants has focused on the leaves. However, the reproductive organ, the tea flower, plays a crucial role in plant physiology and tea production, and warrants further exploration.

Tissue-specificity in metabolomics has been documented in numerous species, with distinct types and quantities of metabolites found in vegetative and reproductive organs (Chen et al. 2016; Li et al. 2020; Ying et al. 2020; Zhou et al. 2019). Metabolites in tea plants vary between old and young leaves, as well as between leaves and flowers (Jia et al. 2016; Qiu et al. 2020). Flowers and leaves frequently compete for essential nutrients such as nitrogen, phosphorus, and potassium. In addition, metabolite competition between flowers and leaves frequently occurs in tea plants. During tea flowering, the levels of saccharides and certain amino acids gradually increase in the flowers and decrease in the leaves (Jia et al. 2016; Chen et al. 2018). Sugars and amino acids are not only key flavor metabolites, but can also directly or indirectly serve as modifiers (e.g., glycosylation and methylation) of other metabolites, influencing the resistance and yield of tea plants (Hu et al. 2022; Zhao et al. 2022). Organic acids accumulate during the flowering process of tea plants, participate in the tricarboxylic acid cycle, and contribute to the generation of multiple aroma components (Jia et al. 2016). Thus, detecting tea flower metabolites and revealing the underlying genetic mechanisms are critical for understanding the metabolism of the entire tea plant as well as gathering essential information for tea cultivar improvement.

In recent years, advances in sequencing technologies and bioinformatic methodologies have facilitated the assembly of several tea genomes, laying the groundwork for resequencing and genetic studies of tea plants (Wang et al. 2020b; ** to identify potential candidate genes related to metabolic variations (Huang et al. 2022). Based on the rich diversity found in natural populations and the extensive genotypic and phenotypic data available, GWAS have played an important role in the identification of metabolism-related genes in various plant species (Chen et al. 2016; Wen et al. 2014; Ying et al. 2020). GWAS have also gradually been applied to the study of tea metabolites (Fang et al. 2012).

Based on linkage disequilibrium (LD) decay in the tea population, significant SNPs within 500 kb were merged into an initial QTL region; each merged QTL region contained at least two SNPs. The final QTL region was obtained by extending the initial region by 100 kb. Within each QTL region, candidate genes were selected based on functional annotation and correlations between phenotypic data and gene expression levels.
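The merging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes that "within 500 kb" refers to the gap between neighbouring significant SNPs on one chromosome and that the 100 kb extension is applied to both sides of the initial region. The function name and the example positions are hypothetical.

```python
from typing import List, Tuple

MERGE_DISTANCE = 500_000  # bp; SNPs closer than this are merged (from the text)
FLANK = 100_000           # bp; assumed here to be added on each side
MIN_SNPS = 2              # a merged region must hold at least two SNPs


def merge_snps_into_qtl(positions: List[int]) -> List[Tuple[int, int]]:
    """Merge significant SNP positions on one chromosome into QTL intervals."""
    regions: List[Tuple[int, int]] = []
    cluster: List[int] = []

    def close_cluster() -> None:
        # Keep the cluster only if it contains enough SNPs, then add flanks.
        if len(cluster) >= MIN_SNPS:
            regions.append((max(cluster[0] - FLANK, 0), cluster[-1] + FLANK))

    for pos in sorted(positions):
        if cluster and pos - cluster[-1] > MERGE_DISTANCE:
            close_cluster()
            cluster = []
        cluster.append(pos)
    close_cluster()
    return regions


# Hypothetical positions: three clustered SNPs and one isolated SNP.
example = merge_snps_into_qtl([1_000_000, 1_200_000, 1_600_000, 5_000_000])
print(example)
```

In this sketch the isolated SNP is discarded because its cluster holds fewer than two SNPs.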

Haplotype analysis

The haplotype analysis toolkit CandiHap was used to preselect causal SNPs in candidate genes for target metabolites. SNPs that met the following criteria were screened using VCFtools (v0.1.16): (1) missing data ≤ 0.1; (2) two alleles only; (3) MAF ≥ 0.05 and mean depth ≥ 5. The identified SNPs were subsequently annotated using CandiHap, based on gene annotations from the genome of an ancient tea tree named DASZ (Danecek et al. 2011; Li et al. 2023; Visscher et al. 2017; Zhang et al. 2020).
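The screening criteria can be expressed as a simple per-site predicate. The sketch below is plain Python and does not call VCFtools or CandiHap; it assumes that the per-site statistics (allele count, missing rate, minor allele frequency, mean depth) have already been computed elsewhere. The `SiteStats` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Pre-computed summary statistics for one SNP site (hypothetical layout)."""
    n_alleles: int        # number of distinct alleles observed at the site
    missing_rate: float   # fraction of accessions without a genotype call
    maf: float            # minor allele frequency
    mean_depth: float     # mean read depth across accessions


def passes_filters(site: SiteStats) -> bool:
    """Apply the three screening criteria listed in the text."""
    return (
        site.missing_rate <= 0.1      # (1) missing data <= 0.1
        and site.n_alleles == 2       # (2) two alleles only
        and site.maf >= 0.05          # (3) MAF >= 0.05 ...
        and site.mean_depth >= 5.0    # ... and mean depth >= 5
    )


# Hypothetical sites: the second fails on MAF, the third is not biallelic.
sites = [
    SiteStats(n_alleles=2, missing_rate=0.02, maf=0.21, mean_depth=12.4),
    SiteStats(n_alleles=2, missing_rate=0.05, maf=0.01, mean_depth=9.0),
    SiteStats(n_alleles=3, missing_rate=0.00, maf=0.30, mean_depth=15.0),
]
kept = [s for s in sites if passes_filters(s)]
print(len(kept))
```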

Results

High-throughput quantification of non-targeted metabolites in flowers of diverse tea plant collections

A total of 297 and 581 metabolites with stable peaks were detected in positive and negative ion modes, respectively (Table S1). Analysis of metabolite content across the population revealed that most metabolites followed a normal distribution (Figs. S1 and S2). Among these metabolites, 454 were annotated based on molecular mass and MS/MS fragments using the Compound Discoverer 3.0 library. According to the Human Metabolome Database (HMDB), these metabolites were classified into eight superclasses: 103 organic acids and derivatives, 99 phenylpropanoids and polyketides, 69 benzenoids, 67 organic oxygen compounds, 43 lipids and lipid-like molecules, 29 organoheterocyclic compounds, 21 alkaloids and derivatives, and a remaining group of other metabolites (Fig. 1a, Table S1).

Fig. 1

Classification of metabolites in tea flowers. (a) Superclass of metabolites in tea flowers. A, organic acids and derivatives; B, phenylpropanoids and polyketides; C, benzenoids; D, other organic oxygen compounds; E, lipids and lipid-like molecules; F, organoheterocyclic compounds; G, alkaloids and derivatives; and H, other metabolites. (b) Subclass of metabolites in tea flowers. A_a, carboxylic acids and derivatives; C_b, benzene and substituted derivatives; B_c, flavonoids; D_d, other organooxygen compounds; E_e, fatty acyls; G_f, alkaloids and derivatives; B_g, cinnamic acids and derivatives; D_h, sugars and derivatives; E_i, prenol lipids; and D_j, esters

The superclasses included 10 subclasses with more than 10 metabolites based on the HMDB: carboxylic acids and derivatives, benzene and substituted derivatives, flavonoids, organooxygen compounds, fatty acyls, cinnamic acids and derivatives, alkaloids and derivatives, sugars and derivatives, prenol lipids, and esters (Fig. 1b, Table S1). In addition to these major subclasses, there were over 50 subclasses, some of which contained only one metabolite (Table S1). Tea flower metabolites contribute to numerous metabolic pathways in tea plants. Metabolite annotation results revealed that tea flowers also had abundant catechins, caffeine, and theanine. The correlation between metabolites was primarily positive, with a few cases showing a significant negative correlation. Metabolites within the same class showed stronger associations than those between different classes (Fig. S3).

The differential metabolites between C. sinensis var. assamica (CSA) and C. sinensis var. sinensis (CSS)

The population of 171 tea genotypes comprised 29 CSA and 142 CSS accessions. Principal component analysis based on all detected metabolites could not distinguish the two subgroups (Fig. 2a). However, certain metabolites were differentially accumulated between CSA and CSS accessions. A combination of t-test and fold change (FC) analysis was used to identify differences in floral metabolites between the CSA and CSS accessions. A total of 31 differential metabolites were detected between CSA and CSS accessions (t-test P-value < 0.01 and FC ≥ 2 or FC ≤ 0.5), 18 of which were annotated in the Compound Discoverer 3.0 library. These differential metabolites included eight organic acids and derivatives, four fatty acyls, four benzene and substituted derivatives, two dihydrofurans, one flavonoid, and several unknown metabolites (Table S2). Nineteen metabolites accumulated to higher levels in CSA accessions than in CSS accessions (Fig. 2b, up metabolites), whereas 12 metabolites accumulated to lower levels in CSA accessions than in CSS accessions (Fig. 2b, down metabolites).
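The selection rule for differential metabolites (P < 0.01 together with FC ≥ 2 or FC ≤ 0.5) can be written as a short classifier. The sketch below assumes that the t-test P-value has been computed elsewhere and that fold change is defined as the CSA mean divided by the CSS mean, which matches the "up" and "down" labels in the text but is not stated explicitly. Function names and example values are hypothetical.

```python
from statistics import mean
from typing import Sequence

P_CUTOFF = 0.01   # t-test P-value threshold from the text
FC_CUTOFF = 2.0   # fold-change threshold; the lower bound is its reciprocal (0.5)


def fold_change(csa_values: Sequence[float], css_values: Sequence[float]) -> float:
    """Ratio of mean content in CSA to mean content in CSS (assumed direction)."""
    return mean(csa_values) / mean(css_values)


def classify(p_value: float, fc: float) -> str:
    """Return 'up', 'down', or 'ns' (not selected) for one metabolite."""
    if p_value >= P_CUTOFF:
        return "ns"
    if fc >= FC_CUTOFF:
        return "up"
    if fc <= 1.0 / FC_CUTOFF:
        return "down"
    return "ns"


# Hypothetical relative contents for one metabolite; the P-value is supplied.
fc = fold_change([4, 6], [2, 3])
print(fc, classify(0.001, fc))
```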

Fig. 2

Metabolic classification of tea accessions between Camellia sinensis var. sinensis (CSS) and C. sinensis var. assamica (CSA) subgroups. (a) Principal component analysis based on all detected metabolites in the tea plant flowers. (b) Differential metabolites between CSS and CSA accessions

To investigate the variation in metabolite levels, the distribution of differential metabolites in populations categorized by genetic structure was analyzed. The 171 genotypes were divided into five subpopulations based on genetic variation using RNA sequencing (Zhang et al. 2020). Among these five subpopulations, most accessions in Pop1 belonged to CSA, and most accessions in Pop4 belonged to CSS; therefore, these two subpopulations were selected for differential metabolite analysis (Table S3). Generally, the differential metabolites between the two major tea varieties (CSA and CSS) were consistent with those between Pop1 and Pop4. However, there were a few outliers within the same subpopulation (Fig. 3a and Table S3). For example, Neg_132 (annotated as 3- [3,5-Dimethoxyphenyl] protonic acid in the Compound Discoverer 3.0 library, hereafter the CD 3.0 library) was a metabolite that showed high accumulation in Pop4 but low accumulation in Pop1. However, Neg_132 showed low accumulation in Zhongcha108 (3W10-8) and Zhongcha102 (3W2-23) of Pop4 and high accumulation in 3W10-21 of Pop1 (Fig. 3b). Pos_729 (annotated as [E, E]-alpha-farnesene in the CD 3.0 library) was highly accumulated in Pop1, but showed low accumulation in Pop4. However, Pos_729 showed low accumulation in accession 3W10-21 of Pop1 and high accumulation in accession 3W6-9 of Pop4 (Fig. 3c).

Fig. 3

Distribution of differential metabolites in tea flower subgroups. (a) Heat map of the differential metabolites between Pop1 and Pop4. A, organic acids and derivatives; B, benzenoids; C, lipids and lipid-like molecules; D, organoheterocyclic compounds; E, phenylpropanoids and polyketides; F, unknown metabolites. (b) Content of differential metabolite Neg_132. (c) Content of differential metabolite Pos_729. Genotype 3W10-21 belongs to Pop1, genotypes 3W10-8, 3W2-23 and 3W6-9 belong to Pop4. Note: *** indicates a significant difference (P < 0.001; One-way ANOVA with Tukey’s test)

Genome-wide association study of metabolites in tea flowers

According to the GEC calculation, the population of 171 genotypes comprised approximately 200,000 SNPs that were evenly distributed across 15 chromosomes (Fig. 4). A total of 1,238 QTL were identified with a threshold of -log10 P ≥ 5.3, encompassing 5,666 genes. Our GWAS results revealed that some metabolites were associated with the same locus and frequently exhibited similar structures. Previous studies have indicated that metabolites with related structures frequently share metabolic pathways and are co-regulated by common genes (Zhou et al. 2019). In this study, we found 14 loci (-log10 P ≥ 4.8) associated with at least 8 metabolites, among which 7 were structurally similar (Table 1).
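The significance threshold of 5.3 is numerically consistent with a 1/N cutoff, where N is the effective number of SNPs. The text does not state the formula that was used, so the following sketch is an assumption about how such a threshold is commonly derived rather than a description of the authors' procedure. The function name is hypothetical.

```python
import math


def minus_log10_threshold(n_effective_snps: int) -> float:
    """Return -log10(1/N), a commonly used suggestive GWAS threshold (assumed)."""
    return -math.log10(1.0 / n_effective_snps)


# Approximately 200,000 effective SNPs are reported in the text.
threshold = minus_log10_threshold(200_000)
print(round(threshold, 2))
```

With N = 200,000, the P-value cutoff is 5 × 10⁻⁶, which rounds to the reported -log10 P of 5.3.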

Fig. 4

Number of SNPs within a 1 Mb window across 15 chromosomes of the tea plant genome. The change in color from white to red indicates the number of SNPs within a 1 Mb window

Table 1 Summary of loci associated with multiple structurally related metabolites

Locus 1 on chromosome 3 was associated with 12 metabolites, seven of which contained a methoxy group (Table S4). Loci 3 and 4, located on chromosome 7, were associated with three metabolites (ECG-3''-O-ME, EGCG-3''-O-ME, 3-Methoxy-4-hydroxyphenyl-glycolglucuronide) containing a methoxy group ("-O-ME"). These metabolites were distinct from those associated with locus 1 (Table S4). In addition, loci 3 and 4 were co-associated with eight metabolites that were significantly correlated with one another (Fig. S3). The caffeoyl-CoA O-methyltransferase gene (CsCCoAOMT, W07g015551), located on chromosome 7, encodes an enzyme that generates methoxy groups in epigallocatechin gallate (EGCG).

Locus 2 on chromosome 6 was associated with 13 metabolites, six of which were annotated as carboxylic acids and derivatives. Locus 7 on chromosome 7 was associated with eight metabolites. Metabolite annotation results indicated that locus 7 was also related to amino acid metabolism. Furthermore, loci 2 and 7 were associated with N-feruloylglycine (Neg_1525, annotated in the CD 3.0 library) and N-methylhexanamide (Pos_2774, annotated in the CD 3.0 library). Additionally, N-methylhexanamide was associated with another locus related to amino acid metabolism, locus 10 (Table S4). The association of N-feruloylglycine with multiple loci indicated that it could be regulated by genes at several loci on different chromosomes.

Metabolites associated with locus 5 on chromosome 9 were annotated as aromatic compounds containing benzene and its substituted derivatives. Locus 6 on chromosome 10 was associated with eight metabolites, all of which were detected in positive ion mode and annotated as flavonoids and phenyl acid derivatives (Table S4). These metabolites were clustered together (Fig. S3) and shared the same MS/MS fragments. Most of them had fragment 139.04, which was speculated to be a flavonoid B ring fragment based on the Compound Discoverer 3.0 library (Table S5 and Fig. S4). Loci 8 and 9 were associated with 11 and 9 metabolites, respectively. Eight of these metabolites were common to both loci, implying that the two loci may share a similar genetic basis to a significant extent (Tables 1 and S4).

Metabolite identification and candidate gene mining using combined analyses

Pos_915 was identified as ECG-3''-O-ME in the library and shared the same QTL with the unknown metabolites Neg_78 and Neg_118 (-log10 P ≥ 7.8) on chromosome 7, spanning from 19,373,965 bp to 19,574,651 bp (Fig. 5a–c). Notably, Pos_915 and Neg_118 had the same molecular weight, and comparison with the standard confirmed both as ECG-3''-O-ME. Neg_118 and Neg_78 shared three identical fragments (125.02, 137.02, and 183.03), suggesting that they could be flavonoids with similar structures (Fig. 5h–j, Fig. S5). The molecular weight difference between the two metabolites was 16.04, which was consistent with the difference between the molecular weights of epicatechin gallate (ECG) and EGCG, leading to the hypothesis that Neg_78 was EGCG-3''-O-ME. A comparison of the retention index with the standards confirmed this hypothesis. The methyltransferase gene W07g015551 was located within the QTL (Fig. 5a). Haplotype analysis of W07g015551 revealed a SNP at 19,390,344 bp that affected the content of EGCG-3''-O-ME and ECG-3''-O-ME. When the nucleotide was cytosine (C), the codon encoded alanine and the metabolite content was high. When the nucleotide was guanine (G), which caused a non-synonymous mutation (alanine to serine), or when the site was absent, the metabolite content was lower than with C (Fig. 5d–f). W07g015551 encodes an O-methyltransferase that transfers the methyl group from S-adenosyl methionine (SAM) to EGCG, resulting in the formation of EGCG-3''-O-ME in tea leaves (** et al. 2023b; Liu et al. 2023).
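The comparison of MS/MS fragments between two metabolites can be illustrated by matching peak lists within an m/z tolerance. The sketch below is a simplified assumption of how such a comparison might be done; the tolerance, the function name, and the non-shared peaks (300.00 and 316.00) are made up for illustration, whereas the three shared fragments are those reported in the text.

```python
from typing import List


def shared_fragments(spectrum_a: List[float], spectrum_b: List[float],
                     tol: float = 0.01) -> List[float]:
    """Return peaks of spectrum_a that match a peak in spectrum_b within tol."""
    shared = []
    for mz_a in spectrum_a:
        if any(abs(mz_a - mz_b) <= tol for mz_b in spectrum_b):
            shared.append(mz_a)
    return shared


# Reported shared fragments plus one made-up, non-matching peak per spectrum.
neg_78_peaks = [125.02, 137.02, 183.03, 316.00]
neg_118_peaks = [125.02, 137.02, 183.03, 300.00]
print(shared_fragments(neg_78_peaks, neg_118_peaks))
```

A shared set of fragments only suggests a common substructure; in the study, the assignment was confirmed against standards.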

Fig. 5

Genome-wide association studies (GWAS) and haplotype analysis drive the mining of (-)-epigallocatechin-3-(3''-O-methyl) gallate (EGCG-3''-O-ME) and (-)-epicatechin-3-(3''-O-methyl) gallate (ECG-3''-O-ME) candidate genes. (a)–(c) Manhattan plot showing the GWAS results for metabolites Neg_78 (a), Neg_118 (b), and Pos_915 (c). (d) The structure and SNP location of candidate gene W07g015551. (e)–(g) Haplotype analysis of W07g015551 associated with the relative contents of metabolites Neg_78 (e), Neg_118 (f), and Pos_915 (g). (h)–(j) Secondary mass spectrometry (MS/MS) spectra of Neg_78, Neg_118, and Pos_915. Note: *** indicates a significant difference (P ≤ 0.001; One-way ANOVA with Tukey’s test). Neg_78 is EGCG-3''-O-ME in negative ion mode, Neg_118 is ECG-3''-O-ME in negative ion mode, and Pos_915 is ECG-3''-O-ME in positive ion mode

GWAS results for Pos_1118 revealed a significant QTL (-log10 P = 6.03) on chromosome 8, which contained four genes (Fig. 6a). Haplotype analysis of these genes identified a SNP in W08g018636 that corresponded to high and low phenotypes, with either a guanine (G) or cytosine (C) nucleotide at 131,986,699 bp on chromosome 8 (Fig. 6e). Metabolite content was high with G (encoding glycine) and low with C (encoding arginine) (Fig. 6b). The functional annotation of W08g018636 was associated with lipid metabolism. Pos_1118 was identified as jasmonic acid (JA) in the library. Three fragments were identical between JA (Fig. 6c) and Pos_1118 (Fig. 6d). JA is a lipid-derived hormone with a cyclopentanone structure that regulates plant immunity and adaptive growth (**g et al. 2021; Wan et al. 2020b; Wang et al. 2019). As JA is related to lipid metabolism, W08g018636 is a potential candidate gene affecting the content of Pos_1118 and the JA metabolic pathway.

Fig. 6

Genome-wide association studies (GWAS) and haplotype analysis drive the mining of Pos_1118. (a) Manhattan plot showing the GWAS results for metabolite Pos_1118. (b) Boxplot of haplotype analysis of candidate gene W08g018636, corresponding to Pos_1118. (c) Secondary mass spectrometry (MS/MS) spectra of Pos_1118. (d) MS/MS spectra of jasmonic acid (JA). (e) Structure and SNP location of candidate gene W08g018636. Note: * indicates a significant difference in One-way ANOVA with Tukey’s test (* indicates P ≤ 0.05, *** indicates P < 0.001)

Metabolite Neg_365 (Fig. 7c) was identified as p-coumaroylquinic acid. Neg_365 contained three primary fragments: 337.091, 191.055, and 173.044. The molecular weight of quinic acid is 192.167; therefore, fragment 191.055 was speculated to be deprotonated quinic acid formed in negative ion mode. The difference between fragments 337.091 and 173.044 was 164.047, which corresponds to the hydrated coumaroyl group (coumaric acid). GWAS results revealed that the two significant loci on chromosomes 1 (-log10 P = 9.54) and 2 (-log10 P = 10.40) contained 38 and 30 genes, respectively (Fig. 7a). Haplotype analysis revealed that three non-synonymous mutations in W01g002625 corresponded to high and low metabolite content. W01g002625 has a single exon (Fig. 7d). At 256,121,229 bp on chromosome 1 (SNP1, Fig. 7d), the nucleotide cytosine (C), which encodes threonine, resulted in low metabolite content, and the nucleotide T, encoding isoleucine, resulted in high metabolite content. At position 256,121,194 bp on chromosome 1 (SNP2, Fig. 7d), the nucleotide G encoding valine resulted in a low metabolite content, and the nucleotide C encoding leucine resulted in a high metabolite content. At position 256,121,107 bp on chromosome 1 (SNP3, Fig. 7d), the nucleotide adenine (A) encoding threonine resulted in a low metabolite content, and the nucleotide G encoding alanine resulted in a high metabolite content. These three sites were involved in controlling metabolite content. The haplotype C-G-A (Hap2, Fig. 7d) corresponded to low Neg_365 content, whereas the haplotype T-C-G (Hap1, Fig. 7d) corresponded to high Neg_365 content (Fig. 7b). Notably, the nucleotides associated with higher metabolite content frequently occurred together, as did the nucleotides associated with lower metabolite content. W01g002625 was annotated as an acetyltransferase (anthocyanin 5-aromatic acyltransferase), indicating that the gene function and metabolite structure analyses were consistent.
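The fragment arithmetic above can be checked against monoisotopic masses calculated from molecular formulas. The sketch below uses the standard formulas of quinic acid (C7H12O6) and p-coumaric acid (C9H8O3) and standard monoisotopic atomic masses; neither the formulas nor the 0.005 tolerance appears in the text, so they are assumptions of this example. Note that the value 192.167 quoted above is an average molecular weight, whereas fragment m/z values are compared with monoisotopic masses.

```python
from typing import Dict

# Standard monoisotopic atomic masses (Da) and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646


def monoisotopic_mass(formula: Dict[str, int]) -> float:
    """Sum monoisotopic atomic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


quinic_acid = {"C": 7, "H": 12, "O": 6}      # assumed standard formula
p_coumaric_acid = {"C": 9, "H": 8, "O": 3}   # assumed standard formula

# Expected m/z of deprotonated quinic acid, [M-H]-, in negative ion mode.
quinic_anion = monoisotopic_mass(quinic_acid) - PROTON

# Mass difference between the two fragments reported for Neg_365.
fragment_gap = 337.091 - 173.044

print(round(quinic_anion, 3), round(fragment_gap, 3),
      round(monoisotopic_mass(p_coumaric_acid), 3))
```

Under these assumptions, the calculated [M-H]- of quinic acid agrees with fragment 191.055 to within about 0.001, and the 164.047 difference matches the monoisotopic mass of p-coumaric acid.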

Fig. 7

Genome-wide association studies (GWAS) and haplotype analysis of Neg_365. (a) Manhattan plot showing the GWAS results for metabolite Neg_365. (b) Boxplot of haplotype analysis of candidate gene W01g002625, corresponding to Neg_365. (c) Secondary mass spectrometry (MS/MS) spectra of Neg_365. (d) Structure and SNP location of candidate gene W01g002625. Note: *** indicates a significant difference (P ≤ 0.001; One-way ANOVA with Tukey’s test)

Discussion

A diverse array of metabolites has been detected in tea plant flowers, including organic acids, flavonoids, and alkaloids, which are similar to those found in tea leaves (Li et al. 2018; Zhou et al. 2022; Zhuang et al. 2020). Organic acids were the most abundant metabolites detected in tea plant flowers, followed by flavonoids. This could be related to the involvement of organic acids in the metabolism of aroma compounds in the flowers (Jia et al. 2016). In addition, tea plant flowers contain several saccharides that provide a sweet flavor and can be glycosylated into aroma compounds. Sweetness and aroma are essential to attract insects for pollination (Cui et al. 2023; ** et al. 2023a). Flavonoids play a crucial role in tea plants, and are extensively glycosylated and methylated. Some studies have shown that glycosylation and methylation of flavonoids influence plant traits such as stress resistance and yield (Dong et al. 2020; He et al. 2022). Under stressful conditions, tea plants tend to reproduce, leading to an increased number of flowers. These flowers may compete with the leaves for organic acids, flavonoids, and sugars needed for reproductive growth, which affects the yield, resistance, and quality of the tea plants.

Notably, compounds such as catechin, caffeine, and theanine, which are abundant in tea leaves and contribute to the taste of tea, are also present in tea flowers. This observation suggests that the flavor of tea flowers may be similar to that of tea leaves, which explains why tea flowers are consumed as beverages. When ingested as a drink, tea flowers tend to exhibit a more fragrant and sweeter taste than leaves. Most of the correlations between metabolites identified in this study were positive, suggesting that numerous metabolites are yet to be discovered. The experimental outcomes were associated with metabolite extraction, ion sources, and metabolite annotation. To investigate the metabolism of tea plants in the future, it is essential to improve metabolite identification and attempt to identify additional metabolites in tea using diverse experimental approaches. Such initiatives will facilitate the study of tea plant resistance, yield, flavor, and quality.

Tea plants comprise two main varieties: CSA and CSS. Previous studies have demonstrated that typical CSA and CSS accessions can be differentiated using genomic analyses (Wang et al. 2020b; **a et al. 2020; Yu et al. 2020b); however, other research has revealed frequent genetic exchanges between different subgroups of tea (Zhang et al. 2021a, 2020). In the present study, principal component analysis could not differentiate between CSA and CSS based on metabolic data. However, differential metabolites were observed between the CSA and CSS accessions. The primary differential metabolites identified were organic acids and lipids, unlike the findings in tea leaves, where flavonoids were identified as the primary differential metabolites (Yu et al. 2020b). This finding implies that metabolic differences between tea subpopulations exhibit tissue-specific patterns.

Lipids and organic acids are essential components for plant growth and reproduction, as well as for stress resistance in tea plants (**g et al. 2021; Wan et al. 2020a, 2020b; Yu et al. 2020a). Abnormal quantities of differential metabolites in certain samples may be attributed to genetic variation. These samples provide valuable resources for future research on the genetic variation in tea plants.

Although non-targeted metabolic studies yield a significant number of mass characteristics, identifying each detected metabolite is a daunting challenge. GWAS results demonstrated that structurally similar metabolites are frequently associated with the same locus, implying that the genetic basis can be used to annotate metabolites. This is particularly beneficial for the classification of unknown metabolites. Additionally, GWAS can provide a large number of candidate genes, and subsequent haplotype analysis is an effective method for predicting candidate genes and conducting gene functional analysis (Li et al. 2021; Liang et al. 2021; Zhang et al. 2020).

Conclusions

The MS/MS data, when combined with GWAS results and haplotype analysis, enable the effective identification of metabolites and their underlying genetic mechanisms. This study provides novel insights into tea breeding, including the identification of exclusive alleles responsible for important agronomic characteristics. We used secondary mass spectrometry, genome-wide association, and haplotype analysis to identify beneficial alleles within tea plant varieties. The application of these findings to tea plant breeding can improve the efficiency of tea plant cultivar development.