Population sequencing enhances understanding of tea plant evolution

Wang, Xinchao; Feng, Hu; Chang, Yuxiao; Ma, Chunlei; Wang, Liyuan; Hao, Xinyuan; Li, A’lun; Cheng, Hao; Wang, Lu; Cui, Peng; Jin, Jiqiang; Wang, Xiaobo; Wei, Kang; Ai, Cheng; Zhao, Sheng; Wu, Zhichao; Li, Youyong; Liu, Benying; Wang, Guo-Dong; Chen, Liang; Ruan, Jue; Yang, Yajun

doi:10.1038/s41467-020-18228-8

Article
Open access
Published: 07 September 2020

Volume 11, article number 4447, (2020)

Abstract

Tea is an economically important plant characterized by a large genome, high heterozygosity, and high species diversity. In this study, we assemble a 3.26-Gb high-quality chromosome-scale genome for the ‘Longjing 43’ cultivar of Camellia sinensis var. sinensis. Genomic resequencing of 139 tea accessions from around the world is used to investigate the evolution and phylogenetic relationships of tea accessions. We find that hybridization has increased the heterozygosity and wide-ranging gene flow among tea populations with the spread of tea cultivation. Population genetic and transcriptomic analyses reveal that during domestication, selection for disease resistance and flavor in C. sinensis var. sinensis populations has been stronger than that in C. sinensis var. assamica populations. This study provides resources for marker-assisted breeding of tea and sets the foundation for further research on tea genetics and evolution.

Introduction

Tea [Camellia sinensis (L.) O. Kuntze, 2n = 30] is one of the most important and traditional economic crops in many developing countries in Asia, Africa, and Latin America, and is consumed as a beverage by more than two-thirds of the world’s population […]

After performing five filtering steps (described in the Methods section), we identified a total of 218.87 million SNPs among the tea populations, with a density of approximately 67 SNPs per kb (Fig. 1a; Supplementary Tables 14 and 15). We anticipate that this extensive whole-profile SNP dataset will be valuable for further tea genomics research and marker-assisted breeding.

**Fig. 2: Distribution and evolution of tea.**

To further investigate the phylogenetic relationships among these accessions, we constructed a maximum likelihood-based phylogenetic tree with SNPs filtered from the total SNP dataset (see the Methods section for details), using Camellia sasanqua as an outgroup (Fig. 2c). We found that all samples were clustered into one of three independent clades (Fig. 2c; Supplementary Data 4) corresponding to the CSR, CSS, and CSA populations, which is consistent with the morphology-based classical taxonomy of CSA and CSS.

Principal component analysis (PCA) was used to investigate the relationships and differentiation among populations and consistently revealed the presence of three clusters corresponding to CSA, CSS, and CSR (Fig. 2b). The first two principal components accounted for 13.08% of the total variance, with PC1 reflecting the variability of the CSA and CSS groups and PC2 differentiating CSR plants from CSA and CSS plants. We found that CSS showed better aggregation than CSA and CSR, whereas the juncture accessions of CSA and CSS were also close to CSR in the phylogenetic tree. At a K value of 3, CSA, CSS, and CSR could be readily distinguished (Fig. 2d; Supplementary Fig. 12; Supplementary Note 4), which is consistent with the PCA results (Fig. 2b). At a K value of 3 or 4, most new accessions collected from outside China appeared to have originated from CSA and CSS (yellow color, marked with an arrow in Fig. 2d), indicating their high diversity.

On the basis of the phylogenetic and population structure results (Fig. 2c; Supplementary Data 4–6), we further investigated individual- and population-level heterozygosity among the populations (Supplementary Data 3). We accordingly found the heterozygosity of CSR (6.37E-3) to be significantly higher than that of CSA (6.29E-3) and CSS (5.69E-3) (both P values < 0.05; Supplementary Fig. 13). We also calculated linkage disequilibrium (LD) decay values based on the squared correlation coefficient (r²) of pairwise SNPs in two groups, which revealed that for the CSA and CSS groups, the average r² among SNPs decayed to ~50% of its maximum value at ~41 and 59 kb, respectively. These values thus indicate that the tea genomes have relatively long LD distances and slow LD decay (Supplementary Fig. 14).
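The LD decay distances quoted above are the distances at which the average r² falls to about half of its maximum. A minimal sketch of that read-out follows, assuming the pairwise r² values have already been averaged into distance bins; the function name and the input layout are illustrative and not taken from the study.

```python
def half_decay_distance(distances_kb, mean_r2):
    """Return the first distance at which mean r^2 drops to <= half its maximum.

    distances_kb: bin distances in kb, in increasing order (assumed layout).
    mean_r2: average r^2 of SNP pairs in each distance bin.
    Returns None if r^2 never falls to half of its maximum.
    """
    if len(distances_kb) != len(mean_r2) or not mean_r2:
        raise ValueError("inputs must be non-empty and of equal length")
    peak = max(mean_r2)
    peak_index = mean_r2.index(peak)
    half = peak / 2
    # Only look at bins from the peak onwards.
    for distance, r2 in zip(distances_kb[peak_index:], mean_r2[peak_index:]):
        if r2 <= half:
            return distance
    return None


# Toy values for illustration only, not data from the paper.
toy_distances = [1, 10, 20, 40, 60]
toy_r2 = [0.8, 0.6, 0.5, 0.4, 0.2]
print(half_decay_distance(toy_distances, toy_r2))
```

A longer half-decay distance means that LD persists over longer stretches of the genome.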

Selective sweeps in the two major tea populations

It is generally thought that the differences between CSS and CSA teas lie primarily in their flavor, leaf and tree types, cold tolerance, and processing suitability. Among the accessions assessed in the present study, the CSA population comprised three green tea accessions and 34 black tea accessions, whereas the CSS population contained 45 green tea accessions, 19 oolong tea accessions, and 11 black tea accessions (Fig. 3a). To determine the potential genetic bases of these differences, we used SweepFinder2 (version 1.0) to scan for selective sweep regions and selected regions with the top 1% of composite likelihood ratio (CLR) scores and the genes overlapping with the final sweep regions (≥300 bp). On the basis of this analysis, we identified a total of 1336 and 1028 genes bearing selection signatures in the CSA and CSS populations, respectively (Supplementary Data 7 and 8; Supplementary Fig. 15).

**Fig. 3: Sweep genesets in CSA and CSS show different directions of domestication.**

Using the data generated from GO analysis, we selected enriched genes (P value < 0.05, FDR < 0.05) from the candidate selective sweep genes of the CSA and CSS populations (Supplementary Tables 16 and 17; Supplementary Fig. 16) and accordingly found that volatile terpene metabolism genes, such as cytochrome P450s (e.g., geraniol 8-hydroxylase) and terpene synthases, including alpha-terpineol synthase (ATESY), (−)-germacrene D synthase (TPSGD), and strictosidine synthase (STSY), were significantly selected in the CSS population but not the CSA population (Fig. 3b; Supplementary Tables 16 and 17). The functionalization of core terpene molecules requires cytochrome P450s³², among which geraniol 8-hydroxylase catalyzes the conversion of geraniol to (6E)-8-hydroxygeraniol (Fig. 3b), which may affect the accumulation level of geraniol. Alpha-terpineol, a monoterpene found in tea, is generated by the ATESY-mediated catalysis of geranyl-PP, whereas TPSGD catalyzes the conversion of farnesyl-PP to the sesquiterpene germacrene D. Strictosidine is the precursor of terpenoid indole alkaloids, and STSY is a key enzyme in the synthesis of these alkaloids (Fig. 3b). Moreover, we found that 80% of the selected terpene-related genes showed relatively high expression in buds or leaves, whereas 33% of these genes showed significantly high expression in buds or leaves (Fig. 3c; Supplementary Table 18).

Compared with the CSA accessions, the CSS accessions were characterized by the selection of a larger number of NBS-ARC (nucleotide-binding site domain in apoptotic protease-activating factor-1, R proteins and Caenorhabditis elegans death-4 protein) genes, the Arabidopsis homologs of which, including RPS3 (also known as RPM1)³³, RPS5³⁴, and SUMM2³⁵, have been shown to be involved in resistance to Pseudomonas syringae (RPS) (Supplementary Tables 16 and 17). The expression profiles of these genes revealed that 69% of the NBS-ARC genes subject to selection are highly expressed in spring, autumn, or winter, whereas 24% of these genes are significantly highly expressed in spring, autumn, or winter (Fig. 3d; Supplementary Table 19). However, among the 214 genes under selection in both the CSS and CSA populations, we were unable to detect enrichment of any genes related to flavor synthesis or abiotic and biotic stress resistance in the CSA population (Supplementary Data 7 and 8).

Discussion

This study presents a chromosome-scale genome sequence of tea and resequencing data for 139 tea accessions collected from around the world. According to our analyses, these genomic resources will be valuable for future genomics research and molecular breeding of tea. The data reveal the genome-wide phylogeny of tea and the directions of divergent selection between the two main tea varieties, namely, CSS and CSA. Compared with CSA, in CSS, genes involved in flavor metabolism and cold tolerance have been subjected to stronger selection, which is consistent with the fact that tea accessions from eastern and northern China, such as green and oolong tea, have a distinct aroma and are cold tolerant. Our data also indicate that the CSR population is an ancestor of CSS and CSA. However, although these findings represent an important step in unravelling details of the origin and domestication of CSS and CSA, it remains necessary to identify the closest ancestor of tea and to examine a larger number of CSR accessions in the future. Due to the limitations of sampling in India, we cannot rule out the possibility of other evolutionary scenarios, an evaluation of which will require a more comprehensive collection of samples. Although several studies related to tea genomics have recently been published […]

[…] and the transcript reads were assembled using Cufflinks (version 2.2.1). All of the predicted gene structures were integrated using EVidenceModeler (version 1.1.1). Protein-coding genes with a coding sequence length shorter than 300 nt and with stop codons were filtered (with the exception of stop codons at the end of a sequence). We then mapped RNA-seq reads to the predicted coding regions using SOAP2 and selected the predicted gene regions based on RNA-seq data (regions with >50% coverage). The methods used for gene and functional annotation are described in detail in Supplementary Note 2.

The sequences of LJ43 and Actinidia chinensis²¹ proteins were analyzed using blastp with the parameters -evalue 1e-5 -num_alignments 5. Thereafter, syntenic blocks were identified using MCScanX with the parameters -e 1e-20. SCZ and YK10 were analyzed using the same pipeline and parameters. We also analyzed the genome synteny between Theobroma cacao⁵⁴ and LJ43, SCZ, and YK10 (Supplementary Note 3).
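The coding-sequence filter described in the gene-prediction paragraph above (removing genes with a CDS shorter than 300 nt or with stop codons anywhere other than the end) can be illustrated with a small sketch. It assumes the standard genetic code and that each CDS starts in frame at its first base; the function name is hypothetical and this is not the authors' script.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def keep_cds(cds, min_length=300):
    """Return True if a coding sequence passes the length and stop-codon filter.

    A CDS is rejected if it is shorter than min_length nucleotides or if any
    codon other than the last complete codon is a stop codon.
    """
    sequence = cds.upper()
    if len(sequence) < min_length:
        return False
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    internal_codons = codons[:-1]  # a terminal stop codon is allowed
    return not any(codon in STOP_CODONS for codon in internal_codons)


# Toy sequences for illustration only.
good = "ATG" + "GCT" * 99 + "TAA"   # 303 nt, stop codon only at the end
bad = "ATG" + "TAA" + "GCT" * 99    # 303 nt, premature stop codon
print(keep_cds(good), keep_cds(bad))
```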

Analysis of positive Darwinian selection

A species tree was constructed as described in Supplementary Note 3, without SCZ and YK10. We identified 1031 single-copy gene families. The protein sequences of the single-copy genes were aligned using ClustalW2⁵⁵, and the protein alignments were then converted to the corresponding nucleotide (codon) alignments using an in-house Perl script. Gblocks⁵⁶ was used to trim the nucleotide alignments with the t = c parameter. The branch-site model A and Test 2 were used to assess positive selection with codeml in the PAML package. Significant sites were dropped if the 5-bp sequences flanking the site had been removed by Gblocks. A false discovery rate (FDR) value of ≤0.05 was used to filter the results.

SNP calling and filtering

Quality-controlled reads were mapped to the unmasked tea genome using bwa (version 0.7.15)⁵⁷ with the default parameters. SAMtools (version 1.4)⁵⁸ was used for sorting, and Picard (v.2.17.0) was used to remove duplicates. The HaplotypeCaller of GATK (version 3.8.0)⁵⁸ was used to construct general variant calling files for the tea group (139 accessions) and outgroup (C. sasanqua, CM-1) by invoking -ERC:GVCF. gVCF files in the tea group were combined using GenotypeGVCFs in GATK to form a single-variant calling file, whereas the gVCF file for the outgroup was called using the option ‘–allSites’ to include all sites. The final single-variant calling file was merged using BCFtools (version 1.6), with only the consistent positions retained in both groups. To obtain high-quality SNPs, we initially used the GATK hard filter to filter the merged VCF data with the options (QD ≥ 2.0 && FS ≤ 60.0 && MQ ≥ 40.0 && MQRankSum ≥ −12.5 && ReadPosRankSum ≥ −8.0). Thereafter, we performed strict filtering of the SNP calls based on the following criteria: (1) sites were located at a distance of at least 5 bp from a predicted insertion/deletion; (2) the consensus quality was ≥40; (3) the sites were not triallelic and did not contain InDels; (4) the depth fell between the 2.5th and 97.5th percentiles of the depth distribution; and (5) SNPs had minor allele frequencies (MAFs) ≥ 0.01.
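The hard-filter expression and two of the strict criteria above (biallelic sites, MAF ≥ 0.01) can be written out as plain predicates. The sketch below is illustrative only: it works on an already-parsed dictionary of INFO annotations rather than calling GATK, the function names are hypothetical, and skipping a test when an annotation is missing is an assumption of this sketch, not a documented behavior of the study's pipeline.

```python
import operator

# Thresholds copied from the hard-filter expression in the text.
HARD_FILTER_RULES = {
    "QD": (operator.ge, 2.0),
    "FS": (operator.le, 60.0),
    "MQ": (operator.ge, 40.0),
    "MQRankSum": (operator.ge, -12.5),
    "ReadPosRankSum": (operator.ge, -8.0),
}


def passes_hard_filter(info):
    """Return True if every available annotation meets its threshold."""
    for key, (compare, cutoff) in HARD_FILTER_RULES.items():
        value = info.get(key)
        if value is None:
            continue  # assumption: a missing annotation is not tested
        if not compare(value, cutoff):
            return False
    return True


def minor_allele_frequency(alt_allele_count, total_allele_count):
    """Frequency of the rarer allele at a biallelic site."""
    if total_allele_count <= 0:
        raise ValueError("total_allele_count must be positive")
    alt_frequency = alt_allele_count / total_allele_count
    return min(alt_frequency, 1.0 - alt_frequency)


def keep_snp(info, alt_allele_count, total_allele_count, n_alleles, min_maf=0.01):
    """Combine the hard filter with the biallelic and MAF criteria."""
    maf = minor_allele_frequency(alt_allele_count, total_allele_count)
    return n_alleles == 2 and passes_hard_filter(info) and maf >= min_maf


# Toy record: 139 diploid accessions give 278 allele copies if none are missing.
record = {"QD": 12.3, "FS": 4.1, "MQ": 58.0, "MQRankSum": -0.3, "ReadPosRankSum": 0.6}
print(keep_snp(record, alt_allele_count=3, total_allele_count=278, n_alleles=2))
```

With 278 allele copies, a single alternative allele gives a MAF of about 0.0036 and is dropped by criterion (5), whereas three copies (about 0.0108) pass.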

Population genetic analyses

We selected high-quality SNPs with a maximum of 20% missing data, and to eliminate the potential effects of physical linkage among variants, the sites were thinned such that no two sites were within the same 2000-bp region. Phylogenetic analysis was conducted with the final SNP set using IQ-TREE (version 1.6.9)⁵⁹,⁶⁰,⁶¹. A maximum likelihood (ML)-based phylogenetic tree was constructed using the GTR + F + R5 model, with 1000 rapid bootstrap replicates conducted to determine branch confidence values. The best-fitting model was estimated using ModelFinder implemented in IQ-TREE after evaluating 286 DNA models. GTR + F + R5 was selected based on the Bayesian information criterion. The ML phylogenetic tree was constructed based on intergenic region SNPs using the final SNP set and 4DTV SNPs. Principal component analysis (PCA) of the final SNP set was performed using PLINK (version 1.90), with the principal components plotted against one another using R 3.4 to visualize patterns of genetic variation. We also used the final SNP set for population structure analysis using ADMIXTURE (version 1.3)⁶², which was run with K values (the number of assumed ancestral components) ranging from 1 to 10.
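The missing-data cutoff and the thinning step at the start of this paragraph can be sketched as follows. The sketch reads "no two sites within the same 2000-bp region" as a greedy distance rule (a site is kept only if it lies at least 2000 bp from the previously kept site on the same chromosome); that reading, the function names, and the input layout are assumptions for illustration.

```python
def missing_fraction(genotypes):
    """Fraction of samples with no genotype call (None marks a missing call)."""
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    return sum(1 for g in genotypes if g is None) / len(genotypes)


def thin_sites(sites, min_gap=2000):
    """Greedily thin sites so kept sites on a chromosome are >= min_gap apart.

    sites: iterable of (chromosome, position), sorted by position within
    each chromosome.
    """
    kept = []
    last_kept = {}  # chromosome -> position of the most recently kept site
    for chromosome, position in sites:
        previous = last_kept.get(chromosome)
        if previous is None or position - previous >= min_gap:
            kept.append((chromosome, position))
            last_kept[chromosome] = position
    return kept


# Toy coordinates for illustration only.
toy_sites = [("chr1", 100), ("chr1", 1500), ("chr1", 2100), ("chr1", 4200), ("chr2", 150)]
print(thin_sites(toy_sites))

# A site with 1 missing call out of 10 samples is within the 20% limit.
print(missing_fraction([(0, 1)] * 9 + [None]) <= 0.2)
```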

Population heterozygosity at a given locus was computed as the fraction of heterozygous individuals among all individuals in a given population. The average heterozygosity was then calculated for each 40-kb sliding window, with a step size of 20 kb. Individual heterozygosity was computed as the fraction of loci that were heterozygous in an individual. Average heterozygosity was also calculated using the same method. Windows with an average depth <1 were filtered out.
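The two definitions above translate directly into code. The sketch below computes per-locus population heterozygosity and averages it in 40-kb windows with a 20-kb step; it uses 0-based positions and excludes samples with missing genotypes from the denominator, both of which are assumptions of this illustration rather than details stated in the text.

```python
def locus_heterozygosity(genotypes):
    """Fraction of called individuals that are heterozygous at one locus.

    genotypes: list of (allele1, allele2) tuples, or None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(1 for first, second in called if first != second) / len(called)


def windowed_mean(values_by_position, chromosome_length, window=40_000, step=20_000):
    """Average per-locus values in sliding windows along one chromosome.

    values_by_position: list of (position, value) with 0-based positions.
    Returns a list of (window_start, window_end, mean or None).
    """
    windows = []
    start = 0
    while start < chromosome_length:
        end = min(start + window, chromosome_length)
        values = [v for position, v in values_by_position if start <= position < end]
        mean = sum(values) / len(values) if values else None
        windows.append((start, end, mean))
        start += step
    return windows


# Toy data for illustration only.
print(locus_heterozygosity([(0, 1), (0, 0), (1, 1), (0, 1)]))
print(windowed_mean([(5_000, 0.5), (25_000, 0.0), (45_000, 1.0)], 60_000))
```

Individual heterozygosity follows the same pattern with the roles swapped: the fraction is taken over loci within one individual instead of over individuals at one locus.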

To eliminate the influence of differences in sample number, eight samples of the CSR/CSA/CSS populations were randomly selected to calculate nucleotide diversity. To reduce the sampling error, we performed 20 repeat calculations for each population using VCFtools (version 0.1.16) with a window size of 50 kb and a step size of 10 kb. The data for each population are presented as boxplots created using R.
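The equal-size resampling described above (eight samples per population, 20 repeats) can be sketched as a seeded random draw. The sample identifiers, the seed, and the function name are hypothetical; the nucleotide diversity calculation itself is left to VCFtools, as in the text.

```python
import random


def draw_subsamples(sample_ids, size=8, repeats=20, seed=0):
    """Draw `repeats` random subsets of `size` samples without replacement."""
    ids = list(sample_ids)
    if len(ids) < size:
        raise ValueError("population has fewer samples than the subset size")
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [sorted(rng.sample(ids, size)) for _ in range(repeats)]


# Hypothetical identifiers for a population of 37 accessions.
population = [f"CSA_{index:02d}" for index in range(1, 38)]
subsets = draw_subsamples(population)
print(len(subsets), len(subsets[0]))
```

Each subset could then be written to a file and passed to VCFtools for the windowed diversity calculation, for instance with its `--keep`, `--window-pi`, and `--window-pi-step` options; the exact command line used in the study is not given here.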

Selective sweep analysis

TreeTime 0.5.3⁶³ was used to infer the ancestral state based on ML using the generated evolutionary tree. Sites lacking a reconstructed ancestral state in a population were folded in the SweepFinder2 analysis. We excluded sites that were neither polymorphic nor substitutions, as recommended by the SweepFinder2 manual⁶⁴. To reduce the likelihood of false positives, the chromosome-wide frequency spectrum was calculated as the background for each chromosome and population. SweepFinder2 was run with a grid size of 100. The CLR scores from the SweepFinder2 results were extracted and merged into sweep regions when the neighboring score(s) exceeded a certain threshold, which was set as the top 1% of CLR scores. To obtain regions with greater continuity, we merged regions into a single region with a certain size threshold between regions, with the threshold being set to 50% of the size in the adjacent sweep regions. The final score for each sweep region was the sum of the CLR scores of the sites in the sweep region. The final sweep regions were filtered based on a minimum size of 300 bp. Genes overlapping within the sweep regions were extracted as candidate selective sweep genes. The GO-enriched (P value < 0.05, FDR < 0.05) candidate selective sweep genes were chosen, and Fst, θ_π and Tajima’s D values were calculated using VCFtools with a window size of 50,000 bp and a step size of 10,000 bp.
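The post-processing of CLR scores described above (thresholding at the top 1%, joining neighboring grid points, bridging short gaps, summing scores, and dropping regions under 300 bp) can be sketched as below. This is not the authors' script. In particular, the gap rule used here (two regions are joined when the gap between them is no larger than half of their combined length) is only one possible reading of "50% of the size in the adjacent sweep regions", and all names are illustrative.

```python
def top_fraction_threshold(scores, fraction=0.01):
    """Smallest score that still belongs to the top `fraction` of scores."""
    ranked = sorted(scores, reverse=True)
    count = max(1, int(len(ranked) * fraction))
    return ranked[count - 1]


def call_sweep_regions(points, threshold, gap_ratio=0.5, min_size=300):
    """Turn (position, CLR) grid points on one chromosome into sweep regions.

    points: list of (position, clr) sorted by position.
    Returns a list of (start, end, summed_clr).
    """
    # Step 1: runs of consecutive grid points at or above the threshold.
    runs, current = [], None
    for position, clr in points:
        if clr >= threshold:
            if current is None:
                current = [position, position, clr]
            else:
                current[1] = position
                current[2] += clr
        elif current is not None:
            runs.append(current)
            current = None
    if current is not None:
        runs.append(current)

    # Step 2: bridge short gaps between adjacent runs (assumed gap rule).
    merged = []
    for start, end, score in runs:
        if merged:
            previous = merged[-1]
            gap = start - previous[1]
            flank_length = (previous[1] - previous[0]) + (end - start)
            if gap <= gap_ratio * flank_length:
                previous[1] = end
                previous[2] += score
                continue
        merged.append([start, end, score])

    # Step 3: keep regions of at least min_size bp.
    return [(s, e, total) for s, e, total in merged if e - s >= min_size]


# Toy grid for illustration only.
grid = [(0, 1), (100, 10), (200, 12), (300, 11), (400, 1), (500, 9), (600, 10),
        (700, 9), (800, 10), (900, 1), (5000, 20), (5100, 1)]
print(call_sweep_regions(grid, threshold=9))
```

In the toy grid, the two nearby runs are joined into one region, while the isolated high-scoring point at position 5000 is discarded by the size filter.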

Gene expression

Transcript-level expression was calculated using HISAT2, StringTie, and Ballgown with the default parameters⁶⁵. The genes identified among the selection results were selected for expression analysis, and an expression heatmap was plotted using the heatmap package in R 3.4. The average expression of selected genes shown in Fig. 3d was calculated according to season, whereas the average expression of selected genes shown in Fig. 3c was calculated according to tissue. Student’s t-test was used to identify the significantly differentially expressed genes (P value < 0.05).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The RNA-seq, 10× Genomics, Hi–C, Illumina short reads, and PacBio raw data for the ‘Longjing 43’ cultivar of Camellia sinensis var. sinensis have been deposited in the European Bioinformatics Institute with the accession code PRJEB39502. The raw resequencing data have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive database with the accession code PRJNA646044. All raw sequence data are also available in the Genome Sequence Archive⁶⁶ in the BIG Data Center⁶⁷, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number PRJCA001158. The assembly and annotation of the ‘Longjing 43’ genome are available in BIG database [https://bigd.big.ac.cn/search/?dbId=gwh&q=GWHACFB00000000]. Source data are provided with this paper.

References

Xia, E. H. et al. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Mol. Plant 10, 866–877 (2017).
Wei, K. et al. A coupled role for CsMYB75 and CsGSTF1 in anthocyanin hyperaccumulation in purple tea. Plant J. 97, 825–840 (2019).
Lu, H. et al. Earliest tea as evidence for one branch of the Silk Road across the Tibetan Plateau. Sci. Rep. 6, 18955 (2016).
Wu, J. Review on ‘Cha Ching’ (Agriculture Press, Beijing, 1987).
Harbowy, M. E. & Balentine, D. A. Tea chemistry. Crit. Rev. Plant Sci. 16, 415–480 (1997).
Hara, Y., Luo, S. J., Wickremasinghe, R. L. & Yamanishi, T. Special issue on tea. Food Rev. Int. 11, 371–546 (1997).
Liang, Y. & Shi, M. Advances in tea plant genetics and breeding. J. Tea Sci. 35, 103–109 (2015).
Chen, L., Yu, F.-L. & Tong, Q.-Q. Discussions on phylogenetic classification and evolution of Sect. Thea. J. Tea Sci. 20, 89–94 (2000).
Yang, J.-B., Yang, J., Li, H.-T., Zhao, Y. & Yang, S.-X. Isolation and characterization of 15 microsatellite markers from wild tea plant (Camellia taliensis) using FIASCO method. Conserv. Genet. 10, 1621–1623 (2009).
Raina, S. N. et al. Genetic structure and diversity of India hybrid tea. Genet. Resour. Crop Ev. 59, 1527–1541 (2012).
Zhang, W., Rong, J., Wei, C., Gao, L. & Chen, J. Domestication origin and spread of cultivated tea plants. Biodivers. Sci. 26, 357–372 (2018).
Huang, H., Shi, C., Liu, Y., Mao, S. Y. & Gao, L. Z. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol. Biol. 14, 151 (2014).
Li, M.-M., Meegahakumbura, M. K., Yan, L.-J., Liu, J. & Gao, L.-M. Genetic involvement of Camellia taliensis in the domestication of C.sinensis var. assamica (Assimica Tea) revealed by nuclear microsatellite markers. Plant Divers. Resour. 37, 29–37 (2015).
Yao, M. Z., Ma, C. L., Qiao, T. T., Jin, J. Q. & Chen, L. Diversity distribution and population structure of tea germplasms in China revealed by EST-SSR markers. Tree Genet. Genome 8, 205–220 (2012).
Chen et al. Discrimination of wild tea germplasm resources (Camellia sp.) using RAPD markers. Agr. Sci. China 1, 1105–1110 (2002).
Wei, C. L. et al. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proc. Natl Acad. Sci. USA 115, E4151–E4158 (2018).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Yang, Y. & Liang, Y. Clonal Tea Plant Cultivar Records of China (Shanghai Scientific & Technical Publishers, Shanghai, 2014).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
Ma, J. Q. et al. Construction of a SSR-based genetic map and identification of QTLs for catechins content in tea plant (Camellia sinensis). PLoS ONE 9, e93131 (2014).
Huang, S. et al. Draft genome of the kiwifruit Actinidia chinensis. Nat. Commun. 4, 2640 (2013).
Salojarvi, J. et al. Genome sequencing and population genomic analyses provide insights into the adaptive landscape of silver birch. Nat. Genet. 49, 904–912 (2017).
Teh, B. T. et al. The draft genome of tropical fruit durian (Durio zibethinus). Nat. Genet. 49, 1633–1641 (2017).
Sun, S. L. et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat. Genet. 50, 1289–1295 (2018).
Ou, S. J., Chen, J. F. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).
Gaut, B. S., Morton, B. R., McCaig, B. C. & Clegg, M. T. Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl Acad. Sci. USA 93, 10274–10279 (1996).
García-Andrade, J., Ramírez, V., Flors, V. & Vera, P. Arabidopsis ocp3 mutant reveals a mechanism linking ABA and JA to pathogen-induced callose deposition. Plant J. 67, 783–794 (2011).
Koh, E., Carmieli, R., Mor, A. & Fluhr, R. Singlet oxygen-induced membrane disruption and serpin-protease balance in vacuolar-driven cell death. Plant Physiol. 171, 1616–1625 (2016).
Fourrier, N. et al. A role for SENSITIVE TO FREEZING2 in protecting chloroplasts against freeze-induced damage in Arabidopsis. Plant J. 55, 734–745 (2008).
Liu, J. & Last, R. L. MPH1 is a thylakoid membrane protein involved in protecting photosystem II from photodamage in land plants. Plant Signal. Behav. 10, e1076602 (2015).
Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000).
Pateraki, I., Heskes, A. M. & Hamberger, B. Cytochromes P450 for terpene functionalisation and metabolic engineering. Adv. Biochem. Eng. Biotechnol. 148, 107–139 (2015).
Mackey, D., Holt, B. F., Wiig, A. & Dangl, J. L. RIN4 interacts with Pseudomonas syringae type III effector molecules and is required for RPM1-mediated resistance in Arabidopsis. Cell 108, 743–754 (2002).
Warren, R. F., Henk, A., Mowery, P., Holub, E. & Innes, R. W. A mutation within the leucine-rich repeat domain of the Arabidopsis disease resistance gene RPS5 partially suppresses multiple bacterial and downy mildew resistance genes. Plant Cell 10, 1439–1452 (1998).
Zhang, Z. B. et al. Disruption of PAMP-induced MAP kinase cascade by a Pseudomonas syringae effector activates plant immunity mediated by the NB-LRR protein SUMM2. Cell Host Microbe 11, 253–263 (2012).
Xia, E. et al. The reference genome of tea plant and resequencing of 81 diverse accessions provide insights into genome evolution and adaptation of tea plants. Mol. Plant 13, 1013–1026 (2020).
Zhang, Q. J. et al. The chromosome-level reference genome of tea tree unveils recent bursts of non-autonomous LTR retrotransposons to drive genome size evolution. Mol. Plant 13, 935–938 (2020).
Wang, X.-C., Yao, M.-Z., Ma, C.-L. & Chen, L. Analysis and evaluation of biochemical components in bitter tea plant germplasms. Chin. Agr. Sci. Bull. 24, 65–69 (2008).
Zhao, D. W., Yang, J. B., Yang, S. X., Kato, K. & Luo, J. P. Genetic diversity and domestication origin of tea plant Camellia taliensis (Theaceae) as revealed by microsatellite markers. BMC Plant Biol. 14, 14 (2014).
Chen, C. The General History of Tea Industry (Chinese Agricultural Press, Beijing, 2008).
Li, W. The evolution of Bashu tea culture and the development of Chinese tea culture. Chongqing Soc. Sci. 10, 100–104 (2009).
Meegahakumbura, M. K. et al. Indications for three independent domestication events for the tea plant (Camellia sinensis (L.) O. Kuntze) and new insights into the origin of tea germplasm in China and India revealed by nuclear microsatellites. PLoS ONE 11, e0155369 (2016).
Meegahakumbura, M. K. et al. Domestication origin and breeding history of the tea plant (Camellia sinensis) in China and India based on nuclear microsatellites and cpDNA sequence data. Front. Plant Sci. 8, 2270 (2017).
Yang, Z., Baldermann, S. & Watanabe, N. Recent studies of the volatile compounds in tea. Food Res. Int. 53, 585–599 (2013).
Owuor, P. O., Takeo, T., Horita, H., Tsushida, T. & Murai, T. Differentiation of clonal teas by terpene index. J. Sci. Food Agr. 40, 341–345 (2010).
Takeo, T. et al. One speculation the origin and dispersion of tea plant in China-One speculation based on the chemotaxonomy by using the content-ration of terpene-alcohols found in tea aroma composition. J. Tea Sci. 12, 81–86 (1992).
Takeo, T. Variation in amounts of linalol and geraniol produced in tea shoots by mechanical injury. Phytochemistry 20, 2149–2151 (1981).
Wan, X. & Xia, T. Secondary Metabolism of Tea Plant (China Science Publishing, Beijing, 2015).
Xin, X.-F., Kvitko, B. & He, S. Y. Pseudomonas syringae: what it takes to be a pathogen. Nat. Rev. Microbiol. 16, 316–328 (2018).
Song, J. Q. et al. Gene RB cloned from Solanum bulbocastanum confers broad spectrum resistance to potato late blight. Proc. Natl Acad. Sci. USA 100, 9128–9133 (2003).
Wang, L. et al. Transcriptional and physiological analyses reveal the association of ROS metabolism with cold tolerance in tea plant. Environ. Exp. Bot. 160, 45–58 (2019).
Healey, A., Furtado, A., Cooper, T. & Henry, R. J. Protocol: a simple method for extracting next-generation sequencing quality genomic DNA from recalcitrant plant species. Plant Methods 10, 21 (2014).
Jackman, S. D. et al. Tigmint: correcting assembly errors using linked reads from large molecules. BMC Bioinforma. 19, 393 (2018).
Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).
Larkin, M. A. et al. Clustal W and clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Sagulenko, P., Puller, V. & Neher, R. A. TreeTime: maximum-likelihood phylodynamic analysis. Virus Evol. 4, vex042 (2018).
DeGiorgio, M., Huber, C. D., Hubisz, M. J., Hellmann, I. & Nielsen, R. SweepFinder2: increased sensitivity, robustness and flexibility. Bioinformatics 32, 1895–1897 (2016).
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
Wang, Y. Q. et al. GSA: Genome sequence archive. Genom. Proteom. Bioinf. 15, 14–18 (2017).
Zhang, Z. et al. Database resources of the BIG data center in 2019. Nucleic Acids Res. 47, D8–D14 (2019).

Acknowledgements

This work was funded by the Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2017-TRICAAS and CAAS-ASTIP-2017-AGISCAAS), the Agricultural Science and Technology Innovation Program Cooperation and Innovation Mission (CAAS-XTCX2016), the Major Science and Technology Special Project of Variety Breeding of Zhejiang Province (2016C02053), Shenzhen Science and Technology Research Funding (JSGG20160429104101251), the National Youth Talent Support Program and the Program for the Innovative Research Team of Yunnan Province. We thank Hualin Huang (Tea Research Institute of Guangdong Agricultural Academy of Sciences), Haitao Zheng (Rizhao Tea Research Institute), and Lizhe Lv (Xinyang Tea Research Institute) for supplying tea plant samples. We thank Xiujuan Shao for analyzing the gene annotations of LJ43. We thank Assistant Professor Supriyo Basak at the Kunming Institute of Botany, CAS, and Banasthali Vidyapith for help in assessing genome size with flow cytometry.

Author information

These authors contributed equally: Xinchao Wang, Hu Feng, Yuxiao Chang, Chunlei Ma, Liyuan Wang, Xinyuan Hao.

Authors and Affiliations

Key Laboratory of Tea Biology and Resource Utilization, Ministry of Agriculture and Rural Affairs, National Center for Tea Plant Improvement, Tea Research Institute, Chinese Academy of Agricultural Sciences, 310008, Hangzhou, China
Xinchao Wang, Chunlei Ma, Liyuan Wang, Xinyuan Hao, Hao Cheng, Lu Wang, Jiqiang Jin, Kang Wei, Liang Chen & Yajun Yang
Lingnan Guangdong Laboratory of Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, 518120, Shenzhen, China
Hu Feng, Yuxiao Chang, A’lun Li, Peng Cui, Xiaobo Wang, Cheng Ai, Sheng Zhao, Zhichao Wu & Jue Ruan
Tea Research Institute, Yunnan Academy of Agricultural Sciences, 650231, Menghai, China
Youyong Li & Benying Liu
State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, 650223, Kunming, China
Guo-Dong Wang
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 650223, Kunming, China
Guo-Dong Wang

Contributions

X.W., Y.C., G.W., L.C., J.R., and Y.Y. designed the experiments and managed the project. X.W., F.H., Y.C., C.M., L.Y.W., X.H., and A.L. wrote the manuscript with input from all authors. X.W., F.H., Y.C., C.M., X.H., A.L., H.C., J.J., L.W., K.W., X.B.W., C.A., Z.W., S.Z., P.C., Y.L., B.L., G.W., L.C., J.R., and Y.Y. collected the samples, extracted genetic material, analyzed the data, and performed the experiments. X.W., Y.C., C.M., X.H., and S.Z. performed the experiments and genomic and RNA sequencing. J.R. performed the genome assembly analyses. H.F. and X.B.W. performed the gene annotation analyses. H.F., X.H., A.L., and C.A. performed transcriptomic analyses. X.W., H.F., A.L., and G.W. performed population analyses. X.W., Y.C., P.C., L.C., G.W., J.R., and Y.Y. revised the manuscript.

Corresponding authors

Correspondence to Guo-Dong Wang, Liang Chen, Jue Ruan or Yajun Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information
Peer Review File
Reporting Summary
Description of Additional Supplementary Files
Supplementary Data 1–10

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, X., Feng, H., Chang, Y. et al. Population sequencing enhances understanding of tea plant evolution. Nat Commun 11, 4447 (2020). https://doi.org/10.1038/s41467-020-18228-8

Received: 16 March 2020
Accepted: 07 August 2020
Published: 07 September 2020
DOI: https://doi.org/10.1038/s41467-020-18228-8
Springer Nature Limited
