Abstract
Tea is an economically important plant characterized by a large genome, high heterozygosity, and high species diversity. In this study, we assemble a 3.26-Gb high-quality chromosome-scale genome for the ‘Longjing 43’ cultivar of Camellia sinensis var. sinensis. Genomic resequencing of 139 tea accessions from around the world is used to investigate the evolution and phylogenetic relationships of tea accessions. We find that hybridization has increased the heterozygosity and wide-ranging gene flow among tea populations with the spread of tea cultivation. Population genetic and transcriptomic analyses reveal that during domestication, selection for disease resistance and flavor in C. sinensis var. sinensis populations has been stronger than that in C. sinensis var. assamica populations. This study provides resources for marker-assisted breeding of tea and sets the foundation for further research on tea genetics and evolution.
Introduction
Tea [Camellia sinensis (L.) O. Kuntze, 2n = 30] is one of the most important and traditional economic crops in many developing countries in Asia, Africa, and Latin America, and is consumed as a beverage by more than two-thirds of the world’s population3). After performing five filtering steps (described in the Methods section), we identified a total of 218.87 million SNPs among the tea populations, with a density of approximately 67 SNPs per kb (Fig. 1a; Supplementary Tables 14 and 15). We anticipate that this extensive whole-profile SNP dataset will be valuable for further tea genomics research and marker-assisted breeding.
a The distribution of tea accessions assessed in the present study. The teas within the black oval had the highest nucleotide polymorphism. b Principal component analysis of the tea populations. PC1 and PC2 split the tea populations into three clusters. The Camellia sinensis var. sinensis (CSS) samples were found to cluster more tightly than the C. sinensis var. assamica (CSA) samples. c A phylogenetic tree of tea. Camellia sasanqua Thunb. was used as the outgroup, and the tea samples closest to the outgroup were C. sinensis-related species (CSR). d Structure of the tea populations. Green, blue, and yellow represent the CSS, CSA, and CSR populations, respectively. Yellow and orange are marked with arrows. Source data are provided as a Source Data file.
To further investigate the phylogenetic relationships among these accessions, we constructed a maximum likelihood-based phylogenetic tree with SNPs filtered from the total SNP dataset (see the Methods section for details), using Camellia sasanqua as an outgroup (Fig. 2c). We found that all samples were clustered into one of three independent clades (Fig. 2c; Supplementary Data 4) corresponding to the CSR, CSS, and CSA populations, which is consistent with the morphology-based classical taxonomy of CSA and CSS.
Principal component analysis (PCA) was used to investigate the relationships and differentiation among populations and consistently revealed the presence of three clusters corresponding to CSA, CSS, and CSR (Fig. 2b). The first two principal components accounted for 13.08% of the total variance, with PC1 reflecting the variability of the CSA and CSS groups and PC2 differentiating CSR plants from CSA and CSS plants. We found that CSS showed better aggregation than CSA and CSR, whereas the juncture accessions of CSA and CSS were also close to CSR in the phylogenetic tree. At a K value of 3, CSA, CSS, and CSR could be readily distinguished (Fig. 2d; Supplementary Fig. 12; Supplementary Note 4), which is consistent with the PCA results (Fig. 2b). At a K value of 3 or 4, most new accessions collected from outside China appeared to have originated from CSA and CSS (yellow color, marked with an arrow in Fig. 2d), indicating their high diversity.
On the basis of the phylogenetic and population structure results (Fig. 2c; Supplementary Data 4–6), we further investigated individual- and population-level heterozygosity among the populations (Supplementary Data 3). We accordingly found the heterozygosity of CSR (6.37E-3) to be significantly higher than that of CSA (6.29E-3) and CSS (5.69E-3) (both P values < 0.05; Supplementary Fig. 13). We also calculated linkage disequilibrium (LD) decay values based on the squared correlation coefficient (r2) of pairwise SNPs in two groups, which revealed that for the CSA and CSS groups, the average r2 among SNPs decayed to ~50% of its maximum value at ~41 and 59 kb, respectively. These values thus indicate that the tea genomes have relatively long LD distances and slow LD decay (Supplementary Fig. 14).
Selective sweeps in the two major tea populations
It is generally thought that the differences between CSS and CSA teas lie primarily in their flavor, leaf and tree types, cold tolerance, and processing suitability. Among the accessions assessed in the present study, the CSA population comprised three green tea accessions and 34 black tea accessions, whereas the CSS population contained 45 green tea accessions, 19 oolong tea accessions, and 11 black tea accessions (Fig. 3a). To determine the potential genetic bases of these differences, we used SweepFinder2 (version 1.0) to scan for selective sweep regions and selected regions with the top 1% of composite likelihood ratio (CLR) scores and the genes overlapping with the final sweep regions (≥300 bp). On the basis of this analysis, we identified a total of 1336 and 1028 genes bearing selection signatures in the CSA and CSS populations, respectively (Supplementary Data 7 and 8; Supplementary Fig. 15).
a The tea types were used for SweepFinder2 analysis of CSS (C. sinensis var. sinensis) and CSA (Camellia sinensis var. assamica). b The pathway of terpene metabolism. The selective sweep genes are indicated by stars. The arrows bisected by equal symbols indicate hidden processes. c The expression of terpene-related genes in different tea tissues. d The expression of NBS-ARC genes in different tea tissues. Source data underlying Fig. 3c, d are provided as a Source Data file.
Using the data generated from GO analysis, we selected enriched genes (P value < 0.05, FDR < 0.05) from the candidate selective sweep genes of the CSA and CSS populations (Supplementary Tables 16 and 17; Supplementary Fig. 16) and accordingly found that volatile terpene metabolism genes, such as cytochrome P450s (e.g., geraniol 8-hydroxylase) and terpene synthases, including alpha-terpineol synthase (ATESY), (−)-germacrene D synthase (TPSGD), and strictosidine synthase (STSY), were significantly selected in the CSS population but not the CSA population (Fig. 3b; Supplementary Tables 16 and 17). The functionalization of core terpene molecules requires cytochrome P450s32, among which geraniol 8-hydroxylase catalyzes the conversion of geraniol to (6E)-8-hydroxygeraniol (Fig. 3b), which may affect the accumulation level of geraniol. Alpha-terpineol, a monoterpene found in tea, is generated by the ATESY-mediated catalysis of geranyl-PP, whereas TPSGD catalyzes the conversion of farnesyl-PP to the sesquiterpene germacrene D. Strictosidine is the precursor of terpenoid indole alkaloids, and STSY is a key enzyme in the synthesis of these alkaloids (Fig. 3b). Moreover, we found that 80% of the selected terpene-related genes showed relatively high expression in buds or leaves, whereas 33% of these genes showed significantly high expression in buds or leaves (Fig. 3c; Supplementary Table 18).
Compared with the CSA accessions, the CSS accessions were characterized by the selection of a larger number of NBS-ARC (nucleotide-binding site domain in apoptotic protease-activating factor-1, R proteins and Caenorhabditis elegans death-4 protein) genes, the Arabidopsis homologs of which, including RPS3 (also known as RPM1)33, RPS534, and SUMM235, have been shown to be involved in resistance to Pseudomonas syringae (RPS) (Supplementary Tables 16 and 17). The expression profiles of these genes revealed that 69% of the NBS-ARC genes subject to selection are highly expressed in spring, autumn, or winter, whereas 24% of these genes are significantly highly expressed in spring, autumn, or winter (Fig. 3d; Supplementary Table 19). However, among the 214 genes under selection in both the CSS and CSA populations, we were unable to detect enrichment of any genes related to flavor synthesis or abiotic and biotic stress resistance in the CSA population (Supplementary Data 7 and 8).
Discussion
This study presents a chromosome-scale genome sequence of tea and resequencing data for 139 tea accessions collected from around the world. According to our analyses, these genomic resources will be valuable for future genomics research and molecular breeding of tea. The data reveal the genome-wide phylogeny of tea and the directions of divergent selection between the two main tea varieties, namely, CSS and CSA. Compared with CSA, in CSS, genes involved in flavor metabolism and cold tolerance have been subjected to stronger selection, which is consistent with the fact that tea accessions from eastern and northern China, such as green and oolong tea, have a distinct aroma and are cold tolerant. Our data also indicate that the CSR population is an ancestor of CSS and CSA. However, although these findings represent an important step in unravelling details of the origin and domestication of CSS and CSA, it remains necessary to identify the closest ancestor of tea and to examine a larger number of CSR accessions in the future. Due to the limitations of sampling in India, we cannot rule out the possibility of other evolutionary scenarios, an evaluation of which will require a more comprehensive collection of samples. Although several studies related to tea genomics have recently been published7), and the transcript reads were assembled using Cufflinks (version 2.2.1). All of the predicted gene structures were integrated using EVidenceModeler (version 1.1.1). Protein-coding genes with a coding sequence length shorter than 300 nt and with stop codons were filtered (with the exception of stop codons at the end of a sequence). We then mapped RNA-seq reads to the predicted coding regions using SOAP2 and selected the predicted gene regions based on RNA-seq data (regions with >50% coverage). The methods used for gene and functional annotation are described in detail in Supplementary Note 2. 
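The coding-sequence filter described above can be illustrated with a minimal Python sketch. The function name `keep_coding_sequence`, the standard stop-codon set, and the reading of the rule as "drop a gene if either condition holds" are assumptions of this sketch, not details stated in the text.

```python
# Hypothetical helper illustrating the CDS filter; not the authors' script.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard nuclear genetic code


def keep_coding_sequence(cds, min_length=300):
    """Return True if a coding sequence passes the length and stop-codon filter.

    A sequence is rejected when it is shorter than `min_length` nucleotides or
    when any codon other than the last one is a stop codon.
    """
    cds = cds.upper()
    if len(cds) < min_length:
        return False
    # Split into complete codons, reading in frame from the first base.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon is tolerated only at the very end of the sequence.
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

A sequence with a single terminal stop codon is retained, whereas one with a premature in-frame stop is dropped regardless of its length.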
The sequences of LJ43 and Actinidia chinensis21 proteins were analyzed using blastp with the parameters -evalue 1e-5 -num_alignments 5. Thereafter, syntenic blocks were identified using MCScanX with the parameters –e 1e-20. SCZ and YK10 were analyzed using the same pipeline and parameters. We also analyzed the genome synteny between Theobroma cacao54 and LJ43, SCZ, and YK10 (Supplementary Note 3).
Analysis of positive Darwinian selection
A species tree was constructed as described in Supplementary Note 3, without SCZ and YK10. We identified 1031 single-copy gene families. The protein sequences of single-copy genes were aligned using ClustalW255, and the protein alignments were then converted to the corresponding nucleotide alignments using an in-house Perl script. Gblocks56 was used to trim the nucleotide alignments with the t = c parameter. “Branch-site” models A and Test2 were selected to assess positive selection using codeml of the PAML package. Significant sites were discarded if the 5-bp sequences flanking the site had been removed by Gblocks. A false discovery rate (FDR) value of ≤0.05 was used to filter the results.
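The text does not state which FDR procedure was applied. Purely as an illustration, the sketch below assumes the widely used Benjamini-Hochberg adjustment; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: BH is used here only as a common choice of FDR control;
    the source does not name the procedure.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Example: keep tests whose adjusted value is at or below 0.05.
example = [0.01, 0.04, 0.03, 0.5]
kept = [p for p, q in zip(example, benjamini_hochberg(example)) if q <= 0.05]
```

Under this assumption, a gene would be retained only if its adjusted value from the likelihood ratio test does not exceed 0.05.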
SNP calling and filtering
Quality-controlled reads were mapped to the unmasked tea genome using bwa (version 0.7.15)57 with the default parameters. SAMtools (version 1.4)58 was used for sorting, and Picard (v.2.17.0) was used to remove duplicates. The HaplotypeCaller of GATK (version 3.8.0)58 was used to construct general variant calling files for the tea group (139 accessions) and outgroup (C. sasanqua, CM-1) by invoking -ERC:GVCF. gVCF files in the tea group were combined using GenotypeGVCFs in GATK to form a single-variant calling file, whereas the gVCF file for the outgroup was called using the option ‘–allSites’ to include all sites. The final single-variant calling file was merged using BCFtools (version 1.6), with only the consistent positions retained in both groups. To obtain high-quality SNPs, we initially used the GATK hard filter to filter the merged VCF data with the options (QD ≥ 2.0 && FS ≤ 60.0 && MQ ≥ 40.0 && MQRankSum ≥ −12.5 && ReadPosRankSum ≥ −8.0). Thereafter, we performed strict filtering of the SNP calls based on the following criteria: (1) sites were located at a distance of at least 5 bp from a predicted insertion/deletion; (2) the consensus quality was ≥40; (3) the sites were not triallelic and did not contain InDels; (4) the depth fell between the 2.5th and 97.5th percentiles of the depth distribution; and (5) SNPs had minor allele frequencies (MAFs) ≥ 0.01.
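The hard-filter expression and the MAF criterion above can be read as per-site predicates. The following Python sketch uses the annotation names and thresholds quoted in the text; the dictionary layout, the function names, the 0/1/2 genotype coding, and the choice to let a missing annotation pass are assumptions of the sketch and do not describe GATK's own behavior.

```python
def passes_hard_filter(info):
    """Check a site's annotations against the thresholds quoted in the text.

    `info` is a hypothetical dict such as {"QD": 12.3, "FS": 1.2, ...}.
    """
    checks = [
        ("QD", lambda v: v >= 2.0),
        ("FS", lambda v: v <= 60.0),
        ("MQ", lambda v: v >= 40.0),
        ("MQRankSum", lambda v: v >= -12.5),
        ("ReadPosRankSum", lambda v: v >= -8.0),
    ]
    for key, is_ok in checks:
        value = info.get(key)
        if value is None:
            # Assumption: an absent annotation does not fail the site.
            continue
        if not is_ok(value):
            return False
    return True


def minor_allele_frequency(genotypes):
    """MAF of a biallelic site from alt-allele counts (0, 1, 2; None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(genotypes, min_maf=0.01):
    """Criterion (5): keep sites whose MAF is at least `min_maf`."""
    return minor_allele_frequency(genotypes) >= min_maf
```

Expressing each criterion as a separate predicate makes it straightforward to report how many sites are removed at each step.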
Population genetic analyses
We selected high-quality SNPs with a maximum of 20% missing data, and to eliminate the potential effects of physical linkage among variants, the sites were thinned such that no two sites were within the same 2000-bp region. Phylogenetic analysis was conducted with the final SNP set using IQ-TREE (version 1.6.9)59,60,61. A maximum likelihood (ML)-based phylogenetic tree was constructed using the GTR + F + R5 model, with 1000 rapid bootstrap replicates conducted to determine branch confidence values. The best-fitting model was estimated using ModelFinder implemented in IQ-TREE after evaluating 286 DNA models. GTR + F + R5 was selected based on the Bayesian information criterion. The ML phylogenetic tree was constructed based on intergene region SNPs using the final SNP set and 4DTV SNPs. Principal component analysis (PCA) of the final SNP set was performed using PLINK (version 1.90), with the principal components plotted against one another using R 3.4 to visualize patterns of genetic variation. We also used the final SNP set for population structure analysis using ADMIXTURE (version 1.3)62, which was run with K values (the number of assumed ancestral components) ranging from 1 to 10.
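The thinning step described above can be sketched as a greedy pass over sorted sites. The tool used for thinning is not named in the text, so the function `thin_sites`, the input layout, and the treatment of sites exactly 2000 bp apart as acceptable are assumptions of this sketch.

```python
def thin_sites(sites, min_distance=2000):
    """Greedily keep sites so that kept sites on a chromosome are spaced apart.

    `sites` is an iterable of (chromosome, position) tuples. A site is kept
    when it lies at least `min_distance` bp after the previously kept site on
    the same chromosome (assumption: exactly `min_distance` is allowed).
    """
    kept = []
    last_kept = {}  # chromosome -> position of the most recently kept site
    for chrom, pos in sorted(sites):
        previous = last_kept.get(chrom)
        if previous is None or pos - previous >= min_distance:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept
```

For example, from sites at 100, 1500, 2100, and 4200 bp on one chromosome, this pass would keep 100, 2100, and 4200 and drop 1500.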
Population heterozygosity at a given locus was computed as the fraction of heterozygous individuals among all individuals in a given population. The average heterozygosity was then calculated for each 40-kb sliding window, with a step size of 20 kb. Individual heterozygosity was computed as the fraction of loci that were heterozygous in an individual. Average heterozygosity was also calculated using the same method. Windows with an average depth <1 were filtered out.
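The two heterozygosity definitions and the sliding-window averaging can be written compactly. In the Python sketch below, genotypes are coded as alt-allele counts (0, 1, 2) with `None` for missing calls; this coding, the exclusion of missing calls from the denominator, the 0-based half-open windows, and the function names are assumptions of the sketch.

```python
def heterozygous_fraction(calls):
    """Fraction of heterozygous calls (coded 1) among non-missing calls.

    Applied across individuals at one locus, this gives population
    heterozygosity; applied across loci within one individual, it gives
    individual heterozygosity.
    """
    called = [g for g in calls if g is not None]
    if not called:
        return None
    return sum(1 for g in called if g == 1) / len(called)


def windowed_mean(positions, values, window=40_000, step=20_000):
    """Average per-locus values in sliding windows.

    Returns a list of (start, end, mean) tuples for windows [start, end)
    that contain at least one non-missing value.
    """
    results = []
    if not positions:
        return results
    last_position = max(positions)
    start = 0
    while start <= last_position:
        end = start + window
        inside = [v for p, v in zip(positions, values)
                  if start <= p < end and v is not None]
        if inside:
            results.append((start, end, sum(inside) / len(inside)))
        start += step
    return results
```

With the default arguments, consecutive windows overlap by half, matching the 40-kb window and 20-kb step given in the text.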
To eliminate the influence of differences in sample number, eight samples of the CSR/CSA/CSS populations were randomly selected to calculate nucleotide diversity. To reduce the sampling error, we performed 20 repeat calculations for each population using VCFtools (version 0.1.16) with a window size of 50 kb and a step size of 10 kb. The data for each population are presented as boxplots created using R.
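The repeated subsampling scheme can be sketched as follows. This is not a reimplementation of VCFtools: per-site diversity is computed here as the average pairwise difference, 2k(n − k)/(n(n − 1)) for k alternate alleles among n chromosomes, and the function names, input layout, complete-genotype requirement, and fixed random seed are assumptions of the sketch.

```python
import random


def site_pi(alt_count, n_chrom):
    """Average pairwise difference at one biallelic site."""
    if n_chrom < 2:
        return 0.0
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def subsampled_pi(genotypes, region_length, sample_size=8, repeats=20, seed=0):
    """Estimate nucleotide diversity from repeated random subsamples.

    `genotypes` maps sample name -> list of alt-allele counts (0, 1, 2),
    one per site, with no missing data (assumption). Returns one estimate
    per repeat, each being the summed per-site diversity divided by the
    region length.
    """
    rng = random.Random(seed)
    names = sorted(genotypes)
    n_sites = len(genotypes[names[0]])
    estimates = []
    for _ in range(repeats):
        chosen = rng.sample(names, sample_size)
        total = 0.0
        for i in range(n_sites):
            alt = sum(genotypes[name][i] for name in chosen)
            total += site_pi(alt, 2 * sample_size)
        estimates.append(total / region_length)
    return estimates
```

Drawing the same number of samples from each population keeps the estimates comparable, and the spread across repeats is what a boxplot would display.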
Selective sweep analysis
TreeTime 0.5.363 was used to infer the ancestral state based on ML using the generated evolutionary tree. Sites lacking a reconstructed ancestral state in a population were folded in the SweepFinder2 analysis. We excluded sites that were neither polymorphic nor substitutions, as recommended by the SweepFinder2 manual64. To reduce the likelihood of false positives, the chromosome-wide frequency spectrum was calculated as the background for each chromosome and population. SweepFinder2 was run with a grid size of 100. The CLR scores from the SweepFinder2 results were extracted and merged into sweep regions when the neighboring score(s) exceeded a certain threshold, which was set as the top 1% of CLR scores. To obtain regions with greater continuity, we merged regions into a single region with a certain size threshold between regions, with the threshold being set to 50% of the size in the adjacent sweep regions. The final score for each sweep region was the sum of the CLR scores of the sites in the sweep region. The final sweep regions were filtered based on a minimum size of 300 bp. Genes overlapping within the sweep regions were extracted as candidate selective sweep genes. The GO-enriched (P value < 0.05, FDR < 0.05) candidate selective sweep genes were chosen, and Fst, θπ and Tajima’s D values were calculated using VCFtools with a window size of 50,000 bp and a step size of 10,000 bp.
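The post-processing of CLR scores into sweep regions can be sketched in Python. The wording of the merging rule leaves room for interpretation, so this sketch assumes that two neighboring regions are joined when the gap between them is at most 50% of their combined length; that reading, the function names, and the input layout are assumptions and not the authors' code.

```python
def top_fraction_threshold(scores, fraction=0.01):
    """Smallest score that still belongs to the top `fraction` of scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def call_sweep_regions(points, threshold, gap_fraction=0.5, min_size=300):
    """Turn per-grid-point CLR scores into scored sweep regions.

    `points` is a list of (position, clr) sorted by position for one
    chromosome. Returns (start, end, summed_clr) tuples.
    """
    # Step 1: runs of consecutive grid points at or above the threshold.
    runs = []
    current = None
    for pos, clr in points:
        if clr >= threshold:
            if current is None:
                current = [pos, pos, clr]
            else:
                current[1] = pos
                current[2] += clr
        elif current is not None:
            runs.append(current)
            current = None
    if current is not None:
        runs.append(current)

    # Step 2: join neighboring runs separated by a relatively small gap
    # (assumed rule: gap <= gap_fraction * combined length of both runs).
    merged = []
    for run in runs:
        if merged:
            prev = merged[-1]
            gap = run[0] - prev[1]
            combined = (prev[1] - prev[0]) + (run[1] - run[0])
            if gap <= gap_fraction * combined:
                prev[1] = run[1]
                prev[2] += run[2]
                continue
        merged.append(list(run))

    # Step 3: discard regions shorter than the minimum size.
    return [tuple(r) for r in merged if r[1] - r[0] >= min_size]
```

In this sketch an isolated grid point above the threshold forms a zero-length region and is therefore removed by the 300-bp size filter.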
Gene expression
Transcript-level expression was calculated using HISAT2, StringTie, and Ballgown with the default parameters65. The genes identified among the selection results were selected for expression analysis, and an expression heatmap was plotted using the heatmap package in R 3.4. The average expression of selected genes shown in Fig. 3d was calculated according to season, whereas the average expression of selected genes shown in Fig. 3c was calculated according to tissue. Student’s t-test was used to identify the significantly differentially expressed genes (P value < 0.05).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The RNA-seq, 10× Genomics, Hi–C, Illumina short reads, and PacBio raw data for the ‘Longjing 43’ cultivar of Camellia sinensis var. sinensis have been deposited in the European Bioinformatics Institute with the accession code PRJEB39502. The raw resequencing data have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive database with the accession code PRJNA646044. All raw sequence data are also available in the Genome Sequence Archive66 in the BIG Data Center67, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, under accession number PRJCA001158. The assembly and annotation of the ‘Longjing 43’ genome are available in the BIG database [https://bigd.big.ac.cn/search/?dbId=gwh&q=GWHACFB00000000]. Source data are provided with this paper.
References
Xia, E. H. et al. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Mol. Plant 10, 866–877 (2017).
Wei, K. et al. A coupled role for CsMYB75 and CsGSTF1 in anthocyanin hyperaccumulation in purple tea. Plant J. 97, 825–840 (2019).
Lu, H. et al. Earliest tea as evidence for one branch of the Silk Road across the Tibetan Plateau. Sci. Rep. 6, 18955 (2016).
Wu, J. Review on ‘Cha Ching’ (Agriculture Press, Beijing, 1987).
Harbowy, M. E. & Balentine, D. A. Tea chemistry. Crit. Rev. Plant Sci. 16, 415–480 (1997).
Hara, Y., Luo, S. J., Wickremasinghe, R. L. & Yamanishi, T. Special issue on tea. Food Rev. Int. 11, 371–546 (1997).
Liang, Y. & Shi, M. Advances in tea plant genetics and breeding. J. Tea Sci. 35, 103–109 (2015).
Chen, L., Yu, F.-L. & Tong, Q.-Q. Discussions on phylogenetic classification and evolution of Sect. Thea. J. Tea Sci. 20, 89–94 (2000).
Yang, J.-B., Yang, J., Li, H.-T., Zhao, Y. & Yang, S.-X. Isolation and characterization of 15 microsatellite markers from wild tea plant (Camellia taliensis) using FIASCO method. Conserv. Genet. 10, 1621–1623 (2009).
Raina, S. N. et al. Genetic structure and diversity of India hybrid tea. Genet. Resour. Crop Ev. 59, 1527–1541 (2012).
Zhang, W., Rong, J., Wei, C., Gao, L. & Chen, J. Domestication origin and spread of cultivated tea plants. Biodivers. Sci. 26, 357–372 (2018).
Huang, H., Shi, C., Liu, Y., Mao, S. Y. & Gao, L. Z. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol. Biol. 14, 151 (2014).
Li, M.-M., Meegahakumbura, M. K., Yan, L.-J., Liu, J. & Gao, L.-M. Genetic involvement of Camellia taliensis in the domestication of C.sinensis var. assamica (Assimica Tea) revealed by nuclear microsatellite markers. Plant Divers. Resour. 37, 29–37 (2015).
Yao, M. Z., Ma, C. L., Qiao, T. T., Jin, J. Q. & Chen, L. Diversity distribution and population structure of tea germplasms in China revealed by EST-SSR markers. Tree Genet. Genome 8, 205–220 (2012).
Chen et al. Discrimination of wild tea germplasm resources (Camellia sp.) using RAPD markers. Agr. Sci. China 1, 1105–1110 (2002).
Wei, C. L. et al. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proc. Natl Acad. Sci. USA 115, E4151–E4158 (2018).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Yang, Y. & Liang, Y. Clonal Tea Plant Cultivar Records of China (Shanghai Scientific & Technical Publishers, Shanghai, 2014).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
Ma, J. Q. et al. Construction of a SSR-based genetic map and identification of QTLs for catechins content in tea plant (Camellia sinensis). PLoS ONE 9, e93131 (2014).
Huang, S. et al. Draft genome of the kiwifruit Actinidia chinensis. Nat. Commun. 4, 2640 (2013).
Salojarvi, J. et al. Genome sequencing and population genomic analyses provide insights into the adaptive landscape of silver birch. Nat. Genet. 49, 904–912 (2017).
Teh, B. T. et al. The draft genome of tropical fruit durian (Durio zibethinus). Nat. Genet. 49, 1633–1641 (2017).
Sun, S. L. et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat. Genet. 50, 1289–1295 (2018).
Ou, S. J., Chen, J. F. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).
Gaut, B. S., Morton, B. R., McCaig, B. C. & Clegg, M. T. Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl Acad. Sci. USA 93, 10274–10279 (1996).
García-Andrade, J., Ramírez, V., Flors, V. & Vera, P. Arabidopsis ocp3 mutant reveals a mechanism linking ABA and JA to pathogen-induced callose deposition. Plant J. 67, 783–794 (2011).
Koh, E., Carmieli, R., Mor, A. & Fluhr, R. Singlet oxygen-induced membrane disruption and serpin-protease balance in vacuolar-driven cell death. Plant Physiol. 171, 1616–1625 (2016).
Fourrier, N. et al. A role for SENSITIVE TO FREEZING2 in protecting chloroplasts against freeze-induced damage in Arabidopsis. Plant J. 55, 734–745 (2008).
Liu, J. & Last, R. L. MPH1 is a thylakoid membrane protein involved in protecting photosystem II from photodamage in land plants. Plant Signal. Behav. 10, e1076602 (2015).
Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000).
Pateraki, I., Heskes, A. M. & Hamberger, B. Cytochromes P450 for terpene functionalisation and metabolic engineering. Adv. Biochem. Eng. Biotechnol. 148, 107–139 (2015).
Mackey, D., Holt, B. F., Wiig, A. & Dangl, J. L. RIN4 interacts with Pseudomonas syringae type III effector molecules and is required for RPM1-mediated resistance in Arabidopsis. Cell 108, 743–754 (2002).
Warren, R. F., Henk, A., Mowery, P., Holub, E. & Innes, R. W. A mutation within the leucine-rich repeat domain of the Arabidopsis disease resistance gene RPS5 partially suppresses multiple bacterial and downy mildew resistance genes. Plant Cell 10, 1439–1452 (1998).
Zhang, Z. B. et al. Disruption of PAMP-induced MAP kinase cascade by a Pseudomonas syringae effector activates plant immunity mediated by the NB-LRR protein SUMM2. Cell Host Microbe 11, 253–263 (2012).
Xia, E. et al. The reference genome of tea plant and resequencing of 81 diverse accessions provide insights into genome evolution and adaptation of tea plants. Mol. Plant 13, 1013–1026 (2020).
Zhang, Q. J. et al. The chromosome-level reference genome of tea tree unveils recent bursts of non-autonomous LTR retrotransposons to drive genome size evolution. Mol. Plant 13, 935–938 (2020).
Wang, X.-C., Yao, M.-Z., Ma, C.-L. & Chen, L. Analysis and evaluation of biochemical components in bitter tea plant germplasms. Chin. Agr. Sci. Bull. 24, 65–69 (2008).
Zhao, D. W., Yang, J. B., Yang, S. X., Kato, K. & Luo, J. P. Genetic diversity and domestication origin of tea plant Camellia taliensis (Theaceae) as revealed by microsatellite markers. BMC Plant Biol. 14, 14 (2014).
Chen, C. The General History of Tea Industry (Chinese Agricultural Press, Beijing, 2008).
Li, W. The evolution of Bashu tea culture and the development of Chinese tea culture. Chongqing Soc. Sci. 10, 100–104 (2009).
Meegahakumbura, M. K. et al. Indications for three independent domestication events for the tea plant (Camellia sinensis (L.) O. Kuntze) and new insights into the origin of tea germplasm in China and India revealed by nuclear microsatellites. PLoS ONE 11, e0155369 (2016).
Meegahakumbura, M. K. et al. Domestication origin and breeding history of the tea plant (Camellia sinensis) in China and India based on nuclear microsatellites and cpDNA sequence data. Front. Plant Sci. 8, 2270 (2017).
Yang, Z., Baldermann, S. & Watanabe, N. Recent studies of the volatile compounds in tea. Food Res. Int. 53, 585–599 (2013).
Owuor, P. O., Takeo, T., Horita, H., Tsushida, T. & Murai, T. Differentiation of clonal teas by terpene index. J. Sci. Food Agr. 40, 341–345 (2010).
Takeo, T. et al. One speculation the origin and dispersion of tea plant in China-One speculation based on the chemotaxonomy by using the content-ration of terpene-alcohols found in tea aroma composition. J. Tea Sci. 12, 81–86 (1992).
Takeo, T. Variation in amounts of linalol and geraniol produced in tea shoots by mechanical injury. Phytochemistry 20, 2149–2151 (1981).
Wan, X. & Xia, T. Secondary Metabolism of Tea Plant (China Science Publishing, Beijing, 2015).
Xin, X.-F., Kvitko, B. & He, S. Y. Pseudomonas syringae: what it takes to be a pathogen. Nat. Rev. Microbiol. 16, 316–328 (2018).
Song, J. Q. et al. Gene RB cloned from Solanum bulbocastanum confers broad spectrum resistance to potato late blight. Proc. Natl Acad. Sci. USA 100, 9128–9133 (2003).
Wang, L. et al. Transcriptional and physiological analyses reveal the association of ROS metabolism with cold tolerance in tea plant. Environ. Exp. Bot. 160, 45–58 (2019).
Healey, A., Furtado, A., Cooper, T. & Henry, R. J. Protocol: a simple method for extracting next-generation sequencing quality genomic DNA from recalcitrant plant species. Plant Methods 10, 21 (2014).
Jackman, S. D. et al. Tigmint: correcting assembly errors using linked reads from large molecules. BMC Bioinforma. 19, 393 (2018).
Argout, X. et al. The genome of Theobroma cacao. Nat. Genet. 43, 101–108 (2011).
Larkin, M. A. et al. Clustal W and clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007).
Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Sagulenko, P., Puller, V. & Neher, R. A. TreeTime: maximum-likelihood phylodynamic analysis. Virus Evol. 4, vex042 (2018).
DeGiorgio, M., Huber, C. D., Hubisz, M. J., Hellmann, I. & Nielsen, R. SweepFinder2: increased sensitivity, robustness and flexibility. Bioinformatics 32, 1895–1897 (2016).
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
Wang, Y. Q. et al. GSA: Genome sequence archive. Genom. Proteom. Bioinf. 15, 14–18 (2017).
Zhang, Z. et al. Database resources of the BIG data center in 2019. Nucleic Acids Res. 47, D8–D14 (2019).
Acknowledgements
This work was funded by the Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2017-TRICAAS and CAAS-ASTIP-2017-AGISCAAS), the Agricultural Science and Technology Innovation Program Cooperation and Innovation Mission (CAAS-XTCX2016), the Major Science and Technology Special Project of Variety Breeding of Zhejiang Province (2016C02053), Shenzhen Science and Technology Research Funding (JSGG20160429104101251), the National Youth Talent Support Program and the Program for the Innovative Research Team of Yunnan Province. We thank Hualin Huang (Tea Research Institute of Guangdong Agricultural Academy of Sciences), Haitao Zheng (Rizhao Tea Research Institute), and Lizhe Lv (Xinyang Tea Research Institute) for supplying tea plant samples. We thank Xiujuan Shao for analyzing the gene annotations of LJ43. We thank Assistant Professor Supriyo Basak at the Kunming Institute of Botany, CAS, and Banasthali Vidyapith for help in assessing genome size with flow cytometry.
Contributions
X.W., Y.C., G.W., L.C., J.R., and Y.Y. designed the experiments and managed the project. X.W., F.H., Y.C., C.M., L.Y.W., X.H., and A.L. wrote the manuscript with input from all authors. X.W., F.H., Y.C., C.M., X.H., A.L., H.C., J.J., L.W., K.W., X.B.W., C.A., Z.W., S.Z., P.C., Y.L., B.L., G.W., L.C., J.R., and Y.Y. collected the samples, extracted genetic material, analyzed the data, and performed the experiments. X.W., Y.C., C.M., X.H., and S.Z. performed the experiments and genomic and RNA sequencing. J.R. performed the genome assembly analyses. H.F. and X.B.W. performed the gene annotation analyses. H.F., X.H., A.L., and C.A. performed transcriptomic analyses. X.W., H.F., A.L., and G.W. performed population analyses. X.W., Y.C., P.C., L.C., G.W., J.R., and Y.Y. revised the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, X., Feng, H., Chang, Y. et al. Population sequencing enhances understanding of tea plant evolution. Nat Commun 11, 4447 (2020). https://doi.org/10.1038/s41467-020-18228-8