Background

Domestication syndrome is the suite of traits that distinguishes cultivated genotypes from their wild progenitors, including changes in morphology, physiology, and phenology that make crops more amenable to cultivation. Thanks to the abundance of archaeobotanical, ecological and genetic information available for a handful of economically important seed-propagated crops, the domestication syndrome has been well documented in these species [1, 2]. However, less is known about the domestication trajectories of vegetatively propagated crops [2]. One of the main advantages of vegetative propagation is that it preserves desirable traits from one generation to the next, because the offspring are genetically identical to the parent plant [2,3,4]. Desirable traits such as disease resistance, yield, and flavor can therefore be maintained over many generations. This contrasts with sexual reproduction, where traits can be lost or diluted through genetic recombination. The type of propagation used during domestication can result in diametrically opposed domestication syndromes. For example, vegetative propagation has been shown to negatively affect the capacity for sexual reproduction via the accumulation of mutations in genes associated with flower development, self-fertilization, and seed development, which lead to the production of self-fertilized fruits, flowering asynchrony, and lower seed viability [2, 5, 6]. In contrast, crops domesticated through sexual reproduction tend to present larger seeds, synchronous flowering and pollinator-dependent fertilization [2].

Vitis vinifera is a perennial woody liana belonging to the Vitaceae family. The species is divided into two forms, based principally on their reproductive system and on whether they are cultivated or wild. Wild grapevines (V. vinifera ssp. sylvestris) are commonly dioecious plants [7] and are naturally distributed across Asia and Europe. Cultivated grapevines (V. vinifera ssp. vinifera) mainly produce hermaphrodite flowers and are broadly cultivated across the world, both for grapes to be consumed as fruit and for winemaking, grape juice or other derived products [8].

Although viticulture started in the Paleolithic age as a food source in Europe from wild accessions [9], there is evidence that the use of grapes by humans to produce wine started around the seventh millennium BC [10]. This significantly influenced the domestication of grapevines through the selection of varieties that produce a particular fruit quality and larger berries [7, 8]. It is believed that such selection relied on vegetative propagation by cuttings to enhance the preservation of phenotypes of interest [7, 8, 11], which in turn had a negative effect on the crop's tolerance to biotic and abiotic stresses. For example, populations of wild grapevines in North Africa and coastal regions of Northern Spain show better adaptation to salt stress than cultivated grapevines [12, 13], while wild accessions from Germany, Iran and Georgia show higher resistance to mildew infections [14,15,16,17]. Moreover, despite the use of vegetative reproduction to maintain a desired genotype, asexual reproduction in grapevine has resulted in novel phenotypes appearing within the same variety [5] and the same vineyard [18]. Such phenotypic variants are frequently found in vegetatively propagated crops and often make up a significant portion of the cultivated varieties. Although a genetic basis is often presumed to be the reason for the noticeable differences in traits observed, epigenetic modifications have also been proposed to play an important role [19,20,36] to our data. Firstly, demultiplexing was performed in order to ensure the structure of the adapters to identify the samples [35], and a fastq-filter was performed using Stacks v2.55 [37]. The demultiplexed sequences from the triplicates of each accession were pooled to form a unique sample. Paired-end sequences were merged using PEAR v0.9.6 [38]. Alignment and methylation calling were performed with Bismark v0.23.0 [39] using the reference genome of Vitis vinifera L. PN40024 v4.1 [40].
Sequencing depth, coverage, and methylation differences between wild and cultivated accessions were visualized using ChromoMap R v1.0.0 [41].

Global differences in DNA methylation were visualized using hierarchical clustering and principal component analysis (PCA) performed with the methylKit R package v1.16.1 [42] on the calculated percentage of methylation in all methylated cytosines present in at least four of the accessions. The percentage of total methylation was compared between cultivated and wild accessions in each context (CG, CHG and CHH, where H = A, T or C) using a T-test, after testing for normality in the data using the Kolmogorov-Smirnov test, considering differences significant when p-value < 0.01. Finally, differentially methylated cytosines (DMCs) were identified using the methylKit R package v1.16.1 [42]. Cytosines were considered differentially methylated between wild and cultivated accessions when the observed difference in methylation was more than 25% and p-value < 0.01. To reduce the effect of genetic mutations on differential methylation data, a genomic location was included in the differential methylation analysis only if it had a cytosine in a minimum of four samples per group and had been sequenced to a minimum coverage of 10X. Additionally, a second, more stringent filter was implemented by identifying all genomic locations containing a SNP using the epiDiverse-SNP pipeline (available at https://github.com/EpiDiverse/SNP). All epialleles located in genomic locations containing a SNP were then removed from the analysis, and hierarchical clustering was performed using all remaining epialleles.
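The inclusion criteria and DMC thresholds described above can be illustrated with a minimal Python sketch. This is not the methylKit implementation: the function names (`site_passes_filters`, `call_dmc`) and the data layout are hypothetical, the p-value is assumed to come from an upstream statistical test, and the "minimum of four samples per group" rule is interpreted here as at least four samples in each group reaching 10X coverage at the site.

```python
from typing import Optional, Sequence

MIN_COVERAGE = 10   # minimum read depth (10X), as stated in the text
MIN_SAMPLES = 4     # minimum number of samples per group, as stated in the text
MIN_DIFF = 25.0     # methylation difference threshold, in percentage points
MAX_PVALUE = 0.01   # significance threshold, as stated in the text


def site_passes_filters(wild_cov: Sequence[Optional[int]],
                        cult_cov: Sequence[Optional[int]]) -> bool:
    """Check that enough samples in BOTH groups cover a cytosine.

    Each sequence holds per-sample read depth at one genomic location;
    None means the sample has no cytosine call there (hypothetical layout).
    """
    def n_covered(cov: Sequence[Optional[int]]) -> int:
        return sum(1 for c in cov if c is not None and c >= MIN_COVERAGE)

    return (n_covered(wild_cov) >= MIN_SAMPLES
            and n_covered(cult_cov) >= MIN_SAMPLES)


def call_dmc(meth_wild: float, meth_cult: float,
             p_value: float) -> Optional[str]:
    """Label a site 'hyper' or 'hypo' in cultivated vs. wild, or None.

    meth_wild and meth_cult are group-level methylation percentages (0-100).
    A site is a DMC only if the difference exceeds 25 points AND p < 0.01.
    """
    diff = meth_cult - meth_wild
    if abs(diff) <= MIN_DIFF or p_value >= MAX_PVALUE:
        return None
    return "hyper" if diff > 0 else "hypo"
```

In this sketch both thresholds are strict, so a site with a difference of exactly 25 points, or a p-value of exactly 0.01, is not called as a DMC.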

To determine whether DNA methylation patterns associated with the geographic origin of wild accessions were present, we performed a comparative analysis following the premises of De Andrés et al. (2012) [43]. For this, the methylation information gathered from wild accessions was filtered for epialleles associated with single nucleotide polymorphisms as described above. Both the remaining epialleles and the SNPs identified using the epiDiverse-SNP pipeline were used for hierarchical clustering analysis.

Protein coding genes presenting at least one methylated cytosine within 1000 bp of the transcription start site were deemed methylated. The annotated genome PN40024 v4.1 was used to determine the genic location (promoter, intron, exon) of methylated cytosines identified within genes. Methylated genes were then divided into six groups based on the type of methylation observed: 1. core methylated genes, i.e., genes presenting unchanged methylated cytosines both in wild and cultivated accessions (CMCs); 2. genes presenting CMCs and hypermethylated differentially methylated cytosines (DMCs) in cultivated compared to wild accessions; 3. genes presenting CMCs and hypomethylated DMCs in cultivated compared to wild accessions; 4. genes presenting CMCs and both hypomethylated and hypermethylated DMCs; 5. genes presenting hypermethylated DMCs in cultivated compared to wild accessions; and 6. genes presenting hypomethylated DMCs in cultivated compared to wild accessions. As above, DMCs associated with a SNP were removed from the analysis using the epiDiverse-SNP pipeline. Gene Ontology (GO) analysis was implemented with the GOstats [44] and rrvgo [45] packages in R for each of these groups, using all genes sequenced (i.e., presenting at least one read overlapping with a window of 1000 bp before and after the 5’ and 3’ UTRs respectively) as the gene universe. QuickGO Browser [46] (GO version 2023-09-20) was used to generate the ancestor charts for the main GO terms in each group.
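The six-group scheme above amounts to a simple decision rule on three per-gene flags. The following Python sketch is only an illustration of that rule: the function name `classify_gene` and its boolean inputs are hypothetical, and it assumes that groups 5 and 6 contain genes without CMCs. The text does not define a group for genes carrying both hyper- and hypomethylated DMCs but no CMCs, so the sketch returns None in that case.

```python
from typing import Optional


def classify_gene(has_cmc: bool, has_hyper: bool,
                  has_hypo: bool) -> Optional[int]:
    """Return the methylation group (1-6) of a gene, or None.

    has_cmc   -- gene carries unchanged (core) methylated cytosines
    has_hyper -- gene carries DMCs hypermethylated in cultivated vs. wild
    has_hypo  -- gene carries DMCs hypomethylated in cultivated vs. wild
    """
    if has_cmc:
        if has_hyper and has_hypo:
            return 4  # CMCs plus both hyper- and hypomethylated DMCs
        if has_hyper:
            return 2  # CMCs plus hypermethylated DMCs
        if has_hypo:
            return 3  # CMCs plus hypomethylated DMCs
        return 1      # core methylated gene (CMCs only)
    if has_hyper and not has_hypo:
        return 5      # hypermethylated DMCs only (assumed: no CMCs)
    if has_hypo and not has_hyper:
        return 6      # hypomethylated DMCs only (assumed: no CMCs)
    # Unmethylated gene, or a combination not covered by the six groups.
    return None
```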

Results

Differences in global levels of DNA methylation between wild and domesticated grapevine genotypes

EpiGBS2 libraries yielded a total of 44.5 million reads with an average of 2.5 million reads per sample (ranging from 1,106,659 to 8,249,031 reads). Bisulfite conversion efficiency showed on average 90% unmethylated cytosines converted to uracils. The mean percentage of mappable reads per sample after de-multiplexing was 49%, ranging from 37 to 60%. This resulted in an overall genome coverage of 1.5% (ranging between 0.7% and 2.6%; Supplementary Table S1), with reads distributed evenly across the whole genome (see Fig. 1A for read distribution across chromosome 17 and Supplementary File 1 for read distribution across all chromosomes).

Methylation calling identified a total of 222,647 genomic locations containing methylated cytosines. The CG context presented the highest level of cytosine methylation, followed by the CHG and CHH contexts (Fig. 1B). Cultivated varieties presented consistently and significantly higher (T-test, p-value < 0.01) levels of DNA methylation than wild accessions in all sequence contexts (Fig. 1B). PCA plots built using the percentage of methylation for all sequenced cytosines as variables show that wild and cultivated accessions form two different clusters separated mainly by PC1 in all sequence contexts (Fig. 1C-E). This separation between wild and cultivated accessions is particularly evident for the CHH context (Fig. 1E and Supplementary Fig. 3).
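For readers unfamiliar with the sequence contexts compared here (CG, CHG and CHH, where H is A, T or C), the following Python sketch shows how a forward-strand cytosine can be assigned to a context from the two bases that follow it. It is an illustration only, not the Bismark implementation: the function name `cytosine_context` is hypothetical, and returning "unknown" when the context cannot be resolved (an N base or the end of the sequence) is an assumption of this sketch.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at seq[pos] as CG, CHG, CHH or unknown.

    Only the forward strand is considered; pos is a 0-based index.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1:pos + 3]  # up to two bases after the C
    if downstream[:1] == "G":
        return "CG"
    # From here on the first downstream base is not G, so it must be H.
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return "unknown"  # assumption: unresolved contexts are pooled
    return "CHG" if downstream[1] == "G" else "CHH"
```

For example, `cytosine_context("ACAGT", 1)` reads the bases "AG" after the cytosine and therefore reports the CHG context.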

Fig. 1

Analysis of differences in global levels of DNA methylation in cultivated and wild V. vinifera accessions. A Visualization of genomic and epigenomic information for chromosome 17 of Vitis vinifera using 100,000 bp windows. Vertical bars in panels (a) and (b) show the number of protein coding genes and transposable elements, respectively, per genomic window. Bars in panel (c) show average sequencing depth per genomic window (Log 10 of calculated depth for sequenced bases). Panel (d) shows the average fold change in methylation in a given window (blue and red bars indicate an average hypermethylated or hypomethylated window, respectively, in cultivated vs. wild accessions). The panel containing the chromosome number (i.e., chr17 here) shows average fold change in methylation in each window (hypomethylation, orange; hypermethylation, yellow). To visualize an interactive version of the figure containing all DMCs per window in all chromosomes see Supplementary File 1 (follow instructions available in Supplementary File 2). Panels generated using ChromoMap R [41]. B Bars show the average percentage of methylation per sequence context (CG, CHG, CHH, and unknown) in cultivated (V. vinifera ssp. vinifera (n = 10); black bars) and wild type (V. vinifera ssp. sylvestris (n = 8); white bars) accessions. Error bars indicate the calculated standard deviation. ** T-test, p-value < 0.01. C-E Multivariate analysis of percentage of methylation for all individual cytosines sequenced in cultivated and wild V. vinifera accessions. Principal component analysis plots show results of the methylation analysis in the CG (C), CHG (D), and CHH (E) contexts. Blue and red circles represent cultivated and wild accessions respectively. PCs 1 to 3 represent 53, 64, 69% of the total measured variability in CG, CHG, and CHH contexts respectively

Analysis of global methylation levels in wild and cultivated accessions at genomic feature level (i.e., intergenic and genic regions) showed that intergenic regions presented similar levels of DNA methylation to those observed genome-wide in all sequence contexts, with the exception of CHGs, which showed higher levels of DNA methylation (Supplementary Fig. 1). Conversely, genic regions showed consistently lower levels of DNA methylation in all sequence contexts than those observed genome-wide (Supplementary Fig. 2). Finally, cultivated accessions presented significantly higher levels of DNA methylation (T-test, p-value < 0.01) than wild accessions in all sequence contexts and genomic features (Supplementary Figs. 1 and 2).

Identification of differentially methylated cytosines associated with domestication

Differential methylation analysis identified a total of 9955 DMCs between wild and cultivated accessions, evenly distributed across the genome (Fig. 1A and Supplementary File 1). Of those, 7793 DMCs were hypermethylated and 2162 DMCs were hypomethylated in cultivated vines compared to wild accessions. The majority of both hyper- and hypomethylated DMCs were found in the CHH context (77 and 69% respectively) (Fig. 2A). From a gene feature context, DMCs were mainly found in intergenic regions (Fig. 2B). This is particularly evident in the CHH context, where 56 and 60% of hypermethylated and hypomethylated DMCs, respectively, were found in intergenic regions. The second most abundant genic feature presenting DMCs was introns, with percentages varying between 24 and 35% in hypermethylated DMCs, and 28 and 32% in hypomethylated DMCs, depending on the sequence context (Fig. 2B).

Fig. 2

Identification of DMCs associated with grapevine domestication. Pie charts show (A) the total number and percentage of hypermethylated (top pie chart) and hypomethylated DMCs identified in cultivated vines compared to wild accessions in each sequence context (CG, CHG and CHH); and (B) the percentage of DMCs identified per genic feature and sequence context, in cultivated compared to wild type accessions

Effect of genetic differences on epigenetic differentiation between wild and cultivated accessions

The EpiDiverse-SNP pipeline identified 57,489 SNPs in the 222,711 genomic locations containing methylated cytosines. Of the remaining 165,189 genomic locations containing methylated cytosines, 5869 DMCs were hypermethylated and 1575 DMCs were hypomethylated in cultivated vines compared to wild accessions (i.e., 25% of the original DMCs were associated with a SNP). Hierarchical clustering analysis using either all epialleles or only those not associated with SNPs showed no significant clustering differences (Supplementary Fig. 3).
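The 25% figure follows directly from the DMC counts reported before and after SNP filtering. A short Python check of that arithmetic is shown below; the variable names are illustrative only, and the counts are taken from the text.

```python
# DMC counts reported in the text.
total_dmcs = 9955         # DMCs identified before SNP filtering
hyper_after_snp = 5869    # hypermethylated DMCs left after SNP filtering
hypo_after_snp = 1575     # hypomethylated DMCs left after SNP filtering

# DMCs that survive the SNP filter, and DMCs removed by it.
remaining_dmcs = hyper_after_snp + hypo_after_snp
snp_associated_dmcs = total_dmcs - remaining_dmcs

# Share of the original DMCs located at positions containing a SNP.
percent_snp_associated = 100 * snp_associated_dmcs / total_dmcs

print(f"{remaining_dmcs} DMCs remain; {snp_associated_dmcs} "
      f"({percent_snp_associated:.1f}%) were associated with a SNP")
```

That is, 7444 DMCs remain and 2511 were removed, which is about a quarter of the original 9955.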

Analysis of (epi)genetic signals of provenance in wild type accessions

We then compared wild accessions to determine whether a genetic and/or epigenetic signal associated with the location from which they were originally collected exists. Hierarchical cluster analysis showed no clear epigenetic signal, irrespective of whether all sequenced epialleles were used or epialleles associated with a SNP were removed (Fig. 3a). However, when only genetic information was used (i.e., clustering samples using the SNPs identified by the EpiDiverse-SNP pipeline), wild accessions formed two separate clusters grouped by their provenance. One cluster contained all three accessions originally collected in the North of the Iberian Peninsula, in oceanic, continental and mountain climatic zones, while the second cluster contained all accessions collected from the South of the Iberian Peninsula (Mediterranean climatic zone) (Fig. 3b) (see Supplementary Table S1 for metadata associated with each accession).

Fig. 3

Effect of region of origin on the methylome of Iberian Vitis vinifera ssp. sylvestris. Analysis of genetic (a) and epigenetic (b) differences among wild grapevine accessions originally collected from different regions of the Iberian Peninsula and grown in a common garden. Epigenetic analysis was performed using epialleles not associated with SNPs, as identified by the epiDiverse-SNP pipeline, to remove the effect of underlying genetic variation between wild grapevine populations. Branches highlighted in red correspond to wild accessions from the South of Spain and those in blue to wild accessions from the North of Spain, placed approximately on the map of the Spanish climate zones

Analysis of domestication associated DMCs within genic features

Collectively, epiGBS2 results generated reads overlapping with a total of 7174 genes. Of those, a total of 2854 (40%) genes contained at least one methylated cytosine (Supplementary Table S2A). Methylated cytosines were mainly found in introns (66–80%), followed by exons (15–20%), and promoters (4–14%) (Fig. 4) (Supplementary Table S2B). Genes containing methylated cytosines could be further divided into six groups, in order of abundance: (1) genes presenting methylated cytosines both in wild and cultivated accessions (1883 genes) (core methylated genes (CMCs) hereafter); (2) genes presenting CMCs and hypermethylated DMCs in cultivated compared to wild accessions (564 genes); (3) genes presenting CMCs and hypomethylated DMCs in cultivated compared to wild accessions (252 genes); (4) genes presenting CMCs and both hypomethylated and hypermethylated DMCs (116 genes); (5) genes presenting hypermethylated DMCs in cultivated compared to wild accessions (28 genes); and (6) genes presenting hypomethylated DMCs in cultivated compared to wild accessions (11 genes). Functional analysis of the genes identified within each group revealed that CMCs are significantly associated with the regulation of cellular response to stress and isoprenoid/terpenoid processes. Genes hypermethylated in cultivated grapevines were mainly associated with protein targeting to peroxisomes and histone lysine demethylation, while genes hypermethylated in wild grapevines were related to ethylene regulation processes and response to ozone. The remaining group (i.e., genes presenting both hyper- and hypomethylated cytosines between the two types of accessions) presented GO terms related to defense response (Fig. 4) (see Supplementary Table S2C for a complete list of GO terms in each group).
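As a consistency check on the counts above, the six groups should add up to the total number of methylated genes, and that total should correspond to roughly 40% of the genes covered by reads. The Python sketch below verifies this; the dictionary keys are shorthand labels introduced here, not terms from the analysis pipeline.

```python
# Gene counts per methylation group, as reported in the text.
group_counts = {
    "1_core_methylated": 1883,
    "2_cmc_plus_hyper": 564,
    "3_cmc_plus_hypo": 252,
    "4_cmc_plus_both": 116,
    "5_hyper_only": 28,
    "6_hypo_only": 11,
}
genes_with_reads = 7174  # genes overlapped by at least one epiGBS2 read

# Total methylated genes and their share of all sequenced genes.
methylated_genes = sum(group_counts.values())
percent_methylated = 100 * methylated_genes / genes_with_reads

# Share of methylated genes that fall in each group.
group_share = {name: 100 * n / methylated_genes
               for name, n in group_counts.items()}

print(f"{methylated_genes} methylated genes "
      f"({percent_methylated:.1f}% of {genes_with_reads})")
```

The groups sum to 2854 genes, matching the reported total, and core methylated genes alone account for about two thirds of them.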

Fig. 4

Schematic representation of methylated gene types in wild and cultivated grapevines. Boxes within gene models show the percentage of the total methylated cytosines in each gene group found in each genic context. Arrowhead color and size indicate, respectively, the type of methylated cytosine found in each gene type (core methylated cytosines (CMCs); hypomethylated and hypermethylated cytosines in cultivated vs. wild grapevine accessions) and the abundance of that type of methylation within that gene type. Right panel shows the number of identified genes for each group and their corresponding most representative GO terms

Discussion

While significant strides have been made in understanding the genetic underpinnings of crop domestication, relatively little is known about the role of epigenetic mechanisms in this process. Epigenetics has emerged as a crucial regulator of various biological processes in both plants and animals. Recent studies have begun to hint at the potential involvement of epigenetic changes in the adaptation and phenotypic diversification of domesticated crops. However, our understanding of how these epigenetic modifications may have been harnessed, or inadvertently altered, during the domestication process is still in its infancy. Studying the contribution of epigenetic mechanisms to domestication [47,48,49] will provide novel insights into the early stages of domestication and the selective pressures faced by ancestral agriculturists. At the same time, such studies will lay the foundation for comprehensive models integrating plant adaptation to the environment through epigenetic mechanisms, facilitating their use for the development of novel cultivars more resilient to stress [33].

Epigenetic signal of domestication is independent of genetic variation

Previous studies have shown that DNA methylation variability in plants can be attributed to three main factors: genetic (sequence) differences, environmental induction, and stochasticity (see Konate et al., 2020 for a recent example) [50]. Moreover, ** the evolutionary and developmental trajectories of domesticated species, influencing the crop’s plasticity and uniformity. Nevertheless, since this study only included cultivated accessions that produce hermaphrodite flowers, further studies including dioecious cultivated accessions are required to determine whether the epigenetic differences identified here are truly associated with domestication or with the sexual strategy of the studied plants. Additionally, future studies should analyze complete methylomes and focus on the consequences of methylation changes on gene expression to gain a comprehensive understanding of the role of DNA methylation in grapevine domestication.