Background

Genomic regions, when bound by regulatory proteins, are thought to be depleted of nucleosomes or have undergone dynamic nucleosome modifications or displacement [1, 2]. These regions, also known as “open chromatin,” are highly enriched with active cis-regulatory DNA elements (CREs) in eukaryotic genomes [3, 4]. Open chromatin shows a pronounced sensitivity to cleavage by various nucleases, including deoxyribonuclease I (DNase I) [5, 6] and transposase Tn5 [7,8,9], whereas chromatin with DNA tightly bound by nucleosomes is drastically less sensitive to these nucleases. These genomic regions are known as DNase I hypersensitive sites (DHSs) and can be identified by DNase-seq [3, 10, 11] or ATAC-seq [8, 9]. DHSs have been shown to be frequently associated with most common types of active CREs, including promoters and enhancers [12, 13]. Strikingly, reporter assays indicated that 70–80% of the DHSs located in intergenic regions of Arabidopsis thaliana and Zea mays genomes show enhancer function [14, 15].

Histone modification plays an important role in epigenetic regulation of genes in response to environmental and developmental cues [16,17,18]. In plants, H3K4me3 is a euchromatin mark mainly distributed at 5′ ends of actively expressed genes [19,20,21]. These genes are generally associated with low levels of tissue specificity [19]. In contrast, H3K27me3 is associated with one of the major gene silencing systems in plants and is enriched across the transcribed regions of genes that are involved in many developmental and other processes [22]. Genes marked by H3K27me3 are usually transcriptionally inactive and display tissue specificity [22]. Genome-wide distributions of these histone modifications and their association with gene expression have been well-documented in a number of plant species [23, 24]. However, investigation of histone modification dynamics under abiotic stresses has been focused on individual stress-responsive genes. For instance, H3K4me3 has been found to be enriched at 5′ ends of the dehydration-induced genes in both Arabidopsis [21, 54], H3K4me3 was most enriched at the 5′ end of active genes in RT tubers (Fig. 5, Fig. 6d) and leaves (Fig. 5). Strikingly, in cold tubers, H3K4me3 became more widely distributed toward the center of gene body regions of all active genes (Figs. 5 and 6e). Interestingly, we observed the deposition of H3K4me3 within constitutively silenced genes in cold tubers (Fig. 6e). Comparison of the H3K4me3 signals in RT versus cold tubers revealed increased levels of H3K4me3 in the gene body regions of both active and constitutively silenced genes (Wilcoxon rank sum test, p < 2.2e−16), as well as in the 5′ and 3′ regions of constitutively silenced genes in cold tubers (Wilcoxon rank sum test, p = 1.2e−08) (Fig. 6f).

We conducted ChIP-seq for one additional histone modification mark H4K5,8,12,16ac (Additional file 2: Table S5), a mark generally associated with transcription [55]. The distribution of H4K5,8,12,16ac along active genes was similar to patterns of H3K4me3 in RT tubers (Additional file 1: Figure S10a) and was enriched at both the 5′ end and gene body regions in cold tubers (Additional file 1: Figure S10b). However, unlike H3K4me3, we did not observe the deposition of this mark in constitutively silenced genes upon cold treatment (Additional file 1: Figure S10c).

Bivalent H3K4me3-H3K27me3 mark associated with active genes in cold tubers

Coexistence of H3K4me3 and H3K27me3 on the same nucleosomes, known as bivalent histone marks, is well known to be associated with the promoters of poised genes responsible for differentiation and development in mammalian stem cells [7a, b). In addition, we found that H3K4me1 was exclusively enriched in the bivalent mark-associated genes in cold tubers, which resembles the association of the same bivalent mark with H3K4me1 reported in mammalian species [57], suggesting that the cold-induced bivalent H3K4me3-H3K27me3 mark may play a specific role in responses to cold stress. It will be interesting to examine if this bivalent mark is associated with plant response to other abiotic stresses.

In animals, most genes associated with H3K4me3-H3K27me3 bivalent domains were partially repressed [72]. Briefly, for nuclei isolation, 10 g of finely ground powder was suspended in an equal volume of pre-chilled nuclear isolation buffer (NIB; 10 mM Tris-HCl, 80 mM KCl, 10 mM EDTA, 1 mM spermidine, 1 mM spermine, 0.15% mercaptoethanol, 0.5 M sucrose, pH 9.5). The prepared nuclei pellet was suspended in nuclear digestion buffer (NDB; 10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, pH 7.4) for DNase I digestion. The digestion was conducted with gradient concentrations (0.04–4 units) of DNase I for 10 min at 37 °C. Digestion patterns were visualized and assessed using a Pulsed-Field Gel Electrophoresis (PFGE) system (Bio-Rad, Cat.# 170-3615) with a program of 20–60 s switch time for 17.5 h at 6 V/cm. Electrophoresis was performed in a cold room (10 °C). After DNase I digestion, high molecular weight (HMW) DNA was isolated and blunt-ended with T4 DNA polymerase (NEB, Cat. #0203 L). HMW DNA was then ligated with adapter I (5′-Biotin-ACAGGTTCAGAGTTCTACAGTCCGAC-3′ and 5′ P-GTCGGACTGTAGAACTCTGAAC-3′) and digested with MmeI. The MmeI-digested ends were ligated with adapter II (5′-P-TCGTATGCCGTCTTCTGCTTG-3′ and 5′-CAAGCAGAAGACGGCATACGANN-3′). The adapter-ligated DNA was enriched using PCR with linker-specific primers (5′-CAAGCAGAAGACGGCATACGA-3′ and 5′-AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA-3′). DNA fragments of ~90 bp were purified by PAGE and submitted for sequencing on an Illumina HiSeq platform in single-end mode with 50-nucleotide reads.

Each replicate sample used for DNase-seq was also prepared for RNA-seq. High-quality RNA was extracted using an RNeasy Plant Mini Kit (Qiagen, Cat. # 74904), followed by DNase I treatment to remove genomic DNA. About 5 μg of total RNA was converted to cDNA using the TruSeq mRNA-seq kit from Illumina, and multiplexed cDNA libraries were sequenced on an Illumina HiSeq platform in single-end mode, generating 50-nucleotide reads.

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) was performed following published protocols [73], using the same samples that were used for DNase-seq. Antibodies against H3K27me3 (Millipore 07-449), H3K4me3 (Abcam 8580), H3K4me1 (Abcam 8895), and H4K5,8,12,16ac (Millipore 06-598) were used in ChIP experiments. ChIP-seq libraries for Illumina sequencing were constructed according to the protocol “preparing samples for ChIP sequencing of DNA” provided by Illumina. Briefly, extracted nuclei were digested to mononucleosomes (~150 bp fragments) using MNase (Sigma N3755). Target chromatin fragments were captured using the corresponding antibodies and precipitated with rProtein A sepharose (GE 17-1279-01). ChIPed DNA was extracted from the precipitated chromatin for ChIP-seq library preparation. ChIPed DNA was end-repaired using an End-It DNA end repair kit (Epicentre, ER0720). A “dA” base was then added to the 3′ ends of the end-repaired DNA fragments using Klenow fragment (NEB, M0212S), followed by ligation of Illumina paired-end sequencing adapters using a quick ligase (NEB M2200). Adapter-ligated DNA fragments were purified on a 2% agarose gel in TAE buffer and size-selected from 150 to 300 bp. Purified adapter-ligated ChIPed DNA was enriched by 13 PCR cycles and purified on a 2% gel to isolate DNA fragments in the range of 200–300 bp. Purified ChIP-seq libraries were sequenced on an Illumina HiSeq platform in either single-end or paired-end mode with 100- or 150-nucleotide reads.

Sequential ChIP-seq

Sequential ChIP-seq was conducted using a Re-ChIP-IT kit (Active Motif, Cat # 53016) following the manufacturer’s instructions. The material that was used for DNase-seq was also used for sequential ChIP-seq. Potato tubers were sliced and then cross-linked with 1% formaldehyde for 10 min by vacuum infiltration. Cross-linked tissue was quickly quenched in 0.125 M glycine and then washed three times with ddH2O. After grinding, nuclei were isolated in nuclei extraction buffer (10 mM Tris-HCl pH 8.0, 0.25 M sucrose, 10 mM MgCl2, 1% Triton X-100, and protease inhibitors) and pelleted by centrifugation. Nuclei were re-suspended in buffer (50 mM Tris-HCl, pH 8.0, 10 mM EDTA, and 1% SDS), and chromatin was fragmented to 200–500 bp with an average size of ~300 bp by sonication (Qsonica Q700) for 2 min on ice, with pulses of 15 s on and 60 s off. Following the published strategy from Arabidopsis [32], chromatin from each sample was immunoprecipitated with anti-H3K4me3 followed by anti-H3K27me3 and, in the reverse order, with anti-H3K27me3 followed by anti-H3K4me3. In addition, as a control, chromatin from each sample was immunoprecipitated with the first antibody followed by no antibody, to exclude the possibility that the final enrichment was due to carry-over from the first antibody. Immunoprecipitated DNA was de-cross-linked and extracted for Illumina sequencing library construction. Libraries were constructed in the same way as regular ChIP-seq libraries. Sequential ChIP-seq libraries were sequenced on an Illumina HiSeq platform in paired-end mode with 150-nucleotide reads.

Reverse transcription and quantitative real time-PCR

Reverse transcription was performed using the Invitrogen SuperScript™ III Reverse Transcriptase kit (Invitrogen, Cat # 18080044) with oligo(dT)20 primer. Transcript levels of individual genes were measured using gene-specific primers with SYBR Advantage qPCR Premix (Takara, Cat # 639676). Quantitative real-time PCR (qRT-PCR) was conducted on a real-time PCR cycler (CFX Connect, Bio-Rad) with an initial denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 15 s, 56 °C for 20 s, and 72 °C for 15 s. Three biological replicates from each treatment were used to quantify the relative expression of each gene. The expression of individual genes was normalized to the reference gene EF1α using the 2^(−ΔΔCt) calculation. Statistical significance was evaluated using a t test. The specific primers used for potato genes are shown in Additional file 2: Table S8.
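The 2^(−ΔΔCt) normalization described above can be illustrated with a minimal sketch. The function name, the argument names, and the choice to average Ct values across replicates before taking differences are assumptions made for illustration; the source does not specify how replicates were combined.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) calculation.

    Each argument is a list of Ct values, one per biological replicate.
    The reference gene would be EF1a in the setting described in the text.
    Averaging Ct values before subtraction is an illustrative assumption.
    """
    # dCt = Ct(target) - Ct(reference), within each condition
    delta_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCt = dCt(treated) - dCt(control)
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values, not taken from the study
fold = relative_expression([22.0, 22.0, 22.0], [18.0, 18.0, 18.0],
                           [24.0, 24.0, 24.0], [18.0, 18.0, 18.0])
```

With these made-up values, the target Ct drops by two cycles relative to the reference, which corresponds to a fourfold increase in relative expression.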

Data analysis

The raw reads generated from DNase-seq, RNA-seq, ChIP-seq, and sequential ChIP-seq were checked for quality using the FastQC program (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were cleaned using Cutadapt v1.9.1 [74] with a minimum base quality of 20. Cleaned DNase-seq, ChIP-seq, and sequential ChIP-seq reads were aligned to the DM potato genome assembly (PGSC v4.04 [38]) using Bowtie 1 [75] with no mismatches allowed. Only reads that mapped to unique positions were used for further analysis.

DNase I hypersensitive sites (DHSs) were identified using the in-house program Popera [76] (https://github.com/forrestzhang/Popera) with FDR < 0.05. Popera applies a kernel density estimation algorithm for DHS identification, similar to the algorithm defined in F-seq [10]. DHSs were identified independently in each biological replicate of each sample. DHSs that overlapped (by at least 1 bp) between the 2 biological replicates of a sample were retained for downstream analyses. The distribution of DNase-seq reads in the potato genome was determined by calculating the coverage of unique reads (mapped to a unique genomic position) in each 100 bp non-overlapping window across the entire genome. The most frequent DNase I cutting site (a single base pair position) within a DHS was taken as the DHS peak point, calculated from the number of uniquely aligned DNase-seq reads. A DHS was assigned to a genomic feature if its most frequent DNase I cutting site was located within that feature. A DHS was classified as tissue-specific or temperature-specific if it did not overlap (not even by a single base pair) any DHS found in the other sample. To define DNase I sensitivity, genes were aligned from transcription start sites (TSSs) to transcription termination sites (TTSs) and divided into 100 bins, and gene flanking regions were partitioned into the same number of windows as genes. Normalized DNase-seq reads were plotted over aligned genes as well as their ± 1 kb flanking regions.
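The two overlap rules above (retaining DHSs shared between replicates and calling sample-specific DHSs) can be sketched as follows. This is not the Popera implementation. The tuple layout, the 0-based half-open coordinates, the function names, and the decision to report the intervals of the first list are all assumptions made for illustration.

```python
from collections import defaultdict


def _index_by_chrom(dhss):
    """Group (chrom, start, end) intervals by chromosome."""
    index = defaultdict(list)
    for chrom, start, end in dhss:
        index[chrom].append((start, end))
    return index


def _overlaps_any(start, end, intervals):
    """True if [start, end) shares at least 1 bp with any interval."""
    return any(s < end and start < e for s, e in intervals)


def retain_overlapping(rep1, rep2):
    """DHSs from rep1 that overlap at least one DHS in rep2 by >= 1 bp."""
    index = _index_by_chrom(rep2)
    return [(c, s, e) for c, s, e in rep1
            if _overlaps_any(s, e, index.get(c, []))]


def sample_specific(sample_a, sample_b):
    """DHSs from sample_a with no overlap at all with any DHS in sample_b."""
    index = _index_by_chrom(sample_b)
    return [(c, s, e) for c, s, e in sample_a
            if not _overlaps_any(s, e, index.get(c, []))]


# Hypothetical intervals, 0-based half-open
rep1 = [("chr01", 100, 200), ("chr01", 500, 600), ("chr02", 100, 200)]
rep2 = [("chr01", 199, 300), ("chr01", 600, 700), ("chr03", 100, 200)]
shared = retain_overlapping(rep1, rep2)
specific = sample_specific(rep1, rep2)
```

The linear scan per chromosome keeps the sketch short; for genome-scale DHS sets, sorted intervals with a sweep or an interval tree would be the more practical choice.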

To determine the histone modification distribution from ChIP-seq and sequential ChIP-seq data, the mid-point of each uniquely aligned paired-end read was set as the modification signal. The level of an individual histone modification was measured by quantifying histone modification signals within an interval and normalizing to the length of the interval, the read number per base genome per million mapped reads, and input data. Similarly, the level of bivalent histone modifications was quantified within an interval using bivalent histone modification signals generated from sequential ChIP-seq with anti-H3K4me3 followed by anti-H3K27me3 (named K4-K27) and normalized to control sequential ChIP-seq with anti-H3K4me3 followed by no antibody (K4-noAb), the length of the interval, and the read number per base genome per million mapped reads. The bivalent histone modification level was also measured for the same sample using sequential ChIP-seq with the reversed order of antibodies (K27-K4 normalized to K27-noAb). Genes were processed for further analyses only if they displayed increased levels of bivalent histone modifications in both the K4-K27 and K27-K4 sequential ChIP-seq data upon cold storage.
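A rough sketch of this kind of normalization is given below. The source states what the signal was normalized to but not the exact formula, so the per-bp, per-million density and the ChIP-to-control ratio used here are assumptions, as are the function names, the pseudocount, and the 0-based half-open coordinates. The same ratio could in principle be applied to K4-K27 against K4-noAb.

```python
def fragment_midpoint(left_start, right_end):
    """Midpoint of a paired-end fragment spanning [left_start, right_end)."""
    return (left_start + right_end) // 2


def signal_density(midpoints, start, end, total_mapped):
    """Midpoints inside [start, end) per bp per million mapped reads."""
    count = sum(1 for m in midpoints if start <= m < end)
    return count / (end - start) / (total_mapped / 1e6)


def normalized_level(chip_midpoints, control_midpoints, start, end,
                     chip_total, control_total, pseudocount=1e-9):
    """Ratio of ChIP density to control (input or no-antibody) density.

    Using a ratio, and the small pseudocount that avoids division by
    zero, are illustrative assumptions rather than the authors' formula.
    """
    chip = signal_density(chip_midpoints, start, end, chip_total)
    control = signal_density(control_midpoints, start, end, control_total)
    return chip / (control + pseudocount)


# Hypothetical midpoints and library sizes
chip_mid = [10, 20, 150, 199, 200]
input_mid = [120, 300]
level = normalized_level(chip_mid, input_mid, 100, 200,
                         chip_total=2_000_000, control_total=1_000_000)
```

In this toy case the ChIP library has twice the reads of the control and twice the midpoints in the interval, so the normalized level comes out close to 1.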

RNA-seq reads that passed quality control were mapped to the potato genome assembly (PGSC v4.04 [38]) using Tophat (v2.1.1) [77]. Cufflinks (v2.2.1) [78] was used to calculate the expression value (FPKM) of annotated potato genes. Differentially expressed genes were called independently using Cuffdiff (v2.21) and DESeq2 (v1.10.1) [79], each with FDR < 0.01. Differentially expressed genes were used for further analyses only if they were detected by both Cuffdiff and DESeq2. Similarly, genes were considered not differentially expressed or constitutively silenced only if they were classified as such by both Cuffdiff and DESeq2. Programs for data processing and statistical tests were written and run in Perl or R (https://www.r-project.org). The z test was two-tailed.
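The requirement that a gene be called by both tools amounts to a set intersection, sketched here. The dictionaries mapping gene IDs to adjusted P values, the gene IDs themselves, and the function name are hypothetical; parsing of actual Cuffdiff or DESeq2 output is not shown.

```python
def consensus_de_genes(cuffdiff_fdr, deseq2_fdr, threshold=0.01):
    """Gene IDs whose FDR is below the threshold in both result tables.

    Both arguments map a gene ID to its adjusted P value from one tool.
    """
    called_a = {g for g, q in cuffdiff_fdr.items() if q < threshold}
    called_b = {g for g, q in deseq2_fdr.items() if q < threshold}
    return called_a & called_b


# Hypothetical gene IDs and adjusted P values
cuffdiff = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.005}
deseq2 = {"geneA": 0.004, "geneB": 0.001, "geneD": 0.0001}
consensus = consensus_de_genes(cuffdiff, deseq2)
```

Only genes significant in both tables survive, so a gene that one tool misses or does not report is dropped.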

Motif search

Cold-specific genic DHSs were split into those overlapping putative promoters, exons, and introns. The top 1000 DHSs based on peak read depths were used for further analysis. Motif scanning was conducted using MEME-ChIP from the MEME suite tools [80]. Negative control sequences for cold-specific DHSs at promoters were constructed by taking the top 1000 promoter DHSs that overlapped in both the RT and cold tuber data sets (shared promoter DHSs). Negative control sequences for cold-specific exonic and intronic DHSs were assembled in the same way, using exonic and intronic DHSs in lieu of promoter DHSs (shared exonic DHSs and shared intronic DHSs, respectively). All DHSs were aligned by their peak coordinates and scanned for motifs using 100 bp surrounding the peak coordinates.
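Selecting the top DHSs and cutting a window around each peak could look like the sketch below. The text does not say whether "100 bp surrounding the peak" means 100 bp in total or on each side; this sketch assumes 100 bp in total (50 bp on each side). The tuple layout and function name are also illustrative, and clipping at chromosome ends is not handled.

```python
def top_peak_windows(dhss, n=1000, width=100):
    """Windows of `width` bp centered on the peaks of the n deepest DHSs.

    dhss is a list of (chrom, peak_position, peak_read_depth) tuples.
    Returned windows are (chrom, start, end), 0-based half-open, with the
    start clipped at 0, so windows near a chromosome start can be shorter.
    """
    ranked = sorted(dhss, key=lambda d: d[2], reverse=True)[:n]
    half = width // 2
    return [(chrom, max(0, peak - half), peak + half)
            for chrom, peak, _depth in ranked]


# Hypothetical DHS peaks
peaks = [("chr01", 1000, 5), ("chr01", 30, 50), ("chr02", 500, 20)]
windows = top_peak_windows(peaks, n=2, width=100)
```

The resulting coordinates would then be used to extract sequences for the motif scan, and the same function could be applied to the shared DHSs that serve as negative controls.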

Gene ontology enrichment

A total of 6442 potato genes associated with the bivalent H3K4me3-H3K27me3 mark upon cold stress were divided into three groups based on differential expression upon cold stress. Arabidopsis thaliana homologs of the upregulated (n = 3064), downregulated (n = 1994), and constitutively expressed (n = 1384) potato bivalent mark-associated genes were identified separately for each group using the Blastp program (BLAST v2.2.31). The Arabidopsis homologous protein sequences with the highest similarity to the potato bivalent mark-associated genes were screened for enriched Gene Ontology terms using agriGO [81]. The enrichment test was conducted using Fisher’s exact test with Benjamini–Hochberg FDR correction of P values. Background terms were set to all annotated Arabidopsis genes for each enrichment test.
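The statistics behind such an enrichment test can be sketched in a few lines. This is not agriGO's code: the one-sided (over-representation) form of Fisher's exact test, the function names, and the example counts are assumptions made for illustration.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P value for over-representation.

    k: genes in the query list annotated with the GO term
    n: size of the query list
    K: background genes annotated with the GO term
    N: size of the background
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts and P values
p_term = fisher_enrichment_p(k=5, n=5, K=5, N=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each GO term gets one P value from the first function, and the full list of P values across terms is then passed to the second to control the false discovery rate.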