Background

Low temperature has a strong influence on the geographical distribution, growing season, quality and yield of plants. Previous reports have shown that plants may develop acquired freezing tolerance after exposure to non-freezing low temperatures [1, 2]. Plant cells cope with cold stress by regulating the expression of transcription factors and effectors during non-freezing low temperatures [3]. However, the transcriptome-level changes that underlie the perception of temperatures below zero, which may be related to the ability to survive under extremely low temperatures [4], are poorly understood compared with cold sensing above freezing temperature.

A variety of cold-responsive (COR) genes that are under the control of some key transcription factors (TFs) are thought to be involved in cold sensing [5]. For example, the well-characterized TF DREB1/CBF can regulate a subset of COR genes by binding the DRE/CRT cis-elements in the promoter regions of COR genes [6–8]. Studies of the DREB1/CBF network pathway have revealed the roles of cellular or environmental factors, e.g. calcium, light, and circadian rhythm, in cold acclimation [1, 9]. The DREB1/CBF pathway in the chilling response seems well characterized in some plants, and its regulon has been identified in Brassica napus, rice, and poplar. However, the molecular mechanisms of the cold-acclimation response are not well understood at the whole genome or transcriptome level, since only 12% of cold-responsive genes are members of the CBF regulon [10, 11]. It remains to be answered whether low temperature perception occurs below freezing temperature and, if so, whether it occurs by a molecular mechanism similar to that above freezing temperature.

Populus euphratica Oliv. is naturally distributed in semiarid areas and plays an important role in maintaining local arid ecosystems [12]. It differs considerably from other species in that it grows in deserts with extremely salty soil and wide environmental temperature ranges. Thus, P. euphratica has been widely considered a model species for elucidating abiotic resistance mechanisms in trees [13–17]. Screening for cold-responsive genes in P. euphratica can be a useful approach to understanding the responses of woody plants to low temperatures and can also help elucidate the difference in cold perception between below- and above-freezing temperatures.

Recently, the development of Illumina/Solexa-based deep-sequencing technologies has made it possible to capture an unbiased view of the RNA transcript profile of a species under a given condition at the whole genome level [18]. Using this method, ESTs and numerous novel transcripts have been discovered in a tissue-specific manner [19, 20]. In the current study, we sought to identify genes linked to low temperature (below or above zero) perception and to explore the regulatory and sensory mechanisms involved in low temperature response processes by performing de novo assembly of the P. euphratica transcriptome using Solexa data. Two-year-old plants were subjected to a temperature of 4°C and a further drop to -4°C to allow a comprehensive analysis of transcriptional responses. The acquired information may facilitate attempts to elucidate the response mechanisms of this species to chilling stress and will help in the development of strategies for improving freezing tolerance in trees.

Results and discussion

Read assembly and poplar database alignment

Three cDNA libraries were generated with mRNA from control (22°C), 4°C-treated, or -4°C-treated P. euphratica plants and sequenced by Illumina deep-sequencing. After removing adapters, low-quality sequences, and ambiguous reads, a total of 132 million, 135 million, and 134 million clean reads with a mean length of 90 bp were generated in the control (CK), 4°C-treated sample (C4), and -4°C-treated sample (F4), respectively (Table 1). The raw data were deposited in the NCBI Sequence Read Archive (SRA) under the accession number SRP026075. The total length of the reads was >30 gigabases (Gb), equivalent to ~75-fold coverage of the P. trichocarpa genome. All trimmed reads were de novo assembled into contigs by the Trinity method [21]. The average contig size exceeded 320 bp in each of the three libraries (Figure 1A). Using paired-end information, the contigs were joined into assembled unigenes. Over 80% of the reads could be mapped back to the assembled transcripts, indicating a high quality of assembly (Additional file 1). Finally, 108,502 unigenes with an average length of 1,047 bp and an N50 of 1,821 bp were assembled (Table 1). All unigenes were longer than 200 bp and 11.34% (12,309) of them were longer than 1,000 bp (Figure 1B).
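The N50 statistic quoted above can be illustrated with a short sketch. It assumes the common definition (the length L such that sequences of length L or longer contain at least half of all assembled bases); the function name and the example lengths are hypothetical and are not taken from the study or from the Trinity software.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is taken here as the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical unigene lengths in bp (illustration only, not study data).
example_lengths = [200, 300, 400, 500, 1000, 2000]
print(n50(example_lengths))
```

Because N50 weights sequences by their length, it is typically larger than the mean length when the length distribution is skewed toward short sequences.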

Table 1 Overview of the sequencing and assembly
Figure 1

Overview of assembly by Trinity. (A) Length frequency distribution of contigs obtained from de novo assembly of high-quality clean “reads”. (B) Length frequency distribution of unigenes produced by contig joining.

To estimate the representation of unigenes in the collection, all unique sequences generated from the different assemblies were compared by BLAST against EST collections from a variety of Populus species. The results indicated that our assemblies covered most P. euphratica transcripts (Additional file 2). By performing BLASTx against the Populus trichocarpa v3 dataset with an E-value threshold of 1.0E-5, 83,618 ESTs were assigned with an identity score ≥ 75%, covering 77.07% of the assembled unigenes (Additional file 3). Of these unigenes, 79,389 (97.3%) shared >90% identity with their homologs from P. trichocarpa. Meanwhile, 36,559 homologs (>80%) of P. trichocarpa v3 gene models were sequenced. Together, these results indicate that our RNA-sequencing data have high contiguity and coverage and can be used for further analyses.

Functional annotation and classification of the unigenes

Using the best hits found by BLASTx or BLASTn with an E-value of < 1.0E-5, all of the unigenes were annotated on the basis of similarity against public databases, including the non-redundant protein (NR) database, the non-redundant nucleotide (NT) database, SwissProt, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the Clusters of Orthologous Groups (COG) database. The number of unigenes annotated with each database is summarized in Additional file 4. Of the 108,502 high-quality unique sequences, 85,584 (78.88%) unigenes had at least one significant match to an existing gene model in the BLAST searches (Additional file 4). By performing BLASTx against the NR database with an E-value cut-off of 1.0E-5, 71,428 BLASTx hits were obtained, covering 83.5% of the annotated unigenes. Within the P. euphratica unigene set, 49,291 (45.43%) unigenes were categorized (E-value < 1.0E-5) in 25 COG clusters (Figure 2). The five largest categories were: 1) general function prediction only (18.2%), 2) transcription (9.6%), 3) replication, recombination and repair (8.3%), 4) signal transduction mechanisms (7.3%) and 5) post-translational modification, protein turnover, chaperones (6.8%). Classification of Gene Ontology (GO) terms was performed according to the NR annotation using the Blast2GO software [2]. In the category of biological process, the largest groups were cellular process, metabolic process, response to stimulus, and biological regulation (Figure 3). In the molecular function category, unigenes with binding and catalytic activity formed the largest groups.

Figure 2

COG functional classification of the P. euphratica transcriptome. 49,291 unigenes with significant homologies in the COG database (E-value < 1.0E-5) were classified into 25 COG categories. The capital letters on the x-axis indicate the COG categories as listed on the right of the histogram, and the y-axis indicates the number of unigenes.

Figure 3

Functional classification of GO terms of all P. euphratica transcripts. Based on high-scoring BLASTx matches in the NR plant proteins database, P. euphratica unigenes were classified into three main GO categories and 31 sub-categories. The left y-axis indicates the percentage of a specific category of genes in each main category. The right y-axis indicates the number of genes in the same category.

To obtain a better understanding of the biological functions of the unigenes, a KEGG pathway-based analysis was also performed. Based on a comparison against the KEGG database using BLASTx with an E-value cutoff of <1.0E-5, 39,313 (36.23%) of the 108,502 unigenes had significant matches in the database and were assigned to 127 KEGG pathways. Of the 8,220 metabolism pathway unigenes, 2,726 were involved in plant hormone signal transduction pathways, including tryptophan metabolism, zeatin biosynthesis, diterpenoid biosynthesis, carotenoid biosynthesis, cysteine and methionine metabolism, brassinosteroid biosynthesis, α-Linolenic acid metabolism, and phenylalanine metabolism.

The three samples had 68 members in common when the 100 most abundant transcripts were compared (Additional file 5). The 23 unique members highly expressed in the control were involved in auxin signaling, cell division, and biogenesis. In contrast, the 19 unique members highly expressed in the C4 sample were stress-induced genes (e.g., arginine decarboxylase and dehydration-induced genes). The 28 unique members highly expressed in the F4 sample were also stress-related genes, e.g., the glucanase, zinc finger protein, and E3 ubiquitin-protein ligase genes. These expression patterns are consistent with the treatments applied and indicate that our data are reliable.

Protein coding sequence prediction

Unigenes were aligned by BLASTx (E-value < 1.0E-5) against the NR, Swiss-Prot, KEGG, and COG protein databases, in that order. Unigenes aligned to a higher-priority database were not aligned to databases of lower priority. The process ended when all alignments had been performed. The correct reading frame of the nucleotide sequences (5′–3′ direction) of the unigenes was defined by the highest-ranked BLAST result, and the corresponding protein sequences were obtained using the standard codon table. Unigenes that could not be aligned to any database were scanned with ESTScan [22] to produce the nucleotide and amino acid sequences of the predicted coding region. In total, 71,559 unigene coding sequences (CDSs) were generated by the BLASTx protein database searches described above. Of these unigenes with CDSs, the majority (44,005 members, 61.5%) were over 500 bp and 23,479 were over 1,000 bp in length (Figure 4A-B). Using the ESTScan program, we assigned CDSs to another 489 unigenes that could not be aligned to the above databases (Figure 4C-D). The length frequency distributions of these unigene CDSs and their corresponding amino acid sequences are given in Figure 4.
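The priority-ordered assignment described above can be sketched as follows. This is a minimal illustration, assuming the best BLASTx hit per unigene has already been reduced to a reading frame for each database; the function name, the data layout, and the example identifiers are hypothetical and do not reproduce the pipeline actually used.

```python
# Database priority as stated in the text.
DATABASE_PRIORITY = ["NR", "Swiss-Prot", "KEGG", "COG"]


def assign_reading_frames(unigene_ids, best_hits):
    """Assign each unigene the frame from its highest-priority database hit.

    best_hits maps a database name to {unigene_id: reading_frame}.
    Returns (assigned, unassigned), where assigned maps
    unigene_id -> (database, reading_frame) and unassigned lists the
    unigenes with no hit in any database (candidates for ESTScan).
    """
    assigned = {}
    remaining = list(unigene_ids)
    for database in DATABASE_PRIORITY:
        hits = best_hits.get(database, {})
        still_unassigned = []
        for unigene_id in remaining:
            if unigene_id in hits:
                # A hit in a higher-priority database ends the search
                # for this unigene; lower-priority databases are skipped.
                assigned[unigene_id] = (database, hits[unigene_id])
            else:
                still_unassigned.append(unigene_id)
        remaining = still_unassigned
    return assigned, remaining


# Hypothetical identifiers and frames (illustration only).
example_hits = {"NR": {"u1": 1}, "KEGG": {"u1": -2, "u2": 3}}
print(assign_reading_frames(["u1", "u2", "u3"], example_hits))
```

In this sketch, "u1" keeps its NR frame even though a KEGG hit also exists, and "u3" is returned as unassigned.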

Figure 4

Transcriptome coding sequences (CDSs) predicted by BLASTx and ESTScan. (A) The length distribution of CDSs predicted using BLASTx. (B) The length distribution of proteins predicted using BLASTx. (C) The length distribution of CDSs predicted using ESTScan. (D) The length distribution of proteins predicted using ESTScan.

Differentially expressed genes among the three samples

We measured gene expression levels as fragments per kilobase of exon model per million mapped reads (FPKM). After applying the chi-square test and the Benjamini-Hochberg multiple testing correction in R to the three samples simultaneously, we identified 2,858 genes as reliable differentially expressed genes (DEGs) in at least two samples (designated either-DEGs), regardless of fold change (Additional file 6). Of these DEGs, 131 were expressed differentially in all three samples (designated all-DEGs, Additional file 7). With a threshold of an estimated absolute log2-fold change of >1, the numbers of DEGs for CK vs. C4, CK vs. F4 and C4 vs. F4 were 1,661, 866, and 1,161, respectively (Additional files 8, 9, 10). The numbers of up-regulated unigenes in the C4 and F4 samples were 1,113 and 630, respectively.

To accurately identify DEGs, we selected the 50 most significantly up-regulated transcripts that could be well annotated against the poplar database or the NR database. Those coding for the chlorophyll a/b binding protein (e.g., Unigene50811, CL12828.Contig3, Unigene50363, and Unigene55266), rubisco activase (CL4046.Contig4, Unigene50527, and Unigene55538), AP2/ERF transcription factors (Unigene26311, Unigene22719, CL9386.Contig2, Unigene18453, and CL9876.Contig3), and some other transcription factors (CL1721.Contig8 and Unigene27837) were the most up-regulated interpretable transcripts in the C4 sample (Additional file 11). Among the top 50 up-regulated transcripts in the F4 sample (Additional file 12), the annotated transcripts were mainly transcription factors (DREB1 transcription factors, e.g. unigene26567 and unigene26311; WRKY transcription factor Unigene18620) and xyloglucan endotransglycosylases (Unigene19292, Unigene14078 and CL29.Contig1).

Although Illumina sequencing is a highly efficient method for DEG screening, false positives still occur because of the sensitivity of this technology to templates present in DNA samples [23]. Thus, we validated the RNA sequencing data by performing qPCR analysis on 10 transcripts randomly selected from the up-regulated gene list. The qPCR results indicate that all of these DEGs exhibited similar expression kinetics to those obtained from the RNA sequencing analysis (Figure 5). These results support the validity of the method used for determining DEGs from the RNA sequencing analysis.

Figure 5

Expression analyses of 10 DEGs by qPCR. qPCR was performed on 10 members randomly selected from up-regulated gene lists of the C4 or F4 sample.

Gene ontology and pathway enrichment analyses of differentially expressed unigenes

All DEGs were mapped to each term of the Gene Ontology database (http://www.geneontology.org/, release date: Aug 1st, 2012) and the number of genes was calculated for each GO term. Using a hypergeometric test, we identified the GO terms significantly enriched in DEGs compared to the genomic background (p ≤ 0.05, after Bonferroni correction). In the category of biological processes, three GO terms, “response to stress”, “response to stimulus” and “response to carbohydrate stimulus”, were enriched (p ≤ 0.05, after Bonferroni correction) after both the 4°C and -4°C treatments (Table 2), suggesting that genes in these processes may play important roles in low temperature perception. Additionally, “carbon fixation process”, “glucan metabolic process” and “macromolecule metabolic regulation processes” were also enriched for DEGs in C4 (Table 2), indicating that genes related to these processes may also participate in cold sensing. Closer inspection of the “response to stimulus” category indicated that “response to hormone stimulus” and “response to abiotic stimulus” were two over-represented subcategories (data not shown), suggesting that our low temperature treatment caused an effective abiotic stress and activated some hormone response processes. Furthermore, DEGs in the “protein binding” and “protein modification” subcategories were also over-represented in both samples, indicating that comprehensive changes had taken place in cells in response to low temperature stress. We further performed GO enrichment analysis for genes that were differentially expressed in all three samples, and the results indicated that those involved in gene expression regulation, macromolecule metabolic process regulation, and abiotic stimulus response were enriched. In the “molecular function” category, “structural molecule activity” was the only group of DEGs that was over-represented after both the 4°C and -4°C treatments (Table 2).

Table 2 Over-represented GO terms of DEGs in low temperature-stressed P. euphratica

By performing KEGG pathway analyses, we identified twelve pathways that changed significantly (q ≤ 0.05) under the 4°C treatment, including those involved in carbohydrate, energy, vitamin, hormone, and nitrogen metabolism (Additional file 13). The “plant pathogen interaction”, “hormone signal transduction”, and “biosynthesis of unsaturated fatty acids” pathways contained the three largest numbers of differentially expressed unigenes and thus seem to play important roles in low temperature perception above the freezing point. Under the -4°C treatment, only 3 pathways changed significantly (Additional file 13). “Plant pathogen interaction” was a major pathway that changed significantly in both treated groups, indicating that the low temperature stress response signal network may overlap with plant-pathogen interaction signals in P. euphratica. This is a notable finding considering that little is known about the overlap in signal transduction between abiotic and biotic stresses. Additionally, the transcripts of all of the unsaturated fatty acid pathway genes increased significantly in the C4 sample. This result is in accordance with previous reports that plants undergoing low temperature stress preferentially accumulate poly-unsaturated and unsaturated fatty acids, which enhance low temperature tolerance under chilling conditions [24, 25].

Transcription factors responding to low temperature stress

Transcription factors play crucial roles in the regulation of target gene expression via specific binding to cis-acting elements in their promoters [26]. Many of the COR genes contain cis-elements, such as dehydration-responsive elements/C-repeat elements (DRE/CRT, A/GCCGAC) and myeloblastosis (MYB, C/TAACNA/G) [27, 28] in their promoters that can be regulated by DREB and MYB transcription factors. Analysis of these transcription factors could provide useful information on the complex regulatory networks involved in P. euphratica cold stress responses.

Changes in the expression of transcription factors occurred after both the 4°C and -4°C treatments (Table 3). The AP2/ERF transcription factors were overrepresented (log2-fold change > 1) in both treated samples. This family contained 24 and 22 up-regulated members in the C4 and F4 samples, respectively (Table 3), indicating its important role in low temperature stress responses. The AP2/ERF transcription factors have been subdivided into five subfamilies: the AP2 subfamily, the DREB subfamily, the ERF subfamily, the RAV subfamily and others. Some RAP homologs (e.g. unigene16978 and CL5587.contig2) and ERF homologs (e.g. unigene8840, CL4762.contig1, CL13298.contig1), which have seldom been studied in cold sensing, were up-regulated in both the C4 and F4 samples, indicating a potential function of these subfamilies in the cold response. As a group within the DREB subfamily, CBF/DREB1 has been found to be expressed specifically under cold stress but not under normal growth conditions. Here, several DREB1/CBF-like unigenes changed their expression significantly after the low temperature treatments. The transcripts of two CBF4/DREB1D homologs, Unigene26311 and Unigene22719, both increased over 11-fold after both treatments (Additional files 11, 12). However, no Arabidopsis CBF2 homologs were found to be up-regulated in the P. euphratica transcriptome. Thus, our results not only indicate a key role of the CBF/DREB1 transcription factors in low temperature responses but also suggest that the CBF/DREB1 transcriptional activation mechanism of P. euphratica is not necessarily the same as that of Arabidopsis.

Table 3 Distribution of differentially expressed transcription factors

Previous studies have shown that not all cold-regulated gene expression is under the direct control of the CBF/DREB family [11, 29]. Besides the AP2/ERF family, it is likely that the WRKY and NAC transcription factors also play important roles in the transcriptional regulation of genes in the early cold response in P. euphratica, because they were overrepresented in the up-regulated gene list (log2-fold change > 1). In the WRKY family, 20 and 12 members were up-regulated in the C4 and F4 groups, respectively (Table 3). In comparison, none was found to be down-regulated in either group. In the NAC transcription factor family, the transcripts of 9 and 5 members were up-regulated, while none was found to be down-regulated in either treated sample. Evidence that the WRKY and NAC transcription factor gene families may play important roles in the regulation of transcriptional reprogramming associated with cold stress responses is accumulating [68].

The FPKM formula was:

\[ \mathrm{FPKM} = \frac{10^{6}\, C}{N L / 10^{3}}, \]

where C is the number of reads that uniquely aligned to one unigene, N is the total number of reads that uniquely aligned to all unigenes, and L is the number of bases in the CDS of that unigene. The goal of this transformation is to normalize the counts with respect to the differing library sizes and transcript lengths [69].
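As a worked illustration of this normalization, the sketch below evaluates the FPKM definition with C, N and L as defined in the text, reading the formula as 10^6·C divided by N·L/10^3. The function name and the example counts are hypothetical and are not values from the study.

```python
def fpkm(c, n, l):
    """Compute FPKM = (10^6 * C) / (N * L / 10^3).

    c: reads uniquely aligned to one unigene
    n: total reads uniquely aligned to all unigenes
    l: number of bases in the CDS of that unigene
    """
    if n <= 0 or l <= 0:
        raise ValueError("n and l must be positive")
    return (1e6 * c) / (n * l / 1e3)


# Hypothetical example: 500 reads on a 2,000-bp CDS in a library of
# 10 million uniquely aligned reads (illustration only).
print(fpkm(500, 10_000_000, 2000))
```

Doubling either the library size or the CDS length halves the value for the same read count, which is the intended correction for sequencing depth and transcript length.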

We identified DEGs among the different samples (CK, C4 and F4) using R. Pearson’s chi-square test was applied to assess the lane effect, and a p-value was computed for each gene. The Benjamini–Hochberg false discovery rate (FDR) procedure was then applied to adjust the p-values for multiple testing. The FDR method is widely used in deep-sequencing studies because of its power in finding over-represented unigenes [70–73]. Transcripts that were induced or suppressed at an estimated absolute log2-fold change of >1 and an FDR-adjusted p-value ≤ 0.05 were considered to be differentially expressed [74].
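The two-part DEG criterion (FDR-adjusted p-value ≤ 0.05 and absolute log2-fold change > 1) can be sketched in a few lines. This is a minimal Python illustration of the Benjamini–Hochberg step-up adjustment, not the R code used in the study; the function names, the pseudocount added before taking logarithms, and the example values are all assumptions made for illustration.

```python
import math

# Assumption: a small pseudocount avoids log2(0) for unexpressed genes.
PSEUDOCOUNT = 0.01


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def call_degs(genes, fdr_cutoff=0.05, log2fc_cutoff=1.0):
    """Select DEGs from (gene_id, fpkm_control, fpkm_treated, p_value) tuples."""
    adjusted = benjamini_hochberg([gene[3] for gene in genes])
    degs = []
    for (gene_id, control, treated, _), q in zip(genes, adjusted):
        log2fc = math.log2((treated + PSEUDOCOUNT) / (control + PSEUDOCOUNT))
        if abs(log2fc) > log2fc_cutoff and q <= fdr_cutoff:
            degs.append((gene_id, log2fc, q))
    return degs


# Hypothetical genes (illustration only, not study data).
example = [
    ("g1", 1.0, 8.0, 0.005),
    ("g2", 5.0, 6.0, 0.01),
    ("g3", 10.0, 2.0, 0.03),
    ("g4", 4.0, 4.5, 0.04),
]
print([gene_id for gene_id, _, _ in call_degs(example)])
```

In the example, g2 and g4 pass the FDR cutoff but are excluded because their fold changes are too small, which shows why the two thresholds are applied together.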

GO and KEGG analyses for differentially expressed unigenes

To find the GO terms significantly enriched in DEGs against the genome background, the DEGs were annotated with the GO database (http://www.geneontology.org/) and a hypergeometric test was used for statistical analysis [25]. For p-value correction, we used the rigorous Bonferroni correction method. The cutoff p-value after correction was 0.05. GO terms fulfilling this condition were defined as being significantly enriched. The KEGG pathway enrichment analysis of DEGs was also performed with the whole genome background as a reference to find the main biochemical pathways and signal transduction pathways in which the DEGs are involved. After multiple testing correction, we defined pathways with a q-value ≤ 0.05 as being significantly enriched in DEGs.
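The enrichment test can be illustrated with a small sketch using only the Python standard library. It assumes the usual one-sided formulation, in which the p-value is the probability of observing at least k DEGs annotated to a term when n DEGs are drawn from N background genes of which M carry the term. The function names and the example counts are hypothetical, and the software used in the study may differ in detail.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, n_degs, degs_annotated):
    """One-sided hypergeometric p-value P(X >= degs_annotated).

    background: number of genes in the background (N)
    annotated: background genes annotated to the GO term (M)
    n_degs: number of DEGs (n)
    degs_annotated: DEGs annotated to the GO term (k)
    """
    upper = min(annotated, n_degs)
    total = comb(background, n_degs)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, n_degs - i)
        for i in range(degs_annotated, upper + 1)
    )
    return favourable / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical counts (illustration only): 10 background genes, 4 annotated
# to the term, 3 DEGs of which 2 are annotated.
p = hypergeom_enrichment_p(10, 4, 3, 2)
print(p, bonferroni([p, 0.5]))
```

Bonferroni correction is conservative when many GO terms are tested, so terms that remain significant under it are strongly supported.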

Quantitative PCR analysis

Quantitative PCR (qPCR) was performed to determine the expression level of selected unigenes. The qPCR was conducted using a power SYBR Green PCR Kit (ABI) in a MicroAmp™ 96-well plate with a StepOnePlus™ Real-Time PCR System (ABI). The relative expression value was calculated by the 2^(−ΔΔCt) method using PeActin (GenBank accession number EF148840) as an internal control [75]. Gene-specific primers used in the qPCR analysis are listed in Additional file 14. The RNA pools used in the qPCR analyses were extracted from three independent samples that were different from those used for RNA-seq. Three technical replicates were used for each sample.
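The relative expression calculation can be sketched as follows, assuming the standard 2^(−ΔΔCt) formulation in which technical replicate Ct values are averaged, normalized to the internal control gene, and then compared with the untreated control sample. The function name and the Ct values are hypothetical and are not measurements from the study.

```python
from statistics import mean


def relative_expression(target_treated, reference_treated,
                        target_control, reference_control):
    """Return relative expression by the 2^(-ddCt) method.

    Each argument is a list of Ct values from technical replicates.
    'reference' is the internal control gene (PeActin in the text);
    'control' is the untreated sample.
    """
    delta_treated = mean(target_treated) - mean(reference_treated)
    delta_control = mean(target_control) - mean(reference_control)
    delta_delta_ct = delta_treated - delta_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values with three technical replicates each
# (illustration only).
fold = relative_expression(
    target_treated=[22.0, 22.0, 22.0],
    reference_treated=[18.0, 18.0, 18.0],
    target_control=[25.0, 25.0, 25.0],
    reference_control=[18.0, 18.0, 18.0],
)
print(fold)
```

Here the target gene reaches threshold three cycles earlier in the treated sample after normalization, corresponding to an eight-fold induction under the assumption of perfect amplification efficiency.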