Background

Transcription factors exhibit sequence-specific DNA binding and are capable of activating or repressing transcription of downstream target genes. In plants, WRKY proteins constitute a large family of transcription factors that are involved in various physiological processes. Proteins in this family contain at least one highly conserved signature domain of about 60 amino acid residues, which includes the conserved WRKYGQK sequence followed by a zinc-finger motif, located in the C-terminal region [1]. The WRKY domain facilitates binding of the proteins to the W box or the SURE (sugar-responsive cis-element) in the promoter regions of target genes [2, 3]. As deduced from nuclear magnetic resonance (NMR) analysis of the C-terminal WRKY domain of Arabidopsis WRKY4 (AtWRKY4), the conserved WRKYGQK sequence of WRKY domains is directly involved in DNA binding [4]. WRKY proteins can be classified into three groups (1, 2 and 3) based on the number of WRKY domains and the pattern of the zinc-finger motif. Group 1 proteins typically contain two WRKY domains that include a C2H2 motif. Group 2 proteins have a single WRKY domain with a C2H2 zinc-finger motif and can be further divided into five subgroups (2a-2e) based on the phylogeny of the WRKY domains. Group 3 proteins also have a single WRKY domain, but their zinc-finger-like motif is C2-H-C [1].
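The grouping rule described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The zinc-finger spacings in the two regular expressions are an assumption taken from commonly cited consensus patterns for the family and are not given in this text, and the function name and test sequences are hypothetical. The sketch also ignores known exceptions, such as variant core heptapeptides and group 1 members that carry only one domain.

```python
import re

# Conserved core of the WRKY domain, as named in the text.
WRKY_CORE = "WRKYGQK"

# ASSUMPTION: spacings of the zinc-finger residues are not given in this text;
# these are commonly cited consensus patterns and may need adjusting.
C2H2 = re.compile(r"C.{4,5}C.{22,23}H.H")   # groups 1 and 2
C2HC = re.compile(r"C.{7}C.{23}H.C")        # group 3

def classify_wrky(protein: str) -> str:
    """Assign a coarse WRKY group from the domain count and zinc-finger type."""
    starts = [m.start() for m in re.finditer(WRKY_CORE, protein)]
    if not starts:
        return "no WRKY domain found"
    if len(starts) >= 2:
        # Two WRKY domains: treated as group 1 in this simplified rule.
        return "group 1"
    # Single domain: inspect the zinc finger downstream of the core sequence.
    downstream = protein[starts[0] + len(WRKY_CORE):]
    if C2H2.search(downstream):
        return "group 2"
    if C2HC.search(downstream):
        return "group 3"
    return "unclassified"

# Hypothetical toy sequences built from filler residues, not real proteins.
toy_group2 = "M" + WRKY_CORE + "A" * 6 + "C" + "A" * 4 + "C" + "A" * 22 + "HAH"
toy_group3 = "M" + WRKY_CORE + "A" * 6 + "C" + "A" * 7 + "C" + "A" * 23 + "HAC"
toy_group1 = toy_group2 + toy_group2

for name, seq in [("toy1", toy_group1), ("toy2", toy_group2), ("toy3", toy_group3)]:
    print(name, classify_wrky(seq))
```

Subgroups 2a-2e cannot be assigned this way, because they are defined by the phylogeny of the WRKY domains rather than by a sequence pattern.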

Since the cloning of the first cDNA encoding a WRKY protein, SPF1 from sweet potato [5], a large number of WRKY proteins have been experimentally identified from several plant species [6-17] and have been shown to be involved in various physiological processes under normal growth conditions and under various stress conditions [18]. It is well documented that WRKY proteins play a key role in plant defense against various biotic stresses, including bacterial, fungal and viral pathogens [19-27]. They also play important regulatory roles in developmental processes, such as trichome initiation [28], embryo morphogenesis [29] and senescence [30], and in some signal transduction processes mediated by plant hormones such as gibberellic acid [

Next, to establish whether these WRKY genes are expressed, we screened the cucumber EST database in NCBI. Twenty-seven putative WRKY genes matched at least one EST (Table 1). We cloned and sequenced full-length cDNAs of 32 of the annotated CsWRKY genes (Table 1). Consequently, annotation errors in 17 putative WRKY genes could be corrected (data not shown). The CDSs of all 32 cloned CsWRKY genes have been submitted to GenBank, and their accession numbers are shown in Table 1.

Table 1 WRKY genes in cucumber

Multiple sequence alignment, structure and phylogenetic analysis

The phylogenetic relationship of the CsWRKY proteins was examined by multiple sequence alignment of their WRKY domains, which span approximately 60 amino acids (Figure 2). Including the WRKY domains of several AtWRKY proteins in the comparison gave a better separation of the different groups and subgroups. For each of the groups or subgroups (1, 2a to 2e and 3), one representative was chosen randomly: AtWRKY20, 40, 72, 50, 74, 65 and 54. As shown in Figure 2, the sequences in the WRKY domain were highly conserved.

Figure 2

Alignment of multiple CsWRKY and selected AtWRKY domain amino acid sequences. Alignment was performed using Clustal W. The suffix 'N' or 'C' indicates the N-terminal WRKY domain or the C-terminal WRKY domain, respectively, of a specific WRKY protein. The amino acids forming the zinc-finger motif are highlighted in yellow. The conserved WRKY amino acid signature is highlighted in grey, and gaps are marked with dashes. The position of a conserved intron is indicated by an arrowhead.

Sequence comparisons and phylogenetic and structural analyses showed that the WRKY domains could be classified into three large groups corresponding to groups 1, 2 and 3 in Arabidopsis, as shown by Eulgem et al., 2000 (Figure 3). It is worth noting that group 1 contained 12 CsWRKY proteins, eight of which contained two WRKY domains. The other four (CsWRKY15, CsWRKY16, CsWRKY38 and CsWRKY39) contained only one WRKY domain but clustered with the CTWD (C-terminal WRKY domains) and NTWD (N-terminal WRKY domains), respectively. Our study further showed that CsWRKY15 and CsWRKY16 were actually two domains of one WRKY protein, while CsWRKY38 and CsWRKY39 were two independent WRKY proteins. Domain acquisition and domain loss events appear to have shaped the WRKY family [41, 42]. Thus, CsWRKY38 and CsWRKY39 may have arisen from a two-domain WRKY protein that lost one of its WRKY domains during evolution. The structure and phylogenetic tree of the CsWRKY domains clearly indicated that group 2 proteins can be divided into five distinct subgroups (2a-e). Compared with the 14 group 3 proteins in Arabidopsis, there are only 6 CsWRKY proteins in group 3. Whereas genome duplication events have resulted in the expansion of the WRKY genes in Arabidopsis and rice [17], it appears that such events have not occurred in the cucumber WRKY family. Huang et al. [40] reported that the cucumber genome shows no evidence of recent whole-genome duplication or tandem duplication. Consistent with this, we used the method of Schauser et al. [43] to search for small duplication blocks in the CsWRKY family but found none. In addition, a rooted phylogenetic tree of WRKY domains was constructed to identify putative orthologs in Arabidopsis and cucumber (additional file 1). All orthologs are listed in additional file 2.

Figure 3

Unrooted phylogenetic tree representing relationships among WRKY domains of cucumber and Arabidopsis. The amino acid sequences of the WRKY domains of all CsWRKY and AtWRKY proteins were aligned with Clustal W, and the phylogenetic tree was constructed using the neighbor-joining method in MEGA 4.0. For group 1 proteins, the suffix 'N' or 'C' indicates the N-terminal or the C-terminal WRKY domain, respectively. The red arcs indicate different groups (or subgroups) of WRKY domains. Diamonds represent orthologs from cucumber (blue) and Arabidopsis (red).

Analysis of the structure of CsWRKY genes showed that all WRKY genes except CsWRKY40 contained at least one intron. Two major types of intron splicing were found in the conserved WRKY domains of CsWRKY genes (Figure 2), similar to those in the WRKY domains of AtWRKY genes. However, the conserved introns were 2.8 times longer in cucumber (~686 bp) than in Arabidopsis (~241 bp). Coincidentally, this ratio was very similar to the size difference (2.9 times) between the genomes of cucumber (376 Mb) and Arabidopsis (125 Mb). The conserved motifs of WRKY family proteins in cucumber and Arabidopsis were investigated using Meme version 4.4 as described in the Methods (additional file 3), and a schematic overview of the identified motifs is given in additional file 4. As displayed schematically in Figure 4, one or more conserved motifs outside the WRKY domain can be detected in each WRKY protein, except for the members of groups 2c and 2e. The CsWRKY and AtWRKY proteins from groups 1 and 2 always share the same conserved motifs. In contrast, some group 3 AtWRKY proteins (AtWRKY63, AtWRKY64, AtWRKY66 and AtWRKY67) show Arabidopsis-specific conserved motifs (motifs 6, 7 and 8; additional file 3), whereas the other members of group 3 share the same conserved motifs with the CsWRKY proteins.

Figure 4

Schematic diagram of amino acid motifs of CsWRKY and AtWRKY proteins from different groups (or subgroups). Motif analysis was performed using Meme 4.0 software as described in the Methods. The selected WRKY proteins are listed on the left. The black solid line represents the corresponding WRKY protein and its length. The different-colored boxes represent different motifs and their positions in each WRKY sequence. Details of the motifs for all CsWRKY proteins are shown in additional file 4.

Expression profile of CsWRKY genes under normal growth conditions and under various abiotic stress conditions

We analyzed the expression of all CsWRKY genes under normal growth conditions in seven different tissues: cotyledons, leaves, roots, stems, female flowers, male flowers and fruits. Not all of the predicted genes were expressed in plants grown under normal growth conditions. Among the 55 predicted genes, 48 (87%) were expressed in at least one of the seven tissues (Figure 5). The other seven genes did not show any detectable expression by RT-PCR in these tissues, but they may be expressed in other tissues, e.g., seeds. Alternatively, some of the CsWRKY genes may be pseudogenes. The following ten genes were expressed at relatively high levels in all tested tissues: CsWRKY2, CsWRKY7, CsWRKY14, CsWRKY17, CsWRKY25, CsWRKY37, CsWRKY41, CsWRKY44, CsWRKY49 and CsWRKY57. Five WRKY genes (CsWRKY5, CsWRKY13, CsWRKY23, CsWRKY28 and CsWRKY55) were expressed at relatively low levels in all tested tissues.

Figure 5

Expression profiles of cucumber WRKY genes in various tissues as determined by RT-PCR analyses. Seven amplified bands from left to right for each WRKY gene represent amplified products from cotyledons, leaves, roots, stems, female flowers, male flowers and fruits.

We used RT-PCR analyses to examine the expression of CsWRKY genes in response to three different abiotic stresses: cold, drought and salinity. Of the 48 expressed CsWRKY genes, 23 showed differential expression in response to at least one stress, whereas the other 25 did not (Table 2). Notably, none of the stress-inducible CsWRKY genes belongs to group 3. We conducted real-time PCR analyses to confirm and quantify the expression levels of the 23 stress-inducible WRKY genes in response to abiotic stresses. As shown in Figure 6, RT-PCR and real-time PCR generally gave the same results for the expression profiles and abundance of transcripts. However, in rare instances, the difference in expression detected by real-time PCR was more significant than that detected by RT-PCR (Figure 6E). As shown in Table 2, real-time PCR showed that most of the stress-responsive genes were upregulated in response to abiotic stress (Figure 6A, B, C), and only three genes were downregulated (Figure 6D). Real-time PCR analysis also detected no differences in the expression of the six group 3 CsWRKY genes in response to abiotic stress (Figure 6F).

Table 2 CsWRKY gene expression patterns under abiotic stress as determined by RT-PCR and real-time PCR.
Figure 6

Expression patterns of six selected WRKY genes under abiotic stresses. In A-F, the top panel shows the RT-PCR result and the bottom panel shows the corresponding real-time PCR result. For real-time PCR, the relative amount of mRNA (y-axis) was calculated as described in the Methods. The cucumber β-actin gene was used as an internal control to normalize the data. The values 0, 0.5, 1, 3, 6, 12 and 24 (x-axis) indicate the treatment time (hours) under the corresponding abiotic stress. The error bars were calculated from three replicates. A-C, significantly upregulated expression of WRKY genes under abiotic stresses. D, significantly downregulated expression of CsWRKY53 under cold treatment. E, the expression difference detected by real-time PCR was more significant than that detected by RT-PCR. F, no significant expression difference was detected for the group 3 WRKY gene CsWRKY50 under abiotic stress. Statistical significance was assessed using Student's t-test.
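The quantification procedure itself is given in the Methods, which are not reproduced in this section. As an illustration only, the following Python sketch shows one widely used way to obtain a relative mRNA amount normalized to an internal control gene: the 2^-ΔΔCt calculation, taking the untreated (0 h) sample as the calibrator. The choice of this method, the assumption of equal amplification efficiency for both genes, and all Ct values below are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative mRNA amount by the 2^-ddCt calculation (ASSUMED method).

    ct_target / ct_reference: Ct of the gene of interest and of the
    internal control (e.g. beta-actin) in the treated sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator sample (here, the untreated 0 h sample).
    """
    delta_sample = ct_target - ct_reference              # normalize treated sample
    delta_calibrator = ct_target_cal - ct_reference_cal  # normalize calibrator
    return 2 ** -(delta_sample - delta_calibrator)

# Hypothetical Ct values (target, reference) for a short time course in hours.
time_course = {0: (26.0, 18.0), 1: (25.0, 18.0), 6: (24.0, 18.0)}
cal_target, cal_reference = time_course[0]

for hours, (ct_t, ct_r) in time_course.items():
    fold = relative_expression(ct_t, ct_r, cal_target, cal_reference)
    print(f"{hours} h: relative expression = {fold:.2f}")
```

Under this calculation the calibrator sample always has a relative expression of 1, so values above 1 indicate upregulation and values below 1 indicate downregulation relative to the untreated sample.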

Comparison of abiotic stress-inducible orthologs between cucumber and Arabidopsis

We compared the expression of CsWRKY genes with that of their possible orthologs in Arabidopsis under abiotic stress treatments. As shown in additional file 5, except for group 3 WRKY genes, Arabidopsis WRKY genes whose orthologous CsWRKY genes were not induced by abiotic treatments were also not stress-inducible. In addition, most of the AtWRKY genes orthologous to stress-inducible CsWRKY genes also responded to at least one type of stress treatment. These findings imply a possible correlation between the expression profiles of these orthologs in Arabidopsis and cucumber in response to abiotic stresses. Among the CsWRKY genes whose expression changed in response to abiotic stress, there were 13 for which stress-inducible orthologs existed in Arabidopsis (additional file 5). To investigate whether the expression of these orthologs was correlated between the two species, we compared the expression of these 13 pairs of orthologs under various stresses as described in the Methods section. This analysis generated a total of 22 sets of data (one pair of orthologs may be induced by more than one abiotic stress). As shown in Table 3, the correlation coefficients of 12 sets of data, more than half of the 22 sets, were greater than 0.5, indicating a positive correlation between the orthologous pairs under abiotic stresses (Figure 7A-D). The expression profiles of only two sets of data were negatively correlated (Figure 7G-H). Finally, the average correlation coefficient of the 22 datasets for all the putative orthologous WRKY genes was 0.40 and differed significantly (p < 0.01) from the average expression correlation of a control dataset composed of randomly chosen gene pairs (0.04) (Table 3). In contrast, when the correlation coefficients of group 3 CsWRKY and AtWRKY orthologs were calculated, there was no clear positive or negative correlation (Figure 7E-F).
Our results indicated that the expression profiles of stress-inducible CsWRKY genes are correlated with those of their putative AtWRKY orthologs, except for the group 3 WRKY genes. This finding suggests that the expression of group 3 WRKY orthologs differs between cucumber and Arabidopsis. All expression data used to calculate correlations are shown in additional file 6.
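The comparison of ortholog expression profiles rests on the Pearson correlation coefficient (Table 3). A minimal Python sketch of that calculation follows. The function names and the two expression profiles are hypothetical and invented for illustration. They are not data from additional file 6, and the exact normalization used in the study is the one described in the Methods.

```python
from math import sqrt

def mean_normalize(values):
    """Divide each value by the series mean so profiles share a common scale."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)

# Hypothetical treated/untreated ratios at seven time points (invented values).
cs_profile = [1.0, 1.4, 2.1, 3.8, 5.2, 4.1, 2.0]   # cucumber gene, real-time PCR
at_profile = [1.0, 1.1, 1.9, 3.0, 4.4, 4.6, 2.5]   # Arabidopsis ortholog, microarray

r = pearson(mean_normalize(cs_profile), mean_normalize(at_profile))
print(f"R = {r:.2f}")
```

Because the Pearson coefficient is unchanged by rescaling either series, mean normalization mainly serves to put the two platforms on a comparable axis for plotting; it does not alter R.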

Table 3 Pearson correlation coefficients for expression profiles of orthologs*
Figure 7

Pairwise comparisons of the expression profiles of putative orthologous cucumber and Arabidopsis WRKY genes under abiotic stresses. The relative expression of CsWRKY genes was obtained by real-time RT-PCR (indicated by triangles). Data are the means of three replicates with standard errors represented by bars. The CsWRKY expression data were compared with the mean-normalized expression data for their putative orthologous AtWRKY genes from a publicly available Arabidopsis microarray data set (indicated by circles) according to the description in Methods. The relative amount of mRNA (y-axis) was the ratio of treated to untreated sample. The treatment time (h) under the particular abiotic stress is presented on the x-axis. R indicates the correlation coefficient for expression between orthologs under the corresponding abiotic stresses. A distinct positive correlation was detected in most orthologs (A-D), but no obvious correlation was detected in group 3 orthologs (E-F). A negative correlation was detected in a small number of orthologs (G-H).

Evolutionary analysis of group 3 WRKY genes in Arabidopsis and cucumber

The group 3 WRKY genes seem to have greatly expanded in angiosperms after the divergence of the monocots and dicots (160 Mya) [44]. Here, we further investigated the duplication and diversification of group 3 WRKY genes after the divergence of the eurosids I group (which includes cucumber, soybean and poplar) and the eurosids II group (which includes Arabidopsis) (110 Mya). A phylogenetic tree of WRKY proteins encoded by group 3 WRKY genes of Arabidopsis (14), cucumber (6), poplar (10) and soybean (7) was constructed using the most primitive WRKY domain, that of Giardia lamblia, as an outgroup. This analysis showed that many of the group 3 AtWRKY proteins clustered together and displayed a close phylogenetic relationship (Figure 8), indicating that they arose after the divergence of the eurosids I and II. Two types of gene duplication events, tandem duplication and segmental duplication, were the main factors in the expansion of group 3 AtWRKY genes. In contrast, the phylogenetic analysis indicated that no gene duplication events have occurred in CsWRKY gene evolution, because no cucumber paralogs were detected. Hence, the different evolutionary patterns of group 3 WRKY genes in cucumber and Arabidopsis arose after their divergence.

Figure 8

Phylogram of group 3 WRKY domains from Arabidopsis (AtWRKY), cucumber (CsWRKY), poplar (PtWRKY) and soybean (GmWRKY). The phylogenetic tree was constructed using the neighbor-joining method as implemented in PHYLIP 3.2. Numbers on internal nodes are the percentage bootstrap support values (1000 re-samplings); only values exceeding 50% are shown. The most primitive Giardia lamblia WRKY C-terminal domain (GlWRKY 1C) was used as an outgroup. The letters T and S indicate nodes where tandem duplication and recent segmental duplication events have occurred, respectively. * indicates the AtWRKY genes associated with the gene duplication events.

To determine whether selection pressure had affected group 3 WRKY genes, we estimated the ω (dN/dS) values for all branches of group 3 WRKY genes in Arabidopsis and cucumber (Figure 9 and Table 4). In Arabidopsis, the ML estimates of dN/dS for all nodes under model M0 were < 1, with a mean value of 0.276 (Table 4), indicating that purifying selection was the predominant force acting on the evolution of the group 3 AtWRKY genes. However, the log-likelihood differences between model M3 and model M0 were statistically significant for all nodes tested, suggesting that selective pressure varied among branches and that some genes might have been under positive selection. We further used models M7 and M8 of PAML to address whether positive selection has played a role in the evolution of group 3 AtWRKY genes. Of the eight nodes analyzed, log-likelihood values were significantly higher under the M8 model than under the M7 model for five nodes (nodes 1, 2, 3, 4 and 5), which indicates that positive selection has contributed to the evolution of group 3 AtWRKY genes. Interestingly, the terminal nodes with clusters of duplicated AtWRKY genes were all under positive selection, suggesting a correlation between gene duplication and positive selection. Furthermore, we identified the positively selected sites under model M8 using the Bayesian method. Several positively selected sites were detected in the above five nodes, but only one was located in the region of the WRKY domains. Thus, it appears that, because of the high degree of conservation of the WRKY domains, positive selection acted mostly on the regions outside the WRKY domains. In cucumber, although the log-likelihood differences between model M3 and model M0 suggest that selective pressure varied among branches, there was no detectable positive selection in any of the nodes.
Assuming that there were no duplication events in CsWRKY genes and that positive selection is associated with duplication of WRKY genes as we described here, the extensive positive selection events were probably followed by the group 3 WRKY gene duplication events. This positive selection might be the main evolutionary force for group 3 AtWRKY genes. Because duplicated genes and positive selection are absent in cucumber, the functions of group 3 CsWRKY genes might be more conserved than those of AtWRKY genes.
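The model comparisons above (M0 versus M3, and M7 versus M8) are likelihood ratio tests: twice the difference in log-likelihood between nested models is compared with a chi-square distribution. A minimal Python sketch of that test follows. The degrees of freedom shown (4 for M0 versus M3 with three site classes, 2 for M7 versus M8) are the conventional choices and are an assumption here, since the text does not state them. The log-likelihood values are hypothetical and are not taken from Table 4.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """Upper-tail probability of the chi-square distribution for even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is exact when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires an even, positive df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for nested null and alternative models."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)

# Hypothetical log-likelihoods for one node (invented values).
lnl_m7, lnl_m8 = -2450.31, -2444.10

stat, p_value = likelihood_ratio_test(lnl_m7, lnl_m8, df=2)  # ASSUMED df = 2
print(f"2*delta lnL = {stat:.2f}, p = {p_value:.4f}")
```

A significant M7 versus M8 test indicates only that a class of sites with ω > 1 improves the fit; the individual sites are then identified in a separate step, here the Bayesian method under M8 mentioned above.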

Figure 9

Phylogram of group 3 WRKY genes of Arabidopsis and cucumber. The phylograms were constructed using the neighbor-joining method as implemented in PHYLIP 3.2. Numbers on the left of each internal node represent bootstrap support values (1000 re-sampling); only values exceeding 50% are shown. Numbers on the right of each node represent the nodes that were used for positive selection analysis. Arabidopsis AtWRKY1 was used as an outgroup. The trees represent phylogenetic relationships among (A) AtWRKY proteins and (B) CsWRKY proteins.

Table 4 Likelihood ratio test results of group 3 AtWRKY and CsWRKY.