Background

Radish (Raphanus sativus L.), a member of the Brassicaceae family, is an important root vegetable crop with multiple varieties, such as green-fleshed radish. Its flesh is green because it contains chlorophyll, which is necessary for photosynthesis. The expression of chlorophyll biosynthesis-related genes contributes to chlorophyll accumulation. Consistent with the presence of chlorophyll, chlorophyll fluorescence measurements have detected photosynthesis in the flesh of green-fleshed radish [1].

The basic helix-loop-helix (bHLH) transcription factor family is the second largest gene family in the plant kingdom and is usually classified into 15–26 subfamilies. Members of the bHLH gene family have been identified in many species, for example, 188 in apple (divided into 18 subfamilies [2]), 159 in tomato (divided into 25 subfamilies [3]), and 115 in grape (classified into 25 subfamilies [4]). bHLH transcription factors have a highly conserved bHLH domain of ∼60 amino acids, which comprises a basic region followed by two amphipathic α-helices separated by a variable loop region (HLH) [5]. The basic region mediates DNA binding, allowing bHLH proteins to bind to cis-acting elements in the promoter regions of target genes, and the HLH region functions as a dimerization domain that allows the formation of homo- and/or heterodimers.

Studies have shown that bHLH proteins can positively or negatively regulate chlorophyll biosynthesis. Phytochrome interacting factor1 (PIF1), a bHLH protein, negatively controls chlorophyll biosynthesis in the dark by regulating the expression of genes encoding heme oxygenase (HO3), protochlorophyllide oxidoreductase (POR), and ferrochelatase (FeChII), which are involved in the chlorophyll biosynthetic pathway [6]. Ectopic overexpression of a bHLH gene from Populus euphratica enhances tolerance to water-deficit stress and results in a higher chlorophyll content and photosynthetic rate in Arabidopsis [12].

bHLH genes have been identified in B. rapa and B. oleracea, with 251 B. rapa bHLHs and 268 B. oleracea bHLHs, respectively [13]. To understand the synteny relationships between RsbHLH genes and Bra/BolbHLH genes, syntenic orthologous gene pairs were identified between RsbHLHs and Bra/BolbHLHs. There were 950 syntenic orthologous gene pairs between 174 RsbHLHs and 209 BrabHLHs (Fig. 6b). A total of 20 RsbHLHs had only one syntenic orthologous BrabHLH, such as RsbHLH6, RsbHLH19, and RsbHLH36. The remaining RsbHLHs had at least two syntenic orthologous BrabHLHs. RsbHLHs with two syntenic BrabHLHs were the most numerous, such as RsbHLH61-BrabHLH080/BrabHLH218. RsbHLHs with four syntenic orthologous BrabHLHs ranked second, such as RsbHLH52-BrabHLH052/BrabHLH168/BrabHLH038/BrabHLH006. There were also cases where multiple RsbHLHs corresponded to one syntenic orthologous BrabHLH, such as RsbHLH14/RsbHLH98/RsbHLH110/RsbHLH149-BrabHLH081. There were 638 syntenic orthologous gene pairs between 162 RsbHLHs and 187 BolbHLHs (Fig. 6b). As with the BrabHLHs, the syntenic relationships between RsbHLHs and BolbHLHs fell into 1:1, 1:n, and n:1 (n ≥ 2) classes. A total of 24 RsbHLHs had only one syntenic orthologous BolbHLH, such as RsbHLH37, RsbHLH121, and RsbHLH128. RsbHLHs with two syntenic orthologous BolbHLHs were the most numerous (such as RsbHLH61-BolbHLH156/BolbHLH137), followed by RsbHLHs with three syntenic orthologous BolbHLHs (RsbHLH34-BolbHLH070/BolbHLH157/BolbHLH213). Likewise, there were also multiple RsbHLHs corresponding to one syntenic orthologous BolbHLH, such as RsbHLH8/RsbHLH40/RsbHLH175/RsbHLH193-BolbHLH080. To sum up, although B. oleracea had slightly more bHLH genes than B. rapa, there were considerably fewer syntenic orthologous bHLH pairs between B. oleracea and R. sativus than between B. rapa and R. sativus, which may reflect the closer relationship between B. rapa and R. sativus. This is supported by genetic studies showing that the size and structural characteristics of the R. sativus genome are similar to those of the B. rapa genome [14].
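
The 1:1, 1:n, and n:1 classes described above can be tallied directly from a list of syntenic gene pairs. The following is a minimal Python sketch, not the pipeline used in this study. The function name classify_synteny and the placeholder partner name BrabHLH_x are hypothetical, and the input is a tiny illustrative subset of pairs rather than the full set.

```python
from collections import Counter, defaultdict


def classify_synteny(pairs):
    """Summarise syntenic relationships from (radish_gene, partner_gene) pairs.

    Returns:
      one_to_n: Counter mapping "number of partners" -> number of radish genes
      n_to_one: dict mapping each partner gene shared by >= 2 radish genes
                to the sorted list of those radish genes
    """
    radish_to_partner = defaultdict(set)
    partner_to_radish = defaultdict(set)
    for radish_gene, partner_gene in pairs:
        radish_to_partner[radish_gene].add(partner_gene)
        partner_to_radish[partner_gene].add(radish_gene)

    one_to_n = Counter(len(partners) for partners in radish_to_partner.values())
    n_to_one = {
        partner: sorted(genes)
        for partner, genes in partner_to_radish.items()
        if len(genes) >= 2
    }
    return one_to_n, n_to_one


# Illustrative subset only; "BrabHLH_x" is a hypothetical placeholder name.
pairs = [
    ("RsbHLH6", "BrabHLH_x"),
    ("RsbHLH61", "BrabHLH080"),
    ("RsbHLH61", "BrabHLH218"),
    ("RsbHLH14", "BrabHLH081"),
    ("RsbHLH98", "BrabHLH081"),
]

one_to_n, n_to_one = classify_synteny(pairs)
print(dict(one_to_n))
print(n_to_one)
```

Keeping both directions of the mapping makes it possible to report 1:n and n:1 cases from a single pass over the pairs.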

Genome duplication events can contribute to the expansion of gene families in the plant kingdom. The results suggested that four types of duplication existed among the RsbHLH members, namely whole genome duplication (WGD) or segmental duplication, dispersed duplication, proximal duplication, and tandem duplication (Additional file 3). A total of 162 (76%) RsbHLH genes were duplicated and retained from a WGD or segmental event, revealing that WGD or segmental duplication was the main driving force for the expansion of the radish bHLH gene family. Eight tandem events involving 16 RsbHLH genes were identified and located on four chromosomes. Among these events, two (RsbHLH10 and RsbHLH11, RsbHLH96 and RsbHLH97) were located on chromosomes R1 and R5, respectively. Three events (RsbHLH57 and RsbHLH58, RsbHLH59 and RsbHLH60, and RsbHLH69 and RsbHLH70) took place on chromosome R4, and the remaining three events (RsbHLH133 and RsbHLH134, RsbHLH136 and RsbHLH137, and RsbHLH144 and RsbHLH145) took place on chromosome R6. Finally, dN/dS was calculated for the 48 paralogous gene pairs to assess the selection pressure (Additional file 4). All of the RsbHLH paralogous gene pairs had dN/dS < 1 (dN, non-synonymous substitution rate; dS, synonymous substitution rate), implying that these RsbHLH genes had experienced strong purifying selection.
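
To make the dN/dS criterion concrete, the sketch below labels a gene pair by its ratio using the usual convention: below 1 suggests purifying selection, about 1 suggests neutral evolution, and above 1 suggests positive selection. It assumes dN and dS have already been estimated by a dedicated tool. The function name selection_type and the numeric values are hypothetical and are not taken from Additional file 4.

```python
def selection_type(dn, ds, tol=1e-9):
    """Return (dN/dS, label) for one paralogous gene pair.

    dn: non-synonymous substitution rate
    ds: synonymous substitution rate (must be > 0)
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    ratio = dn / ds
    if abs(ratio - 1.0) <= tol:
        label = "neutral"
    elif ratio < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return ratio, label


# Hypothetical rates for illustration only.
ratio, label = selection_type(dn=0.05, ds=0.5)
print(round(ratio, 3), label)
```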

Promoter cis-element analysis

bHLH genes take part in the regulation of plant growth and development and in responses to various abiotic stresses. To further investigate the potential biological functions of RsbHLH genes, the cis-acting regulatory elements in their promoter regions were analyzed using the PLACE tool. As shown in Fig. 7 and Additional file 5, the cis-acting regulatory elements of RsbHLH genes fell into three main categories. Category one was associated with plant growth and development, such as flavonoid biosynthesis, phytochrome expression, and circadian control. The motifs in this category included MBSI, circadian, CAT-box, and RY-element. A light-responsive element was present in the promoter regions of 208 RsbHLH genes, indicating that the expression of RsbHLH genes may be controlled by light. Because genes involved in chlorophyll metabolism are regulated by light [15], RsbHLH genes may be related to chlorophyll metabolism. Category two was related to phytohormones, such as gibberellin, methyl jasmonate, and abscisic acid. The motifs in this category included CGTCA-motif, ABRE, TCA-element, and P-box. Category three was involved in abiotic stresses, such as low-temperature responsiveness, light responsiveness, and drought-inducibility. The motifs in this category included LTR, G-box, MBS, and WUN-motif.

Fig. 7

Cis-regulatory elements in the promoter regions of the RsbHLH genes. Different cis-regulatory elements are represented with different colored boxes, which are placed at the top on the right. The element size is estimated by the scale at the bottom

Analysis of GO enrichment and expression

To predict the potential biological functions, GO enrichment analysis was performed with WEGO to show three aspects of functional classification, namely cellular component, molecular function, and biological process (Fig. 8). Among the 213 RsbHLH genes, 207 were enriched in biological process, and most of these participated in “metabolic process” (such as “primary metabolic process” and “biosynthetic process”) and “biological regulation” (such as “regulation of metabolic process” and “regulation of cellular process”). Additionally, 85, 74, and 72 RsbHLHs were involved in “response to stimulus”, “developmental process”, and “multicellular organismal process”, respectively. “Response to stimulus” was mainly associated with “response to abiotic stimulus” (cold, salt, light, etc.) and “response to chemical” (gibberellin, abscisic acid, jasmonic acid, etc.), which was consistent with the promoter element analysis above. “Developmental process” mainly involved “anatomical structure development” (fruit development, flower development, carpel development, etc.). “Multicellular organismal process” mainly comprised photomorphogenesis, guard cell differentiation, root hair initiation, and so on. In summary, RsbHLHs could play an important role in the growth and development of radish.

Fig. 8

The GO annotation of RsbHLH genes. All annotated GO terms include cellular component, molecular function and biological process. The y axis indicates the number of genes

A comparative RNA-seq analysis between green-fleshed radish (GF) and white-fleshed radish (WF) was performed to study the expression patterns of the 213 RsbHLHs at five growth stages. RsbHLH genes with FPKM < 1 at all five stages in both GF and WF were regarded as unexpressed, so only 119 RsbHLH genes were retained for expression analysis. The expression patterns of these 119 RsbHLHs across the five stages varied greatly (Fig. 9). For example, some RsbHLHs, such as RsbHLH29, were stably expressed at all five stages in both GF and WF. RsbHLH105 and RsbHLH154 were highly expressed only at stage 3 in both GF and WF. RsbHLH36 was highly expressed at all five stages in GF but lowly expressed at all five stages in WF. Conversely, RsbHLH140 was highly expressed at all five stages in WF but lowly expressed at all five stages in GF. RsbHLH130 had a relatively high expression only at stage 3 in WF. The varied expression patterns of RsbHLHs across the five stages suggested that bHLH members may play a vital role throughout the growth and development of the radish taproot. Differentially expressed gene (DEG) analysis revealed 19, 16, 22, 18, and 13 differentially expressed RsbHLHs between GF and WF at stages 1 to 5, respectively (Additional file 6). Four RsbHLHs (RsbHLH36, RsbHLH44, RsbHLH69, and RsbHLH140) were differentially expressed at all five stages. The differential expression of RsbHLHs may contribute to the differences in traits between green-fleshed and white-fleshed radish.
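
The expression filter described above (a gene is treated as unexpressed when FPKM < 1 in every stage of both GF and WF) can be sketched as follows. This is an illustrative sketch rather than the authors' code. The function name expressed_genes, the gene labels, and the FPKM values are hypothetical.

```python
def expressed_genes(fpkm_by_gene, threshold=1.0):
    """Return the sorted names of genes with FPKM >= threshold in at least one sample.

    fpkm_by_gene maps a gene name to its FPKM values across all samples
    (here: five stages in GF followed by five stages in WF).
    """
    return sorted(
        gene
        for gene, values in fpkm_by_gene.items()
        if any(value >= threshold for value in values)
    )


# Hypothetical FPKM values: 5 GF stages followed by 5 WF stages.
fpkm_by_gene = {
    "geneA": [0.2, 0.1, 0.0, 0.4, 0.3, 0.1, 0.0, 0.2, 0.5, 0.9],  # never reaches 1
    "geneB": [0.1, 0.3, 3.5, 0.2, 0.0, 0.1, 0.2, 0.4, 0.1, 0.0],  # expressed at one stage
}

print(expressed_genes(fpkm_by_gene))
```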

Fig. 9

Expression patterns of RsbHLH genes at five developmental stages of GF and WF. GF indicates the green-fleshed radish ‘Cuishuai’; WF indicates the white-fleshed radish ‘Zhedachang’. The colour scale is shown at the top right. Higher expression levels are shown in red and lower expression levels in blue

To assess the accuracy and reproducibility of the transcriptome data, fifteen RsbHLH genes were selected for transcript abundance analysis by qRT-PCR. The expression trends of these fifteen genes detected by qRT-PCR in the fifth stage were in line with the RNA-Seq results (Additional files 7 and 8), confirming the reliability of the transcriptome data.

Candidate bHLHs involved in chlorophyll metabolism in Raphanus sativus L

Weighted gene co-expression network analysis (WGCNA) was used to analyze the connection between genes and physiological traits and to discover the key genes associated with those traits. There were 8666 DEGs between GF and WF, including 46 bHLH genes, and these DEGs were subjected to WGCNA. The expression profiles of the 8666 genes were grouped into 16 modules (MEs), comprising 15 different co-expression networks (ME1-ME15) and the outliers that did not belong to any cluster (ME0). As shown in Fig. 10a, different colors represent different modules, with module sizes ranging from 27 to 2510. To identify modules that were significantly associated with chlorophyll content, module-trait correlations were calculated (Fig. 10b). Chlorophyll a content was significantly negatively correlated with the turquoise module (-0.9) and the midnightblue module (-0.7), and significantly positively correlated with the blue module (0.86) and the brown module (0.72). Chlorophyll b content was significantly positively correlated with the blue module (0.8) and significantly negatively correlated with the turquoise module (-0.8). These four modules contained 23 bHLH genes, which were used to screen for hub genes.
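
A module-trait association of the kind reported in Fig. 10b is a Pearson correlation between a module's summary expression profile (its eigengene) and the trait across samples. The sketch below illustrates only that final correlation step in plain Python. The eigengene and chlorophyll values are hypothetical, and the eigengene is taken as given rather than computed from expression data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values across six samples (not data from this study).
module_eigengene = [0.45, 0.38, 0.30, -0.35, -0.40, -0.38]
chlorophyll_a = [0.10, 0.15, 0.22, 0.80, 0.95, 0.90]

r = pearson(module_eigengene, chlorophyll_a)
print(round(r, 2))  # a strongly negative value for this toy module
```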

Fig. 10

a Hierarchical cluster tree revealing gene co-expression modules identified by WGCNA. The branches contain 15 modules labeled in different colors. Except for the gray module, the modules are named ME1 to ME15. b Module-trait associations. Columns correspond to chlorophyll content, and rows correspond to the module eigengenes. Each cell shows the Pearson correlation coefficient, with the p-value in parentheses. Cell color ranges from red (high positive correlation) to blue (high negative correlation)

A gene with high within-module connectivity is considered a hub gene of that module [16]. Hub genes can be more important than the other genes in the co-expression network and are regarded as representatives of their modules. Based on high connectivity, four bHLH genes were regarded as hub genes: RsbHLH140 (turquoise), RsbHLH52 (blue), and RsbHLH36 and RsbHLH49 (brown). DEGs co-expressed with these four bHLH genes were screened out for further analysis.
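
Within-module connectivity is commonly taken as the sum of a gene's connection strengths to the other genes in the same module. The sketch below ranks genes by that sum under this assumption. It is not the authors' procedure, and the function names, gene labels, and adjacency values are all hypothetical.

```python
def intramodular_connectivity(adjacency, module_genes):
    """Sum each gene's adjacency to every other gene in the same module.

    adjacency: dict of dicts, adjacency[g][h] = connection strength in [0, 1]
    module_genes: iterable of gene names belonging to one module
    """
    genes = list(module_genes)
    return {
        g: sum(adjacency[g].get(h, 0.0) for h in genes if h != g)
        for g in genes
    }


def hub_gene(adjacency, module_genes):
    """Return the gene with the highest intramodular connectivity."""
    connectivity = intramodular_connectivity(adjacency, module_genes)
    return max(connectivity, key=connectivity.get)


# Hypothetical three-gene module.
adjacency = {
    "geneA": {"geneB": 0.8, "geneC": 0.6},
    "geneB": {"geneA": 0.8, "geneC": 0.1},
    "geneC": {"geneA": 0.6, "geneB": 0.1},
}

print(hub_gene(adjacency, ["geneA", "geneB", "geneC"]))
```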

Two DEGs (Rs498020 and Rs428920) co-expressed with RsbHLH52 and three DEGs (Rs536440, Rs386330, and Rs340620) co-expressed with RsbHLH140 were found to be involved in the chlorophyll metabolic pathway [1]; these five genes are shown in Fig. 11. Furthermore, as shown in Fig. 10b, RsbHLH140 was negatively correlated with chlorophyll content, while RsbHLH36, RsbHLH49, and RsbHLH52 were positively correlated with chlorophyll content. We therefore infer that RsbHLH140 could negatively regulate chlorophyll metabolism, whereas RsbHLH36, RsbHLH49, and RsbHLH52 could positively regulate it, which is consistent with the gene expression patterns (Fig. 9). To sum up, these four bHLH genes may be involved in chlorophyll metabolism and thereby associated with GF photosynthesis.

Fig. 11

DEGs co-expressed with RsbHLH52 in the blue module (a) and RsbHLH140 in the turquoise module (b)

Discussion

The whole genome duplication greatly facilitates the expansion of the bHLH gene family

Whole genome duplication (WGD) events have occurred throughout plant evolution and have been a driving force for the expansion of gene families [17]. In radish, several gene families, such as the MYB, HSF, and CPA gene families, also expanded mainly through whole genome duplication to generate their large numbers of members [18,19,20]. In this study, 213 bHLH genes were identified from the radish genome, showing that the bHLH gene family in radish had expanded in comparison with that in Arabidopsis. The results showed that 162 (76%) RsbHLH genes were duplicated and retained from the WGD event, implying that WGD greatly promoted the expansion of the RsbHLH gene family. This result was consistent with bHLH gene family analyses in other species, such as Nelumbo nucifera, Fagopyrum tataricum, and Xanthoceras sorbifolia Bunge [21,22,23].

Gene duplication promotes the formation of paralogous gene pairs. A total of 48 paralogous gene pairs were identified in our results, including 8 pairs from tandem duplication and 36 pairs from WGD. Among these, 16 paralogous gene pairs differed in gene structure, and exon gain/loss was observed. For instance, RsbHLH17 contained four exons, while its paralog RsbHLH170 had seven exons (Fig. 2), indicating that a gain of three exons occurred during evolution. RsbHLH138 contained six exons, while its paralog RsbHLH193 had four exons, indicating that a loss of two exons occurred during evolution. A similar pattern was also reported in the bHLH gene families of Ginkgo biloba and Xanthoceras sorbifolia Bunge [23, 24]. These gains and losses could result from chromosomal rearrangements and fusions, and may give rise to functional diversification of the gene families [25]. Additionally, 12 paralogous gene pairs differed in their motifs, of which four pairs had the same gene structures, such as RsbHLH10-RsbHLH11 and RsbHLH74-RsbHLH152. Differences in gene sequence may lead to changes in protein domains even when the gene structures are the same. These proteins may have undergone extensive domain shuffling during the WGD [26]. We also found that although some paralogous gene pairs had different gene structures, they had the same motifs, such as RsbHLH52-RsbHLH161 and RsbHLH93-RsbHLH124, revealing that the sequences of the protein motifs were conserved during evolution.

Duplicated genes can acquire new functions or partition the original functions to improve adaptability to the environment [27]. There are four fates for duplicated genes: (1) both copies retain the original function; (2) one copy is silenced; (3) one copy retains the original function while the other acquires a new function, called neofunctionalization; (4) the two copies partition the original functions and take on different functions, called subfunctionalization [28, 29]. Functional divergence of duplicated genes can alter their expression patterns. Here, some paralogous gene pairs showed different expression patterns. For example, RsbHLH2 was expressed at all five stages in GF and WF, while its paralog RsbHLH100 was not expressed at any of the five stages in GF or WF. RsbHLH92 had its lowest expression level at the third stage in GF and WF, while its paralog RsbHLH123 had its highest expression level at the third stage in GF and WF. In short, the distinct expression patterns of the paralogous bHLH gene pairs may contribute to the formation of unique traits and improved environmental adaptability in radish.

bHLH genes may have important functions in photosynthesis in green-fleshed radish

Transcription factors (TFs) are activated and bind to the promoter of the crucial genes involved in various biological pathways, and regulate the growth and development of plants. As one of the most important biological pathways for plants, photosynthesis is regulated by a variety of transcription factors [30,31,32]. The bHLH gene family is the second largest TF family, playing an important role in the regulation of photosynthesis in plants [33].

In addition to leaves, photosynthesis also occurs in many non-foliar organs. Chlorophyll fluorescence results show that photosynthesis can occur in the chlorophyll-rich flesh of green-fleshed radish [1]. In our study, four bHLH genes (RsbHLH36, RsbHLH49, RsbHLH52, and RsbHLH140) may participate in controlling photosynthesis by influencing chlorophyll content. In the bHLH gene family, a few members are generally considered negative regulators of photosynthesis. For example, phytochrome interacting factor1 (PIF1) and phytochrome interacting factor4 (PIF4) negatively regulate chlorophyll biosynthesis or promote chlorophyll degradation, lowering the chlorophyll content and hindering photosynthesis [6, 9]. According to the WGCNA analysis, RsbHLH140 was significantly negatively correlated with chlorophyll content. In addition, RsbHLH140 had a higher expression level in WF than in GF (Fig. 9). Three DEGs (Rs536440, Rs386330, and Rs340620) co-expressed with RsbHLH140 were involved in the chlorophyll biosynthesis pathway, and they were expressed in GF but hardly expressed in WF [1], suggesting that the higher expression of RsbHLH140 may suppress the expression of these three genes in WF. The A. thaliana orthologs of these three genes (AT3G56940, AT1G74470, and AT3G51820) are annotated in the TAIR database and participate in the chlorophyll biosynthetic process [34]. RsbHLH140 may therefore act as a negative regulator of photosynthesis by suppressing chlorophyll biosynthesis. Knocking out a negatively regulating bHLH gene has been reported to enhance photosynthesis: Chen et al. [35] found that knocking out the bHLH gene NRP1 gives rise to greater photosynthesis and increased biomass in rice.

Additionally, a few bHLH genes are regarded as positive regulators of photosynthesis. In contrast to PIF1, PIF3 has been shown to be a positive regulator of photosynthesis. PIF3 contributes to chlorophyll accumulation and acts positively in chloroplast development [36]. Overexpression of PebHLH35 from Populus euphratica results in a higher chlorophyll content and enhances the photosynthetic rate.

Search of cis-elements in the RsbHLH gene promoter regions

The 2000 bp genomic sequences upstream of the translation start codon of each RsbHLH gene were extracted from the Raphanus sativus L. genome. These 2000 bp regions were regarded as the promoter sequences of the RsbHLH genes. The cis-regulatory elements of RsbHLH genes were screened from these promoter regions using the online tool PLACE (https://www.dna.affrc.go.jp/PLACE/?action=newplace).
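
Extracting the region upstream of a start codon has to account for gene strand: for a gene on the reverse strand, the promoter lies to the right of the start codon in forward-strand coordinates and must be reverse-complemented. The sketch below shows one way to do this in plain Python. It is an illustration under stated assumptions, not the extraction procedure used in this study; the function names and the toy sequences are hypothetical, and coordinates are assumed to be 0-based and half-open on the forward strand.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, codon_start, codon_end, strand, length=2000):
    """Return up to `length` bases upstream of a start codon, in gene orientation.

    codon_start, codon_end: 0-based, half-open interval of the start codon
    on the forward strand of chrom_seq.
    strand: "+" or "-".
    The region is truncated if it would run past the end of the sequence.
    """
    if strand == "+":
        return chrom_seq[max(0, codon_start - length):codon_start]
    if strand == "-":
        return reverse_complement(chrom_seq[codon_end:codon_end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences with a 3 bp "promoter" for readability.
print(upstream_region("AAACCCATGGGG", 6, 9, "+", length=3))
print(upstream_region("TTTCATGGGAAA", 3, 6, "-", length=3))
```

Both toy calls return the bases that immediately precede ATG when the gene is read in its own orientation.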

RNA-seq and qRT-PCR analysis of RsbHLH genes

Total RNA was extracted from the flesh tissues using Trizol reagent (Promega, Madison, WI, USA). The purity, concentration, and integrity of the RNA were assessed with the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA), the Qubit® 2.0 Fluorometer (Life Technologies, CA, USA), and the Bioanalyzer 2100 system (Agilent Technologies, CA, USA), respectively. After RNA quality assessment, 30 sequencing libraries were generated using the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina (NEB, USA) according to the manufacturer's protocols. Library quality was checked on the Agilent Bioanalyzer 2100 system. Finally, 150 bp paired-end sequencing was performed on an Illumina HiSeq X Ten platform. After screening, the high-quality clean reads were aligned to the reference Raphanus sativus L. genome using the HISAT2 software [41]. The read count for each gene was obtained from the mapping results using the featureCounts software [42]. The expression level of each gene was estimated as fragments per kilobase of exon per million mapped reads (FPKM), calculated using the countToFPKM package (https://github.com/AAlhendi1707/countToFPKM). The formula is as follows: \(\text{FPKM} = \frac{10^{6}\,C}{N L / 10^{3}}\), where C is the number of fragments specifically mapped to the gene, N is the total number of fragments specifically mapped to the reference genome, and L is the number of bases in the coding region of the gene. Genes were considered differentially expressed genes (DEGs) between GF and WF with FDR ≤ 0.05, Padj < 0.01, and |log2 FC| ≥ 1.5, based on the DEGSeq R package [43]. The expression patterns of RsbHLH genes across the five developmental stages were visualized with the TBtools software [44].
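
As a worked example of the FPKM formula above, the sketch below evaluates it for one gene. It simply restates the formula in Python and does not reproduce the countToFPKM package; the function name fpkm and the input numbers are hypothetical.

```python
def fpkm(c, n, l):
    """FPKM = 10^6 * C / (N * L / 10^3).

    c: fragments mapped to the gene (C)
    n: total fragments mapped to the reference genome (N)
    l: number of bases in the coding region of the gene (L)
    """
    if n <= 0 or l <= 0:
        raise ValueError("N and L must be positive")
    return (1e6 * c) / (n * l / 1e3)


# Hypothetical gene: 500 fragments, 20 million mapped fragments, 1500 bp.
value = fpkm(c=500, n=20_000_000, l=1500)
print(round(value, 2))  # 16.67
```

Here 10^6 × 500 = 5 × 10^8 and N × L / 10^3 = 3 × 10^7, so the ratio is about 16.67, which is above the FPKM ≥ 1 level used to call a gene expressed.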

Fifteen bHLH genes were randomly chosen for verification by qRT-PCR. Specific primers for the fifteen genes were designed with the Primer5 software, and the primer sequences are listed in Additional file 9. qRT-PCR was performed on the Bio-Rad Real-Time PCR platform with the Quant One Step qRT-PCR Kit (Tiangen). The Actin gene was used as the internal control to normalize the results, and the 2^(−ΔΔCT) method was applied to calculate the relative expression level [45]. All reactions were carried out under the following conditions: 95 °C for 15 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s. After each run, a melting curve was generated to ensure product specificity and to check for the presence of primer dimers.
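
The 2^(−ΔΔCT) calculation can be illustrated with a short sketch. The target gene CT is first normalized to the internal control (Actin here) within each sample, and the test sample is then compared with a calibrator sample. The function name and the CT values below are hypothetical and only demonstrate the arithmetic.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^(-ddCT) method.

    ct_target_test / ct_ref_test: CT of target and reference gene in the test sample
    ct_target_cal / ct_ref_cal:   CT of target and reference gene in the calibrator
    """
    delta_ct_test = ct_target_test - ct_ref_test
    delta_ct_cal = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_test - delta_ct_cal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: dCT(test) = 6, dCT(calibrator) = 8, ddCT = -2.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```

A ΔΔCT of −2 corresponds to a four-fold higher relative expression in the test sample than in the calibrator.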

The weighted gene co-expression network analysis

The co-expression network of DEGs was constructed using the WGCNA package in RStudio [46]. The pickSoftThreshold function was used to determine a soft threshold (power) value according to the approximate scale-free topology criterion. A soft threshold of 10 was used to establish the co-expression network on the basis of the adjacency matrix. The automatic network construction function blockwiseModules was applied to obtain weighted co-expression clusters, called modules, with the following parameters: power = 10, TOMType = unsigned, minModuleSize = 30, reassignThreshold = 0, minKMEtoStay = 0.3, mergeCutHeight = 0.25. The chlorophyll content data were taken from our previous research [1].
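
To show what the soft threshold does, the sketch below raises absolute pairwise correlations to the chosen power, which is the usual form of an unsigned weighted adjacency. It is a plain-Python illustration and not a call into the WGCNA R package. The assumption that the network itself was unsigned, the function name unsigned_adjacency, and the correlation values are not taken from this study.

```python
def unsigned_adjacency(correlation_matrix, power=10):
    """Soft-threshold a correlation matrix: a_ij = |cor_ij| ** power."""
    return [[abs(r) ** power for r in row] for row in correlation_matrix]


# Hypothetical correlations among three genes.
correlations = [
    [1.0, 0.9, -0.5],
    [0.9, 1.0, 0.2],
    [-0.5, 0.2, 1.0],
]

adjacency = unsigned_adjacency(correlations, power=10)
for row in adjacency:
    print([round(a, 4) for a in row])
```

With power = 10, a correlation of 0.9 keeps an adjacency of about 0.35, while a correlation of 0.5 drops below 0.001. Strong co-expression is therefore emphasized and weak correlations are pushed toward zero without a hard cutoff.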