Introduction

A characteristic feature of vascular plants is that CO2 is fixed by photosynthesis in source leaves and then transported to, and utilized by, different sink organs for growth. In this process, three key factors affect the source-to-sink relationship: (i) photosynthetic capacity, which determines carbon availability; (ii) sugar transport; and (iii) carbon utilization and storage in sink organs [1]. During plant growth and development, sink organs and tissues are dynamic [2, 3]. For example, immature leaves and shoot apical meristems are sinks during vegetative stages, whereas developing flowers and seeds become sinks during reproductive stages. Therefore, the ability of sink organs to obtain, utilize, and store carbon (so-called ‘sink strength’) is dynamic and tightly controlled [4, 5]. Moreover, the distribution of carbon utilization and storage within a sink tissue (carbon allocation) is well coordinated. The C4 grasses include important bioenergy crops, such as maize, sorghum, switchgrass, and sugarcane, and are the most significant plant source of carbohydrates and bioethanol [6]. Among these C4 crops, Sorghum bicolor is an excellent system for studying carbon allocation, because sweet sorghum varieties have two sink organs: seeds and stems [7, 8]. Although a significant portion of carbon reserves is in cell wall components, large amounts of soluble sugars (primarily sucrose) and starch accumulate in sorghum stems after flowering. This feature makes sweet sorghum an attractive model for studying carbon partitioning and sugar accumulation relevant to other bioenergy crops such as sugarcane [9, 10]. In addition, sweet sorghum can accumulate considerable amounts of starch in the internode [11] and shows differential expression of cell wall-related genes compared with non-sweet genotypes [12], indicating that carbon utilization within sweet sorghum internodes may be redirected to establish sink strength.
Sorghum is also an emerging bioenergy crop with multiple advantages: (i) a ~ 730-Mb diploid genome and several reference assemblies with strong synteny to maize and sugarcane [13,14,15,16]; (ii) good tolerance to several abiotic stresses and desirable agronomic features, such as the stay-green trait [7, 17, 18]; and (iii) rich genetic resources [19], such as several EMS resources [20,36,37,38,39]. Nevertheless, the molecular mechanism regulating stem sugar concentrations remains unclear. Physiological studies using radiolabeling and dye-transport approaches suggest that sucrose may be transported to storage parenchyma via apoplasmic and/or symplasmic routes [40,41,42,43].

Carbohydrates are stored in sorghum stems in three major forms: sucrose in vacuoles, starch in plastids, and lignocellulosic cell wall biomass [26]. Vacuolar sucrose storage may involve several families of sugar transporters, such as Sucrose Transporters (SUTs), Tonoplast Sugar Transporters (TSTs), and Sugars Will Eventually be Exported Transporters (SWEETs). The expression profiles of these transporters have been examined in sweet and grain genotypes [12, 42,43,44,45,46], and SbTST2 has been suggested as a candidate gene for the difference in stem sugar between sweet and grain sorghum lines [46]. The sorghum SWEETs fall into four phylogenetically defined clades, and evidence of a phylogeny–function correlation has been shown in several species [47,48,49,50,51,52,53,54]. Starch synthesis requires a suite of well-characterized enzymes and transporters (reviewed in [55, 56]), including ADP-glucose pyrophosphorylase (AGPase), soluble starch synthase (SS), granule-bound starch synthase (GBSS), starch branching enzyme (SBE), starch debranching enzyme (DBE)/isoamylase (ISA), and glucose-6-phosphate translocators (GPTs) that fuel starch synthesis with glucose-1-phosphate (G1P) [57]. Starch is degraded by a set of enzymes, including glucan–water dikinase (GWD), phosphoglucan–water dikinase (PWD), α- and β-amylases (AMY and BAM, respectively), and disproportionating enzyme (DPE) [58,59,60].

Plant cell walls comprise primary and secondary cell walls (PCW and SCW, respectively). The PCW, composed mainly of cellulose, hemicellulose, and pectin, is present in all plant cell types and is extensible, allowing cell expansion driven by turgor pressure. The SCW, composed mainly of lignin crosslinked with cellulose and hemicellulose, is deposited in specific cell types to provide mechanical support and serve as a defensive barrier. Cellulose, the most abundant structural polysaccharide in plant cell walls, is synthesized by cellulose synthases encoded by the CesA gene family [61, 62], of which two phylogenetic groups are responsible for PCW and SCW biosynthesis, respectively [63]. Hemicelluloses are branched heteropolysaccharides synthesized by cellulose synthase-like (Csl) enzymes. Lignin is a complex heteropolymer crosslinked from three monolignols, namely p-coumaryl (H), coniferyl (G), and sinapyl (S) alcohols [64]. Ten major gene families required for monolignol biosynthesis have been well studied in sorghum at the genome level [65]: phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), 4-coumarate:CoA ligase (4CL), hydroxycinnamoyl-transferase (HCT), 4-coumarate 3-hydroxylase (C3H), cinnamoyl-CoA reductase (CCR), cinnamyl alcohol dehydrogenase (CAD), caffeic acid O-methyltransferase (COMT), caffeoyl-coenzyme A 3-O-methyltransferase (CCoAOMT), and ferulate 5-hydroxylase (F5H). Seven of the ten enzymes have been investigated structurally and biochemically (see “Methods”). Three Brown midrib (Bmr) loci are known to encode enzymes of monolignol biosynthesis [33].

The total sugar concentration of juice extracted from internodes was measured as Brix [75]. All internode samples were collected in the field between 9:00 and 11:00 AM and stored on ice. After the samples were brought back to the laboratory, juice was extracted immediately.

RNA-seq data analysis

Three RNA-seq data sets were used (Additional file 2). The first data set comprises transcriptomes of sugar-accumulating internodes from the conversion line R9188 and its two parents, Rio and BTx406, collected at the flag leaf stage, at flowering, and at 10 and 15 days after flowering (designated T1, T2, T3, and T4, respectively) [12]. The dwarf inbred line R9188 was developed from a BTx406/Rio cross followed by one backcross to Rio, and it carries the early flowering and dwarf loci introgressed from BTx406 [76]. RNA was extracted from pooled upper internodes (internodes 2, 3, and 4, numbered from top to bottom) of Rio, BTx406, and R9188 as described elsewhere [12].

The second RNA-seq data set comprises transcriptomes of Della internodes collected at eight developmental stages (29, 16, and 7 days before anthesis; anthesis; and 11, 25, 43, and 68 days after anthesis; designated A-29, A-16, A-7, A0, A11, A25, A43, and A68, respectively) [26]. In particular, the Della stem was fully mature at A-7, and the grains reached the soft dough stage at A25 and were fully mature before A43. RNA was extracted from the tenth internode (numbered from bottom to top) of greenhouse-grown plants, whereas field-grown Della in Texas, USA, had 14–15 internodes, as previously described [26].

The third RNA-seq data set comprises SIL-05 transcriptomes of internode, panicle, and leaf tissues at three stages: 1, 17, and 36 days after heading (1 DAH, 17 DAH, and 36 DAH, respectively) [45]. SIL-05 flowers between 1 and 17 DAH; sucrose starts to accumulate in the SIL-05 stem between 1 and 36 DAH and can reach 18.9% in juice at 64 DAH. RNA was extracted from the internode corresponding to the leaf below the flag leaf of SIL-05, as described previously [45, 77].

The stages of the three RNA-seq data sets were aligned relative to anthesis, and the stages spanning stem sugar accumulation were identified for each genotype (Fig. 1). To address the lack of replicates (data set 3) and the unavailability of raw data (data set 2), we used the following strategies (Additional files 3, 4). (1) The raw reads of data sets 1 and 3 were quality filtered and analyzed with the same pipeline. Reads were mapped to the sorghum reference genome (BTx623, Sbicolor_v2.1_255) using TopHat v2.0.14 (a maximum mismatch of 9 bp and default settings for other parameters) [13, 78]. Read counts were calculated with ‘HTseq’ using uniquely mapped reads, and reads per kilobase of transcript per million mapped reads (RPKM) values were calculated for SIL-05 [79]. (2) The RPKM expression matrix of data set 2 was back-calculated to normalized average read counts for the three replicates of each time point, based on the RPKM definition and assuming that every gene in Della has the same length as in the genotypes of data sets 1 and 3. These normalized average read counts were then used as a read count matrix without replicates for differential expression analysis. (3) To perform differential expression analysis for data sets 2 and 3 without replicates, the gene-wise dispersion of biological variation in sorghum internode tissues (indexed by the biological coefficient of variation, BCV) was estimated from the triplicates of data set 1 with “edgeR” [80]. The BCV matrix was then used to identify differentially expressed genes in data sets 2 and 3 with “edgeR”, using the criteria q value < 0.05 and log2 fold change (log2FC) ≥ 1. (4) To compare samples across data sets, potential batch effects between the expression matrices of the three RNA-seq studies were minimized by quantile normalization with ‘preprocessCore’, as described in Additional file 4 [81, 82].
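The count back-calculation in step (2) and the thresholds in step (3) can be illustrated with a short Python sketch. It assumes that the per-sample mapped-read totals (library sizes) of data set 2 are known and that gene lengths come from the BTx623 annotation, as the text assumes; all function names are hypothetical. The p values themselves would come from edgeR (its user guide describes calling `exactTest` with `dispersion = bcv^2` when replicates are absent); the sketch only shows how BCV maps onto the negative binomial dispersion and how Benjamini-Hochberg q values and a fold-change cutoff could then be applied.

```python
import numpy as np


def rpkm_to_counts(rpkm, gene_length_bp, mapped_reads):
    """Invert RPKM = count * 1e9 / (gene_length_bp * mapped_reads).

    rpkm: genes x samples matrix; gene_length_bp: one length per gene;
    mapped_reads: one library size per sample (assumed to be known).
    """
    rpkm = np.asarray(rpkm, dtype=float)
    length = np.asarray(gene_length_bp, dtype=float)[:, None]  # column vector (genes)
    mapped = np.asarray(mapped_reads, dtype=float)[None, :]    # row vector (samples)
    return rpkm * length * mapped / 1e9


def nb_variance(mean, bcv):
    """Negative binomial variance as modeled by edgeR: mu + phi * mu**2, with phi = BCV**2."""
    mean = np.asarray(mean, dtype=float)
    phi = np.asarray(bcv, dtype=float) ** 2
    return mean + phi * mean ** 2


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values (q values)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: each q is the minimum over all larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def select_degs(pvalues, log2fc, q_cut=0.05, lfc_cut=1.0):
    """Flag genes with q < q_cut and |log2FC| >= lfc_cut.

    Using the absolute fold change (so down-regulated genes also pass) is an
    assumption of this sketch; the text states the cutoff as log2FC >= 1.
    """
    q = bh_adjust(pvalues)
    return (q < q_cut) & (np.abs(np.asarray(log2fc, dtype=float)) >= lfc_cut)
```

For example, a gene at 10 RPKM with a length of 2,000 bp in a library of 5 million mapped reads back-calculates to a count of 100 with `rpkm_to_counts([[10.0]], [2000], [5e6])`.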

Fig. 1

Time-course alignment of the RNA-seq time points between sorghum genotypes. The durations of the booting and grain development stages are based on field observations of Rio and Della and on previous studies [12, 26, 45, 77]. The time points at which RNA-seq samples were collected are color-coded by genotype and aligned to sorghum development, with samples related to stem sugar accumulation highlighted by the red dashed box

To identify candidate genes associated with stem sugar accumulation common to all data sets, we examined genes involved in primary metabolism and sugar transport and selected candidates using the following criteria. (1) Low-expression genes (maximum RPKM ≤ 5) were excluded, because the genes of interest are annotated as encoding enzymes or transporters in primary metabolic pathways responsible for the major carbon reserves in the sorghum stem and are therefore expected to be appreciably expressed there. (2) To identify genes with distinct expression trends between sweet and non-sweet genotypes, genes had to meet two criteria: (2A) differential expression at post-anthesis stages compared with anthesis or pre-anthesis stages in the sweet genotypes but not in the non-sweet genotypes, or vice versa; and (2B) an expression trend in Della and SIL-05 similar to that in Rio but contrasting with that in BTx406/R9188. To visualize shared expression trends of the selected candidate genes in the sweet versus non-sweet comparison, heatmaps of log2 fold change were used, with fold changes calculated from RPKM + 0.1 to avoid zero values.
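Criterion (1), the RPKM + 0.1 fold changes used for the heatmaps, and a simplified reading of criterion (2A) can be sketched in Python as follows. The function names are hypothetical, and reducing (2A) to one yes/no differential-expression flag per genotype group is an assumption made for illustration; criterion (2B) is not encoded here.

```python
import numpy as np


def passes_expression_filter(rpkm_row, min_max_rpkm=5.0):
    """Criterion (1): exclude genes whose maximum RPKM across samples is <= 5."""
    return float(np.max(rpkm_row)) > min_max_rpkm


def log2fc_vs_anthesis(rpkm_row, anthesis_index, pseudocount=0.1):
    """log2 fold change of each time point relative to anthesis, computed on RPKM + 0.1."""
    x = np.asarray(rpkm_row, dtype=float) + pseudocount
    return np.log2(x / x[anthesis_index])


def contrasting_response(de_in_sweet, de_in_nonsweet):
    """Simplified criterion (2A): post-anthesis DE in one group but not in the other."""
    return bool(de_in_sweet) != bool(de_in_nonsweet)
```

For instance, `log2fc_vs_anthesis([9.9, 19.9, 4.9], 0)` gives approximately [0, 1, -1]; the pseudocount keeps the ratio finite when a sample has zero RPKM.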

Identification of genes involved in primary metabolism and sugar transport

Annotation information for genes potentially involved in cell wall metabolism, starch and sucrose metabolism, and glycolysis, and for the sugar transporter families (SUTs, SWEETs, and TSTs), was extracted from the literature and databases [83,84,85,86,87,88,89]. Detailed methods for gene annotation are given in Additional file 4, and all gene annotation information is shown in Additional file 5.

Phylogenetic analysis of SWEET gene family

The 23 previously reported SWEETs were used for gene family analysis (Additional file 6). To confirm that these genes encode putative SWEETs, BLAST searches against sorghum genomes v2 and v3 were performed, followed by filtering for the presence of two MtN3 domains (Pfam: PF03083) [15, 47]. The deduced amino acid sequences of sorghum SWEETs were compared with rice and maize SWEET proteins; only primary or canonical transcripts were used. The rice and maize SWEETs were described previously [52,53,54]. SbSWEET nomenclature follows Bihmidine et al. [46]. Sequence alignment was performed using MUSCLE, and neighbor-joining (NJ) phylogenetic trees were generated using MEGA v7 with the JTT protein substitution model, pairwise deletion for gaps/missing data, and 1000 bootstrap replicates [90]. Sobic.003G038800 was not included in the phylogenetic tree because both of its MtN3 domains are incomplete (Additional file 7). The sorghum expression atlas and the MOROKOSHI database were used to evaluate the spatiotemporal expression patterns of SWEETs [15, 84].
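The domain filter described above, which keeps only proteins carrying two MtN3 domains, could look like the following sketch. It assumes the Pfam hits have already been parsed into (protein ID, Pfam accession) pairs, for example from an HMMER domain table; the parsing step, the function names, and the example IDs are hypothetical. Counting hits does not check whether each domain is complete, so a case such as Sobic.003G038800 would still need separate inspection.

```python
from collections import Counter

MTN3_PFAM = "PF03083"  # MtN3/saliva domain


def count_mtn3_domains(domain_hits):
    """Count MtN3 hits per protein from (protein_id, pfam_accession) pairs."""
    counts = Counter()
    for protein_id, pfam_acc in domain_hits:
        # Strip a version suffix such as "PF03083.21" before comparing.
        if pfam_acc.split(".")[0] == MTN3_PFAM:
            counts[protein_id] += 1
    return counts


def putative_sweets(domain_hits, min_domains=2):
    """Return IDs of proteins carrying at least `min_domains` MtN3 hits, sorted."""
    counts = count_mtn3_domains(domain_hits)
    return sorted(pid for pid, n in counts.items() if n >= min_domains)


# Hypothetical example: protA has two MtN3 hits, protB one, protC none.
hits = [("protA", "PF03083.21"), ("protA", "PF03083.21"),
        ("protB", "PF03083.21"), ("protC", "PF00001.1")]
print(putative_sweets(hits))
```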

Quantitative PCR validation

Total RNA was extracted from pooled upper internodes (internodes 2, 3, and 4, numbered from top to bottom) of Rio, BTx406, and R9188 using TRIzol and the PureLink RNA extraction kit (Invitrogen). The samples were collected from plants grown in a split-plot design and are the same samples used for RNA-seq of Rio/R9188/BTx406 as described previously [12]. RNA concentration and purity were evaluated using a NanoDrop 2000 spectrophotometer. After cDNA synthesis with the SuperScript III First-Strand kit, real-time quantitative PCR (qPCR) was performed with PowerUp SYBR Green Master Mix (Thermo Fisher) on an ABI StepOnePlus Real-Time PCR system. Relative expression levels were calculated using the ΔΔCT method, with Ubiquitin as the internal reference gene because the RNA-seq data showed its expression to be stable [12]. All qPCR primers are listed in Additional file 8.
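For reference, the ΔΔCT calculation can be written out as below. This is a minimal sketch of the standard 2^(−ΔΔCT) formula, which assumes roughly 100% amplification efficiency for both the target and the reference gene; the choice of calibrator sample and the averaging of replicate CT values are assumptions of the sketch, not details given in the text.

```python
from statistics import mean


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^-ddCT method.

    Each argument is a list of replicate CT values, averaged first:
    ct_target / ct_ref are the gene of interest and the reference gene
    (e.g. Ubiquitin) in the sample; *_cal are the same genes in the calibrator.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_cal = mean(ct_target_cal) - mean(ct_ref_cal)
    dd_ct = d_ct_sample - d_ct_cal
    return 2.0 ** (-dd_ct)
```

A target that amplifies three cycles earlier (relative to the reference gene) than in the calibrator gives ΔΔCT = −3 and therefore an 8-fold relative expression.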

Results

Dynamics of internode sugar accumulation

We compared the dynamics of internode sugar accumulation between Della and Rio, which had similar plant heights but differed in the number of above-ground internodes (Fig. 2a, b). The two genotypes differed slightly in days to flowering (Fig. 2c). Della and Rio showed similar internode sugar concentration dynamics in several respects. First, total sugar concentrations increased markedly in both genotypes from anthesis to 38 days after flowering (DAF) (from ~ 9 to ~ 18% in Della and from ~ 12 to ~ 19% in Rio; Fig. 2d). Second, the upper internodes (internodes 2–8 for Rio and 1–5 for Della) had higher sugar concentrations than the lower internodes in both genotypes. Third, total sugar concentrations increased significantly in both genotypes during two periods: (i) the first 10 days after anthesis and (ii) from 22 to 32 DAF (Additional file 9). We also characterized the water content dynamics of both genotypes from 10 to 38 DAF (Fig. 2e; Additional file 9). Both Della and Rio had juicy stems, and their internode water contents remained stably high (~ 70% to 80%) during stem sugar accumulation, except for an obvious decrease at 38 DAF in Della. Slight but statistically significant decreases in water content were observed at 32 DAF for most internodes in both genotypes, coinciding with slightly increased Brix. Generally, the upper internodes had lower water contents than the lower internodes (Fig. 2e).

Fig. 2

Comparison of the phenotypes and dynamics of sugar accumulation between Della and Rio. Three phenotypes, plant height (a), number of above-ground internodes (b), and days to flowering (c), were compared between Della and Rio. Statistical differences were determined by Welch's two-sample t test and are indicated by asterisks (*, **, and *** indicate p < 0.05, p < 0.01, and p < 0.005, respectively). d Dynamics of total sugar concentration (measured as Brix) from 0 DAF (anthesis) to 38 DAF in Della and Rio. e Dynamics of internode water content from 10 DAF to 38 DAF in Della and Rio. Statistical differences in Brix (d) and internode water content (e) between internodes for each genotype and time point were determined using one-way ANOVA with multiple comparisons and are indicated by letters. Values within a genotype and time point labeled with the same letter are not significantly different at p = 0.05. For a–c, n = 36; for d and e, n = 6
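Welch's t test, used for panels a–c, does not assume equal variances in Della and Rio. A minimal Python sketch of its statistic and the Welch–Satterthwaite degrees of freedom is shown below (the function name is hypothetical); the p value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each mean, using sample variances (n - 1).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

Because the degrees of freedom are estimated from both variances, they are usually non-integer and lower than n1 + n2 − 2 when the groups differ in spread.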

Comparative transcriptome analysis of sugar-accumulating stems

Batch effects between the three RNA-seq data sets were removed (see “Methods”, Additional files 4 and 10). The developmental timelines of the five genotypes were aligned relative to anthesis (Fig. 1) [12, 26, 45, 77]. Principal component analysis (PCA) showed that ~ 40% of the transcriptome variance between samples was explained by PC1 and PC2, which appeared to be associated with developmental stage and genotype, respectively (Fig. 3). Our previous study showed that the Rio-converted line R9188 is partially active in primary metabolism and has an intermediate stem sugar concentration [12]. In the PCA, R9188 stem transcriptomes were separated from those of Rio and BTx406. Della stem transcriptomes fell between those of Rio and R9188, and SIL-05 stem transcriptomes grouped close to Della A11 and A25. Similarly, hierarchical clustering (HC) (Additional file 11) grouped the transcriptomes into five clusters and identified those enriched in sugar-accumulating internodes: (i) some developmental stages of Della, when soluble sugars are actively accumulated, clustered with Rio stems (cluster 3); and (ii) SIL-05 stem samples clustered with R9188 stems (cluster 4). Overall, the PCA and HC results suggest that metabolically active stem samples with high or intermediate sugar levels tend to group together.
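The quantile normalization step and the PCA can be illustrated with the textbook algorithms below. This is a sketch, not the preprocessCore implementation: ties are broken arbitrarily by `argsort`, whereas preprocessCore may handle them differently, and running PCA on log-transformed, quantile-normalized values is an assumption, since the transformation used is not stated here. Function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every sample (column) of a genes x samples matrix the same distribution.

    The shared distribution is the mean of the sorted columns; ties are broken
    arbitrarily by argsort.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                                   # row indices that sort each column
    reference = np.take_along_axis(x, order, axis=0).mean(axis=1)  # mean value at each rank
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference                             # rank k receives the k-th reference value
    return out


def pca(samples_by_genes, n_components=2):
    """PCA via SVD of the column-centered matrix; returns scores and explained variance ratios."""
    x = np.asarray(samples_by_genes, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]
```

In use, the genes × samples matrix would be normalized first and then transposed (samples × genes) before calling `pca`, so that each row of the returned scores corresponds to one RNA-seq sample.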

Fig. 3

Principal component analysis. PC1 and PC2, which capture expression variance between RNA-seq samples related to developmental stage and genotype, respectively, are shown

Cellulose synthetic genes

Non-structural carbohydrates (sugars and starch) and structural carbohydrates (cell wall components) represent the major carbon reserves in the stem after anthesis. Sugars and starch together account for ~ 50% of stem dry weight, whereas structural carbohydrates account for ~ 30% [26]. Representative sweet varieties had higher starch contents (~ 3 to 10%) than grain sorghum lines (< 2%), supporting starch as an important carbon reserve in the stem [11]. We compared the expression dynamics of primary metabolic genes to examine whether genes differentially expressed between Rio and BTx406/R9188 showed similar expression trends in Della and SIL-05 (Additional files 5 and 12). Although we considered quantitative expression differences between genotypes, we focused primarily on fold changes in gene expression within each genotype because of the limitations of quantitative comparisons across data sets.

Several CesA genes were highly expressed in sweet sorghum genotypes but significantly decreased in BTx406/R9188 after flowering (Fig. 4a). These CesA genes belong to the ancestral clusters CesA_AC4, CesA_AC5, and CesA_AC6 and are associated with PCW cellulose synthesis [63]. In contrast, the expression of CesA genes associated with SCW cellulose synthesis (Sobic.001G224300 and Sobic.002G205500) decreased after anthesis in all genotypes. Several Csl genes were differentially expressed between sweet and non-sweet genotypes. A CslF gene with a major role in the synthesis of mixed-linkage (1,3;1,4)-β-glucan (MLG) maintained its expression in the sweet genotypes, with particularly high expression in Della and Rio, but its expression decreased from pre-anthesis stages in the non-sweet genotypes [73, 91]. Similarly, a CslA gene (Sobic.007G137400) decreased in expression in BTx406/R9188 but maintained seemingly higher expression in the sweet genotypes [92]. Both of its Arabidopsis homologs are responsible for mannan synthesis and affect cell wall integrity and organization [93]. Two highly expressed genes encoding homogalacturonan α-1,4-galacturonosyltransferase (GAUT), possibly responsible for pectin synthesis, were significantly decreased in BTx406/R9188 after flowering but were either stable or upregulated relative to the anthesis stage in sweet sorghum [92]. Three xyloglucan galactosyltransferase (XGT) genes showed similar trends: their expression in BTx406/R9188 was significantly downregulated relative to the anthesis stage, whereas no such downregulation was observed after flowering in the sweet genotypes. More interestingly, two highly expressed genes encoding endo-1,4-β-glucanase (CAZy family GH9) decreased in BTx406/R9188 but remained highly expressed or were upregulated in the sweet genotypes after flowering.
One Arabidopsis homolog of the two GH9 genes, KOR1 (AT5G49720), can interact with the CesA complex and is required for cellulose deposition [94,95,96], while the other (AT1G19940) affects cell wall crystallinity, secondary cell wall development, and biomass in Arabidopsis [96]. Overall, the genes highlighted here (CesA, Csl, XGT, GAUT, and GH9) are important for the synthesis of cellulose, pectin, and hemicelluloses (including xyloglucan, mannan, and MLG), which are the major components of the primary cell wall [92]. In addition, MLG has been proposed as a storage form of glucose in stems, based on comparisons of carbon allocation among grain, sweet, and wild sorghum lines [97, 98]. The expression analysis of these cell wall genes suggests that sweet sorghum stems maintain active primary cell wall development after anthesis, which could represent a significant carbon demand.

Fig. 4

Comparison of representative genes involved in cell wall metabolism between sorghum genotypes. Gene expression dynamics are shown in heatmaps for cell wall biosynthetic genes (a) and monolignol biosynthetic genes (b), with the monolignol biosynthetic pathway shown in c. Cell colors reflect the magnitude of the log2 fold change in gene expression relative to the anthesis stage in each genotype. Expression levels in RPKM are labeled in each cell, with statistical differences (q values determined by edgeR) indicated by asterisks (*q < 0.05; **q < 0.01; ***q < 0.005). Gene IDs highlighted in red share similar expression trends among the sweet lines Rio, Della, and SIL-05 that contrast with those in BTx406 and R9188. In b, monolignol biosynthetic genes with functional evidence in sorghum are underlined and labeled with their gene names or corresponding sorghum mutants according to previous studies [99,100,101,102,103,104,105,106,107]

Monolignol biosynthetic genes

Extensive functional and structural studies in sorghum have identified and characterized the major genes controlling key steps of monolignol biosynthesis. These include genes encoding the first and third enzymes of the phenylpropanoid pathway (PAL and 4CL, respectively), which affect the metabolic flux of monolignol precursors [99, 100], and several downstream genes encoding enzymes (HCT, CCR, CAD, COMT, and CCoAOMT) that alter overall lignification and/or lignin monomer ratios [101,102,103,104,105,106,107]. Here, the RNA-seq data sets confirmed that these major functional genes are among the most highly expressed in their respective families (Fig. 4b, c). The monolignol pathway was active in all genotypes before flowering and was gradually downregulated after stem maturation. Sweet genotypes had higher expression of these genes after anthesis than BTx406/R9188, with statistically significant differences between Rio and BTx406/R9188.

Previous studies of sorghum PAL genes identified two subgroups encoding enzymes active in l-Phe deamination (PAL) and l-Phe/l-Tyr deamination (PTAL), respectively [100]. Expression profiling showed that two SbPTALs (Sobic.004G220300 and Sobic.006G148800) and one SbPAL (Sobic.004G220400) had the highest expression levels in the stem, and their expression decreased markedly after anthesis in BTx406/R9188. In contrast, the SbPTALs and SbPAL maintained expression levels from ~ 50 to several hundred RPKM in the sweet genotypes, significantly higher than in BTx406/R9188 (Fig. 4b). Consistent with the higher expression of SbPTALs in sweet sorghum, l-tyrosine content decreased in BTx406/R9188 but remained stable over time in Rio [12] (Additional files 4, 13), which could sustain metabolic flux into phenylpropanoid biosynthesis in sweet sorghum. Two other enzymes that could influence the metabolic flux of flavonoid and lignin precursors are C4H and 4CL [99, 100]. Indeed, the most highly expressed C4H gene (Sobic.002G126600) was significantly decreased in BTx406/R9188 compared with Rio at 15 DAF but remained stable in the sweet genotypes. The predominant 4CL gene (Sobic.004G062500, Bmr2) showed an expression pattern similar to that of C4H: its expression started to decrease in BTx406/R9188 at 10 days after flowering (T3), whereas in the sweet genotypes it remained stable until 15–25 days after flowering (for Rio and Della, respectively).

Fig. 5

Comparison of starch-metabolic genes between sorghum genotypes. The expression dynamics of representative starch-metabolic genes were compared between sorghum genotypes and are shown in a heatmap (a), with the starch-metabolic pathway shown in b. Cell colors reflect the magnitude of the log2 fold change in gene expression relative to the anthesis stage in each genotype. Expression levels in RPKM are labeled in each cell, with statistical differences (q values determined by edgeR) indicated by asterisks (*q < 0.05; **q < 0.01; ***q < 0.005). Gene IDs highlighted in red share upregulated expression trends among the sweet lines Rio, Della, and SIL-05 that contrast with those in BTx406 and R9188

For genes encoding downstream enzymes of the monolignol pathway, expression of several major functional genes, including CCR, CAD, COMT, CCoAOMT, and F5H, decreased from 10 DAF in BTx406/R9188, whereas this trend was not detected until 15–25 DAF in the sweet genotypes (Fig. 4b). Several of these predominantly expressed genes encode the functional enzymes of their families, whose structures and substrate kinetics have been experimentally characterized, such as SbHCT [103], SbCCR1 [107], SbCAD2 [106], SbCOMT [105], and SbCCoAOMT [104]. In particular, CCoAOMT catalyzes the methylation of caffeoyl-CoA to feruloyl-CoA, whereas COMT catalyzes the methylation of caffeic acid, 5-hydroxyconiferaldehyde, or 5-hydroxyconiferyl alcohol to facilitate S lignin production. Both COMT and CCoAOMT use S-adenosyl-l-methionine (SAM) as the methyl donor [104, 105]. In the RNA-seq data set from Rio/BTx406/R9188, three SAM synthases (SAMS) showed significantly higher expression in Rio than in BTx406/R9188, and several genes required for SAM metabolism were highly expressed [108] (Additional file 13). Consistently, our previously published metabolome data from the same samples used for RNA-seq data set 1 showed that SAM content was stable in Rio across stages but was not detected after T1 in BTx406/R9188, indicating that Rio may have higher SAM levels than BTx406/R9188 [12] (Additional files 4, 13). Overall, most monolignol biosynthetic genes decreased continuously relative to the pre-anthesis and anthesis stages in BTx406/R9188, whereas in the sweet genotypes they were relatively stable at early post-anthesis stages after an initial decrease from the pre-anthesis stage. This suggests that active monolignol biosynthesis is maintained at early post-anthesis stages in sweet sorghum.

Starch biosynthetic genes

Based on homologous and orthologous relationships between maize and sorghum genes, we identified genes encoding key enzymes of starch metabolism, including AGPase, SS, GBSS, SBE, ISA, GPT, GWD, PWD, and AMY (Additional file 12). We identified two AGPase small subunit genes and four AGPase large subunit genes, with Sobic.003G230500 and Sobic.007G101500 encoding the small and large subunits predicted to be cytosolic, respectively (Fig. 5). Interestingly, several AGPase subunits predicted to be plastidial (Sobic.002G160400, Sobic.009G245000, and Sobic.001G100000) were significantly upregulated in the sweet genotypes during sugar accumulation, whereas their expression was stable or decreased in BTx406/R9188. In contrast, the plastid-localized AGPase small subunits did not differ in expression trends between sorghum genotypes. Moreover, we identified two GPTs in sorghum (SbGPT1, Sobic.007G065500, and SbGPT2, Sobic.002G322000), whose Arabidopsis homologs translocate G6P into plastids and supply G6P to the oxidative pentose phosphate pathway (OPPP) in specific tissues or to starch synthesis in non-green tissues, respectively [109, 110]. Expression of SbGPT2, but not SbGPT1, was upregulated and remained high in the sweet genotypes but decreased dramatically in BTx406/R9188 (Fig. 5). Notably, the Brix of the introgression line R9188 can reach levels comparable to those of Rio, but these levels are not maintained and decrease at post-anthesis stages [12]. Among all starch-related genes, SbGPT2 is the only gene whose expression dynamics correlate well with soluble sugar levels at all stages [12]. Together, the AGPase and SbGPT2 expression data suggest that ADP-glucose synthesis likely occurs in plastids and is highly active in sweet sorghum.
Furthermore, several starch biosynthetic genes showed coordinated expression patterns similar to those of AGPase: (i) upregulation over the time course of stem sugar accumulation in the sweet genotypes; (ii) significantly higher expression in Rio than in BTx406/R9188; and (iii) differential expression in Della and SIL-05. These include two SS genes (Sobic.010G047700, Sobic.010G093400), one GBSS gene (Sobic.002G116000), two ISA genes (Sobic.007G204600, Sobic.009G127500), and two SBE genes (Sobic.010G273800, Sobic.006G066800). The co-expression of these starch biosynthetic genes is consistent with the notion that starch biosynthetic enzymes from multiple pathways form complexes in maize endosperm amyloplasts [56]. In addition, SbGWD (Sobic.010G143500) and SbPWD (Sobic.004G120100), which play key roles in starch degradation, were also upregulated in the sweet genotypes but not in BTx406/R9188. Similarly, a model of starch metabolism in Della, proposed from the expression of starch-metabolic genes in the Della RNA-seq data sets, supports the activation of starch metabolism as sugar accumulates in sweet sorghum stems [11]. Taken together, the activation of starch-metabolic genes is associated with stem sugar levels and sink strength: all three sweet genotypes maintained high and upregulated expression of starch genes, whereas R9188, which has an intermediate stem sugar level [12], showed lower expression than the sweet genotypes for some starch genes, including those encoding SbGPT2, AGPase (Sobic.001G100000, Sobic.007G101500), and SS (Sobic.010G047700, Sobic.010G093400).

Sucrose-metabolic genes

Sucrose levels can be influenced by three sets of enzymes directly involved in channeling sucrose into primary metabolism (pathway in Fig. 6): (i) invertase (INV) hydrolyzes sucrose into glucose and fructose; (ii) sucrose synthase (SuSy) reversibly cleaves sucrose, in the presence of UDP, into fructose and UDP-glucose; and (iii) sucrose-phosphate synthase (SPS) and sucrose-phosphate phosphohydrolase (SPP) resynthesize sucrose. INVs are grouped into alkaline/neutral INVs (INVANs) and acid INVs based on their optimum pH, and acid INVs are further classified into cell wall INVs (INVCWs, insoluble) and vacuolar INVs (INVVRs, soluble). We identified 18 INVs in sorghum [111], including seven INVANs, nine INVCWs, and two INVVRs (Additional file 14). Comparing these INVs with those identified in a sugarcane–sorghum comparative study [112], we found the two sets to be identical except for one additional INVCW (Sobic.001G099700), which was not expressed in the sorghum RNA-seq data sets. First, three INVCWs (Sobic.006G255600, Sobic.004G163800, and Sobic.003G440900) were expressed in stems with varied expression patterns among genotypes and appeared unrelated to sugar accumulation (Fig. 6). Second, all seven INVANs were expressed; their expression patterns varied among genotypes and time points but did not correlate directly with sucrose accumulation, suggesting tightly regulated sucrose metabolism in the cytosol. Third, the INVVR with the highest expression among all INVs (Sobic.004G004800) decreased sharply at anthesis in all genotypes, a prerequisite for sugar accumulation in the vacuole. Three SuSy genes were highly and differentially expressed: Sobic.001G344500, which had the highest expression level, was markedly downregulated in all genotypes, correlating with the decrease in sucrose cleavage activity in Rio during sugar accumulation [113], whereas the expression dynamics of Sobic.010G072300 and Sobic.001G378300 differed between genotypes.
Several highly expressed SPS genes and an SPP gene (Sobic.004G151800) were stably detected, with no clear trends between sweet and non-sweet genotypes. Furthermore, three trehalose-6-phosphate synthase (TPS) and two trehalose-6-phosphate phosphatase (TPP) genes were highly transcribed. TPS synthesizes trehalose 6-phosphate (T6P) and TPP dephosphorylates it to trehalose, so the two enzymes jointly control the level of T6P, an important signal and negative feedback regulator of sucrose levels [114]. One TPP gene (Sobic.002G303900) showed distinct expression patterns in sweet versus non-sweet sorghum: it was barely expressed in sweet genotypes but upregulated in BTx406/R9188 [12]. Functional studies are needed to elucidate the potential roles of these genes in carbon metabolism. Notably, except for TPP, none of these sucrose-metabolic genes showed consistent expression differences between sweet and non-sweet genotypes, unlike the cell wall and starch-metabolic genes (Figs. 4, 5, 6). This suggests two possibilities: either sucrose metabolism is not a major limiting factor for stem sugar accumulation, or the sucrose-metabolic pathway is regulated mainly at the post-transcriptional level. Taken together, the expression of SPSs, SPPs, SuSys, and the seven intracellular INVANs suggests active sucrose metabolism in stems, supporting the previous conclusion that sucrose may be inverted and resynthesized, although a significant portion is not metabolized during accumulation [40, 41].

Fig. 6

Comparison of sucrose-metabolic genes between sorghum genotypes. The expression dynamics of representative sucrose-metabolic genes were compared between sorghum genotypes and are shown as a heatmap (a); the sucrose metabolism pathway is shown in (b). Cell colors reflect the magnitude of the log2 fold change in gene expression relative to the anthesis stage in each genotype. Expression levels in RPKM are labeled in each cell, with statistical significance (q values determined by edgeR) indicated by asterisks (*q < 0.05; **q < 0.01; ***q < 0.005). The TPP gene highlighted in red was barely expressed in the sweet lines Rio, Della, and SIL05, but upregulated in BTx406 and R9188.
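The heatmap encoding described in this caption can be sketched as below, assuming per-stage RPKM values and precomputed edgeR q values as inputs. The pseudocount, the function names, and the stage labels are hypothetical choices for illustration, and the sketch does not reproduce edgeR's statistical testing.

```python
import math

def log2_fold_change(rpkm, anthesis_rpkm, pseudocount=1.0):
    """log2 fold change of expression relative to the anthesis stage.

    The pseudocount is an assumption of this sketch, added so that stages
    with zero RPKM do not produce log(0) or division by zero.
    """
    return math.log2((rpkm + pseudocount) / (anthesis_rpkm + pseudocount))

def significance_stars(q):
    """Asterisk code from the caption: * q<0.05, ** q<0.01, *** q<0.005."""
    if q < 0.005:
        return "***"
    if q < 0.01:
        return "**"
    if q < 0.05:
        return "*"
    return ""

def heatmap_cells(rpkm_by_stage, q_by_stage, baseline="anthesis"):
    """(stage, log2FC for the cell color, RPKM label with asterisks) per stage.

    Stages missing from q_by_stage (e.g. the baseline itself) get no asterisk.
    """
    base = rpkm_by_stage[baseline]
    cells = []
    for stage, rpkm in rpkm_by_stage.items():
        color_value = round(log2_fold_change(rpkm, base), 2)
        label = f"{rpkm:g}" + significance_stars(q_by_stage.get(stage, 1.0))
        cells.append((stage, color_value, label))
    return cells
```

For a hypothetical gene with 15, 31, and 63 RPKM at anthesis and two later stages (q = 0.03 and 0.001), the cells would be colored by log2 fold changes of 0, 1, and 2 and labeled "15", "31*", and "63***". Separating the color value (relative change) from the label (absolute RPKM) lets a reader see both the trend and whether a gene is expressed at meaningful levels.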

Sucrose transporters

To identify candidate sucrose transporters responsible for stem sugar accumulation, we analyzed three families: SUTs, TSTs, and SWEETs. A comprehensive phylogenetic analysis of SWEETs from all sequenced angiosperm species defines four clades [47]. SWEET substrate selectivity has been characterized in several species, such as Arabidopsis [48,49,50], rice [54], maize [52, 53], cucumber [51], and others [115]. Published data suggest that SWEET substrate selectivity correlates well with phylogenetic clade (Additional file 15), indicating that only clade III SWEETs are likely to transport sucrose. However, the physiological function of each SWEET remains to be characterized individually; it may depend on its spatiotemporal expression pattern and/or on the metabolic pathways affected by SWEET-mediated transport. Our bioinformatics search identified 23 SWEETs, which grouped into the four previously defined clades (Additional file 16) [47]; the designations of 20 SbSWEETs are consistent with a previous report [46]. By contrast, Mizuno et al. [45] identified 23 SWEETs, including Sobic.003G149000, Sobic.003G038700, and Sobic.003G038800. Sobic.003G149000 contains two complete MtN3 domains and is designated SbSWEET17 (Additional files 6, 7, 16). Sobic.003G038700 groups with clade II SWEETs and has two complete MtN3 domains, whereas Sobic.003G038800 does not and is probably a tandemly duplicated copy of Sobic.003G038700 (Additional files 6, 7). According to the expression database MOROKOSHI, these two putative SWEETs are expressed specifically in the inflorescence, at lower levels than other SWEETs (Additional file 7) [45, 84]. We focused on clade III SWEETs because they are the clade likely to transport sucrose. Among the six differentially expressed clade III SbSWEETs, SbSWEET13A (Sobic.008G094000) exhibited the highest expression level in stems and was upregulated in all genotypes (Fig. 7) [45].
Previously, SbSWEET13A was not found to be differentially expressed when two grain and sweet sorghum lines were compared [46]. Here, integration of several data sets showed that SbSWEET13A was upregulated in stem tissues of all investigated genotypes, which probably excludes a role in determining the difference in sugar accumulation between grain and sweet sorghum (Fig. 7). Interestingly, the tandemly duplicated SbSWEET13A/B/C showed different expression levels and spatiotemporal patterns according to our data and other databases (Fig. 7): SbSWEET13A/B were preferentially expressed in green tissues, whereas SbSWEET13C was also expressed in roots, suggesting functional divergence. SbSWEET11A was expressed at lower levels in sweet genotypes than in non-sweet genotypes, although its overall expression was low relative to other clade III SWEETs. SbSWEET11A may be highly expressed in particular cell types of the stem, so that its expression is diluted when measured in whole tissue. In addition, SbSWEET13A and SbSWEET11A appeared to be positively and negatively correlated, respectively, with internode maturity or stage in non-sweet genotypes, but no such correlation was observed in sweet genotypes; the biological significance of this difference requires further functional study. The other clade III SbSWEET genes showed no clear expression trends between sweet and non-sweet genotypes (Fig. 7), suggesting that SWEET-mediated sucrose transport may not be a major difference between them. Mizuno et al. [45] proposed that SbSWEET13A and SbSWEET3A are involved in sucrose efflux from leaves; however, SbSWEET3A groups into clade I and is homologous to other clade I SWEET3s from rice and maize (Fig. 7, Additional file 6) [52,53,54].
Similarly, the sugarcane homolog of SbSWEET13A (SsSWEET13C) was highly expressed in the mature leaf zone and in internode sclerenchyma cells but at very low levels in stem parenchyma, supporting a proposed role for SWEET13 in sucrose efflux in leaves [116]. SbSWEET4A, 4B, and 4C were suggested as candidates for sucrose transport in the panicle and stem based on their spatial expression preferences [45]. Here, SbSWEET4A/B/C group into clade II together with their maize orthologs, consistent with the previous SWEET phylogenetic analysis [47]. Because substrate selectivity correlates with phylogenetic clade and sucrose transport is expected mainly from clade III, the proposed roles of the SbSWEET4s in sucrose transport are questionable. It should be noted that the SWEET phylogenetic tree here resembles the tree covering all sequenced angiosperm species [47] but differs from the tree reported in sorghum [45]. This discrepancy may be due to the following differences in the phylogenetic analyses: (i) the species included; (ii) the method used for amino acid sequence alignment; and (iii) the parameters used to construct the tree.
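The phylogeny-based reasoning used above to prioritize clade III SWEETs, and to question sucrose-transport roles for clade I and II members such as SbSWEET3A and SbSWEET4A/B/C, can be expressed as a simple filter. This is a sketch under the assumption that clade membership alone predicts sucrose transport; the class and function names and the differential-expression flags are hypothetical.

```python
from dataclasses import dataclass

# Assumption for this sketch: clade membership alone predicts sucrose transport,
# following the phylogeny-selectivity correlation (only clade III is kept).
SUCROSE_CLADES = frozenset({"III"})

@dataclass(frozen=True)
class Sweet:
    name: str      # gene designation, e.g. "SbSWEET13A"
    clade: str     # phylogenetic clade: "I", "II", "III", or "IV"
    stem_de: bool  # hypothetical flag: differentially expressed in stems

def sucrose_candidates(sweets):
    """Names of stem-differentially-expressed SWEETs in a sucrose-transporting clade."""
    return [s.name for s in sweets if s.clade in SUCROSE_CLADES and s.stem_de]

if __name__ == "__main__":
    genes = [
        Sweet("SbSWEET13A", "III", True),
        Sweet("SbSWEET11A", "III", True),
        Sweet("SbSWEET3A", "I", True),
        Sweet("SbSWEET4A", "II", True),
    ]
    print(sucrose_candidates(genes))  # ['SbSWEET13A', 'SbSWEET11A']
```

Using the clade assignments stated in the text, only the two clade III genes pass the filter. Such a filter only narrows the candidate list; it cannot say whether a gene's expression differs between sweet and non-sweet genotypes, which is why SbSWEET13A still had to be evaluated against the expression data.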

Fig. 7

Comparison of sugar transporter genes between sorghum genotypes. The expression dynamics of SWEETs, SUTs, and TSTs were compared between sorghum genotypes and are shown as a heatmap. Cell colors reflect the magnitude of the log2 fold change in gene expression relative to the anthesis stage in each genotype. Expression levels in RPKM are labeled in each cell, with statistical significance (q values determined by edgeR) indicated by asterisks (*q < 0.05; **q < 0.01; ***q < 0.005). SbTST2 is highlighted in red because it was upregulated during post-anthesis stages in the sweet lines Rio, Della, and SIL05, and its expression level in Rio was higher than those in non-sweet BTx406/R9188 (p < 0.05, two-way ANOVA followed by multiple comparisons).

Additionally, we identified three SUTs that were differentially expressed in at least one genotype (Fig. 7). Of the two highly expressed SUTs (SbSUT1, Sobic.001G488700; SbSUT2, Sobic.008G193300), only SbSUT2 was expressed at slightly higher levels in sweet genotypes than in BTx406/R9188 during post-anthesis stages, making it a candidate transporter for stem sugar accumulation. The expression profiles of SbTSTs showed that Sobic.001G312900 and Sobic.004G099300 were also expressed at relatively high levels (Fig. 7); they are homologous to TST1 and TST2 in Arabidopsis [117] and sugar beet [118]. TST2 is the only TST member that can transport sucrose and is associated with sucrose storage in vacuoles, as confirmed in several species, including sugar beet [118] and melon [127, 128].

Our interpretations may also come with caveats: because some of the data sets are not replicated, the expression data analyzed may not be fully representative in some instances. Moreover, several cell wall components that account for considerable fractions of carbon utilization, such as xylan and glucan [26], are not included in the present model because information on their metabolic genes in sorghum and closely related species is missing.

Conclusions

Here, we have presented the first comparative transcriptome analysis of sugar-accumulating internodes in sorghum that is relevant to bioenergy research at the gene discovery level. The common transcriptome features indicate differences in several primary metabolic pathways between sweet and non-sweet sorghums, pointing to metabolic networks that may coordinate carbon allocation and sink strength in the sorghum internode. Specifically, several genes, including those involved in cellulose and monolignol synthesis (CesA, PTAL, and CCR), starch metabolism (AGPase, SS, SBE, and the G6P translocator SbGPT2), and sucrose metabolism and transport (TPP and TST2), were strongly associated with the three sweet sorghum genotypes relative to the non-sweet lines, making them candidates for functional studies of carbon manipulation in the sorghum stem. This study also shows that combining multiple resources (including metabolites, expression data sets, genotypes, and conditions of the sorghum stem sink) provides a comprehensive and cohesive picture of the complexity of carbon sink strength in the sorghum stem, which a single data set might not achieve. The many candidate genes identified here could be manipulated and studied to further our understanding and utilization of carbon allocation and/or sugar accumulation in bioenergy crops.