Background

Heterosis is a phenomenon in which hybrids exhibit superiority over their parental lines in economic traits, such as enhanced biomass production, development rate, stress tolerance and, most importantly, grain yield. Heterosis has been extensively used to increase crop productivity throughout the world, and its application led to a major increase in rice yield. Because of this key role, the molecular mechanisms of heterosis should be elucidated. In the early twentieth century, the dominance (Davenport 1908) and over-dominance (Shull 1908) hypotheses were proposed to explain heterosis. However, because these hypotheses predate molecular concepts, they cannot explain the molecular genetic mechanisms of heterosis (Birchler et al. 2003). With the development of the polymerase chain reaction (PCR), molecular markers have been widely used to estimate the genetic distance between a hybrid and its parents and to relate heterosis to genetic distance. However, PCR-based markers can only be used to classify heterotic groups and determine genetic diversity; they cannot predict heterosis, because the correlation coefficient between the genetic distance estimated from SSR markers and yield heterosis is very small (Xu et al. 2009). Subsequently, molecular markers and hybrid genetic analysis have been used to locate QTLs for heterosis. A pioneering study reported the yield-related heterosis gene qGY2–1 in haplotype populations (He et al. 2006). To eliminate the epistasis effect among QTLs, Bian et al. (

Results

Phenotype analysis for WFYT025 and its parents

In this study, we investigated the yield-related traits of WFYT025 and its parents. The panicles of WFYT025 and its male parent CHT025 were larger than those of the female parent WFB, and their grain number and primary branch number were also higher than those of WFB (Fig. 1a and b). However, no significant differences were observed between WFYT025 and the paternal line CHT025 for either grain number or primary branch number (Fig. 1b). Mid-parent heterosis (MPH) and higher parent heterosis (HPH) were estimated for the panicle traits. The MPH was significant for all of the traits except the seed setting ratio and tiller number (Table 1). The MPH for primary branch number, secondary branch number, filled grain number, empty grain number and 1000-grain weight was significant at p < 0.05, while the MPH for spike length, total grain number and yield per plant was highly significant at p < 0.01. The MPH was negative for the seed setting ratio. Apart from the seed setting ratio and empty grain number, the MPH values for all of the traits varied from 1.16 to 32.32%. In addition, the HPH for yield per plant was highly significant (22.99%) at the p < 0.01 level. Further analysis indicated that the significant heterosis for yield per plant was mainly due to the large MPH for filled grain number (20.01%) and 1000-grain weight (6.25%). This implies that the yield heterosis of hybrid WFYT025 relative to maternal line WFB is more likely attributable to filled grain number than to 1000-grain weight.

Fig. 1

Comparisons of super hybrid WFYT025 combination. a The upper panel illustrates the panicles from combination of super hybrid WFYT025. Left, CHT025; middle, WFYT025; right, WFB. The lower panel shows the combination of super hybrid WFYT025. Left, CHT025; middle, WFYT025; right, WFB. b Panicle traits of CHT025, WFYT025, and WFB

Table 1 Phenotypic Analysis of Super Hybrid WFYT025 Combination

Identification of transcripts by sequencing

A total of 917 million raw reads were generated using the high-throughput Illumina HiSeq 2500 platform. Low-quality paired-end reads and adapter sequences were removed, leaving 872 million clean reads (Table 2). The correlation of gene expression levels among the three biological replicates of each line was 0.97 < R2 < 0.99 (Additional file 1: Figure S1). We pooled the short reads and aligned them to the Nipponbare reference genome (IRGSP v1.0) to identify the transcripts. Out of 35,679 identified transcripts, 27,917 transcripts were mapped, covering 78.24% of the genome. In addition, the transcriptome profile of WFYT025 was similar to that of its female parent WFB (Fig. 2).

Table 2 Number of Mapped Reads
Fig. 2

Hierarchical clustering analysis of all gene models based on expression data. Each horizontal line refers to a gene. The color key represents RPKM-normalized, log2-transformed counts. As the color varies from blue to red, the expression of transcripts ranges from low to high. CS 1 to 3, YS 1 to 3 and BS 1 to 3 stand for the replicated samples from CHT025, WFYT025 and WFB, respectively

Validation of gene expression by quantitative real-time PCR (qRT-PCR)

To validate the results of mRNA sequencing data, the expression of a subset of 15 randomly selected DGHP was determined by qRT-PCR. The list of primer sequences is presented in Additional file 2: Table S1. The results obtained from qRT-PCR and RNA-seq were compared, and expression trends were consistent for all transcripts in both analyses; the correlation coefficient (R2) was 0.9339 (Fig. 3).
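The agreement between the two platforms can be illustrated with a minimal sketch that computes R2 from paired log2(FC) values. It assumes that the reported R2 is the squared Pearson correlation coefficient, which the text does not state explicitly; the function names and the example values are hypothetical and are not taken from Fig. 3.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Squared Pearson correlation (assumed here to be the reported R2)."""
    return pearson_r(xs, ys) ** 2


# Hypothetical log2(FC) values for a handful of transcripts (not real data)
rnaseq_log2fc = [2.1, -1.4, 3.0, -2.2, 1.1]
qpcr_log2fc = [1.8, -1.1, 2.7, -2.5, 0.9]
print(f"R2 = {r_squared(rnaseq_log2fc, qpcr_log2fc):.4f}")
```

A value close to 1 indicates that the two methods rank and scale the expression changes similarly, which is the sense in which the trends are described as consistent.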

Fig. 3

Comparison of the log2 (FC) of 15 randomly selected transcripts using RNA-Seq and qRT-PCR

Analysis of differentially expressed genes (DEGs)

We adopted fragments per kilobase of transcript per million mapped reads (FPKM) to measure gene expression levels. Two criteria were used to identify putative DEGs: (1) the false discovery rate (FDR) should be ≤0.05 and (2) the fold change (FC) should be ≥2. Following these criteria, 4160 DEGs were identified between paternal line CHT025 and WFYT025. Of these, 2155 DEGs were up-regulated and 2005 were down-regulated. Additionally, 2809 DEGs were identified between maternal line WFB and WFYT025, of which 1463 DEGs were up-regulated and 1346 DEGs were down-regulated (Table 3). For a detailed comparison, the FPKM of all transcripts is presented in Additional file 3: Table S2. DEGs between the parents are designated DGPP, and DEGs between the hybrid and its parents are designated DGHP. DGHP may be relevant to heterosis because differences in expression between hybrids and parents should underlie their phenotypic differences. Although DGPP only refers to the differences between the two parental lines (Song et al. 2010), 3223 DGHPs overlapped with DGPP, which indicates that these DGPP are also associated with heterosis (Fig. 4). In addition, 1059 DGHPs were shared between the comparisons of the hybrid with each of its parents.
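The counts above can be cross-checked with a short sketch. It assumes that the 1059 shared DGHPs are the overlap between the two hybrid-versus-parent comparisons; under that reading, inclusion-exclusion gives the total number of distinct DGHP, which matches the 5910 DGHP used in the GO analysis. The function name is hypothetical.

```python
def union_size(n_a, n_b, n_shared):
    """Number of distinct items in two sets that share n_shared items."""
    return n_a + n_b - n_shared


# Counts reported in the text
degs_paternal = {"up": 2155, "down": 2005}   # CHT025 vs WFYT025
degs_maternal = {"up": 1463, "down": 1346}   # WFB vs WFYT025
shared = 1059                                # assumed overlap of the two comparisons

total_paternal = sum(degs_paternal.values())
total_maternal = sum(degs_maternal.values())
print(total_paternal, total_maternal)
print(union_size(total_paternal, total_maternal, shared))
```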

Table 3 Number and Classification of DGHP
Fig. 4

DEGs in super hybrid WFYT025 combination. Venn diagram of DEGs between the hybrid and its parents. CS, YS and BS represent CHT025, WFYT025 and WFB, respectively

The mode of inheritance for DGHP

Using the dominance ratio (Hp) to evaluate the mode of inheritance, DGHP were classified into four expression patterns: over-dominance (Hp ≤ − 1.2 or Hp > 1.2), dominance (− 1.2 < Hp ≤ − 0.8 or 0.8 < Hp ≤ 1.2), additive effect (− 0.2 < Hp ≤ 0.2), and partial dominance (− 0.8 < Hp ≤ − 0.2 or 0.2 < Hp ≤ 0.8) (Additional file 4: Table S3). As shown in Fig. 5, the over-dominant, dominant, partially dominant and additive effects accounted for 63.1%, 17.3%, 15.6% and 4.0% of DGHP, respectively.

Fig. 5

Breakdown of the DGHP according to the dominance ratio Hp. Based on the principle of Hp = [d] / [a], Hp was classified as either positive or negative

Functional classification of DGHP by Gene Ontology (GO)

We applied Gene Ontology (GO) analysis to classify the functions of the differentially expressed mRNAs. Using Web Gene Ontology Annotation Plot (WEGO) software (Ye et al. 2006), we assigned the 5910 DGHP to at least one term in the GO molecular function, cellular component, and biological process categories. Further analysis showed that the 5910 DGHP fell into 54 functional subcategories at a significance level of p < 0.05 (Fig. 6). In the cellular component category, cell and cell part were the main groups. In the molecular function category, DGHP were enriched in binding and catalytic activity. With respect to biological processes, cellular and metabolic processes were highly enriched in DGHP. We further analysed the GO terms of DGHP enriched in the biological process subcategories. These GO terms, including response to stimulus, cell proliferation, carbohydrate metabolic process, organ formation, and gibberellin biosynthetic process, may underlie heterosis in the young panicle of WFYT025 (Tables 4 and 5).

Fig. 6

Comparison of Gene Ontology (GO) classifications of DGHP. a CS and YS represent CHT025 and WFYT025 respectively. Red column and green column represent up-regulated and down-regulated transcripts respectively. b BS and YS represent WFB and WFYT025 respectively. Red column and green column represent up-regulated and down-regulated transcripts respectively

Table 4 Significant GO Terms of DGHP Between CS and YS in the Biological Process Category
Table 5 Significant GO Terms of DGHP Between BS and YS in the Biological Process Category

DGHP mapping to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways

To identify the metabolic pathways in which DGHP were involved and enriched, the Kyoto Encyclopedia of Genes and Genomes pathway database was used. In total, 613 DGHP between paternal line CHT025 and hybrid line WFYT025 were assigned to 118 pathways. The top 20 most enriched pathways mainly covered carbon fixation in photosynthetic organisms, DNA replication, fatty acid biosynthesis and metabolism, and phenylpropanoid biosynthesis (Fig. 7a). In contrast, 268 DGHP between maternal line WFB and WFYT025 were classified into 107 pathways, and the top 20 most enriched pathways were mainly concentrated in plant hormone signal transduction, carotenoid biosynthesis, diterpenoid biosynthesis, zeatin biosynthesis, and cysteine and methionine metabolism at a significance level of p < 0.05 (Fig. 7b). This suggests that the considerable differences in young panicles between WFB and WFYT025 may be related to hormone regulation.

Fig. 7

KEGG pathway assignments of DGHP. a KEGG analysis of DGHP between CHT025 and WFYT025. b KEGG analysis of DGHP between WFB and WFYT025. Both (a) and (b) showed the top 20 most represented categories and the number of transcripts predicted to belong to each category

Comparison of DGHP with grain yield-related genes (QTLs)

We mapped the DGHP between WFYT025 and WFB that were significant in the KEGG analysis (P < 0.05) to the QTLs associated with grain yield in the rice genome (http://www.gramene.org). As shown in Table 6, a total of 36 transcripts were mapped within the intervals of 22 yield-related QTLs, including 15 grain number QTLs, 6 1000-grain weight QTLs and 1 yield per plant QTL. Most genes shared their location with a single yield-related QTL. However, Os03g0856700 corresponded to both qGP3–1 for grain number and qSNP-3b for spikelet number per panicle. Os04g0229100 was mapped to the same locus as qGwt4a for 1000-grain weight and qSNP-4a for spikelet number per panicle, while Os04g0578400 and Os04g0608300 shared the same chromosome segment with qGPP-4 for grain number per panicle and qSNP4–1 for spikelet number per panicle.

Table 6 Significant differentially Expressed Transcripts Mapped in each of the QTL Regions

Discussion

Though heterosis has been extensively exploited in plant breeding and plays an important role in agriculture, the molecular and genetic mechanisms underlying this phenomenon remain poorly understood. Differential gene expression between a hybrid and its parents may be associated with heterosis (He et al. 2010; Kim et al. 2013; Zhang et al. 2008). Here, we investigated the relationship between transcriptional profiles and heterosis in super hybrid rice WFYT025 by RNA-Seq.

Comparative analysis of DGHP

Using RNA-Seq analysis, 872 million high-quality paired-end reads of 150 bp were generated from the panicles of WFYT025 and its parental lines at the panicle differentiation stage, and 27,917 annotated transcripts were identified. Of these transcripts, 4160 DGHP between hybrid WFYT025 and paternal line CHT025 and 2809 DGHP between hybrid WFYT025 and maternal line WFB were identified.

The filled grain number differed significantly between WFYT025 and WFB; however, there were no significant differences between WFYT025 and CHT025 (Fig. 1b, Table 1). These results suggest that the DGHP between WFYT025 and WFB at the young panicle development stage may play a more important role in grain number heterosis than those between WFYT025 and CHT025. Thus, focusing on the expression of DGHP between WFB and WFYT025 might reveal an association between DGHP and heterosis for filled grain number.

The genetic basis of heterosis

We have been able to identify a number of DGHPs underlying grain number between hybrid WFYT025 and maternal line WFB, confirming the suggestion that heterosis is a polygenic phenomenon (Kusterer et al. 2007; Bian et al. […] 1996), qGP3–1 (Li et al. 2001), qGPP-4 (J Exp Bot 56:2107–2118, 2005), qSP6–1 (Zhuang et al. 2001), qSNP-6 (He et al. 2001), qNFPB-11 (Yamagishi et al. 2004)); spikelet density (e.g., qSD-15 (Li et al. 1998) and qSSD-10 ([…] Theor Appl Genet 107:479–493, 2003), qGwt4a (Lin et al. 1995), qKw5 (Li et al. 1997), qGw-6 (Lu et al. 1996), and qGw7 (Li et al. 2000)); and yield per plant (e.g., yd7a (Li et al. 2000)).

A potential association between DGHP and QTLs was also suggested within many QTL regions, including putative protein phosphatase 2C (Os01g0846300) with qSSBP1–1 for the number of spikelets on secondary branches per panicle and putative transketolase (Os05g0408900) with qSD-15 for spikelet density. Interestingly, OsGA20ox1 (Os03g0856700), which is related to gibberellin biosynthesis, is located in both qGP3–1 for the number of grains per panicle and qSNP-3b for the spikelet number per panicle. Putative fatty acid hydroxylase (Os04g0578400), which is involved in carotenoid biosynthesis, and the auxin-responsive SAUR gene family member OsSAUR20 (Os04g0608300) were both located in qSNP4–1 for spikelet number per panicle and qGPP-4 for number of grains per panicle. Except for a small number of cloned genes, such as Os01g0788400, Os02g0697400, Os02g0771600, OsGA20ox1 (Os03g0856700), Os03g0760200, Os03g0645900, Os04g0474800, Os04g0522500, Os04g0556500, Os05g0380900, Os07g0154100, and Os07g0155600, the remaining genes located in grain yield QTLs for grain number, 1000-grain weight and yield (including Os01g0846300, Os05g0408900, Os04g0578400 and Os04g0608300) have not been cloned. Studying the function of the candidate transcripts in these QTL regions may increase our knowledge of the molecular mechanisms underlying heterosis.

Transcription factors probably underlying heterosis

Since transcripts are always under different levels of regulation, such as transcription and splicing through genetic or epigenetic mechanisms, detailed sequence comparisons and validations for different alleles of the annotated DGHP are beyond the scope of this report. Transcription factors (TFs) are one likely cause of gene expression fluctuations. In this study, we found that 51 TFs showed significant differential expression in the hybrid compared with the maternal line (Additional file 6: Table S5). Consistent with this, a previous study proposed that altered gene expression caused by interactions between transcription factors and allelic promoter regions in hybrids was one plausible mechanism underlying heterosis in rice (Zhang et al. 2008).

Furthermore, among the 51 TFs, we found that LAX1, which is the main regulator involved in the formation of axillary bud primordium in rice (Komatsu et al. 2003), is overrepresented in the hybrid. MADS-box 50 (MADS50) was up-regulated significantly, and MADS-box 56 (MADS56) was down-regulated in the hybrid compared to the maternal line (Additional file 6: Table S5). This is consistent with a previous study suggesting that OsMADS50 and OsMADS56 function antagonistically in regulating LD-dependent flowering (Ryu et al. 2009). Apart from the 21 previously reported TFs, the remaining 30 novel TFs might also play an important role in the young panicle and in grain number heterosis.

Conclusions

In this study, we systematically investigated the transcriptome profiles of young panicles from the super-hybrid rice WFYT025 combination at the panicle differentiation stage by deep, high-quality sequencing. We obtained a large number of DGHP and found that over-dominance is the main mode of inheritance for DGHP. By comparing the significantly enriched DGHP (P < 0.05) between WFYT025 and WFB with QTLs associated with grain number, we found some candidate transcripts that may contribute to the increase in grain yield. Exploring these candidate transcripts will provide new opportunities for revealing the basis of grain yield heterosis.

Methods

Plant materials and growth conditions

WFYT025 is a super-hybrid rice combination derived from the cross between female parent Wufeng B (WFB) and male parent Changhui T025 (CHT025). The hybrid and its two parental lines were sown in the experimental field of Jiangxi Agricultural University in a randomized complete block design with three replications in autumn 2016. Each plot consisted of 50 rows, with each row consisting of 10 plants, each separated from its neighbour by 20 cm. Crop management followed normal procedures for rice. These three lines were used to measure phenotypic traits and conduct transcriptome analyses. At maturity, panicles were sampled with ten replicates for the estimation of heterosis. Young panicles at the differentiation stage were collected and stored at − 80 °C for RNA-Seq analysis, and each sample had at least three biological replicates to minimize systematic errors.

Panicle heterosis measurements

To determine 1000-grain weight, panicles were dried in an oven at 42 °C for 1 week. Panicle length, primary branch number, secondary branch number, number of filled grains and total grain number were measured manually. Mid-parent heterosis (MPH) and higher parent heterosis (HPH) were calculated for these traits according to the following formulas: MPH = (F1 − MP) / MP and HPH = (F1 − BP) / BP, where F1 is the performance of the hybrid, MP is the average performance of the two parents and BP is the performance of the better parent. Hypothesis testing was performed using a t-test.
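The two formulas can be sketched as follows. This is a minimal illustration, assuming that the better parent is the one with the higher trait value (for traits such as empty grain number the opposite convention may apply); the function names and the example values are hypothetical and are not taken from Table 1.

```python
def mid_parent_heterosis(f1, parent_a, parent_b):
    """MPH = (F1 - MP) / MP, where MP is the mean of the two parents."""
    mp = (parent_a + parent_b) / 2.0
    return (f1 - mp) / mp


def higher_parent_heterosis(f1, parent_a, parent_b):
    """HPH = (F1 - BP) / BP, where BP is taken as the higher parent (assumption)."""
    bp = max(parent_a, parent_b)
    return (f1 - bp) / bp


# Hypothetical trait values for the hybrid and its two parents
f1_value, parent_1, parent_2 = 30.0, 20.0, 25.0
mph = mid_parent_heterosis(f1_value, parent_1, parent_2)
hph = higher_parent_heterosis(f1_value, parent_1, parent_2)
print(f"MPH = {mph:.2%}, HPH = {hph:.2%}")
```

Multiplying either ratio by 100 gives the percentages in which heterosis is reported in the Results.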

RNA extraction, cDNA library preparation and sequencing

Total RNA was extracted from rice panicles using Trizol reagent (Invitrogen, Carlsbad, CA, USA) and purified using an RNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. The quality and integrity of RNA were tested using an Agilent Bioanalyzer 2100 system (Agilent, Santa Clara, CA, USA); RNA Integrity Number (RIN) values were greater than 8.5 for all samples. After total RNA extraction, eukaryotic mRNA was enriched by oligo(dT) beads, while prokaryotic mRNA was enriched by removing rRNA using the Ribo-Zero™ Magnetic Kit (Epicentre). Then, the enriched mRNA was fragmented into 200-bp segments using fragmentation buffer and reverse transcribed into cDNA with random primers. Second-strand cDNA synthesis was subsequently performed using DNA polymerase I, RNase H, dNTP and buffer. Then, the cDNA fragments were purified with a QIAquick PCR extraction kit, end repaired, poly(A) added, and ligated to Illumina sequencing adapters. The ligation products were size selected by agarose gel electrophoresis, PCR amplified, and sequenced with 100 cycles of paired-end sequencing (2 × 150 bp) using an Illumina HiSeq™ 2500 by Gene Denovo Biotechnology Co. (Guangzhou, China). The processing of fluorescent images into sequences, base-calling and quality value calculations were performed using the Illumina data processing pipeline (version 1.8). The sequence reads were submitted to the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/sra) under the accession number SRP127997.

Identification of differentially expressed mRNAs

Raw reads generated from high-throughput sequencing were treated as follows. First, adapters added for reverse transcription and sequencing were removed, and reads with too many unknown bases (>10%) or too many low-quality bases (>50% of the bases with a quality score ≤ 20) were discarded. Reads that mapped to the ribosomal RNA (rRNA) database were removed with the read alignment tool Bowtie 2 (Langmead and Salzberg 2012). The remaining reads of each sample were then mapped to the Nipponbare reference genome (IRGSP build 1.0) by TopHat2 (version 2.0.3.12) (Kim et al. 2013). The parameters for alignment were set as follows: 1) the maximum read mismatch should be 2; 2) the distance between mate-pair reads should be 50 bp; 3) the error of distance between mate-pair reads should be ±80 bp. Differential expression was estimated and tested using the software package edgeR (R version 2.14, edgeR version 2.3.52) (Robinson et al. 2010). We quantified gene expression levels in terms of fragments per kilobase of transcript per million mapped reads (FPKM) (Mortazavi et al. 2008), calculated the false discovery rate (FDR), and estimated the fold change (FC) and log2 values of FC. Transcripts that exhibited an FDR ≤ 0.05 and an estimated absolute log2(FC) ≥ 1 were considered to be significantly differentially expressed.
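The thresholds described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline that was actually run: the function names are hypothetical, the FPKM formula is the standard definition, and the FDR is taken as given (in the study it is produced by edgeR, which is not reimplemented here).

```python
import math


def passes_quality_filter(bases, quals, max_n_frac=0.10, max_lowq_frac=0.50, q_cutoff=20):
    """Keep a read unless >10% of bases are unknown (N) or >50% have quality <= 20.

    bases: read sequence as a string; quals: per-base Phred scores as integers.
    """
    n = len(bases)
    if n == 0:
        return False
    n_frac = bases.upper().count("N") / n
    lowq_frac = sum(q <= q_cutoff for q in quals) / n
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_deg(fold_change, fdr, fdr_cutoff=0.05, min_abs_log2fc=1.0):
    """Apply the FDR <= 0.05 and |log2(FC)| >= 1 criteria to one transcript."""
    if fold_change <= 0:
        # log2 is undefined here; how zero expression was handled is not stated in the text
        return False
    return fdr <= fdr_cutoff and abs(math.log2(fold_change)) >= min_abs_log2fc


# Hypothetical example: 500 fragments on a 2 kb transcript, 20 million mapped fragments
print(fpkm(500, 2000, 20_000_000))
print(is_deg(fold_change=0.5, fdr=0.01))
```

Note that |log2(FC)| ≥ 1 treats a two-fold decrease (FC ≤ 0.5) the same way as a two-fold increase (FC ≥ 2), which is how both up-regulated and down-regulated DEGs are captured by a single criterion.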

The mode of inheritance analysis

For statistical analysis, the analysis of variance (ANOVA) was performed using the model y = u + (GA) + (GD) + (SR) + e, where y is the acquired gene expression, u is the overall mean, GA is the additive effect, GD is the dominant effect, SR is the replication effect, and e is the residual error (Lynch and Walsh 1998). Hp = [d]/[a], referred to as the dominance ratio or potency (where [a] and [d] represent GA and GD, respectively), was also calculated to measure the non-additivity of the F1 hybrid relative to its parents (Griffing 1990). Considering gene expression levels as quantitative traits, we adopted traditional quantitative genetic parameters, such as the composite additive effect [a] and the composite dominance effect [d], to describe our expression profiles. DGHP were classified according to the dominance ratio Hp (= [d]/[a]), based on 99.8% confidence intervals constructed for [d] - [a] ([d] > 0) and [d] + [a] ([d] < 0). According to the value of Hp, genes were assigned to partial dominance (− 0.8 < Hp ≤ − 0.2 or 0.2 < Hp ≤ 0.8), over-dominance (Hp ≤ − 1.2 or Hp > 1.2), dominance (− 1.2 < Hp ≤ − 0.8 or 0.8 < Hp ≤ 1.2) and additive effect (− 0.2 < Hp ≤ 0.2) (Stuber et al. 1987, Bian et al. 2011).
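The Hp thresholds can be expressed as a small classifier. The sketch below is illustrative only: it estimates [a] and [d] from mean expression values using the textbook definitions [a] = (P1 − P2)/2 and [d] = F1 − (P1 + P2)/2, which is an assumption because the text obtains them from the ANOVA model, and it omits the 99.8% confidence intervals. The function names and the choice of which parent is P1 are hypothetical.

```python
def dominance_ratio(p1, p2, f1):
    """Hp = [d] / [a] from mean expression of parent 1, parent 2 and the hybrid.

    Assumes [a] = (P1 - P2) / 2 and [d] = F1 - (P1 + P2) / 2.
    """
    a = (p1 - p2) / 2.0
    d = f1 - (p1 + p2) / 2.0
    if a == 0:
        raise ValueError("Hp is undefined when the two parents do not differ")
    return d / a


def classify_hp(hp):
    """Assign a mode of inheritance using the Hp intervals given in the text."""
    if hp <= -1.2 or hp > 1.2:
        return "over-dominance"
    if hp <= -0.8 or hp > 0.8:
        return "dominance"
    if hp <= -0.2 or hp > 0.2:
        return "partial dominance"
    return "additive"


# Hypothetical expression values: the hybrid exceeds the higher parent
hp = dominance_ratio(p1=20.0, p2=10.0, f1=30.0)
print(hp, classify_hp(hp))
```

Because each check is only reached when the previous one fails, the intervals are evaluated from the outermost inwards, and together they cover every possible value of Hp without overlap.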

Cluster analysis

Cluster analysis of all annotated transcripts from the hybrid and its parents was performed. The FPKM-normalized expression counts for each transcript were clustered with the software Cluster 3.0, and the results were visualized using Treeview (Eisen et al. 1998).

Real-time quantitative PCR

The expression of differentially expressed genes (DEGs) identified by RNA sequencing was validated by quantitative real-time PCR. Total RNA from nine samples (three lines, each with three biological replicates) was reverse transcribed using the PrimeScript™ RT reagent Kit with gDNA Eraser according to the manufacturer’s instructions. SYBR-based qRT-PCR reactions (SYBR Green I, Osaka, Japan) were performed on an ABI ViiA 7 using the following thermal cycling conditions: 50 °C for 2 min; 95 °C for 5 min followed by 40 cycles at 95 °C for 15 s and 60 °C for 34 s. All qRT-PCR reactions were performed in triplicate, and the results were analysed with the system’s relative quantification software (ver. 1.5) based on the ΔΔCT method. The threshold cycle of each reaction was normalized against the expression level of the rice Actin1 gene, which was amplified with the primer sequences 5′-TGGCATCTCTCAGCACATTCC-3′ and 5′-TGCACAATGGATGGGTCAGA-3′.
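The ΔΔCT calculation can be sketched as below. This assumes the commonly used 2^(−ΔΔCT) formulation with roughly 100% amplification efficiency; the text does not give the settings of the instrument software, so the function names and the threshold cycle values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of technical replicates."""
    return sum(values) / len(values)


def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control.

    Each argument is a list of threshold cycles (technical replicates).
    The reference gene plays the role of Actin1 in the text.
    """
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical triplicate Ct values: hybrid as sample, one parent as control
fold_change = relative_expression(
    ct_target_sample=[24.0, 24.1, 23.9],
    ct_ref_sample=[18.0, 18.0, 18.0],
    ct_target_control=[26.0, 26.0, 26.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(f"fold change = {fold_change:.2f}")
```

Taking log2 of the resulting fold change gives a value that can be compared directly with the log2(FC) from RNA-Seq.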