Background

Rice is the primary food source for over 3.5 billion people […] [7] or cell size [8] or both [9]. This increases the sink area for acquiring storage compounds, leading to increased grain weight and thus establishing a positive correlation between seed size and grain yield [10]. Understanding grain size (GS) regulation in rice is challenging. As a quantitative trait, it is under polygenic regulation and environmental influence. Furthermore, various traits of the rice grain, namely length, width, and weight, are associated with each other, and one gene/quantitative trait locus (QTL) may affect more than one such trait [11].

Grain development in rice generally occurs over a month and can be distinguished by three landmark events, viz., cell division, initiation of organs, and maturation [12, 13]. The first landmark event extends from 0 to 2 days after pollination (DAP), where extensive cell division occurs immediately after anthesis followed by triple fusion to form a middle-globular embryo and a syncytial endosperm. Organ initiation on the embryo and cellularization of the endosperm begin by 3–4 DAP and mark the second landmark event. Maturation involves organ enlargement and maturation of the embryo and endosperm, including grain filling, and extends from 5 to 29 DAP. Maturation can be further divided into distinct sub-stages: individual organ enlargement on the embryo and endoreduplication of the endosperm (5–10 DAP), embryo maturation and programmed cell death (PCD) in the endosperm (11–20 DAP), and embryo dormancy and dehydration of the endosperm (21–29 DAP). Storage accumulation occurs during the maturation phase and is related to grain filling, to which carbohydrate and seed storage protein accumulation are the major contributors. Altogether, the process of rice seed development has been defined by five stages: S1 (0–2 DAP), S2 (3–4 DAP), S3 (5–10 DAP), S4 (11–20 DAP), and S5 (21–29 DAP) [12, 14].
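The five-stage scheme above can be expressed as a small lookup, which is convenient when samples are labelled by DAP rather than by stage. This is a minimal sketch: the stage boundaries come from the text, while the function name `stage_for_dap` and the error handling are illustrative assumptions.

```python
# Stage boundaries in days after pollination (DAP), inclusive, as given in the text.
STAGE_BOUNDS = {
    "S1": (0, 2),
    "S2": (3, 4),
    "S3": (5, 10),
    "S4": (11, 20),
    "S5": (21, 29),
}


def stage_for_dap(dap: int) -> str:
    """Return the seed development stage (S1-S5) that contains the given DAP."""
    for stage, (start, end) in STAGE_BOUNDS.items():
        if start <= dap <= end:
            return stage
    # Hypothetical choice: reject values outside the 0-29 DAP window.
    raise ValueError(f"DAP {dap} is outside the 0-29 DAP window")
```

For example, `stage_for_dap(7)` falls in the 5–10 DAP window and is therefore assigned to S3.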

GS regulation in rice is extremely diverse with involvement of genes controlling hormones [15], G-protein signaling [33], stress tolerance [34, 35], grain quality, and hybrid vigor […] (Additional file 2: Table S10). All these detected QTLs were mapped on 12 chromosomes, explaining 1.5–30% phenotypic variation for grain size/weight with a significant logarithm of the odds (LOD) score (2.5–13.0) in rice. Further, genes identified in each of the abovementioned transcriptome analyses to delineate pathways controlling rice GS differentiation, which were either stage-specific or highly differentially regulated between SN and LGR (Fig. 9, Additional file 2: Tables S5, S6, S7, S8 and S9), were overlaid with the grain size/weight QTLs. A total of 186 of these genes (Additional file 2: Table S11) were found to be located within QTL genomic intervals. These comprised 31 genes that were commonly differentially regulated in a stage-specific manner in SN and LGR, 15 cell cycle-related genes that were specifically upregulated in either the SN S5 or the LGR S3 stage, 38 LGR S3 preferential genes, 45 TFs belonging to families with stage-preferential expression in a stage of either SN or LGR, 50 phytohormone-related genes that had a different pattern and level of expression between SN and LGR, and seven genes with opposite regulation in SS or CS comparisons whose upstream miRNAs also show opposite regulation. The presence of these 186 genes (including regulatory genes) in QTL genomic regions associated with GS traits strongly indicates a role in the process, and they should therefore be explored as promising candidate genes through detailed functional characterization in rice.
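The overlay of candidate genes onto QTL genomic intervals described above amounts to an interval-containment check per chromosome. The sketch below illustrates the idea only: the `Interval` type, the function name, and all coordinates are hypothetical, and requiring the whole gene to lie inside the QTL interval is an assumption, since the text does not state the exact criterion used.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A named genomic interval (gene or QTL) with inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int


def genes_within_qtls(genes, qtls):
    """Map each gene name to the QTLs whose interval fully contains the gene."""
    hits = {}
    for gene in genes:
        containing = [
            qtl.name
            for qtl in qtls
            if qtl.chrom == gene.chrom
            and qtl.start <= gene.start
            and gene.end <= qtl.end
        ]
        if containing:
            hits[gene.name] = containing
    return hits


# Hypothetical coordinates, for illustration only.
example_genes = [
    Interval("geneA", "chr3", 1_000, 2_500),
    Interval("geneB", "chr3", 9_000, 9_800),
    Interval("geneC", "chr5", 1_200, 1_900),
]
example_qtls = [
    Interval("qtl1", "chr3", 500, 3_000),
    Interval("qtl2", "chr5", 5_000, 8_000),
]
example_hits = genes_within_qtls(example_genes, example_qtls)
```

With these made-up intervals, only `geneA` lies inside a QTL (`qtl1`); `geneB` is on the right chromosome but outside the interval, and `geneC` does not overlap `qtl2`.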

Novel miRNA-target modules in rice GS regulation

In rice seeds, miRNAs are known to function both in early seed development [57] and grain filling [58]. Using exactly the same seed development tissues (S1–S5) and flag leaf as for the transcriptome, miRNA expression profiles were generated for SN and LGR. Small RNA sequencing generated a total of 14 Gb clean data for 36 samples (Additional file 2: Table S12) with an average Pearson’s correlation of 0.85 between biological replicates (Additional file 2: Table S13). miRNAs with TPM ≥ 50 [59, 60] were counted as expressed. The data were validated by stem-loop qRT-PCR of miR530-5p (Additional file 1: Figure S12A). A total of 193 (SN) and 196 (LGR) miRNAs were expressed, belonging to 70 and 76 families, of which four and 10 families were specific to SN and LGR, respectively (Additional file 1: Figure S12B, Additional file 2: Table S14). The numbers of total and specific miRNAs expressed in each stage were higher in LGR (Additional file 1: Figure S12C). This coincided with a lower number of expressed genes in LGR (Additional file 1: Figure S3). LGR also had a greater number of differentially expressed miRNAs (DEMs) than SN (Additional file 1: Figure S13A). Comparison of DEMs between CS and same SS stages of SN and LGR showed that most miRNAs exhibited similar regulation between SN and LGR in both comparisons (Additional file 1: Figure S13B, Additional file 2: Table S15).
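The expression cutoff mentioned above (TPM ≥ 50) can be illustrated with a short filter. This is a sketch under stated assumptions: the input layout (miRNA name mapped to per-sample TPM values) and the rule that a miRNA counts as expressed if it reaches the cutoff in at least one sample are hypothetical choices, as the text gives only the threshold itself.

```python
EXPRESSION_THRESHOLD_TPM = 50  # cutoff stated in the text


def expressed_mirnas(tpm_by_sample):
    """Return the set of miRNAs reaching the TPM cutoff in at least one sample.

    tpm_by_sample maps a miRNA name to a dict of {sample label: TPM}.
    The 'at least one sample' rule is an assumption made for this sketch.
    """
    return {
        mirna
        for mirna, values in tpm_by_sample.items()
        if any(tpm >= EXPRESSION_THRESHOLD_TPM for tpm in values.values())
    }
```

A stricter variant could require the cutoff to be met in every replicate of a stage; the choice affects how many miRNAs are reported as expressed.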

Further, miRNAs and their targets with negative correlation in expression patterns were identified. DEMs present throughout seed development in SN or LGR and their targets were delineated to extract miRNA-target modules pertinent to GS (Additional file 1: Figure S14A, B, Additional file 2: Table S16). Thirteen miRNAs that were upregulated in all five stages in SN had 85 targets that were downregulated in all stages in SN (Additional file 1: Figure S14A). These included osa-miR1848, osa-miR319a-3p, osa-miR535-3p, osa-miR1872, osa-miR1874, and osa-miR5504, which are discussed later. Conversely, in SN itself, 13 miRNAs that were downregulated in all five stages had 41 targets with negatively correlated expression patterns (Additional file 1: Figure S14B, Additional file 2: Table S16). This set included osa-miR1432 and osa-miR397, which are discussed later. Targets of these 13 miRNAs belonged to functional categories including cell wall synthesis, amino acid metabolism, TFs, protein post-translational modification, signaling, and transport. In LGR, 21 miRNAs that were upregulated in all five stages had 61 negatively correlated targets (Additional file 1: Figure S14A, Additional file 2: Table S16). The majority of these were non-conserved miRNAs (specific to rice). These included osa-miR529 and osa-miR396, which are discussed later. Targets of these 21 miRNAs belonged to functional categories including amino acid metabolism, hormone metabolism, secondary metabolism, TFs, stress response, signaling, and transport. Twenty-two miRNAs were downregulated in all five stages of LGR and had 38 targets (Additional file 1: Figure S14B, Additional file 2: Table S16). From this set, osa-miR166, osa-miR398, osa-miR159, osa-miR167i-3p, and osa-miR172a are discussed later.
Targets of these 22 miRNAs belonged to several functional categories relevant to GS, including cell cycle, cell division, cell wall modification, cell organization, DNA synthesis, fermentation, and starch synthesis.
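The selection of miRNA-target modules described above (a miRNA regulated in one direction in all five stages, with targets regulated in the opposite direction in all five stages) can be sketched as follows. The function names and input layout are hypothetical, and using only the sign of log2 fold change, without any significance cutoff, is a simplifying assumption rather than the criterion used in the study.

```python
STAGES = ("S1", "S2", "S3", "S4", "S5")


def consistently(direction, log2fc):
    """True if log2fc is positive ('up') or negative ('down') in every stage."""
    if direction == "up":
        return all(log2fc[stage] > 0 for stage in STAGES)
    return all(log2fc[stage] < 0 for stage in STAGES)


def anticorrelated_modules(mirna_fc, target_fc, predicted_targets):
    """List (miRNA, target, miRNA direction) for pairs opposed in all stages.

    mirna_fc / target_fc map a name to {stage: log2 fold change};
    predicted_targets maps a miRNA to its predicted target genes.
    """
    modules = []
    for mirna, targets in predicted_targets.items():
        for direction, opposite in (("up", "down"), ("down", "up")):
            if not consistently(direction, mirna_fc[mirna]):
                continue
            for target in targets:
                if target in target_fc and consistently(opposite, target_fc[target]):
                    modules.append((mirna, target, direction))
    return modules
```

Running the same function separately on SN and on LGR fold changes, and intersecting the results, would give the modules shared by both genotypes.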

Five miRNAs were upregulated throughout seed development in both SN and LGR (Fig. 8A). They had a total of nine targets which showed negative correlation in expression pattern and were downregulated throughout seed development in both SN and LGR. Similarly, five miRNAs were downregulated throughout seed development in both SN and LGR (Fig. 8B), and targeted eight genes having negative correlation in expression. These targets belonged to functional categories which included TFs, hormone-related genes, signaling and transport, amino acid metabolism, and light reaction. These included osa-miR396, osa-miR408-3p, osa-miR408-5p, osa-miR444, and osa-miR528, which have been detailed further on. Of these, osa-miR319 and osa-miR528-5p had multiple novel predicted targets.

Fig. 8

Expression analysis of miRNAs-targets pairs showing negative correlation in all five seed developmental stages of both SN and LGR. A Graphs showing differential expression levels (log2fc) of five miRNAs upregulated and their targets downregulated in all five stages of SN and LGR. B Graphs showing differential expression levels (log2fc) of five miRNAs downregulated and their targets upregulated in all five stages of SN and LGR. In each graph, dotted and solid lines represent miRNAs and their targets, respectively. Names of miRNAs and their targets have been mentioned in the legends in each graph

DEMs with opposite regulation between SN and LGR should also be important players in GS (Additional file 1: Figure S13B). In terms of SS, a total of 22 miRNAs were upregulated in any stage in LGR and downregulated in the same stage in SN. Similarly, five miRNAs were upregulated in any stage in SN and downregulated in the same stage in LGR. In terms of CS, nine miRNAs were upregulated in LGR and downregulated in a comparable stage in SN, while six miRNAs showed the reverse pattern. Of these, for eight miRNAs that were upregulated in LGR and downregulated in SN, nine targets showed negative correlation in expression (Fig. 9). A prominent family highlighted by this analysis was osa-miR2118, which is detailed later. No negatively correlated targets were found for miRNAs upregulated in SN and downregulated in LGR. The targets of the eight miRNAs upregulated in LGR and downregulated in SN included two genes each for lipid metabolism and transport, and one gene each related to cytochrome P450, UDP glucosyl and glucoronyl transferases (UGTs), and biotic stress. Therefore, these miRNA-target modules exhibiting opposite behavior between LGR and SN might be essential regulators of GS in rice and should be examined further.
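The same-stage (SS) and comparable-stage (CS) comparisons can be sketched as a sign check over stage pairs. The CS pairing used here (SN S1 with LGR S2, SN S2 with LGR S3, SN S3 with LGR S4, SN S4 with LGR S5) follows the Fig. 9 legend; the function name, the labels, and the use of the sign of log2 fold change alone are illustrative assumptions.

```python
STAGE_LABELS = ("S1", "S2", "S3", "S4", "S5")

# (SN stage, LGR stage) pairs
SS_PAIRS = [(stage, stage) for stage in STAGE_LABELS]
CS_PAIRS = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S5")]


def opposite_in_pairs(sn_fc, lgr_fc, pairs):
    """Return (SN stage, LGR stage, label) for pairs with opposite regulation.

    sn_fc and lgr_fc map a stage label to the log2 fold change of one miRNA.
    """
    hits = []
    for sn_stage, lgr_stage in pairs:
        sn, lgr = sn_fc[sn_stage], lgr_fc[lgr_stage]
        if lgr > 0 > sn:
            hits.append((sn_stage, lgr_stage, "up_in_LGR"))
        elif sn > 0 > lgr:
            hits.append((sn_stage, lgr_stage, "up_in_SN"))
    return hits
```

Calling the function once with `SS_PAIRS` and once with `CS_PAIRS` keeps the two comparison types separate, mirroring how they are reported in the text.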

Fig. 9

miRNA-target modules showing opposite regulation pattern in seed developmental stages of SN and LGR. Graphs showing differential expression levels (log2fc) of eight miRNAs upregulated in same stage (i.e., SN S1-LGR S1, SN S2–LGR S2) or comparable stage (i.e., SN S1-LGR S2, SN S2-LGR S3, SN S3-LGR S4, SN S4-LGR S5) of LGR and downregulated in same or comparable stage of SN, and their targets that show negative correlation in SN and LGR (downregulated in LGR and upregulated in SN in the same stage or comparable stage). IDs of miRNA and targets have been mentioned in the legends in each graph. Dotted blue line and solid lines represent expression of miRNA and targets, respectively. In each graph, negative correlation obtained between miRNA and its target in same stages and comparable stage has been marked with purple and blue bars, respectively

Rice grain development database (RGDD) allows for easy data access

To make the transcriptome and miRNome data searchable, RGDD (www.nipgr.ac.in/RGDD/index.php) has been developed. In the transcriptome tab, a Locus ID from RGAP (Rice Genome Annotation Project) can be entered directly in option 1. If the Locus ID is not known and the user wants to search by gene function, option 2 should be used. In this case, the search can be restricted to genes encoding a TF or related to a phytohormone. Transposable elements can also be selected for or eliminated using option 2. The output shows both expression (average FPKM) and differential expression (log2 fold change) values for all stages of both SN and LGR. The output also mentions whether the gene is seed-specific (Additional file 1: Figure S16A). In the miRNome tab, option 1 allows a known miRNA ID to be entered. Option 2 allows a miRNA to be selected from a dropdown list. The output displays both the expression (average TPM) and differential expression (log2 fold change) values (Additional file 1: Figure S16B).

Discussion

Contrasting genotypes, SN and LGR

In the present study, two rice genotypes contrasting for GS have been considered for transcriptome and miRNome analyses throughout five seed development stages, with the aim of elucidating genes and pathways contributing to an increase in GS. In such extensive studies, there is a possibility of data dilution by genes reflecting genotypic differences. To minimize this, flag leaf of both genotypes has been used as a control to calculate DEGs involved in seed development, individually for SN and LGR. Subsequently, these DEGs have been compared between SN and LGR to elucidate the ones with different expression patterns and levels. Here, flag leaf has served as a control to remove genes which might be expressed at high levels in both leaf and seed in a particular genotype only. If not eliminated by this method, these genes would have been highlighted as DEGs because of genotypic differences, without representing an actual role in GS. The comparative methodology used in the present study has often been used to examine two plant varieties contrasting for a given trait. Temporal transcriptome analysis for cold tolerance in two contrasting cultivars of tobacco, Tai tobacco (TT, cold susceptible) and Yan tobacco (YT, cold resistant), has been used to identify DEGs in both cultivars after comparing with the corresponding control (without cold treatment) for each cultivar [61]. A comparative transcriptomics study in the root and shoot tissues of N-efficient (PBW677) and N-inefficient (PBW703) cultivars of wheat has revealed the genes that regulate nitrogen use efficiency [62]. PBW677 has considerably more DEGs than PBW703. Two contrasting peanut cultivars, Zhonghuahei 1 and Zhongkaihua 151, with high and low free amino acids in mature seeds, respectively, have been compared by metabolomic and transcriptomic approaches to identify the regulatory network of amino acid metabolism [63]. 
Recently, a study integrated metabolome with transcriptome analyses to better understand the metabolite profiles and molecular mechanisms regulating different cane traits, namely brix, rind color, and texture, in the stems and leaves of contrasting sugarcane varieties FN41 and 165402 [64]. Two contrasting teak cultivars, T. grandis […] [67]. Transcriptome of whole rice grains has been used to study grain filling under soil drying conditions [68]. Entire panicles have been used for studying transcriptome rhythms due to warm night temperature [69]. Transcriptome analyses have been performed on mutant and wild-type 12 DAP whole grains to understand the role of FERONIA-like receptor genes in GS [70]. Complete caryopsis has been used to examine the transcriptome and phytohormone changes contributing to grain chalkiness [57, 81]. miR1874 was upregulated throughout seed development in both SN and LGR (Fig. 8A). It is specifically expressed in developing seeds of both the japonica cultivar Nipponbare [82] and Baifeng B, an indica landrace [83]. Similarly, miR528, which is highly expressed in Nipponbare grains [81], is also highly expressed in both SN and LGR (Additional file 2: Table S14). Another case is miR396, which is a negative regulator of GS in different genetic backgrounds in both indica and japonica rice [84]. Hence, it will be interesting to extrapolate the information generated in the present study to other genotypes and identify molecular markers of rice grain development and GS control.

Transcriptome changes regulating GS

Transcriptome transitions control developmental progression. The morphological data of seeds and endosperm sections showed that full seed size and weight were attained later in LGR (Fig. 1, Additional file 1: Figure S1). Developmental events also occurred later in LGR seed development. Hence, it can be hypothesized that slower progression of seed development might positively affect GS. To elucidate the genes/processes responsible for this, the transcriptomes of SN and LGR during seed development were compared. Amongst the pathways with ≥ 50 genes with log2fc ≥ 10 (Additional file 1: Figure S4B), AGPs (under the cell wall category) and GDSL motif lipases were more numerous in LGR. AGPs are glycoproteins present on the cell wall that function in plant growth and development, including cell division and expansion. They are expressed in various rice tissues, including seeds and seedlings, that are subject to rapid changes in cell morphology [85, 86]. GDSL lipases are multifunctional enzymes involved in fatty acid metabolism during seed germination and have been found to be downregulated in RGE1 mutants of Arabidopsis, which exhibit smaller seeds [87, 88]. The abundance of these proteins in LGR might indicate their involvement in cell size increment. SSPs and LEAs were abundant in both SN and LGR. This is because SSPs are the second major storage product in rice [14, 89]. LEA proteins form up to 4% of total cellular proteins in seeds and are associated with imparting drought tolerance during the seed drying phase [90, 91]. The comparative transcriptome data thus provide insight into the genes and pathways that might be responsible for GS increment.

LGR S3 stage is important for increase in GS

In CS and SS pair-wise comparisons, oppositely regulated DEGs were categorized into the same pathways (Fig. 2D). This could imply that similar functions are modulated by different genes in the two genotypes. LGR had more genes in the DNA synthesis category. DEGs had an LGR-specific enrichment of cell cycle and DNA metabolism (Additional file 1: Figure S6A), and the number of cell cycle-related DEGs was highest for the S3 stage of LGR (Fig. 3B). This was the stage where the endosperm cell size increased in LGR (Fig. 1I, xv, xvi), unlike SN. DEGs preferential to LGR S3 (Additional file 1: Figure S15) included a high number of cell cycle-related genes. Also, 38 LGR S3 stage-specific or preferential DEGs were located within QTLs governing GS. The S3 stage in LGR overlapped with the endoreduplication phase of cereal seed development [92, 93]. Expression of endoreduplication-related genes peaked in the LGR S3 stage (Additional file 1: Figure S6C). Direct correlations exist between endoreduplication and cell size in plants [94]. It is known that endoreduplication increases cell size by increasing nuclear volume, driving cells to increase cell volume to maintain the nucleo-cytoplasmic ratio [95], and by increasing gene expression to enhance reserve accumulation [96]. Hence, it can be hypothesized that higher endoreduplication levels in LGR might contribute positively to GS, which can be verified further.

The process of grain filling, once completely understood, can be used to increase yield [97]. After starch, SSPs are the second most abundant nutrient component of rice grain [98]. Quality of SSPs and starch have a direct influence on each other [99, […] 104] and cellulose synthesis to mediate organ elongation in plants under different stimuli [105, 106]. Phosphorylation of fructose by fructokinases after cleavage of sucrose drives carbon metabolism towards starch synthesis and respiration. OsFKII, a rice fructokinase gene, is expressed at high levels in endosperm [107], suggesting involvement in starch accumulation. Mutation in the rice plastid phosphorylase gene, Pho1, reduces starch synthesis and causes abnormal seed morphology [108]. There are numerous examples showing that GS and starch content are directly correlated and that starch has a bearing on grain quality [109, 110, 114]. Hence, the genes related to carbohydrate biosynthesis and metabolism and SSP biosynthesis play an important role in the control of GS and should be targeted for crop improvement through molecular breeding.

Regulators of GS

TFs regulate grain size and shape [55]. When the genes coding for TFs or participating in hormone-related, cell cycle and growth, SSP, and carbohydrate pathways (Additional file 1: Figure S15) were sorted on the basis of their differential expression or specificity, TFs formed the largest group amongst the seed-specific genes commonly expressed in all seed stages of SN and LGR, indicating their regulatory importance during seed development. TF families (≥ 10 members) included Myb, AP2, NAC, zfC2H2, bZIP, Homeobox, MADS, B3, PHD, and Aux/IAA, in that order. For DEGs with log2fc ≥ 10, TF-encoding genes were again the most abundant, highlighting their regulatory importance. Genes from many of these families regulate seed development [12, 37, 38, 51, 53]. Comparison of expression of TFs between SN and LGR showed that members of the bZIP, NF-YC, C2H2, GATA, MADS, FHA, NF-YB, PHD, LBD, PLATZ, SBP, WRKY, and DOF families might regulate GS in various capacities and are also located within QTLs (Fig. 6, Additional file 2: Table S11). Of these, RISBZ1, a bZIP TF, is involved in starch synthesis in rice seeds [115]. bZIPs regulate amylose biosynthesis in wheat grain [116]. NF-Ys regulate rice grain quality [117] and control endosperm development [118]. OsNF-YB1 regulates nutrient transport via sucrose transporters in endosperm during grain filling [81]. […] Conserved miRNAs included osa-miR529 and osa-miR396, which are involved in seed size regulation [151, 152]. In the contrasting set of miRNAs downregulated in all stages of LGR, several miRNAs associated with seed size and grain filling were present, including osa-miR166 [153], osa-miR398, and osa-miR159 [154]. A few of the targets were cyclin-related proteins (targeted by osa-miR167i-3p) and FtsZ2-1 (targeted by osa-miR172a). OsFtsZ2 is homologous to the bacterial cytokinesis-related FtsZ and is positively implicated in amyloplast division in rice [44]. There was also an expansin precursor, which mediates cell wall expansion [155]. 
These miRNAs are downregulated throughout seed development in LGR to enhance GS by promoting numerous aspects of seed development, including cell proliferation, cell elongation, energy production, and grain filling [9, 44, 96, 134]. Thus, new potential miRNA-target modules identified here should be explored further to establish their roles in GS regulation.

miRNA-target pairs with similar differential regulation in both SN and LGR

miRNA-target modules with similar regulation throughout both SN and LGR (Fig. 8) can be pivotal regulators of seed development. miR396a-3p is one such molecule. There are nine members in the osa-miR396 family. Of these, miR396c, miR396e, and miR396f are negative regulators of grain length, width, and weight [158]. MIM396-5p plants, where all of the above are downregulated, have increased grain length but decreased grain width [84]. Grains of Baifeng B (an indica landrace) and the japonica cultivar Zhonghua 11 show high expression of miR396 [57, 83]. In SN and LGR, miR396a-3p was upregulated throughout grain development, though to a higher extent in SN (Fig. 8A). Its putative target is novel. Hence, miR396a-3p might be an important regulator of rice seed development and should be explored further. Also, in Arabidopsis leaf, the levels of miR396 limit cell proliferation [159], suggesting a similar role in grain. Additionally, in our data, 13 miRNAs belonging to this family (including -3p and -5p forms) are differentially expressed, and regulation of GS might result from a combination of these. miR396 functions through miR408, a positive regulator of grain size [84]. In concordance, both miR408-5p and miR408-3p were downregulated throughout grain development in SN and LGR (Fig. 8B). However, it is known that osa-miR408-3p targets the constitutively expressed BRD2/LTBSG1 and is involved in BR-mediated cell elongation in rice seeds [160]. Genes often regulate multiple aspects of rice grain development, including grain size, starch, and seed storage protein biosynthesis. Since the miR396/miR408 module is differentially regulated throughout seed development and has novel targets, it is possible that these miRNA-target pairs are major regulators of rice grain development, and they need to be explored further. In addition, osa-miR319b, which was commonly upregulated during seed development in SN and LGR (Fig. 8A), is known to target OsCAF2, which regulates normal chloroplast development [161]. 
This suggests suppression of chloroplast development in rice seeds. miR319 has been shown to target TCPs in leaf development [162]. It is highly upregulated in our data (Fig. 8A), and with novel targets, hinting at new avenues to explore. miR444 is known to regulate tillering and ovule development [163, 164]. It is downregulated in both SN and LGR (Fig. 8B) though to varying extents. miR528 is known to control flowering time, pollen formation, and plant height [171, 172]. GPDH is involved in energy production from gluconeogenesis. Inhibition of sucrose production via gluconeogenesis triggers compensated cell enlargement in Arabidopsis cotyledons [173], and hence, these targets might have a role in GS regulation. Four members of miR2118 family and their targets were oppositely regulated between SN and LGR in either CS or SS stages (Fig. 9). miR2118 is essential for proper reproductive development by formation of anther wall [174]. Since its members are upregulated in LGR stages, this family can be studied further for their role in GS control.

Briefly, it appears that genes associated with BR signaling, cell expansion, and stress tolerance experience opposite regulation, via miRNAs, in SN and LGR. Furthermore, miRNAs with opposite regulation between SN and LGR seem to favor cell enlargement by modulating BR signaling and lipid metabolism [168, 173], thereby increasing GS in LGR. In addition, the localization of these targets within QTLs associated with GS strengthens the probability of their involvement in the process. Thus, novel miRNA-target modules identified for SN and LGR can provide suitable avenues for future studies in rice GS regulation.

The Domino effect model of seed size regulation

This extensive comparative morphological, histological, and transcriptome analysis of SN and LGR seeds throughout seed development concludes with a “Domino effect” model of seed size regulation (Fig. 10), emphasizing the significance of the chronology of seed developmental events in governing GS. Just as the fall of one domino triggers the next one, completion of one seed development event initiates the next. This process is strictly overseen by TFs, hormonal interplay, and miRNA regulation. Stage-wise comparison of the transcriptome of five seed stages showed that any given stage of LGR was most similar to the preceding stage in SN, which was also validated by endosperm sections. Delayed cellularization in LGR indicated a longer period of the initial cell division phase (Figs. 1 and 2), supported not only by enrichment of cell cycle-related genes in its transcriptome, but also by continuation of the cell cycle till the S3 stage (Fig. 3). Maximum increment in seed weight in SN and LGR occurred in S3 and S4 stages, respectively, indicative of delayed storage reserve accumulation in LGR (Figs. 1H, 4 and 5), as a consequence of predominant expression of carbohydrate and SSP biosynthetic genes from SN S2 and LGR S3 stages onwards. Storage reserve accumulation occurs after cellularization [13] and enhances seed size and weight [23, 119]. As cellularization is prolonged in LGR, accumulation of storage compounds is delayed (Fig. 10). Cells in LGR spikelets were larger in size (Fig. 1; Additional file 1: Figure S1A), suggesting enhanced cell expansion. Subsequently, enhanced endosperm cell size and prominent nuclei were apparent in the S3 stage, marking it as the period of cell elongation, coinciding with the endoreduplication phase [12]. Markers of endoreduplication showed peak expression at LGR S3 (Additional file 1: Figure S6C). Additionally, the LGR S3 transcriptome was the most distinct in comparison with other stages (Figs. 2B and 3), signifying intense transcriptome reprogramming to increase seed size. 
Thus, larger seed size in LGR appears to be a result of enhanced cell expansion via endoreduplication during S3 stage (Fig. 10). Lastly, PCD started early in SN endosperm cells (Fig. 1I and Additional file 1: Figure S1C), induced probably by expression of positive regulators of PCD from S2 stage onwards (Additional file 1: Figure S8B and C). Precocious cellularization and PCD are known to reduce GS in rice [25, 26], thus restricting GS in SN, and in turn supporting the postulated “Domino effect”. Essential events occurring during seed development, namely cell cycle, cell expansion, storage accumulation, and PCD, appear to be modulated by temporal regulation of phytohormones in the two genotypes creating differences in GS (Figs. 7 and 10). Moreover, extensive miRNA regulation of genes throughout seed development (Fig. 8) adds another level of regulation in this process. Our study also suggests the presence of new miRNA-target modules that need to be functionally validated for their roles in rice seed development. The postulated Domino effect model is also supported by the transcriptome analyses of seed development in IR64, which has a medium-sized grain [14]. Thus, a “Domino effect” influences seed development wherein one process/pathway is overlapped by the next one, and it is the extent of one process that determines the occurrence of subsequent one, thereby regulating seed size.

Fig. 10

The “Domino effect” model of seed size regulation. Diagram represents progression of the major seed developmental events in SN and LGR. Upper and lower panels represent SN and LGR, respectively (as mentioned on the left of the diagram). S1–S5 represent five stages of seed development. Orange, red, green, violet, and blue bars represent progression of cell cycle, cellularization, endoreduplication, storage reserve accumulation, and PCD during seed development (as indicated in the color legend). Lines represent accumulation pattern of hormones during seed development in SN and LGR based on genes showing different expression pattern in the two genotypes. Pathways specific or preferential to S3 stage in LGR contributing to enhanced cell size have been mentioned between the dotted red lines below S3 stage. Comparable seed developmental stages have been connected with dotted gray lines

Conclusions

Comparison of the transcriptomes of five seed development stages from SN and LGR highlights the importance of the S3 (5–10 DAP) stage in LGR for increment of rice grain size. The S3 stage of LGR has the most distinct transcriptome amongst all comparisons. This is the stage where the maximum number of cell cycle genes are specifically expressed, and the increment in total protein content is highest. All events of seed development, including grain filling, occur later in LGR. Genes involved in phytohormone pathways (136 genes) and members of nine transcription factor families contributing to temporal changes have been elucidated. The DEGs underlying the QTLs will have functional relevance for genetic dissection of the GS trait in rice. Novel miRNA-target pairs which might contribute to seed development or GS increment have been determined. Of these, five miRNAs show upregulation throughout seed development in both SN and LGR and target nine genes. Also, five miRNAs are downregulated throughout seed development in both SN and LGR and target eight genes. Eight miRNAs and their nine targets have opposite regulation between SN and LGR and could potentially regulate GS. The analyses have led us to propose a “Domino effect” model for rice grain size increment. In this model, the completion of one step of grain development triggers the next one. Since each event is slower in LGR, there is a temporal lag leading to increased cell size and subsequently higher grain filling, and eventually bigger grain size. The expression data for all genes and miRNAs from this study are available in RGDD. Availability of such information on a single platform will not only be useful for rice yield enhancement but can be extrapolated to other crops as well.

Methods

Plant materials

Long-grained and short-grained indica rice, LGR [66] and Sonasal (SN), respectively, were grown under field conditions in the kharif season at NIPGR, New Delhi, India. Once panicle emergence started, individual panicles were observed for anthesis. Panicles took 3–4 days to complete emergence and anthesis. Each panicle on the plant was tagged with the date on the day of its anthesis, before noon, particularly at the region where freshly dehisced anthers were visible, as the process follows a basipetal direction. The pollinating spikelets on the lowermost part of the panicle were often left untagged, due to staggered anthesis. The day of anthesis was counted as 0 DAP, as pollination occurs within a few hours. The tagged panicles were left on the plant to mature. Individual seeds were separated and collected for each DAP as they matured, and were harvested in liquid nitrogen. While harvesting, empty seeds were discarded. Seeds from each DAP were stored at −80 °C. At the time of RNA isolation, equal weights of seeds for each DAP were pooled according to the categories S1 (0–2 DAP), S2 (3–4 DAP), S3 (5–10 DAP), S4 (11–20 DAP), and S5 (21–29 DAP). Each biological replicate was made from a separate pool of seeds. Dried mature seeds of SN and LGR were harvested for estimation of grain length, width, and weight.

Seed trait measurements

Freshly harvested seeds of 0, 1, 2, 3, 4, 7, 10, 13, 18, 21, and 29 DAP were collected in 30% ethanol. Images of husked and dehusked seeds of SN and LGR were captured at ×4 magnification under a stereozoom microscope (AZ100, Nikon; Japan). Grain length and width were quantified per seed by taking the average of 100 seeds using WinSEEDLE™ software (Regent Instruments Inc.; Canada). One thousand-grain weight was measured in biological triplicates. Grain filling rate was estimated by measuring the fresh weight of 15 seeds in triplicate from each DAP after removing stalks and awns. The measurements from each DAP constituting a stage were added to obtain the weight of 15 seeds/stage.
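The pooling of per-DAP measurements into stage totals can be illustrated with a minimal Python sketch. The stage boundaries are those given in the text; the function names and the example weights are hypothetical and are used only for illustration, not taken from the study.

```python
# Stage boundaries in days after pollination (DAP), as defined in the text.
STAGES = {
    "S1": range(0, 3),    # 0-2 DAP
    "S2": range(3, 5),    # 3-4 DAP
    "S3": range(5, 11),   # 5-10 DAP
    "S4": range(11, 21),  # 11-20 DAP
    "S5": range(21, 30),  # 21-29 DAP
}


def stage_of(dap):
    """Return the stage label (S1-S5) that a given DAP belongs to."""
    for stage, days in STAGES.items():
        if dap in days:
            return stage
    raise ValueError(f"DAP {dap} is outside the 0-29 range")


def weight_per_stage(weight_by_dap):
    """Add the 15-seed fresh weights of all DAPs that constitute each stage."""
    totals = {stage: 0.0 for stage in STAGES}
    for dap, weight in weight_by_dap.items():
        totals[stage_of(dap)] += weight
    return totals


# Hypothetical 15-seed fresh weights (arbitrary units), keyed by DAP.
example_weights = {0: 1.0, 1: 2.0, 3: 4.0, 7: 8.0}
stage_totals = weight_per_stage(example_weights)
```

Keeping the boundaries in a single mapping means the same stage definition can be reused when pooling tissue for RNA isolation.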

SEM analysis of spikelet

Spikelets freshly harvested at 7 DAP from SN and LGR were immersed completely in FAE fixative solution [10% formaldehyde, 5% acetic acid, and 50% absolute ethanol] in 15 ml Borosil® glass vials. These samples were vacuum infiltrated for 30 min and incubated at 4 °C overnight in the dark. The tissue samples were subjected to a dehydration series with an increasing gradient of ethanol (60%, 70%, 80%, 95%, and 100%) at room temperature for 1 h. The outer surface cells of the lemma were observed under an EVO® LS10 scanning electron microscope (ZEISS, Germany) at ×500 magnification. The images were analyzed using ImageJ [175] to estimate cell length and cell area of the outer epidermal cells in the middle region of the lemma.

Histological study

Freshly harvested seeds of 4, 5, 7, 9, and 11 DAP were dehusked, fixed, and dried in ethanol series (as mentioned in the previous section). Ethanol was gradually replaced with xylene by keeping the tissues in increasing concentration of xylene and decreasing concentration of ethanol in the percentages of 25:75, 50:50, and 75:25. Next, two rounds of treatment with 100% xylene were given to the seeds at room temperature for 1 h on the rocker, followed by infiltration with paraplast X-TRA (Sigma-Aldrich; USA) by adding molten paraplast every 2–3 h at 60 °C for 3 days. The wax-infiltrated tissues were then embedded into molds (Yorko®; India), and 10-μm sections were cut for 4, 5, and 7 DAP and 15-μm sections were cut for 9 and 11 DAP using rotary microtome (Leica Biosystems, Germany). Paraplast was removed from the sections by immersing the slides in HistoChoice® clearing agent (Sigma-Aldrich; USA) for 1 h. Sections were stained with 0.01% toluidine blue-O for 10 min or 0.1% Coomassie brilliant blue/CBB [0.25% CBB in 50% methanol and 10% acetic acid] for 20 min or 2% I2/KI solution (2 g KI and 0.2 g Iodine in 100 ml MQ) for 2–5 min. The sections were mounted using D.P.X. mountant (Himedia®, India) and were visualized under light microscope (Eclipse 80i, Nikon; Japan) at × 20 magnification.

Total RNA isolation from seed tissue and flag leaf

Total RNA was isolated from five seed developmental stages of the two rice genotypes LGR and SN, namely, S1 (0–2 DAP), S2 (3–4 DAP), S3 (5–10 DAP), S4 (11–20 DAP), and S5 (21–29 DAP), using a seed-specific protocol [176] with a few modifications, as previously described [37, 52, 80, 177]. Briefly, 100 mg tissue was taken by pooling equal amounts of seed tissue from the respective DAP constituting a stage. The tissue was ground to a fine powder in liquid nitrogen and mixed with extraction buffer [50 mM Tris–HCl pH 9.0, 20 mM EDTA pH 8.0, 150 mM NaCl, 1% N-lauroyl sarcosine (sodium salt), and 5 mM DTT], followed by phenol:chloroform:isoamyl alcohol (25:24:1) treatment. Then, GH buffer [8 M guanidine hydrochloride, 20 mM 2-[N-Morpholino] ethanesulfonic acid, 0.5 M EDTA, 50 mM β-mercaptoethanol] and phenol:chloroform:isoamyl alcohol were added to the supernatant, followed by chloroform treatment. RNA was precipitated by adding 3 M sodium acetate (pH 5.2) and twice the volume of chilled ethanol. The pellets were washed twice with 70% ethanol, dried, and dissolved in 40 μl DEPC-treated deionized water. RNA from flag leaves of SN and LGR was isolated using TRI Reagent® Solution (Invitrogen™; USA) as per the manufacturer’s protocol. DNase treatment was given to the RNA samples using the RNase-Free DNase Set (Qiagen; Germany). The purity and concentration of RNA samples were checked using a NANODROP 2000c Spectrophotometer (Thermo Scientific; USA) and on a 1% denaturing agarose gel in 1× MOPS buffer [400 mM MOPS, 99.6 mM sodium acetate, 20 mM EDTA] with 1.1% formaldehyde. Integrity of RNA samples was checked using 1 μl RNA sample (25–500 ng/μl) on an Agilent 2100 Bioanalyzer.

RNA-Seq library preparation and sequencing

The RNA isolated from the seed and the leaf samples (in biological triplicates) was used for cDNA library preparation according to the TruSeq® RNA Sample Preparation v2 Guide (Illumina®; USA), as described previously [53, 80]. Briefly, poly-A mRNA purified using oligo(dT)-attached magnetic beads was fragmented and primed, followed by double-stranded cDNA synthesis using SuperScript II Reverse transcriptase (Invitrogen™; USA), Second Strand Master Mix, End Repair Mix and Resuspension Buffer. Blunt-ended ds cDNA fragments were then adenylated and adapters were ligated. Following this, DNA fragments were enriched by PCR and paired-end sequencing was performed using Illumina HiSeq™ 2000.

Whole transcriptome sequencing data analysis

Data analysis was done as mentioned earlier [53, 80]. Low-quality reads were trimmed using Cutadapt (v1.8.1) (Martin, 2011). The unwanted sequences, including non-polyA tailed RNA, rRNAs, tRNAs, adapter sequences, and the mitochondrial genome, were removed using Bowtie2 (v2.1.0) (Langmead, 2010), in-house perl scripts and picard tools (v1.85). The clean reads thus obtained were used for expression and differential expression analysis. The clean reads were aligned to the reference genome (MSU 7) by TopHat (v2.0.8) [178]. The uniquely mapped reads were used for estimation of gene expression using the Cufflinks program (v2.0.2) [179]. Pearson’s correlation coefficient and PCA between the biological replicates were calculated and visualized using the corrplot and pca3d packages of R (version 3.2.0; https://cran.r-project.org/) to estimate the relatedness between the biological replicates and the tissue samples. The normalized gene expression data was represented as FPKM (fragments per kilobase of transcript per million mapped reads). All genes with an FPKM ≥ 1 were counted as expressed and considered for downstream analyses after removal of transposable element (TE)-related genes. Differential expression in the seed developmental stages was calculated by the Cuffdiff program (v2.0.2) [180] with respect to flag leaf at p-value ≤ 0.05 and q-value ≤ 0.05. DEGs during seed development in SN and LGR were determined with respect to flag leaf (as vegetative control) from each genotype. Stringent cutoffs of FPKM value ≥ 1, log2fold change (log2fc) value ≥ 1 and q-value ≤ 0.05 were used for identification of DEGs. GO enrichment analysis of the expressed genes and DEGs was performed using agriGO software (v1.2) [181] and the BiNGO plugin [182] of Cytoscape (version 3.4.0). Heatmaps and k-means cluster diagrams were prepared using MeV_4_6_0 [183] employing the Euclidean distance method. Pathway annotations were performed using MapMan software (version 3.6.0RC1) [184].
Functional annotation of genes was performed using RGAP version 7 and funRiceGenes (https://funricegenes.github.io/). Bubble plots were prepared in R software using ggplot2 (Wickham, 2016) and reshape2 [185] packages.
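The FPKM definition and the DEG cutoffs described in this section can be sketched in Python as follows. This is an illustrative sketch only, not the Cufflinks/Cuffdiff implementation, which estimates abundances with a statistical model. Two points are assumptions made here: that the log2fc cutoff applies to the absolute value (so that both up- and downregulated genes are captured), and that the FPKM ≥ 1 criterion is met if either the seed stage or the flag leaf reaches it. The function names are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def classify_deg(fpkm_seed, fpkm_leaf, log2fc, q_value):
    """Classify one gene in one seed stage relative to flag leaf.

    Returns "up", "down", or None (not differentially expressed).
    Assumption: the log2fc cutoff is applied to the absolute value, and the
    expression cutoff is satisfied if either tissue has FPKM >= 1.
    """
    expressed = max(fpkm_seed, fpkm_leaf) >= 1.0
    if not expressed or q_value > 0.05 or abs(log2fc) < 1.0:
        return None
    return "up" if log2fc > 0 else "down"
```

For example, a gene with FPKM 8 in a seed stage and 2 in flag leaf (log2fc = 2) at q = 0.01 would be classified as upregulated, whereas the same fold change at q = 0.2 would be rejected.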

Real-time PCR validation

cDNA for qPCR validation was prepared from the RNA isolated from the five stages of seed development and flag leaf in two biological replicates as detailed previously, using the Superscript™ III first-strand cDNA synthesis kit (Invitrogen™; USA) [37, 52, 53, 80, 177]. Primers for real-time PCR were designed from the unique regions of the selected genes using Primer Express 3.0. miRNA sequence was taken from miRBase (http://mirbase.org). The assay was carried out with Fast SYBR® Green Master Mix (Applied Biosystems; USA) as mentioned previously [177]. Real-time PCR was done on a 7500 Fast Real-Time System (Applied Biosystems; USA), and 7500 software v2.0.1 was used for data analysis. The housekeeping gene, OsACT1, was used as the endogenous control for real-time PCR. Fold change was calculated by the ΔΔCT method. The list of primers is given in Additional file 2: Table 17.
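For readers unfamiliar with the ΔΔCT calculation, a minimal Python sketch is given here. It assumes the standard 2^(−ΔΔCT) form with approximately 100% amplification efficiency, OsACT1 as the reference gene, and flag leaf as the calibrator sample; the choice of calibrator and the CT values in the example are assumptions for illustration, not data from the study.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCT) method.

    ct_target_* : CT of the gene of interest
    ct_ref_*    : CT of the endogenous control (e.g. OsACT1)
    *_sample    : the tissue being tested (e.g. a seed stage)
    *_calibrator: the baseline tissue (assumed here to be flag leaf)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


# Hypothetical CT values: the target amplifies 3 cycles earlier in the seed
# stage than in the calibrator after normalization, i.e. an 8-fold increase.
example_fold = fold_change_ddct(22.0, 18.0, 25.0, 18.0)
```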

QTL mapping

Molecular mapping of QTLs was performed to establish the correspondence between grain size/weight QTLs and the genes with pronounced expression, especially during seed developmental stages in rice. For this, SNPs exhibiting differentiation between high (40 g) and low (10 g) 1000-grain weight parental genotypes of an F5 mapping population (LGR × Sonasal) were genotyped using the genomic DNA of 286 mapping individuals with the Illumina Infinium assay (https://www.illumina.com). The SNP genotyping information showing goodness-of-fit to the expected Mendelian 1:1 segregation ratio was analyzed using JoinMap 4.1 (http://www.kyazma.nl/index.php/mc.JoinMap) at a higher LOD threshold (3.0) with the Kosambi mapping function. SNPs were mapped into defined linkage groups/chromosomes according to their centiMorgan (cM) genetic distances and physical positions (bp). The individuals, along with the parents of the mapping population, were grown in the field at NIPGR, New Delhi, and phenotyped for grain size/weight for 2 years. To identify grain size/weight QTLs, the genotyping data of SNPs mapped on 12 rice chromosomes was correlated with grain size/weight trait phenotypic data of mapping individuals and parental genotypes using the composite interval mapping function of MapQTL 6 at a LOD threshold score > 2.5 with 1000 permutations (p < 0.05 significance). The phenotypic variation explained by each major grain size/weight QTL at a significant LOD was estimated. Further, the genes showing high differential expression in SN and LGR in each analysis were delineated. The physical positions of these genes were matched with the QTL genomic regions. The ones overlapping with a QTL were separated.
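The final step, matching gene positions against QTL genomic regions, amounts to an interval-overlap test on each chromosome. A minimal Python sketch follows; it is not the authors' script, and all names, identifiers, and coordinates in it are hypothetical.

```python
from collections import namedtuple

# A named genomic interval; coordinates are physical positions in bp.
Interval = namedtuple("Interval", ["name", "chrom", "start", "end"])


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_qtls(genes, qtls):
    """Map each gene name to the names of the QTL intervals it overlaps."""
    hits = {}
    for gene in genes:
        matched = [qtl.name for qtl in qtls if overlaps(gene, qtl)]
        if matched:
            hits[gene.name] = matched
    return hits


# Hypothetical example data (not from the study).
example_qtls = [Interval("qtl_1", "Chr3", 1_000_000, 2_000_000)]
example_genes = [
    Interval("gene_A", "Chr3", 1_500_000, 1_503_000),  # inside qtl_1
    Interval("gene_B", "Chr3", 2_500_000, 2_502_000),  # outside qtl_1
    Interval("gene_C", "Chr5", 1_500_000, 1_503_000),  # other chromosome
]
example_hits = genes_in_qtls(example_genes, example_qtls)
```

A nested loop is adequate for a few hundred genes and a handful of QTLs; for genome-scale inputs, sorting intervals by start position would avoid comparing every pair.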

Small RNA library preparation

Total RNA was isolated from the five seed developmental stages and flag leaf tissue of SN and LGR. The cDNA library was prepared according to the TruSeq™ Small RNA library preparation kit (Illumina®; USA). Briefly, adapters were ligated sequentially at the 3′ and 5′ ends of the RNA, respectively. cDNA was prepared to selectively enrich the adapter-ligated RNA fragments by performing PCR with two primers that anneal to the ends of the adapters. Next, the cDNA prepared was amplified by PCR and indexed with RNA PCR primer 1 (RP1) and RNA PCR primer Index (RPIX), respectively. The cDNA libraries were gel purified, and the quality of the libraries was checked by running 1 μl of the products on an Agilent TapeStation with DNA HS ScreenTape. The obtained libraries were then normalized to 2 nM concentration for cluster generation on Illumina sequencing platforms by adding 10 mM Tris-HCl, pH 8.5. Following this, single-end sequencing was performed using the Illumina MiSeq® system.
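Normalizing a library to 2 nM by adding buffer is a simple dilution (C1V1 = C2V2). The Python sketch below shows the arithmetic; the function name and the example concentration and volume are hypothetical.

```python
def diluent_volume(stock_nM, stock_volume_ul, target_nM=2.0):
    """Volume of buffer (in ul) to add so that the library reaches target_nM.

    Uses C1 * V1 = C2 * V2, where V2 is the final volume after dilution.
    """
    if target_nM <= 0:
        raise ValueError("target concentration must be positive")
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    final_volume_ul = stock_nM * stock_volume_ul / target_nM
    return final_volume_ul - stock_volume_ul


# Hypothetical example: 5 ul of a 10 nM library needs 20 ul of buffer
# to give 25 ul at 2 nM.
example_buffer_ul = diluent_volume(10.0, 5.0)
```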

Small RNA sequencing and data analysis

Low-quality reads were trimmed using Cutadapt (v1.3) (Martin, 2011). The adapter-trimmed reads were aligned against siRNA, snRNA, snoRNA, tRNA, and rRNA using nc-RNA databases (siRNAdb; http://sirna.sbc.su.se/sirnadb_050915.txt; NCBI Genbank; http://www.ncbi.nlm.nih.gov/genbank/; deepBase; http://deepbase.sysu.edu.cn/download.php; GtRNAdb; http://gtrnadb.ucsc.edu/; Rfam; http://rfam.xfam.org/) with the Bowtie2 program v2.1.0 [186]. The unaligned clean reads of 17–35 bp length were used for miRNA identification by aligning to mature miRNAs of Oryza sativa in miRBase release-21 (http://www.mirbase.org/) using the Bowtie program (version 0.12.9). The expression data of miRNAs were normalized using the TPM (transcripts per million) method. Differential expression of the miRNAs was estimated using a negative binomial method with the DESeq package [187] of R (https://cran.r-project.org/) with default parameters. The targets of the known miRNAs were identified using the psRNATarget tool [188] with default parameters.
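TPM normalization of miRNA read counts can be sketched in Python as below. The sketch assumes the form commonly used for small RNA data, in which each miRNA count is divided by the total number of miRNA-mapped reads in the library and multiplied by one million; the text does not state which denominator was used, so this is an assumption. The miRNA names and counts are hypothetical.

```python
def tpm(counts):
    """Scale raw miRNA read counts to transcripts per million.

    counts: dict mapping miRNA name -> raw read count in one library.
    Assumption: the denominator is the total of the counts supplied.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped miRNA reads")
    return {name: count / total * 1e6 for name, count in counts.items()}


# Hypothetical library with two miRNAs.
example_tpm = tpm({"miR_a": 250, "miR_b": 750})
```

Because the values in each library sum to one million, TPM allows expression levels to be compared across libraries of different sequencing depth.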

Stem-loop qRT-PCR

Stem-loop qRT-PCR was done to detect the levels of miR530-5p in SN and LGR seed developmental stages (S1–S5). Total RNA was isolated from S1–S5 stages and flag leaf of SN and LGR as mentioned above. The total RNA was purified using acid phenol (Ambion®; Naugatuck, CT, USA) as per the manufacturer’s protocol. For stem-loop qRT-PCR, miR530-5p specific cDNA was synthesized using its specific stem-loop reverse transcription primer (miR530-5p SL primer) (Additional file 2: Table 17). This cDNA was synthesized from 200 ng of total RNA by using the Superscript™ III first-strand cDNA synthesis kit (Invitrogen™; USA) according to the manufacturer’s protocol. This cDNA was used for the qRT-PCR assay using the miR530-5p forward primer and a universal reverse primer. The conditions for qRT-PCR were the same as mentioned previously. snRNA U6 was used as an internal control. Three biological replicates were used for the assay and significance was calculated using Student’s t test, with p value < 0.001 denoted by a double asterisk (**).

Total starch and protein isolation and quantification

A total of 100 mg of developing rice seeds from each of S1–S5 stages of SN and LGR were finely ground to powder in liquid nitrogen and incubated for 10 min at 4 °C after homogenizing with 1 mL of lysis buffer (20 mM Tris pH 7.6, 150 mM NaCl, 1 mM EDTA, 1 mM DTT, 1 mM PMSF). The insoluble debris was removed from the mixture by spinning at 100 g for 10 min at 4 °C. Supernatant was centrifuged at 12,000 g for 10 min at 4 °C to obtain the soluble protein fraction [189]. Total isolated proteins from each sample were quantified according to the Bradford method [190] (Amresco M173-KIT).

Seeds from all developmental stages (S1–S5) of SN and LGR were crushed using liquid nitrogen. One hundred milligrams of the crushed sample was used for isolating starch. Starch isolation and determination were done using a starch assay kit (Megazyme, Wicklow, Ireland, http://www.megazyme.com/), according to the manufacturer’s protocol. Starch estimation was done in two independent biological replicates.

Database development

RGDD has been developed using Tableau Software (https://www.tableau.com/), which is a leading data visualization software used for reporting and analyzing vast volumes of data. The background data input for this database was in CSV format. It was obtained subsequent to the abovementioned transcriptome and miRNome analyses. The programming was done within Tableau Software itself.