Background

Rice is a model plant species for which many genetic and genomic resources have been developed, including high-quality genome sequence information (Goff et al. 2010). These resources have contributed to remarkable advances in rice functional genomics during the last two decades, and many genes have been functionally characterized (Jiang et al. 2009; Emanuelli et al. 2010). To make use of this information, the list of candidate genes involved in the trait of interest must be readily available for individual experimental design. It is also important that the genomic locations of functionally characterized genes can be readily compared with the locations of QTLs involved in the same trait. Rice databases such as Gramene (Youens-Clark et al. 2011) and Oryzabase (Kurata and Yamazaki 2006) include information on gene function from published research. However, the data provided by these databases must be rearranged before they can be used for such purposes. We also found that several functionally characterized genes are not included in those databases, probably because information on such genes was published in agronomy and breeding journals rather than in genetics, genomics, or molecular biology journals.

In this study, our goal was to facilitate the application of gene function information to the study of natural variation in rice. To accomplish this, we comprehensively searched for articles related to rice functional genomics and established a list of functionally characterized genes. Information on each gene was summarized to facilitate direct comparison with QTL information from Q-TARO (Yonemaru et al. 2010). We also compared the genomic locations of functionally characterized genes and QTLs. The information on functionally characterized genes obtained in this study was compiled in a new database, the Overview of Functionally Characterized Genes in Rice Online database (OGRO), which is located on the Q-TARO website (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/ogro).

Results and discussion

Extraction of information on functionally characterized genes in rice

To establish the list of functionally characterized genes in rice, we conducted a comprehensive search for articles related to rice functional genomics, and we extracted information on gene function by manually checking every article identified in the search. As of 31 March 2012, 702 functionally characterized genes were annotated based on the information from 707 articles. The categories of information extracted for each of the functionally characterized genes are listed in Table 1. The list of functionally characterized genes includes seven microRNAs (miRNAs) that have been associated with specific phenotypes.

Table 1 Information on functionally characterized genes extracted from each article
Figure 1

Overview of the functionally characterized genes in rice. (A) Genomic distribution of the 702 functionally characterized genes compiled during this study. The position of each gene is indicated by a horizontal bar; the color indicates the major category for that gene. Gray vertical bars to the right of each chromosome indicate heterochromatic regions (Cheng et al. 2001; Li et al. 2008). (B) The proportions of genes isolated by each method. (C) Numbers of functionally characterized genes in each trait category (total and by each of the methods listed in B).

There are 44 755 gene loci, excluding transposable elements (TEs) and ribosomal protein or tRNA loci, in RAP (Rice Annotation Project 2008; http://rapdb.dna.affrc.go.jp/), and 491 miRNA loci in miRBase release 18 (Griffiths-Jones et al. 2008; http://www.mirbase.org/). The functionally characterized genes compiled during this study represent only 1.6% of these loci. In Arabidopsis, a model dicot species, 5826 genes have been functionally characterized, accounting for more than 20% of the gene loci in this species (Lamesch et al. 2012). Considering both the number and the proportion of functionally characterized genes in Arabidopsis, it seems that the functional characterization of rice genes is far from complete.
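The 1.6% figure can be reproduced from the counts quoted above. The following is a minimal sketch, assuming that "these loci" means the sum of the RAP gene loci and the miRBase miRNA loci; the variable and function names are illustrative only.

```python
# Counts quoted in the text.
RAP_GENE_LOCI = 44_755   # RAP loci, excluding TEs and ribosomal protein or tRNA loci
MIRNA_LOCI = 491         # miRNA loci in miRBase release 18
CHARACTERIZED = 702      # functionally characterized genes compiled in this study


def percent_characterized(n_characterized: int, n_loci: int) -> float:
    """Return the share of loci that are functionally characterized, in percent."""
    return 100.0 * n_characterized / n_loci


# Assumption: the denominator is the sum of both locus counts.
total_loci = RAP_GENE_LOCI + MIRNA_LOCI
share = percent_characterized(CHARACTERIZED, total_loci)
print(f"{CHARACTERIZED} of {total_loci} loci = {share:.1f}%")
```

Under this assumption the denominator is 45 246 loci, and the share rounds to the 1.6% reported in the text.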

For the gene information item "method of isolation" (Table 1), the genes identified by using cultivars, landraces, or wild relatives were described as "natural variation". Among the 702 functionally characterized genes, 11% (80 genes) had been identified through natural variation. Another 41% (286 genes) were identified by mutant analysis, and 48% (336 genes) were identified by using transgenic plants (isolation method classified as "overexpression", "knockdown", "knockdown/overexpression", or "others"; Figure 1B). This breakdown indicates that both forward- and reverse-genetics approaches are valuable methods in rice functional genomics.
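As a quick consistency check on this breakdown, the three counts should sum to 702 and round to the quoted percentages. The sketch below uses only the numbers given in the text; the dictionary keys are shorthand labels, not the database's field values.

```python
# Gene counts per isolation method, as quoted in the text.
counts = {
    "natural variation": 80,
    "mutant analysis": 286,
    "transgenic plants": 336,
}

total = sum(counts.values())

# Percentage of the total for each method, rounded to the nearest integer.
shares = {method: round(100 * n / total) for method, n in counts.items()}

for method, pct in shares.items():
    print(f"{method}: {counts[method]} genes ({pct}%)")
```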

We annotated the functionally characterized genes based on the phenotypes described in each of the articles (Table 1). The phenotypes related to each gene were classified into "major category" and "category of objective character" (Table 1). These categories are identical to those used in Q-TARO (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/). Genes associated with multiple traits were counted within each relevant category.
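The counting rule for genes with multiple traits can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used for OGRO: the gene names and the categories "trait X" and "trait Y" are hypothetical placeholders, and counting a gene only once per major category is our assumption.

```python
from collections import Counter

# Hypothetical annotation records: gene name -> list of
# (major category, category of objective character) pairs.
annotations = {
    "geneA": [("resistance or tolerance", "drought"),
              ("resistance or tolerance", "salinity")],
    "geneB": [("resistance or tolerance", "cold"),
              ("resistance or tolerance", "drought")],
    "geneC": [("morphological trait", "trait X"),
              ("physiological trait", "trait Y")],
}

by_category = Counter()
by_major = Counter()
for gene, pairs in annotations.items():
    # A gene is counted once in every category it is annotated with.
    by_category.update(set(pairs))
    # Assumption: within a major category, a gene is counted only once.
    by_major.update({major for major, _ in pairs})

print(by_category[("resistance or tolerance", "drought")])
print(by_major["resistance or tolerance"])
```

Because a multi-trait gene contributes to several categories, the category counts can sum to more than the number of genes (here six counts for three genes).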

The number of functionally characterized genes within each category is shown in Figure 1C. The variability in the number of functionally characterized genes among the different categories (Figure 1C) probably reflects the agronomic importance of each trait and the interests of individual researchers rather than the actual number of genes involved in each trait. In the major category "resistance or tolerance", transgenic approaches ("overexpression", "knockdown", and "knockdown/overexpression") were used for functional analysis more frequently than in the major categories "morphological trait" and "physiological trait" (Figure 1C). This difference might be due to the difficulty in screening mutant and natural populations for traits related to resistance or tolerance. Within the major category "resistance or tolerance", most of the genes in the categories "cold", "drought", and "salinity" were characterized by overexpression analysis (Figure 1C). The overexpressing plants often showed pleiotropic effects such as growth retardation (Abbasi et al. 2004; Ye et al. 2011).

To survey whether functionally characterized genes were also arranged in clusters, as QTLs are, we calculated the distribution of functionally characterized genes and compared it with the genomic locations of the QTL clusters (Figure 3). In this comparison, we also included the gene density of RAP loci (Rice Annotation Project et al. 2008; http://rapdb.dna.affrc.go.jp/). There was good correspondence between the genomic locations of functionally characterized genes and RAP locus gene density (Figure 3). Furthermore, functionally characterized genes and QTLs also showed high co-localization (Figure 3), indicating that QTLs tended to be located in regions of high gene density. Regarding the genetic basis of the QTL clusters, two main possibilities are generally considered: the pleiotropic effects of one or a few genes, or the effects of multiple genes that are tightly linked to one another.
Several genes responsible for QTLs have been reported to have pleiotropic effects; for example, SCM2 is involved in panicle architecture, culm length, and culm mechanical strength (Ookawa et al. 2010), and IPA/WFP is involved in panicle architecture, panicle number, and culm mechanical strength (Jiao et al. 2010; Miura et al. 2010). However, when we examined the genomic location of QTL clusters and genes identified by using natural variation, we found that the QTL clusters often contained multiple genes identified by using natural variation (Figure 3). For example, on the long arm of chromosome 1, which contains the largest QTL cluster region, there were four genes that had been identified by using natural variation: Pi37 for blast resistance (Lin et al. 2007), qSH1 for seed shattering (Konishi et al. 2006), qNPQ1-2 for photosynthetic capacity (Kasajima et al. 2011), and sd1 for culm length (Sasaki et al. 2002). On the short arm of chromosome 6, the location of the second-largest QTL cluster region, there were eight genes that had been identified by using natural variation: wx (Wang et al. 1995) and alk (Gao et al. 2011b) for eating quality, Hd3a (Kojima et al. 2002) and Hd1 (Yano et al. 2000) for heading date, DPL2 (Mizuta et al. 2010) and S5 (Chen et al.

Figure 4

Screen shots of the Overview of Functionally Characterized Genes in Rice Online database (OGRO) (http://qtaro.abr.affrc.go.jp/ogro). (A) Gene information table. All displayed information can be exported as comma-separated values (CSV format). (B) OGRO genome viewer. This viewer can be used to compare the locations of QTLs with those of functionally characterized genes.
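An exported CSV table could also be compared with QTL locations offline. The sketch below is one possible approach, not the method behind the OGRO viewer or Figure 3: the column names (`gene`, `chr`, `start_bp`), the gene names, the positions, the QTL intervals, and the 1-Mb window size are all hypothetical, and the real export may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV layout; the real OGRO export may use other column names.
EXAMPLE_CSV = """gene,chr,start_bp
geneA,1,1200000
geneB,1,1850000
geneC,1,38400000
geneD,6,1700000
geneE,6,2900000
"""

# Hypothetical QTL intervals as (chromosome, start_bp, end_bp).
EXAMPLE_QTLS = [("1", 1_000_000, 2_000_000), ("6", 2_500_000, 3_500_000)]

BIN_SIZE = 1_000_000  # 1-Mb windows, an arbitrary choice for illustration


def read_genes(text):
    """Parse CSV text into (gene, chromosome, position) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["gene"], row["chr"], int(row["start_bp"])) for row in reader]


def genes_per_bin(genes, bin_size=BIN_SIZE):
    """Count genes in fixed-size windows, keyed by (chromosome, window index)."""
    return Counter((chrom, pos // bin_size) for _, chrom, pos in genes)


def genes_in_qtls(genes, qtls):
    """Map each QTL interval to the genes whose position lies inside it."""
    return {
        (q_chr, q_start, q_end): [
            name for name, chrom, pos in genes
            if chrom == q_chr and q_start <= pos <= q_end
        ]
        for q_chr, q_start, q_end in qtls
    }


genes = read_genes(EXAMPLE_CSV)
density = genes_per_bin(genes)
overlaps = genes_in_qtls(genes, EXAMPLE_QTLS)
```

Here `density` gives a per-window gene count that could be plotted against QTL density, and `overlaps` lists the genes falling inside each QTL interval. With real data, the chromosome identifiers in the two sources would need to use the same naming scheme.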

Although recent advances in next-generation sequencing technologies have enabled re-sequencing of a large number of rice genomes (Xu et al. 2011) as well as high-throughput genotyping and large-scale genetic variation surveys (McNally et al. 2009; Ebana et al. 2010; McCouch et al. 2010; Nagasaki et al. 2010; Yamamoto et al. 2010), analysis of gene function is still indispensable both for understanding fundamental phenomena and for genomics-based breeding. Increasing numbers of mutant panels have been developed in rice, and their comprehensive analysis is ongoing (Chern et al. 2007). These experiments will provide additional information on gene function, which will be added to the database as it becomes available.