Abstract
Background
The high-quality sequence information and rich bioinformatics tools available for rice have contributed to remarkable advances in functional genomics. To facilitate the application of gene function information to the study of natural variation in rice, we comprehensively searched for articles related to rice functional genomics and extracted information on functionally characterized genes.
Results
As of 31 March 2012, 702 functionally characterized genes were annotated. This number represents about 1.6% of the predicted loci in the Rice Annotation Project Database. The compiled gene information is organized to facilitate direct comparisons with quantitative trait locus (QTL) information in the Q-TARO database. Comparison of genomic locations between functionally characterized genes and the QTLs revealed that QTL clusters were often co-localized with high-density gene regions, and that the genes associated with the QTLs in these clusters were different genes, suggesting that these QTL clusters are likely to be explained by tightly linked but distinct genes. Information on the functionally characterized genes compiled during this study is now available in the Overview of Functionally Characterized Genes in Rice Online database (OGRO) on the Q-TARO website (http://qtaro.abr.affrc.go.jp/ogro). The database has two interfaces: a table containing gene information, and a genome viewer that allows users to compare the locations of QTLs and functionally characterized genes.
Conclusions
OGRO on Q-TARO will facilitate a candidate-gene approach to identifying the genes responsible for QTLs. Because the QTL descriptions in Q-TARO contain information on agronomic traits, such comparisons will also facilitate the annotation of functionally characterized genes in terms of their effects on traits important for rice breeding. The increasing amount of information on rice gene function being generated from mutant panels and other types of studies will make the OGRO database even more valuable in the future.
Background
Rice is a model plant species for which many genetic and genomic resources have been developed, including high-quality genome sequence information (Goff et al. 2010). These resources have contributed to remarkable advances in rice functional genomics during the last two decades, and many genes have been functionally characterized (Jiang et al. 2009; Emanuelli et al. 2010). For these approaches, it is necessary to make the list of candidate genes involved in the trait of interest readily available for individual experimental design. It is also important that the genomic locations of functionally characterized genes can be readily compared with the locations of QTLs involved in the same trait. Rice databases such as Gramene (Youens-Clark et al. 2011) and Oryzabase (Kurata and Yamazaki 2006) include information on gene function from published research. However, the data provided by these databases must be rearranged before they can be used in the above-mentioned approaches. We also found that several functionally characterized genes are not included in those databases, probably because information on such genes was published in agronomy and breeding journals rather than in genetics, genomics, or molecular biology journals.
In this study, our goal was to facilitate the application of gene function information to the study of natural variation in rice. To accomplish this, we comprehensively searched for articles related to rice functional genomics and established a list of functionally characterized genes. Information on each gene was summarized to facilitate direct comparison with QTL information from Q-TARO (Yonemaru et al. 2010). We also compared the genomic locations of functionally characterized genes and QTLs. The information on functionally characterized genes obtained in this study was compiled in a new database, the Overview of Functionally Characterized Genes in Rice Online database (OGRO), which is located on the Q-TARO website (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/ogro).
Results and discussion
Extraction of information on functionally characterized genes in rice
To establish the list of functionally characterized genes in rice, we conducted a comprehensive search for articles related to rice functional genomics, and we extracted information on gene function by manually checking every article identified in the search. As of 31 March 2012, 702 functionally characterized genes were annotated based on the information from 707 articles. The categories of information extracted for each of the functionally characterized genes are listed in Table 1. The list of functionally characterized genes includes seven microRNAs (miRNAs) that have been associated with specific phenotypes.
Overview of the functionally characterized genes in rice. (A) Genomic distribution of the 702 functionally characterized genes compiled during this study. The position of each gene is indicated by a horizontal bar; the color indicates the major category for that gene. Gray vertical bars to the right of each chromosome indicate heterochromatic regions (Cheng et al. 2001; Li et al. 2008). (B) The proportions of genes isolated by each method. (C) Numbers of functionally characterized genes in each trait category (total and by each of the methods listed in B).
There are 44 755 gene loci, excluding transposable elements (TEs) and ribosomal protein or tRNA loci, in RAP (Rice Annotation Project 2008; http://rapdb.dna.affrc.go.jp/), and 491 miRNA loci in miRBase release 18 (Griffiths-Jones et al. 2008; http://www.mirbase.org/). The functionally characterized genes compiled during this study represent only 1.6% of these loci. In Arabidopsis, a model dicot species, 5826 genes have been functionally characterized, accounting for more than 20% of the gene loci in this species (Lamesch et al. 2012). Considering both the number and the proportion of functionally characterized genes in Arabidopsis, it seems that the functional characterization of rice genes is far from complete.
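The 1.6% figure can be checked directly from the locus counts quoted above. The following Python sketch is only an arithmetic illustration; the constant and function names are hypothetical and are not part of OGRO or RAP.

```python
# Locus counts quoted in the text above.
RAP_GENE_LOCI = 44_755       # RAP gene loci, excluding TEs and ribosomal protein/tRNA loci
MIRBASE_MIRNA_LOCI = 491     # rice miRNA loci in miRBase release 18
CHARACTERIZED_GENES = 702    # functionally characterized genes compiled in this study


def percent_characterized(characterized: int, *locus_counts: int) -> float:
    """Return the characterized genes as a percentage of all counted loci."""
    total_loci = sum(locus_counts)
    return 100.0 * characterized / total_loci


rice_pct = percent_characterized(
    CHARACTERIZED_GENES, RAP_GENE_LOCI, MIRBASE_MIRNA_LOCI
)
print(f"{rice_pct:.1f}% of {RAP_GENE_LOCI + MIRBASE_MIRNA_LOCI} loci")
```

The denominator here is the sum of the RAP gene loci and the miRBase miRNA loci, which is how the phrase "these loci" is read in this sketch.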
For the gene information item "method of isolation" (Table 1), the genes identified by using cultivars, landraces, or wild relatives were described as "natural variation". Among the 702 functionally characterized genes, 11% (80 genes) had been identified through natural variation. Another 41% (286 genes) were identified by mutant analysis, and 48% (336 genes) were identified by using transgenic plants (isolation method classified as "overexpression", "knockdown", "knockdown/overexpression", or "others"; Figure 1B). This breakdown indicates that both forward- and reverse-genetics approaches are valuable methods in rice functional genomics.
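The percentages in this breakdown follow from the gene counts given above. As a small check, assuming the three groups are mutually exclusive (as the totals suggest), they can be recomputed as follows; the dictionary keys are shorthand labels chosen for this sketch.

```python
# Gene counts per isolation approach, as quoted in the text.
method_counts = {
    "natural variation": 80,
    "mutant analysis": 286,
    "transgenic plants": 336,
}

total_genes = sum(method_counts.values())

# Share of each approach, rounded to the nearest whole percent.
method_shares = {
    method: round(100 * count / total_genes)
    for method, count in method_counts.items()
}

for method, share in method_shares.items():
    print(f"{method}: {method_counts[method]} genes ({share}%)")
```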
We annotated the functionally characterized genes based on the phenotypes described in each of the articles (Table 1). The phenotypes related to each gene were classified into "major category" and "category of objective character" (Table 1). These categories are identical to those used in Q-TARO (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/). Genes associated with multiple traits were counted within each relevant category.
The number of functionally characterized genes within each category is shown in Figure 1C. The variability in the number of functionally characterized genes among the different categories (Figure 1C) probably reflects the agronomic importance of each trait and the interests of individual researchers rather than the actual number of genes involved in each trait. In the major category "resistance or tolerance", transgenic approaches ("overexpression", "knockdown", and "knockdown/overexpression") were used for functional analysis more frequently than for genes in the major categories "morphological trait" and "physiological trait" (Figure 1C). This difference might be due to the difficulty in screening mutant and natural populations for traits related to resistance or tolerance. Within the major category "resistance or tolerance", most of the genes in the categories "cold", "drought", and "salinity" were characterized by overexpression analysis (Figure 1C). The overexpressing plants often showed pleiotropic effects such as growth retardation (Abbasi et al. 2004; Ye et al. 2011).
To survey whether functionally characterized genes were, like QTLs, arranged in clusters, we calculated the distribution of functionally characterized genes and compared it with the genomic locations of the QTL clusters (Figure 3). In this comparison, we also included the gene density of RAP loci (Rice Annotation Project 2008; http://rapdb.dna.affrc.go.jp/). There was good correspondence between the genomic locations of functionally characterized genes and RAP locus gene density (Figure 3). Furthermore, functionally characterized genes and QTLs also showed high co-localization (Figure 3), indicating that QTLs tended to be located in regions of high gene density. Regarding the genetic basis of the QTL clusters, two main possibilities are generally considered: the pleiotropic effects of one or a few genes, or the effects of multiple genes that are tightly linked to one another.
Several genes responsible for QTLs have been reported to have pleiotropic effects; for example, SCM2 is involved in panicle architecture, culm length, and culm mechanical strength (Ookawa et al. 2010), and IPA/WFP is involved in panicle architecture, panicle number, and culm mechanical strength (Jiao et al. 2010; Miura et al. 2010). However, when we examined the genomic location of QTL clusters and genes identified by using natural variation, we found that the QTL clusters often contained multiple genes identified by using natural variation (Figure 3). For example, on the long arm of chromosome 1, which contains the largest QTL cluster region, there were four genes that had been identified by using natural variation: Pi37 for blast resistance (Lin et al. 2007), qSH1 for seed shattering (Konishi et al. 2006), qNPQ1-2 for photosynthetic capacity (Kasajima et al. 2011), and sd1 for culm length (Sasaki et al. 2002). On the short arm of chromosome 6, the location of the second-largest QTL cluster region, there were eight genes that had been identified by using natural variation: wx (Wang et al. 1995) and alk (Gao et al. 2011b) for eating quality, Hd3a (Kojima et al. 2002) and Hd1 (Yano et al. 2000) for heading date, DPL2 (Mizuta et al. 2010) and S5 (Chen et al.
Screenshots of the Overview of Functionally Characterized Genes in Rice Online database (OGRO) (http://qtaro.abr.affrc.go.jp/ogro). (A) Gene information table. All displayed information can be exported as comma-separated values (CSV format). (B) OGRO genome viewer. This viewer can be used to compare the locations of QTLs with those of functionally characterized genes.
Although recent advances in next-generation sequencing technologies have enabled re-sequencing of a large number of rice genomes (Xu et al. 2011) as well as high-throughput genotyping and large-scale genetic variation surveys (McNally et al. 2009; Ebana et al. 2010; McCouch et al. 2010; Nagasaki et al. 2010; Yamamoto et al. 2010), analysis of gene function is still indispensable both for understanding fundamental phenomena and for genomics-based breeding. Increasing numbers of mutant panels have been developed in rice, and their comprehensive analysis is ongoing (Chern et al. 2007). These experiments will provide additional information on gene function, which will be added to the database as it becomes available.
Conclusion
In this study, we comprehensively searched for articles related to rice functional genomics and extracted information on 702 functionally characterized genes (Figure 1). The information on each gene was organized to enable direct comparison with the QTL information in Q-TARO (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/), which will facilitate a candidate-gene approach to identifying the genes responsible for QTLs (Figure 2). Because the QTL descriptions in Q-TARO contain information on agronomic traits, such comparisons will also facilitate the annotation of functionally characterized genes in terms of their effects on traits important for rice breeding. We found that the genes responsible for QTLs in QTL clusters were identified as different genes (Figure 3). Considering this evidence along with the data showing co-localization of QTL clusters and high-density gene regions (Figure 3), our results suggest that many QTL clusters are caused by distinct but tightly linked genes. Information on the functionally characterized genes compiled in this study is now available in OGRO on the Q-TARO Web site (Figure 4; http://qtaro.abr.affrc.go.jp/ogro). The increasing amount of information on rice gene function being generated from mutant panels and other types of studies will make the OGRO database even more valuable in the future.
Methods
Extraction of gene information from published articles
Functional genomics studies have been done using many different approaches, and the degree of functional characterization differs substantially among genes. To avoid ambiguity, we established two main criteria for functionally characterized genes in rice. The first was verification of function: gene function had to be demonstrated in rice through direct evidence based on complementation tests, mutant analysis, or transgenic plant analysis. The second was verification of the phenotype: there had to be evidence that the function of the gene affected the phenotype of the rice plant. Functional analysis using other organisms such as yeast and Arabidopsis was not counted as meeting this criterion because such experiments do not necessarily indicate that the gene has a biological role in rice.
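The two criteria can be expressed as a simple filter. The sketch below is illustrative only: the `GeneReport` record, its field names, and the evidence labels are assumptions made for this example and do not reflect how the manual curation was actually recorded.

```python
from dataclasses import dataclass

# Types of direct evidence accepted for the first criterion (verification of function).
ACCEPTED_EVIDENCE = {
    "complementation test",
    "mutant analysis",
    "transgenic plant analysis",
}


@dataclass(frozen=True)
class GeneReport:
    """Hypothetical summary of what one article reports about one gene."""
    gene: str
    evidence: str                  # kind of functional evidence presented
    organism: str                  # organism in which the function was tested
    affects_rice_phenotype: bool   # second criterion (verification of phenotype)


def is_functionally_characterized(report: GeneReport) -> bool:
    """Apply both criteria: function shown in rice, and an effect on rice phenotype."""
    function_verified = (
        report.organism == "rice" and report.evidence in ACCEPTED_EVIDENCE
    )
    return function_verified and report.affects_rice_phenotype


# Made-up reports: only the first satisfies both criteria.
reports = [
    GeneReport("geneA", "mutant analysis", "rice", True),
    GeneReport("geneB", "transgenic plant analysis", "Arabidopsis", True),
    GeneReport("geneC", "complementation test", "rice", False),
]
accepted = [r.gene for r in reports if is_functionally_characterized(r)]
print(accepted)
```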
Articles related to rice functional genomics were identified by searching the Web of Science database (http://apps.webofknowledge.com/) with the search terms "rice" and "Oryza sativa". Because rice studies span a broad range of research fields, the following categories were surveyed: Agriculture Multidisciplinary, Agronomy, Biotechnology & Applied Microbiology, Cell Biology, Genetics & Heredity, Multidisciplinary Sciences, and Plant Sciences. To make this search comprehensive, the time span was set to "All" (i.e., all publications since 1899). As of 31 March 2012, we identified a total of 14 102 articles using these search conditions. All of the articles were then manually checked, and articles containing information on gene function that met our criteria for functionally characterized genes were selected. The result was a total of 707 articles. For each gene meeting the criteria for a functionally characterized gene, we extracted information including the gene locus ID, genome position, method of isolation, related traits, and reference information (doi) (Table 1). Whenever possible, the RAP ID number (Rice Annotation Project 2008; http://rapdb.dna.affrc.go.jp/) was used as the gene locus ID number. If there was no corresponding ID in RAP, the Michigan State University (MSU) locus number (Yuan et al. 2005; http://rice.plantbiology.msu.edu/) or GenBank (http://www.ncbi.nlm.nih.gov/genbank/) accession number was used. Information on genome position (start and end) was based on International Rice Genome Sequencing Project (IRGSP) Pseudomolecules build 4.0 (http://rgp.dna.affrc.go.jp/E/IRGSP/Build4/build4.html). The genome positions of genes not found in the reference genome (Oryza sativa L. ssp. japonica cv. Nipponbare) were indicated by using either a position adjacent to the deleted sequence or the positions of the flanking markers used for positional cloning. 
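The order of preference for the gene locus ID (RAP ID first, then MSU locus number, then GenBank accession number) amounts to a simple fallback rule. A minimal sketch follows; the function name is hypothetical and the identifiers in the usage lines are placeholders, not real accessions.

```python
from typing import Optional


def choose_locus_id(
    rap_id: Optional[str],
    msu_id: Optional[str],
    genbank_accession: Optional[str],
) -> Optional[str]:
    """Return the first available identifier in the order RAP, MSU, GenBank."""
    for candidate in (rap_id, msu_id, genbank_accession):
        if candidate:  # skips None and empty strings
            return candidate
    return None


# Placeholder identifiers for illustration only.
print(choose_locus_id("RAP_PLACEHOLDER", "MSU_PLACEHOLDER", None))
print(choose_locus_id(None, "MSU_PLACEHOLDER", "GENBANK_PLACEHOLDER"))
print(choose_locus_id(None, "", "GENBANK_PLACEHOLDER"))
```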
Under the method of isolation, "knockdown/overexpression" indicates that the genes were characterized by using both knockdown and overexpression transgenic plants.
Comparison of genomic locations and densities between functionally characterized genes and QTLs
We compared the relative genome positions and distributions of functionally characterized genes and QTLs within each of the trait categories. The genome position of each functionally characterized gene was represented by the midpoint between the genome start and genome end positions (Table 1). The QTL information was extracted from Q-TARO (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/).
We also performed comparisons across all of the trait categories among the density of functionally characterized genes, the density of RAP loci, and the number of QTLs. The density of functionally characterized genes or RAP loci at each point in the genome was expressed as the proportion of the total number of genes (loci) contained within the surrounding 1-Mb block, calculated by using a window size of 2 Mb. The number of QTLs was counted within every 1-Mb block along the genome sequence.
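The calculations described in this section can be sketched as follows. This is one possible reading of the description, not the code used in the study: it assumes that density is evaluated at 1-Mb steps using a 2-Mb window centred on each step, that the proportion is taken relative to the positions passed in, and that each QTL is represented by a single position. All function names and coordinates are hypothetical.

```python
from collections import Counter

MB = 1_000_000


def midpoint(start: int, end: int) -> float:
    """Represent a gene by the midpoint of its start and end positions."""
    return (start + end) / 2


def windowed_density(positions, chrom_length, window=2 * MB, step=1 * MB):
    """Proportion of all positions that fall in a window centred on each step point."""
    total = len(positions)
    density = {}
    for centre in range(0, chrom_length + 1, step):
        lower, upper = centre - window / 2, centre + window / 2
        in_window = sum(1 for p in positions if lower <= p < upper)
        density[centre] = in_window / total if total else 0.0
    return density


def count_per_block(positions, block=1 * MB):
    """Count positions in consecutive, non-overlapping blocks, keyed by block index."""
    return Counter(int(p // block) for p in positions)


# Made-up gene coordinates (start, end) on a hypothetical 5-Mb chromosome.
genes = [
    (100_000, 300_000),
    (900_000, 1_100_000),
    (2_400_000, 2_600_000),
    (4_900_000, 5_100_000),
]
gene_midpoints = [midpoint(start, end) for start, end in genes]
gene_density = windowed_density(gene_midpoints, chrom_length=5 * MB)

# Made-up QTL positions, counted per 1-Mb block.
qtl_counts = count_per_block([500_000, 700_000, 1_200_000])
```

With overlapping windows, a gene can contribute to the density at two adjacent step points, which smooths the profile relative to the non-overlapping QTL counts.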
Database construction
All data on the functionally characterized genes annotated in this study were compiled in OGRO (http://qtaro.abr.affrc.go.jp/ogro). Like Q-TARO (Yonemaru et al. 2010; http://qtaro.abr.affrc.go.jp/), OGRO consists of two Web applications: a gene information table and a genome viewer. The Web applications were implemented as Perl scripts and CGI modules. The database was constructed using MySQL, a relational database management system. We used the GBrowse viewer (http://gmod.org/wiki/Main_Page), which was configured to access OGRO from within the Q-TARO genome viewer.
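To illustrate how a relational table of gene positions supports the candidate-gene approach, the sketch below looks up genes that overlap a QTL interval. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table layout, column names, gene symbols, and coordinates are all invented for the example; the actual OGRO schema is not described here.

```python
import sqlite3

# In-memory database with a hypothetical gene table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gene (
        symbol   TEXT,
        chrom    INTEGER,
        start_bp INTEGER,
        end_bp   INTEGER,
        category TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
    [
        ("geneA", 1, 1_000_000, 1_002_000, "morphological trait"),
        ("geneB", 1, 5_000_000, 5_003_000, "resistance or tolerance"),
        ("geneC", 6, 1_500_000, 1_501_000, "physiological trait"),
    ],
)


def candidate_genes(connection, chrom, qtl_start, qtl_end):
    """Return symbols of genes overlapping the QTL interval on one chromosome."""
    cursor = connection.execute(
        """
        SELECT symbol FROM gene
        WHERE chrom = ? AND start_bp <= ? AND end_bp >= ?
        ORDER BY start_bp
        """,
        (chrom, qtl_end, qtl_start),
    )
    return [row[0] for row in cursor.fetchall()]


narrow_hits = candidate_genes(conn, 1, 900_000, 2_000_000)
wide_hits = candidate_genes(conn, 1, 0, 6_000_000)
print(narrow_hits, wide_hits)
```

The overlap condition (gene start at or before the interval end, and gene end at or after the interval start) also returns genes that only partly overlap the QTL interval.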
Authors' information
EY, JY, TY: National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305–8602, Japan. MY: National Institute of Agrobiological Sciences, 1–2 Ohwashi, Tsukuba, Ibaraki 305–8634, Japan.