Background

Rice (Oryza sativa L.) is the most important staple food for over half of the world's population. Rice quality is primarily influenced by starch, which is composed of two polysaccharides: amylose and amylopectin. The percentage of amylose in total starch, measured as Apparent Amylose Content (AAC), is the key determinant of rice cooking properties. High amylose varieties, such as risotto varieties, cook dry, with firm and separate grains, whereas low amylose cultivars (cvs.) are tender, glossy and cohesive after cooking. A suggested classification identifies the following amylose classes: waxy (0–5%), very low (5–12%), low (12–20%), intermediate (20–25%) and high (25–33%), although commercially rice is classified by amylose content as low (less than 20% amylose), medium (21–25%) or high (26–33%) (Juliano 1992; Suwannaporn et al. 2007).
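
The suggested amylose classes above amount to a simple threshold rule. A minimal sketch follows; it assumes half-open intervals (a value lying exactly on a shared boundary, such as 20%, is assigned to the higher class), treats everything from 25% upward as high, and uses a hypothetical function name, classify_aac.

```python
def classify_aac(aac_percent: float) -> str:
    """Assign an apparent amylose content (%) to a suggested amylose class.

    Boundaries are treated as half-open intervals [lower, upper), an assumption,
    because the published ranges share their endpoints.
    """
    if not 0.0 <= aac_percent <= 100.0:
        raise ValueError(f"AAC must be a percentage, got {aac_percent}")
    # (exclusive upper bound, class label), checked in ascending order
    thresholds = [
        (5.0, "waxy"),
        (12.0, "very low"),
        (20.0, "low"),
        (25.0, "intermediate"),
    ]
    for upper, label in thresholds:
        if aac_percent < upper:
            return label
    return "high"  # 25% and above


if __name__ == "__main__":
    for value in (1.0, 14.92, 22.84, 27.7):
        print(value, classify_aac(value))
```

The commercial scheme (low below 20%, medium 21–25%, high 26–33%) leaves gaps between its classes, so it would need its own boundary convention.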

Over the past several decades, various methods have been reported for the determination of amylose content, including iodine binding, near infrared spectroscopy, size-exclusion chromatography and, most recently, asymmetric field flow fractionation (Juliano 1971; Wesley et al. 2003; Ward et al. 2006; Chiaramonte et al. 2012). However, none of these methods is cost-effective for high-throughput screening (Caffagni et al. 2013), and iodine binding is the only one that has been validated for routine use (Fitzgerald et al. 2008). The Waxy gene, which encodes granule-bound starch synthase I (GBSSI), is located on chromosome 6 and consists of 13 exons and 12 introns. Two wild type alleles predominate at the waxy locus: Wxa, found primarily in the indica subspecies and associated with high AAC, and Wxb, found mainly in the japonica subspecies and associated with low AAC (Dobo et al. 2010). The two alleles differ by a G to T Single Nucleotide Polymorphism (SNP) in Wxb at the 5′ splice site of the 1,124 bp long first intron, located 1,164 bp upstream of the start codon (motif AGTT instead of AGGT). Most of the waxy and low AAC cvs. screened so far carry this polymorphism, which in Wxb reduces pre-mRNA splicing efficiency and promotes alternative splicing at cryptic sites in exon 1, leading to a decreased production of functional enzyme and causing the glutinous and low amylose phenotypes (Wang et al. 1995; Ayres et al. 1997; Bligh et al. 1998; Cai et al. 2006). Allele mining has also been applied to other rice grain quality genes: Takanokai and co-workers (2009) compared the sequence of the GS3 gene, responsible for grain size, in 54 rice cvs. and identified 86 SNPs and 20 InDels, and allele mining for the rice fragrance gene badh2 allowed the development of diagnostic molecular markers (Shi et al. 2008).

Allele mining experiments applied to the Waxy gene led to the identification of five different allelic variants. Mikami et al. (2008) studied the allelic diversification at the Wx locus in Asian rice, associating different alleles with alterations in grain amylose content. They identified the Wxop allele in indica varieties from India, Nepal, Indonesia and China showing an opaque, chalky endosperm with a very low amylose content. This allele is characterized by an A to G SNP in exon 4 at position +715 from the ATG, causing an Asp to Gly amino acid change. The Wxin allele was frequent in accessions belonging to an aromatic group and to tropical japonica, which exhibited an intermediate AAC, and is determined by a non-conservative mutation in exon 6 at position +1,083. This mutation changes a Tyr to Ser in the active site of the enzyme, reducing its specific activity (Dobo et al. 2010). The last allele analyzed by Mikami and co-workers (2008), the wx allele, is present only in waxy varieties and is characterized by a 23 bp duplication in the second exon, 100 bp downstream of the ATG, causing a premature stop codon that inactivates the Waxy gene. A minor allele, Wxmq, was identified in the low AAC rice cv. Milky Queen and is characterized by two base changes within the coding region, G to A in exon 4 and T to C in exon 5, each generating a missense amino acid substitution (Sato et al. 2002). One additional low AAC-associated allele, Wxhp, was identified by Liu et al. (2009) in a few low AAC Yunnan landraces: an A to G SNP in exon 2 at position +497 causes an Asp to Gly substitution, reducing GBSSI activity.
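
The amino acid changes listed above follow from single-base substitutions in the affected codons. The sketch below checks two of them with the standard genetic code; because only the changed base is reported here, the complete codons (GAT for Asp, TAT for Tyr) are assumptions chosen for illustration, and substitution_effect is a hypothetical helper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_effect(ref_codon: str, position: int, alt_base: str) -> tuple[str, str]:
    """Return (reference, alternative) one-letter amino acids for a codon SNP.

    position is 0, 1 or 2 within the codon; '*' denotes a stop codon.
    """
    ref_codon = ref_codon.upper()
    if len(ref_codon) != 3 or position not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and a position in 0..2")
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


# Assumed codons: A to G at the 2nd base of GAT, A to C at the 2nd base of TAT
print(substitution_effect("GAT", 1, "G"))  # ('D', 'G'): Asp to Gly
print(substitution_effect("TAT", 1, "C"))  # ('Y', 'S'): Tyr to Ser
```

Building the table from the compact 64-letter string avoids typing every codon by hand; the same helper can be applied to any other reported missense change once the full reference codon is known.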

Following the discovery of this variability at the Wx locus, additional AAC class-specific molecular markers were developed. Combining the Wxin SNP in exon 6 and a non-conservative C/T SNP in exon 10 with the RM190 microsatellite in exon 1 enabled discrimination between intermediate and high AAC in a small number of genotypes (Larkin and Park 2003). More recently, Dobo and co-workers (2010) performed the same analysis on a larger number of genotypes and identified three allelic groups based on the combination of the three SNPs, explaining 89.2% of the variation in AAC among 85 US varieties and 93.8% of the variation in 279 European accessions.

To date, attention has focused on identifying Wx genetic variants that explain amylose reduction in the endosperm of intermediate, low and waxy genotypes, and the molecular markers used so far can discriminate only the waxy, low, intermediate and high AAC classes. No markers explain the different percentages of seed amylose within the high amylose class, and further work is needed to improve rice quality, particularly in Western countries where dry and firm rice is largely preferred.

In this study, a collection of 127 rice accessions (125 japonica and 2 indica) was analyzed for AAC, leading to the identification of low, intermediate and high AAC classes. The effectiveness of available markers diagnostic for AAC was verified, and low predictability within the high AAC class was observed. To discover new molecular markers associated with the different AAC classes, twenty-one genotypes representing each AAC class were selected for re-sequencing of the waxy locus. New SNPs were identified in four high AAC accessions and used to develop new SNP-based molecular markers. Moreover, unexpected associations between grain shape characters and polymorphisms at the waxy locus were identified and analyzed.

Results

AAC assessment in a rice germplasm collection

A collection of 125 temperate japonica and two indica rice accessions originating from different rice cultivation areas was evaluated for AAC, and four classes were identified, ranging from waxy to high AAC (Table 1). No accessions of the very low AAC class (5–12%) were detected; the low and intermediate amylose classes were frequent, whereas the high amylose class was comparatively rare (Figure 1). Accessions with AAC higher than 20% (60 in total) originated from very different countries (20 from the USA, 11 from Italy, 4 from Portugal, 3 from Spain, 2 from France and the remainder from other countries), indicating that relatively high amylose content was selected in different rice cultivation areas.

Table 1 Germplasm collection of 127 rice accessions
Figure 1

Frequency distribution of amylose content classes in the analyzed germplasm collection: Waxy (0-5%), low amylose (5-20%), intermediate (20-25%) and high amylose (>25%).

Molecular marker analyses

The germplasm collection of 127 rice accessions (Table 1) was used to evaluate the effectiveness of known molecular markers in predicting AAC. Ten different alleles of the RM190 CTn microsatellite were identified (Table 2), but a clear relationship with the AAC groups was observed only for some alleles (Table 1). Repeats CT9, CT10 and CT14 were present only in genotypes with 23–24.85% AAC, while CT11 and CT20 also identified accessions with AAC higher than 25%. Among these AAC classes, two varieties showed unique CT alleles: CT13 in Ota (22.55% AAC) and CT21 in Bomba (22.84% AAC). The most frequent allele, CT18, found in 67 accessions, spans a wide range of AAC (14.92–21.56%). Similar results were observed for CT17 and CT19, which were associated with heterogeneous AAC values ranging from 15.55% to 23.27%. Considering non-glutinous genotypes, the RM190 microsatellite explained 74.9% of the AAC variation in our collection (Table 3).

Table 2 Alleles identified for the RM190 microsatellite in the germplasm collection
Table 3 Percentage of variation for Apparent Amylose Content (AAC) explained by the tested GBSSI molecular markers in the germplasm collection

For the SNP at the splice site of the leader intron (intron 1), our results confirmed previous investigations (Wang et al. 1995; Ayres et al. 1997; Bligh et al. 1998; Cai et al. 1998; Isshiki et al. 1998; Larkin and Park 2003; Dobo et al. 2010). With only two exceptions, Rotundus and Antoni, all the accessions with AAC lower than 22% carried the T allele, while all the genotypes with AAC higher than 22% carried the G allele (Table 1). This SNP explained a larger proportion of AAC variation (77.5%) than RM190 (Table 3), and combining the two polymorphisms did not significantly increase the explained variation (77.7%).

The presence of SNPs characterizing the known Wx alleles was assessed in our collection by sequencing exons 2, 4, 5 and 6. The wx allele was identified in the waxy cv. Calmochi 101, which carried the 23 bp duplication (sequence motif: ACGGGTTCCAGGGCCTCAAGCCC) in exon 2 responsible for Waxy gene inactivation; consistently, literature data report AAC values of 0.8–1% for this cv. (Park et al. 2007; Li et al. 2008). At the Wxin SNP, a non-conservative A/C mutation in exon 6, the A allele was detected, with few exceptions, in accessions with AAC from waxy to 22% and above 24%, whereas the C allele was generally present in genotypes with AAC from 22 to 24% (Table 1). No polymorphisms were identified in exons 4 and 5, which carry the Wxmq mutations, and, unlike previous observations (Dobo et al. 2010), no SNP was found in exon 10.

Combining the results for the SNPs in intron 1 and exon 6, three allelic patterns were identified: GA, TA and GC (Table 2), where the first letter indicates the G/T polymorphism in intron 1 and the second the A/C SNP in exon 6. Mirroring the behaviour observed for intron 1, most of the accessions from waxy to 22% AAC (with two exceptions) carried the TA pattern (Table 1), the GC pattern was present in accessions with 22–24% AAC (with two exceptions) and GA in accessions with AAC >24% (with three exceptions). Statistical analyses showed that the A/C SNP in exon 6 alone explained only 30.9% of the variation in AAC, whereas the two SNPs together explained 79.5%. Adding the RM190 microsatellite to the analysis did not significantly increase the explained variation (79.6% vs. 79.5%; Table 3).
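
The pattern-to-AAC relationship described above can be expressed as a simple lookup. This sketch (all names hypothetical) encodes only the trends reported for this collection, which have exceptions, so it summarizes the observations rather than acting as a validated predictor.

```python
# Observed AAC trends for each intron 1 / exon 6 pattern in this collection
# (first letter = intron 1 G/T SNP, second letter = exon 6 A/C SNP)
PATTERN_TO_AAC_RANGE = {
    "TA": "waxy to 22%",
    "GC": "22-24%",
    "GA": "above 24%",
}


def intron1_exon6_pattern(intron1: str, exon6: str) -> str:
    """Combine the intron 1 (G/T) and exon 6 (A/C) alleles into a two-letter pattern."""
    intron1, exon6 = intron1.upper(), exon6.upper()
    if intron1 not in {"G", "T"} or exon6 not in {"A", "C"}:
        raise ValueError(f"unexpected alleles: intron1={intron1!r}, exon6={exon6!r}")
    return intron1 + exon6


def expected_aac_range(intron1: str, exon6: str) -> str:
    """Look up the AAC range typical of a pattern; individual exceptions exist."""
    pattern = intron1_exon6_pattern(intron1, exon6)
    # TC was not among the patterns observed here, so no range is reported for it
    return PATTERN_TO_AAC_RANGE.get(pattern, "pattern not observed")


print(expected_aac_range("G", "C"))  # 22-24%
```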

The allelic variation at the RM190 locus, together with the intron 1 and exon 6 SNP data, allowed the identification of allelic patterns associated with different AAC classes (in the two-letter patterns of this paragraph, the first letter refers to exon 6 and the second to intron 1). In particular, although CT17 alone was not associated with a specific AAC group, all the accessions with CT17, G in intron 1 and C in exon 6 had AAC between 22 and 23%, with the exception of Giada (23.93%). Similarly, accessions carrying the CT14 allele together with the CG or AG patterns had similar AAC (23.2–23.8%). Within the CT20 group, the most frequent allelic pattern was CG, which identified an AAC range of 22–24%, whereas CT20 combined with AG frequently identified accessions with more than 25% AAC. CT9, CT10 and CT11 combined with the AG pattern were typical of genotypes with more than 24% AAC.

Although the proportion of AAC variation explained in our collection agrees with that recently reported for an Italian rice collection (Caffagni et al. 2013), it is considerably lower than in earlier studies. For example, Ayres et al. (1997), using the combination of RM190 and the G/T SNP in intron 1, explained 85.9% of the variation in AAC, and Dobo and co-workers (2010) explained 93.8% of the variation with RM190, the SNP in intron 1, the SNP in exon 6 and the SNP in exon 10, which was not present in our collection. Given these results, we concluded that additional molecular markers were needed to increase the predictability of AAC classes within our germplasm panel.

Allele mining of the GBSSI gene

To mine the genetic variation at the GBSSI locus within our germplasm collection, the gene and 1 kbp of the upstream putative regulatory region were sequenced in twenty-one genotypes representing all the AAC classes identified: Calmochi 101 for the waxy type; Prever, Yrl 196, Delta, Lomellino and Campino for 14–16% AAC; Yrm 6–2, Timich 108, Augusto, Loto and Sant’Andrea for 16–19% AAC; Upla 91, Antoni, Gigante Vercelli, A201 and Gladio for 21–25% AAC; and Fragrance, L 202, Zhen Shang 47, Arroyogrande and Alinano C for >25% AAC (Table 1). The twenty-one GBSSI sequences were compared by multiple alignment, using the Nipponbare sequence as the reference. Sequence comparisons showed an overall high level of similarity (Figure 2), indicating that the coding sequence was conserved in most genotypes, with some exceptions, and led to the identification of 32 SNPs (Table 4). The waxy cv. Calmochi 101 carried the wx allele with the 23 bp duplication in exon 2, as described above. Antoni and Gigante Vercelli showed the previously identified non-conservative A/C SNP in exon 6, which causes a Tyrosine/Serine substitution in the GBSSI protein (Larkin and Park 2003). These genotypes also shared two further SNPs: one in the putative regulatory region (position −1,514) and one in the first intron (position −399). Gigante Vercelli also carried a SNP 2,174 bp upstream of the ATG (Figure 2; Table 4). Zhen Shang 47 and Alinano C carried a C/T SNP at position +1,801 in exon 9, which results in a Pro415 to Ser substitution in the GBSSI protein. These two accessions also carried SNPs in non-coding sequences: 2 in the promoter region, 11 in intron 1, 1 in intron 6, 9 in intron 10 and 2 in intron 12 (Figure 2; Table 4).
Most of the SNPs identified in the present work are not among those computationally characterized by Kharabian (2010) and deposited in the OryzaSNP (http://oryzasnp.plantbiology.msu.edu/cgi-bin/gbrowse/osa_snp_tigr/) and dbSNP (http://www.ncbi.nlm.nih.gov/sites/entrez?db=snp&TAbCmd=Limits) databases. In addition to these SNPs, a 4 bp tandem repeat, previously identified in rice cvs. showing high levels of Wx transcripts (Cai et al. 1998), was detected at position +272 in Alinano C and Zhen Shang 47. These two accessions also carried a 1 bp deletion in intron 1 at position −862 and a 3 bp deletion in intron 10, 2,241 bp from the start codon (data not shown).

Figure 2

Schematic representation of the sequence alignment of the GBSSI alleles. The Nipponbare Wx sequence was used as reference. The position of the ATG is indicated with 0; negative numbers refer to bp positions in the putative regulatory regions, while positive numbers indicate bp positions in the coding region. Colour shading represents the different AAC classes, from white (waxy) to dark grey (high AAC). SNPs are symbolized by black bars, insertions by blue triangles and deletions by red crosses. Thick, intermediate and thin boxes indicate exons, 5′ and 3′ UTR and the promoter region, respectively. Lines between boxes represent introns.

Table 4 SNPs identified from re-sequencing of the Waxy gene and the putative regulatory region

Generation of new molecular markers from allele mining

With the aim of identifying more informative markers that better discriminate accessions with AAC higher than 25% from those with lower levels, one SNP from Antoni/Gigante Vercelli and three SNPs from Alinano C/Zhen Shang 47 were selected for molecular marker development. dCAPS markers were developed from the SNPs at positions −1,514 (promoter region; Antoni and Gigante Vercelli), +1,801 (exon 9), +2,282 (intron 10) and +2,806 (intron 12) (Alinano C and Zhen Shang 47) and used to screen the 127 accessions (Figure 3; Table 1). The Antoni and Gigante Vercelli allele (T) at position −1,514 was found in 17 additional accessions belonging to the group with 20.65–24.85% AAC. Among them, 11 carried the C allele at the exon 6 SNP, like Antoni and Gigante Vercelli. Considering both SNPs, four allelic patterns associated with different levels of AAC were identified (Table 5), where the first letter refers to the SNP in exon 6 and the second to position −1,514. The AG pattern was present in 89 accessions and associated with a mean AAC of 18.99% (excluding the waxy variety Calmochi 101); the CG pattern was found in 18 accessions with a mean AAC of 23.50%; the AT pattern, carried by 6 accessions, corresponded to a mean AAC of 24.24%; and the last pattern, CT, was identified in 13 genotypes with a mean AAC of 22.75%.

Figure 3

dCAPS molecular markers developed from SNPs found in the Antoni, Gigante Vercelli, Alinano C and Zhen Shang 47 Waxy alleles. Lomellino and Prever were used as controls. Primers and restriction enzymes for each molecular marker are listed in Additional file 1: Table S1.

Table 5 Apparent Amylose Content (AAC) values observed in accessions with different allelic patterns for the newly identified SNP at position −1,514 in Antoni and Gigante Vercelli associated to the exon 6 SNP

Considering the RM190 alleles associated with the four patterns for exon 6 and the SNP at −1,514, the combination of CT20 with A (exon 6) and G (SNP −1,514) was always associated with AAC higher than 24.5% (Table 1), providing a previously unidentified tool for selecting rice accessions with high AAC. Finally, allelic variation at the SNPs at positions +1,801, +2,282 and +2,806 may provide useful diagnostic tools for selecting accessions with AAC higher than 24% when specific donors such as Merlè, CNA 4081, Orione, Zhen Shang 47 and Alinano C are used (Table 1), since these accessions share a unique CGT allelic pattern at the three SNPs. The SNP at position −1,514 slightly increased the explained AAC variation to 80.1% when considering the SNPs only, and to 80.3% when RM190 was included (Table 3). However, when only models with all variables significant at P ≤ 0.05 were considered, the model with the highest explanatory power (79.5%) was the one with the SNPs in intron 1 and exon 6 (Table 3), a result fully in agreement with previous work (Caffagni et al. 2013).

Relationships between grain shape parameters and Waxy haplotypes

Correlations between grain shape parameters for the 126 non-glutinous rice accessions (Table 1) are shown in Figure 4A, B and closely agree with those reported by Tran et al. (2012). The most evident correlations were between the L/W ratio and either grain length (positive) or grain width (negative). Grain length and width were negatively correlated with each other. AAC was positively correlated with grain length and the L/W ratio, and negatively correlated with grain width. The histogram distribution plots showed that AAC and grain width both had clearly bimodal distributions (Figure 4A).

Figure 4

Correlations between grain shape parameters and AAC. A) Scatterplot matrix of grain (dehulled caryopses, i.e., brown rice) characters. Histogram distribution plots for the single variables are shown on the diagonal cells. Below them the bivariate distributions are shown for every pair of traits. Confidence ellipses (ELL) mark confidence limit for each distribution (P = 0.95, for a normal bivariate); B) Pearson’s correlation coefficients for grain (dehulled caryopses, i.e., brown rice) characters. All the correlations are significant with P ≤ 0.001 (probabilities were adjusted by Bonferroni’s correction for multiple tests; n = 126).
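
Figure 4B reports Pearson coefficients with Bonferroni-adjusted probabilities. The following is a minimal standard-library sketch of both computations; the analyses in this work were run in Systat 12, and the function names here are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length (n >= 3)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With four traits (grain length, grain width, L/W ratio and AAC) there are six pairwise correlations, so each raw p-value would be multiplied by 6. The raw p-values themselves come from a t test with n − 2 degrees of freedom (available, for example, through scipy.stats.pearsonr) and are not computed in this sketch.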

In rice, traits related to grain size, shape and cooking properties have a large impact on market appreciation and play a pivotal role in the adoption of new varieties (Webb 1991; Juliano 2003). It is therefore interesting that, as shown in Table 6, we found some surprising associations between haplotypes at the waxy locus and the shape of the rice caryopsis. In particular, the SNP in the first intron (which distinguishes Wxa and Wxb genotypes) was associated with differences in grain width, length to width ratio (L/W) and kernel length. The SNP in exon 6 showed no association with these traits by itself, but it slightly increased the overall explained variance once the intron 1 SNP was considered. This suggests that the intron 1 SNP is actually responsible for the association, whereas the exon 6 SNP has only an ancillary effect. The RM190 CTn microsatellite also showed a significant association, but it explained less of the variance in grain biometric parameters than the intron 1 SNP; since it also has no direct effect on GBSSI functionality, its association is most probably indirect, likely reflecting the phylogenetic association of some CTn alleles with the two variants of the intron 1 SNP.

Table 6 Percentage of variation for seed biometric indexes explained by the tested GBSSI molecular markers in the germplasm collection

The positive correlation between AAC and L/W ratio, and the negative correlation between AAC and grain width, were further analysed (Figure 5). When L/W ratio is plotted against AAC for the whole genotype set used in this work, the positive correlation (r = 0.517, P < 0.001) is immediately apparent (Figure 5A). When the genotypes are grouped according to the intron 1 SNP (G/T), that is, when the Wxa/Wxb allelic version is superimposed onto the correlation plot, it becomes evident that: (a) this SNP sharply distinguishes genotypes with AAC ≤ 21% (Wxb) from those with AAC ≥ 21% (Wxa); (b) in our set of genotypes, there seems to be a break in AAC around 21.5%, which therefore appears to be a more precise threshold for discriminating between the Wxa and Wxb allelic versions, even though a few genotypes spill over, and small differences in AAC can actually occur because of the method of assay (Fitzgerald et al. 1995; Bligh et al. 1998; Cai et al. 1958), with modifications by Inatsu (1988), the AAC of milled grain was measured with a FOSS FIAstar 5000 auto-analyzer, which is based on flow injection of a 0.09% NaOH solution into the sample, addition of an iodine solution and spectrometric determination of the absorbance of the resulting color at 720 nm. Calibration was performed by measuring the absorbance of standard rice samples with 15.40%, 23.10% and 27.7% AAC, using a white reference. These reference samples were supplied and certified by FOSS, with their amylose content determined against amylose/amylopectin standards. The SoFIA software (FOSS) was used to build the calibration curve and to obtain the percentage of amylose in our samples. Each reference analysis was repeated twice for each sample.
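
The calibration step amounts to fitting a curve that maps absorbance at 720 nm to the certified AAC of the reference samples and then converting sample readings through it. The sketch below assumes a straight-line least-squares calibration; the model actually used by the SoFIA software is not described here, the absorbance values are invented, averaging the duplicate readings is an assumption, and all names are hypothetical.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Certified AAC (%) of the three FOSS reference samples, as reported in the text
STANDARD_AAC = [15.40, 23.10, 27.70]
# HYPOTHETICAL absorbances at 720 nm, invented for illustration only
STANDARD_ABSORBANCE = [0.308, 0.462, 0.554]

SLOPE, INTERCEPT = fit_line(STANDARD_ABSORBANCE, STANDARD_AAC)


def aac_from_absorbance(readings: list[float]) -> float:
    """Average replicate absorbance readings of a sample and convert them to AAC (%)."""
    mean_absorbance = sum(readings) / len(readings)
    return SLOPE * mean_absorbance + INTERCEPT


print(round(aac_from_absorbance([0.40, 0.42]), 2))
```

Regressing AAC directly on absorbance (rather than fitting absorbance on AAC and inverting) keeps the conversion to a single multiply-and-add per sample.
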
Biometric parameters of rice seeds (decorticated grain length and width) were evaluated through high-resolution images produced with an optical scanner and analyzed with the WinSEEDLE 2011a software (Regent Instruments Inc.).

Molecular marker analyses

To obtain genomic DNA from the rice accessions, seeds were germinated in Petri dishes at 30°C, and one-week-old seedlings were transplanted and grown in a greenhouse until the three-leaf stage; leaves were then collected, frozen in liquid nitrogen and stored at −80°C. Genomic DNA was extracted in plates using the Wizard® Magnetic 96 DNA Plant System (Promega) according to the manufacturer’s instructions. The CTAB DNA extraction method (Doyle and Doyle 1987) was used instead for the 21 genotypes selected for GBSSI gene re-sequencing.

The RM190 CT repeat was assayed using the M13-tailed forward primer RM-190 F (CACGACGTTGTAAAACGACCTTTGTCTATCTCAAGACAC) and the reverse primer RM-190R (TTGCAGATGTTCTTCCTGATG) (Ayres et al. 1997; Chen et al. 2008). PCR reactions were performed in 10 μl containing 15 ng of genomic DNA, 0.1 μM of RM-190 F, 1 μM of RM-190R and FAM-labelled M13 (CACGACGTTGTAAAACGAC), 0.2 mM dNTPs and 1 U GoTaq DNA Polymerase (Promega). DNA was amplified using the following touchdown program: denaturation at 94°C for 3 min; 20 cycles of 94°C for 45 sec, 61°C to 51.5°C for 45 sec (decreasing the annealing temperature by 0.5°C per cycle) and 72°C for 45 sec; 24 cycles of 94°C for 45 sec, 51°C for 45 sec and 72°C for 45 sec; and a final extension at 72°C for 10 min. Labelled PCR products were run on a 3130 Genetic Analyzer (Applied Biosystems) with the ROX size standard (Applied Biosystems).
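
The touchdown program can be written as a per-cycle schedule of annealing temperatures, which makes the arithmetic easy to check: 20 cycles starting at 61°C with 0.5°C decrements reach 51.5°C in the last touchdown cycle (19 decrements), followed by 24 cycles at 51°C, for 44 cycles in total. In this sketch the function name is hypothetical, and the denaturation and extension steps are omitted.

```python
def touchdown_annealing_temps(
    start_c: float = 61.0,
    step_c: float = 0.5,
    touchdown_cycles: int = 20,
    plateau_c: float = 51.0,
    plateau_cycles: int = 24,
) -> list[float]:
    """Annealing temperature (°C) for every cycle of a two-phase touchdown PCR."""
    # Phase 1: the annealing temperature drops by step_c after each cycle
    touchdown = [start_c - step_c * cycle for cycle in range(touchdown_cycles)]
    # Phase 2: constant annealing temperature
    plateau = [plateau_c] * plateau_cycles
    return touchdown + plateau


temps = touchdown_annealing_temps()
print(len(temps), temps[0], temps[19], temps[20])  # 44 61.0 51.5 51.0
```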

PCRs for the dCAPS analysis of the G/T polymorphism in intron 1 were carried out in 20 μl using GoTaq DNA Polymerase (Promega) supplemented with 5% DMSO, with 20 ng of genomic DNA, 1 μM of GBSS-W2F and 1 μM of GBSS-W2R (Ayres et al. 1997; Chen et al. 2008). Amplification conditions were: 94°C for 4 min, followed by 40 cycles of 94°C for 40 sec, 60°C for 50 sec and 72°C for 1 min per kb, and a final extension at 72°C for 10 min. After amplification, 5 μl of PCR product was digested with 1 U of AccI restriction enzyme (New England BioLabs) in a total volume of 10 μl at 37°C overnight. The samples were run on a 2% agarose gel, each digested sample alongside its undigested control.

The presence of the Wxop, Wxin, Wxmq, Wxhp and wx alleles and the variability at the exon 10 SNP identified by Larkin and Park (2003) were assessed in our collection by sequencing exons 4, 6, 5, 2 and 10, following the procedure described below for the allele mining experiment.

To detect the SNPs at positions −1,514, +1,801, +2,282 and +2,806 by dCAPS analysis, mismatched forward primers were designed with the dCAPS Finder 2.0 software (http://helix.wustl.edu/dcaps/dcaps.html). Primers and restriction enzymes for the dCAPS assays are listed in Additional file 1: Table S1. PCR amplification followed the same protocol used for detecting the G/T polymorphism in the leader intron.
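
In a dCAPS assay, the mismatched primer is designed so that one SNP allele completes a restriction site and the other does not, and alleles are then called from the digestion pattern. The sketch below shows only that final check: it searches for a recognition sequence written in IUPAC ambiguity code, using the AccI site (GT^MKAC) as an example. The two fragments are invented and do not represent the Wx sequences or the primers in Table S1; has_site is a hypothetical helper.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}


def has_site(sequence: str, site: str) -> bool:
    """True if the (degenerate) recognition site occurs in the sequence.

    Only the given strand is scanned; this is sufficient for palindromic
    sites such as AccI (GTMKAC), whose reverse complement is itself.
    """
    pattern = "".join(IUPAC_TO_REGEX[base] for base in site.upper())
    return re.search(pattern, sequence.upper()) is not None


ACCI_SITE = "GTMKAC"
# HYPOTHETICAL amplicon fragments differing by one C/T SNP (not Wx sequence)
allele_1 = "AACGTCTACGGA"
allele_2 = "AACGTTTACGGA"
for name, seq in (("allele 1", allele_1), ("allele 2", allele_2)):
    print(name, "cut" if has_site(seq, ACCI_SITE) else "uncut")
```

A design tool such as dCAPS Finder 2.0 explores candidate mismatches and enzymes to find such a configuration; the sketch does not attempt that search.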

Allele mining

For allele mining of GBSSI in the 21 selected genotypes, six overlapping regions, ranging from 800 bp to 2,500 bp, were PCR amplified from genomic DNA with the same protocol described above for the G/T polymorphism. Primers designed on the Nipponbare genomic sequence (GenBank accession NC_008399; Additional file 2: Table S2) were used for genomic DNA amplification in the combinations indicated in Additional file 2: Table S2. The amplified regions covered the entire gene plus 1 kbp of the upstream putative regulatory region. After gel purification with the Wizard® SV Gel and PCR Clean-Up System (Promega), PCR products were directly sequenced. Sequencing reactions were performed with ABI BigDye Terminator version 3.1 (Applied Biosystems) in forward and reverse directions, using 5 μl of each PCR product and either the PCR primers or internal primers. All primers were designed using the Primer3 0.4.0 software (http://frodo.wi.mit.edu/) and blasted against the rice genomic sequence on the Gramene website (http://www.gramene.org) to ensure specificity for the GBSSI gene.
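
Tiling a locus with overlapping amplicons requires that their union leaves no gap across the target region. The sketch below checks this for a set of amplicon coordinates; the coordinates are hypothetical (the actual primer positions are in Additional file 2: Table S2), and the function names are illustrative only.

```python
Amplicon = tuple[int, int]  # (start, end), 1-based inclusive coordinates


def coverage_gaps(amplicons: list[Amplicon], region_start: int, region_end: int) -> list[Amplicon]:
    """Return stretches of [region_start, region_end] not covered by any amplicon."""
    gaps: list[Amplicon] = []
    next_uncovered = region_start
    for start, end in sorted(amplicons):
        if start > region_end or next_uncovered > region_end:
            break
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= region_end:
        gaps.append((next_uncovered, region_end))
    return gaps


def adjacent_overlaps(amplicons: list[Amplicon]) -> list[int]:
    """Overlap (bp) between consecutive amplicons after sorting; negative means a gap."""
    ordered = sorted(amplicons)
    return [prev[1] - nxt[0] + 1 for prev, nxt in zip(ordered, ordered[1:])]


# HYPOTHETICAL coordinates for six amplicons of 800-2,500 bp
AMPLICONS = [(1, 1500), (1400, 2400), (2300, 4700), (4600, 5500), (5400, 6300), (6200, 7000)]
print(coverage_gaps(AMPLICONS, 1, 7000))  # []
print(adjacent_overlaps(AMPLICONS))       # [101, 101, 101, 101, 101]
```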

Computational analyses

The RM190 microsatellite was analyzed using the GeneMapper software (Applied Biosystems). Sequence assembly was performed with the ContigExpress tool of Vector NTI Software (Invitrogen) using the Nipponbare genomic sequence as reference. Sequence comparison was carried out with MultAlin (http://multalin.toulouse.inra.fr/multalin/). All data were analysed with the Systat 12 software (SPSS Inc., Chicago, IL, USA). Relationships between numerical variables (AAC and grain biometric characters) were evaluated by Pearson correlation coefficients and regression analysis. Associations between numerical variables (AAC) and categorical variables (marker haplotypes) were analysed with the General Linear Model (GLM) procedure.
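
The percentage of variation explained by a marker, as reported in Tables 3 and 6, is most naturally read as the coefficient of determination (R²) of the fitted model. For a single categorical marker, a minimal sketch is the between-group share of the total sum of squares, shown below; the function name and toy data are hypothetical, and the sketch reproduces neither the Systat GLM output nor its significance tests.

```python
from collections import defaultdict


def percent_variance_explained(genotypes: list[str], aac: list[float]) -> float:
    """100 * R^2 of a one-way model (AAC explained by a categorical marker genotype).

    R^2 = SS_between / SS_total, i.e. the share of AAC variance captured by
    replacing each value with the mean of its genotype group.
    """
    if len(genotypes) != len(aac) or not aac:
        raise ValueError("need one genotype label per AAC value")
    grand_mean = sum(aac) / len(aac)
    ss_total = sum((v - grand_mean) ** 2 for v in aac)
    if ss_total == 0:
        raise ValueError("AAC values have no variance")
    groups: dict[str, list[float]] = defaultdict(list)
    for genotype, value in zip(genotypes, aac):
        groups[genotype].append(value)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return 100.0 * ss_between / ss_total


# Toy data: intron 1 alleles and AAC (%) for four hypothetical accessions
print(percent_variance_explained(["T", "T", "G", "G"], [15.0, 17.0, 24.0, 26.0]))  # ~95.3
```

Passing combined labels (for example "G|C" for the intron 1 and exon 6 alleles) fits one mean per combination, which is a full-interaction model and may differ from an additive GLM that treats the markers as separate factors.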