Background

Rice (Oryza sativa L.) is one of the three major food crops and is a staple food for nearly half of the world’s population (Miura et al. 2010). Excellent germplasm resources are the basis for improving the efficiency of breeding new rice varieties (Zhao et al. 2011). Various physical and chemical mutagens can induce changes in genetic material, resulting in new allelic variations and species. As a new radiation mutagenesis method, heavy ion mutagenesis has unique advantages, such as a high mutation rate, a wide mutation spectrum, rapid stabilization of mutations, and stable and reliable mutagenesis, and it is simple and easy to implement (Qu et al. 2007; Hase et al. 2012). Owing to the high linear energy transfer (LET) of heavy ion beams, single nucleotide variations (SNVs), insertions/deletions (InDels) and structural variations (SVs) can be induced at higher frequencies (Zheng et al. 2021). This method can induce heritable variation in plant genomes within the current generation. The resulting mutants are important materials for functional genomics research (Oono et al. 2020). Heavy ion radiation is one of the effective ways to innovate rice germplasm (**g et al. 2021). In recent years, this technique has played an important role in plant breeding (Yang et al. 2019a; Li et al. 2019; Sjahril et al. 2020; Okasa et al. 2021; Zhang et al. 2022).

Mining of allelic variations is the key to creating new germplasm for plants and animals. The rice Wx gene is the main gene that controls amylose synthesis, and the discovery and utilization of its allelic variation is an important way to analyze rice quality variation and an important basis for rice quality improvement. Multiple important allelic variants of the Wx gene, including Wxlv, Wxa, Wxb, Wxin, Wxmp, Wxop, and wx, have been identified (Zhang et al. 2019). GS3 is a major QTL controlling grain length in rice, and the protein it encodes negatively regulates grain length (Fan et al. 1997; Zhang et al. 2012; Fazio et al. 2003). The targeted induced local lesions in genomes (TILLING) technique is a reverse genetics technique developed in the 1990s. It is based on chemically mutagenized materials and links chemical mutagenesis with PCR screening and high-throughput detection methods to form a technical system for the high-throughput, rapid detection of point mutations in target gene regions (Henikoff and Comai 2003). TILLING has been applied in a variety of plants and has promoted mutagenesis and breeding development (Boualem et al. 2014; Anai 2012; Ochiai et al. 2011; Chen and Dubcovsky 2012; Minoia et al. 2010). MutMap is a forward genetic gene mapping strategy and genetic analysis method developed based on whole genome sequencing (WGS) (Abe et al. 2012). The MutMap method also has a variety of developments and extensions, such as MutMap+ and MutMap-Gap. These methods do not require the establishment of cumbersome progeny mapping groups and do not rely on genetic hybridization or any linkage information. This approach to identifying variant loci has been successfully applied to gene mapping studies in different species (Takagi et al. 2013a, b, 2015; Rym et al. 2017). In recent years, targeted sequencing technology has been widely used. Targeted sequencing includes GenoPlexs, based on multiplex PCR, and GenoBaits, based on liquid-phase probe capture, which can detect multiple SNPs within a single amplicon, greatly improving the detection of variation within the target region and the detection efficiency. This technique has the characteristics of high marker flexibility and high detection efficiency and can be widely used in biological evolution, genetic map construction, gene location cloning, marker trait association detection, allelic variation detection, etc. (Shen et al. 2021; Guo et al. 2019; Lu et al. 2019; Li et al. 2020; Du et al. 2019; Yang et al. 2019b).

In this study, 3872 second-generation (M2) mutant materials derived from 12C6+ radiation mutagenesis were screened by mixed-sample targeted sequencing, and the mixed samples carrying mutations, together with SNPs in the grain type genes GS3 and GW5, were mined. Individual mutant plants were then selected by Sanger sequencing and, combined with phenotype and protein function analysis, key mutant individual plants were further selected for WGS to analyze the relationship between mutant phenotype and genotype and to discover new allelic variations. On this basis, we developed an efficient and accurate method for the directional identification of allelic variation in rice grain type genes through mutation induction, targeted sequencing, and whole genome sequencing combined with a mixed-samples strategy, abbreviated as MTWA (Fig. 1). The innovation of this method is that targeted sequencing combined with WGS can quickly screen mutants and identify mutation sites, with high detection throughput, low detection cost, and flexible target traits and sites.

Fig. 1

Simplified steps of the MTWA process. a Wild-type (WT) seeds were irradiated with a carbon ion beam, and the M2 generation was planted and harvested. b The M2 generation mixed samples were constructed at a ratio of 8:1 for targeted sequencing detection, and the grain shape of the M2 generation seeds was measured. c Sanger sequencing was performed on individual plants in a mixed sample of potential mutations detected by targeted sequencing and compared with phenotypes to identify key mutants. d WGS of key mutants to resolve associations between mutant phenotypes and genotypes and to mine for novel allelic variants

Results

Grain Typing Phenotype Investigation

The wild-type (WT) material used in this study was Huahang No. 31 (Fig. 2a). Seeds of a total of 3872 M2 plants were harvested on a per-plant basis. We measured the grain length and width of these seeds (Additional file 7; Table S4). The grain length ranged from 8 to 10.22 mm, with an average of 9.28 mm and a coefficient of variation of 1.89%; the grain length of the WT was 9.31 mm (Fig. 2c). The grain width ranged from 1.54 to 2.87 mm, with an average of 2.02 mm and a coefficient of variation of 0.56%; the grain width of the WT was 2.03 mm (Fig. 2d, Table 1). Both grain length and grain width conformed to a normal distribution and had a wide range of variation. Compared with the WT, many materials showed large differences in grain type, indicating that there were several potential grain type mutations (Fig. 2b).
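The coefficient of variation reported above is the standard deviation divided by the mean, expressed as a percentage. A minimal sketch of that calculation follows; the use of the sample standard deviation and the example values are assumptions for illustration, not the measured data in Table S4.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return stdev(values) / m * 100.0


# Hypothetical grain lengths in mm (illustrative only, not the study data)
example_lengths = [9.1, 9.3, 9.4, 9.2, 9.5]
cv_percent = coefficient_of_variation(example_lengths)
print(f"CV = {cv_percent:.2f}%")
```

Whether the study used the sample or the population standard deviation is not stated; for a population of several thousand plants the two give nearly identical values.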

Fig. 2

a Plant type of WT Huahang 31. Scale bars = 10 cm. b Comparison of mutant and WT grain types. Scale bars = 1 cm. c Frequency distribution of grain length. d Frequency distribution of grain width

Table 1 Grain type phenotype survey

Targeted Sequencing to Screen a Mixed Pool of Potential Mutations

The target genes of targeted sequencing were GS3 (Fig. 3a) and GW5 (Fig. 3b). GS3 and GW5 are the two genes that have the greatest influence on grain length and grain width, and their mechanisms have been thoroughly studied. Targeted sequencing (Fig. 3c) detected a total of 179 mutation sites (Additional file 8; Table S5), all of which were homozygous mutations; 110 sites were in the GS3 interval and 69 were in the GW5 interval. The total mutation frequency was calculated to be 4.05 × 10⁻⁵ in the GS3 interval and 9.02 × 10⁻⁵ in the GW5 interval (total mutation frequency = number of mutated bases/gene fragment length). Among the 179 mutation sites, 63.57% were located in intron regions and 30% were located in exon regions (Fig. 3d, Additional file 9; Table S6). We retained only the nonsynonymous and nonsense mutations located in exon regions that could cause amino acid changes. At the same time, reliable sites with relatively high read counts were selected, yielding a total of 15 SNPs (Table 2), of which 14 were nonsynonymous mutations and 1 was a nonsense mutation, comprising 11 loci in the GS3 interval and 4 loci in the GW5 interval (Fig. 3e) and involving a total of 12 mixed samples. Of the 12 mixed samples, 8 were associated with GS3 and 4 were associated with GW5. Mixed samples 4–101, 6–44 and 7–78 each contained two SNPs, and the remaining nine mixed samples each had only one SNP.
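The screening logic described above (keep exonic nonsynonymous or nonsense sites with sufficient read support, and compute a frequency as mutated bases over fragment length) can be sketched as follows. The `Variant` fields, the `min_reads` threshold and the example records are hypothetical; the text does not state the read cutoff that was actually used.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str    # target gene, e.g. "GS3" or "GW5"
    region: str  # "exon", "intron", ...
    effect: str  # "nonsynonymous", "nonsense", "synonymous", ...
    reads: int   # supporting read count (hypothetical field)


def mutation_frequency(mutated_bases, fragment_length):
    """Frequency as stated in the text: mutated bases / gene fragment length."""
    return mutated_bases / fragment_length


def select_candidates(variants, min_reads):
    """Keep exonic nonsynonymous or nonsense variants with enough read support."""
    keep_effects = {"nonsynonymous", "nonsense"}
    return [
        v for v in variants
        if v.region == "exon" and v.effect in keep_effects and v.reads >= min_reads
    ]


# Illustrative records only; not taken from Table S5 or Table S6
variants = [
    Variant("GS3", "exon", "nonsense", 120),
    Variant("GS3", "intron", "none", 300),
    Variant("GW5", "exon", "synonymous", 150),
    Variant("GW5", "exon", "nonsynonymous", 15),
    Variant("GW5", "exon", "nonsynonymous", 90),
]
candidates = select_candidates(variants, min_reads=50)
print([(v.gene, v.effect) for v in candidates])
```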

Fig. 3

a GS3 gene information. b GW5 gene information. c Targeted sequencing process. d Location distribution of SNPs in the genome. e Distribution of the functional types of mutation sites

Table 2 Targeted sequencing mutation site information

Screening of Individual Mutant Plants and Identification of Their Authenticity

To further screen out the mutant individual plants from the mixed samples, we separated the individual plants in the 12 mixed samples, which contained a total of 96 individual plants. The target fragments of these 96 individual plants were subjected to Sanger sequencing and compared with the results of targeted sequencing to determine the target mutant individual plants. A total of 13 loci were consistent with the targeted sequencing results, whereas the Sanger sequencing results of SNP-5 and SNP-6 differed from the targeted sequencing results (Fig. 4a), so the mutants at these two loci were excluded. The concordance rate between targeted sequencing and Sanger sequencing was 86.67%, and a total of 13 mutants were screened. The complete results for the 15 SNPs are shown in Additional file 2: Fig. S2.
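The numbers in this step can be checked with a short calculation: 12 mixed samples of 8 plants each give 96 individuals, and 13 confirmed SNPs out of 15 candidates give the reported concordance rate. The sketch below assumes the SNPs are labelled SNP-1 to SNP-15; only SNP-1, SNP-2, SNP-5 and SNP-6 are named explicitly in the text.

```python
def concordance_rate(targeted_calls, sanger_confirmed):
    """Return (percentage confirmed by Sanger sequencing, list of confirmed SNPs)."""
    if not targeted_calls:
        raise ValueError("no targeted-sequencing calls to compare")
    confirmed = [snp for snp in targeted_calls if sanger_confirmed.get(snp, False)]
    return 100.0 * len(confirmed) / len(targeted_calls), confirmed


# Assumed labels SNP-1 ... SNP-15; SNP-5 and SNP-6 were not confirmed by Sanger
targeted = [f"SNP-{i}" for i in range(1, 16)]
sanger = {snp: snp not in {"SNP-5", "SNP-6"} for snp in targeted}

rate, confirmed = concordance_rate(targeted, sanger)
print(f"{len(confirmed)}/{len(targeted)} confirmed, concordance = {rate:.2f}%")

# Deconvolution workload: 12 mixed samples built at a ratio of 8:1
pools, plants_per_pool = 12, 8
plants_to_sequence = pools * plants_per_pool
print(plants_to_sequence)
```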

To verify the authenticity of the selected mutants, we tested the 13 selected individual plants according to the technical regulations for the identification of rice varieties (SSR marking method) issued by the Ministry of Agriculture (NY/T 1433–2014) and designed a total of 10 pairs of SSR markers (Additional file 6: Table S3). The agarose gel electrophoresis results of the 13 mutant individual plants were consistent with those of the WT, indicating that all of them were true mutants (Fig. 4b).

Fig. 4

a Sanger sequencing results; red boxes mark the target SNPs. The Sanger sequencing results of SNP-1 and SNP-2 were consistent with the targeted sequencing results. SNP-5 and SNP-6 were consistent with the WT, and no variants were detected. b Results of agarose gel electrophoresis. The 10 SSR fragment polymorphisms of the 13 mutant individual plants were consistent with those of the WT

Phenotypic Verification and Protein Function Analysis of Individual Mutant Plants

After verification, the grain type phenotype data corresponding to the confirmed mutant individual plants were retrieved according to their numbers. Only 6 of the 13 individual plants showed significant changes in grain type, and the grain types of the remaining 7 individual plants were the same as those of the control. There was no significant difference in the ratios between samples (Table 3). According to the screening results of targeted sequencing, all 13 SNPs are nonsynonymous or nonsense mutations, which theoretically lead to amino acid changes; however, some SNPs did not cause significant changes in phenotype, presumably because they do not alter protein function or structure or because mutations in other genes are involved.

We screened a total of 9 grain length mutants, of which 2 had significantly longer grains and 2 had significantly shorter grains than the WT (Fig. 5a). Nine SNPs related to GS3 were identified, including 8 nonsynonymous mutations and 1 nonsense mutation (GS3-2); GS3-1 was located in exon 1, GS3-2 was located in exon 2, and the remaining seven mutations were located in exon 5 (Fig. 5b). The GS3-1 mutation lies near the start of the protein, outside the functional domains, and has no effect on the structure and function of the protein, so the grain length does not change significantly (Fig. 5b). GS3-2 is located in the OSR domain: the 55th amino acid is mutated to a stop codon, the OSR domain is lost, and the protein structure and function are severely affected (Fig. 5d), resulting in a significant increase in grain length. Both GS3-3 and GS3-4 are located in the Cys-rich TNFR domain, and the grain length of GS3-3 is significantly reduced. Protein structural analysis showed that the mutation of amino acid 135 leads to two additional β sheets in the secondary structure of the protein and presumably results in impaired TNFR domain function (Fig. 5d), whereas the grain length of GS3-4 was not significantly altered. GS3-5, GS3-6, GS3-7, GS3-8 and GS3-9 are all located in the Cys-rich VWFC domain. Among them, only the grain length of GS3-5 is significantly reduced; the GS3-5 mutation may impair the function of the VWFC domain, although there is no significant difference in protein structure compared with the WT (Additional file 3: Fig. S3). The phenotypic results of GS3-6, GS3-8 and GS3-9 were similar to those of GS3-4: an amino acid point mutation occurred in a functional domain, but the phenotype did not change significantly. It was speculated that the mutation of these 4 amino acids may not affect the function of the protein or that mutations in other genes have an impact on the phenotype. In contrast, the grain length of GS3-7 changed, and functional site analysis of its protein showed that the mutation of amino acid 183 of GS3-7 was located in the ligand binding site of the protein (Fig. 5c); after the mutation, the function of the protein was affected, so the grain shape changed. GS3-4, GS3-5, GS3-6, GS3-8 and GS3-9 showed no significant difference from the WT in terms of protein structure and function (Additional file 3: Fig. S3).

Table 3 Amino acid mutation and phenotypic information of single mutant plants

Fig. 5

a Grain length comparison of the nine GS3 mutants with the WT. Scale bars = 1 cm. b Distribution of the mutations on the gene and protein structure. c GS3 protein functional site (red part). d Protein structures of GS3 mutants (differential structures in red)

We screened 4 grain width mutants, of which 1 had increased grain width, 1 had decreased grain width, and the remaining 2 showed no significant change (Fig. 6a). The four identified GW5-related SNPs were all nonsynonymous mutations, of which GW5-4 was located in exon 1 and the other three were located in exon 2 (Fig. 6b). GW5-1, GW5-2 and GW5-3 are all located in the calmodulin-binding domain; however, only GW5-1 has significantly wider grains, whereas the other two mutants show no obvious change in phenotype. We predicted the protein structure and function of the mutants and found that the protein structures of GW5-1 and GW5-2 were more similar to each other than to the WT, while the protein structure of GW5-3 had no obvious change (Fig. 6c). The phenotypes of GW5-1 and GW5-2, which carry mutations in the same domain, differ, presumably due to interference from other genes. The GW5-4 mutation lies near the start of the protein, outside the functional domains, and is not expected to affect the structure and function of the protein (Fig. 6b); nevertheless, the grain width is significantly narrowed, and the predicted protein structure is relatively compact (Fig. 6c).

Fig. 6

a Grain width comparison of the four GW5 mutants with the WT. Scale bars = 1 cm. b Distribution of the mutations on the gene and protein structure. c Protein structures of GW5 mutants (differential structures in red)

Whole Genome Sequencing of Key Mutants Identifies New SNPs Affecting the GS3 Mutation Effect

To explore the reasons for the discrepancy between genotype and phenotype and to clarify whether allelic variation in other grain length-related genes had an impact on the phenotype, the GS3-M1, GS3-M2 and WT samples were subjected to WGS. A total of 2,084,534 SNPs and 336,039 InDels were obtained by sequencing GS3-M1, and 2,116,343 SNPs and 341,777 InDels were obtained from GS3-M2. After comparison, there were 189,473 different loci in the two samples (Additional file 10; Table S7). After screening, three new allelic variants related to grain length were finally obtained (Table 4, Fig. 7a). GS3-G1 is located in the second exon of OsNST1 and mutates serine 65 to threonine. At present, there are few reports on this gene, and its protein structure cannot be predicted. Mutants of this gene exhibit reduced cell wall cellulose content and structural changes, resulting in reduced mechanical strength and abnormal plant development, such as dwarf plants and smaller seed size (Song et al. 2011). GS3-G2 is a variant located in the first exon of OsMAPK6 that mutates aspartic acid 131 to arginine, which affects only one of its functional domains. Inhibition of OsMPK6 expression can make rice panicles denser and grains smaller, and mutation of this gene can significantly reduce grain length, grain width and thousand-grain weight (Guo et al. 2018). GS3-G3 is located in the second exon of RAE2 and results in a frameshift insertion mutation at amino acid 99 and impaired function of the cysteine-rich region of the encoded protein EPFL1. The number of kernels decreased, the kernels became longer, and the proportion of awned kernels increased (** et al. 2016) (Fig. 7b).
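The comparison step can be illustrated with set operations on variant calls. This is only a sketch under stated assumptions: variants are keyed by a hypothetical (chromosome, position, ref, alt) tuple, "different loci" is interpreted as the symmetric difference between the two mutant call sets, and the example calls are invented rather than taken from Table S7. The actual pipeline and filters used in the study are not described in this section.

```python
def differing_loci(calls_a, calls_b):
    """Return variants found in exactly one of the two call sets."""
    return set(calls_a) ^ set(calls_b)


def private_variants(mutant_calls, wt_calls):
    """Return variants present in the mutant but absent from the WT."""
    return set(mutant_calls) - set(wt_calls)


# Hypothetical calls as (chromosome, position, ref, alt); illustrative only
wt = {("chr3", 100, "A", "G"), ("chr5", 250, "C", "T")}
m1 = {("chr3", 100, "A", "G"), ("chr5", 250, "C", "T"), ("chr6", 400, "G", "A")}
m2 = {("chr3", 100, "A", "G"), ("chr8", 900, "T", "C")}

between_mutants = differing_loci(m1, m2)
m1_private = private_variants(m1, wt)
print(len(between_mutants), sorted(m1_private))
```

Keying on ref and alt as well as position keeps two different substitutions at the same site from being treated as one shared variant.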

Table 4 Whole-genome mutation site information

Fig. 7

a Whole-genome mutation screening process. b Positions of the three new SNPs on their corresponding genes

Discussion

Reliability of MTWA to Identify Mutants and Allelic Variants

Screening mutants and identifying allelic variants are important foundations for innovating germplasm materials and for functional genomics research (Guo et al. 2006). The most fundamental way to utilize germplasm resources efficiently is to mine new alleles, purposefully pyramid or transfer them through conventional or molecular breeding, and then combine them with molecular design to improve breeding efficiency. In this study, the seeds of Huahang 31 were irradiated with 12C6+, and an M2 population containing 3872 individual plants was obtained. The coefficient of variation was 1.89% for grain length and 0.56% for grain width, indicating that there are many potential mutations in grain type. In this study, a new method for allelic variation and mutant identification, MTWA, was proposed, which uses targeted sequencing to initially screen mixed sample materials. After screening, a total of 15 SNPs and 12 mixed samples were obtained, and Sanger sequencing was performed to identify the key mutants in the mixed samples, which were then verified for authenticity. The concordance rate between targeted sequencing and Sanger sequencing was 86.67%, supporting the feasibility of the method. A total of 13 mutants were confirmed from the 15 candidate SNPs. We then analyzed the phenotypes of the individual mutant plants and the protein function and structure at the mutation sites and performed WGS on the key mutants. In this way, we mined new grain length-related allelic variations, analyzed the connection between genotype and phenotype mutations, and established a systematic, efficient and accurate new method for allelic variation identification. At the same time, a batch of mutation sites and mutant materials with breeding value were screened and identified.

Advantages of MTWA

Traditional forward genetic identification methods usually require the construction of complex mapping populations and linkage maps and gene linkage analysis using genetic markers (Zhang et al. 2012; Serquen et al. 1997; Fazio et al. 2003), but they can only map mutation sites to a large chromosomal region (Hazen 2005), and fine mapping of mutant genes is expensive and time consuming (Schneeberger et al. 2009; Abe et al. 2012). MTWA can be used to analyze the M2 generation without the need to construct a complex genetic population, which greatly shortens the detection time. It is also a directional mutant identification method. TILLING technology uses the CEL I enzyme to digest PCR amplification products and detects and selects mutants by capillary electrophoresis (Yan et al. 2014), but the process is relatively complicated. In addition, TILLING places certain length requirements on the gene fragments it can detect: usually, the target gene fragment is shorter than 1.5 kb, and the high-resolution melting curve (HRM) detection region is only 150–500 bp (Sikora et al. 2011). In contrast, MTWA directly performs second-generation sequencing on the target fragment to determine the mutation sites. The MTWA process is relatively simple, and the sequencing results are more accurate than those of the TILLING method. At the same time, multiple mutation sites can be detected in a single amplicon, and there is no restriction on the fragment length, so MTWA has higher detection efficiency and wider applicability than the TILLING method. Compared with MutMap and its derived methods, mutants in MTWA do not require backcrossing to parental lines, which greatly reduces time and effort (Allen et al. 2013; Abe et al. 2012). If the genome of the target crop is large and complex, MutMap will have the problems of high sequencing costs, large datasets, and difficult comparison and analysis, especially in allopolyploid species with high genome heterozygosity, and highly homologous sequences and subtypes of genomes could also be detected (Li et al.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Acknowledgements

The authors would like to thank American Journal Experts for their valuable language service.

Funding

This work was supported by the Special Rural Revitalization Funds of Guangdong Province (2021KJ382) and the Research and Development Plan for Key Areas in Guangdong Province (No. 2018B020206002).

Author information

Contributions

KS and TG designed the experiment and wrote the manuscript. KS, DDL, AYX, QW and SSJ conducted the experiments and performed data analysis. HZ, JFW, GLY, DHZ, CHH, HW and ZQC drafted proposals and corrected the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tao Guo.

Ethics declarations

Ethics approval and consent to participate

This study complied with the ethical standards of China where this research was performed.

Consent for publication

Not applicable.

Competing interests

The authors declare no conflict of interest.

Supplementary Information

Additional file 1: Fig. S1.

VAF calculation method. Green bases are mutant bases, red bases are reference genome bases, and the VAF is the ratio of green bases to all bases.

Additional file 2: Fig. S2.

Sanger sequencing results of the mutation sites of the 15 mutant individual plants. The results for SNP-5 and SNP-6 differ from the targeted sequencing results, so these two mutant individual plants were excluded; the remaining results agree with the targeted sequencing results and represent true variation.

Additional file 3: Fig. S3.

Protein structures of the 9 GS3 mutants; the red part is the difference from the WT.

Additional file 4: Table S1.

Targeted sequencing primer information.

Additional file 5: Table S2.

Primers for amplifying fragments of the target site.

Additional file 6: Table S3.

Primers for authenticity verification of mutant individuals.

Additional file 7: Table S4.

Grain type data.

Additional file 8: Table S5.

Mutation mixed sample information screened by targeted sequencing.

Additional file 9: Table S6.

Mutation site information obtained by targeted sequencing.

Additional file 10: Table S7.

Mutation site information screened by whole genome sequencing.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Cite this article

Sun, K., Li, D., **a, A. et al. Targeted Identification of Rice Grain-Associated Gene Allelic Variation Through Mutation Induction, Targeted Sequencing, and Whole Genome Sequencing Combined with a Mixed-Samples Strategy. Rice 15, 57 (2022). https://doi.org/10.1186/s12284-022-00603-2
