Genetic relationships and identification of core germplasm among rice photoperiod- and thermo-sensitive genic male sterile lines

Zhang, **anwen; He, Qiang; Zhang, Wuhan; Shu, Fu; Wang, Wei**; He, Zhizhou; **ong, Hairong; Peng, Junhua; Deng, Huafeng

doi:10.1186/s12870-021-03062-x

Research
Open access
Published: 02 July 2021

Volume 21, article number 313, (2021)

BMC Plant Biology

Abstract

Background

Harnessing heterosis is one of the major approaches to increasing rice yield and has made a great contribution to food security. The identification and selection of outstanding parental genotypes, especially among male sterile lines, is a key step in exploiting heterosis. The two-line hybrid system is based on the discovery and application of photoperiod- and thermo-sensitive genic male sterile (PTGMS) materials. The development of a wide range of male sterile lines from a common gene pool narrows genetic diversity, which leaves the lines vulnerable to biotic and abiotic stress. Hence, it is valuable to ascertain the genetic background of PTGMS lines and to understand their relationships in order to design future breeding strategies.

Results

A collection of 118 male sterile rice lines and 13 conventional breeding lines from the major rice growing regions of China was screened for the photoperiod-sensitive (pms3) and temperature-sensitive (tms5) male sterility genes. The total gene pool was divided into four major populations: P1 possessing pms3, P2 possessing tms5, P3 possessing both pms3 and tms5, and P4 containing conventional breeding lines without any male sterility allele. A high proportion of homozygous alleles indicated high genetic purity in all populations. Population admixture, principal component and phylogenetic analyses revealed close relationships of P2 and P3 with P4. Population differentiation analysis showed that P1 had the highest differentiation coefficient. In the phylogenetic tree, lines from P1 appeared as the ancestors of the other three populations, while lines in P2 and P3 were closely related to the conventional lines. A core collection comprising the top 10% of lines, with maximum genetic diversity within and among populations, was constructed for future research and breeding efforts.

Conclusion

The low genetic diversity and close genetic relationships among lines in the P2, P3 and P4 populations suggest a selective sweep; these lines might have resulted from backcrossing with common ancestors, including the pure lines of P1. The core collection from the PTGMS panel, updated with new diverse germplasm, will best serve further two-line hybrid breeding.

Background

Rice (Oryza sativa L.) is the most important cereal and the staple food for more than 50% of the global population, and 90% of the world’s rice is grown in Asia [1]. The world population is expected to rise to 10 billion by 2050, and feeding this population will require agricultural production to expand by 70%. It was estimated that the world population would require 763 million tons of rice in 2020 and 852 million tons by 2035 [2, 3]. Harnessing heterosis is one of the major approaches to increasing rice yield and has made a great contribution to food security in China and many other countries [2, 7]. Photoperiod- or thermo-sensitive genic male sterile (PTGMS) lines have occupied millions of hectares of rice fields in China for more than a decade [11,12,13,16].

SNP array genotyping enables researchers to readily select subsets of informative SNPs for use on smaller, in-house platforms, with immediate applications in various genomic and molecular approaches including marker-assisted or genomic selection, genome-wide polymorphism analysis in high-throughput QTL studies, and association mapping. It is also used to select highly targeted sets of SNPs for high-resolution haplotype analysis and gene discovery. Several medium-density and high-density chip arrays have been developed for rice [17]. These SNP assays have been developed at different densities, for example the 50 K-SNP chip [18], C6AIR [19], the RICE6K [21], and the 700 K-SNP High Density Rice Array [22]. The SNP density required to meet these criteria in rice is approximately 6000–7000 markers, due to the significant differences in SNP distribution and frequency that characterize the deeply differentiated subpopulations of O. sativa [19]. The availability of high-density SNP chips for rice makes it possible to undertake large-scale, high-throughput germplasm characterization, enhancing the value of the genetic resources available in the world’s major germplasm repositories.

In this study, 131 two-line PTGMS rice lines were clustered on the basis of their sterility genes. The clusters were then genotyped with a 56 K whole-genome rice SNP chip, and the genetic relationships among the materials were analyzed. A core breeding collection was selected from these materials, and the selective sweep that occurred during the breeding process was investigated. The results provide important insights into the narrow genetic basis of the available PTGMS gene pool and serve as a reference for future two-line hybrid breeding and research programs in rice.

Results

Screening for photo- and temperature-sensitive nuclear genes

The collection of 131 two-line male sterile lines was screened for the photoperiod-sensitive (pms3) and temperature-sensitive (tms5) male sterility genes to identify similarities among them. The total gene pool was divided into four major populations. Population 1 (P1) contained 12 lines (9.16%) possessing pms3 and was denoted the photosensitive genic male sterile population. Population 2 (P2) contained 101 lines (77.10%) possessing tms5 and was denoted the thermo-sensitive genic male sterile population. Only 5 lines (3.82%) possessed both pms3 and tms5; they were classified as population 3 (P3) and denoted the photo/thermo-sensitive genic male sterile population. The remaining 13 lines (9.92%) were normal breeding lines without any PTGMS gene; they were classified as population 4 (P4) and denoted the conventional breeding population (Additional Table 1).
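The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and the boolean marker calls are hypothetical, and the toy input is simply arranged to reproduce the counts reported in the text.

```python
from collections import Counter

def classify(has_pms3: bool, has_tms5: bool) -> str:
    """Assign a line to P1-P4 from the presence of the two sterility genes."""
    if has_pms3 and has_tms5:
        return "P3"
    if has_pms3:
        return "P1"
    if has_tms5:
        return "P2"
    return "P4"

def population_shares(calls):
    """calls: iterable of (has_pms3, has_tms5); returns {pop: (count, percent)}."""
    counts = Counter(classify(p, t) for p, t in calls)
    total = sum(counts.values())
    return {pop: (n, round(100 * n / total, 2)) for pop, n in sorted(counts.items())}

# Hypothetical marker calls arranged to match the counts reported in the text.
calls = ([(True, False)] * 12 + [(False, True)] * 101
         + [(True, True)] * 5 + [(False, False)] * 13)
shares = population_shares(calls)
for pop, (n, pct) in shares.items():
    print(pop, n, pct)
```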

Genotyping and evaluation of genomic variants

In the present study, we genotyped the 131 lines with a 56 K SNP marker chip. The average SNP density was 1.52 SNPs per 10 kb of genomic region, ranging from 1.32 on chromosome 12 to 1.74 on chromosome 3. The average filtered SNP density was one SNP per 15.5 kb, varying from one SNP per 9.7 kb for P4 to one per 26 kb for P3 (Fig. 1). The genome coverage ranged from 91% for chromosome 1 to 100% for most of the other chromosomes. The marker density was much higher than the 12 SNPs per Mb of an intergenic SNP-based assay [19] but lower than the previously reported 0.745 per kb 50 kb density in gene single copy-based chip [23]. Hence, the marker density in this study is suitable for investigating the genetic diversity among genotypes.
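Because the densities above are quoted in different units (per 10 kb, per kb, per Mb), converting them to SNPs per Mb makes the comparison easier to follow. The helper below is a hypothetical convenience function, and the variable names are illustrative only.

```python
def per_mb(snps: float, window_kb: float) -> float:
    """Convert 'snps per window_kb' to SNPs per megabase (1 Mb = 1000 kb)."""
    return snps * 1000.0 / window_kb

chip_density = per_mb(1.52, 10)      # this study: 1.52 SNPs per 10 kb
filtered_density = per_mb(1, 15.5)   # after filtering: 1 SNP per 15.5 kb
intergenic_assay = 12.0              # cited in the text as 12 SNPs per Mb [19]
gene_chip = per_mb(0.745, 1)         # cited in the text as 0.745 SNPs per kb [23]

print(round(chip_density, 1), round(filtered_density, 1), gene_chip)
```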

Genetic purity

Genetic purity among genotypes can be evaluated from the proportions of homozygous and heterozygous alleles. In the current study, we assessed the frequency of homozygous and heterozygous alleles per locus and estimated the nucleotide diversity and Shannon’s index (I). The genomes of all four populations were highly homozygous, with 93 to 97% homozygous alleles. The lowest heterozygosity was observed in P4 (2.66%), followed by P3 (3.14%), while the highest was observed in P1 (6.80%), followed by P2 (3.70%) (Fig. 2, Additional Table 1).
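As an illustration of the two per-locus statistics mentioned above, the following sketch computes observed heterozygosity and Shannon's index from diploid genotype calls. It assumes calls are encoded as two-character strings (for example "AG") with `None` for missing data; this encoding, the function names and the toy locus are assumptions made for the example, not the format used in the study.

```python
import math
from collections import Counter

def heterozygosity(genotypes):
    """Fraction of non-missing diploid calls whose two alleles differ."""
    called = [g for g in genotypes if g is not None]
    return sum(g[0] != g[1] for g in called) / len(called)

def shannon_index(genotypes):
    """Shannon's I = -sum(p * ln p) over allele frequencies at one locus."""
    alleles = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(alleles.values())
    return -sum((n / total) * math.log(n / total) for n in alleles.values())

# Toy locus: nine called lines, one of them heterozygous, one missing call.
locus = ["AA", "AA", "AG", "GG", None, "AA", "AA", "AA", "AA", "AA"]
print(heterozygosity(locus), shannon_index(locus))
```

A fully homozygous, monomorphic locus gives zero for both statistics, which is the direction in which the highly pure populations described above are expected to lean.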

Genome wide nucleotide diversity

The genetic diversity among the 118 male sterile lines and 13 conventional breeding lines was evaluated. Nucleotide diversity (π) was low, ranging from 7.54 × 10^−8 to 4.29 × 10^−4. This indicates close relatedness among the genotypes and suggests a limited number of signature genes in the germplasm. Among the four populations, the lowest genome-wide π was observed in the P4 (male fertile) genotypes (π = 0.0000339), while the highest (π = 0.0000503) was in P1. The values for P2 (π = 0.0000348) and P3 (π = 0.0000347) were approximately the same (Fig. 2, Additional Table 1). Within populations, the range of genetic diversity was widest in P3 and narrowest in P2.
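For readers unfamiliar with π, the sketch below applies the textbook definition: the mean number of pairwise differences per site across all pairs of sequences. It assumes aligned haplotype strings of equal length and is only a toy illustration; the study's values were presumably computed from chip genotypes with dedicated software, which is not specified in this section.

```python
from itertools import combinations

def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site across all pairs of sequences."""
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment: four sequences of ten sites with two variable sites.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(nucleotide_diversity(toy))
```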

Genome-wide genetic differentiation

Although the genetic differentiation between the whole male sterile population (P1, P2, P3) and the conventional lines was not very high (weighted Fst = 0.078), P1 showed the highest levels of genetic differentiation from the other populations (mean weighted Fst > 0.2). Its differentiation was greatest from P4 (weighted Fst = 0.252), followed by P3 (weighted Fst = 0.178) and P2 (weighted Fst = 0.108) (Fig. 3, Table 1). The level of genetic differentiation between P3 and P4 was similar to that between P1 and P2. On the other hand, the genetic differentiation of P2 from P3 and P4 was not very high (weighted Fst = 0.045 and 0.099, respectively). This suggests that the pms3 and tms5 genes are a driving force in shaping the pattern of genetic variation and indicates the genetic similarity of P2 and P3. Linkage disequilibrium (LD) was estimated as r² for distance classes of < 1 kb within a 30 kb region around the pms3 and tms5 loci. The average r² fell below the threshold of 0.5 at a distance of 3.9 kb for markers around the tms5 locus, whereas it remained above the threshold for the pms3 locus (Additional Fig. 5).
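To make the idea of a genome-wide ("weighted") Fst concrete, the sketch below uses Hudson's estimator, combined across SNPs as a ratio of sums. This is a swapped-in technique chosen for brevity: the section does not state which estimator produced the reported weighted Fst values, and the function name and inputs are hypothetical.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Ratio-of-sums Hudson Fst from per-SNP allele frequencies.

    freqs1, freqs2: frequencies of the same allele in two populations.
    n1, n2: number of sampled allele copies in each population.
    """
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling variance.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den

# 12 diploid lines (24 allele copies) against 13 lines (26 copies), as for P1 and P4.
print(hudson_fst([1.0, 1.0], [0.0, 0.0], 24, 26))   # fixed differences
print(hudson_fst([0.5, 0.4], [0.5, 0.4], 24, 26))   # identical frequencies
```

Summing numerators and denominators before dividing lets informative SNPs carry more weight than a plain average of per-SNP ratios would; with identical frequencies the sampling correction makes the estimate slightly negative, which is conventionally read as zero differentiation.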

Table 1 The Fst values for pairwise comparison among populations

Phylogenetic cluster analysis of the four populations of PTGMS lines of rice

Phylogenetic analysis was performed with all selected SNP markers to reveal the ancestral relationships among the 118 male sterile lines and 13 conventional breeding lines. According to the neighbor joining (NJ) tree, all the lines could be divided into four clades at a genetic distance of 0.05 (Fig. 4). The first clade consisted of five lines (N5088S, Nongken58S, Wan2304S, 7001S, and N95076S). These lines were at the maximum genetic distance of 0.29 and formed the root of the phylogenetic tree; they could be the ancestors of the other male sterile lines. The second clade was composed of six genotypes, including five (GD-1S, H03S, S242, S240, and 1103S) belonging to P1 and one (11Fan17S) belonging to P2. Four genotypes in this clade (H03S, S242, S240, and 1103S) grouped together and were relatively distant from the other genotypes, while the two other genotypes, from P1 (GD1S) and P2 (11Fan17S), formed the root of the remaining genotypes. These genotypes might be the progenitors of the clade 1 lines and the ancestors of the remaining lines, as they were closely related to the other genotypes. The remaining genotypes from P2 and P3 were grouped with the genotypes from P4, indicating their genetic resemblance.

Principal component analysis of the four populations of PTGMS lines of rice

The results were further supported by principal component analysis (PCA). The top two principal components, PC1 and PC2, explained 19.77 and 7.62% of the total variation, respectively, and divided the germplasm into two major clusters. The next two components, PC3 and PC4, explained 6.23 and 3.82% of the total variation, respectively (Additional Fig. 1). One of the two clusters grouped the five breeding lines of P1 (clade I of the phylogenetic tree) at a wide genetic distance from the other clades, while the other cluster contained all of the genotypes from P2, P3 and P4 within a narrow genetic distance. As a cluster independent from the others, Cluster 1 harbored the highest level of genetic differentiation (Table 1, Fig. 5, Additional Fig. 1). Its unique pattern of genetic variation was also evident in the top two PCs. The remaining cluster could be further classified into three overlapping groups (Fig. 5).

Admixture cluster analysis of the four populations of PTGMS lines of rice

To infer the degree of admixture across the 131 samples, we further performed an unsupervised admixture analysis with the 56 K SNP markers, with K ranging from 2 to 4. At K = 2, a genetic divergence occurred between the P1 genotypes and their close relatives, while K = 3 and K = 4 subdivided the groups. At K = 4, the whole germplasm was divided into four groups (S1, S2, S3, S4) of 28, 5, 23 and 75 genotypes, respectively. Except for the five genetically distant genotypes from P1 grouped as S2 in the structure analysis, a potentially widespread genetic introgression from the conventional breeding lines of P4 into the other populations was observed across K = 2 to K = 4 (Fig. 6). There were 10, 10 and 13 genetically pure lines in S1, S3 and S4, respectively, and the remaining lines showed genomic introgression. These results reinforce the previous analysis with pure lines and mixed genomic lines [21, 24]. Among the populations, 5, 31, and 1 genotype from P1, P2, and P4, respectively, were observed to be pure lines. In contrast, all the conventional breeding lines in P4 had introgressed genomic components. The pure lines, specifically the genotypes in S2, are likely the ancestors of the remaining germplasm.

Genome-wide selective sweep signals, their molecular function and validation

To better detect genome-wide selection signals related to male sterility, we divided the populations into male sterile (P1, P2, P3) and male fertile (P4) groups. High Fst values (top 1%, Fst > 0.34) were used as the criterion for identifying selective sweeps. No selective sweep was detected on chromosomes 3, 5, and 8. A total of 1044 candidate genes were found within the sweeps detected on the other chromosomes. Some of these genes could be associated with sterility (Additional Table 3). Five sweeps located on chromosomes 1, 2, 4 and 6 exhibited high Fst values (0.888, 0.718, 0.652, 0.650, 0.643), indicating clear genetic differentiation between the male sterile and fertile populations. The largest genomic region, of 2 Mb and containing 376 candidate genes, was observed on chromosome 2, followed by a 1 Mb region on chromosome 4 containing 174 candidate genes (Additional Table 2). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis revealed that the candidate genes in the selective sweeps were mainly involved in the ‘Alanine, aspartate and glutamate metabolism’ and ‘ABC transporters’ pathways (Fig. 7A). Gene ontology (GO) analysis revealed 113 GO terms, of which molecular binding was the top enriched ‘Molecular function’ (Fig. 7B).
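The thresholding step described above can be sketched as follows: take the cut-off that bounds the top 1% of windowed Fst values, keep the windows at or above it, and merge adjacent outlier windows into sweep regions. The window layout, function names and toy values are hypothetical; the section does not give the window size or the merging rule actually used.

```python
def top_fraction_threshold(values, fraction=0.01):
    """Smallest value still inside the top `fraction` of the distribution."""
    ranked = sorted(values, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[keep - 1]

def merge_sweeps(windows, threshold):
    """Merge adjacent windows at or above the threshold into sweep regions.

    windows: list of (chrom, start, end, fst) sorted by chrom and start.
    Returns a list of (chrom, start, end, max_fst).
    """
    regions = []
    for chrom, start, end, fst in windows:
        if fst < threshold:
            continue
        last = regions[-1] if regions else None
        if last and last[0] == chrom and last[2] == start:
            # Extend the previous region when this window touches it.
            regions[-1] = (chrom, last[1], end, max(last[3], fst))
        else:
            regions.append((chrom, start, end, fst))
    return regions

# Toy windows; 0.34 is the top-1% cut-off quoted in the text.
windows = [("chr2", 0, 100, 0.10), ("chr2", 100, 200, 0.40),
           ("chr2", 200, 300, 0.72), ("chr2", 300, 400, 0.05),
           ("chr4", 0, 100, 0.65)]
sweeps = merge_sweeps(windows, 0.34)
print(sweeps)
```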

To further validate the genome-wide selective sweep signals, three pedigree groups, A, B, and C, were obtained from the rice breeding database according to their breeding history (Additional Fig. 2). In pedigree group A, there were 7, 15 and 1 genotypes containing pms3, tms5, and both pms3 and tms5, respectively, while the genotypes in groups B and C possessed the tms5 gene. The genome-wide diversity of all three groups was investigated, and their genetic differentiation from the conventional genotypes (P4) was studied. A total of 185, 182 and 181 selective sweep signals were observed for pedigree groups A, B and C, respectively, in comparison with the conventional lines at the Fst threshold of the top 5% (Additional Fig. 3, Additional Table 4). Among the top 1% selective sweeps, we found the same genomic regions as those identified in the top genetic differentiation hits between the male sterile (P1, P2, and P3) and conventional breeding lines (P4). The genes in the candidate regions were subjected to GO analysis. ‘Binding’ molecular functions involved in ‘metabolic’ and ‘signal transduction’ processes in the ‘Nucleus’ and ‘membrane complexes’ were the top hits (Additional Fig. 4).

Selection of core germplasm

In the first step, all the genotypes were ranked on the basis of genetic diversity and the top 30% were selected. In the second step, the cluster analyses and the evaluation of male sterility alleles were used to divide the germplasm into four groups, and the genotypes in each group were ranked on the basis of their sterility allele and the genetic diversity among them. The top 10% of genotypes selected by both procedures gave 13 genotypes for the core collection. Among them, 2 (GD1S and N5088S) and 11 (ZhunS, Biao506S, Shen08S, **ShanS, 2301S, S204, Ke8S, Long605S, 66S, Longke638S, and 99S) genotypes were selected from P1 and P2, respectively. Furthermore, one genotype (S239) from P3 and one conventional genotype (Shuhui881) were also included in the core collection.
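One possible reading of this two-step selection is sketched below: rank the whole panel and each group by a diversity score, keep the genotypes chosen by both rankings, and cap the result at 10% of the panel. The scoring, the use of the same 30% share within groups, and all names and toy values are assumptions made for illustration; the section does not specify these details.

```python
def top_share(scores, percent):
    """Names of the highest-scoring `percent` of entries (at least one)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max(1, len(ranked) * percent // 100)]

def core_collection(scores, groups, first_pass=30, core=10):
    """Keep genotypes picked both panel-wide and within their own group."""
    overall = set(top_share(scores, first_pass))
    within = set()
    for members in groups.values():
        within.update(top_share({m: scores[m] for m in members}, first_pass))
    common = {name: scores[name] for name in overall & within}
    size = max(1, len(scores) * core // 100)
    return sorted(common, key=common.get, reverse=True)[:size]

# Toy panel: 20 hypothetical lines in two groups with made-up diversity scores.
scores = {f"A{i}": 1.0 - 0.05 * i for i in range(1, 11)}
scores.update({f"B{i}": 0.45 - 0.04 * i for i in range(1, 11)})
groups = {"A": [f"A{i}" for i in range(1, 11)],
          "B": [f"B{i}" for i in range(1, 11)]}
core_lines = core_collection(scores, groups)
print(core_lines)
```

With integer percentages, a panel of 131 lines and a 10% cap gives 131 * 10 // 100 = 13 entries, which matches the core collection size stated in the text.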

Discussion

Crop breeding programs aim to harness genetic diversity for desirable phenotypes to meet human demands [25]. In pursuit of the ideal genotype, the bottleneck effect of phenotypic selection on elite varieties during rice breeding programs has dramatically narrowed their genetic diversity [26]. However, information about the genes that generated the changes in desirable phenotypes in elite rice varieties is limited. Some genes may even cause the transition of a PGMS line to a TGMS line [27]. In this study, the photoperiod-sensitive (pms3) and temperature-sensitive (tms5) male sterility genes classified the collection of 131 two-line male sterile lines into four major populations. Information about the available PTGMS genes and markers will help not only the direct selection of genotypes in hybrid breeding but also future research programs. Based on these genetic markers, other germplasm resources could also be evaluated and manipulated to enhance genetic diversity. In addition, genome-wide marker analysis of various rice populations has demonstrated that shifts in genetic population structure have occurred multiple times in history [28]. The shift of genetic diversity in the local gene pool may be managed through the use of germplasm to meet human demands in rice breeding programs [28].