Introduction

The area of soybean (Glycine max) cultivation in the world has expanded by more than 900% since the 1960s in North and South America, owing to the crop's important roles in animal feed and human nutrition (http://www.fao.org/faostat/en/#home). However, soybean yield per unit area has not changed significantly compared with that of rice, wheat, and maize, suggesting the lack of a true Green Revolution in soybean breeding (Liu, et al. 2020a). Soybean yield is determined by both the total number of nodes and the number of pods per node; therefore, yield cannot be increased in soybean simply by adopting shorter varieties. Among the several options for increasing soybean yield, hybrid breeding holds the greatest potential.

Soybean is an autogamous legume species, and a male-sterile line is a prerequisite for commercial hybrid breeding and large-scale seed production. An earlier heterosis test demonstrated that significant yield increases could be achieved in soybean; the heterozygous F1 plants of 248 combinations yielded 20% more than their parental lines among the 1123 combinations that were tested (Sun, et al. 1999; Palmer, et al. 2001). However, hybrid breeding in soybean has received limited attention in contrast to maize and rice. Male-sterile female lines with cytoplasmic male sterility (CMS) or genic-controlled photoperiod/thermo-sensitive male sterility (P/TGMS) have been used extensively for many years in maize and rice (Chen and Liu, 2014; Wan, et al. 2019). Hybrid rice in China covers 50–60% of the total rice cultivation area, which has contributed greatly to rice yield and helped ensure food security (Kim and Zhang, 2018; Liao, et al. 2021).

Male-sterile lines are available in soybean, and the first soybean CMS line was reported in US Patent No. 4545146 (Davis 1985). Since then, no further information on this CMS line has been reported. Considerable research on soybean CMS lines has been conducted in China since the early 1980s (Sun, et al. 1994a; Palmer, et al. 2001). To date, after more than 40 years of effort by several generations of researchers, more than 40 hybrid soybean varieties have been bred and approved in China, and more than 30 invention patents and technical standards have been authorized for new technologies and methods in soybean hybrid breeding (Sun, et al. 2021). However, the CMS genes and the underlying molecular mechanisms are still unknown in soybean, which has restricted the development of commercial varieties.

With the explosion in genomic resources and the rapid development of molecular biology and technology, biotechnology-based male-sterility (BMS) systems for hybrid breeding have been established in maize, rice, and other crops and vegetables (Chang, et al. 2016; Wu, et al. 2016; Singh, et al. 2019). BMS systems use nuclear male sterility to propagate pure nuclear male-sterile seeds on a large scale, which not only removes the threat that climate change poses to pure hybrid seed production but also unlocks the potential for breeding superior hybrids by expanding the parental germplasm pool. In soybean, the genes underlying the nuclear male-sterile mutants ms4, ms1, ms6, and ms3 have been cloned in recent years (Thu, et al. 2019; Fang, et al. 2021; Jiang, et al. 2021; Nadeem, et al. 2021; Yu, et al. 2021; Hou, et al. 2022). To speed up the large-scale commercial cultivation of hybrid soybean, it is time to consider where to invest: should we continue to rely on the three-line hybrid system and search for ideal maintainer and restorer lines, or can we rely on BMS systems to commercialize hybrid soybean? In this review, we cover recent advances in cytoplasmic-nuclear and nuclear male sterility systems in soybean to assess whether these technological breakthroughs can lead to successful hybrid soybean production.

Male sterility in plants

Plant male sterility (MS) refers to the phenomenon in which the stamen develops abnormally and loses the ability to produce functional male gametes for fertilization. According to phenotypic characteristics, Kaul (1988) divided MS into three categories: structural, sporogenous, and functional. Structural MS indicates that the stamen is either completely absent or abnormally formed, which results in the absence of pollen. Sporogenous MS indicates that the stamen is essentially morphologically normal but fails to produce functional microspores or pollen because of failure in early microsporogenesis or late microgametogenesis. Functional MS indicates that viable pollen is produced but either cannot be released from the anther because dehiscence fails or is unable to germinate on the stigma and initiate fertilization.

Under another scheme, based on the mode of inheritance, two types of MS are distinguished: cytoplasmic male sterility (CMS) and nuclear or genic male sterility (GMS). CMS is co-controlled by nuclear and cytoplasmic genes, while GMS is controlled by nuclear genes alone. CMS is widespread in higher plants, and more than 300 species with CMS have been reported to date (Liu, et al. 2001). CMS results from incompatibility between nuclear and mitochondrial gene products, and it can be generated in several ways, including wide/inter-specific hybridization, protoplast fusion, induced mutation, and genetic engineering (Bohra, et al. 2016). GMS arises from changes in the structure and function of nuclear genes, most of which are caused by natural variation but can also be induced by physical and chemical mutagenesis. In most cases, the sterility of a GMS line is controlled by a recessive gene, and rarely by a dominant gene.

Cytoplasmic male sterility (CMS) system

The CMS/Rf (restorer-of-fertility) system, also known as the three-line hybrid system, comprises a cytoplasmic male-sterile line, a maintainer line, and a restorer line. The sterile line contains a cytoplasmic male-sterility gene but lacks a nuclear restorer gene (Schnable and Wise, 1998); it produces sterile pollen and cannot produce progeny by self-pollination. The maintainer line also lacks the nuclear restorer gene but carries a fertile cytoplasm (Chen and Liu, 2014). The restorer line, by contrast, carries a functional nuclear restorer gene, with or without a fertile cytoplasm (Chen and Liu, 2014). The pollen of both the maintainer and the restorer lines is fertile, so both can be propagated by self-pollination. When the sterile line is used as the female parent, it can receive pollen from either the maintainer or the restorer line and produce hybrid progeny. The maintainer line is crossed with the male-sterile line to reproduce the male-sterile line, while the restorer line is crossed with the male-sterile line to produce hybrid progeny that show heterosis and thus increased yield.
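The logic of the three-line system can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions that are not taken from the cited studies: a single nuclear restorer locus with a dominant allele (written "Rf" here), sporophytic restoration, and strictly maternal inheritance of the cytoplasm. The names Line, is_male_fertile, and cross are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    cytoplasm: str   # "S" = sterility-inducing cytoplasm, "N" = normal (fertile) cytoplasm
    nuclear: tuple   # two alleles at one assumed restorer locus, e.g. ("rf", "rf")


def is_male_fertile(line):
    # Assumption: pollen is sterile only when S cytoplasm is combined
    # with no dominant Rf allele in the nucleus.
    return line.cytoplasm == "N" or "Rf" in line.nuclear


def cross(female, male):
    """Return the set of possible progeny genotypes of female x male."""
    if not is_male_fertile(male):
        raise ValueError("the pollen parent must be male fertile")
    progeny = set()
    for maternal_allele in female.nuclear:
        for paternal_allele in male.nuclear:
            alleles = tuple(sorted((maternal_allele, paternal_allele)))
            # The cytoplasm is inherited from the female parent only.
            progeny.add(Line(female.cytoplasm, alleles))
    return progeny


sterile = Line("S", ("rf", "rf"))      # A line
maintainer = Line("N", ("rf", "rf"))   # B line
restorer = Line("N", ("Rf", "Rf"))     # R line

# A x B reproduces the sterile line; A x R gives male-fertile F1 hybrids.
print(cross(sterile, maintainer))
print(cross(sterile, restorer))
```

In this sketch, crossing the sterile line with the maintainer returns only the sterile genotype, while crossing it with the restorer returns only heterozygous progeny that carry the sterile cytoplasm but are male fertile, which mirrors the roles of the three lines described above.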

The CMS/Rf system has been exploited for hybrid seed production in many crops, such as maize, rice, wheat, rape, soybean, sorghum, carrot, sugar beet, sunflower, cotton, pepper, and petunia (Garcia, et al. 2019). Although this system has been successfully applied to soybean, the yield increase is still far below that achieved in rice and maize. One of the main reasons is the limited number of identified CMS lines, which heavily restricts the use of the three-line system in soybean hybrid seed production. To address this issue, the cytoplasmic genes that produce MS phenotypes, along with their corresponding nuclear-encoded restorer-of-fertility genes, urgently need to be identified.

Genic male sterility (GMS) system

GMS is controlled by nuclear genes, without the influence of the cytoplasmic genome. GMS lines are either insensitive or sensitive to environmental conditions and are called genetically stable GMS and environment-sensitive genic male sterility (EGMS), respectively. In EGMS, male fertility responds to environmental conditions, including photoperiod (PGMS), temperature (TGMS), photoperiod and temperature (PTGMS), and humidity (HGMS) (Chen and Liu, 2014; Xue, et al. 2018; Abbas, et al. 2021). EGMS is regarded as an efficient genetic tool for developing two-line hybrids, since a maintainer line is not needed and the male-sterile line can be propagated by self-pollination under specific conditions (Garcia, et al. 2019). In this system, almost every conventional inbred line is able to restore the fertility of the male-sterile line, and no negative effects related to sterility-inducing cytoplasm have been observed. Furthermore, genes of this system can be easily transferred to other genetic backgrounds (Yu, et al. 2016) developed new “transgene clean” commercial TGMS lines in rice by knocking out TMS5 via CRISPR/Cas9. Subsequently, Li et al. (2017) produced TGMS maize by targeted mutation of the maize homolog of rice TMS5 (called ZmTMS5) using the CRISPR/Cas9 editing system. In addition, two rice reverse PGMS lines in japonica cultivars 9522 and JY5B were generated by editing the Carbon Starved Anther (CSA) gene using CRISPR/Cas9 (Li, et al. 2016). Furthermore, Qi et al. (2012; Wang, et al. 2016; Dai, et al. 2022). To narrow down the candidate region of the Rf1 gene for CMS-RN, Guo et al. (2022) constructed an F2 population by crossing JLCMS204A with JLR230 (restorer line), and the gene was located between the markers dCAPS-1 and BARCSOYSSR_16_1076. A recent study identified Glyma.16G161900 as the candidate gene of Rf1 (Yang, et al. 2023). In addition, Glyma.09G171200, encoding a pentatricopeptide repeat (PPR) protein, was confirmed as the candidate gene of another Rf gene, Rf3, for CMS-RN (Sun, et al. 2022). The Rf gene of the CMS-ZD type was located between the markers BARCSOYSSR_16_1064 and BARCSOYSSR_16_1082 on Chr. 16 (Dong, et al. 2012). Another Rf gene of CMS-ZD, Rf-m, was located on Chr. 16 between GmSSR1602 and GmSSR1610 (Wang, et al. 2016). Furthermore, another PPR gene (GmPPR576, Glyma.16G161900) was identified as the candidate Rf gene of the CMS-N8855 type (Wang, et al. 2021), which is consistent with the candidate gene of Rf1 (Yang, et al. 2023). Four Rf genes for CMS-RN, CMS-ZD, and CMS-N8855 are located close together on Chr. 16 (Dong, et al. 2012; Wang, et al. 2016; Wang, et al. 2021; Guo, et al. 2022); whether they correspond to the same gene needs to be further verified.

In addition, because systematic cytological observation is lacking and the cytological phenotype is inconsistent even within the same CMS type (for example, the CMS-N8855 line), whether the different CMS types are truly distinct from each other should be confirmed (Ding, et al. 2001; Fan 2003). Furthermore, the same maintainer and restorer lines can, unusually, maintain and restore different CMS types, viz. YA (CMS-RN) and ZA (CMS-ZD) (Zhao, et al. 1998). Given these contradictions, we cannot rule out the possibility that the six classified CMS types are not completely distinct from each other.

GMS system in soybean

The first GMS line in soybean was reported in 1928: the mutant st1 was both male and female sterile because of abnormal chromosome association, and the trait was controlled by a single recessive gene (Owen 1928). To date, approximately 30 GMS lines have been identified in soybean (Table 3). According to their phenotypic characteristics, fs1fs2 (Johns and Palmer, 1982) and ft (transformed flower) (Singh and Jha, 1978) belong to structural MS and the others belong to sporogenous MS; no functional MS has been reported in soybean.

Table 3 The main GMS loci in soybean

Two PGMS lines, ms3 (Chaudhari and Davis, 1977) and 88-428-BY (Wei 1991), and three TGMS lines, ms8 (Palmer 2000), ms9 (Palmer 2000), and msp (Stelly and Palmer, 1980), have been reported. In addition, the st1-st8, NJS-1H, D8804-7, and fs1fs2 mutants are both male and female sterile (Owen 1928; Hadley and Starnes, 1964; Palmer 1974; Johns and Palmer, 1982; Palmer and Kaul, 1983; Skorupska and Palmer, 1990; Zhao, et al. 1995; Ilarsian, et al. 1997; Palmer and Horner, 2000; Kato and Palmer, 2003; Li, et al. 2010; Speth, et al. 2015). The ms1 line was the first GMS line shown to be male sterile and female fertile in soybean (Brim and Young, 1971). In addition, ms2, ms4-ms7, ms12, NJ89-1, msMOS, msNJ, N7241S, Wh921, mst-M, and ft also belong to the male-sterile and female-fertile category (Singh and Jha, 1978; Palmer 1979; Buss 1983; Graybosch, et al. 1984; Graybosch and Palmer, 1985; Skorupska and Palmer, 1989; Ma, et al. 1993; **, et al. 1997; Zhang, et al. 1999b; Palmer 2000; Zhao, et al. 2005; Zhang 2019; Zhao, et al. 2019).

Although five PGMS and TGMS lines have been identified, only one causal gene, MS3 (Glyma.02G107600), which encodes a plant homeodomain (PHD) protein, has been identified so far (Hou, et al. 2022). The fertility of the ms3 mutant can be restored under long-day conditions; thus, the mutant could be used to create a new, more stable photoperiod-sensitive genic male sterility line for two-line hybrid seed production in soybean. With the rapid development of BMS systems in rice and maize, more and more attempts have been made in soybean. The 13 GMS lines (ms1, ms2, ms4-ms7, ms12, NJ89-1, msMOS, msNJ, N7241S, Wh921, and mst-M) that display male-sterile and female-fertile phenotypes are suitable for exploiting this new technology in soybean. To make this design a reality, a large amount of work has been done to identify the candidate sterility genes for these GMS mutants. MS4 (Glyma.02G243200), the first GMS gene discovered in soybean, was identified by fine mapping and encodes a MALE MEIOCYTE DEATH 1 (MMD1) protein (Thu, et al. 2019). The function of MS4 in regulating male fertility was confirmed by heterologous expression in the Arabidopsis mmd1 mutant (Thu, et al. 2019). Subsequently, the function of MS12 (Glyma.10G117000) was confirmed by QTL mapping and functional complementation of the Arabidopsis cdc20.2 (cell division cycle 20.2) mutant with the soybean gene (Zhang 2019). In 2021, both MS1 (Glyma.13G114200) and MS6 (Glyma.13G066600) were identified by fine mapping (Fang, et al. 2021; Jiang, et al. 2021; Nadeem, et al. 2021; Yu, et al. 2021). MS6 encodes a Tapetal Development and Function 1 (TDF1) protein, an R2R3 MYB transcription factor, and is predominantly expressed in the anther, where it regulates pollen grain formation (Yu, et al. 2021). MS1, which encodes a NPK1-ACTIVATING KINESIN 2 (NACK2) protein, is essential for cell plate formation during cytokinesis through direct control of phragmoplast expansion (Fang, et al. 2021). Identification and characterization of GMS genes will provide more options for building BMS systems for hybrid soybean production.

Challenges and prospects in the commercialization of hybrid soybean

Although more than 40 hybrid soybean varieties have been generated with the three-line hybrid system (cytoplasmic male sterility), the unstable sterility of the male-sterile lines and the high cost of hybrid seed production have constrained the large-scale application of heterosis in soybean, so the commercial cultivation of hybrid soybean still has a long way to go. We believe that three components are key to making hybrid soybean a commercial success:

Identify male sterile lines with a high out-crossing rate

The out-crossing rate is the key determinant of hybrid seed production. So far, seed production has not been efficient or cost-effective for hybrid soybean. The main reason is that mutations causing male sterility very often have pleiotropic effects that impair female function, so the male-sterile lines have low seed set. The identification of the ms1 locus revealed that the gene is highly expressed in the style and ovary and may also function in megagametogenesis or embryo development in soybean (Fang, et al. 2021). Several studies have focused on the outcrossing rate of male-sterile plants; the most promising record is for the ms2 mutant, in which the outcrossing rate on male-sterile plants reached 74% relative to self-pollinated plants (Carter, et al. 1986; Perez, et al. 2009). A feasible solution is to speed up the cloning of the causal genes of male-sterile mutants with good recorded seed set and, at the same time, to generate new male-sterile lines by genome editing so that the mutation affects only male fertility, without any effect on female productivity or other growth habits.

Besides finding ideal male-sterile lines from the genetic perspective, structural changes in the flower and reproductive organs, for example a stigma protruding beyond the anthers, more pollen grains, and nectaries that produce more fluids and/or volatiles, could increase the opportunity for cross-pollination (Palmer, et al. 2001). Soybean pollen is heavy and sticky, so insect-mediated pollination is still indispensable even when the soybean flower is open. Improving the techniques for hybrid seed production is equally important for the commercialization of hybrid soybean, including the management of insect pollinators for cross-pollination and the provision of a suitable environment for both pollinators and soybeans (Palmer, et al. 2001; Garibaldi, et al. 2021).

Incorporate genomic selection to precisely guide the choice of hybrid combinations

Breeding 4.0 has been considered the next revolution in maize breeding (Wallace, et al. 2018). Even though soybean breeding programs are still at the Breeding 2.0 to 3.0 stages, in which molecular markers and genomic data complement phenotypic data, the high-quality graph-based soybean pan-genome and the low cost of genome sequencing will turn promise into practice (Liu, et al. 2020b). The genotypes of soybean germplasm lines will be collected using high-throughput genotyping approaches such as next-generation sequencing (NGS) and SNP array platforms. Knowledge of the genetic variation among soybean germplasm of different origins/sources will make the selection of superior hybrids cost-effective.
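As a rough illustration of how genotype data could help prioritize hybrid combinations, the sketch below ranks pairs of lines by a simple marker-based genetic distance. It rests on assumptions that are not from the cited work: genotypes are coded as alternate-allele dosage (0, 1, or 2) per SNP, missing calls are None, and a larger distance is treated only as a crude proxy for parental divergence. A real genomic selection pipeline would instead train a prediction model on phenotyped hybrids. The function names and the three example lines are hypothetical.

```python
from itertools import combinations


def genetic_distance(g1, g2):
    """Mean absolute difference in allele dosage, scaled to the range 0..1.

    Markers with a missing call (None) in either line are skipped.
    """
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        raise ValueError("no markers genotyped in both lines")
    return sum(abs(a - b) for a, b in shared) / (2 * len(shared))


def rank_parent_pairs(genotypes):
    """Return (distance, line_x, line_y) tuples, most divergent pair first."""
    scored = [
        (genetic_distance(genotypes[x], genotypes[y]), x, y)
        for x, y in combinations(sorted(genotypes), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical dosage calls at four SNPs for three made-up lines.
example = {
    "line_A": [0, 0, 2, 2],
    "line_B": [0, 0, 2, 0],
    "line_C": [2, 2, 0, 0],
}

for distance, x, y in rank_parent_pairs(example):
    print(f"{x} x {y}: distance {distance:.2f}")
```

With these made-up calls, line_A and line_C form the most divergent pair and line_A and line_B the least divergent; in practice such a ranking would only narrow the set of crosses worth testing in the field.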

Gain a good understanding of the molecular mechanisms of anther development in soybean

Little is known about the biological processes and genes that regulate anther and pollen development in soybean. Like most (70%) angiosperms, soybean produces bicellular pollen. By contrast, rice and Arabidopsis both produce tricellular pollen. The biological significance of the evolution of these two types of pollen grains is still unclear (Williams, et al. 2014). Bicellular pollen undergoes mitotic division to form two sperm cells after germination, whereas tricellular pollen forms a male germ unit (MGU) prior to anthesis and develops rapidly, which may make tricellular pollen favored in angiosperms that demand rapid reproduction (Hackenberg and Twell, 2019). Therefore, the knowledge gained from extensive studies of anther and pollen formation in Arabidopsis and rice should not simply be transferred to soybean. Using comparative transcriptome analysis to uncover anther-specific genes, genetic networks, and hub genes in soybean anther development will provide important insights into the molecular events underlying soybean reproductive development, as well as valuable resources for the plant reproductive biology community in the areas of pollen evolution, pollination/fertilization, and hybrid breeding.

In summary, ongoing and future research should focus on enhancing the efficiency of hybrid seed production, and long-term investment and commitment will make the commercialization of hybrid soybean a reality.