Introduction

Plant height is a key factor in determining rice yield since excessive height can easily cause lodging and reduce yield. As such, the breeding and large-scale promotion of semi-dwarf rice varieties, which has been conducted since the 1950s, has increased rice yield by 20–30% and has, accordingly, been hailed as the "green revolution" in rice production. The first major breakthrough was attributed to the successful application of a semi-dwarf gene (Peng et al. 1999; Hedden 2003).In the 1970s and 1980s, three-line rice breeding aimed at promoting heterosis resulting in another leap in rice yield and progression towards ensuring China’s food security. Nowadays, semi-dwarfing materials are also used in the successful development of hybrid rice. However, in the 30 years succeeding the two major leaps in dwarf and hybrid rice breeding, the yield of rice in China has remained more or less stagnant.

Super rice breeding with ideal plant type as the model represents an important strategy for promoting future improvements in China’s rice yield. The main strategy of such breeding efforts is to combine ideal plant type with heterosis (Chen et al. 2001). Plant type improvement is the main aim of rice breeding and is highly dependent on plant height. However, the semi-dwarf varieties that are currently used in production are all associated with the recessive dwarf gene Ossd1, and the wide application of this single gene poses a potential risk of genetic diversity loss. At the same time, rice varieties that carry the semi-dwarf gene sd1 also exhibit poor drought tolerance and low photosynthetic effect, which has become a bottleneck in the development of new rice varieties (Sha et al. 2021). Therefore, the identification and analysis of plant height-related genes, to elucidate their physiological, biochemical, and molecular mechanisms, have significant implications for rice breeding and production.

Rice plant height is a quantitative trait controlled by multiple genes, that are associated with a variety of physiological traits and processes. In fact, at least 70 dwarf mutants of rice have been discovered, and the underlying mechanisms have reportedly been associated with the phytohormone signaling and the biosynthesis of gibberellic acid (GA), abscisic acid (ABA), and various brassinosteroids (Spielmeyer et al. 2002; Sakamoto et al. 2004; Hubbard et al. 2010; Sun 2010; Song et al. 2012; Shen et al. 2014; Tong et al. 2016). However, to ensure sufficient statistical power, this strategy requires the genotype and phenotype analysis of numerous offsprings, which is both time-consuming and labor-intensive. In contrast, bulked segregant analysis (BSA) only requires the genoty** of individuals with extreme phenotypes (Giovannoni et al. 1991; Michelmore et al. 1991). Initially, BSA was widely used in QTL identification and gene mining related to specific traits; such as disease resistance, color, and fertility (Zhang et al. 1994, 1996; Monna et al. 1995). However, the recent and rapid development of next-generation sequencing (NGS) technology has enabled the use of BSA, along with whole-genome sequencing (i.e., BSA-seq), to efficiently identify QTLs and trait-related genes (Abe et al. 2012; Kadambari et al. 2018). In comparison to traditional QTL map**, BSA-seq ensures improved work efficiency and sufficient statistical power and has been successfully applied to a variety of plant taxa; including arabidopsis, rice, maize, and pepper (Ramirez-Gonzalez et al. 2015; Huang et al. 2017; Zegeye et al. 2018). However, because the method yields less-than-ideal confidence interval resolution, researchers have had to combine BSA-seq with fine map** to identify potential candidate genes for specific QTLs (Wambugu et al. 2018). For instance, the QTL qRSL7 (Lei et al. 2020) was mapped from 4.17 Mb to 222 kb by BSA-seq and classical QTL map** to allow the identification of final candidate genes. In addition, the gene ZmVEN1, which is associated with maize grain texture, was detected using BSA-seq and RNA-seq (Wen et al. 2019). Thus, the integration of BSA-seq and fine map** are necessary for locating major QTLs and mining target genes.

Accordingly, the aim of the present study was to implement BSA-seq and fine map** for the identification of height-related genes in rice. An F2:3 population derived from a cross of tall and dwarf varieties (Longyang11 and Dongfu114, respectively) was subjected to BSA-seq analysis and fine-map** strategy, along with haplotype and sequence analysis, to identify potential candidate genes. Further, CRISPR/Cas9 gene editing was used to develop a knockout mutant to assess the function of the putative height-related genes.

Materials and Methods

Plant Materials and Height Evaluation

The japonica varieties ‘Dongfu 114’ (‘DF114’) and ‘Longyang 11’ (‘LY11’) were obtained from Northeast Agriculture University (Harbin, China) and used as female and male parents, respectively, to generate an F2:3 population of 638 individuals. In the spring of 2019, ‘DF114’ (n = 48), ‘LY11’ (n = 48), and F2:3 (n = 638) individuals were planted in four rows under natural conditions in paddy fields at Acheng Experimental Station (Heilongjiang Province, China), and 5 plants from the center of each plot were selected for plant height evaluation.

Construction of Segregating Pools

All flag leaves of the 638 F2:3 individuals were collected separately for total genomic DNA extraction, which was performed using the Cetyltrimethylammonium bromide (CTAB) method (Murray et al. 1980), with minor modifications, and the isolated DNA was quantified using a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). After quantification, a Qubit 2.0 fluorimeter (Life Technologies, Carlsbad, CA, USA) was used to pool equimolar amounts of genomic DNA from the 30 shortest and 30 tallest individuals (L-pool and H-pool, respectively).

Whole-Genome Re-Sequencing

Total genomic DNA was extracted from the bulked pools, and at least 3 µg of genomic DNA was used to construct paired-end libraries, with an insert size of 500 bp, using the Illumina paired-end DNA sample prep kit (San Diego, CA, USA). The resulting libraries were sequenced by Genedenovo (Guangzhou, China) using the HiSeq X10 NGS platform (Illumina). To achieve quality trimming, thereby ensuring high-confidence variant calling, raw reads were filtered by removing reads with ≥ 10% unidentified nucleotides, with > 50% bases with shared quality scores of ≤ 20, or aligned to the barcode adapter. To identify SNPs and indels, filtered reads were aligned to the ‘Nipponbare’ reference genome sequence (Matsumoto et al. 2005) using the Burrows-Wheeler Aligner (v.0.7.16a-r1181; Kumar et al. 2019), using ‘mem -M’ setting, where -M is an option used to mark shorter split-alignment hits as secondary alignments. Variant calling was performed using the GATK Unified Genotyper (v.3.5; https://gatk.broadinstitute.org), and both SNPs and indels were filtered using the GATK Variant Filtration function with proper standards (-Window 4, -filter "QD < 4.0 || FS > 60.0 || MQ < 40.0 ", -G_filter "GQ < 20"). All mutations for genes, functions, and genomic regions were annotated using ANNOVAR (Wang et al. 2010). Association analysis was performed using, ∆(SNP-Index) (Abe et al. 2012; Takagi et al. 2013), G-value (Magwene et al. 2011; Mansfeld et al. 2018), ED (Hill et al. 2013), and two-tailed Fisher’s exact test (Fisher et al. 1922) values; the overlap** interval of the four methods was considered as the final QTL interval.

Development of SNP Markers and Narrowing Candidate Interval

KASP markers were designed using Premier 5.0 (Additional file 1: Table S4), and markers that were polymorphic between parents were used to validate BSA-seq results and construct a linkage map, by genoty** the 638 progeny individuals, and to narrow the potential candidates using the Inclusive Composite Interval Map** (ICIM) module of QTL IciMap** (v.4.2; http://www.isbreeding.net). The Logarithm of the odds (LOD) score threshold for confirming a significant QTL was established using a permutation test with 1000 replicates and a significant level of P < 0.01.

Knockout Plant Construction

CRISPR/Cas9 gene-editing vector construction was conducted as described by Li et al. (2017). Two target sequences, including Point accepted mutation (PAM) (GGCAAGGGAGGGAAGGGTCTCGG / CGTCTACGCCCTCAAGCGCCAGG) were selected within the target genes, and the targeting specificity was confirmed using a BLAST search against the rice genome (Hsu et al. 2013). Genomic DNA was extracted from these knockout lines; and after PCR amplification, the designed target site amplicon (300–500 bp) was sequenced directly and identified using the Degenerate Sequence Decoding method (Ma et al. Full size image

Whole-Genome Resequencing and BSA-Seq Analysis

The mean coverage depth for the parents and the two pools was 50 × , and comparison of the sequences to the ‘Nipponbare’ reference genome resulted in the identification of 801,247 SNPs and 133,510 indels, which were reduced to 277,729 SNPs and 45,142 indels by trimming and filtering (Additional file 1: Table S2). A total of 322,871 high-quality SNPs/indels that were homozygous in each parent and polymorphic between the parents were then selected for BSA-seq analysis, using ∆(SNP-Index) values (Fig. 2A), Euclidean distance (ED) values (Fig. 2B), and G-values (Fig. 2C), as well as Fisher’s exact test P-values (Fig. 2D), were used to identify candidate plant height-related QTL regions (Table 1). Two significant (P < 0.01) peaks in the ∆(SNP-Index) distribution spanned 12.14-Mb (4.78–16.92 Mb) and 5.98-Mb (11.82–17.80 Mb) intervals (qPH7 and qPH9) on chromosomes 7 and 9, respectively, and significant peaks in G-value distribution completely covered the results of ∆(SNP-Index) analysis, whereas those in the ED and Fisher’s exact test P-value distributions completely covered the interval on Chromosome 9 alone. Furthermore, only the region identified in the Fisher’s exact test P-value distribution (2.02 Mb of qPH9) was included in the intersection of genome regions identified by all four methods. It is also worth noting that, in the Δ(SNP-Index) and ED value distributions, the peak values observed for qPH9 were higher than those observed for qPH7 (Table 1), which indicated a greater difference in the allele ratio between the two mixed pools. Therefore, qPH9 was considered to be a more significant target for mining candidate plant height genes. There is a known panicle gene DEP1 in the qPH9 interval, and the DEP1 genome sequence analysis shows that there is no difference between ‘DN114’ and ‘LY11’ (Additional file 2: Fig. S1).

Fig. 2
figure 2

Quantitative trait locus (QTL) analysis of rice plant height at maturity using 4 QTL-seq methods. A Manhattan plot showing the distribution of Δ(SNP-index) on chromosomes. B Manhattan plot showing the distribution of Euclidean distance (ED2) on chromosomes. C Manhattan plot showing the distribution of G-value on chromosomes. D Manhattan plot showing the distribution of log-transformed Fisher’s exact test P-value distribution, –log10(p) on chromosomes. Blue and red lines represent 95 and 99% confidence intervals, respectively, and black lines represent mean values of the 4 algorithms, which were drawn using sliding window analysis. Numbers on the horizontal coordinates represent chromosome numbers

Table 1 QTLs conferring plant height at maturity by 4 methods for identification using BSA-seq

Fine Map** of qPH9

The qPH9 coding region harbored 26 SNPs and 14 indels (Additional file 1: Table S3). The analysis of 13 highly credible SNPs (Additional file 1: Table S4) in the 638 progenies, using a LOD score of 3.0 as a threshold for consecutive occurrence, yielded a qPH9 linkage map (Fig. 3A). The qPH9 locus was simultaneously linked to plant height and anchored to the 126 kb interval between 15,803,211 bp and 15,929,211 bp (Fig. 3B, Additional file 1: Table S5), which explained 20.50% of the phenotypic variation in plant height, and the peak LOD score was 32.95 (Table 2). The positive qPH9 allele was contributed by the ‘DF114’parent.

Fig. 3
figure 3

Fine map** of the quantitative trait locus (qPH9) associated with rice plant height at maturity. A Detection of qPH9 by ICIM module of QTL IciMap** 4.2. The orange bar with vertical black lines and labels represents the linkage map and Kompetitive allele-specific PCR (KASP) markers. B Putative plant height genes identified at qPH9 using annotation information from the ‘Nipponbare’ reference genome (http://plants.ensembl.org/index.html/)

Table 2 Identification of qPH9 for plant height at maturity by linkage analysis

Putative Candidate Genes for qPH9

Annotation of the 126 kb interval, using annotation information of the ‘Nipponbare’ reference genome (http://plants.ensembl.org/index.html/), resulted in the identification of 15 candidate genes (Fig. 3B). These genes include 4 hypothetical proteins, 2 non-coding proteins, 2 C2H2 transcription factor family proteins, 2 Glycosyl transferase proteins, 1 Cold-regulated protein, 1 AAA-type ATPase family protein, 1 Histone H4 protein, 1 Pectin methylesterase and 1 Senescence-associated protein (Additional file 1: Table S6). The resequencing data (Additional file 1: Table S3) indicated that the difference between ‘DN114’ and ‘LY11’ were in the 15,803,211 bp and 15,929,211 position of Chromosome 9, which was within the range of 126 kb. Among them, SNP (9:15,916,244) is a non-synonymous mutation in the CDs region of Os09g0433600. In addition, the genome sequence of Os09g0433600 also showed that, there were only SNP differences at 9:15,916,244 between Os09g0433600 of ‘DN114’ and ‘LY11’ (Additional file 3: Fig. S2). Therefore, we believe that Os09g0433600 is a candidate gene for qPH9, and named OsPH9.

Significant Association SNP of OsPH9 with Plant Height by RFGB Database

With RFGB Database analysis (Fig. 4; Additional file 1: Table S7), 8 SNPs were obtained in the promoter region and CDs region of OsPH9, which were 15,915,682 (C > A/M), 15,915,683 (A > G/R), 15,915,701 (T > G/K), 15,915,717C (C > T) and 15,915,782 (T > C) in the promoter region, and 15,916,129 (C > T), 15,916,186 (C > A/M) and 15,916,244 (T > C/-) in the CDs region, respectively. A total of 11 haplotypes were obtained based on the different combinations of 8 SNPs. As shown in Fig. 4B, the plant height difference between haplotypes is significant, which indicates that the sequence difference of OsPH9 is closely related to the change in plant height. Among them, 15,916,244 (T > C) is consistent with resequencing results (Additional file 1: Table S3), indicating that 15,916,244 (T > C) may be the main reason for the difference in plant height between ‘DF114’ and ‘LY11’. We then referred to the data from the 3010 Rice Genomes Project and found that 15,916,244C mainly exists in the genotype of japonica, its allele frequency was 2.82% in japonica; however, it did not occur in indica. It could thus be inferred that 15,916,244C is a rare natural variation (Additional file 1: Table S8). Therefore, it can be used as a special functional variation in ‘DF114’ to improve the plant height of rice varieties.

Fig. 4
figure 4

Haplotype and sequence analysis. A Haplotype analysis of Os09g0433600, B Plant height differences of different haplotypes

Knockout of OsPH9 using CRISPR/Cas9 System

Functional analysis of the H4 histone coding gene Os09g0433600 (OsPH9), was conducted using the CRISPR/Cas9-generated osph9-mutant lines (A467-3 and A467-12) in which the motifs GGG and GC were deleted at the 26 and 247 bp of the CDs region, respectively (Fig. 5A). This revealed that the loss-of-function of OsZOS9-12 resulted in reduced height and node length (Fig. 5B,C), with the mature plant height of lines A467-3 and A467-12 being reduced by 7.96 and 8.99%, respectively, when compared to the wild-type plants (Fig. 5D).

Fig. 5
figure 5

Functional analysis of the H4 histone coding gene Os09g0433600 (OsPH9) in rice. A DNA sequences of Os09g0433600 in ‘Dongnong 430’ (WT) and knockout lines (A467-3 and A467-12). B Mature plant height phenotypes of WT and knockout mutant lines. C Stem node lengths of WT and knockout mutant lines. D Mature plant heights of WT and knockout mutant lines (**P < 0.01; Student’s t-test)