Background

Rice, considered the representative species of the Oryza genus, is a cereal crop that originated in China and India. It has been cultivated for over 7,000 years in the Yangtze River basin of China and has played a fundamental role in the development of human civilization [10]. Few studies had explored BLS resistance genes, and the molecular mechanisms underlying rice resistance to Xoc and the pathogenicity of this bacterium were not well understood. Analysis of a BC2 generation derived from the varieties H359 and Jiannong 8 revealed that resistance to BLS was a quantitative trait controlled by multiple genes [11]. Another study likewise attributed the genetic mode of BLS resistance to quantitative traits controlled by multiple genes [12]. Using composite interval mapping, five QTLs with significant contributions were identified in the resistant population [13]. Using the BLS-resistant near-isogenic line H359R, the genomic composition of resistance QTLs was analyzed by genotype display analysis. The results showed that the BLS-resistant parent Acc8558 contained three resistance QTL regions, while the susceptible parent H359 contained only one resistance QTL region [14]. One QTL, qBlsr5a, was subsequently mapped to a 30-kb interval using a sub-chromosome segment substitution line, suggesting that the most likely candidate gene was LOC_Os05g01710 [15]. A mapping population was constructed by crossing the resistant wild rice (Oryza rufipogon Griff.) source 'DY19' with the indica rice variety 9311. The results showed that resistance in 'DY19' was controlled by a pair of new major recessive genes, bls2, located between SL03 and SL04 on chromosome 2, within a range of about 4 cM [16]. A segregating mapping population was constructed using the highly resistant international rice variety BJ1 and the highly susceptible local high-quality rice variety Youzhan 8 as parents. A recessive BLS-resistance gene was identified in BJ1; this gene mapped to chromosome 10 at about 48.8 cM and was closely linked to RM158 [17].
However, the precise roles of these previously identified genes in disease resistance and susceptibility had not been well characterized. Therefore, there was a need for the identification of additional QTLs/genes that confer resistance to BLS in rice to more fully characterize disease mechanisms. Additionally, molecular marker-assisted selection (MAS) required an explicit understanding of the genetic architecture of agronomic traits [18], so more complete identification of QTLs was required for breeding applications.

In recent years, advances in genomic and transcriptomic technologies had provided powerful tools to investigate the molecular basis of plant-pathogen interactions. Microarray experiments, RNA sequencing (RNA-seq), and other high-throughput methods had enabled researchers to analyze the global gene expression patterns of both the host plant and the pathogen during infection, revealing key genes and pathways involved in the disease process. A microarray experiment was conducted to analyze changes in genome-wide gene expression in response to Xoc at the early stage of infection in both a rice transgenic line and its wild type [19]. One of the differentially expressed pathogenesis-related genes (DEPGs), NRRB, was found to play a role in rice-Xoc interactions [20]. A novel ankyrin-like protein, AnkB, was identified in Xoc, a pathogen that can invade host leaves via stomata and wounds. The type three secretion system (T3SS) of Xoc was found to be pivotal to its pathogenic lifestyle [21]. The effector protein AvrRxo1, an ATP-dependent protease, can enhance the virulence of Xoc and inhibit stomatal immunity in rice by targeting and degrading rice OsPDX1, which is involved in pyridoxal phosphate synthesis, thereby reducing vitamin B6 levels in rice [22]. A rice multi-parent advanced generation inter-cross (MAGIC) population was used to map QTLs conferring resistance to BLS and another major rice disease, bacterial blight (BB) [23]. A genome-wide association study (GWAS) was conducted on a collection of 236 diverse rice accessions, mainly indica varieties, allowing identification of 12 quantitative trait loci (QTLs) on chromosomes 1, 2, 3, 4, 5, 8, 9 and 11 that conferred resistance to five representative isolates of Thai Xoc [24]. GWAS mapping was also conducted to study BLS resistance in rice, identifying strongly resistant germplasm resources and significant SNPs that can potentially be used for breeding BLS-resistant rice [25].
Despite these advances, additional work was required to identify new QTLs/genes for BLS resistance in rice and confirm the elite alleles for their utilization in modern molecular breeding.

In this study, we conducted a GWAS to identify resistance to bacterial leaf streak (RBLS) using 747 cultivated rice accessions from the 3,000 Rice Genome (3 K-RG) project [2, 26]. We performed integrated gene annotation and genetic variation, homology, haplotype, and transcriptome analysis to identify candidate functional genes and possible causal polymorphisms for BLS in rice. Our results provided insights into the genetic architecture of BLS, and markers derived from the newly identified genes will be useful in improving resistance to BLS for modern molecular rice breeding.

Results

Population structure and phenotypic variation in RBLS traits among 747 cultivated rice accessions

The geographic origins of the 747 accessions were plotted on a world map using their latitude and longitude (Fig. 1a, Additional file 2: Table S1). The population structure and phenotypic variation in RBLS among the 747 cultivated rice accessions were then analyzed (Fig. 1b-d, Additional file 2: Table S1). We conducted principal component analysis (PCA) using 2.86 million SNPs to assess the population structure of the 747 accessions (Fig. 1d). Our analysis revealed a distinct and profound subpopulation structure within this germplasm collection. The PCA indicated that the accessions were classified into two distinct subspecies groups: Oryza sativa ssp. indica (comprising 458 accessions) and japonica (comprising 289 accessions). The first two principal components (PCs) accounted for almost half of the genetic variation, with PC1 and PC2 accounting for 38% and 7% of the total genetic variation, respectively. Furthermore, the neighbor-joining tree also demonstrated a clear subpopulation structure within the 747 accessions (Fig. 1c).

Fig. 1

Population genomic analyses and phenotype statistics for RBLS in rice. a. The geographical distribution of 747 cultivated rice accessions using the R package “leaflet”, and the map was based on “Esri.WorldStreetMap” provided by OpenStreetMap. Dark red and light blue dots on the world map represented japonica and indica, respectively. b. Distribution of resistance to bacterial leaf streak (RBLS) in the full population. c. Phylogenetic tree of all accessions inferred from whole-genome SNPs. Major clades were indicated, and dark red and light blue lines represented japonica and indica, respectively. d. Principal component analysis (PCA) for different subpopulations

We utilized ADMIXTURE software [27] to investigate the population structure and genotype structure of the 747 accessions. The analysis was based on the maximum likelihood estimation model, with cross-validation across the number of subpopulations (K = 1–10) to determine the optimal number of ancestral components. The structure analysis showed that the cross-validation (CV) error was minimized when K = 2 (Additional file 1: Fig. S1). Therefore, we selected a K value of 2 to evaluate the genetic structure of the 747 rice genotypes. At K = 2, the ancestry assignments showed a clear separation between the indica and japonica subpopulations (Additional file 1: Fig. S1). As a result, we utilized the full population, as well as the indica and japonica subpopulations, to conduct further phenotypic analysis and GWAS.
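The choice of K described above amounts to picking the K with the smallest cross-validation error. The following is a minimal sketch of that step, assuming the log lines follow the form that ADMIXTURE prints when run with its cross-validation option ("CV error (K=2): ..."). The function names and the numbers in `example_log` are hypothetical placeholders for illustration and are not values from this study.

```python
import re

# Matches lines such as "CV error (K=2): 0.51234" in ADMIXTURE cross-validation output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]*\.?[0-9]+)")


def parse_cv_errors(log_text: str) -> dict[int, float]:
    """Collect the CV error reported for each K from concatenated log text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors: dict[int, float]) -> int:
    """Return the K with the smallest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Invented placeholder values for illustration only; NOT results from this study.
example_log = """\
CV error (K=1): 0.60
CV error (K=2): 0.40
CV error (K=3): 0.45
"""

cv = parse_cv_errors(example_log)
chosen = best_k(cv)
print(f"K with minimum CV error: {chosen}")
```

In practice the log text would be gathered from one ADMIXTURE run per K value (K = 1–10 in this study) before being passed to `parse_cv_errors`.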

The resistance to rice bacterial leaf streak displayed a variation that ranged from 0.416 to 2.042 in the full population and indica varieties, and from 0.421 to 1.816 in japonica varieties. Our phenotypic data exhibited a normal distribution and were deemed suitable for association analysis (Fig. 1b).

GWAS and QTL identification of resistance to rice bacterial leaf streak

We conducted GWAS for the full population and the indica and japonica subpopulations, using the phenotypic data of resistance to rice bacterial leaf streak and sequence data from the 747 rice accessions. We employed the general linear model (GLM), mixed linear model (MLM), and fixed and random model circulating probability unification (FarmCPU) implemented in rMVP software [28]. To minimize false positives arising from population structure, we compared the quantile–quantile plots from the three models for BLS resistance for each population and determined that FarmCPU was more suitable than GLM and MLM. The first three principal components were used as covariates within the GWAS model to account for subpopulation structure. We set the suggestive significance threshold at -log10(P) = 5 to detect significant association signals. For each GWAS result, we identified SNPs with P-values lower than the threshold (Fig. 2, Additional file 1: Figs. S2-S6).

Fig. 2

GWAS results of RBLS in different populations. Manhattan plots and quantile–quantile plots for GWAS in the full (a), indica (b), and japonica (c) populations using FarmCPU. Genes labeled in red and black indicate known genes and candidate genes, respectively, that are involved in RBLS and described in the text. The dotted horizontal line in each panel indicates the significance threshold (P = 10⁻⁵)

Based on this criterion, we identified a total of 89 signals for RBLS across the full, indica, and japonica populations, with 28, 21, and 40 signals detected in the full, indica, and japonica populations, respectively (Fig. 2, Table 1, Additional file 3: Table S2). The QTLs detected by GWAS offered valuable insights into the genetic architecture of the observed phenotypic variations. To further investigate these associations for RBLS, we compared the QTLs identified by GWAS with previous linkage mapping results. Our analysis revealed that nine QTLs associated with RBLS had been previously identified in genetic linkage mapping studies [16, 31], and the cloned gene Osaba1 (zeaxanthin epoxidase) [32] was also detected (Fig. 2, Additional file 3: Table S2).

OsPSKR1 had been shown to rescue root growth and influence susceptibility to Pseudomonas syringae pv. tomato DC3000 in Arabidopsis pskr1-3 mutants. Moreover, the expression of OsPSKR1 was found to be up-regulated following inoculation with RS105, a strain of Xoc that causes bacterial leaf streak in rice [31]. Based on our sequence data, we found four SNPs (P ≤ 10⁻²) located in the promoter region of OsPSKR1 (Fig. 3a). We conducted haplotype analysis for OsPSKR1 using the genotypes of these four SNPs for the 747 accessions, and we identified four haplotypes (hap1, hap2, hap3, and hap4) (Fig. 3a). In the japonica subpopulation, we observed a significant difference in the mean value of RBLS between hap2 and hap4; the mean RBLS for hap4 (1.089, 9 accessions) was higher than that for hap2 (0.835, 163 accessions). However, no significant difference was found among the four haplotypes in the indica subpopulation. Therefore, we concluded that OsPSKR1 was a functional gene that regulates RBLS, and hap2 of OsPSKR1 was identified as the superior genotype. Increasing the frequency of hap2 in the japonica subpopulation could enhance resistance to rice bacterial leaf streak. In summary, hap2 of OsPSKR1 could be utilized to improve resistance to BLS in rice (Fig. 3a).

Fig. 3

Exploration of two cloned genes for RBLS. a The cloned genes for RBLS and heat map of the ratio of RPKM. Different colors show ratio of RPKM in the Oryza sativa L. ssp. japonica cv. Nipponbare leaves at 48 h after inoculation with 10 geographically diverse strains of Xanthomonas oryzae pv. oryzicola and mock inoculated. Rows of the heat map correspond to the two cloned genes for RBLS listed on the left of the table. Haplotype analysis of OsPSKR1 (b) and Osaba1 (c). Different haplotypes and a comparison of RBLS among haplotypes of OsPSKR1 and Osaba1 in the indica and japonica subgroups. The green violins represent indica, the red violins represent japonica rice, and different letters indicate significant differences (P < 0.05) detected by one-way ANOVA

The Osaba1 mutant has been shown to be resistant to Xoc [32]. We identified five significant SNPs located in the promoter region of Osaba1. Haplotype analysis was conducted for Osaba1 using genotypic data from the 747 accessions, which revealed a total of three haplotypes (hap1, hap2, and hap3) in both the indica and japonica subpopulations (Fig. 3b). In the indica subpopulation, hap1 and hap2 of Osaba1 exhibited a significant difference in RBLS compared to hap3, with the mean RBLS of hap3 (1.049, 67 accessions) being higher than that for hap1 (0.896, 332 accessions) and hap2 (0.759, 8 accessions). Therefore, we tentatively identified this gene as a functional gene that controls bacterial leaf streak in rice, with hap1 and hap2 of Osaba1 being identified as the superior genotypes. However, we also observed a higher proportion of superior haplotypes for this gene in both indica and japonica rice, indicating that these superior haplotypes have already been widely utilized in modern breeding processes (Fig. 3b).

Transcriptome analysis of candidate genes

To ensure the reliability of our results, we conducted transcriptome analysis on Nipponbare leaves 48 h after inoculation with 10 geographically diverse strains of Xoc. We utilized three biological replicates for each condition, and three replicates of mock-inoculated O. sativa were used as controls. We compared the reads per kilobase per million mapped reads (RPKM) between the samples inoculated with the Xoc strains and the mock-inoculated samples, denoted as RPKM_Xoc and RPKM_Mock, respectively. The genes detected in the GWAS for RBLS were classified as up-regulated or down-regulated based on their RPKM_Xoc/RPKM_Mock ratios: genes with ratios greater than 1.5 were classified as up-regulated, while those with ratios lower than 0.67 were classified as down-regulated (Figs. 3c and 4). As previously described, our GWAS results identified two cloned genes that may be functional genes for RBLS in the japonica population. OsPSKR1 was identified as an up-regulated gene, with the ratios of its RPKM values between the 10 geographically diverse strains of Xoc and the mock-inoculated samples all being greater than 1.5. The ratio of RPKM in B8-12 to the mock sample for OsPSKR1 was 2.32, and the lowest ratio, that of RPKM in RS105 to mock, was 1.53. Osaba1 was identified as a down-regulated gene, with the ratios of its RPKM values between the 10 geographically diverse strains of Xoc and the mock-inoculated samples all being lower than 0.67. The lowest ratio for Osaba1 was 0.42, which was the ratio of RPKM in CFBP7341 to the mock sample. The largest ratio for Osaba1, that of RPKM in CFBP2286 to mock, was 0.65 (Fig. 3c). These results were in agreement with previous reports, indicating that the transcriptomic analyses were trustworthy and could be employed to further identify potential candidate genes for the QTLs detected by GWAS.
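The classification rule above can be written out as a short sketch. This is an illustration under stated assumptions rather than the pipeline used in the study: the function name `classify_expression` and the "unchanged" label for ratios between the two cut-offs are hypothetical. The example ratios are the ones reported in the text, expressed against a mock RPKM of 1.0.

```python
def classify_expression(rpkm_xoc: float, rpkm_mock: float,
                        up: float = 1.5, down: float = 0.67) -> str:
    """Classify a gene by the ratio of inoculated RPKM to mock RPKM.

    Ratios above `up` are up-regulated and ratios below `down` are
    down-regulated; the "unchanged" label for everything in between is an
    assumption made for this sketch.
    """
    if rpkm_mock <= 0:
        raise ValueError("mock RPKM must be positive to form a ratio")
    ratio = rpkm_xoc / rpkm_mock
    if ratio > up:
        return "up-regulated"
    if ratio < down:
        return "down-regulated"
    return "unchanged"


# Ratios reported in the text, written as RPKM against a mock value of 1.0.
reported = {
    ("OsPSKR1", "B8-12"): 2.32,
    ("OsPSKR1", "RS105"): 1.53,
    ("Osaba1", "CFBP7341"): 0.42,
    ("Osaba1", "CFBP2286"): 0.65,
}

labels = {key: classify_expression(ratio, 1.0) for key, ratio in reported.items()}
for (gene, strain), label in labels.items():
    print(f"{gene} ({strain}): {label}")
```

Because 1.5 and 0.67 are approximately reciprocal, the rule treats a 1.5-fold increase and a 1.5-fold decrease symmetrically.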

Fig. 4

Candidate genes for RBLS and heat map of the ratio of RPKM. Different colors show the ratio of RPKM in leaves of Nipponbare 48 h after inoculation with 10 geographically diverse strains of Xoc and mock-inoculated. The rows of the heat map correspond to the 20 candidate genes for RBLS listed on the left of the table

With the reliability of the transcriptomic analyses established, we proceeded to identify potential candidate genes within the QTL for RBLS by comparing the RPKM values of genes within each QTL detected by GWAS between the samples inoculated with 10 geographically diverse strains of Xoc and the mock-inoculated samples. We utilized the transcriptomic data to identify 20 candidate genes that corresponded to the eight QTLs detected by GWAS. Among these, seven genes were up-regulated, while 13 genes were found to be down-regulated (Fig. 4). Based on the transcriptomic analysis, these genes were the most likely candidate genes for RBLS among the 20 candidates we identified.

To investigate the function of these candidate genes, we applied the same KEGG pathway enrichment analysis to both the cloned and the candidate genes (Additional file 1: Figs. S7-S8). The enriched terms for the cloned genes were “MAPK signaling pathway - plant”, “Plant hormone signal transduction” and “Biosynthesis of secondary metabolites” (Additional file 1: Fig. S7), and those for the candidate genes were “MAPK signaling pathway - plant”, “Biosynthesis of secondary metabolites” and “Metabolic pathways” (Additional file 1: Fig. S8); both sets of terms are related to development and resistance. Taken together, these candidate genes were most likely the genes that control RBLS, and could provide theoretical guidance for subsequent research on disease resistance in rice.

Candidate gene analysis in QTLs

To gain a deeper understanding of the potential candidate genes, we conducted haplotype analysis for these genes. Our analysis revealed significant differences in the mean RBLS values between haplotypes for five of the genes (Fig. 5).

Fig. 5

Investigation of five candidate genes for RBLS. Different haplotypes and a comparison of RBLS between the haplotypes of OsRBLS1 (a), OsRBLS2 (b), OsRBLS3 (c), OsRBLS4 (d) and OsRBLS5 (e) in the indica and japonica subgroups. The green violins represent indica, the red violins represent japonica rice, and different letters indicate significant differences (P < 0.05) detected by one-way ANOVA

LOC_Os08g10260 was located in qRBLS8-1 and encoded an NBS-LRR disease resistance protein. We found four SNPs (P ≤ 10⁻²) within the promoter region of the gene. Haplotype analysis was conducted for LOC_Os08g10260 using the genotypes of these four SNPs for all 747 accessions, ultimately leading to the identification of seven haplotypes (Fig. 5a). In the indica subpopulation, we observed a significant difference in RBLS between hap2 (1.092, 30 accessions) and hap3 (1.079, 73 accessions) on the one hand and hap5 (0.867, 77 accessions), hap6 (0.826, 32 accessions), and hap7 (0.838, 13 accessions) on the other. Hap2 and hap3 exhibited higher mean RBLS values than hap5, hap6, and hap7. In japonica, we observed a significant difference in RBLS between hap1 (1.066, 33 accessions) and both hap4 (0.761, 99 accessions) and hap7 (0.686, 12 accessions), with the mean RBLS value for hap1 greater than those of hap4 and hap7. Based on our findings, we concluded that LOC_Os08g10260, which we named OsRBLS1, is a functional gene that controls RBLS. Hap4, hap5, hap6, and hap7 of OsRBLS1 were identified as the superior genotypes. Thus, increasing their frequency in both the indica and japonica subpopulations could enhance resistance to rice bacterial leaf streak (Fig. 5a).

ASR1 (LOC_Os02g33820), located in qRBLS2-7, encoded an abscisic stress-ripening gene. We found four SNPs (P ≤ 10⁻²) within the promoter region of ASR1. Prior research had demonstrated that overexpression of ASR1 in transgenic rice plants led to increased tolerance to both drought and cold stress. Furthermore, ASR1 and ASR5 had been shown to function complementarily in the regulation of gene expression in response to aluminium (Al) toxicity [33, 34]. We conducted haplotype analysis for ASR1 using the genotypes of these four SNPs for all 747 accessions, ultimately leading to the identification of three haplotypes (Fig. 5b). Within the indica subspecies, we noted a significant difference in RBLS between hap2 and both hap1 and hap3. The mean RBLS value of hap2 (0.758, 46 accessions) was lower than those for hap1 (0.96, 79 accessions) and hap3 (0.879, 139 accessions). In the japonica subspecies, a significant difference in RBLS was observed between hap1 and hap2, with hap1 exhibiting a higher mean RBLS value than hap2. Based on our results, we concluded that ASR1 (LOC_Os02g33820), which we named OsRBLS2, was a functional gene that controls RBLS. For OsRBLS2, hap2 was identified as the superior genotype. Thus, increasing the frequency of hap2 in the indica subpopulation could enhance resistance to rice bacterial leaf streak (Fig. 5b).

We observed a significant difference in RBLS between hap1 and hap2 of LOC_Os03g47160, LOC_Os10g32990, and LOC_Os10g33210 within both the indica and japonica subpopulations. LOC_Os03g47160 was located within qRBLS3-1 and encoded an expressed protein that we named OsRBLS3. LOC_Os10g32990 and LOC_Os10g33210 were both located in qRBLS10-2. LOC_Os10g32990 encoded a precursor of receptor-like protein kinase 2, which we named OsRBLS4. LOC_Os10g33210, on the other hand, encoded a peptide transporter PTR3-A that we named OsRBLS5. In both the indica and japonica subpopulations, the mean RBLS values of hap2 for OsRBLS3 and OsRBLS5 were significantly lower than those for hap1. In contrast, the mean RBLS value of hap2 of OsRBLS4 was significantly higher than that for hap1 in both indica and japonica. Therefore, we concluded that OsRBLS3, OsRBLS4, and OsRBLS5 were functional genes that control RBLS. Hap2 of OsRBLS3 and OsRBLS5 and hap1 of OsRBLS4 were the superior genotypes. Increasing the frequency of these genotypes in either the indica or japonica subpopulations could improve resistance to rice bacterial leaf streak (Fig. 5c-e).

By employing GWAS and linkage mapping in conjunction with sequence analysis, haplotype analysis, and transcriptome analysis, we identified 20 QTLs and five candidate genes that control resistance to bacterial leaf streak in rice. We also determined the superior haplotypes of these genes and assessed their breeding potential in both the indica and japonica subspecies. These genes could be useful in further investigations aimed at elucidating the genetic mechanisms underlying resistance to bacterial leaf streak in rice.

Discussion

Genetic factors contributing to resistance to bacterial leaf streak in rice

Efforts to develop BLS-resistant rice varieties necessitated the exploration and utilization of genes associated with RBLS. To date, several genes related to bacterial leaf streak disease in rice have been cloned [31, 32, 35, 36, 37, 42]. The resistance of hybrid rice to the tested pathogen was dominant, whereas susceptibility was recessive, as determined by spray inoculation during the seedling stage, indicating that the resistance of the restorer lines determined the resistance of the hybrid material [43]. The resistance of Duar and IR36 to the tested pathogen was identified by needle inoculation, and was suggested to be controlled by two pairs of recessive genes [44]. Eight materials were randomly selected from 57 resistant wild rice materials for hybridization and backcrossing with 9311, and resistance was found to be inherited recessively [45]. BC2 generations were constructed using H359 and Jiannong 8, and resistance to bacterial leaf streak disease was found to be controlled by multiple genes as a quantitative trait [11]. Similarly, another study found that inheritance of resistance to bacterial leaf streak disease was controlled by multiple genes as a quantitative trait [12]. Four resistant rice varieties, Dular, IR1545-339, IR26, and IR36, as well as a susceptible variety, **gang 30, were used for hybridization and backcrossing, and the resistance gene in IR1545-339 was found to be non-allelic [46]. A highly pathogenic bacterial strain, S-103, was used to inoculate resistant 90IRBBN44, and genetic analysis revealed that the resistance in 90IRBBN44 was controlled by a pair of recessive genes [14]. The BLS resistance quantitative trait locus (QTL) qBlsr3d in rice was verified to be controlled by quantitative inheritance by constructing an H359-BLSR3D single-chromosome segment substitution line [16]. The BLS resistance gene in the international rice variety BJ1 was located using the BSA method. Resistance was found to be controlled by a pair of recessive genes located on chromosome 10 at a distance of approximately 48.8 cM and tightly linked to marker RM158 [17]. A doubled haploid (DH) population was constructed using the parents Taichung Native 1 and Chunjiang 06, and four effective quantitative trait loci (QTLs) were identified on chromosomes 2, 4, 5, and 8 of rice through needle inoculation during the seedling stage [50,

Conclusion

To summarize, we analyzed the population structure of 747 rice accessions and evaluated their phenotypes 20 days after inoculation with Xoc strain GX01. We then conducted GWAS on the full population and the indica and japonica subpopulations based on both the phenotypic resistance to rice bacterial leaf streak and the sequence data of the 747 rice accessions. Our analysis led to the identification of 20 QTLs associated with RBLS in rice. Subsequently, we used a combination of linkage mapping, sequence analysis, haplotype analysis, and transcriptome analysis to identify five candidate genes that control resistance to BLS in rice. We also determined their superior haplotypes and breeding potential in both indica and japonica rice varieties. These findings suggested that these genes could be employed in future studies aimed at elucidating the genetic mechanisms underlying resistance to bacterial leaf streak in rice.

Methods

Materials and sequencing data

The rice diversity panel comprised 747 cultivated rice accessions sourced from the 3 K Rice Genome (3 K-RG) [2, 26]. This panel included 290 genotypes from the mini-core collection, which was initially selected from a core set of 4,310 accessions [53], and 457 lines from the International Rice Molecular Breeding Network [54]. The two collections consisted of accessions from 45 countries, encompassing major rice-growing regions worldwide (Additional file 2: Table S1). A map of the 747 accessions was plotted from the latitude and longitude of their geographic origins using the R package “leaflet” (https://rstudio.github.io/leaflet/), with the base map “Esri.WorldStreetMap” provided by OpenStreetMap. Transcriptomic data were obtained from the NCBI (https://www.ncbi.nlm.nih.gov/gds) with series accession ID: PRJNA280380 [55].

Phenotypic data for RBLS

A total of 747 varieties were grown under field conditions in Nanning, Guangxi, southern China, with a plant spacing of 20 cm × 30 cm and two replicates per variety. The Xoc strain used for inoculation, GX01, was isolated at the Guangxi Academy of Agricultural Sciences. At the maximum tillering stage, the 747 varieties were inoculated with Xoc using the acupuncture method [10]. Phenotypic assessments were conducted 20 days after inoculation (DAI), and the diseased leaf areas were measured using the Standard Evaluation System for Rice (5th Edition, 2014).

Phylogenetic and population structure analysis

The genetic variation data for the 747 accessions, in the form of single nucleotide polymorphisms (SNPs), were obtained from the publicly available 3 K-RG database. The database comprised approximately 17 million highly credible SNPs and 2.4 million indels, which were aligned to the Nipponbare IRGSP 1.0 reference genome [2, 26].

After removing SNPs with missing rates > 30% and minor allele frequencies < 5% in the full, indica, and japonica populations, we identified 2,860,820 SNPs as a credible SNP set. To account for population structure, we performed principal component (PC) and kinship matrix analyses for the 747 accessions. To infer the basal group of rice, we used a total of 2,860,820 SNPs with a missing rate of ≤ 50% to construct a phylogenetic tree using the unweighted neighbor-joining (NJ) method with the 'phangorn' R package. For constructing the PC matrix, we used the first three principal components (PCs). To conduct PCA, we used R (version 4.0.3) and plotted the first three eigenvectors. We further called independent SNPs using PLINK v1.90b4 with the parameter '--indep-pairwise 50 5 0.3'. This resulted in effective numbers of independent SNPs of 113,223, 95,164, and 45,752 for the full, indica, and japonica populations, respectively [56]. To analyze the population structure of the rice accessions, we used the 113,223 independent SNPs to perform a maximum likelihood clustering analysis with ADMIXTURE (version 1.3) [27]. To estimate the genetic ancestry of each sample, we ran the cross-validation (CV) procedure with varying levels of K (K = 1–10). Based on the cross-validation error, we determined that a value of K = 2 was optimal. We plotted the ADMIXTURE result using an R script.
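The first filtering step above (dropping SNPs with a missing rate above 30% or a minor allele frequency below 5%) can be illustrated with a small sketch. It assumes genotypes coded as 0, 1, or 2 copies of the alternate allele with `None` for a missing call. That coding, the function names, and the toy genotype vectors are hypothetical and are not taken from the study's pipeline.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of accessions with no genotype call at this SNP."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the non-missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_snp(calls: Sequence[Genotype],
             max_missing: float = 0.30, min_maf: float = 0.05) -> bool:
    """Keep a SNP unless its missing rate exceeds 30% or its MAF is below 5%."""
    return (missing_rate(calls) <= max_missing
            and minor_allele_frequency(calls) >= min_maf)


# Toy genotype vectors, invented for illustration.
snp_common = [0, 1, 2, 0, None, 1, 0, 2, 0, 1]       # 10% missing, MAF about 0.39
snp_rare = [0] * 19 + [1]                            # MAF = 1/40 = 0.025
snp_sparse = [None, None, None, None, 0, 1, 2, 1, 0, 1]  # 40% missing

print(keep_snp(snp_common), keep_snp(snp_rare), keep_snp(snp_sparse))
```

The linkage-disequilibrium pruning that follows this step is handled by PLINK in the study and is not reproduced here.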

Genome-wide association study

We obtained a total of 2,860,820 SNPs with a missing rate of ≤ 50% and minor allele frequencies of ≥ 5%. To account for population structure, we performed principal component (PC) and kinship matrix analyses, using the first three PCs to construct the PC matrix. We performed a genome-wide association study (GWAS) to identify genetic variants associated with resistance to rice bacterial leaf streak (BLS) in the 747 accessions, using three different models: the general linear model (GLM), the mixed linear model (MLM), and Fixed and random model Circulating Probability Unification (FarmCPU). The GWAS was conducted using the SNP set and default settings in the rMVP software [28]. Strong linkage disequilibrium (LD) among SNPs makes the markers non-independent, so thresholds derived from the total number of markers are too stringent for detecting significant associations. To address this issue, we calculated suggestive thresholds using the formula "-log10(1 / effective number of independent SNPs)", as previously described [50, 52, 57, 58]. We determined the effective numbers of independent SNPs using PLINK [56], with a window size of 50, step size of 50, and r² ≥ 0.3. The effective numbers of independent SNPs were found to be 113,223, 95,164, and 45,752 in the full population and the indica and japonica subpopulations, respectively. To identify significant associations, we set the threshold at -log10(P) = 5. We used genome structure annotation information from MSU-RGAP 7.0 to annotate nonsynonymous SNPs using SnpEff [59]. An in-house Perl script was used to separate these SNPs from all the identified SNPs in the 747 accessions.
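As a worked check of the threshold formula, the sketch below evaluates -log10(1 / effective number of independent SNPs) for the three populations, using the effective SNP counts reported in the text. The function and variable names are illustrative only.

```python
import math

# Effective numbers of independent SNPs reported in the text.
EFFECTIVE_SNPS = {"full": 113_223, "indica": 95_164, "japonica": 45_752}


def suggestive_threshold(n_independent: int) -> float:
    """Return -log10(1 / n_independent), the threshold on the -log10(P) scale."""
    return -math.log10(1.0 / n_independent)


thresholds = {pop: suggestive_threshold(n) for pop, n in EFFECTIVE_SNPS.items()}
for pop, value in thresholds.items():
    print(f"{pop}: {value:.2f}")
```

The three values come out at roughly 5.05, 4.98, and 4.66, so the single threshold of -log10(P) = 5 used in the study sits close to the values for the full and indica populations and is somewhat stricter than the formula would require for japonica.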

Haplotype analysis

For haplotype analysis, we used SNPs with a significance level of P ≤ 10⁻² in the 2-kb promoter and exons of each gene. Haplotypes that contained at least five varieties were used for statistical testing. We calculated the differences in phenotypic value between haplotypes of each gene using one-way ANOVA or Student's t-tests with an R script [52, 60]. Genes with RPKM_Xoc/RPKM_mock > 1.5 were defined as up-regulated genes, and those with RPKM_Xoc/RPKM_mock < 0.67 were defined as down-regulated genes.
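The haplotype comparison can be sketched as three steps: build a haplotype string from each accession's alleles at the selected SNPs, keep haplotypes carried by at least five accessions, and compare phenotype means across the remaining groups. The example below is a minimal illustration in Python rather than the R script used in the study. All names and the toy data are hypothetical, and it reports only the one-way ANOVA F statistic, since the P-value would normally come from a statistics library.

```python
from collections import defaultdict
from statistics import mean


def group_by_haplotype(genotypes: dict, phenotypes: dict) -> dict:
    """Map each haplotype string to the phenotype scores of its accessions."""
    groups = defaultdict(list)
    for accession, alleles in genotypes.items():
        if accession in phenotypes:
            groups["".join(alleles)].append(phenotypes[accession])
    return dict(groups)


def filter_haplotypes(groups: dict, min_accessions: int = 5) -> dict:
    """Keep haplotypes carried by at least `min_accessions` accessions."""
    return {hap: vals for hap, vals in groups.items() if len(vals) >= min_accessions}


def anova_f(groups: dict) -> float:
    """One-way ANOVA F statistic (between-group over within-group mean square)."""
    values = [v for vals in groups.values() for v in vals]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(values)
    ss_between = sum(len(vals) * (mean(vals) - grand) ** 2 for vals in groups.values())
    ss_within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data, invented for illustration: two SNPs per accession and an RBLS score.
toy_genotypes = {f"acc{i}": ("A", "G") for i in range(1, 6)}
toy_genotypes.update({f"acc{i}": ("T", "C") for i in range(6, 11)})
toy_genotypes.update({"acc11": ("A", "C"), "acc12": ("A", "C")})
toy_scores = dict(zip(
    (f"acc{i}" for i in range(1, 13)),
    [0.8, 0.9, 0.85, 0.8, 0.9, 1.1, 1.0, 1.05, 1.1, 1.0, 0.5, 0.6],
))

kept = filter_haplotypes(group_by_haplotype(toy_genotypes, toy_scores))
hap_means = {hap: mean(vals) for hap, vals in kept.items()}
f_stat = anova_f(kept)
print(hap_means, round(f_stat, 2))
```

In the toy data the haplotype "AC" is dropped because only two accessions carry it, which mirrors the rule that haplotypes with fewer than five varieties are excluded from testing.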