Background

Monochasma savatieri Franch. ex Maxim, a perennial medicinal plant in the Scrophulariaceae family, is endemic to China and Japan, where it grows preferentially on sandy hillsides and in grassy clusters. In China, it is mainly distributed in southeastern regions, including the provinces of Jiangxi, Fujian, Hunan, and Zhejiang [1, 2]. The whole plant of M. savatieri can be used for medicinal purposes, including the treatment of colds, cough, pneumonia, fever, toothache, and irregular menstruation [3,4,5]. Preparations of the plant are characterized by a wide range of biological activities, including antioxidant, antibacterial, and antiviral activities, which are assumed to be attributed the main medicinally effective components, namely, phenylpropanoids, flavonoids, alkaloids, saponins, and polysaccharides [6, 7].

M. savatieri has a long history of medicinal use and is one of the essential component of Yanning granules, which has been designated a “national protected traditional Chinese medicine variety” [8, 9]. However, as an important and valuable medicinal plant, the artificial cultivation of M. savatieri has been unsuccessful. In recent years, there has been an increase in the market demand for M. savatieri, which has often led to limitations in the supply of its wild resources. Such scarcities among wild populations can be attributed to the fact that M. savatieri is a root hemiparasite that has stringent habitat requirements and a naturally weak reproductive capacity [10]. Moreover, the seeds M. savatieri are characterized by a short period of dormancy, and germination rates decline sharply with a prolongation of storage time. Consequently, if, on reaching maturity, the seeds lack suitable germination conditions, they will rapidly lose vitality [11]. Owing to a combination of unfavorable factors, including over-harvesting and habitat destruction, the natural habitats of wild resources of M. savatieri have sharply deteriorated and reduced in size. Their number of individuals and size of natural population have markedly declined in recent years owing to the short seed dormancy and inherently poor regeneration. This has resulted in substantial losses of genetic resources, as indicated by a survey conducted in China by Chen et al. [12], who found that wild resources of M. savatieri in almost all investigated regions showed sharp declines. Moreover, in Japan, M. savatieri has long been listed as an endangered species [13]. Consequently, conservation of the wild resources of M. savatieri is considered a matter of particular urgency. Currently, however, there is a notable lack of information regarding the genetic variation of M. savatieri, which thereby hinders evaluations of the genetic status of this species and makes it difficult to formulate appropriate scientifically based resource conservation strategies.

Analyses of species genetic diversity and population structure play key roles in genetic resource conservation and plant breeding [14, 15]. In this regard, a range of molecular markers, including random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-related amplified polymorphism (SRAP), simple sequence repeat (SSR), and single-nucleotide polymorphism (SNP) markers, have been widely used to assess the genetic diversity of species and reveal their population structures [16,17,28,29,30]. To date, studies on the genetic diversity of M. savatieri are still limited, and one of the most important reasons is the lack of effective molecular markers for the species.

In this study, great efforts have been made to collect and preserve the wild germplasm resources of M. savatieri from the major distribution provinces of China. Meanwhile, we detected an abundance of EST-SSRs in the assembled transcripts of the full-length transcriptome data of M. savatieri, and from among these, we developed reproducible polymorphic EST-SSR markers to assess the level of genetic diversity and genetic structure for the collected wild germplasm resources of M. savatieri. The collected wild germplasm resources and genetic information will provide novel insights for the conservation, utilization and breeding strategies development of M. savatieri.

Results

Characterization of EST-SSR loci

In total, we detected 35,807 EST-SSR loci distributed among 18,279 of a total of 40,970 transcript sequences (98.98 Mb) (The data were deposited and are available at https://github.com/AWan222/LRCyc). The frequency of SSR distribution (number of SSR/total transcripts) was 87.40%, with an average of one locus per 2.76 kb. In addition, we found that 9078 sequences contained more than one SSR loci, and that there were 6363 (17.77%) compound SSRs. The SSR types were abundant, ranging from mononucleotide to hexanucleotide repeats, and the number of each type varied. Among these, mononucleotide (18,578, 51.88%) and dinucleotide (12,750, 35.61%) EST-SSRs were the most abundant repeat types, followed by trinucleotide loci (4171, 11.65%), with less than 1.00% of the total being represented by tetra-, penta-, and hexa-nucleotides (Table 1). The number of repeats varied greatly among the different repeat types of SSRs. The highest frequency of SSR repeat types occurred with 10 repeats, with 6466, accounting for 18.06%, followed by 6 (4504, 12.58%) and 11 (3797, 10.60%) repeats (Fig. 1). The repeats of mononucleotide SSRs were mainly concentrated in the range of 10–20 times, and the number of other SSR repeats were mainly concentrated in the range of 5–10 times. Among the identified SSRs, a total of 95 types of motif were detected (Fig. 1). Tetra- and hexa-nucleotide repeat types exhibited the highest number of motif types (21 and 39 types, respectively). Moreover, among different repeat motif types, the A/T repeat motif was the most abundant (17,737, 49.54%) followed by AG/CT (7445, 20.79%), AC/GT (2867, 8.01%), AT/AT (2406, 6.72%), and AAT/ATT (951, 2.66%). These findings provide a solid foundation for the development of EST-SSR markers in M. savatieri.

Table 1 Occurrence of SSRs in the transcripts of the Monochasma savatieri
Fig. 1
figure 1

Characterization of SSR loci. A Distribution of SSR repeat number in Monochasma savatieri. B Distribution of SSR motif types in Monochasma savatieri

Genetic diversity

For the purposes of EST-SSR primers screening, we randomly selected the DNA of 10 different M. savatieri individuals, among which, we developed 33 highly polymorphic SSRs that were subsequently used to assess the genetic diversity and genetic relationships among the 46 M. savatieri specimens obtained from different wild habitats in China (Table 2). We accordingly succeeded in amplifying a total of 208 alleles, with the numbers of alleles scored for the 33 loci ranging from 3 (LRC-60747-1) to 11 (LRC-19013-3) with an average of 6.303 (Fig. 2). The average MAF ranged from 0.130 to 0.924 (average: 0.424), whereas the GD ranged from 0.143 to 0.894 (average: 0.706), thereby indicating differences in the polymorphisms of the EST-SSR loci. The PIC ranged from 0.138 to 0.884 (average: 0.668), with the values indicating that 31 SSR loci (93.94%) were highly polymorphic (PIC ≥0.5). Of the remaining two loci, one was moderately polymorphic (0.25 < PIC < 0.5) and the other showed low polymorphism (PIC ≤0.25).

Table 2 Genetic diversity parameters of the newly developed 33 EST-SSR markers across the 46 Monochasma savatieri individuals
Fig. 2
figure 2

Results of the PCR amplification of 46 M. Monochasma savatieri germplasm resources using the EST-SSR markers (A) LRC-19013-3, (B) LRC-35320-1, and (C) LRC-10353-1. M: DNA Ladder; 1 to 46: materials of the 46 Monochasma savatieri individuals listed in Supplemental Table 1

The results obtained for population-level indexes of genetic diversity in each population are shown in Table 3. The number of alleles (Na) for each locus at the population level ranged from 1.909 to 2.273 (average: 2.068), whereas the number of effective alleles (Ne) ranged from 1.682 to 1.823 (mean: 1.747). Among the four populations, we recorded the highest values of Na and Ne in the ZJ and JX populations, respectively. Expected heterozygosity (He) at the population level varied from 0.314 to 0.362 (mean: 0.342) and Shannon index (I) values obtained for the four populations ranged from 0.481 to 0.583 (mean 0.535), with both indexes indicating the highest genetic diversity in the ZJ population, whereas the FJ population was characterized by the lowest genetic diversity. Furthermore, analysis of the percentage of polymorphic loci (PPL) revealed that the FJ and ZJ populations had the lowest (63.64%) and highest (75.76%) PPL values respectively, with a mean value among populations of 70.45%. These results accordingly indicated that the ZJ population has the highest genetic diversity, whereas comparatively, the JX, HN, and FJ populations are characterized by relatively lower genetic diversities.

Table 3 Analysis of the genetic diversity of four Monochasma savatieri populations

Sample size is an important factor in the analysis of genetic diversity, because it affects genetic diversity indexes. To verify the accuracy of the genetic diversity results using 46 M. savatieri samples, we randomly selected different individuals from 46 samples and constructed 26 groups of different sample sizes with three replicates for each group to plot the trends of the Ne, I and He indexes. As shown in Fig. 3, the genetic diversity indexes Ne, I and He increased with an increase in the effective sample size. However, when the sample size reached approximately 25, the genetic diversity indexes stabilized and no longer increased significantly with increase in sample size. Therefore, the genetic diversity analysis with the 46 M. savatieri samples in this study was accurate and effective.

Fig. 3
figure 3

Fitting curves between sample size and different genetic diversity indexes

Genetic differentiation

To gain estimates of genetic differentiation, we performed AMOVA analysis at three levels, namely, among populations, within populations, and within individuals. We accordingly established that 49.74% of the total genetic variation originate from variability within the populations, whereas 48.36% is attributed to differences among populations, and only 1.91% ascribed to within-individual differences (Table 4). F-statistics analysis revealed that the inbreeding coefficient (Fis) and overall fixation index (Fit) were 0.962 and 0.981, respectively, and that the genetic differentiation coefficient (Fst) was 0.497 (Fst > 0.25), indicating a relatively large genetic differentiation among the populations. In addition, we determined a gene flow (Nm) value of 0.253 (Nm < 1), which tends to indicate that the genetic differentiation among population may be caused by migration or genetic drift.

Table 4 Analysis of the molecular variance (AMOVA) of the 46 Monochasma savatieri individuals

Population structure

To gain an understanding of the structural and genetic characteristics of the M. savatieri populations, we subjected the 46 target individuals to PCoA, UPGMA, and structure analyses. Our PCoA analysis was based on the use of a correlation genetic similarity matrix, showing the first three principal coordinates with Eigen value effects of 20.77, 18.23, and 11.78% respectively, the derived scatter plot of which discriminately divided the 46 individuals into three main groups (Fig. 4). Notably, these three PcoA clusters appeared to be identical to those identified based on structure analysis, representing the natural distribution of M. savatieri. Structure clustering analysis revealed that the maximal ΔK value (120.776) was obtained for a K value of 3 (Fig. 5), thereby indicating that the 46 M. savatieri individuals could be separated into three distinct genetic clusters (Fig. 5). Specifically, the genetic structure map revealed that the nine individuals in the FJ population grouped in Cluster 1 (blue cluster), the 19 individuals in the HN and JX populations grouped in Cluster 2 (green cluster), and the remaining 18 individuals from the ZJ population were collected in Cluster 3 (red cluster). Consistent with the results of the PCoA analysis, these findings thus indicated that gene penetration among the different M. savatieri populations was relatively low.

Fig. 4
figure 4

A principal coordinate analysis (PCoA) plot for the four Monochasma savatieri populations showing the separation into three main clusters

Fig. 5
figure 5

A The structure of the Monochasma savatieri population inferred based on Bayesian clustering of the EST-SSR markers data. B A graph showing the relationship between ΔK and K values. C The relationship between the pairwise estimates of genetic distance and the corresponding geographical distances among the four populations of Monochasma savatieri

The same three-way partitioning of populations was similarly confirmed by the findings of UPGMA cluster analysis. The UPGMA dendrogram was constructed based on Nei’s genetic distance and accurately reflects the genetic relationships among and within populations. The tree showed that the 46 M. savatieri individuals from the four populations could be divided into two major clusters (Fig. 6), with Cluster 1 comprising exclusively the FJ population and Cluster 2 containing all individuals from the remaining three populations, which were further divided into two short branches, with populations (HN and JX) forming one short branch and the ZJ population comprising the other. Furthermore, both the UPGMA dendrogram and STRUCTURE clustering indicated that the clustering of populations tended to be consistent with the geographical distance among populations. These observations were supported by our Mantel test results, which indicated a significant correlation between genetic and geographical distances (R2 = 0.3304, p < 0.05) (Fig. 5).

Fig. 6
figure 6

A UPGMA dendrogram of the 46 Monochasma savatieri individuals

Discussion

M. savatieri is an important and rare medicinal plant found only in southeast China and the islands of Kyushu in southwestern Japan [12]. In recent years, as the market demand for M. savatieri increased, overharvesting and habitat destruction caused its wild germplasm resources to reduce excessively. Therefore, it is of great importance to collect and analyze the germplasm resources of M. savatieri for the formulation of protection strategies and utilization of the wild resources. In this study, great efforts were made to collect and preserve the wild germplasm resources of M. savatieri from the major distribution provinces of China. This is the most reported current collection for the wild germplasm resources of M. savatieri. The abundant germplasm resources of M. savatieri will provide considerable opportunities for future genetic research and breeding application.

Molecular markers play a pivotal role in genetic research and molecular breeding. However, available resources for molecular markers in M. savatieri are very scarce. In this study, we developed a novel set of EST-SSR markers and used them to determine the genetic diversity of M. savatieri populations. To the best of our knowledge, this is the first study that has sought to assess the genetic diversity of this important medicinal plant species. The frequency of EST-SSRs in M. savatieri is comparable to that of found in Rhododendron fortunei [31] and Cocos nucifera [32], although somewhat lower than that previously reported for Pongamia pinnata [33] and Stephanandra incisa [34], but notably higher than that in Paeonia suffruticosa [35] and Cephalotaxus oliveri [36]. These findings accordingly reveal the relatively broad range of SSR densities among plants at the species level. With respect to M. savatieri, we identified mono- and dinucleotides as the predominant types of EST-SSR repeat. This pattern contrasts to a certain extent with that in a range of other plants, including Corchorus spp. [37], Lycium barbarum [38], and Curcuma alismatifolia [39], for which di- and tri-nucleotide repeats predominate. We speculate that this disparity could be attributable to the relatively long evolutionary history of M. savatieri [40]. Furthermore, among the different repeat motifs, we identified A/T (17,737, 49.54%) as the predominant type in M. savatieri. Notably, the proportion of A/T motifs was found to be considerably higher than that of G/C motifs, which is consistent with the pattern reported for numerous plants, such as Opisthopappus [41], Pinus koraiensis [42], and Phoebe bournei [43].

Genetic diversity is an important prerequisite for species adaptation to environmental change and the development of disease resistance, the levels of which determine the viability and evolutionary potential of populations, and are often used to predict species trends and potential dangers [44]. The use of molecular markers, such as RAPD, AFLP, and SSR, has been shown to be an accurate and reliable approach for analyzing the genetic diversity of plant populations [45], among which, SSRs are widely employed owing to their strengths. In this study, we used 33 novel developed EST-SSR primers to examine the genetic diversity of M. savatieri specimens collected from 10 geographically separated wild areas, the results of which indicated a large variation in the number of alleles of each locus, ranging from 3 to 11 with a mean value of 6.303. The average GD and PIC values determined for the loci, 0.706 and 0.668, respectively, were found to be higher than those reported for some plants, such as those in the genera Polygonatum [46] and Perilla [47], indicating the high polymorphism of these loci. Additionally, the differences in the values for each locus indicate that these 33 loci are characterized by significant genetic variation among the populations, thereby highlighting their importance with respect to the evaluation of M. savatieri germplasm resources. The ploidy level, defined as the number of sets of chromosomes in the nucleus, is a crucial characteristics for studying biodiversity and develo** strategies for plant breeding. In this study, within the expected PCR product sizes, the overwhelming majority of alleles of amplification loci of the 33 SSR markers in each germplasm were one or two, whereas three alleles existed only in a few loci. Therefore, it was speculated that M. savatieri was a diploid or triploid plant, but additional approaches, including flow cytometry and chromosome counting, were requested to further define the ploidy level of M. savatieri.

Analysis of M. savatieri genetic diversity at the population level revealed that He and I ranged from 0.314 to 0.362 (average: 0.342) and 0.481 to 0.583 (average: 0.535) respectively, which are lower than the values previously obtained for Paeonia decomposita [48], Magnolia sinostellata [49], and Camellia nitidissima [50], thus indicating that the genetic diversity of M. savatieri is lower than that of these three species. For plants, reproductive patterns, genetic drift, natural selection, anthropogenic activities, and habitat fragmentation are among the main factors contributing to changes in genetic diversity [51]. It can be speculated that the low genetic diversity of M. savatieri populations is attributable to its narrow range of distribution, along with the large spatial distance between populations, which limits inter-population gene exchange. Moreover, excessive harvesting of natural populations, habitat destruction, and declining population sizes may contribute to inbreeding and genetic drift within the M. savatieri population, thereby reducing its genetic diversity. Nevertheless, among the four regional populations examined in this study, we identified the Zhejiang population as being the genetically most diverse. Generally, a higher genetic diversity is indicative of a higher complexity of plant diversity, and consequently a greater potential for environmental adaptation [44]. Accordingly, analysis of the genetic diversity of M. savatieri will predictably provide a theoretical basis for its protection and breeding.

The degree of differentiation between natural populations can be described in terms of gene flow (Nm) and the genetic differentiation coefficient (Fst), which are negatively correlated, with a higher differentiation coefficient between populations being taken to be indicative of a lower extent of gene flow [52]. Genetic differentiation among plant populations is influenced by range size, habitat fragmentation, and population isolation, reflecting the cumulative effects of genetic mutation, genetic drift, gene flow, and natural selection [53]. Previously, Wright et al. [54] established that genetic differentiation among populations can be considered high when Fst values are greater than 0.25, and thus the Fst value of 0.497 we obtained for M. savatieri would tend to indicate a significant genetic differentiation among populations. This assumption was supported by our AMOVA results (P < 0.001), indicating a higher molecular variation among populations (49.74%) than within populations (48.36%). Gene flow is one of the main factors contributing to the maintenance of a homogeneous population genetic structure. Theoretically, at Nm values > 1, there is a lower likelihood of genetic drift, thereby limiting the genetic differentiation among populations, whereas conversely, when Nm < 1, gene flow is insufficient to counteract the effects of genetic drift, thereby favoring an increase in the genetic differentiation among populations [55]. In the present study, we obtained a value of only 0.253 for the gene flow between M. savatieri populations, indicating that low levels of gene exchange would not weaken the differentiation among populations caused by genetic drift, which we suspect could be the primary factor influencing the genetic structure of M. savatieri. It can be reasoned that these findings reflect the characteristics of M. savatieri, such as its semi-parasitism, low reproductive capacity, and low seed viability. Moreover, the viability of M. savatieri populations is increasingly being threatened by habitat fragmentation and destruction, and consequently, declining numbers and hindered gene exchange may lead to a gradual increase in genetic differentiation.

The genetic structure of species reflects the mutation, recombination, genetic drift, and selection effects experienced by populations during the course of evolution [56], and thus population genetic structure analyses are deemed important for gaining an accurate understanding of the genetic relationships among germplasms. In this study, we used three clustering methods (PCoA, STRUCTURE, and UPGMA) to analyze the population structure of M. savatieri. PCoA and UPGMA clustering analyses were performed to classify the assessed materials based on their genetic similarity coefficients and genetic distance, which can clarify the intuitive relationship between populations [57]. However, the results obtained using these two clustering do not provide an indication of the materials that interpenetrated between populations. Conversely, Structure analysis is based on the Hardy-Weinberg equilibrium and Bayesian model algorithm, which eliminates the effects of deviation attributable human factors on population division and can objectively classify populations [58]. Consequently, in the present study, we used these three methods in combination to analyze germplasm resources, thereby enabling us to further characterize the germplasm population structure.

UPGMA clustering analysis divided the 46 individuals collected from 10 different wild regions into two clusters, thereby indicating the presence of two discrete genetic populations among these regions. Using STRUCTURE analysis, these individuals were clearly divided into three different clusters, with a few examples of admixed individuals. These results indicate the relatively homogeneous genetic structure of most samples, and the similar composition of samples collected from the same regions. The results of PCoA analysis were found to be identical to those obtained using STRUCTURE and were in general consistent with the UPGMA clustering tree. In addition, we found that the genetic relationships among populations reflected the natural geographical locations of these populations, which was supported by the Mantel test results, which revealed a significant positive correlation between the geographical and genetic distances among populations (R2 = 0.3304, P < 0.05). Collectively, these results tend to imply that geographical isolation, which would hinder gene exchange between different populations and isolate different gene pools, could be the primary factor contributing to the apparent genetic differentiation among M. savatieri populations.

Conclusions

In the present study, 46 M. savatieri wild individuals were collected from 10 habitats in major distribution provinces in China. Using full-length transcriptome sequencing data, we identified and characterized EST-SSR loci and developed the first set of highly polymorphic EST-SSR markers for M. savatieri. The results of this study indicated that gene penetration among the different M. savatieri populations was relatively low, and the individuals from Zhejiang Province showed the highest genetic diversity. Geographical isolation plays a vital role in genetic differentiation among M. savatieri populations. Based on current trends, effective protection is absolutely necessary and urgent for wild germplasm resources of M. savatieri to maintain their genetic identity and diversity.

Materials and methods

Plant materials and DNA extraction

For the purposes of the present study, we collected a total of 46 M. savatieri individuals from different distribution areas in China, among which there were nine accessions from Fujian Province (FJ), 11 from Hunan Province (HN), eight from Jiangxi Province (JX), and 18 from Zhejiang Province (ZJ) (Fig. 7, Supplemental Fig. 2). The formal identification of these plant materials was performed by Prof. Jiankun ** in structured populations. Am J Hum Genet. 2000;7(1):170–81." href="/article/10.1186/s12864-022-08832-x#ref-CR64" id="ref-link-section-d310283e4212">64]. The length of the burn-in period and Markov Chain Monte Carlo (MCMC) run parameters were set to 100,000 iterations. We assessed K values of 1 to 10 based on 10 independent runs for each value. The most favorable K value was determined by estimating the maximum value of the ΔK statistic, using the web-based STRUCTURE Harvester (version 0.9.94) [65], and cluster analysis between populations, based on unweighted pair group with arithmetic average (UPGMA), was performed using NTSYS-pc (version 2.10e) software [66].