Background

Understanding the processes that drive differentiation between populations and elucidating the mechanisms that underlie the origins and maintenance of genetic variations are major aims and fundamental tasks in evolutionary biology [1,2,3,4,5], which are also core issues in conservation biology [6, 7]. Myriad factors may impact the evolution and genetic differentiation of plant populations, where geological events and climate oscillations have been suggested as critical drivers [8,9,

Conclusion

In summary, when genetics, geographical conditions, climate variables, and evolutionary processes were all considered, O. taihangensis and O. longilobus were clearly distinct. At ~17.44 Ma during the early Miocene, the establishment of differing monsoon regimes due to the enhanced Asian monsoon from the QTP uplift triggered the derivation of O. taihangensis from O. longilobus. During the mid-late Miocene, dramatic climatic shifts coupled with the progressive and heterogeneous uplift of the QTP initiated the intraspecific differentiation of these two species. Up until the Pleistocene, the rapid uplift of the Taihang Mountains coupled with violent climatic oscillations further promoted the diversity of the two species. With the formation of the Taihang Mountains, the complex topography led to localized environments and ecological heterogeneity, which established spatiotemporal isolation between populations. Under this scenario, O. taihangensis and O. longilobus underwent adaptive divergence, which gradually shaped their current genetic structures and distribution patterns. This study explored the differentiation mechanisms of these two species of the Opisthopappus genus and revealed the impacts of environmental events by taking small-scale spatial niches into consideration, while providing clues for the further investigation of other germplasm resources of the Taihang Mountains.

Methods

Sample collection

Our study was conducted in accordance with the laws of the People’s Republic of China, and field collection was approved by the Chinese Government. All researchers received permission letters from the College of Life Science, Shanxi Normal University, to collect the samples, which were taxonomically identified based on their phenotype by Junxia Su (Associate Professor of systematic botany) at Shanxi Normal University. The voucher specimens were deposited in the herbarium of College of Life Science, Shanxi Normal University (No:20170105030–20170105050).

Eleven populations of O. longilobus and thirteen populations of O. taihangensis were sampled, which covered the distribution ranges of Opisthopappus (Table 1, Fig. 1). Individuals growing at a common site were regarded as a single "population". At each sample site, fresh young leaves free of disease and insect damage were collected from 10–15 individuals per population. These samples were placed into sealed bags filled with silica gel, dried rapidly, and stored at 20 °C for later use. A global positioning system (GPS) was employed to demarcate each sample site and record the longitude, latitude, and elevation of each population (Table 1).

PCR amplification, sequencing, and genotyping

Total genomic DNA was extracted using the modified 2 × CTAB method [71]. DNA quality was assessed using an ultraviolet spectrophotometer and 0.8% agarose gel electrophoresis, and the DNA was stored at −20 °C for further use.

The SNP and InDel primers (Additional file 8: Table S3) for nuclear genes of Opisthopappus were obtained from a previous study [41]. For the SNP primers, the 20 µL PCR reaction contained 10 µL 2 × MasterMix, 2 µL template DNA (30 ng/µL), 1 µL primer S (10 µM), 1 µL primer A (10 µM), and 6 µL ddH2O. The PCR procedure was as follows: pre-denaturation at 94 °C for 5 min; 35 cycles of denaturation at 94 °C for 1 min, annealing at the primer-specific temperature for 1 min, and elongation at 72 °C for 1.5 min; a final elongation at 72 °C for 10 min; and preservation at 4 °C. The PCR products were separated by 2% agarose gel electrophoresis, confirmed with an automatic gel imaging system, and then sent to Sangon Biotech (Shanghai) for sequencing.

For the InDel primers, the 20 µL PCR reaction contained 10 µL 2 × MasterMix, 3 µL template DNA (30 ng/µL), 1 µL primer S (10 µM), 1 µL primer A (10 µM), and 5 µL ddH2O. The PCR procedure was as follows: pre-denaturation at 94 °C for 1 min; 35 cycles of denaturation at 94 °C for 1 min, annealing at the primer-specific temperature for 1 min, and elongation at 72 °C for 1 min; a final elongation at 72 °C for 10 min; and preservation at 4 °C. The PCR products were detected using 8% polyacrylamide gel electrophoresis. The presence or absence of each InDel fragment was coded as ‘1’ and ‘0’, respectively. The numbers of individuals used for SNP sequencing and InDel genotyping are shown in Table 1.
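As a quick arithmetic check on the two recipes above, the following Python sketch sums the component volumes and derives the template mass and the final primer concentration per reaction. The dictionary and function names are illustrative only and are not part of the original protocol.

```python
# Component volumes in microlitres, copied from the two reaction recipes in the text.
SNP_MIX = {"2x MasterMix": 10, "template DNA": 2, "primer S": 1, "primer A": 1, "ddH2O": 6}
INDEL_MIX = {"2x MasterMix": 10, "template DNA": 3, "primer S": 1, "primer A": 1, "ddH2O": 5}


def total_volume(mix):
    """Total reaction volume in microlitres."""
    return sum(mix.values())


def template_mass_ng(mix, conc_ng_per_ul=30):
    """Mass of template DNA in the reaction (stock concentration from the text)."""
    return mix["template DNA"] * conc_ng_per_ul


def primer_final_conc_um(mix, primer="primer S", stock_um=10):
    """Final primer concentration (uM) after dilution into the full reaction."""
    return mix[primer] * stock_um / total_volume(mix)


if __name__ == "__main__":
    for name, mix in (("SNP", SNP_MIX), ("InDel", INDEL_MIX)):
        print(name, total_volume(mix), template_mass_ng(mix), primer_final_conc_um(mix))
```

Both recipes add up to the stated 20 µL, and they differ only in the template and water volumes.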

Population genetic differentiation analyses

Prior to the population genetic analyses, a partition homogeneity test (PHT) was conducted in PAUP [86] to determine whether the SNP sequences could be combined. The non-significant result (P > 0.05) indicated that the SNP sequences were suitable for combination.

The haplotypes, haplotype frequencies, haplotype diversity (Hd), and nucleotide diversity (π) were calculated using DNASP 5.10 [87]. The genetic differentiation parameters GST and NST were estimated with PERMUT 2.0 [88] based on the haplotype frequencies.
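To make the two diversity statistics concrete, the sketch below computes haplotype diversity with Nei's sample-size correction, Hd = n/(n − 1) × (1 − Σ p_i²), and nucleotide diversity as the mean number of pairwise differences per site. It assumes aligned sequences of equal length without gaps or missing data; DNASP treats such cases in its own way, so this is an illustration and not a reimplementation. The toy sequences are invented.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences, divided by the alignment length."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    toy = ["ACGT", "ACGT", "ACGA", "TCGA"]  # invented example data
    print(haplotype_diversity(toy), nucleotide_diversity(toy))
```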

For the InDel data, Nei's gene diversity index (H), Shannon’s information index (I), and the percentage of polymorphic loci (PPL) were calculated using POPGENE 1.31 [89]. An analysis of molecular variance (AMOVA) was implemented in ARLEQUIN 3.5 [90] and GENALEX 6.5 [91] to partition genetic variation within and between populations or species. Subsequently, the FST, FCT, and FSC values [92] were calculated based on the hierarchical AMOVA, with significance assessed using 1000 permutations.
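The three InDel summary statistics can be illustrated with a simplified Python sketch. It assumes, for illustration only, that the observed frequency p of the ‘1’ state at a locus is used directly as an allele frequency, giving H = 1 − p² − (1 − p)² and I = −[p ln p + (1 − p) ln(1 − p)] per locus, averaged over loci. POPGENE may estimate allele frequencies differently depending on the marker model, so its values need not match. The function name and toy matrix are hypothetical.

```python
from math import log
from statistics import mean


def binary_diversity(matrix):
    """Return (mean H, mean I, PPL in %) for a 0/1 matrix (rows: individuals, columns: loci)."""
    n_ind = len(matrix)
    # Frequency of the '1' state at each locus (assumed here to act as an allele frequency).
    freqs = [sum(col) / n_ind for col in zip(*matrix)]

    def shannon(p):
        # Terms with zero frequency contribute nothing to the index.
        return -sum(x * log(x) for x in (p, 1 - p) if x > 0)

    h_per_locus = [1 - p ** 2 - (1 - p) ** 2 for p in freqs]
    i_per_locus = [shannon(p) for p in freqs]
    ppl = 100 * sum(0 < p < 1 for p in freqs) / len(freqs)
    return mean(h_per_locus), mean(i_per_locus), ppl


if __name__ == "__main__":
    toy = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]  # invented example data
    print(binary_diversity(toy))
```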

Cluster analyses based on the maximum likelihood (ML) method and on Nei’s genetic distance were performed using MEGA 7.0 [93]. Bayesian clustering analysis (BCA) was employed to examine the similarity and divergence of genetic components between populations and was performed using STRUCTURE 2.2 [94] for both the SNP sequencing and InDel data. The posterior probability of each grouping number (K = 2–24) was estimated through 10 independent runs, each using 500,000 Markov chain Monte Carlo (MCMC) steps following a 1,000,000-step burn-in, to evaluate consistency. The best grouping number was evaluated by ΔK [95] in STRUCTURE HARVESTER 0.6.94 [96]. The 10 runs were aligned and summarized using CLUMPP 1.1.2 [97], and the results were plotted using DISTRUCT 1.1 [98].
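The ΔK criterion can be sketched in a few lines of Python. The version below uses one common form of the statistic, ΔK = |mean L(K + 1) − 2 × mean L(K) + mean L(K − 1)| / sd[L(K)], where L(K) is the log-likelihood of the data across replicate runs at a given K. The input values are invented for illustration and the function name is hypothetical; STRUCTURE HARVESTER should be used for real analyses.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Map K -> delta K from replicate log-likelihoods {K: [lnP run 1, lnP run 2, ...]}."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # Delta K needs both neighbours, so the smallest and largest K are skipped.
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when the replicate runs are identical
        result[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return result


if __name__ == "__main__":
    # Invented log-likelihoods for three replicate runs per K.
    toy = {
        1: [-1000, -1002, -998],
        2: [-800, -801, -799],
        3: [-790, -795, -785],
        4: [-788, -780, -796],
    }
    scores = delta_k(toy)
    print(scores, "best K:", max(scores, key=scores.get))
```

Because the statistic relies on both neighbouring values of K, it cannot be evaluated at the smallest or largest K tested.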

To test the genetic differentiation between populations or species, a discriminant analysis of principal components (DAPC) was implemented with the function dapc in the R package ‘adegenet’ [99], which first transforms the genetic data using principal component analysis (PCA) and then performs a discriminant analysis on the retained principal components. Because DAPC requires no a priori assumptions and uses synthetic variables that minimize variation within groups [100], it can help to evaluate objectively the artificial classification of O. taihangensis and O. longilobus. Kruskal–Wallis tests on the first two principal components (PCs) and the first two linear discriminants (LDs) of the DAPC were conducted to examine the genetic differentiation between populations and species.

Inference of population demographic history

A haplotype network was generated using the median-joining method in POPART 1.7 [101] to investigate the evolutionary relationships between the Opisthopappus haplotypes. BEAST 1.8.4 [102] was employed to estimate the differentiation and diversification times between haplotypes. Chrysanthemum indicum, which belongs to the same subtribe (Chrysantheminae) as Opisthopappus and has available genomic information, was selected as the outgroup in the BEAST analysis. The haplotype sequence of each primer was aligned to the NT (Nucleotide Sequence) database, followed by manual splicing. Because no fossil record is available for Opisthopappus, the divergence time between Chrysanthemum and Opisthopappus (25.40 Ma), obtained from the TimeTree website (http://www.timetree.org/), was adopted to calibrate the time to the most recent common ancestor (tMRCA).

The Akaike Information Criterion (AIC) with a “greedy” algorithm in PartitionFinder 2.1.1 [103] was employed to select the best-fit partitioning schemes and evolutionary models. Based on the AIC results, the dataset was partitioned into four groups (group1: SNP2 + SNP29, group2: SNP4 + SNP26, group3: SNP13 + SNP32, and group4: SNP19 + SNP23), and the phylogenetic relationships were inferred based on four optimal evolutionary models, namely HKY + I + G + X, HKY + I + G, SYM + I + G, and GTR + I + X, corresponding to group1 to group4, respectively. The generic average mutation rate of 6.1 × 10⁻⁹ (5.1 × 10⁻⁹ and 7.1 × 10⁻⁹) for the nuclear DNA of Asteraceae species was adopted following a previous study [75]. The Markov chain Monte Carlo (MCMC) analysis was run for 8 × 10⁷ generations, sampling every 80,000 generations. TRACER 1.5 [102] was used to check convergence and to ensure that the effective sample size (ESS) of every tested parameter was greater than 200.

To assess whether the species had experienced a significant expansion, we used ARLEQUIN 3.5 [90] to calculate Tajima’s D [104] and Fu’s FS [105]. Moreover, the sum of squared deviations (SSD) and the raggedness index (Rag) of the mismatch distribution analysis (MDA) were calculated in ARLEQUIN 3.5, with significance assessed using 1000 permutations.
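For readers unfamiliar with the neutrality statistic, the following Python sketch computes Tajima's D from a set of aligned sequences using the standard constants (a1, a2, b1, b2, c1, c2, e1, e2) of Tajima's 1989 formulation. It assumes equal-length sequences without gaps and treats any column with more than one state as a segregating site; it does not perform the significance test that ARLEQUIN provides. The toy sequences are invented.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences (the variance term is zero below that)")
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # Mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k_hat = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k_hat - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Only singleton variants: an excess of rare alleles gives a negative D.
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
    # Two equally frequent haplotypes: an excess of intermediate alleles gives a positive D.
    print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

A significantly negative D is commonly read as a signal of population expansion (or purifying selection), which is why the statistic is paired here with Fu's FS and the mismatch distribution.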

Approximate Bayesian computation (ABC) analysis, provided by DIY-ABC 2.1.0 [106], enabled the estimation of complex evolutionary population histories. Based on the estimated genetic variations, genetic structures, and current geographic distributions, three evolutionary scenarios were proposed. Scenario 1: O. longilobus and O. taihangensis were differentiated from a common ancestral population during the same period. Scenario 2: O. taihangensis was an ancestral population, and O. longilobus was differentiated from O. taihangensis. Scenario 3: O. longilobus was the ancestral population, and O. taihangensis was differentiated from O. longilobus.

For each scenario, 1,000,000 simulations were run, and six summary statistics (number of haplotypes, number of segregating sites, mean of pairwise differences, Tajima’s D and private segregating sites) were selected. The substitution rates of nuclear genes were the same as those used in the BEAST analysis. To identify the best-supported scenario under the direct and logistic approaches, we selected the 1% of simulated datasets closest to the observed data to evaluate model accuracy and to estimate the relative posterior probability (PP) with 95% confidence intervals (95% CI) for each scenario. Further, parameters including the effective population sizes and divergence times in generations were estimated under the optimal scenario. The goodness of fit of the best-supported scenario was evaluated using the ‘model checking’ option with principal component analysis (PCA). To estimate type I and type II errors and thus the power of model selection, we assessed confidence in scenario choice with 500 simulated pseudo-observed data sets (PODs) for the seven plausible scenarios.
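The idea behind the direct approach can be sketched as follows: simulations from all scenarios are ranked by their distance to the observed summary statistics, the closest fraction is retained, and the share of each scenario among the retained simulations serves as its relative posterior probability. This minimal Python sketch assumes that the distances have already been computed; the function name and toy data are hypothetical, and neither the logistic approach nor the confidence intervals reported by DIY-ABC are reproduced.

```python
from collections import Counter


def direct_posterior(simulations, fraction=0.01):
    """Relative posterior probability per scenario from (scenario, distance) pairs."""
    ranked = sorted(simulations, key=lambda item: item[1])
    n_keep = max(1, round(len(ranked) * fraction))
    kept = Counter(label for label, _ in ranked[:n_keep])
    labels = {label for label, _ in simulations}
    return {label: kept[label] / n_keep for label in labels}


if __name__ == "__main__":
    # Invented distances: scenario 3 simulations lie closest to the observed data.
    toy = (
        [(3, 0.001 * i) for i in range(100)]
        + [(1, 1.0 + 0.001 * i) for i in range(100)]
        + [(2, 2.0 + 0.001 * i) for i in range(100)]
    )
    print(direct_posterior(toy))
```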

Additionally, the historical and contemporary gene flow within the two Opisthopappus species were estimated using MIGRATE-N 3.6 [107] and BAYESASS 3.0 [108], respectively. In MIGRATE-N 3.6, maximum-likelihood analyses were performed using 10 short chains (10⁴ trees) and three long chains (10⁵ trees), with 10⁴ trees discarded as an initial “burn-in”, and a static heating scheme at four temperatures (1, 1.5, 3, and 1,000,000). To ensure the consistency of the estimates, we repeated this procedure five times and reported average maximum-likelihood estimates with 95% confidence intervals. The parameters θ and M were estimated using a Bayesian method and were used to calculate the number of migrants per generation (Nm) into each population with the equation 4Nm = θ × M.
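Rearranging the relation 4Nm = θ × M gives Nm = θ × M / 4, which the short Python sketch below applies. The function name and the input values are illustrative only and are not estimates from this study.

```python
def migrants_per_generation(theta, m_scaled):
    """Number of migrants per generation, Nm, from 4Nm = theta * M."""
    return theta * m_scaled / 4.0


if __name__ == "__main__":
    # Invented values for theta and M.
    print(migrants_per_generation(theta=0.01, m_scaled=200.0))
```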

When estimating the contemporary gene flow using BAYESASS 3.0, the parameters for migration rates (m), allele frequencies (a), and inbreeding coefficients (f) were examined to ensure that the acceptance rates of all three fell within the optimal range of 20–60%. Ten independent runs were executed to minimize convergence problems. The run with the lowest deviance was adopted according to the method of Meirmans [109], and the 95% credible interval was estimated as m ± 1.96 × standard deviation (SD).
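The two selection steps described above, keeping the run with the lowest deviance and forming the interval m ± 1.96 × SD, can be written as a minimal Python sketch. The run records, their field names, and the numbers are hypothetical; BAYESASS output has to be parsed separately.

```python
def best_run(runs):
    """Return the run record with the lowest deviance."""
    return min(runs, key=lambda run: run["deviance"])


def credible_interval_95(m, sd):
    """Approximate 95% credible interval as m +/- 1.96 * SD."""
    half_width = 1.96 * sd
    return m - half_width, m + half_width


if __name__ == "__main__":
    # Invented summaries of three independent runs.
    runs = [
        {"run": 1, "deviance": 10250.4, "m": 0.048, "sd": 0.012},
        {"run": 2, "deviance": 10231.9, "m": 0.050, "sd": 0.010},
        {"run": 3, "deviance": 10260.7, "m": 0.046, "sd": 0.015},
    ]
    chosen = best_run(runs)
    print(chosen["run"], credible_interval_95(chosen["m"], chosen["sd"]))
```

An interval that spans zero would suggest that the migration rate is not clearly distinguishable from no contemporary gene flow.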

Environmental variables influence analyses

Nineteen bioclimatic variables (Bioclim) representing Grinnellian niches [110, 111], which are defined as the scenopoetic environmental variables that a species requires to survive, were downloaded from the WorldClim database (http://www.worldclim.org/) at a resolution of 30 arc-sec (~1 × 1 km) and extracted using the R package ‘raster’ [112]. Subsequently, differences in the climatic factors between the two species were tested by one-way ANOVA. A principal component analysis (PCA) of the independent climatic variables was performed to reduce the dimensionality of the niche space, which allowed the overall climatic conditions of O. longilobus and O. taihangensis to be compared; PC1–PC3 were retained for further analysis.

To test how the geographical and environmental differences impacted genetic differentiation, the Mantel test, partial Mantel test, and Barrier analysis were applied in this study. Further, a multiple matrix regression with randomization (MMRR) was performed to explore whether the genetic distance responded to variations in geographic and/or environmental distances.

Pairwise FST calculated in ARLEQUIN 3.5 was used as the genetic distance. The geographic distance was estimated using GENALEX 6.5 according to three-dimensional factors (latitude, longitude, and elevation). The environmental distance was calculated as the Euclidean distance in PASSAGE 2.0 based on the first three PCs [113].
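As an illustration of the environmental distance, the sketch below builds a pairwise Euclidean distance matrix from per-population scores on three principal components. The coordinates are invented and the function name is hypothetical; the study itself used PASSAGE 2.0 for this step.

```python
import math


def euclidean_distance_matrix(points):
    """Symmetric matrix of Euclidean distances between rows of coordinates."""
    return [[math.dist(p, q) for q in points] for p in points]


if __name__ == "__main__":
    # Invented PC1-PC3 scores for three populations.
    pc_scores = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
    for row in euclidean_distance_matrix(pc_scores):
        print(row)
```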

The Mantel test was performed in the R package ‘vegan’ [114], whereas the MMRR analysis was performed using the R package ‘PopGenReport’ [115, 116]. The distance matrices were log-transformed to ensure that they were of the same or a similar order of magnitude. The Mantel correlation coefficient (r) and the MMRR coefficient of determination (r²), together with their significance, were determined based on 9,999 random permutations. Scatterplots showing the relationships between genetic, environmental, and geographic distances were drawn using GraphPad Prism 8 [117].
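The logic of a permutation-based Mantel test can be sketched in plain Python: the Pearson correlation between the upper triangles of two distance matrices is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. This is a minimal one-sided version under stated assumptions (square symmetric matrices, no log transformation, a fixed random seed for repeatability) and is not the ‘vegan’ implementation; all names and toy data are hypothetical.

```python
import random
from math import sqrt


def _upper(mat, order=None):
    """Upper-triangle values of mat, optionally after permuting rows and columns together."""
    n = len(mat)
    idx = order if order is not None else list(range(n))
    return [mat[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(a, b, permutations=9999, seed=0):
    """Return (r, one-sided p-value) for the association between distance matrices a and b."""
    rng = random.Random(seed)
    x = _upper(a)
    r_obs = _pearson(x, _upper(b))
    n = len(a)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if _pearson(x, _upper(b, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


if __name__ == "__main__":
    # Invented example: "genetic" distance is exactly twice the "geographic" distance.
    positions = [0, 1, 3, 7, 15]
    geo = [[abs(p - q) for q in positions] for p in positions]
    gen = [[2 * d for d in row] for row in geo]
    print(mantel(geo, gen, permutations=999))
```

Permuting rows and columns together, instead of shuffling individual cells, preserves the dependence among distances that share a population, which is the reason a permutation test is used here in place of an ordinary correlation test.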

The biogeographic boundaries between population pairs were identified with Monmonier’s maximum-difference algorithm in BARRIER 2.2 [118] based on the multiple distance matrices. Permutation and bootstrap tests were conducted with 1000 replicates for each case (Fig. 1).

In addition, distance-based redundancy analyses (dbRDA) were performed using the R package ‘vegan’ to elucidate whether the climatic variables, conditioned on the geographic distribution, explained the genetic differentiation of the populations. First, a distance-based principal coordinate analysis (PCoA) of the genetic data at the species level was performed to generate several principal coordinates (PCs) using the R package ‘ape’ [119]. Next, PC1–3 of the climatic variables were employed as explanatory variables conditioned on geographic factors, and significance tests were performed using the “anova.cca” [120] function in the R package ‘vegan’ with 999 permutations. The distribution pattern of PC1–3 of the climatic variables along ordination axes 1–2 was further analyzed using a generalized linear model (GLM). Finally, the first two RDA axes and the explanatory variables were employed to construct the ordination and ordisurf plots of the dbRDA.