Background

Ferula L. is a genus of Apiaceae [1] that was once classified in the tribe Peucedaneae [2, 3] but is now placed in the tribe Scandiceae [4,5,6]. The genus comprises about 180–185 species worldwide [7], is distributed across the Mediterranean region, Siberia, Central Asia, and northern Africa [3, 8, 9], and grows mostly in mountainous regions and on desert clay soils [8, 10]. Ferula is chiefly recognized by its prominent taproots, stout stems, finely divided leaves with large inflated sheaths, and strongly dorsally compressed mericarps with filiform or prominent dorsal ribs, narrowly or broadly winged marginal ribs, and a plane or slightly concave commissural face [1, 6]. However, because of the great variation in leaves, inflorescences, and mericarp anatomy, distinguishing this genus from closely related genera is extremely difficult, and the taxonomic delimitation of Ferula has long been contentious. Pimenov [11, 12] suggested that Talassia and Soranthus should be transferred into Ferula on the basis of fruit anatomy, namely the presence of a sclerotic cell layer in the mesocarp. Based on type specimens and morphological features, Pimenov [13] summarized the nomenclatural combinations of Ferula in China and merged S. meyeri and T. transiliensis into Ferula. However, Qin and Shen [53], looks forward to working out this difficulty. In animals, the mitochondrial gene cytochrome oxidase 1 has been confirmed to be a reliable and valid DNA barcode for species identification [54, 55]. In plants, the common DNA barcodes, including trnH-psbA, matK, and rbcL, are insufficient to accurately identify species [56, 57]. The variation of the rbcL gene was relatively low (Pi = 0.00161) in the 22 studied plant species. As a result, this region may be of limited use for accurately delimiting Ferula species.

Based on sequence variation, we chose five protein-coding regions (ycf1, ndhF, matK, rps11, and rpl22) and eight non-coding regions (ycf15/trnV, trnH/psbA, trnG/trnR, trnR/atpA, psbI/trnS, rps15/ycf1, rps2/rpoC2, and ycf3/trnS) as potential markers for species identification in Ferula. Among them, the trnH-psbA region is one of the universal DNA barcodes [57], and ycf1 and rpl22 have been proposed as promising DNA barcodes in some plants [58, 59]. In future research, we will examine whether these sequences can serve as valid DNA barcodes for species identification in Ferula.

Phylogenetic analyses

Consistent with previous results obtained by Kurzyna-Młynik et al. [6] based on nrITS data and by Panahi et al. [18] based on nrITS and three plastid DNA regions (the rps16 and rpoC1 introns and the rpoB-trnC intergenic spacer), our plastome-based phylogeny robustly supported T. transiliensis and S. meyeri as nested within Ferula. This relationship was also recovered in our ITS-based phylogenetic tree, although with weak support. Hence, transferring T. transiliensis and S. meyeri into Ferula is reasonable, and their names should be F. transiliensis [60] and F. sibirica [11], respectively. Additionally, our well-resolved phylogeny indicated that T. transiliensis and S. meyeri are more closely related to F. conocaula and F. syreitschikowii than to the other Ferula species. However, because of the limited sampling of Ferula in our study and the maternal inheritance of the plastome, their phylogenetic positions within Ferula need to be explored further in future studies.

The infrageneric taxonomy of Ferula has been inconsistent across previous studies. Korovin et al. [19, 61] divided Ferula into six subgenera and eight sections based on vegetative features and habit. In Flora Reipublicae Popularis Sinicae [15], the Ferula species growing in China were placed in four subgenera and four sections [15, 19]. However, Panahi et al. [17] proposed a new classification comprising four subgenera and eight sections based on molecular phylogenetic results.

In our study, the 22 species were divided with strong support into two lineages: one comprised F. olivacea, F. paeoniifolia, and F. kingdon-wardii (lineage I); the other contained the remaining species (lineage II). This result was further supported by the species' geographical distributions and mericarp structures. The members of lineage I are distributed in alpine meadows and rock crevices of cliffs in Yunnan and Sichuan Provinces [1, 62]; the mericarps of these three species have very prominent dorsal and lateral ribs, with two vascular bundles present in the dorsal and lateral ribs [63]. In contrast, the members of lineage II occur on gravelly slopes and desert gravels in Xinjiang and other provinces; their mericarps have filiform or slightly prominent dorsal and lateral ribs with one vascular bundle [15, 63]. Combining the robust phylogenetic framework with morphological characteristics, our results strongly supported the establishment of subgenus Sinoferula and subgenus Narthex [17]. However, our results showed that F. licentiana should be placed in subgenus Narthex and that F. paeoniifolia should be added to subgenus Sinoferula. In addition, our results suggest that the infrageneric taxonomy of Ferula in Flora Reipublicae Popularis Sinicae [15] is inappropriate.

Adaptive evolution of the Ferula plastome

Ferula species mostly grow in hot, high-light, and arid environments, and we therefore speculated that several genes might have experienced adaptive evolution [1]. As expected, the BEB test identified 12 genes containing codon sites with significant posterior probabilities. Codon sites with high posterior probabilities can be regarded as positively selected sites, and genes possessing such sites may have evolved under selective pressure [64]. Therefore, the 12 genes detected in our study may have undergone positive selection. They comprise two ATP synthase subunit genes (atpB and atpF), five NADH dehydrogenase genes (ndhA, ndhC, ndhI, ndhJ, and ndhK), one photosystem II gene (psbK), one large ribosomal subunit gene (rpl20), and three RNA polymerase subunit genes (rpoB, rpoC1, and rpoC2). The largest group (ndhA, ndhC, ndhI, ndhJ, and ndhK) encodes NADH dehydrogenase subunits, which are fundamental to the electron transport chain for ATP generation and to photosynthesis in plants [65, 66]. Wang et al. [67] found that, under high-temperature stress, NADH dehydrogenase could drive cyclic electron flow around PSI, diverting electrons to protect plants from injury and providing the ∆pH for CO2 assimilation for a certain period of time. Therefore, these genes under positive selection may have helped Ferula species avoid injury and thrive in arid, high-light environments. Additionally, several codon sites with significant posterior probabilities were found in the rpo genes (rpoB, rpoC1, and rpoC2). The rpoB gene encodes the β-subunit of RNA polymerase in plastomes [68], and the rpoC2 gene encodes another RNA polymerase subunit that is responsible for the expression of photosynthetic genes [69].
Previous research indicated that RNA polymerase not only maintains the essential metabolic processes needed for survival but also regulates gene transcription and expression, helping species respond to changing environmental conditions [70, 71]. Moreover, through comparative experiments, Gao et al. [72] revealed that the rpoC2 gene underwent strong positive selection in sun-loving rice species, suggesting that this gene is important for their adaptation to sunlit habitats. Hence, the rpo genes under positive selection in our analysis may contribute to the adaptation of Ferula species to high-light environments. Furthermore, the atpF gene, which encodes one of the subunits of H+-ATP synthase, plays a crucial role in electron transport and photorespiration in plants [73]. In a previous study, this gene was found to be positively selected in two evergreen Quercus species compared with two deciduous Quercus species, which could help the evergreen species resist cold and drought stress [74]. Ferula species generally grow and develop in early spring and live in arid desert areas [15, 75]; thus, the atpF gene may be important for their environmental adaptation. In brief, these positively selected genes may have benefited the development and reproduction of Ferula species and played an important role in their adaptation to the harsh environments in which they grow.

Conclusion

In our study, we sequenced and assembled 22 plastomes of Ferula, Talassia, and Soranthus species. Comparative analysis of the plastomes revealed conservation in genome structure, gene number, codon usage, and repeat types and distribution, but variation in plastome size, GC content, and the SC/IR boundaries. Thirteen mutation hotspot regions were detected and have potential as DNA barcodes for species identification in Ferula and related genera. Based on the phylogenetic analysis of Ferula using 22 plastomes and 62 ITS sequences, we agree with previous studies proposing that Talassia and Soranthus should be placed in Ferula. Our results also supported the monophyly of subgenus Sinoferula and subgenus Narthex. The plastome-based phylogeny highlighted the strength of plastome data, which contained more variable sites and greatly improved the resolution of the phylogeny of the studied species. In addition, twelve genes containing codon sites with significant posterior probabilities may have helped Ferula species adapt to their harsh environments. Our study offers a new perspective for further research on the phylogeny and evolution of Ferula species.

Methods

Plant materials and DNA extraction

Fresh leaves from adult plants of the 22 species were collected at their respective collection sites and immediately dried in silica gel for DNA extraction. Total genomic DNA was extracted from the dried leaf tissue using a plant DNA extraction kit (Cwbio Biosciences, Beijing, China). The formal identification of the collected samples was undertaken by Associate Professor Songdong Zhou (Sichuan University). Voucher specimens were deposited at the herbarium of Sichuan University (Chengdu, China), and their deposition numbers are listed in Additional file 11: Table S8. The 22 newly sequenced ITS sequences have been submitted to NCBI (Additional file 8: Table S5).

Plastome sequencing and assembly

The raw reads of the 22 newly sequenced species were generated on the Illumina HiSeq X Ten platform (paired-end, 150 bp) at Novogene (Tianjin, China). The raw reads were filtered using fastp v0.15.0 (-n 10 and -q 15) to yield clean reads [76]. The clean reads were then assembled into plastomes using NOVOPlasty v2.6.2 [77] with default parameters, using the rbcL gene of F. bungeana (MK749921.1), downloaded from NCBI, as the seed. The assembled genomes were initially annotated with PGA [78] and then adjusted manually in Geneious v9.0.2 [79]. The plastomes of non-Ferula species obtained from NCBI were re-annotated using the same method. Finally, the plastid genome maps were drawn using Chloroplot [80].

Repeat sequences and codon usage

The Perl script MISA (http://pgrc.ipk-gatersleben.de/misa/) was used to identify simple sequence repeats (SSRs) in the plastome sequences. The minimum numbers of repeat units were set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively. The REPuter online program [81] was used to search for forward (F), palindromic (P), reverse (R), and complementary (C) repeats with the following parameters: (1) a repeat size of over 30 bp; (2) more than 90% sequence identity between the two repeat copies; and (3) Hamming distance = 3. Then, the protein-coding genes were extracted from the 22 plastid genomes for codon usage analysis with CodonW v1.4.2 [82].
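
As an illustration of the SSR thresholds listed above, the following minimal Python sketch finds perfect repeats that meet the minimum unit counts for each motif length. It is not MISA itself and does not reproduce MISA's handling of compound or interrupted repeats; the function name `find_ssrs` and the constant `MIN_REPEATS` are hypothetical names introduced here for illustration.

```python
import re

# Minimum number of repeat units per motif length, as given in the text
# (mono- to hexanucleotide: 10, 5, 4, 3, 3, 3).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified illustration only: 0-based start positions, perfect repeats,
    unambiguous bases (A, C, G, T) only.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of length `size`, followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" reported as a dinucleotide instead of mono "A").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)
```

Applied to a plastome sequence read from a FASTA file, the returned tuples could be tallied by motif length to give per-class SSR counts of the kind compared among species.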

Genome structure and sequence diversity

The contraction and expansion of the IR regions at the plastome borders were analyzed with the online program IRscope [83]. The sizes and positions of the genes were then adjusted manually. The sequence identity of the whole plastomes was assessed and visualized with the online program mVISTA [84] in Shuffle-LAGAN mode, with F. sinkiangensis as the reference. Nucleotide diversity of the coding genes and intergenic regions was calculated with DnaSP v5 [85].
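
For readers unfamiliar with nucleotide diversity (Pi), the sketch below computes it as the average number of pairwise differences per site across an alignment. This is a simplified stand-in for the DnaSP calculation, assuming that alignment columns containing gaps or ambiguous bases are excluded entirely; the function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned sequences.

    Assumption: columns with any character other than A, C, G, or T
    (gaps, ambiguity codes) are dropped before counting differences.
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [s.upper() for s in seqs]
    # Keep only fully resolved alignment columns.
    cols = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not cols:
        return 0.0
    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        total_diffs += sum(a[i] != b[i] for i in cols)
        n_pairs += 1
    return total_diffs / (n_pairs * len(cols))
```

Running such a function on each gene or intergenic alignment in turn yields one Pi value per region, which is how regions can be ranked to locate candidate mutation hotspots.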

Phylogenetic analysis

To investigate the phylogeny of Ferula, 42 plastomes and 62 nuclear ITS sequences were used to reconstruct phylogenetic trees (Table S5). Chamaesium jiulongense X. L. Guo & X. J. He and Bupleurum commelynoideum de Boiss. were selected as outgroups to root the trees, following the results of Zhou et al. [86]. For the plastome data, 80 single-copy protein-coding sequences (CDs) shared by all 42 plastomes were extracted using Phylosuite v.1.2.2 [87] and aligned individually with MAFFT v7.221 [88]. These alignments were then concatenated into a supermatrix in Phylosuite v.1.2.2 [87]. The nrITS sequences were aligned with MAFFT v7.221 [88].
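
The concatenation step described above can be sketched as follows. This is an illustrative sketch rather than the Phylosuite implementation: it joins per-gene alignments in a fixed taxon order and records the coordinates of each gene in the resulting supermatrix. The function name `concatenate_alignments` and the input layout (a dictionary mapping gene name to a dictionary of taxon name to aligned sequence) are assumptions made for this example.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into a supermatrix.

    gene_alignments: dict of gene name -> {taxon name: aligned sequence}.
    Returns (supermatrix, partitions), where supermatrix maps each taxon to
    its concatenated sequence and partitions lists (gene, start, end) with
    1-based inclusive coordinates.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    offset = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences of %s are not aligned" % gene)
        length = lengths.pop()
        for taxon in taxa:
            # Safety net only: a taxon lacking a gene is padded with gaps.
            # In the analysis described here every gene is shared by all taxa.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, offset + 1, offset + length))
        offset += length
    supermatrix = {t: "".join(p) for t, p in pieces.items()}
    return supermatrix, partitions
```

The partition coordinates are kept because downstream tools can use them to apply separate substitution models to individual genes, although the analysis described here applies a single model to the concatenated matrix.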

The CDs and nrITS data sets were then subjected to Maximum-Likelihood (ML) and Bayesian Inference (BI) analyses. For the ML analysis, phylogenetic trees were generated with RAxML 8.2.8 [89] using the GTRGAMMA model, as suggested in the RAxML manual, and 1,000 bootstrap replicates. The BI analysis was conducted in MrBayes v.3.2.5 [90], with the TVM + I + G and GTR + I + G substitution models, determined by Modeltest v3.7 [91], applied to the plastome and ITS data, respectively. The Markov chain Monte Carlo (MCMC) algorithm was run for one million generations, with one tree sampled every 100 generations. The first 25% of trees were discarded as burn-in, and the remaining trees were used to build the consensus tree. The phylogenetic trees were displayed and edited in FigTree v1.4.2 [92].

Positive selection analysis

The optimized branch-site model [93] and the Bayesian Empirical Bayes (BEB) method [64] were used to identify genes that were positively selected in Ferula species compared with the non-Ferula species. Single-copy protein-coding regions of the 42 plastomes were extracted and aligned by codon using ClustalW [94], and the alignments were then trimmed. Finally, the trimmed alignments were used for positive selection analysis with the CODEML program of the PAML package [95], run through EasyCodeml [96], under the branch-site model with the Ferula clade designated as the foreground branch. The BEB method was used to compute the posterior probabilities of amino acid sites in order to determine whether these sites were positively selected with high posterior probability [64]. Likelihood-ratio tests (LRTs) were implemented following Lan et al. [97]; a gene with a p-value < 0.05 was regarded as positively selected. We then used Jalview v.2.11.1.7 [98] to view the amino acid sequences of the positively selected genes.
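
The likelihood-ratio test mentioned above compares the log-likelihood of the null branch-site model with that of the alternative model. The sketch below shows the arithmetic under one common convention, namely comparing twice the log-likelihood difference against a chi-square distribution with one degree of freedom; whether this exact convention matches the cited procedure is an assumption. The function name `branch_site_lrt` and the parameter `alpha` are hypothetical, and the significance cutoff is deliberately left as an argument rather than fixed in the code.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt, alpha):
    """Likelihood-ratio test between nested null and alternative models.

    lnl_null, lnl_alt: log-likelihoods reported for the two models.
    alpha: significance cutoff chosen by the user.
    Returns (statistic, p_value, is_significant).
    Assumption: the statistic follows a chi-square distribution with
    one degree of freedom.
    """
    # The alternative model cannot fit worse in theory; clamp small
    # negative values caused by optimization noise to zero.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value, p_value < alpha
```

Using the closed-form survival function keeps the example free of third-party dependencies; a statistics library could be substituted where other degrees of freedom are needed.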

Morphological observations of mericarps

The overall structure of the mericarps in dorsal and commissural views, and their anatomical structures, including the transverse section, rib shape, and vittae, were observed and photographed in 12 species using a stereomicroscope (SMZ25, Nikon Corp., Tokyo, Japan). Mature mericarps were selected randomly and measured using KaryoType [99]. Mericarp terminology follows Kljuykov et al. [100].