Introduction

The chloroplast (cp) is a specialized eukaryotic organelle whose genetic material is mainly maternally inherited. A core set of its genes originated from a cyanobacterial ancestor and is mostly involved in photosynthesis and metabolic processes1,2,3,4. The chloroplast genome is small, roughly 120–180 kilobases in length5. Advances in modern sequencing technologies have boosted the study of chloroplast genetics and genomics, and chloroplast genome sequences have revealed considerable sequence and structural variation within and between plant species. For example, three types of mutations in land plant chloroplast genomes, namely gene/intron loss, inverted repeat changes and inversions, can alter gene order and are often referred to as structural changes or rearrangements5. To date, chloroplast genomes have been widely used as markers for species identification and for phylogenetic and population analyses6,7,8.

Filipendula Mill. (Rosoideae, Rosaceae) is a genus of perennial herbs comprising approximately 15 species, which generally grow in high mountains of temperate regions9. The genus is distributed mainly in East Asia, Europe and North America10. Filipendula species have long been used for medicinal purposes, and most published studies have focused on their medicinal properties11,12. Their aerial parts (leaves and flowers) and underground organs (roots) are rich sources of bioactive substances, including tannins, polyphenolic acids and essential oils, which have antioxidant, anticancer, anti-inflammatory, gastroprotective, anti-hyperalgesic, anti-genotoxic and hepatoprotective effects13,14. In addition, Filipendula leaves are processed into herbal tea in Russia and neighbouring Siberian regions, where the tea is used to relieve influenza and gout and to clean wounds and eyes15.

The classification of the genus Filipendula has long been controversial16. Juzepczuk16 divided the genus into three subgenera and two sections, based mainly on the indigenous species. Shimizu17 later revised this system and classified the 15 species of the whole genus into two monotypic subgenera (Hypogyna T. Shimizu and Filipendula) and one large subgenus (Ulmaria Moench) with four sections (Ulmaria Hill, Albicoma Juz., Sessilia T. Shimizu and Schalameya Juz.). In 1967, Sergievskaya amended the two earlier systems and divided the genus into four subgenera: the three subgenera of Shimizu’s system and subgenus Aceraria of Juzepczuk’s system16. Within these four subgenera, only Shimizu’s sect. Ulmaria was retained in subgenus Ulmaria, and the remaining sections were transferred to subgenus Aceraria. In the most recent taxonomic revision of the genus, Schanzer9 divided it into four sections (Hypogyna, Schalameya, Albicoma and Filipendula), based mainly on morphological and geographic data. As a result, the four systems are partly incongruent with each other, and species names from the different systems are still in use.

To date, few studies have addressed the diversity and phylogeny of Filipendula. Only a few studies have shown that isozymes18 and microsatellites19 can be used as markers to assess genetic variation in F. vulgaris. Phylogenetic investigations of Rosoideae and Rosaceae have revealed that Filipendula is monophyletic and sister to the rest of the subfamily Rosoideae20,21,22. Several lines of evidence indicate that species in basal lineages can exhibit unique chloroplast genome structures. For example, a single inversion served as a powerful phylogenetic marker for identifying the basal members of the Asteraceae23. In another case, two inversions and an expansion of the IR clarified the basal nodes of leptosporangiate ferns24. Whether a similar pattern distinguishes Filipendula from other genera of the subfamily Rosoideae is unknown, because basic knowledge of the Filipendula chloroplast genome is lacking and no chloroplast phylogeny of Filipendula species has been reported until now. Moreover, the infrageneric phylogenetic relationships of Filipendula have only been analysed using a single nuclear marker (ITS)10. Therefore, the present study aimed to provide the first chloroplast genome data for comparative analysis, to reconstruct the infrageneric phylogeny of Filipendula based on eight cp genomes (F. vestita, F. ulmaria, F. palmata (including two varieties, F. palmata var. palmata and F. palmata var. glabra), F. angustiloba, F. vulgaris, F. camtschatica and F. multijuga) and to explore the evolutionary history of this genus.

Results

Characterization and structural analyses of eight Filipendula cp genomes

In this study, we assembled eight cp genomes from seven Filipendula species, with F. palmata represented by two varieties (Fig. 1). The genomes had an average size of 154,522 bp (range 154,205–154,633 bp) and a GC content of 36.63% (Table 1). Each genome showed the typical circular, quadripartite structure, in which two copies of an inverted repeat (IR) separate the large and small single-copy regions (LSC and SSC) (Fig. 2). Region sizes varied little among genomes: the LSC was the largest (82,851–83,295 bp), followed by the IR (27,093–27,286 bp) (Fig. 2 and Table 1). A total of 126 genes were annotated in each Filipendula cp genome except F. camtschatica, comprising 81 protein-coding genes (PCGs), 37 tRNA genes and 8 rRNA genes. Notably, F. camtschatica had one gene fewer because rpl14 was not found (Table 1). Most PCGs were involved in photosynthesis and metabolism (Table S1). Sixteen genes were duplicated in the IR regions, and 16 genes (petB, petD, atpF, ndhA, ndhB, rpoC1, rps16, rps12, rpl16, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC, ycf3) contained introns; two of them (ycf3 and rps12) had two introns each, and the rest had one (Table S1).
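The quadripartite layout and GC content described above can be made concrete with a short Python sketch. It assumes a single plastome sequence and 1-based end coordinates for the LSC, IRb and SSC; the function names and the toy sequence are hypothetical and do not represent the annotation pipeline used in this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def quadripartite_regions(seq: str, lsc_end: int, irb_end: int, ssc_end: int) -> dict:
    """Split a plastome into LSC, IRb, SSC and IRa.

    lsc_end, irb_end and ssc_end are hypothetical 1-based inclusive end
    coordinates; IRa runs from ssc_end + 1 to the end of the sequence.
    """
    return {
        "LSC": seq[:lsc_end],
        "IRb": seq[lsc_end:irb_end],
        "SSC": seq[irb_end:ssc_end],
        "IRa": seq[ssc_end:],
    }


def irs_are_inverted_copies(regions: dict) -> bool:
    """True if IRa is the exact reverse complement of IRb."""
    return reverse_complement(regions["IRb"]) == regions["IRa"].upper()


# Toy plastome: LSC + IRb + SSC + reverse-complemented IRb as IRa.
toy = "AAAAAGC" + "ACCG" + "TT" + reverse_complement("ACCG")
regions = quadripartite_regions(toy, lsc_end=7, irb_end=11, ssc_end=13)
print({name: len(part) for name, part in regions.items()})
print(irs_are_inverted_copies(regions), round(gc_content(toy), 2))
```

Checking that IRa is the reverse complement of IRb is a simple sanity test of the assumed boundaries; real IR copies may differ by a few bases, so an exact-match check is stricter than needed in practice.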

Figure 1

Photographs of Filipendula palmata var. palmata (A–C), F. vestita (D and E) and F. ulmaria (F and G). Photos by Jie Cai, Ting Zhang and Ji-Dong Ya from Kunming Institute of Botany, Chinese Academy of Sciences.

Table 1 Summary of complete plastomes of Filipendula species.
Figure 2

Circular map of the Filipendula chloroplast genome. The inner grey circle indicates the GC content along the genome. Genes drawn inside the circle are transcribed clockwise, and genes drawn outside it counterclockwise.

To further analyse the structure of the eight Filipendula cp genomes, we performed multiple alignments, which revealed identical gene order and orientation across all the Filipendula species tested (Figure S1), consistent with the circular map of the Filipendula cp genome (Fig. 2). Previous studies have indicated that IR variation plays an important role in the stability of plastome structure5,25. We therefore compared the IR/SC boundaries among the eight Filipendula cp genomes. The IR/LSC boundaries were highly conserved: the LSC/IRb and LSC/IRa junctions (JLB and JLA) were flanked by rps19 and trnH, located 8 bp from the 5’ end and the 3’ end of these two genes, respectively (Fig. 3). In contrast, the IR/SSC junctions showed a few differences. ψycf1 and ycf1 spanned the IRb/SSC (JSB) and IRa/SSC (JSA) junctions, respectively. The distances from JSB to the ends of ψycf1 and from JSA to the ends of ycf1 varied among species because the lengths of ψycf1 and ycf1 differed slightly in Filipendula (Fig. 3). The nearly invariant IR may therefore contribute to the stability of plastome structure in this genus. Altogether, these results indicated that cp genome structure was evolutionarily conserved in Filipendula.
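The IR/SC comparison above records, for each junction, how far a flanking gene lies from it or how far a spanning gene extends on each side. A minimal Python sketch of that bookkeeping, assuming 1-based inclusive coordinates on a linear representation of the genome (strand and wrap-around at the origin are ignored); the Feature class, gene names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def junction_relation(feature: Feature, junction: int) -> tuple:
    """Locate a feature relative to a junction lying between positions
    `junction` and `junction + 1`.

    Returns ("spans", bp_left, bp_right) if the feature crosses the junction,
    ("left", gap_bp) if it ends before it, or ("right", gap_bp) if it starts after it.
    """
    if feature.start <= junction < feature.end:
        return ("spans", junction - feature.start + 1, feature.end - junction)
    if feature.end <= junction:
        return ("left", junction - feature.end)
    return ("right", feature.start - junction - 1)


# Hypothetical junction and gene coordinates, for illustration only.
junction = 200
for feat in [Feature("gene_a", 100, 192), Feature("gene_b", 150, 250), Feature("gene_c", 209, 280)]:
    print(feat.name, junction_relation(feat, junction))
```

Reporting the two partial lengths of a spanning gene, rather than a single distance, mirrors how boundary figures usually annotate genes such as ycf1 that cross an IR/SSC junction.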

Figure 3

Comparison of the junctions between the four regions (LSC, IRb, SSC and IRa) of the chloroplast genomes among Filipendula species.

However, Filipendula cp genomes showed structural differences from those of other genera of Rosoideae. First, Filipendula cp genomes contained fewer genes: rps4, rpl2 and rpl32 were absent compared with other genera of Rosoideae (Fig. 4). Second, the gene order within three sequence blocks (ndhC and trnT-UGU, rps12 and accD, trnS-GGA and trnfM-CAU) was highly conserved in other genera of Rosoideae but differed markedly in Filipendula (Fig. 4). Further analysis indicated that at least three inversions had occurred in the cp genomes of Filipendula species (Fig. 4). In addition, Filipendula retained a plesiomorphic gene order, similar to that of other Rosoideae genera, within two blocks (psbM and trnG-GCC; trnV-UAC and rbcL). However, the positions of these two blocks within the Filipendula cp genomes clearly differed from those in other genera of Rosoideae (Fig. 4). These block transpositions contributed to the divergent chloroplast gene order between Filipendula and other genera of Rosoideae (Fig. 4). In summary, the cp genomes of Filipendula differed considerably in structure from those of other genera of Rosoideae, showing at least three inversions, the transposition of two blocks within the LSC, and gene losses.
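Inferring a minimum number of inversions from gene-order data is commonly approached through breakpoints. The Python sketch below assumes each genome is reduced to an ordered list of (gene, strand) pairs with identical gene content (lost genes removed beforehand) and counts breakpoints of one order relative to another. Because a single inversion can remove at most two breakpoints, half the breakpoint count, rounded up, is a lower bound on the number of inversions. This is a textbook illustration, not the method used to infer the inversions reported here; the gene orders are hypothetical, and since transpositions also create breakpoints, the bound does not separate the two kinds of rearrangement.

```python
import math


def signed_permutation(reference, query):
    """Express `query` as a signed permutation of `reference`.

    Both arguments are lists of (gene, strand) pairs with strand +1 or -1 and
    the same gene content. Gene i of the reference (1-based) becomes +i, or -i
    if its orientation in the query is opposite to that in the reference.
    """
    index = {gene: i + 1 for i, (gene, _) in enumerate(reference)}
    ref_strand = dict(reference)
    return [index[gene] * strand * ref_strand[gene] for gene, strand in query]


def count_breakpoints(perm):
    """Adjacencies not of the form (x, x + 1), after framing with 0 and n + 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def inversion_lower_bound(reference, query):
    """Each inversion removes at most two breakpoints."""
    return math.ceil(count_breakpoints(signed_permutation(reference, query)) / 2)


# Hypothetical five-gene orders: the query carries one inversion of genes B and C.
ref = [("A", 1), ("B", 1), ("C", 1), ("D", 1), ("E", 1)]
qry = [("A", 1), ("C", -1), ("B", -1), ("D", 1), ("E", 1)]
perm = signed_permutation(ref, qry)
print(perm, count_breakpoints(perm), inversion_lower_bound(ref, qry))
```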

Figure 4

Structural variation between the plastomes of Filipendula and 15 representative genera of Rosoideae.

Repeats in plastomes may be associated with inversion endpoints5. In the present study, four types of repeats (palindromic, forward, reverse and complement repeats) were detected in Filipendula cp genomes. The total number of repeats ranged from 273 to 321 (Fig. 5), exceeding the number reported for another Rosaceae genus, Sorbus26. Filipendula camtschatica had the most repeats: 152 forward, 6 reverse, 5 complement and 158 palindromic repeats (Fig. 5). Forward and palindromic repeats were likewise the two major repeat types in the other six Filipendula species (Fig. 5). Although complement (2–6) and reverse (5–8) repeats were few, they were present in every Filipendula species (Fig. 5). Most repeats were located in intergenic regions (Table S2), but some occurred in the coding or intron sequences of several genes, such as trnG-UCC, trnG-GCC, trnL-UAA, accD, psaA, psaB, clpP, ycf1, ycf2, ycf3, ycf4, petB and ndhF (Table S2). Interestingly, all of these genes except trnG-UCC, ycf1, ycf2, petB and ndhF were located within the three inversion blocks and two transposition blocks (Fig. 4). Of the six genes at inversion endpoints, only accD contained repeats; no repeats were found in the others (ndhC, rps12, trnfM-CAU, trnS-GGA and trnT-UGU) (Table S2). Notably, rps12 was duplicated at the endpoint of the rps12-accD inversion in Filipendula (Fig. 4).
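The four repeat classes differ only in how the second copy relates to the first: forward copies are identical, reverse copies read backwards, complement copies are base-complemented, and palindromic copies are reverse complements. The Python sketch below illustrates these definitions by enumerating exact repeat pairs of a fixed, hypothetical length k. It is a simplified stand-in for dedicated tools such as REPuter: it reports every matching pair of k-mer windows rather than merging them into maximal repeats, and it ignores mismatches and the circular topology of the plastome.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# How the second copy of each repeat type relates to the first.
TRANSFORMS = {
    "forward": lambda s: s,
    "reverse": lambda s: s[::-1],
    "complement": lambda s: s.translate(COMPLEMENT),
    "palindromic": lambda s: s.translate(COMPLEMENT)[::-1],
}


def repeat_pairs(seq: str, k: int):
    """Return (type, i, j) for every pair of k-mer windows i < j forming a repeat."""
    seq = seq.upper()
    starts = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i)
    pairs = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for rtype, transform in TRANSFORMS.items():
            # Each transform is its own inverse, so requiring j > i reports
            # every pair exactly once and skips self-matches.
            for j in starts.get(transform(kmer), []):
                if j > i:
                    pairs.append((rtype, i, j))
    return pairs


pairs = repeat_pairs("ACTGTCA", k=3)
print(pairs)
print(Counter(rtype for rtype, _, _ in pairs))
```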

Figure 5

Numbers of the four repeat types detected in the eight Filipendula chloroplast genomes.

Genomic sequence divergence analysis in Filipendula

To better understand sequence divergence among Filipendula species, the eight whole plastomes were compared and their sequence identity was analysed with the mVISTA program, using the cp genome of F. angustiloba as the reference. The whole cp genomes of Filipendula species were relatively conserved: the LSC region showed the highest divergence, whereas the IR regions were the most conserved (Figure S2). In addition, high sequence divergence occurred mainly in noncoding regions, and only a few genes (accD, clpP, ycf1 and ycf2) were divergent in their coding regions (Figure S2).

SSRs are short tandem repeats with 1–6 bp motifs. They are highly polymorphic, widely distributed in plant plastomes and commonly used as markers for species identification and phylogenetic analyses27,28,29. In this study, mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats were analysed. The Filipendula cp genomes contained from 105 (F. vulgaris) to 123 (F. camtschatica) SSRs (Fig. 6A, Table S3). Most SSRs were mononucleotide repeats (66.07%, 56.91%, 60.91%, 65.77%, 66.07%, 63.72%, 58.93% and 63.81% in F. angustiloba, F. camtschatica, F. multijuga, F. palmata var. glabra, F. palmata var. palmata, F. ulmaria, F. vestita and F. vulgaris, respectively), consisting mainly of A or T (Fig. 6A,B). Dinucleotide repeats were the second most abundant type and were dominated by AT/AT motifs (Fig. 6A,B). Tri- and tetranucleotide repeats were few, but both were present in every Filipendula plastome (Fig. 6A,B, Table S3). By contrast, penta- and hexanucleotide repeats were found in only a few Filipendula plastomes: a small number of pentanucleotide repeats occurred only in F. multijuga, F. camtschatica and F. vulgaris, and a single hexanucleotide repeat occurred only in F. vulgaris (Fig. 6A).
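SSR detection of this kind scans a sequence for perfect tandem repeats whose copy number meets a minimum that depends on motif length, which is a simplified version of what tools such as MISA do. A minimal Python sketch, assuming perfect repeats only (no compound or imperfect SSRs, and none spanning the circular origin) and illustrative minimum copy numbers of 10, 5, 4, 3, 3 and 3 for motif lengths 1 to 6; these thresholds are an assumption, not necessarily the settings used in this study.

```python
from collections import Counter

# Illustrative minimum copy numbers per motif length (an assumption).
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit: str) -> bool:
    """False for units that are repeats of a shorter unit, e.g. "AA" or "ATAT"."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str, min_copies=MIN_COPIES):
    """Return (start, motif, copies) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for n in sorted(min_copies):  # try the shortest motif first
            unit = seq[i:i + n]
            if len(unit) < n or not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * n:i + (copies + 1) * n] == unit:
                copies += 1
            if copies >= min_copies[n]:
                hit = (i, unit, copies)
                break
        if hit:
            hits.append(hit)
            i += len(hit[1]) * hit[2]  # resume scanning after the repeat
        else:
            i += 1
    return hits


def count_by_motif_length(hits):
    """Number of SSRs per motif length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(unit) for _, unit, _ in hits)


toy = "G" + "A" * 12 + "C" + "AT" * 5 + "G"
ssrs = find_ssrs(toy)
print(ssrs, count_by_motif_length(ssrs))
```

Skipping non-primitive units ensures that a run such as AAAAAA is reported once as a mononucleotide repeat rather than also as a di- or trinucleotide repeat.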

Figure 6

Frequency of six SSR types (A) and distribution of SSR sequences (B) examined in eight Filipendula chloroplast genomes.

A sliding-window analysis was also conducted to identify highly variable regions in the eight Filipendula cp genomes. The average nucleotide diversity (Pi) over the entire cp genome was 0.005, indicating that the genome as a whole was relatively conserved (Fig. 7), consistent with the mVISTA result (Figure S2). High variability occurred mainly in noncoding regions. Four mutational hotspots with Pi values greater than 0.02 were identified, namely ψycf1-ndhF, rps12-trnV-GAC, ndhF-trnL-UAG and trnV-GAC-rps12 (Fig. 7). Of these four regions, rps12-trnV-GAC and trnV-GAC-rps12, located in the two IR regions, had the highest Pi values. These results indicate that noncoding regions are more variable and divergent than protein-coding regions. Selection signatures were then assessed using the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates for the 76 unique protein-coding sequences. Ka/Ks ratios were below 1 for most genes in these Filipendula species, suggesting that these PCGs are under purifying selection (Table S4). Two genes (matK and rps8) had Ka/Ks ratios above 1, suggesting that they may have been under positive selection (Table S4).
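Nucleotide diversity in a window is the average number of pairwise differences per site among the aligned sequences. The Python sketch below computes it along an alignment with the window length (600 bp) and step size (200 bp) given in the Fig. 7 caption, and flags windows above the 0.02 hotspot threshold used here. It assumes the plastomes are already aligned to equal length and, as a simplifying assumption, skips columns containing gaps or ambiguous bases; dedicated population-genetics software offers more options, and this sketch is not claimed to reproduce its output exactly.

```python
from itertools import combinations


def window_pi(alignment, start, end):
    """Average pairwise differences per site in columns start..end-1."""
    n_seqs = len(alignment)
    n_pairs = n_seqs * (n_seqs - 1) / 2
    diffs = sites = 0
    for column in zip(*(seq[start:end].upper() for seq in alignment)):
        if any(base not in "ACGT" for base in column):
            continue  # skip gapped or ambiguous columns (simplifying assumption)
        sites += 1
        diffs += sum(a != b for a, b in combinations(column, 2))
    return diffs / (n_pairs * sites) if sites else 0.0


def sliding_window_pi(alignment, window=600, step=200):
    """(midpoint, Pi) for each complete window; midpoint = start + window // 2."""
    length = len(alignment[0])
    return [
        (start + window // 2, window_pi(alignment, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


def hotspots(pi_values, threshold=0.02):
    """Windows whose Pi exceeds the hotspot threshold."""
    return [(mid, pi) for mid, pi in pi_values if pi > threshold]


# Toy alignment with a tiny window, for illustration only.
toy_alignment = ["ACGTACGTAA", "ACGTACGTAT", "ACCTACGTAT"]
print(sliding_window_pi(toy_alignment, window=4, step=2))
```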

Figure 7

Sliding window analysis of Pi values among cp genomes of seven Filipendula species. X-axis, position of the midpoint of a window; Y-axis, nucleotide diversity of each window. (Window length: 600 bp, step size: 200 bp).

Phylogenetic and molecular dating analysis of Filipendula

Structural rearrangements of the chloroplast genome are often used to reconstruct plant phylogenies5. Our results, however, showed that the overall cp genome structure was highly conserved among the seven Filipendula species (including two varieties); given this uniformity, cp genome structure was not used for phylogenetic analysis within the genus. Nevertheless, Filipendula cp genomes have undergone gene loss, transposition and inversion, whereas other genera of Rosoideae lack these structural changes. Two previous studies have given consistent support for Filipendula as the first clade to split off from the rest of Rosoideae in the nuclear and plastome trees21,45 in PhyloSuite46.

From the best ML tree, we generated 1000 bootstrap replicates to produce a dated phylogeny with 95% confidence intervals (CI) on node ages using treePL47, following the guide by Maurin48. Following Zhang et al.21, we used 90 and 106.5 Ma as the minimum- and maximum-age calibrations for the stem of Rosaceae. Three fossil calibrations were also applied as minimum-age calibrations on internal nodes (all outside our study clades) (Table S7).
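The node ages in the Conclusions are reported with 95% HPD intervals, that is, the shortest interval containing 95% of the sampled ages for a node. The Python sketch below illustrates that calculation for one node, assuming the ages from the dated bootstrap trees have already been extracted into a list; the function name and the ages in the example are hypothetical, and this is not presented as the summarisation procedure used in this study.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the sampled values."""
    values = sorted(samples)
    n = len(values)
    # round() guards against floating-point error in mass * n before ceil().
    k = math.ceil(round(mass * n, 9))  # number of samples the interval must cover
    if not 1 <= k <= n:
        raise ValueError("need at least one sample inside the interval")
    start = min(range(n - k + 1), key=lambda i: values[i + k - 1] - values[i])
    return values[start], values[start + k - 1]


# Hypothetical node ages (Ma), e.g. one per dated bootstrap tree.
ages = [9.1, 9.3, 9.5, 9.6, 9.7, 9.8, 10.0, 10.2, 10.3, 12.0]
print(hpd_interval(ages, mass=0.8))
```

Unlike an equal-tailed percentile interval, the shortest interval shifts toward the bulk of the samples when the age distribution is skewed, as the outlying 12.0 Ma value in the example illustrates.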

Conclusions

The complete chloroplast genomes of seven Filipendula species were analysed in this study. Genome structure and gene content were largely conserved within Filipendula. However, gene loss, transposition and inversion were observed in Filipendula cp genomes compared with those of other genera of Rosoideae. Sequence divergence occurred mainly in noncoding regions; numerous SSRs were detected in each Filipendula plastome, and four mutational hotspots were identified. The phylogenetic and molecular dating analyses showed that Filipendula diverged from other genera of Rosoideae about 82.88 Ma (82.04–83.77 Ma, 95% HPD) and that the seven Filipendula species split into two major clades at 9.64 Ma (9.11–10.17 Ma, 95% HPD). These results provide a basis for studying the evolutionary history and phylogeny of Filipendula.