
The plastid is a metabolically active, semi-autonomous organelle in plants that is mainly involved in photosynthesis but also participates in many biosynthetic pathways [1]. Together with the nucleus and the mitochondrion, the plastid is one of the three genetic compartments of the plant cell. The plastid genome (plastome) is generally circular, with a typical size of 120 to 160 kb in flowering plants [2]. In most plants, the plastome has a quadripartite structure in which two equal-sized inverted repeats (IRa and IRb, ~ 20–28 kb) divide the genome into a large single-copy region (LSC, ~ 80–90 kb) and a small single-copy region (SSC, ~ 16–27 kb) [3]. Generally, the plastome encodes a total of 110–120 genes, including protein-coding, ribosomal RNA (rRNA), and transfer RNA (tRNA) genes, the majority of which function in photosynthesis [4].

Plastomes have long been regarded as conserved in evolution, with a moderate rate of molecular sequence evolution compared with the nuclear and mitochondrial genomes of plants [5]. However, as the number of sequenced genomes has increased, a certain degree of plastome variation has been observed [6,7,8,9,10,11,12,13]. Such variation is somewhat expected in heterotrophic plants, which have usually lost photosynthetic ability along with photosynthesis-related genes, and whose plastomes have degraded and undergone many rearrangements, as in Petrosavia [14] and Cuscuta [12, 13, 15]. Plastome variation has also been found in certain photosynthetic lineages, e.g., lycophytes, Fabaceae, subfamily Lobelioideae of Campanulaceae, and Pelargonium [16,17,18,19,20,21]. The documented variations mainly involve expansion/contraction or even loss of the IR, gene duplication, gene and intron loss, inversions, and other genomic rearrangements [14, 20, 22,23,24,25,26,27].

Within the plastome, the IR region is more conserved than the single-copy regions in size and gene content and has a lower substitution rate [28]. The rRNA genes (rrn16, 23, 4.5, and 5), together with seven protein-coding genes (rpl2, 23, ndhB, rps7, 12, ycf2, and part of ycf1) and several tRNA genes, are usually located in the IR region [2, 19]. The IR is thought to stabilize the plastome through a repair mechanism based on homologous recombination [29, 30], possibly contributing to the slow evolution of the plastome [2, 28]. However, IR expansion and contraction have frequently been observed, e.g., expansion in Pelargonium and Petroselinum [19, 31] and contraction in Erodium, Trifolium, and Pinus [28], and can cause extensive variation in plastome size. Complete loss of one IR copy has even been found in the angiosperm families Fabaceae, Cactaceae, Arecaceae, and Putranjivaceae [2, 23, 24, 27, 32]. IR loss also occurs in conifers, where short inverted repeats (sIRs) ranging from tens to more than 1000 base pairs have arisen and can mediate the formation of different plastome isomers [33,34,35]. The IR can even be converted into a direct repeat (DR) by multiple inversions, as in Selaginella [21]. Furthermore, three IR copies were found in Chamaetrichon [36]. Besides the IR, repeats larger than 1 kb can also provide homologous sequences for recombination-dependent replication (RDR) [37], resulting in the coexistence of different alleles within an individual, known as plastome heteroplasmy [38].

In addition to rearrangements associated with the IR region, gene duplication, gene and intron loss, and inversions have also been reported in plastomes [24]. Gene duplication is generally rare and mainly results from IR expansion, as in the plastomes of Eleocharis, Arbutus unedo, and Asarum [39,40,41]. Gene and intron losses, on the other hand, are more frequent: approximately 62 independent loss events occurred during the evolution of flowering plants [24]. Multiple losses have been observed for ndhA-K, infA, and rps16 and for the introns of clpP, rpl2, rpl16, rpoC1, and rps12 [42,43,44]. Inversions have been reported in multiple lineages (e.g., Fabaceae, Poaceae, Passifloraceae, Pelargonium, Scaevola, Trachelium, Jasminum, and Oenothera) [23, 24] and can sometimes serve as phylogenetic markers. Inversions may be caused by sequence repeats at both of their ends [45, 46], and repeats larger than 1 kb can lead to plastome rearrangement [16, 25]. Changes in gene order caused by inversions have been suggested to play a role in the adaptive evolution of green plants and algae [47]. Accelerated sequence evolution was also detected in the highly rearranged plastomes of Geraniaceae [27], and previous studies suggested a significantly faster rate of plastid DNA evolution in Poales than in the other commelinid groups [48, 49]. Moreover, the evolutionary rate may have been heterogeneous during the evolution of Poales, being low in early-diverging groups and high in Poaceae [11, 50,51,52].

As described above, numerous studies have revealed various aspects of plastome variation in plants against a background of overall conserved evolution. However, studies of the evolutionary pattern of plastomes across an entire angiosperm order at the family level remain scarce. Poales, one of the largest and most economically important orders of plants [53], showed a certain degree of plastome variation in previous studies [39, 54,55,56,57,58,59], despite consisting of photoautotrophic plants. For example, three inversions (~ 28 kb, ~ 6 kb, and < 1 kb in size) and multiple gene and intron losses have long been documented in the plastomes of grasses (Poaceae) [55]. Similar inversions also occur in the plastomes of Ecdeiocoleaceae and Joinvilleaceae, close relatives of Poaceae, as does gene loss (e.g., of accD) [56, 57]. Greater variation was revealed in Cyperaceae: the two sequenced Eleocharis species had distinctive characteristics, including a large plastome of about 200 kb, a high rate of sequence recombination, low GC content and gene density, and a large amount of repetitive DNA, with at least four different plastome configurations in each species [39]. Recent sequencing of 37 Bromeliaceae plastomes also found new lineage-specific rearrangements, including a significant shift of the IR boundary and large inversions in Tillandsioideae [58]. Many large repeats and rearrangements were also found in the four plastomes of Juncus (Juncaceae) [60]. Nevertheless, a large-scale study of the general pattern of plastome evolution across Poales as a whole is still lacking.

At present, more than 1000 plastomes of Poales have been sequenced and deposited in the NCBI database (last accessed October 10, 2022), but about 85% of them belong to a single family, Poaceae. Sixteen families are recognized in Poales, which can be divided into five groups: (1) the early-divergent grade (Bromeliaceae, Typhaceae, Rapateaceae); (2) the cyperid clade (Thurniaceae, (Cyperaceae, Juncaceae)); (3) the xyrid grade (Mayacaceae, Eriocaulaceae, Xyridaceae); (4) the restiid clade (Anarthriaceae, (Centrolepidaceae, Restionaceae)); and (5) the graminid clade (Flagellariaceae, (Joinvilleaceae, (Ecdeiocoleaceae, Poaceae))) [11, 61, 62].

With more than 23,000 species, Poales is extremely diverse in habit, morphology, photosynthetic pathway, and pollination type, exhibiting a wide range of adaptations [49, 61, 63, 64]. Members of Poales use all three photosynthetic pathways (C3, C4, and Crassulacean acid metabolism (CAM)), and species with C4 and CAM photosynthesis are often adapted to exposed, hot, and dry environments [65]. In Poales, CAM photosynthesis is restricted to Bromeliaceae, where it occurs in more than 40% of the species surveyed [66]. More than 6000 species of Poales use C4 photosynthesis, accounting for about 80% of C4 angiosperms; they are concentrated in Cyperaceae and Poaceae, each of which shows multiple origins of the C4 pathway [65, 67].

In addition, the ancestors of Poales appear to have lived in seasonally swampy/moist, highly infertile, and fire-prone habitats [49, 64]. This ancestral habitat has been retained by several extant families, including Eriocaulaceae, Flagellariaceae, Juncaceae, Mayacaceae, Rapateaceae, Thurniaceae, Typhaceae, and Xyridaceae [49]. Recently, positive selection and photosynthesis-related adaptive mutations were detected in the rbcL gene of several C4 plant groups, including species of Poales [53, 68, 69]. Whether any plastid genes were involved in adaptation to swampy habitats, and whether photosynthesis-related adaptive evolution occurred at the whole-plastome level, remains unexplored [13].

In our previous study, we sequenced 60 plastomes from 16 families of Poales [61]. Based on these data and publicly available sequences, we assembled a dataset of 93 Poales plastomes (including five incomplete ones) representing all 16 families and the phylogenetic diversity of Poales at the family level. Our aims were (1) to reveal the general pattern of plastome variation in Poales; (2) to investigate the molecular mechanisms underlying this variation; and (3) to explore the role that plastome variation may have played in the adaptive evolution of Poales. We uncovered unprecedented plastome variation in Poales and traced its evolutionary trajectories. Plastome variation followed a “small-large-moderate” pattern over the evolution and diversification of Poales, which may have contributed to the adaptation of its species to a wide range of habitats. Our study thus provides comprehensive insight into the unusual plastome variation of Poales in particular and into plastid evolution in angiosperms in general.


Dynamic configuration and characteristics of plastomes in Poales

Our dataset of 93 plastomes represents all 16 families of Poales (Additional file 1: Table S1). Among the 60 plastomes sequenced in our previous study [61], the average sequencing depth of the 55 complete plastomes ranged from 165.77× (Anarthria humilis) to 7645.98× (Leptaspis urceolata) (Additional file 1: Table S2), with a mean of 1411.11×. We were unable to obtain complete plastome sequences for five species of Thurniaceae, Mayacaceae, and Ecdeiocoleaceae, for which only a few scaffolds were assembled. Poales plastomes can also exist in multiple configurations, and such alterations were found mainly in Anarthriaceae, Cyperaceae, Juncaceae, Restionaceae, and Xyridaceae (Additional file 1: Table S1; Additional file 2: Fig. S1). After careful examination, all analyzed Poales plastomes were assembled with a typical quadripartite structure except for those of three species, Anarthria humilis, Isolepis setacea, and Xyris capensis (plus Xyris capensis var. schoenoides) (see details in the next section) (Fig. 1 and Additional file 2: Fig. S2).

Extensive variation in plastome size and GC content was found in Poales (Fig. 1A and Additional file 1: Table S2). The largest plastome in Poales, which is also the largest reported to date in monocots, was that of Anarthria humilis (Anarthriaceae) at 225,293 bp, approximately twice the size of the smallest, 126,519 bp in Xyris capensis (Xyridaceae) (Fig. 1B). Similar differences were observed for the LSC (117,896 bp in Eleocharis dulcis (Cyperaceae) vs. 74,401 bp in Juncus bufonius (Juncaceae)) and IR regions (41,905 bp in Carex siderostica (Cyperaceae) vs. 16,991 bp in Xyris indica (Xyridaceae)), whereas the SSC region varied much more, from 34,770 bp in Leptaspis banksii (Poaceae) to just 1961 bp in Juncus grisebachii (Juncaceae), a ~17.7-fold difference (Additional file 1: Table S2). The general trend was that plastome size was relatively conserved in the early-divergent grade of Poales (Bromeliaceae, Typhaceae, and Rapateaceae), highly variable in the subsequently diverging groups (the cyperid, xyrid, and restiid), and moderately variable in the last-diverging graminid clade (Fig. 1A).
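The fold differences quoted above follow directly from the reported sizes. The short Python check below uses only the values given in this paragraph (no new data) to recompute the largest-to-smallest ratio for each region:

```python
# Size extremes reported in this section, in bp: (largest, smallest).
REGION_EXTREMES = {
    "genome": (225_293, 126_519),  # Anarthria humilis vs Xyris capensis
    "LSC": (117_896, 74_401),      # Eleocharis dulcis vs Juncus bufonius
    "IR": (41_905, 16_991),        # Carex siderostica vs Xyris indica
    "SSC": (34_770, 1_961),        # Leptaspis banksii vs Juncus grisebachii
}


def fold_difference(largest: int, smallest: int) -> float:
    """Ratio of the largest to the smallest observed size."""
    return largest / smallest


folds = {region: fold_difference(*pair) for region, pair in REGION_EXTREMES.items()}

if __name__ == "__main__":
    for region, fold in folds.items():
        print(f"{region}: {fold:.2f}-fold")
```

The whole genome, LSC, and IR all come out at roughly 1.6- to 2.5-fold, against about 17.7-fold for the SSC, which is why the SSC is singled out as by far the most variable region.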

The GC content of plastomes also varied widely, from 31.2% in Mayaca fluviatilis (Mayacaceae) to 39.1% in two Guaduella species (Poaceae). Mirroring the pattern of genome size variation, GC content was relatively conserved in Bromeliaceae, Typhaceae, and Rapateaceae, averaging 37.4%, 36.7%, and 36.8%, respectively (Additional file 1: Table S2; Additional file 2: Fig. S3). GC content then generally decreased during the diversification of the cyperid, xyrid, and restiid lineages, especially in Mayacaceae (31.2%), Cyperaceae (33.6% on average), and Xyridaceae (33.5% on average). Finally, GC content rose again in the graminid clade, reaching its highest value of 39.1% in Poaceae.

Expansion/contraction of IR and different IR types

Although the IR region of the plastome is generally conserved in evolution, both massive expansion and contraction of the IR were found in Poales, and in extreme cases one IR copy was lost entirely or a third IR copy was gained (Figs. 2A and 3; Additional file 2: Fig. S2). The plastomes of Anarthriaceae and Cyperaceae showed the greatest IR expansion in Poales. In Anarthriaceae, the IR expanded into both the LSC and SSC regions, so that the whole ycf1 and trnG-UCC genes, as well as partial accD and rpoA sequences, were located in it. In Cyperaceae, the expanded IR also included the whole ycf1 from the SSC, and even rps15 and ndhA, G, H, and I (Fig. 2A; Additional file 2: Fig. S2). As a result, only seven genes were retained in the SSC region. More strikingly, three IR copies were found in both Anarthria humilis (Anarthriaceae) and Isolepis setacea (Cyperaceae) (Fig. 1B). In both species, IRa and IRb were equal in length and encoded the same genes, whereas the third copy, designated IRc, was shorter and contained fewer genes. In Anarthria humilis, IRa/IRb had expanded to 33,752 bp, and IRc was shorter (26,551 bp) owing to the loss of the ycf1 gene; IRc had the same orientation as IRa. Together, IR expansion and the gain of a third IR copy make the plastome of Anarthria humilis the largest known in monocots. In Isolepis setacea, IRa/IRb had expanded further to 37,112 bp, but the 11,339 bp IRc contained only a core set of four rRNA genes (4.5S, 5S, 16S, and 23S) and four tRNA genes (trnR-ACG, trnA-UGC, trnI-GAU, and trnfM-CAU) (Figs. 1B and 2A; Additional file 1: Table S2).

In contrast, the IR of Xyridaceae species was contracted, with trnR-ACG, trnN-GUU, and ycf1, which are normally located at least partly in the IR, relocated to the SSC region. Moreover, Xyris capensis and X. capensis var. schoenoides each contained only one complete IR copy. Instead, a pair of sIRs of 3343 bp was found in X. capensis var. schoenoides and a DR of 1650 bp in X. capensis; both encoded infA and rpl36 and may play a role similar to that of the typical IRa/IRb (Fig. 1B; Additional file 2: Fig. S4). The plastome assemblies of Anarthria humilis, Isolepis setacea, Xyris capensis, and X. capensis var. schoenoides were verified by PCR, and the sequencing results supported the presence of three IR copies, sIRs, and a DR in these genomes (Additional file 1: Table S3 and Additional file 2: Fig. S4).

Among the remaining families, the IR boundaries were relatively conserved in the plastomes of Bromeliaceae, Rapateaceae, and Typhaceae, as well as in Flagellariaceae of the graminid clade. However, the IR of Joinvilleaceae and Poaceae expanded into the SSC region to include the rps15 gene. In short, IR expansion/contraction and gain/loss (Fig. 2B) echoed the pattern of plastome size variation during the evolution of Poales described above.

Multiple gene and intron loss

The 93 Poales plastomes encoded 96 to 114 unique genes, including 63 to 80 protein-coding, 25 to 30 tRNA, and 4 rRNA genes (Additional file 1: Table S2). With the exception of 34 genes (atp (A, B, E, H, and I), ccsA, cemA, matK, psaI, psb (A, C, D, E, F, I, J, K, L, and Z), rbcL, rpl16, rpoB, rps (2 and 4), trn (E-UUC, F-GAA, Q-UUG, S-GCU, S-GGA, S-UGA, T-UGU, and V-UAC), ycf (3 and 4)), all genes experienced duplication, degradation into short fragmented copies, or gene/intron loss in at least some plastomes (Fig. 2A; Additional file 2: Fig. S5).

At least one loss event was observed for 38 genes and introns in Poales; the most frequently lost was ycf15 (8 times), followed by the two introns of clpP (5 times) and rps16, ycf1, and ycf2 (4 times each). Gene/intron losses were found mostly in the plastomes of the cyperid, xyrid, and restiid lineages. Moreover, loss or degradation of all ndh genes, leaving only short fragments, was observed only in certain species of Juncaceae (Fig. 2A). By contrast, only one gene loss was found in the plastome of Flagellariaceae, the early-divergent family of the graminid clade, whereas multiple gene and intron losses occurred in the remaining three graminid families, particularly Poaceae. Gene duplication was also observed: in addition to the 17 genes commonly located in the IR region, a total of 51 genes had two or more copies (Additional file 2: Fig. S5).

Widespread occurrence of inversions

We further assembled a dataset of 88 complete plastomes for synteny analysis and, for each family, used the representative species with the least variation to illustrate inversions, a major type of structural rearrangement in plastome evolution. Inversions were found in more than one third of the families of Poales; most were larger than 1 kb and located in the LSC region (Fig. 3; Additional file 1: Table S4). Compared with Ananas comosus, which was used here as the reference for the typical plastome structure of flowering plants, the early-divergent families were conserved, with no inversions detected except in one hybrid species of Bromeliaceae (Vriesea x poelmanii), which had a ~ 28 kb inversion from psbD to accD (Fig. 3A, B; Additional file 1: Table S4).

Within the cyperid clade, species of Cyperaceae and Juncaceae had 5–13 and 6–11 inversions, respectively. In Cyperaceae, most species shared six inversions ranging from ~ 1 kb to ~ 6–10 kb. In Juncaceae, inversions were more diverse, and most species shared only two inversions of ~ 1 kb and ~ 5–6 kb (Additional file 1: Table S4). At the family level, Cyperaceae and Juncaceae shared four inversions and had one and three family-specific inversions, respectively (Fig. 3A, C).

In the xyrid grade, the plastomes of Eriocaulaceae had at most three inversions (Additional file 1: Table S4), and the plastome of Paepalanthus alpinus (Eriocaulaceae) was highly collinear with that of Ananas comosus, with no inversion detected. Five to seven inversions were identified in Xyridaceae; most species of this family shared five, the largest being ~ 17–20 kb. At the family level, Eriocaulaceae and Xyridaceae shared no inversions, and five inversions were unique to Xyridaceae (Fig. 3A, D).

Eight, seven, and ten inversions were identified in Anarthriaceae, Centrolepidaceae, and Restionaceae of the restiid clade, respectively (Additional file 1: Table S4). Two inversions were shared by all three families. Centrolepidaceae and Restionaceae shared five additional inversions and had four and one unique inversions, respectively (Fig. 3A, E).

In the graminid clade, Joinvilleaceae and Poaceae contained five and three to four inversions, respectively. All analyzed Poaceae species contained the three well-documented inversions (~ 28 kb, ~ 6 kb, and < 1 kb) reported previously [55]. The rpl2/partial accD (~ 30 kb) inversion was found only in the early-divergent grass Guaduella macrostachys. The pattern of inversions in Poaceae differed from that in the other families of Poales: their diversity was limited, and most inversions were shared by all sampled grasses, indicating that they arose before the origin of this family (Additional file 1: Table S4). Flagellariaceae was relatively conserved, with no inversion detected. Although the representative species of Joinvilleaceae and Poaceae shared the rps15 inversion, they also had three and two unique inversions, respectively (Fig. 3A, F).

Abundant repeats and heterogeneity of substitution rate

Given the large number of inversions observed, we calculated genomic rearrangement distances. As expected, the largest distance was found in the cyperid clade, followed by the restiid clade, and the smallest in the early-divergent grade of Poales (Fig. 4A). We further detected repeats (≥ 30 bp) in the 93 plastomes. Among families, Cyperaceae had the most repeats, with a maximum of 2718 in Carex siderostica, whereas Bromeliaceae and Eriocaulaceae had the fewest, both under 50. At the clade level, the cyperid clade had the most repeats and the widest range in repeat number (168 to 2718), while the early-divergent grade had the fewest (Fig. 4B). Most identified repeats were 30 bp to 1 kb long, and the few repeats larger than 1 kb were found only in Anarthriaceae, Cyperaceae, Juncaceae, Mayacaceae, Poaceae, Restionaceae, and Xyridaceae (Additional file 2: Fig. S6).

Our plastome-based phylogenetic tree of Poales [11, 49, 61] combined short and long branches, indicating heterogeneous rates of molecular evolution among families (Additional file 2: Fig. S7). The substitution rate was low in the early-divergent grade, increased gradually from there to the restiid clade, peaked in Juncaceae of the cyperid clade, and then decreased in the graminid clade. Synonymous and non-synonymous substitution rates, analyzed separately, showed a similar trend (Fig. 4C). We further compared three clock models (global clock, local clock, and clockless) to investigate shifts in the rate of nucleotide substitution across Poales. According to the corrected Akaike information criterion (AICc), the clockless model fit best, and the local clock model fit better than the global clock model (Additional file 1: Table S5). Under the local clock model, the early-divergent grade had the lowest substitution rate, consistent with the branch lengths of the phylogenetic trees above, and the intermediate lineages (cyperid clade, xyrid grade, and restiid clade) had the highest, a 3.5-fold difference (Additional file 1: Table S5). These results demonstrate sharp shifts in the rate of nucleotide substitution during the evolution of Poales.
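As a reminder of what the model-selection criterion computes, here is a minimal Python sketch of AICc = −2 lnL + 2k + 2k(k + 1)/(n − k − 1), where lnL is the maximized log-likelihood, k the number of free parameters, and n the sample size (assumed here to be the number of alignment sites). The log-likelihoods, parameter counts, and alignment length in `EXAMPLE_FITS` are made-up placeholders, not values from Table S5.

```python
from typing import Dict, Tuple


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected Akaike information criterion (lower is better)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def best_model(fits: Dict[str, Tuple[float, int]], n: int) -> str:
    """Return the name of the model with the lowest AICc.

    fits maps model name -> (maximized log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aicc(*fits[name], n))


# Hypothetical placeholder fits (not results from this study).
EXAMPLE_FITS = {
    "global clock": (-250_400.0, 120),
    "local clock": (-250_300.0, 123),
    "clockless": (-250_100.0, 230),
}

if __name__ == "__main__":
    print(best_model(EXAMPLE_FITS, n=60_000))
```

The correction term grows with k, so AICc penalizes a parameter-rich clockless model more than plain AIC would; with alignments of tens of thousands of sites, however, that extra penalty is small.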

Correlations between plastomic characters and genomic variation

To investigate the molecular mechanisms underlying plastome variation in Poales, we performed a multivariate correlation analysis of eight variables across the 93 plastomes and found that most pairs were significantly correlated (Fig. 4D). We considered |r| ≥ 0.8, 0.5 ≤ |r| < 0.8, and |r| < 0.5 to indicate strong, moderate, and weak correlations, respectively, where r is the correlation coefficient. Rearrangement distance was positively correlated with repeat number (r = 0.80), as well as with genome size, LSC size, and IR size (r = 0.53, 0.58, and 0.52, respectively). Repeat number was also moderately correlated with genome size, LSC size, and IR size (r = 0.71, 0.72, and 0.62, respectively) (Fig. 4D; Additional file 2: Fig. S8). In addition, the number of inversions was positively correlated with repeat number (r = 0.79) and, as expected, with rearrangement distance (r = 0.95) (Additional file 2: Fig. S9). In contrast, GC content was negatively correlated with rearrangement distance (r = −0.59) and repeat number (r = −0.62), and SSC size was negatively correlated with IR size (r = −0.68).
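The three-way rule for interpreting |r| can be stated as a small Python function. This sketch applies it to some of the coefficients reported above; the r values are copied from the text, and the short variable labels are ours.

```python
def correlation_strength(r: float) -> str:
    """Classify a correlation coefficient with the |r| cut-offs used in this study."""
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"


# Coefficients reported in this section.
REPORTED_R = {
    ("rearrangement distance", "repeat number"): 0.80,
    ("rearrangement distance", "genome size"): 0.53,
    ("repeat number", "LSC size"): 0.72,
    ("inversion number", "rearrangement distance"): 0.95,
    ("GC content", "repeat number"): -0.62,
    ("SSC size", "IR size"): -0.68,
}

if __name__ == "__main__":
    for (a, b), r in REPORTED_R.items():
        print(f"{a} vs {b}: r = {r:+.2f} ({correlation_strength(r)})")
```

Because the rule uses |r|, negative coefficients such as the GC content correlations are graded by magnitude in exactly the same way as positive ones.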

Positive selection of genes in plastome

To detect potential selection on plastid protein-coding genes, we first used site-specific and branch-site models to analyze the 34 genes shared by all 93 sampled Poales species. The site-specific model identified 8 genes (atpA, psbK, rbcL, rpl22, rpoB, rps2, rps7, and ycf3) with signals of positive selection (Table 1). To determine in which lineages these 8 genes were positively selected, we analyzed them with the branch-site model. atpA and rpoB experienced positive selection in the cyperid clade (P = 0.013) and Mayacaceae (P = 0.017), respectively. Intriguingly, rbcL was under positive selection in Xyridaceae and in the C4 clade of Poaceae, with five (P = 4.4E−07) and one (P = 0.012) positively selected sites, respectively. The rps2 gene was positively selected in parallel in Flagellariaceae (P = 0.016) and the restiid clade (P = 0.014) (Fig. 5; Additional file 2: Table S6). We also estimated the dN/dS ratio on each branch of the phylogenetic trees of the 29 genes with loss events in certain plastomes. All of them except ycf2 had an average dN/dS ratio < 1, indicating purifying selection. Moreover, within the same family, lineages closely related to those that had lost a gene tended to have larger dN/dS ratios than lineages in sister families without the loss (Additional file 1: Table S7).
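The branch-site P-values above come from a likelihood ratio test, in which the statistic 2ΔlnL (alternative minus null log-likelihood) is compared with a chi-square distribution. The stdlib-only Python sketch below covers one degree of freedom, as used for the branch-site test in the Methods, and even degrees of freedom. The degrees of freedom commonly used for the site-model pairs (2 for M1 vs. M2 and M7 vs. M8, 4 for M0 vs. M3 with three ω classes) are standard choices not stated in this study, so treat them as assumptions.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Closed forms are used, so only df = 1 and even df are supported.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k  # accumulates (x/2)^k / k!
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("only df = 1 or even df are handled here")


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int = 1) -> float:
    """P-value of a likelihood ratio test between two nested models."""
    # A slightly negative statistic is optimization noise, so clamp it to zero.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return chi2_sf(statistic, df)
```

For example, null and alternative log-likelihoods of −100.0 and −98.0 give a statistic of 4.0 and, with one degree of freedom, P ≈ 0.0455.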

Diverse patterns of plastome variation in Poales

With the rapid development of sequencing technology, growing numbers of plant plastomes have been sequenced. However, few studies have examined an entire order at the family level [70, 71]; recent studies have expanded coverage to all families of monocots [11] and angiosperms [72] but have focused mainly on phylogenetic relationships rather than plastome evolution. Most studies of plastome evolution have been conducted at the family or genus level, for example in algae and non-photosynthetic flowering plants [22, 36, 73]. The estimated 23,000 species of Poales are all photosynthetic autotrophs, including four carnivorous species (two species of Brocchinia, Catopsis berteroniana, and Paepalanthus bromelioides), with diverse photosynthetic pathways and habitats [64]. Previous studies revealed some variation in Poales plastomes but generally focused on individual families [39, 57,58,59]. Here, we expanded sampling to represent all 16 families of Poales, revealing diverse patterns of variation among families in genome size, gene content, and GC content. Anarthria humilis (Anarthriaceae, restiid clade) has the largest plastome reported to date in monocots (~ 225 kb), just 18 kb smaller than the largest in angiosperms (Pelargonium transvaalense, 243 kb) [19]. This genome is about twice as large as the smallest in Poales, that of Xyris capensis (~ 127 kb; Xyridaceae, xyrid grade). At the family level, Cyperaceae has larger plastomes, averaging ~ 186 kb, about 26 kb larger than typical ones [39]. The plastomes of the other families all fell within the typical size range, although each family had its own characteristics. Plastome size variation came mainly from the cyperid, restiid, and xyrid lineages, with moderate variation in the graminid clade and the least in the early-divergent grade of Poales.

Plastome GC content also varied among families of Poales. GC content was nearly constant at about 37.0% across species of the early-divergent grade, decreased in the cyperid, restiid, and xyrid lineages, reaching its lowest value of 31.6% in Mayacaceae, and finally increased in the graminid clade, with the highest value of 39.1% in Poaceae. The GC content of sequenced angiosperm plastomes ranges from 22.67 to 43.20% [74]. The highest and lowest GC contents in Poales differ by 7.5 percentage points, a relatively large range within a single angiosperm order. Moreover, this trend parallels that observed in nuclear genome GC content [75], which is also highest in Poaceae (Additional file 1: Table S2).

Gene/intron loss events in Poales were also diverse and species-specific. A total of 40 genes/introns were documented as lost, the most frequently lost genes being rps16, ycf1, ycf2, and ycf15. In addition, the previously reported unusual losses of the accD gene and of the introns of clpP and rpoC1 actually occurred multiple times in Poales [

to evaluate the assembly results. Our definition of gene loss was based on a combination of gene length and sequence similarity: a gene was considered lost when no sequence with more than 60% similarity and more than 60% of the gene length was detected anywhere in the genome. Gene fragments with more than 60% similarity to the intact gene but with premature stop codons in the open reading frame were defined as short fragmented/partial gene copies. In addition, four plastomes showing great structural variation, those of Anarthria humilis, Isolepis setacea, Xyris capensis, and X. capensis var. schoenoides, were validated by polymerase chain reaction (PCR) and Sanger sequencing. The primers are provided in Additional file 1: Table S3. The plastome maps were drawn using OGDRAW v1.3.1 [99].
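The gene-loss rule combines two thresholds with a check for premature stop codons. The Python sketch below encodes one reading of it, in which a matching sequence must exceed both the 60% similarity and the 60% length thresholds, and a match with a premature stop codon counts as a fragment. The `BestHit` structure and its fields are hypothetical; in practice they would be filled from an alignment of each reference gene against the plastome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestHit:
    """Best match of a reference gene in a plastome (hypothetical structure)."""
    identity: float         # sequence similarity to the reference gene, 0-1
    length_fraction: float  # aligned length / reference gene length, 0-1
    premature_stop: bool    # True if a stop codon appears before the expected end


def classify_gene(hit: Optional[BestHit], min_identity: float = 0.6,
                  min_length: float = 0.6) -> str:
    """Return 'lost', 'fragment', or 'present' under one reading of the rule."""
    # No match passing both thresholds anywhere in the genome: the gene is lost.
    if hit is None or hit.identity <= min_identity or hit.length_fraction <= min_length:
        return "lost"
    # A similar sequence whose reading frame is truncated: fragmented/partial copy.
    if hit.premature_stop:
        return "fragment"
    return "present"
```

Keeping the two thresholds as parameters makes it easy to test how sensitive the loss calls are to the 60% cut-off.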

Plastome features and repeat analysis

We calculated plastome characteristics, including genome size, gene content, and GC content, in Geneious v9.1.4 [100]. Gene presence and loss across the plastomes were visualized with the heat map function of TBTOOLS [101] and in the gene group map. Dispersed repeats of three types (forward, reverse, and palindromic) were identified using REPuter [102], with a Hamming distance of 3 and minimum and maximum repeat sizes of 30 bp and 5000 bp, respectively.
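REPuter reports maximal repeats using suffix-tree methods. The naive Python sketch below is not REPuter; it only illustrates what the three repeat classes and the Hamming-distance cut-off mean. It compares every pair of fixed-length windows (the window length `k` stands in for the 30 bp minimum) and calls a pair forward if the windows match, reverse if one matches the other read backwards, and palindromic if one matches the reverse complement of the other, each allowing at most `max_mismatch` substitutions.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_pairs(seq: str, k: int = 30,
                      max_mismatch: int = 3) -> List[Tuple[int, int, str]]:
    """Naively list window pairs (i, j, kind) that form repeats of length k.

    kind is 'forward' (same orientation), 'reverse' (second copy read
    backwards) or 'palindromic' (second copy is the reverse complement).
    """
    seq = seq.upper()
    hits = []
    last = len(seq) - k
    for i in range(last + 1):
        left = seq[i:i + k]
        for j in range(i + 1, last + 1):
            right = seq[j:j + k]
            if hamming(left, right) <= max_mismatch:
                hits.append((i, j, "forward"))
            if hamming(left, right[::-1]) <= max_mismatch:
                hits.append((i, j, "reverse"))
            if hamming(left, reverse_complement(right)) <= max_mismatch:
                hits.append((i, j, "palindromic"))
    return hits
```

The double loop is quadratic in sequence length, so this is only practical for toy sequences or short regions, not for whole plastomes of 120 to 225 kb; that cost is exactly what suffix-tree tools avoid.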

Collinearity and rearrangement distance

The plastome of Ananas comosus was chosen as the reference because of its good collinearity with that of Nicotiana tabacum. With one IR copy removed to avoid redundant matches, each of the 88 complete plastomes was aligned against that of Ananas comosus with progressiveMauve [103] to assess collinearity. The orientation of each locally collinear block (LCB) was recorded with a + or − sign, a negative sign indicating an inversion, and the number of LCBs in each plastome relative to the reference was counted. Finally, genomic rearrangement distances were calculated with GRIMM [104]. Based on the rearrangement distance, the species with the least structural variation in each family was selected for illustration. Inversions and IR boundary shifts were marked manually on the map.
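The rearrangement distance reported by GRIMM is the minimum number of inversions (signed reversals) needed to turn one signed gene order into another, which GRIMM computes efficiently with Hannenhalli–Pevzner theory. For intuition only, the Python sketch below obtains the same quantity by brute-force breadth-first search, which is feasible only for a handful of LCBs. Each plastome is encoded as the signed order of its LCBs relative to the reference, with a negative sign marking an inverted block as in the Mauve output described above; the orders used in the usage note are hypothetical.

```python
from collections import deque
from typing import Sequence, Tuple


def apply_reversal(order: Tuple[int, ...], i: int, j: int) -> Tuple[int, ...]:
    """Invert the block order[i..j] (inclusive): reverse it and flip every sign."""
    block = tuple(-x for x in reversed(order[i:j + 1]))
    return order[:i] + block + order[j + 1:]


def reversal_distance(order: Sequence[int]) -> int:
    """Minimum number of inversions turning a signed LCB order into 1, 2, ..., n.

    Exhaustive breadth-first search: exact, but only practical for small n.
    """
    start = tuple(order)
    n = len(start)
    if sorted(abs(x) for x in start) != list(range(1, n + 1)):
        raise ValueError("expected a signed permutation of 1..n")
    target = tuple(range(1, n + 1))
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                nxt = apply_reversal(state, i, j)
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise RuntimeError("target unreachable; this should not happen")
```

For example, `reversal_distance([1, -3, -2, 4])` returns 1, since inverting blocks 2 and 3 together restores the reference order.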

Multivariable pairwise correlation analysis

To investigate the mechanisms potentially underlying plastome variation in Poales, we selected eight variables for analysis: GC content, repeat number, genome length, LSC length, SSC length, IR length, rearrangement distance, and CDS number. Pairwise correlation analyses were carried out in R, and the results were visualized with the ggplot2 package.
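The analysis itself was run in R. Purely to illustrate the computation, the Python sketch below computes Pearson's r for every pair of variables, with one value per plastome. The choice of Pearson's coefficient is an assumption (the correlation method is not named above), and the four-plastome `TOY_TABLE` is invented data.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def pairwise_pearson(table: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pearson r for every unordered pair of variables (one value per plastome)."""
    return {(a, b): pearson_r(table[a], table[b]) for a, b in combinations(table, 2)}


# Toy values for four hypothetical plastomes (not data from this study).
TOY_TABLE = {
    "repeat number": [40.0, 150.0, 900.0, 2700.0],
    "rearrangement distance": [0.0, 3.0, 10.0, 20.0],
    "GC content": [37.4, 36.0, 34.0, 33.5],
}
```

With eight variables this yields 28 coefficients, the same set of pairs shown in a correlation matrix plot.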

Plastomic substitution rate and inference of rate changes

The concatenated sequences of 80 plastid protein-coding genes were trimmed with Gblocks [105] with the "allowed gap positions" option set to "with half" (80PG-half matrix), and a phylogenetic tree was then inferred with RAxML [106] under the GTRGAMMA model with 1000 replicates. We included 99 individuals of Poales and 7 outgroup taxa (species of Commelinales and Zingiberales) (Additional file 2: Table S8). Synonymous (dS) and nonsynonymous (dN) substitution rates were calculated for each branch of the tree with HyPhy v2.2.4 [107] under the MG94xHKY85 codon model. To test for rate changes across Poales more rigorously, we used baseml in PAML [108] and, following previous studies [48, 109], compared three clock models. The global clock model assumes that all Poales lineages share the same rate of molecular evolution; the local clock model assumes that specified branches have their own rates while all other branches share a single rate; and the clockless model allows every branch to have its own rate. These analyses used the GTR + Γ model and the 80PG-half matrix with the phylogenetic tree inferred from it.
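The three clock models correspond to baseml's `clock` option (0 = no clock, 1 = global clock, 2 = local clock, with branch rate classes labelled #1, #2, ... in the tree file), and GTR corresponds to `model = 7` (REV). The Python sketch below writes one control file per model. The file names, the starting alpha, and `ncatG = 4` are assumptions, not settings reported in this study, so check them against the PAML manual for the version in use.

```python
from pathlib import Path

CLOCK_OPTIONS = {"clockless": 0, "global": 1, "local": 2}  # baseml 'clock' values


def baseml_control(seqfile: str, treefile: str, outfile: str, clock_model: str) -> str:
    """Return the text of a baseml control file for a GTR + Gamma run."""
    settings = {
        "seqfile": seqfile,
        "treefile": treefile,  # for the local clock, branches carry #1, #2, ... labels
        "outfile": outfile,
        "model": 7,            # 7 = REV (GTR)
        "clock": CLOCK_OPTIONS[clock_model],
        "fix_alpha": 0,        # estimate the gamma shape parameter
        "alpha": 0.5,          # starting value (assumption)
        "ncatG": 4,            # number of discrete gamma categories (assumption)
    }
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def write_all(directory: str) -> None:
    """Write one control file per clock model (hypothetical file names)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name in CLOCK_OPTIONS:
        text = baseml_control("80PG-half.phy", f"tree_{name}.tre", f"{name}.out", name)
        (out / f"baseml_{name}.ctl").write_text(text)
```

Generating the three files from one template keeps every setting identical except `clock` and the tree file, so differences in likelihood reflect only the clock assumption.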

Selective pressure analysis

We estimated selection pressure as the ratio (ω) of the non-synonymous substitution rate (dN) to the synonymous substitution rate (dS) under site-specific and branch-site models. The 34 protein-coding genes common to all 106 samples were analyzed with the codeml program in PAML [108] under both models, using the phylogenetic tree inferred from the 80PG-half matrix. Each single-gene matrix (Additional file 4) was aligned and then trimmed with Gblocks with the "allowed gap positions" option set to "with half". The site-specific model parameters were set to model = 0, NSsites = 0, 1, 2, 3, 7, and 8, and seqtype = 0. P-values of likelihood ratio tests (LRTs) were calculated for three pairs of models to identify positively selected genes (P < 0.05): M0 (one-ratio) vs. M3 (discrete), M1 (nearly neutral) vs. M2 (positive selection), and M7 (β) vs. M8 (β and ω). The branch-site model was used to evaluate potential positive selection in Xyridaceae, Flagellariaceae, Eriocaulaceae, Mayacaceae, Typhaceae, Rapateaceae, the cyperid clade, the restiid clade, and the C4 clade of Poaceae, each designated in turn as the foreground branch. A neutral branch-site model (Model = 2, NSsites = 2, Fix_omega = 1, omega = 1) and an alternative model (Model = 2, NSsites = 2, Fix_omega = 0, omega = 2) served as the null and alternative models, respectively. P-values were calculated with a right-tailed chi-square test with one degree of freedom on twice the difference in log-likelihood between the two models. The Bayes empirical Bayes (BEB) method [110] was used to compute posterior probabilities for amino acid sites potentially under positive selection. A gene was considered positively selected when P < 0.05 and ω > 1, and a site when its posterior probability was > 0.95. We also analyzed selection pressure for the 29 genes with loss events; five of them yielded uninformative results, so results were obtained for only 24 genes.
For each gene, species lacking it were manually pruned from the reference tree inferred from all 106 species. Using the pruned tree, the one-ratio model (model = 0, NSsites = 0) and the free-ratio model (model = 1, NSsites = 0) were used to calculate dN, dS, and dN/dS and thus obtain a general picture of selective pressure along closely related lineages with gene loss. Mean dN/dS ratios were estimated after excluding genes with extremely small estimates of dN or dS (< 0.001), which can produce extreme, unreliable dN/dS values [111]. P-values were calculated as described above.
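The filtering rule for averaging dN/dS can be written as a small Python function. This sketch assumes the estimates are available as (dN, dS) pairs, one per gene as in the text (the same code works per branch). It drops any pair in which either rate falls below 0.001 before averaging; the example values in the usage note are invented.

```python
from typing import Iterable, Optional, Tuple


def mean_omega(estimates: Iterable[Tuple[float, float]],
               min_rate: float = 0.001) -> Optional[float]:
    """Mean dN/dS over (dN, dS) estimates, skipping any with dN or dS below min_rate.

    Returns None when no estimate survives the filter.
    """
    # The filter runs before the division, so dS = 0 never causes a ZeroDivisionError.
    ratios = [dn / ds for dn, ds in estimates if dn >= min_rate and ds >= min_rate]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)
```

For example, with estimates (0.01, 0.1), (0.02, 0.1), (0.0005, 0.1), and (0.01, 0.0), only the first two are kept, giving a mean dN/dS of 0.15.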