Background

Mitogenomes of flowering plants display unique features, including extremely expansive non-coding regions, frequent recombination, and frequent sequence transfers, which make them both challenging and interesting to study [1]. Extreme variation among plant mitogenomes exists not only between distantly related species but also at the family and genus levels. A comparison of six Solanaceae mitochondrial genomes showed large differences in size (423,596–684,857 bp), similarity (38.13%–55.81%), and gene order [2]. The mitogenomes of Silene have experienced unprecedented increases in mutation rate and size expansions of more than 40-fold during evolution [3]. Even in closely related species, mitogenome structure shows various conformations due to intra- or inter-molecular recombination [4]. Understanding the patterns of mitogenome variation and evolution at different taxonomic levels remains a challenge in angiosperms. More mitochondrial genome data and comparative analyses are needed, especially at low taxonomic levels.

In land plants, mitogenomes usually contain foreign genes or fragments acquired through horizontal gene transfer (HGT) or intracellular gene transfer (IGT) [5, 6]. Moreover, along with constant mitochondrial genome recombination, such sequence transfer between genomes is ongoing [7]. Sequence transfer is considered one of the potential driving forces for the rapid evolution of mitochondrial genomes with complex compositions and special structures [8]. Furthermore, structural rearrangements and foreign fragment insertions in mitogenomes are closely related to gene chimerism in some plant lineages, resulting in cytoplasmic male sterility, which has been reported in Brassica juncea, Oryza sativa, Brassica oleracea, etc. [9,19]. However, extreme morphological divergence makes species taxonomy difficult, so the phylogenies of some clades in Dendrobium remain unresolved [

Table 1 General features of D. wilsonii and D. henanense mitogenomes
Fig. 1

Mitochondrial genome maps of D. wilsonii (a) and D. henanense (b). Isoforms of two mitogenomes are depicted as two circles, respectively. a Annotation of plastid-derived genes (only intact genes are shown); (b) Mitochondrial gene annotation (grey gene names are reverse genes and black names are forward genes; "-ex" means exon; "-cp" represents tRNAs of plastid origin). Internal curves represent the positions of repeat pairs

We newly sequenced and assembled the chloroplast genome of D. henanense into a typical circular structure consisting of four regions: two inverted repeats (IRA and IRB), a large single copy (LSC), and a small single copy (SSC) (Fig. 2a). The chloroplast genome was 151,219 bp long with 37.52% GC content; the region lengths were 26,128 bp (each IR), 84,962 bp (LSC), and 14,001 bp (SSC), with GC contents of 30.35%–43.4%. Gene content was well conserved in Dendrobium. A total of 103 genes were annotated, including 69 protein-coding, 30 tRNA, and 4 rRNA genes. Among them, 11 protein-coding genes and 6 tRNA genes contained introns.
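As a quick consistency check, the region lengths reported above can be summed, counting the inverted repeat twice, and compared with the reported plastome length. The following is a minimal sketch; the function and constant names are illustrative and not taken from the study.

```python
# Region lengths (bp) as reported in the text for the D. henanense plastome.
IR_LENGTH = 26_128       # length of each inverted repeat (IRA and IRB)
LSC_LENGTH = 84_962      # large single copy
SSC_LENGTH = 14_001      # small single copy
REPORTED_TOTAL = 151_219


def quadripartite_total(ir: int, lsc: int, ssc: int) -> int:
    """Total length of a quadripartite plastome: two IR copies plus LSC and SSC."""
    return 2 * ir + lsc + ssc


total = quadripartite_total(IR_LENGTH, LSC_LENGTH, SSC_LENGTH)
print(total, total == REPORTED_TOTAL)
```

The sum matches the reported total only when the IR length is read as the length of a single repeat copy.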

Fig. 2

Plastome map of D. henanense, and plastid-derived sequences in mitogenomes of D. wilsonii and D. henanense. a Plastome map of D. henanense. Genes outside the circle are forward and inside the circle are reverse. Different colors represent different functional groups of genes; (b) Numbers of plastid-derived sequences with different lengths in mitogenomes of D. wilsonii and D. henanense; (c) Distributions of plastid-derived sequences in each isoform of D. wilsonii and D. henanense mitogenomes

Codon usage and RNA editing level of mitochondrial genes

The total lengths of protein-coding genes in D. wilsonii and D. henanense were 34,593 bp and 32,730 bp, respectively. The start codon was ATG in most protein-coding genes, with the exception of mttB. All three typical stop codons (TAA, TGA, and TAG) were detected among the protein-coding genes. The relative synonymous codon usage (RSCU) of all protein-coding genes was calculated using CodonW 1.4.4 (Additional file 2: Table S2). Most NNU or NNA codons had high RSCU values (> 1.0), such as His (CAU, 1.5/1.48), Gln (CAA, 1.5), and Ser (UCU, 1.43), showing that A or U is more frequent than G or C at the third codon position.
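For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the mean count of all codons that encode the same amino acid, so a value above 1.0 indicates preferential use. The sketch below illustrates the standard definition only; it is not the program used in the study, it covers just two amino acids for brevity, and the codon counts are hypothetical.

```python
from collections import Counter

# Synonymous codon families (RNA alphabet). Only two amino acids are listed
# here to keep the example short; a full table would cover all families.
SYNONYMOUS = {
    "His": ["CAU", "CAC"],
    "Gln": ["CAA", "CAG"],
}


def rscu(codon_counts):
    """Return RSCU per codon: observed count / mean count within its family."""
    values = {}
    for family in SYNONYMOUS.values():
        total = sum(codon_counts.get(codon, 0) for codon in family)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(family)  # count expected under equal usage
        for codon in family:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Hypothetical counts for illustration, NOT data from the paper.
counts = Counter({"CAU": 30, "CAC": 10, "CAA": 45, "CAG": 15})
rscu_values = rscu(counts)
for codon, value in sorted(rscu_values.items()):
    print(codon, round(value, 2))
```

Within each family the RSCU values sum to the number of synonymous codons, which is a convenient sanity check on any implementation.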

In plant mitogenomes, C-to-U RNA editing is common and plays a significant role in gene expression. We predicted 571 and 567 nonsynonymous editing sites in the protein-coding genes of the D. wilsonii and D. henanense mitogenomes, respectively (Table 1). RNA editing sites within the same genes were conserved between these two Dendrobium species, except in ccmFc, cob, and nad4. The number of RNA editing sites differed among genes (Additional file 3: Fig. S1): ccmFn had the most (40 sites), whereas only two were found in rps11. In addition, editing levels were heterogeneous among codon positions. Editing levels at second codon positions were higher than at first and third codon positions. Notably, no editing sites were detected at the third codon position of any gene.

Repeat and SSR analysis

Repeat sequences, including large repeats (> 1000 bp), intermediate repeats (100–1000 bp), and short repeats (< 100 bp), are closely related to the recombinational activity and structural variation of plant mitogenomes. A total of 182 and 196 repeats were identified in the D. wilsonii and D. henanense mitogenomes, corresponding to 21.65% (165,202 bp) and 16.25% (131,259 bp) of the whole mitogenome length, respectively (Table 1). The total lengths of short, intermediate, and large repeats were 5,670 bp, 15,506 bp, and 144,026 bp in the D. wilsonii mitogenome and 6,674 bp, 19,229 bp, and 105,356 bp in the D. henanense mitogenome (Additional file 4: Fig. S2). Of the three types, short repeats were the most numerous in both mitogenomes. The D. wilsonii mitogenome contained six large repeat pairs, ranging from 8,373 bp to 16,983 bp, with high sequence identities (> 99%). D. henanense had fewer large repeat pairs (five), ranging from 3,522 bp to 16,490 bp. Analysis of repeat-mediated recombination showed that 10 and 14 repeat pairs exhibited evidence of recombinational activity in D. wilsonii and D. henanense, respectively (Additional file 5: Fig. S3). These recombinationally active repeats were distributed among isoforms 1, 2, 3, 5, 6, 7, 12, 15, and 20 in the D. wilsonii mitogenome and isoforms 1, 2, 4, 6, 8, 12, 14, 17, 19, 20, 21, and 24 in the D. henanense mitogenome.
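The three size classes defined above can be expressed as a simple rule, and the per-class totals reported in the text can be checked against the overall repeat lengths. This is a minimal sketch: the function names are illustrative, and assigning the boundary values 100 bp and 1000 bp to the intermediate class follows the ranges as written in the text.

```python
def repeat_class(length_bp: int) -> str:
    """Assign a repeat to a size class using the thresholds stated in the text."""
    if length_bp < 100:
        return "short"
    if length_bp <= 1000:
        return "intermediate"
    return "large"


# Per-class and total repeat lengths (bp) as reported in the text.
REPORTED = {
    "D. wilsonii": {"short": 5_670, "intermediate": 15_506,
                    "large": 144_026, "total": 165_202},
    "D. henanense": {"short": 6_674, "intermediate": 19_229,
                     "large": 105_356, "total": 131_259},
}


def class_sum(entry: dict) -> int:
    """Sum the three class totals for one species."""
    return entry["short"] + entry["intermediate"] + entry["large"]


for species, entry in REPORTED.items():
    print(species, class_sum(entry), class_sum(entry) == entry["total"])
```

For both species the three class totals add up to the overall repeat length given in the text.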

Three types of simple sequence repeats (SSRs) were discovered in the mitogenomes: mononucleotide, dinucleotide, and trinucleotide repeats (Additional file 6: Fig. S4). The total numbers of SSRs were 54 and 62 in the mitogenomes of D. wilsonii and D. henanense, respectively. The distribution of SSRs varied among isoforms. Isoform 1 of D. wilsonii and isoform 4 of D. henanense contained the largest numbers of SSRs (five and nine, respectively).
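To illustrate how mono-, di-, and trinucleotide SSRs can be located in a sequence, the sketch below uses regular expressions with back-references. The minimum repeat numbers (10, 5, and 4) are assumptions chosen for the example, since the thresholds used in the study are not stated in this section, and the test sequence is artificial.

```python
import re

# Assumed minimum number of repeat units per motif length (hypothetical values).
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip homopolymer runs reported as "AA" or "AAA" motifs.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Artificial sequence containing one SSR of each type.
example = "CCG" + "A" * 12 + "CGC" + "AT" * 6 + "GGC" + "CTT" * 4 + "G"
ssrs = find_ssrs(example)
print(ssrs)
```

Dedicated SSR finders also handle compound and imperfect repeats, which this sketch deliberately ignores.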

The repetitive content of the chloroplast genome was far lower than that of the mitogenomes. We detected only 3,542 bp of repeats in the D. henanense plastome, comprising four types that ranged from 21 bp to 141 bp. Forward, reverse, complement, and palindromic repeats accounted for 31%, 11%, 2%, and 56% of the total number of repeats, respectively (Additional file 7: Fig. S5). The distribution density of these repeats varied among the SSC, LSC, and IR regions; the LSC region had a higher repeat density than the SSC and IR regions. A total of 32 SSRs were identified in the plastome (Additional file 8: Fig. S6), with 20, 4, and 8 located in the LSC, IR, and SSC regions, respectively.

Synteny and gene clusters of two mitogenomes

Gene synteny between the mitochondrial genomes of D. wilsonii and D. henanense was analyzed (Fig. 3). The gene contents of these two mitogenomes were conserved, and most genes were arranged in clusters, but gene orders and positions varied among isoforms. Subsequently, we identified gene clusters, defined as two or more adjacent genes, in the D. wilsonii and D. henanense mitogenomes (Additional file 9: Table S3). These two mitogenomes shared 14 gene clusters: rrn26-trnM-CAT, atp8-nad4L-atp4, trnP-TGG-trnW-CCA, nad2-trnY-GTA, trnE-TTC-trnY-GTA, atp6_b-trnV-TAC, rps3-rpl16-rpl2-rps19, atp9-rps7, atp6-trnM-CAT, atp1-ccmFn, rrn5-rrn18, nad7-trnI-TAT, rps14-rpl5, and nad3-rps12. However, nad9-trnF-GAA and trnM-CAT-trnG-GCC, which are present in other Dendrobium mitogenomes, were absent from these two mitogenomes. Although mitochondrial structures varied due to frequent rearrangements, gene clusters were highly conserved in the D. wilsonii and D. henanense mitogenomes. These gene clusters are potential co-transcription units, and their fragmentation would probably result in partial loss of gene function, which could explain the relative conservation of gene clusters in mitochondrial genomes.
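One simple way to find gene clusters shared by two mitogenomes is to collect every pair of neighbouring genes in each genome and intersect the two sets. The sketch below makes several assumptions that are not from the study: adjacency is taken from gene order alone without a distance cutoff, strand is ignored, each isoform is treated as linear, and the gene orders are hypothetical.

```python
def adjacent_pairs(gene_orders):
    """Collect unordered pairs of neighbouring genes across all isoforms."""
    pairs = set()
    for order in gene_orders:
        for left, right in zip(order, order[1:]):
            # frozenset makes A-B and B-A equivalent (orientation-independent).
            pairs.add(frozenset((left, right)))
    return pairs


# Hypothetical gene orders per isoform; gene names are real, arrangement is not.
species_a = [["rrn26", "trnM-CAT", "nad1"], ["atp8", "nad4L", "atp4"]]
species_b = [["atp4", "nad4L", "atp8", "cox1"], ["trnM-CAT", "rrn26"]]

shared = adjacent_pairs(species_a) & adjacent_pairs(species_b)
for pair in sorted("-".join(sorted(p)) for p in shared):
    print(pair)
```

Longer clusters such as rps3-rpl16-rpl2-rps19 would appear here as chains of shared pairs; merging them into maximal clusters is left out to keep the example short.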

Fig. 3

Synteny between mitochondrial genomes of D. wilsonii and D. henanense. Syntenic gene pairs between mitogenomes are connected by grey curves

Sequence transfer from plastomes to mitogenomes

A total of 79,909 bp and 96,511 bp of cp-derived sequences were identified, accounting for 10.5% and 12% of the lengths of the D. wilsonii and D. henanense mitogenomes, respectively (Table 1). The length of individual cp-derived sequences ranged from 216 to 4,227 bp in the D. wilsonii mitogenome and from 263 to 9,901 bp in the D. henanense mitogenome (Fig. 2b). In the D. wilsonii mitogenome, lengths of 200 to 500 bp were most common (30 cp-derived sequences), followed by 501 to 1,000 bp (21 cp-derived sequences). In the D. henanense mitogenome, the most common length class was 501–1,000 bp, followed by 1,001–1,500 bp. Five and 12 intact plastid genes were annotated in the cp-derived sequences of the two mitogenomes, respectively (Fig. 1).
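The length classes used above correspond to 500-bp bins. A minimal sketch of such binning follows; the fragment lengths are hypothetical apart from the reported minimum and maximum for D. wilsonii, and the lowest bin is labelled 1-500 here, whereas the text reports it as 200 to 500 bp.

```python
from collections import Counter

BIN_WIDTH = 500  # bp; matches the class width used in the text


def length_bin(length_bp: int) -> str:
    """Label a fragment with its 500-bp bin: 1-500, 501-1000, 1001-1500, ..."""
    upper = -(-length_bp // BIN_WIDTH) * BIN_WIDTH  # round up to a multiple of 500
    lower = upper - BIN_WIDTH + 1
    return f"{lower}-{upper}"


# Hypothetical fragment lengths (bp); 216 and 4227 are the reported extremes.
lengths = [216, 480, 750, 1000, 1001, 4227]
histogram = Counter(length_bin(n) for n in lengths)
for label, count in histogram.most_common():
    print(label, count)
```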

To understand the distribution of transferred sequences, the numbers of cp-derived sequences were counted at different locations in the mitogenomes (Fig. 2c). Cp-derived sequences were unevenly distributed among isoforms, and this distribution was independent of isoform length. For instance, in the D. wilsonii mitogenome, most isoforms contained 1 to 10 cp-derived sequences, with the exception of isoforms 3, 7, 18, 19, and 22. Isoform 17 had the most cp-derived sequences (10). Similar results were found in the D. henanense mitogenome, with 1 to 13 cp-derived sequences distributed in 20 of 24 isoforms.

To explore the potential mechanism of continual sequence transfer, we tested the correlations among cp-derived sequences, GC content, and repeats (Additional file 10: Fig. S7). The correlation between cp-derived sequences and GC content (Pearson’s r = -0.34) was stronger than that between cp-derived sequences and repeats (Pearson’s r = -0.07), yet neither correlation was statistically significant.
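Pearson's r measures the strength of a linear relationship between two variables, ranging from -1 to 1. The sketch below computes it from first principles on per-isoform values; the input numbers are hypothetical and do not reproduce the coefficients reported above, and no significance test is included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Hypothetical per-isoform values, NOT data from the paper.
cp_sequence_counts = [1, 4, 10, 2, 0, 6]
gc_content_percent = [44.1, 43.2, 42.8, 44.5, 43.9, 43.0]
print(round(pearson_r(cp_sequence_counts, gc_content_percent), 2))
```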

Mitogenome comparison of D. wilsonii and D. henanense with four other orchid species

We compared the two newly sequenced mitochondrial genomes with four other orchid mitogenomes (Dendrobium officinale: LC640134‐LC640155; D. huoshanense: LC657527‐LC657545; Phalaenopsis aphrodite: MN366132-MN366175; Gastrodia elata: MF070084-MF070102). Mitogenome sizes varied from 576 kb in Phalaenopsis aphrodite to 1,339 kb in Gastrodia elata (Fig. 4a). The mitogenomes of the Dendrobium species and the two other orchid species displayed multi-chromosomal structures consisting of 19–44 isoforms. Protein-coding gene content was similar among the orchid species, comprising 37–38 unique genes with a total length of 30,969–34,593 bp.

Fig. 4

Genomic comparisons among Orchidaceae mitogenomes. a Genome size and content of the D. wilsonii and D. henanense mitogenomes and four other orchid mitogenomes. Lengths of repeats, cp-derived sequences, and coding regions are shown in different colors; (b) Similarity among six Orchidaceae mitogenomes. Blue represents low similarity. Red represents relatively high similarity

As expected, sequence similarities at the genus level were higher than at the family level (Fig. 4b, Additional file 11: Fig. S8). The D. henanense mitogenome shared more sequences with D. huoshanense (91%), D. wilsonii (86%), and D. officinale (85%) than with the other orchid species (P. aphrodite, 37%; G. elata, 11%). Notably, the mitochondrial genome of G. elata shared only 11–15% of its sequences with Dendrobium and Phalaenopsis, although these taxa are phylogenetically close within the orchids. A potential explanation for such low similarity is that numerous foreign sequences in the G. elata mitogenome were transferred from the mitogenome of its host through HGT.

We also examined the contents of repetitive and cp-derived sequences in the Orchidaceae (Fig. 4a). Repetitive sequence content varied among the mitogenomes of these species, ranging from 50,419 bp in D. officinale to 165,202 bp in D. wilsonii, and accounted for a high proportion of total mitogenome length (8%–22%). The cp-derived content was also extremely variable across the Orchidaceae mitogenomes: between 6,800 bp and 96,511 bp of cp-derived sequences were identified, accounting for 0.5%–12% of the whole mitochondrial sequences. Compared with the other orchid species, cp-derived sequences were more abundant in the mitogenomes of Dendrobium species.

Phylogenetic analysis

In the present study, the mitochondrial genomes of 26 Dendrobium species were newly assembled, with the D. huoshanense mitogenome as a reference. The phylogenetic relationships of Dendrobium were reconstructed based on mitochondrial genomes (matrix 1 and matrix 2) and chloroplast genomes (matrix 3) (Fig. 5). The topologies of the maximum likelihood (ML) and Bayesian inference (BI) phylogenies were highly consistent for all three matrices (Additional file 12: Fig. S9 vs Fig. 5a, Additional file 13: Fig. S10 vs Fig. 5b, Additional file 14: Fig. S11 vs Fig. 5c). The backbones of the trees based on the three matrices were strongly supported, with PP > 0.99 and BPML > 85%, except for a few nodes.

Fig. 5

Phylogenies of 26 Dendrobium species inferred from whole mitogenomes and plastomes. a Plastid phylogeny; (b) Mitochondrial phylogeny based on whole mitogenomes; (c) Mitochondrial phylogeny based on mitogenomes excluding plastid-derived sequences. Only the BI trees of the mitochondrial and plastid phylogenies are shown because their topologies are almost identical to those of the ML trees (Additional file 1: Fig. S6-S8). The two numbers on each branch are the bootstrap support (BS) of the ML analysis and the posterior probability (PP) of the BI analysis, respectively. Only BS values > 50% are shown near the nodes. Black stars indicate a BS of 100% or a PP of 1.00. Discordances between phylogenies are marked with different colors

To understand the effect of foreign sequences in mitogenomes on phylogenetic analyses, the phylogeny of matrix 1 (including cp-derived sequences) was compared with the phylogeny of matrix 2 (excluding cp-derived sequences) (Fig. 5b, 5c). The topologies of these two phylogenies were consistent in most clades, except at two positions: (1) in the matrix 1 phylogeny, D. falconeri clustered within the D. officinale clade, forming a monophyletic group, whereas in matrix 2, D. falconeri was paraphyletic with respect to the D. officinale clade; (2) the D. fimbriatum clade and the D. exile clade formed a monophyletic group in matrix 2, whereas these two clades were paraphyletic in the matrix 1 phylogeny.

The mitochondrial phylogenies (matrix 1 and matrix 2) were also compared with the chloroplast phylogeny. The mitochondrial and plastid phylogenies agreed in most clades, except for a few nodes (Fig. 5). The tree from mitochondrial matrix 1 shared more topological features with the chloroplast tree than did the tree from matrix 2. Nevertheless, the chloroplast and mitochondrial phylogenies differed in the positions of the D. gratiosissimum clade and the D. densiflorum clade, a difference that was unaffected by cp-derived sequences. These results show that the mitochondrial phylogeny reveals evolutionary relationships in Dendrobium that are distinct from those of the plastid phylogeny. Moreover, the presence of cp-derived sequences would lead to an underestimation of the potential inconsistency between the mitochondrial tree and the plastid tree.