1 Introduction

Descurainia sophia, also known as flixweed, belongs to the genus Descurainia of the Cruciferae family and is mainly distributed in northern China. The dried mature seeds of D. sophia are called “Tinglizi”; the species is the mainstream source of the “Tinglizi” commonly used in clinical medicine today and is recorded in the Pharmacopoeia of the People's Republic of China (2020 edition) [1]. Traditional Chinese medical theory holds that the seeds relieve cough, prevent asthma, promote urination and alleviate edema [2]. In addition, the seeds of D. sophia have been found to have significant pharmacological activity in improving cardiovascular function and regulating blood lipid levels [3]. Chemical analyses have isolated various secondary metabolites from D. sophia, including glucosinolates, isothiocyanates, flavonoids, cardiac glycosides, phenylpropanoids, organic acids and fatty oils [3]. A total of 10 glucosinolates [4,5,6] and 10 isothiocyanates [7, 8] have been identified from D. sophia, and in 2016 we isolated 4 benzenic glucosinolate compounds from its seeds [9]. Glucosinolates and isothiocyanates are not only involved in the plant's defense system but also have preventive effects against diabetes and hypertension as well as neuroprotective and anticancer activities, and they form an important part of the medicinal material basis of D. sophia [10]. Moreover, the oil content of D. sophia seeds is as high as 39–44%, and the oil consists mostly of unsaturated fatty acids, including 38.13–40.92% linolenic acid [11]. As an essential fatty acid, linolenic acid has a variety of biological activities, such as lowering blood lipids, cardiovascular protection, and anticancer and antioxidant effects [12], and has high medicinal and nutritional value. At present, most studies on D. sophia have focused on the isolation of chemical components and on pharmacological activities, and there are no reports on the nuclear or mitochondrial genomes of D. sophia.

Mitochondria are important organelles in eukaryotes and the site of aerobic oxidation, providing energy for the physiological activities of the cell. Mitochondria possess their own genetic material and genetic system and are therefore semi-autonomous organelles [13]. The first plant mitochondrial genome (mitogenome) to be sequenced was that of the terrestrial plant Marchantia polymorpha, reported in 1992 [14]. With the development of sequencing technology, mitogenome data have been released for the dicotyledons Arabidopsis thaliana, Beta vulgaris, Nicotiana tabacum and Brassica napus, the monocotyledons Zea mays and Oryza sativa [15], and the medicinal plants Salvia miltiorrhiza [16] and Cannabis sativa [17]. Mitogenome size varies widely among plants and animals: most animal mitogenomes are about 15–17 kb, whereas angiosperm mitogenomes are generally between 200 and 750 kb, with that of cucumber reaching 1156 kb [15]. Research on the mitogenome has progressed much more slowly than research on the chloroplast genome because of its complex structure, small number of coding genes, and difficulty of assembly. As of 15 November 2023, the NCBI GenBank database held records for 10,387 chloroplast genomes, 596 mitochondrial genomes and 1296 plastid genomes. The main characteristics of plant mitogenomes include large variation in genome size and structure, highly conserved genes, sparse gene distribution, a large amount of non-coding sequence, and a large number of RNA editing sites.

In recent decades, with the rapid development of second- and third-generation high-throughput sequencing technologies and the decreasing cost of sequencing, more and more plant mitogenomes have been reported, and plant mitogenomes are now used for molecular identification and phylogenetic analysis [18]. The mitogenome of D. sophia is therefore of great value for studying the origin and evolution of D. sophia, determining its phylogenetic status and conserving germplasm resources. In this study, the complete mitogenome of D. sophia was assembled using high-throughput sequencing technology, and its structure and composition were investigated, providing a molecular basis for studying genetic diversity and evolutionary relationships and for developing new molecular markers in D. sophia.

2 Materials and methods

2.1 Plant materials and mitogenome sequencing

Leaves of D. sophia were collected from the Henan Provincial Medicinal Botanical Garden, Henan University of Chinese Medicine, Zhengzhou City, Henan Province, China (34°46′41″N, 113°48′28″E) in April 2022. High-quality genomic DNA was isolated from the leaves using the modified CTAB method [19], followed by high-throughput sequencing on the PacBio Sequel II and Illumina HiSeq X Ten platforms.

2.2 Assembly and annotation of mitogenome

The Illumina short-reads were assembled using GetOrganelle v1.7.7.0 [20] with the default parameters for assembling plant mitochondrial genomes and with the plant mitochondrial genome database “embplant_mt” selected. The PacBio long-reads were assembled directly with the default parameters of the Flye assembler v2.9.2 [21] to obtain a graphical assembly in GFA format. The PacBio long-reads were then aligned to the mitogenome with BWA v0.7.17 [22], and the aligned long-reads were exported with SAMtools [23] and used to resolve the repeated sequence regions of the graphical mitogenome assembled from the Illumina short-reads. Finally, the short-read and long-read assemblies were aligned against each other to obtain the structure and composition of the D. sophia mitogenome.

Arabidopsis thaliana (NC_037304), Capsella rubella (NC_042883.1) and Liriodendron tulipifera (NC_021152.1) were selected as reference genomes, and GeSeq v2.03 [24] was used to annotate the protein coding genes (PCGs) of the D. sophia mitogenome. tRNAscan-SE v2.0.11 [25] and BLASTN v2.13.0 [26] (parameters: -evalue 1e-5 -outfmt 6 -max_hsps 10 -word_size 7 -task blastn-short) were applied to annotate tRNA and rRNA genes, respectively. Apollo v1.11.8 [27] was used to manually correct annotation errors.

2.3 Analysis of codon preference and repeated sequences

PhyloSuite v1.1.16 [28] was used to extract the PCGs of the mitogenome, which were then analyzed for codon preference; RSCU (Relative Synonymous Codon Usage) values were calculated using MEGA v7.0 [29]. Repeated sequences, including simple sequence repeats (SSRs), tandem repeats and dispersed repeats, were identified with MISA v2.1 [30], TRF v4.09 [31] and the REPuter program [32] (default parameters), respectively. The results were visualized in Excel 2021.
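The RSCU statistic itself is simple to state: for each codon, the observed count is divided by the count expected if all synonymous codons for that amino acid were used equally. The following Python sketch illustrates that definition only; it is not the MEGA implementation used in this study. It assumes the standard genetic code (NCBI translation table 1), and the function name `rscu` is hypothetical.

```python
from collections import Counter

# Assumption: standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences.

    RSCU = observed codon count * (number of synonymous codons)
           / (total count of all synonymous codons for that amino acid).
    Amino acids that never occur in the input are omitted.
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families (stop codons form one family, '*').
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            result[c] = counts[c] * len(codons) / total
    return result


if __name__ == "__main__":
    # Toy sequence for illustration only, not real D. sophia data.
    for codon, value in sorted(rscu(["ATGGCTGCTGCCTAA"]).items()):
        print(codon, round(value, 2))
```

Because methionine and tryptophan are each encoded by a single codon under this code, their RSCU is 1 whenever they occur, so values above 1 for other codons indicate preferential use.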

2.4 Identification of mitochondrial plastid DNAs (MTPTs)

GetOrganelle v1.7.7.0 [20] and CPGAVAS2 [33] (default parameters) were used to assemble and annotate, respectively, the chloroplast genome of D. sophia from the Illumina short-reads. The chloroplast genome annotation was corrected using CPGView software [34]. BLASTN v2.13.0 [26] was applied to identify mitochondrial plastid DNAs (MTPTs) shared between the mitogenome and the chloroplast genome. The results were visualized using Circos v0.69.9.
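Once homologous fragments have been located on the mitogenome, their combined share of the genome can be computed by merging overlapping coordinates so that no base is counted twice. The Python sketch below shows one way to do this under stated assumptions: coordinates are 1-based and inclusive, and the function names `merged_length` and `covered_percent` are hypothetical rather than part of any tool used in this study.

```python
def merged_length(intervals):
    """Number of bases covered by 1-based inclusive intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    # Normalise reversed coordinates (minus-strand hits), then sweep left to right.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def covered_percent(intervals, genome_length):
    """Percentage of a genome of the given length covered by the intervals."""
    return 100.0 * merged_length(intervals) / genome_length


if __name__ == "__main__":
    # Illustrative coordinates only; the third interval is given in reverse order.
    print(merged_length([(1, 10), (5, 20), (30, 25)]))
    # A single 6,004 bp span on a 265,457 bp genome, as a percentage.
    print(round(covered_percent([(1, 6004)], 265457), 2))
```

Merging before summing matters when BLASTN reports overlapping hits for the same region, which would otherwise inflate the transferred fraction.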

2.5 Phylogenetic and synteny analysis

Twenty-nine plant species, including D. sophia, from three orders of angiosperms were selected according to their genetic relationships to construct a phylogenetic tree, and their complete mitogenomes were downloaded from the NCBI GenBank database. These species were D. sophia (OQ916154), Brassica napus (NC_008285.1), Brassica rapa (NC_016125.1), Brassica juncea (NC_016123.1), Brassica carinata (NC_016120.1), Brassica oleracea (NC_016118.1), Brassica nigra (NC_029182.1), Boechera stricta (NC_042143.1), Arabidopsis thaliana (NC_037304.1), Arabis alpina (NC_037070.1), Capsella bursa-pastoris (MN746809.2), C. rubella (NC_042883.1), Sinapis arvensis (NC_031896.1), Schrenkiella parvula (KT988071.2), Eruca vesicaria (KF442616.1), Raphanus sativus (NC_018551.1), Carica papaya (NC_012116.1), and Batis maritima (NC_024429.1) of the order Brassicales; Theobroma grandiflorum (NC_066895.1), Theobroma cacao (NC_066894.1), Aquilaria sinensis (NC_054354.1), Bombax ceiba (NC_038052.1), Gossypium thurberi (NC_035074.1), Hibiscus cannabinus (NC_035549.1), Gossypium arboreum (NC_035073.1), Gossypium davidsonii (NC_035075.1), and Gossypium trilobum (NC_035076.1) of the order Malvales; and Cotinus coggygria (NC_064986.1) and Mangifera longipes (NC_060990.1) of the order Sapindales. The mitogenomes of C. coggygria and M. longipes were set as the outgroup. The 21 common genes (atp6, atp8, ccmB, ccmC, ccmFC, cob, cox1, cox3, matR, nad1, nad2, nad4, nad5, nad6, nad7, nad9, rpl5, rpl16, rps3, rps14, sdh4) that were present in all of these mitogenomes were extracted with PhyloSuite v1.1.16 [28]. Multiple sequence alignment was performed using MAFFT v7.505 [35], and IQ-TREE v1.6.12 [36] was then used to construct the phylogenetic tree. iTOL v4.0 [37] was used to visualize the phylogenetic tree.

Based on the results of the phylogenetic analysis, the mitogenomes of seven Brassicales species were chosen for synteny analysis: C. bursa-pastoris, C. rubella, A. thaliana, B. stricta, D. sophia, B. maritima, and C. papaya. Of these, C. bursa-pastoris, C. rubella, A. thaliana and B. stricta were more closely related to D. sophia, whereas B. maritima and C. papaya were more distantly related. To analyze collinearity among these seven mitogenomes, pairwise BLASTN alignments between the mitogenomes were generated with the BLAST program. Homologous sequences longer than 500 bp were retained as conserved collinear regions, and a multiple synteny plot was generated with MCScanX software [38].
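The length filter described above can be illustrated with a short Python sketch that reads BLASTN tabular output and keeps only alignments longer than 500 bp. It assumes the standard 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `filter_hits` and the example records are hypothetical and do not reproduce the exact commands run in this study.

```python
def filter_hits(lines, min_length=500):
    """Parse BLAST tabular (outfmt 6) lines and keep hits longer than min_length bp."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column record
        length = int(fields[3])  # column 4 is the alignment length
        if length > min_length:  # "longer than 500 bp" is read as strictly greater
            kept.append({
                "query": fields[0],
                "subject": fields[1],
                "identity": float(fields[2]),
                "length": length,
                "q_start": int(fields[6]),
                "q_end": int(fields[7]),
                "s_start": int(fields[8]),
                "s_end": int(fields[9]),
            })
    return kept


if __name__ == "__main__":
    # Made-up records for illustration; sequence names are placeholders.
    example = [
        "mitoA\tmitoB\t98.50\t1200\t18\t0\t100\t1299\t5000\t6199\t0.0\t2100",
        "mitoA\tmitoB\t95.00\t500\t25\t0\t2000\t2499\t9000\t9499\t0.0\t800",
        "mitoA\tmitoB\t99.00\t320\t3\t0\t4000\t4319\t700\t1019\t1e-160\t570",
    ]
    for hit in filter_hits(example):
        print(hit["query"], hit["subject"], hit["length"])
```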

2.6 Prediction of RNA editing sites

The DNA sequences of the 35 PCGs encoded by the D. sophia mitogenome were used as input, and C → U RNA editing sites in the mitochondrial PCGs were predicted with Deepred-Mt software [39] using a cutoff value of 0.9. Deepred-Mt bases its predictions on a convolutional neural network (CNN) model, which has been reported to achieve high accuracy compared with other prediction software.
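Applying a probability cutoff and tallying the retained sites per gene can be sketched in Python as follows. This is an illustration only: the input format (gene name, position, predicted probability) is a hypothetical simplification and not the actual Deepred-Mt output format, the function name `count_edit_sites` is invented for this example, and treating the cutoff as inclusive (score ≥ 0.9) is an assumption.

```python
from collections import Counter


def count_edit_sites(predictions, cutoff=0.9):
    """Count predicted editing sites per gene whose score meets the cutoff.

    predictions: iterable of (gene, position, score) tuples (hypothetical format).
    """
    per_gene = Counter()
    for gene, _position, score in predictions:
        if score >= cutoff:  # assumption: the cutoff is inclusive
            per_gene[gene] += 1
    return per_gene


if __name__ == "__main__":
    # Made-up scores for illustration; these are not results from this study.
    demo = [
        ("ccmB", 28, 0.97),
        ("ccmB", 71, 0.42),
        ("nad4", 107, 0.93),
        ("nad4", 158, 0.90),
        ("atp6", 635, 0.15),
    ]
    for gene, n in count_edit_sites(demo).most_common():
        print(gene, n)
```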

3 Results

3.1 Characteristics of D. sophia mitogenome

We assembled the mitogenome of D. sophia from data generated on the PacBio Sequel II and Illumina HiSeq X Ten platforms. A total of 14.8 Gb of raw PacBio long-read data and 12.9 Gb of raw Illumina short-read data were generated, giving an average depth of 60.76× across the D. sophia mitogenome with no gaps (Fig. S1). The draft mitogenome assembled from Illumina short-reads was visualized using Bandage software [40], producing a unitig graph that represented the mitogenome structure of D. sophia. The yellow blocks in the unitig graph indicate the repeated sequence regions at branch nodes (Fig. 1A). The branch nodes caused by repeated sequences were then resolved using PacBio long-reads, and a circular contig molecule was generated from the unitig graph (Fig. 1C). This molecule was consistent with the result obtained by assembling the PacBio long-reads directly with Flye (Fig. 1B); the two assemblies differed in length by only two bases. These results show that the assemblies based on Illumina short-reads and PacBio long-reads were consistent. Considering the high accuracy of the Illumina sequencing platform, we adopted the assembly based on Illumina short-reads. The main structure of the D. sophia mitogenome was a single circular molecule. After the repeated regions were resolved with PacBio long-reads, one closed circular DNA molecule without branches was assembled, with a total length of 265,457 bp and a GC content of 44.78% (Fig. 2). The assembled mitogenome sequence and the raw data (Illumina and PacBio) of D. sophia have been submitted to the NCBI GenBank database (accession number OQ916154) and the SRA database (accession number PRJNA1078884).

Fig. 1 Structure analysis of D. sophia mitogenome. A The unitig graph of D. sophia mitogenome assembled from Illumina short-reads; the yellow blocks represent the repeated sequence regions of branch nodes, and the green lines indicate a single loop contig; B Mitogenome assembly map based on PacBio long-reads; C The circular contig molecule obtained after resolving the branch nodes caused by repeated sequences based on PacBio long-reads

Fig. 2 The complete mitochondrial genome map of D. sophia. The genes inside the circle are transcribed clockwise, whereas the genes outside the circle are transcribed counterclockwise. Genes are represented by different colors according to their functional classification. The arrows indicate the positive and negative strands. The gray inner circle stands for the GC content

The D. sophia mitogenome was annotated with 56 unique genes, including 35 PCGs, 3 rRNA genes (rrn5, rrn18, and rrn26) and 18 tRNA genes (of which trnM-CAU, trnS-GCU and trnY-GUA were multi-copy genes) (Fig. 2, Table 1). The PCGs comprised 25 core genes and 10 variable genes (Table 1). The 25 core genes fell into 7 classes: 9 NADH dehydrogenase genes, 1 ubiquinol cytochrome c reductase gene, 3 cytochrome c oxidase genes, 5 ATP synthase genes, 5 cytochrome c biogenesis genes, 1 maturase gene, and 1 transport membrane protein gene. The 10 variable genes comprised 4 ribosomal protein large subunit genes, 5 ribosomal protein small subunit genes, and 1 succinate dehydrogenase gene.

Table 1 Gene composition in the mitogenome of D. sophia

3.2 Analysis of codon preference in D. sophia mitogenome

The 35 PCGs of the D. sophia mitogenome were analyzed for codon preference, and the codon usage for each amino acid is shown in Fig. 3. Codons with an RSCU (Relative Synonymous Codon Usage) value greater than 1 were considered to be used preferentially (Table S1). Except for the start codon AUG and the tryptophan codon UGG, which both had RSCU values of 1, the mitochondrial PCGs showed a general codon usage preference. For instance, among stop codons UAA was strongly preferred, with the highest RSCU value (1.68) of any codon in the mitochondrial PCGs, followed by the alanine (Ala) codon GCU with an RSCU value of 1.6. Notably, the maximum RSCU value for phenylalanine (Phe) was less than 1.2, indicating no strong codon usage preference for this amino acid.

Fig. 3 Analysis of codon preference in D. sophia mitogenome. Horizontally, the 21 amino acids and their corresponding codons are displayed on the x-axis. Vertically, rectangles of different color indicate RSCU values for different codons of the same amino acid

3.3 Analysis of repeated sequences

A total of 57 SSRs were identified in the D. sophia mitogenome. There were 14 monomeric, 8 dimeric, 10 trimeric, 22 tetrameric and 3 pentameric SSRs, accounting for 24.56, 14.03, 17.54, 38.60 and 5.26% of the total SSRs, respectively, and no hexameric SSRs were detected (Fig. 4A). Among the 14 monomeric SSRs, 6 were adenine (A) repeats, making up 42.86% of the monomeric SSRs (Table S2). The mitogenome of D. sophia contained 21 tandem repeats with a match greater than 73% and a length of 8–71 bp (Table S3). Moreover, 391 pairs of dispersed repeats with a length of at least 30 bp were detected, including 210 pairs of palindromic repeats and 181 pairs of forward repeats (Fig. 4B). No reverse or complementary repeats were identified. The longest palindromic repeat was 1062 bp and the longest forward repeat was 451 bp (Table S4).
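To make the SSR classes concrete, the following Python sketch finds perfect microsatellites with a regular expression, scanning one motif length at a time. It is a simplified illustration and not the MISA program used in this study. The minimum repeat numbers in `DEFAULT_MIN_REPEATS` are assumed values chosen for the example, since the thresholds are not stated here, and the function names are hypothetical.

```python
import re

# Assumption: minimum repeat counts per motif length (1-6 bp), chosen for illustration.
DEFAULT_MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence, min_repeats=DEFAULT_MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples with 1-based inclusive coordinates."""
    seq = sequence.upper()
    found = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # avoid reporting 'AA' repeats as dimers, etc.
                copies = (match.end() - match.start()) // unit_len
                found.append((match.start() + 1, match.end(), motif, copies))
    return sorted(found)


if __name__ == "__main__":
    # Toy sequence for illustration only.
    toy = "GGG" + "A" * 12 + "C" + "AT" * 6 + "G"
    for ssr in find_ssrs(toy):
        print(ssr)
```

Each reported SSR can then be classed by motif length (monomeric to hexameric) and counted to obtain the kind of breakdown given above.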

Fig. 4 Analysis of repeated sequences in D. sophia mitogenome. A Types of SSRs; B Types of tandem repeats and dispersed repeats

3.4 Mitochondrial plastid DNA analysis

During mitochondrial evolution, some chloroplast genome fragments were transferred into the mitogenome to produce mitochondrial plastid DNAs (MTPTs). Twelve homologous fragments were identified between the chloroplast genome and the mitogenome of D. sophia (Fig. 5, Table 2), with a total length of 6,004 bp, accounting for 2.26% of the mitogenome. Two of these fragments were longer than 1,000 bp, and the longest aligned fragment was MTPT7, with a length of 2,485 bp. Annotation of these 12 homologous fragments revealed 7 complete genes (Table 2), comprising 1 chloroplast PCG (psaA) and 6 tRNA genes (trnD-GUC, trnN-GUU, trnI-CAU, trnM-CAU, trnS-GGA, and trnW-CCA). Many tRNA genes remained intact in both organelle genomes and had high sequence similarity, suggesting that they may still function in the mitogenome of D. sophia.

Fig. 5 Homologous fragments between chloroplast genome and mitogenome in D. sophia. The blue arc represents the mitogenome (mtDNA), the green arc indicates the chloroplast genome (cpDNA), and the yellow lines stand for the homologous fragments between the two organelle genomes

Table 2 Transferred fragments from the chloroplast genome to the mitogenome in D. sophia

3.5 Phylogenetic and synteny analysis

To investigate the phylogenetic position of D. sophia, we performed a phylogenetic analysis of 29 plant species from three angiosperm orders, including D. sophia, based on the DNA sequences of 21 common mitochondrial PCGs (Fig. 6). The mitogenomes of M. longipes and C. coggygria, two species of Sapindales, were set as outgroups. The results showed that species from the three orders, Brassicales, Malvales and Sapindales, clustered into three separate branches. According to the topology of the mitogenome phylogeny and the latest Angiosperm Phylogeny Group (APG IV) classification [41], D. sophia belonged to the Cruciferae family and was most closely related to B. stricta (Fig. 6).

Fig. 6 Phylogenetic analysis of D. sophia. Two plant species of Sapindales, M. longipes and C. coggygria, were set as outgroups

Based on sequence similarity, MCScanX software was used to generate a multiple synteny plot of D. sophia and closely related species. Many homologous collinear regions were identified between the mitogenomes of D. sophia and the six other Brassicales species, but these regions were short (Fig. 7). Furthermore, some gaps were detected, indicating sequences that were unique to D. sophia and had no homology with the six other species. The results showed that the arrangement order of collinear regions was poorly conserved among the mitogenomes of these seven species, suggesting that the mitogenome of D. sophia had undergone frequent genome recombination relative to closely related species.

Fig. 7 Synteny analysis of the D. sophia mitogenome and closely related species. The horizontal bars represent mitogenomes of different plant species with different colors. The red arcs show regions where inversions occur, and the gray arcs show regions of good homology

3.6 Prediction of RNA editing sites in mitogenome of D. sophia

The RNA editing sites of the 35 PCGs in the D. sophia mitogenome were predicted using Deepred-Mt software with a cutoff value of 0.9. A total of 406 potential RNA editing sites were identified in the 35 mitochondrial PCGs, all of which were C → U editing sites (Fig. S2, Table S5). Among all PCGs of the D. sophia mitogenome, the ccmB gene had the most RNA editing sites (34), followed by the nad4 gene (32). The atp6 and rps14 genes each had only one RNA editing site.

4 Discussion

Until now, no mitogenome data had been released for any species of the genus Descurainia in the Cruciferae family. In this study, second- and third-generation sequencing technologies (the Illumina HiSeq X Ten and PacBio Sequel II platforms) were used to resolve the mitogenome of D. sophia. The mitogenome of D. sophia had a circular structure with a size of 265,457 bp and a GC content of 44.78%, which was similar to that of A. thaliana (44.8%), C. bursa-pastoris (44.74%), O. sativa (43.8%), Z. mays (43.9%), and Morus alba (45.50%), indicating that GC content is highly conserved in higher plants [42]. Codon usage differs greatly among species. This preference is the result of a relative balance that has gradually developed within cells during long-term evolutionary selection. In the mitogenome of D. sophia, most of the PCGs had the typical ATG start codon, and the amino acid composition was similar to that of other angiosperms [43].

The PCGs accounted for only about 11.94% of the D. sophia mitogenome, tRNA and rRNA genes occupied 0.68% and 1.97%, respectively, and the remaining 85.41% consisted of non-coding sequences. Mitogenome coding sequences are more conserved than non-coding sequences, and non-coding sequences are the main source of mitogenome variation [44]. The non-coding sequences of the mitogenome are mainly composed of repeated sequences, chloroplast genome homologous sequences, and nuclear genome homologous sequences. Repeated sequences can be divided into two categories, tandem repeats and dispersed repeats, and are essential for recombination in the mitogenome. The longest repeated sequences in a species (generally more than 1 kb in angiosperms) can mediate homologous recombination, resulting in isomerization of the mitogenome [45]. In the mitogenome of D. sophia, there was only one pair of dispersed repeats longer than 1 kb, with a length of 1062 bp, and the rest of the dispersed repeats were 30–713 bp in length. Homologous recombination mediated by repeated sequences is common in plant mitogenomes. The size of repeated sequences has been reported to be closely associated with the frequency of recombination: recombination mediated by short repeated sequences tends to be less frequent than that mediated by long repeated sequences [15]. For example, in the mitogenomes of Scutellaria tsinyunensis [46], Abelmoschus esculentus [47] and Prunus salicina [48], the recombination frequency mediated by long repeated sequences was high, while that mediated by short repeated sequences was low. In the mitogenome of D. sophia, high-frequency recombination was detected only for the single pair of long repeated sequences, and short repeated sequences that may be involved in recombination have not yet been identified. The degree of variation and structural heterogeneity of the mitogenome varies significantly among Brassicaceae.
Through analysis of 16 Brassicaceae mitogenomes, Liu found that the mitogenome of Meniocus linifolius has four conformations, with greater loss of PCGs and an ambiguous phylogenetic status. The mutation rate and structural heterogeneity of M. linifolius were significantly accelerated compared with other Brassicaceae plants [49]. As more mitogenomes from the genus Descurainia are released, it will become possible to analyze mitogenome size, gene number (gene loss), repetitive sequences, gene variation, and the evolutionary impact of these variations at the genus level.

During plant mitochondrial evolution, some chloroplast genome fragments were transferred into the mitogenome, and the length and sequence similarity of the transferred fragments vary among species. Twelve fragments were identified as having migrated from the chloroplast genome to the D. sophia mitogenome, including 7 complete genes (1 PCG and 6 tRNA genes), with a total length of 6 kb, accounting for 2.26% of the mitogenome. By comparison, 23 chloroplast-derived fragments were detected in the mitogenome of Quercus acutissima, including 13 complete genes (2 PCGs, 10 tRNA genes and 1 rRNA gene), with a total length of 15.69 kb, representing 3.49% of the mitogenome [50]. These are typical ratios for angiosperms, in which the total length of chloroplast-derived fragments ranges from 4.4 kb in Arabidopsis [51] to 138 kb in Amborella [52]. In angiosperms, the migration of tRNA genes from chloroplasts to mitochondria is common, indicating that tRNA genes are more conserved in the mitogenome than PCGs and may play an indispensable role in mitochondria [53]. Transferred chloroplast PCGs that contained only partial sequences had presumably experienced a certain degree of gene loss. Considering the low substitution rate of the mitogenome, mitochondrial genes are important reference sequences for phylogenetic analysis at the taxonomic level [54]. Phylogenetic and synteny analysis revealed that D. sophia was closely related to B. stricta. The chromosome number of B. stricta is n = 7 [55], while that of D. sophia is 2n = 4× = 28 (unpublished data). This difference may have arisen through chromosomal changes such as chromosome fusion and whole genome duplication.

RNA editing is prevalent in the mitogenomes and chloroplast genomes of plants and is one of the steps required for mitochondrial and chloroplast gene expression. RNA editing is a post-transcriptional modification that contributes to improved protein folding and function [18]. It has been reported that two key RNA editing sites in the mRNA of the cotton mitochondrial gene atp1 are necessary for cotton fiber cell elongation and are closely associated with cotton fiber length [56]. The RNA editing factor SlORRM4 is required during tomato ripening: when the SlORRM4 gene was knocked out, RNA editing of the transcripts of many mitochondrial genes (e.g., nad3, cytc1, cox2) was reduced, resulting in delayed ripening of tomato [57]. These results indicate that mitochondrial RNA editing is closely associated with important cultivation traits in plants. Previous studies identified 441 RNA editing sites in 36 genes of the A. thaliana mitogenome [51] and 491 sites in 34 genes of the O. sativa mitogenome [58]. We identified 406 C → U RNA editing sites in 35 PCGs of the D. sophia mitogenome. The number of RNA editing sites varied widely among mitochondrial genes, with the cytochrome c biogenesis gene ccmB having the highest number, followed by the NADH dehydrogenase gene nad4. An efficient mitochondrial genome editor (mitoTALECD) has recently been developed in A. thaliana, which enables C → U editing of the A. thaliana mitogenome with an editing efficiency of up to 100%, and the resulting homogeneous mutations can be stably inherited by the next generation [59]. Therefore, the analysis of RNA editing sites provides a basis for predicting the function of genes with new codons in the future and helps in understanding the expression of mitochondrial and chloroplast genes in plants.

5 Conclusions

In this study, we sequenced, assembled and annotated the D. sophia mitogenome using second- and third-generation high-throughput sequencing technologies on the Illumina HiSeq X Ten and PacBio Sequel II platforms. According to the assembly results, the main structure of the D. sophia mitogenome was a circular DNA molecule without branches. The D. sophia mitogenome was 265,457 bp in size, with a GC content of 44.78%, and encoded 56 unique genes, including 35 PCGs, 18 tRNA genes and 3 rRNA genes. In addition, we analyzed codon preference, repeated sequences, DNA sequence transfer from the chloroplast genome, phylogenetic relationships, synteny and RNA editing sites in the D. sophia mitogenome. The phylogenetic and synteny analyses indicated that D. sophia was closely related to B. stricta and that the D. sophia mitogenome had undergone frequent genome recombination. This study provides essential information for future studies on genetic breeding, species identification and the development of new molecular markers in D. sophia.