Introduction

The genus Dacus Fabricius (Diptera: Tephritidae: Dacini) is one of the most economically important genera of fruit flies1. There are about 248 Dacus species, most of which show a strong preference for attacking the pods of Asclepiadaceae and Apocynaceae, or the fruits and flowers of Cucurbitaceae1,2. The majority of Dacus species are distributed on the African continent, while several species are found in the Indian Subcontinent, Southeast Asia, Australia and the Pacific2.

Dacus (Callantra) longicornis Wiedemann is widely distributed across southern and Southeast Asia and attacks Cucurbitaceae species3. Apart from taxonomic treatments and records (including first records) of the species in some countries and areas3,4,5, very few studies have focused on D. longicornis. Its molecular data are also poorly studied, with only seven records published in GenBank as of May 2016. Detailed molecular data for D. longicornis are increasingly needed, not only for studies of its population structure and geographical variability, but also for a comprehensive phylogenetic analysis of the tribe Dacini, which consists of two very large genera, Bactrocera Macquart (629 spp.) and Dacus Fabricius (248 spp.), and two small genera, Ichneumonopsis Hardy (one sp.) and Monacrostichus Bezzi (two spp.)2,7,8,9.

The whole mitogenome has become established as one of the most useful markers and has been used in molecular systematics, phylogeography, diagnostics and molecular evolutionary studies10,11,12. By May 2016, 45 complete mitogenomes of 19 Tephritidae species were available in GenBank (Supplementary Table S1), including 16 Bactrocera species: Bactrocera (Bactrocera) arceae (Hardy & Adachi) (KR233259)8, B. (B.) carambolae Drew & Hancock (EF014414), B. (B.) correcta (Bezzi) (JX456552), B. (B.) dorsalis (Hendel) (DQ845759, DQ917577, B. (B.) papayae Drew & Hancock DQ917578 and B. (B.) philippinensis Drew & Hancock DQ995281; B. (B.) papayae and B. (B.) philippinensis have been proven to be the same species as B. (B.) dorsalis)13,14, B. (B.) latifrons (Hendel) (KT881556)9, B. (B.) melastomatos Drew & Hancock (KT881557)9, B. (B.) tryoni (Froggatt) (HQ130030)15, B. (B.) umbrosa (Fabricius) (KT881558)9, B. (B.) zonata (Saunders) (KP296150)16, B. (Daculus) oleae (Gmelin) (AY210702, AY210703, GU108459 to GU108479)15,17, B. (Tetradacus) minax (Enderlein) (HM776033)18, B. (Zeugodacus) caudata (Fabricius) (KT625491 and KT625492)9, B. (Z.) cucurbitae (Coquillett) (JN635562)19, B. (Z.) diaphora (Hendel) (KT159730)20, B. (Z.) scutellata (Hendel) (KP722192) and B. (Z.) tau (Walker) (KP711431)24, the nucleotide composition of D. longicornis was AT-biased, with positive AT skews and negative GC skews, not only in the whole mitochondrial genome but also in the PCGs, rRNAs, tRNAs and the control region (Table 2).

Table 2 Nucleotide composition of the mitochondrial genome of Dacus longicornis.

All of the PCGs started with ATN codons (ATG in COII, ATP6, COIII, ND4, ND4L, CYTB and ND1; ATC in ATP8, ND5 and ND6; ATT in ND2; ATA in ND3) except for COI, which started with a TCG codon. Seven PCGs (COI, COII, ATP8, ATP6, COIII, ND4L and ND6) ended with a TAA stop codon, three PCGs (ND2, ND3 and ND4) had a TAG stop codon, and ND5, CYTB and ND1 had an incomplete stop codon, T.
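The codon assignments listed above can be tallied as a quick consistency check. The following is a minimal sketch that only re-encodes the start and stop codons reported in the text; the dictionary and helper names are illustrative and not part of the original analysis.

```python
from collections import Counter

# Start and stop codons exactly as reported in the paragraph above.
START_CODONS = {
    "COI": "TCG", "COII": "ATG", "ATP8": "ATC", "ATP6": "ATG", "COIII": "ATG",
    "ND1": "ATG", "ND2": "ATT", "ND3": "ATA", "ND4": "ATG", "ND4L": "ATG",
    "ND5": "ATC", "ND6": "ATC", "CYTB": "ATG",
}
STOP_CODONS = {
    "COI": "TAA", "COII": "TAA", "ATP8": "TAA", "ATP6": "TAA", "COIII": "TAA",
    "ND4L": "TAA", "ND6": "TAA", "ND2": "TAG", "ND3": "TAG", "ND4": "TAG",
    "ND5": "T", "CYTB": "T", "ND1": "T",  # "T" marks an incomplete stop codon
}


def is_atn(codon):
    """Return True for the start-codon family ATN (ATA, ATC, ATG, ATT)."""
    return len(codon) == 3 and codon.startswith("AT")


start_counts = Counter(START_CODONS.values())
stop_counts = Counter(STOP_CODONS.values())
# Genes whose start codon falls outside the ATN family.
non_atn_genes = sorted(g for g, c in START_CODONS.items() if not is_atn(c))

print(start_counts)
print(stop_counts)
print(non_atn_genes)
```

Both dictionaries cover the same 13 protein-coding genes, and COI is the only gene flagged as a non-ATN start.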

The 22 typical tRNAs usually observed in insect mitogenomes were also found in the D. longicornis mitogenome. The tRNAs ranged in size from 64 bp (tRNAHis) to 72 bp (tRNAVal). All tRNAs could be folded into the cloverleaf structure except for tRNASer(AGN), which lacked the dihydrouridine (DHU) arm (Fig. 2). Twenty-three G-U pairs, four mismatched U-U pairs and one mismatched U-C pair were found in the tRNA secondary structures of the D. longicornis mitogenome. The G-U pairs were located in the amino acid acceptor (AA) arm (9 bp), DHU arm (8 bp), anticodon (AC) arm (3 bp) and TψC (T) arm (3 bp). The mismatched U-U pairs were located in the AA arm (2 bp), AC arm (1 bp) and T arm (1 bp). The mismatched U-C pair was located in the T arm.

Figure 2

Putative secondary structures of tRNAs found in the mitochondrial genome of Dacus longicornis.

The tRNAs are labelled with the abbreviations of their corresponding amino acids. Inferred Watson-Crick bonds are illustrated by lines, whereas GU bonds are illustrated by dots.

The lrRNA gene was assumed to occupy the entire region between tRNALeu(CUN) and tRNAVal. For the boundary between the srRNA gene and the control region, the 3′-end of the gene was determined by alignment with homologous sequences from other Tephritidae mitogenomes. The lrRNA is 1,331 bp long with an A + T content of 78.5%, and the srRNA is 798 bp long with an A + T content of 74.9%.

The control region (1,343 bp) was flanked by srRNA and tRNAIle and was highly enriched in AT (85.3%). Two 151 bp repeats were found in the control region, and one 19 bp poly-T stretch was located near the repeats. Furthermore, the region near tRNAIle contained a 22 bp poly-A stretch. Both repeated sequences and poly-nucleotide stretches are common in the control region of most insects25,26, and these motifs may play a role in replication and transcription.

Phylogenetic relationship

Four datasets were used in the phylogenetic analysis: the PCG123RNA matrix with 14,586 residues (nucleotides of 13 PCGs, two rRNAs and 22 tRNAs), the PCG123 matrix with 11,148 residues (nucleotides of 13 PCGs), the PCG12RNA matrix with 10,870 residues (nucleotides of 13 PCGs excluding the third codon sites, two rRNAs and 22 tRNAs) and the PCG12 matrix with 7,432 residues (nucleotides of 13 PCGs excluding the third codon sites).
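The four matrix sizes are related by simple arithmetic: dropping the third codon sites keeps two thirds of the PCG columns, and the RNA columns are the difference between the matrices with and without RNA genes. A minimal sketch of that check, using only the residue counts reported above (the variable names are illustrative):

```python
# Residue counts reported in the text for the two full-codon matrices.
PCG123 = 11_148      # 13 PCGs, all three codon positions
PCG123RNA = 14_586   # 13 PCGs + two rRNAs + 22 tRNAs

# Removing the third codon position keeps two of every three PCG columns.
PCG12 = PCG123 * 2 // 3

# The RNA columns are whatever the RNA genes add to the PCG-only matrix.
rna_columns = PCG123RNA - PCG123

# First and second codon positions plus the RNA columns.
PCG12RNA = PCG12 + rna_columns

print(PCG12, rna_columns, PCG12RNA)
```

The derived values match the 7,432 and 10,870 residues reported for the PCG12 and PCG12RNA matrices, which implies 3,438 aligned RNA columns.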

The tree topologies inferred from the Bayesian and ML analyses were very similar across these four datasets (Fig. 3). The monophyly of Tephritidae and of the tribe Dacini was well supported in all trees, with posterior probabilities of 1.0 and ML bootstrap values of 100. The genus Bactrocera was not monophyletic, a result that differs from other Tephritidae mitochondrial genome phylogenetic studies, which included only Bactrocera species of Dacini8,9,16,20,http://www.ncbi.nlm.nih.gov/) and confirmed by alignment with homologous genes from the other 18 tephritid species available in GenBank. Transfer RNA (tRNA) genes were identified using tRNAscan-SE38 and ARWEN39 and checked manually. The circular map of the D. longicornis mitogenome was drawn with CGView40. Nucleotide composition and codon usage were analyzed using MEGA 6.041. Compositional skew was calculated with the following formulas: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C)42. The annotated mitogenome sequence of D. longicornis has been deposited in GenBank under accession number KX345846.
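The skew formulas given above translate directly into code. The following is a minimal sketch, not the authors' MEGA workflow; the function names and the toy sequence are hypothetical and chosen only to illustrate the calculation.

```python
def base_counts(seq):
    """Count A, C, G and T in a nucleotide string (case-insensitive)."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def at_skew(seq):
    """AT skew = (A - T) / (A + T)."""
    c = base_counts(seq)
    if c["A"] + c["T"] == 0:
        raise ValueError("sequence contains no A or T")
    return (c["A"] - c["T"]) / (c["A"] + c["T"])


def gc_skew(seq):
    """GC skew = (G - C) / (G + C)."""
    c = base_counts(seq)
    if c["G"] + c["C"] == 0:
        raise ValueError("sequence contains no G or C")
    return (c["G"] - c["C"]) / (c["G"] + c["C"])


def at_content(seq):
    """Fraction of A + T among the unambiguous bases."""
    c = base_counts(seq)
    total = sum(c.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (c["A"] + c["T"]) / total


# Toy sequence (hypothetical, not taken from KX345846): A=4, T=3, G=1, C=2.
toy = "AAAATTTGCC"
print(at_skew(toy), gc_skew(toy), at_content(toy))
```

A positive AT skew means A outnumbers T on the analysed strand, and a negative GC skew means C outnumbers G, which is the pattern the text reports for D. longicornis.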

Phylogenetic analyses

To better resolve the molecular phylogeny of Dacini, especially the relationship between Dacus and Zeugodacus, a total of 21 Diptera species were used in the phylogenetic analysis, including 19 Tephritidae species and two outgroup species from Drosophilidae. Detailed information on the species used in this study is listed in Supplementary Table S1.

Sequences of 13 PCGs, two rRNAs and 22 tRNAs were used in the phylogenetic analysis. The 13 PCGs were aligned as codon-based multiple alignments using the MAFFT algorithm (L-INS-i strategy) in the TranslatorX online platform43, which aligns the translated protein sequences and then back-translates them to nucleotide sequences. Before back-translation to nucleotides, poorly aligned sites were removed from the protein alignment using GBlocks within TranslatorX with default settings. The MUSCLE algorithm implemented in MEGA 6.041 was used to align the sequences of the two rRNAs, and ambiguous positions in the rRNA alignments were filtered by hand. Quality control of the hand alignments44 was performed by comparison with homologous sequences from previously sequenced tephritid mitogenomes to identify the 22 tRNAs. Individual genes were concatenated using SequenceMatrix v1.7.845. Four datasets were set up for phylogenetic analysis: (1) nucleotides of 13 PCGs, two rRNAs and 22 tRNAs (P123R) with 14,586 residues, (2) nucleotides of 13 PCGs (P123) with 11,148 residues, (3) nucleotides of 13 PCGs excluding the third codon sites, two rRNAs and 22 tRNAs (P12R) with 10,870 residues and (4) nucleotides of 13 PCGs excluding the third codon sites (P12) with 7,432 residues.
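Building the P12 and P12R datasets requires removing every third codon position from the codon-aligned PCGs. A minimal sketch of that step is shown below; it assumes an in-frame alignment whose length is a multiple of three, and the function name and toy sequence are hypothetical rather than taken from the original pipeline.

```python
def drop_third_codon_positions(aligned_cds):
    """Keep codon positions 1 and 2 of an in-frame, codon-aligned sequence.

    Gap characters are treated like any other column, so every sequence in
    an alignment stays the same length after filtering.
    """
    if len(aligned_cds) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    # Take the first two characters of each codon and skip the third.
    return "".join(aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3))


# Toy in-frame sequence (hypothetical): codons ATG | GCA | TTA.
toy = "ATGGCATTA"
print(drop_third_codon_positions(toy))
```

Applied to a full alignment, this keeps two thirds of the columns, consistent with the reduction from 11,148 residues (P123) to 7,432 residues (P12).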

The optimal partitioning strategy and the substitution model for each partition were selected with PartitionFinder v1.1.146. As the software requires the user to pre-define partitions, we created input configuration files with 39/42/26/29 (P123/P123R/P12/P12R) pre-defined partitions of the datasets. The “greedy” algorithm was used, with branch lengths estimated as “unlinked” and the Bayesian information criterion (BIC)47,48, to search for the best-fit scheme. The best partitioning schemes and models of three datasets for the ML and BI analyses are listed in Supplementary Table S3.
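The partition counts 39/42/26/29 can be reproduced by enumerating one partition per PCG and codon position, plus the RNA genes. The sketch below makes one assumption that the text does not state explicitly: the two rRNAs are separate partitions and the 22 tRNAs are pooled into a single partition, which is the split that yields 42 = 39 + 3. The partition labels are illustrative and are not PartitionFinder syntax.

```python
PCGS = ["ND2", "COI", "COII", "ATP8", "ATP6", "COIII", "ND3",
        "ND5", "ND4", "ND4L", "ND6", "CYTB", "ND1"]
RRNAS = ["lrRNA", "srRNA"]


def predefined_partitions(codon_positions, include_rna):
    """List one partition per PCG and codon position, plus optional RNA blocks.

    Assumption (not stated in the text): each rRNA is its own partition and
    all 22 tRNAs are pooled into one partition.
    """
    parts = [f"{gene}_pos{pos}" for gene in PCGS for pos in codon_positions]
    if include_rna:
        parts += RRNAS + ["tRNAs"]
    return parts


datasets = {
    "P123": predefined_partitions((1, 2, 3), include_rna=False),
    "P123R": predefined_partitions((1, 2, 3), include_rna=True),
    "P12": predefined_partitions((1, 2), include_rna=False),
    "P12R": predefined_partitions((1, 2), include_rna=True),
}

for name, parts in datasets.items():
    print(name, len(parts))
```

Under that assumption the enumeration gives 39, 42, 26 and 29 partitions, matching the counts quoted for the four configuration files.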

We performed Bayesian inference (BI) and maximum likelihood (ML) analyses based on the best-fit partitioning schemes recommended by PartitionFinder (Supplementary Table S3). We used MrBayes 3.2.249 to conduct the Bayesian analysis. Each dataset was analysed with two simultaneous runs of 2 million generations, each with one cold and three heated chains. Samples were drawn every 1,000 Markov chain Monte Carlo (MCMC) steps, with the first 25% discarded as burn-in. Stationarity was considered to have been reached, and the run was stopped, when the average standard deviation of split frequencies was below 0.01. The ML analysis was conducted with RAxML 8.0.050 with 1,000 bootstrap replicates, using the rapid bootstrap feature (random seed value 12345)51.
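The sampling settings above determine how many trees enter the posterior summary. The following back-of-the-envelope sketch assumes that each run completes the full 2 million generations and ignores the initial sample at generation 0, so the figures are upper bounds if a run was stopped earlier; the variable names are illustrative.

```python
GENERATIONS = 2_000_000   # generations per run
SAMPLE_EVERY = 1_000      # sampling interval in MCMC steps
BURNIN_FRACTION = 0.25    # first 25% of samples discarded
RUNS = 2                  # simultaneous independent runs
CHAINS_PER_RUN = 1 + 3    # one cold chain plus three heated chains

# Samples recorded per run, excluding the starting state.
samples_per_run = GENERATIONS // SAMPLE_EVERY

# Samples discarded as burn-in and samples retained, per run.
burnin_samples = int(samples_per_run * BURNIN_FRACTION)
kept_per_run = samples_per_run - burnin_samples

# Only the cold chain of each run is sampled, so retained samples
# are pooled across runs rather than across chains.
kept_total = kept_per_run * RUNS

print(samples_per_run, burnin_samples, kept_per_run, kept_total)
```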

Additional Information

Accession Codes: The Dacus longicornis mitochondrial genome is available in the GenBank database (accession number: KX345846).

How to cite this article: Jiang, F. et al. The first complete mitochondrial genome of Dacus longicornis (Diptera: Tephritidae) using next-generation sequencing and mitochondrial genome phylogeny of Dacini tribe. Sci. Rep. 6, 36426; doi: 10.1038/srep36426 (2016).
