Background & Summary

Microplitis manilae Ashmead (Hymenoptera: Braconidae: Microgastrinae) is a solitary endoparasitoid wasp distributed mainly in the Asia-Pacific region1. It attacks several lepidopteran species and prefers Spodoptera species, including S. frugiperda, S. exigua and S. litura2, which are among the world’s most significant agricultural pests3. M. manilae is therefore considered an ideal biological control agent for Spodoptera spp.

To date, approximately 200 species have been recognized in Microplitis1, and some of them, e.g. M. croceipes, M. demolitor and M. mediator, have been widely used in biological pest control4,5,6. The virulence factors that Microplitis wasps use to suppress or circumvent host immunity consist primarily of polydnaviruses (PDVs), venom and teratocytes7,8. In recent years, the biology, ecology and host interactions of Microplitis have been studied9,10. Studying the interactions between parasitoid wasps and their host insects, particularly how parasitoids regulate host immunity and development, has great potential to expand the use of parasitoid wasps in sustainable agricultural pest management. High-quality genome data are essential for further understanding the complex relationships between parasitoids and their hosts. A chromosome-level genome may shed light on the evolution of parasitoids, the mechanisms of parasitism, and the potential for developing new biological control techniques and for utilizing natural enemies as resources. However, only two fragmented genomes from the genus Microplitis (M. demolitor and M. mediator) are currently available in NCBI, and no chromosome-level genome assembly for Microplitis spp. has been published.

In this study, we combined MGI short-read, ONT long-read and Hi-C sequencing to assemble a chromosome-level genome of M. manilae. The final assembly was 282.85 Mb with a scaffold N50 of 25.23 Mb, and 268.17 Mb of the assembled sequence was anchored onto 11 chromosomes. In total, 15,689 protein-coding genes were identified, of which 13,580 were functionally annotated.

Methods

Insect collection and rearing

Microplitis manilae wasps were collected from maize fields in Dongfang City, Hainan Province, China (18.86°N, 108.72°E) in November 2020 and reared on their host Spodoptera frugiperda under laboratory conditions of 26 ± 1 °C, 65 ± 5% RH and a 14 h light:10 h dark photoperiod.

Sequencing

DNA and RNA were extracted from newly emerged males that had been reared in the laboratory for at least five generations. Genomic DNA for both long-read and short-read whole-genome sequencing was extracted with the Blood & Cell Culture DNA Mini Kit (Qiagen, Hilden, Germany). RNA was isolated with TRIzol reagent (Vazyme, Nanjing, China). The Hi-C library was constructed using the restriction endonuclease DpnII. Long-read sequencing was performed on the Nanopore PromethION platform (Oxford Nanopore Technologies, UK) with an insert size of approximately 20 kb. Short-read and transcriptome libraries were constructed with an insert size of 350 bp and sequenced on the MGISEQ-2000 platform. Long-read sequencing generated 76.31 Gb of data and short-read sequencing generated 82.60 Gb (Table 1).

Table 1 Statistics of the DNA/RNA sequence data used for genome assembly.
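As a rough cross-check of these data volumes, mean sequencing depth can be approximated as total sequenced bases divided by genome size. The minimal Python sketch below assumes the k-mer-based size estimate of 297.29 Mb (reported in the genome size section) as the denominator; the function name `sequencing_depth` is hypothetical, and the depths the authors report may have been computed differently.

```python
# Rough mean sequencing depth = total sequenced bases / genome size.
# Data volumes come from the text; using the k-mer-based size estimate
# (297.29 Mb) as the denominator is an assumption of this sketch.

def sequencing_depth(total_gb: float, genome_size_mb: float) -> float:
    """Mean fold-coverage of `total_gb` gigabases over a `genome_size_mb` megabase genome."""
    return (total_gb * 1e9) / (genome_size_mb * 1e6)


GENOME_SIZE_MB = 297.29  # k-mer (GenomeScope) estimate for M. manilae

datasets_gb = {"ONT long reads": 76.31, "MGI short reads": 82.60}
for name, gb in datasets_gb.items():
    print(f"{name}: ~{sequencing_depth(gb, GENOME_SIZE_MB):.1f}x")
```

With these inputs the script prints depths of about 256.7x (ONT) and 277.8x (MGI); dividing by the 282.85 Mb assembly size instead would give about 269.8x and 292.0x.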

Genome size estimation and assembly

Raw reads from the MGISEQ-2000 platform were quality-controlled with fastp v0.21.011 to remove adapter sequences and low-quality reads. The clean MGI reads were then used to compute the 17-mer frequency distribution with Jellyfish v2.3.013, from which the genome size of M. manilae was estimated with GenomeScope v1.0.012. The k-mer analysis estimated the genome size at 297.29 Mb.
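The core idea behind k-mer-based size estimation can be illustrated with the classic peak method: after discarding low-depth k-mers, which mostly arise from sequencing errors, the genome size is approximately the total number of k-mers divided by the depth of the main (homozygous) peak. The minimal Python sketch below assumes a histogram in the two-column depth/count layout written by `jellyfish histo`. It is not GenomeScope's model, which fits a mixture to the whole histogram and also estimates heterozygosity and repeat content; the fixed `min_depth` cutoff and the toy numbers are illustrative assumptions.

```python
# Peak-based k-mer genome size estimate: G ~ (total k-mers) / (peak depth).

def parse_histo(text: str) -> list[tuple[int, int]]:
    """Parse 'depth count' lines into (depth, count) pairs."""
    pairs = []
    for line in text.strip().splitlines():
        depth, count = line.split()
        pairs.append((int(depth), int(count)))
    return pairs


def estimate_genome_size(histo, min_depth=5):
    """Return (estimated genome size, homozygous peak depth)."""
    # Drop low-depth k-mers, which are dominated by sequencing errors
    # (a fixed cutoff is a simplifying assumption).
    kept = [(d, c) for d, c in histo if d >= min_depth]
    total_kmers = sum(d * c for d, c in kept)        # all k-mer instances
    peak_depth = max(kept, key=lambda dc: dc[1])[0]  # most frequent depth
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical numbers, not the M. manilae data).
toy = parse_histo("""
1 900000
2 50000
49 1000
50 2000
51 1000
""")
size, peak = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

Here the error k-mers at depths 1 and 2 are discarded, the peak lies at depth 50, and the 200,000 retained k-mer instances imply a toy genome of 4,000 bp.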

The draft genome was obtained by first assembling the long reads and then polishing the assembly with the short reads, a strategy that has recently been widely used in genome assembly for diverse organisms14,15,31.

Candidate coding regions were identified with the PASA pipeline v2.4.1 (https://github.com/PASApipeline/PASApipeline). The repeat-masked genome was analyzed with AUGUSTUS v3.3.332 and SNAP v2006-07-2833 for de novo gene prediction. Protein sequences of hymenopteran species were downloaded from the NCBI database as references for homology-based prediction, and Exonerate v2.4.034 was used to align these reference proteins to the genome assembly and predict gene structures. Finally, a consensus gene set was generated by integrating the gene models from these three approaches with EVidenceModeler v1.1.135. Combining transcriptome, ab initio and homology-based evidence, we predicted 15,689 protein-coding genes in the M. manilae genome. The predicted genes averaged 8,718 bp in length, with an average coding sequence length of 1,575 bp. Exons and introns averaged 319 bp and 1,814 bp, respectively, and genes contained 4.9 exons on average (Table 4).

Table 4 Statistics of gene structure annotation in the Microplitis manilae genome.
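Gene-structure averages of this kind can be derived from a GFF3 annotation. The minimal Python sketch below assumes that each exon or CDS feature has a single Parent transcript, and it defines introns as the gaps between consecutive exons of the same transcript. The toy gene model is hypothetical, and the exact definitions used for Table 4 are not stated in the text.

```python
from collections import defaultdict
from statistics import mean


def parse_attributes(field: str) -> dict:
    """Parse a GFF3 column-9 string such as 'ID=t1;Parent=g1'."""
    return dict(kv.split("=", 1) for kv in field.strip(";").split(";") if "=" in kv)


def gene_structure_stats(gff_lines):
    """Mean gene/CDS/exon/intron lengths and exons per transcript from GFF3 lines."""
    gene_lengths = []
    exons = defaultdict(list)       # transcript ID -> [(start, end), ...]
    cds_length = defaultdict(int)   # transcript ID -> summed CDS length
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        length = end - start + 1  # GFF3 coordinates are 1-based and inclusive
        if ftype == "gene":
            gene_lengths.append(length)
        elif ftype == "exon":
            exons[attrs["Parent"]].append((start, end))
        elif ftype == "CDS":
            cds_length[attrs["Parent"]] += length

    exon_lengths = [e - s + 1 for segs in exons.values() for s, e in segs]
    intron_lengths = []
    for segs in exons.values():
        segs.sort()
        # An intron is the gap between consecutive exons of one transcript.
        intron_lengths += [s2 - e1 - 1 for (_, e1), (s2, _) in zip(segs, segs[1:])]

    return {
        "gene": mean(gene_lengths),
        "cds": mean(cds_length.values()),
        "exon": mean(exon_lengths),
        "intron": mean(intron_lengths) if intron_lengths else 0.0,
        "exons_per_transcript": mean(len(segs) for segs in exons.values()),
    }


def row(ftype, start, end, attrs):
    """Build one tab-separated GFF3 line for the toy example."""
    return "\t".join(["chr1", "toy", ftype, str(start), str(end), ".", "+", ".", attrs])


# Hypothetical single-gene model with three exons.
toy_gff = [
    row("gene", 1, 1000, "ID=g1"),
    row("mRNA", 1, 1000, "ID=t1;Parent=g1"),
    row("exon", 1, 100, "ID=e1;Parent=t1"),
    row("exon", 301, 500, "ID=e2;Parent=t1"),
    row("exon", 901, 1000, "ID=e3;Parent=t1"),
    row("CDS", 51, 100, "ID=c1;Parent=t1"),
    row("CDS", 301, 500, "ID=c1;Parent=t1"),
    row("CDS", 901, 950, "ID=c1;Parent=t1"),
]
print(gene_structure_stats(toy_gff))
```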

Gene functions were annotated by searching against UniProtKB (Swiss-Prot + TrEMBL)37 with BLASTP v2.9.036 (-evalue 1e-5), and against the Pfam39, CDD40, Gene3D41, SMART42 and SUPERFAMILY43 databases with InterProScan 5.52-86.038. eggNOG-mapper v2.1.444 was used to predict conserved sequences and domains, GO terms and KEGG pathways based on the eggNOG v5.0 database45. In total, 13,580 (86.56%) genes were functionally annotated against UniProtKB. Integrating the InterProScan and eggNOG-mapper results identified 13,227 (84.31%) protein-coding genes with protein domains, which were assigned to COG functional categories (11,276), Reactome pathways (9,489), MetaCyc pathways (7,819), GO terms (7,722), KEGG KO terms (7,324) and KEGG pathways (4,274).
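To illustrate the BLASTP step, the sketch below keeps the best hit (highest bit score) per query from standard 12-column tabular output (`-outfmt 6`, with the e-value in column 11 and the bit score in column 12) and converts the number of annotated genes into a percentage. The function names and toy hits are hypothetical; only the gene counts passed to `annotation_rate` come from the text.

```python
# Best BLASTP hit per query from tabular (-outfmt 6) output, plus the
# percentage of genes annotated.

def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, bitscore)} for the best hit passing the e-value cutoff."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        query, subject = f[0], f[1]
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > max_evalue:
            continue  # redundant if BLASTP was already run with -evalue 1e-5
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotation_rate(n_annotated, n_total):
    """Percentage of predicted genes with at least one accepted hit."""
    return 100 * n_annotated / n_total


# Hypothetical hits for three query genes (g2 fails the e-value cutoff).
toy_hits = [
    "g1\tsp|P00001\t80.0\t300\t10\t0\t1\t300\t1\t300\t1e-50\t500",
    "g1\ttr|Q00002\t70.0\t300\t20\t0\t1\t300\t1\t300\t1e-40\t400",
    "g2\tsp|P00003\t30.0\t100\t50\t2\t1\t100\t5\t105\t1e-3\t40",
    "g3\ttr|Q00004\t50.0\t200\t60\t1\t1\t200\t1\t200\t1e-20\t200",
]
hits = best_hits(toy_hits)
print(hits)
print(f"{annotation_rate(13580, 15689):.2f}%")  # gene counts reported in the text
```

The second line printed is 86.56%, matching the UniProtKB annotation rate given above.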

Data Records

The MGI, ONT, RNA-seq and Hi-C sequencing data used for the genome assembly have been deposited in the NCBI Sequence Read Archive (SRA) under accession numbers SRR21358828 (ref. 46), SRR21358827 (ref. 47), SRR21358829 (ref. 48) and SRR21358826 (ref. 49), respectively, under BioProject accession number PRJNA872950. The chromosome-level assembly has been deposited in GenBank under accession number JAPFQK000000000 (ref. 50). Genome annotation information has been deposited in Figshare51.

Technical Validation

Evaluating the quality of the genome assembly

The quality of the M. manilae genome assembly was evaluated in two ways. First, the sequencing data were mapped back to the genome to assess accuracy, yielding mapping rates of 99.52% for MGI, 94.40% for RNA-seq and 98.52% for ONT data. Second, BUSCO analysis against the insecta_odb10 database found 98.6% of its 1,367 single-copy orthologues to be complete (97.9% single-copy and 0.7% duplicated), 0.4% fragmented and 1.0% missing.
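BUSCO results are conventionally reported as a single line of percentages. The minimal Python sketch below formats category counts in that style (complete, single-copy, duplicated, fragmented, missing, total n). The counts in the example are hypothetical, because the text reports only percentages for the 1,367 insecta_odb10 orthologues.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO category counts as a one-line summary string."""
    n = single + duplicated + fragmented + missing  # total BUSCOs searched

    def pct(count: int) -> str:
        return f"{100 * count / n:.1f}%"

    complete = single + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")


# Hypothetical counts for a 100-gene lineage set (not the M. manilae results).
print(busco_summary(single=90, duplicated=5, fragmented=3, missing=2))
```

Because each category is rounded independently, C can differ from S + D by 0.1 percentage points for some counts.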

Chromosome synteny

Chromosome synteny between M. manilae and Cotesia congregata, a closely related species in the subfamily Microgastrinae, was detected with MCScanX52 using default parameters. The C. congregata genome assembly53 was retrieved from NCBI (accession number GCA_905319865.3), and the synteny diagram was drawn with TBtools54. The results revealed a low level of synteny between M. manilae and C. congregata (Fig. 3), with several chromosomal fusion and fission events between the two wasps. For instance, Chr11 and part of Chr5 of M. manilae were syntenic to Chr4 of C. congregata, whereas Chr1 of M. manilae was syntenic to portions of Chr2 and Chr3 of C. congregata. Low genome synteny has also been reported between Nasonia vitripennis and Pteromalus puparum, both members of the family Pteromalidae55.

Fig. 3

Chromosomal synteny between Microplitis manilae and Cotesia congregata genomes.
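The fusion and fission patterns described for Fig. 3 come down to asking which chromosomes of one genome share syntenic anchor genes with which chromosomes of the other. The Python sketch below is a simplified tally, not MCScanX's collinearity algorithm. It assumes the collinear gene pairs have already been extracted (for example from MCScanX results) as (chromosome in genome A, chromosome in genome B) tuples; the `min_share` threshold, the function names and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict


def chromosome_correspondence(anchor_pairs, min_share=0.2):
    """Map each chromosome of genome A to the chromosomes of genome B that hold
    at least `min_share` of its syntenic anchor gene pairs."""
    counts = defaultdict(Counter)
    for chrom_a, chrom_b in anchor_pairs:
        counts[chrom_a][chrom_b] += 1
    mapping = {}
    for chrom_a, partners in counts.items():
        total = sum(partners.values())
        mapping[chrom_a] = sorted(b for b, c in partners.items() if c / total >= min_share)
    return mapping


def invert(mapping):
    """Map each chromosome of genome B back to its partner chromosomes in A."""
    inverse = defaultdict(list)
    for chrom_a, partners in mapping.items():
        for chrom_b in partners:
            inverse[chrom_b].append(chrom_a)
    return {b: sorted(a) for b, a in inverse.items()}


# Hypothetical anchor counts loosely shaped like the pattern described for
# Fig. 3 (A = M. manilae, B = C. congregata); "Chr6" of B is invented.
anchors = ([("Chr1", "Chr2")] * 30 + [("Chr1", "Chr3")] * 25
           + [("Chr11", "Chr4")] * 40 + [("Chr5", "Chr4")] * 15
           + [("Chr5", "Chr6")] * 45)
a_to_b = chromosome_correspondence(anchors)
b_to_a = invert(a_to_b)
print({a: b for a, b in a_to_b.items() if len(b) > 1})  # A chromosomes split across B
print({b: a for b, a in b_to_a.items() if len(a) > 1})  # B chromosomes combining several A
```

In the toy data, Chr1 maps to two partner chromosomes (a candidate fission or fusion), and Chr4 of genome B collects anchors from Chr11 and Chr5 of genome A.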

Gene annotation validation

OrthoFinder v2.5.456 was used to infer orthology from the annotated protein sequences of M. manilae and 11 additional hymenopteran species retrieved from NCBI: Apis mellifera, Athalia rosae, Bombus terrestris, Chelonus insularis, Diachasma alloeum, Fopius arisanus, M. demolitor, Nasonia vitripennis, Orussus abietinus, Polistes dominula and Venturia canescens (Table S2). In total, 132,122 genes were assigned to 12,544 gene families. Of these, 4,910 gene families were present in all species, comprising 3,780 single-copy and 1,130 multicopy families. Of the 15,689 predicted M. manilae genes, 14,822 (94.47%) were grouped into 9,725 families, and 1,295 genes in 241 families were unique to M. manilae (Fig. 4, Table S3).

Fig. 4

Distribution of genes in different Hymenoptera species. “1:1:1” denotes single-copy genes shared by all species; “N:N:N”, multicopy genes shared by all species; “others”, unclassified orthologs; and “unassigned”, genes that could not be assigned to any gene family (orthogroup).
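The categories in Fig. 4 can be derived from a table of per-species gene counts for each orthogroup, such as the gene-count table that OrthoFinder writes. The minimal Python sketch below assumes that "others" covers every orthogroup absent from at least one species and that unassigned genes are simply the remainder of the focal species' gene total. The species abbreviations, counts and function names are hypothetical.

```python
from collections import Counter


def classify(counts: dict, species: list) -> str:
    """Classify one orthogroup from its per-species gene counts."""
    per_species = [counts.get(sp, 0) for sp in species]
    if all(c == 1 for c in per_species):
        return "1:1:1"   # single-copy in every species
    if all(c >= 1 for c in per_species):
        return "N:N:N"   # present in every species, multicopy in at least one
    return "others"      # absent from at least one species


def genes_per_category(orthogroups: dict, species: list, focal: str, total_genes: int) -> Counter:
    """Tally the focal species' genes by orthogroup category."""
    tally = Counter()
    for counts in orthogroups.values():
        tally[classify(counts, species)] += counts.get(focal, 0)
    # Genes of the focal species not placed in any orthogroup.
    tally["unassigned"] = total_genes - sum(tally.values())
    return tally


species = ["Mman", "Mdem", "Amel"]  # hypothetical abbreviations
orthogroups = {
    "OG1": {"Mman": 1, "Mdem": 1, "Amel": 1},
    "OG2": {"Mman": 2, "Mdem": 1, "Amel": 3},
    "OG3": {"Mman": 4, "Mdem": 2},
    "OG4": {"Mdem": 1, "Amel": 1},
}
print(genes_per_category(orthogroups, species, focal="Mman", total_genes=10))
```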

The protein sequences of all single-copy orthologues were aligned with MAFFT v7.42757 and concatenated into a single data matrix. A phylogenetic tree was constructed with IQ-TREE v2.0.558 under the best-fit model (JTT + F + R7) selected by ModelFinder59, and node support was assessed with 1,000 ultrafast bootstrap60 replicates. The resulting tree had high bootstrap support, and its topology was consistent with that of a previous study61. Divergence times were estimated with the MCMCTree program in PAML v4.9j62 using five calibration points from a previous study61: the holometabolous root, <300 million years ago (mya); Orussoidea + Apocrita, 211–289 mya; Apocrita, 203–276 mya; Aculeata, 160–224 mya; and Ichneumonoidea, 151–218 mya. As expected, M. manilae was most closely related to M. demolitor, and the two species diverged approximately 7.6 mya (Fig. 5).

Gene family expansions and contractions were estimated with CAFE v4.2.163 at a P-value threshold of 0.01. In M. manilae, 615 gene families were expanded and 635 were contracted, of which 395 (310 expanded and 85 contracted) were rapidly evolving (Fig. 5).
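Concatenating single-copy orthologue alignments into a supermatrix amounts to joining each species' aligned sequences in a fixed orthogroup order while recording where each gene starts and ends. The minimal Python sketch below assumes the MAFFT alignments have already been parsed into {species: aligned sequence} dictionaries; the function names and toy alignments are hypothetical.

```python
def concatenate_alignments(alignments):
    """Concatenate per-orthogroup alignments into one supermatrix.

    `alignments` is a list of {species: aligned_sequence} dicts, one per
    single-copy orthogroup. Returns the concatenated sequences and 1-based
    (start, end) coordinates of each orthogroup in the supermatrix.
    """
    species = sorted(set().union(*alignments))
    parts = {sp: [] for sp in species}
    partitions = []
    pos = 1
    for i, aln in enumerate(alignments):
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {i} contains sequences of different lengths")
        width = widths.pop()
        for sp in species:
            # Pad a species missing from this orthogroup with gaps.
            parts[sp].append(aln.get(sp, "-" * width))
        partitions.append((pos, pos + width - 1))
        pos += width
    return {sp: "".join(seqs) for sp, seqs in parts.items()}, partitions


def to_fasta(matrix):
    """Render the supermatrix as FASTA text."""
    return "".join(f">{sp}\n{seq}\n" for sp, seq in matrix.items())


# Two hypothetical aligned orthogroups for species A and B.
matrix, partitions = concatenate_alignments([
    {"A": "MK-L", "B": "MKVL"},
    {"A": "QQ", "B": "Q-"},
])
print(to_fasta(matrix), partitions)
```

The resulting FASTA file could then be analyzed with IQ-TREE 2, for example `iqtree2 -s supermatrix.fa -m MFP -B 1000`, where `-m MFP` invokes ModelFinder and `-B 1000` requests 1,000 ultrafast bootstrap replicates; the file name is a placeholder.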

Fig. 5

Phylogenetic and gene family evolution analyses of Microplitis manilae and 11 other Hymenoptera species. All nodes have bootstrap support of 100/100. Node values indicate the numbers of gene families showing expansion, contraction and rapid evolution. The scale at the bottom of the figure represents divergence time.
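Once CAFE has inferred ancestral family sizes and p-values, counting the expansions, contractions and rapidly evolving families on the M. manilae branch is a simple tally. The Python sketch below assumes those values have already been extracted into a dictionary; parsing CAFE's output files is not shown, and CAFE's birth-death model is not reproduced. Treating a family as rapidly evolving when its supplied p-value is below 0.01 mirrors the threshold in the text; whether that p-value is family-wide or branch-specific is left as an assumption. The function name and toy families are hypothetical.

```python
def tally_family_changes(families, alpha=0.01):
    """Count expansions and contractions on one branch.

    `families` maps family ID -> (size at parent node, size at the tip, p-value).
    """
    tally = {"expanded": 0, "contracted": 0,
             "rapid_expanded": 0, "rapid_contracted": 0}
    for parent_size, tip_size, p_value in families.values():
        if tip_size == parent_size:
            continue  # no change on this branch
        direction = "expanded" if tip_size > parent_size else "contracted"
        tally[direction] += 1
        if p_value < alpha:
            tally["rapid_" + direction] += 1
    return tally


# Hypothetical families: (parent size, tip size, p-value).
toy_families = {
    "F1": (3, 5, 0.001),
    "F2": (4, 2, 0.2),
    "F3": (2, 2, 0.5),
    "F4": (1, 6, 0.0001),
    "F5": (6, 1, 0.005),
}
print(tally_family_changes(toy_families))
```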