Background

High-throughput sequencing technology has greatly stimulated studies of insect genomes and transcriptomes [11]. LncRNAs show poor conservation among species and have relatively low expression levels compared with mRNAs [12].

Systematic identification and analysis of lncRNAs have been carried out in various species, such as goat [13], mouse [14], zebrafish [15], tilapia [4], chicken [16], and fungus [17]. Many studies have provided data enabling lncRNA identification in insects. In Drosophila melanogaster, a total of 1875 candidate lncRNAs were identified from multiple transcriptome data sets [18]. Using RNA-seq technology, 8096 putative lncRNAs were identified in one susceptible and two insecticide-resistant strains of Plutella xylostella [19]. In addition, 2949 lncRNAs were found in RNA-seq data of multiple life stages of Anopheles gambiae [20]. These studies increased the catalog of insect lncRNAs and provided insight into their functions, such as cell differentiation, transcription regulation, and dosage compensation [1]. Compared with mRNAs, lncRNAs exhibit more tissue-specific expression in insects, suggesting functions specific to these tissues [21].

LncRNAs can play crucial roles in many biological processes, such as cell differentiation and development [22, 23]. In Drosophila, lncRNAs were probably involved in molting because a large number of lncRNAs were significantly up-regulated in the late embryonic and larval stages [5]. Knockdown of lincRNA_1317 expression by RNA interference suppressed the replication of dengue virus in Aedes aegypti, demonstrating the essential role of the lncRNA in anti-viral defenses [24]. Genome location and co-expression analyses of protein-coding genes and lncRNAs revealed that several lncRNAs might be associated with fecundity and virulence in Nilaparvata lugens [1]. More interestingly, the specific expression of lncRNAs among tissues suggested tissue-associated functions. In Locusta migratoria, knockdown of a brain-specific lncRNA (PAHAL) by RNA interference reduced aggregation behavior [25]. Functional annotation of target genes of testis-specific lncRNAs from RNA-seq data indicated that they may participate in the spermatogenesis of Bombyx mori [26].

The melon fly, Zeugodacus cucurbitae (Coquillett), is one of the most destructive and troublesome agricultural pests [27, 28]. The genome of Z. cucurbitae has been sequenced and released [29], which provides sequence information for gene annotation and functional research. The genome-wide expression of genes during the developmental stages has also been analyzed by RNA-seq [30]. However, there is no information about lncRNAs or their functions in Z. cucurbitae. In this study, 24 RNA-seq datasets were constructed from different tissues of female and male Z. cucurbitae, including midgut, Malpighian tubules, fat body, ovary, and testis. A total of 3124 lncRNAs were strictly identified from the RNA-seq data, and their features and characteristics were analyzed. Differentially expressed lncRNAs between tissues in female and male adults, as well as between similar tissues in female and male adults, were analyzed. Tissue-specific lncRNAs were screened in female and male tissues based on their relative expression levels. GO and KEGG pathway enrichment analysis of targets of midgut-specific lncRNAs revealed unique functional annotations. Our findings create a catalog of lncRNAs in tissues of Z. cucurbitae and provide information that will be useful for further functional studies.

Results

Identification and characterization of lncRNAs

A total of 511,526,830 raw reads were generated from 24 RNA-seq datasets. Q30 scores were ≥ 93.0% in all of the samples. GC contents ranged from 40.1 to 46.69%. The RNA-seq data were of high quality, as no “N” base was detected in any of the samples (Table 1). All of the RNA sequencing data produced in this study are available in the NCBI BioProject database (http://www.ncbi.nlm.nih.gov/bioproject/) under the accession number PRJNA579200. After filtering under a computational pipeline (Fig. S1), a total of 22,159 lncRNA candidates were retained. Null-expressed transcripts (FPKM value < 1 in all analyzed samples) were discarded, and the numbers of lncRNAs in female and male tissues were screened. In females, the largest population of lncRNAs (1024) was found in the Malpighian tubules (Fig. 1a). There were 20,330 null-expressed lncRNAs in female tissues (Fig. 1b). Fat body had the largest lncRNA population (1026) among male tissues (Fig. 1c). Male tissues had 19,680 null-expressed lncRNAs (Fig. 1d). After discarding all null-expressed lncRNAs, a total of 3124 lncRNA transcripts were strictly identified from the transcriptome data of the eight tissues. Most of these were lincRNAs (1464; 46.9%), followed by intronic lncRNAs (1037; 33.2%), sense lncRNAs (322; 10.3%), and anti-sense lncRNAs (301; 9.6%) (Fig. 2a). The lncRNA length distribution showed that most lncRNA transcripts were longer than 3000 nucleotides (Fig. 2b). The majority of lncRNAs had only one isoform (Fig. 2c). Most of the lncRNAs in Z. cucurbitae contained two exons (Fig. 2d).

Table 1 Summary statistics of the RNA-seq data
Fig. 1

Number of lncRNAs in tissues of female (a) and male (c) Zeugodacus cucurbitae. LncRNAs with null expression in female tissues (b) and male tissues (d) were discarded. Abbreviations were consistent with the above

Fig. 2

Number of four types of lncRNA (a), the lncRNA length distribution (b), the isoform number of lncRNA (c), the exon number distribution of lncRNA (d). lincRNA means long intergenic non-coding RNA

Expression of lncRNAs in Z. cucurbitae

To analyze the differences in expression of lncRNAs among tissues, hierarchical clustering of 1554 differentially expressed lncRNAs (DELs) was visualized in a heatmap using the FPKM values (Fig. 3). Many DELs clustered in specific tissues based on lncRNA expression levels among the different tissues. DELs between each pair of tissues were analyzed. In female Z. cucurbitae, a total of 151 higher- and 103 lower-expressed lncRNAs were found in the comparison of Malpighian tubules vs. ovary. The comparison of midgut vs. fat body showed 69 DELs, among which 36 were higher- and 33 were lower-expressed (Fig. 4a). Comparisons of Malpighian tubules vs. testis and midgut vs. Malpighian tubules had the most and fewest DELs in males, respectively. A total of 806 DELs were found in male Malpighian tubules vs. testis; 604 were higher- and 202 were lower-expressed. A total of 45 DELs existed in midgut vs. Malpighian tubules of males; 28 were higher- and 17 were lower-expressed (Fig. 4b). DELs between similar tissues in male and female adults were also analyzed. The comparison of ovary vs. testis had 623 DELs, which was much more than in the other tissue comparisons (Fig. 4c).

Fig. 3

Cluster heatmap showing the expression profile of differentially expressed lncRNAs in female (a) and male (b) tissues of Zeugodacus cucurbitae. The heatmap was generated using the R package pheatmap. Red and green indicate higher and lower expression levels, respectively. Abbreviations are consistent with the above

Fig. 4

Statistical analysis of differentially expressed lncRNAs between tissues in female Zeugodacus cucurbitae (a), male Zeugodacus cucurbitae (b), and similar tissues between female and male Zeugodacus cucurbitae (c). Abbreviations are the same as above

LncRNAs showed differential expression among tissues, and tissue-specific lncRNAs were identified in all tissues. Venn diagrams showed that each tissue contained a certain number of tissue-specific lncRNAs. In midgut, Venn diagram analysis showed 8 specifically expressed lncRNAs in females and 8 in males (Fig. 5a1 and a2). A total of 5, 7, 9, and 21 specifically expressed lncRNAs were found in female Malpighian tubules (Fig. 5b1), male Malpighian tubules (Fig. 5b2), female fat body (Fig. 5c1), and male fat body (Fig. 5c2), respectively. A total of 42 ovary-specific lncRNAs had a relatively high expression in the ovary compared with other female tissues (Fig. 5d1). The number of testis-specific lncRNAs (364) was much larger than those of other tissues (Fig. 5d2).

Fig. 5

Quantitative expression analysis of midgut, Malpighian tubules, fat body, ovary, and testis in Zeugodacus cucurbitae. Each section of the Venn diagrams shows the number of differentially expressed lncRNAs whose expression level ratio between two tissues was above 10. Venn diagrams indicate the number of midgut-specific lncRNAs (a1 and a2), Malpighian tubules-specific lncRNAs (b1 and b2), fat body-specific lncRNAs (c1 and c2), ovary-specific lncRNAs (d1), and testis-specific lncRNAs (d2) in female and male Zeugodacus cucurbitae. Abbreviations are consistent with those used previously

Functional annotation of target genes of tissue-specific lncRNAs

GO and KEGG pathway analyses were conducted to study the potential functions of lncRNAs, some of which can regulate the expression of neighboring genes (cis) and co-expressed genes (trans) [31]. To illustrate some special functional annotations, target genes of tissue-specific (e.g., midgut-specific) lncRNAs were analyzed. A total of 457 target genes were obtained in the female midgut, among which 51 were cis-regulated and 410 were trans-regulated. For the male midgut, a total of 273 target genes were predicted, including 34 cis-regulated and 241 trans-regulated genes. GO analysis indicated that these target genes were annotated in all three GO categories: biological process, molecular function, and cellular component. In these categories, metabolic process, catalytic activity, and membrane, respectively, were the most abundant subgroups (Fig. 6a). KEGG pathway analyses showed that these target genes were most frequently assigned to metabolism, within which three pathways (purine metabolism, oxidative phosphorylation, and carbon metabolism) were most significantly enriched (Fig. 6b).

Fig. 6

GO and KEGG pathway analyses of the target genes of midgut-specific lncRNAs in Zeugodacus cucurbitae. a GO analysis of the functions of lncRNA target genes. b KEGG pathway analysis of lncRNA target genes

Validation of differentially expressed lncRNAs

Four differentially expressed lncRNAs were randomly selected and their expression patterns in the eight tissues were examined by RT-qPCR. The four selected lncRNAs were named Zc-Lnc22787, Zc-Lnc50977, Zc-Lnc99852, and Zc-Lnc11868. The expression patterns of these four lncRNAs calculated from RNA-seq data and RT-qPCR results were consistent (Fig. 7). Together, these findings showed that our pipeline was strict in lncRNA identification and indicated that the identified lncRNAs were differentially expressed in vivo.

Fig. 7

Validation of four randomly selected differentially expressed lncRNAs by quantitative real-time PCR (RT-qPCR). Each bar represents the mean lncRNA expression and each error bar represents the positive standard error (SE) of the mean. Abbreviations are consistent with those used previously. Data were analyzed by one-way ANOVA followed by Tukey’s test (P < 0.05)

Discussion

The lncRNAs are responsible for several key physiological processes [32, 33], including epigenetics [34], immune response [35], and protein degradation [36]. LncRNAs in insect species have now been studied in D. melanogaster [5], A. aegypti [24], and B. mori [38]. After identification under a computational pipeline, the screening criterion of an expression threshold of at least 1 FPKM in at least one tissue resulted in a strict catalog containing 3124 lncRNAs. A similar result was reported in Drosophila, in which 1077 lncRNAs were identified from 43,967 transcripts in the transcriptomes of different developmental stages [5]. Each tissue had a specific number of lncRNAs in Z. cucurbitae. In Drosophila, lncRNAs were also distributed in many tissues of males [18]. Differences in the lncRNA numbers of different tissues may explain the variable lncRNA amounts in different insect species. Among the identified lncRNAs, the long intergenic lncRNAs (lincRNAs) were most common, followed by intronic, sense, and anti-sense lncRNAs. In B. mori, lincRNAs and intronic lncRNAs were the most and least common, respectively, and sense lncRNAs were not identified [...]

[...] [50], the Coding-Potential Assessment Tool (CPAT version 1.2.2) [51], and Coding Potential Calculator (CPC, version 0.9 r2) [52] were used to predict the protein-coding potential. Transcripts with CNCI scores < 0, CPAT = “no”, and CPC scores < 0 were retained. After that, Pfam was used to exclude transcripts that contained any known protein domains [53]. Finally, the remaining transcripts were aligned against the Rfam, GtRNAdb, Silva, and Repbase databases using Bowtie [54] to screen out other ncRNAs, such as small nuclear RNA (snRNA), transfer RNA (tRNA), small nucleolar RNA (snoRNA), and ribosomal RNA (rRNA), as well as repeat sequences. Genome mapping rates revealed large differences among biological replicates of Malpighian tubules from female and male melon flies. Considering this, one replicate was removed for Malpighian tubules as well as for the other tissues, and the average FPKM values of the remaining two replicates were used for downstream analyses [20]. Transcripts with an FPKM value < 1 in all tissues were considered null-expressed and were discarded. The remaining transcripts were considered reliable lncRNAs. Additionally, mRNAs were obtained from the same RNA-seq libraries in this study.
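
The filtering criteria described above can be illustrated with a minimal Python sketch. The record layout, the field names (cnci, cpat, cpc, fpkm), and all example values are hypothetical and are not taken from the pipeline used in this study; only the thresholds (CNCI score < 0, CPAT = "no", CPC score < 0, and FPKM < 1 in all tissues treated as null expression) come from the text. The Pfam and database alignment steps are omitted, and the FPKM threshold is assumed to be applied to replicate averages.

```python
from statistics import mean

# Hypothetical transcript records; every value here is invented for illustration.
transcripts = [
    {"id": "t1", "cnci": -0.8, "cpat": "no", "cpc": -1.2,
     "fpkm": {"midgut": [3.0, 5.0], "ovary": [0.2, 0.4]}},
    {"id": "t2", "cnci": 0.5, "cpat": "yes", "cpc": 2.1,
     "fpkm": {"midgut": [9.0, 7.0], "ovary": [4.0, 6.0]}},
    {"id": "t3", "cnci": -0.3, "cpat": "no", "cpc": -0.1,
     "fpkm": {"midgut": [0.1, 0.3], "ovary": [0.5, 0.7]}},
]


def is_noncoding(t):
    """Keep a transcript only if all three coding-potential tools call it non-coding."""
    return t["cnci"] < 0 and t["cpat"] == "no" and t["cpc"] < 0


def mean_fpkm(t):
    """Average FPKM over the retained replicates of each tissue."""
    return {tissue: mean(reps) for tissue, reps in t["fpkm"].items()}


def is_null_expressed(t, threshold=1.0):
    """Null-expressed: average FPKM below the threshold in every tissue."""
    return all(value < threshold for value in mean_fpkm(t).values())


retained = [t["id"] for t in transcripts
            if is_noncoding(t) and not is_null_expressed(t)]
print(retained)
```

In this toy input, t2 fails the coding-potential criteria and t3 is null-expressed, so only t1 is retained.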

Tissue-specific expressed lncRNAs

Tissue-specific lncRNAs refer to lncRNAs that have extremely high expression in a given tissue [18]. To study the tissue-specific lncRNAs in female and male Z. cucurbitae, DESeq was used to test the significance of the differential expression of lncRNAs between each pair of tissues [55]. In this step, lncRNAs with a fold change ≥ 2 and a false discovery rate (FDR) < 0.05 were considered differentially expressed. On this basis, tissue-specific lncRNAs were screened in each tissue using the ratio FPKM(tissue 1)/FPKM(all the others) ≥ 10.
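
A minimal Python sketch of these screening thresholds follows. It assumes that the ratio criterion is applied against each of the other tissues separately, which is one reading of the text; the function names, the dictionary layout, and the example FPKM values are hypothetical, and the DESeq statistics themselves are not reproduced.

```python
def is_differentially_expressed(fold_change, fdr, min_fc=2.0, max_fdr=0.05):
    """Thresholds from the text: fold change >= 2 and FDR < 0.05.

    fold_change is assumed to be given as higher/lower, so it is >= 1.
    """
    return fold_change >= min_fc and fdr < max_fdr


def tissue_specific(fpkm, min_ratio=10.0):
    """Return tissues whose FPKM is at least min_ratio times that of every other tissue.

    Multiplication is used instead of division so that an FPKM of 0
    in another tissue does not raise ZeroDivisionError.
    """
    specific = []
    for tissue, value in fpkm.items():
        others = [v for name, v in fpkm.items() if name != tissue]
        if value > 0 and all(value >= min_ratio * other for other in others):
            specific.append(tissue)
    return specific


# Invented FPKM values for a single lncRNA across four male tissues.
example = {"midgut": 0.5, "Malpighian tubules": 1.2, "fat body": 0.8, "testis": 25.0}
print(tissue_specific(example))
```

With these invented values the lncRNA would be called testis-specific, because 25.0 is at least ten times the FPKM in each of the other three tissues.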

Target prediction and GO and KEGG pathway analysis

LncRNA targets were predicted according to the genomic location of and co-expression between lncRNAs and mRNAs. Two lncRNA regulation modes (cis-regulation and trans-regulation) were analyzed. Regulation by lncRNAs of neighboring genes located within 100 kb upstream or downstream on the same chromosome was regarded as cis-regulation [56]. For trans-regulation, co-expression of lncRNAs and mRNAs was analyzed based on their expression levels, as previously implemented in tissues of B. mori [26]. LncRNA-mRNA pairs with a Pearson’s correlation coefficient r > 0.9 or r < −0.9 and a p-value < 0.01 were considered co-expressed. All of the identified cis- and trans-regulated protein-coding genes were used for GO and KEGG pathway analysis. The topGO R package and KOBAS software [57] were used for GO and KEGG pathway analysis, respectively.
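
The two target-prediction rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this study: coordinates are hypothetical (chromosome, start, end) tuples, a gene is treated as a cis target when any part of it falls inside the lncRNA locus extended by 100 kb on each side, and only the correlation cutoff is checked (the p-value < 0.01 test is left out to keep the example dependency-free).

```python
from math import sqrt

WINDOW = 100_000  # 100 kb upstream and downstream, as stated in the text


def is_cis_target(lnc, gene, window=WINDOW):
    """lnc and gene are (chrom, start, end); True if the gene overlaps lnc +/- window."""
    if lnc[0] != gene[0]:
        return False
    return gene[2] >= lnc[1] - window and gene[1] <= lnc[2] + window


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_coexpressed(x, y, cutoff=0.9):
    """Cutoff from the text: r > 0.9 or r < -0.9 (significance test not included)."""
    return abs(pearson_r(x, y)) > cutoff


# Invented coordinates and expression profiles.
lnc = ("chr1", 500_000, 502_000)
print(is_cis_target(lnc, ("chr1", 590_000, 595_000)))
print(is_coexpressed([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8]))
```

A complete analysis would also compute the p-value of each correlation and apply the p < 0.01 cutoff before calling a pair co-expressed.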

Quantitative real-time PCR (RT-qPCR)

To validate expression patterns of differentially expressed lncRNAs, the eight tissues were dissected from 5-day-old melon fly adults in the same manner as the sequenced samples. After total RNA isolation, the lnRcute lncRNA cDNA kit (TIANGEN, Beijing, China) was used for first-strand lncRNA cDNA synthesis. Primers used for lncRNA validation were designed using Primer 3.0 (http://bioinfo.ut.ee/primer3-0.4.0/) (Table S1). To determine the cycle threshold (Ct) value and amplification efficiency of each pair of primers, a standard curve was constructed with fivefold serial dilutions of cDNA (1, 5⁻¹, 5⁻², 5⁻³, 5⁻⁴). The qPCR reaction was run on a CFX384 Optics Module (Bio-Rad, Singapore) using the lnRcute lncRNA SYBR Green premix (TIANGEN, Beijing, China). RT-qPCR was conducted in 10 μL reaction mixtures, each consisting of 5 μL of lncRNA SYBR premix, 4 μL of nuclease-free water, 0.5 μL of lncRNA cDNA (~ 500 ng/μL), and 0.25 μL each of forward and reverse primers (10 μM). The PCR procedure was as follows: an initial denaturation at 95 °C for 3 min, followed by 40 cycles of 95 °C for 5 s and 60 °C for 15 s. The specificity of the primers was confirmed by melting curve analysis from 60 °C to 95 °C. Relative expression levels of lncRNAs among different tissues were normalized to alpha-tubulin and beta-tubulin 1 [58]. All experiments were conducted with four biological replicates. Data were analyzed using qBase plus software [59].
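
As an illustration of how amplification efficiency is commonly derived from such a standard curve, the following Python sketch fits Ct against log10 of the relative template amount and applies the widely used relation E = 10^(−1/slope) − 1. The Ct values are invented to represent ideal doubling per cycle, the function names are hypothetical, and this is not the calculation performed by the qBase plus software.

```python
from math import log10, log2


def linear_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def amplification_efficiency(dilutions, ct_values):
    """E = 10**(-1/slope) - 1, with slope taken from Ct vs log10(relative input)."""
    slope = linear_slope([log10(d) for d in dilutions], ct_values)
    return 10 ** (-1.0 / slope) - 1.0


# Fivefold dilution series from the text: 1, 5^-1, 5^-2, 5^-3, 5^-4.
dilutions = [5.0 ** -k for k in range(5)]
# Invented Ct values for perfect doubling: each fivefold dilution adds log2(5) cycles.
ideal_ct = [20.0 + k * log2(5) for k in range(5)]
efficiency = amplification_efficiency(dilutions, ideal_ct)
print(round(efficiency, 3))

# Volume check for the 10 uL reaction described in the text.
reaction_ul = 5 + 4 + 0.5 + 2 * 0.25
print(reaction_ul)
```

An efficiency of 1.0 corresponds to 100%, that is, the product doubles in every cycle; real primer pairs typically deviate from this ideal.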

Statistical analysis

Differences among tissues were analyzed using SPSS 19.0 software (IBM, Chicago, IL, USA) with a one-way analysis of variance (ANOVA) followed by Tukey’s honestly significant difference (HSD) test (P < 0.05).
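
For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the Python sketch below. The expression values are invented, the function name is hypothetical, and the sketch stops at the F statistic and its degrees of freedom: the p-value and Tukey’s HSD post hoc comparisons, which SPSS reports, are not implemented here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of each group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented relative expression values: three tissues, four biological replicates each.
tissues = [[1, 2, 3, 4], [2, 3, 4, 5], [6, 7, 8, 9]]
f_stat, df_b, df_w = one_way_anova_f(tissues)
print(f_stat, df_b, df_w)
```

The F statistic is then compared with the F distribution with (df_between, df_within) degrees of freedom to obtain the P value.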