Identification of LINE retrotransposons and long non-coding RNAs expressed in the octopus brain

Petrosino, Giuseppe; Ponte, Giovanna; Volpe, Massimiliano; Zarrella, Ilaria; Ansaloni, Federico; Langella, Concetta; Di Cristina, Giulia; Finaurini, Sara; Russo, Monia T.; Basu, Swaraj; Musacchia, Francesco; Ristoratore, Filomena; Pavlinic, Dinko; Benes, Vladimir; Ferrante, Maria I.; Albertin, Caroline; Simakov, Oleg; Gustincich, Stefano; Fiorito, Graziano; Sanges, Remo

doi:10.1186/s12915-022-01303-5

Identification of LINE retrotransposons and long non-coding RNAs expressed in the octopus brain

Research article
Open access
Published: 18 May 2022

Volume 20, article number 116, (2022)
Cite this article

Download PDF

You have full access to this open access article

BMC Biology Aims and scope Submit manuscript

Identification of LINE retrotransposons and long non-coding RNAs expressed in the octopus brain

Download PDF

Giuseppe Petrosino^1,2^na1,
Giovanna Ponte¹^na1,
Massimiliano Volpe^1,3,4^na1,
Ilaria Zarrella¹,
Federico Ansaloni³,
Concetta Langella¹,
Giulia Di Cristina^1,5,
Sara Finaurini⁶,
Monia T. Russo⁷,
Swaraj Basu^1,8,
Francesco Musacchia¹,
Filomena Ristoratore¹,
Dinko Pavlinic^9,10,
Vladimir Benes⁹,
Maria I. Ferrante⁷,
Caroline Albertin¹¹,
Oleg Simakov^12,13,
Stefano Gustincich^3,6,
Graziano Fiorito¹ &
…
Remo Sanges ORCID: orcid.org/0000-0002-5047-9713^1,3,6

12k Accesses
5 Citations
1339 Altmetric
145 Mentions
Explore all metrics

Abstract

Background

Transposable elements (TEs) widely contribute to the evolution of genomes allowing genomic innovations, generating germinal and somatic heterogeneity, and giving birth to long non-coding RNAs (lncRNAs). These features have been associated to the evolution, functioning, and complexity of the nervous system at such a level that somatic retrotransposition of long interspersed element (LINE) L1 has been proposed to be associated to human cognition. Among invertebrates, octopuses are fascinating animals whose nervous system reaches a high level of complexity achieving sophisticated cognitive abilities. The sequencing of the genome of the Octopus bimaculoides revealed a striking expansion of TEs which were proposed to have contributed to the evolution of its complex nervous system. We recently found a similar expansion also in the genome of Octopus vulgaris. However, a specific search for the existence and the transcription of full-length transpositionally competent TEs has not been performed in this genus.

Results

Here, we report the identification of LINE elements competent for retrotransposition in Octopus vulgaris and Octopus bimaculoides and show evidence suggesting that they might be transcribed and determine germline and somatic polymorphisms especially in the brain. Transcription and translation measured for one of these elements resulted in specific signals in neurons belonging to areas associated with behavioral plasticity. We also report the transcription of thousands of lncRNAs and the pervasive inclusion of TE fragments in the transcriptomes of both Octopus species, further testifying the crucial activity of TEs in the evolution of the octopus genomes.

Conclusions

The neural transcriptome of the octopus shows the transcription of thousands of putative lncRNAs and of a full-length LINE element belonging to the RTE class. We speculate that a convergent evolutionary process involving retrotransposons activity in the brain has been important for the evolution of sophisticated cognitive abilities in this genus.

Impact of non-LTR retrotransposons in the differentiation and evolution of anatomically modern humans

Article Open access 15 August 2018

Investigation of the activity of transposable elements and genes involved in their silencing in the newt Cynops orientalis, a species with a giant genome

Article Open access 20 July 2021

De-novo emergence of SINE retroposons during the early evolution of passerine birds

Article Open access 14 December 2017

Background

Transposable elements (TEs) have contributed to the evolution of specific functions in a variety of biological systems and have given birth to a large fraction of vertebrate long non-coding RNAs (lncRNAs) [1,2,3]. Among TEs, retrotransposons move via a copy-and-paste mechanism using an RNA intermediate. The long interspersed element (LINE) L1, a non-LTR retrotransposon that predated the human genome, is active during neuronal differentiation [4] and causes somatic mosaicism establishing genomic variability in the brain [5]. Somatic L1 insertions are suggested to alter the transcriptional output of individual neurons, eventually affecting neuronal plasticity and behavior [6]. Transposition-driven genomic heterogeneity has also been documented in invertebrates including the neurons of mushroom bodies of Drosophila melanogaster [7] where they have been suggested to drive behavioral variability in individual flies. Negative regulators of retrotransposons are reported to be expressed in specific subgroups of neurons in Aplysia californica [8], Drosophila melanogaster [7], and in the mouse brain [9], further supporting the idea that, in nervous systems, retrotransposon transcription is finely regulated in a broad range of organisms and does not simply constitute noise. Our understanding of the activities of TEs in metazoa genomes is nevertheless still far to be complete, and the number of cellular events in which they have an influence is constantly growing. Novel findings allow us to increase our comprehension but also add layers of complexity to the topic. L1 elements, for example, have been demonstrated to have specific activity also at the non-coding level in the regulation of transcription and the organization of the genome [10, 11] and to be a component of extra-chromosomal DNA [12].

Among invertebrates, the cephalopod mollusk Octopus vulgaris is known for the richness of its behavioral repertoire achieving sophisticated vertebrate-like plasticity and neural control [13,14,15,16]. The remarkable complexity in the morphological and functional organization of its nervous system [14, 17, 18] is linked to evolutionary innovations at the cellular and molecular levels [17,18,19,20,21]. In the common octopus, about 500 million nerve cells constitute the nervous system, with about 300 million composing the nervous system in the arm and about 200 million nerve cells in what is considered to be the central brain [22]. This number results to be ten thousand times higher than that found in the sea hare Aplysia and still remains two hundred times higher when compared to the number of neurons present in the brain of the honeybee Apis mellifera (reviewed in [23]). This complexity at the cellular level is determined by key aspects of the transcriptional outputs of its genome such as large cadherin genes encoding over 70 extracellular cadherin domains; unprecedented expansions of gene families crucial for regulation, signaling, and cell communication (e.g., protocadherins, zinc finger proteins, G-protein coupled receptors); birth of many novel octopus-specific genes; the existence of a vascular endothelial growth factor (VEGF) pathway; reflectin genes originated by horizontal gene transfer; differential arrangements of key developmental genes; and extensive RNA editing capabilities [21, 24,25,26,27,28].

The sequencing of the Octopus bimaculoides genome [24] revealed an expansion of TEs and specific gene families related to transcriptional regulation and neuronal connectivity. Although no specific analysis was performed to identify full-length potentially active TEs and non-coding transcripts were discarded from the definition of the reference transcriptome, it was suggested that TEs are active in O. bimaculoides because a substantial fraction of RNAseq reads resulted in an overlap with TE fragments annotated in non-coding intergenic regions. Expansion of TEs has also been found in the genome of Octopus minor [29] and the other sequenced cephalopod species such as Euprymna scolopes [30] and the giant squid Architeuthis dux [31]. Performing a survey of the O. vulgaris genome, we have confirmed the expansion of TEs also in this species [32]. The current picture is that, in coleoid cephalopod species, repeated elements cover on average 45% of the genome while, in non-coleoid, this coverage is smaller reaching 30% in N. pompilius [33], 20% in L. gigantea [34], and 35% in C. cigas [35]. The TE expansion in coleoid cephalopod has been demonstrated to be due to retrotransposons [33].

Here, we sequence the O. vulgaris neural transcriptome to gain insights into the molecular composition underpinning its neural complexity. We identify a full-length LINE element and show that it is expressed especially in specific areas of the brain related to known forms of behavioral plasticity including learning. We also provide evidence for the transcription of thousands of long non-coding RNAs and the pervasive inclusion of TE fragments in coding and non-coding transcripts in both O. vulgaris and O. bimaculoides.

Results

Thousands of putative lncRNAs are expressed in the Octopus vulgaris nervous system

We generated a de novo assembly of the Octopus vulgaris neural transcriptome identifying and evaluating its functional annotations, potential lncRNAs, and repeats composition (Fig. 1; Additional file 1: Tables S1, S2, Figs. S1, S2, Additional files 2, 3 and 4). From each of three different octopus individuals, we collected four parts, three of them as representative of the central brain: (1) supra-esophageal mass (SEM), (2) sub-esophageal mass (SUB), and (3) optic lobe (OL), and one representing the peripheral nervous system: (4) a piece of the second left arm (ARM) including the arm nerve cord. We used these parts as a source of RNA for the sequencing (see the “Methods” section, Additional file 1: Table S1). The sequencing generated approximately 850 million paired-end reads accounting for 85 Gbp of sequence data that, following assembly and filtering, produced 64,477 unique transcripts (Additional file 2). The sequences showed a N50 value of 2087 bp, an average length of 1308, and about 38% average GC content (Additional file 1: Table S2). The transcriptome is more than 98% complete, and we functionally annotated 21,030 (32.6%) protein-coding transcripts (Additional file 1: Fig. S1 and Additional file 4). By performing stringent annotation analysis, a high proportion of transcripts (7806; 12.1%) was classified as putative lncRNAs (see the “Methods” section and Additional file 1: Fig. S2). In analogy to what is known about lncRNA expression in mammals [36], the non-coding portion of the O. vulgaris transcriptome shows a lower level of expression when compared to protein-coding genes (Fig. 1a) and a significantly higher number of lncRNAs is expressed in the central brain (~ 10%) with respect to the arm (~ 7%, p-value < 1e−40) (Fig. 1b). Functional enrichments highlight the differences between transcripts expressed in the central brain and in the periphery. Specifically, transcripts expressed in the brain are enriched in functions associated with neuronal cell adhesion and reverse transcription, while the ones expressed in the arm are enriched for functions associated with signal transduction and translation (Fig. 1c and Additional file 1: Fig. S3). We then mapped the transcriptome to the survey of the O. vulgaris genome [32]. Despite the quality of the current assembly being at the survey level and therefore rather fragmented, more than 34,000 transcripts were reliably mapped on the reference (Additional file 5).

We then validated the expression and the sequence of selected coding and non-coding transcripts by RT-PCR and Sanger sequencing (Fig. 2) selecting a group of transcripts showing a specific peak of expression in each of the collected parts. A transcript was considered to have a peak of expression in a given part when showing an expression level higher than 0.5 counts per million (CPM) in all three biological replicates of exclusively one part and below 0.5 in all the others. The value of 0.5 CPM was arbitrarily chosen as the value representing the 25 percentile of all the expression levels. This allowed the selection of ~ 1800 transcripts (~ 1500 coding and ~ 300 non-coding). Among the coding transcripts with an expression peak, only 54 resulted annotated. Among them, we noticed the presence of putative homologs of homeobox genes and selected 4 of them for validation through RT-PCR in 3 different individuals. The tested Arx putative homolog (Aristaless-related homeobox, comp31544_c0_seq1) is expressed mainly in the SEM by RNAseq and the RT-PCR validated this result. The RT-PCR also confirmed the peak of expression for the Hoxb5a putative homolog (homeobox B5, comp28131_c1_seq2) which is expressed mainly in the SUB and the Meox2 putative homolog (mesenchyme homeobox 2, comp34840_c15_seq1) which is expressed mainly in the ARM. The Phox2b putative homolog (paired-like homeobox 2b, comp28142_c1_seq1) peaking in the OL by RNAseq data is expressed in all sampled parts of the brain by RT-PCR. For RT-PCRs concerning the lncRNAs, Subl (lncRNA with a peak of expression in the SUB, comp35227_c11_seq1) and Arml (lncRNA with a peak of expression in ARM, comp20195_c0_seq1) resulted to be tissue-specific, as they were mainly expressed in the SUB and in the ARM, respectively. On the other hand, Seml (lncRNAs with a peak of expression in SEM, comp18661_c0_seq1) showed to be expressed in the SEM but also in the SUB and the OL, while Oll (lncRNA with a peak of expression in the OL, comp35506_c7_seq1) presented expression in the OL but also in all the other sampled parts (Fig. 2c). Both Oll and Arml show the existence of two different isoforms in at least one individual. The RT-PCR results were generally in agreement with the sequencing data for both coding and non-coding transcripts tested. The identifiers indicated for every transcript are the same used in the Additional files and can be used to identify the corresponding sequences and annotations.

In order to verify the presence of lncRNAs also in O. bimaculoides, we reassembled the public RNAseq data from this species with the same method used for our O. vulgaris data (see the “Methods” section). We assembled 92,820 unique transcripts (Additional file 1: Table S2 and Additional file 6) of which 84,043 (90%) map on the published assembled genome with at least 90% coverage and 90% identity (Additional file 7). Our analysis demonstrated the presence, also in this octopus species, of thousands of putative lncRNAs (Additional file 8). Indeed, we were able to classify 10,030 assembled transcript as putative non-coding. They correspond to more than 10% of the assembled transcriptome of which 9132 map on the reference genome.

We then analyzed the conservation between the two octopus transcriptomes. Putative orthologs where identified by using BLAST and selecting for the reciprocal best hits (RBHs). From this analysis about 33,322 transcripts (50% of the assembled transcriptome) of the O. vulgaris transcriptome share orthologs with O. bimaculoides. The fraction of conserved transcripts is similar between coding (29,331, 52% of the coding transcripts with RBH) and noncoding transcripts (3991, 51%). In general, the average conservation of coding transcripts is slightly but significantly higher than that of lncRNA (93.3% vs 92.1%, t-test p-value = 1.1e−56). To evaluate the conservation of promoters, we extracted 1000 nucleotides upstream of the transcription start site of each transcript from each RBH pair that could be mapped on the genome in both species and contained enough nucleotides upstream of the transcription start site. Because the genomes are rather fragmented, about 18,000 RBH transcript pairs satisfied these conditions in both species and were considered in the analysis. On these pairs, we performed global alignments and classified as conserved those promoter pairs showing at least 50% identity. Randomization analysis building alignments between random promoter pairs showed the complete absence of pairs presenting such a level of conservation. From this analysis, 883 promoter pairs (5%) were classified as conserved between the two species. Interestingly, the proportion of conserved promoters from noncoding (7%, 171 out of 2451) is significantly higher (binomial test p-value 5.3e−07) with respect to promoters resulting conserved among promoter pairs of the coding transcripts (4.6%, 712 out of 15493). Finally, in order to test the conservation of the assembled transcripts at the positional level, we selected all the O. vulgaris contigs containing at least 10 mapped transcripts, which resulted in about 500 contigs, and identified the locations of the respective O. bimaculoides orthologs. On average, 85% of the ortholog transcript pairs are present on the same contig in the two species. We conclude that the transcripts are positionally conserved between the two species.

A full-length LINE element is transcribed in the Octopus vulgaris brain

In order to evaluate the contribution of TEs to the O. vulgaris transcriptome, we analyzed its content in terms of repeated elements using RepeatMasker (Additional file 9) and found that more than 3.5 million nucleotides derive from interspersed repeats (4.5% of the total transcriptome content; Additional file 1: Table S3). More than 35% of the generated transcripts contain at least one interspersed repeat fragment. Among them, retroelements represent the TE fragments more frequently embedded in transcripts (26% of transcripts contain a fragment from at least one retroelement). According to the segregation of transposable elements in transcripts expressed in the different sequenced parts of the organism, we observed that SINE fragments are present in a higher fraction of transcripts expressed in the brain, while LINEs, LTRs, and DNA transposons are present in a higher portion of transcripts expressed in the periphery (Additional file 1: Fig. S4a). SINE also results in the class of retroelements more frequently embedded in lncRNAs (Additional file 1: Fig. S4b). We then asked whether the presence of head-to-head coding/noncoding pairs harboring the SINEUP [37] activity might be present in octopus. By screening the map** of the transcriptomes generated in both species, we have found about 250 and 500 coding/noncoding overlap** pairs in O. vulgaris and O. bimaculoides, respectively. The 20% of these pairs contain a SINE element embedded in the lncRNA, but only few of them, 4 in O. vulgaris and 2 in O. bimaculoides, have a head-to-head orientation with the SINE fragment embedded in the non-overlap** part of the lncRNA resembling a canonical SINEUP element. We conclude that the SINEUP mechanism is probably not a frequent feature in the octopus genome.

To identify the molecular basis of the observed TEs expansion in O. vulgaris, we searched the transcriptome for putative autonomous active elements. We found a single element, a LINE mainly transcribed in the brain (Fig. 3a–c) that we named RTE-2_OV following phylogenetic analysis from which the element resulted to belonging to the RTE clade (Fig. 3d). The identified LINE presents an ORF of 3327 nucleotides and 5′ and 3′ UTRs of about 600 nucleotides. The translation corresponds to a 1109-amino acid polypeptide chain containing all the catalytic amino acids and domains needed for retrotransposition (Fig. 3a, b): a C-terminal endonuclease (EN), a reverse transcriptase (RT), and an N-terminal C2H2 zinc-finger (Znf) which is relatively rare in RTE elements [40]. It is unlikely that the strategy used to assemble the transcriptome might have assembled a full-length LINE with a complete ORF by putting together fragments from different independent transcriptional units. In order to rule out this possibility, we amplified the element from both gDNA and cDNA with a single PCR reaction. The longest single amplification goes from nucleotide 253 to 3903 of the assembled sequence. The fragment contains almost the entire ORF plus the majority of 5′ UTRs. There are only 6 nucleotides lacking at the end of the ORF because a microsatellite begins at the end of the ORF and made up the majority of the 3′ UTR. This did not allow a specific amplification of this region. Microsatellites are a common feature of RTE retrotransposons [41]. We cloned the obtained fragment from the cDNA amplification and validated it by primer walking and Sanger sequencing assembling a sequence with more than 98% identity to the reference. These results support the existence of the element in the genome and its transcription. The finding that this element contains an intact ORF and is transcribed indicates that it might be active and possibly drive retrotransposition. We then asked whether this element might show evidence of somatic retrotransposition. Southern blots analysis did not lead to conclusive results because of high background noise likely due to the high number of copies of this TE in the genome of O. vulgaris; however, the pattern observed is consistent with the existence of germinal and somatic genomic variations associated to the element (Additional file 1: Fig. S5). We also performed quantitative PCR experiments on DNA deriving from the four parts of different individuals. The results support the presence of different amounts of DNA from the element in different portions of the brain (Fig. 3e) showing an average higher content of RTE-2_OV DNA in the SUB with respect to the other parts. The existence of these increased amounts of RTE-2_OV DNA in SUB is in line with its higher expression in the SUB (Fig. 3f). Interestingly, the putative homolog of Piwi, known to repress the translation of TEs and therefore to restrict retrotransposition, displays a lower expression in the SUB with respect to the other portions of the central brain (Fig. 3g) consistently with what has been observed in the fruitfly brain [7].

The RTE-2_OV element is mainly expressed in amacrine neurons

To identify the domains of activity for the RTE-2_OV element, we performed RNA in situ hybridization and immunohistochemistry analysis. Localization of the RTE-2_OV transcript in O. vulgaris through in situ hybridization (ISH) showed specific expression of the element in subgroups of neurons in the brain and the absence of any signal in neuropils. We found most of the small cells of the sub-frontal lobe and of the five gyri of the vertical lobe as the most intensely stained areas in the supra-esophageal mass (SEM; Fig 4a–e; Additional file 1: Fig. S6b-e). Neural cells stained for RTE-2_OV create a tapestry closely matching the pattern obtained by DAPI (Additional file 1: Fig. S6b-c; Fig. S6d-e as control), indicating that the transcript is expressed in the great majority of the cellular bodies. We also found several positive neural cells in the sub-esophageal mass (SUB: anterior and posterior areas; Fig. 4f–h; Additional file 1: Fig. S6g) and in the optic lobe (OL; Fig. 4i–k; Additional file 1: Fig. S6l). In particular, after ISH, positive cells (20–25 μm in diameter) were observed at the level of the SUB in the pallovisceral lobe and some larger neurons (40–50 μm) belonging to typical cellular types of the motor-center present in the area. Positivity to the mRNA of the RTE-2_OV was also observed in cells belonging to the ventral side of the anterior pedal lobe (SUB) and in some larger cells (up to 50 μm) at the level of the dorsal brachial lobe (anterior part of the SUB). The small cells pertaining to these areas do not show any positive signal. In the optic lobe, the outer layer appeared rich in intensely positive cells (small amacrine cells, < 5 μm), and the inner medulla presented scattered cell bodies (up to 10 μm) expressing the element. Cell bodies of the peduncle complex at the level of the median and posterior lobules of the olfactory lobe also revealed a positive signal after ISH. We finally found isolated sparse large motor neurons positive at the level of the nerve cord in the arm (Additional file 1: Fig. S6s). No positive cells were observed in any muscle fiber in the arm or any other structure.

Immunohistochemistry (IHC) with RTE-2_OV custom antibodies identified in the vertical lobe a number of large cells organized in trunks and positive fibers in the neuropil of the gyri. We also observed large positive cells organized in the chain at the level of the cellular wall of the sub-vertical lobe (Fig. 5b) and few positive neurons in the great majority of the islands of cells in the posterior wall of the sub-vertical lobe. A distinct pattern of positive fibers was found in the neuropils of the SEM (Fig. 5a–c) together with a scattered number of cell bodies (see areas belonging to superior and inferior frontal lobes; Fig. 5c, d). We did not identify positive fibers in the SUB, but only a distinct population of a small number of neurons in the vasomotor lobes (Fig. 5e) and in discrete areas of the anterior and the lateral posterior wall of the pedal lobe (Fig. 5f–h). Immuno-reactivity was also evident in several scattered amacrine cells of the external granular layer of the OL (Fig. 5i). We found several RTE-2_OV-positive fibers dispersed in discrete areas of the neuropil of the OL and a distinct pattern of positive cells and fibers in the peduncle lobe, mostly toward the internal layer of cells of the neural wall of the olfactory lobe and the spine (Fig. 5l). A small number of large cells dispersed in the arm nerve cord were also found to be stained (data not shown).

Identification of a potentially active LINE in Octopus bimaculoides

In order to add further support to the idea that LINE elements might be active in the genomes of the Octopus genus, we searched the public genome, transcriptome, and our custom transcriptome assembly (Additional files 6, 7 and 8) of O. bimaculoides for the presence of assembled full-length LINEs. We were not able to identify any assembled full-length LINE in the transcriptome nor in the genome. However, when inspecting the repeat library consensus generated in the work by Albertin et al. [24], we identified two LINEs with a full-length ORF and the complete set of domains: an RTE and a Dong [Annotation and map** of the assembled transcriptome

CEGMA (Core Eukaryotic Genes Map** Approach; release 2.5) [77] was used to measure the completeness of the assembled transcriptomes using the set of 248 Core Eukaryotic Genes (CEGs). Transcripts annotation was performed using the Annocript pipeline [78] (release 0.2) with the combination of tool, parameters, and databases described below and using BLAST+ (release 2.2.27) [79]. To annotate proteins, we used BLASTX against the UniRef90 and Swiss-Prot databases from UniProt (release 2013_09) [80] with the following parameters: -word_size 4, -evalue 10e-5, -num_descriptions 5, -num_alignments 5, -threshold 18. To annotate protein domains, we used RPSBLAST against the Conserved Domains Database (CDD v3.10) [81] with the following parameters: -evalue 10e-5, -num_descriptions 20, -num_alignments 20). Ribosomal and small non-coding RNAs were identified using BLASTN against a custom database made by Rfam (realease 11.0) [82] and ribosomal RNA sequences from GenBank (parameters: -evalue = 10e-5, -num_descriptions 1, -num_alignments 1). Each transcript was associated to Gene Ontology (GO) [83], Enzyme Commission (EC) [84], and UniPathway [85] through cross-map** of the best match from UniRef90 or Swiss-Prot using the annotation map** tables from UniProt. For each transcript, we used the Virtual Ribosome (Dna2pep release 1.1) [86] to predict the length of the longest ORF searching across all reading frames without the constraint to begin translation from a methionine start codon (parameters: -o none, -r all). The non-coding potential for each transcript was calculated using Portrait (release 1.1) [87]. The Octopus vulgaris reference genome survey was downloaded on March 2021 from https://springernature.figshare.com/ndownloader/files/13876385. Assembled and filtered unique transcripts were mapped on the genome using gmap [88] (parameters: --suboptimal-score 0 -f gff3_gene --gff3-add-separators 0 -t 32 --min-trimmed-coverage 0.9 --min-identity 0.9) considering only transcripts aligning at least 90% of their length with 90% minimum identity. We were able to map 34,239 (~ 53%) transcripts.

Non-coding annotation of the assembled transcriptome

Putative lncRNAs were classified based on a heuristic process considering the annotation results. The constraints used to identify potential lncRNAs have to be considered very stringent (Additional file 1: Fig. S2). In published studies, different combinations of analyses have been used to identify lncRNAs [87, 89,90,91] (1) lack of similarity with proteins, (2) lack of similarity with domain profiles, (3) lack of similarity with other RNAs (ribosomal, snoRNA, etc.), (4) transcript and longest ORF lengths, and (5) non-coding potential. We put all these together and classified as lncRNA only those transcripts satisfying all the following conditions: (a) length ≥ 200 nucleotides; (b) lack of similarity with any of the following: protein from Swiss-Prot and UniRef90, domains from CDD, rRNA from GenBank, and other small ncRNA from Rfam; (c) longest ORF < 100 amino acids; and (d) non-coding potential score ≥ 0.95. Using these stringent constraints, we were able to predict in the O. vulgaris transcriptome 7806 (~ 12%) transcripts as putative lncRNAs.

Assembly, map**, and annotation of the Octopus bimaculoides public RNAseq data

O. bimaculoides RNAseq raw data from Albertin et al. [24] were downloaded from NCBI SRA in October 2015 using the SRA Toolkit. Raw reads were filtered and trimmed based on quality and adapter inclusion using Trimmomatic (release 0.33; parameters: -threads 32, ILLUMINACLIP:illumina_adapters.fa:2:40:15:10:true LEADING:3 TRAILING:3 SLIDINGWINDOW:3:20 MINLEN:50). Read pairs with both reads passing the filters were considered for the assembly. Trimmed and filtered reads were assembled with Trinity (release 2.1.0; parameters: --seqType fq --SS_lib_type RF --CPU 32 --max_memory 240G --inchworm_cpu 32 --bflyHeapSpaceInit 24G --bflyHeapSpaceMax 240G --bflyCalculateCPU --normalize_reads --min_kmer_cov 2 --jaccard_clip) using digital normalization, strand information, the Jaccard clip and assuring that every kmer used in the assembly was present in at least 2 reads to reduce noise. Redundancy of assembled transcripts was reduced using Cd-hit [92] (version: 4.6, parameters: -c 0.90 -n 8 -r 0 -M 0 -T 0). To measure the expression levels, raw reads were mapped on the transcriptome using Bowtie (version 1, parameters: -t -q -p 32 --chunkmbs 10240 --maxins 500 --trim5 2 --trim3 2 --seedlen 28 --tryhard -a -S). SAM outputs from Bowtie were converted into BAM, sorted, indexed, and counted using the view, sort, index, and idxstats programs, respectively, from the Samtools software collection. All transcripts not showing at least 1 reads map** per million mapped reads (CPM) in at least 1 sample were discarded from the transcriptome. Octopus bimaculoides genome was downloaded on August 2015 from http://genome.jgi.doe.gov/pages/dynamicOrganismDownload.jsf?organism=Metazome.

Assembled and filtered unique transcripts were mapped on the genome using gmap [93] (version: 2015-09-28, parameters: --suboptimal-score 0 -f gff3_gene --gff3-add-separators 0 -t 32 --min-trimmed-coverage 0.9 --min-identity 0.9) considering only transcripts aligning at least 90% of their length with 90% minimum identity. We were able to map 84,043 (~ 90%) transcripts. Annotations and all the remaining analysis were executed as for O. vulgaris.

Conservation analysis and SINEUP search

Putative orthologs between O. vulgaris and O. bimaculoides species were identified using BLAST+ (program blastn, parameters: -best_hit_overhang 0.1, -evalue 1e-0.5) and searching for reciprocal best hit (RBH). Promoters were defined as 1000 nucleotides upstream the annotated transcription start site and extracted for each ortholog pairs for which both transcripts could be mapped on the respective genome presenting enough sequence space upstream the transcriptional start site. Promoter pairs were aligned among them by using the function pairwiseAlignment from the Biostrings Bioconductor [88] package in R using default parameters. To identify positional conservation between the two species, we selected all the O. vulgaris scaffolds containing at least 10 mapped transcripts and counted how many pairs of orthologs were present on the same scaffolds in both the species. To search potential SINEUP, in each species, we used the GenomicRanges Bioconductor package [94]. Basically, we collected the closest transcripts for each pair (mRNA/mRNA, lncRNAs/lncRNAs, mRNA/lncRNAs). We then parsed the RepeatMasker output calculating the coverage of the repeats for each transcript and selected the mRNA/lncRNAs pairs with head-to-head overlap having at least one SINE element in the non-overlap** part of the lncRNA.

Identification and classification of repetitive elements

Repetitive elements for each transcriptome were annotated using RepeatMasker (A.F.A. Smit, R. Hubley & P. Green RepeatMasker at http://repeatmasker.org; release 4.0.5) searching against the Repbase database [95] (release 20140131) with parameters: -species bilateria, -s, -gff. We counted from RepeatMasker output the repeat fragments present at least once in each transcript and built a matrix containing the percentage of transcripts containing fragments related to (a) retroelements, (b) DNA transposons, (c) satellites, (d) simple-repeats, (e) low complexity, (f) others, and (g) unknown classes for each transcriptome according to the RepeatMasker classification.

Identification of full-length transposable elements

We parsed RepeatMasker output calculating the percentage of overlap between the assembled transcripts and the repeat consensus from Repbase selecting all elements with at least 80% coverage on the repeat consensus. Elements showing the highest coverage were selected. On these, we used Virtual Ribosome to predict the longest complete ORFs by searching across all reading frames with methionine as start codon and a canonical stop (parameters: -o strict, -r all). A single transcript resulted with a complete ORF. On this, we used InterPro [96] to identify and classify protein domains. The potential catalytic amino acids essential for the retrotransposition were manually identified comparing the putative translation with those reported in Clements and Singer [38]. The same analysis was performed on both the transcriptomes of O. bimaculoides (assembled by Albertin and assembled by us). The analysis was also performed on the RepeatMasker annotation of the genome by Albertin downloaded from http://octopus.unit.oist.jp/OCTDATA/. For consistency, we also analyzed RepeatMasker annotations of the genome and the transcriptomes produced using the same tool, library, and parameters used for O. vulgaris and the other species considered in this study. In no one of the analyses, we could find a full-length transposable element retaining a complete ORF for O. bimaculoides. We then translated the main RepeatScout and RepeatModeler repeat libraries consensuses assembled by Albertin et al. (main RepeatScout library: http://octopus.unit.oist.jp/OCTDATA/TE_FILES/mainrepeatlib.fa.gz; RepeatModeler library: http://octopus.unit.oist.jp/OCTDATA/TE_FILES/oct.rm.tar.gz) with the Virtual Ribosome tool to predict longest ORF searching across all reading frames showing methionine as start codon (parameters: -o strict, -r all) and a canonical stop. The InterPro tool was then used to identify and classify the LINE characteristic domains. The potential catalytic amino acids essential for the retrotransposon activity were manually identified by comparing the ORF sequences with those reported in Clements and Singer. This led us to the identification of two potentially functional LINE retrotransposons.

Identification of a potentially active retrotransposon in O. bimaculoides

The two candidate retrotransposons found in O. bimaculoides RepeatModeler libraries were analyzed to search for integration sites in gonads and optic lobe using MELT [43] (v2.0.2). Two genomic DNAseq WGS libraries from gonads (SRR2010220 and SRR2005790) were downloaded from the European Nucleotide Archive (ENA) at https://www.ebi.ac.uk/ena/data/view/PRJNA270931. We generated two additional DNAseq WGS libraries from DNA extracted from the optic lobe (L001 and L002) of a different individual. O. bimaculoides reference genome was filtered for scaffold shorter than 10,000 bp and reads mapped on it using BWA [97] (v0.7.15; parameters: mem, -t 32). SAM output from BWA was converted into BAM, sorted, indexed, and counted using the view, sort, index, and idxstats programs, respectively, from the SAMtools software. The resulted sorted BAM files were used as input for MELT (parameters: -d 10000). Since the reads length differs between the two set of libraries (150 bp for gonads and 260 bp for optic lobe), the optic lobe dataset was trimmed with Trimmomatic (v0.32; parameters: CROP:200, HEADCROP:50) to obtain homogeneous reads of 150 bp in all the datasets. We filtered the integration sites (ISs) identified by MELT for entries which passed all the MELT checks and which presented at least 3 discordant pairs of reads supporting both left and right sides of the breakpoints. BLAST (v2.6.0; parameters: -evalue 99999) search of the candidate retrotransposons consensus sequences was performed against the genome and the identified ISs were additionally filtered out when the BLAST search showed similarity in a range of 260 bp around the IS breakpoint. The same analysis was performed using non-trimmed reads and two additional ISs identification programs, RetroSeq [98] and an in-house developed pipeline, and the significance of the results was maintained (data not shown).

Evaluation of the activity of the RTE element discovered in O. bimaculoides

We performed 30X coverage WGS of the DNA extracted from two different tissues (SUB and GILL) of two different O. bimaculoides individuals. About 150 ng of genomic DNA was processed in order to construct a whole-genome Illumina sequencing library using the Illumina DNA Prep kit according to the standard protocol in a manual procedure. The library QC has been performed by the Agilent DNA 1000 Kit run on the 2100 Bioanalyzer (Agilent Technologies, Inc.). We obtained 8 libraries with 62 nM average concentration and 513 bp average size. A 0.9-nM final library pool has been loaded on a NovaSeq 6000 S2 Reagent Kit (300 cycles) and run on the NovaSeq 6000 System. We obtained an average %Q30 = 90.4, 84.8% clusters passing filter, and a total output 1.55 Tb. The identification of non-reference LINE insertion events in GILL and SUB of two different individuals was performed using MELT (version 2.2.2). First, reads were mapped to the reference genome using bwa with default parameters (version 0.7.15). Then, the MEI zip file needed for the subsequent MELT analysis was generated by using the MELT BuildTransposonZIP command by (i) setting the error value to 3, (ii) providing the FASTA sequence of the RTE LINE, and (iii) providing the genomic coordinates of the annotated insertion sites of the RTE LINE (previously identified by masking the LINE sequence on the O. bimaculoides reference genome by RepeatMasker—version 4.0.5). Finally, MELT SPLIT analysis was run following the Preprocess, IndivAnalysis, GroupAnalysis, Genotype, and MakeVCF steps as indicated in the MELT documentation (https://melt.igs.umaryland.edu/manual.php). We selected only the integration sites which could be identified by at least 3 supporting reads and passing all the quality checks performed by the MELT (classified as PASS). The results were evaluated using the UpSetR library [99].

Phylogenetic tree generation

Evolutionary tree in Fig. 3d was generated using 100 full-length LINEs belonging to 15 LINE clades (Additional file 1: Table S4). Protein sequences were selected from Ohshima and Okada [39] and manually checked. InterPro was used to identify endonuclease and reverse transcriptase domains in all the LINEs. Multiple sequence alignments were performed using MAFFT [100] (v7.221; with option L-INS-i). We utilized TrimAl [101] (v1.4.rev15) to perform automated trimming aligned sequences (parameters: -fasta -automated1). Phylogenetic relationships between LINE elements were reconstructed with MrBayes [102] (v3.2.1). Bayesian analysis was run for six million generations with twenty-two chains, sampling every 1000 generations (6000 samples). Convergence was attained with a standard deviation of split frequencies below 0.01 and a consensus tree was generated using a burnin parameter of 1500 (25% of 6000 samples). The phylogenetic tree was visualized with FigTree program (release 1.4.2; http://tree.bio.ed.ac.uk/software/figtree/).

Classification of transcripts expressed in each sample, expression peaks, and selection of candidates for validations

We classified subsets of transcripts according to their expression levels across the different parts. A transcript was considered expressed in a specific part if, in all the replicates of that part, it showed an expression level > 0.5 CPM. This resulted in the classification of the groups of transcripts used to perform the analysis. We also classified transcripts having a peak of expression. These represent transcripts showing an expression level > 0.5 CPM in all three biological replicates of exclusively one part and below 0.5 in the others, which resulted in ~ 1800 transcripts. They were used to draw the heatmap using MeV (release 4.8.1) as part of the TM4 suite [103] with hierarchical clustering exploiting Pearson correlation. Candidates for validations were selected among the 1800 transcripts with expression peak using the following additional criteria. For coding transcripts, we randomly selected four coding transcripts representing octopus homologs of homeobox genes. The annotation was manually verified. For lncRNAs, we randomly selected four putative non-coding containing a SINE fragment. Both coding and lncRNA candidates were validated using RT-PCR and Sanger sequencing.

Polymerase chain reaction (PCR)

Octopus vulgaris cDNAs were generated from 200 ng of total RNA using Superscript VILO cDNA Synthesis Kit (Life Technologies) in 20 μl reaction volume. PCR were carried out using 20 ng of cDNA, 0.25 μl of Taq DNA Polymerase (5 U/μl; Roche), 1 μl of each specific forward and reverse primer (25 pmol/μl), 2.5 μl of PCR reaction buffer (10×), 2.5 μl of dNTP mix (10×), and water (up to final volume 25 μl). The ubiquitin gene (accession number FJ617440) was used as an internal control. Reactions for the coding transcripts were amplified with a single step of 2 min at 94 °C, 15 s at 94 °C, 30s at 60 °C, and 1 min at 72 °C for 35 cycles and 7 min for 72 °C. Reactions for the non-coding transcripts were amplified at an annealing temperature of 58 °C. The following primers were utilized:

Arx forward 5′-TCCCTGCCTTCTCAACACAT-3′
Arx reverse 5′-TCCGAACTTCCACGCTTACT-3′
Hoxb5a forward 5′-GTGGCGAGGAATTTAGGAAG-3′
Hoxb5a reverse 5′-GCAACAGTCATAGTCCGAACAG-3′
Phox2b forward 5′-AATGGGGTGAGATCCTTTCC-3′
Phox2b reverse 5′-TTCATTGCAATCTCCTCTCG-3′
Meox2 forward 5′-TCCAGAACCGTCGGATGAAA-3′
Meox2 reverse 5′-TACGTAAAGGGCACACACCT-3′
Seml forward 5′-CACTTGTGCAAGGTACCACG-3′
Seml reverse 5′-AGGTCTCCTTAAATTTATTTCTGTGCA-3′
Subl forward 5′-ACAGAGCATCTTGAGTCTCACT-3′
Subl reverse 5′-CACTCCTGCGCCTTTCATTT-3′
Oll forward 5′-GGATTGACCCTGCAACTTGG-3′
Oll reverse 5′-CAGTGATGACGGACTTGCAA-3′
Arml forward 5′-GTACCCCACAAAATTAAATC-3′
Arml reverse 5′-CACTCACAAGGCTTTAGTTGGC-3′
Ubi forward 5′-TGTCAAGGCAAAGATTCAAGA-3′
Ubi reverse 5′-GGCCATAAACACACCAGCTC-3′

Cloning and primer walking of the LINE element in Octopus vulgaris

PCR has been carried out on cDNA and gDNA with Takara LA Taq and the primer pair Line F - Line R7 with the following amplification program: 1 min at 94 °C (30 s at 94 °C, 5 min at 68 °C) × 35 cycles, 10 s at 72 °C. The specific amplicon obtained on cDNA has been gel extracted and cloned in pGEM – T Easy Vector System following manufacturer instructions. The cDNA clone has been Sanger sequenced with the following primers:

SP6 (pGEM’s multiple cloning region) 5′-ATTTAGGTGACACTATAGAA-3′
LineF 5′-CCCCAGTCGTCTTGACTTTG-3′
LineF1 5′-GAGCAGCCCTCTTCAGGAT-3′
LineF2 5′-GCGACCATCATCAGTGCTTA-3′
LineR2 5′-TCAGATGCCAGTGTTTGGAG-3′
LineF3 5′-GGGTCAGAAAGTGACGAGGA-3′
LineR3 5′-TGCATGAGGCGGAGTTTAG-3′
LineF4 5′-CAAGAGGCTGATCCTGGAGA-3′
LineR4 5′-CCGATCTCCTTTCCGCTTAT-3′
LineF5 5′-AGGAGAAATGCATGGAGCAG-3′
LineR5 5′-TGTTGATACCGGACTTGCAG-3′
LineR6 5′-CGGTAAGCAGTCCACGTCTC-3′
LineR7 5′-GAACTGCCGCCATGAGAC-3′
T7 (pGEM’s multiple cloning region) 5′-TAATACGACTCACTATAGGG-3′

LINE copy number variation using quantitative real-time PCR in Octopus vulgaris

Copy number variation analysis was performed on genomic DNA extracted from octopuses (N = 9; SEM, SUB, OL, and ARM). One ARM sample was chosen as calibrator, while 18S was chosen as invariant control. Purified genomic DNA concentrations were assessed by NanoDROP (Thermo Fisher Scientific). According to the starting concentration, DNA samples were diluted in TE buffer (10mM Tris-HCl, 1mM EDTA, pH 7.5) to a concentration of 100 ng/μL and then further diluted to a concentration of 10 ng/μL. All dilutions were checked by NanoDROP. Primers and multiplexing efficiencies were verified by linear regression to a standard curve ranging from 50 ng to 16 pg of genomic DNA. LINE and 18S slopes were − 3.3 and 3.8, respectively, and represent acceptable amplification efficiencies. Standard curves also confirmed that the final concentration of 5 ng DNA tested in qPCR was within the linear range of reaction. Reactions were performed in 20 μl reaction mixture containing iQ Multiplex Powermix (Bio-Rad), Taqman primers (10 μM), and probes (10 μM) differentially labeled (with FAM or VIC fluorophore) and specifically designed to hybridize with the target DNA sequences. LINE element was amplified using the following primers:

LINE forward 5′-AGCAGTGGGAATCATTCA-3′
LINE reverse 5′-GTCGTTTTCGTCGAACCAGT-3′

18S was amplified using the following primers:

18S forward 5′-AGTTCCGACCGTAAACGATG-3′
18S reverse 5′-CCCTTCCGTCAATTCCTTTA-3′

As probe sequences we utilized the following:

FAM: 5′-AACTCTGGGCCAAATTACGA-3′
VIC: 5′-GGGAAACCATAGTCGGTTCC-3′

qPCR was carried out for 20 s at 90 °C, followed by 40 cycles of 10 s at 95 °C and 30 s at 59 °C using the 7900HT Fast Real Time PCR System (Applied Biosystems). Assays were performed for each sample in duplicate and reproduced four times. Data obtained from the co-amplifications of the target DNA sequence and the internal invariable control 18S were analyzed using the 2–ΔΔCt method79.

Southern blotting

The brains (SEM, SUB, and OL) and a piece of an arm (ARM) of three O. vulgaris were dissected after humane-killing and immediately frozen in liquid nitrogen. Pulverized samples were treated following the methods utilized by Perelman and coworkers80; in brief, after phenol∶chloroform (50:50) extraction DNA was precipitated using cold isopropanol followed by centrifugation, suspended in TE buffer (10 mM Tris–HCl pH 8.0 and 0.1 mM EDTA pH 8.0), treated with ribonuclease A (10 μg/mL) and incubated at 37 °C for 30 min. DNA concentration was estimated using NanoDrop and quality checked by electrophoresis on 0.8% agarose gel. 10 μg genomic DNA for each tissue was digested with EcoRI (New England Biolabs) overnight at 37 °C and resolved on a 0.9% agarose gel for 15 h at 1.5V. DNA was transferred to a Hybond-N+ nylon membrane (0.45 μm; Amersham Pharmacia Biotech) according to Sambrook and Russell81. DIG-labeled LINE DNA probe was prepared by PCR DIG Probe synthesis kit (Roche). Hybridization and autoradiography were performed according to the DIG Application Manual (Roche).

Probe synthesis for in situ hybridization

We amplified by PCR a 356-bp cDNA fragment of the assembled LINE from bp 1512 to bp 1868 using the following primers:

LINE forward 5′-GGGTCAGAAAGTGACGAGGA-3′
LINE reverse 5′-TGCATGAGGCGGAGTTTAG-3′

The choice of the fragment and the design of primers have been based on manual curation steps ensuring that the chosen fragment is present exclusively in the transcript of the identified LINE element and in no other assembled transcripts. The amplified fragment was cloned into TOPO® TA Cloning® vector (Life Technologies, CA, USA) according to the manufacturer’s protocol. Cloned fragment was digested using BAMHI and ECORV restriction enzymes and validated by Sanger sequencing. Sense and antisense digoxigenin-labeled RNA probes were generated by in vitro transcription using the DIG-RNA Labeling Kit (SP6/T7; Roche Applied Sciences, QC, Canada). Labeled RNA probes were quantified by dot blot analysis.

In situ hybridization experiments

Brain masses (SEM, SUB, OL) and a segment of an arm from octopuses (N = 3) were fixed in paraformaldehyde 4% (PFA) in phosphate-buffered saline (PBS) at 4 °C (3h for brain masses; overnight, ARM). Samples were washed (four rinses in PBS), dehydrated in series of graded methanol/PBS (1:3, 1:1, 3:1 v/v), and stored at least one night in methanol (− 20 °C). Tissues were then rehydrated at 25 °C in a series of graded methanol/PBS (3:1, 1:1, 1:3 v/v) solutions and cryoprotected in 30% sucrose in PBS. After sucrose infiltration, samples were embedded in tissue freezing medium (OCT; Leica Biosystems) and sectioned using a cryostat (Leica CM3050 S). Sagittal and/or coronal sections (40 μm) were collected in PBST (phosphate-buffered saline including 0.1% Tween™ 20 and 0.2mM sodium azide). Washed free-floating sections were mounted on Superfrost Plus slides (Menzel Gläser) and let dry overnight under fume hood. Hybridizations were performed as described by Abler et al. 82 with modifications. After rehydration in PSBT, the sections were quenched at 25 °C in 6% H₂O₂ (30 min) treated with proteinase K (10 min) and post-fixed with PFA-G (4% paraformaldehyde and 0.2% glutaraldehyde) for 20 min. Prehybridization step was performed in hybridization solution (HB 50% formamide, 5× SSC, with 10 μg/mL heparin, 10 μg/mL yeast tRNA, and 1% Blocking reagent) for at least 1 h at 60 °C and then incubated overnight in HB with the digoxigenin-labeled riboprobes. Post-hybridization washes (50% formamide, 5× SSC, 1% SDS) were carried out for 2 h at 60 °C. The sections were washed in TNT (10mM Tris-HCL pH 7.5, 0.5 M NaCl, 0.1% Tween™ 20) at 25 °C and incubated for 15 min at 37 °C with RNase (0.25 μg/mL), followed by a FS (50% formamide, 5× SSC) incubation of 2 h (60 °C). DIG was detected with an alkaline phosphatase labeled antibody (Roche). After a saturation step in TBS pH 7.5 (10% sheep serum, 1% blocking reagent, 1% BSA, 0.1% Tween™ 20) for 1 h (room temperature), the sections were incubated overnight at 4 °C with antibodies (1:1000; in TBS containing: 5% sheep serum, 1% blocking reagent, 1% BSA). The following day, sections were washed for 2 h in TBS (pH 7.5; 0.1% Tween™ 20 and 2 mM levamisole) and then washed in alkaline phosphatase solution (100 mM Tris-HCL pH 9.5, 100 mM NaCl, 50 mM MgCl₂, 0.1% Tween™ 20 and 2 mM levamisole). Bound antibodies were revealed using NBT-BCIP (Roche). After DIG in situ hybridization, slides were counterstained with DAPI (5 μg/mL, Invitrogen) washed and mounted using aqueous mounting.

RTE-2_OV custom-made polyclonal antibodies

Custom-made polyclonal antibodies were obtained from Primm Biotech Custom Antibody Services (Milan, Italy) and raised against two peptides derived from two portions of the RTE-2_OV protein: GAA (1-100 aa) and HAA (569-673 aa) resulting to be unique within the translation of the assembled transcriptome. To choose the portion to select, the manufacturer also took into consideration protein similarity (selection of regions with no significant identity to the murine and rabbit proteome), low complexity and transmembrane regions (exclusion of such regions), and distribution of predicted antigenic peptides (selection of regions with a high number of predicted antigenic peptide). The selected synthetic peptides were injected into two rabbits and boosted three times within 38 days (at days 21, 28, 35) after the first injection. The final bleeding was conducted 3 days after the last injection, and the crude sera were purified on Sepharose columns by immunoaffinity.

Immunohistochemistry and antibody validation

SEM, SUB, and OL dissected from O. vulgaris (N = 3) were immediately immersed 4% paraformaldehyde (PFA) in seawater (4 °C for 3 h). After fixation, samples were washed several times in PBS, cryoprotected in sucrose 30%, and embedded in OCT compound (OCT; Leica Biosystems). The embedded brain parts were then sectioned at 20 μm using a cryostat (Leica CM3050 S). No antigen retrieval was required. Tissue sections were rehydrated in three successive baths of 0.1 M PBS and incubated for 90 min (at RT) in 5% goat serum (Vector Laboratories Ltd.) diluted in 0.1 M PBS containing 0.05% Tween (PBTw). The slices were subsequently incubated at 4 °C with custom-made polyclonal antibodies raised against LINE (G and H; see RTE-2_OV custom-made polyclonal antibodies for details). The next day, slices were again washed by several changes of PBTw and incubation (at RT for 90 min) with secondary antibodies was carried out using Alexa Fluor 488 or 546 goat anti-rabbit IgG both diluted 1:200 in PBTw. Subsequently, sections were rinsed, and the cell nuclei were counterstained with DAPI (Molecular Probes, Eugene, OR). Finally, after further extensive washes, the sections were mounted with fluorescent mounting medium (Fluoromount, Sigma). For all antisera tested, omission of the primary antiserum and/or secondary antiserum resulted in negative staining. In addition, specificity was assured by pre-incubating (4 °C, overnight) the antibodies with 1 mg/mL of synthetic epitope (HAA and GAA, see RTE-2_OV custom-made polyclonal antibodies for details) before staining. Again, no immunostaining was observed. The two custom-made polyclonal antibodies raised against two different peptides of the RTE-2_OV protein stained the same spatial arrangement in the octopus brain tissue.

Imaging

Sections were observed under microscopes depending on the techniqueImage acquisition and processing were performed using the Leica Application Suite software (Leica Microsystems). For IHS, we utilized a Leica DMI6000 B inverted microscope, and for IHC, a Leica TCS SP8 X confocal microscope (Leica Microsystems, Germany). Tile Z-stacks were performed using a 0.2-μm step size. IHC figures have been assembled following guidelines for color blindness provided by Wong [104].

Availability of data and materials

The RNAseq dataset of Octopus vulgaris supporting the conclusions of this article is available in the ArrayExpress database (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-3957. The WGS DNAseq dataset of Octopus bimaculoides supporting the conclusions of this article is available in the European Nucleotide Archive (https://www.ebi.ac.uk/ena) under accession number PRJEB51312.

Abbreviations

TEs:: Transposable elements
LINE:: Long interspersed element
SINE:: Short interspersed element
LTR:: Long terminal repeat
LncRNAs:: Long non-coding RNAs
RTE:: Retrotransposon element
SEM:: Supra-esophageal mass
SUB:: Sub-esophageal mass
OL:: Optic lobe
ARM:: Arm
GO:: Gene Ontology
CPM:: Counts per million
RT-PCR:: Reverse transcriptase polymerase chain reaction
qPCR:: Quantitative polymerase chain reaction
MELT:: Mobile Element Locator Tool
IS:: Integration site
ISH:: In situ hybridization
IHC:: Immunohistochemistry
RBP:: Ribonucleoprotein

References

Kazazian HH. Mobile elements: drivers of genome evolution. Science. 2004;303:1626–32.
Article CAS PubMed Google Scholar
Kelley DR, Rinn JL. Transposable elements reveal a stem cell specific class of long noncoding RNAs. Genome Biol. 2012;13:R107.
Article PubMed PubMed Central CAS Google Scholar
Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 2013;9:e1003470.
Article CAS PubMed PubMed Central Google Scholar
Muotri AR, Chu VT, Marchetto MCN, Deng W, Moran JV, Gage FH. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005;435:903–10.
Article CAS PubMed Google Scholar
Baillie JK, Barnett MW, Upton KR, Gerhardt DJ, Richmond TA, De Sapio F, et al. Somatic retrotransposition alters the genetic landscape of the human brain. Nature. 2011;479:534–7.
Article CAS PubMed PubMed Central Google Scholar
Singer T, McConnell MJ, Marchetto MCN, Coufal NG, Gage FH. LINE-1 retrotransposons: mediators of somatic variation in neuronal genomes? Trends Neurosci. 2010;33:345–54.
Article CAS PubMed PubMed Central Google Scholar
Perrat PN, DasGupta S, Wang J, Theurkauf W, Weng Z, Rosbash M, et al. Transposition-driven genomic heterogeneity in the Drosophila brain. Science. 2013;340:91–5.
Article CAS PubMed Google Scholar
Rajasethupathy P, Antonov I, Sheridan R, Frey S, Sander C, Tuschl T, et al. A role for neuronal piRNAs in the epigenetic control of memory-related synaptic plasticity. Cell. 2012;149:693–707.
Article CAS PubMed PubMed Central Google Scholar
Lee EJ, Banerjee S, Zhou H, Jammalamadaka A, Arcila M, Manjunath BS, et al. Identification of piRNAs in the central nervous system. RNA. 2011;17:1090–9.
Article CAS PubMed PubMed Central Google Scholar
Jachowicz JW, Bing X, Pontabry J, Bošković A, Rando OJ, Torres-Padilla M-E. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat Genet. 2017;49:1502–10.
Article CAS PubMed Google Scholar
Percharde M, Lin C-J, Yin Y, Guan J, Peixoto GA, Bulut-Karslioglu A, et al. A LINE1-nucleolin partnership regulates early development and ESC identity. Cell. 2018;174:391–405.e19.
Article CAS PubMed PubMed Central Google Scholar
Thomas CA, Tejwani L, Trujillo CA, Negraes PD, Herai RH, Mesci P, et al. Modeling of TREX1-dependent autoimmune disease using human stem cells highlights L1 accumulation as a source of neuroinflammation. Cell Stem Cell. 2017;21:319–331.e8.
Article CAS PubMed PubMed Central Google Scholar
Shomrat T, Zarrella I, Fiorito G, Hochner B. The octopus vertical lobe modulates short-term learning rate and uses LTP to acquire long-term memory. Curr Biol. 2008;18:337–42.
Article CAS PubMed Google Scholar
Edelman DB, Seth AK. Animal consciousness: a synthetic approach. Trends Neurosci. 2009;32:476–84.
Article CAS PubMed Google Scholar
Young JZ. Computation in the learning system of cephalopods. Biol Bull. 1991;180:200–8.
Article CAS PubMed Google Scholar
Turchetti-Maia A, Shomrat T, Hochner B. The vertical lobe of cephalopods. A brain structure ideal for exploring the mechanisms of complex forms of learning and memory. In: Byrne JJ, editor. The Oxford Handbook of Invertebrate Neurobiology. Oxford: Oxford University Press; 2017. p. 1–27.
Google Scholar
Shigeno S, Andrews PLR, Ponte G, Fiorito G. Cephalopod brains: an overview of current knowledge to facilitate comparison with vertebrates. Front Physiol. 2018;9:952.
Article PubMed PubMed Central Google Scholar
Styfhals R, Seuntjens E, Simakov O, Sanges R, Fiorito G. In silico Identification and Expression of Protocadherin Gene Family in Octopus vulgaris. Front Physiol. 2018;9:1905.
Shigeno S, Ragsdale CW. The gyri of the octopus vertical lobe have distinct neurochemical identities. J Comp Neurol. 2015;523:1297–317.
Article CAS PubMed Google Scholar
Gray EG, Young JZ. Electron microscopy of synaptic structure of octopus brain. J Cell Biol. 1964;21:87–103.
Article CAS PubMed PubMed Central Google Scholar
Wang ZY, Ragsdale CW. Cadherin genes and evolutionary novelties in the octopus. Sem Cell Dev Biol. 2017;69:151–7.
Article CAS Google Scholar
Young JZ. The number and sizes of nerve cells in octopus. Proc Zool Soc London. 1963;140:229–54.
Article Google Scholar
Borrelli L, Fiorito G. 1.31 - Behavioral analysis of learning and memory in cephalopods. In: Byrne JH, editor. Learning and memory: a comprehensive reference. Oxford: Academic Press; 2008. p. 605–27.
Chapter Google Scholar
Albertin CB, Simakov O, Mitros T, Wang ZY, Pungor JR, Edsinger-Gonzales E, et al. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature. 2015;524:220–4.
Article CAS PubMed PubMed Central Google Scholar
Guan Z, Cai T, Liu Z, Dou Y, Hu X, Zhang P, et al. Origin of the reflectin gene and hierarchical assembly of its protein. Curr Biol. 2017;27:2833–2842.e6.
Article CAS PubMed Google Scholar
Ritschard EA, Fitak RR, Oleg S, Sönke J. Genomic signatures of G-protein-coupled receptor expansions reveal functional transitions in the evolution of cephalopod signal transduction. Proc Biol Sci. 2019;286:20182929.
CAS PubMed PubMed Central Google Scholar
Lee PN, Callaerts P, De Couet HG, Martindale MQ. Cephalopod Hox genes and the origin of morphological novelties. Nature. 2003;424:1061–5.
Article CAS PubMed Google Scholar
Liscovitch-Brauer N, Alon S, Porath HT, Elstein B, Unger R, Ziv T, et al. Trade-off between transcriptome plasticity and genome evolution in cephalopods. Cell. 2017;169:191–202.e11.
Article CAS PubMed PubMed Central Google Scholar
Kim B-M, Kang S, Ahn D-H, Jung S-H, Rhee H, Yoo JS, et al. The genome of common long-arm octopus Octopus minor. GigaScience. 2018;7:giy119.
Belcaid M, Casaburi G, McAnulty SJ, Schmidbaur H, Suria AM, Moriano-Gutierrez S, et al. Symbiotic organs shaped by distinct modes of genome evolution in cephalopods. PNAS. 2019;116:3030–5.
Article CAS PubMed PubMed Central Google Scholar
da Fonseca RR, Couto A, Machado AM, Brejova B, Albertin CB, Silva F, et al. A draft genome sequence of the elusive giant squid, Architeuthis dux. Gigascience. 2020;9:giz152.
Zarrella I, Herten K, Maes GE, Tai S, Yang M, Seuntjens E, et al. The survey and reference assisted assembly of the Octopus vulgaris genome. Sci Data. 2019;6:1–8.
Article Google Scholar
Zhang Y, Mao F, Mu H, Huang M, Bao Y, Wang L, et al. The genome of Nautilus pompilius illuminates eye evolution and biomineralization. Nat Ecol Evol. 2021;5:927–38.
Article PubMed PubMed Central Google Scholar
Simakov O, Marletaz F, Cho S-J, Edsinger-Gonzales E, Havlak P, Hellsten U, et al. Insights into bilaterian evolution from three spiralian genomes. Nature. 2012. https://doi.org/10.1038/nature11696.
Zhang G, Fang X, Guo X, Li L, Luo R, Xu F, et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature. 2012. https://doi.org/10.1038/nature11413.
Qureshi IA, Mattick JS, Mehler MF. Long non-coding RNAs in nervous system function and disease. Brain Res. 2010. https://doi.org/10.1016/j.brainres.2010.03.110.
Zucchelli S, Fasolo F, Russo R, Cimatti L, Patrucco L, Takahashi H, et al. SINEUPs are modular antisense long non-coding RNAs that increase synthesis of target proteins in cells. Front Cell Neurosci. 2015;9:174.
Article PubMed PubMed Central CAS Google Scholar
Clements AP, Singer MF. The human LINE-1 reverse transcriptase:effect of deletions outside the common reverse transcriptase domain. Nucleic Acids Res. 1998;26:3528–35.
Article CAS PubMed PubMed Central Google Scholar
Ohshima K, Okada N. SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenetic Genome Res. 2005;110:475–90.
Article CAS Google Scholar
Malik HS, Eickbush TH. The RTE class of non-LTR retrotransposons is widely distributed in animals and is the origin of many SINEs. Mol Biol Evol. 1998;15:1123–34.
Article CAS PubMed Google Scholar
Tay WT, Behere GT, Batterham P, Heckel DG. Generation of microsatellite repeat families by RTE retrotransposons in lepidopteran genomes. BMC Evol Biol. 2010;10:144.
Article PubMed PubMed Central CAS Google Scholar
**ong Y, Eickbush TH. Dong, a non-long terminal repeat (non-LTR) retrotransposable element from Bombyx mori. Nucleic Acids Res. 1993;21:1318.
Article CAS PubMed PubMed Central Google Scholar
Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, et al. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017. https://doi.org/10.1101/gr.218032.116.
Ohira R, Zhang Y-H, Guo W, Dipple K, Shih SL, Doerr J, et al. Human ARX gene: genomic characterization and expression. Mole Genet Metab. 2002;77:179–88.
Article CAS Google Scholar
Friocourt G, Poirier K, Rakić S, Parnavelas JG, Chelly J. The role of ARX in cortical development. Eur J Neurosci. 2006;23:869–76.
Article PubMed Google Scholar
Mankoo BS, Collins NS, Ashby P, Grigorieva E, Pevny LH, Candia A, et al. Mox2 is a component of the genetic hierarchy controlling limb muscle development. Nature. 1999;400:69–73.
Article CAS PubMed Google Scholar
De Cecco M, Ito T, Petrashen AP, Elias AE, Skvir NJ, Criscione SW, et al. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature. 2019;566:73–8.
Article PubMed PubMed Central CAS Google Scholar
Gray EG. The fine structure of the vertical lobe of octopus brain. Philos Trans R Soc Lond, B, Biol Sci. 1970;258:379–94.
Article CAS Google Scholar
Nixon M, Young JZ. The brains and lives of cephalopods: Oxford University Press; 2003.
Google Scholar
Young JZ. The anatomy of the nervous system of Octopus vulgaris. Oxford: Clarendon Press; 1971.
Google Scholar
Abbott NJ, Williamson R, Maddock L. Cephalopod neurobiology: neuroscience studies in squid, octopus and cuttlefish: Oxford University Press; 1995.
Book Google Scholar
Young JZ. Multiple matrices in the memory system of octopus. In: Abbott JN, Williamson R, Maddock L, editors. Cephalopod Neurobiology. Oxford: Oxford University Press; 1995. p. 431–43.
Google Scholar
Upton KR, Gerhardt DJ, Jesuadian JS, Richardson SR, Sánchez-Luque FJ, Bodea GO, et al. Ubiquitous L1 mosaicism in hippocampal neurons. Cell. 2015;161:228–39.
Article CAS PubMed PubMed Central Google Scholar
Hochner B. Functional and comparative assessments of the octopus learning and memory system. Front Biosci (Schol Ed). 2010;2:764–71.
Article PubMed Google Scholar
Marini G, De Sio F, Ponte G, Fiorito G. Behavioral analysis of learning and memory in cephalopods. In: Byrne JH, editor. Learning and memory: a comprehensive reference (second edition). Amsterdam: Academic Press, Elsevier; 2017. p. 441–62.
Chapter Google Scholar
Maldonado H. The visual attack learning system in Octopus vulgaris. J Theor Biol. 1963;5:470–88.
Article CAS PubMed Google Scholar
Maldonado H. The positive and negative learning process in Octopus vulgaris Lamarck. Influence of the vertical and median superior frontal lobes. Z Vergl Physiol. 1965;51:185–203.
Article Google Scholar
Mita P, Sun X, Fenyö D, Kahler DJ, Li D, Agmon N, et al. BRCA1 and S phase DNA repair pathways restrict LINE-1 retrotransposition in human cells. Nat Struct Mole Biol. 2020;27:179–91.
Article CAS Google Scholar
Mita P, Wudzinska A, Sun X, Andrade J, Nayak S, Kahler DJ, et al. LINE-1 protein localization and functional dynamics during the cell cycle. eLife. 2018;7:e30058.
Article PubMed PubMed Central Google Scholar
Taylor MS, Altukhov I, Molloy KR, Mita P, Jiang H, Adney EM, et al. Dissection of affinity captured LINE-1 macromolecular complexes. eLife. 2018;7:e30094.
Article PubMed PubMed Central Google Scholar
Wang L, Dou K, Moon S, Tan FJ, Zhang ZZ. Hijacking oogenesis enables massive propagation of LINE and retroviral transposons. Cell. 2018;174:1082–94.
Article CAS PubMed PubMed Central Google Scholar
Liu J, Dou X, Chen C, Chen C, Liu C, Xu MM, et al. N6-methyladenosine of chromosome-associated regulatory RNA regulates chromatin state and transcription. Science. 2020;367:580–6.
Article CAS PubMed PubMed Central Google Scholar
Ormundo LF, Machado CF, Sakamoto ED, Simões V, Armelin-Correa L. LINE-1 specific nuclear organization in mice olfactory sensory neurons. Mole Cell Neurosci. 2020;105:103494.
Article CAS Google Scholar
Lu JY, Chang L, Li T, Wang T, Yin Y, Zhan G, et al. Homotypic clustering of L1 and B1/Alu repeats compartmentalizes the 3D genome. Cell Res. 2021;31:613–30.
Article CAS PubMed PubMed Central Google Scholar
Xu W, Ren L, Zheng C, Cai J. Chromatin-interacting transposon RNAs linking to the core trans-inhibition circuitry for embryonic stem cell identity; 2021. p. 2021.04.28.441894.
Google Scholar
Fiorito G, von Planta C, Scotto P. Problem solving ability of Octopus vulgaris Lamarck (Mollusca, Cephalopoda). Behav Neural Biol. 1990;53:217–30.
Article CAS PubMed Google Scholar
Amodio P, Andrews P, Salemme M, Ponte G, Fiorito G. The use of artificial crabs for testing predatory behavior and health in the octopus. ALTEX. 2014;31:494–9.
Article PubMed Google Scholar
Fiorito G, Affuso A, Basil J, Cole A, de Girolamo P, D’Angelo L, et al. Guidelines for the care and welfare of cephalopods in research - a consensus based on an initiative by CephRes, FELASA and the Boyd Group. Lab Anim. 2015;49(2 Suppl):1–90.
Article PubMed Google Scholar
Baldascino E, Di Cristina G, Tedesco P, Hobbs C, Shaw TJ, Ponte G, et al. The gastric ganglion of Octopus vulgaris: preliminary characterization of gene- and putative neurochemical-complexity, and the effect of Aggregata octopiana digestive tract infection on gene expression. Front Physiol. 2017;8:1001.
Article PubMed PubMed Central Google Scholar
Andrews PLR, Darmaillacq A-S, Dennison N, Gleadall IG, Hawkins P, Messenger JB, et al. The identification and management of pain, suffering and distress in cephalopods, including anaesthesia, analgesia and humane killing. J Exp Marine Biol Ecol. 2013;447:46–64.
Article Google Scholar
Fiorito G, Affuso A, Anderson DB, Basil J, Bonnaud L, Botta G, et al. Cephalopods in neuroscience: regulations, research and the 3Rs. Invert Neurosci. 2014;14:13–36.
Article CAS PubMed PubMed Central Google Scholar
Hochner B. How nervous systems evolve in relation to their embodiment: what we can learn from octopuses and other molluscs. Brain Behav Evol. 2013;82:19–30.
Article PubMed Google Scholar
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protocols. 2013;8:1494–512.
Article CAS PubMed Google Scholar
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25.
Article PubMed PubMed Central CAS Google Scholar
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Article PubMed PubMed Central CAS Google Scholar
Parra G, Bradnam K, Korf I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–7.
Article CAS PubMed Google Scholar
Musacchia F, Basu S, Petrosino G, Salvemini M, Sanges R. Annocript: a flexible pipeline for the annotation of transcriptomes able to identify putative long noncoding RNAs. Bioinformatics. 2015;31:2199–201.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
Article PubMed PubMed Central CAS Google Scholar
Consortium TU. UniProt: a hub for protein information. Nucl Acids Res. 2015;43:D204–12.
Article CAS Google Scholar
Marchler-Bauer A, Zheng C, Chitsaz F, Derbyshire MK, Geer LY, Geer RC, et al. CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Res. 2013;41(Database issue):D348–52.
CAS PubMed Google Scholar
Burge SW, Daub J, Eberhardt R, Tate J, Barquist L, Nawrocki EP, et al. Rfam 11.0: 10 years of RNA families. Nucleic Acids Research. 2012. https://doi.org/10.1093/nar/gks1005.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9.
Article CAS PubMed PubMed Central Google Scholar
Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 2000;28:304–5.
Article CAS PubMed PubMed Central Google Scholar
Morgat A, Coissac E, Coudert E, Axelsen KB, Keller G, Bairoch A, et al. UniPathway: a resource for the exploration and annotation of metabolic pathways. Nucleic Acids Res. 2012;40(Database issue):D761–9.
Article CAS PubMed Google Scholar
Wernersson R. Virtual Ribosome--a comprehensive DNA translation tool with support for integration of sequence feature annotation. Nucleic Acids Res. 2006;34(Web Server issue):W385–8.
Article CAS PubMed PubMed Central Google Scholar
Arrial RT, Togawa RC, Brigido M de M. Screening non-coding RNAs in transcriptomes from neglected species using PORTRAIT: case study of the pathogenic fungus Paracoccidioides brasiliensis. BMC Bioinformatics. 2009;10:239.
Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Meth. 2015;12:115–21.
Article CAS Google Scholar
Pauli A, Valen E, Lin MF, Garber M, Vastenhouw NL, Levin JZ, et al. Systematic identification of long non-coding RNAs expressed during zebrafish embryogenesis. Genome Res. 2011. https://doi.org/10.1101/gr.133009.111.
Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011;147:1537–50.
Article CAS PubMed PubMed Central Google Scholar
Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011;25:1915–27.
Article CAS PubMed PubMed Central Google Scholar
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–9.
Article CAS PubMed Google Scholar
Wu TD, Watanabe CK. GMAP: a genomic map** and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21:1859–75.
Article CAS PubMed Google Scholar
Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLOS Computational Biol. 2013;9:e1003118.
Article CAS Google Scholar
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
Article CAS PubMed Google Scholar
Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, et al. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2012;40(Database issue):D306–12.
Article CAS PubMed Google Scholar
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
Article CAS PubMed PubMed Central Google Scholar
Keane TM, Wong K, Adams DJ. RetroSeq: transposable element discovery from Illumina paired-end sequencing data. Bioinformatics. 2012. https://doi.org/10.1093/bioinformatics/bts697.
Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33:2938–40.
Article CAS PubMed PubMed Central Google Scholar
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Article CAS PubMed PubMed Central Google Scholar
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3.
Article PubMed PubMed Central CAS Google Scholar
Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–4.
Article CAS PubMed Google Scholar
Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, et al. TM4: a free, open-source system for microarray data management and analysis. BioTechniques. 2003;34:374–8.
Article CAS PubMed Google Scholar
Wong B. Points of view: color blindness. Nat Methods. 2011;8:441.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

The authors would like to thank Paul Andrews for the critical reading of the manuscript, Paolo Vatta and Marta Maurutto for the assistance in CNV experiments, Marco Miralto and the BIOINforMA service at SZN for informatics support, and Diego Vozzi for performing the WGS and Pamela Imperadore for her support.

Funding

The work has been supported by Progetto Premiale MolEcOC (Italian Ministry of Education, University and Research, MIUR), Flagship project RITMARE (MIUR and Stazione Zoologica), and BIOforIU PON Project (MIUR and European Regional Development Fund, FESR). Giuseppe Petrosino, Swaraj Basu, Massimiliano Volpe, and Giulia Di Cristina have been supported by a SZN PhD fellowship.

Author information

Giuseppe Petrosino, Giovanna Ponte, and Massimiliano Volpe contributed equally to this work.

Authors and Affiliations

Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, Villa Comunale, SZN, 80121, Naples, Italy
Giuseppe Petrosino, Giovanna Ponte, Massimiliano Volpe, Ilaria Zarrella, Concetta Langella, Giulia Di Cristina, Swaraj Basu, Francesco Musacchia, Filomena Ristoratore, Graziano Fiorito & Remo Sanges
Institute of Molecular Biology (IMB), Mainz, Germany
Giuseppe Petrosino
Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), Via Enrico Melen 83, 16152, Genova, Italy
Massimiliano Volpe, Federico Ansaloni, Stefano Gustincich & Remo Sanges
Department of Biomedical and Clinical Sciences, Linkö** University, Linkö**, Sweden
Massimiliano Volpe
Institute of Zoology, University of Cologne, Cologne, Germany
Giulia Di Cristina
Neurobiology Sector, Scuola Internazionale Superiore di Studi Avanzati (SISSA), Via Bonomea 265, 34136, Trieste, Italy
Sara Finaurini, Stefano Gustincich & Remo Sanges
Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Villa Comunale, SZN, 80121, Naples, Italy
Monia T. Russo & Maria I. Ferrante
Strand Life Sciences, Bengaluru, India
Swaraj Basu
Scientific Core Facilities & Technologies, GeneCore, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117, Heidelberg, Germany
Dinko Pavlinic & Vladimir Benes
Institute of Molecular and Clinical Ophthalmology, Basel, Switzerland
Dinko Pavlinic
Marine Biological Laboratory (MBL), Woods Hole, MA, USA
Caroline Albertin
Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, 9040495, Japan
Oleg Simakov
Department of Molecular Evolution and Development, Wien University, Althanstraße 14 (UZA I), 1090, Wien, Austria
Oleg Simakov

Authors

Giuseppe Petrosino
View author publications
You can also search for this author in PubMed Google Scholar
Giovanna Ponte
View author publications
You can also search for this author in PubMed Google Scholar
Massimiliano Volpe
View author publications
You can also search for this author in PubMed Google Scholar
Ilaria Zarrella
View author publications
You can also search for this author in PubMed Google Scholar
Federico Ansaloni
View author publications
You can also search for this author in PubMed Google Scholar
Concetta Langella
View author publications
You can also search for this author in PubMed Google Scholar
Giulia Di Cristina
View author publications
You can also search for this author in PubMed Google Scholar
Sara Finaurini
View author publications
You can also search for this author in PubMed Google Scholar
Monia T. Russo
View author publications
You can also search for this author in PubMed Google Scholar
Swaraj Basu
View author publications
You can also search for this author in PubMed Google Scholar
Francesco Musacchia
View author publications
You can also search for this author in PubMed Google Scholar
Filomena Ristoratore
View author publications
You can also search for this author in PubMed Google Scholar
Dinko Pavlinic
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Benes
View author publications
You can also search for this author in PubMed Google Scholar
Maria I. Ferrante
View author publications
You can also search for this author in PubMed Google Scholar
Caroline Albertin
View author publications
You can also search for this author in PubMed Google Scholar
Oleg Simakov
View author publications
You can also search for this author in PubMed Google Scholar
Stefano Gustincich
View author publications
You can also search for this author in PubMed Google Scholar
Graziano Fiorito
View author publications
You can also search for this author in PubMed Google Scholar
Remo Sanges
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

RS and GF conceived, designed, and led the project. GP1, MV, and RS performed the bioinformatics analysis and interpreted the results with significant contributions from FA, SB, and FM. GP2 carried out the in situ hybridization and IHC experiments and choice sectioning and sampling of different animals. IZ cared of biological sampling and performed RNA extractions. SF and SG carried out the quantitative PCR experiments. DP and VB performed the RNAseq. OS and CA carried out the extraction and sequencing of O. bimaculoides samples. IZ, CL, GDC, MTR, MIF, and FR carried out the remaining molecular biology experiments and validations. RS and GF wrote the manuscript. All the authors discussed the results and contributed to the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Graziano Fiorito or Remo Sanges.

Ethics declarations

Ethics approval and consent to participate

Samples were taken from local fishermen, by applying humane killing following the principles detailed in Annex IV of Directive 2010/63/EU as described in the Guidelines on the Care and Welfare of Cephalopods (Fiorito et al. 2015) and following the protocol for the collection of tissues described by Baldascino and coworkers (Baldascino et al. 2017).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

Octopus vulgaris samples utilized for sequencing. Table S2. Counts of the assembled Octopus vulgaris and O. bimaculoides transcriptomes. Table S3. Counts of the repeats composition for the assembled Octopus vulgaris transcriptome. Table S4. LINEs used for phylogenetic analysis. Fig. S1. Completeness of Octopus transcriptomes. Fig. S2. Stringent classification of lncRNAs. Fig. S3. Most represented GO Molecular Function and Cellular Component classes. Fig. S4. Percentages of transcripts associated to transposable elements. Fig. S5. Southern blot analysis of the Octopus vulgaris LINE element. Fig. S6. In situ hybridization for LINE in the brain (SEM, SUB and OL) and arm of adult Octopus vulgaris, and relative controls.

Additional file 2:

Fasta file containing the Octopus vulgaris assembled and filtered transcriptome made up by 64,477 sequences.

Additional file 3:

Tab delimited text file containing raw reads counts for each of the assembled Octopus vulgaris transcript.

Additional file 4:

Tab delimited text file containing the annotations of the 64,477 filtered Octopus vulgaris transcripts.

Additional file 5:

GFF3 file containing the map** of the Octopus vulgaris transcripts on the genome.

Additional file 6:

Fasta file containing the Octopus bimaculoides assembled and filtered transcriptome made up by 92,820 sequences.

Additional file 7:

GFF3 file containing the map** of the Octopus bimaculoides transcripts on the genome.

Additional file 8:

Tab delimited text file containing the annotations of the 92,820 Octopus bimaculoides transcripts.

Additional file 9:

RepeatMasker analysis output file on the repeats content of the Octopus vulgaris assembled transcritome.

Additional file 10:

VCF file containing the output of MELT SPLIT run on the WGS of the Octopus bimaculoides samples.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Petrosino, G., Ponte, G., Volpe, M. et al. Identification of LINE retrotransposons and long non-coding RNAs expressed in the octopus brain. BMC Biol 20, 116 (2022). https://doi.org/10.1186/s12915-022-01303-5

Download citation

Received: 06 February 2021
Accepted: 21 April 2022
Published: 18 May 2022
DOI: https://doi.org/10.1186/s12915-022-01303-5

Identification of LINE retrotransposons and long non-coding RNAs expressed in the octopus brain

Abstract

Background

Results

Conclusions

Similar content being viewed by others

Background

Results

Thousands of putative lncRNAs are expressed in the Octopus vulgaris nervous system

A full-length LINE element is transcribed in the Octopus vulgaris brain

The RTE-2_OV element is mainly expressed in amacrine neurons

Identification of a potentially active LINE in Octopus bimaculoides

Non-coding annotation of the assembled transcriptome

Assembly, map**, and annotation of the Octopus bimaculoides public RNAseq data

Conservation analysis and SINEUP search

Identification and classification of repetitive elements

Identification of full-length transposable elements

Identification of a potentially active retrotransposon in O. bimaculoides

Evaluation of the activity of the RTE element discovered in O. bimaculoides

Phylogenetic tree generation

Classification of transcripts expressed in each sample, expression peaks, and selection of candidates for validations

Polymerase chain reaction (PCR)

Cloning and primer walking of the LINE element in Octopus vulgaris

LINE copy number variation using quantitative real-time PCR in Octopus vulgaris

Southern blotting

Probe synthesis for in situ hybridization

In situ hybridization experiments

RTE-2_OV custom-made polyclonal antibodies

Immunohistochemistry and antibody validation

Imaging

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation