Animals, sampling, and ethical statement
Octopus vulgaris were collected by local fishermen from the Bay of Naples (Southern Tyrrhenian Sea, Italy) in early summer 2012. Animals were transported to the Stazione Zoologica Anton Dohrn in Napoli and maintained according to a standardized acclimatization protocol [66,67,68]. Samples were obtained by applying humane killing following the principles detailed in Annex IV of Directive 2010/63/EU, as described in the Guidelines on the Care and Welfare of Cephalopods [68], and following the protocol for tissue collection described by Baldascino and coworkers [69]. Death was confirmed by transection of the dorsal aorta. All dissections were carried out on a seawater ice bed. During dissection, the optic lobes (OL) and the supra- (SEM) and sub-esophageal (SUB) masses were removed, and a piece of an arm (ARM), usually the second left arm, was also taken. The complete dissection lasted less than 10 min. Sampling from live animals occurred before Directive 2010/63/EU entered into force in the Member States; therefore, no legislation regulating research involving cephalopods was in place in Italy. However, the care and welfare of the animals were consistent with best practice [68, 70, 71] and complied with the requirements of Directive 2010/63/EU, which includes cephalopods in the list of species regulated for scientific research involving living animals. In addition, animals killed solely for tissue removal do not require authorization from the National Competent Authority under Directive 2010/63/EU and its transposition into national legislation.
RNA extraction and sequencing
For each octopus (N = 3), total RNA was isolated from central nervous tissues (SEM, SUB, and OL) and from ARM, a part of the body that contains the largest fraction of the neuronal population of the peripheral nervous system [22, 72] and therefore consists of muscle and peripheral nervous tissue. The SV total RNA isolation kit (Promega, #Z3100) was used according to the manufacturer's protocol. Residual DNA was removed by treating samples with the Turbo DNase Kit (Ambion) according to the manual. RNA quality and quantity were assessed by NanoDrop (Thermo Fisher) and Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Paired-end libraries were prepared using the Illumina TruSeq RNA sample library preparation kit (Illumina, San Diego, CA, USA). Each sample was barcoded, and the samples were pooled and sequenced in two lanes on the Illumina HiSeq 2000 platform (paired-end, non-strand-specific, 2 × 50 bp read length protocol).
Raw reads quality filtering
Raw read quality was assessed using FastQC (release 0.10.1). Raw reads were filtered and trimmed for quality and adapter content using Trimmomatic [73] (release 0.22; parameters: -threads 24, -phred 64, ILLUMINACLIP:illumina_adapters.fa:2:40:15, LEADING:3, TRAILING:3, SLIDINGWINDOW:3:20, MINLEN:25). Only read pairs in which both reads passed the filters were used for transcriptome assembly. Trimmed and filtered reads were then normalized by k-mer coverage to reduce read redundancy, using the normalize_by_kmer_coverage.pl script from Trinity [74] (release r2013_08_14; parameters: --seqType fq, --JM 240G, --max_cov 30, --JELLY_CPU 24).
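The LEADING, TRAILING, SLIDINGWINDOW, and MINLEN steps listed above can be illustrated with a simplified Python sketch that works on per-base Phred scores. It assumes Phred+64 encoding (matching `-phred 64`), ignores the ILLUMINACLIP adapter step, and cuts the read at the start of the first window whose mean quality falls below the threshold; Trimmomatic itself may place the cut point differently within that window, so exact trim positions can differ. The function names are hypothetical.

```python
def phred64_to_scores(qual_string):
    """Decode a Phred+64 quality string into integer quality scores."""
    return [ord(ch) - 64 for ch in qual_string]


def trim_read(quals, leading=3, trailing=3, window=3, threshold=20, minlen=25):
    """Return the (start, end) slice of the read to keep, or None if it is dropped.

    Defaults mirror LEADING:3 TRAILING:3 SLIDINGWINDOW:3:20 MINLEN:25.
    """
    start, end = 0, len(quals)
    # LEADING: remove low-quality bases from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove low-quality bases from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan 5'->3', cut where the window mean first drops.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            end = i
            break
    # MINLEN: drop reads that became too short.
    if end - start < minlen:
        return None
    return start, end


if __name__ == "__main__":
    read_quals = [30] * 30 + [10] * 10
    print(trim_read(read_quals))  # (0, 29)
```

Because the steps run in the order given on the command line, a read is first trimmed at both ends and only then scanned by the sliding window; MINLEN acts last on whatever remains.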
De novo assembly and quantification of transcript abundances
The transcriptome was assembled using Trinity (release r2013_08_14) from the trimmed, filtered, and normalized reads, using the Jaccard clip option to limit the assembly of chimeras. Assembly was performed with the following parameters: --seqType fq, --JM 240G, --inchworm_cpu 24, --bflyHeapSpaceInit 24G, --bflyHeapSpaceMax 240G, --bflyCalculateCPU, --CPU 24, --jaccard_clip, --min_kmer_cov 2. To measure expression levels, raw reads were mapped to the assembled transcriptome using Bowtie [75] (version 1; parameters: -p 24, --chunkmbs 10240, --maxins 500, --trim5 2, --trim3 2, --seedlen 15, --tryhard -a -S). SAM outputs from Bowtie were converted into BAM, sorted, indexed, and counted using the view, sort, index, and idxstats programs from Samtools [76]. All transcripts not showing at least 0.5 mapped reads per million mapped reads (CPM) in at least 2 samples were discarded from the transcriptome, because such low expression levels likely derive from noise or assembly artifacts.
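A minimal Python sketch of the CPM filter follows. It assumes that per-transcript counts are taken from the third column (mapped reads) of `samtools idxstats` output and that each sample's library size is the sum of its mapped reads; the exact denominator used in the original analysis is not stated. Function names are hypothetical.

```python
def parse_idxstats(text):
    """Map transcript ID -> mapped read count from `samtools idxstats` output.

    Columns: reference name, sequence length, mapped reads, unmapped reads.
    The final '*' line holds unplaced reads and is skipped.
    """
    counts = {}
    for line in text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        if name != "*":
            counts[name] = int(mapped)
    return counts


def filter_by_cpm(counts_by_transcript, library_sizes, min_cpm=0.5, min_samples=2):
    """Keep transcripts with CPM >= min_cpm in at least min_samples samples.

    counts_by_transcript: {transcript_id: [count in sample 1, sample 2, ...]}
    library_sizes: total mapped reads per sample, in the same order.
    """
    kept = {}
    for tid, counts in counts_by_transcript.items():
        n_pass = sum(
            1 for c, lib in zip(counts, library_sizes) if c * 1e6 / lib >= min_cpm
        )
        if n_pass >= min_samples:
            kept[tid] = counts
    return kept
```

With `min_cpm=1` and `min_samples=1` the same function expresses the looser filter applied to the O. bimaculoides transcriptome.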
Annotation and mapping of the assembled transcriptome
CEGMA (Core Eukaryotic Genes Mapping Approach; release 2.5) [77] was used to assess the completeness of the assembled transcriptomes against the set of 248 Core Eukaryotic Genes (CEGs). Transcript annotation was performed using the Annocript pipeline [78] (release 0.2) with BLAST+ (release 2.2.27) [79] and the combination of tools, parameters, and databases described below. To annotate proteins, we used BLASTX against the UniRef90 and Swiss-Prot databases from UniProt (release 2013_09) [80] with the following parameters: -word_size 4, -evalue 10e-5, -num_descriptions 5, -num_alignments 5, -threshold 18. To annotate protein domains, we used RPSBLAST against the Conserved Domains Database (CDD v3.10) [81] with the following parameters: -evalue 10e-5, -num_descriptions 20, -num_alignments 20. Ribosomal and small non-coding RNAs were identified using BLASTN against a custom database built from Rfam (release 11.0) [82] and ribosomal RNA sequences from GenBank (parameters: -evalue 10e-5, -num_descriptions 1, -num_alignments 1). Each transcript was associated with Gene Ontology (GO) [83] terms, Enzyme Commission (EC) [84] numbers, and UniPathway [85] pathways by cross-mapping its best match in UniRef90 or Swiss-Prot through the UniProt annotation mapping tables. For each transcript, we used the Virtual Ribosome (Dna2pep release 1.1) [86] to predict the length of the longest ORF across all reading frames, without requiring translation to begin at a methionine start codon (parameters: -o none, -r all). The non-coding potential of each transcript was calculated using Portrait (release 1.1) [87]. The Octopus vulgaris reference genome survey was downloaded in March 2021 from https://springernature.figshare.com/ndownloader/files/13876385.
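The `-o none -r all` setting can be pictured as translating all six reading frames and reporting the longest stop-free run of codons, with no ATG required. The Python sketch below illustrates that idea only; it is not Virtual Ribosome's code. It treats both sequence ends as open and counts every codon other than TAA, TAG, or TGA (including codons containing N) as coding, which may differ from the tool's handling of edge cases.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_aa(seq):
    """Length (in codons/amino acids) of the longest stop-free stretch in six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOPS:
                    best = max(best, run)  # a stop codon closes the current stretch
                    run = 0
                else:
                    run += 1
            best = max(best, run)  # open stretch running to the end of the sequence
    return best
```

For example, `longest_orf_aa("ATGAAATAG")` returns 3: the forward frames give at most two codons before the TAG, while the reverse complement (CTATTTCAT) has no stop codon in frame 0.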
Assembled and filtered unique transcripts were mapped to the genome using gmap [88] (parameters: --suboptimal-score 0 -f gff3_gene --gff3-add-separators 0 -t 32 --min-trimmed-coverage 0.9 --min-identity 0.9), retaining only transcripts aligned over at least 90% of their length with at least 90% identity. In total, 34,239 transcripts (~ 53%) were mapped.
Non-coding annotation of the assembled transcriptome
Putative lncRNAs were classified using a heuristic procedure based on the annotation results. The constraints used to identify potential lncRNAs are deliberately stringent (Additional file 1: Fig. S2). Published studies have used different combinations of the following analyses to identify lncRNAs [87, 89,90,91]: (1) lack of similarity with proteins, (2) lack of similarity with domain profiles, (3) lack of similarity with other RNAs (ribosomal, snoRNA, etc.), (4) transcript and longest ORF lengths, and (5) non-coding potential. We combined all of these and classified as lncRNAs only the transcripts satisfying all of the following conditions: (a) length ≥ 200 nucleotides; (b) lack of similarity with any of the following: proteins from Swiss-Prot and UniRef90, domains from CDD, rRNAs from GenBank, and other small ncRNAs from Rfam; (c) longest ORF < 100 amino acids; and (d) non-coding potential score ≥ 0.95. Using these stringent constraints, we predicted 7806 (~ 12%) transcripts in the O. vulgaris transcriptome as putative lncRNAs.
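Conditions (a) to (d) combine into a single conjunction. The Python sketch below expresses that rule; the `TranscriptAnnotation` record and its field names are hypothetical stand-ins for a per-transcript summary of the BLASTX, RPSBLAST, BLASTN, Virtual Ribosome, and Portrait results.

```python
from dataclasses import dataclass


@dataclass
class TranscriptAnnotation:
    """Hypothetical per-transcript summary of the annotation results."""
    length_nt: int
    protein_hit: bool       # BLASTX hit in Swiss-Prot or UniRef90
    domain_hit: bool        # RPSBLAST hit in CDD
    rna_hit: bool           # BLASTN hit to GenBank rRNA or Rfam small ncRNAs
    longest_orf_aa: int     # longest ORF length, in amino acids
    noncoding_score: float  # Portrait non-coding potential


def is_putative_lncrna(t: TranscriptAnnotation) -> bool:
    """True only if all four conditions hold."""
    return (
        t.length_nt >= 200                                    # (a) length
        and not (t.protein_hit or t.domain_hit or t.rna_hit)  # (b) no similarity
        and t.longest_orf_aa < 100                            # (c) short ORF
        and t.noncoding_score >= 0.95                         # (d) non-coding potential
    )
```

Because every condition must hold, a single weak protein or domain hit is enough to exclude a transcript, which is what makes the classification conservative.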
Assembly, mapping, and annotation of the Octopus bimaculoides public RNAseq data
Raw O. bimaculoides RNA-seq data from Albertin et al. [24] were downloaded from NCBI SRA in October 2015 using the SRA Toolkit. Raw reads were filtered and trimmed for quality and adapter content using Trimmomatic (release 0.33; parameters: -threads 32, ILLUMINACLIP:illumina_adapters.fa:2:40:15:10:true LEADING:3 TRAILING:3 SLIDINGWINDOW:3:20 MINLEN:50). Only read pairs in which both reads passed the filters were used for the assembly. Trimmed and filtered reads were assembled with Trinity (release 2.1.0; parameters: --seqType fq --SS_lib_type RF --CPU 32 --max_memory 240G --inchworm_cpu 32 --bflyHeapSpaceInit 24G --bflyHeapSpaceMax 240G --bflyCalculateCPU --normalize_reads --min_kmer_cov 2 --jaccard_clip) using digital normalization, strand information, and the Jaccard clip, and requiring every k-mer used in the assembly to be present in at least 2 reads to reduce noise. Redundancy among assembled transcripts was reduced using CD-HIT [92] (version 4.6; parameters: -c 0.90 -n 8 -r 0 -M 0 -T 0). To measure expression levels, raw reads were mapped to the transcriptome using Bowtie (version 1; parameters: -t -q -p 32 --chunkmbs 10240 --maxins 500 --trim5 2 --trim3 2 --seedlen 28 --tryhard -a -S). SAM outputs from Bowtie were converted into BAM, sorted, indexed, and counted using the view, sort, index, and idxstats programs, respectively, from the Samtools software collection. All transcripts not showing at least 1 mapped read per million mapped reads (CPM) in at least 1 sample were discarded from the transcriptome. The Octopus bimaculoides genome was downloaded in August 2015 from http://genome.jgi.doe.gov/pages/dynamicOrganismDownload.jsf?organism=Metazome.
Assembled and filtered unique transcripts were mapped to the genome using gmap [93] (version 2015-09-28; parameters: --suboptimal-score 0 -f gff3_gene --gff3-add-separators 0 -t 32 --min-trimmed-coverage 0.9 --min-identity 0.9), retaining only transcripts aligned over at least 90% of their length with at least 90% identity. In total, 84,043 transcripts (~ 90%) were mapped. Annotation and all remaining analyses were performed as for O. vulgaris.
Conservation analysis and SINEUP search
Putative orthologs between O. vulgaris and O. bimaculoides were identified as reciprocal best hits (RBH) using BLAST+ (program blastn; parameters: -best_hit_overhang 0.1, -evalue 1e-0.5). Promoters were defined as the 1000 nucleotides upstream of the annotated transcription start site and were extracted for each ortholog pair in which both transcripts could be mapped to the respective genome with enough sequence upstream of the transcription start site. Promoter pairs were aligned using the pairwiseAlignment function from the Biostrings Bioconductor package [88] in R with default parameters. To assess positional conservation between the two species, we selected all O. vulgaris scaffolds containing at least 10 mapped transcripts and counted how many ortholog pairs were located on the same scaffolds in both species. To search for potential SINEUPs in each species, we used the GenomicRanges Bioconductor package [94]. First, we collected the closest transcript pairs (mRNA/mRNA, lncRNA/lncRNA, and mRNA/lncRNA). We then parsed the RepeatMasker output to calculate repeat coverage for each transcript and selected the mRNA/lncRNA pairs with a head-to-head overlap and at least one SINE element in the non-overlapping part of the lncRNA.
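The SINEUP selection was implemented with GenomicRanges in R; the Python sketch below only restates the geometric test described above, under stated assumptions: half-open genomic intervals, SINE fragments already projected to genome coordinates on the same scaffold, and SINE strand ignored. "Head-to-head" is taken here to mean that the two transcripts are on opposite strands and that the overlapping region contains both 5′ ends. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def five_prime(start: int, end: int, strand: str) -> int:
    """Genomic coordinate of a transcript's 5' end."""
    return start if strand == "+" else end - 1


def is_sineup_candidate(mrna, lncrna, sines: List[Interval]) -> bool:
    """mrna, lncrna: (start, end, strand) on one scaffold; sines: SINE intervals."""
    ms, me, mstrand = mrna
    ls, le, lstrand = lncrna
    if mstrand == lstrand:
        return False  # the lncRNA must be antisense to the mRNA
    ov_s, ov_e = max(ms, ls), min(me, le)
    if ov_s >= ov_e:
        return False  # no overlap at all
    # Head-to-head: both 5' ends fall inside the overlapping region.
    if not (ov_s <= five_prime(ms, me, mstrand) < ov_e
            and ov_s <= five_prime(ls, le, lstrand) < ov_e):
        return False
    # Parts of the lncRNA outside the overlap (empty pieces are skipped).
    free_parts = [(s, e) for s, e in ((ls, ov_s), (ov_e, le)) if s < e]
    # Candidate if any SINE intersects a non-overlapping part.
    return any(ss < e and se > s for s, e in free_parts for ss, se in sines)
```

For an mRNA on the plus strand at [100, 500) and an antisense lncRNA at [20, 150), the overlap is [100, 150), both 5′ ends fall inside it, and only a SINE lying in [20, 100) qualifies the pair.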
Identification and classification of repetitive elements
Repetitive elements in each transcriptome were annotated using RepeatMasker (A.F.A. Smit, R. Hubley & P. Green RepeatMasker at http://repeatmasker.org; release 4.0.5), searching against the Repbase database [95] (release 20140131) with parameters: -species bilateria, -s, -gff. From the RepeatMasker output, we counted the repeat fragments present at least once in each transcript and built a matrix reporting, for each transcriptome, the percentage of transcripts containing fragments of (a) retroelements, (b) DNA transposons, (c) satellites, (d) simple repeats, (e) low complexity, (f) other, and (g) unknown classes, according to the RepeatMasker classification.
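One row of that matrix can be computed as in the Python sketch below, where a transcript carrying several fragments of the same class is counted once. It assumes that raw RepeatMasker class/family labels (for example "LINE/RTE") have already been mapped to the seven groups listed above; the function name and group labels are hypothetical.

```python
CLASSES = ["Retroelements", "DNA transposons", "Satellites", "Simple repeats",
           "Low complexity", "Others", "Unknown"]


def repeat_class_percentages(hits, n_transcripts, classes=CLASSES):
    """Percentage of transcripts carrying at least one fragment of each repeat class.

    hits: iterable of (transcript_id, repeat_class) pairs, one per RepeatMasker fragment.
    n_transcripts: total number of transcripts in the transcriptome.
    """
    carriers = {cls: set() for cls in classes}
    for tid, cls in hits:
        if cls in carriers:
            carriers[cls].add(tid)  # a set, so repeated fragments count once
    return {cls: 100.0 * len(ids) / n_transcripts for cls, ids in carriers.items()}
```

Calling the function once per transcriptome and stacking the resulting dictionaries gives the transcriptome-by-class matrix.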
Identification of full-length transposable elements
We parsed the RepeatMasker output, calculating the percentage of overlap between the assembled transcripts and the Repbase repeat consensus sequences, and selected all elements covering at least 80% of the repeat consensus. Among these, the elements with the highest coverage were retained. For these, we used Virtual Ribosome to predict the longest complete ORFs across all reading frames, requiring a methionine start codon and a canonical stop codon (parameters: -o strict, -r all). Only a single transcript contained a complete ORF, and we used InterPro [96] to identify and classify its protein domains. The potential catalytic amino acids essential for retrotransposition were identified manually by comparing the putative translation with the sequences reported by Clements and Singer [38]. The same analysis was performed on both O. bimaculoides transcriptomes (the one assembled by Albertin et al. and the one assembled by us). The analysis was also performed on the RepeatMasker annotation of the genome by Albertin et al., downloaded from http://octopus.unit.oist.jp/OCTDATA/. For consistency, we also analyzed RepeatMasker annotations of the genome and the transcriptomes produced with the same tool, library, and parameters used for O. vulgaris and the other species considered in this study. None of these analyses identified a full-length transposable element retaining a complete ORF in O. bimaculoides. We then translated the consensus sequences of the main RepeatScout and RepeatModeler repeat libraries assembled by Albertin et al. (main RepeatScout library: http://octopus.unit.oist.jp/OCTDATA/TE_FILES/mainrepeatlib.fa.gz; RepeatModeler library: http://octopus.unit.oist.jp/OCTDATA/TE_FILES/oct.rm.tar.gz) with Virtual Ribosome to predict the longest ORF across all reading frames, requiring a methionine start codon and a canonical stop codon (parameters: -o strict, -r all). InterPro was then used to identify and classify the domains characteristic of LINEs.
The potential catalytic amino acids essential for retrotransposon activity were identified manually by comparing the ORF sequences with those reported by Clements and Singer [38]. This led to the identification of two potentially functional LINE retrotransposons.
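The 80% coverage criterion on the repeat consensus can be computed by merging a transcript's alignment intervals on the consensus and dividing the covered length by the consensus length, as in the Python sketch below. It assumes the consensus begin/end positions have already been extracted from the RepeatMasker .out file (for complement-strand hits those columns are ordered differently, so they need care when parsing). The function names are hypothetical.

```python
def consensus_coverage(hits, consensus_length):
    """Fraction of a repeat consensus covered by the union of alignment intervals.

    hits: (begin, end) positions on the consensus, 1-based and inclusive.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(hits):
        if cur_end is None or start > cur_end + 1:
            # Gap before this hit: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent hit: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / consensus_length


def is_near_full_length(hits, consensus_length, min_fraction=0.8):
    """Apply the >= 80% consensus-coverage threshold."""
    return consensus_coverage(hits, consensus_length) >= min_fraction
```

Merging before summing matters: two fragments at 1-400 and 300-800 on a 1000-bp consensus cover 80%, not 90%.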
Identification of a potentially active retrotransposon in O. bimaculoides
The two candidate retrotransposons found in the O. bimaculoides RepeatModeler library were analyzed with MELT [43] (v2.0.2) to search for integration sites in gonads and optic lobe. Two genomic DNA-seq WGS libraries from gonads (SRR2010220 and SRR2005790) were downloaded from the European Nucleotide Archive (ENA) at https://www.ebi.ac.uk/ena/data/view/PRJNA270931. We generated two additional DNA-seq WGS libraries (L001 and L002) from DNA extracted from the optic lobe of a different individual. Scaffolds shorter than 10,000 bp were filtered out of the O. bimaculoides reference genome, and reads were mapped to it using BWA [97] (v0.7.15; parameters: mem, -t 32). SAM output from BWA was converted into BAM, sorted, indexed, and counted using the view, sort, index, and idxstats programs, respectively, from SAMtools. The resulting sorted BAM files were used as input for MELT (parameters: -d 10000). Because read length differed between the two sets of libraries (150 bp for gonads and 260 bp for optic lobe), the optic lobe dataset was trimmed with Trimmomatic (v0.32; parameters: CROP:200, HEADCROP:50) to obtain homogeneous 150-bp reads across all datasets. We retained only the integration sites (ISs) identified by MELT that passed all MELT checks and were supported by at least 3 discordant read pairs on both the left and right sides of the breakpoint. The candidate retrotransposon consensus sequences were searched against the genome with BLAST (v2.6.0; parameters: -evalue 99999), and ISs were additionally filtered out when the BLAST search showed similarity within a 260-bp range around the IS breakpoint. The same analysis was repeated using non-trimmed reads and two additional IS identification programs, RetroSeq [98] and an in-house pipeline, and the significance of the results was maintained (data not shown).
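The integration-site filters can be summarized as a single predicate. In the Python sketch below, the `InsertionSite` record and its fields are hypothetical stand-ins for the values reported by MELT, and the "260-bp range around the breakpoint" is interpreted, as an assumption, as a window centered on the breakpoint (±130 bp).

```python
from dataclasses import dataclass


@dataclass
class InsertionSite:
    """Hypothetical summary of one MELT call."""
    chrom: str
    pos: int            # breakpoint position
    filter_status: str  # e.g. "PASS"
    left_pairs: int     # discordant pairs supporting the left side
    right_pairs: int    # discordant pairs supporting the right side


def keep_site(site, consensus_hits, min_pairs=3, window=260):
    """consensus_hits: (chrom, start, end) genomic BLAST hits of the element consensus."""
    if site.filter_status != "PASS":
        return False
    if site.left_pairs < min_pairs or site.right_pairs < min_pairs:
        return False
    half = window // 2
    # Reject sites whose neighbourhood already resembles the element in the reference.
    for chrom, start, end in consensus_hits:
        if chrom == site.chrom and start <= site.pos + half and end >= site.pos - half:
            return False
    return True
```

The last check guards against calls that simply reflect an existing, reference-annotated copy of the element near the breakpoint.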
Evaluation of the activity of the RTE element discovered in O. bimaculoides
We performed whole-genome sequencing (WGS) at 30X coverage of DNA extracted from two tissues (SUB and GILL) of two different O. bimaculoides individuals. About 150 ng of genomic DNA was used to construct a whole-genome Illumina sequencing library with the Illumina DNA Prep kit, following the standard protocol in a manual procedure. Library QC was performed with the Agilent DNA 1000 Kit on the 2100 Bioanalyzer (Agilent Technologies, Inc.). We obtained 8 libraries with an average concentration of 62 nM and an average size of 513 bp. A 0.9-nM final library pool was loaded using a NovaSeq 6000 S2 Reagent Kit (300 cycles) and run on the NovaSeq 6000 System. We obtained an average %Q30 of 90.4, 84.8% clusters passing filter, and a total output of 1.55 Tb. Non-reference LINE insertion events in GILL and SUB of the two individuals were identified using MELT (version 2.2.2). First, reads were mapped to the reference genome using bwa (version 0.7.15) with default parameters. Then, the MEI ZIP file required for the subsequent MELT analysis was generated with the MELT BuildTransposonZIP command by (i) setting the error value to 3, (ii) providing the FASTA sequence of the RTE LINE, and (iii) providing the genomic coordinates of the annotated insertion sites of the RTE LINE (previously identified by masking the LINE sequence on the O. bimaculoides reference genome with RepeatMasker, version 4.0.5). Finally, the MELT SPLIT analysis was run through the Preprocess, IndivAnalysis, GroupAnalysis, Genotype, and MakeVCF steps as described in the MELT documentation (https://melt.igs.umaryland.edu/manual.php). We retained only the integration sites supported by at least 3 reads and passing all MELT quality checks (classified as PASS). The results were evaluated using the UpSetR library [99].
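An UpSet plot summarizes how many integration sites are shared by each exact combination of samples. The Python sketch below computes those exclusive-intersection counts; it assumes sites are matched by exact (scaffold, position), whereas real comparisons between samples often allow a small positional tolerance. Sample names are hypothetical.

```python
from collections import Counter


def exclusive_intersections(sites_by_sample):
    """Count sites by the exact set of samples in which they were called.

    sites_by_sample: {sample_name: set of (scaffold, position)}.
    Each site contributes to exactly one combination, as in an UpSet bar chart.
    """
    membership = {}
    for sample, sites in sites_by_sample.items():
        for site in sites:
            membership.setdefault(site, set()).add(sample)
    return Counter(frozenset(samples) for samples in membership.values())
```

With four samples (two tissues from each of two individuals), a site private to one tissue falls into a single-sample bar, while a site present in both tissues of one individual falls into the corresponding two-sample bar.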
Phylogenetic tree generation
The evolutionary tree in Fig. 3d was generated using 100 full-length LINEs belonging to 15 LINE clades (Additional file 1: Table S4). Protein sequences were selected from Ohshima and Okada [39] and manually checked. InterPro was used to identify endonuclease and reverse transcriptase domains in all LINEs. Multiple sequence alignments were performed using MAFFT [100] (v7.221; L-INS-i option). We used TrimAl [101] (v1.4.rev15) to automatically trim the aligned sequences (parameters: -fasta -automated1). Phylogenetic relationships among the LINE elements were reconstructed with MrBayes [102] (v3.2.1). The Bayesian analysis was run for six million generations with twenty-two chains, sampling every 1000 generations (6000 samples). Convergence was reached, with the standard deviation of split frequencies below 0.01, and a consensus tree was generated with a burn-in of 1500 samples (25% of 6000). The phylogenetic tree was visualized with FigTree (release 1.4.2; http://tree.bio.ed.ac.uk/software/figtree/).
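MrBayes reports the standard deviation of split frequencies itself. The Python sketch below only makes the convergence diagnostic and the burn-in arithmetic concrete: it discards the first 25% of samples in each run, computes split frequencies per run, and averages the per-split standard deviation across runs. It assumes at least two independent runs (MrBayes runs two by default), uses the sample standard deviation, and ignores splits below an assumed 0.10 frequency in every run; MrBayes's exact computation may differ in detail. Function names are hypothetical.

```python
from statistics import mean, stdev


def split_frequencies(samples, burnin_fraction=0.25):
    """Posterior frequency of each split in one run after discarding burn-in.

    samples: list of sampled trees, each given as a set of splits (any hashable label).
    """
    kept = samples[int(len(samples) * burnin_fraction):]
    counts = {}
    for splits in kept:
        for split in splits:
            counts[split] = counts.get(split, 0) + 1
    return {split: n / len(kept) for split, n in counts.items()}


def average_sd_split_frequencies(runs, burnin_fraction=0.25, min_freq=0.10):
    """Mean, over splits, of the standard deviation of their frequency across runs."""
    freqs = [split_frequencies(run, burnin_fraction) for run in runs]
    sds = []
    for split in set().union(*freqs):
        per_run = [f.get(split, 0.0) for f in freqs]
        if max(per_run) >= min_freq:  # ignore splits that are rare in every run
            sds.append(stdev(per_run))
    return mean(sds) if sds else 0.0
```

With 6000 samples and a 25% burn-in, `int(6000 * 0.25)` discards exactly the first 1500 samples, matching the burn-in value used for the consensus tree.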
Classification of transcripts expressed in each sample, expression peaks, and selection of candidates for validation
We classified subsets of transcripts according to their expression levels across the different parts. A transcript was considered expressed in a specific part if it showed an expression level > 0.5 CPM in all replicates of that part. This classification defined the groups of transcripts used in the analysis. We also classified transcripts showing an expression peak, defined as an expression level > 0.5 CPM in all three biological replicates of exactly one part and below 0.5 CPM in the other parts; this yielded ~ 1800 transcripts. These were used to draw the heatmap with MeV (release 4.8.1), part of the TM4 suite [103], using hierarchical clustering with Pearson correlation. Candidates for validation were selected among the ~ 1800 transcripts with an expression peak using the following additional criteria. For coding transcripts, we randomly selected four transcripts representing octopus homologs of homeobox genes, and their annotation was manually verified. For lncRNAs, we randomly selected four putative lncRNAs containing a SINE fragment. Both coding and lncRNA candidates were validated by RT-PCR and Sanger sequencing.
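The two expression rules can be written as small predicates, as in the Python sketch below. It assumes the strict reading of "below 0.5 in the others", namely that every replicate of every other part is below 0.5 CPM; a looser reading (the transcript is simply not "expressed" in the other parts) would admit more transcripts. Function names are hypothetical.

```python
def expressed_in(cpm_by_part, part, threshold=0.5):
    """True if every replicate of `part` exceeds the CPM threshold."""
    return all(v > threshold for v in cpm_by_part[part])


def peak_part(cpm_by_part, threshold=0.5):
    """Return the single part with an expression peak, or None.

    cpm_by_part: {part: [CPM per biological replicate]}, e.g. parts SEM, SUB, OL, ARM.
    """
    for part in cpm_by_part:
        others = [v for p, vals in cpm_by_part.items() if p != part for v in vals]
        if expressed_in(cpm_by_part, part, threshold) and all(v < threshold for v in others):
            return part
    return None
```

A transcript at 1 to 3 CPM in all SEM replicates and below 0.5 CPM everywhere else is assigned a SEM peak; a single replicate above 0.5 CPM in another part removes it from the peak set under this reading.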
Polymerase chain reaction (PCR)
Octopus vulgaris cDNAs were synthesized from 200 ng of total RNA using the Superscript VILO cDNA Synthesis Kit (Life Technologies) in a 20 μl reaction volume. PCRs were carried out using 20 ng of cDNA, 0.25 μl of Taq DNA Polymerase (5 U/μl; Roche), 1 μl of each specific forward and reverse primer (25 pmol/μl), 2.5 μl of PCR reaction buffer (10×), 2.5 μl of dNTP mix (10×), and water to a final volume of 25 μl. The ubiquitin gene (accession number FJ617440) was used as an internal control. Reactions for the coding transcripts were run with an initial step of 2 min at 94 °C, followed by 35 cycles of 15 s at 94 °C, 30 s at 60 °C, and 1 min at 72 °C, and a final step of 7 min at 72 °C. Reactions for the non-coding transcripts were amplified with an annealing temperature of 58 °C. The following primers were used:
- Arx forward 5′-TCCCTGCCTTCTCAACACAT-3′
- Arx reverse 5′-TCCGAACTTCCACGCTTACT-3′
- Hoxb5a forward 5′-GTGGCGAGGAATTTAGGAAG-3′
- Hoxb5a reverse 5′-GCAACAGTCATAGTCCGAACAG-3′
- Phox2b forward 5′-AATGGGGTGAGATCCTTTCC-3′
- Phox2b reverse 5′-TTCATTGCAATCTCCTCTCG-3′
- Meox2 forward 5′-TCCAGAACCGTCGGATGAAA-3′
- Meox2 reverse 5′-TACGTAAAGGGCACACACCT-3′
- Seml forward 5′-CACTTGTGCAAGGTACCACG-3′
- Seml reverse 5′-AGGTCTCCTTAAATTTATTTCTGTGCA-3′
- Subl forward 5′-ACAGAGCATCTTGAGTCTCACT-3′
- Subl reverse 5′-CACTCCTGCGCCTTTCATTT-3′
- Oll forward 5′-GGATTGACCCTGCAACTTGG-3′
- Oll reverse 5′-CAGTGATGACGGACTTGCAA-3′
- Arml forward 5′-GTACCCCACAAAATTAAATC-3′
- Arml reverse 5′-CACTCACAAGGCTTTAGTTGGC-3′
- Ubi forward 5′-TGTCAAGGCAAAGATTCAAGA-3′
- Ubi reverse 5′-GGCCATAAACACACCAGCTC-3′
Cloning and primer walking of the LINE element in Octopus vulgaris
PCR was carried out on cDNA and gDNA with Takara LA Taq and the primer pair LineF/LineR7, using the following amplification program: 1 min at 94 °C; 35 cycles of 30 s at 94 °C and 5 min at 68 °C; and 10 s at 72 °C. The specific amplicon obtained from cDNA was gel-extracted and cloned into the pGEM-T Easy Vector System following the manufacturer's instructions. The cDNA clone was Sanger sequenced with the following primers:
- SP6 (pGEM's multiple cloning region) 5′-ATTTAGGTGACACTATAGAA-3′
- LineF 5′-CCCCAGTCGTCTTGACTTTG-3′
- LineF1 5′-GAGCAGCCCTCTTCAGGAT-3′
- LineF2 5′-GCGACCATCATCAGTGCTTA-3′
- LineR2 5′-TCAGATGCCAGTGTTTGGAG-3′
- LineF3 5′-GGGTCAGAAAGTGACGAGGA-3′
- LineR3 5′-TGCATGAGGCGGAGTTTAG-3′
- LineF4 5′-CAAGAGGCTGATCCTGGAGA-3′
- LineR4 5′-CCGATCTCCTTTCCGCTTAT-3′
- LineF5 5′-AGGAGAAATGCATGGAGCAG-3′
- LineR5 5′-TGTTGATACCGGACTTGCAG-3′
- LineR6 5′-CGGTAAGCAGTCCACGTCTC-3′
- LineR7 5′-GAACTGCCGCCATGAGAC-3′
- T7 (pGEM's multiple cloning region) 5′-TAATACGACTCACTATAGGG-3′
LINE copy number variation using quantitative real-time PCR in Octopus vulgaris
Copy number variation analysis was performed on genomic DNA extracted from the SEM, SUB, OL, and ARM of octopuses (N = 9). One ARM sample was chosen as the calibrator, and 18S was chosen as the invariant control. Purified genomic DNA concentrations were assessed by NanoDrop (Thermo Fisher Scientific). Based on the starting concentration, DNA samples were diluted in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 7.5) to 100 ng/μL and then further diluted to 10 ng/μL. All dilutions were checked by NanoDrop. Primer and multiplexing efficiencies were verified by linear regression against a standard curve ranging from 50 ng to 16 pg of genomic DNA. The LINE and 18S slopes were −3.3 and −3.8, respectively, and represent acceptable amplification efficiencies. The standard curves also confirmed that the final amount of 5 ng DNA tested in qPCR was within the linear range of the reaction. Reactions were performed in a 20 μl reaction mixture containing iQ Multiplex Powermix (Bio-Rad), TaqMan primers (10 μM), and differentially labeled probes (10 μM; FAM or VIC fluorophore) specifically designed to hybridize with the target DNA sequences. The LINE element was amplified using the following primers:
18S was amplified using the following primers:
The following probe sequences were used:
qPCR was carried out with an initial step of 20 s at 90 °C, followed by 40 cycles of 10 s at 95 °C and 30 s at 59 °C, on the 7900HT Fast Real-Time PCR System (Applied Biosystems). Each sample was assayed in duplicate, and the assays were repeated four times. Data from the co-amplification of the target DNA sequence and the invariant internal control 18S were analyzed using the 2^−ΔΔCt method⁷⁹.
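The standard-curve check and the relative quantification can be made concrete with two small functions. Amplification efficiency follows from the slope of Ct against log10 input as E = 10^(−1/slope) − 1, so a slope of −3.3 corresponds to roughly 101% and a slope of −3.8 (a standard-curve slope is negative, so the 18S value is taken here as −3.8) to roughly 83%. The 2^−ΔΔCt formula normalizes the LINE Ct to 18S and then to the calibrator ARM sample; it assumes that both assays amplify with similar, near-100% efficiency. The sketch averages technical duplicates before computing ΔCt, which is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def amplification_efficiency(slope):
    """Efficiency from a standard-curve slope (Ct vs log10 input): E = 10^(-1/slope) - 1."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_copy_number(target_ct, ref_ct, calib_target_ct, calib_ref_ct):
    """2^-ddCt, with replicate Ct values averaged first.

    target = LINE, ref = 18S; calib_* are the same two assays in the calibrator sample.
    """
    d_ct_sample = mean(target_ct) - mean(ref_ct)
    d_ct_calib = mean(calib_target_ct) - mean(calib_ref_ct)
    return 2 ** -(d_ct_sample - d_ct_calib)
```

For instance, a sample whose LINE Ct is one cycle earlier than the calibrator's at the same 18S Ct gives ΔΔCt = −1 and a relative copy number of 2.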
Southern blotting
The brain masses (SEM, SUB, and OL) and a piece of an arm (ARM) of three O. vulgaris were dissected after humane killing and immediately frozen in liquid nitrogen. Pulverized samples were processed following the method used by Perelman and coworkers⁸⁰. Briefly, after phenol:chloroform (50:50) extraction, DNA was precipitated with cold isopropanol, collected by centrifugation, resuspended in TE buffer (10 mM Tris-HCl pH 8.0 and 0.1 mM EDTA pH 8.0), treated with ribonuclease A (10 μg/mL), and incubated at 37 °C for 30 min. DNA concentration was estimated using NanoDrop, and quality was checked by electrophoresis on a 0.8% agarose gel. For each tissue, 10 μg of genomic DNA was digested with EcoRI (New England Biolabs) overnight at 37 °C and resolved on a 0.9% agarose gel for 15 h at 1.5 V. DNA was transferred to a Hybond-N+ nylon membrane (0.45 μm; Amersham Pharmacia Biotech) according to Sambrook and Russell⁸¹. A DIG-labeled LINE DNA probe was prepared with the PCR DIG Probe Synthesis Kit (Roche). Hybridization and autoradiography were performed according to the DIG Application Manual (Roche).
Probe synthesis for in situ hybridization
We amplified by PCR a 356-bp cDNA fragment of the assembled LINE from bp 1512 to bp 1868 using the following primers:
The fragment and primers were chosen through manual curation to ensure that the fragment is present exclusively in the transcript of the identified LINE element and in no other assembled transcript. The amplified fragment was cloned into the TOPO® TA Cloning® vector (Life Technologies, CA, USA) according to the manufacturer's protocol. The cloned fragment was digested with the BamHI and EcoRV restriction enzymes and validated by Sanger sequencing. Sense and antisense digoxigenin-labeled RNA probes were generated by in vitro transcription using the DIG-RNA Labeling Kit (SP6/T7; Roche Applied Sciences, QC, Canada). Labeled RNA probes were quantified by dot blot analysis.
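The fragment was chosen by manual curation. As one possible way to automate a first-pass uniqueness check, the Python sketch below counts exact k-mers (on either strand) that a candidate probe fragment shares with every other assembled transcript; an empty result suggests that the fragment is unique at that k. The choice k = 20 and the function names are assumptions. Because hybridization tolerates mismatches, an exact k-mer check can miss diverged copies, and an alignment-based search would be more sensitive.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(fragment, transcripts, k=20):
    """Count k-mers (either strand) that `fragment` shares with each other transcript.

    transcripts: {name: sequence}, excluding the target transcript itself.
    Returns only the transcripts with at least one shared k-mer.
    """
    fragment = fragment.upper()
    probe = kmers(fragment, k) | kmers(revcomp(fragment), k)
    hits = {}
    for name, seq in transcripts.items():
        n = len(probe & kmers(seq.upper(), k))
        if n:
            hits[name] = n
    return hits
```

Including the reverse complement matters because both sense and antisense riboprobes are transcribed from the same cloned fragment.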
In situ hybridization experiments
Brain masses (SEM, SUB, OL) and an arm segment from octopuses (N = 3) were fixed in 4% paraformaldehyde (PFA) in phosphate-buffered saline (PBS) at 4 °C (3 h for brain masses; overnight for ARM). Samples were washed (four rinses in PBS), dehydrated through a graded methanol/PBS series (1:3, 1:1, 3:1 v/v), and stored for at least one night in methanol (− 20 °C). Tissues were then rehydrated at 25 °C through a graded methanol/PBS series (3:1, 1:1, 1:3 v/v) and cryoprotected in 30% sucrose in PBS. After sucrose infiltration, samples were embedded in tissue freezing medium (OCT; Leica Biosystems) and sectioned on a cryostat (Leica CM3050 S). Sagittal and/or coronal sections (40 μm) were collected in PBST (phosphate-buffered saline containing 0.1% Tween™ 20 and 0.2 mM sodium azide). Washed free-floating sections were mounted on Superfrost Plus slides (Menzel Gläser) and left to dry overnight under a fume hood. Hybridizations were performed as described by Abler et al.⁸², with modifications. After rehydration in PBST, the sections were quenched at 25 °C in 6% H2O2 (30 min), treated with proteinase K (10 min), and post-fixed with PFA-G (4% paraformaldehyde and 0.2% glutaraldehyde) for 20 min. Prehybridization was performed in hybridization solution (HB: 50% formamide, 5× SSC, 10 μg/mL heparin, 10 μg/mL yeast tRNA, and 1% blocking reagent) for at least 1 h at 60 °C, and the sections were then incubated overnight in HB with the digoxigenin-labeled riboprobes. Post-hybridization washes (50% formamide, 5× SSC, 1% SDS) were carried out for 2 h at 60 °C. The sections were washed in TNT (10 mM Tris-HCl pH 7.5, 0.5 M NaCl, 0.1% Tween™ 20) at 25 °C and incubated for 15 min at 37 °C with RNase (0.25 μg/mL), followed by a 2-h incubation in FS (50% formamide, 5× SSC) at 60 °C. DIG was detected with an alkaline phosphatase-labeled antibody (Roche).
After a blocking step in TBS pH 7.5 (10% sheep serum, 1% blocking reagent, 1% BSA, 0.1% Tween™ 20) for 1 h at room temperature, the sections were incubated overnight at 4 °C with the antibody (1:1000 in TBS containing 5% sheep serum, 1% blocking reagent, and 1% BSA). The following day, sections were washed for 2 h in TBS (pH 7.5; 0.1% Tween™ 20 and 2 mM levamisole) and then in alkaline phosphatase buffer (100 mM Tris-HCl pH 9.5, 100 mM NaCl, 50 mM MgCl2, 0.1% Tween™ 20, and 2 mM levamisole). Bound antibodies were revealed using NBT-BCIP (Roche). After DIG in situ hybridization, slides were counterstained with DAPI (5 μg/mL, Invitrogen), washed, and mounted with aqueous mounting medium.
RTE-2_OV custom-made polyclonal antibodies
Custom-made polyclonal antibodies were obtained from Primm Biotech Custom Antibody Services (Milan, Italy). They were raised against two peptides derived from two portions of the RTE-2_OV protein, GAA (aa 1-100) and HAA (aa 569-673), both of which are unique within the translated assembled transcriptome. To select these portions, the manufacturer also considered protein similarity (selecting regions with no significant identity to the murine and rabbit proteomes), low-complexity and transmembrane regions (which were excluded), and the distribution of predicted antigenic peptides (selecting regions with a high number of predicted antigenic peptides). The selected synthetic peptides were injected into two rabbits, which were boosted three times within 38 days (at days 21, 28, and 35) after the first injection. The final bleed was performed 3 days after the last injection, and the crude sera were purified by immunoaffinity chromatography on Sepharose columns.
Immunohistochemistry and antibody validation
SEM, SUB, and OL dissected from O. vulgaris (N = 3) were immediately immersed in 4% paraformaldehyde (PFA) in seawater (4 °C for 3 h). After fixation, samples were washed several times in PBS, cryoprotected in 30% sucrose, and embedded in OCT compound (OCT; Leica Biosystems). The embedded brain parts were then sectioned at 20 μm on a cryostat (Leica CM3050 S). No antigen retrieval was required. Tissue sections were rehydrated in three successive baths of 0.1 M PBS and incubated for 90 min at RT in 5% goat serum (Vector Laboratories Ltd.) diluted in 0.1 M PBS containing 0.05% Tween (PBTw). The sections were then incubated at 4 °C with the custom-made polyclonal antibodies raised against LINE (G and H; see RTE-2_OV custom-made polyclonal antibodies for details). The next day, the sections were again washed in several changes of PBTw and incubated for 90 min at RT with secondary antibodies (Alexa Fluor 488 or 546 goat anti-rabbit IgG, both diluted 1:200 in PBTw). Sections were then rinsed, and cell nuclei were counterstained with DAPI (Molecular Probes, Eugene, OR). Finally, after further extensive washes, the sections were mounted with fluorescent mounting medium (Fluoromount, Sigma). For all antisera tested, omission of the primary and/or secondary antiserum resulted in no staining. In addition, specificity was confirmed by pre-incubating (4 °C, overnight) the antibodies with 1 mg/mL of the synthetic epitope (HAA and GAA; see RTE-2_OV custom-made polyclonal antibodies for details) before staining; again, no immunostaining was observed. The two custom-made polyclonal antibodies, raised against two different peptides of the RTE-2_OV protein, showed the same staining pattern in the octopus brain tissue.
Imaging
Sections were observed under different microscopes depending on the technique. Image acquisition and processing were performed using the Leica Application Suite software (Leica Microsystems). For ISH, we used a Leica DMI6000 B inverted microscope, and for IHC, a Leica TCS SP8 X confocal microscope (Leica Microsystems, Germany). Tile Z-stacks were acquired with a 0.2-μm step size. IHC figures were assembled following the color-blindness guidelines provided by Wong [104].