Animal care and embryo sampling
Experimental procedures and animal care were conducted in strict accordance with guidelines approved by the Animal Experiments Committee of the University of Tokyo (approval ID: AP19-8 and AP19-10). The inbred medaka strain (Hd-rR) was provided by NBRP (the National BioResource Project) at NIBB (the National Institute for Basic Biology, Aichi, Japan). Two wild strains, Kasasa (strain ID: WS1268) and Oura (strain ID: WS253), were supplied by NBRP at Utsunomiya University (Utsunomiya, Japan). These two strains are geographically close to each other and inhabit the same river system. They belong to the same sub-species group (southern population) as the inbred line Hd-rR [44]. The other two wild strains, Kaishi (strain ID: WS1275) and Tango (strain ID: WS240), were obtained from NBRP at Niigata University (Niigata, Japan). They belong to the southern population in Japan but inhabit different river systems. All adult medaka were maintained at 28 °C under a 14-h light:10-h dark cycle. Fertilized eggs were obtained by natural mating and incubated in hatching buffer at the same temperature used for keeping adults (28 °C). The embryos were carefully staged according to the standard developmental table [45]. The number of somites was used as a specific criterion to identify the stages of the phylotypic period (st. 23.5, fourteen somites; st. 28, thirty somites).
RNA extraction and RNA sequencing
Embryos
After the identification of developmental stages, each embryo was homogenized in QIAzol reagent (Qiagen, Germantown, MD, USA), and total RNA was isolated by using an RNeasy Min Elute kit (Qiagen) in accordance with the standard protocol. Total RNA concentrations were quantified by using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) with a Qubit RNA HS Assay Kit (Thermo Fisher Scientific). RNA quality was evaluated by using a BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA), and only samples with high-quality scores (RNA Integrity Number [RIN] ≥ 9) were used. RNAseq libraries were generated by using a TruSeq RNA sample preparation Kit v.2 (Illumina, San Diego, CA, USA) and sequenced by using a HiSeq1500 platform (Illumina, 100-bp single read, > 20 million mapped reads, Additional file 3: Table S2).
Adult tissues
Twenty-five target tissues were chosen from all around the adult body and dissected from a single adult individual to avoid overlapping sampling tissues (Fig. 3a). Four biological replicates (two males and two females of the d-rR strain) were prepared for each tissue. Dissected tissues were immediately homogenized in QIAzol reagent (Qiagen), and total RNA was isolated by using an RNeasy Mini kit (Qiagen) in accordance with the standard protocol. The quality and concentration of extracted RNA were evaluated in the same way as in the experiment done to extract RNA from embryos. For RNAseq library preparation, a TruSeq RNA sample preparation Kit v2 (Illumina) was used. The libraries were sequenced as paired-end (75-bp) reads on the NextSeq 550 platform (Illumina).
Sex identification of embryos
After total RNA isolation, DNA–protein aggregates were precipitated in ethanol–sodium citrate solution (0.1 M sodium citrate in 10% ethanol), and DNA was further extracted from these aggregates by ethanol precipitation. By using the extracted DNA, the sex of embryos was determined by using the PCR-amplifying DMY gene, a Y-chromosome-specific gene known to initiate male sex differentiation after stage 34 [46] (forward primer sequence: 5′-TGTAGTCCAGAGGCTTCGTC-3′; reverse primer sequence: 5′-GGACGATGAAGCAGAGTAGC-3′).
Estimation of gene expression levels from RNAseq data
Adapter trimming was performed by using the trimmomatic (ver. 0.38) program [47], and the quality of RNAseq data was evaluated by using FastQC (ver. 0.11.8, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The sequenced reads were mapped against the reference medaka genome (version ASM223467v1) by using HISAT2 (ver. 2.1.0) [48]. The mitochondrial DNA sequence from the reference genome was removed before the mapping. StringTie (ver. 1.3.5, with all the default parameters) [49, 50] was used to quantify relative gene expression levels (in transcripts per kilobase million [TPM]). Random sub-sampling of 20 million reads from the total number of genome-mapped reads for each sample was performed to avoid bias arising from differences in read depth among samples. The log10-transformed TPM levels (namely, log10(TPM+1)) were used in the following analysis, where \({x}_j^i\) represents the log-transformed gene expression level of the jth gene in the ith individual. We then filtered out genes with low expression, keeping those that showed \({x}_j^i\) ≥ 0.1 for all individuals. Note that consistent results were obtained with other expression-level thresholds ranging from 0 to 1.5 (data not shown).
Identification of genes with significantly greater deviation in expression levels from the technical error
In quantifying differences in phenotypes (whole embryonic transcriptome) and gene expression levels of same genes between individual embryos, bias from technical errors had to be reduced as much as possible. To do so, only genes showing significantly greater deviation in the inbred twin samples than in the technical replicates were used. (See also Additional file 1: Figure S5.) Notably, the deviations in expression of those genes expressed at low levels tended to be indistinguishable from the technical error. Technical replicates were prepared by pooling RNA from four individual embryos into one tube and then dividing the pooled sample into four subsamples. This was followed by RNAseq (Additional file 1: Figure S9). First, the difference in jth gene expression \(\left|{x}_j^i-{x}_j^k\right|\) was calculated, where the ith and kth individuals are gender-matched twins. The technical error in jth gene expression was calculated as the average of \(\left|{x}_j^{\mathrm{tech},i}-{x}_j^{\mathrm{tech},k}\right|\) over all possible combinations with i ≠ k (six combinations in total) among the four replicates, where \({x}_j^{\mathrm{tech},i}\) is the expression level of the jth gene in the ith technical replicate. Then, one-sided Wilcoxon rank-sum tests for each gene were performed to determine whether the inbred twins’ gene expression differences were significantly greater than those of the technical replicates (α = 0.01). Following the analysis, we used only those genes for which the expression differences were significantly larger than the technical error.
Evaluation of whole embryonic developmental stability
The difference in whole embryonic gene expression profiles between inbred twins was quantified by calculating the variance of distribution of the differential gene expression levels. Let \({y}_j^{ik}\) be the difference in jth gene expression between the ith and kth individuals, i.e. \({y}_j^{ik}=\left({x}_j^i-{x}_j^k\right)\). Then, the variance was defined as \({V}^{ik}=\frac{1}{N}\sum_j{\left({y}_j^{ik}-\overline{y_j^{ik}}\right)}^2\), where \(\overline{y^{ik}}\)is the average gene expression difference across genes and N is the number of genes to be analysed. The developmental stability of the gene expression profiles was evaluated as the average of the variance Vikacross gender-matched inbred twin pairs.
Evaluation of read-depth bias toward whole embryonic variation
Potential bias from differences in read depth toward whole embryonic phenotypic variation was evaluated by creating a simulated dataset with different read depths. Genome-mapped RNAseq reads were randomly picked with several read depths (3-, 5-, 10-, 15-, 20-, 25-, and 30-million genome-mapped reads) and calculated the variance of distribution of differential gene expression (Additional file 1: Figure S3b). The test data set was obtained from the sequence data of gender-matched, quadruplet inbred twins (st. 23.5) raised in the same environment until sampling. Data for 20 million reads were created uniformly for all samples, as this was the maximum depth that could be obtained for all samples.
Evaluation of whole embryonic evolutionary conservation
To quantify the intraspecies diversity of the embryonic gene expression profiles, the variance of the distribution of differential gene expression levels between two individuals were used, as defined above. After calculating Vik for the ith Kasasa and kth Oura embryos among all possible gender-matched pairs and defined it as the intraspecies diversity. Intraspecies diversity was also calculated by using the same method as in previous studies by using 1 − Spearman’s correlation coefficient (rho) [7, 10, 12]. As shown in Additional file 1: Figure S4, consistent results were obtained between 1 − Spearman’s method and the analysis using the variance Vik. Note that adult fish of the Kasasa and Oura strains were kept in the same breeding environment.
Evaluation of gene expression stability
The variation of each gene expression level was calculated by averaging the expression difference \(\left|{x}_j^i-{x}_j^k\right|\) for all inbred twin pairs and defined as the gene expression stability of the jth gene. As the variation depends on the absolute expression level, we normalized it by using the following procedure (Additional file 1: Figure S9). For each developmental stage, (1) sort genes by the absolute expression levels averaged over all inbred individuals; (2) calculate a running median of the expression variation over the absolute expression levels (window size 501 genes). In other words, for each gene, the variation was corrected against those of ± 250 genes with similar expression levels; (2a) in the case of windows with fewer than 250 genes on either half side (e.g. the 250 genes with the top or bottom expression levels), reduce the window size to give an equal number of genes on each side; (3) obtain the corrected expression variation by subtracting the corresponding median value. This procedure eliminated the dependency of the expression variation on the absolute expression level (Additional file 1: Figure S9c).
Evaluation of microevolutionary, intraspecies expression variation
To determine the intraspecies expression variation, we used the same calculation method that was used to evaluate the stability of gene expression. Namely, we calculated the expression difference between Kasasa and Oura embryos, \(\left|{x}_j^{\mathrm{Kasasa},i}-{x}_j^{\mathrm{Oura},k}\right|\), over all gender-matched combinations, where \({x}_j^{\mathrm{Kasasa},i}\) and \({x}_j^{\mathrm{Oura},k}\) represent the expression level of the jth gene of the ith individual in the Kasasa population and the kth individual in the Oura population, respectively. We then normalized the intraspecies expression variation in the same way as for inbred twins to eliminate expression-level dependency.
Evaluation of interspecies expression variation
Interspecies expression variation between medaka and other species was analysed as the expression difference between 1:1 orthologs (defined by reciprocal best BLAST hits using the longest peptide for each gene; e-value > 1e−5, BLAST+ ver. 2.9). The interspecies diversity between medaka and other species was also quantified as the difference in mean expression levels between 1:1 orthologs (defined by reciprocal best BLAST hits using the longest peptide for each gene; e-value > 1e−5, BLAST+ ver. 2.9). Given that there are no perfect counterparts in developmental stages between different species, we performed the analysis only for the most conserved developmental stage for each species, namely the phylotypic period (medaka, st. 23.5; zebrafish, prime-5; chicken, HH16; mouse, E9.0) [7, 10, 12, 28]. Peptide sequences of each species were obtained from the ensemble database (medaka, Ensembl v95/ASM223467v1; zebrafish, Ensembl v99/GRCz11; chicken, Ensembl v99/GRCg6a; mouse, Ensembl v99/GRCm38). For zebrafish, chicken, and mouse, the average expression level of each gene was calculated over biological replicates (3 for zebrafish, 2 for chicken, and 2 for mouse). For medaka, the average expression level was obtained over inbred individuals (46 for st. 15, 48 for st. 23.5, 50 for st. 28, and 26 for Hatch). The interspecies expression variation of the jth gene \(\left|{x}_j^{\mathrm{medaka}}-{x}_j^{\mathrm{species}}\right|\) using species ∈ {zebrafish, chicken, mouse} was then calculated and normalized as mentioned above to eliminate expression-level dependency.
Developmental genes and genes expressed through embryogenesis
Developmental genes
On the basis of gene-associated GO terms obtained from the Ensembl database (ver. 95), developmental genes were defined as those annotated with the GO term GO:0032502 [developmental process] or its descendant GO terms by using the GO.db package [51] (ver. 3.7.0) in R.
Genes expressed throughout embryogenesis
In accordance with previously published RNAseq data [39], genes were considered to be expressed throughout embryogenesis if they exhibited an average expression level among three biological replicates that was higher than the threshold (≥ 0.1) across all the sampled developmental stages [39] (16 developmental stages, Fig. 3c).
GO slim enrichment analysis
GO slim terms for each gene were obtained by using the R package ‘biomaRt’ (ver. 2.38) [52, 53] with the Ensembl 95 medaka genome. After we had extracted the 10% of genes with the smallest variations, the GO slim terms of the remaining genes were analysed and an enrichment analysis was performed. One-sided Fisher’s exact tests were performed (false discovery rate ≤ 0.01, Benjamini–Hochberg procedure for multiple comparisons).
Spatial and temporal pleiotropy of gene expression
Spatial pleiotropy of each gene was quantified as the number of tissues in which the gene was expressed among 25 tissues extracted from adult fish of the medaka d-rR strain (the strain of origin of the inbred Hd-rR strain). Genes with an average expression level TPM ≥ 1 among the replicate samples (two males and two females) were defined as expressed in that tissue. Temporal pleiotropy of each gene was quantified as the number of developmental stages in which the gene was expressed among the 16 developmental stages [39] (embryos of d-rR strain, three biological replicates for each stage). Genes with an average expression level TPM ≥ 1 were defined as being expressed at that stage.
DNA extraction and genome resequencing
Genomic DNA for each medaka was extracted by using a DNeasy Blood and Tissue Kit (Qiagen) with RNase treatment. In the case of wild medaka strains (Kasasa and Oura strains from the same river system, and Tango and Kaishi strains from different river systems), one male individual from each strain was collected. Two male and two female twins were collected in the case of inbred medaka (Hd-rR). After we had evaluated DNA quality by using NanoDrop (Thermo Fisher Scientific), DNA libraries were prepared with a TruSeq DNA Nano LT Library Prep kit (Illumina, San Diego, CA, USA). Sequencing was performed by using a NextSeq 550 platform (Illumina, paired-end 150-bp reads).
Evaluation of genomic diversity
Quality assessment and adapter trimming of DNAseq reads were performed in the same manner as for the RNAseq data. The sequence reads were aligned to the medaka reference genome (version, ASM223467v1) by using Bowtie2 (ver. 2.3.5.1) [54], and then only unique-hit reads were utilized. The aligned reads were further processed by using the ‘AddOrReplaceReadGroups’ command in Picard tools (ver. 2.20.8, http://broadinstitute.github.io/picard/) to assign read groups. Local realignment around indels was performed by using the Genome Analysis Toolkit [55] (GATK, ver. 3.8-1) ‘IndelRealigner’ command. For variant calling, the samtools [56] (ver. 1.9) ‘mpileup’ command was used, with default parameters. In the subsequent analysis, only high-quality sites [Phred-scaled quality score > 30, depth > ×5; filtered by using the bcftools [57] (ver. 1.9) ‘view’ command] were used. All the vcf files from different medaka samples were combined by using the ‘merge’ command in bcftools (ver. 1.9). PCA analysis was performed by using PLINK2.0 [58] (v2.00a2 64-bit, www.cog-genomics.org/plink/2.0/).
Estimation of potential regulatory regions by using ATAC-seq data
Open chromatin regions were first identified by using previously published ATAC-Seq data [28] and were linked with the closest gene ID via the closest proximal transcription start site (TSS; done by using the ‘closest-features’ command implemented in BEDOPS, ver. 2.4.37) [59]. Potential regulatory regions were then defined as those that overlapped with the distal (±5000 bp from the TSS) and proximal (–100 to +50 bp from the TSS) regions. For the ATAC-seq signals located at the boundaries of these distal and proximal regions, only those that overlapped with more than half of the region were used. TSS information for each gene was obtained by using the R package ‘biomaRt’ (ver. 2.38) [52, 53] with the Ensembl 95 medaka genome.
Single-nucleotide substitutions in open chromatin regions
The individual genomes of the following populations were mapped to the medaka reference genome (ver. ASM223467v1): Hd-rR (two males and two females), Kasasa (one male), Oura (one male), Kaishi (one male), and Tango (one male). Homozygous single-nucleotide substitutions from the reference genome were detected for sites with sufficient read quality and read depth (Phred-scaled quality score > 30, depth > ×5). Single-nucleotide substitutions within the open chromatin regions ±5000 bp around the TSS were assigned to each gene and counted across all eight samples.
TATA-box elements
TATA-box elements in TSS proximal regions (−100 to +50 bp) were scanned by using the FIMO web browser [60] (ver. 5.3.3; P value cut-off 10−4; reference medaka genome, version ASM223467v1). The following four motif sequences were used to detect the TATA-box: TATAAAAA, TATAAATA, TATATAAA, and TATATATA. For ‘TATA’ repeated sequences, any two base shifts were counted separately. Those on different DNA strands were also counted separately.
Statistics
Biological replicates consisted of embryos from different parents and born on different days to appropriately represent the population of interest. For statistical tests, α level = 0.05 was employed to indicate statistical significance throughout the analyses unless otherwise specified. To avoid an inflated type I error rate in multiple comparisons following the Kruskal–Wallis test, we performed a Steel–Dwass test implemented in the R package NSM3 (ver. 1.12) [61]. For GO slim term enrichment analysis, the Benjamini–Hochberg procedure was utilized with Fisher’s exact tests.