Regulatory and evolutionary impact of DNA methylation in two songbird species and their naturally occurring F1 hybrids

Background Regulation of transcription by DNA methylation in 5’-CpG-3’ context is a widespread mechanism allowing differential expression of genetically identical cells to persist throughout development. Consequently, differences in DNA methylation can reinforce variation in gene expression among cells, tissues, populations, and species. Despite a surge in studies on DNA methylation, we know little about the importance of DNA methylation in population differentiation and speciation. Here we investigate the regulatory and evolutionary impact of DNA methylation in five tissues of two Ficedula flycatcher species and their naturally occurring F1 hybrids. Results We show that the density of CpG in the promoters of genes determines the strength of the association between DNA methylation and gene expression. The impact of DNA methylation on gene expression varies among tissues with the brain showing unique patterns. Differentially expressed genes between parental species are predicted by genetic and methylation differentiation in CpG-rich promoters. However, both these factors fail to predict hybrid misexpression suggesting that promoter mismethylation is not a main determinant of hybrid misexpression in Ficedula flycatchers. Using allele-specific methylation estimates in hybrids, we also determine the genome-wide contribution of cis- and trans effects in DNA methylation differentiation. These distinct mechanisms are roughly balanced in all tissues except the brain, where trans differences predominate. Conclusions Overall, this study provides insight on the regulatory and evolutionary impact of DNA methylation in songbirds. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-024-01920-2.

1. Supplementary Results 2. Figure S1 -Figure 1 A-L in main manuscript coloured by sample group (species and hybrids) 3. Figure S2 -Methylation gene profile with gene body split into exons and introns 4. Figure S3 -Tissue-specific patterns of the association between genetic-and methylation differentiation 5. Figure S4 -CGI proportion gene profile 6. Figure S5 -Patterns of genetic and epigenetic change at misexpressed genes 7. Table S1 -sample and sequencing information (see external file) 8. Table S2 -Frequency of hypermethylation of tissue-specific DMRs 9. Table S3-6 -tsDMR GO analyses (see external files) 10.Table S7 -Classification system for the mechanism of DNA methylation divergence 11.Table S8

Gene annotation
To improve annotation of untranslated regions (UTRs) and better predict transcription start sites (TSS) and transcription termination sites (TTS), we updated the gene annotation of the collared reference genome FicAlb1.5.For this purpose, we used 36 RNA-seq samples from six tissues (brain, heart, kidney, liver, testis and spleen) of the six collared flycatcher individuals for the multi-assembly Oyster-River protocol [1] for de novo transcriptome assembly as well as MAKER [2].In total, 14,943 out of 16,576 genes mapped to the collared flycatcher chromosome-level assembly.Of these, 9,597 genes had at least one 5' UTR (in the following referred to as promoter set) and 8563 also had a 3' UTR (gene profile set).

Relationship between tsDMRs and tissue-specific expression
To understand the impact of tsDMRs on gene expression levels we quantified tissue-specific gene expression using the preferential expression measure (PEM; Figure SR1) [3].We tested for a significant deviation from random rank of PEM for genes with a tsDMR in a certain tissue.If tsDMRs are associated with tissue-specific expression, then we expect to see an excess of genes with tsDMRs in the promoter having either the highest or lowest expression.Only tsDMRs at CGI promoters in testis (COL, PIE and HYB) and brain (HYB) had a significant deviation from random PEM ranks (c 2 test of independence, p < 0.05), which means that there is an association between tsDMRs and tissue-specific expression.For these, we tested for a significant difference in PEM between genes with a tsDMR in a certain tissue and genes with no tsDMR in any tissue (reference set).This test answers the question: do genes with a detectable tsDMR in the promoter show greater tissue-specific expression than genes lacking promoter tsDMRs?This forms a test of the relative importance of DNA methylation compared to other unobserved explanatory variables for tissue-specific expression patterns.Testis, which had the highest number of tsDMRs overlapping CGI promoters, showed a significantly higher PEM in overexpressed genes compared to the reference set (Wilcoxon test, p < 0.05).This highlights the importance of DNA methylation in conferring testis-specific expression, since the reference set consisted both of genes with ubiquitous expression and genes whose tissuespecificity is controlled by mechanisms other than tissue-specific promoter methylation.Underexpressed genes with tsDMRs did not show lower PEM values, indicating that tsDMRs in promoters are generally associated with tissue-specific overexpression in this tissue.

Tissue-specific hybrid inheritance patterns of promoter DNA methylation
We investigated promoter methylation further because of its role in transcriptional repression.When comparing samples from the parental species using PCA, tissue was a more important factor to divergence in promoter DNA methylation compared to species (Figure SR2A-B).When we split up the dataset by tissue, the PCA separates the species with hybrids grouping intermediately along the first and second PC axes (Figure SR2C-L).This intermediate placement is distinct from earlier observations based on expression data, where misexpression in the hybrids dominated the PCA in all tissues except testis (Mugal et al. 2020).
To further understand the effects of DNA methylation divergence in promoters of hybrids we compared their methylation level with the parental species (see Supplementary Methods).This allowed us to classify the inheritance pattern of DNA methylation into six classes: conserved (HYB close to both COL and PIE), collared-dominant (HYB closer to COL), pied-dominant (HYB closer to PIE), and the two mismethylation categories, overdominant (HYB higher than COL and PIE) and underdominant (HYB lower than COL and PIE).For CGI promoters 81-95 % were classified as conserved per tissue compared to 48-76% for Other promoters (Fisher's exact test, p < 0.05, Figure SR2M).In contrast, the relative proportion of non-conserved inheritance classes were similar between promoter types and only significantly different for heart (p ≈ 0.033) and liver (p ≈ 0.013; Figure SR2M).For both heart and liver, more overdominance in Other promoters was driving the significance.Around 1-4 % of CGI promoters were mismethylated per tissue compared to 7-11 % for Other promoters, further highlighting the stronger conservation of CGI promoters (Figure SR2M).Distribution of inheritance patterns were significantly different between tissues both with (Fisher's exact test, p < 0.05) and without brain, which was sequenced to lower coverage (Table S1).Liver, for example, showed the highest excess of Collared-dominance while heart and testis had excesses of over-and underdominance respectively.These results were in line with genome-wide results indicating that hybrid inheritance of DNA methylation varies predictably among tissues (Table 1).Overall, the intermediate placement of hybrids in-between parental species in the PCA (Figure SR2), is explained by a mixture of collared-and pied-dominant as well as additive effects (Figure SR2M).Previous work on the Ficedula flycatchers has shown a greater genetic differentiation on the Z sex chromosome compared to the autosomes (A) [4], as well as divergence in gene expression [5].While all tissues and promoter type combinations showed a tendency towards less conservation on the Z chromosome, none of these differences were significantly different (Fisher's exact test, p > 0.05).Instead, a slightly higher proportion of additive effects on the Z sex chromosome compared to autosomes where significant for brain (CGI) as well as kidney and liver (Other) (Figure SR2M).This may be due to slightly greater methylation differentiation on the Z chromosome (<2 percentage points for all tissues; Wilcoxon rank sum test, p < 0.05; Figure SR3), to some extent perhaps caused by greater genetic differentiation (ttest; p < 2.2 * 10 -16 ) on the Z compared to autosomes (Figure SR4).There was also significantly more collared-dominance on Z compared to A in liver and brain (CGI), more pied-dominance on Z for heart (CGI), and less underdominance on Z in brain and testis (Other).

Figure SR1 .
Figure SR1.Distribution of tissue-specific expression values per promoter type and tissue.A PEM value >0 and <0 indicates overexpression and underexpression compared to other tissues respectively.

Figure SR2 .
Figure SR2.Principal component analysis of promoter methylation patterns and inheritance patterns of methylation in F1 hybrids.CGI promoter methylation PCA for all samples (A) and separated by tissue (C-G).Other promoter methylation PCA for all samples (B) and separated by tissue (H-L).When including all samples, tissue dominates methylation variation regardless of promoter type.When separating per tissue, hybrids generally show intermediate promoter methylation levels.The inheritance pattern of promoter methylation shows that this

Figure SR3 .
Figure SR3.Methylation differentiation (Mdiff) of promoter sequences on autosomes (A) and the Z chromosome.

Figure SR4 .
Figure SR4.Genetic differentiation (FST) in the promoter is higher on the Z compared to the autosomes.

Figure S1 :Figure S2 :Figure S3 :Figure S4 :
Figure S1: Figure 1 A-L of the main manuscript but colored by sample group (species and hybrids).The results show that COL, PIE and HYB show similar patterns.CGI A Brain B Heart C Kidney D Liver

Figure S5 :
Figure S5: Patterns of genetic and epigenetic change at DE genes between HYB and both COL and PIE (misexpressed genes).(A) and (D) show DMR frequency, while (B) and (D) show non-CpG FST between COL and PIE.Testis and brain not included in (E) and (F) for visibility due to very few DE genes.

-
Number of fixed difference loci by divergence class 12.

Table S9 -
Number of differentially expressed genes 13.

Table S10 -
Determinants in cis of misexpressed genes

Table S1 :
Sample and sequencing information (see external file).

Table S2 :
Frequency of hypermethylation of tissue-specific DMRs.Here being hypermethylated means that a tsDMR has a higher methylation level in a specific tissue compared to the others.For example, a majority of testis tsDMR were hypermethylated for COL, PIE and HYB.