Transcriptome-wide bsRNA-seq after separation by translation state
For polysome profiling, rapidly growing HeLa human cervical cancer cells (in biological triplicates B, C and E) were lysed in the presence of cycloheximide, lysates separated by ultracentrifugation through linear sucrose density gradients and multiple fractions taken [70] (Fig. 1a and Table S1 for parameters of each lysate). To monitor efficacy of separation on the gradients and reproducibility across replicates, the absorbance profile at 254 nm was recorded and the distribution of Ribosomal Protein L26 (RPL26) measured (Fig. 1b,c). RNA was isolated from fractions, which were spiked with Renilla Luciferase (R-Luc) RNA transcribed in vitro (sequences shown in Table S2), and its integrity checked (Figs. 1d and S1A). RNA fractions were then used in RT-qPCR (primers listed in Table S3) to establish the sedimentation behaviour of multiple cellular mRNAs (Figs. 1e and S1B). To best capture the varying mRNA profiles and, therefore, different translation states, we pooled RNA samples into four final fractions for bsRNA-seq. Fraction 1 encompassed the small ribosomal subunit peak, fraction 2 included the large ribosomal subunit and monosomal peaks, whereas fractions 3 and 4 covered light and heavy polysomal peak regions, respectively (indicated by the blue boxes in Fig. 1). Pooled bsRNA-seq fractions were prepared from each biological triplicate (Fig. S1C), laced with the ERCC (External RNA Controls Consortium) spike-in mix [71], depleted of rRNA and subjected to bisulfite conversion (see Fig. S1D for RNA integrity analyses before and after these steps). Libraries (termed LibB1–4, LibC1–4, and LibE1–4) were prepared and subjected to Illumina HiSeq sequencing (bsRNA-seq).
Transcriptome-wide identification of candidate m5C sites
bsRNA-seq yielded on average ~ 55 million (M) read pairs per library after initial processing. Reads were mapped to ‘bisulfite-converted’ references as shown in Fig. S2 and detailed in “Methods”. Of further note, the human reference genome was combined with the spike-in sequences, whereas dedicated references were used for rRNA and tRNA mapping. On average, ~ 58% of reads were uniquely mapped to the genome (an additional ~ 1.4%, ~ 0.07%, ~ 4.9%, and ~ 0.04% of reads mapped to the ERCC and R-Luc spike-ins, rRNA, and tRNAs, respectively; mapping statistics are given in Table S4).
Next, we merged the four bsRNA-seq fraction libraries into one composite library per biological replicate (termed cLibB, cLibC and cLibE, i.e. N = 3) to approximate a total cellular RNA analysis (Fig. S2). Using the initial read mapping, we then assessed overall cytosine conversion for different RNA types. We saw near complete conversion (> 99.8%) of both ERCC and R-Luc spike-ins, which are devoid of modified nucleosides and thus attest to the efficiency of the bisulfite conversion reaction (Fig. 2a). C-to-T conversion was also very high across the transcriptome (~ 99.7% for all annotated RNAs; ~ 99.8% for transcripts of protein-coding genes), consistent with a rare occurrence of m5C sites in mRNA [47,48,49,50, 58]. Conversion levels for tRNAs were lower (~ 94.2%), as expected from the common presence of m5C sites in tRNAs. The conversion of rRNAs was intermediate (~ 97.8%), which is inconsistent with the sparseness of m5C sites within mature rRNAs (only two sites known in 28S rRNA). We further visually inspected non-conversion at individual cytosine positions for multiple spike-in RNAs and tRNAs as well as for 28S, 18S, and 5.8 rRNAs (Fig. S3). Initial read mapping already reported high levels of non-conversion at known m5C positions within tRNAs as well as near complete conversion at other tRNA positions and at all cytosines within the spike-in transcripts (top panels in Fig. S3A,B). By contrast, mapping to rRNAs suffered from incomplete cytosine conversion in multiple clusters (top panels in Fig. S3C), particularly in 28S rRNA. This clustered non-conversion likely reflects the known sensitivity of the bisulfite reaction to secondary structure [72,73,74] and indicated additional filtering as necessary for high-confidence site calls.
As a first step towards that, we removed reads that contain more than three non-converted cytosines from the initial read mapping (the ‘3C’ filter) [37, 48, 63]. While this did not affect the analysis of spike-in RNAs and most tRNAs (middle panels in Fig. S3A,B), it noticeably reduced, but did not completely eliminate, clustered non-conversion for rRNAs (middle panels in Fig. S3C). Thus, as also suggested by others [48], we further implemented a ‘signal-to-noise’ filter that suppresses site calls at positions where less than 90% of mapped reads passed the 3C filter (‘S/N90’). Combined application of the 3C and S/N90 filters (3C & S/N90) did not change site calls for most tRNAs, although predictably, those with more than three genuine m5C sites were affected (bottom panels in Fig. S3B). While detection of the two known m5C sites at positions 3761 and 4417 in 28S rRNA benefited from the 3C filter, they were both flagged as unreliable by the S/N90 filter (compare middle and bottom panels in Fig. S3C, as well as the ‘zoomed plots’, Table S5A). On balance, we accepted this tendency for false negative site calls, at least in tRNA and rRNA, in favour of suppressing false positive calls transcriptome-wide.
Additional criteria we implemented for inclusion as a candidate m5C site were as follows: coverage above 30 reads (‘30RC’), at least 80% of bases identified are cytosine or thymidine (‘80CT’) and a minimal depth of at least five cytosines (‘5C’). Reproducibility of sites called in this way was high across the biological triplicates (R2 > 0.95; Fig. S4A). Nevertheless, most called sites had very low cytosine non-conversion (e.g. ≤ 1% on average for 35,090 of a total 39,529; Fig. 2b). Thus, to select sites likely to be ‘biologically meaningful’, we finally required an average non-conversion level across replicates of at least 10% (‘10MM’). This identified 1034 high-confidence candidate m5C sites in the transcriptome-wide mapping (Fig. 2b, Table S5B), with only a minority of sites showing very high levels of non-conversion (e.g. ~ 5% had levels above 60%). A total of 322 of these 1034 sites were also included in a set of HeLa cell m5C sites reported recently based on bsRNA-seq with similar mapping and site selection strategies [48] (Fig. 2d). The underlying concordance of the datasets is likely higher, as our data has around fourfold greater depth of coverage and overlapping the two site lists further suffers from non-conversion thresholding effects (leading to an exaggerated lack of overlap between sets around the 10% cut-off; Fig. 2e). Given their critical impact on data clean-up, the specific effects of the 3C and S/N90 filters on numbers of called sites in different RNA types are shown in Fig. S4B (with all other filters left in place). Notably, the great majority of candidate m5C sites (846 of 1034) was found in transcripts of protein coding genes (Fig. 2C); these became the main focus of further analyses, as detailed below.
Candidate m5C sites in tRNAs and other ncRNAs
Given that polysomal RNA was our source material, coverage for most ncRNAs was not expected to be high. Nevertheless, ~ 18% of sites that passed all our criteria mapped to ncRNA biotypes (Fig. 2c). Notable numbers were found in long intergenic ncRNAs (15 sites) and antisense transcripts (12 sites). These include sites in ribonuclease P RNA component H1 (RPPH1; chr14:20,343,234), small Cajal body-specific RNA 2 (SCARNA2; chr1:109,100,508), RNA component of signal recognition particle 7SL1 (RN7SL1; chr14:49,586,869) and two RN7SL pseudogenes, RN7SL395P (chr8:144,785,379) and RN7SL87P (chr5:144,140,971), as well as sites in the two vault RNAs, vtRNAs1–1 (chr5:140,711,344 and chr5:140,711,359) and vtRNA1–2 (chr5:140,718,999), all of which have been previously reported [9, 46, 75]. Additionally, we identified candidate sites that had not been specifically noted by previous studies, in NSUN5 pseudogene 2 (NSUN5P2; chr7:72,948,484), telomerase RNA component (TERC; chr3:169,764,738) and nuclear paraspeckle assembly transcript 1 (NEAT1; chr11:65,425,307). Although it did not fulfil our coverage criteria, we also saw evidence of non-conversion in SNORD62B (chr9:131,490,541). The sites in RPPH1, SCARNA2, NSNU5P2 and SNORD62B were further validated in independent biological samples (see below).
A total of 135 sites were located in intergenic regions, according to GENCODE v28 annotation; however, 84 of these reside within tRNAs according to RefSeq annotation. We systematically identified sites in tRNAs from reads that uniquely mapped within the processed tRNA sequence coordinates according to our bespoke pre-tRNA reference (see “Methods”; ~ 175,000 reads per cLib/replicate). tRNA coverage in our data is comparatively low as the bulk of tRNA sediment near the top of the gradient (Fig. 1c), a region we did not include in our bsRNA-seq fraction selection. Nevertheless, we identified a total of 119 candidate m5C sites in 19 tRNA iso-decoders (Table S5C). These sites typically show a high cytosine non-conversion level (see examples in Fig. S3C), and they are near exclusively located in the anticipated tRNA secondary structure positions (Fig. S4C). The abundant identification of the major known tRNA sites confirms the reliability of our dataset.
Validation of candidate m5C sites and their NSUN2-dependence
We employed a targeted approach termed amplicon-bsRNA-seq to confirm the presence of selected sites in independent biological samples. It uses RT-PCR to amplify specific transcript regions after bisulfite conversion and purified amplicons are sequenced using Illumina MiSeq technology. We combined this with siRNA-mediated knockdown of specific RNA methyltransferases (see below) to exclude false positives and further to determine MTase site specificity. Altogether, we report amplicon-bsRNA-seq data for 26 different RNAs in this study, including two R-Luc spike-ins and two tRNAs (as controls; Fig. S5) as well as 17 mRNAs (two sites in the 5′UTR, nine in the CDS and six in the 3′UTR) and five ncRNAs (Figs. 3 and S6; see also Table S6). Candidate sites for amplicon-bsRNA-seq were primarily selected based on their non-conversion level. As further described below, we generated two datasets, ‘confirmatory’ and ‘in depth’, which mainly targeted sites with high (> 30%) or low (< 30%) non-conversion level, respectively (Table S6). Further, we considered the expression level of the RNAs containing candidate sites. We chose RNAs with comparatively high expression level (Table S6), although some lowly expressed RNAs were also included out of interest.
New total RNA samples from HeLa cells as well as from two prostate cell lines (epithelial PrEC and cancerous LNCaP) were prepared to test a potential dependence of site presence on the tissue/source material [49, 56]. To test the MTase-dependence of sites, we used siRNA-mediated knockdown (KD) targeting NSUN2 or TRDMT1, alongside controls (siRNA targeting m5C:DNA methyltransferase 1 [DNMT1] or a non-targeting control [NTC] siRNA). Knockdown efficiency was monitored for each sample by Western blotting and RT-qPCR (Fig. S5A,B).
Two independent sample sets for amplicon-bsRNA-seq experiments were generated. The ‘confirmatory’ set (N = 1) targeted sites with high cytosine non-conversion level in all three cell lines, combining HeLa cells with the full panel of siRNAs, LNCaP cells with siRNAs against NSUN2 and NTC, while PrEC cells were used in non-transfected form only. The ‘in-depth’ set was performed in biological triplicates (N = 3) and targeted sites with lower non-conversion in HeLa cells only, combined with knockdown of NSUN2, TRDMT1 and NTC control. In vitro transcribed R-Luc spike-in controls were added prior to RNA bisulfite treatment and efficient conversion was observed for all samples (Fig. S5C). Assessment of tRNAGly (GCC) and tRNAThr (UGU) sequences generally showed high non-conversion at the known m5C positions (TRDMT1 targets C38, NSUN2 targets C48–50) with complete conversion at all other assessed cytosines. Importantly, non-conversion was selectively reduced at C38 in tRNAGly (GCC) in all TRDMT1 KD conditions, while the variable loop positions C48–50 strongly reacted to NSUN2 KD (Fig. S5D). NTC transfection and DNMT1 KD had no such effects on tRNA non-conversion levels. Using the ‘confirmatory’ HeLa cell sample set, we then confirmed three known NSUN2-dependent sites [9], in the NAPRT and CINP mRNAs (Fig. S6A, top row) as well as in the ncRNA RPPH1 (Fig. 3c, bottom left; see figure legends for full gene names). Altogether this established selective MTase KD and efficient bisulfite conversion.
Next, we tested 14 mRNA candidate m5C sites in HeLa cells, representing a range of coverage and non-conversion levels as well as various mRNA regions (two sites in the 5′UTR, eight in the CDS and four in the 3′UTR, respectively). Seven sites were validated in the ‘confirmatory’ samples and each showed clear reduction with NSUN2 KD (RPS3, NDUFB7, RTN3, SZRD1, OSBPL8 in Fig. 3a, left column; SCO1, MCFD2 in Fig. S6A, middle row). Eight sites were reproducibly detected in the in-depth samples; four showed clear and statistically significant reduction with NSUN2 KD (RPS3, GIPC1, SRRT, GID8; Fig. 3a, right column). Note, that the RPS3 site was validated with both sample sets, indicating that the ‘confirmatory’ samples are still suitable for candidate evaluation. Two sites still appeared to respond to NSUN2 KD albeit without reaching significance, likely because their low non-conversion level would require more replication (CCT5, NSUN2; Fig. S6A, bottom row). Interestingly, two sites convincingly lacked responses to either MTase KD (PIGG, ZDHHC8; Fig. 3b). NSUN2-independence for the PIGG site has been noted previously [48], we additionally show its TRDMT1-independence here. Using the ‘confirmatory’ samples, we further confirmed presence and selective sensitivity to NSUN2 KD for four sites in ncRNAs (the RPS3 pseudogene (AL139095.2) encoded in the CAGE1 intron, NSUN5P2, SNORD62B, SCARNA2; Fig. 3c). Finally, six candidate m5C sites were also explored using the two prostate cell lines (four in mRNAs: SZRD1, RTN3, SRRT, PWP2, and two in ncRNA: SCARNA2, SNORD62B; Fig. S6B). All these sites were found in both PrEC and LNCaP cells and they each responded to NSUN2 KD in LNCaP cells.
In summary, the presence of all chosen sites was validated by amplicon-bsRNA-seq, even though based on polysome bsRNA-seq they varied widely in coverage (e.g. PWP2, ZDHHC8, SNORD62B, NAPRT were actually below our 30RC cut-off) and in cytosine non-conversion level. Regarding the latter, there was a reasonably good concordance between non-conversion level by transcriptome-wide and amplicon-specific measurements (e.g. CCT5 13% vs 5%, RSP3 25% vs 13% SRRT 63% vs 77%; averages from polysome bsRNA-seq versus NTC in-depth sample, respectively; see Table S6). This highlights the reliability of both, transcriptome-wide and amplicon-bsRNA-seq data. Although only based on a limited comparison, sites in both mRNA and ncRNA could be found in all three cell lines, suggesting at least some overlap in m5C profile between different cell types. Importantly, of the 17 mRNA and five ncRNA sites examined by amplicon-bsRNA-seq here, all ncRNA sites and 15 in mRNA were found to be targeted by NSUN2 and none by TRDMT1. Two mRNA sites did not respond to either knockdown and thus might be targeted by other MTases. Altogether, these data further substantiate the notion that NSUN2 has a broad but not exclusive role in modifying cellular transcriptomes [47,48,49,50, 57, 58].
Candidate m5C sites display enrichment in multiple mRNA regions
Approximately 82% of sites we identified were present in transcripts of protein-coding genes (Fig. 2c). In support of specific roles, these sites are enriched for several Gene Ontology (GO) pathway terms, particularly those related to cell adhesion, translation and RNA processing/turnover (Fig. S7A, Table S7). These results broadly match findings in similar cell contexts [49, 50].
Given the broad role of NSUN2 in mRNA cytosine methylation, it can be expected that sites share features of the canonical tRNA substrates of the enzyme. Thus, we predicted RNA secondary structure around sites using the RNAfold tool in the ViennaRNA Package 2.0 [76]. Compared to randomised sequences, this shows a relatively lower base-pairing tendency for the region immediately upstream of sites (position − 4 to − 1). Either side of this region there are patterns of alternating short segments with increased or decreased propensity for base-pairing (Fig. 4a, top panel), neatly resembling the context of the major NSUN2-dependent sites in tRNA structural positions C48–50 (Fig. 4a, bottom panel). We also investigated the sequence context around the modified cytosine using ggseqlogo [77]. We noted a moderate bias for C or G in the two upstream positions and a moderate-to-strong G-bias in downstream positions 1–5, yielding a consensus of C/G-C/G-m5C-G/A-G-G-G-G (Fig. 4B). Again, this consensus is similar to the C48–50 position in tRNA and its immediate 3′ sequence context. These findings elaborate on earlier reports that ‘non-tRNA’ m5C sites reside within CG-rich regions [9, 49, 75]. The similarity to tRNA structure was further noted for sites in vtRNAs [75]. They closely match recent findings that NSUN2-dependent sites in mRNAs reside in a sequence and structural context resembling tRNAs [48].
Next, we analysed candidate m5C site distribution along mRNA regions. A scaled metagene analysis showed a marked increase in sites around start codons (Fig. 4c), confirming prior reports in human and mouse [49, 56, 57]. While sites were found in both UTRs and also in intronic regions, just under half of them were located within the CDS (Fig. 4D). CDS sites showed some codon bias, being enriched in eight codons specifying 5 amino acids, primarily in the first and second codon positions (Fig. S7B). Despite the numerical predominance of CDS sites, spatial enrichment analysis of sites in mRNA using RNAModR [78] revealed significant overrepresentation of sites in the 5′UTR, with a minor but significant underrepresentation in the CDS (Fig. 4e). Of note, adherence to the site sequence context established above (Fig. 4B) was strongest in the 5′UTR, with some divergence in the 3′ UTR (consensus: G/C-U/G-m5C-A/G-G-G-G-G (Fig. S7C). To further inspect site prevalence near start codons, we divided the surrounding region (− 400 to + 1000 nt) into 100-nt bins and directly tested for enrichment (Fig. 4f). The broad window and relatively coarse bin size were necessary to retain statistical power, given the relatively low site numbers in mRNA. This showed a gradient of decreasing site prevalence in 5′ to 3′ direction, confirming the concentration of sites along 5′UTRs. Within 5′UTRs, the − 201 to − 300 interval showed the highest odds ratio, albeit without reaching significance. As the median length of 5′UTRs represented in our data is 226 nt, this could suggest some concentration of candidate m5C sites near the mRNA 5′ end; however, site numbers are too low to ascertain this (Fig. S7D, left panel). The 100-nt region immediately downstream of the start codon showed significant site enrichment, while bins within the body of the CDS (> 600 nt downstream of start codons) showed a continued decrease of site density (Fig. 4f). We extended these analyses to several other mRNA features, which mostly remained inconclusive due to diminishing site numbers in any given region. There was, however, a significant site enrichment within the 100-nt interval immediately downstream of stop codons (Fig. 4g) and in the interval 101–200 nt upstream of mRNA 3′ ends as well (Fig. S7D, right panel). Altogether, the diversity of candidate m5C site distribution patterns observed here hint at distinct, context-dependent functional roles for m5C.
Transcriptome-wide anti-correlation between cytosine modification level and mRNA translation efficiency
Links to mRNA translation are suggested by several of the site enrichment patterns described above. This was also emphasised in a recent report, showing a significant negative correlation between candidate m5C site-content in the CDS and translation efficiency in the HeLa cell transcriptome [48]. To independently verify this, we obtained HeLa cell ribosome profiling data from several studies [33, 79, 80]. Cumulative distribution of translation efficiency values allowed us to compare mRNAs found by us to contain candidate m5C sites with the remaining mRNAs. Irrespective of underlying ribosome profiling data, we then saw a clear and significant tendency for site-containing mRNAs to be less well translated (Fig. S8A). In a similar vein, we also assessed any transcriptome-wide relationship with mRNA stability [33, 79] and found site-containing mRNAs to display a significant trend towards longer half-life (Fig. S8B). Interestingly, this latter observation matches findings reported recently for m5C-modified mRNAs in mammals and zebrafish [47, 51].
A unique advantage of our sampling approach is that it allows us to profile the level of cytosine non-conversion at each site across polysome gradient fractions, with bsRNA-seq data for each fraction available in biological triplicates (see Fig. S2, Table S8A). We first assessed the overall non-conversion range of the 846 sites in transcripts from protein-coding genes in each fraction. This showed declining non-conversion levels with increasing ribosome association, with comparisons of fraction 1-to-2 and 2-to-3 reaching statistical significance (Fig. 5a, left panel). This trend was most pronounced with sites in the CDS and still discernible with 5′UTR sites, whereas the 3′UTR and intronic sites did not show a clear trend (Fig. 5a, remaining panels). This demonstrates a negative correlation, on the bulk level, between the extent of cytosine non-conversion and mRNA translation state, primarily driven by observations with CDS sites.
Next, we considered non-conversion levels of sites individually and performed Mfuzz clustering [81]. We selected a set of sites in mature mRNA that had sufficient (≥ 10 reads average) coverage in all four fractions (F1234; 254 sites), as well as a second set that had sufficient coverage in fractions 2–4 but not in fraction 1 (F234; 315 sites). Mfuzz was run requiring 9 clusters (Figs. S9,S10A, Table S8B,C) before re-grouping clusters based on overall non-conversion trends. This generated three major profile patterns for each set, representing positive, neutral and negative correlation with translation state, respectively. Positive and negative pattern sets each showed significant non-conversion level change between fractions as expected (F1234 shown in Fig. 5b; F234 shown in Fig. S10B). Focusing on the set with stronger discriminative potential, F1234, we saw that non-conversion levels were not distributed across patterns equally; most notably, sites with positive patterns typically had low non-conversion levels through the fractions, whereas sites with a neutral pattern were, for unknown reasons, split into two groups, one with ~ 20% and a smaller group with ~ 60% non-conversion (Fig. 5b, top panels).
Notably, site profiles indicating negative correlation were the most common (~ 61%), with positive profiles (~ 19%) being the least frequent (Fig. 5c, top panel). This bias towards negative profiles was moderately but significantly enhanced with CDS sites (~ 68%), whereas 3′UTR sites were significantly underrepresented (~ 43%). Among profiles showing positive correlation with translation, CDS sites were significantly depleted (~ 13%), while 3′UTR sites (~ 38%) were significantly enriched. Conversely, compared to all sites, those in the negative pattern set were moderately but significantly enriched for CDS location and depleted for 3′UTR location. Sites with a positive pattern were depleted for CDS location but enriched for 3′UTR location (Fig. 5c, bottom panel). Many, but not all, of these observations were also made in the less discriminative F234 set (Fig. S10). With the caveat that a proportion of individual site profiles are based on imprecise measurements (see below), the key discernible features from the clustering approach were (a) a preponderance of sites showing negative correlation with translation state and, (b) while 5′UTR sites were relatively unremarkable, there was a tendency for CDS and 3′UTR site to segregate into negative and positive pattern sets, respectively.
Individual mRNA sites show robust anti-correlation with translation state
To identify individual sites displaying significant non-conversion change across the polysome profile, we performed pairwise logistic regression analysis (Table S8D). In each pairwise comparison, the majority of sites that reached significance showed a negative correlation with translation state; the F1-F2 comparison yielded the largest number of significant sites with the strongest bias towards the negative trend (Fig. S11A). A total of 43% of the sites in the F1234 set but only ~ 7% in the F234 set had significant differences in at least one pairwise comparison (Table S6). Focusing on the F1234 set, 108 sites reached significance comprising a total of 149 significant pair-wise comparisons. Of note, the large majority of these sites represented a negative trend with translation state, and most assignments were based on the F1-F2 and F2-F3 comparisons (Fig. S11B). Sixteen of these sites were in the 5′UTR, 81 in the CDS, and 11 in the 3′UTR, which represents significant enrichment of CDS sites and depletion of 3′UTR sites.
Sites with significant change in cytosine non-conversion level across several fraction steps, or with larger magnitude of change between steps, may represent the most compelling candidates for functional studies. Regarding the former, 12 sites reached significance in all three pairwise comparisons; 10 of these showed a negative trend (17/79 with 14/69 sites showing a negative trend based on two or a single pairwise comparison, respectively). Regarding the latter, 59 of the 149 significant pairwise comparisons satisfied an arbitrarily imposed criterion of ≥ 10% relative non-conversion difference. In total, 54 of these pairs represented a negative step. Furthermore, they were primarily based on the F1-F2 comparison (39; 15 on F2-F3 and 5 on F3-F4). In terms of sites, none of the 12 ‘triple significance’ sites satisfied this criterion at all steps, one having two steps and five having one step of the required magnitude (for the 17 ‘double significance’ sites: three for two steps, 12 for one step; 79 ‘single significance’ sites: 33 for one step). Overall, choosing sites for ‘compelling’ profiles across the polysome gradient primarily, but not exclusively, selects for those showing a negative trend with translation state.
The profiles of sites selected for significant (for two or more steps) and/or strong change (≥ 10% relative), as well as different level of cytosine non-conversion overall, are shown in Fig. 6 and Fig. S11C,D. The selection further contains several sites that were validated by amplicon-bsRNA-seq. Inspecting the few ‘promising’ sites with positive change showed that several of them actually displayed a complex profile pattern, consistent with their varied membership to the trend patterns described above (Fig. 5b) and leaving even fewer with a clear, monotonously positive association with translation state (Fig. S11D). By contrast, all ‘high-quality’ sites with negative change came from the negative trend pattern (Fig. 5b, left panel) and nearly all displayed a continuous decline in cytosine non-conversion from fraction 1 through to 4 (Figs. 6 and S11C). While a few of these latter sites were situated in the 5′ or 3′UTR, most of them were located in the CDS. Thus, bsRNA-seq has identified candidate m5C site in multiple individual mRNAs that suggest an interdependence with translation. These sites/mRNAs are now accessible to functional follow-up studies.