Co-expression strategy for lncRNA annotation
We hypothesized that a co-expression strategy based on a combination of the following features would likely generate the most accurate regulatory networks of lncRNAs (Fig. 1). First, we chose a 3rd generation next generation sequencing (NGS) single-molecule sequencing (SMS) platform as a foundation for estimation of expression of every lncRNA and mRNA species. This sequencing platform has relatively simple library preparation procedure that does not involve amplification steps [24] and thus more likely represents true original abundancies of various RNA species, especially in the low abundance range [25, 26]. The accuracy in this range is especially relevant for lncRNAs that tend to have low expression levels in general [3, 27]. Therefore, SMS could, in theory, provide a more accurate estimate of co-expression between lncRNAs and their potential target mRNAs. Second, we used a single cell type to generate the co-expression networks. While many publicly available expression datasets are available, reliance on them in the co-expression annotation of lncRNAs suffers from a major potential problem: lncRNAs are often expressed in just one or few cell types [3, 27]. Therefore, a co-expression analysis across multiple cell types would likely include many samples where any given lncRNA is either not expressed or expressed at noise levels and thus severely dilute the real correlation signals. Third, we used short time frames of transcriptome perturbing treatments (see below). We assumed that a co-expression analysis based on RNA levels measured shortly after the system is perturbed and forced to adapt by altering levels of various transcripts would more likely capture direct regulatory interactions as opposed to longer time treatments that could be diluted with indirect effects.
We have previously found expression of many vlincRNAs in a human leukemia cell line K562 [18]. This fact, together with the availability of multiple types of genomic data for this cell line from the ENCODE consortium [3], made K562 an attractive system for this study. The first step in our pipeline was to generate an expression database under multiple treatment conditions to calculate the co-expression of every vlincRNA with all protein-coding mRNAs. We profiled transcriptomes of K562 cell line after treatments with 29 inhibitors and anti-cancer drugs affecting diverse cellular pathways and functions (signaling pathways, cell cycle, DNA metabolism and repair, chromatin modifiers, etc.) (Additional file 1: Supplemental Table S1). As mentioned above, we used relatively short treatments of 3 and 6 h for each drug.
Total DNaseI-treated RNA from each sample was converted into cDNA using the not-so-random (NSR) hexamers devoid of sequences that bind to rRNAs [28] and analyzed using RNA-seq performed on the SMS platform. To estimate the degree of perturbation of the transcriptome by each drug, we estimated the number of differentially expressed (DE) up- or downregulated transcripts—both protein-coding mRNAs and vlincRNAs—defined by fold change (FC) > 1.5 in both time points relative to the solvent (DMSO or water) controls for both 12,995 annotated genes expressed in K562 and 407 vlincRNAs detected previously in this cell line [18] (Fig. 2a–c, Additional file 1: Supplemental Table S2). Overall, expression of 10,248 (78.9%) of the protein-coding genes changed under these conditions in at least one drug treatment with 7229 up- and 6698 downregulated genes (Fig. 2a, Additional file 1: Supplemental Table S2). The corresponding numbers for the vlincRNAs were 392 (96.3%) with 176 up- and 374 downregulated (Fig. 2a, Additional file 1: Supplemental Table S2). For any given drug treatment, we detected 1,190 (9.2%) and 623 (4.8%) up- or downregulated genes and correspondingly 11 (2.7%) and 79 (19.4%) vlincRNAs based on the corresponding median values across all treatment (Fig. 2a). Overall, vlincRNAs had a tendency to be downregulated as compared to known genes in response to drug treatments, suggesting potential negative correlation between these two types of transcripts (Fig. 2a, also see below).
The drugs varied significantly in respect to the effect on coding and non-coding transcriptomes (Fig. 2b, c, Additional file 1: Supplemental Table S2). Of the top three drugs that exhibited the largest upregulating effect on vlincRNAs—mirin (inhibitor of MRE11, a component of the MRN complex), BML-277 (CHK2 inhibitor), and YM-155 (possible DNA intercalator) (Fig. 2b, Additional file 1: Supplemental Table S2)—at least two are known to inhibit DNA damage sensing or response pathways (mirin and BML-277). DNA damage-related drugs also caused significant changes in the protein-coding transcriptome (Fig. 2c, Additional file 1: Supplemental Table S2). Still, the fraction of vlincRNAs upregulated in response to mirin and BML-277 treatments was higher than that of protein-coding mRNAs (Additional file 1: Supplemental Table S2). Furthermore, drugs that induced the highest fractions of expression of protein-coding genes affect epigenetic functions, such chromatin modifiers (panobinostat and EPZ-6438, inhibiting histone deacetylases and Ezh2 respectively) or readers of specific histone marks (bromodomain inhibitor I-BET151) and non-DNA damage related functions (Fig. 2c, Additional file 1: Supplemental Table S2). As such, it appears that the vlincRNA subclass of lncRNAs might be enriched in transcripts that participate in at least some cellular processes related to DNA damage.
To validate the reproducibility and authenticity of our expression analysis, we performed independent treatment experiments with three drugs (mirin, etoposide and SN-38) and analyzed the changes in expression of selected vlincRNAs in response to these drugs after 6 h of the treatments using real-time PCR. We selected 42 differentially expressed (DE) vlincRNAs and, as expected, most (36, 85.7%) DE vlincRNAs could be validated (Fig. 2d, Additional file 1: Supplemental Table S3). Furthermore, of the 6 vlincRNAs that could not be validated in the real-time PCR experiments, 4 (66.7%) showed expected direction of the change albeit not reaching the FC of 1.5. As such, the DE analysis based on the SMS RNA-seq platform appears to capture authentic and reproducible expression changes.
We then generated a list of mRNAs co-expressed with each vlincRNA. The co-expression was defined as Spearman correlation of either > 0.35 or < −0.35 between a vlincRNA and a protein-coding mRNA with the correlation significance p value < 0.01 (Fig. 1) calculated on 64 samples (drug treatments and solvent-treated control samples). For each vlincRNA, we found between 134 and 5385 (median 1615) co-expressed transcripts using these thresholds. Interestingly, we have observed a much higher number of negatively correlated mRNAs than positively correlated ones with the medians of 430 and 1,022 for the positively and negatively co-expressed transcripts respectively, and the trend towards negative correlation was highly significant (p value < 2.2E−16, Wilcoxon signed rank test) (Fig. 2e, Additional file 1: Supplemental Table S4). Nonetheless, similar to the results reported earlier [29], genes positively correlating with vlincRNAs were enriched in the immediate vicinity of these transcripts. The median co-expression correlations between vlincRNAs and genes located within 5 kb, 5–10 kb, 10–100 kb, and > 100 kb from each other were 0.44, 0.37, −0.38, and −0.38 respectively.
Validation of the co-expression networks using lncRNA-chromatin interaction profiles
A major potential limitation of the co-expression strategy is that the expression correlation (positive or negative) can occur without direct physical or functional interactions between the correlated entities. Since a number of functionally characterized lncRNAs appear to regulate other genes by interacting and modulating their chromatin environment [4, 5], we assumed that the vlincRNAs could also function in the same fashion, as was in fact shown for the VAD vlincRNA [19]. Therefore, we validated the co-expression networks obtained in this study by mapping the sites of genome-wide RNA-chromatin interactions of selected vlincRNAs with the underlying key assumption that vlincRNAs should either interact or be in a relatively close proximity to their target genes (Fig. 1).
For this purpose, we adapted previously published RAT (reverse transcription-associated trap) approach [30, 31] that has two key advantages for the very long transcripts studied in this work (Fig. 3a). First, RAT relies on in situ reverse transcription inside crosslinked nuclei with oligonucleotides complementary to an RNA of interest and in presence of biotinylated dCTP to label RNA-chromatin complexes. Following the streptavidin immunoprecipitation, the bound chromatin regions are identified based on NGS analysis (Fig. 3a). The incorporation of biotin into the resulting cDNA obviates the need to design multiple closely spaced biotinylated oligonucleotide as in other techniques (e.g., ChIRP and similar methods) designed to map sites of interactions between a specific lncRNA and chromatin [32,33,34], which would be economically prohibitive for these very long transcripts. Second, chromatin fragmentation is conducted with restriction enzymes (Fig. 3a, b) that do not fragment RNA or single-stranded DNA unlike the other approaches that use sonication [32,33,34] that would likely break these very long transcripts.
Recently, a number of methods to detect genome-wide RNA-chromatin interactions were developed. However, one common feature of these methods (such as GRID-seq [35], MARGI [36], and Red-C [37]) was ligation of nearby DNA and RNA molecules using bridging oligonucleotides. The latter were in the range of ~ 40–60 bases and could thus detect molecules separated by no more than 20 nm given length of a nucleotide being 0.34 nm. However, in our RAT assays, the size of the chromatin particles after DNA fragmentation reached 300 to 500 nm (Fig. 3c, Methods). Since all genomic regions would be expected to be located within such particles should be co-precipitated with the target transcript (Fig. 3a), this would mean that RAT is not limited to immediate interactions, but rather can measure much more distal proximity or colocalization between RNA and chromatin regions.
Since DNA damage-inducing drugs had the highest effect on the expression of vlincRNAs, we chose 6 vlincRNAs induced by the topoisomerase inhibitors (etoposide and/or SN-38) for the RAT analysis with an example of one such vlincRNA shown in Additional file 2: Supplemental Figure S1. The RAT procedure was performed on cells treated with either etoposide, SN-38 or DMSO. Overall, the RAT analysis was performed on 14 vlincRNA-treatment combinations with two biological replicas per combination with the goal of analyzing potential change in the networks in response to drug treatment. Each RAT assay was performed separately with 2 sets of non-overlapping oligonucleotides designed against the same vlincRNA (Fig. 3a, Methods). In addition, for each treatment, the RAT procedure was also performed without the oligonucleotides as a specificity control. Downstream analysis was performed using two levels of processed RAT signal: (1) average normalized RAT score calculated for every base pair in the human genome or (2) genomic region level obtained after application of thresholds of different stringency to the average normalized RAT score (Fig. 3d, Methods). The thresholds were defined based on the top 1 (most strict), 5, 10, 20, or 30 (least strict) percentile (%-ile) of the average normalized RAT score for each sample (Fig. 3d, Methods). Genes containing the RAT regions in their boundaries were considered co-localized with the corresponding vlincRNA.
As the first step in evaluation of the performance of the RAT approach, we estimated the overlap between RAT regions obtained from the biological replicas at either region or gene levels. In the former, exact genomic coordinates of the interacting regions had to be present in both replicas while in the latter genes had to contain interacting regions anywhere within their boundaries in both replicas but the coordinates of the interacting regions could be different. Overall, we found statistically significant overlap of the RAT signal between the replicas for every vlincRNA-treatment combination at both levels (Fig. 3e, Additional file 1: Supplemental Table S5). Furthermore, the overlaps of the RAT signal between the two replicas were statistically significant at multiple thresholds; however, as would be expected, the strengths of the overlaps, as measured by the odds ratios (defined in Fig. 3e) increased with the stringency of the RAT signal threshold (Fig. 3e, Additional file 1: Supplemental Table S5, Methods). In general, the odds ratios of the gene-level overlaps between the two replicas were consistently higher (Additional file 1: Supplemental Table S5). Therefore, unless specifically indicated, all analyses below were performed on genes containing the vlincRNA-chromatin interacting regions anywhere within their boundaries in both replicas (Methods).
To evaluate the relationship between co-expression and relative proximity in the nucleus between vlincRNAs and the co-expressed genes, for each vlincRNA, we measured average normalized aggregated RAT score (ANARS) in the boundaries of the corresponding co-expressed and background control genes and in their 5 kb flanking regions (Additional file 3: Supplemental Figure S2, Methods). As shown in the Fig. 4a for one vlincRNA (ID-1132), the negatively and positively co-expressed genes had a tendency to have higher ANARS in gene bodies and their flanking regions than background genes. To formalize this observation, we generated empirical cumulative distribution function (ECDF) plots representing distribution of the ranked ANARS for the co-expressed and background genes (Fig. 4b, Additional file 4: Supplemental Figure S3, Methods). The ANARS of the co-expressed genes was consistently higher for the negative and positive co-expressed genes than the background genes, in gene bodies and in the flanking regions, for most vlincRNA-treatment conditions as shown in Fig. 4b for vlincRNA ID-1132 and in Additional file 4: Supplemental Figure S3 for all other vlincRNAs.
To test whether the difference is significant, we calculated p values of the enrichment of normalized RAT signal in the co-expressed relative to the background control genes. The statistical analysis was performed on the top 30% of the ranked ANARS values for the co-expressed and background genes, as illustrated by the regions of the ECDF plots demarcated by the boxes on Fig. 4b (Methods). The actual p values are given in the Additional file 1: Supplemental Table S6, and the results of the analysis are summarized in the Fig. 4c (gene bodies) and Additional file 5: Supplemental Figure S4 (gene bodies and flanking regions). Interestingly, the enrichment of the ANARS in the positively and negatively co-expressed genes compared to the background genes was statistically significant for most (12/14) vlincRNA-treatment combinations (Fig. 4c, Additional file 5: Supplemental Figure S4). Furthermore, the enrichment was statistically significant in all 14 combinations for either positively or negatively or both types of co-expressed genes (Fig. 4c, Additional file 5: Supplemental Figure S4).
Most co-expressed genes were located on chromosomes other than the one harboring the corresponding vlincRNAs (trans). However, interestingly, the ANARS for the cis co-expressed genes (located on the same chromosome as the vlincRNA) had a tendency to be higher than that for all co-expressed genes as shown in the Fig. 4d, e for vlincRNA ID-1132 and in Additional file 6: Supplemental Figure S5 for all other vlincRNAs. We then estimated statistical significance of enrichment of the ANARS in the cis genes compared to all genes (Fig. 4c, Additional file 5: Supplemental Figure S4, Additional file 1: Supplemental Table S6). The enrichment was statistically significant among all samples for the positively co-expressed genes and for the majority (9/14) samples for the negatively co-expressed ones (Fig. 4c, Additional file 5: Supplemental Figure S4, Additional file 6: Supplemental Figure S5, Additional file 1: Supplemental Table S6). Taken together, these results provided a strong support that co-expressed genes were enriched using RAT procedure and therefore were located in the proximity of the corresponding vlincRNAs in the nucleus. However, positively co-expressed genes and those located on the same chromosome had consistently higher signal than the negatively co-expressed genes and those located on other chromosomes (see the “Discussion” section).
We then estimated overlap between the co-expression dataset and genes containing RAT regions for every vlincRNA and made the following two observations. First, the significance of the overlap depended on expression levels. Specifically, the low abundant genes had a much higher probability of having significant overlap between positively co-expressed genes and the genes showing evidence of co-localization compared to the highly abundant ones (Fig. 5a, b, Additional file 1: Supplemental Table S7). However, the trend was reversed for the genes negatively correlating with vlincRNAs (Fig. 5a, b, Additional file 1: Supplemental Table S7). We observed this trend for every vlincRNA and every treatment (Additional file 1: Supplemental Table S7). Therefore, to increase the signal to noise ratio, we first sorted genes by maximum expression among all samples and then filtered the negatively co-expressed genes by being in the top half of expressed genes and the positively co-expressed genes by being in the bottom half. Second, the strength of the overlap increased with the stringency of the RAT signal threshold as judged by the increasing odds ratios as illustrated in the Fig. 5c (Additional file 1: Supplemental Table S8). This result indicated that the RAT signal thresholds were indeed informative in enriching for co-localized vlincRNAs and their regulatory targets.
As the next step, we set to choose single RAT signal threshold individually for each of the 14 vlincRNA-treatment combinations based on the best overlap with the co-expressed genes as illustrated on Fig. 5d (Methods). Using these criteria, we found that a vlincRNA can be in the vicinity of 20–2030 (median 1104) and 47–239 (median 123) negatively and positively co-expressed genes correspondingly. The odds ratios and the p values for the overlap between the final chromatin interaction maps and the negatively co-expressed genes ranged respectively from 1.07 to 2.4 (median 1.23) and from 1.16E−81 to 7.82E−2 (median 9.36E−48) (Fig. 5e, f, boxplots marked “SMS” and Additional file 1: Supplemental Table S9). The corresponding values for the positively co-expressed genes were 1.14 to 2.38 (median 1.33) and 7.83E−15 to 3.89E−2 (median 3.91E−9) (Fig. 5e, f, boxplots marked “SMS” and Additional file 1: Supplemental Table S10). The important outcome of this analysis was that majority of genes co-expressed with a vlincRNA (74.2% positive- or 81.7% negative-correlating transcripts) had evidence of co-localization with that vlincRNA.
VlincRNAs directly regulate expression of genes in their regulatory networks
As the next step, to provide direct support for the regulatory effect of vlincRNAs, we assessed the effects of direct knockdown of 2 vlincRNAs achieved using the CRISPR/Cas13 system [23] on expression of genes in their regulatory networks (Fig. 1). We took advantage of the K562 cell line expressing doxycycline (Dox) inducible Cas13 that has been previously used by us to show biological relevance of vlincRNAs in a high-throughput screening [22]. In that study, a mixed population of cells with each cell stably expressing one of 588 individual gRNAs was subjected to a survival challenge with different anti-cancer drugs [22]. Here, we generated 8 stable cell lines expressing individual gRNAs found to make cells sensitive to genotoxic stress in that high-throughput screen and targeting 2 vlincRNAs [22]. For each vlincRNA, we generated 4 stable cell lines constitutively expressing 2 different targeting gRNAs and 2 cognate mis-match control gRNAs containing mutations in bases 12–14 of the 28-mer gRNA as previously reported [22]. These mutations would abrogate the activity of the gRNA [23]. To avoid clonal effects, each cell line was represented by a mixed population of cells with different sites of lentivirus insertion.
Each of the 8 cell lines was treated with Dox for 0, 3, or 6 days, and the RNA population from each sample was subjected to RNA-seq analysis. Overall, we observed consistent knockdown in 3 out of 4 gRNAs with an average depletion of 20.4% compared to day 0 and the non-targeting control gRNAs based on the RNA-seq analysis (Methods, Additional file 1: Supplemental Table S10). If depletion of a vlincRNA has an effect on the genes it regulates, then the RNA levels of the negatively correlated genes should increase while those for the positively correlated ones, decrease (Fig. 6a). Thus, the fold changes of the former in response to vlincRNA knockdown would be higher than that of the latter. Conversely, if a vlincRNA has no effect on the genes it regulates, there should be no difference in the relative expression changes between genes negatively and positively correlating with it. To determine whether the difference exists, fold change of each gene was calculated for the 3- and 6-day time point relative to (1) the corresponding mismatch control and (2) the un-induced samples (the day 0 controls) based on RNA-seq analysis (Methods).
We then estimated differences in the relative fold changes between 4 groups of genes for every vlincRNA. The first 3 groups were based on the co-expressed genes: (1) all negatively vs all positively co-expressed genes, (2) 100 most negatively vs 100 most positively co-expressed genes, and (3) 50 most negatively vs 50 most positively co-expressed genes. The final background control group consisted of all remaining genes, many of which also exhibited weak correlation (either positive or negative) with vlincRNA expression, which however did not pass the significance thresholds described above for these genes to be considered co-expressed with vlincRNAs (Fig. 6a). In theory, the effect of vlincRNA depletion on these background genes should be less than on the co-expressed genes. Thus, the relative fold change difference in the background genes negatively and positively correlating with vlincRNA expression would serve as a control for the differences observed between the negatively and positively co-expressed genes (Fig. 6a). Therefore, the background group was split into two subsets based on negative or positive correlation with a vlincRNA and the differences in the relative fold changes between the two groups were calculated. For each comparison, we calculated 3 metrics: (1) differences between median relative fold changes, (2) Cohen’s d effects of differences between the average relative fold changes, and (3) statistical significance of the difference using the Wilcoxon rank sum test (Fig. 6b–e, Additional file 1: Supplemental Table S10). The comparisons were done by treating the 3- and 6-day time points separately and by combining the two time points.
Strikingly, the relative fold changes of the negatively co-expressed genes were almost always higher than those of the positively co-expressed ones as signified by the differences of the medians and positive Cohen’s d scores (Fig. 6b–e, Additional File 1: Supplemental Table S10). However, the differences of the medians and the Cohen’s d values were much higher for the co-expressed genes compared to the background correlated genes (Fig. 6b–e, Additional file 1: Supplemental Table S10). This difference was particularly pronounced when the top 50 or 100 negatively co-expressed genes were compared with the top 50 or 100 positively co-expressed ones (Fig. 6b–e, Additional file 1: Supplemental Table S10). Overall, the magnitudes of the Cohen’s d effects were quite small, mostly < 0.1 for the control background genes (Fig. 6b, d, Additional file 1: Supplemental Table S10). The differences of the medians and Cohen’s d values were higher on the day 3 compared to day 6 (Fig. 6d, e, Additional file 1: Supplemental Table S10), possibly due to accumulation of indirect effects affecting expression of the target genes. Furthermore, using the Wilcoxon rank sum test, the median relative fold changes of the negatively co-expressed genes were significantly higher (p value < 0.05) than those of the positively co-expressed ones for 2 out of 3 gRNAs (Additional file 1: Supplemental Table S10). However, several comparisons for the remaining gRNA were reaching the threshold of significance with the p values in the 0.05–0.09 range (Additional file 1: Supplemental Table S10). Based on these results, we reached the following conclusions. First, vlincRNAs appear to directly regulate multiple other genes, both positively and negatively, and these regulatory interactions could be predicted based on expression correlation in our co-expression assay. Second, genes with stronger co-expression with vlincRNAs do exhibit stronger regulation by the vlincRNAs. Third, even relatively modest levels of depletion of these transcripts can have measurable molecular phenotypes.
Functional properties of vlincRNA regulatory networks
The strong statistical overlap between the SMS co-expression and the chromatin interaction datasets combined with the CRISPR/Cas13 validation indicated we identified true vlincRNA regulatory networks. As described above, RAT signal for the genes in vlincRNA networks was significantly higher than in the background genes in most treatments. Therefore, it appears that different treatments did not significantly alter vlincRNA regulatory networks. To further quantify this observation, we identified lists of genes shared by the co-expression and chromatin interaction datasets in each treatment (DMSO or drugs) for each vlincRNA. Then, we estimated the fraction of overlap among these lists for each vlincRNA. Overall, 83.7–100% (median of 92.9%) and 48.6–83.8% (median of 63.7%) of respectively negatively and positively correlated co-expressed genes were shared by the DMSO-treated controls and the drug treatments. The respective odds ratios of positive and negative co-expressions were 52.8–192.7 (median 83) and 6.3–95.6 (median 12.6), indicating that the overlaps between the drug treatments and DMSO treatments were statistically significant (Fig. 7a, Additional file 1: Supplemental Table S11). Also, the networks did not change significantly in response to treatments with different drugs. For the two vlincRNAs profiled in both the etoposide and SN-38 treated cells, 87.2–99% (median of 93.1%) and 80.4–93.7% (median of 87.1%) of respectively negatively and positively correlated co-expressed genes were shared by the two drugs.
Second, networks consisted primarily of genes located on chromosomes different from those where the vlincRNAs were found, as exemplified in the Fig. 7b. However, consistent with the results above, the odds ratios of the overlap between the co-expression and chromatin interaction datasets were higher for the genes located on the same chromosomes (cis) as the vlincRNAs than those on the other chromosomes (trans) (Fig. 7c), but only 12/28 of these overlaps were statistically significant (Fig. 7d, Additional file 1: Supplemental Table S12). The likely reason for it is that the number of genes on the same chromosomes was not as high as genome-wide (Fig. 7e, Additional file 1: Supplemental Table S12). Therefore, we combined all samples to increase the statistical power and could indeed show that the odds ratios of overlap between the co-expression and chromatin interaction datasets were higher in cis than in trans (p value 6.1E−3, Wilcoxon rank sum test). Therefore, these results suggest that vlincRNAs participate in both cis and trans interactions; however, while the latter are much more numerous, the RNA-chromatin interactions with the genes on the same chromosomes tend to be stronger (see the “Discussion” section).
To further understand the properties of the vlincRNA regulatory networks, we performed Gene Ontology (GO) analysis to annotate all 407 vlincRNAs based on the functions of genes in the networks. Strikingly, the networks for different vlincRNAs exhibited enrichment of similar functions (Fig. 7f, g, Additional file 1: Supplemental Table S13). Most of the negatively correlated networks were significantly enriched in functions related to RNA (Fig. 7f), while the positive networks were significantly associated with various development GO terms (Fig. 7g). For example, the top 5 enriched GO terms among the negatively correlated networks and shared by ≥ 65% of the vlincRNAs were “RNA processing,” “RNA splicing,” “mRNA splicing,” “mRNA metabolic process,” and “mRNA splicing” (Fig. 7f, Additional file 1: Supplemental Table S13). “DNA-templated transcription” was within top 20 such GO terms and shared by 50% of all vlincRNAs (Additional file 1: Supplemental Table S13). On the other hand, the top 5 GO terms enriched in the positively correlated networks were “nervous system development,” “central nervous system development,” “multicellular organism development,” “system development,” and “anatomical structure development” shared by 27–35% of the vlincRNAs (Fig. 7g, Additional file 1: Supplemental Table S13). Extending to the top 50 GO terms revealed additional enrichment of the negatively co-expressed networks in functions associated with cell-cycle, such as “cell cycle phase transition,” “cell cycle,” “mitotic cell cycle process,” “mitotic sister chromatid segregation,” and “negative regulation of cell cycle process” shared by 43–46% of vlincRNAs (Additional file 1: Supplemental Table S13). The same step revealed the enrichment in functions associated with cellular adhesion among the genes found in the positively correlated networks and shared by 13–21% of vlincRNAs (Additional file 1: Supplemental Table S13).
All in all, the enrichment of similar functions among the co-expressed genes suggested that vlincRNAs have somewhat similar patterns of expression. To address this, we calculated median Spearman correlation among vlincRNAs or mRNAs only and between pairs of vlincRNAs and mRNAs. The median vlincRNA-vlincRNA, vlincRNA-mRNA, and mRNA-mRNA correlations were respectively 0.28, −0.02, and 0.03 (Fig. 7h). Thus, indeed vlincRNAs tend to be coordinately regulated and participate in control of genes with similar functions.
VlincRNAs are required for cellular survival under stress conditions
To directly test whether vlincRNAs and their regulatory networks could have biological significance, we tested importance of the 2 vlincRNAs used for the CRISPR/Cas13 experiments for the cell’s ability to survive genotoxic stress. Cells from the 8 individual CRISPR/Cas13 cell lines described above were mixed in equal proportions, grown for 3 days in presence or absence of Dox and then treated with etoposide (also with or without Dox). As shown in our previous study, etoposide had a strong and long-lasting toxic effect on K562 cells, leading to a continuous cell death even after the removal of the drug, and a slow recovery [22]. Here, for every treatment, after removing etoposide, the cells were allowed to regrow for ~ 10 days until they resumed normal growth and appearance, and then we estimated survival of the cells harboring each gRNA by calculating the normalized abundance of that gRNA in the genomic DNA from the pooled cells using NGS. For every treatment and every gRNA pair, we calculated the ratio of targeting/non-targeting gRNA abundances to estimate relative survival of cell harboring gRNAs targeting vlincRNAs relative to cells harboring their cognate non-targeting controls.
Interestingly, the average/median of the ratios of the targeting gRNAs relative to their cognate controls either immediately after pooling or growth for 3 days before etoposide addition were 0.9/0.91 even though all cell lines were mixed in equal proportions (Additional file 1: Supplemental Table S14, Methods). This suggests that even during growth and expansion steps leading from the lentiviral transfection to establishment of the individual cell lines, preferential loss of cells expressing targeting gRNAs occurred presumably due to their toxicity combined with the leaky expression of Cas13 in the absence of Dox. The subsequent treatments with etoposide resulted in further drop in this ratio, especially when combined with the induction of Cas13 by Dox (Additional file 1: Supplemental Table S14). The average/median ratios of targeting vs non-targeting gRNAs were 0.91/0.87 for the etoposide/−Dox and 0.78/0.76 for the etoposide/+Dox treatment (Additional file 1: Supplemental Table S14). Overall, across all 4 gRNAs pairs, the drop in the ratio (indicating more dead cells) in the etoposide/+Dox samples was statistically significant with p values of 0.01 and 0.04 (one-sided Student’s t test) compared to the cells not treated with etoposide or those treated with etoposide/−Dox. Interestingly, cells expressing the gRNA D33_v2_6 that did not show significant vlincRNA depletion in the transcriptome analysis were most depleted even without the drug treatment compared to their non-targeting control in the cell survival analysis with the corresponding average ratios of 0.84, 0.76 and 0.66 (Additional file 1: Supplemental Table S14). These results suggest that our inability to detect consistent changes in the level of the target transcript could be caused by the death of cells where this vlincRNA is depleted in a mixed population of cells. Altogether, these results demonstrate that these vlincRNAs are required for cell survival during normal growth conditions and especially so under a genotoxic stress.
Effect of an RNA measurement platform on authenticity of co-expression derived networks
To test whether the significant overlap between the co-expression derived networks and chromatin interaction datasets would be a general feature for any expression dataset, we regenerated a fraction of the dataset used for the co-expression analysis using the 2nd generation Illumina platform also using rRNA-depleted total RNA. We generated the co-expression networks using the same criteria as above. Importantly, application of the same p value threshold to estimate the reliability of the correlation estimates should in theory account for the different numbers of samples used to calculate the co-expression correlations (64 for SMS vs 32 for Illumina). Furthermore, the Illumina dataset had a much larger (on average ~10 fold) number of reads generated per sample and significantly longer reads: paired-end 150 base reads vs single read of on average ~35 bases for SMS. For each vlincRNA, we found a higher number of co-expressed transcripts using the same thresholds in the Illumina RNA-seq dataset than in the SMS one with the corresponding median numbers of 2,073 and 1,615. As in the SMS-based co-expression analysis, we observed a statistically significant trend towards the negative correlation between vlincRNAs and mRNAs with the corresponding median numbers of negatively and positively co-expressed mRNAs of 1,119 and 943 per vlincRNAs (p value < 2.2E–16, Wilcoxon signed rank test).
However, the overlap with the RAT dataset was poor based on comparisons with individual vlincRNA-treatment combinations or on merged dataset made by combining all treatments for all vlincRNAs (Fig. 5e, f, Additional file 1: Supplementary Tables 15 & 16). While the odds ratios of the overlaps with the negatively co-expressed genes indicated enrichment and ranged from 1.03 to 1.74 (median 1.19) (Fig. 5e, Additional file 1: Supplemental Table S15), the p values were much less significant compared with those from SMS RNA-seq mentioned above, with the median p value of 3.7E−2 (ranging from 2.22E−47 to 0.35) (Fig. 5f, Additional file 1: Supplemental Table S15). Furthermore, the odds ratios for the positively co-expressed genes were much lower than those for the SMS RNA-seq dataset ranging from 0.8 to 1.06 with the median of 1.0 and the median p value being 0.53 (ranging from 3.09E−2 to 0.93) (Fig. 3a, b, Additional file 1: Supplemental Table S15). Similar results were obtained using the merged data: while the overlap was significant for the negatively co-expressed vlincRNAs for both platforms albeit with the higher significance in the case of SMS, it was only significant for the positively co-expressed vlincRNAs detected by the SMS platform (Additional file 1: Supplemental Table S16). When both positively and negatively co-expressed vlincRNAs were combined, the overlap was only significant for the SMS platform (Additional file 1: Supplemental Table S16).
We also compared the effect of vlincRNA knockdown on the co-expression networks generated by both platforms. Overall, the Illumina-generated networks had similar profiles as the SMS ones (Additional file 1: Supplemental Table S17). However, the differences between the co-expressed genes in the networks and the background genes were mush less significant (Additional file 1: Supplemental Table S17). For example, the Cohen’s d effects of the combined day 3 and 6 data for the gRNAs gRNA D30_v6_6 and D33_v2_10 were 0.148 and 0.273 for the network genes and correspondingly 0.062 and 0.113 for the background genes for the SMS-generated networks (Fig. 6b, Additional file 1: Supplemental Table S10)—on average 2.4-fold higher for the network genes. The corresponding values for the Illumina networks were 0.086 and 0.295 compared to 0.080 and 0.218 (Additional file 1: Supplemental Table S17)—on average 1.2-fold higher. Therefore, networks generated using different expression platforms even on the same sample type would likely differ significantly in terms of their authenticity.