From: Detection and characterization of constitutive replication origins defined by DNA polymerase epsilon
Feature | Source |
---|---|
RFD profiles | RFD profile data for HeLa and GM06990 cells was downloaded from the database described in [31]. Positions of replication origins marked with red rectangles are based on BED files with ORI positions provided by the authors. |
Known genes | Exon locations for hg19 used to make gene visualizations were obtained from the UCSC FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz. Alternative splice variants for a particular gene were combined into a single entry that contained all possible exons. |
GC content | GC content was calculated in 1-kb intervals for the hg19 reference genome based on the BSgenome.Hsapiens.UCSC.hg19 and seqinr [74] R libraries. |
Skew profile (S) and replication origins | Compositional skew was calculated according to the specifications from [21], in 1-kb intervals across the entire hg19 reference genome. |
Replication time | Replication time data for hg19, from 15 cell lines, obtained in the ENCODE project, were downloaded from GEO (ID: GSE34399) in a bigWig file format. The data represent smoothed wave signals for 1-kb windows, obtained in the Repli-seq experiment. |
Sequence conservation | Sequence conservation data were obtained from the phastCons100way UCSC track using the phastCons100way.UCSC.hg19 Bioconductor library [75]. |
Histone marks | Histone mark data for the K562 cell line, including H2az, H3k27ac, H3k27me3, H3k36me3, H3k4me1, H3k4me2, H3k4me3, H3k79me2, H3k9ac, H3k9me3, and H4k20me1, were downloaded from the UCSC Table Browser. |
CpG islands | CpG island locations for hg19 were downloaded from the AnnotationHub (AH5086 track) using the AnnotationHub Bioconductor library [76] |
Isochores | Isochore locations for hg19 were downloaded from the database at https://bioinfo2.ugr.es/isochores [32]. Each isochore was assigned to one of five groups (L1, L2, H1, H2, H3) based on its average GC content (%), using the following thresholds: L1 ∈ [0, 37); L2 ∈ [37, 41); H1 ∈ [41, 46); H2 ∈ [46, 53); H3 ∈ [53, 100). |
DNase hypersensitivity sites | DNase hypersensitivity peaks for the K562 cell line, originating from the ENCODE project, were downloaded from the UCSC Table Browser [77]. |
Repeats | Repeat sequence locations for hg19 were downloaded from the AnnotationHub (AH5122 track) using the AnnotationHub Bioconductor library [76] |
Simple repeats | Locations of simple repeats for hg19 were downloaded from the AnnotationHub (AH5124 track) using the AnnotationHub Bioconductor library [76] |
Alu sequences | Alu sequences are the subset of the Repeats track comprising all repeats from the Alu family (37 types). |
G-quadruplexes | G-quadruplex locations for the hg19 reference genome were downloaded from GEO (ID: GSE110582). The G4 locations originate from a study based on G4-seq [59] |
Transcription start sites (TSS) | Locations of transcription start sites (TSS) were determined based on the UCSC gene annotation file downloaded from the FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz |
Chromatin loops | Chromatin loop data were obtained from the GEO database (ID: GSE63525) for K562 cells [61]. |
S/MARs | Locations of scaffold/nuclear matrix attached regions (S/MARs) for hg19 were downloaded from the MARome database in a BED file format [62] |
TADs | Genomic coordinates (hg19) of TADs mapped in 8 cell lines were downloaded from the ENCODE project website at https://www.encodeproject.org/search/?type=Experiment&assay_title=Hi-C [48]. Genomic coordinates (hg19) of TADs mapped in human hESC and IMR90 cells were downloaded from the RenLab website at http://chromosome.sdsc.edu/mouse/hi-c/download.html [49]. |
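
The GC content and Isochores rows describe two simple computations: GC content in fixed 1-kb intervals, and assignment of a region to an isochore group from its average GC percentage. The following Python sketch illustrates both. It is not the R code used in the study (which relied on BSgenome.Hsapiens.UCSC.hg19 and seqinr); the function names are hypothetical, and the handling of ambiguous bases (ignored here) is an assumption, since the table does not state it.

```python
from bisect import bisect_right

# Lower bounds (percent GC) of the five isochore groups listed in the table:
# L1 [0, 37), L2 [37, 41), H1 [41, 46), H2 [46, 53), H3 [53, 100).
ISOCHORE_BOUNDS = [0, 37, 41, 46, 53]
ISOCHORE_NAMES = ["L1", "L2", "H1", "H2", "H3"]


def gc_percent(seq):
    """Return GC content of seq in percent, or None if no A/C/G/T is present.

    Assumption: N and other ambiguity codes are excluded from the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None
    return 100.0 * gc / (gc + at)


def windowed_gc(seq, window=1000):
    """GC percent for consecutive non-overlapping windows (1 kb by default)."""
    return [gc_percent(seq[i:i + window]) for i in range(0, len(seq), window)]


def isochore_class(gc):
    """Map an average GC percentage to L1, L2, H1, H2, or H3.

    Intervals are half-open, as in the table, so 37 falls in L2 and 100 is
    outside the defined range.
    """
    if gc is None or not 0 <= gc < 100:
        raise ValueError("GC percentage must lie in [0, 100)")
    return ISOCHORE_NAMES[bisect_right(ISOCHORE_BOUNDS, gc) - 1]
```

For a chromosome sequence held as a string, `windowed_gc(sequence)` yields one value per 1-kb interval, and `isochore_class` can be applied to the average GC content of any isochore.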
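
The Known genes row states that alternative splice variants of a gene were combined into a single entry containing all possible exons. A minimal Python sketch of one way to do this from refGene.txt rows is given below. It assumes the standard UCSC refGene column order (chrom, strand, exonStarts, exonEnds, and name2 in columns 3, 4, 10, 11, and 13), groups transcripts by gene symbol, chromosome, and strand, and keeps overlapping exons as separate intervals; these grouping choices and the function names are assumptions, not details given by the authors.

```python
from collections import defaultdict


def parse_refgene_line(line):
    """Extract gene symbol, chromosome, strand, and exons from one refGene row."""
    fields = line.rstrip("\n").split("\t")
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    return {
        "gene": fields[12],
        "chrom": fields[2],
        "strand": fields[3],
        "exons": list(zip(starts, ends)),
    }


def combine_variants(lines):
    """Collapse all transcripts of a gene into one sorted list of distinct exons.

    Keys are (gene symbol, chromosome, strand), so copies of a gene on
    different chromosomes or strands stay separate (an assumption).
    """
    exons = defaultdict(set)
    for line in lines:
        rec = parse_refgene_line(line)
        exons[(rec["gene"], rec["chrom"], rec["strand"])].update(rec["exons"])
    return {key: sorted(value) for key, value in exons.items()}
```

After decompressing refGene.txt.gz, passing the open file object to `combine_variants` returns one entry per gene; each entry can then be drawn as a single gene model.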