
Table 3 Sources of coordinate locations of various DNA features used in this study

From: Detection and characterization of constitutive replication origins defined by DNA polymerase epsilon

Feature

Source

RFD profiles

RFD profile data for HeLa and GM06990 cells were downloaded from the database described in [31]. Positions of replication origins marked with red rectangles are based on BED files with ORI positions provided by the authors.

Known genes

Exon locations for hg19 used to make gene visualizations were obtained from the UCSC FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz. Alternative splice variants for a particular gene were combined into a single entry that contained all possible exons.

GC content

GC content was calculated in 1-kb intervals for the hg19 reference genome based on the BSgenome.Hsapiens.UCSC.hg19 and seqinr [74] R libraries.

Skew profile (S) and replication origins

Compositional skew was calculated according to the specifications from [21], in 1-kb intervals across the entire hg19 reference genome.

Replication time

Replication time data for hg19, from 15 cell lines, obtained in the ENCODE project, were downloaded from GEO (ID: GSE34399) in a bigWig file format. The data represent smoothed wave signals for 1-kb windows, obtained in the Repli-seq experiment.

Sequence conservation

Sequence conservation data were obtained from the phastCons100way UCSC track using the phastCons100way.UCSC.hg19 Bioconductor library [75].

Histone marks

Histone marks, including H2az, H3k27ac, H3k27me3, H3k36me3, H3k4me1, H3k4me2, H3k4me3, H3k79me2, H3k9ac, H3k9me3, and H4k20me1, were downloaded from the UCSC table browser, for the K562 cell line.

CpG islands

CpG island locations for hg19 were downloaded from AnnotationHub (AH5086 track) using the AnnotationHub Bioconductor library [76].

Isochores

Isochore locations for hg19 were downloaded from the isochores database (https://bioinfo2.ugr.es/isochores) [32], and each isochore was assigned to one of five groups (L1, L2, H1, H2, H3) based on its average GC content (%), according to the following thresholds: L1 ∈ [0, 37); L2 ∈ [37, 41); H1 ∈ [41, 46); H2 ∈ [46, 53); H3 ∈ [53, 100).

DNase hypersensitivity sites

DNase hypersensitivity peaks originating from the ENCODE project were downloaded from the UCSC table browser, for the K562 cell line [77].

Repeats

Repeat sequence locations for hg19 were downloaded from AnnotationHub (AH5122 track) using the AnnotationHub Bioconductor library [76].

Simple repeats

Locations of simple repeats for hg19 were downloaded from AnnotationHub (AH5124 track) using the AnnotationHub Bioconductor library [76].

Alu sequences

Alu sequences are the subset of the Repeats track that contains all repeats from the Alu family (37 types).

G-quadruplexes

G-quadruplex locations for the hg19 reference genome were downloaded from GEO (ID: GSE110582). The G4 locations originate from a study based on G4-seq [59].

Transcription start sites (TSS)

Locations of transcription start sites (TSS) were determined based on the UCSC gene annotation file downloaded from the FTP server: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/refGene.txt.gz

Chromatin loops

Chromatin loop data were obtained from the GEO database (ID: GSE63525) for K562 cells [61].

S/MARs

Locations of scaffold/nuclear matrix attached regions (S/MARs) for hg19 were downloaded from the MARome database in BED file format [62].

TADs

Genomic coordinates (hg19) of TADs mapped in 8 cell lines were downloaded from the ENCODE project website at https://www.encodeproject.org/search/?type=Experiment&assay_title=Hi-C [48].

Genomic coordinates (hg19) of TADs mapped in human hESC and IMR90 cells were downloaded from the RenLab website at http://chromosome.sdsc.edu/mouse/hi-c/download.html [49].
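
Three of the derived features in the table (isochore groups, 1-kb GC content, and combined exon sets for known genes) involve a small computation rather than a plain download. The Python sketch below illustrates one way each could be computed. It is not the authors' code, which relied on R libraries. All function names are hypothetical, and the handling of ambiguous bases, of a GC value of exactly 100, and of overlapping exons are assumptions made for illustration.

```python
from bisect import bisect_right

# Lower bounds (percent GC) of the isochore groups listed in the table.
ISOCHORE_BOUNDS = [0, 37, 41, 46, 53]
ISOCHORE_NAMES = ["L1", "L2", "H1", "H2", "H3"]


def isochore_family(gc_percent):
    """Map an average GC percentage to L1, L2, H1, H2 or H3.

    Assumption: a value of exactly 100 is accepted and assigned to H3,
    although the table gives the H3 interval as half-open, [53, 100).
    """
    if not 0 <= gc_percent <= 100:
        raise ValueError("GC content must be a percentage in [0, 100]")
    # bisect_right counts how many lower bounds are <= gc_percent.
    return ISOCHORE_NAMES[bisect_right(ISOCHORE_BOUNDS, gc_percent) - 1]


def gc_content_windows(sequence, window=1000):
    """Return the GC fraction of consecutive, non-overlapping windows.

    Assumption: bases other than A, C, G, T (for example N) are ignored;
    a window with no called bases yields None.
    """
    fractions = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / called if called else None)
    return fractions


def merge_exons(transcripts):
    """Combine the exons of all splice variants of one gene.

    `transcripts` is a list of transcripts, each a list of (start, end)
    exon intervals. Assumption: overlapping or touching exons are merged
    into a single interval; the table only states that all exons of all
    variants were combined into one entry.
    """
    intervals = sorted(tuple(exon) for tx in transcripts for exon in tx)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, `isochore_family(44.2)` returns `"H1"`, and `merge_exons([[(100, 200), (300, 400)], [(150, 250)]])` returns `[(100, 250), (300, 400)]`.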