Table 2 Summary of the methods for defining positive and negative sets for model training and candidate CRMs for genome-wide predictions

From: Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

| Methods | Labels | Positive set | Negative set | Classifier | Epigenetic marks data used | CRM candidates |
| --- | --- | --- | --- | --- | --- | --- |
| Our method | TF binding | CRMs overlapping TF binding peaks | Randomly selected non-CRMs or CRMs not overlapping STARR-seq peaks | LR | CA, H3K27ac, H3K4me1, H3K4me3 | Predicted CRMs |
| Matched Filter | STARR-seq & H3K27ac peaks | 2-kb regions around STARR-seq peaks overlapping H3K27ac or CA peaks | Randomly selected 2-kb bins not overlapping STARR-seq and H3K27ac/CA peaks | SVM, random forest, ridge regression | CA, H3K9ac, H3K27ac, H3K4me1, H3K4me2, H3K4me3 | 2-kb sliding windows |
| REPTILE | EP300 binding | DMRs in ±1-kb regions around the summits of top EP300 peaks | Randomly selected 2-kb bins not overlapping EP300 peaks | Random forest | mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac | 2-kb sliding windows with 100-bp step size |
| RFECS | EP300 binding | ±1-kb regions around the summits of top EP300 peaks | Randomly selected 2-kb bins not overlapping EP300 peaks | Random forest | mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac | 2-kb sliding windows with 100-bp step size |
| DELTA | EP300 binding and promoters | Top EP300 peaks and all known promoters | Randomly selected 2-kb bins not overlapping EP300 peaks and promoters | AdaBoost | mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac | 2-kb sliding windows with 100-bp step size |
| CSI-ANN | EP300 binding or known CRMs | Known CRMs or top EP300 peaks | Randomly selected 2-kb bins | Neural network | mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac | 2-kb sliding windows with 100-bp step size |
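
Several rows above share two operations: tiling the genome into 2-kb sliding windows with a 100-bp step to obtain CRM candidates, and randomly drawing 2-kb bins that do not overlap a set of peaks to obtain a negative set. The following is a minimal sketch of those two operations only, not the implementation of any of the listed tools. The function names, the 0-based half-open coordinate convention, the fixed non-overlapping bin grid, and the toy chromosome length and peak are all assumptions made for illustration.

```python
import random


def sliding_windows(chrom_length, window=2000, step=100):
    """Yield half-open (start, end) windows that fit entirely on the chromosome."""
    for start in range(0, chrom_length - window + 1, step):
        yield (start, start + window)


def overlaps(a, b):
    """Return True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def sample_negative_bins(chrom_length, peaks, n, bin_size=2000, seed=0):
    """Randomly pick up to n bins, from a fixed grid, that overlap no peak."""
    bins = [(s, s + bin_size)
            for s in range(0, chrom_length - bin_size + 1, bin_size)]
    candidates = [b for b in bins if not any(overlaps(b, p) for p in peaks)]
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(candidates, min(n, len(candidates)))


# Toy example: a 10-kb "chromosome" with one hypothetical peak.
windows = list(sliding_windows(10_000))
negatives = sample_negative_bins(10_000, peaks=[(2500, 2600)], n=3)
```

On real data the linear scan in `sample_negative_bins` would be slow; an interval tree or a dedicated interval tool would likely be used instead, and the sampling scheme of each published method may differ from this one.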