Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

Table 2 Summary of the methods for defining positive and negative sets for model training and candidate CRMs for genome-wide predictions

Methods	Labels	Positive set	Negative set	Classifier	Epigenetic marks data used	CRM candidates
Our method	TF binding	CRMs overlapping TF binding peaks	Randomly selected non-CRMs or CRMs not overlapping STARR-seq peaks	LR	CA, H3K27ac, H3K4me1, H3K4me3	Predicted CRMs
Matched Filter	STARR-seq & H3K27ac peaks	2-kb regions around STARR-seq peaks overlapping H3K27ac or CA peaks	Randomly selected 2-kb bins not overlapping STARR-seq and H3K27ac/CA peaks	SVM, random forest, ridge regression	CA, H3K9ac, H3K27ac, H3K4me1, H3K4me2, H3K4me3	2-kb sliding windows
REPTILE	EP300 binding	DMRs in ±1-kb regions around the summits of top EP300 peaks	Randomly selected 2-kb bins not overlapping EP300 peaks	Random forest	mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac	2-kb sliding windows with 100-bp step size
RFECS	EP300 binding	±1-kb regions around the summits of top EP300 peaks	Randomly selected 2-kb bins not overlapping EP300 peaks	Random forest	mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac	2-kb sliding windows with 100-bp step size
DELTA	EP300 binding and promoters	Top EP300 peaks and all known promoters	Randomly selected 2-kb bins not overlapping EP300 peaks and promoters	AdaBoost	mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac	2-kb sliding windows with 100-bp step size
CSI-ANN	EP300 binding or known CRMs	Known CRMs or top EP300 peaks	Randomly selected 2-kb bins	Neural network	mCG, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K9ac, H3K27ac	2-kb sliding windows with 100-bp step size
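
Several rows of the table describe the same two building blocks: candidate regions tiled as 2-kb sliding windows with a 100-bp step, and negative sets drawn as random 2-kb bins that avoid a set of peaks. The following is a minimal sketch of those two steps, not the implementation used by any of the listed methods. The function names (`sliding_windows`, `overlaps`, `sample_negative_bins`), the half-open coordinate convention, the fixed seed, and the retry limit are all assumptions made for illustration.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical convention: half-open intervals [start, end) on one chromosome.
Interval = Tuple[int, int]


def sliding_windows(chrom_len: int, size: int = 2000, step: int = 100) -> Iterator[Interval]:
    """Yield every full-length window of `size` bp, advancing by `step` bp."""
    for start in range(0, chrom_len - size + 1, step):
        yield (start, start + size)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def sample_negative_bins(
    chrom_len: int,
    peaks: List[Interval],
    n: int,
    size: int = 2000,
    seed: int = 0,
    max_tries: int = 10000,
) -> List[Interval]:
    """Draw up to `n` random bins of `size` bp that overlap none of `peaks`.

    Rejection sampling with a retry limit (an assumption of this sketch);
    fewer than `n` bins are returned if the limit is reached first.
    """
    rng = random.Random(seed)
    bins: List[Interval] = []
    tries = 0
    while len(bins) < n and tries < max_tries:
        tries += 1
        start = rng.randrange(0, chrom_len - size + 1)
        candidate = (start, start + size)
        if not any(overlaps(candidate, p) for p in peaks):
            bins.append(candidate)
    return bins


if __name__ == "__main__":
    # Toy 10-kb chromosome with one made-up peak.
    windows = list(sliding_windows(10_000))
    negatives = sample_negative_bins(10_000, peaks=[(3000, 3500)], n=5)
    print(len(windows), windows[0], windows[-1])
    print(negatives)
```

The linear scan over `peaks` is adequate for a toy example; on genome-scale peak sets an interval index or a tool built for interval arithmetic would be the more practical choice.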

ISSN: 1741-7007