Table 1 Methods for defining seven pairs of positive/negative training sets in a cell type with STARR-seq data available

From: Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

| Methods | Labels | Size (sequences) | Positive set | Negative set |
| --- | --- | --- | --- | --- |
| CRM+TF+/Non-CRM | TF binding | 17,558–272,128 | CRMs overlapping TF binding peaks | Randomly selected non-CRMs |
| CRM+TF+/CRM+S− | TF binding | 17,558–272,128 | CRMs overlapping TF binding peaks | CRMs that cannot be predicted to be active and do not overlap STARR peaks |
| CRM+S+/Non-CRM | STARR peaks | 22,610–71,176 | CRMs overlapping STARR peaks | Randomly selected non-CRMs |
| CRM+S+/CRM+S− | STARR peaks | 22,610–71,176 | CRMs overlapping STARR peaks | CRMs that cannot be predicted to be active and do not overlap STARR peaks |
| Bin+S+/Bin+S− | STARR peaks | 60,668–109,118 | 700-bp bins overlapping STARR peaks | 700-bp bins not overlapping STARR peaks |
| Bin+ac+/Bin+ac− | H3K27ac peaks | 175,530–1,688,868 | 700-bp bins overlapping H3K27ac peaks | 700-bp bins not overlapping H3K27ac peaks |
| Bin+S+&ac+/Bin+S−&ac− | STARR & H3K27ac peaks | 7,220–49,462 | 700-bp bins overlapping STARR & H3K27ac peaks | 700-bp bins not overlapping STARR & H3K27ac peaks |
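
The bin-based definitions in the last rows of the table can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`make_bins`, `overlaps_any`, `split_bins`), the toy coordinates, and the overlap criterion (at least 1 bp shared between half-open intervals) are assumptions made for illustration. It also assumes that, for the Bin+S+&ac+/Bin+S−&ac− pair, positives overlap both peak types and negatives overlap neither, as the labels suggest.

```python
BIN_SIZE = 700  # bin width in bp, as stated in the table


def make_bins(chrom_length, bin_size=BIN_SIZE):
    """Tile [0, chrom_length) into consecutive half-open bins.

    A trailing partial bin is dropped (an assumption of this sketch).
    """
    return [(s, s + bin_size)
            for s in range(0, chrom_length - bin_size + 1, bin_size)]


def overlaps_any(interval, peaks):
    """Return True if the half-open interval shares >= 1 bp with any peak."""
    start, end = interval
    return any(p_start < end and start < p_end for p_start, p_end in peaks)


def split_bins(bins, starr_peaks, ac_peaks):
    """Assumed Bin+S+&ac+ / Bin+S-&ac- rule.

    Positive: the bin overlaps both a STARR peak and an H3K27ac peak.
    Negative: the bin overlaps neither. Bins overlapping only one
    peak type are left out of both sets.
    """
    positives, negatives = [], []
    for b in bins:
        has_starr = overlaps_any(b, starr_peaks)
        has_ac = overlaps_any(b, ac_peaks)
        if has_starr and has_ac:
            positives.append(b)
        elif not has_starr and not has_ac:
            negatives.append(b)
    return positives, negatives


# Toy example on a hypothetical 3,500-bp sequence (made-up coordinates).
bins = make_bins(3500)
starr = [(100, 300), (1500, 1600)]
h3k27ac = [(250, 800), (2200, 2300)]
pos, neg = split_bins(bins, starr, h3k27ac)
print(pos, neg)
```

The single-mark pairs (Bin+S+/Bin+S− and Bin+ac+/Bin+ac−) follow from the same helper by testing one peak list only. The linear scan in `overlaps_any` keeps the sketch short; for genome-scale peak sets an interval-overlap tool or a sorted-interval search would be a more practical choice.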