Fig. 1 | BMC Biology

Fig. 1

From: Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice

Schematic of our two-step approach and the workflow of our machine-learning classifier models. In the first step, dePCRM2 maps (1-1) 1-kbp binding peaks in all available TF binding data to the genome and then partitions (1-2) the peak-covered genome regions into a CRM set (solid lines) and a non-CRM set (dotted lines). In a given cell/tissue type, a subset of the CRMs in the genome is active (red lines in the red box), while the remaining subset is non-active (blue lines in the blue box). Next, dePCRM2 predicts (1-3) a subset of the active CRMs in the cell type to be active based on their overlaps with available TF binding peaks in that same cell type (red lines with two binding peaks of paired-end TF ChIP-seq reads), while dePCRM2 typically cannot predict the remaining active CRMs to be active due to the lack of TF binding data (red lines without two binding peaks of paired-end TF ChIP-seq reads). In the second step, we construct (2-1) a positive set (CRM+TF+) using the active CRMs predicted by dePCRM2 in the cell type, and a negative set either by randomly selecting predicted non-CRMs in the genome (Non-CRM), or by using the putative CRMs in the genome that do not overlap STARR-seq peaks in the cell type and cannot be predicted to be active by dePCRM2 (CRM+S−). We compute feature vectors (2-2) and train (2-3) a classifier model using a few epigenetic marks on the positive and negative sets in the cell type, or on pooled positive and negative sets from multiple cell types. We then use (2-4) the trained model to predict the functional states of all the CRMs whose functional states cannot be predicted by dePCRM2 in the cell type (both red and black lines in the green box) or in any other cell type.
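
The set construction in step 2-1 can be illustrated with a minimal sketch. This is not the dePCRM2 implementation: the function names, the half-open `(chromosome, start, end)` interval representation, the rule that any overlap of at least one base pair counts, and the choice to draw as many negatives as positives are all assumptions made for illustration only.

```python
import random
from typing import List, Tuple

# Hypothetical interval type: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share at least one base (assumed criterion)."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def build_training_sets(
    crms: List[Interval],
    non_crms: List[Interval],
    tf_peaks: List[Interval],
    seed: int = 0,
) -> Tuple[List[Interval], List[Interval]]:
    """Sketch of step 2-1: positive set (CRM+TF+) and Non-CRM negative set.

    Positives are CRMs that overlap any TF binding peak from the cell type.
    Negatives are sampled at random from predicted non-CRMs; matching the
    number of positives is an assumption, not something the caption states.
    """
    positives = [c for c in crms if any(overlaps(c, p) for p in tf_peaks)]
    rng = random.Random(seed)
    n_negatives = min(len(positives), len(non_crms))
    negatives = rng.sample(non_crms, n_negatives)
    return positives, negatives
```

The quadratic overlap scan keeps the sketch short; for genome-scale inputs a sorted sweep or an interval tree would be the usual choice.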
