 Research Article
 Open Access
 Published:
A framework for modelling gene regulation which accommodates nonequilibrium mechanisms
BMC Biologyvolume 12, Article number: 102 (2014)
Abstract
Background
Gene regulation has, for the most part, been quantitatively analysed by assuming that regulatory mechanisms operate at thermodynamic equilibrium. This formalism was originally developed to analyse the binding and unbinding of transcription factors from naked DNA in eubacteria. Although widely used, it has made it difficult to understand the role of energydissipating, epigenetic mechanisms, such as DNA methylation, nucleosome remodelling and posttranslational modification of histones and coregulators, which act together with transcription factors to regulate gene expression in eukaryotes.
Results
Here, we introduce a graphbased framework that can accommodate nonequilibrium mechanisms. A generegulatory system is described as a graph, which specifies the DNA microstates (vertices), the transitions between microstates (edges) and the transition rates (edge labels). The graph yields a stochastic master equation for how microstate probabilities change over time. We show that this framework has broad scope by providing new insights into three very different ad hoc models, of steroidhormone responsive genes, of inherently bounded chromatin domains and of the yeast PHO5 gene. We find, moreover, surprising complexity in the regulation of PHO5, which has not yet been experimentally explored, and we show that this complexity is an inherent feature of being away from equilibrium. At equilibrium, microstate probabilities do not depend on how a microstate is reached but, away from equilibrium, each path to a microstate can contribute to its steadystate probability. Systems that are far from equilibrium thereby become dependent on history and the resulting complexity is a fundamental challenge. To begin addressing this, we introduce a graphbased concept of independence, which can be applied to subsystems that are far from equilibrium, and prove that historydependent complexity can be circumvented when subsystems operate independently.
Conclusions
As epigenomic data become increasingly available, we anticipate that gene function will come to be represented by graphs, as gene structure has been represented by sequences, and that the methods introduced here will provide a broader foundation for understanding how genes work.
Background
A quantitative approach to analysing gene regulation in terms of the interactions between transcription factors (TFs) and DNA was first developed for λ repressor in Escherichia coli [1]. In the eubacterial context, TFs bind and unbind from naked DNA and it was assumed that these processes quickly reach thermodynamic equilibrium. Equilibrium statistical mechanics could then be used to calculate the probability of DNA microstates, or patterns of TF binding to DNA. The generegulation function, which expresses the dependence of mRNA transcription rate on the concentrations of the TFs, was then calculated as an average over the microstate probabilities. This equilibrium “thermodynamic formalism” has been widely used to analyse gene regulation in eubacteria [2][6].
Eukaryotic genomes use several mechanisms that dissipate energy. These include epigenetic mechanisms, such as DNA methylation, nucleosome remodelling and posttranslational modification and demodification of histones, transcription factors, transcriptional coregulators and components of the transcriptional machinery, like RNA polymerase or Mediator. In each case, energy is expended to operate the mechanism, through consumption of intermediary metabolites such as ATP. Background metabolic processes maintain the concentration of such metabolites, thereby providing the free energy required away from thermodynamic equilibrium.
Despite the presence of such nonequilibrium mechanisms, the thermodynamic formalism has been widely used to analyse gene regulation in eukaryotes, including yeast [7], flies [8][13] and human cells [14], and has been extensively reviewed [15][19]. In most cases, nonequilibrium mechanisms have not been incorporated in these models. An exception has been work on nucleosome positioning [18], for which the argument was made that energy dissipation is used primarily to overcome energy barriers, after which nucleosomes and transcription factors reach equilibrium in competing for DNA, thereby allowing treatment within the thermodynamic formalism. While initially successful, more recent experimental work suggests that this does not fully explain nucleosome positioning and that it is important to take energy dissipation into account [20],[21]. Several other recent studies have also begun to raise doubts about the validity of the equilibrium assumption [22][24].
The biological significance of energy dissipation is broadly understood; it is essential for life. Its deeper implications for the molecular context were first clarified by John Hopfield in a seminal study [25]. He showed that if a molecular mechanism operated at equilibrium, then there was an absolute upper bound to how well it could carry out certain informationprocessing tasks, such as achieving fidelity in mRNA or protein production. The source of this upper bound was the property of detailed balance (discussed below), which is a fundamental physical constraint on equilibrium systems. To get beyond this upper bound, it is essential to expend energy and to drive the system away from equilibrium so that detailed balance no longer holds. Hopfield put forward a kinetic proofreading scheme, which he showed could achieve unlimited error correction by expending sufficient energy. Subsequent work has refined this scheme [26],[27] but the limitation in the capabilities of equilibrium mechanisms has been a fundamental insight.
Despite this understanding, the significance of nonequilibrium mechanisms in gene regulation remains unclear. Energy must evidently be expended to pack DNA into the nucleus and to organise chromatin mechanically but it seems unlikely that evolution would not also take advantage of energy dissipation for cellular information processing. From a different perspective, increasing amounts of epigenomic data are becoming available through highthroughput experimental projects [28][30]. Without being able to analyse rigorously the nonequilibrium mechanisms that give rise to such data, it seems unlikely that we will fully understand the epigenomic capabilities of eukaryotic DNA, whose role in both development and evolution is of considerable interest [31][33].
One of the barriers to progress here has been the absence of a mathematical framework that can accommodate nonequilibrium mechanisms in gene regulation. We have developed a graphbased, “linear framework” for timescale separation in biochemical systems [34][38], which is not limited to thermodynamic equilibrium. We show here how this can be adapted to the nonequilibrium mechanisms that are found in gene regulation. The framework yields a stochastic master equation for the probabilities of DNA microstates. An important feature of this equation is that it is linear (hence, “linear framework”). The nonlinearities that are always present in biochemical systems are accommodated through labels on the edges of the graph, without the need for any approximation. If a system is at equilibrium, the linear framework reduces to the thermodynamic formalism. The framework offers a chemist’s perspective in terms of reactions and rates in place of a physicist’s perspective in terms of states and free energies, and exploits graph theory to calculate the steadystate probabilities of microstates.
The catalytic production of mRNA by RNA polymerase is fundamentally irreversible and dissipative. In the thermodynamic formalism, the rate of mRNA expression is treated as an average over the equilibrium states. With the framework introduced here, the dissipative steps taken by mRNA polymerase can be explicitly included in the model, when required. What is not addressed here are the dynamics of mRNAs and proteins and the resulting important issue of gene expression noise [39],[40]. This has only recently been analysed in the context of gene regulatory architecture [41],[42]. It is possible to accommodate the numbers of mRNA and protein molecules within a graphbased framework but this requires infinite graphs in contrast to the finite graphs used here. The question of whether the graphtheoretic methods introduced here can be extended to infinite graphs is very interesting but lies outside the scope of the present paper.
We have three broad aims here. First, we want to introduce the new framework and show that it can be broadly applied to different types of problems in gene regulation and chromatin organisation. We use it to analyse systematically three very different ad hoc models: of steroidhormone responsive genes where detailed balance is still assumed, of inherently bounded chromatin domains where dissipation is critical but no specific gene is being regulated and of regulation of the yeast PHO5 gene where nonequilibrium nucleosome remodelling is explicitly included and detailed balance cannot be assumed. Second, we show that the generegulation function of PHO5 is surprisingly complex. We are able to explain this complexity as an inherent feature of nonequilibrium systems, which arises from the dependence on history away from equilibrium. The scope of this complexity appears not to have been experimentally explored and may reflect informationprocessing capabilities that could not be achieved at equilibrium. Our third aim is to begin the study of graphs that exhibit reduced complexity. We formulate a graphtheoretic concept of independence for nonequilibrium systems and show that historydependent complexity collapses when systems operate independently of each other.
To make this paper broadly accessible, we begin with a nontechnical description of the framework, introducing some key concepts and explaining how graph structures provide useful qualitative insights. We then explain how graphs are constructed in terms of specific biochemical processes acting on DNA and chromatin. The quantitative calculation of steadystate probabilities relies on previous work, which is brought together in the next section to make the paper as selfcontained as possible. The remaining sections work through the results described above.
Results
A graphtheoretic view of gene regulation
We offer in this section a nontechnical account of the linear framework as applied to gene regulation. The technical details are provided, along with references, in the section on ‘Calculating microstate probabilities at steady state’.
The framework starts with a labelled, directed graph consisting of a collection of vertices with directed edges between pairs of vertices and labels on the edges (Figure 1, bottom). The graphs considered here have only finitely many vertices and the edges always go between distinct vertices, so that there are no selfloops. It is further assumed that each graph is connected, which means that, given any two vertices, there is always a path of edges between them, ignoring edge directions. A connected graph is not in disjoint pieces.
The vertices of the graph correspond to microstates, or snapshots of DNA and its accompanying proteins. Figure 1 (top) shows the range of features that can potentially be found in a microstate, including TFs, transcriptional coregulators, RNA polymerase, nucleosomes, chromatin remodelling enzymes, DNA looping, various forms of posttranslational modification and DNA methylation. The directed edges correspond to transitions between microstates arising from biochemical reactions taking place on chromatin, such as the binding and unbinding of TFs or coregulators or posttranslational modification or demodification of proteins bound to DNA. Directed graphs of this kind are often found in the literature as qualitative summaries of the behaviour of regulatory mechanisms. Such cartoons can be given a rigorous mathematical basis through the methods introduced here.
The labels on the edges supply quantitative information in the form of effective rate constants for the corresponding transitions. Each label has units of inverse time, as in per second. The rate of some transitions, such as binding events, can depend on the concentration of components in solution around DNA. The labels can therefore be compound expressions involving component concentrations as well as kinetic parameters. In this way biochemical nonlinearity is accommodated in the labels. An important feature of the framework is that the numerical values of the parameters do not have to be known in advance. They can be treated as symbols and many properties of the system can be calculated in symbolic form. This permits analysis without having to measure or estimate the actual values of the parameters.
The level of granularity used for the microstates, and the corresponding transitions, is a matter of choice. It can range from coarsegrained descriptions of open and closed chromatin to finegrained descriptions of DNA sequence, individual nucleosomes and specific histone modifications. The choice depends on the context, the available experimental methods and data and the biological questions being asked. The graph constitutes a mathematical model of the system being studied and is best thought of not as a description of reality but as a precise statement of the assumptions being made about that reality – a hypothesis – from which rigorous deductions can be made and experiments proposed [43].
Because there is only one molecule of DNA, the dynamical behaviour of microstates has to be understood in terms of probabilities. If we imagine watching DNA over time, the microstates will fluctuate as transitions take place due to random molecular events, such as binding or unbinding of components. Let us denote the probability of the system being in microstate i at time t by u _{ i }(t). The following thought experiment may help to interpret this quantity. Imagine a large number of copies of the system being created in the identical starting condition at time 0, with the same initial microstate and the same protein components present in the surrounding solution at the same concentrations. As time progresses, the randomness of molecular events will cause the different copies of the system to diverge so that different microstates will be found in each system copy. The proportion of copies in which microstate i is found at time t is an approximation for u _{ i }(t) and this approximation becomes more accurate as the number of copies is increased. In other words, u _{ i }(t) measures how often microstate i will be found at time t, were it possible to repeatedly replay the system from its initial condition at time 0.
Probabilities can appear difficult to reason with but the graphbased framework offers a different way to think about them which may be more familiar. The vertices of the graph are regarded as chemical species with concentrations, the edges as chemical reactions and the labels as rate constants. Each reaction has only a single substrate and only a single product, like an isomerisation, so the graph describes a kind of onedimensional chemistry. This macroscopic interpretation allows us to reason about concentrations and reactions but gives the same results as the microscopic interpretation in terms of probabilities and transitions. In other words, if we imagine placing concentrations of matter at each vertex and allowing the chemistry to work, then the change in concentrations over time is identical to the change in probabilities over time. The only thing we have to remember is that probabilities add up to 1 – the system must be in some microstate – so that the total concentration of matter at all vertices should be kept at 1. Because the reactions only move matter between vertices, and neither create nor destroy it, the total concentration remains the same over time (see Equation 2 below), so we only need to make it 1 to begin with.
It is easy to imagine that, no matter what initial concentrations of matter are distributed over the vertices, the onedimensional chemistry will eventually reach a steady state, in which production and consumption of each species are in balance and the concentration of each species is unchanging. Such a steady state occurs no matter what the structure of the graph. In a general graph, the steady state can depend on the initial concentrations that were chosen at time 0, so that there is a memory of these initial conditions (see the section ‘Formation of an inherently bounded chromatin domain’). However, if the graph is strongly connected, such memory is lost and the steady state becomes independent of the initial conditions and depends only on the structure of the graph. A strongly connected graph is one in which any pair of vertices are connected, both ways, by a path of consecutive edges that all point in the same direction (Figure 2A). In effect, any two vertices can communicate with each other in both directions. Strong connectivity depends only on the edges and not on the labels.
A strongly connected graph can be arbitrarily large and complicated but its onedimensional chemistry is particularly simple. The steadystate concentration of each species can be calculated in terms of the edge labels using certain subgraphs called spanning trees (see Equation 7 below). Among other things, this shows that each microstate in a strongly connected graph has positive probability at steady state: if such a system is watched over time, each microstate will appear at steady state, even if that microstate had zero probability in the initial condition.
A general graph, which is not strongly connected, breaks up naturally into maximal strongly connected subgraphs, or strongly connected components (SCCs) (Figure 2B). Once matter has left a SCC under onedimensional chemistry, it can never return to it, for otherwise the SCC would not be maximal. Hence, matter eventually accumulates on those SCCs from which there is no escape, which are the terminal SCCs. If a microstate is not in a terminal SCC, its steadystate probability is zero: if the system is watched over time, such microstates never appear in the steady state, even if they had positive probability in the initial condition. For the microstates that do lie in terminal SCCs, their steadystate probability may or may not be zero depending on the initial conditions. For instance, if matter is only placed on the vertices of one terminal SCC, it will remain there forever and cannot escape into any other SCC, whose vertices will have zero probability at all times.
A system that reaches thermodynamic equilibrium always has a strongly connected graph. The property of detailed balance, which must always hold at equilibrium, requires that each edge in the graph has a corresponding reverse edge, so that strong connectivity is guaranteed. If the labels on a pair of reversible edges are a and b, then the ratio a/b is a thermodynamic quantity that depends only on the free energy difference between the two microstates (see Equation 6 below). The steadystate probabilities depend only on these thermodynamic ratios and can be calculated as products of the ratios along paths in the graph, without the need for any spanning trees (see Equation 5 below). This gives the same result as equilibrium statistical mechanics. In this way, the framework provides a generalisation of equilibrium statistical mechanics for generegulation systems that are far from equilibrium.
Constructing graphs to describe gene regulation
Linear framework graphs are constructed from labelled edges, which arise from two kinds of transitions, as listed below. The main restrictive assumptions concern the interplay between mechanisms taking place in solution around chromatin and those taking place on chromatin itself. The basic approach is to assume that these can be uncoupled from each other. More relaxed assumptions can be made, using the methods of [35], but at the expense of considerably increased complexity.
Binding transitions
These represent the binding of a component L to a microstate (Figure 3A). The label is a=k[ L], where k is an onrate and [ L] is the free concentration of L. We follow the thermodynamic formalism and assume, first, that components are neither synthesised nor degraded over the timescale of interest so that their total amounts are conserved quantities and, second, that the depletion of L can be ignored, so that the binding of a single molecule of L does not appreciably change its free concentration, [ L]. In other words, [ L]≈L _{tot}. Nonspecific binding to DNA can significantly reduce the free concentration and if this is thought to jeopardise the nodepletion assumption, a more elaborate analysis is needed [36],[44].
Components can also engage in interactions such as oligomerisation. We again follow the thermodynamic formalism and assume that such reactions are fast compared to binding reactions on DNA, so that they have reached a rapid equilibrium. The label on the edge has the form a=k[ X], were k is an appropriate onrate and X is the component form that binds to DNA (Figure 3B). [ X] can be calculated in terms of the concentrations of the underlying components using the rapid equilibrium assumption (Methods).
Nonbinding transitions
These are transitions in which the edge label does not contain a concentration term. They can arise from several different types of biochemical process:

unbinding reactions, in which a component that had previously bound to form the source microstate unbinds, with the offrate as the label (Figure 3C);

allosteric change, in which the conformational state of DNA, or of a component or complex in the microstate, is altered (Figure 3D);

threedimensional chromatin conformation change, such as DNA looping, in which separate parts of a microstate, such as a distal enhancer and a proximal promoter, bind or unbind from each other (Figure 3E), with the respective rate constants as the labels;

nucleosome assembly or disassembly, with the nucleosomes treated, for example, as individual entities (Figure 3F), so that the labels are the aggregated overall rates of the assembly or disassembly pathway;

enzymatic activity, in which an enzyme, which is assumed to be already bound in the source microstate, undertakes a biochemical reaction that alters the microstate, such as posttranslational modification or demodification of a histone, a coregulator or a transcription factor (Figure 3G, H), or methylation or demethylation of DNA (Figure 3I, demethylation is not shown), with the enzyme catalytic rate as the label;

RNA polymerase activity, including transcription initiation, open complex formation, promoter clearance, elongation, pausing, etc.; Figure 3J shows elongation as a single step following initiation but this can be broken down to a finer granularity as required.
Numerical values for the parameters appearing in the labels can sometimes be estimated from experimental data [10],[12],[45]. One of the advantages of the framework is that calculations can be undertaken with symbolic parameters, without having to know numerical values in advance.
Calculating microstate probabilities at steady state
The mathematical details of the linear framework were developed in previous work [35][37], as reviewed in [38]. As this may not be familiar, and to keep this paper as selfcontained as possible, the material is summarised here. Proofs of most of the assertions can be found in [37]. A graph of the kind constructed above, as in Figure 1, gives rise to a linear differential equation that describes how the probabilities of each microstate change in time. We first explain how this differential equation arises and then show how microstate probabilities can be calculated at steady state. The key formulas for the microstate probabilities are Equation 5 at equilibrium and Equation 7 away from equilibrium. We have italicised mathematical concepts that may be unfamiliar and have provided a glossary to explain these in the Methods.
Laplacian dynamics
Suppose we are given a graph G, as in Figure 4A, with vertices indexed 1,…,n. We typically use the index 1 for the reference microstate with no TFs bound and choose the order of the other microstates arbitrarily. The notation $i\stackrel{a}{\to}j$ signifies the edge with label a from source vertex i to target vertex j. A dynamics can be imposed on G in two equivalent ways. In the macroscopic interpretation, the vertices are chemical species and the edges are chemical reactions, which convert source species to target species. The edge labels are rate constants for the corresponding reactions, assuming massaction kinetics. Since each reaction is unimolecular, with only one substrate and one product, this onedimensional chemistry yields a linear dynamics (Figure 4A),
where x(t) is a column vector of species concentrations and $\mathcal{\mathcal{L}}\left(G\right)$ is an n×n matrix whose entries are labels, which is called the Laplacian matrix of G.
Since the dynamics interconverts between species and neither creates matter nor destroys it, the total concentration does not change over time. The dynamics therefore satisfies the conservation law
This corresponds to the columns of the Laplacian matrix adding up to 0 (Figure 4A), so that ${1}^{t}\xb7\mathcal{\mathcal{L}}\left(G\right)=0$, where 1 signifies the allones column vector and ^{t} denotes the transpose operation, which turns a column vector into a row vector.
In the microscopic interpretation, the vertices are microstates, the edges are transitions between microstates and the labels are infinitesimal transition rates for the corresponding edges. This means that, if $i\stackrel{a}{\to}j$ and Δ t is a time interval sufficiently small so that a Δ t<1, then the probability of taking the transition from state i to state j is approximately a Δ t and the approximation gets better as Δ t gets smaller (see Equation 15 in the glossary). This interpretation defines a continuous time, finite state Markov process. A Markov process gives rise to a master equation that describes how the microstate probabilities change over time. This master equation is identical to Equation 1, so that
where u _{ i }(t) is the probability of occurrence of microstate i at time t. The only difference with the macroscopic interpretation is that probabilities must always add up to 1, so that u _{tot}=1 in Equation 2. Matrices of Laplacian type often arise when master equations are used but the underlying graph, from which the Laplacian can always be derived, has not been exploited as we do here.
Steady states
In the macroscopic interpretation, no matter what graph and what initial condition are chosen, the dynamics always reaches a steady state, x ^{∗}, in which production and consumption of each species is exactly balanced, so that, d x ^{∗}/d t=0. By Equation 1, x ^{∗} is in the kernel of the Laplacian matrix: ${x}^{\ast}\in ker\mathcal{\mathcal{L}}\left(G\right)$.
A particularly important case arises when G is strongly connected (Figures 2A and 4B) because the kernel of the Laplacian is one dimensional:
In other words, there is a unique steady state, up to a scalar multiple. Given a basis vector for the kernel, ${\rho}^{G}\in ker\mathcal{\mathcal{L}}\left(G\right)$, it then follows from Equations 2 and 3 that the steadystate probabilities are obtained by normalising the entries of ρ ^{G} to its total amount, ${\rho}_{1}^{G}+\cdots +{\rho}_{n}^{G}=1\xb7{\rho}^{G}$, so that
Such a basis vector ρ ^{G} can be constructed in one of two ways, described next.
At thermodynamic equilibrium
If the graph represents a system that can reach thermodynamic equilibrium, then detailed balance must be satisfied [36]. This requires two conditions to hold. First, the graph must be reversible: if the graph has an edge $i\stackrel{a}{\to}j$, then it must also have a reverse edge, $j\stackrel{b}{\to}i$, corresponding to the same underlying biochemical reaction working in reverse. Note that reversible edges imply that the graph is strongly connected. Second, in any steady state, x ^{∗}, any such pair of reversible edges must be independently at equilibrium, with the forward flux in balance with the reverse flux, irrespective of any other edges involving i and j. Setting the two fluxes to be in balance, it follows that ${x}_{j}^{\ast}=(a/b){x}_{i}^{\ast}$.
To determine ${\rho}_{j}^{G}$, choose any path of reversible edges from vertex 1 to vertex j,
and let ${\rho}_{j}^{G}$ to be the corresponding product of label ratios,
It follows from detailed balance that ${x}_{j}^{\ast}={\rho}_{j}^{G}{x}_{1}^{\ast}$, so that x ^{∗}=λ ρ ^{G} where $\lambda ={x}_{1}^{\ast}$. Hence, ρ ^{G} provides the required basis vector of $ker\mathcal{\mathcal{L}}\left(G\right)$, from which probabilities can be calculated using Equation 4. For this procedure to be consistent, ${\rho}_{j}^{G}$ must be independent of the chosen path from 1 to j. This is ensured by the cycle condition, which is a necessary consequence of detailed balance [36]. It is an important feature of being at thermodynamic equilibrium that history does not matter: any path to a microstate can be used to determine its equilibrium probability.
Equation 5 is equivalent to the thermodynamic formalism through van’t Hoff’s formula. If $i\stackrel{a}{\to}j$ and $j\stackrel{b}{\to}i$, then, at thermodynamic equilibrium,
where Δ G is the freeenergy difference between microstates j and i, R is the molar Boltzmann constant and T is the absolute temperature. The product of label ratios in Equation 5 is transformed, through the exponential function in Equation 6, into a sum of free energies, which determines the free energy of microstate j relative to that of the reference microstate 1. The denominator in Equation 4 is then the partition function of equilibrium statistical mechanics.
Thermodynamic equilibrium requires detailed balance but a graph can satisfy detailed balance without being at equilibrium. For instance, certain graph structures in which each edge is reversible, such as a sequence structure (Figure 5A) or, more generally, a tree structure (Figure 5B), always satisfy detailed balance (Methods). In such a graph the edges may involve dissipative mechanisms. However, although an edge $i\stackrel{a}{\to}j$ is accompanied by a reverse edge $i\stackrel{a}{\to}j$, these edges may not arise from an underlying biochemical reaction operating reversibly but from two separate dissipative reactions, such as phosphorylation and dephosphorylation, each acting irreversibly. The ratio a/b would no longer have a thermodynamic interpretation in terms of a free energy difference, as in Equation 6.
Away from equilibrium
If the graph represents a system that is maintained away from thermodynamic equilibrium, then detailed balance may no longer hold. The graph may have irreversible edges and Equation 5 no longer works. If the graph is strongly connected, a basis vector of $ker\mathcal{\mathcal{L}}\left(G\right)$ can be calculated by the matrixtree theorem, a proof of which is given in the Appendix to [37]. This leads to the following procedure. Let Θ _{ j }(G) be the set of spanning trees of G that are rooted at microstate j. Informally, a tree is a subgraph with no cycles, it is spanning if it reaches every vertex and it is rooted at vertex i if i has no outgoing edges in the tree. Figure 4B gives examples of rooted spanning trees. It is not difficult to see that a graph is strongly connected if, and only if, it has a spanning tree rooted at each vertex and that a spanning tree always has one less edge than the number of vertices in G.
For a strongly connected graph, ${\rho}_{j}^{G}$ may be calculated by multiplying together the labels on the edges of each spanning tree rooted at j and adding up these products over all such spanning trees:
Because a strongly connected graph has at least one spanning tree rooted at each vertex, each entry in the basis vector is positive, so that ${\rho}_{j}^{G}>0$ for each j. Hence, by Equation 4, each microstate has positive steadystate probability. The denominator in Equation 4 provides a nonequilibrium partition function.
Nonstrongly connected graphs
Graphs arising in gene regulation may not always be strongly connected (see the section ‘Formation of an inherently bounded chromatin domain’ and Figure 6C). Steadystate probabilities for nonstrongly connected graphs can be calculated by considering the SCCs of G (Figures 2B and 4C). The SCCs inherit connections from the underlying graph but these connections can never form a cycle, for otherwise the SCCs would collapse into each other. It is therefore possible to identify terminal SCCs, from which there are no outgoing connections. The terminal SCCs yield steady states in the following way.
Let T _{1},…,T _{ t } denote the terminal SCCs. Each T _{ k } is by definition strongly connected, so that it has a basis vector ${\rho}^{{T}_{k}}\in ker\mathcal{\mathcal{L}}\left({T}_{k}\right)$, as given by Equation 7. We can now construct the vector ρ ^{G,k} that agrees with ${\rho}^{{T}_{k}}$ on those microstates that lie in T _{ k } and which is zero on all other microstates (Figure 4C). The vectors ρ ^{G,k} provide a basis for the kernel of the Laplacian of G:
The dimension of the kernel is then t, the number of terminal SCCs. Note that, if i is any microstate that is not in a terminal SCC, then ${\rho}_{i}^{G,k}=0$ for each basis vector ρ ^{G,k}.
The t basis vectors in $ker\mathcal{\mathcal{L}}\left(G\right)$ are matched by t conservation laws. In contrast to Equation 2, which is the only conservation law when t=1, the additional conservation laws for t>1 depend on the structure of the graph. These additional laws can be algorithmically calculated from $\mathcal{\mathcal{L}}\left(G\right)$.
Any steady state x ^{∗} can be expressed as a linear combination of the basis vectors in Equation 8. If these vectors are normalised to their respective totals, then, in the resulting expression for x ^{∗},
the coefficients z _{1},…,z _{ t } are the values taken by the t conservation laws.
Calculating gene expression
In the thermodynamic formalism, a rate of gene expression, g _{ i }, is assumed for each microstate i and the overall rate is taken to be proportional to the average over the steadystate microstate probabilities ${u}_{i}^{\ast}$. This average is given by
The same procedure is used for the examples studied here but the linear framework can accommodate the irreversible dynamics of mRNA polymerase (initiation, open complex formation, promoter escape, elongation, pausing, etc.) [17],[49],[50], as shown in Figure 3J. The dynamics of mRNAs and proteins can also be coupled to gene regulation within a graphtheoretic formalism [41]. However, this leads to infinite graphs because the number of mRNA or protein molecules may be unlimited.
Having summarised the linear framework and shown how it generalises the thermodynamic formalism to nonequilibrium contexts, we now discuss three applications that demonstrate the framework’s scope.
Regulation of steroidhormone responsive genes
Ong et al. have put forward a theoretical framework for gene induction [46], motivated by studies of steroidhormone receptors [51]. They use ad hoc methods, which are independent of previous work on gene regulation. We show here how their analysis can be generalised and simplified within the linear framework.
Recent work on steroidhormone sensitive genes has revealed new coregulators, such as the Ubiquitin conjugating enzyme, Ubc9, indicating the existence of multiple steps in addition to hormonereceptor binding to DNA [46]. Despite this additional complexity, generegulation functions [16], which describe how rates of gene expression depend on hormone concentration, are well fitted to Michaelis–Menten style functions, or firstorder Hill dose–response curves (FHDCs) in the language of Ong et al., who use their theoretical framework to derive conditions under which such FHDCs arise.
They consider a sequence of reversible reactions (Figure 5A), representing the behaviour of the promoter of a hormonesensitive gene. Such a sequence graph always satisfies detailed balance (Methods). We consider the more general case of an arbitrary graph G of reversible edges that satisfies detailed balance. This might be, for instance, a tree graph (Figure 5B), which also always satisfies detailed balance (Methods). If a general graph satisfies detailed balance it may not necessarily reach thermodynamic equilibrium and the edges of G may involve dissipative mechanisms.
We assume that components R,U,Y _{1},…,Y _{ m } are present and they can bind and unbind to form the microstates of G. Y _{1},…,Y _{ m } are background components that can engage in protein–protein interactions among themselves, so that their concentrations can appear in labels of the form $\mathrm{k\Phi}\left(\right[\phantom{\rule{0.3em}{0ex}}{Y}_{{i}_{1}}],\dots ,[\phantom{\rule{0.3em}{0ex}}{Y}_{{i}_{k}}\left]\phantom{\rule{0.3em}{0ex}}\right)$, where Φ is some function, as in Figure 3B. The nodepletion assumption allows free concentrations to be replaced by total concentrations, [ Y _{ i }]≈ Y _{i,tot}, so that the labels in which Y _{1},…,Y _{ m } occur are functions of rate constants and total amounts, or “constants”. R and U are titratable components, which, crucially, are assumed to bind at most once in each microstate. U corresponds to a coregulator like Ubc9, which does not engage in protein–protein interactions, so that the corresponding label has the form k ^{′}[ U] (Figure 3A). R corresponds to the steroidhormone receptor, to which the steroid hormone S binds to form a complex RS, which then binds DNA (Figure 3B with S=L and R=M). The label on the corresponding edge has the form k ^{″}[ R S] where
which is a FHDC as a function of [ S].
The main result is that, provided gene expression only occurs from microstates in which both R and U are bound, the average rate of gene expression, g([ S]), as given by Equation 10, is also a FHDC (Additional file 1A),
The constants M _{ G } and K _{ G } have clear interpretations in terms of G. M _{ G } is (evidently) the average rate of gene expression at saturation (i.e., when [ R S]=R _{tot}). Less obviously, K _{ G } is K _{ R } multiplied by the saturation probability of those microstates in which R is not bound. Additional file 1A gives the details of the proof and shows how the formulas in Ong et al. emerge from Equation 11. It also discusses how Ong et al. show, for the special case of a sequence, that g([ S]) remains a FHDC even if the nodepletion assumption is dropped at a concentration limiting step. Ong et al. also address other issues, such as inhibitory reactions, which are not discussed here.
The framework introduced here generalises and clarifies the work of Ong et al., showing how formulas like Equation 11 can be rigorously proved irrespective of the complexity of the underlying graph. The interpretation of the parameters in Equation 11 is new but emerges easily from our analysis (Additional file 1A). However, because detailed balance is assumed, the consequences of being away from equilibrium remain hidden, as we will see subsequently.
Formation of an inherently bounded chromatin domain
Our next application is to a model of chromatin organisation, with no explicit gene regulation. Hathaway et al. recently showed how a bounded chromatin domain could be nucleated in vivo and stably inherited as a form of epigenetic memory [47]. To explain the dynamics of such domains, they developed a mathematical model based on a linear array of 257 nucleosomes [47],[48]. This model is readily translated into our framework. We considered nucleosome arrays with varying numbers of sites n. We placed the nucleation site at the righthand end of our array (Figure 6A). This is essentially similar to the lefthand half of the array of 2n−1 nucleosomes (for n=129) considered by Hathaway et al. The microstates correspond to array marking patterns, of which there are 2^{n}, while the edges correspond to mark nucleation, propagation and turnover (Figure 6A,B). Propagation and turnover were assumed uniform at all nucleosomes, at rates k+ and k_, respectively. However, nucleation was limited to the nucleation site at rate k+, so that some edges are not reversible. This irreversibility reflects the dissipative mechanism of histone marking and the nonequilibrium nature of the model. The graph does not satisfy detailed balance but is strongly connected.
Hathaway et al. used a Monte Carlo simulation to generate stochastically a succession of microstates, from which steadystate probabilities were estimated as the frequencies with which microstates appear. They found that, if k+/k_≤1.5, marking persisted in a stochastically fluctuating but inherently bounded domain near the nucleation site, reflecting what was found experimentally.
Monte Carlo simulation is an efficient method for studying very large graphs: an array of 257 nucleosome has a graph with approximately 10^{77} microstates. However, the linear framework provides mathematical access to the steadystate probabilities for any array size and this yields insights that are not easily found by simulation. For instance, the ratio k+/k_ appears as a convenience in the simulations [48]. However, for a nucleosome array of n sites, the spanning trees in the corresponding graph (Figure 6A) have 2^{n}−1 edges, each of which is labelled k+ or k_. Dividing Equation 7 by ${\left(k\text{\_}\right)}^{{2}^{n}1}$, it is evident that the steadystate probabilities in Equation 4 depend only on the ratio k+/k_ and not on the individual rates. The importance of the ratio becomes readily apparent within our framework.
More significantly, Hathaway et al. proposed a modification to their model to explain the inherited stability of the domain after the nucleating stimulus was removed. They imposed a stabilisation of the nucleosome mark through a transition to a hypothetical new marked state, whose turnover was inhibited (Figure 6C, left). Each nucleosome can now be in one of three states and the graph has 3^{n} microstates (Figure 6C, right, for n=2). Because turnover is prevented by the stabilised mark, the graph is no longer strongly connected. If nucleation is stopped, as was done in the simulation, then the resulting graph has two terminal SCCs, each consisting of a single extreme microstate, one in which the entire nucleosome array is unmarked and the other in which the entire array is stably marked. According to Equation 9, all other microstates have zero steadystate probability.
Which of the two extreme microstates is reached in a simulated trajectory depends on the microstate in which nucleation is stopped. If some nucleosome has become stably marked in that microstate, then it cannot become unmarked, so the trajectory can only reach the completely stably marked microstate. This is likely to happen once the inherently bounded domain is established, unless the stabilisation rate, k ^{∗}, is so low that no stable mark has appeared. In their simulation, Hathaway et al. chose k ^{∗} to be low compared to propagation and turnover but not so low that stable marks had not appeared by the time nucleation was stopped. They concluded that the inherently bounded domain was stably maintained in the absence of the initial nucleating stimulus. Our analysis shows that this conclusion is incorrect. Once nucleation is stopped, the bounded domain becomes a transient phenomenon, which eventually expands to fill the whole array. It is conceivable that a bound on the domain size is maintained for sufficiently long to still be biologically relevant. But this places the stabilising rate k ^{∗} in a double bind: it must be sufficiently high so as to stabilise the domain, yet sufficiently low so as not to destroy its boundedness too quickly. Such finetuning of rate constants is inherently fragile and we think it is more likely that other mechanisms are at work to ensure stable inheritance of the inherently bounded domain.
Our framework allows these conclusions to be reached by elementary mathematical deductions, without the need for the numerical simulations undertaken by Hathaway et al.
Regulation of yeast PHO5
We now turn back to gene regulation and to one of the very few models in which a nonequilibrium mechanism has been rigorously analysed without assuming detailed balance. Pho5 is an acid phosphatase in Saccharomyces cerevisiae that is expressed under phosphatestarvation conditions. Kim and O’Shea undertook a quantitative analysis of PHO5 regulation by the transcription factor Pho4, using a construct detached from the phosphateresponse pathway [52] (Figure 7A).
To calculate the PHO5 generegulation function, Kim and O’Shea constructed a stochastic master equation based on a graph of transitions between DNA states. They pointed out that the nucleosomal transitions were dissipative and in some cases irreversible under their assumptions, so that detailed balance could not be assumed. Accordingly, they determined steadystate probabilities using the Symbolic Math Toolbox in MATLAB.
Kim and O’Shea’s graph of transitions is readily translated into our linear framework (Figure 7B). They assumed that the binding of Pho4 saturates according to a Hill function, which can be accommodated in a similar way to Figure 3B. The nonbinding reactions correspond to unbinding of Pho4 (Figure 3C), or to nucleosomal assembly or disassembly (Figure 3F). The graph is strongly connected, a point not mentioned by Kim and O’Shea, but as noted above for Equation 7, this ensures that the steadystate probability of each microstate is positive. They assumed that PHO5 is transcribed when there is no nucleosome occluding the TATA box, so that, in the average in Equation 10, g _{ i }=1 for the microstates 2, 3, 7, 8, 9 and 12 on the right in Figure 7B and g _{ i }=0 for those on the left. We used our own software written in the programming language Python to enumerate the spanning trees by a fast algorithm and then used the polynomial algebra capabilities of Mathematica to calculate the microstate probabilities and the generegulation function (Methods). This gave an identical result to Kim and O’Shea’s MATLAB calculation (H Kim, personal communication, January 2013). This strongly suggests that what can be done for the yeast PHO5 gene can be systematically undertaken for other genes with nonequilibrium features, with the solution now being understood explicitly through Equation 7, without recourse to MATLAB.
Having calculated the generegulation function using our framework, we sought to compare it to the experimental data acquired by Kim and O’Shea [52]. They used their synthetic construct (Figure 7A, with details in the caption) to measure the PHO5 generegulation function. In response to doxycycline, individual cells expressed Pho4YFP, which was treated as the input to the generegulation function, and this induced the expression of CFP from the Pho4responsive promoter in the construct. CFP was treated as the output as a proxy for Pho5. By using different doses of doxycycline to cover a range of Pho4YFP expression levels, the generegulation function was assembled from singlecell measurements. Kim and O’Shea also measured the generegulation function of five other variant promoters, in which the lowaffinity and highaffinity sites for Pho4 binding were either exchanged or removed.
Kim and O’Shea estimated the threshold and the maximum expression level of each variant by fitting their experimental data to a Hill function, whose Hill coefficient was found to be nearly 2 for all variants. They then fitted the estimated threshold and maximum values to the calculated generegulation function for each variant and found good agreement ([52], Figure 5). We were curious as to how well the generegulation function itself would fit the data. This is a more challenging question because the data are noisy and the generegulation function is very complicated (see below). To address this, we first smoothed the data. We then used numerical optimisation to find excellent quantitative fits to each variant individually (Figure 8, red curves) but could only undertake a manual fit to all variants collectively, which yielded the parameter values in Equation 16 (Methods). The collective fit was considerably poorer (Figure 8, black curves). While this broadly confirms Kim and O’Shea’s more coarsegrained analysis, it also suggests that the individual variants may exhibit more nuanced behaviours, which are better described by distinct parameter values.
Historydependent complexity away from equilibrium
Our analysis revealed further unexpected features of the PHO5 generegulation function. By Equation 7, each ${\rho}_{i}^{G}$ is a sum of distinct product terms (monomials) in the five edge labels (Figure 7B), of the form
Here, α is a positive integer, which records the number of spanning trees having that product of labels, and i _{1},…,i _{5} are nonnegative integers. Because the graph has 12 microstates, each spanning tree has 11 edges, so that the total degree of each monomial is 11: i _{1}+i _{2}+i _{3}+i _{4}+i _{5}=11. By examination of the calculated formulas, the maximal degree of ${k}_{\text{assoc}}^{\ast}$, in which the concentration of Pho4 appears, is 8. Considering only those monomials with this highestorder term, ${\left({k}_{\text{assoc}}^{\ast}\right)}^{8}$, the generegulation function looks like
The simplicity of these highestorder terms is deceptive, however. The numerator of Equation 12 has 261 distinct monomials while the denominator has 500 distinct monomials. Indeed, the graph in Figure 7B has 53,376 spanning trees in total. We see that the calculated PHO5 generegulation function is very complicated – the full details shown in Additional file 1C cover six pages – despite the model having only two binding sites and two nucleosomes. Because Kim and O’Shea did not provide the generegulation function in their original paper, these features are revealed here for the first time.
The linear framework allows us to understand this surprising explosion in complexity. At equilibrium, Equation 5 shows that any single path to a microstate can be used to calculate its steadystate probability. As a physicist would say, free energy at equilibrium is a function of the microstate, not of the route through which that microstate is reached. In marked contrast, away from equilibrium, Equation 7 shows that every spanning tree rooted at that microstate is required. In this case, all routes to the microstate become relevant and microstate probabilities depend in a more intricate way on the structure of the graph. Equation 7 takes care of the bookkeeping. The number of spanning trees increases very rapidly with the size of a graph: the complete undirected graph on n vertices (i.e., the graph in which there is an undirected edge between each pair of distinct vertices) has n ^{n−2} spanning trees in total. This worse than exponential increase manifests itself in the complexity of the PHO5 generegulation function.
It is important to appreciate, however, that it is not the complexity or the size of a graph that is the dominant factor in explaining the complexity found here. If we imposed additional edges on the graph in Figure 7B so as to make all the edges reversible, this would only make the graph more complex. If we then imposed detailed balance, which restricts the values of the parameters, the equilibrium probabilities would be given by Equation 5 rather than Equation 7 and the generegulation function could be written down in a few lines. The complexity uncovered here depends crucially on being far from thermodynamic equilibrium.
Additional study of PHO5 has shown that nucleosomes decouple the threshold for PHO5 expression from its dynamic range [53]. However, this kind of behaviour can be recapitulated within the thermodynamic formalism [54]. This suggests that the full implications of nonequilibrium behaviour, as revealed by the complexity of the PHO5 generegulation function, have not yet been uncovered experimentally. To suggest experimental options, we need ways to decompose the complexity found in Additional file 1C and to attribute aspects of it to specific biochemical mechanisms. Approximation methods may help in particular cases [55] but new ideas are needed for addressing the complexity barrier systematically, to which we now turn.
Graph independence leads to reduced complexity
Gene regulation often takes a modular form, with repeated binding sites, reiterated motifs and multiple enhancers [56],[57]. The microstate probabilities and the resulting generegulation function could become extremely complicated, especially if the modules are operating far from equilibrium. There is, however, one context in which simplification may be expected. This occurs when modules operate independently of each other, so that whatever takes place within one module does not affect what takes place in any other module. For instance, developmental genes are often regulated by multiple enhancers, which sometimes appear to act independently of each other [58].
Within the thermodynamic formalism, independence of binding sites leads to multiplication of the corresponding partition functions (described after Equation 6). For instance, a transcription factor, T, binding to a single site on DNA has the partition function 1+K[ T], where K is the association constant for binding. Suppose that there are m repeated binding sites to which T binds and suppose that each site has the same association constant. If these bindings are independent of each other, then the partition function for the msite system is obtained by simply multiplying the onesite partition function m times, to yield
On the other hand, if the sites are not independent, the partition function takes the more complicated form
where a _{1},…,a _{ m } can be arbitrary numbers. Evidently, the partition function in Equation 13 is considerably less complex and easier to understand. In the light of this result for equilibrium systems, we wanted to find a generalisation in which the modules are no longer individual binding sites but are represented by potentially complex graphs, which may not be at thermodynamic equilibrium. Such modules might correspond, for instance, to independent enhancers.
We used the product graph construction to capture the concept of independence. Let G and H be any two graphs which represent two modules within a gene regulation system. We make no assumptions about the graphs, which do not have to be at equilibrium and do not have to be strongly connected. The product graph G×H is constructed as follows (Figure 9). It has vertices (i,j), where i is a vertex in G and j is a vertex in H. The vertices are enumerated lexicographically, so that (i,j)<(i ^{′},j ^{′}) if either i<i ^{′} or i=i ^{′} and j<j ^{′}. For each labelled edge ${i}_{1}\stackrel{a}{\to}{i}_{2}$ in G and for every vertex j in H, the labelled edge $({i}_{1},j)\stackrel{a}{\to}({i}_{2},j)$ is created in G×H. The retention of the same label a on these edges ensures that the transition from (i _{1},j) to (i _{2},j) occurs independently of j and always at the same rate, which captures the independence assumption. Similarly, for each labelled edge ${j}_{1}\stackrel{a}{\to}{j}_{2}$ in H and for every vertex i in G, the labelled edge $(i,{j}_{1})\stackrel{b}{\to}(i,{j}_{2})$ is created in G×H. These are the only edges in G×H.
If the modules represented by G and H are operating independently of each other, then the graph of the combined system is given by G×H. What can be said about the ρ ^{G×H} in terms of ρ ^{G} and ρ ^{H}? When G and H are both strongly connected, then G×H is also strongly connected and a basis vector in the kernel of the Laplacian is given by
This uses the Kronecker product of two vectors, x⊗y, defined by (x⊗y)_{(i,j)}=x _{ i }y _{ j } (Figure 9). If either G or H are not strongly connected then G×H will not be strongly connected. A basis for the Laplacian kernel of G×H is then given by the Kronecker products ρ ^{G,i}⊗ρ ^{H,j} between each pair of basis vectors from each respective kernel. The precise product theorem is stated and proved in Additional file 1B.
In the example in Figure 9, the product theorem yields polynomials for the components of ρ ^{G×H} that have degree 3 in the labels. Since G×H is strongly connected, ρ ^{G×H} can also be calculated using the matrixtree formula in Equation 7. The resulting polynomials must have degree 5 because G×H has six vertices. However, each of the polynomials from Equation 7 has the same scalar factor of degree 2, given by
which can be divided out to give the much simpler expressions in Figure 9. The basis vectors from the product theorem are substantially less complicated, both in degree and in the numbers of monomials, than those from Equation 7.
This product theorem is important because it shows that a system that is far from equilibrium may still have simple expressions for its microstate probabilities. What is required is that the system has independent modules within it. This suggests a starting point for addressing the complexity challenge identified above, as reviewed further in the Discussion below.
Discussion
The equilibrium thermodynamic formalism has been widely adopted and has been very effective, as reviewed in [15][19]. The value of the new framework introduced here rests on extending this to accommodate nonequilibrium, dissipative mechanisms. Although life itself is fundamentally dissipative – we are only at equilibrium when we are dead – and the importance of dissipation has been broadly understood at the molecular level [25], its significance for gene regulation has remained elusive.
Recent work has started to reveal the limitations of equilibrium assumptions. Gelles and colleagues, using singlemolecule methods on E. coli promoters, assert that ‘it may be necessary to consider that transcription output is a nonequilibrium phenomenon controlled by the kinetic properties of the system, not simply its thermodynamics’ [22]. Lieb and colleagues, using a genomewide competition ChIP assay in yeast, show that thermodynamic quantities are substantially less well correlated with gene expression than kinetic quantities [23]. Reviewing these and other developments, Larson and colleagues state that: ‘Currently, most quantitative theoretical models describe transcriptional regulation as an equilibrium thermodynamic phenomenon.... Here we explain how this description is fundamentally inconsistent with the canonical view of gene regulation’ [24].
Despite these assertions, no specific informationprocessing task has been identified that cannot be achieved at equilibrium and for which nonequilibrium mechanisms are essential. We can suggest three possibilities where that might be the case.
First, the experimental construction of an inherently bounded chromatin domain by Hathaway et al. relies on irreversible, dissipative mechanisms. If their model is forced to be at equilibrium by imposing reversibility of the edges, it can be readily seen that the inherently bounded domain vanishes (Methods). This suggests that dissipation is essential for maintaining a bounded chromatin domain.
Second, recent work indicates that nucleosome positioning may depend crucially on nonequilibrium mechanisms. It has been suggested that both the SWI/SNF and ISWI/ACF chromatin remodelling complexes use an ATPdependent kinetic proofreading scheme to find the correct nucleosomal substrates on which to act [59],[60], in a manner essentially identical to Hopfield’s original scheme [61]. In contrast, as mentioned in the Background, nucleosomes have been treated as competing with transcription factors for binding to DNA within the thermodynamic formalism, ignoring the dissipative aspects [18],[62]. In support of this, Segal and Widom pointed out that in vitro reconstitution experiments using purified histones and genomic DNA, which would be expected to reach equilibrium, reproduce many aspects of in vivo nucleosome organisation. However, it has been a matter of contention as to how closely in vivo nucleosome organisation is matched in vitro. In attempting to resolve these issues, Struhl and Segal [21] point to more recent work [20] in which reconstitution with wholecell extract and ATP, presumably involving ATPdependent nucleosome remodellers, significantly improves in vitro recapitulation. Genetic deletion of nucleosome remodellers also has distinctive effects on nucleosome organisation. Pugh and colleagues suggest, in contrast to Segal and Widom, that ‘the active nucleosome organization in vivo may be at steady state, under the continuous expense of energy, rather than at equilibrium’ [20].
Third, we suggest that the combination of developmental precision and evolutionary plasticity may require nonequilibrium mechanisms. Experimental studies of the early Drosophila embryo suggest that the precision with which the hunchback gene is turned on and off in individual cells, in response to the maternal morphogen Bicoid, is close to the limits set by physics [63]. Nevertheless, the hunchback promoter varies considerably in the numbers and the positions of Bicoid binding sites between different species of Diptera [64], suggesting high evolutionary plasticity. While it may be possible to construct equilibrium mechanisms that achieve high precision, it seems difficult to achieve plasticity also. We speculate that nonequilibrium mechanisms may be essential to achieve both.
The framework that we have introduced here provides the foundation from which to explore such possibilities systematically. It has revealed the profound difference between equilibrium and nonequilibrium mechanisms, prefigured in Hopfield’s earlier work [25], but the remarkable complexity that we have uncovered away from equilibrium presents a formidable challenge. This complexity is fundamental because it arises from the underlying physics: history cannot be ignored away from thermodynamic equilibrium. We see two strategies for addressing this.
First, one strand of research within nonequilibrium statistical mechanics has sought to clarify the relationship between thermodynamic forces and microscopic fluxes within a graphtheoretic formalism [65] (further historical connections are reviewed in [37]). More recent developments in nonequilibrium statistical mechanics [66],[67] may help to decompose the historydependent complexity into physically meaningful components, which may then be experimentally accessible.
Second, from a mathematical perspective, our work shows that the complexity is modulated by the structure of the graph. Independence decreases the complexity, as in Figure 9, as does equilibrium, as in Equation 5. It may be reasonable to assume that some parts of a graph are at equilibrium, with dissipation serving not to maintain these microstates but, rather, to provide access to them over energy barriers, as previously suggested by Segal and Widom for nucleosome positioning [18], while other parts of the graph are maintained far from equilibrium and yet other parts may operate independently. If we could understand how to partition graphs in this way and how such partitioning simplified the steadystate probabilities, then we might have a means to address the complexity problem. We plan to explore these strategies in subsequent work. We anticipate that an interdisciplinary approach, combining biological experiments with physics and mathematics, will be essential to unravel how graph structure gives rise to function in the context of gene regulation.
A flood of new information about nucleosome positions, histone marks and DNA methylation is emerging from wholegenome projects such as ENCODE [28], the NIH Roadmap Epigenomics Project [29] and the European BLUEPRINT project [30]. The thermodynamic formalism has been successfully applied to wholegenome analysis at singlebase pair resolution. The corresponding graphs are even larger than those arising in Hathaway et al.’s study of bounded chromatin domains, with 10^{77} vertices, yet powerful dynamic programming methods allow equilibrium probabilities to be estimated from data [10],[12]. Incorporating nonequilibrium mechanisms on a wholegenome basis may be currently infeasible but similar approximation methods could plausibly be applied to individual genes, for which information may be available on how different molecular mechanisms interact, allowing the structure of the graph to be exploited, as suggested above, to reduce the complexity. We envisage, in this way, that the function of individual genes will come to be represented by mathematical graphs, just as the structure of individual genes has been represented by mathematical sequences. In contrast to sequences, graphs encode dynamics and functionality and their structures will change with our assumptions and data. Our existing sequencebased computational infrastructure may have to evolve to an infrastructure in which such dynamic graphs can be built, interrogated and analysed.
Methods
The experimental data discussed in this paper were obtained solely from the literature.
Calculating labelling functions
Figure 3B shows a sequencespecific transcription factor L that binds DNA only when also bound to a cofactor M. The component form that binds to DNA (which was called X in the main text) is LM. The rate constant for the transition is proportional to the free concentration of X=L M. This free concentration can be calculated by assuming that the binding of L and M,
has reached a rapid equilibrium, independently of the binding of LM to DNA. In this case, b[ L][ M]=c[ L M], so that
It follows that
which gives the formula for Φ([ L]) shown in Figure 3B. Rapid equilibrium amounts to a timescale separation, which uncouples the dynamics of the interactions in solution from those on DNA. The rapid equilibrium equations for more complicated interactions can often be formulated in terms of the linear framework, which can then be used to calculate [ X].
Glossary of mathematical concepts
Markov process. A timevarying probability distribution over a set of states in which the probability of reaching a given state in the next time step depends only on the current state. If time varies continuously then the next time step is interpreted infinitesimally, by taking a small unit of time, Δ t, and letting this tend to zero. The Markov property says that history does not matter in making the choice of which state comes next in time. However, history may be essential for determining the steadystate probabilities, as happens when the system is far from thermodynamic equilibrium.
Infinitesimal transition rate. Suppose that $i\stackrel{a}{\to}j$ is a labelled, directed edge in the graph. Treating the labels as infinitesimal transition rates defines a continuoustime, finite state Markov process, X(t), as follows: in any sufficiently small unit of time, Δ t, the conditional probability of microstate j occurring, given that microstate i has occurred, is a Δ t, to first order in Δ t. More formally,
With this notation, the probability of occurrence of microstate i at time t, which was denoted u _{ i }(t) in the main text, is given by u _{ i }(t)=Pr(X(t)=i).
Master equation. The probability of being in microstate i at time t+Δ t, u _{ i }(t+Δ t), can be calculated in terms of u _{ j }(t) and the infinitesimal transition rate from j to i, taking into account all microstates j that have an edge to i. The resulting differential equation, obtained by letting Δ t→0, which describes the forward evolution of probabilities over time, is the master equation, or Kolmogorov forward equation, of the Markov process [68]. The equivalence between the master equation of X(t) and Laplacian dynamics is proved in ([37], Corollary 2).
Kernel. If M is an n×n matrix acting on column vectors of size n, then the kernel of M, kerM, is the subspace of column vectors that become zero when multiplied by M: kerM={v  M·v=0}.
Strongly connected. In a graph G, vertex i is said to ultimately reach vertex j, denoted i⇝j, if either i=j or there is a path of directed edges from i to j:
Vertex i is said to be strongly connected to j if i⇝j and j⇝i. Strong connectivity is an equivalence relation on the vertices and the equivalence classes are called the SCCs of G. A graph is strongly connected if it has only one SCC. The graph in Figure 4B is strongly connected.
Cycle condition. If a graph describes a system that can reach thermodynamic equilibrium then it must satisfy detailed balance, as described in the main text. If detailed balance holds, then, in any cycle of reversible edges, the product of the labels going clockwise around the cycle must equal the product of the labels going counterclockwise around the cycle. Conversely, if a graph has reversible edges and the cycle conditions holds, then detailed balance is satisfied for any steady state of the graph. This is proved in ([36], Supporting Information).
Sequence/tree of reversible edges. A graph consisting of reversible edges, which are arranged in a sequence (Figure 5A) or, more generally, in a tree structure (Figure 5B), automatically satisfies detailed balance, irrespective of the edge labels. The argument for a sequence was presented in [69] but is easily generalised to a tree. Given a reversible edge, $i\stackrel{a}{\to}j$ and $j\stackrel{b}{\to}i$, and a steady state x ^{∗}, the net flux through the reversible edge is $a{x}_{i}^{\ast}b{x}_{j}^{\ast}$. If the reversible edge is a leaf of the tree structure then there can be no net flux leaving the tree from that edge. Hence, ${x}_{i}^{\ast}=(b/a){x}_{j}^{\ast}$. This reversible edge is therefore at equilibrium. This holds irrespective of the labels a and b. Arguing in this way by induction from the leaves, each reversible edge in the tree is independently at equilibrium, so that detailed balance holds.
Rooted spanning trees. A spanning tree of a graph G is a subgraph that contains each vertex of G (spanning) and that has no cycles when edge directions are ignored (tree). A spanning tree is rooted at vertex j in G if j is the only vertex with no outgoing edges. A graph is strongly connected if, and only if, it has at least one rooted spanning tree at each vertex ([37], Lemma 1). Figure 4B shows a strongly connected graph, together with the spanning trees rooted at each vertex.
Terminal strongly connected components. Let [ j] denote the SCC of G containing vertex j. In other words, [ j] is the equivalence class of vertex j under the relation of strong connectivity, as defined above. The SCC [ i] is said to precede [ j], denoted [ i]≼ [ j], if either [ i]= [ j] or some vertex in [ i] ultimately reaches some vertex in [ j]: i ^{′}⇝j ^{′} where i ^{′}∈ [ i] and j ^{′}∈ [ j]. Precedence defines a partial order on the SCCs of the graph G. We can therefore speak of the terminal SCCs, which are those that do not precede any other SCC. The graph in Figure 4C has three SCCs of which two are terminal (asterisks), while the graph in Figure 6C has five SCCs of which two are terminal (asterisks).
Calculating the PHO5generegulation function
The generegulation function of the PHO5 example was calculated using the matrixtree formula in Equation 7 and is shown in full in Additional file 1C. Software for enumerating spanning trees is available in packages like MATLAB, Mathematica and Maple, but we found these to be incapable of dealing with the large number of trees that arise. We therefore implemented in Python the fast algorithm developed by Takeaki Uno [70]. The resulting program reads a text file containing a description of a graph as a collection of labelled edges and, for each vertex in the graph, writes a text file listing the spanning trees rooted at that vertex. We also implemented an accompanying Mathematica notebook, which reads the graph description and the spanning tree files and assembles each ${\rho}_{i}^{G}$ as a polynomial function of the edge labels. The generegulation function can then be calculated using standard Mathematica functions for manipulating polynomial expressions. The Python program and the Mathematica notebook are freely available from our web site [71].
Fitting to the experimental data of Kim and O’Shea
Kim and O’Shea constructed 12 promoter variants ([52], Figure 3a). Six of these variants place a high affinity (H), low affinity (L) or deleted (X) Pho4binding site in the positions corresponding to UASp1 and UASp2 in Figure 7A. The remaining six variants use sites occluded by nucleosome 3, which is not modelled in Figure 7, and we did not analyse these variants. The wildtype promoter in Figure 7 corresponds to variant LH.
We obtained the experimental data in the form of an Excel spreadsheet [72]. This gives the raw fluorescence values for YFP, CFP and RFP (yellow, cyan and red fluorescent proteins, respectively) for about 400 to 500 cells for each variant under different doxycycline concentrations. The RFP was attached to a chromatin protein to mark the nucleus and the RFP value was used to normalise the YFP and CFP values on a percell basis to control against imaging variations. We used a ±7 moving average to smooth the data and scaled each variant to its maximum expression level for the plots shown in Figure 8.
Each of the six variants gives rise to a graph, which uses the same labels as the wild type (Figure 7B). The labels b and c are the rates of Pho4 dissociation from the lowaffinity and highaffinity sites, respectively. Kim and O’Shea assumed that the Pho4 association rate, a, is the same for both sites. If the Pho4 binding sites are changed in a variant, the labels b and c occur on different edges of the wildtype graph, while if a Pho4 binding site is deleted, some vertices become inaccessible and the graph changes from the 12vertex wildtype graph to a graph with eight vertices. We used the wildtype 12vertex generegulation function and a new eightvertex generegulation function calculated using Equation 7. We then changed the labels b and c in these two generegulation functions, as required, to generate the generegulation function for each of the six variants (details in the accompanying Mathematica notebook).
Kim and O’Shea assumed that the Pho4 association rate, a, is a Hill function of Pho4 concentration given by
so that the generegulation functions depend on six parameters:
These have units of concentration, for K, and inverse time, for the others. We followed Kim and O’Shea in assuming that [ Pho4]=α·nYFP, where nYFP is normalised YFP. The constant of proportionality, α, is not known but can be absorbed into the parameter K. We therefore left K as a dimensional parameter having units of concentration, and used nYFP as the input to the individual generegulation functions. We dedimensionalised the remaining parameters by dividing each by ${k}_{\text{max}}^{\ast}$, thereby replacing each edge label x by $x/{k}_{\text{max}}^{\ast}$, where x is one of a,b,c,d,e, and reducing the number of parameters from six to five. The red curves in Figure 8 were obtained by fitting each variant individually using the Levenberg–Marquardt algorithm in Mathematica. We were unable to do the same for a collective fit because the Levenberg–Marquardt algorithm did not terminate. We therefore used Mathematica to plot the generegulation function overlaid against the corresponding smoothed experimental data for each variant and used the Manipulate Manipulate capability to alter the values of the five parameters manually and to assess the goodness of fit to all the variants visually. We found the following numerical parameter values that yielded the collective fit shown in the black curves in Figure 8,
The Mathematica notebook in which these calculations were undertaken is freely available from our web site [71]. It provides the normalised experimental data, the smoothed experimental data and the individual and collective fits of the variant generegulation functions to the corresponding data.
Imposing equilibrium on the Hodges–Crabtree model
As explained in the main text, to impose equilibrium is to require that detailed balance holds. This means, first, that all edges in the graph must be reversible and, second, that the cycle condition (described in the glossary above) is satisfied. The graph of microstates for an array of three nucleosomes is shown in Figure 6B and we follow the notation introduced there in which microstates are denoted by bit strings, indicating whether (bit = 1) or not (bit = 0) a nucleosome is marked. Edges only occur between microstates that differ by a single bit, corresponding to nucleation or mark propagation, when the number of bits increases by 1 and the edge has label k+, or to mark turnover, when the number of bits decreases by 1 and the edge has label k_ (Figure 6A). Irreversibility only arises for some of the latter edges, when an isolated site, whose immediate neighbours are unmarked, loses its mark (for instance, 5→1, 3→1 and 6→2 in Figure 6B).
To impose reversibility, assume that reverse edges have been introduced into the graph as needed, each with the label k+. To check the cycle condition, choose any cycle of reversible edges from a vertex j back to itself,
In traversing this path, if an edge increases the number of bits in the microstate by 1, then the label encountered must be k+, while if an edge decreases the number of bits by 1, then the label must be k_. Since the path is a cycle, the number of edges with label k+ must equal the number of edges with label k_. Furthermore, for each edge with label k+, respectively, k_, the reverse edge has label k_, respectively, k+. But then the product of the labels going clockwise around the cycle must equal the product of the labels going counterclockwise around the cycle and the cycle condition is satisfied. The graph therefore satisfies detailed balance in any steady state.
Equilibrium probabilities can now be calculated using Equation 5. Let K=k+/k_. Given a microstate j, let β(j) be the number of bits in j that are set to 1. It is easy to construct a path of reversible edges from the reference microstate 1 to microstate j with just β(j) edges, each of which increases the number of bits by 1. Hence, according to Equation 5,
If the number of sites in the array is n, then the partition function is given by
However, there are $\left(\genfrac{}{}{0.0pt}{}{n}{\beta \left(j\right)}\right)$ microstates each having β(j) sites marked, so the partition function may be rewritten as
Another way of seeing this is to note that, when equilibrium is imposed, the system becomes identical to n independent copies of the onesite system. The partition function can then be calculated from the product formula (Equation 14), which is a special case of the product theorem proved in Additional file 1B. It now follows from Equation 4 that the probability of microstate j is given by
We see from this that the probability of a microstate depends only on the number of bits that are marked, rather than which bits are marked and, consequently, there can be no inherent bound on the size of the marked domain.
Additional file
Abbreviations
 FHDC:

firstorder Hill dose–response curve
 SCC:

strongly connected component
 TF:

transcription factor
References
 1.
Ackers GK, Johnson AD, Shea MA: Quantitative model for gene regulation by lambda phage repressor . Proc Nat Acad Sci USA. 1982, 79: 11291133. 10.1073/pnas.79.4.1129.
 2.
Buchler N, Gerland U, Hwa T: On schemes of combinatorial transcription logic . Proc Nat Acad Sci USA. 2003, 100: 51365141. 10.1073/pnas.0930314100.
 3.
Setty Y, Mayo AE, Surette MG, Alon U: Detailed map of a cisregulatory input function . Proc Nat Acad Sci USA. 2003, 100: 77027707. 10.1073/pnas.1230759100.
 4.
Bintu L, Buchler NE, Garcia GG, Gerland U, Hwa T, Kondev J, Kuhlman T, Phillips R: Transcriptional regulation by the numbers: applications . Curr Opin Gen Dev. 2005, 15: 125135. 10.1016/j.gde.2005.02.006.
 5.
Vilar JMG, Saiz L: DNA looping in gene regulation: from the assembly of macromolecular complexes to the control of transcriptional noise . Curr Op Genet Dev. 2005, 15: 136144. 10.1016/j.gde.2005.02.005.
 6.
Kuhlman T, Zhang Z, Saier MH, Hwa T: Combinatorial transcriptional control of the lactose operon of Escherichia coli . Proc Nat Acad Sci USA. 2007, 104: 60436048. 10.1073/pnas.0606717104.
 7.
Gertz J, Siggia ED, Cohen BA: Analysis of combinatorial cisregulation in synthetic and genomic promoters . Nature. 2009, 457: 215218. 10.1038/nature07521.
 8.
Zinzen RP, Senger K, Levine M, Papatsenko D: Computational models for neurogenic gene expression in the Drosophila embryo . Curr Biol. 2006, 16: 13581365. 10.1016/j.cub.2006.05.044.
 9.
Janssens H, Hou S, Jaeger J, Kim AR, Myasnikova E, Sharp D, Reinitz J: Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene . Nat Genet. 2006, 38: 11591165. 10.1038/ng1886.
 10.
Segal E, RavehSadka T, Schroeder M, Unnerstall U, Gaul U: Predicting expression patterns from regulatory sequence in Drosophila segmentation . Nature. 2008, 451: 535540. 10.1038/nature06496.
 11.
Fakhouri WD, Ay A, Sayal R, Dresch J, Dayringer E, Arnosti DN: Deciphering a transcriptional regulatory code: modeling shortrange repression in the Drosophila embryo. Mol Syst Biol2010, 6:341.
 12.
He X, Samee MAH, Blatti C, Sinha S: Thermodynamicsbased models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and shortrange repression. PLoS Comp Biol2010, 6:1000935.
 13.
Parker DS, White MA, Ramos AI, Cohen BA, Barolo S: The cisregulatory logic of Hedgehog gradient responses: key roles for Gli binding affinity, competition and cooperativity. Sci Signal2011, 4:38.
 14.
Giorgetti L, Siggers T, Tiana G, Caprara G, Notarbartolo S, Corona T, Pasparakis M, Milani P, Bulyk M, Natoli G: Noncooperative interactions between transcription factors and clustered DNA binding sites enable graded transcriptional responses to environmental inputs . Mol Cell. 2010, 37: 418428. 10.1016/j.molcel.2010.01.016.
 15.
Bintu L, Buchler NE, Garcia GG, Gerland U, Hwa T, Kondev J, Phillips R: Transcriptional regulation by the numbers: models . Curr Opin Gen Dev. 2005, 15: 116124. 10.1016/j.gde.2005.02.007.
 16.
Kim HD, Shay T, O’Shea EK, Regev A: Transcriptional regulatory circuits: predicting numbers from alphabets . Science. 2009, 325: 429432. 10.1126/science.1174062.
 17.
Michel D: How transcription factors can adjust the gene expression floodgates . Prog Biophys Mol Biol. 2010, 102: 1637. 10.1016/j.pbiomolbio.2009.12.007.
 18.
Segal E, Widom J: From DNA sequence to transcriptional behaviour: a quantitative approach . Nat Rev Genet. 2009, 10: 443456. 10.1038/nrg2591.
 19.
Sherman MS, Cohen BA: Thermodynamic state ensemble models of cisregulation. PLoS Comp Biol2012, 8:1002407.
 20.
Zhang Z, Wippo CJ, Wal M, Ward E, Korber P, Pugh BF: A packing mechanism for nucleosome organization reconstituted across a eukaryotic genome . Science. 2011, 332: 977980. 10.1126/science.1200508.
 21.
Struhl K, Segal E: Determinants of nucleosome positioning . Nat Struct Mol Biol. 2013, 20: 267273. 10.1038/nsmb.2506.
 22.
Sánchez A, Osborne ML, Friedman LJ, Kondev J, Gelles J: Mechanism of transcriptional repression at a bacterial promoter by analysis of single molecules . EMBO J. 2011, 30: 39403946. 10.1038/emboj.2011.273.
 23.
Lickwar CR, Mueller F, Hanlon SE, McNally JG, Lieb JD: Genomewide protein–DNA binding dynamics suggest a molecular clutch for transcription factor function . Nature. 2012, 484: 251255. 10.1038/nature10985.
 24.
Coulon A, Chow CC, Singer RH, Larson DR: Eukaryotic transcriptional dynamics: from single molecules to cell populations . Nat Rev Genet. 2013, 14: 572584. 10.1038/nrg3484.
 25.
Hopfield JJ: Kinetic proofreading: a new mechanism for reducing errors in biosynthetic processes requiring high specificity . Proc Nat Acad Sci USA. 1974, 71: 41354139. 10.1073/pnas.71.10.4135.
 26.
Zaher HS, Green R: Fidelity at the molecular level: lessons from protein synthesis . Cell. 2009, 136: 746762. 10.1016/j.cell.2009.01.036.
 27.
Murugan A, Huse DA, Leibler S: Speed, dissipation, and error in kinetic proofreading . Proc Nat Acad Sci USA. 2012, 109: 1203412039. 10.1073/pnas.1119911109.
 28.
Encode Project Consortium: An integrated encyclopedia of DNA elements in the human genome . Nature. 2012, 489: 5774. 10.1038/nature11247.
 29.
Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA: The NIH Roadmap epigenomics mapping consortium . Nat Biotechnol. 2010, 28: 10451048. 10.1038/nbt10101045.
 30.
Abbott A: Europe to map the human epigenome. Nature2011, 477:518.
 31.
Bonasio R, Tu S, Reinberg D: Molecular signals of epigenetic states . Science. 2010, 330: 612616. 10.1126/science.1191078.
 32.
Cantone I, Fisher AG: Epigenetic programming and reprogramming during development . Nat Struct Mol Biol. 2013, 20: 282289. 10.1038/nsmb.2489.
 33.
Schmitz RJ: The secret garden – epigenetic alleles underlie complex traits . Science. 2014, 343: 10821083. 10.1126/science.1251864.
 34.
Thomson M, Gunawardena J: Unlimited multistability in multisite phosphorylation systems . Nature. 2009, 460: 274277. 10.1038/nature08102.
 35.
Thomson M, Gunawardena J: The rational parameterisation theorem for multisite posttranslational modification systems . J Theor Biol. 2009, 261: 626636. 10.1016/j.jtbi.2009.09.003.
 36.
Gunawardena J: A linear framework for timescale separation in nonlinear biochemical systems. PLoS ONE2012, 7:36321.
 37.
Mirzaev I, Gunawardena J: Laplacian dynamics on general graphs . Bull Math Biol. 2013, 75: 21182149. 10.1007/s1153801398848.
 38.
Gunawardena J: Timescale separation: Michaelis and Menten’s old idea, still bearing fruit . FEBS J. 2014, 281: 473488. 10.1111/febs.12532.
 39.
Raser JM, O’Shea EK: Noise in gene expression: origins, consequences and control . Science. 2005, 309: 20102013. 10.1126/science.1105891.
 40.
Sánchez A, Golding I: Genetic determinants and cellular constraints in noisy gene expression . Science. 2013, 342: 11881193. 10.1126/science.1242975.
 41.
Brown CR, Mao C, Falkovskaia E, Jurica MS, Boeger H: Linking stochastic fluctuations in chromatin structure and gene expression. PLoS Biol2013, 11:1001621.
 42.
Sánchez A, Garcia HG, Jones D, Phillips R, Kondev J: Effect of promoter architecture on the celltocell variability in gene expression. PLoS Comput Biol2011, 7:1001100.
 43.
Gunawardena J: Models in biology: ‘accurate descriptions of our pathetic thinking’. BMC Biol2014, 12:29.
 44.
Phillips R, Kondev J, Theriot J: Physical Biology of the Cell. 2009, Garland Science, New York, NY, USA
 45.
MichelmanRibeiro A, Mazza D, Rosales T, Stasevich TJ, Boukari H, Rishi V, Vinson C, Knutson JR, McNally JG: Direct measurement of association and dissociation rates of DNA, binding in live cells by fluorescence correlation spectroscopy . Biophys J. 2009, 97: 337346. 10.1016/j.bpj.2009.04.027.
 46.
Ong KM, Blackford JA, Kagan BL, Simons SS, Chow CC: A theoretical framework for gene induction and experimental comparisons . Proc Nat Acad Sci USA. 2010, 107: 71077112. 10.1073/pnas.0911095107.
 47.
Hathaway NA, Bell O, Hodges C, Milller EL, Neel DS, Crabtree GR: Dynamics and memory of heterochromatin in living cells . Cell. 2012, 149: 14471460. 10.1016/j.cell.2012.03.052.
 48.
Hodges C, Crabtree GR: Dynamics of inherently bounded histone modification domains . Proc Nat Acad Sci USA. 2012, 109: 1329613301. 10.1073/pnas.1211172109.
 49.
Darzacq X, ShavTal Y, de Turris V, Brody Y, Shenoy SM, Phair RD, Singer RH: Invivo dynamics of RNA polymerase II transcription . Nat Struct Mol Biol. 2007, 14: 796806. 10.1038/nsmb1280.
 50.
Kingston RE, Green MR: Modeling eukaryotic transcriptional activation . Curr Biol. 1994, 4: 325332. 10.1016/S09609822(00)000713.
 51.
Simons SS: What goes on behind closed doors: physiological versus pharmacological steroid hormone actions . Bioessays. 2008, 30: 744756. 10.1002/bies.20792.
 52.
Kim HD, O’Shea EK: A quantitative model of transcription factoractivated gene expression . Nat Struct Mol Biol. 2008, 15: 11921198. 10.1038/nsmb.1500.
 53.
Lam FH, Steger DJ, O’Shea EK: Chromatin decouples promoter threshold from dynamic range . Nature. 2008, 453: 246250. 10.1038/nature06867.
 54.
RavehSadka T, Levo M, Segal E: Incorporating nucleosomes into thermodynamic models of transcription regulation . Genome Res. 2009, 19: 14801496. 10.1101/gr.088260.108.
 55.
Parikh RY, Kim HD: The effect of an intervening promoter nucleosome on gene expression. PLoS ONE2013, 8:63072.
 56.
Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA: The evolution of transcriptional regulation in eukaryotes . Mol Biol Evol. 2003, 20: 13771419. 10.1093/molbev/msg140.
 57.
Spitz F, Furlong EEM: Transcription factors: from enhancer binding to developmental control . Nat Rev Genet. 2012, 13: 613626. 10.1038/nrg3207.
 58.
Levine M: Transcriptional enhancers in animal development and evolution . Curr Biol. 2010, 20: 754763. 10.1016/j.cub.2010.06.070.
 59.
Blossey R, Schiessel H: Kinetic proofreading of gene activation by chromatin remodeling . HFSP J. 2008, 2: 167170. 10.2976/1.2909080.
 60.
Narlikar G: A proposal for kinetic proof reading by ISWI chromatin remodeling motors . Curr Opin Chem Biol. 2010, 14: 660665. 10.1016/j.cbpa.2010.08.001.
 61.
Blossey R, Schiessel H: Kinetic proofreading in chromatin remodeling: the case of ISWI/ACF . Biophys J. 2011, 101: 3032. 10.1016/j.bpj.2011.07.001.
 62.
Mirny L: Nucleosomemediated cooperativity between transcription factors . Proc Nat Acad Sci USA. 2010, 107: 2253422539. 10.1073/pnas.0913805107.
 63.
Gregor T, Tank DW, Wieschaus EF, Bialek W: Probing the limits to positional information . Cell. 2007, 130: 153164. 10.1016/j.cell.2007.05.025.
 64.
McGregor AP, Shaw PJ, Hancock JM, Bopp D, Hediger M, Wratten NS, Dover GA: Rapid restructuring of bicoid dependent hunchback promoters within and between Dipteran species: implications for molecular coevolution . Evol Dev. 2001, 3: 397407. 10.1046/j.1525142X.2001.01043.x.
 65.
Schnakenberg J: Network theory of microscopic and macroscopic behaviour of master equation systems . Rev Mod Phys. 1976, 48: 571586. 10.1103/RevModPhys.48.571.
 66.
Jarzynski C: Equilibrium freeenergy differences from nonequilibrium measurements a master equation approach . Phys Rev E. 1997, 56: 50185035. 10.1103/PhysRevE.56.5018.
 67.
Crooks GE: Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences . Phys Rev E. 1999, 60: 27212726. 10.1103/PhysRevE.60.2721.
 68.
van Kampen NG: Stochastic Processes in Physics and Chemistry. 1992, Elsevier, Amsterdam, The Netherlands
 69.
Gunawardena J: Multisite protein phosphorylation makes a good threshold but can be a poor switch . Proc Nat Acad Sci USA. 2005, 102: 1461714622. 10.1073/pnas.0507322102.
 70.
Uno T: An algorithm for enumerating all directed spanning trees in a directed graph . Algorithms and Computation. 7th International Symposium, ISAAC’96; Osaka, JapanVolume 1178. Edited by: Asano T, Igarashi Y, Nagamochi H, Miyano S, Suri S. 1996, Springer, Berlin, Germany,
 71.
Linear framework software. [vcp.med.harvard.edu/software.html]
 72.
Excel spreadsheet for PHO5 data. [www.nature.com/nsmb/journal/v15/n11/extref/nsmb.1500S2.xls]
Acknowledgements
We thank Harold Kim and Erin O’Shea for their assistance with the material in [52].
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
RE and JG conceived and supervised the project. TA, FW and JG undertook the analysis. JG wrote the paper. All authors read and approved the final manuscript.
Tobias Ahsendorf, Felix Wong contributed equally to this work.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Received
Accepted
Published
DOI
Keywords
 linear framework
 Laplacian dynamics
 thermodynamic formalism
 nonequilibrium statistical mechanics
 gene regulation
 chromatin domains
 steroidhormone responsive genes
 phosphate regulation
 PHO5
 product graph
 independence