Logic modeling and the ridiculome under the rug
© Blinov and Moraru; licensee BioMed Central Ltd. 2012
Published: 21 November 2012
Skip to main content
© Blinov and Moraru; licensee BioMed Central Ltd. 2012
Published: 21 November 2012
Logic-derived modeling has been used to map biological networks and to study arbitrary functional interactions, and fine-grained kinetic modeling can accurately predict the detailed behavior of well-characterized molecular systems; at present, however, neither approach comes close to unraveling the full complexity of a cell. The current data revolution offers significant promises and challenges to both approaches - and could bring them together as it has spurred the development of new methods and tools that may help to bridge the many gaps between data, models, and mechanistic understanding.
Have you used logic modeling in your research? It would not be surprising if many biologists would answer no to this hypothetical question. And it would not be true. In high school biology we already became familiar with cartoon diagrams that illustrate basic mechanisms of the molecular machinery operating inside cells. These are nothing else but simple logic models. If receptor and ligand are present, then receptor-ligand complexes form; if a receptor-ligand complex exists, then an enzyme gets activated; if the enzyme is active, then a second messenger is being produced; and so on. Such chains of causality are the essence of logic models (Figure 1a). Arbitrary events and mechanisms are abstracted; relationships are simplified and usually involve just two possible conditions and three possible consequences. The presence or absence of one or more molecule, activity, or function, [some icons in the cartoon] will determine whether another one of them will be produced (created, up-regulated, stimulated) [a 'positive' link] or destroyed (degraded, down-regulated, inhibited) [a 'negative' link], or be unaffected [there is no link]. The icons and links often do not follow a standardized format, but when we look at such a cartoon diagram, we believe that we 'understand' how the system works. Because our brain is easily able to process these relationships, these diagrams allow us to answer two fundamental types of questions related to the system: why (are certain things happening)? What if (we make some changes)?
Logic models offer a conceptually simple representation of biology that is easy to simulate. They are naturally suited to exploring large-scale biological networks where causality links are being hypothesized, or sought: genome, transcriptome, proteome, metabolome, interactome, microbiome - the list goes on. We are witnessing an unprecedented increase in the amount and quality of data available for describing and modeling biology at the cellular level. Graphical representation of these data as a network of (putative) relationships with nodes and edges (Figure 1d), in its many variants, is now so common that it can be considered iconic . As the -ome names imply, we expect such data to be complete collections of components and/or properties. The problem is that they are neither complete nor correct. It has been argued that they often do not help understanding, and have occasionally been called the 'ridiculome'. While this is obviously tongue-in-cheek, it does reflect some real limitations. But if logic models are so easy to compute, can't they be used to test, correct, and refine large-scale models based on the existing complement of available -omics data? Actually they are, successfully so: they are the bread and butter of network inference, which aims to reverse-engineer the relationships between intracellular components responsible for regulating cellular function. In most cases we still do not know many of the interactions between various gene products, signaling molecules, metabolites, and so on, and how they lead to a particular cellular phenotype. Phenotypes characterized by high-throughput experimental measurements of state parameters (protein or mRNA expression levels, enzymatic activities, metabolite levels, and so on) can then be used to 'train' logic models that eventually will infer the putative network responsible for the observed behavior  (thus the 'network inference' designation).
Logic-derived models differ not only in the level of fine-graining of the functional relationships, but also in their ability to handle time - the dynamics of the systems. Boolean networks were originally designed to provide simple input-output relationships - that is, the steady-state achieved under varying conditions. This is appropriate, for example, for analyzing traditional transcriptomics or proteomics experiments. Whether we measure expression levels before or after some external perturbation (for example, applying a stimulus or drug), or compare different cell populations, it is still just a collection of different steady-states. True time-course data were typically limited to small scale experiments, but are now becoming available also in high-throughput technologies. Algorithms to allow logic-derived models to simulate dynamic systems aim to retain the simplicity of Boolean networks but with a fine-grained representation of time (Figure 2). Time discretization with synchronous updating is the simplest approach, where we can think of the system simply stepping through time from steady-state to steady-state. At the other end of the spectrum is continuous time representation: algorithms were recently developed to infer logic-based differential equations , and then simulations become similar to those of traditional kinetic models.
More detail comes with the burden of increasing computational complexity and the risk of over-parameterizing: the extensions to logic models described above require both choosing a functional form and inclusion of additional parameters such as coefficients and thresholds, all of which are often arbitrary or at best phenomenological. Von Neumann once famously quipped that 'With four parameters I can fit an elephant, and with five I can make him wiggle his trunk' (in fact, this was recently rigorously proven to be true ). High throughput experiments nowadays generate large amounts of data at such a rapid pace that we have trouble making sense of it all, but modelers complain that they still lack the data required to build sensible large-scale quantitative models. Having enough data to constrain the model is critical to avoid simulating phantoms.
The right choice of mathematical formalism thus depends on both the purpose of the model and the type, quantity, and quality of data at hand - and for systems of any complexity will most likely be a combination of multiple methods. This was (perhaps painfully) reinforced recently by the report by Karr et al.  of the first arguably successful comprehensive model of one of the simplest existing prokaryote species. This required a huge effort of software assembly, using many different modeling approaches and countless hours of manual data mining. The overall model is described in a 100+ page supplement, has thousands of parameters, and was implemented by using custom code development as well as 20 third-party software tools. It certainly looks daunting to reproduce and expand upon this work. If we look more closely, though, we will find that in addition to the superb publication materials, there is extensive information on several project-related websites, ranging from interactive browsing of the assembled knowledgebase for the model, to fully packaged downloadable code that is 'ready to run' (that is, if you have access to compute clusters, Matlab, and more, and the expertise to configure it all). Given adequate computer resources, re-enacting the published simulations may not take more than a stubborn graduate student's few sleepless nights. But to be able to build something similar in the context of your own data from other cell types, that is a different story altogether.
The gaps and uncertainties in the knowledge of networks are still prevalent in most cases, but custom -omics data are now much easier to obtain. So perhaps advances in logic-based modeling could help. A recent paper in BMC Systems Biology  presents an integrated platform for logic-derived modeling (CellNOptR) that enables users to navigate seamlessly between many of the different formalisms discussed above, allowing for different levels of detail in both time and state, and also providing the ability to combine network inference (prediction of the network topology from experimental data) with existing curated pathway information. This work could also appear intimidating for the non-expert. CellNOptR stands for Cellular Network Optimizer R, and is implemented as a Bioconductor  package. One might ask whether we need to be familiar with R (a statistical programming language) and/or Bioconductor (a public collection of software tools that use R, focused mostly on manipulation and analysis of genomic-related data) in order to use CellNOptR. Maybe, maybe not. But you do not need to be a programmer to use highly sophisticated computer software, just like you do not need to be an optical engineer to use a highly sophisticated confocal microscope. Bioinformatics-savvy users may prefer to invoke CellNOptR from their own R scripts, but those unfamiliar with such programming can simply stay within the cosy confines of Cytoscape  (a graph/network visualization software that is very popular among biologists) where they can install a simple plugin (CytoCopteR, distributed with the CellNOptR package) that provides all functionality in a user-friendly graphical interface form. Of course, we would expect a learning curve for new users. Microscope or software, one needs to learn how to use it, and, perhaps more importantly, to learn what one can expect to accomplish by using it in terms of both capabilities and limitations.
Why is this a powerful approach? Because it can greatly help to understand the system being modeled. We do not wish to engage here in discussing the meaning of 'understanding' and of the usefulness of models; these have been frequent topics in biological discourse in recent years. We will rather illustrate by a hypothetical example, in very broad and practical terms (the interested reader is referred to  and references 11, 14, 19, 35, 40, 44 cited therein for detailed descriptions of real world examples and algorithm testing/validation). Suppose we want to investigate how signaling and gene regulation via the epidermal growth factor and tumor necrosis factor-α receptors may be altered in human hepatocarcinoma cells. We would select information available in pathway databases and a relevant transcriptomics and/or proteomics experimental dataset (typically readouts after various perturbations), and then put CellNOptR to work. Some likely results of this exercise in logic modeling could be the predictions that, in contrast to other cells, in these cancer cells the tumor necrosis factor-α receptor does not activate phosphoinositide 3-kinase, both Map3K1 and Map3K7 are required to activate MKK4, an inhibitory link from ERK to SOS-1 may be present, and so on. This context-specific model refinement provides concrete hypotheses: maybe a putative interaction shown in a yeast two-hybrid experiment does not occur in vivo, or maybe the transformed cell line phenotype is simply different from the canonical pathway. The latter may prove to be critical information for identifying the mechanisms that cause the hepatocarcinoma cells to respond to stimuli differently than their normal liver cell counterparts. If the experimental data have detailed time-course readouts, the differences obtained when fitting via the different algorithms could lead to additional conclusions, such as the Ras activation of Map3K1 exhibits hypersensitivity, whereas the branch linking Map3K7 to NFκB inhibition is linear and robust to perturbations - perhaps critical information for identifying potential drug targets. The mechanistic insights and predictive power are much higher than what can be obtained from purely data-driven models or simulating purely pathway-derived models.
How does this relate to large multi-scale models? Covert and colleagues  were able to develop a whole-cell model of the bacterium Mycoplasma genitalium that accounts for all molecular components and their interactions, from electrolytes and metabolites to proteins and ribosome assemblies. This highly complex model was constructed by coordinating sub-models for each of 28 classes of cellular processes, a majority of which were mathematically represented by logic-derived models of some sort (see chapter 3 of supplement S1 in  for details). The software and methods developed by Saez-Rodrigues and colleagues  make a strong statement about the power of sophisticated logic-derived models for systems such as mammalian cells, where large parts of the molecular networks are not well understood, incomplete and with unknown parameters. But the bottom panel in Figure 2 of  provides for both a reality-check of current capabilities and a hint of things to come. Certain behaviors of the studied system (for example, the NFκB oscillations) can be captured only by using the logic-derived differential equations (a method that is considerably more computationally expensive, and which involves many additional arbitrary parameters). This may come as no surprise. Practitioners of detailed, quantitative, validated models have preached for a long time the importance of non-linear dynamics of intracellular molecular interactions, especially in signaling networks, but often also in metabolic or gene regulatory networks (in fact, these classifications of networks are increasingly blurred nowadays). Detailed studies have shown that parts of these networks can act as modules with distinct dynamical features (threshold, hysteresis, oscillatory instability, switch-like instability, and so on) . Such emergent properties may be due not only to network topology, but to the detailed kinetic rate laws and quantitative parameters. To complicate matters further, impedance effects sometimes change individual module behavior when multiple modules are connected to each other.
In fact, much more is swept under the rug than we have alluded to so far. Even the simple cartoon diagram shown in Figure 1a embodies more information than the simple causal links captured by the logic models shown in Figure 1b. Multiple phosphosites can create a combinatorial complexity of regulatory actions, and difficulties in mapping functional states to measured observable quantities. Compartments, scaffolds, and diffusion create spatial inhomogeneities and microdomains, which have critical functionality in many eukaryotic cells (often also in prokaryotes) . And to top it all off, there has been increasing evidence that parts of the cellular machinery employ fleeting, non-stoichiometric, pleiomorphic assemblies of molecules to carry out vital processes . Many novel methods and algorithms have been developed in recent years by the 'bottom-up' modelers and experimentalists to tackle these problems: rule-based  and network-free models , spatially resolved models with continuous representations (partial differential equations-based)  or with discrete representations (particle-based stochastics) , as well as refinement of methods long used in mathematical biology, such as agent-based simulation methods and constraint-based models. New theoretical methods and software applications continuously appear in different areas related to modeling, ranging from network-based approaches for predicting missing pathway interactions  to multi-level rule-based modeling .
Does this detract from our praise of the advances in logic-derived models discussed above? No. To the contrary, this is why we are really excited. Let us return to those logic-derived ordinary differential equations (ODEs and how they (can) relate to the other side of the field. Of course, they are phenomenological constructs of whatever arbitrary mathematical form is being provided (in this case Hill-type equations, which can capture a variety of common non-linear relationships with only two parameters). But such mathematical approximations are sometimes the starting point for discovering the underlying mechanism. In what is arguably one of the most influential modeling works related to biology, almost exactly 100 years ago Leonor Michaelis and Maud Menten used a phenomenological equation to fit the experimental measurements of the initial velocity of the invertase-catalyzed reaction (at time zero, when no product has formed yet, the reaction can be simplified and modeled as being irreversible). Based on that approximation, they posited that the enzyme activity could be explained by mass-action kinetics involving an intermediary reaction complex - the fundamental mechanism of enzymatic catalysis that was confirmed three decades later . The fact that explaining certain qualitative characteristics requires the ODE-based formalism is the perfect starting point for directing new detailed investigations of potential mechanistic hypotheses.
Moreover, if we can modify a logic-derived model and end up with a differential equations-based model, why not jump over the fence and use what is available in the world of kinetic models? For starters, much more powerful optimization algorithms and tools have been developed in that domain . Taking advantage of these would be trivial if models could be exported into a community standard format such as Systems Biology Markup Language (SBML) . And if support for this exchange format were implemented in reverse, too, one could import pre-existing detailed kinetic models into software that deals with logic formalism as just another form of prior knowledge for those interactions where such information already exists (for reference, as of this writing, the Biomodels database makes available 154,456 kinetic relationships between 133,559 molecular species in the curated branch).
It sounds trite to say that we need to use multiple approaches and tools in order to build truly complete and accurate cellular models. We are getting closer not only to integrating multiple logic-based formalisms easily, but also to crossing over into kinetic, spatial, rule-based models, and more. And the experimental data required for building all these different types of computational models at different scales and levels of detail will have to come from both 'small science' and 'big science' .
The authors wish to acknowledge NIH grants RR013186 (IIM) and GM095485 (MLB, IIM) as well as many stimulating discussions with colleagues at the RD Berlin Center for Cell Analysis and Modeling.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.