### Surveying the combinatorial space of developmental schemes

We considered organisms with *N*={3,4,5,6,7} genes. For each *N*, we have looked at {100,100,100,92,25} randomly generated gene regulation matrices (GR), respectively. For each GR, all values from \([0,0.1,0.2,\dots,1.0]\) were used for the parameters *P*_{asym},*P*_{sig}, and *P*_{adj}. A distinct set of developmental rules matrices (CD, *A*, and SG) was generated for each set of parameter values. And for each set of rules matrices, 10 randomly chosen cell types were used as initial conditions (in case of *N*=3, all 8 cell types were used). In all, we have looked at about ((100+100+100+92+25)×11^{3}×10)≈5.5×10^{6} systems. A total of 4,858,643 of these converged within 1000 time steps into homeostatic organisms.

### Model details

All codes used to generate and analyze data are written in Python3.6 or Octave 5.2.0.

#### Asymmetric cell division

In our model, for any cell type *C*_{i}, we generate different sets of daughter cell types *D*_{i} using the parameter *P*_{asym}∈[0,1]; for any daughter cell type \(D_{i_{i1}}\in D_{i},\forall k \leq N\),

$$\begin{array}{@{}rcl@{}} &&\text{if}\,\, \left(C_{i}(k) = 0\right)\,\, \text{then}\,\, \left(D_{i_{i1}}(k) = 0\right),\,\, \text{and}\\ &&\text{if}\,\, \left(C_{i}(k) = 1\right)\,\, \text{then}\,\, \left(D_{i_{i1}}(k) = {\text{Ber}}({P_{\text{asym}}})\right) \end{array} $$

We encode cell division in a binary matrix \(\phantom {\dot {i}\!}{\text {CD}}_{2^{N} \times 2^{N}}\); CD(*i*,*j*)=1 if cell type *C*_{j}∈*D*_{i}, else CD(*i*,*j*)=0 (Fig. 1b).

#### Signaling

The probability that a gene in the model produces a signaling molecule is *P*_{sig}∈[0,1]. Formally, let SG={0,1}^{N} be a binary vector. Then, gene *k* produces a signaling molecule if SG(*k*)=1, where SG(*k*)=Ber(*P*_{sig}) (Fig. 1c). Let SG_{j}={0,1}^{N} be the set of signals produced by cell type *C*_{j}. For any gene *k*, SG_{j}(*k*)=1 ⇔ (*C*_{j}(*k*)=1)∧(SG(*k*)=1).

Parameter *P*_{adj}∈[0,1] gives the probability of signal reception. We encode signal reception in a binary matrix \(\phantom {\dot {i}\!}A_{2^{N}\times 2^{N}}\). Cell type *C*_{i} receives all signals produced by cell type *C*_{j} if *A*(*j*,*i*)=1, where *A*(*j*,*i*)=Ber(*P*_{adj}). *C*_{i} receives no signals from cell type *C*_{j} if *A*(*j*,*i*)=0 (Fig. 1d). Cells can only receive signals from other cell types present in the same time step. Let \(T_{t} = \{0,1\}^{2^{N}}\) be a binary vector, where *T*_{t}(*i*)=1 if cell type *C*_{i} is present in the time step *t*. *T*_{t} represents the state of the organism at time step *t*. For some cell type, *C*_{i} present at time step *t*, let \({C^{\text {sig}}_{i}}\) represent its state immediately after signal exchange. In cell types that receive a signal, the corresponding genes are set to 1 (Fig. 1f). That is:

$$\begin{array}{*{20}l} &{} {C^{\text{sig}}_{i}}(k) = 1,\\ &{}\text{if}\,\, (C_{i}(k)=1) \vee \left({\sum\nolimits}_{j=1}^{2^{N}}(A(j,i) \times {\text{SG}}_{j}(k) \times T_{t}(j)) > 0\right) \end{array} $$

#### Gene regulation

We define gene regulation in the model as a set of stable cell types and cell types in the basins of these stable cell types. As mentioned earlier, stable cell types need not be fixed points (single-cell state) of the gene regulatory network; they can also be an oscillation (multiple cell states). Oscillatory stable cell types are represented as the set of all cell states that compose the oscillation.

Formally, a system with *N* genes can have *n*≤2^{N} stable cell types {*S*_{1},*S*_{2},…,*S*_{n}}, where *S*_{x} is itself a collection of *n*_{x} cell states \(\left \{C_{x_{1}}, \ldots C_{x_{n_{x}}}\right \}\) such that \(\phantom {\dot {i}\!}x_{1} < x_{2} <\ldots < x_{n_{x}}\). For any two cell types *S*_{x} and *S*_{y}, if *x*<*y*, then *x*_{1}<*y*_{1}.

We encode gene regulation in a binary matrix \(\phantom {\dot {i}\!}GR_{2^{N}\times 2^{N}}\). To generate *GR* for a given organism, we pick the number of stable cell types *n*≤2^{N} according to the uniform random distribution. First, we assign cell states that form the basins of these stable cell types: Cell states are uniform randomly partitioned among the *n* basins. We then choose cell states that form the stable cell type from within the corresponding basins: let *B*_{x} be a basin, then for some *j* such that (*C*_{j}∈*B*_{x}),(*C*_{j}∈*S*_{x}) with probability 0.5. For all *i* such that *C*_{i}∈*B*_{x},GR(*i*,*j*)=1 if (*C*_{j}∈*S*_{x}).

#### Homeostatic organisms and their cell type lineage graphs

Let us consider an organism in state *T*_{t} at time step *t*. Right after cell division, let the state of the organism be represented by \({T^{\text {div}}_{t}}\). After division, the organism is composed of all the daughter cells produced in that time step. That is:

$${T^{\text{div}}_{t}}(i) = 1,\,\, \text{if}\,\, \exists j \leq 2^{N}\,\, \text{s.t.}\,\, \left(T_{t}(j) = 1\right) \wedge ({\text{CD}}(j,i) = 1) $$

These daughter cells exchange signals among themselves. Let \({T^{\text {sig}}_{t}}\) represent the state of the organism right after signal exchange. Then:

$$\begin{array}{@{}rcl@{}} {T^{\text{sig}}_{t}}(i) &=& 1,\,\, \text{if}\,\, \exists j1 \leq 2^{N}\,\,\text{s.t.}\,\, {T^{\text{div}}_{t}}(j1) = 1,\,\, \text{where}\\ && \forall k\,\,\text{s.t.}\,\, C_{i}(k) = 0, C_{j1}(k) = 0,\,\, \text{and}\\ \forall k\, \text{s.t.}\, C_{i}(k)\!\! &=& \!\!1, \left(C_{j1}(k) = 1\right)\\&&\!\vee\!\left(\!{\sum\nolimits}^{2^{N}}_{j2=1}\!(A(j2,j1) \!\times\! {\text{SG}}_{j2}(k) \!\times\! {T^{\text{div}}_{t}}(j2))\!>\!0\!\right) \end{array} $$

The signals received by a cell type activates its gene regulatory network. Gene regulation updates the set of cell types according to the following expression: ∀*i*≤2^{N},

$$T_{t+1}(i) = 1,\,\, \text{if}\,\, \exists j\,\, \text{s.t.}\,\, \left({T^{\text{sig}}_{t}}(j)=1\right) \wedge ({\text{GR}}(j,i)=1) $$

Therefore, the organism is only composed of stable cell types. Let the system have *n*≤2^{N} stable cell states. Then, we can equivalently represent the state of the organism at time step *t* as a binary vector \(T^{SC}_{t} = [0,1]^{n}\), such that for *x*∈{1,2,…*n*}.

$$T^{SC}_{t}(x) = 1 \iff \left(T_{t}(i) = 1\right) \wedge \left(\exists C_{i} \in S_{x}\right) $$

We call states of the organism such that \(T^{SC}_{t+1} = T^{SC}_{t}\) homeostatic organisms (Fig. 1f, g).

We represent the homeostatic organism as a cell type lineage graph. The nodes of the graph represent stable cell states that are present in the homeostatic organism, and directed edges represent lineage relationships between these stable cell states. Let the stable cell states *S*_{x1} and *S*_{x2} both be present in the final organism, and let them be represented by nodes *V*_{a} and *V*_{b} of the lineage graph, respectively. Then, there is an edge from *V*_{a} to *V*_{b} if one of the daughter cells of *S*_{x1} gives rise to *S*_{x2} after one round of cell signaling and gene regulation (Fig. 1g). That is:

$$\begin{aligned} &&\text{Let} C_{i} \in S_{x1}\,\, \text{and}\,\, C_{l} \in S_{x2}.\\ &&\text{Then, there is an edge}\,\, V_{a} \rightarrow V_{b} \,\,\,\text{if}\\ && \exists j\,\, \text{s.t.}\,\, {\text{CD}}(i,j)=1\\ &&\text{and, in this organism}, {C^{\text{sig}}_{j}} = C_{k} \\ &&\text{where}\,\, {\text{GR}}(k,l) = 1 \end{aligned} $$

### Assignment of topologies to lineage graphs

We categorize lineage graphs into 6 topologies: unicellular, strongly connected component(SCC), cyclic, chain, tree, and other directed acyclic graphs (DAG). We ignore self-edges while assigning these topologies. A lineage graph is called *unicellular* if it has only a single node. For all other topologies, we used the networkx (version 2.2) module of Python3.6. A lineage graph is called *SCC* if the graph has more than 1 node and contains a single strongly connected component, it is called *cyclic* if the graph contains cycles and has more than one strongly connected component, it is called a *chain* if networkx classifies it as a tree and the maximum in-degree and out-degree are 1, it is called a *tree* if networkx classifies it as a tree and maximum in-dergee or out-degree is greater than 1, and it is called a *DAG* if networkx classifies it as a directed acyclic graph but not a tree.

### Lineage graph randomization protocol

We represent a lineage graph with *e* edges as a matrix *E*_{e×2}, where *E*(*i*,1) and *E*(*i*,2) represent the source and the target node of edge *i*, respectively. To randomize lineage graphs, we used a protocol that preserves in- and out-degrees of each node; we randomly choose pairs of edges from the graph and swap their target nodes. Let the randomized graph *E*_{rand} be initially identical to *E*. Then, for any two edges of the lineage graph *i*,*j*, we propose a swap:

$${E_{\text{rand}}}(i,2) = {E_{\text{rand}}}(j,2),\,\, \text{and}\,\, {E_{\text{rand}}}(j,2) = {E_{\text{rand}}}(i,2) $$

The swap is accepted if there is no edge *k* such that:

$$\begin{array}{@{}rcl@{}} &&({E_{\text{rand}}}(k,1) = {E_{\text{rand}}}(i,1))\wedge({E_{\text{rand}}}(k,2) = {E_{\text{rand}}}(j,2)), \text{or}\\&& ({E_{\text{rand}}}(k,1) = {E_{\text{rand}}}(j,1))\wedge({E_{\text{rand}}}(k,2) = {E_{\text{rand}}}(i,2)) \end{array} $$

The above condition ensures that the total number of unique edges in *E* and *E*_{rand} remain the same. We swap edges 1000 times for each lineage graph to randomize it.

### Independent and intrinsically independent cell types

We call a cell type *independent* if it has the same cell fate when grown outside the organism as it does when it is a part of the organism. The cell fate \({C^{\text {fate}}_{i}}\) of some cell type *C*_{i} in the organism is given by the set of cell types receiving an edge from the node *C*_{i} in the organism’s lineage graph. To decide whether a given cell type *C*_{i} is independent or not, we separate this cell type from the rest of the organism and allow it to undergo one round of cell division, signaling, and gene regulation, according to the same matrices CD,SG, *A*, and GR that were used to generate the organism from which it was taken. Let us call the resulting set of cell types \({C^{\text {reg}}_{i}}\). We call the cell *C*_{i} independent if \({C^{\text {reg}}_{i}}\) is identical to \({C^{\text {fate}}_{i}}\).

For some cell types, the basis of their independence is an insensitivity to signals produced in the organism. In such a case, the set of signals produced by the daughter cells of the cell type is sufficient to satisfy the maximum set of signals that each of the daughter cells can receive.

Let the set of daughter cells of cell type *C*_{i} in an organism be *D*_{i}. ∀*C*_{j}∈*D*_{i} let \({\text {Rec}^{\text {all}}_{j}}\) represent the maximal set of signals that it can receive, when all 2^{N} possible cell types are present together. That is, for all signaling molecules *k* such that SG(*k*)=1,

$$ {\text{Rec}^{\text{all}}_{j}}(k) = 1,\,\, \text{if}\,\, \Sigma^{2^{N}}_{l=1}\left(A(l,j) \wedge (C_{l}(k) = 1)\right) $$

And let \({\text {Rec}}^{D}_{j}\) be the set of signals it receives from within the set of cells *D*_{i}, i.e.,

$${\text{Rec}}^{D}_{j}(k) = 1,\,\,\text{if}\,\, \Sigma_{C_{l} \in D_{i}} \left(A(l,j) \wedge (C_{l}(k) = 1)\right) $$

If for all \(C_{j} \in D_{i},{\text {Rec}^{\text {all}}_{j}}\) = \({\text {Rec}}^{D}_{j},C_{i}\) is *intrinsically independent*.