 Research Article
 Open Access
 Published:
Mechanisms of blood homeostasis: lineage tracking and a neutral model of cell populations in rhesus macaques
BMC Biologyvolume 13, Article number: 85 (2015)
Abstract
Background
How a potentially diverse population of hematopoietic stem cells (HSCs) differentiates and proliferates to supply more than 10^{11} mature blood cells every day in humans remains a key biological question. We investigated this process by quantitatively analyzing the clonal structure of peripheral blood that is generated by a population of transplanted lentivirusmarked HSCs in myeloablated rhesus macaques. Each transplanted HSC generates a clonal lineage of cells in the peripheral blood that is then detected and quantified through deep sequencing of the viral vector integration sites (VIS) common within each lineage. This approach allowed us to observe, over a period of 412 years, hundreds of distinct clonal lineages.
Results
While the distinct clone sizes varied by three orders of magnitude, we found that collectively, they form a steadystate clone sizedistribution with a distinctive shape. Steadystate solutions of our model show that the predicted clone sizedistribution is sensitive to only two combinations of parameters. By fitting the measured clone sizedistributions to our mechanistic model, we estimate both the effective HSC differentiation rate and the number of active HSCs.
Conclusions
Our concise mathematical model shows how slow HSC differentiation followed by fast progenitor growth can be responsible for the observed broad clone sizedistribution. Although all cells are assumed to be statistically identical, analogous to a neutral theory for the different clone lineages, our mathematical approach captures the intrinsic variability in the times to HSC differentiation after transplantation.
Background
Around 10^{11} new mature blood cells are generated in a human every day. Each mature blood cell comes from a unique hematopoietic stem cell (HSC). Each HSC, however, has tremendous proliferative potential and contributes a large number and variety of mature blood cells for a significant fraction of an animal’s life. Traditionally, HSCs have been viewed as a homogeneous cell population, with each cell possessing equal and unlimited proliferative potential. In other words, the fate of each HSC (to differentiate or replicate) would be determined by its intrinsic stochastic activation and signals from its microenvironment [1, 2].
However, as first shown in MullerSieburg et al. [3], singly transplanted murine HSCs differ significantly in their longterm lineage (celltype) output and in their proliferation and differentiation rates [4–7]. Similar findings have been found from examining human embryonic stem cells and HSCs in vitro [8, 9]. While celllevel knowledge of HSCs is essential, it does not immediately provide insight into the question of blood homeostasis at the animal level. More concretely, analysis of singlecell transplants does not apply to human bone marrow transplants, which involve millions of CD34expressing primitive hematopoietic and committed progenitor cells. Polyclonal blood regeneration from such hematopoietic stem and progenitor cell (HSPC) pools is more complex and requires regulation at both the individual cell and system levels to achieve stable [10, 11] or dynamic [12] homeostasis.
To dissect how a population of HSPCs supplies blood, several highthroughput assay systems that can quantitatively track repopulation from an individual stem cell have been developed [6, 11, 13, 14]. In the experiment analyzed in this study, as outlined in Fig. 1, each individual CD34+ HSPC is distinctly labeled by the random incorporation of a lentiviral vector in the host genome before transplantation into an animal. All cells that result from proliferation and differentiation of a distinctly marked HSPC will carry identical markings defined by the location of the original viral vector integration site (VIS). By sampling nucleated blood cells and enumerating their unique VISs, one can quantify the cells that arise from a single HSPC marked with a viral vector. Such studies in humans [15] have revealed highly complex polyclonal repopulation that is supported by tens of thousands of different clones [15–18]; a clone is defined as a population of cells of the same lineage, identified here by a unique VIS. These lineages, or clones, can be distributed across all cell types that may be progeny of the originally transplanted HSC after it undergoes proliferation and differentiation. However, the number of cells of any VIS lineage across certain cell types may be different. By comparing abundances of lineages across blood cells of different types, for example, one may be able to determine the heterogeneity or bias of the HSC population or if HSCs often switch their output. This type of analysis remains particularly difficult in human studies since transplants are performed under diseased settings and followed for only 1 or 2 years.
We analyze here a systematic clonetracking study that used a large number of HSPC clones in a transplant and competitive repopulation setting comparable to that used in humans [19]. In these nonhuman primate rhesus macaque experiments, lentiviral vectormarked clones were followed for up to a decade posttransplantation (equivalent to about 30 years in humans if extrapolated by average life span). All data are available in the supplementary information files of Kim et al. [19]. This longterm study allows one to distinguish clearly HSC clones from other shortterm progenitor clones that were included in the initial pool of transplanted CD34+ cells. Hundreds to thousands of detected clones participated in repopulating the blood in a complex yet highly structured fashion. Preliminary examination of some of the clone populations suggests waves of repopulation with shortlived clones that first grow then vanish within the first 1 or 2 years, depending on the animal [19].
Subsequent waves of HSC clones appear to rise and fall sequentially over the next 4–12 years. This picture is consistent with recent observations in a transplantfree transposonbased tagging study in mice [20] and in human gene therapy [15, 16]. Therefore, the dynamics of a clonally tracked nonhuman primate HSPC repopulation provides rich data that can inform our understanding of regulation, stability, HSPC heterogeneity, and possibly HSPC aging in hematopoiesis.
Although the timedependent data from clonal repopulation studies are rich in structure, in this study we focus on one specific aspect of the data: the number of clones that are of a certain abundance as described in Fig. 2. Rather than modeling the highly dynamic populations of each clone, our aim here is to develop first a more global understanding of how the total number of clones represented by specific numbers of cells arises within a mechanistically reasonable model of hematopoiesis. The size distributions of clones present in the blood sampled from different animals at different times are characterized by specific shapes, with the largest clones being a factor of 100–1000 times more abundant than the most rarely detected clones. Significantly, our analysis of renormalized data indicates that the clone size distribution (measuring the number of distinct lineages that are of a certain size) reaches a stationary state as soon as a few months after transplantation (see Fig. 4 below). To reconcile the observed stationarity of the clone size distributions with the large diversity of clonal contributions in the context of HSPCmediated blood repopulation, we developed a mathematical model that treats three distinct cell populations: HSCs, transitamplifying progenitor cells, and fully differentiated nucleated blood cells (Fig. 3). While multistage models for a detailed description of differentiation have been developed [21], we lump different stages of cell types within the transitamplifying progenitor pool into one population, avoiding excess numbers of unmeasurable parameters. Another important feature of our model is the overall effect of feedback and regulation, which we incorporate via a populationdependent cell proliferation rate for progenitor cells.
The effective proliferation rate will be modeled using a Hilltype suppression that is defined by the limited space for progenitor cells in the bone marrow. Such a regulation term has been used in models of cyclic neutropenia [22] but has not been explicitly treated in models of clone propagation in hematopoiesis. Our mathematical model is described in greater detail in the next section and in Additional file 1.
Our model shows that both the large variability and the characteristic shape of the clone size distribution can result from a slow HSCtoprogenitor differentiation followed by a burst of progenitor growth, both of which are generic features of hematopoietic systems across different organisms. By assuming a homogeneous HSC population and fitting solutions of our model to available data, we show that randomness from stochastic activation and proliferation and a global carrying capacity are sufficient to describe the observed clonal structure. We estimate that only a few thousand HSCs may be actively contributing toward blood regeneration at any time. Our model can be readily generalized to include the role of heterogeneity and aging in the transplanted HSCs and provides a framework for quantitatively studying physiological perturbations and genetic modifications of the hematopoietic system.
Mathematical Model
Our mathematical model explicitly describes three subpopulations of cells: HSCs, transitamplifying progenitor cells, and terminally differentiated blood cells (see Fig. 3). We will not distinguish between myeloid or lymphoid lineages but will use our model to analyze clone size distribution data for granulocytes and peripheral blood mononuclear cells independently. Our goal will be to describe how clonal lineages, started from distinguishable HSCs, propagate through the amplification and terminal differentiation processes.
Often clone populations are modeled directly by dynamical equations for n _{ j }(t), the number of cells of a particular clone j identified by its specific VIS [23]. Since all cells are identical except for their lentiviral marking, meanfield rate equations for n _{ j }(t) are identical for all j. Assuming identical initial conditions (one copy of each clone), the expected populations n _{ j }(t) would be identical across all clones j. This is a consequence of using identical growth and differentiation rates to describe the evolution of the mean number of cells of each clone.
Therefore, for cells in any specific pool, rather than deriving equations for the mean number n _{ j } of cells of each distinct clone j (Fig. 2 a), we perform a hodograph transformation [24] and formulate the problem in terms of the number of clones that are represented by k cells, $c_{k} = \sum _{j}\delta _{k,n_{j}}$ (see Fig. 2 b), where the Kronecker δ function $\delta _{k,n_{j}}=1$ only when k=n _{ j } and is 0 otherwise. This counting scheme is commonly used in the study of cluster dynamics in nucleation [25] and in other related models describing the dynamics of distributions of cell populations. By tracking the number of clones of different sizes, the intrinsic stochasticity in the times of cell division (especially that of the first differentiation event) and the subsequent variability in the clone abundances are quantified. Figure 2 a, b qualitatively illustrates n _{ j } and c _{ k }, pretransplant and after 5 years, corresponding to the scenario depicted in Fig. 1 a. Cells in each of the three pools are depicted in Fig. 3, with different clones grouped according to the number of cells representing each clone.
The first pool (the progenitorcell pool) is fed by HSCs through differentiation. Regulation of HSC differentiation fate is known to be important for efficient repopulation [26, 27] and control [28] and the balance between asymmetric and symmetric differentiation of HSCs has been studied at the microscopic and stochastic levels [29–32]. However, since HSCs have life spans comparable to that of an animal, we reasoned that the total number of HSCs changes only very slowly after the initial fewmonth transient after transplant. For simplicity, we will assume, consistent with estimates from measurements [33], that HSCs divide only asymmetrically. Therefore, upon differentiation, each HSC produces one partially differentiated progenitor cell and one replacement HSC. How symmetric HSC division might affect the resulting clone sizes is discussed in Additional file 1 through a specific model of HSC renewal in a finitesized HSC niche. We find that the incorporation of symmetric division has only a small quantitative effect on the clone size distribution that we measure and ultimately analyze.
Next, consider the progenitorcell pool. From Fig. 3, we can count the number of clones c _{ k } represented by exactly k cells. For example, the black, red, green, and yellow clones are each represented by three cells, so c _{3}=4. Each progenitor cell can further differentiate with rate ω into a terminally differentiated cell. If progenitor cells undergo symmetric differentiation with probability η and asymmetric differentiation with probability 1−η, the effective rate of differentiation is 2η ω+(1−η)ω=(1+η)ω. In turn, fully differentiated blood cells (not all shown in Fig. 3) are cleared from the peripheral pool at rate μ _{d}, providing a turnover mechanism. Finally, each measurement is a smallvolume sample drawn from the peripheral blood pool, as shown in the final panel in Fig. 3.
Note that the transplanted CD34+ cells contain both true HSCs and progenitor cells. However, we assume that at long times, specific clones derived from progenitor cells die out and that only HSCs contribute to longlived clones. Since we measure the number of clones of a certain size rather than the dynamics of individual clone numbers, transplanted progenitor cells should not dramatically affect the steadystate clone size distribution. Therefore, we will ignore transplanted progenitor cells and assume that after transplantation, effectively only U unlabeled HSCs and C labeled (lentivirusmarked) HSCs are present in the bone marrow and actively asymmetrically differentiating (Fig. 3). Massaction equations for the expected number of clones c _{ k } of size k are derived from considering simple birth and death processes with immigration (HSC differentiation):
where k=1,2,…,C and $c_{0}(t) \equiv C  \sum _{k=1}^{\infty }c_{k}(t)$ is the number of clones that are not represented in the progenitor pool. Since C is large, and the number of clones that are of size comparable to C is negligible, we will approximate C→∞ in our mathematical derivations. We have suppressed the time dependence of c _{ k }(t) for notational simplicity. The constant parameter α is the asymmetric differentiation rate of all HSCs, while r and μ are the proliferation and overall clearance rates of progenitor cells. In our model, HSC differentiation events that feed the progenitor pool are implicitly a rate α Poisson process. The appreciable number of detectable clones (Fig. 1 b) implies the initial number C of HSC clones is large enough that asymmetric differentiation of individual HSCs is uncorrelated. The alternative scenario of a few HSCs undergoing synchronized differentiation would not lead to appreciably different results since the resulting distribution c _{ k } is more sensitive to the progenitor cells’ unsynchronized replication and death than to the statistics of the immigration by HSC differentiation.
The final differentiation from progenitor cell to peripheral blood cell can occur through symmetric or asymmetric differentiation, with probabilities η and 1−η, respectively. If parent progenitor cells are unaffected after asymmetric terminal differentiation (i.e., they die at the normal rate μ _{p}), the dynamics are feedforward and the progenitor population is not influenced by terminal differentiation. Under symmetric differentiation, a net loss of one progenitor cell occurs. Thus, the overall progenitorcell clearance rate can be decomposed as μ=μ _{p}+η ω. We retain the factor η in our equations for modeling pedagogy, although in the end it is subsumed in effective parameters and cannot be independently estimated from our data.
The first term in Eq. 1 corresponds to asymmetric differentiation of each of the C active clones, of which c _{ k } are of those lineages with population k already represented in the progenitor pool. Differentiation of this subset of clones will add another cell to these specific lineages, reducing c _{ k }. Similarly, differentiation of HSCs in lineages that are represented by k−1 progenitor cells adds cells to these lineages and increases c _{ k }. Note that Eq. 1 are meanfield rate equations describing the evolution of the expected number of clones of size k. Nonetheless, they capture the intrinsic dispersion in lineage sizes that make up the clone size distribution. While all cells are assumed to be statistically identical, with equal rates α, p, and μ, Eq. 1 directly model the evolution of the distribution c _{ k }(t) that arises ultimately from the distribution of times for each HSC to differentiate or for the progenitor cells to replicate or die. Similar equations have been used to model the evolving distribution of virus capsid sizes [34].
Since the equations for c _{ k }(t) describe the evolution of a distribution, they are sometimes described as master equations for the underlying process [34, 35]. Here we note that the solution to Eq. 1, c _{ k }(t), is the expected distribution of clone sizes. Another level of stochasticity could be used to describe the evolution of a probability distribution $P_{b}(\textbf {b};t) = P_{b}(b_{0}, b_{1},\ldots,b_{N_{\mathrm {p}}};t)\phantom {\dot {i}\!}$ over the integer numbers b _{ k }. This density represents the probability that at time t, there are b _{0} unrepresented lineages, b _{1} lineages represented by one cell in the progenitor pool, b _{2} lineages represented by two cells in the progenitor pool, and so on. Such a probability distribution would obey an N _{p}dimensional master equation rather than a onedimensional equation, like Eq. 1, and once known, can be used to compute the mean $c_{k}(t) = \sum _{\textbf {b}} b_{k}P(\textbf {b};t)$ . To consider the entire problem stochastically, the variability described by probability distribution P _{ b } would have to be propagated forward to the differentiated cell pool as well. Given the modest number of measured data sets and the large numbers of lineages that are detectable in each, we did not attempt to use the data as samples of the distribution P _{ b } and directly model the mean values c _{ k } instead. Variability from both intrinsic stochasticity and sampling will be discussed in Additional file 1.
After defining u(t) as the number of unlabeled cells in the progenitor pool, and $N_{\mathrm {p}}(t) = u(t)+\sum _{k=1}^{\infty }{kc}_{k}(t)$ as the total number of progenitor cells, we find $\dot {u} = (r  \mu) u + \alpha U$ and
Without regulation, the total population N _{p}(t→∞) will either reach N _{p}≈α(U+C)/(μ−r) for μ>r or will exponentially grow without bound for r>μ. Complex regulation terms have been employed in deterministic models of differentiation [28] and in stochastic models of myeloid/lymphoid population balance [36]. For the purpose of estimating macroscopic clone sizes, we assume regulation of cell replication and/or spatial constraints in the bone marrow can be modeled by a simple effective Hilltype growth law [22, 37]:
where p is the intrinsic replication rate of an isolated progenitor cell. We assume that progenitor cells at low density have an overall positive growth rate p>μ. The parameter K is the progenitorcell population in the bone marrow that corresponds to the halfmaximum of the effective growth rate. It can also be interpreted as a limit to the bone marrow size that regulates progenitorcell proliferation to a value determined by K, p, and μ and is analogous to the carrying capacity in logistic models of growth [38]. For simplicity, we will denote K as the carrying capacity in Eq. 3 as well. Although our data analysis is insensitive to the precise form of regulation used, we chose the Hilltype growth suppression because it avoids negative growth rates that confuse physiological interpretation. An orderofmagnitude estimate of the bone marrow size (or carrying capacity) in the rhesus macaque is K∼10^{9}. Ultimately, we are interested in how a limited progenitor pool influences the overall clone size distribution, and a simple, singleparameter (K) approximation to the progenitorcell growth constraint is sufficient.
Upon substituting the growth law r(N _{p}) described by Eq. 3 into Eq. 2, the total progenitorcell population N _{p}(t→∞) at long times is explicitly shown in Additional file 1: Eq. A19 to approach a finite value that depends strongly on K. Progenitor cells then differentiate to supply peripheral blood at rate (1+η)ω so that the total number of differentiated blood cells obeys
At steady state, the combined peripheral nucleated blood population is estimated to be N _{d}∼10^{9}– 10^{10} [39], setting an estimate of N _{d}/N _{p}≈(1+η)ω/μ _{d}∼1–10. Moreover, as we shall see, the relevant factor in our steadystate analysis will be the numerical value of the effective growth rate r, rather than its functional form. Therefore, the chosen form for regulation will not play a role in the mathematical results in this paper except to define parameters (such as K) explicitly in the regulation function itself.
To distinguish and quantify the clonal structure within the peripheral blood pool, we define $y_{n}^{(k)}$ to be the number of clones that are represented by exactly n cells in the differentiated pool and k cells in the progenitor pool. For example, in the peripheral blood pool shown in Fig. 3, $y_{1}^{(3)} = y_{2}^{(3)} = y_{4}^{(3)} = y_{6}^{(3)} = 1$ . This counting of clones across both the progenitor and peripheral blood pools is necessary to balance progenitorcell differentiation rates with peripheral blood turnover rates. The evolution equations for $y_{n}^{(k)}$ can be expressed as
where $y_{0}^{(k)} \equiv c_{k}  \sum _{n=1}^{\infty }y_{n}^{(k)}$ represents the number of progenitor clones of size k that have not yet contributed to peripheral blood. The transfer of clones from the progenitor population to the differentiated pool arises through $y_{0}^{(k)}$ and is simply a statement that the number of clones in the peripheral blood can increase only by differentiation of a progenitor cell whose lineage has not yet populated the peripheral pool. The first two terms on the righthand side of Eq. 5 represent immigration of clones represented by n−1 and n differentiated cells conditioned upon immigration from only those specific clones represented by k cells in the progenitor pool. The overall rate of addition of clones from the progenitor pool is thus (1+η)ω k, in which the frequency of terminal differentiation is weighted by the stochastic division factor (1+η). By using the Hilltype growth term r(N _{p}) from Eq. 3, Eq. 1 can be solved to find c _{ k }(t), which in turn can be used in Eq. 5 to find $y_{n}^{(k)}(t)$ . The number of clones in the peripheral blood represented by exactly n differentiated cells is thus $y_{n}(t) = \sum _{k=1}^{\infty }y_{n}^{(k)}(t)$ .
As we mentioned, Eqs. 1 and 5 describe the evolution of the expected clone size distribution. Since each measurement represents one realization of the distributions c _{ k }(t) and y _{ n }(t), the validity of Eqs. 1 and 5 relies on a sufficiently large C such that the marked HSCs generate enough lineages and cells to allow the subsequent peripheral blood clone size distribution to be sampled adequately. In other words, measurementtomeasurement variability described by e.g., $\phantom {\dot {i}\!}\langle c_{k}(t)c_{k^{\prime }}(t)\rangle  \langle c_{k}(t)\rangle \langle c_{k^{\prime }}(t)\rangle $ is assumed negligible (see Additional file 1). Our modeling approach would not be applicable to studying single HSC transplant studies [4–6] unless the measured clone sizes from multiple experiments are aggregated into a distribution.
Finally, to compare model results with animal blood data, we must consider the final step of sampling small aliquots of the differentiated blood. As derived in Additional file 1: Eq. A11, if S marked cells are drawn and sequenced successfully (from a total differentiated cell population N _{d}), the expected number of clones 〈m _{ k }(t)〉 represented by k cells is given by
where ε≡S/N _{d}≪1 and $F(q,t) \equiv \sum _{k=0}^{q}\langle m_{k}(t)\rangle $ is the sampled, expected cumulative size distribution. Upon further normalization by the total number of detected clones in the sample, C _{s}(t)=F(S,t)−F(0,t), we define
as the fraction of the total number of sampled clones that are represented by q or fewer cells. Since the data represented in terms of Q will be seen to be timeindependent, explicit expressions for $c_{k}, y_{n}^{(k)}$ , 〈m _{ k }〉, and Q(q) can be derived. Summarizing, the main features and assumptions used in our modeling include:

A neutralmodel framework [40] that directly describes the distribution of clone sizes in each of the three cell pools: progenitor cells, peripheral blood cells, and sampled blood cells. The cells in each pool are statistically identical.

A constant asymmetric HSC differentiation rate α. The appreciable numbers of unsynchronized HSCs allow the assumption of Poissondistributed differentiation times of the HSC population. The level of differentiation symmetry is found to have little effect on the steadystate clone size distribution (see Additional file 1). The symmetry of the terminal differentiation step is also irrelevant for understanding the available data.

A simple oneparameter (K) growth regulation model that qualitatively describes the finite maximum size of the progenitor population in the bone marrow. Ultimately, the specific form for the regulation is unimportant since only the steadystate value of the growth parameter r affects the parameter fitting.
Using only these reasonable model features, we are able to compute clone size distributions and compare them with data. An explicit form for the expected steadystate clone size distribution 〈m _{ k }〉 is given in Additional file 1: Eq. A32, and the parameters and variables used in our analysis are listed in Table 1.
Results and discussion
In this section, we describe how previously published data (the number of cells of each detected clone in a sample of the peripheral blood, which are available in the supplementary information files of Kim et al. [19]) are used to constrain parameter values in our model. We emphasize that our model is structurally different from models used to track lineages and clone size distributions in retinal and epithelial tissues [41, 42]. Rather than tracking only the lineages of stem cells (which are allowed to undergo asymmetric differentiation, symmetric differentiation, or symmetric replication), our model assumes a highly proliferative population constrained by a carrying capacity K and slowly fed at rate α by an asymmetrically dividing HSC pool of C fixed clones. We have also included terminal differentiation into peripheral blood and the effects of sampling on the expected clone size distribution. These ingredients yield a clone size distribution different from those previously derived [41, 42], as described in more detail below.
Stationarity in time
Clonal contributions of the initially transplanted HSC population have been measured over 4–12 years in four different animals. As depicted in Fig. 4 a, populations of individual clones of peripheral blood mononuclear cells from animal RQ5427, as well as all other animals, show significant variation in their dynamics. Since cells of any detectable lineage will number in the millions, this variability in lineage size across time cannot be accounted for by the intrinsic stochasticity of progenitorcell birth and death. Rather, these rises and falls of lineages likely arise from a complicated regulation of HSC differentiation and lineage aging. However, in our model and analysis, we do not keep track of lineage sizes n _{ i }. Instead, define Q(ν) as the fraction of clones arising with relative frequency ν≡f q/S or less (here, q is the number of VIS reads of any particular clone in the sample, f is the fraction of all sampled cells that are marked, and S is the total number of sequencing reads of marked cells in a sample). Figure 4 b shows data analyzed in this way and reveals that Q(ν) appears stationary in time.
The observed steadystate clone size distribution is broad, consistent with the mathematical model developed above. The handful of most populated clones constitutes up to 1–5 % of the entire differentiated blood population. These dominant clones are followed by a large number of clones with fewer cells. The smallest clones sampled in our experiment correspond to a single read q=1, which yields a minimum measured frequency ν _{min}=f/S. A single read may comprise only 10^{−4}– 10^{−3} % of all differentiated blood cells. Note that the cumulative distribution Q(ν) exhibits higher variability at small sizes simply because fewer clones lie below these smaller sizes.
Although engraftment occurs within a few weeks and total blood populations N _{p} and N _{d} (and often immune function) reestablish themselves within a few months after successful HSC transplant [43, 44], it is still surprising that the clone size distribution is relatively static within each animal (see Additional file 1 for other animals). Given the observed stationarity, we will use the steadystate results of our mathematical model (explicitly derived in Additional file 1) for fitting data from each animal.
Implications and model predictions
By using the exact steadystate solution for c _{ k } (Additional file 1: Eq. A21) in Additional file 1: Eq. A18, we can explicitly evaluate the expected clone size distribution 〈m _{ k }〉 using Eq. 6, and the expected cumulative clone fraction Q(q) using Eq. 7. In the steady state, the clone size distribution of progenitor cells can also be approximated by a gamma distribution with parameters a≡α/r and $\bar {r} \equiv r/\mu $ : $c_{k} \sim \bar {r}^{k} k^{1+a}$ (see Additional file 1: Eq. A27). In realistic steadystate scenarios near carrying capacity, r=r(N _{p})≲μ, as calculated explicitly in Additional file 1: Eq. A20. By defining $\bar {r}=r/\mu = 1\delta $ , we find that δ is inversely proportional to the carrying capacity:
The dependencies of 〈m _{ q }〉 on δ and a=α/r are displayed in Fig. 5 a, in which we have defined w≡(1+η)ω/μ _{d}.
Although our equations form a meanfield model for the expected number of measured clones of any given size, randomness resulting from the stochastic differentiation times of individual HSCs (all with the same rate α) is taken into account.
This is shown in Additional file 1: Eqs. A36–A39, where we explicitly consider the fully stochastic population of a single progenitor clone that results from the differentiation of a single HSC. Since independent unsynchronized HSCs differentiate at times that are exponentially distributed (with rate α), we construct the expected clone size distribution from the birth–death–immigration process [45] to find a result equivalent to that derived from our original model (Eq. 1 and Additional file 1: Eq. A21). Thus, we conclude that if a=α/r is small, the shape of the expected clone size distribution is mainly determined at short times by the initial repopulation of the progenitorcell pool.
Our model also suggests that the expected number of sampled clones relative to the number of active transplanted clones (see Additional file 1: Eq. A24) can be expressed as:
where the last approximation is accurate for ε w≪1 and C _{s}/C≪1. The clonal diversity one expects to measure in the peripheral blood sample is sensitive to the combination of biologically relevant parameters and rates δ and a=α/r. Figure 5 b shows the explicit dependence of the fraction of active clones on a and the combination of parameters defining δ, for ε w=ε(1+η)ω/μ _{d}=5×10^{−5}.
Our analysis shows how scaled measurable quantities such as C _{s}/C and C ^{−1}〈m _{ q }〉 depend on just a few combinations of experimental and biological parameters. This small domain of parameter sensitivity reduces the number of parameters that can be independently extracted from clone size distribution data. For example, the mode of terminal differentiation described by η clearly cannot be extracted from clonal tracking measurements. Similarly, models that are more detailed of the complex regulation processes would introduce additional parameters that are not resolved by these experiments. Nonetheless, we shall fit our data and known information contained in the experimental protocol to our model to estimate biologically relevant parameters, such as the total number of activated HSCs U+C, and thus indirectly C.
Model fitting
Our mathematical model for 〈m _{ k }〉 (and F(q) and Q(q)) includes numerous parameters associated with the processes of HSC differentiation, progenitorcell amplification, progenitorcell differentiation, peripheral blood turnover, and sampling. Data fitting is performed using clone size distributions derived separately from the read counts from both the left and right ends of each VIS (see [14] for details on sequencing). Even though we fit our data to 〈m _{ k }〉 using three independent parameters, a=α/r, $\bar {r}= r/\mu $ , and ε w, we found that within the relevant physiological regime, all clone distributions calculated from our model are most sensitive to just two combinations of parameters (see Additional file 1 for an explicit derivation):
where the last approximation for R is valid when $1\bar {r} = \delta \ll 1$ . While the fits are rather insensitive to ε w this parameter can fortunately be approximated from estimates of S and the typical turnover rate of differentiated blood. Consequently, we find two maximum likelihood estimates (MLEs) for a and R at each time point. It is important to note that fitting our model to steadystate clone size distributions does not determine all of the physiological parameters arising in our equations. Rather, they provide only two constraints that allow one to relate their values.
For ease of presentation, henceforth we will show all data and comparisons with our model equations in terms of the fraction Q(ν) or Q(q) (Figs. 4 b and 6 a, b). Figure 6 a, b shows MLE fitting to the raw data 〈m _{ k }〉 plotted in terms of the normalized but unrescaled data Q(q) for two different peripheral blood cell types from two animals (RQ5427 and RQ3570). Data from all other animals are shown and fitted in Additional file 1, along with overall goodnessoffit metrics. Raw cell count data are given in Kim et al. [19].
HSC asymmetric differentiation rate
The MLE for a=α/r, a ^{∗}, was typically in the range 10^{−2}– 10^{−1}. Given realistic parameter values, this quantity mostly provides an estimate of the HSC relative differentiation rate a ^{∗}∼α/(μ _{p}+η ω). The smallness of a ^{∗} indicates slow HSC differentiation relative to the progenitor turnover rate μ _{p} and the final differentiation rate η ω, consistent with the dominant role of progenitor cells in populating the total blood tissue. Note that besides the intrinsic insensitivity to ε w, the goodnessoffit is also somewhat insensitive to small values of a ^{∗} due to the weak dependence of c _{ k }∼1/k ^{1−a} on a (see Additional file 1). The normalized relative differentiation rates estimated from two animals are shown by the squares (right axis) in Fig. 6 c, d.
Number of HSCs
The stability of blood repopulation kinetics is also reflected in the number of estimated HSCs that contribute to blood (shown in Fig. 6 c, d). The total number of HSCs is estimated by expressing U+C in terms of the effective parameters, R and a, which in turn are functions of microscopic parameters (α,p,μ _{p},μ _{d},w, and K) that cannot be directly measured. In the limit of small sample size, S≪R ^{∗} K, however, we find U+C≈S/(R ^{∗} a ^{∗}) (see Additional file 1), which can then be estimated using the MLEs a ^{∗} and R ^{∗} obtained by fitting the clone size distributions. The corresponding values of U+C for two animals are shown by the circles (left axis) in Fig. 6 c, d. Although variability in the MLEs exists, the fluctuations appear stationary over the course of the experiment for each animal (see Additional file 1).
Conclusions
Our clonal tracking analysis revealed that individual clones of HSCs contributed differently to the final differentiated blood pool in rhesus macaques, consistent with mouse and human data. Carefully replotting the raw data (clone sizes) in terms of the normalized, rescaled cumulative clone size distribution (the fraction of all detected clones that are of a certain size or less) shows that these distributions reach steady state a few months after transplantation. Our results carry important implications for stem cell biology. Maintaining homeostasis of the blood is a critical function for an organism. Following a myeloablative stem cell transplant, the hematopoietic system must repopulate rapidly to ensure the survival of the host. Not only do individual clones rise and fall temporally, as previously shown [19], but as any individual clone of a certain frequency declines, it is replaced by another of similar frequency. This exchangecorrelated mechanism of clone replacement may provide a mechanism by which overall homeostasis of hematopoiesis is maintained long term, thus ensuring continued health of the blood system.
To understand these observed features and the underlying mechanisms of stem cellmediated blood regeneration, we developed a simple neutral population model of the hematopoietic system that quantifies the dynamics of three subpopulations: HSCs, transitamplifying progenitor cells, and fully differentiated nucleated blood cells. We also include the effects of global regulation by assuming a Hilltype growth rate for progenitor cells in the bone marrow but ignore celltocell variation in differentiation and proliferation rates of all cells.
Even though we do not include possible HSC heterogeneity, variation in HSC activation, progenitorcell regulation, HSC and progenitorcell aging (progenitor bursting), niche and signal moleculemediated controls, or intrinsic genetic and epigenetic differences, solutions to our simple homogeneous HSC model are remarkably consistent with observed clone size distributions. As a first step, we focus on how the intrinsic stochasticity in just the cellular birth, death, and differentiation events gives rise to the progenitor clone size distribution.
To a large extent, the exponentially distributed first HSC differentiation times and the growth and turnover of the progenitor pool control the shape of the expected longtime clone size distribution. Upon constraining our model to the physiological regime relevant to the experiments, we find that the calculated shapes of the clone size distributions are sensitive to effectively only two composite parameters. The HSC differentiation rate α sets the scale of the expected clone size distribution but has little effect on the shape. Parameters, including carrying capacity K, active HSCs U+C, and birth and death rates p,ω,μ _{p},μ _{d}, influence the shape of the expected clone size distribution 〈m _{ q }〉 only through the combination R, and only at large clone sizes.
Our analysis allowed us to estimate other combinations of model parameters quantitatively. Using a MLE, we find values for the effective HSC differentiation rate a ^{∗}∼10^{−2}– 10^{−1} and the number of HSCs that are contributing to blood within any given time frame U+C∼10^{3}– 10^{4}. Since the portion of HSCs that contribute to blood may vary across their typical life span L∼25 years, the total number of HSCs can be estimated by (U+C)×L/τ, where τ∼1 year [19]. Our estimate of a total count of ∼3×10^{4}– 3×10^{5} HSCs is about 30fold higher than the estimate of Abkowitz et al. [33] but is consistent with Kim et al. [19]. Note that the ratio of C to the total number of initially transplanted CD34+ cells provides a measure of the overall potency of the transplant towards blood regeneration. In the extreme case in which one HSC is significantly more potent (through, e.g., a faster differentiation rate), this ratio would be smaller. An example of this type of heterogeneity would be an HSC with one or more cancerassociated mutations, allowing it to outcompete other transplanted normal HSCs. Hence, our clonal studies and the associated mathematical analysis can provide a framework for characterizing normal clonal diversity as well as deviations from it, which may provide a metric for early detection of cancer and other related pathologies.
Several simplifying assumptions have been invoked in our analysis. Crucially, we assumed HSCs divided only asymmetrically and ignored instances of symmetric selfrenewal or symmetric differentiation. The effects of symmetric HSC division can be quantified in the steadystate limit. In previous studies, the selfrenewal rate for HSCs in primates is estimated as 4–9 months [46, 47], which is slightly longer than the short timescale (∼2–4 months) on which we observe stabilization of the clone size distribution. Therefore, if the HSC population slowly increases in time through occasional symmetric division, the clone size distribution in the peripheral blood will also shift over long times. The static nature of the clone distributions over many years suggests that size distributions are primarily governed by mechanisms operating at shorter timescales in the progenitor pool. For an HSC population (such as cancerous or precancerous stem cells [48]) that has already expanded through early replication, the initial clone size distribution within the HSC pool can be quantified by assuming an HSC pool with separate carrying capacity K _{HSC}. Such an assumption is consistent with other analyses of HSC renewal [49]. All our results can be used (with the replacement C→K _{HSC}) if the number of transplanted clones C≥K _{HSC} because replication is suppressed in this limit. When K _{HSC}≫C≫1, replicative expansion generates a broader initial HSC clone size distribution (see Additional file 1). The resulting final peripheral blood clone size distribution can still be approximated by our result (Eq. 6) if the normalized differentiation rate a≪1, exhibiting the insensitivity of the differentiated clone size distribution to a broadened clone size distribution at the HSC level. However, if HSC differentiation is sufficiently fast (a≪̸1), the clonal distribution in the progenitor and differentiated pools may be modified.
To understand the temporal dynamics of clone size distributions, a more detailed numerical study of our full timedependent neutral model is required. Such an analysis can be used to investigate the effects of rapid temporal changes in the HSC division mode [41]. Temporal models would also allow investigation into the evolution of HSC mutations and help unify concepts of clonal stability (as indicated by the stationarity of rescaled clone size distributions) with ideas of clonal succession [10, 11] or dynamic repetition [12] (as indicated by the temporal fluctuations in the estimated number U+C of active HSCs). Predictions of the timedependent behavior of clone size distributions will also prove useful in guiding future experiments in which the animals are physiologically perturbed via e.g., myeloablation, hypoxiation, and/or bleeding. In such experimental settings, regulation may also occur at the level of HSC differentiation (α) and a different mathematical model may be more appropriate.
We have not addressed the temporal fluctuations in individual clone abundances evident in our data (Fig. 4 a) or in the wavelike behavior suggested by previous studies [19]. Since the numbers of detectable cells of each VIS lineage in the whole animal are large, we believe these fluctuations do not arise from intrinsic cellular stochasticity or sampling. Rather, they likely reflect slow timescale HSC transitions between quiescent and active states and/or HSC aging [50]. Finally, subpopulations of HSCs that have different intrinsic rates of proliferation, differentiation, or clearance could then be explicitly treated. As long as each subtype in a heterogeneous HSC or progenitorcell population does not convert into another subtype, the overall aggregated clone size distribution 〈m _{ k }〉 will preserve its shape. Although steadystate data are insufficient to provide resolution of cell heterogeneity, more resolved temporal data may allow one to resolve different parameters associated with different cell types. Such extensions will allow us to study the temporal dynamics of individual clones and clone populations in the context of cancer stem cells and will be the subject of future work.
Abbreviations
 HSC:

hematopoietic stem cell
 HSPC:

hematopoietic stem and progenitor cell
 MLE:

maximum likelihood estimate
 VIS:

viral vector integration site
References
 1
Enver T, Heyworth CM, Dexter TM. Do stem cells play dice?Blood. 1998; 92:348–52.
 2
Hoang T. The origin of hematopoietic cell type diversity. Oncogene. 2004; 23:7188–98.
 3
MullerSieburg CE, Cho RH, Thoman M, Adkins B, Sieburg HB. Deterministic regulation of hematopoietic stem cell selfrenewal and differentiation. Blood. 2002; 100:1302–9.
 4
Copley MR, Beer PA, Eaves CJ. Hematopoietic stem cell heterogeneity takes center stage. Cell Stem Cell. 2012; 10:690–7.
 5
MullerSieburg CE, Sieburg HB, Bernitz JM, Cattarossi G. Stem cell heterogeneity: implications for aging and regenerative medicine. Blood. 2012; 119:3900–7.
 6
Lu R, Neff NF, Quake SR, Weissman IL. Tracking single hematopoietic stem cells in vivo using highthroughput sequencing in conjunction with viral genetic barcoding. Nat Biotechnol. 2011; 29:928–33.
 7
Huang S. Nongenetic heterogeneity of cells in development: more than just noise. Development. 2009; 136:3853–62.
 8
Osafune K, Caron L, Borowiak M, Martinez RJ, FitzGerald CS, Sato Y, et al. Marked differences in differentiation potential among human embryonic stem cell lines. Nat Biotechnol. 2008; 26:313–15.
 9
Pang WW, Price EA, Sahoo D, Beerman I, Maloney WJ, Rossi DJ, et al. Human bone marrow hematopoietic stem cells are increased in frequency and myeloidbiased with age. Proc Natl Acad Sci USA. 2011; 108:20012–17.
 10
Harrison DE, Astle CM, Lerner C. Number and continuous proliferative pattern of transplanted primitive immunohematopoietic stem cells. Proc Natl Acad Sci USA. 1988; 85:822–6.
 11
Verovskaya E, Broekhuis MJC, Zwart E, Ritsema M, van Os R, de Haan G, et al. Heterogeneity of young and aged murine hematopoietic stem cells revealed by quantitative clonal analysis using cellular barcoding. Blood. 2013; 122:523–32.
 12
Takizawa H, Regoes RR, Boddupalli CS, Bonhoeffer S, Manz MG. Dynamic variation in cycling of hematopoietic stem cells in steady state and inflammation. J Exp Med. 2011; 208:273–84.
 13
Gerrits A, Dykstra B, Kalmykowa OJ, Klauke K, Verovskaya E, Broekhuis MJC, et al. Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood. 2010; 115:2610–18.
 14
Kim S, Kim N, Presson AP, An DS, Mao SH, Bonifacino AC, et al. Highthroughput, sensitive quantification of repopulating hematopoietic stem cell clones. J Virol. 2010; 84:11771–80.
 15
Biffi A, Montini E, Lorioli L, Cesani M, Fumagalli F, Plati T, et al. Lentiviral hematopoietic stem cell gene therapy benefits metachromatic leukodystrophy. Science. 2013; 341:1233158.
 16
Aiuti A, Biasco L, Scaramuzza S, Ferrua F, Cicalese MP, Baricordi C, et al. Lentiviral hematopoietic stem cell gene therapy in patients with Wiskott–Aldrich syndrome. Science. 2013; 341:1233151.
 17
CavazzanaCalvo M, Payen E, Negre O, Wang G, Hehir K, Fusil F, et al. Transfusion independence and HMGA2 activation after gene therapy of human βthalassaemia. Nature. 2010; 467:318–22.
 18
Cartier N, HaceinBeyAbina S, Bartholomae CC, Veres G, Schmidt M, Kutschera I, et al. Hematopoietic stem cell gene therapy with a lentiviral vector in Xlinked adrenoleukodystrophy. Science. 2009; 326:818–23.
 19
Kim S, Kim N, Presson AP, Metzger ME, Bonifacino AC, Sehl M, et al. Dynamics of HSPC repopulation in nonhuman primates revealed by a decadelong clonaltracking study. Cell Stem Cell. 2014; 14:473–85.
 20
Sun J, Ramos A, Chapman B, Johnnidis JB, Le L, Ho YJ, et al. Clonal dynamics of native haematopoiesis. Nature. 2014; 514:322–7.
 21
Loeffler M, Roeder I. Tissue stem cells: definition, plasticity, heterogeneity, selforganization and models – a conceptual approach. Cells Tissue Organs. 2002; 171:8–26.
 22
Bernard S, Belair J, Mackey MC. Oscillations in cyclical neutropenia: new evidence based on mathematical modeling. J Theor Biol. 2003; 223:283–98.
 23
Dingli D, Pacheco JM. Modeling the architecture and dynamics of hematopoiesis. Wiley Interdiscip Rev Syst Biol Med. 2010; 2:235–44.
 24
Courant R. Differential and integral calculus. Vol. II. London: Blackie & Son; 1936.
 25
D’Orsogna MR, Lakatos G, Chou T. Stochastic selfassembly of incommensurate clusters. J Chem Phys. 2012; 136:084110.
 26
MarciniakCzochra A, Stiehl T, Ho AD, Jager W, Wagner W. Modeling of asymmetric cell division in hematopoietic stem cells – regulation of selfrenewal is essential for efficient repopulation. Stem Cells Dev. 2009; 18:377–85.
 27
Kent DG, Li J, Tanna H, Fink J, Kirschner K, Pask DC, et al. Selfrenewal of single mouse hematopoietic stem cells is reduced by JAK2V617F without compromising progenitor cell expansion. PLoS Biol. 2013; 11:1001576.
 28
Lander AD, Gokoffski KK, Wan FYM, Nie Q, Calof AL. Cell lineages and the logic of proliferative control. PLoS Biol. 2009; 7:1000015.
 29
Hoffmann M, Chang HH, Huang S, Ingber DE, Loeffler M, Galle J. Noisedriven stem cell and progenitor population dynamics. PLoS One. 2008; 3:2922.
 30
Roshan A, Jones PH, Greenman CD. Exact, timeindependent estimation of clone size distributions in normal and mutated cells. J Roy Soc Interface. 2014; 11:20140654.
 31
McHale PT, Lander A. The protective role of symmetric stem cell division on the accumulation of heritable damage. PLoS Comput Biol. 2014; 10:1003802.
 32
Antal T, Krapivsky PL. Exact solution of a twotype branching process: Clone size distribution in cell division kinetics. J Stat Mech. 2010; P07028.
 33
Abkowitz JL, Caitlin SN, McCallie MT, Guttorp P. Evidence that the number of hematopoietic stem cells per animal is conserved in mammals. Blood. 2002; 100:2665–7.
 34
Morozov AY, Bruinsma R, Rudnick J. Assembly of viruses and the pseudolaw of mass action. J Chem Phys. 2009; 131:155101.
 35
Krapivsky PL, BenNaim E, Redner S. Statistical physics of irreversible processes. Cambridge, UK: Cambridge University Press; 2010.
 36
Szkely TJr, Burrage K, Mangel M, Bonsall MB. Stochastic dynamics of interacting haematopoietic stem cell niche lineages. PLoS Comput Biol. 2014; 10:1003794.
 37
Mackay R. Unified hypothesis for the origin of aplastic anemia and periodic hematopoiesis. Blood. 1978; 51:941–56.
 38
KeshetEdelstein L. Mathematical models in biology. New York, NY: SIAM; 2005.
 39
Wolfensohn S, Lloyd M. Handbook of laboratory animal management and welfare, 3rd ed. Oxford: Blackwell Publishing; 2003.
 40
Kimura M. Population genetics, molecular evolution, and the neutral theory: selected papers In: Takahata N, editor. Chicago, IL: University of Chicago Press: 1995.
 41
He J, Zhang G, Almeida AD, Cayoutte M, Simons BD, Harris WA. How variable clones build an invariant retina. Neuron. 2012; 75:786–98.
 42
Blanpain C, Simons BD. Unravelling stem cell dynamics by lineage tracing. Nat Rev Mol Cell Biol. 2013; 14:489–502.
 43
Guillaume T, Rubenstein DB, Symann M. Immune reconstitution and immunotherapy after autologous hematopoietic stem cell transplantation. Blood. 1998; 92:1471–90.
 44
Tzannou I, Leen AM. Accelerating immune reconstitution after hematopoietic stem cell transplantation. Clin Transl Immunol. 2014; 3:11.
 45
Allen LJS. An introduction to stochastic processes with applications to biology. Upper Saddle, NJ: Pearson Prentice Hall; 2003.
 46
Shepherd BE, Guttorp P, Lansdorp PM, Abkowitz JL. Estimating human hematopoietic stem cell kinetics using granulocyte telomere lengths. Exp Hematol. 2004; 32:1040–50.
 47
Shepherd BE, Kiem HP, Lansdorp PM, Dunbar CE, Aubert G, LaRochelle A, et al. Hematopoietic stemcell behavior in nonhuman primates. Blood. 2007; 110:1806–13.
 48
Driessens G, Beck B, Caauwe A, Simons BD, Blanpain C. Defining the mode of tumour growth by clonal analysis. Nature. 2012; 488:527–30.
 49
Sieburg HB, Cattarossi G, MullerSieburg CE. Lifespan differences in hematopoietic stem cells are due to imperfect repair and unstable meanreversion. PLoS Comput Biol. 2013; 9:1003006.
 50
Weiss GH. Equations for the age structure of growing populations. Bull Math Biophys. 1968; 30:427–35.
 51
Catlin SN, Busque L, Gale RE, Guttorp P, Abkowitz JL. The replication rate of human hematopoietic stem cells in vivo. Blood. 2011; 117:4460–6.
 52
DeBoer RJ, Mohri H, Ho DD, Perelson AS. Turnover rates of B cells, T cells, and NK cells in simian immunodeficiency virusinfected and uninfected rhesus macaques. J Immunol. 2003; 170:2479–87.
 53
Pillay J, den Braber I, Vrieskoop N, Kwast LM, de Boer RJ, Borghans JAM, et al. In vivo labeling with ^{2}H_{2}O reveals a human neutrophil lifespan of 5.4 days. Blood. 2010; 116:625–7.
Acknowledgments
This work was supported by grants from the National Institutes of Health (R01AI110297 and K99HL116234), the California Institute of Regenerative Medicine (TRX01431), the University of California, Los Angeles, AIDS Institute/Center for AIDS Research (AI28697), the National Science Foundation (PHY1125915 KITP/UCSB), and the Army Research Office (W911NF1410472). The authors also thank B Shraiman and RKP Zia for helpful discussions.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
TC and SG designed and developed the mathematical modeling and data analysis. TC, SG, and SK wrote the manuscript. SK and IC participated in study design and data interpretation. All authors read and approved the final manuscript.
Additional file
Additional file 1
Mathematical appendices and data fitting. (PDF 327 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Hematopoiesis
 Stem cell clones
 Lineage tracking
 Mathematical modeling