Evidence for close molecular proximity between reverting and undifferentiated cells

According to Waddington’s epigenetic landscape concept, the differentiation process can be illustrated by a cell akin to a ball rolling down from the top of a hill (proliferation state) and crossing furrows before stopping in basins or “attractor states” to reach its stable differentiated state. However, it is now clear that some committed cells can retain a certain degree of plasticity and reacquire phenotypical characteristics of a more pluripotent cell state. In line with this dynamic model, we have previously shown that differentiating cells (chicken erythrocytic progenitors (T2EC)) retain for 24 h the ability to self-renew when transferred back in self-renewal conditions. Despite those intriguing and promising results, the underlying molecular state of those “reverting” cells remains unexplored. The aim of the present study was therefore to molecularly characterize the T2EC reversion process by combining advanced statistical tools to make the most of single-cell transcriptomic data. For this purpose, T2EC, initially maintained in a self-renewal medium (0H), were induced to differentiate for 24H (24H differentiating cells); then, a part of these cells was transferred back to the self-renewal medium (48H reverting cells) and the other part was maintained in the differentiation medium for another 24H (48H differentiating cells). For each time point, cell transcriptomes were generated using scRT-qPCR and scRNAseq. Our results showed a strong overlap between 0H and 48H reverting cells when applying dimensional reduction. Moreover, the statistical comparison of cell distributions and differential expression analysis indicated no significant differences between these two cell groups. Interestingly, gene pattern distributions highlighted that, while 48H reverting cells have gene expression pattern more similar to 0H cells, they are not completely identical, which suggest that for some genes a longer delay may be required for the cells to fully recover. Finally, sparse PLS (sparse partial least square) analysis showed that only the expression of 3 genes discriminates 48H reverting and 0H cells. Altogether, we show that reverting cells return to an earlier molecular state almost identical to undifferentiated cells and demonstrate a previously undocumented physiological and molecular plasticity during the differentiation process, which most likely results from the dynamic behavior of the underlying molecular network.


Background
The integration and processing of endogenous and exogenous information constitute a fundamental requirement for cells to ensure functions and survival of unicellular or multicellular organisms. Cellular decision-making is then at the core of the physiological or pathological functioning of living organisms. Early views of the mechanisms governing cell-fate decisionmaking, and in particular cell differentiation, were based on bulk population data, leading to an over-simplifying deterministic framework. In these first views, cell commitment to a predefined cell type was thought to be triggered through a stereotyped sequence of intermediate states under the influence of specific signals [1].
Single-cell approaches have allowed to change the scale of observation of many molecular processes and revealed that an important heterogeneity in gene expression lies at the heart of isogenic cell populations [2,3]. Stochasticity in gene expression arises from different causes, such as the probabilistic nature of molecular interactions or transcriptional bursts [4]. Cell-to-cell variability is visible at all omics levels of gene expression, but is being widely studied at the transcriptomic level since various molecular biology tools are available for this scale of investigation [5]. Overall, this heterogeneity in gene expression has been shown to be critical for the process of differentiation, as it provides diversity without the cost of hardwire-encoded fate programs [6,7].
Furthermore, single-cell studies have also enabled the development of stochastic models to describe differentiation from single-cell transcriptomic data. One of the best-known models is Conrad Waddington's landscape, which also includes the non-genetic part of cell-to-cell heterogeneities [8]. According to Waddington's model, the shape of the landscape is determined by Gene Regulatory Networks (GRN) and state transitions are modelled as channeling events: a cell, presented as a ball, starts from a mountain top and crosses valleys before reaching a stable state by occupying basins or attractor states, shaped by an underlining GRN [9]. Once this stable state is reached, the state potential decreases and the associated cell fate is restricted or even irreversible [10].
However, it is now clearly accepted that some cells retain fate plasticity [11,12]. Under the forced modification of transcription factors stoichiometry, a cell that has reached a differentiated state can return to a more pluripotent stage challenging the classical hierarchical view of differentiation [13,14]. Quite interestingly, spontaneous fate reversion can be observed under a physiological or damaging condition where progenitors or even more committed cells return to an earlier stage, potentially more pluripotent, and reacquire progenitor or stem-cell-like phenotypes and characteristics [15][16][17][18]. In this view, our recent study has shown that chicken primary erythroid progenitor cells (T2EC) have retained the capacity to go back to a self-renewal state for up to 24H after the induction of differentiation before they irreversibly engaged in the differentiation process [19]. Despite intriguing and promising results, the molecular determinants of this so-called fate reversion and the molecular characterization of the reverting cells remain unexplored.
In this work, we go beyond the cellular and phenotypic characterization of the cell reversion process. We characterize the gene expression of primary erythroid progenitors and question if reverting cells undergo an actual fate reversion, i.e., in addition to regaining a comparable cellular state, reacquire a molecular state similar to undifferentiated cells.
For this, differentiation of self-renewing cells was induced by medium change during 24H. Then, we split the differentiating population so that half could pursue differentiation, and the second half was shifted back in a self-renewal medium (Fig. 1). To provide robust quantitative measurements of gene expression variability, we combined a highly sensitive targeted quantification method (scRT-qPCR) with genome-wide scRNAseq data to characterize the transcriptome of each population at the single-cell level: undifferentiated (0H), differentiating (24H and 48H), and reverting (48H reverting) cells. Our statistical analyses show that 48H reverting cells and undifferentiated cells were much more similar, whereas a separation was clearly visible between cells maintained in differentiation (48H differentiating cells) and cells in reversion (48H reverting cells). Furthermore, a statistical comparison of cell distributions indicated no significant differences between 0H cells and 48H reverting cells. Moreover, gene expression pattern distribution of 48H reverting cells showed a shift towards expression pattern distribution of 0H cells. Finally, we identified genes that discriminate 48H reverting cells and 0H cells. Using sparse PLS [20], we were able to show that the expression of 3 genes, HBBA, TBC1D7, and HSP90AA1, was discriminant between 48H reverting cells and 0H cells showing that reverting cells kept transcriptional traces of their induction to differentiation. In conclusion, our results show that reverting cells display gene expression patterns that are very similar to undifferentiated cells while retaining traces of their response to differentiation induction, which suggests an almost complete molecular reversion after 24H of differentiation induction.

Robustness of single-cell transcriptomics analysis
We sought to characterize at the molecular level the cells that were induced to differentiate for 24 h and that retained the ability to proliferate when placed back into a self-renewal medium. We used two different complementary single-cell transcriptomics technologies, scRT-qPCR and scRNAseq. scRT-qPCR allows for highly sensitive quantification but is knowledge-driven and offers information of a limited number of genes while scRNAseq, although less precise for low expression level [21], enables genome-wide quantification without prior knowledge. Furthermore, using two different single-cell technologies allowed us to cross-validate our observations and point toward robust conclusions.
We first obtained by scRT-qPCR the expression level of 83 genes involved in T2EC differentiation in 173, 173, 168, and 171 cells for 0H, 24H, and 48H of differentiation and 48H reverting cells, respectively. Those genes are known to distinguish cells along the differentiation process and include sterol biosynthesis, metabolism, globin subunits, and transcription factors expressed by erythroid progenitors as published in [19]. The robustness of our measurements was confirmed by a Pearson's correlation of 0.85 (p-value = 2.2e−16) between our experiments and the published data [19]. To investigate fate-reversion genome-wide by scRNAseq, we adapted the MARSseq (massively parallel single-cell RNA-Seq [22] -see the "Methods" section). Then, we obtained gene expression levels in 174, 181, 169, and 186 single cells for 0H, 24H, and 48H of differentiation and 48H reverting cells, respectively. The concordance between scRT-qPCR and scRNAseq data was confirmed by a Pearson's correlation of 0.73 (p-value = 1.34e−13) between the 74 genes common to both datasets.

Similarity between reverting and undifferentiated cells revealed by dimension reduction
We used UMAP to uncover potential similarities between 48H reverting cells and subgroups of differentiating cells by projecting the 4 conditions ( Fig. 2A, scRT-qPCR data, and Fig. 2B, scRNAseq data). Then, we focused on the normal differentiation process using the 3 time points of differentiation (0H, 24H, and 48H differentiating cells) (Fig. 2C-H). For both experiments, pairwise representations show that 24H differentiating cells tend to overlap with both 0H cells (Fig. 2C, D) and 48H differentiating cells (Fig. 2E, F). On the contrary, the undifferentiated cells and 48H differentiating cells clearly differ (Fig. 2G, H). Interestingly, pairwise representations also reveal that 48H reverting cells separate well from the 48H differentiating cells (Fig. 2I, J) and from 24H cells (Fig. 2K, L), but are visually not distinguishable from the 0H cells (Fig. 2M, N). Almost identical results were observed when, instead of plotting cells on the UMAPs calculated from the mix of the 4 conditions, we recalculated the UMAPs for each pair of conditions (Additional file 1: Fig. S1). Principal Component Analysis (PCA) also captured this general separation of the data (Additional file 1: Fig. S2). Those analyses suggest that the transcriptomes of 48H reverting cells are more similar to the undifferentiated cells than to any other condition at both scales of observation. This was further confirmed by the pairwise statistical comparison of average scRNAseq distributions ([23] -see the "Methods" section). As shown in Table 1, the average transcriptomes of 48H reverting and 48H differentiating cells are significantly different, as well as of undifferentiated and 48H differentiating cells. In contrast, no significant difference in average transcriptomes was detected between 0H and 48H reverting conditions (p-value >> 0.05), indicating a very close proximity of 48H reverting cells to undifferentiated cells.

48H reverting cells and undifferentiated cells have similar gene expression patterns
We then questioned if 48H reverting cells had gene expression patterns identical to 0H cells or retained, for some genes, an expression pattern more similar to 24H or 48H differentiating cells.
Pairwise scRNAseq DE (differential expression) analysis revealed that the "normal" erythrocyte differentiation  process showed an increase in the expression of hemoglobin-related genes during the kinetics (hemoglobin subunit epsilon 1 (HBBA), hemoglobin alpha-locus 1 (HBA1), and hemoglobin alpha, subunit D (HBAD)) ( Fig. 3A-C). On the other hand, 0H cells expressed a high level of LDHA (lactate dehydrogenase A), a marker for glycolysis metabolism used by self-renewing cells [24], and ID2 (inhibitor of DNA binding 2) coding for a transcription factor involved in differentiation inhibition [25]. Interestingly, when comparing 0H with 48H reverting cells, we saw only one gene that was significantly differentially expressed just above the threshold (Fig. 3D), the Fig. 3 Volcano plot of differentially expressed genes from scRNAseq data between conditions analyzed two by two. A 0H and 24H differentiating cells. B 24H differentiating and 48H differentiating cells. C 0H and 48H differentiating cells. D 0H and 48H reverting cells. E 48H reverting and 48H differentiating cells. Genes are considered significantly differentially expressed when the fold change is equal to or above 0.5 and the adjusted p-value is below 0.05 RSFR (RNase super family related) gene, that is highly expressed in precursor cells from chicken bone marrow [26]. Furthermore, when comparing 48H reverting with 48H differentiating cells, we found hemoglobin-related genes up in the differentiating cells and LDHA and ID2 up in reverting cells (Fig. 3E).
We more closely investigated gene expression distributions within the different conditions to see how gene expression patterns would evolve during the reversion process (Fig. 4). We selected 8 genes differentially expressed and which expression increases or decreases during the differentiation process. HBA1, HBBA, HBAD (different hemoglobin subunits), and FECH (Ferrochelatase) are involved in hemoglobin and heme pathways and are more expressed by differentiating cells while LDHA, ID2, CSTA (cystatin A1), and CRIP1 (Cysteine-rich intestinal protein1) are more expressed by self-renewing undifferentiated cells. We plotted and compared their distribution between the 4 conditions. For the genes involved in differentiation, we see a gradual shift in the distributions towards a higher level of expression as cells get more differentiated (Fig. 4A-D) and we see the opposite shift for genes involved in proliferation ( Fig. 4E-H). In all cases, the 48H reverting cell expression patterns for those genes shifted back to patterns closer to the 0H cells. At the time of observation and especially for genes up in differentiation, the 48H reverting cell expression patterns are not completely similar to those of 0H cells. This was further confirmed by using a dedicated statistical tool, sparse PLS (see below).
To go further on gene distribution comparisons, we computed Wasserstein distances, a geometric distance well suited for comparing multimodal distributions, for each 2000 genes of the scRNAseq dataset between each condition two by two. We then obtain 6 distributions of Wasserstein distance values. Finally, we computed the Gini index as a measure of statistical dispersion in each distribution (the higher the Gini index is, the higher inequality among the values). We performed 100 bootstraps and compared the Gini values obtained (Fig. 5A). Distribution of Wasserstein distances between 0H cells and 48H reverting cells had the smallest average Gini index among all distributions (Fig. 5B). This result points towards a closer global transcriptional state between 48H reverting cells and 0H cells.

48H reverting cells retain molecular traces of a commitment into differentiation
To further characterize the molecular changes that persisted after reversion, we sought to identify predictive genes that discriminate the most the 48H reverting cells and the undifferentiated cells. We performed logistic regression combined with dimension reduction (partial least square [20]) between 48H reverting cells and 0H cells and retained common most discriminating genes between scRT-qPCR and scRNAseq datasets. Interestingly, our results showed that only 3 common genes discriminate between the two cell groups: HBBA, TBC1D7, and HSP90AA1, the expression of which is shown in Fig. 6. HBBA is a subunit of the hemoglobin complex which carries oxygen, TBC1D7 is presumed to have a role in regulating cell growth and differentiation [27], and HSP90AA1 codes for an isoform of the HSP90 protein chaperone, which its specific transcription is known to be induced in response to insulin [28]. Looking closely, the 48H reverting cells have an intermediate expression level between differentiating cells and undifferentiated cells for the three predicted genes. The offset observed could be due to a longer duration of mRNA half-life at 24H of differentiation. We had previously performed a quantification of mRNA half-life during avian erythrocyte differentiation ( [29] Additional file 1: Fig. S3). We focused on mRNA half-life at 24H for those three genes. TBC1D7 and HSP90AA1 have a relatively short halflife as opposed to HBBA. Other genes analyzed whose expression increases during differentiation, such as DPP7, TPP1, or RPL22L1, have also a long half-life duration mRNA, but only HBBA was identified in our statistical analysis as discriminating between undifferentiated and 48H reverting cells.
These results confirmed that the 48H reverting cells display a gene expression pattern very close to those of 0H cells while still retaining traces of their engagement into the differentiation process independently of the mRNA half-life. The molecular process explaining such "lagging genes" will have to be explored.

Cells are distributed as a continuum along the differentiation path
At that stage, two hypotheses could be made: (1) Either all cells have engaged into a differentiation process and do molecularly revert to a self-renewal transcriptional state or (2) at 24H of differentiation two subpopulations coexist: one that is still undifferentiated and would give rise to the 48H reverting cells and a second more differentiated which would lead to the 48H differentiating population and die in the reversion experiment.
We hypothesized that the existence of two subpopulations at 24 h should lead to a higher number of modes in the distribution of some genes at that time point. To test this hypothesis, we therefore estimated for each condition the most-likely number of modes for the probability distribution of each gene, as assessed through a Gamma mixture on scRNAseq (see the "Methods" section). We found no significant difference in the number of modes observed between the 4 populations (Fig. 7), which The second hypothesis would also imply that in the 24H population, the cells engaged too far in the differentiation process would die a short time after media were changed, while only the undifferentiated ones would survive. We then measured the viability rate during the kinetics and found no difference in viability between the conditions and especially between the 24H differentiation and the 48H reversion conditions (Additional file 1: Fig. S4). Finally, the second hypothesis would also imply that the reverting cells are simply cells that have not yet entered the differentiation process. It would therefore be at odds with the evidence that the 48H reverting cells do retain traces of their engagement into the differentiation process (see upper).
Those results strongly suggest that the 24H cell population is not composed of two coexisting subpopulations of cells and that 48H reverting cells enter differentiation before going back to a transcriptomic state close to 0H cells.

Discussion
In the present study, we coupled two different single-cell transcriptomics techniques and state-of-the-art statistical approaches to demonstrate the fate reversibility of avian erythrocyte progenitors induced to differentiate for 24 h.
Our results revealed a very close proximity of reverting and undifferentiated cell transcriptomes. Indeed, statistical comparison of cell distributions showed no significant difference between 0H and 48H reverting cells while, as expected, significant changes in gene expression accompanied the differentiation sequence. The analysis of gene expression distribution patterns of the 48H reverting cells confirmed a switch toward the 0H cell gene expression profiles. First, DE analysis of scRNAseq data showed only one gene significantly differentially expressed between the two conditions. Second, Wasserstein distance analysis revealed closer distances between 48H reverting and 0H cells than to any other group of cells. Third, sparse PLS analysis indicated that the expression level of only three genes, HBBA, TBC1D7, and HSP90AA1, was predictive of the 48H reverting and undifferentiated cells. Interestingly, the persistence of those three genes in 48H reverting cells could not be attributed solely to mRNA half-life duration. However, we cannot exclude that it could be a mere delay and thus a characterization of the reverting cells at a later time point may show a complete molecular reversion.
All of our results therefore favor the hypothesis that a vast majority of the 48H reverting cells responded to differentiation induction by modifying their gene expression profiles but then returned to the self-renewal transcriptional state.
One must note that this would not be the sole example of large-scale transcriptomic changes on (relatively) short time scales [18,30]. The question as to whether such large-scale transcriptome changes are accompanied, or not, by (reversible) large-scale epigenetic changes remains an open question for future studies.
It has been described in the literature that during cellular decision-making, the cell state is maintained by dynamic interactions between positive and negative regulatory molecules [31] within the frame of a Gene Regulatory Network (GRN). These interactions can be repurposed by changing the stoichiometry of ubiquitous and specific regulatory molecules and factors [11,13]. In our study, the analysis of gene expression patterns during the reversion process confirmed that the determination of the fate of erythrocyte progenitors is directed by the constraints of the dynamics of the GRN, influenced by signals emitted by changing conditions of the environment surrounding the cells. In the absence of differentiation signals (or in the presence of self-renewal inducing signals), there is no ratchet in place that would prevent (at least at early stages in our case) the system to return back to its original quasi-steady state. This is in excellent agreement with the previous demonstration that there is a duration threshold for some GRN under which the system can return back to its original state [32]. This was proposed to allow cells to discriminate between bona fide signals and random noise in their environment and could represent a physiological system for finely tuning the in vivo production of red blood cells while preserving the pool of progenitors. We recently proposed a methodology for inferring the GRN underlying T2EC differentiation [29]. For that, we kept in silico cells under constant differentiation stimulus. It would be of interest to see if the inferred GRN would be able to revert, up to a certain point where no "spontaneous" return is possible [19], to its original state. This would be a very strong constraint to impose and should severely limit the number of putative GRN able to reproduce experimental data and thus approaching the most accurate network. Taken together, our results point towards a physiological plasticity and reversibility with respect to erythrocyte decision-making. It is also reminiscent of the plasticity observed in cancer stem cells that might not be specific to tumor cells [33]. In terms of the epigenetic landscape, our work implies that instead of a continuous gradient that the cells will roll down as in the classical Waddington's depiction [8], they may go through an unstable state and may, sometimes, roll upwards over a bump in the landscape [34]. Thus, differentiation should be more appropriately described as cells moving from well to well, that is, from one metastable state [35][36][37] to another one ( Fig. 8). This view abides by the multi-stability framework where a complex quasi-potential landscape aims at describing both normal and pathological differentiation processes [37,38], and exemplifies the fact that "commitment (is) a dynamical property of the landscape" [39]. It is important at this stage to remember that Waddington himself was aware that his drawing was but a simplification. Adapting and refining this landscape should not be considered as departing from his views. Such a nonmonotonous landscape has been proposed to account for the depiction of regeneration in adult tissues [12] and is consistent with previously proposed dynamical principles of cell fate restriction [10]. It is in excellent accordance with the recent depiction that cells can "climb uphill on Waddington's epigenetic landscape" during cranial neural crest cell development [15] and would also be more relevant to account for the "hesitant" behavior of human CD34+ stem cells in vitro [40] than a straight slope. It is beyond the scope of this discussion to go into more  [19]), the removal of the environmental influences results in their relaxing back to their original attractor state details, but a cell "climbing uphill" should be seen as equivalent to "the landscape bending into a new valley. "

Conclusions
Our work has provided a detailed molecular characterization of the probabilistic nature of erythrocyte cell fate determination, influenced by the constraints of the underlying Gene Regulatory Network dynamics, and driven by environmental influences.
These new insights into the process of cell reversion could also lead to significant improvements of the executable GRN inference scheme [29].
Differentiation was induced by removing the LM1 medium and placing the cells into a DM17 medium (α-MEM, 10% fetal bovine serum (FBS), 1 mM Hepes, 100 nM β-mercaptoethanol, 100 U/mL penicillin and streptomycin, 10 ng/mL insulin, and 5% anemic chicken serum (ACS [53]). Differentiation kinetics were achieved by collecting a sub-fraction of the cells at different times after induction of differentiation (0H and 24H). After 24H, the DM17 medium was removed and half of the cells were placed back into the LM1 medium while the other half was kept in the DM17 medium to achieve 48H reversion and 48H differentiation time points respectively (Fig. 1).
Cell population mortality was assessed by counting dead and living cells from the different time points and conditions after Trypan blue staining and using a Malassez cell.

Single-cell sorting
For both single-cell transcriptomics methods, cells were sorted in 96-well plates using FACS Aria IIμ, BD: 8 plates were produced for scRNAseq (2 plates per time point) and 8 plates were produced for scRT-qPCR (2 plates per time point). Since the first steps of library construction are performed per plate, we refer as "batch" the different plates.

Single-cell RT-qPCR analysis
All the manipulations related to the high-throughput scRT-qPCR experiments in microfluidics were performed according to the protocol recommended by the Fluidigm company (PN 68000088 K1, p.157-172). All steps from single-cell isolation to scRT-qPCR, gene selection, data generation, and cleaning are described in detail in [19]. The expression matrix was log1p transformed before subsequent analysis.

Single-cell RNAseq
scRNAseq was performed using an adapted version of the MARSseq protocol [22]. Unless specified, all indicated concentrations correspond to final concentrations.
Libraries were sequenced on a Next500 sequencer (Illumina) with a custom paired-end protocol to avoid a decrease of sequencing quality on read1 due to the high number of T added during polyA reading (130pb on read1 and 20pb on read2). We aimed for a depth of 200,000 raw reads per cell.

Bio-informatic pipeline
Fastq files were pre-processed through a bio-informatic pipeline developed in the team on the Nextflow platform [54]. Briefly, the first step removed Illumina adaptors. The second step de-multiplexed the sequences according to their plate barcodes. Then, all sequences containing at least 4T following cell barcode and UMI were kept. Using UMItools whitelist, the cell barcodes and UMI were extracted from the reads. The sequences were then mapped on the reference transcriptome (Gallus GallusGRCG6A.95 from Ensembl) and UMI were counted. Finally, a count matrix was generated for each plate.

Data filtering, normalization, and analysis
All analyses were carried out using R software (version 4.0.5; [55]). Matrixes from the eight plates were pooled together. Cells were filtered based on several criteria: reads number, gene number, count number, and ERCC content. For each criterion, the cutoff values were determined based on SCONE [56] pipeline and were calculated as follows: We selected genes present in at least two cells. The filtered matrix was then normalized using SCTransform from the Seurat package [57] and we corrected for batch effect, time effect, and sequencing depth effect. The expression matrix was finally log1p transformed.
Variable genes were identified using FindVariableFeatures from Seurat, vst method [58]. Based on visualization of gene variance, we retained the 2000 most variable features. Differentially expressed genes were identified using the FindMarkerGenes function from Seurat [58]. Analysis was done by pairwise comparisons between conditions; genes with log fold change ≥0.5 and adjusted p-value <0.05 were kept as significant. More information on QC filtering is given in Additional file 1: Fig. S5.

Statistical analysis
All statistical analyses were performed using the R software (version 4.0.5; [55]). Dimensionality reduction and visualization were performed using UMAP [59]. UMAP was performed directly on the 2000 most variable genes (from the scRNAseq dataset) or 83 genes (from the scRT-qPCR dataset) using default parameters. PCA was performed using prcomp function from the stats R package (version 3.6.2). Adaptive sparse PLS for logistic regression was performed using the plsgenomics package [20]. For this analysis, scRT-qPCR data were scaled. Sparse PLS is a supervised statistical analysis that allows to predict the most discriminant variables between two groups.
Wasserstein distance computation was done using the Transport R package [60] and was accomplished for each gene of the scRNAseq dataset.
Gini indexes were calculated using the Ineq R package on Wasserstein distance distributions [61].
Bootstraping was done using the sample_frac function from the Dplyr R package [62].

Estimation of multi-stability levels
For estimating the level of multi-stability in the data, we considered that the probability distribution of each mean − 3 * SD gene can be approximated by a Gamma distribution, or a mixture of Gamma distributions, since they are known to describe continuous single-cell data accurately [63]. More precisely, we parameterized the distribution of a gene i by: where Γ denotes the Gamma function: Ŵ(z) = ∞ o t z−1 e −t dt . Note that only the parameters a j i j=1,...,m depend on the mixture component j: this is related to the distribution arising from the well-established two-state model of gene expression [64], when only the frequency of mRNA bursts is regulated, as described in [65].
For every condition, we constructed 10 training sets consisting of 80% of the cells in the population (randomly chosen), and we estimated the parameters [ a j i j=1,...,m , b i with a MCMC algorithm for the numbers of mixture components m = 1, 2, 3 successively. We then considered that the optimal number of components for gene i was the one which minimized the average BIC score estimated on the 10 corresponding test sets.

Multivariate two-sample test
Samples were compared using a multivariate two-sample test based on the 2000 most variable genes. We suppose that the normalized gene expression X 1 and X 2 of two conditions (0H vs 48H reversion, 0H vs 48H differentiation, 48H reversion vs 48H differentiation) follow a multivariate Gaussian distribution N (µ 1 , �) and N (µ 2 , �) respectively, and we denote by n = n 1 + n 2 the total number of cells. Then, we test the null hypothesis H 0 : μ 1 = μ 2 using the generalized Hotelling's T 2 test [23]. The data being high dimensional (p > n), the between-gene pooled covariance matrix is not invertible and is replaced by its Moore-Penrose inverse. In this setting, the asymptotic distribution of the generalized Hotelling statistics is χ 2 (n − 2). The p-values were adjusted according to the Benjamini-Hochberg correction [66]. Analysis was performed using the fdahotelling R package [67].