- Research article
- Open Access
Time-resolved transcriptome and proteome landscape of human regulatory T cell (Treg) differentiation reveals novel regulators of FOXP3
BMC Biologyvolume 16, Article number: 47 (2018)
Regulatory T cells (Tregs) expressing the transcription factor FOXP3 are crucial mediators of self-tolerance, preventing autoimmune diseases but possibly hampering tumor rejection. Clinical manipulation of Tregs is of great interest, and first-in-man trials of Treg transfer have achieved promising outcomes. Yet, the mechanisms governing induced Treg (iTreg) differentiation and the regulation of FOXP3 are incompletely understood.
To gain a comprehensive and unbiased molecular understanding of FOXP3 induction, we performed time-series RNA sequencing (RNA-Seq) and proteomics profiling on the same samples during human iTreg differentiation. To enable the broad analysis of universal FOXP3-inducing pathways, we used five differentiation protocols in parallel. Integrative analysis of the transcriptome and proteome confirmed involvement of specific molecular processes, as well as overlap of a novel iTreg subnetwork with known Treg regulators and autoimmunity-associated genes. Importantly, we propose 37 novel molecules putatively involved in iTreg differentiation. Their relevance was validated by a targeted shRNA screen confirming a functional role in FOXP3 induction, discriminant analyses classifying iTregs accordingly, and comparable expression in an independent novel iTreg RNA-Seq dataset.
The data generated by this novel approach facilitates understanding of the molecular mechanisms underlying iTreg generation as well as of the concomitant changes in the transcriptome and proteome. Our results provide a reference map exploitable for future discovery of markers and drug candidates governing control of Tregs, which has important implications for the treatment of cancer, autoimmune, and inflammatory diseases.
Immunological tolerance to self and innocuous foreign antigens is maintained by a fine-tuned balance of several immune cells, with an indispensable non-redundant role for both thymic regulatory T cells (tTregs) and peripherally induced Tregs (pTregs) [1,2,3]. Naturally-occurring Tregs (nTregs) comprise tTregs and pTregs, both of which suppress other immune cells and express the transcription factor (TF) Forkhead Box P3 (FOXP3). FOXP3 is necessary to generate the full Treg signature and functionality, and mutations in FOXP3 lead to severe lethal autoimmune disease in scurfy mice and men . Significant progress has been made in elucidating the architecture of the regulatory elements in the FOXP3 gene locus, which responds primarily to TCR, CD28, IL-2, and TGF-β signaling pathways . However, most known regulators of FOXP3 are not specific to Tregs but are, rather, factors of general importance in T cells and other immune cells. FOXP3 regulation is incompletely understood particularly in the human system despite accumulating evidence for differing FOXP3 regulation in mice versus human such as activation-induced low-level FOXP3 expression in human but not murine conventional T cells [4, 6, 7] and expression of different FOXP3 isoforms in human Tregs [8,9,10,11].
Therapeutic manipulation of endogenous Tregs and adoptive transfer of Tregs are promising current strategies for the treatment of human autoimmune and inflammatory diseases such as type 1 diabetes (T1D) and graft-versus-host disease (GvHD), respectively [12, 13]. Tregs can also be recruited to or be induced at tumor sites, where suppression of anti-tumor immune responses can be detrimental; therefore, depletion of Tregs is currently in trials for cancer treatment [14, 15]. Thus, the further understanding of Treg induction and regulation of FOXP3 expression is highly relevant to expand the therapeutic opportunities for autoimmune and inflammatory diseases as well as cancer. Due to their involvement in different physiological functions, specific targeting of pTregs or tTregs may be warranted depending on the type of disease. In vivo, pTreg generation was shown to occur in the intestinal system, where chronic low-dose antigen stimulation under tolerogenic conditions favors Treg induction . Gut-associated dendritic cells as well as macrophages in the lung secrete TGF-β and the vitamin A metabolite all-trans retinoic acid (ATRA), which promote Treg induction [1, 16, 17]. Further, commensal microbiota-derived short-chain fatty acids, particularly butyrate, were demonstrated to favor colonic Treg induction [18, 19]. Specific depletion of pTregs in C57BL/6 mice leads to spontaneous development of allergic-type pathologies at mucosal sites in the gastro-intestinal tract and lung  as well as to defects in maternal-fetal tolerance in the placenta , but not to systemic autoimmune disease. This is concordant with the concept that tTregs are primarily needed to mediate self-tolerance, while pTregs are specifically induced at environmental interfaces to induce tolerance to foreign antigens. Nevertheless, a potential role for pTregs in mediating self-tolerance has also been suggested [22, 23].
Interestingly, Tregs induced from naïve T cells in vitro are able to take over nTreg functions, as they are able to rescue scurfy mice . Such in vitro induced Tregs (iTregs) can be generated by protocols that mimic the in vivo situation, usually containing IL-2 and TGF-β [1, 16]. Depending on the induction protocol used, iTregs can highly express FOXP3 beyond activation-induced levels, although the epigenetic profile and, consequently, the stability of iTregs differs from that of nTregs [25, 26]. Further, the suppressive functionality of iTregs is controversial, with results dependent on the protocol and controls used [27, 28]. However, to understand molecular events occurring prior to and during FOXP3 induction in human T cells, in vitro culture systems are required.
Thus, to further understand the regulation and induction of FOXP3 we performed deep molecular profiling over time during Treg induction, starting from naïve CD4+ T cells. Since there is no ‘gold standard’ protocol for human Treg induction, we aimed to find general FOXP3 regulators independent of a specific procedure. Therefore, we used control (‘Mock’) stimulated cells along with four different recently established Treg-inducing protocols in parallel . These protocols, utilizing the cytokines IL-2 and TGF-β in combination with ATRA, ATRA + rapamycin (Rapa), or butyrate, led to robust and reproducible induction of FOXP3 and other Treg signature molecules [28, 29]. Notably, the mTOR inhibitor Rapa proved efficient in enhancing iTreg generation and nTreg expansion [12, 30, 31], and we previously demonstrated that the combination of TGF-β + ATRA + Rapa induced iTregs with superior suppressive activity .
We present here an extensive time-series molecular profiling of human iTreg differentiation, providing a resource of RNA sequencing (RNA-Seq) transcriptomics and proteomics data covering multiple Treg-inducing protocols alongside with control cells that were activated without Treg-inducing factors. We also provide unstimulated naïve CD4+ T cells and nTregs from the same donors as negative and positive controls, respectively. We explored the transcriptome and proteome of the very same samples, enabling true integrative analysis of both data types and making deep quantitative mass spectrometry-based proteomics data of iTregs available to the scientific community. In addition, we provide a completely independent additional RNA-Seq dataset for TGF-β + ATRA iTregs generated in another laboratory under different culture conditions and with an extended time-series. We present evidence for iTreg-specific global cell polarization patterns in agreement with available relevant T cell signatures, and we find that functional categories underlying differentiation include a super-cluster of general molecular pathways related to cellular proliferation and metabolism, in parallel to a super-cluster linked to T cell polarization. Importantly, integration of our data revealed enrichment of a newly defined iTreg subnetwork for immune disease-associated genes and a central position in the iTreg subnetwork for many known crucial (n)Treg regulators, alongside with novel candidate molecules whose functionality in FOXP3 induction was confirmed by a targeted shRNA validation screen. Our rich data harbor the potential to reveal novel markers for distinction of Tregs from activated T cells or pTregs from tTregs, the latter being currently problematic in the human system [1, 32]. Indeed, our computational approach confirmed that novel iTreg molecules discovered in this study can outperform known Treg regulators in classifying iTregs versus activated T cells. In summary, we present a large resource of dynamic Treg transcriptome and proteome data, including several novel regulators of FOXP3, which can be explored as potential markers and drug targets in the future.
Human in vitro generated iTregs show robust expression of Treg signature molecules such as FOXP3 and Eos
To gain a better understanding of human FOXP3 induction, we induced iTregs from naïve CD4+ T cells allowing for the detection of iTreg signatures at the differentiated state but also of events preceding FOXP3 expression, which in the human system is only possible with in vitro assays. We performed a time-course molecular profiling during human iTreg induction, capturing molecular events occurring during differentiation of FOXP3+ cells along with control cells. It is well-known that TGF-β and IL-2 contribute to the induction of FOXP3 [1, 5]; however, there is no single standard protocol for iTreg generation and the literature concerning iTreg phenotype and function is controversial even for seemingly similar protocols [27, 28]. To identify robust and generic molecular events driving FOXP3 induction, we interrogated iTregs induced by four different protocols in parallel. This approach enabled stringent filtering for shared events occurring in all FOXP3-expressing iTreg populations independently of the specific protocol. The Treg-inducing protocols all included anti-CD3/-CD28 activation together with IL-2 and TGF-β, either alone or in combination with ATRA, ATRA + Rapa, or butyrate (Fig. 1a). Importantly, cells were cultured in serum-free medium to exclude traces of serum-derived TGF-β or other undefined factors, and as a control we used Mock-stimulated cells treated only with anti-CD3/-CD28 activation and IL-2. We isolated naïve CD4+ T cells and CD25high cells (here called ‘nTreg’) in parallel from the same three healthy male donors (for cell purities, see Additional file 1: Figure S1a). We isolated RNA and protein from the same cells, which were derived from iTregs and Mock-stimulated cells at five different time points of differentiation (2, 6, 24, and 48 h, and 6d) and from unstimulated naïve CD4+ T cells (‘Tnaive’) and nTregs (for experimental setup, see Fig. 1a). For quality control, an aliquot of the cells used for molecular profiling was phenotyped by flow cytometry on days 4 and 6 of culture. All iTregs displayed expression of FOXP3 that was enriched in CD25-expressing cells, unlike Mock-stimulated cells (Fig. 1b, c and Additional file 1: Figure S1b, c). In addition, we determined that CD45RA was downregulated more strongly in iTreg TGF-β + ATRA as compared to the other stimulated cultures (Fig. 1c and Additional file 1: Figure S1c). Furthermore, the fraction of IFN-γ-producing cells was reduced in iTregs compared to Mock-stimulated cells (Fig. 1c). As we described previously , CD25 was upregulated upon stimulation resulting in a high fraction of positive cells in Mock-stimulated cells and iTregs, except in iTreg populations induced in the presence of Rapa, which displayed a generally less activated phenotype (Fig. 1b, c and Additional file 1: Figure S1c). Accordingly, after pre-gating on CD25+ cells, iTregs induced in the presence of Rapa were more similar to iTregs induced with other protocols not only regarding expression of FOXP3 but also of CTLA-4 and CD45RA (Additional file 1: Figure S1c). Notably, despite lower fractions of FOXP3+ cells in iTregs generated with TGF-β + ATRA + Rapa, these iTregs had superior suppressive function in vitro according to our previously published results .
High-quality RNA extracted from different time points of T cell cultures was subjected to RNA-Seq, which confirmed enhanced expression of FOXP3 in all iTregs compared to Mock-stimulated cells at all time points (Fig. 1d). IKZF4 encoding for Eos, another gene important for Treg function , was also early and stably upregulated in all iTreg populations, reaching levels similar to nTregs (Fig. 1d). FOXP3 and IKZF4 expression results from RNA-Seq were confirmed by qRT-PCR from the same as well as additional donors (Additional file 1: Figure S1d) . From a subset of the samples, we performed quantitative mass spectrometry-based proteomics using high resolution isoelectric focusing (HiRIEF) nanoLCMS . The proteomics data confirmed high expression of FOXP3 and Eos protein in iTregs induced with TGF-β or TGF-β + ATRA + Rapa (Fig. 1e). Although FOXP3 expression in both RNA-Seq and proteomics data increased over time in iTregs, reflecting the increased fraction of FOXP3+ cells in the population as differentiation proceeds, the amounts remained below that in nTreg populations. Notably, on the per-cell level, when gating on activated (CD25+) cells, FOXP3 protein levels in iTregs were similar to nTregs, while Mock-stimulated cells did not display such FOXP3 expression even in CD25++ cells (Fig. 1b, c), emphasizing the importance of considering the fraction of CD25+ cells as well as the kinetics of gene expression over time in comparison to Mock-stimulated control cells. It was described that the FOXP3 expression level in murine Tregs is correlated to their function ; however, in human Tregs, expression of FOXP3 is more complex, wherein human Tregs are known to express three different FOXP3 splice isoforms with functional consequences [8,9,10,11]. We therefore asked whether iTregs induced by the conditions under study, despite similar total FOXP3 protein levels on a per-cell basis, may show a change in FOXP3 isoform expression compared to nTregs. To this end and to further confirm the proteomics data with the additional aspect of FOXP3 isoform expression, we performed western blot analysis of iTregs and nTregs. These data confirmed higher FOXP3 protein expression in all iTreg populations compared to Mock-stimulated cells, although lower than in nTregs (Additional file 1: Figure S1e). The results further suggest that iTregs express functional FOXP3 isoforms with a similar isoform expression pattern as in nTregs (Additional file 1: Figure S1e). iTregs resembled the FOXP3+ ‘Treg’ subset of CD4+ cells based on FOXP3 and Eos expression, and also expressed low levels of cytokines ascribed to Th1, Th2, Th17, or Tr1 subsets such as IFN-γ, IL-4, IL-13, IL-17, IL-22, and IL-10 (Fig. 1c) (omics data in repository and ).
Together, these results demonstrate the quality of the iTreg cells used for molecular profiling based on canonical Treg ‘up’ signature molecules, such as the Treg markers FOXP3 and Eos, and low expression of Treg ‘down’ signature molecules such as IFN-γ.
Time-course molecular profiling captures the dynamics and polarization of iTreg differentiation
We next performed a global analysis of the generated RNA-Seq and proteomics data from human iTregs and controls. In total, counting the GENCODE (v19) genes and applying a minimal filtering rule (≥ 1 count across all samples), we detected 33,991 genes. However, when considering an expression threshold (see Methods) for highly expressed genes (HEGs), which are more likely to be functional , our data comprised 15,910 HEGs corresponding to 11,378 (72%) protein-coding genes. With our mass spectrometry-based proteomics approach, we identified 9906 proteins (Ensembl protein ID), of which 6906 proteins (70%), corresponding to 6815 genes, were quantified in all samples, and accordingly used for relative quantification and statistical analysis. The overlap between the mass spectrometry-based proteome of 6815 genes and the HEGs was larger (55%) than the fraction of RNA-identified HEGs with no quantified protein (42%) (Additional file 1: Figure S2a, b). Despite the relatively high overlap, the number of detected features was higher for RNA-Seq than for proteomics, likely due to intrinsic technological features and not to major differences in the quality of the corresponding data. Indeed, we observed that the variability in both data types might primarily be explained by biological variables (such as the activation time or the treatment group) rather than technical variables, and that no major signs of unwanted batch effects were present (Additional file 1: Figure S3). This was also exemplified by the expected protein quantitative behavior, regardless of technical variables such as the Tandem Mass Tag (TMT) set, that was observed for known Treg proteins such as FOXP3 and Eos (Fig. 1e). Furthermore, we calculated the average number of proteins per cell , which included many important Treg and/or T cell regulatory proteins (Additional file 1: Figure S4), thus confirming the quality of our proteome data despite limited primary sample material.
To understand the dynamics of iTreg differentiation on a global scale, we relied on the wider coverage and sampling of the RNA-Seq data, while incorporating the effect of multiple experimental groups. Therefore, we performed unsupervised clustering using a version of Self-Organizing Maps (SOMs) that allows obtaining several parallel maps and perform data fusion (see Methods). As shown in Fig. 2a, the topology of the polarization indicated that a profound restructuring of gene expression was mainly observed over time for all conditions and already at the early time points, in line with the potent activation signals. Furthermore, when analyzed across conditions, the maps showed that differences were readily appreciable for the iTreg treatments compared to Mock stimulation, especially at late time points. When cross-comparing the iTreg time series, the TGF-β, TGF-β + ATRA, or TGF-β + butyrate conditions showed a similar overall molecular pattern, in contrast to TGF-β + ATRA + Rapa, which was one motivation to restrict the proteomics samples as well as further detailed analysis mainly to three treatment groups, namely Mock-stimulated cells, iTreg TGF-β, and iTreg TGF-β + ATRA + Rapa. Additionally, we have shown previously  that iTregs generated with TGF-β + ATRA + Rapa differed functionally, displaying enhanced suppressive activity in vitro.
Next, we asked whether the observed gene expression pattern over time was indicative of a bona fide iTreg polarization. As shown above, expression of FOXP3 and IKZF4 affirmed a Treg-like phenotype. To provide additional support and interpret the observed gene expression changes, we used publically available relevant T cell signatures. We obtained a three-dimensional map of the gene expression data using principle component analysis (PCA) (Fig. 2b and Additional file 1: Figure S3b); then, for each known signature and for each sample, we calculated a score, which essentially indicates whether a sample’s expression profile matches the known signature (Additional file 2: Table S1). The first two principal components (PCs) separated the cells mostly on the basis of the activation time and treatment group, the latter being more noticeable as time progresses and the spread increases (Additional file 1: Figure S3b, e). Correspondingly, when the signature vectors were correlated to the PC scores, PC1 and PC2 can be interpreted as related to cellular activation or the effect of TGF-β (selected ‘T cell activation’ and ‘TGF-β’ signatures in Fig. 2b). Interestingly, the ‘Treg vs. Tcon’ signatures were also positively correlated with PC1, suggesting that at least part of the variability observed over time may be correctly attributed to the anticipated Treg phenotype. Furthermore, inspecting the variation from the PC2–PC3 perspective, the in vitro polarized samples lie on a plane approximately defined by the activation and TGF-β signatures, while nTregs substantially diverge from this plane, and the human ‘Treg vs. Tcon’ signatures intersect this plane almost perpendicularly. While our analysis confirmed the similarity of the (unstimulated) nTreg samples with published Treg signatures (which, in the examples used, were based on unstimulated nTregs), it must be considered that the divergence of iTregs from nTregs may be largely influenced by the in vitro activation of iTregs, but may also reflect actual differences between iTregs and pTregs as well as between pTregs and tTregs or Treg subsets from different tissues [38,39,40,41,42]. Overall, the pattern of variability is readily compatible with a combined effect of time and experimental promotion of the iTreg phenotype as the driving forces shaping the differences between the samples.
iTregs show protocol-specific and general patterns of differentially expressed genes and proteins
To model gene expression over time and call differentially expressed genes (DEGs), we specifically focused on the extraction of the genes that reacted in an iTreg-specific manner over time compared to the Mock-stimulated cells (iTreg group effect), as opposed to the genes that changed over time as a consequence of cellular activation (time effect). Given the different kinetics of general activation and correspondingly lower CD25+ cell fractions in iTregs induced with Rapa (Additional file 1: Figure S1c and Figure S3b, e), modeling gene expression over time and in comparison to Mock-stimulated control T cells was especially important. To ensure a robust calling of DEGs, we employed three different methods for differential analysis in parallel and required the genes to be significant for at least two methods (see Methods; Additional file 1: Figure S2c and Additional file 3: Table S2). Briefly, we either considered the time as a discrete or continuous covariate and performed appropriate multiple linear modeling with interactions between the time and the group. A corresponding analysis was done for proteomics data to obtain differentially expressed proteins (DEPs) (see Methods; Additional file 1: Figure S2d, e and Additional file 3: Table S2).
The massive cellular response to the activation was exemplified by the large number of genes (10,093 DEGs; 4660 DEPs) that responded to the stimulation over time (Additional file 1: Figure S2c, d and Additional file 3: Table S2). In addition to the general activation effect, a fraction of the genes responded with an iTreg-specific expression pattern, as iTregs presented 1279 to 2249 DEGs depending on the inducing protocol (Fig. 3a and Additional file 1: Figure S2c), of which several were shared between iTreg conditions (Fig. 3a). According to the stringent criteria used here to call DEGs, only 368 genes were DEGs shared between all four iTreg groups (Fig. 3a), and the iTreg TGF-β + ATRA + Rapa condition was bounding this number to a lower range when included in a two- or three-way comparison, in line with the SOM analysis (Fig. 2a) in which this group displayed a more unique profile, while the other iTreg groups were more similar to each other. Thus, for proteomics, iTreg TGF-β + ATRA + Rapa and iTreg TGF-β were used to represent the Treg-inducing protocols, and the following analyses are therefore focused on these two iTreg groups. Confirming the DEG data, almost 50% of the proteins differentially expressed in iTreg TGF-β were also DEP in iTreg TGF-β + ATRA + Rapa, while the latter had more unique differentially expressed molecules (Fig. 3a, b). Alongside protocol-specific differential gene and protein expression, we observed a shared signature induced by all the iTreg protocols, which was most relevant from the perspective of studying generic FOXP3-inducing molecules.
A special effort was made to separate the iTreg group effect from the relatively larger time effect (Fig. 3c, d), and the number of DEGs and DEPs for both the activation as well as the iTreg-specific effect increased over time, albeit with different kinetics for RNA and protein. More RNAs were already different at the first baseline point for RNA-Seq (2 h) but very few proteins were differentially expressed at the corresponding first available proteomics baseline point (6 h) and a similar number of DEGs and DEPs was only modeled at the terminal stage (Fig. 3c, d).
This global analysis was followed by a parallel comparison of RNA and matched protein expression over time. We asked whether a gene’s mRNA level correlated with its protein level, as translational efficiency, post-translational regulation, and molecular half-lives are known to affect the relationship between the two, especially during state transitions [43, 44]. The overlap of protein-coding DEGs with DEPs was 37% (Additional file 1: Figure S2e). However, since no corresponding protein was quantified for 61% of the protein-coding genes (Additional file 1: Figure S2a), we considered only RNA:protein pairs for which both data were available for further analysis. We observed that a substantial fraction (58%) of those genes was differentially expressed in both domains (Fig. 3e, DEG and DEP). Next, we assessed whether the changes in the two domains were correlated; to do so, we first partitioned the RNAs and proteins based on their corresponding differential expression and then further classified the largest block (DEG and DEP) into five clusters (see Methods and Fig. 3e). We observed that approximately half of the molecules in the DEG and DEP block had concordant expression (Fig. 3e, clusters c, d) and the other half belonged to clusters with discordant profiles. While the relative kinetics of the RNA and protein expression would allow for a delayed accumulation of the corresponding protein, resulting in a poorly correlated profile (as for cluster e), we observed one cluster of genes (cluster b) for which protein and RNA had instead an anti-correlated profile. Interestingly, this cluster was enriched for mitochondrial proteins and functions (Fig. 3e and Additional file 1: Figure S5). Similarly, for the other discordant cluster (e) more proteins were reported to localize in specific cellular compartments (nucleolus, ER/Golgi) rather than nucleus and cytoplasm, with corresponding matching functional enrichment (Fig. 3e and Additional file 1: Figure S5). This analysis demonstrates that, even when the transcriptional perturbations of a cellular system are accompanied by a corresponding substantial alteration of the proteome, the individual transcript level cannot be always considered a proxy for the corresponding protein. Therefore, when selecting genes for functional follow-up from transcriptome data, proteome data are an important non-redundant resource.
iTreg gene expression signatures reflect specific functional programs
The functional transcriptional program of T cell activation and T helper cell subset differentiation in vitro  or T cell differentiation in the thymus  has been previously analyzed. Here, we specifically investigated the functional categories affected during the differentiation toward the iTreg subset, and we dissected, visualized, and grouped the distinct profiles of the genes that resulted differentially expressed as a function of the polarizing conditions (TGF-β or TGF-β + ATRA + Rapa).
To partition the genes with similar RNA expression over time, we used a model-based representation of the cluster of genes in the Mock control and in the iTreg time series (Additional file 4: Table S3). We further grouped the clusters using a summary measure of the correlation between the respective members (see Methods), and super-clusters were identified (Fig. 4a), i.e., two positively and densely connected components, one with a more generic proliferative and metabolic profile and another pointing to a more specific T cell program (Fig. 4a, b and Additional file 4: Table S3). The genes up-regulated over time were grouped at one extreme, while the down-regulated genes were found at the other. Moreover, for individual clusters, the differences between the iTreg and the Mock stimulation conditions were highlighted. The super-clusters not only showed data-driven dependence, but also sorted into interconnected domains with functionally related ontologies (Fig. 4b and Additional file 4: Table S3). Indeed, among the up-regulated clusters we identified enrichment for categories required and/or affected by a proliferative program, such as metabolism, cell cycle, RNA processing and translation, in line with the augmented metabolic requirements of activated T cells and a shift from a catabolic to an anabolic metabolic program . We observed that iTregs showed differences in the cluster enriched for metabolic processes, especially for iTregs induced with TGF-β + ATRA + Rapa and in agreement with the metabolic control exerted on the in vitro induction of T cell subsets with specific requirements for Tregs, including a role for the mammalian target of rapamycin (mTOR) [12, 31]. At the other extreme of the cluster network, a specific effect of the iTreg polarization conditions was shown for the clusters identified as ‘TGF-β clusters’ (clusters 10, 15, 23, 35, 38), mostly containing genes that were more abundantly expressed in the iTreg as compared to the Mock control time series (e.g., IKZF4, SMAD7, NOTCH1, TGFB1, SKIL). In the proximity, other ‘T cell activation’ genes (clusters 1, 4, 11, 12, 28, 29) were instead abundantly expressed at the beginning of the polarization but down-regulated over time (e.g., MAP3K1, STAT5B, MYC, MAX, CD3E). Interestingly, the FOXP3-containing cluster 3 appeared as being central in the network of clusters and bridging between the TGF-β series of clusters and the metabolism/cell cycle groups (clusters 13, 21, and 22). FOXP3, FURIN, CCR5, CD101, CD2, and CXCR5 were among the cluster 3 members and concordantly upregulated in iTregs. In summary, the functional gene expression program agrees with simultaneous proliferation, activation, and polarization, a typical behavior of differentiating T cells.
iTreg gene expression program inferred from network reconstruction
The polarization of iTregs from naïve T cells over time may encompass regulatory interactions that are changing over time as a result of the dynamic evolution of the system. Indeed, we showed above (Fig. 4) that the expression profile of the gene cluster comprising FOXP3 did not positively correlate with the profile of the clusters containing known FOXP3 activators (e.g., BACH2, FOXO1, STAT5B, FOXO3). The relative expression over time of each member of a regulator:target pair is key in determining the biological outcome and a simple approach based on correlation does not incorporate such a dynamic interaction. Therefore, we hypothesized that, after considering the temporal factor in reconstructing the network of interacting genes, it would be possible to retrieve the known, and possibly novel, molecular connections from our data and that we could apply the ‘guilty by association’ principle in order to retrieve original candidate regulatory genes. The most common dynamic network reconstruction algorithms require a very dense sampling rate to estimate lagged dependencies and the small sample size might be a limiting factor. We therefore sought to overcome this limitation by comparing the gene networks reconstructed statically but for two sets of samples, i.e., the ‘early’ (0–6 h) and the ‘late’ (24 h to 6 days) observations independently, and capture the temporal aspect of the regulatory system by inspecting the rewiring of the nodes. Indeed, when summarizing the expression profile of known FOXP3 positive transcriptional regulators as a signature score (see Methods; Additional file 1: Figure S6a), we verified that there is an early bout of induction of those regulators peaking at 6 h, except for RUNX1, NFATc4, and FOXP3 itself, which were upregulated later and specifically in iTregs. These observations are consistent with an accumulation of FOXP3 at later time points. We implemented a hub-centered approach to infer the connections between the genes in the early and late stages of the differentiation using a mutual information (MI) criterion and calculated a rewiring score by comparing the two networks (Additional file 5: Table S4). We robustly selected the hubs as the set of 307 TFs resulting differentially expressed both at the protein and RNA level after cellular activation or iTreg polarization. Of these, the subset of 49 TFs differentially regulated in iTregs (Fig. 5a) includes genes with a well-known role in Treg biology, which are maximally transcribed either early (BACH2, LEF1, IKFZ3, SATB1) or late (IKFZ4, NFIL3, GATA3, JUN, RXRA) in our system and show generally concordant RNA and proteins levels, in addition to novel factors not known in Treg/T cells (Fig. 5a). We obtained bootstrapped consensus early and late networks by calculating the MI between the hubs and all the other expressed genes. The two consensus networks were combined by union to derive a full network and a node-rewiring score  was used for ranking (see Methods and Additional file 1: Figure S6b, c). A major rewiring force was the T cell activation through CD3, CD28, and IL-2, which constitute a well-known potent signal for the dynamics of the cellular network. Therefore, we focused on the specific events occurring in iTregs by selecting a subnetwork formed by FOXP3 and the shared DEGs in all iTreg conditions (Fig. 3a), here called iTreg subnetwork. The term ‘iTreg subnetwork’ was chosen to indicate the differential expression in iTregs; however, it should be noted that several of these genes also appear similarly expressed in nTregs (Additional file 1: Figure S6d). This selection resulted in a total of 349 nodes (Fig. 5b), of which 19 were TF hubs. Notably, although the MI provides only an undirected graph, taking the direction from the TF hub to the target node into account, a direct comparison with literature and external data sources can be made in order to verify previously observed regulatory interactions. Indeed, several connections in our network were supported by a network of TF binding, independently reconstructed from DNase-Seq data from Tregs (see Methods; Fig. 5b). Such comparison, however, is limited due to incompleteness of the binding motif database. The selected iTreg subnetwork (Fig. 5b) was enriched for genes of the TGF-β pathway, as indicated by functional enrichment analysis using Gene Ontology, KEGG, Reactome, and MSigDB Immunological Signatures databases (Additional file 6: Table S5) and as expected by the involvement of TGF-β signaling in iTreg differentiation . We next explored whether the iTreg subnetwork genes may be differentially expressed simply as a consequence of FOXP3 expression in iTregs. Time-course analysis of gene expression profiles revealed that the majority of the iTreg subnetwork genes displayed differential expression prior to FOXP3 expression, suggesting that they may play a role in FOXP3 induction (Additional file 1: Figure S6d, e). Further, many iTreg subnetwork genes were represented in those gene clusters functionally assigned to TGF-β signaling earlier (Fig. 4), suggesting that they may be involved in the TGF-β pathway and not indirectly regulated by FOXP3 (Additional file 1: Figure S6e). Nevertheless, several iTreg subnetwork genes displayed a similar temporal expression profile as FOXP3 (Additional file 1: Figure S6d), and thus it is possible that they may be regulated by FOXP3 itself. To further explore this possibility, we studied the expression of iTreg subnetwork genes in published data  of pre-activated primary human naïve CD4+ T cells upon FOXP3 transduction compared to an empty vector. We determined that 9 of 349 iTreg subnetwork genes (including FOXP3) were significantly (false discover rate (FDR) < 0.05) differentially expressed in FOXP3- versus control-transduced cells (Additional file 1: Figure S6f), suggesting the possibility that these few genes may be expressed as a consequence of FOXP3 in iTregs.
In conclusion, our approach defined a novel iTreg subnetwork containing multiple genes potentially involved in FOXP3 induction. Importantly, along with novel potential Treg regulators, the iTreg subnetwork revealed known crucial Treg regulators like BACH2, NFIL3 (E4BP4), GATA3, NOTCH1, LEF1, SATB1, and IKZF4 (Eos) with a well-known role not only in iTregs, but also, and more importantly, in vivo in nTregs (Fig. 5a, b).
The selected iTreg subnetwork contains autoimmune disease-associated genes
The immunomodulatory and tolerogenic activities of Tregs in vivo are well-known and exemplified by the severe human disease immunodysregulation polyendocrinopathy enteropathy X-linked syndrome, which is caused by FOXP3 mutations  as well as by the contribution of Treg defects to multiple complex human autoimmune diseases . Based on the therapeutic activity of Tregs in numerous pre-clinical mouse models of autoimmune and inflammatory diseases in the past, recent advances have been made towards clinical trials applying Treg therapy, primarily in GvHD, transplantation, and T1D [12, 13]. For these reasons, we hypothesized that the genes that are affected by our experimental iTreg induction may be relevant in disease pathways.
Firstly, we tested whether any disease categories were enriched in our iTreg subnetwork as compared to the full network. Thus, we tested the genome-wide association studies (GWAS) catalog terms grouped by their disease ontology plus the additional Ai6 and Ai21 categories, which respectively include the GWAS genes for six common autoimmune diseases (multiple sclerosis (MS), rheumatoid arthritis, T1D, Crohn’s disease (CD), systemic lupus erythematous, and psoriasis) or the 21 autoimmune disorders as in Farh et al. . We observed notable overlap for autoimmune diseases as compared with 10,000 random gene sets selected from the full network (Ai6: odds ratio (OR) 1.57, p = 0.042, family-wise error rate (FWER) 0.443; Ai21: OR 1.55, p = 0.028, FWER 0.330) (Fig. 6a and Additional file 6: Table S5), and therefore corresponding to genes expressed in CD4+ T cells during activation/differentiation. We also used the association as in Menche et al.  as another catalog of diseases genes, further showing that disorders of the digestive and nervous system with an autoimmune component were ranked at the top of the enriched diseases (Additional file 6: Table S5). Indeed, when we compared the iTreg subnetwork to the mapped genes for CD in the GWAS catalog, we retrieved genes with variants previously identified in GWAS scans (ZFP36L1, PLCL1 DENND1B, BACH2, SLC22A23, JAZF1, IL18RAP, NOTCH1, TSPAN14, CD6). Similarly, MS susceptibility genes (CD6, CXCR5, ZFP36L1, KIF1B, CD58, BACH2) appeared in the iTreg subnetwork and we detected a positive enrichment for MS.
Secondly, as it has been shown that a network approach based on protein–protein interactions (PPI) can guide the interpretation of GWAS data , we identified the modules on a reference PPI network that were enriched for the above disease categories (see Methods). For three PPI modules (out of 119 modules) we detected an overrepresentation of the Ai6, Ai21, and/or digestive system disorder categories (Fig. 6a, b), further supporting the hypothesis that the iTreg subnetwork contains genes that act in the same autoimmune disease neighborhood, such as LEF1, BACH2, TCF4, KIF1B, CXCR5, and NOTCH1, which have been implicated in autoimmune diseases by independent reports.
Thirdly, to further support the connection with autoimmunity, we interrogated the Target Validation Platform of the Open Targets Consortium (www.targetvalidation.org) and retrieved all the association scores, which summarize the association evidence between a drug target and a disease, and it is calculated from various data sources, including genetic associations, somatic mutations, known drugs, pathways affected by pathogenic mutations, RNA expression, animal models, and text mining. When testing the diseases associated with the iTreg subnetwork as compared to the full network, we grouped the diseases with a nominal p value of < 0.05 (hypergeometric test) by therapeutic area and again verified that ‘Immune system disease’ was the prominent therapeutic area, including several autoimmune diseases such as MS and inflammatory bowel disease (IBD, including both CD and ulcerative colitis; Fig. 6c), further confirming that the selection of iTreg genes is an appropriate criterion for identifying known and likely also novel genes with relevance for human autoimmune diseases.
Given the above findings based mainly on genetic susceptibility, we formulated the hypothesis that, in the context of the diseases for which there is a likely contribution of T cells and in particular Tregs, the iTreg subnetwork genes should positively overlap with the genes whose expression is deregulated in patients compared to controls. Therefore, we retrieved publicly available gene expression signatures (see Methods) for IBD and MS – diseases in which Tregs are known to be relevant – and performed a Gene Set Enrichment Analysis with the iTreg subnetwork gene set. A positive enrichment was indeed observed when using the ranked signatures obtained from CD4+ T cells, not only from the affected tissue, but also from blood (Fig. 6d).
Molecular profiling reveals novel candidate molecules that can classify iTregs versus Mock-stimulated T cells
The above analyses confirm that the iTreg data presented here revealed disease-relevant and known important Treg regulatory genes. We therefore selected novel ‘candidate genes’ that appeared likely to have a role in iTreg generation, and specifically FOXP3 induction, for further investigation. Those ‘candidate genes’ (Fig. 7a and Additional file 1: Figure S7a) were chosen based on satisfying at least two of the following criteria: (1) being shared DEGs in all four iTreg conditions (Fig. 3a); (2) being nodes in the iTreg subnetwork (Fig. 5b); (3) integrating DEGs and DEPs in iTregs versus Mock control cells and sub-setting on TFs (‘iTreg hubs’ in Fig. 5a); or (4) for these genes, a similar expression profile on protein level if available. Based on integrating these criteria and ascertaining, through a literature search, that a role in Tregs had not been elucidated at the time, 37 novel ‘candidate genes’ with a putative role in FOXP3 induction were chosen. We preferably selected putative positive regulators of FOXP3 upregulated in iTregs compared to Mock-stimulated cells (Fig. 7a, b) and, thus, amenable to perturbation approaches by gene knockdown. Notably, the candidate genes chosen here were over-represented in few gene clusters according to Fig. 4 (Additional file 4: Table S3), that is, clusters 3 (‘FOXP3 cluster’), 5 (‘known regulator’ cluster), and 23 and 35 (‘TGF-β clusters’).
Besides the ‘Main dataset’ described above, we also provide another unpublished, completely independent additional RNA-Seq dataset (here called ‘independent RNA-Seq dataset’). This independent dataset comprises Mock-stimulated cells and TGF-β + ATRA iTregs induced under slightly different culture conditions (see Methods), from different donors and laboratories, and including additional time points. A high fraction of FOXP3-expressing cells in the population, as well as the suppressive activity of these iTregs, was confirmed (Additional file 1: Figure S7c, d). Further, early and stable IKZF4 mRNA expression as well as high FOXP3 mRNA in iTregs compared to Mock-stimulated cells was confirmed in these independent donors (Fig. 7c). Importantly, the fractions of activated (CD25+) cells in iTregs and Mock-stimulated cells were high and in a similar range between these conditions (Additional file 1: Figure S7c), allowing accurate comparison of gene expression in these cells even with bulk populations. When studying the expression pattern of the above-selected candidate genes in this independent dataset (Fig. 7c), we observed that most candidates showed a similar expression pattern in both datasets, confirming the general trend of iTreg-specific candidate gene expression as compared to Mock-stimulated cells (Fig. 7a–c). Additionally, the extended time-series revealed an early spike of expression for several candidate genes and a more clearly divergent expression between iTregs and Mock-stimulated cells at later time points of differentiation (Fig. 7c). Additionally, from a global perspective, PCA incorporating the shared treatment groups and time points showed that both the Main and independent dataset displayed a similar behavior, namely the time (activation effect) predominantly explained the variability and, further, a separation of the Mock control and iTreg groups was increasingly evident as time progressed (Additional file 1: Figure S7e).
We next performed an ‘in silico validation’ to test whether the newly defined candidates could separate iTregs from other cells in a better way than known Treg regulators or genes randomly chosen from the iTreg subnetwork nodes. Both candidate genes and known Treg regulators (Additional file 1: Figure S7a, b) were retrieved from the list of genes being DEGs in at least one iTreg condition compared to the Mock stimulation control (Fig. 3a). We first tested the classification capabilities on the Main RNA-Seq dataset that was used to identify the candidate genes, although the obtained accuracy estimates are known to be optimistically biased (see  and below). We used a Linear Discriminant Analysis (LDA) approach  followed by group classification and, as expected, the candidate genes performed better than known Treg regulators (or genes from all the 349 nodes in the iTreg subnetwork) in classifying iTregs (induced by any protocol) compared to Mock-stimulated cells (Fig. 7d). In detail, several single candidates were able to classify iTreg versus Mock cells with > 94% accuracy (data not shown). Similarly, many combinations of two candidates could classify iTreg versus Mock cells with 100% accuracy, while the maximal cross-validated accuracy for pairs from the 37 known Treg regulator list (including FOXP3 and IKZF4) was 96% (Fig. 7d, e).
Because the novel iTreg candidate gene set was created using all the Main dataset, the accuracies reported by the above LDA analyses are subject to the well-known feature selection bias . For example, the selected candidates (and known regulators) were differentially expressed in iTregs in the Main dataset, and also the same dataset was used for group classification in LDA. Moreover, most of the candidate genes (35/37) and known regulators are represented within the iTreg subnetwork control gene list, largely explaining the high classification accuracy for some pairs in the control list (Fig. 7d). To further confirm the classification ability of the candidate genes, we therefore performed additional analyses as indicated below.
Importantly, we confirmed the classification power of the candidate molecules in the independent RNA-Seq dataset that was not used for candidate selection. To do so, we tested all possible gene pairs within all three lists (37 candidates, 37 known, 349 iTreg) as above with LDA in the independent dataset. In this independent dataset, we could separate iTregs from Mock-stimulated T cells using candidate gene pairs with up to 93.3% cross-validated accuracy (mean ± SD 67.97 ± 10.63%), which was significantly higher (p < 0.0001) than with pairs from the 37 known Treg regulators or the 349 iTreg subnetwork gene lists (mean ± SD 56.8 ± 12.1% and 59.1 ± 13.8%, respectively; white boxes in Fig. 7d). To further explore good classifiers, we also sub-selected all 76 candidate gene pairs that performed with 100% accuracy in the Main dataset. This set of ‘top classifier’ pairs included all 37 candidate molecules, though some occurred more frequently than others (Additional file 1: Figure S8a). These top classifier pairs were able to separate iTregs and Mock cells in the independent dataset with up to 90% accuracy (mean ± SD 72.1 ± 7.7%; Additional file 1: Figure S8a–c).
As a second approach to confirm the iTreg classification ability of the candidate genes without feature selection bias, we used a Random Forest (RF) classification using all 15,910 HEGs as input to rank the individual genes for their ability to distinguish iTregs from Mock cells (Additional file 7: Table S6). Importantly, 21 (57%) of the candidate genes and only 2 genes of the 37 known Treg regulator list were represented in the top-ranking 100 genes (Additional file 1: Figure S8d and Additional file 7: Table S6), confirming the candidate genes’ individual ability for distinguishing iTregs from Mock cells with this unbiased RF analysis. Interestingly, among the top-ranking 100 genes, we observed several of the genes that also occurred frequently in top classifier gene pairs with 100% accuracy in the above LDA analysis, namely OCIAD2, OLFM2, CTSL, NKD1, TGFA, KCNJ15, SARDH, and PMEPA1 (Additional file 1: Figure S8a, d and Additional file 7: Table S6). Regarding important known Treg regulators, FOXP3 ranked at position 101, IKZF4 at position 5, and FURIN at position 8. Notably, half of the top-ranking 20 genes were candidate genes, among other molecules (Additional file 1: Figure S8d). Nevertheless, we did not prioritize the latter for follow-up analyses as several of these genes are not protein-coding or their role in Tregs or other T cells is already well known.
Together, these data confirm that the selection of candidate molecules was better than a set of known Treg regulators in classifying iTregs in the Main dataset as well as in an independent dataset.
Experimental validation of novel candidate molecules confirms functional effects on FOXP3 expression
The above analyses confirmed that several candidate molecules performed well in distinguishing Mock-stimulated cells from iTregs; however, differential expression does not necessarily mean that these molecules are involved in iTreg biology and FOXP3 induction. Therefore, we sought to validate the putative functional role of the candidate molecules in FOXP3 induction through a targeted shRNA screen. We produced lentiviral particles delivering shRNA pools against each of the candidate molecules (Additional file 8: Table S7), side-by-side with control lentiviruses (shScr or empty vector as negative controls, shFOXP3 or shIKZF4 as positive controls) and transduced primary T cells, which were then differentiated under iTreg or Mock stimulation conditions. To maximize knockdown efficiency in our screening setup, we used pools of shRNAs and, when possible, validated clones from the TRC library (Additional file 8: Table S7), although off-target effects cannot be excluded in such a screening setup. Efficiency of pooled shRNA versus single shRNA was tested beforehand using shRNAs against CD4 and measurement of CD4 knockdown efficiency (Additional file 1: Figure S9a).
Transduction efficiency, as measured by GFP transduction, was approximately 50–65% (Additional file 1: Figure S9b) and transduced cells were selected by puromycin during differentiation, following which FOXP3 and other markers were measured by flow cytometry. Targeting FOXP3 directly with shRNA led to almost complete knockdown of FOXP3 expression, and several negative controls (untransduced, pLKO.1 empty, and shScr vectors) did not affect FOXP3 induction, confirming absence of unspecific effects due to the procedure (Fig. 8a, b). As a result of our candidate validation screen, we found that shRNA targeting of most (30/37; 81%) of the candidate molecules resulted in significantly lower fractions of FOXP3+ cells in iTreg cultures (Fig. 8a–c). This screening setup suggests a functional role for these 30 candidates in FOXP3 expression; nevertheless, it should be noted that future studies are needed to confirm these results for individual candidates and in different experimental approaches in more detail. Some candidate molecules not only affected FOXP3 but also the general activation of T cells, as measured by CD25 upregulation and CD45RA downregulation (Fig. 8c). For several candidates, activation-induced, low-level FOXP3 expression in Mock-stimulated cells was also affected in cells transduced with targeting shRNA, albeit with fewer significant hits (Additional file 1: Figure S9c, d). Another indication for a biological relevance of the novel candidate molecules in Tregs could be their alteration in immune diseases with known Treg involvement. Notably, several candidate molecules scored as being associated with immune diseases, including MS and IBD (Additional file 1: Figure S7a).
Together, our data provide a resource of high-quality mRNA and protein data obtained at multiple time points during the differentiation of human primary iTregs induced under several different conditions and across different laboratories, along with Mock-stimulated T cells, unstimulated naïve CD4+ T cells, and CD25high nTregs. Our results uncovered a range of known crucial Treg regulators with relevance to several immune diseases such as IBD, along with several novel factors that are likely to play equally important roles in FOXP3 expression. Indeed, our in silico and experimental validation confirmed that the majority of the carefully selected novel candidate Treg regulatory molecules successfully classified iTregs and that their experimental perturbation led to reduced induction of FOXP3-expressing T cells.
Herein, we provide a resource of RNA-Seq and proteomics data covering a time-series during human iTreg differentiation, along with control activated, non-activated, and nTreg cells. To our knowledge, this is the first report of unpublished iTreg proteome data to date. Although the transcriptional signature of murine and human TGF-β and TGF-β + ATRA iTregs has been analyzed [3, 38, 39, 55, 56], transcriptional profiles of iTregs induced by the additional protocols used here have not been studied. Profiling iTregs induced with several protocols in parallel allows for the detection of generic, protocol-independent FOXP3 inducers and at the same time reveals information about specific Treg signatures induced by certain compounds. For example, ATRA has a well-known induction effect of gut-homing properties  and Rapa inhibits the Akt/mTOR pathway and downstream genes . Furthermore, previous studies on murine iTregs did not include time-series but only end-stage transcriptional analysis of fully differentiated cells, while the driving molecules involved in initiation of the differentiation program and FOXP3 upregulation likely operate earlier and might be entirely missed with these approaches. One recent study presents time-series data during human Treg induction with TGF-β ; however, the authors did not further focus on Tregs and no significant enrichment for disease-associated SNPs in TF genes was found in iTregs in their study. This is contrary to our study, in which we detect an association of iTreg genes with several immune system diseases, although it should be noted that the results from our disease enrichment analyses must be regarded solely as a comparative relative enrichment of certain diseases over others, complicated by the fact that the disease categories are not independent and therefore standard methods for p value correction are likely too conservative. Further, the studies may not be absolutely comparable because, in the aforementioned study , FOXP3 was neither measured on protein nor at the single-cell level and iTregs expressed relatively low FOXP3 and relatively high TBX21 and RORC mRNA compared to Th1 and Th17 cells, respectively, potentially suggesting sub-optimal iTreg differentiation. In contrast, the transcriptional data in the present work comprises experiments from two different laboratories, in both of which the iTreg phenotype was assessed with Treg markers, such as the expression of FOXP3 protein at the single-cell level, and the suppressive function was tested. Furthermore, we present matched RNA and protein expression of other Treg signature genes such as the important Treg regulator Eos , which is not usually measured in human Tregs due to limitations for detection by antibody-based methods. Interestingly, iTregs induced by our protocols expressed high amounts of Eos protein and mRNA (reaching levels similar to nTregs), a feature that was previously shown to require more than simple ectopic FOXP3 expression, suggesting an association with an nTreg-like phenotype [26, 57]. We confirmed high expression of Eos protein in human nTregs and, importantly, our unique time-course iTreg transcriptome and proteome data show that Eos was one of the strongest and earliest upregulated molecules in all iTreg conditions tested. Therefore, we included Eos as a further positive control for shRNA validation and we showed, for the first time, that knockdown of Eos strongly reduced FOXP3 expression in human iTregs.
In our study, the availability of time-series RNA-Seq data along with proteomic data and an additional independent RNA-Seq experiment with similar molecular patterns enabled the exploration of a most robust iTreg signature and the containment of differences due to experimental setup or technical factors. Nevertheless, it should be noted that limitations still exist, including individual donor variability and the number of biological replicates and time points available. Another important aspect to consider when using this data resource for future studies is the different fraction of activated (CD25+) and FOXP3+ cells under diverse differentiation conditions. Measuring gene expression in these heterogeneous bulk populations may affect calling certain differentially expressed genes, and single-cell analyses or studies in sorted sub-populations may enable identification of additional factors in the future – although these methods are not without their own limitations. Since we were interested in early events preceding FOXP3 induction, and CD25 as a general activation marker was upregulated only after 12–24 h, it was not possible to perform the expression profiling on re-sorted CD25+ cells. Furthermore, sorting would take time and thus affect the accuracy of the time points, but more importantly the global T cell transcriptome is sensitive to changes induced by magnetic or flow sorting [58, 59]. Therefore, we freshly isolated the cells from peripheral blood, rested the naïve T cells before starting the time-series stimulation for all conditions and time points in parallel, and finally lysed the cells as fast and comparably as possible; all in a carefully balanced plate design to additionally avoid batch effects or influences of processing order. Other impactful studies have similarly measured the time-course gene expression on the population level during differentiation of other CD4 T cell subsets (Th1, Th2, Th17) where potential sorting markers are also expressed later than early genes of interest [60,61,62,63]. These studies successfully identified important genes, although the fraction of the desired differentiated cells were often lower than the fraction of FOXP3+ iTregs in our study. Most importantly, the genes we termed as DEGs in iTregs were defined in comparison to Mock-stimulated cells, and the latter also upregulated CD25 upon activation. Although there were protocol-specific differences in the absolute fraction of CD25+ cells, it was generally in a similar range in both iTregs (except those induced in the presence of Rapa) and Mock-stimulated cells; the latter expressed CD25 but not FOXP3 even when gated on CD25+ cells. Expression of each gene was modeled over time and the candidate DEGs and iTreg subnetwork genes were defined by comparing iTregs to Mock-stimulated cells in this time-series analysis. In particular, candidate gene expression was usually relatively low in naïve and Mock-stimulated cells, while being upregulated in iTregs under all conditions (including the Rapa-containing condition) and often already at early time points (before CD25 would be expressed). In line with the well-known proliferation-inhibiting effect of Rapa [29, 64], we observed by PCA that, on a global scale, Rapa-induced iTregs seemed to ‘lag behind in time’ and the time-course modeling of the genes would likely reveal iTreg-specific genes (vs. Mock-stimulated cells) even if the kinetics were different, which would not be possible with a simple sample-to-sample comparison at a single time point. Nevertheless, Rapa-induced iTregs displayed many unique genes compared to other iTregs, which may be partially attributable to the lower fraction of activated cells but also to specific Rapa-induced effects. Particularly for global comparisons of all activation-induced (not only iTreg specific) genes, it should be considered that Rapa-induced iTreg populations contained lower fractions of activated cells; however, this study is focused on iTreg-specific genes shared between different iTreg protocols (vs. Mock stimulation).
Despite these limitations, novel Treg regulatory molecules with differential expression were revealed here and original findings relevant for human immunology are discussed below, both from a global genomic perspective and a gene-centric view. For example, although the suppressive capacity of iTregs is controversial and strongly depends on the particular inducing and resting conditions, suppression assay setup, and controls used [27,28,29], our data may be useful to define suppressive cell markers and genes necessary for suppressive function, as we provide data from iTregs generated under several conditions and displaying varying degrees of suppressive activity, along with nTregs. Interestingly, and as discussed previously , iTregs induced with TGF-β + ATRA + Rapa displayed enhanced suppressive activity in vitro despite the lowest FOXP3 fraction compared to other iTreg populations, which may be related to the lower expression of ‘Treg down’ signature genes in TGF-β + ATRA + Rapa iTregs, modulation of TCR signal strength, or inhibition of contaminating ‘non-Treg’ proliferation by Rapa. Here, the higher sensitivity of conventional T cells compared to Tregs to inhibition by Rapa  may allow for a preferential expansion of Treg-like cells despite lower total fractions of activated cells in the presence of Rapa and may explain the relatively high expression of iTreg candidate genes despite the lower CD25+ cell fraction. Very recently, Candia et al.  re-sorted CD4+CD25highCD127− cells from iTreg cultures prior to suppression assays and confirmed that human iTregs generated in the presence of Rapa had superior suppressive activity in vitro compared to the corresponding iTregs generated without Rapa. However, the in vivo relevance and functionality of iTregs is still controversial, and even iTregs generated with similar protocols are not always determined to be suppressive in vitro and/or in vivo. For instance, although we previously determined that iTregs were not suppressive in vivo in a xenogeneic GvHD model , others have shown that iTregs generated with similar protocols could suppress disease in vivo [67,68,69]. In addition to the technical details that may compromise comparability of such assays between different studies, it is also possible that different subsets of Tregs may suppress only certain target cell types and/or in specific tissues. Tregs employ numerous suppressive mechanisms , and in recent years it emerged that in vivo-generated nTregs are highly heterogeneous with tissue-specific phenotypic and functional specialization of nTregs beyond the classification of pTreg and tTreg subsets [40,41,42]. For instance, Tregs residing in the gut, which may mostly comprise pTregs and may resemble ATRA-induced iTregs, are different from Tregs in other tissues [16, 41]. In this regard, it is important to mention that ATRA and Rapa were shown to confer differential homing capacities to murine and human iTregs [66, 71]. It is therefore conceivable that iTregs with a certain phenotype may be suppressive in one, but not in another disease model. Importantly, although the in vivo relevance of iTregs generated in this study remains unclear, iTregs are the only applicable system to study FOXP3 induction in human primary cells. Further, we have confirmed the highly reproducible molecular patterns of iTregs on multiple levels (RNA, protein) and provided several controls alongside. The identification of several well-known Treg regulators in our study supports the relevance of our results for FOXP3 expression in human T cells.
On a global scale, our integration of data from primary human T cells revealed interesting aspects of the protein:RNA correlation during a major cellular transition like T cell activation and differentiation. About half of the molecules being both DEG and DEP showed a concordant pattern over time, while the other half presented either poor or negative correlation, in line with previous works reporting that RNA levels can only poorly predict protein abundance [44, 72, 73] and that gene-specific correction factors can improve the consistency between RNA levels and protein copy number . For example, we observed a negative RNA:protein correlation among mitochondrial ‘OXPHOS’ genes, which has been previously observed in tumor cells . We further identified a group of genes enriched in ribosomal and nucleolar functions with a poor RNA:protein correlation, which may be explained by previously reported mechanisms, such as low translational efficiency, as observed for members of the TOP mRNA family, which comprises components of the translational machinery, including ribosomal proteins and some small nucleolar RNAs . Interestingly, both metabolism-associated and ribosomal proteins have been recently investigated as a signature to define the functional specialization of human nTregs . Thus, despite the relatively high overlap between DEPs and DEGs, our integrative analysis underlines the importance of studying not only transcript but also protein abundance. While transcriptional data, due to its richness, coverage, and cost efficiency, may allow for more stringent statistical analyses and sample throughput, proteomics data are a valuable addition to narrow down findings from RNA data in order to obtain more confidence for defining differentially expressed molecules and selecting those for perturbation and functional validation approaches.
If a gene set-focused level is required, based on our study, we propose novel iTreg regulators, the function of which was validated by demonstrating that knocking them down in an shRNA screen prevented FOXP3 expression. It should be noted that decreased FOXP3 expression upon candidate knockdown may not be related solely to altered FOXP3 induction, and that a role of candidate molecules in FOXP3 maintenance is also possible. Yet, the expression kinetics of most candidate molecules being expressed prior to FOXP3, as well as the functional assignment of many candidate molecules to the TGF-β pathway cluster together with the known role of the TGF-β response element in FOXP3 induction , suggest a role in FOXP3 induction at least for the respective candidate molecules. Further, except for one candidate gene (OLFM2), ectopic FOXP3 expression in human T cells in an external dataset did not cause differential expression. Particularly for those candidate molecules that are TFs, a direct regulation of the FOXP3 gene is a possibility, but FOXP3 expression as a readout cannot distinguish between direct and indirect regulation.
Future work has to be performed to confirm and extend the findings from the shRNA validation screen and the gene expression studies, such as assessing the molecular mechanisms through which candidate molecules affect FOXP3 expression, exploration of the candidate molecules in other cell types including CD4+ T cell subsets, and studying the relevance of the molecules in Tregs in vivo. Furthermore, despite protocol optimization and careful design of the shRNA screen, including several negative and positive controls, we cannot completely exclude off-target effects or divergent knockdown efficiencies of individual shRNA clones. This should be addressed in the future by deconvoluting individual shRNA effects, correlating knockdown efficiency with the effect size on FOXP3, and confirming the results for each candidate molecule in detail with independent assays.
Despite the limitations of this study, it is tempting to speculate on the potential of the candidate genes as novel drug targets. They belong to several molecular classes amenable to different drug targeting approaches, as they include TFs, surface molecules, and members of the TGF-β signaling pathway. For example, the candidate TF NR3C1 (glucocorticoid receptor) is already a well-known drug target , and its effects on iTregs should be specifically investigated further in the future. Interestingly, the TF TRIM22 selected by integrating iTreg RNA and protein data and having strong effects on FOXP3 in the validation screen does not have a murine homologue and was recently described to be associated with early-onset IBD , emphasizing the importance of studying human instead of murine T cells.
The data presented here may be useful in future investigations to discover new markers to define Treg subsets. Indeed, our new candidate molecules outperformed known Treg regulators in classifying iTregs versus Mock-stimulated T cells. Although the LDA approach is subject to the selection bias , we confirmed a good performance of the candidate molecules in the independent dataset, and we also found the candidate molecules strongly represented in the top-ranking 100 and even top 20 genes obtained from unbiased RF analysis, further supporting our results at least for a subset of candidate molecules. In addition to discrimination of iTregs from Mock-stimulated cells, the classifiers could potentially be useful for pTreg versus tTreg discrimination as well, which would be relevant for specific targeting of either subset depending on the disease. Due to similar induction requirements (IL-2 and TGF-β) it may be speculated that iTregs are likely to be more similar to pTregs than to tTregs. Furthermore, pTregs and tTregs differ in their TCR repertoire [1, 80] and, in this respect, due to the common naïve T cell precursor, iTregs might be similar to pTregs; nevertheless, in vivo, only a small fraction of pTreg and conventional T cell TCR sequences seem to overlap . Unfortunately, in contrast to murine cells where Neuropilin-1 and Helios can distinguish pTregs and tTregs at least under most non-inflammatory conditions, these markers are not usable for such distinction in human cells . In accordance with our data, murine nTregs and iTregs were shown to differ in parts of their signature while, experimentally, in vivo-induced murine pTregs were more similar to nTregs [3, 38, 39], arguing for additional factors during in vivo Treg induction. Yet, signatures of in vivo-induced pTregs differ depending on the mode of conversion and, regarding the Treg signature genes, pTregs still considerably differ from ex vivo-isolated (splenic or lymph node) Tregs while much of the in vivo-converted Treg signature overlaps with in vitro-converted iTregs ; these observations may reflect imprinting of tissue-specific signatures. Furthermore, the activation with anti-CD3/-CD28 largely shapes the gene signature of in vitro differentiating cells as evident from our time-series data, and is therefore likely to at least partially contribute to the observed difference between iTregs and ex vivo-derived (unstimulated) nTregs. Cell number limitations did not allow us to include activated nTregs in this study, but in accordance with this notion, a drastic change of the human nTreg gene signature upon in vitro activation has been described along with a shared activation signature between activated Tregs and activated Tcons [81, 82]. It should also be noted, again in this context, that nTregs comprise different subsets [40, 42] and can be isolated to different grades of purity; and, in our study, nTregs were isolated solely on the basis of high CD25 expression because of the relatively high cell numbers required for proteomics. These bulk ‘nTregs’ supposedly include pTreg and tTreg subsets; the latter may be prominent according to studies in mice from which it has been estimated that more than 70% of Tregs in the periphery are tTregs . A similar proportion has been proposed for human Tregs in peripheral blood, although use of the marker Helios has been debated . Thus, differences between the nTreg and iTreg signatures observed in the present study might reflect true differences in iTregs versus nTregs but also pTregs versus tTregs. Despite the controversies regarding Helios as a tTreg marker , it is worth mentioning that, in line with the notion of iTregs being more similar to pTregs, iTregs induced under all conditions and in both datasets of the present work displayed low expression of IKZF2 (encoding for Helios) not exceeding that of Mock-stimulated or unstimulated cells, while nTregs expressed drastically higher levels of IKZF2 mRNA and Helios protein (data not displayed). Interestingly, iTregs in this study expressed IKZF4 (Eos) but not IKZF2 (Helios), both of which were previously shown to be expressed in a FOXP3-independent manner and suggested to correlate with an nTreg-specific DNA methylation signature [26, 57]. However, the Treg epigenetic signature, such as DNA demethylation of the Treg-specific demethylated region in the FOXP3 gene, which is important for stable FOXP3 expression, appears divergent in iTregs versus nTregs (including in in vivo-induced pTregs in some studies) [2, 25, 26, 28, 57]. Yet, relative instability of pTregs may even be a favorable scenario in vivo, conferring local and transient immune suppression in specific tissues, in contrast to long-lasting, stable tolerance to self-antigens mediated by tTregs and preventing autoimmune disease. Interestingly, deletion of the element containing the Treg-specific demethylated region in the FOXP3 locus did not affect either FOXP3+ thymocyte numbers in vivo or FOXP3 induction, but compromised Treg stability, with functional relevance in chronic infectious settings or in older mice [83,84,85]. Irrespective of the differences between nTregs and iTregs, the latter appear as the most suitable system to study early events of FOXP3 induction in human cells. Indeed, our approach using differentiation of iTregs enabled the identification of several well-known nTreg factors, along with novel molecules for which we confirmed a functional role in in vitro FOXP3 induction.
In conclusion, we provide a valuable data resource with RNA-Seq and corresponding proteomics time-series data of primary human T cells, comprising naïve T cells, activated T cells, iTregs induced by four different protocols, and nTregs. These data can be exploited in the future for detailed analyses of iTreg signatures, including specific effects of Treg-inducing compounds and discovery of markers to distinguish Treg subsets. We present a set of 37 novel candidate molecules, of which several functionally affected induction of FOXP3+ iTreg cells in a shRNA screening setting and could classify iTregs versus Mock-stimulated T cells. These candidate molecules have the potential to be developed further for use as drug targets in autoimmune and inflammatory diseases or cancer, given the known relevance of FOXP3+ Tregs in these diseases.
Human T cell isolations
Anonymized healthy donor buffy coats were purchased from the Karolinska University Hospital (Karolinska Universitetssjukhuset), Huddinge, Sweden. The study was approved by the Regional Ethical Review Board in Stockholm (Regionala etikprövningsnämnden i Stockholm), Sweden (approval number: 2013/1458–31/1). Fresh buffy coats were filled with PBS to 170 mL and used to isolate human peripheral blood mononuclear cells (PBMCs) by gradient centrifugation using Ficoll-Paque Plus (GE Healthcare), followed by monocyte depletion through plastic adherence in RPMI medium containing 10% FCS (Invitrogen). Afterwards, magnetic activated cell sorting (MACS) was utilized to isolate T cell subsets from PBMCs: CD25high ‘nTregs’ were first prepared by positive isolation as described previously  using limited amounts (2 μL/107 cells) of CD25 beads (Miltenyi) and two subsequent MACS columns. Platelets were removed from PBMCs by low-speed centrifugation (200× g, 5–10 min, 20 °C; 3–4 times), followed by isolation of naïve CD4+ T cells from the nTreg-depleted fraction by negative isolation using the naïve CD4+ T Cell Isolation Kit II, human (Miltenyi), according to the manufacturer’s instructions. In brief, CD45RO+ memory T cells and non-CD4+ T cells were indirectly magnetically labeled using a cocktail of biotin-conjugated antibodies against CD8, CD14, CD15, CD16, CD19, CD25, CD34, CD36, CD45RO, CD56, CD123, TCRγ/δ, HLA-DR, and CD235a. For shRNA validation screening, fresh blood was obtained from anonymized healthy donors after informed consent at the La Jolla Institute for Allergy and Immunology, La Jolla, USA, according to institutional guidelines (Normal Blood Donor Program protocol VD-057). Blood was diluted 1:2 with PBS and PBMCs and naïve CD4+ T cells were isolated as above. The purity of MACS-isolated cells was assessed by flow cytometry. Cells were counted in trypan blue solution using a Countess Automated Cell Counter (Life Technologies) and viability was determined by trypan blue staining and/or flow cytometry (see below). T cells were cultured at 5% CO2/37 °C.
iTreg differentiation for molecular profiling
Naïve CD4+ T cells were isolated from buffy coats of male anonymous donors (aged 37, 38, and 34 years) by MACS as described above, rested for several hours and then plated under iTreg differentiation conditions as described previously [28, 29]. Briefly, cells were stimulated with 5 μg/mL plate-bound anti-CD3 antibody (clone OKT3; Biolegend, LEAF grade), 1 μg/mL soluble anti-CD28 antibody (clone CD28.2; Biolegend, LEAF grade), and 100 IU/mL IL-2 (carrier-free; R&D Systems). ‘Mock’ control cells received no further compounds. For Treg-inducing conditions, TGF-β1 (5 ng/mL carrier-free; R&D Systems), ATRA (10 nM; Sigma-Aldrich), Rapa (100 ng/mL; Calbiochem EMD Millipore), or sodium butyrate (100 μM; Sigma-Aldrich) were added. Plates were prepared first, pre-warmed to 37 °C, and cells added last. Cells were plated in 96 U-well plates at 1.0–1.2 × 105 cells/well, in a 200 μL final volume/well serum-free X-Vivo 15 medium (Lonza) supplemented with Glutamax (Gibco). Margin wells were not used for cell culture, and all unused wells were filled with 200 μL PBS. The plating layout was balanced for iTreg conditions and time points within and across donors to prevent any batch effects through processing order or culturing samples on the same plate. Samples for each donor were prepared in completely independent experiments but with reagents aliquoted from the same batches. Cells were incubated for 2 h to 6 days. At the indicated time points (2, 6, 24, and 48 h, and 6 days), several wells per condition were pooled (between 5 and 30 wells, considering cell growth for conditions and time points) and cells were processed for RNA and protein extraction. On days 4 and 6, 1–2 wells of each iTreg condition were used for flow cytometric analysis (see below).
RNA and protein extraction for molecular profiling
At the indicated time points, the plates were placed on ice, 100 μL of the supernatant were removed, and 100 μL of ice-cold X-Vivo 15 medium was added to each well. Cells were processed in the order of plating according to the experimental design, balanced for donor and condition to prevent batch effects by processing time. Replicate wells were pooled with a multichannel pipet, rinsed with cold PBS, and the cells transferred to 15 mL tubes. Cells were centrifuged (450× g, 10 min, 4 °C), the supernatant removed, and the cells were washed twice with 1 mL of ice-cold PBS each (1000× g, 5 min, 4 °C). The supernatant was removed completely, cells were vortexed and lysed in RLT buffer (Qiagen) supplemented with 142 mM beta-mercaptoethanol (Sigma Aldrich), and homogenized using Qia Shredder columns (Qiagen). Lysates were frozen on dry ice and stored at −80 °C until processing. RNA and proteins were extracted with the AllPrep DNA/RNA/Protein Mini Kit (Qiagen) according to the manufacturer’s recommendations with the following modifications. Centrifugations were carried out at 9600× g except for protein precipitation, which was at 20,800× g. RNA was eluted with 40 μL of RNase-free water and elution was repeated with the first eluate. An aliquot of RNA was taken for nanodrop, bioanalyzer, and qRT-PCR, and the remaining RNA was frozen on dry ice and stored at −80 °C. The protein precipitate was solubilized (5 min, 95 °C) in freshly prepared buffer containing 4% (w/v) SDS, 25 mM HEPES pH 7.6, and 1 mM DTT, and the insoluble material was removed by centrifuging for 1 min at 20,800× g. An aliquot of the soluble protein supernatant was taken for BCA assay, and the extracts were frozen on dry ice and stored at −80 °C. The BCA assay of 1:5 diluted samples was performed according to the manufacturer (Pierce BCA Protein Assay Kit, Thermo Fisher Scientific) in a 96-well format; an aliquot of the lysis buffer (frozen along with the sample aliquots) was used (1:5 diluted) to prepare the standard curve with BSA protein. All samples were used for RNA-Seq. All samples except the 2 h time point samples, iTreg TGF-β + ATRA and iTreg TGF-β + butyrate, were used for proteomics.
RNA sequencing (RNA-Seq)
RNA concentration was measured on a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific). RNA quality was controlled using an Agilent RNA 6000 Pico Kit on a 2100 Bioanalyzer instrument (Agilent Technologies). RNA integrity numbers were 8.47 ± 0.50 (mean ± SD; range 7.6–10.0); 1 μg of RNA per sample (except for the 2 h time point samples of Donor 2: 500 to 900 ng) was used for library preparation with the TruSeq Stranded mRNA HT kit (Illumina), with Dual Indexing adapters (8-plex). As a control, an Ambion ERCC Spike-In Control Mix was used (Thermo Fisher Scientific). According to the standard protocol, libraries were purified from excess sequencing adapters with Agencourt AMPure XP beads (Beckman Coulter). Libraries were generated in two batches, in which samples were balanced across a library batch and adapter 8-plex according to donor, iTreg/control condition, and time point to allow for the accounting of potential batch effects. Library concentration was measured on a Qubit 2.0 Fluorometer (Thermo Fisher Scientific) and library quality and size were determined by using an Agilent High Sensitivity DNA Kit on a 2100 Bioanalyzer instrument (Agilent Technologies). Libraries were quantified with the KAPA Library Quantification Kit (KAPA Biosystems). Pools of eight libraries (except for one lane, nine libraries) were denatured using NaOH, and PhiX Control v3 was added to each pool. Pools were clustered and sequenced in two batches, using one High Output Run and one Rapid Run, on a HiSeq 2500 Sequencing Platform (Illumina). For the High Output run, library pools were clustered on a HiSeq Flow Cell v3 on the cBot system, using TruSeq PE Cluster Kit v3 - cBot - HS, and Read 1 Sequencing Primer Mix (HP10) from a TruSeq Dual Index Sequencing Primer Kit. Clustered libraries were then sequenced with 75 nt paired-end reads, using TruSeq SBS Kit v3 50 cycle kits. The Index 1 sequencing primer (HP12) and Read 2 sequencing primer (HP11) provided in the TruSeq Dual Index Sequencing Primer Kit were used. For the Rapid run, library pools were clustered on a HiSeq Rapid Flow Cell v1 using TruSeq Rapid Duo cBot Sample Loading Kit. Clustered libraries were then sequenced using a TruSeq Rapid PE Cluster Kit together with three TruSeq Rapid SBS 50 cycle kits. On average, 18.9 million read pairs were obtained per sample (mean ± SD 18.9 × 106 ± 5.23 × 106; range 5.3 × 106 to 32.1 × 106). BCL basecall files were de-multiplexed and converted to FASTQ files with bcl2fastq v1.8.4.
Quantitative reverse transcriptase PCR (qRT-PCR)
An aliquot of RNA used for sequencing, and from an additional donor, was taken for qRT-PCR analysis. cDNA was prepared with the SuperScript VILO cDNA Synthesis Kit (Invitrogen) according to the manufacturer’s instructions. mRNA was quantified with gene expression mastermixes for SYBR Green or Taqman method (Applied Biosystems) and measured on a StepOne plus detector system (Applied Biosystems) as follows. IKZF4 and RPL13A mRNA was measured using Applied Biosystems best coverage Taqman probes (FAM reporter), and FOXP3 and RPL13A mRNA was measured by SYBR Green qRT-PCR (primers: FOXP3 Forward: AGCTGGAGTTCCGCAAGAAAC, FOXP3 Reverse: TGTTCGTCCATCCTCCTTTCC; RPL13A Forward: TCCAAGCGGCTGCCGAAGATG, RPL13A Reverse: CTTCCGGCCCAGCAGTACCTGT). The relative mRNA expression was determined by normalization to RPL13A and results are presented as fold induction compared to mRNA amounts of unstimulated T naïve of the same donor, which were set to 1. Fold expression was calculated using the ∆∆Ct method according to the following formula (Ct is the threshold cycle value):
Sample preparation for mass spectrometry
At least 25 μg of protein per sample from Allprep extraction (see above; in 4% SDS, 25 mM HEPES, 1 mM DTT) was used for proteomics. Lysates were heated to 95 °C for 5 min followed by sonication for 1 min and centrifugation at 14,000× g for 15 min. The supernatant was mixed with 1 mM DTT, 8 M urea, and 25 mM HEPES at pH 7.6, and transferred to a 10-kDa cut-off centrifugation filtering unit (Pall, Nanosep), and centrifuged at 14,000× g for 15 min. Proteins were alkylated by 50 mM iodoacetamide in 8 M urea and 25 mM HEPES for 10 min. The proteins were then centrifuged at 14,000× g for 15 min followed by two more additions and centrifugations with 8 M urea and 25 mM HEPES. Trypsin (Promega) in 250 mM urea and 50 mM HEPES was added to the cell lysate at a ratio of 1:50 trypsin:protein and incubated overnight at 37 °C with gentle shaking. The filter units were centrifuged at 14,000× g for 15 min followed by another centrifugation with ultra-pure water (Milli-Q, Millipore) and the flow-through was collected. Peptides were labelled with TMT10-plex reagent according to the manufacturer’s protocol (Thermo Fisher Scientific) and cleaned by a strata-X-C-cartridge (Phenomenex). Samples for the same culture condition within a donor were kept together in a 10-plex. A slight batch effect dependent on the TMT set (Additional file 1: Figure S3) may be explained by the distribution of individual donors within each TMT set, which was performed to increase quantification overlap. In total, five 10-plexes were used, and an internal standard containing a mix of protein from all samples was used as one TMT tag in each TMT set to enable relative quantification between TMT sets. Some samples were loaded in duplicate in different 10-plexes.
Immobilized pH gradient – isoelectric focusing (IPG-IEF) of peptides
The TMT-labeled peptides, 250 μg per TMT-10-plex, were separated by IPG-IEF on a 3–10 strip. Peptides were extracted from the strips into 72 fractions by a prototype liquid handling robot, supplied by GE Healthcare Bio-Sciences AB. A plastic device with 72 wells was put onto each strip and 50 μL of ultra-pure water (Milli-Q, Millipore) was added to each well. After 30 min incubation, the liquid was transferred to a 96-well plate and the extraction was repeated two more times. The extracted peptides were dried in a SpeedVac vacuum concentrator and dissolved in 3% acetonitrile and 0.1% formic acid.
Q Exactive analysis
Before analysis on the Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fischer Scientific), peptides were separated using an Ultimate 3000 RSLCnano system. Samples were trapped on an Acclaim PepMap nanotrap column (C18, 3 μm, 100 Å, 75 μm × 20 mm), and separated on an Acclaim PepMap RSLC column (C18, 2 μm, 100 Å, 75 μm × 50 cm; Thermo Fisher Scientific). Peptides were separated using a gradient of A (5% DMSO, 0.1% formic acid) and B (90% acetonitrile, 5% DMSO, 0.1% formic acid), ranging from 6% to 37% B in 30–90 min (depending on IPG-IEF fraction complexity) with a flow of 0.25 μL/min. The Q Exactive was operated in a data-dependent manner, selecting the top 10 precursors for fragmentation by high collision dissociation (HCD). The survey scan was performed at 70,000 resolution from 400 to 1600 m/z, with a max injection time of 100 ms and target of 1 × 106 ions. For generation of HCD fragmentation spectra, a max ion injection time of 140 ms and automatic gain control of 1 × 105 were used before fragmentation at 30% normalized collision energy and 35,000 resolution. Precursors were isolated with a width of 2 m/z and put on the exclusion list for 70 s. Single and unassigned charge states were rejected from precursor selection.
Peptide and protein identification
All Orbitrap data was searched by SequestHT under the software platform Proteome Discoverer 1.4 (Thermo Fisher Scientific) against the Ensembl 81 human protein database and filtered to a 1% FDR. A precursor mass tolerance of 10 ppm and product mass tolerances of 0.02 Da for HCD-Fourier Transform Mass Spectrometry (FTMS) were used. Further settings used were trypsin with two missed cleavage; iodoacetamide on cysteine and TMT on lysine and N-terminal as fixed modifications; and oxidation of methionine as variable modification. Quantification of TMT-10-plex reporter ions was performed by Proteome Discoverer on HCD-FTMS tandem mass spectra using an integration window tolerance of 10 ppm. Only peptides unique to a protein group were used for quantitation.
T cells were cultured and harvested as described above, and protein extraction was performed as above for molecular profiling. Protein concentrations were determined by BCA assay and 25 μg of protein per lane were resolved by SDS-PAGE using Mini-PROTEAN TGX Gel (Bio-Rad). Proteins were transferred by standard Wetblot procedures to Protran nitrocellulose membranes (Amersham GE Healthcare). Membranes were blocked with 5% nonfat dry milk in TBS containing 0.1% (w/v) Tween 20. The primary antibodies used were the following: anti-FOXP3 (clone eBio7979, eBioscience), anti-PARP (clone C2–10, BD Biosciences), anti-GAPDH (clone 6C5, Santa Cruz Biotechnology), anti-alpha-Tubulin (clone B-5-1-2, Sigma-Aldrich), and anti-STIM1 (N-terminal, rabbit polyclonal, Sigma-Aldrich). The FOXP3 antibody clone eBio7979 recognizes an epitope between exon 3 and the Zink finger domain, and should recognize all three known FOXP3 isoforms. Several proteins were used as loading controls, since ‘housekeeping’ proteins GAPDH and alpha-tubulin were differentially expressed between samples. Horseradish peroxidase-conjugated secondary antibodies were from Santa Cruz Biotechnology, and protein bands were developed in a Vilber Fusion Solo S chemiluminescence acquisition system (Vilber Lourmat) using Immobilon Western Chemiluminescent horseradish peroxidase substrate (Millipore). Only non-saturated exposures were used for visualization and analysis, and bands were quantified with ImageJ software version 1.48v.
Flow cytometry and antibodies
An aliquot of cells used for molecular profiling was stained for flow cytometry on days 4 and 6 of culture. Day 4 samples were stored after fixation and processed further together with day 6 samples. On day 6, samples were stained without and after 4 h restimulation with 0.5 μM ionomycin and 10 ng/mL of phorbol 12-myristate 13-acetate (Sigma Aldrich) in the presence of Golgi plug (BD Biosciences). If not otherwise stated, staining was performed as follows:
Cell surface antigen staining with the following antibodies: CD4-PerCP (clone SK3, BD Biosciences), CD45RA-PE-Vio770 (clone T6D11), CD45RA-FITC (clone T6D11), and CD25-PE (clone 4E3; all Miltenyi Biotec) were performed in the dark with antibody dilutions in FACS buffer (PBS/0.5% HSA) for 30 min at 4 °C. Cells were washed once with PBS, resuspended in FACS buffer, and acquired or used for subsequent intracellular staining. After surface staining, cells were washed twice with PBS and stained with the Fixable Viability Dye eFlour780 (ebioscience) for 30 min at 4 °C (dark), then washed twice and used for intracellular staining.
Intracellular staining was performed at 4 °C with the Foxp3 Staining Buffer Set (ebioscience) according to the manufacturer’s protocol using the following antibodies: FOXP3-APC (ebioscience, clone 236A/E7), CTLA-4-BV421 (BD Biosciences, clone BNI3), and IFN-γ-FITC (ebioscience, clone 4S.B3). Isotype control antibodies (mIgG1κ APC isotype, ebioscience, clone P220.127.116.11.1; mIgG2aκ BV421 isotype, BD Biosciences, clone MOPC-173; mIgG1κ FITC isotype, ebioscience, clone: P18.104.22.168.1) were used at the same final concentrations (w/v) as the corresponding intracellular staining antibodies.
Acquisition and analysis
If not otherwise stated, acquisition was performed on a CyAn ADP 9 Color Analyzer (Beckman Coulter), and parameter compensation was performed automatically with the CyAn software (Summit) tool using single stained samples containing positive cells. Flow cytometry raw data were analyzed using FlowJo Software (Tree Star) and exported values for cell fractions were analyzed in GraphPad Prism 6 or R.
Lentiviral transduction of T cells for shRNA validation screen
shRNA libraries used were based on the pLKO.1-puro TRC vector backbone developed by The RNAi Consortium (TRC). TRC1, TRC1.5, or TRC2 clones from the Sigma Mission TRC - Human collection (Sigma Aldrich) were used, cherry-picked, and obtained as bacterial glycerol stocks from the RNAi center at the La Jolla Institute for Allergy and Immunology (for clone list, see Additional file 8: Table S7). Bacteria were inoculated in overnight cultures and pLKO.1 plasmids were extracted individually with the E.Z.N.A. Plasmid DNA Mini Kit I (Omega Bio-tek) according to the manufacturer’s instructions with an optional second washing step and eluted in sterile 10 mM Tris at pH 8.0. Empty pLKO.1 vector or pLKO.1-Scr (with scrambled non-targeting shRNA) were used as negative controls, and pLKO3G (encoding for GFP in place of puromycin resistance gene) was used to control for transfection and transduction efficiency. A pool of clones targeting human CD4 was used to optimize T cell transduction and knockdown. For knockdown of ‘iTreg candidate genes’ , a pool of 2 to 8 (mean 4.3, SD 0.9) shRNA-encoding plasmids per gene was transfected to 293T cells in equal amounts to produce lentivirus. For some genes, two independent pools of shRNA-encoding plasmids were used. Each pool contained only shRNA-encoding plasmids from the same TRC version (for list of clones, see Additional file 8: Table S7).
293T cells (derivative of human embryonic kidney 293 cells containing the SV40 T-antigen) were obtained from ATCC and tested negative for mycoplasma. 293T cells were harvested at 60–85% confluency and seeded at 4 × 105 cells per well in six-well plates in 2 mL of antibiotic-free DMEM high-glucose medium supplemented with 10% FBS and 2 mM glutamate (Gibco). After 20–23 h (confluency 40–60%), the medium was replaced with 2 mL of fresh pre-warmed medium and 1–2 h later, 293T cells were transfected for lentivirus production. Cells were co-transfected with pLKO.1 shRNA transfer vectors (see above) along with packaging and pseudotyping vectors (pCMV-dR8.9 and pCMV-VSV-G, a kind gift from Sara Trifari, La Jolla Institute for Allergy and Immunology) using jetPRIME transfection reagent according to the manufacturer’s recommendations (Polyplus transfection). After 4 h, the medium was removed and replaced by 1 mL of fresh pre-warmed medium. Transfection efficiency was estimated after approximately 24 h in samples transfected with pLKO3G to be above 95% by fluorescence microscopy. The virus-containing supernatant was harvested 38 h after transfection and centrifuged at 350× g for 5 min to remove cellular material, and used on the same day to transduce T cells.
T cell transduction and culture
Human naïve CD4+ T cells were pre-activated for 16–18 h with plate-bound anti-CD3 antibody (coated at 5 μg/mL), soluble anti-CD28 antibody (1 μg/mL), and 100 U/mL of IL-2 in 96 U-well plates at 130,000 cells/well in 200 μL of X-Vivo 15 medium. Margin wells of 96 U-well plates were not used for culturing but instead filled with 200 μL of PBS. T cell plates were centrifuged (450× g, 10 min) and 165 μL of medium was removed and kept in the incubator as T cell-conditioned medium. T cells were transduced with 200 μL/well of fresh viral supernatant with a final concentration of 8 μg/mL of Polybrene (hexadimethrine bromide, Millipore) by spin infection (900× g, 1 h, 35 °C). T cells were subsequently cultured for 3 h at 37 °C/5% CO2, then centrifuged (500× g, 5 min), and viral supernatants were removed and replaced by T cell-conditioned medium. At 26 h after transduction, T cell differentiation was started. Differentiation plates coated with anti-CD3 antibody were prepared with compounds for iTreg conditions (anti-CD28 + IL-2 + TGF-β1 + ATRA) or ‘Mock’ stimulation (anti-CD28 + IL-2) as described above, with additional supplementation of medium with 1 μg/mL of puromycin (Gibco) except for controls without the puromycin resistance gene. Transduced T cells and controls were centrifuged (500× g, 10 min), the medium was removed, and cells were washed twice with 200 μL of pre-warmed X-Vivo 15 medium. T cells were resuspended in 60 μL of X-Vivo 15 medium, and 50 μL of the cell suspension were transferred to pre-warmed prepared differentiation plates (one well each for iTreg and Mock stimulation per targeted gene and donor). The lack of hindrance of FOXP3 induction by pre-activation, transduction, or puromycin selection had been previously tested. After 4–5 days of differentiation culture, T cells were stained for viability and expression of CD25, CD45RA, and FOXP3, as described above, except for control cells transduced with pLKO3G, which were only stained for viability and then fixed with 2% paraformaldehyde (methanol-free) in PBS instead of the FOXP3 buffer set. Samples were acquired on a FACS Canto II cytometer (BD Biosciences) and compensation was performed with the in-build compensation tool (BD FACSDiva software) using single stained samples.
RNA-Seq data analysis
FASTQ files were adapter- and quality-trimmed using Trim Galore!, and aligned with TopHat2 using hg19 genome index and GENCODE v19 transcriptome model. Alignment metrics and statistics were extracted from the BAM files using Picard tools. Reads were summarized to the GENCODE v19 genes with the summarizeOverlaps function of the GenomicAlignments  package. Unmapped reads were mapped with bowtie2 against an ERCC index in order to obtain ERCC spike-in counts and FPKM values and to explore the dynamic range.
Exploratory analysis and visualization (Additional file 1: Figure S3) was performed using DESeq2 R package , after removing the genes with zero counts in all samples and normalization using the DESeq2 size factors. Blind regularized logarithmic transformation (rlog) was applied to the count matrix in order to calculate Euclidean distances and correlations between samples. PCA on rlog-transformed data and multidimensional scaling using Poisson distance on counts were also used to visualize sample-to-sample relationships. To explore the relative effect of technical and biological factors, a separate linear model was fitted between each of the PC scores and a series of explanatory variables and the R2 was calculated for each of them, obtaining a matrix of R2 values reflecting the association between the PCs and the explanatory variables.
Classification into LEGs and HEGs
We firstly obtained a threshold to define if a gene was considered expressed or not in the full dataset. It has been previously shown  that RNA-Seq can classify the metazoan genes, on the basis of the relative abundance of their mRNAs, into one group of lowly expressed and putatively non-functional genes (LEGs) and another group of HEGs (Additional file 1: Figure S6b). After verifying that the distribution of protein-coding genes in our dataset appears bimodal when reported as log2(FPKM) (fragments per kilobase per million mapped reads) values, we aggregated the log2(FPKM) values per group (G02–G06) and fitted a one-dimensional normal mixture model with two components and variable variance with mclust  to each group separately. After obtaining classification vectors, the midpoint between the maximum log2(FPKM) value of the LEGs and the minimum log2(FPKM) value of the HEGs was considered. A threshold (t = 1.496) was chosen as the average of all midpoints. Then, we considered a gene expressed in our samples if log2(FPKM) > t for at least 15 samples, obtaining 15,910 HEGs. RNA expression levels for candidate genes were further classified based on the range of expression considering all samples as follows: + < 100 counts; ++ ~100–1000 counts; +++ ~1000–10,000 counts (Additional file 1: Figure S7a).
Differential expression analysis
To model the differential expression over time, we used three methods based on R packages: (1) a DESeq2  generalized linear model of the negative binomial family with time as a discrete factor; (2) a DESeq2 generalized linear model of the negative binomial family with a natural cubic spline of time; and (3) a maSigPro  third degree polynomial regression. We define the variable describing the different treatments as ‘group’, with the following levels: G01, unstimulated cells; G02, Mock-stimulated cells (control); G03, iTreg TGF-β; G04, iTreg TGF-β + ATRA; G05, iTreg TGF-β + ATRA + Rapa; G06, iTreg TGF-β + butyrate; and G07, unstimulated nTreg. Similarly, we define the following naming conventions for the variable ‘time point’: T01, 0 h; T02, 2 h; T03, 6 h; T04, 24 h; T05, 48 h; T06, 6 days.
For method (1), the samples for G01 and G07 were not included and we considered a design formula that models the group differences at time point 2 h (T02), the difference over time points, and any group-specific differences over time points (the interaction term) for each gene i and sample j:
With DESeq2, the dispersion was estimated and the model was fitted, and then a Wald test was performed for individual coefficients to extract log2(Fold Change) values, p values, and FDRs. Results were considered significant if FDR < 0.01. A likelihood ratio test was also performed comparing the full model with a model without the interaction term. For method (2), we operated similarly to method (1), but with a change in the design formula in order to model the gene expression as a smooth function of time, namely a natural cubic spline. The group difference and the interaction with the smooth function were also considered. Results were considered significant if FDR < 0.01. For method (3), we used the rlog-transformed data to perform a stepwise cubic regression and extract the best model for each gene using maSigPro, with the time and group as explanatory variables. Results were considered significant if FDR < 0.01 and R2 > 0.7.
To summarize the results, we assigned individual scores (G) counting the methods that defined a gene as being significantly differentially expressed. A gene was called a DEG if it was amongst the significant results of at least two of the three methods, corresponding to a score ≥ 2. In particular, we defined five different individual scores, corresponding to the time and the four different iTreg conditions (G time , G G03 , G G04 , G G05 , G G06 ), plus a binary total score (G Sum ), annotating if the gene was overall considered a DEG for at least one individual score. For the time comparisons, the following requirements were considered: for the DESeq2 methods (1) and (2), FDR < 0.01 for any base coefficient related to time; for the maSigPro method (3), FDR < 0.01 and R2 > 0.7 for the ‘control’ time series. For the group comparisons, we considered the following requirements: for methods (1) and (2), FDR < 0.01 for any coefficient of the ‘coefficient category’, i.e., the coefficient for the base variable ‘group’ and the all the corresponding interaction coefficients (‘group’ × ‘time’); for the maSigPro method (3), FDR < 0.01 and R2 > 0.7 for the time series corresponding to the ‘group’ variables.
Proteomics data analysis
TMT ratio sample median normalization (ratio R vs. internal standard) was performed, using the median PSM TMT reporter ratio from peptides unique to a gene symbol. The protein relative quantification values (log2R) were used for exploratory analysis and visualization (Additional file 1: Figure S3), only considering the proteins detected in all samples. To explore sample-to-sample relationships, we used correlations between samples and PCA. To investigate the relative effect of technical and biological variables, a separate linear model was fitted between each of the PC scores and a series of explanatory variables and the R2 for each of them was calculated, obtaining a matrix of R2 values that was tested for associations between the PCs and the explanatory variables. To model the differential protein expression over time, samples for G01 and G07 were excluded and we used a design in limma R package  that models the group differences at baseline, the difference over time points, and the group-specific differences over time points (the interaction term), similarly to the method (1) for the detection of DEGs, but restricted to the time points and groups included in the proteomic experiments. Empirical Bayes statistics, p values, and FDR results were considered significant if FDR < 0.05.
To summarize the results, we created individual binary scores (0/1) to define DEPs. In particular, we defined three different individual scores, P, corresponding to the time and the two different iTreg conditions (P time , P G03 , P G05 ), plus a binary total score (P Sum ), annotating whether the protein was overall considered a DEP for at least one individual score. For the time comparisons, we considered FDR < 0.05 for any base coefficient related to time as a requirement; for the group comparisons, we required FDR < 0.05 for any coefficient of the ‘coefficient category’, i.e., the coefficient for the base variable ‘group’ and all the corresponding interaction coefficients (‘group’ × ‘time’).
The number of proteins per cell was estimated (for an average of all samples in which the respective protein was detected) according to Wiśniewski et al. . Based on this number, candidate protein expression levels (Additional file 1: Figure S7a) were classified as follows: – not detected; + < 10,000/cell; ++ 10,000–100,000/cell; +++ > 100,000/cell. Similarity of protein expression profiles to RNA profiles was considered ‘NA’ (not applicable) when the protein was not detected, or only detected in few samples.
We created SOMs as implemented in the R package kohonen . We started from a matrix of gene z-scores obtained from the RNA-Seq rlog-transformed data and selected all the DEGs according to the score G above. A supersom model was trained, considering seven different layers corresponding to the groups G01 to G07 above and a 20 × 20 toroidal hexagonal grid. The average codebook vector for the samples belonging to each group was plotted at each time point in order to obtain a topological map of the iTreg polarization over time.
From the MSigDB  website (http://software.broadinstitute.org/gsea/msigdb/) the v5.1 C7 (‘immunologic signatures’) collection was downloaded and filtered to contain sets matching only ‘TH1’, ‘TH2’, ‘TH17’, ‘Treg’, ‘TCONV’ , or ‘CD4’ in their names. We incorporated four additional gene sets named ‘HUMAN_Treg_signature_UP’ , ‘HUMAN_Treg_signature_DN’ , ‘MOUSE_Treg_signature_UP’ , and ‘MOUSE_Treg_signature_DN’ , corresponding to the human and mouse up- or down-regulated genes in Treg versus Tcon as defined in Ferraro et al.  and Hill et al. , respectively. The human homologs of mouse genes were retrieved using the mouse Ensembl BioMart service. Then, for each pair of gene set (‘UP’ and ‘DN’) we calculated a score similarly to Gaublomme et al. . Briefly, we started from a matrix of gene z-scores obtained from the rlog-transformed data. For each sample, we then defined the score as the mean of the genes in ‘UP’ minus the mean of the genes in ‘DN’. In order to quantify the relevance of each signature in a sample PC space, we calculated the Pearson correlation coefficient between each of the first three PC scores and the signature score. Finally, on a two-dimensional sample PCA plot we represented selected correlation value pairs (subset of those with p < 10− 6 in at least one of the three first PCs) as arrows starting from the origin and pointing to the pair of correlation values for the corresponding PC scores. We grouped the selected scores in three categories, i.e., ‘Treg vs. Tcon’ , ‘T cell activation’ , and ‘TGF-β treatment’.
RNA:protein parallel comparison
We compared RNA and protein expression by matching RNA to the reference proteome, using the Ensembl gene ID as a key. We averaged log2Rs if multiple proteins mapped to the same gene. To obtain the heatmap in Fig. 3e, we first separately the obtained z-scores from the rlog-transformed RNA-Seq data and the log2R values for proteins, before combining them by columns. Hierarchical clustering was applied independently to the four blocks corresponding to ‘DEG and DEP’, ‘DEG not DEP’, ‘DEP not DEG’, and ‘not DEG and not DEP’. The DEG and DEP features were further classified into clusters, by cutting the tree in a number of partitions showing greater support, as evaluated by the average silhouette width and the Dunn Index recursively calculated for a range of 4–10 clusters. A Spearman correlation value was calculated for each gene:protein pair separately and values were grouped for each subset shown in Fig. 3e, where histograms show the corresponding distributions.
The subcellular localization was retrieved from the Human Protein Atlas  (www.proteinatlas.org). Only localizations with ‘Validated’ or ‘Supportive’ reliability were considered. The given localizations were classified in a reduced set, including Nuclear membrane, Nucleoli, Nucleoplasm, Cytoskeleton, Cytosol, Mitochondria, ER-Golgi, Plasma membrane, and Vesicles. For each group in Fig. 3e, the relative fractions of the categories were calculated. All the compartments for multi-localizing proteins were considered.
Gene clustering and functional annotation
Gene clusters (Fig. 4) were identified using a model-based expectation-maximization algorithm, using the R package mclust . We firstly extracted the genes differentially expressed in groups G03 and/or G05 (G G03 ≥ 2 and/or G G05 ≥ 2) from the matrix of the rlog-transformed data for the samples belonging to groups G02, G03, or G05. We then aggregated the individual donors by calculating an average value for each gene. The matrix was then z-score transformed. We estimated the number of Gaussian mixture components using the Bayesian Information Criterion calculated for a range of 2–50 components and for different parameterizations of the covariance matrix. We selected the model and the number of components maximizing the Bayesian Information Criterion. Then, mixture components were hierarchically combined as explained in Baudry et al. , resulting in the final selection of 42 clusters. The clusters were tested for functional enrichment with a hypergeometric test and categories from Gene Ontology (www.geneontology.org/), KEGG (http://www.genome.jp/kegg/), and Reactome databases (http://www.reactome.org/), using the topGO  with a ‘weight01’ algorithm, Category and ReactomePA libraries, respectively.
The association between the clusters was quantified by averaging the Spearman correlation coefficient calculated between all the pairs of genes from each cluster, and a p value was assigned to each association using a permutation approach. Briefly, for each combination of two clusters A and B with n a and n b genes and m ordered samples, we obtained the vector ρ of Spearman correlation coefficients between all s = n a × n b gene pairs; then, the correlation (β) between the two sets A and B was calculated as:
To obtain a permutation-based p value, we repeated this procedure (n perm = 10,000 times) by randomly labelling in each permutation the samples of cluster B. As a result, we obtained a vector π with n perm elements. We then calculated a p value as the fraction of the absolute values of π greater than the absolute value of the original β correlation value. We considered a significant association between every pair of clusters if p < 0.01.
Transcriptional network reconstruction
To model the dynamics of the system, we reconstructed two gene networks using the log2(FPKM) values from ‘Early’ and ‘Late’ samples as indicated in Table 1 and only for the HEGs.
We used ARACNe  to infer edges between the hubs and the expressed genes. Hubs were defined as the TFs that resulted differentially expressed at the gene (DEG) and protein (DEP) levels. In detail, we firstly selected the genes with G time ≥ 2, G G03 ≥ 2, and/or G G05 ≥ 2, then from this list we selected the genes with P time = 1, P G03 = 1, and/or P G05 = 1. RNAs and proteins were matched by the corresponding associated Ensembl gene ID. Finally, we considered all TF genes identified with either or both of two alternative annotations: (1) the human genes with a symbol annotated with the term ‘GO:0003700’ in the Gene Ontology Consortium database (www.geneontology.org) or (2) the Ensembl gene ID retrieved by querying the BioMart service (http://grch37.ensembl.org/) with the Gene Ontology ID ‘GO:0003700’. This list of 307 hubs was provided as input in the ARACNe run in order to calculate a MI value between a hub and all the other genes. Overall, 200 bootstrap networks were generated with a p value threshold of 10− 10 for MI estimation and a data processing inequality tolerance of 0.1 only for triplets involving at least one TF. A consensus network with a p value threshold of 10− 8 was finally obtained from all bootstrap networks previously generated. Early and Late consensus networks were imported into Cytoscape 3.4 (http://www.cytoscape.org/cy3.html) and they were compared with the DyNet algorithm  in order to identify the most rewired nodes, by ranking the nodes with a decreasing Dn score. To select a robust target core subnetwork of iTreg signature genes in the vicinity of FOXP3 (‘iTreg subnetwork’), we selected FOXP3 itself and all the nodes differentially expressed at the mRNA level in all iTreg conditions (G G03 ≥ 2 AND G G04 ≥ 2 AND G G05 ≥ 2 AND G G06 ≥ 2).
DNase footprint networks
DNase-Seq data were obtained from ENCODE (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDgf, sample ‘wgEncodeUwDgfTregwb78495824’) with DNase hypersensitivity (DHS) peaks. Protein-binding footprints (FP) within the DHS peaks were identified by scanning the DHS intervals for gaps in the signals using Wellington , with the parameters: -fp 6,40,2 -sh 4,30,2 -fdrlimit -10. The sequences of the resulting FP intervals were then scanned for protein binding motifs using the MatchTM program  with TRANSFAC positional weight matrices database , using parameters that minimize the sum of both positive and negative error rates. To improve prediction quality, several quality scores were calculated and examined, including FP p value, match score of the sequence to the motif, conservation scores (PhastCons and PhyloP in both glire and placental mammals), and footprint occupancy scores (FOS) , with a few modifications. First, the length of the flanking shoulders was optimized to yield the minimal FOS value; second, FOS was calculated for the predicted TF binding site within the footprint (termed ‘TFBS FOS’), as well as for the whole footprint (‘large FOS’). Networks were reconstructed by drawing an edge from the TF containing the binding motif to its target genes. A target gene is defined as such if binding is predicted to occur within its genomic coordinates, or in its promoter (1300 bp upstream).
Enrichment for GWAS
The GWAS catalog was downloaded (www.ebi.ac.uk/gwas. Accessed 2016–08-14, version 1.0.1) and disease traits were grouped according to their parent term in the Experimental Factor Ontology, resulting in 17 categories. Two additional categories were added, Ai6 and Ai21, which include the GWAS genes for six common autoimmune diseases (MS, rheumatoid arthritis, T1D, CD, systemic lupus erythematous, and psoriasis) or the 21 autoimmune disorders as in Farh et al. , respectively. Each SNP was associated with the overlapping gene or the upstream and downstream genes as reported by the GWAS catalog after the Ensembl mapping pipeline. We tested the iTreg subnetwork for enrichment of the disease categories, with the full network as background, and ORs were calculated. For each category, a null distribution of ORs was obtained by resampling 10,000 sets of nodes with the same size as the iTreg subnetwork from the full network. The resampled OR null distribution was used to calculate a FWER using the step-down minimum p value procedure . Corresponding one-tailed p values were obtained from the fraction of resampled runs with an OR greater than the original OR. This resampling-based p value (R) is similar to that obtained with a hypergeometric distribution (Additional file 6: Table S5). To test the enrichment for specific diseases, we considered the collection of diseases and their associated genes from Menche et al. . ORs and p values were calculated with a hypergeometric test.
Open Targets associations
To derive association scores from the Open Targets resource (www.opentargets.org) the version from September 2016 was used. A minimum score of 0.25 was used to filter out weak associations. The enrichment was tested as above with a hypergeometric test. The categories with a nominal p value below 0.05 were grouped by their therapeutic areas to draw a treemap using the treemap library. Diseases were included into all the therapeutic areas they belonged to.
Detection of PPI modules
We considered data from three different public PPI data repositories (Human Interactome database , STRING v10.0 , and iRefIndex ), and integrated them into a large human PPI network with 26,655 nodes and 334,460 edges, which included both experimentally validated as well as computationally predicted interactions. We applied a score filter of 0.7 on the STRING interactions. To obtain an iTreg-specific PPI network (Treg-PPI), we filtered the integrated PPI network in a way that at least one end of the interacting nodes or their first neighbor is a DEP in iTregs (p < 0.05), and the interaction is being reported in at least two databases. We then applied a clustering method based on a simulation of stochastic flows on the graph, MCL , to identify topological modules in the Treg-PPI. We re-executed MCL with 10 different inflation parameters for detecting the modules with different sizes. We then mapped the categorized GWAS catalog genes described above and the core gene list (iTreg subnetwork) onto the Treg-PPI. Finally, we tested the identified modules for enrichment of core and disease genes using Fisher’s exact test (categories with nominal p < 0.05 were considered, and 119 hypotheses were tested). We did not consider the modules whose enrichment was exclusively due to core genes. The above data analysis pipeline was done in MATLAB and Cytoscape.
Gene Set Enrichment Analysis (GSEA)
GSEA was performed using the package fgsea  and the iTreg subnetwork as gene set. We obtained gene lists from the Expression Atlas , contrasting the expression levels in controls and disease samples from datasets including MS and IBD patients (selected experiments: E-GEOD-12251, E-GEOD-14580, E-GEOD-16879, E-GEOD-23205, E-GEOD-4183, E-GEOD-60424, E-GEOD-6731, E-MEXP-2083, E-MTAB-2973, E-MTAB-69). We also included data from two recent publications [112, 113] not available in the Expression Atlas, which were processed to extract the DEGs between patients and controls. The genes were ranked by -log10P*sign(logFC) and the significance was assessed with 10,000 permutations.
iTreg classification by LDA and RF
LDA  was performed to better understand the distribution of the data regarding different sets of features. LDA easily handles the case where the within-class frequencies are unequal. This method maximizes the ratio of between-class variance to the within-class variance in any particular dataset, thereby guaranteeing maximal separability. LDA was run on gene expression (rlog-transformed RNA-Seq) data.
For testing how well Mock-stimulated cells (G02) can be separated from iTregs generated by any protocol (G03-G06) by selecting two genes, LDA was run for all possible gene pairs selected from three different gene lists (‘37 candidate’ genes, ‘37 known’ Treg regulators, or ‘349 iTreg’, the latter being the 349 nodes in the iTreg subnetwork); we forced that both Mock-stimulated cells and iTregs should be presented in the cross-validation population. For confirmation, the same classifiers (all possible gene pairs from the three gene lists) were applied to rlog-transformed RNA data from the corresponding five time points (2, 6, 24, and 48 h, and 5 days) of the independent RNA-Seq dataset to classify iTregs (G04 + serum) versus Mock-stimulated cells (G02). We also defined ‘top classifiers’ as combinations of two candidate genes derived from the ‘37 candidate’ gene list, which separated groups 100% (in both runs per pair) in the Main dataset, and tested all these top classifier pairs in the independent dataset to determine the minimal accuracy per classifier pair for iTreg classification in the independent dataset. All calculations were performed using the classification tool box of MATLAB, with the auto regularization and five-fold cross-validation option. LDA results were further analyzed in GraphPad Prism (version 7) and Cytoscape.
As an unbiased way of feature (gene) selection to separate Mock (G02) and iTreg (G03–G06) conditions, we applied RFs to measure the importance of genes. RF is a nonparametric method that builds an ensemble model of decision trees from random subsets of features and bagged samples of the training data . We ranked all 15,910 HEGs using the property PermutedVarDeltaError of TreeBagger class in MATLAB. All 37 candidate molecules, and all 37 known Treg regulators, were represented amongst HEGs. PermutedVarDeltaError returns the difference in the model error when permuting the values of a specific variable.
Additional independent RNA-Seq dataset
Naïve CD4+ T cells were isolated from PBMCs by negative selection using the naïve CD4+ T cell isolation kit II (Miltenyi Biotech; > 90% cell purity) from three healthy donors (male, age 32 years; female, age 28 years; male, age 28 years). The study was performed according to the Declaration of Helsinki and was approved by the ethics committee of the Ärztekammer Westfalen-Lippe and Medizinische Fakultät der Westfälischen Wilhelms-Universität Münster (registration number 2010262fS). The participants provided informed consent. Cells were stimulated with 0.5 μg/mL of plate-bound anti-human CD3 (clone UCHT1; Beckman Coulter) and 0.5 μg/mL of soluble anti-human CD28 antibody (clone CD28.2; eBioscience). Cells were stimulated in 24-well plates at 2 × 106 cells/mL in 1 mL of X-Vivo 15 medium (Lonza). For Mock control cells, no further reagents were added, while for iTregs, 10 ng/mL of IL-2 (Peprotech), 10 ng/mL of TGF-β1 (R&D Systems), 10 nM ATRA (Sigma-Aldrich), and 10% (v/v) human serum (Human AB Serum, PAA Laboratories) were added. Samples were taken after 0.5, 1, 2, 6, 12, 24, 48, 72, 96, and 120 h, and lysates were stored at −80 °C until RNA extraction. At each time point, cells were stained by intracellular flow cytometry for FOXP3 expression.
To assess the functional suppressive capacity of iTregs, suppression assays were performed as previously described for nTregs . Briefly, PBMCs were isolated from fresh EDTA blood from a healthy donor and stained with cell proliferation dye efluor670 (eBioscience) according to the manufacturer’s instructions. These cells were re-suspended in X-Vivo15 (Lonza) and used as effector cells. iTregs or Mock control cells harvested on day 3, were washed and re-suspended in X-Vivo15. Effector cells were co-cultured in the absence or presence of different ratios of iTregs or Mock control cells (PBMC:iTreg/Mock cells 1:1 to 1:0.03). Proliferation of effector cells was stimulated with 0.5 μg/mL of anti-CD3 mAB (OKT3, Biolegend). On day 4, the proliferation of effector cells was determined with flow cytometry.
RNA was extracted with AllPrep RNA/DNA minikit (Qiagen) (RNA integrity number mean ± SD 9.1 ± 0.7996), mRNA was enriched via the poly-A tail using oligo-dT attached magnetic beads. Libraries were prepared according to Illumina TruSeq RNA Sample Preparation v2 Guide (part #15026495) and sequenced (paired-end with 2 × 100 bp read length) on an Illumina HiSeq 2500 instrument generating 6 to 37 Mio reads per sample. Paired-end reads were aligned with Tophat (version 2.0.8b)  and Bowtie2 (version 2.1.0)  to human reference genome (hg19) and Gencode transcriptome (v19). We estimated the mean insert size and standard deviation for each sample by mapping one million reads with Bowtie2. Read counts were determined using HTSeq-count (version 0.6.1)  with parameters ‘--stranded no --order name’.
Yadav M, Stephan S, Bluestone JA. Peripherally induced tregs - role in immune homeostasis and autoimmunity. Front Immunol. 2013;4:232.
Schmitt EG, Williams CB. Generation and function of induced regulatory T cells. Front Immunol. 2013;4:152.
Haribhai D, Williams JB, Jia S, Nickerson D, Schmitt EG, Edwards B, et al. A requisite role for induced regulatory T cells in tolerance based on expanding antigen receptor diversity. Immunity. 2011;35:109–22.
Ramsdell F, Ziegler SF. FOXP3 and scurfy: how it all began. Nat Rev Immunol. 2014;14:343–9. https://doi.org/10.1038/nri3650.
Delacher M, Schreiber L, Richards DM, Farah C, Feuerer M, Huehn J. Transcriptional control of regulatory T cells. Curr Top Microbiol Immunol. 2014;381:83–124.
Pillai V, Ortega SB, Wang CK, Karandikar NJ. Transient regulatory T-cells: a state attained by all activated human T-cells. Clin Immunol. 2007;123:18–29.
Wang J, Ioan-Facsinay A, van der Voort E, Huizinga TW, Toes RE. Transient expression of FOXP3 in human activated nonregulatory CD4+ T cells. Eur J Immunol. 2007;37:129–38.
Ziegler SF. FOXP3: of mice and men. Annu Rev Immunol. 2006;24:209–26.
Allan SE, Passerini L, Bacchetta R, Crellin N, Dai M, Orban PC, et al. The role of 2 FOXP3 isoforms in the generation of human CD4+ Tregs. J Clin Invest. 2005;115:3276–84.
Mailer RKW, Falk K, Rötzschke O. Absence of leucine zipper in the natural FOXP3Delta2Delta7 isoform does not affect dimerization but abrogates suppressive capacity. PLoS One. 2009;4:e6104. https://doi.org/10.1371/journal.pone.0006104.
Joly A-L, Liu S, Dahlberg CIM, Mailer RKW, Westerberg LS, Andersson J. Foxp3 lacking exons 2 and 7 is unable to confer suppressive ability to regulatory T cells in vivo. J Autoimmun. 2015;63:23–30. https://doi.org/10.1016/j.jaut.2015.06.009.
Spence A, Klementowicz JE, Bluestone JA, Tang Q. Targeting Treg signaling for the treatment of autoimmune diseases. Curr Opin Immunol. 2015;37:11–20.
Gregori S, Passerini L, Roncarolo MG. Clinical outlook for type-1 and FOXP3+ T regulatory cell-based therapy. Front Immunol. 2015;6:593.
Pere H, Tanchot C, Bayry J, Terme M, Taieb J, Badoual C, et al. Comprehensive analysis of current approaches to inhibit regulatory T cells in cancer. Oncoimmunology. 2012;1:326–33.
Liu C, Workman CJ, Vignali DAA. Targeting regulatory T cells in tumors. FEBS J. 2016;283(14):2731–48.
Travis MA, Sheppard D. TGF-β activation and function in immunity. Annu Rev Immunol. 2014;32:51–82.
Liu Z-M, Wang K-P, Ma J, Guo ZS. The role of all-trans retinoic acid in the biology of Foxp3+ regulatory T cells. Cell Mol Immunol. 2015;12:553–7. https://doi.org/10.1038/cmi.2014.133.
Furusawa Y, Obata Y, Fukuda S, Endo TA, Nakato G, Takahashi D, et al. Commensal microbe-derived butyrate induces the differentiation of colonic regulatory T cells. Nature. 2013;504:446–50.
Arpaia N, Campbell C, Fan X, Dikiy S, van der Veeken J, DeRoos P, et al. Metabolites produced by commensal bacteria promote peripheral regulatory T-cell generation. Nature. 2013;504:451–5.
Josefowicz SZ, Niec RE, Kim HY, Treuting P, Chinen T, Zheng Y, et al. Extrathymically generated regulatory T cells control mucosal TH2 inflammation. Nature. 2012;482:395–9.
Samstein RM, Josefowicz SZ, Arvey A, Treuting PM, Rudensky AY. Extrathymic generation of regulatory T cells in placental mammals mitigates maternal-fetal conflict. Cell. 2012;150:29–38.
Wing K, Sakaguchi S. Regulatory T cells exert checks and balances on self tolerance and autoimmunity. Nat Immunol. 2010;11:7–13.
Breuer J, Schwab N, Schneider-Hohendorf T, Marziniak M, Mohan H, Bhatia U, et al. Ultraviolet B light attenuates the systemic immune response in central nervous system autoimmunity. Ann Neurol. 2014;75:739–58. https://doi.org/10.1002/ana.24165.
Huter EN, Punkosdy GA, Glass DD, Cheng LI, Ward JM, Shevach EM. TGF-beta-induced Foxp3+ regulatory T cells rescue scurfy mice. Eur J Immunol. 2008;38:1814–21.
Huehn J, Polansky JK, Hamann A. Epigenetic control of FOXP3 expression: the key to a stable regulatory T-cell lineage? Nat Rev Immunol. 2009;9:83–9.
Morikawa H, Ohkura N, Vandenbon A, Itoh M, Nagao-Sato S, Kawaji H, et al. Differential roles of epigenetic changes and Foxp3 expression in regulatory T cell-specific transcriptional regulation. Proc Natl Acad Sci USA. 2014;111:5289–94.
Shevach EM, Thornton AM. tTregs, pTregs, and iTregs: similarities and differences. Immunol Rev. 2014;259:88–102.
Schmidt A, Eriksson M, Shang M-M, Weyd H, Tegnér J. Comparative analysis of protocols to induce human CD4+Foxp3+ regulatory T cells by combinations of IL-2, TGF-beta, retinoic acid, rapamycin and butyrate. PLoS One. 2016;11:e0148474. https://doi.org/10.1371/journal.pone.0148474.
Schmidt A, Éliás S, Joshi RN, Tegnér J. In vitro differentiation of human CD4+FOXP3+ induced regulatory T cells (iTregs) from naïve CD4+ T cells using a TGF-β-containing protocol. J Vis Exp. 2016:118:e55015.
Shan J, Feng L, Li Y, Sun G, Chen X, Chen P. The effects of rapamycin on regulatory T cells: Its potential time-dependent role in inducing transplant tolerance. Immunol Lett. 2015;162:74–86.
Galgani M, De Rosa V, La Cava A, Matarese G. Role of metabolism in the immunobiology of regulatory T cells. J Immunol. 2016;197:2567–75. https://doi.org/10.4049/jimmunol.1600242.
Dhamne C, Chung Y, Alousi AM, Cooper LJ, Tran DQ. Peripheral and thymic foxp3(+) regulatory T cells in search of origin, distinction, and function. Front Immunol. 2013;4:253.
Pan F, Yu H, Dang EV, Barbi J, Pan X, Grosso JF, et al. Eos mediates Foxp3-dependent gene silencing in CD4+ regulatory T cells. Science. 2009;325:1142–6.
Branca RMM, Orre LM, Johansson HJ, Granholm V, Huss M, Pérez-Bercoff Å, et al. HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics. Nat Methods. 2013;11:59–62. https://doi.org/10.1038/nmeth.2732.
Wan YY, Flavell RA. Regulatory T-cell functions are subverted and converted owing to attenuated Foxp3 expression. Nature. 2007;445:766–70. https://doi.org/10.1038/nature05479.
Hebenstreit D, Fang M, Gu M, Charoensawan V, van Oudenaarden A, Teichmann SA. RNA sequencing reveals two major classes of gene expression levels in metazoan cells. Mol Syst Biol. 2011;7:497. https://doi.org/10.1038/msb.2011.28.
Wiśniewski JR, Hein MY, Cox J, Mann M. A ‘proteomic ruler’ for protein copy number and concentration estimation without spike-in standards. Mol Cell Proteomics. 2014;13:3497–506. https://doi.org/10.1074/mcp.M113.037309.
Hill JA, Hall JA, Sun CM, Cai Q, Ghyselinck N, Chambon P, et al. Retinoic acid enhances Foxp3 induction indirectly by relieving inhibition from CD4+CD44hi Cells. Immunity. 2008;29:758–70.
Feuerer M, Hill JA, Kretschmer K, von Boehmer H, Mathis D, Benoist C. Genomic definition of multiple ex vivo regulatory T cell subphenotypes. Proc Natl Acad Sci U S A. 2010;107:5919–24.
Richards DM, Delacher M, Goldfarb Y, Kägebein D, Hofer A-C, Abramson J, et al. Treg cell differentiation: from thymus to peripheral tissue. Prog Mol Biol Transl Sci. 2015;136:175–205. https://doi.org/10.1016/bs.pmbts.2015.07.014.
Tanoue T, Atarashi K, Honda K. Development and maintenance of intestinal regulatory T cells. Nat Rev Immunol. 2016;16:295–309. https://doi.org/10.1038/nri.2016.36.
Benoist C, Mathis D. Treg cells, life history, and diversity. Cold Spring Harb Perspect Biol. 2012;4:a007021. https://doi.org/10.1101/cshperspect.a007021.
Liu Y, Beyer A, Aebersold R. On the dependency of cellular protein levels on mRNA abundance. Cell. 2016;165:535–50. https://doi.org/10.1016/j.cell.2016.03.014.
Vogel C, Marcotte EM. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet. 2012;13:227–32. https://doi.org/10.1038/nrg3185.
Tuomela S, Lahesmaa R. Early T helper cell programming of gene expression in human. Semin Immunol. 2013;25:282–90. https://doi.org/10.1016/j.smim.2013.10.013.
Mingueneau M, Kreslavsky T, Gray D, Heng T, Cruse R, Ericson J, et al. The transcriptional landscape of αβ T cell differentiation. Nat Immunol. 2013;14:619–32. https://doi.org/10.1038/ni.2590.
Pearce EL, Pearce EJ. Metabolic pathways in immune cell activation and quiescence. Immunity. 2013;38:633–43. https://doi.org/10.1016/j.immuni.2013.04.005.
Goenawan IH, Bryan K, Lynn DJ. DyNet: visualization and analysis of dynamic molecular interaction networks. Bioinformatics. 2016;32(17):2713–5. https://doi.org/10.1093/bioinformatics/btw187.
Bin Dhuban K, d’Hennezel E, Nagai Y, Xiao Y, Shao S, Istomine R, et al. Suppression by human FOXP3(+) regulatory T cells requires FOXP3-TIP60 interactions. Sci Immunol 2017;2(12). https://doi.org/10.1126/sciimmunol.aai9297.
Buckner JH. Mechanisms of impaired regulation by CD4(+)CD25(+)FOXP3(+) regulatory T cells in human autoimmune diseases. Nat Rev Immunol. 2010;10:849–59.
Farh KK-H, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2014;518:337–43. https://doi.org/10.1038/nature13835.
Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, et al. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347:1257601. https://doi.org/10.1126/science.1257601.
Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A. 2002;99:6562–6. https://doi.org/10.1073/pnas.102102699.
Fukunaga K. Introduction to Statistical Pattern Recognition. San Diego, CA: Academic Press Inc.; 1990. https://doi.org/10.1016/0098-3004(96)00017-9.
Gustafsson M, Gawel DR, Alfredsson L, Baranzini S, Björkander J, Blomgran R, et al. A validated gene regulatory network and GWAS identifies early regulators of T cell–associated diseases. Sci Transl Med. 2015;7:313ra178. https://doi.org/10.1126/scitranslmed.aad2722.
Ullah U, Andrabi SBA, Tripathi SK, Dirasantha O, Kanduri K, Rautio S, et al. Transcriptional repressor HIC1 contributes to suppressive function of human induced regulatory T cells. Cell Rep. 2018;22:2094–106. https://doi.org/10.1016/j.celrep.2018.01.070.
Ohkura N, Hamaguchi M, Morikawa H, Sugimura K, Tanaka A, Ito Y, et al. T cell receptor stimulation-induced epigenetic changes and Foxp3 expression are independent and complementary events required for Treg cell development. Immunity. 2012;37:785–99.
Westendorf K, Okhrimenko A, Grün JR, Schliemann H, Chang H-D, Dong J, et al. Unbiased transcriptomes of resting human CD4+ CD45RO+ T lymphocytes. Eur J Immunol. 2014;44:1866–9. https://doi.org/10.1002/eji.201344323.
Beliakova-Bethell N, Massanella M, White C, Lada S, Du P, Vaida F, et al. The effect of cell subset isolation method on gene expression in leukocytes. Cytometry A. 2014;85:94–104. https://doi.org/10.1002/cyto.a.22352.
Lund RJ, Loytomaki M, Naumanen T, Dixon C, Chen Z, Ahlfors H, et al. Genome-wide identification of novel genes involved in early Th1 and Th2 cell differentiation. J Immunol. 2007;178:3648–60.
Tuomela S, Salo V, Tripathi SK, Chen Z, Laurila K, Gupta B, et al. Identification of early gene expression changes during human Th17 cell differentiation. Blood. 2012;119:e151–60.
Lee Y, Awasthi A, Yosef N, Quintana FJ, Xiao S, Peters A, et al. Induction and molecular signature of pathogenic TH17 cells. Nat Immunol. 2012;13:991–9. https://doi.org/10.1038/ni.2416.
Ciofani M, Madar A, Galan C, Sellars M, Mace K, Pauli F, et al. A validated regulatory network for Th17 cell specification. Cell. 2012;151:289–303. https://doi.org/10.1016/j.cell.2012.09.016.
Chi H. Regulation and function of mTOR signalling in T cell fate decisions. Nat Rev Immunol. 2012;12:325–38.
Battaglia M, Stabilini A, Tresoldi E. Expanding human T regulatory cells with the mTOR-inhibitor rapamycin. Methods Mol Biol. 2012;821:279–93. https://doi.org/10.1007/978-1-61779-430-8_17.
Candia E, Reyes P, Covian C, Rodriguez F, Wainstein N, Morales J, et al. Single and combined effect of retinoic acid and rapamycin modulate the generation, activity and homing potential of induced human regulatory T cells. PLoS One. 2017;12:e0182009.
Hippen KL, Merkel SC, Schirm DK, Nelson C, Tennis NC, Riley JL, et al. Generation and large-scale expansion of human inducible regulatory T cells that suppress graft-versus-host disease. Am J Transplant. 2011;11:1148–57.
Lu L, Zhou X, Wang J, Zheng SG, Horwitz DA. Characterization of protective human CD4CD25 FOXP3 regulatory T cells generated with IL-2, TGF-beta and retinoic acid. PLoS One. 2010;5:e15150.
Qian X, Wang K, Wang X, Zheng SG, Lu L. Generation of human regulatory T cells de novo with suppressive function prevent xenogeneic graft versus host disease. Int Immunopharmacol. 2011;11:630–7.
Schmidt A, Oberle N, Krammer PH. Molecular mechanisms of treg-mediated T cell suppression. Front Immunol. 2012;3:51.
Jhunjhunwala S, Chen LC, Nichols EE, Thomson AW, Raimondi G, Little SR. All-trans retinoic acid and rapamycin synergize with transforming growth factor-beta1 to induce regulatory T cells but confer different migratory capacities. J Leukoc Biol. 2013;94:981–9.
Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, et al. Global quantification of mammalian gene expression control. Nature. 2011;473:337–42. https://doi.org/10.1038/nature10098.
Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature. 2014;513:382–7. https://doi.org/10.1038/nature13438.
Edfors F, Danielsson F, Hallström BM, Käll L, Lundberg E, Pontén F, et al. Gene-specific correlation of RNA and protein levels in human cells and tissues. Mol Syst Biol. 2016;12:883. https://doi.org/10.15252/msb.20167144.
Meyuhas O, Kahan T. The race to decipher the top secrets of TOP mRNAs. Biochim Biophys Acta. 1849;2015:801–11. https://doi.org/10.1016/j.bbagrm.2014.08.015.
Procaccini C, Carbone F, Di Silvestre D, Brambilla F, De Rosa V, Galgani M, et al. The proteomic landscape of human ex vivo regulatory and conventional T cells reveals specific metabolic requirements. Immunity. 2016;44:406–21. https://doi.org/10.1016/j.immuni.2016.01.028.
van der Veeken J, Arvey A, Rudensky A. Transcriptional control of regulatory T-cell differentiation. Cold Spring Harb Symp Quant Biol. 2013;78:215–22. https://doi.org/10.1101/sqb.2013.78.020289.
Brewer JA, Vogt SK, Sleckman BP, Swat W, Kanagawa O, Muglia LJ. Knock-ins and conditional knockouts: in vivo analysis of glucocorticoid receptor regulation and function. Endocr Res. 2002;28:545–50. https://doi.org/10.1081/ERC-120016839.
Li Q, Lee CH, Peters LA, Mastropaolo LA, Thoeni C, Elkadri A, et al. Variants in TRIM22 that affect NOD2 signaling are associated with very-early-onset inflammatory bowel disease. Gastroenterology. 2016;150:1196–207.
Lathrop SK, Bloom SM, Rao SM, Nutsch K, Lio C-W, Santacruz N, et al. Peripheral education of the immune system by colonic commensal microbiota. Nature. 2011;478:250–4. https://doi.org/10.1038/nature10434.
Bhairavabhotla R, Kim YC, Glass DD, Escobar TM, Patel MC, Zahr R, et al. Transcriptome profiling of human FoxP3+ regulatory T cells. Hum Immunol. 2016;77:201–13. https://doi.org/10.1016/j.humimm.2015.12.004.
Birzele F, Fauti T, Stahl H, Lenter MC, Simon E, Knebel D, et al. Next-generation insights into regulatory T cells: expression profiling and FoxP3 occupancy in Human. Nucleic Acids Res. 2011;39:7946–60.
Zheng Y, Josefowicz S, Chaudhry A, Peng XP, Forbush K, Rudensky AY. Role of conserved non-coding DNA elements in the Foxp3 gene in regulatory T-cell fate. Nature. 2010;463:808–12.
Feng Y, Arvey A, Chinen T, van der Veeken J, Gasteiger G, Rudensky AY. Control of the Inheritance of Regulatory T Cell Identity by a cis Element in the Foxp3 Locus. Cell. 2014;158:749–63. https://doi.org/10.1016/j.cell.2014.07.031.
Li X, Liang Y, LeBlanc M, Benner C, Zheng Y. Function of a Foxp3 cis-element in protecting regulatory T cell identity. Cell. 2014;158:734–48. https://doi.org/10.1016/j.cell.2014.07.030.
Schmidt A, Oberle N, Weiss EM, Vobis D, Frischbutter S, Baumgrass R, et al. Human regulatory T cells rapidly suppress T cell receptor-induced Ca(2+), NF-kappaB, and NFAT signaling in conventional T cells. Sci Signal. 2011;4:ra90.
Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9:e1003118. https://doi.org/10.1371/journal.pcbi.1003118.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. https://doi.org/10.1186/s13059-014-0550-8.
Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc. 2002;97:611–31. https://doi.org/10.1198/016214502760047131.
Nueda MJ, Tarazona S, Conesa A. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics. 2014;30:2598–602.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
Wehrens R, Buydens LMC. Self- and super-organizing maps in R: The kohonen package. J Stat Softw. 2007;21:1–19.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–50. https://doi.org/10.1073/pnas.0506580102.
Ferraro A, D’Alise AM, Raj T, Asinovski N, Phillips R, Ergun A, et al. Interindividual variation in human T regulatory cells. Proc Natl Acad Sci U S A. 2014;111:E1111–20. https://doi.org/10.1073/pnas.1401343111.
Hill JA, Feuerer M, Tash K, Haxhinasto S, Perez J, Melamed R, et al. Foxp3 transcription-factor-dependent and -independent regulation of the regulatory T cell transcriptional signature. Immunity. 2007;27:786–800.
Gaublomme JT, Yosef N, Lee Y, Gertner RS, Yang LV, Wu C, et al. Single-cell genomics unveils critical regulators of Th17 cell pathogenicity. Cell. 2015;163:1400–12.
Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419.
Baudry J-P, Raftery AE, Celeux G, Lo K, Gottardo R. Combining mixture components for clustering. J Comput Graph Stat. 2010;19:332–53. https://doi.org/10.1198/jcgs.2010.08111.
Alexa A, Rahnenfuhrer J, Lengauer T. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006;22:1600–7. https://doi.org/10.1093/bioinformatics/btl140.
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera R, et al. aracne: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(Suppl 1):S7. https://doi.org/10.1186/1471-2105-7-S1-S7.
Piper J, Elze MC, Cauchy P, Cockerill PN, Bonifer C, Ott S. Wellington: a novel method for the accurate identification of digital genomic footprints from DNase-seq data. Nucleic Acids Res. 2013;41:e201. https://doi.org/10.1093/nar/gkt850.
Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, et al. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 2001;29:281–3.
Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res. 2006;34(Database issue):D108–10. https://doi.org/10.1093/nar/gkj143.
Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489:83–90. https://doi.org/10.1038/nature11212.
van der Laan MJ, Dudoit S, Pollard KS. Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate. UC Berkeley Div Biostat Work Pap Ser. 2003. Working Paper 139. https://biostats.bepress.com/ucbbiostat/paper139.
Rolland T, Taşan M, Charloteaux B, Pevzner SJ, Zhong Q, Sahni N, et al. A proteome-scale map of the human interactome network. Cell. 2014;159:1212–26. https://doi.org/10.1016/j.cell.2014.10.050.
Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39(Database):D561–8. https://doi.org/10.1093/nar/gkq973.
Razick S, Magklaras G, Donaldson IM. iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics. 2008;9:405. https://doi.org/10.1186/1471-2105-9-405.
Van Dongen S. Graph clustering via a discrete uncoupling process. SIAM J Matrix Anal Appl. 2008;30:121–41. https://doi.org/10.1137/040608635.
Sergushichev A. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. bioRxiv. 2016. https://doi.org/10.1101/060012.
Petryszak R, Keays M, Tang YA, Fonseca NA, Barrera E, Burdett T, et al. Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res. 2016;44:D746–52. https://doi.org/10.1093/nar/gkv1045.
Sarmento OF, Svingen PA, Xiong Y, Sun Z, Bamidele AO, Mathison AJ, et al. The role of the histone methyltransferase enhancer of Zeste homolog 2 (EZH2) in the pathobiological mechanisms underlying inflammatory bowel disease (IBD). J Biol Chem. 2017;292(2):706–22.
Ventham NT, Kennedy NA, Adams AT, Kalla R, Heath S, O’Leary KR, et al. Integrative epigenome-wide analysis demonstrates that DNA methylation may mediate genetic risk in inflammatory bowel disease. Nat Commun. 2016;7:13507. https://doi.org/10.1038/ncomms13507.
Breiman L. Random Forests. Mach Learn. 2001;45:5–32. https://doi.org/10.1023/A:1010933404324.
Trinschek B, Luessi F, Gross CC, Wiendl H, Jonuleit H. Interferon-beta therapy of multiple sclerosis patients improves the responsiveness of T cells for immune suppression by regulatory T cells. Int J Mol Sci. 2015;16:16330–46. https://doi.org/10.3390/ijms160716330.
Trapnell C, Pachter L, Salzberg SL. TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–11.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9. https://doi.org/10.1093/bioinformatics/btu638.
Okuda S, Watanabe Y, Moriya Y, Kawano S, Yamamoto T, Matsumoto M, et al. jPOSTrepo: an international standard data repository for proteomes. Nucleic Acids Res. 2017;45:D1107–11. https://doi.org/10.1093/nar/gkw1080.
The authors would like to acknowledge support from Clinical Proteomics Mass Spectrometry at Karolinska University Hospital and the Science for Life Laboratory for providing assistance in mass spectrometry and data analysis, and we are grateful for support from Hillevi Andersson Sand from the facility. Anjana Rao (Division of Signaling and Gene Expression, La Jolla Institute for Allergy and Immunology (LJI)) is gratefully acknowledged for help in design and interpretation of the shRNA validation screen. From the same division at LJI, we thank Sara Trifari for providing protocols, packaging plasmids, and invaluable advice to establish lentivirus production and T cell transduction, and Chan-Wang (Jerry) Lio and Victor Wong for suggestions on virus production. Elise Pham from the Functional Genomics Center at LJI is acknowledged for providing glycerol stocks and advice for shRNA screening. Peri Noori (Computational Medicine Unit, KI) is acknowledged for laboratory management and advice on RNA isolations. We thank Gordon Ball (Computational Medicine Unit, KI) for server maintenance. We thank Daniela Roosterman (Department of Neurology, University Hospital Münster) for excellent technical assistance. We thank Sarita Heinonen, Marjo Hakkarainen, and Päivi Junni as well as personnel in Finnish Functional Genomics Centre and Bioinformatics core at Turku Centre for Biotechnology, University of Turku and Åbo Akademi University for their excellent technical assistance and Dr. Omid Rasool for fruitful discussions. We thank Biocenter Finland for the infrastructure support. The computational analysis of additional RNA-Seq data was performed using resources provided by the Aalto Science-IT project.
AS was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme (FP7, Project ID: 326930, project acronym ‘ITREGDIFFERENTIATION’), the Dr. Åke Olsson Foundation, Karolinska Institutet Stiftelser & Fonder, the Erik and Edith Fernström Foundation, the German Society for Immunology, and EUROIMMUN AG. AS and JT were supported by a CERIC (Center of Excellence for Research on Inflammation and Cardiovascular disease) grant. AS, JT, and DGC received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7, ERC Project ID: 617393 ‘CAUSALPATH’). JT was supported by Vetenskapsrådet Medicine and Health (Dnr 2011–3264), Torsten Söderberg Foundation, FP7 STATegra, AFA Insurance, and Stockholm County Council. FM was supported by the Karolinska Institutet Stiftelser & Fonder. NAK was supported by Vinnova VINNMER fellowship, Stratneuro and Karolinska Institutet Stiftelser & Fonder. SE was supported by Karolinska Institute’s faculty funds for doctoral education (KID-funding). HW and CCG received funding from the German Research Foundation (DFG, Collaborative Research Centre CRC 128 ‘Initiating/Effector versus Regulatory Mechanisms in Multiple Sclerosis – Progress towards Tackling the Disease’ project A09 to HW and CCG and Z02 to HW) and from the Federal Ministry of Education and Research (BMBF) supporting the Disease-related Competence Network for Multiple Sclerosis (Krankheitsbezogenes Netzwerk Multiple Sklerose, KKNMS, FKZ 01FI1603a). UB was supported by the Cluster of Excellence ‘Cells in Motion’ (CiM) Bridging Fund. RL, HL, and HW were supported by the Academy of Finland (AoF; grant 256355) and the German Research Foundation (DFG) (Immunology Initiative ‘Systems biology approach to molecular mechanisms of human TGFb induced iTreg cell differentiation and the role of iTreg in Multiple sclerosis’). RL and HL were supported by the AoF Centre of Excellence in Molecular Systems Immunology and Physiology Research 2012–2017 (AoF grant 250114). RL was supported by AoF grant 294337, the Sigrid Juselius Foundation and the Paulo Foundation.
Availability of data and materials
The datasets generated and analyzed during the current study are available in repositories as follows:
Mass spectrometry proteomics data is deposited to jPOSTrepo  (a repository that is in the ProteomeXchange consortium) with the dataset identifier JPST000224 & PXD005703 (https://repository.jpostdb.org/entry/JPST000224).
RNA-Seq data accession codes: GSE94396 (Main dataset) and GSE96538 (independent dataset) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94396,
Ethics approval and consent to participate
The experimental methods comply with the Declaration of Helsinki and ethical approval was obtained as appropriate and is detailed in the respective Methods sections.
CCG received speaker honoraria and travel expenses for attending meetings from Bayer Health Care, Genzyme, and Novartis Pharma GmbH and her research is funded by the Interdisciplinary Center for Clinical Studies (IZKF). HW receives honoraria for acting as a member of Scientific Advisory Boards and as consultant for Biogen, Evgen, MedDay Pharmaceuticals, Merck Serono, Novartis, Roche Pharma AG, Sanofi-Genzyme, as well as speaker honoraria and travel support from Alexion, Biogen, Cognomed, F. Hoffmann-La Roche Ltd., Gemeinnützige Hertie-Stiftung, Merck Serono, Novartis, Roche Pharma AG, Sanofi-Genzyme, TEVA, and WebMD Global. HW is acting as a paid consultant for Abbvie, Actelion, Biogen, IGES, Novartis, Roche, Sanofi-Genzyme, and the Swiss Multiple Sclerosis Society. HW received research funding from the Else Kröner Fresenius Foundation, Fresenius Foundation, Hertie Foundation, NRW Ministry of Education and Research, IZKF Muenster and RE Children’s Foundation, Biogen GmbH, GlaxoSmithKline GmbH, Roche Pharma AG, and Sanofi-Genzyme.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
All other authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figure S1. Flow cytometry and qRT-PCR quality control of cellular samples used for molecular profiling. Figure S2. Differential gene and protein expression analysis and overlap of RNA-Seq and proteomics data. Figure S3. Exploratory analysis of RNA-Seq and proteomics data. Figure S4. Estimated number of proteins per cell based on iTreg proteomics data. Figure S5. Gene Ontology and pathway enrichment analysis of the DEG and DEP clusters. Figure S6. iTreg subnetwork reconstruction strategy. Figure S7. Features of the iTreg candidate molecules and the confirmatory independent RNA-Seq dataset. Figure S8. Linear Discriminant and Random Forest analyses confirm the potential of candidate genes to classify iTregs. Figure S9. Experimental validation of novel FOXP3+ Treg regulatory molecules. (PDF 3722 kb)
Table S1. Signature scores. (XLS 30 kb)
Table S2. Differentially expressed genes and proteins. (XLSX 6515 kb)
Table S3. Clustering and functional annotation. (XLSX 214 kb)
Table S4. Network nodes and edges. (XLSX 109 kb)
Table S5. Functional and disease annotation iTreg subnetwork. (XLSX 37 kb)
Table S6. Random Forest ranking for iTreg classification. (TXT 546 kb)
Table S7. shRNA clone list. (XLSX 15 kb)