Assessing the human immune system through blood transcriptomics
© Chaussabel et al. 2010
Received: 13 May 2010
Accepted: 15 June 2010
Published: 1 July 2010
Blood is the pipeline of the immune system. Assessing changes in transcript abundance in blood on a genome-wide scale affords a comprehensive view of the status of the immune system in health and disease. This review summarizes the work that has used this approach to identify therapeutic targets and biomarker signatures in the field of autoimmunity and infectious disease. Recent technological and methodological advances that will carry the blood transcriptome research field forward are also discussed.
The immune system plays a central role not only in health maintenance but also in pathogenesis: excess immunity is associated, for instance, with autoimmune diseases (for example, multiple sclerosis, type 1 diabetes, psoriasis, lupus, rheumatoid arthritis), inflammation (sepsis, inflammatory bowel disease) and allergy, as well as cell and organ rejection; deficient immunity, on the other hand, is linked to cancer and susceptibility to infection.
The human genome can be investigated from two different angles: determining its makeup or measuring its output. Sequence variation can be detected using, for instance, single nucleotide polymorphism (SNP) chips, which permit the identification of common polymorphisms or rare mutations associated with diseases. Hundreds of thousands of SNPs can be typed using these platforms, yielding a genome-wide, hypothesis-free scan of genetic associations for a given phenotype of interest. Many such genome-wide association studies (GWAS) have been published in recent years, a number of them investigating the genetic underpinnings of immune-related diseases . Notably, such studies have helped pinpoint genes and pathways that may be involved in the pathogenesis of autoimmune diseases . Associations between common genetic variants and resistance to infection have also been reported [3, 4]. However, the parameters measured by this approach are inherited and do not change over an individual's lifetime. This is in contrast to transcript abundance, the parameter measured by the second genome-wide profiling approach. Transcriptional activity depends largely on environmental factors and, as a result, RNA abundance changes dynamically over time. For instance, sets of transcripts may be induced in response to an infectious challenge and return to baseline levels following pathogen clearance. Dynamic changes in the cellular makeup of a tissue will also alter transcript abundance as measured on a genome-wide scale.
Transcriptional profiles have been obtained from many human tissues - including, for instance, the skin [5, 6], muscle , liver [8, 9], kidney [10, 11] and brain  - but the status of the immune system is best monitored by profiling transcript abundance in blood. Indeed, profiling transcript abundance in blood provides a 'snapshot' of the complex immune networks that operate throughout the entire body. However, while this has proven to be a valid approach for finding clues about pathogenesis and identifying potential biomarkers [13–16], a number of challenges and limitations remain. Data interpretation is one of them. First, the volume of data generated by such studies can be overwhelming, and information from a multitude of sources (study design, quality control data, sample information and, importantly, clinical information) must be integrated for the results to be interpretable. Second, changes in transcript abundance observed in a complex tissue such as blood can be caused not only by regulation of transcriptional activity but also by shifts in the relative abundance of cell populations that express transcripts at constant levels. Third, besides pathogenic processes, a number of factors may affect blood transcript abundance and confound the analysis. Medications and co-morbidities are two such factors; they often restrict patient selection and complicate data interpretation. This review discusses some recently developed strategies that address these limitations.
Other technologies should be considered for profiling focused sets of genes. Nanostring technology, for instance, can detect the abundance of up to 500 transcripts with high sensitivity . The approach is 'digital' in that it counts individual RNA molecules, using strings of fluorochromes as reporters to identify the different RNA species. Platforms developed by, among others, Luminex, High Throughput Genomics and Fluidigm round out the offering for 'sub-genome' transcript profiling.
The field of autoimmunity has proven fertile ground for blood transcriptional studies. Alterations in transcript abundance in the blood of patients reflect the sustained response against self-antigens and, more generally, uncontrolled inflammatory processes. Such diseases often follow relapsing-remitting patterns of activity, with flares that may be reflected by fluctuations in transcript abundance. Work initially focused on diseases with clear systemic involvement such as systemic lupus erythematosus (SLE) [20, 21]. Multiple cell types and soluble mediators, including IL10 [22, 23] and IFNγ [24–26], have been proposed to be at the center of lupus pathogenesis. While scattered evidence pointed to a potential role for type I interferon in lupus, several observations did not support this hypothesis: first, not every SLE patient has detectable serum type I IFN levels ; second, dysregulation of type I IFN production is not found in most murine SLE models ; and third, genetic linkage and association studies had not identified candidate lupus susceptibility genes within the IFN pathway . However, in one of our earliest microarray studies we demonstrated that all but one of the pediatric patients exhibited upregulation of IFN-inducible genes, and the only patient lacking this signature had been in remission for over 2 years . In addition, treating SLE patients with high-dose intravenous steroids, which are used to control disease flares, was found to silence the IFN signature. A surprise from these initial studies was the absence of type I IFN gene transcripts despite an abundance of IFN-inducible transcripts in the blood cells of SLE patients. A likely explanation is that the cells producing type I IFN, and therefore transcribing these genes, migrate to sites of injury.
Altogether, results from microarray studies played a key role in convincing the community of the potential importance of type I IFN in SLE pathogenesis [15, 30–34]. A phase Ia trial evaluating the safety, pharmacokinetics and immunogenicity of anti-IFN monoclonal antibody (mAb) therapy in adult SLE patients was recently conducted . The antibody specifically and dose-dependently inhibited the overexpression of type I IFN-inducible genes in both whole blood and skin lesions from SLE patients, at both the transcript and protein levels. As expected, overexpression of BLyS/BAFF, a type I IFN-inducible gene, also decreased with treatment. This first trial thus supports the proposed central role of type I IFN in human SLE.
Systemic onset juvenile idiopathic arthritis (SoJIA) is another disease with systemic involvement whose study has greatly benefited from blood transcriptional profiling, leading to the development of both therapeutic and diagnostic modalities [14, 16, 36, 37]. Diseases with specific organ involvement have also been the subject of significant, though not always extensive, blood profiling efforts. Blood signatures have, for instance, been obtained from patients with multiple sclerosis [38, 39]. Given the inaccessibility of the brain, blood constitutes a particularly attractive source of surrogate molecular markers for this disease. These efforts have yielded a systemic signature and identified potential predictive markers of clinical relapse and response to treatment [40–42]. Transcriptional signatures have also been generated in the context of dermatologic diseases. Because the target organ is readily accessible in this case, efforts have focused on profiling transcript abundance in skin tissue [43, 44]. However, systemic involvement has been recognized in recent years as an important component of autoimmune skin diseases, and unique blood transcriptional profiles have also been identified in patients with, for example, psoriasis [45–47].
Blood transcriptional profiles have been generated for many autoimmune diseases. Indeed, the autoimmune/autoinflammatory diseases investigated to date include SLE [20, 21, 48, 49], juvenile idiopathic arthritis [16, 50–53], multiple sclerosis [54, 55], rheumatoid arthritis [56–59], Sjögren's syndrome , diabetes [61, 62], inflammatory bowel disease , psoriasis and psoriatic arthritis [45, 47], inflammatory myopathies [64, 65], scleroderma [66, 67], vasculitis  and anti-phospholipid syndrome . Work on blood transcript profiling in autoimmune diseases has been covered at length in a recent review .
Global changes in transcript abundance have also been measured in the blood of patients with infectious diseases. In this context, alterations of blood transcriptional profiles are a reflection of the immunological response mounted by the host against pathogens. This response is initiated by specialized receptors expressed at the surface of host cells recognizing pathogen-associated molecular patterns . Different classes of pathogens signal through different combinations of receptors, eliciting in turn different types of immune responses . This translates experimentally into distinct transcriptional programs being induced upon exposure of immune cells in vitro to distinct classes of infectious agents [73–75]. Similarly, patterns of transcript abundance measured in the blood of patients with infections caused by different etiological agents were found to be distinct .
Predictably, dramatic changes were observed in the blood of patients with systemic infections (for example, sepsis) [76, 77]. However, profound alterations in patterns of transcript abundance were also found in patients with localized infections (for example, upper respiratory tract infection, urinary tract infections, pulmonary tuberculosis, skin abscesses) [13, 16, 78]. Measuring changes in host transcriptional profiles may therefore prove of diagnostic value even in situations where the causative pathogenic agent is not present in the test sample. Importantly, it may also help ascertain the severity of the infection and monitor its course.
Infections often present as acute clinical events; it is thus important to capture the dynamic changes in transcript abundance that occur over the course of infection, starting from the time of initial exposure. Blood signatures have been described in acute infections caused by a wide range of pathogenic parasites, viruses and bacteria, including Plasmodium [79, 80], respiratory viruses (influenza, rhinovirus, respiratory syncytial virus) [13, 81–84], dengue virus [85, 86] and adenovirus , as well as Salmonella , Mycobacterium tuberculosis , Staphylococcus aureus , Burkholderia pseudomallei  and bacterial sepsis in general [77, 89–91]. Some of these pathogens persist and establish chronic (for example, human immunodeficiency virus and Plasmodium) or latent (for example, tuberculosis) infections, and transcript profiling may be used in those situations as a surveillance tool for monitoring disease progression or reactivation.
Blood profiling of infectious diseases remains limited in scale. In particular, additional studies will be necessary to ascertain dynamic changes occurring over time.
In addition to autoimmune and infectious diseases, blood transcript profiling studies have been carried out in the cancer research field. While hematological malignancies have led the way (reviewed in ), blood profiles have also been obtained more recently from patients with solid organ tumors . Notably, these signatures can reflect not only the immunological or physiological changes effected by cancers but also the presence of rare tumor cells in the circulation [94–96].
Blood signatures have also been obtained from solid organ transplant recipients in the context of both tolerance [97–99] and graft rejection [10, 100, 101]. While such signatures can also be detected in biopsy material [102–104], blood offers the distinct advantage of being accessible for safely monitoring molecular changes on a routine basis.
Some work has also been done on cardiovascular diseases, in which inflammation is known to play an important role. Profiles have been identified in a wide range of conditions, including stroke, chronic heart failure and acute coronary syndrome [105–108].
The body of published work is too large to be cited in full in this review - and it is likely to be only the tip of the iceberg, with much more unpublished data scattered throughout public and private repositories. Other efforts have yielded, for instance, blood transcriptional signatures in patients with neurodegenerative diseases [109–111], signatures associated with disease exacerbation or responsiveness to glucocorticoids in patients with asthma [112, 113], and signatures of responses to environmental exposure [114–116], exercise [117, 118] or even laughter . Unfortunately, too many published studies are underpowered and sometimes lack even the most rudimentary validation steps. All too often, primary data are also unavailable for reanalysis, reflecting lax enforcement of editorial policies or, in some journals, the absence of such policies. Hence, one of the main challenges for this field is to move beyond the proof-of-principle stage and consolidate the wealth of data being generated.
Collectively, studies published thus far demonstrate that alterations in transcript abundance can be detected on a genome-wide scale in the blood of patients with a wide range of diseases. This statement is far from trivial given the skepticism that initially met studies investigating the blood transcriptome of patients. We have also learned that: 1) multiple diseases can share components of the blood transcriptional profile, as is the case for inflammation or interferon signatures; 2) while no single element of the profile may be specific to any given disease, it is the combination of those elements that makes a signature unique; and 3) the work accomplished to date highlights the importance of analyses that directly compare transcriptional profiles across diseases. Indeed, much can be learned, for instance, about autoimmunity from studying responses to infection, and vice versa. Furthermore, such efforts may eventually bring us closer to a molecular classification of diseases. First, however, technological and methodological advances are necessary for the blood transcriptome research field to move beyond the proof-of-principle stage.
Recent progress in blood transcriptome research has been made possible by the development of robust sample collection techniques and the introduction of high-throughput gene expression microarray platforms. Such advances have been necessary, but there is still considerable room for progress in the field. We describe here some of the current hurdles and discuss potential solutions for overcoming them.
A data mining primer: basic steps used for analysing microarray data
Here we provide basic analysis steps and important considerations for microarray data analysis:
- Per-chip normalization: This step controls for array-wide variations in intensity across the multiple samples that form a given dataset. Arrays, as with all fluorescence-based assays, are subject to signal variation for a variety of reasons, including the efficiency of the labeling and hybridization reactions and possibly other, less well-defined variables, such as reagent quality and sample handling. To control for this, samples are normalized by first subtracting background and then applying a normalization algorithm that rescales the overall intensity of each array to a fixed level common to all samples.
- Data filtering: Typically, more than half of the oligonucleotide probes present on a microarray do not detect a signal in any of the samples in a given analysis. A detection filter is therefore applied to exclude these probes from the original dataset, which avoids introducing unnecessary noise into downstream analyses.
- Unsupervised analysis: The aim of this analysis is to group samples on the basis of their molecular profiles without a priori knowledge of their phenotypic classification. The first step, which functions as a second detection filter, consists of selecting transcripts that are expressed in the dataset and display some degree of variability, which will facilitate sample clustering. For instance, this filter could select transcripts with expression levels that deviate by at least two-fold from the median intensity calculated across all samples. Importantly, this additional filter is applied independently of any knowledge of sample grouping or phenotype, which makes this type of analysis 'unsupervised'. Next, pattern discovery algorithms are often applied to identify 'molecular phenotypes' or trends in the data.
- Clustering: Clustering is commonly used for the discovery of expression patterns in large datasets. Hierarchical clustering is an iterative agglomerative clustering method that can be used to produce gene trees and condition trees. Condition tree clustering groups samples based on the similarity of their expression profiles across a specified gene list. Other commonly employed clustering algorithms include k-means clustering and self-organizing maps.
- Class comparison: Such analyses identify genes that are differentially expressed among study groups ('classes') and/or time points. The methods for analysis are chosen based on the study design. For studies with independent observations and two or more groups, t-tests, ANOVA, Mann-Whitney U tests, or Kruskal-Wallis tests are used. Linear mixed model analyses are chosen for longitudinal studies.
- Multiple testing correction: Multiple testing correction (MTC) methods reduce the number of false positives among the transcripts identified by class comparison, a number that becomes substantial when thousands of transcripts are tested simultaneously. The trade-off is a higher false negative rate, because genuine but modest differences may no longer reach significance once the signal is dampened. The available methods differ in stringency and therefore produce gene lists with different levels of robustness.
• Bonferroni correction is the most stringent method; it controls the familywise error rate (the probability of making one or more type I errors) and can drastically reduce false positive rates. Conversely, it increases the probability of false negatives.
• Benjamini and Hochberg false discovery rate  is a less stringent MTC method that provides a good balance between discovering statistically significant genes and limiting false positives. Applying this procedure at a threshold of 0.01 means that, on average, no more than 1% of the transcripts declared significant are expected to be false positives.
- Class prediction: Class prediction analyses assess the ability of gene expression data to correctly classify a study subject or sample. K-nearest neighbors is a commonly used technique for this task. Other available class prediction procedures include, but are not limited to, discriminant analysis, general linear model selection, logistic regression, distance scoring, partial least squares, partition trees, and radial basis machine.
- Sample size: The number of samples necessary for the identification of a robust signature is variable. Indeed, sample size requirements will depend on the amplitude of the difference between, and the variability within, study groups.
A number of approaches have been devised for calculating sample size in microarray experiments, but to date little consensus exists [126–129]. Best practice in the field therefore consists of using independent sets of samples to validate candidate signatures. The robustness of a signature then rests on a statistically significant association between the predicted and true phenotypic class in both the first and the second test set.
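The per-chip normalization, detection filter and variability filter described in the primer can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular platform or of the studies cited here. It assumes a probe-by-sample matrix stored as nested lists, a single background estimate per chip, scaling of each chip to a common target median (the target of 100 is an arbitrary assumption; vendor software may use other statistics, such as trimmed means), per-probe detection p-values supplied by the scanner software, and the two-fold-from-median rule given as an example above.

```python
from statistics import median

def normalize_per_chip(raw, background, target=100.0, floor=1.0):
    """Background-subtract each chip (column), then scale it so its median equals `target`.

    raw[p][s] is the signal of probe p on sample s; background[s] is an
    assumed background estimate for chip s.
    """
    n_probes, n_samples = len(raw), len(raw[0])
    normalized = [[0.0] * n_samples for _ in range(n_probes)]
    for s in range(n_samples):
        # Subtract this chip's background, flooring at a small positive value
        # so later fold-change calculations never divide by zero.
        column = [max(raw[p][s] - background[s], floor) for p in range(n_probes)]
        scale = target / median(column)  # one scaling factor per chip
        for p in range(n_probes):
            normalized[p][s] = column[p] * scale
    return normalized

def detection_filter(detection_p, alpha=0.01):
    """Indices of probes detected (p < alpha) in at least one sample."""
    return [i for i, row in enumerate(detection_p) if any(p < alpha for p in row)]

def variability_filter(data, fold=2.0, min_samples=1):
    """Indices of probes deviating at least `fold` from their own median in
    at least `min_samples` samples. No sample labels are used, so the
    selection stays 'unsupervised'."""
    keep = []
    for i, row in enumerate(data):
        med = median(row)
        deviating = sum(1 for v in row if v >= fold * med or v <= med / fold)
        if deviating >= min_samples:
            keep.append(i)
    return keep

raw = [[110, 210], [210, 410], [310, 610]]  # toy data: 3 probes x 2 chips
print(normalize_per_chip(raw, background=[10, 10]))
# → [[50.0, 50.0], [100.0, 100.0], [150.0, 150.0]]
```

In the toy example the second chip is uniformly twice as bright as the first; after background subtraction and median scaling, both chips end up on the same scale.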
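Condition tree clustering can likewise be sketched as a naive average-linkage agglomerative procedure that uses 1 minus the Pearson correlation as the distance between sample profiles. The choice of distance and linkage is an assumption made for illustration (both vary between software packages), and the exhaustive pairwise search at every step is only suitable for small toy datasets.

```python
from statistics import mean, pstdev

def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 1.0 - cov / (pstdev(x) * pstdev(y))

def condition_tree(samples):
    """Average-linkage agglomerative clustering of sample profiles.

    samples[s] is the expression profile of sample s across a fixed gene list.
    Returns merges as (cluster_a, cluster_b, distance); each merged cluster
    receives a new id n, n + 1, ..., as in a dendrogram.
    """
    clusters = {i: [i] for i in range(len(samples))}
    next_id = len(samples)
    merges = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for k, a in enumerate(ids):
            for b in ids[k + 1:]:
                # Average linkage: mean distance over all cross-cluster sample pairs
                d = mean(pearson_distance(samples[i], samples[j])
                         for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```

For four toy profiles, [1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1] and [8, 6, 4, 2], samples 0 and 1 merge first, then samples 2 and 3 (both at a distance close to 0), and the two anticorrelated groups merge last, at a distance close to 2.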
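The class comparison and multiple testing correction steps can be illustrated with a two-group Mann-Whitney U test followed by Bonferroni and Benjamini-Hochberg adjustment. This sketch simplifies deliberately. The p-value uses the large-sample normal approximation without tie or continuity correction; exact or corrected versions are preferable for the small groups typical of patient cohorts. The Benjamini-Hochberg function returns adjusted p-values, so keeping transcripts with an adjusted value of at most 0.01 corresponds to controlling the false discovery rate at 1%.

```python
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

def bonferroni(pvalues):
    """Familywise-error adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For the four p-values 0.01, 0.04, 0.03 and 0.005, Bonferroni adjustment gives 0.04, 0.16, 0.12 and 0.02, whereas Benjamini-Hochberg gives 0.02, 0.04, 0.04 and 0.02 (up to floating-point rounding), which illustrates the difference in stringency between the two methods.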
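Because the required sample size depends on the amplitude of the difference between groups and on the variability within them, a rough feel for the numbers can be obtained from the standard normal-approximation formula for comparing two means, n = 2 (z(1 - alpha/2) + z(power))^2 sigma^2 / delta^2 per group. This generic formula is an assumption chosen for illustration and is not one of the microarray-specific methods cited above. Dividing alpha by the number of transcripts tested (a Bonferroni-style adjustment) shows how multiple testing inflates the requirement.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, power=0.8, n_tests=1):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation).

    delta: expected difference between group means (e.g. in log2 units)
    sigma: within-group standard deviation (same units)
    n_tests: number of transcripts tested; alpha is divided by it
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    alpha_per_test = alpha / n_tests
    n = 2 * ((z(1 - alpha_per_test / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)
```

With a one-unit difference, a within-group standard deviation of one unit, alpha = 0.05 and 80% power, this gives the familiar 16 samples per group; halving the difference raises the figure to 63.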
MicroRNA (miRNA) control has emerged as a critical regulatory circuit of the immune system. Measuring changes in miRNA abundance in the blood of human subjects in health and disease is therefore a promising new field of investigation. These short, non-coding, single-stranded RNAs, about 22 nucleotides in length, have been found to play essential regulatory roles [130–132]. They exhibit highly specific, regulated patterns of expression and control protein expression through translational repression, mRNA cleavage or promotion of mRNA decay. Interestingly, owing to their small size, miRNA molecules are stable and can be measured not only in blood cells but also circulating in serum . They are thus not only potentially important contributors to immune function but also potential sources of biomarkers.
Blood transcriptome research will also benefit from conceptual advances that may help address shortcomings inherent to whole blood profiling.
First, blood is a complex tissue, and changes in transcript abundance can be attributed either to transcriptional regulation or to relative changes in the composition of leukocyte populations. Two approaches exist for 'deconvoluting' these two phenomena. One can isolate and individually profile the different cell populations present in blood. This approach may also permit the identification of transcripts expressed at low levels, or the detection of differences in expression that would otherwise be drowned out in whole blood [134, 135]. However, isolation methods may introduce technical bias and require extensive sample processing. The second approach consists of deconvoluting whole blood transcriptional profiles 'in silico', using statistical methodologies to deduce cellular composition or cell-specific levels of gene expression [136–141].
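A minimal illustration of in silico deconvolution treats a whole blood profile as a linear mixture of reference profiles from purified cell types and estimates the mixing proportions by least squares. The reference signature matrix, the use of ordinary least squares followed by clipping of negative values (published methods typically rely on constrained or non-negative least squares, among other techniques) and the function name `deconvolve` are assumptions made for this sketch; it estimates composition only, not cell-specific expression levels.

```python
def deconvolve(signatures, mixture):
    """Estimate cell-type proportions p minimizing ||S p - m||^2.

    signatures[g][c]: expression of marker gene g in purified cell type c
                      (an assumed, known reference matrix)
    mixture[g]:       expression of gene g in the whole blood sample
    Negative estimates are clipped to zero and the result rescaled to sum to 1.
    """
    n_types = len(signatures[0])
    # Normal equations: (S^T S) p = S^T m
    a = [[sum(row[i] * row[j] for row in signatures) for j in range(n_types)]
         for i in range(n_types)]
    b = [sum(row[i] * m for row, m in zip(signatures, mixture)) for i in range(n_types)]
    # Gaussian elimination with partial pivoting
    for col in range(n_types):
        pivot = max(range(col, n_types), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_types):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_types):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    p = [0.0] * n_types
    for r in range(n_types - 1, -1, -1):
        p[r] = (b[r] - sum(a[r][c] * p[c] for c in range(r + 1, n_types))) / a[r][r]
    clipped = [max(v, 0.0) for v in p]
    total = sum(clipped)
    return [v / total for v in clipped]
```

For example, with marker genes whose reference profiles are [10, 1], [1, 10] and [5, 5] across two cell types, the mixture [7.3, 3.7, 5.0] is recovered as proportions of approximately 0.7 and 0.3.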
Finally, we must also keep in mind that the immune status of a human subject is not entirely reflected by a blood profile obtained at steady state. Indeed, an individual's capacity to respond to innate as well as antigen-specific immune signals may also provide useful and complementary information.
In conclusion, blood transcript profiling has earned its place in the molecular and cellular profiling armamentarium used to study the human immune system. Changes in transcript abundance recapitulate the influence of genetic, epigenetic, cellular and environmental factors. Initially considered 'cutting edge', this approach has become both robust and practical. As discussed in this review, it has become a mainstay for the study of immune function in patients with a wide range of diseases. Furthermore, recent studies have demonstrated the utility of blood transcriptome profiling for monitoring immune responses to drugs or vaccines [35, 142, 143]. Thus, blood transcript profiling is developing into a mainstream tool for the assessment of the status of the human immune system.
The work of the authors is supported by the Baylor Health Care System Foundation and the National Institutes of Health (U19 AIO57234-02, U01 AI082110, P01 CA084512).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.