Serious limitations of the QTL/Microarray approach for QTL gene discovery

Background It has been proposed that the use of gene expression microarrays in nonrecombinant parental or congenic strains can accelerate the process of isolating individual genes underlying quantitative trait loci (QTL). However, the effectiveness of this approach has not been assessed. Results Thirty-seven studies that have implemented the QTL/microarray approach in rodents were reviewed. About 30% of studies showed enrichment for QTL candidates, mostly in comparisons between congenic and background strains. Three studies led to the identification of an underlying QTL gene. To complement the literature results, a microarray experiment was performed using three mouse congenic strains isolating the effects of at least 25 biometric QTL. Results show that genes in the congenic donor regions were preferentially selected. However, within donor regions, the distribution of differentially expressed genes was homogeneous once gene density was accounted for. Genes within identical-by-descent (IBD) regions were less likely to be differentially expressed in chromosome 2, but not in chromosomes 11 and 17. Furthermore, expression of QTL regulated in cis (cis eQTL) showed higher expression in the background genotype, which was partially explained by the presence of single nucleotide polymorphisms (SNP). Conclusions The literature shows limited successes from the QTL/microarray approach to identify QTL genes. Our own results from microarray profiling of three congenic strains revealed a strong tendency to select cis-eQTL over trans-eQTL. IBD regions had little effect on rate of differential expression, and we provide several reasons why IBD should not be used to discard eQTL candidates. In addition, mismatch probes produced false cis-eQTL that could not be completely removed with the current strains genotypes and low probe density microarrays. The reviewed studies did not account for lack of coverage from the platforms used and therefore removed genes that were not tested. Together, our results explain the tendency to report QTL candidates as differentially expressed and indicate that the utility of the QTL/microarray as currently implemented is limited. Alternatives are proposed that make use of microarray data from multiple experiments to overcome the outlined limitations.


Background
The study of genetics of quantitative traits has benefited from the availability of new technologies that generate massive information at the genomic and transcriptomic levels [1]. Microarray technology has been recognized as a powerful tool that could aid in the identification of the genes underlying quantitative trait loci (QTL; [2,3]). Microarray data can be analyzed within a QTL context following a genetical genomics (GG) approach [4]. This methodology considers gene expression values as a quantitative trait that can be mapped to chromosomal locations in a segregating population. Such genomic positions are called expression QTL (eQTL), which can be either cis-or trans-acting modifiers of gene expression, depending on whether they are located in the vicinity of or far from the measured gene, respectively. In practice, the validity of this distinction depends on the resolution of the QTL analysis, i.e., the density of the genetic map and the size of the segregating population [5]. Here we refer to QTL for phenotypes other than gene expression as pQTL to differentiate them from eQTL.
Some general ideas on eQTL can be drawn from relevant GG studies. Experiments in yeast [6], maize and mouse [7], and humans [8] found that most transcripts are affected by multiple loci, with each locus accounting for less than one third of parental expression differences. eQTL with the largest effects are located in close proximity to the target gene (within 10 Kb in yeast and within 1 Mb in mice and humans), which are referred to as proximal or cis eQTL [6,7]. However, most of the detected eQTL have been found to be trans-acting. The overall distribution of eQTL along the genome reveals the presence of "hot spots" with trans-eQTL for a large number of genes genome-wide [9]. These hubs of trans-eQTL do not necessarily represent transcription factors, but more likely represent a heterogeneous group of transcription regulators [10] or could simply be the result of unaccounted for nongenetic correlation among transcripts [11]. When real, trans-eQTL hot spots can be used to identify gene modules under common genetic regulation [12]. Their detection, however, requires a larger sample size than typical GG studies provide; hence, they are missed in smaller-scale designs [9,13,14]. cis-eQTL are unique by some desirable properties, e.g., (1) known location of the causal gene and (2) their effect sizes are usually large and can be detected with smaller sample sizes. In general, cis-eQTL are regarded as strong quantitative trait gene (QTG) candidates when they are located under pQTL, their expression is correlated with the phenotype, and they tend to be located in regions not involved in identity-by-descent (IBD) relationships [15][16][17]. However, the cost of largescale microarray profiling of a segregating population restricts the application of the GG approach. Therefore, experimental designs requiring a lower number of microarrays for the identification of cis-eQTL are desirable.
An alternative approach is defined in this paper as the QTL/microarray approach. This approach refers to the combined use of traditional QTL mapping and subsequent microarray profiling of nonrecombinant parental or congenic strains to reduce the number of candidate genes in QTL regions [18][19][20]. The methodology used in QTL/microarray studies, even though it varies among researchers, shares some common procedures that can be summarized as follows: (1) QTL mapping experiment and identification of genes located within confidence or support intervals for QTL, (2) a test of differential expression between parental strains (Parental design), (3) cross-reference list of positional candidates from step 1 and expression candidates from step 2, (4) hypothesis-or knowledge-driven filtering of the list of candidates, (5) independent confirmation of differential expression, and (6) experimental validation of causative genes.
Step 3 sometimes compares gene expression on congenic and background strains (Congenic design) or between animals with extreme phenotypes from a segregating population (Extremes design).
Step 5 often involves measuring expression of candidate genes by qRT-PCR in an independent set of samples, but can also involve Northern or Western blot analysis. Step 6 is usually part of a separate project that follows a QTL/ microarray study.
The rationale behind QTL/microarray studies is that causative genes may have polymorphisms causing differences in their level of expression that translate into varying amounts of mRNA and ultimately varying amounts of functional proteins, leading to observable phenotypes. There are several mechanisms by which a QTG could change gene expression levels. A mutation in the binding site for a transcription factor may, for instance, decrease its binding affinity to the ligand, affecting the gene's transcription level [21]. Mutations in the transcription factors themselves could also affect recognition of their targets, thus also changing the gene's transcription level. Likewise, nonsense mutations in the coding sequence of a gene can decrease transcript levels by nonsense-mediated decay [22].
In contrast to the GG approach, transcript profiling of nonrecombinant animals does not allow QTL mapping of the expression levels and therefore cannot differentiate between cis-and trans-eQTL. However, co-localization of a differentially expressed gene and the pQTL can be tested, given that a physical, or genetic, map is available for the genes. This would be equivalent to a cis-eQTL/pQTL co-localization test for the genes under cis control. An approach for sorting out cis-from trans-eQTL in this experimental design consists of first isolating the genomic region with the pQTL in a congenic strain by backcrossing a donor strain to a recipient strain for multiple generations and then testing differential gene expression between the congenic and the recipient background strain. Depending on the size of the congenic strain, differentially expressed genes in the donor region are likely to be under cis regulation or alternatively by trans control from linked genes within the limits of the congenic interval. In contrast, differential expression of genes outside the donor region is expected to be regulated, directly or indirectly, by genes located within the congenic interval. Contaminating donor DNA in places outside the congenic donor region could produce false trans-eQTL. However, nonrecombinant individuals from an F 2 cross between the congenic and background strains can be used to randomize the effect of contaminating regions.
The QTL/microarray approach is not exempt from issues that need to be addressed. A well-known problem associated with these studies is that DNA polymorphisms can affect the binding of microarray probes and significantly decreasing detectable signals. Such artifacts can produce an increase in false-positive results when genetically divergent individuals are compared with microarrays [23,24]. Presumably, however, this is not the only issue that may affect QTL/microarray results. Despite the wide use of the approach, no systematic evaluation of its performance has been reported.
In the current paper, a critical study of the QTL/ microarray approach applied to the analysis of complex traits with clinical relevance in humans was performed. First, the literature was reviewed and a meta-analysis of rodent studies that have implemented this approach was conducted. Second, a microarray experiment with three mouse congenic strains was designed to test whether differential expression is associated with QTL peaks. The advantages and limitations of the QTL/microarray approach are discussed, and recommendations for the effective use of microarrays for the dissection of complex traits are provided.

Literature review
Thirty-seven published studies using the QTL/microarray approach in rats (Table 1) or mice ( Table 2) were examined for key features of their experimental design. Most of the microarrays used were whole-genome arrays, including both cDNA, especially for older studies, and oligonucleotide arrays. Although the objective of using microarrays in most of these studies was similar, i.e., identification of differentially expressed genes under pQTL peaks, the methodologies used varied greatly. Particularly, no standards were followed in the statistical methodologies used for testing differential expression. Some studies do not report the methodology used at all [25,26]. Others report from one to three sequentially applied criteria to select lists of differentially expressed genes, which varied in nature and in statistical support. For instance, arbitrary thresholds by absolute intensity difference or fold changes (FC) [27][28][29][30][31][32][33], different combinations of FC and t-statistics [34][35][36], ranking by FC and selecting only the top 100 most significant genes [37], correlation to a hypothetical constant gene [38], and concordance in FC direction across experimental groups [39], among others (for the complete list studies with details about experimental design, see Additional file 1). Although applying multiple criteria can be attractive in terms of reducing the number of candidates, this is likely to reduce power in an unpredictable manner. This is further complicated when a criterion such as fold change (FC) is used, which has unknown significance level.
In some cases, the QTL/microarray design has led to the identification of a QTG. In rats, two studies were identified confirming the role of QTGs, i.e., Cd36 in insulin resistance [27] and Klk1 in hypertension [31]. Using a cDNA microarray, the Cd36 gene was selected as differentially expressed in adipose tissue between two divergent inbred strains of rats as well as between a congenic strain isolating a QTL on chromosome 4 and the background strain. This gene was located within the limits of the congenic donor region, but it was not mentioned whether  Table 3.
other genes in the region were also differentially expressed [27]. This finding suggested Cd36 as a candidate associated with insulin resistance syndromes. Northern blots and sequencing revealed that the microarray probe targeting the 3' end of the mRNA molecule detected alternative splicing of the last exon and not differential expression. This was translated into absence of protein levels in plasma, and a transgenic model confirmed the effects of Cd36 on lipid levels in the blood [27]. It was not mentioned by Atiman et al. [27] how many genes were located within the congenic region, presumably because it was not known at the time of publication, when a physical map of the rat genome was not available. Unfortunately, the identity of the microsatellite markers defining the donor region was not provided, and it was not possible to calculate the size of the region. For this reason, the study by Aitman et al. [27] was not included in the meta-analysis in Table 3.   Table 3.
The QTL/microarray approach has been extensively used to study hypertension. A total of nine studies were identified that compared gene expression between rat strains that showed spontaneous difference for blood pressure [31][32][33]40,41] or between congenic and background strains created from these lines [30,[42][43][44]. These studies have led to the direct identification of as many as 50 candidate genes for hypertension and to the confirmation of at least one causal gene, namely, Klk1 [31]. Instead of microarrays, Iwai et al. [31] used real-time polymerase chain reaction (PCR) to test 399 transcripts located within the confidence interval for the QTL position. It was reported that among 240 transcripts that were detected in kidney tissue, two were differentially expressed, i.e., Klk1 and Ngfg. From these, only Klk1 was confirmed by Western blot analysis [31]. The role of Klk1 as a QTG was confirmed by alleviation of hypertension symptoms through adenoviral transfer of human Kallikrein 1. Because expression profiling was restricted to the target region of the QTL, this study was not included in the overrepresentation test in Table 3.
Among the mouse studies reviewed, one QTG was identified using the QTL/microarray approach, i.e., Alox15. Klein et al. [26] generated a congenic strain to isolate a QTL for peak bone mineral density on mouse chromosome 11 that was previously identified by Klein et al. [45]. This was followed by measuring gene expression in kidney tissue using a whole genome high-density array (see Table 2). Alox15 was the only gene reported as being differentially expressed in the congenic region between the strains. This was confirmed by qRT-PCR in kidney and osteoblast cell cultures. Furthermore, the role of Alox15 was confirmed by a complementation test with an Alox15 -/knockout mouse as well as two drugs that inhibit the protein product coded by this gene [26]. There are over 2500 genes known in the donor region for the congenic strain, so as in previous examples, after QTL mapping, microarray testing was the single step that reduced the number of candidate genes the most. Unfortunately, it was not reported how many genes were differentially expressed genome-wide, and therefore this study also had to be excluded from our tests in Table 3.

Meta-analysis
Because of missing information, statistical testing could be performed on only 20 of the 37 studies compiled ( Table 1 and Table 2). In addition, microarrays do not cover all genes in the genome [46]. To control for the differences in the level of genome coverage between platforms used in each study, the meta-analysis was performed using two reference sets: genes in the genome and probes in the microarray (see Methods). Filtering by differential expression, on average, selected 1.9% of pQTL candidates' genes (range from 0% to 15%; Table  3). Of 20 publications with sufficient information, 6 (30%) reported differentially expressed genes that were significantly enriched and 3 (15%) studies were underrepresented for genes within the pQTL or congenic region when compared with the whole genome (P < 0.05). When using probes in the array as a reference, 7 (35%) were overrepresented and 7 (35%) were underrepresented ( Table 3). The four experiments that were underrepresented for candidates only when using genes in the microarray as reference were performed on microarrays with low genome coverage (13K, [47], 12.5K, [35,48]) or on filter arrays [49]. Because the number of pQTL candidates represented in the microarrays was not available, we used the total number of genes under QTL intervals as an approximation. Low genome coverage produced overestimation of the true number of tested candidates, explaining their apparent underrepresentation among differentially expressed genes in these platforms.
The number of tests for differential expression applied in each study was used to question whether increasing the number of selection criteria increases the probability of selecting QTL candidate genes. No such trend was observed. In fact, the largest enrichments were seen in studies using a single selection criterion ( Figure 1).
When divided by type of experiment, only one of eight Parental comparisons revealed enrichment for candidate genes. This is not unexpected, given that the test was performed not on QTG but only on positional candidates. Furthermore, inbred strains can present genomewide genetic divergence that is not related to the specific phenotype under study. This limitation of the Parental design can be alleviated with the comparison of Extremes, where animals are specifically selected for the phenotype of interest and regions not harboring QTL are expected to segregate randomly relative to the phenotype. The Extremes design is the gene expression equivalent of mapping by allelic association where marker genotype frequencies are compared instead [50]. In both types of comparison, the degree to which differences between extremes are informative about an underlying association with the phenotype depends on the population and sample sizes, range of linkage disequilibrium (LD), and population history and structure; therefore, proper experimental design is required to avoid spurious associations [51]. Of two studies reviewed of this type, one presented significant enrichment. However, the one that showed significance used a modified design where the two extreme groups were composed of one parental strain and one recombinant inbred line [52]. Because of the confounding of two designs, i.e., Parental and Extremes, and its small sample size [only one recombinant inbred line (RIL) per extreme group], no further interpretation of this result is attempted. In contrast, the Congenic design was implemented in 11 studies with sufficient information for meta-analysis. Of these, five studies revealed enrichment for candidate genes. By design, only regions that are confirmed to harbor QTL have genetic divergence, and therefore enrichment of QTL candidates can be expected. However, multiple factors can potentially contribute to this trend. Polymorphic genomic regions are more likely to host pQTL for any trait and to generate allelic bias in probe binding for one strain versus the other. This situation can be expected in cases where the microarray has been designed for one of the strains that are being compared, or for a strain that is genetically more closely related to one of them. Allelic bias of probe binding will have a systematic effect on fluorescence intensity levels, which can be interpreted as differential gene expression. Since higher polymorphism rate is expected to increase both frequency of QTL as well as allelic bias of probe signals, the variables would be associated and the observed overrepresentation of pQTL candidates may result from such confounding effect. Both hypotheses are not mutually exclusive, since in reality QTL regions may be enriched for both polymorphisms that produce allelic bias as well as for functional polymorphisms that produce differential expression. However, it is important to assess the relative importance of these two factors in the apparent tendency for some QTL/microarray experiments to report pQTL candidates as differentially expressed.
Meta-analysis of results from the literature presents several limitations. Overrepresentation of pQTL candidates can be affected by a number of factors, such as Phenotype abbreviations: EIAD: Endotoxin-induced airway disease, HSCP: Hematopoietic stem cells proliferation, IBD: Inflammatory bowel disease. a Number of pQTL candidate genes was extracted from references or calculated from UCSC Genome Browser within the limits of the pQTL confidence intervals or congenic donor regions. Selected Genes columns indicate number of differentially expressed genes genome-wide (GW) or in candidate regions (pQTL). OR columns denote the ratio of odds, i.e. fraction of candidate genes in the selected and in the references sets. P-values (P) for over-(OR > 1) or underrepresentation (OR < 1) of QTL candidate genes in list of selected genes were calculated using the genome and microarray probes as reference (see methods). Significant p-values (P < 0.05) are shown in bold.
publication bias for genes that are functional candidates or that are located near pQTL and inaccurate estimation of the gene coverage in microarrays. The number of candidate genes that were targeted by each microarray was largely unknown and the two reference sets used, i. e., total number of genes in the genome and number probes in the microarray, may not be optimum reference sets for overrepresentation tests. Furthermore, heterogeneity in quality of microarray annotation, definition of candidate region limits, and statistical procedures for data processing and differential expression testing limit our ability to investigate the specific causes of enrichment for pQTL candidates in their results. Therefore, an in-house microarray experiment was deemed necessary to specifically test for overrepresentation of candidate genes among differentially expressed genes. This gave us complete control over all these variables and allowed performing more specific tests that considered probe mismatches, IBD regions, and QTL location within congenic donor regions.

Experiment using three congenic strains
Differential expression between three CAST.C57.hg -/mouse congenic strains with their genotypic background controls C57.hg -/was tested on the Illumina Mouse-6 microarray with samples from brain, liver, and gonadal white adipose tissue (see Methods; data available at the NCBI GEO repository by accession GSE22042). This platform has coverage for 71.7% of 30,388 EntrezGenes in the mm9 genome assembly and for 75.5% of Entrez-Genes in the donor region of the congenic strains. Differential expression analysis detected a total of 577, 110, and 109 genes (targeted by 682, 131, and 148 probes) genome-wide that were affected by allelic variants of genes in the congenic region of HG2D, HG11, and HG17, respectively. Of these, 124, 89, and 95 genes were located within the donor regions of those strains. Probes selected within the donor regions presented an allelic bias toward higher intensity of the C57 samples (Figure 2a). Since the reference mouse genome sequence used to design the microarray probes is C57, polymorphisms between this strain and CAST may have an effect on the binding affinity of microarray probes. A total of 410 probes overlapped at least one of the 289,541 known or imputed single nucleotide polymorphisms (SNP) within the limits of the donor regions (31 probes had two SNP, and one probe had three SNP). Of these, 209 probes detected a transcript in at least one tissue. This is in agreement with an observed bias toward higher intensity from C57 alleles in probes with known or imputed SNP (Figure 2b). However, allelic bias for genes in the donor region persisted even after these probes were removed (Figure 2c), suggesting that many SNP between C57 and CAST may still be unknown. However, it is also possible that the allelic bias is reflecting functional polymorphisms that can be detected in only one direction. For instance, insertions in the CAST genome would not be detected, whereas insertions in the C57 genome are detected as deletions in the CAST genome. Likewise, nonsense mutations causing RNA decay will only be apparent in CAST, since probes were designed for C57 mRNA molecules. On the basis of these findings, we recommend the use of high-density microarrays that target mRNA molecules in multiple locations, and custom probeset definitions can be designed to target only perfect matching sequence in the particular cross under study. The absence of this or alternative techniques [23] in all the reviewed papers leads us to conclude that such probebinding artifacts also explain, at least in part, the increased frequency of candidate genes among reported differentially expressed genes. Differential expression testing reduced the number of positional candidates from 1596, 1132, and 1347 to 124, 89, and 95 genes, that is, a reduction in 92.2%, 92.1%, and 92.9%, for HG2D, HG11, and HG17, respectively. However, these apparently high filtering rates result not only from lack of differential expression but also from lack of expression, removal of probe due to SNP, and lack of coverage of the microarray. Because untested  genes cannot be discarded as candidates, genes that were targeted only by SNP-overlapping probes or not targeted at all in the microarray must be added back to the list of potential candidates. These represent 29%, 22.6%, and 33.2% of genes in the chromosomes 2, 11, and 17 donor regions, which were excluded from differential expression testing. The 124, 89, and 95 differentially expressed genes plus the 463, 256, and 447 genes that were not tested leaves a total of 587, 345, and 542 genes that would need further testing in each of the H2D, HG11, and HG17 congenic regions. The effective reduction of candidates by differential expression testing after adjustment was 63.2%, 69.5%, and 59.8%, respectively, i.e., 64% on average.
All differentially expressed genes located outside donor regions can be considered under trans regulation, in other words, trans-eQTL modulated. Genes regulated in trans were observed in almost every chromosome in the genome (Table 4). Differentially expressed genes within the congenic regions are candidates for cis-eQTL regulation and represented 21.5%, 80.1%, and 87.2% of selected genes, which is highly unlikely by chance considering that only 5.2%, 4%, and 4.2% of genes in the array are located in each of these regions (P = 6.38 × 10 -43 , P = 1.11 × 10 -104 , and P = 4.24 × 10 -117 ) for chromosomes 2, 11, and 17, respectively. This high cis-eQTL enrichment was observed despite the fact that probes overlapping known SNP between CAST and C57 genomes were removed. Furthermore, some genes classified as trans regulated lay right at the ends of the congenic regions and are most likely cis regulated (Figure 2). This is expected because the limits for the donor regions used here represent minimum intervals from low-density genotyping [53], and the true limits may extend further than these intervals. We observed an approximate 3.1 to 1 ratio between cis and trans eQTL. It has been argued that selection acts distinctively on cis eQTL owing to quantitative effects, limited pleiotropy, and more exposure to selective pressure due to codominant effects versus a recessive mode of action characteristic of trans regulation [54]. Therefore, cis eQTL could have a predominant role in shaping genetic regulation of transcription [55]. However, empirical evidence compiled from GG studies in multiples species favors the view that trans eQTL are prevalent but show smaller effects than cis eQTL and can be missed at low sample sizes [6][7][8]56,57]. Although statistical issues related to multiple testing of trans eQTL and power to detect smaller effects makes it difficult to estimate the true ratio of cis versus trans regulatory loci [55], eQTL studies in yeast [58] and Arabidopsis [59] have shown that expression of most transcripts is most likely regulated by multiple loci, and a study in humans showed significant enrichment of interaction among multiple loci affecting gene expression [57]. Therefore, we hypothesize that the ratio observed here is due to the small experimental design with only four replicates per genotype and that the overrepresentation of candidates' genes in congenic QTL/microarray experiments may result from biased detection of cis eQTL as a consequence of low power [9,13,14].
F2 offspring subcongenics from three congenic strains have been assayed for the same set of biometric measurements (Additional file 2) resulting in identification of at least 13, 7, and 5 QTL on chromosomes 2, 11, and 17 respectively [60,61] (Figure 3 and Additional file 3). The large number and overlap of QTL intervals would make it impossible to test for co-localization between differentially expressed genes and QTL. Instead, we tested whether the probability of differential expression was homogeneous along donor regions, conditioned on the number of expressed genes in bins of 2 Mb. A Fisher's exact test (see Methods) revealed no significant departure from homogeneity for chromosome 2 (P = 0.81), 11 (P = 0.52), or 17 (P = 0.67). Inspection of Figure 3 shows that the fraction of differentially expressed genes closely follows the distribution of genes in the donor regions ( Figure 3). These results indicate that the distribution of selected genes within donor regions was mostly explained by the number of expressed genes and is not concentrated in any particular QTL region.
cis eQTL have been reported to be located preferentially in non-IBD regions [15], and their regulated genes have higher density of predicted SNP on transcription binding sites [21]. Therefore, multiple authors have proposed using the IBD status of genes to filter or prioritize cis eQTL candidates [15,20,62]. We tested whether genetic diversity within congenic regions was associated with differential expression. High genomic divergence between C57 and CAST resulted in only 14.7%, 6.4%, and 7.2% of the genes in donor regions of chromosomes 2, 11, and 17, respectively, to be located within IBD blocks. Using the IBD criteria discarded only 4, 3, and 9 genes from 124, 89, and 95 cis eQTL candidates in chromosomes 2, 11, and 17, respectively. Overrepresentation of differentially expressed genes within non-IBD blocks was observed only in chromosome 2 (P = 2.35 × 10 -6 ) but not in chromosomes 11  (P = 0.35) or 17 (P = 0.54). Therefore, although cis eQTL candidates were preferentially located in non-IBD regions in agreement with Doss et al. [15], this was not significant once the probability of any gene to be located in IBD blocks is taken into account in chromosomes 11 and 17. This indicates that in this cross of highly divergent strains, enrichment of cis eQTLs in these two chromosomes was driven only by the overall higher rate of genes in those regions. This added to the limited number of genes that would be removed from the candidate list by the IBD criterion indicates that there would be no real gain in using this approach. Because we are inspecting only the chromosomes in one cross, we refrain from generalizing this conclusion to other cases. However, there are four main reason why using IBD to filter down lists of cis eQTL candidates should be done with care. In most cases, IBD is inferred from incomplete genotype data originated from resequencing [63], genotype imputation [64], or microarray genotyping for SNP discovered by the previous two methods [65]. Errors or lack of coverage from these methods could lead to imprecision in defining the size of IBD blocks that leads to filtering (or not) of genes that are actually non-IBD. Second, mutations that arose after the split of strain ancestors are missed from imputation techniques on the basis of a few highly divergent strains. Third, in the absence of evidence that the strain ancestors were homozygous at all loci, it is possible that modern strains have fixed different alleles of functional SNP that existed as heterozygous SNP in the strain ancestors. Fourth, enhancers in non-IBD regions may regulate expression of genes that are in IBD. We are aware of a least one case where this would have eliminated the causal gene if the IBD criterion were used. Prcp was identified as a candidate gene for obesity by subcongenic isolation and gene expression data from brain [66,67]. This gene is located in an IBD region between the donor strains BALB/cByJ and the background strain C57BL/6ByJ. However, in vitro assays with recombinant PRCP demonstrated that it has enzymatic activity to inactivate α-melanocyte-stimulating hormone (α-MSH 1-13 ) by removing the C-terminal amino acid to produce α-MSH 1-12 . α-MSH 1-13 , a critical anorexigenic neuromodulator in the hypothalamus. A mouse model with a gene trap in PRCP confirmed effects of PRCP on obesity. In addition, inhibiting PRCP activity in vivo decreased food intake, confirming the role of Prcp in weight maintenance via control of active α-MSH 1-13 levels [67]. Sequencing of this gene in the congenic revealed no SNP in the expressed sequence but only a promoter C T transition that is hypothesized to affect the observed changes in gene expression and protein activity, food intake, and obesity phenotypes. This SNP was not known at that time and could not be inferred from the parental strains' ancestry. Therefore, identity by descent does not imply lack of DNA polymorphisms and, more important, of genetic differences that affect phenotypes. In summary, differential expression testing in three congenic strains revealed expression signatures enriched for eQTL candidates, which resulted in hundreds of genes to be considered for further testing. Expression differences within the donor regions were distributed according to the overall distribution of genes in those regions and were affected by IBD blocks only on chromosome 2. High genetic divergence between C57 and CAST resulted in very limited number and size of IBD regions. Filtering by IBD would only discard 16 genes, which according to previous reports may well contain a causal variant. Intense phenotyping of F2 fine mapping populations revealed high genetic complexity, with multiple QTL, in each of these regions. The HG2D congenic includes multiple QTL for the same phenotype (body weight) with opposite genotype effects [61]. Furthermore, it is possible that many more QTL would be detected in these genomic regions if more phenotypes were collected. From the current data, it is impossible to distinguish between long-reaching linkage and pleiotropy or to infer causal relationships between QTL, transcripts, and organismal phenotypes. Phenotyping of large mapping populations would be necessary to break the association between these confounding effects.
The results from our literature review and the present experiment do not invalidate the use of microarrays for dissecting QTL. On the contrary, they stress the need for new approaches to make better use of these data. It has been shown that reanalyzing large repositories of microarray data can identify profiles of differential expression that are highly predictive of gene associations to human diseases [68,69]. By ranking genes by the ratio of experiments showing differential expression across many conditions, Chen et al. [68] were able to rediscover disease genes with 79% specificity and 37% sensitivity and proposed using this criterion to prioritize candidates resulting from GWAS associations. More recently, Gorlov et al. [69] found that the top genes by differential expression between normal and cancer tissue from human prostate are enriched for the same functional categories as the top candidate genes in GWAS and that strength of association in both tests was correlated. A similar approach could be employed in model organisms by reanalyzing results from a large number of microarray studies in inbred and congenic strains to increase both power and significance of genotype-phenotype associations. One could prioritize genes that are repeatedly differentially expressed between mouse genotypes that show similar phenotypic differences. Furthermore, such an approach would benefit from the availability of dense maps of SNP by using modern statistical tools to associate phenotypes to haplotypes produced by historical recombinations while accounting for genetic background and population structure [70]. Results from such analysis could be used to confirm and refine the position of eQTL from GG studies using F2 or backcrosses.
Using meta-analysis of large collections of microarray data to prioritize QTL candidates in rodents can present several advantages over similar approaches in humans. Linkage disequilibrium (LD) in humans can extend large distances; it is affected by population structure and history and can even reach across multiple chromosomes (see [71] and references therein). The situation is further complicated in case control designs where environmental factors significantly contribute to variation in gene expression [72]. In addition, technical factors such as time of preparation of samples for different populations have been proposed to explain some of the wide range of differences in gene expression observed between Hap-Map populations [55,73]. Environmental factors can be more tightly controlled in experimental populations, although differences between laboratories do have a significant effect on microarray results and must be considered [74]. Population structure is also present in laboratory mouse strains, and the contribution of different lines of ancestry is unequal across different regions of the genome [75], which would affect analyses of expression data across multiple strains. However, new populations can be designed to remove the effects of population structure. One population of note is the Collaborative Cross [76], which promises to greatly reduce LD and population structure (EJ Chesler, personal communication), and since it will be a panel of recombinant inbred strains, a large volume of phenotype and gene expression data is expected to accumulate over time. We think that these resources and the development of better methods for data analysis will greatly improve the success of using microarray data to dissect complex traits in rodents.

Conclusions
A review of 37 studies from the literature that have applied the QTL/microarray showed that this approach is effective in reducing the list of candidate genes, with an average proportion of 1.9% candidate genes being differentially expressed across experiments. However, a meta-analysis of published results showed no significant overrepresentation of positional candidate genes among those selected in 70% of the studies. Most of the studies that did show enrichment of candidates were comparisons between congenic and background strains. Lack of standards in analytical methods for testing differential expression testing as well as a tendency to apply multiple criteria for probe selection was observed. Our analysis showed that no increase in enrichment is gained from this technique. In three reviewed studies, filtering by differential expression led to the identification of a QTG gene, where only a couple of genes were reported to be differentially expressed [26,27,77]. However, other studies with similar filtering ratios did not have such a favorable outcome (Table 3), and therefore luck played a role in determining the outcome in these studies.
By performing three independent congenic microarray experiments, we found high enrichment of genes within donor regions among differentially expressed genes. Within the limits of the donor regions, no clustering of differentially expressed genes to any particular pQTL region was observed, but a rather homogeneous distribution once overall gene density is accounted for. The high genetic divergence between C57 and CAST caused only~6-15% of genes in the donor regions to fall in IBD blocks and genes within these blocks were less likely to be differentially expressed on chromosome 2 but not on chromosomes 11 or 17. On the basis of this and previous findings, IBD was not used to filter candidate genes. Furthermore, lack of genome coverage from the microarray used and removal of probes overlapping SNP excluded~30% of positional candidates from differential expression testing. Overall, differential expression testing resulted in a reduction of the number of candidates by~60-70%, leaving~300-500 genes per donor region that need further testing. Therefore, our power to refine lists of candidate genes within donor regions from microarray data was rather poor. In addition, small sample size in our experiment restricted identification of trans eQTL. We expect that the same is true in the reviewed studies, contributing to the overrepresentation of pQTL candidates in some of them, resulting in a large number of candidates and making experimental validation impractical.
We showed that cis eQTL can be uninformative about QTG because they can show allelic bias toward higher expression of the reference genome. In our data, we found that this bias was partially explained by SNP on the binding region of probes, but that lack of full sequence of the CAST genome at the time of our analysis did not allow complete removal of this effect. Even with the availability of full CAST genome sequence, the limited number of probes in the platform used restricts the possibility of eliminating all probes matching SNP.

Collection of Microarray Studies
Compilation of studies using the QTL/microarray approach in mice and rats was performed by retrieving all citations that matched the Entrez query "(gene expression OR microarray) AND (QTL OR complex trait)" in PubMed [78], resulting in a list of 588 references. This list was manually curated by reading titles and abstracts and by keeping only studies that reported original results from using microarrays to identify candidates that cause differences in complex phenotypes in mouse and rat.

Meta-analysis of QTL/Microarray studies
The analysis examined the overrepresentation or underrepresentation of genes within the pQTL or target regions that were selected in the reviewed studies. The information relevant for the objective of this analysis was (1) the number of genes under pQTL peaks, i.e., confidence limits for the position of pQTL, (2) number of genes in the genome being represented in the microarrays platform used, (3) number of differentially expressed genes, and (4) fraction of selected genes that are located within the confidence interval for the pQTL. In cases where microarray profiling was performed on congenic versus background strains, every gene located in the donor region of the congenic strain was considered a positional candidate.
Relevant information from all studies is shown in Tables 1 and 2 and was extracted as follows. The number of genes under a pQTL, if provided, was extracted from the text. Otherwise, it was estimated from the UCSC Genome Browser [79] by querying all known genes between the flanking markers for the pQTL confidence intervals; if flanking markers were not provided, physical or genetic confidence interval limits were obtained from the text or inferred from figures by visual inspection. Last, if no explicit confidence limits were plotted, a 1.5 drop in LOD from the pQTL peak was used. If confidence limits were only available as genetic positions, cM were transformed to Mbp using the results from a high-density SNP mapping experiment [80]. For experiments on congenic strains, flanking markers of the donor regions where used. NCBI mouse genome build 37 and rat build 3.4 were used to locate genetic markers in the physical genomic coordinates. The results of data collection are summarized in Table  3. The numbers of candidate genes in Table 3 are the totals for all pQTL considered in each study and should be regarded as approximate estimates of the real numbers.
Significance of over or under representation of pQTL candidate genes was assessed by a Fisher's exact test with the null hypothesis (H o ): candidate genes genes in reference set selected candidat = e e genes selected genes , where "candidate genes" refers to genes located within pQTL confidence intervals or donor congenic regions (depending on the experimental design of the microarray experiment), "genes in reference set" is the total number of genes considered, "selected candidate genes" is the number of genes differentially expressed in the candidate region, and "selected genes" is the total number of differentially expressed genes. In other words, under the null hypothesis, the detection of differentially expressed genes is assumed as a process of randomly sampling genes from a pool that includes genes within and outside the target region. Therefore, the fraction of target genes in the selected sample is expected to be equal to the fraction in the gene set that is used as a reference. The ratio of these two ratios is called the Ratio of Odds (OddsRatio). A Fisher's exact test for overrepresentation (OddsRatio > 1) or underrepresentation (OddsRatio < 1) of candidate genes in the list of differentially expressed genes was performed and P values were calculated from the hypergeometric distribution by using the fisher.test function in R. Two overrepresentation tests were performed by either using the total number of genes in the genome or the number of probes in the microarray. The number of genes in the genome was obtained from the assembly statistics at the Ensembl genome browser [81,82] by adding counts for "Known protein-coding," "Projected protein-coding," and "Novel protein-coding" gene categories (26,404 M. musculus genes, Build 37; 22,503 R. norvegicus genes, Build 3.4). The second test, using number of probes in the microarray, is intended to control for the effect of different levels of genome coverage by microarrays. An ideal reference would only consider genes that are included in the microarrays. However, because of nonexistent or obsolete annotations for some platforms, this information is not always known and we used the total number of probesets as an approximation.

Experiment using three congenic strains
Three congenic mouse strains were profiled with microarrays and were analyzed using a QTL/microarray approach to identify candidate genes that regulate obesity traits: HG2D (HG.CAST-(D2Mit329-D2Mit457)), HG11 (HG.CAST-(D11Mit260-D11Mit255, MGI reference: 3771218), and HG17 (HG.CAST-(D17Mit196-D17Mit190); MGI reference: 3771215) [53]. All these congenic strains isolate CAST/EiJ (CAST) alleles in a C57BL/6J hg/hg (C57) background and bare the hg deletion in the high growth locus on chromosome 10 [83,84]. Animals for microarray profiling were generated from an F 2 intercross between congenic males and C57BL/6J control females. Mice were weaned at 3 weeks old, housed in age-and sex-matching cages with five or fewer animals per cage. All animals were fed a standard Purina Formulab Chow 5008 diet and killed at 9 weeks old. Nonrecombinant animals for the congenic region that were homozygous for congenic or background alleles and hg/hg for the high growth locus were selected. Brain, liver, and gonadal white adipose tissues were collected from four biological replicates, snap frozen in liquid nitrogen, and stored at -80°C. All samples were obtained from males except for adipose tissue in HG2D. All mouse protocols followed the guidelines of the American Association for Accreditation of Laboratory Animal Care [85].
RNA was purified, prepared, and hybridized using manufacturers' protocols. Briefly, total RNA was extracted with TRIzol reagent (Invitrogen, Carlsbad, CA), DNase treated (TurboDNA, Ambion Inc., Austin, TX), and cDNA was generated by reverse transcription using Total Prep RNA Amplification Kit (Illumina Inc., San Diego, CA). cDNA was labeled with Biotin-16-UTP and hybridized to Mouse-6 V1 Illumina BeadArrays at the Gene Expression Core facility of the University of California Davis Genome Center (Davis, CA). BeadArrays were scanned and features were extracted using BeadStudio V. 1.5.1.3 (Illumina Inc., San Diego, CA). Local background correction was done at scanning using default values. Bead level data was summarized by removing outliers greater than 3 median absolute deviations (MADs) from the median and by calculating the mean and variance of the remaining beads. Summarized intensity values were imported into the R 2.9.2 language/environment for normalization and analysis [86].
The probes in Mouse-6 array were gene-annotated inhouse to ensure current coverage of the mouse genome. Probe sequences were aligned to the mm9 mouse genome (Genome Build 37) obtained from the UCSC Genome Browser [87]. Sequence alignment was performed with stand-alone BLAT [88]. Probes mapping to multiple locations or with gaps larger than 10 Kb were not annotated. The resulting alignments were overlaid to, in this order, NCBI's RefSeqs [89], mouse mRNA, and human proteins mapped to the mouse genome by UCSC [79]. Intensity values were normalized with a quantiles the affy package from Bioconductor [90]. Probes were filtered by present calls (P < 0.01) in four or more samples from any given tissue. A total of 72 samples (3 strains × 2 genotypes × 3 tissue × 4 replicates) were hybridized to 12 Mouse-6 chips. Differential expression was tested by fitting a cell-means linear