A gene expression atlas of the domestic pig
- Tom C Freeman1Email author,
- Alasdair Ivens2, 6,
- J Kenneth Baillie1,
- Dario Beraldi1, 7,
- Mark W Barnett1,
- David Dorward1,
- Alison Downing1,
- Lynsey Fairbairn1,
- Ronan Kapetanovic1,
- Sobia Raza1,
- Andru Tomoiu1,
- Ramiro Alberio3,
- Chunlei Wu4,
- Andrew I Su4,
- Kim M Summers1,
- Christopher K Tuggle5,
- Alan L Archibald1Email author and
- David A Hume1Email author
© Freeman et al; licensee BioMed Central Ltd. 2011
Received: 16 October 2012
Accepted: 23 October 2012
Published: 15 November 2012
This work describes the first genome-wide analysis of the transcriptional landscape of the pig. A new porcine Affymetrix expression array was designed in order to provide comprehensive coverage of the known pig transcriptome. The new array was used to generate a genome-wide expression atlas of pig tissues derived from 62 tissue/cell types. These data were subjected to network correlation analysis and clustering.
The analysis presented here provides a detailed functional clustering of the pig transcriptome where transcripts are grouped according to their expression pattern, so one can infer the function of an uncharacterized gene from the company it keeps and the locations in which it is expressed. We describe the overall transcriptional signatures present in the tissue atlas, where possible assigning those signatures to specific cell populations or pathways. In particular, we discuss the expression signatures associated with the gastrointestinal tract, an organ that was sampled at 15 sites along its length and whose biology in the pig is similar to human. We identify sets of genes that define specialized cellular compartments and region-specific digestive functions. Finally, we performed a network analysis of the transcription factors expressed in the gastrointestinal tract and demonstrate how they sub-divide into functional groups that may control cellular gastrointestinal development.
As an important livestock animal with a physiology that is more similar than mouse to man, we provide a major new resource for understanding gene expression with respect to the known physiology of mammalian tissues and cells. The data and analyses are available on the websites http://biogps.org and http://www.macrophages.com/pig-atlas.
Keywordspig porcine Sus scrofa microarray transcriptome transcription network pathway gastrointestinal tract
The comprehensive definition of the mammalian transcriptome has altered our view of genome complexity and the transcriptional landscape of tissues and cells. Systematic analysis of the transcriptome is of central interest to the biology community, but global coverage was not possible until the complete sequencing of the human and mouse genomes and the advent of microarrays. The pioneering work by Su et al. [1, 2] provided the first comprehensive analysis of the protein-encoding transcriptome of major organs of human and mouse. Others have used microarrays or alternative methods to map expression in specific tissues or cell types [3–7]. The work of the FANTOM and ENCODE projects has revealed the true complexity of the mammalian transcriptome, highlighting the impact of alternative initiation, termination and splicing on the proteome, and the prevalence of multiple different classes of non-coding RNAs (ncRNAs) [8–11]. The pace of data acquisition has continued to grow with the increasing reliability and decreasing cost of the core technologies such as microarrays and the sequencing of RNA (RNAseq). Despite these efforts, knowledge of the human transcriptional landscape is still sparse. Efforts to curate and analyze an 'atlas' from the existing human microarray data are hindered by the fact that certain types of samples have been analyzed extensively, for example hematopoietic cells and cancers, while little or no data are available for many other tissues and cell types . Studies of the non-pathological human transcriptome are compromised further because most tissues can only be obtained post-mortem, the provenance of samples can be variable and the health status of the individual from whom they were obtained is often unknown.
With numerous predicted mammalian protein-coding loci still having no informative functional annotation and even less insight into the function of the many non-protein-coding genes, detailed knowledge of a transcript's expression pattern can provide a valuable window on its function. Previously, we have used coexpression analysis of large mouse datasets to provide functional annotation of genes, characterization of cell types and discovery of candidate disease genes [13–16]. Isolated cell types may differ not only in their specialized function but also in their engagement with 'housekeeping' processes, such as growth and proliferation, mitochondrial biogenesis and oxidative phosphorylation, metabolism and macromolecule synthesis, the cytoskeleton, the proteasome complex, endocytosis and phagocytosis. Genes coding for proteins within pathways, both generic and cell-specific, often form coexpression clusters , so one can infer the function of a gene of unknown function from the transcriptional company it keeps, by applying the principle of guilt-by-association. The identification of coexpression clusters can, in turn, inform the identification of candidate genes within genomic intervals associated with specific traits from genome-wide association studies (GWAS) or classical linkage studies. For example, we identified a robust cluster of genes that is expressed specifically in cells of mesenchymal lineages in the mouse [14–16]. The cluster contained a large number of genes previously shown to be causally associated with inherited abnormalities of the musculoskeletal system in humans [14–16]. By inference, other genes within this cluster that have less informative annotation or no known function, are likely to be involved in musculoskeletal development. As noted previously , the conservation of coexpression clusters can provide an even more powerful indicator of likely conserved function. These authors mapped coexpressed clusters onto 850 human Mendelian disease loci of unknown molecular basis from Online Mendelian Inheritance in Man (OMIM) and identified 81 candidate genes based upon their conserved restricted expression within the affected organ.
The domestic pig (Sus scrofa) is economically important in its own right, and has also been used increasingly as an alternative model for studying human health and disease and for testing new surgical (including transplantation) and pharmacological treatments (reviewed in [18, 19]). Compared to traditional rodent models, the pig is more closely-related to humans in its size, growth, development, immunity and physiology as well as its genome sequence . The translation of preclinical studies in rodents into clinical applications in humans is frequently unsuccessful, especially for structures where rodents have very different anatomy and physiology, such as the cardiovascular system [21, 22]. The recently released pig genome sequence (Sscrofa10.2, ftp://ftp.ncbi.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Sus_scrofa/Sscrofa10.2/)  and associated annotation will greatly enhance the utility of the pig as a model . However, compared to the mouse, the knowledge of the pig transcriptome is very limited partly due to a lack of commercial expression microarrays with comprehensive gene coverage . While several EST (Expressed Sequence Tag) sequencing projects have explored gene expression across a range of tissues [26–28], a systematic global study of the tissue expression landscape is not available. Here we present a new microarray platform for the pig with greatly improved gene coverage and annotation. We have used this array to generate an expression atlas for the pig, comparable to the human/mouse expression atlases, and, using advanced visualization and clustering analysis techniques, we have identified networks of co-expressed genes. A detailed analysis of the porcine gastrointestinal tract illustrates the power of the analytical approach and data. These data will support improved annotation of the pig and human genomes and increase the utility of the pig as a model in medical research.
Results and discussion
The pig is uniquely important both as a major source of food and an animal model for human disease. Until recently the lack of a genome sequence for the pig and consequently many of the functional-genomic analysis tools, have limited the kind of analyses now routine in human and mouse systems. Here we report the design, annotation and validation of a new comprehensive microarray for the analysis of gene expression in the pig and a first attempt to produce a global map of the porcine protein coding transcriptome.
Comparison of Affymetrix arrays designed for analysis of the pig transcriptome.
Porcine Genome Array
UniGene Build 28 (2004)
Genome build 9
Genome build 9
Number of probes
Number of probesets
Includes non-coding RNAs
Arrays still provide a very cost effective solution for producing a large amount of high quality gene expression data. In terms of speed of data acquisition and availability of established analysis routines that can be run on desktop machines, arrays still have many advantages over sequencing-based analyses. With improvements in the assembly and annotation of the genome and gene models and RNAseq analyses increasing our knowledge of the transcriptional landscape of the transcriptome, there is no doubt the current array design will be enhanced.
The primary cohort of animals used for this study was a group of three- to four-month old juvenile pigs of both sexes. We aimed to gather samples of every major pig tissue. Where possible biological replicates were analyzed that originated from different animals of each sex. Regional analysis of the brain is clearly important, and more feasible in pigs than in mice, but the method of killing (cranial bolt) meant that detailed dissection of brain was not possible. The age/stage of the animals also meant that certain tissues could not be collected and the panel of tissues was supplemented by samples of placenta and a mature testis (since these are major sites of tissue restricted gene expression) [1, 2]. Since macrophages have proved to be one of the most complex sources of novel mRNAs , we included a number of macrophage samples (with or without lipopolysaccharide (LPS) stimulation) in the atlas. For details of the tissues and cells used for this study see Additional file 1, Table S1.
BioLayout Express3D [30, 31] is a unique tool in the analysis of large complex expression datasets. The statistical approach employed centers on the principle of coexpression, based on the transcript-to-transcript comparison of the expression signal across the samples analyzed, by calculation of a Pearson correlation matrix. For any given comparison, the Pearson value can range from +1 (perfect correlation) to -1 (perfect anti-correlation). The correlation and clustering algorithms within BioLayout Express3D, together with the ability to visualize and explore very large network graphs, mean that it is uniquely positioned for the analysis of large datasets and has been used extensively for this purpose [14, 16, 32–34]. A graph derived from a given correlation cut-off value includes only those genes that are related in expression to others above the selected threshold and more or less complex graphs may be analyzed by decreasing or increasing this value, respectively. Core topological structures that often form separate graph components at high thresholds are robust and are maintained as correlation cut-off values are lowered.
List of 50 largest network clusters and association with particular tissue/cells/pathway.
Number of transcripts
Alveolar macs>>other macs
LPS-induced (high in other tissues)
Immune response (IFN)
Immune response (LPS)
MHC class I
Small intestine (jej./ileum)>blood-spleen-LN
Blood-immune organs>GI tract (immune)
CNS-highest in cortex
CNS-high in spinal cord
SI epithelium (enterocyte)
GI tract>>gall bladder
Salivary gland acinar cell
General-low in macs/CNS
Smooth muscle (high in many)
Steroid hormone biosynthesis
Fallopian tube>adult testis
Many tissues-highly variable
General but not even
General but not even
General but not even-highly expressed
General but not even
General but not even
General but not even
General, relatively even
House keeping (HK1)
General, relatively even-low in macs
House keeping (HK2)
General, relatively even
House keeping (HK3)
General, relatively even
House keeping (HK4)
General, relatively even
House keeping (HK5)
Spinal cord 1 rep only (tech artefact)
Several of the largest clusters showed relatively little tissue specificity in their expression and might be considered to be 'housekeeping' genes since the proteins they encode are likely to be functional in all cell types. Such clusters are a common feature of large correlation graphs where a relatively low threshold has been employed. Genes/probes with limited informative nomenclature were over-represented in these clusters, perhaps reflecting previous research focus on genes that demonstrate tissue-restricted expression profiles . Aside from these large, nondescript clusters, the majority of the coexpression clusters were made up of transcripts that have a distinct tissue/cell restricted expression pattern. In each case, the cluster was named based upon the tissue/cell(s) in which the genes were most highly-expressed. These data recapitulate many of the known tissue restricted expression patterns that have been described for human and mouse [1, 2]. For example, there were multiple large clusters of genes with strong expression in the macrophage samples with a subset more highly-expressed in the alveolar macrophages and another set induced by LPS. Each of these clusters contained genes for numerous well-studied macrophage surface markers and receptors, and proinflammatory cytokines. A detailed comparative analysis of human and pig macrophage gene expression has been reported elsewhere . The present analysis did not identify the single large phagocytosis/lysosome functional cluster that was evident in the analysis of mouse primary cell data [14, 32]. This cluster tends to be broken up when tissue samples are included in the analysis because many of the components of this system are utilized more generally in vesicle-trafficking and in other pathways.
A secondary feature of the network graph is that clusters with similar expression patterns formed neighborhoods (Figure 2). For instance, clusters of genes selectively expressed in the reproductive tract, gastrointestinal tract, central nervous system (CNS), mesenchymal-derived tissues, dermal tissues or blood cells tended to occupy similar areas. In this way the graph distributed the transcriptome into groups of genes associated with tissues composed of cells of different embryonic lineages.
Because cells and tissues differ in their engagement with fundamental biochemical processes, the graph also contained clusters that grouped together genes associated with a particular cellular process (pathway) which may be active in a wide range of tissues albeit not at the exact same level. Examples include clusters enriched for ribosomal (clusters 50, 65, 79 and 184), cell cycle (cluster 14) and oxidative phosphorylation (clusters 27 and 99) genes. The clusters of ribosomal genes form a separate graph component which together contain 106 transcripts (approximately 94 genes), including at least 37 known ribosomal protein genes (others appear in the list but are annotated with LocusLink (LOC) gene identifiers), genes for eukaryotic translation initiation factors (EEF1B2, EIF3E, EIF3H), two members of the RNaseP complex, NACA (nascent polypeptide-associated complex alpha subunit), U1 and U4 small nuclear ribonucleoproteins and at least 23 small nucleolar RNAs (snoRNAs). snoRNAs function to guide modifications of other RNAs, particularly ribosomal protein mRNAs , consistent with their co-clustering with components of the ribosome complex. Different tissues also vary in their rates of cell renewal and consequently in the proportions of proliferating cells. Genes involved in the cell cycle, therefore, have a pattern of expression that reflects the mitotic activity of the tissues and such genes are readily identified in the graph. Cluster 14 contains many genes for proteins known to be involved in the cell cycle (GO term enrichment analysis of this cluster returned P-values of 5.2 × 10-60 for 'cell cycle' and 2.9 × 10-51 for 'mitosis') and supports the involvement of other cluster 14 genes in this pathway. For example, the cluster includes vaccinia-related kinase 1 (VRK1) shown recently to play a role in the control of mitosis , highlighting the importance of our approach for annotation of uncharacterized genes.
Genes associated with the oxidative phosphorylation pathway present in clusters 27 and 99.
ACO2, CS, FH, IDH2, IDH3B, MDH2, SUCLG1
Oxidative phoshorylation, Complex I
NDUFA1, NDUFA10, NDUFA12, NDUFA8, NDUFA9, NDUFAB1, NDUFB1, NDUFB2, NDUFB3, NDUFB5, NDUFB6, NDUFB8, NDUFB9, NDUFC1, NDUFC2, NDUFS1, NDUFS2, NDUFS6, NDUFV2, NDUFV3
MT-ND1, MT-ND2, MT-ND3, MT-ND4, MT-ND4L,
Oxidative phoshorylation, Complex II
Oxidative phoshorylation, Complex III
CYC1, UQCR10, UQCRB, UQCRC1, UQCRFS1, UQCRH
Oxidative phoshorylation, Complex IV
COX4I1, COX5B, COX6B, COX6C, COX7B2
MT-CO1, MT-CO2, MT-CO3
Oxidative phoshorylation, Complex V
ATP5A1, ATP5C1, ATP5F1, ATP5G1, ATP5G3, ATP5J2, ATP5H
Cytochrome C biosynthesis
Fatty acid (long chain) beta-oxidation
ACADVL, GOT2, HADHA, HADHB, PTGES2
Mitochondrial membrane transport
CHCHD3, NNT, SAMM50, TIMM8B, TOMM7, TUFM, VDAC1
Mitochondrial RNA processing
SLIRP, MRPL2, MRPS24
COQ6, COQ7, COQ9
BOLA3, BRP44, CHCHD10, GBAS
C11orf67, C6H4orf52, IMMT, LOC100060661, LOC100512781, LOC100520866, LOC100523804, SS18L2, WDR45
Gm8437, LOC100512762, LRP1B, MTRNR2L4
To validate the GI-specific analysis, we initially selected a number of gene families/classes where expression is known to be specific to certain cell populations in other mammals [see Additional file 5, Figure S1]. Keratins are structural proteins that distinguish different classes of epithelial cells . We looked at eight keratin gene family members (Figure S1a). All but KRT8 and KRT19 were heavily expressed in the tongue, KRT5, KRT13 and KRT78 were also expressed in the lower esophagus and fundus, both of which are lined with a stratified squamous epithelium. KRT8 and KRT19, markers of columnar epithelium [42, 43], showed the anticipated inverse pattern, with strong expression in the salivary gland, antrum and along the entire length of the small and large intestine. To confirm region-specific epithelial function, we examined the expression of four well-characterized brush border hydrolases: lactase (LCT), sucrose-isomaltase (SI), aminopeptidase N (ANPEP) and dipeptidyl-peptidase 4 (DPP4) (Figure S1b). LCT is responsible for the enzymatic cleavage of the milk sugar lactose and was detected in the duodenum and jejunum but not in the ileum. SI expression was low in the duodenum and peaked in jejunum, with lower expression in the ileum. ANPEP and DPP4 were expressed all along the small intestine. DPP4 was also highly expressed in the salivary gland and in the distal colon. These observations fit the known expression patterns for these genes in post-weaned rabbits . Associated with the role of the intestine in nutrient uptake, there were a large number of solute transporters included in the GI tract data (86 members of the SLC family alone), and many showed region-specific expression patterns consistent with their known functions (Figure S1c). For example, ferroportin (SLC40A1), a protein involved in iron export from duodenal epithelial cells and found to be defective in patients with iron overload [44, 45], was restricted to duodenum. The expression of the enterocyte sodium/glucose cotransporter (SLC5A1) was restricted to the small intestine, expression levels peaking in the jejunum  and the chloride transporter of apical membrane of columnar epithelium of the colon (SLC26A3)  which when mutated results in congenital chloride diarrhea, was largely restricted to the large bowel samples. Other cell-specific 'marker' genes, for example, mucins (salivary gland: MUC12, MUC19; stomach: MUC1, MU5AC; colon: MUC4), gut hormones (stomach: GKN1, GKN2; duodenum: CCK, GKN3, MLN), lymphocyte markers (T cell: CD2, CD3D/E, CD8A; B cell: CD19, CD22, CD79A/B, CD86), myosins (smooth muscle: MYL6, MYL9; skeletal muscle: MYL1, MYL3, MYL4) and collagens (connective tissue: COL1A1, COL1A2, COL5A1, COL6A1) were also enriched in samples where they would be expected (Figures S1d-h, respectively).
Cluster analysis summary of transcripts expressed in a region-specific manner along the porcine GI tract.
GI clusters r = 0.9, MCL1.7
Unique gene IDs
Cluster Expression Profile Description
Stratified squamous epithelium
Stratified squamous epithelium
General-higher in fundus/intestine
Stratified columnar epithelium
Colon epithelial specific function
Colon epithelial specific function
Colon epithelial specific function
Mucous neck/gastric glands
Complement (crypts/goblet cells)
B cell/cell cycle
Plasma B cell
High in intestine (small>large)/fundus
High in intestine (small>large)/fundus
Stomach/intestine-variable between animals
Stomach/intestine-variable between animals
General-higher in stomach/intestine
General-higher in stomach/intestine
MHC class 1 antigen presentation
General-higher in intestine/stomach
MHC class 2 antigen presentation (immune)
General-higher in pylorus>antrum
Smooth muscle/ECM (Fibroblast)
General-higher in tongue/u-esophagus/antrum/pylorus
General-higher in ileum, rectum
General-higher in small intestine
General-higher in muscle
In analyzing these data we have attempted to relate the clusters to the cell composition of the GI tact, based on the gene membership of clusters and their expression pattern. The different samples varied significantly in their muscle content, so some of the largest clusters contained muscle-specific genes. GI-cluster 4 was enriched for genes known to be expressed specifically in skeletal muscle and were highly expressed in the tongue and esophageal samples (Figure 5b). In contrast, the genes in GI-cluster 2 were highly expressed throughout the GI tract, peaking in the pylorus sample. The cluster contained not only genes associated with smooth muscle but also many extra-cellular matrix (ECM)-associated genes identified previously from mouse data [15, 48]. Expression of these genes was shared with other mesenchymal lineages (fat, adipose, bone) and they formed a separate cluster in the whole atlas data. GI-cluster 9 sits between GI-clusters 2 and 4 and comprises a set of genes expressed in both muscle types. Another cluster in this region of the graph (GI-cluster 17) contained many of the genes associated with oxidative phosphorylation (as discussed above) with a number of interesting and plausible new additions to this pathway. Finally, GI-cluster 10 genes were highly-expressed in the pylorus sample. The cluster contained numerous neuron-associated genes and may derive from neuronal/supporting cells that make up the enteric plexus. Although the motile and hormonal activity of the GI tract is controlled by a complex nervous system, neurons actually represent only a small percentage of the cells that make up the organ. Hence, their expression signature would appear to be relatively weak compared with other cell types.
The GI tract is also a major immune organ. It represents one of the main battle grounds in an animal's defense against invading pathogens because of the large surface area, the nutrient rich luminal environment and the requirement for a thin lining permeable to nutrients. It is, therefore, unsurprising that the largest cluster of genes (GI-cluster 1) contained many genes associated with the immune system, their expression being two- to three-fold higher in the ileum than other regions. The lower small intestine is known to be associated with increased immune surveillance and the presence of Peyer's patches (specialized lymphoid follicles associated with sampling and presentation of luminal antigens). The cluster analysis did not separate the immune cell types which are largely co-located in the lamina propria and lymphoid aggregates. Included in GI-cluster 1 were genes encoding many of the protein components of the B cell receptor complex (CD19, CD22, CD79A/B, CR2) but also numerous genes identified in the full atlas analysis as being expressed specifically by T cells or macrophages. Also evident in this cluster were many of the core components of the cell cycle, for example cyclins, DNA polymerases, kinesins, and so on, again identified in the whole atlas as a discrete cluster (atlas cluster 14). The association of cell cycle genes with an immune signature is most likely due to the high level of lymphocyte proliferation , which increases the proportion of cells undergoing mitosis relative to the rest of the organ. In the neighborhood of the main GI immune cluster were smaller clusters of immune-associated genes that were expressed in a distinct but related manner, perhaps connected to regional immune specialization. GI-cluster 20 contains many of the components of the T cell receptor complex (CD2, CD3D/E/G, CD8A) which could be aligned with the distribution of intraepithelial lymphocytes. The analysis also detected a small, heavily expressed cluster of plasma B cell genes (GI-cluster 39, high expression in salivary gland, stomach and along the length of the small and large intestines) and two small clusters of immune response genes (GI-clusters 27 and 33) that varied significantly in their level of expression between animals. Other clusters were enriched for MHC class 1 (GI-cluster 11) and class 2 (GI-cluster 22) antigen presentation pathway genes.
Although the lamina propria of the gut contains the largest macrophage population in the body , many of the macrophage-specific genes identified in the whole atlas were not detectable in GI-cluster 1. For each of the genes in the macrophage cluster as defined in the full atlas dataset, we calculated the ratio of their highest expression in macrophages to their highest expression across GI tract samples. The average ratio was around 5, suggesting that macrophages provide around 20% of the total mRNA yield from the gut. The genes that were under-expressed based upon this ratio were derived mainly from atlas cluster 18, the subset of macrophage-expressed genes that was enriched in alveolar macrophages. The most repressed was CYP7A1, the cholesterol-7-hydroxylase, which metabolizes bile acids. The other striking feature was the large number of genes for C-type lectins, including CLEC5A (MDL1), CLEC7A (dectin), CD68 (macrosialin), CLEC4D (MCL), SIGLEC1 (sialoadhesin), CLEC13D (MCR1, CD206), CLEC4E (mincle) and CLEC12B, that are highly-expressed in alveolar macrophages but appeared down-regulated in the GI tract. This pattern indicates that macrophages of the gut are distinct from those of the lung and blood, perhaps adapted to be hypo-responsive to food-derived glycoproteins where those of the lung must use the same receptors to recognize and engulf potential pathogens. The phenotype of lamina propria macrophages may also vary within different regions of the GI tract thereby breaking up their expression signature.
The epithelial layer exhibits a great diversity between different GI compartments, its structure and function changing in line with requirements. Many clusters correlated with the known region-specific expression of structural proteins and solute carriers described above. GI-clusters 3 and 8, containing specific keratin genes, are related to the stratified squamous epithelial populations that protect against abrasion and mechanical damage to the underlying tissues in the tongue and esophagus. Genes in GI-cluster 3 tended to be expressed in equal levels in the tongue and lower esophagus, whereas genes in GI-cluster 8 were more restricted in their expression to the tongue. These genes define the specific signature of stratified squamous epithelial populations present in this organ. Similarly GI-clusters 13 and 16 which were high in the salivary gland or along the entire length of the gut, respectively, likely represent genes specifically expressed in the stratified or ciliated columnar epithelium present in these organs. Among the columnar epithelium populations, which line the gut from the stomach to the rectum, there was region-specific differentiation, reflected by the differing levels of expression of genes along the longitudinal axis of the intestine and the presence of specific populations of glandular cells. Enriched in GI-cluster 5 were many transcripts (representing 251 unique gene IDs) that were expressed specifically in the small intestine and encode the machinery for the digestion and absorption of nutrients. In contrast, there were relatively few genes expressed specifically in the colon (GI-clusters 25 and 29, representing 37 unique gene IDs) and little evidence of functional compartmentalization of expression along that organ. Among these genes many matched the known markers of this tissue but others were novel. There are various glandular and endocrine cell populations that are integral to the columnar epithelial lining and in many cases have their origins in the same epithelial stem cell populations located at the base of the crypts. Because they inhabit specific niches within the GI tract, genes expressed specifically within them have a unique expression pattern. For this reason, we can assign the genes in GI-cluster 23 with some confidence to expression in the fundic glands, GI-cluster 18 genes to pyloric glands and GI-cluster 12 genes to mucous secreting superficial gastric glands. These assignments are also strongly supported by the gene membership of these clusters and the lists expand the complement of genes known to be expressed in these specialized glandular systems. The genes in GI-cluster 14 were likely expressed in glandular/endocrine cells present only in the duodenum. Finally, genes expressed in the salivary gland could be segregated to those expressed in serosal (GI-cluster 6) or mucosal (GI-cluster 15) acini. While both were exclusively expressed in the salivary gland they separate the two salivary gland samples, presumably due to chance sampling of different regions of the gland.
This work describes the first detailed analysis of the transcriptional landscape of the pig. Since the pig is a large animal with a physiology that is closer to man's than is that of mouse, this analysis provides a major new resource for understanding gene expression with respect to the known physiology of mammalian tissues and cells. At the single gene level, this dataset represents a comprehensive survey of gene expression across a large range of pig tissues. In instances where the expression of a gene is regulated in a tissue-specific manner it represents a good starting point for understanding its likely cellular expression pattern and, therefore, its functional role. The availability of the data on the BioGPS web portal renders the data amenable to such queries. However, it is the ability to understand the expression of a gene in the context of others that makes this analysis unique. Correlation analysis and the use of advanced network visualization and clustering techniques go beyond standard pairwise hierarchical approaches in defining coexpression relationships between genes. The approach used here allows us to capture and visualize the complexity of these relationships in high dimensional data, rendering large proportions of the data available for analysis. Using this network clustering approach we have been able to recapitulate known expression and functional relationships between genes as well as infer new ones based on guilt-by-association. The detailed analysis of the transcriptional landscape of the gastrointestinal tract provides the first comprehensive view of the regional specialization of this organ in a large animal, and has highlighted numerous candidate genes that may underlie genetic diseases of the human gastrointestinal tract such as colitis and cancer.
Design of the 'Snowball' array and annotation of the probesets
Porcine expressed sequences (cDNA) were collated from public data repositories (ENSEMBL, RefSeq, Unigene and the Iowa State University ANEXdb database) to create a non-overlapping set of reference sequences. A series of sequential BLASTN analyses, using the National Center for Biotechnology Information (NCBI) blastall executable, were performed with the -m8 option. The initial subject database comprised 2,012 sequences of manually annotated S. scrofa gene models from Havana provided by Jane Loveland (The Sanger Institute) on 29 July 2010, plus 21,021 sequences acquired using Ensembl BioMart Sscrofa (build 9, version 59 on 22 July 2010). For each iteration, query sequences that did not have an alignment with a bitscore in excess of 50 were added to the subject database prior to the next iteration.
35,171 pig mRNA sequences from NCBI, downloaded on 15 July 2010: 6,286 added to subject database
7,882 pig RefSeq sequences from NCBI, downloaded on 15 July 2010: 0 added to subject database (all RefSeq's were already represented in source 1)
43,179 pig Unigene sequences from NCBI, downloaded on 15 July 2010 (filtered to include only those longer than 500 bases): 10,125 added to subject database
121,991 contig sequences, downloaded from Iowa Porcine Assembly v1 (http://www.anexdb.orgt) on 30 July 2010 (filtered to include only those longer than 500 bases): 10,536 added to subject database.
2,370 miRNA sequences (pig, cow, human, mouse), downloaded from miRbase, 30 July 2010 (Release 15, April 2010, 14197 entries): all added without BLASTN analysis.
The final subject database comprised 52,355 expressed sequences.
To facilitate the design of array probes that were uniformly distributed along the entire length of transcripts, transcripts were split into several probe selection regions (PSRs), each of which was then the target for probe selection. The size of each PSR, typically around 150 nucleotides, was determined by the length of the input sequence, with the ultimate aim being to obtain 20 to 25 probes per transcript. Oligonucleotide design against the approximately 343,000 PSRs was performed by Affymetrix (High Wycombe, UK). In addition, standard Affymetrix controls for hybridization, labelling efficiency and non-specific binding were included on the array (a total of 123 probesets) together with complete tiling probesets for 35 porcine-related virus genome sequences (both strands, center-to-center gap of 17 nucleotides) for possible future infection-based studies. The final array is comprised of 1,091,987 probes (47,845 probesets) with a mean coverage of 22 probes/transcript.
For each query the hit with lowest e-value within each species was chosen.
Genes with e-value hits <1e-9 against Homo sapiens were annotated with HUGO (Human Genome Organization) Gene Nomenclature Committee (HGNC) names/descriptions; however, genes with matches starting with 'LOC' were not used.
Step 2 was repeated using in order: S. scrofa, Bos taurus, Pan troglodytes, Mus musculus, Canis lupus familiaris, Pongo abelii, Equus caballus, Rattus norvegicus, Macaca mulatta.
Step 3 was repeated using any other species (in no particular order) to which a hit could be obtained.
For the remaining probes LOC gene annotations were used from (in order of priority): H. sapiens, S. scrofa, B. taurus, P. troglodytes, M. musculus
Everything else was used, in no particular order.
Out of 47,845 sequences represented on the array, 27,322 probesets have annotations that correspond to a current (15 December 2011) HGNC symbol for human protein coding gene, 14,426 of which are unique (out of a total 19,219 listed by HGNC). The remaining probesets were annotated with the information available for those sequences. The array design has been submitted to ArrayExpress (AcNo. A-AFFY-189).
Tissues and cells
The majority of fresh tissue samples were obtained from young Landrace pigs (one male, three female 12- to 16-weeks old) that were being sacrificed for another study examining normal expression patterns in hematopoietic cell lineages. Pigs were sedated with ketamine (6 mg/kg) and azaperone (1 mg/kg), left undisturbed for a minimum of 15 minutes, and then killed by captive bolt. Tissues were dissected and a small piece immediately snap-frozen on dry ice and stored in a -155°C freezer until RNA extraction. All tissues were collected within a window of 10 to 90 minutes following the death of the animal. Samples of adult testis (Large White-Landrace-Duroc cross, eight- years-old) and placenta (Large White-Landrace cross, gestation day 50) that were not obtainable from the young animals were collected separately. Samples of blood and three different macrophage populations were also obtained from other animals. Blood samples were collected by jugular venepuncture of 8- to 12-week old Landrace males and 3 ml was placed in Vacuette Tempus Blood RNA tubes (Applied Biosystems, Warrington, UK) and stored at 4°C until RNA extraction. Alveolar macrophages were collected from the same animals by washing the left caudal/diaphramatic lung lobe with PBS (using 200 to 250 ml) followed by centrifugation of the bronchoalveolar lavage fluid at 800 g for 10 minutes; the supernatant (alveolar wash fluid) was retained. The alveolar macrophages were washed once with PBS prior to analysis. Bone marrow- (BMDM) and monocyte-derived macrophages (MDM) were generated from primary monocytes. A total of 400 ml of blood was collected together with five posterior ribs from each side of male Large White-Landrace pigs of 8- to 12-weeks of age. The buffy coat (after spinning the blood for 15 minutes at 1200 g) was mixed to one volume of RPMI and separated on a Ficoll gradient (Lymphoprep, Axis-Shield, Norway) for 25 minutes at 1,200 g. Peripheral blood mononuclear cells (PBMC) were then washed twice (10 minutes at 600 g, then 10 minutes at 400 g) with PBS. Bone-marrow cells (BMC) were isolated and cryopreserved at -155°C as previously described . Both BMC and PBMC were thawed and derived into macrophages in the presence of recombinant human CSF-1 for five to seven days. BMDM and MDM were then treated with LPS from Salmonella enterica serotype Minnesota Re 595 (L9764, Sigma-Aldrich, Saint-Louis, USA) at a final concentration of 100 ng/ml and RNA was collected at 0 and 7 hours.
Total RNA was extracted using the RNeasy kit as specified by the manufacturer (Qiagen Ltd, Crawley, UK). RNA concentration was measured using ND-1000 Nanodrop (Thermo Scientific, Wilmington, USA). The quality was assessed by running the samples on the RNA 6000 LabChip kit (Agilent Technologies, Waldbronn, Germany) with the Agilent 2100 bioanalyzer. A total of 500 ng of total RNA was amplified using the Ambion WT Expression Kit (Affymetrix). A total of 5.5 µg of the resulting cDNA was fragmented and labelled using the Affymetrix Terminal Labelling Kit. The fragmented and biotin labelled cDNA was hybridized to the Snowball arrays, using the Affymetrix HybWashStain Kit and Affymetrix standard protocols. The fluidics protocol used was FS_0001. In total, 111 arrays were run on samples derived from 65 tissue/cell types.
All animal care and experimentation was conducted in accordance with guidelines of The Roslin Institute and the University of Edinburgh and under the Home Office project licence number PPL 60/4259.
Data quality control and analysis
The quality of the raw data was analyzed using the arrayQualityMetrics package in Bioconductor (http://www.bioconductor.org/) and scored on the basis of five metrics, namely maplot, spatial, boxplot, heatmap and rle in order to identify poor quality data . Arrays failing on more than two metrics, were generally removed. However, in a number of cases after examining the data, particularly from a number of the macrophage samples, it was considered that their poor quality control (QC) score was down to the samples being significantly different from the others but not of poor quality. RNA samples from the pancreas were partially degraded and consequently these data were scored as being of a lower quality, but were left in the final analysis due to yielding a cluster of pancreatic marker genes. A further QC step involved the creation of a sample-sample correlation network where edges represented the Pearson correlation value and nodes the samples [see Additional file 10, Figure S3]. In a number of cases samples clearly did not group with similar samples, indicating a likely error at the point of collection or during processing and these samples were removed from the analysis. Details of the tissues/cells used in this study are given in Additional file 1, Table S1.
Following QC, data from 104 arrays run on samples derived from 62 tissue/cell types were normalized using the robust multi-array average (RMA) expression measure . In order to make these data accessible all raw and normalized data have been placed in ArrayExpress (AcNo. E-MTAB-1183) and the expression and graph layout files have been made available to support future graph-based analyses using BioLayout Express3D [see Additional files 2 and 3]. Furthermore, the data have been uploaded onto the BioGPS web site (http://biogps.org)  enabling the search for a profile of an individual gene and those correlated with it. This site also supports mouse and human atlas datasets allowing the direct comparison of gene expression profiles across species. Following data normalization, samples were ordered according to tissue type and the dataset was saved as an '.expression' file and then loaded into the network analysis tool BioLayout Express3D , as described previously . A pairwise Pearson correlation matrix was calculated for each probeset on the array as a measure of similarity between the signal derived from different probesets. All Pearson correlations with r ≥0.7 were saved to a '.pearson' file and a correlation cut off of r = 0.8 was used to construct a graph containing 20,355 nodes (probesets) and 1,251,575 edges (correlations between nodes above the threshold). The minimum sub-graph component size included in the network was five. Graph layout was performed using a modified Fruchterman-Rheingold algorithm  in three-dimensional space in which nodes representing genes/transcripts are connected by weighted, undirected edges representing correlations above the selected threshold. Gene coexpression clusters were determined using the MCL algorithm , which has been demonstrated to be one of the most effective graph-based clustering algorithms available . An MCL inflation value of 2.2 was used as the basis of determining the granularity of clustering, as it has been shown to be optimal when working with highly structured expression graphs . Clusters were named according to their relative size, the largest cluster being designated Cluster 1. Graphs of each dataset were explored extensively in order to understand the significance of the gene clusters and their relevance to the cell biology of pig tissues. A cluster was annotated if the genes within it indicated a known function shared by multiple members of the cluster. These analyses were supplemented by comparison of the clusters with tissue- and cell-specific clusters derived from network-based analyses of a human tissue atlas and an atlas of purified mouse cell populations [14, 32] and tissues, Gene Ontology , The Human Protein Atlas database  and comprehensive reviews of the literature (data not shown). A description of the average profile and gene content of the major clusters can be found in Additional file 4, Table S2.
In order to focus down specifically on expression patterns along the porcine GI tract, the data from these tissues (30 samples in total) were treated separately. Due to the smaller size of this dataset there is a greater chance of low intensity data being correlated by chance, so data were removed for all probesets where the maximum normalized expression value never exceeded a value of 50 in any of the GI samples. This filtering left 29,918 probesets. These data were then subjected to network analysis at a correlation cut-off value of r = 0.90 and clustered using an MCL inflation value of 2.2. This network was inspected manually and clusters were removed where they showed no particular region-specific expression pattern or were most likely formed due to contamination of GI tissues with surrounding tissues (for example, it would appear that one of the rectal samples was contaminated with glandular tissue of the reproductive tract). The remaining data were again subjected to network analysis (r = 0.90) producing a graph composed of 5,199 nodes/195,272 edges [see Additional file 6, Figure S2] which was clustered using an MCL inflation value of 1.7 (the lower inflation value reducing the overall number of clusters). The resulting cluster analysis of 120 clusters with a membership between 801 and 5 probesets, was then explored in order to annotate the most likely cellular source of the expression signatures observed. This was aided by reference to the cluster analysis of the whole dataset.
bone marrow cells
bone marrow-derived macrophages
central nervous system
HUGO (Human Genome Organization) Gene Nomenclature Committee
Markov cluster algorithim
peripheral blood mononuclear cells
probe selection regions
robust multi-array average
sequencing of RNA
small nucleolar RNAs
The development of the pig Snowball array was funded by the BBSRC/Defra/MRC/Wellcome Trust 'Combating swine influenza initiative' grant (BB/H014292/1), the animals used were funded by BBSRC grant (BB/G004013/1). AIS also thanks NIGMS/NIH for support of BioGPS (GM083924). We would also like to thank Dr Jane Loveland, The Sanger Institute, for her assistance in granting us access to the latest gene models available as part of the VEGA project and the team at Affymetrix, in particular Dr Lucy Reynolds, for their assistance in the design of the Snowball array. Central to this work has been BioLayout Express3D, and we would like to take this opportunity to thank the other members of the BioLayout Express3D team for all their efforts over the years and the BBSRC whose funding has made it possible (BB/F003722/1, BB/I001107/1). The Roslin Institute is supported by a BBSRC Institute Strategic Programme Grant.
- Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, Patapoutian A, Hampton GM, Schultz PG, Hogenesch JB: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA. 2002, 99: 4465-4470. 10.1073/pnas.012025199.PubMedPubMed CentralView Article
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.PubMedPubMed CentralView Article
- Ge X, Yamamoto S, Tsutsumi S, Midorikawa Y, Ihara S, Wang SM, Aburatani H: Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues. Genomics. 2005, 86: 127-141. 10.1016/j.ygeno.2005.04.008.PubMedView Article
- Heng TS, Painter MW: The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunol. 2008, 9: 1091-1094. 10.1038/ni1008-1091.PubMedView Article
- Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K, Lee S, Ally A, Tam A, Sa D, Rogers S, Charest D, Stott J, Zuyderduyn S, Varhol R, Eaves C, Jones S, Holt R, Hirst M, Hoodless PA, Marra MA: Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res. 2007, 17: 108-116.PubMedPubMed CentralView Article
- Krupp M, Marquardt JU, Sahin U, Galle PR, Castle J, Teufel A: RNA-Seq Atlas - A reference database for gene expression profiling in normal tissue by next generation sequencing. Bioinformatics. 2012, 28: 1184-1185. 10.1093/bioinformatics/bts084.PubMedView Article
- Venkataraman S, Stevenson P, Yang Y, Richardson L, Burton N, Perry TP, Smith P, Baldock RA, Davidson DR, Christiansen JH: EMAGE--Edinburgh Mouse Atlas of Gene Expression: 2008 update. Nucleic Acids Res. 2008, 860-865. 36 (Database)
- Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447: 799-816. 10.1038/nature05874.PubMedView Article
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563.PubMedView Article
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, et al: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006, 38: 626-635. 10.1038/ng1789.PubMedView Article
- Thurman RE, Day N, Noble WS, Stamatoyannopoulos JA: Identification of higher-order functional domains in the human ENCODE regions. Genome Res. 2007, 17: 917-927. 10.1101/gr.6081407.PubMedPubMed CentralView Article
- Lukk M, Kapushesky M, Nikkila J, Parkinson H, Goncalves A, Huber W, Ukkonen E, Brazma A: A global map of human gene expression. Nat Biotechnol. 2010, 28: 322-324. 10.1038/nbt0410-322.PubMedPubMed CentralView Article
- Lattin JE, Schroder K, Su AI, Walker JR, Zhang J, Wiltshire T, Saijo K, Glass CK, Hume DA, Kellie S, Sweet MJ: Expression analysis of G protein-coupled receptors in mouse macrophages. Immunome Res. 2008, 4: 5-10.1186/1745-7580-4-5.PubMedPubMed CentralView Article
- Mabbott NA, Kenneth Baillie J, Hume DA, Freeman TC: Meta-analysis of lineage-specific gene expression signatures in mouse leukocyte populations. Immunobiology. 2010, 215: 724-736. 10.1016/j.imbio.2010.05.012.PubMedView Article
- Mabbott NA, Kenneth Baillie J, Kobayashi A, Donaldson DS, Ohmori H, Yoon SO, Freedman AS, Freeman TC, Summers KM: Expression of mesenchyme-specific gene signatures by follicular dendritic cells: insights from the meta-analysis of microarray data from multiple mouse cell populations. Immunology. 2011, 133: 482-498. 10.1111/j.1365-2567.2011.03461.x.PubMedPubMed CentralView Article
- Summers KM, Raza S, van Nimwegen E, Freeman TC, Hume DA: Co-expression of FBN1 with mesenchyme-specific genes in mouse cell lines: implications for phenotypic variability in Marfan syndrome. Eur J Hum Genet. 2010, 18: 1209-1215. 10.1038/ejhg.2010.91.PubMedPubMed CentralView Article
- Ala U, Piro RM, Grassi E, Damasco C, Silengo L, Oti M, Provero P, Di Cunto F: Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS Comput Biol. 2008, 4: e1000043-10.1371/journal.pcbi.1000043.PubMedPubMed CentralView Article
- Fairbairn L, Kapetanovic R, Sester DP, Hume DA: The mononuclear phagocyte system of the pig as a model for understanding human innate immunity and disease. J Leukocyte Biol. 2011, 89: 855-871. 10.1189/jlb.1110607.PubMedView Article
- Lunney JK: Advances in swine biomedical model genomics. Int J Biol Sci. 2007, 3: 179-184.PubMedPubMed CentralView Article
- Wernersson R, Schierup MH, Jorgensen FG, Gorodkin J, Panitz F, Staerfeldt HH, Christensen OF, Mailund T, Hornshoj H, Klein A, Wang J, Liu B, Hu S, Dong W, Li W, Wong GK, Yu J, Bendixen C, Fredholm M, Brunak S, Yang H, Bolund L: Pigs in sequence space: a 0.66X coverage pig genome survey based on shotgun sequencing. BMC Genomics. 2005, 6: 70-10.1186/1471-2164-6-70.PubMedPubMed CentralView Article
- Abarbanell AM, Herrmann JL, Weil BR, Wang Y, Tan J, Moberly SP, Fiege JW, Meldrum DR: Animal models of myocardial and vascular injury. J Surg Res. 162: 239-249.
- Wall RJ, Shani M: Are animal models as good as we think?. Theriogenology. 2008, 69: 2-9. 10.1016/j.theriogenology.2007.09.030.PubMedView Article
- Groenen MAM, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens H-J, Li S, Larkin DM, Kim H, Frantz LAF, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, Bystrom M, et al: Pig genomes provide insight into porcine demography and evolution. Nature.
- Walters EM, Wolf E, Whyte J, Mao J, Renner S, Nagashima H, Kobayashi E, Zhao J, Wells KD, Critser JK, et al: Completion of the swine genome will promote swine as a large animal biomedical model. BMC Medical Genomics.
- Tuggle CK, Wang Y, Couture O: Advances in swine transcriptomics. Int J Biol Sci. 2007, 3: 132-152.PubMedPubMed CentralView Article
- Fahrenkrug SC, Smith TP, Freking BA, Cho J, White J, Vallet J, Wise T, Rohrer G, Pertea G, Sultana R, Quackenbush J, Keele JW: Porcine gene discovery by normalized cDNA-library sequencing and EST cluster assembly. Mamm Genome. 2002, 13: 475-478. 10.1007/s00335-001-2072-4.PubMedView Article
- Gorodkin J, Cirera S, Hedegaard J, Gilchrist MJ, Panitz F, Jorgensen C, Scheibye-Knudsen K, Arvin T, Lumholdt S, Sawera M, Green T, Nielsen BJ, Havgaard JH, Rosenkilde C, Wang J, Li H, Li R, Liu B, Hu S, Dong W, Li W, Yu J, Wang J, Staefeldt HH, Wernersson R, Madsen LB, Thomsen B, Hornshøj H, Bujie Z, Wang X, et al: Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags. Genome Biol. 2007, 8: R45-10.1186/gb-2007-8-4-r45.PubMedPubMed CentralView Article
- Tuggle CK, Green JA, Fitzsimmons C, Woods R, Prather RS, Malchenko S, Soares BM, Kucaba T, Crouch K, Smith C, Tack D, Robinson N, O'Leary B, Scheetz T, Casavant T, Pomp D, Edeal BJ, Zhang Y, Rothschild MF, Garwood K, Beavis W: EST-based gene discovery in pig: virtual expression patterns and comparative mapping to human. Mamm Genome. 2003, 14: 565-579. 10.1007/s00335-002-2263-7.PubMedView Article
- Orwell G: Animal Farm. 1951, London: Penguin Books
- Freeman TC, Goldovsky L, Brosch M, van Dongen S, Maziere P, Grocock RJ, Freilich S, Thornton J, Enright AJ: Construction, visualisation, and clustering of transcription networks from microarray expression data. PLoS Comput Biol. 2007, 3: 2032-2042.PubMedView Article
- Theocharidis A, van Dongen S, Enright AJ, Freeman TC: Network visualisation and analysis of gene expression data using BioLayout Express3D. Nat Protocols. 2009, 4: 1535-1550. 10.1038/nprot.2009.177.PubMedView Article
- Hume DA, Summers KM, Raza S, Baillie JK, Freeman TC: Functional clustering and lineage markers: insights into cellular differentiation and gene function from large-scale microarray studies of purified primary cell populations. Genomics. 2010, 95: 328-338. 10.1016/j.ygeno.2010.03.002.PubMedView Article
- Kapetanovic R, Fairbairn L, Beraldi D, Sester DP, Archibald AL, Tuggle CK, Hume DA: Pig bone marrow-derived macrophages resemble human macrophages in their response to bacterial lipopolysaccharide. J Immunol. 2012, 188: 3382-3394. 10.4049/jimmunol.1102649.PubMedView Article
- Natividad A, Freeman TC, Jeffries D, Burton MJ, Mabey DC, Bailey RL, Holland MJ: Human conjunctival transcriptome analysis reveals the prominence of innate defense in Chlamydia trachomatis infection. Infect Immun. 2010, 78: 4895-4911. 10.1128/IAI.00844-10.PubMedPubMed CentralView Article
- Terns MP, Terns RM: Small nucleolar RNAs: versatile trans-acting molecules of ancient evolutionary origin. Gene Expr. 2002, 10: 17-39.PubMed
- Valbuena A, Sanz-Garcia M, Lopez-Sanchez I, Vega FM, Lazo PA: Roles of VRK1 as a new player in the control of biological processes required for cell division. Cell Signal. 2011, 23: 1267-1272. 10.1016/j.cellsig.2011.04.002.PubMedView Article
- Befroy DE, Petersen KF, Dufour S, Mason GF, Rothman DL, Shulman GI: Increased substrate oxidation and mitochondrial uncoupling in skeletal muscle of endurance-trained individuals. Proc Natl Acad Sci USA. 2008, 105: 16701-16706. 10.1073/pnas.0808889105.PubMedPubMed CentralView Article
- McGivney BA, McGettigan PA, Browne JA, Evans AC, Fonseca RG, Loftus BJ, Lohan A, MacHugh DE, Murphy BA, Katz LM, Hill EW: Characterization of the equine skeletal muscle transcriptome identifies novel functional responses to exercise training. BMC Genomics. 2010, 11: 398-10.1186/1471-2164-11-398.PubMedPubMed CentralView Article
- Martherus RS, Sluiter W, Timmer ED, VanHerle SJ, Smeets HJ, Ayoubi TA: Functional annotation of heart enriched mitochondrial genes GBAS and CHCHD10 through guilt by association. Biochem Biophys Res Commun. 2010, 402: 203-208. 10.1016/j.bbrc.2010.09.109.PubMedView Article
- Freeman TC: Parallel patterns of cell-specific gene expression during enterocyte differentiation and maturation in the small intestine of the rabbit. Differentiation. 1995, 59: 179-192. 10.1046/j.1432-0436.1995.5930179.x.PubMedView Article
- Moll R, Divo M, Langbein L: The human keratins: biology and pathology. Histochem Cell Biol. 2008, 129: 705-733. 10.1007/s00418-008-0435-6.PubMedPubMed CentralView Article
- Bartek J, Bartkova J, Taylor-Papadimitriou J, Rejthar A, Kovarik J, Lukas Z, Vojtesek B: Differential expression of keratin 19 in normal human epithelial tissues revealed by monospecific monoclonal antibodies. Histochem J. 1986, 18: 565-575. 10.1007/BF01675198.PubMedView Article
- Quaroni A, Calnek D, Quaroni E, Chandler JS: Keratin expression in rat intestinal crypt and villus cells. Analysis with a panel of monoclonal antibodies. J Biol Chem. 1991, 266: 11923-11931.PubMed
- Gordeuk VR, Caleffi A, Corradini E, Ferrara F, Jones RA, Castro O, Onyekwere O, Kittles R, Pignatti E, Montosi G, Garuti C, Gangaidzo I, Gomo ZA, Moyo VM, Rouault TA, MacPhail P, Pietrangelo A: Iron overload in Africans and African-Americans and a common mutation in the SCL40A1 (ferroportin 1) gene. Blood Cells Mol Dis. 2003, 31: 299-304. 10.1016/S1079-9796(03)00164-5.PubMedView Article
- Thomas C, Oates PS: Ferroportin/IREG-1/MTP-1/SLC40A1 modulates the uptake of iron at the apical membrane of enterocytes. Gut. 2004, 53: 44-49. 10.1136/gut.53.1.44.PubMedPubMed CentralView Article
- Freeman TC, Wood IS, Sirinathsinghji DJ, Beechey RB, Dyer J, Shirazi-Beechey SP: The expression of the Na+/glucose cotransporter (SGLT1) gene in lamb small intestine during postnatal development. Biochim Biophys Acta. 1993, 1146: 203-212. 10.1016/0005-2736(93)90357-6.PubMedView Article
- Lohi H, Kujala M, Makela S, Lehtonen E, Kestila M, Saarialho-Kere U, Markovich D, Kere J: Functional characterization of three novel tissue-specific anion exchangers SLC26A7, -A8, and -A9. J Biol Chem. 2002, 277: 14246-14254. 10.1074/jbc.M111802200.PubMedView Article
- Hume DA, MacDonald KP: Therapeutic applications of macrophage colony-stimulating factor-1 (CSF-1) and antagonists of CSF-1 receptor (CSF-1R) signaling. Blood. 2012, 119: 1810-1820. 10.1182/blood-2011-09-379214.PubMedView Article
- David CW, Norrman J, Hammon HM, Davis WC, Blum JW: Cell proliferation, apoptosis, and B- and T-lymphocytes in Peyer's patches of the ileum, in thymus and in lymph nodes of preterm calves, and in full-term calves at birth and on day 5 of life. J Dairy Sci. 2003, 86: 3321-3329. 10.3168/jds.S0022-0302(03)73934-4.PubMedView Article
- Platt AM, Mowat AM: Mucosal macrophages and the regulation of immune responses in the intestine. Immunol Lett. 2008, 119: 22-31. 10.1016/j.imlet.2008.05.009.PubMedView Article
- Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM: A census of human transcription factors: function, expression and evolution. Nat Rev Genet. 2009, 10: 252-263. 10.1038/nrg2538.PubMedView Article
- Smith SB, Qu HQ, Taleb N, Kishimoto NY, Scheel DW, Lu Y, Patch AM, Grabs R, Wang J, Lynn FC, Miyatsuka T, Mitchell J, Seerke R, Desir J, Eijnden SV, Abramowicz M, Kacet N, Weill J, Renard ME, Gentile M, Hansen I, Dewar K, Hattersley A, Wang R, Wilson ME, Johnson JD, Polychronakos C, German MS: Rfx6 directs islet formation and insulin production in mice and humans. Nature. 2010, 463: 775-780. 10.1038/nature08748.PubMedPubMed CentralView Article
- Soyer J, Flasse L, Raffelsberger W, Beucher A, Orvain C, Peers B, Ravassard P, Vermot J, Voz ML, Mellitzer G, Gradwohl G: Rfx6 is an Ngn3-dependent winged helix transcription factor required for pancreatic islet cell development. Development. 2010, 137: 203-212. 10.1242/dev.041673.PubMedPubMed CentralView Article
- Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, Zwahlen M, Kampf C, Wester K, Hober S, Wernerus H, Bjorling L, Ponten F: Towards a knowledge-based Human Protein Atlas. Nat Biotechnol. 2010, 28: 1248-1250. 10.1038/nbt1210-1248.PubMedView Article
- Wu F, Sapkota D, Li R, Mu X: Onecut 1 and Onecut 2 are potential regulators of mouse retinal development. J Comp Neurol. 2012, 520: 952-969. 10.1002/cne.22741.PubMedPubMed CentralView Article
- Vanhorenbeeck V, Jenny M, Cornut JF, Gradwohl G, Lemaigre FP, Rousseau GG, Jacquemin P: Role of the Onecut transcription factors in pancreas morphogenesis and in pancreatic and enteric endocrine differentiation. Dev Biol. 2007, 305: 685-694. 10.1016/j.ydbio.2007.02.027.PubMedView Article
- Tang W, Li Y, Osimiri L, Zhang C: Osteoblast-specific transcription factor Osterix (Osx) is an upstream regulator of Satb2 during bone formation. J Biol Chem. 2011, 286: 32995-33002. 10.1074/jbc.M111.244236.PubMedPubMed CentralView Article
- Zhang J, Tu Q, Grosschedl R, Kim MS, Griffin T, Drissi H, Yang P, Chen J: Roles of SATB2 in osteogenic differentiation and bone regeneration. Tissue Eng Part A. 2011, 17: 1767-1776. 10.1089/ten.tea.2010.0503.PubMedPubMed CentralView Article
- Gyorgy AB, Szemes M, de Juan Romero C, Tarabykin V, Agoston DV: SATB2 interacts with chromatin-remodeling molecules in differentiating cortical neurons. Eur J Neurosci. 2008, 27: 865-873. 10.1111/j.1460-9568.2008.06061.x.PubMedView Article
- Britanova O, de Juan Romero C, Cheung A, Kwan KY, Schwark M, Gyorgy A, Vogel T, Akopov S, Mitkovski M, Agoston D, Sestan N, Molnar Z, Tarabykin V: Satb2 is a postmitotic determinant for upper-layer neuron specification in the neocortex. Neuron. 2008, 57: 378-392. 10.1016/j.neuron.2007.12.028.PubMedView Article
- Rosenfeld JA, Ballif BC, Lucas A, Spence EJ, Powell C, Aylsworth AS, Torchia BA, Shaffer LG: Small deletions of SATB2 cause some of the clinical features of the 2q33.1 microdeletion syndrome. PLoS One. 2009, 4: e6568-10.1371/journal.pone.0006568.PubMedPubMed CentralView Article
- Magnusson K, de Wit M, Brennan DJ, Johnson LB, McGee SF, Lundberg E, Naicker K, Klinger R, Kampf C, Asplund A, Wester K, Gry M, Bjartell A, Gallagher WM, Rexhepaj E, Kilpinen S, Kallioniemi OP, Belt E, Goos J, Meijer G, Birgisson H, Glimelius B, Borrebaeck CA, Navani S, Uhlen M, O'Connor DP, Jirstrom K, Ponten F: SATB2 in combination with cytokeratin 20 identifies over 95% of all colorectal carcinomas. Am J Surg Pathol. 2011, 35: 937-948. 10.1097/PAS.0b013e31821c3dae.PubMedView Article
- Kauffmann A, Gentleman R, Huber W: arrayQualityMetrics--a bioconductor package for quality assessment of microarray data. Bioinformatics. 2009, 25: 415-416. 10.1093/bioinformatics/btn647.PubMedPubMed CentralView Article
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.PubMedView Article
- Wu C, Orozco C, Boyer J, Leglise M, Goodale J, Batalov S, Hodge CL, Haase J, Janes J, Huss JW, Su AI: BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol. 2009, 10: R130-10.1186/gb-2009-10-11-r130.PubMedPubMed CentralView Article
- Fruchterman TMJ, Rheingold EM: Graph drawing by force directed placement. Softw Exp Pract. 1991, 21: 1129-1164. 10.1002/spe.4380211102.View Article
- van Dongen S: Graph clustering by flow simulation. PD Thesis. 2000, University of Utrecht
- Brohee S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006, 7: 488-10.1186/1471-2105-7-488.PubMedPubMed CentralView Article
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.PubMedPubMed CentralView Article
- Uhlen M, Bjorling E, Agaton C, Szigyarto CA, Amini B, Andersen E, Andersson AC, Angelidou P, Asplund A, Asplund C, Berglund L, Bergström K, Brumer H, Cerjan D, Ekström M, Elobeid A, Eriksson C, Fagerberg L, Falk R, Fall J, Forsberg M, Björklund MG, Gumbel K, Halimi A, Hallin I, Hamsten C, Hansson M, Hedhammar M, Hercules G, Kampf C, et al: A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol Cell Proteomics. 2005, 4: 1920-1932. 10.1074/mcp.M500279-MCP200.PubMedView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.