Body-wide transcriptomics analysis of the pig
To generate a body-wide expression atlas of the porcine protein-coding genes, 350 samples representing 98 different tissues and 14 organ systems (Fig. 1A) were collected from four young adult (two males and two females, 1-year-old) Bama minipigs. The 98 tissues were grouped into 44 main organ/tissue types based on shared developmental, functional, and/or anatomical properties (Fig. 1B). The 30 tissue types that represent the central nervous system were included in Sjöstedt et al. [21] comparing expression profiles across human, pig, and mouse brains. The protein-coding expression data of the pig brain is also integrated into the Human Protein Atlas (HPA) Brain section.
The dissection accuracy of the tissue samples was confirmed by histological analysis of adjacent tissue. Samples were sequenced with an average depth of 165.5 million reads (Additional file 2), and read counts were normalized (protein-coding transcript per million (pTPM) for visualization, and normalized expression (NX) for gene classification) for all 22,342 protein-coding genes. In total, 22,007 (98.5%) genes were detected (NX > 1) in at least one tissue type, ranging from 13,607 to 16,867 genes detected per tissue type (Additional file 1: Fig. S1A). Highly specialized tissue types, such as the lens and joint cartilage, express fewer genes, whereas tissues composed of many different cell types (e.g., testis and brain) express the highest number of genes, in line with results from human tissues [1, 22, 23].
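The detection rule stated above (NX > 1 in at least one tissue type) can be illustrated with a minimal sketch. The gene names, tissue names, and NX values below are hypothetical placeholders, not data from this study, and the helper functions are illustrative only.

```python
# Hypothetical NX table: gene -> {tissue type: NX value}. All values are made up.
nx = {
    "GENE_A": {"testis": 35.2, "lens": 0.4, "liver": 2.1},
    "GENE_B": {"testis": 0.2, "lens": 0.9, "liver": 0.0},
    "GENE_C": {"testis": 1.5, "lens": 12.0, "liver": 0.3},
}

DETECTION_CUTOFF = 1.0  # a gene counts as detected when NX > 1, as stated in the text


def detected_per_tissue(table, cutoff=DETECTION_CUTOFF):
    """Count, for each tissue type, how many genes exceed the cutoff."""
    counts = {}
    for values in table.values():
        for tissue, value in values.items():
            counts[tissue] = counts.get(tissue, 0) + int(value > cutoff)
    return counts


def detected_anywhere(table, cutoff=DETECTION_CUTOFF):
    """List the genes that exceed the cutoff in at least one tissue type."""
    return sorted(
        gene
        for gene, values in table.items()
        if any(value > cutoff for value in values.values())
    )


per_tissue = detected_per_tissue(nx)
detected = detected_anywhere(nx)
fraction_detected = len(detected) / len(nx)
```

Applied to a full NX table, the same two summaries would correspond to the per-tissue gene counts and the overall fraction of detected genes reported above.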
To further investigate similarities in global transcriptome profiles between tissues, Spearman correlation was used in a pairwise correlation heatmap for the 44 grouped tissues (Additional file 1: Fig. S1B). The heatmap with a body-wide representation of all tissues and organs shows that testis and the various brain samples have the most divergent global expression profiles, similar to the pattern in human tissues [1, 2]. The corresponding dendrogram based on the global transcriptomics profile across all protein-coding genes (Fig. 1C) demonstrates that related tissues cluster together, including tissues of the respiratory system, immune system, gastrointestinal tract, muscle tissues, and the nervous system. In general, closely clustering tissues often share germ layer origin, functions, and/or cellular composition, e.g., skin, mouth tissues, and cornea all include ectoderm-derived squamous epithelium [24, 25]. The esophagus, although containing squamous epithelium, revealed a high degree of similarity with the salivary gland and other secretory tissues due to the presence of esophageal glands [26]. Neuroectoderm-derived tissues such as brain tissues, pituitary gland, pineal gland, and retina cluster into one major branch (Fig. 1C and Additional file 1: Fig. S1B). The mesoderm-derived tissues, including all soft tissues and skeletal and cardiac muscles, are clustered closely. The endoderm-derived tissues, including respiratory tissues (i.e., lung, bronchus, trachea, larynx) and gastrointestinal tissues, are clustered together. In contrast, tissues composed of major cell types originating from different germ layers, such as glands and reproductive tissues, are clustered between the major germ layers. The testis, with a large enrichment of germ cells, is clustered separately. Similar clustering patterns of tissues per germ layer have previously been described [27], including in pig specifically [15].
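As an illustration of the pairwise comparison described above, the following sketch computes Spearman correlations between tissue profiles using only the Python standard library. The tissue profiles are hypothetical five-gene vectors chosen for readability; they are not values from this study, and the function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression profiles over the same five genes (made-up values).
profiles = {
    "cerebral cortex": [50, 40, 1, 0.5, 3],
    "cerebellum": [45, 60, 2, 0.2, 4],
    "testis": [1, 2, 80, 30, 3],
}

# Pairwise correlation table of the kind that underlies a correlation heatmap.
pairwise = {
    (a, b): spearman(profiles[a], profiles[b]) for a in profiles for b in profiles
}
```

In this toy input the two brain-like profiles correlate positively with each other and negatively with the testis-like profile. A dendrogram could then be derived from a distance such as one minus the correlation, although the exact distance and linkage used for Fig. 1C are not specified in this section.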
The genome-wide expression profiles were investigated for all the 350 individual samples using dimensionality reduction; the results of principal component analysis (PCA) are shown in Fig. 2 and those of UMAP in Additional file 1: Fig. S2A. The analysis shows that tissue types with related functions share similar global expression profiles and that the brain samples have a unique expression pattern compared to peripheral tissues. The 30 brain subregions cluster according to the basic organization of the brain, with the spinal cord and brainstem together with corpus callosum and other white matter-rich regions, separated from the neuron-rich cerebellum and cortical areas (Fig. 2). The shared developmental neuroectoderm origin between the brain, endocrine tissues, and retina [28, 29] is also seen in the global expression comparison. Additionally, tissues from the gastrointestinal (GI) tract show similarity to lymphoid tissues, possibly explained by local germinal centers in the GI tract and specialized GI immune cells [30]. The respiratory system is found close to the GI tract and lymphoid tissues, due to the presence of mucus-secreting goblet cells (also found in the GI tract) and respiratory tract-associated immune cells [31]. A correlation analysis showed a high correlation between samples of the same tissue type, with an average Spearman correlation of 0.96, ranging from 0.89 to 0.99, depending on tissue type (Additional file 1: Fig. S2B).
Genome-wide annotation of the protein-coding genes
To generate an overview of the body-wide distribution and specificity of pig genes, the gene classification approach used in the Human Protein Atlas program was adapted to classify all 22,342 porcine protein-coding genes as described in Additional file 3 [1, 2], and exemplified in Additional file 1: Fig. S3D. The categories of tissue enriched, group enriched, and tissue enhanced are collectively termed tissue elevated. The specificity categorization shows that 13,372 genes have elevated expression in one or more tissues, out of which 3085 genes show enriched expression (Additional file 1: Fig. S3A and S3C). Genes with elevated expression are, as expected, mostly found in tissues with highly specialized cells, such as the brain (n = 2930), testis (n = 2718), and lymphoid tissues (n = 1360) (Additional file 1: Fig. S3B). In contrast, tissue types composed of large proportions of common structures and cell types, such as smooth muscle-rich tissues or soft tissues (e.g., aorta and adipose tissues), have a lower number of genes with elevated expression.
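The specificity categories named above can be sketched in code. The exact criteria used here are given in Additional file 3 and are not reproduced in this section, so the rules below are an assumption based on the commonly described Human Protein Atlas scheme (a four-fold threshold and groups of two to five tissues); the function name, parameter defaults, and all expression values are hypothetical.

```python
def classify_specificity(expr, detect=1.0, fold=4.0, max_group=5):
    """Assign one specificity category to a gene.

    expr maps tissue type -> NX value (at least two tissues).
    The thresholds are assumptions, not the study's documented parameters.
    Returns (category, list of tissues the category refers to).
    """
    ranked = sorted(expr.items(), key=lambda item: item[1], reverse=True)
    tissues = [name for name, _ in ranked]
    values = [value for _, value in ranked]

    if values[0] <= detect:
        return "not detected", []

    # Tissue enriched: top tissue at least `fold` times any other tissue.
    if values[0] >= fold * values[1]:
        return "tissue enriched", tissues[:1]

    # Group enriched: mean of the top 2..max_group tissues at least
    # `fold` times the highest tissue outside the group.
    for k in range(2, min(max_group, len(values) - 1) + 1):
        if sum(values[:k]) / k >= fold * values[k]:
            return "group enriched", tissues[:k]

    # Tissue enhanced: a tissue at least `fold` times the mean of all others.
    enhanced = []
    for i, value in enumerate(values):
        others = values[:i] + values[i + 1:]
        if value > detect and value >= fold * (sum(others) / len(others)):
            enhanced.append(tissues[i])
    if enhanced:
        return "tissue enhanced", enhanced

    return "low tissue specificity", []


# Hypothetical genes with made-up NX values.
enriched_gene = {"testis": 120, "brain": 5, "liver": 3, "lung": 2, "skin": 1, "heart": 2}
group_gene = {"heart": 80, "skeletal muscle": 60, "liver": 5, "lung": 4, "skin": 3, "brain": 2}
enhanced_gene = {"lung": 40, "liver": 11, "brain": 8, "skin": 6, "heart": 5, "testis": 4}
broad_gene = {"lung": 10, "liver": 12, "brain": 9, "skin": 11, "heart": 10, "testis": 8}
silent_gene = {"lung": 0.5, "liver": 0.2, "brain": 0.1}
```

In this sketch the categories are checked from most to least specific, so each gene receives exactly one label; genes labeled tissue enriched, group enriched, or tissue enhanced together correspond to the tissue elevated set.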
A network plot (Fig. 3A) was constructed to visualize commonalities between tissues in terms of tissue and group enriched genes across all the tissues and organs analyzed here. Most tissue enriched genes are found in the testis (n = 1004) followed by the brain (n = 409) and liver (n = 239) similar to the corresponding analysis in the human body [2]. Most group enriched genes are found between the heart and skeletal muscle (n = 57) and between the kidney and liver (n = 50). The data has been published in a new open-access resource called the Pig RNA Atlas (www.rnaatlas.org), to allow researchers to explore the list of genes corresponding to the various tissues and organs. Furthermore, analysis of tissue distribution highlighted 1046 genes to be detected in a single tissue type (Additional file 1: Fig. S3A), out of which a large fraction was also classified as testis enriched. The highly specific expression of the testis is due to the testis-specific Sertoli and germ cells and has previously been described in human [2], pig [15, 32], macaque [33], and mouse [34]. In contrast, a large portion of the genes is classified as low tissue specificity and detected in all tissues (n = 7699), and this set of genes is also interesting to study further.
To confirm and further explore expression profiles observed in pig tissues at the protein level, we stained tissues with antibodies for visualization of proteins corresponding to genes classified as tissue enriched, in terms of location and distribution (Fig. 3B). The examples include the brain enriched Myelin oligodendrocyte glycoprotein (MOG), a protein detected in oligodendrocytes and myelin sheaths in the brain; the liver enriched Asialoglycoprotein receptor 1 (ASGR1), which is a liver transmembrane protein detected in hepatocytes; the testis enriched Cysteine-rich secretory protein 2 (CRISP2) detected in spermatids; the skeletal muscle enriched Troponin T1 (TNNT1) detected in the slow muscle fibers; and the skin enriched Desmocollin 1 (DSC1), a desmosomal cadherin detected in the membrane of keratinocytes. In all cases, the good agreement between the RNA expression and protein detection supports the approach of using RNA as a proxy for mapping protein profiles in tissue.
New genome-wide classification of expression profiles based on UMAP dimensionality reduction
To complement the genome-wide annotation of expression based on specificity and distribution as previously described, we here introduce a new classification system for gene expression based on dimensional reduction of global expression patterns using UMAP, and subsequently density-based clustering [35]. The expression of 22,342 protein-coding genes across the 350 individual samples was projected onto two dimensions (Additional file 1: Fig S4A-B), and the genes were subsequently classified into 84 clusters based on their expression across the tissues and organs (Fig. 4 and Additional file 4). In this manner, all protein-coding genes have been classified based on their similarity in expression with other genes across all samples, designating each gene into a single Tissue Expression Cluster. Based on the cluster’s expression profile and functional enrichment analyses, an annotation of the clusters was performed, assigning each cluster a name describing the cluster’s specificity, and/or function (Fig. 4 and Additional file 5). To facilitate annotation and further characterize the 84 clusters, tissue specificity category, expression proportion per tissue type, and abundance level were summarized in Additional file 1: Fig. S5. Genes in each cluster can be explored in the open-access Pig RNA Atlas, together with cluster annotations based on Gene Ontology (GO) terms and tissue specificity.
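The clustering step can be illustrated with a small sketch that operates on a precomputed two-dimensional embedding. The UMAP projection itself is not reproduced here, and the text specifies only that density-based clustering [35] was used, so DBSCAN is swapped in as a generic stand-in rather than the study's actual algorithm. The coordinates, `eps`, and `min_pts` are hypothetical.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster index or NOISE."""
    labels = [None] * len(points)  # None means not yet visited

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical 2D coordinates standing in for genes in an expression UMAP.
embedding = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # one dense group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # a second dense group
    (10.0, 0.0),                                     # an isolated gene
]
gene_clusters = dbscan(embedding, eps=0.3, min_pts=3)
```

Note that DBSCAN can leave isolated points unassigned, whereas the classification described above places every gene in a single Tissue Expression Cluster, so the actual procedure must handle such points differently from this sketch.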
The expression UMAP shows an expression “landscape” with distinct clusters of genes related to tissues and/or functions, such as the testis or muscle contraction. Many genes involved in neurological functions can be found in the brain-related clusters situated adjacent to each other. Similarly, clusters of genes involved in immunological functions, such as the clusters annotated as “lymphoid B cells,” “lymphoid T cells,” and “housekeeping defence,” are found adjacent to each other. Interestingly, the “housekeeping” genes expressed in all tissues are found in distinct clusters, mostly adjacent to each other in the UMAP, as exemplified by the clusters annotated as “housekeeping protein processing” and “housekeeping regulation” (Fig. 4).
When the tissue specificity classification is superimposed upon the cluster landscape (Additional file 1: Fig. S6A), patterns of the various categories emerge. Genes classified as tissue enriched or group enriched reside in smaller clusters of genes, or at the periphery of larger groups of genes, while genes classified as tissue enhanced are centrally located and partially overlap with the genes annotated as low tissue specificity. Furthermore, genes classified as elevated in the same tissue cluster together, as exemplified in Additional file 1: Fig. S6B, which shows how the majority of the genes classified as brain elevated cluster together, spatially distinct from genes classified as elevated in the lung, lymphoid tissues, or testis. In addition to clustering by tissue specificity, genes with a functional relationship can be observed to co-localize. One example is cluster 23 (a cluster of 477 genes, highlighted in Additional file 1: Fig. S6B), which harbors genes with elevated expression in both testis and lung, as well as choroid plexus, upper respiratory system, and fallopian tube (Additional file 1: Fig. S7). More in-depth analysis reveals that many of these genes code for proteins of ciliated cells, including proteins involved in motility, such as those of the sperm flagella [36].
To facilitate cluster annotation and find associations between clusters and tissues, a hypergeometric test was conducted, calculating the extent of the observed overlap between the genes elevated in each tissue and the genes of each cluster. Genes classified as elevated in the lung, testis, choroid plexus, upper respiratory system, and fallopian tube overlap significantly with cluster 23 (Fig. 5). Indeed, Gene Set Analysis (GSA) towards GO annotations revealed that cluster 23 is enriched with genes related to cilium functions, including cilium movement, organization, and assembly. These results indicate that genes are arranged in groups of clusters with distinct relation to certain tissue types. For instance, clusters 46, 49, and 50 contain genes highly expressed in muscle tissues, although each cluster also shows a distinct expression pattern: cluster 46 is dominated by the skeletal muscle, while cluster 49 is mainly expressed in the heart muscle. Other examples include cluster 33, with almost exclusive expression in the lens, and clusters 57 and 59, which include genes important for squamous epithelium, among them several different keratin-coding genes.
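The overlap test described above can be written out explicitly. For a population of N genes, of which K are elevated in a given tissue, the probability of finding at least k elevated genes in a cluster of n genes by chance is the upper tail of the hypergeometric distribution. The sketch below uses only the Python standard library. The population size (22,342) and the size of cluster 23 (477) are taken from the text, whereas the number of tissue-elevated genes and the observed overlap are hypothetical placeholders.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes (N)
    successes:  genes elevated in the tissue (K)
    draws:      genes in the cluster (n)
    observed:   elevated genes found in the cluster (k)
    """
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


# Small worked case that can be checked by hand:
# N=10, K=4, n=3, k=2 gives (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.
small_case = hypergeom_upper_tail(10, 4, 3, 2)

# Cluster-versus-tissue test; the last two numbers are hypothetical.
N_GENES = 22342        # protein-coding genes (from the text)
CLUSTER_SIZE = 477     # genes in cluster 23 (from the text)
TISSUE_ELEVATED = 500  # hypothetical count of genes elevated in one tissue
OVERLAP = 60           # hypothetical count of those genes inside the cluster
p_value = hypergeom_upper_tail(N_GENES, TISSUE_ELEVATED, CLUSTER_SIZE, OVERLAP)
```

Because one such test is run for every tissue and cluster pair, the resulting p values would normally be adjusted for multiple testing; the correction method is not stated in this section.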
There are 18 Tissue Expression Clusters containing altogether 9910 protein-coding genes with “housekeeping” functions, with an overrepresentation of genes classified as low tissue specificity, as exemplified by clusters 22, 53, and 66. Functional analysis shows that cluster 66 (2285 genes) is mainly enriched for genes related to transcription, RNA processing, and DNA repair. Similarly, cluster 22 contains 204 genes related to DNA-templated regulation of transcription, while cluster 53 only includes 19 mitochondrial protein-coding genes, verifying the housekeeping-related functions of the clusters.
Thirteen clusters were annotated as “low abundant - uncharacterized” due to limited gene information, low expression levels, and limited functional data. However, among the uncharacterized clusters, olfactory receptors were highly represented with cluster 6 harboring 73 out of 88 genes coding for olfactory receptors and cluster 21 with many olfactory receptors (17 out of 54) found in male reproductive tissues, such as the testis and epididymis, and cluster 56 (10 of 13 genes) with olfactory receptors found in the lung and bronchus. This suggests that the porcine olfactory receptors have additional functionality beyond olfaction, which is consistent with previous findings of the human olfactory receptors [37].
In summary, we have introduced a new genome-wide classification scheme to identify genes with similar expression profiles based on dimensional reduction. This has allowed us to classify all pig protein-coding genes into 84 Tissue Expression Clusters. This new approach for classification is an attractive tool for annotation of mammalian proteomes to catalogue all proteins according to body-wide expression patterns.
Comparison of body-wide gene expression between pig and human
The pig whole-body expression atlas enables us to compare tissue-wide similarities and differences between the pig and human expression. Here, we analyze the expression profiles of 32 tissue types for which the data presented here for pig could be compared with the data already generated for human tissues [1]. First, we generated a UMAP of the global expression profiles of these tissues in human and pig (Fig. 6A). As expected, tissues from the two species cluster together based on tissue types, but certain tissues such as the ovary, breast, and cervix show distinct expression profile differences in pig and human. The retina and bone marrow show a large discrepancy in the clustering, which is expected since the sampling for these tissues from the two species differed. The pig retina was isolated with as little pigment layer as possible, whereas the human retina sample included the pigment layer. Similarly, the pig bone marrow was used without further fractionation, whereas the human bone marrow was Ficoll separated, thus isolating mononuclear cells from e.g. adipose cells, vessels, and non-hematopoietic components [38]. The esophagus and salivary gland also show somewhat different clustering for pig and human tissues, most likely explained by the abundance of glands in the submucosal layer of the pig esophagus, which are limited in the human esophagus.
To achieve a detailed comparison regarding tissue-specific expression profiles, we subsequently investigated the overlap between the specificity classification categories in pig versus human using the updated gene classification described previously [2]. Figure 6B shows that 6496 genes are classified as low tissue specificity in both pig and human tissues, while the remaining 9673 genes are classified as elevated in either of the two species. A majority of the elevated genes are classified similarly in the two species (Additional file 1: Fig. S8B), with few elevated genes showing a different tissue specificity. The gene category overlap was particularly high when comparing tissue enriched and group enriched genes (Additional file 1: Fig. S8B and S8C), with 76% and 80% of the genes having overlap in classification, respectively. This demonstrates the similar molecular architecture of these evolutionarily close species.
However, there are some interesting differences that are worth more in-depth studies to understand their respective molecular function in human and pig. For example, the neuropeptide galanin (GAL) was classified as tissue enriched in the pig adrenal gland, but was classified as not detected in human adrenal gland samples. Similarly, the pro-neuropeptide Y precursor (NPY) is classified as group enriched in the human adrenal gland, brain, and prostate, while being group enriched in brain and lymphoid tissues in pig. Additionally, the human testis-specific protein, MORC family CW-type zinc finger 1 (MORC1), is classified as enriched in the pig liver. This list of genes classified as elevated in different tissue types between human and pig (Additional file 6) is obviously of high relevance for our understanding of evolutionary processes that drive species differences.
To statistically assess the similarity between human and pig gene classification, a hypergeometric test was performed for each pair of human and pig tissues (Fig. 6C). Brain, liver, and lymphoid tissues show high similarity between human and pig. As expected, the analysis revealed similarities between the heart (cardiac) muscle and skeletal muscle, as well as between the brain and retina. Interestingly, the hypergeometric test suggests overlap in expression profiles between the fallopian tube and lung, which is most likely explained by the presence of ciliated cells in both tissues. To further explore the global transcriptome similarity between human and pig tissues, we performed a genome-wide comparison of gene expression between pig and human for each tissue using Spearman correlation, resulting in 32 scatter plots (Additional file 1: Fig. S8E). The global transcriptome correlation between species for the individual tissue types ranges from 0.60 to 0.80. Collectively, the body-wide gene expression comparison between pig and human thus suggests that the global protein-coding gene expression is similar between the two species. However, an interesting exception is the low similarity for reproductive tissue, as exemplified by ductus deferens, ovary, endometrium, cervix, and prostate. It would be of interest to extend this comparison to other mammals, such as rodents, to give context to the similarity between human and pig.
An alternative approach to investigate similarities and differences between human and pig is to perform antibody-based tissue profiling, to allow a single-cell analysis of the corresponding protein in situ in the context of neighboring cells. Here, we used antibodies raised against the human ortholog to probe the tissue profile in both human and pig tissue (Fig. 6D and Additional file 1: Fig. S8A). The first example is the Phospholamban (PLN) protein showing a similar staining in the heart muscle of both pig and human, supporting its role in calcium regulation in myocytes [39]. Similarly, Cadherin 17 (CDH17) is shown to stain GI-related tissues in both species, supporting the GI-enriched classification in both species. Furthermore, the special AT-rich sequence-binding protein 2 (SATB2) classified as enriched in the intestine and brain in both species shows similar staining in the intestine of both species. It is also reassuring that pyruvate dehydrogenase E1 beta subunit (PDHB) classified as low tissue specificity in both species shows a ubiquitous staining across many tissues in both species (Additional file 1: Fig. S8A).
The antibody-based profiling can also be used to validate the genes with differential expression in the two species. In Fig. 6D, the antibody-based tissue profiling of estrogen synthetase (CYP19A1) is shown. CYP19A1 was classified as testis enriched in pig, but instead enriched in the placenta in humans. The tissue profiling confirms the high abundance of this protein in pig testis, while antibodies to this protein instead stain human placenta [40]. Interestingly, CYP19A1 catalyzes the synthesis of estrogens from androgens in steroid hormone biosynthesis and is associated with fertility in pig [41]. In this context, it is interesting to note that many genes related to steroid hormones are differentially expressed in the testis of the two species, most likely due to the abundant number of Leydig cells in pig testis compared to human testis. This is further exemplified by scavenger receptor class B member 1 (SCARB1), a receptor important for uptake of cholesteryl esters and ovarian steroidogenesis [42, 43]. This protein shows a similar protein profile in the adrenal gland, testis, and ovary. However, both the RNA expression level and the protein abundance are much lower in the human testis and ovary as compared to the corresponding tissues in pig (Additional file 1: Fig. S8D).
The Pig RNA Atlas
An interactive Pig RNA Atlas (www.rnaatlas.org) has been launched as part of this study. This open-access resource harbors more than 20,000 separate web pages, including summary pages for all protein-coding genes of pig. Genes are searchable based on gene name and gene ID. Categorizations in terms of specificity, distribution, and UMAP-based Tissue Expression Profile clusters are presented and searchable for each gene. Human ortholog data is an integral part of the atlas, with tissue expression profiles for both human and pig shown on the pig gene summary pages. In addition, the tissues are grouped into organ systems, each described in separate chapters with illustrative images and IHC examples. The Pig RNA Atlas also includes a pig histology dictionary based on representative stained sections from the tissues in this study, providing morphological details and comparison to human tissues.