Platypus globin genes and flanking loci suggest a new insertional model for beta-globin evolution in birds and mammals
© Patel et al. 2008
Received: 28 May 2008
Accepted: 25 July 2008
Published: 25 July 2008
Vertebrate alpha (α)- and beta (β)-globin gene families exemplify the way in which genomes evolve to produce functional complexity. From tandem duplication of a single globin locus, the α- and β-globin clusters expanded, and then were separated onto different chromosomes. The previous finding of a fossil β-globin gene (ω) in the marsupial α-cluster, however, suggested that duplication of the α-β cluster onto two chromosomes, followed by lineage-specific gene loss and duplication, produced paralogous α- and β-globin clusters in birds and mammals. Here we analyse genomic data from an egg-laying monotreme mammal, the platypus (Ornithorhynchus anatinus), to explore haemoglobin evolution at the stem of the mammalian radiation.
The platypus α-globin cluster (chromosome 21) contains embryonic and adult α-globin genes, a β-like ω-globin gene, and the GBY globin gene with homology to cytoglobin, arranged as 5'-ζ-ζ'-αD-α3-α2-α1-ω-GBY-3'. The platypus β-globin cluster (chromosome 2) contains single embryonic and adult globin genes arranged as 5'-ε-β-3'. Surprisingly, all of these globin genes were expressed in some adult tissues. Comparison of flanking sequences revealed that all jawed vertebrate α-globin clusters are flanked by MPG-C16orf35 and LUC7L, whereas all bird and mammal β-globin clusters are embedded in olfactory genes. Thus, the mammalian α- and β-globin clusters are orthologous to the bird α- and β-globin clusters respectively.
We propose that α- and β-globin clusters evolved from an ancient MPG-C16orf35-α-β-GBY-LUC7L arrangement 410 million years ago. A copy of the original β (represented by ω in marsupials and monotremes) was inserted into an array of olfactory genes before the amniote radiation (>315 million years ago), then duplicated and diverged to form orthologous clusters of β-globin genes with different expression profiles in different lineages.
The evolution of the vertebrate globin superfamily has been extensively studied for many decades by comparing the structure and function of members of the gene families. These are principally haemoglobin, myoglobin, cytoglobin and neuroglobin and, more recently, globin X (in fish and amphibians) and globin Y (specific to amphibians).
Haemoglobin genes (alpha- and beta-globin) are of particular interest because of their critical role in oxygen transportation from the respiratory surfaces to the inner organs, and because of the dire effects of mutations in human globin genes that cause haemoglobinopathies. The genes contained in the alpha (α)- and beta (β)-globin clusters are expressed at different stages of development and in different tissues. Together, gene products from both clusters form the functional tetrameric haemoglobin molecules needed to fulfil oxygen requirements.
The evolutionary history of α- and β-globin genes can be traced back to the common ancestors of fish, amphibians and amniotes (reptiles, birds and mammals), by comparing gene structure and composition of α- and β-globin clusters across vertebrates. In the amphibians Xenopus laevis and X. tropicalis, α- and β-globin genes are tightly juxtaposed as 5'-α-β-3' [2, 4–6]. In the Antarctic notothenioid fish (Notothenia coriiceps, N. angustata, Trematomus hansoni, T. pennellii), there is also a single 5'-α-β-3' locus , although in pufferfish (Fugu rubripes) there are two globin clusters (one with α-globin genes and the other with both α- and β-globin genes), which are located on different chromosomes .
In amniotes, α- and β-globin clusters are located on different chromosomes. It was proposed that the ancestral α- and β-globin genes were located together in the common ancestor of amniotes, as they are in fish and amphibians, but became separated, either by chromosome fission or translocation between α- and β-genes, or by chromosome/genome or in trans duplication and gene loss.
Further duplications then occurred in amniote lineages. The ancestral α-globin gene is thought to have duplicated twice before the divergence of the bird-mammalian lineages, to produce progenitors of embryonic globin genes π/ζ, and adult αD and αA, all of which are present in birds (for example, the chicken Gallus gallus) [9–11] and mammals [12, 13]. The order and timing of these duplications is still debated, as is their origin: for instance, αD may have evolved by duplication either of adult αA (see ), or of an embryonic α-like gene . After the avian and mammalian lineages diverged, there were further tandem duplications of the π/ζ and αA lineages to produce more complex marsupial and eutherian ('placental') mammalian α-globin clusters, 5'-ζ-ψζ'-αD-ψα3-α2-α1-θ-3' (see [12, 15–18]). The timing of these duplication events is also uncertain, because we do not know whether these seven α-like globin genes all existed at the stem of the mammalian radiation.
As for many other gene families , comparisons of globin genes between distantly related mammals have provided unique insight into the evolution and function of the mammalian globins. Marsupials diverged from eutherian mammals about 148 million years ago (MYA), and mammalian Subclass Theria that contains these groups diverged from monotremes (Subclass Prototheria) about 166 MYA , so comparisons between these major mammal groups provide depth for evolutionary comparisons. Monotremes retain many anatomical and developmental features shared with birds and reptiles. Their small genome, too, and disjunct chromosome size classes are reminiscent of reptile genomes, and the 10 sex chromosomes in a karyotype of 52 chromosomes is unique among mammals [21–23]. Their importance for comparative studies is now increasingly recognised after the sequencing of the genome of a monotreme, Ornithorhynchus anatinus (platypus), to a depth of six to eight times by the Washington University Genome Centre, St Louis .
Indeed, studies of marsupial globins have clarified the timing of some of the duplications. The finding of single ε- (embryonic) and β-globin (adult) genes together in the marsupial β-globin cluster indicated that a two-gene cluster (ε-β) was present in the common therian ancestor [25–28]. Genes in the cluster were further duplicated to produce the ancestral eutherian β-globin cluster of 5'-ε-γ-η-δ-β-3' (see [29–32]), which then underwent further tandem duplication events. In contrast, the bird (G. gallus) β-like globin genes (ε-βH-βA-ρ) show very little homology to the mammalian β-like globin genes [33, 34].
The discovery of a β-like globin gene (ω-globin) adjacent (3') to the α-globin cluster in marsupials led to a re-interpretation of globin evolution in birds and mammals [35, 36]. Comparative sequence and phylogenetic analysis suggested that the ω-globin gene was more closely related to bird β-like globin genes than to other mammalian β-like globin genes. The specific function of the ω-globin gene is not yet known, but it is expressed just before birth and in the early stages of pouch young development. In addition, the ω-globin product binds to α-like globin chains to form functional haemoglobin, so it is likely to be involved in oxygen transportation [35–37].
This paralogy hypothesis (which rests on the rather weak orthology between the chicken β and marsupial ω), as well as the dates and types of other duplications, could be further tested by studying globin genes of monotreme mammals, and using comparative data to infer the ancestral globin gene arrangement of a mammal ancestor 166 MYA. The availability of platypus genomic sequences now provides an efficient way to discover all of the globin genes and regulatory signals, and to understand their function and evolution. Studies of globin genes in monotremes are also interesting because the specialized features and lifestyle of these unique mammals may have given rise to special adaptations of globin genes to fulfil unusual oxygen requirements. These features include the need for oxygen by diffusion through the egg membrane to the embryo after birth and the physiological response to hypoxic conditions during hibernation, burrowing and diving [38–40].
Little is known about monotreme α- and β-globin families. More than 30 years ago, studies of adult blood revealed a single adult α- and β-globin protein in the platypus [41, 42] and echidna (Tachyglossus aculeatus [43, 44]). Lee et al. later isolated an adult β-globin gene in the echidna that encoded a polypeptide identical to the previously isolated echidna β-globin. To date, there is no evidence of any monotreme embryonic ζ- or ε-globin genes.
We used platypus genomic sequences from bacterial artificial chromosomes (BACs) to characterise the α- and β-globin gene families of the platypus and investigate their molecular evolution. In particular, we searched for embryonic and ω-globin genes and any novel globin genes that might fulfil the requirements for oxygen transport under hypoxic conditions. We investigated the genome context in order to infer the structure and origin of the ancestral α- and β-globin clusters at the stem of the mammalian radiation. Our results strongly support the hypothesis that the mammalian α- and β-globin clusters are orthologous to the avian α- and β-globin clusters, respectively, and that the β cluster evolved by transposition of a copy of the beta-like ω-globin gene in an amniote ancestor.
The draft sequence assembly of platypus is readily available on the University of California Santa Cruz (UCSC) Genome Browser. However, the assembly is currently incomplete for the α- and β-globin clusters, as individual globin genes appear on different contigs. There are also platypus BAC clone sequences available in NCBI GenBank that have not yet been annotated or assembled and are not part of the platypus genome assembly. Two of these are Oa_Bb-2L7 [GenBank:AC195438] and Oa_Bb-131M24 [GenBank:AC203513], which were identified from the Encyclopaedia of DNA Elements Project as containing parts of the α-globin cluster (see Methods). The BAC clone Oa_Bb-484F22 [GenBank:AC192436] containing the β-globin cluster was obtained by screening a male platypus BAC library (Clemson University Genomic Institute, USA) and was subsequently fully sequenced and assembled by the Washington University Genome Sequencing Centre (St Louis, USA). These sequences were therefore used in this study to characterise the whole α- and β-globin clusters in the platypus.
One BAC (Oa_Bb-2L7) contained two embryonic α-like globin genes, and a second BAC (Oa_Bb-131M24) contained six α-like globin genes and a β-like globin gene (see Additional file 1). These two BACs were found to overlap by 10,066 base pairs (bp), resulting in a contig of 330,126 bp that contained the entire platypus α-globin cluster and flanking genes.
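The relationship between the two clone lengths, their overlap and the merged contig length can be sketched as below. This is only an illustration of the arithmetic: the overlap (10,066 bp) and contig length (330,126 bp) are taken from the text, but the individual BAC lengths are not stated here, so the clone lengths in the example are hypothetical placeholders chosen to be consistent with those two figures.

```python
def merged_contig_length(len_a: int, len_b: int, overlap: int) -> int:
    """Length of a contig formed by joining two clones that share `overlap` bp."""
    if overlap < 0 or overlap > min(len_a, len_b):
        raise ValueError("overlap must lie between 0 and the shorter clone length")
    return len_a + len_b - overlap


OVERLAP_BP = 10_066   # overlap between the two BACs, from the text
CONTIG_BP = 330_126   # merged contig length, from the text

# Hypothetical clone lengths (NOT from the paper), chosen only so that
# the arithmetic reproduces the reported contig length.
hypothetical_len_first = 170_000
hypothetical_len_second = CONTIG_BP + OVERLAP_BP - hypothetical_len_first

print(merged_contig_length(hypothetical_len_first, hypothetical_len_second, OVERLAP_BP))
```

The overlapping region is counted once, so it is subtracted from the sum of the two clone lengths.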
Table 1. Gene structure of the predicted platypus α- and β-like globin genes and GBY
Two genes at the 5' end of the α-globin cluster were both identified as ζ-like (referred to here as ζ and ζ') and predicted to encode polypeptides of 142 amino acids (aa), which are typical of known functional mammalian α-like globin genes. The amino acid sequence alignment of ζ and ζ' shows 95% identity. In the promoter region of both genes, CACCC and CAAT consensus boxes are conserved at similar positions, and in comparable order to that of human ζ and ζ' (Figure 5B).
Adjoining the two ζ-like globin genes, four other α-like globin genes were identified. One was an orthologue of bird and reptilian αD, and the other three were orthologues of adult α genes (here called α3, α2 and α1). The long and uninterrupted open reading frame (ORF) of αD strongly suggests that it encodes a functional polypeptide of 141 aa, typical of known functional αD globin genes. The platypus αD globin gene contains introns of 1450 bp (intron 1) and 1610 bp (intron 2) that are very large compared with those of other α-like globins, which are usually less than 1000 bp.
Analyses of the platypus adult α-like globin genes revealed three adult globin genes (α3, α2 and α1) in the α-globin cluster. The sequence of α3 (the most 5' gene, adjacent to αD) was found to be almost identical to that of α1 (the most 3' gene) in the exon and intron regions, as well as in flanking regions of about 130 bp on both sides. The coding region was 100% identical, and just two sites in intron 1 were found to be different between the two genes. In order to confirm that identification of these two identical genes was not due to an error in the assembly of the original sequence data, the boundaries of the region containing the homology between α1 and α3 were further analysed by a BLAST search of the platypus whole-genome shotgun (WGS) database (data not shown). Two contigs were identified with homology to α1 and α3; these had identical sequences on one side of the boundary but different sequences on the other, confirming the presence of two separate genes. Further confirmation was obtained by performing a Southern blot on the α-globin-containing BACs, digested with an enzyme (EcoRV) that does not cut within the α1, α2 and α3 genes (data not shown). Probing with α1/α3 revealed two bright bands, corresponding to α1 and α3, and one fainter band between them, corresponding to α2. Probing with α2 produced the same three bands, but in this case the middle one was brighter, corresponding to α2, and the outer bands were fainter, corresponding to α1 and α3. These analyses confirmed the existence of separate genes α1 and α3 in the platypus α-globin cluster. The α2 gene, located between α1 and α3, was distinct from both genes in the coding sequence (with 83% identity), in intron lengths (intron 1: 405 bp in α1/α3 and 720 bp in α2; intron 2: 151 bp in α1/α3 and 155 bp in α2) and in the promoter region (Figure 5B).
The amino acid sequence encoded by α1 and α3 was identical to the platypus adult α-chain previously identified by Whittaker and Thompson , implying that at least one of these genes is expressed in the adult platypus. The coding domain of α1 and α3 is shorter (426 bp) than that of α2 (429 bp), because it lacks the first three nucleotides of exon 1. The ORF of α2 gives a strong indication that it is translated into a functional polypeptide of 142 aa, typical of known functional mammalian α-like globin genes.
On the 3' side of the six α-like globin genes, a β-like globin gene was predicted, which was identified as the orthologue of the marsupial ω-globin gene. This platypus ω-globin gene has a typical three-exon/two-intron structure, conserved donor/acceptor splice sites, and encodes a polypeptide of 146 aa, typical of all vertebrate β-like globin genes (Table 1). The promoter region located 5' of the ω-globin initiation codon contains conserved sites for CAAT-EKLF-CACCC in an order identical to that of marsupial ω-globin gene.
Unexpectedly, GenomeScan predicted a gene based on the protein similarities with the α- and β-polypeptide chains, approximately 1.5 kb 3' of the ω-globin gene. Like other α- and β-globins, this gene also has a three-exon/two-intron structure and conserved donor/acceptor splice sites (Table 1). The lengths of its exons 1, 2, and 3 are 98, 223 and 144 bp, respectively, compared with 92, 223 and 129 bp in other β-like globin genes. However, it has much larger introns of 3364 bp (intron 1) and 3053 bp (intron 2). The long and uninterrupted ORF of this gene can be translated into a polypeptide of 154 aa, which is atypical of any known α- or β-like globin genes. A BLAST search of the amino acid sequence of this gene obtained the best hit with Globin Y (gby) of the amphibian X. laevis (identity score of 39%), and weaker identity scores with Cytoglobins (cygb) of other species, such as the fish Danio rerio (27%), X. tropicalis (26%), chicken (28%) and human (25%) at the protein level. We designated this gene 'GBY' based on similarities with X. laevis gby, and its similar position adjoining the globin cluster . The predicted polypeptide of platypus GBY (154 aa) was shorter than X. laevis gby (156 aa), and quite different from X. laevis cygb (179 aa), D. rerio cygb1 (174 aa) and cygb2 (179 aa), and human CYGB (190 aa). Using the Expressed Sequence Tag (EST) database, a BLAST search of the platypus GBY also obtained an identity score of 38% with X. tropicalis gby that was expressed in both tadpoles and adults, but produced no significant matches with any other mammalian genes. The present work was the first opportunity to analyse the promoter region of any GBY gene (Figure 5B).
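The link between the exon lengths and the predicted polypeptide length quoted above can be checked with a short calculation. The sketch below assumes that the three exon lengths cover the coding sequence only and include the stop codon; that assumption is ours, not a statement from the text, and the function name is illustrative.

```python
def polypeptide_length(exon_lengths_bp, includes_stop_codon=True):
    """Predicted polypeptide length (aa) from the coding lengths of the exons.

    Assumes the exon lengths describe coding sequence only (no UTRs).
    """
    coding_bp = sum(exon_lengths_bp)
    if coding_bp % 3 != 0:
        raise ValueError("total coding length is not a multiple of 3")
    codons = coding_bp // 3
    # The stop codon is part of the ORF but does not encode an amino acid.
    return codons - 1 if includes_stop_codon else codons


gby_exons_bp = (98, 223, 144)  # exons 1, 2 and 3 of platypus GBY, from the text
print(polypeptide_length(gby_exons_bp))  # 465 bp -> 155 codons -> 154 aa
```

Under the same assumption, the 426 bp and 429 bp coding domains reported for α1/α3 and α2 differ by exactly one codon.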
In the platypus, only two β-like globin genes were predicted within the 129,521 bp BAC clone (Oa_Bb-484F22) by GENSCAN and GenomeScan (see Additional file 1). When the predicted amino acid sequences were subjected to a BLAST search, the 5' gene had best hits with mammalian embryonic ε-globin genes. Although the phylogenetic analyses using Bayesian inference (BI; see below) indicated that this gene was more closely related to the platypus and echidna adult β-globin genes than to therian ε-globin genes, its position at the 5' end of the β-globin cluster and the expression data (see below) support its orthology with mammalian embryonic ε-globin genes, and it is henceforth referred to as ε. The 3' gene encoded a protein identical to the previously identified platypus adult β-chain, and is henceforth referred to as β.
Both genes encode polypeptides of 146 aa, typical of known functional mammalian β-like globin genes. The promoter region of the platypus β contains CACCC and CAAT sites that are conserved in all three extant groups of mammals. However, the promoter region of the platypus ε appears to be quite different from those of other mammalian ε-globin genes and even from that of the platypus β (Figure 5B). The promoter of platypus ε contains only one predicted motif (CAAT), whereas the promoters of other mammalian ε, β and the platypus β contain many predicted motifs.
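Locating consensus boxes such as CACCC and CAAT in a promoter region amounts to a simple string search. The following is a minimal sketch, not the prediction method used in this study: it scans the forward strand only, requires exact matches, and the example sequence is a made-up toy string rather than platypus data.

```python
def find_motifs(sequence, motifs):
    """Return {motif: [0-based start positions]} for exact forward-strand matches."""
    seq = sequence.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Step forward by one base so overlapping matches are also found.
            start = seq.find(motif, start + 1)
        hits[motif] = positions
    return hits


# Toy promoter fragment (hypothetical, for illustration only).
toy_promoter = "GGCACCCTTGACAATGG"
print(find_motifs(toy_promoter, ["CACCC", "CAAT"]))
# {'CACCC': [2], 'CAAT': [11]}
```

Comparing the order and spacing of the reported positions between genes is then a matter of sorting the hits by position.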
Transcription studies were performed to gain insight into the expression and function of all of the predicted platypus globin genes. Adult liver, kidney, spleen, testis, lung and brain were obtained for this project: no embryonic samples were available (or are ever likely to be available) for this vulnerable and iconic species. Observation of the expression of any of the predicted genes would constitute a good indication that the gene is transcriptionally active and functional.
Phylogenetic analyses of the α-like globin genes using BI and maximum parsimony (MP) produced several noteworthy results. The platypus adult α globin genes (α1/α3 and α2) grouped closely together to the exclusion of eutherian and marsupial α- and θ-globin genes for all analyses, although posterior probability (69%) and bootstrap support (66%) for this arrangement were relatively weak (Figure 2). This finding suggests that the duplication leading to the marsupial and eutherian θ-globin lineage occurred after the divergence of the monotreme and therian lineages. This is consistent with the absence of a θ-globin gene from the region between platypus α1- and ω-globin, its expected location based on its position in marsupial α-globin clusters [12, 56].
Both platypus ζ-globin genes grouped closely together and formed a sister group relationship with chicken π, supported by a high posterior probability of 97% (Figure 2). A sister group relationship was also found in MP trees for analyses of the entire platypus coding region (bootstrap support <50%), and when third positions in the codon were excluded, was supported by 73% bootstrap pseudoreplicates (data not shown). This differs from the expectation that platypus ζ-globin genes would group with other mammalian ζ-globin genes to the exclusion of chicken π, suggesting that other factors (for example, purifying selection) operated to maintain a similar sequence in birds and monotremes.
There is still considerable uncertainty in the phylogenetic position of the αD-globin clade. It has recently been proposed that the αD globin lineage resulted from duplication of the embryonic α-globin lineage, with phylogenetic analyses supporting a sister lineage relationship of these lineages to the exclusion of the adult α-globin lineage . However, this arrangement was not supported in BI analyses of the data set used here, and the position of the αD lineage was different in the different analyses. Analyses using BI (Figure 2) supported the sister lineage relationship of the αD and adult α-globin lineages (as proposed by Cooper et al. ), with 87% posterior probability support. In contrast, all MP analyses supported the sister lineage status of αD and embryonic α-globin genes, indicating an uncertainty in the phylogenetic position of the αD-globin clade.
Phylogenetic analyses of the β-globin genes provided results similar to recently reported phylogenetic analyses [35, 36], with one notable exception. The BI analyses of coding sequence data (Figure 3) provided strong support (99% posterior probability) for the sister relationship of bird and mammalian β-like globin genes, contradicting previously published phylogenies of mammalian β-globin genes showing a sister relationship of marsupial ω-globin and bird β-like globin genes [35, 36]. MP analyses (Figure 4), excluding third position in the codon, gave a similar tree arrangement, albeit with very low bootstrap support (<50%). In marked contrast to the BI analyses of DNA sequence data, BI protein analyses (data not shown) supported the sister relationship of bird β-like globin and mammal ω-globin lineages with a high posterior probability (99%).
Lastly, phylogenetic analyses using BI indicated that the platypus ε gene was more closely related to the platypus and echidna adult β-globin genes than to therian ε-globin genes, suggesting it may not be orthologous to marsupial and eutherian ε-globin (Figure 3). BI analyses of β-globin protein data and MP analyses of the coding sequence data, with third codon positions excluded, grouped the gene as an ancestral lineage to eutherian and monotreme adult β-globin genes (see Figure 4). This ancestral position suggests that the lineage evolved following duplication of an ancestral β-globin gene prior to the divergence of monotremes and therians.
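Several of the MP analyses described above were run with third codon positions excluded. As a hedged illustration of what that preprocessing step involves (the actual software and settings used are not given in this section), the sketch below removes every third base from an in-frame coding sequence; the input string is a toy example.

```python
def drop_third_codon_positions(cds: str) -> str:
    """Keep only first and second codon positions of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Take the first two bases of each codon and discard the third.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


toy_cds = "ATGGCCTAA"  # hypothetical three-codon sequence
print(drop_third_codon_positions(toy_cds))  # ATGCTA
```

Third positions are where most synonymous substitutions occur, so removing them reduces the influence of potentially saturated sites on the tree.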
To explore the genome context of the α- and β-globin clusters in the platypus and other vertebrates, the platypus BAC sequences and the genomes of other sequenced species were searched for loci residing beside the α- and β-globin clusters.
As well as globin genes, GENSCAN predicted within the platypus α-globin 330,126 bp contig many genes that flank the platypus α-globin cluster (Figure 5A), which were identified by BLAST analyses. These include IL9RP3-POLR3K-C16orf33-C16orf8-MPG-C16orf35 upstream (5') of the α-globin cluster, and, LUC7L-ITFG3-RGS11-ARHGDIG-PDIA2-AXIN1 downstream (3') of the α-globin cluster (Figure 5A).
GENSCAN also predicted numerous genes other than globin genes in the platypus β-globin BAC (484F22). These were identified by a BLAST search as members of the olfactory receptor gene (ORG) family that are responsible for odour detection. Three conserved ORG members were identified at the 5' end of the platypus β-globin cluster and one conserved ORG member at the 3' end (Figure 5A).
To compare β-globin flanking loci, ORG genes and the other genes closest to the β-globin cluster in other species (RRM1, CCKBR and ILK) were searched for in the human, opossum, chicken and zebrafish genomes accessible from Ensembl. Data from frog (X. tropicalis) were not useful, since all of these loci lie on different contigs or scaffolds due to assembly problems. The locations of multiple ORG genes, RRM1, CCKBR and ILK were found to be conserved adjacent to the β-globin clusters of birds and mammals [60, 61], but not adjacent to the α-β cluster of fish and frogs, nor beside the second α-β cluster of zebrafish and pufferfish (Figure 8B). Thus the genome context of the platypus β-globin cluster is the same as in therian mammals and birds, but differs from that of the α-β cluster of fish and frogs.
The phylogenetic position of monotremes makes comparisons with platypus of special value for exploring the organization, function and evolution of mammalian genes and genomes. The availability of platypus genome sequence data now makes many such studies possible, and these data have been used here to characterise the platypus α- and β-globin gene clusters and explore their evolutionary history.
The platypus α-globin cluster contains at least eight genes within more than 40 kb, including six α-like globin genes (including the identical α1 and α3), one β-like globin gene (ω-globin) and a gene belonging to another member of the globin super-family (GBY) arranged in the order 5'-ζ-ζ'-αD-α3-α2-α1-ω-GBY-3' (Figure 5A). The cluster maps to chromosome 21, the smallest autosome in platypus. All eight genes are likely to be functional since their expression was detected in tissues of an adult platypus.
Importantly, the platypus α-globin cluster contains a copy of the β-like ω-globin gene, also found in the marsupial α-globin cluster, but absent in humans, supporting the hypothesis that ω-globin was present in the common ancestor of all mammals. Phylogenetic analyses also confirm the ancient ancestry of the ω-globin gene, as concluded by Wheeler et al. [35, 36]. Among adult platypus tissues this gene was expressed only in the spleen. In marsupials, expression of the ω-globin gene was detected just prior to birth and during early pouch young development , although the site of expression was not studied, and there was no evidence of adult expression in blood cells.
We discovered a globin gene GBY in the platypus that is adjacent (3') to ω in the α-globin cluster. It has a typical three-exon/two-intron structure like other α/β-globin genes, contains an ORF encoding a polypeptide chain of 154 aa, and is expressed in almost all adult tissues, most strongly in testis. The amino acid sequence is unrelated to any of the other globin genes in the cluster, so it is unlikely to be derived by duplication of α- or ω-globin within the monotreme lineage. Rather, it shows sequence similarity to gby of X. tropicalis and X. laevis, a gene thought to be related to cytoglobins .
Little is known of the function of amphibian gby, or its relationship with other globins. Fuchs et al.  reported that amphibian gby encodes a bona fide globin of 156 aa, having all of the sequence features of a functional respiratory protein. gby was expressed in all adult tissues tested in X. laevis, most strongly in ovary, kidney and eye, and was present in 20 expressed sequence tag clones from different stages of X. laevis and X. tropicalis embryonic and adult development , suggesting that it is expressed in embryonic as well as adult stages. Phylogenetic analysis of all vertebrate globins  showed that the gby lineage diverged at the base of two separate clades, one comprising all vertebrate cytoglobins, myoglobins, agnathan globins and bird globin E, and the other comprising the haemoglobin α- and β-chains.
The position of platypus GBY adjacent to the α-globin cluster and flanked by LUC7L mirrors its position in X. tropicalis between the main α-β cluster and LUC7L . Another common feature of both was strong expression in gonads (ovary in X. laevis  and testis in platypus), so GBY has sex-related expression in both lineages. Thus GBY is not specific to amphibians, as was thought, but was a component of the cluster in an ancient tetrapod, and has been lost, or has diverged beyond recognition, in birds and therian mammals.
Characterisation of the platypus β-globin cluster revealed two β-like globin genes spanning about 13.2 kb that are arranged in the same order as in marsupials, 5'-ε-β-3' (Figure 5A). This cluster is located on platypus chromosome 2q5.1. Both genes appear to be transcriptionally active and are likely to be functional.
At the time of revising this paper, an independent paper on monotreme β-like globin genes was published by Opazo et al., in which they reported the presence of ω, εP and βP in the platypus. Largely on the basis of phylogenetic analyses of flanking and coding sequence data, they proposed that platypus εP and βP were not 1:1 orthologues of therian ε and β, respectively, and arose by independent duplication of an ancestral β-globin gene in the monotreme lineage, with a separate duplication event, just prior to the divergence of therians, producing the progenitors of ε and β of therians. This hypothesis was strongly supported by our BI phylogenetic analyses (Figure 3), but not by MP analyses of coding sequence data with third codon sites excluded (Figure 4), or by BI analyses of protein sequence data (not shown). These contradictory analyses highlight the difficulty in resolving deep relationships among globin genes, particularly when the time periods between duplication and speciation events are relatively small, the phylogenetic signal at third codon positions is potentially saturated, and non-synonymous sites may be subjected to purifying or positive selection. Despite a very high posterior probability (100%) for the grouping of platypus ε with monotreme β, this value is a Bayesian probability and depends on the model adequately representing the evolution of the gene. Furthermore, although it was reported that the 5' flanking sequences of platypus ε and β were similar, we found no evidence for similarity of the promoter signals of these two genes (Figure 5B).
We consider that a more parsimonious explanation is that the platypus ε is orthologous to the marsupial and eutherian embryonic β-like globin lineages (ε and γ), and arose by duplication of an ancestral β-globin gene prior to the mammalian radiation (166 MYA; Figure 9B). The sequence of platypus ε may have been homogenised by gene conversion events, leading it to group with other monotreme adult β-like globin genes. In addition to the MP analyses reported above, this explanation is further supported by the conserved position of ε to the 5' side of the adult β-globin gene in the platypus cluster, which is similar to that found in therian β-globin gene clusters. Amino acid sequence analyses (BlastP) also provided additional support for the orthology of platypus ε to other mammalian ε-globin genes. Although we were unable to examine the expression of the genes in embryonic tissues, the expression profile of the platypus ε was similar to that of the embryonic α-like globins ζ and ζ' of the platypus, but not to that of the adult β-globin gene, supporting its potential role as an embryonic β-like globin gene.
The discovery of the marsupial ω-globin gene in the α-globin cluster [35, 36] was critical in re-interpreting the relationships of the α- and β-globin clusters in amniotes (reptiles, birds and mammals) to favour the hypothesis that these clusters in birds and mammals are paralogous, having diverged independently from different ancestral copies of the vertebrate α-β-globin locus.
Our observation of an ω-globin gene in the α-globin cluster in the platypus, as well as in the marsupials, confirms that the ancestral mammal α-globin cluster contained a β-like globin gene that was lost in eutherians, as proposed by Wheeler et al. [35, 36]. However, the position of monotreme and marsupial ω in the phylogeny (Figure 3) is more consistent with the original hypothesis that mammal and bird β-globin are orthologous, having descended from the same β-globin progenitor in an amniote ancestor, and this is strongly supported by flanking sequence data (see below). Our data support, with a high posterior probability (99%), the proposition that the ω-globin gene represents an ancient β-like globin gene lineage that is ancestral to a group containing both mammalian and bird β-globins. This arrangement, however, was not supported by analyses of amino acid sequence data, indicating either that there is uncertainty in the phylogenetic position of ω-globin relative to bird β-globins, or that convergent evolution of bird β-globin genes and ω-globin resulted in their similarity at the protein level. To further resolve the key question of whether bird and mammal β-globin gene clusters are orthologous, we carried out comparative analyses of flanking loci of the α- and β-globin clusters.
We found that the platypus α-globin cluster is flanked by MPG, C16orf35, GBY and LUC7L, and that the same genes (except GBY) flank the α-globin cluster in mammals and birds [58, 59]. The same genes flank the α-β cluster of frog, and even of zebrafish, and the α-cluster of pufferfish (except GBY and LUC7L), implying that a very ancient region containing these genes (5'-MPG-C16orf35-α-β-GBY-LUC7L-3'), or perhaps an even larger region, was present in their common ancestor and has been conserved since the evolution of jawed vertebrates more than 450 MYA.
In contrast, the amniote β-globin clusters reside in a very different genomic context, sharing none of the flanking loci with the mammal and bird α-globin clusters, or the α-β cluster of frogs and fish. In platypus, as well as in therian mammals [60, 61], the β-globin clusters are flanked by numerous ORG genes on both sides. In birds, also, the β-globin cluster is embedded in ORG genes. Even the outside loci RRM1, CCKBR and ILK lie in the same orientation with respect to the bird and mammalian β-globin clusters, suggesting that the 5'-RRM1-ORG-β (cluster)-ORG-CCKBR-ILK-3' arrangement has been conserved since before the divergence of birds and mammals, more than 315 MYA. Therefore, the bird β-globin cluster is orthologous to the β-globin clusters of mammals.
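The flanking-locus argument is essentially a set comparison, and can be illustrated with a minimal Python sketch. Only the gene names come from the text; the dictionary layout, the cluster labels and the function name are illustrative assumptions, and "ORG" is used as a single token standing in for the olfactory receptor gene arrays.

```python
# Flanking loci as named in the text. "ORG" is a single stand-in token for the
# olfactory receptor gene arrays (a simplification made for illustration).
FLANKING_LOCI = {
    "platypus_alpha": {"MPG", "C16orf35", "GBY", "LUC7L"},
    "other_mammal_bird_alpha": {"MPG", "C16orf35", "LUC7L"},
    "bird_beta": {"RRM1", "ORG", "CCKBR", "ILK"},
    "mammal_beta": {"RRM1", "ORG", "CCKBR", "ILK"},
}


def shared_flanking_loci(cluster_a, cluster_b):
    """Return the flanking loci common to two named clusters."""
    return FLANKING_LOCI[cluster_a] & FLANKING_LOCI[cluster_b]


if __name__ == "__main__":
    # Bird and mammal beta clusters share their whole flanking context ...
    print(sorted(shared_flanking_loci("bird_beta", "mammal_beta")))
    # ... whereas alpha and beta clusters share none of it.
    print(sorted(shared_flanking_loci("platypus_alpha", "mammal_beta")))
```

An empty intersection between an α-cluster and a β-cluster, together with a complete overlap between the bird and mammal β-clusters, is the pattern that the orthology argument rests on.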
This analysis of flanking loci, in addition to the phylogenetic analyses reported above, refutes the prevailing hypothesis that mammal and bird α- and β-globin clusters evolved from different (paralogous) copies of an ancestral α-β-globin region containing MPG-C16orf35-α (cluster)-β (cluster)-GBY-LUC7L. Rather, the context of β-globin clusters within olfactory receptor genes in birds as well as mammals suggests that a copy of a β-globin locus was moved into a region replete with ORG genes before the divergence of birds and mammals 315 MYA. The precise mechanism for this translocation is unknown, but is likely to be either by transposition of a tandem duplicate of an ancestral β-globin gene, or retrotransposition of an intron-containing primary transcript. Phylogenetic analyses suggest that this ancestral β-globin gene within the α-globin cluster is represented by the platypus and marsupial ω-globin gene. The transposed β-globin gene then independently duplicated several times within the avian and mammalian lineages to form the different clusters of differentially expressed β-globin genes. Full details of this new model are given in Figure 9A and 9B.
This hypothesis could be further tested by investigating the gene organization of the α- and β-globin clusters in reptiles such as lizards and snakes, which form a sister group to birds. Our hypothesis predicts that reptiles should possess a MPG-C16orf35-α (cluster)-β (cluster)-GBY-LUC7L cluster, and an unlinked RRM1-ORG-β (cluster)-ORG-CCKBR-ILK cluster like birds and mammals. The full genome sequence of the first reptilian species, Anolis carolinensis, will provide an opportunity to test this hypothesis.
At the start of this project there were no trace sequences available for any globin genes in the platypus trace archive. We therefore designed probes to screen the platypus male Oa_Bb BAC library (Clemson University Genomic Institute, USA). The platypus β-globin-specific primers OaBGF (5'-tggacccagaggttctttgac-3') and OaBGR (5'-tgcaattcactcagcttggag-3') were designed from the reference tammar β-globin sequence [GenBank: AY450928] using Primer3. Amplification by PCR was performed in a final volume of 25 μl, with 40 ng genomic DNA, 1× Buffer (Roche, Australia), 0.2 mM dNTPs, 0.05 U Taq (Roche, Australia) and 1 μM each of forward and reverse primers. PCR cycling conditions were: 94°C for 2 minutes, then 35 cycles of 94°C for 30 seconds, 50 to 60°C for 30 seconds, 72°C for 1 minute, followed by 72°C for 10 minutes. The PCR products were sub-cloned according to the TOPO TA cloning® Kit Protocol (Invitrogen, Australia) and the resulting plasmids were purified according to the centrifugation protocol of Wizard® Plus SV Minipreps DNA Purification System (Promega, Australia). The purified plasmids were confirmed to contain PCR products of a partial platypus β-globin gene (167 bp) by sequencing at the Australian Genome Research Facility (AGRF, Brisbane, Australia) using M13 forward (5'-gtaaaacgacggccag-3') and M13 reverse (5'-caggaaacagctatgac-3') primers. Once confirmed, they were used as probes to screen the platypus BAC library.
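As a quick sanity check on the primers quoted above, the following Python sketch computes their length, GC content and a rough melting temperature. It uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is an assumption made here for illustration and is not the thermodynamic calculation performed by Primer3; the function names are hypothetical.

```python
# Primer sequences as given in the text.
PRIMERS = {
    "OaBGF": "tggacccagaggttctttgac",
    "OaBGR": "tgcaattcactcagcttggag",
}


def gc_count(primer):
    """Number of G and C bases in the primer."""
    seq = primer.upper()
    return seq.count("G") + seq.count("C")


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    This is a coarse approximation, most often quoted for short oligos.
    """
    gc = gc_count(primer)
    at = len(primer) - gc
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt",
              "GC=%d/%d" % (gc_count(seq), len(seq)),
              "Tm~%d C" % wallace_tm(seq))
```

Under this approximation both 21-mers have estimates in the low 60s (°C), a few degrees above the 50 to 60°C annealing range used in the cycling conditions.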
The platypus BAC library filters were pre-hybridised at 65°C with Church Buffer (1 mM EDTA, 0.5 M phosphate buffer, 7% (w/v) SDS) including 1% BSA for 4 hours. The platypus β-globin probes (25 ng) were labelled with 32P-dATP using the Megaprime DNA labelling System (GE Healthcare, Australia) following the manufacturer's instructions. The probes were allowed to hybridise to the filters at stringent conditions (65°C with the above buffer) for 24 hours and then washed twice for 15 minutes each in 2 × SSC/0.1%SDS and 1 × SSC/0.1%SDS. Autoradiography was carried out for 14 days at -80°C with an intensifying cassette.
Unlike the β-like globin genes, the α-like globin genes were not identified by screening the BAC library. Instead, they were identified directly from the Encyclopedia of DNA Elements (ENCODE) Project, in which the α-globin cluster is one of the targeted regions [12, 66]. Two platypus BAC clones (Oa_Bb-2L7 and Oa_Bb-131M24), which were sequenced but not yet annotated, were identified by computational analysis (below) to contain parts of the α-globin cluster and an ω-globin gene.
DNA from the identified BAC clones (including those that were screened) was extracted using Wizard® Plus SV Minipreps DNA Purification System (Promega, Australia). The purified BAC clones were then subjected to dot blot or Southern blot analysis to confirm the presence of α- or β-globin genes, respectively.
Dot blot methods were used to verify the presence of the α-like globin genes. A Hybond N+ (GE Healthcare, Australia) filter was placed on a plate containing Luria broth agar with chloramphenicol, and multiple 1 μl aliquots of liquid BAC clone cultures were spotted onto the filter. The plate was incubated at 37°C overnight and then the filter was soaked in Denaturation Solution (0.5 M NaOH and 1.5 M NaCl) for 5 minutes, followed by soaking twice in Neutralisation Solution (0.5 M Tris-Cl pH 7.4 and 1.5 M NaCl) for 5 minutes each. The filter was then rinsed in 2 × SSC, soaked in 0.4 M NaOH for 20 minutes and washed with 6 × SSC to remove all cellular debris. The filters were then screened with the platypus α-globin probes using the standard library screening procedure (above).
Southern blotting was used to verify the presence of the β-like globin genes. In a 40 μl reaction, 20 to 40 ng BAC DNA was digested with 10 U of the restriction enzyme HindIII (Roche, Australia). The reaction was incubated at 37°C for at least 4 hours and separated by electrophoresis on a 0.8% agarose gel overnight at 40 V. The DNA fragments were transferred onto a Hybond N+ (GE Healthcare, Australia) nylon filter overnight by capillary action following the manufacturer's instructions, and cross-linked in 0.4 M NaOH for 20 minutes. These filters were then screened with the platypus β-globin probes using the standard library screening procedure (above).
Male platypus metaphase spreads were prepared and in situ mapping was performed using two-colour FISH as described previously by McMillan et al. The verified BACs containing the α-like globin genes (ζ and ζ': Oa_Bb-2L7) and β-like globin genes (ε and β: Oa_Bb-484F22) were labelled with different fluorochromes and then hybridised to the chromosomes. The signals were detected by fluorescent microscopy, where at least twenty metaphase images were captured and analysed.
Information about the platypus BAC clones containing the α-like globin genes along with the ω-globin gene was obtained directly from the ENCODE Project. Their sequence information was obtained from GenBank; accession numbers: AC195438 (Oa_Bb-2L7) and AC203513 (Oa_Bb-131M24).
The BAC clone containing the β-like globin genes that was found by the library screening procedure was sequenced at the Washington University Genome Sequencing Centre (St Louis, USA). The sequence information for this BAC clone was obtained from GenBank: AC192436 (Oa_Bb-484F22).
Using sequence information of AC195438, AC203513 and AC192436, genes were predicted by GENSCAN and GenomeScan using default settings. All predicted gene sequences were then subjected to a BLAST search of the translated nucleotide (BlastX) and protein (BlastP) databases to confirm their identities.
Transcription factor binding motifs were predicted in the 200 bp promoter region located 5' to the predicted platypus α- and β-like genes and GBY by rVista 2.0 using user-defined consensus sequences for 'CACCC', 'CAAT', 'TATA', GATA1 ('WGATAR') and EKLF ('NGNGTGGGN'). The same criteria were used to predict the same motifs in marsupials (Didelphis virginiana [ζ and ψζ': AC139599] and Sminthopsis macroura [αD, ψα3, α2, α1, ω: AC146781; and ε, β: AC148754]) and in humans [ζ, ψζ', αD, ψα3, α2, α1: NG_000006; and ε, β: NG_000007] for consistency in comparison.
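The consensus strings above use IUPAC ambiguity codes (W = A or T, R = A or G, N = any base). A minimal Python sketch of how such a consensus can be matched against a promoter sequence is given below. It is not a reimplementation of rVista 2.0: it scans the forward strand only, the function names are hypothetical, and the example promoter is an invented toy sequence rather than platypus data.

```python
import re

# IUPAC nucleotide codes needed for the consensus sequences quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "R": "[AG]", "N": "[ACGT]",
}

CONSENSUS = {
    "CACCC": "CACCC",
    "CAAT": "CAAT",
    "TATA": "TATA",
    "GATA1": "WGATAR",
    "EKLF": "NGNGTGGGN",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif(sequence, consensus):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=(%s))" % consensus_to_regex(consensus))
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_promoter = "CCAGATAAGGGCACCCTTTATAAGG"  # invented example sequence
    for name, consensus in CONSENSUS.items():
        print(name, find_motif(toy_promoter, consensus))
```

A fuller scan would also search the reverse complement and restrict hits to the 200 bp window upstream of each predicted gene.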
To confirm that the presence of two almost identical genes (α1 and α3) was real rather than an assembly error, the boundaries (~300 bp) of the homologous regions were investigated by a BLAST search against the platypus WGS database. The raw sequences of best hits were extracted from NCBI GenBank, cleaned and aligned in Sequencher v4.8 (Gene Codes Corporation, Michigan) using default settings.
Southern blotting was also used to verify the presence of α1 and α3 genes. In a 30 μl reaction, 100 μg BAC DNA (Oa_Bb: 131M24, 130N2, 150K14 and 223I12) was digested with 10 U of the restriction enzyme EcoRV (Roche, Australia). The reaction was incubated at 37°C for at least 4 hours and separated by electrophoresis on a 0.8% agarose gel overnight at 40 V. The DNA fragments were transferred onto a Hybond N+ (GE Healthcare, Australia) nylon filter overnight by capillary action following the manufacturer's instructions, and cross-linked in 0.4 M NaOH for 20 minutes. These filters were then screened with the platypus α1/α3 (test) and α2 (control) probes using the standard library screening procedure (above).
Table: PCR primers used for amplification of the α- and β-like globin genes, including GBY, from platypus gDNA and cDNA.
Phylogenetic analyses were employed to verify the identities of the platypus globin genes and study the evolutionary relationships of the different members of the α- and β-globin gene families. This study was restricted to the coding domains of the α- and β-globin members and the accession numbers of the sequences used are given in the legends of Figures 2 and 3. Phylogenetic analyses were conducted using MP in PAUP* v.4.0b10, and a BI approach using MrBayes v.3.1.2. Concordance of trees from each of the different methods, bootstrap proportions and posterior probability estimates were used to examine the robustness of nodes.
MP analyses were conducted for the entire coding sequence matrix and after excluding third codon positions using a heuristic search option and default options (TBR branch swapping), with the exception of using random stepwise addition repeated 100 times. Character state optimisation for MP trees used the DELTRAN option. MP bootstrap analyses were carried out using 1000 bootstrap pseudoreplicates, employing a heuristic search option with random stepwise addition.
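Two of the data manipulations described here, excluding third codon positions and drawing a bootstrap pseudoreplicate, can be sketched in a few lines of Python. This is an illustration of the general technique only, not of how PAUP* implements it; the function names and the toy alignment are assumptions, and the sequences are taken to be in-frame coding sequences of equal aligned length.

```python
import random


def drop_third_positions(coding_seq):
    """Keep first and second codon positions of an in-frame coding sequence."""
    return "".join(base for i, base in enumerate(coding_seq) if i % 3 != 2)


def bootstrap_alignment(alignment, rng):
    """Build one pseudoreplicate by resampling alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    rng: a random.Random instance, so that replicates are reproducible.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"taxonA": "ATGGCC", "taxonB": "ATGGCA"}  # invented toy alignment
    print({t: drop_third_positions(s) for t, s in toy.items()})
    print(bootstrap_alignment(toy, random.Random(1)))
```

In a full analysis, a tree search would be run on each of the 1000 pseudoreplicates and the proportion of replicates recovering each node reported as its bootstrap support.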
The program MODELTEST and the Akaike information criterion (AIC) were used to assess the most appropriate model for BI analyses. The MODELTEST analyses were facilitated using the program MrMTgui v1.0. The MODELTEST analysis was carried out on separate codon positions for α- and β-globin data sets. For α-globin sequences, a general time reversible (GTR) model, with a proportion of invariant sites (I) and unequal rates among sites, modelled with a gamma distribution (G), was found to be the most appropriate model to use for first and second codon positions, and a GTR+G model was appropriate for third codon positions under the AIC. For β-globin sequences a GTR+I+G model was considered appropriate for first positions, and a GTR+G model was found to be appropriate for second and third codon positions. The MrBayes analysis was carried out applying these different models to each codon position using an unlinked analysis, with default uninformative priors. Four chains were run simultaneously for 2 million generations in two independent runs, sampling trees every 100 generations. The program TRACER (version 1.3) was used to assess tree and parameter convergence. For both the α-globin and β-globin analyses all effective sample sizes for all parameters were >1297, indicating a sufficient sample of the parameter space had been taken. A burn-in of 2000 trees (equivalent to 200,000 generations) was chosen for each independent run of MrBayes, with a >50% posterior probability consensus tree constructed from the remaining 36,002 trees (18,001 trees each run).
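The tree counts quoted above follow directly from the run settings, as the short Python sketch below shows. It assumes that the starting state at generation 0 is sampled in addition to every 100th generation, which is what the reported figure of 18,001 trees per run implies; the function names are illustrative.

```python
def sampled_trees(generations, sample_every):
    """Trees sampled in one run, counting the starting tree at generation 0."""
    return generations // sample_every + 1


def retained_trees(generations, sample_every, burnin_trees, runs):
    """Trees left for the consensus after discarding the burn-in from each run."""
    per_run = sampled_trees(generations, sample_every) - burnin_trees
    return per_run, per_run * runs


if __name__ == "__main__":
    per_run, total = retained_trees(
        generations=2_000_000, sample_every=100, burnin_trees=2000, runs=2
    )
    print("burn-in generations:", 2000 * 100)
    print("retained per run:", per_run)
    print("retained in total:", total)
```

With 2 million generations sampled every 100, each run yields 20,001 trees, of which 18,001 remain after the burn-in, giving 36,002 across the two runs.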
A BI analysis using MrBayes (version 3.1.2) was also carried out using protein sequence data from β-globin genes. A mixed protein model was used, allowing the optimum model of protein evolution to be assessed from a selection of nine fixed-rate models. The optimum model was found to be the Dayhoff model with a posterior probability of 1.0. The analyses were conducted using two million generations in two independent runs, sampling trees every 100 generations. A burn-in of 2,000 trees was used for each run with a 50% consensus tree constructed from the remaining 36,002 trees.
We thank Dr Frank Grützner (The University of Adelaide) and Tim Hore (The Australian National University) for providing platypus tissues and RNA preparations. We also thank Dr Paul Waters for providing computational help with the identification and confirmation of the two almost identical but distinct genes, α1 and α3.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.