On the origin of POU5F1
© Frankenberg and Renfree; licensee BioMed Central Ltd. 2013
Published: 9 May 2013
Pluripotency is a fundamental property of early mammalian development but it is currently unclear to what extent its cellular mechanisms are conserved in vertebrates or metazoans. POU5F1 and POU2 are the two principal members constituting the class V POU domain family of transcription factors, thought to have a conserved role in the regulation of pluripotency in vertebrates as well as germ cell maintenance and neural patterning. They have undergone a complex pattern of evolution which is poorly understood and controversial.
By analyzing the sequences of POU5F1, POU2 and their flanking genes, we provide strong indirect evidence that POU5F1 originated at least as early as a common ancestor of gnathostomes but became extinct in a common ancestor of teleost fishes, while both POU5F1 and POU2 survived in the sarcopterygian lineage leading to tetrapods. Less divergent forms of POU5F1 and POU2 appear to have persisted among cartilaginous fishes.
Our study resolves the controversial evolutionary relationship between teleost pou2 and tetrapod POU2 and POU5F1, and shows that class V POU transcription factors have existed at least since the common ancestor of gnathostome vertebrates. It provides a framework for elucidating the basis for the lineage-specific extinctions of POU2 and POU5F1.
Loss of potency during differentiation is fundamental to the development of complex metazoans. Pluripotent embryonic cells are able to give rise ultimately to all tissues of the adult body. In at least some mammals, pluripotency can be “captured” in vitro in the form of indefinitely self-renewing embryonic stem (ES) cells. Thus ES cells can serve as a model for the differentiation of their in vivo counterparts into ectoderm, mesoderm and endoderm derivatives.
POU5F1 (also called OCT4 or OCT3/4) is a central regulator of pluripotency in mammals. In the mouse, deletion of Pou5f1 causes loss of pluripotency in the inner cell mass and differentiation to trophoblast, revealing its earliest developmental role. POU5F1 is also a potent reprogramming factor capable of facilitating the derivation of induced pluripotent stem (iPS) cells [2, 3]. Conditional knockout of Pou5f1 in mouse primordial germ cells results in their apoptosis, showing that the role of POU5F1 is not exclusively restricted to preventing differentiation.
POU2 is a vertebrate paralog of POU5F1 that has been best characterized in zebrafish. Curiously, some vertebrate lineages, such as salamanders, marsupials and monotremes, have preserved both POU2 and POU5F1 in their genomes while in other vertebrates one or the other gene has become extinct [5–7]. Thus squamate reptiles and eutherian mammals have only POU5F1 while birds and frogs have only POU2 (called POUV in birds). In Xenopus, POU2 is present as three tandem copies - OCT25, OCT60 and OCT91.
For reasons that are not fully clear, teleost pou2 was recently renamed pou5f1 despite multiple pieces of evidence for a closer affinity to POU2 orthologs of tetrapods. Onichtchouk argued that since orthologous genes are defined “as originating from a single ancestral gene in the last common ancestor of the compared genomes”, teleost pou2 is orthologous to mammalian POU5F1. However, by the same argument, teleost pou2 is also orthologous to tetrapod POU2 orthologs, thus obviating the need for a name change. Teleost pou2 shares more sequence similarity as well as conserved synteny with tetrapod POU2 [5, 6], but perhaps more importantly, it was not proven whether the duplication event giving rise to each paralog occurred after or before the common ancestor of tetrapods and teleost fishes. If the latter, POU5F1 must have become extinct in teleosts as it has in some tetrapod lineages such as birds and frogs.
POU5F1 and POU2 share a five-exon genomic structure that is characteristic of the class V POU family. Exons 1 and 5 encode the poorly conserved N- and C-terminal transactivation domains, respectively, while Exons 2 to 4 encode the highly conserved POU-domain, which comprises the POU-specific domain and the POU-homeodomain separated by a short linker region [9–11].
To gain insight into the origins of the class V POU family of transcription factors in vertebrates, BLAST searches were performed for sequences homologous to mammalian POU2 and POU5F1. Previously unreported orthologs of POU5F1 were identified from a large number of vertebrate species, including the painted turtle (Chrysemys picta bellii), Indian python (Python molurus) and coelacanth (Latimeria chalumnae). POU2 orthologs were also identified in many species, including the alligator (Alligator mississippiensis), painted turtle, coelacanth and spotted gar (Lepisosteus oculatus).
The avian POU2 ortholog - POUV - was identified in genome assemblies of the turkey (Meleagris gallopavo), medium ground finch (Geospiza fortis) and budgerigar (Melopsittacus undulatus), adding to the previously identified orthologs from chicken and zebra finch. Conserved open reading frames orthologous to chicken Exon 1 could not be identified in other avian species. As chicken Exon 1 was previously identified as unlikely to be homologous to Exon 1 from non-avian orthologs, all available avian genomes were re-examined. Low stringency BLAST searches identified a single sequence (Ti 224571611) from the chicken whole genome shotgun (WGS) trace archives with homology to the proximal promoter and 5′ part of Exon 1 of non-avian POU2 orthologs (see below). In addition, a primordial germ cell-derived partial chicken EST (GenBank accession DR410403) included sequence with clear homology to the 3′ part of Exon 1 from non-avian POU2 orthologs. The apparent absence of both the proximal POU2 promoter and the “canonical” Exon 1 in other birds is probably due to gaps in their respective genome assemblies, suggesting that features of this region impart recalcitrance to sequencing. We conclude that the previously published cDNA for chicken POUV represents a rare or non-canonical chicken-specific transcript (retaining the first intron) that was selectively isolated due to the PCR-based methods used.
To maximize statistical power, we first compared the translated dogfish sequence (the only chondrichthyan sequence spanning Exons 5 to 8) with NPDC1 and NPDC1L orthologs of other species, including a tunicate (Ciona savignyi) NPDC1/NPDC1L homolog as an outgroup. The dogfish sequence clustered with NPDC1L orthologs with a significant bootstrap value using three different methods for generating consensus phylogenetic trees (maximum parsimony, maximum likelihood and neighbor-joining) (Figure 3A). In a comparison with the sequences from coelacanth, a species with both NPDC1L and NPDC1 (to control for lineage-specific differences in divergence rate), the dogfish sequence was clearly more similar to NPDC1L than to NPDC1, indicating that its clustering with NPDC1L in the consensus trees was not simply due to more rapid divergence from an ancestral NPDC1-like sequence (Figure 3E). This indicated that a gene more similar to NPDC1L than to NPDC1 has existed since at least as early as the common ancestor of Chondrichthyes and Osteichthyes, and that duplication of an NPDC1/NPDC1L ancestral gene must have occurred before the split between Sarcopterygii and Actinopterygii, since both groups have NPDC1 orthologs that are more similar to each other than to NPDC1L. To examine whether the duplication occurred even earlier in a common ancestor of Chondrichthyes and Osteichthyes, we performed phylogenetic analyses of the other chondrichthyan sequences from elephantfish and little skate (Figure 3B-D). Both of the elephantfish sequences (spanning Exons 5 to 6 and 7 to 8, respectively) clustered with NPDC1 orthologs and were separate from the dogfish sequence and NPDC1L orthologs, regardless of the exon analyzed or the method used.
For the little skate, one of the two Exon 5 sequences and one of the two Exon 8 sequences clustered with the dogfish sequence regardless of the analysis method and with significant bootstrap values for three of the six analyses (one for Exon 5 and two for Exon 8), indicating that these sequences are orthologous to the dogfish sequence. The other little skate Exon 5 and Exon 8 sequences, plus the Exon 6 sequence, each clustered with an elephantfish sequence to the exclusion of all other sequences in almost every case (8/9), with only one (non-significant) exception (Exon 5 - maximum parsimony; Figure 3B). Bootstrap values for this clustering were significant in three of the other eight analyses (Exon 6 - maximum parsimony; Exon 8 - maximum likelihood and neighbor-joining). These results strongly suggested that chondrichthyans collectively have both NPDC1 and NPDC1L paralogs and that both are present in the little skate genome. To exclude the possibility that the putative NPDC1 ortholog (in elephantfish and little skate) is a chondrichthyan-specific paralog of the dogfish sequence, we compared the two elephantfish sequences (Exons 5 to 6 and 7 to 8) to coelacanth NPDC1 and NPDC1L (Figure 3E). Both elephantfish sequences were more similar to coelacanth NPDC1 than to either the dogfish sequence or coelacanth NPDC1L, strongly arguing against a scenario in which the elephantfish sequences are derived from a chondrichthyan-specific duplication of an ancestral NPDC1/NPDC1L precursor that was more similar to extant NPDC1L orthologs than to NPDC1 orthologs. It may thus be concluded that orthologs of both NPDC1 and NPDC1L are present among cartilaginous fishes and, therefore, that the duplication event giving rise to POU2 and POU5F1 must have occurred at least as early as a common ancestor of extant gnathostomes.
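The pairwise comparisons against coelacanth NPDC1 and NPDC1L (Figure 3E) come down to asking which of two reference paralogs a query sequence is more similar to. The following is a minimal sketch of that kind of comparison, assuming sequences that are already aligned to equal length and using percent identity as the similarity measure; the function names and the toy peptides are hypothetical and are not taken from the study's data or software.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def closer_reference(query: str, references: dict) -> str:
    """Return the name of the reference with the highest identity to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


# Toy aligned peptides, invented for illustration only (not real NPDC1/NPDC1L data).
toy_refs = {
    "paralog_A": "MKTLLVA-GHW",
    "paralog_B": "MRSLIVAEGHY",
}
toy_query = "MRSLLVAEGHY"

for name, seq in toy_refs.items():
    print(name, round(percent_identity(toy_query, seq), 1))
print("closer to:", closer_reference(toy_query, toy_refs))
```

Using references from a single species that retains both paralogs, as the text does with coelacanth, keeps lineage-specific rate differences from favoring one reference over the other.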
Combined, the above data suggest that orthologs of both POU2 and POU5F1 exist among cartilaginous fishes. Although the identity of every WGS contig cannot be assigned with confidence, evidence suggests that the little skate has both a POU2 and a POU5F1 ortholog, while the elephantfish has only a POU2 ortholog. This is consistent with the presence of both NPDC1 and NPDC1L orthologs in the little skate, but only an NPDC1 ortholog in the elephantfish.
Our data show that the duplication that gave rise to POU5F1 and POU2 occurred in a common gnathostomal ancestor. This can be deduced by combining two crucial pieces of evidence. First, conserved synteny shows that the duplication was multigenic and also gave rise to the paralogs NPDC1 and NPDC1L. Second, orthologs of both NPDC1 and NPDC1L were identified in cartilaginous fishes. Consistent with this deduction, we identified sequences in cartilaginous fishes that appear to correspond to either POU2 or POU5F1. Orthologs of both POU2 and POU5F1 are likely to still be present in the genome of the little skate, although their sequences appear less divergent from each other than they are in higher vertebrates. We also predict that an ortholog of POU5F1 is present in the spiny dogfish, since this species also retains an ortholog of NPDC1L, but POU5F1 is presumably extinct in the elephantfish lineage.
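The deduction above can be written out as a small lookup. This is only a sketch of the reasoning, assuming the pairing implied by the text (NPDC1 linked to POU2, NPDC1L linked to POU5F1) and recording only the flanking-gene orthologs the text reports for each species; the names `LINKED_POU_GENE`, `observed_flanking` and `predicted_pou_genes` are hypothetical.

```python
# Pairing assumed from the conserved synteny described in the text:
# NPDC1 is linked to POU2 and NPDC1L is linked to POU5F1.
LINKED_POU_GENE = {"NPDC1": "POU2", "NPDC1L": "POU5F1"}

# Flanking-gene orthologs reported in the text for each cartilaginous fish.
# A gene missing from a set means "not reported", not "proven absent".
observed_flanking = {
    "little skate": {"NPDC1", "NPDC1L"},
    "elephantfish": {"NPDC1"},
    "spiny dogfish": {"NPDC1L"},
}


def predicted_pou_genes(flanking_genes):
    """Class V POU paralogs predicted from the flanking genes that were found."""
    return sorted(LINKED_POU_GENE[gene] for gene in flanking_genes)


for species, flanking in observed_flanking.items():
    print(species, "->", predicted_pou_genes(flanking))
```

Because the underlying assemblies are incomplete, a prediction of this kind supports presence more strongly than it supports loss.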
Contrary to a recent assertion, our study provides clear evidence that the pou2 gene of teleost and other actinopterygian fishes is a bona fide ortholog of tetrapod POU2 and not of POU5F1. Its recent renaming to pou5f1 (RefSeq-ID NM_131112.1) by the zebrafish nomenclature committee is, therefore, misleading. POU5F1 became extinct possibly in a common ancestor of actinopterygians, or at least of teleost fishes. This finding is important because misleading nomenclature can potentially lead to misleading assumptions regarding evolutionary conservation versus divergence of the respective roles of POU2 and POU5F1.
Orthologs of POU2 and POU5F1 from various vertebrates have been tested for their ability to maintain pluripotency in mouse ES cells or to generate mouse or human iPSCs. Non-eutherian POU5F1 orthologs from axolotl and platypus both have this ability. POU2 orthologs from opossum, chicken, Xenopus, axolotl and medaka are also able to maintain or induce pluripotency [6, 7, 12, 16, 17], even in species that have retained both paralogs (axolotl and opossum). Surprisingly, although medaka pou2 can maintain mESC pluripotency, pou2 of another teleost fish, zebrafish, cannot. Nevertheless, the conservation in function of class V POU family members despite very poor sequence conservation in the transactivation domains can perhaps be expected considering that deletion of either (but not both) of the N- or C-terminal domains did not affect the ability of mouse POU5F1 to maintain ES cell pluripotency. Maintaining pluripotency in ES cells probably serves as a model for only a limited proportion of the roles POU2 and POU5F1 serve in vivo. Thus, although there is strong evidence for an ancient role for the common ancestor of POU2 and POU5F1 at least in the maintenance of pluripotency, deducing distinct functions and roles between various POU2 and POU5F1 orthologs will probably require in vivo assays other than ES cell complementation. This would include deducing the function of the conserved (K/R)XWYXF motif in the N-terminal domain.
A general, although not universal, pattern appears to be that POU2 orthologs are more widely expressed in non-germline and non-pluripotent tissues than are POU5F1 orthologs. In marsupials, POU2 transcripts are detectable by RT-PCR in a wide range of adult tissues whereas POU5F1 expression is restricted to the germ line and early conceptuses [5, 19]. Nevertheless, POU2 is also differentially expressed in early tammar conceptuses and protein immunolocalization suggests that POU2 is a more specific marker than POU5F1 of very early epiblast. Interestingly, in the sturgeon, pou2 transcripts were also detected in many adult tissues.
POU2 orthologs seem to have a more important role than POU5F1 in early neural development. In the axolotl, POU2 but not POU5F1 is expressed specifically in the early neural plate and later in the developing hindbrain, in a pattern similar to chicken POUV, Xenopus OCT25 and OCT91 and zebrafish pou2 [21–25], but not medaka pou2.
The pattern of germ cell expression is also inconsistent among POU2 and POU5F1 orthologs. Marsupial POU5F1 but not POU2 expression was detected by in situ hybridization in primordial germ cells and early spermatogonia [5, 19], whereas both axolotl paralogs are expressed in germ cells. Germ cell expression has also been reported for chicken POUV (POU2) and Xenopus OCT60 and among teleosts for medaka and cod but not for zebrafish. Nevertheless, all POU5F1 orthologs that have been examined are expressed in germ cells, which may be significant. Two modes of germ cell specification are recognized among vertebrates - predetermined (germ plasm) and inductive (regulative). In the predetermined mode, maternally inherited germ plasm is partitioned during cleavage to a subset of cells, which are then specified to become germ cells. In the inductive mode, there is no germ plasm and germ cells become specified by inducing signals from neighboring cells. The inductive mode is considered ancestral, with the predetermined mode independently derived in birds, frogs and teleost fishes. The predetermined mode was proposed to be correlated with a derived mode of mesoderm induction [30, 31] as well as with a more POU5F1-like class V POU transcription factor, although this preceded knowledge of the paralogous relationship between POU2 and POU5F1 among vertebrates [5, 6]. We thus hypothesized that inductive germ cell specification is specifically correlated with the presence of a POU5F1 ortholog, irrespective of the presence of POU2. Our present data are still largely consistent with this hypothesis. Evidence suggests that turtles have inductive germ cell specification, while retaining POU5F1 (and POU2). To our knowledge, no data exist on the mode of germ cell specification in crocodilians, which would be expected to share a similar mode with birds.
Evidence suggests that the sturgeon (a basal actinopterygian) lacks germ plasm and is thus likely to have the inductive mode of germ cell specification [31, 34]. The sturgeon genome has not been sequenced, so it is possible that it has a POU5F1 ortholog in addition to its previously reported POU2 ortholog. Indeed, Johnson et al. do refer to an unpublished “Oct-4” sequence from sturgeon. In the sequenced genome of the spotted gar (a less basal, non-teleost actinopterygian), we found all five exons of a POU2 ortholog but no exons corresponding to a POU5F1 ortholog. Thus POU5F1 presumably became extinct in a common ancestor of gars and teleost fishes. To our knowledge, the mode of germ cell specification of gars has not been investigated. Early studies of elasmobranch fishes cited by Extavour and Akam drew conflicting conclusions regarding the mode of germ cell specification in elasmobranch fishes and no studies have examined fishes of the subclass Holocephali (for example, elephantfish). Further studies examining the mode of germ cell specification in several of the above lineages will provide powerful data to test the intriguing notion that the acquisition of predetermined germ cell specification permits or even drives the loss of POU5F1.
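For readers who want to locate the (K/R)XWYXF motif in a protein sequence, the pattern translates directly into a regular expression. The sketch below assumes X means any of the 20 standard amino acids and uses a lookahead so that overlapping occurrences are also reported; the function name `find_motif` and the example peptide are hypothetical and invented for illustration.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# (K/R)XWYXF: K or R, any residue, W, Y, any residue, F.
# The lookahead (?=...) allows overlapping matches to be found.
MOTIF = re.compile(rf"(?=([KR][{AMINO_ACIDS}]WY[{AMINO_ACIDS}]F))")


def find_motif(protein: str):
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


# Invented peptide, not a real POU2 or POU5F1 sequence.
toy_peptide = "MAGHLKPWYDFSSR"
print(find_motif(toy_peptide))
```

A scan like this only locates candidate sites; as the text notes, the function of the motif would still have to be established in vivo.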
Our study resolves the controversial evolutionary relationship between teleost pou2 and tetrapod POU2 and POU5F1. It shows that class V POU transcription factors have existed at least since the common ancestor of gnathostome vertebrates and provides a framework for elucidating the basis for the lineage-specific extinctions of POU2 and POU5F1, which is likely to be informative for understanding their roles in development.
General sequence analysis was performed using MacVector software, version 12.7.3 (MacVector, Inc.; Cary, North Carolina, USA). Sources of all sequences are detailed in Additional file 4. Sequences were selected to provide a broad range of taxonomic groups. The three Xenopus POU2 orthologs (OCT91, OCT60 and OCT25) were not included in analyses since they display considerable sequence divergence, which could be related to redundancy among them. All alignments were performed on translated sequences using the Muscle algorithm (with default parameters) in MacVector. Subsequent manual adjustment was only performed for the full POU2/POU5F1 alignment. Phylogenetic analyses on aligned sequences were performed using PHYLIP version 3.69, using the maximum parsimony (100 replicates, 10 jumbles), maximum likelihood without the assumption of a molecular clock (1,000 replicates, 10 jumbles) and neighbor-joining (1,000 replicates) methods with default parameters.
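The "replicates" above are bootstrap replicates: each one is a pseudo-alignment built by resampling the columns of the original alignment with replacement, and a tree is then inferred from every replicate. PHYLIP provides its own program for this step (seqboot); the sketch below is not that program, only a minimal illustration of the resampling idea, with hypothetical function names and an invented toy alignment.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate n_replicates pseudo-alignments from a seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Invented toy alignment, for illustration only.
toy_alignment = {
    "seq_1": "MKTLLVAGHW",
    "seq_2": "MRSLIVAGHY",
    "seq_3": "MRTLIVSGHY",
}
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000)
print(len(replicates), replicates[0])
```

The bootstrap value for a cluster is then the fraction of replicate trees in which that cluster appears, which is what the consensus trees in Figure 3 summarize.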
ES cells: Embryonic stem cells
EST: Expressed sequence tag
iPS cells: Induced pluripotent stem cells
WGS: Whole genome shotgun.
We thank Dr. Natalia Tapia for providing the sequence of axolotl POU2.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.