Comparative analysis of protein coding sequences from human, mouse and the domesticated pig
© Jørgensen et al; licensee BioMed Central Ltd. 2005
Received: 05 November 2004
Accepted: 28 January 2005
Published: 28 January 2005
The availability of abundant sequence data from key model organisms has made large-scale studies of molecular evolution an exciting possibility. Here we use full-length cDNA alignments comprising more than 700,000 nucleotides from human, mouse, pig and the Japanese pufferfish Fugu rubripes to investigate 1) the relationships between three major lineages of mammals: rodents, artiodactyls and primates, and 2) the rate of evolution and the occurrence of positive Darwinian selection using codon-based models of sequence evolution.
We provide evidence that the evolutionary splits among primates, rodents and artiodactyls happened shortly after each other, with most gene trees favouring a topology with rodents as outgroup to primates and artiodactyls. Using an unrooted topology of the three mammalian species we show that since their diversification, the pig and mouse lineages have on average experienced 1.44 and 2.86 times as many synonymous substitutions as humans, respectively, whereas the rates of non-synonymous substitutions are more similar. The analysis shows the highest average dN/dS ratio in the human lineage, followed by the pig and then the mouse lineages. Using codon based models we detect signals of positive Darwinian selection in approximately 5.3%, 4.9% and 6.0% of the genes on the human, pig and mouse lineages respectively. Approximately 16.8% of all the genes studied here are not currently annotated as functional genes in humans. Our analyses indicate that a large fraction of these genes may have lost their function quite recently or may still be functional genes in some or all of the three mammalian species.
We present a comparative analysis of protein coding genes from three major mammalian lineages. Our study demonstrates the usefulness of codon-based likelihood models in detecting selection and it illustrates the value of sequencing organisms at different phylogenetic distances for comparative studies.
Large-scale sequencing projects of many different species allow us to investigate phylogenetic issues in much more detail and to identify whether certain genes have had an extraordinary evolution in one or more species, and thus gain insight into the actions of natural selection. Despite the sequencing of an increasing number of mammalian genomes and the implementation of more sophisticated evolutionary models using maximum likelihood and Bayesian methodology, the branching order among the mammalian orders is still not completely resolved. The main reason for this uncertainty is that the diversification of these orders occurred over a short period of time, making the inference of branching order a difficult problem. One of the highly debated issues concerns the relative order of branching among primates, artiodactyls and rodents [1–9]. Here, the Japanese pufferfish Fugu rubripes is used as an outgroup to estimate the branching order of the three species relative to each other.
Overview of the codon models used in the analyses.
Lineage specific models
  M0: One Ratio. Parameters: κ, τpig, τhuman, τmouse, ω
  M1a: Free Ratio. Parameters: κ, τpig, τhuman, τmouse, ωpig, ωhuman, ωmouse
Site specific models
  Parameters: κ, τpig, τhuman, τmouse, p0 (p0 + p1 = 1), ω ∈ [0;1[
Branch-Site specific models
  M2a: Model A. Parameters: κ, τpig, τhuman, τmouse, p0 (p0 + p1 = 1), p2, ω ∈ [0;1[, ωforeground
Comparison of topologies (columns: first codon position, second codon position).
We used the baseml program of PAML to compare the three topologies in a nucleotide-based framework. Different nucleotide-based substitution models were used to maximize the likelihood on the three topologies for each of the three codon positions separately. The results obtained with different models of nucleotide evolution were highly similar, so here we only discuss the results obtained with the HKY85 model. The results based on the third codon position show that Fugu is too distantly related to the three mammals to be informative about the placement of the root of the mammals (results not shown). The first and second codon positions do not show such saturation and should therefore be useful in comparing the three topologies. Consistent with the results based on the amino acid substitution model, topology B is favoured by most genes, followed by topology A and then topology C. For the second codon position, 215, 386 and 179 alignments favour topology A, topology B and topology C respectively, and 208 alignments are uninformative. The corresponding numbers for the first codon position are 215, 545, 175 and 53 (Table 2).
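The counts quoted above can be turned into support fractions with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the function name `support_fractions` are assumptions made for the example, and the numbers are the ones stated in the text.

```python
# Counts of single-gene alignments favouring each topology, as quoted in the text.
counts = {
    "first codon position": {"A": 215, "B": 545, "C": 175, "uninformative": 53},
    "second codon position": {"A": 215, "B": 386, "C": 179, "uninformative": 208},
}


def support_fractions(tally):
    """Return the fraction of informative alignments favouring each topology."""
    informative = {k: v for k, v in tally.items() if k != "uninformative"}
    total = sum(informative.values())
    return {k: v / total for k, v in informative.items()}


for position, tally in counts.items():
    fractions = support_fractions(tally)
    summary = ", ".join(f"{k}: {v:.1%}" for k, v in fractions.items())
    print(f"{position} (n = {sum(tally.values())}): {summary}")
```

Both tallies add up to the 988 four-species alignments, which is a quick consistency check on the quoted numbers.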
The internal branch is rather short in all cases. Therefore in the remaining analyses we treat the mouse, human, pig split as a trifurcation. Depending on which topology is actually the right one, the only bias introduced by treating the topology as a star tree, as shown in Figure 3d, is a minor overestimation of the branch length of the species that actually roots the other two.
The rates of evolution
The rates of evolution.
Positive Darwinian selection
The dN/dS ratios on the three different lineages were estimated under the free ratio model (Figure 4g–i). Most genes in all three lineages have a dN/dS ratio very close to zero. The average dN/dS ratio is highest in the human lineage, intermediate in the pig lineage and lowest in the mouse lineage.
Genes where all branches have ω > 1 based on the one ratio model (column: number of substitutions).
Genes with branches where ω > 1 based on the free ratio model (columns: number of substitutions; ω > 1).
Genes predicted to be under positive selection with the branch-site models.
Human gene annotation
Cytochrome c oxidase subunit VIIc
Eukaryotic translation initiation factor 4E-like 3
Pituitary tumor-transforming 2
Grancalcin, EF-hand calcium binding protein
Ocular development-associated gene (Interim)
Small EDRK-rich factor 1B (centromeric)
Similar to RIKEN cDNA 1300003K24 (Interim)
Fibrinogen, gamma polypeptide
Hexosaminidase A (alpha polypeptide)
Sterol carrier protein 2
Sjogren syndrome antigen B (autoantigen La) *
5'-nucleotidase, cytosolic III
Actin related protein 2/3 complex, subunit 2, 34 kDa
TCF3 (E2A) fusion partner (in childhood leukaemia)
Chromosome 20 open reading frame 116
The molecular function of the genes predicted to be under positive selection was determined using the Panther server and the NCBI server, based on the newest build of the human genome. Both annotation servers are updated on a regular basis when new information becomes available, and during the course of this study the annotation of several genes changed. Of our 1120 alignments, 188 are currently not annotated as functional genes, indicating that they may be pseudogenes in human; see the Discussion for more details on this subject. The proportion of genes that we report to have undergone positive selection in the human lineage at the 5% level of significance can therefore be viewed as either 58/1120 ~5.2% or 43/931 ~4.6%, indicating that possible pseudogenes are only slightly overrepresented in the genes predicted to have undergone adaptive evolution. The genes predicted to have been under positive selection in the pig and mouse lineages show a similar trend.
Heterogeneity in dN/dS ratios over sites (fraction of codons in the site classes ω = 0, ω = 1/3, ω = 2/3, ω = 1 and ω = 3).
Codon usage bias
Codon usage in the three mammalian species.
Evaluation of the choice of codon equilibrium frequencies (estimated branch lengths under the F1 × 4, F3 × 4 and F3 × 4 + CpG models).
The phylogeny of the early mammalian radiation has been extensively debated over the last two decades. The classical view based on fossil evidence states that all major orders of placental mammals first appear right after the Cretaceous-Tertiary (KT) boundary approximately 65 million years ago. This sudden appearance of all major placental orders is known as the mammalian radiation. With the use of molecular data this late radiation has been challenged, and it is now widely accepted that the radiation of the placental orders probably occurred many millions of years before the KT boundary [29–31]. Molecular data have also been used to investigate the relative branching orders of many of the larger clades of placental mammals [1–7, 9, 30]. One of the issues that have been debated extensively is the placement of Rodentia in the placental tree. Some studies favour a basal placement of the rodents [1, 3–5, 32, 33] while other studies favour a sister relationship between primates and rodents [6–8]. Recently, strong evidence based on insertions, deletions and ancient transposable elements in favour of a sister relationship of primates and rodents has been reported [2, 34].
The incongruence of single gene phylogenies was investigated in a recent study of eight yeast species. The phylogeny commonly believed to be correct is completely resolved when concatenating 20 or more randomly chosen genes to form a super gene. A concatenated multi-gene approach was also shown to resolve single gene incongruences in a recent study on green algae. Here we use 988 full cDNA alignments comprising 672,918 nucleotides to investigate the branching order of the three mammalian species. We present results based on both single gene phylogenies and a concatenated super gene. All genes, including the concatenated super gene, were analysed with both nucleotide and amino acid based substitution models. All methods favour a primate-artiodactyl clade with rodents as an outgroup but with a relatively short internal mammalian branch, indicating that the mammalian radiation happened within a short period of time. The different methods used in this study have very different assumptions but they all show the same general results. The HKY85 model takes into account differences in nucleotide frequencies and transition/transversion biases and allows for differences in substitution rates among the lineages. However, it is still possible that complexities unaccounted for, such as non-stationarity and irreversibility of the substitution process, have created biases that lead to long-branch attraction of Fugu and mouse and an erroneous conclusion. Furthermore, the incongruence between our analysis and many recent studies is also affected by the following. (1) The choice of outgroup; bony fishes are believed to have diverged approximately 450 million years ago, making saturation effects in synonymous sites a real problem. We are therefore forced to only consider nonsynonymous sites or amino acid replacements in the phylogenetic analyses.
The recently completed genome sequence of the chicken (Gallus gallus) shows that the average value of dS between human and chicken genes is approximately 1.66, which indicates that many genes may still be too distantly related for synonymous sites to avoid problems with saturation. A marsupial species would provide a much better outgroup when available [3, 32]. (2) Taxon sampling; by only using three species the variance of the parameter estimates can be quite high and the power to discriminate between two conflicting topologies quite low. The sequencing of more species will lessen this problem. (3) Overly simplistic evolutionary models; here we use only nucleotide and amino acid based models. If a more closely related outgroup were available, the use of more complex codon-based models could be beneficial in resolving the apparent conflict. Several extensions have been made to the codon models during the past few years. One obvious extension is a model that incorporates CG avoidance within and over codon boundaries. This should improve the fit of the model to the data and therefore give more accurate parameter estimates. Including context dependencies over codon boundaries and information about protein structure has also been shown to increase the fit of the models to protein coding data and should therefore result in better parameter estimates [38, 39]. (4) Gene trees and species trees can be different; the split between the three groups probably occurred within a very short period of time, allowing for the possibility that different genes actually have different phylogenies due to ancient polymorphisms at the time of the speciation. Using an even larger number of genes and a sufficiently sophisticated model should lessen this problem [35, 36].
The rate of synonymous substitution was estimated to be almost three times higher in rodents than in other mammals, in agreement with previous investigations that also showed an elevated rate in rodents [40–42]. This has historically often been explained by a generation time effect. Species that have short generation times experience more generations in the time span we consider and consequently they will experience more neutral substitutions over time. The fact that the pig, which has a generation time intermediate between mouse and human, has an intermediate rate of synonymous substitutions seems to agree with this theory. For a more thorough discussion of the generation time hypothesis in mammals see . The nearly neutral theory of molecular evolution predicts that the generation time effect should be smaller for non-synonymous substitutions [42, 44, 45]. The simple argument is that animals with short generation times, such as rodents, often have a very large effective population size. In a population with a large effective population size, slightly deleterious mutations will be removed from the gene pool more effectively than in a population with a small effective population size, where genetic drift will reduce the efficiency of natural selection. Figure 4g–i shows the distribution of the dN/dS ratio in the three lineages. The average dN/dS ratio is highest in human, suggesting a small effective population size, while it is smallest in mouse, suggesting a larger effective population size.
Previous studies of the occurrence of positive selection based on pairwise comparisons have revealed a very low occurrence of positive selection. In a study of 3595 alignments only 17 genes showed evidence of positive selection. The branch specific models used here only find one gene where the dN/dS ratio is significantly larger than one. The gene reported is XM_165930. XM_165930 was originally annotated as being similar to cold shock domain protein A, but it has recently been removed from GenBank as a result of standard genome annotation processes.
Codon-based branch-site models similar to the ones used here were used in a paper based on a three-way comparison among chimpanzees, humans and mice. They report that approximately 1.6% of all the genes studied have been undergoing positive selection in the lineage leading to modern humans. Using a similar criterion, our study indicates that approximately 3.0% of the genes studied have been undergoing positive selection on the lineage leading to humans; the corresponding numbers for pig and mouse are 2.0% and 2.2% respectively. When comparing these two studies it is important to consider the following three things: (1) the relatively short average length of the genes studied here decreases the power of the models to detect positive selection; (2) the use of the new BEB method for detecting positively selected sites should reduce the number of false positives, making our estimates more conservative and more accurate; (3) our study deals with a completely different phylogenetic level, covering a much longer time span than the study by Clark and colleagues.
The multiple testing and the small number of taxa used in a study like this imply that the results presented should not be taken as conclusive evidence for positive selection, but more as an approach to searching among the thousands of genes to look for genes that may have evolved in a biologically interesting manner. Comparative approaches such as the one we use here can only be a first step towards showing that positive Darwinian selection may be a key part in the evolution of many different gene families. Further experimental and computational analyses must then be used to investigate the suggested candidates more thoroughly.
During the course of our investigation a large fraction of the genes were re-annotated as putative pseudogenes: 188/1120 ~16.8%. However, all these genes have uninterrupted reading frames in all three species; only a tiny fraction of all codons seems to have evolved in a neutral-like fashion (ω~1), and the distributions of the synonymous as well as the nonsynonymous rates of these putative pseudogenes are almost identical to the distributions of the remaining genes (results not shown). The only difference is a slight increase in the dN/dS ratio in the human lineage, which is actually due to a few genes that experience an unusually high dN/dS ratio. Omitting these genes from the analysis removes the observed differences completely. Thus, if all these genes are indeed pseudogenes in human, the loss of function must have occurred quite recently and they may not be pseudogenes in pig and mouse.
The collection of a large set of pig cDNA sequences has enabled us to study long-term evolutionary trends in mammalian genes. Our results indicate that the codon models are able to detect evolutionary signals indicating adaptive evolution in several genes. Our phylogenetic investigation of the primate, rodent, artiodactyl split disagrees with most recent findings in favouring a primate, artiodactyl clade with rodents as an outgroup. Our study indicates that several genes that are not classified as genes in the most recent human annotation might after all be real genes; or at least they have become pseudogenes very recently, and the orthologous genes in mouse and pig might still be functional. This shows the potential of comparative methods in identifying functional regions of the genome.
Complete cDNA from the domesticated pig Sus scrofa was assembled at the Danish Institute of Agricultural Sciences (DIAS) from cDNA libraries from 100 different tissues constructed at DIAS and the Royal Veterinary and Agricultural University in the following way. Total RNA was purified from selected tissues using RNeasy (Qiagen) or Tri Reagent, and poly(A+) mRNA was selected using Oligotex (Qiagen) or PolyATract (Promega). Directionally clonable cDNA was synthesized from poly(A+) mRNA using the cDNA Synthesis Kit (Stratagene) and was ligated into Eco RI/Xho I digested pTrueBlue (GenomicsOne) or pBluescript (Stratagene), followed by electrotransformation into E. coli XL1-Blue MRF' (Stratagene). 5'-EST sequencing was performed using standard protocols (Applied Biosystems). The sequences were trimmed to the longest open reading frame and the termination codons were removed.
Homologous sequences from human, mouse and the Japanese pufferfish Fugu rubripes were obtained with the blastall program with default parameters, except that the E-value threshold was set to 10^-8. We constructed two different datasets, one with and one without Fugu rubripes. Individual alignments were made using ClustalW version 1.83 with default parameters. We kept the pig reading frame intact in the alignments by removing any columns where the alignment gave rise to gaps in the pig sequence. Alignments that resulted in premature stop codons, or were shorter than 30 codons, were removed. We used the one ratio model to estimate the total branch length of the tree as well as the synonymous branch lengths. These distributions were used to detect peculiar genes where one or more sequences might not be a true orthologue, and all outliers were thereafter removed from the dataset. This analysis gave 1120 alignments of mouse, human and pig, and of these 988 also included Fugu. The 1120 original cDNAs from Sus scrofa have been deposited in GenBank with the following accession numbers: AY609387-AY610506.
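The alignment clean-up described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: it assumes each alignment is held as a dictionary mapping a species name to an aligned, in-frame nucleotide string, it assumes the standard genetic code, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def drop_pig_gap_columns(alignment, reference="pig"):
    """Remove every column in which the reference (pig) sequence has a gap."""
    keep = [i for i, base in enumerate(alignment[reference]) if base != "-"]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def codons(seq):
    """Split a sequence into complete codons, ignoring any trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def passes_filters(alignment, min_codons=30):
    """Reject alignments that are too short or contain an in-frame stop codon."""
    for seq in alignment.values():
        triplets = codons(seq.upper())
        if len(triplets) < min_codons:
            return False
        if any(codon in STOP_CODONS for codon in triplets):
            return False
    return True


# Toy example (made-up sequences, far shorter than real alignments).
toy = {
    "pig":   "ATGGCT---AAA",
    "human": "ATGGCTCCCAAA",
    "mouse": "ATGGCACCCAAA",
}
trimmed = drop_pig_gap_columns(toy)
print(trimmed, passes_filters(trimmed, min_codons=3))
```

Because terminal stop codons were already removed from the pig sequences, any stop codon that remains in frame is treated here as premature.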
Phylogeny and rates of evolution
Nine hundred and eighty-eight four-species alignments were concatenated into a super gene. The three topologies were compared using the super gene as well as each individual gene. Both nonsynonymous nucleotide substitutions and amino acid substitutions were investigated with PAML v. 3.14. The nonsynonymous substitutions were represented by the first and second codon positions of all codons, and the three different topologies were investigated with baseml using the HKY85 model (model = 4) of nucleotide substitutions. The likelihood was then maximized under the three different topologies using all the individual genes as well as the concatenated super gene. The codeml program with the codons translated to amino acids (seqtype = 3) was also used to investigate the three topologies. We used different models of amino acid evolution to maximize the likelihood under the three topologies, and since the results were highly similar we only present the results from the empirical method of Whelan and Goldman (model = 2, aaratefile = wag.dat).
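As an illustration of the data preparation described here, the following sketch concatenates per-gene alignments into a super gene and extracts selected codon positions. It assumes that every gene alignment is in frame with a length that is a multiple of three, and that alignments are dictionaries from species name to sequence; the helper names are hypothetical and are not part of PAML.

```python
def concatenate(alignments, species):
    """Join per-gene alignments into one super gene per species.

    Each alignment must have a length that is a multiple of three,
    otherwise the reading frame of the following genes is shifted.
    """
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


def codon_positions(seq, positions=(1, 2)):
    """Keep only the requested codon positions (1-based) of an in-frame sequence."""
    offsets = {p - 1 for p in positions}
    return "".join(base for i, base in enumerate(seq) if i % 3 in offsets)


# Toy example with two made-up genes.
genes = [
    {"pig": "ATGGCT", "human": "ATGGCA"},
    {"pig": "AAACCC", "human": "AAGCCC"},
]
super_gene = concatenate(genes, species=["pig", "human"])
first_and_second = {sp: codon_positions(seq) for sp, seq in super_gene.items()}
print(super_gene, first_and_second)
```

The resulting first-and-second-position sequences are the kind of input that a nucleotide model such as HKY85 would then be fitted to.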
Using the 1120 three species alignments, the synonymous and nonsynonymous rates of evolution were estimated with the codeml program (seqtype = 1) using the free ratio model (model = 2) with the transition/transversion ratio estimated from the data (fix_kappa = 0).
Investigation of selection
The different tests for positive Darwinian selection are all based on extensions of the basic codon-based likelihood model. Likelihood ratio tests (LRTs) were used to compare nested models where one allows for positive selection and the other does not. The probability that a codon i substitutes into another codon j during the time interval t is determined by the rate matrix Q = (q_ij) with entries
\[
q_{ij} =
\begin{cases}
0 & \text{if } i \text{ and } j \text{ differ at more than one codon position,} \\
\pi_j & \text{for a synonymous transversion,} \\
\kappa \pi_j & \text{for a synonymous transition,} \\
\omega \pi_j & \text{for a non-synonymous transversion,} \\
\omega \kappa \pi_j & \text{for a non-synonymous transition,}
\end{cases}
\]
for i ≠ j, with corresponding substitution probability matrix given by exp(Qt). Here π_j is the equilibrium codon frequency of codon j, κ is the transition/transversion ratio and ω is the dN/dS ratio. All parameters are estimated independently for each gene. The star topology of the three species is used to estimate the branch lengths (τhuman, τpig, τmouse) for synonymous and non-synonymous substitutions.
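To make the structure of the rate matrix concrete, the sketch below builds Q over the 61 sense codons in plain Python from a set of codon frequencies, κ and ω. It is an illustration under stated assumptions, not PAML's implementation: it uses the standard genetic code, does not rescale Q to a fixed expected number of substitutions, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE_CODONS = [c for c, aa in GENETIC_CODE.items() if aa != "*"]
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return {x, y} <= PURINES or {x, y} <= PYRIMIDINES


def rate_matrix(pi, kappa, omega):
    """Build Q = (q_ij) over the 61 sense codons; pi maps codon -> frequency."""
    n = len(SENSE_CODONS)
    Q = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(SENSE_CODONS):
        for j, cj in enumerate(SENSE_CODONS):
            diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
            if len(diffs) != 1:
                continue  # same codon, or more than one difference: rate 0
            rate = pi[cj]
            if is_transition(*diffs[0]):
                rate *= kappa
            if GENETIC_CODE[ci] != GENETIC_CODE[cj]:
                rate *= omega
            Q[i][j] = rate
        Q[i][i] = -sum(Q[i])  # each row of a rate matrix sums to zero
    return Q


# Example with equal codon frequencies (illustrative parameter values).
uniform = {c: 1.0 / len(SENSE_CODONS) for c in SENSE_CODONS}
Q = rate_matrix(uniform, kappa=2.0, omega=0.5)
```

The transition probabilities P(t) = exp(Qt) can then be obtained with any matrix exponential routine, for example `scipy.linalg.expm` applied to Q multiplied by t.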
Positive selection was tested in two different ways. Test 1 averages over sites but differentiates among lineages. The LRT compares the free ratio model, where each of the three lineages has its own value of ω estimated from the data, with the one ratio model, where all three lineages share a common value of ω. We note that this is more a test of variable dN/dS ratios among lineages than a test for positive selection. The free ratio model has three parameters for ω and the one ratio model only one. The LRT statistic is calculated as twice the difference in maximum log likelihood and is asymptotically distributed as a χ2 distribution with 2 degrees of freedom. Genes found to evolve with a dN/dS ratio > 1 in one or more lineages are then compared to a nested model in which the dN/dS ratio is fixed at 1 in those lineages, to see whether the result can be attributed to natural selection or just to relaxation of selective pressures.
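The test statistic and its p-value can be computed as in the sketch below. It relies on the fact that the χ2 distribution with 2 degrees of freedom has survival function exp(-x/2), so no statistics library is needed. The function name and the log likelihood values in the example are hypothetical.

```python
import math


def lrt_pvalue_df2(lnL_null, lnL_alt):
    """Likelihood ratio test against a chi-square distribution with 2 d.f.

    Returns the statistic 2 * (lnL_alt - lnL_null) and its p-value.
    """
    statistic = 2.0 * (lnL_alt - lnL_null)
    statistic = max(statistic, 0.0)  # guard against small negative values from optimisation noise
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical maximised log likelihoods: one ratio (null) versus free ratio (alternative).
statistic, p_value = lrt_pvalue_df2(lnL_null=-1234.56, lnL_alt=-1230.00)
print(f"2*delta(lnL) = {statistic:.2f}, p = {p_value:.4f}")
```

For tests with a different number of degrees of freedom the closed form no longer applies, and a general χ2 survival function would be needed instead.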
Test 2 is based on a new and improved version of the branch-site method presented in . We will refer to this model as model A. The LRT is based on a comparison of the neutral model and model A. The neutral model assumes two categories of sites: a proportion p0 of sites where ω0 is estimated from the data and is constrained to lie between 0 and 1, and a proportion p1 of neutrally evolving sites where ω1 = 1 (p0 + p1 = 1). Model A furthermore allows a pre-specified (foreground) branch to have a proportion of sites that evolve with a different value of ω estimated from the data. This value cannot be smaller than 1. The LRT follows a χ2 distribution with 2 degrees of freedom. If the value of ω in the foreground lineage is estimated to be equal to one, the model collapses to the neutral model.
PAML v. 3.14 was used to estimate likelihood and parameters under each model. Codon equilibrium frequencies can be estimated from the data using either simple proportions in the full data set (the CT model with 60 parameters), assuming equal frequencies (Fequal model), multiplying overall nucleotide frequencies (F1 × 4 model, 3 parameters) or multiplying nucleotide frequencies for each codon position (F3 × 4 model, 9 parameters). The codon table (CT) was used for analysis of the concatenated super gene and the F3 × 4 model was used on the individual genes.
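The F1 × 4 and F3 × 4 constructions can be sketched directly from their definitions. The code below is an illustration only, assuming in-frame sequences, the standard set of stop codons, and renormalisation over the 61 sense codons; the function names are hypothetical and do not correspond to PAML routines.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = ["".join(c) for c in product(BASES, repeat=3) if "".join(c) not in STOPS]


def _normalise(raw):
    """Rescale so that the frequencies of the sense codons sum to one."""
    total = sum(raw.values())
    return {codon: value / total for codon, value in raw.items()}


def position_frequencies(sequences):
    """Nucleotide frequencies at each of the three codon positions."""
    counts = [Counter(), Counter(), Counter()]
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if base in BASES:  # skip gaps and ambiguity codes
                counts[i % 3][base] += 1
    return [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]


def f3x4(sequences):
    """Codon frequencies as products of position-specific nucleotide frequencies."""
    f = position_frequencies(sequences)
    return _normalise({c: f[0][c[0]] * f[1][c[1]] * f[2][c[2]] for c in SENSE_CODONS})


def f1x4(sequences):
    """Codon frequencies as products of one shared set of nucleotide frequencies."""
    pooled = Counter(b for seq in sequences for b in seq.upper() if b in BASES)
    n = sum(pooled.values())
    return _normalise(
        {c: (pooled[c[0]] / n) * (pooled[c[1]] / n) * (pooled[c[2]] / n) for c in SENSE_CODONS}
    )


# Toy example with two made-up in-frame sequences.
toy = ["ATGGCTAAA", "ATGGCAAAG"]
print(f3x4(toy)["ATG"], f1x4(toy)["ATG"])
```

The parameter counts quoted in the text follow from the constraint that each set of four nucleotide frequencies sums to one: 3 free parameters for F1 × 4 and 3 × 3 = 9 for F3 × 4.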
CpG Extension of the codon models
A simple extension of the F3 × 4 codon equilibrium frequency model can incorporate CpG avoidance by adding an extra parameter that penalizes a C followed by a G in the second and third codon position. The new model is parameterised as follows
Here π^1_{i1} represents the frequency of nucleotide i1 at codon position 1, and ψ (0 < ψ < 1) is a CpG penalizing parameter. The scaling factor c_ψ ensures that the codon frequencies sum to one.
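A possible reading of this extension is sketched below. The exact parameterisation is not reproduced in this text, so the code makes an explicit assumption: the F3 × 4 product for a codon is multiplied by ψ whenever the codon has C at the second position and G at the third, and c_ψ is the constant that renormalises the 61 sense-codon frequencies. The function name and the example values are hypothetical.

```python
from itertools import product

BASES = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = ["".join(c) for c in product(BASES, repeat=3) if "".join(c) not in STOPS]


def f3x4_cpg(position_freqs, psi):
    """F3x4 codon frequencies with a multiplicative CpG penalty (assumed form).

    position_freqs is a list of three dicts (base -> frequency), one per codon
    position. Codons with C at position 2 and G at position 3 are scaled by psi.
    """
    raw = {}
    for codon in SENSE_CODONS:
        value = (
            position_freqs[0][codon[0]]
            * position_freqs[1][codon[1]]
            * position_freqs[2][codon[2]]
        )
        if codon[1:] == "CG":
            value *= psi  # penalise a C followed by a G at positions 2 and 3
        raw[codon] = value
    c_psi = 1.0 / sum(raw.values())  # scaling factor so the frequencies sum to one
    return {codon: c_psi * value for codon, value in raw.items()}


# Example with equal nucleotide frequencies at every position (illustrative only).
uniform = [{b: 0.25 for b in BASES} for _ in range(3)]
penalised = f3x4_cpg(uniform, psi=0.5)
print(penalised["ACG"] / penalised["ACA"])
```

Setting ψ = 1 recovers the plain F3 × 4 frequencies, so the two models are nested and differ by the single parameter ψ.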
We would like to thank Andrew Clark, Rasmus Nielsen, Nick Goldman, Thomas Bataillon, Ole Fredslund Christensen, and two anonymous reviewers for many valuable comments on previous versions of this manuscript.
We acknowledge the Sino-Danish Pig Genome Sequencing Consortium that has generated the pig data used. The data are part of a much larger data set of one million ESTs, which is under publication.
The Sino-Danish Pig Genome Consortium consists of The Danish Veterinary and Agricultural University (KVL), Denmark, the Danish Institute of Agricultural Sciences (DIAS), Denmark, and the Beijing Genomics Institute/James D. Watson Institute of Genome Sciences (BGI/WIGS), China, in collaboration with Institute of Human Genetics, University of Aarhus, Denmark.
In particular we acknowledge the construction of cDNA libraries by Susanna Cirera with the help of Milena Sawera, Trine Green and Bente Juul Nielsen at KVL as well as Jakob Hedegaard with the help of Lone Bruhn Madsen, Bo Thomsen, Xuegang Wang and Miao Zhu at DIAS and Lin Li and Bin Liu at BGI/WIGS.
- Li WH, Gouy M, Sharp PM, O'HUigin C, Yang YW: Molecular phylogeny of Rodentia, Lagomorpha, Primates, Artiodactyla, and Carnivora and molecular clocks. Proc Natl Acad Sci U S A. 1990, 87 (17): 6703-6707.PubMed CentralView ArticlePubMedGoogle Scholar
- Thomas JW, Touchman JW, Blakesley RW, Bouffard GG, Beckstrom-Sternberg SM, Margulies EH, Blanchette M, Siepel AC, Thomas PJ, McDowell JC, Maskeri B, Hansen NF, Schwartz MS, Weber RJ, Kent WJ, Karolchik D, Bruen TC, Bevan R, Cutler DJ, Schwartz S, Elnitski L, Idol JR, Prasad AB, Lee-Lin SQ, Maduro VV, Summers TJ, Portnoy ME, Dietrich NL, Akhter N, Ayele K, Benjamin B, Cariaga K, Brinkley CP, Brooks SY, Granite S, Guan X, Gupta J, Haghighi P, Ho SL, Huang MC, Karlins E, Laric PL, Legaspi R, Lim MJ, Maduro QL, Masiello CA, Mastrian SD, McCloskey JC, Pearson R, Stantripop S, Tiongson EE, Tran JT, Tsurgeon C, Vogt JL, Walker MA, Wetherby KD, Wiggins LS, Young AC, Zhang LH, Osoegawa K, Zhu B, Zhao B, Shu CL, De Jong PJ, Lawrence CE, Smit AF, Chakravarti A, Haussler D, Green P, Miller W, Green ED: Comparative analyses of multi-species sequences from targeted genomic regions. Nature. 2003, 424 (6950): 788-793. 10.1038/nature01858.View ArticlePubMedGoogle Scholar
- Janke A, Feldmaier-Fuchs G, Thomas WK, von Haeseler A, Paabo S: The marsupial mitochondrial genome and the evolution of placental mammals. Genetics. 1994, 137 (1): 243-256.PubMed CentralPubMedGoogle Scholar
- Misawa K, Janke A: Revisiting the Glires concept – phylogenetic analysis of nuclear sequences. Mol Phylogenet Evol. 2003, 28 (2): 320-327. 10.1016/S1055-7903(03)00079-4.View ArticlePubMedGoogle Scholar
- Easteal S: The pattern of mammalian evolution and the relative rate of molecular evolution. Genetics. 1990, 124 (1): 165-173.PubMed CentralPubMedGoogle Scholar
- Reyes A, Gissi C, Catzeflis F, Nevo E, Pesole G, Saccone C: Congruent Mammalian Trees from Mitochondrial and Nuclear Genes Using Bayesian Methods. Mol Biol Evol. 2004, 21 (2): 397-403. 10.1093/molbev/msh033.View ArticlePubMedGoogle Scholar
- Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS: Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001, 294 (5550): 2348-2351. 10.1126/science.1067179.View ArticlePubMedGoogle Scholar
- Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ: Molecular phylogenetics and the origins of placental mammals. Nature. 2001, 409 (6820): 614-618. 10.1038/35054550.View ArticlePubMedGoogle Scholar
- Misawa K, Nei M: Reanalysis of Murphy et al.'s data gives various mammalian phylogenies and suggests overcredibility of Bayesian trees. J Mol Evol. 2003, 57 (Suppl 1): S290-296. 10.1007/s00239-003-0039-7.View ArticlePubMedGoogle Scholar
- Muse SV, Gaut BS: A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994, 11 (5): 715-724.PubMedGoogle Scholar
- Goldman N, Yang Z: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11 (5): 725-736.PubMedGoogle Scholar
- Yang Z: Inference of selection from multiple species alignments. Curr Opin Genet Dev. 2002, 12 (6): 688-694. 10.1016/S0959-437X(02)00348-9.View ArticlePubMedGoogle Scholar
- Bielawski JP, Yang Z: Maximum likelihood methods for detecting adaptive evolution after gene duplication. J Struct Funct Genomics. 2003, 3 (1–4): 201-212. 10.1023/A:1022642807731.View ArticlePubMedGoogle Scholar
- Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998, 15 (5): 568-573.View ArticlePubMedGoogle Scholar
- Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155 (1): 431-449.PubMed CentralPubMedGoogle Scholar
- Yang Z, Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002, 19 (6): 908-917.View ArticlePubMedGoogle Scholar
- Bishop JG, Dean AM, Mitchell-Olds T: Rapid evolution in plant chitinases: molecular targets of selection in plant-pathogen coevolution. Proc Natl Acad Sci U S A. 2000, 97 (10): 5322-5327. 10.1073/pnas.97.10.5322.PubMed CentralView ArticlePubMedGoogle Scholar
- Zanotto PM, Kallas EG, de Souza RF, Holmes EC: Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics. 1999, 153 (3): 1077-1089.PubMed CentralPubMedGoogle Scholar
- Haydon DT, Bastos AD, Knowles NJ, Samuel AR: Evidence for positive selection in foot-and-mouth disease virus capsid genes from field isolates. Genetics. 2001, 157 (1): 7-15.PubMed CentralPubMedGoogle Scholar
- Mathews S, Burleigh JG, Donoghue MJ: Adaptive evolution in the photosensory domain of phytochrome A in early angiosperms. Mol Biol Evol. 2003, 20 (7): 1087-1097. 10.1093/molbev/msg123.View ArticlePubMedGoogle Scholar
- Bailly X, Leroy R, Carney S, Collin O, Zal F, Toulmond A, Jollivet D: The loss of the hemoglobin H2S-binding function in annelids from sulfide-free habitats reveals molecular adaptation driven by Darwinian positive selection. Proc Natl Acad Sci U S A. 2003, 100 (10): 5885-5890. 10.1073/pnas.1037686100.PubMed CentralView ArticlePubMedGoogle Scholar
- Jansa SA, Lundrigan BL, Tucker PK: Tests for positive selection on immune and reproductive genes in closely related species of the murine genus mus. J Mol Evol. 2003, 56 (3): 294-307. 10.1007/s00239-002-2401-6.View ArticlePubMedGoogle Scholar
- Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferriera S, Wang G, Zheng X, White TJ, Sninsky JJ, Adams MD, Cargill M: Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science. 2003, 302 (5652): 1960-1963. 10.1126/science.1088821.
- Suzuki Y, Nei M: False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol Biol Evol. 2004, 21 (5): 914-921. 10.1093/molbev/msh098.
- Zhang J: Frequent false detection of positive selection by the likelihood method with branch-site models. Mol Biol Evol. 2004, 21 (7): 1332-1339. 10.1093/molbev/msh117.
- Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001, 18 (5): 691-699.
- Hasegawa M, Kishino H, Yano T: Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985, 22 (2): 160-174.
- Thomas PD, Kejariwal A, Campbell MJ, Mi H, Diemer K, Guo N, Ladunga I, Ulitsky-Lazareva B, Muruganujan A, Rabkin S, Vandergriff JA, Doremieux O: PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification. Nucleic Acids Res. 2003, 31 (1): 334-341. 10.1093/nar/gkg115.
- Easteal S: Molecular evidence for the early divergence of placental mammals. Bioessays. 1999, 21 (12): 1052-1058, discussion 1059. 10.1002/(SICI)1521-1878(199912)22:1<1052::AID-BIES9>3.0.CO;2-6.
- Eizirik E, Murphy WJ, O'Brien SJ: Molecular dating and biogeography of the early placental mammal radiation. J Hered. 2001, 92 (2): 212-219. 10.1093/jhered/92.2.212.
- Kumar S, Hedges SB: A molecular timescale for vertebrate evolution. Nature. 1998, 392 (6679): 917-920. 10.1038/31927.
- Janke A, Xu X, Arnason U: The complete mitochondrial genome of the wallaroo (Macropus robustus) and the phylogenetic relationship among Monotremata, Marsupialia, and Eutheria. Proc Natl Acad Sci U S A. 1997, 94 (4): 1276-1281. 10.1073/pnas.94.4.1276.
- Easteal S: Rate constancy of globin gene evolution in placental mammals. Proc Natl Acad Sci U S A. 1988, 85 (20): 7622-7626.
- de Jong WW, van Dijk MA, Poux C, Kappe G, van Rheede T, Madsen O: Indels in protein-coding sequences of Euarchontoglires constrain the rooting of the eutherian tree. Mol Phylogenet Evol. 2003, 28 (2): 328-340. 10.1016/S1055-7903(03)00116-7.
- Rokas A, Williams BL, King N, Carroll SB: Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003, 425 (6960): 798-804. 10.1038/nature02053.
- Gontcharov AA, Marin B, Melkonian M: Are Combined Analyses Better Than Single Gene Phylogenies? A Case Study Using SSU rDNA and rbcL Sequence Comparisons in the Zygnematophyceae (Streptophyta). Mol Biol Evol. 2004, 21 (3): 612-624. 10.1093/molbev/msh052.
- International Chicken Genome Sequencing Consortium: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432: 695-716. 10.1038/nature03154.
- Robinson DM, Jones DT, Kishino H, Goldman N, Thorne JL: Protein evolution with dependence among codons due to tertiary structure. Mol Biol Evol. 2003, 20 (10): 1692-1704. 10.1093/molbev/msg184.
- Siepel A, Haussler D: Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol. 2004, 21 (3): 468-488. 10.1093/molbev/msh039.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, 
Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420 (6915): 520-562. 10.1038/nature01262.
- Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera , Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, Birney E, Mongin E, Ureta-Vidal A, Woodwark C, Zdobnov E, Bork P, Suyama M, Torrents D, Alexandersson M, Trask BJ, Young JM, Huang H, Wang H, Xing H, Daniels S, Gietzen D, Schmidt J, Stevens K, Vitt U, Wingrove J, Camara F, Mar Alba M, Abril JF, Guigo R, Smit A, Dubchak I, Rubin EM, Couronne O, Poliakov A, Hubner N, Ganten D, Goesele C, Hummel O, Kreitler T, Lee YA, Monti J, Schulz H, Zimdahl H, Himmelbauer H, Lehrach H, Jacob HJ, Bromberg S, Gullings-Handley J, Jensen-Seaman MI, Kwitek AE, Lazar J, Pasko D, Tonellato PJ, Twigger S, Ponting CP, Duarte JM, Rice S, Goodstadt L, Beatson SA, Emes RD, Winter EE, Webber C, Brandt P, Nyakatura G, Adetobi M, Chiaromonte F, Elnitski L, Eswara P, Hardison RC, Hou M, Kolbe D, Makova K, Miller W, Nekrutenko A, Riemer C, Schwartz S, Taylor J, Yang S, Zhang Y, Lindpaintner K, Andrews TD, Caccamo M, Clamp M, Clarke L, Curwen V, Durbin R, Eyras E, Searle SM, Cooper GM, Batzoglou S, Brudno 
M, Sidow A, Stone EA, Payseur BA, Bourque G, Lopez-Otin C, Puente XS, Chakrabarti K, Chatterji S, Dewey C, Pachter L, Bray N, Yap VB, Caspi A, Tesler G, Pevzner PA, Haussler D, Roskin KM, Baertsch R, Clawson H, Furey TS, Hinrichs AS, Karolchik D, Kent WJ, Rosenbloom KR, Trumbower H, Weirauch M, Cooper DN, Stenson PD, Ma B, Brent M, Arumugam M, Shteynberg D, Copley RR, Taylor MS, Riethman H, Mudunuri U, Peterson J, Guyer M, Felsenfeld A, Old S, Mockrin S, Collins F: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428 (6982): 493-521. 10.1038/nature02426.
- Ohta T: An examination of the generation-time effect on molecular evolution. Proc Natl Acad Sci U S A. 1993, 90 (22): 10676-10680.
- Li WH, Ellsworth DL, Krushkal J, Chang BH, Hewett-Emmett D: Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol Phylogenet Evol. 1996, 5 (1): 182-187. 10.1006/mpev.1996.0012.
- Ohta T: Slightly deleterious mutant substitutions in evolution. Nature. 1973, 246 (5428): 96-98.
- Ohta T: Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J Mol Evol. 1995, 40: 56-63. 10.1007/BF00166595.
- Endo T, Ikeo K, Gojobori T: Large-scale search for genes on which positive selection may operate. Mol Biol Evol. 1996, 13 (5): 685-690.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680.
- Yang Z: PAML: A program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.