The colonization of land by animals: molecular phylogeny and divergence times among arthropods

Background The earliest fossil evidence of terrestrial animal activity is from the Ordovician, ~450 million years ago (Ma). However, there are earlier animal fossils, and most molecular clocks suggest a deep origin of animal phyla in the Precambrian, leaving open the possibility that animals colonized land much earlier than the Ordovician. To further investigate the time of colonization of land by animals, we sequenced two nuclear genes, glyceraldehyde-3-phosphate dehydrogenase and enolase, in representative arthropods and conducted phylogenetic and molecular clock analyses of those and other available DNA and protein sequence data. To assess the robustness of animal molecular clocks, we estimated the deuterostome-arthropod divergence using the arthropod fossil record for calibration and tunicate instead of vertebrate sequences to represent Deuterostomia. Nine nuclear and 15 mitochondrial genes were used in phylogenetic analyses and 61 genes were used in molecular clock analyses. Results Significant support was found for the unconventional pairing of myriapods (millipedes and centipedes) with chelicerates (spiders, scorpions, horseshoe crabs, etc.) using nuclear and mitochondrial genes. Our estimated time for the divergence of millipedes (Diplopoda) and centipedes (Chilopoda) was 442 ± 50 Ma, and the divergence of insects and crustaceans was estimated as 666 ± 58 Ma. Our results also agree with previous studies suggesting a deep divergence (~1100 – 900 Ma) for arthropods and deuterostomes, considerably predating the Cambrian Explosion seen in the animal fossil record. Conclusions The consistent support for a close relationship between myriapods and chelicerates, using mitochondrial and nuclear genes and different methods of analysis, suggests that this unexpected result is not an artefact of analysis. We propose the name Myriochelata for this group of animals, which includes many that immobilize prey with venom. Our molecular clock analyses using arthropod fossil calibrations support earlier studies using vertebrate calibrations in finding that deuterostomes and arthropods diverged hundreds of millions of years before the Cambrian explosion. However, our molecular time estimate for the divergence of millipedes and centipedes is close to the divergence time inferred from fossils. This suggests that arthropods may have adapted to the terrestrial environment relatively late in their evolutionary history.


Background
The terrestrial environment has been greatly altered by the actions of organisms over Earth's history. Prokaryotes were probably the first organisms to colonize land, and this occurred as early as 2.6 billion years ago [1][2][3]. The presence of organisms on exposed land will accelerate weathering through physical and chemical processes and may in turn affect the global atmosphere and climate [4]. Therefore, it is of interest to know when different groups of organisms colonized land to better understand their effect on the biosphere.
The earliest undisputed fossils of terrestrial plants, animals, and fungi are all from the early Palaeozoic (Ordovician and Silurian; 495 -417 million years ago, Ma) [5][6][7][8][9]. However, the earliest animal fossils are known from ~600 Ma [10][11][12] and the earliest representatives of the "plant lineage", such as green algae [13] and red algae [14], from even earlier (1000 -750 Ma and 1200 Ma, respectively). This raises the possibility that land was colonized by multicellular eukaryotes prior to the Ordovician. A previous molecular clock analysis addressed the question of land colonization by plants and fungi, resulting in early time estimates of about 700 and 1000 Ma, respectively [15]. However, molecular clock analyses have not addressed the colonization of land by animals.
Myriapods (centipedes, millipedes) and chelicerates (e.g., arachnids, horseshoe crabs) have figured prominently in the earliest evidence of terrestrial animals. For example, the first taxonomically identifiable body fossils of terrestrial animals are arachnids and chilopods from the late Silurian (~419 Ma) of England [6], and the oldest unambiguous evidence of sub-aerial animal activity is of arthropod (diplopod-like) trackways from ~450 Ma sediments [16]. Older arthropod traces (possibly from the latest Cambrian) have been reported from terrestrial sediments, eolian dune deposits accumulated in a sandy beach environment of southern Canada [17], although the terrestrial nature of the trace makers is uncertain. Older, marine myriapod-like fossils are known [18], but no marine remains attributable to Chilopoda or Diplopoda have been found.
Here, we use molecular clock and phylogenetic methods to place temporal constraints on the early history of arthropods and the colonization of land by animals. Our focus is on the divergence of millipedes and centipedes because they represent the most ancient living lineages of terrestrial animals (based on the fossil record) whose common ancestor presumably was terrestrial. In turn, this provides a minimum estimate for the time when land was colonized by animals. With the use of a phylogenetic framework derived from analyses of all available nuclear genes, we also estimate times of other major divergences in the history of living arthropods, and the divergence of arthropods and deuterostomes. These additional data help to constrain a maximal time for the colonization of land by arthropods.

Results
All phylogenetic analyses resulted in significant (>95%) clustering of myriapods and chelicerates (figure 1). Pancrustacea (insects and crustaceans) was found in most analyses, the only exceptions being minimum evolution with paralinear distances, and both weighted and unweighted parsimony analyses of nuclear + mitochondrial genes, and the unweighted parsimony analysis of only nuclear genes (Table 1). However, because these exceptional analyses used models that are less complex than recommended [19], it is possible that they were influenced by substitutional biases. Support for Pancrustacea was significant in most analyses of nuclear genes and the Bayesian analysis of nuclear + mitochondrial genes. The two representative crustaceans (branchiopods and malacostracans) formed a group in the nuclear gene analysis and the Bayesian all-gene analysis. However, insects joined malacostracans in the remaining all-gene analyses. This uncertainty of relationships within Pancrustacea has been encountered previously [20] and may be related to different rates of evolution among genes.
Divergence time estimates for the millipede-centipede split ranged from 357 -537 Ma across different molecular clock methods, with an average of 442 ± 50 Ma (Table 2). Average time estimates for the other divergences among arthropods were 475 ± 53 Ma for Xiphosura-Arachnida, 642 ± 63 Ma for Myriapoda-Chelicerata, 614 ± 23 Ma for Branchiopoda-Malacostraca, 666 ± 58 Ma for Insecta-Crustacea, and 725 ± 46 Ma for Pancrustacea-(Chelicerata-Myriapoda). As expected, the use of different calibrations resulted in different time estimates, and although the penalized likelihood method (SGG PL ) showed the greatest sensitivity to calibration differences, most methods yielded similar time estimates.
The use of exclusively fossil calibration points (Chilopoda-Diplopoda, Xiphosura-Arachnida) or molecular calibration points (Arthropoda-Deuterostomia) did not result in substantially different time estimates (table 2). Therefore, time estimates using the largest number of calibration points and proteins (averaged across methods) were chosen to summarize the time of divergence for each node (Table 2, Figure 2).
Using the assumption that the ancestral arthropod was aquatic (marine) and that terrestrialization is derived within arthropods, myriapods colonized land after the origin of the myriapod lineage (chelicerate-myriapod divergence; 642 ± 63 Ma) and before the millipede-centi-Phylogenetic relationships of the arthropods Figure 1 Phylogenetic relationships of the arthropods (a) Minimum evolution tree of the nine concatenated nuclear genes. (b) Minimum evolution tree of the 24 concatenated nuclear and mitochondrial genes. Both trees were obtained using gamma corrected Kimura 2-parameter (transversion) distances. The same (or similar) tree topologies were obtained using other methods (Table 1).   A timescale of arthropod evolution Figure 2 A timescale of arthropod evolution Numbers associated with nodes are divergence times (Ma) and their standard errors ( Table 2). Three calibration points were used: the fossil-based divergence of Chilopoda and Diplopoda (423 Ma), the fossilbased divergence of Xiphosura and Arachnida (480 Ma), and the 993 Ma deuterostome-arthropod divergence estimated from a previous molecular clock analysis.

Arthropod phylogeny
A major limitation of this study, with respect to phylogenetic implications, is the sparse taxonomic sampling. As in most studies, there is a trade-off in terms of taxa and genes or proteins. In this case, we have emphasized a large number of proteins to increase the statistical resolution of relationships and time estimates at the expense of taxonomic representation. Nonetheless, our results agree with most previous molecular phylogenetic analyses in supporting a close relationship between insects and crustaceans (Pancrustacea) and between myriapods and chelicerates. Of the two groups, Pancrustacea has received the strongest support in the past [20][21][22][23][24]. Nonetheless, a myriapod-chelicerate grouping has been found previously with mitochondrial genes [23][24][25], nuclear ribosomal genes [21], and nuclear protein-coding genes [26]. Additional evidence has come from hemocyanine structure [27].
We propose the name Myriochelata (in allusion to the joining of Myriapoda and Chelicerata) for the group containing myriapods and chelicerates, which otherwise is unnamed [28]. Although we are unable to identify any morphological trait diagnostic of this clade, some trends are evident that might reflect the morphological or ecological nature of the ancestral myriochelatan. For example, many species of extant myriochelatans (e.g., spiders, scorpions, centipedes) inject and immobilize prey with a poison, albeit with structures that are not homologous. Envenomation of prey is also found among pancrustaceans, but it is less broadly distributed in that group. Certainly, envenomation has arisen multiple times in arthropods, associated mostly (but not exclusively) with terrestrial predation. The significance of this trait in arthropod evolution must await sequence evidence from a greater diversity of taxa (e.g., pycnogonids, remipedes) than is currently available, and a careful examination of the early fossil record of animals (especially from the Cambrian). In general, the difficulty in finding morphological characters diagnostic of these major clades of arthropods is probably the result of deep branching of the lineages and an early fossil record that shows great morphological diversification (the Cambrian Explosion) and some important gaps [26,29,30].

Timescale for animal evolution and colonization of land
Among animals, arthropods have been considered to be the earliest colonizers of land based on fossil evidence [5][6][7]. However, it is possible that other animal phyla colonized land even earlier. Among them, nematodes, tardigrades, and annelids are likely candidates given their current exploitation of terrestrial environments, yet these groups have relatively poor fossil records.
Determining the number of such colonization events requires a consideration of phylogeny, the fossil record, and morphological traits associated with terrestriality. For arthropods, at least four major colonization events are inferred, leading to the chilopods + diplopods, insects, arachnids, and isopod crustaceans. Additional events may have occurred in smaller lineages [24,25,31]. Moreover, the recent discovery of a basal marine hexapod fossil from the Devonian [32] suggests that some hexapod traits previously believed to have evolved as adaptations to land may have first appeared in a marine setting. Along the same lines, it is also possible that millipedes and centipedes colonized land independently. However, because the earliest fossils of those groups are presumably terrestrial [6] and our molecular time estimate is only 5% earlier than the age of those fossils, our assumption of a terrestrial common ancestor of millipedes and centipedes has little affect on the time of colonization.
Our relatively young time estimates for the millipede-centipede (442 ± 50 Ma) and xiphosuran-arachnid (475 ± 53 Ma) divergences contrast with the much older time estimate for the deuterostome-arthropod divergence (~1100-900 Ma) (Tables 1 and 2). The first two are close to the corresponding fossil record estimates whereas the third one is ~400 million years earlier than the earliest fossil evidence for animals [12]. Most molecular clock analyses in the last three decades, including those using many genes, have resulted in similarly deep time estimates for the arthropod-deuterostome divergence [33]. Currently a debate exists as to whether divergences among animals are best represented by molecular clock dates or the fossil record [33][34][35][36][37]. The molecular clock dates suggest that early animals fossilized poorly, possibly because they were small and soft-bodied. Alternatively, others have argued that molecular time estimates are older than the true times because of statistical biases, rate changes, and calibration biases [36,38], although replies to those criticisms have been made [37].
The results of this study address several of these criticisms of molecular clocks in the following ways. (1) (5) The use of concatenated alignments (supergenes) yielded similar results to non-concatenated (multigene) analyses, indicating that the multigene analyses of previous studies were not responsible for the discordance between molecular clocks and the fossil record.
In summary, our molecular clock analysis resulted in some time estimates (e.g. millipede-centipede) in agreement with the fossil record and others (e.g., insect-crustacean) much earlier than fossil evidence (assuming that the crustaceans are, in fact, monophyletic). However, all studies, including this one, are limited by the small size of the available sequence data (5 -10 proteins across major groups of arthropods). In the future, it should be possible to use several hundred proteins across a diversity of arthropod taxa, and such an analysis should greatly increase the precision of phylogenetic and molecular clock analyses.

Conclusions
We have found strong statistical support for the unconventional grouping of myriapods and chelicerates, a taxon that we herein name Myriochelata. We also note that many myriochelatans immobilize their prey with venom. Using only arthropod fossil calibrations, our molecular clock analyses support earlier studies that used vertebrate calibrations in finding a deep divergence of deuterostomes and arthropods, hundreds of millions of years before the Cambrian explosion. However, our much younger molecular time estimate for the millipede-centipede divergence is close to the divergence time inferred from fossils for that node. This suggests that the colonization of land by arthropods occurred relatively late in their evolutionary history.

Phylogenetic methods
The alignments of all nine nuclear and 15 mitochondrial genes were concatenated (Additional file 1) because individual gene data sets were, for the most part, insufficient for statistical resolution of arthropod phylogeny (Additional file 2). This required the use of exemplars (representatives) from each of the major groups. Preliminary analyses using 16 -54 taxa (Additional file 1) were used as a guide to choosing exemplars so that fast-or slowevolving species were not selected.
Analyses of the concatenation of the nine nuclear genes, as well as a full data set of 24 genes, were carried out using minimum evolution, maximum parsimony, maximum likelihood, and Bayesian inference. Minimum evolution analyses were performed using Kimura two-parameter distances with transversions-only, gamma corrected Kimura two-parameters distances with transversionsonly, and paralinear distances. For the paralinear distance analyses, the proportion of invariable sites in the considered alignment was estimated using likelihood and assuming a HKY85 + gamma + proportion of invariable sites model of DNA evolution. Identical sites were then removed proportionally to the base frequencies estimated from all sites (PAUP default settings). Maximum parsimony analyses were carried out with an equal weighting and a 2:1 weighting scheme favoring transversions over transitions. The model used for the maximum likelihood analyses (GTR + gamma+ proportion of invariable sites) was selected using Modeltest [19], whereas Bayesian inference was carried out under mixed models, and full parameter estimation was performed during tree search for each gene.
All minimum evolution analyses were performed using MEGA 2.1 [40] and PAUP [41], while likelihood and parsimony analyses, as well as the estimation of the specific gamma parameters for the minimum evolution analyses were carried out using PAUP. Bayesian analyses were performed with MrBayes 3.0 [42]. The branch and bound algorithm was used in the parsimony analyses, while in the likelihood analyses 100 heuristic searches were per-formed. In the latter, starting trees were obtained using random sequence addition and swapped using the tree bisection reconnection algorithm. In the Bayesian analysis, 300 000 generations were run and trees were used only after convergence was reached.
Robustness for the nodes in the minimum evolution, maximum parsimony, and maximum likelihood trees was evaluated using the bootstrap. PAUP settings for the bootstrap analyses were as follows: 2 000 replicates with full heuristic search (one random addition sequence) and the multiple tree option turned off. Trees were swapped using the TBR algorithm. Support for the groups recovered in the Bayesian analyses were expressed as their posterior probabilities. Gapped sites were removed prior to all analyses.

Molecular clock methods
Molecular clock analyses were performed using a diversity of methods [37]: multigene global least squares (MGG LS ) [43,44], multigene local least squares (MGL LS ) [45], supergene global least squares (SGG LS ) [46], supergene local least squares (SGL LS ), supergene local divtime (SGL DT ; DivTime5b) [47], supergene local multidivtime (SGL MDT ; Multidivtime) [48], and supergene local penalized likelihood (SGL PL ; r8s version 1.5) [49,50]. Least squares methods are distance based, SGL PL is a semi-parametric likelihood-smoothing method, and SGL DT and SGL MDT are Bayesian methods. Multigene methods use the mean or mode [37,44,51] of time estimates from individual proteins whereas supergene methods derive a single time estimate from the simultaneous analysis of all available proteins. SGL DT and SGL MDT are two different implementations of the Bayesian method of Thorne et al. [47]. The difference between the two being that with SGL DT protein sequences are concatenated and considered as a single entity, whereas with SGL MDT each considered protein maintains its individuality during the analyses.
We used the divergence of centipedes (Chilopoda) and millipedes (Diplopoda) as the minimal (most recent) time of arthropod terrestrialization and the divergence of myriapods with their closest relatives (in this case, chelicerates) as the maximal (earliest) time for myriapod terrestrialization. The maximal time for arthropod terrestrialization was considered as the divergence between insects (Insecta) and crustaceans (Crustacea). This does not preclude the possibility that land was colonized even earlier by extinct species or groups of arthropods (or other animals).
Divergence times were estimated using two calibration points from the arthropod fossil record (Additional file 1): Xiphosura-Arachnida (~480 Ma) [52] and Chilopoda-Diplopoda (423 Ma; see Additional File 1 for explanation of date) [53,54]. A third calibration point, the divergence of deuterostomes (vertebrates) and arthropods based on a molecular clock study using 50 proteins and calibrated with the vertebrate fossil record (993 Ma) [33], was used in some analyses and tested (with the arthropod fossil record and by substituting vertebrates with a tunicate, Ciona intestinalis) in others (see below). Analyses were performed using calibration points separately and simultaneously, for comparison. The millipede-centipede fossil divergence was not used to estimate the molecular clock time of that divergence.
GenBank was screened for nuclear encoded proteins (>100 amino acids) in which divergence times could be estimated between two or more arthropod groups, resulting in 84 proteins. For each, reciprocal BLAST analyses were used to identify and assemble sets of available sequences, and orthology was investigated through phylogenetic analyses. In those analyses, 61 proteins (27 291 amino acid positions) were selected for further analysis after determining that orthologous relationships (divergences corresponding to speciation events) could be distinguished from paralogous relationships (divergences corresponding to gene duplication events).
The least squares-based global and local clock methods (MGG LS , SGG LS , MGL LS , SGL LS ) were performed on 49 proteins (Additional file 1) that passed the relative rate test [55] as implemented in PHYLTEST [56]. Gamma-corrected Poisson distances were used for these analyses and specific gamma parameters were estimated with PAML [57]. The Bayesian and the likelihood-smoothing local clock methods (SGL DT , SGL MDT , and SGL PL ) were used with all 61 proteins that passed or did not pass the rate tests. Branch lengths for the penalized likelihood analyses were estimated with PAML (assuming a Gamma corrected JTT model of amino acid substitution). For the Bayesian analyses, branch length estimation was performed using the software AUTOestbranches [47] in the case of SGL DT , and Estbranches [48] in the case of SGL MDT . Standard errors for SGG LS and SGL LS were calculated from amonggene comparisons (i.e., MGG LS and MGL LS ) and were not available for SGL PL .

Authors' contributions
DP carried out the bioinformatic research and drafted the manuscript. LLP and ML sequenced the new G3PDH and enolase sequences. LLP carried out preliminary analyses of the sequence data. SBH directed the research and assisted with drafting of the manuscript. All authors read and approved the final manuscript.