Phylogenomics supports microsporidia as the earliest diverging clade of sequenced fungi
© Capella-Gutiérrez et al; licensee BioMed Central Ltd. 2012
Received: 2 April 2012
Accepted: 31 May 2012
Published: 31 May 2012
Skip to main content
© Capella-Gutiérrez et al; licensee BioMed Central Ltd. 2012
Received: 2 April 2012
Accepted: 31 May 2012
Published: 31 May 2012
Microsporidia is one of the taxa that have experienced the most dramatic taxonomic reclassifications. Once thought to be among the earliest diverging eukaryotes, the fungal nature of this group of intracellular pathogens is now widely accepted. However, the specific position of microsporidia within the fungal tree of life is still debated. Due to the presence of accelerated evolutionary rates, phylogenetic analyses involving microsporidia are prone to methodological artifacts, such as long-branch attraction, especially when taxon sampling is limited.
Here we exploit the recent availability of six complete microsporidian genomes to re-assess the long-standing question of their phylogenetic position. We show that microsporidians have a similar low level of conservation of gene neighborhood with other groups of fungi when controlling for the confounding effects of recent segmental duplications. A combined analysis of thousands of gene trees supports a topology in which microsporidia is a sister group to all other sequenced fungi. Moreover, this topology received increased support when less informative trees were discarded. This position of microsporidia was also strongly supported based on the combined analysis of 53 concatenated genes, and was robust to filters controlling for rate heterogeneity, compositional bias, long branch attraction and heterotachy.
Altogether, our data strongly support a scenario in which microsporidia is the earliest-diverging clade of sequenced fungi.
Microsporidia are spore-forming, intracellular eukaryotic parasites that infect a wide range of hosts . Their taxonomic classification has evolved through time, representing one of the taxa for which the position in the tree of life has been subject to most radical changes . Initially described as "yeast-like fungi" , microsporidians were soon re-assigned to Sporozoa, an assemblage of spore-forming protozoa. Later, microsporidia were proposed to be one of the most primitive eukaryotic lineages  based on initial electron microscopy studies, which suggested a remarkable absence of widespread eukaryotic features such as golgi bodies, peroxisomes, mitochondria and 9+2 microtubules. Indeed, in the context of the Archeozoa hypothesis , microsporidia were postulated to be direct descendants of a primitive eukaryote that predated mitochondrial endosymbiosis. This idea received support from initial phylogenies based on rRNA and elongation factor genes [6–8]. However, during the 1990s, a growing number of phylogenetic studies suggested a close relationship between microsporidia and fungi, and revealed that more basal positions of microsporidia were likely the result of long-branch attraction (LBA), a methodological artifact affecting highly divergent sequences . Finally, the sequencing of Encephalitozoon cuniculi , and the discovery of microsporidian mitosomes , a relic version of mitochondria, precipitated the re-classification of microsporidia as fungi. This view is also supported by the presence of fungal traits, such as closed mitosis, spores containing chitin and trehalose [12, 13], the presence of RPL21/RPS9 as neighboring genes , and the presence as independent genes of two domain pairs that are fused in non-fungal opisthokonts (ubiquitin/RPS30 and glutamyl/prolyl tRNA synthetases ).
While the fungal nature of microsporidia is now accepted, their exact position in the fungal tree remains debated [13, 16]. Different analyses report alternative scenarios, none of which has been conclusively proven. For instance, an eight-gene analysis resolves microsporidia as a sister group to dikarya , while a six-loci analysis  clusters them with the chytrid Rozella allomyces. In the latter, this scenario was not significantly better supported than five alternative ones, including microsporidia being a sister-group to all fungi, to dikarya or to chytrids. Yet another placement, microsporidia as a sister group to zygomycota, was suggested from phylogenies of tubulins , but these divergent genes are prone to artifacts . As molecular phylogenetics has been unable to satisfactorily resolve the issue, others have explored alternative molecular traits, such as the conservation of gene order. For instance, Lee et al.  found that microsporidia and zygomycotina bore similar gene arrangement in the MAT locus, while such organization was uncommon in other fungi. Based on this and other shared neighboring gene pairs, the authors tentatively placed microsporidia with zygomycota, but this interpretation has been recently challenged .
Most of these studies have been limited by the availability of a single microsporidian genome. This, together with the extremely fast-evolving nature of microsporidian sequences, has certainly limited the strength of previous conclusions. The recent release of five additional microsporidian genomes [22–25], as well as the higher availability of genomes from other early-branching fungi, allows us to re-assess the position of microsporidia with an unprecedented amount of information. In addition, higher genomic sampling among sister groups of fungi provides us now with a better choice of slow evolving out-groups. Increased taxonomic sampling is particularly important to tackle artifacts such as LBA [26, 27], which is known to affect phylogenies of microsporidian sequences. Here, we set out to exploit all available genomic information to resolve the long-standing question of the phylogenetic position of microsporidia. Our analyses include assessment of gene order conservation, of thousands of individual gene phylogenies and of phylogenies based on 53 concatenated loci. In all cases, we contrasted alternative hypotheses in the presence of filters or models that control for potential sources of artifacts, such as compositional bias, LBA or heterotachy. Altogether, our data strongly support microsporidia as the earliest diverging clade of sequenced fungi.
We first explored similarities in chromosomal gene order (also referred to as synteny or colinearity) between microsporidia and other clades, which has been used to associate microsporidia and zygomycotina . This was mainly based on comparisons among E. cuniculi, Rhizopus oryzae and other fungi. However, as more microsporidian and zygomycota genomes became available it is necessary to assess the generality of such associations. For instance, a closer inspection at the MAT locus on a broader set of species and strains revealed that this locus had indeed undergone several re-arrangements . Furthermore, a recent detailed analysis of the MAT locus found that this organization is likely to be either a derived ancestral opisthokont character or the result of convergence . We believe that, besides this particular locus, the generally higher number of neighboring pairs found between E. cuniculi and R. oryzae must be considered carefully too, especially when relaxed criteria for synteny and homology are used. For instance, the occurrence of segmental duplications in only one of the lineages contributes to an overestimation of conserved pairs. Indeed, by analyzing the phylogenies of the putative syntenic pairs, we found that a significant number (at least 25%) of the proposed conserved neighboring pairs were related to gene expansions in R. oryzae, so that the same E. cuniculi pair would detect several R. oryzae pairs resulting from recent duplications (see Additional file 1, Figure S1). Considering that a recent whole genome duplication has been proposed for R. oryzae , it is necessary to account for this effect.
Next, we used ML analysis to contrast 12 competing scenarios (Additional file 1, Figure S14). The hypothesis placing microsporidia as a sister group to all other fungi was significantly more supported than any alternative scenario, as inferred from all but one of the eight statistical tests implemented in CONSEL . The exception was the Shimodaira-Hasegawa test, which, as noted by the authors, is unreliable when more than two hypotheses are contrasted . When repeating the test only contrasting these two hypotheses, the alternative one could be discarded.
LBA has been recognized as a major problem in phylogenies considering microsporidian sequences. We have adopted several strategies to minimize LBA as much as possible . These include: i) shortening of problematic branches by means of a dense taxonomic sampling of microsporidia, ii) selection of multiple, slow evolving out-groups, iii) use of models accounting for LBA, compositional biases and heterotachy, and iv) removal of fastest evolving sites. After applying all these filters, our favored scenario remained highly supported. Yet, LBA may still be present to some degree. To explore its potential influence we performed additional tests. First, we repeated the analysis without the out-groups. Under these circumstances an alternative in-group topology, grouping chytrids and zygomycota, received similar support to an unaltered in-group topology. This suggests some effect of the out-group sequences which may be attributed to LBA. Nevertheless, our data mentioned above suggest that the position of microsporidia is stronger in the absence of factors promoting LBA. In addition, several other lines of evidence suggest that our preferred scenario is not the result of LBA. First, the position in which microsporidia are placed in our tree does not correspond to the expected one for extremely divergent sequences, as indicated by the inclusion of random sequences. Rather, extremely long branches are attracted within the out-groups, and not as a sister group to them, as found in many of the individual gene trees. This is also supported by a recommended test for LBA , in which microsporidian sequences in our concatenated alignments are substituted by random ones . Under these conditions these extremely divergent sequences were mostly (80% of the cases in 100 replicas) within the out-groups, and only rarely (< 5%) in the position of microsporidia in our favored scenario. Second, and contrary to what would be expected under LBA, the support for our proposed scenario increases, rather than decreases, when applying stringent filters to the data. This is true for the analysis based on the concatenated alignment (removal of fast evolving sites) and those based on individual trees (use of longer, and less ambiguous alignments). Alternative scenarios gain some support only when more noisy subsets of the data are used, suggesting that existing biases erode rather than increase the support to our proposed scenario. Third, a parametric simulation  showed that the level of divergence present in our set was not sufficient to create an artifactual grouping of microsporidia and out-group sequences (see Methods). Thus, we conclude that our preferred scenario is likely not the result of LBA.
Altogether, our results show that an earliest branching position of microsporidia is more supported than any alternative placement within fully-sequenced fungi. This conclusion is mostly based on sequence analysis, since low levels of gene order conservation provide insufficient information. Similarly, single-gene phylogenetic analyses are dominated by lack of phylogenetic signal and are prone to artifacts. However, our favored scenario appears to be supported by the largest fraction of genes, especially when loci likely to be carrying scarce phylogenetic signal are removed. Such scenario is also the most parsimonious in terms of the number of inferred duplications, and receives the strongest support in combined analyses of 53 loci. Our results are thus compatible with earlier studies supporting this association [20, 41, 42], as well as with a more recent, parallel analysis based on multiple Nematocida genomes (personal communication Christina Cuomo 25). This scenario has obvious taxonomic and evolutionary implications [16, 18]. For instance, some traits shared with most but not all fungi, such as the absence of flagellated spores, would imply an independent, secondary loss in microsporidia. This and other characteristic traits of microsporidia, such as the presence of mitosomes, reduced genomes or the loss of cell-wall during the infective stage, would be the result of extreme adaptations to a parasitic lifestyle and, hence, provide little or none phylogenetic information. Taxonomically, our topology is consistent with microsporidia being defined as either the most basal lineage in fungi or as its closest sister-clade. Nevertheless, as discussed above, the fungal affiliation of microsporidia is supported by many shared traits (synapomorphies). In addition, since our taxonomic sampling is still reduced regarding basal groups we cannot test some hypotheses, such as the sisterhood of microsporidia and Rozella  or related species . Interestingly, Rozella shares with microsporidia the absence of a cell wall during the infective stage, although it is unclear whether this would represent a convergent adaptation to an intracellular parasitic lifestyle. Of note, we found that the alternative placement of chytrids and zygomycota forming a monophyletic group, was often emerging among the alternative, less supported, hypotheses and that our analyses showed variations with respect to the branching order among microsporidians. Clearly, additional sequences will help to resolve these issues.
Proteins encoded in 121 fungal genomes, including 6 microsporidians, as well as 4 out-group species were downloaded (Additional file 1, Table S4). This dataset was divided into two sub-sets: a primary set was formed by the 6 microsporidians, 4 out-groups, and 12 fungal species representing the main groups. A secondary set was formed by the remaining species (see Additional file 1, Table S4 for details). An expanded dataset with additional species (Additional file 1, Table S5) was created to test the robustness of the topology to increased taxonomic sampling.
A similar analysis to that described in  was performed on the primary set. We used two approaches with varying levels of stringency. First, a "relaxed synteny" approach as described in  was applied. In brief, this method defines a pair of genes as "syntenic" if two genes in the query genome, that have at most three intervening genes, have homologs in the other genome separated by at most four genes. If the intervening genes in the query have homologs in the other genome, then these should be in a window of less than 15 genes. Second, we implemented a "strict synteny" method in which two genes in the same orientation separated by at most three intervening genes is considered to be syntenic if their orthologs (rather than simply homologs) in the other species are separated by at most three intervening genes. If one of the intervening genes has an ortholog in the other species, thus implying a genomic re-arrangement, then the pair is discarded. The Kruskal-Wallis test was used to assess for statistical significance of the differences .
A phylome was reconstructed using as a seed each microsporidian species. Homologous were identified by Smith-Waterman  searches (E-value < 10-5, > 50% coverage). To maximize the number of gene trees encompassing all fungal groups, we adopted the following strategy: for each seed microsporidian sequence, homologs were searched in the primary set (see above). If no homolog was found for a given species, we extended the search to species in the secondary set belonging to the same group. Note that although a different microsporidian species is used as a seed in each phylome, they all contain homologs from all microsporidian species included in the analysis. Next, we reconstructed trees using the pipeline described in . In brief, sequences are aligned with three different programs: MUSCLE v3.7 , MAFFT v6.712b , and DIALIGN-TX . Alignments were run in forward and reverse direction (that is, using the Head or Tail approach ), and the six resulting alignments were combined with M-COFFEE , then the resulting meta-alignment was trimmed with trimAl v1.3 , (consistency-score cutoff 0.1667, gap-score cutoff 0.1). Trees were reconstructed using the best-fitting evolutionary model. Model selection was performed as follows: A Neighbor Joining (NJ) tree was reconstructed as implemented in BioNJ ; The likelihood of this topology was computed, with branch-length optimization, using seven evolutionary models (JTT, LG, WAG, Blosum62, MtREV, VT and Dayhoff), as implemented in PhyML v3.0 ; The two best-fitting models, as determined by the AIC criterion , were used to derive ML trees, using SPR (Subtree Pruning and Regrafting), a discrete gamma-distribution with four rate categories plus invariant positions, and the fraction of invariant positions and gamma parameter were inferred from the data. Branch support was computed using an aLRT (approximate likelihood ratio test) based on a chi-square distribution. Alignments and trees are available at ftp://ftp.cgenomics.org/microsporidia
We derived super-trees based on the whole collection of trees by using duptree  to search for the species topology implying the least number of duplications.
In order to determine the phylogenetic position of microsporidia as inferred from each individual gene tree, only trees with at least one representative from each of the six predefined groups (see Additional file 1. Table S4) were used. Trees were rooted using the most distant out-group sequence, and speciation and duplication events were detected using the species-overlap algorithm (species-overlap-score = 0.0) implemented in ETE v2 . ETE was used to explore the species content of the nearest partition to the microsporidian sequences (that is, the sister-group). This defined the sister-group of microsporidia (for example, Zygomycota if only the zygomycota species were found, or All fungi if all the remaining fungal groups were sister to microsporidia).
We concatenated the trimmed alignments of 53 single-copy protein families with one-to-one orthologs in at least 18 species of the primary set. After removing positions with no residue variation in all sequences, and positions that only contained gaps in all microsporidia species, the alignment contained 25,640 positions. In addition, BMGE  was used to remove possible compositional heterogeneity bias across the sequences resulting in an alignment of 9,745 columns. Subsequently, four different tree reconstruction strategies were applied on these two alignments: First, ML trees were derived as described above, using LG , since this model best fit 50 out of 53 of the individual alignments. Second, ML trees were also derived using the C40 CAT model, as implemented in PhyML. Third, to account for potential heterotachy, we derived ML trees with a free rates parameter covarion model recently implemented in PhyML. Finally, Bayesian trees were inferred with PhyloBAYES v3.2  using CAT, 2 independent Markov chain Monte Carlo (MCMC) chains, a saving frequency of 100 generations, and the following stop criteria, as recommended in the manual: 1) a maximum bipartition discrepancy (maxdiff) < 0.1 and 2) a minimum effective size > 100 for all parameters. Consensus trees were generated using the majority-rule.
To determine the contribution to the observed phylogenetic signal from positions with different evolutionary rates, we repeated the procedures mentioned above on sub-alignments resulting from sequentially removing fastest-evolving sites. TreePuzzle v5.2  was used to assign residues to one of 16 possible rate categories. Then, the two, four, six and eight more divergent categories were subsequently removed from the two alignments described above. In addition, the alignment was partitioned into four sub-alignments based on the level of divergence within the microsporidia only. ML analysis was repeated on all resulting sub-alignments, as described above.
Using ETE , we constructed 12 alternative topologies (Additional file 1, Figure S14). To avoid influence of phylogenetic relationships not directly tested, internal microsporidian branches were collapsed and then resolved for each alignment with RAxML . Branch lengths were optimized for each alignment and all competing hypotheses were compared using the eight statistical tests implemented in CONSEL . The confidence set of topologies was obtained by collecting trees not significantly different (P-values > 0.05).
We performed a parametric test as described in . A total of 53 sequence families were simulated, as implemented in ROSE , using as a known tree one in which microsporidia clustered with chytrids. The same tree reconstruction procedures as explained before were implemented. In the reconstructed tree, microsporidia were not at the earliest branching position, indicating that LBA is not sufficient to artifactually result in that placement with the methodologies used (Additional file 1, Figure S17).
Block Mapping and Gathering with Entropy
approximate likelihood ratio test
Subtree Pruning and Regrafting
Markov chain Monte Carlo.
TG group research has grants from the Spanish ministry of science and innovation (GEN2006-27784E, BFU2009-09168), SCG has a grant from Spanish ministry of science and innovation (BES-2010-036260). We acknowledge Hilary Morrison and the A. locustae sequencing project, Emily Troemel and Christina Cuomo for their unpublished Nematocida parisii data and other unpublished genome data at the “Microsporidian Comparative Genomes Consortium” and Iñaki Ruiz-Trillo for data from “Origins of Multicellularity” Sequencing Projects at Broad Institute of Harvard and MIT. The authors also wish to thank Stephane Guindon for his assistance in the use of the free-rate parameter covarion model in PhyML.