Evolution and diversity of Rickettsia bacteria
© Weinert et al. 2009
Received: 30 July 2008
Accepted: 02 February 2009
Published: 02 February 2009
Rickettsia are intracellular symbionts of eukaryotes that are best known for infecting and causing serious diseases in humans and other mammals. All known vertebrate-associated Rickettsia are vectored by arthropods as part of their life-cycle, and many other Rickettsia are found exclusively in arthropods with no known secondary host. However, little is known about the biology of these latter strains. Here, we have identified 20 new strains of Rickettsia from arthropods, and constructed a multi-gene phylogeny of the entire genus which includes these new strains.
We show that Rickettsia are primarily arthropod-associated bacteria, and identify several novel groups within the genus. Rickettsia do not co-speciate with their hosts but host shifts most often occur between related arthropods. Rickettsia have evolved adaptations including transmission through vertebrates and killing males in some arthropod hosts. We uncovered one case of horizontal gene transfer among Rickettsia, where a strain is a chimera from two distantly related groups, but multi-gene analysis indicates that different parts of the genome tend to share the same phylogeny.
Approximately 150 million years ago, Rickettsia split into two main clades, one of which primarily infects arthropods, and the other infects a diverse range of protists, other eukaryotes and arthropods. There was then a rapid radiation about 50 million years ago, which coincided with the evolution of life history adaptations in a few branches of the phylogeny. Even though Rickettsia are thought to be primarily transmitted vertically, host associations are short lived with frequent switching to new host lineages. Recombination throughout the genus is generally uncommon, although there is evidence of horizontal gene transfer. A better understanding of the evolution of Rickettsia will help in the future to elucidate the mechanisms of pathogenicity, transmission and virulence.
Rickettsia bacteria are intracellular symbionts of eukaryotes. The genus is classified in the family Rickettsiaceae within the alpha-proteobacteria, and is closely related to the genera Ehrlichia and Wolbachia [1, 2]. Rickettsia are most noted for causing human diseases, including Rocky Mountain spotted fever and epidemic typhus, which has been a major source of mortality at times in human history. However, all known vertebrate-associated Rickettsia are vectored by arthropods as part of their life-cycle, and many other Rickettsia are found exclusively in arthropods with no known secondary host (for convenience, we will refer to the former as 'vertebrate Rickettsia' and the latter as 'arthropod Rickettsia'). In recent years, arthropod Rickettsia have been discovered in a diverse range of hosts, suggesting that they are more common than had been suspected [4–16]. Nevertheless, research effort has tended to concentrate on the medically important vertebrate Rickettsia, or on the more common arthropod endosymbionts, such as Wolbachia and Cardinium, and so we know little about the biology of arthropod Rickettsia. Even less is known about the closely related bacteria that have been recently discovered in organisms such as leeches and protists, and in metagenomic studies sequencing all DNA in an environmental sample [17–25]. This neglect is unfortunate, because comparing the vertebrate pathogens with related species can help to elucidate the mechanisms of pathogenicity, transmission and virulence [26, 27]. However, this requires a robust phylogeny for the genus.
Historically, Rickettsia were classified into three major groups based on serological characteristics, namely the 'typhus group', 'spotted fever group' and 'scrub typhus group', although subsequent DNA sequencing led to the latter being reassigned to the related genus Orientia. The relationship of species within the remaining two groups of Rickettsia has been the subject of intensive study over the last decade as progressively more informative genes have been sequenced [29–32], culminating in a multi-genic approach. As a result it has been suggested that the spotted fever group consists of two sister clades, one of which is now designated 'transitional' (although see ). A fourth so-called 'ancestral' clade, including Rickettsia bellii and Rickettsia canadensis, is thought to be basal to the other groups and is largely non-pathogenic to vertebrates. However, the position of R. canadensis remains uncertain.
While many studies have helped to clarify the relationships between the vertebrate Rickettsia, only one recent study has explored the relationship of the well classified groups to the newly discovered arthropod Rickettsia . The authors found that most arthropod Rickettsia are basal to the vertebrate Rickettsia and that the Rickettsia associated with leeches, protists and freshwater environments fell into two phylogenetic groups, distinct from the arthropod and vertebrate groups. The only known exceptions are a small number of arthropod Rickettsia that fell within the group otherwise infecting leeches [12, 36, 37]. However, Perlman et al.  were only able to provide little statistically significant support for relationships among the arthropod Rickettsia. This is almost certainly because the study relied on partial sequences of 16S rDNA, which is extremely slowly evolving, and so lacking in phylogenetic resolution. Improving this situation is challenging because amplifying other genes in basal strains has proven problematic, perhaps because the genes in question may either be missing or too divergent for PCR amplification using existing primers. Also, resolving some deep nodes in the Rickettsia species tree continues to be a problem. The reasons for this are unclear but could be exacerbated by long-branch attraction. One of the best ways to minimise this effect is to sample for more taxa and add them to the tree in the hope of breaking up (thereby shortening) the long branches.
Here, to explore the diversity of arthropod Rickettsia, we screened 4454 arthropods to uncover new Rickettsia strains and sequenced four genes from five known and 20 new bacterial strains. We use the recently published Orientia tsutsugamushi genome  to design PCR primers allowing amplification of rapidly evolving genes from strains that lie between the genera Rickettsia and Orientia. To include other hosts, we also searched published metagenomic databases for Rickettsia sequences. With these data, we have been able to produce the first well-resolved phylogeny of the entire genus Rickettsia, showing how the vertebrate Rickettsia relate to the other taxa. Our phylogeny has allowed us to identify and name additional novel groups. Furthermore, we were able to compare host associations among these groups, identify major life history transitions, and investigate the extent of recombination within the genus.
Table 1. Rickettsia strains sequenced.
Rickettsia obtained from:
Bombyliidae (bee fly)
Bombyliidae (bee fly)
Reduviidae (assassin bug)
Meloidae (blister beetle)
Subcoccinella vigintiquattuorpunctata (24 spot ladybird)
Halyzia sedecimguttata (orange ladybird)
Calvia quattuordecimguttata (cream spot ladybird)
Coccidula rufa (ladybird)
Rhizobius chrysomeloides (ladybird)
Scymnus suturalis (ladybird)
Adalia bipunctata (2 spot ladybird), Edinburgh
Adalia decempunctata (10 spot ladybird)
Gall wasp screen
Jiggins and Tinsley 
Adalia bipunctata (2 spot ladybird), Moscow
Adalia bipunctata (2 spot ladybird), Cambridge
Adalia bipunctata (2 spot ladybird), Ribe
Chen et al. 
Acyrthosiphon pisum (pea aphid)
Lawson et al. 
Brachys tessellatus (buprestid beetle)
To obtain a phylogeny of the genus Rickettsia, we combined a concatenated alignment of the four genes we sequenced, with data from other Rickettsia strains available from Genbank (accession number available in Additional file 1, Accession numbers of genes used in the phylogenetic analysis). For most of the previously described arthropod Rickettsia, only 16S rDNA sequence is available, and so we allowed for missing data in the alignment where a gene had not been sequenced. Missing data should not decrease phylogenetic resolution for taxa with complete data, and is likely to be a problem for other taxa only when the number of characters is very low .
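The handling of missing genes described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the toy sequences and the use of '?' as the missing-data symbol are assumptions made for illustration. Each gene alignment is a mapping from strain name to aligned sequence, and a strain lacking a gene receives a run of missing-data characters of that gene's alignment length.

```python
from typing import Dict, List

GENES = ["16S", "atpA", "coxA", "gltA"]  # the four genes named in the text


def concatenate(alignments: Dict[str, Dict[str, str]],
                genes: List[str] = GENES,
                missing: str = "?") -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix, padding absent genes."""
    lengths = {}
    for gene in genes:
        # Every sequence within one gene alignment must have the same length.
        seen = {len(seq) for seq in alignments[gene].values()}
        if len(seen) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        lengths[gene] = seen.pop()

    # Every strain that appears in at least one gene alignment.
    taxa = sorted({taxon for gene in genes for taxon in alignments[gene]})

    return {
        taxon: "".join(
            alignments[gene].get(taxon, missing * lengths[gene])
            for gene in genes
        )
        for taxon in taxa
    }


# Hypothetical toy data: strainB has only 16S and coxA sequenced.
toy = {
    "16S": {"strainA": "ACGT", "strainB": "ACGA"},
    "atpA": {"strainA": "GGC"},
    "coxA": {"strainA": "TT", "strainB": "TA"},
    "gltA": {"strainA": "CAG"},
}
supermatrix = concatenate(toy)
```

In this toy case strainB keeps its 16S and coxA columns and is padded elsewhere, so every row of the supermatrix has the same length and can be passed to a tree-building program that accepts missing data.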
Table 2. Likelihood values of SH tests.
Likelihood of tree topology
It is also important to investigate the influence of missing data on the phylogeny. Therefore, we constructed a second tree that included only taxa with complete sequences for the three genes atpA, coxA and gltA (excluding 16S due to its marginally significant SH test). This 'complete data' tree is shown in Figure 1b. Overall, the topologies of the two trees are very similar (Figure 1a and 1b), but most nodes had higher support in the tree with complete sequences. In particular, there is strong bootstrap support for the group largely composed of ladybird symbionts in the complete data tree (Figure 1b) but not on the missing data tree (Figure 1a). An exception is the placement of R. canadensis, which is uncertain in the complete data tree but is well supported on the missing data tree (probably because the missing data tree includes two closely related taxa; Figure 1a). The composition of the transitional group and the placement of Rickettsia prowazekii also differ in the two trees. Rickettsia within the typhus group (R. prowazekii and Rickettsia typhi) are striking in that they reside on longer branches than other Rickettsia in the trees. This is indicative of rate heterogeneity, which can cause a long-branch attraction artefact where the taxa will appear in an incorrect place. In the missing data tree the transitional group is monophyletic, while in the complete data tree R. prowazekii groups with Rickettsia akari (Figure 1a and 1b). However, constraining R. akari and the transitional group to be monophyletic in the complete data tree only causes a marginally significant drop in the likelihood (SH test; lnL = 20.003 p = 0.066).
By mapping host species onto our phylogeny, we are able to make inferences about patterns of host-switching in the genus. It is clear from Figure 1 that Rickettsia bacteria have an extremely diverse host range, occurring in arthropods, vertebrates, plants, amoebas, ciliates, annelids and hydrozoa, and that there have been numerous shifts between these hosts. The earliest shift splits the genus into two major divisions: the hydra and torix groups and all other arthropod Rickettsia. As mentioned, the hydra group are symbionts of protists and undetermined hosts. Although one member of this group was found in the marine ciliate Diophrys from brackish water , and another from a deep sea octocoral, all others are from freshwater environments or damp terrestrial environments. In general it appears that marine Rickettsia are rare. Indeed, from over 13 billion open reading frames compiled from marine metagenomic datasets  we detected no homologues of greater than 91% identity to the 16S gene of hydra group Rickettsia. The next split in the tree separates all the remaining Rickettsia from the torix group (Figure 1) which contains symbionts of leeches (phylum Annelida), an amoeba  and arthropods (a sandfly, a cranefly, a biting midge and a booklouse). In the torix group, as with the hydra group, the vast majority of the hosts are aquatic at some stage in their life cycle (the sole exception being the booklouse).
The remainder of the arthropod Rickettsia, including all strains sequenced in the present study, form a monophyletic group (Figure 1). Parsimony suggests that the ancestral state of this clade is to infect arthropods, with one or more lineages subsequently evolving to also infect vertebrates. In addition, there have been multiple transitions between blood feeding and non blood feeding insects. Perlman et al.  demonstrated that forcing R. bellii to group with other blood feeders gives a significantly worse tree. SH tests of our phylogeny showed that forcing R. canadensis and Rickettsia felis to group with other blood feeders similarly gives a significantly worse fit (SH tests on all groups: p < 0.001).
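The parsimony argument above can be sketched with Fitch's algorithm on a toy tree. The tree shape, the strain names and the two host categories below are hypothetical and are not taken from Figure 1; the sketch only shows how an ancestral host state and a minimum number of host shifts are inferred from tip states.

```python
def fitch(tree, leaf_states):
    """Fitch parsimony on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    leaf_states: mapping from leaf name to its observed state.
    Returns (set of most parsimonious states at this node, minimum changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0

    left, right = tree
    left_states, left_changes = fitch(left, leaf_states)
    right_states, right_changes = fitch(right, leaf_states)

    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical example: one vertebrate-associated strain nested among
# strains known only from arthropods.
toy_tree = ((("tick_strain", "ladybird_strain"), "aphid_strain"), "beetle_strain")
toy_hosts = {
    "tick_strain": "arthropod+vertebrate",
    "ladybird_strain": "arthropod",
    "aphid_strain": "arthropod",
    "beetle_strain": "arthropod",
}
root_states, n_changes = fitch(toy_tree, toy_hosts)
```

For this toy tree the root is reconstructed as arthropod-only with a single change, mirroring the logic (though not the data) behind the inference that vertebrate infection is derived.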
Our results therefore show clearly that there have been numerous host shifts, sometimes between taxonomically distant hosts. However, it is equally clear that related Rickettsia tend to share related hosts. Multiple different strains were detected within ladybird beetles, ticks, lice, parasitic wasps and bee-flies, and in all cases, two or more of these strains cluster together. Nevertheless, this pattern does not seem to be explained by ancestral infection followed by co-speciation of parasite and host. From Figure 1a, the three different strains of Rickettsia found in Adalia bipunctata do not appear to be monophyletic as one of the A. bipunctata strains groups with Adalia decempunctata with high posterior support. Unfortunately only four parasitoid individuals from the oak gall wasp screen were infected, not allowing us to test the influence of host relatedness, host interaction frequency and geographic isolation on frequency of horizontal transfer events.
In addition to clustering according to host type, Figure 1 also demonstrates phylogenetic clustering by ecology (although it is often difficult to separate these effects). For example, the two major groups of vertebrate Rickettsia, the spotted fever and typhus groups, consist solely of vertebrate Rickettsia, containing no arthropod Rickettsia. However, the transitional group differs from this pattern, containing both vertebrate Rickettsia and Rickettsia infecting non-blood feeding arthropods (Figure 1). A second ecological adaptation to increase transmission is to skew the sex ratio of the host towards females, which are the sex that most efficiently transmits the infection to offspring for vertically transmitted Rickettsia. Some of these Rickettsia are known or suspected to kill male hosts early in their development, and there appear to be two separate origins of this adaptation on the tree (once within a buprestid beetle in the bellii group and once within ladybirds in the adalia group). There are 11 strains of Rickettsia that infect ladybird beetles, and nine of these cluster in a single monophyletic group. The ones that cluster elsewhere are probably not male-killers (male ladybird beetles are also infected at high prevalence). A third possible source of ecological clustering relates to herbivorous hosts. Such clustering may reflect ecology in two possible ways. Firstly, many symbionts are known to supplement their hosts with amino acids that are rare in phloem sap (although a mutualistic role for Rickettsia has never been demonstrated). Secondly, Rickettsia may be transmitted horizontally through plants (one case is already known). It has previously been asserted that the bellii group consists mainly of herbivorous arthropod symbionts.
Four Rickettsia in this group are indeed known to infect sap sucking arthropods (a whitefly, a leaf hopper, an aphid and a red spider mite), and three of these group separately from the other members of the bellii group (Figure 1). However, we have uncovered a large number of predatory insect hosts in this group, and sap sucking insects in other groups (a spittlebug symbiont is in the transitional group). Therefore, the view that members of the bellii group are mainly associated with herbivorous arthropods is not supported by these new data. However, it is possible that the DNA signal could have come from the guts of these insects, as abdomens were sometimes extracted where there was not enough ovary tissue (although the signal would not be expected to be strong).
We did, however, detect some evidence of recombination events within two of the four genes. The maximum χ2 test and phi test identified multiple recombination breakpoints in the gltA and coxA genes. In coxA, the breakpoint pattern indicated that there had been some recombination between an ancestor of the adalia group and of the rhizobius group (maximum χ2 test χ2 = 42.79, p < 0.001; phi test p < 0.001). In gltA, there was evidence of recombination between R. akari of the transitional group and the adalia group (maximum χ2 test χ2 = 46.78, p < 0.001; phi test p = 0.021). In contrast, no recombination was detected within the 16S gene (maximum χ2 test χ2 = 8.92, p = 0.783; phi test p = 0.960) or the atpA gene (maximum χ2 test χ2 = 12.13, p = 0.57; phi test p = 0.759).
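The idea behind the maximum χ2 test can be shown with a simplified sketch. This is not the software used in the study: it assumes a single pair of sequences already reduced to a list of 0/1 flags (1 where the two sequences differ at an informative site), and all function names are hypothetical. For each candidate breakpoint, a 2x2 table compares differences and matches on either side; the largest χ2 value is the test statistic, and its significance is estimated by shuffling site order.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no information about a breakpoint
    return n * (a * d - b * c) ** 2 / denom


def max_chi2(diffs):
    """Return (largest chi-squared value, breakpoint index that produced it)."""
    n, total = len(diffs), sum(diffs)
    best, best_k, left = 0.0, None, 0
    for k in range(1, n):              # breakpoint between sites k-1 and k
        left += diffs[k - 1]
        a, b = left, k - left          # differences / matches to the left
        c = total - left               # differences to the right
        d = (n - k) - c                # matches to the right
        stat = chi2_2x2(a, b, c, d)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def permutation_p(diffs, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling the order of sites."""
    rng = random.Random(seed)
    observed, _ = max_chi2(diffs)
    shuffled = list(diffs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi2(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical pattern: identical in the first half, different in the second.
toy_diffs = [0] * 10 + [1] * 10
stat, breakpoint = max_chi2(toy_diffs)
p_value = permutation_p(toy_diffs)
```

A sharp switch from matching to differing sites, as in the toy pattern, gives a high statistic at the switch point; a pattern with differences scattered evenly along the gene would not.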
Split networks were constructed for each of the four genes to identify possible sources of conflicting signal and recombination in the data (Additional file 3, Split networks for each of the individual genes used in the study). This method has an advantage over tree-based methods as posterior support and bootstrap values measure robustness solely with respect to sampling error (as opposed to systematic bias), and with large sample size robustness will generally be high as noise in the data is filtered out. The split network constructed for the 16S gene was tree-like (containing no significant splits). In contrast the other three genes showed a small amount of phylogenetic conflict, with statistical support for two different trees. In all cases, one of these trees corresponded to that shown in Figure 1, suggesting that this tree accurately reflects the evolutionary history of most of the genome. The discrepancies were as follows. The atpA split network showed additional support for a tree where R. prowazekii is basal to the other vertebrate groups. This pattern is consistent with a tree based on protein alignments of the ten Rickettsia genomes. The coxA split network supported a closer relationship between the Rhizobius chrysomeloides symbiont and the adalia group, which is consistent with the recombination pattern for this gene. The gltA split network also supported this same relationship although this was not reflected in the recombination breakpoint pattern.
We have identified a large number of new strains of Rickettsia, including several new groups, and shown that arthropod Rickettsia are both common and diverse. We have also constructed the largest and most robust phylogenetic analysis of the genus to date. Importantly, we used a multiple locus approach, as using single genes to build species phylogenies can seriously confound the true relationship between strains, especially with loci that are prone to recombination .
It is useful to view our results in the context of the evolution of the whole order Rickettsiales. To do this, we have used a molecular clock to date the divergence of different groups, and this is shown in Figure 2. The common ancestor was presumably free-living, as the earliest diverging genus of the order is Pelagibacter. Pelagibacter species account for 26% of the bacterial rDNA sequences from sea water and have the smallest genomes of free-living bacteria. About 525–775 million years ago there was a transition to living within cells, followed by a split into endosymbionts of protists (Holospora) [44, 45] and a clade that primarily infects arthropods. Holospora species infect the nuclei of Paramecium and are generally considered pathogenic to their hosts; for example, Holospora undulata can sterilise its hosts, reduce the rate of asexual division and increase host mortality. The most parsimonious interpretation of the tree, therefore, is that the transition to infecting arthropods occurred approximately 425–525 million years ago in this lineage (Figure 2), which can be compared to the first appearance of most metazoan phyla in the Cambrian explosion (approximately 540 million years ago).
All other genera in the order Rickettsiales are associated with arthropods although many have other diverse hosts. The genus Midichloria has only been found in Ixodidae ticks, and resides inside mitochondria. Bacteria in the genus Neorickettsia are primarily associated with helminths, where they can be transmitted to vertebrates . Wolbachia have been described in only arthropods and nematodes, and most are thought to be vertically transmitted (reviewed in ). Ehrlichia and Anaplasma are horizontally transmitted in arthropods and vertebrates [49, 50] and Orientia are vertically transmitted in mites and can be horizontally transferred to humans [51, 52].
The genus Rickettsia is approximately 150 million years old (Figure 2). Parsimony would suggest that the common ancestor of Rickettsia infected arthropods, and that species in the hydra and torix groups then switched to infect other eukaryotes such as protists, leeches and numerous unidentified hosts (many of which may be protists) (Figures 1 and 3). However, care should be taken with this interpretation, as symbionts of arthropods are more thoroughly sampled than those of other animals. In addition, two patterns call into question the interpretation that the ancestral state was arthropod infection. First, the genome sequence of R. bellii includes many genes that are more related to other amoeba symbionts than to other Rickettsia . This is compatible with an ancestor of R. bellii infecting amoebas and exchanging genes with other amoebal symbionts. Second, of the arthropod hosts within the torix group (three Diptera and a booklouse), all of the Dipteran hosts have larval stages that feed on aquatic microbiota, with the other hosts within the group also being aquatic. Although host switching could occur in either direction, transmission from protist to arthropod is more intuitive given that the related genus Neorickettsia is transmitted between hosts through ingestion . Further sampling of other eukaryotic hosts may resolve the question of the ancestral state.
Regardless of this, we have shown that the remaining clade of Rickettsia (i.e. those not in the hydra or torix groups) all have associations with arthropods, either as the only known host or in conjunction with a vertebrate or plant host (Figure 1). The rhizobius and meloidae groups, which all infect beetles, diverged from the other taxa early in the evolution of this clade. There was then a rapid radiation about 50 million years ago that led to most of the strains we know of. This includes the bellii group, which is probably the largest group of arthropod Rickettsia as it contains all but three strains from the worldwide sample. This sample both includes a diverse array of arthropods (it rarely includes the same host genus twice) and will tend to pick up high prevalence infections (only a single specimen of each host species was tested).
Our results show clearly that switching between arthropod hosts has been a common feature of Rickettsia evolution. Within the genus, closely related bacteria sometimes infect different host phyla and classes (Figure 1), but the genus arose long after the major arthropod orders diverged  (Figure 2). However, the host phylogeny is not entirely unrelated to the bacterial phylogeny, and there are many cases of related Rickettsia strains infecting related hosts. In the case of many mutualistic symbionts, the bacterial phylogeny precisely mirrors the host phylogeny, indicating that the bacteria and host have co-speciated . However, this is not the case in the Rickettsia. Even in the adalia group, where a group of related bacteria all infect related hosts, the host and bacterial phylogenies are different. Therefore, Rickettsia symbioses are short-lived on an evolutionary scale, which is consistent with most of these infections being parasitic.
Our analysis has also allowed us to reconstruct the changes in the ecology of the genus. Rickettsia are almost entirely restricted to terrestrial and freshwater habitats (Figure 1). Within the genus, there have been three major transitions in life history: becoming sex ratio distorters, arthropod vectored vertebrate pathogens and, in one case, an arthropod vectored plant pathogen. Based on current data, infecting plants and parthenogenesis induction in the arthropod host have each arisen only once, and male-killing twice. Until the effect of R. bellii on vertebrates in the field has been properly defined, we cannot say for sure how many times vertebrate pathogenesis has evolved.
The recent discovery of plasmids in the genus Rickettsia opens up the possibility that horizontal gene transfer may be common between strains [56–59]. Furthermore, there have been reports of recombination between Rickettsia strains [60, 61]. This has important implications for the evolution of Rickettsia, as genes can sweep through different genetic backgrounds of bacterial strains, thereby potentially increasing the spread of genes altering bacterial pathogenicity. Recombination can also complicate the inference of relationships between strains, as recombination violates the assumption that a strain has one evolutionary history.
It is clear from our data that these different genes have very similar phylogenetic histories and recombination must therefore be infrequent (although it is possible that the exchange of plasmids may be common). However, we detected one clear-cut case of recombination between different groups of Rickettsia. In the symbiont of the ladybird beetle C. rufa (Figure 3) the sequences of atpA and coxA place (s)C. rufa within the bellii group, whereas gltA and 16S place it within the transitional group (Additional file 2, Phylogenetic trees of each of the individual genes used in the study). In the R. felis genome (from the transitional group), the gene sequences of atpA and coxA are approximately 670 kb apart. If this represents one recombination event and the genes are syntenic with the R. felis genome, it will have included approximately 45% of the genome. The biggest known recombination event in Rickettsia, which occurred in Rickettsia massiliae, is a 54 kb segment containing many genes that facilitate conjugal DNA transfer. Intriguingly, although R. massiliae is in the spotted fever group, this region of DNA was also thought to originate from the bellii group . As well as this, Gillespie et al.  found that many of the genes on the R. felis plasmid have a closer relationship to the bellii group. This evidence suggests that conjugation with the bellii group Rickettsia may have an important role in the evolution of the groups containing vertebrate pathogens.
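The figure of approximately 45% can be checked with simple arithmetic. The 670 kb separation and the 54 kb R. massiliae segment come from the text; the R. felis chromosome length of 1,485,148 bp is an assumption brought in from outside this article and should be checked against the published genome annotation. The function name is hypothetical.

```python
# Values from the text.
ATPA_COXA_DISTANCE_BP = 670_000      # approximate atpA-coxA separation in R. felis
R_MASSILIAE_SEGMENT_BP = 54_000      # largest previously known recombined segment

# ASSUMPTION: chromosome length of R. felis, not stated in this article.
R_FELIS_CHROMOSOME_BP = 1_485_148


def fraction_of_genome(span_bp: int, genome_bp: int) -> float:
    """Fraction of a genome covered by a contiguous span."""
    if not 0 < span_bp <= genome_bp:
        raise ValueError("span must be positive and no longer than the genome")
    return span_bp / genome_bp


fraction = fraction_of_genome(ATPA_COXA_DISTANCE_BP, R_FELIS_CHROMOSOME_BP)
fold_larger = ATPA_COXA_DISTANCE_BP / R_MASSILIAE_SEGMENT_BP

print(f"Putative recombined span: {fraction:.1%} of the chromosome")
print(f"Relative to the R. massiliae segment: {fold_larger:.1f}-fold larger")
```

Under that assumed genome size the span rounds to 45% of the chromosome, and it is roughly twelve times the length of the 54 kb segment, which is why a single event of this size would be exceptional.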
We also detected recombination within the coxA and gltA genes. This is particularly surprising given that the individual gene topologies did not seem to conflict in any way (Table 2). This can only be explained if the recombination event is ancient, and indeed the breakpoint patterns affected all members in particular groups, suggesting that the events pre-dated the divergence of the different groups. Even though recombination machinery has been detected in Rickettsia genomes , this is the first evidence that housekeeping genes recombine, and could have implications for the inference of relationships, especially since housekeeping genes (in particular gltA in Rickettsia) are often used to build phylogenies. Therefore recombination should be investigated more fully, especially when using single genes to build phylogenies. These ancient recombination events involve the adalia group and the rhizobius group, as well as the transitional group. This would seem to indicate that recombination is not unique to the bellii and vertebrate groups, and may be widespread throughout all arthropod Rickettsia and possibly the other basal groups. However, the recombination signal is different from the above cases, as it is intragenic and over a small area.
It is clear from our data that Rickettsia are common and diverse bacteria. However, the basic biology of most of these strains is entirely unknown and it is therefore unclear how these have spread through populations. As Rickettsia are primarily intracellular, they cannot survive for long in the external environment (but see  for cell-free persistence of related Wolbachia). For this reason, they are most readily maintained either by vertical transmission (mother to offspring) in their arthropod hosts or, in the case of blood-sucking arthropods, by horizontal transmission through an infected vertebrate (one case is also known of transmission through a plant). Because infectious transmission between arthropod hosts is thought to be rare, the general view is that exclusively arthropod Rickettsia are maintained within a host species primarily by transovarial transmission, and therefore must enhance the fitness of infected females. Some Rickettsia raise infected female fitness in an indirect way by manipulating host reproduction towards infected daughters at the expense of sons, either by killing male offspring as embryos (male-killing) or by inducing parthenogenesis [13, 65]. The closely related bacterium O. tsutsugamushi also causes a female biased sex ratio in its mite host. Theoretically, arthropod Rickettsia could also be maintained by directly providing a fitness benefit to infected females as shown for other bacterial groups [67–71], e.g. by providing essential nutrients or protection from other infective agents. Although Rickettsia are required for egg production in the booklouse Liposcelis bostrychophila, and are therefore obligatory, in most cases where the arthropod relationship has been studied in depth, Rickettsia are pathogenic [8, 72–74] or have no observable effect [75, 76], making a mutualistic role for Rickettsia in those hosts unlikely.
For those Rickettsia that are vertebrate pathogens but vectored by arthropods, the effects of the bacteria on their arthropod hosts are generally poorly understood. Rickettsia prowazekii is clearly pathogenic to infected lice, and transmission through humans is essential to the maintenance of the bacteria in arthropod populations. In every other case, human infections are accidental, but transmission through other vertebrates may allow the bacteria to persist in populations. Many of the bacteria that can infect vertebrates are also transmitted vertically by the arthropod host. In these cases, even very occasional horizontal transmission through the vertebrate host can enhance the maintenance of bacteria in arthropod populations.
Our data also have implications for transmission. We have shown that R. felis (transitional group), R. canadensis (canadensis group) and R. bellii (bellii group) are more closely related to Rickettsia in non-blood feeding hosts than to those found in other blood feeding hosts. Therefore, are these strains even transmitted horizontally? As far as we are aware, even in cases where the bacteria can infect vertebrates (as is the case with R. felis), there has been no recorded instance of transmission back to arthropods (i.e. ectoparasites cannot pick up the infection from vertebrates). Therefore, while there are multiple origins of infecting blood-feeding arthropods, the ability to be transmitted from vertebrates back into the arthropod host may have arisen once only, and subsequently been lost in the transitional group after the divergence of R. akari and Rickettsia australis.
We still do not have a complete understanding of how Rickettsia are maintained within host populations or how they move horizontally between host species. A better understanding of these dynamic processes can be achieved by detailed studies of representatives from the different groups described here.
We have identified 20 new arthropod Rickettsia and described the major transitions and life-history strategies throughout the phylogeny. This raises many questions about how these bacteria are maintained and spread throughout populations of arthropods and vertebrates. Rickettsia are known to distort the sex ratio of their hosts by male-killing and inducing parthenogenesis, and are also horizontally transmitted through vertebrates and plants. However, these phenotypes are probably not manifest in the majority of strains discovered and so there may be other ways in which Rickettsia are maintained in host populations. For example, there seem to be intriguing links to host oogenesis in some strains and a possible case of a beneficial effect in the torix group . Exploring the biology of these new strains is essential if we are to learn more about the genus.
We obtained most of the Rickettsia strains we sequenced from three PCR screens of insects collected in the wild (Table 1). These used primers that amplify the 16S rDNA of Rickettsia. The first screen tested 2149 ladybirds from 21 different species collected from the UK, Germany, Spain and New Zealand for the presence of Rickettsia. We sequenced a Rickettsia from a single individual from each of the eight species shown to be infected. The second screen tested 1458 individuals of Hymenoptera associated with galls induced by oak gall wasps (Hymenoptera: Cynipidae, Cynipini), comprising nine species of oak gall wasp, 26 species of associated chalcid parasitoid, and ten species of oak gall wasp inquiline (Hymenoptera: Cynipidae, Synergini) (A Aebi and G Stone, unpublished data). We sequenced a Rickettsia from single individuals from three of the four species that were infected. The third study screened 847 individuals, each of which was a different species of arthropod from the classes Arachnida, Entognatha, Malacostraca and Insecta. The individuals from Arachnida comprised six of the order Araneae and one Holothyrida. The five Entognatha were all Collembola and the individual from Malacostraca was from the order Isopoda. The individuals from the Insecta comprised 240 of the order Hymenoptera, 218 Diptera, 206 Coleoptera, 86 Hemiptera, 28 Lepidoptera, nine Orthoptera, nine Thysanoptera, eight Odonata, eight Heteroptera, five Homoptera, five Blattodea, four Neuroptera, three Dermaptera, and one individual each of Mantodea, Psocoptera, Siphonaptera, Strepsiptera, and Trichoptera (L Weinert and J Werren, unpublished data). The insects were collected from worldwide locations. All nine Rickettsia isolates from this screen were sequenced. More detailed information on infected and uninfected species from unpublished data can be found in the supplementary information (Additional file 4, The distribution of Rickettsia among arthropods).
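As a quick consistency check on the third screen, the per-order counts given above can be tallied against the stated total of 847 individuals. The sketch below only transcribes the numbers from the text; the variable names are illustrative, and the infection fraction assumes that each of the nine isolates came from a different individual, which the text does not state explicitly.

```python
# Per-order counts transcribed from the description of the third screen.
non_insect_counts = {
    "Araneae": 6, "Holothyrida": 1,   # Arachnida
    "Collembola": 5,                  # Entognatha
    "Isopoda": 1,                     # Malacostraca
}
insect_counts = {
    "Hymenoptera": 240, "Diptera": 218, "Coleoptera": 206, "Hemiptera": 86,
    "Lepidoptera": 28, "Orthoptera": 9, "Thysanoptera": 9, "Odonata": 8,
    "Heteroptera": 8, "Homoptera": 5, "Blattodea": 5, "Neuroptera": 4,
    "Dermaptera": 3, "Mantodea": 1, "Psocoptera": 1, "Siphonaptera": 1,
    "Strepsiptera": 1, "Trichoptera": 1,
}

non_insect_total = sum(non_insect_counts.values())
insect_total = sum(insect_counts.values())
screened_total = non_insect_total + insect_total

# Assumption: one isolate per infected individual.
isolates = 9
infected_fraction = isolates / screened_total

print(f"non-insects: {non_insect_total}, insects: {insect_total}, "
      f"total: {screened_total}")
print(f"fraction of screened individuals yielding an isolate: "
      f"{infected_fraction:.3%}")
```

The order-level counts sum to 847, matching the total reported for this screen.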
We also included a Rickettsia from the pea aphid Acyrthosiphon pisum, a male-killing Rickettsia from the buprestid beetle Brachys tessellatus and three Rickettsia strains from the ladybird beetle A. bipunctata, each of which has been shown to be genetically distinct [73, 79].
Primers used for PCR amplification and sequencing (table body not recovered). Surviving column header: primer sequence (5'-3'). Surviving row entries: ATP synthase F1 alpha subunit; subunit I of cytochrome C oxidase.
The PCR products were incubated at 37°C for 40 minutes with shrimp alkaline phosphatase (Promega, Southampton, UK) to digest unincorporated dNTPs and exonuclease I (NEB, Hertfordshire, UK) to digest the PCR primers. They were then sequenced in both directions using BigDye technology (Applied Biosystems, CA) with the PCR primers, and run on a 3730 capillary sequencer (Applied Biosystems, CA).
Nucleotide sequences were edited and assembled using Sequencher 4.1 (GeneCodes, MI), and aligned using the ClustalW application within BioEdit v.7.0.1. All sequences within alignments were checked to ensure they encoded functional proteins (with the exception of the 16S gene). The model of sequence evolution used for each gene was selected by including only parameters that significantly improved the fit of the model to our data. These parameters were identified by comparing alternative models using hierarchical likelihood ratio tests in the program MODELTEST v.3.7. The evolutionary models used were as follows: 16S – HKY+G, gltA – K81uf+I+G, coxA – GTR+G and atpA – GTR+G.
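The logic of a single step in a hierarchical likelihood ratio test can be sketched as follows. This is a minimal illustration, not the MODELTEST implementation: the log-likelihood values are hypothetical, the helper names are our own, and the plain chi-square reference distribution is an assumption (parameters that sit on a boundary under the null model, such as the gamma shape or the proportion of invariant sites, are often tested against a mixed distribution instead).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Closed form for odd df, built on the df = 1 case erfc(sqrt(x/2)).
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p-value) for two nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods: HKY versus HKY+G (one extra parameter).
stat, p = likelihood_ratio_test(lnl_null=-5230.4, lnl_alt=-5212.9, extra_params=1)
keep_gamma = p < 0.05  # retain the extra parameter only if the fit improves significantly
print(f"2*delta(lnL) = {stat:.2f}, p = {p:.3g}, keep +G: {keep_gamma}")
```

Repeating this comparison along a fixed hierarchy of nested models, and keeping each parameter only when the test is significant, gives the kind of model selection described above.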
Phylogenetic hypotheses were inferred using maximum likelihood in PAUP* v.4.0b10 and using the Bayesian MC3 (Metropolis-coupled Markov chain Monte Carlo) approach implemented in MrBayes v3.1. We combined our data with published sequences from all the known non-vertebrate Rickettsia strains, and all the Rickettsia from the ancestral, typhus and transitional groups, as well as Rickettsia helvetica, Rickettsia montanensis, R. massiliae, Rickettsia japonica, Rickettsia conorii, Rickettsia peacockii and Rickettsia rickettsii from the spotted fever group (Figure 1a). We also included O. tsutsugamushi as an outgroup (we checked that this species is a genuine outgroup by reconstructing a 16S rDNA tree rooted with Wolbachia pipientis; data not shown). All accession numbers are given in Additional file 1, Accession numbers of genes used in the phylogenetic analysis. Maximum parsimony trees were created using the tree-bisection-reconnection branch swapping method, and these were then used both to estimate model parameters and as a starting tree for the maximum likelihood analysis. The maximum likelihood trees were then found using the nearest-neighbour-interchange branch swapping method. The robustness of the tree topologies was assessed by repeating the analysis using 1000 bootstrapped datasets. The GTR+I+G model of evolution was used for the concatenated dataset of the three genes.
The Bayesian analysis incorporated four Markov chains (three heated and one cold chain), consisting of 1,000,000 generations with sampling every 100 generations. Two simultaneous runs with different random start trees were performed, and the first 25% of samples were discarded as burn-in. For the Bayesian analysis including missing data, the data were partitioned for the four different genes and assigned the appropriate evolutionary model (given above), then unlinked so that the parameters were estimated separately and each partition was allowed to have a different evolutionary rate. The MCMC analysis was then run for 6,000,000 generations, after which the standard deviation of split frequencies (a measure of the similarity of the tree samples from the two independent runs) fell below a proposed threshold for model convergence of 0.01. For the phylogeny that contains missing data, we used only the Bayesian approach.
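The convergence diagnostic mentioned above can be illustrated with a small sketch. It assumes that each sampled tree is represented as a set of splits (bipartitions), that the first 25% of samples are discarded, and that only splits reaching a minimum frequency of 0.1 in at least one run are compared; the function names, the toy data and the exact normalisation of the standard deviation are our assumptions, not a description of the MrBayes source code.

```python
from collections import Counter
from statistics import stdev


def split_frequencies(samples, burnin_fraction=0.25):
    """Frequency of each split among the post-burn-in tree samples of one run."""
    kept = samples[int(len(samples) * burnin_fraction):]
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def average_sd_of_split_frequencies(run_a, run_b, burnin_fraction=0.25, min_freq=0.1):
    """Mean, over sufficiently frequent splits, of the SD of their frequencies across two runs."""
    freq_a = split_frequencies(run_a, burnin_fraction)
    freq_b = split_frequencies(run_b, burnin_fraction)
    splits = [s for s in set(freq_a) | set(freq_b)
              if max(freq_a.get(s, 0.0), freq_b.get(s, 0.0)) >= min_freq]
    if not splits:
        return 0.0
    deviations = [stdev([freq_a.get(s, 0.0), freq_b.get(s, 0.0)]) for s in splits]
    return sum(deviations) / len(deviations)


# Toy example: each tree is a frozenset of splits, each split a frozenset of taxa.
split_x = frozenset({"A", "B"})
split_y = frozenset({"A", "C"})
tree_x = frozenset({split_x})
tree_y = frozenset({split_y})

run_1 = [tree_x] * 4
run_2 = [tree_x, tree_x, tree_y, tree_y]

asdsf_same = average_sd_of_split_frequencies(run_1, run_1, burnin_fraction=0.0)
asdsf_diff = average_sd_of_split_frequencies(run_1, run_2, burnin_fraction=0.0)
print(asdsf_same, asdsf_diff)
```

Two runs sampling the same splits at the same frequencies give a value of zero, and the value grows as the runs disagree, which is why a small threshold such as 0.01 is used as evidence that the runs have converged on the same posterior distribution.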
Split networks for each of the four genes were constructed using the neighbour-net method in SplitsTree4 [84, 85]. Networks represent multiple trees simultaneously, and they can therefore identify conflicting signals in the data. These may arise from either genetic exchange between bacterial strains, or from systematic error in the underlying model of evolution. The neighbour-net method computes a matrix of distances (much like the neighbour joining method) and produces a network with weights assigned to each split that are proportional to the number of sites that support the split. We used non-parametric bootstrapping to identify splits supported with > 95% confidence, and only included these statistically significant splits in our network (otherwise representing the data as a bifurcating tree).
We tested whether there were significant topological differences between the maximum likelihood trees of the four genes and a tree produced from the concatenated sequences of all four genes using the Shimodaira-Hasegawa test. The test statistic for a given gene is generated by comparing the maximised likelihood score for that gene when the topology is unconstrained with the likelihood obtained when the topology is fixed at the maximum likelihood topology obtained from the concatenated dataset. The null distribution of the test statistic for a gene is generated from 1000 nonparametric bootstrapped datasets, although to reduce the computational burden, nuisance parameters were fixed at values estimated from the original dataset (RELL method). This test was applied to each of the genes with the Rickettsia strain from C. rufa removed because of evidence of recombination (see Results).
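The RELL resampling step can be sketched for the two-topology case as follows. The sketch assumes that per-site log-likelihoods have already been computed for the unconstrained and the constrained topology with all model parameters held fixed; the function name, its arguments and the toy values are hypothetical, and the centring step follows the usual description of the Shimodaira-Hasegawa procedure rather than any particular program.

```python
import random


def rell_sh_pvalue(site_lnl_best, site_lnl_constrained, replicates=1000, seed=1):
    """Approximate SH-test p-value for a constrained topology using RELL resampling."""
    n_sites = len(site_lnl_best)
    observed = sum(site_lnl_best) - sum(site_lnl_constrained)

    rng = random.Random(seed)
    boot_best, boot_constrained = [], []
    for _ in range(replicates):
        # Resample sites with replacement; likelihoods are reused, not re-optimised.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        boot_best.append(sum(site_lnl_best[i] for i in sites))
        boot_constrained.append(sum(site_lnl_constrained[i] for i in sites))

    # Centre each topology's bootstrap scores so that the null hypothesis
    # (both topologies equally good) holds in the resampled distribution.
    mean_best = sum(boot_best) / replicates
    mean_constrained = sum(boot_constrained) / replicates

    exceed = 0
    for best, constrained in zip(boot_best, boot_constrained):
        centred_best = best - mean_best
        centred_constrained = constrained - mean_constrained
        delta = max(centred_best, centred_constrained) - centred_constrained
        if delta >= observed:
            exceed += 1
    return exceed / replicates


# Hypothetical per-site log-likelihoods for a short alignment.
gene_tree_lnl = [-2.1, -3.4, -1.9, -2.8, -4.0, -2.2, -3.1, -2.5]
concat_tree_lnl = [-2.3, -3.3, -2.4, -2.8, -4.6, -2.1, -3.5, -2.5]
print(rell_sh_pvalue(gene_tree_lnl, concat_tree_lnl))
```

A small p-value indicates that the gene's own maximum likelihood topology fits significantly better than the topology from the concatenated dataset.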
We tested for recombination between Rickettsia strains in two ways. First, we used the maximum χ2 test implemented in RDP v3b22. This test takes all possible triplets of sequences, removes any gaps, and makes an alignment of just the polymorphic sites. A window is then slid along this alignment in single nucleotide steps. At each position a χ2 statistic is calculated as a measure of the likelihood that recombination has occurred between these sequences. The size of the window was set at approximately three quarters of the number of polymorphic sites present for each triplet. To correct for the large number of tests performed, we obtained an analysis-wide significance threshold of χ2 by repeating the analysis on 1000 datasets that were simulated without recombination (simulations performed using Seq-Gen). The maximum χ2 test is one of the most powerful tests of recombination, but it can falsely infer the presence of recombination under some conditions, such as in regions that contain mutational hot-spots. Therefore we also used the pairwise homoplasy index (PHI) test of recombination implemented in SplitsTree4. The test exploits the fact that when recombination has occurred, sites that are physically close in the sequence should yield compatible phylogenies more often than distant sites. The phi statistic (Φw) quantifies the degree of congruence between parsimonious trees at closely linked sites up to 100 bp apart (w = 100). A p-value can then be obtained by comparing this statistic with a distribution of values obtained when the position of sites along the sequence is determined at random. To speed computation, this null distribution can be approximated by a normal distribution, whose mean and variance are calculated analytically from the data.
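The sliding-window calculation behind the maximum χ2 test can be illustrated with a simplified sketch for one pair of sequences within a triplet. It assumes a window split into two equal halves around each candidate breakpoint and a standard 2x2 χ2 statistic on the numbers of differing and matching sites in each half; the function names and toy sequences are our own, the RDP implementation differs in detail, and the simulation-based significance threshold is not reproduced here.

```python
def polymorphic_columns(triplet):
    """Alignment columns of a sequence triplet that are gap-free and variable."""
    return [col for col in zip(*triplet)
            if "-" not in col and len(set(col)) > 1]


def chi2_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi2(seq_x, seq_y, seq_z, window_fraction=0.75):
    """Largest chi-square and its breakpoint for sequences x and y within a triplet."""
    columns = polymorphic_columns([seq_x, seq_y, seq_z])
    differs = [int(col[0] != col[1]) for col in columns]
    half = max(1, int(len(differs) * window_fraction) // 2)

    best_stat, best_breakpoint = 0.0, None
    # Slide the window one polymorphic site at a time.
    for bp in range(half, len(differs) - half + 1):
        left = differs[bp - half:bp]
        right = differs[bp:bp + half]
        stat = chi2_2x2(sum(left), half - sum(left),
                        sum(right), half - sum(right))
        if stat > best_stat:
            best_stat, best_breakpoint = stat, bp
    return best_stat, best_breakpoint


# Toy triplet: x and y are identical in the first half and differ in the second.
x = "A" * 16
y = "A" * 8 + "C" * 8
z = "G" * 16
peak_stat, peak_breakpoint = max_chi2(x, y, z)
print(peak_stat, peak_breakpoint)
```

In the toy data the statistic peaks at the site where the pattern of differences between the two sequences changes, which is the signature the test looks for as a candidate recombination breakpoint.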
To date key transitions in the order Rickettsiales, we calibrated a 16S rDNA phylogeny of the order using the substitution rate of this gene estimated for the endosymbiont Buchnera. This tree was reconstructed with a molecular clock enforced. We checked that enforcing a clock did not significantly reduce the likelihood of the tree by using a likelihood ratio test to compare the likelihoods of the tree with and without a clock enforced.
We would like to thank Remy Ware and Mike Majerus for providing samples, Crystal Allen for help with DNA screening, and John Welch for comments on the draft. LAW is supported by postgraduate NERC funding. FMJ is supported by a Royal Society University Research Fellowship. GNS, FMJ and AA were supported by NERC grants NE/D007178/1, NER/B/504406/1 and NE/E014453/1. The screening for Rickettsia in the arthropod collection at U. Rochester was supported by US NSF EF-0328363 to JHW.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.