Current molecular definitions of species use rules or cut-off values (e.g. ≥ 70% DNA-DNA hybridization) and rarely take account of the genotypic diversity within and between populations [3]. A more natural and pragmatic approach is to analyse large populations of related isolates that are believed to span multiple species, and to observe whether suitable molecular methods can resolve distinct clusters in sequence space that can be given appropriate names [11]. This approach has not yet been rigorously applied to bacteria. Consequently, we do not know whether large populations of related bacteria can invariably be divided into discrete clusters using suitable molecular methods or, alternatively, whether many groups of related bacteria fall into a genetic continuum where clear divisions do not exist.
Sequence-based approaches should help us answer this question. However, most studies have focused on single loci and small numbers of isolates. Multilocus approaches with large populations are essential for two reasons: the history of individual genes (including rRNA operons [20]) may be obscured by interspecies recombination, and clusters observed using a small number of isolates may merge when larger numbers of isolates are considered. Comparison of the tree based on the concatenated sequences with the individual gene trees clearly illustrates the inadequacy of single loci for resolving N. meningitidis and N. lactamica (Figure 2). The concatenation of the seven housekeeping loci shows that multiple loci can buffer against the distorting effects of interspecies recombination and that the boundaries between the three dominant species in the Neisseria MLST database can be resolved.
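The buffering effect of concatenation can be illustrated with a minimal sketch. The locus names, the 8-bp toy alleles and the helper functions below are hypothetical and are not taken from the Neisseria MLST database; the sketch uses a simple proportion of differing sites (p-distance), not the tree-building method used for the figures.

```python
# Toy illustration: one imported allele misleads a single-locus comparison,
# whereas the seven-locus concatenate still places the isolate correctly.
# All names and sequences here are hypothetical.

LOCI = [f"locus{i}" for i in range(1, 8)]  # seven placeholder loci

A_ALLELE = "AAAAAAAA"  # toy allele typical of species A
B_ALLELE = "AAAACCCC"  # toy allele typical of species B

ref_a = {locus: A_ALLELE for locus in LOCI}
ref_b = {locus: B_ALLELE for locus in LOCI}

# A species A isolate that has acquired the species B allele at one locus.
mosaic = dict(ref_a)
mosaic["locus1"] = B_ALLELE


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)


def concatenate(profile, loci):
    """Join the sequences of the chosen loci in a fixed order."""
    return "".join(profile[locus] for locus in loci)


def nearest(query, refs, loci):
    """Name of the reference with the smallest p-distance over `loci`."""
    q = concatenate(query, loci)
    return min(refs, key=lambda name: p_distance(q, concatenate(refs[name], loci)))


refs = {"A": ref_a, "B": ref_b}
single_locus_call = nearest(mosaic, refs, ["locus1"])
concatenated_call = nearest(mosaic, refs, LOCI)
print(single_locus_call, concatenated_call)
```

In this toy case the imported locus on its own assigns the isolate to species B, while the concatenate assigns it to species A, because six of the seven loci still carry the species A signal.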
Network-based methods (e.g. Neighbor-Net [21], SplitsTree [22]) applied to both the concatenates and the individual loci produce output with numerous reticulations, which reflect the conflicting signals in the data, so the implied relationships between STs within clusters have no phylogenetic meaning. Nevertheless, the use of multiple loci enables us to observe the species clusters even in the presence of conflicting signals. The three main clusters coincide well with the species names derived by standard microbiological procedures, and the present definitions of N. meningitidis, N. lactamica and N. gonorrhoeae are reasonably secure; the two N. lactamica isolates that clustered highly anomalously probably represent species misidentification. The most critical test of the multilocus approach is the ability to resolve N. lactamica from N. meningitidis, since both colonise the same body site, the nasopharynx. Resolution of these named species was remarkably good, although the boundaries between N. lactamica and N. meningitidis are somewhat fuzzy owing to the existence of intermediate forms. This is to be expected, as recombinogenic bacteria have mosaic genomes resulting from the occasional replacement of chromosomal segments with those from related populations. Thus, in any large dataset, there may be isolates in which one or more of the loci used in a multilocus approach to species definition have been recently introduced from a related population. Single unusually divergent replacements, or replacements at more than one of the multiple loci, may place isolates away from the majority of isolates of the species. However, only seven STs in Figure 1 fell into this category (of 667 STs from isolates identified as either N. meningitidis or N. lactamica), and there was no overlap between these two named species (i.e. no region containing isolates identified as both species interspersed with one another).
Sorting the human commensal Neisseria into species has been difficult, with frequent revisions of species names [23]. Figure 1 gives some insight into the extent and source of this difficulty: isolates assigned as N. mucosa, N. sicca and N. subflava each fall in very different parts of the tree, and the subtree shown in Figure 1A contains several closely related isolates that have been assigned to these three different named species. Additional studies of the human commensal Neisseria (and of other groups plagued with similar problems, such as viridans streptococci) using the multilocus approach with large datasets should clarify whether they fall into distinct clusters, or whether the difficulties in defining species by phenotypic methods reflect an underlying genetic reality in which resolved clusters are not evident.
If necessary, further resolution between apparent clusters may be attempted by increasing the number of loci sequenced. Provided that the alleles at these loci show a degree of specificity to a given species cluster, the resolution of that cluster will be enhanced. If this cannot be demonstrated, then it is likely that the isolates under test do not genuinely form separate populations and should not be considered distinct species. This approach lends itself to "electronic taxonomy", in which systematic classification may be ever more finely elucidated through the accumulation of online sequence databases.
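One simple way to gauge whether a candidate locus shows "a degree of specificity" is to ask what fraction of its alleles are confined to a single cluster. The sketch below is only an illustration under that assumption: the metric, the function name `allele_specificity` and the toy isolate records are hypothetical and do not describe a procedure used in this study.

```python
from collections import defaultdict


def allele_specificity(isolates, locus):
    """Fraction of distinct alleles at `locus` that occur in exactly one cluster.

    `isolates` is a list of dicts with a "cluster" label and an "alleles"
    mapping of locus name to allele number (a hypothetical record layout).
    """
    clusters_by_allele = defaultdict(set)
    for isolate in isolates:
        clusters_by_allele[isolate["alleles"][locus]].add(isolate["cluster"])
    if not clusters_by_allele:
        raise ValueError("no isolates supplied")
    specific = sum(1 for clusters in clusters_by_allele.values() if len(clusters) == 1)
    return specific / len(clusters_by_allele)


# Toy data: locusA alleles are private to each cluster, locusB alleles are shared.
toy_isolates = [
    {"cluster": "X", "alleles": {"locusA": 1, "locusB": 1}},
    {"cluster": "X", "alleles": {"locusA": 2, "locusB": 2}},
    {"cluster": "Y", "alleles": {"locusA": 3, "locusB": 1}},
    {"cluster": "Y", "alleles": {"locusA": 4, "locusB": 2}},
]

spec_a = allele_specificity(toy_isolates, "locusA")
spec_b = allele_specificity(toy_isolates, "locusB")
print(spec_a, spec_b)
```

Under this toy measure, a locus like locusA would be expected to sharpen the separation between clusters X and Y, whereas a locus like locusB would add no resolving power.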
The work described here raises the question of what forces or mechanisms could generate such separation among recombining bacteria. We offer a simple model for recombining organisms as follows: consider two populations freely recombining within themselves and with each other. New mutations arising in one population will readily spread to the other, and to an observer the two populations appear to form one cluster of related strains. If a barrier to recombination is erected between them, such that isolates are much more likely to undergo recombination within their own population, then the rate of generation of new genotypes within each population may exceed the rate at which such genetic innovation is shared, and the two populations begin to diverge. As the populations diverge, decreasing sequence identity will further impede recombination, thus reinforcing the effect of the original genetic barrier and creating a permanent separation [24, 25].
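The verbal model above can be caricatured numerically. The sketch below is a deterministic toy, not a fitted or published model: the update rule, the exponential decline of recombination with divergence, and all parameter values (`mutation_rate`, `recomb_rate`, `decay`) are assumptions chosen only to show the two qualitative outcomes.

```python
import math


def simulate_divergence(generations, mutation_rate, recomb_rate, decay, d0=0.0):
    """Iterate a toy recurrence for the divergence d between two populations.

    Each generation, new mutations in both populations add 2 * mutation_rate
    to d, while recombination between the populations removes
    recomb_rate * exp(-decay * d) * d. The exponential term is an assumed
    form for "decreasing sequence identity impedes recombination".
    """
    d = d0
    for _ in range(generations):
        homogenisation = recomb_rate * math.exp(-decay * d) * d
        d = max(0.0, d + 2.0 * mutation_rate - homogenisation)
    return d


# Hypothetical parameters: identical mutation rate and decay constant,
# differing only in the rate of recombination between the two populations.
free_exchange = simulate_divergence(1000, 1e-4, recomb_rate=0.1, decay=20.0)
with_barrier = simulate_divergence(1000, 1e-4, recomb_rate=0.005, decay=20.0)
print(free_exchange, with_barrier)
```

With free exchange the divergence settles at a small equilibrium value, so the two populations would look like one cluster. With the barrier, the input of new mutations outpaces homogenisation at every level of divergence, the recombination term shrinks further as d grows, and the populations keep separating.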
It is not difficult to suggest candidate mechanisms. Niche separation is one example, and almost certainly underlies the tight, well-defined cluster of N. gonorrhoeae. Unlike the other human Neisseria, which colonise the nasopharynx, the primary niche of the gonococcus is the genital tract, and it has been proposed that gonococci arose relatively recently through the successful invasion of the genital tract by a nasopharyngeal Neisseria lineage [26]. Similarly, what appears to be a single body site (e.g. the human nasopharynx) may contain multiple niches that can be exploited, leading to opportunities for speciation. Restriction-modification systems [27], limitation of transformability by differences in pheromone type [28] and similar processes are feasible alternatives.
The point at which such a group is described as a species is a matter more of human interest and attention than of any intrinsic evolutionary process. The properties of the species clusters we observe will be determined by the diversification of those strains sharing the speciation loci (i.e. those that determine gene flow). Because speciation is gradual, we should be able, using estimates of recombination within and between groups derived from multilocus data, to define nascent species which, if they continue to diversify in isolation, are expected to form distinct sequence clusters (i.e. species) in the future.