Gene-specific selective sweeps in bacteria and archaea caused by negative frequency-dependent selection

Background: Fixation of beneficial genes in bacteria and archaea (collectively, prokaryotes) is often believed to erase pre-existing genomic diversity through the hitchhiking effect, a phenomenon known as genome-wide selective sweep. Recent studies, however, indicate that beneficial genes spread through a prokaryotic population via recombination without causing genome-wide selective sweeps. These gene-specific selective sweeps seem to be at odds with the existing estimates of recombination rates in prokaryotes, which appear far too low to explain such phenomena.
Results: We use mathematical modeling to investigate potential solutions to this apparent paradox. Most microbes in nature evolve in heterogeneous, dynamic communities, in which ecological interactions can substantially impact evolution. Here, we focus on the effect of negative frequency-dependent selection (NFDS) such as caused by viral predation (kill-the-winner dynamics). The NFDS maintains multiple genotypes within a population, so that a gene beneficial to every individual would have to spread via recombination, hence a gene-specific selective sweep. However, gene loci affected by NFDS often are located in variable regions of microbial genomes that contain genes involved in the mobility of selfish genetic elements, such as integrases or transposases. Thus, the NFDS-affected loci are likely to experience elevated rates of recombination compared with the other loci. Consequently, these loci might be effectively unlinked from the rest of the genome, so that NFDS would be unable to prevent genome-wide selective sweeps. To address this problem, we analyzed population genetic models of selective sweeps in prokaryotes under NFDS. The results indicate that NFDS can cause gene-specific selective sweeps despite the effect of locally elevated recombination rates, provided NFDS affects more than one locus and the basal rate of recombination is sufficiently low. Although these conditions might seem to contradict the intuition that gene-specific selective sweeps require high recombination rates, they actually decrease the effective rate of recombination at loci affected by NFDS relative to the per-locus basal level, so that NFDS can cause gene-specific selective sweeps.
Conclusion: Because many free-living prokaryotes are likely to evolve under NFDS caused by ubiquitous viruses, gene-specific selective sweeps driven by NFDS are expected to be a major, general phenomenon in prokaryotic populations.
Electronic supplementary material: The online version of this article (doi:10.1186/s12915-015-0131-7) contains supplementary material, which is available to authorized users.


Equations defining the model
The population dynamics of prokaryotic hosts is defined by the following selection matrix $S$ and recombination matrix $R$:
$$[S]_{P,P} = \frac{f_P}{\langle f_P \rangle_t}, \qquad [R]_{P,\tilde{P}} = \rho_{P,\tilde{P}},$$
where $\langle f_P \rangle_t$ is the average fitness of prokaryotic hosts at time $t$, $p_P(t)$ is the frequency of prokaryote genotype $P$ at time $t$, and $\vec{p}(t)$ is the vector whose $P$th entry is $p_P(t)$. $S$ is a diagonal matrix (the entry at the $i$th row and the $j$th column of a matrix $X$ is denoted by $[X]_{i,j}$). $\rho_{P,\tilde{P}}$ is the rate of recombination that transforms prokaryote genotype $\tilde{P}$ into $P$, and it is defined as
$$\begin{aligned}
\rho_{P,\tilde{P}} = {} & (\alpha r/a)^{d_{S_P,S_{\tilde{P}}}} \, (1-\alpha r/a)^{\,n-d_{S_P,S_{\tilde{P}}}} \\
& \times \Big\{ \delta_{E_P,E_{\tilde{P}}} \big[ 1 - \delta_{E_P,0}\, r\, p_E(t) - \delta_{E_P,1}\, r\,(1-p_E(t)) \big] + (1-\delta_{E_P,E_{\tilde{P}}})\, r \big[ \delta_{E_P,1}\, p_E(t) + \delta_{E_P,0}\,(1-p_E(t)) \big] \Big\} \\
& \times \Big\{ \delta_{N_P,N_{\tilde{P}}} \big[ 1 - \delta_{N_P,0}\, r\, p_N(t) - \delta_{N_P,1}\, r\,(1-p_N(t)) \big] + (1-\delta_{N_P,N_{\tilde{P}}})\, r \big[ \delta_{N_P,1}\, p_N(t) + \delta_{N_P,0}\,(1-p_N(t)) \big] \Big\},
\end{aligned}$$
where $\delta_{i,j}$ is Kronecker's delta, $d_{S_P,S_{\tilde{P}}}$ is the number of S loci that differ between $P$ and $\tilde{P}$, and $p_E(t)$ and $p_N(t)$ are the frequencies of allele 1 at the E locus and at the N locus at time $t$, respectively. The population dynamics of viruses is similarly defined: $S$ is basically identical to the above with $P$ replaced by $V$, and the entries of $R$ are defined as
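The host recombination rate defined above can be evaluated term by term. The following is a minimal Python sketch, not the authors' code: the genotype encoding (a tuple of S alleles plus one E and one N allele), the function names `biallelic_factor` and `recombination_rate`, and all parameter values are assumptions made for illustration only.

```python
from typing import Tuple

# Hypothetical genotype encoding: (alleles at the n S loci, E allele, N allele).
Genotype = Tuple[Tuple[int, ...], int, int]


def biallelic_factor(target: int, source: int, r: float, p1: float) -> float:
    """Factor in curly braces for a two-allele locus (E or N).

    target: allele carried by P; source: allele carried by the genotype
    being transformed; p1: population frequency of allele 1 at this locus.
    """
    if target == source:
        # Allele unchanged: 1 - r * (frequency of the other allele).
        other_freq = p1 if target == 0 else 1.0 - p1
        return 1.0 - r * other_freq
    # Allele replaced: r * (frequency of the incoming target allele).
    target_freq = p1 if target == 1 else 1.0 - p1
    return r * target_freq


def recombination_rate(P: Genotype, P_src: Genotype, alpha: float, r: float,
                       a: float, p_E: float, p_N: float) -> float:
    """Rate at which genotype P_src is transformed into genotype P."""
    S, E, N = P
    S_src, E_src, N_src = P_src
    n = len(S)
    # d: number of S loci at which the two genotypes differ.
    d = sum(1 for s, t in zip(S, S_src) if s != t)
    q = alpha * r / a
    s_factor = q ** d * (1.0 - q) ** (n - d)
    return (s_factor
            * biallelic_factor(E, E_src, r, p_E)
            * biallelic_factor(N, N_src, r, p_N))


# Illustrative values only (not taken from the paper).
source = ((0, 1), 0, 0)
rate = recombination_rate(((1, 1), 1, 0), source,
                          alpha=10.0, r=0.01, a=4.0, p_E=0.3, p_N=0.5)
```

One property that follows directly from the formula is that, for a fixed S background, the E and N factors each sum to 1 over the two possible target alleles, so the sum over the four (E, N) combinations equals the S factor alone.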

Supplementary results
Maximum recombination rate below which NFDS can cause gene-specific selective sweeps
As described under Results, gene-specific selective sweeps require sufficiently low basal recombination rates (Figure 3c). This result was interpreted based on the inequality $(\alpha r)^2 s_e^{-1} \ll r$. Below, we derive a similar inequality based on simplified mathematical models.
As described in the Results section, the beneficial allele at the E locus can spread via two pathways: one involving recombination at the E locus and one involving recombination at the S loci. To evaluate which pathway allows faster spread, and is therefore dominant, we consider simplified mathematical models that describe the spread of the beneficial allele via one pathway or the other.
The spread of the allele via the pathway involving recombination at the E locus can be modeled by Eq. (S1), where $x_i$ is the frequency of individuals carrying the beneficial allele in one subpopulation, assuming that $x_i$ is small. The index $i$ denotes different subpopulations. $x_0$ is the frequency of the individuals that already have the beneficial allele ($x_0$ is assumed to be constant). $r x_0$ is the rate of production of $x_i$ via recombination at the E locus. The same term appears in the equation for every value of $i$ because production of $x_i$ via this pathway requires only one recombination event.
By contrast, the spread of the allele via the pathway involving recombination at the S loci can be modeled by Eq. (S2), where $y_i$ is the frequency of individuals carrying the beneficial allele in one subpopulation, assuming that $y_i$ is small. $y_0$ is assumed to be constant (as is the case for $x_0$). $\alpha r y_{i-1}$ is the rate of production of $y_i$ via the transformation of $y_{i-1}$ through recombination at the S loci. We assumed that $y_{i-h}$ can be transformed into $y_i$ by $h$ recombination events at the S loci. Moreover, we ignored the terms of the order $o(r)$ for $r \to 0$ as well as the decrease of $y_i$ due to transformation into $y_{i+1}$. Note the difference between Eqs. (S1) and (S2): $x_i$ can be reached from $x_0$ by one recombination step, whereas $y_i$ requires at least $i$ recombination steps from $y_0$.
Using the above equations, the speed at which the beneficial gene spreads through the $i$th subpopulation can be estimated as the times required for $x_i$ and $y_i$ to reach a high value starting at 0. These estimates need not be precise; the only information needed is how these times depend on the two parameters, $r$ and $n$. Eq. (S1) can be integrated: where we set This can be integrated: Considering only the highest order term, we can approximate the time at which $y_2$ reaches a value $y$ (denoted by $\tau_2$) by Taking the derivative with respect to $r$, we obtain where we used the fact that $\tau_2 \to \infty$ as $r \to 0$. The last equation indicates that $\tau_2$ depends logarithmically on $r$ as $r \to 0$. From Eqs. (S4) and (S5), we obtain (assuming where we used the fact that $\tau_2$ depends logarithmically on $r$ as $r \to 0$. According to Eq. (S6), if $r$ is so small that $(\alpha r)^2 \tau_2 \ll r$, the pathway involving recombination at the E locus becomes dominant over the pathway involving recombination at the S loci. Because $\tau_2$ depends only logarithmically on $r$, it does not much affect the order-of-magnitude comparison between $(\alpha r)^2 \tau_2$ and $r$. Numerical calculations show that $\tau_2 \gg s_e^{-1}$ (data not shown).
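The comparison between the two pathways can be illustrated numerically. The sketch below is a hedged illustration, not a reproduction of the original equations: it assumes that Eqs. (S1) and (S2) have the linear form dx/dt = s·x + r·x0 and dy_i/dt = s·y_i + αr·y_(i−1), with a constant growth rate `s` standing in for the selective advantage. That form is an assumption, chosen because it matches the production terms $r x_0$ and $\alpha r y_{i-1}$ described in the text. The function names and the hard threshold in `dominant_pathway` are likewise hypothetical.

```python
import math


def x_one_step(t: float, r: float, s: float, x0: float) -> float:
    """Closed-form solution of dx/dt = s*x + r*x0 with x(0) = 0 (assumed form)."""
    return r * x0 / s * math.expm1(s * t)


def y_two_step(t: float, r: float, s: float, alpha: float, y0: float) -> float:
    """Closed-form y2(t) for the assumed chain
    dy1/dt = s*y1 + alpha*r*y0,  dy2/dt = s*y2 + alpha*r*y1,
    with y1(0) = y2(0) = 0 and y0 held constant.
    """
    q = alpha * r
    return q * q * y0 / s * (t * math.exp(s * t) - math.expm1(s * t) / s)


def dominant_pathway(r: float, alpha: float, tau: float) -> str:
    """Crude classifier based on the inequality (alpha*r)^2 * tau << r.

    A plain '<' replaces '<<', so results near the boundary are not meaningful.
    """
    return "E locus" if (alpha * r) ** 2 * tau < r else "S loci"


# Illustrative parameter values only (not taken from the paper).
s, alpha, t = 0.1, 10.0, 200.0
for r in (1e-5, 1e-3):
    ratio = y_two_step(t, r, s, alpha, 1.0) / x_one_step(t, r, s, 1.0)
    print(r, ratio, dominant_pathway(r, alpha, t))
```

Under these assumptions, the ratio of the two-step frequency to the one-step frequency grows roughly as $(\alpha r)^2 t / r$ at large $t$, which is the quantity compared with 1 in the inequality $(\alpha r)^2 \tau_2 \ll r$.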

Effect of finite population sizes
In the above argument, the logarithmic dependency of $\tau_E$ and $\tau_2$ on $r$ is crucial. For example, if the dependency of $\tau_E$ and $\tau_2$ were inversely linear such that $\tau_E = r^{-1}$ and

Alternative model
The model described in the main text assumes the kill-the-winner dynamics caused by viruses to incorporate NFDS. We focused on this model because the kill-the-winner dynamics seems to be the predominant mechanism that causes NFDS and high diversity in natural prokaryotic populations. However, the kill-the-winner dynamics per se is not necessary for NFDS because NFDS can be caused by other types of ecological or social interactions (see Introduction). Moreover, multiple mechanisms of NFDS might operate concurrently within a natural prokaryotic population.
To examine whether NFDS can cause gene-specific selective sweeps regardless of the specific mechanisms that cause NFDS, we considered an alternative model that abstracts away from the kill-the-winner dynamics and also takes account of multiple, concurrently operating mechanisms of NFDS. This model assumes that prokaryote genomes have multiple S loci subject to NFDS that do not interact with each other (i.e., NFDS imposed at different loci is caused by independent mechanisms). Specifically, the fitness of prokaryotes was defined in a frequency-dependent manner as follows: where $p^{P}_{S_k}$ is the frequency of the allele at the $k$th S locus of the host genome $P$ in the prokaryote population. The population dynamics of other biological entities such as viruses is not explicitly incorporated into the model. This model is very similar to the model considered in Peck [1], except that the latter does not consider neutral loci.
The results obtained with the above alternative model showed that NFDS can cause gene-specific sweeps even if the recombination rate at the loci subject to NFDS is higher than at the other loci (E and N), provided the conditions described in the main text are satisfied (Figure S2). These results suggest that NFDS can cause gene-specific selective sweeps regardless of the specific mechanisms that lead to NFDS. There is also a difference between the results obtained with the alternative model and those obtained with the model described in the main text. As shown in Figure S2b, the value of $J_{\mathrm{rel}}$ for $n = 5$ does not become as small as that for $n = 2$ as $r \to 0$, in contrast to the original model (Figure 3c), where $J_{\mathrm{rel}}$ for $n = 5$ becomes as small as that for $n = 2$ as $r \to 0$. The reason for this discrepancy is as follows. The restriction of genome-wide selective sweeps (and, thereby, the promotion of gene-specific selective sweeps) depends on the maximum fraction a particular susceptibility type can attain within a prokaryote population, as described in the Results section. In the original model, this fraction is $l^{-n}$. In the alternative model, it is $l^{-1}$ because each S locus is subject to NFDS independently of the others. Because $l$ was set larger for $n = 2$ than for $n = 5$ in Figure S2b (so as to be consistent with Figure 3c), the minimum possible value of $J_{\mathrm{rel}}$ is smaller for $n = 2$ than for $n = 5$.
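The difference between the two bounds can be made concrete with a small calculation. The Python sketch below is illustrative only: it assumes $l^n = 1024$ for both values of $n$ (the value quoted in the Figure S2 caption for panel a; whether the same value applies to panel b is an assumption), which gives $l = 32$ for $n = 2$ and $l = 4$ for $n = 5$. The function name `max_type_fraction` is hypothetical.

```python
def max_type_fraction(l: int, n: int, model: str) -> float:
    """Maximum fraction a single susceptibility type can attain.

    'original'   : l**(-n), as stated for the kill-the-winner model.
    'alternative': l**(-1), as stated for the model with independent S loci.
    """
    if model == "original":
        return float(l) ** (-n)
    if model == "alternative":
        return 1.0 / l
    raise ValueError(f"unknown model: {model!r}")


# Assumed illustration: l**n held at 1024 for both settings.
for n, l in ((2, 32), (5, 4)):
    assert l ** n == 1024
    print(n, l,
          max_type_fraction(l, n, "original"),
          max_type_fraction(l, n, "alternative"))
```

With these values the original model gives the same bound for both settings, whereas the alternative model gives a bound eight times larger for $n = 5$ than for $n = 2$, in line with the explanation above.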

Figure S2
Relative clonality measured with the alternative model. (a) Relative clonality $J_{\mathrm{rel}}$ as a function of $\alpha$ (see Table 1 for notation). $l^n$ was fixed at 1024, while $n$ was