Although it has been clear for quite some time that no universal molecular clock exists, a new question is emerging about what is the phylogenetic footprint of local molecular clocks. With increasing densely sampled phylogenetic trees, we should start to be able to get estimates of the extent of local clocks.

A major limitation of local clock models has been a dearth of methods to appraise all the possible rate assignments for various lineages [42]. BSSVS permits the efficient exploration of all 2^{2n-2} possible local clock models and automatically returns the most parsimonious descriptions of the data.

The RLC description finds notable similarity to a compound Poisson process for rate variation [4]. Under this process, a Poisson number of change-points fall independently onto the branches of a phylogenetic tree. At each change-point, a Gamma-distributed random variable punctuates the current substitution rate. Without additional external information, the number of change-points (if greater than 1) and their specific locations along the branch are not identifiable by the likelihood, though this can be resolved by the prior. However this lack of identifiability places into question the benefit of allowing the large (in fact infinite) augmented state space of change points in the compound Poisson process that our BSSVS approach avoids. Under BSSVS, either there exists no change along a branch or there exists more than one and the new branch-specific rate represents an average over all events and their locations. BSSVS can also generalize to model heterogeneity in aspects of the CTMC process beyond rate variation. Examples we are considering include random local changes in nucleotide composition; a natural extension of previous work on modeling compositional heterogeneity [43]. It is also possible to use this approach to model random local changes in parameters of the tree prior [44].

Compared to the auto-correlated rate models [3], the RLC approach imparts some different prior assumptions on rate variance among branches. For example, the prior variance on a lineage-specific rate depends on the number of internal nodes traversed between the root and branch, not the time-duration. Obviously, this feature vanishes as the marginal prior on rates integrates over all possible trees. In the RLC model the number of traversed nodes reflects the number of sampled speciation events a lineage has encountered. The evolutionary and sampling scenarios for which this serves as a better proxy for rate change than does time-duration is outside the scope of this work. Formal model testing can help settle this debate on a dataset-by-dataset basis. We have not attempted model comparison between the RLC and other relaxed clock models as part of this work, as it is a very challenging task. New methods for computing Bayes factors between non-nested phylogenetic models, such as path sampling [45, 46] and stepping-stone sampling [47] may improve this situation in the future.

Further, hybrid models remain within reach in which rate multipliers ϕ draw *a priori* from a multivariate distribution. The multivariate generalization of the Gamma is a Wishart, characterized by a scale matrix. This scale matrix could be a function of the time-tree.

While the transition kernels we employ in this paper successfully explore the posterior distribution for the three examples, we can envision datasets for which our algorithm would have difficulties producing accurate estimates of the posterior distribution. High correlation most likely exists between the evolutionary tree τ and location indicators δ along τ at which local clocks start. Some datasets may possess posterior support for alternative trees whose clock structures vary considerably. This situation poses a significant difficulty for our current transition kernels. These kernels alternate between updating τ with only small changes δ and updating δ conditional on τ. In this construction, very rarely is it possible to make large moves in both tree-space and clock structures simultaneously, leading to potentially long mixing times. To remedy this, kernels that propose larger simultaneous jumps are warranted. While we are currently exploring different choices, finding kernels whose Hastings ratio remains convenient to calculate and function well across a range of datasets is proving challenging. We do, however, remain optimistic.

Alternatively, [48] encourages a collapsed Gibbs sampler via parameter marginalization when encountering high correlation. While it is computationally intractable to analytically integrate the model sampling density over all possible τ or all possible δ, a "local" collapse suggests a viable option. [49] exploit such an approach when sampling over the joint space of trees and sequence alignments; when proposing an update to τ , these authors integrate over the smaller portion of alignment-space affected by jumping from the current to proposed tree; then, given the new tree, re-sample a consistent and probable alignment. For the RLC model, a "local" collapse equates to integrating out the location indicators *δ*
_{
k
} on branches near the affected portions of τ and reduces to a discrete summation over a modest number of combinations. There still exists correlation between indicators δ and rate multiplier ϕ; however, we believe this correlation strength is much smaller than between that above, as the multipliers only enter into the likelihood when *δ*
_{
k
} = 1 and, hence, have considerably more freedom in their realized values. In any case, researchers should not blindly apply Bayesian samplers to new datasets; samplers require care and thought to ensure adequate exploration of the posterior parameter space.