Sequence similarity networks showing the giant connected component (GCC) of each dataset. GCCs were constructed at the most inclusive sequence similarity threshold (≥85%) for the DNA (above) and cDNA (below) networks. Each supernode of a GCC represents a Louvain community (LC); supernode size indicates the number of sequences in that LC, and supernode color indicates the proportion of targeted sequences in that LC. The figure should be read from left to right along the time-of-data-generation axis to follow the gradual discovery of ciliates, from Sanger sequencing (t-2), through previous environmental studies (t-1), to the current 454 pyrosequencing study (t). At t-2, node color indicates the proportion of sequences from cultured ciliates in each LC; at t-1, the proportion of sequences from either cultured ciliates or previous environmental ciliate studies; at t, the proportion of BioMarKs 454 sequences (a different color code was used for visualization purposes). GCC, giant connected component; LC, Louvain community.
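The supernode attributes described in this caption (size and color) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration and not the pipeline used in the study: the sequence identifiers, similarity values, source labels, and community assignments are invented toy data, the function names are hypothetical, and the Louvain communities are taken as given input instead of being computed.

```python
from collections import defaultdict, deque

def giant_connected_component(edges):
    """Return the node set of the largest connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def summarize_supernodes(gcc_nodes, community_of, source_of, targeted_sources):
    """Map each community to (sequence count, fraction from targeted sources)."""
    counts = defaultdict(lambda: [0, 0])
    for node in gcc_nodes:
        entry = counts[community_of[node]]
        entry[0] += 1  # supernode size
        if source_of[node] in targeted_sources:
            entry[1] += 1  # contributes to supernode color
    return {c: (n, hits / n) for c, (n, hits) in counts.items()}

# Toy pairwise similarities (hypothetical); edges are kept at >= 85%.
pairs = [("s1", "s2", 0.97), ("s2", "s3", 0.91), ("s3", "s4", 0.86),
         ("s4", "s5", 0.99), ("s5", "s6", 0.80), ("s6", "s7", 0.95)]
edges = [(u, v) for u, v, similarity in pairs if similarity >= 0.85]
gcc = giant_connected_component(edges)

# Assumed inputs: community labels (e.g. from a Louvain run) and sequence origin.
community_of = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1}
source_of = {"s1": "cultured", "s2": "environmental", "s3": "BioMarKs",
             "s4": "BioMarKs", "s5": "BioMarKs"}

# One summary per time point, each with its own set of targeted sources.
summary_t2 = summarize_supernodes(gcc, community_of, source_of, {"cultured"})
summary_t1 = summarize_supernodes(gcc, community_of, source_of,
                                  {"cultured", "environmental"})
summary_t = summarize_supernodes(gcc, community_of, source_of, {"BioMarKs"})
```

In this sketch the same GCC and the same communities are summarized three times, once per time point, so only the color value changes between panels while supernode size stays fixed.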