Fig. 1 | BMC Biology

Fig. 1

From: To kill or to be killed: pangenome analysis of Escherichia coli strains reveals a tailocin specific for pandemic ST131


Steps in pangenome development. The complete genomes of E. coli (1,324 genomes), comprising a total of 6,201,720 protein sequences, are processed in six steps:
(1) within-genome clustering at a minimum of 98% identity, keeping the longest protein sequence of each cluster as its representative;
(2) across-genome clustering at a minimum of 98% identity to extract core-genome sequences;
(3) across-genome clustering of homologous sequences at SeqID = 90% and SeqLC = 90%, extracting a representative sequence from each cluster;
(4) clustering of the representative sequences from step (3) under a range of SeqID (40 to 80%) and SeqLC (50 to 90%) settings to obtain clusters of homologous genes/proteins;
(5) pooling of all clusters from steps (1) to (4); and
(6) merging of clusters with identical PFAM domains by re-clustering the representative sequence of each step (5) cluster at SeqID = 40% and SeqLC = 50%.
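The threshold-driven clustering in steps (1), (3) and (4) can be illustrated with a minimal Python sketch. It is not the study's pipeline: the caption does not name a clustering tool, so the greedy longest-first clustering, the crude `difflib`-based identity and coverage measure, and the helper names `identity_and_coverage` and `greedy_cluster` are all hypothetical stand-ins, and the protein sequences are made up. Steps (2), (5) and (6) are not modelled.

```python
from difflib import SequenceMatcher


def identity_and_coverage(a: str, b: str) -> tuple[float, float]:
    """Crude stand-in for an alignment (assumption, not the study's method):
    identity = matched residues / shorter length, coverage = shorter / longer length."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    blocks = SequenceMatcher(None, short, long_, autojunk=False).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return matched / len(short), len(short) / len(long_)


def greedy_cluster(seqs: dict[str, str], min_id: float, min_cov: float) -> dict[str, list[str]]:
    """Greedy clustering, longest sequence first, so each representative is the
    longest member of its cluster. A sequence joins the first representative it
    matches at >= min_id identity and >= min_cov coverage, else founds a new cluster."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            ident, cov = identity_and_coverage(seqs[name], seqs[rep])
            if ident >= min_id and cov >= min_cov:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical toy proteins (made up for illustration, not data from the study).
genomes = {
    "g1": {"g1_p1": "MKTAYIAKQRQISFVKSHFSRQ",
           "g1_p2": "MKTAYIAKQRQISFVKSHFSR",    # fragment of g1_p1
           "g1_p3": "GSHMLEDPVDAFQLATQ"},
    "g2": {"g2_p1": "MKTAYIAKQRQISFVKSHFSRQ",   # identical to g1_p1
           "g2_p2": "GSHMLEDPVNAFQLATQ",        # one substitution vs g1_p3
           "g2_p3": "MKTAWIAKNRQLSFVKAHFTRQ"},  # divergent homologue of g1_p1
}

# Step (1): within-genome clustering at >= 98% identity; the caption gives no
# coverage cut-off for this step, so min_cov=0.0 is an assumption.
step1_reps: dict[str, str] = {}
for proteins in genomes.values():
    for rep in greedy_cluster(proteins, min_id=0.98, min_cov=0.0):
        step1_reps[rep] = proteins[rep]

# Step (3): across-genome clustering of the step (1) representatives at SeqID = SeqLC = 90%.
step3 = greedy_cluster(step1_reps, min_id=0.90, min_cov=0.90)
step3_reps = {rep: step1_reps[rep] for rep in step3}

# Step (4): sweep SeqID and SeqLC over the caption's ranges and record cluster counts.
sweep = {
    (sid, slc): len(greedy_cluster(step3_reps, min_id=sid, min_cov=slc))
    for sid in (0.4, 0.6, 0.8)
    for slc in (0.5, 0.7, 0.9)
}

print(step3)
print(sweep)
```

With these toy inputs, `step3` holds three clusters, and `sweep` reports two clusters at SeqID 40% or 60% but three at 80%, because the divergent homologue (about 77% identity by this crude measure) only merges at the looser thresholds. Sorting longest-first is what makes each representative the longest member of its cluster, mirroring the "representative (longest)" wording of step (1).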
