Table 2 CEGMA analysis of selected datasets

From: Transcriptome, proteome and draft genome of Euglena gracilis

Assembly       Organism     Gene status  Prots  %Completeness  Total  Average  %Ortho
Genome         E. gracilis  Complete        22           8.87     37     1.68   54.55
                            Partial         50          20.16     89     1.78   56
               T. brucei    Complete       196          79.03    259     1.32   24.49
                            Partial        205          82.66    282     1.38   28.29
               L. major     Complete       194          78.23    220     1.13   11.34
                            Partial        204          82.26    245     1.2    15.69
Transcriptome  E. gracilis  Complete       187          75.4     390     2.09   65.78
                            Partial        218          87.9     506     2.32   69.72
               T. brucei    Complete       190          76.61    393     2.07   60
                            Partial        205          82.66    448     2.19   63.41
               L. major     Complete       133          53.63    275     2.07   64.66
                            Partial        194          78.23    405     2.1    64.43
Comparison of CEGMA scores for E. gracilis, T. brucei and L. major as an estimate of ‘completeness’, based on 248 CEGs. Keys are as described [58].

Prots: number of the 248 ultra-conserved CEGs present in the genome.
%Completeness: percentage of the 248 ultra-conserved CEGs present.
Total: total number of CEGs present, including putative orthologs.
Average: average number of orthologs per CEG.
%Ortho: percentage of detected CEGs that have more than one ortholog.
Complete: those predicted proteins in the set of 248 CEGs that, when aligned to the HMM for the KOG for that protein family, give an alignment length that is at least 70% of the protein length. For example, if CEGMA produces a 100 amino acid protein and the alignment length to the HMM to which that protein should belong is 110, then the protein is called “complete” (91% aligned).
Partial: those predicted proteins in the set of 248 CEGs that are incomplete but still exceed a pre-computed minimum alignment score.
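
The derived columns follow from the definitions in the key, and the arithmetic can be checked with a short sketch. This is only an illustration, not part of CEGMA: the function names `percent_completeness` and `average_orthologs` are hypothetical, and it assumes that %Completeness is Prots divided by 248 and that Average is Total divided by Prots, each rounded to two decimals. The input values are copied from three rows of the table.

```python
TOTAL_CEGS = 248  # size of the CEG set stated in the table key


def percent_completeness(prots: int, total_cegs: int = TOTAL_CEGS) -> float:
    """Percentage of the CEG set that was detected (assumed: Prots / 248)."""
    return round(100.0 * prots / total_cegs, 2)


def average_orthologs(total: int, prots: int) -> float:
    """Mean number of orthologs per detected CEG (assumed: Total / Prots)."""
    return round(total / prots, 2)


# (assembly, organism, gene status, Prots, Total), copied from the table
rows = [
    ("Genome", "E. gracilis", "Complete", 22, 37),
    ("Genome", "T. brucei", "Complete", 196, 259),
    ("Transcriptome", "E. gracilis", "Partial", 218, 506),
]

for assembly, organism, status, prots, total in rows:
    print(
        f"{assembly:<14}{organism:<13}{status:<10}"
        f"{percent_completeness(prots):>7}{average_orthologs(total, prots):>7}"
    )
```

Under these assumptions the recomputed values match the %Completeness and Average columns for the rows shown, for instance 22 of 248 CEGs giving 8.87% for the E. gracilis genome.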