Table 2 CEGMA analysis of selected datasets

From: Transcriptome, proteome and draft genome of Euglena gracilis

| Assembly | Organism | Gene status | Prots | %Completeness | Total | Average | %Ortho |
|---|---|---|---|---|---|---|---|
| Genome | E. gracilis | Complete | 22 | 8.87 | 37 | 1.68 | 54.55 |
| | | Partial | 50 | 20.16 | 89 | 1.78 | 56 |
| | T. brucei | Complete | 196 | 79.03 | 259 | 1.32 | 24.49 |
| | | Partial | 205 | 82.66 | 282 | 1.38 | 28.29 |
| | L. major | Complete | 194 | 78.23 | 220 | 1.13 | 11.34 |
| | | Partial | 204 | 82.26 | 245 | 1.2 | 15.69 |
| Transcriptome | E. gracilis | Complete | 187 | 75.4 | 390 | 2.09 | 65.78 |
| | | Partial | 218 | 87.9 | 506 | 2.32 | 69.72 |
| | T. brucei | Complete | 190 | 76.61 | 393 | 2.07 | 60 |
| | | Partial | 205 | 82.66 | 448 | 2.19 | 63.41 |
| | L. major | Complete | 133 | 53.63 | 275 | 2.07 | 64.66 |
| | | Partial | 194 | 78.23 | 405 | 2.1 | 64.43 |

Comparisons of CEGMA scores between E. gracilis, T. brucei and L. major as an estimate of ‘completeness’, based on 248 CEGs.

Prots: number of the 248 ultra-conserved CEGs present in genome.

%Completeness: percentage of the 248 ultra-conserved CEGs present.

Total: total number of CEGs present, including putative orthologs.

Average: average number of orthologs per CEG.

%Ortho: percentage of detected CEGs that have more than one ortholog.

Complete: those predicted proteins in the set of 248 CEGs that, when aligned to the HMM for the KOG for that protein family, give an alignment length that is at least 70% of the protein length. For example, if CEGMA produces a 100 amino acid protein, and the alignment length to the HMM to which that protein should belong is 110, then the protein is considered “complete” (91% aligned).

Partial: those predicted proteins in the set of 248 CEGs that are incomplete but still exceed a pre-computed minimum alignment score.

Keys are as described [58].
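The relationships between the columns can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of CEGMA: the function names are hypothetical, and it assumes that %Completeness is Prots divided by 248, that Average is Total divided by Prots, and that the "percent aligned" figure is the protein length divided by the HMM alignment length, as in the 100/110 example of the table note. Under those assumptions it reproduces the E. gracilis genome "Complete" row.

```python
N_CEGS = 248  # size of the core gene set, as stated in the table note


def percent_completeness(prots, n_cegs=N_CEGS):
    """Percentage of the core set detected (assumed: Prots / 248 * 100)."""
    return round(100 * prots / n_cegs, 2)


def average_orthologs(total, prots):
    """Average orthologs per detected CEG (assumed: Total / Prots)."""
    return round(total / prots, 2)


def aligned_percent(protein_len, alignment_len):
    """Percent aligned, following the note's example: 100 aa vs 110 -> 91%."""
    return round(100 * protein_len / alignment_len)


def is_complete(protein_len, alignment_len, threshold=70):
    """True if the aligned percentage reaches the assumed 70% threshold."""
    return 100 * protein_len / alignment_len >= threshold


if __name__ == "__main__":
    # E. gracilis genome, "Complete" row: Prots = 22, Total = 37
    print(percent_completeness(22))   # 8.87
    print(average_orthologs(37, 22))  # 1.68
    # Worked example from the table note
    print(aligned_percent(100, 110))  # 91
    print(is_complete(100, 110))      # True
```

Published values in other rows may differ in the last digit, since the rounding applied in the original report is not stated.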