Transcriptome, proteome and draft genome of Euglena gracilis

Table 2 CEGMA analysis of selected datasets

Assembly	Organism	Gene status	Prots	%Completeness	Total	Average	%Ortho
Genome	E. gracilis	Complete	22	8.87	37	1.68	54.55
		Partial	50	20.16	89	1.78	56
	T. brucei	Complete	196	79.03	259	1.32	24.49
		Partial	205	82.66	282	1.38	28.29
	L. major	Complete	194	78.23	220	1.13	11.34
		Partial	204	82.26	245	1.2	15.69
Transcriptome	E. gracilis	Complete	187	75.4	390	2.09	65.78
		Partial	218	87.9	506	2.32	69.72
	T. brucei	Complete	190	76.61	393	2.07	60
		Partial	205	82.66	448	2.19	63.41
	L. major	Complete	133	53.63	275	2.07	64.66
		Partial	194	78.23	405	2.1	64.43

Comparison of CEGMA scores for E. gracilis, T. brucei and L. major as an estimate of ‘completeness’, based on 248 core eukaryotic genes (CEGs). Prots: number of the 248 ultra-conserved CEGs present in the assembly; %Completeness: percentage of the 248 ultra-conserved CEGs present; Total: total number of CEGs present, including putative orthologs; Average: average number of orthologs per CEG; %Ortho: percentage of detected CEGs that have more than one ortholog. Complete: predicted proteins in the set of 248 CEGs that, when aligned to the HMM for the KOG of that protein family, give an alignment length of at least 70% of the protein length. For example, if CEGMA produces a 100 amino acid protein and the alignment length to the HMM to which that protein should belong is 110, the protein is counted as “complete” (91% aligned). Partial: predicted proteins in the set of 248 CEGs that are incomplete but still exceed a pre-computed minimum alignment score. Keys are as described in [58]
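
The derived columns of Table 2 can be checked against the raw counts. The short Python sketch below recomputes %Completeness as Prots divided by the 248 CEGs, as the caption defines it. It also recomputes Average as Total divided by Prots, which is an assumption inferred from the tabulated values and not a formula stated in the caption. The function name `cegma_summary` and the selection of rows are illustrative only.

```python
N_CEGS = 248  # size of the CEG set stated in the caption


def cegma_summary(prots: int, total: int, n_cegs: int = N_CEGS) -> tuple[float, float]:
    """Return (%Completeness, Average), each rounded to two decimals.

    %Completeness follows the caption: share of the 248 CEGs detected.
    Average = total / prots is an assumed relationship, not a stated one.
    """
    completeness = round(100 * prots / n_cegs, 2)
    average = round(total / prots, 2)
    return completeness, average


# (Prots, Total) copied from a few rows of Table 2
rows = {
    ("Genome", "E. gracilis", "Complete"): (22, 37),
    ("Genome", "T. brucei", "Complete"): (196, 259),
    ("Transcriptome", "E. gracilis", "Partial"): (218, 506),
}

for key, (prots, total) in rows.items():
    print(key, cegma_summary(prots, total))
```

For the three rows shown, the recomputed values agree with the tabulated %Completeness and Average to two decimal places.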

ISSN: 1741-7007