Table 1 Descriptions of codon usage databases for either the generic class or B strain of E. coli. Each annotation describes the source of the genetic data, the total number of coding DNA sequences (CDS) extracted from the gene source(s), and the number of codons extracted from genes used to construct each database

From: Assessing optimal: inequalities in codon optimization algorithms

| Author or database | E. coli strain | Gene source | # CDS | # codons |
|---|---|---|---|---|
| Sharp and Li [8] a | Generic | GenBank | 27 | 6240 |
| | | | 15 | 9223 |
| | | | 57 | 25,010 |
| | | | 58 | 22,612 |
| Kazusa database [9] | Generic | GenBank | 8087 | 2,330,943 |
| | B | GenBank | 11 | 3771 |
| HIVE-CUT database [10] | Generic | GenBank and RefSeq | 68,262,063 | 20,219,118,236 |
| | B | GenBank and RefSeq | 13,042 | 3,953,593 |
| GtRNAdb [11, 12] | Generic | GenBank and RefSeq | 5011 | 1,538,003 |
| GenScript b | Proprietary | Undefined | Undefined | Undefined |
| Dong et al. [13] | W1485 (K12) | N/A | Total RNA | Undefined |
  1. a The authors divided their dataset into four groups of genes exhibiting “very high expression,” “high levels of expression,” “moderate codon bias,” or “low codon bias”; the four Sharp and Li entries list these groups from the top down, respectively
  2. b www.genscript.com