The TyrA family of aromatic-pathway dehydrogenases in phylogenetic context
BMC Biology volume 3, Article number: 13 (2005)
The TyrA protein family includes members that catalyze two dehydrogenase reactions in distinct pathways leading to L-tyrosine and a third reaction that is not part of tyrosine biosynthesis. Family members share a catalytic core region of about 30 kDa, where inhibitors operate competitively by acting as substrate mimics. This protein family typifies many that are challenging for bioinformatic analysis because of relatively modest sequence conservation and small size.
Phylogenetic relationships of TyrA domains were evaluated in the context of combinatorial patterns of specificity for the two substrates, as well as the presence or absence of a variety of fusions. An interactive tool is provided for prediction of substrate specificity. Interactive alignments for a suite of catalytic-core TyrA domains of differing specificity are also provided to facilitate phylogenetic analysis. tyrA membership in apparent operons (or supraoperons) was examined, and patterns of conserved synteny in relationship to organismal positions on the 16S rRNA tree were ascertained for members of the domain Bacteria. A number of aromatic-pathway genes (hisHb, aroF, aroQ) have fused with tyrA, and it must be more than coincidental that the free-standing counterparts of all of the latter fused genes exhibit a distinct trace of syntenic association.
We propose that the ancestral TyrA dehydrogenase had broad specificity for both the cyclohexadienyl and pyridine nucleotide substrates. Indeed, TyrA proteins of this type persist today, but it is also common to find instances of narrowed substrate specificities, as well as of acquisition via gene fusion of additional catalytic domains or regulatory domains. In some clades a qualitative change associated with either narrowed substrate specificity or gene fusion has produced an evolutionary "jump" in the vertical genealogy of TyrA homologs. The evolutionary history of gene organizations that include tyrA can be deduced in genome assemblages of sufficiently close relatives, the most fruitful opportunities currently being in the Proteobacteria. The evolution of TyrA proteins within the broader context of how their regulation evolved and to what extent TyrA co-evolved with other genes as common members of aromatic-pathway regulons is now feasible as an emerging topic of ongoing inquiry.
Dehydrogenases dedicated to L-tyrosine (TYR) biosynthesis comprise a family of TyrA homologs that have different specificities for the cyclohexadienyl substrate: ones specific for L-arogenate (AGN), ones specific for prephenate (PPA), and those that are able to use both [1, 2]. Figure 1 illustrates the biochemical relationship of these specificities to divergent transformations beginning with chorismate (CHA) utilization and converging on TYR formation. Compounding this complexity, a given TyrA enzyme having any of the aforementioned cyclohexadienyl specificities may be specific for NAD+ or NADP+, or may use both. This is consistent with a growing appreciation [3, 4] that different substrate specificities are often accommodated across a given protein family that nevertheless maintains a common scaffold of fundamental reaction chemistry. Even within the single category of broad TyrA specificity, there is a continuum ranging from examples where alternative substrates are accepted equally well to other cases where one substrate may be preferred by an order of magnitude or more. Table 1 provides a key to the nomenclature used to identify the various possible substrate-utilization combinations (both cyclohexadienyl and pyridine nucleotide) exhibited by TyrA proteins.
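The labels used later in this article (for example NADPTyrAa, NADTyrAp and NAD(P)TyrAc) combine a pyridine-nucleotide prefix with a cyclohexadienyl-substrate suffix. The following is a minimal sketch of that naming scheme, inferred from the labels that appear in the text; the function name and the input keywords are hypothetical, and Table 1 remains the authoritative key.

```python
def tyra_label(cyclohexadienyl: str, cofactor: str) -> str:
    """Compose a TyrA specificity label such as 'NAD(P)TyrAc'.

    cyclohexadienyl: 'AGN', 'PPA' or 'both' (hypothetical input keywords)
    cofactor:        'NAD+', 'NADP+' or 'both' (hypothetical input keywords)
    """
    # Suffix: a = arogenate-specific, p = prephenate-specific, c = either substrate
    substrate_suffix = {"AGN": "a", "PPA": "p", "both": "c"}
    # Prefix: cofactor specificity; NAD(P) marks acceptance of both cofactors
    cofactor_prefix = {"NAD+": "NAD", "NADP+": "NADP", "both": "NAD(P)"}
    return f"{cofactor_prefix[cofactor]}TyrA{substrate_suffix[cyclohexadienyl]}"


if __name__ == "__main__":
    print(tyra_label("both", "both"))   # broad specificity for both substrates
    print(tyra_label("AGN", "NADP+"))   # the cyanobacterial type
```

The sketch covers only the three cyclohexadienyl classes and three cofactor classes named above; it does not encode fusion partners or the TyrAc_Δ subclass.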
The TyrA family is typical of many protein families in that its members have a relatively small core domain that is not highly conserved. As such, it poses substantial challenges for bioinformatic analysis. Here we have not only carried out a labor-intensive manual analysis, but we have also developed tools intended to facilitate and refine follow-on studies of this protein family in the genome era. The approaches implemented in this study with the TYR segment of aromatic biosynthesis may serve as a template for forthcoming integrated analyses of other pathway segments of aromatic biosynthesis, and indeed for metabolic subsystems in general.
This manuscript contains three broad sections. First, the biochemical and enzymological complexity of the TyrA protein family is presented in terms of the diversity that exists in nature with respect to substrate specificity and the association of the core domain with other catalytic or regulatory domains. Secondly, the genomic colinear organization of tyrA genes with other genes is evaluated, i.e., tyrA is considered in its syntenic context. Thirdly, tyrA is evaluated in its context of regulation. These three sections are tied together in a framework of evolutionary perspective.
Results and discussion
Background of TyrA diversity
Our evolutionary analysis is limited by the amount of information that can be managed in a single study, with the focus fixed upon the domain Bacteria (due to the relative density of genome representation for Bacteria in the public databases). However, in order to show where future expansion of the analysis might lead, the TyrA proteins selected for Fig. 2 are drawn from all three domains of life, i.e., Bacteria, Archaea, and Eukarya (lower eukaryotes and higher plants). For practicality of presentation, numerous orphan (i.e., without close relatives) TyrA sequences are not shown, and not all members of a given group are necessarily included. The main purposes of the radial tree shown in Fig. 2 are: (i) to illustrate that TyrA proteins of major phylogenetic groupings are generally congruent with 16S rRNA groupings and (ii) to convey a snapshot visualization of the overall complexity of the TyrA protein family from the vantage point of its varied substrate specificities as well as its multiple fusion partners.
As an illustration of the detailed information that follows, note that the TyrA sequences from the beta Proteobacteria at five o'clock in Fig. 2 form a cohesive cluster (termed a 'congruency group'). In this clade there exists a proposed ancestral background of broad specificity where either AGN or PPA in combination with either NAD+ or NADP+ could be used. This profile of broad substrate use (which can be denoted as NAD(P)TyrAc; see Table 1) generally persists in the beta Proteobacteria. From this background, narrowed specificities for the AGN/NADP+ couple emerged once in the lineage represented by Nitrosomonas europaea (Fig. 2; dark blue line), narrowed specificity for NAD+ emerged once in species of Neisseria (orange line), and fusion of tyrAc with aroF (which encodes enolpyruvylshikimate-3-P synthase, the sixth enzyme in the common pathway of aromatic biosynthesis; see [5, 6] for nomenclature used) occurred recently within the Burkholderia lineage. These character-state transformations appear to occur with relative ease, and independent emergence of the same character states can be seen elsewhere in the tree.
Phylogenetically congruent TyrA groupings
Multiple alignments of catalytic-core domains
A phylogenetic tree is only as good as the input alignment. An optimal multiple alignment of TyrA homologs requires a trimmed set of sequences that corresponds to the catalytic-core domain. Alignment of sequences with non-homologous N-terminal fusions (such as with chorismate mutase• (AroQ•), HisHb•, or plant transit peptides•; note the convention of using a bullet to indicate the fusion point of one domain with another domain) will make them appear to be more closely related than they actually are because residues in the non-homologous N-terminal regions find matches at random. Likewise, those TyrA sequences with C-terminal fusions (such as with •AroF, •ACT, or •REG) will appear to be anomalously close to one another. Even enzyme proteins that have much greater sequence conservation and amino-acid lengths than TyrA proteins cannot reasonably be expected to yield a protein tree that would be congruent over an extensive phylogenetic range with the overall 16S rRNA tree. However, if genome representation is sufficiently dense within a range of closely related organisms, 16S rRNA congruency with a given protein can be expected within that range of organisms provided that (i) the particular functional role has been retained and (ii) lateral gene transfer has not occurred to obscure the relationship. This expectation follows from the outcome of a detailed analysis of tryptophan-pathway proteins in Bacteria [7, 8].
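The trimming step described above can be sketched as follows. This is an illustration only, assuming that the core-domain boundaries of each protein have already been determined by manual curation; the function name and the toy sequence are hypothetical and are not taken from the AroPath pipeline.

```python
def trim_to_core(sequence: str, core_start: int, core_end: int) -> str:
    """Return the catalytic-core segment of a protein sequence.

    core_start and core_end are 1-based, inclusive residue positions,
    as they would be read off a curated alignment (assumed to be known).
    """
    if not (1 <= core_start <= core_end <= len(sequence)):
        raise ValueError("core boundaries fall outside the sequence")
    # Convert to Python's 0-based, end-exclusive slicing
    return sequence[core_start - 1:core_end]


if __name__ == "__main__":
    # Toy example: a 5-residue N-terminal extension and a 4-residue
    # C-terminal extension flank a 5-residue "core".
    fused = "AAAAA" + "CDEFG" + "KKKK"
    print(trim_to_core(fused, 6, 10))
```

Trimming every sequence in this way before alignment prevents residues in non-homologous N-terminal or C-terminal extensions from contributing chance matches.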
Congruency within major clades
TyrA sequences from higher-plant and yeast Eukarya form cohesive clusters. Genome representation among Archaea is still relatively limited. (Fig. 2 does reveal, however, that genes encoding TyrA proteins in Archaea have experienced various catalytic- and regulatory-domain fusions at least as frequently as those in Bacteria). Eventual expansion of both the tryptophan-pathway and tyrosine-pathway analyses to Archaea should be quite interesting.
The great majority of TyrA sequences available are from Bacteria, and one can see (by inspection of the major clades supported by high bootstrap values in Fig. 2) a qualitatively apparent congruence of TyrA-tree sub-sections with 16S rRNA expectations of vertical genealogy. Thus, all cyanobacteria possess a NADPTyrAa type of TyrA enzyme, and this is a very cohesive grouping. A few of the larger cyanobacterial genomes have a co-existing second enzyme of the TyrAc_Δ type (discussed in detail later). The low-GC gram-positive bacteria (Bacillus/Staphylococcus/Enterococcus/Listeria) exhibit the NADTyrAp pattern of specificity and also possess a C-terminal domain (ACT) of allosteric regulation. It is interesting that the TyrAp•ACT proteins of the Streptococcus lineage (at eight o'clock in Fig. 2) differ from the main low-GC clade in possessing broad specificity for pyridine nucleotides (as indicated with black line color). The most parsimonious evolutionary conclusion would be that in the low-GC gram-positive grouping, acquisition of the ACT domain and narrowed specificity for prephenate preceded narrowed specificity for NAD+. Thus, the latter event occurred after divergence of the Streptococcus lineage from the remainder of the low-GC clade. Members of the subclass taxon Actinobacteridae (mostly actinomycetes) possess AGN-specific TyrA enzymes (light blue fill color in Fig. 2), but they separate into two distinct groups that correlate either with broad specificity for pyridine nucleotides (Actinobacteridae_1) or a NAD+-specific pattern (Actinobacteridae_2). The Proteobacteria are discussed immediately below.
By far the greatest genomic density available is for Proteobacteria, the group of Bacteria that includes purple bacteria and their relatives. The various divisions of Proteobacteria, as currently named, lack hierarchical equivalence. For example, the epsilon and delta divisions branch from much deeper positions on the phylogenetic tree than do the alpha Proteobacteria. As genome representation expands for epsilon and delta Proteobacteria, it is probable that these will subdivide to newly named groupings of approximate hierarchical equivalence with alpha Proteobacteria. The most recently diverged Proteobacteria are the beta and gamma divisions. From the combination of our previous analysis of tryptophan biosynthesis [7, 8], TYR biosynthesis (this paper), and other segments of aromatic biosynthesis (unpublished data), we find it useful to separate "upper-gamma" Proteobacteria from "lower-gamma" Proteobacteria (an "enteric lineage" with Shewanella oneidensis as approximately the most divergent member). This separation is because the beta Proteobacteria and the upper-gamma Proteobacteria exhibit a smooth continuity of relatively few evolutionary events with respect to aromatic biosynthesis, in striking contrast to extraordinarily dynamic evolutionary events in the lower-gamma Proteobacteria. As a consequence, the lower-gamma Proteobacteria are much more distinct (in terms of aromatic biosynthesis) from the upper-gamma Proteobacteria than the upper-gamma are from the beta Proteobacteria.
Figure 2 shows that alpha, beta and epsilon divisions of Proteobacteria form phylogenetically coherent clusters with respect to their TyrA proteins. Although delta Proteobacteria fall into two well-separated groupings denoted as Delta_1 and Delta_2, this should not be surprising since these groupings diverge at a deep level on the 16S rRNA tree where genome representation is poor. In addition, the Myxococcus xanthus TyrA sequence, currently an orphan (three o'clock in Fig. 2), represents a third divergent lineage in delta Proteobacteria. In contrast to delta Proteobacteria, genomic representation for the gamma Proteobacteria is relatively good. Nevertheless their TyrA sequences separate into several well-spaced groupings, albeit for entirely different reasons. In this case, the split seen between two clades of these fairly close relatives (upper-gamma and lower-gamma) is attributed to particularly dynamic evolutionary events compressed into a relatively short time span in the lower-gamma Proteobacteria. (We refer to such a dynamic divergence as an evolutionary jump; see the next section.) Note that the allocation of upper-gamma and lower-gamma Proteobacteria to separate TyrA congruency groups is not the same as being incongruent. It is quite possible that as new genomes come on line, new and intermediate TyrA sequences may result in the merging of the foregoing two congruency groups (currently tyrosine congruency group 1 (TyrCG-1) and tyrosine congruency group 2 (TyrCG-2)).
Comparison of tryptophan and tyrosine congruency groups
Although the true extent of lateral gene transfer (LGT) at present must be described as intensely controversial, there is little doubt that any given organism is mosaic with respect to some unknown fraction of its gene repertoire. Our "accounting" system for keeping track of proteins that are faithful to the vertical genealogy is to formulate congruency groupings that are defined by congruence of given protein-tree clusters to a section of the 16S rRNA tree. Ultimately this information will reveal which organisms are "pure" with respect to the vertical inheritance of a given pathway or pathway segment. Our congruency groups are intended to be fluid, in that with the continued availability of new sequences, a previous orphan sequence may very well become the seed for a new congruency group. On the other hand, previously separate congruency groups have the potential to merge. (See Methods for more information.) The present tyrosine congruency groups are listed on the AroPath website.
Seven tryptophan congruency groups in Bacteria were previously formulated  based upon the correspondence of cohesive clusters in trees of Trp-protein concatenates with sections of 16S rRNA trees. The information input for formulation of tryptophan congruency groups is of greater quality than for tyrosine congruency groups because seven-protein concatenates could be used for the former. On the other hand, the broad information input supporting tyrosine congruency groups in this study is more comprehensive because of greater genome availability. Tryptophan congruency group 1 (TrpCG-1) corresponds perfectly with the organisms represented in TyrCG-1, these being the lower-gamma Proteobacteria (enteric lineage). The upper-gamma Proteobacteria (TyrCG-2) and the beta Proteobacteria (tyrosine congruency group 3; TyrCG-3) are represented by different tyrosine congruency groups. In contrast, the membership of tryptophan congruency group 2 (TrpCG-2) includes both the upper-gamma Proteobacteria and the beta Proteobacteria. The latter merging probably reflects the advantage conferred by the greater information content of the concatenated sequences used to define tryptophan congruency groups.
Species of Xylella and Xanthomonas are usually referred to as gamma Proteobacteria. They probably represent an outlying deeply branching lineage, although trees based on concatenated strings of proteins or 16S rRNA position them with beta Proteobacteria. In any event, Trp-protein concatenate trees placed Xylella and Xanthomonas within TrpCG-2, which contains both upper-gamma and beta Proteobacteria. In contrast, the TyrA domains from Xylella and Xanthomonas were well separated (at about two o'clock in Fig. 2) from those of any other organism. This might simply be due to the limited resolving power of a single protein in combination with too few close relatives. (Note that single Trp-protein trees sometimes failed to achieve the congruency-group placements that were resolved by seven-protein Trp concatenates.) An additional clue may be relevant. The TyrA proteins from the Xylella/Xanthomonas genera possess an ACT domain, which has not been observed in any other proteobacterial TyrA proteins thus far. In view of this, origin by LGT seems to be a distinct possibility, but with the important caveat that no likely genome donors are yet obvious on the criterion of sequence similarity. Perhaps more likely is the following explanation, which postulates a basis for accelerated divergence. The TyrA domains of Xanthomonas/Xylella proteins have an indel structuring (insertions and/or deletions) that places them within the TyrAc_Δ specificity subclass (see below). We suggest (see below) that such indel structuring reflects interaction of the core TyrA domain with an extra-domain extension. Thus, selection for amino acid changes accomplishing a new domain-domain interaction could account for accelerated divergence of the Xanthomonas/Xylella sequences on the TyrA tree (Fig. 2).
Cohesive tryptophan congruency groups of the alpha Proteobacteria (tryptophan congruency group 3; TrpCG-3) and the cyanobacteria (tryptophan congruency group 4; TrpCG-4) match up well with the corresponding tyrosine congruency groups (tyrosine congruency group 4 (TyrCG-4) and tyrosine congruency group 8 (TyrCG-8), respectively). The TyrA proteins of epsilon Proteobacteria define a cohesive tyrosine congruency group (tyrosine congruency group 5; TyrCG-5), whereas the Trp-protein concatenates of epsilon Proteobacteria did not exhibit a coherent congruency group, due at least in part to LGT. The delta Proteobacteria separate into two distinct tyrosine congruency groups: Delta_1 (tyrosine congruency group 6; TyrCG-6) and Delta_2 (tyrosine congruency group 7; TyrCG-7), as shown in Fig. 2. It is likely that corresponding tryptophan congruency groups exist (work in progress), but at the time of the Xie et al. study only Trp-pathway protein concatenates for Desulfovibrio vulgaris (Delta_2) and Geobacter sulfurreducens (Delta_1) were available, and they were provisionally listed as "orphans". In the present work TyrA sequences from Deinococcus radiodurans and Thermus thermophilus are the sole members of tyrosine congruency group 12 (TyrCG-12). At the time of the Trp-pathway work, the genome of Thermus was unavailable and the Deinococcus concatenate was listed as an orphan. It is expected that the Deinococcus and Thermus concatenates will now seed a new tryptophan congruency group.
Whereas tryptophan congruency group 5 (TrpCG-5) is defined by cohesive concatenates from actinomycete bacteria, the TyrA proteins from the same organisms separated into two distinct congruency groups. It is intriguing that this partitioning into two congruency groups correlates with narrowed specificity for NAD+ (indicating an evolutionary jump) in one of the groups. The latter group (tyrosine congruency group 11; TyrCG-11) is denoted Actinobacteridae_2 in Fig. 2, whereas tyrosine congruency group 10 (TyrCG-10) is displayed as Actinobacteridae_1. The opposite scenario, whereby a single tyrosine congruency group corresponds to split tryptophan congruency groups, applies in the case of low-GC gram-positive bacteria. Whereas TyrA proteins form a single congruency group in these organisms (tyrosine congruency group 9; TyrCG-9), a small cluster of Trp-pathway concatenates from Bacillus subtilis, B. stearothermophilus, and B. halodurans (tryptophan congruency group 6; TrpCG-6) separates distinctly from the remaining organisms (tryptophan congruency group 7; TrpCG-7). The latter evolutionary jump reflects a dynamic scenario of tryptophan-pathway evolutionary events that include loss of one gene from the trp operon, insertion of the trp operon into a 6-gene aro operon to produce a supraoperon, and acquisition of the TRAP (tryptophan-activated RNA-binding protein) mechanism of regulation by an RNA-binding protein.
Tyrosine congruency groups and tryptophan congruency groups are maintained and updated at the AroPath website.
Distribution in nature of TyrA specificity subclasses for the cyclohexadienyl substrate
Four qualitative classes of specificity for the cyclohexadienyl substrate populate the TyrA superfamily of homologs (Fig. 1). These include PPA-specific (TyrAp), AGN-specific (TyrAa), the broad-specificity cyclohexadienyl (TyrAc) dehydrogenases and a fourth class represented by an enzyme of antibiotic biosynthesis (PapC) that converts 4-amino-4-deoxy-prephenate to 4-amino-phenylpyruvate. Representatives of each specificity class have been studied at molecular and genetic levels. TyrA family members sharing a given substrate specificity do not necessarily cluster tightly together, and assignment of substrate specificity to experimentally uncharacterized TyrA homologs is uncertain unless they exhibit very high amino acid identities with experimentally characterized TyrA proteins. In some cases we do not accept older literature reports without more recent verification. For example, the yeast Saccharomyces cerevisiae TyrAx was characterized as a TyrAp protein long before it was recognized that PPA preparations were often contaminated with AGN (an unknown compound at that time).
Our collection of curated TyrA sequences at AroPath (see Table 3) contains trimmed sequences that comprise catalytic-core domains. This collection was divided into two groups based on whether the sequences contained the relatively short N-terminal pyridine-nucleotide discriminator segment or the longer C-terminal cyclohexadienyl-substrate core segment. The sequences in the latter group were assembled into subgroups representing established substrate specificities (TyrAa, TyrAp and TyrAc) and were aligned separately to obtain overall consensus sequences for cyclohexadienyl-substrate core segments. The TyrAc group members from the lower-gamma assemblage of Proteobacteria (as well as from a few other lineages) were so distinctive that a fourth group (TyrAc_Δ) was defined. This latter group is, in fact, the most divergent of the four. Figure 3 shows a comparison of the four consensus sequences, with invariant anchor residues shaded yellow and residues conserved across all groups shaded in gray. Residues within each group that are >50% conserved are shown in capital letters. In pairwise BLAST (Basic Local Alignment Search Tool) comparisons, TyrAa and TyrAc consensus sequences are most similar (47% identity), followed by the TyrAc/TyrAp pair (40% identity), with TyrAa and TyrAp exhibiting 34% identity. TyrAc_Δ is quite distinct from the other three groupings, exhibiting only 27% identity with TyrAc, 23% identity with TyrAa, and 18% identity with TyrAp.
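For readers who wish to compare consensus sequences in a similar way, percent identity between two already-aligned sequences can be computed as sketched below. This is an illustration under stated assumptions, not the BLAST calculation used for the figures quoted above: BLAST performs its own local alignment and reports identity over the aligned region, whereas this sketch simply skips gapped columns of a supplied alignment. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both inputs must be rows of the same alignment (equal length).
    Comparison is case-insensitive, so lower-case consensus residues
    are treated the same as upper-case ones.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 5 ungapped columns, 4 of which match
    print(percent_identity("ACDE-G", "ACEEFG"))
```

Other conventions (for example, dividing by the full alignment length including gaps) give lower values for indel-rich comparisons such as those involving TyrAc_Δ, so the denominator should always be stated alongside the result.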
Many TyrA proteins (at least in the domain Bacteria) are of the TyrAc subclass. The cyclohexadienyl dehydrogenases commonly accept PPA or AGN about equally well, but various degrees of preference for one of the alternative substrates are also observed. Detailed molecular and genetic studies of TyrAc proteins from Pseudomonas aeruginosa, P. stutzeri, and Zymomonas mobilis have been carried out. The distinct variety of TyrAc mentioned above, which has been denoted TyrAc_Δ, exhibits a number of indels (mostly deletions) within the catalytic-core region when its consensus sequence is aligned with those of the other TyrA classes (Fig. 3). It is intriguing that the indel structuring of TyrAc_Δ correlates with the presence of an extra-core extension. This extension is often AroQ, but not always. For example, in the genera Nostoc and Anabaena it appears to be a degraded, catalytically inactive AroQ, whereas in Xanthomonas or Xylella it is an ACT domain. Since the one large clade of TyrAc_Δ proteins that has so far been studied prefers PPA over AGN by well over an order of magnitude, an evolutionary relationship of the indels to the narrowing of substrate preference for PPA might exist. If so, however, this cannot be the only molecular change to accomplish favored utilization of PPA over AGN, since a number of TyrAc proteins (e.g., TyrAc from Neisseria gonorrhoeae) also exhibit an overwhelming preference for PPA, even though this class lacks the indel structuring.
The TyrAa class of specificity is currently represented by higher plants and at least three widely spaced bacterial lineages: cyanobacteria, actinomycetes and Nitrosomonas europaea. This discontinuity of phylogenetic spacing is consistent with a fundamental evolutionary scenario whereby the ancestral dehydrogenase was a broad-specificity TyrAc that evolved narrowed substrate specificity (to yield either TyrAp or TyrAa) independently on multiple occasions in modern lineages. The ubiquitous presence of TyrAa in cyanobacteria has been heavily documented. Nitrosomonas europaea currently (as of March 2005) has no sufficiently close genome relatives that have been sequenced. The first BLAST hit returned from a NADPTyrAa query from N. europaea (March 2005) is the protein from Ralstonia solanacearum (48% identity), which is known to possess broad specificity for both of its substrates (i.e., NAD(P)TyrAc) [21, 22].
The TyrA sequences of Actinobacteria separate into two distinct groupings on the protein tree (Fig. 2). Coryneform bacteria in one sub-cluster have been rigorously characterized as the NAD(P)TyrAa substrate specificity type. On the other hand, a variety of Streptomyces species have been shown [23, 24] to possess NADTyrAa, and TyrA proteins of these organisms populate the second Actinobacteria sub-cluster of Fig. 2. Figure 4 shows sequence alignments of the N-terminal pyridine-nucleotide discriminator regions of currently available actinomycetes. The conserved 'D' residue (highlighted in yellow) in the upper group is a reliable indicator of NAD+ specificity, in part because NADP+ is repelled by the negative charge at this position. The asparagine residue (highlighted in blue) in the corresponding position in members of the lower group indicates NAD(P)+ specificity as discussed by Bonner et al. Rubrobacter xylanophilus is the most distant representative of the Actinobacteria, being the sole member of the subclass taxon Rubrobacteridae, and its protein (denoted Rxyl) appears as an orphan in Fig. 2.
A similar relationship of phylogenetic separation associated with narrowed specificity for pyridine-nucleotide substrate exists for the low-GC gram-positive bacteria (eight o'clock in Fig. 2). Here the major clade is NAD+-specific, whereas species of Streptococcus have retained the ancestral breadth of specificity for NAD+/NADP+. Alignments of the pyridine-nucleotide discriminator regions of these latter two groups match up extremely well with the upper alignment of Fig. 4 where residue 32 of the Wierenga fingerprint is 'D' and with the lower alignment where residue 32 is 'N' (data not shown).
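The residue-32 rule described for the actinomycete and low-GC gram-positive groups can be written as a small lookup. This is a minimal sketch of that rule only, assuming the residue at fingerprint position 32 has already been extracted from an alignment of the discriminator region; the function name is hypothetical, and residues other than 'D' or 'N' are deliberately left unclassified because the text above gives no rule for them.

```python
def predict_cofactor_specificity(residue_32: str) -> str:
    """Predict pyridine-nucleotide specificity from fingerprint residue 32.

    'D' (aspartate)  -> 'NAD+'     (the negative charge repels NADP+)
    'N' (asparagine) -> 'NAD(P)+'  (both cofactors accepted)
    anything else    -> 'undetermined' (no rule stated in the text)
    """
    residue = residue_32.strip().upper()
    if residue == "D":
        return "NAD+"
    if residue == "N":
        return "NAD(P)+"
    return "undetermined"


if __name__ == "__main__":
    for residue in ("D", "N", "G"):
        print(residue, predict_cofactor_specificity(residue))
```

A prediction of this kind is only as reliable as the alignment that locates position 32, so it is best applied within a group of close relatives where the discriminator region aligns without ambiguity.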
Recently, a plant tyrAa from Arabidopsis thaliana has been reported to consist of two near-identical domains that are fused. The gene encoding this 68-kDa protein co-exists in the genome with a single-domain paralog that encodes a predicted 37-kDa protein, somewhat larger than the catalytic-core domain of TyrAa from Synechocystis. TyrAa (known to be located in higher-plant chloroplasts) may have originated from cyanobacteria via endosymbiosis. If so, however, the plant TyrAa sequences have diverged sufficiently that they no longer share a specific phylogenetic grouping with the cyanobacterial TyrA sequences. This is in marked contrast with the phylogenetic coherence of the tryptophan synthase subunit proteins (TrpEa and TrpEb_1) from cyanobacteria and higher plants.
TyrAp is conspicuously represented by a large clade of low-GC gram-positive organisms, of which Bacillus subtilis TyrAp is the best studied. Thus far, all TyrAp proteins are fused to a C-terminal ACT domain, and therefore no "minimal" TyrAp proteins that consist only of a catalytic core are available as yet. At the level of physiological function, it should be added that those cyclohexadienyl dehydrogenases that exhibit a very substantial preference for prephenate are for all practical purposes prephenate dehydrogenases, even though they carry a formal designation of TyrAc or TyrAc_Δ. These include most, if not all, of the AroQ•TyrAc_Δ enzymes of the enteric lineage (lower-gamma in Fig. 2). The TyrAc protein from Neisseria gonorrhoeae (and by inference, the closely related N. meningitidis) is also a well-studied example of overwhelming preference for prephenate.
PapC participates in the formation of p-aminophenylalanine as a step in the synthesis of at least two antibiotics (see Fig. 1). It is so far represented by only a few sequences. The PapC specificity is strongly indicated by absence of the otherwise invariant residue H197 (E. coli numbering) that is associated with recognition of a 4-hydroxy moiety in the cyclohexadienyl substrates of the aforementioned dehydrogenases. This moiety, of course, differs in being a 4-amino substituent in the substrate used by the PapC dehydrogenase (Fig. 1). See Bonner et al. for a more detailed overview.
The "redundant" trp/aro supraoperon of Nostoc/Anabaena
All cyanobacteria possess a highly conserved tyrAa gene, as well as a complete suite of tryptophan-pathway genes that are dispersed (unlinked) in the genome. The large-genome cyanobacterial lineage consisting of the Nostoc and Anabaena genera possesses in addition a unique and seemingly redundant trp/aro supraoperon consisting of most of the aforementioned genes. These include a second tyrA gene (curated as tyrAc_Δ), six trp-pathway genes (all except trpC), and genes encoding the first two common-pathway steps of aromatic amino acid biosynthesis. All of these supraoperonic genes appear to be redundant in that they are represented by homologs (paralogs or xenologs) elsewhere in the Nostoc and Anabaena genomes at scattered loci. The closest BLAST hits for the Nostoc/Anabaena TyrAc_Δ proteins are not the co-existing TyrAa homologs present in their own genomes (and universally present in cyanobacteria). Rather, the closest BLAST hits are to the TyrAc_Δ domains of the AroQ•TyrAc_Δ fusions in the enteric lineage. Since the enteric proteins are NAD+-specific and strongly prefer prephenate, it is likely that the "extra" cyanobacterial proteins are also NADTyrAc_Δ proteins. Indeed, this would be consistent with enzymological evidence provided in the literature for both Nostoc and Anabaena.
Concerning the evolutionary origin of the redundant block of linked genes found in the Nostoc and Anabaena genomes, at least two possibilities await further illumination. (i) These genes might have been acquired by a common ancestor of Nostoc and Anabaena via lateral gene transfer. This is consistent with the observation that biosynthetic-pathway operons are generally absent in the cyanobacteria, and all of the linked genes could have been recruited in a single event. However, at present no candidate donor genomes are known that possess this supraoperon combination of genes. If the TyrAc_Δ proteins of Nostoc/Anabaena and the enteric lineage are possibly related by LGT, it is of interest that the N-terminal extension of TyrAc_Δ from Nostoc/Anabaena resembles a degraded AroQ domain of AroQ•TyrAc_Δ from enterics. In both cases the N-terminal residues may compensate for indel deletions within the catalytic core region of TyrAc_Δ. Subsequently, AroQ function may have evolved in one lineage (or have been lost in the other). This possibility of domain-domain interaction is consistent with the established interdependence of the AroQ• and •TyrAc_Δ domains from E. coli. (ii) Alternatively, tyrAa and tyrAc_Δ (and the duplicated trp and aro genes present in the supraoperon) might be ancient paralogs within the cyanobacterial lineage. If so, at a time following divergence of heterocystous cyanobacteria from the unicellular cyanobacteria, the latter may have lost the clustered block of aromatic-pathway genes in a single event of reductive evolution. The supraoperonic genes might be related to a specialized function associated with "developmental" physiological processes that typify the filamentous, heterocyst-forming cyanobacteria. This might be reminiscent of the nature of the phenazine-pigment operon of Pseudomonas aeruginosa. Here unique phenazine-pathway genes are combined with a redundant gene of common-pathway aromatic biosynthesis and two redundant (and fused) genes of tryptophan biosynthesis. This accomplishes the linkage of specific phenazine biosynthesis with a supply of 2-amino-2-deoxy-isochorismate, the branchpoint of divergence toward phenazine and tryptophan [33, 34]. This complexity in which multiple paralogs are differentially deployed is consistent with the large genome sizes of Anabaena (7.2 Mb) and Nostoc (9.2 Mb), compared with the much smaller unicellular genomes of Prochlorococcus marinus (1.7 Mb), Synechococcus sp. WH8102 (2.4 Mb), and Synechocystis sp. PCC6803 (3.6 Mb).
Profile hidden Markov models (HMMs) to distinguish specificity subfamilies for cyclohexadienyl substrate
The limited information thus far available about specific molecular roles of particular TyrA amino acid residues has been summarized recently . The catalytic-core domains of known TyrAa, TyrAp, TyrAc, and TyrAc_Δ proteins were selected from our files of TyrA catalytic-core domains , and a new subset of sequences was prepared that lacked the pyridine nucleotide discriminator segment, a glycine-rich βαβ region at the N terminus. Although the glycine-rich βαβ region is not the only segment that contacts pyridine nucleotide substrate, it is the sole region that discriminates between NAD+ and NADP+. The resulting trimmed sequence is defined as the "cyclohexadienyl-substrate core segment". No distinctive motifs were found that, in isolation, would be a clear predictive indicator of specificity for cyclohexadienyl substrate. Similar substrate specificity profiles probably can be dictated by alternative patterns of interplay between different residue combinations.
Because of the rapid accumulation of incorrectly annotated TyrA entries in GenBank and other databases, partly due to the complications of misnaming that are associated with gene fusions and partly to a failure to assimilate published substrate specificities, BLAST does not return reliable annotations with respect to substrate specificity. Even the HMMs used in Pfam and InterPro were not helpful in this case because the HMM deployed in those databases was broadly but incorrectly defined as 'prephenate dehydrogenase (NADP+) activity' for all TyrA dehydrogenases (accession number PF02153 in Pfam and entry IPR003099 in InterPro). However, profile HMMs are known to be well suited for modeling a particular sequence family of interest and for finding additional remote homologs. They are reputed to outperform methods that rely only upon pairwise alignment of homologous residues in predicting protein function. Therefore, profile HMMs were constructed from our multiple sequence alignments of each curated TyrA specificity subfamily, using the HMMER package.
The profile HMMs obtained are only tentatively reliable for prediction of substrate specificity. To facilitate ongoing and future functional annotations, we have made our profile HMMs available as a working resource for "specificity prediction" at AroPath . Users can match query sequences against the four profile HMMs to predict the subfamily to which a query sequence belongs. It is anticipated that future experimental data relevant to substrate specificity will facilitate refinement of the prediction program. For example, at present the program predicts that the TyrA sequences from organisms such as Helicobacter pylori and Saccharomyces cerevisiae belong to the TyrAa grouping, and it will be interesting to see whether this holds up to experimental confirmation. It is additionally fascinating that (i) the dehydrogenase from Archaeoglobus fulgidus is predicted to belong to the indel-containing TyrAc_Δ grouping and (ii) that it possesses a possible cooperatively interacting extra-core domain extension (an AroQ fusion), just as occurs for the large clade of enteric bacteria. If this is relevant, it is even more fascinating that the Archaeoglobus aroQ is fused at the C-terminal side of tyrA c_Δ, rather than at the N terminus as is the case with enteric bacteria.
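The matching step described above can be illustrated with a minimal Python sketch. It assumes that a query has already been scored against each of the four subfamily profile HMMs (for example, with HMMER) and that the bit scores are supplied as a dictionary. The function name, the score values, and the margin threshold are hypothetical; the actual decision rule used by the AroPath predictor is not specified in this text.

```python
def predict_subfamily(scores, min_margin=10.0):
    """Pick the best-scoring TyrA subfamily for one query sequence.

    scores: dict mapping subfamily name -> bit score obtained by matching
            the query against that subfamily's profile HMM (hypothetical
            input; how the scores are produced is outside this sketch).
    min_margin: assumed minimum lead of the top score over the runner-up
            before a call is made; this threshold is illustrative only.

    Returns (call, margin), where call is None if the result is ambiguous.
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two subfamily models")
    # Rank subfamilies from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_name, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    margin = best_score - runner_up_score
    call = best_name if margin >= min_margin else None
    return call, margin


if __name__ == "__main__":
    # Made-up scores for a single query, for illustration only.
    example = {"TyrAa": 120.5, "TyrAp": 80.0, "TyrAc": 95.2, "TyrAc_delta": 60.1}
    print(predict_subfamily(example))
```

Returning the margin alongside the call keeps the tentative nature of the prediction visible: a narrow margin between two subfamily models is a reason to withhold a call until experimental data are available.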
Users at AroPath can enter query sequences into interactive multiple sequence alignments with any of the four sets of "cyclohexadienyl-substrate core segment" sequences that were used to train the profile HMMs. An effort is under way to extend the predictor capability to include the pyridine nucleotide substrate as well. One can also align query sequences of interest with either the complete set of curator-approved TyrA catalytic-core sequences or with any desired subset of seed sequences.
The catalytic-core domain of TyrA proteins
The simplest set of fully functional TyrA proteins consists only of the catalytic-core domain (about 180 amino acids)  and includes the well-characterized TyrA c enzymes from Neisseria gonorrhoeae  and Zymomonas mobilis , as well as TyrA a from a cyanobacterium . In addition the catalytic-core domain from Pseudomonas stutzeri has been engineered for study from a tyrA c •aroF fusion . These model core proteins are roughly as divergent from one another on the TyrA protein tree as are the organisms that contain them (Fig. 2). In view of the possibility raised in this paper about inter-domain interactions, the single-domain TyrA proteins are undoubtedly the simplest sources for study of the fundamental properties of the catalytic-core domain.
Xie et al.  suggested that in the set of catalytic-core TyrA proteins, inhibitors bind at the catalytic site and exhibit classical competitive inhibition with respect to the particular cyclohexadienyl substrates that can be accepted by a given organism. This model predicts that the specificity for the sidechains of substrates used would parallel the specificity for inhibitor sidechains. The information summarized in Table 4 supports this expectation. Thus, the TyrA c proteins of P. stutzeri and P. aeruginosa will accept either a pyruvyl (as with PPA) or an alanyl (as with AGN) sidechain in the alternative substrates used, and this is paralleled by recognition of either a pyruvyl (4-hydroxyphenylpyruvate) or an alanyl (TYR) sidechain in the competent inhibitor structures. In another case, the N. gonorrhoeae TyrAc exhibits an overwhelming substrate preference for PPA, and consistent with the foregoing, is subject to inhibition by 4-hydroxyphenylpyruvate but not by TYR. A variety of analog inhibitor structures were used by Xie et al.  to show that the minimal structure for binding at the substrate-binding site of P. stutzeri TyrA c is a six-membered ring with a 4-hydroxy substituent.
In contrast to the TyrA c proteins just described, the Z. mobilis TyrA c is totally insensitive to inhibition by either 4-hydroxyphenylpyruvate or TYR. Since both of these compounds lack a 1-carboxy moiety, it is reasonable to assume that the 1-carboxy substituent present in the two substrates accepted may be required for binding at the catalytic center. Thus, although TyrAc from Z. mobilis will accept the same two substrates as does the TyrA c from P. stutzeri, the greatly different inhibition results suggest that Z. mobilis obeys more stringent rules for binding at the catalytic site (i.e., a ring carboxylate must be present).
Synechocystis sp. and Arabidopsis thaliana TyrAa proteins accept as a substrate only AGN, which has an alanyl sidechain. The ring-carboxylate moiety is evidently not absolutely required for binding since these TyrAa proteins can recognize TYR (alanyl sidechain) as a competitive inhibitor. In contrast, since N. europaea TyrAa is not inhibited by TYR, it resembles the Z. mobilis TyrAc in the putative requirement for a 1-carboxy substituent to secure successful binding at the catalytic site.
In summary, some TyrA proteins probably exercise greater discrimination in their requirement for a 1-carboxy moiety for binding at the catalytic site, and these are insensitive to competitive inhibition by the aromatic reaction products (which lack the 1-carboxy substituent). Other TyrA proteins that require the 1-carboxy moiety for the fundamental catalytic process, but presumably do not require it for binding, will recognize product inhibitors that have the same sidechain as any substrate recognized.
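The competitive-inhibition model summarized above reduces to a simple rule, sketched here in Python. The function and variable names are hypothetical, and the sketch encodes only the qualitative expectation stated in the text (inhibitor sidechain parallels substrate sidechain unless the enzyme requires the 1-carboxy moiety for binding). It is not a quantitative model.

```python
# Sidechain carried by each cyclohexadienyl substrate, and the aromatic
# product that carries the same sidechain but lacks the 1-carboxy group.
SUBSTRATE_SIDECHAIN = {"PPA": "pyruvyl", "AGN": "alanyl"}
PRODUCT_WITH_SIDECHAIN = {"pyruvyl": "4-hydroxyphenylpyruvate", "alanyl": "TYR"}


def expected_product_inhibitors(accepted_substrates, needs_carboxyl_for_binding):
    """Return the set of product inhibitors the model predicts.

    accepted_substrates: iterable of "PPA" and/or "AGN".
    needs_carboxyl_for_binding: True if the enzyme is assumed to require
        the 1-carboxy substituent for binding at the catalytic site, in
        which case neither aromatic product is expected to inhibit.
    """
    if needs_carboxyl_for_binding:
        return set()
    return {
        PRODUCT_WITH_SIDECHAIN[SUBSTRATE_SIDECHAIN[substrate]]
        for substrate in accepted_substrates
    }


if __name__ == "__main__":
    # Cases taken from the text above.
    print(expected_product_inhibitors({"PPA", "AGN"}, False))  # P. stutzeri TyrAc
    print(expected_product_inhibitors({"PPA"}, False))         # N. gonorrhoeae TyrAc
    print(expected_product_inhibitors({"PPA", "AGN"}, True))   # Z. mobilis TyrAc
    print(expected_product_inhibitors({"AGN"}, False))         # Synechocystis TyrAa
```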
Specificity for the pyridine nucleotide co-substrate within the TyrA superfamily
NAD+ differs from NADP+ only in that NADP+ has a phosphate group esterified at the 2'-position of adenosine ribose. Therefore, the ability of a dehydrogenase to discriminate between the two lies in the particular enzyme region that contacts the ribose moiety. The glycine-rich region that constitutes the ADP-binding βαβ fold is well known to be this point of contact. This Rossmann βαβ fold is invariably positioned at the extreme N terminus of TyrA proteins, and the typical GXGXXG motif is almost always observed, as illustrated in Fig. 4. This region is helpful for assessment of probable specificities for pyridine nucleotide. One can be fairly sure that TyrA proteins possessing D-32 (E. coli numbering) are NAD+-specific. A negatively charged residue (D or E) at position 32 is critical for hydrogen bonding to the diol group of the ribose near the adenine moiety in NAD+-specific enzymes. NADP+-specific dehydrogenases cannot tolerate a negatively charged residue at position 32. TyrA proteins that possess an asparagine residue in the corresponding position appear to be broadly specific for both NAD+ and NADP+ as discussed above. No clearcut motif has been identified for NADP+-specific TyrA proteins, although at least one positively charged residue is expected in the region just beyond residue 32. By elimination, those sequences lacking D-32 or N-32 are strong candidates for NADP+ specificity. As with the cyclohexadienyl co-substrate, narrowed specificity for NAD+ (or NADP+) also seems to have occurred independently on many occasions (some examples given earlier).
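The rules of thumb in this paragraph can be written down as a short Python sketch. It makes two assumptions beyond the text: the residue aligned with position 32 (E. coli numbering) has already been identified from a multiple alignment and is passed in directly, and a glutamate at that position is treated like aspartate because the text describes either negatively charged residue as characteristic of NAD+-specific enzymes. Function names and the example sequence are hypothetical.

```python
import re

# The glycine-rich GXGXXG motif of the N-terminal Rossmann fold,
# where X is any residue.
GXGXXG = re.compile(r"G.G..G")


def find_gxgxxg(sequence):
    """Return the 0-based start of the first GXGXXG match, or None."""
    match = GXGXXG.search(sequence.upper())
    return match.start() if match else None


def predict_cofactor_specificity(residue_at_32):
    """Apply the heuristic from the text to the residue at position 32.

    residue_at_32: one-letter amino acid code of the residue that aligns
        with position 32 of the E. coli protein (alignment not done here).
    """
    residue = residue_at_32.upper()
    if residue in ("D", "E"):
        # Negative charge: expected for NAD+-specific enzymes and not
        # tolerated by NADP+-specific ones (E is an assumption, see above).
        return "NAD+-specific (probable)"
    if residue == "N":
        return "broad (NAD+ or NADP+)"
    # By elimination, as in the text.
    return "NADP+-specific (candidate)"


if __name__ == "__main__":
    # Invented N-terminal fragment, for illustration only.
    print(find_gxgxxg("MKIAVIGLGLIGGSLA"))
    print(predict_cofactor_specificity("D"))
    print(predict_cofactor_specificity("N"))
    print(predict_cofactor_specificity("R"))
```

The output strings are deliberately hedged ("probable", "candidate") because the text presents these as working rules that await experimental confirmation.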
The absolute specificity of TyrAp proteins for PPA tends to be accompanied by absolute specificity for NAD+, as illustrated by the large Bacillus/Staphylococcus/Listeria/Enterococcus clade at eight o'clock in Fig. 2. However, it is interesting that species of Streptococcus have retained the presumed ancestral breadth of specificity for the pyridine nucleotide substrate. The opposite relationship, whereby absolute specificity for AGN tends to be accompanied by absolute specificity for NADP+, is also observed. Here three of the four TyrAa lineages described earlier exhibit this pattern. Exceptions, though, are the aforementioned TyrAa proteins of Actinobacteridae_1 which accept either NAD+ or NADP+, as well as the TyrAa proteins of the sister Actinobacteridae_2 which are specialized for NAD+ [42, 43].
The TyrAc proteins of most complete-genome organisms thus far have happened to be NAD+-specific, and this has been the property of the most rigorously characterized ones (from Z. mobilis, P. stutzeri, and P. aeruginosa). However, it is clear from extensive enzymological surveys  that TyrAc proteins having broad specificity for NAD+/NADP+ are common, examples including species of Ralstonia and Burkholderia. The spectrum of variation that can exist, even within a clade of organisms that are of fairly close relationship, is illustrated by one striking example. In the pseudomonad clade marked by a common tyrA•aroF fusion, the Acinetobacter sp. TyrAc is NADP+-specific , whereas the sister subclade Pseudomonas/Azotobacter exhibits NAD+ specificity (Fig. 2). Here the entire clade marked by a common ancestral fusion shares approximately the same profile of cyclohexadienyl substrate preference, but cofactor specificity has been narrowed in opposite directions.
We had previously suggested that there might be a general structural relationship of substrate pairing that tends to favor interaction between PPA and NAD+, on the one hand, and, on the other hand, between the greater positive charge of AGN and the greater negative charge of NADP+. These relationships may indeed be favored, but it increasingly appears that any combination can occur.
Beyond the catalytic core: allosteric domains
Various lineages have acquired an amino acid binding domain known as the ACT domain (pfam01842), which is known to bind a variety of amino acids, thus functioning as an allosteric domain for many proteins including phosphoglycerate dehydrogenase, aspartokinase, acetolactate synthase, phenylalanine hydroxylase, prephenate dehydratase and formyltetrahydrofolate deformylase. Recruitment of this domain by fusion with tyrA p appears to have occurred in a common ancestor of the large Bacillus/Staphylococcus/Listeria/Enterococcus/Streptococcus assemblage (Fig. 2). It is interesting that B. subtilis also possesses a gene encoding a free-standing ACT domain in its genome (incorrectly annotated as pheB). An additional fusion of genes encoding an ACT domain and tyrA (that arose independently, judging from the widely spaced tree positions) occurred in the common ancestor of Xanthomonas and Xylella. Actinobacteria usually possess a C-terminal extension that probably functions as an allosteric domain. The extension possessed by the Actinobacteridae_2 assemblage, which includes Streptomyces coelicolor and its relatives, appears to be an ACT domain. On the other hand, it is not at all clear that the C-terminal extension of the Actinobacteridae_1 assemblage is an ACT domain. This difference, in addition to the differing specificities for pyridine nucleotide substrate, may have contributed to the overall TyrAa divergence observed between the two Actinobacteridae groups. There is no correlation between presence of the ACT domain and specificity for cyclohexadienyl substrate since TyrA p from the Bacillus clade is PPA-specific, Xanthomonas/Xylella TyrAc is broadly specific, and Streptomyces TyrAa is AGN-specific.
B. subtilis, which belongs to the large clade having an ACT domain as a carboxy extension, has been extensively characterized . 4-Hydroxyphenylpyruvate is an effective competitive inhibitor, as would be consistent with our proposed effects at the catalytic core for a PPA-specific enzyme. However, TYR, phenylalanine (PHE) and tryptophan were also inhibitors. The violation of the rule that the latter three amino acid inhibitors would not be expected to bind the catalytic core region (because they have alanyl sidechains even though the substrate-binding site only recognizes the pyruvyl sidechain of prephenate) and the finding that some of these were not competitive inhibitors can now be accounted for by the presence of the allosteric ACT domain. A carboxy extension shared by a number of Archaea (denoted 'REG' in Fig. 2) is presumably a regulatory domain as well. This is consistent with the recent result of Porat et al.  that not only 4-hydroxyphenylpyruvate, but also TYR, inhibited prephenate dehydrogenase activity of Methanococcus maripaludis.
The tyrA gene is a popular fusion partner
Fusion with aroQ
tyrA may be fused with a number of other catalytic domains, each of them relevant to aromatic biosynthesis (Fig. 2). aroQ (encoding chorismate mutase) is frequently fused with a number of other aromatic-pathway genes. The lower-gamma Proteobacteria (enteric lineage) located at twelve o'clock in Fig. 2 possess an aroQ•tyrA c_Δ fusion. The fusion physically links chorismate mutase (which forms PPA) with TyrAc_Δ (which utilizes PPA). The two protein domains of AroQ•TyrAc_Δ may have co-evolved to produce cooperative protein-protein interactions since physical separation of the domains yielded relatively low levels of both activities in E. coli. Substantial comparative work shows that the aroQ•tyrA c_Δ fusion has been stably maintained throughout the entire enteric lineage. Exceptions in some genomes lacking this fusion altogether can be attributed to reductive evolutionary loss in pathogens (e.g., Haemophilus ducreyi) or endosymbionts (e.g., Buchnera aphidicola). An independent aroQ•tyrA fusion was generated in the common ancestor of Sulfolobus solfataricus and S. tokodaii (Fig. 2). Since the TyrA domain of Sulfolobus species lacks the indel structure of the TyrAc_Δ class, it would be interesting to see whether physical separation of the two domains would yield evidence of independent function, in contrast to the results mentioned just above for E. coli.
Fusion with aroF
Secondly, tyrA c has been fused with aroF on at least two separate occasions in Bacteria. (The aroF gene encodes enolpyruvylshikimate-3-P synthase, the sixth enzyme in the common pathway of aromatic biosynthesis; see [5, 6] for nomenclature used.) One clade includes members of the upper-gamma Proteobacteria: P. aeruginosa, P. syringae, P. putida, P. stutzeri, P. fluorescens and Azotobacter vinelandii. It is interesting that P. syringae has experienced a deletion of about 200 residues at the N-terminal region of the AroF domain. This has been coupled with the acquisition of a stand-alone aroF gene that is absent in other members of the clade. Interestingly, the latter AroF shows high identity only with AroF from Agrobacterium tumefaciens, an alpha proteobacterium. The A. tumefaciens aroF, in turn, is unique compared to its α-subdivision relatives, both in having divergent sequence and in being unlinked to cmk and rpsA. Thus, it seems likely that the incongruence of AroF belonging to both P. syringae and A. tumefaciens reflects acquisition via LGT from some as yet unknown source. The disruption of the fused aroF domain in P. syringae is an unusual instance where the catalytic function of one fusion domain has become discarded while the function of the second domain has been retained. It is interesting to consider the possibility that the truncated remnant of the aroF fusion domain might be exploitable for use as an innovative source of a new regulatory domain. An additional fusion of tyrA with aroF has occurred independently within the beta Proteobacteria in the common ancestor of Burkholderia pseudomallei and B. mallei. This has been very recent since the closely related B. fungorum and B. cepacia organisms lack the fusion.
It has been suggested that presence of a given fusion may be useful for sorting out clades that diverged from a common ancestor, independent of other methods . Different fusions offer the power of discriminating clades at various hierarchical levels, i.e., nested clades discriminated by nested gene fusions. The tyrA•aroF fusion occurred in the common ancestor of the clade that includes the upper-gamma Proteobacteria shown in Fig. 2. One can reasonably assume that relatively close upper-gamma organisms lacking the tyrA•aroF fusion diverged from the common ancestor of the fusion clade prior to the fusion event. Such would appear to be the case, for example, with Acidithiobacillus ferrooxidans, an outlying member of the upper-gamma Proteobacteria that lacks the fusion. It is reasonable to conclude that the fusion event must have pre-dated the differential specialization for the pyridine nucleotide cosubstrate that distinguishes Acinetobacter sp. (NADP+-specific) from the large grouping of pseudomonads that are NAD+-specific.
Fusion with hisH b
Thirdly, a single organism, Rhodobacter sphaeroides, possesses a hisH b•tyrA fusion that must have occurred very recently. hisH b encodes an aromatic aminotransferase that is closely related to (or sometimes even synonymous with) imidazole acetol phosphate aminotransferase . The hisH b /tyrA/aroF linkage group is part of a supraoperon in some gram-negative bacteria in which a relatively conserved, yet frequently shuffled gene order is observed [5, 6]. Hence, it is reasonable to assume that at the time just prior to fusion, hisH b, tyrA and aroF were adjacent. Note that among the fusions currently known, hisH b and aroF are fused to the N-terminal and C-terminal ends of tyrA, respectively. It would be interesting to know the substrate specificity of the R. sphaeroides TyrA domain. If it is AGN-specific the significance of hisH b presumably would be to transaminate PPA to form AGN, the substrate used by TyrAa (see Fig. 1). On the other hand, if the dehydrogenase is PPA-specific, the significance of the HisHb domain would be to transaminate the product of the TyrAp reaction. If the enzyme is a TyrAc enzyme (as is probable), then HisHb likely is competent to catalyze either of the foregoing reactions.
Fusion with ACT
The widespread ACT regulatory domain appears to have been acquired by independent fusions at least three separate times judging from the widely separated lineages that possess a TyrA•ACT fusion (Fig. 2). Xie et al. initially noted homologous domains positioned at the N terminus of mammalian phenylalanine hydroxylase and at the C terminus of most microbial prephenate dehydratases. This domain is responsible for phenylalanine-mediated activation and phenylalanine-mediated inhibition of the hydroxylase and dehydratase enzymes, respectively. This domain was later named the ACT domain and shown to be a widely distributed domain family that shares a conserved overall fold. Members of the ACT-domain family possess a wide variety of different ligand-binding capabilities. For example, the ACT domain of 3-phosphoglycerate dehydrogenase binds L-serine as an allosteric inhibitor.
Fusion with REG
Another putative regulatory domain fused to tyrA (denoted tyrA•REG) is thus far restricted to some of the Archaea. This domain is a predicted regulatory domain, as described in COG4937.
A novel 4-domain fusion
Archaeoglobus fulgidus exhibits a striking four-domain fusion consisting of three catalytic domains and a regulatory ACT domain (TyrA•AroQ•PheA•ACT). The TyrA domain is predicted to belong to the TyrAc_Δ class when used as a query input into the AroPath Specificity Predictor Tool . We speculated earlier that the •AroQ fusion domain of Archaeoglobus may exercise cooperative interactions with TyrAc_Δ, as appears to occur between the AroQ•TyrAc_Δ domains of E. coli and its relatives.
tyrA in its syntenic context
Although the genes of prokaryotes have clearly been subject to frequent scrambling, some gene-gene associations persist more tenaciously than others. Xie et al. [5, 6] asserted that one such ancestral gene string that has resisted scrambling forces is hisH b > tyrA > aroF. As suggested above, contemporary gene fusions can serve as frozen-in-time indicators of ancient gene organizations that were later obscured by gene-scrambling events. Another gene string that is often within the syntenic region of hisH b, tyrA, and aroF is cmk > rpsA. Gene synteny in prokaryotes has not been easily recognized because substantial manual scrutiny in combination with a sufficient density of genomic representation on a given portion of the phylogenetic tree is necessary to detect patterns of synteny that are camouflaged by frequent scrambling events (inversion, deletion and transposition).
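Detection of a conserved gene string such as hisH b > tyrA > aroF in an annotated gene order can be sketched in Python as follows. This is only an illustration of the idea and not the procedure used in this study, which relied on manual scrutiny. The function name, the `max_gap` parameter, and the example gene orders are hypothetical. Strand information is ignored: the reverse reading of the gene order simply stands in for the same string located on the opposite strand.

```python
def contains_gene_string(gene_order, gene_string, max_gap=0):
    """Report whether gene_string occurs, in order, within gene_order.

    gene_order: list of gene names along one replicon (hypothetical input).
    gene_string: non-empty list of gene names, e.g. ["hisHb", "tyrA", "aroF"].
    max_gap: number of unrelated genes tolerated between consecutive
        members of the string (0 means strictly adjacent).
    The gene order is scanned in both directions.
    """
    def scan(order):
        for start, gene in enumerate(order):
            if gene != gene_string[0]:
                continue
            position = start
            matched = True
            for target in gene_string[1:]:
                # Genes allowed to follow the current member of the string.
                window = order[position + 1 : position + 2 + max_gap]
                if target not in window:
                    matched = False
                    break
                position = position + 1 + window.index(target)
            if matched:
                return True
        return False

    forward = list(gene_order)
    return scan(forward) or scan(forward[::-1])


if __name__ == "__main__":
    supraoperon = ["serC", "aroQ-pheA", "hisHb", "tyrA", "aroF", "cmk", "rpsA"]
    print(contains_gene_string(supraoperon, ["hisHb", "tyrA", "aroF"]))
    # One inserted gene between aroF and rpsA is tolerated with max_gap=1.
    print(contains_gene_string(["aroF", "lytB", "rpsA"], ["aroF", "rpsA"], max_gap=1))
```

Allowing a small gap reflects the observation that insertions frequently interrupt an otherwise conserved string, whereas inversions and transpositions break it entirely.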
The domain Bacteria is now represented by a collection of sequenced genomes that is progressively approaching the genomic densities needed for meaningful analysis. Figure 5 provides a visual sense of the frequency with which tyrA is closely positioned with other genes of aromatic biosynthesis, as well as the underlying patterns of overall synteny. These patterns are unstable, and yet persistent traces of synteny can be seen where genomic representation is sufficiently dense. The four genes of particular emphasis in this paper are color coded. Other genes that are engaged in aromatic biosynthesis are colored grey, and any other genes are white. At a very deep level of phylogenetic branching, Thermotoga exhibits a tyrA gene flanked by seven genes encoding all of the common steps of aromatic biosynthesis (two of them being fused). Since closely related genomes are not yet available here, we cannot judge whether these genes came together recently or whether an ancient pattern of synteny has been retained. Although tyrA is not linked to any functionally relevant genes in Aquifex, representing another point of deep phylogenetic branching, this does not necessarily mean that tyrA was not already generally associated with other aromatic-pathway genes at an early time. For reasons that are totally mysterious, certain scattered lineages exhibit a total lack of operon organization for aromatic-pathway genes (and indeed for most other biosynthetic pathways, such as that for histidine biosynthesis). These lineages (Fig. 5) include, besides Aquifex, those of Deinococcus, the actinomycetes, the cyanobacteria, and Chlorobium. Except for the actinomycetes, this phenomenon of total gene dispersal also applies to genes of tryptophan biosynthesis [7, 8].
When the various examples of hisH b > tyrA > aroF linkage are mapped on a 16S rRNA tree, they first appear in gram-positive bacteria. In Bacillus and related organisms (such as Listeria), the hisH b > tyrA > aroF unit is associated with a large ancestral operon consisting of aroG > aroB > aroH > hisH b > tyrA p > aroF. Bacillus additionally possesses the cmk > rpsA unit, albeit in a separate location. Interestingly, in one narrow subclade (B. subtilis, B. halodurans and B. stearothermophilus) the trp operon has been inserted between aroH and hisH b to yield a supraoperon that has been fully characterized as a complex functional unit . See Xie et al.  for a full presentation of evolutionary interpretation relevant to the latter. Though highly scrambled, a pattern of association of pheA with hisH b > tyrA >aroF is suggested by linkage patterns seen at the hierarchical level of Cytophaga and Bacteroides (Fig. 5). aroQ became associated with pheA through gene fusion as early as the divergence of the Spirochaetes to yield an aroQ•pheA>tyrA>aroF>cmk>rpsA linkage unit (Leptospira interrogans in Fig. 5). The aroQ•pheA gene associated with tyrA and aroF in Clostridium difficile appears to have arisen from a distinctly different fusion event than that present in delta, epsilon, beta and upper-gamma Proteobacteria or from that present in lower-gamma Proteobacteria (based upon analysis of inter-domain linker regions; unpublished data).
Consensus ancestral gene organizations for the most densely represented divisions of Proteobacteria have been deduced as shown at the bottom of Fig. 5. Detailed information that supports a deduced consensus for ancestral gene organizations with respect to beta Proteobacteria, upper-gamma Proteobacteria, and lower-gamma Proteobacteria is shown later (Figs. 6, 7). We suggest that the last common ancestor of all Proteobacteria possessed the gene organization aroQ•pheA>hisH b>tyrA>aroF>cmk>rpsA. This is similar to the synteny that has been retained in general by the beta Proteobacteria and the upper-gamma Proteobacteria. The aroQ•pheA>hisH b>tyrA portion likely specified all the catalytic requirements for conversion of chorismate to PHE and conversion of chorismate to TYR. Chorismate mutase activity specified by the aroQ domain could supply PPA for both PHE and TYR biosynthesis. Likewise, HisHb, widely utilized as an aromatic aminotransferase, could also function for both PHE and TYR biosynthesis. Though currently available members of delta and epsilon Proteobacteria exhibit substantial gene scrambling, the various fragmentary linkage patterns seen provide support for the ancestor proposed. Geobacter (and other Delta_1 members) has the aroQ•pheA > tyrA > aroF > cmk > rpsA linkage group (with lytB inserted between cmk and rpsA). Desulfovibrio vulgaris, another delta Proteobacterium (Delta_2) that is highly divergent from Geobacter, has a very interesting pattern of conservation and scrambling. aroQ•pheA > aroF > tyrA has been attached to a complete 7-gene trp operon. hisH b > cmk (not shown in Fig. 5) is completely separated from rpsA. The supraoperonic gene organization shown for D. vulgaris begins with two recently discovered genes, herein denoted aroA' and aroB', that encode enzymes specifying an alternative biochemical route to dehydroquinate.
The epsilon Proteobacteria all display significant gene scrambling, but piecemeal evidence for the unscrambled ancestor proposed is present. For example, Campylobacter jejuni possesses an aroQ•pheA > hisH b unit, as well as aroF > lytB > rpsA (Fig. 5). Wolinella succinogenes and Helicobacter hepaticus both possess an aroF > lytB > rpsA unit.
The ancestor of alpha Proteobacteria has lost the aroQ•pheA fusion, and a stand-alone pheA is consistently observed. Members of this group are quite uniform in the stable possession of hisH b > tyrA and aroF > cmk > rpsA as two separated linkage groups. The beta Proteobacteria are represented by members that have the gene organization: serC > aroQ•pheA > hisH b > tyrA > aroF > cmk > rpsA. This is also seen in the members of the upper-gamma Proteobacteria.
Figure 5 includes organisms that illustrate the traces of synteny that can be detected in Bacteria where overall genome representation is just barely adequate. The following two figures illustrate how syntenic patterns of more resolution and refinement become evident with denser genome representation.
Zooming in on syntenic contexts of proteobacteria
Beta proteobacteria and upper-gamma proteobacteria
The beta Proteobacteria exhibit a dynamic but still interpretable pattern of altered synteny (Fig. 6 and Table 5). Species of Ralstonia have retained the proposed ancestral synteny that is marked with yellow highlighting in Fig. 6. This syntenic organization is such that the aromatic-gene unit aroQ•pheA > hisH b > tyrA > aroF is nested between gyrA > serC at the leftward flank and cmk > rpsA > himD at the rightward flank. Species of Burkholderia (the next closest lineage) are almost identical, but exhibit individual evolutionary events (marked by circled numbers on the left, which correspond to a description of the proposed evolutionary events given in companion Table 5). These events include gene insertion, loss of hisH b, translocation of genes away from the ancestral supraoperon, and fusion of tyrA and aroF (in the common ancestor of B. mallei and B. pseudomallei). At a deeper level in the beta Proteobacteria section of the tree, Nitrosomonas europaea exhibits a separation of the ancestral supraoperon between tyrA and aroF. Either a very large insertion was made between tyrA and aroF, or one of the two gene clusters shown was transposed as part of a sufficiently large segment to include all of the conserved flanking genes. In Chromobacterium violaceum tyrA has become completely isolated from other gene members of the ancestral supraoperon, and aroF has assumed an inverted orientation with respect to cmk. Species of Neisseria exhibit no remnants of supraoperon synteny at all, and wholesale dispersal of all the supraoperon genes has occurred. (It is interesting that among the beta Proteobacteria, Neisseria species are also unique in that all of the trp-pathway genes are dispersed ).
The gamma Proteobacteria have separated into two distinctly different synteny patterns. The lower-gamma Proteobacteria have undergone marked syntenic change (see below). The assemblage portrayed between Acidithiobacillus and Microbulbifer in the lower part of Fig. 6 (termed the upper-gamma Proteobacteria) exhibits a strong overall syntenic resemblance of supraoperon genes to that of the beta Proteobacteria. Acidithiobacillus possesses a near-intact ancestral supraoperon, differing only in having two insertions: one gene encoding 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase between hisH b and tyrA, and the other being the insertion of serA between serC and aroQ•pheA. Pseudomonas aeruginosa and P. stutzeri have also retained nearly intact ancestral supraoperons, differing only in the fusion of tyrA and aroF. The serC > aroQ•pheA > hisH b > tyrA•aroF > cmk > rpsA supraoperon has been studied in P. stutzeri [5, 6]. The tyrA•aroF fusion occurred in the common ancestor of the clade shown between Azotobacter and Microbulbifer in Fig. 6. The supraoperons of P. syringae, P. fluorescens and P. putida lack hisH b. P. syringae exhibits a recent C-terminal truncation of the aroF domain, coupled with acquisition elsewhere in the genome of a free-standing aroF that is not phylogenetically congruent (probably of LGT origin). Acinetobacter sp. and Microbulbifer degradans possess an aroQ•pheA > tyrA•aroF unit that has become dissociated from serC at one end and from cmk at the other end. In Xylella and Xanthomonas, hisH b has been deleted from the genome and tyrA has been transposed away from serC > aroQ•pheA > aroF. The latter unit has been transposed away from gyrA, the ancestral flanking gene. On the other hand, cmk > rpsA has remained next to himD, the gene usually flanking rpsA.
The enteric lineage
The lower-gamma Proteobacteria differ sharply from upper-gamma Proteobacteria in their possession of the tyrA c_Δ class of tyrA and its fusion with aroQ. In Fig. 2 this clade of AroQ•TyrAc_Δ fusions was presented as one exhibiting absolute specificity for NAD+, combined with an overwhelming but not complete specificity for PPA. In Fig. 7 the gene synteny associated with tyrA c_Δ is profiled against the 16S rRNA phylogenetic trees of the lower-gamma Proteobacteria possessing these genes, and the proposed evolutionary events are summarized in the companion Table 6. Figure 5 has indicated a synteny consensus for the common ancestor at this hierarchical level whereby gyrA > serC > hisH b > aroF > cmk > rpsA parallels the ancestral synteny of β-Proteobacteria, but without aroQ•pheA or tyrA in the middle of the linkage group. Many dynamic evolutionary events of altered aromatic biosynthesis have occurred within the lower-gamma Proteobacteria since their divergence from the upper-gamma Proteobacteria. These include the emergence of three allosterically distinct DAHP synthases, one of which now comprises the two-gene, three-domain tyr operon (aroA Iα_Y > aroQ•tyrA c_Δ). The upper-gamma Proteobacteria characteristically possess the aroA Iα paralogs encoding AroAIα_H (TRP-inhibited DAHP synthase) and AroAIα_Y (TYR-inhibited DAHP synthase). It has been asserted that AroAIα_F (PHE-inhibited DAHP synthase) was the most recent paralog, acquired just after divergence of the lower-gamma Proteobacteria. It is bizarre that Shewanella oneidensis possesses a pseudogene of aroA Iβ fused to the C terminus of aroQ•pheA. The aroA Iβ subclass of Family-I DAHP synthases is not usually observed in gram-negative bacteria.
The dissociation of tyrA c_Δ from the serC/rpsA linkage group correlates with the fusion of aroQ with tyrA c_Δ. The aroQ•pheA fusion has also escaped from the serC/rpsA linkage grouping and has become linked with the newly emerged tyr operon. Some sort of duplication and recombinational event between aroQ•pheA and tyrA c_Δ may have led to the creation of aroQ•tyrA c_Δ, since the AroQ•PheA proteins of lower-gamma Proteobacteria are distinct from AroQ•PheA proteins of other Proteobacteria with respect to the inter-domain linker length and the indel content (data not shown).
Although it usually is absent from the lower-gamma Proteobacteria, HisHb has persisted as the broad-specificity aromatic aminotransferase in the Pasteurella/Haemophilus grouping where two hisH paralogs are generally present, one of narrow specificity (denoted hisH n) being within the histidine operon. The aspC gene next to aroF in Shewanella is a paralog that probably functions as an aromatic aminotransferase, suggestive of the situation in the E. coli grouping where tyrB is a close paralog of aspC, tyrB having become specialized for aromatic biosynthesis. Gene reduction associated with both endosymbiotic and pathogenic lifestyles is evident. Thus, Buchnera lacks tyrA, cmk, hisH, tyrB, and possesses only a single aroA Iα species (aroA Iα_H). Haemophilus ducreyi also lacks tyrA, as well as aroAIα_H and the entire trp operon.
TyrA in its context of regulation
Knowledge of the gene regulation impacting TyrA in prokaryotes is sparse, being limited to the lower-gamma Proteobacteria. Here, extensive information gathered from E. coli has revealed that aroQ•tyrA c_Δ belongs to a large regulon controlled by the TyrR repressor. The limited phylogenetic distribution of TyrR, being present only in the lower-gamma Proteobacteria (Fig. 8), indicates that it is a recent evolutionary acquisition. In E. coli the regulon members that are under the control of tyrR are the aroA Iα_Y > tyrA operon, the aroLM operon, tyrP, tyrB, aroP, mtr, aroA Iα_F and tyrR itself. Thus, tyrR not only regulates the tyrosine branch of the pathway, but heavily impacts the common pathway and the transport of all three aromatic amino acids as well.
Although outside the scope of this study, a logical expansion of it would be to examine the individual evolutionary histories of all the members of the contemporary E. coli TyrR regulon, i.e., asking when and in what order these genes came under the influence of tyrR. Clearly, the recruitment of structural genes by tyrR has been recent and quite dynamic, and even now exhibits evidence of ongoing change. For example, tyrosine phenol-lyase (a catabolic enzyme that is only sparsely present in gamma Proteobacteria) has been recruited to the TyrR regulons of Erwinia herbicola and Citrobacter freundii. In these cases, not only does TyrR perform as a transcriptional activator, but it requires cyclic AMP receptor protein and integration host factor to do so.
As exemplified by E. coli, TyrR is generally a repressor. However, the transcriptional expression of mtr is activated by TyrR in the presence of TYR, and tyrP is activated in the presence of PHE (although it is repressed in the presence of TYR). The N-terminal domain of TyrR has been associated with the ability of TyrR to activate transcription in the case of mtr and tyrP. Members of the Haemophilus/Pasteurella lineage have all lost the N-terminal domain and presumably all lack the ability to accomplish transcriptional activation, as has been demonstrated experimentally with H. influenzae TyrR.
In view of the interesting complexity that two operons (mtr and aroLM) in E. coli are regulated by both tyrR and trpR, it may be more than coincidental that tyrR and trpR seem to have emerged at about the same evolutionary time, i.e., coincident with the divergence of the upper-gamma Proteobacteria from the lower-gamma Proteobacteria (Fig. 7). A possible interaction between the TyrR and TrpR proteins has been noted.
PhhR in relationship to aromatic catabolism
Arias-Barrau et al. have recently characterized a central catabolic pathway (Hmg) that degrades homogentisate in three steps to fumarate and acetoacetate as a source of carbon and energy. One of several peripheral pathways feeding into the central pathway begins with PHE and produces homogentisate via the reactions of phenylalanine hydroxylase (Phh), aromatic aminotransferase, and 4-hydroxyphenylpyruvate dioxygenase (Hpd). In the absence of Phh, a shorter version of the peripheral pathway is one that can use TYR, but not PHE, as a source of carbon and energy. In Fig. 8 the presence of the Phh, Hpd, and Hmg segments of catabolism is mapped on a 16S rRNA tree. (The aromatic aminotransferase distribution is not shown since a multiplicity of aromatic aminotransferases having overlapping substrate specificities makes it particularly challenging to identify the functional role.) The cyanobacteria are unique among Bacteria in the use of Hpd for a completely different metabolic role unrelated to aromatic catabolism, i.e., the synthesis of vitamin E derivatives.
PhhR is a homolog of TyrR that has been shown in P. aeruginosa to be a divergently transcribed activator of a 3-gene operon needed for PHE and TYR catabolism. The structural genes encode phenylalanine hydroxylase (phhA), carbinolamine dehydratase (phhB) and 4-hydroxyphenylpyruvate aminotransferase (phhC), and are transcribed from a σ54 promoter [61, 62]. PhhR evolved relatively recently since it is only present in some gamma Proteobacteria (Fig. 8). The ancestral regulatory gene for the Phh peripheral pathway may have been a member of the leucine-responsive regulatory protein/asparagine synthase C (Lrp/AsnC) family, judging from the adjacent and divergently oriented position of asnC genes relative to phhA in organisms such as Xanthomonas axonopodis and Mesorhizobium loti. A recent overview of the many different regulator families involved in the control of aromatic catabolism conveys an emerging sense of the variety and dynamic evolutionary processes that underlie aromatic catabolism. Occasional distant homologs of phhR that appear in erratic fashion (see Fig. 9) may have some other regulatory function. For example, Clostridium tetani may use its PhhR homolog as a transcriptional activator of the gene encoding tyrosine phenol-lyase, as occurs in species of Erwinia and Citrobacter.
Relationship of TyrR and PhhR
What might be the origin of TyrR? TyrR is an anomalous member of the large prokaryote family of σ54 enhancer-binding proteins that activate promoters dependent upon σ54. TyrR is unique within its homology grouping in that it targets σ70 promoters for regulation, usually (but not always) being a repressor. Its closest homolog is PhhR, a canonical member of the σ54 enhancer-binding proteins. σ54-dependent enhancer proteins possess a highly conserved σ54-contact motif, GAFTGA, that is intimately involved in formation of the ternary complex of enhancer and σ54-RNA polymerase holoenzyme. This motif is perfectly or nearly perfectly retained in the upper clades shown in Fig. 9, but is disrupted or completely absent in the clades between Shewanella oneidensis and Pasteurella multocida. The deeper phylogenetic distribution of PhhR (Fig. 8) suggests that TyrR evolved as a variant of PhhR. If correct, a regulatory gene that is oriented to catabolism (phhR), and itself of relatively recent origin, was conscripted even more recently for a completely new role in the regulation of primary biosynthesis (tyrR).
Consistent with the latter supposition, the gain of TyrR generally correlates with the loss of competence for aromatic catabolism (Fig. 8). In contrast to the Citrobacter/Salmonella/Escherichia/Shigella and the Pasteurella/Haemophilus clades (whose TyrR homologs completely lack the GAFTGA motif), the remaining enteric clades have retained some residues in this region. These residues appear to be more than random remnants. It would be interesting to know if these residues have any functional significance. Indeed, the Photobacterium/Vibrio clade has retained the ancestral catabolic capabilities (Fig. 8) that would appear to demand retention of regulation via PhhR; yet the parallelism of the overall features of biosynthesis that are shared with other lower-gamma Proteobacteria would seem, on the other hand, to demand TyrR-mediated regulation. Perhaps this "TyrR" species participates in the regulation of both catabolic and biosynthetic genes. In this connection, it is interesting that Chaney et al. found that a change in the GAFTGA motif of NifA could be partially "suppressed" by mutational changes in the N-terminal region of σ54.
Even more striking as a possible evolutionary intermediate is the most outlying member of the lower-gamma Proteobacteria, Shewanella oneidensis. The position of its TyrR on the protein tree parallels expectations based on the 16S rRNA tree. This, plus the conservation of the TyrR regulon features and the overall gene synteny, suggests E. coli-like function as TyrR, i.e., acting as a general repressor of regulon-member σ70 promoters engaged in aromatic biosynthesis. However, the location of "tyrR" in S. oneidensis between phhA and phhB on one side, and hmgB and hmgC on the other side, strongly implies some kind of regulatory relationship with the catabolic genes. It would be quite interesting to determine experimentally whether "TyrR" in S. oneidensis (and maybe Vibrio, as well) can function as a repressor of the usual suite of σ70 promoters, as well as an activator of σ54 promoters for phhA/phhB and/or hmgB/hmgC.
We suggest that TyrR evolved as a modified version of PhhR as follows. In view of the distribution of genes encoding PhhR and TyrR, as well as the aforementioned catabolic enzymes, the most parsimonious evolutionary scenario may be that central and peripheral catabolic pathways depicted in Fig. 8 are quite ancient, but acquisition of PhhR as a σ54-dependent activator of phenylalanine hydroxylase was quite recent, originating about the time of divergence of gamma Proteobacteria. The clade defined by Shewanella/Vibrio/Photobacterium retained the catabolic pathway, whereas the other enteric lineages discarded the catabolic pathway, but retained PhhR, which was then recruited as a σ70-dependent regulator of aromatic biosynthesis (TyrR).
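The assessment of GAFTGA retention discussed in this section can be illustrated with a minimal sketch. It slides an ungapped six-residue window along a protein sequence and reports the window with the most identities to GAFTGA. The toy sequences and the three-way classification thresholds are hypothetical and chosen only for illustration; they are not taken from Fig. 9.

```python
MOTIF = "GAFTGA"  # sigma-54 contact motif named in the text


def best_motif_match(protein: str, motif: str = MOTIF) -> tuple[int, int, str]:
    """Return (start, identities, window) for the ungapped window most similar to motif.

    Ties are resolved in favour of the first (most N-terminal) window.
    """
    best = (-1, -1, "")
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        identities = sum(a == b for a, b in zip(window, motif))
        if identities > best[1]:
            best = (start, identities, window)
    return best


def classify(identities: int, motif_len: int = len(MOTIF)) -> str:
    """Coarse, arbitrary labels; the cut-off of two mismatches is an assumption."""
    if identities == motif_len:
        return "intact"
    if identities >= motif_len - 2:
        return "partially retained"
    return "disrupted or absent"


if __name__ == "__main__":
    # Hypothetical toy segments, not real TyrR or PhhR sequences.
    toy = {
        "canonical_like": "MKLEGAFTGAVRD",
        "remnant_like": "MKLEGSFTNAVRD",
        "motif_free": "MKLEPQRSDLVRD",
    }
    for name, seq in toy.items():
        start, identities, window = best_motif_match(seq)
        print(name, start, window, classify(identities))
```

Because a six-residue motif will find chance partial matches somewhere in any full-length protein, a scan of this kind is only meaningful when restricted to the aligned region where the motif is expected.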
Regulation by attenuation
Attenuation is a widespread mechanism of regulation whereby transcripts initiated at given promoters can be terminated prior to reaching the structural genes of an operon. Whether termination occurs usually depends on the balance (dictated by a variety of mechanisms) between mutually exclusive terminator and anti-terminator structures.
Merino has developed a website to provide a database of putative attenuators ahead of various operons in Bacteria. We screened this database for likely attenuators relevant to the regulation of tyrA. Table 7 shows intriguing results that point to significant experimental work that would be desirable. tyrA is frequently a member of apparent supraoperons, as alluded to elsewhere in this paper, and some of these appear to be large gene clusters controlled by attenuation. Substantial work is needed to establish the depth of clades possessing a given attenuator. For example, the hisH b > tyrA operon is reliably present throughout all alpha Proteobacteria. Since Agrobacterium tumefaciens has been found to possess an attenuator ahead of the hisH b > tyrA operon, one might reasonably expect that most of the alpha Proteobacteria would possess the attenuator as well. If not, this attenuator would have been a very recent evolutionary innovation. Likewise, since the aroA Iα_Y > tyrA operon is widely present throughout the lower-gamma Proteobacteria, it would be interesting to confirm whether only the several species of Vibrio identified on the Merino website have an attenuator ahead of this operon (or whether other attenuators present are too weak to exceed the threshold imposed for preliminary detection).
Some of the supraoperons that appear to be controlled by attenuation are interesting in that they contain the majority of genes needed for both PHE and TYR biosynthesis, e.g., the supraoperons in Enterococcus faecalis and Streptococcus pneumoniae. The latter organism displays two attenuator units. The supraoperon of Desulfovibrio vulgaris is novel in that it begins with two relatively rare genes encoding alternative enzyme steps for aromatic biosynthesis, denoted here as aroA' and aroB'. The leading five genes are adjacent to the seven-gene trp operon.
Protein divergence within a vertical genealogy is not necessarily smooth and progressive. Qualitative biochemical innovations can result in a barrage of new selective pressures that result in evolutionary jumps. The consequent incongruence might easily be mistaken for LGT. The basis for evolutionary jumps will usually only be recognized by detailed and comprehensive analyses of any given subsystem. Examples in this study are as follows. (i) The tyrA c_Δ gene of the lower-gamma Proteobacteria has diverged markedly from tyrA c of the upper-gamma Proteobacteria. Here the milestone event was fusion of aroQ to a putative tyrA c in the ancestor of lower-gamma Proteobacteria to produce aroQ•tyrA c_Δ. Indels within the •tyrA c_Δ domain presumably reflect a multiplicity of selections for functional interactions known to exist between the two fused domains as discussed earlier. (ii) Members of the subclass taxon Actinobacteridae possess TyrAa proteins that separate into two distinct groupings. The presumed ancestral NAD(p)TyrAa that is still present in the Actinobacteridae_1 clade very likely spawned the divergent NAD+-specific variety of TyrAa to yield the contemporary Actinobacteridae_2 clade.
The previous evolutionary analysis of trp-pathway genes [7, 8] can be viewed as a model for comparable studies with other gene systems. Expansion to the greater aromatic pathway is a logical extension. The dynamics of evolutionary change for tyrA can be matched to the dynamics exhibited by the trp system. For example, the lower-gamma Proteobacteria separate as a distinct phylogenetic unit from beta Proteobacteria and upper-gamma Proteobacteria on criteria defined by milestone evolutionary events that altered many character states of both tryptophan and tyrosine biosynthesis in the lower-gamma Proteobacteria. In the future one can anticipate that comprehensive and systematic phylogenetic analysis of each protein member of the TYR, PHE and TRP branches, the common aromatic-pathway trunk, and minor vitamin-like branches (such as the 4-aminobenzoate/folate branch) will accommodate a progressively integrated picture of the entire aromatic network, including catabolic pathways and many other specialized pathways.
Most TyrA sequences were obtained from the National Center for Biotechnology Information (NCBI). TyrA sequences from incomplete genomes were retrieved from the PEDANT database. Several sequences in our curated TyrA collection have been corrected for incorrect translation start sites. Various curated TyrA sequence files can be downloaded from our website. These files include complete sequences, trimmed catalytic-core domains, and amino-acid sequence segments that are relevant to specificity for pyridine nucleotide or to specificity for the cyclohexadienyl substrate. The sequence files are summarized in Table 3.
TyrA proteins that cluster together on the TyrA protein tree in congruence with the 16S rRNA tree are called congruency groups. Exact correspondence of branching orders is not necessarily observed. So far, congruency groupings have been assembled for tryptophan-protein concatenates and for TyrA proteins. Completion of equivalent work with the remaining aromatic-pathway segments will identify the repertoire of bacterial organisms in possession of a "pure" vertical genealogy with respect to aromatic biosynthesis. Congruency groups for TyrA can be accessed at our AroPath website, where a listing of the membership of congruency groups is maintained and updated. Any members of congruency-group clusters, whose position there is incongruent with 16S rRNA expectations, probably (but not necessarily) originated by LGT. The donor lineage may not be obvious, but as more genomes come on line, many cases where donor identities are currently unknown may become revealed. A listing of "orphan" TyrA proteins that belong to no current congruency group is given. Such orphans reflect the lack of sufficient genome representation in particular phylogenetic regions and undoubtedly will become the nucleus for additional congruency groups in due course.
Multiple alignments were obtained by use of the ClustalW or ClustalX programs (Version 1.83). Manual adjustments were needed in the region of the GxGxxG motif for binding of pyridine nucleotide cofactor in the N-terminal region of TyrA proteins. Alignment in this region was guided by maximizing conformity with the Wierenga fingerprint, making allowance for a variable loop of 2–5 residues. This was done with the assistance of the BioEdit multiple alignment tool of Hall (5.0.9 Edition). The refined multiple alignment was used as input for generation of a phylogenetic tree using the phylogeny inference package PHYLIP (Version 3.2). The neighbor-joining program was used to obtain a distance-based tree. The distance matrix was obtained by use of Protdist with a Dayhoff PAM matrix. The Seqboot and Consense programs were then applied to assess the statistical support of the tree using bootstrap resampling (1,000 replications). We also used the ANCESCON package, which produced similar results as shown in Fig. 2 (albeit with even wider separation of many groups). The presence of regulatory domains (ACT and REG) was accepted when indicated by the Domain Architecture Retrieval Tool (DART) on the BLAST menu at NCBI.
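The bootstrap step described above can be sketched in a few lines. This is not the PHYLIP implementation: the sketch resamples alignment columns with replacement (the idea behind Seqboot), but it substitutes a simple p-distance for the Dayhoff PAM distances of Protdist, and it replaces neighbor-joining plus Consense with a much cruder proxy, namely the fraction of replicates in which two sequences remain mutual nearest neighbours. The four toy sequences and all function names are hypothetical.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatches over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable columns: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def resample_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: draw columns with replacement, same draw for every row."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def nearest(name: str, alignment: dict[str, str]) -> str:
    """Closest other sequence by p-distance (first one wins on ties)."""
    others = (other for other in alignment if other != name)
    return min(others, key=lambda o: p_distance(alignment[name], alignment[o]))


def pair_support(alignment: dict[str, str], a: str, b: str,
                 replicates: int = 100, seed: int = 0) -> float:
    """Fraction of replicates in which a and b are mutual nearest neighbours."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        if nearest(a, rep) == b and nearest(b, rep) == a:
            hits += 1
    return hits / replicates


# Hypothetical toy alignment: A and B are identical, C and D differ from them everywhere.
TOY = {"A": "MKVLAG", "B": "MKVLAG", "C": "GAHTRS", "D": "GAHTRW"}

if __name__ == "__main__":
    print(pair_support(TOY, "A", "B"))
```

A real analysis would rebuild a full tree from each replicate and tally clades across the replicate trees; the mutual-nearest-neighbour proxy is used here only to keep the example short and self-contained.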
Profile hidden Markov models for each of the four TyrA subfamilies, TyrAa, TyrAc, TyrAp and TyrAc_Δ, were built using Sean Eddy's HMMER package. The HMMs were generated from our file of curated cyclohexadienyl-substrate core segments (see Table 3). The seed sequences for each subfamily were first aligned using ClustalW. The resulting multiple sequence alignments were then manually edited to produce more accurate alignment of the seed sequences. Finally, the edited multiple sequence alignments were used to generate the profile HMMs for each TyrA subfamily.
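The idea of scoring a query against one profile per subfamily can be illustrated with a deliberately simplified stand-in. The sketch below builds a position-specific log-odds matrix from ungapped seed segments and assigns a query to the best-scoring subfamily. It is not a profile HMM and does not use HMMER: there are no insert or delete states, the background is assumed uniform, and the pseudocount is arbitrary. The subfamily names come from the text, but the five-residue seed segments are invented for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(seeds: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log2-odds scores against a uniform background (an assumption)."""
    length = len(seeds[0])
    if any(len(s) != length for s in seeds):
        raise ValueError("seed segments must be aligned to the same length")
    background = 1.0 / len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in seeds if s[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return profile


def score(profile: list[dict[str, float]], segment: str) -> float:
    """Sum of column scores; residues outside ALPHABET contribute zero."""
    if len(segment) != len(profile):
        raise ValueError("segment length must match profile length")
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


def classify(profiles: dict[str, list[dict[str, float]]], segment: str) -> str:
    """Name of the subfamily whose profile gives the highest score."""
    return max(profiles, key=lambda name: score(profiles[name], segment))


# Invented seed segments; real seeds would be the curated core segments of Table 3.
TOY_SEEDS = {
    "TyrAa": ["GDWKA", "GDWRA", "GEWKA"],
    "TyrAc": ["NHLPS", "NHIPS", "NHLPT"],
}
PROFILES = {name: build_profile(seeds) for name, seeds in TOY_SEEDS.items()}

if __name__ == "__main__":
    for query in ("GDWKA", "NHLPT"):
        print(query, classify(PROFILES, query))
```

The main design difference from a true profile HMM is that this matrix can only score segments of exactly the profile length, whereas an HMM's insert and delete states tolerate the indels that distinguish, for example, the TyrAc_Δ subfamily.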
Appraisal of gene fusions as one-time or multiple events
Whether any given contemporary gene fusions tracked back to a fusion event in a common ancestor or whether they occurred independently was evaluated by phylogenetic analysis of the individual protein domains and by inspection of the inter-domain linker region. Linker regions were determined by multiple alignments of fusion sequences with corresponding free-standing domains present in the closest relatives to organisms that lack the gene fusions.
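The linker determination described above can be sketched as follows, assuming that the fusion protein and the two free-standing domains have already been placed in one multiple alignment with `-` as the gap character. The linker is taken to be the fusion residues lying between the last aligned column of the N-terminal domain and the first aligned column of the C-terminal domain. The function name and the toy rows are hypothetical.

```python
def linker_region(fusion: str, n_domain: str, c_domain: str) -> str:
    """Return the fusion residues between the two aligned free-standing domains.

    All three arguments are rows of the same alignment (equal length, '-' for gaps).
    An empty string means that the domains abut or overlap in the alignment.
    """
    if not (len(fusion) == len(n_domain) == len(c_domain)):
        raise ValueError("rows must come from the same alignment")
    n_cols = [i for i, ch in enumerate(n_domain) if ch != "-"]
    c_cols = [i for i, ch in enumerate(c_domain) if ch != "-"]
    if not n_cols or not c_cols:
        raise ValueError("each domain row needs at least one aligned residue")
    n_end, c_start = n_cols[-1], c_cols[0]
    if c_start <= n_end:
        return ""
    # Drop any gap characters the fusion row itself carries in this interval.
    return fusion[n_end + 1:c_start].replace("-", "")


if __name__ == "__main__":
    # Invented rows for illustration only.
    fusion = "MKTAYLGSGSPQRVLE"
    n_dom = "MKSAYL----------"
    c_dom = "----------PQKVLE"
    print(linker_region(fusion, n_dom, c_dom))
```

Comparing linker length and composition across fusions from different lineages then gives a rough indication of whether they share a single ancestral fusion event, complementing the phylogenetic analysis of the individual domains.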
Xie G, Bonner CA, Jensen RA: Cyclohexadienyl dehydrogenase from Pseudomonas stutzeri exemplifies a widespread type of tyrosine-pathway dehydrogenase in the TyrA protein family. Comp Biochem Physiol C Toxicol Pharmacol. 2000, 125: 65-83.
Jensen RA: Tyrosine and phenylalanine biosynthesis: relationship between alternative pathways, regulation and subcellular location. Rec Adv Phytochem. 1986, 20: 57-82.
Todd AE, Orengo CA, Thornton JM: Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol. 2001, 307: 1113-1143. 10.1006/jmbi.2001.4513.
Teichmann SA, Rison SCG, Thornton JM, Riley M, Gough J, Chothia C: The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J Mol Biol. 2001, 311: 693-708. 10.1006/jmbi.2001.4912.
Xie G, Brettin T, Bonner CA, Jensen RA: Mixed-function supraoperons that exhibit overall conservation, albeit shuffled gene organization, across wide intergenomic distances within eubacteria. Microb Comp Genomics. 1999, 4: 5-28.
Xie G, Bonner CA, Jensen RA: A probable mixed-function supraoperon in Pseudomonas exhibits gene organization features of both intergenomic conservation and gene shuffling. J Mol Evol. 1999, 49: 108-121.
Xie G, Keyhani N, Bonner CA, Jensen RA: Ancient origin of the tryptophan operon and the dynamics of evolutionary change. Microbiol Mol Biol Rev. 2003, 67: 303-342. 10.1128/MMBR.67.3.303-342.2003.
Xie G, Bonner CA, Song J, Keyhani NO, Jensen RA: Inter-genomic displacement via lateral gene transfer of bacterial trp operons in an overall context of vertical genealogy. BMC Biology. 2004, 2: 15-10.1186/1741-7007-2-15.
Gil R, Silva FJ, Zientz E, Delmotte F, Gonzalez-Candelas F, Latorre A, Rausell C, Kamerbeek J, Gadau J, Holldobler B, et al: The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes. Proc Natl Acad Sci USA. 2003, 100: 9388-9393. 10.1073/pnas.1533499100.
Gevers D, Vandepoele K, Simillion C, Van de Peer Y: Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 2004, 12: 148-154. 10.1016/j.tim.2004.02.007.
Blanc V, Gil P, Bamasjacques N, Lorenzon S, Zagorec M, Schleuniger J: Identification and analysis of genes from Streptomyces pristinaespiralis encoding enzymes involved in the biosynthesis of the 4-dimethylamino-L-phenylalanine precursor of pristinamycin I. Mol Microbiol. 1997, 23: 191-202. 10.1046/j.1365-2958.1997.2031574.x.
Lingens F, Göbel W, Üsseler H: Regulation der biosynthesis der aromatischen aminosauren in Saccharomyces cerevisiae, I. Hemmung der Enzymaktivitaten (Feedback-Wirkung). Biochem Z. 1966, 346: 357-67.
Zamir LO, Jung E, Jensen RA: Co-accumulation of prephenate, L-arogenate and spiro-arogenate in a mutant of Neurospora. J Biol Chem. 1983, 258: 6492-6496.
National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
Xia T, Jensen RA: A single cyclohexadienyl dehydrogenase specifies the prephenate dehydrogenase and arogenate dehydrogenase components of the dual pathways to L-tyrosine in Pseudomonas aeruginosa. J Biol Chem. 1990, 265: 20033-20036.
Zhao G, Xia T, Ingram L, Jensen RA: An allosterically insensitive class of cyclohexadienyl dehydrogenase from Zymomonas mobilis. Eur J Biochem. 1993, 212: 157-165. 10.1111/j.1432-1033.1993.tb17646.x.
Jensen RA: Enzyme recruitment in evolution of new function. Annu Rev Microbiol. 1976, 30: 409-425. 10.1146/annurev.mi.30.100176.002205.
Hall GC, Flick MB, Gherna RL, Jensen RA: Biochemical diversity for biosynthesis of aromatic amino acids among the cyanobacteria. J Bacteriol. 1982, 149: 65-78.
Subramaniam P, Bhatnagar R, Hooper A, Jensen RA: The dynamic progression of evolved character states for aromatic amino acid biosynthesis in gram-negative bacteria. Microbiology. 1994, 140: 3431-3440.
Byng GS, Whitaker RJ, Gherna RL, Jensen RA: Variable enzymological patterning in tyrosine biosynthesis as a means of determining natural relatedness among the Pseudomonadaceae. J Bacteriol. 1980, 144: 247-257.
Keller B, Keller E, Gorisch H, Lingens F: Biosynthesis of phenylalanine and tyrosine in Streptomycetes. Hoppe Seylers Z Physiol Chem. 1983, 364: 455-459.
Keller B, Keller E, Lingens F: Arogenate dehydrogenase from Streptomyces phaeochromogenes. Purification and properties. Biol Chem Hoppe Seyler. 1985, 366: 1063-1066.
Bonner CA, Jensen RA, Gander JE, Keyhani NO: A core catalytic domain of the TyrA protein family: arogenate dehydrogenase from Synechocystis. Biochem J. 2004, 382: 279-291. 10.1042/BJ20031809.
Wierenga RK, Terpstra P, Hol WGJ: Prediction of the occurrence of the ADP-binding β α β-fold in proteins, using an amino-acid sequence fingerprint. J Mol Biol. 1986, 187: 101-107. 10.1016/0022-2836(86)90409-2.
Rippert P, Matringe M: Molecular and biochemical characterization of an Arabidopsis thaliana arogenate dehydrogenase with two highly similar and active protein domains. Plant Mol Biol. 2002, 48: 361-368. 10.1023/A:1014018926676.
Rippert P, Matringe M: Purification and kinetic analysis of the two recombinant arogenate dehydrogenase isoforms of Arabidopsis thaliana. Eur J Biochem. 2002, 269: 4753-4761. 10.1046/j.1432-1033.2002.03172.x.
Xie G, Forst C, Bonner CA, Jensen RA: Significance of two distinct types of tryptophan synthase beta chain in Bacteria, Archaea and higher plants. Genome Biol. 2002, 3: Research0004.1-0004.13. 10.1186/gb-2001-3-1-research0004.
Champney WS, Jensen RA: The enzymology of prephenate dehydrogenase in Bacillus subtilis. J Biol Chem. 1970, 245: 3763-3770.
Xie G, Bonner CA, Brettin T, Gottardo R, Keyhani NO, Jensen RA: Lateral gene transfer and ancient paralogy of operons containing redundant copies of tryptophan-pathway genes in Xylella species and heterocystous cyanobacteria. Genome Biol. 2003, 4: R14-10.1186/gb-2003-4-2-r14.
Chen S, Vincent S, Wilson DB, Ganem B: Mapping of chorismate mutase and prephenate dehydrogenase domains in the Escherichia coli T-protein. Eur J Biochem. 2003, 270: 757-763. 10.1046/j.1432-1033.2003.03438.x.
Mavrodi DV, Ksenzenko VM, Bonsall RF, Cook RJ, Boronin AM, Thomashow LS: A seven-gene locus for synthesis of phenazine-1-carboxylic acid by Pseudomonas fluorescens 2–79. J Bacteriol. 1998, 180: 2541-2548.
Pierson LS, Gaffney T, Lamb S, Gong F: Molecular analysis of genes encoding phenazine biosynthesis in the biological control bacterium Pseudomonas aureofaciens 30–84. FEMS Microbiol Lett. 1995, 134: 299-307. 10.1016/0378-1097(95)00423-X.
Eddy SR: Profile-hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C: Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J Mol Biol. 1998, 284: 1201-1210. 10.1006/jmbi.1998.2221.
Fazel A, Jensen R: Obligatory biosynthesis of L-tyrosine via the pretyrosine branchlet in coryneform bacteria. J Bacteriol. 1979, 138: 805-815.
Fazel AM, Bowen JR, Jensen RA: Arogenate (pretyrosine) is an obligatory intermediate of L-tyrosine biosynthesis: confirmation in a microbial mutant. Proc Natl Acad Sci USA. 1980, 77: 1270-1273.
Byng GS, Berry A, Jensen RA: Evolutionary implications of features of aromatic amino acid biosynthesis in the genus Acinetobacter. Arch Microbiol. 1985, 143: 122-129. 10.1007/BF00411034.
Porat I, Waters BW, Teng Q, Whitman WB: Two biosynthetic pathways for aromatic amino acids in the archaeon Methanococcus maripaludis. J Bacteriol. 2004, 186: 4940-4950. 10.1128/JB.186.15.4940-4950.2004.
Calhoun DH, Bonner CA, Gu W, Xie G, Jensen RA: The emerging periplasm-localized subclass of AroQ chorismate mutases, exemplified by those from Salmonella typhimurium and Pseudomonas aeruginosa. Genome Biol. 2001, 2: research0030.1-0030.16.
Ahmad S, Jensen RA: The stable evolutionary fixation of a bifunctional tyrosine-pathway protein in enteric bacteria. FEMS Microbiol Lett. 1988, 52: 109-116. 10.1016/0378-1097(88)90309-6.
Jensen RA, Ahmad S: Nested gene fusions as markers of phylogenetic branchpoints in prokaryotes. Trends Ecol Evol. 1990, 5: 219-224. 10.1016/0169-5347(90)90135-Z.
Jensen RA, Gu W: Evolutionary recruitment of biochemically specialized subdivisions of Family I within the protein superfamily of aminotransferases. J Bacteriol. 1996, 178: 2161-2171.
Aravind L, Koonin EV: Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol. 1999, 287: 1023-1040. 10.1006/jmbi.1999.2653.
Henner D, Yanofsky C: Bacillus subtilis and other gram-positive bacteria. Biochemistry, physiology, and molecular genetics. Edited by: Sonenshein AL, Hoch J, Losick R. 1993, Washington, DC: ASM Press
White RH: L-Aspartate semialdehyde and a 6-deoxy-5-ketohexose 1-phosphate are the precursors to the aromatic amino acids in Methanocaldococcus jannaschii. Biochemistry. 2004, 43: 7618-7627. 10.1021/bi0495127.
Ahmad S, Johnson JL, Jensen RA: The recent evolutionary origin of the phenylalanine-sensitive isozyme of 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase in the enteric lineage of bacteria. J Mol Evol. 1987, 25: 159-167.
Jensen RA, Xie G, Calhoun DH, Bonner CA: The correct phylogenetic relationship of KdsA (3-deoxy-D-manno-octulosonate 8-phosphate synthase) with one of two independently evolved classes of AroA (3-deoxy-D-arabino-heptulosonate 7-phosphate synthase). J Mol Evol. 2002, 54: 416-423.
Pittard AJ, Camakaris H, Yang J: The TyrR regulon. Mol Microbiol. 2005, 55: 16-26. 10.1111/j.1365-2958.2004.04385.x.
Katayama T, Suzuki H, Koyanagi T, Kumagai H: Cloning and random mutagenesis of the Erwinia herbicola tyrR gene for high-level expression of tyrosine phenol-lyase. Appl Envir Microbiol. 2000, 66: 4764-4771. 10.1128/AEM.66.11.4764-4771.2000.
Bai Q, Somerville R: Integration host factor and cyclic AMP receptor protein are required for TyrR-mediated activation of tpl in Citrobacter freundii. J Bacteriol. 1998, 180: 6173-6186.
Zhao S, Somerville RL: Isolated operator binding and ligand response domains of the TyrR protein of Haemophilus influenzae associate to reconstitute functional repressor. J Biol Chem. 1999, 274: 1842-1847. 10.1074/jbc.274.3.1842.
Arias-Barrau E, Olivera E, Luengo J, Fernandez C, Galan B, Garcia J, Diaz E, Miñambres B: The homogentisate pathway: a central catabolic pathway involved in the degradation of L-phenylalanine, L-tyrosine, and 3-hydroxyphenylacetate in Pseudomonas putida. J Bacteriol. 2004, 186: 5062-5077. 10.1128/JB.186.15.5062-5077.2004.
Dähnhardt D, Falk J, Appel J, van der Kooij A, Schulz-Friedrich R, Krupinska K: The hydroxyphenylpyruvate dioxygenase from Synechocystis sp. PCC 6803 is not required for plastoquinone biosynthesis. FEBS Lett. 2002, 523: 177-181. 10.1016/S0014-5793(02)02978-2.
Song J, Jensen RA: PhhR, a divergently transcribed activator of the phenylalanine hydroxylase gene cluster of Pseudomonas aeruginosa. Mol Microbiol. 1996, 22: 497-507. 10.1046/j.1365-2958.1996.00131.x.
Zhao G, Xia T, Song J, Jensen R: Pseudomonas aeruginosa possesses homologues of mammalian phenylalanine hydroxylase and 4a-carbinolamine dehydratase/DCoH as part of a three-component gene cluster. Proc Natl Acad Sci USA. 1994, 91: 1366-1370.
Tropel D, van der Meer J: Bacterial transcriptional regulators for degradation pathways of aromatic compounds. Microbiol Mol Biol Rev. 2004, 68: 474-500. 10.1128/MMBR.68.3.474-500.2004.
Chaney M, Grande R, Wigneshweraraj S, Cannon W, Casaz P, Gallegos M-T: Binding of transcriptional activators to sigma 54 in the presence of the transition state analog ADP-aluminum fluoride: insights into activator mechanochemical action. Genes Dev. 2001, 15: 2282-2294. 10.1101/gad.205501.
Yanofsky C: The different roles of tryptophan transfer RNA in regulating trp operon expression in E. coli versus B. subtilis. Trends Genet. 2004, 20: 367-374. 10.1016/j.tig.2004.06.007.
Predicted attenuators in bacteria. [http://cmgm.stanford.edu/~merino]
Riley ML, Schmidt T, Wagner C, Mewes H-W, Frishman D: The PEDANT genome database in 2005. Nucleic Acids Res. 2005, 33: D308-D310. 10.1093/nar/gki019.
Chenna R, Sugawara H, Koike T, Lopez R, Gibson T, Higgins D, Thompson J: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31: 3497-3500. 10.1093/nar/gkg500.
Felsenstein J: PHYLIP-Phylogeny Inference Package (version 3.2). Cladistics. 1989, 5: 164-166.
Cai W, Pei J, Grishin NV: Reconstruction of ancestral protein sequences and its applications. BMC Evol Biol. 2004, 4: 33. 10.1186/1471-2148-4-33.
Eddy S: HMMER package. 1995, [http://hmmer.wustl.edu]
R. Jensen thanks the National Library of Medicine (Grant G13 LM008297) for partial support. This research is partially supported by the U.S. Army Research Institute of Infectious Diseases (USAMRIID). This analysis would not have been possible were it not for the yeoman efforts in comparative enzymology carried out over a period of more than 25 years by many students and postdoctoral fellows, most notably Graham S. Byng, Robert Whitaker, Alan X. Berry and Suhail Ahmad. Their work has produced an invaluable resource of comprehensive data, some of it unpublished. This paper is dedicated to our colleague and collaborator, John E. Gander, on the occasion of his 80th birthday.
JS and MW integrated this specific effort with the broader and ongoing objective of implementing a dynamic and progressively updateable website (AroPath). JS also made substantial contributions to the bioinformatic work. CB did all of the artwork and a majority of the bioinformatic analyses. RJ provided initial guiding concepts and a general organizational overview, and assembled the initial manuscript draft. CB, RJ, and JS contributed to the formulation of the conclusions, and all of the authors read and approved the final version of the manuscript.
Electronic supplementary material
Additional File 1: Table S1, entitled "Key to organism acronyms and sequence identifiers", is provided as supplementary material in an html document. This table contains the full collection of sequence data and annotations contained in this paper, and gene identification (gi) numbers are included and hyperlinked to facilitate access to the corresponding GenBank records. For future reference to a progressively updated table, refer to the AroPath website. (HTML 40 KB)