Conservation of coding sequences suggests duplication and subfunctionalization of an ancestral arthropod gene into proneural and ase-like functions
A highly conserved bHLH domain characterizes proteins encoded by the ac-sc gene family, but outside this domain conservation is very low. Recently, two conserved domains were identified that, in insects, enable a distinction to be made between ASH genes that are expressed in proneural domains, called henceforth proneural ASH genes, and the sensory organ precursor-specific ase genes [22]. Firstly, proteins encoded by ASH genes contain a 16 amino acid carboxy-terminal domain (PDDEELLDYISWWQQQ) that is characteristic of all insect ASH proteins but is less well conserved in Ase proteins (50% identity or less). Secondly, Ase proteins have a characteristic five amino acid motif (hydrophobic-Lys-polar-Glu-hydrophobic) that is absent in all proneural ASH proteins outside the Diptera. These motifs allowed a clear subdivision of the ASH and ase genes in different orders of insects, which is upheld by phylogenetic analysis [22, 25]. A single ase gene (but a variable number of ASH genes) is present in each species analyzed.
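The two sequence criteria described above can be sketched as a simple check on a protein sequence. This is an illustrative sketch only, not the procedure used in [22]: the residue classes chosen for "hydrophobic" and "polar" are assumptions, the helper names are hypothetical, and the carboxy-terminal comparison is a plain ungapped identity over the last 16 residues.

```python
import re

# Residue classes are assumptions for illustration; the text does not list them.
HYDROPHOBIC = "AVLIMFWC"
POLAR = "STNQ"

# Ase motif as described in the text: hydrophobic-Lys-polar-Glu-hydrophobic.
ASE_MOTIF = re.compile(f"[{HYDROPHOBIC}]K[{POLAR}]E[{HYDROPHOBIC}]")

# Insect ASH carboxy-terminal domain quoted in the text (16 amino acids).
ASH_CTERM = "PDDEELLDYISWWQQQ"


def has_ase_motif(protein: str) -> bool:
    """Return True if the five-residue Ase motif occurs anywhere in the protein."""
    return ASE_MOTIF.search(protein.upper()) is not None


def cterm_identity(protein: str) -> float:
    """Percent identity between the last 16 residues and the ASH domain (ungapped)."""
    tail = protein.upper()[-len(ASH_CTERM):]
    if len(tail) != len(ASH_CTERM):
        raise ValueError("protein is shorter than the 16-residue ASH domain")
    matches = sum(a == b for a, b in zip(tail, ASH_CTERM))
    return 100.0 * matches / len(ASH_CTERM)


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real ASH or Ase protein.
    toy = "MGGLKSEVGG" + ASH_CTERM
    print(has_ase_motif(toy), cterm_identity(toy))
```

A real classification would rest on an alignment rather than a fixed-position comparison, so the identity value here is only a rough guide.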
In order to classify proneural and precursor-specific genes in other arthropod groups, we applied the above criteria to recently published sequences. Two ASH genes were described in the crustacean Triops longicaudatus [24]. The authors show that the deduced amino acid sequence of Tl-ASH1 bears the ASH carboxy-terminal domain, while this sequence is not conserved in Tl-ASH2. We identified the Ase motif in Tl-ASH2, confirming that this gene is in fact an asense orthologue (Figure 1). Furthermore, we detected single ASH and ase orthologues, Dpu-ASH and Dpu-ase, in the Daphnia pulex genome (Daphnia Genome Consortium), which can clearly be distinguished by the presence of the respective domains (Figure 1). The spider Cupiennius salei (chelicerate) displays two ASH orthologues [23] but sequence analysis does not unambiguously distinguish a bona fide ase gene in this species. The carboxy-terminal domain of CsASH1 displays a greater similarity to that of insect and crustacean ASH proteins (56% amino acid identity) than does that of CsASH2 (30%). However, neither CsASH1 nor CsASH2 contains the five amino acid motif characteristic of Ase (Figure 1). A single orthologue has been identified in each of the myriapods Glomeris marginata [26] and Strigamia maritima (Strigamia Genome Project, Human Genome Sequencing Consortium, Baylor College). We are confident that there is only a single copy present in the S. maritima genome (see Materials and methods). The G. marginata and S. maritima proteins show 50% and 62% identity with the insect ASH carboxy-terminal domain, respectively, and 78% identity with the carboxy-terminal domain of CsASH1. They lack the Ase-specific motif and would appear to be ASH genes.
Earlier analyses suggested that the two ASH genes of T. longicaudatus and of C. salei arose from duplication events that are independent from those of insects and from each other [23, 24]. We have performed a new phylogenetic analysis that includes the D. pulex sequences that were not previously available. All ASH and ase genes from the insects D. melanogaster and Tribolium castaneum, the crustaceans T. longicaudatus and D. pulex, the chelicerate C. salei and the myriapods G. marginata and S. maritima were included. The phylogenetic tree shows that the ASH and Ase proteins of insects group together, as do those of crustaceans, while the ASH proteins of myriapods and of chelicerates are arranged in a single group (Figure 2). Both the insect and the crustacean proteins are clearly subdivided into the ASH and Ase groups, that is, T. castaneum Ase groups with D. melanogaster Ase and T. castaneum ASH is arranged in a group with D. melanogaster Ac, Sc and the proneural protein Lethal of Scute (L'sc). Similarly, the D. pulex Dpu-Ase protein groups with T. longicaudatus Tl-ASH2, and Tl-ASH1 groups with Dpu-ASH, rather than with the insect orthologues (Figure 2). The spider CsASH1 and CsASH2 are arranged in a group with the single myriapod homologues (Figure 2). The analysis suggests independent duplication events in insects, crustaceans and chelicerates. However, two features confound the phylogenetic inference. First, the ASH and ase genes in the individual arthropod groups might have evolved at different rates - for example, a faster evolution of insect ASH and ase genes would prevent them from grouping with their crustacean orthologues. Second, the myriapod and the chelicerate ASH genes might group together because they have retained many ancestral homologies.
To gain further insight into patterns of gene duplication, we have directly tested three different tree topologies in support of: a single ancestral duplication giving an ASH-like and an ase-like gene at the base of the arthropods; independent duplications at the base of the insect-crustacean lineage and of the chelicerate-myriapod branch; and independent duplications at the base of each of the insect, crustacean and chelicerate lineages. The Shimodaira-Hasegawa test discards the first possibility. It supports a single duplication in the chelicerate-myriapod branch. However, it cannot distinguish whether a duplication took place in the last common ancestor of insects and crustaceans or whether it occurred independently in each of these groups. The presence of the Ase-specific domain (which was not used for construction of the phylogenetic tree), together with the position of the ase-SOPE (see below), favors a single duplication common to insects and crustaceans. Thus, within the limits of this analysis, which employs only very short sequences (66 amino acids), the data suggest an independent ASH/ase-like duplication in insects/crustaceans and chelicerates.
C. salei CsASH2 rescues ase-specific defects in D. melanogaster
Insect AS-C genes are also distinguishable by their expression patterns: ASH genes are expressed in proneural domains prior to the segregation of neural precursors, in contrast to the ase genes, which are only expressed in neural precursors after they have been singled out [7, 8, 11, 12, 27, 28]. In a similar fashion, the crustacean proneural gene Tl-ASH1 was shown to be expressed in clusters of cells, whereas Tl-ASH2 is expressed later in only a subset of the Tl-ASH-expressing cells [24], providing further evidence that Tl-ASH2 is likely to be an ase orthologue. Interestingly, in the spider, CsASH1 is expressed in proneural domains whereas expression of CsASH2 is restricted to neural precursors (groups of precursors, instead of single cells, are formed in this species [23]). This suggests that CsASH2 might carry out an ase-like function. We therefore investigated whether CsASH2 can rescue the specific defects caused by a loss of ase activity in D. melanogaster.
Flies lacking ase function exhibit only a mild phenotype because activity of ac, sc and senseless compensates for most of the defects [11, 13, 29, 30]. However, one defect is specific to ase: differentiation of the stout mechanosensory bristles of the triple row of bristles on the anterior wing margin is impaired [11, 13]. In ase1 mutant flies these bristles show variable defects that include a split shaft, two to three shafts arising from a single socket, an empty socket or a complete duplication (Figure 3H-K). When over-expressed, Dm-ase, but neither Dm-ac nor Dm-sc, has been shown to rescue these defects [11, 13]. We found that CsASH2, as well as Tc-ase, displays a rescuing activity comparable to that of Dm-ase (Figure 3A-G). The number of defective bristles in ase1 flies is reduced from an average of 9.7 to 2.5, 4.9 and 3.9 in flies expressing Dm-ase, Tc-ase and CsASH2, respectively (Figure 3A-G; Additional file 1). CsASH2 can therefore substitute for functions specific to Dm-ase, which suggests that it might carry out precursor-specific ase-like functions in the spider.
ase-like genes of insects, crustaceans and chelicerates and the ASH genes of myriapods bear a conserved regulatory sequence in the UTR
The ase gene of D. melanogaster bears a cis-regulatory sequence in the 5' UTR, the SOPE, that drives expression of a reporter gene in the SOP [13]. Although equivalent enhancer elements drive expression of sc and ac in SOPs, they are located upstream of the transcription start site [14, 15, 19]. Genome analysis indicates that the ase SOPE is conserved in the 5' UTR of the ase gene of other insects whereas no such sequence is found in the transcribed regions of insect proneural ASH genes [22]. The presence of a SOPE in the UTR therefore provides another feature with which to distinguish between ASH and ase-like genes.
The SOPE bears binding sites for four specific transcription factors. Interestingly, we identified clusters of the relevant binding sites in the UTR of D. pulex ase and in CsASH2 of C. salei. The putative SOPE is located in the 5' UTR of Dpu-ase and in the 3' UTR of CsASH2. No such sequence is found in the UTR of CsASH1. Individual binding sites were identified by manual comparison of consensus sequences (Figure 4; Additional file 2). The Dm-ase SOPE, located 144 bp upstream of the start codon, contains four E boxes, two α boxes, one β box and one N box. We identified a putative SOPE in an additional insect, T. castaneum, which covers 1,145 bp of the 5' UTR starting 95 bp upstream of the ase open reading frame (ORF). It contains three E boxes, two α, five β and one N box (Figure 4; Additional file 2). In D. pulex the putative SOPE is located 1,048 bp upstream of the ase ORF and extends over 882 bp in the presumptive 5' UTR. One E box, one α box, two β boxes and one N box are present in this region (Figure 4; Additional file 2). The putative SOPE of C. salei CsASH2 is also close to the ORF but is located in the 3' UTR, between 3 bp and 249 bp downstream of the stop codon. It contains three E boxes, one α box, and one β box. No N box can be identified in this species (Figure 4; Additional file 2).
Remarkably, we also detected a putative SOPE in the 5' UTR of the single S. maritima ASH gene (Strigamia Genome Project, Human Genome Sequencing Consortium, Baylor College). It covers the region between 36 and 601 bp upstream of the ORF and contains four E boxes, three α boxes, five β boxes and one N box (Figure 4; Additional file 2). These data suggest that a cis-regulatory element located in the UTR, the SOPE, is conserved in the ase-like genes of insects, crustaceans and chelicerates and in the ASH gene of myriapods.
In order to identify conserved motifs and demonstrate the level of conservation, we generated sequence 'logos' [31] based on the aligned sequences of the individual arthropod SOPE boxes (Additional file 3). The α box shows the most degenerate consensus sequence, with conservation limited to the central part of the NF-κB binding site. However, a clear motif is recovered for the E, β and N boxes. We could not detect a significant conservation of nucleotides surrounding the motifs of the boxes (Additional file 3). In line with previous publications, we identified the E box 'logo' as CAGCTG. This consensus sequence binds strongly to daughterless-AS-C heterodimers. Moreover, in contrast to AS-C homodimers, Ase-Ase homodimers were observed to bind this site [13]. It is interesting to note that E boxes with the CAGCTG logo are present once in each arthropod species, although, overall, the motif is present in only 5 of the 17 E boxes identified.
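As an illustration of how E boxes of this kind can be located in a UTR sequence, the sketch below scans a DNA string for the general E box pattern CANNTG and flags matches to the CAGCTG logo. It is not the method used here, where binding sites were identified by manual comparison of consensus sequences; the function name and the toy sequence are hypothetical.

```python
import re

# General E box pattern CANNTG; the lookahead allows overlapping matches.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

# E box logo recovered in the text.
EBOX_LOGO = "CAGCTG"


def find_eboxes(dna: str) -> list[tuple[int, str, bool]]:
    """Return (0-based position, site, matches_logo) for every CANNTG site."""
    seq = dna.upper()
    return [
        (m.start(), m.group(1), m.group(1) == EBOX_LOGO)
        for m in EBOX.finditer(seq)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any of the SOPEs discussed.
    for pos, site, is_logo in find_eboxes("ttCAGCTGaaCATATGcc"):
        print(pos, site, "logo" if is_logo else "other E box")
```

Because the reverse complement of CANNTG is again CANNTG, scanning one strand is enough to find the sites on both strands.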
The SOPEs of C. salei CsASH2 and T. castaneum ase are functional in D. melanogaster
The Dm-ase SOPE had been shown to display enhancer activity when placed upstream of an hsp70 promoter and a reporter gene [13]. To test its effects when positioned in the UTR, we generated transgenic lines carrying UAS constructs containing either the entire transcribed region (including the UTR sequences) or merely the ORF. Since reporter gene fusion constructs that cover different regions upstream of the ORF only restrict expression to single SOPs if the 560-bp UTR containing the SOPE is present [13], the ORF+SOPE constructs should reduce the number of bristles. Three independent lines of each construct were crossed to four different Gal4 lines, each of which drives expression in all or part of the D. melanogaster notum. As expected, both transgenes caused the development of ectopic bristles but their number was significantly reduced in the construct containing the entire UTR. Flies expressing the UAS-Dm-ase ORF displayed, in total, an average of 10.9 ectopic bristles, compared with 7.3 in flies expressing the UAS-Dm-ase ORF+SOPE (Figure 5A-C; Additional file 4). We therefore conclude that the SOPE regulates gene activity from its position in the UTR and that, when transcription is initiated from exogenous UAS sequences, it functions to dampen transcription. This is consistent with its proposed function to restrict proneural gene activity from broad expression domains to single neural progenitors.
To see whether the strong conservation of binding sites in the UTR of other arthropod ase-like genes is meaningful, we tested the putative SOPEs of T. castaneum and C. salei for function in transgenic flies. Transgenic lines were made containing UAS sequences and the entire transcribed regions or just the ORFs of Tc-ase and CsASH2. Three independent lines of each construct were crossed to the same four Gal4 lines as above. The number of ectopic bristles was used to measure activity. Flies expressing the UAS-Tc-ase ORF displayed an average of 6.0 ectopic bristles, and those expressing UAS-Tc-ase ORF+SOPE an average of 4.4 (Figure 5A, D, E; Additional file 4). Flies expressing the UAS-CsASH2 ORF displayed an average of 14.3 ectopic bristles, and those expressing UAS-CsASH2 ORF+SOPE an average of 3.4 (Figure 5A, F, G; Additional file 4, misexpression experiment). These data indicate that the SOPEs of T. castaneum and C. salei function in a similar fashion to that of D. melanogaster, consistent with the conservation of binding sites in these sequences.
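The size of the dampening effect can be compared across the three genes by expressing each drop in ectopic bristle number as a percentage of the ORF-only value. The sketch below uses only the averages quoted above (including the Dm-ase values of 10.9 and 7.3); the helper names are hypothetical, and because it works from means alone it says nothing about statistical significance.

```python
# Mean ectopic bristle counts quoted in the text: (ORF only, ORF+SOPE).
MEAN_ECTOPIC_BRISTLES = {
    "Dm-ase": (10.9, 7.3),
    "Tc-ase": (6.0, 4.4),
    "CsASH2": (14.3, 3.4),
}


def percent_reduction(orf_only: float, orf_plus_sope: float) -> float:
    """Reduction in mean bristle number as a percentage of the ORF-only mean."""
    return 100.0 * (orf_only - orf_plus_sope) / orf_only


def summarize(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Percent reduction per construct, rounded to one decimal place."""
    return {
        gene: round(percent_reduction(orf, sope), 1)
        for gene, (orf, sope) in table.items()
    }


if __name__ == "__main__":
    for gene, pct in summarize(MEAN_ECTOPIC_BRISTLES).items():
        print(f"{gene}: {pct}% fewer ectopic bristles with the SOPE")
```

On these averages the reduction is roughly a third for Dm-ase, about a quarter for Tc-ase and about three quarters for CsASH2.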