An arthropod cis-regulatory element functioning in sensory organ precursor development dates back to the Cambrian
© Ayyar et al. 2010
Received: 5 July 2010
Accepted: 24 September 2010
Published: 24 September 2010
An increasing number of publications demonstrate conservation of function of cis-regulatory elements without sequence similarity. In invertebrates such functional conservation has only been shown for closely related species. Here we demonstrate the existence of an ancient arthropod regulatory element that functions during the selection of neural precursors. The activity of genes of the achaete-scute (ac-sc) family endows cells with neural potential. An essential, conserved characteristic of proneural genes is their ability to restrict their own activity to single or a small number of progenitor cells from their initially broad domains of expression. This is achieved through a process called lateral inhibition. A regulatory element, the sensory organ precursor enhancer (SOPE), is required for this process. First identified in Drosophila, the SOPE contains discrete binding sites for four regulatory factors. The SOPE of the Drosophila asense gene is situated in the 5' UTR.
Through a manual comparison of consensus binding site sequences we have been able to identify a SOPE in UTR sequences of asense-like genes in species belonging to all four arthropod groups (Crustacea, Myriapoda, Chelicerata and Insecta). The SOPEs of the spider Cupiennius salei and the insect Tribolium castaneum are shown to be functional in transgenic Drosophila. This would place the origin of this regulatory sequence as far back as the last common ancestor of the Arthropoda, that is, in the Cambrian, 550 million years ago.
The SOPE is not detectable by inter-specific sequence comparison, raising the possibility that other ancient regulatory modules in invertebrates might have escaped detection.
The initiation of development of the nervous system in vertebrates and higher invertebrates involves the activity of proneural genes that encode transcription factors of the basic helix-loop-helix (bHLH) class. Their expression in the neuroectoderm endows cells with neural potential and also contributes to the specification of neuronal identity. Proneural genes are conserved throughout the animal kingdom and fall into two main classes: the achaete-scute (ac-sc) and atonal (ato) gene families. During development they are initially expressed in groups, or domains, of equivalent neuroectodermal cells. An essential, conserved characteristic of proneural genes is their ability to restrict their own activity to single or a small number of progenitor cells within these domains. This is achieved through a process called lateral inhibition, mediated by Notch signaling. The Notch ligand Delta is up-regulated by proneural proteins in future neural precursors and activates the Notch signaling cascade in neighboring cells, resulting in down-regulation of proneural gene expression [3, 4]. Repression of proneural genes is mediated by the products of the Notch target genes Hairy/Enhancer of split. This ancient regulatory network was probably inherited from the earliest Metazoa.
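The feedback loop just described can be illustrated with a minimal numerical sketch. It assumes a generic two-variable Delta-Notch model on a ring of identical cells: each cell's Notch activity rises with the mean Delta level of its two neighbours, and its own Delta production falls as its Notch activity rises (standing in for Notch-mediated repression of proneural genes, which up-regulate Delta). The model form, the function names and all parameter values are hypothetical choices for illustration; this is not a model used or tested in this study.

```python
import random

def hill_up(x: float, a: float = 0.01, k: int = 2) -> float:
    """Notch activation by mean neighbouring Delta (increasing Hill function)."""
    return x**k / (a + x**k)

def hill_down(x: float, b: float = 100.0, h: int = 2) -> float:
    """Delta production repressed by the cell's own Notch activity."""
    return 1.0 / (1.0 + b * x**h)

def simulate(n_cells: int = 12, steps: int = 20000, dt: float = 0.01,
             seed: int = 1) -> list[float]:
    """Euler-integrate a ring of cells; return the final Delta level per cell."""
    rng = random.Random(seed)
    # Near-uniform start: tiny random noise breaks the initial equivalence
    notch = [rng.uniform(0.0, 0.01) for _ in range(n_cells)]
    delta = [rng.uniform(0.0, 0.01) for _ in range(n_cells)]
    for _ in range(steps):
        # Mean Delta of the two neighbours on the ring
        mean_nb = [(delta[i - 1] + delta[(i + 1) % n_cells]) / 2
                   for i in range(n_cells)]
        # Update Notch, then update Delta from the new Notch values
        notch = [n + dt * (hill_up(m) - n) for n, m in zip(notch, mean_nb)]
        delta = [d + dt * (hill_down(n) - d) for d, n in zip(delta, notch)]
    return delta
```

Starting from nearly equal cells, small differences are amplified until isolated high-Delta cells, the would-be precursors, are flanked by low-Delta neighbours; in this sketch two high-Delta cells never end up adjacent.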
Regulatory sequences involved in the restriction of proneural gene expression from proneural domains to selected neural precursors have mostly been studied in Drosophila melanogaster, in particular with respect to the ac-sc genes and their role in the development of sensory bristles of the adult peripheral nervous system. The D. melanogaster ac-sc gene complex (AS-C) comprises four genes, three of which are required for bristle development. ac and sc are expressed in discrete proneural clusters through the activity of a number of independently acting cis-regulatory modules that are scattered throughout the approximately 150 kb of the AS-C and respond to positional cues [6–9]. Subsequently, the expression of ac and sc is refined to single sensory organ precursors (SOPs), where high levels of Ac/Sc activate the third gene, asense (ase), whose expression is limited to SOPs [10–13]. Lateral inhibition and SOP expression are mediated by a specific cis-regulatory element, the SOP enhancer (SOPE). The SOPE contains binding sites for a number of transcription factors. Auto-regulation in the SOP relies on E boxes, binding sites for Ac, Sc and Ase, which activate their own transcription. The E boxes also mediate repression in cells not selected to be SOPs: products of the Enhancer of split (E(spl)) genes, activated by Notch signaling, associate with Ac-Sc, leading to transcriptional repression. Binding sites for NF-κB proteins, α boxes, are also present and mediate both activation and repression [14, 17]. It is likely that low levels of NF-κB and high levels of Ac-Sc activate the neural program, whereas high levels of NF-κB and low levels of Ac-Sc repress it. In addition, the SOPEs contain AT-rich sequences, β boxes, of unknown function, and N boxes that, in the case of the ac-SOPE, have been shown to bind the transcriptional repressor Hairy [14, 15, 19, 20]. All three genes bear their own SOPE.
That of ac is in the promoter close to the transcription start site and differs from the others in being devoid of α boxes (unpublished observations, P. Simpson) [15, 19]. It drives expression of reporter genes first in proneural domains and then in SOPs [15, 19]. The SOPE of sc, positioned 3 kb upstream of the transcriptional start site, and that of ase, positioned in the 5' UTR, drive expression of reporter genes exclusively in the SOP [13, 14]. The SOPEs are strongly conserved in other Drosophilidae.
Proneural genes of both the ac-sc and ato classes have undergone independent duplication events in different taxa. The ato gene family is much expanded in vertebrates, whereas duplication of ac-sc genes has taken place in different groups of arthropods [21–24]. Previous data from available insect genomes have shown that, while ac-sc genes have undergone a number of duplication events, all species analyzed bear a single ase gene. Conservation of both specific amino acid sequences and the SOPE in the 5' UTR suggests that the insect ase genes are derived from a common ancestor. Here we show that achaete-scute homologue (ASH) and ase-like genes are present in arthropods other than insects. We present evidence that gene duplications separating proneural from precursor-specific (ase-like) functions possibly occurred independently in different arthropod groups, and that a SOPE in UTR sequences of ase-like genes in all groups has been inherited from an ancestral ASH/ase precursor gene in the last common ancestor of the Arthropoda.
A highly conserved bHLH domain characterizes proteins encoded by the ac-sc gene family, but outside this domain conservation is very low. Recently, two conserved domains were identified that, in insects, allow a distinction to be made between ASH genes that are expressed in proneural domains, henceforth called proneural ASH genes, and the sensory organ precursor-specific ase genes. Firstly, proteins encoded by ASH genes contain a 16 amino acid carboxy-terminal domain (PDDEELLDYISWWQQQ) that is characteristic of all insect ASH proteins but is less well conserved in Ase proteins (50% identity or less). Secondly, Ase proteins have a characteristic five amino acid motif (hydrophobic-Lys-polar-Glu-hydrophobic) that is absent from all proneural ASH proteins outside the Diptera. These motifs allowed a clear subdivision of the ASH and ase genes in different orders of insects, which is upheld by phylogenetic analysis [22, 25]. A single ase gene (but a variable number of ASH genes) is present in each species analyzed.
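The two diagnostic features can be checked mechanically once candidate segments are in hand. The sketch below scores the ungapped percent identity of a candidate carboxy-terminal segment against the 16-residue ASH domain quoted above, and scans a protein for the hydrophobic-Lys-polar-Glu-hydrophobic pattern. The residue sets used for "hydrophobic" and "polar", the function names and the toy sequences are assumptions for illustration; a real comparison would start from a proper alignment.

```python
import re

# 16-residue carboxy-terminal ASH domain quoted in the text
ASH_CTERM = "PDDEELLDYISWWQQQ"

# Hypothetical residue classes, chosen only for illustration
HYDROPHOBIC = "AVLIMFW"
POLAR = "STNQ"
# Lookahead so that overlapping motif occurrences are all reported
ASE_MOTIF = re.compile(f"(?=([{HYDROPHOBIC}]K[{POLAR}]E[{HYDROPHOBIC}]))")

def percent_identity(segment: str, reference: str = ASH_CTERM) -> float:
    """Ungapped identity between two equal-length segments, in percent."""
    if len(segment) != len(reference):
        raise ValueError("segments must have equal length (gaps not handled)")
    matches = sum(a == b for a, b in zip(segment, reference))
    return 100.0 * matches / len(reference)

def find_ase_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each Ase-like motif."""
    return [(m.start(), m.group(1)) for m in ASE_MOTIF.finditer(protein.upper())]
```

For example, `percent_identity("PDDEELLDAAAAAAAA")` gives 50.0, at the boundary quoted for Ase proteins, and `find_ase_motifs("MSLKTEVGG")` reports one motif, LKTEV, at position 2.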
To gain further insight into patterns of gene duplication, we directly tested three alternative tree topologies, supporting respectively: a single ancestral duplication giving rise to an ASH-like and an ase-like gene at the base of the arthropods; independent duplications at the base of the insect-crustacean lineage and of the chelicerate-myriapod branch; and independent duplications at the base of each of the insect, crustacean and chelicerate lineages. The Shimodaira-Hasegawa test rejects the first possibility. It supports a single duplication in the chelicerate-myriapod branch. However, it cannot distinguish whether a duplication took place in the last common ancestor of insects and crustaceans or whether it occurred independently in each of these groups. The presence of the Ase-specific domain (which was not used for construction of the phylogenetic tree), together with the position of the ase-SOPE (see below), favors a single duplication common to insects and crustaceans. Thus, within the limits of this analysis, which employs only very short sequences (66 amino acids), the data suggest independent ASH/ase-like duplications in insects/crustaceans and in chelicerates.
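To make the logic of the Shimodaira-Hasegawa test explicit, the sketch below implements its usual bootstrap form, which resamples stored per-site log-likelihoods (RELL) rather than re-optimizing each tree. It assumes the per-site log-likelihoods have already been computed for every candidate topology by a maximum likelihood program; the function name, inputs and bootstrap size are illustrative, and this is not the PHYLIP implementation used for the analysis.

```python
import random

def sh_test(site_lnl: list[list[float]], n_boot: int = 1000,
            seed: int = 0) -> list[float]:
    """Shimodaira-Hasegawa test with RELL bootstrap.

    site_lnl[i][s] is the log-likelihood of alignment site s under tree i.
    Returns one p-value per candidate tree.
    """
    rng = random.Random(seed)
    n_trees, n_sites = len(site_lnl), len(site_lnl[0])
    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    # Observed statistic: how far each tree falls below the best tree
    observed = [best - t for t in totals]

    # RELL: resample sites with replacement and re-sum the stored site values
    boot = [[0.0] * n_boot for _ in range(n_trees)]
    for b in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        for i in range(n_trees):
            boot[i][b] = sum(site_lnl[i][s] for s in idx)

    # Centre each tree's bootstrap distribution on zero
    centred = [[x - sum(row) / n_boot for x in row] for row in boot]

    p_values = []
    for i in range(n_trees):
        exceed = 0
        for b in range(n_boot):
            best_b = max(centred[j][b] for j in range(n_trees))
            if best_b - centred[i][b] >= observed[i]:
                exceed += 1
        p_values.append(exceed / n_boot)
    return p_values
```

A topology with a p-value below the chosen threshold (for example 0.05) is rejected relative to the best tree in the candidate set; topologies with high p-values cannot be distinguished from the best tree, which is the situation described above for the insect-crustacean alternatives.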
Insect AS-C genes are also distinguishable by their expression patterns: ASH genes are expressed in proneural domains prior to the segregation of neural precursors, in contrast to the ase genes, which are only expressed in neural precursors after they have been singled out [7, 8, 11, 12, 27, 28]. Similarly, the crustacean proneural gene Tl-ASH1 was shown to be expressed in clusters of cells, whereas Tl-ASH2 is expressed later in only a subset of the Tl-ASH-expressing cells, providing further evidence that Tl-ASH2 is likely to be an ase orthologue. Interestingly, in the spider, CsASH1 is expressed in proneural domains whereas expression of CsASH2 is restricted to neural precursors (in this species, groups of precursors are formed instead of single cells). This suggests that CsASH2 might carry out an ase-like function. We therefore investigated whether CsASH2 can rescue the specific defects caused by a loss of ase activity in D. melanogaster.
The ase gene of D. melanogaster bears a cis-regulatory sequence in the 5' UTR, the SOPE, that drives expression of a reporter gene in the SOP. Although equivalent enhancer elements drive expression of sc and ac in SOPs, they are located upstream of the transcription start site [14, 15, 19]. Genome analysis indicates that the ase SOPE is conserved in the 5' UTR of the ase gene of other insects, whereas no such sequence is found in the transcribed regions of insect proneural ASH genes. The presence of a SOPE in the UTR therefore provides another feature with which to distinguish between ASH and ase-like genes.
Remarkably, we also detected a putative SOPE in the 5' UTR of the single S. maritima ASH gene (Strigamia Genome Project, Human Genome Sequencing Center, Baylor College of Medicine). It covers the region between 36 and 601 bp upstream of the ORF and contains four E boxes, three α boxes, five β boxes and one N box (Figure 4; Additional file 2). These data suggest that a cis-regulatory element located in the UTR, the SOPE, is conserved in the ase-like genes of insects, crustaceans and chelicerates and in the ASH gene of myriapods.
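The SOPEs reported here were identified by manual comparison with consensus binding sites. As an illustration only, a first-pass scan for two of the box types can be written with regular expressions. The consensus strings (CANNTG for E boxes, CACNAG for N boxes), the function names and the toy sequence are assumptions; α and β boxes are left out because their consensus is degenerate or undefined, and any hits would still need the kind of manual comparison described here.

```python
import re

# Assumed consensus patterns ("." = any base, IUPAC N).
# CANNTG is its own reverse complement, so one pattern covers both strands;
# the N box CACNAG is not, so its reverse complement CTNGTG is listed too.
MOTIFS = {
    "E box": "CA..TG",
    "N box (+)": "CAC.AG",
    "N box (-)": "CT.GTG",
}

def scan(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif in the sequence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # Lookahead so that overlapping occurrences are all reported
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits

def strict_e_boxes(seq: str) -> list[int]:
    """Positions of the CAGCTG variant recovered as the E box logo."""
    return [m.start() for m in re.finditer("(?=CAGCTG)", seq.upper())]

toy = "TTCAGCTGAACACGAGTT"  # hypothetical sequence, not from any SOPE
print(scan(toy))            # {'E box': [2], 'N box (+)': [10], 'N box (-)': []}
print(strict_e_boxes(toy))  # [2]
```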
In order to identify conserved motifs and demonstrate the level of conservation, we generated sequence 'logos' based on the aligned sequences of the individual arthropod SOPE boxes (Additional file 3). The α box shows the most degenerate consensus sequence, with conservation limited to the central part of the NF-κB binding site. However, a clear motif is recovered for the E, β and N boxes. We could not detect significant conservation of the nucleotides surrounding the motifs of the boxes (Additional file 3). In line with previous publications, we identified the E box 'logo' as CAGCTG. This consensus sequence binds strongly to Daughterless-AS-C heterodimers. Moreover, unlike AS-C homodimers, Ase-Ase homodimers have been observed to bind this site. It is interesting to note that an E box matching the CAGCTG logo is present once in each arthropod species, although, overall, the motif accounts for only 5 of the 17 E boxes identified.
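A sequence logo stacks letters at each position with a total height equal to that position's information content. A minimal sketch of that calculation for aligned binding sites is shown below; the toy E box sites are hypothetical, not the aligned SOPE boxes of this study, and logo tools usually add a small-sample correction that this sketch omits.

```python
import math
from collections import Counter

def information_content(sites: list[str]) -> list[float]:
    """Per-position information content (bits) of aligned, equal-length DNA sites.

    IC = log2(4) - Shannon entropy of the column. No small-sample
    correction is applied, so values are optimistic for few sites.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    ic = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic

# Hypothetical aligned E boxes: fully conserved columns score 2 bits
print(information_content(["CAGCTG", "CAGCTG", "CACCTG"]))
```

Positions that are identical in every site reach the maximum of 2 bits, while the third position, where G and C both occur, drops to about 1.08 bits.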
To see whether the strong conservation of binding sites in the UTR of other arthropod ase-like genes is meaningful, we tested the putative SOPEs of T. castaneum and C. salei for function in transgenic flies. Transgenic lines were made containing UAS sequences and the entire transcribed regions or just the ORFs of Tc-ase and CsASH2. Three independent lines of each construct were crossed to the same four Gal4 lines as above. The number of ectopic bristles was used to measure activity. Flies expressing the UAS-Tc-ase ORF displayed an average of 6.0 ectopic bristles, and those expressing UAS-Tc-ase ORF+SOPE an average of 4.4 (Figure 5A, D, E; Additional file 4). Flies expressing the UAS-CsASH2 ORF displayed an average of 14.3 ectopic bristles, and those expressing UAS-CsASH2 ORF+SOPE an average of 3.4 (Figure 5A, F, G; Additional file 4, misexpression experiment). These data indicate that the SOPEs of T. castaneum and C. salei function in a similar fashion to that of D. melanogaster, consistent with the conservation of binding sites in these sequences.
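Expressing the effect of the SOPE as a relative reduction makes the two comparisons easier to read side by side. The short calculation below uses only the mean values given above; it is not a statistical test, and the percent-reduction metric is an illustrative choice.

```python
# Mean ectopic bristle counts reported above: (ORF only, ORF+SOPE)
means = {
    "Tc-ase": (6.0, 4.4),
    "CsASH2": (14.3, 3.4),
}

def percent_reduction(orf_only: float, orf_sope: float) -> float:
    """Relative drop in ectopic bristles when the SOPE is included."""
    return 100.0 * (orf_only - orf_sope) / orf_only

for gene, (orf, orf_sope) in means.items():
    print(f"{gene}: {percent_reduction(orf, orf_sope):.1f}% fewer ectopic bristles")
# Tc-ase: 26.7% fewer ectopic bristles
# CsASH2: 76.2% fewer ectopic bristles
```

On these means, including the SOPE lowers the number of ectopic bristles by about a quarter for Tc-ase and by about three quarters for CsASH2.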
Most new genes are thought to arise through gene duplication, because a gene arising de novo would need to evolve signals for regulation and transcript processing simultaneously. Our data suggest that gene duplications in individual arthropod lineages have led to the segregation of proneural (ASH) and precursor-specific (ase-like) functions of a single ancestral gene. We show here that bona fide ase genes are present in crustaceans. Phylogenetic analysis was unable to resolve whether the ase and ASH genes of insects and crustaceans are derived from the duplication of an ancestral gene in the last common ancestor of both groups or rather from independent duplications in the individual lineages. However, the presence of the Ase motif in both insect and crustacean ase genes would support a common origin. This is consistent with the Tetraconata hypothesis, which proposes a sister group relationship of insects and crustaceans. We are confident that the myriapod S. maritima has a single ASH gene. This gene would therefore need to perform both proneural (ASH) and precursor-specific (ase) functions, which is likely to reflect the ancestral state. Myriapod and chelicerate ASH genes group together in our phylogenetic analysis. We think this might simply reflect the retention in both groups of many ancestral homologies. Although the phylogenetic position of myriapods is still under debate, most phylogenies are consistent with the Mandibulata hypothesis, which proposes a sister group relationship of Myriapoda and Tetraconata (insects and crustaceans). The chelicerate lineage represents a basal branch of the arthropods. An independent duplication in chelicerates resulting in two ac-sc orthologues is supported by our phylogenetic analysis. Sequence comparison based on the conserved domains of the insect and crustacean genes does not distinguish between a proneural ASH and a precursor-specific ase gene in C. salei. However, CsASH2 is expressed exclusively in neural precursors and contains a SOPE in the transcript.
This, together with its ability to rescue the ase mutant phenotype in D. melanogaster, strongly suggests that CsASH2 carries out an ase-like function. Together, the data support the hypothesis of subfunctionalization and gradual divergence of arthropod ASH and ase-like functions.
In D. melanogaster, the SOPE has been shown to mediate the refinement of transcription from a field of cells to single, spaced precursors. The ability to restrict their own transcription to subsets of progenitors is the most highly conserved process associated with proneural genes throughout the animal kingdom. Our data suggest that, at least in arthropods, this process is linked to the presence of the SOPE. It has been shown recently that upstream fragments outside of the SOPE do not drive reporter gene expression in single cells. Furthermore, mutations of the E boxes abolish the activity of the SOPE enhancer. Thus, we are confident that the reduction of ectopic bristles in our transgenic flies containing the ORF+SOPE results from the activity of the SOPE enhancer. In view of the high level of conservation of the specific binding sites, it is likely that SOPE activity requires not only auto-regulation and Notch-mediated lateral inhibition but also an important contribution from NF-κB signaling [14, 18]. A unique feature of the ase-like genes is the location of the SOPE in the UTR of the transcript. The single ASH gene of S. maritima has retained the SOPE in the transcript. If S. maritima does indeed reflect the ancestral condition, this would indicate that the SOPE was present in the UTR of the ancestral ASH/ase precursor gene. This would place the origin of this regulatory sequence as far back as the last common ancestor of the Arthropoda, that is, in the Cambrian, 550 million years ago.
It appears that in both chelicerates and Tetraconata, the SOPE has been retained in the transcript of the ase-like gene after duplication. In D. melanogaster, the ASH duplicates are known to be regulated by a SOPE as well, but one that has been dislocated from the transcription unit. As in Diptera, expression of the ASH genes of crustaceans and spiders is refined from initially broad domains to neural precursors, suggesting that they too are subject to lateral inhibition and the activity of a SOPE [23, 24, 34]. Therefore, in these species too, a SOPE might reside amongst regulatory sequences outside the transcription unit of the ASH genes.
The fact that the SOPE is found in the UTR (whether 5' or 3') of all ase-like genes, including the single myriapod orthologue, suggests that this location is important. One possible reason is that the SOPE is protected there: it is less likely to become separated from the gene, since a re-arrangement that separated them would more often also cause mutations and loss of gene activity. Moreover, if the gene comes under the influence of any other regulatory sequences (outside the transcription unit), the SOPE would still be active. Analysis of the activity of the transgenes whose transcription is initiated by Gal4 > UAS sequences indicates that the presence of the SOPE in the UTR dampens activity. Perhaps the protected location is a failsafe mechanism to ensure refinement of expression to single progenitors. In this context it is interesting to note that we identified a putative SOPE enhancer in the 5' UTR of senseless, another gene whose expression becomes restricted to SOPs [29, 35] (Additional file 5). Alternatively, the SOPE might have been retained in the UTR because it covers an area that contains additional elements for post-transcriptional regulation, such as RNA folding. The predicted secondary structure of the UTRs (using the RNAfold WebServer) shows characteristic structures such as stem-loops and pseudoknots (Additional file 6). However, whether these arrangements influence the regulation of the ase-like genes remains to be shown.
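RNAfold predicts minimum free energy structures using a thermodynamic nearest-neighbor model. As a much simpler illustration of how dynamic programming recovers stem-loops from a UTR sequence, the sketch below implements Nussinov-style base-pair maximization. This is a swapped-in technique, not RNAfold's algorithm: it ignores stacking energies and cannot represent pseudoknots. The allowed pairs (Watson-Crick plus G-U) and the minimum hairpin loop of three nucleotides are assumptions.

```python
def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize base pairs (Watson-Crick and G-U); return (pairs, dot-bracket)."""
    seq = seq.upper().replace("T", "U")
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span; spans <= min_loop cannot contain a pair
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case: j left unpaired
            for k in range(i, j - min_loop):     # case: j paired with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    # Traceback to recover one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)

print(nussinov("GGGGAAAACCCC"))  # (4, '((((....))))')
```

A simple hairpin is recovered as a four-pair stem closing a four-nucleotide loop; realistic UTR predictions need the energy model that tools such as RNAfold provide.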
Separation of the SOPE from the transcription unit in ASH genes presumably occurred during (or after) duplication of the ancestral ASH/ase precursor gene. In D. melanogaster, the Dm-sc SOPE is 3 kb upstream of the transcription unit and, furthermore, another cis-regulatory element is situated between the SOPE and the transcription start site [14, 37]. In addition to the SOPE, Dm-sc is subject to regulatory input from an array of independently acting enhancer elements, each of which has to be brought into close proximity to the basal promoter to drive expression in distinct regions [9, 38, 39]. One consequence of this is that the SOPE is probably only active at certain times. In contrast, the Dm-ase SOPE, by virtue of its position in the UTR, would continuously modulate the rate of transcription after initiation from the basal promoter.
After duplication of the ancestral ASH/ase gene, the SOPE appears to have been disconnected from the transcript of the duplicate that became the proneural ASH gene. This event is likely to have occurred before the divergence of insects and crustaceans. A similar event might have taken place convergently in chelicerates. We suggest that disconnection of the SOPE from the transcript has facilitated the greater complexity of spatial and temporal regulation that underlies the diversity of patterning of the nervous system in arthropods. This could have unfolded during evolution as follows. The common ancestor of the Arthropoda probably had a single ASH/ase-like gene, similar to that of the extant myriapods. It would have been expressed ubiquitously over the neuro-epithelium and subsequently restricted to single precursors. This ancestral expression pattern can be observed today in Onychophora, the closest relatives of the arthropods (B. J. Eriksson and A. Stollewerk, unpublished). Transcriptional modulation would have been mediated by the SOPE, located in a protected position in UTR sequences. Gene duplication followed by subfunctionalization resulted in proneural ASH and precursor-specific ase-like genes independently in Tetraconata and chelicerates. Retention of the SOPE in the transcript of ase-like genes ensures their expression in SOPs. Loss of the SOPE from the transcript allowed ASH expression to be spatially regulated by other (non-transcribed) cis-regulatory sequences. Such independent transcriptional regulation would not be effective in the ase-like genes because of the presence of the SOPE in the UTR. Cis-regulatory elements for spatial expression might have been acquired more recently. Indeed, the most complex regulation of ASH genes is seen in cyclorrhaphous flies, where expression in small clusters of cells at precise positions prefigures the development of large sensory bristles, macrochaetes.
Macrochaetes are an evolutionary novelty of higher flies and are found in species-specific patterns. In D. melanogaster the patterns rely on an array of independently acting cis-regulatory elements that are likely to have arisen in the Cyclorrhapha along with the additional duplication events of the ancestral ASH gene [25, 40].
An increasing number of publications demonstrate conservation of function of cis-regulatory elements without sequence similarity (reviewed by ). In vertebrates, such functional conservation even spans the evolutionary distance between humans and zebrafish. In invertebrates, it has only been shown for closely related species that diverged from their common ancestor no more than 25 to 60 million years ago (for example, [43, 44]). Our results demonstrate for the first time the existence of an ancient arthropod regulatory element dating back to the Cambrian (about 500 million years ago). The element shows a conserved function but lacks sufficient sequence conservation to be detected by sequence alignment, opening the possibility that other ancient invertebrate regulatory elements remain to be discovered.
Flies were maintained on standard cornmeal-agar medium at 18°C, and Oregon-R was used as a control. Strains used were: ase[1] (formerly known as sc[2]), toll-8[MD806] Gal4, ptc-Gal4, sca[537.4] Gal4, achaete[SBM] Gal4 and UAS-sc (FlyBase). UAS constructs for ectopic expression of D. melanogaster and T. castaneum ase and C. salei CsASH2 were generated, and P-element-mediated transformation was performed, by standard techniques.
ase[1] flies were crossed to hsp70Gal4 > UAS Dm-ase, hsp70Gal4 > UAS Tc-ase and hsp70Gal4 > UAS CsASH2 flies, respectively, and allowed to lay eggs in culture bottles for 3 days. Heat shocks were performed between 16 hours and 8 hours before puparium formation. Expression was induced by three 1-hour heat shocks at 37°C, separated by 2-hour intervals at 25°C. Wings were mounted in glycerol and analyzed under a compound microscope (Leica).
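The heat-shock regime has to fit inside an 8-hour developmental window. A quick timing check is sketched below, assuming (hypothetically) that the first pulse starts at the opening of the window, 16 hours before puparium formation; the function name and that start time are illustrative.

```python
def heat_shock_schedule(start_h: float = -16.0, pulses: int = 3,
                        pulse_h: float = 1.0, rest_h: float = 2.0):
    """List (on, off) times in hours relative to puparium formation (0 h)."""
    schedule = []
    t = start_h
    for i in range(pulses):
        schedule.append((t, t + pulse_h))  # 1-hour shock at 37 °C
        t += pulse_h
        if i < pulses - 1:
            t += rest_h                    # 2-hour recovery at 25 °C
    return schedule

sched = heat_shock_schedule()
print(sched)                  # [(-16.0, -15.0), (-13.0, -12.0), (-10.0, -9.0)]
print(sched[-1][1] <= -8.0)   # True: the last pulse ends inside the window
```

Three pulses and two recovery periods span 7 hours, leaving 1 hour of slack within the stated window.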
Three independent lines were generated for each UAS construct (ORF only and ORF+SOPE) and crossed to four different Gal4 lines that activated the constructs in the expression domains of toll-8, patched, scabrous and ac. Flies of the appropriate genotype were selected, mounted and the bristles were counted under the dissecting microscope (Leica). Statistical analysis was performed using Microsoft Excel.
The sequenced genomes of D. pulex (Daphnia_pulex 2006-09 JGI) and S. maritima (Strigamia maritima Genome Project by Baylor College of Medicine, NCBI Project ID 20501) were searched using tblastn with the ASH and Ase proteins of D. melanogaster, T. castaneum and C. salei as queries. Hits with relevant homology to the bHLH domain were further characterized. Three genes were identified in the D. pulex genome: an ASH homologue (JGI_V11_254034), an ase homologue (JGI_V11_254038) and a truncated copy of ase (JGI_V11_232740). In the S. maritima genome only one ASH homologue was identified; a second gene analyzed was too divergent in the basic region of the bHLH domain to be classified as an ASH gene.
Phylogenetic trees were constructed using all ASH and ase genes from the insects D. melanogaster and T. castaneum, the crustaceans T. longicaudatus and D. pulex, the chelicerate C. salei and the myriapods G. marginata and S. maritima. Amino acid sequences were aligned with ClustalW2, manually improved, and conserved regions were selected with Gblocks (using permissive parameters). The resulting alignment is only 66 amino acids long and corresponds roughly to the basic region and helices of the bHLH domain and to the carboxy-terminal domain. Trees were constructed by maximum likelihood methods, and tree topologies were compared with the Shimodaira-Hasegawa test as implemented in the PHYLIP package.
ORF: open reading frame
SOP: sensory organ precursor
SOPE: sensory organ precursor enhancer
We are grateful to Michael Akam for giving us access to the unpublished S. maritima genome data. We thank Andrew Jarman and Aziz Aboobaker for helpful discussions and Sung Ly for technical assistance. This work was supported by grants 29156 (Wellcome Trust) to PS and STO 361/2-1, STO 361/6-2 (German Research Foundation) and BB/F021909/1 (Biotechnology and Biological Sciences Research Council) to AS.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.