Human Lsg1 defines a family of essential GTPases that correlates with the evolution of compartmentalization
© Reynaud et al. 2005
Received: 07 September 2005
Accepted: 07 October 2005
Published: 07 October 2005
Compartmentalization is a key feature of eukaryotic cells, but its evolution remains poorly understood. GTPases are the oldest enzymes that use nucleotides as substrates and they participate in a wide range of cellular processes. Therefore, they are ideal tools for comparative genomic studies aimed at understanding how aspects of biological complexity such as cellular compartmentalization evolved.
We describe the identification and characterization of a unique family of circularly permuted GTPases represented by the human orthologue of yeast Lsg1p. We placed the members of this family in the phylogenetic context of the YlqF Related GTPase (YRG) family, whose members are present in Eukarya, Bacteria and Archaea and include the stem cell regulator Nucleostemin. To extend the computational analysis, we showed that hLsg1 is an essential GTPase predominantly located in the endoplasmic reticulum and, in some cells, in Cajal bodies in the nucleus. Comparison of localization and siRNA datasets suggests that all members of the family are essential GTPases that have increased in number as the compartmentalization of the eukaryotic cell and the ribosome biogenesis pathway have evolved.
We propose a scenario, consistent with our data, for the evolution of this family: cytoplasmic components were first acquired, followed by nuclear components, and finally the mitochondrial and chloroplast elements were derived from different bacterial species, in parallel with the formation of the nucleolus and the specialization of nuclear components.
Comparative genomics is a powerful method for identifying the potential functions of previously uncharacterized genes, allowing their distribution among the kingdoms of life to be characterized, and the changes in sequence and regulation underpinning their conserved or divergent functions to be tracked. Comparative genomics has been enormously facilitated by progress in bioinformatics tools and by the large amount of information available from databases concerning protein localization [2, 3], viability [4, 5], protein expression, genetic interactions and protein-protein interactions. These resources are usually focused on one particular organism (S. cerevisiae, C. elegans, D. melanogaster or B. subtilis) and are therefore mainly used by the small part of the scientific community working with this organism and able to handle the outcome and limitations. Attempts have been made to correlate large datasets across species, for example in the case of protein-protein interactions. These cross-correlation analyses are based on the presumption that sequence and structural similarities between gene products can be used to assess functional similarities [10, 11] and could in principle be extended to protein localization, viability or partners.
Genomics should be particularly powerful in the case of GTP binding proteins (or GTPases), which despite extraordinary functional diversity are all believed to have evolved from a single common ancestor. As a result, all known GTPases have a conserved switch mechanism of action, core structure and sequence motifs. These proteins are found in all domains of life and are involved in such essential processes as vesicular trafficking, protein translation, intracellular signal transduction and cell cycle progression [12–14]. GTP binding proteins are often described as molecular switch proteins because of their particular mode of action. Binding and hydrolysis of GTP results in conformational changes in the so-called switch regions of the protein, which define the active GTP- and the inactive GDP-bound forms; these are used, for instance, for regulating receptor activation and cargo recruitment to membranes.
We have used comparative genomics to identify and characterize the human homologue of the yeast protein Lsg1. Here, we describe a novel family of GTP binding proteins, which we have named YRG (YlqF Related GTPases). Members of this family contain a central GTPase domain showing a unique circular permutation of the known G motifs of the GTP binding proteins. A phylogenetic analysis was used for cross-species comparisons, focusing on sub-cellular localization, cell viability and the known functions of each subfamily member. This analysis showed that YRG family members are essential, have increased in eukaryotes as cell compartmentalization has evolved, and show functional conservation in relation to rRNA maturation.
Recently, we have localized more than 800 human proteins in living cells with the aim of gaining preliminary functional data. Analysis of these proteins for sequences exhibiting characteristic GTPase motifs such as the P-loop allowed us to identify a subset of proteins as putative GTPases.
A BLAST search for similar protein sequences shows that this unusual GTPase is present as a single copy per genome (Figure 1B [see Additional file 1]). Only one member of the family has so far been experimentally defined, namely the Lsg1 protein in S. cerevisiae. Accordingly, we named the human protein hLsg1 (human orthologue of Lsg1). All orthologues of hLsg1 possess a central MMR/HSR1 domain belonging to KOG1424 in the database of clusters of orthologous genes. The identity between aligned sequences ranges from 31% to 88%. Interestingly, except in the E. cuniculi member, the GTPase domain contains an unusual insertion in comparison to the canonical GTPase structure. This insertion separates the G4 element from the remaining GTPase elements (G1, G2 and G3) (Figure 1A).
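The identity range quoted above can be illustrated with a minimal Python sketch. The text does not state which definition of percent identity was used, so the convention here (matches divided by the number of columns in which neither sequence has a gap) is an assumption, and the two aligned strings are invented toy sequences, not real Lsg1 orthologues.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped.

    Assumption: identity = matches / ungapped columns. Other conventions
    (e.g. dividing by the alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences (hypothetical, for illustration only).
toy_a = "MGK-KSLLGR"
toy_b = "MGRTKSL-GK"
toy_identity = percent_identity(toy_a, toy_b)
```

Applying such a function to every pair of rows in a multiple alignment gives the minimum and maximum pairwise identities of the kind reported for the hLsg1 orthologues.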
In order to elucidate the potential function of hLsg1, we extended our phylogenetic analysis. Owing to their unique structure, circularly permuted GTPases have previously been reported [22, 23] and partially grouped into the Yawg/YlqF family (COG1160), which is mainly restricted to prokaryotes and microbial eukaryotes (S. cerevisiae, E. cuniculi). This family contains five subfamilies: YjeQ (YloQ), MJ1464, YqeH, YlqF and Yawg. The latter three branches have eukaryotic members, YlqF representing the ancestor of hLsg1. Interestingly, while the YqeH subfamily is limited to only one member per species (also labeled as Euk-porin in sequence database), and the YjeQ subfamily is mainly restricted to bacteria, the YlqF subfamily shows a large expansion of this gene family in eukarya (Figure 1C). The YlqF subfamily can be further subdivided into five clades: YlqF (bacterial), MTG1 (KOG2485), LSG1 (KOG1424), NOG2 (Yawg, KOG2423), and NUG1 (KOG2484) according to the S. cerevisiae nomenclature. The YlqF family expands further in Coelomates [GNL1, 23] and in Deuterostomia (Nucleostemin) (Figure 1B [see Additional file 2]).
Next, we exploited the experimental data from a comprehensive large-scale localization screen in yeast and we conducted literature searches to deduce the possible cellular localizations of the different family members, ranging from the nucleolus to the mitochondria. The nucleolus is the compartment in which the large ribosomal RNA precursor (pre-rRNA) is synthesized, processed into the mature 18S, 5.8S, and 28S rRNAs and assembled with proteins to form ribosomal subunits that move to the nucleoplasm and are finally exported to the cytoplasm. Mitochondria and chloroplasts also possess a set of ribosomes. All yeast members (LSG1, NOG2, NUG1 and MTG1) are involved in ribosome biogenesis [24, 28–30], and YjeQ binds to the ribosome in E. coli. Finally, using ChloroP to predict proteins localized to the chloroplast, we detected a sixth subfamily in YlqF, called ChYlqF (for Chloroplast YlqF), and a second subfamily in YqeH, called ChYqeH (for chloroplast YqeH). These are only found in plant genomes and group in the phylogenetic tree with the cyanobacteria YRG and YqeH members (Figure 1B [see Additional files 1 and 2]).
We used the large datasets from gene viability screens of bacteria, worms and flies to compare our observations with data about other YRG family members. YjeQ was shown to be indispensable for the growth of E. coli and B. subtilis. In C. elegans, YRG orthologues are non-viable (t19a6.2a, t19a6.2b, k01c8.9, C53H9) (Figure 3C). Since large human RNAi screens are only now in progress, no data were available for other YRG human genes. However, interestingly, overexpression of nucleostemin was shown to be lethal.
According to our results, hLsg1 is essential, like its yeast counterpart, and this characteristic seems to be common to the YRG family members. This implies that each YRG protein fulfils essential functions.
Compartmentalization of the human cell allows better control of function and reaction steps in many pathways, including ribosome assembly. Cellular localization is a key to defining protein function. Using large-scale localization screens, we previously identified hLsg1 as an endoplasmic reticulum localized protein, in contrast to yeast Lsg1, which is proposed to localize specifically to the cytosol.
Immunostaining with an antibody against the entire protein showed that the endogenous protein also localized to reticular membranes, and in a fraction of the cells to a number of small punctate nuclear structures. These results are very similar to those obtained with the hLsg1-YFP fusion protein (Figure 4A, 2). Double staining showed that hLsg1 partially co-localized with an ectopically expressed FP (fluorescent protein) used to mark the ER (Clontech ER marker) (Figure 4B, bottom row), as well as with the nuclear envelope marker lamin B1 (Figure 4B, top row). However, hLsg1 was largely absent from the Golgi complex, which was labeled with antibodies against the Golgi membrane protein golgin97, and from mitochondria, marked by antibodies against HSP60 (data not shown). Moreover, the small hLsg1-positive nuclear structures observed in a fraction of the cells co-localized with coilin, a typical marker of Cajal bodies (CBs) (indicated by arrowheads in Figure 4B, middle row). The CBs are functionally linked to the nucleolus and play a major role in the maturation of RNP, acting on the mRNA as well as the rRNA pathway.
These data demonstrate that in contrast to its yeast counterpart, hLsg1 localizes to the ER and to Cajal bodies in the nucleus.
To determine whether hLsg1 shuttled between nucleus and cytosol via a CRM1-dependent nuclear export pathway, we transfected Vero cells with either hLsg1-YFP or hLsg1 deletion mutants and compared the localization of the fusion proteins after treatment with the CRM1 nuclear export inhibitor Leptomycin B (LMB) (Figure 5B). Full-length hLsg1 (YFP-hLsg1) is LMB-sensitive (Figure 5B); so is its C-terminal counterpart hLsg1-CFP (data not shown). To confirm this, we performed the same experiment using the deletion mutants YFP-hLsg1-1-600 and YFP-hLsg1-480-658 as well as the full-length YFP-hLsg1. We also took intermediate time points (3 h and 5 h) to obtain insights into the kinetics of hLsg1 shuttling. Interestingly, YFP-hLsg1 accumulated in the nucleus over an 8 h period, and at 5 h most of the transfected cells showed punctate labeling in the nucleus reminiscent of Cajal bodies. YFP-hLsg1-480-658 showed a permanent nuclear location and YFP-hLsg1-1-600 was constantly in the cytosol.
These data suggest that hLsg1 shuttles between the cytosol and Cajal bodies via a CRM1-dependent export mechanism.
Using database sequence similarity searches coupled with phylogenetic analysis, we were able to unite the circularly permuted GTPases into a family that we have named YRG for YlqF Related GTPases [see Additional files 1 and 2]. The YlqF protein family represents the largest subfamily of YRG expansion in eukarya, which is potentially involved in ribosome biogenesis.
Phylogenetic analysis defines ten GTPase subfamilies with a global phyletic distribution compatible with their presence in the last universal common ancestor (LUCA) of extant life forms. An emerging concept suggests that these universal GTPases are necessary either for ribosome function or for transmitting information from the ribosome to downstream targets to generate specific cellular responses. These are associated with translation and include four translation factors, two OBG-like GTPases, the two signal-recognition-associated GTPases, the MRP subfamily of MinD-like ATPases and the YRG family. Here we have defined the YRG family for the first time as a eukaryotic expansion of the original Yawg/YlqF family tightly coupled to the evolution of compartmentalization.
The YRG family was originally defined as a particular class of GTPases showing a circularly permuted structure, with the four GTPase motifs reorganized as G4 followed by G1, G2 and G3 (Figure 1A). This circular permutation is unique in the GTPase superfamily. However, we have shown that this inverted structure does not seem to affect GTPase activity or folding, in agreement with other studies [31, 39]. Moreover, regarding the potential function of this family, it has been pointed out that most YRG members bind to the ribosome [YjeQ, ], are involved in the maturation of ribosomes or mitoribosomes [24, 28, 29, 2], localize to compartments related to rRNA maturation [NGP, [1, 39]], and are essential proteins (see Figure 3C and Additional file 2). Altogether, this indicates that YRG members have an essential role in ribosomal assembly.
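The motif reordering described above can be sketched in Python as a simple check of where the G motifs fall along a sequence. This is an illustration only, not the method used in this study: the regular expressions are the widely cited textbook consensus patterns (G1 GxxxxGK[S/T], G3 DxxG, G4 [N/T]KxD), G2 is omitted because a single conserved threonine is too weak to search for, only the first match of each pattern is used, and both test sequences are invented.

```python
import re

# Simplified consensus patterns (assumption: textbook forms, not the
# profiles used in the paper). G2 is deliberately left out.
MOTIFS = {
    "G1": re.compile(r"G.{4}GK[ST]"),  # P-loop
    "G3": re.compile(r"D..G"),
    "G4": re.compile(r"[NT]K.D"),
}


def motif_order(protein: str) -> list:
    """Return motif names sorted by the position of their first match."""
    hits = []
    for name, pattern in MOTIFS.items():
        match = pattern.search(protein.upper())
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


def is_circularly_permuted(protein: str) -> bool:
    """True when G4 precedes G1, the arrangement described for YRG members."""
    order = motif_order(protein)
    return "G1" in order and "G4" in order and order.index("G4") < order.index("G1")


# Hypothetical toy sequences, not real proteins.
canonical_toy = "MAAGLPNVGKSSLLNAAADTAGQEAAANKIDLAA"  # G1 ... G3 ... G4
permuted_toy = "MAANKIDLAAGLPNVGKSSLLNAAADTAGQEA"     # G4 ... G1 ... G3
```

Short patterns such as DxxG match by chance in real proteins, so a practical scan would restrict the search to the annotated GTPase domain or use profile-based methods.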
Interestingly, hLsg1 is the only member of this family that shows a dual localization (cytosol/endoplasmic reticulum and Cajal Bodies). The cytosol contains huge numbers of ribosomes freely diffusing or bound to the endoplasmic reticulum, and is the main transit pathway for rRNA en route to the mitochondria or the chloroplast. Cajal Bodies are spherical nuclear bodies containing a variety of components including nucleolar proteins, snRNPs and SMN. They are dynamic structures functionally linked to the nucleolus, presumably involved in RNP maturation and related to gene expression [43, 44]. Consistent with these data, one could hypothesize that hLsg1 is a regulator of the rRNA pathway that can relocate to Cajal Bodies and interact with specific factors such as nucleolar proteins. The observation that Leptomycin B treatment leads to accumulation of hLsg1 in the nucleus clearly indicates shuttling via a CRM1-dependent export pathway. We hypothesized that hLsg1 relocalizes from the cytosol to the nucleus in response to internal (e.g. cell cycle) or external (e.g. growth factor) stimuli. In this way, hLsg1 would act on the control of rRNA biosynthesis at its source: the nucleolus. In the future, these hypotheses will be tested for hLsg1 and for the other YRG family members to elucidate their role in rRNA biosynthesis and maturation.
Using comparative genomics, we defined the YRG family as a unique group of circularly permuted GTPases. We suggest a potential function for this family, as well as a potential pathway by which the family members may act sequentially, following an evolutionary process linked to compartmentalization (Figure 6). A future goal will be to test this hypothesis experimentally and to dissect the molecular mechanisms of action of each member of the pathway.
The translated sequence of the Homo sapiens gene FLJ11301 (GenBank accession no. NP_060855) was used to search the non-redundant protein database at the National Center for Biotechnology Information using the PSI-BLASTP program (15). Homologues were identified in Homo sapiens (GenBank accession identifier BAA92116), Mus musculus (XP_148574), Danio rerio (AAH66695), Caenorhabditis elegans (NP_490904), Caenorhabditis briggsae (CAE74467), Drosophila melanogaster (NP_569915), Anopheles gambiae (EAA13064), Saccharomyces cerevisiae (NP_011416), Schizosaccharomyces pombe (NP_593948), Arabidopsis thaliana (NP_172317), Zea mays (AAD41267), Encephalitozoon cuniculi (CAD26329), Eremothecium gossypii (NP_985506) and Plasmodium falciparum (NP_702181). The sequence corresponding to Rattus norvegicus had to be reconstructed using an insertion from Mus musculus, probably owing to an incorrect gene prediction (XP_213604).
The 14 orthologous sequences were aligned using the ClustalW program. PSI-BLAST searches on the NCBI protein database were performed using different representatives of the YRG family as seed, according to the bibliography, and were iterated until members of the closest subfamily were found in the list of hits. The sets of orthologous sequences were manually checked for sequence integrity and to clarify subfamily definitions. Progressively larger multiple sequence alignments were built by constructing multiple sequence alignments of each subfamily, which were manually polished and added together stepwise. At each step, the parts outside the central GTPase domain, which often showed no homology across subfamilies (and therefore should not be aligned), were trimmed to facilitate the production of the next multiple sequence alignment. The final multiple sequence alignment was used to produce the corresponding phylogenetic tree (excluding the non-aligned regions) using ClustalW. The full list of sequences used for the tree and their database identifiers are given as supplementary material [see Additional file 1].
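The trimming step described above was done manually; the following Python sketch only illustrates the idea of keeping the alignment columns that span the central GTPase domain. The function name, the column coordinates and the two toy rows are hypothetical and do not come from the study.

```python
def trim_alignment(alignment: dict, start: int, end: int) -> dict:
    """Keep alignment columns start..end-1 (0-based, end exclusive) for every row.

    Assumption: the domain boundaries are already known as column indices,
    e.g. from manual inspection of the subfamily alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = lengths.pop()
    if not 0 <= start < end <= width:
        raise ValueError("column range lies outside the alignment")
    return {name: seq[start:end] for name, seq in alignment.items()}


# Hypothetical toy alignment: flanking regions do not align, the core does.
toy_alignment = {
    "LSG1_toy": "MK--AAGKSTLLNDE--",
    "NOG2_toy": "--MRPAGKSSVINKKQW",
}
toy_core = trim_alignment(toy_alignment, 5, 13)
```

The trimmed rows could then be written out and combined with the next subfamily alignment, mirroring the stepwise procedure in the text.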
HeLa (ATCC CCL-2) and Vero (ATCC CCL-81) cells were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% FCS and penicillin/streptomycin at 37°C in an atmosphere of 5% CO2. Cells were seeded on to glass coverslips, Nunc plates or LabTek dishes and were transfected using Fugene6 (Roche) according to the manufacturer's protocols. For immunocytochemistry, transiently transfected HeLa cells were grown on coverslips and fixed in ice-cold methanol for 5 min at -20°C. The cells were then washed again and incubated in PBS for 20 min. Primary and secondary antibodies were diluted in PBS. The cells were incubated with primary antibodies followed by secondary antibodies for intervals of 30 min with three washing steps in between. The coverslips were then mounted in Mowiol on glass slides. Images of the stained cells were acquired using either a Zeiss Cell Observer System or a Leica AOBS confocal laser-scanning microscope.
Nucleotide binding was measured by the filtration method. Recombinant proteins were incubated in 20 mM Tris-HCl pH 7.5, 1 mM DTT, 5 mM MgCl2, 10 mM EDTA, 0.5 g/l bovine serum albumin, (3H)GTP or (3H)GDP (7.7 Ci/mmol, Amersham-Pharmacia-Biotech) and cold 30 μM GTP or GDP. After incubation at 30°C for the indicated times, samples were diluted in 500 μl of ice-cold washing buffer (20 mM Tris-HCl pH 7.5, 25 mM MgCl2 and 100 mM NaCl) and applied to a nitrocellulose filter (0.45 μm, Millipore). The filters were rinsed with 4 × 4 ml ice-cold washing buffer and the radioactivity retained on the filters was determined by scintillation counting.
GTPase activity was measured by HPLC as described by Ahmadian et al. 1999.
siRNA sequences were BLAST searched against the human genome to ensure that they were specific for hLsg1. The hLsg1 siRNA sequence showed no exact or near-exact matches to any other sequence in the human genome and is therefore hLsg1-specific. siRNAs were synthesized by EUROGENTEC. hLsg1 siRNA (5'-UGGAGAGAAACUGCAAGACTT-3') targets nucleotides 506–524 of human hLsg1 relative to the first nucleotide of the start codon.
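The relationship between the 21-nt siRNA and the 19-nt target window (506–524) can be checked with a short Python sketch. It assumes that the listed strand is the sense strand and that the 3' TT is a non-targeting overhang. The coding sequence below is a placeholder built from filler nucleotides so that the arithmetic can be followed; it is not the real hLsg1 sequence.

```python
def sirna_core_as_dna(sirna: str, overhang: str = "TT") -> str:
    """Strip the 3' overhang and convert the RNA core to DNA letters."""
    seq = sirna.upper()
    if seq.endswith(overhang):
        seq = seq[: -len(overhang)]
    return seq.replace("U", "T")


def target_coordinates(cds: str, sirna: str):
    """Return 1-based (start, end) of the siRNA core in the CDS, or None.

    Position 1 is the first nucleotide of the start codon, as in the text.
    """
    core = sirna_core_as_dna(sirna)
    index = cds.upper().find(core)
    if index < 0:
        return None
    return index + 1, index + len(core)


HLSG1_SIRNA = "UGGAGAGAAACUGCAAGACTT"  # sequence given in the text

# Placeholder CDS (NOT the real hLsg1 sequence): a start codon, 502 filler
# nucleotides, then the target site, so that the site begins at position 506.
placeholder_cds = "ATG" + "C" * 502 + sirna_core_as_dna(HLSG1_SIRNA) + "TAA"
coords = target_coordinates(placeholder_cds, HLSG1_SIRNA)
```

With a real coding sequence in place of the placeholder, the same call would confirm whether the reported coordinates match the annotated start codon.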
Cells were seeded into 12-well plates. Twenty-four hours later, they were transfected with 1.68 μg of siRNA per well (unless otherwise noted). Transfections were as described with the following modifications. Additional OptiMEM (Invitrogen) was not added, and medium was removed before transfection and replaced with 400 μl of OptiMEM. Full-serum medium (unless otherwise noted) was added 4 h post-transfection. At the indicated times post-transfection, the cells were washed twice with PBS and detached from the plate with PBS EDTA. Whole cell extract was obtained by lysing the cells with RIPA buffer containing protease inhibitors and DTT. Protein concentrations were measured using the Bradford assay. Extracts were run on 8% polyacrylamide gels (12% for actin) and transferred to nitrocellulose membranes. The membranes were blocked overnight at 4°C in 1% non-fat dry milk (1 h at room temperature in 5% non-fat dry milk for actin), then probed with either rabbit polyclonal anti-hLsg1 or anti-actin (Santa Cruz Biotechnology) antibody for 1 h at room temperature (overnight at 4°C for actin), washed, and probed with a horseradish peroxidase-conjugated secondary antibody for 1 h at room temperature. Signals were detected using the ECL-Plus reagent (Amersham Biosciences).
We thank Angus Lammond for sharing an aliquot of anti-coilin antibodies. The yeast Lsg1 construct was a kind gift from Arlen Johnson. We appreciate the help of Jeremy Simpson in siRNA design and preparation of this manuscript, and we are grateful for funding from the European Molecular Biology Organization (Long Term Fellowship) to E. Reynaud. M.A. Andrade is the recipient of a Canada Research Chair.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.