Modified base-binding EVE and DCD domains: striking diversity of genomic contexts in prokaryotes and predicted involvement in a variety of cellular processes

Background
DNA and RNA of all cellular life forms and many viruses contain an expansive repertoire of modified bases. The modified bases play diverse biological roles that include both regulation of transcription and translation, and protection against restriction endonucleases and antibiotics. Modified bases are often recognized by dedicated protein domains. However, the elaborate networks of interactions and processes mediated by modified bases are far from being completely understood.

Results
We present a comprehensive census and classification of EVE domains that belong to the PUA/ASCH domain superfamily and bind various modified bases in DNA and RNA. We employ the “guilt by association” approach to make functional inferences from comparative analysis of bacterial and archaeal genomes, based on the distribution and associations of EVE domains in (predicted) operons and functional networks of genes. Prokaryotes encode two classes of EVE domain proteins, slow-evolving and fast-evolving ones. Slow-evolving EVE domains in α-proteobacteria are embedded in conserved operons, potentially involved in coupling between translation and respiration, cytochrome c biogenesis in particular, via binding 5-methylcytosine in tRNAs. In β- and γ-proteobacteria, the conserved associations implicate the EVE domains in the coordination of cell division, biofilm formation, and global transcriptional regulation by non-coding 6S small RNAs, which are potentially modified and bound by the EVE domains. In eukaryotes, the EVE domain-containing THYN1-like proteins have been reported to inhibit programmed cell death (PCD) and regulate the cell cycle, potentially via binding 5-methylcytosine and its derivatives in DNA and/or RNA. We hypothesize that the link between PCD and cytochrome c was inherited from the α-proteobacterial, proto-mitochondrial endosymbiont and, unexpectedly, could involve modified base recognition by EVE domains.

Fast-evolving EVE domains are typically embedded in defense contexts, including toxin-antitoxin modules and type IV restriction systems, suggesting roles in the recognition of modified bases in invading DNA molecules and targeting them for restriction. We additionally identified EVE-like prokaryotic Development and Cell Death (DCD) domains that are also implicated in defense functions, including PCD. This function was inherited by eukaryotes, but in animals, the DCD proteins apparently were displaced by the extended Tudor family proteins, whose partnership with Piwi-related Argonautes became the centerpiece of the Piwi-interacting RNA (piRNA) system.

Conclusions
Recognition of modified bases in DNA and RNA by EVE-like domains appears to be an important but, until now, under-appreciated common denominator in a variety of processes including PCD, cell cycle control, antivirus immunity, stress response, and germline development in animals.
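As a rough illustration of the “guilt by association” reasoning mentioned in the Results, the sketch below tallies which gene families recur next to an anchor domain across a set of gene neighborhoods. It is a minimal sketch and not the pipeline used in this work: the family labels (`famA`, `famB`, ...), the window size, and the conservation threshold are all hypothetical placeholders chosen for the example.

```python
from collections import Counter

# Hypothetical toy neighborhoods: each list is the gene order around one
# EVE-encoding gene. Family labels are placeholders, not data from the study.
NEIGHBORHOODS = [
    ["famA", "EVE", "famB"],
    ["famC", "famA", "EVE", "famB", "famD"],
    ["famB", "EVE", "famE"],
    ["famF", "famG", "famH", "EVE"],
]


def neighbor_counts(neighborhoods, anchor="EVE", window=1):
    """Count, per gene family, the neighborhoods in which that family
    occurs within `window` positions of the anchor gene."""
    counts = Counter()
    for genes in neighborhoods:
        seen = set()  # count each family at most once per neighborhood
        for i, gene in enumerate(genes):
            if gene != anchor:
                continue
            lo = max(0, i - window)
            hi = i + window + 1
            seen.update(g for g in genes[lo:hi] if g != anchor)
        counts.update(seen)
    return counts


def conserved_partners(neighborhoods, anchor="EVE", window=1, min_fraction=0.5):
    """Return families adjacent to the anchor in at least `min_fraction`
    of the neighborhoods, most frequent first (ties broken by name)."""
    counts = neighbor_counts(neighborhoods, anchor, window)
    total = len(neighborhoods)
    hits = [fam for fam, n in counts.items() if n / total >= min_fraction]
    return sorted(hits, key=lambda fam: (-counts[fam], fam))


if __name__ == "__main__":
    print(conserved_partners(NEIGHBORHOODS))
```

Families that pass the threshold are only candidates for a functional link. A real analysis would also need to account for phylogenetic redundancy among the genomes sampled, which this sketch ignores.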


Conserved genomic context of MmcQ/YjbR-EVE fusion proteins in Pasteurellales
Representative MmcQ/YjbR-EVE fusion protein neighborhoods from Pasteurellales. Genes are shown as arrows from 5' to 3'. The taxonomic order, species, and genomic coordinates for each neighborhood are indicated, as are the GenBank genome accessions and, in parentheses, the GenBank accessions for each EVE protein.

Representative EVE protein neighborhoods from Azospirillum. Genes are shown as arrows from 5' to 3'. The order of α-proteobacteria, species, and genomic coordinates for each neighborhood are indicated, as are the GenBank genome accessions and, in parentheses, the GenBank accessions for each EVE protein.

Supplementary Figure 8, COG1743->DUF499->SWI2/SNF2 helicase-nuclease->EVE defense systems in diverse archaea
EVE protein neighborhoods from archaea that contain COG1743->DUF499->SWI2/SNF2 helicase-nuclease operons. Genes are shown as arrows from 5' to 3'. The taxonomic order, species, and genomic coordinates for each neighborhood are indicated, as are the GenBank genome accessions and, in parentheses, the GenBank accessions for each EVE protein.

Supplementary Figure 9, Alignment of EVE and DCD domains
Alignment of representative EVE and DCD domain sequences. The alignment was made with PROMALS3D and rendered with ESPript (Robert and Gouet 2014). The first 6 sequences are highly conserved EVE domains of the type found in the two largest CLANS clusters, followed by 3 that are more similar to DCD domains. The final 8 sequences are DCD domains, and the last two are canonical DCD domains from plants, including the best-characterized one, from the N-rich protein (NRP) of Glycine max.

Supplementary Figure 10, Hypothesis for the evolution of the RdRP complex eTudor proteins in C. elegans
In C. elegans, the piRNA pathway has become hyperdeveloped and integrated with the Dicer-dependent RNAi response. The foundation for this is apparently the mutation of an eTudor domain in a tandem eTudor protein that was the ancestor of the EKL-1 and ERI-5 eTudor proteins. These eTudor factors are core components of distinct RNA-dependent RNA polymerase (RdRP) complexes responsible for secondary siRNA biogenesis in worms. Secondary siRNA biogenesis occurs in response to binding of mRNA by Dicer-dependent primary siRNAs generated from dsRNA, as well as by piRNAs, which in C. elegans are transcribed from individual loci. The other factors in the complexes minimally include an RdRP and DRH-3 (Dicer-related helicase-3). Presumably, the mutation in the N-terminal eTudor domain allows for a novel type of interaction with a methylated protein, likely one of the other RdRP complex components. We predict this factor to be DRH-3, as it is present in all RdRP complexes, although it could be one of the various RdRPs that distinguish different RdRP complexes, or another factor altogether.

Supplementary Figure 11, Alignment of eTudor domains in choanoflagellates that are fused to DCD domains and the eTudor domain from Drosophila SND1 (Tudor-SN)
Alignment of eTudor domain sequences from choanoflagellates that are found in DCD proteins and the eTudor domain from the Tudor-SN ortholog encoded by D. melanogaster, SND1. The alignment was made with PROMALS3D and rendered with ESPript (Robert and Gouet 2014). The conserved residues and secondary structures characteristic of eTudor domains are present.