Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay
© Zhang et al.; licensee BioMed Central Ltd. 2009
Received: 21 April 2009
Accepted: 14 May 2009
Published: 14 May 2009
Nonsense-mediated decay is a mechanism that degrades mRNAs with a premature termination codon. That some exons have premature termination codons at fixation is paradoxical: why make a transcript if it is only to be destroyed? One model supposes that splicing is inherently noisy and spurious transcripts are common. The evolution of a premature termination codon in a regularly made unwanted transcript can be a means to prevent costly translation. Alternatively, nonsense-mediated decay can be regulated under certain conditions so the presence of a premature termination codon can be a means to up-regulate transcripts needed when nonsense-mediated decay is suppressed.
To resolve this issue we examined the properties of putative nonsense-mediated decay targets in humans and mice. We started with a well-annotated set of protein coding genes and found that 2 to 4% of genes are probably subject to nonsense-mediated decay, and that the premature termination codon reflects neither rare mutations nor sequencing artefacts. Several lines of evidence suggested that the noisy splicing model has considerable relevance. Exons that are uniquely found in nonsense-mediated decay transcripts (nonsense-mediated decay-specific exons) 1) tend to be newly created; 2) have low inclusion levels; 3) tend not to be a multiple of three long; and 4) belong to genes with multiple splice isoforms more often than expected. In addition, 5) these genes are not obviously enriched for any functional class nor conserved as nonsense-mediated decay candidates in other species. However, nonsense-mediated decay-specific exons for which distant orthologous exons can be found tend to have been under purifying selection, consistent with the regulation model.
We conclude that for recently evolved exons the noisy splicing model is the better explanation of their properties, while for ancient exons regulation of gene expression by nonsense-mediated decay is a viable explanation.
Note that the 55-nt rule will not capture all transcripts subject to NMD. Singh et al. found that an artificial 3' untranslated region (UTR) of > 420 nucleotides can stimulate NMD independent of the 55-nt rule. Upstream open reading frames (uORFs) can also trigger NMD in a size-dependent manner. Furthermore, cytoplasmic poly(A)-binding protein (PABP) inhibits the interaction between eRF3 and Upf1 in vitro and prevents NMD in cells when positioned in proximity to the termination codon [15, 18–20]. Based on these findings, a unified model has been proposed in which the distinction between translation termination at PTCs and at 'normal' termination codons relies on the physical distance between the terminating ribosome and PABP. Nonetheless, the 55-nt rule is the best defined and is unlikely to greatly mislead.
That NMD is an important mechanism is shown by the serious consequences of its malfunction. In mice, loss of Rent1 (UPF1), a key factor of NMD, leads to death at an early embryonic developmental stage. Moreover, approximately one-third of inherited genetic disorders and many forms of cancers are associated with mutant genes containing PTCs [23–26].
The fact of NMD raises numerous questions. How many genes have transcripts subject to NMD? More particularly, beyond the occasional rare allele, why do many genes have PTCs? Their existence is at first sight paradoxical: why do cells make a transcript only for it to be degraded? Given that transcripts are degraded to a certain extent, might NMD genes be subject to weaker purifying selection and might their relative freedom to explore sequence space ensure that they are hot-spots for further adaptive changes? Here we define a set of RefSeq genes that are likely to be subject to NMD so as to investigate the above issues.
As regards the central paradox of NMD, two hypotheses are prominent [27, 28]. First, one can suppose that splicing is an inherently error-prone process regularly throwing up the same unwanted transcripts [29, 30]. This being so, NMD can degrade these non-functional transcripts avoiding costly-to-make, potentially toxic, proteins. We consider this the noisy or spurious transcript model. Second, the NMD machinery need not always be operative and can be regulated. It can, for example, be suppressed under nutrient-limiting conditions. Similarly, both levels of RNPS1 (an exon junction component) and hypoxia can modulate NMD intensity [31, 32]. NMD could then be a mechanism to permit up-regulation of specific transcripts on suppression of NMD. This we dub the regulation model.
These two models make numerous predictions about the properties and evolution of NMD target genes and exons. The spurious transcript model predicts that NMD genes should have more than one splice isoform. The regulation model need not predict the same. The noisy model would predict that spurious exons that are not multiples of three (and hence induce frameshifts) should be more likely to provide the selective conditions favouring an in-frame PTC to prevent translation. The noisy model additionally predicts that the NMD-inducing exons should be in rare transcripts and more common in recently evolved exons. By contrast, the regulation model predicts that the NMD-specific exons should be under purifying selection and that the NMD regulation of orthologous genes be conserved in relatively distant species.
Prior evidence can be given to support both models. As regards the regulation model, up-regulation of enzymes associated with amino acid biosynthesis on NMD inactivation, for example, has been related to a feedback circuit coupling low amino acid levels with inactivation of NMD and hence increased translation rates of mRNAs associated with amino acid biosynthesis. More interestingly, it has been reported that NMD is coupled with alternative splicing to regulate a variety of genes. For instance, ribosomal mRNAs (for example rpL3, rpL12) [33, 34] and splicing-related factors (for example, SC35, PTB) [35–37] are auto-regulated by NMD and alternative splicing. Related to this finding, tens of conserved stop-containing exons whose inclusion renders the transcript sensitive to NMD are found in mice, and these exons are unusually frequent in genes that encode splicing activators (such as serine/arginine-rich proteins) and are unexpectedly enriched in the so-called 'ultraconserved' elements in the mammalian lineage.
Other evidence supports the noisy splicing model. Several studies have made efforts to identify and study the naturally occurring transcripts regulated by NMD [9–11, 39–42]. By aligning expressed sequence tags (ESTs) on genomic regions to infer splicing isoforms, about 35% of alternative splicing events are predicted to have the potential to produce PTC-containing transcripts [40, 42]. Using a similar method, Baek and Green also found that about 20% of conserved alternative splicing events produced PTCs. Based on the full-length transcripts, Xing and Lee found that 11% of alternatively spliced isoforms contained PTCs. Using an alternative splicing microarray platform, Pan and co-workers found that most of the PTC-containing transcripts were low in abundance across examined tissues, and this low abundance was independent of NMD function, arguing against a regulatory function of NMD. Furthermore, comparative analysis shows that NMD-inducing alternative splicing events are not conserved between humans and mice, suggesting noise rather than regulation, as does the finding that most experimentally identified putative NMD targets in S. cerevisiae, D. melanogaster and humans are not orthologous among these species. Because a possible alternative explanation for the latter finding is the existence of different PTC-recognition mechanisms in each species, here we consider two species with the same PTC-recognition mechanism, namely mouse and human.
Based on RefSeq mRNAs of high quality, we systematically and computationally identify NMD candidates in both species, according to the well-defined mammalian NMD 55-nt rule  (Figure 1) and employ this set to attempt to distinguish the noisy splicing from the regulation model. We start by defining the data set, ensuring that PTCs are not rare alleles or sequencing artefacts. We then consider the functional and evolutionary properties of NMD candidates. We find that NMD candidates are not commonly conserved between humans and mice. NMD-specific exons are rich in young and low-inclusion-level exons. Although the NMD-specific exons have a high ratio of non-synonymous (Ka) to synonymous (Ks) substitution rate, neutral evolution can be rejected. We find no significant enrichment of NMD candidates in the class of genes subject to positive selection.
Two to four percent of RefSeq genes are nonsense-mediated decay candidates and not explained as rare alleles or sequencing artefacts
Table 1. Comparison of the gene structures of mouse and human nonsense-mediated mRNA decay and non-nonsense-mediated mRNA decay candidates
Average intron length
1.1 × 10⁻³
2.1 × 10⁻⁵
5' UTR length
8.4 × 10⁻⁴
7.0 × 10⁻⁶
3' UTR length
2.9 × 10⁻²⁸
4.4 × 10⁻⁴⁷
5.6 × 10⁻¹²
1.6 × 10⁻⁶
Total number of genes
By reference to the Mammalian Gene Collection full length cDNAs [49, 50] and dbSNP data (Table S1 in Additional file 1), we find that the NMD candidates have the same quality support as other genes, indicating that these candidates are not the result of sequence artefacts or rare alleles. We also found that the human NMD candidates identified here are enriched for potential NMD targets previously determined in the study by Mendell et al. (Table S2 in Additional file 1). As shown in Table 1, NMD candidates generally encode shorter proteins and have longer introns and UTRs compared with non-NMD genes.
Few genes are nonsense-mediated decay candidates in both mouse and human
Table 2. Few genes are nonsense-mediated mRNA decay candidates both in humans and mice. Entries: number of orthologs; NMD candidates in either species; NMD candidates in both species.
As it is likely that some NMD transcripts were missed from the analysis due to strict data selection, we repeated the analysis using all the RefSeq mRNAs, including predicted mRNAs (with prefix 'XM_'). The number of NMD candidates in both genomes increased as expected in this second round. However, the intersection between NMD orthologs was still small (see Table S3 in Additional file 1). The result is consistent with a previous comparison of human, fruitfly D. melanogaster and S. cerevisiae NMD candidates [4, 9–12]. We can also show that low rates of NMD conservation are not consistent with the normal rates of stop-codon turnover (see Table S4 in Additional file 1). Our finding shows that, even when the mechanism of PTC regulation is not a variable, deterministic regulation by NMD is generally not selectively favoured over the long term or is not the correct explanation for most PTCs.
Nonsense-mediated decay exons tend not to be multiples of three long
If an exon is included by noisy splicing, pressure not to be translated should be higher if the exon is not a multiple of three long than if it is a multiple of three long. The inclusion of such exons will induce a frame shift if translated, which will change the encoded amino acids downstream of included exons. If so, this will result in proteins that are at best costly to make and non-functional, and at worst are toxic to cells. In contrast, inclusion of an exon that is a multiple of three will typically result in a small peptide insert but need not disrupt the overall function of that protein. The costs of noisy splicing can be reduced to some extent by degradation of these noisy transcripts employing mRNA decay systems, such as by NMD. As exons that are not a multiple of three impose a higher cost, we expect stronger selection for PTCs in exons that are not multiples of three compared with those that do not induce frameshifts, assuming an equal rate of mis-splicing. The regulation model makes no prediction of bias.
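The frame argument above can be illustrated with a minimal Python sketch. The function names and the example exon lengths are hypothetical and are not taken from the study's data; the sketch only shows the arithmetic behind "multiple of three".

```python
def frame_offset(exon_length: int) -> int:
    """Number of bases by which an included exon shifts the downstream reading frame."""
    return exon_length % 3


def induces_frameshift(exon_length: int) -> bool:
    """True if including an exon of this length changes the downstream reading frame."""
    return frame_offset(exon_length) != 0


def fraction_frameshifting(exon_lengths):
    """Fraction of exons in a list whose length is not a multiple of three."""
    if not exon_lengths:
        raise ValueError("exon_lengths must not be empty")
    shifted = sum(1 for n in exon_lengths if induces_frameshift(n))
    return shifted / len(exon_lengths)


# Hypothetical exon lengths, for illustration only.
example_lengths = [99, 100, 101, 102]
example_fraction = fraction_frameshifting(example_lengths)
```

Comparing this fraction between NMD-specific exons and other exon classes is the kind of contrast the noisy splicing model makes a prediction about.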
Table 3. Classification of human exon types based on RefSeq data
Table 4. Comparison of exon creation/loss between human nonsense-mediated mRNA decay-specific and non-nonsense-mediated mRNA decay-single exons. Column headings: Conserved in target; Conserved in target; Lost in target; Created in source.
Nonsense-mediated decay-specific exons tend to be in the low inclusion category and newly created
According to the noisy splicing model, the NMD transcript is an alternatively spliced unwanted transcript. The regulation model does not require the NMD transcripts to be alternatively spliced transcripts, nor, if they are, need they be the minority form (the transcript isoform that constitutes a small fraction (less than one third) of transcripts from the same gene). Are NMD genes then more likely to be alternatively spliced than random genes, and are the NMD transcript isoforms rare? To investigate this we mapped our gene lists to Ensembl genes with BioMart, and extracted the splicing isoform information from the ASD database [54, 55].
Table 5. Identification of alternatively spliced genes based on the ASD database
Are NMD exons ancient or new? To explore this question, we examined the exon creation and loss events for each NMD-specific cassette exon. To obtain this information, we started by mapping our exons to those from the ASAP2 and the VEEDB databases [56, 57]. The VEEDB database provides exon conservation information for each given exon based on splice site conservation, extracted from a 17-vertebrate UCSC multi-genome alignment. Using this exon conservation data, we could determine whether a given human exon is conserved or absent in mouse and dog (outgroup). Unfortunately, only a small portion of exons could be mapped to the VEEDB database. As shown in Table 4, 9 out of 35 NMD-specific exons were created after the human-mouse split. This proportion (25.7%) is significantly higher than that of non-NMD-single class exons (11.9%) (Fisher's exact test, P = 0.0315) (see Additional file 2 for human NMD-specific exons conserved in mice). When we compared humans against Rhesus macaques (Macaca mulatta) with mice as outgroup, the difference was more significant (Additional file 1, Table S5, NMD-specific 21.2%, non-NMD-single 4.5%, P = 0.001139). These findings are consistent with the fact that NMD-inducing exons are often not conserved among species. The exon loss rates are much smaller and there was no difference between NMD and non-NMD exons.
An association between new and alternatively spliced exons can probably also account for the rapid turnover of genes subject to NMD. Two types of species-specific alternative splicing events can be defined [58, 59]. One type, referred to as 'species-specific alternative splicing of conserved exons', is represented by a conserved exon that is alternatively spliced in one species but constitutively spliced in the other species. The other type, referred to as 'genome-specific alternative splicing', is represented by an alternative exon in one species which is not detectable in the ortholog of the other species (see Figure Three in  for diagrams). More than 41% of species-specific alternative splicing events of conserved exons and 61% genome-specific alternative splicing events had the potential to trigger NMD, while the NMD-inducing rate of conserved alternative splicing events between human and mice was much lower (< 31%) . No matter which form it is, both types cause NMD to occur more often in only one of the two species than in both species, and can hence explain to a considerable extent NMD status divergence.
Where an orthologous exon can be found, the nonsense-mediated decay-specific exon is or was under purifying selection
Table 6. Ka/Ks ratio comparison between human nonsense-mediated mRNA decay and non-nonsense-mediated mRNA decay concatenated alignments. Column headings: in NMD alignment (a); in non-NMD alignment (b).
Comparing alternative cassette exons with other parts of the same gene may be an error-prone test as cassette exons may generally have weaker evolutionary pressures due to their exclusion from some splicing isoforms. Indeed, we observe a higher Ka/Ks ratio in the non-NMD-single class of exons than in other regions (Table 6). To exclude the effect of this, we compared the NMD-specific Ka/Ks ratio against non-NMD-single exons. A higher Ka/Ks ratio for the NMD-specific class was still observed (Table 6, 0.3423 versus 0.2554).
While these results are consistent with the regulated expression model, fuller interpretation of the results is non-trivial. First, the ability to detect orthologous exons predisposes to finding exons functioning in regulation. Second, even if the spurious transcript model is correct for the lineage with the stop, the null for Ka/Ks is not 1. Crucial is when the exon was subject to NMD and what form of selection operated prior to this. If the exon was subject to NMD very recently, then most of the evolutionary history of the exon down the human lineage was not evolving in response to the presence of the stop. Only if the exon was always spurious, unlikely given that this exon is found in multiple distant taxa, would Ka/Ks = 1 be expected for the lineage in question. In short, by virtue of the fact that we can find distant orthologous exons, we are almost certainly biasing the data set to exons that have been and possibly still are functional. Given that Ka/Ks < 1 we can be confident that for some of the time the exon has not been spurious.
Nonsense-mediated decay candidates are fast evolving but not hot-spots for adaptive evolution
Table 7. Test of neutral evolution for human nonsense-mediated mRNA decay-specific exons under branch model. Column heading: Number of parameters.
Given that NMD candidates evolve faster in their own lineage than orthologs (relative rate test, Table S7 in Additional file 1), possibly owing to reduced selective constraints, it is tempting to suppose that NMD genes and exons are potentially given much more freedom to roam sequence space than exons of, for example, house-keeping genes. Might this predispose NMD genes to be hot-spots for adaptive evolution?
Table 8. The frequency of positive selection in human genes is not correlated with nonsense-mediated mRNA decay status.
Under-representation of some functional classes of genes in the nonsense-mediated decay set is consistent with noisy splicing
Table 9. Biological process analysis of nonsense-mediated mRNA decay candidates.
Biological process terms
Biological process unclassified
1.45 × 10⁻¹¹
1.28 × 10⁻¹¹
1.68 × 10⁻³
5.70 × 10⁻²
Cell-surface receptor-mediated signal transduction
3.56 × 10⁻³
1.08 × 10⁻²
7.10 × 10⁻²
1.44 × 10⁻²
8.75 × 10⁻³
6.03 × 10⁻²
5.14 × 10⁻³
8.14 × 10⁻³
Total number of genes
To examine the skew we employed tools from the PANTHER database [65, 66]. The biological processes with Bonferroni-corrected P values < 0.1 in either species are listed in Table 9. About half of the NMD candidates were classified as Biological process unclassified. This was the only set showing over-representation with the NMD class. The only other classes showing significant deviation from expected showed under-representation, these being Developmental processes, Cell-surface receptor-mediated signal transduction, Sensory perception, Signal transduction, Chemosensory perception, and Olfaction. The functional distributions of NMD candidates in humans and mice were quite similar. Of the top six most significant GO terms from either species, five (Biological process unclassified, Developmental processes, Sensory perception, Signal transduction, Chemosensory perception) also appeared in the list of the most significant in the other species (Table 9).
To further determine whether the divergence of orthology in NMD candidates also led to functional divergence, we compared the functional distributions of human and mouse NMD candidates using the FatiGO web tool, which is able to detect particular GO terms for which the two lists of genes have different proportions of genes. For feasibility, the GO terms for mouse genes were deduced from the corresponding human orthologs. No significant GO terms were detected at any level (GO levels 3 to 9; see Additional file 4). To exclude the effect of orthologs that are NMD candidates in both species, we repeated the analysis after removing these orthologs from either or both species. As before there was no GO term showing a significant difference between mice and humans in the regularity of NMD (data not shown). This indicates that there is no functional class in which there are significantly more or fewer human NMD candidates compared against mouse NMD candidates.
These results suggest that NMD targets different genes in the two species, but ones largely in the same functional categories (Table 9 and Additional file 4). While superficially this looks like evidence for regulated splicing, if alternatively spliced genes are more prone to incorrect splicing, we expect, under the noisy splicing model, that the NMD-under-represented functional classes will have fewer alternatively spliced genes than other classes.
Table 10. Evidence that there are fewer alternatively spliced genes in nonsense-mediated mRNA decay under-represented functional classes in humans. Entries: Biological process terms; Cell-surface receptor-mediated signal transduction.
While NMD must play a role in preventing the translation of rare alleles with premature stop codons , it is perhaps surprising that 2 to 4% of our genes have a premature stop codon that is not just a rare allele (Table S1 in Additional file 1). While it is likely that in some cases (ancient exons) NMD functions in a regulatory mode, our results more strongly support the noisy splicing model. Many features are consistent with this: the rarity of genes regulated by NMD in one species being regulated by NMD in the other (controlling for PTC recognition mechanism) (Table 2 and Table S3 in Additional file 1); the excess of exons that are not multiples of three long (Table 3); the association with alternative splicing (Table 5) and with minor splice forms (Figure 2); and the excess of modern exons associated with NMD (Table 4 and Table S5 in Additional file 1). Conservation of the gene classes subject to NMD (Table 9 and Additional file 4) is also consistent, given that gene class and propensity for alternative splicing covary (Table 10). These results are consistent with previous studies and extend their findings.
Is our estimate of 2 to 4% of genes being subject to NMD accurate? This estimate is at the lower bound compared with prior approaches [9–12]. This likely reflects in part both our conservative approach and a tendency for alternative methods to over-estimate. As regards the latter, previous studies based on EST data [40, 42, 43] or expression microarrays found higher proportions of NMD candidates. However, these may include some aberrant transcripts due to noise in EST data. More problematic is the possibility that, as candidates are identified based on expression profile changes after inhibition of NMD, many indirect NMD targets are included [31, 39] (for example, those up-regulated by a protein made from an NMD-regulated gene). There are, however, at least two reasons why our study might be conservative. First, because the RefSeq database excludes splice forms without enough experimental support, many true NMD targets may well be missed. Further, in employing the NMD 55-nt rule (Figure 1) to identify the NMD candidates, we may well miss transcripts regulated in a different manner. Notably both extended 3'UTRs and uORFs can trigger NMD to some extent [18, 19]. Parenthetically, our identified NMD candidates show longer 3'UTRs than non-NMD genes (Table 1). Given an association between long 3'UTRs and NMD, it is possible that both long 3'UTRs and an exon junction complex downstream of a PTC contribute to targeting. On balance then our 2 to 4% figure is probably conservative. By the same token, our NMD sample should be relatively clean (that is, have a low false positive rate). For this reason we suggest that the results that we present are likely to be robust.
Is it likely that spurious splicing will explain most PTCs in other organisms? Consider, for example, S. cerevisiae. Here only 5% of genes have introns and alternative splice forms seem relatively rare. A priori in such a genome regulated expression is expected to be the dominant explanation. Nonetheless, a noisy splicing model of some form may yet be viable. In the yeast genome, more than 70% of genomic regions are transcribed [69, 70] and the richness of the transcriptome is greater than expected. It is viable to suppose that some fraction of these transcripts is spurious and that PTCs, out of the normal reading frame, are selected for. Less clear is how such a model might explain an in-frame PTC in a protein coding gene where the PTC is the unique stop codon in the gene.
We find good evidence consistent with the noisy splicing model, especially in the case of recent exons. However, for ancient exons with a PTC, NMD-regulated gene expression is a viable model.
We downloaded the sequences and annotations for humans and mice from the NCBI RefSeq database (Build 36.1) in January 2007. To improve the confidence of NMD candidate identification, we only extracted the transcripts with the prefix 'NM_'. Based on these annotations, for each transcript, we calculated the distance from the stop codon to the exon-exon junction closest to the 3' end. According to the NMD rule [3, 4, 16], we classified transcripts as NMD candidates if the distance was > 55 nucleotides. Then, we defined a gene as an NMD candidate if it had at least one NMD candidate transcript. All the remaining genes in the genome were classified as non-NMD candidates.
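The classification just described can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the `Transcript` class, its field names and the choice to measure from the last base of the stop codon are assumptions made here for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

NMD_THRESHOLD_NT = 55  # the 55-nt rule


@dataclass
class Transcript:
    transcript_id: str        # e.g. an 'NM_' accession (hypothetical values in tests)
    exon_lengths: List[int]   # exon lengths in 5'->3' order on the mRNA
    stop_codon_end: int       # 1-based mRNA position of the last base of the stop codon


def distance_to_last_junction(t: Transcript) -> Optional[int]:
    """Distance (nt) from the stop codon to the 3'-most exon-exon junction.

    Returns None for single-exon transcripts. A negative value means the
    stop codon lies downstream of the last junction (i.e. in the last exon).
    """
    if len(t.exon_lengths) < 2:
        return None
    last_junction = sum(t.exon_lengths[:-1])  # last base of the penultimate exon
    return last_junction - t.stop_codon_end


def is_nmd_candidate_transcript(t: Transcript) -> bool:
    d = distance_to_last_junction(t)
    return d is not None and d > NMD_THRESHOLD_NT


def is_nmd_candidate_gene(transcripts: List[Transcript]) -> bool:
    """A gene is an NMD candidate if at least one transcript is."""
    return any(is_nmd_candidate_transcript(t) for t in transcripts)
```

For example, a transcript with exons of 100, 200 and 150 nt has its last junction at position 300, so a stop codon ending at position 200 (distance 100) would be flagged, whereas one ending at position 250 (distance 50) would not.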
The UTR length, average intron length and protein length for each transcript were also calculated or extracted from the annotations. For each gene with multiple transcript variants, we collapsed these parameters into one by choosing the splicing form with the longest protein and calculating the means of transcript isoforms.
Exon type classification
We started from 3,362 genes with at least two RefSeq mRNAs in our dataset, which included 252 NMD and 3,110 Non-NMD candidates. We only considered the coding exons in each transcript and excluded the two marginal 5' and 3' exons within each gene due to their general incompleteness. For NMD candidates, we searched the exon isoforms only observed in NMD transcripts and defined these exons as NMD-specific exons, and defined the remaining exons in NMD candidates as NMD-non-specific exons. For non-NMD candidates, we classified exons as non-NMD-single and non-NMD-multiple. The former were observed in only one splicing transcript isoform for a given gene and the latter were observed in at least two different splicing transcript isoforms.
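The four exon classes can be expressed as a small Python function. This is a sketch under stated assumptions: exons are represented as hypothetical `(start, end)` tuples, each transcript carries a flag saying whether it is an NMD candidate, and the input is assumed to have already been restricted to coding exons with the marginal 5' and 3' exons removed.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end); an assumed representation


def classify_exons(transcripts: List[Tuple[bool, List[Exon]]]) -> Dict[Exon, str]:
    """Label every exon of one gene.

    transcripts: list of (is_nmd_transcript, exons) pairs for a single gene.
    """
    gene_is_nmd = any(is_nmd for is_nmd, _ in transcripts)
    all_exons = {exon for _, exons in transcripts for exon in exons}
    labels: Dict[Exon, str] = {}
    for exon in all_exons:
        # NMD status of every transcript of this gene that contains the exon
        carriers = [is_nmd for is_nmd, exons in transcripts if exon in exons]
        if gene_is_nmd:
            # seen only in NMD transcripts -> NMD-specific
            labels[exon] = "NMD-specific" if all(carriers) else "NMD-non-specific"
        else:
            labels[exon] = "non-NMD-single" if len(carriers) == 1 else "non-NMD-multiple"
    return labels
```

Working per gene keeps the definitions local: an exon's label depends only on which transcripts of the same gene contain it.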
Mapping exons to ASAP2 database
Since the ASAP2 database gives the positions of exons on human chromosomes of NCBI build 35.1 [46, 72, 73], equivalent to UCSC hg17 [74–76], we mapped the human exon set of RefSeq mRNAs to the ASAP2 database as follows. First, we converted the exon positions on reference sequence contigs into those on chromosomes (NCBI build 36.1, UCSC hg18) using a Perl script. Second, we used the UCSC liftOver tool to convert these positions into those on human NCBI build 35.1 (UCSC hg17). Finally, we compared these positions with those in the ASAP2 database and retrieved the ASAP2 exons that exactly matched the RefSeq exon set. In total, 95,029 of 209,222 RefSeq exons could be uniquely mapped to the ASAP2 database. Based on these matched exons, we could obtain the splicing state, inclusion levels, and exon creation/loss from the ASAP2 and the VEEDB database tables [56, 57].
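The final exact-match step can be sketched as a dictionary lookup on coordinates. The sketch assumes that both exon sets have already been brought onto the same assembly (the coordinate conversion itself is not shown), and the identifiers and the `(chromosome, strand, start, end)` key are illustrative choices, not the schema of either database.

```python
from collections import defaultdict
from typing import Dict, Tuple

Coords = Tuple[str, str, int, int]  # (chromosome, strand, start, end); assumed key


def exact_unique_matches(refseq_exons: Dict[str, Coords],
                         asap2_exons: Dict[str, Coords]) -> Dict[str, str]:
    """Map each RefSeq exon id to the single ASAP2 exon id with identical coordinates."""
    index = defaultdict(list)
    for asap2_id, coords in asap2_exons.items():
        index[coords].append(asap2_id)
    matches = {}
    for refseq_id, coords in refseq_exons.items():
        hits = index.get(coords, [])
        if len(hits) == 1:  # keep only unambiguous matches
            matches[refseq_id] = hits[0]
    return matches


# Mapping rate reported in the text: 95,029 of 209,222 RefSeq exons.
reported_mapping_rate = 95029 / 209222
```

The reported counts correspond to a mapping rate of roughly 45%, which is why only a subset of exons enters the inclusion-level and creation/loss analyses.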
Construction of exon alignments for nonsense-mediated decay-specific exons and calculation of Ka/Ks
Given the NMD-specific cassette exon lists above, we extracted the corresponding regions from human-mouse-dog CDS alignments (built with Clustal W, version 1.83; see Additional file 1 for details) and concatenated them together. The remaining regions in these alignments were also concatenated. These concatenations were input separately into the PAML package to calculate the Ka/Ks ratio for each lineage under the free ratio model. The alignments for non-NMD-single exons and remaining parts were similarly extracted and input into PAML for Ka/Ks calculations.
To see if the NMD-specific exons were under neutral evolution, we fixed Ka/Ks at one in the human NMD lineage under the free ratio model, and compared this with a more general model (with Ka/Ks free) to test if the neutral model could be rejected using likelihood ratio tests (calculated in R).
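The likelihood ratio test itself is a short calculation. The sketch below is in Python rather than R and assumes one degree of freedom, on the reasoning that the two nested models differ by a single constrained Ka/Ks parameter; the log-likelihood values in the usage example are hypothetical and are not results from the study.

```python
import math


def likelihood_ratio_test(lnl_null: float, lnl_alt: float):
    """Return (statistic, p-value) for nested models differing by one parameter.

    lnl_null: log-likelihood with Ka/Ks fixed at 1 on the branch of interest.
    lnl_alt:  log-likelihood with Ka/Ks free on that branch.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(lnl_null=-1001.920729, lnl_alt=-1000.0)
```

With these made-up values the statistic is about 3.84, which sits at the conventional 5% threshold for one degree of freedom.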
Gene ontology analysis
Comparisons of functional distributions of NMD candidates between human and mouse were carried out using the FatiGO program. FatiGO implements a nested inclusive analysis, in which the test is done recursively until the deepest level in which significance is obtained and only this last level is reported. In this way both variables, the efficiency of the test and the highest precision in the term found, are optimized. The program computes a Fisher's two-tail exact test to statistically define over- or under-represented terms between two lists of genes, and the original P values are corrected by a false discovery rate approach.
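The two statistical ingredients named here can be sketched in plain Python. This is not FatiGO's code: the two-sided Fisher test below sums the probabilities of all tables no more likely than the observed one, and the Benjamini-Hochberg procedure is used as one common false discovery rate method, which is an assumption because the text does not say which procedure is applied.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a comparison of two gene lists, each GO term would contribute one 2x2 table (genes with and without the term in each list), and the resulting P values would then be adjusted together.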
The detection of over- or under-represented functional entries for NMD candidates was done based on the PANTHER database [65, 66]. The NMD candidate list was compared with all the genes used and the P values were determined using a binomial test for each functional category. The original P values were adjusted using a modified Bonferroni correction method, which accounted for the nesting relationship among GO terms at different levels to avoid overly conservative corrections.
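A per-category binomial test can be sketched as follows. This is an illustration rather than the PANTHER implementation: the test is written as two-sided, which is an assumption, and the correction shown is the plain Bonferroni method, not the modified version that accounts for nesting among GO terms.

```python
from math import exp, lgamma, log
from typing import List


def binomial_two_sided(k: int, n: int, p: float) -> float:
    """Two-sided binomial test: k category genes among n candidates,
    with expected proportion p (0 < p < 1) taken from the reference list."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")

    def pmf(x: int) -> float:
        # computed in log space so that large n does not overflow
        log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
        return exp(log_coeff + x * log(p) + (n - x) * log(1.0 - p))

    p_obs = pmf(k)
    total = sum(pmf(x) for x in range(n + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def bonferroni(pvalues: List[float]) -> List[float]:
    """Plain Bonferroni adjustment (the study used a modified variant)."""
    m = len(pvalues)
    return [min(1.0, pv * m) for pv in pvalues]
```

Here `p` would be the fraction of all genes in the reference set that fall in the category, `n` the number of NMD candidates, and `k` the number of NMD candidates in that category.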
- EST: expressed sequence tag
- Ka: non-synonymous substitution rate
- Ks: synonymous substitution rate
- NMD: nonsense-mediated mRNA decay
- PTC: premature termination codon
- uORF: upstream open reading frame
- NMD-specific exon: exon observed only in NMD RefSeq transcripts
- NMD-non-specific exon: exon in an NMD candidate gene that is not an NMD-specific one
- Non-NMD-single exon: exon observed in only one RefSeq transcript in non-NMD candidate genes
- Non-NMD-multiple exon: exon shared by at least two RefSeq transcripts in each non-NMD candidate gene
- Cassette exon: exon completely alternatively spliced.
We thank Dr Lin Weng and Dr Guohui Ding for critically reading the paper. We thank Heng Xu, Yang Liu and Xianfeng Chen for helpful discussions. We also would like to thank the two anonymous referees for their constructive comments. This work is supported by the National High Technology Research and Development Program of China (2006AA02Z330, 2006AA02A301), the National Basic Research Program of China (No. 2007CB512202, 2004CB518603), the National Natural Science Foundation of China, Key Program (No.30530450), and the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX1-YW-R-74). LDH is a Royal Society Wolfson Research Merit Award Holder.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.