Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs — spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such ‘traps’ being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
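The two compositional quantities named above, GC content and CpG content, can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the observed/expected CpG ratio used here (CpG count multiplied by sequence length, divided by the product of the C and G counts) is a commonly used normalization that the abstract itself does not specify.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a sequence (U is read as T)."""
    s = seq.upper().replace("U", "T")
    counted = [b for b in s if b in "ACGT"]
    if not counted:
        return 0.0
    return sum(b in "GC" for b in counted) / len(counted)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: count(CG) * length / (count(C) * count(G)).

    Assumed normalization, not taken from the article. Returns 0.0 when the
    sequence has no C or no G, because the expectation is then undefined.
    """
    s = seq.upper().replace("U", "T")
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return s.count("CG") * len(s) / (n_c * n_g)


if __name__ == "__main__":
    # Hypothetical coding fragment, used only to show the calls.
    fragment = "ATGGCCCTGAAGGTGCTGACC"
    print(f"GC content: {gc_content(fragment):.2f}")
    print(f"CpG obs/exp: {cpg_obs_exp(fragment):.2f}")
```

Under this reading, a transcript with the "high-GC but low-CpG" profile would score high on the first measure and well below 1 on the second.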
Acknowledgements
The authors thank the Humboldt Foundation for the award of the Humboldt Prize to L.D.H. to enable him to spend time in Germany. S.R. is funded by the Evolution Education Trust.
Author information
Contributions
L.D.H. and S.R. researched data for the article. All authors contributed substantially to discussion of the content. L.D.H. wrote the article. All authors reviewed and/or edited the manuscript before submission. Z.I. contributed to discussion of issues regarding transposable elements specifically.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Glossary
- Codon optimality-mediated RNA decay: A process by which mRNAs with abundant non-optimal codons (those matching rare tRNAs) are subject to decay, probably mediated by slow ribosomal progression.
- Codon usage bias: A pattern in which synonymous codons occur at frequencies different from some null expectation (often that of equal frequency).
- CpG islands: Long (>200 bp) stretches of DNA rich in CpG dinucleotides that commonly occur in the vicinity of mammalian promoters. They are commonly unmethylated if the associated gene is expressed.
- Cryptic splice sites: Splice sites within exons, away from the canonical splice site, that can result in alternative splicing.
- Effective population size (Ne): The number of individuals that an idealized, neutrally evolving population would need in order to have properties (such as genetic diversity) equivalent to those of the real population. Ne is often smaller than the census population size (N). In simple cases, it approximates to the number of breeding individuals.
- Exonic splice enhancers (ESEs): Short (6–8 bp) RNA motifs that enable more accurate splicing, particularly in the vicinity of weak splice sites. ESEs often function as RNA-binding sites for serine–arginine-rich proteins.
- Exon-junction complex: A protein complex that forms on a pre-mRNA molecule at the junction between two exons joined by RNA splicing.
- Exosome complex: A multi-protein intracellular complex that degrades many types of RNA. In eukaryotes, it is present in the cytoplasm, the nucleus and the nucleolus. In humans, the cytoplasmic form is associated with DIS3L, whereas the nuclear complex contains DIS3. The nuclear form, termed the nuclear exosome complex, is the one targeted by the nuclear exosome targeting (NEXT) complex. The (RNA) exosome complex is not to be confused with exosomes, which are extracellular vesicles generated by cells.
- GC-biased gene conversion (gBGC): Biased gene conversion is the recombination of short stretches of genetic material from a donor sequence to an acceptor sequence, with a bias as to which strand is the donor and which is the acceptor. In gBGC, AT:GC mismatches in recombinant sections are resolved with a preference towards G or C.
- G-quadruplexes: Stable secondary structures formed in guanine-rich DNA and RNA sequences.
- Inverted repeat Alu elements (IRAlus): Single-stranded Alu elements that are followed downstream by their reverse complement. This arrangement often allows inverted repeats to fold on themselves and form double-stranded structures.
- LINE1 (L1): LINE elements, of which L1 is an example, are the most abundant class of transposable elements in the human genome, formed by autonomous retrotransposition.
- Mutational equilibrium: The frequency (for example, of nucleotides) in a population that remains unchanged when only mutation bias and neutral evolution act on it.
- Neutral evolution: The process by which allele frequency is determined by chance events alone operating on alleles of the same fitness.
- No-go mRNA decay: A process by which RNAs with stacks of stalled ribosomes are degraded.
- Nonsense-mediated decay (NMD): A process by which mRNA molecules containing premature stop codons are degraded.
- Non-stop RNA decay: A process by which mRNAs without a proper stop codon are identified and targeted for decay. In eukaryotes, this process discharges ribosomes stalled at the 3′ end of mRNAs, directing those mRNAs to the exosome complex.
- Processing bodies: Ribonucleoprotein bodies found in the cytoplasm that retain mRNA molecules and contain proteins required for mRNA decay. A similar role is carried out by cytoplasmic stress granules.
- Retrogenes: Processed copies of genes formed from reverse transcription of mature mRNA molecules of a parental gene (hence without introns).
- Spurious transcripts: Transcripts generated by functionally irrelevant cellular events (such as transcription factor binding to random sequence).
- Synonymous mutations: Single base-pair mutations in a protein-coding exon that change the codon to a different but synonymous one and hence do not a priori change the amino acid sequence of the encoded protein.
- Translational selection: Selection that favours synonymous mutations owing to their effects on translation, typically assumed to be mediated by faster or more accurate translation. It is commonly evidenced by a codon usage bias that favours synonymous codons matching more abundant iso-acceptor tRNAs.
- Transposable elements: DNA sequences that can move within genomes and replicate without depending on gene replication of the host cell.
- Zinc finger antiviral protein (ZAP): A protein involved in preventing retroviral infection by binding CpG-rich sequences and recruiting the RNA degradation machinery.
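As a rough illustration of the sequence properties named in the glossary entries for codon usage bias and CpG islands, the following Python sketch computes GC content, a CpG observed/expected ratio and the observed share of each codon within one synonymous family. The function names and example sequences are hypothetical, and the observed/expected formula (CpG count × length / (C count × G count)) is a commonly used convention assumed here rather than a definition taken from the article.

```python
from collections import Counter
from typing import Dict, Sequence


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio (assumed convention, see lead-in).

    Expected CpG count is taken as (number of C * number of G) / length.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself
    return observed * len(seq) / (c_count * g_count)


def synonymous_usage(cds: str, family: Sequence[str]) -> Dict[str, float]:
    """Observed share of each codon within one synonymous family.

    Under the equal-frequency null mentioned in the glossary, every
    codon in the family would have a share of 1 / len(family).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(t for t in triplets if t in family)
    total = sum(counts.values())
    return {codon: (counts[codon] / total if total else 0.0) for codon in family}


if __name__ == "__main__":
    alanine = ("GCT", "GCC", "GCA", "GCG")  # the four alanine codons
    toy_cds = "GCTGCTGCCGCA"                # hypothetical toy sequence
    print(gc_content(toy_cds))
    print(cpg_obs_exp(toy_cds))
    print(synonymous_usage(toy_cds, alanine))
```

Comparing the shares returned by `synonymous_usage` with 1 / len(family) gives a simple view of departure from the equal-frequency null; real analyses would typically use all synonymous families and an explicit mutational null.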
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Radrizzani, S., Kudla, G., Izsvák, Z. et al. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 25, 431–448 (2024). https://doi.org/10.1038/s41576-023-00686-7
DOI: https://doi.org/10.1038/s41576-023-00686-7