Abstract
Extended CAG trinucleotide repeats (TNRs) in the huntingtin (HTT) and androgen receptor (AR) genes cause two progressive neurodegenerative disorders: Huntington’s disease (HD) and Spinal and Bulbar Muscular Atrophy (SBMA), respectively. Anyone who inherits the mutant gene in the complete penetrance range (>39 repeats for HD and 44 for SBMA) will develop the disease. The length of the CAG repeat correlates with disease severity and inversely with age of onset. Growing evidence suggests that it is the length of uninterrupted CAG repeats in the mRNA, rather than the length of the polyglutamine (polyQ) stretch in the mutant (m)HTT protein, that determines disease progression. One variant of mHTT (loss of interruption; LOI) causes HD onset 25 years earlier than a reference sequence, even though both code for a protein containing an identical number of glutamines. Short 21–22 nt CAG repeat-containing RNAs (sCAGs) can cause disease through RNA interference (RNAi). RNA hairpins (HPs) forming at the CAG TNRs are stabilized by adjacent CCG (in HD) or CUG repeats (in SBMA), making them better substrates for Dicer, the enzyme that processes CAG HPs into sCAGs. We now show that cells deficient in Dicer or unable to mediate RNAi are resistant to the toxicity of HTT- and AR-derived HPs. A small HP that mimics the HD LOI variant is more stable and more toxic than a reference HP. We report that the LOI HP is processed by Dicer, is loaded into the RISC more efficiently, and gives rise to a higher quantity of RISC-bound 22 nt sCAGs. Our data support the notion that RNAi contributes to the cell death seen in HD and SBMA and provide an explanation for the dramatically earlier onset of disease in HD patients who carry the LOI variant.
Introduction
Trinucleotide repeat (TNR) expansions in a number of genes cause many neurodegenerative diseases [1]. The most frequently expanded triplet is CAG (which codes for the amino acid glutamine [Q]), as found in Huntington’s disease (HD) [2], Spinal and Bulbar Muscular Atrophy (SBMA) [3], and many other so-called triplet repeat diseases [4,5,6,7,8,9,10,11,12]. HD is caused by expansion of a CAG repeat in exon 1 of the huntingtin (HTT) gene and is marked by progressive degeneration of neurons, particularly in the striatum [4, 13]. Anyone who inherits an expanded CAG TNR in the mutant (m)HTT gene in the full penetrance range (>39 repeats) will develop the disease, with the length of the CAG repeat correlating with disease severity and inversely correlating with age of onset [13, 14]. Gene silencing experiments in mouse models have shown that when the expression of mHTT is reduced, symptoms improve [59] …
… and anti-Flag M2 Magnetic beads (Sigma #M8823), a library was prepared and then sequenced on an Illumina Hi-Seq 4000 exactly as previously described [60]. RNA seq data can be accessed at GSE201691 and GSE201692.
Sequences used for small RNA library preparation:
19 nt RNA size marker: rCrGrUrArCrGrCrGrGrGrUrUrUrArArArCrGrA;
35 nt RNA size marker: rCrUrCrArUrCrUrUrGrGrUrCrGrUrArCrGrCrGrGrArArUrArGrUrUrUrArArArCrUrGrU;
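The two size markers above are written in a per-base notation in which "r" precedes each ribonucleotide. As a quick consistency check (not part of the described protocol), a minimal Python sketch can strip the prefixes and confirm that the markers are 19 and 35 nt long; the function name `strip_ribo_prefix` is hypothetical.

```python
import re

# Size markers exactly as listed above ("r" precedes every ribonucleotide).
MARKER_19 = "rCrGrUrArCrGrCrGrGrGrUrUrUrArArArCrGrA"
MARKER_35 = "rCrUrCrArUrCrUrUrGrGrUrCrGrUrArCrGrCrGrGrArArUrArGrUrUrUrArArArCrUrGrU"


def strip_ribo_prefix(notation: str) -> str:
    """Convert 'rCrGrU...' notation into a plain RNA string such as 'CGU...'."""
    # Reject anything that is not a strict succession of 'r' + base.
    if not re.fullmatch(r"(?:r[ACGU])+", notation):
        raise ValueError(f"unexpected notation: {notation!r}")
    return notation.replace("r", "")


if __name__ == "__main__":
    for name, notation in (("19 nt marker", MARKER_19), ("35 nt marker", MARKER_35)):
        rna = strip_ribo_prefix(notation)
        print(f"{name}: {rna} ({len(rna)} nt)")
```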
To identify the reads derived from the HTT HPs, we used regular expressions in Perl to extract all reads that contained one of the following 19 nt sequences: group 1: CAGCAGCAGCAGCAGCAGC, AGCAGCAGCAGCAGCAGCA, GCAGCAGCAGCAGCAGCAG; group 2: CCGCCGCCGCCGCCGCCGC, CGCCGCCGCCGCCGCCGCC, GCCGCCGCCGCCGCCGCCG. In each sample, reads were summed within the two groups, and all remaining reads were summed as group 3.
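The grouping described above can be sketched in Python, with substring tests standing in for the Perl regular expressions. The probes are the six 19 nt sequences listed; the tie-breaking rule (a read matching both groups is counted in group 1) and the function names are assumptions, since the Methods do not specify them.

```python
from collections import Counter

# 19 nt probes from the Methods: three frames of (CAG)n and three of (CCG)n.
GROUP1_CAG = ("CAGCAGCAGCAGCAGCAGC", "AGCAGCAGCAGCAGCAGCA", "GCAGCAGCAGCAGCAGCAG")
GROUP2_CCG = ("CCGCCGCCGCCGCCGCCGC", "CGCCGCCGCCGCCGCCGCC", "GCCGCCGCCGCCGCCGCCG")


def classify_read(read: str) -> int:
    """Return 1 for a (CAG)n read, 2 for a (CCG)n read and 3 for everything else."""
    # Assumption: reads matching both probe groups are assigned to group 1.
    if any(probe in read for probe in GROUP1_CAG):
        return 1
    if any(probe in read for probe in GROUP2_CCG):
        return 2
    return 3


def count_groups(reads) -> Counter:
    """Sum the reads of one sample into groups 1, 2 and 3."""
    return Counter(classify_read(read) for read in reads)


if __name__ == "__main__":
    sample = [
        "AGC" * 7 + "A",           # 22 nt sCAG-like read
        "CCG" * 7,                 # 21 nt (CCG)n read
        "TGAGGTAGTAGGTTGTATAGTT",  # unrelated small RNA
    ]
    print(count_groups(sample))
```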
Small RNA seq of short RNA oligonucleotides
Small RNA libraries for the 19 nt and 35 nt RNA size markers (sequences above) as well as for (CAG)7 and (CAG)12 were prepared as described above for the libraries generated after Ago pulldown. In each case, 10 pmol RNA was radiolabeled as described [61] before proceeding with library preparation. For Set 1 (Fig. 5A), after 3' ligation with an adenylated adapter, the 19 nt RNA was combined with the 35 nt RNA and (CAG)7 RNA was combined with (CAG)12, and 5' ligation was then performed separately for the two combined samples. For Set 2 (Fig. 5A), all four RNA samples were combined after 3' ligation. After reverse transcription, cDNA for Set 1 was amplified using two different 3' PCR primers for the two combined samples, whereas only one 3' PCR primer was used for Set 2. After sequencing on an Illumina Hi-Seq 4000, the reads for Set 1 were first separated by Illumina based on the 3' PCR primers, and the reads of both sets were then separated using the barcodes on the 3' adenylated adapters. RNA seq data can be accessed at GSE201694.
Monitoring growth over time and quantification of cell death
To monitor cell growth over time, 1000 to 4000 cells were seeded per well in triplicate in a 96-well plate. The plate was then scanned using the IncuCyte ZOOM live cell imaging system (Essen BioScience). Images were captured at regular intervals, at the indicated time points, using a 10x objective. Cell confluence was calculated using the IncuCyte ZOOM software (version 2015A). A viability assay that measures intracellular ATP levels was performed in 96-well plates. Briefly, 96 h after reverse transfection with siRNAs or HPs, the medium in each well was replaced with 50 μl fresh medium, and 50 μl of CellTiter-Glo reagent (Promega #G7570) was added. The plates were covered with aluminum foil, shaken for 5 min, and incubated for 10 min at room temperature before luminescence was read on a BioTek Cytation 5.
RNA secondary structure predictions and binding energy calculations
To determine the folding and binding energies of HTT or AR HPs, we used RNAfold [62] (at http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) with the following settings: (1) fold algorithms and basic options: minimum free energy (MFE) and partition function, avoid isolated base pairs, dangling energies on both sides of a helix in any case; (2) energy parameters: RNA parameters (Turner model, 1999); after conversion of SHAPE reactivities, apply pseudo energies to stacked pairs, with slope (m) 1.9 and intercept (b) 0.7. As output option, we chose the interactive RNA secondary structure plot. For each RNA, the structure with the lowest ΔG was used. We analyzed either the TNR-containing regions of wtHTT and its mutants, with 15 extra nucleotides added to the 5' and 3' ends, or the short HPs mimicking mHTT and mAR as well as oligonucleotides containing pure CAG TNRs.
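RNAfold reports each structure in dot-bracket notation. The following is a minimal sketch of how the length of the longest uninterrupted stem could be read from such a string, assuming that a stem is a run of stacked base pairs (i, j), (i+1, j-1), and so on, with no bulge or interior loop; the exact stem definition used for the figures is not stated, and the function names are hypothetical. The structure string itself could come from the web server or, for example, from the ViennaRNA Python bindings (`RNA.fold(sequence)`). Dividing the base-pair count by 3 gives the stem length in triplets.

```python
def pair_table(structure: str) -> dict:
    """Map every paired position (0-based) to its partner in a dot-bracket string."""
    stack, partner = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def longest_stem(structure: str) -> int:
    """Length in base pairs of the longest run of stacked pairs without bulges or loops."""
    partner = pair_table(structure)
    best = 0
    for i, j in partner.items():
        # Only start counting at the outermost pair of a helix.
        if i < j and partner.get(i - 1) != j + 1:
            length = 0
            while i + length < j - length and partner.get(i + length) == j - length:
                length += 1
            best = max(best, length)
    return best


if __name__ == "__main__":
    hairpin = "((((((((....))))))))"  # one 8 bp stem
    bulged = "(((.(((....))))))"      # two 3 bp stems separated by a bulge
    for s in (hairpin, bulged):
        print(s, longest_stem(s), "bp")
```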
Data analyses
For the analysis of sCAGs in the Ago pulldown RNA seq data in Fig. 1A, the SPOROS output file A_normCounts was generated as described [63]. This file includes BLAST search results for murine miRNAs and all RNA classes, which were used to calculate the percent miRNA content of each sample. All reads with uninterrupted CAG repeats of 11 nt or longer were identified and listed.
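The two steps described above, the percent miRNA content per sample and the listing of reads with an uninterrupted CAG repeat of at least 11 nt, can be sketched as follows. The layout of the SPOROS A_normCounts file is not described here, so the sketch takes a hypothetical dictionary of normalized counts per RNA class and a plain list of read sequences; it also assumes that a repeat may start in any of the three frames (CAG…, AGC…, GCA…).

```python
def longest_cag_stretch(read: str, unit: str = "CAG") -> int:
    """Length of the longest stretch of `read` that follows unit*n, in any of its frames."""
    k, best = len(unit), 0
    for phase in range(k):
        run = 0
        for i, base in enumerate(read):
            # Compare position i with the repeat base expected in this frame.
            if base == unit[(i + phase) % k]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def reads_with_cag(reads, min_len: int = 11):
    """List reads containing an uninterrupted CAG repeat of at least `min_len` nt."""
    return [read for read in reads if longest_cag_stretch(read) >= min_len]


def percent_mirna(counts_by_class: dict) -> float:
    """Percent of all normalized reads of a sample assigned to the miRNA class."""
    total = sum(counts_by_class.values())
    return 100.0 * counts_by_class.get("miRNA", 0) / total if total else 0.0


if __name__ == "__main__":
    reads = ["TTCAGCAGCAGCATT", "TTCAGCAGCATT", "TGAGGTAGTAGGTTGTATAGTT"]
    print(reads_with_cag(reads))  # only the first read has an 11 nt CAG stretch
    print(percent_mirna({"miRNA": 900.0, "other": 100.0}))
```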
For the analysis in Fig. 1B, we used an RNA seq data set (50 nt read length) of 293T cells infected with lentiviral vectors expressing exon 1 of wild-type HTT (wtHTT, 18 polyQ repeats) or mutant HTT (mHTT 66Q, 66 polyQ repeats), all in triplicate [64]. The data were obtained from GEO, accession number GSE78928. To identify all reads that contained CAG repeats 10 to 50 nt in length, we generated one file per repeat length (10, 11, 12, … 50 nt), in which the CAG repeat in each individual read was isolated from the preceding and trailing nucleotides, and then counted the number of reads in each file. Every read was counted only once, in the group with the longest repeat length it contained. The average numbers of reads containing CAG repeats of different lengths are plotted with standard deviations in Fig. 1B.
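The binning of reads by repeat length can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: instead of writing one file per length, the sketch checks, from 50 nt down to 10 nt, whether any frame of an uninterrupted CAG repeat of that length occurs in a read, so that each read is counted once at its longest repeat length. The use of the sample standard deviation across replicates and all function names are assumptions.

```python
import statistics

LENGTHS = range(10, 51)  # repeat lengths from 10 to 50 nt


def cag_motifs(length: int) -> set:
    """The three frames (CAG..., AGC..., GCA...) of an uninterrupted CAG repeat of `length` nt."""
    template = "CAG" * (length // 3 + 2)
    return {template[p:p + length] for p in range(3)}


MOTIFS = {n: cag_motifs(n) for n in LENGTHS}


def longest_bin(read: str):
    """Longest repeat length (10 to 50 nt) present in `read`, or None if there is none."""
    for n in sorted(LENGTHS, reverse=True):
        if any(motif in read for motif in MOTIFS[n]):
            return n
    return None


def bin_counts(reads) -> dict:
    """Count every read once, in the bin of its longest CAG repeat."""
    counts = {n: 0 for n in LENGTHS}
    for read in reads:
        n = longest_bin(read)
        if n is not None:
            counts[n] += 1
    return counts


def mean_and_sd(replicates, n: int):
    """Mean and sample standard deviation of the counts in bin `n` across replicates."""
    values = [counts[n] for counts in replicates]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    read15 = "TT" + "CAG" * 5 + "TT"
    replicates = [bin_counts([read15] * k) for k in (1, 2, 3)]
    print(mean_and_sd(replicates, 15))
```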
For the analysis in Fig. 1C, the same data set was used in addition to triplicate RNA seq data sets generated from brains of mice infected with adeno-associated viral vectors expressing exon 1 of wtHTT or mutant HTT [64], either 10 days or 3 weeks after injection of the viruses. In these cases, all 50 nt reads consisting entirely of CAG, AGC or GCA repeats were counted.
To perform the analysis in Fig. 1D, we first generated lists of all human genes that contain a CAG or a CUG repeat sequence of 10, 11, 12, … 19 nt or longer in their mRNA. To this end, all 5'UTRs, ORFs and 3'UTRs were extracted from the Homo sapiens (GRCh38.p7) gene dataset of the Ensembl database using the Ensembl BioMart data mining tool. For the analysis in Fig. 1E, we similarly generated lists of all murine genes that contain a CUG repeat sequence of 10 nt or longer in their mRNA, extracting all 5'UTRs, ORFs and 3'UTRs from the Mus musculus (GRCm39) gene dataset of the Ensembl database with BioMart. For each gene, the longest deposited 5'UTR, ORF and 3'UTR were stitched together. Custom Perl scripts were used to determine whether each mRNA contained an identical match to a particular repeat sequence.
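The gene-list construction can be sketched as follows. The input format (a dictionary of candidate sequences per region for each gene), the function names, and the choice to search only the frame starting with CTG are assumptions made for illustration; Ensembl BioMart exports sequences in the DNA alphabet, so CUG appears as CTG.

```python
REGIONS = ("5utr", "orf", "3utr")


def stitched_mrna(regions: dict) -> str:
    """Concatenate the longest deposited 5'UTR, ORF and 3'UTR of one gene."""
    parts = []
    for region in REGIONS:
        candidates = regions.get(region, [])
        parts.append(max(candidates, key=len) if candidates else "")
    return "".join(parts)


def repeat_gene_lists(mrnas: dict, unit: str = "CTG", lengths=range(10, 20)) -> dict:
    """For each length n, list the genes whose mRNA contains the first n nt of unit*n."""
    lists = {}
    for n in lengths:
        motif = (unit * n)[:n]  # e.g. n = 10 gives CTGCTGCTGC
        lists[n] = sorted(gene for gene, seq in mrnas.items() if motif in seq)
    return lists


if __name__ == "__main__":
    genes = {  # hypothetical genes
        "GENE_A": {"5utr": ["AAA"], "orf": ["ATG" + "CTG" * 7 + "TAA"], "3utr": ["TTT", "TTTTT"]},
        "GENE_B": {"orf": ["ATG" + "CTG" * 4 + "TAA"]},
    }
    mrnas = {gene: stitched_mrna(parts) for gene, parts in genes.items()}
    for n, members in repeat_gene_lists(mrnas).items():
        print(n, members)
```

Because each longer motif contains the shorter ones, the lists are nested: every gene in the 19 nt list also appears in the 10 nt list.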
GSEA was performed using the GSEA v2.2.4 software from the Broad Institute (http://software.broadinstitute.org/gsea) with 1000 permutations. The 20 lists described above, containing genes with CAG or CUG repeats of different lengths, were set as custom gene sets to determine their enrichment among downregulated genes in an RNA-seq data set comparing gene expression between 49 normal brains and 20 brains from HD patients, as described [65]. The human data were retrieved from GSE64810 and the mouse data from GSE50379. Log(fold change) was used as the ranking metric. Gene sets with p-values below 0.05 were considered significantly enriched.
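To make the ranking-based test more concrete, the following is a minimal sketch of the weighted running-sum enrichment score described for GSEA by Subramanian and colleagues (2005), applied to a list preranked by log fold change. It is not the Broad Institute implementation: the permutation-based p-values (1000 permutations above) and the normalization of the score are omitted, and the function name is hypothetical. With genes sorted from most up- to most downregulated, enrichment among downregulated genes shows up as a negative score.

```python
def enrichment_score(ranked, gene_set, weight: float = 1.0) -> float:
    """Weighted running-sum enrichment score (GSEA-style).

    ranked: list of (gene, metric) pairs sorted by metric, highest first.
    gene_set: collection of gene names to test.
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    n_total, n_hits = len(ranked), sum(in_set)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("the gene set must cover some, but not all, ranked genes")
    # Hits are weighted by |metric|**weight; misses all carry the same penalty.
    hit_norm = sum(abs(metric) ** weight for (_, metric), hit in zip(ranked, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")
    miss_step = 1.0 / (n_total - n_hits)
    running, score = 0.0, 0.0
    for (_, metric), hit in zip(ranked, in_set):
        running += abs(metric) ** weight / hit_norm if hit else -miss_step
        if abs(running) > abs(score):  # keep the maximum deviation from zero
            score = running
    return score


if __name__ == "__main__":
    ranked = [("A", 2.0), ("B", 1.0), ("C", -1.0), ("D", -2.0)]
    print(enrichment_score(ranked, {"A"}))  # set member at the top: positive score
    print(enrichment_score(ranked, {"D"}))  # set member at the bottom: negative score
```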
For the analysis shown in Fig. S1, gene expression data sets from 293T cells, HeLa cells and human brains were downloaded from GEO (accession numbers GSE171397, GSE209928 and GSE64810). The data for all coding genes from untreated cells or control brains were extracted, and each sample was normalized to one million reads. All human genes containing (CUG)n, (UGC)n or (GCU)n repeats of 10 or more nucleotides in length were highlighted, as were all genes on the list of critical survival genes available at DepMap.org (version 22Q2). We downloaded all 2165 genes that were shown to be critical for the survival of any of the 1840 cell lines tested. The percent expression of these genes was calculated, and pie charts were generated in Excel. Venn diagrams of all potential target genes in the three data sets with normalized expression signals of >100 were generated using http://bioinformatics.psb.ugent.be/webtools/Venn/ and http://www.biovenn.nl (to obtain correctly size-proportional circles).
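The normalization and filtering steps above can be sketched as follows: each sample is scaled to one million, the share of total expression contributed by a gene set is computed, and genes above the >100 threshold are intersected across data sets (the overlap a Venn diagram displays). Gene names and values in the example are hypothetical, and the function names are not from the source.

```python
def per_million(sample: dict) -> dict:
    """Scale the expression values of one sample so that they sum to one million."""
    total = sum(sample.values())
    return {gene: value * 1e6 / total for gene, value in sample.items()}


def percent_of_expression(normalized: dict, genes: set) -> float:
    """Share (%) of all normalized expression contributed by `genes`."""
    subset = sum(value for gene, value in normalized.items() if gene in genes)
    return 100.0 * subset / sum(normalized.values())


def expressed_members(normalized: dict, genes: set, threshold: float = 100.0) -> set:
    """Members of `genes` whose normalized signal exceeds `threshold`."""
    return {gene for gene, value in normalized.items() if gene in genes and value > threshold}


if __name__ == "__main__":
    cug_genes = {"G1", "G3"}  # hypothetical (CUG)n-containing genes
    datasets = {
        "cells_1": per_million({"G1": 600, "G2": 300, "G3": 100}),
        "cells_2": per_million({"G1": 5000, "G2": 994_990, "G3": 10}),
    }
    for name, norm in datasets.items():
        print(name, round(percent_of_expression(norm, cug_genes), 2))
    shared = set.intersection(*(expressed_members(n, cug_genes) for n in datasets.values()))
    print(sorted(shared))
```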
Statistical analyses
Two-way analysis of variance (ANOVA) was performed using Stata 14 software to compare treatment effects over the course of the experiment for the different cell types. Fisher's exact test for Fig. 1C was done using the online tool at https://www.socscistatistics.com/tests/fisher/default2.aspx. All other statistical analyses were conducted in Stata 14 (RRID:SCR_012763) or R 3.3.1 in RStudio (RRID:SCR_000432).
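Fisher's exact test for Fig. 1C was computed with an online calculator. For readers who want to reproduce such a test offline, a minimal two-sided implementation for a 2 × 2 table is sketched below; it sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, the convention also used by SciPy's `scipy.stats.fisher_exact`. The table in the usage example is a textbook case, not the data of Fig. 1C.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```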
Results
Evidence of silencing of CUG TNR containing genes in the brains of HD patients and HD mice
Even though RNAi-active sCAGs of 21 nt in length form and can be detected specifically in HD patients, using either Northern blotting or sequencing after polyadenylating them and cloning them into a sequencing vector, the amount of sCAGs was found to be very difficult to quantify by RNA seq analysis [34]. We have made similar observations. In an RNA seq analysis of RISC-bound small RNAs in brains of R/6 mice with CAG TNRs of 250 or 450 repeats [66], we did not find a single read with a CAG TNR >19 nt, and all CAG TNR-containing reads were either detected at background levels or were derived from other genes (red bold numbers in Fig. 1A). This was also apparent when the RNA seq data from another study were examined [64]. That study employed expression of exon 1 of HTT containing either a wild-type (wt) length of CAG TNRs (18Q, 54 nt) or a mutant length (66Q, 198 nt). Intriguingly, in this large RNA seq analysis, no increase in (CAG)n-containing reads between 10 and 50 nt in length was detected in 293T cells infected with a lentivirus expressing mHTT when compared to cells infected with a lentivirus expressing wtHTT (Fig. 1B). In addition, even reads containing short (CAG)n stretches were of very low abundance. A similar finding was made when the number of reads with pure (CAG)n was counted in an RNA seq data set of mouse brains infected with an adeno-associated virus (AAV) expressing either wtHTT or mHTT (Fig. 1C). Only 11 reads with 50 nt long CAG, AGC or GCA repeats were detected in these mice 10 days after infection, with even fewer reads detectable 3 weeks after infection. Not a single pure (CAG)n-containing read of 19 nt or longer was detected in any of the three replicates of the small RNA seq samples or with an RNA immunoprecipitation sequencing assay (data not shown). The reason for the difficulty in detecting CAG TNR-containing RNAs by RNA seq is not known but is likely due to the repetitive nature of these RNA species.
We therefore tested whether we could find indirect evidence of the expression of CAG TNR-containing RNAs in HD patients. If these RNAs act through RNAi, genes containing the target sequence of a CAG-containing small RNA, CUG trinucleotide repeats [(CUG)n], would be expected to be downregulated. We previously provided evidence in transfected cells in vitro that a 19 nt CAG-derived siRNA caused a significant downregulation of genes that contained CUG TNRs of 19 nt or longer [50]. We chose a large RNA seq data set from a study that compared gene expression between 49 normal and 20 HD patient brains [65] to perform gene set enrichment analyses (GSEA) with ten different lists of genes containing CUG repeats of 10 nt or longer, 11 nt or longer, and so on up to 19 nt or longer, to allow for various lengths of complementarity between the sCAGs and (CUG)n-containing targets. Enrichment scores increased with longer CUG TNRs, and all but one were statistically significant (Fig. 1D, bottom left). This suggests that CAG TNRs can target a variety of genes with CUG TNRs of different lengths. The most significant downregulation appeared to be in genes containing a CUG TNR of 16 nt and 19 nt (GSEA graphs on top of Fig. 1D). In contrast, the increase in enrichment with longer TNR length was much less pronounced in genes containing CAG TNRs, and only one of these gene sets reached statistical significance, even though the number of genes containing either CAG or CUG TNRs was comparable for each TNR length (numbers in bottom panels in Fig. 1D). Similar results were obtained by analyzing a gene array data set of control (Hdh(Q20/Q20)) and mutant HD (Hdh(Q111/Q111)) mice [67]: an enrichment of (CUG)n-containing genes (10 nt or longer) was found among the genes downregulated in the striatum of Q111 versus Q20 mice (Fig. 1E).
These data suggest that in HD patient brains and in a HD mouse model, CUG TNR-containing genes are selectively downregulated, consistent with the interpretation that they could be targeted through RNAi by short RNAs containing CAG TNRs.
The length of uninterrupted stem regions in CAG TNR containing HD derived hairpins correlates with disease severity and inversely correlates with disease onset
Patients develop HD when the CAG expansion in the HTT gene exceeds 36 TNRs (Fig. 2A) [14]. The R-loop structure formed by the CAG TNRs present in HTT is predicted to fold into extended stems interrupted by loop regions (Fig. 2B). Such stem-containing HPs have been shown to be substrates for Dicer [35]. We therefore predicted that the longer the stem that forms in mutant HTT (mHTT) and the lower its binding energy, the more sCAGs will form, because these structures will be better substrates for Dicer. To test this hypothesis in silico, we predicted the folding of the HTT region containing CAG TNR stretches of increasing length (Fig. 2B). The longest stem, of 16 repeats, was predicted to form in the RNAs with the longest uninterrupted CAG TNR. At the same time, the stability of these structures increased with TNR length (as shown by the decreasing binding energies). The increase in stem length from 6 to 16 CAG TNRs correlates with a worsening in HD disease scores [13].
An open question remains as to how extending the uninterrupted CAG TNR length from 40 to 42 in the HTT LOI mutant by just two point mutations (Fig. 2C, D) could result in disease onset about 25 years earlier [52]. We reasoned that these minor changes might alter the folding of the HPs, allowing them to form more stable structures with strongly extended, uninterrupted CAG TNR-containing stem regions. When we compared the predicted secondary structure of the HTT reference sequence with that of the LOI mutant, we found a profound shift from a tripartite stem structure, disrupted by a loop region and with a longest stem of 15 CAG TNRs, to a more stable bipartite structure forming one long stem region of 25 CAG TNRs (Fig. 2C, D). This was by far the longest uninterrupted CAG TNR-containing stem detected in any of our RNA folding analyses of mHTT, and it had the lowest binding energy. The extended CAG repeat-containing stem region in the LOI allele could be a better substrate for Dicer and result in the generation of more sCAGs.
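The distinction between polyQ length and uninterrupted CAG length can be illustrated with two simplified, hypothetical sequences: an interrupted allele ending in CAA CAG and an LOI-like allele in which the CAA codon is replaced by CAG. These are not the Ref and LOI sequences of Fig. 2C, D; the sketch only shows why both alleles encode the same number of glutamines (CAG and CAA are both glutamine codons) while differing in the length of the uninterrupted CAG run. The function names are hypothetical.

```python
GLN_CODONS = {"CAG", "CAA"}  # both codons encode glutamine (Q)


def codons(seq: str) -> list:
    """Split a sequence into codons in frame 0, ignoring an incomplete last codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def glutamine_count(seq: str) -> int:
    """Number of glutamine codons in frame 0."""
    return sum(codon in GLN_CODONS for codon in codons(seq))


def longest_uninterrupted_cag(seq: str) -> int:
    """Longest run of consecutive in-frame CAG codons."""
    best = run = 0
    for codon in codons(seq):
        run = run + 1 if codon == "CAG" else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    interrupted = "CAG" * 40 + "CAA" + "CAG"  # hypothetical interrupted allele
    loi_like = "CAG" * 42                      # hypothetical LOI-like allele (CAA -> CAG)
    for name, seq in (("interrupted", interrupted), ("LOI-like", loi_like)):
        print(name, glutamine_count(seq), "Q;", longest_uninterrupted_cag(seq), "uninterrupted CAG")
```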
Short oligonucleotide mimetics of the reference and LOI HTT mutants have different levels of toxicity on cells through RNAi
It was previously shown that the overall structural architecture of the triplet repeat region in four HTT transcripts that differed only by the length of the uninterrupted CAG TNR was very similar [35, 68]. We therefore predicted that a HP with shorter CAG repeats that can be easily synthesized and transfected would be a good mimetic of the overall structure formed by CAG repeats, and that structures with longer repeats would be even more toxic. We designed short HP models of the Ref and the LOI mHTT structures (Fig. 3A). As with the longer version, the short mimetics of these two variants had different binding energies and stem regions of different lengths. Single stranded pure CAG TNR containing oligonucleotides were used as a control. According to previous studies they were also expected to fold into a stem through the formation of R-loops [45]. To determine whether these HPs would affect cell viability differently, we transfected them into the neuroblastoma cell line NB7 [54]. Both the Ref and the LOI mutant slowed growth more than the (CAG)21 control HP (Fig. 3B, left panel). Interestingly, the LOI HP was significantly more toxic to the cells than the Ref HP. This was confirmed by viability assays which also included four pure (CAG)n containing control hairpins. In contrast to the HD derived HPs, none of these (CAG)n containing ones were toxic to the cells (Fig. 3B, right panel).
To determine whether the toxicity exerted by the HD-derived HPs involved RNAi, we tested the two mutant HTT HPs in HeLa cells with a deletion of Ago2 (Fig. 3C). These Ago2 knockout cells were completely resistant to growth inhibition by the Ref HP and highly resistant to the effects of the LOI HP. In this experiment, even a pure CAG-containing HP of 40 CAG repeats had no activity. These data suggested that the observed toxicity depended on a functional RISC, which was confirmed in viability assays (Fig. 3D): in neither HeLa nor 293T cells deficient in Ago2 expression did either of the two HD-derived HPs show toxicity. Both 293T and HeLa cells express a substantial number of genes (~7.5%) that contain CUG repeats of at least 10 nt in length (Fig. S1A, B), many of which are expressed at substantial levels in both cell lines (Fig. S1A, B, D). Interestingly, 60% of the ten most highly expressed (CUG)n-containing genes were critical survival genes (shown in red in Fig. S1A, B). Human brains also expressed about the same proportion of (CUG)n-containing genes, and two of the ten most highly expressed ones were also among the top ten in the two cell lines (Fig. S1C). A substantial number of such genes were expressed in all three data sets (Fig. S1E).
A number of reports have demonstrated that (CAG)n-containing HPs are good substrates for Dicer [35, 45,46,47]. We therefore predicted that the two toxic HD-derived HPs would not be toxic to cells deficient in Dicer expression. Indeed, the two HD-derived HPs, which were toxic to parental 293T cells, did not significantly kill 293T Dicer ko cells (Fig. 3E, left two panels), although a minor reduction in cell viability was still detected. Because we detected residual Dicer expression in 293T Dicer ko cells by Western blotting on longer exposure (not shown), we tested whether it could have affected the results by transfecting HCT116 cells, which have been shown to tolerate a complete biallelic deletion of Dicer [69] (Fig. 3E, right three panels). While the Ref HP was not toxic to these Dicer ko cells, the LOI HP still appeared to affect cell viability. This may, however, have been due to some loading of HP sequences into the RISC without the help of Dicer, because cells deficient for AGO1, 2 and 3 were completely resistant to the toxicity of the two HD-derived HPs (Fig. 3E, far right panel). These data also argue against the possibility that the toxicity exerted by the HPs was due to their binding to other RNA binding proteins such as muscleblind-like 1 (MBNL1) [70].
The toxicity of CAG TNR hairpin mimetic of mutant androgen receptor depends on the length of the CAG repeat containing stem
The idea that a more stable HP is more toxic was also proposed for HPs predicted to form in the CAG TNR expansion in AR that causes SBMA [68]. The stability of both HTT and AR HP structures in vitro was shown to be affected by neighboring repeat regions [68]. In the HTT locus, there is a polymorphic CCG tract 12 bp downstream of the expansion-prone (CAG)n (Fig. 2A). Similarly, the AR locus contains a (CTG)3(CAG)n sequence (Fig. 4A) with a monomorphic (CAG)6 tract 18 bp downstream [3]. We predicted that this stabilized structure in mAR may also be a better substrate for Dicer and thus be highly toxic to cells via RNAi. We also predicted that a longer CAG repeat-containing stem region in the HP would result in the production of more sCAGs and hence greater toxicity. To test this hypothesis, we synthesized two AR-derived short HP mimetics with a CAG TNR-containing stem stabilized by the authentic CAG/CUG TNR clamp at its base (Fig. 4B). One contained 3 CUG and 9 CAG repeats (AR-HP 3-9), the other 3 CUG and 17 CAG repeats (AR-HP 3-17). The 3-17 HP was predicted to form a more stable structure than the 3-9 HP. When transfected into NB7 cells, the 3-17 HP was more toxic than the 3-9 HP (Fig. 4C). It was even more toxic than the HD-derived LOI HP, likely because it forms a more stable structure owing to the complete complementarity of its CAGCAGCAGCA:UGCUGCUGCUG clamp. The high toxicity of the 3-17 HP was also due to RNAi, as both HeLa and 293T cells lacking Ago2 expression were completely protected from it (Fig. 4D, E). Similar to the results obtained with the HD-derived HPs, the AR-derived HP did not kill 293T cells deficient in Dicer expression (Fig. 4F).
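The complementarity argument for the AR clamp can be checked by pairing the two strands antiparallel and counting Watson-Crick pairs. In this minimal sketch, G·U wobble pairs are deliberately not counted (a simplification), and the function names are hypothetical: the CAGCAGCAGCA:UGCUGCUGCUG clamp pairs at every position, whereas a pure (CAG)n strand folded back on itself leaves one A·A mismatch per triplet.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def watson_crick_pairs(strand_5to3: str, partner_5to3: str) -> int:
    """Count Watson-Crick pairs when two RNA strands are aligned antiparallel."""
    # The first base of one strand faces the last base of the other.
    return sum((x, y) in WATSON_CRICK for x, y in zip(strand_5to3, reversed(partner_5to3)))


def fully_complementary(strand_5to3: str, partner_5to3: str) -> bool:
    """True if the strands have equal length and pair at every position."""
    return (len(strand_5to3) == len(partner_5to3)
            and watson_crick_pairs(strand_5to3, partner_5to3) == len(strand_5to3))


if __name__ == "__main__":
    print(fully_complementary("CAGCAGCAGCA", "UGCUGCUGCUG"))     # AR clamp
    print(watson_crick_pairs("CAGCAGCAG", "CAGCAGCAG"), "of 9")  # (CAG)3 against itself
```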
These data suggest that short HPs mimic the activity of the longer sequences found in either HD or SBMA patients and that a combination of the length of the CAG TNR-containing stem regions and their predicted folding energies affect the toxicity of the HP killing RNAi competent cells.
The HD LOI hairpin produces more RISC-bound sCAGs than the reference hairpin
We asked whether we would find more RISC-bound sCAGs in cells transfected with the more stable and more toxic HD-derived LOI HP than in cells transfected with the Ref HP. However, our data and those of others [34] suggested that CAG TNRs are difficult to sequence on the Illumina platform. To test whether CAG TNR-containing RNAs could be sequenced at all, we generated two sets of libraries for small RNA seq (Fig. 5A). In set 1, we used the Illumina platform to sequence two independent libraries: one derived from 10 pmol of two RNA size markers (19 and 35 nt, as nonrepetitive controls) and one that contained the same amount of two CAG TNR-containing short RNAs (21 and 36 nt in length). We chose the 21 nt long CAG TNR sequence (CAG)7 because this is the length of the short CAG repeat-containing RNAs (sCAGs) shown to be associated with disease pathology in HD patients [34]. In set 2, we first mixed all four oligonucleotides and then sequenced the resulting library (Fig. 5A). This way, the CAG TNR-containing oligonucleotides competed with the nonrepetitive size markers during all steps of library generation and sequencing. In neither set were the longer oligonucleotides efficiently sequenced. In set 1, (CAG)7 was sequenced more efficiently than the 19 nt marker. However, sequencing errors were much more frequent for (CAG)7 than for the control: only 63% of all reads had the expected sequence and length (Fig. 5B, left). In set 2, (CAG)7 was sequenced less efficiently than the 19 nt marker, suggesting that the CAG TNR-containing oligonucleotide was at a disadvantage compared to the nonrepetitive sequence (Fig. 5B, right). Nevertheless, the results suggested that sCAGs can be sequenced when present at high concentration.
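The fidelity measure reported for set 1 (the share of reads with exactly the expected sequence and length) can be computed as sketched below. The reads are assumed to be adapter-trimmed and written in the DNA alphabet, and the function names and example reads are hypothetical.

```python
from collections import Counter


def fidelity(reads, expected: str) -> float:
    """Percent of reads identical in sequence and length to the expected insert."""
    if not reads:
        return 0.0
    return 100.0 * sum(read == expected for read in reads) / len(reads)


def error_profile(reads, expected: str) -> Counter:
    """Tally of the non-matching reads by (length, sequence)."""
    return Counter((len(read), read) for read in reads if read != expected)


if __name__ == "__main__":
    cag7 = "CAG" * 7
    reads = [cag7, cag7, cag7, "CAG" * 6 + "CA"]
    print(fidelity(reads, cag7))  # 3 of 4 reads match
    print(error_profile(reads, cag7))
```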
We therefore used RNA seq to analyze RISC-bound sCAGs in cells transfected with the HTT HPs. The LOI HP contains a long stem with a mixture of (CAG)n and (CCG)n (Fig. 3A). We first transfected NB7 cells with 2.5 nM of each of the two HPs, the (CAG)21 HP and a nontargeting siRNA control (siNT1). We then performed an Ago pulldown as previously described [60] and sequenced the Ago-bound small RNAs (Fig. 5C). We detected a substantial number of pure CAG-containing short RNAs in cells transfected with the Ref HP, but only small amounts of CCG-containing short RNAs. In cells transfected with the LOI HP, we found about four times more RISC-bound sCAGs but about the same small amount of short RNAs containing the CCG repeat sequence. These data are in line with a previous report showing that transcripts composed of CUG and CAG repeats are better Dicer substrates than those composed of CCG and CGG repeats [35]. The amount of CAG-containing short RNAs pulled down from cells transfected with the same amount of (CAG)21 was also small. These results suggest that (1) the LOI HP results in about four times more sCAGs bound to the RISC, consistent with its higher toxicity compared to the Ref sequence, and (2) CAG-containing short RNAs are loaded into the RISC more efficiently than CCG-containing sequences. The most abundant RISC-bound short RNAs were 21–22 nt in length (Fig. 5D), consistent with Dicer cleaving the HPs and in line with a previous analysis which found that Dicer cleavage of (CAG)n results in 21–22 nt long sCAGs [35]. Interestingly, each length group contained only one defined species: all CAG-containing RISC-bound short RNAs began with AGC, and most of the abundant (CCG)n-containing short RNAs started with CCG.
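The length and 5' start analysis of the RISC-bound sCAGs can be sketched by keeping the reads that contain one of the 19 nt (CAG)n probes listed in the Methods and tabulating each by length and first three nucleotides. The fold-change helper assumes normalization to the total number of reads per library, which the text does not specify; all names and the toy reads are hypothetical.

```python
from collections import Counter

CAG_PROBES = ("CAGCAGCAGCAGCAGCAGC", "AGCAGCAGCAGCAGCAGCA", "GCAGCAGCAGCAGCAGCAG")


def is_scag(read: str) -> bool:
    """True if the read contains any of the three 19 nt (CAG)n frames."""
    return any(probe in read for probe in CAG_PROBES)


def length_and_start(reads) -> Counter:
    """Count sCAG reads by (length, 5'-terminal triplet)."""
    return Counter((len(read), read[:3]) for read in reads if is_scag(read))


def scag_fold_change(reads_a, reads_b) -> float:
    """Ratio of sCAG fractions (sCAG reads / all reads) between libraries A and B."""
    fraction_a = sum(map(is_scag, reads_a)) / len(reads_a)
    fraction_b = sum(map(is_scag, reads_b)) / len(reads_b)
    if fraction_b == 0:
        raise ValueError("library B contains no sCAG reads")
    return fraction_a / fraction_b


if __name__ == "__main__":
    scag21, scag22, other = "AGC" * 7, "AGC" * 7 + "A", "TGAGGTAGTAGGTTGTATAGTT"
    library_a = [scag22, scag22, scag21, other, other]
    library_b = [scag22, other, other, other, other]
    print(length_and_start(library_a))
    print(scag_fold_change(library_a, library_b))
```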
The finding that the sequence and length of the most abundant RISC bound CAG TNR-containing short RNA is identical between the cells transfected with the LOI and the Ref HP suggests that it is the amount of these toxic sequences and not their sequence or length that distinguishes the LOI mutant from the Ref sequence. In summary, our data suggest that CAG repeat HPs derived from either HD or SBMA kill cells through RNAi after being processed by Dicer and that the HD LOI mutant is more toxic to cells than the reference sequence because it gives rise to higher amounts of RISC bound sCAGs.
Discussion
Our data confirm previous results showing that the regions in HTT and AR that contain extended (CAG)n and form HPs are stabilized by adjacent non-CAG TNR sequences that act as clamps [35, 68]. In addition, they suggest that both the HTT- and the AR-derived HPs are toxic to cells through RNAi. Both HPs depend on Dicer for processing and on AGO2 to mediate RNAi. Our data also suggest that the LOI mutant HTT is more toxic than the Ref sequence because of its unique structure, in which a much longer stretch of CAG TNRs forms part of an extended double-stranded stem region uninterrupted by a loop. This may make the structure a better substrate for Dicer, resulting in the loading of more CAG-containing short RNAs into the RISC. Longer double-stranded (CAG)n extensions in HTT would therefore be expected to result in higher amounts of RISC-bound sCAGs and hence higher toxicity.
Recently, the data on the role of the length of the uninterrupted CAG repeat in the mRNA, rather than the length of the polyQ stretch, were confirmed in a new transgenic mouse model [70]. These bacterial artificial chromosome (BAC) transgenic mice express human mutant huntingtin (mHTT) with uninterrupted CAG repeats (BAC-CAG mice). Comparing these mice with multiple other HD mouse models carrying CAA-interrupted CAG repeats revealed a robust positive correlation between the average concordance and the uninterrupted mHTT CAG repeat length, whereas the correlation with glutamine repeat length was not statistically significant. Interestingly, although that study mentioned that CAG-containing short RNAs can be toxic to cells, it mostly discussed the toxicity of CAG repeat-containing RNAs in the context of RAN translation, nuclear foci formation and colocalization with MBNL1, rather than the RNAi activity of small CAG repeat-containing RNAs.
MBNL1 binds to double-stranded CUG repeat regions [72]. Through this activity, MBNL1 is thought to contribute to the formation of nuclear CUG RNA foci, and nuclear but not cytoplasmic localization is thought to trigger pathogenesis in the CUG repeat disease myotonic dystrophy type 1 (DM1) [73]. There is, however, evidence that such foci do not contribute to disease pathology [74]. Furthermore, experimental results show that structures formed by CAG TNRs are susceptible to RNAi, suggesting that these HPs are transported to the cytosol, where most RISC complexes are located [35, 68] and where they can become RNAi active. Our data suggest that HPs mimicking the RNA structures that form in mHTT or mAR are toxic to cells through RNAi. Based on our finding that a HP resembling the HTT LOI mutant is more toxic and produces more sCAGs than the Ref mHTT HP, we provide an alternative explanation for how only two point mutations in mHTT in the LOI variant can result in a 25 year earlier age at onset. Our results support the idea that targeting sCAGs rather than the entire mutant CAG repeat RNA (mCAG-RNA) would be a good approach to treating these diseases, as this would selectively reduce the amount of disease-causing sCAGs without affecting wild-type HTT mRNA levels. Allele-specific targeting would therefore not be necessary when inhibiting sCAGs in diseases caused by CAG repeat expansions.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Murmann AE, Yu J, Opal P, Peter ME. Trinucleotide repeat expansion diseases, RNAi and cancer. Trends Cancer. 2018;4:684–700.
The Huntington’s Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell. 1993;72:971–83.
La Spada AR, Wilson EM, Lubahn DB, Harding AE, Fischbeck KH. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature. 1991;352:77–9.
Nalavade R, Griesche N, Ryan DP, Hildebrand S, Krauss S. Mechanisms of RNA-induced toxicity in CAG repeat disorders. Cell Death Dis. 2013;4:e752.
Komure O, Sano A, Nishino N, Yamauchi N, Ueno S, Kondoh K, et al. DNA analysis in hereditary dentatorubral-pallidoluysian atrophy: correlation between CAG repeat length and phenotypic variation and the molecular basis of anticipation. Neurology 1995;45:143–9.
Orr HT, Chung MY, Banfi S, Kwiatkowski TJ Jr., Servadio A, Beaudet AL, et al. Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1. Nat Genet. 1993;4:221–6.
Sanpei K, Takano H, Igarashi S, Sato T, Oyake M, Sasaki H, et al. Identification of the spinocerebellar ataxia type 2 gene using a direct identification of repeat expansion and cloning technique, DIRECT. Nat Genet. 1996;14:277–84.
Kawaguchi Y, Okamoto T, Taniwaki M, Aizawa M, Inoue M, Katayama S, et al. CAG expansions in a novel gene for Machado-Joseph disease at chromosome 14q32.1. Nat Genet. 1994;8:221–8.
Zhuchenko O, Bailey J, Bonnen P, Ashizawa T, Stockton DW, Amos C, et al. Autosomal dominant cerebellar ataxia (SCA6) associated with small polyglutamine expansions in the alpha 1A-voltage-dependent calcium channel. Nat Genet. 1997;15:62–9.
David G, Abbas N, Stevanin G, Durr A, Yvert G, Cancel G, et al. Cloning of the SCA7 gene reveals a highly unstable CAG repeat expansion. Nat Genet. 1997;17:65–70.
Holmes SE, O’Hearn EE, McInnis MG, Gorelick-Feldman DA, Kleiderlein JJ, Callahan C, et al. Expansion of a novel CAG trinucleotide repeat in the 5’ region of PPP2R2B is associated with SCA12. Nat Genet. 1999;23:391–2.
Fujigasaki H, Martin JJ, De Deyn PP, Camuzat A, Deffond D, Stevanin G, et al. CAG repeat expansion in the TATA box-binding protein gene causes autosomal dominant cerebellar ataxia. Brain 2001;124:1939–47.
Gatchel JR, Zoghbi HY. Diseases of unstable repeat expansion: mechanisms and common principles. Nat Rev Genet. 2005;6:743–55.
Walker FO. Huntington’s disease. Lancet 2007;369:218–28.
Boudreau RL, McBride JL, Martins I, Shen S, Xing Y, Carter BJ, et al. Nonallele-specific silencing of mutant and wild-type huntingtin demonstrates therapeutic efficacy in Huntington’s disease mice. Mol Ther. 2009;17:1053–63.
Kennedy WR, Alter M, Sung JH. Progressive proximal spinal and bulbar muscular atrophy of late onset. A sex-linked recessive trait. Neurology 1968;18:671–80.
Lund A, Udd B, Juvonen V, Andersen PM, Cederquist K, Davis M, et al. Multiple founder effects in spinal and bulbar muscular atrophy (SBMA, Kennedy disease) around the world. Eur J Hum Genet. 2001;9:431–6.
Atsuta N, Watanabe H, Ito M, Banno H, Suzuki K, Katsuno M, et al. Natural history of spinal and bulbar muscular atrophy (SBMA): a study of 223 Japanese patients. Brain 2006;129:1446–55.
Wild EJ, Tabrizi SJ. Therapies targeting DNA and RNA in Huntington’s disease. Lancet Neurol. 2017;16:837–47.
Duyao MP, Auerbach AB, Ryan A, Persichetti F, Barnes GT, McNeil SM, et al. Inactivation of the mouse Huntington’s disease gene homolog Hdh. Science 1995;269:407–10.
Nasir J, Floresco SB, O’Kusky JR, Diewert VM, Richman JM, Zeisler J, et al. Targeted disruption of the Huntington’s disease gene results in embryonic lethality and behavioral and morphological changes in heterozygotes. Cell 1995;81:811–23.
Zeitlin S, Liu JP, Chapman DL, Papaioannou VE, Efstratiadis A. Increased apoptosis and early embryonic lethality in mice nullizygous for the Huntington’s disease gene homologue. Nat Genet. 1995;11:155–63.
Ross CA. Polyglutamine pathogenesis: emergence of unifying mechanisms for Huntington’s disease and related disorders. Neuron 2002;35:819–22.
Orr HT, Zoghbi HY. Trinucleotide repeat disorders. Annu Rev Neurosci. 2007;30:575–621.
Napierala M, Krzyzosiak WJ. CUG repeats present in myotonin kinase RNA form metastable “slippery” hairpins. J Biol Chem. 1997;272:31079–85.
Hsu RJ, Hsiao KM, Lin MJ, Li CY, Wang LC, Chen LK, et al. Long tract of untranslated CAG repeats is deleterious in transgenic mice. PLoS One. 2011;6:e16417.
Bogomazova AN, Eremeev AV, Pozmogova GE, Lagarkova MA. The role of mutant RNA in the pathogenesis of Huntington’s disease and other polyglutamine diseases. Mol Biol. 2019;53:954–67.
Yuan Y, Compton SA, Sobczak K, Stenberg MG, Thornton CA, Griffith JD, et al. Muscleblind-like 1 interacts with RNA hairpins in splicing target and pathogenic RNAs. Nucleic Acids Res. 2007;35:5474–86.
Jain A, Vale RD. RNA phase transitions in repeat expansion disorders. Nature 2017;546:243–7.
Tsoi H, Lau TC, Tsang SY, Lau KF, Chan HY. CAG expansion induces nucleolar stress in polyglutamine diseases. Proc Natl Acad Sci USA. 2012;109:13428–33.
Tsoi H, Chan HY. Expression of expanded CAG transcripts triggers nucleolar stress in Huntington’s disease. Cerebellum 2013;12:310–2.
Tsoi H, Lau CK, Lau KF, Chan HY. Perturbation of U2AF65/NXF1-mediated RNA nuclear export enhances RNA toxicity in polyQ diseases. Hum Mol Genet. 2011;20:3787–97.
Banez-Coronel M, Ayhan F, Tarabochia AD, Zu T, Perez BA, Tusi SK, et al. RAN translation in Huntington disease. Neuron 2015;88:667–77.
Banez-Coronel M, Porta S, Kagerbauer B, Mateu-Huertas E, Pantano L, Ferrer I, et al. A pathogenic mechanism in Huntington’s disease involves small CAG-repeated RNAs with neurotoxic activity. PLoS Genet. 2012;8:e1002481.
Krol J, Fiszer A, Mykowska A, Sobczak K, de Mezer M, Krzyzosiak WJ. Ribonuclease dicer cleaves triplet repeat hairpins into shorter repeats that silence specific targets. Mol Cell. 2007;25:575–86.
Wang Y, Sheng G, Juranek S, Tuschl T, Patel DJ. Structure of the guide-strand-containing argonaute silencing complex. Nature 2008;456:209–13.
Leuschner PJ, Ameres SL, Kueng S, Martinez J. Cleavage of the siRNA passenger strand during RISC assembly in human cells. EMBO Rep. 2006;7:314–20.
Schirle NT, MacRae IJ. The crystal structure of human Argonaute2. Science 2012;336:1037–40.
Eulalio A, Huntzinger E, Izaurralde E. GW182 interaction with Argonaute is essential for miRNA-mediated translational repression and mRNA decay. Nat Struct Mol Biol. 2008;15:346–53.
Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, et al. MicroRNA genes are transcribed by RNA polymerase II. EMBO J. 2004;23:4051–60.
Han J, Lee Y, Yeom KH, Kim YK, Jin H, Kim VN. The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev. 2004;18:3016–27.
Yi R, Qin Y, Macara IG, Cullen BR. Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev. 2003;17:3011–6.
Bernstein E, Caudy AA, Hammond SM, Hannon GJ. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 2001;409:363–6.
Hutvagner G, McLachlan J, Pasquinelli AE, Balint E, Tuschl T, Zamore PD. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 2001;293:834–8.
Freudenreich CH. R-loops: targets for nuclease cleavage and repeat instability. Curr Genet. 2018;64:789–94.
Sobczak K, Krzyzosiak WJ. CAG repeats containing CAA interruptions form branched hairpin structures in spinocerebellar ataxia type 2 transcripts. J Biol Chem. 2005;280:3898–910.
Sobczak K, de Mezer M, Michlewski G, Krol J, Krzyzosiak WJ. RNA structure of trinucleotide repeats associated with human neurological diseases. Nucleic Acids Res. 2003;31:5469–82.
Rue L, Banez-Coronel M, Creus-Muncunill J, Giralt A, Alcala-Vida R, Mentxaka G, et al. Targeting CAG repeat RNAs reduces Huntington’s disease phenotype independently of huntingtin levels. J Clin Invest. 2016;126:4319–30.
Creus-Muncunill J, Guisado-Corcoll A, Venturi V, Pantano L, Escaramis G, Garcia de Herreros M, et al. Huntington’s disease brain-derived small RNAs recapitulate associated neuropathology in mice. Acta Neuropathol. 2021;141:565–84.
Murmann AE, Gao QQ, Putzbach WT, Patel M, Bartom ET, Law CY, et al. Small interfering RNAs based on huntingtin trinucleotide repeats are highly toxic to cancer cells. EMBO Rep. 2018;19:e45336.
Genetic Modifiers of Huntington’s Disease Consortium. CAG repeat not polyglutamine length determines timing of Huntington’s disease onset. Cell 2019;178:887–900.e14.
Wright GEB, Collins JA, Kay C, McDonald C, Dolzhenko E, Xia Q, et al. Length of uninterrupted CAG, independent of polyglutamine size, results in increased somatic instability, hastening onset of Huntington disease. Am J Hum Genet. 2019;104:1116–26.
Chu Y, Kilikevicius A, Liu J, Johnson KC, Yokota S, Corey DR. Argonaute binding within 3'-untranslated regions poorly predicts gene repression. Nucleic Acids Res. 2020;48:7439–53.
Teitz T, Wei T, Valentine MB, Vanin EF, Grenet J, Valentine VA, et al. Caspase 8 is deleted or silenced preferentially in childhood neuroblastomas with amplification of MYCN. Nat Med. 2000;6:529–35.
Eckenfelder A, Segeral E, Pinzon N, Ulveling D, Amadori C, Charpentier M, et al. Argonaute proteins regulate HIV-1 multiply spliced RNA and viral production in a Dicer independent manner. Nucleic Acids Res. 2017;45:4158–73.
Putzbach W, Gao QQ, Patel M, van Dongen S, Haluck-Kangas A, Sarshad AA, et al. Many si/shRNAs can kill cancer cells by targeting multiple survival genes through an off-target mechanism. eLife 2017;6:e29702.
Morton AJ, Glynn D, Leavens W, Zheng Z, Faull RL, Skepper JN, et al. Paradoxical delay in the onset of disease caused by super-long CAG repeat expansions in R6/2 mice. Neurobiol Dis. 2009;33:331–41.
Ciamei A, Detloff PJ, Morton AJ. Progression of behavioural despair in R6/2 and Hdh knock-in mouse models recapitulates depression in Huntington’s disease. Behav Brain Res. 2015;291:140–6.
Hauptmann J, Schraivogel D, Bruckmann A, Manickavel S, Jakob L, Eichner N, et al. Biochemical isolation of Argonaute protein complexes by Ago-APP. Proc Natl Acad Sci USA. 2015;112:11841–5.
Patel M, Wang Y, Bartom ET, Dhir R, Nephew KP, Adli M, et al. The ratio of toxic-to-nontoxic microRNAs predicts platinum sensitivity in ovarian cancer. Cancer Res. 2021;81:3985–4000.
Benhalevy D, McFarland HL, Sarshad AA, Hafner M. PAR-CLIP and streamlined small RNA cDNA library preparation protocol for the identification of RNA binding protein target sites. Methods 2017;118–119:41–9.
Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011;6:26.
Bartom ET, Kocherginsky M, Baudel B, Vaidyanathan A, Haluck-Kangas A, Patel M, et al. SPOROS: a pipeline to analyze DISE/6mer seed toxicity. PLoS Comput Biol. 2022;18:e1010022.
Pircs K, Petri R, Madsen S, Brattas PL, Vuono R, Ottosson DR, et al. Huntingtin aggregation impairs autophagy, leading to Argonaute-2 accumulation and global microRNA dysregulation. Cell Rep. 2018;24:1397–406.
Labadorf A, Hoss AG, Lagomarsino V, Latourelle JC, Hadzi TC, Bregu J, et al. RNA sequence analysis of human Huntington disease brain reveals an extensive increase in inflammatory and developmental gene expression. PLoS One. 2015;10:e0143563.
Kielar C, Morton AJ. Early neurodegeneration in R6/2 mice carrying the Huntington’s disease mutation with a super-expanded CAG repeat, despite normal lifespan. J Huntington’s Dis. 2018;7:61–76.
Ribeiro FM, Devries RA, Hamilton A, Guimaraes IM, Cregan SP, Pires RG, et al. Metabotropic glutamate receptor 5 knockout promotes motor and biochemical alterations in a mouse model of Huntington’s disease. Hum Mol Genet. 2014;23:2030–42.
de Mezer M, Wojciechowska M, Napierala M, Sobczak K, Krzyzosiak WJ. Mutant CAG repeats of Huntingtin transcript fold into hairpins, form nuclear foci and are targets for RNA interference. Nucleic Acids Res. 2011;39:3852–63.
Kim YK, Kim B, Kim VN. Re-evaluation of the roles of DROSHA, Exportin 5, and DICER in microRNA biogenesis. Proc Natl Acad Sci USA. 2016;113:E1881–9.
Gu X, Richman J, Langfelder P, Wang N, Zhang S, Banez-Coronel M, et al. Uninterrupted CAG repeat drives striatum-selective transcriptionopathy and nuclear pathogenesis in human Huntingtin BAC mice. Neuron 2022;110:1173–92.e7.
MacDonald ME. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. The Huntington’s Disease Collaborative Research Group. Cell 1993;72:971–83.
Miller JW, Urbinati CR, Teng-Umnuay P, Stenberg MG, Byrne BJ, Thornton CA, et al. Recruitment of human muscleblind proteins to (CUG)(n) expansions associated with myotonic dystrophy. EMBO J. 2000;19:4439–48.
Dansithong W, Wolf CM, Sarkar P, Paul S, Chiang A, Holt I, et al. Cytoplasmic CUG RNA foci are insufficient to elicit key DM1 features. PLoS One. 2008;3:e3968.
Saudou F, Finkbeiner S, Devys D, Greenberg ME. Huntingtin acts in the nucleus to induce apoptosis but death does not correlate with the formation of intranuclear inclusions. Cell 1998;95:55–66.
Katsuno M, Tanaka F, Adachi H, Banno H, Suzuki K, Watanabe H, et al. Pathogenesis and therapy of spinal and bulbar muscular atrophy (SBMA). Prog Neurobiol. 2012;99:246–56.
Acknowledgements
We are grateful to Drs. Sarah Gallois-Montbrun, Klaas Mulder, Bryan Cullen, and David Corey for providing the HeLa Ago2 knock-out, 293T Ago2 knock-out, 293T Dicer knock-out, and HCT116 Ago1/2/3 triple knock-out cells, respectively. We would also like to thank Dr. Eulalia Marti for helpful discussions.
Funding
This work was supported by start-up funds of MEP and a grant from the CHDI to AJM.
Author information
Contributions
MEP and AEM designed and supervised the project; AEM, MP, and S-YJ performed research and analyzed data; AJM provided brains from HD transgenic mice; ETB provided bioinformatics support and analyzed data; MEP wrote the manuscript; and all authors reviewed and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Edited by: Dr Pier Giorgio Mastroberardino
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Murmann, A.E., Patel, M., Jeong, SY. et al. The length of uninterrupted CAG repeats in stem regions of repeat disease associated hairpins determines the amount of short CAG oligonucleotides that are toxic to cells through RNA interference. Cell Death Dis 13, 1078 (2022). https://doi.org/10.1038/s41419-022-05494-1
DOI: https://doi.org/10.1038/s41419-022-05494-1