Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins

Van Nostrand, Eric L.; Pratt, Gabriel A.; Yee, Brian A.; Wheeler, Emily C.; Blue, Steven M.; Mueller, Jasmine; Park, Samuel S.; Garcia, Keri E.; Gelboin-Burkhart, Chelsea; Nguyen, Thai B.; Rabano, Ines; Stanton, Rebecca; Sundararaman, Balaji; Wang, Ruth; Fu, Xiang-Dong; Graveley, Brenton R.; Yeo, Gene W.

doi:10.1186/s13059-020-01982-9

Research
Open access
Published: 06 April 2020

Volume 21, article number 90 (2020)

Genome Biology

Eric L. Van Nostrand^1,2,
Gabriel A. Pratt^1,2,
Brian A. Yee^1,2,
Emily C. Wheeler^1,2,
Steven M. Blue^1,2,
Jasmine Mueller^1,2,
Samuel S. Park^1,2,
Keri E. Garcia^1,2,
Chelsea Gelboin-Burkhart^1,2,
Thai B. Nguyen^1,2,
Ines Rabano^1,2,
Rebecca Stanton^1,2,
Balaji Sundararaman^1,2,
Ruth Wang^1,2,
Xiang-Dong Fu^1,2,
Brenton R. Graveley³ &
…
Gene W. Yeo^1,2

Abstract

Background

A critical step in uncovering rules of RNA processing is to study the in vivo regulatory networks of RNA binding proteins (RBPs). Crosslinking and immunoprecipitation (CLIP) methods enable mapping RBP targets transcriptome-wide, but methodological differences present challenges to large-scale analysis across datasets. The development of enhanced CLIP (eCLIP) enabled the mapping of targets for 150 RBPs in K562 and HepG2, creating a unique resource of RBP interactomes profiled with a standardized methodology in the same cell types.

Results

Our analysis of 223 eCLIP datasets reveals a range of binding modalities, including highly resolved positioning around splicing signals and mRNA untranslated regions that associate with distinct RBP functions. Quantification of enrichment for repetitive and abundant multicopy elements reveals 70% of RBPs have enrichment for non-mRNA element classes, enables identification of novel ribosomal RNA processing factors and sites, and suggests that association with retrotransposable elements reflects multiple RBP mechanisms of action. Analysis of spliceosomal RBPs indicates that eCLIP resolves AQR association after intronic lariat formation, enabling identification of branch points with single-nucleotide resolution, and provides genome-wide validation for a branch point-based scanning model for 3′ splice site recognition. Finally, we show that eCLIP peak co-occurrences across RBPs enable the discovery of novel co-interacting RBPs.

Conclusions

This work reveals novel insights into RNA biology by integrated analysis of eCLIP profiling of 150 RBPs with distinct functions. Further, our quantification of both mRNA and other element association will enable further research to identify novel roles of RBPs in regulating RNA processing.

Background

RNA can act as a carrier of information from the nucleus to the cytoplasm in the processing of protein-coding genes, as a regulatory molecule that can control gene expression, and even as an extracellular signal to coordinate trans-generational inheritance [1,2,3]. RNA binding proteins (RBPs) interact with RNA through a wide variety of primary sequence motifs and RNA structural elements to control all processing steps [3]. Furthermore, with the increase in the number of RBPs that are becoming associated with human diseases, identifying their RNA targets and how they are regulated has become an unmet, urgent need.

To identify direct RNA targets of RBPs, RNA immunoprecipitation (RIP) and crosslinking and immunoprecipitation (CLIP) methods are frequently used. CLIP-based methods utilize UV crosslinking to covalently link an RBP with its bound RNA in live cells, enabling both stringent immunoprecipitation washes and denaturing SDS-PAGE protein gel electrophoresis and nitrocellulose membrane transfer which serves to remove background unbound RNA [4]. Analyses of single RBP binding profiles by CLIP have provided unique insights into basic mechanisms of RNA processing, as well as identified downstream effectors that drive human diseases [5,6,7]. Further efforts to profile multiple human RBPs in the same family or regulatory function by CLIP illustrated coordinated and complex auto- and cross-regulatory interactions among RBPs and their targets [8,9,10]. Rising interest in organizing public deeply sequenced CLIP datasets to enable the community to extract novel RNA biology is apparent from newly available computational databases and integrative methods [11, 12]. However, methodological differences between CLIP approaches, combined with simple experimental variability between labs and variation in acceptable quality control metrics, add significant challenges to interpretation of differences observed.

The field of transcription regulation observed similar challenges and opportunities in integrating transcription factor target profiles [13]. To address this challenge, the ENCODE consortium piloted large-scale profiling of transcription factor targets using a single standardized chromatin immunoprecipitation (ChIP-seq) protocol [14]. The initial effort to profile 119 factors generated a unified dataset for creating and assaying robust quality assessment standards [15], and led to insights into modeling transcription factor complexes, binding modalities, and regulatory networks [16]. More critically, however, this has served as an invaluable resource for researchers to annotate potential functional variants [17] and generate hypotheses across a variety of fields of interest. This success suggested that a similar effort to profile RBP targets using a standardized methodology could similarly drive significant insights in RNA biology.

To this end, we introduced the enhanced CLIP (eCLIP) methodology featuring a size-matched input control [18] and characterized hundreds of immunoprecipitation-grade antibodies with a standardized workflow [19] to generate 223 eCLIP datasets profiling targets for 150 RBPs in K562 and HepG2 cell lines (https://www.encodeproject.org) [20].

Many CLIP methods included radioactive labeling of the 5′ end of RNA fragments with ³²P to visualize protein-RNA complexes after SDS-PAGE electrophoresis and membrane transfer in order to query whether RNA bound to co-purified RBPs of different size is present [4]. However, the eCLIP protocol we utilized above did not include this direct visualization of protein-associated RNA due to the complexity of incorporating radioactive labeling at this scale, preferring validation of eCLIP signal with orthogonal approaches (such as comparison with in vitro-derived motifs or overlap with knockdown/RNA-seq changes). To address this question for future large-scale eCLIP profiling, we pursued alternative labeling approaches. We found that ligation of biotinylated cytidine (instead of the normal RNA adapter) enabled visualization similar to that observed with ³²P while using commercially available chemiluminescent detection reagents for biotin-labeled nucleic acids (Additional file 3: Fig. S1a-c) [21]. We note that unlike ³²P labeling (which is done as a 5′ phosphorylation reaction with T4 Polynucleotide Kinase), this labeling uses the standard eCLIP RNA adapter ligation reaction and thus may more accurately reflect true protein-coupled RNA positioning.

Surprisingly, when expanding this approach across RBPs, we observed detectable transfer of RNA from non-crosslinked cells to nitrocellulose membranes in a supplier-dependent manner (Additional file 3: Fig. S1d-f). We had previously noted that nitrocellulose membranes from certain suppliers contained greater amounts of RNA, which would then be recovered during library preparation (particularly in input libraries, which lack adapter addition prior to membrane transfer) [22]. However, we now observed that the membrane recommended from that effort (the lower-contaminant membrane I) showed greater transfer of RNA than the membrane from our previous supplier (membrane G) (Additional file 3: Fig. S1d-f). Although the signal observed in crosslinked samples was typically substantially higher (median 12.5-fold across 17 RBPs tested), with 88% (15 out of 17) of RBPs showing greater than 5-fold higher signal (Additional file 3: Fig. S1d), for the remaining 2 out of 17, RNA transfer in non-crosslinked samples was within 5-fold of that in crosslinked samples (Additional file 3: Fig. S1d,f).

To directly query whether this led to artifactual eCLIP peak identification, we chose seven eCLIP experiments performed with membrane I and performed replicate experiments with membrane G. Using MATR3 as an example, we observed that peak fold-enrichment compared across membranes was similar to that observed for within-membrane replicates (Additional file 3: Fig. S1g). Extending this to all seven RBPs, only one (FXR2) out of seven showed notably lower replication of peak significance using membrane G (Additional file 3: Fig. S1h), and even in that case, we observed high overall correlation in peak fold-enrichment (Additional file 3: Fig. S1i). Conservation of signal was not limited to peak calls, as we observed similar enrichments for retrotransposable and other RNA elements as well (Additional file 3: Fig. S1j). Thus, our data indicate that although transfer of non-crosslinked RNA to nitrocellulose membranes is supplier- and product-dependent, it does not generally appear to add significant background to the eCLIP profiles studied here.

Recovering RNA binding protein association to retrotransposons and other multicopy RNAs

Standard peak analysis revealed a wide variety of binding modes to mRNAs, with RBPs enriched for coding sequences, 3′ and 5′ untranslated regions, proximal and distal intronic regions, and non-coding RNAs (Additional file 3: Fig. S2a) [23]. We found that simply including non-uniquely mapped reads in standard analysis created thousands of peaks in introns, in intergenic regions, and at pseudogenes that typically lacked standard peak shapes (likely reflecting sequencing errors relative to the main expressed transcript), indicating the need for improved methods to properly quantify RBP binding to such loci.

In order to include these RNA types in eCLIP analysis, we developed a “family-aware mapping” approach in which adapter-trimmed reads are first mapped against a database of sequences for primary transcripts and pseudogenes for 82 families (Fig. 2a) (Additional file 4). Reads mapping to reference transcripts contained within a family (e.g., LINE, YRNA, or 18S rRNA) are used for quantitation, but reads that map to multiple families are masked (discarding an average of 1.1% of reads). These results are then integrated with standard unique genomic mapping in order to incorporate reads that uniquely map to regions annotated as repetitive elements by RepeatMasker [24] into the final family quantitation (Fig. 2a). Confirming the success of this approach, we observed that in eCLIP replicates of YRNA-associating factor TROVE2/RO60 in K562, only 3.7% and 6.8% (replicate 1 and 2, respectively) of usable reads uniquely mapped to YRNA transcripts with standard processing (2.9% and 5.1% to RNY1/2/4/5, with another 0.7% and 1.8% to YRNA pseudogenes) (Fig. 2b). In contrast, for these same datasets, 14.2% and 21.7% of reads mapped uniquely to the YRNA family using the family-aware mapping approach, making use of hundreds of thousands of additional reads that did not uniquely map to individual transcripts (Fig. 2b). Performing this analysis for all RBPs, we observed a wide range of read recovery and enrichment for particular elements (Fig. 2c, Additional file 5). For some RBPs such as RPS11 (K562), an average of 95.2% of reads were only recovered using family mapping (68.1% mapping to RNA18S with an additional 24.1% to RNA28S). In contrast, only 10.4% of reads in KHSRP (K562) eCLIP mapped to multicopy family elements, with 58.9% uniquely mapping to the genome (including 41.1% uniquely mapping to introns outside of RepeatMasker elements) (Fig. 2c).
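The family assignment rule described above (count a read when all of its hits fall within one family, mask it when its hits span several families) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the input layout, and the reference names in the usage example are hypothetical, and the later integration with unique genomic mapping is not modeled.

```python
from collections import Counter

def assign_families(read_hits, reference_family):
    """Count reads per family, masking reads whose hits span several families.

    read_hits: dict mapping read ID -> list of reference names the read aligned to
               (hypothetical input layout).
    reference_family: dict mapping reference name -> family label.
    Returns (per-family read counts, number of masked reads).
    """
    counts = Counter()
    masked = 0
    for read_id, references in read_hits.items():
        families = {reference_family[ref] for ref in references}
        if not families:
            # Read aligned to no family reference; it is neither counted nor masked.
            continue
        if len(families) == 1:
            # Multi-mapping within one family is fine: the read still counts once.
            counts[next(iter(families))] += 1
        else:
            # Hits in more than one family: ambiguous, so the read is masked.
            masked += 1
    return counts, masked

# Toy usage with made-up reference names.
toy_family = {"RNY1": "YRNA", "RNY1P2": "YRNA", "L1_ref": "LINE", "RNA18S": "RNA18S"}
toy_hits = {
    "r1": ["RNY1", "RNY1P2"],   # two YRNA references -> counted for YRNA
    "r2": ["RNY1", "L1_ref"],   # YRNA and LINE -> masked
    "r3": ["RNA18S"],           # single hit -> counted
    "r4": [],                   # no family hit -> ignored
}
toy_counts, toy_masked = assign_families(toy_hits, toy_family)
```

The point of the design is that a read such as `r1`, which standard unique-mapping analysis would discard, is retained at the family level.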

At the element level, our family-aware mapping strategy recovers many known processing or interacting factors, including RBPs enriched for the mature 18S (RPS3, RPS11) and 28S rRNA (DDX21, NOL12) as well as the 45S rRNA precursor (UTP18, WDR43), tRNAs (NSUN2), RN7SK (LARP7), YRNA (TROVE2), and others (Fig. 2d). To validate this approach, we considered 17 RNA elements with well-studied direct links to either RBP function (such as snoRNA binding with rRNA processing and snRNA binding with snRNA processing and the spliceosome) or specific RBP regulators (e.g., snRNA RN7SK with LARP7 [25] and YRNAs with TROVE2/Ro60 [26]) (Additional file 3: Fig. S2d). We observed that 140 eCLIP datasets had one of these 17 elements as the most highly enriched (by relative information, which we observed to better enable comparison across elements versus fold-enrichment), and in 84 (60%) of these cases, the RBP was previously characterized as having the element-paired RBP function, indicating that this approach is highly successful at recovering targets that reflect annotated functions of profiled RBPs. To set a cutoff for analysis, we found that an information cutoff of 0.2 maximized predictive accuracy, at which 70% (74 out of 105 RBPs with the most enriched RNA element meeting this cutoff) had annotated functions matching the known role for this element (Additional file 3: Fig. S2e). Using this cutoff, 235 RBP-element pairings were identified with large numbers of RBPs associated with mRNA regions (42 with CDS, 24 with 3′UTR, 40 with distal intronic, and 23 with proximal intronic regions) and rRNA (24 with RNA28S and 15 with RNA18S, as well as 12 with precursor 45S rRNA), and smaller numbers associated with other specific RNA classes (Fig. 2d, Table 1).
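To make the distinction between fold-enrichment and relative information concrete, the sketch below computes both for one element from read counts in the IP and the paired input. The formula for relative information is not given in this section; the form used here, p × log2(p/q) with p and q the fractions of IP and input reads assigned to the element, is an assumption, and the counts in the usage example are invented.

```python
from math import log2

def element_enrichment(ip_reads, ip_total, input_reads, input_total):
    """Return (fold_enrichment, relative_information) for one RNA element.

    Assumed definitions (not stated in the text):
      p = fraction of IP reads assigned to the element
      q = fraction of input reads assigned to the element
      fold_enrichment = p / q
      relative_information = p * log2(p / q)
    """
    if min(ip_reads, ip_total, input_reads, input_total) <= 0:
        raise ValueError("all counts must be positive in this simple sketch")
    p = ip_reads / ip_total
    q = input_reads / input_total
    fold = p / q
    return fold, p * log2(fold)

# Invented counts: both elements are 4-fold enriched, but the abundant one
# carries far more information because it accounts for more of the IP reads.
abundant_fold, abundant_info = element_enrichment(200_000, 1_000_000, 50_000, 1_000_000)
rare_fold, rare_info = element_enrichment(200, 1_000_000, 50, 1_000_000)
```

Under this assumed definition, weighting the log ratio by p is what lets a 0.2 cutoff favor elements that are both enriched and a substantial share of recovered reads.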

Table 1 Predominant RNA element for each eCLIP dataset

Characterization of ribosomal RNA interactors and processing factors

Ribosomal RNA (rRNA) is the most abundant RNA found in eukaryotic cells and plays essential roles in defining the structure and activity of the ribosome. In humans, the 5S rRNA is separately transcribed, whereas the 18S, 28S, and 5.8S rRNAs are transcribed as one 45S precursor transcript that then undergoes a complex series of cleavage and RNA modification steps to process the mature rRNAs, which then form complex structures that scaffold the assembly of ~80 proteins to create the functional ribosome [27]. Unbiased approaches have characterized over 250 additional factors as playing critical roles in processing pre-rRNA, indicating that rRNA processing and function represent a major function of RBPs in humans [28].

Considering the 150 RBPs profiled, we observed that different subsets of RBPs showed enrichment to specific rRNAs (Fig. 3a), suggesting that the incorporation of normalization against paired input was successful in removing general background at abundant transcripts. Although we are unable to distinguish between mapping to mature 18S, 28S, and 5.8S transcripts versus those regions in the precursor, the ~10-fold lower read density we observe for 45S (median 281 reads per million (RPM)) versus 18S (2715 RPM) or 28S (1983 RPM) in eCLIP input samples (Additional file 3: Fig. S3a-c) suggests that the majority of 18S and 28S reads reflect mature rRNA transcripts. Considering 30 RBPs previously shown to affect pre-rRNA processing [28], we found that 16 had enrichment for one of the three (18S, 28S, or 45S) rRNAs (42.1% of RBPs meeting a 0.101 position-wise information cutoff) relative to 12.5% of others (3.4-fold enriched, p = 0.00025 by Fisher’s exact test) (Additional file 3: Fig. S3d). Despite high and relatively even read density overall on the abundant rRNA transcripts (Additional file 3: Fig. S3a-c), we observed that these rRNA-enriched RBPs showed a number of specific enrichment patterns: two on the 45S precursor (one situated around the 01 and A0 early processing sites, and a second located ~2000 nt further downstream that is discussed below), a cluster at position ~4200 of the 28S, and a cluster at ~1150 of the 18S, along with other profiles unique to individual RBPs (Fig. 3a). Distinct ribosomal components RPS3 and RPS11 had different positional enrichments, as expected given their different positioning within the 18S ribosome (Additional file 3: Fig. S3e).
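The overlap statistic quoted above is a two-sided Fisher's exact test on a 2×2 table. The sketch below is a minimal standard-library implementation. The table itself is not given in the text; the counts used here are inferred from the quoted percentages (16 of 38 rRNA-enriched RBPs and 14 of 112 other RBPs being known pre-rRNA processing factors, 30 in total), so they should be read as an assumption rather than the authors' exact table.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob
        for prob in (table_probability(x) for x in range(lowest, highest + 1))
        if prob <= observed * (1 + 1e-9)
    )

# Inferred counts (assumption): rows are rRNA-enriched vs other RBPs,
# columns are known pre-rRNA processing factor vs not.
p_value = fisher_exact_two_sided(16, 22, 14, 98)
fold = (16 / 38) / (14 / 112)
```

With these inferred counts the ratio of proportions is close to the 3.4-fold figure in the text, which supports the reconstruction of the table but does not confirm it.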

Our data on rRNA precursor position-specific enrichment confirms and provides further resolution to proteins previously characterized to play roles in ribosomal RNA processing. Some factors had specific positioning, including DDX51 which had specific enrichment at the 3′ end of 28S as well as the 3′-ETS precursor region, consistent with previous characterization of the role of DDX51 in 3′ end maturation of 28S [29], and UTP18 which had specific enrichment at the 5′ end, matching its roles in early cleavages at the 01, A0, and 1 sites suggested from large-scale screening data [28] (Fig. 3b, c, Additional file 3: Fig. S3f-g). Others, such as WDR3, had broader enrichment patterns that suggest participation in multiple maturation steps (Fig. 3d, Additional file 3: Fig. S3h).

Surprisingly, we observe a cluster of RBP association in the 45S precursor around position 2100, a region located between the A0 and 1 processing sites which lacks a well-defined processing role (Fig. 3a) [27]. Two of these factors have previous links to nucleolar activity, as ILF3 (also known as NF90) was previously shown to associate with pre-60S ribosomal particles in the nucleolus and knockdown of ILF3 gives defects in rRNA biogenesis [28, 30], and LIN28B has been shown to repress let-7 processing by sequestering pri-let-7 in the nucleolus [31]. In this region, multiple sites of ILF3 and SSB enrichment flank a more specific region enriched in LIN28B eCLIP (Fig. 3e, Additional file 3: Fig. S3i) which has previously been described to contain a potential rRNA-encoded microRNA, rmiR-663a [32]. As rmiR-663a shares similar sequence to genomic-encoded miR-663a on chromosome 20 (and would have the same mature miRNA sequence), it has been challenging to measure expression of the ribosomal-encoded transcript in isolation [33], and indeed, the majority of LIN28B eCLIP reads mapping to the pri-miRNA map equally to both variants (Additional file 3: Fig. S3j). However, when we used sequence variants in the pri-miR sequence as well as the more variable flanking sequence to estimate their separate expression (Fig. 3f), we observed that reads unique to the rmiR outnumbered those unique to genomic homologs by more than 400-fold (Fig. 3g and Additional file 3: Fig. S3j-k), indicating that the observed signal is likely derived from 45S rather than other genomic homologs.
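The idea of separating two near-identical loci by their sequence variants can be illustrated with a small classifier: a read is assigned to one locus only if it covers at least one diagnostic position and every diagnostic base it covers supports that locus. This is a sketch under simplifying assumptions (reads already placed in a shared coordinate system, substitutions only); the positions, bases, and reads in the usage example are invented and do not correspond to the real rmiR-663a or miR-663a sequences.

```python
def classify_read(read_sequence, read_start, diagnostic_sites):
    """Label a read as 'ribosomal', 'genomic', or 'ambiguous'.

    read_start: 0-based position of the first read base in a shared alignment
                coordinate system (an assumption of this sketch).
    diagnostic_sites: dict position -> (ribosomal_base, genomic_base) for
                positions where the two copies differ.
    """
    support = set()
    for position, (ribosomal_base, genomic_base) in diagnostic_sites.items():
        offset = position - read_start
        if 0 <= offset < len(read_sequence):
            base = read_sequence[offset]
            if base == ribosomal_base:
                support.add("ribosomal")
            elif base == genomic_base:
                support.add("genomic")
    # Reads covering no diagnostic site, or with conflicting evidence, stay ambiguous.
    return support.pop() if len(support) == 1 else "ambiguous"

# Invented diagnostic sites and reads for illustration only.
sites = {5: ("C", "T"), 12: ("G", "A")}
labels = [
    classify_read("AAAAACAAAA", 0, sites),      # covers site 5 with C
    classify_read("AAAAATAAAAAAAAAA", 0, sites), # T at site 5, A at site 12
    classify_read("AAAA", 0, sites),            # covers no diagnostic site
    classify_read("AAAAACAAAAAAAAAA", 0, sites), # C at site 5 but A at site 12
]
```

Counting the 'ribosomal' and 'genomic' labels over all reads then gives the kind of ratio reported above, while ambiguous reads are left out of the comparison.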

Finally, we considered binding to snoRNAs, a class of highly structured small RNAs that play essential roles in guiding modification of ribosomal RNAs. We found that enrichment for C/D-box snoRNAs, which canonically guide methylation of RNA, was highly correlated to enrichment for the 45S precursor (R² = 0.67, p = 1.6 × 10⁻⁵⁴) (Fig. 3h), providing further confirmation that these 45S-enriched RBPs are likely playing key roles in rRNA processing. Surprisingly, however, we observed that enrichment for H/ACA-box snoRNAs showed far lower correlation with enrichment for either C/D-box snoRNAs (R² = 0.42) or the 45S precursor (R² = 0.17) (Fig. 3i, Additional file 3: Fig. S3l). Thus, this data confirms the ability of eCLIP with input normalization to specifically isolate enrichment between abundant snoRNA classes, and suggests that (at least for the RBPs profiled to date here) we see stronger overlap between rRNA precursor and C/D-box versus H/ACA-box snoRNAs.
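The R² values above summarize how well enrichment for one element class tracks enrichment for another across datasets. A minimal sketch of that calculation is the squared Pearson correlation below. Whether the study correlated raw or log-transformed enrichments is not stated in this section, and the input vectors in the usage example are invented.

```python
def r_squared(x_values, y_values):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x_values) != len(y_values) or len(x_values) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    variance_x = sum((x - mean_x) ** 2 for x in x_values)
    variance_y = sum((y - mean_y) ** 2 for y in y_values)
    return covariance ** 2 / (variance_x * variance_y)

# Invented per-dataset enrichments for two element classes.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
partial = r_squared([1, 2, 3], [1, 3, 2])
```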

Repetitive elements define a significant fraction of the RBP target landscape

Repetitive elements constitute a large fraction of the non-coding genome [34], and elements annotated by RepBase constitute an average of 12.2% of reads observed in eCLIP input experiments (Additional file 3: Fig. S4a). In particular, as retrotransposable L1/LINE and Alu elements constitute 10.8% and 0.4% of intronic sequences, respectively (Additional file 3: Fig. S4b), they represent a significant fraction of the pool of nuclear transcribed pre-mRNAs available for RBP interactions. Although some RBPs have been shown to play roles in regulation of active retrotransposition [35], the majority of intronic elements have accumulated mutations or deletions and are no longer capable of active retrotransposition, leaving the question of their function relatively poorly understood. However, recent analyses of RBP targets identified by CLIP (including early releases of the eCLIP data considered here) have shown that both antisense Alu and antisense LINE elements contain cryptic splice sites that can lead to improper splicing and polyadenylation, suggesting that a major yet unappreciated role for many RBPs may be to suppress the emergence of inappropriate cryptic RNA processing sites introduced upon retrotransposition [36, 37].

Querying for RBPs with enriched eCLIP signal at retrotransposable and other repetitive elements, we surprisingly observed that only a small subset of elements (notably including L1 and Alu elements both in sense and antisense orientation) showed high RBP specificity, whereas most elements showed highly correlated enrichments across RBPs (Fig. 4a, Additional file 3: Fig. S4c). This group of elements showed enrichment in a small subset of eCLIP experiments, notably including multiple members of the highly abundant HNRNP family (HNRNPA1, HNRNPU, HNRNPC, and HNRNPL), indicating that they may be coordinately regulated to prevent inappropriate RNA processing.

Analysis of Alu elements recapitulated a previously described interaction of HNRNPC with antisense Alu elements [36], but additionally revealed two RBPs with more than 5-fold enrichment: ILF3 (enriched for both sense and antisense Alu elements) and RNA Polymerase II component POLR2G (antisense) (Fig. 4b, Additional file 3: Fig. S4d). Both of these factors have previous links to RNA processing through Alu elements, as ILF3 association was suggested to repress RNA editing in Alu elements [39] and Alu elements have been shown to affect RNA Polymerase II elongation rates [40]. In total, 19 datasets showed more than 2-fold enrichment for either Alu or antisense Alu elements (Fig. 4b).

Considering L1/LINE elements, we observed enrichment with far more RBPs, with 26 datasets showing 5-fold enrichment (Fig. 4c). Interestingly, we observed generally distinct sets for sense versus antisense L1 enrichment, with only HNRNPC (in K562, but not HepG2) and ZC3H8 showing enrichment for both (Fig. 4c, Additional file 3: Fig. S4e). The RBPs identified here align well with those identified in an independent analysis of L1-associated RBPs which used a subset of these datasets along with independent iCLIP and other datasets, confirming robustness of this analysis across different approaches to quantify enrichment to L1 elements [37]. To query the role of L1 association, we first considered whether binding could specifically act to repress L1 retrotransposition itself. Of the 15 RBPs with more than 5-fold enrichment at sense L1 elements, SAFB (p = 0.002), PPIL4 (p = 0.06), and TRA2A (p = 0.05) were all identified as candidate suppressors of L1 retrotransposition in a recent genome-wide CRISPR screening assay [38], suggesting that this eCLIP enrichment approach identifies functional regulators of retrotransposition (Fig. 4d).

However, we observed that while enriched signal was centered at L1 sense and antisense elements, the signal often extended for multiple kilobases on either side (Additional file 3: Fig. S4f), indicating that despite the overlap with functional regulators of active LINEs, the majority of eCLIP signal is likely coming from inactive L1 elements contained within pre-mRNAs rather than independently transcribed active L1 elements in the cell lines studied here. Thus, we next assayed whether these RBPs showed evidence for silencing cryptic RNA processing sites created upon retrotransposition, as previously described [36, 37]. To do this, we hypothesized that knockdown of such RBPs would lead to inclusion of premature stop codons that signal nonsense-mediated decay, ultimately decreasing abundance of target mRNA transcripts. For MATR3, we indeed observed that genes containing one or more antisense L1 elements overlapped by peaks showed significantly decreased expression upon RBP knockdown (Fig. 4e), consistent with recent findings that MATR3 binding blocks both cryptic poly(A)-sites and splice sites within LINEs [37]. Interestingly, we observed a similar pattern for 3 other RBPs with antisense L1 enrichment, including HNRNPM (which has been identified in complexes with MATR3) (Fig. 4f, Additional file 3: Fig. S4g).

Meta-gene binding profiles reveal RBP functions

Next, we turned to the question of whether eCLIP peak distributions could reveal RBP roles in mRNA processing. To better separate RBP association patterns, we considered the distribution of peaks across a meta-gene generated by size-normalizing binding across all protein-coding transcripts relative to transcription start and stop sites and start and stop codons, and then averaging across all expressed genes (Fig. 5a). Considering binding relative to the coding region (CDS) and 5′ and 3′ untranslated regions of spliced mRNA, we observed an overall average of approximately one peak per gene across the entire mRNA (Additional file 3: Fig. S5a), with a variety of patterns of individual RBP association (Fig. 5b).
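The size-normalization step can be sketched as follows: each transcript is split into 5′UTR, CDS, and 3′UTR, each region is rescaled to a fixed number of bins regardless of its length, and bin counts are averaged over transcripts. This is a minimal illustration, not the study's implementation: the bin counts per region, the use of peak midpoints in spliced-transcript coordinates, and the field names are all assumptions.

```python
def metagene_profile(transcripts, bins_per_region=(10, 20, 10)):
    """Average peak count per meta-gene bin across transcripts.

    transcripts: list of dicts with hypothetical keys
      'length'    spliced transcript length
      'cds_start' first CDS position (0-based, transcript coordinates)
      'cds_end'   first 3'UTR position (end-exclusive CDS boundary)
      'peaks'     list of peak midpoints in transcript coordinates
    bins_per_region: bins for (5'UTR, CDS, 3'UTR); an arbitrary choice here.
    """
    profile = [0.0] * sum(bins_per_region)
    for transcript in transcripts:
        regions = [
            (0, transcript["cds_start"]),
            (transcript["cds_start"], transcript["cds_end"]),
            (transcript["cds_end"], transcript["length"]),
        ]
        for peak in transcript["peaks"]:
            bin_offset = 0
            for (start, end), n_bins in zip(regions, bins_per_region):
                if start <= peak < end:
                    # Rescale the position within its region to a bin index.
                    fraction = (peak - start) / (end - start)
                    profile[bin_offset + min(int(fraction * n_bins), n_bins - 1)] += 1
                    break
                bin_offset += n_bins
    return [count / len(transcripts) for count in profile]

# Toy transcript: 100 nt 5'UTR, 600 nt CDS, 300 nt 3'UTR, three peaks.
toy_profile = metagene_profile(
    [{"length": 1000, "cds_start": 100, "cds_end": 700, "peaks": [50, 400, 999]}]
)
```

Because every region is rescaled independently, the start and stop codons always fall on the same bin boundaries, which is what makes the delineation at those positions visible in an averaged profile.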

At a global level, the most striking observation was clear delineation points at the start and stop codon positions (Fig. 5b, c), likely reflecting the fact that translation initiation is unique to the 5′UTR whereas the 3′UTR is the only region where bound RBPs will not be removed by translating ribosomes. However, more subtle clustering revealed distinct subgroups within the broader 5′UTR-, CDS-, and 3′UTR-enriched classes (Fig. 5b, d). For example, we observed two distinct classes of 5′UTR binding that appear to correlate with distinct RBP functions. The first (5UTR.TSS) showed greater enrichment closer to the transcription start site and included nuclear 5′ end processing factors such as cap-binding protein NCBP2 (Fig. 5b, d). In addition to 5′ end enrichment, this class also contained RBPs with substantial 3′UTR signal, such as 3′ end processing factor CSTF2T (which also showed significant signal extending past annotated transcription termination sites (Additional file 3: Fig. S5b), consistent with previous CLIP studies [42]). A second set (5UTR.SC) showed biased peak presence closer to the start codon and included both canonical translational initiation factors (such as EIF3G, EIF3D, and EIF3H) as well as RBPs previously shown to play translational regulatory roles (including DDX3X, SRSF1, and FMR1) (Fig. 5b).

Similarly, we also observed distinctions within CDS binding, with either uniform (CDS.UN) density or biased towards the 5′ (CDS.5P) or 3′ (CDS.3P) end. We observed that 13 out of 15 spliceosomal RBPs showed CDS enrichment (10 of which fell into the CDS.UN category), likely reflecting the general lack of introns in 5′UTRs (due to their small size) and 3′UTRs (as they would create targets for nonsense-mediated decay) (Fig. 5b, d).

Finally, we observed multiple modalities of 3′UTR peak distribution. The 3UTR.Un class showed relatively uniform density and contained many well-characterized 3′UTR binding proteins, including NMD factor UPF1 and stress granule factor TIA1. In contrast, RBPs in the 3UTR.5P class had peak density enriched closer to (and continuing 5′ of) the stop codon, including the well-studied IGF2BP family of RBPs (Additional file 3: Fig. S5c). Finally, we observed a number of RBPs with increased enrichment towards the transcription termination site (3UTR.TTS).

Next, we considered whether these patterns corresponded to different RNA processing functions. Although the number of RBPs is limited for some functions, we observed that many clusters had significant overlaps with distinct RBP functional annotations (Fig. 5e, Additional file 3: Fig. S5d). In particular, RBPs associated with nuclear RNA processing steps showed little change (median 1.2-fold decrease in peak density around the stop codon), whereas RBPs with cytoplasmic roles showed a significant 1.6-fold increase (Additional file 3: Fig. S5e), consistent with a stronger role for the stop codon as a delineation point for cytoplasmic RBP association. In all, our results suggest that the pattern of relative enrichment in different gene regions is predictive of the regulatory role that the RBPs play.

Splicing regulatory roles revealed by intronic meta-gene profiles

Next, we performed regional analysis to query binding to exons (specifically 50 nt bordering the splice sites) and 500 nt of proximal introns flanking both the 3′ and 5′ splice sites. As an example, we observed that out of 89,265 introns present in highly expressed transcripts (TPM > 1), 2699 had a significant IDR peak from eCLIP of U2AF2 in K562 cells (Additional file 3: Fig. S6a). These peaks had a stereotypical positioning at the 3′ splice site (extending into the downstream exon due to the use of full reads rather than just read 5′ ends for analysis), matching the well-characterized role of U2AF2 in 3′ splice site recognition (Fig. 6a). These matrices were then summed across all introns to calculate a meta-intron plot representing the average peak coverage at each position, with confidence intervals estimated by bootstrapping (Fig. 6b).
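The meta-intron calculation can be sketched with a binary matrix in which each row is an intron, each column is a position relative to the splice sites, and a 1 marks peak coverage. The column means give the meta-intron profile, and resampling introns with replacement gives a confidence band. This is a minimal sketch: the percentile method, the number of resamples, and the toy matrix are assumptions, and the study's exact bootstrapping procedure is described in its Methods rather than here.

```python
import random

def meta_intron(coverage_matrix):
    """Fraction of introns covered by a peak at each position (column means)."""
    n_introns = len(coverage_matrix)
    width = len(coverage_matrix[0])
    return [sum(row[j] for row in coverage_matrix) / n_introns for j in range(width)]

def bootstrap_interval(coverage_matrix, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the meta-intron profile.

    Introns (rows) are resampled with replacement; the seed is fixed only so
    that the sketch is reproducible.
    """
    rng = random.Random(seed)
    n_introns = len(coverage_matrix)
    width = len(coverage_matrix[0])
    resampled_profiles = []
    for _ in range(n_resamples):
        rows = [coverage_matrix[rng.randrange(n_introns)] for _ in range(n_introns)]
        resampled_profiles.append(meta_intron(rows))
    lower, upper = [], []
    for j in range(width):
        column = sorted(profile[j] for profile in resampled_profiles)
        lower.append(column[int((alpha / 2) * (n_resamples - 1))])
        upper.append(column[int((1 - alpha / 2) * (n_resamples - 1))])
    return lower, upper

# Toy matrix: 4 introns by 3 positions.
toy_matrix = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 0, 0]]
toy_mean = meta_intron(toy_matrix)
toy_lower, toy_upper = bootstrap_interval(toy_matrix)
```

For scale, the U2AF2 example above corresponds to 2699 of 89,265 introns (about 3.0%) carrying a significant peak anywhere in the queried window.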

Performing this analysis for 130 RBPs with sufficient peaks (see the “Methods” section), we observed that the profiles recapitulated many known binding patterns, including U2AF1 and U2AF2 at the 3′ splice site, SF3B4 and SF3A3 at the branch point, PRPF8 at the 5′ splice site, and RBFOX2 and PTBP1 at proximal introns (Fig. 6c). Clustering analysis indicated a number of distinct RBP association patterns. In addition to a large group of exclusively exonic datasets, we observed clusters for the canonical splicing features (5′ splice site, 3′ splice site, and branch point), and two additional clusters: one where RBPs showed enrichment for peaks at proximal introns flanking both the 5′ and 3′ splice sites, and one with dominant enrichment in the 5′ splice site proximal intron only (Fig. 6c, right). We also observed a wide range of peak frequency; canonical splicing machinery components such as U2AF2, SF3B4, and PRPF8 had significantly enriched peaks at many introns (with a position maximum of 3.6%, 7.8%, and 5.3% of queried abundant introns respectively in K562), whereas factors such as PTBP1 and RBFOX2 were less commonly enriched at specific positions (0.1% and 0.5%, respectively) (Fig. 6c).

Insights into spliceosomal association and core splicing regulation

The breadth of RBPs profiled provided a unique opportunity to explore their interactions with the spliceosome and their impacts on splicing regulation. In addition to contacting the intron, many spliceosomal and splicing regulatory proteins also interact with the spliceosomal small nuclear RNAs (snRNAs). The overall snRNA family includes five specific RNA families (U1, U2, U4, U5, and U6, which also have variant isoforms that differ slightly in sequence) that play essential roles in canonical GT-AG RNA splicing, as well as four (U11, U12, U4atac, U6atac) specific to the minor AT-AC spliceosome, each of which plays specific mechanistic roles during splicing [43]. Thus, RBP association with a particular snRNA can help to map its function to a particular step in splicing. Quantitating snRNA enrichment using the family-aware mapping described above, we recapitulated many known associations between RBPs and the spliceosome, including interactions of SF3B4 with U2 snRNA (47- and 32-fold enriched in HepG2 and K562, respectively) [44] and GEMIN5 with U1 (11.2-fold enriched in K562) [45] (Fig. 7a). In some cases, these dominated overall RNA recovery; for example, an average of 41% of reads from SF3A3 eCLIP and 17% and 20% of SF3B4 eCLIP reads in HepG2 and K562 respectively mapped to the U2 snRNA, whereas U2 reads averaged only 0.7% in input samples.

Interestingly, while many factors showed similar association between analogous snRNAs in the major and minor spliceosomes (such as PRPF8 and SMNDC1 with U6 and U6atac, and SF3B1 and SF3B4 with U2 and U12), some RBPs were specifically associated with either the major (SF3A3, which was 29.5-fold enriched for U2 but 1.2-fold depleted for U12 in HepG2, and QKI, 118.6-fold enriched for U6 but 2.4-fold depleted for U6atac) or minor spliceosome (HNRNPM, which was 8.1-fold enriched in K562 and 7.6-fold in HepG2 for U11 but 5.3- and 4.2-fold depleted for U1) (Fig. 7a, Supplemental Fig. 7a-d). Although preliminary analysis did not show altered splicing upon HNRNPM knockdown specifically at U11/U12 introns, previous studies have suggested that HNRNPM may contribute to minor intron splicing through interactions with FUS [46].

In the first catalytic step of intron splicing, a transesterification step joins the 5′ splice site with the branch point to create an intron lariat structure (Additional file 3: Fig. S7e). This is an essential step in splicing and helps to define 3′ splice site choice, but identification of branch points has remained challenging due to variable positioning (ranging from 20 to 40 nucleotides upstream of the 3′ splice site) and a degenerate sequence motif [47]. Recent efforts to use either specialized library preparation protocols or focused analysis of deep sequencing to identify branch points via lariat junction-spanning reads have enabled the identification of tens of thousands of branch points, but the regulation of branch point recognition and its role in splicing regulation remain poorly understood. Considering the RBPs profiled here, we observe multiple RBPs showing specific enrichment at branch points, including both known regulators (such as SF3 complex components SF3B4 and SF3A3), as well as novel factors (including RBM5). Indeed, analysis of these datasets coupled with focused iCLIP profiling of purified spliceosomes recently indicated distinct patterns of RBP association at branch points and 5′ and 3′ splice sites, which yielded unique insights into how branch point strength defines RBP association and spliceosomal assembly dynamics [48].

However, we were particularly intrigued by the observation of a striking pattern of both 5′ splice site and branch point enrichment for the RBP AQR (Fig. 7b). Knockdown of AQR yielded over 30,000 altered alternative splicing events, by far the most of any knockdown performed by the ENCODE consortium to date (including canonical splicing components such as U2AF1/2 and SF3B4) [7c). Motif analysis of these positions yielded the canonical branch point motif signal (with 92% containing an A at the base prior to read starts) (Fig. 7c). Thus, these results suggest that AQR eCLIP signal is derived from introns after lariat formation, where reverse transcription is incapable of reading through the branch point adenosine (Additional file 3: Fig. S7e), and that deeper sequencing of AQR eCLIP (potentially with improved methodology to enrich reads at the 3′ rather than 5′ splice site) will provide direct identification of branch points in human.

Next, we considered eCLIP signal at alternatively spliced cassette exons. Considering “native” cassette exons in wild-type K562 and HepG2 cells, we observed that branch point factors SF3B4 and SF3A3 showed decreased signal at alternative exons relative to constitutive exons, consistent with U2AF2 and other spliceosomal components and potentially reflecting overall lower spliceosomal occupancy (Additional file 3: Fig. S7f). However, at alternative 3′ splice sites with the proximal site increased upon knockdown of branch point components SF3B4 and SF3A3, we observed that average eCLIP enrichment for SF3B4 and SF3A3 was decreased at the typical branch point location but increased towards the 3′ splice site (compared to eCLIP signal at native A3SS events which utilize both distal (upstream) and proximal 3′ splice sites in control shRNA datasets) (Fig. 7d, Additional file 3: Fig. S7g). Consistent with previous mini-gene studies showing that 3′ splice site scanning and recognition originates from the branch point and can be blocked if the branch point is moved too close to the 3′ splice site AG [50], these results provide further evidence that use of branch point complex association to restrict recognition by the 3′ splice site machinery may be a common regulatory mechanism [51] (Additional file 3: Fig. S7h).

Clustering of RBP binding identifies known and novel co-associating factors

Large-scale RBP target profiling using a consistent methodology enables cross-comparison between datasets. Considering simple overlap between peak sets for all profiled RBPs, we observed significant overlap for many pairs of RBPs, which often formed co-associating groups (Fig. 8a, left). These groups of RBPs with highly overlapping peaks generally segregated into four major categories. First, we observe high similarity between the same RBP profiled in HepG2 and K562 (including QKI, PTBP1, and LIN28B) (Fig. 8a, green). Indeed, we observe an average peak overlap of 30.0% between the same RBP in K562 and HepG2 versus 4.9% for random RBP pairings (6.1-fold increased), confirming the broad reproducibility of binding across cell types (Fig. 8b). Second, we observe many cases of high overlap between eCLIP for homologous RBPs within the same family, including TIA1 and TIAL1, IGF2BP1/2/3, and fragile X-related FMRP, FXR1, and FXR2 (Fig. 8a, yellow). Third, we observe clusters containing known co-regulating RBPs, including recognition and processing machinery for the 3′ splice site (U2AF1 and U2AF2), branch point (SF3B4 and SF3A3), and 5′ splice site (EFTUD2, RBM22, PRPF8, and others), as well as a group of RBPs that play general roles in binding the 5′UTR of nearly all genes to regulate translation (DDX3X, EIF3G, and NCBP2) (Fig. 8a, red).

Interestingly, we observe unexpected clusters that suggested potential novel complexes or co-interacting partners (Fig. 8a, blue). Some clusters likely reflect overlapping targeting to specific types of RNAs: for example, one cluster contains three RBPs we described above to show specific enrichment at antisense L1/LINE elements (HNRNPM, BCCIP, and EXOSC5). The patterns of other clusters are often less clear, with some containing both well-studied RBPs as well as those with no known RNA processing roles (for example, high overlap between HNRNPL and AGGF1 across both cell types). To consider whether these likely reflected true instances of RBP co-interaction, we asked whether RBPs that had higher peak overlap were more likely to have interactions from large-scale IP-mass spectrometry experiments. Using the BioPlex 2.0 database of ~ 56,000 interactions [52], we observed that RBPs with IP-MS interactions showed an average 2.3-fold increase in eCLIP peak overlap (11.4% versus 4.9% for RBPs without interactions), suggesting that there is a general correlation between peak overlap and RBP interactions (Fig. 8c).

Finally, we performed co-immunoprecipitation (co-IP) studies focusing on one predicted novel interaction group involving HNRNPL and AGGF1. We observed that AGGF1 co-immunoprecipitated HNRNPL, unlike unrelated factors RBFOX2 or FMR1 (Additional file 3: Fig. S8a). We note that this co-IP was observed using less stringent co-IP wash buffers, but was not observed using the high-salt wash buffers present in eCLIP (Additional file 3: Fig. S8b), indicating that the overlap in eCLIP binding likely reflects independent crosslinking events to the distinct RBPs. Thus, these results indicate that the eCLIP data resource reveals many novel RBP interactions that are likely to reflect previously unidentified regulatory complexes.

Discussion

The ENCODE RNA binding protein resource contains 1223 replicated datasets for 356 RBPs, including in vivo targets by eCLIP, in vitro binding motifs by RNA Bind-N-Seq, subcellular localization by immunofluorescence, factor-responsive expression and splicing changes by knockdown/RNA-seq, and DNA associations by ChIP-seq [71], suggesting that RPS3 eCLIP may capture ribosome association on translating mRNAs and could be used as a general approach to assay translation. Similarly, our meta-exon analysis of AQR (followed by further analysis of crosslink-induced termination sites) showed that AQR eCLIP could identify branch points for a set of highly abundant introns, suggesting that further development of profiling of AQR binding targeted to 3′ splice site regions could yield a highly specific approach to identification of branch points transcriptome-wide. Recent work using iCLIP to specifically purify spliceosome-associated RNAs further showed that other eCLIP datasets analyzed here also showed highly stereotypical crosslinking patterns around branch points, which could also broadly map branch point locations and reveal unique insights into the combinatorial effect of branch point and splice site strength on spliceosomal assembly and dynamics [48].

The diversity of distinct RBP association patterns can also be flipped to predict features of a queried RNA. For example, recent work used the ENCODE eCLIP resource to identify UPF1 as one of many RBPs with specific enrichment at 3′UTRs [56]. This finding enabled improved prediction of whether a queried transcript was a protein coding versus long non-coding RNA by incorporating presence (or absence) of UPF1 eCLIP signal as a biomarker for translation [56]. Similarly, our unbiased analysis of foci of enrichment on the 45S rRNA precursor suggested two regions as notably highly enriched across multiple RBPs, one of which matches a well-characterized region (between the canonical 01 and A0 processing sites) with another suggesting interesting regulatory mechanisms linking ribosomal RNA and microRNA processing. Similar analysis identifying eCLIP datasets with enrichment on regulatory non-coding RNAs … [73].

Family-aware mapping to multicopy elements

The software pipeline used to quantify enrichment for retrotransposable and other multicopy elements is available at https://github.com/YeoLab/repetitive-element-mapping, and was initially described in [75]. Within each family, transcripts were given a priority value, with primary transcripts prioritized over pseudogenes. Mapping to the reverse strand of a transcript was counted separately from forward strand mapping, creating a second “antisense” family for each RNA family above (which utilized the same element priority order), with the exception of simple repeats (which were all combined into one family).

To quantify eCLIP signal, paired-end sequencing reads were first adapter trimmed as previously described [18]. Next, reads were mapped against the repetitive element database using bowtie2 (v. 2.2.6) with options “-q --sensitive -a -p 3 --no-mixed --reorder” to output all mappings. Read mappings were then processed as follows. First, for each paired-end read pair, only mappings with the lowest alignment scores summing both mismatch penalties (defined as MN + floor((MX − MN)(MIN(Q, 40.0)/40.0)) where Q is the Phred quality value, and default values MX = 6, MN = 2, as described in bowtie2 reference material) and gap penalties (defined as GO + N × GE, where GO = gap open = 5, GE = gap extend = 3, N = gap length) were kept. Next, the mapping to the transcript with the highest priority within an RNA family (as listed above) was identified as the “primary” match mapping. At this stage, read pairs which had equal best alignments to multiple repeat families were discarded, with only reads mapping to a single repeat family considered for further quantification.
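The penalty arithmetic quoted above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline's own code: the function names and the `(name, mismatch_qualities, gap_lengths)` candidate tuples are hypothetical, and only the constants (MX = 6, MN = 2, GO = 5, GE = 3) come from the text.

```python
import math

MX, MN = 6, 2  # maximum / minimum mismatch penalty, as quoted in the text
GO, GE = 5, 3  # gap open / gap extend penalty, as quoted in the text


def mismatch_penalty(q):
    """Penalty for one mismatch at a base with Phred quality q."""
    return MN + math.floor((MX - MN) * (min(q, 40.0) / 40.0))


def gap_penalty(n):
    """Penalty for one gap of length n."""
    return GO + n * GE


def pair_penalty(mismatch_quals, gap_lengths):
    """Summed mismatch and gap penalties for one read-pair mapping."""
    return (sum(mismatch_penalty(q) for q in mismatch_quals)
            + sum(gap_penalty(n) for n in gap_lengths))


def best_mappings(candidates):
    """Keep the names of the candidate mappings with the lowest total penalty.

    `candidates` is a hypothetical list of (name, mismatch_quals, gap_lengths).
    """
    scored = [(pair_penalty(mq, gl), name) for name, mq, gl in candidates]
    best = min(score for score, _ in scored)
    return [name for score, name in scored if score == best]
```

Under these definitions a single high-quality mismatch costs 6, so the "2 mismatches per read" margin used later for a read pair corresponds to 2 × 2 × 6 = 24.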

Next, these RNA family mappings were integrated with unique genomic mapping from the standard eCLIP processing pipeline (using read mapping prior to PCR duplicate removal). For read pairs that mapped both to an RNA family above as well as uniquely to the genome, the mapping scores (as defined above) were compared. If the unique genome mapping was more than 2 mismatches per read (24 alignment score for the read pair) better than to the repeat element, the unique genomic mapping was used; otherwise, it was discarded and only the repeat mapping was kept. Next, PCR duplicates were removed by comparing all read pairs based on their mapping start and stop position (either within the genome or within the mapped primary repeat) and unique molecular identifier sequence, and all but one read pair for read pairs sharing these three values were defined as PCR duplicates and removed. At this stage, RepeatMasker-predicted repetitive elements in the hg19 genome were additionally obtained from the UCSC Genome Browser [24]. Element counts for RepBase elements were therefore determined as the sum of repeat family-mapped read pairs (described above) plus the number of reads that mapped uniquely to the genome at positions which overlapped (by at least one base) RepeatMasked RepBase elements. Reads uniquely mapping to non-RepBase genomic regions were then annotated into one of 11 additional classes in the following priority order (based on GENCODE v19 annotations): CDS, 5′UTR and 3′UTR, 3′UTR, 5′UTR, proximal intronic (within 500 nt of splice sites), distal intronic (remaining intronic regions), non-coding exonic, non-coding proximal intronic, non-coding distal intronic, antisense to GENCODE transcripts, and intergenic.
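As a rough illustration of the two decision rules in this paragraph, the sketch below chooses between a unique genomic and a repeat-family mapping and then collapses PCR duplicates. The dictionary keys (`target`, `start`, `stop`, `umi`) are hypothetical, and including the mapped reference (`target`) in the duplicate key is an assumption made so that positions on different chromosomes or repeat elements are not confused.

```python
def choose_mapping(genome_penalty, repeat_penalty, margin=24):
    """Pick the mapping for a read pair that maps both uniquely to the
    genome and to a repeat family. Lower penalty means a better alignment.

    The genomic mapping is used only if it is better by MORE than `margin`.
    """
    if repeat_penalty - genome_penalty > margin:
        return "genome"
    return "repeat"


def remove_pcr_duplicates(read_pairs):
    """Keep one read pair per (reference, start, stop, UMI) combination.

    `read_pairs` is a hypothetical list of dicts; input order is preserved.
    """
    seen = set()
    kept = []
    for rp in read_pairs:
        key = (rp["target"], rp["start"], rp["stop"], rp["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(rp)
    return kept
```

For example, `choose_mapping(0, 24)` returns `"repeat"` because a difference of exactly 24 does not exceed the margin.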

Finally, the number of post-PCR duplicate removal read pairs mapping to each class was counted in both IP and paired input sample and normalized for sequencing depth (using the total number of post-PCR duplicate read pairs from both unique genomic mapping as well as repeat mapping as the denominator to calculate fraction of reads). Significance was determined by Fisher’s exact test, or by Pearson’s chi-square test when all expected and observed values were five or more. Relative information content of each element in each replicate was calculated as \( p_i \times \log_2\left(\frac{p_i}{q_i}\right) \), where p_i and q_i are the fraction of total reads in IP and input respectively that map to element i. To combine two biological replicates, the average reads per million (RPM) was calculated across two IP samples and compared against the paired input experiment to calculate one overall fold-enrichment and relative information value per dataset.
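A minimal sketch of the fold-enrichment and relative information calculation follows. The function names are hypothetical, and returning 0 when an element has no IP reads is an assumption (the limit of p log p as p approaches 0); the text does not say how zero counts were handled.

```python
import math


def read_fraction(count, total):
    """Fraction of all usable read pairs that map to one element."""
    return count / total


def fold_enrichment(p, q):
    """IP fraction over input fraction (q must be greater than 0)."""
    return p / q


def relative_information(p, q):
    """p * log2(p / q) for IP fraction p and input fraction q."""
    if p == 0:
        return 0.0  # assumption: treat an empty IP class as contributing zero
    return p * math.log2(p / q)
```

As a worked example, an element holding 40% of IP reads and 10% of input reads is 4-fold enriched and has relative information 0.4 × log2(4) = 0.8.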

Validation of RNA element links with RBP functional annotations

To quantify whether RNA element enrichment matched with RBP functions, a set of positive control pairings were generated between RNA elements with known links to either RBP function or known RBPs contained within a well-characterized ribonucleoprotein complex (Additional file 3: Fig. S2a). One hundred forty datasets for which the RBP had at least one of these annotated functions were selected, and datasets were sorted by relative information of the most-enriched class. Accuracy (defined as (TP + TN)/(TP + TN + FP + FN)) was then calculated, where true positives (TP) were RBPs for which the most-enriched RNA element was greater than the cutoff value and the RBP has published evidence for the function associated with the most-enriched RNA element, false positives (FP) were RBPs that had an RNA element meeting the relative information cutoff but the RBP lacked publication evidence for the linked function, false negatives (FN) were RBPs lacking an RNA element meeting the relative information cutoff but the RBP had published evidence for functions associated with at least one RNA element class, and true negatives (TN) were RBPs lacking annotated functions or RNA elements meeting the relative information cutoff. Accuracy was calculated for each possible relative information cutoff, and the maximum point (0.2) was chosen.
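The cutoff scan described above can be sketched as follows. This simplifies the published definitions: each dataset is reduced to a hypothetical pair of (relative information of its most-enriched element, whether the RBP has published support for a linked function), which collapses the distinction the text draws between support for the most-enriched element and support for any element class.

```python
def accuracy_at_cutoff(datasets, cutoff):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) at one cutoff.

    `datasets` is a list of (relative_information, has_support) pairs.
    """
    tp = fp = fn = tn = 0
    for rel_info, has_support in datasets:
        called = rel_info > cutoff
        if called and has_support:
            tp += 1
        elif called:
            fp += 1
        elif has_support:
            fn += 1
        else:
            tn += 1
    return (tp + tn) / (tp + tn + fp + fn)


def best_cutoff(datasets):
    """Try every observed value as a cutoff and return the most accurate one
    (the lowest such value if several tie)."""
    cutoffs = sorted({rel_info for rel_info, _ in datasets})
    return max(cutoffs, key=lambda c: accuracy_at_cutoff(datasets, c))
```

The same scan with a different scoring function would give an F1-based cutoff.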

Ribosomal RNA analysis

RBPs with roles in ribosomal RNA processing were obtained from [28]. Position-wise relative information was calculated as above, using the number of reads overlapping the position in IP versus input for each dataset (using paired-end read 2 only, as was done for genomic mapping). To obtain a cutoff for further analysis, RBPs were sorted by the maximum position-wise relative information on the 45S rRNA precursor, and at each value, the F1 score was calculated (defined as (2 × TP)/(2 × TP + FP + FN)) using the definitions described above. The maximum point at 0.101 was used for further analysis.

To quantify enrichment at the rmiR-663 ribosomal versus genomic paralog loci, sequences of rmiR-663 and four genomic-encoded paralogs (miR-663a, miR-663b, AC010970.1, and AC136932.1) were obtained from the UCSC Genome Browser, along with 100 nt of flanking sequence. Only reads that perfectly aligned (with zero mismatches or gaps) to these sequences were counted for further analysis.

Retrotransposable element analysis

L1 retrotransposition genome-wide CRISPR screening data was obtained from Liu et al. [38], using Combo casTLE Effect scores from K562 cells. Bonferroni correction was performed on uncorrected casTLE p values using n = 15 (the number of L1 (sense)-enriched RBPs queried).
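For reference, Bonferroni correction with n = 15 amounts to multiplying each uncorrected p value by 15 and capping the result at 1. A one-function sketch (the function name is hypothetical):

```python
def bonferroni(p_value, n_tests=15):
    """Bonferroni-adjusted p value, capped at 1.0."""
    return min(1.0, p_value * n_tests)
```

An uncorrected p value of 0.001 therefore becomes 0.015 after correction.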

To calculate change in expression of L1-containing bound genes, DESeq-calculated gene expression fold changes for RBP knockdown/RNA-seq data were obtained from the ENCODE DCC (http://www.encodeproject.org) for all RBPs with both eCLIP and RNA-seq performed in the same cell type. L1 sense and anti-sense elements were taken from RepeatMasker-predicted repetitive elements in the hg19 genome obtained from the UCSC Genome Browser [24]. For each gene in GENCODE v19, the transcript with the highest abundance in rRNA-depleted total RNA-seq in HepG2 (ENCODE accession ENCFF533XPJ, ENCFF321JIT) and K562 (ENCFF286GLL, ENCFF986DBN) was chosen as the representative transcript, and the set of expressed genes (10,247 in HepG2 and 9162 in K562 with TPM ≥ 1) were considered. Next, genes were separated into three classes: “≥ 1 bound L1(as)” genes with at least one antisense L1 element that overlapped a significant peak identified in eCLIP, “bgd with ≥ 1 L1(as)” genes with at least 1 antisense L1 element but did not have an element that overlapped with an eCLIP peak, or “Bgd” which contained all expressed genes. Significance was determined by the Kolmogorov-Smirnov test with no multiple hypothesis testing correction.

To compare reference versus divergent L1 elements, we defined “canonical” reads as those which mapped best (and were assigned) to sequences present in RepBase, whereas “divergent” reads mapped better to unique genomic loci than to the reference sequence.

Calculation of overall element coverage (Additional file 3: Fig. S4b) was based on the above set of 9162 reference transcripts in K562 expressed with TPM ≥ 1.

Meta-gene and meta-exon peak density maps

To generate meta-gene and meta-exon maps, for each gene in GENCODE v19, the transcript with the highest abundance in rRNA-depleted total RNA-seq in HepG2 (ENCODE accession ENCFF533XPJ, ENCFF321JIT) and K562 (ENCFF286GLL, ENCFF986DBN) was chosen as the representative transcript, and the set of expressed genes (10,247 in HepG2 and 9162 in K562 with TPM ≥ 1) were considered. Datasets with fewer than 100 mRNA-overlapping peaks were discarded, leaving 205 datasets. Next, each gene was split into 162 bins (13 for 5′UTR, 100 for CDS, 49 for 3′UTR), based on the median 5′UTR, CDS, and 3′UTR lengths of highly expressed (TPM ≥ 10) GENCODE v19 transcripts in K562 cells. For each eCLIP dataset, the average peak coverage for each bin was calculated for each gene and then averaged over all genes to generate the final meta-gene plot. To generate confidence intervals, bootstrapping was performed by randomly selecting (with replacement) the same number of transcripts and calculating the average position-level peak coverage as above, with the 5th and 95th percentiles (out of 100 permutations) shown. For further visualization and analysis, only 104 RBPs where the 5th percentile was at least 0.002 peaks per gene (~ 20 peaks in at least one bin) were considered. Normalized coverage was then calculated by setting the maximum position to one and minimum position to zero for each eCLIP dataset. Cross-position correlations were calculated using normalized coverage across all 104 RBPs at each position. Odds ratios and significance (determined by Fisher’s exact test or Yates’ chi-square test if observed and expected values were greater than five) utilized RBP annotations (Additional file 3) from [6), an additional normalization was performed by dividing each position by the maximum meta-exon value for that dataset, in order to scale the meta-exon profiles between 0 and 1.
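The meta-gene averaging, bootstrap interval, and min-max normalization described above can be sketched with the standard library alone. The function names, the fixed random seed, and the simple index-based percentile are assumptions for illustration; each gene is represented as a list of per-bin peak coverage values of equal length.

```python
import random


def mean_profile(gene_profiles):
    """Average per-bin coverage over all genes."""
    n = len(gene_profiles)
    return [sum(col) / n for col in zip(*gene_profiles)]


def bootstrap_interval(gene_profiles, n_resamples=100, seed=0):
    """5th and 95th percentile of the mean profile over resampled gene sets."""
    rng = random.Random(seed)
    n = len(gene_profiles)
    resampled = [
        mean_profile([gene_profiles[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    lower, upper = [], []
    for col in zip(*resampled):
        ordered = sorted(col)
        lower.append(ordered[int(0.05 * (n_resamples - 1))])
        upper.append(ordered[int(0.95 * (n_resamples - 1))])
    return lower, upper


def min_max_normalize(profile):
    """Rescale a profile so its minimum is 0 and its maximum is 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0.0] * len(profile)
    return [(v - lo) / (hi - lo) for v in profile]
```

Resampling whole genes rather than individual bins keeps the within-gene correlation between neighbouring bins intact.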

Analysis of AQR enrichment at branch points

To identify points of enriched read termination in AQR eCLIP, regions from − 50 nt to − 15 nt from annotated 3′ splice sites were obtained from GENCODE v19, and the subset of regions with at least 20 overlapping reads in AQR eCLIP in K562 cells were taken for further analysis. Points of enrichment were identified as those where more than half of reads overlapping the overall region terminated at the same position. Motif analysis was performed by counting the frequency of 11-mers centered on the read start position with 5 nt flanking on either side. Motif logos were generated with seqLogo (R).
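The majority-termination rule and the 11-mer extraction can be illustrated as below. This is a sketch under stated assumptions: `read_starts` is a hypothetical list of read start (termination) coordinates for reads already known to overlap one region, and `pos` indexes into a sequence string with 0-based coordinates.

```python
from collections import Counter


def enriched_termination(read_starts, min_reads=20):
    """Return the position where more than half of the reads terminate,
    or None if the region has too few reads or no majority position."""
    if len(read_starts) < min_reads:
        return None
    position, count = Counter(read_starts).most_common(1)[0]
    return position if count > len(read_starts) / 2 else None


def centered_kmer(sequence, pos, flank=5):
    """Extract the (2 * flank + 1)-mer centered on index `pos`,
    or None if the window runs off either end of the sequence."""
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]
```

Counting the k-mers returned by `centered_kmer` over all enriched positions (for example with another `Counter`) gives the frequencies used for a motif logo.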

Enrichment of branch point factors at alternative 3′ splice site events

Splicing maps profiling normalized enrichment for SF3B4 and SF3A3 at RBP knockdown-responsive alternative 3′ splice site events were generated as previously described [20, 76]. In brief, the set of differential 3′ splice site events for RBP-knockdown/RNA-seq was identified from rMATS analysis between RBP knockdown and paired non-target control. Normalized read density in eCLIP was then calculated for each differential event by subtracting input read density from IP read density (each normalized per million mapped reads). To weigh each event equally, position-wise subtracted read density was then normalized to sum to one across the entire event region (composed of 50 nt of exonic and 300 nt of flanking intron), including a pseudocount of one read (normalized by total mapped read density) at each position. The highest 2.5% and lowest 2.5% values at each position across all events were then removed, and the mean was then calculated across all other events to define the final splicing map. As a control, a set of “native” alternative 3′ splice site events was defined as those which showed alternative usage (0.05 < inclusion < 0.95) in control K562 or HepG2 cells, respectively. Confidence intervals were generated by randomly sampling the number of events in the RBP-responsive class from the native alternative 3′ splice site set 1000 times, processing this sampled set as described above, and plotting the 0.5th to 99.5th percentiles.
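The final aggregation step, trimming the highest and lowest 2.5% of values at each position and averaging the rest, can be sketched as below. Only this step is shown; the per-event subtraction, pseudocount, and sum-to-one normalization are assumed to have been applied already, and the function names are hypothetical.

```python
def trimmed_mean(values, trim_fraction=0.025):
    """Mean after dropping the lowest and highest `trim_fraction` of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def splicing_map(event_profiles, trim_fraction=0.025):
    """Position-wise trimmed mean across events.

    `event_profiles` is a list of equal-length, already normalized
    per-event density profiles.
    """
    return [trimmed_mean(col, trim_fraction) for col in zip(*event_profiles)]
```

With 40 events, one value is dropped from each tail at every position, so a single extreme event cannot dominate the averaged map.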

Co-occurrence of RBP eCLIP peaks and validation of subcomplexes of RBPs

Overlap between eCLIP datasets A and B was determined by calculating the fraction of significant and reproducible peaks in dataset A that overlapped (by at least one base) a peak in dataset B, and vice versa the fraction of peaks in B that overlapped a peak in A, and taking the maximum of those fractions as the overall pairwise fraction overlap. Only datasets with at least 100 reproducible and significant peaks were used for this analysis. Gene Set Enrichment Analysis was performed using the GSEA software package [77]. RBP interaction data was obtained from the BioPlex 2.0 dataset [52].
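The pairwise overlap metric can be sketched as follows. Peaks are represented as hypothetical `(chrom, strand, start, end)` tuples with half-open coordinates; requiring the same strand is an assumption, and the quadratic comparison is written for clarity rather than speed.

```python
def peaks_overlap(a, b):
    """True if two peaks on the same chromosome and strand share >= 1 base."""
    return a[0] == b[0] and a[1] == b[1] and a[2] < b[3] and b[2] < a[3]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B."""
    hits = sum(any(peaks_overlap(a, b) for b in peaks_b) for a in peaks_a)
    return hits / len(peaks_a)


def pairwise_overlap(peaks_a, peaks_b):
    """Maximum of the A-in-B and B-in-A overlap fractions."""
    return max(fraction_overlapping(peaks_a, peaks_b),
               fraction_overlapping(peaks_b, peaks_a))
```

Taking the maximum of the two directions keeps the metric from being diluted when one dataset has far more peaks than the other.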

IP-western validation was performed using HNRNPL (ab6106, Abcam), RBFOX2 (A300-864A, Bethyl), FMR1 (RN016P, Bethyl), AGGF1 (A303-634A, Bethyl), and TNRC6A (RN033P, MBLI) antibodies in UV crosslinked K562 cells. Immunoprecipitation in high-salt wash conditions was performed using standard eCLIP wash buffers, beads, and other reagents [18]. Low-salt co-immunoprecipitation conditions used identical conditions, except for lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% Triton X-100, 0.1% Sodium deoxycholate, and Protease Inhibitor cocktail (Promega)) and wash buffer (5 washes total in TBS + 0.05% NP-40). Westerns were probed with HNRNPL (ab6106, Abcam) primary antibody and TrueBlot secondary (Rockland).

Availability of data and materials

Raw and processed eCLIP data is available at the ENCODE Data Coordination Center (https://www.encodeproject.org) under accession ID ENCSR456FVU [20]. Accession identifiers for individual datasets used are provided in Supplementary Table 1.

References
Posner R, Toker IA, Antonova O, Star E, Anava S, Azmon E, Hendricks M, Bracha S, Gingold H, Rechavi O. Neuronal small RNAs control behavior transgenerationally. Cell. 2019;177:1814–26 e1815.
Article PubMed PubMed Central CAS Google Scholar
Morris KV, Mattick JS. The rise of regulatory RNA. Nat Rev Genet. 2014;15:423–37.
Article PubMed PubMed Central CAS Google Scholar
Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nat Rev Genet. 2014;15:829–45.
Article PubMed CAS Google Scholar
Ule J, Jensen KB, Ruggiu M, Mele A, Ule A, Darnell RB. CLIP identifies Nova-regulated RNA networks in the brain. Science. 2003;302:1212–5.
Article PubMed CAS Google Scholar
Martinez FJ, Pratt GA, Van Nostrand EL, Batra R, Huelga SC, Kapeli K, Freese P, Chun SJ, Ling K, Gelboin-Burkhart C, et al. Protein-RNA networks regulated by normal and ALS-associated mutant HNRNPA2B1 in the nervous system. Neuron. 2016;92:780–95.
Article PubMed PubMed Central CAS Google Scholar
Modic M, Ule J, Sibley CR. CLI** the brain: studies of protein-RNA interactions important for neurodegenerative disorders. Mol Cell Neurosci. 2013;56:429–35.
Article PubMed PubMed Central CAS Google Scholar
Ule J, Stefani G, Mele A, Ruggiu M, Wang X, Taneri B, Gaasterland T, Blencowe BJ, Darnell RB. An RNA map predicting Nova-dependent splicing regulation. Nature. 2006;444:580–6.
Article PubMed CAS Google Scholar
Sohrabi-Jahromi S, Hofmann KB, Boltendahl A, Roth C, Gressel S, Baejen C, Soeding J, Cramer P. Transcriptome maps of general eukaryotic RNA degradation factors. Elife. 2019;8. https://elifesciences.org/articles/47040, https://www.ncbi.nlm.nih.gov/pubmed/31135339.
Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010;141:129–41.
Article PubMed PubMed Central CAS Google Scholar
Huelga SC, Vu AQ, Arnold JD, Liang TY, Liu PP, Yan BY, Donohue JP, Shiue L, Hoon S, Brenner S, et al. Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Rep. 2012;1:167–78.
Article PubMed PubMed Central CAS Google Scholar
Yang YC, Di C, Hu B, Zhou M, Liu Y, Song N, Li Y, Umetsu J, Lu ZJ. CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genomics. 2015;16:51.
Article PubMed PubMed Central CAS Google Scholar
Hu B, Yang YT, Huang Y, Zhu Y, Lu ZJ. POSTAR: a platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins. Nucleic Acids Res. 2017;45:D104–14.
Article PubMed CAS Google Scholar
Marinov GK, Kundaje A, Park PJ, Wold BJ. Large-scale quality analysis of published ChIP-seq data. G3 (Bethesda). 2014;4:209–23.
Article Google Scholar
Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.
Article CAS Google Scholar
Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P, et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 2012;22:1813–31.
Article PubMed PubMed Central CAS Google Scholar
Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489:91–100.
Article PubMed PubMed Central CAS Google Scholar
Pazin MJ. Using the ENCODE resource for functional annotation of genetic variants. Cold Spring Harb Protoc. 2015;2015:522–36.
Article PubMed PubMed Central Google Scholar
Van Nostrand EL, Pratt GA, Shishkin AA, Gelboin-Burkhart C, Fang MY, Sundararaman B, Blue SM, Nguyen TB, Surka C, Elkins K, et al. Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods. 2016;13:508–14.
Article PubMed PubMed Central CAS Google Scholar
Sundararaman B, Zhan L, Blue SM, Stanton R, Elkins K, Olson S, Wei X, Van Nostrand EL, Pratt GA, Huelga SC, et al. Resources for the comprehensive discovery of functional RNA elements. Mol Cell. 2016;61:903–13.
Article PubMed PubMed Central CAS Google Scholar
Van Nostrand EL, Freese P, Pratt GA, Wang X, Wei X, **ao R, Blue SM, Dominguez D, Cody NAL, Olson S, Sundararaman B, et al. A large-scale binding and functional map of human RNA binding proteins. bioRxiv. 2017.
England TE, Uhlenbeck OC. 3′-terminal labelling of RNA with T4 RNA ligase. Nature. 1978;275:560–1.
Article PubMed CAS Google Scholar
Van Nostrand EL, Nguyen TB, Gelboin-Burkhart C, Wang R, Blue SM, Pratt GA, Louie AL, Yeo GW. Robust, cost-effective profiling of RNA binding protein targets with single-end enhanced crosslinking and immunoprecipitation (seCLIP). Methods Mol Biol. 1648;2017:177–200.
Google Scholar
McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Sekhon M, Wylie K, Mardis ER, Wilson RK, et al. A physical map of the human genome. Nature. 2001;409:934–41.
Article PubMed CAS Google Scholar
Smit AFA, Hubley R, Green, P. : RepeatMasker Open-3.0. 1996-2010.
Eichhorn CD, Yang Y, Repeta L, Feigon J. Structural basis for recognition of human 7SK long noncoding RNA by the La-related protein Larp7. Proc Natl Acad Sci U S A. 2018;115:E6457–66.
Article PubMed PubMed Central CAS Google Scholar
Farris AD, O'Brien CA, Harley JB. Y3 is the most conserved small RNA component of Ro ribonucleoprotein complexes in vertebrate species. Gene. 1995;154:193–8.
Article PubMed CAS Google Scholar
Mullineux ST, Lafontaine DL. Map** the cleavage sites on mammalian pre-rRNAs: where do we stand? Biochimie. 2012;94:1521–32.
Article PubMed CAS Google Scholar
Tafforeau L, Zorbas C, Langhendries JL, Mullineux ST, Stamatopoulou V, Mullier R, Wacheul L, Lafontaine DL. The complexity of human ribosome biogenesis revealed by systematic nucleolar screening of pre-rRNA processing factors. Mol Cell. 2013;51:539–51.
Article PubMed CAS Google Scholar
Srivastava L, Lapik YR, Wang M, Pestov DG. Mammalian DEAD box protein Ddx51 acts in 3′ end maturation of 28S rRNA by promoting the release of U8 snoRNA. Mol Cell Biol. 2010;30:2947–56.
Article PubMed PubMed Central CAS Google Scholar
Wandrey F, Montellese C, Koos K, Badertscher L, Bammert L, Cook AG, Zemp I, Horvath P, Kutay U. The NF45/NF90 heterodimer contributes to the biogenesis of 60S ribosomal subunits and influences nucleolar morphology. Mol Cell Biol. 2015;35:3491–503.
Article PubMed PubMed Central CAS Google Scholar
Piskounova E, Polytarchou C, Thornton JE, LaPierre RJ, Pothoulakis C, Hagan JP, Iliopoulos D, Gregory RI. Lin28A and Lin28B inhibit let-7 microRNA biogenesis by distinct mechanisms. Cell. 2011;147:1066–79.
Article PubMed PubMed Central CAS Google Scholar
Son DJ, Kumar S, Takabe W, Kim CW, Ni CW, Alberts-Grill N, Jang IH, Kim S, Kim W, Won Kang S, et al. The atypical mechanosensitive microRNA-712 derived from pre-ribosomal RNA induces endothelial inflammation and atherosclerosis. Nat Commun. 2013;4:3000.
Article PubMed CAS Google Scholar
Yoshikawa M, Fujii YR. Human ribosomal RNA-derived resident microRNAs as the transmitter of information upon the cytoplasmic cancer stress. Biomed Res Int. 2016;2016:7562085.
Article PubMed PubMed Central CAS Google Scholar
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
Taylor MS, LaCava J, Mita P, Molloy KR, Huang CR, Li D, Adney EM, Jiang H, Burns KH, Chait BT, et al. Affinity proteomics reveals human host factors implicated in discrete stages of LINE-1 retrotransposition. Cell. 2013;155:1034–48.
Zarnack K, Konig J, Tajnik M, Martincorena I, Eustermann S, Stevant I, Reyes A, Anders S, Luscombe NM, Ule J. Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Cell. 2013;152:453–66.
Attig J, Agostini F, Gooding C, Chakrabarti AM, Singh A, Haberman N, Zagalak JA, Emmett W, Smith CWJ, Luscombe NM, Ule J. Heteromeric RNP assembly at LINEs controls lineage-specific RNA processing. Cell. 2018;174:1067–81 e1017.
Liu N, Lee CH, Swigut T, Grow E, Gu B, Bassik MC, Wysocka J. Selective silencing of euchromatic L1s revealed by genome-wide screens for L1 regulators. Nature. 2018;553:228–32.
Quinones-Valdez G, Tran SS, Jun HI, Bahn JH, Yang EW, Zhan L, Brummer A, Wei X, Van Nostrand EL, Pratt GA, et al. Regulation of RNA editing by RNA-binding proteins in human cells. Commun Biol. 2019;2:19.
Tajaddod M, Tanzer A, Licht K, Wolfinger MT, Badelt S, Huber F, Pusch O, Schopoff S, Janisiw M, Hofacker I, Jantsch MF. Transcriptome-wide effects of inverted SINEs on gene expression and their impact on RNA polymerase II activity. Genome Biol. 2016;17:220.
Damianov A, Ying Y, Lin CH, Lee JA, Tran D, Vashisht AA, Bahrami-Samani E, Xing Y, Martin KC, Wohlschlegel JA, Black DL. Rbfox proteins regulate splicing as part of a large multiprotein complex LASR. Cell. 2016;165:606–19.
Kargapolova Y, Levin M, Lackner K, Danckwardt S. sCLIP-an integrated platform to study RNA-protein interactomes in biomedical research: identification of CSTF2tau in alternative processing of small nuclear RNAs. Nucleic Acids Res. 2017;45:6074–86.
Turunen JJ, Niemela EH, Verma B, Frilander MJ. The significant other: splicing by the minor spliceosome. Wiley Interdiscip Rev RNA. 2013;4:61–76.
Champion-Arnaud P, Reed R. The prespliceosome components SAP 49 and SAP 145 interact in a complex implicated in tethering U2 snRNP to the branch site. Genes Dev. 1994;8:1974–83.
Jiang D, Zou X, Zhang C, Chen J, Li Z, Wang Y, Deng Z, Wang L, Chen S. Gemin5 plays a role in unassembled-U1 snRNA disposal in SMN-deficient cells. FEBS Lett. 2018;592:1400–11.
Reber S, Stettler J, Filosa G, Colombo M, Jutzi D, Lenzken SC, Schweingruber C, Bruggmann R, Bachi A, Barabino SM, et al. Minor intron splicing is regulated by FUS and affected by ALS-associated FUS mutants. EMBO J. 2016;35:1504–21.
Mercer TR, Clark MB, Andersen SB, Brunck ME, Haerty W, Crawford J, Taft RJ, Nielsen LK, Dinger ME, Mattick JS. Genome-wide discovery of human splicing branchpoints. Genome Res. 2015;25:290–303.
Briese M, Haberman N, Sibley CR, Faraway R, Elser AS, Chakrabarti AM, Wang Z, Konig J, Perera D, Wickramasinghe VO, et al. A systems view of spliceosomal assembly and branchpoints with iCLIP. Nat Struct Mol Biol. 2019;26:930–40.
De I, Bessonov S, Hofele R, dos Santos K, Will CL, Urlaub H, Luhrmann R, Pena V. The RNA helicase Aquarius exhibits structural adaptations mediating its recruitment to spliceosomes. Nat Struct Mol Biol. 2015;22:138–44.
Smith CW, Chu TT, Nadal-Ginard B. Scanning and competition between AGs are involved in 3′ splice site selection in mammalian introns. Mol Cell Biol. 1993;13:4939–52.
Bradley RK, Merkin J, Lambert NJ, Burge CB. Alternative splicing of RNA triplets is often regulated and accelerates proteome evolution. PLoS Biol. 2012;10:e1001229.
Huttlin EL, Bruckner RJ, Paulo JA, Cannon JR, Ting L, Baltier K, Colby G, Gebreab F, Gygi MP, Parzen H, et al. Architecture of the human interactome defines protein communities and disease networks. Nature. 2017;545:505–9.
Yang EW, Bahn JH, Hsiao EY, Tan BX, Sun Y, Fu T, Zhou B, Van Nostrand EL, Pratt GA, Freese P, et al. Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA. Nat Commun. 2019;10:1338.
Bahrami-Samani E, Xing Y. Discovery of allele-specific protein-RNA interactions in human transcriptomes. Am J Hum Genet. 2019;104:492–502.
Nussbacher JK, Yeo GW. Systematic discovery of RNA binding proteins that regulate microRNA levels. Mol Cell. 2018;69:1005–16 e1007.
Choi SW, Nam JW. TERIUS: accurate prediction of lncRNA via high-throughput sequencing data representing RNA-binding protein association. BMC Bioinformatics. 2018;19:41.
Francisco-Velilla R, Fernandez-Chamorro J, Ramajo J, Martinez-Salas E. The RNA-binding protein Gemin5 binds directly to the ribosome and regulates global translation. Nucleic Acids Res. 2016;44:8335–51.
Shi M, Zhang H, Wu X, He Z, Wang L, Yin S, Tian B, Li G, Cheng H. ALYREF mainly binds to the 5′ and the 3′ regions of the mRNA in vivo. Nucleic Acids Res. 2017;45:9640–53.
Folco EG, Lee CS, Dufu K, Yamazaki T, Reed R. The proteins PDIP3 and ZC11A associate with the human TREX complex in an ATP-dependent manner and function in mRNA export. PLoS One. 2012;7:e43804.
Nojima T, Gomes T, Grosso ARF, Kimura H, Dye MJ, Dhir S, Carmo-Fonseca M, Proudfoot NJ. Mammalian NET-Seq reveals genome-wide nascent transcription coupled to RNA processing. Cell. 2015;161:526–40.
Davidson L, Kerr A, West S. Co-transcriptional degradation of aberrant pre-mRNA by Xrn2. EMBO J. 2012;31:2566–78.
Emili A, Shales M, McCracken S, Xie W, Tucker PW, Kobayashi R, Blencowe BJ, Ingles CJ. Splicing and transcription-associated proteins PSF and p54nrb/nonO bind to the RNA polymerase II CTD. RNA. 2002;8:1102–11.
Lagier-Tourenne C, Polymenidou M, Hutt KR, Vu AQ, Baughn M, Huelga SC, Clutario KM, Ling SC, Liang TY, Mazur C, et al. Divergent roles of ALS-linked proteins FUS/TLS and TDP-43 intersect in processing long pre-mRNAs. Nat Neurosci. 2012;15:1488–97.
Scott DD, Trahan C, Zindy PJ, Aguilar LC, Delubac MY, Van Nostrand EL, Adivarahan S, Wei KE, Yeo GW, Zenklusen D, Oeffinger M. Nol12 is a multifunctional RNA binding protein at the nexus of RNA and DNA metabolism. Nucleic Acids Res. 2017;45:12509–28.
Kaiser RWJ, Ignarski M, Van Nostrand EL, Frese CK, Jain M, Cukoski S, Heinen H, Schaechter M, Seufert L, Bunte K, et al. A protein-RNA interaction atlas of the ribosome biogenesis factor AATF. Sci Rep. 2019;9:11071.
Liang C, Xiong K, Szulwach KE, Zhang Y, Wang Z, Peng J, Fu M, Jin P, Suzuki HI, Liu Q. Sjogren syndrome antigen B (SSB)/La promotes global microRNA expression by binding microRNA precursors through stem-loop recognition. J Biol Chem. 2013;288:723–36.
Gottlieb E, Steitz JA. Function of the mammalian La protein: evidence for its action in transcription termination by RNA polymerase III. EMBO J. 1989;8:851–61.
Nie Y, Ding L, Kao PN, Braun R, Yang JH. ADAR1 interacts with NF90 through double-stranded RNA and regulates NF90-mediated gene expression independently of RNA editing. Mol Cell Biol. 2005;25:6956–63.
Bahn JH, Ahn J, Lin X, Zhang Q, Lee JH, Civelek M, Xiao X. Genomic analysis of ADAR1 binding and its involvement in multiple RNA processing pathways. Nat Commun. 2015;6:6355.
Han JS, Szak ST, Boeke JD. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature. 2004;429:268–74.
Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324:218–23.
Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM. A census of human transcription factors: function, expression and evolution. Nat Rev Genet. 2009;10:252–63.
Konig J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J. iCLIP--transcriptome-wide mapping of protein-RNA interactions with individual nucleotide resolution. J Vis Exp. 2011;(50). https://doi.org/10.3791/2638.
Chan PP, Lowe TM. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Res. 2016;44:D184–9.
Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11.
Yee BA, Pratt GA, Graveley BR, Van Nostrand EL, Yeo GW. RBP-Maps enables robust generation of splicing regulatory maps. RNA. 2019;25:193–204.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102:15545–50.
Acknowledgements
We thank members of the Yeo lab, as well as Christopher Burge, Eric Lécuyer, and Stefan Aigner, and members of the Graveley, Burge, Lécuyer, and Fu labs for helpful comments and suggestions during the development of this work.

Peer review information

Kevin Pang was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the team.

Review history

The review history is available as Additional file 6.

Funding
This work was funded by the National Human Genome Research Institute ENCODE Project, contract U54HG007005 to BRG (principal investigator) and GWY (co-principal investigator), and U41HG009889 to BRG (PI) and GWY (PI). GWY and X-DF were supported by R01 HG004659. ELVN is a Merck Fellow of the Damon Runyon Cancer Research Foundation (DRG-2172-13) and is supported by the NHGRI (K99 HG009530).
Author information
Authors and Affiliations
Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
Eric L. Van Nostrand, Gabriel A. Pratt, Brian A. Yee, Emily C. Wheeler, Steven M. Blue, Jasmine Mueller, Samuel S. Park, Keri E. Garcia, Chelsea Gelboin-Burkhart, Thai B. Nguyen, Ines Rabano, Rebecca Stanton, Balaji Sundararaman, Ruth Wang, Xiang-Dong Fu & Gene W. Yeo
Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
Eric L. Van Nostrand, Gabriel A. Pratt, Brian A. Yee, Emily C. Wheeler, Steven M. Blue, Jasmine Mueller, Samuel S. Park, Keri E. Garcia, Chelsea Gelboin-Burkhart, Thai B. Nguyen, Ines Rabano, Rebecca Stanton, Balaji Sundararaman, Ruth Wang, Xiang-Dong Fu & Gene W. Yeo
Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA
Brenton R. Graveley
Contributions
ELVN, SMB, JM, SP, KEG, CGB, TBN, IR, RS, BS, and RW generated the eCLIP and RNP visualization data. ELVN, GAP, and BAY performed the data analysis and software development. ELVN, XDF, BRG, and GWY wrote the paper and led the data generation and analysis. The authors read and approved the final manuscript.
Corresponding authors
Correspondence to Brenton R. Graveley or Gene W. Yeo.
Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Competing interests

ELVN is co-founder, member of the Board of Directors, on the SAB, equity holder, and paid consultant for Eclipse BioInnovations. GWY is co-founder, member of the Board of Directors, on the SAB, equity holder, and paid consultant for Locana and Eclipse BioInnovations. GWY is a visiting professor at the National University of Singapore. ELVN's and GWY's interests have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies. The other authors declare that they have no competing interests.

Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1: Table S1.
Accession identifiers for eCLIP datasets used in the manuscript.
Additional file 2: Table S2.
RNA binding protein function annotations, localization patterns, and predicted RNA binding domains.
Additional file 3: Supplementary Figures S1-S8.
Additional file 4: Table S3.
List of multi-copy element annotations used in family-aware mapping.
Additional file 5: Table S4.
Quantitation of multi-copy RNA family enrichment for 223 eCLIP datasets.
Additional file 6:
Review history.
Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article
Cite this article
Van Nostrand, E.L., Pratt, G.A., Yee, B.A. et al. Principles of RNA processing from analysis of enhanced CLIP maps for 150 RNA binding proteins. Genome Biol 21, 90 (2020). https://doi.org/10.1186/s13059-020-01982-9
Received: 14 October 2019
Accepted: 03 March 2020
Published: 06 April 2020
DOI: https://doi.org/10.1186/s13059-020-01982-9
Keywords
eCLIP
CLIP-seq
RNA binding protein
RNA processing

*Family-aware mapping to multicopy elements*