Abstract
Effective interpretation of genome function and genetic variation requires a shift from epigenetic mapping of cis-regulatory elements (CREs) to characterization of endogenous function. We developed hybridization chain reaction fluorescence in situ hybridization coupled with flow cytometry (HCR–FlowFISH), a broadly applicable approach to characterize CRISPR-perturbed CREs via accurate quantification of native transcripts, alongside CRISPR activity screen analysis (CASA), a hierarchical Bayesian model to quantify CRE activity. Across >325,000 perturbations, we provide evidence that CREs can regulate multiple genes, skip over the nearest gene and display activating and/or silencing effects. At the cholesterol-level-associated FADS locus, we combine endogenous screens with reporter assays to exhaustively characterize multiple genome-wide association signals, functionally nominate causal variants and, importantly, identify their target genes.
Data availability
All raw CRISPRi screening data, MPRA data and processed files have been uploaded to the ENCODE portal with accession no. ENCSR455UGU. Track hubs are available for each locus screened at the following links: https://genome.ucsc.edu/s/skr2/GATA_HCR; https://genome.ucsc.edu/s/skr2/CD164_HCR; https://genome.ucsc.edu/s/skr2/ERP29_HCR; https://genome.ucsc.edu/s/skr2/LMO2_HCR; https://genome.ucsc.edu/s/skr2/NMU_HCR; https://genome.ucsc.edu/s/skr2/MEF2C_HCR; https://genome.ucsc.edu/s/skr2/FADS_HCR; and https://genome.ucsc.edu/s/skr2/MYC_HCR. DNase hypersensitivity and histone modification data were collected from ENCODE (https://www.encodeproject.org). Topologically associated domains were collected from the TADKB (http://dna.cs.miami.edu/TADKB/). Genome-wide association study data were collected from the UKBB and the Global Lipids Genetics Consortium (https://biobank.ndph.ox.ac.uk/showcase/ and http://lipidgenetics.org, respectively). Fine-mapping data are available at the Finucane Lab (https://www.finucanelab.org/data).
Code availability
The CASA software is available at https://github.com/sjgosai/CASA. The Python software is managed using Miniconda, which is available at https://repo.continuum.io/miniconda/. The Bowtie software is available at https://bioconda.github.io. The GuideScan software is available at https://bioconda.github.io. FlowJo is available at https://www.flowjo.com/solutions/flowjo/ (v.10.7 was used). CellProfiler is available at https://cellprofiler.org/releases. The SAIGE software is available at https://github.com/weizhouUMICH/SAIGE. The BOLT-LMM software is available at https://alkesgroup.broadinstitute.org/BOLT-LMM/BOLT-LMM_manual.html. The FINEMAP software is available at http://www.christianbenner.com (v.1.3.1 was used). The susieR software is available at https://github.com/stephenslab/susieR (v.0.8.1.0521 was used).
Change history
08 September 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41588-021-00943-7
References
Davis, C. A. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414.e24 (2016).
Sanyal, A., Lajoie, B. R., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature 489, 109–113 (2012).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).
Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).
Vockley, C. M. et al. Massively parallel quantification of the regulatory effects of noncoding genetic variation in a human cohort. Genome Res. 25, 1206–1214 (2015).
Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165, 1519–1529 (2016).
Ray, J. P. et al. Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features. Nat. Commun. 11, 1237 (2020).
Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451 (2013).
Canver, M. C. et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature 527, 192–197 (2015).
Sanjana, N. E. et al. High-resolution interrogation of functional elements in the noncoding genome. Science 353, 1545–1549 (2016).
Korkmaz, G. et al. Functional genetic screens for enhancer elements in the human genome using CRISPR–Cas9. Nat. Biotechnol. 34, 192–198 (2016).
Fulco, C. P. et al. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Rajagopal, N. et al. High-throughput mapping of regulatory DNA. Nat. Biotechnol. 34, 167–174 (2016).
Diao, Y. et al. A tiling-deletion-based genetic screen for cis-regulatory element identification in mammalian cells. Nat. Methods 14, 629–635 (2017).
Leonetti, M. D., Sekine, S., Kamiyama, D., Weissman, J. S. & Huang, B. A scalable strategy for high-throughput GFP tagging of endogenous human proteins. Proc. Natl Acad. Sci. USA 113, E3501–E3508 (2016).
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390.e19 (2019).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Choi, H. M. T. et al. Third-generation in situ hybridization chain reaction: multiplexed, quantitative, sensitive, versatile, robust. Development 145, dev165753 (2018).
Lachmann, A. et al. Massive mining of publicly available RNA-seq data from human and mouse. Nat. Commun. 9, 1366 (2018).
Cho, S. W. et al. Promoter of lncRNA gene PVT1 is a tumor-suppressor DNA boundary element. Cell 173, 1398–1412.e22 (2018).
Perez, A. R. et al. GuideScan software for improved single and paired CRISPR guide RNA design. Nat. Biotechnol. 35, 347–349 (2017).
Bhattacharya, A., Chen, C.-Y., Ho, S. & Mitchell, J. A. Upstream distal regulatory elements contact the Lmo2 promoter in mouse erythroid cells. PLoS ONE 7, e52880 (2012).
Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).
Landry, J.-R. et al. Expression of the leukemia oncogene Lmo2 is controlled by an array of tissue-specific elements dispersed over 100 kb and bound by Tal1/Lmo2, Ets, and Gata factors. Blood 113, 5783–5792 (2009).
Oram, S. H. et al. A previously unrecognized promoter of LMO2 forms part of a transcriptional regulatory circuit mediating LMO2 expression in a subset of T-acute lymphoblastic leukaemia patients. Oncogene 29, 5796–5808 (2010).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Gilbert, L. A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
Tycko, J. et al. Mitigation of off-target toxicity in CRISPR–Cas9 screens for essential non-coding elements. Nat. Commun. 10, 4063 (2019).
Ye, K., Gao, F., Wang, D., Bar-Yosef, O. & Keinan, A. Dietary adaptation of FADS genes in Europe varied across time and geography. Nat. Ecol. Evol. 1, 167 (2017).
Mychaleckyj, J. C. et al. Multiplex genomewide association analysis of breast milk fatty acid composition extends the phenotypic association and potential selection of FADS1 variants to arachidonic acid, a critical infant micronutrient. J. Med. Genet. 55, 459–468 (2018).
Mathieson, I. et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528, 499–503 (2015).
Fenton, J. I., Gurzell, E. A., Davidson, E. A. & Harris, W. S. Red blood cell PUFAs reflect the phospholipid PUFA composition of major organs. Prostaglandins Leukot. Essent. Fatty Acids 112, 12–23 (2016).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype–phenotype associations. Bioinformatics 35, 4851–4853 (2019).
Tukiainen, T. et al. Detailed metabolic and genetic characterization reveals new associations for 30 known lipid loci. Hum. Mol. Genet. 21, 1444–1455 (2012).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Willer, C. J. et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
Morgens, D. W. et al. Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens. Nat. Commun. 8, 15178 (2017).
Wang, T., Lander, E. S. & Sabatini, D. M. Viral packaging and cell culture for CRISPR-based screens. Cold Spring Harb. Protoc. 2016, pdb.prot090811 (2016).
Kruschke, J. K. Rejecting or accepting parameter values in Bayesian estimation. Adv. Methods Pract. Psychol. Sci. 1, 270–280 (2018).
Salvatier, J., Wiecki, T. V. & Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55 (2016).
Hoffman, M. D. & Gelman, A. The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593–1623 (2014).
Wang, J. et al. Nascent RNA sequencing analysis provides insights into enhancer-mediated gene regulation. BMC Genomics 19, 633 (2018).
Wang, J. et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 41, D171–D176 (2013).
Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP–Seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Benner, C., Havulinna, A. S., Salomaa, V., Ripatti, S. & Pirinen, M. Refining fine-mapping: effect sizes and regional heritability. Preprint at bioRxiv https://doi.org/10.1101/318618 (2018).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).
Liu, T. et al. TADKB: family classification and a knowledge base of topologically associating domains. BMC Genomics 20, 217 (2019).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Acknowledgements
We thank C. Fulco, A. Lin, C. Myhrvold, H. Metsky, B. Petros, J. Ray, S. Schaffner and J. Xue for editing and conversations about the manuscript. We thank C. Otis, N. Pirete and P. Rodgers at the Broad Flow Cytometry Core for cytometry and sorting assistance. We thank the Broad Imaging Platform for custom scripting and assistance in image analysis. We thank J. Ray and M. Bakalar in the Hacohen Lab for sorting and microscopy assistance. We thank C. Fulco, J. Engreitz and E. Lander for discussion on PrimeFlow and CRISPR screens. This work and S.K.R., S.J.G., A.G., A.M.-S., S.K., D.B. and R.T. were supported by the ENCODE Functional Characterization Center (grant no. UM1HG009435), a Broad SPARC grant and the Howard Hughes Medical Institute. S.K.R. is partially supported by grant nos. K99HG010669 and F32HG00922. R.T. is supported by grant nos. R00HG008179 and R01AI151051. S.J.G. was partially supported by grant no. 4T32GM007226-41.
Author information
Contributions
S.K.R., S.J.G. and R.T. designed the experiments. S.K.R., A.G., A.M.-S., K.M., G.M.B., A.G.-Y., D.B., S.K., R.M.B., M.L.S. and R.T. performed the experiments. S.K.R., S.J.G., A.M.-S. and R.T. designed and performed the data analysis. M.K., J.C.U. and H.K.F. performed the fine-mapping analyses. S.K.R., S.J.G., A.G., A.M.-S., H.K.F., P.C.S. and R.T. contributed to the writing of the manuscript and the interpretation of the data.
Ethics declarations
Competing interests
P.C.S. is a cofounder of and consultant to Sherlock Biosciences and board member of the Danaher Corporation. The other authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Ran Elkon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 CRISPRi induction, sorting schema, and construction of CASA (CRISPR Activity Screen Analysis), a generative model of CRE activity.
a, Induction of CRISPRi linked to BFP via doxycycline shows robust activation. b, Example sorting strategy showing detection of a target transcript (GATA1) amplified with Alexa-647 conjugated hairpins, and a housekeeping transcript (TBP) amplified with Alexa-488 conjugated hairpins. The top and bottom 10% of 647:488 normalized ratio are differentially sorted. c, The generative process underlying CASA (CRISPR Activity Screen Analysis) described as a plate model, explicit statistical parameterization, and variable definitions. Shaded and unshaded circles indicate observed and latent variables, respectively. The variable W corresponds to the set of windows tested, while each Nw arises from the set of sgRNAs considered at the wth window.
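The gating described in panel b can be illustrated with a minimal sketch. It assumes per-cell fluorescence intensities are already available as two lists (target channel and housekeeping channel) and that all housekeeping intensities are positive. The function names and the linear-interpolation percentile are illustrative assumptions, not the sorter's or the authors' implementation.

```python
from typing import List, Tuple


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    if not sorted_vals:
        raise ValueError("no values supplied")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def sort_gates(
    target: List[float], control: List[float], frac: float = 0.10
) -> Tuple[List[int], List[int]]:
    """Return indices of cells in the bottom and top `frac` of the
    target:control ratio (hypothetical helper; assumes control > 0)."""
    ratios = [t / c for t, c in zip(target, control)]
    ordered = sorted(ratios)
    low_cut = percentile(ordered, 100 * frac)
    high_cut = percentile(ordered, 100 * (1 - frac))
    low = [i for i, r in enumerate(ratios) if r <= low_cut]
    high = [i for i, r in enumerate(ratios) if r >= high_cut]
    return low, high


# Toy example: 20 cells with a constant housekeeping signal.
low_bin, high_bin = sort_gates([float(i) for i in range(1, 21)], [2.0] * 20)
```

Normalizing each cell by the housekeeping channel before gating means that cells are binned by relative target abundance rather than by overall staining intensity.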
Extended Data Fig. 2 HCR-FlowFISH screens display high similarity and increased sensitivity compared to growth screens at the GATA1 locus.
a, Overlap of the GATA1 guide library used in this study and the Fulco et al. (ref. 17) library. b, High correlation (Pearson r = 0.84, two-sided t-test P = 3.4 × 10^−106) between individual guide scores for detected sgRNAs shared in the GATA1 HCR-FlowFISH screen and the Fulco et al. (ref. 17) growth screen (black line is the ordinary least squares regression best fit, gray shaded band is the 95% confidence interval). c, Guide-wise score comparison for all sgRNAs shared between growth and HCR-FlowFISH screens, showing that gRNA read depth drives correlation more than off-target effects (cutting specificity). d, Individual gRNA guide scores plotted at the GATA1 promoter locus display opposite-direction CREs for GATA1 and HDAC6. e, Comparison of individual guide scores for guides shared between the HCR-FlowFISH and Fulco et al. (ref. 17) growth screens. The distributions of scores within CREs are more distinctly separated from those outside CREs when using HCR-FlowFISH. The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively. n = 906 (grey boxes) and n = 313 (green boxes) shared guides analyzed outside and inside CRE boundaries, respectively.
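The box and whisker convention stated in panel e can be made concrete with a short sketch. It assumes a plain list of guide scores and a linear-interpolation percentile; the helper names are hypothetical and this is not the plotting code used for the figure.

```python
from typing import Dict, List


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    if not sorted_vals:
        raise ValueError("no values supplied")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def box_stats(values: List[float]) -> Dict[str, object]:
    """Box edges at the 25th/50th/75th percentiles; whiskers reach the most
    extreme points within 1.5 x IQR of the box; the rest are outliers."""
    vals = sorted(values)
    q1, med, q3 = (percentile(vals, q) for q in (25, 50, 75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in vals if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in vals if v < lo_fence or v > hi_fence],
    }


# Toy example: one extreme score falls outside the upper fence.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In the toy example the value 100 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 8.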
Extended Data Fig. 3 HCR-FlowFISH and CASA enhance selectivity of CRISPRi screens at the GATA1 locus.
a, HCR-FlowFISH and PrimeFlow-CRISPRi individual guide score comparison for shared guides. Guides are grouped by overlap with CASA-nominated CREs. We find using HCR-FlowFISH improves separability between guide scores inside and outside of designated CREs compared to PrimeFlow. We also note guide score variability is reduced in HCR-FlowFISH. The minima, centers, and maxima of the boxes indicate the 25th, 50th, and 75th percentiles of the data distributions. Whiskers capture all remaining data, excluding outliers extending beyond 1.5 times the interquartile range below or above the 25th or 75th percentiles, respectively. n = 2,897 (grey boxes) and n = 88 (green boxes) shared guides analyzed outside and inside CRE boundaries, respectively. b, CASA CRE identification on simplified ABC data and comparison to HCR data. CASA only considers the highest and lowest expression bins from the first PCR replicate of each CRISPRi-FlowFISH screen replicate, yet distinguishes CREs from non-specific scores induced by perturbing the GATA1 gene body, in contrast to the original analysis.
Extended Data Fig. 4 HCR-FlowFISH and CASA identify CREs for multiple loci.
a,b, Connectogram diagrams showing K562 DHS (light blue), K562 H3K27ac (dark blue), guide coverage (black), HCR-FlowFISH composite guide score tracks, and CASA CRE calls for MYC (teal), PVT1 (salmon), LMO2 (orange), CAPRIN1 (navy), and CAT (lilac). CASA-derived CRE activity scores are shown as lines connecting the CRE to the target gene, and colored by effect on transcript abundance (black decreases abundance, red increases abundance). In a, ‘Pro’ and ‘e1-4’ denote the promoter and enhancers identified at this locus in Fulco et al. (ref. 17). In b, ‘P’, ‘I’ and ‘D’ denote the proximal, intermediate and distal promoters of LMO2, respectively. c, Relative mRNA expression compared to unperturbed cells for CRISPRi perturbations of distal, intermediate + distal, and proximal + distal promoters. Three technical replicates are shown; bars represent standard deviation.
Extended Data Fig. 5 HCR-FlowFISH and CASA reveal complex CRE sharing at the FADS locus.
a,b, Individual guide scores (points) and CASA CRE calls (bars) of HCR-FlowFISH screens for FADS1 (green), FADS2 (teal), and FADS3 (orange). K562 DHS (light blue) and H3K27ac (dark blue) peaks are also shown. Notably, these elements are shared between all three FADS genes. Surprisingly, perturbing the CRE in a results in a modest, but detectable, increase in FADS3 transcripts, in contrast to the decreases in FADS1 and FADS2 transcript abundance.
Extended Data Fig. 6 Functional characterization nominates rs174466 as a FADS3 CRE-activity altering SNP.
a, Genomic region surrounding the FADS3 promoter, highlighting tiling MPRA signal (red) and HCR-FlowFISH composite score for FADS3 (orange). rs174466 is denoted, along with all variants in linkage disequilibrium (r² ≥ 0.2). Variants within an HCR-FlowFISH identified FADS3 CRE are labeled in orange, and variants displaying allelic skew from MPRA are denoted with a red outline. SP2 ChIP-seq signal overlapping rs174466 is included in grey. b, GWAS trait associations with rs174466 show multiple overlaps with metabolic targets of FADS3. c, MPRA activity for the reference and alternate versions of rs174466 shows increased CRE activity on the alternate allele. d, Motif for SP2, highlighting that the change to the alternate allele better matches the canonical motif.
Supplementary information
Supplementary Information
Supplementary Figs. 1–7
Supplementary Tables
Supplementary Tables 1–3
Supplementary Data 1
Sequencing primers, qPCR primers, MPRA primers and guides.
Supplementary Data 2
CRISPRi HCR–FlowFISH screen results.
Supplementary Data 3
MPRA results.
Supplementary Data 4
FADS locus SNPs, GWAS and fine-mapping results.
About this article
Cite this article
Reilly, S.K., Gosai, S.J., Gutierrez, A. et al. Direct characterization of cis-regulatory elements and functional dissection of complex genetic associations using HCR–FlowFISH. Nat Genet 53, 1166–1176 (2021). https://doi.org/10.1038/s41588-021-00900-4
DOI: https://doi.org/10.1038/s41588-021-00900-4
- Springer Nature America, Inc.