Background

The SWI/SNF complex remodels chromatin through ATP-dependent DNA sliding, H2A/H2B dimer eviction, and nucleosome ejection functions [1,2,3]. SWI/SNF remodeling activities open chromatin and promote accessibility for other DNA-binding factors and chromatin regulators [4, 5]. SWI/SNF complex composition is heterogeneous and cell type dependent [6]. SWI/SNF regulates lineage-specific enhancer activity through multiple mechanisms [7]. Protein subunit architecture contributes to SWI/SNF complex specificity through specialized cofactor interactions. The activities of chromatin remodelers and associated machinery are known to modulate the epigenome by regulating histone post-translational modifications and nucleosome composition [5]. Multiple chromatin remodeler complexes are often observed at the same genomic loci and can perform redundant, cooperative, or antagonistic transcriptional regulatory roles [8].

Subunits within the mammalian SWI/SNF (BAF) chromatin remodeler complex are mutated across an estimated 20% of all human cancers [9]. Tissue-specific propensities for mutations in certain SWI/SNF subunits are also evident [10]. ARID1A (BAF250A) is the most frequently mutated SWI/SNF subunit [11]. ARID1A is the largest SWI/SNF subunit and acts as a structural scaffold for other subunits in certain SWI/SNF complexes [12, 13]. ARID1A also exhibits essential DNA-binding activity albeit in a non-sequence-specific manner [14, 15]. Defects in chromatin accessibility and higher-order chromatin structure are thought to underlie ARID1A and SWI/SNF mutant pathogenesis at least partially [16, 17]. Uterine endometrial cancer displays high rates of ARID1A mutation, with roughly 40% of cases showing loss of ARID1A expression [18, 19]. ARID1A mutations and loss of expression are also observed in deeply invasive forms of endometriosis, which is characterized by ectopic spread of the endometrium [20,21,22]. ARID1A mutations are also common in endometriosis-associated ovarian cancers [23, 24].

In the endometrial epithelium, we have previously shown that ARID1A normally promotes epithelial identity by repressing the expression of mesenchymal and invasion genes through promoter-proximal and distal chromatin interactions that affect transcriptional activity [25,26,27]. In particular, we have found that ARID1A-dependent repression of super-enhancer activity plays critical roles in the maintenance of epithelial identity in the endometrium [25,26,27]. Other reports have demonstrated that ARID1A and SWI/SNF can function as repressors, often through interactions with repressive machinery [15, 28,29,30]. Although nucleosome structure and histone post-translational modifications are suspected mechanisms, it remains poorly understood how SWI/SNF governs the epigenome.

Here, we reveal a mechanism by which ARID1A maintains histone variant H3.3 in active chromatin. This regulation is required for binding of the SWI/SNF-like CHD4 (NuRD) remodeler complex and is linked to the CHD4-interacting multivalent histone reader ZMYND8, notably at a subset of super-enhancers. Finally, we show that this mechanism of ARID1A, H3.3, CHD4, and ZMYND8 co-repression targets physiologically relevant genes involved in epithelial-to-mesenchymal transition (EMT) and cellular invasion, and that these genes are aberrantly upregulated in human endometriomas. Altogether, our studies reveal a role for ARID1A-containing SWI/SNF complexes in the maintenance of H3.3 and show that, at a subset of physiologically relevant target genes, H3.3, ARID1A, CHD4, and ZMYND8 are required for transcriptional repression.

Results

ARID1A regulates H3.3-associated active chromatin

Our previous studies have demonstrated that ARID1A promotes epithelial characteristics in immortalized 12Z human endometriotic epithelial cells at both the transcriptional and phenotypic levels, such that ARID1A loss leads to epithelial-to-mesenchymal transition (EMT) and enhanced migration and invasion [25, 27]. ARID1A loss in 12Z recapitulates many of the molecular and cellular features observed in ARID1A-deficient endometrial epithelia in vivo [25, 27]. Altogether, 12Z cells represent a model system to explore physiological roles for ARID1A in epigenomic regulation.

Histone H3.3 is a variant of canonical H3 with known ties to active chromatin and transcriptional regulation [31, 32]. Like ARID1A, H3.3 has also been observed to mark and regulate active enhancers [33,34,35,36]. We investigated the relationship between ARID1A binding and H3.3 in 12Z cells. To measure genome-wide H3.3 localization, we performed H3.3 chromatin immunoprecipitation followed by sequencing (ChIP-seq) in control 12Z cells (n = 2 IP replicates). Significant H3.3 enrichment was observed at 40,006 genomic regions (Fig. 1A). Intronic, intergenic, and promoter-TSS regions comprised the vast majority of H3.3 enrichment sites (Fig. 1A). H3.3 ChIP-seq peaks had a median width of 1830 bp and ranged from <500 bp to >10 kilobases (Fig. 1B). Intersecting H3.3 ChIP-seq peaks with our previously published ARID1A ChIP-seq data from these cells [25] revealed that over half of each peak set overlapped (Fig. 1C), a 16.9-fold over-representation genome-wide (Fig. 1D).
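The per-bp observed/expected overlap reported in Fig. 1D can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the single-chromosome half-open interval representation, and the independence-based expectation (covered fraction of set A times covered fraction of set B times genome size) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: peaks are half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Total number of bases covered by a peak set."""
    return sum(end - start for start, end in merge(intervals))


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both peak sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fold_enrichment(a: List[Interval], b: List[Interval], genome_bp: int) -> float:
    """Observed shared bp divided by the bp expected if the sets were independent."""
    expected = covered_bp(a) * covered_bp(b) / genome_bp
    return overlap_bp(a, b) / expected


# Toy peak sets (hypothetical coordinates, not data from this study).
h33_peaks = [(0, 100), (200, 300)]
arid1a_peaks = [(50, 150), (250, 260)]
toy_fold = fold_enrichment(h33_peaks, arid1a_peaks, genome_bp=1000)
```

In this toy example the two sets share 60 bp against an expectation of 22 bp, so `toy_fold` is about 2.7. Real analyses would operate per chromosome on the effective genome size.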

Fig. 1

Genome-wide analysis of H3.3-ARID1A chromatin co-regulation. A Genomic annotation of 40,006 genome-wide H3.3 ChIP-seq peaks in 12Z cells (n = 2). B Distribution of H3.3 peak widths. Median H3.3 peak width is 1830 bp. C Genome-wide overlap of ARID1A and H3.3 ChIP-seq peaks. D Genome-wide association between H3.3 and other previously measured chromatin features, per genomic bp, quantified as [observed / expected]. Statistic is hypergeometric enrichment. E Enrichment for H3.3 and ARID1A co-regulation across 18 chromatin states previously modeled via ChromHMM [27]. Left, enrichment of H3.3 peaks; center, enrichment of H3.3+ARID1A binding; right, enrichment of ARID1A binding at sites with vs. without H3.3. Statistic is hypergeometric enrichment. F Left, ARID1A binding levels (ChIP/input fold-enrichment, FE) at H3.3+ vs. H3.3− ARID1A peaks. Right, H3.3 abundance (ChIP/input fold-enrichment) at ARID1A+ vs. ARID1A− H3.3 peaks. Statistic is two-tailed, unpaired Wilcoxon’s test. G Top, enrichment of H3.3 at genes promoter-proximally bound by ARID1A. Bottom, enrichment of ARID1A+H3.3 co-binding at genes DE following ARID1A loss (siARID1A treatment). Statistics are hypergeometric enrichment test and pairwise two-tailed Fisher’s exact test. H Example hg38 browser shots of genes and regulatory elements co-regulated by H3.3 and ARID1A. y-axis is log-likelihood ratio (logLR) of assay signal (compared to input chromatin for ChIP-seq or background genome for ATAC-seq). Small bars under tracks indicate significant peak detection by MACS2 (FDR < 0.05). Super-enhancers were detected by ROSE from H3K27ac ChIP-seq. * p < 0.05, ** p < 0.01, *** p < 0.001

We previously constructed a genome-wide chromatin state map accompanying ARID1A loss in 12Z cells via ChromHMM [37] by measuring seven chromatin features associated with transcriptional regulation: total RNA, ATAC (accessibility), H3K27ac, H3K18ac, H3K4me1, H3K4me3, and H3K27me3 [27]. Similar to our previous reports of ARID1A-regulated chromatin states, genomic H3.3 enrichment was highly associated with all active, euchromatic features, but not heterochromatic H3K27me3 (Fig. 1D). Annotating H3.3 enrichment in each of our characterized chromatin states revealed that H3.3 is associated with similar regulatory chromatin states as ARID1A binding, most notably super-enhancers and active typical enhancers (Fig. 1E, left). In agreement, co-regulation by H3.3 and ARID1A was most prominently observed at these same chromatin states (Fig. 1E, center). Next, we examined ARID1A binding at H3.3-marked vs. H3.3-absent chromatin sub-states and found that ARID1A binding was associated with H3.3 at promoter-proximal and genic super-enhancers and active transcription start sites (TSS) (Fig. 1E, right). Examining genome-wide binding patterns further, we observed that ARID1A peaks showed overall stronger ARID1A binding when H3.3 was also localized, and that H3.3 was overall more abundant at H3.3 peaks also bound by ARID1A (Fig. 1F). These data indicate that ARID1A and H3.3 may co-regulate active chromatin elements like enhancers and gene promoters.

We previously reported that ARID1A chromatin binding near gene promoters is associated with transcriptional regulation, such that ARID1A loss leads to aberrant gene expression [25]. Our H3.3 data further revealed that ARID1A binding at promoter-proximal regulatory elements is highly enriched among genes marked by promoter-proximal H3.3 (Fig. 1G, top), indicating that ARID1A transcriptional regulation may be coupled with H3.3. Moreover, the 2037 genes co-marked by ARID1A and H3.3 in the promoter-proximal region (±3 kb surrounding TSS) were more likely to show differential expression (DE) following ARID1A loss than genes without promoter-proximal H3.3 (Fig. 1G, bottom). In addition, locus-scale investigation clearly showed that ARID1A and H3.3 often co-mark active chromatin regulatory elements, frequently accompanied by gene body coating by H3.3, such as at COL1A1, THBS1, and SERPINE1 (Fig. 1H). These data collectively suggest H3.3 may be linked to transcriptional regulatory activity by ARID1A at the level of chromatin.

ARID1A chromatin interactions maintain H3.3

To understand the relationship between ARID1A and H3.3, we depleted ARID1A from 12Z cells using lentiviral shRNA particles targeting ARID1A (shARID1A) and then measured H3.3 by ChIP-seq. Our differential H3.3 ChIP-seq analysis (shARID1A vs. non-targeting shRNA control, n = 2) indicated that nearly one-third of tested H3.3 regions showed significant differences in H3.3 abundance (csaw/edgeR, FDR < 0.05) at 72 h following ARID1A knockdown (Fig. 2A, Additional file 1: Fig. S1A). We noted that ARID1A knockdown in 12Z cells did not result in obvious changes in global H3.3 levels by immunoblotting of the histone fraction (Additional file 1: Fig. S1B), suggesting any effects are likely occurring at the level of chromatin. This is further supported by our previously reported 12Z ARID1A knockdown RNA-seq data [25] indicating that the dominantly expressed H3.3-encoding gene isoform, H3F3B, does not change in expression (Additional file 1: Fig. S1C). We then investigated how ARID1A chromatin binding may be directly associated with the observed changes in H3.3 following ARID1A loss. Strikingly, ARID1A-bound differential H3.3 regions almost exclusively lost H3.3 and rarely gained H3.3 (Fig. 2B). Corroborating this result, we also profiled canonical H3.1/3.2 histone levels by ChIP-seq (n = 2) and observed that ARID1A-bound, H3.3-marked chromatin regions gain H3.1/3.2 (Fig. 2C). While 33% of all tested H3.3 regions had detectable ARID1A binding, 81% of the 8418 shARID1A decreasing H3.3 regions were normally bound by ARID1A, as opposed to only 3% of the 11,059 shARID1A increasing H3.3 regions (Fig. 2D). These results indicate that ARID1A interactions with H3.3 chromatin may serve to promote H3.3 incorporation or maintain its stability. When ARID1A is depleted, ARID1A-bound, H3.3-marked regions shift toward canonical H3.1/3.2.
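The hypergeometric enrichment used for comparisons such as Fig. 2D can be sketched as follows. This is an illustrative implementation, not the authors' code: the function names are hypothetical, and the example counts are reconstructed from the rounded percentages quoted above (33% of 67,502 tested regions; 81% of 8418 decreasing regions), so they are approximate. The tail probability is computed in log space because p-values at this scale underflow ordinary floating point.

```python
import math


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_log_sf(k: int, population: int, successes: int, draws: int) -> float:
    """log P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    lo = max(k, draws - (population - successes), 0)  # smallest feasible count >= k
    hi = min(successes, draws)                        # largest feasible count
    if lo > hi:
        return float("-inf")
    log_terms = [
        log_choose(successes, x)
        + log_choose(population - successes, draws - x)
        - log_choose(population, draws)
        for x in range(lo, hi + 1)
    ]
    # Log-sum-exp keeps the sum stable when individual terms are tiny.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def fold_enrichment(k: int, population: int, successes: int, draws: int) -> float:
    """Observed fraction in the subset divided by the background fraction."""
    return (k / draws) / (successes / population)


# Approximate counts derived from rounded percentages in the text (assumption).
tested_regions = 67502
arid1a_bound = round(0.33 * tested_regions)       # bound regions among all tested
decreasing = 8418
decreasing_bound = round(0.81 * decreasing)       # bound regions among decreasing

log_p = hypergeom_log_sf(decreasing_bound, tested_regions, arid1a_bound, decreasing)
fold = fold_enrichment(decreasing_bound, tested_regions, arid1a_bound, decreasing)
```

Reporting `log_p` rather than the p-value itself avoids returning zero for very strong enrichments; a pairwise Fisher's exact test, as named in the figure legend, would be a separate calculation.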

Fig. 2

Genome-wide analysis of ARID1A-dependent H3.3. A MA plot of shARID1A vs. control differential H3.3 ChIP-seq (n = 2), across 67,502 tested genomic regions. Regions are colored based on shARID1A differential H3.3 significance. Inset pie chart depicts distribution of significantly increasing and decreasing H3.3 regions (csaw/edgeR FDR < 0.05) compared to stable H3.3 (FDR > 0.05). FDR < 0.05 was used as the significance threshold for all downstream analyses. B shARID1A differential H3.3 regions segregated by detection of ARID1A binding in wild-type cells. Left, MA plot with all genome-wide H3.3 tested regions, colored by ARID1A binding status. Right, box plot quantification of shARID1A log2FC H3.3 abundance, segregated by ARID1A binding status. Statistic is two-tailed, unpaired Wilcoxon’s test. C Analysis of canonical H3 (H3.1/3.2) changes (ChIP-seq, n = 2) at H3.3-marked genomic regions following ARID1A knockdown (shARID1A), segregated by ARID1A binding status as in B. Statistic is two-tailed, unpaired Wilcoxon’s test. D Enrichment of ARID1A binding detection at regions with decreasing H3.3 following ARID1A loss compared to all tested H3.3 regions. Statistics are hypergeometric enrichment test and pairwise two-tailed Fisher’s exact test. E Magnitude of H3.3 change (log2FC) among ARID1A-bound, shARID1A significantly decreasing vs. increasing H3.3 regions. Statistic is two-tailed, unpaired Wilcoxon’s test. F Distribution of H3.3-enriched region widths among shARID1A stable vs. increasing vs. decreasing H3.3 regions. Statistic is two-tailed, unpaired Wilcoxon’s test. G Chromatin state enrichment among shARID1A increasing and decreasing H3.3 regions, calculated per 200 bp genomic interval. Statistic is hypergeometric enrichment. H Top 10 significant (FDR < 0.05) enriched Hallmark pathways (left) and GO Biological Process gene sets (right) among genes with ARID1A-bound, shARID1A decreasing promoter-proximal H3.3. 
I Representative hg38 locus near CCL2 displaying H3.3 maintained by ARID1A chromatin interactions. *** p < 0.001

We further characterized the changes in H3.3 occurring following ARID1A loss. Globally, we found that typical enhancers (distal regions marked by H3K27ac and ATAC, >3 kb away from a TSS and excluding super-enhancers) were enriched for shARID1A-driven H3.3 alterations as compared to gene promoter-proximal regions and super-enhancers (Additional file 1: Fig. S1D). Intriguingly, gene promoter-proximal regions displayed both decreasing and increasing H3.3, whereas distal typical enhancers and super-enhancers almost exclusively lost H3.3 if significantly affected (Additional file 1: Fig. S1E). Among ARID1A-bound genomic H3.3 regions, shARID1A decreasing H3.3 regions tended to display greater differences in H3.3 abundance than shARID1A increasing H3.3 regions (Fig. 2E), supporting a role for ARID1A in promoting maintenance of H3.3 rather than limiting it. Regions that displayed shARID1A decreasing H3.3 also tended to have overall wider genomic footprints than increasing or stable H3.3 regions (Fig. 2F). In agreement with where ARID1A-H3.3 co-regulation is most frequently observed, chromatin state enrichment analysis indicated that ARID1A loss led to depletion of H3.3 at promoter-proximal super-enhancers and highly active enhancers, while increasing H3.3 was observed over actively transcribed gene bodies (Fig. 2G). Among the 412 genes we identified with ARID1A-bound, shARID1A decreasing promoter-proximal H3.3, we found significant enrichment for inflammatory, hypoxia, apoptosis, locomotion, and EMT pathways, which include genes such as CCL2 (Fig. 2H,I). These data suggest that ARID1A maintains H3.3 at active regulatory elements such as enhancers and super-enhancers, and, when ARID1A is lost, redistribution of H3.3 occurs toward active genes already marked by H3.3.
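The region classes compared above (promoter-proximal regions, typical enhancers, and super-enhancers) can be expressed as a small decision rule. The sketch below is a hedged illustration: the 3 kb window and the typical-enhancer criteria come from the text, whereas measuring distance from the region midpoint, the precedence of super-enhancer membership over promoter proximity, and all function and parameter names are assumptions.

```python
from bisect import bisect_left
from typing import List

PROMOTER_WINDOW_BP = 3000  # ±3 kb around a TSS, as stated in the text


def distance_to_nearest_tss(position: int, sorted_tss: List[int]) -> float:
    """Distance in bp from a position to the closest TSS on the same chromosome."""
    if not sorted_tss:
        return float("inf")
    i = bisect_left(sorted_tss, position)
    # Only the TSSs immediately left and right of the position can be nearest.
    neighbors = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in neighbors)


def classify_region(
    start: int,
    end: int,
    sorted_tss: List[int],
    has_h3k27ac: bool,
    is_accessible: bool,
    in_super_enhancer: bool,
) -> str:
    """Assign one region to a class; the precedence order is an assumption."""
    if in_super_enhancer:
        return "super-enhancer"
    midpoint = (start + end) // 2  # assumption: distance measured from the midpoint
    if distance_to_nearest_tss(midpoint, sorted_tss) <= PROMOTER_WINDOW_BP:
        return "promoter-proximal"
    if has_h3k27ac and is_accessible:
        return "typical enhancer"  # distal, H3K27ac+ and ATAC+, not a super-enhancer
    return "other"


# Hypothetical TSS coordinates for illustration only.
example_tss = [10000, 50000]
example_class = classify_region(20000, 21000, example_tss, True, True, False)
```

Here `example_class` is "typical enhancer" because the region midpoint lies 10.5 kb from the nearest hypothetical TSS and carries both marks.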

H3.3 depletion phenocopies transcriptional effects of ARID1A loss

We next sought to determine the transcriptional consequences of H3.3 loss in endometrial epithelia. We hypothesized that H3F3B could be knocked down to reduce H3.3 levels for acute transcriptome evaluation without impeding cell health (Fig. 3A). Using siRNA targeting H3F3B (siH3F3B), we observed H3.3 depletion by immunoblotting without affecting the cell cycle (Fig. 3B, Additional file 1: Fig. S2A-B). RNA-seq transcriptome analysis (n = 3) 72 h following siRNA transfection showed clear loss of H3F3B expression, but not H3F3A, accompanied by 1608 significant DE genes (DESeq2, FDR < 0.001), including those both upregulated (repressed by H3.3) and downregulated (activated by H3.3) (Fig. 3C–E). As expected, we also observed highly significant enrichment for H3.3-dependent transcriptional changes among genes marked by promoter-proximal H3.3 (Additional file 1: Fig. S2C). Similar to our previous observations with acute ARID1A loss [25], depletion of H3.3 led to mostly minor alterations in gene expression, with the majority of DE genes displaying <0.5 log2FC expression change (Fig. 3E). These data indicate H3.3 serves both activating and repressing roles in transcriptional regulation of endometrial epithelial cells.

Fig. 3

Transcriptional effects of H3.3 depletion and overlap with ARID1A. A Baseline relative linear expression of H3F3A (H3-3A) and H3F3B (H3-3B) gene isoforms encoding H3.3, as measured by RNA-seq (n = 3). B Western blot for H3.3 and total H3 in control vs. siH3F3B treated cells. C Global transcriptomic effects of 24,192 genes following H3.3 knockdown via siH3F3B treatment (RNA-seq, n = 3). Red dots represent significant DE genes (DESeq2, FDR < 0.001). D Relative linear expression of H3F3A and H3F3B by RNA-seq in control and siH3F3B cells (n = 3). E Volcano plot depicting siH3F3B vs. control differential gene expression (DGE). Top significant genes are labeled. F Significant overlap in DE genes following H3.3 knockdown (siH3F3B) vs. ARID1A knockdown (siARID1A). Statistic is hypergeometric enrichment. G Directional segregation of siH3F3B/siARID1A overlapping DE genes. A positive association is observed by chi-squared test, i.e., genes are more likely to be upregulated or downregulated in both conditions as opposed to antagonistic regulation. H Scatter plot of siH3F3B vs. siARID1A expression log2FC (with shrinkage correction) for all 19,900 transcriptome-wide commonly detected genes. Statistics are Pearson (r) and Spearman (rs) correlation coefficients. Colored dots indicate significant DE genes (FDR < 0.001) in both treatment conditions. I Association between H3.3 transcriptional repression (siH3F3B upregulation) and transcriptional co-regulation by ARID1A (siARID1A DE). Statistic is two-tailed Fisher’s exact test. J Scatter plot of 196 shared DE genes upregulated following knockdown of either H3.3 or ARID1A. These genes are mutually repressed by H3.3 and ARID1A. K Top significant (FDR < 0.05) enriched gene sets among the 196 ARID1A-H3.3 mutually repressed genes among various gene set databases

Comparing the gene expression changes following H3.3 loss with those following ARID1A loss, we observed significant overlap, with 682 shared dysregulated genes (Fig. 3F). These 682 genes were then grouped by direction of change (upregulated vs. downregulated) to identify genes with the same or different expression patterns following ARID1A vs. H3.3 loss. A significant association was observed between the effects of H3.3 and ARID1A loss indicating shared transcriptional consequences (Fig. 3G). Gene expression changes also positively correlated transcriptome-wide (Fig. 3H). Intriguingly, the 682 genes also affected by ARID1A loss were more likely to be transcriptionally repressed by H3.3 (Fig. 3I). In total, 196 genes were identified as mutually repressed by both ARID1A and H3.3, including PLAU, ADAMTS15, C1S, CD82, CCL2, and CLSTN2 (Fig. 3J). In agreement with differential H3.3 patterns, these 196 co-repressed genes were enriched for similar gene sets as observed among the ARID1A-bound, shARID1A decreasing promoter-proximal H3.3 gene set, including EMT, TNFα signaling, estrogen response, apoptosis, adhesion, migration, extracellular matrix, and collagens (Fig. 3K). Altogether, these data suggest that ARID1A and H3.3 co-regulate similar target genes in endometrial epithelial cells. At the chromatin level, depletion or destabilization of H3.3 as a result of ARID1A loss may lead to the upregulation of a physiologically relevant set of EMT and invasion genes.
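The directional comparison described above (grouping shared DE genes by direction and testing for association, as in Fig. 3G) can be sketched as a 2×2 chi-squared test. This is an illustrative reconstruction only: the function names and toy log2FC values are hypothetical, no continuity correction is applied, and the p-value uses the identity that the upper tail of a chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math
from typing import Dict, Tuple


def direction_table(
    lfc_a: Dict[str, float], lfc_b: Dict[str, float]
) -> Tuple[int, int, int, int]:
    """Count shared genes as (up/up, up/down, down/up, down/down).

    Inputs map gene names to log2FC for genes already called significant;
    a log2FC of exactly zero is counted as 'down' (assumption).
    """
    up_up = up_down = down_up = down_down = 0
    for gene in lfc_a.keys() & lfc_b.keys():
        a_up, b_up = lfc_a[gene] > 0, lfc_b[gene] > 0
        if a_up and b_up:
            up_up += 1
        elif a_up:
            up_down += 1
        elif b_up:
            down_up += 1
        else:
            down_down += 1
    return up_up, up_down, down_up, down_down


def chi_squared_2x2(uu: int, ud: int, du: int, dd: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic and p-value (1 d.f., no continuity correction)."""
    n = uu + ud + du + dd
    margins = (uu + ud) * (du + dd) * (uu + du) * (ud + dd)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (uu * dd - ud * du) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy significant-gene log2FC values (hypothetical, not data from this study).
si_h3f3b = {"g1": 1.0, "g2": 0.5, "g3": -0.7, "g4": -1.2, "g5": 0.3, "g6": 2.0}
si_arid1a = {"g1": 0.8, "g2": 0.4, "g3": -0.2, "g4": -0.9, "g5": -0.1, "g7": 1.0}

table = direction_table(si_h3f3b, si_arid1a)
chi2, p_value = chi_squared_2x2(*table)
```

A positive association corresponds to an excess of the concordant cells (up/up and down/down) relative to the discordant ones; the sign of `uu * dd - ud * du` indicates the direction, since the chi-squared statistic itself is unsigned.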

ARID1A co-regulates H3.3 with CHD4 and ZMYND8

While our data implicate H3.3 in the ARID1A mutant endometrium, few reports have linked SWI/SNF activity to H3.3 containing nucleosomes [38, 39]. To gain insight into factors associated with H3.3 regulation by ARID1A, we used the ReMap2020 database of 165 million peak regions extracted from genome-wide binding assays [40]. For all 1135 transcriptional regulators included in this database, we calculated genome-wide associations for each set of factor peaks with H3.3-marked (H3.3+) vs. H3.3-absent (H3.3−) ARID1A binding. This analysis revealed that two zinc finger MYND-type proteins, ZMYND11 (BS69) and ZMYND8 (PRKCBP1, RACK7), were among the top co-regulators associated with H3.3+ ARID1A chromatin binding (Fig. 4A, OR = 2.93 and 2.43 for ZMYND11 and ZMYND8, respectively). These data suggest that H3.3 regulation by ARID1A may be mediated by these co-regulators. ZMYND11 and ZMYND8 are multivalent chromatin readers that are suggested to function as interfaces between histones and other chromatin regulator complexes like remodelers, writers, and erasers [41, 42]. Both proteins interact with H3/H4 acetylated tails through bromodomains and may show specificity toward or against H3.3-containing nucleosomes [42,1: Fig. S9J-L), including 603 genes affected by each of the four knockdowns (FDR < 0.05) (Fig. 7F, Additional file 1: Fig. S9L). These included 60 genes mutually repressed by ARID1A, CHD4, ZMYND8, and H3.3 (Fig. 7G). These mechanistic co-repressed genes were enriched for EMT, adhesion, development, locomotion, collagens, and extracellular matrix gene sets (Fig. 7H). Further, 68% of these genes were marked by gene body H4K16ac, an enrichment compared to less than half of all expressed genes (Additional file 1: Fig. S10). Two physiologically relevant target genes revealed through integrative epigenomic analysis are PLAU and TRIO, both of which are located within broad H4K16ac+ domains and near active H3.3+ super-enhancers co-bound by ARID1A, CHD4, and ZMYND8 (Fig. 7I). 
ARID1A loss leads to decreased promoter-proximal H3.3 abundance and transcriptional hyperactivation of PLAU and TRIO (Fig. 7I). We also observed that co-knockdown of ARID1A and CHD4 led to increased induction of PLAU compared to either knockdown separately (Additional file 1: Fig. S11).

ARID1A-H3.3 repressed chromatin targets are aberrantly activated in human endometriomas

Our studies in the 12Z human endometrial epithelial cell line have revealed a mechanism of cooperative regulation by ARID1A, CHD4, and ZMYND8 at H3.3-marked chromatin. To support the relevance of these chromatin regulatory networks to pathologically related gene expression, we utilized a transcriptome expression data set comparing human endometriomas to control endometrial tissue samples [53]. Endometriomas are a result of ectopic spread of endometrial tissue onto the ovary, forming cysts associated with ovarian cancer development [20, 54], and numerous reports have observed high rates of ARID1A mutation or loss of expression in endometriomas [21, 22, 55]. Three ARID1A-H3.3 related gene sets were investigated for relevance in human endometrioma gene expression alterations: (1) ARID1A-bound, shARID1A decreasing promoter-proximal H3.3 genes (n = 412), (2) ARID1A-H3.3 co-repressed genes (i.e., siARID1A/siH3F3B upregulated, FDR < 0.001, n = 196), and (3) ARID1A-H3.3-CHD4-ZMYND8 co-repressed genes (i.e., upregulated with each knockdown, FDR < 0.05, n = 60). We observed significant enrichment for all three of these gene sets among human endometrioma DGE (Fig. 8A, left). Moreover, the overlapping DE genes were more likely to be upregulated in endometriomas than expected by chance, indicating relief of repression is also observed in pathology (Fig. 8A, right). Similarly, examining the endometrioma vs. control endometrium expression log2FC values indicated that each gene set tended to be overall upregulated in the pathological, pre-cancerous state (Fig. 8B). Mechanistic genes aberrantly activated in endometriomas that could be attributed to disruption of ARID1A-H3.3 chromatin repression mechanisms include C1S, SCARB1, GYPC, WWC3, COL6A2, and MAP4K4 (Fig. 8C).
Collectively, our data indicate that ARID1A-SWI/SNF maintains the histone variant H3.3 in active regulatory elements, and that a subset of physiologically relevant genes is co-regulated by CHD4 and ZMYND8. Loss of any of these factors alleviates transcriptional repression and leads to aberrant gene activation in endometrial disease contexts where ARID1A mutations are thought to drive pathogenesis (Fig. 9).

Fig. 8

Mechanistic gene expression alterations in human endometriomas. A Left, enrichment for ARID1A-H3.3 co-repressive chromatin mechanistic gene sets among human endometrioma (ovarian endometriosis) vs. control endometrium DE genes reported by Hawkins et al. [53], compared to all unique measured genes. Right, proportion of overlapping DE genes that are upregulated vs. downregulated in endometriomas, compared to all unique measured genes. Statistic is hypergeometric enrichment. B Box plots displaying endometrioma expression log2FC values for probes annotated to genes within mechanistic gene sets, compared to all measured probes. Statistic is two-tailed, unpaired Wilcoxon’s test. C Relative expression box-dot plots of 6 genes upregulated in endometriomas vs. control endometrium that are co-repressed by ARID1A, H3.3, CHD4, and ZMYND8. Statistic is limma FDR-adjusted p. * p < 0.05, ** p < 0.01, *** p < 0.001

Fig. 9

Proposed model of H3.3 chromatin regulation by ARID1A-SWI/SNF and co-regulators. ARID1A and SWI/SNF chromatin remodeling activities are required for H3.3 incorporation or maintenance at certain active regulatory elements across the genome, such as super-enhancers. When ARID1A is mutated or lost, H3.3 maintenance is disrupted, and nucleosome composition shifts toward canonical H3.1/3.2 at ARID1A-bound sites. As a consequence of local H3.3 depletion, occupancy of H3.3 reader factors, such as the CHD4-containing NuRD complex, is reduced, leading to impaired chromatin regulation and aberrant target gene expression. At H3.3+ H4K16ac+ super-enhancer-like elements located promoter-proximally upstream of genes, H3.3 maintenance by ARID1A-SWI/SNF is associated with the NuRD cofactor ZMYND8 and with repression of transcriptional hyperactivation

Discussion

We have provided evidence that ARID1A functions to maintain the variant histone H3.3 in active regulatory elements. ARID1A loss leads to H3.3 depletion at active enhancers and super-enhancers, due to disrupted ARID1A chromatin interactions, leading to gain of canonical H3.1/3.2 and redistribution of H3.3 toward active genic and transcribed elements. We further showed that this mechanism is largely independent of H3.3-interacting remodeler CHD4-NuRD. Instead, our data suggest that ARID1A-dependent maintenance of H3.3 is required for CHD4-NuRD binding at a subset of enhancers. Therefore, the BAF complex helps to facilitate H3.3 incorporation, and this activity is required for the recruitment of alternative chromatin remodelers and chromatin regulators with unique regulatory activity.

SWI/SNF is thought to eject nucleosomes and open chromatin [2, 4] rather than assemble nucleosomes. SWI/SNF disruption of nucleosomes may be required for H3.3 incorporation and thus coupled to nucleosome assembly. Therefore, we hypothesize that H3.3 regulation by ARID1A-SWI/SNF occurs by ejecting nucleosomes in favor of H3.3 incorporation by other assembly or chaperone factors, such as HIRA, at active regulatory elements [31]. Unlike the DAXX/ATRX complex, which governs H3.3 incorporation at pericentromeric heterochromatin and telomeres and has intrinsic ATP-dependent remodeling activity, HIRA may rely on other chromatin remodeling complexes for its chaperone activity [31]. In the absence of ARID1A-SWI/SNF, H3.3 nucleosome assembly by HIRA may be impeded by the lack of H3.1/3.2 nucleosome remodeling by the BAF complex. The related CHD1 remodeler is known to be required for H3.3 deposition into chromatin in vivo [56], further suggesting a necessary role for SWI/SNF remodeler activity in H3.3 nucleosome assembly. In addition, both FACT and Polybromo-associated Brm (PBAP) complex are thought to facilitate H3.3 incorporation at boundary elements in Drosophila [57]. P400 is another SWI/SNF-like remodeler recently shown to exchange H3.3 nucleosomes that could also possibly collaborate with SWI/SNF [58].

Given previous associations between H3.3 epigenetic memory and cell fate plasticity [32], it is intriguing to consider a role for BAF complex regulation of H3.3 as being a critical determinant of endometrial epithelial cell identity and homeostasis across the menstrual cycle when proliferation and differentiation occur. Further, it remains possible that alternative SWI/SNF complex configurations also participate in H3.3 maintenance, and these complexes could be responsible for H3.3 incorporation at sites unaffected by ARID1A loss.

ARID1A maintenance of H3.3 is associated with genomic interactions with CHD4, a catalytic subunit in the SWI/SNF-like NuRD remodeler complex. As CHD4 knockdown does not lead to the widespread H3.3 depletion observed with ARID1A knockdown, and ARID1A is required for CHD4 recruitment to active regulatory elements, loss of CHD4 co-regulation of H3.3 chromatin is likely the consequence of ARID1A loss. We also observed sub-stoichiometric physical interactions between ARID1A and CHD4, but the significance of direct ARID1A-CHD4 interactions is unclear. CHD4 interactions with histone reader ZMYND8 appear to be associated with further chromatin target regulation specificity, where ZMYND8 may be recruited to H4(K16)ac-marked chromatin through its bromodomain. However, further experimentation, such as ZMYND8 depletion or bromodomain mutation, would be required to confirm the suspected function of the ZMYND8 module in complex recruitment. ZMYND8 co-regulation appears to be associated with chromatin repression, notably at promoter-proximal super-enhancers located upstream of genes, such that disruption of this chromatin mechanism causes relief of repression and subsequent transcriptional hyperactivation. Plasminogen activator urokinase (PLAU) was identified as a key target gene repressed by this mechanism in our 12Z endometriotic epithelial cell model. PLAU was also recently observed as transcriptionally activated during human menstruation [59], suggesting similar repressive chromatin mechanisms may govern PLAU regulation in the healthy endometrium. PLAU is upregulated in ovarian endometrioid carcinomas from women with concurrent endometriosis [60], suggesting PLAU upregulation may promote malignant transformation in endometriosis. ARID1A mutations are frequently observed in endometriosis-associated ovarian cancers [23, 24]. 
C1S, a component of the complement C1 complex, is another gene that is transcriptionally repressed by ARID1A, CHD4, ZMYND8, and H3.3 that is aberrantly upregulated in human endometriomas. It has been reported that the complement system is activated in women with endometriosis [61], suggesting that ARID1A mutation and associated disruption of chromatin repression may be a possible disease mechanism. In addition to ARID1A, it should be noted that CHD4 mutations leading to nucleosome remodeling defects are also frequent in endometrial cancer [62,63,64] and may lead to de-repression of similar target genes.

H3.3 is considered an active chromatin mark associated with transcriptional activation. However, our data and those of others have demonstrated that H3.3 can play roles in transcriptional repression, as well as transcriptional poising and higher-order chromatin regulation, although the mechanisms governing these specific activities remain unclear [31]. A simple hypothesized mechanism explaining how H3.3 can function repressively is through associations with CHD4 and the NuRD complex, as we have studied here. Historically, NuRD has been studied as a repressor due to its subunit composition that includes the histone deacetylases HDAC1/2, although activating roles of NuRD are also known [65,66,67]. An early study of H3.3 chromatin dynamics indicated that NuRD components were associated with active regions marked by high H3.3 turnover [68]. More recently, NuRD has been shown to directly interact with H3.3 nucleosomes [48]. The finding that CHD4 recruitment is dependent upon H3.3 maintenance by ARID1A at a subset of enhancers further supports the notion that CHD4 co-repressive activity at these sites is likely a result of H3.3 regulation by the BAF complex. When ARID1A is mutated or lost, an H3.3 to H3.1/3.2 switch may impair CHD4 binding through its normal H3.3 reader function, leading to loss of NuRD HDAC co-repressive activity. Intriguingly, CHD4/NuRD was recently shown to control super-enhancer accessibility and maintain lower acetylation levels through its HDAC activity [69], similar to our findings of ARID1A antagonizing P300 [27]. In support of our data, the authors observed physical interactions between CHD4 and SWI/SNF. Recently, NuRD and SWI/SNF recruitment to active TSS and enhancers was shown to be impaired in H3.3K4A mutant mouse ESCs [39], suggesting that NuRD and SWI/SNF recruitment is dependent on the K4 residues on H3.3.

In silico analyses from the ReMap 2020 transcriptional regulator peak database [40] predicted that ZMYND8 is highly associated with H3.3 chromatin regulation by ARID1A. Here, we detected high stringency physical interactions between CHD4 and ZMYND8 as a possible explanation of this co-regulatory activity, as we have demonstrated that ARID1A maintenance of H3.3 is required for CHD4 binding at enhancers. Others have also reported that ZMYND8 interacts with NuRD in numerous contexts [41, 44, 46, 47]. Intriguingly, one recent study reported that ZMYND8 directly recognizes mutant H3.3G34R [70]. Our data indicate that ZMYND8 links repressive H3.3 to H4 acetylation. In support, the ZMYND8 bromodomain directly interacts with acetylated H4 tails [44], and TIP60-mediated H4 acetylation can functionally recruit ZMYND8 through this mechanism to repress transcription with CHD4 in response to DNA damage [46]. Our data also indicate that ARID1A directly suppresses chromatin accessibility at sites marked by H4 acetylation, suggesting that SWI/SNF chromatin remodeler activity may be involved in ZMYND8-NuRD-mediated chromatin repression. ZMYND8-NuRD repression in response to DNA damage was previously shown to rely on KDM5A demethylase activity [71], further suggesting other factors may orchestrate repression vs. activation logic. ZMYND8 has been reported to be a super-enhancer factor that suppresses hyperactivation [49]. Corroborating our results, ZMYND8 was previously shown to associate with NuRD at super-enhancers [47]. We found that super-enhancers that become hyperacetylated following ARID1A loss are normally associated with the highest levels of H3.3 and ZMYND8 binding. In our proposed model, ZMYND8 bromodomain interactions with H4 acetylated tails might facilitate recruitment and transcriptional repression at active chromatin in association with NuRD, such as at H3.3+ super-enhancers. Further work will seek to elucidate how ZMYND8 functions toward transcriptional repression.

In addition to promoter-proximal and distal enhancer chromatin regulation, SWI/SNF, NuRD, and ZMYND8 have been shown to mediate transcriptional pausing and elongation by Pol II and associated machinery [72,73,74,75], as well as DNA repair [46, 76, 77]. Super-enhancers mark critical cell identity genes [78], and recent evidence suggests chromatin mechanisms coupling transcription and DNA repair occur at super-enhancers to control transcriptional hyperactivation [79]. Super-enhancer chromatin co-regulation by ARID1A, CHD4, and ZMYND8 may fine-tune transcriptional activation states and thus reflect a mechanism at the intersection of transcriptional regulation and other chromatin-regulated processes.

Conclusions

In summary, ARID1A-SWI/SNF activities facilitate maintenance of the histone variant H3.3 in active chromatin, such that ARID1A loss leads to local H3.3 depletion, gain of canonical H3.1/3.2, and H3.3 redistribution toward genic elements with transcriptional consequences. At physiologically relevant genomic regions like super-enhancers, ARID1A collaborates with the repressive CHD4-NuRD remodeling complex and the reader protein ZMYND8 to suppress hyperactivation associated with ARID1A-dependent maintenance of H3.3. ARID1A-CHD4-ZMYND8-mediated repression affects genes that are aberrantly activated in human endometriomas. These studies have revealed that SWI/SNF regulation of variant histone exchange influences the activities of other chromatin remodelers and regulators by altering nucleosome substrates, and this mechanism plays substantial roles in women’s health and disease.

Methods

Cell culture, siRNA transfections, and lentiviral shRNA particle usage

Adherent human 12Z endometriotic epithelial cells were cultured in DMEM/F12 media in the presence of 10% serum (FBS), 1% L-glutamine, and 1% penicillin/streptomycin. Cells were seeded in antibiotic-free media the day before siRNA transfection. Then, 50 nM siRNA (Dharmacon, ON-TARGETplus) was transfected into cells using the Lipofectamine RNAiMAX reagent (Thermo Fisher Scientific), according to the manufacturer’s protocol, in OptiMEM (Gibco). Growth media was replaced 24 h following transfection, without antibiotics. Forty-eight hours after transfection, low-serum (0.5% FBS) growth media was added with antibiotics. Cells were harvested 72 h following siRNA transfection. Lentiviral shRNA particles were prepared with Lenti-X 293T cells (Takara) and MISSION pLKO.1 plasmids (Sigma-Aldrich) as previously described [27]. Lentiviral shRNA particles were titered using the qPCR Lentiviral Titration Kit (ABM). shRNA particles were transduced into 12Z cells at a multiplicity of infection of 100, and media was replaced 24 h later. Cells were harvested 72 h following shRNA transduction.

Cell cycle analysis

The Click-iT Plus EdU Flow cytometry Assay Kit (Invitrogen) was used for cell cycle assays. 12Z cells were treated with 10 mM of EdU for 2 h in culture media. Cells were harvested by trypsinization and washed in 1% BSA in PBS. Cells were resuspended in 100 μL of ice-cold PBS, and 900 μL of ice-cold 70% ethanol was added dropwise while vortexing. Cells were incubated on ice for 2 h. Cells were washed with 1% BSA in PBS and then treated with the Click-iT Plus reaction cocktail including Alexa Fluor 488 picolyl azide according to the manufacturer’s instructions for 30 min. Cells were washed with 1× Click-iT permeabilization buffer and wash reagent, and then treated with 5 mM of Vybrant DyeCycle Ruby Stain (Thermo Fisher) diluted in 1% BSA in PBS for 30 min at 37 °C. Flow cytometry was performed using a BD Accuri C6 flow cytometer (BD Biosciences) and analyzed using FlowJo v10 software (BD Biosciences).

Histone extraction

12Z cells were washed with PBS and scraped in PBS containing 5 mM sodium butyrate. Cells were centrifuged and resuspended in TEB buffer (PBS supplemented with 0.5% Triton X-100, 5 mM sodium butyrate, 2 mM phenylmethylsulfonyl fluoride, 1× protease inhibitor cocktail) and incubated on a 3D spindle nutator at 4 °C for 10 min. Cells were centrifuged at 3000 RPM for 10 min at 4 °C. The TEB wash step was repeated once. Following the second wash, the pellet was resuspended in 0.2 N HCl and incubated on a 3D spindle nutator at 4 °C overnight. The following day, samples were neutralized with a 1:10 volume of 1 M Tris-HCl pH 8.3. Samples were centrifuged at 3000 RPM for 10 min at 4 °C, and the supernatant containing histone proteins was collected.

Co-immunoprecipitation (co-IP)

Nuclear extracts were prepared as previously described [15], dialyzed overnight into 0% glycerol (25 mM HEPES, 0.1 mM EDTA, 12.5 mM MgCl2, 100 mM KCl, 1 mM DTT) using a Slide-A-Lyzer G2 Dialysis Cassette (10 kDa cutoff, Thermo Fisher Scientific), and quantified with the BCA Protein Assay Kit (Pierce, Thermo Fisher Scientific). Primary antibodies (anti-ARID1A, D2A8U, Cell Signaling; anti-CHD4, D8B12, Cell Signaling) were conjugated to Protein A Dynabeads (Invitrogen) overnight at 4 °C in 1× PBS + 0.5% BSA. Normal rabbit IgG (Cell Signaling) IPs were performed in parallel at equivalent masses, as negative controls. Five hundred micrograms of nuclear lysate was diluted into IP buffer (20 mM HEPES, 150 mM KCl, 10% glycerol, 0.2 mM EDTA, 0.1% Tween-20, 0.5 mM DTT) to a final volume of 1 mL and clarified by centrifugation. After overnight IP at 4 °C, bead slurries were washed with a series of IP buffers at different KCl concentrations: 2× washes at 150 mM, 3× washes at 300 mM, 2× washes at 100 mM, 1× wash at 60 mM. Immunoprecipitants were eluted in 2× Laemmli buffer + 100 mM DTT at 70 °C for 10 min with agitation.

Glycerol gradient sedimentation

Nuclear extracts were prepared, dialyzed, and quantified as described in the co-IP methods section. Density sedimentation by glycerol gradient was performed and probed similar to published reports [13]. Briefly, 4.5 mL 10–30% linear glycerol gradients were prepared using an ÄKTA start (Cytiva) from density sedimentation buffer (25 mM HEPES, 0.1 mM EDTA, 12.5 mM MgCl2, 100 mM KCl, 1 mM DTT) additionally containing 30 and 10% glycerol for initial and target concentrations, respectively. Two hundred micrograms of nuclear lysate was overlaid on the glycerol gradient followed by ultracentrifugation at 40,000 rpm in an AH-650 swinging bucket rotor (Thermo Fisher Scientific) for 16 h at 4 °C. Gradient fractions of 225 μL were collected and concentrated using StrataClean resin (Agilent). Concentrated fractions were eluted in 1.5× Laemmli buffer + 37.5 mM DTT and run on SDS-PAGE for immunoblotting.

Immunoblotting

Whole-cell protein lysates were prepared as previously described [27]. Proteins were quantified with the BCA Protein Assay Kit (Pierce, Thermo Fisher Scientific). Protein samples in Laemmli buffer + DTT were denatured at 94 °C for 3 min prior to running on SDS-PAGE gels (6% gels for co-IP and glycerol gradients, 15% gels for histone extracts, and 4–15% gradient gels for whole-cell protein lysates). Gels containing histone extracts were wet transferred to nitrocellulose membranes at 4 °C for 3 h at 400 mA current, then dried at room temperature followed by re-hydration in TBS + 0.1% Tween-20 (TBS-T) and blocking with Odyssey blocking buffer (LI-COR). All other gels were semi-dry transferred to PVDF using a Trans-Blot Turbo (Bio-Rad) according to the manufacturer’s protocol designed for high molecular weight proteins, and blocked with either 5% BSA or 5% milk in TBS. The following primary antibodies were used: anti-ARID1A (D2A8U, Cell Signaling), anti-CHD4 (D4B7, Cell Signaling), anti-ZMYND8 (A302-089, Bethyl), anti-ZMYND8 (Atlas), anti-BRG1 (ab110641, abcam), anti-BAF155 (D7F8S, Cell Signaling), anti-HDAC1 (10E2, Cell Signaling), anti-histone H3.3 (ab176840, abcam), anti-histone H3.3 (2D7-H1, abnova), and anti-histone H3 (D1H2, Cell Signaling). IRDye fluorescent dye (LI-COR) secondary antibodies were used for LI-COR fluorescence-based protein visualization of histones. Horseradish peroxidase (HRP) conjugated secondary antibodies (Cell Signaling) were used for chemiluminescence-based protein visualization of all other targets. Clarity Western ECL substrate (Bio-Rad) was used to activate HRP for chemiluminescence, captured by ChemiDoc XRS+ imaging system (Bio-Rad).

mRNA-seq and analysis

Seventy-two hours after initial siRNA transfection, and 24 h after low-serum conditioning, RNA was purified from 12Z cells using the Quick-RNA Miniprep Kit (Zymo Research). Transcriptome libraries (n = 3 replicates) were prepared and sequenced by the Van Andel Genomics Core from 500 ng of total RNA using the KAPA mRNA HyperPrep kit (v4.17) (Kapa Biosystems). RNA was sheared to 300–400 bp. Prior to PCR amplification, cDNA fragments were ligated to IDT for Illumina unique dual adapters (IDT DNA Inc). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor dsDNA System (Promega), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp, paired-end sequencing was performed on an Illumina NovaSeq 6000 sequencer using a 100-cycle sequencing kit (Illumina). Each library was sequenced to an average raw depth of 20–25 million reads. Base calling was done by Illumina RTA3 and output of NCS was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

For analysis, briefly, raw reads were trimmed with cutadapt [80] and Trim Galore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) followed by quality control analysis via FastQC [81] and MultiQC [82]. Trimmed reads were aligned to hg38 assembly and indexed to GENCODE (v28) along with gene feature counting via STAR [83]. Low count genes with less than 1 count per sample on average were filtered prior to count normalization and differential gene expression (DGE) analysis by DESeq2 with empirical Bayes shrinkage for fold-change estimation [84, 85]. Wald probabilities were corrected for multiple testing by independent hypothesis weighting (IHW) [86] for downstream analyses. In presented analyses, “log2FC” is the empirically observed log2 fold-change in expression between conditions, while “slog2FC” is a moderated log2 fold-change estimate that removes noise from low count genes using the apeglm shrinkage estimator as implemented in DESeq2 [87]. Pairwise comparisons between different DGE analyses and gene sets were initially filtered for genes with transcripts commonly detected in both cell populations.
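The low-count filter described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the analysis itself was performed with DESeq2 in R), and the function name, gene labels, and toy counts are hypothetical. It assumes a gene is removed when its mean raw count across samples is below 1.

```python
def filter_low_count_genes(counts, min_mean=1.0):
    """Keep genes whose mean raw count across samples is at least min_mean.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    """
    kept = {}
    for gene, values in counts.items():
        if sum(values) / len(values) >= min_mean:
            kept[gene] = values
    return kept


# Toy example with hypothetical gene names and six samples.
toy_counts = {
    "GENE_A": [0, 1, 0, 2, 0, 0],        # mean 0.5, removed
    "GENE_B": [12, 9, 15, 30, 28, 25],   # high counts, kept
    "GENE_C": [1, 1, 1, 1, 1, 1],        # mean exactly 1, kept
}
filtered = filter_low_count_genes(toy_counts)
print(sorted(filtered))
```

Only genes passing this filter would proceed to normalization and differential expression testing.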

Histone peptide arrays

Anti-acetyl-H2A.Z (K4/K7) (D3V1I, Cell Signaling) antibody specificity was analyzed via histone peptide microarrays as previously described [88] with minor modifications. Arrays were designed in ArrayNinja [89] and printed using a 2470 Arrayer (Quanterix). All hybridization and wash steps were performed at ambient temperature. Slides were blocked with hybridization buffer (1× PBS [pH 7.6], 0.1% Tween, 5% BSA) for 30 min, then incubated with primary antibody diluted 1:1000 in hybridization buffer for 1 h. Slides were washed 3× for 5 min with PBS, then probed with Alexa647-conjugated secondary antibody diluted 1:5000 in hybridization buffer for 30 min. Slides were washed 3× for 5 min with PBS, dipped in 0.1× PBS to remove salt, and spun dry. Slides were scanned on an InnoScan 1100 microarray scanner (Innopsys), and images were analyzed and quantified using ArrayNinja. Plots were generated in Prism (GraphPad). Each peptide antigen was printed six times per array, and each antibody was screened on two separate arrays.

Chromatin immunoprecipitation (ChIP-seq) and analysis

Wild-type and lentiviral shRNA particle transduced 12Z cells were treated with 1% formaldehyde in growth media for 10 min at ambient temperature. Formaldehyde was quenched by the addition of 0.125 M glycine and incubation for 5 min at room temperature, followed by PBS wash and scraping. 1 × 10^7 crosslinked cells were used for each ChIP, and each antibody and condition for ChIP was performed in duplicate. Chromatin from crosslinked cells was fractionated by digestion with micrococcal nuclease using the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) according to the manufacturer protocol, followed by 30 s of sonication. ChIP was then performed according to the SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling) with the addition of 5 mM sodium butyrate to preserve histone acetylation. To each 1.25 mL IP, the following antibodies were used: 1:125 anti-histone H3.3 (2D7-H1, abnova); 1:250 anti-histone H3.1/3.2 (61629, Active Motif); 1:50 anti-histone H2A.Z-acetyl (K4/K7) (D3V1I, Cell Signaling); 1:250 anti-histone H2A.Z (ab4174, abcam); 1:50 anti-acetyl-histone H4 (06-866, Millipore); 1:125 anti-histone H4K16ac (39167, Active Motif); 1:50 anti-CHD4 (D4B7, Cell Signaling); 1:250 anti-ZMYND8 (A302-089, Bethyl). Crosslinks were reversed with 0.4 mg/mL Proteinase K (Thermo Fisher) and 0.2 M NaCl at 65 °C for 2 h. DNA was purified using the ChIP DNA Clean & Concentrator Kit (Zymo).

Libraries for input (n = 1 per condition) and IP (n = 2) samples were prepared by the Van Andel Research Institute Genomics Core. Ten nanograms of material was used for input samples, and the entire precipitated sample was used for IPs. Libraries were generated using the KAPA Hyper Prep Kit (v5.16) (Kapa Biosystems). Prior to PCR amplification, end-repaired and A-tailed DNA fragments were ligated to IDT for Illumina UDI Adapters (IDT DNA Inc.). Quality and quantity of the finished libraries were assessed using a combination of Agilent DNA High Sensitivity chip (Agilent Technologies), QuantiFluor® dsDNA System (Promega), and Kapa Illumina Library Quantification qPCR assays (Kapa Biosystems). Individually indexed libraries were pooled, and 50 bp, paired-end sequencing (for ZMYND8, H3.3, H2A.Zac, and H4K16ac) or 100 bp, single-end sequencing (for CHD4, H2A.Z, and pan-H4ac) was performed on an Illumina NovaSeq 6000 sequencer using a 100-cycle sequencing kit (Illumina). Each library was sequenced to minimum read depth of 80 million reads per input library and 40 million reads per IP library. Base calling was performed by Illumina NCS v2.0, and NCS output was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.9.0.

New and re-analyzed (differential) ChIP-seq experiments were analyzed as previously described [27]. Briefly, wild-type CHD4 and differential H2A.Z and pan-H4ac ChIP-seq experiments were analyzed as single-end libraries, while wild-type ZMYND8 and differential H3.3, H2A.Zac, and H4K16ac ChIP-seq were analyzed as paired-end libraries. Raw reads for IPs and inputs were trimmed with cutadapt [80] and Trim Galore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) followed by quality control analysis via FastQC [81] and MultiQC [82]. Trimmed reads were aligned to GRCh38.p12 reference genome [90] via Bowtie2 [91] with flag “--very-sensitive.” Aligned reads were sorted and indexed with samtools [92]. Only properly paired read fragments were retained for paired-end libraries via samtools view with flag “-f 3” followed by sorting and indexing. For libraries intended for differential analyses, molecular complexity was then estimated from duplicate rates by ATACseqQC [93] and preseqR [94], and libraries were subsampled to equivalent molecular complexity within an experimental design based on these estimates with samtools. Picard MarkDuplicates (http://broadinstitute.github.io/picard/) was used to remove PCR duplicates, followed by sorting and indexing. MACS2 [95] was used to call peaks on each ChIP replicate against the respective input control. For CHD4 and ZMYND8 IPs, MACS2 called broadPeaks with FDR < 0.05 threshold and otherwise default settings. For H2A.Z and H2A.Zac IPs, MACS2 called narrowPeaks with FDR < 0.05 threshold and flags “--nomodel --extsize 146” to bypass model building. For H3.3, pan-H4ac, and H4K16ac IPs, MACS2 called broadPeaks with FDR < 0.05 threshold and flags “--nomodel --extsize 146” to bypass model building. The resulting peaks were repeat-masked by ENCODE blacklist filtering and filtered for non-standard contigs [96]. A naive overlapping peak set, as defined by ENCODE [97], was constructed by calling peaks on pooled replicates followed by bedtools intersect [98] to select for peaks of at least 50% overlap with each biological replicate.
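The naive overlap selection can be sketched in a few lines of Python. This is an illustrative simplification rather than the bedtools-based procedure used here: the function names and toy coordinates are hypothetical, intervals are treated as half-open, and the 50% criterion is assumed to be measured as a fraction of the pooled peak width covered by a single replicate peak.

```python
def overlap_bp(a, b):
    """Number of overlapping bases between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def naive_overlap(pooled, replicates, min_frac=0.5):
    """Return pooled peaks covered >= min_frac by one peak in every replicate.

    pooled: list of (chrom, start, end) peaks called on pooled reads.
    replicates: list of peak lists, one per biological replicate.
    """
    kept = []
    for chrom, start, end in pooled:
        width = end - start
        supported = all(
            any(
                c == chrom and overlap_bp((start, end), (s, e)) >= min_frac * width
                for c, s, e in rep
            )
            for rep in replicates
        )
        if supported:
            kept.append((chrom, start, end))
    return kept


# Toy coordinates (hypothetical).
pooled = [("chr1", 100, 300), ("chr1", 1000, 1400), ("chr2", 50, 150)]
rep1 = [("chr1", 120, 320), ("chr1", 1300, 1500), ("chr2", 40, 160)]
rep2 = [("chr1", 90, 250), ("chr1", 1000, 1400), ("chr2", 500, 600)]
consensus = naive_overlap(pooled, [rep1, rep2])
print(consensus)
```

In this toy case only the first pooled peak is supported by both replicates; the second is covered by less than half in replicate 1, and the third has no counterpart in replicate 2.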

ChIP-seq differential histone abundance analysis (n = 2 per condition) was performed with csaw [99]. First, a consensus peak set was constructed for each differential experiment from the union of replicate-intersecting, filtered MACS2 peak regions called in each condition. When examining the effects of ARID1A knockdown on canonical H3.1/3.2 abundance at H3.3-marked sites, the H3.3 peak set was utilized for this analysis. ChIP reads were counted in these query regions by csaw, then filtered for low abundance peaks with average log2CPM < −3. When comparing ChIP libraries, any global differences in IP efficiency observed between the two conditions were considered a result of technical bias to ensure a highly conservative analysis. As such, we employed a loess-based local normalization to the peak count matrix, as is implemented in csaw [99], to assume a symmetrical MA distribution. A design matrix was then constructed from one “condition” variable. The count matrix and loess offsets were then supplied to edgeR [100] for estimating dispersions and fitting quasi-likelihood generalized linear models for differential abundance hypothesis testing. Nearby query regions were then merged up to 500 bp apart for a maximum merged region width of 5 kb, and the most significant probability was used to represent the merged region. Finally, FDR < 0.05 threshold was used to define significant differentially abundant regions.
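The region-merging step can be illustrated with a simplified sketch. It is not the csaw implementation: the greedy left-to-right strategy, the function name, and the toy regions are assumptions made for illustration. Regions are merged when the gap to the previous region is at most 500 bp and the merged width would not exceed 5 kb, and the smallest probability represents each merged region.

```python
def merge_regions(regions, max_gap=500, max_width=5000):
    """Greedily merge sorted regions on the same chromosome.

    regions: list of (chrom, start, end, pvalue).
    Returns merged (chrom, start, end, min_pvalue) tuples.
    """
    merged = []
    for chrom, start, end, p in sorted(regions):
        if merged:
            m_chrom, m_start, m_end, m_p = merged[-1]
            close = chrom == m_chrom and start - m_end <= max_gap
            fits = max(end, m_end) - m_start <= max_width
            if close and fits:
                # Extend the current region and keep the most significant p-value.
                merged[-1] = (m_chrom, m_start, max(end, m_end), min(p, m_p))
                continue
        merged.append((chrom, start, end, p))
    return merged


# Toy regions (hypothetical coordinates and probabilities).
regions = [
    ("chr1", 1000, 1500, 0.04),
    ("chr1", 1800, 2300, 0.001),  # 300 bp gap: merged with the first region
    ("chr1", 3000, 3400, 0.2),    # 700 bp gap: starts a new region
    ("chr1", 3600, 8500, 0.01),   # small gap, but merging would exceed 5 kb
]
result = merge_regions(regions)
print(result)
```

Multiple-testing correction would then be applied to the merged regions, which is outside the scope of this sketch.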

Chromatin state modeling and optimization

The same genome-wide chromatin 18-state map of 12Z cells with or without ARID1A depletion, constructed with ChromHMM [37, 101] using total RNA, ATAC, H3K4me1, H3K4me3, H3K18ac, H3K27ac, and H3K27me3 data [27], was re-analyzed for the studies in Figs. 1, 2, and 5. A refined ChromHMM model was constructed with further addition of H3.3, H2A.Z, H2A.Zac (K4/K7), pan-H4ac (K5/K8/K12/K16), and H4K16ac features with some procedural modifications. In order to reduce technical confounders in differential chromatin state analysis between control and ARID1A-depleted cell types, we adopted an equalized binarization framework described by Fiziev et al. [102]. Briefly, the ChromHMM chromosomal signal intermediate files during BAM binarization were saved and imported into R. Feature signal values were then background-subtracted by respective control signals when available (e.g., input chromatin for ChIP; not applicable to ATAC). For each feature and cell type, those (background-subtracted) signal values were ranked, and the top n ranked binarization calls were selected, where n is the lower number of calls among the two cell types for the given feature. The result is a new equalized binarization, where each feature has the same number of “present” region calls in both cell types, per chromosome. As an example, if H3K18ac called 27,000 present regions on chromosome 1 in control cells and 35,000 present regions in ARID1A-depleted cells, then the top 27,000 regions are retained in both cell types. Chromatin state models from 5 to 40 states were then computed using the “concatenated” approach to unify both cell types for differential state comparisons. The new chromatin state model was optimized at 25 states through a strategy devised by Gorkin et al. [103], which utilizes the ChromHMM CompareModels function to compare feature emission parameters from the 40-state (most complex) model against all other simpler models, as well as a k-means clustering of emission probabilities from all models together and analyzing the goodness of fit. See Additional file 1: Fig. S6 for related analyses. Across both strategies, 25 states was observed as a threshold for >95% median maximal state correlation and goodness of fit (between-cluster vs. total sum-of-squares) relative to the most complex model.
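The equalized binarization step can be illustrated with a minimal sketch for one feature on one chromosome. It is written in Python for illustration (the procedure itself was carried out in R), the function and variable names are hypothetical, and it assumes that ranking is performed among bins already called “present” in each cell type.

```python
def equalize_calls(signal_a, signal_b, present_a, present_b):
    """Equalize the number of 'present' bins between two cell types.

    signal_*: background-subtracted signal values, one per genomic bin.
    present_*: 0/1 binarization calls for the same bins.
    Returns two new 0/1 lists with the same number of present calls.
    """
    # n is the lower number of present calls among the two cell types.
    n = min(sum(present_a), sum(present_b))

    def top_n(signal, present):
        called = [i for i, flag in enumerate(present) if flag]
        # Rank called bins by signal, strongest first, and keep the top n.
        keep = set(sorted(called, key=lambda i: signal[i], reverse=True)[:n])
        return [1 if i in keep else 0 for i in range(len(present))]

    return top_n(signal_a, present_a), top_n(signal_b, present_b)


# Toy example: 3 present bins in control, 5 in ARID1A-depleted cells.
control_signal = [5.0, 0.2, 3.1, 0.0, 4.4]
control_present = [1, 0, 1, 0, 1]
kd_signal = [6.0, 5.5, 3.0, 2.9, 1.0]
kd_present = [1, 1, 1, 1, 1]
eq_control, eq_kd = equalize_calls(
    control_signal, kd_signal, control_present, kd_present
)
print(eq_control, eq_kd)
```

After equalization both cell types carry three present calls, mirroring the H3K18ac example in which the top 27,000 regions are retained in both cell types.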

Bioinformatics and statistics

The human endometrioma vs. control endometrium genome-wide expression (Illumina BeadChips) data set [53] was retrieved from GEO accession GSE23339 and analyzed via GEO2R and limma [104,105,106]. biomaRt was used for all gene nomenclature and mouse-human ortholog conversions [107]. The cumulative hypergeometric distribution was calculated in R for enrichment tests. HOMER was used to quantify sequencing reads across sets of genomic regions including heatmaps [108]. GenomicRanges functions were used to intersect and manipulate genomic coordinates [109]. IGV [110] was used for visualizing epigenomic data across hg38 loci as MACS2 enrichment log-likelihood ratio (logLR) for ChIP-seq and ATAC-seq or FPKM for RNA-seq. Hierarchical clustering by Euclidean distance and heatmaps were generated by ComplexHeatmap [111]. ggplot2 was used for some plots in this study [112]. The statistical language R was used for various computing functions throughout this study [113].
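As an illustration of the enrichment test, the cumulative hypergeometric probability of observing at least k overlapping genes can be computed directly from binomial coefficients. The sketch below uses Python rather than R, and the function name and toy numbers are hypothetical; in R, the equivalent upper-tail probability is given by phyper(k - 1, K, N - K, n, lower.tail = FALSE).

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes without replacement from N genes,
    of which K belong to the gene set of interest."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example: universe of 10 genes, 4 in the set, 3 drawn, at least 2 overlapping.
p = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
print(p)
```

For the toy example, the probability is (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120 = 1/3.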