Background

Cohesin plays a major role in the three-dimensional organization of the genome in addition to mediating sister chromatid cohesion [1]. This conserved protein complex consists of four core components: the Structural Maintenance of Chromosomes (SMC) subunits SMC1 and SMC3, the kleisin subunit RAD21 and the HEAT-repeat containing STAG/SA subunit [2]. Another two HEAT-repeat proteins associate with cohesin. One is NIPBL, which forms a heterodimer with MAU2 and is considered the cohesin loader [3]. The other is PDS5, which can associate with WAPL to drive cohesin unloading but can also stabilize cohesin on chromatin by promoting its acetylation by the acetyltransferases ESCO1/2 and the binding of Sororin [4–7]. There are two PDS5 proteins in vertebrate cells, PDS5A and PDS5B, which are highly homologous except in their C-terminal regions [8]. Knock-out mice for either gene die before birth, suggesting that full compensation cannot be achieved [4]. The two PDS5 proteins are present in cells throughout the cell cycle, can associate with either cohesin-STAG1 or cohesin-STAG2 and can recruit ESCO1 to cohesin [8, 9]. Moreover, both must be depleted in order to alter cohesin dynamics and promote the appearance of vermicelli in MEFs and HeLa cells [5, 7]. There are also two versions of the STAG subunit, STAG1 and STAG2, for which both overlapping and distinct roles have been described in recent years [10–13]. Elimination of either STAG gene results in embryonic lethality [14, 15]. Mutations in STAG1 and STAG2 have been identified in human developmental syndromes and STAG2 is frequently mutated in cancer [16, 17]. Understanding the specific roles of the different versions of cohesin that coexist in the cell may provide important hints for the rational design of specific treatments for these patients. Importantly, several lines of evidence suggest that these disease-causing mutations alter the function of cohesin in genome folding rather than in sister chromatid cohesion.

The role of cohesin in genome architecture centers on its ability to generate chromatin loops [18, 19, 7]. According to the loop extrusion model, cohesin loads on DNA and forms a small loop that is progressively extended until the complex is released from DNA by the action of PDS5-WAPL or until it is stopped and stabilized by chromatin-bound CTCF [20–22]. This explains why cohesin largely colocalizes with CTCF genome-wide [23, 24]. The boundary and/or anchoring function of CTCF depends on its interaction with an interface formed by STAG and RAD21, the same interface that WAPL requires, but possibly also on PDS5 and ESCO1 [25, 26, 27, 7, 13]. Recent in vitro reconstitution of loop extrusion by cohesin has demonstrated the requirement for NIPBL to activate the SMC1/3 ATPase [28, 29]. In yeast, Pds5 competes with Scc2 (NIPBL ortholog) for cohesin binding, and structural studies indicate that the two proteins bind the same region in RAD21 [30, 31]. It is therefore possible that PDS5 contributes to halting extrusion at CTCF sites by preventing the interaction of cohesin with NIPBL. In addition, PDS5 may promote SMC3 acetylation by ESCO1 at these sites, as deletion of CTCF dramatically decreases cohesin acetylation [32]. While acetylated cohesin has been mapped to CTCF sites, the genome-wide localization of PDS5 proteins in mammalian cells has not been reported yet [33, 9, 34].

Recent studies have described the genome-wide distribution of cohesin variants STAG1 and STAG2 [35, 10, 36, 37, 38, 45]. However, they use a single antibody to detect the four-subunit cohesin complex, an antibody against RAD21. We have performed our experiments using antibodies against different cohesin subunits and regulators, and have also used data from other studies that employed a different set of cohesin antibodies [32, 44]. Different antibodies have different abilities to recognize their epitopes at sites in which cohesin may adopt a different conformation or be bound by different regulators [46]. While it is difficult to compare data obtained with different antibodies, the read density plots shown in Figs. 1 and 2, which compare the distribution of ChIP reads between CTCF and non-CTCF sites for each antibody in different cell lines, support the existence of non-CTCF cohesin sites and further show that cohesin-STAG2 is the preferred variant at those sites, consistent with our previous results in human cells [37] and mouse embryonic stem cells [36].

At least a fraction of the non-CTCF cohesin sites could represent loading sites, as previously suggested [41]. In WT MEFs, the more dynamic cohesin-STAG2 would be more available for loading and therefore it would be detected at these sites more frequently. In Ctcf KO MEFs, as cohesin-STAG1 becomes more dynamic [13] this variant would also occupy those sites. Conversely, when the absence of WAPL abrogates cohesin release and there is no free cohesin available, cohesin cannot be detected at those sites. One puzzling observation is that CTCF depletion decreases cohesin (SMC1) occupancy not only at CTCF sites, but also at non-CTCF sites. We reckon that the gain of STAG1 at these positions does not compensate for the loss of STAG2, a consequence of the strong reduction in STAG2 protein levels observed by immunoblot analyses in Ctcf KO MEFs. The reason for this reduction is unclear. One possibility is that NIPBL preferentially engages the STAG1 complex, which becomes more available in the absence of CTCF, and that STAG2 that cannot be loaded/stabilized on chromatin is degraded. This effect could be particularly strong in the experimental conditions used here, as extensive depletion of CTCF required 10 days of cell culture in low serum in the presence of Cre recombinase. Additional replicates for this experiment, maybe also in different experimental conditions and cell lines, will be required to validate these hypotheses.

Recently, the validity of NIPBL ChIP-seq datasets, including the one used here, and the actual existence of defined loading sites for cohesin have been called into question [47]. Whether this is the case or not, an alternative scenario, previously discussed, is that sites to which cohesin relocates in the absence of CTCF represent secondary boundaries for loop extrusion, such as those occupied by the transcriptional machinery [47, 32]. Recent reports have shown a correlation between NIPBL occupancy and the presence of RNA polII and transcriptional regulators [48–50]. Moreover, we have previously identified preferential interactions between STAG2 and transcription factors in human cells [37]. Whether these interactions recruit cohesin to enhancers/promoters or are the consequence of cohesin being retained by the transcriptional machinery remains to be clarified.

The role of Pds5 proteins in CTCF boundary function

Here we have shown that the genome-wide distribution of PDS5A and PDS5B is virtually identical and that the two proteins localize almost exclusively at CTCF-bound cohesin sites. Deletion of one or the other PDS5 paralog does not have a noticeable effect on cohesin distribution, while cohesin is moderately reduced at all sites in Pds5 DKO MEFs. Wutz et al. [7] showed that PDS5 proteins are required for CTCF boundary function and suggested that this might depend, at least in part, on exclusive binding of cohesin to PDS5 or NIPBL, as suggested by structural studies [30]. Our immunoprecipitation data confirm that human cohesin complexes bound to PDS5 do not interact with NIPBL, as suggested by results in yeast [31]. The competition of NIPBL and PDS5 for binding cohesin appears to be regulated by acetylation [51, 27]. This acetylation, in turn, may occur preferentially at CTCF sites, as it is dramatically reduced in Ctcf KO MEFs [32]. Mutation of CTCF in residues that are key for the cohesin-CTCF interaction (Y226A/F228A) decreases but does not abrogate cohesin localization at CTCF sites, suggesting the existence of additional interaction surfaces [26]. Intriguingly, the N-terminal region of CTCF contains a motif, found also in WAPL and SORORIN, that interacts with the APEAP motif in the N-terminus of PDS5 [52, 53]. Thus, a single cohesin complex could interact with the N-terminal regions of two CTCF proteins at the base of a chromatin loop: one through PDS5 and the other through the RAD21-STAG interface (Fig. 6). The latter appears to be dominant, since deletion of the PDS5-binding motif in CTCF did not reduce insulation or Hi-C peak strength in mouse ES cells [52]. Further analyses are required to address the functional consequences of the PDS5-CTCF interaction. Moreover, this study detected a preferential interaction of CTCF with PDS5A, but the APEAP motif is present in both PDS5A and PDS5B. This situation is reminiscent of what happens with STAG1 and STAG2: while the region interacting with CTCF (CES) is present in both STAG proteins, a preferential interaction of STAG1 with CTCF has been described [13]. It is therefore possible that regions in the paralogs beyond those identified as critical reinforce or hinder their interaction with CTCF. The exclusive presence of PDS5 proteins at CTCF cohesin sites supports the idea that they contribute to arresting cohesin at CTCF sites by several mechanisms that include preventing access of NIPBL, promoting ESCO1-mediated SMC3 acetylation and providing an additional interaction surface for cohesin and CTCF.

Fig. 6

Cohesin-mediated loops at CTCF convergent sites. Speculative model showing how a single cohesin complex could be arrested at the base of a chromatin loop with CTCF bound to motifs in convergent orientation. The N-terminal region of the CTCF molecule on the left would interact with PDS5 while the one on the right would bind the STAG/RAD21 interface. PDS5 binding to RAD21 would prevent the interaction with NIPBL, halting extrusion, while WAPL interactions with PDS5 and STAG/RAD21 would also be precluded, blocking release of the complex. Acetylation of the SMC3 head (Ac) would strengthen the cohesin–PDS5 interaction.

Methods

MEF isolation and culture

MEFs of the following genotypes were used in this study: Stag1 −/− [15], Stag2 f/Y; Cre-ERT2 [14], Pds5A ± , Pds5B ± , Pds5A f/f; Pds5B f/f; Cre-ERT2 [4] and Ctcf f/f [54]. Mice were housed in a pathogen-free animal facility following the animal care standards of the institution. All procedures were reviewed and approved by the required authorities (Comunidad Autónoma de Madrid). Primary MEFs were isolated from E12.5 embryos and cultured in DMEM supplemented with 20% FBS at 37 ºC under 90% humidity and 5% CO2. Conditional knock-out MEFs (Stag2 f/Y; Cre-ERT2 and Pds5A f/f; Pds5B f/f; Cre-ERT2) were cultured in medium with 1 μM 4-hydroxytamoxifen for 4 and 5 days, respectively. For CTCF elimination, a clone of immortalized Ctcf f/f MEFs was infected with Adeno-Cre viruses (University of Iowa) at 250 pfu/cell in DMEM supplemented with 2% FBS. Medium was replaced after 24 h and cells were collected 9 days later for immunoblot and chromatin immunoprecipitation analyses.

Immunoblotting

Whole cell extracts for immunoblot were prepared by resuspension in Laemmli buffer at 10,000 cells/µl, sonication and boiling for 5 min at 95 ºC. Extracts were fractionated in SDS-polyacrylamide gels and transferred to nitrocellulose membranes for 1 h at 100 V in transfer buffer I. Membranes were blocked in 5% skimmed milk in TBST and incubated with antibodies for 1–2 h in 1% BSA-TBST. Antibodies are listed in Additional file 2: Table S2. Horseradish peroxidase (HRP)-conjugated secondary antibodies (Amersham Biosciences) were used at 1:5000 dilution in blocking solution for 1 h at RT. ECL developing reagent (Amersham Biosciences) was used.

Immunoprecipitation

Whole cell extracts for immunoprecipitation (Fig. 3, Additional file 1: Fig. S4) were prepared by lysing asynchronously growing MEFs in lysis buffer [0.5% NP-40 in TBS supplemented with 0.5 mM DTT, 0.1 mM PMSF and 1X complete protease inhibitor cocktail (Roche)] on ice for 30 min followed by sonication. NaCl was then added to 0.3 M and the extract was rotated for 30 min at 4 ºC. The salt concentration was then lowered to 0.1 M NaCl by dilution and glycerol was added to 10% final concentration. Extracts were incubated with specific antibodies for 2 h at 4 ºC and rotated with 1/10 vol of protein A agarose beads for 1 h at 4 ºC. The beads were washed 6 times with 20 vol of lysis buffer and eluted in SDS-DTT gel loading buffer for 5 min at 95 ºC.

For immunoprecipitation reactions shown in Fig. 4, HeLa nuclear extracts were used. These were prepared in buffer B (20 mM K-Hepes, pH 8, 0.1 M KCl, 2 mM MgCl2, 0.2 mM EDTA, 20% glycerol, 0.5 mM PMSF, 1 mM 2-mercaptoethanol) as described [55]. For the experiment shown in Fig. 4b, 25 µl of extract were incubated with 2.5 µl of IgG (control), 2.5 µg anti-SMC1, or 1.25 µg each anti-PDS5A and anti-PDS5B, for 2 h on ice and an additional 2 h rotating at 4 ºC after adding 7.5 μl of protein A magnetic beads to the mixture. Beads were then recovered, washed with buffer B supplemented with 0.01% NP40 and boiled, and the supernatant was analyzed by immunoblotting. For the experiment in Fig. 4c, 100 µl of extract were incubated with 15 µg each of PDS5A and PDS5B antibodies (PDS5-dep) or 30 µg of rabbit IgG (mock-dep) for 2 h on ice; 30 µl of protein A-sepharose beads were then added to each mix and tubes were rotated overnight at 4 ºC. The supernatant was recovered, 1 µl was kept as input and 15 µg of anti-SMC1 were added to the rest. After 2 h on ice, 25 µl of protein A magnetic beads (Millipore) were added and the tubes were rotated for 2 h at 4 ºC. The supernatant (unbound) and immune complexes (bound) were then mixed with Laemmli buffer, boiled for 5 min at 95 ºC and analyzed by immunoblotting.

Biochemical fractionation and salt extraction

Chromatin fractionation was performed as described [56]. Cells were resuspended at 2 × 10^7 cells/mL in buffer A (10 mM HEPES pH 7.9, 10 mM KCl, 1.5 mM MgCl2, 0.34 M sucrose, 10% glycerol, 1 mM DTT, 1 mM NaVO4, 0.5 mM NaF, 5 mM β-glycerophosphate, 0.1 mM PMSF), and incubated on ice for 5 min in the presence of 0.1% Triton X-100. Low-speed centrifugation (4 min/600 g/4 °C) allowed the separation of the cytosolic fraction (supernatant) and nuclei (pellet). Nuclei were washed and subjected to hypotonic lysis in buffer B (3 mM EDTA, 0.2 mM EGTA, 1 mM DTT, 1 mM NaVO4, 0.5 mM NaF, 5 mM β-glycerophosphate, 0.1 mM PMSF) for 30 min on ice. Nucleoplasmic and chromatin fractions were separated after centrifugation (4 min/600 g/4 °C). Chromatin was resuspended in Laemmli buffer and sonicated twice for 15 s at 20% amplitude. For salt extraction experiments, chromatin fractions were either left untreated or treated with 0.5 M NaCl in modified buffer A (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 0.34 M sucrose, 10% glycerol and supplemented as above) for 30 min on ice. Solubilized proteins were separated from insoluble chromatin by low-speed centrifugation (4 min/600 g/4 °C) and prepared for immunoblotting.

Immunofluorescence

Cells grown on coverslips were pre-extracted with 0.5% Triton X-100 in CSK buffer (10 mM Pipes pH 7.0, 100 mM NaCl, 3 mM MgCl2 and 300 mM sucrose) for 5 min before fixation in 2% paraformaldehyde for 15 min at room temperature. Coverslips were blocked with 3% BSA, 0.05% Tween-20 in PBS for 30 min. Primary and secondary antibodies were diluted in blocking solution and incubated for 1 h each. DNA was counterstained with 1 µg/ml DAPI. A Leica DM6000 microscope was used to obtain grayscale images, which were later analyzed using FIJI software.

Inverse fluorescence recovery after photobleaching (iFRAP)

One wild-type MEF clone was immortalized using SV40 large T antigen and used to generate RAD21-GFP, STAG1-GFP and STAG2-GFP expressing cell lines by CRISPR/Cas9-mediated homologous recombination, as described [57]. Donor plasmids containing the C-terminus of the targeted genes with in-frame GFP were generated by Gibson Assembly. sgRNA sequences were designed using “crispr.mit.edu” (Additional file 2: Table S3) and cloned in the pX335 plasmid, which also encodes Cas9n-D10A. Plasmids were introduced in MEFs by electroporation with a Neon Transfection System (ThermoFisher) applying 2 pulses of 20 ms at 1400 V. Positive cells were selected through an Influx Cell Sorter (BD) based on the GFP signal over control cells. The resulting polyclonal population was characterized by immunofluorescence, immunoblot and immunoprecipitation (Additional file 1: Fig. S4) and used for iFRAP, selecting cells showing nuclear GFP signal. Cells were seeded in 8-well chambered coverslips (Ibidi) at 40,000 cells/cm2 48 h prior to performing the experiment. The next day media was changed to 0.1% FBS for 24 h. iFRAP was performed in a Leica TCS-SP5 (AOBS) confocal microscope (Leica Microsystems, Germany) using a 40x/1.2 NA HCX PL APO objective with immersion oil. Cells were kept in a climate chamber at 37 ºC with 5% CO2 during the experiment. Image acquisition used the HCSA software in LAS AF 2.7. Cells were photobleached with an argon laser and the recovery was monitored by live-cell imaging. Pictures were taken immediately before and after photobleaching as well as every 30 s during recovery. Videos were analyzed in FIJI using the Turboreg plug-in (http://bigwww.wpfl.ch/thevenaz/turboreg) for image alignment. For each timepoint, the difference in intensity between the bleached and unbleached areas of the cell nucleus was calculated after background subtraction, normalization to initial fluorescence (i.e., t = 0 in the unbleached area) and to total cell intensity. Statistical analysis and curve fit (non-linear regression) were carried out with GraphPad Prism.
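The per-timepoint iFRAP calculation described above can be sketched in a few lines of Python. This is only one possible reading of the normalization order, not the authors' FIJI workflow: the function name `ifrap_difference`, the input lists and the choice to divide by total cell intensity before taking the difference are all assumptions made for illustration.

```python
def ifrap_difference(bleached, unbleached, total, background):
    """Normalized (unbleached - bleached) intensity difference per timepoint.

    Hypothetical helper. All inputs are mean intensities per frame, with
    index 0 being the first frame after photobleaching (t = 0).
    """
    if not (len(bleached) == len(unbleached) == len(total) == len(background)):
        raise ValueError("all intensity series must have the same length")

    corrected = []
    for b, u, tot, bg in zip(bleached, unbleached, total, background):
        total_corrected = tot - bg  # background-subtracted whole-cell signal
        if total_corrected <= 0:
            raise ValueError("total intensity must exceed background")
        # Assumption: dividing by total cell intensity corrects for
        # fluorescence lost during acquisition.
        corrected.append(((u - bg) / total_corrected,
                          (b - bg) / total_corrected))

    # Normalize to the unbleached area at t = 0.
    u0 = corrected[0][0]
    return [(u - b) / u0 for u, b in corrected]


# Toy series: the difference decays as the two areas equilibrate.
curve = ifrap_difference(
    bleached=[10, 30, 50],
    unbleached=[90, 80, 70],
    total=[110, 110, 110],
    background=[10, 10, 10],
)
```

The resulting curve starts at 1 and decays towards 0 as bleached and unbleached molecules exchange; a curve of this kind is what would then be passed to a non-linear regression.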

ChIP sequencing and analysis

Chromatin immunoprecipitation was performed in asynchronously growing MEFs as described [39] with antibodies listed in Additional file 2: Table S2. For SMC1 and STAG2 ChIPs in Ctcf f/f ± Cre, MEFs arrested in G0 were used and around 5% of sonicated chromatin from MCF10A cells was mixed with the mouse chromatin before addition of antibodies for calibration purposes. Around 5 ng of immunoprecipitated chromatin in each sample were used for library preparation. DNA libraries were applied to an Illumina flow cell for cluster generation and sequenced on an Illumina HiSeq2000. Alignment of reads to the reference mouse genome (mm10) was performed using ‘Bowtie2’ (version 2.4.2) under default settings [58]. Duplicates were removed using GATK4 (version 4.1.9.0) and peak calling was carried out using MACS2 (version 2.2.7.1) after setting the q value (FDR) to 0.05 and using the ‘--extsize’ argument with the values obtained in the ‘macs2 predictd’ step [59]. “CTCF” and “non-CTCF” cohesin positions in MEFs (Fig. 1a) were defined using called peaks generated as indicated above from ChIP-seq data for cohesin subunits obtained in this study and a previous study from our group [39] as well as CTCF ChIP-seq data from this study and two additional studies from the Peters’ group [32, 44]. “CTCF” and “non-CTCF” cohesin positions in different human cell types (Fig. 2) were defined in the same way, merging first called peaks for each cohesin subunit and then separating these “cohesin” peaks in two clusters according to the presence/absence of CTCF signal. Motif analysis (Additional file 1: Fig. S3) was performed using MEME-ChIP with standard parameters [60]. We considered all the motifs discovered by SpaMo with a P-value cut-off < 10^-10.
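The definition of “CTCF” and “non-CTCF” cohesin positions can be illustrated with a small, self-contained Python sketch. It merges peaks called for several cohesin subunits and then splits the merged set by overlap with CTCF peaks. The function names and the use of simple interval overlap as a proxy for "presence of CTCF signal" are assumptions for illustration; the text does not specify the tool or the overlap criterion that was used.

```python
from collections import defaultdict


def merge_intervals(peaks):
    """Merge overlapping (chrom, start, end) peaks (half-open coordinates)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is not None and start < current_end:
                # Overlaps the open interval: extend it.
                current_end = max(current_end, end)
            else:
                if current_start is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        if current_start is not None:
            merged.append((chrom, current_start, current_end))
    return merged


def split_by_ctcf(cohesin_peaks, ctcf_peaks):
    """Split cohesin peaks into those overlapping a CTCF peak and the rest."""
    ctcf_by_chrom = defaultdict(list)
    for chrom, start, end in ctcf_peaks:
        ctcf_by_chrom[chrom].append((start, end))

    with_ctcf, without_ctcf = [], []
    for chrom, start, end in cohesin_peaks:
        # Quadratic scan, adequate for a sketch but not for whole genomes.
        overlaps = any(s < end and start < e for s, e in ctcf_by_chrom[chrom])
        (with_ctcf if overlaps else without_ctcf).append((chrom, start, end))
    return with_ctcf, without_ctcf


# Toy peak sets for two cohesin subunits and CTCF.
smc1 = [("chr1", 100, 200), ("chr1", 500, 600)]
stag2 = [("chr1", 150, 250), ("chr2", 10, 50)]
ctcf = [("chr1", 240, 300)]

cohesin = merge_intervals(smc1 + stag2)
ctcf_sites, non_ctcf_sites = split_by_ctcf(cohesin, ctcf)
```

For genome-scale data, a dedicated interval library or a sorted sweep would replace the quadratic overlap scan.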

For analysis of calibrated ChIP-seq, profiles for each antibody were normalized by coverage and then multiplied by the occupancy ratio OR = (W_h × IP_m)/(W_m × IP_h), where W_m and IP_m are the number of reads mapped to the mouse genome from input (W) and immunoprecipitated (IP) fractions, and W_h and IP_h are reads mapped to the human genome from the input and IP fractions used for calibrating [61]. When calibrated ChIP-seq was not available, normalization was done by RPKM. Mean read-density profiles and read-density heatmaps for different chromatin-binding proteins were generated with deepTools 3.5.0 [62]. For the data shown in Fig. 1d, we first obtained the ratio between WT and KO for CTCF and non-CTCF cohesin positions for each biological replicate pair. Then, we calculated the log2-fold change between CTCF and non-CTCF positions. Chromatin states used in Fig. 4 were generated using ChromHMM [63]. CTCF, H3K27ac, H3K4me3, H3K4me1, H3K27me3, POLII and input datasets were used to generate 8 or 15 hidden Markov model-states. In the end, we used the model with 8 states, but splitting the enhancer state in two based on the 15-state model. For the correlation of PolII with NIPBL and CTCF (Fig. 4f), active promoters were selected as those with an average signal for H3K4me3 in ±2.5 kb around the TSS > 1. This corresponds to 13,290 TSSs out of the 24,388 total number of TSSs (used in the heatmaps of Fig. 4e).
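As a worked illustration of the calibration step, the following Python sketch computes the occupancy ratio from the four read counts and applies it to a coverage-normalized profile. It also shows one way to compute the log2-fold change used for Fig. 1d. The function names are hypothetical, and the direction of the final ratio (CTCF over non-CTCF) is an assumption, since the text does not state it.

```python
import math


def occupancy_ratio(input_mouse, ip_mouse, input_human, ip_human):
    """OR = (W_h * IP_m) / (W_m * IP_h), from mapped read counts."""
    counts = (input_mouse, ip_mouse, input_human, ip_human)
    if min(counts) <= 0:
        raise ValueError("all read counts must be positive")
    return (input_human * ip_mouse) / (input_mouse * ip_human)


def calibrate(profile, ratio):
    """Scale a coverage-normalized profile by the occupancy ratio."""
    return [value * ratio for value in profile]


def log2_fc_ctcf_vs_non_ctcf(wt_ctcf, ko_ctcf, wt_non_ctcf, ko_non_ctcf):
    """log2 of (WT/KO at CTCF sites) over (WT/KO at non-CTCF sites).

    Assumption: CTCF positions are the numerator of the comparison.
    """
    return math.log2((wt_ctcf / ko_ctcf) / (wt_non_ctcf / ko_non_ctcf))


# Toy read counts: W_m = 1000, IP_m = 200, W_h = 50, IP_h = 20.
ratio = occupancy_ratio(input_mouse=1000, ip_mouse=200,
                        input_human=50, ip_human=20)
scaled = calibrate([2.0, 4.0], ratio)
fold_change = log2_fc_ctcf_vs_non_ctcf(8.0, 2.0, 4.0, 2.0)
```

With these toy counts the ratio is (50 × 200)/(1000 × 20) = 0.5, so every value in the profile is halved.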

ChIP-qPCR

For ChIP-qPCR, SYBR Green PCR Master Mix and an ABI Prism® 7900HT instrument (Applied Biosystems®) were used and reactions were performed in triplicate. Fold enrichment of cohesin-binding at a given position was calculated over the binding at a nearby position showing few reads in the browser (negative region). Chromosome coordinates of the validated peaks and the corresponding primers are listed in Additional file 2: Table S4.
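A minimal sketch of how fold enrichment over a negative region could be computed from triplicate Ct values is given below. The text does not state the formula that was used; the delta-Ct approach, the default amplification efficiency of 2 and the function name `fold_enrichment` are assumptions made for illustration only.

```python
from statistics import mean


def fold_enrichment(target_cts, negative_cts, efficiency=2.0):
    """Fold enrichment of a target site over a negative region.

    Assumes a delta-Ct calculation: efficiency ** (Ct_negative - Ct_target),
    using the mean Ct of the technical replicates for each region.
    """
    if not target_cts or not negative_cts:
        raise ValueError("Ct lists must not be empty")
    delta_ct = mean(negative_cts) - mean(target_cts)
    return efficiency ** delta_ct


# Toy triplicates: the target amplifies 3 cycles earlier than the
# negative region.
enrichment = fold_enrichment([24.0, 24.1, 23.9], [27.0, 27.0, 27.0])
```

A difference of three cycles at perfect efficiency corresponds to roughly an eight-fold enrichment; a primer-specific efficiency can be passed through the `efficiency` argument.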

RNA sequencing and analysis

Asynchronous MEFs (3 clones) were harvested and RNA was extracted using the RNeasy kit from Qiagen. PolyA+ RNA was purified with the Dynabeads mRNA purification kit (Invitrogen), randomly fragmented, converted to double-stranded cDNA and processed through subsequent enzymatic treatments of end-repair, dA-tailing and ligation to adapters as in Illumina’s ‘TruSeq RNA Sample Preparation Guide’ (Part # 15031047 Rev. D). The adapter-ligated library was completed by limited-cycle PCR with Illumina PE primers, applied to an Illumina flow cell for cluster generation (TruSeq cluster generation kit v5) and sequenced on a HiSeq2000 following the manufacturer’s protocols. Fastq files with 86-nt single-end sequenced reads were quality-checked with FastQC (S. Andrews, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and aligned to the mouse genome with Nextpresso executing TopHat-2.0.0 using Bowtie 0.12.7 and Samtools 0.1.16, allowing two mismatches and five multi-hits [64]. Reads were mapped to mm10 genes using HTSeq [65].

Genomic data

Genomic data generated in this study have been deposited in the GEO database (accession number GSE212151). A list of these and additional datasets used appears in Additional file 2: Table S1.