Background

Although human pluripotent stem cells (hPSCs), and notably embryonic stem cells (ESCs), hold great promise for tissue regeneration, conventional hESCs are essentially classified as primed PSCs, like mouse epiblast stem cells (mEpiSCs), which exhibit lower differentiative capability [1,2,3,4], and this limits their potentially broad applicability. Significant progress has enabled the conversion of primed hESCs to the naive state of pluripotency, resembling well-characterized naive mouse ESCs (mESCs) [5,6,7,8,9,10,11]. Intriguingly, naive hPSCs can generate blastocyst-like structures in vitro under effective three-dimensional (3D) culture conditions [12]. Naive hPSCs express higher levels of specific pluripotent genes such as OCT4, NANOG, STELLA/DPPA3, and DPPA5, as well as the specific endogenous retrovirus HERVH, whereas primed hPSCs particularly express ZIC2, OTX2, and B3GAT1 [7, 8, 11, 13, 18, 19]. However, the mechanisms that regulate the specific transcription and control of these states have remained elusive. We took advantage of a single-cell, diploid chromatin conformation-capture method termed Dip-C that reveals sufficiently high-resolution genomic structures [23], and our results revealed a difference from the previous assessment of CpG frequency. There was also a similar trend with respect to primed central aggregation of additional B components. We ascertained that the active euchromatin compartmentalization at the nuclear center in the naive state and the repressive heterochromatic compartmentalization in the central region in the primed state of hESCs appeared to correspond to active or open chromatin (A) and repressed or closed chromatin (B) compartments, respectively, and that they were distinct from the A-B compartment switch reported previously [23, 24, 26, 27]. To assess whether the expression of pluripotent genes was linked to the enhancer chromatin structure, we analyzed the distribution of the enhancers at the gene loci in primed and naive hESCs.
Our data revealed that more enhancers were enriched around the loci of the naive pluripotent genes in naive hESCs, and more enhancers were distributed around the loci of primed pluripotent genes in primed hESCs (Additional file 1: Fig. S4b). Moreover, primed pluripotent genes showed overlap of the loci with the enhancers in primed hESCs, and naive pluripotent genes overlapped at the loci with enhancers in naive hESCs (Fig. 6a, Additional file 1: Fig. S8). These results are consistent with the model in which chromatin is folded to form an enhancer–promoter loop, thus facilitating transcription.

Fig. 6

Transcriptional regulation and chromatin accessibility in naive and primed states. a Radial positioning along the genome of primed or naive genes and enhancers in naive or primed hESCs. b Heatmaps of ATAC-seq signal distribution around the transcriptional start site (TSS) ± 3000 bp of expressed genes and average profiles of the enrichment at the TSS in naive or primed hESCs (two replicates are shown for each state). c 3D structural differences between the two alleles of the representative marker-gene loci in primed and naive cells. d Joint analysis by ATAC-seq, CUT&Tag, and RNA-seq of the relationships between gene expression and chromatin accessibility in naive (NANOG) and primed states (ZIC2). Genome browser tracks of RNA-seq, ATAC-seq, and CUT&Tag-seq data of CTCF, H3K9me3, and H3K27me3 at the NANOG and ZIC2 loci in naive and primed hESCs

To form an enhancer–promoter loop, a relaxed open chromatin structure is required. Mapping open chromatin using an assay for transposase-accessible chromatin (ATAC-seq) previously revealed distinct chromatin accessibility of naive and primed hPSCs [22]. To explore the consequences of the distinct genomic organization of primed and naive states and to discern whether genomic spatial organization facilitated chromatin accessibility, we performed low-cell-number-input ATAC-seq on our naive hESCs and compared them with primed hESCs. ATAC-seq detected a more open euchromatin structure in naive cells (Fig. 6b), consistent with the Dip-C analysis showing that the naive state possessed fewer chromatin contacts (Fig. 4i). We simulated the contact morphology of primed and naive pluripotent genes localized in the chromatin based on a previously reported method [39]. In addition, TADs may represent a “population average” of individual loops that differ on a cell-to-cell basis. It is acknowledged that TADs are indeed present in individual cells and that their observed hierarchy may reflect multimeric associations between individual regions within the TAD [39]. 3D chromosomal structures based on Hi-C chromosomal conformation capture data from cell populations show that TADs are largely preserved during the transition between the naive and primed states of hESCs [40], and TAD structure is also revealed by single-cell Dip-C. While enhancers are occupied by transcription factors, mediators, and cohesin, and their associated nucleosomes are marked by H3K27ac [40], histone modifications themselves may not be required for chromatin organization in these differential states. Consistent with this, loss of H3K27ac perturbs transcriptional but not 3D chromatin architectural resetting [31].

mESCs and mEpiSCs can be distinguished by active and inactive compartmental organization and switching in sub-nuclear positioning that is associated with replication timing [41], where it appears that the heterochromatin (B) compartment is located in the nuclear periphery, whereas the active euchromatin (A) compartment is in the interior in both naive mESCs and mEpiSCs. We herein showed that primed hESCs form inactive compartments enriched with heterochromatin at the nuclear center and active euchromatin at the nuclear periphery that are distinct from the active euchromatin and naive pluripotent genes localized at the nuclear center in naive hESCs. Naive pluripotent gene networks between human and mouse PSCs are not well conserved and more closely resemble their respective blastocysts [42]. In fact, the naive pluripotent state observed for mouse ESCs has been difficult to capture in hESCs, appearing to be transitory in the human embryo itself. Thus, the direct application of mouse embryology to humans has not always been successful due to fundamental developmental differences between the two species [43]. There were also some differences between the naive hESC lines derived directly from the early embryo [5, 44] and those that we employed in the present study. The naive cells we used in this work were derived from the conversion of the primed hESC line H9 by an LTR7-GFP reporter, and we maintained the HERVH hyper-activation required for human pluripotency. The expression profile of naive cells enriched using the HERVH reporter in our analysis most closely resembled the inner cell mass when compared with the naive cells obtained by Gafni et al. [5] and showed a similar XIST expression and H3K27me3 distribution pattern around the X chromosome [26]. 
Cells were permeabilized in 500 μL of ice-cold Hi-C lysis buffer (10 mM Tris, pH 8.0; 10 mM NaCl; 0.2% IGEPAL CA-630) and 100 μL of protease inhibitor (Sigma, P8340) for ≥ 15 min on ice, washed in cold Hi-C lysis buffer on ice (with centrifugation at 2500 × g for 5 min), and further permeabilized in 50 μL of 0.5% SDS at 62 °C for 10 min. The SDS was quenched by adding 145 μL of water and 25 μL of 10% Triton X-100 and incubated at 37 °C for 15 min with rotation. The cells were then digested by adding 25 μL of 10× NEB Buffer 2 and 20 μL of 25 U/μL MboI (NEB, R0147M) and incubated overnight at 37 °C with rotation. On the second day, the cells were washed with 1 mL of ligation buffer (1× T4 DNA ligase buffer, NEB B0202S) and 0.1 mg/mL BSA (NEB B9000S) and ligated in 1 mL of ligation buffer and 10 μL of 1 U/μL T4 DNA ligase (Life Tech, 15224-025) at 16 °C for 20 h.
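The quench and digestion volumes above imply particular working concentrations. The following is a minimal sketch of that arithmetic, assuming that volumes are simply additive and that the cell pellet contributes negligible volume (both are assumptions, not statements from the protocol); the variable and function names are illustrative only.

```python
# Volumes (in microlitres) and stock concentrations as stated in the protocol text.
SDS_UL = 50.0        # 0.5% SDS
WATER_UL = 145.0     # water added to quench
TRITON_UL = 25.0     # 10% Triton X-100
BUFFER_UL = 25.0     # 10x NEB Buffer 2
MBOI_UL = 20.0       # MboI at 25 U/uL


def final_concentration(stock, added_ul, total_ul):
    """Dilution formula C_final = C_stock * V_added / V_total."""
    return stock * added_ul / total_ul


# Assumption: volumes add linearly and the pellet volume is ignored.
quench_total_ul = SDS_UL + WATER_UL + TRITON_UL
digest_total_ul = quench_total_ul + BUFFER_UL + MBOI_UL

sds_percent = final_concentration(0.5, SDS_UL, quench_total_ul)
triton_percent = final_concentration(10.0, TRITON_UL, quench_total_ul)
buffer_fold = final_concentration(10.0, BUFFER_UL, digest_total_ul)
mboi_units = MBOI_UL * 25.0

print(f"quench volume: {quench_total_ul:.0f} uL, digest volume: {digest_total_ul:.0f} uL")
print(f"SDS {sds_percent:.3f}%, Triton X-100 {triton_percent:.2f}%")
print(f"NEB Buffer 2 at {buffer_fold:.2f}x, MboI {mboi_units:.0f} U")
```

Under these assumptions the digestion runs at slightly below 1× buffer in a total volume of 265 μL.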

Single-cell isolation by flow cytometry

The ligated cells (in ligation buffer) were filtered through a 40-μm cell strainer (Falcon) and sorted into 0.2-mL UV-irradiated, DNA low-bind tubes (MAXYMum Recovery, Axygen) containing lysis buffer using a FACSAria III flow cytometer (BD, 85-μm nozzle). The area-scaling factor was set; forward scatter area (FSC-A) and side scatter area (SSC-A) were used to exclude large cellular structures and debris, and side scatter width (SSC-W) was used to exclude doublets and triplets. Single cells were sorted into PCR tubes using the “1.0 drop single” sorting mode. The collected single cells were stored for several months at −80 °C.

Whole-genome amplification in Dip-C

Appreciable DNA contact information can be lost in traditional bulk Hi-C, but we greatly reduced any loss by inserting n different tags. As a result, only 1/n of input DNA contacts was lost in our study. We implemented META with n = 20 tags, and the sequences were treated according to previously described methods.

Haplotype imputation (2D)

In each round of imputation, contacts in an “evidence” set voted to impute unknown haplotypes of contacts in a “target” set. For each target contact, a list of compatible haplotype tuples was first enumerated. Each evidence contact would then vote for haplotype tuples from this list, if such contact fell within 10 Mb in L0.5 distance from the target contact and was compatible with one and only one haplotype tuple from the list. Imputation would occur if the winning haplotype tuple gathered ≥ 3 votes and ≥ 90% of all votes.
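The voting rule in this paragraph can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the data layout (a contact as a pair of leg positions plus a pair of haplotypes, with `None` for unknown) and all function names are hypothetical, and the "L0.5 distance" is assumed here to mean (√|Δx| + √|Δy|)², which the text does not define.

```python
from collections import Counter
from math import sqrt


def l_half_distance(a, b):
    """Assumed definition of the L0.5 distance between two contacts (x, y)."""
    return (sqrt(abs(a[0] - b[0])) + sqrt(abs(a[1] - b[1]))) ** 2


def is_compatible(observed, candidate):
    """An observed haplotype pair matches a candidate if every known leg agrees."""
    return all(o is None or o == c for o, c in zip(observed, candidate))


def impute_by_voting(target_pos, candidates, evidence,
                     max_dist=10_000_000, min_votes=3, min_fraction=0.9):
    """Return the winning haplotype tuple for a target contact, or None.

    candidates: list of haplotype tuples compatible with the target.
    evidence:   list of (position, haplotypes) pairs.
    """
    votes = Counter()
    for pos, haps in evidence:
        if l_half_distance(pos, target_pos) > max_dist:
            continue  # too far away to vote
        matches = [c for c in candidates if is_compatible(haps, c)]
        if len(matches) == 1:  # votes only if compatible with exactly one tuple
            votes[matches[0]] += 1
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    if count >= min_votes and count / sum(votes.values()) >= min_fraction:
        return winner
    return None


target = (1_000_000, 50_000_000)
nearby = [
    ((1_100_000, 50_100_000), (0, 0)),
    ((900_000, 49_900_000), (0, None)),
    ((1_050_000, 50_020_000), (None, 0)),
]
print(impute_by_voting(target, [(0, 0), (1, 1)], nearby))
```

Requiring both a minimum vote count and a minimum vote share means that a target with sparse or conflicting neighbors stays unimputed rather than being assigned a low-confidence haplotype.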

Special care was taken for intrachromosomal contacts because intrahomologous contacts were far more frequent than interhomologous contacts, especially at short ranges (small genomic separation). A target contact would be assumed intrahomologous without voting, if its two legs were separated by ≤ 10 Mb; otherwise, voting still occurred but a winning interhomologous vote would only be accepted if two legs were separated by ≥ 100 Mb. In addition, intrachromosomal contacts that had unknown haplotypes on both legs were not imputed.
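The special handling of intrachromosomal contacts amounts to a small decision rule, sketched below. The function name, the `"intra"`/`"inter"` labels, and the order in which the checks are applied (contacts with both legs unknown are rejected first) are assumptions made for illustration; the thresholds are those given in the text.

```python
def resolve_intrachromosomal(separation_bp, known_legs, voted_winner,
                             intra_max_bp=10_000_000, inter_min_bp=100_000_000):
    """Decide how an intrachromosomal target contact is imputed.

    separation_bp: genomic separation between the two legs.
    known_legs:    number of legs (0, 1 or 2) with a known haplotype.
    voted_winner:  "intra", "inter" or None, as returned by a voting step.
    Returns "intra", "inter", or None when the contact is left unimputed.
    """
    if known_legs == 0:
        return None  # unknown haplotypes on both legs: not imputed
    if separation_bp <= intra_max_bp:
        return "intra"  # short range: assumed intrahomologous, no voting
    if voted_winner == "inter" and separation_bp < inter_min_bp:
        return None  # interhomologous wins are accepted only at >= 100 Mb
    return voted_winner


print(resolve_intrachromosomal(5_000_000, 1, None))
print(resolve_intrachromosomal(50_000_000, 1, "inter"))
print(resolve_intrachromosomal(150_000_000, 1, "inter"))
```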

One leg as both the target and the evidence sets was estimated for the contact location. Such imputation was repeated two more times, each time with previous results as the new evidence set. Results were subsequently cleaned by removal of isolated contacts (< 2 other contacts that had the same haplotypes within 10 Mb in L0.5 distance). Finally, cleaned results were used as the evidence set to impute a target set of all interchromosomal contacts that had unknown haplotypes on both legs.
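The cleaning step (removal of isolated contacts) can be illustrated with a simple quadratic-time filter. As before, this is a sketch under assumptions: the contact representation and function names are hypothetical, and the L0.5 distance is assumed to be (√|Δx| + √|Δy|)². A practical implementation would likely use a spatial index instead of comparing all pairs.

```python
from math import sqrt


def l_half_distance(a, b):
    """Assumed definition of the L0.5 distance between two contacts (x, y)."""
    return (sqrt(abs(a[0] - b[0])) + sqrt(abs(a[1] - b[1]))) ** 2


def remove_isolated(contacts, max_dist=10_000_000, min_neighbors=2):
    """Keep contacts with at least `min_neighbors` other contacts that share
    the same haplotypes and lie within `max_dist` in L0.5 distance."""
    kept = []
    for i, (pos, haps) in enumerate(contacts):
        neighbors = sum(
            1
            for j, (other_pos, other_haps) in enumerate(contacts)
            if j != i
            and other_haps == haps
            and l_half_distance(pos, other_pos) <= max_dist
        )
        if neighbors >= min_neighbors:
            kept.append((pos, haps))
    return kept


contacts = [
    ((1_000_000, 50_000_000), (0, 0)),
    ((1_100_000, 50_100_000), (0, 0)),
    ((900_000, 49_900_000), (0, 0)),
    ((1_000_000, 50_050_000), (1, 1)),     # nearby but different haplotypes
    ((90_000_000, 120_000_000), (0, 0)),   # same haplotypes but far away
]
print(len(remove_isolated(contacts)))
```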

3D reconstruction

Simulated annealing was performed by nuc_dynamics [50].

For ATAC-seq, MACS2 was used as the peak caller, with a q-value (FDR) threshold of 0.01. Based on a dynamic Poisson distribution, MACS2 can effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction of binding sites. Only unique reads at each position were used for peak calling to reduce false-positive peaks, and statistically significant peaks were selected according to the false discovery rate calculated for the reported peaks. deepTools [51] was used for the heatmap plots. ATAC-seq peaks from all study samples were merged to create a union set of sites. Read densities were calculated for each peak in each sample, and differential peaks between naive and primed hESCs were identified by DESeq2 [52] with adjusted P ≤ 0.05 and |log2 fold change| ≥ 1.
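The differential-peak criterion (adjusted P ≤ 0.05 and |log2 fold change| ≥ 1) can be expressed as a simple filter over a results table. The sketch below assumes the results have been exported as records with hypothetical field names `peak`, `padj`, and `log2fc`; it does not call DESeq2 itself, and missing adjusted P values are treated as non-significant, which is an assumption.

```python
def is_differential(padj, log2fc, padj_max=0.05, lfc_min=1.0):
    """Apply the thresholds from the text to one peak."""
    if padj is None:  # assumption: peaks without an adjusted P value are not called
        return False
    return padj <= padj_max and abs(log2fc) >= lfc_min


def select_differential(results):
    """Return the identifiers of peaks that pass both thresholds."""
    return [row["peak"] for row in results
            if is_differential(row["padj"], row["log2fc"])]


# Hypothetical example records, not data from the study.
example = [
    {"peak": "peak_1", "padj": 0.001, "log2fc": 2.3},
    {"peak": "peak_2", "padj": 0.20, "log2fc": 3.0},    # fails adjusted P
    {"peak": "peak_3", "padj": 0.01, "log2fc": -0.4},   # fails fold change
    {"peak": "peak_4", "padj": 0.05, "log2fc": -1.0},   # passes at both boundaries
    {"peak": "peak_5", "padj": None, "log2fc": 5.0},    # no adjusted P value
]
print(select_differential(example))
```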

CUT&Tag: FastQC v0.11.9 was used for quality control of the raw sequencing reads. TrimGalore v0.6.6 (https://github.com/FelixKrueger/TrimGalore) was used to remove low-quality bases and adapter sequences from the raw reads. The filtered reads were aligned with Bowtie2 v2.4.4 to the mouse reference genome assembly mm10 for mouse samples and to the human genome assembly GRCh38 for human samples, with the options --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700. Aligned BAM files were sorted by chromosomal coordinate with the sort function of samtools v1.13. The genomecov function of bedtools v2.30 was used to summarize the sorted BAM files into bedGraph files (Quinlan et al., 2010) [53]. For samples with multiple biological replicates, the replicate-specific bedGraph files were combined with the unionbedg function of bedtools v2.30. Peaks were called on all bedGraph files with SEACR v1.3 in stringent mode, selecting the top 1% of peaks. SEACR was developed specifically for CUT&RUN and is also a recommended pipeline for chromatin profiling data with very low background, such as CUT&Tag. Visual QC of the BAM files and called peaks was performed using SeqMonk [54].
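For reference, the alignment options listed above correspond to a Bowtie2 paired-end invocation along the following lines. This sketch only assembles and prints the command; the index prefix, file names, and thread count are hypothetical placeholders, and the actual pipeline used in the study may have differed in its remaining arguments.

```python
import shlex


def bowtie2_command(index_prefix, reads_1, reads_2, out_sam, threads=4):
    """Build a Bowtie2 paired-end command using the options named in the text."""
    return [
        "bowtie2",
        "--end-to-end", "--very-sensitive",
        "--no-mixed", "--no-discordant",
        "--phred33",
        "-I", "10",    # minimum fragment length
        "-X", "700",   # maximum fragment length
        "-p", str(threads),
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "-S", out_sam,
    ]


# Hypothetical file names for illustration only.
cmd = bowtie2_command("GRCh38_index", "sample_R1.trimmed.fq.gz",
                      "sample_R2.trimmed.fq.gz", "sample.sam")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps file names with spaces safe if the list is later passed to `subprocess.run`.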