Main

The placenta, which is an integral part of the maternal–fetal interface, serves the critical function of supplying nutrition and oxygen to the developing fetus and removing waste materials and carbon dioxide. It is a transient organ that responds to developmental cues and environmental stimuli during pregnancy1,2,3. The human placenta develops during implantation when trophoblasts differentiate from the trophectoderm (TE) of the blastocyst4,5. Following implantation, the TE gives rise to the following three critical trophoblast cell types: cytotrophoblast cells (CTBs), extravillous trophoblast cells (EVTs) and multinucleated syncytiotrophoblast (STB). CTBs can act as progenitor cells that differentiate into EVTs or fuse to the multinucleated STB4g–i).

Fig. 4: Characterization of similarity of hTSCs and placental organoids when compared to placental villi in early pregnancy.

a, The schematic illustrates the lentiviral vector used for STAT5A and MITF overexpression. b, Bright-field images show hTSCs-BL-STAT5AOE and STB-BL-STAT5AOE. Scale bar: 100 μm. c, RT–qPCR analysis of STAT5A and target gene expression in STB-BL with DOX-inducible overexpression of STAT5A. Data are shown as mean ± s.d. P values by multiple unpaired two-tailed t test. n = 3 independent experiments. The dotted lines in c and e represent the baseline 1. d, Bright-field images show hTSCs-BL-MITFOE and STB-BL-MITFOE. Scale bar: 100 μm. e, RT–qPCR analysis of MITF and target gene expression in STB-BL with DOX-inducible overexpression of MITF. Data are shown as mean ± s.d. P values by multiple unpaired two-tailed t test. n = 3 independent experiments. f, Verification and visualization of eGRN enhancer-gene regulatory events near two target genes of two STB mature 2 specific TF regulators, CEBPB and FOSL2. Genomic tracks are calculated and compared with CEBPB and FOSL2 CUT&Tag, CEBPB ChIP–seq datasets, which are prepared with STB-BL and hTSC-BL cell lines. IgG mock is used for negative control. Regulatory events are highlighted with the following criteria: eGRN target gene regions (enhancers) should overlap STB-BL CUT&Tag peak(s) and/or ChIP–seq peak(s) but are depleted in hTSC-BL. g, Schematic representation of trophoblast organoids derivation. hTSC-BL are transferred to matrigel droplets on day 0 (D0) and maintained in trophoblast organoid medium to analyze on day 6 (D6). h, Bright fields and immunofluorescence analysis of trophoblast markers (CDH1 and hCG) in trophoblast organoids derived from hTSCs-BL with DOX-inducible overexpression of MITF. Scale bars: 250 μm for bright-field images, 20 μm for immunofluorescence images. i, ELISA analysis of CSH2 expression in trophoblast organoids derived from hTSCs-BL with DOX-inducible overexpression of MITF. Data are shown as mean ± s.d. P values by unpaired two-tailed t test. n = 3 independent experiments. 
CDS, coding sequence; hPGK, human phosphoglycerate kinase 1 promoter; P2A, porcine teschovirus-1 2A peptide; Puro, puromycin; TRE3GS, TRE3GS inducible promoter; Tet-on 3G, Tet-on 3G transactivator protein; WPRE, woodchuck hepatitis virus post-transcriptional regulatory element; OE, overexpression.


Overall, our findings indicate that our datasets offer a reliable reference for evaluating in vitro models recapitulating certain aspects of STB heterogeneity, regulatory mechanisms and functions during early pregnancy.

Matured STB nuclear heterogeneity and function were revealed in late pregnancy

To systematically investigate the development of human placental STB in late pregnancy, we examined our large-scale single-nucleus transcriptomic (23,981 nuclei) and epigenomic (24,692 nuclei) datasets from lSTB and lCTB nuclei. Quality control (Supplementary Fig. 7a–d) and annotation of lCTB subtypes are described in Supplementary Note 15. Gene sets previously used to classify nuclear types in early pregnancy demonstrated notable consistency in late pregnancy samples (Fig. 5a,b, Supplementary Fig. 7e,f and Supplementary Table 3). Cluster 11 and cluster 6 were classified as two nascent nuclear types of STB in late pregnancy with high expression of SH3TC2 (lSTB nascent 1 and lSTB nascent 2). PAPPA expression was substantially increased in late pregnancy and was present in five STB subclusters, including cluster 4 (lSTB premature 1-a), cluster 1 (lSTB premature 1-b), cluster 3 (lSTB mature 1-a), cluster 2 (lSTB mature 1-b) and cluster 7 (lSTB mature 1-c). In contrast, FLT1 expression was only detected in cluster 5 (lSTB mature 2-a) and partially in cluster 8 (lSTB mature 2-a), which together represent a small fraction of STB nuclei in late pregnancy (Fig. 5a and Supplementary Fig. 7e). Detailed information on the annotation of STB nuclear subtypes is shown in Supplementary Notes 16 and 17. The decrease in FLT1-positive STB nuclei may reflect the shift in environmental cues from hypoxia in early pregnancy to normoxia in late pregnancy, as well as changing developmental demands. Interestingly, the small population of FLT1-expressing lSTB nuclei showed dramatic closure of the chromatin around the FLT1 locus when compared with early pregnancy (Supplementary Fig. 8b,d). This low chromatin accessibility around the FLT1 locus during late pregnancy led us to interpret the detectable FLT1 RNA in clusters 5 and 8 as residual transcripts from an earlier pregnancy stage.

Fig. 5: lSTB mature 1 dominates the STB nuclear subclusters in late pregnancy.

a, UMAP shows CTB and STB nuclei profiled with snRNA-seq in late pregnancy. The bottom UMAP indicates the expression patterns of representative marker genes for STB nuclear subclusters in late pregnancy. The expression levels are presented with color intensities. Gene expression raw counts were normalized by depth, logarithmized and z score scaled, and finally smoothed with imputation. b, UMAP shows CTB and STB nuclei profiled with snATAC-seq in late pregnancy. The bottom UMAP indicates the gene-activity scores of representative marker genes for STB nuclear subclusters in late pregnancy. The activity scores are presented with color intensities. c, Pseudotime ordering of lSTB nuclei profiled with snRNA-seq shows the differentiation trajectory in late pregnancy inferred with the Monocle 2 DDRTree algorithm. The differentiation time is presented with color intensities. Nuclei of each cluster are aligned above and ordered by pseudotime values. d, Trajectory heatmap shows 16 typical new marker genes (rows) expressed dynamically along pseudotime ordering (columns). Pseudotime is presented with color intensities. e, Functional networks visualize the GO enrichment results. Nodes represent GO terms and DEGs, whereas edges represent a gene’s membership in the GO term. Node color, log(FC) of DEGs; edge width, significance of the enrichment test (adjusted P value). GO enrichment is tested by one-sided Fisher’s exact test and P values are adjusted for multiple comparisons with the BH method. f, smFISH staining of indicated marker genes (PAPPA, FLT1 and hCG) characterizes lSTB mature 1 and lSTB mature 2 in late pregnancy (left). Statistical analysis of the proportion of lSTB mature 1 and lSTB mature 2 (middle). Data are shown as mean ± s.d. P values by unpaired two-tailed t test. n = 6 donors. The schematic represents the distribution of lSTB mature 1-a, lSTB mature 1-b and lSTB mature 2 in the late placental STB (right).
g, UMAP shows lCTB and lSTB nuclei profiled with snRNA-seq and snATAC-seq after integration with the Python package ‘GLUE’ (Methods). h, Heatmap shows the activity of TF-target regulatory modules derived from the integration of snRNA-seq and snATAC-seq data using the FigR package. Rows and columns represent target genes and candidate TF regulators, respectively. i, TF-regulatory network construction with a combination of modules lSTB mature 1-a and lSTB mature 1-b in late pregnancy. The Domains of Regulatory Chromatin (DORC) accessibility and expression level are presented with circle sizes and color intensities, respectively. The network is built with a different method for late pregnancy.


To further uncover the molecular mechanisms driving lSTB nuclear heterogeneity and place the newly annotated lSTB nuclear subclusters in a defined trajectory, we performed pseudotime analysis (Fig. 5c,d). A linear transcriptional trajectory was revealed for all PAPPA-positive nuclei in late pregnancy. Following the order of pseudotime, we observed that the first cluster of lSTB nuclei was largely composed of the nascent nuclei that we annotated by specific expression of PDE4D, BACE2 and SH3TC2 (Fig. 5c and Supplementary Fig. 7h–j). lSTB nuclei with high expression of LAMA3, located at the end of the trajectory, were the most mature nuclear type, whereas nuclei with high BMP1 expression were located in the middle (Fig. 5c and Supplementary Fig. 7h–j). We further revealed dynamic gene expression throughout a continuum of nuclear-state transition in Fig. 5d and identified DEGs along the trajectory, including PDE4D, SH3TC2, PAPPA, BMP1, STAT5A, PTCHD4, CCDC30, INPP5D and CROT. GO enrichment analysis revealed that extracellular matrix organization (ECM) was highly associated with nuclei in the lSTB mature 1-a and lSTB mature 1-b subclusters (Fig. 5e and Supplementary Table 8).

Placenta in late pregnancy was stained with probes targeting genes PSG8, SH3TC2, LEP, PAPPA and FLT1. The smFISH results for FLT1 and PAPPA suggest that a substantial portion of STB nuclei exhibit PAPPA expression, aligning with our bioinformatic analysis (Fig. 5f and Supplementary Note 17).

The integration of snRNA-seq and snATAC-seq through analysis of peaks and gene regulatory proximity to the linear genome reveals strong connections of gene expression patterns and accessibility dynamics of regulatory elements (Fig. 5g, Supplementary Fig. 9a–g and Supplementary Note 18). Subsequently, we performed a TF-mining and TF-regulatory network analysis (FigR, v0.1.0) and identified seven unique modules of genes that are regulated by distinct TFs (Fig. 5h,i and Supplementary Figs. 10 and 11). Notably, key TFs that were enriched in the most matured and functional lSTB mature 1-a and lSTB mature 1-b subclusters, such as STAT6, STAT5A, STAT4 and MITF (module 1 and module 2 as shown in Figs. 5h and 5i), were also important in the eSTB mature 1 subcluster in early pregnancy. GO enrichment revealed a strong association between ECM and the nuclear subtypes of STB in the late pregnancy stage (Supplementary Note 19).

Discussion

Recently, nuclear-specific functions in the syncytia of various organisms, including the fungus Ashbya gossypii49, muscle fibers50 and other tissue systems, have been documented. However, the nuclear heterogeneity and differentiation in the human placental STB remain largely unexplored16,18,19,22,51. Here we leveraged the power of snRNA-seq and snATAC-seq to profile the nuclear diversity and regulatory mechanism from both transcriptomic and epigenetic perspectives. Our analysis demonstrated the dynamic STB nuclear subtypes during early and late pregnancy. It provided evidence of significant heterogeneity in the STB nuclei from a gene expression and regulation perspective, allowing us to identify a dynamic bifurcating trajectory of STB nuclei with distinct biological functions. These findings, together with the eGRNs we uncovered, lay the groundwork for further exploration of mechanisms driving human placental STB development in supporting successful pregnancy (Supplementary Note 20).

Specifically, integrative snRNA-seq and snATAC-seq analysis of STB nuclear populations in early pregnancy revealed some TFs, including STAT5A, that might serve as a driving force in guiding the differentiation of STB nuclei towards a more mature state, which is characterized by high levels of PAPPA expression. PAPPA, a serum marker used for screening developmental abnormalities in early pregnancy, was detected at increasing concentrations in the maternal circulation during pregnancy52. We anticipate that future work will clarify the relationship between the TFs we identified here and the conditions associated with PAPPA. Moreover, a direct regulatory network of FLT1 controlled by CEBPB and FOSL2 was revealed. This is especially important in the clinical context, as CEBPB was recently reported to be involved in EVT dysfunction in the severe preeclampsia placenta53,54. The abnormally overexpressed soluble FLT1 protein in the placenta, which is secreted into maternal blood, is also known to contribute to the pathogenesis of preeclampsia55. Therefore, further investigation is necessary to fully understand the regulatory roles of CEBPB and FOSL2 in FLT1 expression in the STB during pregnancy and in related diseases.

Newly established hTSCs have been considered a substantial breakthrough in the field48,56,57,58. Our previous work has already proven that hTSCs could be used as a model for human trophoblast differentiation during implantation59. Manipulation of the key STB nuclear lineage-determining TFs that we identified here, in hTSCs, could provide a better experimental model in the interpretation of human placenta STB in later stages beyond implantation.

In summary, we have presented a comprehensive characterization of STB nuclear heterogeneity during pregnancy (see the summary model in Supplementary Fig. 13). The data we generated not only enhanced our understanding of the regulatory mechanisms that govern the dynamic development of STB nuclear identities but also established a valuable framework for future studies aimed at interpreting the roles of key molecules associated with placental development and disorders in humans.

Methods

Description of human placental donors and ethical approval

The placental specimens used in this study for snRNA-seq and snATAC-seq were collected from six early pregnant women (6–9 weeks of pregnancy) and six late pregnant women (38–39 weeks of pregnancy). Three additional healthy placental tissues from early pregnant women (7–8 weeks of pregnancy) and three placental tissues from late pregnant women (38 weeks of pregnancy) were used for experimental validation (RNAscope). The placental specimens were all collected from Peking University Third Hospital. Before sequencing, the sex of each sample was identified by PCR of the following two sex-linked genes: SRY and NLGN4. This research was approved by the Ethics Committee of the Institute of Zoology, the Chinese Academy of Sciences, and the Ethics Committee of the Peking University Third Hospital under research license (2019) JLS (242-01). All patients participating in the study were healthy pregnant women without any exclusion. Every patient involved in the project signed informed consent. All the patients in our research project shared one ethical approval license number.

Library construction and sequencing

Single-nucleus isolation

The villi were separated from the early or term placenta and cut up after removing the blood with PBS. Ice-cold Nuclei EZ lysis buffer (Sigma-Aldrich, NUC-101) containing RNase inhibitor (Takara, 2313B) and protease inhibitor (Sigma-Aldrich, P8340) was added to resuspend the tissue, which was then ground with a Dounce tissue grinder (Sigma-Aldrich, D8938). The lysates were kept on ice and lysed twice. After washing with PBS, the nuclei were split in half and filtered through a 40 μm sieve to proceed with ATAC-seq and RNA-seq.

snRNA-seq

Single Cell B Chip Kit (10x Genomics, 1000074) and the nucleus suspension (600 nuclei per microliter determined by CountStar) were loaded onto the Chromium single cell controller (10x Genomics) to generate single-nucleus gel beads in the emulsion (GEMs) according to the manufacturer’s protocol. In brief, single nuclei were suspended in PBS containing 0.04% BSA. About 16,000 nuclei were added to each channel, with 8,000 target nuclei estimated to be recovered. Captured nuclei were lysed, and the released RNA was barcoded through reverse transcription in individual GEMs. Reverse transcription was performed on an S1000™ Touch Thermal Cycler (Bio-Rad) at 53 °C for 45 min, followed by 85 °C for 5 min and held at 4 °C. The cDNA was generated and then amplified, and quality was assessed using an Agilent 4200 (performed by CapitalBio Technology). Next, the snRNA-seq library was constructed using the Single Cell 3′ Library and Gel Bead Kit V3.1 according to the manufacturer’s instructions. The libraries were finally sequenced using an Illumina NovaSeq 6000 sequencer with a sequencing depth of at least 100,000 reads per nucleus with a paired-end 150 bp (PE150) reading strategy (performed by CapitalBio Technology).

snATAC-seq

Following the 10x Genomics single-cell ATAC solution, by using Chromium Chip E Single Cell Kit (1000156) and Chromium Single Cell ATAC Library and Gel Bead Kit (1000110), the nuclei in a bulk sample were partitioned into nanoliter-scale GEMs and a pool of ~750,000 10x barcodes was sampled to separately and uniquely index the transposed DNA of each nucleus. Libraries were then generated (performed by CapitalBio Technology). The libraries were sequenced using an Illumina NovaSeq sequencer with a sequencing depth of at least 25,000 read pairs per nucleus with a paired-end 50 bp (PE50) reading strategy.

Massive integration analysis of snRNA-seq and snATAC-seq from two pregnancy periods

Identification of nucleus types at a global level

To assess the consistency of nuclear types between the snRNA-seq and snATAC-seq data from early and late pregnancy, we started the analysis with the Cell Ranger ‘aggr’ command (without depth normalization) to aggregate libraries. For a quick and general analysis, we used the Python packages Scanpy v1.6.1 (ref. 34) and SnapATAC v2.2.0 (ref. 35) to perform the upstream analysis. Harmony40 was then applied to each data modality to remove the batch effect, followed by dimension reduction, clustering of nuclei and annotation with known marker genes. We used custom R code to visualize the results. To assist cluster annotation in snATAC-seq, the gene-activity score of each gene was quantified by the make_gene_matrix function from the SnapATAC package. To integrate the 24 libraries from two modalities at two pregnancy stages, the liger package (v0.5.0)41 was finally applied. The major nuclear types from different modalities and pregnancy stages were visualized by marker genes in the integration space.

DEG identification of the STB population

To understand the diversity of the STB nuclei in early and late pregnancy, we identified the DEGs according to the snRNA-seq analysis results. We first subset all STB nuclear clusters from the early and late pregnancy data and performed DEG tests at the single-nucleus level with the following three methods: Seurat FindMarkers (Wilcoxon test, default parameters), Scanpy rank_genes_groups (Wilcoxon test, default parameters) and the method from ref. 60 (analysis of variance (ANOVA), fold change (FC) > 0.5, P < 0.05). We then took the genes shared among the top 200 of each sorted DEG list. To directly visualize and compare the gene expression difference between pregnancy stages, we aggregated counts into pseudobulk RNA-seq profiles by stage and donor. We then averaged the log2-transformed counts per million reads (log2(CPM)) of the six donors per stage, plotted the values as a scatter plot and labeled the top 20 genes. GO Biological Process enrichment was analyzed with DAVID61 (knowledge base v2023q2) using the shared DEGs.
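The two bookkeeping steps in this paragraph, intersecting the top-ranked DEG lists and computing donor-level pseudobulk log2(CPM), can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the input layout (nested dictionaries of raw counts) and the pseudocount of 1 are assumptions, because the text does not specify them.

```python
import math
from collections import defaultdict

def shared_top_degs(ranked_lists, top_n=200):
    """Genes found in the top_n entries of every ranked DEG list."""
    top_sets = [set(genes[:top_n]) for genes in ranked_lists]
    return set.intersection(*top_sets)

def pseudobulk_log2_cpm(counts, donor_of, pseudocount=1.0):
    """Sum raw counts per donor, then convert to log2(CPM + pseudocount).

    counts:   {nucleus_id: {gene: raw_count}}
    donor_of: {nucleus_id: donor_id}
    The pseudocount is an assumption made here to avoid log2(0).
    """
    summed = defaultdict(lambda: defaultdict(float))
    all_genes = set()
    for nucleus, gene_counts in counts.items():
        for gene, c in gene_counts.items():
            summed[donor_of[nucleus]][gene] += c
            all_genes.add(gene)
    result = {}
    for donor, gene_counts in summed.items():
        total = sum(gene_counts.values())
        result[donor] = {
            g: math.log2(gene_counts.get(g, 0.0) / total * 1e6 + pseudocount)
            for g in all_genes
        }
    return result

def mean_across_donors(per_donor, gene):
    """Average the log2(CPM) value of one gene over all donors of a stage."""
    values = [expr[gene] for expr in per_donor.values()]
    return sum(values) / len(values)
```

Calling `pseudobulk_log2_cpm` once per pregnancy stage and then `mean_across_donors` per gene gives the two coordinates of each point in the stage-versus-stage scatter plot.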

Differential TF identification of the STB population

We applied chromVAR62 to perform a quick scan for snATAC-seq data of two pregnancy stages against the JASPAR 2020 motif database36. In brief, the genomic positions of each motif were identified in each peak. A deviation score of each motif was calculated for each nucleus, producing a deviation score by nuclei matrix. Then this matrix was z-score normalized. We used FindMarkers in the Seurat package to identify differentially enriched motifs with default parameters between early and late pregnancy. TF motifs of interest were selected from the top 50 list and further aligned and visualized with gene-activity score and gene expression data.

snRNA-seq analysis for STB subtypes in early pregnancy

Dimension reduction and graph-based clustering for nucleus-type annotation

We used the R package Seurat (v3)63 to conduct the general upstream analysis. Generally, we used principal component analysis for dimension reduction with a maximum dimension of 30. We then performed graph-based clustering by the ‘Louvain’ algorithm with a starting resolution of 0.9. The final resolution was chosen after iterative tuning to best capture the biological complexity of the placenta sample. We annotated each cluster by marker gene expression.

Differential gene expression analysis and GO enrichment analysis

To identify DEGs, we applied Seurat FindAllMarkers with parameters of min.pct = 0.1, logfc.threshold = 0.25. The top 50 DEGs were visualized by heatmap. clusterProfiler (v3.18.1) was applied for GO enrichment analysis and the results were visualized by a custom R script. All DEGs were saved and used for further analysis.

Differentiation trajectory reconstruction

We used the R package Monocle2 (ref. 64) to conduct pseudotime analysis. DEGs identified before were used as the informative gene set, which would encode the differentiation trajectory. To focus on the STB nuclei differentiation, we excluded the CTB nuclei and started at the fusion-competent CTB nuclei (ERVFRD-1 positive). We set the DEGs mentioned above as ordering genes for unbiased trajectory inference. The trajectory tree was then visualized as ‘DDRTree,’ and essential genes during differentiation were examined by heatmap. Each nuclear cluster type was classified along the trajectory path and visualized by custom R scripts.

snATAC-seq analysis for STB subtypes in early pregnancy

Dimension reduction and graph-based clustering for nucleus-type annotation

We applied the SnapATAC v2 (ref. 35,65) package to perform the general upstream analysis because this package uses a spectral clustering method as the dimension reduction algorithm, which can better interpret the intrinsic population structure of the STB clusters. Harmony40 was applied for batch effect removal. We also compared the dimension reduction results with cisTopic (v0.3.0) with the Latent Dirichlet Allocation (LDA) algorithm, ArchR (v1.0.1) with the iterative Latent Semantic Indexing (LSI) algorithm and Signac (v1.1.0) with the LSI algorithm (data not shown). All these tools produced similar dimension reduction results. For nuclei clustering, we applied the ‘Leiden’ algorithm with a resolution of 0.9 for the initial graph-based clustering. To obtain a reasonable number of clusters (which is essential to interpret the nuclei types), we iteratively tested the resolution to optimize the final clustering results. Several known marker genes were used to verify and annotate the clusters.

Differentially accessible regions (DARs) identification

We used the function ‘rank_genes_groups’ in the scanpy package with ‘t test’ methods to identify DARs. This function used a raw peak count matrix as the input data. We applied a strict filtering threshold of P value < 0.001 and a minimal log2(FC) of 1. The top 5,000 DARs were kept and used for further analysis.
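The filtering step described above can be illustrated with a short sketch. It assumes the test statistics have already been exported as (peak, P value, log2(FC)) tuples; the function name and the ranking rule (ascending P value, ties broken by larger log2(FC)) are assumptions, since the text states only the thresholds and the number of DARs kept.

```python
def select_dars(stats, p_cutoff=0.001, min_log2fc=1.0, top_n=5000):
    """Keep peaks with P < p_cutoff and log2(FC) >= min_log2fc, then take the top_n.

    stats: iterable of (peak_id, p_value, log2fc) tuples for one cluster.
    Ranking by P value and then by log2(FC) is an assumption of this sketch.
    """
    passing = [s for s in stats if s[1] < p_cutoff and s[2] >= min_log2fc]
    passing.sort(key=lambda s: (s[1], -s[2]))
    return [peak for peak, _, _ in passing[:top_n]]
```

For example, `select_dars(stats)` applied per cluster would return at most 5,000 peak identifiers for that cluster.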

Differentiation trajectory reconstruction

We applied a supervised trajectory inference method44 to illustrate the developmental paths of STB nuclei in early pregnancy. Briefly, we fitted a candidate trajectory path on the UMAP embedding and then assigned a pseudotime value to each nucleus. This approach is based on the hypothesis that the UMAP coordinates themselves carry the intrinsic differentiation information.
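The idea of assigning pseudotime from a path drawn on the embedding can be sketched as below. This is a simplified stand-in and not the method of ref. 44: it assumes the candidate path is given as an ordered list of 2D waypoints (for example, cluster centroids), projects each nucleus onto the nearest segment of that piecewise-linear path and reports the arc length travelled, scaled to [0, 1]. All names are hypothetical.

```python
import math

def pseudotime_along_path(points, waypoints):
    """Project 2D points onto a piecewise-linear path; return scaled arc length.

    points:    list of (x, y) UMAP coordinates, one per nucleus
    waypoints: ordered list of (x, y) coordinates defining the candidate path
    """
    segments = list(zip(waypoints, waypoints[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    if total == 0:
        raise ValueError("path must have non-zero length")
    pseudotime = []
    for p in points:
        best_dist, best_pos = float("inf"), 0.0
        travelled = 0.0
        for (a, b), length in zip(segments, lengths):
            if length == 0:
                continue  # skip duplicated waypoints
            # Fraction along the segment of the orthogonal projection, clamped to [0, 1].
            frac = ((p[0] - a[0]) * (b[0] - a[0]) + (p[1] - a[1]) * (b[1] - a[1])) / length ** 2
            frac = min(1.0, max(0.0, frac))
            proj = (a[0] + frac * (b[0] - a[0]), a[1] + frac * (b[1] - a[1]))
            d = math.dist(p, proj)
            if d < best_dist:
                best_dist, best_pos = d, travelled + frac * length
            travelled += length
        pseudotime.append(best_pos / total)
    return pseudotime
```

A practical implementation would typically also discard nuclei that lie far from the path; that filter is omitted here for brevity.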

Matching snRNA-seq and snATAC-seq clusters in early pregnancy

Calculate gene-activity score for snATAC-seq nuclei

We aggregated and counted the snATAC-seq Tn5 insertion sites within the gene body and a 2 kb extended upstream region to calculate the gene-activity score for each gene of the human gene annotation database (Gencode v41) by SnapATAC v2 function make_gene_matrix.
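As an illustration of the counting rule described above (gene body plus 2 kb upstream), the following sketch computes a gene-activity score for a single nucleus. It is not the SnapATAC implementation; the function name, the input layout and the strand-aware definition of "upstream" are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right

def gene_activity(insertions, genes, upstream=2000):
    """Count Tn5 insertions in each gene body plus an upstream extension.

    insertions: {chrom: sorted list of insertion positions} for one nucleus
    genes:      list of (name, chrom, start, end, strand) with start <= end
    The extension is applied before the start on '+' genes and after the end
    on '-' genes (an assumption about how 'upstream' is defined).
    """
    scores = {}
    for name, chrom, start, end, strand in genes:
        if strand == "+":
            lo, hi = max(0, start - upstream), end
        else:
            lo, hi = start, end + upstream
        sites = insertions.get(chrom, [])
        # Number of sorted positions falling inside the closed interval [lo, hi].
        scores[name] = bisect_right(sites, hi) - bisect_left(sites, lo)
    return scores
```

Repeating this per nucleus yields a gene-by-nucleus matrix that can be normalized and visualized alongside the expression data.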

Clusters matching with liger

Similar to the previous massive integration analysis, we applied the liger method for modality integration of the early pregnancy data. We selected the variable gene set from snRNA-seq data with var.thresh = 0.2 and set k = 35. Finally, the Integrative Non-negative Matrix Factorization (iNMF) dimension reduction matrix was calculated with all nucleus ids as the column names. Louvain clustering was then conducted with resolution = 0.8.

To visualize the integration results, we merged every liger modality dataset and built a Seurat object with both gene-activity score and gene expression data. Then we normalized the data slot with the ‘NormalizeData’ function in Seurat. We visualized the marker genes in the integration space to annotate liger clusters. snRNA-seq clusters and snATAC-seq clusters were joined and paired according to the integration clusters. A confusion matrix of cluster pairs was also made from paired nuclei with annotation labels in previous results.
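The confusion matrix of cluster pairs can be sketched as a simple tally over paired nuclei. The input format (a list of paired snATAC-seq and snRNA-seq nucleus ids plus label dictionaries) and the row-wise normalization are assumptions for illustration only.

```python
from collections import Counter

def cluster_confusion(pairs, atac_label, rna_label):
    """Count paired nuclei for every (snATAC cluster, snRNA cluster) combination.

    pairs: list of (atac_id, rna_id) tuples
    """
    return Counter((atac_label[a], rna_label[r]) for a, r in pairs)

def row_fractions(confusion):
    """Convert counts to fractions within each snATAC cluster (row)."""
    row_totals = Counter()
    for (atac_cluster, _), n in confusion.items():
        row_totals[atac_cluster] += n
    return {key: n / row_totals[key[0]] for key, n in confusion.items()}
```

A row dominated by a single entry indicates that the snATAC-seq cluster maps cleanly onto one snRNA-seq cluster.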

Quantify and visualize the between-modality alignment result

We applied a customized K-Nearest Neighbors (KNN) algorithm to quantify the nucleus-to-nucleus matching result. In brief, we first randomly picked 50 snATAC-seq nuclei in the liger integration UMAP space and then iteratively found the nearest snRNA-seq nuclei. The results were compared and visualized with a combination of snRNA-seq and snATAC-seq clusters by a scatter plot.
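A minimal sketch of this sampling check is shown below, assuming 2D UMAP coordinates stored in dictionaries keyed by nucleus id. The function names, the fixed random seed and the label-agreement summary are assumptions; the text specifies only the sample size of 50 and the nearest-neighbor search.

```python
import math
import random

def sample_nearest_rna(atac_coords, rna_coords, n_sample=50, seed=0):
    """Randomly pick snATAC nuclei and find the nearest snRNA nucleus for each.

    atac_coords, rna_coords: {nucleus_id: (umap_x, umap_y)}
    """
    rng = random.Random(seed)
    ids = sorted(atac_coords)
    picked = rng.sample(ids, min(n_sample, len(ids)))
    matches = {}
    for a in picked:
        matches[a] = min(rna_coords, key=lambda r: math.dist(atac_coords[a], rna_coords[r]))
    return matches

def label_agreement(matches, atac_label, rna_label):
    """Fraction of sampled pairs whose cluster annotations agree."""
    same = sum(atac_label[a] == rna_label[r] for a, r in matches.items())
    return same / len(matches)
```

The brute-force search is adequate for 50 query nuclei; a spatial index would be preferable for matching every nucleus.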

Peak-to-gene linkage analysis

We used the liger iNMF dimension reduction matrix to pair nuclei of the snRNA-seq data to nuclei from the snATAC-seq data by a simple nearest Euclidean distance strategy44,66. This method found the closest RNA nucleus for each ATAC nucleus in the liger iNMF space and built the nuclei pairs. Accordingly, the pairing result was used to impute the expression matrix (gene × nuclei of snRNA) to the imputedRNA matrix (gene × nuclei of snATAC), resulting in two matrices with paired nucleus ids. Then all-to-all correlation was calculated among all gene rows of the imputedRNA matrix against all peak rows of the peak count matrix (peak × nuclei of snATAC). We then filtered the correlation table with a correlation coefficient cutoff of 0.45 and a false discovery rate (FDR) cutoff of 0.0001. The resulting peak-versus-gene pairs were defined as peak-to-gene links. Then all peak-to-gene links were aligned with a combination of peak accessibility and gene expression data and visualized as a heatmap. To illustrate the local enhancer–promoter regulatory landscape of essential genes (PAPPA and FLT1), we plotted genomic regions around genes with the pseudobulk ATAC tracks matching single nuclear gene expression data.
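The pairing, imputation and correlation steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's code: embeddings and matrices are plain dictionaries, all names are hypothetical, and only the correlation cutoff is applied (the FDR filter is omitted because it requires P values that this sketch does not compute).

```python
import math

def pair_nuclei(atac_emb, rna_emb):
    """For each snATAC nucleus, find the closest snRNA nucleus in the shared space."""
    return {a: min(rna_emb, key=lambda r: math.dist(va, rna_emb[r]))
            for a, va in atac_emb.items()}

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def peak_to_gene_links(rna_expr, peak_counts, pairs, r_cutoff=0.45):
    """Correlate imputed expression with accessibility and keep pairs above r_cutoff.

    rna_expr:    {gene: {rna_id: expression}}
    peak_counts: {peak: {atac_id: count}}
    pairs:       {atac_id: rna_id} from pair_nuclei
    """
    atac_ids = sorted(pairs)
    links = []
    for gene, expr in rna_expr.items():
        # Imputed expression: each snATAC nucleus inherits the value of its paired snRNA nucleus.
        imputed = [expr[pairs[a]] for a in atac_ids]
        for peak, counts in peak_counts.items():
            r = pearson(imputed, [counts[a] for a in atac_ids])
            if r >= r_cutoff:
                links.append((peak, gene, r))
    return links
```

In practice the all-to-all comparison is usually restricted to peaks within a fixed genomic window around each gene to keep the computation tractable; whether such a window was used here is not stated in the text.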

TF mining and construction of a regulatory network

To decipher the TF regulation network for STB with a multi-omics-based strategy67, we reinterpreted the core steps with homemade R codes with slight modifications to perform the TF mining and construct a precise and robust ‘TF—cis-element—target gene’ regulatory network (Supplementary Fig. 6g). In brief, the TF motif enrichment tool i-cisTarget68 and pycistarget69 were used to scan TF motif enrichment in previously identified cluster-specific DARs (coordinates were lifted from hg38 to hg19), with default parameters except for an area under the curve (AUC) threshold of 0.001. We used the motif Position Weight Matrix (PWM) database only. For pycistarget, we used the nonredundant motif database ‘hg38_screen_v10_clust’ for clearer TF gene assignment. The TF motif enrichment results of the above mentioned two packages were collected, merged and parsed, and a maximal normalized enrichment score (NES) was assigned to each TF motif that passed the AUC threshold 3.0. According to the hypothesis that a TF regulator is a true-positive TF regulator only if the TF gene expression is positively correlated with the accessibility of motifs66, we calculated Pearson’s correlation for all TF motif NES against TF gene expression. We ranked the correlations for TF genes and visualized all selected cluster-specific TF regulators in a heatmap.

To build the ‘TF—cis-element—target gene’ regulatory network, we extracted all candidate target genes for each TF motif in the enrichment table and searched for the linked peaks (cis-element) for each target gene within all predefined peak-to-gene links. The final networks were constructed with Cytoscape (v3.9.1)70, integrating node and edge attributions according to snRNA-seq or snATAC-seq data.

Culture of hTSCs

The culture of hTSCs was performed as described previously47. Briefly, the plate was coated with 5 μg ml−1 Collagen I (Corning) at 37 °C for at least 1 h. hTSCs were cultured in hTSCs medium (DMEM/F12 (Gibco) supplemented with 0.1 mM 2-mercaptoethanol (Gibco), 0.2% FBS (Gibco), 0.5% Penicillin–Streptomycin (Gibco), 0.3% BSA (Sigma-Aldrich), 1% ITS-X supplement (Gibco), 1.5 μg ml−1 l-ascorbic acid (Sigma-Aldrich), 50 ng ml−1 Epidermal Growth Factor (EGF, MedChemExpress), 2 μM CHIR99021 (MedChemExpress), 0.5 μM A83-01 (MedChemExpress), 1 μM SB431542 (MedChemExpress), 0.8 mM valproic acid (VPA, Wako) and 5 μM Y27632 (MedChemExpress)). hTSCs were dissociated with TrypLE (Gibco) for 8 min at 37 °C, and the cells were passaged to a new Collagen I-coated plate. hTSCs were routinely passaged every 4–5 d at a 1:4–1:6 ratio.

Lentivirus production

The full-length coding sequence of STAT5A and MITF were cloned into the pLVX-TetOne-TRE3GS-MCS-2A-PURO lentiviral vector, respectively. The overexpressing lentivirus vector together with packaging plasmids pPAX2 and pMD2.G were cotransfected into HEK293T cells using a polyethyleneimine transfection protocol (Proteintech). The viral supernatants were collected at 48 and 72 h post-transfection and then filtered through a 0.45 μm polyethersulfone (PES) filter.

Generation of overexpressing hTSCs

hTSCs were preseeded into a 12-well plate at a density of 5 × 10⁴ cells per well and were infected with overexpressing lentivirus at a confluency of 30% for 12 h. Then the infected hTSCs were treated with 2 μg ml−1 puromycin for 48 h to establish stable overexpressing hTSC lines.

Differentiation of hTSCs into STB

hTSCs were grown to 80% confluence in the hTSCs medium and dissociated with TrypLE for 8 min at 37 °C. For the induction of STB, hTSCs were seeded in a six-well plate precoated with 2.5 μg ml−1 Collagen I at a density of 1 × 10⁵ cells per well and cultured in STB medium (DMEM/F12 supplemented with 0.1 mM 2-mercaptoethanol, 0.5% Penicillin–Streptomycin, 0.3% BSA, 1% ITS-X supplement, 2.5 μM Y27632, 2 μM forskolin (MedChemExpress) and 4% Knockout Serum Replacement (KSR, Gibco)). The medium of overexpressing STB was supplemented with 5 μM DOX (MedChemExpress). The medium was replaced every 2 d and the cells were analyzed on day 6.

Generation of trophoblast organoid

The establishment of trophoblast organoids was performed as described previously71. Briefly, dissociated hTSCs were suspended in 100 μl Matrigel at a density of 1 × 10⁴. One droplet per well was added to an eight-well ibidi chamber or eight droplets per well were added to a six-well plate. After setting the plate at 37 °C for 15 min, 200 µl or 2 ml hTSC culture medium was loaded per well. In this context, the hTSC culture medium is referred to as trophoblast organoid medium. The medium of overexpressing organoids was supplemented with 5 μM DOX. The medium was replaced every 2 d and the organoids were analyzed on day 6.

Comparison of in vivo and in vitro snRNA-seq

We first analyzed and annotated two datasets of snRNA-seq obtained from hTSCs-BL and hTSCs-CT30 using the Seurat algorithm, similar to the analysis we performed on placenta samples in early pregnancy. We used known marker genes, including GATA3, TEAD4, TOP2A, PCNA and ITGA6, to annotate proliferative hTSC and CTB. In addition, we annotated STB-BL and STB-CT30 subclusters using markers that were previously used to annotate eCTB fusion (ERVFRD-1), eSTB nascent (SH3TC2) and eSTB mature (CGA, PAPPA and FLT1). We used the Seurat CCA algorithm63 to identify anchor genes and integrated two snRNA-seq datasets from hTSCs-BL and hTSCs-CT30 with snRNA-seq datasets from placental villus in early pregnancy. Subsequently, we jointly annotated the clusters using marker genes and aligned them along the differentiation trajectory using Monocle2 (v2.14.0)64,72.

CUT&Tag

The experiment was conducted according to the instructions provided by Vazyme (TD904). In brief, approximately 5 × 10⁴ cells were collected and counted, then mixed with activated ConA beads and incubated overnight with primary antibodies (CEBPB: Proteintech, 23431-1-AP; FOSL2: USBiological, 351814; rabbit IgG: Abcam, ab172730). The secondary antibody (Vazyme, Ab207) was incubated at room temperature with rotation for 1 h. After removing the supernatant, pA/G-Tnp Pro and TTBL were incubated separately with rotation for 1 h each, at room temperature and at 37 °C, respectively. The supernatant was then mixed with DNA Extract Beads Pro, and the DNA was dissolved in ddH2O and used directly for PCR amplification. Subsequently, VAHTS DNA Clean Beads were added to purify the DNA, which was then subjected to next-generation sequencing on the Illumina NovaSeq platform.

ChIP–seq

ChIP–seq was performed as previously described73. Briefly, approximately 5 × 10⁶ hTSCs-BL and STB-BL cells were cross-linked with 1% formaldehyde and quenched with 125 mM glycine. Cells were lysed, and nuclei were sonicated and diluted to a final SDS concentration of 0.1%. Protein A/G magnetic beads were added and incubated at 4 °C for 1 h. Precleared chromatin was incubated with primary antibody (CEBPB, 23431-1-AP, Proteintech) or monoclonal rabbit isotype control (Abcam, ab172730) overnight at 4 °C. Protein A/G magnetic beads were then added to capture the specific chromatin complexes. The supernatants were collected with 500 mM NaCl and incubated at 65 °C overnight for reverse cross-linking. Samples were incubated with RNase A at 37 °C for 1 h and then with proteinase K at 50 °C for 1 h. The released DNA was purified with the QIAquick PCR Purification Kit and processed for sequencing library preparation. Multiplexed ChIP–seq libraries were prepared with the NEBNext Ultra II DNA Library Prep Kit (NEB, E7645S). The final multiplexed libraries were sequenced on an Illumina NovaSeq 6000 using PE150 (paired-end 150 nt) sequencing.

Analysis of CUT&Tag and ChIP–seq data

We used bowtie2 (v2.4.2) with default parameters to map raw FASTQ reads to the human (hg38) reference genome and used MACS2 to call peaks for the CUT&Tag and ChIP–seq datasets. We used the ‘reduce’ function from the R package ‘GenomicRanges’ to merge the peaks of each replicate. We then intersected the CUT&Tag peaks with the previously identified eGRN peaks, which were assigned to nearby genes by peak-to-gene links, and counted the overlapping peaks. We quantitatively summarized the validation result as the overlap percentage of these two peak sets. Finally, we visualized the regulatory events with CUT&Tag tracks, ChIP–seq tracks and eGRN TF-target peak tracks for two randomly picked genes from the eGRN target gene list.
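
The merge-and-overlap step can be sketched in plain Python as follows. This is a minimal illustration rather than the R code used for the analysis: `reduce_peaks` only mimics the basic behavior of `reduce` for unstranded peaks (overlapping or adjacent intervals on the same chromosome are merged), intervals are assumed to be half-open, and all coordinates and helper names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end); half-open coordinates are an assumption of this sketch.
Interval = Tuple[str, int, int]


def reduce_peaks(peaks: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlap_fraction(query: List[Interval], subject: List[Interval]) -> float:
    """Fraction of query intervals that overlap at least one subject interval."""
    if not query:
        return 0.0
    hits = sum(
        1
        for chrom, start, end in query
        if any(c == chrom and s < end and start < e for c, s, e in subject)
    )
    return hits / len(query)


# Hypothetical peaks from two replicates and a hypothetical eGRN peak set.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250), ("chr2", 10, 50)]
merged_peaks = reduce_peaks(rep1 + rep2)

egrn_peaks = [("chr1", 240, 300), ("chr2", 60, 90), ("chr1", 550, 560)]
validated = overlap_fraction(egrn_peaks, merged_peaks)
print(f"{validated:.1%} of eGRN peaks overlap a merged peak")
```

The nested scan is quadratic in the number of peaks. It is adequate for a toy example, whereas genome-scale peak sets would call for an interval tree or a sorted sweep.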

smFISH with RNAscope

Fresh tissues were collected and fixed in 10% neutral-buffered formalin (Sigma-Aldrich) for 18 h at room temperature, followed by dehydration in an ascending series of ethanol, clearing in xylene, embedding in paraffin and sectioning. smFISH was performed with RNA probes targeting SH3TC2, LEP, FLT1, PAPPA, PSG8, STAT5A and FOSL2 following the instructions of the RNAscope Multiplex Fluorescent Reagent Kit V2 (Advanced Cell Diagnostics, 323100). First, sections were pretreated with RNAscope hydrogen peroxide, target retrieval and protease plus to unmask target RNA and permeabilize the cells. Second, probes were prepared and hybridized to the target RNA molecules. Third, RNAscope detection reagents were used to amplify the hybridization signals through sequential hybridization of amplifiers and label probes, and the signals were stained with different fluorescent dyes. Finally, sections were counterstained with DAPI, mounted with fluorescent mounting medium and kept at −20 °C for long-term storage. Images were collected on a Zeiss LSM 880 confocal laser scanning microscope, and image processing was performed with ZEN software. Additional methods are described in detail in Supplementary Methods.

Statistics

Clustering result refinement

We tuned the clustering result for snATAC-seq data by varying the graph construction parameter k and the community discovery method (Louvain or Leiden). After fixing k = 30 and setting the clustering algorithm to ‘Leiden,’ we iterated over the community detection resolution and chose the most frequent stable cluster assignment. We then evaluated the clustering results by the Dunn index and Silhouette coefficient.
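
For reference, the two evaluation metrics can be computed as in the minimal sketch below. It assumes Euclidean distance on a low-dimensional embedding and defines the Dunn index as the smallest between-cluster pairwise distance divided by the largest within-cluster pairwise distance. The toy points, labels and function names are hypothetical and do not reproduce the implementation used in this study.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; singleton clusters score 0 by convention."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to every cluster (excluding itself).
        mean_dist = {}
        for c in clusters:
            dists = [
                math.dist(p, q)
                for j, q in enumerate(points)
                if labels[j] == c and j != i
            ]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:  # point is alone in its cluster
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest within-cluster distance."""
    inter, intra = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    if not inter or not intra or max(intra) == 0:
        raise ValueError("need at least two clusters and one non-degenerate cluster")
    return min(inter) / max(intra)


# Toy embedding: two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_labels = [0, 0, 1, 1]   # matches the geometry
bad_labels = [0, 1, 0, 1]    # splits each group across clusters

good_sil = silhouette_coefficient(toy_points, good_labels)
bad_sil = silhouette_coefficient(toy_points, bad_labels)
good_dunn = dunn_index(toy_points, good_labels)
bad_dunn = dunn_index(toy_points, bad_labels)
```

Higher values of both metrics indicate more compact and better separated clusters, so candidate assignments can be ranked by either.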

Statistical analysis was performed using GraphPad Prism software (v9.0.0) with an unpaired t test or one-way ANOVA. Results are shown as mean ± s.d.
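
To make the reported quantities concrete, the sketch below computes mean ± s.d. and the unpaired t statistic for two groups of three replicates. It assumes the equal-variance (Student) form of the test and uses hypothetical values. The P value is omitted because the Python standard library has no t-distribution function, and the analyses reported here were run in GraphPad Prism, not with this code.

```python
import math
import statistics


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t_stat, df


# Hypothetical relative expression values, n = 3 per group.
control = [1.0, 1.2, 0.8]
treated = [2.0, 2.3, 1.7]

for name, values in (("control", control), ("treated", treated)):
    print(f"{name}: {statistics.mean(values):.2f} ± {statistics.stdev(values):.2f} (s.d.)")

t_stat, df = unpaired_t(control, treated)
print(f"t = {t_stat:.3f}, d.f. = {df}")
```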

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.