Introduction

The capacity to differentiate into derivatives of all three embryonic germ layers is the central defining feature of all pluripotent stem cells (PSC), but assessing this property remains a challenge for human cell lines. PSC were first recognized as embryonal carcinoma (EC) cells in teratocarcinomas, germ cell tumors that also contain a wide array of somatic tissues1,2,3,4. In a classic experiment, using a teratocarcinoma of the laboratory mouse characterized by Stevens5, Kleinsmith and Pierce6 provided the first functional demonstration of pluripotency by showing that single cells from ascites-grown embryoid bodies (EBs) could generate tumors containing EC cells together with somatic tissues. The connection between teratocarcinoma and normal embryos was subsequently established by experiments showing that embryos transplanted to extra-uterine sites inevitably develop into teratomas or retransplantable teratocarcinomas7,8. The discovery that murine EC cells can participate in embryonic development when transferred to early mouse embryos to give rise to chimeric mice9 led to the realization that EC cells have the developmental capacity of cells of the inner cell mass. This laid the groundwork for the derivation of embryonic stem (ES) cells from mouse embryos10,11 and later from human embryos12 and of induced PSC (iPSC) from differentiated human cells13,14.

In assessing mouse ES or iPS cell lines, pluripotency is functionally defined by the cell line’s ability, when transferred to a preimplantation embryo, to form a chimeric animal in which all of the somatic tissues and the germ line include participating cells from the PSC. However, for human PSC, be they ES cells or induced pluripotent stem cells (iPSC)13,14, this fundamental assay is not available. Moreover, a variety of well-characterized PSC, from both mice and primates, have only a limited ability to participate in chimera formation, even though they can differentiate into tissues of all three germ layers in teratoma and in vitro assays15. With the advent of technologies for producing large numbers of human PSC16,17, some destined for clinical applications, the need for rapid and convenient assays of a specific PSC’s pluripotency and differentiation competence has become paramount.

The purpose of this study was to provide an authoritative assessment of several established alternative techniques for determining the developmental potential of human PSC lines. The PluriTest® assay18 (www.pluritest.org) is a bioinformatics assay in which the transcriptome of a test cell line is compared to the transcriptome of a large number of cell lines known to be pluripotent. This test can be carried out rapidly with small numbers of cells, an important consideration in the early stages of establishing new PSC lines. PluriTest is able to exclude cells that differ substantially from undifferentiated stem cells, but does not directly assess differentiation capacity. Complementing PluriTest’s focus on the undifferentiated state, various methods have been developed to monitor differentiation of the PSCs themselves in vitro, including protocols that induce spontaneous differentiation of cells in either monolayer or suspension culture, or directed differentiation under the influence of specific growth factors and culture conditions that promote the emergence of particular lineages19,20. One of the most common approaches has been the use of differentiation in suspension culture, in which clusters of cells undergo differentiation to form embryoid bodies (EB), often with some internal structure apparent21. EB differentiation has also been combined with gene expression profiling and bioinformatic quantification of gene signatures, giving rise to the pluripotency scorecard assay22. Further development of this scorecard defined a panel of 96 genes that identified the differentiation capacity of a given cell line more quantitatively than the typical histology-based teratoma assay23. The teratoma assay has long been regarded as the ‘gold standard’ for assessing human PSC pluripotency. Not only do truly pluripotent cells generate a very wide array of derivatives in these tumors, but they are also often organized into organoid structures reminiscent of those that appear during embryonic development24. However, both the production of teratomas as xenografts and their detailed analysis, which requires appropriately trained specialists, are costly and time consuming, and may be limited by concerns over animal welfare. Moreover, the teratoma assay, as routinely performed, does not yield quantitative information on lineage differentiation potential25, although gene expression analysis of the teratomas themselves can supply more definitive analysis.

In the current International Stem Cell Initiative (ISCI) study, following discussion at an ISCI workshop attended by about 100 members of the human PSC research community, we carried out a comparison of these approaches for assessing pluripotency by conducting a series of assays with human PSC lines, both ES and iPS cells. PluriTest was used to assess the transcriptome of the undifferentiated cell lines. For the EB assay, we chose one widely used approach, the ‘Spin EB’ system21 and used an adapted lineage scorecard methodology22 to assess the results. The Spin EB method provides for control of input cell number and good cell survival, and allows for differentiation under neutral conditions and under well-defined conditions expected to promote differentiation towards ectoderm, mesoderm or endoderm. Differentiation in teratomas was appraised by both histological examination and by “TeratoScore”, a computational quantitation of gene expression data derived from teratoma tissue26.

These blinded analyses, conducted by independent experts on PSC-derived samples in four highly experienced laboratories, show that each of these methods can be used to indicate pluripotency and that each is able to detect some variation in developmental potential among the cell lines. The choice of which method(s) should be used must be dictated by the biological question posed and the future use of the PSCs in question. We propose a schema outlining the choice of methodology for particular applications.

Results

Experimental design

To compare PluriTest, EB differentiation, and teratoma assays under conditions that would reflect variability between laboratories and cell lines, four separate, expert laboratories in four countries carried out these studies on each of three different, independent PSC lines and a fourth cell line, H9 (WA09)12, which was common to all (Supplementary Table 1). All the experimental material was processed centrally, with high-throughput RNA sequencing (RNA-seq), quantitative real-time PCR and histology, as well as bioinformatics analyses, each carried out by a single specialized laboratory. In total, we compared results from 13 PSC lines (seven ESC and six iPSC lines).

Genetic integrity

It has been suggested that karyotypically variant PSC might be associated with persistence of undifferentiated cells in xenograft tumors27,28. As an important adjunct to the differentiation studies, we took several approaches to assess the genetic integrity of the cell lines. Prior to initiating the experiments, the four test laboratories confirmed that the cell lines they planned to use had normal diploid karyotypes, excepting NIBSC5, which carried a gain of the chromosome 20q amplicon that has been previously described29. Gene expression data also permitted evaluation of the genetic integrity of the cell lines at the time they were used in the experiments. Over- or under-representation of specific regions of the genome in the undifferentiated PSC lines was evaluated using expression karyotyping (e-Karyotyping)30. Of the 13 cell lines, only one, the ES cell line MEL1 INSGFP/w, showed an aberrant e-karyotype containing extra copies of chromosomes 12 and 17 (Fig. 1a). These discrepancies from the test laboratories’ reports for NIBSC5 and MEL1 INSGFP/w most likely reflect the sensitivities of different assays for detecting low level genetic mosaicism31 and the propensity of variants to overgrow the culture rapidly once they appear32. Consistent with this interpretation, the MEL1 INSGFP/w line is known to exhibit karyotypic instability in culture (RM, EGS, and AGE, unpublished results). Because of the heterogeneous cell composition of teratomas, a different methodology is required to evaluate the chromosomal integrity of the cells comprising them. eSNP-karyotyping enables a direct analysis of chromosomal aberrations by calculating the expression ratio of SNPs, making it less sensitive to global gene expression changes between different samples33. eSNP-karyotyping of the teratomas indicated that most remained karyotypically diploid, but also revealed that teratomas derived from NIBSC5 had additional copies of chromosome 12 (and perhaps 20), and that teratomas derived from MEL1 INSGFP/w carried an additional copy of chromosome 17, but not chromosome 12 (Fig. 1b). Extra copies of human chromosomes 12, 17, and 20 are recurrent changes in cultured PSCs, and have also been reported in human germ cell tumors29. These changes likely reflect a selective advantage conferred by extra copies of genes on these chromosomes to cells grown either in vitro or in vivo34,35. Taken together, our results suggest that cultures of NIBSC5 and MEL1 INSGFP/w, but of none of the other 11 lines, were initially mosaic, containing low levels of variant cells.

Fig. 1

Detection of chromosomal aberrations in PSC and tumors using e-Karyotyping and eSNP-karyotyping. a e-Karyotyping: each line depicts the moving average plots of global gene expression in 13 different cell lines over 300-gene bins. The gene expression of 12 cell lines (black lines) was close to the total mean, suggesting a normal karyotype. In contrast, all replicates of the MEL1 INSGFP/w (cyan) cell line showed considerable upregulation of genes from both chromosome 12 and chromosome 17, suggesting that it harbors an additional copy of these chromosomes. b eSNP-karyotyping: detection of chromosomal aberrations in tumors using eSNP-karyotyping. Each line depicts the moving average (over 151 SNPs) of gene expression generated from RNA-seq data of tumors derived from 13 different cell lines (one plot per source cell line). Colors represent tumor replicates. Only tumors derived from MEL1 INSGFP/w and NIBSC5 show an altered allele ratio in both replicates, suggesting an aberrant karyotype with additional copies of chromosomes 17 and 12, respectively

PluriTest analysis

PluriTest was used to assess the molecular similarity of the different undifferentiated cell lines to that of other known PSC lines. RNA samples were analyzed using the Illumina Human HT-12 v4 Expression BeadChip and subjected to the PluriTest algorithm18. PluriTest generates two summary scores from global gene expression profiles: a Pluripotency score that predicts whether a cell sample is pluripotent based on the similarity of its gene expression signature to gene expression profiles of a large collection of human PSC; and a Novelty score that detects the presence of gene expression patterns usually not associated with human PSC. A pluripotent cell line is characterized as passing the PluriTest if it simultaneously exhibits a high Pluripotency score and a low Novelty score. If the scores of a test cell line deviate from the empirically determined Pluripotency and Novelty thresholds, the sample is flagged for further investigation. As the original PluriTest algorithm was developed for an older Illumina BeadChip platform, it was adapted to a new platform using the H9 samples from all four laboratories as a control for technical variation (Supplementary Fig. 1). Analyzing samples with the updated PluriTest script showed that at least one replicate of most lines assayed passed both PluriTest criteria (Fig. 2; Supplementary Fig. 1).

Fig. 2

PluriTest. a All PluriTest results from this study (red circles) are based on normalization to the H9 samples and were plotted on the background of the empirical density distribution of all pluripotent (red cloud) and differentiated samples (blue clouds) in the PluriTest training dataset18. b–f highlight the subsets of samples included in this study: all results from the same hPSC line (H9) cultured at each laboratory (b), and samples from Lab 1 (c), Lab 2 (d), Lab 3 (e), and Lab 4 (f). All cell lines are above the Pluripotency Score threshold (θP ≥ 20). Both replicates of two cell lines, MEL1 INSGFP/w in d and DF19-9-11T.H in e, score above the Novelty threshold (θN ≥ 1.67) and thus would be highlighted for further investigation. Three cell lines show larger differences between the novelty scores of their respective replicate samples: 201B7 in c, RM3.5C in d, and Oxford-2 in f

In the case of cell lines RM3.5 and Oxford-2, while we observed high Pluripotency Scores in both replicates (Fig. 2), there was a large difference in the Novelty Scores between the two replicates, placing one replicate above the empirical threshold for the Novelty Score (1.67). A similar result was obtained for one of the two replicates from the 201B7 cell line. The differences in Novelty Score observed between replicates could be due to technical failures of the array hybridization, or they could reflect differing extents of spontaneous differentiation in the cell line samples analyzed. Nevertheless, we concluded that all cell lines with one replicate below the empirical Novelty Score threshold passed PluriTest and are predicted to have pluripotent differentiation potential in vitro and in vivo. However, both replicates of the PSC lines DF19-9-11T.H and MEL1 INSGFP/w exceeded the empirically determined Novelty Score threshold of 1.67, thus flagging them for further investigation. Interestingly, the MEL1 INSGFP/w PSC line did have an abnormal e-Karyotype (Fig. 1a, b), providing a possible explanation for its borderline results in PluriTest.
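The decision rule described above can be sketched in a few lines. This is only an illustration, not the PluriTest implementation: the thresholds (20 and 1.67) are those quoted in the text and the Fig. 2 legend, while the function names and the example scores are hypothetical.

```python
# Thresholds quoted in the text and the Fig. 2 legend.
PLURIPOTENCY_MIN = 20.0
NOVELTY_MAX = 1.67


def replicate_passes(pluripotency, novelty):
    """A replicate passes if it reaches the Pluripotency threshold
    and stays below the Novelty threshold."""
    return pluripotency >= PLURIPOTENCY_MIN and novelty < NOVELTY_MAX


def line_status(replicates):
    """Apply the rule used in this study: a line passes if at least one
    replicate passes; otherwise it is flagged for further investigation.
    `replicates` is a list of (pluripotency, novelty) tuples."""
    if any(replicate_passes(p, n) for p, n in replicates):
        return "pass"
    return "flag"


# Hypothetical scores, for illustration only.
print(line_status([(32.1, 1.40), (30.5, 1.95)]))  # one replicate below 1.67
print(line_status([(28.0, 1.80), (27.4, 1.72)]))  # both replicates above 1.67
```

Requiring only one passing replicate mirrors the conclusion drawn in the text; a stricter rule (all replicates must pass) would be a one-word change from `any` to `all`.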

Scorecard analysis of embryoid body differentiation in vitro

The participating laboratories also subjected their cell lines to a standardized embryoid body (EB)-differentiation protocol under four different conditions: neutral, without the addition of exogenous growth factors that favored any particular lineage, and directed conditions designed to promote initial differentiation into ecto-, meso-, or endoderm lineages, respectively21. It was anticipated that these protocols would be sufficient to direct differentiation toward the germ layer of interest but would not necessarily support the generation of more mature cell types. Lysates from the resulting EBs were examined by qRT-PCR at 0, 4, 10, and 16 days of differentiation for expression of 190 genes (Supplementary data 2, 3) modified from the set used by Bock et al.22, to include genes characteristically expressed in undifferentiated PSC, extraembryonic endoderm, trophectoderm, early definitive ectoderm, mesoderm, and endoderm. For each lineage and for undifferentiated cells, we picked an equal number (n = 15) of marker genes for further analysis (Supplementary Table 2), by focusing on those genes with the strongest lineage-specific upregulation of genes in our dataset (Methods section). These marker genes were generally more highly expressed in EBs cultured under the corresponding differentiation conditions, while expression of markers of undifferentiated cells gradually dropped (Fig. 3a, Supplementary Fig. 2a). Gene expression was least variable 4 days after induction of differentiation compared to other time points (Supplementary Fig. 2b, c).

Fig. 3

Differentiation potential and propensity in EBs. a The line plots show the mean log2 expression change (relative to day 0) of marker genes (Supplementary Table 3) as a function of time and averaged over all cell lines. The expression change is shown under ectoderm conditions for ectoderm markers, mesoderm conditions for mesoderm markers, endoderm conditions for endoderm markers, and across all conditions for markers of undifferentiated cells. Shaded contours indicate the minimum/maximum observed value. b A summary table of the lineage scorecard evaluation of the “propensity” (spontaneous differentiation, left) and “potential” (directed differentiation, right) for each cell line (rows) to differentiate into the respective lineage (columns). Colors and symbols indicate increased (blue) and limited (grading of lighter blues) preference for expression of lineage-specific marker genes. +++: score >3; ++: score 2–3; +: score 1–2; +/−: score <1. nd: not analyzed due to RNA failing quality control criteria. c Scatterplots contrasting the lineage score after 16 days of EB differentiation (“propensity”; x-axis) with the lineage score for teratomas derived from the same cell lines (y-axis). The lineage scores for ectoderm (left), mesoderm (center), and endoderm (right) marker expression are shown separately

The lineage scorecard analysis was carried out as described previously22 but with the refined gene set (Supplementary Table 3) and with one conceptual extension: the “potential” of cells to undergo differentiation into the three primary lineages under directed differentiation conditions was distinguished from their “propensity” to differentiate under neutral conditions. The “potential” of a cell to differentiate into a certain lineage was defined as the lineage score at 16 days of directed differentiation culture conditions. That is, ectoderm induction was used for ectoderm marker profiling, mesoderm induction for mesoderm markers, and endoderm induction for endoderm markers. The “propensity” (or inherent bias) of a cell line to undergo differentiation was calculated from the lineage scores (Methods section) of all marker sets after 16 days in neutral differentiation conditions.
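As a small illustration of how lineage scores map onto the symbols used in the Fig. 3b legend, the sketch below bins a score into +++, ++, + or +/−. The legend does not state how scores that fall exactly on a boundary (1, 2 or 3) are assigned, so the inclusive lower bounds used here are an assumption, as is the function name.

```python
def scorecard_symbol(score):
    """Map a lineage score to the symbol used in the Fig. 3b legend.

    Assumption: boundary values go to the higher bin for 1 and 2
    (score >= 2 is '++', score >= 1 is '+'), while '+++' needs score > 3
    exactly as written in the legend. None stands for 'not analyzed'.
    """
    if score is None:
        return "nd"
    if score > 3:
        return "+++"
    if score >= 2:
        return "++"
    if score >= 1:
        return "+"
    return "+/-"


# Hypothetical scores for one cell line, for illustration only.
example = {"ectoderm": 3.4, "mesoderm": 1.6, "endoderm": 0.7}
print({lineage: scorecard_symbol(s) for lineage, s in example.items()})
```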

Scorecard analysis resulted in three key observations (Fig. 3b, Supplementary Fig. 2a, b). First, in neutral culture conditions all cell lines had the propensity to upregulate ectoderm markers, but all cell lines also initiated mesoderm and endoderm expression programs, though some (KhES-1, 201B7, RM3.5C, and H9 from Labs 2 and 4) had reduced propensities to form one or both of these latter lineages, an apparent bias not recapitulated in the teratoma assay (Table 1 below). Second, ectoderm-inducing and mesoderm-inducing conditions elicited strong, homogeneous expression signatures consistent with the expected directed lineage, while endoderm-inducing conditions elicited more variable responses, depending on both the cell line and on the laboratory, a result most marked in the Oxford-2 line. Third, the data suggest that, overall, all cell lines were capable of differentiating into representatives of all three lineages, although there were differences in how well and how consistently the PSC lines responded to these specific differentiation cues.

Table 1 Histology and teratoscore comparison of xenograft tumors

Differentiation in xenograft teratomas in vivo

Each laboratory produced between one and three xenograft tumors from each cell line, by subcutaneous injection into immunodeficient mice, as described in Methods section (Supplementary Table 1). Although a common protocol was suggested for tumor production, local circumstances mandated some modifications to this protocol in each case, particularly with respect to the particular strains of mice used as hosts. After cutting each tumor into several pieces, approximately half of them were randomly selected for histology, while the other half was processed to provide RNA for RNA-seq and TeratoScore analysis.

All PSC-derived tumors were classified as teratomas, since each contained tissues identified as derivatives of the three germ layers (Fig. 4a, b). Overall, a median of 10% (range, 5–30%) of the differentiated tissues observed were of endodermal derivation, 40% (range, 10–60%) represented tissues of mesodermal origin and 45% (range, 10–80%) represented tissues of ectodermal origin (Table 1 and Fig. 4c). Cells from all three embryonic germ layers were found in the teratomas derived from both ES and iPS cell lines produced by each of the laboratories. Although all teratomas contained derivatives of the three embryonic germ layers, only a fairly narrow range of tissues was routinely identified. Neural tube-like structures, pigmented epithelium and squamous epithelium accounted for most of the ectoderm; cartilage, connective tissue, and bone for most of the mesoderm; and glandular, ductal and intestinal tissue for most of the endoderm (Fig. 4c).

Fig. 4

Histological evaluation of three embryonic germ layers and undifferentiated EC-Like and yolk sac elements in xenograft tumors. a Mucus secreting intestinal-like epithelium (End-endoderm), neural tube rosettes (Ect-ectoderm), and intervening stroma (Mes-mesoderm) (×240). b Intestinal-like epithelium (End-endoderm), surrounded by connective tissue, smooth muscle and fat cells (Mes-mesoderm). The left outer rim of mesoderm is lined by intestinal-like epithelium (End-endoderm). To the left there is pigmented epithelium (P), corresponding to retina (Ect-ectoderm), and a nest of glycogen rich squamous epidermal cells (Sq) (×120). c A summary of tissue types recorded per individual tumor piece surveyed from each laboratory; at least two pieces of each tumor were examined. d Lower magnification view of a teratoma containing undifferentiated stem cells (EC-Like, ECL), identified as embryonal carcinoma-like (ECL) cells, neural tube-like rosettes (N) and non-descript stromal cells (×120). e Higher magnification of the same xenograft. Undifferentiated ECL cells (ECL) are arranged into anastomosing cords. Dark dot-like cells are undergoing apoptosis. Compare the loosely structured chromatin of the ECL cells with the dark nuclei containing condensed chromatin in the neural rosettes (N) (×240). f Two embryoid bodies (EB) forming tubes lined by ECL cells, separated by a space from the surrounding yolk sac epithelium (YS). Both embryoid bodies contain prominent apoptotic bodies. Note the loosely textured yolk sac (YS) corresponding to the connective tissue that runs between the yolk sac and the blastocyst (magma reticulare) of early human embryos (×120). g Antibody to OCT3/4 staining ECL cell nuclei. h Antibody to the zinc-finger protein ZBTB16 reacts with the nuclei of yolk sac cells around three cylinders of ECL cells. i Antibody to SALL4 staining ECL cell nuclei and also the yolk sac (YS) cells in their vicinity

Some teratomas also contained areas of undifferentiated cells, which we designated as embryonal carcinoma-like (ECL) cells, some exhibited areas of yolk sac elements, and some contained areas of cells organized into EB-like structures (Fig. 4d–f). The histological identification of the ECL cells was confirmed by immunostaining for expression of OCT3/4 (POU5F1) (Fig. 4g) and that of the yolk sac cells by immunostaining for ZBTB16 (Fig. 4h).

[…] 6). Analysis of gene expression in the pluripotent state by PluriTest can be used as a screen to rapidly identify cells that also meet other criteria of pluripotency. PluriTest was designed to be continuously improved: as data from well-characterized training sets of cell lines that show defective differentiation or malignant behavior are added, PluriTest gains power to discriminate subtler characteristics of pluripotent cells. Meanwhile, if direct and quantitative confirmation of differentiation capacity is required, we recommend in vitro spontaneous and directed EB differentiation combined with bioinformatic scorecard analysis, which provides a rapid and facile alternative to the teratoma assay, and one that can be accepted as evidence of pluripotency for purposes of standard cell line characterization. Further, consideration of indicator gene panels that take into account key nodes in gene regulatory networks may provide better identification of differentiation outliers, and future assessment of the capacity for morphogenesis in 3D organoid-type cultures in vitro might also prove helpful. At present, we suggest that, independent of other assays used to characterize these cells, it is prudent to carry out the teratoma assay on cells destined for clinical use. Cell banks should consider this option carefully as a part of their standard characterization protocol, particularly for widely used cell lines. The application of TeratoScore provides a more quantitative approach to the readout of the teratoma assay, compared to histologic analysis alone; however, we strongly recommend further research efforts to identify in vitro surrogate biomarkers indicative of malignant potential. Future comparison of results from teratoma assays with in vitro studies, and with genomic analyses of cell lines that yield teratomas with malignant elements, may provide simpler approaches, including in vitro surrogate genetic and epigenetic biomarkers, to identify cell lines with malignant potential.

Fig. 6

Proposed strategy to analyze new human pluripotent stem cell lines, depending on the information required. To first determine whether or not a cell line is pluripotent (orange lines), its gene expression signature can be compared to those of known pluripotent cells using PluriTest. To confirm that the cell line is capable of differentiating into derivatives of all three embryonic germ layers (blue lines), in vitro embryoid body (EB) formation in ‘neutral’ differentiation conditions, or in specific lineage-promoting differentiation conditions, combined with bioinformatic scorecard analysis, should be sufficient. If necessary, differentiation to specific mature cell types may also be assessed in vivo by xenografting and teratoma formation followed by either histological analysis or RNA-seq analysis using TeratoScore. To evaluate whether the cell line in question might have malignant potential (green lines), careful examination of histological sections of the teratoma using antibodies to specific markers, or focusing the RNA-seq and TeratoScore analysis on specific markers associated with a malignant phenotype, is suggested

Methods

Cell culture

Each participating laboratory was asked to select three PSC lines (Supplementary Table 1) to analyze together with a PSC line, H9 (WA09)12, that was used in common in all laboratories. The cells were grown according to the standard conditions typically used in each participating laboratory (Supplementary Table 1).

e-Karyotyping

Gene expression profiles of the undifferentiated samples were analyzed using Illumina HT12 microarray platform as described for PluriTest, below. Annotations of the microarray platform probes were obtained from the Illumina website (http://www.illumina.com/). Probe sets were organized by their chromosomal location, and their expression values were log2-transformed. Probe sets without annotated chromosomal locations were removed. An expression threshold was defined according to the levels of the upper third highest expressing probes, and probes with lower expression were elevated to this threshold. Probe sets not expressed in over 20% of the samples were removed to decrease expression noise. To obtain a comparative value, the median of each gene expression value across all samples was subtracted from the gene’s expression value in each sample. This median also served as a baseline to examine expression bias. The 10% most variable genes, calculated by the sum of squares of relative expression value for each gene, were removed during the analysis. Data were processed using CGH-Explorer (http://www.softgenetics.com/CGHExplorer.html). A moving-average plot was generated using the moving average fit tool, with windows of 300 genes.
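The core numerical steps of this procedure (raising low values to an expression threshold, median-centering each gene across samples, and smoothing along the chromosome) can be sketched as follows. This is a minimal illustration in plain Python and not the CGH-Explorer implementation; the function names and toy numbers are hypothetical, and the inputs are assumed to be log2 values already ordered by chromosomal position.

```python
from statistics import median


def floor_to_threshold(log2_values, threshold):
    """Raise every value below the expression threshold up to the threshold."""
    return [max(v, threshold) for v in log2_values]


def median_center(matrix):
    """matrix[g][s] is the log2 expression of gene g in sample s.
    Subtract each gene's median across samples from its values."""
    centered = []
    for row in matrix:
        m = median(row)
        centered.append([v - m for v in row])
    return centered


def moving_average(values, window):
    """Moving average over consecutive genes (the text uses 300-gene windows).
    Returns len(values) - window + 1 points, or [] if the window is too wide."""
    if window <= 0 or window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Toy example: 5 genes along one chromosome, 3 samples; the third sample
# is shifted upward, as might be seen with an extra chromosome copy.
genes = [[8, 8, 9], [7, 7, 8], [9, 9, 10], [6, 6, 7], [8, 8, 9]]
centered = median_center(genes)
third_sample = [row[2] for row in centered]
print(moving_average(third_sample, window=3))
```

A sustained positive or negative moving average across a chromosome or chromosome arm in one sample, relative to the others, is the pattern interpreted as a gain or loss.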

eSNP-karyotyping

eSNP-karyotyping was performed as previously described33. Briefly, RNA-seq reads were aligned to the genome (assembly version GRCh38) using Tophat254 and SNPs were called using GATK HaplotypeCaller55. Called SNPs were filtered by read number, with SNPs expressed in <20 transcripts discarded, and minor allele frequency and allelic ratio (major to minor) were calculated for the whole transcriptome. For visualization, moving medians of the major to minor ratios were plotted along the moving medians of the chromosomal positions using a window of 151 SNPs.
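The filtering and smoothing steps can be sketched as below. This is an illustration only and not the published eSNP-karyotyping code: it assumes that the read filter applies to the total read count at each SNP and that SNPs with no minor-allele reads are skipped to avoid division by zero. The tuple layout and function names are hypothetical.

```python
from statistics import median

MIN_READS = 20   # SNPs covered by fewer reads are discarded (from the text)
WINDOW = 151     # moving-median window in SNPs (from the text)


def filter_snps(snps, min_reads=MIN_READS):
    """snps: list of (position, major_reads, minor_reads) tuples.
    Assumption: the threshold applies to major + minor reads, and SNPs
    with zero minor-allele reads are dropped."""
    return [
        (pos, major, minor)
        for pos, major, minor in snps
        if major + minor >= min_reads and minor > 0
    ]


def allelic_ratios(snps):
    """Major-to-minor allelic ratio for each retained SNP."""
    return [major / minor for _, major, minor in snps]


def moving_median(values, window=WINDOW):
    """Median over each run of `window` consecutive values."""
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Toy example with a 3-SNP window instead of 151.
snps = [(100, 15, 10), (200, 10, 5), (300, 30, 0), (400, 40, 20), (500, 24, 12)]
kept = filter_snps(snps)
print(moving_median(allelic_ratios(kept), window=3))
print(moving_median([pos for pos, _, _ in kept], window=3))
```

In a diploid region heterozygous SNPs are expected to give ratios near 1, whereas a trisomic region shifts the ratio toward 2, which is why a sustained change in the moving median along a chromosome suggests an additional copy.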

PluriTest

PluriTest analysis was performed as previously described18. We used R 3.2.1 together with lumi 2.20.2 and the original PluriTest workspace. Due to an overall shift in PluriTest results from experiments performed with newer versions of the Illumina microarray platform, we added a correction-vector to the matrices used in the computation of the Pluripotency and Novelty Scores. We used the H9 samples available from all laboratories to correct the data toward the reference H9 normalization target used in the original PluriTest algorithm (Supplementary Figure 1). The shift-vector is simply the difference between the row-wise means of the H9 samples in the current dataset and an H9 reference sample used as the normalization target in the original PluriTest implementation18. Since the shift-vector is not restricted to positive data, we relaxed the non-negativity condition and estimated the matrix calculation by replacing the multiplicative update used in the original PluriTest workspace with a standard linear regression. The modified algorithm was tested on the original training dataset to guarantee consistent results (Supplementary Figure 1). The scripts required to run the analysis are provided via the group GitHub repositories and PluriTest’s website (www.pluritest.org). The PluriTest workspace is available at https://github.com/pluritest/pluritestCompared.
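The shift-vector itself is a simple per-probe difference and can be sketched as follows. This is an illustration in Python rather than the R workspace used in the study; the function names and toy values are hypothetical, and subtracting the shift from each sample is an assumed way of applying the correction, since the text specifies only how the vector is computed.

```python
from statistics import mean


def shift_vector(h9_samples, h9_reference):
    """Per-probe difference between the mean of the H9 samples in the
    current dataset and the H9 reference sample.

    h9_samples: list of samples, each a list of probe values in the same
    probe order as h9_reference.
    """
    return [
        mean(sample[i] for sample in h9_samples) - h9_reference[i]
        for i in range(len(h9_reference))
    ]


def apply_correction(sample, shift):
    """Assumption: move a sample toward the reference by subtracting the
    shift from each probe value."""
    return [value - delta for value, delta in zip(sample, shift)]


# Toy example with two probes and two H9 samples.
current_h9 = [[10.0, 8.0], [12.0, 8.0]]
reference_h9 = [10.0, 9.0]
shift = shift_vector(current_h9, reference_h9)
print(shift)                                   # entries may be negative
print(apply_correction([11.0, 8.0], shift))
```

Because entries of the shift can be negative, corrected matrices are no longer guaranteed to be non-negative, which is the reason the text gives for replacing the multiplicative update with a standard linear regression.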

Production of size-controlled embryoid bodies

EBs were produced as previously described56. Briefly, cells were trypsinized, counted and re-seeded in 96-well U-bottomed plates at a density of 3000 cells per well in APEL media (Stem Cell Technologies, Vancouver, Canada) supplemented with factors for four differentiation conditions: neutral (without any growth factors), ectoderm (10 μM dorsomorphin, 10 μM SB431542 and 100 ng/ml basic-FGF), endoderm (100 ng/ml Activin-A, 1 ng/ml BMP4) and mesoderm (20 ng/ml Activin-A, 20 ng/ml BMP4). All growth factors were added once to the cultures at the onset of differentiation; the medium was not changed and the EBs were left in suspension for the course of the experiment. Biological replicates of each cell line were differentiated and harvested at three time points (4, 10, and 16 days) into RNAlater (Life Technologies, USA) and stored at −80 °C for future use.

RT-PCR gene expression analysis

Total RNA was extracted and purified using the PicoPure RNA Isolation Kit (Arcturus Bioscience) and quality-checked with a 2100 Bioanalyzer (Agilent Technologies). The high-capacity cDNA Archive Kit (Thermo Fisher Scientific) was used to generate cDNA representative of the polyadenylated transcriptome. Preamplification of cDNA was performed using the TaqMan PreAmp Master Mix (Thermo Fisher Scientific) following the manufacturer’s instructions with 10 cycles of amplification. Each of the two sets of 96 Delta Gene assays was pooled for priming of the preamplification reaction (i.e., two independent preamplification runs for each RNA sample). Delta Gene assays were designed and provided by the manufacturer (Fluidigm) and are listed in Supplementary data 6, 7. Real-time PCR was performed using these Delta Gene assays (Fluidigm), the preamplified cDNAs, and 96.96 Dynamic Arrays (Fluidigm) run on a Biomark HD Real-time PCR System (Fluidigm) following the Fast Gene Expression Analysis Using EvaGreen protocol (User Guide PN 68000088 J1) provided by the manufacturer (Fluidigm). Cycle Threshold (Ct) values were calculated using the instrument’s software (Application Version 4.1.2; Fluidigm).

Production of teratomas

Teratomas were generated in immunodeficient mice according to a common protocol but necessarily modified to accommodate local laboratory circumstances (Supplementary Table 1) and governed by local animal experimental rules. After a suitable growth period, the tumors were excised and divided into several pieces. To ensure representation across the tumor, a random selection of half of the pieces of each tumor was placed in RNAlater (Life Technologies, USA) and frozen at −80 °C prior to shipping for RNA-seq analysis. The remaining half of the pieces were fixed in 10% formal-saline prior to processing for histological analysis.

Histological analysis

At least two different teratoma pieces were sampled from each PSC line injected. Serial sections from 2 to 10 different pieces of each tumor were examined by two investigators who estimated the amount of differentiation into tissues derived from all three germ layers. The presence of yolk sac, embryoid bodies (EB) and undifferentiated cells, classified as embryonal carcinoma-like cells (ECL), was also noted.

Immunohistochemical staining

Sections (4 µm) from formalin-fixed, paraffin-embedded samples were subjected to immunohistochemical detection of ZBTB16, SALL4, and OCT3/4 (POU5F1). Briefly, after deparaffinization and rehydration, tissue sections were treated using either citrate buffer (ZBTB16) or Borg Decloaker (SALL4, OCT3/4, Biocare Medical, Concord, CA) for 5 min in a pressure cooker for antigen retrieval. Hydrogen peroxide (3%) was then applied to the sections to quench endogenous peroxidase activity. Sections were then incubated with primary antibodies against ZBTB16 (PLZF clone D-9; 1:50 dilution; Santa Cruz Biotechnology, Santa Cruz, CA, USA), SALL4 (Clone 954–1054; 1:100 dilution; Biocare Medical, Concord, CA, USA) and OCT3/4 (Clone SEM; prediluted; Biocare Medical, Concord, CA, USA) for 45 min. After extensive rinsing, all sections were incubated with anti-mouse HRP-labeled polymer (EnVision™+ System, Dako, Carpinteria, CA, USA) for 30 min. Finally, the staining was visualized by DAB+ (Dako, Carpinteria, CA, USA). Immunohistochemical staining was performed using the IntelliPATH FLX Automated Stainer at room temperature. A light hematoxylin counterstain was performed, following which the slides were dehydrated, cleared, and mounted using permanent mounting media.

RNA-seq analysis

RNA was purified as described in RT-PCR gene expression analysis (above) and the same teratoma RNA samples were used in both the RT-PCR and RNA-seq experiments. RNA-seq libraries were prepared using the RNA sample preparation kit v2 (Illumina) according to the manufacturer’s standard protocol. Briefly, polyadenylated RNA was first purified from total RNA with oligo-dT-attached magnetic beads using two rounds of purification. Poly(A) RNA was subsequently fragmented and primed with random hexamers for cDNA synthesis. First strand cDNA synthesis was performed for 50 min at 42 °C using SuperScript II reverse transcriptase. After second strand cDNA synthesis, multiple indexing adapters were ligated, and libraries were quality-controlled with a 2100 Bioanalyzer (Agilent Technologies), normalized and pooled prior to sequencing. Libraries were subjected to 101-base paired-end multiplex sequencing on an Illumina HiSeq 2000 in high-output mode. Samples were multiplexed (7–8 samples per lane), resulting in an average depth of 58 million reads per sample. Reads were aligned (human reference genome hg19) and transcripts counted using Tophat and Cufflinks. Data for analysis were expressed as FPKM values (Supplementary data 8, 9).
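For orientation, the FPKM unit used for these data can be illustrated with the textbook definition (fragments per kilobase of transcript per million mapped fragments). The following Python sketch is illustrative only: Cufflinks estimates FPKM with a statistical model rather than with this direct ratio, and the function name and the example numbers are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments.

    This is the plain ratio only; it is not the Cufflinks estimation procedure.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 580 fragments on a 2,000 bp transcript
# in a library of 58 million mapped fragments.
example_fpkm = fpkm(580, 2000, 58_000_000)
print(example_fpkm)
```

Normalizing by both transcript length and library size is what makes values comparable across genes and across samples sequenced to different depths.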

TeratoScore analysis

Since the original TeratoScore analysis26 was performed on DNA microarray data, it was necessary to adapt the algorithm to the analysis of RNA-seq data. Briefly, a 100-gene scorecard of tissue-specific genes representing the three embryonic germ layers and extra-embryonic membranes was established (Supplementary data 4). By comparing RNA-seq expression data of 14 human body tissues, we identified genes with high tissue specificity (expressed over 8-fold higher in a given tissue than the mean of all other tissues). The expression of these genes was then compared between human PSCs and teratomas, validating their enrichment in differentiated cells (expressed over 4-fold higher in teratomas than in PSCs). Tissue specificity was finally validated using Amazonia! (http://amazonia.transcriptome.eu), with a requirement for distinct tissue expression (an order of magnitude over all, or most, other tissues)57. The RNA-seq data utilized in this analysis were obtained from the following sources: 13 human body tissues were obtained from The Genotype-Tissue Expression project (GTEx, http://www.gtexportal.org)58 (Supplementary data 5). A minimum of five samples from each tissue was used to calculate a baseline expression, with samples chosen by the shortest ischemic time and highest RNA quality (Supplementary data 5). RNA-seq expression data for extra-embryonic tissues and human PSCs were obtained from the NIH Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) (Supplementary Data 5). Teratoma gene expression was obtained from 4 karyotypically normal teratomas from the ISCI cohort (Supplementary data 5). Gene lists for central and peripheral nervous system, and for small intestine and colon, were each merged, as their specific gene expression was similar. To generate the TeratoScore output, average expression for each lineage was calculated as the mean expression of all genes representing that lineage. TeratoScore grades were calculated as the product of these lineage means divided by 100.
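The two numerical steps described above, the 8-fold tissue-specificity filter and the grade computed from the lineage means, can be sketched in Python as follows. This is a minimal illustration, not the published implementation: the function names, the gene and tissue labels, and the expression values are all hypothetical, and the additional validation steps (teratoma versus PSC enrichment, Amazonia!) are omitted.

```python
from math import prod
from statistics import mean


def tissue_specific_genes(expr, tissue, fold=8.0):
    """Return genes expressed more than `fold` times higher in `tissue`
    than the mean of all other tissues.

    expr: dict mapping gene -> dict mapping tissue -> expression (e.g. FPKM).
    """
    selected = []
    for gene, by_tissue in expr.items():
        others = [v for t, v in by_tissue.items() if t != tissue]
        if others and by_tissue[tissue] > fold * mean(others):
            selected.append(gene)
    return selected


def teratoscore(sample, scorecard):
    """Return (grade, lineage_means) for one sample.

    sample: dict mapping gene -> expression.
    scorecard: dict mapping lineage -> list of marker genes.
    The grade is the product of the per-lineage mean expression, divided by 100.
    """
    lineage_means = {
        lineage: mean(sample[g] for g in genes)
        for lineage, genes in scorecard.items()
    }
    return prod(lineage_means.values()) / 100.0, lineage_means


# Hypothetical toy data (placeholder gene names and values).
toy_expr = {
    "GENE_A": {"brain": 90.0, "liver": 2.0, "heart": 4.0},
    "GENE_B": {"brain": 10.0, "liver": 5.0, "heart": 5.0},
}
brain_genes = tissue_specific_genes(toy_expr, "brain")

toy_scorecard = {
    "ectoderm": ["E1", "E2"],
    "mesoderm": ["M1"],
    "endoderm": ["N1"],
    "extraembryonic": ["X1"],
}
toy_sample = {"E1": 10.0, "E2": 30.0, "M1": 5.0, "N1": 4.0, "X1": 2.0}
grade, means = teratoscore(toy_sample, toy_scorecard)
print(brain_genes, grade)
```

Because the grade is a product, a lineage whose markers are nearly absent drives the score toward zero, so a high grade requires expression from every lineage on the scorecard.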

RT-PCR analysis of EB differentiation

Data from all Fluidigm plates were collected (Supplementary Data 2) and analyzed in R (https://www.r-project.org). Low-quality datasets were removed (<33% of expected genes detected), and the raw Ct values were capped at 35, scaled to the control genes (ACTB, GAPDH) and quantile-normalized59,60. To ease interpretation, we inverted the normalized numbers by subtracting them from the maximum (Ct = 35), resulting in numbers in which greater values represent stronger expression. For all further analysis and plots, we selected 15 marker genes per lineage based on effect size during differentiation in the EB assays (Supplementary Table 2). To this end, we calculated the rank of each gene in the comparison between expression measurements at day 16 (compared to day 0) per culture condition, taking the median across all cell lines. We then picked the 15 top-ranked genes for each condition as markers for the respective lineage, and the 15 genes with the lowest average rank (i.e., those that were downregulated, on average, in response to culture conditions) as markers for undifferentiated cells. Scorecard analysis was then performed as previously described22. Briefly, we calculated a parametric gene set enrichment analysis on moderated t-scores for comparisons between each set of replicates (per cell line, time and condition) and all data at day 0. We then used a modified gene set enrichment analysis to examine the over-representation of lineage markers in the gene lists ordered by these t-scores61,62,63.
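The Ct preprocessing steps (quality filter, capping, scaling to control genes, quantile normalization and inversion) can be sketched as below. The original analysis was carried out in R; this Python version is only an illustration under several assumptions that the text does not specify: a gene counts as detected if it has a Ct below 35, scaling shifts each sample so that its mean control-gene Ct equals the mean across samples, and ties in quantile normalization are broken by sort order. The function name and the toy values are hypothetical.

```python
from statistics import mean

CT_MAX = 35.0
CONTROLS = ("ACTB", "GAPDH")


def preprocess_ct(raw, min_detected=0.33):
    """raw: dict sample -> dict gene -> Ct (None means no amplification).

    Returns dict sample -> dict gene -> inverted, normalized value
    (larger values represent stronger expression).
    """
    # 1. Quality filter (assumed criterion: fraction of genes with Ct < CT_MAX).
    # 2. Cap Ct values at CT_MAX; missing values are treated as CT_MAX.
    kept = {}
    for sample, cts in raw.items():
        detected = sum(1 for v in cts.values() if v is not None and v < CT_MAX)
        if detected / len(cts) >= min_detected:
            kept[sample] = {
                g: CT_MAX if v is None else min(v, CT_MAX) for g, v in cts.items()
            }
    if not kept:
        return {}

    # 3. Scale to control genes (assumed: align mean control Ct across samples).
    ctrl = {s: mean(c[g] for g in CONTROLS) for s, c in kept.items()}
    target = mean(ctrl.values())
    scaled = {
        s: {g: v - ctrl[s] + target for g, v in c.items()} for s, c in kept.items()
    }

    # 4. Quantile normalization: replace the i-th smallest value of every
    #    sample with the mean of the i-th smallest values across samples.
    genes = sorted(next(iter(scaled.values())))
    sorted_cols = [sorted(c[g] for g in genes) for c in scaled.values()]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(len(genes))]

    # 5. Invert by subtracting from CT_MAX.
    result = {}
    for s, c in scaled.items():
        order = sorted(genes, key=lambda g: c[g])
        result[s] = {g: CT_MAX - rank_means[i] for i, g in enumerate(order)}
    return result


# Hypothetical toy input: two usable samples and one failed sample.
toy_raw = {
    "s1": {"ACTB": 15.0, "GAPDH": 17.0, "G1": 20.0, "G2": 40.0},
    "s2": {"ACTB": 17.0, "GAPDH": 19.0, "G1": 30.0, "G2": 24.0},
    "bad": {"ACTB": 36.0, "GAPDH": None, "G1": None, "G2": None},
}
toy_norm = preprocess_ct(toy_raw)
print(toy_norm)
```

After quantile normalization every retained sample has the same distribution of values, so differences between samples reflect only the relative ordering of genes within each sample.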

Data availability

The authors declare that all data supporting the findings of this study are available within the article and its supplementary information files or from the corresponding author upon reasonable request. Data from Illumina arrays (Pluritest), Fluidigm PCR and RNA-seq experiments have been deposited in the GEO database under accession code GSE97964.