Abstract
Tumors comprise diverse cancer cell populations with specific capabilities for adaptation to the tumor microenvironment, resistance to anticancer treatments, and metastatic dissemination. However, whether these populations are pre-existing in cancer cells or stochastically appear during tumor growth remains unclear. Here, we show the heterogeneous behaviors of cancer cells regarding response to anticancer drug treatments, formation of lung metastases, and expression of transcription factors related to cancer stem-like cells using a DNA barcoding and gene expression recording system. B16F10 cells maintained clonal diversity after treatment with HVJ-E, a UV-irradiated Sendai virus, and the anticancer drug dacarbazine. PBS treatment of the primary tumor and intravenous injection of B16F10 cells resulted in metastases formed from clones of multiple cell lineages. Conversely, BL6 and 4T1 cells developed spontaneous lung metastases by a small number of clones. Notably, an identical clone of 4T1 cells developed lung metastases in different mice, suggesting the existence of cells with high metastatic potential. Cas9-based transcription recording analysis in a human prostate cancer cell line revealed that specific cells express POU5F1 in response to an anticancer drug and sphere formation. Our findings provide insights into the diversity of cancer cells during tumor progression.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
Cancer cell proliferation promotes tumor growth with continuously increasing diversity in gene expression profiles, resulting in tumor heterogeneity that allows tumors to include various clonal proportions adapting to the tumor microenvironment, resistance to anticancer treatments, and migration to distant metastatic lesions [3, 37]. Recent large-scale cancer genome analysis has revealed a hierarchical evolution in cancer progression and identified unique mutations in distant metastases from a primary tumor, suggesting that metastatic cancer cells acquire de novo mutations during migration or proliferation in the metastases [7, 39]. However, genetic mutations are not the only cause of the diverse phenotypes of cancer cells, but epigenetic regulation also contributes to the development of tumor diversity [6, 32].
In cancer progression, resistance to anticancer therapies and metastatic potential are the major factors contributing to cancer-related deaths [18]. Thus, it is critical to identify whether cancer cells include a pre-existing population with resistance to anticancer therapies and metastatic abilities or whether these cancer cells appear during cancer progression [9, 10]. Cancer stem-like cells (CSCs) are thought to initiate tumorigenesis and resist environmental stress to promote cancer progression [1]. Pluripotent transcription factors, including NANOG, SOX2, and OCT3/4, contribute to the establishment of stem cell-like properties in cancer cells, similar to embryonic stem cells [19, 24, 25]. Some cancer cell lines indicate the existence of elite cells that initiate tumorigenesis, generate metastases, and survive anticancer drug treatment [2, 19, 25, 29]. However, anticancer drugs do not always select specific clonal populations [24]. Moreover, the contribution of rare clonal proportions with distinct capabilities, such as CSCs, to cancer progression is unclear.
We examined whether a cancer cell population with the distinct capabilities of proliferation, metastasis, and resistance to anticancer drugs is pre-existing or develops post treatment in multiple cancer cell lines using DNA barcoding and gene expression recording systems. To this end, we utilized mouse melanoma cells B16F10 and B16/BL6 (BL6) and the mouse mammary carcinoma cell line 4T1 to identify the distinct cancer cell population promoting cancer progression. These cell lines are often used to analyze metastasis and the human prostate cancer cell line DU145 for tracing CSCs under environmental stress. We previously found that DU145 cells contain NANOG-positive CSCs that escape NK cell attack [13, 31]. In this study, we used a lentivirus-based DNA barcoding system to distinguish cancer cells during cancer progression. We also established DU145 with the insertion of Cas9 immediately before the stop codons of the NANOG, SOX2, and POU5F1 genes. The expression of these transcription factors in the self-targeting guide RNA (stgRNA) was recorded as the frequency of mutations in the sequence, reflecting Cas9 activity [27]. Our results revealed multiple clones and a few clone-composed metastases, the stochastic appearance of anticancer drug-resistant cells, and POU5F1 expression in specific cells responding to either sphere formation or anticancer drugs. Thus, therapeutic strategies are required to target pre-existing and post-treatment cancer cells with metastatic and anticancer drug resistance capabilities.
Results
DNA barcoding system identifies each cell, allowing analysis of cell lineage, proportion, and dynamics at a single-cell level. We designed a lentivirus plasmid with random sequences comprising 30 bases each. The plasmids included 700,807 variations in the depth of ~ 20 M reads, suggesting that 7.1% variation in barcode occupied 95% of the library (Fig. 1A). Sequencing error contributes to a nucleotide mismatch, appearing as hamming distance of 1–2. We sought to reduce the bias causing sequencing errors by aggregating barcodes according to Hamming distance and selecting barcodes detected in plasmids and all barcoded-B16F10, BL6, and 4T1 cells (see Materials and Methods). The obtained 4,367 barcodes did not contain barcodes with excessive amounts (Fig. 1B) and showed an average abundance (Fig. 1C). Thus, we analyzed the dynamics of these 4367 barcodes during cancer progression in B16F10, BL6, and 4T1 cells.
First, we determined whether cancer cells show diversity in cell proliferation in vivo. We inoculated B16F10 cells on the backs of C57BL/6N mice and collected tumors with diameters 5, 10, and 15 mm, corresponding to 4, 10, and 14 days after inoculation (Fig. 1D). Cell numbers increased ~ 200-fold from 5 to 10 mm tumors and ~ fivefold from 10 to 15 mm tumors (p = 0.0018 and p = 0.00019, respectively; Fig. 1E). We found a significant difference in the detected barcode numbers among the samples, although the difference was slight (Fig. 1F). Because the barcode data were skewed, we calculated the corrected barcode number using Shannon entropy to obtain the effective barcode numbers by removing skewing from the data [38] (Fig. 1G and H). The original barcoded-B16F10 cells contained ~ 3250 barcodes. B16F10 tumors drastically increased in cell numbers from 5 to 10 mm tumors, and the number of barcoded cells decreased slightly after cell proliferation (2930–2723, p = 0.025). In contrast, an ~ fivefold increase in cell number from 10 to 15 mm significantly decreased the number of barcode variations (2723–2000, p = 1.2e–06, Fig. 1H). We obtained the top 10 barcodes with the highest number of reads from each sample and created a list of those barcodes. The amounts of barcodes in that list in each sample were used to calculate JS divergence. We then analyzed the change in the abundance of individual barcodes of the top 10 in each sample. Results suggested that the composition of barcodes significantly changed during tumor growth from 10 to 15 mm but not during tumor growth from 5 to 10 mm compared to in vitro barcoded-B16F10 cells (Fig. 1I). We then applied Jensen–Shannon (JS) divergence to compare the similarities between samples [20]. A high JS divergence score indicates low similarity between samples, whereas a low score indicates high similarity. We detected high similarity in the composition of barcodes among in vitro B16F10 cells (5 and 10 mm tumors). In comparison, 15 mm tumors showed different compositions from other samples based on the top 10 barcodes from each sample or all barcodes (Fig. 1J). These data indicate the transition of cell proliferation profiles from homogeneous to heterogeneous during tumor growth in B16F10 tumors.
Next, we examined whether the repression of tumor growth by anticancer drugs decreases the variety of barcodes and enriches a specific clone in B16F10 tumors. We analyzed two anticancer drugs with different molecular mechanisms to repress tumor growth: hemagglutinating virus of Japan envelope (HVJ-E), a UV-irradiated Sendai virus that almost lost replicating ability, and dacarbazine (DTIC) [15, 17, 23]. HVJ-E and DTIC significantly suppressed B16F10 tumor growth (p = 0.0015 and p = 0.0106, respectively; Fig. 2A). Notably, HVJ-E and DTIC-suppressed tumor growth did not decrease the corrected barcode number in B16F10 tumors, compared to the PBS-treated group (p > 0.77, Fig. 2B). We did not find the enrichment of a specific barcode across HVJ-E and DTIC-treated tumors (Fig. 2C). We obtained the top 10 barcodes as described in Fig. 1. DTIC tended to affect the composition of the top 10 barcodes more than PBS and HVJ-E (Fig. 2D). In contrast, the two anticancer drugs were similar to PBS in affecting the composition of all barcodes (Fig. 2D). These results suggest that B16F10 cells do not contain pre-determined anticancer drug-resistant cells and that the cells surviving in the tumor are stochastically determined during tumor growth even under anticancer drug treatment.
Next, we determined the cellular composition of lung metastases comprising B16F10 cells. We created a B16F10 lung metastasis model by injecting PBS into the primary tumor and intravenously injecting B16F10 cells to examine whether these methods affect the cellular composition of lung metastases. Three injections of PBS into B16F10 primary tumors promoted the formation of lung metastasis. We separately collected the primary tumor and metastases and then analyzed the barcode composition (Fig. 3A). The corrected barcode number detected in the lung metastases was comparable to that detected in the primary tumor (p = 0.077, Fig. 3B). We also did not find enrichment of specific barcodes in lung metastases compared to the primary tumor (Fig. 3C). The intravenous injection of B16F10 cells resulted in lung metastases (Fig. 3D). A diverse composition of barcodes was observed in each lung metastasis, and the number of barcodes was less than that of barcodes in the original cells (p < 0.00076, Fig. 3E). We did not find barcodes enriched in metastases across all mice (Fig. 3F). The cellular composition of lung metastases was highly dependent on the mouse. Metastases in mouse #1 were composed of a few or multiple different cells, whereas those in mouse #4 and #5 comprised ~ 2000 different clones. Mice #2 and #3 showed metastases composed of a few diverse cell types (Fig. 3G). These data indicated the method-dependent cellular composition of B16F10 lung metastases. These results suggested that most B16F10 cells can contribute to metastasis formation.
To examine the cellular composition of spontaneous lung metastases, we barcoded BL6 and 4T1 cells since they can spontaneously establish lung metastases from the primary tumor. The barcoded BL6 cells contained ~ 3200 types of barcodes. We inoculated 1 × 106 barcoded BL6 cells onto the backs of C57BL/6N mice (Fig. 4A). The BL6 primary tumors comprised 683 barcodes on average (median, 747); in contrast, the lung metastases comprised 58 barcodes on average (median, 4) (p = 0.0027, Fig. 4B). Most lung metastases were composed of a few clones; however, some metastases contained ~ 1000 clones, suggesting that it caused a bias when the whole lung was analyzed to determine the cellular composition (Fig. 4C). Each metastasis was enriched in different clones in the same mice and across all the BL6-burdened mice (Fig. 4D). The primary tumors contained 114 barcodes on average (107 on median) although we inoculated the same number of barcoded-4T1 cells into BL6 cells (Fig. 4E and 4F). Most of the 4T1 spontaneous lung metastases contained two barcodes on average (median, two), which was significantly less than that in the primary tumor (p = 0.0015, Fig. 4F). All 4T1 lung metastases analyzed contained one or two clones (Fig. 4G). Mice #1 and #2 formed lung metastases from the different clones, whereas mouse #4 developed five metastases from the same clone (Fig. 4H). Moreover, one lung metastasis in mouse #4 and two metastases in mouse #5 were composed of the same clone, suggesting that 4T1 cells contain pre-existing cells with high metastatic ability. These data suggest that most spontaneous lung metastases are composed of a limited number of stochastically selected cells and that some cell lines, such as 4T1, have a rare population, which is consistent with a previous report [40].
The DNA barcoding system can identify diverse cell proliferation, anticancer drug-resistant cells, and cell metastasis. However, it cannot trace the lineage of cells with specific gene expression. CSCs are thought to contribute to cancer progression. However, the descendants of CSCs comprise a significant population of cancer cells after anticancer drug treatment and under cell stress conditions is not clear. Hence, we used Cas9 and stgRNA to establish transcription-recording cancer cells [27]. We previously found that DU145 cells (human prostate cancer cell line) require the expression of NANOG, a stem cell-related transcription factor, to resist anticancer drugs and escape immune surveillance [13, 31]. We inserted the Cas9 sequence along with three repeats of mClover3 immediately before the stop codon of stem cell-related transcription factors, including NANOG, POU5F1, and SOX2 (NANOG-, POU5F1-, and SOX2-DU145 cells) (Fig. 5A). DU145 cells with the Cas9 cassette in each gene formed spheres similar to wild-type cells (Fig. 5B). Although we did not detect significant mClover3-positive cells in NANOG-DU145, POU5F1-DU145, and SOX2-DU145 cells by flow cytometer (Fig. 5C), we detected Cas9 protein in control or Docetaxel (DTX)-treated NANOG-DU145, POU5F1-DU145, and SOX2-DU145 cells by western blotting (Fig. 5D). DU145 cells with the Cas9 cassette in each gene expressed transcription factors comparable to those in wild-type cells (Fig. 5D). DTX- treatment did not significantly affect the protein expression of each transcription factor but sphere formation decreased the protein expression (Fig. 5D and E). These results indicate that the Cas9 cassette acts as a surrogate for the expression of each transcription factor in DU145 cells with Cas9 cassette.
To determine whether the expression of these transcription factors promotes resistance to anticancer drugs and stress conditions, we traced the expression of transcription factors in each cell during DTX treatment or sphere formation. We labeled the Cas9 cassette-introduced DU145 cells with a lentivirus-encoding DNA barcode and stgRNA. We identified 116,722 barcodes corrected using Shannon entropy and Hamming distance 2. DTX treatment and sphere formation suppressed cell proliferation in all DU145 cells with the Cas9 cassette and DNA barcoding (Fig. 5F). The number of DNA barcodes in each cell line was decreased on DTX treatment compared to that in untreated cells (Fig. 5G). Sphere formation reduced the number of DNA barcodes in NANOG #112 and POU5F1 cells but not in NANOG #128 and SOX2 cells (Fig. 5G). Next, we analyzed the sequence changes that occurred in stgRNA. We confirmed a mutation at position 16 in the stgRNA sequence in DU145 cells with Cas9 compared to the plasmid control (Fig. S1), indicating that DU145 cells with Cas9 and stgRNA work to record transcription factor expression levels in the stgRNA. We used deletion and insertion mutations for the calculation because mismatch mutations were found throughout the stgRNA sequence, suggesting a Cas9-independent error. Thus, we compared deletion and insertion as percent mutations in stgRNA on days 0, 7, and 14 in each cell. The percentage of mutated stgRNA increased in DTX-treated NANOG-DU145 cells (Fig. 6A). In contrast, we detected a slight increase in mutated stgRNA in DTX-treated POU5F1-DU145 cells (Fig. 6B) but not in SOX2-DU145 cells (Fig. 6C). However, sphere formation did not significantly increase the percentage of stgRNA mutations in any of the cells (Fig. 6A–C).
To examine whether the expression of these transcription factors is associated with an increase in the proportion of cells, we computed Spearman's correlations between the mean of mutation size of stgRNA and the number of barcode reads, weighted by the number of barcode reads. We then detected higher and significant values of the correlations, especially in Day14 DTX-treated POU5F1 cells (Spearman’s correlation 0.455 for #112 and 0.433 for #114), compared to NANOG (Spearman’s correlation 0.410 for #112 and 0.388 for #128) and SOX2 (Spearman’s correlation 0.350 for #98 and 0.370 for #101) cells (Fig. 6D–F, Table S1). The results suggest that cells with POU5F1 expression are associated with an increase in proportion compared to NANOG and SOX2 expression. Next, we examined whether a cell with a specific lineage expresses these transcription factors, particularly in the top 10 cells with a high number of mutated reads. POU5F1-DU145 cells included more cells that are susceptible to recurrently acquiring the mutation during DTX treatment or sphere formation (Fig. 6E). Compared to the POU5F1 expression profile, NANOG and SOX2 tended to be more stochastically expressed in DU145 cells (Fig. 6D and F). The cumulative distribution fraction of the mutation in all cells showed no significant difference between samples in each cell, as mutations in stgRNA were not often detected (Fig. S2). We also did not detect significant changes in the cellular composition during DTX treatment or sphere formation (Fig. S3). These results suggest that POU5F1 is expressed in a specific cell population upon distinct stimulation in DU145 cells.
Discussion
Anticancer therapy effect on tumor composition
Tumors are composed of various types of cancer cells. Whether the composition of cancer cells is consistent or increases in heterogeneity during tumor growth remains unclear. Our DNA-barcoded B16F10 cells showed a homogeneous proliferation within 10 days after inoculation of cancer cells and subsequently switched to a heterogeneous proliferation state 14 days after inoculation and decreased the number of cell variations (Fig. 6G). These results suggest three possibilities: (1) Rapid tumor growth induces hypoxia, resulting in stochastic cell death at a later stage of tumor growth; (2) cell–cell competition includes stochastic cell death in a loser cell since there is limited space in the mouse skin; and (3) Immune cells begin to eliminate cancer cells at a later stage of tumor growth.
Although DTX and HVJ-E treatments significantly suppressed tumor growth, neither decreased the barcode number in any tumor (Fig. 2). The remaining cancer cell profile remains controversial; several studies suggest that chemotherapy eradicates most cancer clones [2, 14, 16, 19, 30, 40]. Taken together, our findings suggest that 4T1 cells have the ability to form cell populations with different properties and comprise elite cells with metastatic capacity.
CSC properties in DU145 human prostate cancer cell line
WE traced the progeny of cells expressing stem cell transcription factors, including NANOG, POU5F1, and SOX2, during anticancer drug treatment or sphere formation. We observed that cells retrospectively expressing NANOG were increased by DTX treatment at some time point, but those retrospectively expressing POU5F1 and SOX2 were not (Fig. 6I). However, DTX treatment or sphere formation induced the expression of POU5F1 but not NANOG and SOX2 in identical cells. Notably, distinct cells showed increased POU5F1 expression in response to DTX treatment or sphere formation, suggesting that POU5F1 is expressed in a scarce population and that the population includes various cells that respond differently to external stimuli. POU5F1 expression is controlled by several transcription factors, such as HIF1 and MYC, in cancer cells. In renal cell carcinoma (RCC), POU5F1 expression is activated by a HIF1-pathway-responsive promoter in the long terminal repeat (LTR) element that is located upstream of POU5F1 [34]. DNA methylation at the MYC-binding elements in the POU5F1 locus is involved in regulating POU5F1 expression in cancer cells [5]. These suggest that epigenetic status at the POU5F1-regulating element region is critical for the regulation of POU5F1 expression. DTX- or sphere-responding POU5F1-DU145 cells might have a distinct epigenetic profile in the POU5F1-regulating element region. However, DTX- or sphere-responding POU5F1-DU145 clones are 0.52% and 0.37% for #112 and 0.82% and 0.37% for #114 in DU145 cells, meaning it is challenging to analyze a property of the rare population even using current single-cell technologies. NANOG expression is regulated by several transcription factors, such as p53, STAT3, Gli1/2, and HIF [8]. SOX2 expression is regulated by SOX4, OCT4, STAT3, E2F, DNMT, and FoxO1 [Cell barcoding B16F10 (2 × 104 cells/mL), 4T1 (5 × 104 cells/mL), B16BL6 (5 × 104 cells/mL), and DU145-Cas9 knock-in cells (1 × 105 cells/mL) were seeded into 10 or 15 cm culture dishes and incubated for 24 h. Cells were then infected with the barcode library at an multiplicity of infection (MOI) of < 0.3 to be labeled with a single barcode per cell. The medium was replaced with fresh medium 24 h after infection, and the cells were incubated further for 24 h. MOI was assessed by the proportion of blue fluorescent protein (BFP)-expressing cells using flow cytometry (BD FACS Canto II). The cells were maintained in the presence of puromycin for 7 days to remove uninfected cells 48 h after infection. The spheres of barcoded-DU145-Cas9 knock-in cells were formed as previously described [31]. Briefly, each cell line (1 × 105 cells) was plated on EZ-Bind Shut™ SP 100 mm Dish (IWAKI) or Prime Surface 60 mm dish (Sumitomo Bakelite) and cultured in DMEM/F12 (Gibco, NY, USA) supplemented with B27 (Gibco), 4 μg/mL insulin (Sigma, MO, USA), 20 ng/mL epithelial growth factor (Gibco), and 20 ng/mL basic fibroblast growth factor (ORF, Kopavogur, Iceland) for 7, 10, or 14 days at 37 °C and 5% CO2. DU145 Cas9 knock-in cells were seeded into 6-well plates at a density of 1 × 105 cells/well on day 0 and incubated for 24 h. Media were subsequently replaced with fresh media containing 4 nM DTX (Tokyo Chemical Industry) and incubated for 48 h. Cells were retained in the absence of DTX after 48 h of incubation and collected on days 7 and 14. Six-week-old female C57BL/6N and BALB/c mice were purchased from CLEA Japan and maintained in a temperature-controlled pathogen-free room. All animals were handled according to the approved protocols and guidelines of the Animal Committee of Osaka University (Osaka, Japan). HVJ (VR-105 parainfluenza Sendai/52 Z strain) was acquired from the American Type Culture Collection (Manassas, VA, USA), amplified by injection into 10–14-day-old embryonated eggs and purified as previously described [26]. The purified live HVJ virus was inactivated by UV irradiation (189 mJ/cm2). The inactivated HVJ was used as HVJ-E. C57BL/6N mice were inoculated intradermally with barcoded-B16F10 cells (5 × 105 cells/mouse). When the tumor diameter reached approximately 5, 10, or 15 mm, the mice were sacrificed and tumors were collected. Cell suspensions from tumors were obtained by mincing and digestion with collagenase type I (FUJIFILM Wako Pure Chemical Corporation), as previously described [26]. Viable cells were counted using trypan blue (Nacalai Tesque). Tumor size was measured using a slide caliper, and tumor volume was calculated using the following formula: tumor volume (mm3 = 0.5 × length (mm × [width (mm)]2)). C57BL/6N mice were inoculated intradermally with barcoded-B16F10 cells (5 × 105 cells/mouse). When the tumor diameter reached 4.3–4.8 mm, PBS (Nacalai Tesque), HVJ, or DTIC (FUJIFILM Wako Pure Chemical Corporation) was administered. PBS (50 μL/shot) and HVJ (2000 HAU/50μL/shot) were injected directly into the tumor thrice every alternate day. DTIC (100 mg/kg) was administered intraperitoneally thrice every alternate day. Ten days after the first administration, the mice were sacrificed and tumors were collected. Cell suspensions from the tumors were obtained by mincing and digestion with collagenase type I, as previously described [26]. The number of viable cells was counted using trypan blue. Barcoded-B16F10 cells (5 × 105 cells) were suspended in 200 μL of PBS (Nacalai Tesque) and injected into the tail vein of C57BL/6N mice. Eight days after the injection of cells, the mice were sacrificed, and macroscopically visible lung metastases were collected. To establish the 4T1 lung metastasis model, barcoded-4T1 cells (1 × 106 cells) were implanted subcutaneously into BALB/c mice. The primary tumors and macroscopically visible lung metastases were collected 42 days after tumor inoculation. To produce the B16BL6 lung metastasis model, barcoded-B16BL6 cells (1 × 106 cells) were implanted subcutaneously in C57BL/6N mice. The primary tumors and macroscopically visible lung metastases were collected 27 days after tumor inoculation. Cell suspensions from primary tumors were obtained by mincing and digestion with collagenase type I as previously described [26]. The number of viable cells was counted using trypan blue. Genomic DNA was isolated using the DNeasy Blood & Tissue Kit (Qiagen) or the Quick-DNA™ Miniprep Plus Kit (Zymo Research). The region containing the barcode and stgRNA was amplified using Q5 Hot Start High-Fidelity 2 × Master Mix (New England BioLabs). For the PCR of primary tumor of Barcode-B16F10 cells, genomic DNA isolated from 2 × 106 cells was used. When the number of harvested barcoded-B16F10 cells was less than 2 × 106 cells, all harvested cells were used. All genomic DNA isolated from each lung metastasis was used for the PCR of lung metastasis. For the PCR of the primary tumors of barcoded-4T1 and barcoded-B16BL6 cells, genomic DNA isolated from 4 × 106 cells was used. For the PCR of DU145 cells, half of the harvested genomic DNA was used. The following cycling conditions were applied for PCR: 30 s at 98 °C; 30 cycles of 10 s at 98 °C, 15 s at 61 °C, and 20 s at 72 °C; and a final extension at 72 °C for 2 min. The following primer pairs were used for the first PCR: gLibrary-Hiseq-50 bp-SE-U1: ACACTCTTTCCCTACACGACGCTCTTCCGATCTTGTGGAAAGGACGAAACA gLibrary-HiseqX-L1: GACTGGAGTTCAGACGTGTGCTCTTCCGATCTCTAAAGCGCATGCTCCAGAC The PCR products were purified using Monarch PCR & DNA Cleanup Kit (New England BioLabs) or NucleoSpin Gel and PCR Clean-up Midi (MACHEREY–NAGEL GmbH & Co., KG). Eight hundred picograms of purified first PCR products was used as templates for the second PCR with NEB Next Q5 Hot Start HiFi PCR Master Mix (New England BioLabs) in a total volume of 50 μL. In the second PCR, each sample was labeled with 6-bp-long index sequences. The PCR cycling conditions were as follows: 30 s at 98 °C; 12 cycles of 10 s at 98 °C and 1 min 15 s at 65 °C; and a final extension for 5 min at 65 °C. The following primer pairs were used for the second PCR: CRISPR-Library-F: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATOT iPCRtag-X-L2: CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATOT PCR products were purified using AMPure XP (Beckman Coulter). DNA concentrations of the PCR products were assessed by qPCR using the KOD SYBR qPCR Mix (TOYOBO). Subsequently, PCR products were pooled at equimolar ratios, and sequencing was performed with 30% PhiX on an Illumina HiseqX with 150 bp paired-end reads. Cell lysates were subjected to SDS-PAGE, and the separated proteins were transferred onto Immobilon-P transfer membranes (Merck Millipore, Tokyo, Japan). To detect the proteins, anti-NANOG [Cell Signaling Technology (CST), MA, USA; #4903], anti-OCT4 (CST; #2750), anti-SOX2 (CST; #3579), anti-Ty1 (Sigma-Aldrich, SAB4800032), and anti-histone H3 (CST; 4499) primary antibodies were used. Enhanced chemiluminescence horseradish peroxidase-conjugated donkey antirabbit IgG (GE Healthcare Japan, Tokyo, Japan) was used as the secondary antibody for the detection of NANOG, POU5F1, SOX2, and histone H3. Signals were detected using Chemi-Lumi One (Nacalai Tesque) or Chemi-Lumi One Super (Nacalai Tesque) and an ImageQuant LAS 4000 mini system (GE Healthcare). Awk v4.1.3 (https://www.gnu.org) Cutadapt v1.9.1 [22] FASTX-Toolkit 0.0.13 (http://hannonlab.cshl.edu/fastx_toolkit/) trim_galore v0.6.6 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) bbmap v38.90 (https://www.osti.gov/servlets/purl/1241166) umi_tools v1.0.1 [35] R 4.0.4 (https://www.r-project.org) Rstudio 1.4.1106 (https://www.rstudio.com) WeightedCorr v2.1 (https://github.com/matthijsz/weightedcorr) Dplyr v1.0.5 (https://cran.r-project.org/web/packages/dplyr/index.html) ggplot2 v3.3.3 (https://cran.r-project.org/web/packages/ggplot2/index.html) tidyverse v1.3.0 (https://cran.r-project.org/web/packages/tidyverse/index.html) gplots v3.1.1 (https://cran.r-project.org/web/packages/gplots/index.html) The adapter sequence was removed from the read2 sequence using cutadapt with the option -e 0.2. Reads with low quality and < 25 bases were removed using fastq_quality_trimmer with the options -Q33 -t 20 -l 25 and -Q33 -q 20 -p 80. Fastq was converted to Fasta using fastq_to_fasta with the option -Q 33. Next, we counted the reads of each barcode variety in each sample, accepting only length-30 barcode sequences. Although not all of them had the exact form of WSN (W: A/T, S: C/G, N: A/T/G/C) × 10, we did not impose restrictions on that point (in fact, exact WSNx10 reads account for only 10% of the total, and limiting to them would destroy essential parts of our data). The error might be caused by the limitation of a company's ability to generate the precise DNA oligo. Reads with unidentified bases were excluded from the analyses. Since each sample included many similar barcodes, due to reading errors or mutations, we merged them if the Hamming distance was ≤ 2. The algorithm is outlined in (a). The resulting dataset was used in the subsequent analyses. This reduced the variation to approximately 10–50%, depending on the origin of the samples. Expression scores Low-quality reads were removed from the paired reads using trim_galore with the option –fastqc -j 6 –paired. Paired reads were combined using bbmerge. stgRNA and barcode sequences were extracted using umi_tools with the option –bc-pattern to determine a user-defined barcode pattern. To estimate the cumulative expression levels of the target gene, we adopted mutation scores from the original stg sequences [27]. We scored a mismatch base as 1, a start of deletion or insertion as 2.5, and an extension of them as 0.5. Alignment with the minimum mutation score was computed using the conventional dynamic planning algorithm. For even-score alignments, mutations were preferred in the following order: mismatch, insertion, and deletion. The obtained alignments gave separate scores for mismatches, insertions, and deletions. Among them, the mismatch scores presented rather noisy behaviors in our data; therefore, we employed only insertion and deletion scores as indicators of the expression levels. Valid stg selection The stg sequences should ideally be unique in the barcoded-plasmids inserted into DU145 cells. However, the real sequence data reported additional 1386 stg variations, accounting for 5.2% reads in addition to the correct template (94.8% reads). Some of these variations were very different from the template and produced large noise in the stg sequences of the resulting cells. To delete them, we picked up variations with 4.0 or larger mutation scores from the template, including 143 variations counting 215 reads (0.00068% of the total). For each of the 71,361 stg variations that were observed in any of the samples, we computed the mutation scores from each of the 143 variations, and if any of them was smaller than that from the template, the stg variation was regarded as originating from a wrong-stg plasmid and the corresponding reads were simply ignored. In this way, 46,134 were deleted and 25,227 stg variations survived. The ignored reads were 0.3–7.4% in each sample. Barcode selection and merger Similar to the tumor data, including the stg sequences, a set of (barcode, stg, and reads) triplets were obtained for each sample. The barcodes in the original plasmids were selected and merged using the same algorithm as for the tumor data. For example, (A, X, M) and (A′, X, N) were merged into (A, X, M + N), whereas (A, X, M) and (A′, Y, N) yielded (A, X, M) and (A, Y, N), when A and A′ were similar barcodes (Hamming distance ≤ 2). For all other samples, we retained only reads with barcodes included in the resulting set. After this process, the barcode variations were reduced to approximately 25% (plasmid), 50% (total), 40% (Day 0), 20–30% (Day 7), and 10–25% (Day 14). Further analysis of barcode data was performed using R. To remove barcodes derived from sequence error, barcodes detected in all plasmids, barcoded-B16F10, BL6, and 4T1 cells were analyzed using R package dplyr. The corrected barcode number of each sample was calculated as previously reported and plotted using the R ggplot2 package [4]. For the comparison of the barcode composition between samples, the JS divergence was calculated using all the barcodes or the top 10 barcodes of each sample, and a heatmap was generated using the R ggplot2 package. A bubble plot of the top 10 barcodes for each sample was constructed using the R ggplot2 package. A cumulative bar plot of the barcodes was constructed using the R ggplot2 package. Hierarchical clustering heatmaps of mean mutation scores were constructed using the top 10 barcodes with mutated reads using the R gplot package. A heatmap of the mutation rate (deletion, insertion, and mismatch) in each position in stgRNA was constructed using the R ggplot2 package. The cumulative distribution fractions of insertions and deletions in each sample were plotted using the R ggplot2 package. Statistical analysis was performed using R. Welch’s t-test with Bonferroni adjustment was used to compare the two samples. Weighted Spearman correlation was calculated WeightedCorr. Statistical significance was set at p < 0.05.Sphere-forming culture
In vitro treatment with DTX
Mice
Preparation of HVJ-E
Tumors with different size
Tumor treatment
Intravenous lung metastasis model
Spontaneous lung metastasis model
Next-generation sequencing of barcode library
Western blot analysis
Bioinformatic analysis of data
Tools
R package
NGS data of barcodes in B16F10, B16BL6, and 4T1 tumors
DU145 Expression-Records
Data processing using R
Analysis of barcode composition
Analysis of stgRNA
Statistical analysis
Data availability
The barcoding datasets generated during this study are available from the corresponding authors upon reasonable request. The codes used in this study are available at GitHub (https://github.com/GeneTherapyScience/barcode_expression/).
Abbreviations
- CSCs:
-
Cancer stem-like cells
- PDXs:
-
Patient-derived xenografts
- BL6:
-
B16/BL6
- stgRNA:
-
Self-targeting guide RNA
- JS:
-
Jensen–Shannon
- HVJ-E:
-
Hemagglutinating virus of Japan envelope
- DTIC:
-
Dacarbazine
- DTX:
-
Docetaxel
- RCC:
-
Renal cell carcinoma
- LTR:
-
Long terminal repeat
- CDF:
-
Cumulative distribution fraction
- ∆TK:
-
Delta thymidine kinase
- BSD:
-
Blasticidin S-resistant gene
- NT:
-
Non-treatment
References
Batlle E, Clevers H (2017) Cancer stem cells revisited. Nat Med 23:1124–1134. https://doi.org/10.1038/nm.4409
Bhang HC, Ruddy DA, Krishnamurthy Radhakrishna V, Caushi JX, Zhao R, Hims MM, Singh AP, Kao I, Rakiec D, Shaw P, Balak M, Raza A, Ackley E, Keen N, Schlabach MR, Palmer M, Leary RJ, Chiang DY, Sellers WR, Michor F, Cooke VG, Korn JM, Stegmeier F (2015) Studying clonal dynamics in response to cancer therapy using high-complexity barcoding. Nat Med 21:440–448. https://doi.org/10.1038/nm.3841
Birkbak NJ, McGranahan N (2020) Cancer genome evolutionary trajectories in metastasis. Cancer Cell 37:8–19. https://doi.org/10.1016/j.ccell.2019.12.004
Bystrykh LV, Belderbos ME (2016) Clonal analysis of cells with cellular barcoding: when numbers and sizes matter. In: Turksen K (ed) Stem cell heterogeneity: methods and protocols, methods in molecular biology. Springer, New York, pp 57–89. https://doi.org/10.1007/7651_2016_343
Filipponi D, Emelyanov A, Muller J, Molina C, Nichols J, Bulavin DV (2019) DNA damage signaling-induced cancer cell reprogramming as a driver of tumor relapse. Mol Cell 74:651-663.e8. https://doi.org/10.1016/j.molcel.2019.03.002
Flavahan WA, Gaskell E, Bernstein BE (2017) Epigenetic plasticity and the hallmarks of cancer. Science 357:eaal2380. https://doi.org/10.1126/science.aal2380
Gerstung M, Jolly C, Leshchiner I, Dentro SC, Gonzalez S, Rosebrock D, Mitchell TJ, Rubanova Y, Anur P, Yu K, Tarabichi M, Deshwar A, Wintersinger J, Kleinheinz K, Vázquez-García I, Haase K, Jerman L, Sengupta S, Macintyre G, Malikic S, Donmez N, Livitz DG, Cmero M, Demeulemeester J, Schumacher S, Fan Y, Yao X, Lee J, Schlesner M, Boutros PC, Bowtell DD, Zhu H, Getz G, Imielinski M, Beroukhim R, Sahinalp SC, Ji Y, Peifer M, Markowetz F, Mustonen V, Yuan K, Wang W, Morris QD, Spellman PT, Wedge DC, Van Loo P (2020) The evolutionary history of 2,658 cancers. Nature 578:122–128. https://doi.org/10.1038/s41586-019-1907-7
Gong S, Li Q, Jeter CR, Fan Q, Tang DG, Liu B (2015) Regulation of NANOG in cancer cells. Mol Carcinog 54:679–687. https://doi.org/10.1002/mc.22340
Gunnarsson EB, De S, Leder K, Foo J (2020) Understanding the role of phenotypic switching in cancer drug resistance. J Theor Biol 490:110162. https://doi.org/10.1016/j.jtbi.2020.110162
Jain P, Duddu AS, Jolly MK (2022) Stochastic population dynamics of cancer stemness and adaptive response to therapies. Essays Biochem 66:387–398. https://doi.org/10.1042/EBC20220038
Jeter CR, Yang T, Wang J, Chao H-P, Tang DG (2015) Concise review: NANOG in cancer stem cells and tumor development: an update and outstanding questions. Stem Cells 33:2381–2390. https://doi.org/10.1002/stem.2007
Kalmar T, Lim C, Hayward P, Muñoz-Descalzo S, Nichols J, Garcia-Ojalvo J, Martinez AA (2009) Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol 7:e1000149. https://doi.org/10.1371/journal.pbio.1000149
Kawamura N, Nimura K, Nagano H, Yamaguchi S, Nonomura N, Kaneda Y, Kawamura N, Nimura K, Nagano H, Yamaguchi S, Nonomura N, Kaneda Y (2015) CRISPR/Cas9-mediated gene knockout of NANOG and NANOGP8 decreases the malignant potential of prostate cancer cells. Oncotarget 6:22361–22374
Kim C, Gao R, Sei E, Brandt R, Hartman J, Hatschek T, Crosetto N, Foukakis T, Navin NE (2018) Chemoresistance evolution in triple-negative breast cancer delineated by single-cell sequencing. Cell 173:879-893.e13. https://doi.org/10.1016/j.cell.2018.03.041
Kiyohara E, Tamai K, Katayama I, Kaneda Y (2012) The combination of chemotherapy with HVJ-E containing Rad51 siRNA elicited diverse anti-tumor effects and synergistically suppressed melanoma. Gene Ther 19:734–741. https://doi.org/10.1038/gt.2011.123
Kreso A, O’Brien CA, van Galen P, Gan OI, Notta F, Brown AMK, Ng K, Ma J, Wienholds E, Dunant C, Pollett A, Gallinger S, McPherson J, Mullighan CG, Shibata D, Dick JE (2013) Variable clonal repopulation dynamics influence chemotherapy response in colorectal cancer. Science 339:543–548. https://doi.org/10.1126/science.1227670
Kurooka M, Kaneda Y (2007) Inactivated Sendai virus particles eradicate tumors by inducing immune responses through blocking regulatory T cells. Cancer Res 67:227–236. https://doi.org/10.1158/0008-5472.CAN-06-1615
Lambert AW, Pattabiraman DR, Weinberg RA (2017) Emerging biological principles of metastasis. Cell 168:670–691. https://doi.org/10.1016/j.cell.2016.11.037
Lan X, Jörg DJ, Cavalli FMG, Richards LM, Nguyen LV, Vanner RJ, Guilhamon P, Lee L, Kushida MM, Pellacani D, Park NI, Coutinho FJ, Whetstone H, Selvadurai HJ, Che C, Luu B, Carles A, Moksa M, Rastegar N, Head R, Dolma S, Prinos P, Cusimano MD, Das S, Bernstein M, Arrowsmith CH, Mungall AJ, Moore RA, Ma Y, Gallo M, Lupien M, Pugh TJ, Taylor MD, Hirst M, Eaves CJ, Simons BD, Dirks PB (2017) Fate map** of human glioblastoma reveals an invariant stem cell hierarchy. Nature 549:227–232. https://doi.org/10.1038/nature23666
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151. https://doi.org/10.1109/18.61115
Lu R, Neff NF, Quake SR, Weissman IL (2011) Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat Biotech 29:928–933. https://doi.org/10.1038/nbt.1977
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17:10–12. https://doi.org/10.14806/ej.17.1.200
Matsushima-Miyagi T, Hatano K, Nomura M, Li-Wen L, Nishikawa T, Saga K, Shimbo T, Kaneda Y (2012) TRAIL and Noxa are selectively upregulated in prostate cancer cells downstream of the RIG-I/MAVS signaling pathway by nonreplicating Sendai virus particles. Clin Cancer Res 18:6271–6283. https://doi.org/10.1158/1078-0432.CCR-12-1595
Merino D, Weber TS, Serrano A, Vaillant F, Liu K, Pal B, Stefano LD, Schreuder J, Lin D, Chen Y, Asselin-Labat ML, Schumacher TN, Cameron D, Smyth GK, Papenfuss AT, Lindeman GJ, Visvader JE, Naik SH (2019) Barcoding reveals complex clonal behavior in patient-derived xenografts of metastatic triple negative breast cancer. Nat Commun 10:766. https://doi.org/10.1038/s41467-019-08595-2
Nguyen LV, Cox CL, Eirew P, Knapp DJHF, Pellacani D, Kannan N, Carles A, Moksa M, Balani S, Shah S, Hirst M, Aparicio S, Eaves CJ (2014) DNA barcoding reveals diverse growth kinetics of human breast tumour subclones in serially passaged xenografts. Nat Commun 5:5871. https://doi.org/10.1038/ncomms6871
Pan YC, Nishikawa T, Chang CY, Tai JA, Kaneda Y (2021) CXCL2 combined with HVJ-E suppresses tumor growth and lung metastasis in breast cancer and enhances anti-PD-1 antibody therapy. Molecular Therapy - Oncolytics 20:175–186. https://doi.org/10.1016/j.omto.2020.12.011
Perli SD, Cui CH, Lu TK (2016) Continuous genetic recording with self-targeting CRISPR-Cas in human cells. Science 353:aag0511–aag0511. https://doi.org/10.1126/science.aag0511
Prager BC, **e Q, Bao S, Rich JN (2019) Cancer stem cells: the architects of the tumor ecosystem. Cell Stem Cell 24:41–53. https://doi.org/10.1016/j.stem.2018.12.009
Quinn JJ, Jones MG, Okimoto RA, Nanjo S, Chan MM, Yosef N, Bivona TG, Weissman JS (2021) Single-cell lineages reveal the rates, routes, and drivers of metastasis in cancer xenografts. Science. https://doi.org/10.1126/science.abc1944
Quintana E, Shackleton M, Sabel MS, Fullen DR, Johnson TM, Morrison SJ (2008) Efficient tumour formation by single human melanoma cells. Nature 456:593–598. https://doi.org/10.1038/nature07567
Saga K, Park J, Nimura K, Kawamura N, Ishibashi A, Nonomura N, Kaneda Y (2019) NANOG helps cancer cells escape NK cell attack by downregulating ICAM1 during tumorigenesis. J Exp Clin Cancer Res 38:416. https://doi.org/10.1186/s13046-019-1429-z
Salgia R, Kulkarni P (2018) The genetic/non-genetic duality of drug ‘resistance’ in cancer. Trends in Cancer 4:110–118. https://doi.org/10.1016/j.trecan.2018.01.001
Shaffer SM, Dunagin MC, Torborg SR, Torre EA, Emert B, Krepler C, Beqiri M, Sproesser K, Brafford PA, **ao M, Eggan E, Anastopoulos IN, Vargas-Garcia CA, Singh A, Nathanson KL, Herlyn M, Raj A (2017) Rare cell variability and drug-induced reprogramming as a mode of cancer drug resistance. Nature 546:431–435. https://doi.org/10.1038/nature22794
Siebenthall KT, Miller CP, Vierstra JD, Mathieu J, Tretiakova M, Reynolds A, Sandstrom R, Rynes E, Haugen E, Johnson A, Nelson J, Bates D, Diegel M, Dunn D, Frerker M, Buckley M, Kaul R, Zheng Y, Himmelfarb J, Ruohola-Baker H, Akilesh S (2019) Integrated epigenomic profiling reveals endogenous retrovirus reactivation in renal cell carcinoma. EBioMedicine 41:427–442. https://doi.org/10.1016/j.ebiom.2019.01.063
Smith T, Heger A, Sudbery I (2017) UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res 27:491–499. https://doi.org/10.1101/gr.209601.116
Sun J, Ramos A, Chapman B, Johnnidis JB, Le L, Ho Y-J, Klein A, Hofmann O, Camargo FD (2014) Clonal dynamics of native haematopoiesis. Nature 514:322–327. https://doi.org/10.1038/nature13824
Turajlic S, Swanton C (2016) Metastasis as an evolutionary process. Science 352:169–175. https://doi.org/10.1126/science.aaf2784
Turksen K (ed) (2016) Stem cell heterogeneity: methods and protocols, methods in molecular biology. Springer, New York
Vendramin R, Litchfield K, Swanton C (2021) Cancer evolution: darwin and beyond. EMBO J 40:e108389. https://doi.org/10.15252/embj.2021108389
Wagenblast E, Soto M, Gutiérrez-Ángel S, Hartl CA, Gable AL, Maceli AR, Erard N, Williams AM, Kim SY, Dickopf S, Harrell JC, Smith AD, Perou CM, Wilkinson JE, Hannon GJ, Knott SRV (2015) A model of breast cancer heterogeneity reveals vascular mimicry as a driver of metastasis. Nature. https://doi.org/10.1038/nature14403
Zhang S, **ong X, Sun Y (2020) Functional characterization of SOX2 as an anticancer target. Sig Transduct Target Ther 5:1–17. https://doi.org/10.1038/s41392-020-00242-3
Acknowledgements
We thank Mayuko Okado, Noriko Ohta, and Mieko Watanabe for technical assistance. Some parts of the sequence analyses were performed on the Students' Compute Server of Osaka University Medical School and Compute Server of Taki Lab. (Rikkyo Univ. Department of AI/RIKEN iTHEMS). Y.Umeki. was supported by the Osaka University Medical Doctor Scientist Training Program. This work was supported in part by AMED under grant number 18 cm0106341h0001 and JSPS KAKENHI Grant number JP21K19408, JP21H05158, and JP21H05160 and research grants from Bristol-Myers Squibb, the Osaka Community Foundation, and the Osaka University Program for the Support of Networking among Present and Future Researchers to K.N.
Funding
This work was funded by AMED (18 cm0106341h0001); JSPS (JP21K19408, JP21H05158, JP21H05160); Bristol-Myers Squibb; Osaka Community Foundation; Osaka University Program for the Support of Networking among Present and Future Researchers; Osaka University Medical Doctor Scientist Training Program.
Author information
Authors and Affiliations
Contributions
YU and KN designed experiments; YU, YU and KS performed experiments; NO performed computation; KN performed bioinformatics; YK provided expert advice; KN designed and supervised the study and wrote the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethics Statement
Approval of the research protocol by an Institutional Reviewer Bord: yes. Informed consent: not applicable. Registry and the Registration no. of the study/trial: not applicable. Animal studies: yes.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Umeki, Y., Ogawa, N., Uegaki, Y. et al. DNA barcoding and gene expression recording reveal the presence of cancer cells with unique properties during tumor progression. Cell. Mol. Life Sci. 80, 17 (2023). https://doi.org/10.1007/s00018-022-04640-4
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00018-022-04640-4