Introduction

Circulating tumor cells (CTCs) are tumor cells with high vitality and high metastatic potential that originate from primary or metastatic tumors of epithelial origin and are shed into the blood circulation. CTCs are an important component of liquid biopsy and provide a window for monitoring tumor progression in real time [1,2,3]. High-throughput sequencing of tumor tissue is generally based on a mixed sample of millions of cells; it reflects the overall genomic characteristics of the cell population but ignores the heterogeneity of the tumor cells, so that genetic material from CTCs, cancer stem cells (CSCs) and other low-abundance but functionally important cells is diluted. The emergence of single-cell sequencing technology has largely solved this problem [4, 41, 43]. Microfluidic sorting can be coupled with downstream genome amplification to complete single-cell sorting, lysis, and amplification in one step. An example is Fluidigm's C1 single-cell amplifier, which offers high throughput (each chip can sort 96 single cells), a small reaction volume (which can increase amplification efficiency and reduce reagent consumption), less contamination and little impact on sequencing. The disadvantages are a low capture rate for viscous and nonspherical cells and a high chip cost [39, 42].

The DEPArray sorting system (Di-Electro-Phoretic Array system) is a semiautomatic sorting system that separates rare cells from a mixed cell population [44]. The cells to be sorted are visualized by fluorescent labeling, and single cells are captured in "electronic cages" formed by the microelectrodes on the chip. The microelectrodes are then switched on or off to move the captured target cells to a suitable position on the chip and place them in suitable media for subsequent sequencing analysis [45]. The disadvantages are that sorting takes a long time and that the sample volume is small (only 14 μL), so peripheral blood samples need to be divided and enriched for CTCs before they can be sorted by this system. This approach has been used at the single-cell level to study CTCs in breast cancer and colorectal cancer [7, 46, 47]. The CellCelector sorting system is an automatic sorting system that separates rare cells from mixed cell populations. A multifunctional robot system automatically retrieves single cells and cell clones, mechanically isolating the target cells or clones without affecting cell viability and allowing real-time, highly accurate observation of cell images during sorting; however, this method is time-consuming [48].

Single cell whole genome amplification

The DNA content of a single cell is only 6–7 pg, which is far below the amount of DNA required for whole-genome sequencing. High-fidelity, high-efficiency and unbiased genome amplification is therefore required for whole-genome sequencing of a single cell. The development of whole-genome amplification (WGA) has promoted the progress of single-cell genome sequencing technology [49].
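
The picogram figure quoted above can be checked with simple arithmetic. The following is a minimal sketch, not a protocol: the haploid genome size (3.2 Gbp), the average base-pair mass (650 g/mol) and the 1 µg target are assumptions chosen for illustration, and none of them come from the cited studies. Under these assumptions the diploid genome weighs roughly 7 pg, which is why amplification of about five orders of magnitude is needed before library construction.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average mass of one base pair (g/mol)
HAPLOID_GENOME_BP = 3.2e9    # assumed haploid human genome size (bp)


def diploid_genome_mass_pg(haploid_bp=HAPLOID_GENOME_BP):
    """Return the approximate DNA mass of one diploid cell in picograms."""
    grams = 2 * haploid_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e12  # convert g to pg


def fold_amplification_needed(target_ng, cell_mass_pg):
    """Return the fold amplification needed to reach target_ng of DNA."""
    return target_ng * 1000.0 / cell_mass_pg  # convert ng to pg first


mass_pg = diploid_genome_mass_pg()
# The 1 ug (1000 ng) target is a hypothetical input amount, not a requirement
# of any specific sequencing platform.
fold = fold_amplification_needed(target_ng=1000.0, cell_mass_pg=mass_pg)
print(f"DNA per diploid cell: {mass_pg:.1f} pg")
print(f"Fold amplification for 1 ug of DNA: {fold:.2e}")
```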

PCR-based amplification methods modify the specific or random primers of traditional PCR; examples include linker-adapter PCR (LA-PCR), primer extension preamplification PCR (PEP-PCR) and degenerate oligonucleotide-primed PCR (DOP-PCR). These methods address the problems of differing primer annealing kinetics, low enzyme fidelity and exponential amplification, but insufficient coverage, uneven amplification products, amplification bias and allelic dropout may introduce spurious single-nucleotide variations (SNVs) and cause false positives [50, 51]. In addition, depending on the CTC enrichment technique used, the prevalence and metastatic potential of CTC subpopulations may differ, leading to different conclusions.

Multiple displacement amplification (MDA) uses random hexamers as primers and φ29 DNA polymerase, which has strong synthesis ability, high fidelity, and strong strand displacement activity, and completes the amplification at 30 °C [52, 53]. Under isothermal conditions, exonuclease-resistant random primers anneal to the template. During amplification, φ29 DNA polymerase displaces the complementary strand of the template, and the displaced single-stranded DNA serves as a new template for further amplification, producing a branched structure. Amplification is thus exponential, and amplicons of 5–10 kb are formed. MDA is a commonly used single-cell whole-genome amplification method with high coverage and uniformity, good accuracy, and long amplicons, but it suffers from high allelic dropout rates and sequence-dependent bias caused by exponential amplification, and it neglects differences between cells. The lack of heterogeneity detection in this approach means that it is not suitable for detecting copy-number variation (CNV) [58].

By contrast, multiple annealing and looping-based amplification cycles (MALBAC) relies on quasi-linear amplification, which gives high uniformity with little amplification bias, so it is more suitable for the detection of CNVs. However, owing to the low fidelity of the Taq DNA polymerase used, the false positive rate of SNV detection by this method is higher (approximately 40 times higher than that of MDA) [54].
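
The link between exponential amplification and sequence-dependent bias, and the reason linear amplification is better suited to CNV detection, can be illustrated with a toy calculation. This is a deliberately simplified sketch, not a model of any real chemistry: the per-round efficiencies of the two regions and the number of rounds are hypothetical values. When every copy is itself a template, a small efficiency difference is compounded in each round; when only the original template is copied, the ratio between regions stays close to the efficiency ratio.

```python
def exponential_copies(efficiency, rounds, start=1.0):
    """Copies after `rounds` rounds when every copy is itself a template."""
    return start * (1.0 + efficiency) ** rounds


def linear_copies(efficiency, rounds, start=1.0):
    """Copies after `rounds` rounds when only the original template is copied."""
    return start * (1.0 + efficiency * rounds)


def bias_ratio(copy_fn, eff_a, eff_b, rounds):
    """Ratio of region A copies to region B copies after amplification."""
    return copy_fn(eff_a, rounds) / copy_fn(eff_b, rounds)


# Hypothetical per-round efficiencies for two genomic regions.
EFF_A, EFF_B, ROUNDS = 0.95, 0.85, 20

exp_bias = bias_ratio(exponential_copies, EFF_A, EFF_B, ROUNDS)
lin_bias = bias_ratio(linear_copies, EFF_A, EFF_B, ROUNDS)
print(f"exponential bias after {ROUNDS} rounds: {exp_bias:.2f}x")
print(f"linear bias after {ROUNDS} rounds: {lin_bias:.2f}x")
```

With these illustrative numbers, two regions that start at equal copy number diverge several-fold under exponential amplification but only slightly under linear amplification, which is the kind of distortion that would be misread as a copy-number difference.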

The linear amplification via transposon insertion (LIANTI) system uses a specially designed Tn5 transposon that contains the T7 promoter and is inserted randomly into the genome; linear amplification is then performed by in vitro transcription. The thousands of RNA copies are subsequently reverse transcribed and subjected to second-strand synthesis to form the LIANTI amplicons used for library construction. Compared with other amplification methods, this method improves coverage and uniformity and reduces the allelic dropout rate and false positive rate [59].

The DNA content of a single cell is at the picogram level, so special attention should be paid to preventing contamination from the environment and the operator in order to reduce nonspecific amplification; all operations should be performed under sterile conditions with controlled air pressure.

Single cell whole genome sequencing

Single-cell sequencing analysis technology was developed by the MD Anderson Cancer Center and Cold Spring Harbor Laboratory in 2011 [4]. Commonly used next-generation sequencing (NGS) platforms include 454 Life Sciences/Roche, Illumina and Applied Biosystems' SOLiD system (Fig. 3) [60]. In single-cell genome sequencing, the total amount and fragment distribution of the amplified products are first measured, and libraries are constructed from the samples that qualify. Library preparation includes randomly fragmenting the amplified products into small DNA fragments, end repair, A-tailing, adapter ligation and PCR amplification to obtain the required library. The library is sequenced once its concentration and fragment distribution have passed quality inspection.

Fig. 3

Technical characteristics including sample preparation, sequencing chemistries, and data output formats of different sequencing approaches [60]. The copyright of this image belongs to Reference [60]

The basic process of single-cell genome sequencing data analysis is similar to that of NGS. First, the raw sequencing data are filtered and the quality of the sequencing is evaluated. Then, the filtered data are aligned to the reference genome, and the corresponding metrics are quality-controlled. Because WGA introduces uneven coverage and a high chimerism rate, the data need to be preprocessed. For example, once the nucleic acid library is normalized, it can be assembled using traditional assembly methods [60]. At present, data analysis methods developed for single-cell sequencing include SmashCell, Velvet-SC, and SPAdes. These high-performance computing platforms and bioinformatics methods have overcome the problem of uneven coverage caused by amplification to a certain extent [61,62,63]. Single-cell genome sequencing can provide information regarding large-scale genomic structural variations, including genome rearrangements, insertions, duplications, inversions and transpositions, as well as other genomic variation, such as CNVs and SNVs. SNVs include single base insertions, deletions and mutations. Through these genomic variations, tumor driver genes and biomarkers can be found, and the progression of tumorigenesis can also be understood [64].
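
One common way to turn aligned reads into CNV information is to count reads in fixed genomic bins and compare a tumor cell with a normal control cell. The following is a generic sketch of that idea under simplifying assumptions; it is not the method of SmashCell, Velvet-SC, SPAdes or any cited study. The read counts are hypothetical, normalization is by the median bin, and the cap of 6 copies is an illustrative choice.

```python
from statistics import median


def normalize(counts):
    """Scale bin counts so that the median bin equals 1.0."""
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    return [c / med for c in counts]


def estimate_copy_number(sample_counts, control_counts, ploidy=2, cap=6):
    """Estimate per-bin copy number relative to a normal control cell."""
    if len(sample_counts) != len(control_counts):
        raise ValueError("sample and control must have the same number of bins")
    sample = normalize(sample_counts)
    control = normalize(control_counts)
    copy_numbers = []
    for s, c in zip(sample, control):
        if c == 0:
            copy_numbers.append(None)  # bin not covered in the control
            continue
        # Dividing by the control removes bin-specific coverage bias.
        copy_numbers.append(min(cap, round(ploidy * s / c)))
    return copy_numbers


# Hypothetical read counts for six genomic bins.
control = [100, 120, 80, 100, 110, 90]
sample = [105, 118, 160, 50, 112, 300]
print(estimate_copy_number(sample, control))
```

Dividing by a control cell processed with the same amplification protocol is one simple way to reduce the uneven coverage that WGA introduces; real pipelines typically add GC correction and segmentation, which are omitted here.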

Single cell sequencing of circulating tumor cells

CTCs are an important indicator of tumor progression, and single-cell whole-genome sequencing of CTCs from the peripheral blood of patients with solid tumors helps to elucidate the occurrence and development of tumors, especially tumor heterogeneity and drug resistance, and can identify the mechanisms of tumor development. Discovering gene mutations can lead to the discovery of new driver genes, enhance understanding of the clonal origin and evolutionary mechanism of tumors, reveal the genetic sequence differences between tumor subtypes, and contribute to the discovery of new biomarkers [49]. The information obtained through single-cell sequencing is more comprehensive, making up for the deficiencies of tumor stratification based on a single biopsy, and is widely used to assist in the early diagnosis of tumors, the selection of therapeutic drugs, prognosis prediction and relapse monitoring. Tumor diagnosis and prognosis prediction via single-cell sequencing of CTCs is noninvasive (Table 2).

Table 2 Summary of single-cell sequencing studies of various cancer CTCs

Single-cell sequencing technologies for CTCs

This section describes emerging and important single-cell sequencing technologies for CTCs, such as Hydro-Seq and the EPISPOT and EPIDROP assays. Yu-Heng Cheng et al. presented Hydro-Seq, a high-efficiency, contamination-free cell capture scRNA-seq platform for the gene expression profiling of CTCs. Hydro-Seq uses size-based single-cell capture to prevent the bias that may result from molecular CTC selection. This cell capture protocol achieves high capture efficiency (72.85 ± 2.64%, mean ± standard deviation, n = 3) for the analysis of a small number of CTCs in a 10 ml blood sample. To enable contamination-free single-cell sequencing, the Hydro-Seq chip integrates pneumatic valves that allow the chambers to be washed so that cellular and noncellular contaminants are cleared on the chip. In addition, the chamber array can be expanded to thousands of chambers for massively parallel analysis. By sequencing 666 CTCs from 21 patients with advanced breast cancer, the authors validated the utility of Hydro-Seq, identifying cellular heterogeneity as a key biomarker of tumor metastasis and treatment. Hydro-Seq offers the ability to analyze CTCs by single-cell whole-transcriptome sequencing for metastasis studies and companion diagnostic applications [65].

Liquid biopsy has been introduced as a new diagnostic concept based on the analysis of CTCs or circulating tumor-derived factors, particularly cell-free tumor DNA (ctDNA). Highly sensitive liquid biopsy assays have since been developed that can be applied to detect and describe minimal residual disease (MRD), which reflects the presence of tumor cells that have disseminated from the primary lesion to distant organs in patients lacking any clinical or radiological signs of metastasis, or of residual tumor cells that remain after local therapy and ultimately lead to local recurrence. This application is a new frontier in liquid biopsy analysis and is challenged by the very low concentrations of CTCs in blood samples.

Pantel & Alix-Panabières discussed key techniques, such as the EPISPOT and EPIDROP assays, used to detect and characterize CTCs in MRD monitoring. They highlighted the current use of CTC analysis to detect and monitor MRD and to acquire clinical data on therapeutic targets and resistance mechanisms relevant to the management of individual cancer patients [66].

CTC analysis promotes the accurate typing of tumors

Many previous studies used Sanger sequencing or NGS to detect specific gene mutations in CTCs at the single-cell level and found heterogeneity among tumor patients, including patients with the same tumor; for example, Sanger sequencing revealed that PIK3CA mutation status differs widely between breast cancer CTCs and breast cancer tumor tissue. In addition, heterogeneous CTC subgroups can be identified based on mutations in the TP53 gene in CTCs [7, 67]. In colorectal cancer studies, BRAF, PIK3CA, and KRAS mutations were found in different CTCs, suggesting a high level of tumor heterogeneity both between individuals and within the same individual. Similarly, sequencing of BRAF and KIT mutations in malignant melanoma revealed a high degree of heterogeneity between CTCs and tumor tissue [6, 68].

In addition, CNV patterns in CTCs have been studied at the genome-wide level through genome-wide sequencing and array comparative genomic hybridization (aCGH). aCGH analysis of breast cancer CTCs showed that the CNVs in CTCs from different patients with the same pathological type of tumor were highly heterogeneous, suggesting that breast cancer can be typed more accurately according to CNV pattern [46]. A CTC single-cell whole-genome sequencing study of multiple tumors, including gastric cancer, colorectal cancer, breast cancer, and lung cancer, revealed that different CTCs in the same patient had highly consistent genome-wide CNV patterns, while the CTC CNV patterns of different tumors and different pathological types were quite different. Consistent with the studies above, the CNV pattern in CTCs of breast cancer patients was complex, suggesting that CNV analysis can be used as a basis for more accurate typing [69].

CTC analysis can reveal the mechanism of tumor metastasis

Tumor recurrence and metastasis are the main causes of tumor-related death. CTC single-cell sequencing analysis of CNVs and other mutation patterns helps to elucidate the mechanism of tumor metastasis. Heitzer et al. [70] used aCGH to analyze the CNV patterns in primary tumors, metastases, and CTCs of colorectal cancer at the single-cell genome level and found that CTCs contained the same CNVs as the primary tumors and metastases in addition to new CNVs. Lohr et al. [8] selected 10 patients with metastatic prostate cancer in whom CTCs were not detected in the peripheral blood and performed exome sequencing on tissue samples of the primary and metastatic tumors. Ten mutations, referred to as "early mutations," were found in the primary focus, and 56 mutations, referred to as "metastatic mutations," were found in the metastatic focus. Then, two more patients with metastatic prostate cancer whose peripheral blood contained more than 20 CTCs were selected. Exome sequencing was performed on the primary tumors, metastases and single CTCs, and 9 of the "early mutations" found in the primary foci and 41 of the "metastatic mutations" were also found in the CTCs. This indicates that the CTCs contained genetic mutation information from both the primary tumor and the metastases; thus, it is possible to investigate the mechanism of tumor metastasis from the mutations found in CTCs [8]. In 2017, Gao et al. [69] performed genome-wide sequencing on 28 primary tumor cells, 5 CTCs, and 3 metastatic lymph nodes from a colon cancer patient. They found that the CNVs of the 28 primary tumor cells were highly heterogeneous: the correlation coefficient between the CNVs of any two primary tumor cells ranged from 0.09 to 0.96. In contrast, the CNV patterns of the 5 CTCs were similar to one another, close to those of the three metastatic lymph nodes, and similar to those of a certain subpopulation of cells in the primary tumor. This indicates that the CNV pattern gradually converges during tumor metastasis and suggests that only a small group of tumor cells with a higher degree of malignancy can enter the circulatory system from the primary foci and then form metastatic foci. Through comprehensive analysis of SNVs, CNVs and structural variation, a two-step model of the formation of multi-interval CNVs has been proposed: a multi-interval copy number increase occurs due to a series of replication fork stalling and template switching events during DNA replication, followed by homologous recombination that further amplifies the region to a higher copy number. This model reveals the cause of tumor CNV formation at a deeper level (Fig. 4) [70].
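
The correlation coefficients mentioned above compare the copy-number profiles of two cells bin by bin. The sketch below shows how such a coefficient could be computed, assuming the Pearson coefficient is the one intended (the text does not specify this) and using short hypothetical profiles rather than data from the cited study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-bin copy numbers for two CTCs and one primary tumor cell.
ctc_1 = [2, 2, 3, 3, 2, 1, 2, 4]
ctc_2 = [2, 2, 3, 4, 2, 1, 2, 4]
primary_cell = [2, 3, 2, 2, 2, 2, 3, 2]

print(f"CTC 1 vs CTC 2: {pearson(ctc_1, ctc_2):.2f}")
print(f"CTC 1 vs primary cell: {pearson(ctc_1, primary_cell):.2f}")
```

A coefficient close to 1 for every pair of CTCs, together with a wide spread among primary tumor cells, is the pattern that would be read as convergence of the CNV profile during metastasis.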

Fig. 4

Evolution of SNVs and Large-scale CNAs in Primary Tumour cells and CTCs. A Schematic diagram of the manner in which primary tumour cells intravasate and become CTCs. B SNVs of primary tumour cells and CTCs. The distribution of 20 non-synonymous mutations was assessed in 28 primary tumour cells (Cells 1–28) and five CTCs (CTCs 1–5) from a colon cancer patient (blue box, mutant; grey box, wild type). Three clones of cells were present according to a probabilistic modelling-based approach. C CNA patterns of the primary tumour, one control leukocyte (C1), single primary tumour cells, CTCs, and three lymph node metastases (Pri., primary tumour; Meta., metastases). The copy numbers (blue and red dots) are plotted along the genome at a bin size of 500 kb. The ordinate coordinate represents copy numbers ranging from 0 to 6 (a copy number of more than 6 copies is set to 6). Phylogenetic tree on the left was constructed based on the segmented copy numbers of single cells [70]. The copyright of this image belongs to Reference [70]

Dynamic monitoring of tumor progression

As CTCs are an important component of liquid biopsy, an increasing number of studies have tried to use tumor mutation information from CTCs to guide the clinical treatment of tumors. In prostate cancer, the glucocorticoid receptor (GR) is a prime suspect for acquired therapy resistance, as resistance to the antiandrogen enzalutamide (Enz) can occur through bypass of androgen receptor (AR) blockade by GR [70]. Prostate cancer (PCa) cells are able to increase GR signaling during antiandrogen therapy and thereby circumvent AR blockade and cell death [71, 72]. In 2014, Dago et al. [70] analyzed the whole-genome CNVs of peripheral blood CTCs in patients with castration-resistant prostate cancer at four treatment time points: before chemotherapy, before treatment with abiraterone, when symptoms were significantly relieved, and when symptoms worsened. Combined with CTC morphology, AR expression levels and other analyses, the results showed that the CNV patterns in CTCs differed significantly between periods. In particular, when abiraterone was ineffective and the symptoms were aggravated, CNVs varied greatly and a subpopulation of CTCs had MYC gene amplifications. The appearance of this subpopulation of malignant CTCs correlated significantly with patient resistance to abiraterone. In contrast to fixed genomic alterations, Shah et al. [71] found that GR-mediated antiandrogen resistance is adaptive and reversible due to regulation of GR expression by a tissue-specific enhancer. GR expression is silenced in prostate cancer by a combination of AR binding and EZH2-mediated repression at the GR locus but is restored in advanced prostate cancers upon reversion of both repressive signals. Puhr et al. [72] identified MAO-A as a directly upregulated mutual epithelial and stromal GR target, which is induced after GC treatment and during PCa progression. Their findings demonstrate that targeting MAO-A represents an innovative therapeutic strategy to synergistically block GR- and AR-dependent PCa cell growth and thereby overcome therapy resistance. These studies showed that CTC single-cell sequencing can be used to dynamically monitor the response of cancer patients to treatment, detect the evolution of tumor cells and disease progression in a timely manner, and establish a new multiparameter liquid biopsy program (Fig. 5) [73].

Fig. 5

AR subcellular localization changes at the time of disease progression. A Comparison of the AR subcellular localization in the CTCs identified in the blood prior to and after nine weeks of abiraterone treatment. Correlation between the AR and DAPI signals within the cell is indicative of AR being colocalized with DAPI, i.e. localized in the cell nucleus. High correlation was generally seen before abiraterone treatment, but a shift to less nuclear stain was observed after nine weeks of treatment (p = 0.00017, Wilcoxon rank-sum test). B and D Height maps constructed from the pixel intensities of CK (red), AR (green) and DAPI (blue) in representative CTCs to visualize the subcellular localization of AR. The cell in (B) was isolated before abiraterone initiation and displays AR staining confined to the nucleus, while cytoplasmic AR staining is observed in the CTC identified at the time of therapeutic relapse (D). C and E Plots of AR versus DAPI signal intensities for each pixel inside the cell in the 40× images of the CTCs in (B) and (D), respectively. Each plot point is colored by the corresponding CK signal intensity. Nuclear localization was observed as positive correlation between the two intensities (C), and nuclear exclusion as negative correlation (E). All graphs were made using the ggplot2 and rgl packages in R [73]. The copyright of this image belongs to Reference [73]

Dynamic monitoring of primary tumor cells, CTCs and metastatic tumor cells through single-cell sequencing can help to elucidate tumor progression in real time in a noninvasive manner, identify the key oncogenes and tumor suppressor genes of tumor patients, and characterize genomic CNVs. The resulting information on early tumor diagnosis, dynamic treatment monitoring, drug resistance mutations and other aspects of personalized treatment provides the basis for potential clinical applications (Fig. 6) [74,75,76,77,78]. Single-cell sequencing compares the single-cell genomes, transcriptomes, and epigenomes of peripheral blood CTCs with those of primary tumors, metastatic lymph nodes, and metastatic tumors, which reduces interference from tumor heterogeneity, increases understanding of the biology of tumor development, and provides a new perspective on the evolution of this process [6,7,8,9, 68].

Fig. 6

Single cell genomics of CTCs from patients. a H&E staining of the primary tumor of metastatic breast and lung cancer patients. Tissue biopsies were used to determine the presence of DNA mutations on the oncogenes PIK3CA and EGFR. b Panel of CTCs from the same metastatic breast and lung cancer patients in (a). Micrographs of the CTCs identified and subsequently released for molecular analysis using our selective release mechanism (scale bar 10 μm). c Micrographs of amplified DNA of the single CTCs shown in (b). d Sequencing of the amplified DNA from the single CTCs shown in (b). The 3140A/G (H1047R) point mutation in the PIK3CA oncogene as well as the exon 19 deletion and the 2573 T/G (L858R) point mutation in the EGFR oncogene were detected at the single cell level [74]. The copyright of this image belongs to Reference [74]

Determining the efficacy of adjuvant therapy

Assessing the therapeutic response of tumors through CTC single-cell transcriptome sequencing is another important application of single-cell sequencing analysis. In 2014, Ting et al. [77] used the CTC-iChip to enrich peripheral blood CTCs from a pancreatic cancer mouse model and analyzed the transcriptomes of 75 CTCs at the single-cell level. They comprehensively analyzed the expression levels of marker genes of epithelial cells, hematopoietic cells and endothelial cells and found that there were 7 different cell subpopulations among the CTCs and that mouse pancreatic cancer CTCs highly expressed extracellular matrix genes such as Dcn, Sparc, Ccdc80, Col1a2, Col3a1 and Timp2, which are related to the dissemination of the tumor to distant organs. In 2015, this group used the same method to analyze 77 CTCs obtained from the peripheral blood of 13 prostate cancer patients and found that gene expression in CTCs was heterogeneous and included androgen receptor (AR) mutants and differentially expressed splice variants. On this basis, a retrospective analysis of patients treated with AR inhibitors and their response to the inhibitors revealed that, after AR inhibitor treatment, patients whose CTCs showed noncanonical Wnt signaling remained prostate-specific antigen positive or still needed radiation therapy. The Wnt signaling pathway and its downstream RAC1, RHOA, and CDC42 signals were activated, indicating that changes in cell signaling pathways in CTCs may be related to the therapeutic response of patients [78]. Compared with CTC single-cell genome sequencing analysis, CTC single-cell transcriptome sequencing analysis is relatively difficult.

Conclusion and prospects

Single-cell sequencing is a booming emerging technology. In 2013, Science magazine ranked single-cell sequencing among its top six fields. However, a major challenge in the field is sample size. CTCs occur at extremely low frequency, and even after successful enrichment, the captured CTCs originate from different time points (having passively and/or actively detached from tumors at different times) and from different tumors (primary and/or metastatic), which confounds the sequencing results. Furthermore, the different single-cell enrichment, library preparation, WGA and sequencing technologies used in independent studies are another potential source of variation. Single-cell sequencing technology is not yet fully mature; for example, artifacts introduced by whole-genome amplification, low coverage, poor reproducibility, allelic dropout, false positives and false negatives, as well as errors in sequencing and assembly software, all occur. As a result, analyzing heterogeneity and clonal evolution and discovering driver genes remain challenging. With the continuous optimization of genome-wide amplification methods and the rapid development of bioinformatics methods, these problems will gradually be solved. Amplification methods with higher coverage and better uniformity will promote the development of single-cell genome sequencing technology. For the analysis of single-cell sequencing data, methods developed for bulk sequencing, such as MuTect and VarScan, are commonly used, as is the single-cell variant caller Monovar. In recent years, researchers have also successfully developed many bioinformatics methods to better analyze high-throughput data. Salehi et al. [79] proposed ddClone, evaluated on real and simulated datasets, which analytically integrates NGS and single-cell sequencing (SCS) data, leveraging their complementary attributes through joint statistical inference. Furthermore, technological advances have made it possible to measure spatially resolved gene expression at high throughput. Svensson et al. [80] developed SpatialDE, a statistical test to identify genes with spatial patterns of expression variation from multiplexed imaging or spatial RNA sequencing data, which implements "automatic expression histology" (a spatial gene clustering approach that enables expression-based tissue histology). Additionally, single-cell transcriptome sequencing and epigenetic sequencing methods are constantly being developed, and single-cell genome sequencing can be used in the integrated analysis of single-cell multiomics [81, 82].

Given the key role of CTCs in the development of tumors, the continuous maturation of single-cell sequencing technology and the standardization of CTC enrichment and identification technology will allow CTC single-cell sequencing analysis to contribute to our understanding of the genetic heterogeneity, evolution, and drug resistance of tumor cells. The integrated analysis of single-cell sequencing combined with other omics provides valuable information and will also promote the development of precision medicine for tumors.