Introduction

A single cell is the ultimate unit of life activity, in which genetic mechanisms and the cellular environment interplay with each other and shape the formation and function of such complex structures as tissues and organs. Dissecting the composition and characterizing the interaction, dynamics, and function at the single-cell resolution are crucial for fully understanding the biology of almost all life phenomena, under both normal and diseased conditions. Cancer, a disease caused by somatic mutations conferring uncontrolled proliferation and invasiveness, could in particular benefit from advances in single-cell analysis. During oncogenesis, different populations of cancer cells that are genetically heterogeneous emerge, evolve, and interact with cells in the tumor microenvironment, which leads to host metabolism hijacking, immune evasion, metastasis to other body parts, and eventual mortality. Cancer cells can also manifest resistance to various therapeutic drugs through cellular heterogeneity and plasticity. Cancer is increasingly viewed as a ‘tumor ecosystem’, a community in which tumor cells cooperate with other tumor cells and host cells in their microenvironment, and can also adapt and evolve to changing conditions [1,2,3,4,5].

Detailed understanding of tumor ecosystems at single-cell resolution has been limited for technological reasons. Conventional genomic, transcriptomic, and epigenomic sequencing protocols require microgram-level input materials, and so cancer-related genomic studies were largely limited to bulk tumor sequencing, which does not address intratumor heterogeneity and complexity. The advent of single-cell sequencing technologies [6,7,8] has shifted cancer research to a new paradigm and revolutionized our understanding of cancer evolution [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22] and tumor heterogeneity [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,86]. Methods that couple additional experimental techniques with single-cell sequencing technologies are also gaining traction [21, 87,88,89,90,91], providing a more integrated analysis of single cells.

Fig. 1

State of the art of single-cell sequencing technologies. Single-cell sequencing technologies have been designed for almost all the molecular layers of genetic information flow from DNA to RNA and proteins. For each molecular layer, multiple technologies have been developed, each with specific advantages and disadvantages. Single-cell multi-omic technologies are close to comprehensively depicting the state of the same cells. We apologize for the exclusion of many single-cell sequencing methods due to the limited figure space.

Accompanying the tremendous progress of experimental single-cell sequencing technologies, specialized bioinformatics and algorithmic approaches have also been developed to best interpret single-cell data while reducing their technical noise. Examples of these approaches include the imputation of dropout events [92,93,94,95], normalization and correction of batch effects [96,97,98,99,100], clustering for identification of cell types [98, 101,102,103,104,105,106,107,108], pseudo-temporal trajectory inference [109,110,111,112], spatial position inference [87, 88, 90], and data visualization [102, 113,114,115]. Progress in this area requires the application of statistics, probability theory, and computing technologies, which lead to new algorithms, software packages, databases, and web servers. Detailed information on specific single-cell technologies and the underlying principles of the algorithms has been elegantly discussed in other reviews [61, 64,65,66,67,68,69,70, 72, 116,117,118,119,86].

An alternative is to combine bulk and single-cell sequencing and then conduct deconvolution analysis [127]. Deconvolution analysis for bulk RNA-seq data uses cell-type signature genes as inputs [128,129,130], and these signatures can be replaced by single-cell sequencing results, although critical computational challenges remain, such as collinearity among single cells. If marker genes for known cell types are orthogonal to each other, the proportions of each cell type in a bulk sample can be reliably estimated. However, collinearity of gene expression exists widely among single cells, which complicates the deconvolution process. At present, successful deconvolution of bulk RNA-seq data based on scRNA-seq-defined signatures has been reported only in cases where orthogonal molecular signatures and fine cluster structures are well balanced [131]. The wider use of scRNA-seq-based deconvolution will hinge upon the availability of comprehensive single-cell clusters and the development of general methods for selecting orthogonal signatures for each cell type.
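The effect of collinearity on deconvolution can be illustrated with a small numerical sketch. It is not the method of any cited study: it assumes a bulk profile is a linear mixture of cell-type signatures, estimates proportions by ordinary least squares, and then clips negative values and renormalizes (a crude stand-in for the non-negative least squares that is more commonly used). The signature values, the true proportions, and the noise vector are all made up for illustration, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular (no check is made).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def deconvolve(signatures, bulk):
    """Estimate cell-type proportions from a bulk profile.

    signatures: one row per gene, one column per cell type.
    bulk: one expression value per gene.
    Solves the normal equations (S^T S) p = S^T bulk, then clips
    negative entries to zero and rescales so the proportions sum to 1.
    """
    k = len(signatures[0])
    sts = [[sum(g[i] * g[j] for g in signatures) for j in range(k)]
           for i in range(k)]
    stb = [sum(g[i] * y for g, y in zip(signatures, bulk)) for i in range(k)]
    p = [max(v, 0.0) for v in solve_linear(sts, stb)]
    total = sum(p) or 1.0
    return [v / total for v in p]


def mix(signatures, proportions, noise):
    """Simulate a bulk sample: linear mixture plus a fixed noise vector."""
    return [sum(s * p for s, p in zip(gene, proportions)) + e
            for gene, e in zip(signatures, noise)]


# Illustrative values only (not taken from any dataset).
true_p = [0.7, 0.3]
noise = [0.5, -0.5, 0.5, -0.5]

# Each marker gene is expressed in exactly one cell type.
orthogonal = [[10.0, 0.0], [8.0, 0.0], [0.0, 9.0], [0.0, 7.0]]
# Both cell types express every gene at similar levels.
collinear = [[10.0, 9.0], [8.0, 7.5], [9.0, 8.0], [7.0, 6.5]]

est_orthogonal = deconvolve(orthogonal, mix(orthogonal, true_p, noise))
est_collinear = deconvolve(collinear, mix(collinear, true_p, noise))

print("orthogonal signatures:", [round(v, 3) for v in est_orthogonal])
print("collinear signatures: ", [round(v, 3) for v in est_collinear])
```

With the same noise added to both simulated bulk samples, the estimate from orthogonal signatures stays close to the true proportions, whereas the estimate from nearly collinear signatures is pushed far from them. This is the instability that makes signature selection critical.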

Spatial information of single cells in the tissue is often lost during the isolation step, and thus single-cell sequencing data typically do not show how cells are organized to carry out their concerted function within a tissue of interest. Many new techniques have been developed to keep or restore the spatial information of sequenced single cells, such as fluorescence in situ hybridization (FISH), single-molecule fluorescence in situ hybridization (smFISH), laser capture microdissection, laser scanning microscopy, including two-photon laser scanning microscopy, and fluorescence in situ sequencing [21, 30, 87,88,89,90,91, 132,133,134,135,136,137,138,139,140,141,142,143]. However, at present all of these techniques have inherent limitations and apply only to specific spatial architectures. For example, FISH-based technologies can map the spatial distribution of a set of selected genes, from which the spatial positions of single cells subjected to RNA-seq can be reconstructed via probabilistic inference. These methods, however, are limited to two dimensions, and the inference depends primarily on the availability of marker genes that can discriminate spatial characteristics with sufficient resolution. Valid marker genes also require accurate and robust estimation of their expression levels, but this requirement can be greatly compromised by the dropout inherent in scRNA-seq protocols. Accurate restoration of single-cell spatial positions via FISH-based inference also requires replicable tissues for parallel FISH and scRNA-seq, which can be only approximately fulfilled in model organisms. For human cancers, however, such requirements usually cannot be met, and spatial-recording methods have thus been proposed. With laser capture microdissection, single cells are obtained at the same time as their spatial information is recorded. However, the cellular throughput of such methods is extremely limited because of operational difficulties, and the biological interpretation of the recorded spatial positions is limited because adjacent cells cannot be properly dissected for scRNA-seq, whereas sequenced cells are often distantly distributed. Low molecular throughput is also problematic with these recently developed in situ sequencing methods. Typically, only tens or hundreds of known genes can be labeled or sequenced in situ, far short of what is needed to fully understand the molecular landscapes of the single cells of interest. Furthermore, the limited replicability of such complicated experiments also imposes barriers to their practical application to human samples.
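The dependence of FISH-based spatial inference on marker genes and on dropout can be made concrete with a toy sketch. It is a deliberately simplified model, not the procedure of any cited method: it assumes a reference map of binary on/off marker patterns per spatial bin, binarized scRNA-seq calls for one cell, a uniform prior over bins, and fixed dropout and false-positive rates. The function name, the three-bin reference, and both rates are hypothetical.

```python
import math


def bin_posterior(cell_on, reference, dropout=0.3, false_pos=0.01):
    """Posterior probability of each spatial bin for one cell.

    cell_on: 0/1 call per marker gene from scRNA-seq (1 = detected).
    reference: per bin, the 0/1 pattern expected from the FISH map.
    dropout: assumed chance that an expressed marker goes undetected.
    false_pos: assumed chance that a silent marker is called detected.
    A uniform prior over bins is assumed.
    """
    log_liks = []
    for pattern in reference:
        ll = 0.0
        for observed, expected in zip(cell_on, pattern):
            if expected:
                p = 1.0 - dropout if observed else dropout
            else:
                p = false_pos if observed else 1.0 - false_pos
            ll += math.log(p)
        log_liks.append(ll)
    # Normalise in a numerically stable way.
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical reference: three bins, three marker genes.
reference = [
    [1, 1, 0],  # bin A expresses markers 1 and 2
    [1, 0, 0],  # bin B expresses marker 1 only
    [0, 0, 1],  # bin C expresses marker 3 only
]

# A cell from bin A with both markers detected.
post_clean = bin_posterior([1, 1, 0], reference)
# The same cell when marker 2 has dropped out.
post_dropout = bin_posterior([1, 0, 0], reference)

print("all markers detected:", [round(p, 3) for p in post_clean])
print("marker 2 dropped out:", [round(p, 3) for p in post_dropout])
```

When both markers are detected, the cell is assigned to bin A with high probability. After a single dropout event, the most probable bin becomes B, while A keeps only a minority share of the posterior. In this toy setting, losing one marker is enough to misplace the cell.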

Because single-cell sequencing captures individual cells at a particular time point, other factors such as cell cycle and functional state must be considered. By contrast, these factors are often ignored in bulk sequencing because of the averaging effect. Cell cycle phases can be discerned by phase-specific expression analysis [194]

and provides great promise for cancer research. c Cellular interaction mapping. Application of single-cell multi-omics techniques to resolve both the somatic mutations and gene expression, which will allow the investigation of immunogenicity of single cancer cells. d Single-cell epigenetics. Techniques to resolve the heterogeneity of cancer cells and tumor-infiltrating immune cells, which may provide new insights into the regulatory mechanisms within tumors and new drug targets to modulate tumor progression.
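As a rough illustration of the phase-specific expression analysis mentioned above for discerning cell cycle phases, the sketch below scores each cell on a few widely used S-phase and G2/M-phase marker genes and assigns the phase with the higher score. It is an assumption-laden toy rather than the approach of reference [194]: the short gene lists, the use of the cell's own mean expression as background, the decision rule, and all expression values are chosen purely for illustration, and the function names are hypothetical.

```python
# Short illustrative marker lists; real analyses use much longer ones.
S_GENES = ["PCNA", "MCM2", "TYMS"]
G2M_GENES = ["TOP2A", "CCNB1", "MKI67"]


def mean_expr(cell, genes):
    """Mean expression of the given genes; missing genes count as zero."""
    values = [cell.get(g, 0.0) for g in genes]
    return sum(values) / len(values)


def assign_phase(cell):
    """Assign 'G1', 'S', or 'G2M' to one cell.

    cell: dict mapping gene name to a log-normalised expression value.
    Each phase score is the mean marker expression minus the cell's mean
    expression over all measured genes (an assumed, simple background).
    If neither score is positive the cell is called G1; otherwise the
    phase with the higher score is returned.
    """
    background = sum(cell.values()) / len(cell)
    s_score = mean_expr(cell, S_GENES) - background
    g2m_score = mean_expr(cell, G2M_GENES) - background
    if s_score <= 0.0 and g2m_score <= 0.0:
        return "G1"
    return "S" if s_score > g2m_score else "G2M"


# Made-up expression profiles for three cells.
cell_s = {"PCNA": 3.0, "MCM2": 2.5, "TYMS": 2.8, "TOP2A": 0.2,
          "CCNB1": 0.1, "MKI67": 0.0, "ACTB": 2.0, "GAPDH": 2.0}
cell_g2m = {"PCNA": 0.3, "MCM2": 0.2, "TYMS": 0.1, "TOP2A": 3.1,
            "CCNB1": 2.9, "MKI67": 2.7, "ACTB": 2.0, "GAPDH": 2.0}
cell_g1 = {"PCNA": 0.1, "MCM2": 0.0, "TYMS": 0.2, "TOP2A": 0.1,
           "CCNB1": 0.0, "MKI67": 0.1, "ACTB": 2.0, "GAPDH": 2.0}

phases = [assign_phase(c) for c in (cell_s, cell_g2m, cell_g1)]
print(phases)
```

Once a phase label is available for each cell, it can be treated as a covariate in downstream analysis, so that cell cycle differences are not mistaken for differences in cell type or functional state.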