Introduction

Characterizing the spatial distributions of molecularly defined cell types is a shared goal of the Human Cell Atlas (HCA)1, NIH BRAIN Initiative Cell Census Network (BICCN)2, Human BioMolecular Atlas Program (HuBMAP)3 and related collaborative efforts. The core elements in this task include transcriptional classification and spatial localization of cell types, which involves integration of single cell and spatially-resolved transcriptomics to define and spatially match cell types through the analysis of combinatorial gene expression patterns in tissue sections. Single cell and single nucleus RNA sequencing (scRNA-seq) has rapidly progressed into a high throughput standardized methodology and has been used by many labs as a major workhorse for cell type classification in many organs. In contrast, spatial transcriptomics methods are still evolving, varying substantially in methodology, degree of multiplexing, cost, and throughput, and lacking consensus data standards and analysis methods.

Characterizing spatially-resolved cell types is essential in the brain in order to study the exceptional cellular heterogeneity and functional significance of its spatial organization. ScRNA-seq has revealed an unprecedented granularity of neuronal cell types in mouse and human brains4,5,6,7, providing a comprehensive landscape of cell type heterogeneity defined by their transcriptional profiles. Recently, a number of multiplex fluorescence in situ hybridization (mFISH) and in situ RNA sequencing methods8,9,10,11,12,13,14,15,16,17,18 have been reported for conducting spatial transcriptomics experiments at the cellular level. Each method is independently optimized for marker gene panel design, tissue processing, transcript sequencing, and imaging steps of the pipeline, requiring different strategies for data processing, quality control, and downstream analysis. The SpaceTx Consortium, an organized effort consisting of both experimental and computational working groups, took the lead to evaluate the performance of currently available spatially-resolved transcriptomics methods in high quality cortical brain samples, with the goal of building consensus maps of cortical cell type distributions based on combined analysis of single cell and spatially-resolved transcriptomics. The overarching effort of the SpaceTx Consortium will be described in a separate publication22, and ExSeq23,24) applied to tissue section from the mouse primary visual cortex brain region (VISp)4. All mRNA detection data (spot-by-gene matrices) were segmented using the same segmentation procedure—Baysor25, which also included consistent quality control approaches for doublet and low-quality cell removal. The segmentation step produced the cell-by-gene matrices that were used to assign the spatially-resolved cell types to scRNA-seq reference cell types using cell type matching algorithms.

Figure 1

Overview of the SpaceTx analysis workflow. The reference scRNA-seq cell types of the primary visual cortex (VISp) of mouse brain are from Tasic et al.4. Spatial transcriptomics data were generated by four image-based experimental protocols (MERFISH, smFISH, BaristaSeq, and ExSeq). Segmentation and quality control were performed using a common procedure (Baysor). Six computational algorithms (ATLAS, FR-Match, map.cells*, mfishtools, pciSeq, and Tangram) for cell type matching were applied. Two meta-analysis strategies were used to combine the individual matching results. Spot-based segmentation-free cell type assignment was conducted using SSAM. All data and matching results can be viewed in Cytosplore Viewer (https://viewer.cytosplore.org).

Teams of the SpaceTx Consortium explored six computational algorithms (ATLAS26, FR-Match27,28, map.cells*4, mfishtools29, pciSeq30, and Tangram31), which produced individual cell type assignments with various probabilistic assignment scores. To arrive at consensus cell type assignments, two meta-analysis strategies were developed to combine the individual assignments more quantitatively (Geometric Median Combining Strategy, hereinafter GMCS), or more qualitatively (Negative Weighting Combining Strategy, hereinafter NWCS) (see “Methods” section). In parallel, spot-based cell type assignment was performed by SSAM32 using a guided mode, which partially borrows information from the combined assignment results. All spatial data and cell type assignment results were loaded into the Cytosplore Viewer (https://viewer.cytosplore.org) for interactive visualization and data exploration, where an integrated tSNE33 map for all annotated cells in all spatial methods is presented together with single method viewers for comparative analysis.

In situ hybridization spatial transcriptomics data

The overarching SpaceTx Consortium project evaluated spatial transcriptomics technologies including multiple imaging-based spatial transcriptomics protocols alongside spatial sequencing methods, 10x Visium, and Slide-seq. For the scope of this meta-analysis of computational evaluations, we focus on the in situ imaging-based protocols with data that has passed preliminary quality control. All data used and presented in this manuscript were generated by the experimental working group of the SpaceTx Consortium. Experimental details, including gene panel selection, tissue distribution and processing, image processing, etc., are available in22, and ExSeq23,24 protocols. In general, the imaging-based protocols conduct multiple rounds of chemistry and microscopy to measure the location of individual mRNA molecules in the tissue section. Since each spatial method has unique requirements for numbers of genes and expression levels, each experimental protocol assembled different probe panels with specific gene sets in their design (Supplementary Table S1). Sensitivities of detection across protocols are reported in Table 1 in the SpaceTx Consortium paper30, and Tangram31) were applied to assign reference scRNA-seq cell types to each segmented cell with an associated confidence score (or probabilistic assignment) based on the cell-by-gene count matrix (see “Methods” section). Confidence scores are metrics with values in the [0,1] range, but are defined differently in each algorithm (see “Methods” section and citations). Applying the cell type matching algorithms produces a cell-by-type matching matrix as a primary output, consisting of probabilistic assignment of each segmented cell to each of the reference cell types at the subclass level. Deterministic cell type assignment for each spatial cell is defined as the reference cell type that has the highest confidence score for each algorithm.
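As an illustration of the deterministic assignment rule described above, the following minimal sketch takes a cell-by-type confidence matrix and returns the top-scoring reference type for each cell. The function name, the toy matrix, and the three subclass labels are hypothetical and chosen only for the example; they are not part of any of the six matching packages.

```python
import numpy as np

def deterministic_assignment(conf, type_names):
    """Return the top-scoring reference type and its score for each cell.

    conf is a (cells x types) matrix of confidence scores in [0, 1];
    type_names lists the reference types in column order.
    """
    conf = np.asarray(conf, dtype=float)
    best = conf.argmax(axis=1)                      # column index of the highest score
    top_scores = conf[np.arange(conf.shape[0]), best]
    return [type_names[j] for j in best], top_scores

# Toy cell-by-type matrix (3 cells, 3 hypothetical subclasses).
types = ["L2/3 IT", "Vip", "Sst"]
conf = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6],
                 [0.4, 0.5, 0.1]])
labels, scores = deterministic_assignment(conf, types)
```

Because the scores are defined differently by each algorithm, the returned scores are only meaningful within a single method's matrix.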

In this subsection and the next, we focus the discussion of cell type matching performance on MERFISH data; similar analyses were applied to the other spatial data as well. Similar but less refined matching results were observed, potentially due to the fact that fewer genes were probed in those gene panels. For further exploration, all spatial data on VISp are publicly available as a data resource at https://spacetx.github.io/data.html.

A key challenge for the deterministic assignment of cell types was the extensive differences observed among the individual matching results without the availability of a gold standard result to compare against, although some information about expected spatial distributions is available based on scRNA-seq data4. For example, the deterministic cell type assignment for the L2/3 IT subclass showed substantial differences in the number of matched cells (Fig. 3A) and the spatial distribution of the cells matched to the same subclass (Fig. 3B) among the individual matching methods. The differences were also reflected in the substantial disagreement among methods about which cells matched to the same subclass (Fig. 3C,D).

Figure 3

Cell type matching performance comparison on the L2/3 IT subclass of MERFISH data. Six computational methods were applied to match/assign reference cell types to the spatial cells. (A) Number of cells matched to the L2/3 IT subclass by each individual method. (B) Spatial distribution of the cells matched to the L2/3 IT subclass by each individual method. X-axis is the spatial axis perpendicular to cortical layers measured as distance (μm) from pia (left end: upper layer, right end: deeper layer). (C) Overlapping of cells matched to the L2/3 IT subclass by each individual method. (D) Breakdown of the intersections of cells matched to the L2/3 IT subclass by individual methods.

In this example, the L2/3 IT subclass is a relatively abundant cell population consisting of intratelencephalic (IT) neurons that are expected to appear in supragranular cortical layers (layers 2 and 3). Out of the 2150 MERFISH cells for cell type matching, the numbers of cells matched to the L2/3 IT subclass were 349 (ATLAS), 581 (FR-Match), 693 (map.cells*), 798 (mfishtools), 637 (pciSeq), and 176 (Tangram) in the individual matching results (Fig. 3A). In all cases, the great majority of cells matching to L2/3 IT are found in supragranular cortex, as expected, although the exact footprint of this layer and the number of off-target matchings varied by method (Fig. 3B). All six methods identified 127 cells in common (Fig. 3C). Tangram matched the smallest number of cells to the L2/3 IT subclass, followed by ATLAS. Most of the cells matched by ATLAS and Tangram were matched equivalently by the other methods, which may suggest that these methods have higher precision (a.k.a. positive predictive value) but lower sensitivity for this subclass. The other four methods identified 397 L2/3 IT cells in common (Fig. 3D), suggesting there is a relatively abundant L2/3 IT cell population (18% of all segmented cells) identifiable by the majority of methods. We may regard the method-specific cells (8 for ATLAS, 46 for FR-Match, 50 for map.cells*, 97 for mfishtools, 72 for pciSeq, and 2 for Tangram) as cells that may have weaker signal and more noise in their combinatorial marker gene expression pattern; these noisy cells appear to be the major source of the observed spillover effect in the layer distributions (Supplementary Fig. S1) for this specific cell subclass.
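The overlap counts reported above (cells common to all methods and cells specific to one method) are plain set operations on the cell identifiers matched by each method. A minimal sketch follows, with hypothetical method names and toy cell IDs rather than the MERFISH data.

```python
def overlap_summary(matched):
    """Summarize agreement among methods for one subclass.

    matched maps a method name to the set of cell IDs that the method
    assigned to the subclass. Returns the cells matched by every method
    and, for each method, the cells matched by that method alone.
    """
    methods = list(matched)
    common = set.intersection(*(matched[m] for m in methods))
    specific = {}
    for m in methods:
        # Union of the cells matched by every other method.
        others = set().union(*(matched[o] for o in methods if o != m))
        specific[m] = matched[m] - others
    return common, specific

# Toy example with three hypothetical methods.
matched = {
    "method_A": {1, 2, 3, 4},
    "method_B": {2, 3, 5},
    "method_C": {2, 3, 4, 6},
}
common, specific = overlap_summary(matched)
```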

Spatial coordinate plots with confidence score intensities for each individual matching are available in Supplementary Figs. S2–S7. For each cell, the highest confidence score, which determines the deterministic cell type assignment, is plotted. All methods were able to recapitulate the laminar pattern of neuronal cells to some extent, particularly with respect to cells matched with higher confidence. Because these precision vs. sensitivity results for L2/3 IT are not necessarily representative of results from other subclasses, and since we do not have a ground truth result to assess performance against, we are not able to conclude that any specific method outperforms the others on cell type matching. Instead, we chose to computationally combine these matching methods, as has been done previously for other computational tasks such as cell type clustering35 and cell morphology tracing36.

Combined matchings

It is likely that the individual cell type matching methods have different advantages and experimental biases, and often produce different cell type assignments, especially in those cells with fewer total transcripts or less confident segmentation boundaries (Supplementary Fig. S8). Assuming that the majority of individual methods would produce some level of accurate cell type matching/assignment, combining their results using an ensemble approach may provide the best classification result. We used two different strategies, GMCS and NWCS, to combine all individual matching results in the ensemble meta-analysis, each producing a re-calculated confidence score matrix for determining the consensus cell type assignment. The GMCS combined matching approach considers each individual matching result as the vertex of a polygon whose geometric median, the point with minimum average Euclidean distance from these vertices, serves as the combined result (see “Methods” section). The NWCS combined matching approach is a rank-based weighted average of the confidence scores from each individual matching method using only the highest score for each cell (see “Methods” section). For all matching results, deterministic cell type assignment is defined as the cell type with the highest confidence score for a given cell. The confidence scores could be used as a quantitative metric of matching strength. However, they are defined differently and show very different distributional properties from each matching method (Supplementary Fig. S9). Even though all confidence scores are in the range of [0,1], they are therefore not directly comparable across individual matching results. As such, the ranks (i.e., order statistics) of the scores are pragmatically more useful, with deterministic cell type assignment using the top-ranked confidence score.

Using the L2/3 IT subclass as an example, the combined matching results are more similar in the number of cells (798 for GMCS and 659 for NWCS) matched to the subclass (Fig. 4A) and the spatial distributions mostly aligned (Fig. 4B). Between the two combined matching results, the vast majority of the cells matched to the same subclass (Fig. 4C), indicating strong agreement between the two combined matchings. The combined matching GMCS and NWCS assigned 37% and 31% of all MERFISH cells to the L2/3 IT subclass, respectively, although there is still some spillover of the matched cells in the layer distribution. Using the ensemble approaches, more cells in the expected supragranular cortical layers are matched to the L2/3 IT subclass compared to individual matchings (Fig. 4D). On average (arithmetic mean), up to 75% (red curve) of cells in the supragranular cortical layers are matched to L2/3 IT across all 6 individual methods. In contrast, up to 90% (green and blue curves) of cells in the supragranular cortical layers are matched to this subclass using these ensemble approaches, suggesting that these ensemble methods are effective at producing a highly compacted consensus result, whereas simple combinations (e.g., arithmetic mean) of the individual matching results do not effectively aggregate the cells matched to the L2/3 IT subclass towards the expected cortical layers.

Figure 4

Combined cell type matching performance on the L2/3 IT subclass of MERFISH data. Two ensemble approaches were applied to combine the individual matching results, resulting in two combined matchings, GMCS and NWCS. (A) Number of cells matched to the L2/3 IT subclass in each combined matching result. (B) Spatial distribution of the cells matched to the L2/3 IT subclass in each combined matching result. X-axis is the spatial axis perpendicular to cortical layers measured as distance (μm) from pia (left end: upper layer, right end: deeper layer). (C) Overlapping of cells matched to the L2/3 IT subclass in the combined matching results. (D) Comparison between averaged summary statistics of the individual matching results and summary statistics of the combined matching results. Each dot represents a 25 μm bin along the cortical depth axis. Avg_frac is the fraction of cells matched to the L2/3 IT subclass in the cortical depth bin averaged over individual matching methods. Avg_method is the average number of individual methods that matched a cell to the L2/3 IT subclass in the cortical depth bin. Frac_GMCS and Frac_NWCS are the fractions of cells matched to the L2/3 IT subclass in the cortical depth bin for the combined matching results. Supragranular cortical layers are indicated by the black bar.
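The per-bin fractions in panel D can be sketched as follows, assuming each cell has a cortical depth in μm and a boolean flag indicating whether a given matching assigned it to the subclass. The function name and the toy values are hypothetical; only the 25 μm bin width comes from the caption.

```python
import numpy as np

def fraction_by_depth(depth_um, is_matched, bin_um=25.0):
    """Fraction of cells matched to a subclass within each cortical depth bin.

    depth_um   : distance of each cell from pia, in micrometers
    is_matched : True where the cell was assigned to the subclass
    Returns one value per bin; empty bins are reported as NaN.
    """
    depth_um = np.asarray(depth_um, dtype=float)
    is_matched = np.asarray(is_matched, dtype=bool)
    bins = np.floor(depth_um / bin_um).astype(int)
    n_bins = bins.max() + 1
    total = np.bincount(bins, minlength=n_bins)
    hits = np.bincount(bins, weights=is_matched.astype(float), minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return hits / total

# Toy data: six cells at different depths.
depth = [5, 10, 30, 40, 45, 80]
matched = [True, False, True, True, False, True]
frac = fraction_by_depth(depth, matched)
```

Under the same assumptions, a quantity like Avg_frac would be the element-wise mean of this vector over the individual methods, provided every method is binned over the same depth range.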

Considering all cells, the two combined matching results produced cell type assignments with 83% (= number of cells assigned to the same subclass/total number of cells) of cells being assigned to the same subclass, overcoming the large differences among individual matching results. The combined confidence score intensity matching plots for all cells are available in Supplementary Figs. S10–S11; and the distribution of all cells in cortical layers by each combined matching are in Supplementary Fig. S12. Though the distributions of matched cells in cortical layers are very similar for the abundant GABAergic and glutamatergic subclasses between the two combined matchings (Supplementary Fig. S12), they differ in rare and non-neuronal subclasses (e.g., Meis2, Endothelial, and Macrophage), suggesting that it is more difficult to detect and match rare cell types in spatial transcriptomics. Overall, these results suggest that, while individual matching algorithms may have different strengths and biases leading to somewhat different results, the ensemble methods provide a more robust cell type matching/assignment for the spatial cells.
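The agreement figure quoted above follows directly from its definition (cells assigned to the same subclass divided by all cells). A minimal sketch with hypothetical toy labels:

```python
def agreement(labels_a, labels_b):
    """Fraction of cells given the same subclass by two matchings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both matchings must cover the same cells")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)

# Toy consensus labels for six cells (illustrative only).
gmcs_labels = ["L2/3 IT", "Vip", "Sst", "L4", "Pvalb", "Meis2"]
nwcs_labels = ["L2/3 IT", "Vip", "Sst", "L4", "Pvalb", "Endothelial"]
agree = agreement(gmcs_labels, nwcs_labels)
```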

Laminar distributions of neurons from computational methods

The spatial distributions of many known neuron types have been studied through frozen dissection, RNAscope staining, confocal imaging, etc. For example, Fig. 1 in Tasic et al.4 shows the laminar patterns of mouse VISp reference scRNA-seq cell types in layer dissections. With the MERFISH data and all computational matching results, we plot the calculated spatial distributions of the matched inhibitory and excitatory neurons in Fig. 5. For the inhibitory neuron types, the Vip neurons are distributed in upper layers and the Sst and Pvalb neurons are distributed in deeper layers as expected. For the excitatory neuron types, in general, the major peak of each spatial distribution curve is found in the expected layer, but with varying width of the major peak and minor peaks in some cases.

Figure 5

Spatial distributions of inhibitory and excitatory neurons in individual and combined matching results of MERFISH data. Top: distributions (cortical depth) of inhibitory neurons of Vip, Sst, and Pvalb types show a peak of Vip type in upper layers and peaks of Sst and Pvalb types in deeper layers. Bottom: distributions of excitatory neurons follow the major laminar pattern expected, with various minor peaks in different methods. For detailed view, see the corresponding spatial coordinate plots of the matched cells for each subclass using each method in Supplementary Figs. S2–S7 and S10–S11.

Individual matching method agreement across protocols

Experimental protocols showed different detection sensitivities across platforms (Table 1 in the SpaceTx Consortium paper28. A recent publication also reported a systematic evaluation of the theoretical and practical aspects of data transformations and normalizations for single-cell RNA-seq data42.

In this manuscript, we first compared gene detection sensitivity and gene expression patterning across spatial experimental methods, which revealed high variability and very different dynamic ranges in the in situ hybridization data across different experimental protocols. We also presented a systematic evaluation of the individual cell type matching algorithms and the combined matching strategies using the MERFISH dataset as an example. The cell-based cell type matching algorithms were applied following the same segmentation step on the image data. Individual matching results varied widely in their metrics of matching confidence as well as their deterministic cell type assignments, among which no overall “winner” could be claimed without a gold standard result to compare against. Given the variable performance of individual matching results, we used ensemble meta-analysis approaches to combine these individual matchings to form consensus results. The meta-analysis approaches substantially improved the agreement between the consensus matchings, where the majority of the cells have the same cell type assignment by the two combined matching strategies. One exception is the NWCS and GMCS results for BaristaSeq, which may further suggest that rank-based approaches (e.g., NWCS) can be practically more useful for meta-analysis and more robust to the variations from individual matching results. Using the spot-based cell type matching algorithm, results similar to the consensus results could be efficiently obtained without explicit segmentation, given that precise gene signatures are available.

A Cytosplore Viewer compilation allows all spatial cells from all evaluated experimental protocols to be viewed in an integrated tSNE map based on the SpaGE-imputed expression scores from scRNA-seq reference data. This enables interactive selection of cells (either through free-form selection or per cell type subclasses), confirming the consistency of the layer patterns across spatial protocols. Differential analysis between free-form cell selections proved particularly useful for identifying gene expression gradients across cortical layers and confirming them across protocols. A side-by-side comparison between the segmentation-based workflow and segmentation-free SSAM revealed a larger density of local maxima detected by SSAM compared to the segmentation-based analysis. However, the spatial patterning of cell type subclasses was highly conserved between both methods. Finally, a direct comparison between both combining strategies revealed similar cell type matching results for smFISH, MERFISH and ExSeq. For BaristaSeq, the combined matching by GMCS resulted in inconclusive results, whereas the NWCS matching still performed reasonably well.

Alongside individual publications utilizing SpaceTx tissue and data12,28,43, the SpaceTx project produced three consortium-level outputs: (i) a summary manuscript of the overarching SpaceTx Consortium effort30, and Tangram31.

ATLAS (A Tool for Learning from Atlas-scale Single-cell multi-omic measurements) (https://github.com/spacetx-spacejam/edv) uses a neural network classifier that applies a central moment discrepancy (CMD)30 (https://github.com/acycliq/pciSeq) is a Python package for probabilistic cell typing by in situ sequencing. It uses a Bayesian algorithm, leveraging scRNA-seq data to first estimate the probability of each spot belonging to a cell and then each cell to a scRNA-seq cluster. Spots dataframe, segmentation image labels, and scRNA-seq data are required inputs to the algorithm.

Tangram31 (https://github.com/broadinstitute/Tangram) is distributed as a Python package, based on PyTorch and scanpy. Tangram requires as input a single-cell (or single-nucleus) gene expression dataset and a spatial gene expression dataset. Tangram learns an alignment for the single-cell data onto space by fitting gene expression on the shared genes. The output of the matching algorithm is a cell-by-spot matrix, which gives the probability for cell \(i\) to be in spot \(j\). Using this matching matrix, Tangram can project any annotation (e.g., cell types) from single-cell data onto space. The standard pipeline (with cell-level mapping) has been applied, using functions tg.map_cells_to_space for learning the matching and tg.project_cell_annotations for projecting cell types computed on scRNA-seq data onto space.

Combining strategies for consensus matching

Geometric Median Combining Strategy (GMCS)

In contrast to the NWCS, which weighs certain matchings over others, we also introduce an independently-developed combining strategy using a geometric median approach that considers each matching equally. Given \(m\) matchings, each matching \(c\) cells to a probability distribution over \(n\) potential cell types, we create, for each cell, an \(m\)-gon (polygon with \(m\) vertices) with vertices in the \(n\)-dimensional space (\({R}^{n}\)). For each of these polygons, we then find the geometric median, i.e., the point \(p\in {R}^{n}\) at which the sum of the \({L}_{2}\) distances from \(p\) to each vertex in the polygon is minimized. Intuitively, such a point considers each of the individual matchings equally, as having a point \(p\) closer to one individual matching's vertex than another would not minimize the sum of the \({L}_{2}\) distances. The confidence with which this matching assigns cell types is consequently a function of how similar or disparate constituent matchings are. Accordingly, certain data modalities for which the individual matching results largely disagree with one another, e.g., BaristaSeq, resulted in not-as-well-classified cells, whereas data modalities in which each cell's corresponding polygon is of relatively small area, e.g., MERFISH, yielded very well-defined consensus matching (Results).
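The geometric median has no closed form in general, so it is usually computed iteratively. The sketch below uses Weiszfeld's algorithm, which is an assumption made for illustration: the text does not state which solver was used. The function names, tolerances, and the toy three-method example are likewise hypothetical. Because every iterate is a convex combination of the vertices, the result remains a valid probability vector when the inputs are.

```python
import numpy as np

def geometric_median(vertices, tol=1e-10, max_iter=10_000, eps=1e-12):
    """Approximate the point minimizing the summed L2 distance to the vertices.

    vertices is an (m x n) array: one probability vector over n cell types
    per matching method, all for the same cell. Weiszfeld's algorithm is
    used here as an assumed solver.
    """
    pts = np.asarray(vertices, dtype=float)
    p = pts.mean(axis=0)                      # start from the centroid
    for _ in range(max_iter):
        # Clamp distances so an iterate sitting on a vertex does not divide by zero.
        dist = np.maximum(np.linalg.norm(pts - p, axis=1), eps)
        w = 1.0 / dist
        p_next = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(p_next - p) < tol:
            return p_next
        p = p_next
    return p

def sum_of_distances(vertices, p):
    """Objective value: summed L2 distance from p to every vertex."""
    pts = np.asarray(vertices, dtype=float)
    return float(np.linalg.norm(pts - p, axis=1).sum())

# Toy cell: three methods, three cell types; two methods agree on type 0.
cell_vertices = np.array([[0.9, 0.1, 0.0],
                          [0.8, 0.2, 0.0],
                          [0.1, 0.1, 0.8]])
consensus = geometric_median(cell_vertices)
consensus_type = int(consensus.argmax())
```

In this toy case the consensus stays close to the two agreeing methods, which illustrates why the median is less sensitive to a single dissenting matching than the arithmetic mean.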

Negative Weighting Combining Strategy (NWCS)

A weighting approach was designed to combine the six individual cell type matching results. An evaluation of the individual matching results revealed that: (1) The probabilistic assignments (a.k.a. confidence scores) that reflect the confidence of matching for each spatial cell to each reference cell type showed very different distributions from method to method; some were more binary as either 0 or 1 and others showed more plateau distributions (Supplementary Fig. S9). (2) Despite the distributional difference, some cells were assigned to the same cell type with the highest confidence score by all of the methods (i.e., well-matched cells), whereas other cells were matched to a cell type with a high score by only one method (i.e., inconsistently-matched cells). In order to avoid the bias introduced by the accidental assignment of those inconsistently-matched cells, we designed a negative weighting scheme to borrow the best-matched confidence score among all methods. NWCS performs the following steps to combine the individual matching results: (1) Find the best-matched cell types of each cell by keeping the cell-wise highest confidence score. (2) Assign a negative weight (−1) to all other cell types for each cell. (3) The combined confidence score matrix is the sum of all negatively weighted confidence score matrices of each individual method. (4) The NWCS cell type deterministic assignment is the cell type with the maximum confidence score for each cell in the combined matrix.
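The four steps above can be sketched as follows. This is one reading of the description, stated as an assumption: within each method, the top-scoring cell type keeps its confidence score and every other cell type is set to −1, and the resulting matrices are summed across methods. The function name and toy scores are hypothetical.

```python
import numpy as np

def nwcs_combine(conf_stack):
    """Combine per-method confidence matrices by negative weighting.

    conf_stack has shape (methods, cells, types). Returns the combined
    (cells x types) matrix and the index of the top type for each cell.
    """
    conf = np.asarray(conf_stack, dtype=float)
    n_methods, n_cells, n_types = conf.shape
    rows = np.arange(n_cells)
    combined = np.zeros((n_cells, n_types))
    for k in range(n_methods):
        best = conf[k].argmax(axis=1)
        weighted = np.full((n_cells, n_types), -1.0)   # step 2: -1 for all other types
        weighted[rows, best] = conf[k][rows, best]     # step 1: keep the highest score
        combined += weighted                           # step 3: sum over methods
    return combined, combined.argmax(axis=1)           # step 4: top type per cell

# Toy input: 3 methods, 2 cells, 3 cell types.
conf_stack = np.array([
    [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]],
    [[0.6, 0.4, 0.0], [0.1, 0.2, 0.7]],
    [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]],
])
combined, nwcs_labels = nwcs_combine(conf_stack)
```

Under this reading, a cell type favored by only one method is penalized by −1 from every other method, so it wins only when no competing type has broader support.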

Segmentation-free cell type analysis method

SSAM (Spot-based Spatial cell-type Analysis by Multidimensional mRNA density estimation)32 was run in guided mode to generate segmentation-free cell type assignments for the GMCS and NWCS consensus cell types. For all datasets (MERFISH, smFISH, BaristaSeq, and ExSeq), kernel density estimation (KDE) was performed on the mRNA locations of each gene with a bandwidth of 2.5 μm. For SSAM analysis, the resulting vector field was normalized by a library size of 10, and then log-transformed. For GMCS and NWCS cell normalization, the mRNA count of each cell type cluster was normalized to a library size of 10 per cell, and then log-transformed. The gene expression signature of each consensus cell type was computed by taking the mean of all normalized cells in the same cluster. The resulting signatures were then mapped to the vector field by computing Pearson’s correlations between each consensus signature and all pixels in the vector field. The resulting cell types were filtered with a minimum correlation threshold of 0.6.
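The normalization, signature, and correlation-mapping steps can be sketched with NumPy as below. This is not the SSAM implementation: the KDE step is omitted, the vector field is represented as a plain pixel-by-gene array, and the log transform is assumed to be log(1 + x). The library size of 10 and the 0.6 threshold come from the text; all function names and toy values are hypothetical.

```python
import numpy as np

def normalize_log(counts, library_size=10.0):
    """Scale each row to a fixed library size, then apply log(1 + x) (assumed form)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                 # leave all-zero rows at zero
    return np.log1p(counts / totals * library_size)

def signatures_by_cluster(norm_cells, labels):
    """Mean normalized expression of the cells in each cluster."""
    labels = np.asarray(labels)
    return np.vstack([norm_cells[labels == k].mean(axis=0)
                      for k in np.unique(labels)])

def pearson_map(signatures, pixels):
    """Pearson correlation of every pixel (rows) with every signature (columns)."""
    s = signatures - signatures.mean(axis=1, keepdims=True)
    x = pixels - pixels.mean(axis=1, keepdims=True)
    denom = (np.linalg.norm(x, axis=1, keepdims=True)
             @ np.linalg.norm(s, axis=1, keepdims=True).T)
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = (x @ s.T) / denom
    return np.nan_to_num(corr)                # flat pixels get correlation 0

def assign_pixels(corr, threshold=0.6):
    """Best-correlated cluster per pixel, or -1 below the threshold."""
    best = corr.argmax(axis=1)
    best[corr.max(axis=1) < threshold] = -1
    return best

# Toy data: 4 cells x 3 genes in two consensus clusters, and 3 pixels.
cells = normalize_log([[8, 1, 1], [9, 0, 1], [1, 8, 1], [0, 9, 1]])
sigs = signatures_by_cluster(cells, [0, 0, 1, 1])
pixels = normalize_log([[4.0, 0.5, 0.5], [0.5, 4.0, 0.5], [1.0, 1.0, 1.0]])
pixel_types = assign_pixels(pearson_map(sigs, pixels))
```

The third toy pixel has uniform expression across genes, so it carries no correlation signal and is filtered out by the threshold.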