Reference-based cell type matching of in situ image-based spatial transcriptomics data on primary visual cortex of mouse brain

Zhang, Yun; Miller, Jeremy A.; Park, Jeongbin; Lelieveldt, Boudewijn P.; Long, Brian; Abdelaal, Tamim; Aevermann, Brian D.; Biancalani, Tommaso; Comiter, Charles; Dzyubachyk, Oleh; Eggermont, Jeroen; Langseth, Christoffer Mattsson; Petukhov, Viktor; Scalia, Gabriele; Vaishnav, Eeshit Dhaval; Zhao, Yilin; Lein, Ed S.; Scheuermann, Richard H.

doi:10.1038/s41598-023-36638-8

Open access
Published: 13 June 2023

Volume 13, article number 9567, (2023)

Scientific Reports

Abstract

With the advent of multiplex fluorescence in situ hybridization (FISH) and in situ RNA sequencing technologies, spatial transcriptomics analysis is advancing rapidly, providing spatial location and gene expression information about cells in tissue sections at single cell resolution. Cell type classification of these spatially-resolved cells can be inferred by matching the spatial transcriptomics data to reference atlases derived from single cell RNA-sequencing (scRNA-seq) in which cell types are defined by differences in their gene expression profiles. However, robust cell type matching of the spatially-resolved cells to reference scRNA-seq atlases is challenging due to the intrinsic differences in resolution between the spatial and scRNA-seq data. In this study, we systematically evaluated six computational algorithms for cell type matching across four image-based spatial transcriptomics experimental protocols (MERFISH, smFISH, BaristaSeq, and ExSeq) conducted on the same mouse primary visual cortex (VISp) brain region. We find that many cells are assigned as the same type by multiple cell type matching algorithms and are present in spatial patterns previously reported from scRNA-seq studies in VISp. Furthermore, by combining the results of individual matching strategies into consensus cell type assignments, we see even greater alignment with biological expectations. We present two ensemble meta-analysis strategies used in this study and share the consensus cell type matching results in the Cytosplore Viewer (https://viewer.cytosplore.org) for interactive visualization and data exploration. The consensus matching can also guide spatial data analysis using SSAM, allowing segmentation-free cell type assignment.

Introduction

Characterizing the spatial distributions of molecularly defined cell types is a shared goal of the Human Cell Atlas (HCA)¹, NIH BRAIN Initiative Cell Census Network (BICCN)², Human BioMolecular Atlas Program (HuBMAP)³ and related collaborative efforts. The core elements in this task include transcriptional classification and spatial localization of cell types, which involves integration of single cell and spatially-resolved transcriptomics to define and spatially match cell types through the analysis of combinatorial gene expression patterns in tissue sections. Single cell and single nucleus RNA sequencing (scRNA-seq) has rapidly progressed into a high throughput standardized methodology and has been used by many labs as a major workhorse for cell type classification in many organs. In contrast, spatial transcriptomics methods are still evolving, varying substantially in methodology, degree of multiplexing, cost, and throughput, and lacking consensus data standards and analysis methods.

Characterizing spatially-resolved cell types is essential in the brain in order to study the exceptional cellular heterogeneity and functional significance of its spatial organization. ScRNA-seq has revealed an unprecedented granularity of neuronal cell types in mouse and human brains^4,5,6,7, providing a comprehensive landscape of cell type heterogeneity defined by transcriptional profiles. Recently, a number of multiplex fluorescence in situ hybridization (mFISH) and in situ RNA sequencing methods^{8,9,10,11,12,13,14,15,16,17,18} have been reported for conducting spatial transcriptomics experiments at the cellular level. Each method is independently optimized for the marker gene panel design, tissue processing, transcript sequencing, and imaging steps of the pipeline, and therefore requires different strategies for data processing, quality control, and downstream analysis. The SpaceTx Consortium, an organized effort consisting of both experimental and computational working groups, took the lead in evaluating the performance of currently available spatially-resolved transcriptomics methods in high-quality cortical brain samples, with the goal of building consensus maps of cortical cell type distributions based on combined analysis of single cell and spatially-resolved transcriptomics. The overarching effort of the SpaceTx Consortium will be described in a separate publication¹⁹. Here, we focus on four image-based spatial transcriptomics protocols (MERFISH, smFISH, BaristaSeq²², and ExSeq^23,24) applied to tissue sections from the mouse primary visual cortex brain region (VISp)⁴. All mRNA detection data (spot-by-gene matrices) were segmented using the same segmentation procedure, Baysor²⁵, which also included consistent quality control approaches for doublet and low-quality cell removal. The segmentation step produced the cell-by-gene matrices that were used to assign the spatially-resolved cells to scRNA-seq reference cell types using cell type matching algorithms.

Teams of the SpaceTx Consortium explored six computational algorithms (ATLAS²⁶, FR-Match^27,28, map.cells*⁴, mfishtools²⁹, pciSeq³⁰, and Tangram³¹), which produced individual cell type assignments with various probabilistic assignment scores. To arrive at consensus cell type assignments, two meta-analysis strategies were developed to combine the individual assignments either more quantitatively (Geometric Median Combining Strategy, hereinafter GMCS) or more qualitatively (Negative Weighting Combining Strategy, hereinafter NWCS) (see “Methods” section). In parallel, spot-based cell type assignment was performed by SSAM³² in guided mode, which partially borrows information from the combined assignment results. All spatial data and cell type assignment results were loaded into the Cytosplore Viewer (https://viewer.cytosplore.org) for interactive visualization and data exploration, where an integrated tSNE³³ map of all annotated cells from all spatial methods is presented together with single-method viewers for comparative analysis.

In situ hybridization spatial transcriptomics data

The overarching SpaceTx Consortium project evaluated spatial transcriptomics technologies including multiple imaging-based spatial transcriptomics protocols alongside spatial sequencing methods, 10x Visium, and Slide-seq. For the scope of this meta-analysis of computational evaluations, we focus on the in situ imaging-based protocols with data that passed preliminary quality control. All data used and presented in this manuscript were generated by the experimental working group of the SpaceTx Consortium. Experimental details, including gene panel selection, tissue distribution and processing, and image processing, are available in the SpaceTx Consortium paper¹⁹ and in the publications describing the MERFISH, smFISH, BaristaSeq²², and ExSeq^23,24 protocols. In general, the imaging-based protocols conduct multiple rounds of chemistry and microscopy to measure the location of individual mRNA molecules in the tissue section. Since each spatial method has unique requirements for numbers of genes and expression levels, each experimental protocol assembled a different probe panel with a specific gene set (Supplementary Table S1). Sensitivities of detection across protocols are reported in Table 1 in the SpaceTx Consortium paper¹⁹. The six cell type matching algorithms (ATLAS²⁶, FR-Match^27,28, map.cells*⁴, mfishtools²⁹, pciSeq³⁰, and Tangram³¹) were applied to assign reference scRNA-seq cell types to each segmented cell with an associated confidence score (or probabilistic assignment) based on the cell-by-gene count matrix (see “Methods” section). Confidence scores take values in the [0,1] range, but are defined differently in each algorithm (see “Methods” section and citations). Each cell type matching algorithm produces a cell-by-type matching matrix as its primary output, consisting of the probabilistic assignment of each segmented cell to each reference cell type at the subclass level. For each algorithm, the deterministic cell type assignment of a spatial cell is defined as the reference cell type with the highest confidence score.
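The deterministic assignment rule amounts to a row-wise argmax over one method's cell-by-type matrix. The sketch below assumes the matrix is held as a NumPy array with segmented cells as rows and reference subclasses as columns; the function name and the toy scores are illustrative only and are not taken from any of the six tools.

```python
import numpy as np

def deterministic_assignment(scores, cell_types):
    """Pick, for each spatial cell, the reference type with the highest confidence score.

    scores: (n_cells, n_types) cell-by-type matching matrix from one method
    cell_types: list of n_types reference subclass labels (column order)
    """
    scores = np.asarray(scores, dtype=float)
    best = np.argmax(scores, axis=1)                       # column of the top score per cell
    labels = [cell_types[j] for j in best]
    top_scores = scores[np.arange(scores.shape[0]), best]  # the winning confidence scores
    return labels, top_scores

# Toy matrix: 3 cells x 3 subclasses (illustrative values, not study data)
types = ["L2/3 IT", "L4", "Sst"]
S = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
labels, conf = deterministic_assignment(S, types)
print(labels)  # ['L2/3 IT', 'L4', 'Sst']
```

Ties are resolved in favor of the first column, because that is how `np.argmax` behaves; a real pipeline may want an explicit tie rule.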

In this subsection and the next, we focus the discussion of cell type matching performance on MERFISH data. Similar analyses were applied to the other spatial datasets and yielded similar but less refined matching results, possibly because fewer genes were probed in their gene panels. For further exploration, all spatial data on VISp are publicly available as a data resource at https://spacetx.github.io/data.html.

A key challenge for deterministic cell type assignment was the extensive disagreement among the individual matching results, with no gold standard available to compare against, although some information about expected spatial distributions is available from scRNA-seq data⁴. For example, the deterministic assignments to the L2/3 IT subclass differed substantially among the individual matching methods, both in the number of matched cells (Fig. 3A) and in the spatial distribution of those cells (Fig. 3B). The methods also disagreed substantially on which cells were matched to the subclass (Fig. 3C,D).

In this example, the L2/3 IT subclass is a relatively abundant cell population consisting of intratelencephalic (IT) neurons that are expected to appear in supragranular cortical layers (layers 2 and 3). Out of the 2150 MERFISH cells used for cell type matching, the number of cells matched to the L2/3 IT subclass was 349 (ATLAS), 581 (FR-Match), 693 (map.cells*), 798 (mfishtools), 637 (pciSeq), and 176 (Tangram) in the individual matching results (Fig. 3A). In all cases, the great majority of cells matched to L2/3 IT were found in supragranular cortex, as expected, although the exact footprint of this layer and the number of off-target matches varied by method (Fig. 3B). All six methods identified 127 cells in common (Fig. 3C). Tangram matched the smallest number of cells to the L2/3 IT subclass, followed by ATLAS. Most cells matched by ATLAS and Tangram were also matched to L2/3 IT by the other methods, which may suggest that these two methods have higher precision (a.k.a. positive predictive value) but lower sensitivity for this subclass. The other four methods identified 397 L2/3 IT cells in common (Fig. 3D), suggesting that there is a relatively abundant L2/3 IT cell population (18% of all segmented cells) identifiable by the majority of methods. The method-specific cells (8 for ATLAS, 46 for FR-Match, 50 for map.cells*, 97 for mfishtools, 72 for pciSeq, and 2 for Tangram) may be regarded as cells with weaker signal and more noise in their combinatorial marker gene expression patterns; these noisy cells appear to be the major source of the spillover observed in the layer distributions for this subclass (Supplementary Fig. S1).
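The overlap counts above (cells shared by all methods, and method-specific cells) can be computed with plain set operations once each method's L2/3 IT assignments are collected as sets of cell IDs. The helper below is a hypothetical sketch with made-up method names and cell IDs; it is not the analysis code behind Fig. 3.

```python
def overlap_summary(matched):
    """Summarize agreement for one subclass.

    matched: dict mapping method name -> set of cell IDs assigned to the subclass.
    Returns (cells found by every method, {method: cells found only by that method}).
    """
    common_all = set.intersection(*matched.values())
    method_specific = {}
    for name, cells in matched.items():
        # union of every other method's assignments
        others = set().union(*(c for n, c in matched.items() if n != name))
        method_specific[name] = cells - others
    return common_all, method_specific

# Hypothetical toy input (not study data)
matched = {
    "method_A": {1, 2, 3, 4},
    "method_B": {2, 3, 5},
    "method_C": {2, 3, 4, 6},
}
common, specific = overlap_summary(matched)
print(sorted(common))  # [2, 3]
```

Passing only a subset of the methods (e.g., the four higher-sensitivity ones) gives the analogue of the Fig. 3D intersection.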

Spatial coordinate plots with confidence score intensities for each individual matching are available in Supplementary Figs. S2–S7; for each cell, the highest confidence score (i.e., the score of its deterministic cell type assignment) is plotted. All methods were able to recapitulate the laminar pattern of neuronal cells to some extent, particularly for cells matched with higher confidence. Because these precision versus sensitivity results for L2/3 IT are not necessarily representative of other subclasses, and since no ground truth is available to assess performance against, we cannot conclude that any specific method outperforms the others at cell type matching. Instead, we chose to computationally combine these matching methods, as has been done previously for other computational tasks such as cell type clustering³⁵ and cell morphology tracing³⁶.

Combined matchings

It is likely that the individual cell type matching methods have different advantages and experimental biases, and they often produce different cell type assignments, especially for cells with fewer total transcripts or less confident segmentation boundaries (Supplementary Fig. S8). Assuming that the majority of individual methods produce at least partially accurate cell type matching/assignment, combining their results using an ensemble approach may provide the best classification result. We used two different strategies, GMCS and NWCS, to combine all individual matching results in the ensemble meta-analysis, each producing a re-calculated confidence score matrix for determining the consensus cell type assignment. The GMCS approach treats each individual matching result as a vertex of a polygon whose geometric median, the point with the minimum average Euclidean distance to these vertices, serves as the combined result (see “Methods” section). The NWCS approach is a rank-based weighted average of the confidence scores from each individual matching method that uses only the highest score for each cell (see “Methods” section). For all matching results, the deterministic cell type assignment is the cell type with the highest confidence score for a given cell. The confidence scores could be used as a quantitative metric of matching strength. However, they are defined differently by each matching method and show very different distributional properties (Supplementary Fig. S9). Therefore, even though all confidence scores lie in the range [0,1], they are not directly comparable across individual matching results. As such, the ranks (i.e., order statistics) of the scores are pragmatically more useful, with deterministic cell type assignment using the top-ranked confidence score.
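To illustrate why ranks are easier to compare than raw scores, the sketch below converts a cell-by-type matrix into within-cell ranks (1 = highest score). The two toy matrices are assumptions chosen to mimic a near-binary method and a plateau-like method: their raw top scores differ greatly, yet their ranks coincide, and rank 1 always picks the deterministic assignment.

```python
import numpy as np

def within_cell_ranks(scores):
    """Rank cell types within each cell (row): 1 = highest confidence score."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, axis=1, kind="stable")   # types by descending score; ties keep column order
    ranks = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, scores.shape[1] + 1)
    return ranks

# Illustrative score matrices (2 cells x 3 types), not study data
near_binary = np.array([[0.98, 0.01, 0.01],
                        [0.00, 0.03, 0.97]])
plateau = np.array([[0.40, 0.35, 0.25],
                    [0.30, 0.32, 0.38]])
print(within_cell_ranks(near_binary))
print(within_cell_ranks(plateau))
```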

Using the L2/3 IT subclass as an example, the two combined matchings are more similar in the number of cells matched to the subclass (798 for GMCS and 659 for NWCS; Fig. 4A), and their spatial distributions are mostly aligned (Fig. 4B). The vast majority of the cells are matched to the same subclass by both combined matchings (Fig. 4C), indicating strong agreement between them. GMCS and NWCS assigned 37% (798/2150) and 31% (659/2150) of all MERFISH cells to the L2/3 IT subclass, respectively, although some matched cells still spill over in the layer distribution. With the ensemble approaches, more cells in the expected supragranular cortical layers are matched to the L2/3 IT subclass than with the individual matchings (Fig. 4D). Averaged (arithmetic mean) across all six individual methods, up to 75% (red curve) of cells in the supragranular cortical layers are matched to L2/3 IT. In contrast, up to 90% (green and blue curves) of cells in these layers are matched to this subclass by the ensemble approaches. This suggests that the ensemble methods are effective at producing a compact consensus result, whereas a simple combination (e.g., the arithmetic mean) of the individual matching results does not effectively concentrate the cells matched to the L2/3 IT subclass in the expected cortical layers.
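A Fig. 4D-style curve can be approximated by binning cells along cortical depth and computing, per bin, the fraction of cells assigned to a given subclass; the arithmetic-mean curve is then the bin-wise mean of the per-method fractions. The sketch below assumes a hypothetical per-cell normalized depth in [0, 1) (0 at the pia); the bin edges and toy labels are illustrative and do not reproduce the published figure.

```python
import numpy as np

def fraction_matched_by_depth(depth, labels, subclass, bin_edges):
    """Fraction of cells in each depth bin whose label equals `subclass` (NaN for empty bins)."""
    depth = np.asarray(depth, dtype=float)
    labels = np.asarray(labels)
    bin_idx = np.digitize(depth, bin_edges) - 1   # 0-based bin; out-of-range cells match no bin
    n_bins = len(bin_edges) - 1
    frac = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            frac[b] = np.mean(labels[in_bin] == subclass)
    return frac

def mean_curve(per_method_labels, depth, subclass, bin_edges):
    """Bin-wise arithmetic mean of the per-method fraction curves."""
    curves = np.vstack([fraction_matched_by_depth(depth, lab, subclass, bin_edges)
                        for lab in per_method_labels])
    return np.nanmean(curves, axis=0)

# Toy example: 5 cells, two hypothetical methods, two depth bins
depth = [0.05, 0.10, 0.15, 0.60, 0.70]
m1 = ["L2/3 IT", "L2/3 IT", "L4", "L5 IT", "L2/3 IT"]
m2 = ["L2/3 IT", "L2/3 IT", "L2/3 IT", "L5 IT", "L5 IT"]
edges = [0.0, 0.5, 1.0]
print(mean_curve([m1, m2], depth, "L2/3 IT", edges))
```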

Considering all cells, the two combined matchings assigned 83% of cells to the same subclass (number of cells assigned to the same subclass divided by the total number of cells), overcoming the large differences among the individual matching results. Combined confidence score intensity plots for all cells are available in Supplementary Figs. S10–S11, and the distribution of all cells in cortical layers for each combined matching is shown in Supplementary Fig. S12. Although the distributions of matched cells in cortical layers are very similar between the two combined matchings for the abundant GABAergic and glutamatergic subclasses (Supplementary Fig. S12), they differ for rare and non-neuronal subclasses (e.g., Meis2, Endothelial, and Macrophage), suggesting that rare cell types are more difficult to detect and match in spatial transcriptomics data. Overall, these results suggest that, while individual matching algorithms may have different strengths and biases leading to somewhat different results, the ensemble methods provide more robust cell type matching/assignment for the spatial cells.
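The 83% figure is an overall agreement rate between two label vectors. A minimal sketch, assuming both combined matchings are given as per-cell subclass labels in the same cell order, is shown below; the per-subclass Jaccard index is a hypothetical addition that makes lower agreement on rare subclasses visible, and the toy labels are illustrative.

```python
import numpy as np

def agreement_rate(labels_a, labels_b):
    """Cells assigned to the same subclass by both matchings / total number of cells."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    if a.shape != b.shape:
        raise ValueError("label vectors must cover the same cells in the same order")
    return float(np.mean(a == b))

def per_subclass_jaccard(labels_a, labels_b):
    """|A and B| / |A or B| for the cells each matching assigns to every subclass."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    out = {}
    for t in sorted(set(a.tolist()) | set(b.tolist())):
        in_a, in_b = a == t, b == t
        out[t] = float((in_a & in_b).sum() / (in_a | in_b).sum())
    return out

# Toy labels for four cells (not study data)
gm = ["x", "x", "y", "z"]
nw = ["x", "y", "y", "z"]
print(agreement_rate(gm, nw))  # 0.75
```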

Laminar distributions of neurons from computational methods

The spatial distributions of many known neuron types have been studied through frozen dissection, RNAscope staining, confocal imaging, and other techniques. For example, Fig. 1 in Tasic et al.⁴ shows the laminar patterns of mouse VISp reference scRNA-seq cell types in layer dissections. Using the MERFISH data and all computational matching results, we plot the calculated spatial distributions of the matched inhibitory and excitatory neurons in Fig. 5. For the inhibitory neuron types, Vip neurons are distributed in upper layers, and Sst and Pvalb neurons in deeper layers, as expected. For the excitatory neuron types, the major peak of each spatial distribution curve is generally found in the expected layer, although the width of the major peak varies and minor peaks appear in some cases.

Individual matching method agreement across protocols

Experimental protocols showed different detection sensitivities across platforms (Table 1 in the SpaceTx Consortium paper¹⁹). A recent publication also reported a systematic evaluation of the theoretical and practical aspects of data transformations and normalizations for single-cell RNA-seq data⁴².

In this manuscript, we first compared gene detection sensitivity and gene expression patterning across spatial experimental methods, which revealed high variability and very different dynamic ranges in the in situ hybridization data across experimental protocols. We also presented a systematic evaluation of the individual cell type matching algorithms and the combined matching strategies using the MERFISH dataset as an example. The cell-based cell type matching algorithms were applied after the same segmentation step on the image data. Individual matching results varied widely both in their metrics of matching confidence and in their deterministic cell type assignments, and no overall “winner” could be declared without a gold standard result to compare against. Given the variable performance of individual matching results, we used ensemble meta-analysis approaches to combine the individual matchings into consensus results. The meta-analysis approaches substantially improved agreement, with the majority of cells assigned the same cell type by the two combined matching strategies. One exception is the NWCS and GMCS results for BaristaSeq, which may further suggest that rank-based approaches (e.g., NWCS) can be more useful in practice for meta-analysis and more robust to variation among individual matching results. Using the spot-based cell type matching algorithm, results similar to the consensus results could be obtained efficiently without explicit segmentation, provided that precise gene signatures are available.

A Cytosplore Viewer compilation allows all spatial cells from all evaluated experimental protocols to be viewed in an integrated tSNE map based on SpaGE-imputed expression scores from the scRNA-seq reference data. This enables interactive selection of cells (either through free-form selection or by cell type subclass), confirming the consistency of the layer patterns across spatial protocols. Differential analysis between free-form cell selections proved particularly useful for identifying gene expression gradients across cortical layers and confirming them across protocols. A side-by-side comparison between the segmentation-based workflow and segmentation-free SSAM revealed a higher density of local maxima detected by SSAM compared with the segmentation-based analysis. However, the spatial patterning of cell type subclasses was highly conserved between the two approaches. Finally, a direct comparison between the two combining strategies revealed similar cell type matching results for smFISH, MERFISH, and ExSeq. For BaristaSeq, GMCS produced inconclusive results, whereas NWCS still performed reasonably well.

Alongside individual publications utilizing SpaceTx tissue and data^12,28,43, the SpaceTx project produced three consortium-level outputs: (i) a summary manuscript of the overarching SpaceTx Consortium effort¹⁹, …

ATLAS (A Tool for Learning from Atlas-scale Single-cell multi-omic measurements) (https://github.com/spacetx-spacejam/edv) uses a neural network classifier that applies a central moment discrepancy (CMD) …

pciSeq³⁰ (https://github.com/acycliq/pciSeq) is a Python package for probabilistic cell typing by in situ sequencing. It uses a Bayesian algorithm that leverages scRNA-seq data to first estimate the probability of each spot belonging to a cell and then the probability of each cell belonging to a scRNA-seq cluster. A spots dataframe, segmentation image labels, and scRNA-seq data are required inputs to the algorithm.

Tangram³¹ (https://github.com/broadinstitute/Tangram) is distributed as a Python package based on PyTorch and scanpy. Tangram requires as input a single-cell (or single-nucleus) gene expression dataset and a spatial gene expression dataset. It learns an alignment of the single-cell data onto space by fitting gene expression on the shared genes. The output of the matching algorithm is a cell-by-spot matrix that gives the probability for cell \(i\) to be in spot \(j\). Using this matching matrix, Tangram can project any annotation (e.g., cell types) from the single-cell data onto space. The standard pipeline (with cell-level mapping) was applied, using the functions tg.map_cells_to_space to learn the matching and tg.project_cell_annotations to project cell types computed on scRNA-seq data onto space.
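Conceptually, projecting annotations through a cell-by-spot probability matrix is a matrix product with a one-hot encoding of the single-cell labels. The NumPy sketch below illustrates that idea on toy numbers; it is not Tangram's own implementation, the function name is hypothetical, and the per-spot rescaling to proportions is an assumption added for readability. In the actual pipeline, the Tangram functions named above perform the mapping and projection.

```python
import numpy as np

def project_annotations(mapping, sc_labels):
    """Project single-cell labels onto spots through a cell-by-spot probability matrix.

    mapping: (n_sc_cells, n_spots); entry [i, j] = probability that cell i maps to spot j
    sc_labels: length-n_sc_cells sequence of reference cell type labels
    """
    mapping = np.asarray(mapping, dtype=float)
    types = sorted(set(sc_labels))
    onehot = np.array([[lab == t for t in types] for lab in sc_labels], dtype=float)  # (n_sc_cells, n_types)
    spot_by_type = mapping.T @ onehot                 # probability mass per spot and type
    totals = spot_by_type.sum(axis=1, keepdims=True)
    # Assumption: rescale each spot's row to proportions that sum to 1
    probs = np.divide(spot_by_type, totals, out=np.zeros_like(spot_by_type), where=totals > 0)
    return types, probs

# Toy mapping: 3 reference cells x 2 spots (illustrative values)
M = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.0, 1.0]])
types, probs = project_annotations(M, ["A", "A", "B"])
print(types, probs.round(3))
```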

Combining strategies for consensus matching

Geometric Median Combining Strategy (GMCS)

Complementing NWCS (described below), which weighs certain matchings over others, we also introduce an independently developed combining strategy using a geometric median approach that considers each matching equally. Given \(m\) matchings, each matching \(c\) cells to a probability distribution over \(n\) potential cell types, we create for each cell an \(m\)-gon (a polygon with \(m\) vertices) with vertices in the \(n\)-dimensional space \({R}^{n}\). For each of these polygons, we then find the geometric median, i.e., the point \(p\in {R}^{n}\) at which the sum of the \({L}_{2}\) norms from \(p\) to each vertex of the polygon is minimized. Intuitively, such a point considers each of the individual matchings equally, since moving \(p\) closer to one individual matching's vertex at the expense of the others would not minimize the sum of the \({L}_{2}\) norms. The confidence with which this matching assigns cell types is consequently a function of how similar or disparate the constituent matchings are. Accordingly, data modalities for which the individual matching results largely disagree with one another, e.g., BaristaSeq, resulted in less well-classified cells, whereas data modalities in which each cell's corresponding polygon has a relatively small area, e.g., MERFISH, yielded very well-defined consensus matching (see “Results” section).
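In the notation above, the combined score vector of a cell is \(p^{*}=\arg\min_{p\in {R}^{n}}\sum_{k=1}^{m}{\lVert p-v_{k}\rVert }_{2}\), where \(v_{k}\) is the \(k\)-th matching's probability vector for that cell. The geometric median has no closed form in general; a common way to compute it is Weiszfeld's iteratively re-weighted average. The sketch below applies it cell by cell. The text does not state which solver was used, so Weiszfeld's algorithm, the tolerance, and the iteration cap are assumptions, and the case where an iterate lands on a vertex is handled in a simplified way.

```python
import numpy as np

def geometric_median(points, n_iter=500, tol=1e-10):
    """Weiszfeld iteration: point minimizing the sum of L2 distances to the rows of `points`."""
    P = np.asarray(points, dtype=float)   # (m, n): one vertex per individual matching
    y = P.mean(axis=0)                    # start from the centroid of the m-gon
    for _ in range(n_iter):
        d = np.linalg.norm(P - y, axis=1)
        if np.any(d < tol):
            # Simplification: stop if the iterate sits on a vertex
            # (the full Vardi-Zhang correction for this case is omitted).
            return P[np.argmin(d)].copy()
        w = 1.0 / d
        y_new = (w[:, None] * P).sum(axis=0) / w.sum()   # inverse-distance-weighted average
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def gmcs(score_mats):
    """Per-cell geometric median of m cell-by-type matrices, then row-wise argmax."""
    stacked = np.stack([np.asarray(S, dtype=float) for S in score_mats])   # (m, c, n)
    combined = np.vstack([geometric_median(stacked[:, i, :])
                          for i in range(stacked.shape[1])])
    return combined, np.argmax(combined, axis=1)

# Toy example: 3 matchings, 2 cells, 3 types (illustrative one-hot scores)
M1 = [[1, 0, 0], [0, 0, 1]]
M2 = [[1, 0, 0], [0, 1, 0]]
M3 = [[0, 1, 0], [0, 0, 1]]
combined, labels = gmcs([M1, M2, M3])
print(labels)  # [0 2]
```

With two of three matchings agreeing, the median sits on the majority vertex, which matches the intuition that a small polygon yields a confident consensus.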

Negative Weighting Combining Strategy (NWCS)

A weighting approach was designed to combine the six individual cell type matching results. An evaluation of the individual matching results revealed that: (1) the probabilistic assignments (a.k.a. confidence scores), which reflect the confidence of matching each spatial cell to each reference cell type, showed very different distributions from method to method; some were nearly binary (close to either 0 or 1), whereas others showed flatter, plateau-like distributions (Supplementary Fig. S9); and (2) despite these distributional differences, some cells were assigned to the same cell type with the highest confidence score by all of the methods (i.e., well-matched cells), whereas other cells were matched to a cell type with a high score by only one method (i.e., inconsistently-matched cells). To avoid the bias introduced by accidental assignment of these inconsistently-matched cells, we designed a negative weighting scheme that borrows the best-matched confidence score among all methods. NWCS performs the following steps to combine the individual matching results: (1) Find the best-matched cell type of each cell by keeping the cell-wise highest confidence score. (2) Assign a negative weight (− 1) to all other cell types for each cell. (3) Compute the combined confidence score matrix as the sum of the negatively weighted confidence score matrices of all individual methods. (4) Define the NWCS deterministic cell type assignment as the cell type with the maximum combined confidence score for each cell.
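The four NWCS steps can be sketched in a few lines of NumPy. Step (2) admits more than one reading; this sketch assumes that the "negative weight (− 1)" multiplies every non-best score of a method by −1, while each cell's best score keeps weight +1. The function name and toy matrices are illustrative, not the authors' code.

```python
import numpy as np

def nwcs(score_mats):
    """Combine per-method cell-by-type confidence matrices following NWCS steps (1)-(4).

    score_mats: list of (n_cells, n_types) arrays with identical cell and type order.
    Assumption: step (2) multiplies each non-best score by -1; the best keeps +1.
    """
    combined = np.zeros(np.shape(score_mats[0]), dtype=float)
    for S in score_mats:
        S = np.asarray(S, dtype=float)
        best = S.max(axis=1, keepdims=True)          # step (1): cell-wise highest score
        weights = np.where(S == best, 1.0, -1.0)     # step (2): -1 for all other types (tied maxima keep +1)
        combined += weights * S                      # step (3): sum of weighted matrices
    return combined, np.argmax(combined, axis=1)     # step (4): deterministic assignment

# Toy input: 3 methods, 2 cells, 3 types (illustrative values)
M1 = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25]]
M2 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
M3 = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]
combined, labels = nwcs([M1, M2, M3])
print(labels)  # [0 1]
```

Under this reading, a type that is top-ranked by several methods accumulates positive mass, whereas a type that wins in only one method is pulled down by the negative weights from the others, in line with the stated aim of suppressing inconsistently-matched cells.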

Segmentation-free cell type analysis method

SSAM (Spot-based Spatial cell-type Analysis by Multidimensional mRNA density estimation)³² was run in guided mode to generate segmentation-free cell type assignments based on the GMCS and NWCS consensus cell types. For all datasets (MERFISH, smFISH, BaristaSeq, and ExSeq), kernel density estimation (KDE) was performed on the mRNA locations of each gene with a bandwidth of 2.5 μm. For SSAM analysis, the resulting vector field was normalized to a library size of 10 and then log-transformed. For the GMCS and NWCS consensus cells, the mRNA counts of each cell in a cell type cluster were likewise normalized to a library size of 10 per cell and then log-transformed. The gene expression signature of each consensus cell type was computed as the mean of all normalized cells in the same cluster. The resulting signatures were then mapped to the vector field by computing Pearson’s correlation between each consensus signature and every pixel in the vector field. The resulting cell type assignments were filtered with a minimum correlation threshold of 0.6.
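The guided-mode steps above (KDE, normalization, signature construction, Pearson correlation, thresholding) can be mirrored in plain NumPy to make the data flow concrete. This is not SSAM's API; the function names are hypothetical, the Gaussian kernel is left unnormalized (its constant cancels in the per-pixel library-size normalization), `log1p` is assumed as the log transform, and −1 marks pixels below the 0.6 threshold.

```python
import numpy as np

def kde_on_grid(xy, grid, bandwidth=2.5):
    """Unnormalized Gaussian KDE of one gene's mRNA positions (um) at grid points (um)."""
    d2 = ((grid[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)   # (n_grid, n_mrna)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)

def normalize_log(X, library_size=10.0):
    """Scale each row (pixel or cell) to `library_size` total, then apply log1p."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    scaled = np.divide(X * library_size, totals, out=np.zeros_like(X), where=totals > 0)
    return np.log1p(scaled)

def consensus_signatures(cell_counts, cell_labels, types):
    """Mean normalized expression of the cells in each consensus cell type cluster."""
    norm = normalize_log(cell_counts)
    labels = np.asarray(cell_labels)
    return np.vstack([norm[labels == t].mean(axis=0) for t in types])

def guided_assign(vector_field, signatures, min_corr=0.6):
    """Pearson correlation of each pixel with each signature; -1 where the best is below min_corr."""
    V = np.asarray(vector_field, dtype=float)
    S = np.asarray(signatures, dtype=float)
    V = V - V.mean(axis=1, keepdims=True)            # center each pixel across genes
    S = S - S.mean(axis=1, keepdims=True)            # center each signature across genes
    denom = np.outer(np.linalg.norm(V, axis=1), np.linalg.norm(S, axis=1))
    corr = np.divide(V @ S.T, denom, out=np.zeros_like(denom), where=denom > 0)
    best = np.argmax(corr, axis=1)
    best[corr[np.arange(V.shape[0]), best] < min_corr] = -1
    return best, corr

# Toy pixels x genes (illustrative); the third pixel is flat and gets no type
field = np.array([[5.0, 1.0, 0.0], [0.0, 1.0, 5.0], [1.0, 1.0, 1.0]])
sigs = np.array([[4.0, 1.0, 0.0], [0.0, 1.0, 4.0]])
best, corr = guided_assign(field, sigs)
print(best)  # [ 0  1 -1]
```

In practice the vector field would be `normalize_log` applied to the stacked per-gene KDE values, and the signatures would come from `consensus_signatures` on the GMCS or NWCS cells.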

Data availability

Experimental data collected in this study that passed quality control are available as downloadable datasets at https://spacetx.github.io/data.html. Each dataset is associated with a readme document, including contact information, cell-by-gene count matrices, mapped cell tables, and spot tables. The raw data are summarized in Table 1.

Code availability

Data analyses were performed using open-source software. The cell segmentation method Baysor is available at https://github.com/kharchenkolab/Baysor (v0.5.0). The cell type matching algorithms are available as follows: FR-Match at https://github.com/JCVenterInstitute/FRmatch (v2.0), map.cells* at https://github.com/AllenInstitute/scrattch.hicat (v1.0.0), mfishtools at https://github.com/AllenInstitute/mfishtools (v0.0.2), pciSeq at https://github.com/acycliq/pciSeq (v0.0.45), and Tangram at https://github.com/broadinstitute/Tangram (v1.0.0). The segmentation-free algorithm SSAM is available at https://github.com/HiDiHlabs/ssam (v1.0.2). The data visualization software Cytosplore Viewer is available at https://viewer.cytosplore.org (v3.0.0). The reproducible code for the results reported in this study is available at https://spacetx.github.io/results.html.

References

Regev, A. et al. The human cell atlas. Elife 6, e27041 (2017).
BRAIN Initiative Cell Census Network (BICCN). A multimodal cell census and atlas of the mammalian primary motor cortex. Nature 598(7879), 86–102 (2021).
HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574(7777), 187–192 (2019).
Tasic, B. et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature 563(7729), 72–78 (2018).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573(7772), 61–68 (2019).
Hodge, R. D. et al. Transcriptomic evidence that von Economo neurons are regionally specialized extratelencephalic-projecting excitatory neurons. Nat. Commun. 11(1), 1–14 (2020).
Bakken, T. E. et al. Comparative cellular analysis of motor cortex in human, marmoset and mouse. Nature 598(7879), 111–119 (2021).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348(6233), aaa6090 (2015).
Jemt, A. et al. An automated approach to prepare tissue-derived spatially barcoded RNA-sequencing libraries. Sci. Rep. 6(1), 1–9 (2016).
Kebschull, J. M. et al. High-throughput mapping of single-neuron projections by sequencing of barcoded RNA. Neuron 91(5), 975–987 (2016).
La Manno, G. et al. Molecular diversity of midbrain development in mouse, human, and stem cells. Cell 167(2), 566-580.e519 (2016).
Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362(6416), eaau5324 (2018).
Moffitt, J. R. et al. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc. Natl. Acad. Sci. 113(39), 11046–11051 (2016).
Moffitt, J. R. & Zhuang, X. RNA imaging with multiplexed error-robust fluorescence in situ hybridization (MERFISH). in Methods in Enzymology, Vol. 572, 1–49 (Elsevier, 2016).
Shah, S. et al. Single-molecule RNA detection at depth by hybridization chain reaction and tissue hydrogel embedding and clearing. Development 143(15), 2862–2867 (2016).
Shah, S., Lubeck, E., Zhou, W. & Cai, L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92(2), 342–357 (2016).
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353(6294), 78–82 (2016).
Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 568(7751), 235–239 (2019).
Long, B., Miller, J. & The SpaceTx Consortium. SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain. arXiv preprint arXiv:2301.08436 (2023).
Xia, C., Fan, J., Emanuel, G., Hao, J. & Zhuang, X. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc. Natl. Acad. Sci. 116(39), 19490–19499 (2019).
Zhang, M. et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature 598(7879), 137–143 (2021).
Chen, X. et al. High-throughput mapping of long-range neuronal projection using in situ sequencing. Cell 179(3), 772-786.e719 (2019).
Chen, F. et al. Nanoscale imaging of RNA with expansion microscopy. Nat. Methods 13(8), 679–684 (2016).
Alon, S. et al. Expansion sequencing: Spatially precise in situ transcriptomics in intact biological systems. Science 371(6528), eaax2656 (2021).
Petukhov, V. et al. Cell segmentation in imaging-based spatial transcriptomics. Nat. Biotechnol. 40, 1–10 (2021).
Vaishnav, E. D. Evolution, Evolvability, Expression and Engineering (Massachusetts Institute of Technology, 2022).
Zhang, Y. et al. FR-Match: Robust matching of cell type clusters from single cell RNA sequencing data using the Friedman–Rafsky non-parametric test. Brief. Bioinform. 22(4), bbaa339 (2021).
Zhang, Y., Aevermann, B., Gala, R. & Scheuermann, R. H. Cell type matching in single-cell RNA-sequencing data using FR-Match. Sci. Rep. 12(1), 9996 (2022).
Nicovich, P. R. et al. Multimodal cell type correspondence by intersectional mFISH in intact tissues. bioRxiv 525451 (2019).
Qian, X. et al. Probabilistic cell typing enables fine mapping of closely related cell types in situ. Nat. Methods 17(1), 101–106 (2020).
Article CAS PubMed Google Scholar
Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18(11), 1352–1362 (2021).
Article PubMed PubMed Central Google Scholar
Park, J. et al. Cell segmentation-free inference of cell types from in situ transcriptomics data. Nat. Commun. 12(1), 1–13 (2021).
Google Scholar
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008).
MATH Google Scholar
Lein, E. S. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445(7124), 168–176 (2007).
Article ADS CAS PubMed Google Scholar
Risso, D. et al. clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets. PLoS Comput. Biol. 14(9), e1006378 (2018).
Article ADS PubMed PubMed Central Google Scholar
Wang, C.-W., Lee, Y.-C., Pradana, H., Zhou, Z. & Peng, H. Ensemble neuron tracer for 3D neuron reconstruction. Neuroinformatics 15(2), 185–198 (2017).
Article PubMed Google Scholar
McHugh, M. L. Interrater reliability: the kappa statistic. Biochem. Med. 22(3), 276–282 (2012).
Article MathSciNet Google Scholar
Abdelaal, T., Mourragui, S., Mahfouz, A. & Reinders, M. J. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res. 48(18), e107–e107 (2020).
Article CAS PubMed PubMed Central Google Scholar
Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18(1), 100–106 (2021).
Article CAS PubMed Google Scholar
Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361(6400), eaat5691 (2018).
Article PubMed PubMed Central Google Scholar
Palla, G. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods 19(2), 171–178 (2022).
Article CAS PubMed PubMed Central Google Scholar
Ahlmann-Eltze, C., & Huber, W. Comparison of transformations for single-cell RNA-seq data. Nat. Methods 1–8 (2023).
Langseth, C. M. et al. Comprehensive in situ map** of human cortical transcriptomic cell types. Commun. Biol. 4(1), 1–7 (2021).
Article Google Scholar
Fang, R. et al. Conservation and divergence of cortical cell organization in human and mouse revealed by MERFISH. Science 377(6601), 56–62 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19, 1–9 (2022).
Article CAS Google Scholar
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184(13), 3573-3587.e3529 (2021).
Article CAS PubMed PubMed Central Google Scholar
Zellinger, W., Grubinger, T., Lughofer, E., Natschläger, T., & Saminger-Platz, S. Central moment discrepancy (CMD) for domain-invariant representation learning. ar**v preprint ar**v:170208811 2017.
Zellinger, W. et al. Robust unsupervised domain adaptation for neural networks via moment alignment. Inf. Sci. 483, 174–191 (2019).
Article MathSciNet MATH Google Scholar

Acknowledgements

The work was supported by the Chan Zuckerberg Initiative DAF, an advised fund of the Silicon Valley Community Foundation (2017-174399, 2018-182730); the U.S. National Institutes of Health (RF1MH123220); the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) Gravitation (NWO: 024.004.012); NWO TTW project 3DOMICS (NWO: 17126); the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-2-01450, Artificial Intelligence Convergence Research Center, Pusan National University); and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1076160). The funding bodies had no role in the design or conclusions of this study.

Author information

Brian D. Aevermann
Present address: Chan Zuckerberg Initiative, Redwood City, CA, USA
Tommaso Biancalani
Present address: Genentech, South San Francisco, CA, USA
These authors contributed equally: Yun Zhang, Jeremy A. Miller and Jeongbin Park.

Authors and Affiliations

J. Craig Venter Institute, La Jolla, CA, USA
Yun Zhang, Brian D. Aevermann & Richard H. Scheuermann
Allen Institute for Brain Science, Seattle, WA, USA
Jeremy A. Miller, Brian Long, Yilin Zhao & Ed S. Lein
School of Biomedical Convergence Engineering, Pusan National University, Busan, Korea
Jeongbin Park
LKEB, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Boudewijn P. Lelieveldt, Tamim Abdelaal, Oleh Dzyubachyk & Jeroen Eggermont
Pattern Recognition and Bioinformatics Group, Delft University of Technology, Delft, The Netherlands
Boudewijn P. Lelieveldt & Tamim Abdelaal
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Tommaso Biancalani, Charles Comiter, Gabriele Scalia & Eeshit Dhaval Vaishnav
Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
Christoffer Mattsson Langseth
Biotech Research and Innovation Centre, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Viktor Petukhov
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Viktor Petukhov
Department of Pathology, University of California, San Diego, CA, USA
Richard H. Scheuermann
Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, USA
Richard H. Scheuermann
Genentech, South San Francisco, CA, USA
Gabriele Scalia

Contributions

Y.Z., J.A.M., J.P., B.P.L., B.L. and R.H.S. conceived the study, designed the meta-analysis plan, and wrote the manuscript. Y.Z., J.A.M., J.P. and B.P.L. led the data analyses and software development. B.L. and J.A.M. managed and processed the spatial transcriptomics datasets. J.P. and B.L. led the website development. T.A., B.D.A., T.B., C.C., O.D., J.E., C.M.L., V.P., G.S., E.D.V. and Y.Z. contributed to data analyses. R.H.S. and E.S.L. supervised the work.

Corresponding author

Correspondence to Richard H. Scheuermann.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhang, Y., Miller, J.A., Park, J. et al. Reference-based cell type matching of in situ image-based spatial transcriptomics data on primary visual cortex of mouse brain. Sci Rep 13, 9567 (2023). https://doi.org/10.1038/s41598-023-36638-8

Received: 18 December 2022
Accepted: 07 June 2023
Published: 13 June 2023
DOI: https://doi.org/10.1038/s41598-023-36638-8
Springer Nature Limited