Introduction

The spatial organization of molecules in a cell is essential for performing their functions. While protein localization [1] and disease-associated mislocalization are well appreciated [2, 3], the same principles for RNA have begun to emerge. For instance, the spatial and temporal regulation of RNA play a crucial role in localized cellular processes such as cell migration and cell division [4, 5], as well as specialized cell functionalities like synaptic plasticity [6,7,8]. Mislocalization of RNA has been associated with diseases such as Huntington’s disease (HD), where defects in axonal mRNA transport and subsequent translation in human spiny neurons lead to cell death and neurodegeneration [9,10,11,12].

The study of subcellular RNA localization necessitates single-molecule measurements. Since the development of single-molecule fluorescent in situ hybridization (smFISH), recent advances in multiplexed methods such as MERFISH [13], seqFISH + [14], HybISS [15], and Ex-Seq [16] have enabled RNA localization measurements at near transcriptome scales, while maintaining single-molecule resolution. A number of computational toolkits, such as Squidpy [17], stLearn [18], Giotto [19], Seurat [20], and Scanpy [21] enabled the characterization of tissue architecture, cell–cell interactions, and spatial expression patterns. Despite the single-molecule measurements in spatial transcriptomics, these analytical approaches are limited to investigating spatial variation at the multicellular scale and lack the ability to investigate subcellular organization. To further our understanding of RNA localization and its function in normal and abnormal cell activity, we need to expand our analytical capacity to the subcellular scale.

Recent methods such as FISH-quant-v2 [22] and FISHFactor [23] identify subcellular patterns describing the spatial distribution of RNA species, but are unable to annotate more than a single gene per cell or are limited to analyze at most 20,000 molecules on accessible computing resources. In contrast, a single spatial transcriptomics experiment measures at least hundreds to thousands of genes across hundreds of thousands of cells. Additionally, methods such as ClusterMap [24] and Baysor [25] highlight the potential for transcript locations alone to inform meaningful domains such as cell and nuclear regions. Using spatial proteomics data, CAMPA [26] and Pixie [27] utilize subcellular spatial variation in protein abundance to identify subcellular regions and annotate pixel-level features.

Building on these promising approaches, we present Bento, an open-source Python toolkit for scalable analysis of spatial transcriptomics data at the subcellular resolution. Bento ingests single-molecule resolution data and segmentation masks, utilizing geospatial tools (GeoPandas [28], Rasterio [29]) for spatial analysis of molecular imaging data, and data science tools including SciPy [30] and Tensorly [31] for scalable analysis of high-dimensional feature matrices. Furthermore, Bento is a member of the Scverse ecosystem [32], enabling integration with Scanpy [21], Squidpy [17], and more than 30 other single-cell omics analysis tools.

Results

Overview of Bento data infrastructure for subcellular analysis

In order to facilitate a flexible workflow, Bento is generally compatible with molecule-level resolution spatial transcriptomics data (Fig. 1A), such as datasets produced by MERFISH [13], seqFISH + [14], CosMx (NanoString) [33], Xenium (10 × Genomics) [15, 34], and Molecular Cartography (Resolve Biosciences) [35]. Bento’s workflow takes as input (1) 2D spatial coordinates of transcripts annotated by gene and (2) segmentation boundaries (e.g., cell membrane, nuclear membrane, and any other regions of interest) (Fig. 1B). While 3D molecular coordinates are commonly included, 3D segmentation information is limited to z-stacked 2D segmentation, limiting its usability. If available, Bento can also handle arbitrary sets of segmentations for other subcellular structures or regions of interest. These inputs are stored in the AnnData data format [36], which links cell and gene metadata to standard count matrices, providing compatibility with standard single-cell RNA-seq quality control and analysis tools in the Scverse ecosystem [32]. With a data structure for segmentation boundaries and transcript coordinates in place, Bento can easily compute spatial statistics and measure spatial phenotypes to build flexible multidimensional feature sets for exploratory subcellular analysis and utilize these spatial metrics to augment quality control (Fig. 1C).

Fig. 1
figure 1

Workflow and functionality of the Bento toolkit. A Single-molecule resolved spatial transcriptomics data from commercial or custom platforms are ingested into Bento where it is converted to the AnnData format (B), where it can be manipulated with Bento as well as a wide ecosystem of single-cell omics tools. C Geometric statistics are illustrated for the seqFISH + dataset, including metrics describing cell and nuclear geometries and cell density to assess overall data quality. D Bento has a standard interface to perform a wide variety of subcellular analyses

Bento offers a precise yet flexible palette of novel complementary subcellular analyses (Fig. 1D). We introduce RNAforest, a multilabel approach for annotating RNA localization patterns adapted from FISH-quant v2 [22]. We find that many RNAs are spatially distributed according to gene function. We then implement RNAcoloc, a context-specific approach to quantify colocalization to characterize how genes colocalize with each other in a compartment-specific manner. Having established systematic patterning and organization of RNA transcripts, we demonstrate RNAflux, an unsupervised method for semantic segmentation of subcellular domains. RNAflux first quantifies subcellular expression gradients at pixel resolution before identifying consistent subcellular domains via unsupervised clustering. We demonstrate the utility of Bento’s tools by applying them to identifying critical localization changes in human iPSC-derived cardiomyocytes upon drug treatment with doxorubicin, a widely used chemotherapeutic known to cause cardiotoxicity [37].

RNAforest: utilizing subcellular landmarks to predict RNA subcellular localization

In computer vision, key points or landmarks are commonly used for tasks like facial recognition [38] and object detection. Analogous to these classical applications, we derive spatial features using cell and nucleus boundaries as landmarks to predict RNA localization patterns from spatial summary statistics. Building on the summary statistics used for classifying smFISH data in FISH-quant v2 [39], RNAforest consists of an ensemble of five binary random forest classifiers rather than a single multi-classifier model to assign one or more labels. These pattern labels, adapted from several high-throughput smFISH imaging experiments in HeLa cells [40,41,42,43], are broadly applicable to eukaryotic cells: (i) nuclear (contained in the volume of the nucleus), (ii) cytoplasmic (diffuse throughout the cytoplasm), (iii) nuclear edge (near the inner/outer nuclear membrane), (iv) cell edge (near the cell membrane), and (v) none (complete spatial randomness). It is important to note, as was done previously in FISH-quant v2 [39] that because of the 2D nature of the dataset, RNA that is in truth cytoplasmic but above or below the nucleus will still appear as though in the nucleus when collapsed in the z-dimension. As we use the FISH-quant v2 pattern simulation framework, this is accounted for in the training dataset.

We used the FISH-quant v2 simulation framework to generate realistic ground-truth data [42]. Each sample is defined as a set of points with coordinates in two dimensions, representing the set of observed transcripts for a gene in a particular cell. In total, we simulated 2000 samples per class for a total of 10,000 samples (see the “ Methods” section). We used 80% of the simulated data for training and held out the remaining 20% for testing (Additional File 1: Fig. S1A). Each sample is encoded by a set of 13 input features, describing characteristics of its spatial point distribution, including proximity to cellular compartments and extensions (features 1–3), measures of symmetry about a center of mass (features 4–6), and measures of dispersion and point density (features 7–13) (Fig. 2A). These features are normalized to morphological properties of the cell to control for variability in cell shape. A detailed description of every feature is described in Additional File 1: Table S1, and model architectures and hyperparameters are described in Additional File 2: Table S2 (see the “ Methods” section).

Fig. 2
figure 2

Subcellular localization pattern identification with RNAforest. A Thirteen spatial summary statistics are computed for every gene-cell pair describing the spatial arrangement of molecules and boundaries in relation to one another. The features (Supp. Table 1) are inputs for RNAforest, a multilabel ensemble classifier that assigns one or more subcellular localization labels: cell edge, cytoplasmic, nuclear, nuclear edge, and none. The colors for each label are used consistently throughout to figure. Top 10 genes for each label visualized for each label other than “none” in B U2-OS cells and C 3T3 cells. D and E are UpSet plots showing the proportion of measured transcripts assigned to each label. F and G show the relative label proportion across cells for each gene and are colored by the majority label (F and G). H Top 5 consistent genes for each label. I ssGEA identifies the enrichment of GO cellular component domains for each label in the 3T3 cell dataset. Stars denote p-values under thresholds defined in the legend. P-values are derived from ssGSEA permutation tests with Benjamini–Hochberg correction controlling for false discovery rate

We applied RNAforest on two datasets from different spatial platforms, cell types, and gene panel sizes: a MERFISH dataset in U2-OS cells and a seqFISH + dataset in 3T3 cells. Validation performance on manual annotation of subsets of both datasets shows that RNAforest generalizes well despite biological and technical differences (see the “ Methods” section, Additional File 1: Fig. S1B). The MERFISH dataset measured 130 genes (low plexity) with high detection efficiency per gene (111 molecules per gene per cell on average), while the seqFISH + dataset measured 10,000 genes (very high plexity) with lower detection efficiency (8 molecules per gene per cell on average) (Fig. 2B, C, Additional File 1: Fig. S1C-F). In agreement with previous work characterizing RNA localization of 411 genes [43], we find that genes commonly exhibit variability in localization across cells. This suggests that heterogeneity in localization likely generalizes to the entire transcriptome. Of the localization patterns besides “none,” “nuclear” was the most common (22.1%) in the U2-OS osteosarcoma cells (Fig. 2D, F), while “cell edge” was the most common (15.9%) in the 3T3 fibroblast cells (Fig. 2E, G).

In the U2-OS cells, we found many genes to have preferential localization in different subcellular compartments (Fig. 2H). In agreement with our RNAflux findings, we find genes known to localize to the nucleus [44, 45] to be frequently labeled “nucleus” (MALAT1, SOD2) and genes encoding secreted extracellular proteins [13] to be frequently labeled “nuclear edge” (FBN1, FBN2). As expected, we find genes preferentially “nuclear” and “nuclear edge” localized to mirror nucleus and endoplasmic reticulum genes found in a 10 k genes MERFISH study of U2-OS cells that included ER staining [3C, D). Factor 3 describes the colocalization of gene pairs in the cytoplasm of cluster 1 cells, especially a number of genes that attract PIK3CA transcripts While little is known about PIK3CA RNA interactions, the PI3K pathway regulates mitotic organization, including the regulation of dynein and dynactin motor proteins. DYNC1H1 is among the top genes attracting PI3KCA and specifically encodes cytoplasmic dynein, a motor protein critical for spindle formation and chromosomal segregation in mitosis [55]. This hints that not only is compartmental localization of RNA linked to the cell cycle [59]. We successfully reproduced the identification of unique cell types that reflect canonical expression markers (Fig. S4A and B). We apply RNAflux to cell type-disaggregated subsets of the full datasets in fields of view of interest with at least 100 cells of interest and find that fluxmaps 1–3 show different enrichment scores for Nucleus and Endoplasmic Reticulum (ER; combination of ERM and ER lumen). We note that despite the successful delineation of discrete regions in the form of fluxmaps when looking at different regions of tissue that are enriched for different cell types, nuclear and ER enrichment scores change for each fluxmap (Fig. S4C).

In summary, RNAflux finds distinct subcellular domains with consistent spatial organization and local gene composition. As an unsupervised method, RNAflux can be applied to any cell type for inferring subcellular domains from transcript locations and functionally annotated with biological enrichment analysis. This process is best performed on cell type-separated data to guard low-rank embeddings from being generated by cells of vastly different morphologies. This disaggregation is important because, unlike uniform cell lines, heterogeneous tissue composed of functionally diverse stromal cells and leukocytes should indeed be expected to have different distributions and complements of subcellular domains.

Doxorubicin-induced stress in cardiomyocytes depletes RNA from the endoplasmic reticulum

Having established Bento’s utility to characterize RNA localization in U2OS cells, we applied Bento to quantify changes in localization in the context of perturbations in cells. Specifically, we performed single-molecule spatial transcriptomics on doxorubicin-treated and untreated cardiomyocytes to identify RNA localization changes as a result of treatment.

Doxorubicin (DOX) was once one of the most effective broad-spectrum anti-cancer anthracycline antibiotics [60, 61] with particular efficacy against solid malignancies such as lung and breast cancer, as well as hematologic neoplasia [62, 63]. However, DOX’s propensity to cause cardiac damage in patients has led to significant limitations in its clinical use [64]. The exact mechanism by which DOX induces heart failure is unclear, but significant evidence suggests cardiomyocyte injury driven by oxidative stress is a major factor [62, 65,66,67,68]. Specifically, DOX causes stress and dysfunction in multiple cellular compartments in cardiomyocytes such as mitochondria, Sarco/endoplasmic reticulum (SER), deficiencies in calcium signaling, and lipid degradation at the cellular membrane [69]. We reasoned that by measuring the localization of the RNA transcripts of 100 genes crucial to cardiomyocyte health and function (Additional File 2: Table S2) and leveraging the tools developed within Bento, we could recapitulate known dysfunction of subcellular domains in cardiomyocytes upon DOX stress and measure novel RNA localization phenotypes.

We utilized a chemically defined protocol to differentiate human induced pluripotent stem cells (iPSCs) into beating cardiomyocytes and treated them with either DMSO (vehicle) or 2.5 μM DOX for 12 h before fixation (see the “ Methods” section). Single-molecule spatial transcriptomes were measured by Resolve Bioscience using Molecular Cartography. The resulting data was segmented using ClusterMap [24] for cell boundaries and Cellpose [70] for nuclei boundaries (Fig. 5A). Non-myocytes were filtered out using SLC8A1 as a canonical marker for cardiomyocytes (see the “ Methods” section, Additional File 1: Fig. S3A). Comparing vehicle and DOX-treated cardiomyocytes, we found NPPA, a classic marker for cardiac stress [71, 72], to be upregulated in DOX-treated cells (Fig. 5B). We identified subcellular domains in the vehicle and DOX-treated cardiomyocytes using RNAflux, clustering the domains into four fluxmap domains (Fig. 5C, D, Additional File 1: Fig. S3B). Enrichment of location-specific gene expression aligned domains to the nucleus (nuclear pore, nucleolus, and nucleus), ERM and OMM, ER lumen, and cytosol respectively (Fig. 5E, Additional File 1: Fig. S3C). Comparing the gene composition in each domain, we observe an overall localization bias towards both the nucleus and ERM/OMM in vehicle-treated cells (Fig. 5E top), in agreement with prior poly(A) smFISH studies [73]. However, RNA in the DOX-treated cardiomyocytes demonstrated a shift in average RNA localization away from the ERM/OMM and towards the nucleus (Fig. 5E bottom). There was no correlation when comparing the logFC in expression and the difference in nuclear composition of genes after treatment, indicating that localization towards the nucleus is not driven by transcript abundance (Spearman r = 0.07, p = 0.4944). There is evidence that 90% of genes have a half-life of less than 260 min [74], far less than the 12-h DOX treatment, indicating that the shift in RNA localization is likely due to reduced nuclear export of newly synthesized RNA from the nucleus to the ERM/OMM. Indeed, even low concentrations of DOX have been demonstrated to alter structural fibrous proteins as well as mitochondrial depolarization and fragmentation [75]. Of particular note, the RNA binding protein RBM20—a critical regulator of mRNA splicing of genes encoding key structural proteins associated with cardiac development and function—had a pronounced depletion of RNA transcripts outside of the nucleus upon DOX treatment (Fig. 5G left). With further validation, this may indicate nuclear retention and or degradation of nuclear exported RBM20 mRNA as a potential mechanism of DOX-induced cardiomyopathy. Similarly, we found the mRNA of calcium voltage-gated channel subunit CACNB2 to also deplete outside of the nucleus (Fig. 5G middle). The loss of CACNB2 translation outside of the nucleus may impact calcium signaling crucial to cardiomyocyte function [76]. Most genes only showed weak shifts in localization, similar to LAMP2 (Fig. 5G right).

Fig. 5
figure 5

Subcellular RNA localization changes upon Doxorubicin treatment in iPSC-derived cardiomyocytes. A Cardiomyocytes derived from human iPSCs were treated with DMSO or 2.5 μM DOX for 12 h. The localizations of 100 genes relevant to cardiomyocyte health and function were measured using Molecular Cartography. Cell boundaries were determined using ClusterMap and nuclei were segmented using Cellpose. B Top 10 differentially upregulated and downregulated genes in vehicle versus treatment. T-test was used for comparisons. All genes shown are significant given an adjusted p-value threshold of p < 0.01. Benjamini–Hochberg correction was used to control for the false discovery rate. Vehicle and treatment conditions have n = 7159 and 6260 cells respectively. C APEX-seq location-specific gene enrichment of fluxmap domains for the cytosol, endoplasmic reticulum membrane (ERM), endoplasmic reticulum lumen (ER Lumen), nuclear lamina, nucleus, nucleolus, nuclear pore, and outer mitochondrial matrix (OMM). D Fluxmap domains visualized for a representative field of view of cardiomyocytes for vehicle and treatment respectively highlighting cellular nuclei, ERM/OMM, ER Lumen, and cytosol. E RNAflux fluxmap enrichment of each gene averaged across vehicle and treatment cardiomyocytes captures changes in subcellular RNA localization. Top 10 genes are labeled and ranked by the largest shifts between compartment compositions. Shifts are quantified by Wasserstein distance. F Average gene enrichment in each fluxmap across vehicle and treatment conditions colored by log-fold expression demonstrates population-level shifts in transcript subcellular localization. G Visualization of RBM20, CACNB2, and LAMP2 transcripts confirms the depletion of transcripts from the perinuclear and cytosolic compartments of cardiomyocytes upon DOX treatment

Discussion

Bento seeks to interrogate biology via its “subcellular first” approach to spatial analysis, complementary to “cell-type or tissue first” spatial analysis methods. The toolkit enables quantitative, reproducible, and accessible analysis agnostic to spatial technology platforms in a standardized framework. We implement three novel methods to interrogate subcellular RNA organization: RNAforest for supervised annotation of localization patterns, RNAcoloc for compartment-aware colocalization analysis, and RNAflux for identifying transcriptionally distinct subcellular domains. We showed that with RNAflux, we were able to quantify RNA localization in a variety of contexts, including domain-specific gene localization, drug-induced changes in localization, and cell type-specific localization. With both RNAflux and RNAforest, we find that subcellular mRNA localization reflects gene function. With RNAcoloc, we explore the use of CLQ scores to quantify pairwise gene colocalization in the context of asymmetric associations.

From these results, we found three main factors to limit the effectiveness of subcellular-resolution analysis: molecule density, segmentation quality, and target panel composition. In particular, RNAflux becomes uninformative if too few molecules are detected per cell or if the number of molecules per gene is too sparse. We found that datasets with higher density, i.e., molecules per micrometer2, are less noisy and inform more coherent gradients and domains. Both the U2-OS dataset and cardiomyocyte datasets had high enough molecule density to identify consistent fluxmaps. Notably, RNAflux robustly highlights domains corresponding to the nucleus and ER despite some poor cell segmentation and partially unannotated nuclei in the cardiomyocytes dataset (Fig. 5D). As most commercial target panels are largely composed of marker genes for cell type identification, RNAflux embeddings should be interpreted carefully, especially if transcripts show little spatial variation in subcellular localization. In contrast, RNAforest performs reliably beyond a minimum of 5–10 molecules per sample but is sensitive to accurate segmentation for calculating cell morphology-dependent features (Additional File 1: Fig. SE-F). The 3T3 cells were manually segmented and the U2-OS cells had relatively accurate segmentation and were therefore amenable to applying RNAforest. We found that the segmentation in the cardiomyocytes is accurate enough for single-cell gene expression analysis, but lacked the precision needed to apply RNAforest. In the case of RNAcoloc, the limiting factor to identifying relevant biology is the target panel composition. The current focus of most target panels typically includes cell type markers and highly expressed genes, whereas it would be more informative to identify colocalizing members of protein complexes, functional pathways, or ligand-receptor pairs. Our ability to characterize relevant pathways with our curated cardiomyocyte gene panel shows how gene panel design focused on function enables discovery. Ultimately, non-targeted transcriptome-scale technologies will be necessary to unlock the full potential of subcellular biology.

A dimensional limitation of Bento is its current inability to process three-dimensional spatial transcriptomic data. While some commercially available spatial transcriptomic methods yield RNA molecular coordinates in 3D, the nuclear and cell segmentation is inevitably still two-dimensional making it difficult to interpret z-dimensional positions lacking the context of cellular geometry in 3D. However, the algorithms behind RNAforest, RNAcoloc, RNAflux, and the plethora of feature calculation functions in Bento are inherently extensible to leveraging three-dimensionality. When three-dimensional cell segmentation improves, we intend to extend Bento to support three-dimensional analysis.

Conclusions

Conventionally, RNA is treated as an intermediary vehicle encoding genomic information for protein synthesis. We began our investigation of RNA localization with the hope of understanding how the spatial organization of RNA functions as a mechanism for post-transcriptional regulation. However, RNAflux conceptually introduces using RNA molecular coordinates as a latent layer of information encoding cellular space–time. Here, we used that latent layer of information to identify subcellular domains. As spatial omic technologies improve to capture more and more information, the potential applications of such latent embeddings will grow as well. Indeed at the tissue level, this concept is already being leveraged with a recent tool, TensionMap, using RNA localization information to predict mechanical tension [77]. As applications for spatial transcriptomics grow in popularity and complexity, we envision that Bento is a platform for the next generation of tools needed to quantify the complex molecular dynamics governing normal and abnormal cellular processes.

Methods

MERFISH and seqFISH + data preprocessing

For the seqFISH + dataset, we limited the scope of our analysis to the set of genes for which at least 10 molecules were detected in at least one cell. This helped reduce sparsity in the data, resulting in 3726 genes remaining. Because pattern classification requires nuclear segmentation masks, we removed all cells lacking annotated nuclei for the remainder of 179 cells. Because the MERFISH data had a much higher number of molecules detected per gene, no gene needed to be removed. Again, cells without annotated nuclei were removed, leaving 1022 cells for downstream analysis.

Preprocessing cardiomyocytes datasets

Single-cell expression matrices of both vehicle replicates and both DOX treatment samples were concatenated as a single expression matrix. Cells were projected into two dimensions with UMAP dimensional reduction. No significant batch effects were detected. Leiden clustering was performed at resolution = 0.5 to isolate and filter out a non-myocyte population depleted in SLC8A1 expression (Additional File 1: Fig. S3A). All described preprocessing steps were performed in Scanpy [21].

RNAforest: model selection and training

We evaluated 4 base models for the multilabel classifier including random forests (RF), support vector machines (SVM), feed-forward fully-connected neural networks (NN), and convolutional neural networks (CNN). While all other models use the 13 spatial features for input (Additional File 1: Table S1), the CNN takes 64 × 64 image representations of each sample as input. Each multilabel classifier consists of 5 binary classifiers with the same base model. We used the labeled 10,000 simulated samples for training, stratifying 80% of the simulated data for training and holding out the remaining 20% for testing. To select the best hyperparameters for each multilabel classifier, we sampled from a fixed hyperparameter space with the Tree-structured Parzen Estimator algorithm and evaluated performance with fivefold cross-validation (Additional File 2: Table S2). We retrained the final model (random forest base model) on all training data with the best-performing set of hyperparameters (Additional File 1: Fig. S1E). Exact steps can be found and reproduced in notebooks stored in the GitHub repository: https://github.com/ckmah/bento-manuscript.

RNAforest: image rasterization of molecules and segmentation masks for CNN

To generate an image for a given sample, point coordinates, the cell segmentation mask, and the nuclear segmentation mask are used. The area of the cell is tiled as a 64 × 64 grid, where each bin corresponds to a pixel in the final image. Values are stored in a single channel to render a grayscale image. Pixels inside the cell are encoded as 20 and inside the nucleus encoded as 40. Bins with molecules are encoded as (40 + 20 × n) where n is the number of molecules. Finally, values are divided by 255 and capped to be between 0 and 1.

RNAforest: simulating training data

We trained a multilabel classifier to assign each gene in every cell labels from five categories: (i) nuclear (contained in the volume of the nucleus), (ii) cytoplasmic (diffuse throughout the cytoplasm), (iii) nuclear edge (near the inner/outer nuclear membrane), (iv) cell edge (near the cell membrane), and (v) none (complete spatial randomness). These categories are a consolidation of those observed in several high-throughput smFISH imaging experiments in HeLa cells [40,41,42,43]. We used the FISH-quant simulation framework (https://code.google.com/archive/p/fish-quant/) to generate realistic ground-truth images using empirically derived parameters from the mentioned high-throughput smFISH imaging experiments in HeLa cells [42]. In total, we simulate 2000 samples per class for a total of 10,000 training samples.

  1. 1.

    Cell shape: Cell morphology varies widely across cell types, and for classifier generalizability, it is important to include many different morphologies in the training set. We use a catalog of cell shapes for over 300 cells from smFISH images in HeLa cells that capture nucleus and cell membrane shape [42]. Cell shapes were obtained by cell segmentation with CellMask and nuclear segmentation was obtained from DAPI staining.

  2. 2.

    mRNA abundance: We simulated mRNA abundance at three different expression levels (40, 100, and 200 mRNA per average-sized cell) with a Poisson noise term. Consequently, total mRNA abundance per cell was between 5 and 300 transcripts.

  3. 3.

    Localization pattern: We focused on 5 possible 2D localization patterns, including cell edge, cytoplasmic, none, nuclear, and nuclear edge. Each pattern was further evaluated at 3 different degrees—weak, moderate, and strong. Moderate corresponds to a pattern typically observed in a cell, whereas weak is close to spatially random. These 5 classes aim to capture biologically relevant behavior generalizable to most cell types; there is room for additional classes describing other biologically relevant localization patterns so long as they can be accurately modeled.

RNAforest: manual annotation of true biological validation data

Using 3 individual annotators, we annotated the same 600 samples across both datasets, kee** samples with 2 or more annotator agreements as true annotations, resulting in 165 annotated seqFISH + samples and 238 annotated MERFISH samples (403 total). We used Cohen’s kappa coefficient [78] to calculate agreement between pairs of annotators for each label yielding an overall coefficient of 0.602. We found that pairwise agreement between annotators across labels was fairly consistent ranging between 0.588 and 0.628, while label-specific agreement varied more, ranging between 0.45 and 0.72 (Additional File 4: Table S4).

RNAforest: functional enrichment of gene pattern distributions

For enrichment of compartment-specific expression from **a et al. [46], scores are calculated by taking the weighted sum of gene pattern frequencies and published compartment log fold-change values (Additional File 1: Fig. S2). The Benjamini–Hochberg correction was used to correct p-values for multiple hypothesis testing.

For the seqFISH + dataset, we performed a single-sample Gene Set Enrichment Analysis [79, 80] on gene pattern frequencies to compute enrichment scores (Fig. 3I). ssGSEA was performed with the GSEApy Python package and the “GO_Cellular_Component_2021” gene set library curated by Enrichr [81]. Gene sets with a minimum size of 50 and a maximum size of 500 were analyzed.

Colocation quotient for RNA colocalization analysis

Pairwise colocalization of genes was determined for each compartment of every cell separately. In this case, each cell was divided into compartments, cytoplasm, and nucleus. The colocation quotient (CLQ) was calculated for every pair of genes \({\varvec{A}}\) and \({\varvec{B}}\). The CLQ is defined as an odds ratio of the observed to expected proportion of \({\varvec{B}}\) transcripts among neighbors of \({\varvec{A}}\) for a fixed radius r; it is formulated as:

$${CLQ}_{A\to B}=\frac{{C}_{A\to B}/{N}_{A}}{{N{\prime}}_{B}/(N-1)}$$

Here \({{\varvec{C}}}_{{\varvec{A}}\to {\varvec{B}}}\) denotes the number of \({\varvec{A}}\) transcripts of which \({\varvec{B}}\) transcripts are considered a neighbor. \({{\varvec{N}}}_{{\varvec{A}}}\) denotes the total number of \({\varvec{A}}\) transcripts, while \({\varvec{N}}{\boldsymbol{^{\prime}}}_{{\varvec{B}}}\) stands for the total number of \({\varvec{B}}\) transcripts. In the case that \({\varvec{A}}={\varvec{B}}\), \({\varvec{N}}{\boldsymbol{^{\prime}}}_{{\varvec{B}}}\) equals the total number of \({\varvec{B}}\) transcripts minus 1. \({\varvec{N}}\) denotes the total number of transcripts in the cell. Following statistical recommendations from the original formulation of the colocation quotient (CLQ), genes with fewer than 10 transcripts were not considered to reduce sparsity and improve testing power [53].

Tensor decomposition for compartment-specific colocalization

We begin by calculating the CLQ for every pair of genes within each compartment of every cell. We structure our data as follows:

  1. 1.

    Cells set: Denote the set of cells as\({\varvec{C}}\boldsymbol{ }=\boldsymbol{ }\{{{\varvec{c}}}_{1},{{\varvec{c}}}_{2},\boldsymbol{ }...,{{\varvec{c}}}_{{\varvec{n}}}\}\), where \({{\varvec{c}}}_{{\varvec{i}}}\) represents the \({{\varvec{i}}}^{{\varvec{t}}{\varvec{h}}}\) cell and \({\varvec{n}}\) is the total number of cells.

  2. 2.

    Compartments set: Every cell has the same set of compartments, represented as\({\varvec{K}}=\{{{\varvec{k}}}_{1},{{\varvec{k}}}_{2},\boldsymbol{ }...,{{\varvec{k}}}_{{\varvec{m}}}\}\), where \({{\varvec{k}}}_{{\varvec{j}}}\) is the \({{\varvec{j}}}^{{\varvec{t}}{\varvec{h}}}\) compartment within a cell.

  3. 3.

    Gene pairs set: The gene pairs are represented by \({\varvec{G}}=\{{{\varvec{g}}}_{1},{{\varvec{g}}}_{2},\boldsymbol{ }...,\boldsymbol{ }{{\varvec{g}}}_{{\varvec{p}}}\}\) where \({{\varvec{g}}}_{{\varvec{p}}}\) is the \({{\varvec{p}}}^{{\varvec{t}}{\varvec{h}}}\) gene pair.

By computing the CLQ for every combination of cells in \({\varvec{C}}\), compartments in \({\varvec{K}}\), and gene pairs in \({\varvec{G}}\), we populate a three-dimensional tensor \({\varvec{X}}\) with dimensions corresponding to these sets.

We then apply non-negative parallel factor analysis (PARAFAC) as implemented in Tensorly [31] to reduce the dimensionality of our dataset and capture the underlying patterns. For tensor decomposition, we employed non-negative parallel factor analysis as implemented in Tensorly [31]. The tensor \({\varvec{X}}\) is decomposed into the sum of \({\varvec{R}}\) factors; each factor is a three-dimensional tensor expressed as the outer product of three vectors: \({{\varvec{x}}}_{{\varvec{r}}}\) (compartment loadings), \({{\varvec{y}}}_{{\varvec{r}}}\) (cell loadings), and \({{\varvec{z}}}_{{\varvec{r}}}\) (gene pair loadings). This is denoted as follows:

$${{\widehat{X}}_{R}}={\sum }_{r=1}^{R}{x}_{r}\otimes {y}_{r}\otimes {z}_{r}$$

The optimal rank-\({\varvec{R}}\) decomposition of \({\varvec{X}}\) is determined by minimizing the error between \({\varvec{X}}\) and the reconstructed tensor \(\widehat{{\varvec{X}}}\). We use the elbow function heuristic to select the best-fit rank from a range of 2–12 factors. This approach seeks to balance the complexity of the model against the fidelity of reconstruction. Missing values in \({\varvec{X}}\) are ignored when calculating the loss.

RNAflux: unsupervised spatial embedding and subcellular domain quantization

To generate RNAflux embeddings, first, a set of query coordinates are generated tiling across the cell area on a uniform grid. This effectively downsamples the original data units (pixels) resulting in much fewer samples to compute embeddings. Let \(Q = \{{q}_{1},{q}_{2},...,{q}_{n}\}\) be the set of query coordinates and \({q}_{i}\) denote the \({i}^{th}\) query coordinate. For the MERFISH U2-OS dataset, a step size of 10 data units (pixels) was used to generate the uniform grid. For the iPSC-derived cardiomyocytes, a step size of 5 data units was used. Each query point is assigned an expression vector, counting the abundance of each gene within a fixed radius of 40 and 50 data units respectively. Each expression vector is normalized to sum to one, converting the expression vector to a composition vector, denoted as \({v}_{i}\). Similarly, the cell composition vector \({x}_{j}\) for the \({j}^{th}\) cell is calculated by normalizing the total cell expression to sum to one. The RNAflux embedding \({r}_{i}\) at query coordinate \({q}_{i}\) is defined as the difference between the \({v}_{i}\) and \({x}_{j}\) divided by \({\sigma }_{v}\), the vector of standard deviation of composition vectors \(v\).

$${r}_{i}=\frac{{v}_{i}-{x}_{j}}{{\sigma }_{v}}$$

The RNAflux embedding serves as an interpretable spatial gene embedding that quantifies highly local fluctuations in gene composition. Dimensional reduction of the embeddings is performed using truncated singular value decomposition (SVD). Truncated SVD was chosen over PCA to better handle large but sparse data. Embeddings were reduced to the top 10 components. To assign domains, self-organizing maps (SOM) as implemented in MiniSom [82] were used for low-rank quantization of query embeddings. In an analysis of the MERFISH dataset, SOMs of size 1 × k were fit across a range of 2 to 12; the best model was determined using the elbow method heuristic to evaluate quantization error. Similarly, domains were determined for the cardiomyocyte spatial transcriptomics data by fitting the vehicle and treatment samples separately, for k across a range of 2 to 8. The elbow method heuristic determined an optimal k of 6; subsequently, a k of 4 was used for further analysis for ease of interpretation.

RNAflux: visualizing spatial embeddings

The top 3 principal components of the RNAflux embeddings are transformed to map red, green, and blue values respectively. Embeddings are first quantile normalized and scaled to a minimum of 0.1 and 0.9 to avoid map** extreme quantiles to white and black. These values are then used for red, green, and blue color channels. To map the downsampled grid back to the original data units, linear interpolation was used to rescale the computed color values and fill the space between the uniform grid points.

RNAflux: enrichment of locale-specific transcriptomes derived by APEX-seq

The enrichment score for each pixel is calculated by first taking the weighted sum of its RNAflux embedding and locale-specific log fold-change values as implemented by the decoupler tool [83]. Scores for pixels within a given cell are normalized against a null distribution constructed via random permutations of the input embeddings, to produce z-scaled enrichment scores. Fluxmap domain enrichment scores are simply obtained by taking the mean score of all pixels within the boundary of each domain. Fluxmap domain overlaps are computed by counting the fraction of pixels within the boundary of each domain with a positive enrichment score.

MERFISH of U2-OS cells

MERFISH sample preparation. MERFISH measurements of 130 genes with five non-targeting blank controls were done as previously described, using the published encoding [44] and readout probes [84]. Briefly, U2-OS cells were cultured on 40 mm #1.5 coverslips that are silanized and poly-L-lysine coated [44] and subsequently fixed in 4% (vol/vol) paraformaldehyde in 1 × PBS for 15 min at room temperature. Cells were then permeabilized in 0.5% Triton X-100 for 10 min at room temperature and washed in 1 × PBS containing Murine RNase Inhibitor (NEB M0314S). Cells were preincubated with hybridization wash buffer (30% (vol/vol) formamide in 2 × SSC) for 10 min at room temperature with gentle shaking. After preincubation, the coverslip was moved to a fresh 60-mm petri dish and residual hybridization wash buffer was removed with a Kimwipe lab tissue. In the new dish, 50 uL of encoding probe hybridization buffer (2 × SSC, 30% (vol/vol) formamide, 10% (wt/vol) dextran sulfate, 1 mg ml−1 yeast tRNA, and a total concentration of 5 uM encoding probes and 1 μM of anchor probe: a 15-nt sequence of alternating dT and thymidine-locked nucleic acid (dT +) with a 5′-acrydite modification (Integrated DNA Technologies). The sample was placed in a humidified 37 °C oven for 36 to 48 h, then washed with 30% (vol/vol) formamide in 2 × SSC for 20 min at 37 °C, 20 min at room temperature. Samples were post-fixed with 4% (vol/vol) paraformaldehyde in 2 × SSC and washed with 2 × SSC with murine RNase inhibitor for 5 min. The samples were finally stained with an Alexa 488-conjugated anchor probe-readout oligo (Integrated DNA Technologies) and DAPI solution at 1 μg/ml.

MERFISH imaging

MERFISH measurements were conducted on a home-built system as described in Huang et al. [84].

MERFISH spot detection

Individual RNA molecules were decoded in MERFISH images using MERlin v0.1.6 [85]. Images were aligned across hybridization rounds by maximizing phase cross-correlation on the fiducial bead channel to adjust for drift in the position of the stage from round to round. Background was reduced by applying a high-pass filter and decoding was then performed per-pixel. For each pixel, a vector was constructed of the 16 brightness values from each of the 16 rounds of imaging. These vectors were then L2 normalized and their Euclidean distances to each of the L2 normalized barcodes from the MERFISH codebook were calculated. Pixels were assigned to the gene whose barcode they were closest to, unless the closest distance was greater than 0.512, in which case the pixel was not assigned a gene. Adjacent pixels assigned to the same gene were combined into a single RNA molecule. Molecules were filtered to remove potential false positives by comparing the mean brightness, pixel size, and distance to the closest barcode of molecules assigned to blank barcodes to those assigned to genes to achieve an estimated misidentification rate of 5%. The exact position of each molecule was calculated as the median position of all pixels consisting of the molecule.

MERFISH image segmentation

Cellpose v1.0.2 [70] was used to perform image segmentation to determine the boundaries of cells and nuclei. The nuclei boundaries were determined by running Cellpose with the “nuclei” model using default parameters on the DAPI stain channel of the pre-hybridization images. Cytoplasm boundaries were segmented with the “cyto” model and default parameters using the polyT stain channel. RNA molecules identified by MERlin were assigned to cells and nuclei by applying these segmentation masks to the positions of the molecules.

iPSC cardiac differentiation and doxorubicin treatment

Matrigel (Corning, cat # 354,277)-coated plates were used to culture iPSCs with mTESR Plus human iPSC medium (StemCell Technologies, cat # 100–0276) in a humidified incubator at 37 °C with 5% CO2. iPSCs were dissociated with Gentle Cell Dissociation Reagent (StemCell Technologies, cat # 100–0485) and passaged with mTESR Plus medium and 10uM ROCK inhibitor (Tocris, cat #1254) at a ratio of 1:12. mTESR plus medium was replaced every other day until the cells reached 80% confluency for maintenance and replating, or 90% confluency for cardiac differentiation utilizing a chemically defined protocol [86]. On day 0 of cardiac differentiation, cells were treated with 6 μM CHIR99021 (Selleck Chem, cat # S1263) in RPMI 1640 media (Gibco, cat # 11,875) and B27 minus insulin supplement (Thermo Fisher, cat # A1895601). On day 2, CHIR was removed, and cells were cultured with RPMI 1640 media and B27 minus insulin supplement (Thermo Fisher, cat # A18956). On day 3, media was replaced with RPMI media containing B27 minus insulin supplement and 5 μM Wnt-C59 (Cellagen Technologies, cat # C7641-2 s). On days 5, 7, and 9, media was replaced with RPMI media containing B27 insulin supplement (Thermo Fisher, cat # 17,504). On days 11 and 13, media was replaced with RPMI 1640 media without glucose (Thermo Fisher, cat # 11,879,020) containing B27 insulin supplement for purification of cardiomyocytes. From days 15 onward, the cells were cultured in RPMI 1640 media containing B27 supplement which was changed every other day until the cells reached day 30 for replating. For replating, cells were incubated in 10 × TrypLE (Thermo Fisher, cat # A1217701) for 12 min at 37 °C, neutralized with equal volumes of RPMI 1640 media containing B27 supplement with 20% FBS (Gibco, cat # 26,140–079), gently dissociated by pipetting, then spun down and resuspended for replating in RPMI 1640 media containing B27 supplement with 20% FBS. The next day, the cell media was replaced with RPMI 1640 media containing B27 supplement which was replaced with fresh media every other day. On day 48, the cells were replated onto chamber slides (Ibidi, cat # 80,826) as described above and recovered for 10 days before doxorubicin treatments began (MedChemExpress, cat # HY-15142). On day 60, doxorubicin treatments concluded, and the cells underwent methanol fixation.

Molecular cartography

Cultured cell processing

After doxorubicin treatment, cardiomyocytes were washed with PBS (1 ×) twice and fixed in methanol (− 20 °C) for 10 min. After fixation, Methanol was aspirated and cells were dried and stored at − 80 °C until use. The samples were used for Molecular Cartography™ (100-plex combinatorial single-molecule fluorescence in-situ hybridization) according to the manufacturer’s instructions Day 1: Molecular Preparation Protocol for cells, starting with the addition of buffer DST1 followed by cell priming and hybridization. Briefly, cells were primed for 30 min at 37 °C followed by overnight hybridization of all probes specific for the target genes (see below for probe design details and target list). Samples were washed the next day to remove excess probes and fluorescently tagged in a two-step color development process. Regions of interest were imaged as described below and fluorescent signals removed during decolorization. Color development, imaging, and decolorization were repeated for multiple cycles to build a unique combinatorial code for every target gene that was derived from raw images as described below.

Probe design

The probes for 100 genes were designed using Resolve’s proprietary design algorithm. Briefly, the probe design was performed at the gene level. For every targeted gene, all full-length protein-coding transcript sequences from the ENSEMBL database were used as design targets if the isoform had the GENCODE annotation tag “basic” [87, 88]. To speed up the process, the calculation of computationally expensive parts, especially the off-target searches, the selection of probe sequences was not performed randomly, but limited to sequences with high success rates. To filter highly repetitive regions, the abundance of k-mers was obtained from the background transcriptome using Jellyfish [89]. Every target sequence was scanned once for all k-mers, and those regions with rare k-mers were preferred as seeds for full probe design. A probe candidate was generated by extending a seed sequence until a certain target stability was reached. A set of simple rules was applied to discard sequences that were found experimentally to cause problems. After these fast screens, the remaining probe candidates were mapped to the background transcriptome using ThermonucleotideBLAST [90] and probes with stable off-target hits were discarded. Specific probes were then scored based on the number of on-target matches (isoforms), which were weighted by their associated APPRIS level [91], favoring principal isoforms over others. A bonus was added if the binding site was inside the protein-coding region. From the pool of accepted probes, the final set was composed by picking the highest-scoring probes. Probes with catalog numbers can be found in Additional File 3: Table S3.

Imaging

Samples were imaged on a Zeiss Celldiscoverer 7, using the 50 × Plan Apochromat water immersion objective with an NA of 1.2 and the 0.5 × magnification changer, resulting in a 25 × final magnification. Standard CD7 LED excitation light source, filters, and dichroic mirrors were used together with customized emission filters optimized for detecting specific signals. Excitation time per image was 1000 ms for each channel (DAPI was 20 ms). A z-stack was taken at each region with a distance per z-slice according to the Nyquist-Shannon sampling theorem. The custom CD7 CMOS camera (Zeiss Axiocam Mono 712, 3.45 μm pixel size) was used. For each region, a z-stack per fluorescent color (two colors) was imaged per imaging round. A total of 8 imaging rounds were done for each position, resulting in 16 z-stacks per region. The completely automated imaging process per round was realized by a custom Python script using the scripting API of the Zeiss ZEN software (Open application development).

Image processing and spot segmentation

As a first step, all images were corrected for background fluorescence. A target value for the allowed number of maxima was determined based on the area of the slice in μm2 multiplied by the factor 0.5. This factor was empirically optimized. The brightest maxima per plane were determined, based upon an empirically optimized threshold. The number and location of the respective maxima were stored. This procedure was done for every image slice independently. Maxima that did not have a neighboring maximum in an adjacent slice (called z-group) were excluded. The resulting maxima list was further filtered in an iterative loop by adjusting the allowed thresholds for (Babs-Bback) and (Bperi-Bback) to reach a feature target value (Babs: absolute brightness, Bback: local background, Bperi: background of periphery within 1 pixel). These feature target values were based on the volume of the 3D image. Only maxima still in a group of at least 2 after filtering were passing the filter step. Each z-group was counted as one hit. The members of the z-groups with the highest absolute brightness were used as features and written to a file. They resemble a 3D-point cloud. To align the raw data images from different imaging rounds, images had to be registered. To do so, the extracted feature point clouds were used to find the transformation matrices. For this purpose, an iterative closest point cloud algorithm was used to minimize the error between two point clouds. The point clouds of each round were aligned to the point cloud of round one (reference point cloud). The corresponding point clouds were stored for downstream processes. Based upon the transformation matrices, the corresponding images were processed by a rigid transformation using trilinear interpolation. The aligned images were used to create a profile for each pixel consisting of 16 values (16 images from two color channels in 8 imaging rounds). The pixel profiles were filtered for variance from zero normalized by the total brightness of all pixels in the profile. Matched pixel profiles with the highest score were assigned as an ID to the pixel. Pixels with neighbors having the same ID were grouped. The pixel groups were filtered by group size, number of direct adjacent pixels in group, and number of dimensions with a size of two pixels. The local 3D-maxima of the groups were determined as potential final transcript locations. Maxima were filtered by the number of maxima in the raw data images where a maximum was expected. The remaining maxima were further evaluated by the fit to the corresponding code. The remaining maxima were written to the results file and considered to resemble transcripts of the corresponding gene. The ratio of signals matching to codes used in the experiment and signals matching to codes not used in the experiment were used as estimation for specificity (false positives). The algorithms for spot segmentation were written in Java and are based on the ImageJ library functionalities. Only the iterative closest point algorithm is written in C +  + based on the libpointmatcher library (https://github.com/ethz-asl/libpointmatcher).

Image segmentation

Cellpose v1.0.2 [70] was used to perform image segmentation to determine the boundaries of nuclei. The nuclei boundaries were determined by running Cellpose with the “nuclei” model using default parameters on the DAPI stain channel of the pre-hybridization images. Cytoplasm boundaries were determined with ClusterMap [24] using spot coordinates.

Review history

The review history is available as Additional file 5.

Peer review information

Kevin Pang was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.