STEAMBOAT: Attention-based multiscale delineation of cellular interactions in tissues
Posted on: 25 April 2025
Preprint posted on 10 April 2025
Steamboat reveals how local and long-range cell interactions shape tissues, using a smart and interpretable machine learning framework.
Selected by Benjamin Dominik Maier
Categories: bioinformatics
Background
Spatial-Omics Technologies
Over the last decade, many single-cell and subcellular spatial omics technologies have been developed to study cellular states, phenotypes and neighborhoods within their native environment. They can be categorized into three broad groups:
- Probe-based multiplexed FISH (Fluorescence In Situ Hybridization) techniques, such as MERFISH (Multiplexed Error-Robust Fluorescence In Situ Hybridization) [1] and Xenium [2], use fluorescently labeled oligonucleotide probes that hybridize directly to RNA molecules within intact tissues or cells, allowing the localization and quantification of gene expression at single-molecule resolution.
- Antibody-based multiplexed spectrometry methods, including MIBI (Multiplexed Ion Beam Imaging) [3] and IMC (Imaging Mass Cytometry) [4], use antibodies tagged with unique metal isotopes or labels, which bind to target proteins in tissue samples and are detected using mass spectrometry or ion-based imaging.
- Next-generation sequencing methods, such as FISSEQ (Fluorescent In Situ Sequencing) [5] and Visium [6], capture RNA molecules in situ and either sequence them directly in place (FISSEQ) or extract and spatially barcode them (Visium), generating transcriptome-wide data mapped back to tissue coordinates.
These methods have led to the generation of cell atlases like the Human Cell Atlas [7] and have greatly improved our understanding of development, signalling, disease progression and immune responses by identifying regions with different gene and/or protein expressions.
Multi-Head Attention
Attention is a mechanism that allows a machine learning model to identify and focus on the most relevant parts of the input when making predictions. It does this by assigning different weights to different inputs based on their learnt importance. Multi-head attention extends this idea by running multiple attention mechanisms in parallel, allowing the model to capture different types of relationships or patterns in the data simultaneously. A great beginner-friendly introduction by Ketan Doshi can be found [here].
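To make the idea concrete, here is a minimal NumPy sketch of generic scaled dot-product multi-head attention. It is not taken from the preprint: the function names, array shapes and random weights are assumptions chosen for illustration, and the final output projection found in standard transformer layers is left out for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv):
    """Generic scaled dot-product attention with several heads.

    X          : (n_tokens, d_model) input features
    Wq, Wk, Wv : (n_heads, d_model, d_head) per-head projection matrices
    Returns the concatenated head outputs and the attention weights.
    """
    # Project the inputs into per-head queries, keys and values.
    Q = np.einsum("nd,hdk->hnk", X, Wq)
    K = np.einsum("nd,hdk->hnk", X, Wk)
    V = np.einsum("nd,hdk->hnk", X, Wv)
    d_head = Q.shape[-1]
    # Pairwise similarity between every query and every key, per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, n, n)
    # Each row becomes a set of weights that sums to 1.
    weights = softmax(scores, axis=-1)
    # Weighted average of the values, per head.
    heads = weights @ V                                   # (h, n, d_head)
    # Concatenate the heads along the feature axis.
    out = np.concatenate(list(heads), axis=-1)            # (n, h * d_head)
    return out, weights

# Hypothetical toy input: 5 tokens (e.g. cells), 8 features, 2 heads.
rng = np.random.default_rng(0)
n_tokens, d_model, n_heads, d_head = 5, 8, 2, 4
X = rng.normal(size=(n_tokens, d_model))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
out, weights = multi_head_attention(X, Wq, Wk, Wv)
```

Each head produces its own weight matrix, so two heads can attend to different inputs for the same token. This is the property that makes the individual heads inspectable.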
Steamboat
In their study, Shaoheng Liang and colleagues present a novel machine learning framework, Steamboat, to model how cells interact with each other across spatial scales—within the cell (ego), locally among nearby cells (local), and long-range interactions across the tissue (global). This will allow researchers to study how multiscale cellular interactions shape cell states and spatial organization. Unlike previous models, Steamboat uses a self-supervised, multi-head attention approach to decompose gene expression into these three spatial scales (Fig. 1B). Each attention head identifies interactions between cells by computing scores based on their similarity to learned gene signatures. This enables the construction of cell/sample embeddings, interaction graphs, and reconstructed gene profiles for various downstream spatial-omics applications (Fig. 1C). Shaoheng Liang and colleagues demonstrate that Steamboat outperforms traditional methods in clustering, segmentation, and marker discovery across various cancer datasets.
Fig. 1 Steamboat Framework for Spatial Omics Analysis. Figure taken from Shaoheng Liang, Junjie Tang, Guanghan Wang, and Jian Ma (2025), BioRxiv published under the CC-BY-NC-ND 4.0 International licence.
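The three scales can be illustrated with a toy calculation for a single attention head. The sketch below is not Steamboat's implementation or API: the function name, the signature vectors, the product-based scoring rule and the nearest-neighbour definition of "local" are all assumptions made purely to show how one score per scale could be derived from expression and coordinates.

```python
import numpy as np

def toy_multiscale_scores(X, coords, q, k_local, k_global, n_neighbors=3):
    """Toy ego/local/global scores for ONE head (illustrative assumption only).

    X        : (n_cells, n_genes) expression matrix
    coords   : (n_cells, 2) spatial coordinates
    q        : (n_genes,) signature scored on the centre cell
    k_local  : (n_genes,) signature scored on nearby cells
    k_global : (n_genes,) signature scored on all cells
    """
    # How strongly each cell matches the centre-cell signature.
    centre = X @ q

    # Ego scale: depends only on the cell's own expression.
    ego = centre

    # Local scale: centre match times the mean match of the nearest neighbours.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                    # exclude the cell itself
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]  # (n_cells, n_neighbors)
    local = centre * (X @ k_local)[nbrs].mean(axis=1)

    # Global scale: centre match times the mean match over the whole tissue.
    glob = centre * (X @ k_global).mean()
    return ego, local, glob

# Hypothetical toy data: 6 cells, 4 genes, random positions and signatures.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(6, 4)).astype(float)
coords = rng.uniform(size=(6, 2))
q, k_local, k_global = rng.random(4), rng.random(4), rng.random(4)
ego, local, glob = toy_multiscale_scores(X, coords, q, k_local, k_global)
```

In this toy version the signatures are random. In the framework described above they are learned, and the scores feed into embeddings, interaction graphs and reconstructed gene profiles.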
Key Findings
First, Shaoheng Liang and colleagues use a two-layer cell simulation (Fig. 2A) to show that Steamboat accurately reflects spatial and cell type structure by capturing both cell identity and simulated intracellular signaling (Fig. 2C). They also show that Steamboat successfully distinguishes cell-intrinsic programs from external environmental influences.
Fig. 2 Steamboat learns simulated cell types and receptor expression (A), reflected in clustering and spatial distribution results (C). Figure taken from Shaoheng Liang, Junjie Tang, Guanghan Wang, and Jian Ma (2025), BioRxiv published under CC-BY-NC-ND 4.0.
Next, the authors showed that Steamboat accurately captures cell types and cell-cell interactions in ovarian cancer. After decomposing the gene expression across 27 samples into ego, local and global components using multiscale attention heads, they found that each cell type is represented by a few attention heads and that the cell types cluster apart from each other in UMAP space. Furthermore, they identified key signaling patterns – such as fibroblast-B cell cytokine signaling and immune-related interactions – among the top up- and downregulated genes. Compared with proximity-based methods, Steamboat's cell-cell signaling links more closely match the ligand-receptor-based interaction strengths from CellChat [8], demonstrating its ability to uncover functional cellular interactions de novo from spatial-omics data.
The authors then applied Steamboat to the data obtained from 129 adult mouse brain slides containing 2.6 million cells. Steamboat accurately clustered the brain cell types and segmented spatial domains using its multiscale attention approach to integrate gene expression with spatial context. Compared to three existing methods (BANKSY [9], STAGATE [10], and SEDR [11]), Steamboat achieved better performance in detecting spatial domains and subtle variations, highlighting its ability to generate biologically meaningful cell embeddings/representations.
Next, the authors investigated whether Steamboat can predict how a cell behaves when its cellular environment is changed by either (a) changing the gene expression of neighboring cells or (b) placing the cell in a new spatial context. Their proof-of-concept in silico spatial perturbation results are consistent with previously reported effects and indicate that Steamboat may be suitable for generating testable hypotheses and identifying candidate molecules that shape tissue organization.
Finally, the authors analysed a colorectal cancer proteomics dataset with high inter-patient variability to identify spatial features linked to cancer prognosis. Their analysis yielded global attention patterns that distinguished high-risk from low-risk patients, capturing immune-suppressive tumor macroenvironments beyond previously known localized features such as the presence of tertiary lymphoid structures.
GitHub
The Python implementation of Steamboat, along with the experimental datasets used in this study, example workflows, and documentation can be found here: https://github.com/ma-compbio/Steamboat
Perspective
Modern bioinformatics is advancing rapidly along the axes of time, space, and scale. Spatial single-cell omics analysis is crucial to understand how cells behave and interact within their native tissue environments—insights that are essential for studying development, disease, and immune dynamics. I was particularly excited about the preliminary in silico spatial perturbation experiments as I am currently addressing them from a network biology and mathematical modelling perspective. High-quality in silico perturbations hold great promise for predictive modeling in tissue engineering, synthetic biology, and organoid design. In the long term, these approaches could significantly reduce reliance on animal models by complementing 3D ex vivo models, supporting the 3Rs (Replacement, Reduction, and Refinement) in both basic research and drug development.
Questions to the Authors
Q1: Given Steamboat’s need for single-cell resolution, what pre-processing techniques do you recommend for adapting lower-resolution datasets? How does Steamboat handle noisy or incomplete data?
Q2: How do you separate biologically meaningful spatial dependencies from artifacts or statistical correlations in attention maps?
Q3: Did you quantify redundancy or synergy between attention heads, such as correlation of attention maps/metagenes or mutual information between head outputs? Did increasing the number of heads beyond the PCA-informed value ever lead to overfitting or spurious attention patterns, particularly in small or homogeneous datasets?
Q4: Do you envision a future version of Steamboat where the number of attention heads is adaptively learned during training rather than predefined based on PCA and UMAP clustering? Did you explore quantitative alternatives (e.g., silhouette score, explained variance, or information gain) to determine the optimal number of heads beyond visual UMAP assessment?
Q5: Could Steamboat be extended to integrate spatial transcriptomics, proteomics, epigenomics, and/or metabolomics in a joint framework?
Q6: Do you plan to incorporate temporal progression into Steamboat to study spatial-temporal tissue remodeling and/or spatiotemporal lineage tracing?
References
[1] Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S., & Zhuang, X. (2015). RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science (New York, N.Y.), 348(6233), aaa6090. https://doi.org/10.1126/science.aaa6090
[2] Janesick, A., Shelansky, R., Gottscho, A. D., Wagner, F., Williams, S. R., Rouault, M., Beliakoff, G., Morrison, C. A., Oliveira, M. F., Sicherman, J. T., Kohlway, A., Abousoud, J., Drennon, T. Y., Mohabbat, S. H., 10x Development Teams, & Taylor, S. E. B. (2023). High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nature communications, 14(1), 8353. https://doi.org/10.1038/s41467-023-43458-x
[3] Angelo, M., Bendall, S. C., Finck, R., Hale, M. B., Hitzman, C., Borowsky, A. D., Levenson, R. M., Lowe, J. B., Liu, S. D., Zhao, S., Natkunam, Y., & Nolan, G. P. (2014). Multiplexed ion beam imaging of human breast tumors. Nature medicine, 20(4), 436–442. https://doi.org/10.1038/nm.3488
[4] Chang, Q., Ornatsky, O. I., Siddiqui, I., Loboda, A., Baranov, V. I., & Hedley, D. W. (2017). Imaging Mass Cytometry. Cytometry. Part A: the journal of the International Society for Analytical Cytology, 91(2), 160–169. https://doi.org/10.1002/cyto.a.23053
[5] Lee, J. H., Daugharthy, E. R., Scheiman, J., Kalhor, R., Yang, J. L., Ferrante, T. C., Terry, R., Jeanty, S. S., Li, C., Amamoto, R., Peters, D. T., Turczyk, B. M., Marblestone, A. H., Inverso, S. A., Bernard, A., Mali, P., Rios, X., Aach, J., & Church, G. M. (2014). Highly multiplexed subcellular RNA sequencing in situ. Science (New York, N.Y.), 343(6177), 1360–1363. https://doi.org/10.1126/science.1250212
[6] Ståhl, P. L., Salmén, F., Vickovic, S., Lundmark, A., Navarro, J. F., Magnusson, J., Giacomello, S., Asp, M., Westholm, J. O., Huss, M., Mollbrink, A., Linnarsson, S., Codeluppi, S., Borg, Å., Pontén, F., Costea, P. I., Sahlén, P., Mulder, J., Bergmann, O., Lundeberg, J., … Frisén, J. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science (New York, N.Y.), 353(6294), 78–82. https://doi.org/10.1126/science.aaf2403
[7] Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A., & Teichmann, S. A. (2017). The Human Cell Atlas: from vision to reality. Nature, 550(7677), 451–453. https://doi.org/10.1038/550451a
[8] Jin, S., Guerrero-Juarez, C. F., Zhang, L., Chang, I., Ramos, R., Kuan, C. H., Myung, P., Plikus, M. V., & Nie, Q. (2021). Inference and analysis of cell-cell communication using CellChat. Nature communications, 12(1), 1088. https://doi.org/10.1038/s41467-021-21246-9
[9] Singhal, V., Chou, N., Lee, J., Yue, Y., Liu, J., Chock, W. K., Lin, L., Chang, Y. C., Teo, E. M. L., Aow, J., Lee, H. K., Chen, K. H., & Prabhakar, S. (2024). BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nature genetics, 56(3), 431–441. https://doi.org/10.1038/s41588-024-01664-3
[10] Dong, K., & Zhang, S. (2022). Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nature communications, 13(1), 1739. https://doi.org/10.1038/s41467-022-29439-6
[11] Xu, H., Fu, H., Long, Y., Ang, K. S., Sethi, R., Chong, K., Li, M., Uddamvathanak, R., Lee, H. K., Ling, J., Chen, A., Shao, L., Liu, L., & Chen, J. (2024). Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome medicine, 16(1), 12. https://doi.org/10.1186/s13073-024-01283-x
doi: https://doi.org/10.1242/prelights.40184