Charting a tissue from single-cell transcriptomes

Mor Nitzan, Nikos Karaiskos, Nir Friedman, Nikolaus Rajewsky

Preprint posted on October 30, 2018

Nitzan et al.'s exciting new approach promises to resuscitate the spatial context of cells from scRNAseq without marker genes

Selected by Irepan Salvador-Martinez


The field of developmental genetics has been largely built on the description of spatiotemporal patterns of gene expression during development. In the mid-80s, in situ hybridization (ISH) using digoxigenin-labelled DNA probes was developed, which allowed the localisation of RNA products in whole-mount Drosophila embryos. Papers describing spatial gene expression patterns came one after another in the following years. The big disadvantage of the ISH technique is that only one or a few RNAs can be assessed at the same time. To overcome this limitation, large-scale ISH assays and/or great community efforts have generated databases with thousands of in situ images for a few model organisms (e.g., FlyExpress in Drosophila). However, a precise inference of gene co-expression is not possible, given the staining variability between ISH experiments and the biological variability of gene expression between individual organisms.

In parallel, DNA sequencing technology has improved rapidly in recent years, both in terms of cost and the amount of biological sample required for analysis, so it is now possible to sequence the genomic DNA or the transcriptome of a single cell. This has led to an explosion in single-cell sequencing technology, currently allowing the sequencing of more than 100,000 cells. scRNAseq has allowed the identification of previously unknown rare cell types and has provided a clearer picture of the gene expression differences underlying cell type differentiation. The drawback of scRNAseq is that the spatial context of the cells is lost, because the cells must be dissociated before sequencing. This means that with scRNAseq you may learn the gene expression profile of every single cell in a tissue/embryo, but you lack information about where each cell came from in the original tissue.

The work from Nitzan et al. tries to fill this gap of scRNAseq techniques by resuscitating the spatial context of the cells. The idea is straightforward: if we know the gene expression of every single cell in a tissue, can we reconstruct its spatial organisation? Last year, Karaiskos et al. demonstrated that it was possible to use scRNAseq data of the Drosophila blastoderm to reconstruct the spatial distribution of the cells at single-cell resolution[1]. To do this, they used as a reference the previously known spatial distribution of only 84 genes (binarised as ON/OFF), which were mapped onto a 2D “virtual embryo” of ~3000 bins (each bin can be thought of as a “virtual cell”). Then, the gene expression data of single cells (coming from scRNAseq) were also binarised, and a correlation coefficient was calculated for each cell/bin pair. Finally, every sequenced cell was assigned to one (or multiple) bins in the virtual embryo. Karaiskos et al. could not only reproduce known gene expression patterns (as revealed previously by ISH), but could also predict the unknown expression patterns of some transcription factors and long noncoding RNAs.
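
To make the reference-based mapping concrete, here is a minimal sketch of the three steps described above (binarise, correlate each cell with each bin, assign). It is not the code of Karaiskos et al.: the toy data, the threshold of 1.0, the use of Pearson correlation on the ON/OFF vectors, and all function and variable names are assumptions made for illustration, and each cell is assigned only to its single best bin.

```python
import math

def binarize(profile, threshold):
    """Turn an expression profile into ON/OFF (1/0) calls. Threshold is an assumption."""
    return [1 if value > threshold else 0 for value in profile]

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

# Hypothetical reference atlas: 3 bins ("virtual cells") x 4 marker genes, already ON/OFF.
reference_bins = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]

# Hypothetical scRNAseq measurements for 2 cells over the same 4 marker genes.
cells_raw = [
    [5.0, 3.2, 0.1, 0.0],
    [0.0, 0.2, 4.1, 2.7],
]

# Step 1: binarise the sequenced cells.
cell_bits = [binarize(cell, threshold=1.0) for cell in cells_raw]

# Step 2: one correlation score per cell/bin pair.
scores = [[pearson(cell, ref) for ref in reference_bins] for cell in cell_bits]

# Step 3: assign every cell to its best-scoring bin.
best_bin = [max(range(len(row)), key=row.__getitem__) for row in scores]
print(best_bin)
```

In practice a cell can score similarly against several bins, which is why the original mapping allows a cell to be assigned to more than one bin rather than taking a single maximum as done here.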

The preprint

The main novelty of Nitzan et al.’s approach is that they propose a de novo reconstruction of gene expression spatial patterns, i.e., without the need for a reference map of gene expression based on marker genes. Instead, their mapping (which they name novoSpaRc) is based solely on the sequencing data and the geometric features of the tissue/embryo to be reconstructed (see Figure 1). To do this, they use a well-known mathematical framework, the “optimal transport problem”, which was originally motivated by a seemingly simple question: how can I rearrange a pile of dirt from one mound into another configuration while expending the minimal amount of energy?
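
The dirt-moving question can be illustrated with a toy case. The sketch below is not part of novoSpaRc; it assumes piles and holes lying on a line, integer amounts of dirt, and a cost equal to the amount moved times the distance travelled. In that one-dimensional setting the cheapest plan is known to be the monotone one, where the leftmost pile fills the leftmost hole first. The function name `transport_1d` and the data are hypothetical.

```python
def transport_1d(piles, holes):
    """Cheapest way to move dirt along a line.

    piles, holes: lists of (position, amount) with equal total amount.
    Returns (plan, cost), where plan is a list of (from_pos, to_pos, amount).
    """
    piles, holes = sorted(piles), sorted(holes)
    plan = []
    i = j = 0
    supply, demand = piles[0][1], holes[0][1]
    while i < len(piles) and j < len(holes):
        moved = min(supply, demand)
        if moved > 0:
            plan.append((piles[i][0], holes[j][0], moved))
        supply -= moved
        demand -= moved
        if supply == 0:  # current pile is used up, go to the next one
            i += 1
            if i < len(piles):
                supply = piles[i][1]
        if demand == 0:  # current hole is full, go to the next one
            j += 1
            if j < len(holes):
                demand = holes[j][1]
    cost = sum(abs(src - dst) * amount for src, dst, amount in plan)
    return plan, cost

# Ten shovelfuls of dirt in three piles, to be moved into three holes.
piles = [(0, 5), (1, 3), (2, 2)]
holes = [(1, 2), (2, 3), (3, 5)]

plan, cost = transport_1d(piles, holes)
for src, dst, amount in plan:
    print(f"move {amount} from x={src} to x={dst}")
print("total cost:", cost)
```

The output of such a procedure is a transport plan: a table saying how much mass goes from each source to each destination. In the tissue reconstruction setting, the analogous table says how much of each sequenced cell is placed at each position.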

Figure 1. Overview of novoSpaRc.


In this specific application of the optimal transport problem, the aim is to find a correspondence between two groups: the gene expression profiles coming from scRNAseq (“cells”) and the cellular locations of the tissue to be reconstructed (“positions”). They start by calculating two different pairwise distance matrices: 1) a “physical distance matrix” based on the Euclidean distance between “positions” and 2) an “expression distance matrix” based on a correlation-based distance between “cells”. Then, the optimal transport framework is used to find a probabilistic embedding of the “cells” onto the “positions” that minimises the discrepancy between the distances in expression space and in physical space (Figure 1).
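
The two distance matrices and the discrepancy being minimised can be sketched on a toy tissue of four positions along a line. Everything specific here is an assumption for illustration: the expression distance is taken as one minus the Pearson correlation, both matrices are rescaled to a maximum of 1, the discrepancy is a squared difference weighted by the embedding, and the search is a brute force over one-to-one assignments that stands in for the probabilistic optimal transport solver used in the preprint.

```python
import itertools
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def normalise(matrix):
    """Rescale a distance matrix so that its largest entry is 1 (an assumption)."""
    largest = max(max(row) for row in matrix)
    return [[value / largest for value in row] for row in matrix]

def discrepancy(d_expr, d_phys, coupling):
    """Sum over i,k (cells) and j,l (positions) of
    (d_expr[i][k] - d_phys[j][l])^2 * coupling[i][j] * coupling[k][l]."""
    n_cells, n_pos = len(d_expr), len(d_phys)
    total = 0.0
    for i, k in itertools.product(range(n_cells), repeat=2):
        for j, l in itertools.product(range(n_pos), repeat=2):
            weight = coupling[i][j] * coupling[k][l]
            if weight:
                total += (d_expr[i][k] - d_phys[j][l]) ** 2 * weight
    return total

def hard_coupling(assignment, n_pos):
    """Embedding matrix that puts all of cell i's mass on position assignment[i]."""
    n_cells = len(assignment)
    return [[1.0 / n_cells if assignment[i] == j else 0.0 for j in range(n_pos)]
            for i in range(n_cells)]

# Toy tissue: 4 positions on a line; neighbouring positions share one expressed gene.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
profile_at_position = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]
true_position = (2, 0, 3, 1)  # shuffling caused by dissociation; hidden from the search
cells = [profile_at_position[p] for p in true_position]

# 1) physical distance matrix, 2) expression distance matrix.
d_phys = normalise([[math.dist(p, q) for q in positions] for p in positions])
d_expr = normalise([[1.0 - pearson(a, b) for b in cells] for a in cells])

# Brute-force stand-in for the solver: try every one-to-one assignment.
best = min(
    itertools.permutations(range(len(positions))),
    key=lambda perm: discrepancy(d_expr, d_phys, hard_coupling(perm, len(positions))),
)
print("recovered positions:", best)
```

On this toy example the search recovers the original order of the cells up to a mirror flip, because pairwise distances alone cannot distinguish a layout from its reflection. Brute force is only feasible for a handful of cells; a soft, probabilistic embedding is what makes the problem tractable at the scale of real datasets.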

Nitzan et al. used various datasets to test their approach, including the data from Karaiskos et al. They demonstrated that a de novo reconstruction of the embryo (without using marker genes) successfully separated major gene expression spatial domains (i.e., endoderm and mesoderm) and largely reconstructed gene expression patterns across the dorso-ventral and anterior-posterior axes. Amazingly, by using information from only 4 marker genes (instead of the 84 genes used by Karaiskos et al.), they could reconstruct the original fluorescence ISH data almost perfectly (Figure 2).

Figure 2. Reconstructing real ISH fluorescence data.

Future work

This approach is based on the strong assumption that gene expression in nearby cells is more similar than gene expression in cells that are separated by larger distances. Nitzan et al. used mostly pre-gastrulation gene expression data to test their approach, which is an ideal dataset due to the low spatial gene expression differentiation at these stages and because the gene expression patterns are essentially 2D. It remains unclear whether this approach would be applicable to more complex gene expression datasets (e.g., at later developmental stages or 3D data). Nevertheless, this work provides an exciting proof of principle and will pave the way for studies in the near future.

Questions for the authors:

  • Could this framework be used to reconstruct more complex gene expression patterns (i.e., where the assumption that gene expression distance and spatial distance are correlated is not met)?
  • Could this be extended to 3D spatial coordinates?


[1] Karaiskos, N. et al. The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194-199 (2017).


Posted on: 24th January 2019

