Charting a tissue from single-cell transcriptomes
Preprint posted on October 30, 2018 https://www.biorxiv.org/content/early/2018/10/30/456350
The field of developmental genetics has been built largely on the description of spatiotemporal patterns of gene expression during development. In the mid-80s, in situ hybridization (ISH) using digoxigenin-labelled DNA probes was developed, which allowed RNA products to be localised in whole-mount Drosophila embryos. Papers describing spatial gene expression patterns followed one after another in the subsequent years. The big disadvantage of ISH is that only one or a few RNAs can be assessed at the same time. To overcome this limitation, large-scale ISH screens and large community efforts have generated databases with thousands of in situ images for a few model organisms (e.g., FlyExpress in Drosophila). However, gene co-expression cannot be inferred precisely from these images, given the staining variability between ISH experiments and the biological variability of gene expression between individual embryos.
In parallel, DNA sequencing technology has improved rapidly in recent years, both in cost and in the amount of biological material required, so it is now possible to sequence the genomic DNA or the transcriptome of a single cell. This has led to an explosion of single-cell sequencing technologies, which currently allow more than 100,000 cells to be sequenced. Single-cell RNA sequencing (scRNAseq) has enabled the identification of previously unknown rare cell types and has provided a clearer picture of the gene expression differences underlying cell type differentiation. The drawback of scRNAseq is that the spatial context of the cells is lost, because the tissue has to be dissociated. In other words, scRNAseq can give you the gene expression profile of every single cell in a tissue or embryo, but not the information about where each cell came from in the original tissue.
The work from Nitzan et al. tries to fill this gap by recovering the spatial context of the cells. The idea is straightforward: if we know the gene expression of every single cell in a tissue, can we reconstruct its spatial organisation? Last year, Karaiskos et al. demonstrated that scRNAseq data from the Drosophila blastoderm could be used to reconstruct the spatial distribution of the cells at single-cell resolution. To do this, they used as a reference the previously known spatial expression of only 84 genes (binarised as ON/OFF), mapped onto a 2D “virtual embryo” of ~3000 bins (each bin can be thought of as a “virtual cell”). The single-cell expression data from scRNAseq were then also binarised, and a correlation coefficient was calculated for each cell/bin pair. Finally, every sequenced cell was assigned to one (or multiple) bins in the virtual embryo. Karaiskos et al. not only reproduced known gene expression patterns (as previously revealed by ISH), but also predicted the previously unknown expression patterns of some transcription factors and long noncoding RNAs.
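The cell-to-bin assignment step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Karaiskos et al.'s implementation: we assume the correlation coefficient is the Matthews correlation coefficient (MCC), a standard measure for comparing binary ON/OFF vectors, and the function names, the `tol` parameter and the toy virtual embryo are hypothetical.

```python
import numpy as np

def mcc_matrix(cells: np.ndarray, bins: np.ndarray) -> np.ndarray:
    """Matthews correlation coefficient for every (cell, bin) pair (assumed scoring choice).

    cells: (n_cells, n_genes) 0/1 ON/OFF calls from scRNAseq
    bins:  (n_bins, n_genes) 0/1 ON/OFF calls from the reference virtual embryo
    """
    c = cells.astype(float)
    b = bins.astype(float)
    tp = c @ b.T              # genes ON in the cell and ON in the bin
    fp = c @ (1 - b).T        # ON in the cell, OFF in the bin
    fn = (1 - c) @ b.T        # OFF in the cell, ON in the bin
    tn = (1 - c) @ (1 - b).T  # OFF in both
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    with np.errstate(invalid="ignore", divide="ignore"):
        mcc = (tp * tn - fp * fn) / denom
    return np.nan_to_num(mcc)  # undefined MCC (zero denominator) is set to 0


def map_cells_to_bins(cells, bins, tol=0.0):
    """Hypothetical helper: assign each cell to every bin scoring within `tol` of its best bin."""
    mcc = mcc_matrix(cells, bins)
    best = mcc.max(axis=1)
    return [np.flatnonzero(row >= top - tol) for row, top in zip(mcc, best)]


# Toy virtual embryo with 3 bins and 4 binarised reference genes.
reference = np.array([[1, 1, 0, 0],   # bin 0
                      [0, 1, 1, 0],   # bin 1
                      [0, 0, 1, 1]])  # bin 2
cells = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 1, 1, 0]])
assignment = map_cells_to_bins(cells, reference)
print([a.tolist() for a in assignment])  # [[0], [2], [1]]
```

Allowing ties through `tol` is one simple way to let a cell map to several equally good bins, mirroring the “one (or multiple) bins” assignment described above.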
The main novelty of Nitzan et al.’s approach is that they propose a de novo reconstruction of spatial gene expression patterns, i.e., one that does not require a reference map of marker gene expression. Instead, their mapping (which they name novoSpaRc) relies solely on the sequencing data and on the geometry of the tissue or embryo to be reconstructed (see Figure 1). To do this, they use a well-known mathematical framework, the “optimal transport problem”, which was originally motivated by a seemingly simple question: how can a pile of dirt be moved from one mound into another configuration while expending the minimal amount of energy?
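The dirt-moving question has a compact discrete form: given masses `a` at source sites and masses `b` at target sites, find a transport plan P (how much dirt goes from site i to site j) whose row sums equal `a`, whose column sums equal `b`, and whose total cost, the sum of P[i, j] times the cost of moving one unit from i to j, is minimal. The sketch below solves an entropy-regularised version with Sinkhorn iterations, one common numerical approach; we do not claim it is exactly the solver used in novoSpaRc, and the toy piles and the value of `eps` are illustrative assumptions.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=1.0, n_iter=200):
    """Entropy-regularised optimal transport (Sinkhorn-Knopp iterations).

    a: (n,) source masses, b: (m,) target masses, both summing to 1
    cost: (n, m) effort needed to move one unit of mass from source i to target j
    Returns the transport plan P (row sums ~a, column sums b).
    """
    K = np.exp(-cost / eps)            # Gibbs kernel: cheap moves get large weights
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                # rescale rows to match the source masses
        v = b / (K.T @ u)              # rescale columns to match the target masses
    return u[:, None] * K * v[None, :]

# Two piles of dirt at x = 0 and x = 10, to be spread onto three target spots.
x = np.array([0.0, 10.0]); a = np.array([0.5, 0.5])
y = np.array([-1.0, 1.0, 10.0]); b = np.array([0.25, 0.25, 0.5])
cost = (x[:, None] - y[None, :]) ** 2  # squared distance as effort per unit of dirt
P = sinkhorn(a, b, cost)
total_effort = float((P * cost).sum())
# The first pile is split evenly between -1 and 1; the second stays at 10 (effort ~0.5).
```

The entropy term (controlled by `eps`) blurs the plan slightly but makes each step a cheap matrix-vector product, which is why this family of solvers scales to thousands of cells and positions.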
In this specific application of optimal transport, the aim is to find a correspondence between two sets: the gene expression profiles coming from scRNAseq (“cells”) and the locations in the tissue to be reconstructed (“positions”). They start by calculating two pairwise distance matrices: 1) a “physical distance matrix” based on the Euclidean distance between “positions”, and 2) an “expression distance matrix” based on a correlation-based distance between “cells”. The optimal transport framework is then used to find a probabilistic embedding of the “cells” into the “positions” that minimises the discrepancy between distances in expression space and distances in physical space (Figure 1).
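A minimal numpy sketch of this idea follows. It builds the two distance matrices (Euclidean between positions; one minus the Pearson correlation between cells, which is our assumption for the “correlation-based distance”) and then searches for a coupling T with entropy-regularised Gromov-Wasserstein optimal transport, the variant of optimal transport that compares distances within each space rather than distances across spaces. The helper names, the rescaling of both matrices to a maximum of 1, `eps`, and the synthetic tissue are hypothetical choices for illustration; this is not the novoSpaRc implementation.

```python
import numpy as np

def physical_distances(coords):
    """Euclidean distances between positions; coords has shape (n_positions, n_dims)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def expression_distances(expr):
    """1 - Pearson correlation between cells; expr has shape (n_cells, n_genes)."""
    return 1.0 - np.corrcoef(expr)

def gw_local_cost(C_expr, C_phys, T):
    """L[i, j] = sum_kl (C_expr[i, k] - C_phys[j, l])**2 * T[k, l] via the square-loss expansion."""
    p, q = T.sum(axis=1), T.sum(axis=0)
    const = (C_expr ** 2 @ p)[:, None] + (C_phys ** 2 @ q)[None, :]
    return const - 2.0 * C_expr @ T @ C_phys.T

def gw_discrepancy(C_expr, C_phys, T):
    """Total mismatch between expression and physical distances under the coupling T."""
    return float((gw_local_cost(C_expr, C_phys, T) * T).sum())

def sinkhorn(a, b, cost, eps, n_iter=500):
    K = np.exp(-(cost - cost.min()) / eps)  # a constant shift of the cost leaves the plan unchanged
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def entropic_gw(C_expr, C_phys, eps=0.1, n_outer=30):
    """Probabilistic embedding T (cells x positions) with uniform weights on both sides."""
    C_expr = C_expr / C_expr.max()          # put both distance matrices on the same scale
    C_phys = C_phys / C_phys.max()
    p = np.full(C_expr.shape[0], 1.0 / C_expr.shape[0])
    q = np.full(C_phys.shape[0], 1.0 / C_phys.shape[0])
    T = np.outer(p, q)                      # start from the uninformative coupling
    for _ in range(n_outer):
        # Linearise the objective around the current T, then solve an entropic OT problem.
        T = sinkhorn(p, q, gw_local_cost(C_expr, C_phys, T), eps)
    return T

# Synthetic tissue: a 5 x 4 grid with 30 hypothetical genes expressed in smooth spatial domains.
rng = np.random.default_rng(0)
coords = np.stack(np.meshgrid(np.arange(5.0), np.arange(4.0)), axis=-1).reshape(-1, 2)
centres = rng.uniform([0, 0], [4, 3], size=(30, 2))
sq = ((coords[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
expr = np.exp(-sq / 2.0) + 0.01 * rng.random((20, 30))
cells = expr[rng.permutation(20)]           # dissociation: the cell order is lost
T = entropic_gw(expression_distances(cells), physical_distances(coords))
predicted_position = T.argmax(axis=1)       # most likely position for each cell

# Sanity check on three points of a line: the true matching has zero discrepancy.
C_line = physical_distances(np.array([[0.0], [1.0], [3.0]]))
print(gw_discrepancy(C_line, C_line, np.eye(3) / 3))             # ~0
print(gw_discrepancy(C_line, C_line, np.eye(3)[[1, 0, 2]] / 3))  # 4/9: swapping two cells distorts distances
```

Because only within-space distances enter this objective, a reflected or rotated tissue scores equally well, so in this sketch the layout can at best be recovered up to such symmetries of the positions; additional information such as marker genes is one way to break them.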
Nitzan et al. used various datasets to test their approach, including the data from Karaiskos et al. They demonstrated that a de novo reconstruction of the embryo (without using marker genes) successfully separated major spatial domains of gene expression (e.g., endoderm and mesoderm) and largely reconstructed gene expression patterns along the dorso-ventral and anterior-posterior axes. Strikingly, using information from only 4 marker genes (instead of the 84 used by Karaiskos et al.), they could reconstruct the original fluorescence ISH data almost perfectly (Figure 2).
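Once a coupling T between cells and positions is available, reconstructed expression patterns can be read off by averaging the cell profiles with the weights in each column of T, giving a “virtual in situ” image per gene that can be compared with an ISH pattern. The sketch below also shows one plausible way to turn a handful of marker genes into a cell-by-position cost matrix that could be added as an extra linear term to the distance-matching objective. The function names, the Euclidean marker cost, and the toy numbers are our assumptions, not the exact formulation in the preprint.

```python
import numpy as np

def marker_cost(cell_markers, ref_markers):
    """Hypothetical marker term: distance between each cell's markers and the reference at each position.

    cell_markers: (n_cells, n_markers) from scRNAseq
    ref_markers:  (n_positions, n_markers) reference (e.g. ISH) values for the same marker genes
    """
    diff = cell_markers[:, None, :] - ref_markers[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def virtual_in_situ(T, expr):
    """Expected expression of every gene at every position under the coupling T (cells x positions)."""
    weights = T / T.sum(axis=0, keepdims=True)  # each column becomes a distribution over cells
    return weights.T @ expr                     # shape (n_positions, n_genes)

def pattern_agreement(predicted, observed):
    """Pearson correlation between a reconstructed and an observed pattern for one gene."""
    return float(np.corrcoef(predicted, observed)[0, 1])

expr = np.array([[1.0, 0.0],   # cell 0: gene A on, gene B off
                 [0.0, 1.0]])  # cell 1: gene A off, gene B on
T = np.array([[0.4, 0.1],      # cell 0 is most likely at position 0
              [0.1, 0.4]])
vis = virtual_in_situ(T, expr)  # [[0.8, 0.2], [0.2, 0.8]]: rows are positions, columns genes
```

Averaging over the coupling, rather than taking each cell's single most likely position, keeps the uncertainty of the probabilistic embedding visible in the reconstructed pattern.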
This approach rests on the strong assumption that nearby cells have more similar gene expression than cells that are farther apart. Nitzan et al. mostly used pre-gastrulation gene expression data to test their approach, which is an ideal test case because spatial differentiation of gene expression is still limited at these stages and the expression patterns are essentially two-dimensional. It remains unclear whether this approach would work on more complex gene expression datasets (e.g., from later developmental stages or 3D tissues). Nevertheless, this work provides an exciting proof of principle and will pave the way for future studies.
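Whether this assumption holds can be checked on any dataset where cell positions are known, for example by correlating the pairwise physical distances with the pairwise expression distances and comparing against shuffled positions, in the spirit of a Mantel test. A minimal sketch, assuming the correlation-based expression distance is one minus the Pearson correlation; the function names and the three-cell toy data are hypothetical.

```python
import numpy as np

def distance_concordance(coords, expr):
    """Pearson correlation between pairwise physical distances and expression distances.

    coords: (n_cells, n_dims) known positions; expr: (n_cells, n_genes) expression profiles.
    Values near 1 mean nearby cells have similar expression, as the method assumes.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    d_phys = np.sqrt((diff ** 2).sum(axis=-1))
    d_expr = 1.0 - np.corrcoef(expr)
    iu = np.triu_indices(len(coords), k=1)  # count each pair of cells once
    return float(np.corrcoef(d_phys[iu], d_expr[iu])[0, 1])

def mantel_pvalue(coords, expr, n_perm=999, seed=0):
    """Permutation p-value: how often shuffled positions give an equal or higher concordance."""
    rng = np.random.default_rng(seed)
    observed = distance_concordance(coords, expr)
    hits = sum(
        distance_concordance(coords[rng.permutation(len(coords))], expr) >= observed
        for _ in range(n_perm)
    )
    return (1 + hits) / (1 + n_perm)

# Three cells on a line; the middle cell shares expression with both neighbours.
coords = np.array([[0.0], [1.0], [2.0]])
expr = np.array([[2.0, 1.0, 0.0],
                 [1.0, 2.0, 1.0],
                 [0.0, 1.0, 2.0]])
r = distance_concordance(coords, expr)  # 1.0 up to rounding
```

With only three cells a permutation test has no power; on real data, a concordance close to zero would flag a tissue where the nearby-cells-are-similar assumption, and hence a purely de novo reconstruction, is likely to struggle.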
Questions for the authors:
- Could this framework be used to reconstruct more complex gene expression patterns (e.g., where the assumption that gene expression similarity and spatial distance are correlated does not hold)?
- Could this be extended to 3D spatial coordinates?
 Karaiskos, N. et al. The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194-199 (2017).
Posted on: 24th January 2019
Also in the bioinformatics category:
Species-specific oscillation periods of human and mouse segmentation clocks are due to cell autonomous differences in biochemical reaction parameters
Selected by Irepan Salvador-Martinez
Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects
Systematic comparative analysis of single cell RNA-sequencing methods
Selected by Rob Hynds
Transcriptome analysis of Plasmodium berghei during exo-erythrocytic development
Selected by Mariana De Niz
Also in the genomics category:
Resolving the 3D landscape of transcription-linked mammalian chromatin folding
Selected by Clarice Hong
Reconstructing the transcriptional ontogeny of maize and sorghum supports an inverse hourglass model of inflorescence development
Selected by Alexa Sadier
Accurate detection of m6A RNA modifications in native RNA sequences
Selected by Christian Bates
Also in the systems biology category:
Spreading of molecular mechanical perturbations on linear filaments
Selected by Lars Hubatsch
Lineage tracing on transcriptional landscapes links state to fate during differentiation
Selected by Yen-Chung Chen
Short-range interactions govern cellular dynamics in microbial multi-genotype systems
Rapid microbial interaction network inference in microfluidic droplets
Selected by Connor Rosen
preLists in the bioinformatics category:
Antimicrobials: Discovery, clinical use, and development of resistance
Preprints that describe the discovery of new antimicrobials and any improvements made regarding their clinical use. Includes preprints that detail the factors affecting antimicrobial selection and the development of antimicrobial resistance.
List by Zhang-He Goh