Benchmarking Single-Cell RNA Sequencing Protocols for Cell Atlas Projects

Elisabetta Mereu, Atefeh Lafzi, Catia Moutinho, Christoph Ziegenhain, Davis J. MacCarthy, Adrian Alvarez, Eduard Batlle, Sagar, Dominic Grün, Julia K. Lau, Stéphane C. Boutet, Chad Sanada, Aik Ooi, Robert C. Jones, Kelly Kaihara, Chris Brampton, Yasha Talaga, Yohei Sasagawa, Kaori Tanaka, Tetsutaro Hayashi, Itoshi Nikaido, Cornelius Fischer, Sascha Sauer, Timo Trefzer, Christian Conrad, Xian Adiconis, Lan T. Nguyen, Aviv Regev, Joshua Z. Levin, Swati Parekh, Aleksandar Janjic, Lucas E. Wange, Johannes W. Bagnoli, Wolfgang Enard, Marta Gut, Rickard Sandberg, Ivo Gut, Oliver Stegle, Holger Heyn

Posted on: 1 June 2019

Preprint posted on 13 May 2019

Article now published in Nature Biotechnology at http://dx.doi.org/10.1038/s41587-020-0469-4

and

Systematic comparative analysis of single cell RNA-sequencing methods

Jiarui Ding, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, Marc H. Wadsworth, Tyler Burks, Lan T. Nguyen, John Y. H. Kwon, Boaz Barak, William Ge, Amanda J. Kedaigle, Shaina Carroll, Shuqiang Li, Nir Hacohen, Orit Rozenblatt-Rosen, Alex K. Shalek, Alexandra-Chloé Villani, Aviv Regev, Joshua Z. Levin

Preprint posted on 9 May 2019

RNAseq is now a practical, high-throughput and economically viable approach to determine cell fate and state at single cell resolution. As consortia form to generate tissue/organ ‘atlases’, these preprints compare RNA sequencing techniques.

Selected by Rob Hynds

Categories: bioinformatics, cell biology, molecular biology

Background

Improvements to single-cell (sc) and single-nucleus RNA sequencing (snRNAseq) techniques now allow us to profile the phenotype of thousands of cells in an unbiased fashion. Rapid progress in the field has led to the identification of novel, rare cell types and to depictions of dynamic changes in cellular phenotype during development and regeneration.

In efforts to emulate the success of large-scale collaborative efforts to map cancer genomes, consortia such as the Human Cell Atlas are now being formed across the world to map the transcriptional landscapes of whole tissues, organs and organisms. However, the early cancer genome sequencing efforts were hindered by large numbers of groups undertaking independent sequencing efforts in parallel, using assorted protocols that generated diverse datasets between which it was difficult to compare mutation calls. Thus, benchmarking is of critical importance early on in such processes: it could be the difference between future investigators being able to use datasets from multiple parallel research efforts to increase the power of datasets and, in an extreme example, individual datasets being useful only for validation of the initial findings.

Since competing technologies vary in terms of cell/RNA capture, and consequently library complexity, as well as cost and scalability, these two preprints benchmark scRNAseq and snRNAseq technologies, comparing protocols side-by-side using reference samples with a view to informing future studies.

Approaches

Mereu et al. compared 13 techniques using a mixed sample containing 60% human peripheral blood mononuclear cells (PBMCs), 30% mouse colonic tissue, 6% HEK293T cell line, 3% NIH3T3 mouse embryonic fibroblast cell line and 1% dog MDCK cells. Ding et al. analysed 7 techniques (of which only Seq-Well and sci-RNA-seq were non-overlapping) in three independent samples: a 50:50 mix of human HEK293 and mouse NIH3T3 cell cultures, human PBMCs and mouse cortex tissue. PBMCs, used in both studies, provide cells of varying size and RNA content and do not require enzymatic dissociation. The use of cells from multiple species allowed singlets to be distinguished from multiplets in both studies, as barcodes that have a contribution from >1 species can be excluded. The use of PBMCs and mouse colon in combination allows assessment of the ability to distinguish cell types and cell states, respectively, as PBMCs generate distinct clusters whereas the mouse colonic epithelium is a continuum of cell states along the crypt-villus axis.
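The species-mixing logic can be illustrated with a minimal Python sketch. It is not taken from either preprint: the function names, the `min_purity` cut-off of 0.9 and the toy counts are all assumptions chosen for illustration. Each barcode is labelled by the species that dominates its counts, and any barcode without a clearly dominant species is called a multiplet.

```python
def classify_barcode(counts, min_purity=0.9):
    """Label one barcode from its per-species read (or UMI) counts.

    counts: dict mapping species name -> counts assigned to that species.
    min_purity: hypothetical cut-off; the fraction of counts that must
        come from a single species for the barcode to be called a singlet.
    """
    total = sum(counts.values())
    if total == 0:
        return "empty"
    species, top = max(counts.items(), key=lambda kv: kv[1])
    return species if top / total >= min_purity else "multiplet"


def observed_multiplet_rate(barcodes, min_purity=0.9):
    """Fraction of non-empty barcodes called as cross-species multiplets."""
    labels = [classify_barcode(c, min_purity) for c in barcodes.values()]
    called = [label for label in labels if label != "empty"]
    if not called:
        return 0.0
    return sum(label == "multiplet" for label in called) / len(called)


# Toy data (invented): two clean barcodes and one human/mouse mixture.
barcodes = {
    "AAAC": {"human": 950, "mouse": 12},
    "AAAG": {"human": 8, "mouse": 640},
    "AACC": {"human": 410, "mouse": 390},
}
for name, counts in barcodes.items():
    print(name, classify_barcode(counts))
print("observed multiplet rate:", observed_multiplet_rate(barcodes))
```

Note that a mixing experiment of this kind only reveals multiplets that combine cells from different species; two human cells sharing a barcode would still be labelled "human", so the observed rate is a lower bound on the true rate.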

Importantly for the interpretation of these data, low-throughput methods in which cells are sorted into multiwell plates typically generate an order of magnitude more reads per cell than higher throughput methods, making them suitable for deep characterisation of particular cell types or states. By contrast, high-throughput methods separate cells into individual wells or droplets with reagents and beads that will identify the RNA from that cell during analysis and achieve throughput at the expense of depth of interrogation.

Key Findings

Using downsampled data (i.e. reducing the number of cells analysed) to ensure fair comparison between techniques, both preprints demonstrate that the total number of genes detected across samples varies according to the technique used (Fig. 2B, Mereu et al.; Fig. 2D, Ding et al.). Unsurprisingly, lower throughput techniques generally afford deeper sequencing per cell and have reduced rates of doublets. 10X Chromium emerges as the leading candidate among the high-throughput methods in both preprints, even when datasets are substantially downsampled. When analyses were limited to known, cell type-specific ‘marker’ genes, Mereu et al. find that 83% are detected by all technologies but that the performance of different technologies varies widely. In particular, Quartz-seq2 and Smart-seq2 performed cell type identification well and so might be useful for future studies seeking to annotate poorly described tissues/organs.
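As a rough illustration of the downsampling idea described above, the following Python sketch draws the same number of cells at random from each technique so that no method is favoured simply because it profiled more cells. The function name, the seed and the toy data are assumptions; the preprints' own procedures may differ in detail.

```python
import random


def downsample_cells(cells_by_technique, n_cells=None, seed=0):
    """Randomly keep the same number of cells for every technique.

    cells_by_technique: dict mapping technique name -> list of cell IDs.
    n_cells: target number of cells; defaults to the smallest dataset.
    seed: fixed seed so the subsample is reproducible.
    """
    rng = random.Random(seed)
    if n_cells is None:
        n_cells = min(len(cells) for cells in cells_by_technique.values())
    return {
        technique: rng.sample(cells, min(n_cells, len(cells)))
        for technique, cells in cells_by_technique.items()
    }


# Toy data (invented): a large droplet dataset and a small plate-based one.
datasets = {
    "droplet": [f"d{i}" for i in range(1000)],
    "plate": [f"p{i}" for i in range(96)],
}
subsampled = downsample_cells(datasets)
print({technique: len(cells) for technique, cells in subsampled.items()})
```

Sampling without replacement keeps every retained cell unique, and fixing the seed means the comparison can be repeated on exactly the same subset.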

As a result of protocol-dependent variables, the clustering of cells varied between experiments in both preprints (see Figure 4 from Mereu et al. below). For example, Ding et al. (Fig 5A) show that protocols differ in the proportion of cells within clusters and their ability to distinguish cell types, which is notable since PBMCs should have well demarcated population boundaries. Further, the ability to detect rare cell types in the mouse cortex varied, with pericytes only detected in one replicate of DroNc-seq and with oligodendrocyte precursors and microglia additionally missing from sci-RNA-seq datasets (Fig 6B, Ding et al.).

Figure 4 from Mereu et al.
t-SNE plots showing unsupervised clustering of human samples. Methods with good library complexity and marker detection were better clustered. Monocytes are a good example of a cell type for which some protocols clearly resolved CD14+ (cyan) and FCGR3A+ (black) subpopulations, whereas others could not.

Another key difference between protocols is mapping: more transcripts containing intronic or intergenic sequences were found with snRNAseq techniques, due to the presence of recently transcribed, unprocessed mRNAs, but wide variability was also seen between scRNAseq techniques (Fig. 1, Mereu et al.), with these sequences accounting for 7% of sequences in inDrops and 39.5% in sci-RNA-seq (Supp. Fig. 3, Ding et al.). The proportion of anti-sense reads also varied, from 7.5% in sci-RNA-seq to 31% in 10X Chromium (Supp. Fig. 3, Ding et al.). Finally, the detection of transcripts encoded by the mitochondrial genome is of interest in studies that track mtDNA mutations to understand the lineage history of cells in a tissue. Ding et al. found some variability between protocols (Supp. Fig. 4, Ding et al.), although for most techniques it was within the range expected from bulk RNA sequencing. Of relevance for consortia performing disease characterisation, techniques such as 10X Chromium obtain short reads from the 3’ and 5’ ends of transcripts whereas other methods capture the full length. As such, the former provide less information on variants and splice junctions present in transcripts, while the latter require greater sequencing depth to represent an equivalent number of transcripts.

The only sample type in either preprint that required enzymatic tissue dissociation was the mouse colon tissue (Mereu et al.). There were significantly more mouse colon cells in the reference snRNAseq dataset (which is less biased by sampling due to the isolation of nuclei) than in the digested samples. Samples that were dissociated but not subjected to viability selection also had greater representation of mouse cells, confirming that systematic biases can be introduced by sample preparation. Thus, for studies in which tissue composition is sought, it might be preferable to perform quality control steps in silico or to use snRNAseq, as is necessary in tissues such as brain where dissociation is not possible. Mereu et al. found nuclear RNA to be a reasonable surrogate for cytoplasmic profiles although the inclusion of introns caused some difficulties in comparisons between the two (Fig. 3C).

Both preprints generate new computational pipelines that can handle input data from multiple scRNAseq techniques and that take on key challenges for the field, including how to remove low-quality cells from data derived from differing protocols in a single analysis pipeline. Both preprints highlight the power of this approach: joint analyses cluster primarily by cell phenotype, resolve cell types and examples of cell state separation (e.g. T cell subpopulations) and identify rare cells, even when these were not detected by every individual technique. Ding et al. identify 10X Chromium as the technique most consistent with the combined datasets with regard to cell clustering. However, deeper analysis within cell types did identify differences relating to the protocol of origin (Fig. 5E-H, Mereu et al.). Moreover, the ability of a technique to distinguish cell types was unrelated to its compatibility with other datasets, and further downsampling reduced compatibility, leading the authors to suggest that consortia adopt minimum coverage thresholds to enable future dataset integration.

Conclusions

These preprints provide a resource that can be integrated with cost and scale considerations for data-guided selection of techniques in future scRNAseq experiments.
10X Chromium performs well in both studies and has the added advantage that it requires the least time to perform experiments.
The computational approaches in these preprints provide a framework for integration of datasets from alternative scRNAseq techniques and a means to assess and incorporate newer methodologies as they arise. These data will help consortia to establish high standards that allow the full benefits of parallel approaches to be realised.

Further reading

The Human Cell Atlas White Paper, The HCA Consortium (October 2017): https://www.humancellatlas.org/files/HCA_WhitePaper_18Oct2017.pdf

Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Tian et al., Nature Methods (2019).

Thanks to Dr. Emilia Lim (@LimEmilia), Dr. Kyren Lazarus (@KyrenLazarus) and the authors of both preprints for thoughtful discussions about the data.

Questions for the authors

Q1: In Mereu et al. all samples were frozen before scRNAseq whereas in Ding et al. samples were prepared fresh. Is an unqueried aspect of benchmarking the ability to compare datasets that mix fresh and frozen samples? Would the pipelines handle this?

Q2: These preprints highlight some challenges of combining scRNAseq data to gain experimental power. What are the steps that individual laboratories can take to ensure that data from smaller scRNAseq experiments become useful in future, more powerful analyses?

Q3: Large consortia rightly involve experts in scRNAseq technology development and indeed these are not the first comparisons of scRNAseq methods, but the most comprehensive and recent. Does the temptation to use newer technologies as these emerge potentially create unforeseeable benchmarking issues?

Tags: benchmarking, pipeline, replication, reproducibility, scrnaseq, sequencing

doi: https://doi.org/10.1242/prelights.10803

Author's response

Jiarui Ding and Joshua Levin shared about Systematic comparative analysis of single cell RNA-sequencing methods

Thank you very much for your interest in our studies.

Q1: It’s interesting to compare data obtained from frozen and fresh tissues. Previous studies (e.g., Zheng et al., 2017) found no significant difference in the number of detected UMIs or genes between frozen and fresh human PBMCs. For different tissues and cell types, the difference between data obtained from freeze-thawed and fresh tissues could vary. In addition, for frozen tissues, a large number of cells is required to compensate for the cell loss in the freeze-thaw process.

From a computational perspective, the pipeline can process data from either freeze-thaw or fresh tissues to generate gene by cell count matrices. For down-stream analyses such as merging all these data together for clustering analysis or differential expression analysis, we can treat the condition (frozen or fresh) as a confounder and deconvolute its effects on scRNA-seq data.
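One simple way to picture treating the condition as a confounder is sketched below in Python. This is an illustrative assumption, not the authors' pipeline: for a single gene, it removes the mean shift associated with each condition, which is equivalent to taking the residuals of a linear model with condition as the only categorical covariate and then adding back the overall mean. The function name and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def regress_out_condition(expression, condition):
    """Remove per-condition mean shifts from one gene's expression values.

    expression: list of (e.g. log-normalised) values, one per cell.
    condition: list of labels such as "fresh" or "frozen", one per cell.
    Returns the values with each condition's mean replaced by the grand mean.
    """
    by_condition = defaultdict(list)
    for value, label in zip(expression, condition):
        by_condition[label].append(value)
    condition_mean = {label: mean(vals) for label, vals in by_condition.items()}
    grand_mean = mean(expression)
    return [
        value - condition_mean[label] + grand_mean
        for value, label in zip(expression, condition)
    ]


# Toy data (invented): frozen cells show a uniform upward shift of 2 units.
expression = [1.0, 2.0, 3.0, 4.0]
condition = ["fresh", "fresh", "frozen", "frozen"]
print(regress_out_condition(expression, condition))
```

A correction of this kind is only safe when the fresh and frozen samples contain comparable cell populations. If one condition is enriched for a particular cell type, the same step would also remove genuine biological signal.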

Q2: Although data from some platforms were better at identifying cell types, data generated from different platforms were all useful for this purpose. Therefore, we generally think that scRNA-seq methods are robust in generating data for tasks such as cell type identification. In addition, to minimize the variation introduced by lab protocols, people can share their optimized protocols.

Of course, data should be properly pre-processed to filter low-quality cells, remove contamination (e.g., reads that are likely to come from other cells or ambient RNA, as is evident in CEL-Seq2 data, Supp. Fig. 7 of Ding et al.), and normalize for down-stream tasks such as cell type identification and developmental trajectory inference. In addition, accurate, robust, and efficient computational algorithms are needed that take all these nuisance variables into consideration and produce predictions together with a measure of their uncertainty.

As HCA is generating a reference map for all the human cells, individual labs will greatly benefit from these data. New data generated by individual labs can be mapped (e.g., by a parametric function) to the reference data such that known cell types can be readily identified and importantly, potentially novel cell types or states with high-uncertainty (or low probabilities) in mapping them to the reference data can be pinpointed for detailed study.
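The idea of mapping new cells to a reference and flagging uncertain assignments can be sketched as follows. This is a deliberately simple stand-in rather than the parametric mapping the response alludes to: each reference cell type is summarised by a centroid, distances are converted into probabilities with a softmax, and cells whose best probability falls below a hypothetical `min_prob` threshold are left unassigned. All names and numbers are assumptions.

```python
import math


def map_to_reference(cell, centroids, min_prob=0.7):
    """Assign a cell to the nearest reference centroid, with a confidence.

    cell: list of expression values (same gene order as the centroids).
    centroids: dict mapping cell type name -> list of mean expression values.
    min_prob: hypothetical threshold below which the cell is left unassigned.
    Returns (label, probability of the best-matching cell type).
    """
    # Squared Euclidean distance from the cell to every centroid.
    dist = {
        name: sum((x - y) ** 2 for x, y in zip(cell, centroid))
        for name, centroid in centroids.items()
    }
    # Softmax over negative distances; subtracting the minimum keeps exp() stable.
    d_min = min(dist.values())
    weight = {name: math.exp(-(d - d_min)) for name, d in dist.items()}
    total = sum(weight.values())
    prob = {name: w / total for name, w in weight.items()}
    best = max(prob, key=prob.get)
    label = best if prob[best] >= min_prob else "unassigned"
    return label, prob[best]


# Toy reference (invented): two cell types described by two marker genes.
centroids = {"T cell": [5.0, 0.0], "B cell": [0.0, 5.0]}
print(map_to_reference([4.8, 0.2], centroids))  # close to the T cell centroid
print(map_to_reference([2.5, 2.5], centroids))  # equidistant, so ambiguous
```

Cells that come back as "unassigned" are the ones worth a closer look: they may be low-quality cells or doublets, or candidates for cell types or states missing from the reference.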

Q3: Single cell RNA-sequencing remains a rapidly evolving field with continued development of new methods. We think that basic metrics such as the number of UMIs or genes detected per cell are still relevant for comparison. Similarly, it’s also important to compare how well the data generated by new protocols identify cell types or trace developmental trajectories. Our choice of samples and computational approaches for the current study should make it easier for future benchmarking efforts to assess new methods. We agree that new developments in the field, e.g., feature barcoding protocols that simultaneously profile both the transcriptome and the expression of a set of cell surface protein markers, may need additional metrics for comparison. We are excited about all these new developments and fascinated by how rapidly this cutting-edge technology is transforming our understanding of biology and human diseases.
