Endogenous CRISPR arrays for scalable whole organism lineage tracing

James Cotterell, James Sharpe

Preprint posted on 20 December 2018

Mutating a cell’s genome to know its history: Cotterell and Sharpe propose a CRISPR-Cas9 lineaging approach that doesn’t require transgenic animals

Selected by Irepan Salvador-Martinez


Complex animals are composed of billions of cells, all descendants of a single cell, the zygote. The relationships between all the cells of a multicellular animal (i.e., its cell lineage) can be represented as a tree, much like a genealogical tree. The root of this tree represents the zygote, the terminal tips represent the cells of the adult, and every split of the tree represents a cell division. The cell lineage is one of the most important concepts in developmental biology, as it is crucial for understanding how multicellular organisms are built and how cell fates are determined during development.

The first successful attempts to reconstruct a cell lineage consisted of following, under the microscope, the successive cell divisions of an organism as it developed. This approach was famously used by Sulston in the 1980s to determine the complete cell lineage of the nematode worm C. elegans. Unfortunately, this approach cannot be used in larger animals, as the cell divisions cannot be easily observed under the microscope and the number of cells quickly becomes unmanageable. Some years ago it was proposed that naturally occurring somatic mutations could be used to reconstruct a cell lineage [1]. This is analogous to the molecular phylogenetic approach, in which the genetic mutations that have accumulated over millions of years in certain genes are used to unravel the phylogenetic relationships between species. Although this approach is possible in theory, it would take a great effort to sequence, in every single cell, mutations that accumulate at random positions in the genome.

Recently, several groups have used CRISPR-Cas9 to generate mutations throughout the development of an organism and used them as lineage markers. Because CRISPR-Cas9 mutations are targeted to specific loci, they can be easily recovered afterwards. One approach is to introduce, via transgenesis, a CRISPR-Cas9 “recorder” consisting of several CRISPR-Cas9 targets (as in [2]). With the recorder in place, gRNAs and Cas9 need to be added so that mutations can begin to accumulate.
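To make the idea of edits as lineage markers concrete, here is a toy simulation in Python. It is only a sketch under simplifying assumptions and not the method of any of the cited papers: the function name `simulate_recorder`, the per-division edit probability, the number of possible edit outcomes and the rule that an edited target can never change again are all hypothetical choices made for illustration.

```python
import random


def simulate_recorder(n_targets=9, generations=4, edit_prob=0.2,
                      n_outcomes=5, seed=0):
    """Simulate a recorder of `n_targets` sites through synchronous divisions.

    Each cell is named by its path from the zygote ("" is the zygote,
    "L"/"R" are its daughters, "LL" a granddaughter, and so on).
    A barcode is a tuple with one state per target: 0 means unedited,
    1..n_outcomes is an arbitrary label for the edit that occurred.
    """
    rng = random.Random(seed)
    cells = {"": (0,) * n_targets}
    frontier = [""]
    for _ in range(generations):
        next_frontier = []
        for parent in frontier:
            for side in "LR":
                # A daughter starts with an exact copy of the parental barcode.
                barcode = list(cells[parent])
                for i, state in enumerate(barcode):
                    # Only unedited targets can still be cut (toy assumption).
                    if state == 0 and rng.random() < edit_prob:
                        barcode[i] = rng.randint(1, n_outcomes)
                child = parent + side
                cells[child] = tuple(barcode)
                next_frontier.append(child)
        frontier = next_frontier
    return cells
```

In this toy model every edit present in a cell is also present in all of its descendants, so cells that share an edit share an ancestor. That inheritance is what a lineage reconstruction method exploits.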

The preprint

Cotterell and Sharpe present an alternative approach for CRISPR-Cas9 lineaging. Instead of introducing the CRISPR targets via transgenesis, they use endogenous genome sequences that can serve as CRISPR targets. The main advantage of this approach is that it would make the researcher’s life easier by avoiding the generation of transgenic animals. This would be especially useful in non-model organisms where transgenesis has not been established.

To identify endogenous sequences suitable for use as CRISPR-target arrays, they set up a bioinformatic pipeline, which they used to analyse the mouse and zebrafish genomes. They scanned the entire genome with a sliding-window approach to identify the maximum number of CRISPR arrays, predefined as contiguous regions with >8 CRISPR sites per 350 bp window. In general, to be called a CRISPR target, a genomic sequence needs only a protospacer adjacent motif (PAM), which consists of an NGG sequence (N can be any of the 4 nucleotides) on either DNA strand; a PAM on the opposite strand reads as CCN on the reference strand. They used a series of filters to ensure the quality of the arrays and to reduce potential off-targeting.
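The window scan can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: it counts PAM motifs only (NGG on the forward strand, CCN for PAMs on the reverse strand), applies none of the quality or off-target filters mentioned above, and the function names `pam_positions` and `candidate_arrays` are hypothetical.

```python
import re
from bisect import bisect_left

PAM_LEN = 3


def pam_positions(seq):
    """Return sorted 0-based start positions of PAM motifs on either strand."""
    seq = seq.upper()
    # Lookaheads are used so that overlapping motifs are all reported.
    forward = [m.start() for m in re.finditer(r"(?=[ACGT]GG)", seq)]
    # CCN on the forward strand is an NGG PAM on the reverse strand.
    reverse = [m.start() for m in re.finditer(r"(?=CC[ACGT])", seq)]
    return sorted(forward + reverse)


def candidate_arrays(seq, window=350, min_sites=9):
    """Return merged (start, end) regions with >= min_sites PAMs per window.

    Windows are anchored at each PAM in turn; a PAM counts only if all
    three of its bases lie inside the window. Overlapping qualifying
    windows are merged into one contiguous region (end is exclusive).
    """
    positions = pam_positions(seq)
    regions = []
    for i, start in enumerate(positions):
        # Index of the first PAM that no longer fits inside this window.
        j = bisect_left(positions, start + window - PAM_LEN + 1)
        if j - i < min_sites:
            continue
        end = positions[j - 1] + PAM_LEN
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

Calling `candidate_arrays(chromosome_sequence)` on a sequence string returns the candidate regions. The default `min_sites=9` encodes the ">8 sites" threshold, and both parameters can be changed to explore other array definitions.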

Figure 1. Distribution of the filtered endogenous CRISPR arrays over the Zebrafish and Mouse genomes. Individual CRISPR arrays are represented by a red line. The result for 9 CRISPR targets per array is shown (modified from Figure 2 of the manuscript).


Using this approach they found ~3600 and ~2000 arrays in the mouse and zebrafish respectively, distributed across almost all chromosomes (Figure 1). After making sure that, in each species, every target of one specific array can be mutated independently (Figure 2), they microinjected 1-cell stage zebrafish embryos with Cas9 and sgRNAs targeting one array and extracted genomic DNA after 48 hours. Sequencing of the extracted DNA revealed thousands of specific combinations of mutations, which suggests that this method can be used to reconstruct cell lineages.

Figure 2. Examples of endogenous CRISPR arrays in Zebrafish. (Left) Primer sequences and 5’ Gs of target sites are shown in bold. The CRISPR target sites and PAM sequences are shown in blue and red respectively. (Right) Indel detection using MiSeq deep sequencing of Zebrafish amplicons. Purple lines show the number of indels detected at each position in the amplicon. Vertical dashed lines represent the expected positions of indels (modified from Figure 3 of the manuscript).


Future research and open questions

An important drawback of this approach is that any given cell carrying mutational information will have two alleles that differ from each other (one on each chromosome of the pair). During sequencing, either or both of these alleles might be sequenced, producing 1) an overestimation of the diversity of mutated arrays and 2) inaccurate tree reconstruction. The authors are aware of this drawback and propose that Single Nucleotide Polymorphisms (SNPs) could solve it by allowing each mutated array to be assigned to one of the two alleles. This is an interesting alternative, as it would make it possible (as the authors mention) to generate two lineage trees for a single organism. As both alleles should give the same lineage tree, the information from both could be integrated to build a more reliable consensus tree.
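The SNP-based assignment could look roughly like the Python sketch below. It is an illustration under strong assumptions, not the authors' procedure: each read is taken to be an amplicon sequence paired with an already-called tuple of edits, the heterozygous SNP is assumed to lie upstream of the first target so that indels do not shift its offset, and the names `split_by_allele`, `snp_offset` and `snp_to_allele` are hypothetical.

```python
from collections import defaultdict


def split_by_allele(reads, snp_offset, snp_to_allele):
    """Group the edit patterns of reads by the allele they come from.

    reads         -- iterable of (sequence, edits) pairs
    snp_offset    -- 0-based position of the heterozygous SNP in the amplicon
    snp_to_allele -- dict mapping the SNP base to an allele label
    Reads that are too short or carry an unexpected base are kept
    under the label "unassigned" instead of being silently dropped.
    """
    by_allele = defaultdict(list)
    for sequence, edits in reads:
        base = sequence[snp_offset] if snp_offset < len(sequence) else None
        by_allele[snp_to_allele.get(base, "unassigned")].append(edits)
    return dict(by_allele)
```

Each allele's group of edit patterns could then be passed to a tree-building method separately, giving the two trees that the authors suggest could be compared or combined.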

CRISPR-Cas9 cell lineaging is a new and growing field, where future improvements in the recorders’ design, reconstruction methods and spatial transcriptomics will improve the lineage reconstruction accuracy. Cotterell and Sharpe’s work is an important addition to the ongoing discussion on how to improve this nascent field.

Questions to the authors:

  • If transgenesis is to be avoided and both gRNAs and Cas9 are microinjected, CRISPR activity would decay with time. How does this limit the lineage recording capabilities of this approach?
  • Why did you decide to target an array on an autosome (e.g. chromosome 11 in the zebrafish) rather than on a sex chromosome, which would solve the issue of having two alleles per cell?
  • It has recently been shown that after the double-strand break resulting from CRISPR-Cas9, and its repair via non-homologous end-joining (NHEJ), certain mutational outcomes appear more frequently than others in a predictable way [3]. I wonder if the authors consider this in their reconstruction method.
  • The target arrays used here are quite compact (in number of nucleotides). I wonder to what extent the authors observe “dropout” events (deletion of entire targets caused by simultaneous CRISPR activity at multiple nearby sites)? These events can be quite common [2] and computer simulations have shown they have a major impact on the accuracy of lineage reconstruction [4].


[1] Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U., & Shapiro, E. (2005). Genomic Variability within an Organism Exposes Its Cell Lineage Tree. PLoS Computational Biology, 1(5).

[2] McKenna, A., Findlay, G. M., Gagnon, J. A., Horwitz, M. S., Schier, A. F., & Shendure, J. (2016). Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science, 353(6298).

[3] Chen, W., McKenna, A., Schreiber, J., Yin, Y., Agarwal, V., Noble, W. S., & Shendure, J. (2018). Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. BioRxiv.

[4] Salvador-Martínez, I., Grillo, M., Averof, M., & Telford, M. (2019). Is it possible to reconstruct an accurate cell lineage using CRISPR recorders? eLife.

Tags: cell lineage, crispr-cas9, mouse, zebrafish

Posted on: 27 February 2019


