Endogenous CRISPR arrays for scalable whole organism lineage tracing

James Cotterell, James Sharpe

Preprint posted on December 20, 2018

Mutating a cell’s genome to know its history: Cotterell and Sharpe propose a CRISPR-Cas9 lineaging approach that doesn’t require transgenic animals

Selected by Irepan Salvador-Martinez


Complex animals are composed of billions of cells, all descendants of a single cell, the zygote. The relationships between all the cells of a multicellular animal (i.e., its cell lineage) can be represented as a tree, similar to a genealogical tree. The root of this tree represents the zygote, the terminal tips represent the cells of the adult, and every split of the tree represents a cell division. The cell lineage is one of the most important concepts in developmental biology, as it is crucial for understanding how multicellular organisms are built and how cell fates are determined during development.

The first successful attempts to reconstruct a cell lineage consisted of following the successive cell divisions of a developing organism under the microscope. This approach was famously used by Sulston in the 1980s to determine the complete cell lineage of the nematode worm C. elegans. Unfortunately, this approach cannot be used in larger animals, because their cell divisions cannot easily be observed under the microscope and because the number of cells quickly becomes unmanageable. Some years ago it was proposed that naturally occurring somatic mutations could be used to reconstruct a cell lineage [1]. This is analogous to molecular phylogenetics, in which the mutations that have accumulated in certain genes over millions of years are used to unravel the evolutionary relationships between species. Even if this approach is possible in theory, it would require an enormous effort to sequence, in every single cell, mutations that accumulate at random positions in the genome.

Recently, several groups have used CRISPR-Cas9 to generate mutations throughout the development of an organism and used them as lineage markers. Because CRISPR-Cas9 mutations are targeted to specific loci, they can easily be recovered afterwards. One approach is to introduce, via transgenesis, a CRISPR-Cas9 “recorder” consisting of several CRISPR-Cas9 target sites (as in [2]). Once the recorder is in place, the gRNAs and Cas9 must be supplied so that mutations can begin to accumulate.
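To make the recorder idea concrete, here is a toy simulation (not the method of [2] or of the preprint). It assumes a recorder with `n_targets` sites, treats each site as simply "edited" or "unedited" (real recorders record many distinct indel outcomes), and lets each unedited site in each daughter cell be edited with a fixed, hypothetical probability `edit_prob` per division. Because edits are irreversible and inherited, every cell carries all of its ancestors' edits plus its own.

```python
import random


def simulate_recorder(n_targets, n_generations, edit_prob, seed=0):
    """Toy lineage recorder: return {cell_id: frozenset of edited target indices}.

    Hypothetical model: the zygote ("0") starts unedited; at each division,
    every still-unedited site in each daughter is edited independently with
    probability edit_prob. Cell IDs encode the tree: a cell's parent is its
    ID with the last character removed.
    """
    rng = random.Random(seed)
    cells = {"0": frozenset()}
    frontier = ["0"]
    for _ in range(n_generations):
        next_frontier = []
        for parent in frontier:
            for suffix in ("0", "1"):  # two daughters per division
                child = parent + suffix
                # Only unedited sites can acquire a new, heritable edit.
                new_edits = {
                    t for t in range(n_targets)
                    if t not in cells[parent] and rng.random() < edit_prob
                }
                cells[child] = cells[parent] | new_edits
                next_frontier.append(child)
        frontier = next_frontier
    return cells
```

For example, `simulate_recorder(10, 4, 0.2)` returns 31 cells (the zygote plus four rounds of division). With few targets or a high `edit_prob`, every site is edited after a few divisions and later divisions leave no trace, which is one reason recorder design matters.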

The preprint

Cotterell and Sharpe present an alternative approach for CRISPR-Cas9 lineaging. Instead of introducing the CRISPR targets via transgenesis, they use endogenous genome sequences that can serve as CRISPR targets. The main advantage of this approach is that it would make the researcher’s life easier by avoiding the generation of transgenic animals. This would be especially useful in non-model organisms where transgenesis has not been established.

To identify endogenous sequences suitable for use as CRISPR target arrays, they set up a bioinformatic pipeline, which they used to analyse the mouse and zebrafish genomes. They scanned the entire genome with a sliding-window approach to identify as many CRISPR arrays as possible, defining an array as a contiguous region with >8 CRISPR sites per 350 bp window. In general, for a genomic sequence to be called a CRISPR target (for the commonly used SpCas9), it only needs to lie next to a protospacer adjacent motif (PAM) with the sequence NGG (where N can be any of the 4 nucleotides), which reads as CCN on the opposite DNA strand. They used a series of filters to ensure the quality of the arrays and to reduce potential off-targeting.
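The scanning step can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it counts SpCas9 PAMs on both strands (NGG, and CCN for the opposite strand), requires room for a hypothetical 20-nt protospacer, and reports 350 bp windows containing more than 8 sites. The quality and off-target filters used in the preprint are not reproduced.

```python
import re
from bisect import bisect_left


def pam_positions(seq, spacer_len=20):
    """Sorted start positions of SpCas9 PAMs on either strand of `seq`.

    Lookahead regexes are used so overlapping PAMs (e.g. 'GGG') are all found.
    """
    seq = seq.upper()
    sites = []
    # NGG on the given strand: the protospacer lies 5' of the PAM.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        if m.start() >= spacer_len:
            sites.append(m.start())
    # CCN on the given strand is NGG on the opposite strand:
    # the protospacer lies 3' of the CCN on the given strand.
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        if m.start() + 3 + spacer_len <= len(seq):
            sites.append(m.start())
    return sorted(sites)


def dense_windows(seq, window=350, min_sites=9, step=1, spacer_len=20):
    """Return (window_start, n_sites) for every window with >= min_sites PAMs."""
    sites = pam_positions(seq, spacer_len)
    hits = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        # Binary search counts the sites in [start, start + window).
        n = bisect_left(sites, start + window) - bisect_left(sites, start)
        if n >= min_sites:
            hits.append((start, n))
    return hits
```

Here `min_sites=9` encodes the ">8 sites per 350 bp" criterion; `step`, `spacer_len` and the function names are illustrative choices.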

Figure 1. Distribution of the filtered endogenous CRISPR arrays over the zebrafish and mouse genomes. Each red line represents an individual CRISPR array. Results are shown for arrays with 9 CRISPR targets (modified from Figure 2 of the manuscript).


Using this approach they found ~3600 and ~2000 arrays in the mouse and zebrafish, respectively, distributed across almost all chromosomes (Figure 1). After confirming that, in each species, every target within a given array could be mutated independently (Figure 2), they microinjected 1-cell stage zebrafish embryos with Cas9 and sgRNAs targeting a single array and extracted genomic DNA 48 hours later. Sequencing of the extracted DNA revealed thousands of distinct combinations of mutations, which suggests that this method can be used to reconstruct cell lineages.
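Counting "distinct combinations of mutations" amounts to summarising each sequenced read as a per-target edit pattern and tallying the patterns. The sketch below assumes (hypothetically) that indels have already been called for each read as `(position, label)` pairs and assigns each indel to the nearest expected cut site, in the spirit of the dashed lines in Figure 2. Large deletions spanning several targets would need separate handling.

```python
from collections import Counter


def call_targets(indels, cut_sites, tolerance=5):
    """Convert one read's indels into a tuple with one entry per target.

    indels: list of (position, label) pairs, e.g. (57, "2D") for a 2-bp deletion.
    cut_sites: expected cut positions in the amplicon, one per target.
    Unedited targets are None; indels farther than `tolerance` bp from every
    expected cut site are ignored.
    """
    calls = [None] * len(cut_sites)
    for pos, label in indels:
        for i, cut in enumerate(cut_sites):
            if abs(pos - cut) <= tolerance:
                calls[i] = label
                break
    return tuple(calls)


def combination_counts(reads, cut_sites, tolerance=5):
    """Tally how often each whole-array edit pattern is observed."""
    return Counter(call_targets(r, cut_sites, tolerance) for r in reads)


# Toy reads (invented for illustration) for a two-target amplicon.
reads = [[(10, "1D")], [(10, "1D")], [(31, "2I")], []]
counts = combination_counts(reads, cut_sites=[10, 30])
```

In this toy input, two reads share the pattern `("1D", None)`, and `len(counts)` gives the number of distinct combinations.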

Figure 2. Examples of endogenous CRISPR arrays in zebrafish. (Left) Primer sequences and the 5’ Gs of target sites are shown in bold; CRISPR target sites and PAM sequences are shown in blue and red, respectively. (Right) Indel detection by MiSeq deep sequencing of zebrafish amplicons. Purple lines show the number of indels detected at each position in the amplicon; vertical dashed lines mark the expected indel positions (modified from Figure 3 of the manuscript).


Future research and open questions

An important drawback of this approach is that any cell carrying mutational information will have two alleles of the array that differ from each other (one on each homologous chromosome). During sequencing, either or both of these alleles might be sequenced, producing 1) an overestimation of the diversity of mutated arrays and 2) inaccurate tree reconstruction. The authors are aware of this drawback and propose that Single Nucleotide Polymorphisms (SNPs) could solve it by assigning each mutated array to one of the two alleles. This is an interesting option, as it would make it possible (as the authors mention) to generate two lineage trees for a single organism. As both alleles should give the same lineage tree, the information from both alleles could be integrated to build a more reliable consensus tree.
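The SNP-based allele assignment can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: it assumes a heterozygous SNP lies inside the amplicon but outside the edited targets, and that reads are already aligned to reference coordinates (deletions written as gaps, insertions removed) so that `snp_index` points to the same base in every read.

```python
def split_by_snp(reads, snp_index, allele_a, allele_b):
    """Partition aligned reads into the two parental alleles at a heterozygous SNP.

    reads: aligned amplicon sequences (strings) in reference coordinates.
    snp_index: 0-based position of the SNP (hypothetical; must be outside the
    edited target sites so that edits do not overwrite it).
    Reads showing neither allele, or too short to cover the SNP, are
    returned as 'unassigned'.
    """
    groups = {"A": [], "B": [], "unassigned": []}
    for read in reads:
        base = read[snp_index] if snp_index < len(read) else None
        if base == allele_a:
            groups["A"].append(read)
        elif base == allele_b:
            groups["B"].append(read)
        else:
            groups["unassigned"].append(read)
    return groups
```

Each group could then be analysed separately, for instance to build the two per-allele lineage trees mentioned above.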

CRISPR-Cas9 cell lineaging is a new and growing field, in which future improvements in recorder design, reconstruction methods and spatial transcriptomics should increase the accuracy of lineage reconstruction. Cotterell and Sharpe’s work is an important addition to the ongoing discussion on how to improve this nascent field.

Questions to the authors:

  • If transgenesis is to be avoided and both gRNAs and Cas9 are microinjected, CRISPR activity would decay with time. How does this limit the lineage recording capabilities of this approach?
  • Why did you decide to target an array on an autosome (e.g. chromosome 11 in zebrafish) rather than on a sex chromosome, to solve the issue of having two alleles per cell?
  • It has recently been shown that, after a CRISPR-Cas9-induced double-strand break is repaired via non-homologous end joining (NHEJ), certain mutational outcomes appear more frequently than others in a predictable way [3]. I wonder whether the authors take this into account in their reconstruction method.
  • The target arrays used here are quite compact (short in nucleotide length). I wonder to what extent the authors observe “dropout” events (the deletion of entire targets caused by simultaneous CRISPR activity at multiple nearby sites). These events can be quite common [2], and computer simulations have shown that they have a major impact on the accuracy of lineage reconstruction [4].
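One way to see why predictable, skewed repair outcomes matter for reconstruction: if the outcomes at a site occur with probabilities p_1, ..., p_k, the chance that two independent editing events produce the identical outcome (a homoplasy, which can wrongly group unrelated cells) is the sum of p_i². The sketch below computes this quantity; the outcome frequencies passed to it are illustrative, not measured values.

```python
def collision_probability(outcome_freqs):
    """Probability that two independent edits at one site yield the same outcome.

    outcome_freqs: relative frequencies of the possible repair outcomes
    (normalised here so they sum to 1). Returns sum(p_i ** 2).
    """
    total = sum(outcome_freqs)
    return sum((f / total) ** 2 for f in outcome_freqs)
```

With ten equally likely outcomes the collision probability is 0.1, whereas a skewed distribution such as `[0.5, 0.3, 0.2]` gives 0.38, so identical but independently acquired mutations become much more likely when a few outcomes dominate.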


[1] Frumkin, D., Wasserstrom, A., Kaplan, S., Feige, U., & Shapiro, E. (2005). Genomic Variability within an Organism Exposes Its Cell Lineage Tree. PLoS Computational Biology, 1(5).

[2] McKenna, A., Findlay, G. M., Gagnon, J. A., Horwitz, M. S., Schier, A. F., & Shendure, J. (2016). Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science, 353(6298).

[3] Chen, W., McKenna, A., Schreiber, J., Yin, Y., Agarwal, V., Noble, W. S., & Shendure, J. (2018). Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. bioRxiv.

[4] Salvador-Martínez, I., Grillo, M., Averof, M., & Telford, M. (2019). Is it possible to reconstruct an accurate cell lineage using CRISPR recorders? eLife.

Tags: cell lineage, crispr-cas9, mouse, zebrafish

Posted on: 27th February 2019
