Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells

Arnav Moudgil, Michael N Wilkinson, Xuhua Chen, June He, Alexander J Cammack, Michael J Vasek, Tomas Lagunas Jr., Zongtai Qi, Samantha A Morris, Joseph D Dougherty, Robi D Mitra

Preprint posted on February 01, 2019

Making a mark on gene regulatory networks: A method using transposon insertions directed by a transcription factor allows simultaneous mapping of transcription factor binding sites and gene expression in single cells.

Selected by James Briscoe

Categories: genomics



One of the aspects of preprints that I’ve found particularly useful is the rapid communication of innovative new methods. This is especially true in fast-moving fields such as single cell assays. I think the work by Moudgil et al. is a good example of this.

At the heart of developmental mechanisms are gene regulatory networks – collections of transcriptional regulators that interact with each other, through the cis-regulatory elements they bind, to control gene expression and hence cell identity and function. Methods that simultaneously assay, in individual cells, the transcriptome and the genomic binding pattern of specific transcription factors (TFs) would offer new insight into gene regulatory networks. In this preprint, Moudgil et al. develop a method to do just this.


The method

To identify TF binding sites the authors previously described a technique that is based on a fusion between a TF of interest and a transposase [1]. The TF-transposase chimera is introduced into cells along with a reporter-harbouring transposon. As a result, the TF-transposase targets deposition of the reporter-transposon to DNA near the TF binding sites. The authors refer to these insertions as “calling cards” that can be amplified from chromatin and the locations determined by high-throughput sequencing of genomic DNA.

To make the technique compatible with transcriptome assays, the authors extended the technique by developing “self-reporting transposons (SRTs)”. To do this they removed the polyadenylation signal from the reporter-transposon and added a ribozyme after the terminal repeat to minimize reads from the non-integrated reporter. These clever tricks allow transcription of the reporter gene through the transposon into the flanking genomic sequence. The location of an insertion event can then be identified from mRNA by the sequence of the 3’ untranslated regions (UTRs) of reporter gene transcripts.

The authors first validate the technique in populations of cells transfected with the transcription factor SP1 fused to the transposase by demonstrating that calling cards sequenced from mRNA UTRs overlap with positions in the genome that are known to bind SP1. The transposase used, piggyBac, naturally interacts with the bromodomain protein BRD4, which itself associates with acetylated histones and active enhancers. The authors turn this potential bug into a feature by demonstrating that, if not fused to a specific TF, piggyBac can be used to map BRD4 bound regions in the genome using the SRT technique.


Schematic of the scCC library preparation strategy. Self-reporting transcripts from inserted transposons that incorporate sequence from the transposon and adjacent genomic DNA are amplified using biotinylated primers and circularized. This brings the cell barcode (BC) and unique molecular index (UMI) close to the transposon-genome junction. Circularized molecules are sheared, captured, and adapters are ligated. Sequencing yields the cell barcode and UMI with read 1 and the genomic insertion site with read 2. From Figure 3A of the preprint


Finally, the authors modify their protocol to apply it to single cells, a method they call scCC (single cell Calling Cards). mRNA from single cells transfected with the SRT system was used to recover both the calling cards, indicating the locations of transposon insertions, and the transcriptome. The cell barcodes from the single cell library preparation allow transcriptome data to be paired with calling cards, and hence transposon insertions to be assigned to specific cell types. Proof of principle experiments in cell lines were followed by mapping of Brd4 binding and gene expression in individual cells from the mouse cortex. They demonstrate differential Brd4 binding in excitatory neurons located in different layers of the cortex, providing evidence that SRTs can be used to map transcriptional regulators in situ.
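To make the pairing step concrete, here is a minimal sketch of how insertions could be assigned to cell types through shared cell barcodes. It is not the authors' pipeline: the barcodes, cell-type labels, coordinates, and the function name `group_insertions_by_cell_type` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data; none of these values come from the preprint.
# Cell type assigned to each cell barcode by clustering the transcriptomes.
cell_type_by_barcode = {
    "AAAC": "excitatory_upper",
    "CCGT": "excitatory_deep",
    "GGTA": "excitatory_upper",
}

# Calling cards recovered from the same library: (barcode, chromosome, position).
calling_cards = [
    ("AAAC", "chr1", 1000),
    ("GGTA", "chr1", 1040),
    ("CCGT", "chr2", 500),
    ("TTTT", "chr3", 42),  # barcode with no matching transcriptome
]


def group_insertions_by_cell_type(cards, cell_types):
    """Assign each insertion to the cell type of the barcode it came from."""
    grouped = defaultdict(list)
    for barcode, chrom, pos in cards:
        cell_type = cell_types.get(barcode)
        if cell_type is None:
            continue  # drop insertions whose cell has no transcriptome label
        grouped[cell_type].append((chrom, pos))
    return dict(grouped)


insertions_by_type = group_insertions_by_cell_type(calling_cards, cell_type_by_barcode)
```

Because data from any one cell are sparse, pooling insertions across all cells of a type in this way is what would make per-cell-type binding profiles feasible.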


Why I like the preprint

I think the approach is elegant and original, with the potential for further refinement. It has similarities with some other recently developed techniques, such as targeted DamID (e.g. [2]), which identifies TF binding sites using a TF fused to a DNA methyltransferase that methylates GATC sites in the neighbourhood of binding. But scCC allows the simultaneous recovery of mRNA and TF binding locations in individual cells, which makes it possible to infer the link between TF binding and gene regulation. Moreover, the approach is of broad interest as it is sufficiently flexible to apply to almost any TF, in any cell type or tissue, in any species.

Many modifications to the system can be imagined. The authors mention the idea of using calling card insertions as molecular records of cell lineage or specific cellular events. Another possibility would be to use two or more distinguishable reporter-transposons, introduced at different times, to examine temporal changes in binding. It’s also possible to imagine using orthogonal transposase-transposon pairs to simultaneously monitor the binding of two TFs in the same cell.


Questions and open issues

As the authors point out, potential limitations are the sparsity of the data from single cells and the inherent bias in the insertion preferences of the transposase. In this context, it would be interesting to know if enhancers that produce eRNAs (~25% of enhancers) might be captured more frequently. Tweaks to the system and scaling up the datasets could address some of the shortcomings. Ultimately, analyzing the correlation between TF binding and the activity of individual genes in populations of many single cells would offer fantastically rich datasets from which to make gene regulatory predictions.

The current system relies on the ectopic expression of the TF-transposase fusion protein. Whether this results in aberrant binding of the TF to sites not normally occupied or whether the expression of the TF-transposase fusion protein has dominant effects that alter the state of the cells will depend on the details of the TF and cell types. Developing the system to allow the regulated expression from an endogenous gene would be one way around this limitation.

In the study, the authors use the interaction between piggyBac and Brd4 to their advantage. However, this could also be a limitation, as it might result in a high background or confounding results that complicate the identification of binding events that are specific to a TF of interest. It was unclear how much the Sp1-piggyBac fusion protein is recruited to Brd4-bound sites. Modifications that reduce or abrogate the Brd4 interaction, perhaps by using alternative transposases, would eliminate this concern. In addition, I’d be interested in seeing a comparison between BRD4 binding sites identified in the scCC system and other techniques, such as scATAC-seq, that mark accessible chromatin, to see how much overlap there is.
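The kind of overlap comparison I have in mind could be sketched roughly as follows. This is only an illustration under simple assumptions: peaks are half-open (chromosome, start, end) intervals, the coordinates are made up, and `fraction_overlapping` is a hypothetical helper rather than part of any published pipeline.

```python
def fraction_overlapping(query_peaks, reference_peaks):
    """Fraction of query peaks that overlap at least one reference peak.

    Peaks are (chromosome, start, end) tuples with half-open coordinates.
    """
    if not query_peaks:
        return 0.0
    hits = 0
    for chrom, start, end in query_peaks:
        # Two intervals on the same chromosome overlap if each starts
        # before the other ends.
        if any(c == chrom and s < end and start < e
               for c, s, e in reference_peaks):
            hits += 1
    return hits / len(query_peaks)


# Hypothetical toy peak sets; not data from the preprint.
brd4_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
atac_peaks = [("chr1", 150, 250), ("chr2", 10, 40),
              ("chr1", 590, 700), ("chr1", 900, 950)]

brd4_in_atac = fraction_overlapping(brd4_peaks, atac_peaks)
atac_in_brd4 = fraction_overlapping(atac_peaks, brd4_peaks)
```

Computing the fraction in both directions matters, because "most BRD4 sites are accessible" and "most accessible sites are BRD4-bound" are different claims. For real peak sets, an interval tree or a sorted sweep would replace the quadratic scan used here.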



  1. Wang H, Mayhew D, Chen X, Johnston M, Mitra RD. (2012) “Calling cards” for DNA-binding proteins in mammalian cells. Genetics. 190:941-9
  2. Cheetham SW, Gruhn WH, van den Ameele J, Krautz R, Southall TD, Kobayashi T, Surani MA, Brand AH. (2018) Targeted DamID reveals differential binding of mammalian pluripotency factors. Development 145:dev170209

Tags: single-cell sequencing, transcription factor

Posted on: 20th February 2019


  • Author's response

    Arnav Moudgil and Rob Mitra shared

    Thank you for selecting our manuscript for a preLight! You’ve provided a very nice summary of the single cell calling cards (scCC) method described in our manuscript. We have a few comments with regards to the questions that you raised:


    1. Data Sparsity. Increasing the number of calling card insertions collected per cell would reduce the cost of the method, and we’ve outlined several ways that this might be accomplished in our Discussion section. One promising future direction is to combine scCC with newer scRNA-seq techniques such as Cell Hashing, Sci-seq, or SPLIT-seq that substantially reduce the per-cell costs for library construction. Since scCC does not require much sequencing coverage per cell, and single cell transcriptomes do not need to be sequenced to high depth to be mapped onto high quality reference datasets, generating scCC libraries from these methods has the potential to significantly reduce costs.

    2. Ectopic expression versus “tagging” of endogenous genes. We agree that for certain applications it will be important to tag the transcription factor of interest with the transposase at its endogenous genomic locus. In fact, we do this routinely in yeast, an organism especially amenable to genome engineering (see e.g. We also agree that regulating the deposition of calling cards will further broaden the scope of the method, and we have developed robust methods to achieve chemical control over calling card deposition (e.g.

    3. The piggyBac-Brd4 interaction as a background for peak calling. Our standard calling card analysis pipeline compares TF-directed calling card data with a background distribution (e.g. calling cards deposited by the unfused piggyBac transposase) to identify genomic loci enriched for TF-directed insertions. The sites identified in this fashion should be TF-specific. For TFs that redirect Brd4 binding almost completely, or for experiments where it is not convenient to collect a background distribution, or for the analysis of Brd4 binding, the pipeline can also perform background-free binding site identification. We do agree that engineering other transposases for use with calling cards will be useful as it would remove the need to collect an unfused background distribution. One possibility is the SleepingBeauty transposase; as supplemental data, we characterized its background distribution with SRTs and found it to be relatively uniform across chromatin states.

    4. Calling cards and ATAC-seq. With regards to comparisons between calling card data and ATAC-seq data, we generally observe that calling card peaks overlap a small subset of ATAC-seq peaks, which is consistent with the notion that calling cards map the binding of a specific TF, while ATAC-seq provides a general measure of chromatin accessibility.
