Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells

Arnav Moudgil, Michael N Wilkinson, Xuhua Chen, June He, Alexander J Cammack, Michael J Vasek, Tomas Lagunas Jr., Zongtai Qi, Samantha A Morris, Joseph D Dougherty, Robi D Mitra

Preprint posted on 1 February 2019

Article now published in Cell at

Making a mark on gene regulatory networks: A method using transposon insertions directed by a transcription factor allows simultaneous mapping of transcription factor binding sites and gene expression in single cells.

Selected by James Briscoe

Categories: genomics



One of the aspects of preprints that I’ve found particularly useful is the rapid communication of innovative new methods. This is particularly true in fast-moving fields such as single-cell assays. I think the work by Moudgil et al. is a good example of this.

At the heart of developmental mechanisms are gene regulatory networks – collections of transcriptional regulators that interact with each other, through the cis-regulatory elements they bind, to control gene expression and hence cell identity and function. Developing methods that allow the simultaneous assay, in individual cells, of the transcriptome and the genomic binding pattern of specific transcription factors (TFs) would offer new insight into gene regulatory networks. In this preprint, Moudgil et al. develop a method to do just this.


The method

To identify TF binding sites the authors previously described a technique that is based on a fusion between a TF of interest and a transposase [1]. The TF-transposase chimera is introduced into cells along with a reporter-harbouring transposon. As a result, the TF-transposase targets deposition of the reporter-transposon to DNA near the TF binding sites. The authors refer to these insertions as “calling cards” that can be amplified from chromatin and the locations determined by high-throughput sequencing of genomic DNA.

To make the technique compatible with transcriptome assays, the authors extended the technique by developing “self-reporting transposons (SRTs)”. To do this they removed the polyadenylation signal from the reporter-transposon and added a ribozyme after the terminal repeat to minimize reads from the non-integrated reporter. These clever tricks allow transcription of the reporter gene through the transposon into the flanking genomic sequence. The location of an insertion event can then be identified from mRNA by the sequence of the 3’ untranslated regions (UTRs) of reporter gene transcripts.

The authors first validate the technique in populations of cells transfected with the transcription factor SP1 fused to the transposase by demonstrating that calling cards sequenced from mRNA UTRs overlap with positions in the genome that are known to bind SP1. The transposase used, piggyBac, naturally interacts with the bromodomain protein BRD4, which itself associates with acetylated histones and active enhancers. The authors turn this potential bug into a feature by demonstrating that, if not fused to a specific TF, piggyBac can be used to map BRD4 bound regions in the genome using the SRT technique.


Schematic of the scCC library preparation strategy. Self-reporting transcripts from inserted transposons that incorporate sequence from the transposon and adjacent genomic DNA are amplified using biotinylated primers and circularized. This brings the cell barcode (BC) and unique molecular index (UMI) close to the transposon-genome junction. Circularized molecules are sheared, captured, and adapters are ligated. Sequencing yields the cell barcode and UMI with read 1 and the genomic insertion site with read 2. From Figure 3A of the preprint


Finally, the authors modify their protocol to apply it to single cells – scCC (single cell Calling Cards). mRNA from single cells transfected with the SRT system was used to recover both the calling cards, which indicate the locations of transposon insertions, and the transcriptome. The cell barcodes from the single cell library preparation allow transcriptome data to be paired with calling cards, and hence transposon insertions to be assigned to specific cell types. Proof of principle experiments in cell lines were followed by mapping of Brd4 binding and gene expression in individual cells from the mouse cortex. They demonstrate differential Brd4 binding in excitatory neurons located in different layers of the cortex, providing evidence that SRTs can be used to map transcriptional regulators in situ.
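The barcode-based pairing described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the short barcodes, the cell-type labels and the coordinates are all hypothetical, and the sketch simply assumes that each insertion has already been reduced to a (cell barcode, chromosome, position) record and that each barcode has been given a cell-type label from its transcriptome.

```python
from collections import defaultdict


def assign_insertions(insertions, barcode_to_cell_type):
    """Group insertion sites by the cell type of the barcode they came from.

    insertions: iterable of (barcode, chrom, pos) tuples (hypothetical format).
    barcode_to_cell_type: dict mapping barcode -> label from transcriptome clustering.
    Returns a dict mapping cell type -> set of unique (chrom, pos) sites.
    """
    by_type = defaultdict(set)
    for barcode, chrom, pos in insertions:
        cell_type = barcode_to_cell_type.get(barcode)
        if cell_type is None:
            continue  # barcode has no matching transcriptome, so it cannot be typed
        by_type[cell_type].add((chrom, pos))  # the set collapses duplicate records
    return dict(by_type)


# Toy data, invented for illustration only.
barcode_to_cell_type = {"AAAC": "upper-layer neuron", "GGTT": "deep-layer neuron"}
insertions = [
    ("AAAC", "chr1", 1000),
    ("AAAC", "chr1", 1000),  # duplicate record of the same insertion
    ("GGTT", "chr2", 500),
    ("TTTT", "chr3", 42),    # barcode absent from the transcriptome data
]
result = assign_insertions(insertions, barcode_to_cell_type)
```

Pooling insertions by cell type in this way is one plausible response to the sparsity of per-cell data, since each individual cell contributes only a handful of insertions.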


Why I like the preprint

I think the approach is elegant and original with the potential for further refinement. It has similarities with some other recently developed techniques, such as targeted DamID (e.g. [2]), which identifies TF binding sites using a TF fused to a DNA methyltransferase that methylates GATC sequences in the neighbourhood of binding. But scCC allows the simultaneous recovery of mRNA and TF binding locations in individual cells, which makes it possible to infer the link between TF binding and gene regulation. Moreover, the approach is of broad interest as it is sufficiently flexible to apply to almost any TF, in any cell type or tissue, in any species.

Many modifications to the system can be imagined. The authors mention the idea of using calling card insertions as molecular records of cell lineage or specific cellular events. Another possibility would be to use two or more distinguishable reporter-transposons, introduced at different times, to examine temporal changes in binding. It’s also possible to imagine using orthogonal transposase-transposon pairs to simultaneously monitor the binding of two TFs in the same cell.


Questions and open issues

As the authors point out, potential limitations are the sparsity of the data from single cells and the inherent bias in the insertion preferences of the transposase. In this context, it would be interesting to know if enhancers that produce eRNAs (~25% of enhancers) might be captured more frequently. Tweaks to the system and scaling up the datasets could address some of the shortcomings. Ultimately, analyzing the correlation between TF binding and the activity of individual genes in populations of many single cells would offer fantastically rich datasets from which to make gene regulatory predictions.

The current system relies on the ectopic expression of the TF-transposase fusion protein. Whether this results in aberrant binding of the TF to sites not normally occupied or whether the expression of the TF-transposase fusion protein has dominant effects that alter the state of the cells will depend on the details of the TF and cell types. Developing the system to allow the regulated expression from an endogenous gene would be one way around this limitation.

In the study, the authors use the interaction between piggyBac and Brd4 to their advantage. However, this could also be a limitation as it might result in a high background or confounding results that complicate the identification of binding events that are specific to a TF of interest. It was unclear how much the SP1-piggyBac fusion protein is recruited to Brd4 bound sites. Modifications that reduce or abrogate the Brd4 interaction, perhaps by using alternative transposases, would eliminate this concern. In addition, I’d be interested in seeing a comparison between BRD4 binding sites identified in the scCC system and other techniques, such as scATAC-seq, that mark accessible chromatin, to see how much overlap there is.



  1. Wang H, Mayhew D, Chen X, Johnston M, Mitra RD. (2012) “Calling cards” for DNA-binding proteins in mammalian cells. Genetics 190:941-9
  2. Cheetham SW, Gruhn WH, van den Ameele J, Krautz R, Southall TD, Kobayashi T, Surani MA, Brand AH. (2018) Targeted DamID reveals differential binding of mammalian pluripotency factors. Development 145:dev170209

Tags: single-cell sequencing, transcription factor

Posted on: 20 February 2019, updated on: 25 July 2020



Author's response

Arnav Moudgil and Rob Mitra shared

Thank you for selecting our manuscript for a preLight! You’ve provided a very nice summary of the single cell calling cards (scCC) method described in our manuscript. We have a few comments with regards to the questions that you raised:


  1. Data Sparsity. Increasing the number of calling card insertions collected per cell would reduce the cost of the method, and we’ve outlined several ways that this might be accomplished in our Discussion section. One promising future direction is to combine scCC with newer scRNA-seq techniques such as Cell Hashing, Sci-seq, or SPLIT-seq that substantially reduce the per-cell costs for library construction. Since scCC does not require much sequencing coverage per cell, and single cell transcriptomes do not need to be sequenced to high depth to be mapped onto high quality reference datasets, generating scCC libraries from these methods has the potential to significantly reduce costs.


  2. Ectopic expression versus “tagging” of endogenous genes. We agree that for certain applications it will be important to tag the transcription factor of interest with the transposase at its endogenous genomic locus. In fact, we do this routinely in yeast, an organism especially amenable to genome engineering (see e.g. We also agree that regulating the deposition of calling cards will further broaden the scope of the method, and we have developed robust methods to achieve chemical control over calling card deposition (e.g.


  3. The piggyBac-Brd4 interaction as a background for peak calling. Our standard calling card analysis pipeline compares TF-directed calling card data with a background distribution (e.g. calling cards deposited by the unfused piggyBac transposase) to identify genomic loci enriched for TF-directed insertions. The sites identified in this fashion should be TF-specific. For TFs that redirect Brd4 binding almost completely, or for experiments where it is not convenient to collect a background distribution, or for the analysis of Brd4 binding, the pipeline can also perform background-free binding site identification. We do agree that engineering other transposases for use with calling cards will be useful as it would remove the need to collect an unfused background distribution. One possibility is the SleepingBeauty transposase; as supplemental data, we characterized its background distribution with SRTs and found it to be relatively uniform across chromatin states.


  4. Calling cards and ATAC-seq. With regards to comparisons between calling card data and ATAC-seq data, we generally observe that calling card peaks overlap a small subset of ATAC-seq peaks, which is consistent with the notion that calling cards map the binding of a specific TF, while ATAC-seq provides a general measure of chromatin accessibility.
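To make the background comparison from the piggyBac-Brd4 point above more concrete, here is a minimal Python sketch of one way such an enrichment test could work. It is an illustration only and not the authors' pipeline: the choice of a Poisson model, the fixed genomic windows, the pseudocount, the threshold and all names and counts are assumptions introduced here.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)  # clamp tiny negative values from rounding


def enriched_windows(tf_counts, bg_counts, alpha=0.01, pseudocount=1.0):
    """Flag windows with more TF-directed insertions than the background predicts.

    tf_counts / bg_counts: dicts mapping a window id to an insertion count for
    the TF-transposase fusion and the unfused transposase, respectively.
    """
    tf_total = sum(tf_counts.values())
    bg_total = sum(bg_counts.values())
    if bg_total == 0:
        raise ValueError("background library contains no insertions")
    scale = tf_total / bg_total  # normalise for library size
    hits = []
    for window, observed in tf_counts.items():
        # The pseudocount avoids an expected count of zero in empty windows.
        expected = (bg_counts.get(window, 0) + pseudocount) * scale
        p_value = poisson_sf(observed, expected)
        if p_value < alpha:
            hits.append((window, observed, expected, p_value))
    return hits


# Toy counts, invented for illustration only.
tf_counts = {"w1": 30, "w2": 5, "w3": 5}
bg_counts = {"w1": 4, "w2": 18, "w3": 18}
hits = enriched_windows(tf_counts, bg_counts)
```

In this toy example only the window where the fusion protein inserts far more often than the unfused transposase is reported; windows dominated by background insertions, such as Brd4-rich regions might be, are not. A real analysis would also need to correct for multiple testing.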


Update 25.07.2020 How did the paper improve as a result of peer review?
Answers from Arnav Moudgil (taken from his personal Twitter page)

While there are a number of additions, I’ll focus on two major new components.

First we were asked to perform additional validation that this method works. The reviewers recognized the method’s potential but wanted to see more TFs, ideally in other cell lines as well. We have now mapped the sequence-specific TF SP1 in both HCT-116 and K562 cells. We then mapped the binding of FOXA2, a sequence-specific pioneer factor, in HepG2 cells. Finally, we mapped BAP1, a non-sequence-specific chromatin remodeler, in OCM-1A (uveal melanoma) cells. For all four TFs, we identified sharp peaks concordantly aligned with orthogonal data from bulk ChIP-seq or calling card experiments. For the three sequence-specific TFs, we could identify the TF’s motif from peaks. For BAP1, we detected the motif of YY1, a known co-factor. This establishes that scCC can accurately map TF binding sites from scRNA-seq.


The second major addition to the paper is the discovery of a bromodomain-dependent cell state transition in K562 cells. Previous work found that individual K562 cells dynamically oscillate between CD24-high and CD24-low states. Since we had already performed scCC with BRD4 in K562 cells, and since bromodomains like BRD4 have been implicated in regulating cell identity, we first asked whether we could detect these two states from scRNA-seq; and if so, is there differential BRD4 binding between them?

The answer to the first question was yes! We were indeed able to characterize a gradient of CD24 expression across K562 cells. We then classified these cells as either CD24-high or CD24-low and stratified their BRD4-directed single cell transpositions from the scCC library.

The answer to the second question also appears to be yes! We found several BRD4 peaks that were differentially bound, most of them preferentially found in CD24-high cells. This suggested to us that BRD4 might promote the transition from a CD24-low cell to a CD24-high cell.
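As a rough illustration of what testing a single peak for differential binding between two groups of cells might look like, here is a minimal Python sketch. The thread does not say which statistical test was used, so the one-sided Fisher's exact test here is an assumption, as are the function name and the counts.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a: insertions inside the peak from CD24-high cells
    b: insertions outside the peak from CD24-high cells
    c: insertions inside the peak from CD24-low cells
    d: insertions outside the peak from CD24-low cells
    Returns the probability of seeing at least `a` in-peak insertions from
    CD24-high cells if peak membership were independent of cell state.
    """
    row1 = a + b           # all insertions from CD24-high cells
    col1 = a + c           # all insertions inside the peak
    n = a + b + c + d      # all insertions
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy counts, invented for illustration only.
p_value = fisher_greater(12, 88, 2, 98)
```

A small p-value would suggest that the peak is preferentially bound in CD24-high cells. Testing many peaks in this way would call for a multiple-testing correction.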

So, we further investigated whether BRD4 was responsible for mediating these states. Summarizing several experiments to tease this out, we found that both pharmacological (JQ1 treatment) and genetic (BRD4 CRISPRi) perturbations reduced the proportion of CD24-high cells.

The CD24-high and -low states have also been shown to have different chemosensitivities, with the latter being more susceptible to the drug imatinib. Since JQ1 treatment increased the proportion of CD24-low cells, would it also increase imatinib sensitivity? Indeed, it did! This result was particularly exciting as it could have therapeutic applications. It also dovetails nicely with growing evidence that bromodomains regulate cell state transitions in disease. For another example, see Michael Alexanian’s very recent preprint.


These were the major highlights of the revision, but of course there are more details interspersed throughout. Take a look and let us know what you think! Sequencing data and processed files are on GEO and code is on GitHub.


