High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue-specificity

Kaia Mattioli, Pieter-Jan Volders, Chiara Gerhardinger, James C. Lee, Philipp G. Maass, Marta Mele, John L. Rinn

Preprint posted on December 04, 2018

What core promoter sequence features are important for gene activity? Mattioli et al. uncover an important role for overlapping motifs.

Selected by Clarice Hong

Categories: genomics, systems biology


Transcription initiates from mRNA promoters, long non-coding RNA (lncRNA) promoters and enhancers (which produce enhancer RNAs, or eRNAs), yet each of these classes of genomic sequences has a very different expression profile. Specifically, lncRNAs and eRNAs are less active and more tissue-specific than mRNAs. These different expression patterns must be encoded by the genomic sequence itself, but it remains unclear which sequence features determine them. Furthermore, a subclass of transcribed sequences, known as ‘divergent’ promoters, produces two stable transcripts, one in the sense and one in the antisense direction. Whether a ‘divergent’ transcript is produced by one promoter with unique sequence features or by two proximal promoters remains unknown. Thus, to understand the sequence features underlying different promoter types, the authors used massively parallel reporter assays (MPRAs) to measure the intrinsic transcriptional activity of hundreds of promoters and enhancers in different cell types.

Key findings

The authors first grouped the genomic sequences that initiate transcription into 5 categories: eRNAs, intergenic lncRNAs (lincRNAs), divergent lncRNAs, mRNAs and divergent mRNAs. They then selected high-confidence transcription start sites (TSSs) for each category from 3 different cell lines (K562, HepG2 and HeLa) and designed sequences covering the core promoter to test for transcriptional activity. For the MPRA, each core promoter is linked to a unique barcode sequence that is transcribed. The activity of each promoter is then calculated by taking the RNA barcode counts divided by the DNA input barcode counts. Using this method, the authors found that both divergent mRNA and lncRNA promoters tended to be more active than their non-divergent counterparts, suggesting that divergent promoters are intrinsically stronger than non-divergent promoters. Furthermore, at least part of the tissue-specificity of core promoters appears to be encoded in the core promoter sequence itself, since the MPRA was able to recapitulate tissue-specific expression. Thus, the core promoter sequence alone can explain some of the differences between the different classes of promoters.
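The activity calculation described above (RNA barcode counts divided by DNA input barcode counts) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `mpra_activity`, the barcode labels and the counts are all hypothetical, and skipping barcodes with no DNA input is an assumption made here to avoid dividing by zero.

```python
from typing import Dict


def mpra_activity(rna_counts: Dict[str, int],
                  dna_counts: Dict[str, int]) -> Dict[str, float]:
    """Return the RNA/DNA barcode count ratio for each barcode.

    Hypothetical helper: one barcode stands for one core promoter.
    """
    activity = {}
    for barcode, dna in dna_counts.items():
        if dna == 0:
            # No input DNA detected, so the ratio is undefined (assumption: skip).
            continue
        # Barcodes never seen in the RNA are treated as zero RNA counts.
        activity[barcode] = rna_counts.get(barcode, 0) / dna
    return activity


# Toy counts, invented for illustration only.
dna = {"BC_promA": 100, "BC_promB": 200, "BC_promC": 0}
rna = {"BC_promA": 300, "BC_promB": 50}
result = mpra_activity(rna, dna)
```

In practice, counts would usually also be normalised for sequencing depth before taking the ratio; that step is left out here to keep the sketch short.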

To determine the sequence features that discriminate between different promoters, the authors looked at two main features: the transcription factor (TF) motif architecture (the suite of TFs that bind to the sequence) and the cell-type-specificity of the TFs that bind to the core promoter. TF motif architecture was further subdivided into two parts: the number of independent binding sites in the sequence and the number of overlapping motifs. Using the resulting three features, they fit a linear model to the MPRA data to see which feature contributes most to core promoter activity. They found that while the number of binding sites and the number of overlapping motifs (both part of TF motif architecture) could explain some of the variation, the cell-type-specificity of the TFs contributed almost nothing to core promoter activity. This suggests that the strength of a core promoter depends on its TF motif architecture, but that this architecture is not sufficient on its own, since each of the two features explains less than 20% of the variation.
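To make "variance explained" concrete, here is a small sketch that computes, for each feature separately, the fraction of variation in activity captured by a one-feature linear fit (the squared Pearson correlation). This is a simplification of the linear model described above, not the authors' analysis: the function name `r_squared`, the feature names and all values are invented for illustration.

```python
from statistics import mean


def r_squared(x, y):
    """Fraction of variance in y explained by a one-feature linear fit on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant feature (or constant activity) explains nothing.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Toy data, invented for illustration: one value per core promoter.
activity = [0.5, 1.0, 1.5, 2.5, 3.0, 4.0]
features = {
    "n_binding_sites": [1, 2, 2, 4, 5, 6],
    "n_overlapping_motifs": [0, 1, 3, 2, 5, 6],
    "tf_specificity": [0.9, 0.2, 0.7, 0.3, 0.8, 0.4],
}

# Variance explained by each feature on its own.
explained = {name: r_squared(values, activity)
             for name, values in features.items()}
```

A feature whose value in `explained` is close to 0 contributes almost nothing on its own, which is the sense in which the cell-type-specificity of the TFs is described as uninformative.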

Using the same metrics, the authors then looked at publicly available CAGE data (which measure the activity of each TSS in the genome) and found that overlapping TF motifs are correlated with higher core promoter activity and lower tissue-specificity. They thus hypothesised that disruptions in overlapping motifs would have a larger effect size than disruptions in individual motifs, since they are likely to have more severe consequences for promoter activity. To test this, they designed a second library of core promoter sequences from 21 disease-associated genes and 5 nearby lncRNAs and eRNAs, with single-nucleotide deletions spanning the core promoter. Indeed, the effect size of each deletion is somewhat correlated with the number of motifs it is predicted to disrupt, suggesting that overlapping TF motifs are indeed predictive of stronger promoter activity. This was also true for disease-associated single nucleotide polymorphisms (SNPs), as SNPs in overlapping motifs led to larger expression changes. From these results, the authors concluded that overlapping binding sites for different TFs allow a core promoter to be ubiquitously expressed across cell types and maintain high expression (Figure 1).

Figure 1: Summary of gene expression regulation by core promoters (Figure 5 from preprint). High and ubiquitous expression is associated with more overlapping TF motifs, while low and tissue-specific expression tends to have fewer TF motifs.
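The deletion analysis described above relies on knowing how many predicted motifs each nucleotide falls in. A minimal sketch of that bookkeeping is shown below. It assumes motif hits are given as half-open `(start, end)` intervals on the core promoter; the function name `motifs_disrupted` and the coordinates are hypothetical and do not come from the preprint.

```python
def motifs_disrupted(seq_len, motif_hits):
    """Count, for each position, how many motif hits cover it.

    motif_hits: list of (start, end) intervals, 0-based, end exclusive.
    A single-nucleotide deletion at position i is assumed to disrupt
    every motif whose interval contains i.
    """
    counts = [0] * seq_len
    for start, end in motif_hits:
        # Clip hits to the sequence so out-of-range coordinates are ignored.
        for pos in range(max(start, 0), min(end, seq_len)):
            counts[pos] += 1
    return counts


# Toy example: three predicted motifs on a 10-nucleotide sequence.
hits = [(2, 6), (4, 9), (5, 8)]
counts = motifs_disrupted(10, hits)
# Position 5 lies inside all three motifs, so counts[5] is 3.
```

These per-position counts could then be compared with the measured effect size of each deletion, for example by computing a correlation between the two.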

What I liked

As a student trying to understand the regulation of gene expression, the question of what sequence features of core promoters determine their activity is very interesting to me. This is especially exciting since we found out that so much more of the genome than we expected is transcribed. Since different groups of genes clearly have very different expression patterns, we need to find the rules governing these patterns. In this preprint, the authors took this one step further, and used some of the rules they learnt (overlapping TF motifs) to identify and determine the function of known SNPs in core promoters, which will be very useful for the understanding of non-coding disease variants. Furthermore, the MPRA is a powerful technique used to assay the activity of many DNA sequences, so I like that MPRAs are being used for this purpose. This also provides a great tool for the further study of TF binding sites and how variants affect TF binding and expression.

Future directions and questions

The biggest question that I have is what else is causing the differential expression levels and tissue specificity, since the features tested left at least half of the variance unexplained. Can we consider other sequence features, for example, the shape of the DNA? The specific combinations of TF motifs might also be important, since low-affinity binding sites that are not usually picked up by motif finders can be used in the genome in combination with the right partners. Furthermore, are there any sequence features that might lead to a divergent vs non-divergent promoter? It also appears that the same rules used to explain the difference between categories of promoters can be applied within each group of promoters, which suggests that perhaps things like TF motif architecture do not distinguish between the different promoter categories, but simply discriminate between high and low expression and tissue-specificity. This raises the question of whether lncRNA and mRNA promoters and even eRNAs are categorically different, or whether they are simply transcribed according to the same rules to produce transcripts of different functions.


Posted on: 29th January 2019, updated on: 30th January 2019

