High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue-specificity

Kaia Mattioli, Pieter-Jan Volders, Chiara Gerhardinger, James C. Lee, Philipp G. Maass, Marta Mele, John L. Rinn

Preprint posted on December 04, 2018 https://www.biorxiv.org/content/10.1101/482232v2

What core promoter sequence features are important for gene activity? Mattioli et al. uncover an important role of overlapping motifs

Selected by Clarice Hong

Categories: genomics, systems biology

Background

Although transcription initiates from mRNA promoters, long non-coding RNA (lncRNA) promoters and enhancers (which produce enhancer RNAs, or eRNAs), each of these classes of genomic sequence has a very different expression profile. Specifically, lncRNAs and eRNAs are less active and more tissue-specific than mRNAs. These differences in expression are presumably encoded, at least in part, in the genomic sequence itself; however, it remains unclear which sequence features determine the different transcriptional patterns. Furthermore, a subclass of transcribed sequences, known as ‘divergent’ promoters, produces two stable transcripts, one in the sense and one in the antisense direction. Whether divergent transcription arises from a single promoter with unique sequence features or from two closely spaced promoters remains unknown. To understand the sequence features underlying these different promoter types, the authors used massively parallel reporter assays (MPRAs) to measure the intrinsic transcriptional activity of hundreds of promoters and enhancers in different cell types.

Key findings

The authors first grouped the genomic sequences that initiate transcription into five categories: eRNAs, intergenic lncRNAs (lincRNAs), divergent lncRNAs, mRNAs and divergent mRNAs. They then selected high-confidence transcription start sites (TSSs) for each category from three cell lines (K562, HepG2 and HeLa) and designed sequences covering each core promoter to test for transcriptional activity. In the MPRA, each core promoter is linked to a unique barcode sequence that is transcribed. The activity of each promoter is then calculated as the ratio of RNA barcode counts to input DNA barcode counts; dividing by the DNA counts corrects for differences in how abundantly each construct is represented in the library. Using this method, the authors found that both divergent mRNA and divergent lncRNA promoters tended to be more active than their non-divergent counterparts, suggesting that divergent promoters are intrinsically stronger. Furthermore, at least part of the tissue-specificity of core promoters appears to be encoded in the core promoter sequence itself, since the MPRA was able to recapitulate tissue-specific expression. Thus, the core promoter sequence alone can explain some of the differences between promoter classes.
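
The RNA/DNA calculation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-barcode read counts are available as dictionaries and that a promoter may be represented by several barcodes, and it adds three common but assumed steps (scaling each library to counts per million, adding a pseudocount, and taking the median of the per-barcode log2 ratios). The function names and barcode IDs are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def cpm(counts):
    """Scale raw barcode counts to counts per million (assumed depth normalisation)."""
    total = sum(counts.values())
    return {bc: 1e6 * n / total for bc, n in counts.items()}


def promoter_activity(rna_counts, dna_counts, barcode_to_promoter, pseudocount=1.0):
    """Return a log2(RNA/DNA) activity score for each promoter.

    rna_counts, dna_counts: dict mapping barcode -> raw read count.
    barcode_to_promoter: dict mapping barcode -> promoter ID.
    Barcodes missing from the DNA library are skipped, since their input is unknown.
    """
    rna, dna = cpm(rna_counts), cpm(dna_counts)
    log_ratios = defaultdict(list)
    for bc, promoter in barcode_to_promoter.items():
        if bc not in dna:
            continue
        # The pseudocount avoids log2(0) for barcodes with no RNA reads.
        ratio = (rna.get(bc, 0.0) + pseudocount) / (dna[bc] + pseudocount)
        log_ratios[promoter].append(math.log2(ratio))
    # Summarise each promoter by the median over its barcodes.
    return {p: median(vals) for p, vals in log_ratios.items()}


# Hypothetical counts: promA's barcodes carry more RNA relative to DNA than promB's.
rna = {"bc1": 300, "bc2": 100, "bc3": 50, "bc4": 50}
dna = {"bc1": 100, "bc2": 100, "bc3": 100, "bc4": 100}
mapping = {"bc1": "promA", "bc2": "promA", "bc3": "promB", "bc4": "promB"}
scores = promoter_activity(rna, dna, mapping)
print(scores)
```

Using the median rather than the mean keeps a single aberrant barcode from dominating a promoter's score; other summaries are equally reasonable.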

To determine which sequence features discriminate between promoter classes, the authors examined two main features: transcription factor (TF) motif architecture (the suite of TFs that bind to the sequence) and the cell-type-specificity of the TFs that bind to the core promoter. TF motif architecture was further split into two measures: the number of independent binding sites in the sequence and the number of overlapping motifs. Using these three features, they fit a linear model to the MPRA data to see which feature contributes most to core promoter activity. While the number of binding sites and the number of overlapping motifs (both measures of TF motif architecture) could explain some of the variation, the cell-type-specificity of the TFs contributed almost nothing to core promoter activity. This suggests that core promoter strength depends on TF motif architecture, but that architecture alone is not sufficient, since each of these features explains less than 20% of the variance.
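
For a linear model with a single predictor, the "fraction of variance explained" quoted above is the coefficient of determination, R², which equals the squared Pearson correlation between the feature and activity. The sketch below computes it for each feature on its own in plain Python; it is a hypothetical illustration, and the preprint's model may have been specified differently (for example with features fit jointly or with additional covariates). The feature names and values are toy placeholders, not data from the preprint.

```python
def r_squared(x, y):
    """R^2 of an ordinary least-squares fit y ~ a + b*x (equals Pearson r squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature (or constant activity) explains no variance
    return sxy * sxy / (sxx * syy)


def variance_explained_per_feature(features, activity):
    """features: dict mapping feature name -> one value per promoter (hypothetical layout)."""
    return {name: r_squared(values, activity) for name, values in features.items()}


# Toy values for illustration only, not data from the preprint.
activity = [0.5, 1.2, 0.8, 2.0, 1.5, 0.3]
features = {
    "n_binding_sites": [2, 4, 3, 6, 5, 1],
    "n_overlapping_motifs": [0, 2, 1, 3, 1, 0],
    "tf_cell_type_specificity": [0.7, 0.2, 0.5, 0.4, 0.9, 0.3],
}
for name, r2 in variance_explained_per_feature(features, activity).items():
    print(f"{name}: R^2 = {r2:.2f}")
```

An R² below 0.2 for a feature means that more than 80% of the variance in activity is left unexplained by that feature alone.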

Using the same metrics, the authors then analysed publicly available CAGE data (which measures the activity of each TSS in the genome) and found that overlapping TF motifs are correlated with higher core promoter activity and lower tissue-specificity. They therefore hypothesised that disruptions of overlapping motifs would have larger effects than disruptions of individual motifs, since a single nucleotide change within an overlap can disrupt several TF binding sites at once. To test this, they designed a second library of core promoter sequences from 21 disease-associated genes and 5 nearby lncRNAs and eRNAs, carrying single-nucleotide deletions spanning the core promoter. Indeed, the effect size of each deletion was modestly correlated with the number of motifs it is predicted to disrupt, supporting the idea that overlapping TF motifs are important for strong promoter activity. The same held for disease-associated single nucleotide polymorphisms (SNPs): SNPs in overlapping motifs led to larger expression changes. From these results, the authors concluded that overlapping binding sites for different TFs allow a core promoter to maintain high expression ubiquitously across cell types (Figure 1).
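
The link between single-nucleotide deletions and overlapping motifs rests on a simple count: how many predicted motif occurrences cover each position. A minimal sketch, assuming motif hits from some motif scanner are given as 0-based, half-open (start, end) intervals on the core promoter sequence (the interval format and function names are hypothetical), counts for every position the number of motifs a deletion there would be predicted to disrupt. It also counts pairs of overlapping hits, which is only one possible definition of the "number of overlapping motifs" feature.

```python
from itertools import combinations


def motifs_per_position(seq_len, motif_hits):
    """Count, for each position, the motif hits that cover it.

    motif_hits: list of (start, end) half-open intervals, 0-based (assumed format).
    A single-nucleotide deletion at position i is predicted to disrupt coverage[i] motifs.
    """
    coverage = [0] * seq_len
    for start, end in motif_hits:
        # Clip to the sequence so hits near the edges don't index out of range.
        for i in range(max(start, 0), min(end, seq_len)):
            coverage[i] += 1
    return coverage


def count_overlapping_pairs(motif_hits):
    """Number of pairs of motif hits that share at least one position."""
    return sum(
        1
        for (s1, e1), (s2, e2) in combinations(motif_hits, 2)
        if s1 < e2 and s2 < e1
    )


# Hypothetical hits on a 20-bp sequence: two overlapping motifs and one isolated motif.
hits = [(2, 8), (5, 11), (14, 18)]
cov = motifs_per_position(20, hits)
print(cov)  # → [0, 0, 1, 1, 1, 2, 2, 2, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
print(count_overlapping_pairs(hits))  # → 1
```

One way to make the comparison described above would be to correlate these per-position counts with the measured deletion effect sizes, for instance with a rank correlation; the authors' exact procedure may differ.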

Figure 1: Summary of gene expression regulation by core promoters (Figure 5 from preprint). High and ubiquitous expression is associated with more overlapping TF motifs, while low and tissue-specific expression tends to have fewer TF motifs.

What I liked

As a student trying to understand the regulation of gene expression, I find the question of which core promoter sequence features determine activity very interesting. It is especially exciting now that we know far more of the genome is transcribed than was once expected. Since different groups of genes clearly have very different expression patterns, we need to find the rules governing these patterns. In this preprint, the authors took this one step further and used one of the rules they learnt (the importance of overlapping TF motifs) to interpret the effects of known disease-associated SNPs in core promoters, which will be very useful for understanding non-coding disease variants. Furthermore, MPRAs are a powerful way to assay the activity of many DNA sequences in parallel, so I like seeing them applied to this question. The approach also provides a great tool for further study of TF binding sites and of how variants affect TF binding and expression.

Future directions and questions

The biggest question I have is what else drives the differences in expression level and tissue-specificity, since the features tested left at least half of the variance unexplained. Could other sequence features, for example DNA shape, play a role? Specific combinations of TF motifs might also be important, since low-affinity binding sites that motif finders usually miss can still be used in the genome in combination with the right partners. Are there also sequence features that determine whether a promoter is divergent or non-divergent? Finally, the same rules used to explain differences between promoter categories appear to apply within each category, which suggests that features like TF motif architecture may not distinguish between promoter categories at all, but simply discriminate between high and low expression and between ubiquitous and tissue-specific expression. This raises the question of whether lncRNA promoters, mRNA promoters and even enhancers producing eRNAs are categorically different, or whether they are simply transcribed according to the same rules to produce transcripts with different functions.

 

Posted on: 29th January 2019, updated on: 30th January 2019


    Also in the genomics category:

    Reconstruction of the global neural crest gene regulatory network in vivo

    Ruth M Williams, Ivan Candido-Ferreira, Emmanouela Repapi, et al.



    Selected by Hannah Brunsdon

    Charting a tissue from single-cell transcriptomes

    Mor Nitzan, Nikos Karaiskos, Nir Friedman, et al.



    Selected by Irepan Salvador-Martinez

    Single cell RNA-Seq reveals distinct stem cell populations that drive sensory hair cell regeneration in response to loss of Fgf and Notch signaling

    Mark E. Lush, Daniel C. Diaz, Nina Koenecke, et al.

    AND

    Distinct progenitor populations mediate regeneration in the zebrafish lateral line.

    Eric D Thomas, David Raible



    Selected by Rudra Nayan Das


    Maintenance of spatial gene expression by Polycomb-mediated repression after formation of a vertebrate body plan

    Julien Rougot, Naomi D Chrispijn, Marco Aben, et al.



    Selected by Yen-Chung Chen


    The embryonic transcriptome of Arabidopsis thaliana

    Falko Hofmann, Michael A Schon, Michael D Nodine



    Selected by Chandra Shekhar Misra


    Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells

    Mridusmita Saikia, Philip Burnham, Sara H Keshavjee, et al.

    AND

    High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes

    Mandeep Singh, Ghamdan Al-Eryani, Shaun Carswell, et al.



    Selected by Samantha Seah

    The microbial basis of impaired wound healing: differential roles for pathogens, "bystanders", and strain-level diversification in clinical outcomes

    Lindsay Kalan, Jacquelyn S Meisel, Michael A Loesche, et al.



    Selected by Snehal Kadam

    Comparative analysis of droplet-based ultra-high-throughput single-cell RNA-seq systems

    Xiannian Zhang, Tianqi Li, Feng Liu, et al.



    Selected by Samantha Seah

    PUMILIO hyperactivity drives premature aging of Norad-deficient mice

    Florian Kopp, Mehmet Yalvac, Beibei Chen, et al.



    Selected by Carmen Adriaens

    LCM-seq reveals unique transcriptional adaption mechanisms of resistant neurons in spinal muscular atrophy

    Susanne Nichterwitz, Helena Storvall, Jik Nijssen, et al.

    AND

    Axon-seq decodes the motor axon transcriptome and its modulation in response to ALS

    Jik Nijssen, Julio Cesar Aguila Benitez, Rein Hoogstraaten, et al.



    Selected by Yen-Chung Chen

    LADL: Light-activated dynamic looping for endogenous gene expression control

    Mayuri Rege, Ji Hun Kim, Jacqueline Valeri, et al.



    Selected by Ivan Candido-Ferreira

    Precise tuning of gene expression output levels in mammalian cells

    Yale S. Michaels, Mike B Barnkob, Hector Barbosa, et al.



    Selected by Tim Fessenden


    Template switching causes artificial junction formation and false identification of circular RNAs

    Chong Tang, Tian Yu, Yeming Xie, et al.



    Selected by Fabio Liberante

    The genomic basis of colour pattern polymorphism in the harlequin ladybird

    Mathieu Gautier, Junichi Yamaguchi, Julien Foucaud, et al.



    Selected by Fillip Port

    Widespread inter-individual gene expression variability in Arabidopsis thaliana

    Sandra Cortijo, Zeynep Aydin, Sebastian Ahnert, et al.



    Selected by Martin Balcerowicz

    Single-cell Map of Diverse Immune Phenotypes Driven by the Tumor Microenvironment

    Elham Azizi, Ambrose J. Carr, George Plitas, et al.



    Selected by Tim Fessenden

    Also in the systems biology category:

    Lineage tracing on transcriptional landscapes links state to fate during differentiation

    Caleb Weinreb, Alejo E Rodriguez-Fraticelli, Fernando D Camargo, et al.



    Selected by Yen-Chung Chen


    Short-range interactions govern cellular dynamics in microbial multi-genotype systems

    Alma Dal Co, Simon van Vliet, Daniel Johannes Kiviet, et al.

    AND

    Rapid microbial interaction network inference in microfluidic droplets

    Ryan H Hsu, Ryan L Clark, Jin Wei Tan, et al.



    Selected by Connor Rosen

    High-throughput functional analysis of lncRNA core promoters elucidates rules governing tissue-specificity

    Kaia Mattioli, Pieter-Jan Volders, Chiara Gerhardinger, et al.



    Selected by Clarice Hong

    Variability of bacterial behavior in the mammalian gut captured using a growth-linked single-cell synthetic gene oscillator

    David T Riglar, David L Richmond, Laurent Potvin-Trottier, et al.



    Selected by Meng Zhu

    Charting a tissue from single-cell transcriptomes

    Mor Nitzan, Nikos Karaiskos, Nir Friedman, et al.



    Selected by Irepan Salvador-Martinez

    Large-scale analyses of human microbiomes reveal thousands of small, novel genes and their predicted functions

    Hila Sberro, Nicholas Greenfield, Georgios Pavlopoulos, et al.



    Selected by Ganesh Kadamur

    Symmetry breaking in the embryonic skin triggers a directional and sequential front of competence during plumage patterning

    Richard Bailleul, Carole Desmarquet-Trin Dinh, Magdalena Hidalgo, et al.



    Selected by Alexa Sadier

    RNase L reprograms translation by widespread mRNA turnover escaped by antiviral mRNAs

    James M Burke, Stephanie L Moon, Evan T Lester, et al.



    Selected by Connor Rosen

    Acquired interbacterial defense systems protect against interspecies antagonism in the human gut microbiome

    Benjamin D. Ross, Adrian J. Verster, Matthew C. Radey, et al.



    Selected by Connor Rosen

    DNA microscopy: Optics-free spatio-genetic imaging by a stand-alone chemical reaction

    Joshua A. Weinstein, Aviv Regev, Feng Zhang



    Selected by Theo Sanderson


    The Toll pathway inhibits tissue growth and regulates cell fitness in an infection-dependent manner

    Federico Germani, Daniel Hain, Denise Sternlicht, et al.



    Selected by Rohan Khadilkar

    LCM-seq reveals unique transcriptional adaption mechanisms of resistant neurons in spinal muscular atrophy

    Susanne Nichterwitz, Helena Storvall, Jik Nijssen, et al.

    AND

    Axon-seq decodes the motor axon transcriptome and its modulation in response to ALS

    Jik Nijssen, Julio Cesar Aguila Benitez, Rein Hoogstraaten, et al.



    Selected by Yen-Chung Chen

    Memory sequencing reveals heritable single cell gene expression programs associated with distinct cellular behaviors

    Sydney M Shaffer, Benjamin L Emert, Ann E. Sizemore, et al.



    Selected by Leighton Daigh


    Conserved phosphorylation hotspots in eukaryotic protein domain families

    Marta J Strumillo, Michaela Oplova, Cristina Vieitez, et al.



    Selected by Gautam Dey

    LADL: Light-activated dynamic looping for endogenous gene expression control

    Mayuri Rege, Ji Hun Kim, Jacqueline Valeri, et al.



    Selected by Ivan Candido-Ferreira

    A minimal "push-pull" bistability model explains oscillations between quiescent and proliferative cell states.

    Sandeep Krishna, Sunil Laxman



    Selected by Lauren Neves
