Genome-wide maps of enhancer regulation connect risk variants to disease genes

Joseph Nasser, Drew T. Bergman, Charles P. Fulco, Philine Guckelberger, Benjamin R. Doughty, Tejal A. Patwardhan, Thouis R. Jones, Tung H. Nguyen, Jacob C. Ulirsch, Heini M. Natri, Elle M. Weeks, Glen Munson, Michael Kane, Helen Y. Kang, Ang Cui, John P. Ray, Tom M. Eisenhaure, Kristy Mualim, Ryan L. Collins, Kushal Dey, Alkes L. Price, Charles B. Epstein, Anshul Kundaje, Ramnik J. Xavier, Mark J. Daly, Hailiang Huang, Hilary K. Finucane, Nir Hacohen, Eric S. Lander, Jesse M. Engreitz

Preprint posted on 3 September 2020

Having a hard time finding the needle in the haystack? The activity-by-contact model connects risk variants to their target genes to prioritize functional follow-up of GWAS hits.

Selected by Jesus Victorino

Categories: genetics, genomics

*If you liked this preLight, please click on the thumbs-up icon at the end of the page. Any comments, suggestions or questions about either the scientific discussion or the format are more than welcome and very much appreciated. You can write directly at the bottom of this page or contact me by email or Twitter.


Last week I joined the #PreprintReviewChallenge, a great (and virtual) initiative organized by @ASAPbio_ and supported by @preLights, @PREreview_, @PeerCommunityIn & @PubPeer to build trust in #preprints. It was great to see more than 50 people, most of whom were early-career researchers, gathering to chat about science and discuss each other's experiences. For the science part, I proposed the latest manuscript from the labs of Jesse Engreitz and Eric Lander. A week later (better late than never!), here is my highlight, including parts of the discussion that Iratxe Puebla, Julien Roux and I had during the event.

Summary & background

In the GWAS era that we live in, thousands of risk variants have already been associated with diseases [1]. For the most studied traits, the list of candidate loci contributing to common polygenic disorders exceeds a hundred, and the number keeps growing as sample sizes increase. The majority (>80%) of associated polymorphisms lie in the non-coding genome, where they might affect the activity of regulatory elements and, therefore, gene expression [2]. But which genes do they regulate? And in which tissues?

Given the huge number of possible scenarios for a given disease, it is crucial to prioritize the candidate regions on which to focus functional studies. In this preprint, Nasser, Bergman, Fulco, Guckelberger, Doughty et al. build maps of enhancers and their target genes in over a hundred biosamples using the 'Activity-by-Contact' (ABC) model, which takes into account chromatin accessibility, enhancer marks and enhancer-promoter 3D contact [3]. They integrate these maps with variants associated with inflammatory bowel disease, among other traits, to predict the variants' target genes and the relevant tissue (Fig. 1). Using this approach, the authors identify an enhancer linked to inflammatory bowel disease that affects the metabolic state of mitochondria in immune cells. This work provides an interesting and powerful approach to characterize enhancer landscapes and their effect on the regulation of genes involved in disease.
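To make the idea behind the model more concrete, here is a toy Python sketch of the ABC score as it was described in Fulco et al. 2019 [3]. An element's Activity is the geometric mean of its chromatin accessibility and H3K27ac signal. That Activity is multiplied by the element's Contact with a gene's promoter, for example its Hi-C frequency. The product is then divided by the sum of the same product over all candidate elements near that gene, which that paper defined as within 5 Mb. This is only an illustration under stated assumptions, not the preprint's pipeline. The `Element` fields, the signal values and the 0.02 cut-off used to link a variant-containing element to genes are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    """A candidate enhancer scored against one gene (hypothetical toy fields)."""
    name: str
    dnase: float    # chromatin accessibility signal, e.g. DNase-seq reads
    h3k27ac: float  # enhancer-mark signal, e.g. H3K27ac ChIP-seq reads
    contact: float  # 3D contact frequency (e.g. Hi-C) with the gene's promoter


def activity(e: Element) -> float:
    # Activity = geometric mean of accessibility and H3K27ac signal.
    return math.sqrt(e.dnase * e.h3k27ac)


def abc_scores(elements: list[Element]) -> dict[str, float]:
    """ABC score of every candidate element for a single gene.

    Each element's Activity x Contact is divided by the sum of Activity x Contact
    over all elements passed in. The caller is assumed to pass every candidate
    element within the window around the gene (5 Mb in Fulco et al. 2019).
    """
    products = {e.name: activity(e) * e.contact for e in elements}
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in products}
    return {name: p / total for name, p in products.items()}


def predicted_targets(scores_by_gene: dict[str, dict[str, float]],
                      element: str, threshold: float = 0.02) -> list[str]:
    """Genes whose ABC score for `element` reaches `threshold` (illustrative cut-off)."""
    return sorted(
        gene for gene, scores in scores_by_gene.items()
        if scores.get(element, 0.0) >= threshold
    )


# Toy usage with made-up signals: suppose a risk variant lies inside "enh1".
scores_by_gene = {
    "GENE_A": abc_scores([Element("enh1", 4, 9, 0.5), Element("enh2", 1, 4, 0.5)]),
    "GENE_B": abc_scores([Element("enh1", 4, 9, 0.001), Element("enh3", 9, 9, 0.5)]),
}
print(scores_by_gene["GENE_A"])                   # {'enh1': 0.75, 'enh2': 0.25}
print(predicted_targets(scores_by_gene, "enh1"))  # ['GENE_A']
```

Because of the normalization, an element's score for a gene can be read roughly as its share of that gene's total Activity x Contact. In this toy case, "enh1" is linked to GENE_A and not to GENE_B, even though both genes lie near it.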

Figure 1. Activity by contact on over 100 biosamples to prioritize disease-associated enhancer-gene pairs (taken from Fig. 1a of the preprint).

Key results

– Mapping of over 6 million enhancer-gene connections across >100 biological samples.
– Prediction of the target genes for nearly 5,000 variants within enhancers across 72 traits.
– Prioritization of 14 new candidate genes for inflammatory bowel disease, including PPIF.
– An enhancer controlling PPIF gene expression modulates mitochondrial function in immune cells responding to inflammatory stimuli.

How this work moves the field forward

In the last five years, GWAS have identified over a hundred variants associated with each of several common traits, such as cardiovascular diseases [4, 5, 6, 7]. The marked increase in the number of both cases and controls in such studies has allowed many new SNPs to reach the widely accepted threshold for genome-wide significance (P < 5 × 10^-8), and this number will presumably keep growing as sample sizes increase. In fact, several studies have also included sub-threshold SNPs when assessing the functional activity of associated non-coding regions, since these SNPs are enriched for epigenetic signals specific to disease-relevant tissues [8]. However, sub-threshold SNPs are very likely to be functionally weaker than genome-wide significant variants, as suggested by their lower contribution to genetic risk scores [6, 9].

Considering all this, I can't help but wonder whether GWAS will keep identifying variants forever, or whether we will reach a paradoxical situation in which every single variant in the genome is associated with every single trait, albeit very weakly. Coming back to a more pragmatic view of the current situation, high-throughput screens that functionally validate the activity of associated regions will be of key importance, and in this respect massively parallel reporter assays (MPRAs) are very promising. Nevertheless, such assays have many limitations, such as being restricted to cell culture. Therefore, to elucidate the role of disease-relevant variants, we still need lower-throughput genetic approaches that focus on a small number of loci.

Arguably, the field faces two main limitations. The first is a constantly growing number of associations, not all of which are relevant. The second is the need for time-consuming techniques, which are a bottleneck for scientific discovery. For this reason, prioritization is of tremendous importance, since we cannot test everything thoroughly. Works like the one presented by Nasser et al. provide tools to find the needle in a haystack full of associations, which will help dissect the regulatory code and understand the genetic contribution to disease.

Question to authors

1. In this preprint, the authors identified around 6 million enhancer-gene connections based on the ABC model in different biosamples. Do the authors have an estimate of the false-positive rate that should be expected among them? Are the authors planning any systematic validation to estimate the performance of the prediction tool?

2. Each year, GWAS identify many new loci for common diseases. How easy would it be to systematically add new associations, in this case for inflammatory bowel disease, to the prioritized set of enhancer-gene pairs?

3. To predict ABC enhancers, the authors use data on chromatin accessibility, histone marks and Hi-C. How do the authors envision incorporating MPRA data into the ABC model in cell types where such a resource is available?


References

1. NHGRI-EBI GWAS Catalog.

2. Manolio TA. 2010. Genomewide Association Studies and Assessment of the Risk of Disease. N Engl J Med 363, 166–176.

3. Fulco CP, Nasser J et al. 2019. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat Genet 51, 1664–1669.

4. Nielsen JB et al. 2018. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat Genet 50, 1234–1239.

5. Roselli C et al. 2018. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet 50, 1225–1233.

6. Nelson CP et al. 2017. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet 49, 1385–1391.

7. van der Harst P et al. 2018. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease. Circ Res 122(3), 433–443.

8. Wang X et al. 2016. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 5:e10557.

9. Villar D et al. 2020. The contribution of non-coding regulatory elements to cardiovascular disease. Open Biol 10:200088.

Tags: activity by contact, gene expression, gwas

Posted on: 30 September 2020


