Genome-wide maps of enhancer regulation connect risk variants to disease genes

Joseph Nasser, Drew T. Bergman, Charles P. Fulco, Philine Guckelberger, Benjamin R. Doughty, Tejal A. Patwardhan, Thouis R. Jones, Tung H. Nguyen, Jacob C. Ulirsch, Heini M. Natri, Elle M. Weeks, Glen Munson, Michael Kane, Helen Y. Kang, Ang Cui, John P. Ray, Tom M. Eisenhaure, Kristy Mualim, Ryan L. Collins, Kushal Dey, Alkes L. Price, Charles B. Epstein, Anshul Kundaje, Ramnik J. Xavier, Mark J. Daly, Hailiang Huang, Hilary K. Finucane, Nir Hacohen, Eric S. Lander, Jesse M. Engreitz

Posted on: 30 September 2020

Preprint posted on 3 September 2020

Having a hard time finding the needle in the haystack? The activity-by-contact model connects risk variants to their target genes to prioritize functional studies of GWAS loci.

Selected by Jesus Victorino

Categories: genetics, genomics

*If you liked this preLights, please click on the thumb-up icon at the end of the page. Any comment, suggestion or question related to either scientific discussion or format will be more than welcome and very much appreciated. You can write directly at the bottom of this page or contact me by email or Twitter.

Last week I joined the #PreprintReviewChallenge, a great (and virtual) initiative organized by @ASAPbio_ and supported by @preLights, @PREreview_, @PeerCommunityIn & @PubPeer to build trust in #preprints. It was great to see more than 50 people, most of whom were early-career researchers, gathering to chat about science and share their experiences. As for the science, I suggested the latest manuscript from the labs of Jesse Engreitz and Eric Lander. A week later (better late than never!), here’s my highlight, including parts of the discussion that Iratxe Puebla, Julien Roux and I had during the event.

I've just registered to participate in the @ASAPbio_ initiative #PreprintReviewChallenge to build trust in #preprints. @preLights @PREreview_ @PeerCommunityIn & @PubPeer joined to discuss and comment on recent science (no previous experience is needed) https://t.co/k5haJfujwy

— Jesús Mellamo (@JesusMellamoyo1) September 8, 2020

Summary & background

In the GWAS era that we live in, thousands of risk variants have already been associated with diseases [1]. For the most studied traits, the list of candidate loci contributing to common polygenic disorders exceeds a hundred, and the number keeps growing as sample sizes increase. The majority (>80%) of associated polymorphisms lie in the non-coding genome, where they might affect the activity of regulatory elements and, therefore, gene expression [2]. But of which genes? And in which tissue?

Due to the huge number of possible scenarios for a given disease, it is of great importance to prioritize the candidate regions on which to focus functional studies. In this preprint, Nasser, Bergman, Fulco, Guckelberger, Doughty et al. build maps of enhancers and their target genes in over a hundred samples using the ‘Activity-by-contact’ (ABC) model, which takes into account chromatin accessibility, enhancer marks and enhancer-promoter interaction [3]. They integrate these data with variants associated with inflammatory bowel disease, among other traits, and predict their target genes and tissue of relevance (Fig. 1). Using this approach, the authors identify an enhancer linked to inflammatory bowel disease that affects the metabolic state of mitochondria in immune cells. This work provides an interesting and powerful approach to characterize enhancer landscapes and their effect on the regulation of genes causing disease.
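To make the idea behind the activity-by-contact score more concrete, here is a minimal sketch based on the definition in Fulco et al. [3]: the activity of an element (the geometric mean of its accessibility and H3K27ac signal) is multiplied by its contact frequency with the gene's promoter, and then divided by the sum of the same product over all candidate elements near that gene. The function names and the toy numbers below are my own illustrative assumptions, not the authors' code or data.

```python
from math import sqrt


def activity(accessibility: float, h3k27ac: float) -> float:
    """Enhancer activity as the geometric mean of two signals."""
    return sqrt(accessibility * h3k27ac)


def abc_scores(elements: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Compute an ABC-style score for each element relative to one gene.

    `elements` maps an element name to
    (accessibility, H3K27ac signal, contact frequency with the gene's promoter).
    Each score is that element's share of the total activity x contact.
    """
    weighted = {
        name: activity(acc, ac) * contact
        for name, (acc, ac, contact) in elements.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # No signal at all: every element gets a score of zero.
        return {name: 0.0 for name in weighted}
    return {name: value / total for name, value in weighted.items()}


# Hypothetical toy values for three candidate elements around one gene.
toy_elements = {
    "enhancer_1": (16.0, 4.0, 0.5),   # activity 8, activity x contact 4
    "enhancer_2": (4.0, 4.0, 0.25),   # activity 4, activity x contact 1
    "enhancer_3": (25.0, 1.0, 1.0),   # activity 5, activity x contact 5
}

scores = abc_scores(toy_elements)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Because each score is a fraction of the total, the scores for one gene sum to 1, so a strong enhancer lowers the relative contribution of its neighbours.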

Figure 1. Activity-by-contact maps in over 100 biosamples are used to prioritize disease-associated enhancer-gene pairs (taken from Fig. 1a of the preprint).

Key results

– Mapping of over 6 million enhancer-gene connections across >100 biological samples.
– Prediction of the target genes for nearly 5,000 variants within enhancers across 72 traits.
– Prioritization of 14 new genes for inflammatory bowel disease, including PPIF.
– An enhancer controlling PPIF gene expression modulates mitochondrial function in immune cells responding to inflammatory stimuli.
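The step from variants to genes can be pictured as an interval overlap: a variant that falls inside a predicted enhancer inherits that enhancer's target gene. The sketch below is only an illustration under my own assumptions: the coordinates, gene names and scores are made up, intervals are treated as half-open (as in BED files), and when a variant overlaps several enhancer-gene connections the gene with the highest score is picked, which is a simplification and not necessarily the authors' exact procedure.

```python
from typing import NamedTuple, Optional


class Connection(NamedTuple):
    """One predicted enhancer-gene connection (hypothetical toy record)."""
    chrom: str
    start: int   # inclusive
    end: int     # exclusive
    gene: str
    abc_score: float


class Variant(NamedTuple):
    name: str
    chrom: str
    pos: int


def link_variants(
    variants: list[Variant], connections: list[Connection]
) -> dict[str, Optional[str]]:
    """Map each variant to the gene of its best-scoring overlapping enhancer."""
    links: dict[str, Optional[str]] = {}
    for variant in variants:
        hits = [
            c for c in connections
            if c.chrom == variant.chrom and c.start <= variant.pos < c.end
        ]
        # Keep the connection with the highest score, or None if nothing overlaps.
        best = max(hits, key=lambda c: c.abc_score, default=None)
        links[variant.name] = best.gene if best is not None else None
    return links


# Made-up example data, for illustration only.
toy_connections = [
    Connection("chr1", 100, 200, "GENE_A", 0.05),
    Connection("chr1", 150, 250, "GENE_B", 0.30),
    Connection("chr2", 500, 600, "GENE_C", 0.10),
]
toy_variants = [
    Variant("var_1", "chr1", 160),  # overlaps two enhancers
    Variant("var_2", "chr2", 550),  # overlaps one enhancer
    Variant("var_3", "chr1", 900),  # overlaps none
]

links = link_variants(toy_variants, toy_connections)
print(links)
```

A linear scan like this is fine for a toy example; for millions of connections one would want an interval index instead.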

How this work moves the field forward

Over the last five years, GWAS have identified more than a hundred variants associated with many common traits, such as cardiovascular diseases [4, 5, 6, 7]. The significant increase in the sample size of both cases and controls in such studies has allowed many new SNPs to reach the widely accepted threshold for genome-wide significance, and the number of significant SNPs will presumably keep growing as sample sizes increase. In fact, several studies have also included sub-threshold SNPs when assessing the functional activity of associated non-coding regions, since they are enriched for epigenetic signals specific to disease-relevant tissues [8]. However, they are very likely to be functionally weaker than genome-wide significant variants, as suggested by their lower contribution to genetic risk scores [6, 9].

Considering all this, I can’t help but wonder whether GWAS will keep identifying variants forever, or whether we will reach a paradoxical situation in which every single variant in the genome is associated with every single trait, though very weakly. Coming back to a more pragmatic view of the current situation, high-throughput screens that functionally validate the activity of associated regions are going to be crucial and, in this respect, massively parallel reporter assays (MPRAs) are very promising. Nevertheless, such assays have many limitations, such as being restricted to cell culture. Therefore, in order to elucidate the role of disease-relevant variants, we still need lower-throughput genetic approaches that focus on a reduced number of loci.

We could say that the two main limitations in the field are a constantly growing number of associations, not all of which are necessarily relevant, and the need for time-consuming techniques, which are a bottleneck for scientific discovery. For this reason, prioritization is of tremendous importance, since we cannot test everything thoroughly. Works like the one presented by Nasser et al. provide tools to find the needle in a haystack full of associations, which will help dissect the regulatory code and understand the genetic contribution to disease.

Question to authors

1. In this preprint, the authors identified around 6 million candidate enhancer-gene connections based on the ABC model in different biosamples. Do the authors know the rate of false positives that should be expected among those? Are the authors planning any sort of systematic validation to estimate the performance of the prediction tool?

2. Each year, many new loci are identified for common diseases by GWAS. How easy would it be to systematically update the analysis with new associations, in this case with inflammatory bowel disease, and include them in the prioritized set of enhancer-gene pairs?

3. In order to predict ABC enhancers, the authors use data on chromatin accessibility, histone marks and Hi-C. How do the authors envision incorporating MPRA data into the ABC model in cell types where such a resource is available?

References

1. GWAS catalog

2. Manolio TA. 2010. Genomewide Association Studies and Assessment of the Risk of Disease. N Engl J Med 363, 166–176.

3. Fulco CP, Nasser J et al. 2019. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat Genet 51, 1664–1669.

4. Nielsen JB et al. 2018. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat Genet 50, 1234–1239.

5. Roselli C et al. 2018. Multi-ethnic genome-wide association study for atrial fibrillation. Nat Genet 50, 1225–1233.

6. Nelson CP et al. 2017. Association analyses based on false discovery rate implicate new loci for coronary artery disease. Nat Genet 49, 1385–1391.

7. van der Harst P et al. 2018. Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease. Circ Res 122(3), 433–443.

8. Wang X et al. 2016. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. Elife 5:e10557.

9. Villar D et al. 2020. The contribution of non-coding regulatory elements to cardiovascular disease. Open Biol 10:200088.

Tags: activity by contact, gene expression, gwas

doi: https://doi.org/10.1242/prelights.25020

