GeneWalk identifies relevant gene functions for a biological context using network representation learning

Robert Ietswaart, Benjamin M. Gyori, John A. Bachman, Peter K. Sorger, L. Stirling Churchman

Posted on: 4 October 2019

Preprint posted on 5 September 2019

Article now published in Genome Biology at http://dx.doi.org/10.1186/s13059-021-02264-8

Fear The Walking Gene: knowledge-based machine learning is able to highlight gene functions relevant for a distinct biological context

Selected by Ramona Jühlen

Categories: biochemistry, bioinformatics, cancer biology, cell biology, developmental biology, genetics, genomics, molecular biology

Background and GeneWalk methodology

High-throughput functional genomics can provide scientists with a long list of candidate genes that could play a role in the biological context they are studying. But how should one narrow down such a list to the candidate genes that matter most in the given context? Gene functions are commonly interrogated using Gene Ontology (GO) annotations, and GO coupled to gene set enrichment analysis (GSEA) can reveal enriched biological functions in a gene set. This analysis, however, does not address the context-specific functions of individual genes in the dataset. To overcome this shortcoming, the authors developed GeneWalk, a novel approach using knowledge-based machine learning and statistical modelling.

First, GeneWalk assembles a context-specific gene network from a knowledge base (e.g. Pathway Commons, INDRA) starting with a list of input genes obtained from a specific experiment. This gene network is added to a GO network resulting in a full GeneWalk network (GWN) (Figure 1).
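The assembly step described above can be pictured as merging edge lists into one undirected graph. The following is a minimal sketch, not the GeneWalk implementation: the function name `build_genewalk_network`, the three edge lists and all node names (`geneA`, `GO:term1`, ...) are made-up placeholders, and the split into gene-gene, gene-GO and GO-GO edges is an assumption about how such a network could be organised.

```python
from collections import defaultdict


def build_genewalk_network(gene_interactions, gene_annotations, go_hierarchy):
    """Merge three edge lists into one undirected adjacency map (hypothetical helper)."""
    network = defaultdict(set)
    for edges in (gene_interactions, gene_annotations, go_hierarchy):
        for a, b in edges:
            # Undirected: record the edge in both directions.
            network[a].add(b)
            network[b].add(a)
    return dict(network)


# Toy, invented input; none of these edges come from a real knowledge base.
gene_interactions = [("geneA", "geneB"), ("geneB", "geneC")]   # gene-gene edges
gene_annotations = [("geneA", "GO:term1"), ("geneB", "GO:term1"),
                    ("geneC", "GO:term2")]                      # gene-GO edges
go_hierarchy = [("GO:term1", "GO:parent")]                      # GO-GO edges

gwn = build_genewalk_network(gene_interactions, gene_annotations, go_hierarchy)
for node in sorted(gwn):
    print(node, sorted(gwn[node]))
```

Keeping genes and GO terms as nodes of the same graph is what later allows a gene and a GO term to be compared directly.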

Next, the GWN structure is learned by an unsupervised network representation learning algorithm called DeepWalk (1). Briefly, random walks scan the local neighbourhood of each node (representing a gene or GO term); the walks are summarised as a collection of neighbouring node pairs, which serve as the training set for a neural network with one hidden layer (the layer between input and output) (Figure 1). After training, each input node in the GWN is represented as a vector by the resultant hidden layer weights.
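To make the random-walk step more concrete, here is a minimal sketch of how walks over a network can be turned into neighbouring node pairs. It covers only the sampling part; training the one-hidden-layer network on these pairs is left out. The toy network, the walk length and the window size are illustrative assumptions, and `random_walk` and `neighbouring_pairs` are hypothetical helper names, not functions from DeepWalk or GeneWalk.

```python
import random


def random_walk(network, start, length, rng):
    """Walk up to `length` nodes, choosing a uniformly random neighbour at each step."""
    walk = [start]
    while len(walk) < length:
        neighbours = sorted(network[walk[-1]])  # sorted for reproducibility
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def neighbouring_pairs(walk, window):
    """Return (node, context) pairs for nodes at most `window` steps apart in the walk."""
    pairs = []
    for i, node in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((node, walk[j]))
    return pairs


# Toy undirected network with invented node names.
toy_network = {
    "geneA": {"geneB", "GO:term1"},
    "geneB": {"geneA", "GO:term1"},
    "GO:term1": {"geneA", "geneB"},
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
walk = random_walk(toy_network, "geneA", length=5, rng=rng)
training_pairs = neighbouring_pairs(walk, window=1)
print(walk)
print(training_pairs)
```

Nodes that frequently co-occur in walks contribute many training pairs, which is why densely connected genes and GO terms end up with similar vectors.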

Finally, GeneWalk tests whether the similarity between a gene and a GO term is significantly higher than expected from a generated null distribution of similarity values (Figure 1). The resulting adjusted p-values rank the context-specific GO terms relevant to a gene of interest.
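The significance test can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: cosine similarity is assumed as the similarity measure, the p-value is a simple empirical one against a list of null similarities, and the Benjamini-Hochberg procedure is assumed for the multiple-testing adjustment. All vectors and numbers are invented.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def empirical_p_value(observed, null_similarities):
    """Fraction of null similarities at least as large as the observed value.

    The +1 in numerator and denominator avoids reporting a p-value of zero.
    """
    exceed = sum(1 for s in null_similarities if s >= observed)
    return (exceed + 1) / (len(null_similarities) + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: one gene vector compared with one GO term vector.
gene_vec = [0.9, 0.1, 0.3]
go_vec = [0.8, 0.2, 0.4]
null = [0.10, 0.35, 0.20, 0.55, 0.05, 0.40, 0.15, 0.30, 0.25]  # made-up null similarities

similarity = cosine_similarity(gene_vec, go_vec)
p = empirical_p_value(similarity, null)
print(round(similarity, 3), p)
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

With only a handful of null values the smallest attainable p-value is coarse, so a real analysis would need a much larger null distribution than this toy list.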

Figure 1. Scheme of GeneWalk methodology. Details outlining GeneWalk network representation learning and significance testing of the GeneWalk methodology.

Example applications of GeneWalk

To test GeneWalk, the authors first applied it to an already characterised experimental context. Oligodendrocytes myelinate neurons in the brain through a QKI-dependent mechanism, where the gene QKI codes for an RNA-binding protein involved in alternative splicing. RNA-sequencing data of QkI-deficient murine oligodendrocytes revealed 1899 differentially expressed genes, and several of the strongly down-regulated genes have been linked to neuron myelination (e.g. Mal, Pllp, Plp1) (2). Applied to these RNA-sequencing data with the INDRA knowledge base, GeneWalk found that GO terms linked to neuron myelination were the most similar to the differentially expressed genes Mal, Pllp and Plp1. GSEA using PANTHER also identified myelination-related processes as enriched; however, specific gene functions in this biological context could not be recovered. Additionally, the authors show that GeneWalk is not biased by whether a gene has a high or low number of GO annotations, or by the degree of connectivity of a gene's GO annotations.

Next, to apply GeneWalk in a different experimental set-up, the authors reanalysed published native elongating transcript sequencing (NET-seq) data of a human T-cell acute lymphoblastic leukaemia (ALL) cell line responding to treatment with JQ1 (3). JQ1 is a small-molecule drug that targets BRD4 and other BET family members, which are involved in haematologic cancers like ALL. NET-seq provides a quantitative read-out of nascent transcription. By first calculating differentially transcribed protein-coding genes, GeneWalk identified 28% similar GO terms for these genes, whereas conventional GSEA only identified five high-level functions with low fold enrichment. These results reveal the advantage of GeneWalk (and the disadvantage of GSEA) when a large number of functionally unrelated genes are mis-regulated. Furthermore, in this experiment GeneWalk was able to systematically prioritise the context-specific functions of genes with a multitude of GO annotations (e.g. MYC or BRCA1), not all of which are relevant for this specific biological context.

As a third application of GeneWalk, the authors generated NET-seq data from HeLa cells treated with the biflavonoid isoginkgetin (IsoG). IsoG is a plant-derived compound with possible anti-cancer activity. It has been shown that IsoG inhibits pre-mRNA splicing in vitro and in vivo and causes Pol II accumulation at the 5’-end of genes (4); however, its exact mode of action remains to be elucidated. NET-seq revealed 2940 genes as differentially transcribed upon IsoG treatment, and GeneWalk found that 24% of these genes had at least one similar GO term. In contrast to GSEA, GeneWalk identified HES1, EGR1 and IRF1 as plausible candidate genes for inhibiting Pol II transcriptional elongation after IsoG treatment.

In summary, the authors provide a novel computational tool that is able to identify context-specific gene functions in gene sets from experimental assays. The input is not limited to RNA-sequencing or NET-seq data; the approach can also be applied to, for example, CRISPR screens or mass spectrometry experiments.

What I like about this work and open questions

GeneWalk supplements over-representation tests and GSEA of GO annotations. I am currently doing GSEA using the R package clusterProfiler (5), and I will now also analyse my data using GeneWalk to complement my results. The two tools seem to be a great combination (they are also both open source)!

It would be good to know whether it will be possible in the future to add another genome-wide annotation parameter by mapping Entrez Gene identifiers, so that data from more species can be analysed (Bioconductor provides OrgDb for 20 species).

Additional references

1. B. Perozzi, R. Al-Rfou, S. Skiena, Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining – KDD ’14, 701–710 (2014).
2. L. Darbelli, K. Choquet, S. Richard, C. L. Kleinman, Sci Rep. 7, 1–13 (2017).
3. G. E. Winter et al., Mol. Cell. 67, 5-18.e19 (2017).
4. K. O’Brien, A. J. Matlin, A. M. Lowell, M. J. Moore, J. Biol. Chem. 283, 33147–33154 (2008).
5. G. Yu, L.-G. Wang, Y. Han, Q.-Y. He, OMICS. 16, 284–287 (2012).

More information

https://github.com/churchmanlab/genewalk

https://churchman.med.harvard.edu/genewalk

Tags: crispr screen, genomics, gsea, proteomics, python

doi: https://doi.org/10.1242/prelights.14324


Author's response

Robert Ietswaart and L. Stirling Churchman shared

Dear Ramona,

Thank you for the exciting preLight on GeneWalk! We hope it will help to get more insight into your functional genomics data. We agree with you that GeneWalk is complementary to GSEA as GeneWalk is more focused on getting insight into the functional roles of individual genes, whilst GSEA provides more global information on which processes are relevant to the biological context.

As you pointed out, GeneWalk currently works for human and mouse, but we are still looking into the best way to extend GeneWalk to other model organisms. The feasibility of extending GeneWalk depends mostly on whether there are open source knowledge bases available that contain mechanistic reaction statements such as “CDK9 phosphorylates RNA Polymerase II” that have previously been reported in the scientific literature. Such reactions are slightly different from gene annotations, which serve more as curated function labels for genes. GeneWalk makes use of these reactions (besides annotations) as they provide an understanding of how the input genes interact with each other. A gene that interacts with many functionally related input genes is then found to be central to the biological context and those shared functions are ranked as most relevant to that gene.

Thank you for suggesting OrgDb at Bioconductor! As far as we understand, it provides a gene annotation database for many organisms, so we would also still look for reaction knowledge bases for those organisms to complement the information needed for GeneWalk. Budding yeast, for instance, seems like a model organism that GeneWalk could be extended to in the future by using reactions from the Saccharomyces Genome Database (SGD). We’ve recently added support for human Ensembl gene IDs as input and some data visualization code in Python and R in the tutorial. We are open to hearing more suggestions on useful features from the community.

Kind regards,

Robert and Stirling
