
Biologically informed NeuralODEs for genome-wide regulatory dynamics

Intekhab Hossain, Viola Fanfani, John Quackenbush, Rebekka Burkholz

Preprint posted on 27 February 2023 https://www.biorxiv.org/content/10.1101/2023.02.24.529835v1

Predictive, explainable, flexible & scalable: Hossain and colleagues developed a modelling framework based on prior-informed neuralODEs (PHOENIX) to estimate gene regulatory dynamics.

Selected by Benjamin Dominik Maier

Background:

Modelling Gene Regulatory Networks (GRNs)

A gene regulatory network (GRN) is a conceptual model that explains how genes and their regulatory elements (e.g. transcription factors) interact within a cell (Karlebach & Shamir, 2008). These directional interactions between genes and their products can be modelled as a system of coupled ordinary differential equations (ODEs) with activations or repressions represented as positive or negative terms. As the ODE model describes how the concentrations of each component change over time, it can help to causally explain temporal gene expression patterns.
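To make the ODE view concrete, a toy two-gene network (gene 1 activates gene 2, gene 2 represses gene 1, both products decay) can be written as a coupled system. The rate constants and functional forms below are purely illustrative and are not taken from the preprint:

```python
import numpy as np

def grn_rhs(x, t=0.0):
    """Right-hand side of a toy two-gene GRN ODE system.
    Gene 1 is repressed by gene 2; gene 2 is activated by gene 1;
    both products decay at a first-order rate."""
    x1, x2 = x
    k_act, k_rep, decay = 1.0, 1.0, 0.5  # illustrative rate constants
    dx1 = k_rep / (1.0 + x2) - decay * x1        # repression: production falls as x2 rises
    dx2 = k_act * x1 / (1.0 + x1) - decay * x2   # activation: saturating positive term
    return np.array([dx1, dx2])
```

Integrating this system forward in time yields the temporal expression pattern of both genes; the sign and shape of each interaction term encode activation or repression.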

ODE models are constructed based on existing knowledge of the network structure and kinetic parameters, which can be derived from literature or experimental data. Unknown parameters can be estimated iteratively by minimising the difference between predicted values and the true values of the training dataset; the function used to compute this difference/error is called the loss function or cost function. After validation with independent data, ODE models can predict and study GRN behaviour under various conditions, such as environmental changes or gene mutations. To simulate GRN dynamics, the model is then solved numerically, for example with Euler's method or the Runge-Kutta method (Griffiths & Higham, 2010; for visualisations, check out the blog entry by Harold Serrano). In short, the Runge-Kutta method advances the solution by computing slopes at several intermediate points and taking their weighted average, thereby approximating the solution of a first-order ordinary differential equation more accurately than a single Euler step.
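The two solvers mentioned above can be sketched in a few lines. This is a generic textbook implementation, not code from the preprint:

```python
import numpy as np

def euler_step(f, x, dt):
    # First-order Euler update: x_{n+1} = x_n + dt * f(x_n)
    return x + dt * f(x)

def rk4_step(f, x, dt):
    # Classical fourth-order Runge-Kutta: weighted average of four slopes
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Test problem: exponential decay dx/dt = -x, exact solution x(t) = e^{-t}
f = lambda x: -x
x_e, x_rk = 1.0, 1.0
dt, steps = 0.1, 10
for _ in range(steps):
    x_e = euler_step(f, x_e, dt)
    x_rk = rk4_step(f, x_rk, dt)
```

At the same step size, the RK4 trajectory tracks the exact value e^(-1) far more closely than Euler does, which illustrates why the weighted-slope averaging pays off.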

Neural Ordinary Differential Equations (NeuralODE)

Neural Ordinary Differential Equations are a type of neural network architecture that directly model the continuous evolution of a system using ordinary differential equations (Chen et al., 2018). The input to the model is a set of initial conditions and an ODE-based function that describes the change in the system over time as a continuous trajectory. The neural network is then trained to learn the parameters of the ODE function that best fit the training data. Neural ODEs have shown promising results in a variety of applications, including image classification (Paoletti et al., 2020), time-series prediction (Jin et al., 2022), and physical simulations (Lanzieri et al., 2022; Kong et al., 2022; Lai et al., 2022).
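The core idea can be sketched with a small NumPy stand-in: a neural network f_theta defines the right-hand side dx/dt = f_theta(x), and the trajectory is obtained by numerically integrating it. Real implementations (e.g. the torchdiffeq library) use adaptive solvers and the adjoint method for training; everything below, including the tiny architecture, is an illustrative sketch only:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network f_theta(x) serving as the learned right-hand side;
# in a trained neural ODE these weights would be fitted to data.
W1, b1 = rng.normal(size=(8, 2)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)) * 0.1, np.zeros(2)

def f_theta(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def integrate(x0, dt=0.01, steps=100):
    # Solve dx/dt = f_theta(x) with simple Euler steps; production code
    # would use an adaptive higher-order solver instead.
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * f_theta(x)
    return x

x1 = integrate(np.array([1.0, -1.0]))  # state after integrating one unit of time
```

Training then amounts to adjusting W1, b1, W2, b2 so that integrated trajectories match the observed data, with gradients propagated through the solver.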

For readers interested in the topic, I warmly recommend the more detailed introduction to (neural) ODEs by Jonty Sinai, which can be found at https://jontysinai.github.io/jekyll/update/2019/01/18/understanding-neural-odes.html. An overview of different ML methods to infer gene regulatory networks can be found in Table 1 of the featured preprint.

Fig. 1 PHOENIX Neural ODE framework. Figure taken from Hossain et al. (2023), BioRxiv published under the CC-BY-NC-ND 4.0 International license.

Out-of-the-box models (OOTB models)

Out-of-the-box or pre-trained models are machine learning models that have been trained on general-purpose data sources. Even though they tend not to be tailored to specific questions/applications, they are popular for pioneering and benchmarking routines, as they can be used immediately without time- or resource-expensive customisation or training on specific datasets.

Key Findings

The authors developed their ML-framework PHOENIX (Prior-informed Hill-like ODEs to Enhance Neuralnet Integrals with eXplainability) to overcome pitfalls of previously published methods and obtain biologically more meaningful results (GitHub). To better represent the non-linear activation or inhibition of biological processes, the authors incorporated sigmoid Hill-Langmuir-like kinetics (Frank, 2013), which are commonly used when modelling pharmacological reactions, signal transduction and gene expression processes. The Hill-Langmuir equation accounts for saturation effects and cooperative processes such as transcription factors being able to bind to DNA molecules via multiple binding sites thereby increasing the gene expression rate (Chu et al., 2009). Moreover, Hill-like kinetics have been shown to better resemble noise filter-induced bimodality (Ochab-Marcinek et al., 2017), which cannot be resolved using linear dynamics.
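The Hill-Langmuir function itself is compact. Below is a minimal sketch of its generic textbook form (not PHOENIX's exact parameterisation):

```python
import numpy as np

def hill(x, K=1.0, n=2.0):
    """Hill-Langmuir activation: fraction of occupied binding sites.
    K is the half-maximal concentration; n is the Hill (cooperativity)
    coefficient -- n > 1 yields a sigmoidal, switch-like response."""
    xn = np.power(x, n)
    return xn / (K ** n + xn)
```

By construction hill(K) equals 0.5, and the output saturates toward 1 at high concentrations, capturing the saturation and cooperativity effects described above; linear dynamics cannot reproduce this switch-like behaviour.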

Secondly, the PHOENIX ML-framework allows users to integrate prior expert domain knowledge in the form of a prior network model, creating explainable model representations even in sparse and noisy settings.

Dynamics from noisy simulated GRN

Hossain and colleagues tested PHOENIX on simulated gene expression data from two in silico systems using 150 trajectories (140 training / 10 testing) with varying noise levels. When compared to the ground truth (i.e. the original noise-free data), PHOENIX performed better than out-of-the-box models and previously published methods across different noise levels. PHOENIX successfully recovered the true, noise-free gene expression patterns over time despite very noisy training trajectories and prior knowledge models. While prior-less PHOENIX demonstrated the highest predictive performance overall, adding a user-supplied prior knowledge model resulted in better explainability.

Recovery of sparse causal biology

Next, the authors quantified the effect of misspecified prior knowledge models on the PHOENIX prediction to assess whether the model can learn causal elements in the system beyond the ones given in the prior knowledge model. They determined that PHOENIX is able to infer regulatory interactions from just the data itself and – if needed – can also deviate from the prior knowledge.

Oscillating yeast cell cycle dynamics

To assess how their framework performs on real biological data, the authors applied PHOENIX to time-resolved expression values of synchronised yeast cells during the cell cycle. Even though the authors had two experimental replicates, they decided against using one for training and one for validation, as the high similarity between the replicates would have yielded artificially good results. Hence, the data was split into transition pairs, i.e. expression vectors from two consecutive time points, with an 86%/7%/7% split (training, validation, testing). When comparing the prediction results to ChIP-chip transcription factor (TF) binding data, it appeared that the model had not only learned to explain temporal patterns in expression, but could also accurately predict TF binding. Moreover, the model predicted the continued periodic oscillations of the cell cycle, even though only two cycles were present in the data and the ML framework was based on Hill-like kinetics. Hence, this result demonstrates that while predicting the dynamics accurately, the model is flexible enough to deviate from its kinetic framework and the prior knowledge model.
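The transition-pair construction can be sketched as follows. The function names and toy data are hypothetical; only the 86%/7%/7% proportions come from the preprint:

```python
import numpy as np

def transition_pairs(series):
    # series: array of shape (timepoints, genes); each pair is
    # (expression at time t, expression at time t+1)
    return [(series[t], series[t + 1]) for t in range(len(series) - 1)]

def split(pairs, fractions=(0.86, 0.07, 0.07), seed=0):
    # Shuffle pairs, then split into train/validation/test subsets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(pairs))
    n_train = round(fractions[0] * len(pairs))  # round() avoids float truncation
    n_val = round(fractions[1] * len(pairs))
    train = [pairs[i] for i in idx[:n_train]]
    val = [pairs[i] for i in idx[n_train:n_train + n_val]]
    test = [pairs[i] for i in idx[n_train + n_val:]]
    return train, val, test

series = np.random.default_rng(1).random((101, 5))  # 101 time points, 5 genes
train, val, test = split(transition_pairs(series))
```

Splitting at the level of transition pairs rather than whole replicates keeps the validation honest while still exposing the model to transitions from the entire cycle.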

Fig. 2 PHOENIX prediction of yeast cell-cycle dynamics. Figure taken from Hossain et al. (2023), BioRxiv published under the CC-BY-NC-ND 4.0 International license.

Large-scale breast cancer dynamics

Most computational approaches to infer gene regulatory networks are not scalable to human-genome-scale networks (> 25,000 genes). To assess whether PHOENIX is extendable to large-scale human expression data, publicly available microarray expression values for 22,000 genes from 198 breast cancer patients were obtained and ordered in pseudotime. After excluding genes with no measurable expression, the data was split 90%/5%/5% into training, validation and testing data. PHOENIX predictions were found to be in agreement with a validation network of experimental ChIP-chip binding information, even when including all genes with measurable expression. Next, they perturbed the initial expression of each gene in silico and studied the effect on the predicted expressions of all other genes. Perturbations of (cancer-relevant) transcription factor genes and known cancer drivers were found to have the largest effects on overall gene expression, which is in line with prior literature knowledge. To test whether the full gene set is required for obtaining biologically interpretable results, the authors reduced the number of included genes considerably. This resulted in a loss of mechanistic explainability of the resulting regulatory model as well as a failure to identify previously found cancer-relevant pathways in a pathway-based functional enrichment analysis. Therefore, the authors concluded that there is a need and benefit for genome-scale predictors and that reducing the complexity of a predictor comes at the cost of explainability.
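The in silico perturbation analysis can be sketched generically: perturb one gene's initial expression, run the trained predictor, and sum the induced changes across all other genes. The `predict` model below is a toy linear stand-in, not PHOENIX:

```python
import numpy as np

def perturbation_effect(predict, x0, gene, delta=1.0):
    """Perturb the initial expression of one gene, then sum the absolute
    change induced in the predicted expression of all other genes.
    `predict` stands in for a trained dynamics model (hypothetical)."""
    x_pert = x0.copy()
    x_pert[gene] += delta
    effect = np.abs(predict(x_pert) - predict(x0))
    effect[gene] = 0.0  # ignore the perturbed gene itself
    return effect.sum()

# Toy stand-in model: gene 0 influences every gene, the others influence none
A = np.zeros((4, 4))
A[:, 0] = 0.5
predict = lambda x: x + A @ x

x0 = np.zeros(4)
driver_score = perturbation_effect(predict, x0, gene=0)     # large downstream effect
passenger_score = perturbation_effect(predict, x0, gene=1)  # no downstream effect
```

Ranking genes by this score is one simple way to flag candidate drivers: genes whose perturbation propagates widely through the learned dynamics score highest.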

Conclusion and Perspective

While ODEs have a long history in applied and pure mathematics, it is fascinating to see the transition to high-dimensional machine-learning algorithms with millions of parameters solving applied mathematical problems. Neural ODEs are slowly closing the gap between traditional mechanism-driven mathematical modelling and more data-driven ML approaches. Given the enormous flexibility of neural networks as well as the development of even more sophisticated and highly optimised computing frameworks, it will be interesting to see which applied mathematical problems we might be able to solve in the next decade. Still, traditional approaches will remain indispensable for biological mathematical modelling.

I decided to feature this preprint from Intekhab Hossain and colleagues after his fantastic talk (link to recording) and poster presentation at the 15th RECOMB Satellite Workshop on Computational Cancer Biology in Istanbul earlier this year. Besides it being potentially very useful for my current PhD project, PHOENIX stood out to me as it allows the flexible integration of prior expert knowledge and is data-driven while at the same time still mechanism-driven. Taken together, the ML-framework seems to be quite easily adaptable to a wide range of applications. Even though many researchers are pushing for interpretable models, a lot of researchers try to explain their black-box models rather than developing explainable models from the beginning (see Rudin, 2019). Hence, I really like the approach chosen by Intekhab Hossain and colleagues.

Further Material

RECOMB-CCB 2023 talk

GitHub Repository

Tags/Keywords

Ordinary Differential Equation (ODE), Gene-Regulatory Network (GRN), Neural Network, Mathematical Modelling, Machine Learning

References

Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural Ordinary Differential Equations (Version 5). arXiv. https://doi.org/10.48550/ARXIV.1806.07366

Chu, D., Zabet, N. R., & Mitavskiy, B. (2009). Models of transcription factor binding: Sensitivity of activation functions to model assumptions. Journal of Theoretical Biology (Vol. 257, Issue 3, pp. 419–429). https://doi.org/10.1016/j.jtbi.2008.11.026

Frank, S. A. (2013). Input-output relations in biological systems: measurement, information and the Hill equation. Biology Direct (Vol. 8, Issue 1). https://doi.org/10.1186/1745-6150-8-31

Griffiths, D. F., & Higham, D. J. (2010). Numerical methods for ordinary differential equations (2010th ed.). Guildford, England: Springer.

Jin, M., Zheng, Y., Li, Y.-F., Chen, S., Yang, B., & Pan, S. (2022). Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs. IEEE Transactions on Knowledge and Data Engineering (pp. 1–14). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/tkde.2022.3221989

Karlebach, G., Shamir, R. (2008) Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 9, 770–780. https://doi.org/10.1038/nrm2503

Kong, X., Yamashita, K., Foggo, B., & Yu, N. (2022). Dynamic Parameter Estimation with Physics-based Neural Ordinary Differential Equations. 2022 IEEE Power & Energy Society General Meeting (PESGM). https://doi.org/10.1109/pesgm48719.2022.9916840

Lai, Z., Liu, W., Jian, X., Bacsa, K., Sun, L., & Chatzi, E. (2022). Neural modal ordinary differential equations: Integrating physics-based modeling with neural ordinary differential equations for modeling high-dimensional monitored structures. Data-Centric Engineering, 3, E34. https://doi.org/10.1017/dce.2022.35

Lanzieri, D., Lanusse, F., & Starck, J.-L. (2022). Hybrid Physical-Neural ODEs for Fast N-body Simulations (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2207.05509

Ochab-Marcinek, A., Jędrak, J., & Tabaka, M. (2017). Hill kinetics as a noise filter: the role of transcription factor autoregulation in gene cascades. Physical Chemistry Chemical Physics (Vol. 19, Issue 33, pp. 22580–22591). https://doi.org/10.1039/c7cp00743d

Paoletti, M. E., Haut, J. M., Plaza, J., & Plaza, A. (2020). Neural Ordinary Differential Equations for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing (Vol. 58, Issue 3, pp. 1718–1734). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/tgrs.2019.2948031

Rudin, C. (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1, 206–215. https://doi.org/10.1038/s42256-019-0048-x


Posted on: 31 May 2023 , updated on: 21 August 2023

doi: https://doi.org/10.1242/prelights.34751


Author's response

Intekhab Hossain shared

Thanks for featuring our preprint! We really value and appreciate your interest in our work.

Q1: One central goal of modern biology is to understand the molecular characteristics and underlying regulatory mechanisms of cell fate decisions. Have you considered to use PHOENIX to screen for potential gene targets which manipulate cell fate decisions?

Yes, this is certainly a great use case, and something we see as a straightforward application of PHOENIX. We noted that most papers in this space have an application to cell fate decisions [1, 2, 3], but do so by first projecting the space down to a lower-dimensional space, through UMAP, PCA, highly variable gene subsets, etc. The black-box nature of these dimension reduction techniques leads to a subsequent loss in interpretability. Hence for our paper, we decided to do something different by applying PHOENIX towards understanding the nuanced dynamics of breast cancer progression, a task for which the genome-wide scalability of PHOENIX (without dimensionality reduction) is crucial. We discovered that PHOENIX can find not only potential driver (target) genes, but also potential driver pathways underlying breast cancer progression. Hence PHOENIX promises to be a powerful tool for discovering new targets for many biological processes (including cell fate) and can do so while remaining extremely explainable.

Q2: Neural ODEs are often even more unstable than regular ODEs. Have you tested whether your model fulfils the Lyapunov condition for stability?

This is a great question and is indeed something that we had been thinking about while designing the PHOENIX architecture. The instability you mention is one of the main reasons why neuralODEs are so unstable, and we kept this in mind when designing the biologically motivated activation functions. As you will notice, the Hill-like activation functions depict the Hill-Langmuir kinetic equations, which are indeed Lyapunov stable, allowing PHOENIX to model complex systems of interactions generally without numerical instability issues.

Q3: Have you studied whether there are subgroups of genes for which your predictability is particularly good or bad? And if so, are these gene families linked to certain processes or involved in big complexes? (The question is inspired by Fig. 2 of Srivastava et al., PLOS comp bio. 2022)

Thanks! This is a great suggestion, and we haven’t looked into this closely enough. For our applications to breast cancer dynamics in humans and cell cycle dynamics in yeast, we focused on getting good predictive performance across all genes, and have not yet investigated whether certain gene families are more amenable to good predictions. We hope to look into this in subsequent analyses and post any interesting findings as an addendum to our supplement.

 

References:
1. Liu, R., Pisco, A. O., Braun, E., Linnarsson, S., & Zou, J. (2022). Dynamical systems model of RNA velocity improves inference of single-cell trajectory, pseudotime and gene regulation. Journal of Molecular Biology, 434(15), 167606. https://doi.org/10.1016/j.jmb.2022.167606
2. Qiu, X., et al. (2022). Mapping transcriptomic vector fields of single cells. Cell, 185(4), 690–711. https://doi.org/10.1016/j.cell.2021.12.045
3. Yeo, G. H. T., Saksena, S. D., & Gifford, D. K. (2021). Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nature Communications, 12(1), 3222. https://doi.org/10.1038/s41467-021-23518-w

Comment from Benjamin Dominik Maier:

Intekhab Hossain (ihossain@g.harvard.edu), the first author of this study, is happy to answer questions and guide users through the setup process of PHOENIX, so feel free to send him a message or comment here 🙂

