
Biologically informed NeuralODEs for genome-wide regulatory dynamics

Intekhab Hossain, Viola Fanfani, John Quackenbush, Rebekka Burkholz

Preprint posted on 27 February 2023 https://www.biorxiv.org/content/10.1101/2023.02.24.529835v1

Predictive, explainable, flexible & scalable: Hossain and colleagues developed a modelling framework based on prior-informed neuralODEs (PHOENIX) to estimate gene regulatory dynamics.

Selected by Benjamin Dominik Maier

Background:

Modelling Gene Regulatory Networks (GRNs)

A gene regulatory network (GRN) is a conceptual model that explains how genes and their regulatory elements (e.g. transcription factors) interact within a cell (Karlebach & Shamir, 2008). These directional interactions between genes and their products can be modelled as a system of coupled ordinary differential equations (ODEs) with activations or repressions represented as positive or negative terms. As the ODE model describes how the concentrations of each component change over time, it can help to causally explain temporal gene expression patterns.
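To make the ODE view concrete, a toy two-gene network (gene 1 activates gene 2, gene 2 represses gene 1, both products decay) can be written as a coupled system. The rate constants and functional forms below are purely illustrative and are not taken from the preprint:

```python
import numpy as np

def grn_rhs(x, t=0.0):
    """Right-hand side of a toy two-gene GRN ODE system.
    Gene 1 is repressed by gene 2; gene 2 is activated by gene 1;
    both products decay at a first-order rate."""
    x1, x2 = x
    k_act, k_rep, decay = 1.0, 1.0, 0.5  # illustrative rate constants
    dx1 = k_rep / (1.0 + x2) - decay * x1        # repression: production falls as x2 rises
    dx2 = k_act * x1 / (1.0 + x1) - decay * x2   # activation: saturating positive term
    return np.array([dx1, dx2])
```

Integrating this system forward in time yields the temporal expression pattern of both genes; the sign and shape of each interaction term encode activation or repression.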

ODE models are constructed based on existing knowledge of the network structure and kinetic parameters, which can be derived from literature or experimental data. Unknown parameters can be estimated iteratively by minimising the difference between predicted values and the true values of the training dataset; the function used to compute this difference/error is called the loss function or cost function. After validation with independent data, ODE models can predict and study GRN behaviour under various conditions, such as environmental changes or gene mutations. To simulate GRN dynamics, the model is then solved numerically, for example with Euler's method or the Runge-Kutta method (Griffiths & Higham, 2010; for visualisations, check out the blog entry by Harold Serrano). In short, the Runge-Kutta method advances the solution by computing slopes at several intermediate points and taking their weighted average, thereby approximating the solution of a first-order ordinary differential equation more accurately than a single Euler step.
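The two solvers mentioned above can be sketched in a few lines. This is a generic textbook implementation, not code from the preprint:

```python
import numpy as np

def euler_step(f, x, dt):
    # First-order Euler update: x_{n+1} = x_n + dt * f(x_n)
    return x + dt * f(x)

def rk4_step(f, x, dt):
    # Classical fourth-order Runge-Kutta: weighted average of four slopes
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Test problem: exponential decay dx/dt = -x, exact solution x(t) = e^{-t}
f = lambda x: -x
x_e, x_rk = 1.0, 1.0
dt, steps = 0.1, 10
for _ in range(steps):
    x_e = euler_step(f, x_e, dt)
    x_rk = rk4_step(f, x_rk, dt)
```

At the same step size, the RK4 trajectory tracks the exact value e^(-1) far more closely than Euler does, which illustrates why the weighted-slope averaging pays off.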

Neural Ordinary Differential Equations (NeuralODE)

Neural Ordinary Differential Equations are a type of neural network architecture that directly model the continuous evolution of a system using ordinary differential equations (Chen et al., 2018). The input to the model is a set of initial conditions and an ODE-based function that describes the change in the system over time as a continuous trajectory. The neural network is then trained to learn the parameters of the ODE function that best fit the training data. Neural ODEs have shown promising results in a variety of applications, including image classification (Paoletti et al., 2020), time-series prediction (Jin et al., 2022), and physical simulations (Lanzieri et al., 2022; Kong et al., 2022; Lai et al., 2022).
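The core idea can be sketched with a small NumPy stand-in: a neural network f_theta defines the right-hand side dx/dt = f_theta(x), and the trajectory is obtained by numerically integrating it. Real implementations (e.g. the torchdiffeq library) use adaptive solvers and the adjoint method for training; everything below, including the tiny architecture, is an illustrative sketch only:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network f_theta(x) serving as the learned right-hand side;
# in a trained neural ODE these weights would be fitted to data.
W1, b1 = rng.normal(size=(8, 2)) * 0.1, np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)) * 0.1, np.zeros(2)

def f_theta(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def integrate(x0, dt=0.01, steps=100):
    # Solve dx/dt = f_theta(x) with simple Euler steps; production code
    # would use an adaptive higher-order solver instead.
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * f_theta(x)
    return x

x1 = integrate(np.array([1.0, -1.0]))  # state after integrating one unit of time
```

Training then amounts to adjusting W1, b1, W2, b2 so that integrated trajectories match the observed data, with gradients propagated through the solver.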

For readers interested in the topic, I warmly recommend the more detailed introduction to (neural) ODEs by Jonty Sinai, which can be found at https://jontysinai.github.io/jekyll/update/2019/01/18/understanding-neural-odes.html. An overview of different ML methods to infer gene regulatory networks can be found in Table 1 of the featured preprint.

Fig. 1 PHOENIX Neural ODE framework. Figure taken from Hossain et al. (2023), BioRxiv published under the CC-BY-NC-ND 4.0 International license.

Out-of-the-box models (OOTB models)

Out-of-the-box or pre-trained models are machine learning models that have been trained on general-purpose data sources. Even though they tend not to be tailored to specific questions/applications, they are popular for pioneering and benchmarking routines, as they can be used immediately without time- or resource-expensive customisation or training on specific datasets.

Key Findings

The authors developed their ML-framework PHOENIX (Prior-informed Hill-like ODEs to Enhance Neuralnet Integrals with eXplainability) to overcome pitfalls of previously published methods and obtain biologically more meaningful results (GitHub). To better represent the non-linear activation or inhibition of biological processes, the authors incorporated sigmoid Hill-Langmuir-like kinetics (Frank, 2013), which are commonly used when modelling pharmacological reactions, signal transduction and gene expression processes. The Hill-Langmuir equation accounts for saturation effects and cooperative processes such as transcription factors being able to bind to DNA molecules via multiple binding sites thereby increasing the gene expression rate (Chu et al., 2009). Moreover, Hill-like kinetics have been shown to better resemble noise filter-induced bimodality (Ochab-Marcinek et al., 2017), which cannot be resolved using linear dynamics.
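The Hill-Langmuir function itself is compact. Below is a minimal sketch of its generic textbook form (not PHOENIX's exact parameterisation):

```python
import numpy as np

def hill(x, K=1.0, n=2.0):
    """Hill-Langmuir activation: fraction of occupied binding sites.
    K is the half-maximal concentration; n is the Hill (cooperativity)
    coefficient -- n > 1 yields a sigmoidal, switch-like response."""
    xn = np.power(x, n)
    return xn / (K ** n + xn)
```

By construction hill(K) equals 0.5, and the output saturates toward 1 at high concentrations, capturing the saturation and cooperativity effects described above; linear dynamics cannot reproduce this switch-like behaviour.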

Secondly, the PHOENIX ML-framework allows users to integrate prior expert domain knowledge in the form of a prior network model, creating explainable model representations even in sparse and noisy settings.

Dynamics from noisy simulated GRN

Hossain and colleagues tested PHOENIX on simulated gene expression data from two in silico systems using 150 trajectories (140 training / 10 testing) with varying noise levels. When compared to the ground truth (i.e. the original noise-free data), PHOENIX performed better than out-of-the-box models and previously published methods across different noise levels. PHOENIX successfully recovered the true, noise-free gene expression patterns over time despite very noisy training trajectories and prior knowledge models. While prior-less PHOENIX demonstrated the highest predictive performance overall, adding a user-supplied prior knowledge model resulted in better explainability.

Recovery of sparse causal biology

Next, the authors quantified the effect of misspecified prior knowledge models on the PHOENIX prediction to assess whether the model can learn causal elements in the system beyond the ones given in the prior knowledge model. They determined that PHOENIX is able to infer regulatory interactions from just the data itself and – if needed – can also deviate from the prior knowledge.

Oscillating yeast cell cycle dynamics

To assess how their framework performs on real biological data, the authors applied PHOENIX to time-resolved expression values of synchronised yeast cells during the cell cycle. Even though the authors had two experimental replicates, they decided against using one for training and one for validation, as the high similarity between the replicates would have yielded artificially good results. Hence, the data was split into transition pairs, i.e. expression vectors from two consecutive time points, with an 86%/7%/7% split (training, validation, testing). When comparing the prediction results to ChIP-chip transcription factor (TF) binding data, it appeared that the model had not only learned to explain temporal patterns in expression, but could also accurately predict TF binding. Moreover, the model predicted the continued periodic oscillations of the cell cycle, even though only two cycles were present in the data and the ML framework was based on Hill-like kinetics. Hence, this result demonstrates that while predicting the dynamics accurately, the model is flexible enough to deviate from its kinetic framework and the prior knowledge model.
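The transition-pair construction can be sketched as follows. The function names and toy data are hypothetical; only the 86%/7%/7% proportions come from the preprint:

```python
import numpy as np

def transition_pairs(series):
    # series: array of shape (timepoints, genes); each pair is
    # (expression at time t, expression at time t+1)
    return [(series[t], series[t + 1]) for t in range(len(series) - 1)]

def split(pairs, fractions=(0.86, 0.07, 0.07), seed=0):
    # Shuffle pairs, then split into train/validation/test subsets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(pairs))
    n_train = round(fractions[0] * len(pairs))  # round() avoids float truncation
    n_val = round(fractions[1] * len(pairs))
    train = [pairs[i] for i in idx[:n_train]]
    val = [pairs[i] for i in idx[n_train:n_train + n_val]]
    test = [pairs[i] for i in idx[n_train + n_val:]]
    return train, val, test

series = np.random.default_rng(1).random((101, 5))  # 101 time points, 5 genes
train, val, test = split(transition_pairs(series))
```

Splitting at the level of transition pairs rather than whole replicates keeps the validation honest while still exposing the model to transitions from the entire cycle.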

Fig. 2 PHOENIX prediction of yeast cell-cycle dynamics. Figure taken from Hossain et al. (2023), BioRxiv published under the CC-BY-NC-ND 4.0 International license.

Large-scale breast cancer dynamics

Most computational approaches to infer gene regulatory networks are not scalable to human-genome-scale networks (> 25,000 genes). To assess whether PHOENIX is extendable to large-scale human expression data, publicly available microarray expression values for 22,000 genes from 198 breast cancer patients were obtained and ordered in pseudotime. After excluding genes with no measurable expression, the data was split 90%/5%/5% into training, validation and testing data. PHOENIX predictions were found to be in agreement with a validation network of experimental ChIP-chip binding information, even when including all genes with measurable expression. Next, they perturbed the initial expression of each gene in silico and studied the effect on the predicted expressions of all other genes. Perturbations of (cancer-relevant) transcription factor genes and known cancer drivers were found to have the largest effects on overall gene expression, which is in line with prior literature knowledge. To test whether the full gene set is required for obtaining biologically interpretable results, the authors reduced the number of included genes considerably. This resulted in a loss of mechanistic explainability of the resulting regulatory model as well as a failure to identify previously found cancer-relevant pathways in a pathway-based functional enrichment analysis. Therefore, the authors concluded that there is a need and benefit for genome-scale predictors and that reducing the complexity of a predictor comes at the cost of explainability.
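The in silico perturbation analysis can be sketched generically: perturb one gene's initial expression, run the trained predictor, and sum the induced changes across all other genes. The `predict` model below is a toy linear stand-in, not PHOENIX:

```python
import numpy as np

def perturbation_effect(predict, x0, gene, delta=1.0):
    """Perturb the initial expression of one gene, then sum the absolute
    change induced in the predicted expression of all other genes.
    `predict` stands in for a trained dynamics model (hypothetical)."""
    x_pert = x0.copy()
    x_pert[gene] += delta
    effect = np.abs(predict(x_pert) - predict(x0))
    effect[gene] = 0.0  # ignore the perturbed gene itself
    return effect.sum()

# Toy stand-in model: gene 0 influences every gene, the others influence none
A = np.zeros((4, 4))
A[:, 0] = 0.5
predict = lambda x: x + A @ x

x0 = np.zeros(4)
driver_score = perturbation_effect(predict, x0, gene=0)     # large downstream effect
passenger_score = perturbation_effect(predict, x0, gene=1)  # no downstream effect
```

Ranking genes by this score is one simple way to flag candidate drivers: genes whose perturbation propagates widely through the learned dynamics score highest.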

Conclusion and Perspective

While ODEs have a long history in applied and pure mathematics, it is fascinating to see the transition to high-dimensional machine-learning algorithms with millions of parameters solving applied mathematical problems. Neural ODEs are slowly closing the gap between traditional mechanism-driven mathematical modelling and more data-driven ML approaches. Given the enormous flexibility of neural networks as well as the development of even more sophisticated and highly optimised computing frameworks, it will be interesting to see which applied mathematical problems we might be able to solve in the next decade. Still, traditional approaches will remain indispensable for biological mathematical modelling.

I decided to feature this preprint from Intekhab Hossain and colleagues after his fantastic talk (link to recording) and poster presentation at the 15th RECOMB Satellite Workshop on Computational Cancer Biology in Istanbul earlier this year. Besides it being potentially very useful for my current PhD project, PHOENIX stood out to me as it allows the flexible integration of prior expert knowledge and is data-driven while at the same time still mechanism-driven. Taken together, the ML-framework seems to be quite easily adaptable to a wide range of applications. Even though many researchers are pushing for interpretable models, a lot of researchers try to explain their black-box models rather than developing explainable models from the beginning (see Rudin, 2019). Hence, I really like the approach chosen by Intekhab Hossain and colleagues.

Further Material

RECOMB-CCB 2023 talk

GitHub Repository

Tags/Keywords

Ordinary Differential Equation (ODE), Gene-Regulatory Network (GRN), Neural Network, Mathematical Modelling, Machine Learning

References

Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural Ordinary Differential Equations (Version 5). arXiv. https://doi.org/10.48550/ARXIV.1806.07366

Chu, D., Zabet, N. R., & Mitavskiy, B. (2009). Models of transcription factor binding: Sensitivity of activation functions to model assumptions. Journal of Theoretical Biology (Vol. 257, Issue 3, pp. 419–429). https://doi.org/10.1016/j.jtbi.2008.11.026

Frank, S. A. (2013). Input-output relations in biological systems: measurement, information and the Hill equation. Biology Direct (Vol. 8, Issue 1). https://doi.org/10.1186/1745-6150-8-31

Griffiths, D. F., & Higham, D. J. (2010). Numerical methods for ordinary differential equations (2010th ed.). Guildford, England: Springer.

Jin, M., Zheng, Y., Li, Y.-F., Chen, S., Yang, B., & Pan, S. (2022). Multivariate Time Series Forecasting with Dynamic Graph Neural ODEs. IEEE Transactions on Knowledge and Data Engineering (pp. 1–14). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/tkde.2022.3221989

Karlebach, G., Shamir, R. (2008) Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 9, 770–780. https://doi.org/10.1038/nrm2503

Kong, X., Yamashita, K., Foggo, B., & Yu, N. (2022). Dynamic Parameter Estimation with Physics-based Neural Ordinary Differential Equations. 2022 IEEE Power & Energy Society General Meeting (PESGM). https://doi.org/10.1109/pesgm48719.2022.9916840

Lai, Z., Liu, W., Jian, X., Bacsa, K., Sun, L., & Chatzi, E. (2022). Neural modal ordinary differential equations: Integrating physics-based modeling with neural ordinary differential equations for modeling high-dimensional monitored structures. Data-Centric Engineering, 3, E34. https://doi.org/10.1017/dce.2022.35

Lanzieri, D., Lanusse, F., & Starck, J.-L. (2022). Hybrid Physical-Neural ODEs for Fast N-body Simulations (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2207.05509

Ochab-Marcinek, A., Jędrak, J., & Tabaka, M. (2017). Hill kinetics as a noise filter: the role of transcription factor autoregulation in gene cascades. Physical Chemistry Chemical Physics (Vol. 19, Issue 33, pp. 22580–22591). https://doi.org/10.1039/c7cp00743d

Paoletti, M. E., Haut, J. M., Plaza, J., & Plaza, A. (2020). Neural Ordinary Differential Equations for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing (Vol. 58, Issue 3, pp. 1718–1734). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/tgrs.2019.2948031

Rudin, C. (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1, 206–215. https://doi.org/10.1038/s42256-019-0048-x


Posted on: 31 May 2023 , updated on: 21 August 2023

doi: https://doi.org/10.1242/prelights.34751


Author's response

Intekhab Hossain shared

Thanks for featuring our preprint! We really value and appreciate your interest in our work.

Q1: One central goal of modern biology is to understand the molecular characteristics and underlying regulatory mechanisms of cell fate decisions. Have you considered to use PHOENIX to screen for potential gene targets which manipulate cell fate decisions?

Yes, this is certainly a great use case, and something we see as a straightforward application of PHOENIX. We noted that most papers in this space have an application to cell fate decisions [1, 2, 3], but do so by first projecting the space down to a lower-dimensional space, through UMAP, PCA, highly variable gene subsets, etc. The black-box nature of these dimension reduction techniques leads to a subsequent loss in interpretability. Hence for our paper, we decided to do something different by applying PHOENIX towards understanding the nuanced dynamics of breast cancer progression, a task for which the genome-wide scalability of PHOENIX (without dimensionality reduction) is crucial. We discovered that PHOENIX can find not only potential driver (target) genes, but also potential driver pathways underlying breast cancer progression. Hence PHOENIX promises to be a powerful tool for discovering new targets for many biological processes (including cell fate) and can do so while remaining extremely explainable.

Q2: Neural ODEs are often even more unstable than regular ODEs. Have you tested whether your model fulfils the Lyapunov condition for stability?

This is a great question and is indeed something that we had been thinking about while designing the PHOENIX architecture. The instability you mention is one of the main reasons why neuralODEs are so unstable, and we kept this in mind when designing the biologically motivated activation functions. As you will notice, the Hill-like activation functions depict the Hill-Langmuir kinetic equations, which are indeed Lyapunov stable, allowing PHOENIX to model complex systems of interactions generally without numerical instability issues.

Q3: Have you studied whether there are subgroups of genes for which your predictability is particularly good or bad? And if so, are these gene families linked to certain processes or involved in big complexes? (The question is inspired by Fig. 2 of Srivastava et al., PLOS comp bio. 2022)

Thanks! This is a great suggestion, and we haven’t looked into this closely enough. For our applications to breast cancer dynamics in humans and cell cycle dynamics in yeast, we focused on getting good predictive performance across all genes, and have not yet investigated whether certain gene families are more amenable to good predictions. We hope to look into this in subsequent analyses and post any interesting findings as an addendum to our supplement.

 

References:
1. Liu, R., Pisco, A. O., Braun, E., Linnarsson, S., & Zou, J. (2022). Dynamical systems model of RNA velocity improves inference of single-cell trajectory, pseudotime and gene regulation. Journal of Molecular Biology, 434(15), 167606. https://doi.org/10.1016/j.jmb.2022.167606
2. Qiu, X., et al. (2022). Mapping transcriptomic vector fields of single cells. Cell, 185(4), 690–711. https://doi.org/10.1016/j.cell.2021.12.045
3. Yeo, G. H. T., Saksena, S. D., & Gifford, D. K. (2021). Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nature Communications, 12(1), 3222. https://doi.org/10.1038/s41467-021-23518-w

Comment from Benjamin Dominik Maier:

Intekhab Hossain (ihossain@g.harvard.edu), the first author of this study, is happy to answer questions and guide users through the setup process of PHOENIX, so feel free to send him a message or comment here 🙂

