
Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom

Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker

Preprint posted on 9 October 2023 https://doi.org/10.1101/2023.10.09.561603

A generalised structure prediction algorithm for a comprehensive understanding of atoms in biological units, spanning proteins, nucleic acids, small molecules, cofactors, and chemical modifications.

Selected by Saanjbati Adhikari

Figure 1: Copied from Figure 4A of the preprint. 

Background 

Proteins constitute the building blocks of biological processes. Consequently, knowledge of their native structures is a critical prerequisite for basic and therapeutic research (1). Experimental approaches to solving protein structures are notoriously laborious and costly, and many proteins are difficult to isolate and characterise in vitro (I can personally relate to such misfortune, as my PhD project primarily revolved around the challenges of protein isolation and structural characterisation). Conversely, primary amino acid sequences are readily available through widely used databanks, and can be used to estimate a protein’s folded conformation (1, 2). The advent of artificial intelligence-based tools for predicting biomolecular structures has revolutionised the life sciences over the past decade (2, 3). 

The evolution of the protein modelling field is truly remarkable. In the 1950s, pioneers such as Linus Pauling, G. N. Ramachandran, Alexander Rich, and F. H. C. Crick used a combination of stereochemical and geometric considerations, model-building, and mathematical equations to assemble secondary structures of alpha-keratin (4) and collagen (5, 6) into 3D models. However, the first X-ray analysis of a protein crystal by Kendrew and colleagues in 1958 (7) revealed that protein structure prediction can be more complicated than previously thought (8). After half a century of scientists trying to solve the ‘protein folding’ problem, the emergence of platforms like AlphaFold (AF; 9) and RoseTTAFold (RF; 10) radically transformed the protein structure prediction scene. Both approaches are highly accurate in predicting domain structures but exhibit low confidence when it comes to modelling loops, coiled-coil proteins, and disordered regions. The recently modified versions of AF and RF, AlphaFold-Multimer and RoseTTAFold, can predict complex protein assemblies composed of canonical amino acids. 

However, they lack the training to accurately predict the coordinates of small molecules, ligands, and cofactors usually associated with a protein complex. 

What has this study achieved? 

In this new multidisciplinary work coordinated by David Baker, a generalised structure prediction algorithm was developed to model all atoms within a biological unit, including proteins, nucleic acids, small molecules, cofactors, and chemical modifications. Using a novel companion tool, RFdiffusion All-Atom (RFdiffusionAA), the authors designed and experimentally validated binding partners of a therapeutic small molecule, an enzymatic cofactor, and optically active photosynthetic molecules. 

Summary of the paper 

Part 1: Development of a RoseTTAFold-based model for prediction of universal biomolecules 

  • Generalised Biomolecular Prediction with RoseTTAFold All-Atom (RFAA) 

Using the architectural framework of RoseTTAFold2 (RF2), the authors developed RoseTTAFold All-Atom (RFAA) to accept input information about a whole biomolecular assembly, i.e., amino acid and nucleic acid base sequences, metal ions, the bonded structure of small molecules, and covalent bonds between proteins and their interactors. 

The three-track system in RF2 was improved as follows: 

1) the 1D track includes 46 new tokens representing the most common element types found in the Protein Data Bank (PDB), in addition to the 20 residue and 8 nucleic acid base representations in RoseTTAFold Nucleic Acid (RFNA; 10), 

2) the 2D track encodes the types of bonds between atoms (single, double, triple, etc.), 

3) the 3D track encodes stereochemistry information (chirality of the molecules). 

Figure 2. Processing of molecular input information on RFAA for generalised biomolecular prediction. Copied from the preprint (Fig 1B). 
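To make the three input tracks described above more concrete, here is a minimal Python sketch of how a small molecule might be turned into 1D tokens, a 2D bond-type matrix, and a list of chiral centres. Everything in it is an illustrative assumption rather than the authors' implementation: the names `TOKEN_ID`, `BOND_TYPES` and `encode_ligand` are hypothetical, the element list is a small stand-in for the 46 element tokens, and the "aromatic" bond type is my own guess at what "etc." might cover.

```python
# Hypothetical vocabularies (not the real RFAA token set).
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")                     # 20 residue tokens
NUCLEIC_BASES = ["DA", "DC", "DG", "DT", "A", "C", "G", "U"]   # 8 base tokens
ELEMENTS = ["C", "N", "O", "S", "P", "Fe"]                     # stand-in for 46 elements

# Tokens are (kind, symbol) pairs so that residue "C" and element "C" stay distinct.
VOCAB = ([("aa", s) for s in AMINO_ACIDS]
         + [("na", s) for s in NUCLEIC_BASES]
         + [("el", s) for s in ELEMENTS])
TOKEN_ID = {token: i for i, token in enumerate(VOCAB)}

# Assumed bond-type codes for the 2D track; 0 means "no bond".
BOND_TYPES = {"none": 0, "single": 1, "double": 2, "triple": 3, "aromatic": 4}


def encode_ligand(elements, bonds, chiral_centres=()):
    """Return toy 1D / 2D / 3D inputs for a small molecule.

    elements:       list of element symbols, one per heavy atom
    bonds:          list of (i, j, bond_type_name) tuples
    chiral_centres: list of (centre, (n1, n2, n3), sign) tuples, sign is +1 or -1
    """
    tokens_1d = [TOKEN_ID[("el", e)] for e in elements]
    n = len(elements)
    bonds_2d = [[BOND_TYPES["none"]] * n for _ in range(n)]
    for i, j, kind in bonds:
        bonds_2d[i][j] = bonds_2d[j][i] = BOND_TYPES[kind]  # symmetric matrix
    return {"1d": tokens_1d, "2d": bonds_2d, "3d": list(chiral_centres)}


# Heavy atoms of acetate: CH3-C(=O)-O, which has no chiral centre.
acetate = encode_ligand(
    elements=["C", "C", "O", "O"],
    bonds=[(0, 1, "single"), (1, 2, "double"), (1, 3, "single")],
)
```

The point of the sketch is only that proteins, nucleic acids and arbitrary atoms can share one token vocabulary, while bonds and chirality travel in separate pairwise and geometric inputs.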

  • Training RFAA to predict protein-small molecule complexes. 

In AlphaFold (AF), the Frame Aligned Point Error (FAPE) loss function helps minimise the distance between the predicted structure and the actual PDB (Protein Data Bank) structure, thereby enhancing the accuracy of a model. The authors incorporated an all-atom version of FAPE in RFAA, where every atom was assigned a local coordinate frame based on neighbouring atoms. Similar to AF, the RFAA network also predicts atom- and residue-wise confidence (pLDDT) and pairwise confidence (PAE) to help identify high-quality predictions. 
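For readers who have not met FAPE before, the following Python sketch shows the general idea under stated assumptions. Each frame is built from three atoms by Gram-Schmidt orthogonalisation, all atoms are expressed in that frame for both the predicted and the true structure, and the clamped distances are averaged. The 10 Å clamp and 10 Å scale follow the AlphaFold2 definition; how RFAA actually chooses the neighbouring atoms is not covered in this highlight, so the `triplets` argument and all function names here are hypothetical.

```python
import math


def sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(a):
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


def frame_from_three_atoms(left, centre, right):
    """Orthonormal frame centred on `centre` (Gram-Schmidt on two bond vectors)."""
    e1 = unit(sub(right, centre))
    v2 = sub(left, centre)
    proj = dot(v2, e1)
    e2 = unit([v2[k] - proj * e1[k] for k in range(3)])
    e3 = cross(e1, e2)
    return (e1, e2, e3), centre


def to_local(frame, point):
    """Coordinates of `point` as seen from `frame`."""
    axes, origin = frame
    d = sub(point, origin)
    return [dot(axis, d) for axis in axes]


def fape(pred, true, triplets, clamp=10.0, scale=10.0):
    """Mean clamped error over all (frame, atom) pairs, divided by `scale`."""
    total, count = 0.0, 0
    for a, b, c in triplets:
        frame_pred = frame_from_three_atoms(pred[a], pred[b], pred[c])
        frame_true = frame_from_three_atoms(true[a], true[b], true[c])
        for j in range(len(pred)):
            lp = to_local(frame_pred, pred[j])
            lt = to_local(frame_true, true[j])
            err = math.sqrt(sum((lp[k] - lt[k]) ** 2 for k in range(3)))
            total += min(err, clamp)
            count += 1
    return total / (scale * count)


def rigid_move(p):
    """Rotate 90 degrees about z, then translate (a toy rigid-body motion)."""
    return [-p[1] + 5.0, p[0] - 2.0, p[2] + 1.0]


true_xyz = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.4, 1.6, 0.5]]
triplets = [(0, 1, 2), (1, 2, 3)]          # hypothetical neighbour choices
moved_xyz = [rigid_move(p) for p in true_xyz]
```

Because every error is measured inside a local frame, `fape(moved_xyz, true_xyz, triplets)` is essentially zero even though the two structures sit in different places in space, which is why no global superposition is needed.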

Of note, the authors demonstrated three successful examples where they used RFAA to model higher order biomolecular complexes. Comparison to other deep learning-based docking methods showed that RFAA predicts 42% of the complexes correctly, which is significantly higher than what was achieved by other tested methods. 

  • Prediction of protein covalent modifications using RFAA 

Posttranslational modifications (PTMs) involve the covalent addition of modifying groups to some proteins after their biosynthesis. The ability to model these covalent modifications can have direct implications for therapeutics and diagnostic designs. RFAA uses a process called ‘atomisation’ to treat proteins as atoms, rather than residues. Consequently, it models modifications in proteins by assigning atomic identity to a particular residue and the attached chemical moiety. 
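As a rough illustration of what ‘atomisation’ means in practice, the Python sketch below replaces a single serine residue token with its heavy atoms and then attaches a phosphate group through a covalent bond, giving a phosphoserine. The atom names follow common PDB conventions, but the data structures and the function `atomise` are my own hypothetical choices, not the RFAA code.

```python
# Heavy atoms of serine as (atom name, element) pairs, with bond orders.
SERINE_ATOMS = [("N", "N"), ("CA", "C"), ("C", "C"),
                ("O", "O"), ("CB", "C"), ("OG", "O")]
SERINE_BONDS = [("N", "CA", 1), ("CA", "C", 1), ("C", "O", 2),
                ("CA", "CB", 1), ("CB", "OG", 1)]

# Phosphate group to be attached to the side-chain oxygen.
PHOSPHATE_ATOMS = [("P", "P"), ("O1P", "O"), ("O2P", "O"), ("O3P", "O")]
PHOSPHATE_BONDS = [("P", "O1P", 2), ("P", "O2P", 1), ("P", "O3P", 1)]


def atomise(residue_atoms, residue_bonds, mod_atoms, mod_bonds, link):
    """Merge a residue and its modification into one atom-level graph.

    link is (residue_atom_name, modification_atom_name, bond_order).
    Returns (elements, bonds) where bonds use integer atom indices.
    """
    atoms = list(residue_atoms) + list(mod_atoms)
    index = {name: i for i, (name, _element) in enumerate(atoms)}
    elements = [element for _name, element in atoms]
    named_bonds = list(residue_bonds) + list(mod_bonds) + [link]
    bonds = [(index[a], index[b], order) for a, b, order in named_bonds]
    return elements, bonds


elements, bonds = atomise(SERINE_ATOMS, SERINE_BONDS,
                          PHOSPHATE_ATOMS, PHOSPHATE_BONDS,
                          link=("OG", "P", 1))
```

After this step the modified residue is just ten atom nodes and nine bonds, so it can be handled by the same machinery as any small molecule.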

In this work, 931 recent entries in the PDB were used for the prediction of covalently modified proteins. In 46% of the cases, the network made accurate predictions with modification RMSD < 2.5 Å. High-confidence structures (PAE interaction < 10) were obtained for 60% of the predictions, akin to that reported for protein-small molecule complexes. 
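The two percentages above come from applying simple cutoffs to per-prediction metrics. A minimal Python sketch of that bookkeeping is shown below, using made-up toy numbers rather than data from the preprint. It assumes that the predicted and reference coordinates have already been superimposed before the RMSD is taken; the function names and dictionary keys are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized, pre-aligned atom lists."""
    squared = sum((p[k] - q[k]) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for k in range(3))
    return math.sqrt(squared / len(coords_a))


def summarise(predictions, rmsd_cutoff=2.5, pae_cutoff=10.0):
    """Fraction of accurate (RMSD) and confident (PAE interaction) predictions."""
    n = len(predictions)
    accurate = sum(1 for p in predictions if p["rmsd"] < rmsd_cutoff)
    confident = sum(1 for p in predictions if p["pae_interaction"] < pae_cutoff)
    return accurate / n, confident / n


# Toy values for illustration only; these are NOT results from the preprint.
toy_predictions = [
    {"rmsd": 1.0, "pae_interaction": 5.0},
    {"rmsd": 2.0, "pae_interaction": 12.0},
    {"rmsd": 3.0, "pae_interaction": 8.0},
    {"rmsd": 6.0, "pae_interaction": 20.0},
]
accurate_fraction, confident_fraction = summarise(toy_predictions)
```

Keeping the accuracy cutoff (which needs the true structure) separate from the confidence cutoff (which does not) mirrors how a user would filter predictions for targets with no experimental structure.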

  • Small molecule binding protein design 

Although there have been several efforts aiming to ‘dock’ molecules into native protein scaffold structures, designing proteins that bind small molecules remains a challenge in the field. Building on recent work (11), the authors in this study extended RFAA into a diffusion model, RFdiffusion All-Atom (RFdiffusionAA). This model was trained to generate protein structures conditioned on the biomolecular structure they surround. Information about protein sequences or ‘motifs’ was included in the training, since motifs play a big role in determining ligand conformation. The authors then evaluated RFdiffusionAA in silico for four independent small molecules, utilising a combination of tools such as LigandMPNN and Rosetta GALigandDock. The results show that RFdiffusionAA designs achieve better computed binding energies than those from the existing RFdiffusion platform, which relies on an attractive/repulsive potential. 

Part 2: Experimental validation of designed binders 

Next, the authors designed proteins that bind three diverse small molecules using RFdiffusionAA, and experimentally validated their binding characteristics to further strengthen confidence in the design tool. 

  1. Previously, binders for digoxigenin, a small molecule therapeutic steroid for treating cardiovascular diseases, were designed based on co-crystal structures, binding fitness landscapes, and thermodynamic binding parameters (12). Such approaches may not be ideal for small molecules of diverse origins due to challenges in determining experimental characteristics. In this work, the authors employed RFdiffusionAA to design digoxigenin-binding backbones without utilising prior knowledge about the protein-ligand interface or backbone structure. After fitting sequences onto the backbones with the assistance of LigandMPNN and Rosetta FastRelax, over 4000 designs were selected based on AF2 predictions and Rosetta metrics and screened for binding via fluorescence-activated cell sorting. Finally, 3 of these designed proteins, displaying high binding signals, were purified and characterised in vitro. The most potent digoxigenin binder demonstrated a dissociation constant (Kd) of 10 nM and exhibited remarkable thermostability at temperatures as high as 98 °C. This demonstrates that the approach can generate novel binders for small molecules in a robust and resourceful manner. 
  2. Heme is an iron-containing porphyrin and a critical cofactor in the transport and storage of oxygen in vertebrates. Hemoglobin, myoglobin, cytochromes, etc. all contain heme as a prosthetic group. Therefore, designing novel high-affinity heme binders holds promise for potential therapeutic and diagnostic avenues in the future. Using RFdiffusionAA, the authors designed heme binders and selected 168 of the designs based on AF predictions and confidence of the backbone design (indicated by RMSD). The critical part in these designs was that, unlike for the small molecule digoxigenin, the catalytic function of heme depends on its central iron atom, which is coordinated by a cysteine residue above the porphyrin ring. Finally, after purification from the bacterial heterologous system, 38 of the designs were obtained as monomeric, thermostable, heme-binding structures, suggesting potential therapeutic and industrial applications. 
  3. Employing the same design approach, the researchers could also build optically enhanced binders for bilins, essential light-harvesting molecules in photosynthetic organisms. Of particular interest, 3 designs showed satisfactory fluorescence quantum yields, demonstrating potential in building novel complexes with increased light-capturing capacity and subsequently higher photosynthetic output. 
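To put the reported Kd of 10 nM into perspective, the short Python sketch below computes the fraction of binder occupied at a given ligand concentration. It assumes simple 1:1 binding at equilibrium with the free ligand in large excess over the protein, which is a textbook simplification and not necessarily how the authors fitted their data; the function name is hypothetical.

```python
def fraction_bound(ligand_nM, kd_nM):
    """Fraction of protein occupied for 1:1 binding: [L] / ([L] + Kd)."""
    return ligand_nM / (ligand_nM + kd_nM)


KD_NM = 10.0  # dissociation constant quoted above for the best digoxigenin binder

# Occupancy at a few free-ligand concentrations (in nM).
occupancy = {conc: fraction_bound(conc, KD_NM) for conc in (1.0, 10.0, 90.0)}
```

By definition, half of the binder is occupied when the free ligand concentration equals the Kd, so a 10 nM Kd means the protein is already half-saturated at 10 nM digoxigenin under these assumptions.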

What I loved about this work 

  • What truly stood out for me in this study was the utilisation of in vitro tools to validate in silico predictions performed with RFAA and RFdiffusionAA. It satisfied my scientific curiosity to see orthogonal evidence that further strengthened the presented model and offered tangible proof to appreciate the credibility of the predictions. 
  • The idea of “atomisation” stood out particularly because of the elegance of the concept where protein residues are simply treated as atoms. This model assigns atomic identity not only to the constituent residues of a protein, but also extends it to diverse small molecules closely associated with an individual protein or a protein complex. By reducing macromolecules to their elemental units, this approach encapsulates the essence of all matter with a remarkable level of simplicity. 
  • Another interesting attribute of the study is that the authors tested how well RFAA generalises by evaluating its ability to predict protein-small molecule interactions outside of the training dataset. Showing that the tool can handle a diverse array of proteins and small molecules beyond those included in the training set not only highlights its adaptability but also underscores the robust and comprehensive approach taken in its development. 

Questions to the authors 

  • Is there a maximum limit to the number of atoms or residues that can be accurately predicted using this training network? 
  • Given the promising outcomes observed in three experimental cases in this study, can RFAA-mediated modelling potentially contribute to understanding dynamic transient interactions crucial for accurate cellular functioning? For example, in cell cycle pathways, small molecules external to critical protein complexes dynamically interact with enzymes and functional domains within short time frames. Often, it is challenging to capture such interactive complexes in vitro. I was wondering whether RFAA has been tested on such transiently or reversibly stable complexes. 

References: 

  1. Lupas, A. N., Pereira, J., Alva, V., Merino, F., Coles, M., & Hartmann, M. D. (2021). The breakthrough in protein structure prediction. Biochemical Journal, 478(10), 1885–1890. https://doi.org/10.1042/BCJ20200963 
  2. Kuhlman, B., & Bradley, P. (2019). Advances in protein structure prediction and design. Nature Reviews Molecular Cell Biology, 20(11), 681–697. https://doi.org/10.1038/s41580-019-0163-x 
  3. Hekkelman, M. L., De Vries, I., Joosten, R. P., & Perrakis, A. (2023). AlphaFill: Enriching AlphaFold models with ligands and cofactors. Nature Methods, 20(2), 205–213. https://doi.org/10.1038/s41592-022-01685-y 
  4. Pauling, L., & Corey, R. B. (1953). Compound Helical Configurations of Polypeptide Chains: Structure of Proteins of the α-Keratin Type. Nature, 171(4341), 59–61. https://doi.org/10.1038/171059a0 
  5. Ramachandran, G. N., & Kartha, G. (1955). Structure of Collagen. Nature, 176(4482), 593–595. https://doi.org/10.1038/176593a0 
  6. Rich, A., & Crick, F. H. C. (1955). The Structure of Collagen. Nature, 176(4489), 915–916. https://doi.org/10.1038/176915a0 
  7. Kendrew, J. C., Bodo, G., Dintzis, H. M., Parrish, R. G., Wyckoff, H., & Phillips, D. C. (1958). A Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis. Nature, 181(4610), 662–666. https://doi.org/10.1038/181662a0 
  8. Lupas, A. N., Pereira, J., Alva, V., Merino, F., Coles, M., & Hartmann, M. D. (2021). The breakthrough in protein structure prediction. Biochemical Journal, 478(10), 1885–1890. https://doi.org/10.1042/BCJ20200963 
  9. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2 
  10. Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., Wang, J., Cong, Q., Kinch, L. N., Schaeffer, R. D., Millán, C., Park, H., Adams, C., Glassman, C. R., DeGiovanni, A., Pereira, J. H., Rodrigues, A. V., Van Dijk, A. A., Ebrecht, A. C., … Baker, D. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557), 871–876. https://doi.org/10.1126/science.abj8754 
  11. Watson, J. L., Juergens, D., Bennett, N. R., Trippe, B. L., Yim, J., Eisenach, H. E., Ahern, W., Borst, A. J., Ragotte, R. J., Milles, L. F., Wicky, B. I. M., Hanikel, N., Pellock, S. J., Courbet, A., Sheffler, W., Wang, J., Venkatesh, P., Sappington, I., Torres, S. V., … Baker, D. (2022). Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models [Preprint]. Biochemistry. https://doi.org/10.1101/2022.12.09.519842 
  12. Tinberg, C. E., Khare, S. D., Dou, J., Doyle, L., Nelson, J. W., Schena, A., Jankowski, W., Kalodimos, C. G., Johnsson, K., Stoddard, B. L., & Baker, D. (2013). Computational design of ligand-binding proteins with high affinity and selectivity. Nature, 501(7466), 212–216. https://doi.org/10.1038/nature12443 

 

Posted on: 24 January 2024

doi: https://doi.org/10.1242/prelights.36373
