State-of-the-Art Estimation of Protein Model Accuracy using AlphaFold

James P. Roney, Sergey Ovchinnikov

Preprint posted on 24 March 2022

Article now published in Physical Review Letters.

AlphaFold may not be “just” a pattern recognition algorithm, but may actually have learnt about the energetics of protein folding.

Selected by Kieran Didi

Categories: bioinformatics, biophysics

Background

The field of protein structure prediction was revolutionized last year when the DeepMind team, which won the 14th Critical Assessment of Structure Prediction (CASP14) competition, published the paper (2) and the code for their AlphaFold (AF2) model. This major advance on the protein folding problem promises progress in many areas of biology and medicine, since protein structures are essential for research in e.g. drug discovery and protein engineering, but are often only accessible via expensive and laborious experimental methods such as X-ray crystallography, cryo-electron microscopy and NMR spectroscopy. These methods have enabled scientists to elucidate more than 100,000 protein structures (available through the Protein Data Bank), but are costly and involve a lot of trial and error. Computational methods try to simplify this procedure by predicting the 3D structure of a protein directly from its linear amino acid sequence, without determining it experimentally.

Early efforts at simplifying protein structure prediction aimed to capture the physics that govern protein folding and simulate the folding process to get an accurate structure; a prime example of this is the Rosetta software suite developed by David Baker and co-workers at the University of Washington in Seattle. In the 90s, coevolution information was recognized as a valuable input for protein structure prediction. For this, multiple sequence alignments (MSAs) between evolutionarily related proteins are constructed and spatial contacts are inferred based on coevolution of amino acids. Finally, the progress in the field of machine learning and especially deep learning also had an impact on the structural biology community, with huge models such as AlphaFold producing state-of-the-art protein structure predictions.

One open question regarding AlphaFold is whether the model has learned something about the underlying physics of protein folding, or whether it is “just” a pattern matching algorithm that is inherently dependent on the provided MSAs. Since proteins in nature fold astonishingly fast despite the astronomical number of conformations they could in principle explore (Levinthal’s paradox), and some of them can refold after denaturation (as observed by Anfinsen (3)), the 3D structure must be encoded in the protein sequence alone. Anfinsen’s dogma therefore states that proteins fold as a result of free energy minimization. This free energy depends on the protein structure, and it is what earlier physics-based prediction tools tried to approximate (and what is, for example, still approximated in techniques like molecular dynamics simulations).

Figure 1 of the preprint: The hypothesized role of coevolutionary information in AlphaFold’s prediction procedure. According to this, AlphaFold implicitly learns an energy function of the protein conformational landscape.


In this preprint, Roney and Ovchinnikov address this question by testing the hypothesis that AlphaFold has learned such an energy function and uses coevolution information only to find a good initial guess for an energy minimum in this conformational landscape, i.e. that it does understand something about the underlying physics of protein folding. Building on this hypothesis, they repurpose AlphaFold to rank decoy protein structures, outperforming state-of-the-art (SOTA) models on this task.

Key findings

Use of AF2 for ranking candidate protein structures

During the structure prediction process, AlphaFold takes as input an MSA of the target protein’s amino acid sequence with related sequences. Optionally, known protein structures close to the target protein sequence (known as templates) can also be provided to improve prediction results. The model then outputs a predicted protein structure and two confidence metrics for this prediction: the predicted lDDT-Cα score (pLDDT) and the predicted TM score (pTM).

To change the objective of AlphaFold from predicting protein structures to ranking candidate structures, the authors made three adjustments. First, instead of providing known protein structures as templates, they provide a “decoy structure”, i.e. a candidate structure for the target protein, e.g. one predicted by another model. Second, they do not provide an MSA as input, but only the amino acid sequence of the target protein, thereby stripping the model of the ability to use coevolutionary information. Lastly, they compute a new output metric called a “composite confidence score” from the existing metrics: the product of the output pLDDT, the output pTM and the TM score between the structure predicted by AlphaFold and the decoy. The last term is needed because the main objective is not to assess the quality of the predicted structure, but the quality of the decoy structure that was given as a template.
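As a rough illustration of the idea, the composite score can be sketched as a simple product of the three quantities described above. This is a minimal sketch with made-up numbers, not the authors’ code; the function and argument names are illustrative, and pLDDT is assumed to be on its usual 0–100 scale:

```python
def composite_confidence(plddt, ptm, tm_pred_vs_decoy):
    """Composite confidence score for ranking a decoy (illustrative sketch):
    the product of AlphaFold's two confidence outputs and the structural
    similarity (TM score) between AlphaFold's own prediction and the decoy
    that was given as template.

    plddt is normalised from its 0-100 scale; ptm and tm_pred_vs_decoy are
    already in [0, 1], so the composite score is also in [0, 1].
    """
    return (plddt / 100.0) * ptm * tm_pred_vs_decoy

# A decoy that AlphaFold confidently reproduces gets a high score...
high = composite_confidence(plddt=92.0, ptm=0.88, tm_pred_vs_decoy=0.95)
# ...while a decoy the model "walks away from" scores low, even at the
# same pLDDT/pTM, because the prediction no longer matches the decoy.
low = composite_confidence(plddt=92.0, ptm=0.88, tm_pred_vs_decoy=0.30)
assert high > low
```

The third factor is what turns a confidence metric about AlphaFold’s *own* output into a statement about the *decoy*: a high score requires both a confident prediction and a prediction that stays close to the supplied decoy.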

The authors use this approach to rank decoys from the Rosetta decoy dataset, which contains 133 native protein structures along with thousands of decoy structure variants, and compare the performance with common decoy ranking tools such as Rosetta (4) and the SOTA machine learning model DeepAccNet (5). Their AlphaFold-based approach strongly outperforms both Rosetta and DeepAccNet, in terms of the Spearman correlation of the confidence metric with decoy quality as well as the top-1 accuracy of selecting the best decoy structure.
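To make the evaluation metric concrete, here is a small self-contained illustration (not the authors’ code, and with made-up numbers) of how a confidence score can be scored against true decoy quality via Spearman correlation, i.e. Pearson correlation of the ranks:

```python
def spearman(x, y):
    """Spearman rank correlation (no tie handling; illustrative only):
    rank both lists, then compute the Pearson correlation of the ranks."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical decoy set: confidence scores vs. true decoy quality
# (e.g. TM score to the native structure). The values are invented.
scores  = [0.91, 0.40, 0.75, 0.20, 0.66]
quality = [0.88, 0.35, 0.80, 0.15, 0.60]
assert abs(spearman(scores, quality) - 1.0) < 1e-9  # perfectly monotone
```

A correlation near 1 means the confidence metric orders decoys the same way their true quality does, which is exactly what a ranking tool needs; top-1 accuracy instead asks only whether the single best decoy receives the highest score.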

Ranking quality independent of decoy’s amino acid sequence

The decoy structures the authors provide to the model have their side chains masked out. Since a decoy then basically consists of only the backbone and the Cβ atoms, any sequence of the correct length could be fed into the model as input instead of the true sequence of the target protein. The authors investigated the influence of this choice by running their experiments with two different one-hot-encoded sequence inputs: the true target sequence and an all-alanine sequence. They found that both choices deliver robust results on the Rosetta decoy dataset, with the all-alanine sequence performing better on the correlation metric and the correct target sequence performing better on top-1 accuracy. The authors then used this result to further refine their hypothesis about the inner workings of their decoy ranking procedure: when the target sequence is the input, this sequence and the masked-out sequence of the template are identical, and the structural predictions are therefore probably very similar. Since the global geometry is thus largely shared, the confidence metrics used to compute the composite score depend more on local fold features, delivering better top-1 accuracy.
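The two sequence inputs mentioned above can be sketched as follows — a minimal, illustrative one-hot encoding over the 20 standard amino acids (the example sequence is invented; AlphaFold’s actual input featurization is more involved):

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

def one_hot(seq):
    """One-hot encode a protein sequence as an (L, 20) array:
    row i has a single 1.0 in the column of residue i's amino acid type."""
    idx = {aa: i for i, aa in enumerate(AA)}
    enc = np.zeros((len(seq), len(AA)))
    for pos, aa in enumerate(seq):
        enc[pos, idx[aa]] = 1.0
    return enc

target = "MKTAYIAK"            # hypothetical target sequence
poly_ala = "A" * len(target)   # all-alanine sequence of the same length

# Both encodings have identical shape, so either can be fed to the model
# in place of the other when the template's side chains are masked out.
assert one_hot(target).shape == one_hot(poly_ala).shape == (8, 20)
```

Because the masked decoy retains only backbone and Cβ geometry, the model cannot check the sequence against side-chain identities — which is what makes the all-alanine substitution possible in the first place.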

For the all-alanine sequence, the opposite is the case: due to the very low sequence similarity, the global geometries of decoy and prediction will differ substantially, causing the confidence metrics to be strongly influenced by the global fold and the model to perform better on the overall correlation metric. Using a weighted hybrid approach, the authors were able to combine the strengths of both inputs and outperform the results obtained by providing either input alone.
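One plausible form for such a hybrid is a weighted average of the two composite scores — shown here purely as a sketch; the weight and the exact combination scheme the authors use are described in the preprint, and the names below are illustrative:

```python
def hybrid_score(score_target_seq, score_poly_ala, w=0.5):
    """Weighted combination of the composite scores obtained with the
    true target sequence and with the all-alanine sequence.
    w = 1.0 recovers the target-sequence score (better top-1 accuracy),
    w = 0.0 recovers the poly-alanine score (better overall correlation);
    the intermediate value here is an arbitrary illustration."""
    return w * score_target_seq + (1.0 - w) * score_poly_ala

# Example: two made-up composite scores for the same decoy.
combined = hybrid_score(score_target_seq=0.8, score_poly_ala=0.6, w=0.5)
assert abs(combined - 0.7) < 1e-9
```

The intuition is that the two inputs act as complementary "lenses" — one sensitive to local fold quality, the other to global fold quality — and a weighted blend keeps information from both.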

Evaluation from CASP14: MSAs needed for accurate structure prediction, not for decoy ranking

To test their hypothesis on an independent dataset, the authors turned to the CASP14 EMA (Estimation of Model Accuracy) tasks. Here, they show that AlphaFold is indeed able to rank decoys better than the top models from CASP14 without coevolution information, but still needs the MSAs to perform structure prediction itself. Without MSAs, it can rank predicted decoys reliably but performs poorly at producing structure predictions, further supporting the authors’ hypothesis that coevolution information is used to provide a good initial guess on the learned energy landscape, from which the structure module performs local gradient descent to an energetic minimum.

Why I selected this preprint

The publication of AlphaFold had a major influence on the structural biology community and the life sciences in general, both by producing structure predictions approaching experimental accuracy and by providing thousands of predicted structures to researchers around the world. However, more difficult problems such as protein design still pose a challenge. The main hypothesis of this preprint (i.e. that AlphaFold has learnt some kind of underlying energy function) presents a novel idea that suggests new angles from which challenging problems in structural biology can be tackled.

Questions for the authors

1. The preprint provides evidence for your hypothesis that AF2 learns an energy function for protein folding, but what other experiments could be used to support/falsify your hypothesis?

2. For protein structure prediction, the MSAs still seem indispensable. If your hypothesis is true, in what ways could this new insight be used for problems such as protein design/structure prediction for single sequences?


(1)       Roney, J. P.; Ovchinnikov, S. State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold. bioRxiv March 12, 2022, p 2022.03.11.484043.

(2)       Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; Bridgland, A.; Meyer, C.; Kohl, S. A. A.; Ballard, A. J.; Cowie, A.; Romera-Paredes, B.; Nikolov, S.; Jain, R.; Adler, J.; Back, T.; Petersen, S.; Reiman, D.; Clancy, E.; Zielinski, M.; Steinegger, M.; Pacholska, M.; Berghammer, T.; Bodenstein, S.; Silver, D.; Vinyals, O.; Senior, A. W.; Kavukcuoglu, K.; Kohli, P.; Hassabis, D. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596 (7873), 583–589.

(3)       Anfinsen, C. B.; Scheraga, H. A. Experimental and Theoretical Aspects of Protein Folding. In Advances in Protein Chemistry; Anfinsen, C. B., Edsall, J. T., Richards, F. M., Eds.; Academic Press, 1975; Vol. 29, pp 205–300.

(4)       Rubenstein, A. B.; Blacklock, K.; Nguyen, H.; Case, D. A.; Khare, S. D. Systematic Comparison of Amber and Rosetta Energy Functions for Protein Structure Evaluation. J. Chem. Theory Comput. 2018, 14 (11), 6015–6025.

(5)       Hiranuma, N.; Park, H.; Baek, M.; Anishchenko, I.; Dauparas, J.; Baker, D. Improved Protein Structure Refinement Guided by Deep Learning Based Accuracy Estimation. Nat. Commun. 2021, 12 (1), 1340.

Tags: alphafold, casp, protein structure prediction, rosetta

Posted on: 13 April 2022



Author's response

James Roney shared

Thanks for sharing this with us! I think your writeup does a really good job summarizing our preprint, and I’m glad you found it interesting! The two questions you had at the end are very important, and we hope to address them robustly in the future. Here are some preliminary perspectives on those questions:

1. If AlphaFold has learned an energy function, we might expect it to be useful for other applications like predicting the effects of single mutations on protein stability, or for improving the accuracy of protein design. This suggests some new experiments that could be used to test the hypothesis we’ve proposed in the preprint, and future versions of our paper may contain some of these experiments.
2. Using the energy function learned by AlphaFold to predict protein structures from single sequences is a very exciting possibility opened up by the hypothesis we’ve proposed in our preprint. In theory, it should be possible to search over the space of possible decoy conformations to find structures that produce high-confidence outputs from AlphaFold. However, it is unclear whether this is computationally feasible in general, or if such a search might uncover adversarial structures that “trick” AlphaFold into being highly confident. In the latest version of our preprint, we’ve explored a simple approach to optimizing decoy structures in Appendix E. Essentially, we showed that a simple greedy optimization procedure can be used to improve the accuracy of AlphaFold’s MSA-free predictions for many protein targets. There’s still plenty of work to do to see if this approach can be improved upon and generalized, but we think it’s a very interesting proof-of-concept.
