Conserved phosphorylation hotspots in eukaryotic protein domain families

Marta J Strumillo, Michaela Oplova, Cristina Vieitez, David Ochoa, Mohammed Shahraz, Bede P Busby, Richelle Sopko, Romain A Studer, Norbert Perrimon, Vikram G Panse, Pedro Beltrao

Posted on: 16 August 2018

Preprint posted on 13 August 2018

Article now published in Nature Communications at http://dx.doi.org/10.1038/s41467-019-09952-x

Rules of the game: extensive comparative study reveals phosphorylation hotspots in key eukaryotic protein domains

Selected by Dey Lab

Categories: evolutionary biology, systems biology

Context

Post-translational modifications (PTMs), and protein phosphorylation in particular, serve as the cell’s precision rewiring tools. The addition (or removal) of a single charged phosphate group can alter a protein’s enzymatic activity, drive a binding partner switch, affect folding and stability, or force a change in subcellular location, all in a matter of seconds!

Advances in mass spectrometry have made it possible to map sites of protein phosphorylation on a massive scale. In recent years, this has produced an overwhelming wealth of genome-wide data across a range of cell types and species, along with new analytical challenges. Take the human proteome: of approximately 160,000 non-redundant phosphosites¹, only a tiny fraction is annotated, and many might not actually be functional², with little or no contribution to fitness. How, then, to assess the functional relevance of all this data?

Evolutionary comparisons can help, based on the argument that highly conserved phosphosites are likely to be functional. However, these comparisons are not without their own challenges. Many functional phosphosites lie within unstructured regions of proteins, presumably relaxing selective constraints on their positions. Further complicating such analyses, individual phosphosites have the capacity to flip to acidic residues (and back) on relatively short evolutionary timescales³, rewiring signaling cascades in the process⁴.

Building on work initiated when he was a postdoc at UCSF², Pedro Beltrao and colleagues circumvent these challenges to generate what is, to my knowledge, the most comprehensive comparative analysis of phosphosites, encompassing more than 500,000 phosphosites across 40 eukaryotic genomes.

Major findings

The authors selected a subset of 344 well-represented Pfam domain families with a high density of phosphorylation sites. They used a rolling window to account for alignment and assignment errors, together with a background expectation generated by randomly permuting phosphosites to equivalent residues within the same sequence, and identified significant “hotspots” within 162 of the 344 families. Encouragingly, the hotspots were enriched for known functional phosphosites and, once mapped onto structural models, recovered well-characterized regulatory motifs (Figure 1).
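To make the rolling-window idea concrete, here is a minimal Python sketch of how such a permutation background could be computed for one aligned domain family. It is not the authors' code: the function names, the toy alignment, the window width of 3 columns and the use of an empirical p-value (rather than whatever test statistic the paper uses) are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def window_sums(counts, width):
    """Sum per-column counts in a centred window of `width` columns."""
    half = width // 2
    n = len(counts)
    return [sum(counts[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def column_counts(sites_per_seq, n_cols):
    """Count phosphosites per alignment column across all sequences."""
    counts = [0] * n_cols
    for sites in sites_per_seq:
        for col in sites:
            counts[col] += 1
    return counts

def shuffle_sites(seq, sites, rng):
    """Move each phosphosite to a random column that holds the same
    residue type within the same sequence (gap columns are skipped)."""
    by_residue = defaultdict(list)
    for col, aa in enumerate(seq):
        if aa != "-":
            by_residue[aa].append(col)
    n_per_residue = defaultdict(int)
    for col in sites:
        n_per_residue[seq[col]] += 1
    shuffled = set()
    for aa, k in n_per_residue.items():
        shuffled.update(rng.sample(by_residue[aa], k))
    return shuffled

def hotspot_pvalues(alignment, sites_per_seq, width=3, n_perm=1000, seed=0):
    """Return observed windowed counts and an empirical p-value per column.
    The empirical p-value is an assumption of this sketch."""
    rng = random.Random(seed)
    n_cols = len(alignment[0])
    observed = window_sums(column_counts(sites_per_seq, n_cols), width)
    exceed = [0] * n_cols
    for _ in range(n_perm):
        perm = [shuffle_sites(s, sites, rng)
                for s, sites in zip(alignment, sites_per_seq)]
        null = window_sums(column_counts(perm, n_cols), width)
        for i in range(n_cols):
            if null[i] >= observed[i]:
                exceed[i] += 1
    pvals = [(e + 1) / (n_perm + 1) for e in exceed]
    return observed, pvals

# Hypothetical toy alignment: three sequences, phosphosites clustered at columns 5-6.
alignment = ["AASAATSAASAA", "AATAASSAASAA", "ASAAASTAAAAS"]
sites_per_seq = [{6}, {5, 6}, {5}]
observed, pvals = hotspot_pvalues(alignment, sites_per_seq)
print(observed)
print([round(p, 3) for p in pvals])
```

Because each site is only ever moved to the same residue type within its own sequence, the null distribution keeps each protein's phosphosite count and S/T/Y composition fixed. This is the point of permuting "to equivalent residues within the same sequence".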

Figure 1: Reproduced from Figure 2 of Strumillo et al. 2018 under a CC-BY-NC-ND 4.0 international license. Enrichment over random of protein phosphorylation along the domain sequence, shown here for 4 domains. The average number of phosphosites observed per rolling window is plotted as a solid black line (observed). The background level of expected phosphorylation calculated from random sampling is shown as a gray line, with standard deviations as a gray band. The blue line represents the negative logarithm of the p-value at each position (right y axis). A horizontal red line indicates a cut-off of the Bonferroni corrected p-value of 0.01. Positions with a -log(p-value) above this cut-off and average phosphosites per window higher than 2 are considered putative regulatory regions and highlighted under a vertical yellow bar. Red circles indicate human phosphosite positions with known regulatory function. In the structural representations, the predicted hotspot regions are highlighted in yellow.
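The two-part hotspot criterion described in the caption can be sketched as a small Python filter. This is an illustrative reading rather than the authors' implementation: the function name, the use of a base-10 logarithm, the choice of the number of alignment positions as the Bonferroni correction factor, and the input values are all assumptions.

```python
import math

def call_hotspots(pvalues, window_means, alpha=0.01, min_mean=2.0):
    """Flag positions whose -log10(p) exceeds a Bonferroni-corrected cut-off
    and whose average phosphosites per window is above `min_mean`."""
    n_tests = len(pvalues)                  # assumed: one test per position
    cutoff = -math.log10(alpha / n_tests)   # Bonferroni-corrected threshold
    flagged = []
    for pos, (p, mean) in enumerate(zip(pvalues, window_means)):
        score = math.inf if p == 0 else -math.log10(p)
        if score > cutoff and mean > min_mean:
            flagged.append(pos)
    return flagged

# Hypothetical per-position values for a five-column domain.
pvalues = [0.5, 1e-6, 1e-5, 0.2, 1e-8]
window_means = [0.4, 3.1, 1.5, 2.5, 2.2]
hotspots = call_hotspots(pvalues, window_means)
print(hotspots)  # → [1, 4]
```

Position 2 is significant but sparsely phosphorylated, and position 3 is dense but not significant, so neither is flagged. Only positions that pass both filters are returned.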

The authors then looked to generalize their analyses, and found that the hotspots tend to be located proximal to catalytic residues or binding interfaces across a broad range of domain families. Zooming in, they make experimentally tractable functional predictions for uncharacterised hotspots in two enzymes, IMP dehydrogenase and transaldolase. Finally, they selected two phosphorylation sites within a budding yeast ribosomal S11 domain hotspot for an experimental case study of their own, demonstrating a functional role for one of them.

What’s next?

I think it’s quite clear that this study has generated a fantastic resource for the larger community, although, as the authors are quick to point out in their discussion, follow-up structural biology analyses will not necessarily be straightforward. To end with a couple of open-ended questions:

Could the hotspot database be flipped around to help answer evolutionary questions? For example, are there any systematic differences between the organisation of hotspots in apicomplexan parasites and their free-living relatives? Or between multicellular and unicellular organisms?
How far are we from a bottom-up, synthetic biology approach to designing protein switches or circuits controlled by phosphorylation?
On an even deeper evolutionary timescale: how many of the domain families are present in bacteria or archaea? It would be fascinating to extend the hotspot analysis beyond eukaryotes if feasible!

References:

1. Hornbeck, P. V. et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43, D512–D520 (2015).
2. Beltrao, P. et al. Systematic Functional Prioritization of Protein Posttranslational Modifications. Cell 150, 413–425 (2012).
3. Pearlman, S. M., Serber, Z. & Ferrell, J. E. A mechanism for the evolution of phosphorylation sites. Cell 147, 934–946 (2011).
4. Dey, G. & Meyer, T. Phylogenetic Profiling for Probing the Modular Architecture of the Human Genome. Cell Syst. 1, 106–115 (2015).

Tags: comparative genomics, phosphoproteomics

doi: https://doi.org/10.1242/prelights.4393
