Massively parallel protein-protein interaction measurement by sequencing (MP3-seq) enables rapid screening of protein heterodimers

Alexander Baryshev, Alyssa La Fleur, Benjamin Groves, Cirstyn Michel, David Baker, Ajasja Ljubetič, Georg Seelig

Preprint posted on 9 February 2023

High-throughput sequencing of protein-protein interactions quantifies more than 100,000 interactions simultaneously with interaction strengths spanning several orders of magnitude.

Selected by Benjamin Dominik Maier

Categories: bioinformatics, genomics


Protein-protein interactions (PPIs) are ubiquitous in biological systems and play a crucial role in various biological processes such as cell proliferation, growth, and differentiation, as well as triggering downstream responses through cellular signalling (Westermarck et al., 2013). They are necessary to form complexes, transfer electrons (McMillan et al., 2013), regulate processes and transport molecules between different compartments. Proteins bind to each other through multiple mechanisms including hydrophobic bonding, van der Waals forces, and salt bridges (Xie et al., 2015). Binding specificity and strength (from transient to permanent) vary greatly: some promiscuous proteins interact non-specifically with many other proteins, whereas others have exactly one partner. Hence, proteome-wide studies need quantitative interaction measurements that span several orders of magnitude.

As aberrant PPIs have been associated with numerous human diseases, including cancer, infectious diseases, and neurodegenerative diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease (Lu et al., 2020), targeting PPIs has become an important focus for novel treatments (Corbi-Verge et al., 2016; Modell et al., 2016). However, most existing approaches for analysing PPIs are limited by scalability, a lack of knowledge of interaction sites, or the need for tedious laboratory work. Therefore, many existing methods have been improved and many new methods have been developed to better identify which proteins interact and how they interact.

In this study, Baryshev, La Fleur et al. (BioRxiv, 2023) developed a new Yeast Two-Hybrid approach with direct measurement of interaction strength by DNA sequencing (MP3-seq) for high-throughput study of protein-protein interactions.

Current Approaches to Study Protein-Protein Interactions

There are multiple high-throughput experimental and computational methods to characterise and quantify PPIs, with yeast two-hybrid (Y2H) and tandem affinity purification-mass spectrometry (TAP-MS) being the most commonly used. Depending on the aim of a study, different factors (e.g. in vivo or in vitro, qualitative or quantitative approach, transient and/or stable PPIs, indirect or direct identification) should be considered when choosing a method to address the research question. A more detailed overview of genetic, biochemical, biophysical and computational methods to study protein-protein interactions can be found below as a supplementary comment. Additionally, I found the reviews by Stynen et al. (2012) and Rao et al. (2014) quite helpful, even though they do not cover recent advances in the field.

Key Findings

New scalable Yeast Two-Hybrid approach (MP3-seq)

Baryshev, La Fleur et al. (BioRxiv, 2023) introduced a new scalable Yeast Two-Hybrid approach that can rapidly identify more than 100,000 pairwise protein-protein heterodimer interactions and quantify interaction strengths over several orders of magnitude. The workflow involves transforming all necessary DNA fragments into histidine-auxotrophic yeast, where they are assembled. This is followed by positive plasmid selection, after which the cells are split. While some cells are set aside, the rest are subjected to a His selection in which cells can only grow when the proteins of interest interact, triggering the expression of an essential enzyme in the histidine biosynthesis pathway.

In more detail, yeast cells (histidine and tryptophan auxotroph) are transformed with all necessary components via electroporation. To reduce unwanted variability that could distort subsequent quantification, a centromere sequence is added, as it ensures the expression of one pair of hybrid proteins per cell on average. As all elements can assemble into a plasmid in yeast through homologous recombination, no additional plasmid cloning (E. coli) or mating (yeast) step is required. Subsequently, the cells are grown in a selective medium lacking tryptophan (Trp), which only allows growth of yeast that have been successfully transformed and have assembled the plasmid (positive selection).

Because quantification relies on comparing barcode counts before and after His selection, some cells are frozen before the remaining cells are transferred to a selective medium without histidine (His). If the two proteins being screened interact with each other, the growth-essential enzyme his3 is expressed, allowing cells to grow in the absence of histidine. his3 encodes IGP dehydratase, which catalyses an essential step in the histidine biosynthesis pathway. This means that cells carrying two strongly/permanently interacting proteins on their plasmid express more IGP dehydratase, allowing more histidine to be produced and thereby faster growth. On the other hand, cells with transiently interacting proteins grow more slowly given the lower amount of available histidine.

Finally, cells from both the pre- and post-His selection samples are lysed and the plasmid DNA is extracted for subsequent amplification and sequencing. Based on the barcode counts before and after His selection, one can compute the relative enrichment, which serves as a proxy for the interaction strength.
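To make the enrichment idea concrete, here is a minimal Python sketch. It is not the authors' code or their exact equation: it assumes that enrichment is the ratio of a barcode's share of reads after His selection to its share before selection. The function name `enrichment`, the `pseudocount` parameter and the example barcodes are hypothetical.

```python
def enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Library-size-normalised enrichment per barcode (illustrative only).

    pre_counts / post_counts map a barcode (one protein pair) to its read
    count before / after His selection. The pseudocount is an assumption
    added here to avoid division by zero for barcodes missing from a sample.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    result = {}
    for barcode in set(pre_counts) | set(post_counts):
        pre_share = (pre_counts.get(barcode, 0) + pseudocount) / pre_total
        post_share = (post_counts.get(barcode, 0) + pseudocount) / post_total
        result[barcode] = post_share / pre_share
    return result


# Hypothetical counts: pair A:B gains share after selection, pair A:C loses share.
pre = {"A:B": 100, "A:C": 100}
post = {"A:B": 300, "A:C": 100}
for barcode, value in sorted(enrichment(pre, post, pseudocount=0.0).items()):
    print(barcode, value)
```

Dividing by the total read count of each sample keeps a deeper sequencing run of one sample from being mistaken for enrichment.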

Fig. 1 MP3-seq Workflow. Figure taken from: Baryshev, A. et al. (2023). Massively parallel protein-protein interaction measurement by sequencing (MP3-seq) enables rapid screening of protein heterodimers. BioRxiv, 2023.02.08.527770, which was published under an Attribution-NoDerivatives 4.0 International License (CC BY-ND 4.0). Panel A shows the experimental MP3-seq workflow, consisting of transformation of all necessary fragments via electroporation and His selection in selective medium lacking His, followed by next-generation barcode sequencing. The equation given in panel B is used to compute the enrichment between pre- and post-His selection, normalised by library size. Panel C describes the computational data analysis to remove nonspecific autoactivators and the quantification of interaction strength via the log-fold change of pseudoreplicates in DESeq2.

Development of a Computational Data Analysis Workflow

The authors have developed a computational data analysis workflow to analyse MP3-seq data and quantify interaction strength (hopefully available soon). Following calculation of the enrichment, multiple corrections are performed, including autoactivator removal, undetected PPI filtering, and accounting for experimental replicates.

In detail, the authors first computed the enrichment between the pre- and the post-His selection, normalised by library size. Next, they removed proteins that act as autoactivators, meaning that they can non-specifically activate the expression of the reporter gene (here his3) in the absence of a protein-protein interaction. These were identified by screening for proteins that displayed a high enrichment value with all interaction partners, which indicates that his3 expression was non-specific. Subsequently, the authors screened for protein-protein combinations for which no barcode could be detected in the pre- and the post-His selection sequencing results and replaced them by the minimum value of detected pre-His reads. The corrected reads from both fusion orders (P1-DBD + P2-AD and P2-DBD + P1-AD, taken as pseudoreplicates) were then analysed with DESeq2 to determine the log2 fold changes, which serve as a proxy for the interaction strength.
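The autoactivator filter described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the threshold value, the function names `find_autoactivators` and `drop_autoactivators`, and the choice to group values by the first protein of each pair are all hypothetical. The DESeq2 step is not reproduced here.

```python
def find_autoactivators(enrichment, threshold):
    """Flag proteins whose enrichment is high with every partner tested.

    enrichment maps (protein_1, protein_2) -> enrichment value. Grouping
    by the first protein of each pair is an assumption of this sketch.
    """
    by_protein = {}
    for (first, _second), value in enrichment.items():
        by_protein.setdefault(first, []).append(value)
    # "High for all partners" is read here as: even the lowest value exceeds the threshold.
    return {p for p, values in by_protein.items() if min(values) > threshold}


def drop_autoactivators(enrichment, autoactivators):
    """Remove every pair that involves a flagged protein."""
    return {
        pair: value
        for pair, value in enrichment.items()
        if pair[0] not in autoactivators and pair[1] not in autoactivators
    }


# Hypothetical values: X is enriched with both partners, Y only with A.
values = {("X", "A"): 9.0, ("X", "B"): 8.0, ("Y", "A"): 7.0, ("Y", "B"): 0.5}
flagged = find_autoactivators(values, threshold=5.0)
print(flagged)
print(drop_autoactivators(values, flagged))
```

Using the minimum across partners is a strict reading of "all interaction partners". A real screen would probably need a more tolerant rule, since a genuine autoactivator may still have a few noisy low counts.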

Benchmark Results

The authors performed multiple benchmarking routines to make sure that their method is feasible for large-scale screening and that their results are in agreement with other state-of-the-art methods. These benchmarks included a large-scale assay of designed heterodimers, an approach with orthogonal coiled-coil dimers and an experiment with BCL family binders (control cell death primarily by direct binding interactions). Overall, their results demonstrated a good quantitative agreement with previously published results.


Obtaining unbiased high-quality protein-protein interaction data is crucial for multiple fields including synthetic biology, drug development and molecular biology. Recent advances have paved the way for the design of novel drugs targeting PPIs resulting in multiple small molecules, peptides and antibodies currently being in clinical trials as PPI modulators. MP3-seq might be used as a complementary analysis to fragment-based drug discovery (FBDD) by building large-scale libraries of modular interaction domains.

Moreover, having more reliable, quantitative, high-throughput experimental protein-protein interaction data from MP3-seq as prior knowledge may allow for more sophisticated computational predictions of protein-protein interactions. I would assume that by combining it with AlphaFold-based docking methodologies, such as the protocol from Bryant et al. (2022) or AlphaFold-Multimer, which use optimised multiple sequence alignments, one could improve prediction results and filter out more false positives.

Finally, it would be interesting to see whether it is possible to tune the approach to specific cellular environments and other two-hybrid screening techniques, which would allow for context-specific and tailored analysis of protein-protein interactions.

What I liked about this preprint

What I really liked about this preprint is that the authors not only developed a very promising experimental method, but also a custom workflow to analyse their high-throughput Y2H data.

Collaborations between experimental and computational researchers are crucial in ensuring that the statistical and filtering analysis tools used downstream are specifically tailored to meet the demands of the experimental method. This leads to a more efficient, robust, and accurate interpretation of the results. Additionally, such collaborations foster innovation and interdisciplinary thinking, as both sides learn from and influence each other.


Bryant, P., Pozzati, G., & Elofsson, A. (2022). Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications, 13(1).

Corbi-Verge, C., Kim, P.M. Motif mediated protein-protein interactions as drug targets. Cell Commun Signal 14, 8 (2016).

Lu, H., Zhou, Q., He, J. et al. (2020) Recent advances in the development of protein–protein interactions modulators: mechanisms and clinical trials. Sig Transduct Target Ther 5, 213.

McMillan DG, Marritt SJ, Firer-Sherwood MA, et al. (2013) Protein-protein interaction regulates the direction of catalysis and electron transfer in a redox enzyme complex. J Am Chem Soc., 135(28), 10550-10556.

Modell, A. E., Blosser, S. L., & Arora, P. S. (2016). Systematic Targeting of Protein-Protein Interactions. Trends in pharmacological sciences, 37(8), 702–713. https://doi.org10.1016/

Rao, V. S., Srinivas, K., Sujini, G. N., & Kumar, G. N. S. (2014). Protein-Protein Interaction Detection: Methods and Analysis. International Journal of Proteomics, 2014, 147648.

Stynen Bram, Tournu Hélène, Tavernier Jan, & Van Dijck Patrick. (2012). Diversity in Genetic In Vivo Methods for Protein-Protein Interaction Studies: from the Yeast Two-Hybrid System to the Mammalian Split-Luciferase System. Microbiology and Molecular Biology Reviews, 76(2), 331–382.

Westermarck, J., Ivaska, J., & Corthals, G. L. (2013). Identification of Protein Interactions Involved in Cellular Signaling. Molecular & Cellular Proteomics, 12(7), 1752–1763.

Xie, N. Z., Du, Q. S., Li, J. X., & Huang, R. B. (2015). Exploring Strong Interactions in Proteins with Quantum Chemistry and Examples of Their Applications in Drug Design. PloS one, 10(9), e0137113.

Questions to the Authors

Q1: Can the interaction strength proxy obtained from MP3-seq be used to make assumptions about the nature of an interaction (e.g. transient/permanent)?

Q2: Is it possible to apply the MP3-seq methodology to other Two-hybrid screening methods such as the bacterial two-hybrid (B2H) system?

Q3: The cellular environment and posttranslational modifications (e.g. phosphorylations, methylations, and acetylations), which alter the charge of a protein (domain), are believed to play a crucial role in context-specific protein-protein interactions and their regulation. Considering that some protein domains do “well” in the reducing environment of the yeast nucleus, while others “prefer” the more oxidising cytoplasm, is there a way to tune the protocol to specific cellular environments and/or conditions to advance our understanding of human health?

Q4: Even though AlphaFold was initially just trained on monomeric protein structures, it has demonstrated that it can reliably predict protein complexes, especially when using AlphaFold-Multimer which has been additionally trained on multimeric protein structures. Where do you see the role of experimental methods to predict protein-protein interactions in future?

Q5: How does MP3-seq compare to some of the existing methods which are currently being commercialised/already on the market?

Tags: protein-protein interactions, sequencing, yeast-two-hybrid

Posted on: 7 April 2023



Author's response

The author team shared

Q1: Yes, MP3-seq can be used to quantify interaction strength!

We show in figure 2H and figure 3D that the logarithm of the enrichment ratio (P-FLC) correlates very well with the dissociation constant (Kd). Kd is the ratio between the off and on rate (k_off/k_on). Assuming a similar k_on, we can therefore predict which interactions dissociate slower.

Q2: Yes, it should be applicable to other next-gen screening methods.

Q3: We currently use activation of transcription as the readout. Therefore the interactions must be stable in the cytoplasm and cytosol. Not sure how we could tune this to different environments, although it would be interesting to do so.

Q4: While structure predictors like AlphaFold and AlphaFold-Multimer excel at predicting complexes, using the predicted complexes to determine protein-protein interactions with them is still difficult, especially when one wants to assess orthogonality or if two proteins do not interact. We are making some exciting predictions with multiple versions of AlphaFold multimer to see if we can predict MP3-Seq values from complex features. The next preprint version of the article will include these, so stay tuned.

Experimental methods are needed now and will be needed in the future to verify interactions, regardless of how accurate protein-protein interaction predictors become. Current sequence-based models to predict protein-protein interactions can be very accurate but are limited to certain structure classes. MP3-seq or other high throughput PPI methods, in combination with the fine-tuning of general structural predictors like AlphaFold, might give a more general model of PPI prediction.

Q5: MP3-seq is an easy-to-implement, high-throughput assay to measure pairs of molecular interactions inside cells. A recently developed assay, NGB2H (Boldridge, Ljubetič, et al., bioRxiv, 2022), can also measure interactions between binders in a large library. However, that assay is implemented in bacteria and is based on a split cAMP survival system. The cloning steps are easier in yeast cells.

Alpha-screen is another method to measure a large set of interactions. However, it requires proteins to be individually expressed and attached to beads, which introduces more work and lowers the throughput.


We present a method (MP3-seq) for measuring a large set of protein-protein interactions. We show that the MP3-seq readout is quantitatively correlated with binding affinity over several orders of magnitude of interaction strengths and measure over 100,000 interactions simultaneously.

We apply MP3-seq to characterise interactions between several families of rationally designed heterodimers and develop a greedy algorithm to reduce large-scale MP3-seq screens to identify sets of potentially orthogonal heterodimers. Finally, we use MP3-seq to analyse interactions between four helix binders with designed hydrogen-bond networks. From the data we derive new rules for designing orthogonal four helix binders.
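The preprint's greedy algorithm is not described in this highlight, so the following Python sketch only illustrates the general idea of a greedy reduction to a mutually orthogonal set. It is a generic heuristic and not the authors' method. The function names, the thresholds `on_min` and `off_max`, and the decision to treat unmeasured pairs as non-interacting are assumptions.

```python
def score(scores, a, b):
    """Symmetric lookup of an interaction score; unmeasured pairs count as 0 (assumption)."""
    return max(scores.get((a, b), 0.0), scores.get((b, a), 0.0))


def greedy_orthogonal_set(scores, designed_pairs, on_min, off_max):
    """Greedily pick designed heterodimers whose members do not cross-react.

    A designed pair is a candidate if its on-target score is at least on_min.
    Candidates are visited from strongest to weakest and kept only if no
    member interacts above off_max with a member of an already chosen pair.
    """
    candidates = [p for p in designed_pairs if score(scores, *p) >= on_min]
    candidates.sort(key=lambda p: score(scores, *p), reverse=True)
    chosen = []
    for a, b in candidates:
        clash = any(
            score(scores, x, y) > off_max
            for c, d in chosen
            for x in (a, b)
            for y in (c, d)
        )
        if not clash:
            chosen.append((a, b))
    return chosen


# Hypothetical scores: A1 cross-reacts with B2, so the pair A2:B2 is skipped.
scores = {
    ("A1", "B1"): 5.0, ("A2", "B2"): 4.0, ("A3", "B3"): 3.0,
    ("A1", "B2"): 2.5, ("A3", "B1"): 0.1,
}
designed = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]
print(greedy_orthogonal_set(scores, designed, on_min=2.0, off_max=1.0))
```

A greedy pass like this is fast but not guaranteed to find the largest orthogonal set, and this sketch does not check for homodimerisation.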


Benjamin Dominik Maier

Supplement: Current Approaches to Study Protein-Protein Interactions

– Yeast Two-Hybrid (Y2H) System involves fusing one protein of interest to a DNA-binding domain (DBD) and another protein to an activation domain (AD) (Fields and Song, 1989). If the two proteins interact, the DNA-binding and activation domains come into close proximity, leading to the expression of a reporter gene. Newer versions of the assay (Weile et al., 2017; Luck et al., 2020) allow for screening of entire proteomes or random peptide libraries, which can enable unbiased identification of novel interactions.
– Protein fragment complementation assay (PCA) involves splitting a protein of interest into two fragments and fusing each fragment with a complementary protein fragment (Michnick et al., 2007). The resulting fusion proteins can only reconstitute the original protein activity if the two proteins interact and bring the two fragments together. This approach can be used to study protein-protein interactions in vivo and has been used for high-throughput screening to identify potential drug targets.
– Protein-array based methods involve immobilising large numbers of purified proteins on a solid surface (e.g. glass slide). The arrays can then be probed with (fluorescently) labelled proteins of interest to identify potential interaction partners. This approach enables the simultaneous screening of thousands of protein pairs, but it does necessitate laborious protein purification steps.

– Fluorescence/Bioluminescence Resonance Energy Transfer (FRET/BRET) involves labelling one protein with a donor (a fluorophore for FRET, a luciferase for BRET) and the other protein with an acceptor fluorophore (Sun et al., 2016). When the two proteins interact, the donor transfers energy to the acceptor fluorophore, resulting in fluorescence, which can be measured by a detector.
– Mass spectrometry-based methods involve the fractionation of cell lysates by size or other physicochemical properties, followed by mass spectrometry analysis to identify co-eluting proteins (Richards et al., 2021). Although this approach can identify numerous potential interaction partners in an unbiased manner, it can be quite laborious since it requires protein purification steps.

– Immunoprecipitation-based methods such as Co-Immunoprecipitation (Co-IP) or Tandem Affinity Purification (TAP) involve the isolation of protein-protein interactions by adsorbing the protein complex onto beads using a combination of protein tags and specific antibodies (Lin & Lai, 2017), and subsequently identifying them by mass spectrometry or Western blotting.
– Surface Plasmon Resonance (SPR) involves immobilising one protein on a chip and flowing the other protein over it. The binding between the two proteins is measured in real-time based on the changes in the refractive index of the chip.
– Proximity-dependent labelling methods such as BioID and APEX enable identification of interacting proteins within a certain distance of the protein of interest and can be used to identify both stable and transient interactions in an unbiased way (Chen & Perrimon, 2017).

– Co-evolution Analysis makes inferences about protein-protein interactions using alignments (both sequence and structure) and phylogenetic distances. The approach can be used to predict functional residues and domains involved in the PPI.
– Molecular Docking Analysis uses structural templates of individual proteins to predict the structure of a complex. The approach is commonly used to screen large libraries of small molecule compounds for potential drug candidates that can target PPIs.
While most models require experimentally determined interactions as prior knowledge, there are also some approaches to predict interactions de novo.

Supplemental References

Chen, C. L., & Perrimon, N. (2017). Proximity‐dependent labeling methods for proteomic profiling in living cells. WIREs Developmental Biology, 6(4).

Fields, S., & Song, O. (1989). A novel genetic system to detect protein-protein interactions. Nature, 340(6230), 245–246.

Lin, J. S., & Lai, E. M. (2017). Protein-Protein Interactions: Co-Immunoprecipitation. Methods in molecular biology (Clifton, N.J.), 1615, 211–219.

Luck, K., Kim, DK., Lambourne, L. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).

Michnick, S., Ear, P., Manderson, E. et al. (2007) Universal strategies in research and drug discovery based on protein-fragment complementation assays. Nat Rev Drug Discov, 6, 569–582.

Richards, A.L., Eckhardt, M. and Krogan, N.J. (2021) Mass spectrometry‐based protein–protein interaction networks for the study of human diseases, Molecular Systems Biology, 17(1).

Sun, S., Yang, X., Wang, Y., & Shen, X. (2016). In Vivo Analysis of Protein–Protein Interactions with Bioluminescence Resonance Energy Transfer (BRET): Progress and Prospects. International Journal of Molecular Sciences, 17(10), 1704.

Weile, J., Sun, S., Cote, A. G., Knapp, J., Verby, M., Mellor, J. C., … Roth, F. P. (2017). A framework for exhaustively mapping functional missense variants. Molecular Systems Biology, 13(12), 957.

