Large-scale, quantitative protein assays on a high-throughput DNA sequencing chip

Curtis J Layton, Peter L McMahon, William J Greenleaf

Posted on: 26 June 2018

Preprint posted on 14 June 2018

Article now published in Molecular Cell at http://dx.doi.org/10.1016/j.molcel.2019.02.019

From sequence to function: Current Illumina high-throughput sequencing technology adapted to carry out functional screening on a huge variety of proteins.

Selected by Samantha Seah

Categories: molecular biology

Background
Illumina high-throughput sequencing technologies have been widely used to tackle many biological problems. For example, RNA-Seq enables the study of gene expression changes, Hi-C probes chromatin architecture and ChIP-seq examines the binding of DNA-binding proteins. In Illumina sequencing, DNA fragments are added to sequencing flow cells, where they bind to flow-cell oligonucleotides and, via bridge amplification, produce clusters of identical DNA molecules. The subsequent addition and excitation of fluorescently-labelled reversible terminators enables the identification of each added base, because each base has a unique emission. The emission profiles recorded at each cluster over successive rounds of synthesis reveal the DNA sequence, in a process known as sequencing-by-synthesis.
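As a rough illustration of the base-calling idea described above, the sketch below picks the brightest channel at each cycle for a single cluster. The channel order (A, C, G, T) and the intensity values are assumptions made for this example; real base callers also correct for effects such as channel cross-talk, which are ignored here.

```python
# Minimal sketch of per-cluster base calling (illustrative only).
# Assumption: each cycle yields four intensities, ordered A, C, G, T.
BASES = "ACGT"

def call_bases(cycles):
    """Return the read for one cluster, calling the brightest channel per cycle."""
    read = []
    for channels in cycles:
        brightest = max(range(len(BASES)), key=lambda i: channels[i])
        read.append(BASES[brightest])
    return "".join(read)

# Hypothetical intensities for three sequencing cycles of one cluster.
example_cluster = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.8, 0.2],
    [0.0, 0.1, 0.1, 0.7],
]
print(call_bases(example_cluster))
```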

In contrast to the success in linking DNA sequence variation to function, there has been less success in linking protein sequence to function. A recent preprint by the Greenleaf lab outlines a technology (Prot-MAP: Protein display on a Massively-Paralleled Array) that combines sequencing-by-synthesis with protein function assays, enabling quantitative measurements of protein function at very high throughput.

Key Findings
To generate protein arrays, the authors first created a library of DNA constructs encoding their polypeptides of interest. These constructs were clustered and sequenced on an Illumina MiSeq, and the cluster positions were recorded (Figure 1A). The authors then carried out in vitro transcription and translation, stalling both the E. coli RNA polymerase and the ribosome so that the transcript and the peptide remain associated with the DNA template. Fluorescence-based assays were then used to study protein function. Because the positions of the clusters remain the same from the initial MiSeq run to the final functional assays, DNA sequence, which determines protein sequence, can be directly correlated with protein function.
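The positional link between sequence and function can be sketched as a nearest-neighbour match between cluster coordinates from the sequencing run and coordinates from the later fluorescence image. This is only an illustration of the principle, not the authors' image-processing pipeline; the function name, the coordinates, the sequences and the matching tolerance are all hypothetical.

```python
import math

def link_clusters(sequenced, measured, max_distance=1.0):
    """Pair each fluorescence measurement with the nearest sequenced cluster.

    sequenced: list of (x, y, dna_sequence) from the sequencing run.
    measured:  list of (x, y, fluorescence) from the functional assay.
    max_distance: assumed tolerance; measurements further away are dropped.
    """
    linked = []
    for mx, my, signal in measured:
        nearest = min(sequenced, key=lambda c: math.dist((c[0], c[1]), (mx, my)))
        if math.dist((nearest[0], nearest[1]), (mx, my)) <= max_distance:
            linked.append((nearest[2], signal))
    return linked

# Hypothetical clusters and measurements (arbitrary units).
sequenced = [(10.0, 20.0, "GATTACAAG"), (55.0, 8.0, "GATTATAAG")]
measured = [(10.2, 19.9, 830.0), (54.8, 8.1, 42.0), (90.0, 90.0, 500.0)]

pairs = link_clusters(sequenced, measured)
for sequence, signal in pairs:
    print(sequence, signal)
```

The third measurement has no sequenced cluster within the tolerance, so it is discarded and never assigned a sequence.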

To test the technology with protein binding assays, the authors used the well-characterised FLAG peptide/M2 antibody system. Previous studies have identified DYKxxDxx as the consensus sequence of the M2 epitope. From this, the authors engineered a library of 13,154 sequences comprising single, double and triple combinations of mutated positions, with each position substituted with 6 different amino acids. After DNA sequencing and peptide generation, the M2 antibody was introduced, followed by a fluorescent secondary antibody and imaging, similar to an ELISA. To determine the binding affinity of the M2 antibody for each peptide, this process was repeated at increasing concentrations of M2 antibody, which gave the limit of detection (LoD) for each peptide, i.e. the lowest antibody concentration at which binding is detected.
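To make the LoD definition concrete, here is a minimal sketch that scans a titration series and returns the lowest antibody concentration whose signal exceeds a detection threshold. The threshold rule used here (mean of blank measurements plus three standard deviations) is a common convention that I am assuming for illustration; the preprint may define detection differently. All numbers are invented.

```python
import statistics

def limit_of_detection(titration, background, n_sd=3.0):
    """Return the lowest concentration with a detectable signal, or None.

    titration:  list of (antibody_concentration, fluorescence) pairs.
    background: blank fluorescence readings used to set the threshold.
    n_sd:       assumed number of standard deviations above background.
    """
    threshold = statistics.mean(background) + n_sd * statistics.stdev(background)
    detected = [conc for conc, signal in titration if signal > threshold]
    return min(detected) if detected else None

# Hypothetical data in arbitrary units.
background = [10, 12, 11, 9]
strong_binder = [(0.1, 11), (1, 13), (10, 40), (100, 300)]
non_binder = [(0.1, 10), (1, 11), (10, 12), (100, 14)]

print(limit_of_detection(strong_binder, background))
print(limit_of_detection(non_binder, background))
```

A lower LoD means binding is detected at a lower antibody concentration, which is why it is read as higher affinity.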

Upon studying the mutant affinity landscapes, the authors note that they largely recapitulate the expected consensus sequence (DYKxxDxx), and even find a “superFLAG” sequence with a LoD 7.9x lower (meaning higher binding affinity) than that of wild-type FLAG. They also find an additional constraint at position 4: antibody binding only occurs when D or L is present at this position, and binding is reduced when D is substituted by L. Further study of the triple mutants that include D4L indicates that some mutations at other positions, including D5E and D7K, partially rescue D4L, and that some of these mutation combinations even exhibit cooperativity.
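One way to put a number on cooperativity is to compare the observed effect of a combined mutant with what the individual mutations would predict if they acted independently. The sketch below assumes independent effects multiply on the LoD (so they add in log space); this model and the LoD values are assumptions for illustration and may not match the definition used in the preprint.

```python
import math

def log_fold_change(lod_variant, lod_wild_type):
    """log10 change in LoD relative to wild type (positive = weaker binding)."""
    return math.log10(lod_variant / lod_wild_type)

def cooperativity(lod_wt, lod_a, lod_b, lod_ab):
    """Expected minus observed log10 fold change for the double mutant.

    Positive values mean the double mutant binds better (lower LoD) than the
    two single mutants would predict under the assumed independence model.
    """
    expected = log_fold_change(lod_a, lod_wt) + log_fold_change(lod_b, lod_wt)
    observed = log_fold_change(lod_ab, lod_wt)
    return expected - observed

# Hypothetical LoDs: mutation A alone is strongly deleterious, B is mild,
# but together they are less deleterious than expected (partial rescue).
score = cooperativity(lod_wt=1.0, lod_a=100.0, lod_b=2.0, lod_ab=10.0)
print(round(score, 3))
```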

For enzymatic catalysis assays, the authors tested their technology on the SNAP-tag, a protein tag that can be fused to proteins and subsequently labelled with a ligand, such as a fluorescent dye. They tested 7 residues that have previously been associated with modulating function, and made single-, double- and triple-mutant combinations across all 20 possible amino acids, testing over 150,000 variants in total. They find that mutational constraints vary between residues: some are strictly constrained (such as Y114), while others are much more tolerant of mutation (for example, A121 and L153). By studying double mutants more closely, the authors found pairs of mutations that exhibited positive cooperativity, and noted that most strongly cooperative pairs lie in close proximity in the protein (Cα-Cα distances of less than 13 Å). They also found that histidine was particularly prone to participating in cooperative interactions, and hypothesised that this is due to the variability in the charge and hydrogen-bonding state of histidine in different contexts.
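The proximity criterion mentioned above is a simple Euclidean distance between the Cα atoms of the two mutated residues. A minimal sketch follows, using the 13 Å cutoff quoted in the text; the coordinates are made up and do not come from any SNAP-tag structure.

```python
import math

def ca_distance(ca_a, ca_b):
    """Euclidean distance in Å between two Cα coordinates given as (x, y, z)."""
    return math.dist(ca_a, ca_b)

def in_close_proximity(ca_a, ca_b, cutoff=13.0):
    """True if the Cα-Cα distance is below the cutoff (13 Å, as in the text)."""
    return ca_distance(ca_a, ca_b) < cutoff

# Hypothetical Cα coordinates in Å, for illustration only.
residue_1 = (0.0, 0.0, 0.0)
residue_2 = (3.0, 4.0, 0.0)
residue_3 = (0.0, 0.0, 20.0)

print(in_close_proximity(residue_1, residue_2))
print(in_close_proximity(residue_1, residue_3))
```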

Figure 1A of the preprint: Workflow for generating a high-throughput protein array.

What I like about this work
I think that this is a brilliant modification to current Illumina sequencing technology to enable it to be used for high-throughput functional protein assays. The microfluidic chips and sequencing technology required are commercially available and the imaging software is simply adapted from current Illumina sequencing. By including a series of simple, yet elegant changes that enable the DNA fragment to be transcribed and translated, with the RNA and protein remaining attached to the DNA fragment, the authors have made it possible to study an additional dimension (protein function) while maintaining a high throughput.

Kudos to the authors for co-opting the positional information that normally enables the linking of nucleotides into a complete DNA sequence, and using it to link DNA sequence to protein function.

Outlook
A key limitation of the technology is the size of DNA molecules that can be clustered. This in turn severely restricts the size of the protein that can be studied, and may result in the technology being used largely only to study peptide fragments or protein domains. I wonder if the authors see this as the key limitation of this technology, or if they see a way to somehow overcome this.

Further reading

She, R., et al., Comprehensive and quantitative mapping of RNA–protein interactions across a transcribed eukaryotic genome. Proceedings of the National Academy of Sciences, 2017. 114(14): p. 3619-3624.

Jung, C., et al., Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips. Cell, 2017. 170(1): p. 35-47. e13.

Tags: illumina, ngs, protein array
