Large-scale, quantitative protein assays on a high-throughput DNA sequencing chip

Curtis J Layton, Peter L McMahon, William J Greenleaf

Preprint posted on 14 June 2018

Article now published in Molecular Cell.

From sequence to function: Current Illumina high-throughput sequencing technology adapted to carry out functional screening on a huge variety of proteins.

Selected by Samantha Seah

Categories: molecular biology

Illumina high-throughput sequencing technologies have been widely used to tackle many biological problems. For example, RNA-Seq measures changes in gene expression, Hi-C probes chromatin architecture and ChIP-seq maps where DNA-binding proteins bind the genome. In Illumina sequencing, DNA fragments are loaded onto a sequencing flow cell, where they hybridise to surface-bound oligonucleotides and are amplified by bridge amplification into clusters of identical DNA molecules. In each cycle, fluorescently labelled reversible terminators are incorporated one base at a time and excited; because each base carries a distinct fluorescent label, the emission identifies which base was added. Reading the emission at each cluster over successive cycles reveals its DNA sequence, a process known as sequencing-by-synthesis.
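To make the base-calling idea concrete, here is a toy Python sketch of four-colour sequencing-by-synthesis at a single cluster. It assumes one intensity per fluorescence channel per cycle and a hypothetical channel order (`BASES = "ACGT"`); real Illumina base callers also correct for channel cross-talk, phasing and pre-phasing, none of which is modelled here.

```python
# Toy sketch of four-colour sequencing-by-synthesis base calling for one cluster.
# Assumption: each cycle yields one intensity per fluorescence channel, and
# channel i corresponds to BASES[i]. Real base callers also correct for channel
# cross-talk, phasing and pre-phasing, which this sketch deliberately omits.
from typing import Sequence

BASES = "ACGT"  # hypothetical channel order


def call_bases(cycles: Sequence[Sequence[float]]) -> str:
    """Call one base per cycle by picking the brightest channel."""
    read = []
    for intensities in cycles:
        if len(intensities) != len(BASES):
            raise ValueError("expected one intensity per base channel")
        brightest = max(range(len(BASES)), key=lambda i: intensities[i])
        read.append(BASES[brightest])
    return "".join(read)


if __name__ == "__main__":
    # Made-up intensities for one cluster over four cycles.
    cluster = [
        [0.9, 0.1, 0.0, 0.1],
        [0.1, 0.0, 0.8, 0.2],
        [0.0, 0.1, 0.1, 0.7],
        [0.2, 0.9, 0.1, 0.0],
    ]
    print(call_bases(cluster))  # prints "AGTC"
```

The key point for what follows is that each read is tied to a fixed physical position on the chip, since all cycles are imaged at the same cluster.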

Despite this success in linking DNA sequence variation to function, linking protein sequence to function at comparable scale has proved much harder. A recent preprint from the Greenleaf lab describes a technology (Prot-MAP: Protein display on a Massively Parallel Array) that combines sequencing-by-synthesis with protein function assays, enabling quantitative measurements of protein function at very high throughput.

Key Findings
To generate protein arrays, the authors first created a library of DNA constructs encoding their polypeptides of interest, which were then clustered and sequenced on an Illumina MiSeq, with the cluster positions recorded (Figure 1A). The authors then carried out in vitro transcription and translation while stalling both E. coli RNA polymerase and the ribosome, so that the transcript and the nascent peptide remain associated with the DNA template. They then used fluorescence-based assays to study protein function. Because the cluster positions remain the same from the initial MiSeq run to the final functional assays, each cluster's DNA sequence, and therefore the protein sequence it encodes, can be directly linked to the protein's measured function.
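The linking step can be pictured as a join on cluster position. The Python sketch below is hypothetical: it assumes that the sequencing run and the later assay images report cluster coordinates in the same frame, and it simplifies image registration and position matching to rounding coordinates onto a grid (`grid` is an illustrative parameter, not something from the preprint).

```python
# Hypothetical sketch: pair sequencing reads with functional-assay signals by position.
# Assumption: both datasets report cluster (x, y) coordinates in the same frame.
# In practice images must be registered and clusters matched within a tolerance;
# here that is simplified to rounding coordinates onto a grid of size `grid`.
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


def link_sequence_to_signal(
    reads: Dict[Coord, str],
    signals: Dict[Coord, float],
    grid: float = 1.0,
) -> List[Tuple[str, float]]:
    """Pair each cluster's DNA sequence with its assay signal at the same position."""

    def key(c: Coord) -> Tuple[int, int]:
        return (round(c[0] / grid), round(c[1] / grid))

    signal_by_key = {key(c): s for c, s in signals.items()}
    linked = []
    for coord, seq in reads.items():
        k = key(coord)
        if k in signal_by_key:  # clusters with no matching signal are skipped
            linked.append((seq, signal_by_key[k]))
    return linked


if __name__ == "__main__":
    # Made-up reads and signals; positions differ slightly between the two runs.
    reads = {(10.0, 20.0): "GACTACAAG", (35.2, 7.9): "GACTACAAA"}
    signals = {(10.2, 19.9): 850.0, (35.0, 8.1): 120.0}
    print(link_sequence_to_signal(reads, signals))
```

Once sequence and signal share a key, any per-cluster measurement (binding, enzymatic labelling) can be attached to the variant that produced it.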

To test the technology with protein binding assays, the authors used the well-characterised FLAG peptide/M2 antibody system. Previous studies have identified DYKxxDxx as the consensus sequence of the M2 epitope. Building on this, the authors engineered a library of 13,154 sequences that included single, double and triple mutants, with each mutated position substituted with 6 different amino acids. After DNA sequencing and on-chip peptide synthesis, the M2 antibody was introduced, followed by a fluorescent secondary antibody and imaging, much as in an ELISA. To quantify binding, this process was repeated at increasing concentrations of M2 antibody, allowing the limit of detection (LoD) to be determined for each peptide, i.e. the lowest antibody concentration at which binding is detected.
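As an illustration of how a LoD can be read off such a concentration series, here is a minimal Python sketch. The detection rule is an assumption, not the preprint's method: a peptide counts as "bound" when its signal exceeds the mean plus 3 standard deviations of hypothetical negative-control clusters. The actual thresholding or curve fitting used by the authors may differ.

```python
# Minimal sketch of extracting a limit of detection (LoD) from a titration series.
# Assumption (illustrative, not from the preprint): binding is "detected" when the
# signal exceeds mean + k * SD of negative-control clusters, with k = 3 by default.
from statistics import mean, stdev
from typing import Optional, Sequence


def background_threshold(negative_controls: Sequence[float], k: float = 3.0) -> float:
    """Detection threshold from negative-control signals (needs >= 2 values)."""
    return mean(negative_controls) + k * stdev(negative_controls)


def limit_of_detection(
    concentrations: Sequence[float],
    signals: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Return the lowest antibody concentration whose signal exceeds the threshold."""
    for conc, sig in sorted(zip(concentrations, signals)):
        if sig > threshold:
            return conc
    return None  # binding never detected within the tested range


if __name__ == "__main__":
    # Made-up numbers for one peptide; units of concentration are arbitrary here.
    thr = background_threshold([100.0, 110.0, 90.0, 105.0, 95.0])
    lod = limit_of_detection([0.1, 0.3, 1.0, 3.0, 10.0],
                             [101.0, 115.0, 160.0, 420.0, 900.0], thr)
    print(round(thr, 1), lod)  # prints "123.7 1.0"
```

Under this kind of readout, a lower LoD means binding is seen at lower antibody concentrations, i.e. higher apparent affinity, and the ratio of two peptides' LoDs gives a fold-difference such as the one reported for superFLAG.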

The resulting mutant affinity landscapes largely recapitulate the expected consensus sequence (DYKxxDxx), and the authors even identified a “superFLAG” sequence with a LoD 7.9-fold lower (i.e. higher binding affinity) than that of wild-type FLAG. They also found an additional constraint at position 4: antibody binding occurs only when D or L is present at this position, and substituting D with L reduces binding. Further analysis of triple mutants containing D4L indicates that some mutations at other positions, including D5E and D7K, partially rescue D4L, and that some of these combinations even act cooperatively.

For enzymatic catalysis assays, the authors tested their technology on SNAP-tag, a self-labelling protein tag that can be fused to proteins and subsequently labelled with a ligand, such as a fluorescent dye. They targeted 7 residues previously associated with modulating function and made single, double and triple mutants covering all 20 amino acids at each position, testing over 150,000 variants in total. Mutational tolerance varied between residues: some are strictly constrained (such as Y114), while others tolerate mutation much more readily (for example, A121 and L153). By examining double mutants more closely, the authors identified pairs of mutations that exhibited positive cooperativity, and noted that most strongly positively cooperative pairs are close together in the protein (Cα-Cα distances of less than 13 Å). They also found that histidine participated unusually often in cooperative interactions, and hypothesised that this reflects the context-dependent charge and hydrogen-bonding state of histidine.
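One common way to quantify cooperativity between two mutations is to compare the double mutant with what the single mutants predict. The Python sketch below uses a standard log-space (multiplicative) epistasis score; this is an illustrative choice and not necessarily the model used in the preprint. It also checks whether two residues fall within the 13 Å Cα-Cα cutoff mentioned above. All activities and coordinates are hypothetical.

```python
# Sketch of scoring pairwise cooperativity (epistasis) and checking residue proximity.
# Assumptions (illustrative, not the preprint's exact model): activities are positive
# numbers where higher means more active, independent mutations combine
# multiplicatively (additively in log space), and Ca coordinates come from a structure.
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """Log-space deviation of the double mutant from the multiplicative expectation.

    > 0: positive cooperativity (double mutant better than predicted)
    < 0: negative cooperativity (double mutant worse than predicted)
    """
    return math.log(ab / wt) - math.log(a / wt) - math.log(b / wt)


def is_close_pair(p: Vec3, q: Vec3, cutoff: float = 13.0) -> bool:
    """True if two residues' Ca atoms are less than `cutoff` angstroms apart."""
    return math.dist(p, q) < cutoff


if __name__ == "__main__":
    # Made-up example: each single mutant halves activity, but the double mutant
    # restores wild-type activity, giving positive epistasis of 2 * ln(2).
    print(round(epistasis(wt=1.0, a=0.5, b=0.5, ab=1.0), 3))  # prints 1.386
    print(is_close_pair((0.0, 0.0, 0.0), (6.0, 8.0, 0.0)))  # 10 A apart: prints True
```

Scoring every double mutant this way and then filtering by distance is one simple route to the kind of "cooperative pairs are spatially close" observation described above.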

Figure 1A of the preprint: Workflow for generating a high-throughput protein array.

What I like about this work
I think this is a brilliant adaptation of current Illumina sequencing technology for high-throughput functional protein assays. The microfluidic chips and sequencing technology required are commercially available, and the imaging software is simply adapted from current Illumina sequencing. Through a series of simple yet elegant changes that allow the DNA fragment to be transcribed and translated, with the RNA and protein remaining attached to it, the authors have made it possible to measure an additional dimension (protein function) while maintaining high throughput.

Kudos to the authors for co-opting the positional information that normally links successive base calls into a complete DNA sequence, and using it to link DNA sequence to protein function.

A key limitation of the technology is the size of DNA molecules that can be clustered. This in turn severely restricts the size of protein that can be studied, and may mean the technology is used mainly to study peptides or individual protein domains. I wonder whether the authors see this as the key limitation of the technology, or whether they see a way to overcome it.

Further reading

She, R., et al., Comprehensive and quantitative mapping of RNA–protein interactions across a transcribed eukaryotic genome. Proceedings of the National Academy of Sciences, 2017. 114(14): p. 3619-3624.

Jung, C., et al., Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips. Cell, 2017. 170(1): p. 35-47.e13.

Tags: illumina, ngs, protein array

Posted on: 26 June 2018

