Large-scale, quantitative protein assays on a high-throughput DNA sequencing chip

Curtis J Layton, Peter L McMahon, William J Greenleaf

Preprint posted on June 14, 2018

From sequence to function: Current Illumina high-throughput sequencing technology adapted to carry out functional screening on a huge variety of proteins.

Selected by Samantha Seah

Categories: molecular biology

Illumina high-throughput sequencing technologies have been widely used to tackle many biological problems. For example, RNA-Seq enables the study of gene expression changes, Hi-C probes chromatin architecture and ChIP-seq examines where DNA-binding proteins bind. In Illumina sequencing, DNA fragments are added to sequencing flow cells, where they bind to flow cell oligonucleotides and, via bridge amplification, produce clusters of identical DNA molecules. The subsequent addition and excitation of fluorescently labelled reversible terminators enables the identification of each added base, as each base has a distinct emission. The emission profiles recorded at each cluster over successive rounds of synthesis reveal the DNA sequences, in a process known as sequencing-by-synthesis.

In contrast to the success in linking DNA sequence variation to function, there has been less success in linking protein sequence to function. A recent preprint by the Greenleaf lab outlines a technology (Prot-MAP: Protein display on a Massively-Paralleled Array) that combines sequencing-by-synthesis with protein function assays, enabling quantitative measurements of protein function at very high throughput.

Key Findings
To generate protein arrays, the authors first created a library of DNA constructs encoding their polypeptides of interest, which were then clustered and sequenced on an Illumina MiSeq, with the cluster positions recorded (Figure 1A). They then carried out in vitro transcription and translation, stalling both the E. coli RNA polymerase and the ribosome so that both the transcript and the peptide remained associated with the DNA template. Fluorescence-based assays were then used to study protein function. As the positions of the clusters remain the same from the initial MiSeq run to the final functional assays, DNA sequence, which determines protein sequence, can be directly correlated with protein function.
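The positional link between sequence and function can be illustrated with a minimal sketch. It assumes that both the sequencing run and the functional assay report each cluster under an identical (tile, x, y) key; real data would need image registration first. The function name, the data layout and all values are hypothetical and are not taken from the preprint.

```python
from collections import defaultdict
from statistics import median


def link_sequence_to_signal(sequenced_clusters, imaged_clusters):
    """Group assay intensities by the DNA sequence read at the same cluster position.

    sequenced_clusters: dict mapping (tile, x, y) -> DNA sequence (hypothetical layout)
    imaged_clusters: dict mapping (tile, x, y) -> fluorescence intensity
    Returns a dict mapping each sequence to the median intensity of its clusters.
    """
    readings = defaultdict(list)
    for position, sequence in sequenced_clusters.items():
        # Clusters without an assay reading at the same position are skipped.
        if position in imaged_clusters:
            readings[sequence].append(imaged_clusters[position])
    return {seq: median(values) for seq, values in readings.items()}


# Made-up example: two clusters share one sequence, one cluster has no reading.
sequenced = {
    (1, 10, 20): "GATTAC",
    (1, 55, 80): "GATTAC",
    (1, 90, 15): "CCGTAA",
    (1, 5, 5): "TTTTTT",
}
imaged = {(1, 10, 20): 100.0, (1, 55, 80): 140.0, (1, 90, 15): 12.0}
per_sequence = link_sequence_to_signal(sequenced, imaged)
print(per_sequence)
```

Taking the median over replicate clusters is one simple way to make the per-sequence value robust to a few outlying clusters; the preprint may aggregate differently.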

To test the technology with protein binding assays, the authors utilised the well-characterised FLAG peptide/M2 antibody system. Previous studies have identified DYKxxDxx as the consensus sequence of the M2 epitope. From this, the authors engineered a library of 13,154 sequences covering single, double and triple combinations of mutated positions, with each position substituted with 6 different amino acids. After DNA sequencing and peptide generation, the M2 antibody was introduced, followed by a fluorescent secondary antibody and imaging, much as in an ELISA. To determine the binding affinity of the M2 antibody for the peptides, this process was repeated with increasing concentrations of M2 antibody, enabling the determination of the limit of detection (LoD) for each peptide, i.e. the lowest antibody concentration at which binding is detected.
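As a rough illustration of the LoD idea, the sketch below takes a titration series for one peptide and returns the lowest antibody concentration whose signal exceeds a background threshold. The threshold rule, the function name and all numbers are assumptions made for illustration; the preprint's actual LoD procedure may differ.

```python
def limit_of_detection(titration, threshold):
    """Return the lowest concentration whose signal exceeds the threshold.

    titration: list of (antibody_concentration, signal) pairs for one peptide
    threshold: background cut-off above which binding counts as detected (assumed)
    Returns None if binding is never detected.
    """
    detected = [conc for conc, signal in titration if signal > threshold]
    return min(detected) if detected else None


# Made-up titration data in arbitrary units.
wild_type = [(0.1, 3.0), (1.0, 4.0), (10.0, 25.0), (100.0, 90.0)]
variant = [(0.1, 3.0), (1.0, 30.0), (10.0, 80.0), (100.0, 95.0)]
threshold = 10.0

lod_wt = limit_of_detection(wild_type, threshold)
lod_variant = limit_of_detection(variant, threshold)
# A lower LoD for the variant indicates tighter binding than the wild type.
fold_improvement = lod_wt / lod_variant
print(lod_wt, lod_variant, fold_improvement)
```

With this definition the LoD can only take values from the tested concentration series, so its resolution depends on how finely the antibody is titrated.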

Upon studying the mutant affinity landscapes, the authors note that they largely recapitulate the expected consensus sequence (DYKxxDxx), and even find a “superFLAG” sequence with an LoD 7.9-fold lower (meaning higher binding affinity) than that of the wild-type FLAG. They also find an additional constraint at position 4: antibody binding occurs only when D or L is present at this position, and binding is reduced when D is substituted by L. Further study of the triple mutants that include D4L indicates that some mutations at other positions, including D5E and D7K, partially rescue D4L, and that some of these mutation combinations even exhibit cooperativity.

For enzymatic catalysis assays, the authors also tested their technology on the SNAP-tag, a self-labelling protein tag that can be fused to proteins and subsequently labelled with a ligand, such as a fluorescent dye. They tested 7 residues that have previously been associated with modulating function, and made single, double and triple mutant combinations across all 20 possible amino acid substitutions, testing over 150,000 variants in total. They found that the mutational constraints vary between residues: some are strictly constrained (such as Y114), while others are much more tolerant of mutations (for example, A121 and L153). By studying double mutants more closely, the authors found pairs of mutations that exhibited positive cooperativity, and noted that most strongly positively cooperative pairs are in close proximity in the protein (Cα-Cα distances of less than 13 Å). They also found that histidine was especially likely to participate in cooperative interactions, and hypothesised that this was due to the variability in the charge and hydrogen-bonding state of histidine in different contexts.
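One common way to quantify cooperativity between two mutations is to compare the double mutant with the product of the two single-mutant effects on a log scale. The sketch below uses that multiplicative null model as an assumption; the preprint may define cooperativity differently, and the function name and activity values are hypothetical.

```python
import math


def pairwise_cooperativity(wild_type, single_a, single_b, double_ab):
    """Log-scale deviation of a double mutant from independent single-mutant effects.

    All arguments are positive activity measurements in the same units.
    Returns a value > 0 for positive cooperativity, < 0 for negative
    cooperativity and about 0 when the two mutations act independently.
    """
    expected = math.log(single_a / wild_type) + math.log(single_b / wild_type)
    observed = math.log(double_ab / wild_type)
    return observed - expected


# Made-up activities: each single mutant halves activity.
independent = pairwise_cooperativity(1.0, 0.5, 0.5, 0.25)  # double mutant as predicted
cooperative = pairwise_cooperativity(1.0, 0.5, 0.5, 0.5)   # double mutant better than predicted
print(independent, cooperative)
```

Working in log space makes independent effects additive, so the score directly measures how far the double mutant departs from the no-interaction expectation.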

Figure 1A of the preprint: Workflow for enabling the establishment of a high-throughput protein array.

What I like about this work
I think that this is a brilliant modification to current Illumina sequencing technology to enable it to be used for high-throughput functional protein assays. The microfluidic chips and sequencing technology required are commercially available and the imaging software is simply adapted from current Illumina sequencing. By including a series of simple, yet elegant changes that enable the DNA fragment to be transcribed and translated, with the RNA and protein remaining attached to the DNA fragment, the authors have made it possible to study an additional dimension (protein function) while maintaining a high throughput.

Kudos to the authors for simply co-opting the positional information that enables the linking of nucleotides into a complete DNA sequence, to link DNA sequence to protein function.

A key limitation of the technology is the size of DNA molecules that can be clustered. This in turn severely restricts the size of the protein that can be studied, and may result in the technology being used largely only to study peptide fragments or protein domains. I wonder if the authors see this as the key limitation of this technology, or if they see a way to somehow overcome this.

Further reading

She, R., et al., Comprehensive and quantitative mapping of RNA–protein interactions across a transcribed eukaryotic genome. Proceedings of the National Academy of Sciences, 2017. 114(14): p. 3619-3624.

Jung, C., et al., Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips. Cell, 2017. 170(1): p. 35-47.e13.

Tags: illumina, ngs, protein array

Posted on: 26th June 2018
