
DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions

Angus M. Sidore, Calin Plesa, Joyce A. Samson, Sriram Kosuri

Preprint posted on 20 August 2019 https://www.biorxiv.org/content/10.1101/740977v1

Article now published in Nucleic Acids Research at http://dx.doi.org/10.1093/nar/gkaa600

Gene Synthesis Costs Reduced to a Drop in the Bucket – DropSynth 2.0 Improves Multiplexed Gene Synthesis

Selected by Connor Rosen

Background:

In the last two decades, the ability to rapidly sequence DNA at ever-decreasing cost has revolutionized biology, generating vast amounts of genomic data and unlocking new areas of research. A variety of highly multiplexed assays that combine high-throughput DNA synthesis with next-generation DNA sequencing to interrogate the function of DNA variants, generalized as Multiplexed Assays for Variant Effects (MAVEs), have revealed new biological insights at an incredible pace and scale [Gasperini 2016]. However, the difficulty of generating gene-size (100s-1000s of DNA base pairs) libraries with high fidelity has limited the scale and scope of some functional assays. Previously, the Kosuri group described DropSynth, an inexpensive bead- and emulsion-based method to assemble gene-size DNA fragments from short oligo pools [Plesa and Sidore, 2018]. Here, they improve DropSynth to substantially increase its scale and fidelity, enabling high-throughput generation of high-quality gene libraries.

 

Key findings:

A major improvement in DropSynth fidelity (the percentage of assemblies that perfectly match the designed sequence) came from optimizing the protocol around high-fidelity polymerases, which gave a ~5-fold improvement, to a ~20% rate of perfect assemblies, across multiple codon usage libraries. Several other steps were tested to improve fidelity or to support the use of the high-fidelity polymerases, including buffer optimization, enzymatic mismatch correction, and suppression PCR. Finally, the scale of DropSynth was increased by expanding the on-bead barcode repertoire so that four times as many assembly reactions can be carried out at once. Together, these technical optimizations represent a substantial improvement in the rate and cost of gene synthesis using the DropSynth technique.
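To put the fidelity numbers in context, here is a rough sketch of how the fraction of perfect assemblies affects the number of clones one would need to check to find a correct one. The ~4% baseline is only inferred from the "~5-fold improvement to ~20%" statement above, and the clone-picking scenario, the 95% confidence target, and the function name `clones_needed` are illustrative assumptions rather than anything taken from the preprint.

```python
import math

def clones_needed(perfect_fraction, confidence=0.95):
    """Smallest n such that P(at least one perfect clone among n) >= confidence.

    Assumes each picked clone is independently perfect with probability
    `perfect_fraction` (a simplification, not a claim from the preprint).
    """
    if not 0 < perfect_fraction < 1:
        raise ValueError("perfect_fraction must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - perfect_fraction))

before = clones_needed(0.04)  # baseline inferred from "~5-fold improvement to ~20%"
after = clones_needed(0.20)   # approximate DropSynth 2.0 perfect-assembly rate
print(before, after)
```

Under these assumptions, the required screening effort drops from roughly 74 clones to roughly 14 per gene, which is one way to see why a ~5-fold fidelity gain matters for downstream cost.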

 

Importance:

Gene synthesis can still be a rate- or cost-limiting step in the design and implementation of high-throughput functional assays. The original DropSynth provided a robust, low-cost synthesis platform for variant library production, enabling what the authors described in their first manuscript as “broad mutational scanning”. However, the fidelity of assembly was still too low for some applications. DropSynth 2.0 greatly improves the scale and success rate of the technique, and illustrates the promise of further refinement to enable assembly of ever-larger gene libraries. One exciting area this promises to open is the functional interrogation of the rapidly expanding gene databases generated by large-scale genome and metagenome sequencing efforts. For example, the ability to rapidly generate libraries of all predicted polyketide synthase and non-ribosomal peptide synthetase domains from microbial metagenomes will be important for functional interrogation of the full gene content of often uncultured microbes.

 

Moving Forward / Questions for Authors:

  • What and where are the major types of errors in gene synthesis with DropSynth? One might expect errors to be enriched at the overlap sequences relative to the intervening sequence (where they should be limited to the polymerase error rate), but the next-generation sequencing data should clarify that assumption. Additionally, what percentage of erroneous sequences have indels that result in early stop codons? If a substantial fraction of errors result in truncated proteins, functional selection (e.g. cloning in-frame with GFP to enable sorting of full-length variants) may be sufficient for downstream applications, rather than requiring dial-out PCR or similar techniques when a high fraction of perfect assemblies is required.
  • Is DropSynth primarily limited by the number of oligos per gene, or by the length of the overall assembly? As private companies increase the oligo lengths offered in their pools, what do the authors expect to be the limiting step in determining the size of genes that may be produced by DropSynth?
  • There seem to be substantial differences between the two different codon libraries prepared. Does this reflect differences in assembly fidelity / efficiency, PCR bias during sequencing library preparation, or simply the variability between library assembly attempts?
  • The example libraries are of near-uniform size, which enables size selection and efficient bulk suppression PCR. In a situation where many genes of varying lengths are assembled together (such as the domain-scale metagenome libraries described above), how efficient do the authors expect the bulk suppression PCR to be, and what other methods might be used to remove low molecular weight and improperly assembled products?
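The trade-off behind the second question can be framed with a toy model: if each source oligo is error-free with some probability and errors are independent across oligos, the chance that an assembly built from k oligos is perfect falls geometrically with k. This is a simplifying assumption for illustration only (it ignores polymerase errors and any selection for or against errors during assembly), and the function name and per-oligo value below are hypothetical, not measurements from the preprint.

```python
def perfect_assembly_fraction(perfect_oligo_fraction, n_oligos):
    """Estimate the fraction of perfect assemblies from source-oligo quality.

    Assumes every one of the n_oligos must be error-free and that errors
    are independent across oligos (an illustrative assumption).
    """
    if not 0 <= perfect_oligo_fraction <= 1:
        raise ValueError("perfect_oligo_fraction must be between 0 and 1")
    if n_oligos < 1:
        raise ValueError("n_oligos must be at least 1")
    return perfect_oligo_fraction ** n_oligos

# Hypothetical per-oligo perfect fraction, chosen only to show the trend.
for k in (4, 5, 6):
    print(k, round(perfect_assembly_fraction(0.7, k), 3))
```

In a model like this, using fewer, longer oligos only helps if the per-oligo perfect fraction does not fall too quickly as oligo length increases.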

 

References:

  • Gasperini M., Starita L., Shendure J. “The power of multiplexed functional analysis of genetic variants” 2016. Nature Protocols 11, 1782-1787
  • Plesa C.*, Sidore A.M.*, Lubock N.B., Zhang D., Kosuri S. “Multiplexed gene synthesis in emulsions for exploring protein functional landscapes” 2018. Science 359(6373), 343-347

 

Posted on: 23 August 2019, updated on: 27 August 2019

doi: https://doi.org/10.1242/prelights.13560


Author's response

Calin Plesa shared

  1. In the original DropSynth paper we saw many mismatch errors due to the use of Kapa Robust. The use of a high-fidelity polymerase has now reduced those significantly, as evidenced by the increased yield. The major error type is now single- and multiple-base deletions, which dominate in microarray-derived oligos. We’ve seen deletion errors depleted at the overlaps. This is likely due to the stochastic nature of the errors: an error in the overlap of one oligo will affect hybridization with the corresponding overlap in another oligo, since it’s unlikely that the exact same error will occur in both. This effect selects against assemblies with errors in the overlap. We’ve previously seen around 40% of assemblies with a premature stop codon. Yes, it should be possible to do an in-frame fusion to a selectable marker. We have tried this in the past and encountered two major issues. First, the selection significantly amplifies biases present in the non-uniform distribution of genes. This effectively reduces coverage, as more sequencing depth is required to find under-represented genes. Second, we saw poor enrichment rates, which we attribute to the proteins in the library having a lot of methionines. We believe the ribosomes were able to re-initiate translation from downstream ATG sites, bypassing the deletion errors. I am working on some ways to address these issues.
  2. We’ve demonstrated that up to 6 oligos can be pulled down to form assemblies, and assemblies with even higher numbers of oligos should be possible. The limiting factor right now is the fidelity of the source oligos. Increasing the number of oligos needed to assemble a gene further decreases the fidelity of the resulting assembly. As the oligo lengths offered in pools increase, the limiting factor will depend on how the error rates scale. What we’ve seen in the past was that it was advantageous to use longer oligos because the distribution of errors was not uniform. Some oligos had lots of errors and others had few or none. This meant that the percentage of perfect oligos in the pool did not decrease much even though the oligos were longer. If this trend continues, we will be able to assemble longer genes with similar fidelity without increasing the number of oligos. Using five 300-mers, we will be able to assemble genes 1 kbp in length.

  3. While there is some variability between different attempts, the yield is generally reproducible. We believe the majority of the differences are due to a combination of factors, some of which you alluded to already. Sequence-specific effects, which are still poorly understood, operate at multiple levels: 1) the synthesis of the oligos, 2) the PCA assembly process, and 3) PCR amplification (bias).

  4. We encountered this problem very early on and came up with a system that standardizes the length of constructs in a library. The assembled genes are flanked by restriction sites for cloning as well as assembly primers on each end, like this:

    Primer—RE—gene—RE—primer

    By adding random sequence between the restriction site and primer you can buffer the length of shorter genes to some minimum length like this:

    Primer—RE—gene—RE—buffer—primer

    With this approach, all of the assemblies fall within a narrow length distribution for the size selection and suppression PCR steps. The true underlying distribution of gene lengths only becomes a factor once you digest the assemblies for cloning.
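The length-buffering layout described in the fourth answer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' design pipeline: the function names, primer sequences, and restriction sites are arbitrary placeholders, genes are padded to the longest gene in the set rather than to a chosen minimum, and only the buffer itself is screened for restriction sites (junctions and gene-internal sites are not checked).

```python
import random

def random_buffer(length, forbidden_sites, rng):
    """Random DNA of the given length that avoids the listed site sequences."""
    while True:
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if not any(site in seq for site in forbidden_sites):
            return seq

def build_constructs(genes, fwd_primer, rev_primer, re_5, re_3, seed=0):
    """Lay out primer-RE-gene-RE-buffer-primer, padding every gene to the longest one."""
    rng = random.Random(seed)
    target = max(len(g) for g in genes)
    constructs = []
    for gene in genes:
        # Shorter genes get a longer buffer; the longest gene gets none.
        buffer = random_buffer(target - len(gene), (re_5, re_3), rng)
        constructs.append(fwd_primer + re_5 + gene + re_3 + buffer + rev_primer)
    return constructs

# Placeholder sequences for illustration only (not from the preprint).
genes = ["ATGGCTAAATAA", "ATGGCTGCTGCTAAACCCTAA", "ATGAAATAA"]
constructs = build_constructs(
    genes,
    fwd_primer="ACGTACGTACGTACGTAC",
    rev_primer="TGCATGCATGCATGCATG",
    re_5="CATATG",  # arbitrary example site
    re_3="GGTACC",  # arbitrary example site
)
print(sorted({len(c) for c in constructs}))
```

Because the buffer sits outside the second restriction site, digesting the construct for cloning removes it along with the primer, which is why the true gene-length distribution only reappears at that step.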
