DropSynth 2.0: high-fidelity multiplexed gene synthesis in emulsions

Angus M. Sidore, Calin Plesa, Joyce A. Samson, Sriram Kosuri

Posted on: 23 August 2019, updated on: 27 August 2019

Preprint posted on 20 August 2019

Article now published in Nucleic Acids Research at http://dx.doi.org/10.1093/nar/gkaa600

Gene Synthesis Costs Reduced to a Drop in the Bucket – DropSynth 2.0 Improves Multiplexed Gene Synthesis

Selected by Connor Rosen

Categories: synthetic biology, systems biology

Background:

In the last two decades, the ability to rapidly sequence DNA at ever-decreasing cost has revolutionized biology, generating vast amounts of genomic data and unlocking new areas of research. A variety of highly multiplexed assays that combine high-throughput DNA synthesis with next-generation DNA sequencing to interrogate the function of DNA variants, generalized as Multiplexed Assays for Variant Effects (MAVEs), have revealed new biological insights at an incredible pace and scale [Gasperini 2016]. However, the difficulty of generating gene-size (100s-1000s of DNA base pairs) libraries with high fidelity has limited the scale and scope of some functional assays. Previously, the Kosuri group described DropSynth, an inexpensive bead- and emulsion-based method to assemble gene-size DNA fragments from short oligo pools [Plesa and Sidore, 2018]. Here, they improve DropSynth to substantially increase its scale and fidelity, enabling high-throughput generation of high-quality gene libraries.

Key findings:

A major improvement in DropSynth fidelity (the percentage of assemblies that perfectly match the designed sequence) came from optimizing the protocol around high-fidelity polymerases, resulting in a ~5-fold improvement, to a ~20% rate of perfect assemblies, across multiple codon usage libraries. Several other steps were tested to improve fidelity or to support the use of the high-fidelity polymerases, including buffer optimization, enzymatic mismatch correction, and suppression PCR. Finally, the scale of DropSynth was increased by expanding the on-bead barcode repertoire so that four times as many assembly reactions can be carried out at once. Together, these technical optimizations represent a substantial improvement in the rate and cost of gene synthesis using the DropSynth technique.
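As a rough illustration of the fidelity metric defined above, the sketch below counts the percentage of assembled sequences that exactly match their designs. The function name, the input structures and the toy sequences are assumptions made for illustration; this is not the authors' analysis pipeline.

```python
def percent_perfect(designs, assemblies):
    """Percentage of assemblies identical to their designed sequence.

    designs: dict mapping gene ID -> designed sequence (hypothetical input)
    assemblies: list of (gene ID, observed sequence) pairs (hypothetical input)
    """
    if not assemblies:
        return 0.0
    perfect = sum(1 for gene_id, seq in assemblies
                  if designs.get(gene_id) == seq.upper())
    return 100.0 * perfect / len(assemblies)


# Toy data: five assemblies, only the first matches its design exactly.
designs = {"geneA": "ATGGCTAAATGA", "geneB": "ATGCCCGGGTAA"}
assemblies = [
    ("geneA", "ATGGCTAAATGA"),   # perfect
    ("geneA", "ATGGCAAATGA"),    # single-base deletion
    ("geneB", "ATGCCCGGTAA"),    # single-base deletion
    ("geneB", "ATGCACGGGTAA"),   # mismatch
    ("geneB", "ATGCCGGGTAA"),    # single-base deletion
]
rate = percent_perfect(designs, assemblies)
print(f"{rate:.0f}% perfect assemblies")
```

Read this way, a ~5-fold improvement that ends at ~20% implies a starting rate on the order of 4% perfect assemblies.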

Importance:

Gene synthesis can still be a rate- or cost-limiting step in the design and implementation of high-throughput functional assays. The original DropSynth provided a robust, low-cost synthesis platform for variant library production, enabling what the authors described in their first manuscript as “broad mutational scanning”. However, the fidelity of assembly was still too low for some applications. DropSynth 2.0 greatly improves the scale and success rate of this technique, and illustrates the promise of further refinement to enable assembly of ever-larger gene libraries. One exciting area this promises to open is the functional interrogation of the rapidly expanding gene databases generated by large-scale genome and metagenome sequencing efforts. For example, the ability to rapidly generate libraries of all predicted polyketide synthase and non-ribosomal peptide synthetase domains from microbial metagenomes will be important for functional interrogation of the full gene content of often uncultured microbes.

Moving Forward / Questions for Authors:

What and where are the major types of errors in gene synthesis with DropSynth? One might expect errors to be enriched at the overlap sequences relative to the intervening sequence (where the error rate should be limited to that of the polymerase), but the next-generation sequencing data should test that assumption. Additionally, what percentage of erroneous sequences have indels that result in early stop codons? If a substantial fraction of errors result in truncated proteins, functional selection (e.g. cloning in-frame with GFP to enable sorting of full-length variants) may be sufficient for downstream applications that require a high fraction of perfect assemblies, without resorting to dial-out PCR or similar techniques.
Is DropSynth primarily limited by the number of oligos per gene, or by the length of the overall assembly? As private companies increase the length of oligo pools offered, what do the authors expect to be the limiting step in determining the size of genes that may be produced by DropSynth?
There seem to be substantial differences between the two different codon libraries prepared. Does this reflect differences in assembly fidelity / efficiency, PCR bias during sequencing library preparation, or simply the variability between library assembly attempts?
The example libraries are of near-uniform size, which enables size selection and efficient bulk suppression PCR. In a situation where many different genes of varying lengths are assembled (such as the domain-scale metagenome libraries described above), how efficient do the authors expect the bulk suppression PCR to be, and what other methods of removing low molecular weight and improperly assembled products might be used?
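The early-stop question above can be made concrete with a short sketch that flags observed sequences whose first in-frame stop codon occurs earlier than in the design, as happens when a deletion shifts the reading frame. The function names and toy sequences are hypothetical, and the check assumes each sequence starts at the first base of the open reading frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq):
    """Return the codon index of the first in-frame stop, or None if absent."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(observed, designed):
    """True if the observed ORF terminates earlier than the designed one."""
    designed_stop = first_stop_codon(designed)
    if designed_stop is None:
        raise ValueError("designed sequence has no in-frame stop codon")
    observed_stop = first_stop_codon(observed)
    return observed_stop is not None and observed_stop < designed_stop


# Toy design: ATG GAT AGC TTG AAC CGT TAA (stop at codon index 6).
designed = "ATGGATAGCTTGAACCGTTAA"
observed = {
    "perfect": "ATGGATAGCTTGAACCGTTAA",
    "deletion": "ATGATAGCTTGAACCGTTAA",   # one base lost, frame shifts onto TGA
    "mismatch": "ATGGATAGCTTGAAGCGTTAA",  # substitution, frame preserved
}
flags = {name: has_premature_stop(seq, designed) for name, seq in observed.items()}
print(flags)
```

Only the frameshifted sequence is flagged here. A reporter fused in-frame downstream would be expected to lose signal for such variants, whereas the substitution would pass unnoticed.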

References:

Gasperini M., Starita L., Shendure J. “The power of multiplexed functional analysis of genetic variants” 2016. Nature Protocols 11, 1782-1787
Plesa C.*, Sidore A.M.*, Lubock N.B., Zhang D., Kosuri S. “Multiplexed gene synthesis in emulsions for exploring protein functional landscapes” 2018. Science 359(6373), 343-347

doi: https://doi.org/10.1242/prelights.13560

Author's response

Calin Plesa shared

In the original DropSynth paper we saw lots of mismatch errors due to the use of Kapa Robust. The use of a high-fidelity polymerase has now reduced those significantly, as evidenced by the increased yield. The major error type is now single- and multiple-base deletions, which dominate in microarray-derived oligos. We’ve seen deletion errors depleted at the overlaps. This is likely due to the stochastic nature of the errors. An error in the overlap of one oligo will affect hybridization with the corresponding overlap in another oligo, since it’s unlikely that the exact same error will occur in both. This effect will select against assemblies with errors in the overlap. We’ve previously seen around 40% of assemblies with a premature stop codon. Yes, it should be possible to do an in-frame fusion to a selectable marker. We have tried this in the past and encountered two major issues. First, the selection significantly amplifies biases present in the non-uniform distribution of genes. This effectively reduces coverage, as more sequencing depth is required to find under-represented genes. Second, we saw poor enrichment rates, which we attribute to the proteins in the library having a lot of methionines. We believe the ribosomes were able to re-initiate translation from the downstream ATG sites, bypassing the deletion errors. I am working on some ways to address these issues.
We’ve demonstrated that up to 6 oligos can be pulled down to form assemblies, and assemblies with even higher numbers of oligos should be possible. The limiting factor right now is the fidelity of the source oligos. Increasing the number of oligos needed to assemble a gene further decreases the fidelity of the resulting assembly. As the length of the oligos in the pools increases, the limiting factor will depend on how the error rates scale. What we’ve seen in the past was that it was advantageous to use longer oligos because the distribution of errors was not uniform. Some oligos had lots of errors and others had few or none. This meant that the percentage of perfect oligos in the pool did not decrease much even though the oligos were longer. If this trend continues, we will be able to assemble longer genes with similar fidelity without increasing the number of oligos. Using five 300-mers, we will be able to assemble genes 1 kbp in length.
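The trade-off described above can be illustrated with a deliberately simple model. It assumes, which the response does not state, that errors in different oligos are independent, so that if a fraction p of oligos in the pool is error-free, the chance that all n oligos in one assembly are perfect is p to the power n. The length estimate likewise uses hypothetical values for the per-oligo overhead (primer, barcode and processing sites) and for the overlap between neighbouring oligos; these numbers are placeholders chosen for illustration, not figures from the preprint.

```python
def perfect_assembly_fraction(p_perfect_oligo, n_oligos):
    """Upper-level estimate: all n oligos must be error-free (independence assumed)."""
    return p_perfect_oligo ** n_oligos


def assembled_length(oligo_len, n_oligos, overhead, overlap):
    """Gene length from n oligos after removing overhead and shared overlaps."""
    payload = oligo_len - overhead          # gene-encoding bases per oligo
    return n_oligos * payload - (n_oligos - 1) * overlap


# Hypothetical pool in which 70% of oligos are error-free.
for n in (4, 5, 6):
    frac = perfect_assembly_fraction(0.7, n)
    print(f"{n} oligos: {100 * frac:.1f}% of assemblies built from perfect oligos")

# Five 300-mers with an assumed 80 nt overhead and 20 nt overlaps.
length = assembled_length(oligo_len=300, n_oligos=5, overhead=80, overlap=20)
print(f"assembled gene length: {length} bp")
```

Under these assumed values, five 300-mers give an assembly of roughly 1 kbp, in line with the estimate in the response. The model ignores polymerase errors and the selection against overlap errors mentioned earlier, so it should be read only as a guide to how fidelity scales with oligo count.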
While there is some variability between different attempts, the yield is generally reproducible. We believe the majority of the differences are due to a combination of factors, some of which you alluded to already. Sequence-specific effects, which are still poorly understood, operate at multiple levels: 1) the synthesis of the oligos, 2) the PCA assembly process, 3) PCR amplification (bias).
We encountered this problem very early on and came up with a system that standardizes the length of constructs in a library. The assembled genes are flanked by restriction sites for cloning as well as assembly primers on each end like this:

Primer—RE—gene—RE—primer

By adding random sequence between the restriction site and primer you can buffer the length of shorter genes to some minimum length like this:

Primer—RE—gene—RE—buffer—primer

With this approach, all of the assemblies can share a narrow length distribution for the size selection and suppression PCR steps. The true underlying distribution of gene lengths only becomes a factor once you digest the assemblies for cloning.
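A minimal sketch of this length-buffering idea follows, assuming the layout primer, restriction site, gene, restriction site, buffer, primer. The function name, primer sequences and gene sequences are hypothetical, and the two sites used (the NdeI and KpnI recognition sequences) are chosen only for illustration. As a simplification, the random buffer is rejected only if it contains one of the sites on the top strand; sites on the reverse strand or spanning a junction are not checked.

```python
import random


def buffer_construct(gene, primer_5, primer_3, site_5, site_3, min_len, seed=0):
    """Build primer-RE-gene-RE-buffer-primer, padding short genes to min_len."""
    rng = random.Random(seed)
    fixed = len(primer_5) + len(site_5) + len(gene) + len(site_3) + len(primer_3)
    pad_len = max(0, min_len - fixed)
    while True:
        buffer_seq = "".join(rng.choice("ACGT") for _ in range(pad_len))
        # Simplification: reject only buffers containing a site on the top strand.
        if site_5 not in buffer_seq and site_3 not in buffer_seq:
            break
    return primer_5 + site_5 + gene + site_3 + buffer_seq + primer_3


# Hypothetical parts, for illustration only.
primer_5 = "ACGTTGCAAGCTTGACCTGA"
primer_3 = "TCAGGTCAAGCTTGCAACGT"
site_5, site_3 = "CATATG", "GGTACC"
genes = {
    "short": "ATG" + "GCT" * 10 + "TAA",
    "long": "ATG" + "GCT" * 40 + "TAA",
}

# Pad every construct up to the length of the longest unpadded construct.
overhead = len(primer_5) + len(site_5) + len(site_3) + len(primer_3)
min_len = overhead + max(len(g) for g in genes.values())
constructs = {
    name: buffer_construct(g, primer_5, primer_3, site_5, site_3, min_len)
    for name, g in genes.items()
}
for name, construct in constructs.items():
    print(name, len(genes[name]), len(construct))
```

Both constructs come out at the same total length, so size selection and suppression PCR see a uniform library, while digestion at the two sites releases genes of their original, differing lengths.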
