Optimizing Synthetic DNA for Metabolic Engineering Applications Howard Salis Penn State University
Synthetic Biology: Specify a function. Build a genetic system (a DNA molecule). Genetic Pseudocode:
call producequorumsignal(LuxI = 100)
if QS Signal > critical:
    if pH > 4 AND O2 > 1%:
        call producefuel(E1 = 100, E2 = 1000, E3 = 50, E4 = 10000)
        call shiftmetabolism(glycolysis = 1000, TCA cycle = 100, biomass = 0)
Genetic Parts. A Genetic System (a programming language for microbes).
Design Methods for Synthetic DNA RBS
Genetic Parts: Control gene expression. Control protein production. Control systems behavior. Parts: ribosome binding sites, promoters, protein coding sequences, terminators, small RNAs, RNase binding sites. Bacteria can only read DNA (A, G, C, T): equations & numbers → DNA sequence.
Ribosome Binding Sites. Production rate of protein: Low (low translation initiation rate). Ribosome binding site sequence (~35 nucleotides): CAGUACACAACUUCGCUCGUAGUUC
Ribosome Binding Sites. Production rate of protein: High (high translation initiation rate). Ribosome binding site sequence (~35 nucleotides): AAUAUACACAAAGGAGGUUACAACG
A Biophysical Model of Ribosome Binding Sites. Translation initiation is a key rate-limiting step in protein production inside bacteria. Input: DNA (RNA) sequence, including the protein coding sequence! AATATACACAAAGGAGGTTACAACG ATGACGATGCATGCACGCAGTGCGACATC. Output: translation initiation rate, on a proportional scale from <1 to 100,000+.
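A Thermodynamic Model of Translation. $m + R \rightleftharpoons m{::}R \rightarrow \text{protein}$ (ribosome turnover; rate-limiting step), where $R$ = 30S subunits, $m$ = RBS sequence on mRNA, and $m{::}R$ = bound 30S subunits. $r_{\text{translation}} \propto [m{::}R] \propto [m][R]\,\exp(-\beta\,\Delta G_{\text{tot}})$, where $\beta$ = Boltzmann constant and $\Delta G_{\text{tot}}$ = Gibbs free energy change of ribosome binding.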
An Example of Predicting Translation Rate. We can calculate the Gibbs free energy change when the ribosome binds to an mRNA and use statistical thermodynamics to predict its translation initiation rate. [Figure: mRNA sequence; dotted lines = nucleotide base pairing.] $\Delta G_{\text{tot}} = 3.1$ kcal/mol, $r = 620$ au (proportional scale).
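The worked example on this slide can be reproduced with a minimal sketch of the exponential relation between free energy and rate. The two constants below are assumptions: they are the values reported with the published RBS Calculator model, not numbers stated on these slides, and the function names are hypothetical.

```python
import math

# Assumed constants (not given on the slides): apparent Boltzmann factor and
# proportionality constant as reported for the published RBS Calculator model.
BETA = 0.45      # mol/kcal
K_PROP = 2500.0  # au, sets the proportional scale

def translation_initiation_rate(dg_total, beta=BETA, k=K_PROP):
    """Predicted rate (au) from the total Gibbs free energy change (kcal/mol)."""
    return k * math.exp(-beta * dg_total)

def dg_for_target_rate(rate, beta=BETA, k=K_PROP):
    """Invert the relation: the dG_total (kcal/mol) that yields a target rate."""
    return -math.log(rate / k) / beta

if __name__ == "__main__":
    # The slide's example: dG_total = 3.1 kcal/mol
    print(round(translation_initiation_rate(3.1)))
```

Under these assumed constants, a free energy change of 3.1 kcal/mol maps to roughly 620 au, which agrees with the slide. The inverse function shows why a design tool can work backward from a target rate to a target free energy.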
Monte Carlo optimization with Metropolis criteria and simulated annealing. Inputs: 1. target translation rate; 2. protein coding sequence. Loop: mutate → accept/reject → target reached? (10-200 iterations typical). Output: a synthetic RBS sequence. Example output RBS sequence: ACAGAGTTAAGGGACGAGAG, upstream of the coding sequence ATGACGATGCATGCACGCAGTGCGAC.
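The mutate / accept-reject / target-reached loop can be sketched as follows. This is an illustration of the search procedure only: `toy_dg_total` is a hypothetical stand-in for the biophysical model (it just counts matches to a Shine-Dalgarno-like motif and ignores the protein coding sequence), and the constants, cooling schedule, and tolerance are assumptions rather than the RBS Calculator's settings.

```python
import math
import random

BETA, K_PROP = 0.45, 2500.0   # assumed model constants (not given on the slides)
SD_MOTIF = "AGGAGG"           # Shine-Dalgarno-like motif used by the toy score

def toy_dg_total(rbs):
    """HYPOTHETICAL stand-in for the biophysical model: more matches to the
    motif in the best window means a more negative (favorable) dG_total."""
    n = len(SD_MOTIF)
    windows = (rbs[i:i + n] for i in range(len(rbs) - n + 1))
    best = max(sum(a == b for a, b in zip(w, SD_MOTIF)) for w in windows)
    return 8.0 - 3.0 * best   # kcal/mol, arbitrary toy scale

def predicted_rate(rbs):
    return K_PROP * math.exp(-BETA * toy_dg_total(rbs))

def cost(rbs, target_rate):
    """Distance from the target on a log scale (0 means target reached)."""
    return abs(math.log(predicted_rate(rbs) / target_rate))

def design_rbs(target_rate, length=20, max_iterations=200, tolerance=0.05, seed=0):
    rng = random.Random(seed)
    current = "".join(rng.choice("ACGT") for _ in range(length))
    current_cost = cost(current, target_rate)
    best, best_cost = current, current_cost
    temperature = 1.0
    for _ in range(max_iterations):
        if best_cost <= tolerance:          # target reached?
            break
        # Mutate: substitute one randomly chosen nucleotide.
        pos = rng.randrange(length)
        new_base = rng.choice([b for b in "ACGT" if b != current[pos]])
        candidate = current[:pos] + new_base + current[pos + 1:]
        candidate_cost = cost(candidate, target_rate)
        # Metropolis criterion: always accept improvements,
        # accept worse moves with probability exp(-delta / temperature).
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= 0.97                 # simulated annealing: cool gradually
    return best, best_cost

if __name__ == "__main__":
    sequence, final_cost = design_rbs(target_rate=1000.0)
    print(sequence, round(predicted_rate(sequence)), round(final_cost, 3))
```

Scoring on a log scale keeps the acceptance rule meaningful across a rate range that spans several orders of magnitude. Because the toy score takes only a few discrete values, the search may stop short of the tolerance; a real free energy model gives a much finer-grained landscape.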
Automated Design of RBS Sequences. We use the RBS Calculator to design 29 synthetic RBS sequences, selecting the translation initiation rate in Escherichia coli (1 to 100,000 on a proportional scale). Target Rate: 5.8, 120, 280, 520, 2500, 6700, 12200, 23000. Synthetic RBS Sequences: TTTCTCCCATAATAACTCGAACCATTACCGAGCT CGAGCTGGTACTTAAAAAC ATAGAGTCGACGC AACCTCAAGGAATTCCCT ACACTAAGGAGACCAACTTGC TGGGGAGTGTAAA AAAGTAAGGAGGCGCGGCT ACAACTTAAGGAGGTATTC
Testing Forward Predictions. We use the RBS Calculator to design 29 synthetic RBS sequences, selecting the translation initiation rate in Escherichia coli. $r_{\text{translation}} \propto \exp(-\beta\,\Delta G_{\text{tot}})$. [Plot: log fluorescence vs. predicted $\Delta G_{\text{tot}}$; slope of $-\beta$ according to the theory.] E. coli, flow cytometry, steady-state measurements.
Testing Forward Predictions. We use the RBS Calculator to design 29 synthetic RBS sequences, selecting the translation initiation rate in Escherichia coli. $r_{\text{translation}} \propto \exp(-\beta\,\Delta G_{\text{tot}})$. Theory vs. experiment: $R^2 = 0.84$. We can rationally control the translation initiation rate. Accuracy: 2.3-fold over a range from 1 to 100,000+. E. coli, flow cytometry, steady-state measurements.
120+ Predictions Tested. H.M. Salis, E.A. Mirsky, C.A. Voigt, Nat. Biotech., 2009.
Modularity of RBS Sequences. Myth #1: You can use the same ribosome binding site with different protein coding sequences and expect similar expression levels. [Figure: Protein A vs. Protein B with the same RBS: 21-fold.] E. coli, flow cytometry, steady-state measurements. Salis et al., Nat. Biotech., 2009.
RBS Sequences are NOT Modular Parts. Myth #1: You can use the same ribosome binding site with different protein coding sequences and expect similar expression levels. Myth Busted! [Figure: Protein A vs. Protein B with the same RBS: 530-fold; 17-fold.] E. coli, flow cytometry, steady-state measurements. Salis et al., Nat. Biotech., 2009.
Predicting Translation Initiation Rates during TRMR. Strong RBSs are highly affected by surrounding sequence context (variation in mRNA secondary structures). UP RBS: wide distribution. DOWN RBS: narrow distribution. Warner, Reeder, Karimpour-Fard, Woodruff, & Gill, Nat. Biotech., 2010.
RBS Calculator Demonstration
Systematic Optimization of Metabolic Pathways. What are the optimal enzyme expression levels? [Figure: pathway enzymes E1, E2, E3, E4; productivity/titer plotted against enzyme 1 and enzyme 2: a multi-dimensional, complex surface.] The optimal enzyme expression levels are somewhere between zero and too high. How do we find these numbers, for any pathway?
Furfural and HMF Growth Toxicity. Furfural & HMF are potent growth inhibitors of industrial microbes. They slow down or completely stop biofuel production. They are expensive to remove from hydrolysate. They reduce the sugar-to-fuel yield. Conditions: E. coli MG1655 and DH10B; M9 minimal media, 4 g/L glucose (thiamine and leucine supplemented); TECAN M1000 spectrophotometer.
A Furfural Biodetoxification Pathway. A recently discovered pathway catabolizes furfural and HMF to α-ketoglutarate (Koopman et al., Proc. Natl. Acad. Sci., 2010). Transferring this DNA to E. coli or budding yeast doesn't yield furfural catabolism. DNA is code, but the operating system matters!
Designing a Biodetoxification Pathway. Furfural → α-ketoglutarate → fuel. hmfABCDE sequences were synthesized using commercial DNA synthesis. Redesigned all genetic parts: promoters, ribosome binding sites, protein coding sequences, terminators.
Optimization = Control + Search. Systematic metabolic pathway optimization requires two ingredients: 1. the ability to quantitatively control enzyme expression
Optimization = Control + Search. The RBS Calculator allows us to control bacterial enzyme expression across a 100,000-fold scale: translation initiation rate → RBS Calculator → a synthetic RBS sequence. Salis et al., Nature Biotechnology, v27 (10), 2009.
Optimization = Control + Search. 2. an efficient search strategy for finding optimal enzyme expression levels. Which search strategy is better? Missed! You've sunk my battleship!
Optimization = Control + Search. Degenerate RBS sequences (dRBS): mixtures of ribosome binding sites. CGTCACNNNNNNNNNGT ATG; CAACTTSAKGDBGTATTC ATG.
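A degenerate sequence stands for every sequence obtained by substituting each ambiguity code with one of its allowed bases. The sketch below uses the standard IUPAC nucleotide codes to count and enumerate the variants of the two sequences on this slide; the function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def library_size(drbs):
    """Number of distinct sequences encoded by a degenerate sequence."""
    size = 1
    for code in drbs:
        size *= len(IUPAC[code])
    return size

def expand(drbs):
    """Enumerate every sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in drbs))]

if __name__ == "__main__":
    random_library = "CGTCACNNNNNNNNNGT"
    designed_library = "CAACTTSAKGDBGTATTC"
    print(library_size(random_library))    # grows as 4 per N position
    print(library_size(designed_library))  # S, K, D, B: 2 * 2 * 3 * 3
    print(expand(designed_library)[:3])
```

The contrast is the point of the slide: the fully random library encodes a very large number of variants, most of which cannot be screened, while the designed library encodes a few dozen.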
Optimization = Control + Search. Myth #2: Random RBS libraries are a good way to vary protein expression. Myth Busted!
Optimization = Control + Search. [Figure.] E. coli, spectrophotometry, steady-state measurements.
Optimization = Control + Search. [Figure labels: Minimum Translation Init. Rate; Maximum Translation Init. Rate; RBS Library Resolution.]
Efficient Combinatorial Optimization. Step 1: design optimized dRBS sequences using the RBS Library Calculator. Step 2: insert optimized dRBS sequences in front of each enzyme coding sequence. We use a new combinatorial cloning strategy (Gibson et al., Nature Methods, 2009). dRBS libraries: 8 x 8 x 8 x 8 x 8 = 32,768 pathway variants. Combinations of ribosome binding sites → combinations of enzyme expression levels.
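The size of the combinatorial search space follows directly from the number of expression levels per enzyme. A minimal sketch, assuming evenly log-spaced target rates between a hypothetical minimum and maximum (the slides do not state the spacing or the range used by the RBS Library Calculator):

```python
import math
from itertools import product

def log_spaced_rates(min_rate, max_rate, levels):
    """Target translation initiation rates spaced evenly on a log scale."""
    ratio = max_rate / min_rate
    return [min_rate * ratio ** (i / (levels - 1)) for i in range(levels)]

def count_pathway_variants(levels_per_enzyme):
    """Number of distinct combinations of expression levels."""
    return math.prod(levels_per_enzyme)

if __name__ == "__main__":
    # Hypothetical range; 8 levels per enzyme, 5 enzymes as on the slide.
    rates = log_spaced_rates(10.0, 100000.0, 8)
    print([round(r) for r in rates])
    print(count_pathway_variants([8] * 5))
    # Each pathway variant is one choice of level index per enzyme.
    first_variants = list(product(range(8), repeat=5))[:2]
    print(first_variants)
```

Log spacing is chosen here because expression spans several orders of magnitude; with a linear grid, nearly all levels would sit at the high end of the range.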
Efficient Combinatorial Optimization. Step 3: selection or screening of optimal pathway variants with increasing furfural concentration. Step 4: DNA sequencing of surviving colonies, followed by pathway characterization. Selection on increasing furfural concentrations: 32,768 pathways → 3+ unique pathways remaining (still refining selection procedure).
The Optimal Pathway (so far). Design: ~1 year → now. [Plot legend: x DH10B; x DH10B + pHMF-1.]
Strain / Plasmid | Initial Growth Rate, 0 g/L furfural | Initial Growth Rate, 3 g/L furfural
E. coli DH10B | 0.186 hr^-1 | 0.026 hr^-1
E. coli DH10B + pHMF-1 | 0.183 hr^-1 | 0.095 hr^-1
RBS Library Calculator Demonstration
Regulatory Small RNAs. Regulatory sRNAs bind to target mRNAs and alter the ribosome's ability to initiate translation. [Figure panels: sRNA repression; sRNA activation.]
Control of Chromosomal Protein Expression. Synthetic sRNA arrays enable fine control over protein expression without modification of the chromosome.
A Statistical Thermodynamic Model (sRNA, mRNA). 1. Calculating mRNA, sRNA, and ribosome interactions. 2. Calculating equilibrium concentrations. 3. Predicting translation initiation rates. The model predicts the translation initiation rate (proteins / mRNA / second) when any sRNA regulates the translation initiation of any mRNA.
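Steps 2 and 3 can be illustrated with a deliberately simplified two-state sketch: one sRNA binds one mRNA, and a bound mRNA is not translated at all. Everything here is an assumption for illustration (the 1 M standard state, the 37 °C temperature, the function names, and the example concentrations and free energy); the actual model also accounts for ribosome interactions, which this sketch omits.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at 37 C (gas constant times temperature)

def complex_concentration(srna_total, mrna_total, dg_binding):
    """Equilibrium [sRNA:mRNA] in M for sRNA + mRNA <-> sRNA:mRNA.

    dg_binding is the binding free energy in kcal/mol (1 M standard state).
    Solves c^2 - (S + M + Kd) c + S M = 0 for the physical root, written in
    a form that avoids subtracting two nearly equal numbers.
    """
    k_d = math.exp(dg_binding / RT)  # dissociation constant, M
    b = srna_total + mrna_total + k_d
    return 2.0 * srna_total * mrna_total / (
        b + math.sqrt(b * b - 4.0 * srna_total * mrna_total))

def repressed_rate(unrepressed_rate, srna_total, mrna_total, dg_binding):
    """Translation rate, assuming only free mRNA is translated (mrna_total > 0)."""
    bound = complex_concentration(srna_total, mrna_total, dg_binding)
    return unrepressed_rate * (1.0 - bound / mrna_total)

if __name__ == "__main__":
    # Hypothetical numbers: same mRNA level, two different sRNA:mRNA ratios.
    for srna in (1e-8, 1e-6):
        rate = repressed_rate(100.0, srna, 1e-8, dg_binding=-12.0)
        print(f"sRNA:mRNA = {srna / 1e-8:.0f}:1 -> rate {rate:.3g}")
```

Even in this reduced form, the predicted repression depends on the sRNA:mRNA ratio as well as on the binding free energy.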
Automated Design of Synthetic Regulatory RNAs. Two synthetic sRNAs were designed to repress protein expression by >100-fold.
Synthetic sRNA Sequence #1: GAAUUCAGAUACAUAUCUCCUUAUGUUUUGGUCUGUAGAUAUCAGACAUUGACGUCUGCUGUGU
mRNA Target #1: UCUAGAGGAUCCAAUUCGCGAUAUCUACAGACCAAAACAUAAGGAGAUAUAAACAUGGCGAGCUCUGAAGACGUUAUCAAAGAGUUCAUGCGUUUCAAAGUUCGUAUGGAAGGUUCCGUUAACGGUCACGAGUUCGAAAUCGAAGGUGAAGGUG
Synthetic sRNA Sequence #2: GAAUUCAGAUACACGCAUUUUUAUCCUCCUUCGACUUCUUGAUAGACAUUGACGUCUGCUGUGU
mRNA Target #2: ACUAGUUUAGGUCACCCUCGUAGUAACUGAUAAGCUAUCAAGAAGUCGAAGGAGGAUAAAAAUGCGUAAACUCGAGGAACUUUUCACGGGGGUCGUCCCAAUUCUGGUAGA
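One way to inspect a designed pair is to look for the longest stretch where the sRNA is perfectly antisense to its target. The sketch below does this for sRNA Sequence #1 and the start of mRNA Target #1 from this slide. It considers Watson-Crick pairs only (no G-U wobble, no mismatches or bulges), so it is a rough check and not the thermodynamic calculation the design method uses; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    return rna.translate(COMPLEMENT)[::-1]

def longest_antisense_region(srna, mrna):
    """Longest perfectly complementary stretch between an sRNA and an mRNA.

    Returns (srna_segment, mrna_site), both written 5' to 3'. Implemented as
    a longest-common-substring search between the sRNA and the reverse
    complement of the mRNA.
    """
    target = reverse_complement(mrna)
    best_len, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(srna) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if srna[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1   # extend the match on this diagonal
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    segment = srna[best_end - best_len:best_end]
    return segment, reverse_complement(segment)

SRNA_1 = "GAAUUCAGAUACAUAUCUCCUUAUGUUUUGGUCUGUAGAUAUCAGACAUUGACGUCUGCUGUGU"
MRNA_1_START = "UCUAGAGGAUCCAAUUCGCGAUAUCUACAGACCAAAACAUAAGGAGAUAUAAACAUGGCGAGCU"

if __name__ == "__main__":
    segment, site = longest_antisense_region(SRNA_1, MRNA_1_START)
    print(len(segment), segment)
    print("pairs with mRNA site:", site)
```

For this pair, the site found on the mRNA includes the AAGGAG motif a few nucleotides upstream of the AUG start codon, which fits the idea of repressing translation initiation by occluding the ribosome binding site.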
Preliminary Testing of Model Predictions. [Figure: absence vs. presence of synthetic sRNA.] E. coli DH10B: (green) ++ sRNA1; (red) no sRNA1; (blue) background autofluorescence. 40-hour cultures to obtain steady-state data. Measuring cell density and fluorescence using a TECAN spectrophotometer. Fluorescence of (no sRNA) and (background) are indistinguishable.
Preliminary Testing of Model Predictions. These are the most tightly repressing (published) synthetic sRNAs created to date. In trial 2, we improved our range of measurement. [Figure: RFP and GFP protein production (flu./cell), no sRNA vs. ++sRNA, for sRNA1 and sRNA2 in trial 1 and trial 2; repression of >300-fold and >3000-fold.] E. coli DH10B, TECAN spectrophotometry, steady-state measurements.
Surprises & Solutions Along the Way. 1. We experimented with 5′ and 3′ Hfq-binding hairpins. A free 5′ end is required. These hairpins were reputed to increase Hfq binding; Hfq binding improves sRNA:mRNA regulation. [Figure: one configuration in which protein production was not repressed and one in which it was repressed.] 2. We employed a bacterial artificial chromosome (1 copy) to obtain a high sRNA:mRNA ratio. ColE1 (~60 copies): low sRNA to mRNA ratio, poor repression, protein production was not repressed. BAC (1 copy) with ColE1 (~60 copies): high sRNA to mRNA ratio, good repression, protein production was repressed.
Small RNA Calculator Demonstration
Acknowledgements Salis Lab Postdocs Jason Collens Graduate students Amin Espah Borujeni Iman Farasat Tian Tian Long Chen Bayram Bulut Undergraduate students Andrew Kirk, Lecie Houston, Emily Dong