Synthetic Biology: A New Application Area for Design Automation Research

Size: px
Start display at page:

Download "Synthetic Biology: A New Application Area for Design Automation Research"

Transcription

1 Synthetic Biology: A New Application Area for Design Automation Research Chris Myers University of Utah Carnegie Mellon University December 10, 2009

2 James Watson Biology has at least 50 more interesting years (1984).

3 Michael Samoilov and Adam Arkin

4 Phage λ Virus Host chromosome Phage λ E. coli bacterial cell Attachment Lysogeny Penetration Lysis Replication Cell division Assembly Induction event Release Lysogeny Pathway Lysis Pathway

5 Phage λ Decision Circuit

6 Asynchronous Circuit? McAdams/Shapiro, Science (1995)

7 Stochastic Circuit? Arkin/Ross/McAdams, Genetics (1998)

8 Stochastic Asynchronous Circuit? k PL k 2 k 1 ciii k 5 k PL N 0.2 k PL k PR k 3 k 4 k PR Cro CroH k PR cii k PRE ci k PRM cih 0.5 k PR k 6

9 Stochastic Asynchronous Circuit Results Estimated Fraction of Lysogens Stochastic Asynchronous Circuit (starved) O- Experimental (starved) P- Experimental (starved) Master Eqn Simulation (starved) 1e Average Phage Input (API) SAC results generated in only 7 minutes. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

10 Stochastic Asynchronous Circuit Results Estimated Fraction of Lysogens Stochastic Asynchronous Circuit (starved) 1e-05 O- Experimental (starved) P- Experimental (starved) Master Eqn Simulation (starved) Stochastic Asynchronous Circuit (well-fed) O- Experimental (well-fed) 1e Average Phage Input (API) SAC results generated in only 7 minutes. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

11 Synthetic Biology (From Adventures in Synthetic Biology - Endy et al.)

12 Genetic Engineering vs. Synthetic Biology Genetic engineering (last 30 years): Recombinant DNA - constructing artificial DNA through combinations. Polymerase Chain Reaction (PCR) - making many copies of this new DNA. Automated sequencing - checking the resulting DNA sequence. Synthetic biology adds: Standards - create repositories of parts that can be easily composed. Abstraction - high-level models to facilitate design. Automated construction - separate design from construction. (source: Drew Endy)

13 Genetic Design Automation (GDA) Standards, abstraction, and automated construction are the cornerstones of Electronic Design Automation (EDA). EDA facilitates the design of more complex integrated circuits each year. Crucial to the success of synthetic biology is an improvement in methods and tools for Genetic Design Automation (GDA). Experiences with EDA can jump start the development of GDA.

14 Current State of GDA (Standards) Registry of standard biological parts used to design synthetic genetic circuits ( Adequate characterization of these parts is an ongoing effort. Systems Biology Markup Language (SBML) has been proposed as a standard representation for the simulation of biological systems. Many simulation tools have been developed that accept models in the SBML format (Copasi, Jarnac, CellDesigner, SimBiology, ibiosim, etc.).

15 Current State of GDA (Abstraction) Existing SBML-based GDA tools model biological systems at the molecular level. A typical SBML model is composed of a number of chemical species (i.e., proteins, genes, etc.) and reactions that transform these species. This is a very low level representation which is roughly equivalent to the layout level for electronic circuits. Designing and simulating genetic circuits at this level of detail is extremely tedious and time-consuming.

16 Current State of GDA (Automated Construction) Several companies have formed that will construct a plasmid from an arbitrary DNA sequence. It is still difficult, however, to separate design and construction issues. To achieve this, a GDA tool that supports higher-levels of abstraction for modeling, analysis, and design of genetic circuits is essential.

17 Phage λ Decision Circuit

18 Phage λ Decision Circuit

19 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

20 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

21 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

22 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription RNAP RNAP RNAP RNAP cii Promoters Genes

23 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription RNAP RNAP RNAP cii Promoters Genes

24 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription RNAP RNAP RNAP cii Promoters Genes

25 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

26 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

27 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

28 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

29 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

30 Genetic Circuits CI Dimer DNA Dimerization ci CI Protein Repression Activation Pre O E O R Operator Sites CII Protein Transcription Pr RNAP RNAP RNAP cii Degradation Translation mrna Promoters Genes

31 Genetic Circuits CI Dimer DNA Dimerization ci CI Protein Repression Activation Pre O E O R Operator Sites CII Protein Transcription Pr RNAP RNAP RNAP cii Degradation Translation mrna Promoters Genes

32 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

33 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

34 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

35 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

36 Genetic Circuits CI Dimer DNA Dimerization CI Protein Repression Activation Pre O E O R ci Operator Sites CII Protein Degradation Translation mrna Transcription Pr RNAP RNAP RNAP cii Promoters Genes

37 Francis Crick DNA makes RNA, RNA makes protein, and proteins make us.

38 Johann Von Neumann The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.

39 Scott Adams There are many methods for predicting the future. For example, you can read horoscopes, tea leaves, tarot cards, or crystal balls. Collectively, these methods are known as nutty methods. Or you can put well-researched facts into sophisticated computer models, more commonly referred to as a complete waste of time.

40 Genetic Circuit Model (GCM) Provides a higher level of abstraction than SBML. Includes only important species and their influences upon each other. GCMs also include structural constructs that allow us to connect GCMs for separate modules through species ports.

41 A Genetic Not Gate A P1 A C C P1 c A C

42 A Genetic Nor Gate A B A B C P1 P1 C P1 c A B C

43 A Genetic Nand Gate A C A B P1 C P2 B P1 c C P2 c A B C

44 A Genetic Oscillator CI Dimer Pre CI Pr CI Protein CII Protein CII Pre O E O R Pr ci cii CI CII

45 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

46 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

47 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

48 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

49 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

50 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

51 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

52 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

53 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

54 SBML: Main Elements Species Global parameters (ex. k1=0.1) Reactions Reactants Products Modifiers Stoichiometry Reversible Kinetic laws

55 Synthesizing SBML from a GCM Representation Create degradation reactions Create open complex formation reactions Create dimerization reactions Create repression reactions Create activation reactions

56 GCM Example CI Pre Pr CII

57 Degradation Reactions

58 Open Complex Formation Reactions

59 Dimerization Reactions

60 Repression Reactions

61 Activation Reactions

62 Complete SBML Model

63 Classical Chemical Kinetics Uses ordinary differential equations (ODE) to represent the system to be analyzed, and it assumes: Molecule counts are high, so concentrations can be continuous variables. Reactions occur continuously and deterministically. Genetic circuits have: Small molecule counts which must be considered as discrete variables. Gene expression reactions that occur sporadically. ODEs do not capture non-deterministic behavior.

64 Richard Feynmann A philosopher once said It is necessary for the very existence of science that the same conditions always produce the same results. Well, they do not. You set up the circumstances, with the same conditions every time, and you cannot predict behind which hole you will see the electron.

65 NYTimes: Expressing Our Individuality, the Way E. Coli Do

66 Rainbow and CC

67 Stochastic Chemical Kinetics To more accurately predict the temporal behavior of genetic circuits, stochastic chemical kinetics formalism can be used. Use Gillespie s Stochastic Simulation Algorithm which tracks the quantities of each molecular species and treats each reaction as a separate random event. Only practical for small systems with no major time-scale separations. Abstraction is essential for efficient analysis of any realistic system.

68 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

69 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

70 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

71 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

72 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

73 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

74 Automatic Abstraction Reaction Model Reaction-based Abstraction Abstracted Reaction Model State-based Abstraction SAC Model Markov Chain Analysis Stochastic Simulation Results Begins with a reaction-based model in SBML. Automatically abstracts this model leveraging the quasi-steady state assumption, whenever possible. Encodes chemical species concentrations into Boolean (or n-ary) levels to produce a stochastic asynchronous circuit (SAC) model. Can now utilize Markov chain analysis. Kuwahara et al., Trans. on Comp. Sys. Bio. (2006)

75 Dimerization Reduction

76 Dimerization Reduction

77 Operator Site Reduction (PR)

78 Operator Site Reduction (PR)

79 Operator Site Reduction (PRE)

80 Operator Site Reduction (PRE)

81 Similar Reaction Combination

82 Modifier Constant Propagation

83 Final SBML Model 10 species and 10 reactions reduced to 2 species and 4 reactions

84 GCM Advantages Greatly increases the speed of model development and reduces the number of errors in the resulting models. Allows efficient exploration of the effects of parameter variation. Constrains SBML model such that it can be more easily abstracted resulting in substantial improvement in simulation time.

85 ibiosim: The Intelligent Biological Simulator Project management support. GCM Editor - creates Genetic Circuit Models (GCM). SBML Editor - creates models using the Systems Biology Markup Language (SBML). reb2sac - abstraction-based ODE, Monte Carlo, and Markov analysis. TSD Graph Editor - visualizes time series data (TSD). Probability Graph Editor - visualizes probability data. GeneNet - learns GCMs from TSD. Myers et al., Bioinformatics (2009)

86 ibiosim: Genetic Circuit Editor Myers et al., Bioinformatics (2009)

87 ibiosim: SBML Editor Myers et al., Bioinformatics (2009)

88 ibiosim: Analysis Engine Myers et al., Bioinformatics (2009)

89 ODE Results for the Simple Genetic Oscillator Comparison of ODE to SSA Results Number of molecules ,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 5,00 Time (s) CI_total (ODE)

90 SSA Results for the Simple Genetic Oscillator Number of molecules Comparison of ODE to SSA Results ,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 5,00 Time (s) CI_total (ODE) CI_total (SSA)

91 Genetic Muller C-Element A B C C A B C C 1 0 C 1 1 1

92 Toggle Switch C-Element (Genetic Circuit) A D X A B A B D E X F Y Z S R Q C P1 B P2 D P7 E d x E X e x P3 y F f F Z Y P8 f P4 z C Y Z P5 c y P6 z Nguyen et al., 13th Symposium on Async. Ckts. & Sys., 2007 (best paper) Nguyen et al., to appear in the Journal of Theoretical Biology

93 Toggle Switch C-Element (GCM) A D X P1 B d E x X Y P2 D e x P3 y F E P7 f F Z P8 f P4 z C Y Z P5 c y P6 z

94 Toggle Switch C-Element (SBML)

95 Toggle Switch C-Element (Abstracted) Reduced from 34 species and 31 reactions to 9 species and 15 reactions.

96 Toggle Switch C-Element (Simulation) Simulation time improved from 312 seconds to 20 seconds.

97 Majority Gate C-Element (Genetic Circuit) A B X Y D E C Z A X Y D P1 x y P5 d B Z X D E C P2 z x P4 d P7 e P8 c D Y Z D P3 y z P6 d

98 Speed-Independent C-Element (Genetic Circuit) A B S1 S2 S4 S3 C A S4 X S1 S2 S3 P1 s4 x P4 s1 P5 s2 P6 s3 B S4 Y S1 S3 S2 Z P2 s4 y P7 s1 P8 s2 z S4 S3 Z C S2 S4 P3 s3 z P9 c P10 s4

99 Genetic C-element Failures Toggle, Inputs Mixed 120 dy=0 dz=0 100 Y Stable? 20 Unstable Stable Z

100 Comparison of Failure Rates for the C-element Designs 0.02 Failure Rate for Each C-Element Design Majority Gate (High to Low) Speed-Independent (High to Low) Toggle Switch (High to Low) Failure Rate Time (s)

101 Effects of Decay Rates on Failure Rates Failure Rate Versus Decay Rate (Toggle Switch C-element) 0.25 High to Low Low to High 0.2 Failure Rate Decay Rate

102 Effects of Decay Rates on Switching Time Switching Time Versus Decay Rate (Toggle Switch C-element) 2000 High Low Switching Time (s) Decay Rate

103 Application: Bacterial Consensus One interesting application for synthetic biology is the design of bacteria that can hunt and kill tumor cells (Anderson et al.). Care must be taken in determining when to attack potential tumor cells. Can use a genetic Muller C-element and a bacterial consensus mechanism known as quorum sensing. C-element combines a noisy environmental trigger signal and a density dependent quorum sensing signal. Activated bacteria signal their neighbors to reach consensus. Env A Detect (error rate ε) E Muller C-element (state error rate δ) Concentration Threshold Action cell boundary Winstead et al., IBE Conference (2008)

104 Confidence Amplifier A noisy C-element with a confidence-feedback loop: E C A The output rails to maximum confidence, even if E has low confidence. This configuration only works if the C-element is noisy. Otherwise, the circuit is permanently stuck in its initial state.

105 Quorum Trigger Population Dynamics Inactive Trigger Circuits

106 Quorum Trigger Population Dynamics Env signal applied Env (HSL concentration low)

107 Quorum Trigger Population Dynamics One circuit randomly activates Env (HSL concentration increases)

108 Quorum Trigger Population Dynamics More circuits activate due to HSL Env (HSL concentration increases sharply)

109 Quorum Trigger Population Dynamics Avalanche effect: most cells activate Env (HSL concentration saturates)

110 Quorum Trigger Population Dynamics Env signal is removed. (Circuits stay active)

111 Quorum Trigger Population Dynamics Time passes. (Circuits randomly switch off)

112 Quorum Trigger Simulation Results Probability of Toggle gate stimuli, E= Environmental Trigger Consensus Activator Probability Time Steps

113 Quorum Trigger Simulation Results Probability of Toggle gate stimuli, E= Environmental Trigger Consensus Activator Probability Time Steps

114 Quorum Trigger Simulation Results Probability of Toggle gate stimuli, E= Environmental Trigger Consensus Activator Probability Time Steps

115 Quorum Trigger Design medium 3OC6HSL Env LuxR Complex LuxI LuxR Env OR Complex LuxI LuxR HSL(out) HSL(in) OR AND

116 Quorum Trigger Design medium 3OC6HSL Env LuxR Complex LuxI LuxR laci+pl RBS luxr lux pr RBS GFP RBS luxr RBS luxi luxr GFP luxi luxr R0011 B0034 C0062 B0015 R0062 B0034 E0040 B0034 C0062 B0034 C0061 B0015 F2622 K116634

117 Quorum Trigger Design Chris Myers (U. of Utah) Synthetic Biology Carnegie Mellon University

118 Future GDA Research Directions Genetic circuits have no signal isolation. Circuit products may interfere with each other and host cell. Gates in a genetic circuit library usually can only be used once. Behavior of circuits are non-deterministic in nature. No global clock, so timing is difficult to characterize. To address these challenges, we are investigating soft logic models based on factor graphs and adapting asynchronous synthesis tools to a genetic circuit technology.

119 Biologically Inspired Circuit Design Human inner ear performs the equivalent of one billion floating point operations per second and consumes only 14 µw while a game console with similar performance burns about 50 W (Sarpeshkar, 2006). We believe this difference is due to over designing components in order to achieve an extremely low probability of failure in every device. Future silicon and nano-devices will be much less reliable. For Moore s law to continue, future design methods should support the design of reliable systems using unreliable components. Biological systems constructed from very noisy and unreliable devices. GDA tools may be useful for future integrated circuit technologies.

120 Adam Arkin Since the engineering principles by which such circuitry is constructed in cells comprise a super-set of that used in electrical engineering, it is, in turn, possible that we will learn more about how to design asynchronous, robust electronic circuitry as well.

121 More Information Linux/Windows/Mac versions of ibiosim are freely available from: Publications: Course materials: myers/ece6760/ myers/math6790/

122 Engineering Genetic Circuits

123 Acknowledgments Nathan Barker Keven Jones Hiroyuki Kuwahara Curtis Madsen Nam Nguyen Chris Winstead This work is supported by the National Science Foundation under Grants No , CCF , and CCF