Swarms and Genes: Exploring λ-switch Gene Regulation through Swarm Intelligence


WCCI 2006, World Congress on Computational Intelligence, Vancouver, BC
CEC 2006, Congress on Evolutionary Computation

Christian Jacob, Member, IEEE, Anna Barbasiewicz, and Glorious Tsui

Abstract

We demonstrate a 3-dimensional, agent-based model of the λ-switch gene regulatory system, which simulates the interactions of proteins, promoters and operators during the phases of lysogenic and lytic growth in the phage λ infected bacterium Escherichia coli. Following a decentralized approach, all regulatory mechanisms result from local interactions of the biomolecular agents within a simulated 3D cytoplasm. We show that our model displays the regulatory λ switch behavior observed in E. coli, where proteins regulate their own production. Our λ switch simulations provide a versatile testbed for studying gene regulation in silico, complementing in vitro experiments.

I. INTRODUCTION

Major advances in systems biology will increasingly be enabled by the use of computers as an integral research tool, leading to new interdisciplinary fields within bioinformatics, computational biology, and biological computing. To reduce the cost of expensive experiments and minimize the time spent in the laboratory, model building, simulation, data gathering and analysis can now be done primarily in silico. Furthermore, innovations in agent-based modeling, computer graphics and specialized visualization technology, such as the CAVE Automated Virtual Environment, provide biologists with unprecedented tools for research in virtual laboratories [1], [2], [3].

However, current mathematical and computer models of cellular and biomolecular systems have major shortcomings regarding their usability for biological and medical research.
Most models do not explicitly take into account that the measurable and observable dynamics of cellular and biomolecular systems result from the interaction of a (usually large) number of agents, such as viruses with cells, proteins with proteins, or proteins with promoter and operator regions on the DNA. With our agent-based models [4], [5], simulations and visualizations, which introduce swarm intelligence algorithms [6], [7] into biomolecular and cellular systems, we develop highly visual, adaptive and user-friendly research tools. We think these tools will play an increasingly dominant role in the biological and life sciences research community, complementing most of the current, more abstract and computationally more challenging mathematical and computational modeling approaches [8], [9]. For example, many differential equation models of biological systems, such as gene regulatory networks, are very sensitive to initial conditions, result in a large number of equations, and usually require control parameters that have no direct correspondence to measurable quantities within biological systems [9].

We propose a model of the gene regulatory system used by the bacterial virus lambda (λ), which infects Escherichia coli (E. coli), as a highly sophisticated network of orchestrated interactions, based on relatively simple rules for each of the biomolecular agents involved. Giving these agents the freedom to interact within a confined, 3-dimensional space results in emergent behavior patterns that resemble the regulatory reactions of the E. coli genome and proteome in response to a λ phage infection.

This paper is organized as follows. Section II gives a brief description of the bacterial genomic and proteomic control systems involved in the λ switch. In Section III we provide a more in-depth discussion of the biomolecular agents that act as the regulatory units through interactions with promoter and operator regions on the E. coli genome.
The key switching mechanism of regulated recruitment among regulatory proteins is outlined in Section IV. Section V introduces our agent-based 3D model, and Section VI reviews related work. In Section VII a typical simulation run of our model illustrates the actual switching behavior emerging from the interactions among the biomolecular agents. A more detailed analysis of the simulation is provided in Section VIII. The paper concludes with future directions for our λ switch model and the potential of 3D, agent-based simulations for computational biology in general.

II. PHAGE λ AND ITS REPLICATION

The bacterium E. coli is one of the most studied model organisms in biological and medical research [10], [11], [12]. As a prokaryotic microorganism it provides an ideal platform to study genomic and proteomic regulatory networks, both in vitro and in silico. The relative ease of manipulation of E. coli in wet lab experiments makes it a strong candidate for building increasingly detailed (hence, more realistic and predictive) mathematical and computer models, which consequently allow us to study gene regulatory systems in a more general context.

Phage λ, like all viruses, uses the replication machinery of its E. coli host. However, rapid replication of the virus does not start immediately after the host's infection; the λ genome is integrated into the bacterial genome and thus resides dormant within its host. In this phase of lysogeny, the viral genome is replicated along with the host chromosome (Fig. 1). During its dormant residency in the lysogen, only one phage gene (cI) is active, which expresses the phage λ repressor. This repressor ensures that all other phage genes are switched off. Only a so-called induction event will flip the λ switch. For example, after exposure to UV light, the repressor loses its functionality and is in effect switched off, and the rest of the phage genes (the lytic genes) are activated. As a consequence, the cell will eventually lyse and release new bacterial phages (Fig. 1).

Fig. 1. Schematic illustration of the lysogenic and lytic growth phases after infection of a bacterium by phage λ.

III. THE λ SWITCH

After this high-level description of the bacterial switching behavior from lysogenic to lytic growth, we now take a closer look at the actual biomolecular agents involved in these regulatory processes. The two genes cI and cro play a key role in the switch, together with the so-called right operator (O_R) regulatory region (Fig. 2). The gene cI encodes the λ repressor and is switched on in a lysogen, whereas the gene encoding the Cro protein is inactive (and vice versa during lytic growth). How this switching behavior is maintained becomes clear after a closer inspection of the O_R regulatory region. There are two aspects to note. First, the operator is subdivided into three binding sites (O_R1, O_R2, and O_R3). Secondly, the binding sites overlap the two opposing promoters P_RM and P_R, which initiate transcription of cI and the lytic genes (including cro), respectively.

Fig. 2. Sections of the bacterial genome that partake in the λ switch.

IV. REGULATED RECRUITMENT

Looking at the agents involved in the regulatory scheme, we have the following situation (Fig. 3). In a lysogen, the λ repressors, which form dimers, bind to two adjacent sites, O_R1 and O_R2. This binding usually occurs sequentially, with O_R1 being occupied first, which then increases the probability that another repressor dimer attaches to O_R2 (Fig. 3a). This cooperative binding has two effects. First, the regulated recruitment results in a much higher frequency of RNA polymerase binding at P_RM, which, in turn, produces more repressors to stabilize the switch through a positive feedback loop. Secondly, the P_R promoter, which partly overlaps with O_R1 and O_R2, is blocked, so that RNA polymerase cannot initiate transcription of cro and other lytic genes.
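The cooperative binding just described can be illustrated with a small sketch. This is not the paper's implementation: the site labels, the base affinities and the boost factor are hypothetical values chosen only to show how an occupied neighbouring site might raise the docking probability of a repressor dimer.

```python
# Hypothetical base docking probabilities for a repressor dimer (illustrative only).
BASE_AFFINITY = {"OR1": 0.6, "OR2": 0.2, "OR3": 0.05}
# Which operator sites are adjacent to each other on the DNA.
ADJACENT = {"OR1": ["OR2"], "OR2": ["OR1", "OR3"], "OR3": ["OR2"]}
COOP_BOOST = 3.0  # assumed multiplicative boost from an occupied neighbour


def docking_probability(site, occupied):
    """Probability that a nearby repressor dimer docks onto `site`.

    `occupied` is the set of operator sites that already hold a repressor dimer.
    """
    if site in occupied:
        return 0.0  # an occupied site cannot accept another dimer
    p = BASE_AFFINITY[site]
    if any(neighbour in occupied for neighbour in ADJACENT[site]):
        p *= COOP_BOOST  # cooperative recruitment by a docked neighbour
    return min(p, 1.0)
```

With these assumed numbers, a dimer near O_R2 docks with probability 0.2 while O_R1 is vacant, and with roughly three times that probability once O_R1 is occupied, which mirrors the sequential O_R1-then-O_R2 binding in the text.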
It is important to note that the repressor is not constantly bound to the operator; repressors frequently fall off the operator and either rebind or are replaced by another nearby repressor. Consequently, during these agent-based regulatory interactions the concentration of repressors has to be kept high enough to ensure that O_R1 and O_R2 are occupied almost all the time, which self-stabilizes the lysogenic state.

The stable state of the lysogen can only be disturbed by an induction event with dramatic consequences. Inducers, such as UV light, primarily damage DNA, which changes the behavior of the bacterial protein RecA, which normally acts as a catalyst for recombination of DNA molecules (see footnote 1). After induction, RecA assumes the function of a protease and cleaves λ repressor monomers, which prevents further formation of repressor dimers and hence disables repressors from binding to the operators. Eventually, O_R1 and O_R2 become vacant, so that RNA polymerase can now initiate transcription of the cro gene and other genes, which encode proteins necessary in the early stages of lytic growth (Fig. 3b). The Cro protein has a high affinity for binding to O_R3, where it facilitates recruitment of RNA polymerase to express more of the cro gene. Through this positive feedback loop, the Cro concentration rises until O_R1 and O_R2 are also occupied by Cro, which eventually prevents RNA polymerase from further binding to the P_R promoter and thus automatically turns off expression of cro and other early lytic genes. It is particularly interesting that the binding sequence of Cro (O_R3, O_R2, O_R1) is the reverse of the order in which repressors bind to the operators (O_R1, O_R2, O_R3), which is the core mechanism for the switch and its bi-stable state.

Footnote 1: How this change occurs is still not fully understood [11].

V. THE AGENT-BASED MODEL

What should be clear at this point is that the whole switching behavior can be defined through the interactions of a number of biomolecular agents of different types: repressor monomers, which form dimers; repressor dimers, which bind to the operator sites; Cro monomers, which form dimers; Cro dimers, which bind to the operator sites; RecA, which cleaves repressor; RNA polymerase, which docks onto promoters and initiates the expression of proteins (repressor and Cro); and promoter and operator sites on the DNA. Given these agents and a set of interaction rules, we have implemented a computational model of the λ switch, which incorporates a swarm-based approach with a 3D visualization (Fig. 4).

Fig. 3. The key biomolecular entities of the λ switch and their interaction during lysogeny and lytic growth. (a) Schematic of the regulated recruitment of repressor dimers and RNA polymerases, leading to a regulatory, self-stabilizing interaction network. (b) Schematic of the regulated recruitment of Cro dimers and RNA polymerases, leading to a regulatory, self-stabilizing interaction network. (c) Regulated recruitment of repressor dimers and RNA polymerases as visualized in our agent-based simulation. (d) Regulated recruitment of Cro dimers and RNA polymerases as visualized in our agent-based simulation.

Fig. 4. An overview of the λ switch agents included in our current model. (1) Repressor monomer; (2) Repressor dimer; (3) Cro monomer; (4) Cro dimer; (5) RNA polymerase; and (6) RecA. The window at the bottom-left corner allows direct observation of the operator sites.

We use modeling techniques similar to our other agent-based simulations of bacterial chemotaxis [3], [2], the human immune system [13], and the lactose operon [14], [1]. Each individual element in the λ switch simulation is represented as an independent agent governed by (usually simple) rules of interaction, for which we use the BREVE physics-based, multi-agent simulation engine [15]. While executing specific actions when colliding with or getting close to other agents, the dynamic elements in the system move randomly in continuous, 3-dimensional space, which represents the bacterial cytoplasm. As illustrated in Figure 4, we represent λ switch agents as three-dimensional, abstract shapes, which have been modeled after illustrations found in the literature [11], [12] (compare also Fig. 3).

The simulation system provides each agent with basic services, such as the ability to move, rotate, and determine the presence and position of other agents. A scheduler implements time slicing by invoking each agent's Iterate method, which executes a specific, context-dependent action. These actions are based on the agent's current state and the state of other agents in its vicinity. Each agent keeps track of other agents within its neighbourhood space, which is defined as a sphere with a specific radius. Each agent's next action is triggered depending on the types and numbers of agents within its local interaction space. Consequently, our simulated agents work in a decentralized fashion with no central control unit to govern the interactions of the agents.

VI. RELATED WORK

One of the early successful thermodynamic models is described in [16], which accounts for cooperative interactions of λ repressor with the right operator, based on rate equations that capture the probabilities of microscopic configurations of repressor-operator site bindings. However, this model does not include Cro monomers.
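As a reminder of what such a thermodynamic description typically looks like, the probability of a microscopic operator configuration is usually written as a Boltzmann-weighted fraction. The notation below is a generic sketch and an assumption of this note, not a transcription of the equations in [16]: s indexes the configurations of the three operator sites, ΔG_s is the total free energy of configuration s (including any cooperativity terms), [R_2] is the free repressor dimer concentration, and j_s is the number of dimers bound in configuration s.

```latex
f_s \;=\; \frac{\exp\!\left(-\Delta G_s / RT\right)\,[R_2]^{\,j_s}}
               {\sum_{i} \exp\!\left(-\Delta G_i / RT\right)\,[R_2]^{\,j_i}}
```

With three sites that are each either vacant or occupied by a repressor dimer, the sum runs over eight configurations, and the vacant configuration contributes a term of 1 when its free energy is taken as the reference.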
An extended version of this model by Shea and Ackers incorporates the kinetics for coupling events, including synthesis of cI and cro [17]. Hasty et al. [18] give a good overview of further modeling approaches using rate equations or stochastic kinetics. A qualitative simulation model of λ phage growth [19] uses qualitative reasoning, which could be used to extract relations and interactions of biological entities from genomic databases. As this mining of interaction rules cannot be performed automatically at present, and manual composition of qualitative constraints is still tedious, we follow an intermediate approach, in which interaction rules among biomolecular entities in 3-dimensional cell space are formulated. As it turns out, these rules of interaction can initially be rather crude and still approximate a given regulatory system, as we demonstrate here with the λ switch. Repeated fine tuning of control parameters for the interaction rules can then be achieved automatically using evolutionary algorithms to match data from in vitro experiments with the in silico model.

Fig. 5. Graphical user interface to set agent parameters: affinity strength for the repressors and operator binding sites 1 through 3; RNA polymerase transcription speed; cleavage duration.

VII. THE λ SWITCH MODEL IN ACTION

The simulated cell cytoplasm contains only the DNA section that is important for the switching behavior. In particular, we exclude any other genes, such as the early lytic genes. The DNA, modeled as a stationary beam in the centre of the cell (Fig. 4), contains the two promoters, P_RM (left of the centre) and P_R (to the right of the centre), as well as the three operator binding sites, O_R1, O_R2, and O_R3 (Fig. 3c,d). The strands for the cI and cro genes are represented by the left and right sections, respectively (see footnote 2).

1) Key Switching Stages: In Figure 6 we show some of the key stages of our λ switch model for the lysogenic and lytic growth phase.
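To make these stages easier to follow, here is a minimal sketch of the kind of decentralized update loop described in Section V: a scheduler calls each agent's iterate step, and each agent reacts only to agents inside its spherical neighbourhood before taking a random step. It is written in plain Python rather than for the BREVE engine, and the class name, the dimerization rule, the step size and the cell size are all illustrative assumptions.

```python
import math
import random


class Agent:
    """A biomolecular agent performing a random walk in a cubic 'cytoplasm'."""

    def __init__(self, kind, position, radius=1.0):
        self.kind = kind          # e.g. "repressor_monomer" (hypothetical label)
        self.position = position  # (x, y, z) coordinates
        self.radius = radius      # radius of the spherical neighbourhood
        self.alive = True

    def neighbours(self, agents):
        """All other live agents inside this agent's neighbourhood sphere."""
        return [a for a in agents
                if a is not self and a.alive
                and math.dist(a.position, self.position) <= self.radius]

    def iterate(self, agents, rng, step=0.5, size=10.0):
        """One time slice: act on the local neighbourhood, then move randomly."""
        if self.kind == "repressor_monomer":
            for other in self.neighbours(agents):
                if other.kind == "repressor_monomer":
                    # Two nearby monomers merge into a single dimer agent.
                    other.alive = False
                    self.kind = "repressor_dimer"
                    break
        # Random walk, clamped to the walls of the simulated cell.
        self.position = tuple(
            min(size, max(0.0, c + rng.uniform(-step, step)))
            for c in self.position)


def run(agents, steps, seed=0):
    """Scheduler: invoke every live agent's iterate() once per step."""
    rng = random.Random(seed)
    for _ in range(steps):
        for agent in agents:
            if agent.alive:
                agent.iterate(agents, rng)
    return [a for a in agents if a.alive]
```

There is no central controller in this sketch: dimer formation happens only because two monomers happen to be within each other's neighbourhood radius when one of them is scheduled.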
The simulation is initialized with λ repressor monomers and RNA polymerases only (Fig. 6a). In general, all agents perform a random walk within the designated cell space. However, when agents enter specific regions the probability distributions for their movements can change. For example, repressor monomers form dimers whenever they get close to each other and then, as pairs, have an increased tendency to move towards the operator sites. Whenever a dimer gets close to any of the binding sites, the affinity for each operator determines whether docking occurs with O_R1, O_R2, or O_R3. These affinities can be changed during the simulation (Fig. 5) and are set so that repressor docking occurs in the desired order O_R1, O_R2, O_R3. Any repressor docked onto one of the operators increases the affinity for its adjacent operator, which results in the recruitment of further repressors (Fig. 6b). A similar recruitment process applies to RNA polymerases, which have a higher tendency to move towards and dock onto promoter P_RM once O_R1 and O_R2 are occupied by repressors. As in the real bacterial cell, repressors bind to an operator site only for a certain time period t_bind. Therefore, it is not until a certain concentration of repressors has been reached that the operator sites O_R1 and O_R2 are occupied most of the time (Fig. 6c).

Footnote 2: It should be noted that left and right only refer to this particular figure. The visualization can be freely rotated and zoomed in 3D space.

Fig. 6. Selected snapshots from the λ switch simulation. (a) Repressors start docking onto O_R1. (b) Repressors collectively bind to O_R1 and O_R2. (c) RNA polymerase gets increasingly recruited to P_RM and expresses more repressors. (d) After induction, RecA proteins (blue) are introduced into the cell environment and start cleaving repressor monomers. (e) Operator sites O_R1 and O_R2 become vacant and Cro starts docking onto O_R3. (f) Cro is now mainly occupying O_R3 during the lytic phase, which keeps repressor production turned off.

2) Induction: We simulate an induction event by introducing a large number of RecA proteins into the simulation space (Fig. 6d). RecA cleaves any repressor monomer it encounters. Any RecA that collides with a repressor dimer cleaves one of its monomers, which also splits the dimer. After some time, almost no dimers exist and most monomers are split into their amino and carboxyl domains. Consequently, the operator binding sites are now vacant, so that RNA polymerases start docking onto the P_R promoter, producing Cro protein. Cro then follows a similar binding and recruitment process as the repressors, but in reverse order of binding, starting at O_R3 and increasingly recruiting RNA polymerases to the cro gene (Fig. 6e). As this, again, acts as a positive feedback loop, the concentration of Cro rises. Eventually, with more Cro in the system and around the operator binding sites, all three operators tend to be occupied most of the time, which prevents RNA polymerase from producing more Cro. Hence, the concentration of Cro is stabilized (Fig. 6f).

VIII. ANALYSIS OF SIMULATION DATA

Because the agent-based system is inherently noisy, we discuss one typical simulation run as an example. The following experiment was performed over 10,000 iteration steps, where one step corresponds to the smallest integration time step of the physics simulation engine, which determines each particle's direction, velocity, and acceleration. Within each of these time steps, collision detection (important for binding events at the operator sites or for forming dimers from monomers) and resolution take place.

Fig. 7. Diagrammatic representation of docking events at the three operator sites over 10,000 iterations.

Focusing on docking events at the three operator sites first, Figure 7 shows a time plot of the bindings that occur at O_R1, O_R2, and O_R3 over 10,000 iterations. Each column represents one site, where each horizontal line marks a docking event at a particular point in time. In each column plot, time progresses from top to bottom. Repressor dimer bindings are marked by grey lines, whereas Cro dockings are represented by black lines. During the first 2,000 time steps, it is primarily repressors that occupy the O_R1 and O_R2 sites. Very rarely, when O_R2 becomes vacant, Cro docks onto it. Cro never docks onto O_R1 during this initial period, as repressors have a much higher affinity for this site than Cro, and Cro itself prefers to dock onto O_R3 first. During the intervals where O_R3 is vacant, RNA polymerases have the chance to dock onto the P_RM promoter in order to start transcription, translation, and final expression of a new repressor monomer (see footnote 3).
Footnote 3: In our model, we do not explicitly distinguish between transcription, translation, and expression; these are simulated as a single process, as this does not have a direct influence on the overall switching behavior.

Shortly before iteration 2,000, induction agents in the form of RecA are introduced into the system. These cleavage agents separate the amino and the carboxyl part of each repressor monomer (Fig. 6d). As a consequence, repressors can no longer form dimers, and hence they are no longer able to dock onto any of the operator sites. Consequently, as can be seen in Figure 7, shortly after time step 2,000 no more repressors are bound to operator sites O_R1 or O_R2. Instead, operator O_R3 is now increasingly occupied by Cro, which in combination with Cro bound to site O_R2 almost completely blocks RNA polymerases from docking onto P_RM. Therefore, expression of repressors comes to a halt, as illustrated in Figure 8. Also, because O_R1 is now vacant all the time (Cro has a very low affinity for this site), polymerases can dock onto promoter P_R, which leads to an increased rate of Cro expression. Note in Figure 8 that after time step 2,000 the slope of the Cro concentration curve increases. Once the concentration of Cro is high enough, Cro will start to also occupy O_R1, which then switches off Cro expression in the same way as repressor expression levels off, as seen in the left-most section of Figure 8.

Fig. 8. Concentrations of repressor dimers (black) and Cro dimers (grey) over 10,000 iteration steps. The noisy repressor numbers are due to continuous dissociation of repressor dimers into monomers and vice versa. Induction agents (RecA) are introduced shortly before time step 2,000, after which the number of repressor dimers declines rapidly due to repressor monomer cleavage by RecA.

Fig. 9. Docking frequencies of Cro and repressor dimers. (a) Lysogenic growth: repressors. (b) Lytic growth: repressors. (c) Lysogenic growth: Cro. (d) Lytic growth: Cro.

Figure 9 gives a more detailed overview of the docking frequencies during this experiment. For repressors and Cro we summarize how many docking events occurred both during the lysogenic phase (before induction) and during the lytic phase (after induction). We have extracted each docking combination separately, in order to show the cooperativity effects that are so crucial for the λ switch. Figure 9a shows that O_R1 and O_R2 are the sites most occupied by repressors in the lysogen. The O_R2 bar is higher than the O_R1 bar as it combines both the single dockings at O_R2 and the combined O_R1-O_R2 dockings. As the fourth bar illustrates, cooperative docking at O_R1 and O_R2 prevails. All other docking combinations occur rather infrequently. During lytic growth, repressor numbers decrease rapidly and therefore the number of operator bindings by repressor dimers is greatly reduced (Fig. 9b).
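A tally of this kind can be sketched as follows. The input format is an assumption made for illustration: one set of occupied operator sites per iteration, recorded for a single protein species. The second function reproduces the way a per-site bar can combine single and cooperative dockings.

```python
from collections import Counter

SITES = ("OR1", "OR2", "OR3")  # plain-text labels for O_R1, O_R2, O_R3


def docking_frequencies(snapshots):
    """Count how often each combination of occupied sites occurs.

    `snapshots` is an iterable of sets, one per iteration, each holding the
    operator sites occupied by one species (e.g. repressor dimers).
    """
    counts = Counter()
    for occupied in snapshots:
        combo = "-".join(site for site in SITES if site in occupied)
        if combo:  # skip iterations in which no site is occupied
            counts[combo] += 1
    return counts


def per_site_totals(counts):
    """Fold combination counts into one total per site (single + combined)."""
    totals = Counter()
    for combo, n in counts.items():
        for site in combo.split("-"):
            totals[site] += n
    return totals


# Toy data, not simulation output: two cooperative OR1-OR2 dockings,
# one single OR2 docking, one empty iteration, one OR3 docking.
snapshots = [{"OR1", "OR2"}, {"OR1", "OR2"}, {"OR2"}, set(), {"OR3"}]
counts = docking_frequencies(snapshots)
totals = per_site_totals(counts)
```

In this toy data the O_R2 total exceeds the O_R1 total because it includes the single O_R2 docking on top of the two combined O_R1-O_R2 dockings.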
Of course, looking at the Cro docking frequencies, the picture should be exactly reversed. Cro numbers are kept to a minimum during lysogeny, as shown in Figure 9c. However, the dominant number of O_R3 dockings reflects Cro's high affinity for O_R3. On the other hand, during the lytic phase Cro thrives, with bindings at O_R3 being most dominant.

IX. CONCLUSION AND FUTURE WORK

We have presented an agent-based (or individual-based) model of the gene regulatory events that occur in the E. coli bacterium after infection by phage lambda. The model implements the major biomolecular players involved in the regulatory processes in a lysogen and during lytic growth. The model also replicates the key features of the switch, which is triggered by induction agents. In our future work with this model we will investigate the following aspects:

- A sensitivity analysis of the agent parameters will be undertaken.
- We will use evolutionary algorithms to adjust affinity settings and other parameters with respect to a certain fitness criterion (e.g., switching speed, robustness).
- We will explore the effects of noise (introduced through agent velocities and other spatial parameters) on the robustness of the overall switching behavior.

Up-to-date details about this λ-switch model and other agent-based simulation examples, which are investigated in our Evolutionary & Swarm Design Lab, can be found at:

ACKNOWLEDGMENT

Financial support for this research was provided by the Natural Sciences and Engineering Research Council of Canada.

REFERENCES

[1] I. Burleigh, G. Suen, and C. Jacob, "DNA in action! A 3D swarm-based model of a gene regulatory system," in ACAL 2003, First Australian Conference on Artificial Life, Canberra, Australia.
[2] R. Hoar, J. Penner, and C. Jacob, "Transcription and evolution of a virtual bacteria culture," in Congress on Evolutionary Computation. Canberra, Australia: IEEE Press.
[3] J. Penner, R. Hoar, and C. Jacob, "Bacterial chemotaxis in silico," in ACAL 2003, First Australian Conference on Artificial Life, Canberra, Australia.
[4] S. Johnson, Emergence: The Connected Lives of Ants, Brains, Cities, and Software. New York: Scribner.
[5] S. Wolfram, A New Kind of Science. Champaign, IL: Wolfram Media.
[6] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, ser. Santa Fe Institute Studies in the Sciences of Complexity. New York: Oxford University Press.
[7] S. Camazine, J.-L. Deneubourg, N. R. Franks, J. Sneyd, G. Theraulaz, and E. Bonabeau, Self-Organization in Biological Systems, ser. Princeton Studies in Complexity. Princeton: Princeton University Press.
[8] S. Salzberg, D. Searls, and S. Kasif, Eds., Computational Methods in Molecular Biology, ser. New Comprehensive Biochemistry. Amsterdam: Elsevier, 1998, vol. 32.
[9] J. M. Bower and H. Bolouri, Eds., Computational Modeling of Genetic and Biochemical Networks. Cambridge, MA: MIT Press.
[10] B. Müller-Hill, The lac Operon - A Short History of a Genetic Paradigm. Berlin: Walter de Gruyter.
[11] M. Ptashne, A Genetic Switch: Phage λ and Higher Organisms, 2nd ed. Palo Alto, CA: Cell Press & Blackwell Scientific Publications.
[12] M. Ptashne and A. Gann, Genes & Signals. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
[13] C. Jacob and J. Litorco, "Immunity through swarms: Agent-based simulations of the human immune system," in Artificial Immune Systems, ICARIS 2004, Third International Conference. Catania, Italy: Springer, 2004, (to be published).
[14] C. Jacob and I. Burleigh, "Biomolecular swarms: An agent-based model of the lactose operon," Natural Computing, 2004, (in print).
[15] L. Spector, J. Klein, C. Perry, and M. Feinstein, "Emergence of collective behavior in evolving populations of flying agents," in Genetic and Evolutionary Computation Conference (GECCO-2003), E. C.-P. et al., Ed. Chicago, IL: Springer-Verlag, 2003.
[16] G. K. Ackers, A. D. Johnson, and M. A. Shea, "Quantitative model for gene regulation by λ phage repressor," Proc. Natl. Acad. Sci. USA: Biophysics, vol. 79.
[17] M. A. Shea and G. K. Ackers, "The O_R control system of bacteriophage lambda: A physical-chemical model for gene regulation," Journal of Molecular Biology, vol. 181.
[18] J. Hasty, D. McMillen, F. Isaacs, and J. Collins, "Computational studies of gene regulatory networks: in numero molecular biology," Nature Reviews: Genetics, vol. 2, no. April.
[19] K. R. Heidtke and S. Schulze-Kremer, "Design and implementation of a qualitative simulation model of λ phage infection," Bioinformatics, vol. 14, no. 1, 1998, Oxford University Press.