arxiv: v2 [physics.bio-ph] 15 Dec 2017

Looping and Clustering model for the organization of protein-DNA complexes on the bacterial genome

Jean-Charles Walter, Nils-Ole Walliser, Gabriel David, Jérôme Dorignac, Frédéric Geniet, and John Palmeri
Laboratoire Charles Coulomb, UMR5 CNRS-UM, Université de Montpellier, Place Eugène Bataillon, 3495 Montpellier Cedex 5, France

Andrea Parmeggiani
DIMNP, UMR535 CNRS-UM, Université de Montpellier, Place Eugène Bataillon, 3495 Montpellier Cedex 5, France

Ned S. Wingreen
Department of Molecular Biology and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton NJ 8544, USA

Chase P. Broedersz (electronic address: c.broedersz@lmu.de)
Arnold-Sommerfeld-Center for Theoretical Physics and Center for NanoScience, Ludwig-Maximilians-Universität München, D-8333 München, Germany

(Dated: December 8, 7)

The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spread from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the Looping and Clustering (LC) model, which employs a statistical physics approach to describe protein-DNA complexes. The LC model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and the configurational and loop entropy of this protein-DNA cluster. Indeed, we show that the protein interaction strength determines the tightness of the loopy protein-DNA complex. Thus, our model provides a theoretical framework to quantitatively compute the binding profiles of ParB-like proteins around a cognate (parS) binding site.

I. INTRODUCTION

Understanding the biophysical principles that govern chromosome structure in both eukaryotic and prokaryotic cells remains an outstanding challenge [ 7]. Many bacteria have a single chromosome with a length three orders of magnitude longer than the cell itself, posing a daunting organizational problem. Owing to recent technological advances in live-cell imaging and chromosome conformation capture based approaches, it is becoming increasingly clear that the DNA is not coiled like a simple amorphous polymer inside the cell [8 ], but rather exhibits a high degree of organization over a broad range of lengthscales []. It remains unclear, however, how this spatial and dynamic organization of the chromosome is established and maintained inside living bacteria [].

A host of Nucleoid-Associated Proteins (NAPs) have been shown to play a central role in the spatial organization of the bacterial chromosome [ 4]. Such NAPs bind to the DNA in large numbers, and by interacting with each other and with DNA in both sequence-dependent and sequence-independent manners they can collectively structure the DNA polymer and control chromosome organization. In many bacterial species, the broadly conserved ParABS system is responsible for chromosome and plasmid segregation [, 5]. A central component of this system is the partitioning module, which is formed by a large protein-DNA complex of ParB proteins that assemble around centromere-like parS sites, frequently located near the origin of replication.
The ParBS complexes can subsequently interact with ParA ATPases, leading to the segregation of replicated origins [6 3]. How is this ParBS partitioning module physically organized on the DNA? ParB is known to bind specifically to parS, triggering the formation of a large protein-DNA cluster, which is visible as a tight focus in microscopy images of fluorescently labeled ParB [5, 9, 4, 5]. The propensity of ParB to form foci around parS has been exploited in recent studies, which used exogenous expression of fluorescently labeled ParB along with parS insertions to label DNA loci for live-cell imaging [6, 7]. In the F-plasmid of Escherichia coli cells, each ParB focus contains roughly 3 proteins, together representing 9% of all ParB present in the cell [5]. High-precision ChIP-Seq experiments on this system provide quantitative ParB binding profiles along the DNA, which are strongly peaked around parS with a broad decay over a distance of up to 3 kilobasepairs (kb), consistent with earlier observations [4, 8].

[Figure 1: panels (a), (b), and (c), each showing a schematic of the partition complex (left), a typical distribution of ParB on extended DNA (middle), and the average profile P(s) around parS (right). Legend: DNA outside the cluster; DNA inside the cluster; ParB protein; ParB bound to parS.]

FIG. 1: Schematic illustration of two recent models proposed to describe the ParB partition complex (left), accompanied by a typical distribution of ParB on extended DNA (middle) and the average distribution profile (right). The Spreading & Bridging model [9] is shown with (a) strong coupling, where thermal fluctuations cannot break the bonds between proteins, such that all bridging and spreading interactions are satisfied, and (b) intermediate coupling, where the energetic cost of breaking a spreading bond competes with the configurational and loop entropy. With the Looping and Clustering approach presented here, we propose a simple analytic description for this regime. (c) The Stochastic Binding model assumes a spherical region of high concentration of ParB around parS [5]. This model can be seen as taking the limit of the spreading bond strength to zero ($J_S \to 0$), and thus the formation of loops is not hampered by protein-protein bonds. In this limit, the binding profile can be described as the return of the polymer to an origin of finite size, such that the profile is given by $P(s) \sim (s + C)^{-d\nu}$, where $d$ is the dimension, $\nu$ is the Flory exponent, and $C$ is a constant.
Various models have been introduced to explain the distribution of ParB along DNA around parS sites. An early study of the distribution of ParB proposed that ParB proteins spread from the parS sequence by nearest-neighbor interactions, forming a continuous filament-like structure along the DNA [8]. This model was termed the Spreading model. However, this is effectively a 1D model with short-range interactions. On general statistical physical grounds, such a 1D model cannot be expected to account for the formation of a large coherent protein-DNA complex, given physiological protein interaction strengths [9]. Furthermore, the number of ParB proteins available in the cell is not sufficient to allow enrichment by simple 1D polymerization of ParB along DNA at genomic distances from parS as large as observed experimentally [5].

To resolve the puzzle of how ParB proteins organize around a parS site, we recently introduced a novel theoretical framework to study the collective behavior of interacting proteins that can bind to a DNA polymer [9]. This model suggested that ParB assembles into a three-dimensional complex on the DNA, as illustrated in Figure 1a,b. Single-molecule experiments provided direct evidence for the presence of 3D bridging interactions between two ParB proteins on DNA [3, 3]. We showed that a combination of such a 3D bridging bond and 1D spreading bonds between ParB proteins constitutes a minimal model for the condensation of ParB proteins on DNA into a coherent complex [9], consistent with the observation that ParB-GFP fusion proteins form a tight fluorescent focus on the DNA [5, 9, 4, 5].

The statistical properties of the 3D structure of ParB-DNA complexes determine the binding profile of ParB on DNA, which can be accurately measured in ChIP-Seq experiments. However, it is computationally demanding to simulate these binding profiles with the Spreading & Bridging model.
The protein binding profile can be easily calculated analytically in the limit of strong protein-protein interactions, where the cluster of ParB on the DNA becomes compact with a corresponding triangular distribution of ParB along DNA. The protein binding profile can also be estimated in the limit of weak protein-protein interactions with the so-called Stochastic Binding model, where a sphere of high ParB concentration is assumed to exist within which a DNA polymer freely fluctuates [5] (see Figure 1c). The description of the average protein binding profile is thus similar to the return statistics of the polymer into the ParB sphere [3], suggesting a long-range (power-law) distribution of ParB proteins along DNA. Importantly, however, neither of these two existing approaches provides a simple way of computing ParB binding profiles around parS sites over the full relevant range of system parameters. In addition, it remains unclear how the Spreading & Bridging model and the Stochastic Binding model relate to each other.

Here, we propose a theoretical approach to describe the distribution of ParB proteins around parS sites on the DNA in terms of molecular interaction parameters and protein expression levels. To this end, we develop a simple model for protein-DNA clusters that explicitly accounts for the competition between the positional entropy associated with placing the loops on the cluster, which favours a looser cluster configuration, and both protein-protein interactions and loop closure entropy, which tend to favour a compact cluster. This Looping and Clustering model represents a reduced, approximate version of the full Spreading & Bridging model that incorporates the key physical ingredients needed to provide a clearer understanding and at the same time greatly facilitates calculations of the distribution profile of ParB (or other proteins that form protein-DNA clusters). Thus, our approach can be used to estimate molecular interactions between proteins from experimentally determined protein binding profiles.

II. THE LOOPING AND CLUSTERING MODEL

To theoretically describe the protein binding profiles of ParB on DNA, we first consider a DNA polymer of length $L$ that can move in space on a 3D cubic lattice and with a finite number of proteins $m$. Since the number of ParB proteins in the protein-DNA cluster has been observed to include the vast majority of proteins in the cell [5], we employ a canonical ensemble with a fixed number of ParB proteins $m$ in the ParB complex. These proteins are able to diffuse along the DNA. Importantly, in this model the DNA itself is also dynamic and fluctuates between different three-dimensional configurations, which are affected by the presence of interacting DNA-bound proteins. When proteins are bound to the DNA, they are assumed to be able to interact attractively with each other by contact interactions in two distinct ways: (i) 1D spreading interactions with coupling strength $J_S$, defined as an interaction between proteins on nearest-neighbor sites along the polymer, and (ii) a 3D bridging interaction with strength $J_B$ between two proteins bound to non-nearest-neighbor sites on the DNA, but which are positioned at nearest-neighbor sites in 3D space (see Figure 1a,b). Thus, these bridging interactions couple to the 3D configuration of the DNA, while the 1D spreading interactions do not. Single-molecule experiments provide evidence for bridging bonds [3], with the bridging valency of a ParB protein limited to one [33, 34].
Even in this case where each protein can form two spreading bonds and a single bridging bond, the system has been shown to exhibit a condensation transition where the majority of the proteins form a single large cluster that can be localized by a single parS site on the DNA [9]. While it is possible to perform Monte Carlo simulations of the Spreading & Bridging model for a lattice polymer, such simulations are computationally demanding. In this paper, we aim to provide a simple analytical description for the average binding profile of proteins along the DNA (see right panels in Figure 1).

With this aim in mind, we can simplify our description by realizing that the configurations of ParB proteins along the DNA are more sensitive to $J_S$ than to $J_B$, for sufficiently large $J_B$. While both spreading and bridging bonds are necessary for the condensation of all proteins into a single cluster, loop extrusion from the cluster is controlled by $J_S$, and such loop extrusion strongly impacts the binding profile of proteins on the DNA. Indeed, a loop can be extruded from the protein-DNA cluster by breaking a spreading bond, without affecting the internal configuration of the bridging bonds. Therefore, we will assume that $J_B$ is sufficiently large to maintain a coherent 3D protein-DNA cluster, leaving $J_S$ as the main adjustable parameter in the model.

A contiguous 3D cluster of proteins on DNA with loops can effectively be represented graphically by a disconnected 1D cluster along the DNA, where connections in 3D between the 1D subclusters are implied, and domains of protein-free DNA within the disconnected 1D cluster represent loops that emanate from the 3D cluster (see Figure 1b,c). We can describe this system by a reduced model for the effective 1D cluster in which we account for the entropy of the loops that originate from the protein-DNA cluster.
In this model, the spreading bond energy set by the parameter $J_S$, combined with the cost in loop closure entropy, competes with the positional entropy for placing loops on the cluster and will therefore play a crucial role in determining the binding profile of ParB on DNA around a parS site. To capture these effects, we propose the reduced Looping and Clustering (LC) model, which offers a simplified description of 3D protein-DNA clusters with spreading and bridging bonds. In this model a loop is formed whenever there is a gap between 1D clusters. We can make the connection between the gaps in the 1D cluster and the number of loops extending from the 3D cluster explicit by writing down the partition function for this model. The effective 1D cluster corresponding to a 3D cluster with $m$ proteins and $n$ loops has a multiplicity:

$$\Omega_{\rm cluster} = \frac{(m-1)!}{(m-n-1)!\,n!}, \qquad (1)$$

which counts the number of ways in which one can partition $m$ proteins into $n+1$ subclusters in 1D. This multiplicity leads to a positional entropy of mixing, $S_{\rm cluster} = \ln \Omega_{\rm cluster}$, for placing $n$ loops at $m-1$ possible positions (in units of $k_B$). Note that we do not explicitly include the number of ways in which the bridging bonds can be formed, since loop formation is not expected to substantially affect the possible configurations of bridging bonds. However, creating $n$ loops will require breaking $n$ spreading bonds, and the probability at equilibrium for this to occur will include a Boltzmann factor $\exp(-nJ_S)$, where the interaction energy is expressed in units of $k_B T$. Within our simple description, we do not consider how the formation of a loop affects the full internal entropy of the protein-DNA cluster, but this can be expected to be a fixed number per loop that can be absorbed into $J_S$. Furthermore, the loops that are formed are assumed to be independent, and thus contribute to the loop closure entropy (in units of $k_B$) as [3]:

$$S_{\rm loop} = -d\nu \sum_{i=1}^{n} \ln(\ell_i), \qquad (2)$$

where $d$ is the spatial dimension, $\nu$ is the Flory exponent, and the loop length $\ell_i$ is measured in units of the lattice spacing of the polymer $a$, which we take to be equal to the footprint of a ParB protein, e.g. 6 bp for the exogenous ParABS system of E. coli [5]. This entropy is obtained by considering both the loops formed within the protein cluster and the protein-free segments of DNA outside the cluster. Indeed, the number of configurations associated with loop $i$ for a Gaussian polymer is given by $z^{\ell_i} \ell_i^{-d\nu}$ [3, 35], where $z$ is the lattice coordination number. Therefore, there is also an extensive contribution to the entropy given by $k_B \ell_i \log(z)$. However, when a loop of length $\ell_i$ forms, the same length of polymer is removed from the DNA outside of the cluster, which also results in a reduction of the entropy by $k_B \ell_i \log(z)$. Thus, there is a precise cancellation between the extensive contribution to the entropy associated with the loops inside the cluster and the extensive contribution due to effectively shortening the DNA outside the cluster [39].

It is now straightforward to write down the partition function of the Looping and Clustering model:

$$Z_{LC} = \sum_{n=0}^{m-1} \frac{(m-1)!}{(m-n-1)!\,n!} \exp(-nJ_S) \int_{\ell_0}^{\ell_{\max}} d\ell_1\, \ell_1^{-d\nu} \cdots \int_{\ell_0}^{\ell_{\max}} d\ell_n\, \ell_n^{-d\nu} \;\xrightarrow{\ell_{\max}\to\infty}\; \left[1 + \exp(-\tilde{J}_S)\right]^{m-1}, \qquad (3)$$

where $\tilde{J}_S = J_S + \ln\left[\ell_0^{d\nu-1}(d\nu-1)\right] > 0$ is a renormalized loop activation energy (that includes the cost in loop closure entropy). All lengths are measured in units of the protein footprint $a$, $\ell_0$ is the lower cutoff of loop sizes and approximately represents the persistence length of DNA, and the bond interactions are in units of $k_B T$. In the partition function, we conveniently set the upper boundary of integration, $\ell_{\max}$, to infinity. Strictly speaking, the upper boundary for $\ell_j$ should be $L - (m + L_j)$, where $L_j = \sum_{i=1}^{j-1} \ell_i$ represents the total accumulated loop length before loop $j$. In practice, however, for chromosomes, but arguably also for plasmids, $L \gg m$ and the probability to have a large loop is very small. For instance, if we consider the F-plasmid of E. coli with a length of 6 kbp, we have $L$ = 375 in units of the ParB footprint of 6 bp [5, 36].
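The closed form in Eq. (3) can be checked numerically. The sketch below sums the loop-number series term by term, using the multiplicity of Eq. (1), and compares it with the closed-form expression. It assumes the upper cutoff is sent to infinity and that dν > 1, as in the text; the function names and the parameter values in the usage lines are illustrative choices and are not taken from the paper.

```python
from math import comb, exp, log


def multiplicity(m: int, n: int) -> int:
    """Eq. (1): ways to cut m proteins into n + 1 subclusters in 1D."""
    return comb(m - 1, n)


def renormalized_energy(j_s: float, l0: float, d_nu: float) -> float:
    """Spreading energy plus loop-closure cost (assumes l_max -> infinity, d_nu > 1)."""
    return j_s + log(l0 ** (d_nu - 1) * (d_nu - 1))


def z_lc_series(m: int, j_s: float, l0: float, d_nu: float) -> float:
    """Explicit sum over the loop number n in Eq. (3)."""
    # Integral of l^(-d_nu) from l0 to infinity; the same factor for every loop.
    loop_integral = l0 ** (1 - d_nu) / (d_nu - 1)
    x = exp(-j_s) * loop_integral
    return sum(multiplicity(m, n) * x ** n for n in range(m))


def z_lc_closed(m: int, j_s: float, l0: float, d_nu: float) -> float:
    """Closed form of Eq. (3): binomial theorem applied to the series."""
    return (1 + exp(-renormalized_energy(j_s, l0, d_nu))) ** (m - 1)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    params = dict(m=50, j_s=2.0, l0=10.0, d_nu=1.5)
    print(z_lc_series(**params), z_lc_closed(**params))
```

The two numbers agree to floating-point precision because the series is a binomial expansion in the single-loop weight exp(-J_S) times the loop integral.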
For this system, Monte Carlo (MC) simulations (see Appendix A) of the LC model with $m$ = […] reveal that the average cumulated loop size is 5 for small coupling ($J_S$ = […]) down to 5 for large coupling ($J_S$ = 4), which in both cases is much less than the DNA length. Thus, for biologically relevant cases it is reasonable to assume that the length of the DNA polymer is much larger than the footprint of the whole protein complex on the DNA.

The LC model constitutes a simple statistical mechanics approach to describe how proteins assemble into a protein-DNA cluster with multiple loops. Next, we will include a parS site on the DNA, to which ParB proteins bind with a higher affinity than to the other, non-specific binding sites on the DNA. Our central aim is to compute the binding profile of ParB around this parS site.

III. PROFILE OF ParB FOR FIXED NUMBER AND SIZES OF LOOPS

With our approach, we aim to quantitatively describe average ParB binding profiles, which are directly measurable by ChIP-Seq experiments. By fitting our model to such ChIP-Seq data, it would be possible to extract microscopic parameters such as the number of proteins in the ParB cluster and protein-protein interaction parameters such as $J_S$. In this section, we will describe how to compute the ParB binding profile around the parS site given a fixed number of loops with specified loop lengths. Then we will use the statistical mechanics framework provided above to perform a weighted average over all possible loop numbers and sizes to arrive at a simple predictive theory for the ParB binding profile.

A. 1-loop binding profile

It is instructive to start our analysis of ParB binding profiles by first calculating the probability of ParB occupancy as a function of distance from the parS site for the case of a protein-DNA cluster with only one DNA loop ($n = 1$) with fixed loop length $\ell$. We will assume a fixed number $m$ of ParB proteins in this 1-loop protein-DNA cluster, and that one of these proteins is bound to the parS site at any time, as illustrated in Figure 2.
Thus, to calculate the 1-loop ParB binding probability, $P(s, \ell)$, at a distance $s$ from parS, we need to consider all possible configurations of proteins in the protein-DNA cluster subject to these constraints. First, we note that $P(s, \ell) = 0$ for $s > m + \ell$, because the 1D cluster can maximally extend to a distance $m + \ell$, which occurs when the 1D cluster adopts a configuration that lies entirely on one side of the parS site. For a binding site at a distance $s < m + \ell$, the ParB binding probability is reduced, either by configurations where this site is located on the DNA loop within the 1D cluster, or by states where the 1D cluster adopts a configuration around the parS site that does not extend to the binding site at $s$, placing this site outside the 1D cluster. To capture these effects, it is helpful to express $P(s, \ell)$ in terms of conditional probabilities:

$$P(s, \ell) = P(s, \ell \,|\, {\rm loop}(s))\, p_{\rm loop}(s) + P(s, \ell \,|\, \overline{\rm loop}(s))\, p_{\overline{\rm loop}}(s) = P(s, \ell \,|\, \overline{\rm loop}(s))\, p_{\overline{\rm loop}}(s), \qquad (4)$$

where ${\rm loop}(s)$ represents a condition with probability $p_{\rm loop}(s)$ corresponding to site $s$ being part of a loop extruding from the cluster, i.e. an unoccupied site on the DNA within the protein cluster, as depicted in Figure 2. The overbar here represents the complementary condition, and the expression above simplifies because $P(s, \ell \,|\, {\rm loop}(s)) = 0$ by construction.

[Figure 2: schematic of a cluster of $m$ proteins with a single loop of size $\ell$, split into subclusters of $m_1$ and $m - m_1$ proteins, shown for case (i) and case (ii); parS marks the origin and RE marks the right edge at $s'$.]

FIG. 2: Schematic of the system with $m$ proteins and a single loop of size $\ell$. The whole cluster is split in two parts: $m_1$ is the number of proteins in the cluster that overlaps with parS and $m - m_1$ is the number of proteins in the other cluster. The origin of the genomic coordinate $s$ is parS; the right edge of the system (RE) is located at the coordinate $s'$. We can divide the configurations into two equally likely cases: (i) the leftmost cluster overlaps with parS or (ii) the rightmost cluster overlaps with parS.

We can proceed to calculate the conditional probability, $P(s, \ell \,|\, \overline{\rm loop}(s))$, by decomposing this contribution as a sum of probabilities of mutually exclusive configurations, which are conditioned by the location of the right edge of the 1D ParB cluster, denoted as ${\rm end}(s')$ (see Figure 2). Then, we will take a continuous limit for the binding profile assuming $m \gg 1$, and express the binding profile $P(s, \ell \,|\, \overline{\rm loop}(s))$ in terms of the probabilities, $p_{\rm end}(s')$, for the condition describing the position of the right edge of the cluster. Thus, we first write the conditional probability $P(s, \ell \,|\, \overline{\rm loop}(s))$ for $s > 0$ (the case $s < 0$ is obtained by symmetry) as:

$$P(s, \ell \,|\, \overline{\rm loop}(s)) = \sum_{s'=0}^{m+\ell} P(s, \ell \,|\, \overline{\rm loop}(s), {\rm end}(s'))\, p_{\rm end}(s') \approx \int_{0}^{m+\ell} ds'\, \Theta(s' - s)\, p_{\rm end}(s'). \qquad (5)$$

Clearly, $P(s, \ell \,|\, \overline{\rm loop}(s), {\rm end}(s')) = 1$ when $s < s'$ and zero otherwise, and thus we have replaced this term by the Heaviside step function $\Theta(s' - s)$ and approximated the sum by an integral in the second expression above.

[Figure 3: plot of $P(s, \ell)$ versus $s$ for several loop lengths $\ell$.]

FIG. 3: Protein occupation probability, $P(s, \ell)$, for a site at genomic distance $s$ from the parS site, for different loop lengths $\ell$ and a fixed cluster size of $m$ = […] proteins. Solid curves represent analytic calculations from Eqs. (4), (8), and (??), and dashed curves represent data obtained from exact numerical enumeration for comparison to our analytical approximation. We note that for $\ell = 0$ we recover the triangular profile of the S&B model in the strong coupling limit [9].

To calculate $p_{\rm end}(s')$, it is convenient to introduce two subclusters, 1 and 2, with $m_1$ and $m - m_1$ proteins respectively ($0 < m_1 < m$), such that cluster 1 with $m_1$ proteins is overlapping with parS, as shown in Figure 2. Given two such subclusters, two equally likely situations can occur: (i) the leftmost cluster overlaps with parS, i.e. $m - m_1 + \ell < s' < m + \ell$, or (ii) the rightmost cluster overlaps with parS, i.e. $0 < s' < m_1$. This directly allows us to construct the conditional probability to find the right edge of the whole system at $s'$, such that one of the $m_1$ proteins in cluster 1 overlaps with parS:

$$p_{\rm end}(s' \,|\, m_1) = \frac{1}{2 m_1} \left[ \Theta(s' - (m - m_1 + \ell))\, \Theta(m + \ell - s') + \Theta(m_1 - s') \right], \qquad (6)$$

where the prefactor 1/2 comes from the equal probabilities to find the system in one of the two cases (i) and (ii). The conditions (i) and (ii) are encoded with a product of two unit step functions for (i) and a single step function for (ii). Each single realization can be obtained by shifting the position of the site in cluster 1 overlapping with parS and is equally likely, giving rise to an overall prefactor $1/m_1$. From this, we can obtain the full probability $p_{\rm end}(s')$ by integrating over $m_1$:

$$p_{\rm end}(s') = \int_{1}^{m-1} dm_1\, p_{\rm end}(s' \,|\, m_1)\, p(m_1) = \frac{\Theta(m + \ell - s')}{m(m-1)} \Big[ \Theta(s' - (m + \ell - 1))\,(m - 2) + \Theta(m + \ell - 1 - s')\, \Theta(s' - \ell - 1)\,(s' - \ell - 1) + \Theta(1 - s')\,(m - 2) + \Theta(s' - 1)\, \Theta(m - 1 - s')\,(m - 1 - s') \Big], \qquad (7)$$

where we used $p(m_1) = 2 m_1 / (m(m-1))$, since the number of configurations to place cluster 1 is $m_1$ and $m_1 \in [1, m-1]$. Using this expression for the normalized probability distribution for the right edge of the 1D cluster to be positioned at $s'$, we can compute the conditional probability in Eq. (5):

$$P(s, \ell \,|\, \overline{\rm loop}(s)) \approx \int_{s}^{m+\ell} ds'\, p_{\rm end}(s'), \qquad (8)$$

which, given the step functions in Eq. (7), evaluates to a piecewise quadratic function of $s$. To obtain the full 1-loop protein distribution (Eq. (4)), we first need to compute the probability for a site to not be part of the loop,

$$p_{\overline{\rm loop}}(s) = 1 - p_{\rm loop}(s). \qquad (9)$$

If the loop density, $\rho$, were uniform, we would simply have $p^{\rm uni}_{\rm loop}(s) = \ell\, \rho^{\rm uni}(m, \ell) = \frac{\ell}{m + \ell}$, since the 1D cluster has a total length of $m + \ell$ with a single loop of length $\ell$. This uniform condition would only apply if we randomly chose $\ell$ sites to be part of the loop and ignored the requirement that all these loop sites need to be neighboring. In a real cluster, however, we expect the loop density $\rho_{\rm loop}(s)$ to be higher in the bulk of the 1D cluster than close to the parS site or the edges, because fewer loops can be formed near the parS site or near the boundaries of the 1D cluster, at which a protein must be bound by construction. In particular, we expect the loop density $\rho(s, m, \ell) \propto \min(s, \ell, m + \ell - s)$, which measures the number of ways a site at $s$ can be part of a loop. This results in the normalized probability:

$$p_{\rm loop}(s) = \ell\, \rho(s, m, \ell) = \frac{\ell\, \min(s, \ell, m + \ell - s)}{\Theta(m - \ell)\left[\ell^2 + \ell(m - \ell)\right] + \Theta(\ell - m)\left(\frac{m + \ell}{2}\right)^2}. \qquad (10)$$

In the normalization of this expression we distinguish the cases where the loop is either smaller or larger than the number of proteins in the cluster. With Eqs. (8) and (10), we have all the elements to calculate the 1-loop protein binding profile $P(s, \ell)$ from Eq. (4). We investigated the binding profiles $P(s, \ell)$ predicted by this model for a selected set of parameters, as shown in Figure 3.
We only show $s > 0$ because of the symmetry of the binding profiles. It is instructive to contrast these profiles with the triangular profile (black curve) for a cluster with no loops. As expected, the addition of loops widens the profile, allowing the tail of the distribution to extend out to a distance $m + \ell$. The widening of the binding profile is accompanied by a faster decay of the profile in the vicinity of parS, which crosses over to a flatter profile at distances $s > \ell$ due to additional contributions from configurations where the loop lies between the parS site and site $s$. Interestingly, for large loop sizes the profile can even become non-monotonic, with a slight increase near the far edge of the domain. These features of the profiles reflect the reduced loop density near parS and near the far edge of the cluster. Note that the integral under this curve remains constant for varying $\ell$ to conserve the number of particles in the cluster.

To verify the validity of the analytical approximations leading to $P(s, \ell)$, we used exact enumeration as a benchmark. Overall, the numerics (dashed lines) and the analytics (solid lines) are in good agreement for the 1-loop case, as shown in Figure 3. In the next section, we employ the approximate analytical expressions obtained above to efficiently calculate the full binding profile averaged over all configurations.
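The steps of this section can be sketched numerically: build the right-edge density, integrate it from s to the end of the cluster, and multiply by the probability that the site is not on the loop. The code below is a minimal sketch under stated assumptions, not the authors' implementation. The right-edge density is written out from the subcluster argument in the text (two equally likely cases, uniform placement of parS within the overlapping subcluster, continuum limit in the subcluster size), the loop density uses the min(s, ℓ, m + ℓ - s) form with the normalization described above, and the integral is done with a simple trapezoid rule. Function names, the grid resolution, and the restriction to ℓ ≤ m are choices made here for illustration.

```python
import numpy as np


def p_end(sp, m, l):
    """Right-edge density p_end(s') for 0 <= s' <= m + l (continuum approximation)."""
    sp = np.asarray(sp, dtype=float)

    def step(x):
        # Heaviside step with value 1/2 at the jump.
        return np.heaviside(x, 0.5)

    bracket = (
        step(sp - (m + l - 1)) * (m - 2)
        + step(m + l - 1 - sp) * step(sp - l - 1) * (sp - l - 1)
        + step(1 - sp) * (m - 2)
        + step(sp - 1) * step(m - 1 - sp) * (m - 1 - sp)
    )
    return step(m + l - sp) * bracket / (m * (m - 1))


def p_loop(s, m, l):
    """Probability that site s lies on the loop (sketch intended for l <= m)."""
    s = np.asarray(s, dtype=float)
    if l == 0:
        return np.zeros_like(s)
    norm = l * l + l * (m - l) if l <= m else ((m + l) / 2) ** 2
    return l * np.minimum(np.minimum(s, l), m + l - s) / norm


def one_loop_profile(m, l, points_per_site=100):
    """Return (s, P(s, l)) on a uniform grid covering 0 <= s <= m + l."""
    s = np.linspace(0.0, m + l, (m + l) * points_per_site + 1)
    dens = p_end(s, m, l)
    # Trapezoid rule for the integral of p_end from each s up to m + l.
    seg = 0.5 * (dens[1:] + dens[:-1]) * np.diff(s)
    tail = np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])
    return s, tail * (1.0 - p_loop(s, m, l))


if __name__ == "__main__":
    # Illustrative values only: a cluster of 100 proteins with a loop of 50 sites.
    s, prof = one_loop_profile(100, 50)
    print(prof[0], prof[len(s) // 2], prof[-1])
```

For ℓ = 0 the right-edge density is uniform on (0, m), so this sketch returns a linear (triangular) profile, which matches the no-loop limit discussed in the text up to a factor (m - 2)/(m - 1) that comes from the continuum approximation.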

[Figure 4: panel (a) $\langle n \rangle$ versus $J_S$ for several $m$, with an inset showing $\langle n \rangle / m$; panel (b) $\langle n \rangle$ versus $m$ for several $J_S$, with an inset showing the rescaled prefactor; panel (c) $p_{\rm loop}(s)$ versus $s$ for several $J_S$.]

FIG. 4: (a) Average number of loops, $\langle n \rangle$, as a function of spreading coupling strength $J_S$ obtained from Eq. (12). The different curves correspond to protein numbers $m$ = 5 (black), $m$ = […] (red), $m$ = […] (green), and $m$ = 4 (blue), with loop-size cutoff $\ell_0$ = […]. We observe an exponential decrease $\langle n \rangle \sim e^{-J_S}$ in accordance with Eq. (12). Inset: same data replotted with the expected dependence of the average loop number on $m$ scaled out. (b) Average number of loops $\langle n \rangle$ as a function of $m$ for $J_S$ = […], […], 3, and 4. The behaviour is linear, as expected from Eq. (12). The prefactor that determines the vertical shift between the different curves scales with $e^{-J_S}$, as demonstrated in the inset of panel (b). (c) Average loop probability as a function of the genomic coordinate $s$ with $m$ = […] and $L$ = 4 for protein-DNA clusters with fluctuating loop number and loop lengths. Different curves correspond to different spreading couplings $J_S$ = […], […], 3, and 4. The analytic approximation using Eq. (15) for the loop density, averaged over different loop configurations with the appropriate Boltzmann factor as in Eq. (14), is compared to MC simulations (dashed curves) of the LC model (see Appendix A).

IV. PROTEIN BINDING PROFILES AND STATISTICS OF THE LOOPING AND CLUSTERING MODEL

Above we defined the Looping and Clustering model and calculated the binding profile of proteins around a parS site for a cluster with loops of fixed lengths. Real protein-DNA clusters, however, are expected to fluctuate, with new loops forming and disappearing continuously. To capture such fluctuations, we will use the expression for the binding profile of a static cluster with fixed loop lengths together with a statistical mechanics description of the LC model to obtain average binding profiles for dynamic clusters, including an ensemble average over both the number of loops and the loop lengths.
To obtain a full binding profile averaged over all realizations, it is useful to investigate the statistics of loops that extend from the protein-DNA cluster and how these statistics are determined by the underlying microscopic parameters of the model. We start by considering the number of loops that extend from the cluster. Using the partition function in Eq. (3), it is possible to calculate the basic features of the LC model. For instance, the moments of the distribution of the number of loops are given by:

$$\langle n^{\alpha} \rangle = \frac{1}{Z_{LC}} \sum_{n=0}^{m-1} n^{\alpha}\, \frac{(m-1)!}{(m-n-1)!\,n!} \left[ e^{-J_S}\, \frac{\ell_0^{1-d\nu}}{d\nu - 1} \right]^{n}. \qquad (11)$$

From this, we find that the average loop number is given by:

$$\langle n \rangle = \frac{m-1}{1 + e^{\tilde{J}_S}} \underset{m,\, J_S \gg 1}{\simeq} m\, e^{-\tilde{J}_S}, \qquad (12)$$

where $\tilde{J}_S$ is the renormalized loop activation energy introduced in Eq. (3). The average loop number $\langle n \rangle$ is depicted in Figure 4a, demonstrating the exponential dependence on the spreading energy $J_S$. In Figure 4b, we plot $\langle n \rangle$ as a function of the total number of proteins $m$ in the protein-DNA cluster. Indeed, we observe the expected linear dependence of the average loop number $\langle n \rangle$ on $m$ over a broad range of parameters. These results illustrate how the average number of loops is determined by the competition between the effective renormalized loop activation energy $\tilde{J}_S$ (including the cost in loop closure entropy) and the gain in the positional entropy of mixing (see Appendix B).

The linear dependence on $m$ in Eq. (12) reflects that loops are assumed to be able to form anywhere in the cluster in the Looping and Clustering model. One would naively expect that loops can only form at the surface of a 3D cluster, resulting in a dependence $\langle n \rangle \sim m^{2/3}$ for a compact, spherical cluster. However, Monte Carlo simulations of the full S&B model have revealed that the protein-DNA clusters are not compact [9], but rather have a surface that scales almost linearly in $m$, close to the behavior of the simplified LC model presented here. The non-compact nature of the protein-DNA cluster is perhaps not surprising because each protein can form only one bridging bond.

A closely related statistic is the average accumulated loop length $\langle \ell \rangle$. From the LC partition function, we notice that the loop length is completely decoupled from the coupling constant $J_S$ and depends only on the lower cutoff $\ell_0$ and the upper cutoff $\ell_{\max}$. Therefore, the cumulated average loop length becomes:

$$\langle \ell \rangle = \frac{d\nu - 1}{2 - d\nu}\; \frac{\ell_{\max}^{2-d\nu} - \ell_0^{2-d\nu}}{\ell_0^{1-d\nu} - \ell_{\max}^{1-d\nu}}\; \langle n \rangle, \qquad (13)$$

where the factor in front of $\langle n \rangle$ represents the average length per loop. This prefactor induces a small algebraic dependence on $\ell_{\max}$, in contrast to $\langle n \rangle$, which depends only on the lower cutoff $\ell_0$.

[Figure 5: panels (a), (b), and (c) showing $P_{LC}(s)$ versus $s$ for several $J_S$; inset of panel (a) in log-log scale.]

FIG. 5: Binding profiles of ParB from Eq. (14) plotted versus the genomic distance $s$ to parS for (a) $m$ = […], (b) […], and (c) 4. In Eq. (14), the loop size integrals were calculated with a lower cutoff $\ell_0$ = […] and an upper cutoff of $\ell_{\max}$ = […]; summations were truncated at $n$ = 5. The dark grey circles in panel (c) show experimental ChIP-Seq ParB enrichment data from the F-plasmid of E. coli extracted from [5]. The inset in panel (a) shows the binding profile of ParB versus genomic distance to parS for $J_S$ = […], $\nu$ = 0.588 (self-avoiding polymer). The results in this inset were obtained by Monte Carlo simulations of the LC model (see SI for details). The data are plotted in log-log scale; we observe the power law decay $P_{LC} \sim s^{-d\nu}$, as expected in the limit of low $J_S$, where the LC model becomes conceptually similar to the Stochastic Binding model.

The loop statistics of protein-DNA clusters are not easily accessible in experiments. Instead, the most relevant results for which this model can provide insight come from ChIP-Seq experiments. These experiments yield data for the enrichment of bound ParB as a function of genomic position on the DNA, providing a measure of the average protein binding profile of ParB on DNA [4, 5].
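The loop statistics discussed above are easy to evaluate directly. The following sketch computes the moments of the loop number from the weighted sum over n, the closed-form average loop number, and the average length per loop for a power-law weight between the two cutoffs. It assumes the upper cutoff is infinite in the loop-number statistics and 1 < dν < 2 for the length average; the function names and the parameter values used in the example are hypothetical.

```python
from math import comb, exp, log


def loop_number_moment(alpha: int, m: int, j_s: float, l0: float, d_nu: float) -> float:
    """<n^alpha> from the explicit sum over loop numbers (l_max -> infinity)."""
    x = exp(-j_s) * l0 ** (1 - d_nu) / (d_nu - 1)  # weight of a single loop
    weights = [comb(m - 1, n) * x ** n for n in range(m)]
    z = sum(weights)
    return sum(n ** alpha * w for n, w in enumerate(weights)) / z


def mean_loop_number(m: int, j_s: float, l0: float, d_nu: float) -> float:
    """Closed form <n> = (m - 1) / (1 + exp(J~_S))."""
    j_tilde = j_s + log(l0 ** (d_nu - 1) * (d_nu - 1))
    return (m - 1) / (1 + exp(j_tilde))


def mean_length_per_loop(l0: float, l_max: float, d_nu: float) -> float:
    """Average of l under the weight l^(-d_nu) on [l0, l_max], for 1 < d_nu < 2."""
    numerator = (l_max ** (2 - d_nu) - l0 ** (2 - d_nu)) / (2 - d_nu)
    denominator = (l0 ** (1 - d_nu) - l_max ** (1 - d_nu)) / (d_nu - 1)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    m, j_s, l0, l_max, d_nu = 100, 2.0, 10.0, 1000.0, 1.5
    n_avg = mean_loop_number(m, j_s, l0, d_nu)
    print(n_avg, loop_number_moment(1, m, j_s, l0, d_nu))
    print(n_avg * mean_length_per_loop(l0, l_max, d_nu))
```

For a Gaussian polymer in three dimensions (dν = 1.5), the average length per loop reduces to the geometric mean of the two cutoffs, which makes the algebraic dependence on the upper cutoff explicit.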
In the LC model, the ParB denity profile along DNA can be calculated from:.4 = m X (m )! exp( njs ) (4) (m n )!n! n= Z d` ` dν... d`n ` dν n Pn (, {`i }) where ZLC i given in Eq. (3). Here, Pn (, {`i }) repreent the multiloop ParB binding profile with n loop of length {`i } = {`,..., `n }. For implicity, we approximate thi multiloop profile by the analytical -loop conditional probability, P (, ` loop()), with the loop P length equal to the accumulated loop length, i.e. ` P i `i, weighted by n the loop probability ploop (, {`i }) i= `i ρ(, m, `i, `). In the expreion for the loop probability, ρ(, m, `i, `) i defined a the contribution to the loop denity of a loop of length `i in a cluter of m protein with a total accumulated loop length `, and we neglected correlation between contribution from different loop. Furthermore, we approximate ρ(, m, `i, `) by uing a generalization of the -loop expreion in Eq. (), min(, `i, m + ` ) + `i (m + ` `i )] + Θ(`i (` + m)) In the analyi above, we aimed to capture the effect ` m+`. (5) of multiple loop in a imple way by auming tatiti-

9 9 cal independence of the loop, and by uing the analytical -loop expreion to approximate the impact of loop formation on the loop denity and the ParB binding profile of the protein-dna complex. To tet the validity of thee approximation, we performed MC imulation of the complete LC model. We find that the numerically obtained average loop probability i in reaonable agreement with our approximate expreion for the multiloop denity, a hown in Figure 4c. Thu, depite the implicity of our approach, the analytical model provided here capture the eential feature of looping in protein-dna cluter. The protein binding profile P LC () around a pars ite i calculated by averaging the tatic binding profile for different total loop number and loop length uing the Boltzmann factor (ee Eq. (3)) from the Looping and Clutering model a the appropriate weighting factor. The reulting expreion in Eq. (4) for the protein binding profile of a protein-dna cluter i the central reult of thi paper. We ue thi expreion to compute binding profile for the full Looping and Clutering model, which are hown in Figure 5 a a function of the ditance to pars for m =,, and 4. By contruction, the ite = correponding to pars i alway occupied, and thu P ( = ) = for all value of the preading energy. Thi feature of the LC model capture the aumed trong affinity of ParB for a pars binding ite. For = 4, the binding profile converge to a triangular profile, implying a very tight cluter of protein on the DNA with almot no loop. The triangular profile in thi cae reult from all the ditinct configuration in which thi tight cluter can bind to DNA uch that one of the protein in the cluter i bound to pars, and therefore the probability drop linearly to at m. The ame triangular binding profile wa oberved for the S&B model in the trong coupling limit [9]. 
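The triangular profile of the loop-free limit can be checked by direct enumeration. The following is a minimal sketch, not the authors' code: it assumes a gap-free cluster of m proteins in which each protein is equally likely to be the one sitting on parS, and the function name `tight_cluster_profile` is hypothetical.

```python
def tight_cluster_profile(m):
    """Occupation probability versus signed distance s from parS.

    Assumption (illustrative): the cluster is a contiguous block of m
    proteins with no loops, and each of the m proteins is equally likely
    to be the one bound to parS (located at s = 0).
    """
    counts = {}
    for k in range(m):          # k: index of the protein bound to parS
        for j in range(m):      # j: index of any protein in the block
            s = j - k           # signed genomic distance of protein j
            counts[s] = counts.get(s, 0) + 1
    # Normalize by the number of equally weighted configurations.
    return {s: c / m for s, c in sorted(counts.items())}


if __name__ == "__main__":
    for s, p in tight_cluster_profile(5).items():
        print(s, p)
```

Under these assumptions the enumeration gives P(s) = (m - |s|)/m: the parS site is always occupied, and the profile decays linearly to zero at |s| = m.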
Interestingly, as $J_S$ becomes weaker, we observe a faster decrease of the binding profile near parS together with a broadening of the tails of the distribution for distances far from parS. This behavior results from the increase in the number of loops that extrude from the ParB-DNA cluster with decreasing spreading bond strength. The insertion of loops in the cluster allows binding of ParB to occur at larger distances from parS. Thus, the genomic range of the ParB binding profile is set by $s_{\max} \approx m + \langle \ell \rangle$, where the average cumulated loop length $\langle \ell \rangle$ is controlled by $J_S$ (see Eq. (3)) and $m$. These results illustrate how the full average binding profile is controlled by the spreading bond strength $J_S$: the weaker $J_S$, the looser the protein-DNA cluster becomes, which results in a much wider binding profile of proteins around parS.

In the limit of low $J_S$, the LC model quantitatively reduces to the statistics of non-interacting loops, as shown in the inset of Figure 5. In this case, the binding profile exhibits the asymptotic behaviour $P_{LC}(s) \sim s^{-d\nu}$ for large $s$, as in the Stochastic Binding model [25]. Interestingly, we observe a weaker scaling of $P_{LC}(s)$ with $s$ at intermediate genomic distances, which we attribute to the reduced loop density near parS (see Figure 4c).

To investigate how the functional shape of the binding profile is determined by the total number of proteins in the cluster, we plot the binding probability versus the scaled variable $s/m$ for m =,, and 4, as shown in Figure 6. For fixed $J_S$, the data approximately collapse onto a single curve as a function of the scaled distance $s/m$. This implies that the functional shape of the ParB binding profile is largely determined by the spreading bond strength, while the number of proteins in the cluster determines the width of the profile.

FIG. 6: Scaling function of the ParB binding profile for different total protein numbers $m$ (same data as Figure 5). The data for different total protein numbers $m$ are plotted versus the dimensionless genomic distance $s/m$ from parS (main graph: $J_S$ = 3, inset: $J_S$ = ).

V. DISCUSSION

The Looping and Clustering model introduced here allows us to access the average binding profile of proteins making up a large 3D protein-DNA complex. In our model, the formation of a coherent cluster of ParB proteins is ensured by a combination of spreading and bridging bonds between DNA-bound proteins, which together can drive a condensation transition in which all ParB proteins form a large protein-DNA complex localized around a parS site [29]. We do not assume, however, that this protein-DNA cluster is compact. Indeed, loops of protein-free DNA may extend from the cluster, which strongly influences the average spatial configuration of proteins along the DNA. In the LC model, the formation of loops in the protein-DNA cluster is controlled by the strength of spreading bonds, i.e. the bonds between proteins bound to nearest-neighbor sites on the DNA. Specifically, for every protein-free loop of DNA that extends from the cluster, a single spreading bond between two proteins within the cluster must be broken. Thus, if the spreading interaction energy, $J_S$, is sufficiently small, thermal fluctuations will enable the transient formation and breaking of spreading bonds, thereby allowing multiple loops of DNA to emanate from the protein cluster (see Figure ).

Conceptually, the spreading bond interaction determines how loose the protein-DNA cluster is, which directly impacts the ParB binding profile. When $J_S$ is large, loop formation is unlikely, resulting in a compact protein-DNA cluster with a corresponding triangular protein binding profile centered around parS [29]. At intermediate $J_S$, the protein-DNA cluster becomes looser with the formation of loops, resulting in binding profiles that are more strongly peaked around parS but with far-reaching tails. Importantly, the LC model enables us to establish a link between the Spreading & Bridging model and the Stochastic Binding model [25]. The first used a microscopic approach based on the types of interactions between proteins on the DNA polymer, while the second employed a more macroscopic approach based on the polymer configurations around a dense sphere of proteins. In the limit of low $J_S$, the LC model is consistent with the Stochastic Binding model, with a profile of the form [25] given by $P(s) \sim s^{-d\nu}$ (inset of Figure 5a). Thus, the LC model offers a description for a broad parameter regime, connecting two limits investigated in preceding studies [25, 29].

The Looping and Clustering model, which we introduce to calculate the binding profile of ParB-like proteins on the DNA, is a simple theoretical framework similar to the Poland-Scheraga model for DNA melting [37, 38]. An important difference between the LC model and the homogeneous Poland-Scheraga model is that translational symmetry is broken due to the presence of a parS site at which a protein is bound with a high affinity. Thus, the protein-DNA cluster can adopt a wide range of configurations as long as one of the proteins is bound to the parS site. As a result, loops are effectively excluded in the vicinity of parS. The central new result of this work is a simple way of computing the protein binding profile around such a parS site in terms of molecular interaction parameters.
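The way the spreading bond strength controls the number of loops can be made concrete with a short numerical check. The sketch below sums the loop-number weights of the LC partition function directly and compares the result with the closed form $\langle n \rangle = (m-1)/(1 + e^{\tilde{J}_S})$, using $\tilde{J}_S = J_S + \ln[\ell_0^{d\nu-1}(d\nu-1)]$ from Appendix B. The function name `mean_loop_number` and the default values $d = 3$ and $\ell_0 = 1$ are illustrative assumptions; only $\nu = 0.588$ is taken from the text.

```python
import math


def mean_loop_number(m, J_S, d_nu=3 * 0.588, l0=1.0):
    """Average loop number from the direct sum and from the closed form.

    Assumptions (illustrative): d = 3 and l0 = 1 are placeholder values;
    energies are in units of k_B T.
    """
    # Statistical weight per loop: Boltzmann factor times loop entropy integral.
    x = math.exp(-J_S) * l0 ** (1.0 - d_nu) / (d_nu - 1.0)
    # Weights of configurations with n loops, n = 0 .. m-1.
    weights = [math.comb(m - 1, n) * x ** n for n in range(m)]
    Z = sum(weights)
    direct = sum(n * w for n, w in enumerate(weights)) / Z
    # Closed form with the renormalized loop activation energy.
    J_tilde = J_S + math.log((d_nu - 1.0) * l0 ** (d_nu - 1.0))
    closed = (m - 1) / (1.0 + math.exp(J_tilde))
    return direct, closed


if __name__ == "__main__":
    for J_S in (1.0, 2.0, 3.0, 4.0):
        print(J_S, mean_loop_number(50, J_S))
```

Both evaluations agree because the sum over n is binomial, and the average decreases roughly exponentially with increasing $J_S$, in line with the compact, loop-free cluster expected at strong coupling.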
We show that the binding profiles predicted by this model are sensitive to both the expression level of proteins and the spreading interaction strength, which directly controls the formation of loops in the protein-DNA cluster. The LC model predicts a profile in good quantitative agreement with binding profiles measured with ChIP-Seq on the F-plasmid of E. coli, as shown in Fig. 5c. Importantly, from this analysis we extract the spreading interaction strength $J_S$ $k_B T$ and the number of proteins in the cluster $m$ 4. Our results also have implications for experiments that employ fluorescent labelling of DNA loci by exogenous ParB [26, 27]. Indeed, our model can be used to investigate how the protein interaction strengths determine the 3D structure and mobility of the ParB-DNA cluster, as well as the tendency of multiple ParB foci to adhere to each other. This model thus provides an insightful quantitative tool that could be employed to analyze and interpret ChIP-Seq and fluorescence data of ParB-like proteins on chromosomes and plasmids.

Acknowledgments

This project was supported by the German Excellence Initiative via the program NanoSystems Initiative Munich (NIM) (C.P.B.), the Deutsche Forschungsgemeinschaft (DFG) Grant TRR74 (C.P.B.), and the National Science Foundation Grant PHY-3555 (N.S.W.). We also thank J.-Y. Bouet for helpful comments on the manuscript. The authors acknowledge financial support from the Agence Nationale de la Recherche (IBM project ANR-4-CE9-5-) and from the CNRS Défi Inphyniti (Projet Structurant 5-6). This work is also part of the program Investissements d'Avenir ANR- -LABX- and Labex NUMEV (AAP 3--5, 5--55, 6--4).

[1] Dekker, J., Marti-Renom, M. A., & Mirny, L. A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nature Reviews Genetics, 14(6).
[2] Scolari, V. F., & Lagomarsino, M. C. (2015). Combined collapse by bridging and self-adhesion in a prototypical polymer model inspired by the bacterial nucleoid. Soft Matter, 11(9).
[3] Dame, R. T., Tark-Dame, M., & Schiessel, H. (2011). A physical approach to segregation and folding of the Caulobacter crescentus genome. Molecular Microbiology, 82(6).
[4] Jun, S., & Mulder, B. (2006). Entropy-driven spatial organization of highly confined polymers: lessons for the bacterial chromosome. Proceedings of the National Academy of Sciences, 103(33).
[5] Emanuel, M., Radja, N. H., Henriksson, A., & Schiessel, H. (2009). The physics behind the larger scale organization of DNA in eukaryotes. Physical Biology, 6(2), 025008.
[6] Marenduzzo, D., Micheletti, C., & Cook, P. R. (2006). Entropy-driven genome organization. Biophysical Journal, 90(10).
[7] Mirny, L. A. (2011). The fractal globule as a model of chromatin architecture in the cell. Chromosome Research, 19(1).
[8] Umbarger, M. A., Toro, E., Wright, M. A., Porreca, G. J., Bau, D., Hong, S. H., & Shapiro, L. (2011). The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. Molecular Cell, 44(2).
[9] Le, T. B., Imakaev, M. V., Mirny, L. A., & Laub, M. T. (2013). High-resolution mapping of the spatial organization of a bacterial chromosome. Science, 342(6159).
[10] Viollier, P. H., Thanbichler, M., McGrath, P. T., West, L., Meewan, M., McAdams, H. H., & Shapiro, L. (2004). Rapid and sequential movement of individual chromosomal loci to specific subcellular locations during bacterial DNA replication. Proceedings of the National Academy of Sciences of the United States of America, 101(25).
[11] Lagomarsino, M. C., Espéli, O., & Junier, I. (2015). From structure to function of bacterial chromosomes: Evolutionary perspectives and ideas for new experiments. FEBS Letters, 589(20 Part A).
[12] Wang, X., Llopis, P. M., & Rudner, D. Z. (2013). Organization and segregation of bacterial chromosomes. Nature Reviews Genetics, 14(3), 191-203.
[13] Dillon, S. C., & Dorman, C. J. (2010). Bacterial nucleoid-associated proteins, nucleoid structure and gene expression. Nature Reviews Microbiology, 8(3).
[14] Dame, R. T. (2005). The role of nucleoid-associated proteins in the organization and compaction of bacterial chromatin. Molecular Microbiology, 56(4).
[15] Mohl, D. A., & Gober, J. W. (1997). Cell cycle-dependent polar localization of chromosome partitioning proteins in Caulobacter crescentus. Cell, 88(5).
[16] Lim, H. C., Surovtsev, I. V., Beltran, B. G., Huang, F., Bewersdorf, J., & Jacobs-Wagner, C. (2014). Evidence for a DNA-relay mechanism in ParABS-mediated chromosome segregation. eLife, 3, e02758.
[17] Banigan, E. J., Gelbart, M. A., Gitai, Z., Wingreen, N. S., & Liu, A. J. (2011). Filament depolymerization can explain chromosome pulling during bacterial mitosis. PLoS Comput Biol, 7(9), e1002145.
[18] Ptacin, J. L., Lee, S. F., Garner, E. C., Toro, E., Eckart, M., Comolli, L. R., & Shapiro, L. (2010). A spindle-like apparatus guides bacterial chromosome segregation. Nature Cell Biology, 12(8).
[19] Le Gall, A., Cattoni, D. I., Guilhas, B., Mathieu-Demazière, C., Oudjedi, L., Fiche, J. B., & Nollmann, M. (2016). Bacterial partition complexes segregate within the volume of the nucleoid. Nature Communications, 7.
[20] Walter, J. C., Dorignac, J., Lorman, V., Rech, J., Bouet, J. Y., Nollmann, M., Palmeri, J., Parmeggiani, A., & Geniet, F. (2017). Surfing on protein waves: proteophoresis as a mechanism for bacterial genome partitioning. Physical Review Letters, 119(2), 028101.
[21] Vecchiarelli, A. G., Neuman, K. C., & Mizuuchi, K. (2014). A propagating ATPase gradient drives transport of surface-confined cellular cargo. Proceedings of the National Academy of Sciences, 111(13).
[22] Jindal, L., & Emberly, E. (2015). Operational principles for the dynamics of the in vitro ParA-ParB system. PLOS Comput Biol, 11(12), e1004651.
[23] Surovtsev, I. V., Campos, M., & Jacobs-Wagner, C. (2016). DNA-relay mechanism is sufficient to explain ParA-dependent intracellular transport and patterning of single and multiple cargos. Proceedings of the National Academy of Sciences, 113(46), E7268-E7276.
[24] Breier, A. M., & Grossman, A. D. (2007). Whole-genome analysis of the chromosome partitioning and sporulation protein Spo0J (ParB) reveals spreading and origin-distal sites on the Bacillus subtilis chromosome. Molecular Microbiology, 64(3).
[25] Sanchez, A., Cattoni, D. I., Walter, J. C., Rech, J., Parmeggiani, A., Nollmann, M., & Bouet, J. Y. (2015). Stochastic self-assembly of ParB proteins builds the bacterial DNA segregation apparatus. Cell Systems, 1(2).
[26] Chen, B., Guan, J., & Huang, B. (2016). Imaging specific genomic DNA in living cells. Annual Review of Biophysics, 45, 1-23.
[27] Saad, H., Gallardo, F., Dalvai, M., Tanguy-le-Gac, N., Lane, D., & Bystricky, K. (2014). DNA dynamics during early double-strand break processing revealed by non-intrusive imaging of living cells. PLoS Genet, 10(3), e1004187.
[28] Rodionov, O., Lobocka, M., & Yarmolinsky, M. (1999). Silencing of genes flanking the P1 plasmid centromere. Science, 283(5401).
[29] Broedersz, C. P., Wang, X., Meir, Y., Loparo, J. J., Rudner, D. Z., & Wingreen, N. S. (2014). Condensation and localization of the partitioning protein ParB on the bacterial chromosome. Proceedings of the National Academy of Sciences, 111(24).
[30] Graham, T. G., Wang, X., Song, D., Etson, C. M., van Oijen, A. M., Rudner, D. Z., & Loparo, J. J. (2014). ParB spreading requires DNA bridging. Genes & Development, 28(11).
[31] Taylor, J. A., Pastrana, C. L., Butterer, A., Pernstich, C., Gwynn, E. J., Sobott, F., ... & Dillingham, M. S. (2015). Specific and non-specific interactions of ParB with DNA: implications for chromosome segregation. Nucleic Acids Research, 43(2).
[32] Gennes, P. G. D. (1979). Scaling concepts in polymer physics.
[33] Leonard, T. A., Butler, P. J., & Löwe, J. (2005). Bacterial chromosome segregation: structure and DNA binding of the Soj dimer - a conserved biological switch. The EMBO Journal, 24(2), 270-282.
[34] Fisher, G. L., Pastrana, C. L., Higman, V. A., Koh, A., Taylor, J. A., Butterer, A., & Moreno-Herrero, F. (2017). The C-Terminal Domain Of ParB Is Critical For Dynamic DNA Binding And Bridging Interactions Which Condense The Bacterial Centromere. bioRxiv, 122986.
[35] Hanke, A., & Metzler, R. (2003). Entropy loss in long-distance DNA looping. Biophysical Journal, 85(1).
[36] Bouet, J. Y., & Lane, D. (2009). Molecular basis of the supercoil deficit induced by the mini-F plasmid partition complex. Journal of Biological Chemistry, 284(1).
[37] Poland, D., & Scheraga, H. A. (1970). Theory of helix-coil transitions in biopolymers.
[38] Everaers, R., Kumar, S., & Simm, C. (2007). Unified description of poly- and oligonucleotide DNA melting: Nearest-neighbor, Poland-Sheraga, and lattice models. Physical Review E, 75(4), 041918.
[39] Although this reasoning is not strictly true for self-avoiding polymers, it does hold if we adopt the usual approximation used in the Poland-Scheraga model for DNA melting that self-avoidance acts only within individual loops.

Appendix A: Monte Carlo simulations and numerical integration procedure

1. Monte Carlo procedure

Using the partition function, we can formulate an effective 1D Hamiltonian for the LC model, which explicitly accounts for the balance between spreading bonds and loop entropy:

$$H_{LC} = -J_S \sum_{i=1}^{L-1} \phi_i\, \phi_{i+1} + d\nu \sum_{i=1}^{n} \ln(\ell_i + \ell_0). \qquad (A1)$$

This effective Hamiltonian is useful to perform Monte Carlo simulations of the model as a benchmark for the approximations made in the analytical approach (see Fig. 7). The proteins are modelled as particles that bind to and unbind from the sites of a one-dimensional lattice with free boundary conditions. The lattice size L = 4 is chosen to prevent finite-size effects for the range of protein numbers considered. Note that, in these MC simulations, the total size of the loops is limited to $L - m$. The simulations are performed with the standard Metropolis rules:

1. Propose a move of a randomly chosen particle to a random empty site of the lattice (conserved order parameter). One MC iteration step consists of $m$ attempted moves.
2. Calculate the energy difference $\Delta H = H_f - H_i$ between the final and initial configurations.
3. If $\Delta H < 0$, the move is accepted with probability 1; otherwise it is accepted with probability $\exp(-\beta \Delta H)$.

The system is set initially with all particles in a single cluster ($J_S$ = ), and then thermalized to the actual $J_S$ of the simulation, ranging from to 4 (see Fig. 7). The sampling starts after thermalization of the system (4 MC iterations). A sampling of the system configuration is performed every MC iterations. All MC averages have been performed over 7 configurations, with $\nu = 0.588$, L = 4 and $\ell_0$ =. The numerical results of these Monte Carlo simulations are in good agreement with our approximate analytic results, as shown in Fig. 7.

FIG. 7: The binding profiles obtained with the analytic approach (symbols) are compared to MC simulations (dashed lines) for m =, $\ell_0$ =, and $J_S$ =,, 3 and 4.

2. Numerical integration

To evaluate the binding profile $P_{LC}(s)$, we proceeded as follows. We carried out the evaluation of the simplified expression in Eq. (4) using numerical summation and integration. We truncated the summations at n = 5, instead of going up to $m$, based on the corresponding average number of loops of Fig. 5. Finally, we introduced an upper cutoff for the loop length, $\ell_{\max}$ =, instead of going up to infinity.
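The Metropolis rules of the Monte Carlo procedure in this appendix can be sketched in a few lines. This is a minimal illustration, not the simulation code used for the paper. It assumes that every run of empty sites between two consecutive particles counts as a loop with entropic cost $d\nu \ln(\ell_i + \ell_0)$, that adjacent particles contribute $-J_S$, that energies are in units of $k_B T$ ($\beta = 1$), and that the particle on the parS site is pinned. The names `energy` and `metropolis_sweep` and all parameter values are hypothetical.

```python
import math
import random


def energy(occ, J_S, d_nu, l0):
    """Effective 1D energy: spreading bonds plus loop entropy (one reading of Eq. A1)."""
    sites = [i for i, o in enumerate(occ) if o]
    H = 0.0
    for a, b in zip(sites, sites[1:]):
        gap = b - a - 1                        # empty sites between neighbours
        if gap == 0:
            H -= J_S                           # intact spreading bond
        else:
            H += d_nu * math.log(gap + l0)     # entropic cost of a loop
    return H


def metropolis_sweep(occ, pinned, J_S, d_nu, l0, rng):
    """One MC iteration: m attempted particle moves with Metropolis acceptance."""
    accepted = 0
    for _ in range(sum(occ)):
        movable = [i for i, o in enumerate(occ) if o and i != pinned]
        empty = [i for i, o in enumerate(occ) if not o]
        if not movable or not empty:
            break
        src, dst = rng.choice(movable), rng.choice(empty)
        H_i = energy(occ, J_S, d_nu, l0)
        occ[src], occ[dst] = 0, 1              # propose the move
        dH = energy(occ, J_S, d_nu, l0) - H_i
        if dH <= 0 or rng.random() < math.exp(-dH):
            accepted += 1                      # keep the new configuration
        else:
            occ[src], occ[dst] = 1, 0          # reject: restore the old one
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    occ = [0] * 40
    for i in range(18, 23):                    # start from a single tight cluster
        occ[i] = 1
    for _ in range(100):
        metropolis_sweep(occ, pinned=20, J_S=2.0, d_nu=3 * 0.588, l0=1.0, rng=rng)
    print(occ)
```

Recomputing the full energy for every proposed move keeps the sketch short; a production simulation would update only the bonds and loops adjacent to the moved particle.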
We confirmed that the shape of the binding profile does not change significantly for higher values of $\ell_{\max}$ =. The numerical evaluations of the multidimensional integrals in Eq. (4) have been performed with an accuracy and precision of respectively and 3 effective digits in the final results. We have carried out convergence tests of the curve shapes in order to assess our parameter choices and rule out numerical instabilities. All computations have been performed by routines written in the Wolfram Language and executed by the Mathematica software suite (versions and ).

Appendix B: Formal connection between the LC model and a Lattice Gas with renormalized coupling

For $m \gg 1$ and $n \gg 1$ (thermodynamic limit), we can formulate a saddle point approximation to evaluate the partition function and $\langle n \rangle$, by approximating the entropic (factorial) term in $Z$ (Eq. ()) using the standard entropy of mixing for placing $n$ loops on $m$ possible sites. This approach gives physical insight into how the loop entropy contributes to a renormalized protein-protein interaction and how the competition between this renormalized interaction and the entropy of mixing controls $\langle n \rangle$. Taking the thermodynamic limit leads to a partition function:

$$Z = \int_0^1 d\rho_l\, \exp\left[ -(m-1)\, F_{\mathrm{eff}}(\rho_l) \right], \qquad (B1)$$

where $\rho_l = n/(m-1)$ is the concentration of loops ($0 \le \rho_l \le 1$) and

$$F_{\mathrm{eff}}(\rho_l) = \rho_l\, \tilde{J}_S + \left\{ [1 - \rho_l] \ln[1 - \rho_l] + \rho_l \ln[\rho_l] \right\} \qquad (B2)$$

is an effective free energy, where $\tilde{J}_S = J_S + \ln \alpha$ is a loop activation energy renormalized by the cost in loop entropy, with $\alpha = \ell_0^{d\nu - 1} (d\nu - 1)$. In the limit $m \to \infty$, the approximate partition function $Z$ becomes exact and can be evaluated in the saddle point approximation by minimizing $F_{\mathrm{eff}}$. The solution, $\rho_{SP}$, to the saddle point equation, $dF_{\mathrm{eff}}(\rho_l)/d\rho_l = 0$, is

$$\rho_{SP} = \frac{1}{1 + e^{\tilde{J}_S}}. \qquad (B3)$$

The entropic contribution to $F_{\mathrm{eff}}$ (second term) vanishes at $\rho_l = 0$ and $1$, and reaches a minimum at $\rho_l = 1/2$,