J Mol Evol (2008) 67: DOI / x

Beneficial Fitness Effects Are Not Exponential for Two Viruses

Darin R. Rokyta · Craig J. Beisel · Paul Joyce · Martin T. Ferris · Christina L. Burch · Holly A. Wichman

Received: 17 December 2007 / Accepted: 21 July 2008 / Published online: 9 September 2008
© The Author(s). This article is published with open access.

Abstract The distribution of fitness effects for beneficial mutations is of paramount importance in determining the outcome of adaptation. It is generally assumed that fitness effects of beneficial mutations follow an exponential distribution, for example, in theoretical treatments of quantitative genetics, clonal interference, experimental evolution, and the adaptation of DNA sequences. This assumption has been justified by the statistical theory of extreme values, because the fitnesses conferred by beneficial mutations should represent samples from the extreme right tail of the fitness distribution. Yet in extreme value theory, there are three different limiting forms for right tails of distributions, and the exponential describes only those of distributions in the Gumbel domain of attraction. Using beneficial mutations from two viruses, we show for the first time that the Gumbel domain can be rejected in favor of a distribution with a right-truncated tail, thus providing evidence for an upper bound on fitness effects. Our data also violate the common assumption that small-effect beneficial mutations greatly outnumber those of large effect, as they are consistent with a uniform distribution of beneficial effects.

D. R. Rokyta · H. A. Wichman (corresponding author), Department of Biological Sciences, University of Idaho, Moscow, ID, USA
C. J. Beisel · P. Joyce, Department of Mathematics and Department of Statistics, University of Idaho, Moscow, ID 83844, USA
M. T. Ferris · C. L.
Burch, Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA

Keywords Fitness distribution · Extreme value theory · Adaptation · Bacteriophage · Virus

Introduction

Adaptation is a process in which beneficial mutations increase in frequency in a population. The distribution of fitness effects is central to many aspects of this process and influences, for example, the rate of adaptation (Wilke 2004) and the mean fitness increase due to the fixation of a beneficial mutation (Orr 2003). Beneficial mutations of large effect have historically been assumed to be rare relative to those of small effect, an idea propounded by Fisher (1930), and more recently it has been argued that beneficial fitness effects should in fact be approximately exponentially distributed. The exponential distribution has become a prominent assumption in theoretical studies of the genetics of adaptation, serving as the starting point for theories of quantitative genetics (Otto and Jones 2000), clonal interference (Gerrish and Lenski 1998; Rozen et al. 2002; Wilke 2004), experimental evolution (Wahl and Krakauer 2000), and the adaptation of DNA sequences (Gillespie 1983, 1984, 1991; Orr 2002, 2003; Rokyta et al. 2006).

The first theoretical justification for this assumption was provided by Gillespie (1983, 1984, 1991): if beneficial mutations are rare, then the relevant portion of the full fitness distribution is the extreme right tail. If we consider the fitnesses of all possible genotypes differing from the wild type by a single nucleotide change as a large sample from an unknown fitness distribution, the vast majority of them will fall below the wild type fitness. A fitness greater than the wild type is a rare event and, thus, lies in the extreme right tail of the fitness distribution. Tails for many distributions have limiting

forms that are only weakly dependent on the parent distribution. Furthermore, the limiting form which describes the tails of most commonly encountered distributions (e.g., normal, gamma, exponential, etc.) is, in fact, the exponential distribution. Distributions of this form belong to the Gumbel domain of attraction in extreme value theory (EVT). Since it is generally accepted that beneficial mutations are rare, it seems reasonable to assume that EVT can provide information regarding the distribution of beneficial fitness effects.

However, according to EVT, the exponential is but one of three possible limiting tail distributions, as has been noted previously in the context of adaptation theory (Orr 2005, 2006). The others loosely correspond to distributions with heavier than exponential tails (the Fréchet domain) and distributions which are right-truncated (the Weibull domain); see Leadbetter et al. (1980) for more precise descriptions. The tails for each of these three EVT domains can all be described by the generalized Pareto distribution (GPD). Its cumulative distribution function is given by

\[
F(x \mid \kappa, \tau) =
\begin{cases}
1 - \left(1 + \dfrac{\kappa x}{\tau}\right)^{-1/\kappa}, & x \ge 0, \ \text{if } \kappa > 0 \ (\text{Fr\'echet}) \\[2ex]
1 - \left(1 + \dfrac{\kappa x}{\tau}\right)^{-1/\kappa}, & 0 \le x < -\dfrac{\tau}{\kappa}, \ \text{if } \kappa < 0 \ (\text{Weibull}) \\[2ex]
1 - e^{-x/\tau}, & x \ge 0, \ \text{if } \kappa = 0 \ (\text{Gumbel})
\end{cases}
\tag{1}
\]

with shape parameter κ and scale parameter τ. The shape parameter, κ, determines the domain (Pickands 1975), with κ = 0 corresponding to the Gumbel domain, κ > 0 corresponding to the Fréchet domain, and κ < 0 corresponding to the Weibull domain (illustrated in Fig. 1). This formulation describes the limiting distribution of the tail above a high threshold (set to zero here). In the context of beneficial mutations, if the threshold is set to the fitness of the wild type, the GPD would describe the distribution of beneficial fitness effects. However, the GPD shape parameter is stable with respect to changes in the threshold (Castillo and Hadi 1997), thus any high threshold is equivalent for characterizing the domain of attraction.

Using a statistical method tailored to this problem described by Beisel et al.
(2007), we tested the null hypothesis that the fitness distribution has an exponential tail (κ = 0 under the GPD) for two sets of beneficial mutations from viruses for which the identities of the mutations were determined by sequencing. One set consisted of nine different beneficial mutations for the DNA bacteriophage ID11, selected for high growth rate in liquid culture at 37°C on host Escherichia coli C (Rokyta et al. 2005). The second set consisted of 16 beneficial mutations for RNA phage φ6, selected for ability to grow on a novel host (Ferris et al. 2007).

Materials and Methods

Likelihood Ratio Test

The likelihood ratio test (LRT) and its statistical properties have been described in detail by Beisel et al. (2007). Briefly, negative twice the difference in log likelihoods, −2 log Λ, is calculated based on the GPD, comparing the null model κ = 0 to the alternative κ ≠ 0. Thus, the test determines whether the data are more consistent with a fitness distribution in the Gumbel domain of attraction (i.e., a distribution with an exponential tail) or either the Fréchet (κ > 0) or Weibull (κ < 0) domains, assuming that the data consist of values above a high threshold. Under the GPD, the parameter of interest, κ, is stable with respect to changes in the threshold (Castillo and Hadi 1997). Thus, it is not necessary to use the wild type fitness as the threshold to characterize the domain of attraction for fitness distributions. Beisel et al. (2007) argued that fitnesses should be shifted relative to the fitness of the smallest-effect mutation observed rather than the wild type, at the cost of a single degree of freedom. This reduces possible bias introduced by missing small-effect beneficial mutations in an empirical sample and ensures that the threshold is high enough for EVT to apply. Measurement error can be easily incorporated into the test, but Beisel et al. (2007) showed that doing so has no significant effect on type I error rates as long as the coefficient of variation for the data is relatively small (<~0.2). Although the distribution of the test statistic, −2 log Λ, is asymptotically χ² with 1 degree of freedom, we used a parametric bootstrapping approach since our sample sizes were small. The p-values are based on 10,000 parametric bootstrap replicates. All analyses were performed using R (R Development Core Team 2006).

[Figure 1. Panel A: Gumbel (exponential) and Fréchet (heavy-tailed), with curves for κ = 0 (Gumbel), κ = 1, κ = 5, and κ = 10. Panel B: Weibull (truncated), with curves for several κ < 0, including κ = −1.1 and κ = −1.0. Axes: probability density versus x.]

Fig. 1 A comparison of right tails for the three domains of attraction under the generalized Pareto distribution (GPD). The shape parameter κ determines the domain, and the scale parameter τ = 1 for all examples. (a) The right tails for distributions in the Gumbel domain correspond to the GPD with κ = 0 (i.e., the exponential distribution). Distributions in the Fréchet domain (κ > 0) have tails that are heavier than exponential. (b) Distributions in the Weibull domain have right-truncated tails and correspond to the GPD with κ < 0. Truncation points are denoted by vertical gray lines

An Approximate Method for Estimating κ

The GPD has been widely applied to problems in engineering and finance, but is not commonly encountered in biological applications (but see Orr 2006). Much of the statistical theory concerning the GPD involves asymptotic results that describe approximations to the distribution of the maximum likelihood estimators (MLEs) which are valid for large sample sizes. To obtain these results, it is necessary to restrict the range of the parameter space. For example, the statistical literature on the estimation of the parameters of the GPD generally ignores the case of κ < −1/2, since for −1 ≤ κ < −1/2 asymptotic properties do not obtain, and for κ < −1, the MLEs do not exist (Castillo and Hadi 1997). However, these restrictions are artificial and are made only for mathematical convenience. In applying the test of Beisel et al.
(2007) to our data, we found that κ̂ = −1 (see Results), but as part of the testing procedure, it was necessary to restrict κ ≥ −1, suggesting that the true value of κ could potentially be less than −1. Thus, to address parameter estimation for values in this range, we derived a simple procedure for estimating κ for values near −1. The probability density function for the GPD is given by

\[
f(x \mid \kappa, \tau) =
\begin{cases}
\dfrac{1}{\tau}\left(1 + \dfrac{\kappa x}{\tau}\right)^{-\frac{\kappa + 1}{\kappa}}, & x \ge 0, \ \text{if } \kappa > 0 \\[2ex]
\dfrac{1}{\tau}\left(1 + \dfrac{\kappa x}{\tau}\right)^{-\frac{\kappa + 1}{\kappa}}, & 0 \le x < -\dfrac{\tau}{\kappa}, \ \text{if } \kappa < 0 \\[2ex]
\dfrac{1}{\tau}\, e^{-x/\tau}, & x \ge 0, \ \text{if } \kappa = 0.
\end{cases}
\tag{2}
\]

We restrict our attention to the case where κ < 0 and let λ = −τ/κ, which corresponds to the right truncation point. If λ is known, then the MLE for κ can be calculated directly using equation (2) and is given by

\[
\hat{\kappa} = \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 - X_i/\lambda\right).
\tag{3}
\]

Under these conditions, it can be shown that −ln(1 − X_i/λ) follows the exponential distribution with mean −κ. This implies that E(κ̂) = κ, var(κ̂) = κ²/n, and 2nκ̂/κ follows the χ² distribution with 2n degrees of freedom. Now consider an ordered sample of size n from the GPD, X_(1), X_(2), ..., X_(n), where X_(n) is the largest value from the sample and X_(1) is the smallest. For κ ≤ −1 and moderate sample sizes, X_(n) will be close to λ. Thus, we can take λ̂ = X_(n) as an estimate of λ. However, if we replace λ with λ̂ = X_(n) in Eq. 3 above, the sum is undefined. To circumvent this problem we simply drop the X_(n) term in (3) to get the approximate estimator

\[
\hat{\kappa} = \frac{1}{n - 1} \sum_{i=1}^{n-1} \ln\left(1 - X_{(i)}/X_{(n)}\right).
\tag{4}
\]

An approximate confidence interval can be constructed by assuming that 2(n − 1)κ̂/κ follows a χ² with 2(n − 1) degrees of freedom.

Performance of this new estimation procedure was assessed through simulation using R (R Development Core Team 2006). We used sample sizes of 10 and 30, and considered −3 ≤ κ ≤ −1/2 with right truncation point λ = −τ/κ = 10. For each combination of sample size and κ, we generated 100,000 samples from the GPD and calculated κ̂ and its 95% confidence interval.
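The approximate estimator and its interval can be sketched in a few lines. The block below is a minimal illustration in Python rather than the authors' R code; the function names (`sample_gpd`, `kappa_hat`, `chi2_quantile_even`, `kappa_interval`) are hypothetical, the data are assumed to be nonnegative exceedances over the threshold with a unique maximum, and the χ² quantile is obtained from the closed-form CDF that exists for even degrees of freedom, which always applies here because the degrees of freedom are 2(n − 1).

```python
import math
import random


def sample_gpd(n, kappa, tau, rng):
    """Draw n values from the GPD of Eq. 1 by inverting its CDF."""
    draws = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        if kappa == 0:
            draws.append(-tau * math.log(1.0 - u))  # exponential (Gumbel) case
        else:
            draws.append(tau / kappa * ((1.0 - u) ** (-kappa) - 1.0))
    return draws


def kappa_hat(xs):
    """Approximate estimator of Eq. 4: the largest value stands in for lambda."""
    ordered = sorted(xs)
    x_max = ordered[-1]
    logs = [math.log(1.0 - x / x_max) for x in ordered[:-1]]
    return sum(logs) / len(logs)


def chi2_cdf_even(x, df):
    """Chi-square CDF for even df = 2m, using the closed-form finite sum."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, df):
    """Invert chi2_cdf_even by bisection (hypothetical helper, even df only)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kappa_interval(xs, level=0.95):
    """Interval from assuming 2(n-1)*kappa_hat/kappa is chi-square with 2(n-1) df."""
    df = 2 * (len(xs) - 1)
    k_hat = kappa_hat(xs)
    alpha = 1.0 - level
    q_low = chi2_quantile_even(alpha / 2.0, df)
    q_high = chi2_quantile_even(1.0 - alpha / 2.0, df)
    # kappa_hat is negative, so dividing by the smaller quantile gives the lower bound.
    return df * k_hat / q_low, df * k_hat / q_high


if __name__ == "__main__":
    rng = random.Random(1)
    data = sample_gpd(10, kappa=-1.0, tau=10.0, rng=rng)
    print(kappa_hat(data), kappa_interval(data))
```

Because κ̂ is negative, the lower end of the interval comes from the lower χ² quantile and the upper end from the upper quantile. Tied maxima would make a logarithm undefined, so real data with ties would need separate handling.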
Data Sets

The details of the experimental protocols for isolating, identifying, and measuring the fitnesses of the nine beneficial mutations for the DNA phage ID11 are described by Rokyta et al. (2005). The raw fitness data and identities of the mutations are listed in Table 1. Briefly, 20 replicate lineages were selected for increased growth rate in liquid culture on host E. coli C at 37°C. Populations were repeatedly bottlenecked to ~10^4 phage to minimize the effects of clonal interference. For each lineage, a single beneficial mutation was allowed to fix and was identified through full genome sequencing, yielding a total of nine unique beneficial mutations. The fitness of each unique mutation was measured in 10 replicates as the log2 increase in the phage population per 15 min (approximately one generation). We ignored the frequencies at which the various beneficial mutations fixed, and considered the unique fixed beneficial mutations to be a biased sample from the distribution of new beneficial mutations. This approach

was discussed at length by Beisel et al. (2007) and is addressed further in the Discussion and in Fig. 2a. Although selection will bias the sample toward mutations of larger effect, shifting the fitnesses to be relative to the fitness of the smallest-effect mutation observed should largely eliminate this issue.

Table 1 The nine beneficial mutations for the phage ID11 selected for rapid growth at 37°C on Escherichia coli C (Rokyta et al. 2005)

Genome position (a)   Nucleotide substitution   Amino acid position (b)   Amino acid substitution   Fitness (c)   SE (d)
Ancestor
                      A→G                       F421                      D→G
                      A→G                       F322                      N→S
                      G→T                       F3                        V→F
                      A→G                       F419                      T→A
                      C→T                       F314                      A→V
                      C→T                       J15                       A→V
                      G→A/T                     F416                      M→I
                      C→T                       F355                      P→S
                      G→T                       J20                       V→L

a Positions correspond to the published sequence of ID11 (GenBank accession number AY751298)
b The letter indicates the gene name and the number gives the amino acid position
c Fitness is measured as the log2 increase in population size per 15 min
d Based on 10 replicates

The protocols for isolating and characterizing the beneficial mutations for RNA phage φ6 have been described in detail elsewhere (Ferris et al. 2007). Briefly, φ6 clones were plated on the nonpermissive host Pseudomonas syringae pv. glycinea strain R4a such that only phage with host range mutations were able to form plaques. For each of 40 replicates, a single randomly selected plaque was chosen, and its P3 gene was sequenced, as its gene product has been previously associated with host range expansion (Duffy et al. 2005). Nineteen unique genotypes were identified, though we excluded three from our analyses. Two genotypes were excluded because they were found to have two different mutations in P3, and the other was excluded since its mutation gave rise to the same amino acid substitution as another. Thus, this data set consisted of 16 unique host range mutations (Table 2). Fitness was measured in six replicates as log10 of the number of progeny per initial plaque after 24 h of growth on plates on the novel host. As these are gain-of-function mutations, the wild type has a fitness of zero under the new conditions.
In addition, there is a potential bias in that only mutations with large enough beneficial effects to allow the formation of a visible plaque on a plate will be sampled. Shifting the fitnesses relative to the fitness of the smallest-effect mutation observed alleviates both of these issues. The use of this type of data for testing the domain of attraction for fitness distributions was discussed in detail by Beisel et al. (2007) and is addressed further in the Discussion and in Fig. 2b.

[Figure 2. Panel A: Sampling mutations through selection. Panel B: Gain-of-function mutations. Horizontal axis: fitness, with the wild type (wt) and the threshold marked.]

Fig. 2 Two methods of sampling beneficial mutations. The shaded curves represent a hypothetical fitness distribution, and the vertical gray bars represent a sample of beneficial mutations. The locations of the vertical bars on the horizontal axis represent their fitnesses, and their heights represent their hypothetical relative probabilities of being observed in a single experiment under the sampling strategy. Note that only the fitnesses, and not their probabilities of being observed, are considered in the test. (a) Under selection, the wild type fitness lies in the tail but sampling is biased toward large-effect mutations. Repeated sampling will yield multiple observations of the same mutations, though those of very small effect may not be observed in a reasonable number of replicates. Shifting alleviates the bias due to missing very small-effect mutations. (b) In gain-of-function experiments, the wild type fitness will not lie in the tail and may in fact be zero, and some beneficial mutations may not have effects large enough to be observed. Mutations with effects above some threshold will have equal probabilities of being sampled, and as these are the fittest mutations, they will lie in the tail. The fitness of the smallest-effect mutation from those sampled provides a threshold in the tail of the distribution

Table 2 The 16 beneficial mutations for the phage φ6 selected for the ability to grow on novel host Pseudomonas syringae pv. glycinea (Ferris et al. 2007)

Mutant ID   Gene position   Nucleotide substitution   Amino acid position   Amino acid substitution   Fitness (a)   SE (b)
G19         Not in P3       Unknown                   Unknown               Unknown
g                           G→A                       535                   D→N
g                           C→T                       339                   P→H
g25         23              A→G                       8                     E→G
g24         22              G→A                       8                     E→K
g                           A→C                       554                   D→A
g                           A→T                       554                   D→V
g                           A→G                       554                   D→G
g                           C→T                       555                   L→F
g22         13              G→A                       5                     G→S
g                           G→A                       554                   D→N
g                           A→C                       533                   D→A
g8          434             A→G                       145                   D→G
g5          437             A→G                       146                   N→S
g                           A→G                       516                   T→A
g                           G→C                       178                   E→D

a Fitness was measured as the log10 of the number of phage per plaque per 24 h on plates
b Based on six replicates

Results

Performance of the New Estimator for κ

We explored the behavior of our new estimator of the shape parameter of the generalized Pareto distribution (GPD) using simulations with sample sizes of 10 and 30 (Fig. 3). We considered −3 ≤ κ ≤ −1/2 with right truncation point λ = −τ/κ set to 10. The behavior at each combination of κ and sample size was approximated by 100,000 simulated data sets. Estimates of the shape parameter, κ̂, are slightly biased toward −1 for κ ≠ −1, with bias increasing with the distance from −1 and decreasing with increasing sample size (Fig. 3a). The approximate 95% confidence intervals performed as expected for −2 < κ < −0.8 (Fig. 3b). For −3 < κ < −2, the confidence intervals captured the truth ~93% of the time, and for −0.8 < κ < −0.5, their performance was poor. These results were similar across both sample sizes considered.
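A scaled-down version of this kind of simulation can be sketched as follows. This is an illustrative Python reimplementation under stated assumptions, not the authors' R code: the helper names are hypothetical, the default replicate count in the demo is far smaller than the 100,000 used in the paper, and the χ² quantiles come from a bisection on the closed-form CDF for even degrees of freedom. One useful check is κ = −1, where the GPD is uniform on (0, λ): conditional on the maximum, the remaining values are uniform below it, so the χ² assumption holds exactly and coverage should be close to 95%.

```python
import math
import random


def draw_truncated_gpd(n, kappa, lam, rng):
    """n draws from the GPD with kappa < 0 and right truncation point lam = -tau/kappa."""
    # Inverse CDF of Eq. 1 written in terms of lam: x = lam * (1 - (1 - u)^(-kappa)).
    return [lam * (1.0 - (1.0 - rng.random()) ** (-kappa)) for _ in range(n)]


def kappa_hat(xs):
    """Approximate estimator (Eq. 4 of Materials and Methods)."""
    ordered = sorted(xs)
    return sum(math.log(1.0 - x / ordered[-1]) for x in ordered[:-1]) / (len(ordered) - 1)


def chi2_quantile_even(p, df):
    """Chi-square quantile for even df, by bisection on the closed-form CDF."""
    def cdf(x):
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return 1.0 - math.exp(-x / 2.0) * total

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(kappa, n, reps, seed=0, lam=10.0):
    """Return (mean of kappa_hat, fraction of 95% intervals containing kappa)."""
    rng = random.Random(seed)
    df = 2 * (n - 1)
    q_low = chi2_quantile_even(0.025, df)
    q_high = chi2_quantile_even(0.975, df)
    total, covered = 0.0, 0
    for _ in range(reps):
        k_hat = kappa_hat(draw_truncated_gpd(n, kappa, lam, rng))
        total += k_hat
        if df * k_hat / q_low < kappa < df * k_hat / q_high:
            covered += 1
    return total / reps, covered / reps


if __name__ == "__main__":
    for n in (10, 30):
        for kappa in (-3.0, -2.0, -1.0, -0.5):
            mean_hat, coverage = simulate(kappa, n, reps=2000)
            print(n, kappa, round(mean_hat, 3), round(coverage, 3))
```

Comparing the mean of κ̂ with the true κ across the grid gives a rough counterpart to Fig. 3a, and the coverage column a rough counterpart to Fig. 3b; more replicates narrow the Monte Carlo error.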
[Figure 3. Panel A: κ̂ versus κ for n = 10 and n = 30. Panel B: % of intervals containing κ versus κ for n = 10 and n = 30.]

Fig. 3 Performance of the approximate estimator for κ. Each point represents the average of 100,000 simulated data sets. (a) Bias in κ̂. The bold solid line gives the expectation for an unbiased estimate. For κ ≠ −1, κ̂ is biased toward −1. (b) Performance of 95% confidence intervals. The intervals were symmetric, with 2.5% upper and 2.5% lower tail probabilities. The bold solid line denotes a 95% probability of containing the true value of κ

Analysis of Two Viral Data Sets

For both data sets, we analyzed measures of log fitness, or Malthusian fitness, as this is a more appropriate measure when reproduction does not occur at discrete times (i.e., log fitness is the appropriate parameter for continuous growth models). To account for the possible empirical absence of small-effect beneficial mutations due to sampling strategies, we tested the distribution of fitness effects shifted relative to the fitness of the smallest-effect mutation observed rather than the wild type, as described by Beisel et al. (2007). Thus, rather than using the wild type fitness as the threshold, we used the fitness of the smallest-effect mutation. The coefficients of variation for both data sets were ~0.07, thus we ignored measurement error in our analyses as suggested by Beisel et al. (2007). Including measurement error had no effect on our results (not shown).

[Figure 4. Panel A, DNA phage ID11: H0: κ = 0, MLE τ̂ = 0.68; HA: κ ≠ 0, MLE κ̂ = −1.0; −2 log Λ = 9.42. Panel B, RNA phage φ6: H0: κ = 0, MLE τ̂ = 1.33; HA: κ ≠ 0, MLE κ̂ = −1.0. Legend: data, κ = 0, κ = −1. Axes: probability density versus shifted fitness.]

Fig. 4 The distribution of shifted fitness effects for two viruses. The null hypothesis (H0: κ = 0) is that the fitness distribution belongs to the Gumbel domain of attraction. The alternative hypothesis (HA: κ ≠ 0) is that the distribution is in either the Weibull (κ < 0) or Fréchet (κ > 0) domain. The dashed and solid lines depict the fitted densities under the null and alternative models, respectively. p-values are based on 10,000 parametric bootstrap replicates. The shifted empirical fitnesses, plotted as vertical lines, are compared to their expected values under the fitted null (vertical lines with κ = 0) and alternative (vertical lines with κ = −1) models. (a) The nine beneficial mutations for the DNA phage ID11. (b) The 16 host range mutations for RNA phage φ6

Table 3 A summary of the statistical results for two viral data sets

Phage   Shift (a)   HA: κ ≠ 0 (κ̂, τ̂)   Approximate method (b), 95% CI
ID11    1           (−1.0, 1.02)         (−2.64, −0.57)
ID11    2           (−1.0, 0.89)         (−2.93, −0.55)
ID11    3           (−1.0, 0.69)         (−3.08, −0.49)
φ6      1           (−1.0, 2.03)         (−1.83, −0.63)
φ6      2           (−1.0, 1.33)         (−1.22, −0.40)

Further columns: Obs.; H0: κ = 0 (τ̂); likelihood ratio test (b): −2 log Λ, df, p; approximate method (b): κ̂, df.
a This designates the threshold used for the analysis. 1 indicates the fitness of the lowest fitness mutant; 2, the second lowest; etc. Shifting further than shown results in a failure to reject at the 5% significance level
b The likelihood ratio test and approximate method are described under Materials and Methods

Despite the small sample sizes, the Gumbel domain (κ = 0) was rejected using the LRT in favor of the Weibull domain (κ < 0) for both data sets (Fig. 4, Table 3). This result indicates that a fitness distribution with a right-truncated tail gives a better fit to both data sets than a distribution with an exponential tail. Furthermore, the Gumbel domain is still rejected for the DNA phage ID11 data set if the threshold is set to the fitness of the second or third smallest-effect mutation and for the RNA phage φ6 data set with the threshold set to the fitness of the second smallest-effect mutation (Table 3), providing stronger evidence that missed small-effect mutations are not responsible for this result.

In applying the LRT, it is necessary to restrict κ ≥ −1, since for κ < −1, the likelihood can become infinite, and thus the maximum likelihood estimates do not exist. This restriction makes the test conservative, yet the fact that κ̂ ≈ −1 suggests that the best estimate of κ is ≤ −1. We developed a novel estimation method appropriate for κ near −1, which also provides confidence intervals (Materials and Methods). Applying this method with the threshold set as the fitness of the smallest-effect mutation, we estimated κ̂ = −1.06 (ID11; 95% CI, −2.64 < κ < −0.57) and κ̂ = −1.00 (φ6; 95% CI, −1.83 < κ < −0.63), in close agreement with the results from the LRT

(Table 3). These confidence intervals encompass a wide range of tail behaviors (see Fig. 1), yet all are characterized by a right-truncated distribution and differ substantially from the exponential distribution.

Discussion

We have shown, using collections of beneficial mutations from a DNA virus and an RNA virus, that at least some fitness distributions do not belong to the Gumbel domain of attraction from extreme value theory (EVT). The widespread assumption of exponentially distributed beneficial fitness effects in theories of adaptation is strongly rejected for our data. Both data sets suggest that fitness distributions can instead belong to the Weibull domain of attraction, which implies that the distribution of beneficial fitness effects is right-truncated. In fact, the fitted distribution (GPD with κ = −1) corresponds to a uniform distribution, which has been considered an unrealistic distribution for beneficial fitness effects (Wilke 2004) and is at variance with the common observation that small-effect beneficial mutations greatly outnumber those of large effect (Imhof and Schlötterer 2001; Kassen and Bataillon 2006; Perfeito et al. 2007; Sanjuán et al. 2004).

Sampling Procedures

The methods used for collecting the ID11 and φ6 data sets may at first seem to be at odds with the theory being tested. The beneficial mutations for ID11 were sampled only after they had fixed in an evolving population and, thus, represent a nonrandom sample of new beneficial mutations. For the φ6 data set, the wild type was unable to grow under the selective conditions, and thus its fitness cannot be assumed to be in the tail of the fitness distribution. However, the statistical methodology we apply was developed specifically for these types of data. Beisel et al. (2007) provide extensive discussion of the appropriateness of beneficial mutations collected through selection experiments (ID11) and gain-of-function experiments (φ6).
Both cases rely on the fact that the GPD shape parameter is not altered by a change in threshold, and since the domain of attraction is specified by the shape parameter, we are able to change the threshold to alleviate sampling bias or to be certain that the threshold is far enough into the tail for EVT to apply. We briefly reiterate the arguments of Beisel et al. (2007) in the context of our two data sets.

Selection is a biased method for sampling from the distribution of new beneficial mutations. The experiment of Rokyta et al. (2005) involved 20 samples from the distribution of new beneficial mutations, biased by requiring these mutations to survive drift and fix in an evolving population to be observed. Many of the observed mutations were sampled multiple times, and this frequency data was ignored in our analysis. Only the fitnesses of unique mutations were included. Using selection as a sampling strategy implies that we are likely to thoroughly sample only large-effect mutations, and those of very small effect are likely to be missed. Figure 2 illustrates a hypothetical example. The locations of the vertical gray bars on the horizontal axis are the fitnesses of the genotypes possessing beneficial mutations. We assume a small number of beneficial mutations. Thus, under selection, which could include the effects of clonal interference, the distribution of fixed effects is a discrete probability mass function on the fitnesses of the mutants (represented in Fig. 2a by the heights of the gray bars). Higher-fitness mutants are more likely to be observed in any given replicate, but repeated experiments will give thorough sampling, especially of the largest-effect mutations. For our analysis, we shift further into the right tail to compensate for this bias and demonstrate that the threshold can be set to the smallest-, second smallest-, and third smallest-effect beneficial mutations while still rejecting the Gumbel domain (Table 3). Thus, we only need to assume that we have an adequate sample of mutations that have larger effects than the third smallest-effect mutation.
Seventeen of the 20 sampled mutations described by Rokyta et al. (2005) fall into this range, which includes 6 unique mutations. Furthermore, as is clear in Fig. 4a, the discrepancy between the data and the exponential distribution actually involves the large-effect mutations, i.e., those most likely to have been observed. Under the exponential distribution, we would have expected to see well-spaced mutations of very large effect, which were not observed, and adding a handful of small-effect mutations would not have affected our results.

The use of gain-of-function mutations for testing the domain of attraction for fitness distributions involves a conceptual departure from the modeling framework to which the Gumbel assumption is generally applied. For example, the results of Orr (2002) rely on the assumption that the wild type fitness is in the tail of the distribution. However, this assumption need not hold to test the domain of attraction of a fitness distribution. If we consider the fitnesses of wild type φ6 and all of its single-mutation neighbors in sequence space as a sample from the fitness distribution, it is clear that wild type fitness (zero) does not lie in the tail and thus would not serve as an adequate threshold (Fig. 2b). However, we can be confident that the fitness of the smallest-effect mutation is in the tail, since only a small number of fitnesses were larger. Thus, shifting relative to the fitness of the smallest-effect mutation assures us that we are dealing with the tail of the fitness distribution, though this particular fitness distribution may not be amenable to the predictions of population genetic

models such as those of Orr (2002). Our analysis focuses solely on the Gumbel assumption for fitness distributions, and testing this assumption only requires that the threshold be in the tail.

The Domain of Attraction for Fitness Distributions

Although we have demonstrated that beneficial fitness effects in two viral systems are characterized by a right-truncated distribution, it remains to be determined whether the Weibull domain of attraction will apply generally, or whether distributions will vary among different organisms in different environments. Prior empirical attempts to characterize distributions of beneficial fitness effects for microbes did not distinguish between alternative EVT domains of attraction. A study of beneficial mutations in vesicular stomatitis virus by Sanjuán et al. (2004) rejected the exponential distribution in favor of the gamma distribution, whereas a study of beneficial mutations in Pseudomonas fluorescens by Kassen and Bataillon (2006) failed to reject the exponential in favor of the gamma. However, the gamma itself belongs to the Gumbel domain of attraction. Since the assumption of an exponential distribution for beneficial fitness effects is justified by EVT, it is more natural to turn to EVT to provide the appropriate alternative hypotheses. Various limitations of these data sets (e.g., sample size for the Sanjuán et al. data set and the unknown number of unique mutations in the Kassen and Bataillon data set) prevented us from reanalyzing them in the context of EVT. Theoretical support in favor of the Gumbel domain was provided by Orr (2006) through an analysis of the distribution of fitness effects under Fisher's (1930) geometrical model of adaptation. While the generality of our results remains uncertain, we have clearly demonstrated that not all fitness distributions fall within the Gumbel domain of attraction.
Distributions in the Gumbel and Weibull domains of attraction differ qualitatively in the key characteristic used to predict the rate and pattern of adaptive evolution: the fitness spacings between beneficial mutations, which allow direct calculation of fitness effects and selection coefficients (Orr 2005). Under the exponential distribution, the fitness spacings between adjacently ranked beneficial mutations are independent exponential random variables. If the mean of the parent distribution is μ, then the expected difference between the largest and the second largest observation is μ, the expected difference between the second and the third largest is μ/2, the expected difference between the third and the fourth largest is μ/3, etc. Thus, the fitness effects of beneficial mutations tend to accumulate near low values, matching Fisherian intuition. This pattern holds several consequences for adaptive evolution; for example, it mitigates the effects of clonal interference in the adaptation of asexual organisms (Kim and Orr 2005), and it causes natural selection to behave, on average, halfway between perfect adaptation, where the best available mutation is always fixed, and random adaptation, where all available beneficial mutations have equal probabilities of being fixed (Orr 2002). This pattern of spacings does not hold under the GPD with κ ≈ −1 (see Fig. 4), and for κ < −1 the pattern begins to reverse such that fitness effects tend to accumulate near the right truncation point. As a consequence, our finding that at least some fitness distributions are not in the Gumbel domain will have a substantial impact on our understanding of the rate and pattern of adaptive evolution.

Small-Effect vs. Large-Effect Mutations

Our results appear to violate the long-held intuition that mutations of large benefit should be less common than those of small benefit. The logic behind this intuition dates back to Fisher (1930) and has been supported by recent empirical and theoretical work (Imhof and Schlötterer 2001; Kassen and Bataillon 2006; Perfeito et al. 2007; Sanjuán et al.
2004). However, both of our data sets are consistent with a uniform distribution of beneficial fitness effects, implying that small-effect and large-effect beneficial mutations are equally common for these two systems.

How do we reconcile these conflicting results? First, it is important to differentiate between the distribution of beneficial fitness effects fixed over the course of an adaptive walk and the beneficial effects of potential single-step mutations. The former deals with mutations in a variety of genetic backgrounds, and each fixed mutation could conceivably give rise to an entirely new distribution of single-step mutations. Much of the work purporting to support an excess of small-effect mutations (e.g., Imhof and Schlötterer 2001; Perfeito et al. 2007) has examined the distribution over multiple steps in adaptation. Except under the assumption of strict additivity these two distributions will not be the same, and it is clear that biological systems are not strictly additive. For a single ancestral genotype, there are many contexts in which fitness might be improved greatly through the alteration of a single biochemical property (e.g., increased protein stability or drug resistance), and any of several mutations may confer roughly the same large effect. However, once a large step is taken to achieve this phenotypic change, the remaining first-step mutations may have much smaller effects or even be neutral or deleterious when combined with the fixed beneficial mutation. Furthermore, such large-effect mutations may require compensatory changes to overcome pleiotropic effects. Thus, there is little reason to expect the distribution of beneficial fitness effects from a single ancestral genotype to resemble the distribution of effects over multiple steps in an adaptive walk.
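The contrast between an exponential tail and a uniform one can be made concrete by simulating the spacings between the top-ranked values, the quantity discussed earlier for the exponential case. The sketch below is an illustration only: the function names, the choice of 10 available beneficial mutations, and the unit mean and unit upper bound are all assumptions, not values from the data. For n draws from a uniform distribution on (0, λ), every expected gap between adjacent ranks equals λ/(n + 1), whereas the exponential gives expected gaps of μ, μ/2, μ/3 for the top ranks.

```python
import random


def mean_top_spacings(draw, n, k, reps, rng):
    """Average gaps between the k+1 largest of n draws (gap at the top listed first)."""
    sums = [0.0] * k
    for _ in range(reps):
        top = sorted((draw(rng) for _ in range(n)), reverse=True)[: k + 1]
        for j in range(k):
            sums[j] += top[j] - top[j + 1]
    return [s / reps for s in sums]


def exponential_draw(rng, mean=1.0):
    # Gumbel-domain tail: GPD with kappa = 0 and scale equal to the mean.
    return rng.expovariate(1.0 / mean)


def uniform_draw(rng, upper=1.0):
    # Weibull-domain tail: GPD with kappa = -1, i.e. uniform on (0, upper).
    return rng.uniform(0.0, upper)


if __name__ == "__main__":
    rng = random.Random(0)
    print("exponential:", mean_top_spacings(exponential_draw, 10, 3, 5000, rng))
    print("uniform:    ", mean_top_spacings(uniform_draw, 10, 3, 5000, rng))
```

Under the exponential draw the top gap is the widest and the gaps shrink with rank, while under the uniform draw the gaps are equal on average, which is one way to see why the two assumptions lead to different expectations about large-effect mutations.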

A uniform distribution of beneficial effects, and similarly a right-truncated distribution, is certainly not unreasonable at the biochemical level. For example, even if the phenotypic effects of mutations showed an excess of small effects, the translation of phenotype into fitness could produce a more uniform distribution. A pattern of diminishing returns (i.e., a concave mapping of phenotype into fitness), such as is commonly seen for biochemical reactions and metabolic flux (Hartl et al. 1985), could conceivably yield both an apparent right truncation point and a uniform or even reversed-tailed distribution of fitness effects, regardless of the underlying distribution of phenotypic effects. Likewise, mutations can have a large effect on a phenotype such as host attachment, but the extent to which that phenotype can increase fitness may be limited (Pepin et al. 2006). This too might yield a uniform distribution of fitness effects. Thus, a more complete understanding of the distribution of fitness effects and the relative abundance of large-effect beneficial mutations may require a better understanding of the biochemical nature of adaptation.

Acknowledgments The authors would like to thank J. J. Bull for comments and discussion. This work was supported by grants from the National Institutes of Health to P.J. and H.A.W. (R01GM076040) and to C.L.B. (R01GM067940). C.J.B. was supported in part by NIH P20 RR16448 and a grant from the National Science Foundation (DEB ) to P.J. D.R.R. was supported in part by NIH P20 RR. Analytical resources were provided by NIH P20 RR16448 and NIH P20 RR.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

Beisel CJ, Rokyta DR, Wichman HA, Joyce P (2007) Testing the extreme value domain of attraction for distributions of beneficial fitness effects. Genetics 176:
Castillo E, Hadi AS (1997) Fitting the generalized Pareto distribution to data. J Am Stat Assoc 92:
Duffy S, Turner PE, Burch CL (2005) Pleiotropic costs of niche expansion in the RNA bacteriophage φ6. Genetics 172:
Ferris MT, Joyce P, Burch CL (2007) High frequency of mutations that expand the host range of an RNA virus. Genetics 176:
Fisher RA (1930) The genetical theory of natural selection. Oxford University Press, Oxford, UK
Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102(103):
Gillespie JH (1983) A simple stochastic gene substitution model. Theor Popul Biol 23:
Gillespie JH (1984) Molecular evolution over the mutational landscape. Evolution 38:
Gillespie JH (1991) The causes of molecular evolution. Oxford University Press, New York
Hartl DL, Dykhuizen DE, Dean AM (1985) Limits of adaptation: the evolution of selective neutrality. Genetics 111:
Imhof M, Schlötterer C (2001) Fitness effects of advantageous mutations in evolving Escherichia coli populations. Proc Natl Acad Sci USA 98:
Kassen R, Bataillon T (2006) Distribution of fitness effects among beneficial mutations before selection in experimental populations of bacteria. Nature Genet 38:
Kim Y, Orr HA (2005) Adaptation in sexuals vs. asexuals: clonal interference and the Fisher-Muller model. Genetics 171:
Leadbetter MR, Lindgren G, Rootzén H (1980) Extremes and related properties of random sequences and processes. Springer-Verlag, New York
Orr HA (2002) The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56:
Orr HA (2003) The distribution of fitness effects among beneficial mutations. Genetics 163:
Orr HA (2005) The genetic theory of adaptation: a brief history. Nat Rev Gen 6:
Orr HA (2006) The distribution of beneficial fitness effects among beneficial mutations in Fisher's geometric model of adaptation. J Theor Biol 238:
Otto SP, Jones CD (2000) Detecting the undetected: estimating the total number of loci underlying a quantitative trait. Genetics 156:
Pepin KM, Samuel MA, Wichman HA (2006) Variable pleiotropic effects from mutations at the same locus hamper prediction of fitness from a fitness component. Genetics 172:
Perfeito L, Fernandes L, Mota C, Gordo I (2007) Adaptive mutations in bacteria: high rate and small effects. Science 317:
Pickands J III (1975) Statistical inference using extreme order statistics. Ann Statist 3:
R Development Core Team (2006) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available at:
Rokyta DR, Joyce P, Caudle SB, Wichman HA (2005) An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nature Genet 37:
Rokyta DR, Beisel CJ, Joyce P (2006) Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. J Theor Biol 243:
Rozen DE, de Visser JAGM, Gerrish PJ (2002) Fitness effects of fixed beneficial mutations in microbial populations. Curr Biol 12:
Sanjuán R, Moya A, Elena SF (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci USA 101:
Wahl LM, Krakauer DC (2000) Models of experimental evolution: the role of genetic chance and selective necessity. Genetics 156:
Wilke CO (2004) The speed of adaptation in large asexual populations. Genetics 167:

