Analyses of adaptive evolution and recombination rate variation in Drosophila


University of Iowa
Iowa Research Online
Theses and Dissertations, 2011

Analyses of adaptive evolution and recombination rate variation in Drosophila
Ramesh Ratnappan, University of Iowa

Copyright 2011 Ramesh Ratnappan

Recommended Citation: Ratnappan, Ramesh. "Analyses of adaptive evolution and recombination rate variation in Drosophila." PhD (Doctor of Philosophy) thesis, University of Iowa.

ANALYSES OF ADAPTIVE EVOLUTION AND RECOMBINATION RATE VARIATION IN DROSOPHILA
by Ramesh Ratnappan

An Abstract of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Biology in the Graduate College of The University of Iowa

December 2011

Thesis Supervisor: Associate Professor Josep M. Comeron

ABSTRACT

The neutral theory of molecular evolution provides a framework to understand the molecular basis of evolutionary change. The fundamental principle of the neutral theory is that the majority of mutations observed within populations as well as between species are neutral. More importantly, the theoretical framework that the neutral theory provides includes not only mutation and random genetic drift but also two other important parameters in evolution: selection and recombination. Molecular evolutionary analyses allow us to estimate the magnitude and consequences of these parameters in natural populations. The work presented in this thesis investigates two aspects affecting evolutionary change: adaptation and recombination.

In the second and third chapters, adaptive changes associated with a new habitat are investigated in Drosophila santomea, a species which belongs to the melanogaster subgroup and which is endemic to the volcanic island of São Tomé. In its present habitat, D. santomea inhabits a colder, darker and more humid environment compared with its sister species, Drosophila yakuba, which is distributed throughout sub-Saharan Africa. Comparisons of genome-wide changes in expression between these species indicate that a group of genes involved in detection of external stimuli could be providing selective advantages to D. santomea in its new environment. In the second chapter, the coding sequences of these genes were obtained in D. santomea and analyzed to assess signatures of selection at the level of amino acid sequence. The analysis reveals that, along with changes in gene expression, protein coding sequences are evolving at a faster rate in D. santomea when compared with D. yakuba, likely providing evidence for an adaptive advantage to D. santomea when colonizing its new environment. The third chapter describes the study of the effect of cold temperature on the fitness of both species. The results indicate that D. santomea tolerates cold temperature better than D. yakuba, with different stages of the life cycle showing more pronounced effects. The observed reduction in fitness at low temperatures strongly supports the hypothesis that temperature is a key factor delimiting the distribution of these two species in their current habitats.

Recombination is an important evolutionary parameter that influences the amount of variation present within a species and the potential to adapt to biotic/abiotic changes. As such, it is a key parameter in population genetics models of selection. To date, however, no study has been able to measure the variation in recombination with high resolution (ideally at the level of single genes) while also capturing variation in recombination rates within a species. Further, there is a need to understand how the two outcomes of meiotic recombination (cross-over and gene conversion) are distributed across genomes. The fourth chapter describes the direct measurement of ultra-high-resolution variation in recombination rate throughout the D. melanogaster genome by massively genotyping the products of 5,860 female meioses. These maps reveal that cross-over rates are sharply reduced near telomeres and centromeres, with no cross-over activity in the small fourth chromosome. Importantly, we detect genomic regions with almost undetectable cross-over events embedded in large regions with high cross-over rates. Gene conversion rates are more uniformly distributed across the genome than cross-over rates and are detectable even in regions with no evidence of cross-over activity. Finally, the study of intraspecific variation in cross-over rates reveals many regions with a significant excess of variation, thus uncovering the presence of modifiers of recombination segregating within D. melanogaster. The results from this analysis underscore the need to incorporate both intraspecific variation in cross-over rates and gene conversion rates into a new generation of population genetics models.

ANALYSES OF ADAPTIVE EVOLUTION AND RECOMBINATION RATE VARIATION IN DROSOPHILA
by Ramesh Ratnappan

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Biology in the Graduate College of The University of Iowa

December 2011

Thesis Supervisor: Associate Professor Josep M. Comeron

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL
PH.D. THESIS

This is to certify that the Ph.D. thesis of Ramesh Ratnappan has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Biology at the December 2011 graduation.

Thesis Committee: Josep M. Comeron, Thesis Supervisor; Ana Llopart; Jan Fassler; Andrew Forbes; Terry Braun

To my father, Karipala Sankara Pillai Ratnappa

ACKNOWLEDGMENTS

First and foremost, I would like to thank my adviser, Dr. Josep M. Comeron, from whom I learned so much about population genetics. His wide knowledge of the subject has positively influenced and changed the way I think about the evolutionary process. I am grateful for all his time, ideas, and funding, which made my graduate study stimulating. I would like to thank Dr. Ana Llopart, who helped me tremendously during my research. She was kind enough to provide me with the samples for my study, which were collected painstakingly from the island of São Tomé. I have learned a lot from the comments and questions of my committee members. The comprehensive exam changed the manner in which I thought about and presented science. I would like to thank my present and past committee members, Dr. Ana Llopart, Dr. Jan Fassler, Dr. Terry Braun, Dr. Andrew Forbes, Dr. Debashish Bhattacharya, and Dr. Steve Hendrix, who helped me to understand and deliver science in a better way. I would also like to thank Dr. Bryant McAllister and Dr. Steve Hendrix; I learned a lot from the classes they taught. I am indebted to my lab mates for providing a stimulating environment in which to learn and grow. I am especially grateful to Anya Williford, Derek Peters, Sam Bailin and Andrew Adrian for the stimulating discussions, team efforts and all the fun we have had during my course of study. I am grateful to all other faculty and staff members of the Department of Biology at the University of Iowa for assisting me in many different ways. Special thanks to Phil Ecklund, who assisted me with all the official paperwork.

Lastly and most importantly, I wish to thank my family. I cannot express in words my gratitude to my father, K. S. Ratnappan, who inspired me to pursue science, and to my mother, Sulochana M. G., who is a constant source of spirit and enthusiasm for me. My wife, Archana, has always been supportive through all these years of graduate study, and I gained a lot from her understanding of science. I am also greatly obliged to my sister Lakshmi. Finally, I thank my little son Arvind, whose smiling face is always a stress reliever.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER 1 INTRODUCTION
  Theoretical framework for understanding evolution
  Neutral theory of molecular evolution
  Nearly neutral theory of molecular evolution
  Applications of the neutral theory
  Studies of adaptation
  Studies of adaptation at molecular level
  Studies of adaptation to temperature variation
  Evolution and Recombination
  Single-site models of molecular evolution
  Linkage models of molecular evolution
  Fine-scale recombination in D. melanogaster

CHAPTER 2 ADAPTATION TO A NEW ENVIRONMENT IN DROSOPHILA SANTOMEA: MOLECULAR EVOLUTION IN PROTEIN CODING GENES
  Introduction
  Examples of adaptive changes at molecular level
  The novel habitat of D. santomea
  Materials and methods
  Sequencing
  Maximum likelihood methods to detect selection on the D. santomea lineage
  Results
  Transition to transversion rate ratio
  PAML branch-model
  Codonbias results
  Discussion

CHAPTER 3 ADAPTATION TO A NEW ENVIRONMENT IN DROSOPHILA SANTOMEA: EFFECT OF COLD TEMPERATURE ON FITNESS IN D. SANTOMEA'S NEW ENVIRONMENT
  Introduction
  Examples of adaptation to abiotic factors
  Materials and Methods
  Temperature tolerance experiments
  Results
  Ageing, mating and development at constant temperature
  Ageing at different temperatures, mating and development at 23 °C
  Mating at different temperatures, ageing and development at 23 °C
  Ageing and mating at 23 °C, development at different temperatures
  Discussion
  3.4.1 Adaptation in D. santomea to colder temperature

CHAPTER 4 INTRASPECIFIC VARIATION IN RECOMBINATION IN DROSOPHILA MELANOGASTER
  Introduction
  Advantages of recombination
  Meiotic Recombination
  Variation in recombination rate within populations
  Recombination rate as a population genetic parameter
  The need for a new generation of recombination maps
  Materials and Methods
  Fly Crosses
  DNA extraction
  Real-Time PCR
  Illumina library preparation
  Gel extraction
  PCR enrichment, library validation and quantification
  Sequence analysis
  Results and Discussion
  A high-resolution CO map for D. melanogaster
  Intraspecific variation in CO landscapes
  GC and GC maps in D. melanogaster
  Conclusion and future directions

CHAPTER 5 CONCLUSION
APPENDIX
REFERENCES

LIST OF TABLES

Table 2.1 Preferred codon in D. yakuba estimated by the CodonW program and used in Codonbias to estimate selection on synonymous codons
The ω (nonsynonymous to synonymous rate ratio) values for D. santomea and the rest of the branches of the phylogeny from the program PAML
The ω (nonsynonymous to synonymous rate ratio) values for D. santomea, D. yakuba and the rest of the branches of the phylogeny from the program PAML
The ω (nonsynonymous to synonymous rate ratio) and codon usage bias values for D. santomea and the rest of the branches of the phylogeny from the program Codonbias
The ω (nonsynonymous to synonymous rate ratio) and codon usage bias values for D. santomea, D. yakuba and the rest of the branches of the phylogeny from the program Codonbias
Cross numbers and parental strains used to generate RAIL recombinant lines
List of primers used in the experiments
Maximum likelihood estimates of gene conversion tract lengths L for each chromosome (bp) and ρ, the rate of gene conversion initiation (/female meiosis/bp)
A.1 Results from PAML for two ratio test for individual genes of external stimulus group. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.2 Results from PAML for two ratio test for individual genes of functional random group of genes. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.3 Results from PAML for three ratio test for individual genes of external stimulus group. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.4 Results from PAML for three ratio test for individual genes of functional random group of genes. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.5 Results from Codonbias for two ratio test for individual genes of external stimulus group. s is the selection for codon usage. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.6 Results from Codonbias for two ratio test for individual genes of functionally random group. s is the selection for codon usage. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.7 Results from Codonbias for three ratio test for individual genes of external stimulus group. s is the selection for codon usage. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models
A.8 Results from Codonbias for three ratio test for individual genes of functionally random group. s is the selection for codon usage. lnL is the likelihood value of the model. LRT (Likelihood Ratio Test) compares lnL of both models

LIST OF FIGURES

Figure 1.1 Advantages of recombination (a) Clonal interference model (b) Ruby in the rubbish model
Accumulation of deleterious mutations in nonrecombining populations
Crossing over rates in 2nd chromosome in Drosophila (a) Based on several models and methods (b) Based on SNP markers
Phylogenetic tree based on branch lengths obtained by the program PAML using a concatenated sequence of all the 156 genes
The ω (nonsynonymous to synonymous rate ratio) of external stimuli genes estimated separately using the program PAML for D. santomea and D. yakuba branches. (A) ω of individual genes in D. santomea. (B) ω of individual genes in D. yakuba
The ω (nonsynonymous to synonymous rate ratio) of the concatenated sequence of external stimuli genes, estimated separately using the program PAML for D. santomea and D. yakuba branches
Increase in ω (nonsynonymous to synonymous rate ratio) in D. santomea when compared with D. yakuba in external stimuli and functionally random genes
The ω (nonsynonymous to synonymous rate ratio) of external stimuli genes estimated separately using the program Codonbias for D. santomea and D. yakuba branch. (A) ω of individual genes in D. santomea branch (B) ω of individual genes in D. yakuba branch
The ω (nonsynonymous to synonymous rate ratio) of functionally random genes estimated separately using the program Codonbias for D. santomea and D. yakuba branch. (A) ω of individual genes in D. santomea branch (B) ω of individual genes in D. yakuba branch
Increase in ω (nonsynonymous to synonymous rate ratio) in D. santomea when compared with D. yakuba in external stimuli and functionally random genes
Relative abundance of Drosophila yakuba, D. santomea, and hybrid flies at different elevations in São Tomé Island
The percentage of females that produced live offspring when aging, mating and development were done at the same temperature in D. santomea, D. yakuba mainland line and D. yakuba island line
The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when maintained at same temperature throughout their life cycle
3.4 The average number of males and females produced by both species at different temperatures. (A) Offspring produced when aging, mating and development were done at constant temperature. (B) Offspring produced when aging and mating were done at 23 °C and development was done at different temperatures
The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when aging of males was carried out at low temperatures and mating and development were done at 23 °C
The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when aging of females was carried out at low temperatures and mating and development were done at 23 °C
The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when males of both species were maintained at low temperatures for aging while mating and development were done at 23 °C
The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when females of both species were maintained at low temperatures for aging while mating and development were done at 23 °C
The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged at 23 °C, mated at the four different temperatures and developed at 23 °C
The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged at 23 °C, mated at the four different temperatures and developed at 23 °C
The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged and mated at 23 °C and developed at the four different temperatures
The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged and mated at 23 °C and developed at the four different temperatures
Effect of temperature on different stages of life cycle in (A) D. santomea, (B) D. yakuba mainland line and (C) D. yakuba island line
Recombination and the effectiveness of selection on two beneficial mutations on two different chromosomes
Recombination and the effectiveness of selection on a beneficial mutation on a chromosome with strong deleterious mutations
The effectiveness of selection on genes with different recombination rates
Recombination and Double Strand Repair (DSB) pathways during meiosis
4.5 Diagrammatic representation of the steps involved in creating Illumina library and analyzing the sequenced reads
A typical recombination map generated from a single RAIL fly (Recombinant Advanced Intercross Lines) of the cross RAL 375 x RAL 208 using 9,000 informative SNPs
Rates of crossing-over (CO) in D. melanogaster for X chromosome based on data from all crosses
Rates of crossing-over (CO) in D. melanogaster for left arm of the 2nd chromosome based on data from all crosses
Rates of crossing-over (CO) in D. melanogaster for right arm of the 2nd chromosome based on data from all crosses
Rates of crossing-over (CO) in D. melanogaster for left arm of the 3rd chromosome based on data from all crosses
Rates of crossing-over (CO) in D. melanogaster for right arm of the 3rd chromosome based on data from all crosses
(a) Rates of crossing-over (CO) in D. melanogaster for eight different crosses for X chromosome (b) Estimate of Index of Dispersion (ID_CO) for CO rates along chromosomes
(a) Rates of crossing-over (CO) in D. melanogaster for eight different crosses for 2L chromosome (b) Estimate of Index of Dispersion (ID_CO) for CO rates along chromosomes
(a) Rates of crossing-over (CO) in D. melanogaster for eight different crosses for 2R chromosome (b) Estimate of Index of Dispersion (ID_CO) for CO rates along chromosomes
(a) Rates of crossing-over (CO) in D. melanogaster for eight different crosses for 3L chromosome (b) Estimate of Index of Dispersion (ID_CO) for CO rates along chromosomes
(a) Rates of crossing-over (CO) in D. melanogaster for eight different crosses for 3R chromosome (b) Estimate of Index of Dispersion (ID_CO) for CO rates along chromosomes
Maximum likelihood estimates of GC initiation (ρ) and length of GC tracts (L)
Rates of gene conversion (GC) initiation (ρ) in D. melanogaster across the 3L chromosomal arm based on data from all crosses
Diagrammatic representation of advancement made in recombination maps by this study

CHAPTER 1 INTRODUCTION

Diversity is an attribute of natural populations. One of the most important factors contributing to phenotypic diversity is the presence of genetic variation between individuals of the population. The level of such variation within a population is the result of the combined effects of mutation, selection, random genetic drift and recombination. Mutation is one of the fundamental forces in evolution as it generates variability and ultimately enables evolutionary change through the effects of selection, drift and recombination. The rate at which mutations occur at a given locus is an important parameter in evolution, with mutation rates known to vary between different taxa, ranging from 7.2 × 10^-7 to 7.2 × [...] per base pair per generation (Hamilton 2009). The persistence of mutations in any population depends on the balance between mutation rate, random genetic drift and selection, and determines the level of polymorphism at different loci across genomes.

Population size is another important parameter that needs to be taken into account when trying to understand the dynamics of mutations in a population, possibly the most important one known to act on every mutation of every species. The size of the population that is relevant for evolution, and thus used in population genetic analyses, is termed the effective population size. The effective population size is different from, and in almost all cases smaller than, the census population size. While the census population size is the total number of surviving individuals, the effective population size can be generally described as the total number of individuals that contribute to the gene pool of the next generation. Several factors cause the effective population size to differ from the census size. Inbreeding and random variation in the number of offspring produced from one generation to the next can reduce the effective population size rapidly and drastically below the actual population size. The effective population size can also vary within one species, varying among chromosomes (autosomes, X chromosomes and Y chromosomes) and organelles. Finally, as detailed below, the effective population size can be affected by the interplay between selection and recombination or linkage (Charlesworth 2009).

The consequences of the effective population size on rates and patterns of evolution are manifested in terms of the stochastic process associated with finite population sizes and the random sampling of alleles from the parental generation to produce an offspring generation that will necessarily differ from the parental. This random process, active in all populations, is called random genetic drift, and the magnitude of its effects is inversely proportional to the effective population size (random genetic drift increases as the effective population size decreases and vice versa). Random genetic drift plays a role in all species, even in those with extremely large populations.

Mutations can be characterized into different classes according to their fitness consequences on individuals: deleterious, neutral, and beneficial. A strongly deleterious mutation, influenced by negative or purifying selection, will be removed from the population and play a minimal role in the observed variation within a species. Strongly beneficial mutations provide clear fitness advantages and their probability of fixation is strongly influenced by their selection coefficient (s). Although these beneficial mutations also play a minimal role in levels of polymorphism themselves, they can influence levels of variation in adjacent genomic regions depending on the rate of recombination (see below). Slightly deleterious and slightly beneficial mutations linger in the population for a long time, with sojourn times to fixation or loss that depend on both selection and random genetic drift. Neutral mutations do not provide any advantage or disadvantage to the population and their properties can be described purely in terms of random genetic drift. Other fitness classifications, such as balancing selection or other forms of frequency-dependent selection, depend on the genotypic composition at a given locus in diploid individuals or in a population, actively maintaining different alleles in the population.

It is important to note, however, that random genetic drift also plays a significant role in the probability of fixation of strongly beneficial mutations, albeit a smaller one than on weakly deleterious mutations. The relative balance between the importance of selection and random genetic drift shifts according to the effective population size; selection is more relevant than drift in species with large effective population size, and drift is more relevant in smaller populations. For instance, a nearly neutral beneficial mutation has a higher fixation probability, relative to a strictly neutral mutation, in a larger population than in a smaller one. The genetic structure and evolution of a population is thus mainly dependent on the interaction between mutation, selection, random genetic drift and effective population size.

1.1 Theoretical framework for understanding evolution

The initial framework for understanding evolutionary change was provided by Darwin in his book On the Origin of Species. The classical view of evolution proposed by Darwin stated that natural selection acts mainly on continuous variation present within species and that this variation could provide beneficial effects on fitness, ultimately being the basis of adaptive evolution. This idea of strongly beneficial mutations driving evolutionary change by positive ("Darwinian") selection was the guiding principle throughout the first half of the 20th century. It is now associated with adaptive evolution. Although important theoretical advances were proposed in the 1930s, the study of evolution changed in the mid-1960s, when analyses of proteins and protein-encoding loci were used to determine the amount of genetic variation present within populations. The results of these studies defined the new era of molecular evolution and molecular population genetics. Analyses of genetic variation in Drosophila by Lewontin and Hubby (1966) and in humans by Harris (1966) indicated that there was more genetic variation present in these populations than previously assumed. Kimura (1968) calculated that, with the amount of genetic variation observed by these authors, there should be on average one nucleotide substitution every 1.8 years, or roughly one every 0.5 generations, assuming 4 years per generation for mammals. This rate was significantly higher than the previously calculated rate of 1 base pair change in 300 generations (Haldane 1957). This difference is not trivial, for it indicates that a species would not survive if these mutations were strongly selected for or against. The presence of such a high number of changes suggested that these mutations, or a majority of them, had to have neutral effects on fitness. Kimura proposed that random genetic drift acting in finite populations on neutral mutations explained the presence of such high levels of molecular diversity within populations. A similar idea was also put forward by King and Jukes (1969). They showed that the regions of a protein which are functionally important evolve at a slower rate than the functionally less important ones, indicating the importance of random genetic drift in maintaining the diversity in populations.
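The comparison between the two rates can be made explicit with a short calculation. The following Python sketch is illustrative only: it uses the figures quoted above (one substitution every 1.8 years, 4 years per generation, and one substitution per 300 generations), and the constant and function names are hypothetical rather than taken from any of the cited works.

```python
# Figures as quoted in the text; names are illustrative assumptions.
YEARS_PER_SUBSTITUTION = 1.8          # Kimura (1968), as cited
YEARS_PER_GENERATION = 4.0            # generation time assumed for mammals
HALDANE_GENERATIONS_PER_SUB = 300.0   # Haldane (1957), as cited


def generations_per_substitution(years_per_substitution, years_per_generation):
    """Convert an interval between substitutions from years to generations."""
    return years_per_substitution / years_per_generation


# One substitution every 1.8 years corresponds to one every 0.45 generations.
kimura_generations_per_sub = generations_per_substitution(
    YEARS_PER_SUBSTITUTION, YEARS_PER_GENERATION
)

# How many times faster the observed rate is than the earlier estimate.
fold_difference = HALDANE_GENERATIONS_PER_SUB / kimura_generations_per_sub

if __name__ == "__main__":
    print(f"Generations per substitution: {kimura_generations_per_sub:.2f}")
    print(f"Fold difference relative to 1 per 300 generations: {fold_difference:.0f}")
```

Under these figures the two estimates differ by more than two orders of magnitude, which is the discrepancy that motivated the neutral interpretation.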

Neutral theory of molecular evolution

The discovery of large amounts of variation at the molecular level, first in proteins and later in DNA sequences, necessitated including non-Darwinian forces to describe evolutionary change. The neutral theory provided an alternative to Darwin's explanation for the persistence of variation in populations. It also proposed and described how the fate (fixation or loss) of neutral mutations is determined by random genetic drift, thus explaining evolutionary change that depends on the mutation rate while being independent of the population size (Kimura 1983). Kimura's theory, however, is not one of pure neutrality and evolution by chance due to genetic drift; it accepts mutations with fitness effects. Advantageous mutations occur, but they are so infrequent that they do not contribute to the genetic variation present within populations and contribute only minimally to the genetic variation observed between species. Kimura also accepted deleterious mutations as being a detectable fraction of all mutations but proposed that these mutations are strongly deleterious, thus playing no role in the observed variation within and between species. Mutations with a selection coefficient (s) much smaller than the reciprocal of the effective population size (N), that is, s << 1/N, effectively behave like neutral mutations. Studies on protein polymorphism further revealed that there is variation in the rate of evolution among proteins. The amino acid substitution rate was found to vary in different proteins from 9 × 10^-9 substitutions per site per year in fibrinopeptide, to the slowest in histone IV (Dickerson 1971). It hence became apparent that the rate of evolution was different for functionally less important and functionally more important proteins, with mutations in functionally more important parts of proteins being removed by natural selection.
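The condition s << 1/N, and the independence of the neutral substitution rate from population size, can be illustrated numerically. The sketch below is not part of the analyses in this thesis. It assumes a diploid Wright-Fisher population in which the effective and census sizes are equal, genic (additive) selection, and a new mutation starting at frequency 1/(2N), and it uses the standard diffusion approximation for the fixation probability usually attributed to Kimura (1962). The function names are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Diffusion approximation for the fixation probability of a new mutation.

    Assumptions (not from the thesis): diploid population of size n with
    effective size equal to census size, genic selection with coefficient s,
    and an initial frequency of 1 / (2n).
    """
    p0 = 1.0 / (2.0 * n)
    if s == 0.0:
        return p0  # strictly neutral: fixation probability equals frequency
    x = 4.0 * n * s
    if x < -700.0:
        return 0.0  # strongly deleterious; also avoids overflow in expm1
    # (1 - exp(-x * p0)) / (1 - exp(-x)), written with expm1 for accuracy
    return math.expm1(-x * p0) / math.expm1(-x)


def relative_substitution_rate(s, n):
    """Substitution rate divided by the mutation rate: 2n * P(fixation).

    For strictly neutral mutations this equals 1 for any population size.
    """
    return 2.0 * n * fixation_probability(s, n)


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        for s in (0.0, 1e-6, -1e-6):
            rate = relative_substitution_rate(s, n)
            print(f"N={n:>9,} s={s:+.0e} rate/mutation rate={rate:.4f}")
```

Under these assumptions, a mutation with s = 10^-6 is substituted at essentially the neutral rate when N = 1,000 (s << 1/N), but about four times faster than neutral when N = 10^6, while the deleterious counterpart (s = -10^-6) is substituted at less than one tenth of the neutral rate in the larger population.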

Nearly neutral theory of molecular evolution

The neutral theory was later modified by Ohta (1972, 1973, 1974, and 1992) to include a significant contribution of borderline (nearly neutral) mutations to variation within and between species. These nearly neutral mutations have selection coefficients in a range between the completely neutral and the strongly selected. Ohta also proposed that most of these weakly selected mutations are slightly deleterious, with selection coefficient s ≈ 1/2N. The fate of this class of nearly neutral mutations is affected both by random genetic drift and by selection. When selection is weak (relative to ~1/N), drift determines the fate of these slightly deleterious mutations. Importantly, mutations with the same selection coefficient would be removed or fixed by selection if they occurred in a larger population. The nearly neutral theory thus considered effective population size as one of the main parameters determining the fate of most mutations. Whether molecular data fit the neutral theory or the nearly neutral theory better depends on the perspective and on the type of sequences or changes under study. It is now known that mutations in noncoding regions of the DNA, in pseudogenes or at synonymous sites in coding regions evolve under, or close to, neutrality, while a measurable fraction of mutations altering amino acid sequences (nonsynonymous mutations) are nearly neutral with s ≈ 1/2N (Ohta and Gillespie 1996).

Applications of the neutral theory

The neutral theory of molecular evolution has significantly impacted the field of population genetics. Even if it can now be considered too extreme to explain all observed data, it plays an important role as a null model for population genetic analyses

when trying to infer and explain the evolutionary trajectory of a population. This idea was aptly summarized by Martin Kreitman in his article entitled "The neutral theory is dead. Long live the neutral theory" (Kreitman 1996).

1.2 Studies of adaptation

Studies of adaptation at molecular level

Several methods have been developed to detect and understand the type of selection acting on DNA and protein sequences. One such method relies on the comparison between different types of sites within codons. Nonsynonymous sites are base pair positions at which changes can result in a different amino acid in the protein. Synonymous sites, on the other hand, are positions in a codon at which a mutation would not change the amino acid in the protein. Adaptive processes acting at the level of protein sequences can be investigated by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions in the protein coding regions of any gene. The premise of this comparison is that, when orthologous genes are compared between related species, synonymous sites will describe neutral tendencies. Hence, if a protein is evolving under neutrality, the rate of nonsynonymous substitutions should be equal to the rate of synonymous substitutions. Alternatively, a nonsynonymous to synonymous rate ratio greater than one indicates that positive Darwinian selection (adaptive evolution) is acting on amino acid sequences. When the nonsynonymous to synonymous rate ratio is lower than one, it is suggestive of purifying or negative selection acting on amino acid sequences. Besides protein sequence evolution, adaptation can also impinge on levels or patterns of gene expression.
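The distinction between synonymous and nonsynonymous sites and changes described above can be made concrete with a short sketch. The code below is illustrative only and is not the method used in this thesis (which relies on maximum likelihood). It counts sites per codon as the fraction of single-base changes that preserve the amino acid, counts only codons that differ at a single position, treats changes to stop codons as nonsynonymous, and applies no correction for multiple hits. All function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def split_codons(seq):
    """Split an in-frame, gap-free, upper-case coding sequence into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    hits = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    hits += 1
    # Each position has 3 possible changes, so each hit is worth 1/3 site.
    return hits / 3.0


def count_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous) counts for single-change codons."""
    syn = nonsyn = 0
    for c1, c2 in zip(split_codons(seq1), split_codons(seq2)):
        if sum(x != y for x, y in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn


def crude_omega(seq1, seq2):
    """Uncorrected nonsynonymous/synonymous ratio, or None if undefined."""
    codons1, codons2 = split_codons(seq1), split_codons(seq2)
    s_sites = (sum(map(synonymous_sites, codons1))
               + sum(map(synonymous_sites, codons2))) / 2.0
    n_sites = 3 * len(codons1) - s_sites
    syn, nonsyn = count_differences(seq1, seq2)
    if syn == 0 or s_sites == 0:
        return None
    return (nonsyn / n_sites) / (syn / s_sites)
```

A ratio near one would be read as consistent with neutrality, below one as purifying selection, and above one as positive selection, as described in the text. Real analyses should correct for multiple substitutions and unequal transition and transversion rates.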

Changes in gene expression or protein coding sequence have been observed in a number of species as a response to changes in environment. In cichlid fishes living in clear and turbid lakes of East Africa, spectral tuning of rods and cones is accomplished by shifts in the wavelength of peak absorbance, usually due to amino acid substitutions or changes in gene expression in the corresponding opsin (Kocher 2004). In Dissostichus mawsoni, a fish inhabiting cold Antarctic waters, several gene families have undergone duplication and an increase in gene expression as a response to cold temperatures (Chen et al. 2008). Adaptive evolution in protein coding regions of olfactory and gustatory receptor genes has occurred in two species of the melanogaster subgroup, Drosophila sechellia and Drosophila erecta, as a response to host specialization when compared to their generalist relatives. Drosophila santomea and its sister species D. yakuba separated recently (400,000 years ago). Drosophila yakuba is widespread in Africa, whereas D. santomea is endemic to the small volcanic island of São Tomé in the Gulf of Guinea. D. santomea formed after colonization of the island by a D. yakuba-like ancestor. Both species can now be found on the island of São Tomé due to a secondary colonization by D. yakuba, and they are distributed there along an altitudinal cline. Importantly, the two species now differ drastically in their habitat and therefore offer the opportunity to study adaptation to a new set of biotic/abiotic conditions. In the second chapter, I present and discuss the results obtained from the analysis of selection on protein evolution in the lineage leading to D. santomea. Analysis conducted by A. Llopart (personal communication) showed that a group of genes involved in the detection of external stimuli have changed significantly in

expression between D. santomea and its closest relative D. yakuba. I analyzed the nonsynonymous to synonymous rates in the protein coding regions of this group of genes to understand whether, along with expression, adaptive evolution has also occurred at the protein sequence level.

Studies of adaptation to temperature variation

Temperature plays an important role in habitat selection and distribution of many species. Seasonal variation in temperature affects the latitudinal diversity in hylid frogs by limiting the dispersal of many tropical species into temperate regions (Wiens et al. 2006). Temperature also plays an important role in distribution and habitat selection in many freshwater turtles (Tamplin 2006; Schofield et al. 2009). The distribution of the nitrogen-fixing cyanobacteria Trichodesmium spp. in tropical and temperate waters is dependent on temperature. Temperature also affects the distribution of many species of Drosophila (Kimura 2004; Kellermann et al. 2009). Even the widely distributed, cosmopolitan Drosophila melanogaster and D. simulans differ significantly in their tolerance of extreme temperatures. D. simulans tolerates colder temperatures better than D. melanogaster and, conversely, D. melanogaster tolerates higher temperatures better than D. simulans (Chakir et al. 2002; David et al. 2005). The differences in habitat preferences and distributions of D. santomea and D. yakuba on the island of São Tomé provide an opportunity to study adaptive changes in D. santomea beyond molecular analyses. It was especially interesting to understand whether changes other than those affecting gene expression and protein coding regions have occurred in D. santomea. The notable temperature difference between the habitats of these two species suggested that temperature could be one of the factors affecting the

distribution of the species. In the third chapter, I discuss experiments designed to estimate the fitness difference between D. santomea and D. yakuba with respect to the ability of these species to tolerate cold temperature.

1.3 Evolution and Recombination

Single-site models of molecular evolution

One of the important theoretical advances associated with the neutral theory was the development of single-site models of evolution (Kimura 1962; Kimura and Ohta 1969a and 1969b). These models have helped to address important questions in population genetics, including the probability of fixation or loss of a mutation in a population and the effect of different mutations on intraspecific variation (diversity) and on divergence between species (Loewe and Hill 2010 and references therein). In these models, the fate of a mutation at a site is considered independent of other sites in the genome (i.e., single site). These independent sites can be affected by selection (including positive, negative or stabilizing), random genetic drift, back mutations and migration. These sites, however, are not affected by mutations at other sites of the genome, including those under selection. That is, the neutral model, in its simplest form, does not include genetic linkage, even when two mutations are very close to each other in the genome, as would be the case for mutations in adjacent genes or even within the same gene.

Linkage models of molecular evolution

While single-site models consider independent sites, linkage between sites certainly occurs and can influence the evolutionary trajectories of both neutral mutations and

mutations under selection. The most classic, and still valid, argument on the effects of recombination on the fate of mutations (Fisher 1930) posits that favorable alleles that arise in different chromosomes in a population will compete with one another; without recombination only one will be fixed while the other, despite providing a fitness advantage, will be lost. The evolutionary consequences of linkage on the behavior of mutations are studied by taking into consideration not only genetic recombination between mutations but also selection at one or multiple mutations and finite population size (Felsenstein 1974; Comeron et al. 2002; Keightley et al. 2006; Comeron et al. 2008). Several theories and models have been proposed to describe the evolutionary fate of mutations under linkage. The main conclusion of these models is that, when sites are linked, the consequences are similar to a reduction in the effective population size (Hill-Robertson interference). This is because selection acting on one site reduces the fixation probability at a linked site (Hill and Robertson 1966), the same effect that a smaller effective population size would have. Genetic recombination reduces the linkage between sites and thus increases the effectiveness of selection (Fig 1.1). A number of population genetics models have been proposed, all associating absent or reduced recombination with reduced fitness and differing in their assumptions about the relative presence of beneficial and deleterious mutations. Below, I briefly describe two of these models. The background selection model was proposed by Charlesworth et al. (1993). It states that the removal of strongly deleterious mutations from a population will also result in the removal of the linked variation, which will cause population dynamics equivalent to a reduction in the effective population size (Fig 1.2a). Recombination, however,

reduces genetic linkage between deleterious and adjacent mutations, and therefore the reduction in effective population size is not as severe as it would be in the absence of recombination. Muller's ratchet is another theoretical model proposed to describe the decrease in fitness of a population in the absence of recombination (Muller 1964). In finite populations, random genetic drift can eliminate the best genotypes. Recombination can restore these genotypes by bringing together the remaining genotypes. Due to the prevalence of deleterious mutations in natural conditions and the inevitable action of random genetic drift, populations without recombination can only accumulate deleterious mutations with time (Fig 1.2b). In finite populations, with mutation, selection and drift operating, recombination is advantageous as it increases the effectiveness of selection. Notably, models assuming infinite population size can reach opposite conclusions, but these models, albeit mathematically more tractable, are hardly relevant to natural populations (Otto 2009). To gain detailed insight into the consequences of recombination, however, it is important to have accurate measures that are relevant to natural populations. There is a need to understand how much variation in recombination rate exists both within species and across the genome at a fine scale. There is also a need to quantify recombination separately for the two outcomes of double strand break repair: crossing over and noncrossing over recombination (i.e., gene conversion).

Fine-scale recombination in D. melanogaster

In the fourth chapter, the results of the experiments carried out to understand the fine scale variation in the rate of recombination in D. melanogaster are described. While recent studies have enabled the quantification of recombination rate throughout the

genome and have unveiled fine-scale variation in the yeast Saccharomyces cerevisiae (Mancera et al. 2008), studies of fine-scale recombination rate variation in multicellular eukaryotes are either completely lacking, use low marker densities or focus on a limited genomic region (Fig 1.3). There is no recombination map in any multicellular eukaryote that locates recombination events at the level of individual genes (a resolution of 10 kb or better) at a whole-genome level. Also, the markers used in these studies are not dense enough to capture the contribution of gene conversion to total recombination. To this end, ultra-dense, whole-genome recombination maps were generated in D. melanogaster based on the direct quantification of crossing over and gene conversion events, using next-generation sequencing technology and mapping of single nucleotide polymorphisms (SNPs).

Figure 1.1 Advantages of recombination. (a) Clonal interference model. (b) Ruby in the rubbish model.
Source: Modified from Larracuente et al. (2008).
Note: (a) Clonal interference describes the process by which, without recombination, beneficial mutations on different backgrounds compete for fixation, which can lead to reduced adaptation and, in this case, the hitchhiking of linked deleterious mutations. (b) Ruby in the rubbish is a process where the lack of recombination leads to the loss of beneficial mutations appearing on deleterious backgrounds, leading to reduced adaptation.

Figure 1.2 Accumulation of deleterious mutations in nonrecombining populations.
Source: Modified from Bachtrog (2006).
Note: (a) In non-recombining populations, background selection will remove strongly deleterious mutations, which will also result in removal of the linked variation. (b) In the absence of recombination, many slightly deleterious mutations can get fixed in a population, resulting in reduced fitness.

Figure 1.3 Crossing over rates in the 2nd chromosome in Drosophila. (a) Based on several models and methods. (b) Based on SNP markers.
[Panel (a): y-axis in cM/Mb; plotted series: Kliman and Hey (KHP) 1993; Hey and Kliman (Rp) 2002; Hey and Kliman (R_TE) 2002; Marais, Mouchiroud, Duret 2001; Fiston-Lavier, Singh, Lipatov, Petrov (Marey).]
Source: (a) J. M. Comeron (personal communication). (b) Kulathinal et al. (2008).
Note: (a) Crossing-over rates (in centimorgans per megabase, cM/Mb) for D. melanogaster chromosomal arm 2L based on several models and methods. (b) Cross-over rate along the Drosophila pseudoobscura second chromosome, with bars indicating the 95% confidence interval of the estimated cross-over rate. The map was generated with 49 markers separated by distances of 130 kb to 1.7 Mb (median 466 kb).

CHAPTER 2
ADAPTATION TO A NEW ENVIRONMENT IN DROSOPHILA SANTOMEA: MOLECULAR EVOLUTION IN PROTEIN CODING GENES

2.1 Introduction

Studies on adaptation are important in understanding how organisms and populations survive under novel selective pressures. Indeed, studies of populations that have experienced new environments are ideal to investigate the relative role of various evolutionary forces, including mutation, selection, and random genetic drift, in adaptive processes. Further, these studies can also help to gain insight into the process of speciation (Kondrashov and Mina 1986; Barton and Mallet 1996). The ecological opportunity provided by a new unexplored territory can result in directional selection and phenotypic changes (Glor et al. 2005; Barrett et al. 2008). If there is spatial separation due to a geographical barrier, it could result in genetically based habitat preference and speciation (Coyne and Orr 2004). The study of isolated taxa in new environments can provide insight into changes occurring at the molecular, physiological and morphological levels. At all these levels, evolutionary analyses need to separate changes that would have enabled the population to survive in its new environment (adaptive) from those that arise naturally due to the colonization process and subsequent random drift (neutral). Recently diverged species in island environments are excellent for studying the molecular changes associated with adaptation of an organism to a new environment, due to their high genetic and physiological similarity as well as the limited possibility that initial changes have been obscured by later ones. In this chapter, I discuss molecular changes that occurred in a recently diverged species in the melanogaster subgroup (D. santomea) after colonizing the island of São Tomé off the west coast of Gabon.

Examples of adaptive changes at molecular level

Adaptation in bacterial populations

In bacteria, long term (2000 generations) experimental studies have revealed that adaptation to a new environment could occur through a few mutations that provide a large increase in fitness compared with the ancestor (Lenski et al. 1991). However, other studies on bacteria show that the fitness advantage is mostly provided by the interaction or combination of mutations that previously did not offer any fitness advantage. Cryptic mutations (having no effect on phenotypes in a given genetic or environmental background) can provide fitness advantages by interacting with other mutations (epistasis) in a different environment (for a review of types of epistatic interactions see Moore 2005). Hence, adaptation not only results from the appearance of new beneficial mutations but could also result from epistatic interactions, with combinations of extant mutations increasing fitness when the environment fluctuates (Woods et al. 2011).

Adaptive changes in gene expression

In higher organisms, adaptation at the molecular level has been studied using a comparative approach, mostly investigating changes in gene expression and protein coding regions in related species. For instance, changes in gene expression as a response to low temperatures have been observed in the Antarctic fish Dissostichus mawsoni. In these fishes, adaptation to colder temperatures has resulted in instances of both upregulation of gene expression and gene duplication. Over 177 protein families show upregulation in gene expression, many of which (118 protein coding genes) have also undergone gene duplication (Chen et al. 2008). Expression changes have also been detected in cone opsin,

rhodopsin and rhodopsin-like genes in various closely related cichlid fishes living in clear and turbid lakes of East Africa as a result of adaptation to different photic environments (Carleton and Kocher 2001; Spady et al. 2005). In Drosophila, changes in the basal level of expression are observed in many genes in lines selected for resistance against different kinds of environmental stresses, including heat shock, cold shock and desiccation (Sorensen et al. 2005; Telonis-Scott et al. 2009).

Adaptive changes in protein coding regions

Adaptive changes at the level of protein sequences can be detected by comparing rates of protein evolution to neutral rates. At a practical level, rates of protein evolution can be estimated by the number of nonsynonymous changes per nonsynonymous site (dN or Ka), and neutral rates are often estimated based on the rate of synonymous changes per synonymous site (dS or Ks). The ratio dN/dS (or Ka/Ks) is therefore an indication of how fast a protein sequence is evolving relative to neutral expectations, with high dN/dS (Ka/Ks) suggesting the action of positive selection and adaptation. An increased Ka/Ks ratio due to positive selection has been observed in genes involved in immunity (Hughes and Nei 1988; Hughes and Yeager 1998) and in genes involved in mating behavior, fertilization and spermatogenesis or sex determination (Civetta and Singh 1995, 1998). For instance, Llopart and Comeron (2008) showed that the roughex gene, a dose-dependent regulator of Drosophila spermatogenesis, exhibits recurrent adaptive evolution for amino acid change, concurrently in different Drosophila lineages. Adaptive evolution has been observed in several taxonomic groups in many genes that mediate sexual reproduction, such as those involved in gamete recognition, and could be involved in establishing reproductive barriers (Swanson and Vacquier 2002). Greenberg et al. (2003) demonstrated

that the increase in cold tolerance in the cosmopolitan D. melanogaster occurred as a result of fixation of a desaturase2 allele which is absent in the ancestral Zimbabwean population. Adaptive evolution in genes involved in chemosensory perception is believed to be responsible for host specialization in many Drosophilid species. D. sechellia and D. erecta have evolved independently to specialize exclusively on the fruits of the plants Morinda citrifolia and Pandanus candelabrum, respectively, while their generalist relatives exploit a broad array of fruits. In both D. sechellia and D. erecta, higher rates of protein evolution are observed in intact olfactory receptor (OR) and gustatory receptor (GR) genes, suggesting strong selection for adaptive changes in these genes. These species, however, have also experienced higher rates of non-random, functionally related OR and GR gene loss, suggesting that host specialization relaxed the selection pressure for OR and GR gene maintenance, which could also explain high rates of protein evolution without invoking adaptive changes (McBride et al. 2007). D. grimshawi, another insular species, found in Hawaii, has undergone extensive duplication and pseudogenization in its OR genes, likely due to the colonization of its new environment (Gardiner et al. 2008).

Adaptive changes in both gene expression and protein coding region

Some studies have investigated the correlation between gene expression changes and changes at the level of the protein sequence. Nuzhdin et al. (2004) obtained a positive correlation between expression divergence and nonsynonymous substitutions in a study of 156 orthologous genes between D. melanogaster and D. simulans, indicating a

coupling between gene expression and protein sequence evolution. After removing the effects of protein-protein interactions, mRNA abundance and protein length, Lemos et al. (2005) concluded that positive coupling could be attributed to similar selective pressures acting on gene expression and protein sequence. A positive correlation between gene expression divergence and protein evolution has been observed in male-biased and female-biased genes in the D. simulans - D. melanogaster comparison (Pröschel 2006; Begun et al. 2007). In a human-chimpanzee comparison, genes expressed in testis show an excess of expression differences and also of amino acid substitutions due to positive selection (Khaitovich et al. 2005). While the roles of gene expression and protein coding regions in adaptation have often been studied separately, there is a need to understand the significance of these changes together in the process of adaptation. Previous studies that investigated statistical coupling between gene expression and protein evolution did not investigate the adaptive significance of such a correlation. Also important, most of these analyses compared species that are either not very close (e.g., D. melanogaster and D. simulans; see above) or do not exhibit clear differences in habitat, thus making inferences about adaptive significance difficult. The D. santomea-D. yakuba pair is an excellent model system to understand the role of adaptive changes in gene expression and protein coding sequence together, as it allows for correlating these changes to a new environment. Signatures of positive selection in protein coding sequences of the external stimuli genes, known to have changed gene expression via adaptation (see Chapter 1 and below), will indicate that both gene expression and protein coding regions are responding similarly to the changed

environment (darker, colder and more humid) of D. santomea, when compared with D. yakuba.

The novel habitat of D. santomea

In this study, the adaptive changes that occurred in a recently diverged island species, D. santomea, were investigated. The common ancestor of D. yakuba and D. santomea colonized the island of São Tomé off the west coast of the African mainland, 225 km from Cameroon and Gabon. D. santomea and D. yakuba are sister species belonging to the melanogaster subgroup. These two species occupy different habitats and are estimated to have diverged around 0.4 million years ago (Cariou et al. 2001; Llopart et al. 2002b). Even though these are recently diverged species, they have striking morphological differences. These include the absence of dark abdominal pigmentation in D. santomea (Lachaise et al. 2000; Llopart et al. 2002a and 2002b), changes in male genital morphology and differences in the number of sex comb teeth (Lachaise et al. 2000; Coyne et al. 2004). On the island of São Tomé, D. santomea inhabits montane rainforest along the volcanic slopes, in an environment with low temperature, low light and high humidity. D. yakuba, on the other hand, is distributed throughout sub-Saharan Africa and is found in semiarid areas and grasslands. D. yakuba is also found on São Tomé due to a secondary colonization event that occurred when Portuguese colonists turned large sections of coastal rainforest into plantations in the last 500 years (Cariou et al. 2001; Llopart et al. 2005b). On the island of São Tomé, D. yakuba is found below 1450 m and D. santomea is found above 1150 m along the volcanic mountain Pico de São Tomé (Lachaise et al. 2000; Llopart et al. 2005a). The restricted distribution to higher elevations on the island and

striking differences in the habitat of D. santomea when compared with D. yakuba indicate that the species may have undergone adaptive changes to better survive in its new environment. While changes have occurred both in morphology and physiology in D. santomea compared with D. yakuba, this study focused on the molecular changes in the protein coding regions of a group of genes with the Gene Ontology term "Detection of External Stimulus". A study comparing genome-wide expression patterns (A. Llopart, personal communication) found that genes involved in the detection of external stimuli were significantly differentially expressed between D. santomea and D. yakuba. To understand whether adaptive changes have also occurred in the protein coding sequences of these genes, signatures of selection were investigated by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS). The premise of this comparison is that mutations at nonsynonymous sites can be either neutral or under selection (positive or negative), while mutations at synonymous sites are neutral. Hence, genes with protein sequences evolving neutrally show a rate of nonsynonymous substitutions equal to the rate of synonymous substitutions. Alternatively, a high nonsynonymous to synonymous rate ratio indicates positive Darwinian selection acting on amino acid changes. To understand whether the evolutionary pattern is unique to the external stimuli genes in the D. santomea lineage, the rate of evolution of these genes was also compared to that of a group of genes selected randomly with regard to function. This comparison was necessary to distinguish the signature of positive selection from that of relaxed selection in the D. santomea lineage. Since D. santomea is a recently diverged species endemic to the

island of São Tomé, selection is expected to have been relaxed due to the initial colonization process and the likely reduced initial population size. This signature, however, should be observed throughout the D. santomea genome: if selection has been relaxed in the D. santomea lineage after the split from the D. yakuba lineage, the signature of relaxed selection should be observed in both groups of genes. However, if external stimuli genes are providing an advantage to D. santomea in its new environment, a higher rate of evolution will be observed only in the external stimuli genes, and not in functionally random genes.

2.2 Materials and methods

Sequencing

The protein coding sequences of the external stimuli genes of D. melanogaster were first obtained from Flybase. Orthologous sequences from other Drosophila species (D. simulans, D. erecta and D. yakuba) were then obtained from the UCSC Genome Browser with the BLAT alignment tool, using the D. melanogaster sequence as the query. For each species, the sequence with maximum identity to the D. melanogaster sequence was used for further analysis. A total of 32 genes were PCR amplified and sequenced in the D. santomea STO.4 strain. D. santomea STO.4 was collected in March 1998 from the Obo Natural Reserve on São Tomé Island in the zone of sympatry with D. yakuba (Lachaise et al. 2000). PCR reactions were performed using primers designed based on D. yakuba sequences obtained from the UCSC Genome Browser. Cleanup of the PCR fragments was done using the Wizard MagneSil system (Promega, Madison, WI). Both strands were sequenced directly on a 3730 DNA analyzer using the Big Dye 3.1 chemistry (Applied Biosystems, Foster City,

CA). Sequencher 4.1 software was used to edit the sequences, remove the primers and assemble the contigs. Sequencing was done on complementary DNA for 27 genes and on genomic DNA for 5 genes. Multiple sequence alignments were obtained using ClustalW (Thompson et al. 1994). The aligned sequences were saved in Phylip format to be used for further analysis with the programs PAML v4b (Phylogenetic Analysis by Maximum Likelihood) (Yang 1997; Nielsen and Yang 1998; Yang 2000) and Rasmus Nielsen's CodonBias (Nielsen et al. 2007). The combined length of the 32 sequenced external stimuli genes was 39,828 bp. The group of functionally random genes consisted of 33 genes previously sequenced in D. santomea (Llopart et al. 2002, 2006), whose sequences were downloaded from GenBank, and an additional 91 genes that were sequenced following the same PCR and sequencing protocols described for external stimuli genes. In total, 124 genes with a total length of 49,794 bp were analyzed and compared with the external stimuli genes. For all 124 genes, the coding sequence of D. melanogaster was downloaded from Flybase. D. simulans, D. erecta and D. yakuba sequences were obtained from the UCSC Genome Browser as mentioned above, and alignment was conducted as described earlier.

Maximum likelihood methods to detect selection on the D. santomea lineage

PAML program

Phylogenetic analysis was first done using the program PAML on both groups of genes: external stimuli and functionally random. PAML implements a maximum likelihood method to detect the effect of selection on protein sequence evolution using

divergence data. Briefly, PAML uses a maximum likelihood statistical approach to fit different codon substitution models to protein-coding sequence data and compare them. The models implemented in PAML include purifying selection (the ratio of nonsynonymous rate to synonymous rate is between 0 and 1), neutral evolution (the ratio is 1) and positive selection (the ratio is greater than 1). These models treat the ratio of nonsynonymous to synonymous substitution rates (dN/dS, denoted by the symbol ω) at any site (codon) in the gene as a random variable drawn from a statistical distribution. This allows ω to vary among the codons of a gene (Yang 2007). Selection on a gene is assessed by obtaining the proportion of sites with different ω values. PAML estimates the maximum likelihood value of a model given the data. The statistical significance of a model describing the data can be tested using a likelihood ratio test (LRT). The LRT statistic is twice the log likelihood difference between the two compared models, an alternative and a null. Under the null model, this statistic approximately follows a χ² distribution. Hence, the p-value for the comparison can be obtained from the χ² distribution with degrees of freedom equal to the difference in the number of parameters between the two models (Yang 1997, 2007).

PAML branch model

In this study, the possibility of adaptive evolution in the D. santomea lineage was investigated using branch models. Analyses with PAML require the phylogenetic relationship (tree) of the species to be provided. In our case, the initial phylogenetic tree was obtained based on the results of the program MEGA (Tamura et al. 2007). Branch models implemented in PAML allow ω to be estimated in specific branches of the phylogenetic tree.
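The likelihood ratio test described above, which underlies every model comparison in this chapter, can be sketched in a few lines. This is an illustrative sketch rather than the code used in the thesis. It takes the maximized log likelihoods reported by a program such as PAML as inputs and, to stay within the Python standard library, evaluates the χ² tail probability in closed form only for 1 or 2 degrees of freedom. The function names and the example values in the usage note are hypothetical.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood difference between alternative and null."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_p_value(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with df = 1 or 2.

    Closed forms are used so that no external library is needed; other
    degrees of freedom are deliberately not handled in this sketch.
    """
    if statistic <= 0:
        # The alternative did not improve the fit; no evidence against the null.
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def lrt(lnl_null: float, lnl_alt: float, df: int):
    """Return the LRT statistic and its chi-square p-value."""
    stat = lrt_statistic(lnl_null, lnl_alt)
    return stat, chi2_p_value(stat, df)
```

For example, with hypothetical log likelihoods of -1002.0 for a null model with one ω and -1000.0 for an alternative with one additional ω parameter, `lrt(-1002.0, -1000.0, df=1)` gives a statistic of 4.0, which is slightly beyond the 5% critical value of 3.84 for one degree of freedom.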

Two different branch models were used. First, a branch model that allows a separate ω ratio to be estimated only for the D. santomea branch was used. This model (the alternative model) was compared with a null model in which a single ω was shared by all the branches of the phylogeny. The likelihood values of the alternative and null models were then compared with the LRT. A second, more realistic, model was also investigated; it estimated ω in the D. yakuba and D. santomea branches separately, while maintaining the same ω for all other branches of the phylogenetic tree. This alternative model was compared to a null model in which the D. yakuba and D. santomea branches had the same ω while all the other branches had a single, possibly different, ω. This second model was investigated to better understand changes in ω that occurred in D. santomea when compared to its sister species D. yakuba after they diverged from their common ancestor.

All thirty-two genes in the detection of external stimuli group were analyzed individually and as a concatenated sequence with the two models mentioned above. For the functionally random group of genes, only genes with sequence length greater than 270 bp (70 genes) were investigated individually with the two models. The concatenated sequence of functionally random genes was obtained by combining the sequences of the above-mentioned 70 genes and the partial sequences of another 54 genes.

PAML branch-site model

Even though the branch model mentioned above estimates ω in specified branches of the phylogeny, it does not estimate ω for individual codons of that branch. Thus, a branch-site model implemented in PAML that detects positive selection on specific codons in the D. santomea lineage was also used. In this model, codons in the D. santomea lineage are checked for positive selection by designating the D. santomea branch as

foreground branch and the remaining branches as background branches. The model assumes four classes of sites (codons). Site class 0 includes codons under purifying selection throughout the tree, with ω between 0 and 1 (0<ω<1). Site class 1 includes codons with ω equal to 1. Site classes 2a and 2b include codons that are conserved or neutral in the background branches but under positive selection (ω>1) in the foreground branch. This is the alternative model, which is compared with a null model. In the null model, ω for site classes 2a and 2b in the D. santomea branch is fixed to 1, indicating neutral evolution. The rest of the site classes in all branches evolve either under purifying selection or under neutral evolution. An LRT can then be conducted to check whether the model with positive selection fits significantly better than the neutral model (Zhang et al. 2005).

Codonbias program

One limitation of using the program PAML in this type of analysis is that synonymous changes are assumed to be neutral when estimating ω. However, studies have shown that synonymous sites are not completely neutral in Drosophila, with a significant bias in the usage of synonymous codons (Ikemura 1981; Akashi 1997). To correct for codon usage bias, we used the maximum likelihood method implemented in the program Codonbias (Nielsen et al. 2007). Codonbias accounts for selection on synonymous codons while calculating ω. Codonbias, however, requires the preferred codons in the species to be provided by the user. To determine the preferred codons in the D. yakuba–D. santomea lineage, CodonW 1.4 (John Peden) was used (Table 2.1). Branch models similar to the ones

investigated with PAML can also be tested with Codonbias, again using likelihood ratio tests. The reason for analyzing the data with both programs, PAML and Codonbias, was that Codonbias determines selection acting on specific branches of the phylogeny, whereas PAML can estimate selection acting both on branches and on individual codons in a specific branch.

2.3 Results

Transition to transversion rate ratio

The ratio of transition to transversion rates (kappa) for the concatenated sequence of 156 genes was determined first and was found to be 1.98 using a one-ratio model, which assumes the same ω ratio for all branches in the phylogeny. Since PAML requires a phylogenetic tree to be specified, the initial tree used in PAML was generated with the program MEGA, with the concatenated sequence of 156 genes. The one-ratio model in PAML then determined the branch lengths of this tree (Fig. 2.1). These values for kappa and the branch lengths were used for all further analyses with PAML.

PAML branch model

Two branch models were tested with the concatenated sequences, one in which ω was determined only in the D. santomea branch (two-ratio model) and another in which ω was determined in both the D. santomea and D. yakuba branches (three-ratio model). For the external stimuli genes, ω was significantly greater in D. santomea in the two-ratio test (p=0.026). For the functionally random group of genes, ω in the D. santomea branch did not differ significantly from the background in the two-ratio test (p=0.18) (Table 2.2). In the three-ratio test, ω was significantly greater in D. santomea than in D. yakuba in both groups of genes (external stimuli genes, p=0.004; functionally random genes, p=0.035) (Table 2.3) (Fig. 2.3). In the external stimuli genes, ω increased 2.1 times in D. santomea when compared to D. yakuba, while in the functionally random genes the increase of ω in D. santomea was 1.5 times when compared to D. yakuba (Fig. 2.4). The increase in ω in D. santomea when compared to D. yakuba, in both groups of genes, indicates relaxed selection acting on the D. santomea genome. However, the greater increase in ω in external stimuli genes when compared to functionally random genes could suggest additional forces acting on external stimuli genes relative to genome-wide tendencies, possibly due to increased selection on this class of genes. To check whether these results are robust and to remove any confounding effect of codon usage bias, we further tested these genes with the Codonbias program (see below).

Analysis of individual genes (Fig. 2.2a and 2.2b) showed that two genes in the detection of external stimuli group, pinta and Gβ76C, have a significantly greater ω in the D. santomea lineage (two-ratio test: pinta p=0.0031, Gβ76C p= ; three-ratio test: pinta p=0.024, Gβ76C p=0.040). In the functionally random group of genes, one gene, , was significant in the three-ratio test in the D. yakuba branch (p=0.001), albeit in the opposite direction, evolving under stronger purifying selection in the D. santomea lineage.

PAML branch-site model

The analysis of the concatenated sequences of the external stimuli group of genes showed higher ω in the foreground D. santomea branch (ω=1.72). However, this value

was not significantly different from the null model with neutral evolution (ω=1). In the functionally random group of genes, ω was not greater in the same analysis.

Codonbias results

We further tested both groups of genes with the program Codonbias (Nielsen et al. 2007) to take into account selection on synonymous codons while calculating ω. Again two models were used, one to estimate ω only in the D. santomea branch and another to estimate ω in both the D. santomea and D. yakuba branches. Similar results were observed with Codonbias for both models. When ω was determined in the D. santomea lineage only, the concatenated sequence of external stimuli genes had greater ω than the rest of the phylogeny (LRT p=0.015). Conversely, the concatenated sequence of the random group of genes showed a significant decrease in ω in the D. santomea lineage (LRT p=0.005) (Table 2.4). When ω was determined in both D. santomea and D. yakuba separately, both external stimuli and functionally random genes had greater ω values in D. santomea when compared to D. yakuba (p=0.001 and respectively) (Table 2.5). Similar to the results from PAML, the increase in ω detected in the D. santomea branch was greater in external stimuli genes than in functionally random genes, 2.3 and 1.6 times, respectively (Table 2.5) (Fig. 2.7).

To test whether both groups of genes are evolving under relaxed selection or positive selection, the ω ratios of individual genes in both groups of genes were compared with a nonparametric test. If both groups of genes are evolving under relaxed selection in D. santomea, the increase in ω in external stimuli genes should not be significantly greater than the increase in ω in functionally random genes. However, if the increase in ω in external stimuli genes was significantly greater than the increase in ω in functionally

random genes, it would suggest an accelerated rate of evolution in the external stimuli genes. Note also that the application of this nonparametric test reduces the potential impact of outlier genes with exceedingly high or low ω. The comparison of ω values for individual genes shows a significant difference between the two groups of genes (Mann-Whitney U test, p=0.0407). The greater increase in ω in the external stimuli genes indicates that these genes are probably experiencing positive selection due to the adaptive advantage they provide.

Among the individual genes, pinta and Gβ76C again had significantly higher ω when compared to the rest of the phylogeny (two-ratio test: pinta, p=0.0046, Gβ76C p=0.002; three-ratio test: pinta, p=0.011, Gβ76C p=0.048) (Fig. 2.5). In the functionally random group of genes, one gene had a significant change in ω in the three-ratio test (p=0.0004). Similar to the results from PAML, this gene was evolving under purifying selection in the D. santomea lineage but has amino acid substitutions in the D. yakuba lineage (Table 2.4 and Table 2.5) (Fig. 2.6). As expected due to its larger effective population size, selection coefficients for synonymous mutations were higher in the D. yakuba lineage than in the D. santomea lineage in both models tested (Table 2.4 and 2.5). Results of PAML and Codonbias for individual genes are listed in the appendix (Tables A.1–A.8).

2.4 Discussion

This study investigated the possibility of adaptive evolution in D. santomea external stimuli genes. This group of genes exhibited significantly different expression in a genome-wide expression study between D. santomea and D. yakuba (A. Llopart, personal communication). In the new environment of D. santomea (high elevation, darker

and more humid), expression changes in these genes could have provided a fitness advantage to this species. To understand whether there are adaptive changes in the protein-coding sequences of these genes, the ratio of nonsynonymous changes to synonymous changes (ω) was estimated with two maximum likelihood programs, PAML and Codonbias. The group of external stimuli genes was also compared to a functionally random group of genes to distinguish between positive selection acting on this class of genes and genome-wide patterns of relaxed selection associated with the colonization process and smaller effective population size in D. santomea.

In external stimuli genes, ω is greater in D. santomea than in the rest of the phylogeny. In functionally random genes, on the other hand, ω decreases in D. santomea when compared to the background branches. This would indicate that positive Darwinian selection is acting on external stimuli genes in the D. santomea lineage. However, when ω is determined in both D. santomea and D. yakuba and compared with the background phylogeny, ω is greater in D. santomea than in D. yakuba in both groups of genes. This would suggest that selection has been relaxed in D. santomea throughout the genome. While this could be true and congruent with the recent demography of D. santomea, the increase in ω in the two groups of genes is not equivalent. Two different results suggest that the external stimuli genes are evolving at a higher rate than functionally random genes. First, the external stimuli genes show a greater increase in ω when compared to functionally random genes. Second, a nonparametric test comparing ω values of individual genes between these two groups reveals that the two groups are evolving under different selective regimes, with external stimuli genes experiencing stronger selection. Together, these results indicate that

external stimuli genes might be under increased selection as a group, probably conferring a selective advantage on D. santomea in its new environment during or shortly after colonization.

Selection acting on external stimuli genes in D. santomea could be providing an advantage to the species in its current habitat, which is darker than the habitat of its sister species D. yakuba. The mist montane forest along the higher elevations of the volcanic mountain of São Tomé, covered in canopy, reduces the amount of light reaching the ground. Successful survival of any species relying on visual cues would have depended on the ability to see better in such low light conditions. Adaptive evolution in the protein-coding sequence of genes involved in visual sensitivity has been observed in other species. In cichlid fishes living in the turbid and clear lakes of East Africa, spectral tuning of rods and cones is accomplished by shifts in the wavelength of peak absorbance. This has been achieved by changes in gene expression and by amino acid substitutions. Four of the six opsin genes in cichlids are evolving under positive selection (Spady et al. 2005). Teleost fishes of the sub-order Cottoidei dwelling at different depths of Lake Baikal have undergone amino acid substitutions that shift the wavelength absorbance of both rod and cone pigments (Hunt et al. 1996).

Our analysis of individual genes indicates positive selection acting on two genes, pinta and Gβ76C, in D. santomea. These two genes are involved in phototransduction. pinta codes for a retinoid binding protein (RBP) in the Drosophila eye which is required in the retinal pigment cells for biosynthesis of rhodopsin after the light-induced isomerization of the chromophore (Wang and Montell 2005). The exact role of Gβ76C in

phototransduction is unknown. However, its involvement in phototransduction is suggested by its expression only in the photoreceptor cells of the Drosophila eye (Boto et al. 2010).

The use of a large number of random genes also reveals genome-wide patterns likely associated with demographic processes and relaxed selection. Importantly, our comparison of external stimuli genes to the functionally random genes indicates that positive selection might be acting only on external stimuli genes, with two genes involved in phototransduction showing a significant pattern of positive selection. However, the absence of a signature of positive selection in other genes involved in phototransduction suggests that this selective trend may be limited. Further population genetic studies of polymorphism and divergence need to be conducted on this group of genes to better understand the evolutionary processes acting on them, as a group and individually.
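The nonparametric comparison of per-gene ω values between the two gene groups can be illustrated with a minimal sketch of the Mann-Whitney U test. The version below is an assumption-laden illustration rather than the procedure used in this study: it computes U by direct pairwise comparison and uses the normal approximation without tie or continuity correction, so its p-values will differ from exact values for small samples. The function name and the input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) for samples x and y.

    U counts the pairs in which a value from x exceeds a value from y,
    with tied pairs counted as one half. The p-value uses the large-sample
    normal approximation (no tie or continuity correction).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_two_sided


if __name__ == "__main__":
    # Hypothetical per-gene omega ratios (D. santomea / D. yakuba).
    stimuli_genes = [2.4, 1.9, 3.1, 2.2, 1.7]
    random_genes = [1.1, 1.6, 0.9, 1.4, 1.8, 1.2]
    u, p = mann_whitney_u(stimuli_genes, random_genes)
    print(f"U = {u}, approximate two-sided p = {p:.4f}")
```

Because the test uses only the ordering of the values, a single gene with an extreme ω shifts U by at most the number of genes in the other group, which is the robustness to outliers mentioned in the Results.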

Table 2.1 Preferred codons in D. yakuba estimated by the CodonW program and used in Codonbias to estimate selection on synonymous codons.

Preferred codons: ttc tac cac cgc ctg cag atc acc agc aag gcc gac gtg gag atg ccg tgg ggc aac tgc
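As an illustration of how a preferred-codon table such as Table 2.1 can be used as a lookup set, the sketch below computes the fraction of codons in a coding sequence that are preferred. This is not the calculation performed by Codonbias or CodonW; it is a hypothetical helper, and the choice to exclude atg, tgg and stop codons (which have no synonymous alternative to choose among) is an assumption made here.

```python
# Preferred codons as listed in Table 2.1.
PREFERRED = frozenset(
    "ttc tac cac cgc ctg cag atc acc agc aag gcc gac gtg gag "
    "atg ccg tgg ggc aac tgc".split()
)
# Codons for amino acids with a single codon carry no usage information.
SINGLE_CODON = frozenset({"atg", "tgg"})
STOP_CODONS = frozenset({"taa", "tag", "tga"})


def preferred_codon_fraction(cds: str) -> float:
    """Fraction of informative codons in an in-frame sequence that are preferred."""
    seq = cds.lower()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    informative = [
        c for c in codons if c not in SINGLE_CODON and c not in STOP_CODONS
    ]
    if not informative:
        raise ValueError("no informative codons in sequence")
    preferred = sum(1 for c in informative if c in PREFERRED)
    return preferred / len(informative)


if __name__ == "__main__":
    # Hypothetical short coding sequence: atg ttc ttt tac taa
    print(preferred_codon_fraction("ATGTTCTTTTACTAA"))
```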

Table 2.2 The ω (nonsynonymous to synonymous rate ratio, dN/dS) values for D. santomea and the rest of the branches of the phylogeny from the program PAML.

Columns: ω (dN/dS) for s and B; lnL s=B; lnL s≠B; p-value.

Rows: External stimuli genes: concatenated sequence (n = 32, ~40 kb), pinta, Gβ76C. Functionally random genes: concatenated sequence (n = 124, ~50 kb).

Note: s = D. santomea branch; B = background branches. lnL s=B is the log likelihood of the model in which a single ω is shared by all the branches of the phylogeny. lnL s≠B is the log likelihood of the model in which ω is estimated separately for the D. santomea branch while a single ω is shared by all other branches. The p-value is based on the likelihood ratio test statistic, 2(lnL s≠B − lnL s=B), which follows a chi-square distribution with df = 1.

Table 2.3 The ω (nonsynonymous to synonymous rate ratio, dN/dS) values for D. santomea, D. yakuba and the rest of the branches of the phylogeny from the program PAML.

Columns: ω (dN/dS) for s, y and B; lnL s≠y≠B; lnL s=y≠B; p-value.

Rows: External stimuli genes: concatenated sequence (n = 32, ~40 kb), Gβ76C, pinta. Functionally random genes: concatenated sequence (n = 124, ~50 kb).

Note: s = D. santomea branch; y = D. yakuba branch; B = background branches. lnL s=y≠B is the log likelihood of the model in which the D. santomea and D. yakuba branches share one ω and all other branches share a different single ω. lnL s≠y≠B is the log likelihood of the model in which ω is estimated separately for D. santomea and D. yakuba while the background branches share a single ω. The p-value is based on the likelihood ratio test statistic, 2(lnL s≠y≠B − lnL s=y≠B), which follows a chi-square distribution with df = 1.

Table 2.4 The ω (nonsynonymous to synonymous rate ratio, dN/dS) and codon usage bias values for D. santomea and the rest of the branches of the phylogeny from the program Codonbias.

Columns: codon usage bias (D. santomea, background); ω (dN/dS) (D. santomea branch, background); lnL (s=B, s≠B); p-value.

Rows: External stimuli genes: concatenated sequence (n = 32, ~40 kb), Gβ76C, pinta. Functionally random genes: concatenated sequence (n = 124, ~50 kb).

Note: s = D. santomea branch; B = background branches. lnL s=B is the log likelihood of the model in which a single ω is shared by all the branches of the phylogeny. lnL s≠B is the log likelihood of the model in which ω is estimated separately for the D. santomea branch while a single ω is shared by all other branches. Codon usage bias is the selection on the synonymous sites of a codon. The p-value is based on the likelihood ratio test statistic, 2(lnL s≠B − lnL s=B), which follows a chi-square distribution with df = 1.

Table 2.5 The ω (nonsynonymous to synonymous rate ratio, dN/dS) and codon usage bias values for D. santomea, D. yakuba and the rest of the branches of the phylogeny from the program Codonbias.

Columns: codon usage bias (D. santomea, D. yakuba, background); ω (dN/dS) (D. santomea, D. yakuba, background); lnL (s≠y≠B, s=y≠B); p-value.

Rows: External stimuli genes: concatenated sequence (n = 32, ~40 kb), Gβ76C, pinta. Functionally random genes: concatenated sequence (n = 124, ~50 kb).

Note: s = D. santomea branch; y = D. yakuba branch; B = background branches. lnL s=y≠B is the log likelihood of the model in which the D. santomea and D. yakuba branches share one ω and all other branches share a different single ω. lnL s≠y≠B is the log likelihood of the model in which ω is estimated separately for D. santomea and D. yakuba while the background branches share a single ω. Codon usage bias is the selection on the synonymous sites of a codon. The p-value is based on the likelihood ratio test statistic, 2(lnL s≠y≠B − lnL s=y≠B), which follows a chi-square distribution with df = 1.

Figure 2.1 Phylogenetic tree based on branch lengths obtained by the program PAML using a concatenated sequence of all the 156 genes.
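For the branch models, the tree supplied to PAML must indicate which branches receive their own ω. In codeml tree files this is commonly done with branch labels such as #1 and #2 written after a tip or clade. The sketch below builds labeled Newick strings for the two-ratio and three-ratio comparisons described in this chapter. The topology, the taxon names with underscores, and the helper name are assumptions made for illustration; the tree actually used in this study is the one shown in Figure 2.1, and branch lengths are omitted here.

```python
from typing import Dict


def labeled_tree(labels: Dict[str, int]) -> str:
    """Return a Newick string in which selected tips carry codeml-style '#n' labels.

    Tips absent from `labels` are left unlabeled and belong to the background.
    The five-species topology is an assumption made for this illustration.
    """
    def tip(name: str) -> str:
        mark = labels.get(name)
        return f"{name} #{mark}" if mark is not None else name

    return (
        "((" + tip("D_melanogaster") + ", " + tip("D_simulans") + "), ("
        + tip("D_erecta") + ", (" + tip("D_yakuba") + ", "
        + tip("D_santomea") + ")));"
    )


# Two-ratio model: D. santomea has its own omega.
two_ratio = labeled_tree({"D_santomea": 1})
# Three-ratio model: D. santomea and D. yakuba each have their own omega.
three_ratio = labeled_tree({"D_santomea": 1, "D_yakuba": 2})
# Null for the three-ratio test: both sister branches share one omega.
three_ratio_null = labeled_tree({"D_santomea": 1, "D_yakuba": 1})

if __name__ == "__main__":
    for tree in (two_ratio, three_ratio, three_ratio_null):
        print(tree)
```

The null model of the two-ratio test is simply the unlabeled tree, obtained here with an empty label dictionary.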

Figure 2.2 The ω (nonsynonymous to synonymous rate ratio, dN/dS) of external stimuli genes estimated separately using the program PAML for the D. santomea and D. yakuba branches. (A) ω of individual genes in D. santomea. (B) ω of individual genes in D. yakuba. Note: Asterisk indicates P < . The names of the genes analyzed are shown below the x-axis.

Genes on the x-axis of both panels: Gycalpha99B ogre inad cry CG3964 ninaa Fer3HCH PGRP-LC FKBP59 ninae Pgm Rh6 Rh5 Arr1 Arr2 inac CG6751 CG7650 tko CdsA Gbeta76C ninab Rh4 Rh3 laza pinta pain Rh2 Galpha49B ninad inaf-d

Figure 2.3 The ω (nonsynonymous to synonymous rate ratio, dN/dS) of the concatenated sequence of external stimuli genes, estimated separately using the program PAML for the D. santomea and D. yakuba branches. Note: Double asterisks indicate P < 0.01.

Figure 2.4 Increase in ω (nonsynonymous to synonymous rate ratio) in D. santomea when compared with D. yakuba (D. santomea / D. yakuba) in external stimuli and functionally random genes. Note: ω is determined separately using the program PAML with a branch model that allows variation in ω for the D. santomea and D. yakuba branches while the rest of the branches of the phylogeny have the same ω.

Figure 2.5 The ω (nonsynonymous to synonymous rate ratio, dN/dS) of external stimuli genes estimated separately using the program Codonbias for the D. santomea and D. yakuba branches. (A) ω of individual genes in the D. santomea branch. (B) ω of individual genes in the D. yakuba branch. Note: Asterisk indicates P < 0.05 and double asterisks indicate P < . The names of the genes analyzed are shown below the x-axis.

Genes on the x-axis of both panels: Gycalpha99B ogre inad cry CG3964 ninaa Fer3HCH PGRP-LC FKBP59 ninae Pgm Rh6 Rh5 Arr1 Arr2 inac CG6751 CG7650 tko CdsA Gbeta76C ninab Rh4 Rh3 laza pinta pain Rh2 Galpha49B ninad inaf-d

Figure 2.6 The ω (nonsynonymous to synonymous rate ratio, dN/dS) of functionally random genes estimated separately using the program Codonbias for the D. santomea and D. yakuba branches. (A) ω of individual genes in the D. santomea branch. (B) ω of individual genes in the D. yakuba branch. Note: The names of the genes analyzed are shown below the x-axis.

Genes on the x-axis of both panels: Adh AP-50 Art2 Bangles&Beads Barren CG10019 CG10202 DnaJ-1 CG11321 CG11379 CG11892 CG12147 CG12229 CG12943 CG1324 CG13243 CG14087 ver CG15362 CG15497 And CG2113 CG30499 CG31148 Jupiter RpL10Aa CG4161 sqz qua CG8476 CG9642 Cyp4d21 Dhc36C Disembodied DOS Esterase6 Forked Hex-C Hex-A His3 Hsc70-4 Hunchback Kr Lsp-1gamma LvpH NGP - Pgi Pgm Rad1 Rep4 Roughex rpl14 Rpn5 salr Sara sfl Singed Sod Sog tan Trp1

Additional labels in both panels: CG42816 CG9510 CG3492 CG3581 y

Figure 2.7 Increase in ω (nonsynonymous to synonymous rate ratio) in D. santomea when compared with D. yakuba (D. santomea / D. yakuba) in external stimuli and functionally random genes. Note: ω is determined separately using the program Codonbias with a branch model that allows variation in ω for D. santomea and D. yakuba while the rest of the branches of the phylogeny have the same ω.

CHAPTER 3

ADAPTATION TO A NEW ENVIRONMENT IN DROSOPHILA SANTOMEA: EFFECT OF COLD TEMPERATURE ON FITNESS IN D. SANTOMEA'S NEW ENVIRONMENT

3.1 Introduction

The distribution of D. santomea is restricted to higher elevations along the slope of the Pico de São Tomé. D. santomea is found above 1150 m along the volcanic mountain while D. yakuba is found below 1450 m (Lachaise et al. 2000; Llopart et al. 2005a) (Fig. 3.1). The differences in the habitat preferences (mentioned in chapter 2) and distribution of these two species indicate that, along with changes in gene expression and protein sequence of external stimuli genes, physiological adaptive changes might have occurred in D. santomea in its ability to tolerate abiotic factors like cold temperature, low light and high humidity.

Examples of adaptation to abiotic factors

Adaptation to different abiotic factors has been observed in many taxa. The lizard genus Anolis comprises more than 400 species, more than 150 of which occupy different habitats on Caribbean islands as a result of adaptive radiation. On these islands, independent colonization has resulted in a striking pattern of convergent evolution in morphology and behavior, with six different types of habitat specialist ecomorphs observed, indicating microhabitat specialization (Schneider 2008; Losos and Schneider 2009).

African cichlids inhabiting Lake Tanganyika (200 species), Lake Malawi and Lake Victoria (each with more than 500 species) are also excellent examples of adaptive ecological radiation and speciation. The radiation in Lake Malawi can be described in three stages: the first involves adaptation to distinct rocky and sandy habitats, the second involves differentiation of jaw morphology by natural selection, and the third and final stage entails diversification of male color patterns (Kocher 2004).

Temperature plays an important role in habitat selection and distribution of many species. Seasonal variation in temperature affects the latitudinal diversity of hylid frogs by limiting the dispersal of many tropical species into temperate regions (Wiens et al. 2006). Temperature also plays an important role in distribution and habitat selection in many freshwater turtles (Tamplin 2006; Schofield et al. 2009). The distribution of the nitrogen-fixing cyanobacteria Trichodesmium spp. in tropical and temperate waters is dependent on temperature. Nonheterocystous cyanobacteria (found in tropical oceans) require higher temperatures for the fixation of N₂ when compared with heterocystous cyanobacteria (found in temperate fresh water and brackish water lakes) (Staal et al. 2003). The geographic distribution of rodents in colder environments has also been restricted by high thermoregulatory energy expenditures, with a distribution that correlates with maximum cold-induced metabolism, which is higher in temperate species and lower in tropical species (Bozinovic and Rosenmann 1989). Such a correlated geographic distribution has also been observed in 44 species of passerine birds in relation to temperature (Swanson and Garland 2009).

Finally, temperature tolerance is a critical factor determining the range of distribution and particularly their successful overwintering in high latitudes in many

species. Insects of the genus Liriomyza, many of which are pests of ornamental and agricultural plants, vary in their ability to tolerate cold stress. Thermal stress tolerance studies have shown that the distribution of these insects is correlated with the ability to survive at low temperatures (Kang et al. 2009). Temperature affects the distribution of many species of Drosophila (Kimura 2004; Kellermann et al. 2009). Even widely distributed cosmopolitan Drosophila species like D. melanogaster and D. simulans differ significantly in their tolerance of extreme temperatures. D. simulans tolerates colder temperatures better than D. melanogaster and, conversely, D. melanogaster tolerates higher temperatures better than D. simulans (Chakir et al. 2002; David et al. 2005).

The possibility of a similar change in temperature tolerance in D. santomea when compared to D. yakuba was shown by Matute et al. (2009). They found that tolerance to high temperature could be an important isolating factor restricting the distribution of these two species on the island of São Tomé. The traits compared in the study were larval viability, egg hatchability, adult fertility, adult longevity and adult temperature preference. At higher temperatures, D. santomea scored significantly lower than D. yakuba in all these fitness traits. However, the authors did not observe the opposite trend at colder temperatures, where D. santomea was expected to have higher fitness than D. yakuba. These results are unexpected because they would suggest an expansion of the permissive temperature range in D. yakuba instead of a shift towards higher temperature tolerance at the expense of reduced fitness at low temperatures. They could, however, also be due to the life stages (egg and larval) on which the effect of temperature was investigated.

In the study described in this chapter, both D. yakuba and D. santomea were examined extensively for tolerance to lower temperature by studying the effect of temperature on various stages of the life cycle. The life cycle was divided into three stages: aging, mating and development, as explained below. The effect of three different temperatures, 15 °C, 18 °C and 21 °C, on the fitness of D. yakuba and D. santomea was then investigated and compared to fitness at 23 °C. Importantly, two lines of D. yakuba were investigated, one sympatric (D. yakuba Bosu, island line) and the other allopatric (D. yakuba genome, mainland line), to capture possible variability within D. yakuba and thus provide the context required to assess putative changes in D. santomea.

3.2 Materials and Methods

Temperature tolerance experiments

For all experiments at all temperatures, flies were maintained on standard cornmeal/yeast/agar medium under a 12 h light/dark cycle. D. santomea STO.4 was collected from the Obo Natural Reserve on São Tomé Island, D. yakuba genome was collected from the border between Guinea and the northwest Ivory Coast, and D. yakuba BOSU was collected at an altitude of 1153 m on São Tomé Island. Note that D. yakuba BOSU was collected in the zone of sympatry with D. santomea, which makes inferences about D. santomea adaptation to a colder environment conservative.

Two life history traits, the number of females producing live offspring and the average number of live offspring produced, were investigated. These traits were first compared at 23 °C in both species and then compared at three different lower temperatures, 21 °C, 18 °C and 15 °C. At each temperature virgins were collected from both species. Male and female virgin flies were maintained separately for 3 days in 8

dram glass vials. This stage will henceforth be referred to as aging. On the 4th day, 10 males and 10 females were combined in a vial for a period of 7.5 to 8.5 h. This was the second stage and will henceforth be referred to as mating. After the mating period, males were discarded and females were placed in individual 8 dram vials and allowed to lay eggs for 2 days. In total, females were allowed to lay eggs in five different vials, two days each in the 1st, 2nd and 3rd vials and one week each in the 4th and 5th vials. This stage, when females were allowed to lay eggs in single vials, will henceforth be referred to as development.

The effect of lower temperatures on each of the three stages of the life cycle mentioned above was then tested by comparing the two life history traits. To test the effect of low temperature on aging, flies were first maintained before mating at one of the three different temperatures (15 °C, 18 °C or 21 °C), while mating and development took place at 23 °C. Previous studies had shown that extreme temperatures could reduce fertility in males by decreasing sperm motility (Chakir et al. 2002; Araripe et al. 2004), and thus this part of the study on the effect of low temperature on aging was expanded, testing males and females separately for possible effects of cold temperature on fertility. When males were aged at any of the three different lower temperatures, females were maintained at 23 °C. Similarly, when females were aged at any of the three lower temperatures, males were maintained at 23 °C. After aging, mating and development were carried out at 23 °C.

When testing the effect of temperature on mating, aging and development were done at 23 °C while mating was done at 15 °C, 18 °C or 21 °C. To test the effect of temperature on egg laying and development, aging and mating were carried out at 23 °C

while egg laying and development were carried out at the three different temperatures mentioned above separately.

3.3 Results

For each of the conditions tested, the results for the percentage of females that produced live offspring will be discussed first, followed by the average number of offspring produced by each female.

Aging, mating and development at constant temperature

When flies were aged, mated and developed at constant temperatures, the percentage of females that produced live offspring was greater in D. santomea at 15 °C and 18 °C (Fig. 3.2). At 15 °C, only 3% of the D. santomea females produced live offspring while no offspring were produced by the D. yakuba females. At 18 °C, there was a significant difference between the two species in the percentage of females which produced live offspring (p < 10⁻⁴, Fisher exact test). This difference was greater between D. santomea and the island D. yakuba than between D. santomea and the mainland D. yakuba. While 75% of D. santomea females produced live offspring at 18 °C, only 3% of the island D. yakuba and 35% of the mainland D. yakuba produced live offspring. At 21 °C, there was no significant difference between D. santomea and either line of D. yakuba. At 23 °C, no significant difference was observed between the two species in the percentage of females that produced live offspring.

The average number of offspring produced by D. santomea was higher compared to D. yakuba at all temperatures with the exception of 23 °C (Fig. 3.3). At 15 °C, very few offspring were produced by D. santomea while none were produced by either line of D.

yakuba. At 18 °C, the number was significantly greater in D. santomea compared to mainland and island D. yakuba (p < 10⁻⁶, Mann-Whitney U test). Notably, the island D. yakuba produced a significantly lower number of offspring than mainland D. yakuba (p < 0.05, Mann-Whitney U test). Another interesting observation was that at 18 °C the mainland D. yakuba females produced on average twice as many males as females (number of males compared to number of females; p < 10⁻⁴, Mann-Whitney U test) (Fig. 3.4). No such male bias, however, was observed in either D. santomea or island D. yakuba. The male to female ratio did not differ significantly at 21 °C and 23 °C in either species. Again, island D. yakuba produced a significantly lower number of offspring when compared to both D. santomea (p < 0.01, Mann-Whitney U test) and mainland D. yakuba (p < 0.05, Mann-Whitney U test). Although at 23 °C the average number was higher in both D. yakuba lines relative to D. santomea, no statistically significant difference was observed between the two species.

Aging at different temperatures, mating and development at 23 °C

Cold temperature is known to reduce the ability of males to produce viable sperm in Drosophila (Chakir et al. 2002; Araripe et al. 2004; David et al. 2005). Both male and female fertility were tested at temperatures of 15 °C, 18 °C and 21 °C. When males were aged at 15 °C and females were maintained at a constant temperature of 23 °C, 38% of the D. santomea mated females produced live offspring, while only 4% of the mainland D. yakuba females and 7% of island D. yakuba females (p < 10⁻⁴, Fisher's exact test) produced live offspring (Fig. 3.5). At 18 °C, the percentage of females that produced live offspring was lower in both mainland and island D. yakuba females when compared to D.

santomea; however, this difference was not significant (p = 0.106). When females were aged at low temperatures, a significantly greater number of D. santomea females produced live offspring at 15 °C than in both lines of D. yakuba (p<0.10-3, Fisher's exact test) (Fig. 3.6). This difference was not observed, however, at the other three temperatures tested.

The average number of offspring produced at 15 °C was significantly greater in D. santomea compared to both island and mainland D. yakuba (for both comparisons of D. santomea with island and mainland D. yakuba: p < 10⁻³, Mann-Whitney U test) (Fig. 3.7). At 18 °C, D. santomea produced a significantly greater number of offspring compared to both island and mainland lines of D. yakuba (for both comparisons of D. santomea with island and mainland D. yakuba: p < 0.05, Mann-Whitney U test). When females were aged at 15 °C and males were aged at 23 °C, D. santomea produced a significantly greater number of offspring when compared to both lines of D. yakuba (p < 10⁻⁵, Mann-Whitney U test) (Fig. 3.8). This difference, however, was not observed at the higher temperatures. There was no difference in the number of male and female offspring produced by either species based on male or female aging at low temperature.

Mating at different temperatures, aging and development at 23 °C

To test the effect of temperature on mating, flies were aged at 23 °C and allowed to mate at 15 °C, 18 °C or 21 °C. After mating, vials were maintained at 23 °C for egg laying and development. No significant difference was observed between the two species in either the number of females that produced live offspring or the average number of offspring produced. At all three temperatures, the percentage of females that produced live offspring in D. santomea was between 70 and 80% (Fig. 3.9). In D. yakuba, % of

the females produced live offspring. The average number of offspring produced in both species was between 90 and 140 (Fig. 3.10).

Aging and mating at 23 °C, development at different temperatures

The number of females that produced live offspring at 15 °C was significantly higher in D. santomea than in both mainland and island lines of D. yakuba (p < 0.001, Fisher's exact test) (Fig. 3.11). The adverse effect of 15 °C on development was also evident from the presence of pupae in vials of mainland D. yakuba that could not metamorphose into adults and died, even after the vials were maintained at 15 °C for more than 5 weeks. No dead pupae were observed in D. santomea. Island D. yakuba did not produce any pupae or offspring. The percentage of females that produced live offspring was not significantly different between the two species at either 18 °C or 21 °C, and no dead pupae were observed in either species at these temperatures.

At all three temperatures, D. santomea produced a greater number of offspring than D. yakuba. The average number of offspring produced by D. santomea was significantly greater than that of both lines of D. yakuba at 15 °C, 18 °C and 21 °C (15 °C: p < 10⁻⁶; 18 °C: p < 0.01; 21 °C: p < 0.05; Mann-Whitney U test) (Fig. 3.12). At 18 °C, a significantly greater number of offspring was observed in mainland compared to island D. yakuba (p< , Mann-Whitney U test). At 15 °C many pupae were present in the vials and only a few metamorphosed into adults, all of which were males. When development was allowed at 18 °C while aging and mating were done at 23 °C, a bias in sex ratio was also observed for mainland D. yakuba. At 18 °C, the number of males was twice the number of females produced (p< , Mann-Whitney U test)

(Fig. 3.4). This bias in sex ratio was similar to the bias observed in mainland D. yakuba when aging, mating and development all took place at 18 °C. At 21 °C, the number of females produced in the mainland line was lower than the number of males, but the difference was not significant (p = 0.053, Mann-Whitney U test). There was no significant difference in the number of males and females produced by either D. santomea or the island line of D. yakuba at any of the three temperatures tested.

3.4 Discussion

Adaptation in D. santomea to colder temperature

The distribution of both species on the island is restricted by altitude along the volcanic mountain of Pico de São Tomé. Even though both species occupy the mountain together from 1150 to 1450 m, forming a hybrid zone (Llopart et al. 2005), D. santomea is not found at the lower altitudes and, conversely, D. yakuba is not found at the higher altitudes. Habitat preference as a result of different biotic and abiotic environments could be a factor restricting the distribution of the species. We tested the effect of low temperature because of the vital role temperature plays in the distribution of many species, including Drosophila (Kimura 2004; Kellermann et al. 2009). Moreover, we also wanted to extend a previous study by Matute et al. (2009) that observed a physiological role of high temperature possibly influencing the distribution of the species.

The experiments on temperature tolerance indicate that at low temperatures D. santomea has higher fitness than either D. yakuba line when the flies were allowed to mature, mate, lay eggs and develop their offspring at 18 °C. D. santomea scored significantly better than D. yakuba in both fitness traits: the number of females that produced live offspring and the average number of offspring produced. At this

temperature, the D. yakuba mainland line has higher fitness than the D. yakuba island line. The absence of any offspring at 15 °C indicates that there could be a threshold for tolerating lower temperature in both species, below which the cellular machinery responsible for growth and reproduction does not function properly. Along the mountain in São Tomé, temperature decreases with altitude and could range from 29 °C at sea level to well below 10 °C at night at the highest point (Matute et al. 2009).

To understand the reason for the significant reduction in the number of offspring in both species at lower temperatures, and the significantly lower number in D. yakuba compared to D. santomea, the effect of temperature on different stages of the life cycle in both species was studied. Partitioning the life cycle into aging, mating and development allowed us to tease apart the effect of low temperatures on each of these stages. The results suggest that the reduction in the number of females producing live offspring and in the number of progeny could be explained by the combined effect of lower temperature during aging of males and during development of offspring from fertilization to adult (Fig. 3.13).

The reduction in fitness due to aging of males at low temperature has been previously reported in some species of Drosophila, including D. melanogaster and D. simulans (Chakir et al. 2002; Araripe et al. 2004; Vollmer et al. 2004; David et al. 2005). Extreme temperature is known to cause reversible sterility in Drosophila. Sterility at extreme temperatures due to loss of sperm motility can be reversed when the males are transferred to optimal temperatures (Chakir et al. 2002; Araripe et al. 2004). Similar to our results, no offspring were observed due to male sterility in Zaprionus indianus, a tropical drosophilid, which was constantly maintained at lower temperatures (14 °C and

15 °C). However, they also found that a few sperm did survive the lower temperatures (Araripe et al. 2004). Our study also indicates a decrease in fertility in males, probably due to sperm inviability when males are aged at low temperatures. Fertility increased significantly in both species when the aging temperature was increased. Fertility is lower in males of both lines of D. yakuba when compared with D. santomea. Contrary to other studies, which did not observe any reduction in fitness in females, a significant reduction in fitness was observed in our study when females were aged at 15 °C. This suggests that lower temperatures significantly and negatively affect reproductive processes in both sexes of both species, but to a greater extent in males, especially in D. yakuba.

The effect of low temperatures on development was unexpected. Transferring the females to the low temperatures immediately after the mating period ensured that we could detect the effect of temperature during the zygotic stage of the life cycle. Both species show a reduction in the number of live offspring produced, but a significantly greater reduction was observed in D. yakuba. In both species, a gradual increase in the number of offspring was observed with increasing temperature. Egg hatchability and larval survival have been studied for both species by Matute et al. (2009), who found both traits to be lower in D. santomea than in D. yakuba at all the temperatures tested from 15 °C to 28 °C. Our results differ from those reported by Matute et al. (2009). One reason for this discrepancy could be a difference in the point in the life cycle of the flies at which the two experiments started. While this study looked at the effect of temperature on females as soon as they were fertilized, Matute et al. (2009) investigated the effects of temperature from the egg stage, i.e. after the fertilized females had already laid their eggs. At 15 °C, many

pupae were observed in the D. yakuba vials which could not develop into adults. Similar results were also observed in D. buzzatii when 1st instar larvae reared at 12 °C failed to develop into adults (Vollmer et al. 2004). These results strongly indicate that embryonic development from fertilization is also negatively affected by lower temperatures.

Together with the finding of a significantly higher fitness of D. santomea relative to either D. yakuba line, we also detected that at lower temperatures the island line of D. yakuba had lower fitness than the mainland line. The distribution of D. yakuba extends throughout sub-Saharan Africa and it is probable that variation for temperature tolerance occurs within the species. The mainland D. yakuba line comes from Ivory Coast, while island D. yakuba recolonized when the Portuguese colonists turned large sections of coastal rainforest into plantations in the last 500 years (Cariou et al. 2001; Llopart et al. 2005b), but its continental origin is unlikely to be Ivory Coast. Therefore, the differences between the island and mainland D. yakuba lines investigated might reflect intraspecific variation in temperature tolerance in continental Africa. Alternatively, there is the formal possibility that, in parallel to D. santomea adaptation to lower temperatures, adaptive changes in D. yakuba after its recent colonization of the island increased tolerance to high temperatures at the expense of fitness at low temperatures. Further studies comparing more lines of mainland and island D. yakuba are needed to assess whether population variation within mainland D. yakuba could explain the observed differences between the island and mainland lines investigated in this study.

Our results suggest that the reduction in fitness at low temperature is due to the combined effect on both aging of males and development of larvae (Fig. 3.13). Though D. santomea also shows reduced fitness at 15 °C and 18 °C relative to 21 °C and 23 °C, its

fitness was significantly higher than that of D. yakuba. At 18 °C, D. santomea could produce 57% of the progeny produced at 21 °C. This would indicate that at higher altitudes, where temperature falls below 18 °C, flies, especially D. santomea, would reproduce in refuges that should be warmer than the outer environment. The fruit of a fig tree could be one such refuge. D. santomea emerges only from figs of the fig tree Ficus chlamydocarpa ssp. fernandesiana, which are abundant on the slopes of the volcano from 1200 m to 1750 m (Cariou et al. 2001; Llopart et al. 2005a).

Unexpectedly, there is an evident male bias in the mainland D. yakuba line. Although this pattern is absent in the island line investigated, it raises the interesting possibility of explaining the absence of hybrids in nature between D. yakuba females and D. santomea males, as observed by Llopart et al. (2005a).

Our results indicate that temperature plays an important role in limiting the geographic range of these two species on the island. These results also show that, along with genetic changes evident from the differential expression and selection at the level of protein evolution in genes involved in detection of external stimuli, there has been adaptation to low temperature in D. santomea.

Figure 3.1 Relative abundance of Drosophila yakuba, D. santomea, and hybrid flies at different elevations on São Tomé Island. Source: Llopart et al. (2005a). Note: The horizontal filled box marks the gradual transition between open cultivated fields/secondary forest and primary/rain forest (from Bom Sucesso at 1150 m to the Obo Natural Park entrance at 1350 m).

[Plot. Y-axis: percentage of females that produced live offspring; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.2 The percentage of females that produced live offspring when aging, mating and development were done at the same temperature in D. santomea, D. yakuba mainland line and D. yakuba island line. Note: Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Double asterisks indicate P < 0.01.
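The comparisons of the percentage of females that produced live offspring rely on Fisher's exact test applied to 2x2 tables (species or line by fertile/not fertile). As an illustration only, the following minimal sketch computes a two-sided Fisher's exact p-value using the Python standard library. The function name and the counts are hypothetical: the percentages (75% and about 3%) echo those reported at 18 °C, but the sample size of 40 females per line is an assumption, and this is not the software used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are (fertile, not fertile).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x fertile females in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# Hypothetical counts (assumed n = 40 females per line):
# 30 of 40 fertile (75%) versus 1 of 40 fertile (about 3%).
p_value = fisher_exact_two_sided(30, 10, 1, 39)
print(f"two-sided p = {p_value:.2e}")
```

With counts of this size the p-value falls far below 10⁻⁴, which matches the order of magnitude reported in the text; the exact value depends on the true sample sizes.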

[Plot. Y-axis: average number of offspring per female; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.3 The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when maintained at the same temperature throughout their life cycle. Note: Standard errors of the mean among species tested are shown. Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Asterisk indicates P < 0.05 and double asterisks indicate P < 0.01.
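The comparisons of average offspring number use the Mann-Whitney U test. The sketch below shows how the U statistic is obtained by counting pairs and how a two-sided p-value can be approximated with a normal distribution. It is an illustration under stated assumptions: the function name and offspring counts are hypothetical, no tie or continuity correction is applied to the variance, and the normal approximation is rough for small samples, so it should not be read as the procedure used in the study.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p) for two independent samples.

    U is the smaller of the two U statistics. The p-value uses the
    normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    # Count pairs where the x value exceeds the y value; ties count one half.
    u_x = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return u, p


# Hypothetical offspring counts per female for two lines (not data from the study).
line_a = [62, 71, 58, 80, 66, 74, 69, 77]
line_b = [12, 0, 25, 8, 30, 5, 18, 22]
u_stat, p_approx = mann_whitney_u(line_a, line_b)
print(f"U = {u_stat}, approximate p = {p_approx:.4f}")
```

In this made-up example every value in the first line exceeds every value in the second, so U is 0, the most extreme value possible for these sample sizes.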

[Plot, panels A and B. Y-axis: average number of offspring.]
Figure 3.4 The average number of males and females produced by both species at different temperatures. (A) Offspring produced when aging, mating and development were done at constant temperature. (B) Offspring produced when aging and mating were done at 23 °C and development was done at different temperatures. Note: S4M & S4F: D. santomea males and females respectively; YGM & YGF: D. yakuba mainland males and females respectively; YBM & YBF: D. yakuba island males and females respectively. Numbers correspond to the temperatures in degrees Celsius at which the experiment was done.

[Plot. Y-axis: percentage of females that produced live offspring; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.5 The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when aging of males was carried out at low temperatures and mating and development were done at 23 °C. Note: Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Asterisk indicates P < 0.05.

[Plot. Y-axis: percentage of females that produced live offspring; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.6 The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when aging of females was carried out at low temperatures and mating and development were done at 23 °C. Note: Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Double asterisks indicate P < 0.01.

[Plot. Y-axis: average number of offspring produced per female; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.7 The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when males of both species were maintained at low temperatures for aging while mating and development were done at 23 °C. Note: Standard errors of the mean among species tested are shown. Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Asterisk indicates P < 0.05 and double asterisks indicate P < 0.01.

[Plot. Y-axis: average number of offspring produced per female; series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.8 The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when females of both species were maintained at low temperatures for aging while mating and development were done at 23 °C. Note: Standard errors of the mean among species tested are shown. Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Double asterisks indicate P < 0.01.

[Plot. Y-axis: percentage of females that produced live offspring; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.9 The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged at 23 °C, mated at the four different temperatures and developed at 23 °C. Note: Statistical significance is shown when D. santomea is different from both lines of D. yakuba.

[Plot. Y-axis: average number of offspring per female; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.10 The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged at 23 °C, mated at the four different temperatures and developed at 23 °C. Note: Statistical significance is shown when D. santomea is different from both lines of D. yakuba.

[Plot. Y-axis: percentage of females that produced live offspring; x-axis: temperature (°C); series as labeled in the plot: Sto4, Yak genome, Yak Bosu.]
Figure 3.11 The percentage of females that produced live offspring in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged and mated at 23 °C and developed at the four different temperatures. Note: Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Double asterisks indicate P < 0.01.

[Plot. Y-axis: average number of offspring per female; x-axis: temperature (°C); series: D. santomea, D. yakuba mainland, D. yakuba island.]
Figure 3.12 The average number of offspring produced in D. santomea, D. yakuba mainland line and D. yakuba island line when both species were aged and mated at 23 °C and developed at the four different temperatures. Note: Standard errors of the mean among species tested are shown. Statistical significance is shown when D. santomea is different from both lines of D. yakuba. Asterisk indicates P < 0.05 and double asterisks indicate P < 0.01.

[Plot, panels A, B and C. Y-axis: average no. offspring per female; x-axis: temperature (°C); series: Same temp, Age temp, Mating temp, Dev temp.]
Figure 3.13 Effect of temperature on different stages of the life cycle in (A) D. santomea, (B) D. yakuba mainland line and (C) D. yakuba island line. Note: Standard errors of the mean among species tested are shown.

CHAPTER 4
INTRASPECIFIC VARIATION IN RECOMBINATION IN DROSOPHILA MELANOGASTER

4.1 Introduction

Meiotic recombination occurs in most eukaryotes and results in the exchange of genetic material between homologous chromosomes. Recombination can alter allelic combinations either by crossing over (CO) or by gene conversion (GC), the two likely outcomes of double-strand break (DSB) repair during meiosis. While the two events differ in scale, both contribute strongly to changes in standing genetic variation within species. Population genetic theory predicts that recombination rates shape a variety of parameters, including levels of DNA variation and the effectiveness of natural selection. As such, recombination has been a fundamental parameter in population genetic studies.

Advantages of recombination

A considerable amount of theoretical research has focused on the evolution and maintenance of recombination. Theories on this topic can be divided into two main classes (although there are innumerable models focusing on the relevance of specific biological conditions). One class of models focuses on the advantages of recombination to the population when considered as a group. The second class considers the evolution of recombination and the importance and fate of modifier genes that can alter the rate of recombination between two other genes.

The group selection argument provided by Fisher (1930), Muller (1964), and Crow and Kimura (1970) maintains that when favorable mutations occur at two or more

loci, recombination could combine them into the same genome. While this basic argument is accepted by many authors, based on the general idea that beneficial mutations are rare and therefore will almost certainly arise in different genetic backgrounds within a population, there has been disagreement on various parameters of this argument. Crow and Kimura (1965) argued that this model would be most pertinent in populations with large population size (N) and large U/s, where U is the rate at which new favorable mutations arise per genome and s is the selective advantage. Bodmer (1970), conversely, argued that recombination is most likely to play a more important role in populations with 8Nu < 1 (with u being the mutation rate), due to the increased speed at which favorable combinations are generated. Bodmer thus concluded that the advantage of recombination is greater in small than in large populations.

The modifier gene theory of recombination involves two types of genes: the genes that are under selection (e.g., A and B) and the genes that might or might not be under selection but control the rate of recombination between A and B. In the case of a neutral modifier gene, any advantage to the modifier is due to the selective advantage generated by the allelic combinations of A and B. Nei (1967) proposed that when the fitness of the genotype is constant, a modifier that reduces the rate of recombination will spread through the population. Nei (1969) later showed that the spread of the modifier in the population is faster when it is itself linked to the genes A and/or B. The modifier theory was further improved by incorporating the effect of polymorphism at the modifier loci, again with a similar result: a modifier allele will increase in frequency if it reduces the rate of recombination (Feldman 1972). Some modifier models have been proposed more recently and conclude that in infinite populations if epistasis and linkage disequilibrium are not

present, there will be no advantage of recombination (Feldman et al. 1996). However, Felsenstein (1974) and others (e.g., Barton 2009) argue that models assuming infinite populations are hardly relevant and that loci will always be in linkage disequilibrium in any finite population. Due to the inevitable presence of linkage disequilibrium in all populations, alleles will always interfere with the efficacy of selection on other alleles in the same genetic background. This effect, now known as the Hill-Robertson effect, is sufficient to explain the presence of recombination in natural (finite) populations.

An advantage of recombination can also be explained by Muller's ratchet. In finite populations with no recombination, multiple deleterious mutations can become fixed in the same genome by random genetic drift. Without recombination this genotype will be at a disadvantage, because genotypes cannot be rearranged. Further, if the genotype with the least number of deleterious mutations is lost due to random genetic drift, it cannot be recreated in the absence of recombination (Muller 1964; Felsenstein 1974). The magnitude of the deleterious consequences associated with Muller's ratchet increases in populations with very small population size and high rates of deleterious mutation.

Charlesworth (1993a and 1993b) compared sexual (recombining) and asexual (nonrecombining) populations under different types of selection. He showed that under directional selection, a small increase in genetic variance will be beneficial for the population, and the genetic variance of a freely recombining population will be four times that of a population without recombination. This is because any type of selection (other than stabilizing selection) will create negative linkage disequilibrium, resulting in a reduction of the variance of any trait. Directional selection on quantitative characters also reduces the genetic variance among individuals, leading to an increase in negative linkage

disequilibrium. However, there is an increased advantage for a population with recombination under directional selection because the mean fitness of the population could reach the new fitness optimum much more quickly than in a population without recombination. When a population is under stabilizing selection, there will be increased pressure for selection to reduce recombination if the rate of change of the optimal fitness is below a threshold. For a sexual population to be more fit than an asexual one under stabilizing selection, genetic variance must cross the threshold (Charlesworth 1993a and 1993b). The question then is how frequent stabilizing selection is in natural populations. The reduced level of long-distance linkage disequilibrium observed in many species suggests that stabilizing selection within a given population might be the exception rather than the norm. Barton (1995) further argues, based on multiple loci, that recombination can be favored under directional selection when epistasis between the loci is weak and negative.

Some early experiments in Drosophila suggested that recombination might be selected against, because it can break up good gene combinations and cause a reduction in fitness (Mukai and Yamaguchi 1974; Charlesworth and Charlesworth 1975). This reduction in fitness is also known as recombination load. Charlesworth and Barton (1996), however, observed that recombination will be favored if there is substantial net heritable variance in fitness, as it will provide a faster response to directional selection. This is true of modifiers of recombination that can suffer an immediate loss in fitness if they increase recombination but will ultimately be favored in the long term.

Kondrashov (1988) provided another explanation for the maintenance of recombination in a population. He proposed that, when large numbers of deleterious

mutations are introduced into a population, fitness will drop and selection will then remove them. Assuming the number of mutants to be normally distributed in terms of fitness effects (from very bad to very good), this will reduce the variance around the mean number of mutants. Recombination will mix the genomes and bring the variance and mean back to the previous state (the spread of the distribution will be greater). Asexual populations do not have this advantage, and reduced variance around the mean will be fatal for the population (Kondrashov 1988).

Later theories have incorporated the effect of finite population size, random genetic drift and the effectiveness of selection as important parameters maintaining the advantage of sex and recombination in a population (Otto and Barton 2001; Barton and Otto 2005; Keightley et al. 2006). These more recent models also incorporate a realistic approach to the many possible levels of selection acting every generation, in all species. In this regard, the original Red Queen hypothesis (Van Valen 1973), which posits successive overlapping selective events (intra- and interspecies competition as well as environmental changes), generates the necessary dynamics to explain benefits of recombination in most, if not all, natural conditions. For a review of theories on the advantage of recombination see: Hadany and Comeron 2008; Otto.

All the theories support the idea that recombination increases the effectiveness of selection, as it results in increased variation in fitness in natural populations (Figs. 4.1 and 4.2). The comparison of genomes from 12 Drosophila species has allowed for many different evolutionary analyses, including studies showing that the efficacy of selection is increased when the rate of recombination is high. Positive Darwinian selection is greater in protein coding sequences of genes located in genomic regions with high recombination

rate. Similarly, purifying selection removes deleterious mutations more efficiently from protein coding sequences of genes located in these regions of high recombination (Larracuente 2008) (Fig. 4.3). The importance of recombination in increasing the efficacy of selection is also observed in neo-sex chromosomes. When neo-sex chromosomes are formed, the non-recombining neo-Y undergoes deterioration by accumulation of deleterious mutations. Selection removes deleterious mutations more efficiently in the recombining neo-X chromosome (Bachtrog 2003).

Most of these studies have been successful at identifying the qualitative advantages of recombination by comparing regions of high vs. low recombination, or neo-X vs. neo-Y chromosomes. Detailed insight into the role of recombination, however, requires accurate measures of the recombination rate across genomes as well as the inclusion of recombination variability within populations (see below). While recent studies have enabled the quantification of recombination rate throughout the genome and have unveiled fine-scale variation in the yeast Saccharomyces cerevisiae (Mancera et al. 2008), studies of fine-scale recombination rate variation in multicellular eukaryotes are either completely lacking or use marker densities lower than one marker per 100 kb and thus cannot discriminate specific genes. There is no recombination map in any multicellular eukaryote that locates recombination events at the level of individual genes (<10 kb or better) at a whole-genome level.

Meiotic Recombination

Meiotic recombination and its two outcomes: crossing over and gene conversion

When a double-strand break (DSB) occurs during meiosis, it can be resolved either by forming a Holliday junction or through a Holliday junction-independent pathway. If the Holliday junction is formed, the DSB can be resolved either by CO or by GC. However, if the DSB is resolved without formation of a Holliday junction, the resolution will always result in GC (Mehrotra et al. 2008). The CO to GC ratio can, hence, differ from one, with GC events likely being more frequent than CO events (Fig. 4.4). In humans, the GC to CO ratio ranges from 4:1 to 15:1 in different regions of the genome. This implies that 80 to 94% of recombination events are resolved as GCs (Jeffreys and May 2004). In D. melanogaster approximately 24 DSBs are detected per genome and, on average, 5.6 COs occur per meiosis. This would also suggest that GCs occur at a rate ~4 times higher than that of CO events (Mehrotra and McKim 2006). Unexpectedly, population genetic analyses of a region near the telomere of the X chromosome of D. melanogaster, where recombination is severely reduced, suggest that GC events occur more than 400 times as frequently as crossover events (Gay et al. 2007). This further supports the hypothesis that in regions of the Drosophila genome where CO is suppressed, GC is not suppressed, and that either the two are separate processes or DSBs in this region are resolved into GCs only (Gay et al. 2007). With increased depth in sequencing whole genomes, it is now possible to comprehend the variation in both CO and GC throughout the genome.
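The ratios quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the Drosophila calculation assumes that every DSB not resolved as a CO is resolved as a GC without crossover, which is a simplification introduced here rather than a statement from the cited studies.

```python
def gc_fraction(gc_per_co):
    """Fraction of recombination events resolved as GC, given a GC:CO ratio of gc_per_co:1."""
    return gc_per_co / (gc_per_co + 1)


# Human GC:CO ratios of 4:1 and 15:1 quoted in the text.
human_percent = [round(100 * gc_fraction(r)) for r in (4, 15)]
print(human_percent)  # percentages of events resolved as GC

# D. melanogaster: about 24 DSBs and 5.6 COs per meiosis (values from the text).
dsb_per_meiosis = 24
co_per_meiosis = 5.6
dsb_per_co = dsb_per_meiosis / co_per_meiosis
# Assumption: all DSBs that do not become COs are counted as GC events.
non_co_per_co = (dsb_per_meiosis - co_per_meiosis) / co_per_meiosis
print(f"DSBs per CO: {dsb_per_co:.2f}; non-CO events per CO: {non_co_per_co:.2f}")
```

Under this simplification the human ratios reproduce the 80 to 94% range, and the Drosophila numbers give roughly 3.3 non-crossover events per CO (about 4.3 DSBs per CO), in line with the approximately fourfold excess described in the text.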

Variation in CO rates across genomes

In humans, recombination rates show sharp and narrow peaks, with the majority of recombination occurring in a small proportion of the sequence. These regions, also termed hotspots, increase in both directions away from genes for 30 kb before decreasing again. The human genome comprises more than 25,000 recombination hotspots (Myers et al. 2005). In these hotspots, there is an overrepresentation of CT-rich and GA-rich repeats. Further, there is no difference in the spatial distribution of hotspots between males and females. Myers et al. (2005) proposed that over large distances there is selection for maintaining stable recombination rates, but that at a finer scale recombination rates vary considerably.

In Drosophila, fine-scale variation in recombination rate is observed throughout the genome. Recombination is absent in D. melanogaster males. It is also absent in the 4th (dot) chromosome and suppressed near the centromeres of the 2nd and 3rd chromosomes. In the X chromosome it is moderately reduced near the centromere, but suppressed near the telomere. In Drosophila, the accepted view has been that there are no recombination hotspots (Nachman 2002). Kulathinal et al. (2008) presented the best recombination map in Drosophila to date (D. pseudoobscura) and found crossover rates to vary from 0.88 to 15 cM/Mb along the 2nd chromosome and from 0.91 to 22 cM/Mb on the X chromosome. The authors also found two motifs, CCCCACCCC and CCTCCCT, overrepresented in the regions that they classified as hotspots of recombination (Kulathinal et al. 2008). Stevison and Noor (2010) analyzed recombination rates of the second chromosome and the right arm of the X chromosome in D. persimilis, a species closely related to D. pseudoobscura, and detected variation in local rates ranging from 0 to cM/Mb. This study also observed a significant correlation between a 13 base pair (bp) motif and the recombination rate in both D. persimilis and D. pseudoobscura (Stevison and Noor 2010).

Variation in GC rate across genomes

When a DSB is resolved into GC, one of the homologous chromosomes transfers its genetic material to the other chromosome, resulting in a nonreciprocal transfer of DNA. Chovnick et al. (1964) investigated mutants in the rosy gene in D. melanogaster and concluded that intragenic recombination within rosy can cause GC. Curtis and Bender (1991) estimated the average tract length of the GC events to be 885 bp in mei locus. Based on recombinants of the D. melanogaster rosy locus, Hilliker et al. (1994) estimated an average GC tract length of 352 bp. In yeast, the average meiotic conversion tract length is estimated to be 1 to 2 kb (Judd and Petes 1988). In humans, GC tract lengths have been inferred from genotyping sperm and range from 55 to 290 bp (Jeffreys and May 2004).

Variation in recombination rate within populations

The presence of variation in the rate of recombination within populations was demonstrated by selection experiments conducted in the early 1970s. Kidwell (1972a and 1972b) artificially selected for high-recombinant and low-recombinant flies for 20 generations. She observed an increase in recombination in the high-recombinant lines but not in the low-recombinant ones. These experiments were not only among the first to experimentally uncover the presence of natural variation in recombination (natural modifiers) but also showed that there is a tremendous potential within species to change recombination in very few generations (Kidwell 1972a and 1972b). Similar results in Drosophila were observed in a study by Chinnici (1971), in which directional selection increased or decreased recombination in selected lines, and by Abdullah and Charlesworth (1974), who obtained a fast and significant reduction in the recombination rate. Moreover, Charlesworth and Charlesworth (1985a and 1985b) observed that age influences recombination rate in Drosophila, with higher recombination in younger females. They also found that genes on all three chromosomes (X, 2nd and 3rd) are involved in controlling recombination rates, thus revealing the polygenic nature of recombination rates.

Recombination rate as a population genetic parameter

The comparison of the scaled mutation rate (θ = 4N_e u) and the scaled recombination rate (C = 4N_e r) can provide insight into the neutral expectation for linkage disequilibrium and can therefore be used as a null model to investigate selective events (Andolfatto and Przeworski 2000; Andolfatto and Wall 2003; Frisse et al. 2001; Przeworski and Wall 2001; Wall et al. 2002). Mutation rates can be easily estimated from interspecific comparisons of homologous DNA sequence data. Recombination rates, however, need to be estimated within species and, ideally, from experimental data rather than inferred from population genetic models. That is, there is a need to generate high-density recombination maps within a population context. These maps will provide a better understanding of the evolutionary forces acting on specific regions of the genome. Without high-density recombination maps, for instance, positive selection can be erroneously inferred in regions of reduced recombination if these regions are assumed to be highly recombining.
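One reason the two scaled parameters are convenient to compare can be made explicit with a short derivation. It uses only the definitions given in the text (N_e for effective population size, u for the mutation rate, r for the recombination rate) and adds no further assumptions: because both parameters share the factor 4N_e, their ratio depends only on the per-generation rates.

```latex
\theta = 4 N_e u, \qquad C = 4 N_e r
\quad\Longrightarrow\quad
\frac{C}{\theta} = \frac{4 N_e r}{4 N_e u} = \frac{r}{u}
```

An experimental estimate of r therefore translates directly into an expectation for C relative to θ, without requiring a separate estimate of N_e.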

The need for a new generation of recombination maps

The increased ability to sequence markers throughout the genome has resulted in a better understanding of recombination rate variation across genomes and has unveiled unexpectedly high fine-scale variation. However, in multicellular eukaryotes, these studies of fine-scale recombination rate variation are still coarse, owing to the limited density of markers, relative to population genetic analyses that target single genes (or smaller physical units) as potential units of selection (Coop et al. 2008; Crawford et al. 2004; Kulathinal et al. 2008; Stevison and Noor 2010). There is no recombination map in any multicellular eukaryote that locates recombination events at the level of individual genes (<10 kb or better) at a whole-genome level. Also, most of these previous studies are not dense enough to incorporate the contribution of GC to total recombination, and the studies that estimated GC rates (except for S. cerevisiae) have not obtained genome-wide rates. Additionally, while variation in recombination rate is known to occur in most species, to date no study has documented recombination variation within species with fine-scale maps. Here, we report the generation of ultra-dense, whole-genome maps of recombination events in D. melanogaster based on the direct mapping of single nucleotide polymorphisms (SNPs), allowing us to obtain maps for CO and GC events separately and to capture recombination variation within this species.

4.2 Materials and Methods

Fly Crosses

Sequenced isogenic wild-type D. melanogaster RAL strains caught in Raleigh, North Carolina were obtained from the Bloomington Stock Center, Indiana and were used in all the crosses. All parental lines were resequenced to capture any residual heterozygosity. All populations and crosses used in this study were maintained at constant density in half-pint bottles with standard cornmeal/yeast/agar medium under a 12 hr light/dark cycle at 23.5 °C. A total of 8 crosses between different RAL lines were conducted in both directions (Table 4.1). Crosses were carried out with 6-hour-old virgins obtained from the parental isogenic lines. Flies were mated in both directions separately in the half-pint bottles for 2 days, after which the parents were discarded and the F1 offspring were allowed to emerge. Virgin heterozygous F1 flies were collected from both directions and mixed together. To generate RAILs (Recombinant Advanced Intercross Lines), 20 F1 males and 20 F1 females were randomly collected from the previous generation and placed together in new bottles. The F1 flies were also allowed to mate for 2 days, after which they were discarded and the F2 offspring were allowed to emerge. RAIL flies were thus generated by sibling mating. The number of generations of sibling mating varied among crosses and allowed for the accumulation of recombination events across the same genome. For all crosses, the number of bottles was increased progressively, starting with five, to reduce the effect of random genetic drift, which could generate homozygous combinations for recombination events.

After crossing the strains for the desired number of generations, virgin females were collected and mated with males of D. simulans (Florida City strain) in a 3:1 male to female ratio. D. melanogaster virgin females were aged for one day and D. simulans Florida City virgin males were aged for 3-4 days before combining them for interspecific mating. Ten females and 30 males were placed in a single 8-dram vial for 5 days, after which the males were discarded and individual females were maintained in single 8-dram vials to lay eggs. Once enough pupae were observed in the vials, the females were discarded and the hybrid offspring were allowed to emerge. For each cross between two RAL strains, we froze (-20 °C) the female hybrid offspring of more than 350 D. melanogaster RAIL females mated with D. simulans Florida City males. To ensure enough material for replicates and resequencing if needed, we froze 6 hybrid female offspring from each of the more than 350 mated D. melanogaster females.

DNA extraction

More than 350 individual hybrid females from each cross between two RAL strains were genotyped using Illumina deep sequencing technology (Fig 4.5). DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA). A single frozen fly was disrupted with a 5 mm stainless steel bead (Qiagen, Valencia, CA) in a 2 ml tube (Qiagen, Valencia, CA) using a TissueLyser LT (Qiagen, Valencia, CA). Disruption and homogenization were achieved through high-speed shaking of the samples for two rounds of 90 seconds each. DNA was extracted from the disrupted fly tissue using a modified Qiagen DNeasy Blood & Tissue Kit protocol. We increased the incubation time from 10 min to 1 hr, after which the lysate was treated with RNase A (final concentration of 10 ng/µl) (Qiagen, Valencia, CA) for 30 min. After 90 min of total incubation, 200 µl of ethanol was added to the sample and mixed gently to precipitate the DNA. The lysate was further purified following the Qiagen DNeasy Blood & Tissue Kit protocol and the DNA was eluted in 85 µl of elution buffer (EB, Qiagen, Valencia, CA). Seventy-five µl of the DNA in EB solution was used for preparing the library. The rest of the sample was kept for later validation of SNPs, CO or GC events. The seventy-five µl of DNA was sheared in a 0.5 ml tube (USA Scientific, Orlando, FL) using a Bioruptor (Diagenode, Denville, NJ) for 23 cycles of 15 sec sonication on and 45 sec sonication off at the full (high) setting. The temperature of the water in the Bioruptor was maintained at 4 °C. Our protocol and settings maximized the fraction of sheared DNA fragments of around 300 bp. Sheared DNA was then purified following the QIAquick PCR Purification Kit protocol (Qiagen, Valencia, CA) and eluted in 22 µl of EB. Most of this eluate (18.7 µl) was used for library preparation. The remainder of the sample was used in real-time PCR (Roche LightCycler 480 system, Roche, IN, USA) for relative quantification of the amount of DNA based on crossing point (Cp) values, which were used to normalize the libraries before the first round of multiplexing (see below).

Real-Time PCR

Relative quantification of each fly DNA sample was performed with real-time PCR based on individual Cp values determined by the LightCycler 480 system, using the gene ribosomal protein 49 (rp49) as a proxy. The 20 µl real-time reaction contained 10 µl of Roche LightCycler 480 SYBR Green I Master (Roche Diagnostics), 1 µl (final concentration 0.5 µM) of each rp49 primer, 1 µl of DNA in EB, and water. The sequences of the rp49 primers are listed in Table 4.2. The reactions were performed in white 96-well PCR plates (Roche Diagnostics, IN, USA) at the following temperatures: 95 °C for 5 min, followed by 30 cycles of 98 °C for 10 seconds, 55 °C for 30 seconds, and 72 °C for 30 seconds. After Illumina adapter ligation, Cp values were used to normalize the amount of DNA from each fly.
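How Cp values can be turned into pooling volumes may be illustrated with a minimal sketch. It is not the procedure used in the thesis, which is not described in detail. It assumes perfect amplification efficiency (a doubling per cycle, so the template amount is proportional to 2^(-Cp)), and the function names, Cp values and maximum volume are hypothetical.

```python
def relative_amounts(cp_values, efficiency=2.0):
    """Template amount of each sample relative to the most concentrated one.

    Assumes amount is proportional to efficiency ** (-Cp); efficiency=2.0
    corresponds to ideal doubling in every PCR cycle.
    """
    min_cp = min(cp_values)
    return [efficiency ** (min_cp - cp) for cp in cp_values]


def pooling_volumes(cp_values, max_volume_ul, efficiency=2.0):
    """Volume of each sample to pool so that all contribute equal template.

    The most dilute sample (highest Cp) is pooled at max_volume_ul; more
    concentrated samples are pooled at proportionally smaller volumes.
    """
    max_cp = max(cp_values)
    return [max_volume_ul * efficiency ** (cp - max_cp) for cp in cp_values]


# Hypothetical Cp values for three single-fly libraries.
example_cp = [20.0, 21.0, 22.0]
example_amounts = relative_amounts(example_cp)
example_volumes = pooling_volumes(example_cp, max_volume_ul=8.0)
print(example_amounts, example_volumes)
```

A difference of one cycle in Cp corresponds to a two-fold difference in template under this assumption, so a sample that crosses the threshold one cycle later is pooled at twice the volume.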

Illumina library preparation

To prepare single-fly Illumina libraries, three standard enzymatic reactions were performed consecutively on the sheared DNA using New England Biolabs reagents. All reactions were carried out in 0.2 ml PCR tubes in an ABI GeneAmp PCR system. The end-repair reaction was done at 20 °C for 30 min in a final volume of 25 µl with the following: 18.7 µl of eluate, 2.5 µl of T4 DNA ligase buffer (New England Biolabs, MA, USA), 1 µl of dNTP mix (final concentration 0.4 mM each) (New England Biolabs, MA, USA), 1.25 µl of T4 DNA polymerase (final concentration 0.15 U/µl) (New England Biolabs, MA, USA), 0.25 µl of Klenow DNA polymerase (final concentration 0.05 U/µl) (New England Biolabs, MA, USA), and 1.25 µl of T4 polynucleotide kinase (PNK) (final concentration 0.5 U/µl) (New England Biolabs, MA, USA). Repaired DNA was purified with the Qiagen MinElute PCR Purification Kit (Qiagen, Valencia, CA) and eluted in 11 µl of EB. The second reaction, to add an adenine base at the repaired blunt ends, was performed at 37 °C for 30 min in a final volume of 12.5 µl with 10.5 µl of eluate from the previous reaction, 1.25 µl of dA-tailing buffer (final concentration of 20 µM dATP) (New England Biolabs, MA, USA), and 0.75 µl of Klenow 3' to 5' exonuclease (final concentration 0.3 U/µl). The 3' dA-tailed DNA was then cleaned and purified with the Qiagen MinElute PCR Purification Kit and eluted in 10 µl of EB. The third reaction, to ligate the dA-tailed DNA to Illumina adapters, was carried out at 20 °C for 30 min. Illumina adapters (Table 4.2) were modified by adding 7 unique bp to generate 24 indexed Illumina adapters. Thus, 24 libraries were multiplexed and sequenced in a single Illumina lane and later separated based on the 7 bp index.
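As a quick consistency check on the reaction recipes above, the listed component volumes can be summed and compared with the stated final volumes. The sketch below is only an illustration: the dictionary layout, the helper name and the 0.1 µl tolerance are assumptions, while the volumes are those given in the text.

```python
# Component volumes in microlitres, as listed in the text.
end_repair = {
    "eluate": 18.7,
    "T4 DNA ligase buffer": 2.5,
    "dNTP mix": 1.0,
    "T4 DNA polymerase": 1.25,
    "Klenow DNA polymerase": 0.25,
    "T4 PNK": 1.25,
}
da_tailing = {
    "eluate": 10.5,
    "dA-tailing buffer": 1.25,
    "Klenow": 0.75,
}


def total_volume(components):
    """Sum of all component volumes in a reaction (microlitres)."""
    return sum(components.values())


def matches_final_volume(components, stated_ul, tolerance_ul=0.1):
    """True if the summed components agree with the stated final volume."""
    return abs(total_volume(components) - stated_ul) <= tolerance_ul


print(round(total_volume(end_repair), 2), matches_final_volume(end_repair, 25.0))
print(round(total_volume(da_tailing), 2), matches_final_volume(da_tailing, 12.5))
```

The end-repair components add up to 24.95 µl, which agrees with the stated 25 µl once rounding of the individual volumes is allowed for.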

Lyophilized indexed adapters were resuspended in 10 mM Tris-HCl pH 8 to a 100 µM stock concentration. The stock solution was diluted to 60 µM of adapter oligo mix in 20 mM Tris-HCl pH 8 as per the above-mentioned protocol and stored at 4 °C as working stock. The reaction was carried out in a final volume of 22 µl with 9.2 µl of the eluted DNA, 11 µl of 2x quick ligation buffer, 0.75 µl of quick DNA ligase (final concentration 68 U/µl) (New England Biolabs, MA, USA), and 1 µl of adapter oligo mix (final concentration 2.7 µM). After the ligation reaction, each individual fly had a unique adapter. For multiplexing, we normalized and pooled the DNA twice. The first pooling was done after the ligation step based on the Cp values of the real-time PCR, when the ligated DNA of 4 flies was pooled together. The pooled ligated product was then purified with the Qiagen MinElute PCR Purification Kit and eluted in 10 µl of EB.

Gel extraction

To select a size range of templates for the cluster generation platform for Illumina sequencing, we excised 300 bp fragments from the gel and extracted the DNA. For gel extraction, a 100 ml (Bio-Rad, CA, USA) 2.0% low range ultra agarose gel with EtBr (final concentration 400 ng/ml) was used. The 10 µl eluate was mixed with 2 µl of 6X loading dye and loaded into the gel. Fermentas GeneRuler 1 kb Plus ladder (Fermentas, USA) was used as the marker. The gel was run at 90 V for 110 minutes to allow adequate separation of DNA fragments. After 110 minutes, the gel was placed on a Dark Reader transilluminator and DNA with an approximate length of 300 bp was excised from the gel using the 1 kb Plus ladder as a reference. DNA extraction was performed on the excised gel using the QIAquick Gel Extraction Kit (Qiagen, Valencia, CA). Six volumes of QG buffer were added to one volume (by weight) of excised gel and incubated at 50 °C for 10 minutes or until the gel dissolved. After incubation, 2 gel volumes of isopropanol were added to the dissolved gel to precipitate the DNA. The DNA was then column purified and eluted in 30 µl of EB (Fig 4.5).

PCR enrichment, library validation and quantification

The adapter-ligated DNA was amplified with PCR to enrich and validate the library. A 25 µl reaction was performed with 12.5 µl of Phusion High-Fidelity DNA Polymerase (New England Biolabs, MA, USA), 0.75 µl of each adapter-specific primer (0.15 µM final concentration), 4 µl of eluted DNA, and water. Sequences of the Illumina PCR primers are given in Table 4.2. PCR was performed at the following temperatures: 98 °C for 30 seconds, followed by 10 cycles of 98 °C for 10 seconds, 65 °C for 30 seconds, and 72 °C for 8 seconds, with a final extension at 72 °C for 5 minutes. The PCR-enriched library was validated by running it on a standard agarose gel. The pooled PCR products were cleaned up using the QIAquick PCR Purification Kit (Qiagen, Valencia, CA) and eluted in 30 µl of EB. The PCR-enriched, adapter-ligated DNA thus had 4 fly genomes multiplexed in it. We further pooled samples to obtain a final set of 24 multiplexed fly genome libraries to be sequenced in a single Illumina lane. This second pooling of samples for multiplexed sequencing was based on the concentration of the PCR-enriched DNA. Each sample was quantified with Quant-iT PicoGreen dsDNA reagents (Invitrogen, CA, USA), following the provided protocol, on a Turner BioSystems TBS-380 Fluorometer. Based on the DNA concentration, we multiplexed 6 PCR-enriched libraries, each already containing 4 libraries, thus generating a pool of 24 normalized, single-fly genome libraries. 100 nM of the multiplexed sample in 24 µl of EB was sequenced. A total of 29 Illumina plates were sequenced using the Illumina Genome Analyzer IIx and Illumina HiSeq 1000 instruments at the Iowa State University DNA Facility (Ames, IA). Four additional plates were sequenced at the High Throughput Sequencing Facility of the University of North Carolina at Chapel Hill using the Illumina Genome Analyzer IIx.

Sequence analysis

Sequenced reads from each lane were first separated bioinformatically by sorting on the specific adapter tags that identified DNA molecules from a single recombinant female. Reads with tags that did not exactly match the expected tag sequences were removed. Note that all tags were designed to have more than three differences from one another, thus ensuring that no sequence would be misclassified. Filtering and mapping were performed with a combination of programs and custom scripts. Filtering and trimming of low-quality reads was performed with the program SAMtools (Li, 2009). The reads were then aligned to D. simulans with the programs Bowtie and BWA (Li and Durbin, 2009), which align relatively short nucleotide sequences against a long reference sequence. Reads that aligned to D. simulans Florida City or to any of six different strains of D. simulans were discarded. The MOSAIK aligner (Stromberg and Lee, 2009) was then used to align the remaining reads to both parental RAL strains. Reads that aligned to both parents were also removed. Reads that mapped uniquely to one parental reference sequence and that differed from the homologous sequence of the other parental strain by informative SNPs, and not indels, were used for further analysis (Fig 4.5). CO and GC events were assigned for each chromosome and fly sequence, and recombination maps along chromosomes were finally generated for each cross with a program written by J.M. Comeron (personal communication) (Fig 4.6).
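The tag-sorting step described above (exact matches only, with tags that differ from each other at more than three positions) can be sketched as follows. This is an illustration rather than the custom scripts used in the study: the tag sequences and sample names are hypothetical, and the sketch assumes the 7 bp index occupies the first seven bases of each read.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def demultiplex(reads, tag_to_sample, tag_length=7):
    """Assign reads to samples by exact tag match; count the rest as discarded.

    Assumes the index is the first `tag_length` bases of each read. The tag
    is stripped from reads that are kept.
    """
    by_sample = {sample: [] for sample in tag_to_sample.values()}
    discarded = 0
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            discarded += 1  # tag is not an exact match to any expected tag
        else:
            by_sample[sample].append(insert)
    return by_sample, discarded


# Hypothetical 7 bp tags; every pair differs at four or more positions.
example_tags = {"AAAAAAA": "fly_1", "CCCCAAA": "fly_2", "AACCCCC": "fly_3"}
example_reads = [
    "AAAAAAAGATTACA",  # exact match to fly_1
    "CCCCAAATTGGCC",   # exact match to fly_2
    "AAAAAACGATTACA",  # one sequencing error in the tag: discarded
    "AACCCCCACGT",     # exact match to fly_3
]
example_sorted, example_discarded = demultiplex(example_reads, example_tags)
print(min_pairwise_distance(list(example_tags)), example_sorted, example_discarded)
```

With tags at least four differences apart, a read whose tag carries one sequencing error cannot match another expected tag exactly, so it is dropped rather than assigned to the wrong fly.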

4.3 Results and Discussion

Our new experimental and methodological approach focused on massively genotyping the products of 5860 female meioses in eight crosses between natural strains of D. melanogaster. Overall, we genotyped 139 million informative SNPs, with an average of 49,000 informative SNPs per fly. This genotyping effort allowed us to directly map more than 105,000 recombination (CO and GC) events at a physical resolution of 2.5 kilobases (kb).

A high-resolution CO map for D. melanogaster

A total of 32,511 COs were used to generate high-resolution CO maps for each chromosomal arm in D. melanogaster (Fig ). The resolution of our CO maps is almost equivalent to that of the high-resolution mapping of meiotic recombination in the unicellular S. cerevisiae (Mancera et al. 2008), and is more than 50-fold more detailed than previous high-resolution whole-genome CO maps in D. pseudoobscura (median distance between markers of 466 kb; Kulathinal et al. 2008), in humans (median interval size of 93 kb; Coop et al. 2008) or in C. elegans (median interval of 98 kb; Rockman and Kruglyak 2009). Due to the elevated density of markers, each CO is supported by numerous contiguous markers on either side. We therefore expect to have detected all COs. In the combined maps we observe that CO rates are sharply reduced near telomeres and centromeres, with no CO activity in the small fourth (dot) chromosome, all in agreement with genetic maps based on visible markers from before the genomics era (Lindsley and Zimm 1992; Kindahl 1994; Comeron et al. 1999; Marais et al. 2001; Hey and Kliman 2002; Singh et al. 2005; Fiston-Lavier et al. 2010).
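The idea that a CO is recognized by runs of contiguous markers from different parents on either side, whereas a GC shows up as a short tract flanked by markers of the other parent, can be illustrated with a toy classifier. This is not the program used in the study, whose algorithm is not described here. The function name, the marker representation and the 5 kb span threshold are all hypothetical choices made for the sketch.

```python
from itertools import groupby


def call_events(markers, max_gc_span_bp=5000):
    """Classify parental switches along one chromosome as COs or GC tracts.

    markers: list of (position_bp, parent) tuples sorted by position, where
    parent is 'A' or 'B'. A run of markers that switches parent and switches
    back within max_gc_span_bp is called a GC tract; the remaining switches
    are called COs. The threshold is an illustrative assumption.
    """
    # Collapse consecutive markers from the same parent into runs.
    runs = []
    for parent, group in groupby(markers, key=lambda m: m[1]):
        positions = [pos for pos, _ in group]
        runs.append((parent, positions[0], positions[-1]))

    kept = []       # background runs after GC tracts have been removed
    gc_tracts = []  # (first_marker, last_marker) of each called GC tract
    for i, (parent, start, end) in enumerate(runs):
        if kept and kept[-1][0] == parent:
            # Same parent as the current background: extend it past a GC tract.
            kept[-1] = (parent, kept[-1][1], end)
            continue
        is_last = i == len(runs) - 1
        if kept and not is_last and (end - start) <= max_gc_span_bp:
            gc_tracts.append((start, end))
            continue
        kept.append((parent, start, end))

    # Each CO is located between the last marker of one background run and
    # the first marker of the next.
    crossovers = [(left[2], right[1]) for left, right in zip(kept, kept[1:])]
    return crossovers, gc_tracts


example_markers = [
    (1000, "A"), (3000, "A"), (5000, "A"), (7000, "A"),
    (8000, "B"),                               # single converted marker
    (9000, "A"), (11000, "A"),
    (20000, "B"), (22000, "B"), (24000, "B"),  # switch that persists
]
example_cos, example_gcs = call_events(example_markers)
print(example_cos, example_gcs)
```

In this toy chromosome the single B marker at 8,000 bp is reported as a GC tract, and the persistent switch is reported as one CO located between the flanking markers at 11,000 and 20,000 bp. A real analysis would also need to handle genotyping errors and closely spaced double COs, which this sketch ignores.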

Unlike most studies, which define recombination environments over distances of several megabases with regions broadly classified as having reduced, intermediate or non-reduced recombination, our results deepen our recent appreciation of intrachromosomal variation in CO rates in Drosophila (Kulathinal et al. 2008; Singh et al. 2009) and map this heterogeneity at a much finer scale. All chromosomal arms (except the fourth chromosome) show 15-to-20-fold variation within regions traditionally labeled as regions of non-reduced or high recombination, thus defining hot and cold regions of CO in D. melanogaster. Adjacent 100-kb windows can differ by more than 15-fold even in these combined maps (e.g., region 15,900,000-16,100,000 in the X chromosome). Notably, we detect genomic regions with almost undetectable CO events embedded in large regions with high CO rates (see below).

Intraspecific variation in CO landscapes

Our study of eight crosses of natural D. melanogaster strains allows us to compare CO maps produced by different genotypes from the same species, controlling for factors that can alter CO rates in Drosophila such as female age, temperature, fly density or food. These eight CO maps (Fig ) reveal a high degree of variation, with some genomic regions showing individual crosses with exceedingly high rates (> 40-fold relative to adjacent regions or other crosses). Despite this variation, the overall picture is not one of intermediate rates along chromosomes due to crosses balancing each other: some genomic regions consistently have high or low rates in all crosses under our experimental conditions. We also show that regions assigned as peaks for CO rates based on the average should not be considered equivalent, for they can either be a species trait or reflect the presence of a polymorphic hotspot at low frequency within the species.

To quantify variation among crosses we estimated the variance to mean ratio (index of dispersion; R_CO), an approach equivalent to that used to study the constancy of nucleotide changes among independent lineages (Gillespie 1989). In order to focus our study on variation in the distribution of CO rates rather than in overall rates, we used weighting factors that include the number of meioses analyzed as well as the total number of COs along a chromosome as lineage effects, and estimated the statistical significance of observed R_CO values based on simulation studies following the procedure described in Zeng et al. (1998). The comparison of the eight CO maps reveals many regions (107, or 22% of all 250-kb long regions across the genome) with a significant excess of variation after correcting for multiple tests (Fig ). We show that all chromosomes harbor several genomic regions with higher than expected variance in CO rates. The magnitude of this excess in variance is highest for chromosomal arm 2L (Fig 4.13) and notably reduced for chromosomal arm 3L (Fig 4.15). Importantly, excess variance in CO rates does not wane when the study involves large genomic regions. On the contrary, at the physical scale of 1 Mb (data not shown) more than half of the chromosome exhibits excess variance, thus suggesting that these regions encompass one or more variable regions.

These results are not wholly unanticipated given early studies of CO rates using morphological markers in D. melanogaster that showed differences among strains for a few specific genomic regions (Kidwell 1972a and 1972b; Abdullah and Charlesworth 1974). Nevertheless, our study details the genomic location and magnitude of this variation and depicts for the first time in a eukaryote a genome-wide polymorphic landscape for CO.

GC and GC maps in D. melanogaster

Although we have detected 74,453 GC events, short GC tracts that lie between flanking markers are expected to be missed. Therefore, we applied a new maximum likelihood algorithm (J.M. Comeron, personal communication) that was developed to simultaneously estimate the rate of GC initiation (ρ) and the length of GC tracts (L) based on the observed number of GC events and the distance between markers. Overall, our estimates of ρ and L are 1.25 x 10^-7 /bp/female meiosis and 518 bp, respectively, both values higher than those obtained at the rosy locus (Hilliker and Chovnick 1981; Hilliker et al. 1994). The study of each chromosomal arm separately (Table 4.3 and Fig 4.17) shows all recombining arms (2L, 2R, 3L, 3R and X) with similar estimates for ρ (ranging between 1.13 x 10^-7 and 1.49 x 10^-7 /bp/female meiosis) and L (ranging between 456 and 632 bp). Importantly, we have directly observed several GC events in the small fourth chromosome, a chromosome that is always achiasmatic in Drosophila and, congruently, shows no evidence of CO events (in our study as well as in any other D. melanogaster study to date). Our estimates of ρ and L for the fourth chromosome are 0.46 x 10^-7 /bp/female meiosis and 1062 bp, and represent the first experimental evidence of GC in the absence of CO in Drosophila.

Finally, we show that GC rates are more uniformly distributed across the genome than CO rates (Fig 4.18). At the level of resolution of our study, there is no reduction in GC rates near telomeres or centromeres, as is the case for CO (see above). Most 100-kb windows have rates of GC within a 2-fold range. This pattern contrasts sharply with the reported high heterogeneity for CO rates, where more than 15-fold variation in rates is observed within regions classically assumed to have moderate/high recombination. The detection of fairly constant GC rates across recombining chromosomes, together with our observation of GC events in the achiasmatic fourth chromosome, strongly supports a different mechanism of DSB repair, or a bias in its resolution, in different regions of the Drosophila genome.

4.4 Conclusion and future directions

Our detailed recombination maps show, at an unprecedented level of detail, great heterogeneity in CO rates along chromosomes, with hot- and coldspots in regions previously assumed to exhibit similar rates. Notably, our detection of genomic regions with almost undetectable CO events embedded in large regions with high CO rates suggests the possibility that these regions may exhibit excess linkage disequilibrium or reduced polymorphism without the need to invoke recent selective sweeps (Andolfatto 2007; Sella et al. 2009). Therefore, we argue, population genetic analyses of selection across the D. melanogaster genome should incorporate these more accurate recombination maps.

Evolutionarily, our results based on intraspecific variation in CO maps underscore a tremendous and previously uncharacterized potential for selection to act on segregating modifiers of recombination in natural populations. The presence of modifiers of recombination segregating in natural populations is at the core of most models of the evolution and maintenance of recombination (Hey 1998; Agrawal et al. 2005; Barton and Otto 2005; Friberg and Rice 2008; Barton 2009), but little was known about their actual genomic location and frequency within species. Further genomic analyses, however, are needed to fully characterize the genetic nature of these modifiers of recombination associated with CO, their mode of evolution and the evolutionary implications.

Finally, we have uncovered a recombination map based on GC events that differs from that based on CO events. We show that GC is not inhibited in regions with severely reduced or absent CO, likely due to biases in the resolution of DSB repair. Because GC and CO are predicted to influence population parameters differentially, we suggest that new population genetics models should be developed to incorporate the variable relative rate of GC and CO along chromosomes.
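The variance to mean ratio used above to quantify variation in CO rates among crosses can be sketched as follows. The weighting scheme shown here (dividing each cross's count by a relative weight so that crosses with more meioses or more COs overall are comparable) is a simplified stand-in for the lineage-effect weights described in the text, and the function name and example counts are hypothetical. The significance testing by simulation is not reproduced.

```python
from statistics import mean, variance


def index_of_dispersion(counts, weights=None):
    """Variance to mean ratio of CO counts in one genomic window across crosses.

    counts: number of COs observed in the window for each cross.
    weights: optional relative weight of each cross (for example, its total
    number of COs on the chromosome divided by the average across crosses).
    Counts are divided by their weight before the ratio is computed. Uses the
    sample variance, so at least two crosses are required.
    """
    if weights is None:
        weights = [1.0] * len(counts)
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")
    scaled = [c / w for c, w in zip(counts, weights)]
    m = mean(scaled)
    if m == 0:
        return float("nan")  # no COs observed in this window in any cross
    return variance(scaled) / m


# Hypothetical CO counts for one window in four crosses.
example_r = index_of_dispersion([2, 4, 6, 8])

# Two crosses whose raw counts differ only because one has half the weight.
example_r_weighted = index_of_dispersion([2, 4], weights=[0.5, 1.0])
print(example_r, example_r_weighted)
```

For counts that follow a Poisson distribution with a common mean, the ratio is expected to be close to one, so values well above one point to more variation among crosses than sampling alone would produce. In the weighted example the two crosses have identical scaled counts and the ratio is zero.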

Table 4.1 Cross numbers and parental strains used to generate RAIL recombinant lines.

Strains used to generate RAILs

Cross #   Parental Strain 1    Parental Strain 2
1         Papua New Guinea     Madagascar
2         RAL-208              RAL
3         RAL-306              RAL
4         RAL-514              RAL
5         RAL-208              Papua New Guinea
6         RAL-301              RAL
7         RAL-712              RAL
8         RAL-380              RAL-820

Table 4.2 List of primers used in the experiments.

Primer                     Sequence
rp49 forward primer        5'-CAAGAAGCTAGCCCAACCTG-3'
rp49 reverse primer        5'-CACTCACCGACAGCTTAGCA-3'
Illumina adapter-1         5'-/Phos/-GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG-3'
Illumina adapter-2         5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Illumina forward primer    5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3'
Illumina forward primer    5'-CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT-3'

Table 4.3 Maximum likelihood estimates of gene conversion tract lengths L for each chromosome (bp) and ρ, the rate of gene conversion initiation (/female meiosis/bp). L ρ 2L x R x L x R x 10-7 X x x 10-7

Figure 4.1 Recombination and the effectiveness of selection on two beneficial mutations on two different chromosomes. Source: Modified from Eyre-Walker (2006) Note: Two beneficial mutations in two different chromosomes cannot be brought together into a single chromosome in absence of recombination. (1 and 2) Recombination can bring the two beneficial mutations together into a single chromosome (3 and 4). Selection can then fix both advantageous mutations in the population.

Figure 4.2 Recombination and the effectiveness of selection on a beneficial mutation on a chromosome with strong deleterious mutations. Source: Modified from Eyre-Walker (2006) Note: Beneficial mutations in a chromosome with strong deleterious mutations will be removed from the population in absence of recombination. (1 and 2) Recombination can bring the beneficial mutation to a chromosome without deleterious mutations. (3 and 4) Selection can then fix the advantageous mutation in the population.

Figure 4.3 The effectiveness of selection on genes with different recombination rates. Source: Larracuente (2008) Note: Box plots of ω for (a) accelerated and (b) not accelerated datasets, divided by genes with high, low or zero recombination. Genes with low recombination have lower ω for the accelerated dataset and higher ω for the not accelerated dataset.

Figure 4.4 Recombination and Double Strand Break (DSB) repair pathways during meiosis. Source: Modified from Mehrotra et al. (2008)


Explaining the evolution of sex and recombination Explaining the evolution of sex and recombination Peter Keightley Institute of Evolutionary Biology University of Edinburgh Sexual reproduction is ubiquitous in eukaryotes Syngamy Meiosis with recombination

More information

TEST FORM A. 2. Based on current estimates of mutation rate, how many mutations in protein encoding genes are typical for each human?

TEST FORM A. 2. Based on current estimates of mutation rate, how many mutations in protein encoding genes are typical for each human? TEST FORM A Evolution PCB 4673 Exam # 2 Name SSN Multiple Choice: 3 points each 1. The horseshoe crab is a so-called living fossil because there are ancient species that looked very similar to the present-day

More information

Park /12. Yudin /19. Li /26. Song /9

Park /12. Yudin /19. Li /26. Song /9 Each student is responsible for (1) preparing the slides and (2) leading the discussion (from problems) related to his/her assigned sections. For uniformity, we will use a single Powerpoint template throughout.

More information

Distinguishing Among Sources of Phenotypic Variation in Populations

Distinguishing Among Sources of Phenotypic Variation in Populations Population Genetics Distinguishing Among Sources of Phenotypic Variation in Populations Discrete vs. continuous Genotype or environment (nature vs. nurture) Phenotypic variation - Discrete vs. Continuous

More information

An Introduction to Population Genetics

An Introduction to Population Genetics An Introduction to Population Genetics THEORY AND APPLICATIONS f 2 A (1 ) E 1 D [ ] = + 2M ES [ ] fa fa = 1 sf a Rasmus Nielsen Montgomery Slatkin Sinauer Associates, Inc. Publishers Sunderland, Massachusetts

More information

Liberating genetic variance through sex

Liberating genetic variance through sex Liberating genetic variance through sex Andrew D. Peters and Sarah P. Otto* Summary Genetic variation in fitness is the fundamental prerequisite for adaptive evolutionary change. If there is no variation

More information

Lecture 10 Molecular evolution. Jim Watson, Francis Crick, and DNA

Lecture 10 Molecular evolution. Jim Watson, Francis Crick, and DNA Lecture 10 Molecular evolution Jim Watson, Francis Crick, and DNA Molecular Evolution 4 characteristics 1. c-value paradox 2. Molecular evolution is sometimes decoupled from morphological evolution 3.

More information

University of York Department of Biology B. Sc Stage 2 Degree Examinations

University of York Department of Biology B. Sc Stage 2 Degree Examinations Examination Candidate Number: Desk Number: University of York Department of Biology B. Sc Stage 2 Degree Examinations 2016-17 Evolutionary and Population Genetics Time allowed: 1 hour and 30 minutes Total

More information

MSc in Genetics. Population Genomics of model species. Antonio Barbadilla. Course

MSc in Genetics. Population Genomics of model species. Antonio Barbadilla. Course Group Genomics, Bioinformatics & Evolution Institut Biotecnologia I Biomedicina Departament de Genètica i Microbiologia UAB 1 Course 2012-13 Outline Cataloguing nucleotide variation at the genome scale

More information

Genetic drift. 1. The Nature of Genetic Drift

Genetic drift. 1. The Nature of Genetic Drift Genetic drift. The Nature of Genetic Drift To date, we have assumed that populations are infinite in size. This assumption enabled us to easily calculate the expected frequencies of alleles and genotypes

More information

1 (1) 4 (2) (3) (4) 10

1 (1) 4 (2) (3) (4) 10 1 (1) 4 (2) 2011 3 11 (3) (4) 10 (5) 24 (6) 2013 4 X-Center X-Event 2013 John Casti 5 2 (1) (2) 25 26 27 3 Legaspi Robert Sebastian Patricia Longstaff Günter Mueller Nicolas Schwind Maxime Clement Nararatwong

More information

Reading for today. Adaptive Molecular Evolution. Predictions of neutral theory. The neutral theory of molecular evolution

Reading for today. Adaptive Molecular Evolution. Predictions of neutral theory. The neutral theory of molecular evolution Final exam date scheduled: Thursday, MARCH 17, 2005, 1030-1220 Reading for today Adaptive Molecular Evolution Li and Graur chapter (PDF on website) Evolutionary EST paper (PDF on website) Page and Holmes

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Karin S. Dorman Department of Statistics Iowa State University August 22, 2006 What is population genetics? A quantitative field of biology, initiated by Fisher, Haldane and

More information

Neutrality Test. Neutrality tests allow us to: Challenges in neutrality tests. differences. data. - Identify causes of species-specific phenotype

Neutrality Test. Neutrality tests allow us to: Challenges in neutrality tests. differences. data. - Identify causes of species-specific phenotype Neutrality Test First suggested by Kimura (1968) and King and Jukes (1969) Shift to using neutrality as a null hypothesis in positive selection and selection sweep tests Positive selection is when a new

More information

Adaptive Molecular Evolution. Reading for today. Neutral theory. Predictions of neutral theory. The neutral theory of molecular evolution

Adaptive Molecular Evolution. Reading for today. Neutral theory. Predictions of neutral theory. The neutral theory of molecular evolution Adaptive Molecular Evolution Nonsynonymous vs Synonymous Reading for today Li and Graur chapter (PDF on website) Evolutionary EST paper (PDF on website) Neutral theory The majority of substitutions are

More information

What is molecular evolution? BIOL2007 Molecular Evolution. Modes of molecular evolution. Modes of molecular evolution

What is molecular evolution? BIOL2007 Molecular Evolution. Modes of molecular evolution. Modes of molecular evolution BIOL2007 Molecular Evolution What is molecular evolution? Evolution at the molecular level Kanchon Dasmahapatra k.dasmahapatra@ucl.ac.uk Modes of molecular evolution INDELS: insertions and deletions Modes

More information

The Hill Robertson effect: evolutionary consequences of weak selection and linkage in finite populations

The Hill Robertson effect: evolutionary consequences of weak selection and linkage in finite populations SHORT REVIEW (2008) 100, 19 31 & 2008 Nature Publishing Group All rights reserved 0018-067X/08 $30.00 www.nature.com/hdy The Hill Robertson effect: evolutionary consequences of weak selection and linkage

More information

b. (3 points) The expected frequencies of each blood type in the deme if mating is random with respect to variation at this locus.

b. (3 points) The expected frequencies of each blood type in the deme if mating is random with respect to variation at this locus. NAME EXAM# 1 1. (15 points) Next to each unnumbered item in the left column place the number from the right column/bottom that best corresponds: 10 additive genetic variance 1) a hermaphroditic adult develops

More information

Molecular Evolution. COMP Fall 2010 Luay Nakhleh, Rice University

Molecular Evolution. COMP Fall 2010 Luay Nakhleh, Rice University Molecular Evolution COMP 571 - Fall 2010 Luay Nakhleh, Rice University Outline (1) The neutral theory (2) Measures of divergence and polymorphism (3) DNA sequence divergence and the molecular clock (4)

More information

2. Write an essay on reinforcement and its relationship to sympatric speciation.

2. Write an essay on reinforcement and its relationship to sympatric speciation. BIOL B242 Evolutionary and Population Genetics Exam 2006 Model Answers 2. Write an essay on reinforcement and its relationship to sympatric speciation. Reinforcement Suppose some adaptation has led to

More information

CHAPTER 12 MECHANISMS OF EVOLUTION

CHAPTER 12 MECHANISMS OF EVOLUTION CHAPTER 12 MECHANISMS OF EVOLUTION 12.1 Genetic Variation DNA biological code for inheritable traits GENES units of DNA molecule in a chromosome LOCI location of specific gene on DNA molecules DIPLOID

More information

Recombination modulates how selection affects linked sites in Drosophila. Mohamed A. F. Noor 1

Recombination modulates how selection affects linked sites in Drosophila. Mohamed A. F. Noor 1 1 Recombination modulates how selection affects linked sites in Drosophila 2 3 4 5 6 Suzanne E. McGaugh 1, Caiti S. Smukowski 1, Brenda Manzano-Winkler 1, Tiffany L. Himmel 2, Mohamed A. F. Noor 1 7 8

More information

Variation Chapter 9 10/6/2014. Some terms. Variation in phenotype can be due to genes AND environment: Is variation genetic, environmental, or both?

Variation Chapter 9 10/6/2014. Some terms. Variation in phenotype can be due to genes AND environment: Is variation genetic, environmental, or both? Frequency 10/6/2014 Variation Chapter 9 Some terms Genotype Allele form of a gene, distinguished by effect on phenotype Haplotype form of a gene, distinguished by DNA sequence Gene copy number of copies

More information

POPULATION GENETICS studies the genetic. It includes the study of forces that induce evolution (the

POPULATION GENETICS studies the genetic. It includes the study of forces that induce evolution (the POPULATION GENETICS POPULATION GENETICS studies the genetic composition of populations and how it changes with time. It includes the study of forces that induce evolution (the change of the genetic constitution)

More information

Gene Linkage and Genetic. Mapping. Key Concepts. Key Terms. Concepts in Action

Gene Linkage and Genetic. Mapping. Key Concepts. Key Terms. Concepts in Action Gene Linkage and Genetic 4 Mapping Key Concepts Genes that are located in the same chromosome and that do not show independent assortment are said to be linked. The alleles of linked genes present together

More information

Genetics Lecture Notes Lectures 6 9

Genetics Lecture Notes Lectures 6 9 Genetics Lecture Notes 7.03 2005 Lectures 6 9 Lecture 6 Until now our analysis of genes has focused on gene function as determined by phenotype differences brought about by different alleles or by a direct

More information

Background Selection as Baseline for Nucleotide Variation across the Drosophila Genome

Background Selection as Baseline for Nucleotide Variation across the Drosophila Genome Department of Biology Publications 6-26-2014 Background Selection as Baseline for Nucleotide Variation across the Drosophila Genome Josep M. Comeron University of Iowa Copyright: 2014 Josep M. Comeron

More information

Balancing and disruptive selection The HKA test

Balancing and disruptive selection The HKA test Natural selection The time-scale of evolution Deleterious mutations Mutation selection balance Mutation load Selection that promotes variation Balancing and disruptive selection The HKA test Adaptation

More information

Midterm exam BIOSCI 113/244 WINTER QUARTER,

Midterm exam BIOSCI 113/244 WINTER QUARTER, Midterm exam BIOSCI 113/244 WINTER QUARTER, 2005-2006 Name: Instructions: A) The due date is Monday, 02/13/06 before 10AM. Please drop them off at my office (Herrin Labs, room 352B). I will have a box

More information

1) (15 points) Next to each term in the left-hand column place the number from the right-hand column that best corresponds:

1) (15 points) Next to each term in the left-hand column place the number from the right-hand column that best corresponds: 1) (15 points) Next to each term in the left-hand column place the number from the right-hand column that best corresponds: natural selection 21 1) the component of phenotypic variance not explained by

More information

Overview of using molecular markers to detect selection

Overview of using molecular markers to detect selection Overview of using molecular markers to detect selection Bruce Walsh lecture notes Uppsala EQG 2012 course version 5 Feb 2012 Detailed reading: WL Chapters 8, 9 Detecting selection Bottom line: looking

More information

Lecture 10 : Whole genome sequencing and analysis. Introduction to Computational Biology Teresa Przytycka, PhD

Lecture 10 : Whole genome sequencing and analysis. Introduction to Computational Biology Teresa Przytycka, PhD Lecture 10 : Whole genome sequencing and analysis Introduction to Computational Biology Teresa Przytycka, PhD Sequencing DNA Goal obtain the string of bases that make a given DNA strand. Problem Typically

More information

Bio 312, Exam 3 ( 1 ) Name:

Bio 312, Exam 3 ( 1 ) Name: Bio 312, Exam 3 ( 1 ) Name: Please write the first letter of your last name in the box; 5 points will be deducted if your name is hard to read or the box does not contain the correct letter. Written answers

More information

Population Genetics. If we closely examine the individuals of a population, there is almost always PHENOTYPIC

Population Genetics. If we closely examine the individuals of a population, there is almost always PHENOTYPIC 1 Population Genetics How Much Genetic Variation exists in Natural Populations? Phenotypic Variation If we closely examine the individuals of a population, there is almost always PHENOTYPIC VARIATION -

More information

Biology Evolution Dr. Kilburn, page 1 Mutation and genetic variation

Biology Evolution Dr. Kilburn, page 1 Mutation and genetic variation Biology 203 - Evolution Dr. Kilburn, page 1 In this unit, we will look at the mechanisms of evolution, largely at the population scale. Our primary focus will be on natural selection, but we will also

More information

Population genetic analysis of weak selection in the Drosophila subobscura species complex

Population genetic analysis of weak selection in the Drosophila subobscura species complex University of Iowa Iowa Research Online Theses and Dissertations Fall 2011 Population genetic analysis of weak selection in the Drosophila subobscura species complex Derek Ernest Peters University of Iowa

More information

Linkage & Genetic Mapping in Eukaryotes. Ch. 6

Linkage & Genetic Mapping in Eukaryotes. Ch. 6 Linkage & Genetic Mapping in Eukaryotes Ch. 6 1 LINKAGE AND CROSSING OVER! In eukaryotic species, each linear chromosome contains a long piece of DNA A typical chromosome contains many hundred or even

More information

11.1 Genetic Variation Within Population. KEY CONCEPT A population shares a common gene pool.

11.1 Genetic Variation Within Population. KEY CONCEPT A population shares a common gene pool. 11.1 Genetic Variation Within Population KEY CONCEPT A population shares a common gene pool. 11.1 Genetic Variation Within Population Genetic variation in a population increases the chance that some individuals

More information

Section KEY CONCEPT A population shares a common gene pool.

Section KEY CONCEPT A population shares a common gene pool. Section 11.1 KEY CONCEPT A population shares a common gene pool. Genetic variation in a population increases the chance that some individuals will survive. Why it s beneficial: Genetic variation leads

More information

1. BASICS OF POPULATION GENETICS.

1. BASICS OF POPULATION GENETICS. Bio 312, Fall 2016 Exam 3 ( 1 ) Name: Please write the first letter of your last name in the box; 5 points will be deducted if your name is hard to read or the box does not contain the correct letter.

More information

Chapter 3: Evolutionary genetics of natural populations

Chapter 3: Evolutionary genetics of natural populations Chapter 3: Evolutionary genetics of natural populations What is Evolution? Change in the frequency of an allele within a population Evolution acts on DIVERSITY to cause adaptive change Ex. Light vs. Dark

More information

Selective interference among deleterious mutations favours sex and. recombination in finite populations regardless of the nature of epistasis

Selective interference among deleterious mutations favours sex and. recombination in finite populations regardless of the nature of epistasis Selective interference among deleterious mutations favours sex and recombination in finite populations regardless of the nature of epistasis Peter D. Keightley 1 and Sarah P. Otto 2 1 Institute of Evolutionary

More information

Molecular Evolution. Wen-Hsiung Li

Molecular Evolution. Wen-Hsiung Li Molecular Evolution Wen-Hsiung Li INTRODUCTION Molecular Evolution: A Brief History of the Pre-DNA Era 1 CHAPTER I Gene Structure, Genetic Codes, and Mutation 7 CHAPTER 2 Dynamics of Genes in Populations

More information

5 FINGERS OF EVOLUTION

5 FINGERS OF EVOLUTION MICROEVOLUTION Student Packet SUMMARY EVOLUTION IS A CHANGE IN THE GENETIC MAKEUP OF A POPULATION OVER TIME Microevolution refers to changes in allele frequencies in a population over time. NATURAL SELECTION

More information

The Process of Molecular Phylogenetics

The Process of Molecular Phylogenetics The Process of Molecular Phylogenetics I. Exercise #1 Molecular phylogenetics using a pseudogene Below are four gene sequences. These are taken from four animals that are believed to have recent shared

More information

Michelle Wang Department of Biology, Queen s University, Kingston, Ontario Biology 206 (2008)

Michelle Wang Department of Biology, Queen s University, Kingston, Ontario Biology 206 (2008) An investigation of the fitness and strength of selection on the white-eye mutation of Drosophila melanogaster in two population sizes under light and dark treatments over three generations Image Source:

More information

Genetic Variation. Genetic Variation within Populations. Population Genetics. Darwin s Observations

Genetic Variation. Genetic Variation within Populations. Population Genetics. Darwin s Observations Genetic Variation within Populations Population Genetics Darwin s Observations Genetic Variation Underlying phenotypic variation is genetic variation. The potential for genetic variation in individuals

More information

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes Coalescence Scribe: Alex Wells 2/18/16 Whenever you observe two sequences that are similar, there is actually a single individual

More information

Exam 1, Fall 2012 Grade Summary. Points: Mean 95.3 Median 93 Std. Dev 8.7 Max 116 Min 83 Percentage: Average Grade Distribution:

Exam 1, Fall 2012 Grade Summary. Points: Mean 95.3 Median 93 Std. Dev 8.7 Max 116 Min 83 Percentage: Average Grade Distribution: Exam 1, Fall 2012 Grade Summary Points: Mean 95.3 Median 93 Std. Dev 8.7 Max 116 Min 83 Percentage: Average 79.4 Grade Distribution: Name: BIOL 464/GEN 535 Population Genetics Fall 2012 Test # 1, 09/26/2012

More information

Lecture 21: Association Studies and Signatures of Selection. November 6, 2006

Lecture 21: Association Studies and Signatures of Selection. November 6, 2006 Lecture 21: Association Studies and Signatures of Selection November 6, 2006 Announcements Outline due today (10 points) Only one reading for Wednesday: Nielsen, Molecular Signatures of Natural Selection

More information

FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype?

FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype? FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype? 1 Linkage & Recombination HUH? What? Why? Who cares? How? Multiple choice question. Each colored line represents

More information

The evolutionary significance of structure. Detecting and describing structure. Implications for genetic variability

The evolutionary significance of structure. Detecting and describing structure. Implications for genetic variability Population structure The evolutionary significance of structure Detecting and describing structure Wright s F statistics Implications for genetic variability Inbreeding effects of structure The Wahlund

More information

Deleterious mutations

Deleterious mutations Deleterious mutations Mutation is the basic evolutionary factor which generates new versions of sequences. Some versions (e.g. those concerning genes, lets call them here alleles) can be advantageous,

More information

11.1 Genetic Variation Within Population. KEY CONCEPT A population shares a common gene pool.

11.1 Genetic Variation Within Population. KEY CONCEPT A population shares a common gene pool. 11.1 Genetic Variation Within Population KEY CONCEPT A population shares a common gene pool. 11.1 Genetic Variation Within Population! Genetic variation in a population increases the chance that some individuals

More information

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome

The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome The Human Genome Project has always been something of a misnomer, implying the existence of a single human genome Of course, every person on the planet with the exception of identical twins has a unique

More information

The Evolution of Populations

The Evolution of Populations Chapter 23 The Evolution of Populations PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Papers for 11 September

Papers for 11 September Papers for 11 September v Kreitman M (1983) Nucleotide polymorphism at the alcohol-dehydrogenase locus of Drosophila melanogaster. Nature 304, 412-417. v Hishimoto et al. (2010) Alcohol and aldehyde dehydrogenase

More information

Population- group of individuals of the SAME species that live in the same area Species- a group of similar organisms that can breed and produce

Population- group of individuals of the SAME species that live in the same area Species- a group of similar organisms that can breed and produce Dr. Bertolotti Essential Question: Population- group of individuals of the SAME species that live in the same area Species- a group of similar organisms that can breed and produce FERTILE offspring Allele-

More information

Assumptions of Hardy-Weinberg equilibrium

Assumptions of Hardy-Weinberg equilibrium Migration and Drift Assumptions of Hardy-Weinberg equilibrium 1. Mating is random 2. Population size is infinite (i.e., no genetic drift) 3. No migration 4. No mutation 5. No selection An example of directional

More information

Lecture 11: Genetic Drift and Effective Population Size. October 1, 2012

Lecture 11: Genetic Drift and Effective Population Size. October 1, 2012 Lecture 11: Genetic Drift and Effective Population Size October 1, 2012 Last Time Introduction to genetic drift Fisher-Wright model of genetic drift Diffusion model of drift Effects within and among subpopulations

More information

A Primer of Ecological Genetics

A Primer of Ecological Genetics A Primer of Ecological Genetics Jeffrey K. Conner Michigan State University Daniel L. Hartl Harvard University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Contents Preface xi Acronyms,

More information

Constancy of allele frequencies: -HARDY WEINBERG EQUILIBRIUM. Changes in allele frequencies: - NATURAL SELECTION

Constancy of allele frequencies: -HARDY WEINBERG EQUILIBRIUM. Changes in allele frequencies: - NATURAL SELECTION THE ORGANIZATION OF GENETIC DIVERSITY Constancy of allele frequencies: -HARDY WEINBERG EQUILIBRIUM Changes in allele frequencies: - MUTATION and RECOMBINATION - GENETIC DRIFT and POPULATION STRUCTURE -

More information

Patterns of Mutation and Selection at Synonymous Sites in Drosophila

Patterns of Mutation and Selection at Synonymous Sites in Drosophila Patterns of Mutation and Selection at Synonymous Sites in Drosophila Nadia D. Singh,* Vanessa L. Bauer DuMont,* Melissa J. Hubisz, Rasmus Nielsen,à and Charles F. Aquadro* *Department of Molecular Biology

More information

February 10, 2005 Bio 107/207 Winter 2005 Lecture 12 Molecular population genetics. I. Neutral theory

February 10, 2005 Bio 107/207 Winter 2005 Lecture 12 Molecular population genetics. I. Neutral theory February 10, 2005 Bio 107/207 Winter 2005 Lecture 12 Molecular population genetics. I. Neutral theory Classical versus balanced views of genome structure - like many controversies in evolutionary biology,

More information

Detecting selection on nucleotide polymorphisms

Detecting selection on nucleotide polymorphisms Detecting selection on nucleotide polymorphisms Introduction At this point, we ve refined the neutral theory quite a bit. Our understanding of how molecules evolve now recognizes that some substitutions

More information

Human linkage analysis. fundamental concepts

Human linkage analysis. fundamental concepts Human linkage analysis fundamental concepts Genes and chromosomes Alelles of genes located on different chromosomes show independent assortment (Mendel s 2nd law) For 2 genes: 4 gamete classes with equal

More information

Evolutionary Genetics: Part 1 Polymorphism in DNA

Evolutionary Genetics: Part 1 Polymorphism in DNA Evolutionary Genetics: Part 1 Polymorphism in DNA S. chilense S. peruvianum Winter Semester 2012-2013 Prof Aurélien Tellier FG Populationsgenetik Color code Color code: Red = Important result or definition

More information

Forces Determining Amount of Genetic Diversity

Forces Determining Amount of Genetic Diversity Forces Determining Amount of Genetic Diversity The following are major factors or forces that determine the amount of diversity in a population. They also determine the rate and pattern of evolutionary

More information

Selective constraints on noncoding DNA of mammals. Peter Keightley Institute of Evolutionary Biology University of Edinburgh

Selective constraints on noncoding DNA of mammals. Peter Keightley Institute of Evolutionary Biology University of Edinburgh Selective constraints on noncoding DNA of mammals Peter Keightley Institute of Evolutionary Biology University of Edinburgh Most mammalian noncoding DNA evolves rapidly Homo-Pan Divergence (%) 1.5 1.25

More information

Mutation Rates and Sequence Changes

Mutation Rates and Sequence Changes s and Sequence Changes part of Fortgeschrittene Methoden in der Bioinformatik Computational EvoDevo University Leipzig Leipzig, WS 2011/12 From Molecular to Population Genetics molecular level substitution

More information

FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype?

FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype? FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype? 1 Linkage & Recombination HUH? What? Why? Who cares? How? Multiple choice question. Each colored line represents

More information

New Alleles? Mendel reasoned thus: Point Mutation

New Alleles? Mendel reasoned thus: Point Mutation Modern Synthesis I BIOL 4415: Evolution Dr. Ben Waggoner In 1843, a young lad named Johann Mendel took his vows as an Augustinian monk in the Abbey of St. Thomas, in the city of Brno. As part of his vows,

More information

Neutral theory: The neutral theory does not say that all evolution is neutral and everything is only due to to genetic drift.

Neutral theory: The neutral theory does not say that all evolution is neutral and everything is only due to to genetic drift. Neutral theory: The vast majority of observed sequence differences between members of a population are neutral (or close to neutral). These differences can be fixed in the population through random genetic

More information

The Modern Synthesis. Terms and Concepts. Evolutionary Processes. I. Introduction: Where do we go from here? What do these things have in common?

The Modern Synthesis. Terms and Concepts. Evolutionary Processes. I. Introduction: Where do we go from here? What do these things have in common? Evolutionary Processes I. Introduction - The modern synthesis Reading: Chap. 25 II. No evolution: Hardy-Weinberg equilibrium A. Population genetics B. Assumptions of H-W III. Causes of microevolution (forces

More information

AP Biology: Allele A1 Lab

AP Biology: Allele A1 Lab AP Biology: Allele A1 Lab Allele A1 Download: http://tinyurl.com/8henahs In today s lab we will use a computer program called AlleleA 1 to study the effects of the different evolutionary forces mutation,

More information

Lecture #8 2/4/02 Dr. Kopeny

Lecture #8 2/4/02 Dr. Kopeny Lecture #8 2/4/02 Dr. Kopeny Lecture VI: Molecular and Genomic Evolution EVOLUTIONARY GENOMICS: The Ups and Downs of Evolution Dennis Normile ATAMI, JAPAN--Some 200 geneticists came together last month

More information

If today s lecture is perplexing, there is a useful film at

If today s lecture is perplexing, there is a useful film at 1 Roadmap Chromosome evolution: Inversion Transposition Fission and fusion If today s lecture is perplexing, there is a useful film at http://www.youtube.com/watch?v=zcnymmhlkaw 2 One-minute responses

More information

Population Genetics Modern Synthesis Theory The Hardy-Weinberg Theorem Assumptions of the H-W Theorem

Population Genetics Modern Synthesis Theory The Hardy-Weinberg Theorem Assumptions of the H-W Theorem Population Genetics A Population is: a group of same species organisms living in an area An allele is: one of a number of alternative forms of the same gene that may occur at a given site on a chromosome.

More information

Supplementary Material online Population genomics in Bacteria: A case study of Staphylococcus aureus

Supplementary Material online Population genomics in Bacteria: A case study of Staphylococcus aureus Supplementary Material online Population genomics in acteria: case study of Staphylococcus aureus Shohei Takuno, Tomoyuki Kado, Ryuichi P. Sugino, Luay Nakhleh & Hideki Innan Contents Estimating recombination

More information

Selection and genetic drift

Selection and genetic drift Selection and genetic drift Introduction There are three basic facts about genetic drift that I really want you to remember, even if you forget everything else I ve told you about it: 1. Allele frequencies

homology - implies common ancestry. If you go back far enough, get to one single copy from which all current copies descend (premise 1).

Drift in Large Populations -- the Neutral Theory. Recall that the impact of drift as an evolutionary force is proportional to 1/(2N) for a diploid system. This has caused many people to think that drift

2. The rate of gene evolution (substitution) is inversely related to the level of functional constraint (purifying selection) acting on the gene.

NEUTRAL THEORY TOPIC 3: Rates and patterns of molecular evolution. Neutral theory predictions: A particularly valuable use of neutral theory is as a rigid null hypothesis. The neutral theory makes a wide

Summary Genes and Variation Evolution as Genetic Change. Name Class Date

Chapter 16 Summary: Evolution of Populations. 16-1 Genes and Variation. Darwin's original ideas can now be understood in genetic terms. Beginning with variation, we now know that traits are controlled by

Human linkage analysis. fundamental concepts

Genes and chromosomes: Alleles of genes located on different chromosomes show independent assortment (Mendel's 2nd law). For 2 genes: 4 gamete classes with equal

CHAPTERS 16 & 17: DNA Technology

1. What is the function of restriction enzymes in bacteria? 2. How do bacteria protect their DNA from the effects of the restriction enzymes? 3. How do biologists make

Questions/Comments/Concerns/Complaints

Reminder: Exam #1 on Friday Jan 29. Lectures 1-6, QS 1-3. Office Hours: Course web-site. Josh Thur, Hitchcock 3:00-4:00 (?). Bring a calculator. Practice Question: Product

LAB. POPULATION GENETICS. 1. Explain what is meant by a population being in Hardy-Weinberg equilibrium.

PRE-LAB: 1. Explain what is meant by a population being in Hardy-Weinberg equilibrium. 2. List and briefly explain the 5 conditions that need to be met to maintain a

This is DUE: Tuesday, March 1, 2011 Come prepared to share your findings with your group.

Biology 160. Reading Guide 12: Population Dynamics, Humans, Part II. This is DUE: Tuesday, March 1, 2011. Come prepared to share your findings with your group. *As before, please turn in only the Critical

GENETICS - CLUTCH CH.21 POPULATION GENETICS.

www.clutchprep.com CONCEPT: HARDY-WEINBERG. Hardy-Weinberg is a formula used to measure the frequencies of and genotypes in a population. Allelic frequencies are the frequency of alleles in a population

MECHANISMS FOR EVOLUTION CHAPTER 20

Objectives: State the Hardy-Weinberg theorem. Write the Hardy-Weinberg equation and be able to use it to calculate allele and genotype frequencies. List the conditions

What is genetic variation?

Genetic Variation. Applied Computational Genomics, Lecture 05. https://github.com/quinlan-lab/applied-computational-genomics Aaron Quinlan, Departments of Human Genetics and Biomedical Informatics, USTAR Center

Molecular Evolution Course #27615

Anders Gorm Pedersen, Molecular Evolution Group, Center for Biological Sequence Analysis, Technical University of Denmark (DTU), gorm@cbs.dtu.dk. Neutral Theory of Molecular

a) In terms of the gene pool, evolution can be defined as a generation to generation change in the allele frequencies within a population.

I. Population Genetics. Figure 1: Gene Pool. Gene Pool: a) In terms of the gene pool, evolution can be defined as a generation-to-generation change in the allele frequencies within a population. Figure 2:
