Population Genetics Matthew B. Hamilton


Population Genetics
Matthew B. Hamilton
A John Wiley & Sons, Ltd., Publication


Dedication
For my wife and best friend, I-Ling.


This edition first published 2009, © 2009 by Matthew B. Hamilton

Blackwell Publishing was acquired by John Wiley & Sons in February Blackwell's publishing program has been merged with Wiley's global Scientific, Technical and Medical business to form Wiley-Blackwell.

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial offices:
9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
111 River Street, Hoboken, NJ , USA

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloguing-in-Publication Data
Hamilton, Matthew B.
Population genetics / Matthew B. Hamilton.
p. ; cm.
Includes bibliographical references and index.
ISBN (hbk. : alk. paper)
1. Population genetics. I. Title.
[DNLM: 1. Genetics, Population. QU 450 H219p 2009]
QH455.H dc

A catalogue record for this book is available from the British Library.

Set in 10/12.5pt Photina by Graphicraft Limited, Hong Kong
Printed and bound in Malaysia

Contents

Preface and acknowledgments

1 Thinking like a population geneticist
Expectations
Parameters and parameter estimates
Inductive and deductive reasoning
Theory and assumptions
Simulation
Interact box 1.1 The textbook website
Chapter 1 review
Further reading

2 Genotype frequencies
Mendel's model of particulate genetics
Hardy–Weinberg expected genotype frequencies
Interact box 2.1 Genotype frequencies
Why does Hardy–Weinberg work?
Applications of Hardy–Weinberg
Forensic DNA profiling
Problem box 2.1 The expected genotype frequency for a DNA profile
Testing for Hardy–Weinberg
Box 2.1 DNA profiling
Interact box 2.2 χ² test
Assuming Hardy–Weinberg to test alternative models of inheritance
Problem box 2.2 Proving allele frequencies are obtained from expected genotype frequencies
Problem box 2.3 Inheritance for corn kernel phenotypes
The fixation index and heterozygosity
Interact box 2.3 Assortative mating and genotype frequencies
Box 2.2 Protein locus or allozyme genotyping
Mating among relatives
Impacts of inbreeding on genotype and allele frequencies
Inbreeding coefficient and autozygosity in a pedigree
Phenotypic consequences of inbreeding
The many meanings of inbreeding
Gametic disequilibrium
Interact box 2.4 Decay of gametic disequilibrium and a χ² test
Physical linkage
Natural selection
Interact box 2.5 Gametic disequilibrium under both recombination and natural selection
Mutation
Mixing of diverged populations
Mating system
Chance
Interact box 2.6 Estimating genotypic disequilibrium
Chapter 2 review
Further reading
Problem box answers

3 Genetic drift and effective population size
The effects of sampling lead to genetic drift
Interact box 3.1 Genetic drift
Models of genetic drift
The binomial probability distribution
Problem box 3.1 Applying the binomial formula
Math box 3.1 Variance of a binomial variable
Markov chains
Interact box 3.2 Genetic drift simulated with a Markov chain model
Problem box 3.2 Constructing a transition probability matrix
The diffusion approximation of genetic drift
Effective population size
Problem box 3.3 Estimating Ne from information about N
Parallelism between drift and inbreeding
Estimating effective population size
Interact box 3.3 Heterozygosity, and inbreeding over time in finite populations
Different types of effective population size
Problem box 3.4 Estimating Ne from observed heterozygosity over time
Breeding effective population size
Effective population sizes of different genomes
Gene genealogies and the coalescent model
Math box 3.2 Approximating the probability of a coalescent event with the exponential distribution
Interact box 3.4 Build your own coalescent genealogies
Effective population size in the coalescent model
Interact box 3.5 Simulating gene genealogies in populations with different effective sizes
Coalescent genealogies and population bottlenecks
Coalescent genealogies in growing and shrinking populations
Interact box 3.6 Coalescent genealogies in populations with changing size
Chapter 3 review
Further reading
Problem box answers

4 Population structure and gene flow
Genetic populations
Method box 4.1 Are allele frequencies random or clumped in two dimensions?
Direct measures of gene flow
Problem box 4.1 Calculate the probability of a random haplotype match and the exclusion probability
Interact box 4.1 Average exclusion probability for a locus
Fixation indices to measure the pattern of population subdivision
Problem box 4.2 Compute FIS, FST, and FIT
Method box 4.2 Estimating fixation indices
4.4 Population subdivision and the Wahlund effect
Interact box 4.2 Simulating the Wahlund effect
Problem box 4.3 Account for population structure in a DNA-profile match probability
Models of population structure
Continent-island model
Interact box 4.3 Continent-island model of gene flow
Two-island model
Infinite island model
Interact box 4.4 Two-island model of gene flow
Math box 4.1 The expected value of FST in the infinite island model
Problem box 4.4 Expected levels of FST for Y-chromosome and organelle loci
Interact box 4.5 Finite island model of gene flow
Stepping-stone and metapopulation models
The impact of population structure on genealogical branching
Combining coalescent and migration events
The average length of a genealogy with migration
Interact box 4.6 Coalescent events in two demes
Math box 4.2 Solving two equations with two unknowns for average coalescence times
Chapter 4 review
Further reading
Problem box answers

5 Mutation
The source of all genetic variation
The fate of a new mutation
Chance a mutation is lost due to Mendelian segregation
Fate of a new mutation in a finite population
Interact box 5.1 Frequency of neutral mutations in a finite population
Geometric model of mutations fixed by natural selection
Muller's Ratchet and the fixation of deleterious mutations
Interact box 5.2 Muller's Ratchet
Mutation models
Mutation models for discrete alleles
Interact box 5.3 RST and FST as examples of the consequences of different mutation models
Mutation models for DNA sequences
The influence of mutation on allele frequency and autozygosity
Math box 5.1 Equilibrium allele frequency with two-way mutation
Interact box 5.4 Simulating irreversible and bi-directional mutation
The coalescent model with mutation
Interact box 5.5 Build your own coalescent genealogies with mutation
Chapter 5 review
Further reading

6 Fundamentals of natural selection
Natural selection
Natural selection with clonal reproduction
Problem box 6.1 Relative fitness of HIV genotypes
Natural selection with sexual reproduction
General results for natural selection on a diallelic locus
Math box 6.1 The change in allele frequency each generation under natural selection
Selection against a recessive phenotype
Selection against a dominant phenotype
General dominance
Heterozygote disadvantage
Heterozygote advantage
The strength of natural selection
Math box 6.2 Equilibrium allele frequency with overdominance
How natural selection works to increase average fitness
Average fitness and rate of change in allele frequency
Problem box 6.2 Mean fitness and change in allele frequency
The fundamental theorem of natural selection
Interact box 6.1 Natural selection on one locus with two alleles
Chapter 6 review
Further reading
Problem box answers

7 Further models of natural selection
Viability selection with three alleles or two loci
Natural selection on one locus with three alleles
Problem box 7.1 Marginal fitness and Δp for the Hb C allele
Interact box 7.1 Natural selection on one locus with three or more alleles
Natural selection on two diallelic loci
Alternative models of natural selection
Natural selection via different levels of fecundity
Natural selection with frequency-dependent fitness
Natural selection with density-dependent fitness
Math box 7.1 The change in allele frequency with frequency-dependent selection
Interact box 7.2 Frequency-dependent natural selection
Interact box 7.3 Density-dependent natural selection
Combining natural selection with other processes
Natural selection and genetic drift acting simultaneously
Interact box 7.4 The balance of natural selection and genetic drift at a diallelic locus
The balance between natural selection and mutation
Interact box 7.5 Natural selection and mutation
Natural selection in genealogical branching models
Directional selection and the ancestral selection graph
Problem box 7.2 Resolving possible selection events on an ancestral selection graph
Genealogies and balancing selection
Interact box 7.6 Coalescent genealogies with directional selection
Chapter 7 review
Further reading
Problem box answers

8 Molecular evolution
The neutral theory
Polymorphism
Divergence
Nearly neutral theory
Interact box 8.1 The relative strengths of genetic drift and natural selection
Measures of divergence and polymorphism
Box 8.1 DNA sequencing
DNA divergence between species
DNA sequence divergence and saturation
DNA polymorphism
8.3 DNA sequence divergence and the molecular clock
Interact box 8.2 Estimating π and S from DNA sequence data
Dating events with the molecular clock
Problem box 8.1 Estimating divergence times with the molecular clock
Testing the molecular clock hypothesis and explanations for rate variation in molecular evolution
The molecular clock and rate variation
Ancestral polymorphism and Poisson process molecular clock
Math box 8.1 The dispersion index with ancestral polymorphism and divergence
Relative rate tests of the molecular clock
Patterns and causes of rate heterogeneity
Testing the neutral theory null model of DNA sequence evolution
HKA test of neutral theory expectations for DNA sequence evolution
MK test
Tajima's D
Problem box 8.2 Computing Tajima's D from DNA sequence data
Mismatch distributions
Interact box 8.3 Mismatch distributions for neutral genealogies in stable, growing, or shrinking populations
Molecular evolution of loci that are not independent
Genetic hitch-hiking due to background or balancing selection
Gametic disequilibrium and rates of divergence
Chapter 8 review
Further reading
Problem box answers

9 Quantitative trait variation and evolution
Quantitative traits
Problem box 9.1 Phenotypic distribution produced by Mendelian inheritance of three diallelic loci
Components of phenotypic variation
Components of genotypic variation (VG)
Inheritance of additive (VA), dominance (VD), and epistasis (VI) genotypic variation
Genotype-by-environment interaction (VG×E)
Additional sources of phenotypic variance
Math box 9.1 Summing two variances
Evolutionary change in quantitative traits
Heritability
Changes in quantitative trait mean and variance due to natural selection
Estimating heritability by parent–offspring regression
Interact box 9.1 Estimating heritability with parent–offspring regression
Response to selection on correlated traits
Interact box 9.2 Response to natural selection on two correlated traits
Long-term response to selection
Interact box 9.3 Response to selection and the number of loci that cause quantitative trait variation
Neutral evolution of quantitative traits
Interact box 9.4 Effective population size and genotypic variation in a neutral quantitative trait
Quantitative trait loci (QTL)
QTL mapping with single marker loci
Problem box 9.2 Compute the effect and dominance coefficient of a QTL
QTL mapping with multiple marker loci
Problem box 9.3 Derive the expected marker-class means for a backcross mating design
Limitations of QTL mapping studies
Biological significance of QTL mapping
Interact box 9.5 Effect sizes and response to selection at QTLs
Chapter 9 review
Further reading
Problem box answers

10 The Mendelian basis of quantitative trait variation
The connection between particulate inheritance and quantitative trait variation
Scale of genotypic values
Problem box 10.1 Compute values on the genotypic scale of measurement for IGF1 in dogs
Mean genotypic value in a population
Average effect of an allele
Math box 10.1 The average effect of the A1 allele
Problem box 10.2 Compute the allele average effect of the IGF1 A2 allele in dogs
Breeding value and dominance deviation
Interact box 10.1 Average effects, breeding values, and dominance deviations
Dominance deviation
Components of total genotypic variance
Interact box 10.2 Components of total genotypic variance, VG
Math box 10.2 Deriving the total genotypic variance, VG
Genotypic resemblance between relatives
Chapter 10 review
Further reading
Problem box answers

11 Historical and synthetic topics
Historical controversies in population genetics
The classical and balance hypotheses
How to explain levels of allozyme polymorphism
Genetic load
Math box 11.1 Mean fitness in a population at equilibrium for balancing selection
The selectionist/neutralist debates
Shifting balance theory
Allele combinations and the fitness surface
Wright's view of allele-frequency distributions
Evolutionary scenarios imagined by Wright
Critique and controversy over shifting balance
Chapter 11 review
Further reading

Appendix
Statistical uncertainty
Problem box A.1 Estimating the variance
Interact box A.1 The central limit theorem
Covariance and correlation
Further reading
Problem box answers

References
Index

Color plates appear in between pages

Preface and acknowledgments

This book was born of two desires, one simple and the other more ambitious, both of which were motivated by my experiences learning and teaching population genetics. My first desire was to create a more up-to-date survey text of the field of population genetics. Several of the widely employed and respected standard texts were originally conceived in the mid-1980s. Although these texts have been revised over time, aspects of their organization and content are inherently dated. At the same time, I set out with the more ambitious goal of offering an alternative body of materials to enrich the manner in which population genetics is taught and learned.

Much of population genetics during the twentieth century was hypothesis-rich but data-poor. The theory developed between about 1920 and 1980 spawned manifold predictions about basic evolutionary processes. However, most of these predictions could not be tested, or could be tested with only very limited power, for lack of appropriate or sufficient genetic data. In the last two decades, population genetics has become a field that is no longer data-limited. With the collection and open sharing of massive amounts of genomic data and the technical ability to collect large amounts of genetic information rapidly from almost any organism, population genetics has now become data-rich but relatively hypothesis-poor. Why? Because mainstream population genetics has struggled to develop and employ alternative testable hypotheses in addition to those offered by traditional null models. Innovation in developing context-specific and testable alternative population genetic models is as much a requirement for hypothesis testing as empirical data. Such innovation, of course, first requires a sound understanding of the traditional and well-accepted models and hypotheses.
It is often repeated that the major advance in population genetics over the last decade or two is the availability of huge amounts of genetic data generated by the ability to collect genetic data and to sequence entire genomes. It is certainly true that advances in molecular biology, DNA sequencing technology, and bioinformatics have provided a wealth of genetic data, some of it in the form of divergence or polymorphism data that is grist for the mill of population genetics hypothesis testing. An equally fundamental advance in population genetics has been the emergence of new models and expectations to match the genetic data that are now readily available. Coalescent or genealogical branching theory is primary among these conceptual advances. During the past two decades, coalescent theory has moved from an esoteric problem pursued for purely mathematical reasons to an important conceptual tool used to make testable predictions. Nonetheless, teaching of coalescent theory in undergraduate and graduate population genetics courses has not kept pace with the growing influence of coalescent theory in hypothesis testing. A major impediment has been the lack of teaching materials that make coalescent theory truly accessible to students learning population genetics for the first time. One of my goals was to construct a text that met this need with a systematic and thorough introduction to the concepts of coalescent theory and its applications in hypothesis testing. The chapter sections on coalescent theory are presented along with traditional theory of identity by descent on the same topics to help students see the commonality of the two approaches. However, the coalescence chapter sections could easily be assigned as a group. Another of my primary goals for this text was to offer material to engage the various learning styles possessed by individuals. 
Learning conceptual population genetics in the language of mathematics is often relatively easy for abstract and mathematical learners. However, my aim was to cater to a wide range of learning styles by building a range of features into the text. A key pedagogical feature in the book is formed by boxes set off from the main text that are designed to engage the various learning styles. These include Interact boxes that guide students through structured exercises in computer simulation utilizing software in the public domain. The simulation problems are active rather than reflective and should appeal to trial-and-error or visual learners. Additionally, simulations uniquely demonstrate the outcome of stochastic processes where the evaluation of numerous replicates is required before a pattern or generalization can be seen. Because understanding the biological impact of stochastic processes is a major hurdle for many students, the Interact boxes should improve learning and retention. Problem boxes placed in the text rather than at the end of chapters are designed to provide practice and to reinforce concepts as they are encountered, appealing to experiential learners. Math boxes that fully explain mathematical derivations appeal to mathematical and logical learners and also provide a great deal of insight for all readers into the many mathematical approximations employed in population genetics. Finally, the large number of two-color illustrations in the text were designed to appeal to and help cultivate visual learning.

The teaching strategy employed in this text to cope with mathematics proficiency deserves further explanation. The undergraduate biology curriculum employed at most US institutions has students take calculus in their first year and usually does not require the application of much if any mathematics within biology courses. This leads to students who have difficulty in or who avoid courses in biological disciplines that require explicit mathematical reasoning. Population genetics is built on basic mathematics and, in my experience, students obtain a much richer and nuanced understanding of the subject with some comprehension of these mathematical foundations. Therefore, I have attempted to deconstruct and offer step-by-step explanations of the basic mathematics (mostly probability) required for a sound understanding of population genetics.
For those readers with more interest or facility in mathematics, such as graduate students, the book also presents more difficult and detailed mathematical derivations in boxes that are separated from the main narrative of the text as well as chapter sections containing more mathematically rigorous content. These sections can be assigned or skipped depending on the level and scope of a course using this text. The Appendix further provides some very basic background in statistical concepts that are useful throughout the book and especially in Chapter 3 on genetic drift and Chapters 9 and 10 on quantitative genetics. This approach will hopefully provide students with the tools to develop their abilities in basic mathematics through application, and at the same time learn population genetics more fully.

Members of my laboratory and the students who have taken my population genetics course provided a range of feedback on chapter drafts, figures, and effective means to explain the concepts herein. This feedback was absolutely invaluable and helped me shape the text into a more useful and usable resource for students. James Crow graciously reviewed each chapter and offered many insightful comments on points both nuanced and technical. Rachel Adams, Genevieve Croft, and Paulo Nuin provided many useful comments on each of the chapters as I wrote them. A.W.F. Edwards reviewed the material on the fundamental theorem in Chapter 6 and also provided the photograph of R.A. Fisher. Sivan Rottenstreich and Judy Miller patiently helped me with numerous mathematical points and derivations, including material included in the Math boxes. John Braverman supplied me with insights and thought-provoking discussions that contributed to this book. Ronda Rolfes and Martha Weiss also provided comments and suggestions. I also thank Paulo Nuin for his collaboration and hard work on the creation of PopGene.S².
I also thank the anonymous reviewers from Aberdeen University, Arkansas State University, Cambridge University, Michigan State University, University of North Carolina, and University of Nottingham who provided feedback on some or all of the draft chapters. John Epifanio provided the allozyme gel picture in Chapter 2. Eric Delwart provided the original data used to draw a figure in Chapter 6. Michel Veuille shared information on Drosophila simulans DNA sequences used in an Interact box in Chapter 8. Peter Armbruster shared unpublished mosquito pupal mass data used in Chapter 9. John Dudley and Stephen Moose generously shared the Illinois Long-Term Selection experiment data used in Chapter 9. Robert J. Robbins kindly provided high-resolution scans from Sewall Wright's chapter in an original copy of the Proceedings of the Sixth International Congress of Genetics (see I am grateful to Nancy Wilton for pushing me at the right times and for getting this project off the ground initially. Elizabeth Frank, Haze Humbert, and Karen Chambers of Wiley-Blackwell helped bring this book to fruition. I thank Nik Prowse for his expertise as a copy editor. I owe everyone at the Mathworks an enormous debt of gratitude since all of the simulations and many of the figures for this text were produced using Matlab.

Matthew B. Hamilton
September 2008

Interact boxes

Throughout this book you will encounter Interact boxes. These boxes contain opportunities for you to interact directly with the material in the text using computer simulations designed to demonstrate fundamental concepts of population genetics. Each box will contain step-by-step instructions for you to follow in order to carry out a simulation. For instructions on how to get started with the simulations, see Interact box 1.1 on page 7. A companion website is available with interactive computer simulations for each chapter at: