Measurement of Molecular Genetic Variation. Forces Creating Genetic Variation. Mutation: Nucleotide Substitutions

Measurement of Molecular Genetic Variation

Genetic variation is the necessary prerequisite for all evolution and for studying all the major problem areas in molecular evolution. How we score and measure genetic variation determines the questions we can address and even our fundamental view of evolution.

Forces Creating Genetic Variation

Mutation; transposition; recombination and gene conversion.

Mutation: Nucleotide Substitutions

Replacement, or nonsynonymous, or ...

Mutation: Nucleotide Substitutions

β-thalassaemia is a genetic disease characterized by a lowered rate of production of β-globin protein. Many thalassaemia alleles are due to nucleotide substitutions in non-coding regions of the β-globin gene, including the 5′ and 3′ control areas and intron splice sites, that affect either transcription or mRNA processing.

Not all silent mutations are phenotypically silent:

    GGT GGT GAG G..
    GGA GGT GAG G..

Both GGT and GGA code for glycine, but GGAGGTGAGG is a splice site sequence, so the substitution produces abnormal splicing.

Mutation: Nucleotide Insertions & Deletions

Many thalassaemia alleles are due to insertions and deletions, some of which are frameshift mutations when they occur in the coding region.

Transposition

Many insertions and deletions are due to transposons, with the phenotypic effects ranging from undetectable to drastic, depending upon the location of the site of insertion or excision.

Figure: various types of repetitive elements in the human gene encoding homogentisate 1,2-dioxygenase.

Recombination/Gene Conversion

Recombination and gene conversion can create much genetic variation by producing new combinations of genetic variants at two or more sites.

Genetic Variation At Different Biological Levels (All Subject To Study By Molecular Evolutionists)

Genome; individual; local population; among local populations within a species; between species or pairs of molecules.

Variation Within A Genome

The ribosomal DNA in Drosophila mercatorum exists as a tandem family of about 200 repeats on the X chromosome and about 80 on the Y chromosome.

Variation Within An Individual

In diploid individuals, there is the possibility of heterozygosity versus homozygosity at any homologous DNA region. One type of variation frequently scored at the individual level is the SNP (Single Nucleotide Polymorphism), because SNP scoring can be automated. There are about 6 million SNPs available for humans.

Figure: underreplication in polytene tissues in Drosophila.

There is also somatic variation, often in copy number, but one can also show individual heterozygosity for variation in this copy number (e.g., abnormal abdomen in D. mercatorum). Somatic cells can also differ due to somatic mutation, which can play an important role in individual phenotype (e.g., cancer) and in evolution (e.g., plants).

Variation Among Individuals Within A Local Population

It is at this level that the number of ways of measuring diversity increases substantially. Variation can be measured at both the genotypic level and the gamete level.

Genotypic Variation Within A Local Population (Single Locus or Single Site)

The number of genotypes.

Genotype frequency = (no. of individuals with a specific genotype) / (total number of individuals sampled)

Observed heterozygosity = (no. of individuals heterozygous at a locus) / (total number of individuals sampled)

The above can be averaged over several loci (but note that it is an average of single-locus measures and not a true multi-locus statistic).

Gametic Variation Within A Local Population (Single Locus or Single Site)

Number of alleles or haplotypes in the gene pool = $n_a$

The allele or haplotype frequency = $p$ = (no. of DNA copies with a specific allele) / (total number of DNA copies sampled)

Percent polymorphism = percent of loci with $p_{\text{most common}} \le 0.95$

Expected heterozygosity (note: this is not a genotypic measure).
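The single-locus measures above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `genotype_summary` and the ten-individual sample are hypothetical, and each genotype is assumed to be recorded as an unordered pair of alleles.

```python
from collections import Counter

def genotype_summary(genotypes):
    """Summarize one locus. `genotypes` holds one (allele, allele) pair
    per diploid individual; the order within a pair is ignored."""
    n = len(genotypes)
    # Sort each pair so that (A, a) and (a, A) count as the same genotype.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    geno_freqs = {g: c / n for g, c in geno_counts.items()}
    # Observed heterozygosity: fraction of individuals with two different alleles.
    observed_het = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are counted over the 2n gene copies sampled.
    allele_counts = Counter(a for g in genotypes for a in g)
    allele_freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    return geno_freqs, observed_het, allele_freqs

# Hypothetical sample: 5 AA, 3 Aa, 2 aa individuals.
sample = [("A", "A")] * 5 + [("A", "a")] * 3 + [("a", "a")] * 2
geno_freqs, h_obs, allele_freqs = genotype_summary(sample)
print(geno_freqs, h_obs, allele_freqs)
```

Note that the genotype frequencies and observed heterozygosity are computed over individuals, while the allele frequencies are computed over gene copies, which mirrors the genotypic versus gametic distinction drawn above.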

Expected Heterozygosity

A gene pool is the population of gene copies that are collectively shared by the individuals of a deme (local population), OR a gene pool is the population of potential gametes that can be produced by the individuals of a deme. In either case, the gene pool consists of haploid genetic elements.

If $n$ is the number of distinct alleles or haplotypes and $p_i$ is the frequency of the $i$th allele or haplotype in the gene pool, then

$$H = 1 - \sum_{i=1}^{n} p_i^2$$

If several loci are scored, then the average expected heterozygosity over the $n_L$ loci is:

$$\bar{H} = \frac{1}{n_L} \sum_{j=1}^{n_L} H_j$$

At the nucleotide level, for DNA sequence data:

1. If you have SNPs, simply use the previous equations.
2. If you have haplotypes, then use:

$$\Pi = \sum_{i=1}^{n} \sum_{j=1}^{n} p_i p_j \pi_{ij}$$

where $p_i$ is the frequency of haplotype $i$ in the gene pool, and $\pi_{ij}$ is the number of nucleotide differences between haplotypes $i$ and $j$.

Note: recall from classical neutral theory that the expected nucleotide diversity is $\theta = 4N\mu$.
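As a worked illustration of the two measures just described (one minus the sum of squared frequencies, and the frequency-weighted sum of pairwise nucleotide differences), here is a small Python sketch. The function names and the toy haplotypes are assumptions made for this example; haplotypes are assumed to be aligned strings of equal length.

```python
def expected_heterozygosity(freqs):
    """Probability that two gene copies drawn at random from the
    gene pool carry different alleles: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def nucleotide_diversity(haplotypes, freqs):
    """Sum over all ordered haplotype pairs of p_i * p_j * pi_ij,
    where pi_ij counts nucleotide differences between i and j."""
    total = 0.0
    for hap_i, p_i in zip(haplotypes, freqs):
        for hap_j, p_j in zip(haplotypes, freqs):
            differences = sum(1 for a, b in zip(hap_i, hap_j) if a != b)
            total += p_i * p_j * differences
    return total

# Hypothetical two-allele locus with frequencies 0.65 and 0.35.
h_exp = expected_heterozygosity([0.65, 0.35])

# Hypothetical three-haplotype gene pool over two sites.
pi_total = nucleotide_diversity(["AG", "CG", "CT"], [0.5, 0.25, 0.25])
print(h_exp, pi_total)
```

Pairs with i = j contribute nothing because identical haplotypes have zero differences, so the double sum can run over all pairs without a special case.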

Expected Heterozygosity

At the nucleotide level, for DNA sequence data, another commonly encountered equation is:

$$\hat{\theta} = \frac{V}{L \sum_{i=1}^{n-1} \frac{1}{i}}$$

where $V$ is the number of variable sites in the sample, $L$ is the length in nucleotides of the DNA region surveyed, and $n$ is the number of DNA molecules in the sample. This equation assumes the INFINITE SITES MODEL.

Many statistics used in molecular evolution are based upon the infinite sites model, in which each mutation occurs at a new nucleotide site. This model allows for no multiple mutational hits at a single site. When dealing with a region of DNA in which mutational hotspots exist, the statistics based upon the infinite sites model may be very misleading. Despite this danger, such statistics are still widely used, and few investigators test this assumption.

Sensitivity To The Infinite Sites Assumption

Palsbøll et al. (Evolution 58: 670-675, 2004) looked at mtDNA in fin whales to estimate θ and the gene flow parameter M = Nm: "our results clearly show that one can arrive at radically different conclusions if applying the wrong mutation model during estimation."

Figure panels:
Posterior distribution of θ under the infinite sites model.
Posterior distribution of θ under a fitted finite sites model.
Posterior distribution of M under the infinite sites model (implied either no gene flow or highly restricted gene flow and significant subdivision).
Posterior distribution of M under a fitted finite sites model (implied extensive gene flow and near panmixia).
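The estimator described above (variable sites divided by region length times a harmonic sum over sample size) can be sketched as follows. The function name `theta_from_variable_sites` and the four toy sequences are hypothetical, and the sequences are assumed to be aligned and of equal length. Like the equation itself, the sketch assumes the infinite sites model: a site hit by two mutations is still counted only once.

```python
def theta_from_variable_sites(sequences):
    """Per-nucleotide estimate of theta from the number of variable
    sites V, the region length L, and the sample size n."""
    n = len(sequences)
    length = len(sequences[0])
    # A site is variable if more than one nucleotide is observed there.
    variable = sum(1 for site in zip(*sequences) if len(set(site)) > 1)
    # Harmonic sum 1 + 1/2 + ... + 1/(n - 1).
    harmonic = sum(1.0 / i for i in range(1, n))
    return variable / (length * harmonic)

# Hypothetical sample of four aligned sequences with two variable sites.
toy_sequences = ["AAGT", "ACGT", "ACGA", "AAGT"]
theta_hat = theta_from_variable_sites(toy_sequences)
print(theta_hat)
```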

Sensitivity To The Infinite Sites Assumption

The Four Gamete Test For Detecting Recombination

Under recombination:

1. Start with genetic variation at only one site. Two gamete types: A G, A G, C G, C G
2. Mutation at a second site produces three gamete types: A G, A G, C G, C T
3. Recombination produces four gamete types: A G, A T, C G, C T

Under recurrent mutation:

1. Start with genetic variation at only one site. Two gamete types: A G, A G, C G, C G
2. Mutation at a second site produces three gamete types: A G, A G, C G, C T
3. A second mutation at the same site produces four gamete types: A T, A G, C G, C T
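The four gamete test shown above can be sketched in Python as a scan over all pairs of sites. The function name `four_gamete_pairs` is hypothetical, and the sketch assumes biallelic sites and phased haplotypes given as equal-length strings.

```python
from itertools import combinations

def four_gamete_pairs(haplotypes):
    """Return the pairs of site indices at which all four two-site
    gamete types are observed (sites are assumed biallelic)."""
    n_sites = len(haplotypes[0])
    flagged = []
    for i, j in combinations(range(n_sites), 2):
        gamete_types = {(h[i], h[j]) for h in haplotypes}
        if len(gamete_types) >= 4:
            flagged.append((i, j))
    return flagged

# The three-gamete and four-gamete samples from the slide.
after_one_mutation = ["AG", "AG", "CG", "CT"]
after_recombination = ["AG", "AT", "CG", "CT"]
after_second_mutation = ["AT", "AG", "CG", "CT"]

print(four_gamete_pairs(after_one_mutation))
print(four_gamete_pairs(after_recombination))
print(four_gamete_pairs(after_second_mutation))
```

The last two samples give the same result, which is the point of the slide: the test flags the site pair whether the fourth gamete arose by recombination or by a second mutational hit, so it is only interpretable as recombination if the infinite sites model holds.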

Genetic Survey of Lipoprotein Lipase

LPL has 10 exons over 30 kb of DNA on chromosome 8p22.

Sequenced 9,734 bp from the 3′ end of intron 3 to the 5′ end of intron 9.

Sequenced:
24 individuals from North Karelia, Finland (world's highest frequency of CAD)
23 European-Americans from Rochester, Minnesota
24 African-Americans from Jackson, Mississippi

Found 88 variable sites.

Ignored singleton and doubleton sites and variation due to a tetranucleotide repeat, but phased the remaining 69 polymorphic sites by a combination of allele-specific primer pairs and haplotype subtraction.

The phased site data identified 88 distinct haplotypes.

Clark et al. (Am. J. Human Genet. 63:595-612, 1998) applied the 4-gamete test to the LPL region and inferred extensive recombination uniformly distributed throughout the LPL region (four gamete types: A G, A T, C G, C T).

But does this region satisfy the infinite sites model?

Mutagenesis via 5-Methylcytosine

ANALYSIS OF HIGHLY MUTABLE SITES IN LPL

TYPE OF SITE                                    NUMBER OF      NUMBER         % POLY. PER
                                                NUCLEOTIDES    POLYMORPHIC    NUCLEOTIDE
CpG                                             198            19             9.6%
MONONUCLEOTIDE RUNS ≥ 5 NUCLEOTIDES             456            15             3.3%
POLYMERASE α ARREST SITE ± 3 NUCLEOTIDES        264            8              3.0%
  [TG(A/G)(A/G)GA]
ALL OTHER NUCLEOTIDES                           8,866          46             0.5%

ln-likelihood ratio test of homogeneity = 99.8, 3 df, p ≈ 1.75 × 10^-7
ln-likelihood ratio test of homogeneity within the three mutable classes = 12.3, 2 df, p ≈ 0.002

Templeton et al. (Am. J. Human Genet. 66:69-83, 2000) used a test that did not assume the infinite sites model and inferred much less recombination, concentrated into the 6th intron of the LPL region.

The four gamete test was applied to human mtDNA (a molecule with no recombination) and identified 413 recombination events uniformly distributed across the molecule!
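The two homogeneity statistics in the table can be checked with a short Python sketch of a log-likelihood ratio (G) test on polymorphic versus non-polymorphic counts per site class. The slides do not say how the test was computed, so the formulation below (G = 2 Σ observed × ln(observed / expected), with expected counts from the pooled polymorphism rate) is an assumption, as are the names `g_statistic` and `site_classes`. The p-value is computed only for the 2 df case, where the chi-square tail probability has the simple closed form exp(-G/2).

```python
import math

def g_statistic(classes):
    """Log-likelihood ratio statistic for homogeneity of the
    polymorphism rate across classes.
    `classes` is a list of (n_nucleotides, n_polymorphic) pairs."""
    total_n = sum(n for n, _ in classes)
    total_poly = sum(k for _, k in classes)
    pooled_rate = total_poly / total_n
    g = 0.0
    for n, k in classes:
        cells = ((k, n * pooled_rate), (n - k, n * (1.0 - pooled_rate)))
        for observed, expected in cells:
            if observed > 0:  # an empty cell contributes zero
                g += 2.0 * observed * math.log(observed / expected)
    return g

# Counts taken from the table above.
site_classes = {
    "CpG": (198, 19),
    "mononucleotide runs": (456, 15),
    "polymerase alpha arrest site": (264, 8),
    "all other nucleotides": (8866, 46),
}

g_all = g_statistic(list(site_classes.values()))
mutable = [v for name, v in site_classes.items() if name != "all other nucleotides"]
g_mutable = g_statistic(mutable)
# For exactly 2 degrees of freedom the chi-square tail is exp(-G/2).
p_mutable = math.exp(-g_mutable / 2.0)
print(round(g_all, 1), round(g_mutable, 1), round(p_mutable, 4))
```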

Large sections of chromosome 19 have been sequenced and can be used to study how often deviations from the infinite sites model occur.

An Analysis For The Mutagenic Effects of CpG Dinucleotides in the APOS Chromosome Block and for Heterogeneity Within The Block

Prob. homogeneity across blocks:
    CG, C→T      1.7 × 10^-6
    CG, other    0.51
    Non-CG       5.5 × 10^-3

CG dinucleotides accounted for 4.7% of the nucleotides and 40% of the polymorphic sites.

Prob. homogeneity: 0.076, 0.005, 3.1 × 10^-4, 1.4 × 10^-7, 5.5 × 10^-6, 1.7 × 10^-17, 1.4 × 10^-9, 0.033, 1.8 × 10^-7, 6.6 × 10^-14

No block satisfied the infinite sites model, but the degree of deviation varied significantly from block to block.

Must look at your data before analysis! Several tools are available to help you, e.g., Posada's ModelTest at http://darwin.uvigo.es/

Variation Among Populations Within A Species

$n_s$ = number of subpopulations, with $N_s$ being the size of subpopulation $s$, and $n_L$ = number of loci surveyed
$p_{ijs}$ = frequency of allele (haplotype) $i$ at locus $j$ in subpopulation $s$
$p_{ij}$ = frequency of allele (haplotype) $i$ at locus $j$ in the total population

$F_{ST}$ measures how genetic variation is distributed within and among subpopulations on a 0-1 scale:

$F_{ST} = 0$: all demes have identical gene pools; all variation is shared equally throughout the species.
$F_{ST} = 1$: no variation within demes; all variation exists as differences between the demes' gene pools.
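The two ends of the 0-1 scale can be illustrated with a small Python sketch. The surviving slide text does not give the formula used for $F_{ST}$, so this example assumes one common heterozygosity-based definition, (H_T - H_S) / H_T, at a single locus with equally weighted demes. The function names are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one gene pool."""
    return 1.0 - sum(p * p for p in freqs)

def fst_from_frequencies(deme_freqs):
    """F_ST as (H_T - H_S) / H_T at one locus (an assumed definition).
    `deme_freqs` holds one allele-frequency list per deme; demes are
    weighted equally. Undefined if the total population is monomorphic."""
    n_demes = len(deme_freqs)
    n_alleles = len(deme_freqs[0])
    # H_S: average expected heterozygosity within demes.
    h_s = sum(expected_het(f) for f in deme_freqs) / n_demes
    # H_T: expected heterozygosity of the pooled (total) gene pool.
    pooled = [sum(f[i] for f in deme_freqs) / n_demes for i in range(n_alleles)]
    h_t = expected_het(pooled)
    return (h_t - h_s) / h_t

identical_demes = [[0.5, 0.5], [0.5, 0.5]]   # same gene pool everywhere
fixed_differences = [[1.0, 0.0], [0.0, 1.0]]  # no variation within demes
print(fst_from_frequencies(identical_demes))
print(fst_from_frequencies(fixed_differences))
```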

Wright quantified the balance of gene flow to drift as measured by $F_{st}$ for the island model.

Impact of Drift and Gene Flow On Average F In The Island Model

Effect of drift alone on the probability that two randomly chosen genes are identical by descent (I.B.D.):

$$F(t) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)$$

With gene flow, two genes can only be I.B.D. if they are from the same deme (by assumption), so:

$$F(t) = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)\right] (1-m)^2$$

At equilibrium, $F(t) = F(t-1) = F$, and the above equation yields:

$$F = F_{st} \approx \frac{1}{4Nm + 1}$$

The product $Nm$ in the denominator is the $M$ seen before.

The equation

$$F_{st} \approx \frac{1}{4Nm + 1}$$

is NOT the universal relationship between gene flow and genetic drift, as often presented. E.g., consider the one-dimensional stepping stone model (isolation by distance):
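The island-model recursion and its equilibrium approximation can be compared numerically. The following Python sketch simply iterates the recursion with gene flow until it settles; the function names and the parameter values N = 100 and m = 0.01 are arbitrary choices for illustration.

```python
def island_f_by_iteration(N, m, generations=5000):
    """Iterate F(t) = [1/(2N) + (1 - 1/(2N)) F(t-1)] (1 - m)^2
    starting from F(0) = 0 and return the final value."""
    f = 0.0
    for _ in range(generations):
        f = (1.0 / (2 * N) + (1.0 - 1.0 / (2 * N)) * f) * (1.0 - m) ** 2
    return f

def island_f_approximation(N, m):
    """Equilibrium approximation F = 1 / (4Nm + 1)."""
    return 1.0 / (4 * N * m + 1.0)

f_iterated = island_f_by_iteration(N=100, m=0.01)
f_approx = island_f_approximation(N=100, m=0.01)
print(f_iterated, f_approx)
```

With these values the iterated equilibrium is slightly below the approximation, because the approximation drops terms of order m squared and m/N. With m = 0 the recursion reduces to drift alone and F climbs toward 1.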

Impact of Drift and Gene Flow On $f_{st}$ In The Stepping Stone Model

$$f_{st} \approx \frac{1}{1 + 4N_{ev}\sqrt{2 m_1 m_\infty}} \quad \text{when } m_1 \gg m_\infty$$

Because the two migration parameters appear as the product $m_1 m_\infty$, even small amounts of long-distance gene flow have a major impact on $f_{st}$. The reason is that the evolutionary impact of gene flow depends both on the amount of gene flow and on the difference in allele frequency. The farther the distance, the greater the difference in allele frequency in general, so long-distance dispersal has a disproportionate evolutionary impact.

$F_{ST}$ can also be applied to a pair of demes as a measure of the genetic distance between their gene pools. Note: this genetic distance as measured by $F_{ST}$ is a function only of allele frequency differences in the gene pools of the two demes being compared.

Figure: human populations and theoretical expectations of Wright's isolation by distance model.

Nei created another genetic distance between populations, based on allele frequency differences, that is widely used. Genetic identity between two populations, X and Y:

$$I_{XY} = \frac{J_{XY}}{\sqrt{J_X J_Y}}, \qquad J_{XY} = \sum_i p_{iX}\, p_{iY}, \quad J_X = \sum_i p_{iX}^2, \quad J_Y = \sum_i p_{iY}^2$$

$I_{XY}$ ranges on a 0 to 1 scale, with 1 meaning that both populations share the same alleles with the same frequencies, and 0 meaning the gene pools share no alleles in common. Nei converted this into a genetic distance on a 0 to ∞ scale by a log transformation:

$$D_{XY} = -\ln I_{XY}$$
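Nei's identity and its log transformation can be sketched in Python for a single locus. The sketch assumes the standard form of the identity (the cross-population sum of frequency products, normalized by the two within-population sums of squares); the function names and the frequency vectors are hypothetical.

```python
import math

def nei_identity(freqs_x, freqs_y):
    """Genetic identity between two gene pools at one locus.
    Both lists give frequencies for the same ordered set of alleles."""
    j_xy = sum(px * py for px, py in zip(freqs_x, freqs_y))
    j_x = sum(px * px for px in freqs_x)
    j_y = sum(py * py for py in freqs_y)
    return j_xy / math.sqrt(j_x * j_y)

def nei_distance(freqs_x, freqs_y):
    """Log-transformed identity; infinite when no alleles are shared."""
    identity = nei_identity(freqs_x, freqs_y)
    return math.inf if identity == 0 else -math.log(identity)

same_pool = nei_distance([0.7, 0.3], [0.7, 0.3])
no_shared_alleles = nei_distance([1.0, 0.0], [0.0, 1.0])
partly_diverged = nei_distance([0.7, 0.3], [0.4, 0.6])
print(same_pool, no_shared_alleles, partly_diverged)
```

Both functions take only allele frequencies as input, which is the property the following slides rely on when criticizing the mutational interpretation of this distance.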

Variation Among Isolates And Species

There is no gene flow among isolates and species, so their long-term divergence is dominated by mutation. Genetic distances are the most common measures used to quantify this divergence.

Distance estimates attempt to estimate the mean number of mutational changes per site since two species (or isolates, or sequences) split from each other. Simply counting the number of differences (p distance) may underestimate the amount of change, especially if the sequences are very dissimilar, because of multiple hits. We therefore use a model which includes parameters that reflect how we think sequences may have evolved.

The Bogus Proof Of Nei's Genetic Distance As Applied To Species

Let $t$ = the time of the split (cessation of gene flow), and assume a constant mutation rate of $\alpha$. Then Nei "showed":

$$J_{XY}(t) = J_{XY}(0)\, e^{-\alpha t}\, e^{-\alpha t} = J_{XY}(0)\, e^{-2\alpha t}$$

where $J_{XY}(0)$ is the expected homozygosity at the time of the split, the first exponential is the probability of no mutation in lineage X by time $t$, and the second is the probability of no mutation in lineage Y by time $t$.

Problems: Nei treats his expected homozygosity as a constant over time until a mutation occurs in one lineage, thereby destroying it completely. But $J_{XY}$ is exclusively a function of allele frequencies, and any evolutionary force that alters allele frequencies will cause $J_{XY}$ to change with time. Thus, $J$ can change even in the absence of mutation. Moreover, mutation is modeled as reducing $J$ to 0 instantly, which is incompatible with the definition of $J$ in terms of allele frequencies.

Nei next assumes that $J_X$ and $J_Y$ are constants over time (realistic?). Then Nei "showed":

$$I_{XY}(t) = I_{XY}(0)\, e^{-2\alpha t} \quad\Rightarrow\quad D_{XY}(t) = D_{XY}(0) + 2\alpha t$$

Nei next assumes that $D_{XY}(0) = 0$ (not valid when one or more of the isolates was established by a founder event, or if an ancestral population with $F_{ST} > 0$ is fragmented):

$$D_{XY}(t) = 2\alpha t$$
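The remark that the p distance underestimates change because of multiple hits can be made concrete with a small Python sketch. The slides do not name a specific substitution model, so the Jukes-Cantor correction is used here purely as an assumed example of "a model which includes parameters"; the function names and the two toy sequences are hypothetical.

```python
import math

def p_distance(seq_x, seq_y):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_x, seq_y) if a != b)
    return differences / len(seq_x)

def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance, d = -(3/4) ln(1 - 4p/3).
    Assumes equal base frequencies and equal substitution rates.
    The correction is undefined once p reaches 3/4 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences differing at 1 of 10 sites.
p_obs = p_distance("ACGTACGTAC", "ACGTACGTAT")
d_corrected = jukes_cantor_distance(p_obs)
print(p_obs, d_corrected)
```

The corrected distance is always at least as large as the raw proportion of differences, and the gap widens as sequences become more dissimilar, which is the behavior the slide describes.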

The Bogus Proof Of Nei's Genetic Distance As Applied To Species

The assumptions of Nei's proof of a linear genetic distance for species are not appropriate for a genetic distance between populations based on allele frequencies; rather, they are more appropriate for the divergence of two DNA lineages from a common ancestral molecule. Nei was trying to force his genetic distance to have properties similar to those of DNA molecules as calculated by Kimura under neutrality.

Nei's bogus justification of his distance through mutational accumulation has caused, and continues to cause, great confusion between population genetic distance (defined in terms of allele or haplotype frequencies) and molecule genetic distance. This distinction is critical, but it is often not made in the literature, so BEWARE!

Measurement of Molecular Genetic Variation

We now turn our attention to molecules, but to do so, we must first look at coalescent theory.