Methods Available for the Analysis of Data from Dominant Molecular Markers

Lisa Wallace
Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069
Email: lwallace@usd.edu
February, 2003

In the following descriptions, locus = band (these are observable on a gel); an allele is an estimated entity based on dominant data.

I. Descriptive statistics of levels of diversity

Descriptive population genetic statistics can be calculated based on phenotypic (i.e., band presence/absence) or genotypic (i.e., allele frequencies) data. If you choose to calculate allele frequencies, you are assuming Hardy-Weinberg equilibrium in populations, an outcrossing mating system, and nearly random mating. If you have information from other sources on the mating system or extent of random mating (e.g., from allozymes), that can be incorporated in your estimates of diversity based on dominant data.

Only two alleles are considered to exist for a dominant marker locus, the dominant allele (or present; this does not imply that the presence of a band is dominant over the absence in a Mendelian sense) and the null (or visually absent) allele. The presence of a band indicates either a heterozygote or a homozygote for the dominant allele. Thus, allele frequencies are calculated based on the frequency of the null allele (i.e., the number of individuals without the band). Where $q_i$ represents the frequency of the null allele, and $p_i$ represents the frequency of the dominant allele,

$$q_i = \sqrt{\frac{\text{number of individuals for which the band was NOT present}}{\text{total number of individuals surveyed}}}$$

$$p_i = 1 - q_i$$

Several descriptive measures of diversity can be calculated, including:

1. Band frequencies (phenotypic data)

2. Allele frequencies (genotypic data)
3. Number of bands per primer, population, taxon, etc.
4. Number and frequencies of rare bands (it's up to you to determine and defend what constitutes rarity)
5. Percentage of polymorphic loci, based on the number of loci where the band was observed in some individuals and not in others. This too can be determined at various levels (e.g., population, taxon).
6. Gene diversity, often indicated as Nei's heterozygosity or expected heterozygosity. Even when calculated from allele frequencies, this is still only an estimate of the expected heterozygosity because the allele frequencies are themselves estimates of the expected allele frequencies. Thus, estimates of this statistic should not be compared directly to estimates of heterozygosity that are based on true allele frequencies from allozymes or other codominant markers. The following estimates of diversity are from Mariette et al. (2002).
   a. Phenotypic gene diversity: $H_p = 1 - P_i^2 - Q_i^2$, where $P_i$ and $Q_i$ are the frequencies of band presence and absence, respectively. Estimates of $H_p$ are calculated for each locus, and the mean over all loci is used as the overall estimate of diversity at whatever hierarchical level you are interested in quantifying.
   b. Genotypic gene diversity: $H_g = 1 - p_i^2 - q_i^2$, where $p_i$ and $q_i$ are the frequencies of the dominant and null alleles, respectively. Calculate for each locus, and then take the mean over all loci just as for phenotypic diversity described above.
   c. I do not know of a program that will calculate $H_p$ in this manner, but both POPGENE and TFPGA will calculate $H_g$. With POPGENE, you can specify whether you want to assume complete inbreeding, complete outcrossing, or something in between in the estimates of $H_g$. In TFPGA, though, you only have the option of assuming complete outcrossing (i.e., Hardy-Weinberg equilibrium) in populations. TFPGA will give three estimates of genotypic diversity or heterozygosity. These include a direct count, expected heterozygosity under HWE, and Nei's (1978) unbiased heterozygosity. The first two measures should be the same and are $H_g$. I don't recommend using Nei's estimate because I don't fully understand how the program calculates it. POPGENE calculates Nei's diversity (1973), which should also be $H_g$. I calculated estimates of $H_p$ by hand in a spreadsheet. You just need to do a lot of adding, multiplying, and subtracting.
7. Differences in the various estimates of diversity among taxa (populations are the experimental units) can be tested using non-parametric tests such as Kruskal-Wallis, followed by Dunn's multiple comparisons if you find significant overall differences. See Zar (1996) or Sokal and Rohlf (1995) for formulas. You could also test for differences across populations using loci as the experimental units.

II. Comparative statistics of diversity

Genetic identities or distances are useful for getting an overall idea of how similar (or different) populations and taxa are. Like estimates of levels of diversity, genetic identities/distances can be calculated based on phenotypic or genotypic data. For phenotypic data, the similarity coefficient of Nei and Li (1979; = Dice's coefficient) is a commonly used measure, and can be calculated using NTSYS-pc. ARLEQUIN will calculate a raw estimate of the differences (i.e., the mean number of pairwise differences in bands within and between populations and taxa; inter-taxic distances are corrected to account for the relative differences found within species). For genotypic data, any number of measures can be used, and the reader is referred to the manuals of TFPGA and POPGENE.

Once calculated, identities/distances can be used in multivariate analyses (e.g., principal coordinates analysis, PCO) and in tree-building algorithms (UPGMA or neighbor-joining). NTSYS-pc will perform multivariate analyses and build trees. For PCO, generate a Dice similarity matrix of the data, DCENTER the matrix, and use the double-centered matrix in EIGEN. For a tree, put the Dice similarity matrix into the NJOIN program or SAHN (for UPGMA). POPGENE and TFPGA only do UPGMA.
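As an illustration of the phenotypic similarity measure mentioned above, here is a minimal Python sketch of the Nei and Li (Dice) coefficient, 2 x (shared bands) / (bands in A + bands in B), applied to 0/1 band profiles. The function names and the example profiles are hypothetical, and this is not a description of how NTSYS-pc computes the matrix internally; the handling of two profiles with no bands at all is an assumption made here.

```python
from typing import List, Sequence


def dice_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Nei and Li (1979) / Dice similarity between two 0/1 band profiles."""
    if len(a) != len(b):
        raise ValueError("both profiles must score the same set of bands")
    # Bands present in both individuals.
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    # Total number of bands scored as present in each individual.
    total = sum(a) + sum(b)
    # Assumption: two empty profiles carry no information, so return 0.0.
    return 2.0 * shared / total if total else 0.0


def dice_matrix(profiles: Sequence[Sequence[int]]) -> List[List[float]]:
    """Square similarity matrix over all pairs of band profiles."""
    return [[dice_similarity(p, q) for q in profiles] for p in profiles]


# Hypothetical band scores for three individuals at five loci (1 = band present).
ind_a = [1, 1, 0, 1, 0]
ind_b = [1, 0, 0, 1, 1]
ind_c = [0, 0, 1, 0, 0]

similarities = dice_matrix([ind_a, ind_b, ind_c])
# ind_a and ind_b share 2 bands out of 3 + 3, so their similarity is 4/6.
```

A matrix built this way could then be passed to whatever ordination or clustering routine you prefer; the double-centering and eigenvector steps are left to the programs named in the text.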
I recommend using the NEIGHBOR algorithm in PHYLIP for a neighbor-joining analysis because you can then view the tree easily in TreeView. PAUP will also implement the neighbor-joining algorithm. Bootstrap support for trees can be determined in PAUP or with the RAPD programs developed by Bill Black. Use RAPDPLOT or RAPDDIST to generate multiple pseudo-replicate datasets of distances. Then, move this file over into the PHYLIP directory, and use NEIGHBOR to generate a tree from each of the distance matrices. Rename the resulting treefile and outfile to something like treefile1 and outfile1. Input treefile1 into CONSENSE to generate a consensus tree of the trees generated in NEIGHBOR. The bootstrap values will be in the outfile. The consensus tree can be viewed from the treefile in TreeView. These programs have limits on the number of populations and loci that can be used. Therefore, it might be easier to use PAUP to construct and bootstrap a tree.

Similarity/distance matrices can also be compared to matrices based on other sets of data using a Mantel test (e.g., to compare physical distance among populations with how genetically similar/dissimilar they are, or to compare taxonomic similarity based on molecular and morphological data). Mantel tests are most easily implemented using NTSYS-pc.

III. Genetic structure

Genetic structure, or the degree of differentiation among populations, can be estimated using a variety of measures, including an analysis of molecular variance (AMOVA) and ratios of other diversity statistics. I recommend using AMOVA. AMOVA can be implemented using ARLEQUIN, and the help file that comes with the program is quite thorough in its explanation of how to carry out the analysis. Should you wish to use ratios of estimates of diversity (e.g., $H_p$ or $H_g$) to determine population differentiation, the following is a guide:

Amount of variation within populations = mean population diversity / total species diversity

Amount of variation among populations = (total species diversity - mean population diversity) / total species diversity = 1 - amount of variation within populations

If you are interested in genetic structure at more than two levels, then just adjust the above to match the number of levels you do have. For example, if you want to determine the amount of divergence among multiple regions as well as among populations, then the following would apply:

Amount of variation among groups = (total species diversity - mean regional diversity) / total species diversity
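The per-locus diversity formulas from Section I and the within/among ratios above can be sketched in Python as follows. This is only an illustration under stated assumptions: all function names and the example data are hypothetical, the allele frequencies assume Hardy-Weinberg equilibrium as described in the text, and "total species diversity" is computed here by pooling all individuals across populations, which is one possible reading and not something the text specifies.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence

BandColumn = Sequence[int]          # 0/1 scores for one locus across individuals
BandMatrix = List[List[int]]        # rows = individuals, columns = loci


def null_allele_freq(bands: BandColumn) -> float:
    """q = sqrt(fraction of individuals lacking the band), assuming HWE."""
    if not bands:
        raise ValueError("no individuals scored at this locus")
    absent = sum(1 for b in bands if b == 0)
    return sqrt(absent / len(bands))


def phenotypic_diversity(bands: BandColumn) -> float:
    """H_p = 1 - P^2 - Q^2 from band presence (P) and absence (Q) frequencies."""
    if not bands:
        raise ValueError("no individuals scored at this locus")
    P = sum(bands) / len(bands)
    Q = 1.0 - P
    return 1.0 - P * P - Q * Q


def genotypic_diversity(bands: BandColumn) -> float:
    """H_g = 1 - p^2 - q^2 from the estimated dominant (p) and null (q) alleles."""
    q = null_allele_freq(bands)
    p = 1.0 - q
    return 1.0 - p * p - q * q


def mean_over_loci(matrix: BandMatrix, stat: Callable[[BandColumn], float]) -> float:
    """Apply a per-locus statistic to every column and average the results."""
    loci = list(zip(*matrix))
    return sum(stat(locus) for locus in loci) / len(loci)


def partition(populations: Dict[str, BandMatrix],
              stat: Callable[[BandColumn], float]) -> Dict[str, float]:
    """Proportion of diversity within and among populations."""
    mean_pop = sum(mean_over_loci(m, stat) for m in populations.values()) / len(populations)
    # Assumption: total diversity is computed on all individuals pooled together.
    pooled = [row for m in populations.values() for row in m]
    total = mean_over_loci(pooled, stat)
    if total == 0:
        raise ValueError("no diversity in the pooled sample; ratios are undefined")
    within = mean_pop / total
    return {"within": within, "among": 1.0 - within}


# Hypothetical data: two populations, four individuals each, two loci.
example = {
    "pop1": [[1, 1], [1, 0], [1, 1], [1, 0]],
    "pop2": [[0, 1], [0, 0], [0, 1], [0, 0]],
}
result = partition(example, phenotypic_diversity)
# Mean population H_p is 0.25 and pooled H_p is 0.5, so within = 0.5 and among = 0.5.
```

Passing `genotypic_diversity` instead of `phenotypic_diversity` gives the same partition based on $H_g$. A third level (regions) could be handled by repeating the same ratio with mean regional diversity in place of mean population diversity.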

A new program called HICKORY, developed by Kent Holsinger and Paul Lewis at the University of Connecticut, will calculate F-statistics, including $F_{ST}$, and the inbreeding coefficient, $f$. Some of the analyses included in this software use Bayesian statistics to estimate population genetic parameters. See Holsinger et al. (2002) for more details about analyses performed by HICKORY.

Programs and where to find them:

NTSYS-pc (F.J. Rohlf): $230 for the latest version 2.1 from Exeter Software, Setauket, NY. http://www.exetersoftware.com/cat/ntsyspc/ntsyspc.html
ARLEQUIN (S. Schneider, D. Roessli, L. Excoffier): Free at http://lgb.unige.ch/arlequin/
POPGENE (F. Yeh, R. Yang, T. Boyle): Free at http://www.ualberta.ca/~fyeh/
TFPGA (M. Miller): Free at http://bioweb.usu.edu/mpmbio/tfpga.htm
PHYLIP (J. Felsenstein): Free at http://evolution.genetics.washington.edu/phylip.html
RAPD programs (B. Black): Free at ftp://lamar.colostate.edu/pub/wcb4/
TreeView (R. Page): Free at http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
HICKORY (K. Holsinger, P. Lewis): Free at http://darwin.eeb.uconn.edu/hickory/hickory.html

References

Holsinger, K. E., P. O. Lewis, and D. K. Dey. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11: 1157-1164.
Mariette, S., V. Le Corre, F. Austerlitz, and A. Kremer. 2002. Sampling within the genome for measuring within-population diversity: trade-offs between markers. Molecular Ecology 11: 1145-1156.
Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA 70: 3321-3323.
Nei, M. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89: 583-590.
Nei, M. and W. H. Li. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, USA 76: 5269-5273.
Sokal, R. R. and F. J. Rohlf. 1995. Biometry. Freeman, NY.
Zar, J. H. 1996. Biostatistical Analysis. Prentice Hall, Upper Saddle River, NJ.