Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Similar documents
Linkage Disequilibrium

Mapping and Mapping Populations

Identifying Genes Underlying QTLs

SNP calling and Genome Wide Association Study (GWAS) Trushar Shah

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

Understanding genetic association studies. Peter Kamerman

Introduction to Quantitative Genomics / Genetics

SNPs - GWAS - eqtls. Sebastian Schmeier

Trudy F C Mackay, Department of Genetics, North Carolina State University, Raleigh NC , USA.

GDMS Templates Documentation GDMS Templates Release 1.0

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Marker types. Potato Association of America Frederiction August 9, Allen Van Deynze

Computational Workflows for Genome-Wide Association Study: I

MAS refers to the use of DNA markers that are tightly-linked to target loci as a substitute for or to assist phenotypic screening.

Genome-wide association studies (GWAS) Part 1

I.1 The Principle: Identification and Application of Molecular Markers

Prostate Cancer Genetics: Today and tomorrow

Gene Mapping in Natural Plant Populations Guilt by Association

ABSTRACT : 162 IQUIRA E & BELZILE F*

Map-Based Cloning of Qualitative Plant Genes

A major gene for leaf cadmium. (Zea mays L.)

A brief introduction to Marker-Assisted Breeding. a BASF Plant Science Company

QTL Mapping, MAS, and Genomic Selection

Brassica carinata crop improvement & molecular tools for improving crop performance

High-density SNP Genotyping Analysis of Broiler Breeding Lines

LD Mapping and the Coalescent

Lecture 6: Association Mapping. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

Genome-wide analyses in admixed populations: Challenges and opportunities

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

Utilization of Genomic Information to Accelerate Soybean Breeding and Product Development through Marker Assisted Selection

Quantitative Genetics for Using Genetic Diversity

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Supplementary Note: Detecting population structure in rare variant data

1b. How do people differ genetically?

Association Mapping in Wheat: Issues and Trends

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Association mapping of Sclerotinia stalk rot resistance in domesticated sunflower plant introductions

b. (3 points) The expected frequencies of each blood type in the deme if mating is random with respect to variation at this locus.

B) You can conclude that A 1 is identical by descent. Notice that A2 had to come from the father (and therefore, A1 is maternal in both cases).

GENETICS - CLUTCH CH.20 QUANTITATIVE GENETICS.

Using molecular marker technology in studies on plant genetic diversity Final considerations

GBS Usage Cases: Examples from Maize

Association studies (Linkage disequilibrium)

Major Genes Conditioning Resistance to Rust in Common Bean. and a Protocol for Monitoring Local races of the Bean Rust Pathogne

Molecular and Applied Genetics

Algorithms for Genetics: Introduction, and sources of variation

Experimental Design and Sample Size Requirement for QTL Mapping

Human Genetics and Gene Mapping of Complex Traits

Crash-course in genomics

Genomics assisted Genetic enhancement Applications and potential in tree improvement

GBS Usage Cases: Non-model Organisms. Katie E. Hyma, PhD Bioinformatics Core Institute for Genomic Diversity Cornell University

CMSC423: Bioinformatic Algorithms, Databases and Tools. Some Genetics

Identifying the functional bases of trait variation in Brassica napus using Associative Transcriptomics

Lecture 21: Association Studies and Signatures of Selection. November 6, 2006

Module 1 Principles of plant breeding

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

Traditional Genetic Improvement. Genetic variation is due to differences in DNA sequence. Adding DNA sequence data to traditional breeding.

Population Genetics. If we closely examine the individuals of a population, there is almost always PHENOTYPIC

Genetic dissection of complex traits, crop improvement through markerassisted selection, and genomic selection

Lecture 2: Height in Plants, Animals, and Humans. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

Topics in Statistical Genetics

AlphaSim software for

Authors: Vivek Sharma and Ram Kunwar

Variation Chapter 9 10/6/2014. Some terms. Variation in phenotype can be due to genes AND environment: Is variation genetic, environmental, or both?

SolCAP. Executive Commitee : David Douches Walter De Jong Robin Buell David Francis Alexandra Stone Lukas Mueller AllenVan Deynze

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics

Supplementary Figure 1 Genotyping by Sequencing (GBS) pipeline used in this study to genotype maize inbred lines. The 14,129 maize inbred lines were

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping

Why do we need statistics to study genetics and evolution?

latestdevelopments relevant for the Ag sector André Eggen Agriculture Segment Manager, Europe

BTRY 7210: Topics in Quantitative Genomics and Genetics

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Identifying and exploiting natural variation

Supplementary Text. eqtl mapping in the Bay x Sha recombinant population.

Implementing direct and indirect markers.

Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato

H3A - Genome-Wide Association testing SOP

Genome-Wide Association Studies (GWAS): Computational Them

Analysis of genome-wide genotype data

Quantitative Genetics, Genetical Genomics, and Plant Improvement

QTL ANALYSIS OF EMAMECTIN BENZOATE SUSCEPTIBILITY IN THE SALMON LOUSE Lepeophtheirus salmonis

A genome wide association study of metabolic traits in human urine

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

OBJECTIVES-ACTIVITIES 2-4

Agricultural Applications for Genome Sequencing

Potential of human genome sequencing. Paul Pharoah Reader in Cancer Epidemiology University of Cambridge

Principal Component Analysis in Genomic Data

Linking Genetic Variation to Important Phenotypes

Question. In the last 100 years. What is Feed Efficiency? Genetics of Feed Efficiency and Applications for the Dairy Industry

Genome-wide association studies for agronomical traits in a world wide spring barley collection

Lecture 12. Genomics. Mapping. Definition Species sequencing ESTs. Why? Types of mapping Markers p & Types

Applied Bioinformatics

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey

Genotype Prediction with SVMs

Lecture 1 Introduction to Modern Plant Breeding. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping

General aspects of genome-wide association studies

Familial Breast Cancer

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Transcription:

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Traditional QTL approach Uses standard bi-parental mapping populations o F2 or RI These have a limited number of recombination events o Result is that the QTL covers many cm Additional steps required to narrow QTL or clone gene Difficult to discover closely linked markers or the causative gene Association mapping (AM) An alternative to traditional QTL mapping o Uses the recombination events from many lineages o Discovers linked markers associated (=linked) to gene controlling the trait Major goal o Discover the causative SNP in a gene Exploits the natural variation found in a species o Landraces o Cultivars from multiple programs Discovers associations of broad application o Variation from regional breeding programs can also be utilized Associations useful for special local discovered

Problem with AM Association could be the result of population structure o Hypothetical example North America South America Plant Ht 10 10 12 11 13 9 11 10 13 12 4 6 5 7 6 6 4 5 9 5 Dis Res S S S T S S S S T S T S T T T T T S T T SNP1 T T T G T T T T G T G G G G G G T G T G SNP1 in Example Assumed the SNP it is associated with plant height or disease resistance o North American lines are Shorter and susceptible Allele T could be associated with either trait o South American lines are Taller and tolerant Allele G could be associated with either trait Associated with both traits because of population structure o These are false positive associations (Type I errors) Result o Population structure must be accounted for in analysis Key Principle Regarding AM Human o Common variant/common disease o A specific SNP in a specific gene is responsible for a disease found throughout humans Plants o Common variant/common phenotype o A specific SNP in a specific gene is responsible for a disease found throughout a specific species Important Concept Related to Principle AM o Useful for discovering common variant o Each locus may account for only a small amount of the variation Bi-parental mapping o Useful for discovering rare alleles that control a phenotype o These alleles typically have a major effect

Idealized Cases Results for AM No association between marker and phenotype Marker 1 Allele 1 Allele 2 Case 100 100 Control 100 100 Association between marker and phenotype Marker 2 Allele 1 Allele 2 Case 200 0 Control 0 200 Methodology of AM 1. Define a population for analysis Should represent the diversity useful for goals of project o Specific to target of project Species-wide Use lines from all major subdivisions of the species Regional or local Use lines typical to target region 2. Genotype the population Genome-wide scan o Low density ~100 SSR markers Gel-based o Medium density ~1500 SNP Golden Gate assay o High density 50,000-500,000 SNPs Arabidopsis o 250,000 SNPs o Affymetrix chip

Candidate gene o Select genes that might control trait Sequence different genotypes Discover SNPs in gene Consider o 5 -UTR o Coding region o 3 -UTR 3. Accounting for Population Structure/Relatedness Define subpopulations o Select markers to genotype the population Markers should Distributed among all chromosomes All should be in linkage equilibrium Minor allele frequency >0.1 o Evaluating population structure STRUCTUE software Use matrix of percentage population membership in analysis Fixed effect Principal component (PC) Defines groups of individuals Select principal components that account for specific amount of variation 50% is a typical value Fixed effect o Evaluate relatedness Spagedi relatedness calculations Output is a table with all pairwise-comparisons Random effect

4. Statistical Analysis Marker-by-marker analysis o Regression of phenotype onto marker genotype Significant marker/trait associations discovered Analysis must controlling for population structure and/or relatedness o Most popular approach Mixed linear model Example formula: y = Pυ + S + I + e y = vector of phenotypic values P = matrix of structure or PC values v = vector regarding population structure (STRUCTURE of PC values) (fixed effect) S = vector of genotype values for each marker = vector of fixed effects for each marker (fixed effect) I = relatedness identity matrix = vector pertaining to recent ancestry (random effects) e = vector of residual effects Model from: Weber et al. 2008. Genetics 180:1221. 5. Choosing the correct model Evaluate all models individually o STRUCTURE o STRUCTURE and relatedness o PC o PC and relatedness Develop a P by P plot for each model o Y-axis Cumulative P values o X-axis Experimental P values for marker-by-marker analysis o Ideal situation o 5% of the cumulative P-values Select the approach that is linear or nearly

Figure 2. P-P plots for simple (no structure correction) for full (PC and relatedness correction). (From: Weber et al. 2008. Genetics 180:1221. 6. What is a Significant Association? When performing multiple analyses on the same phenotype dataset o At a P = 0.05 level 1 of 20 random associations will be significant Must account for this Type I error Bonferroni test o Divide experiment-wide error rate by number of comparisons Error rate of 0.05 and 100 comparisons P < 0.0005 would be significant o Conservative approach False discovery rate o P < 0.05 of FDR value Does AM Work?? Example: Arabidopsis flowering time and disease resistance genes Aranzana et al. 2005, PLoS Genetics 1(5): e60 Population o 95 Arabidopsis acessions from Europe Phenotyping o Flowering time o Disease response to three pathogens Genotyping o 876 random loci o 4 candidate genes Flowering time FRI Disease Resistance Rpm1, Rps2, Rps5 Statistical analysis o Population structure only correction

Results All four candidate loci strongly associated with expected phenotype Marker density had an effect Markers less than 10kb from loci strongly associated with phenotype for all traits Population structure effect Fig. 2. P-P plots. Effect of models on detecting significant associations. If structure is accounted for, the line would be straight. Marker distance effect

Fig 5. Effect of marker distance for detecting significant association (A) Flowering time (FRI); (B) Disease resistance (Rpm1); (C) Disease resistance (Rps2); (D) Disease resistance (Rps5)

Association Mapping or Bi-Parental QTL Mapping? 1. Issues to consider Effect of rare alleles o Effect on rare allele in the association population mean will be minimal o Locus will not be detected by the AM approach The effect of a rare allele can be detected in a biparental population Effect of common alleles o Common alleles are a component of phenotypic expression Effect found through out the population (species) and can be discovered using AM Contribution of any one allele to phenotype may be small (R2<10%) 2. What is your goal? Discover, analyze, and test genes of major effect o Bi-parental populations of divergent parents and traditional (CIM) is best approach Dissect the factors controlling a phenotype through out a population o AM of appropriate population