Conifer Translational Genomics Network Coordinated Agricultural Project

Similar documents
Conifer Translational Genomics Network Coordinated Agricultural Project

Identifying Genes Underlying QTLs

Conifer Translational Genomics Network Coordinated Agricultural Project

Understanding genetic association studies. Peter Kamerman

Conifer Translational Genomics Network Coordinated Agricultural Project

Introduction to Quantitative Genomics / Genetics

Traditional Genetic Improvement. Genetic variation is due to differences in DNA sequence. Adding DNA sequence data to traditional breeding.

Conifer Translational Genomics Network Coordinated Agricultural Project

Summary for BIOSTAT/STAT551 Statistical Genetics II: Quantitative Traits

Genome Wide Association Study for Binomially Distributed Traits: A Case Study for Stalk Lodging in Maize

Quantitative Genetics, Genetical Genomics, and Plant Improvement

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

A Primer of Ecological Genetics

Genome-Wide Association Studies (GWAS): Computational Them

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Linkage Disequilibrium

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Gene Mapping in Natural Plant Populations Guilt by Association

Genetic dissection of complex traits, crop improvement through markerassisted selection, and genomic selection

Computational Workflows for Genome-Wide Association Study: I

SolCAP. Executive Commitee : David Douches Walter De Jong Robin Buell David Francis Alexandra Stone Lukas Mueller AllenVan Deynze

GENETICS - CLUTCH CH.20 QUANTITATIVE GENETICS.

Marker types. Potato Association of America Frederiction August 9, Allen Van Deynze

Genome-wide analyses in admixed populations: Challenges and opportunities

Genomic Selection in Breeding Programs BIOL 509 November 26, 2013

General aspects of genome-wide association studies

SNPs - GWAS - eqtls. Sebastian Schmeier

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

Genome-wide association studies (GWAS) Part 1

SNP calling and Genome Wide Association Study (GWAS) Trushar Shah

Quantitative Genetics for Using Genetic Diversity

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

FOREST GENETICS. The role of genetics in domestication, management and conservation of forest tree populations

BTRY 7210: Topics in Quantitative Genomics and Genetics

Genetics Effective Use of New and Existing Methods

Genomics assisted Genetic enhancement Applications and potential in tree improvement

Trudy F C Mackay, Department of Genetics, North Carolina State University, Raleigh NC , USA.

Park /12. Yudin /19. Li /26. Song /9

BTRY 7210: Topics in Quantitative Genomics and Genetics

QTL Mapping, MAS, and Genomic Selection

Inflorescence QTL, Canalization, and Selectable Cryptic Variation. Patrick J. Brown Department of Crop Sciences UIUC

b. (3 points) The expected frequencies of each blood type in the deme if mating is random with respect to variation at this locus.

Introduction to Quantitative Genetics

A unified mixed-model method for association mapping that accounts for multiple levels of relatedness

Mapping and Mapping Populations

Landscape Genomics A tool for managing public lands in the face of changing environments.

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

Lecture 6: Association Mapping. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

Linkage Disequilibrium. Adele Crane & Angela Taravella

Lecture 6: GWAS in Samples with Structure. Summer Institute in Statistical Genetics 2015

Why do we need statistics to study genetics and evolution?

Lecture 12. Genomics. Mapping. Definition Species sequencing ESTs. Why? Types of mapping Markers p & Types

Lecture 2: Height in Plants, Animals, and Humans. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

Why can GBS be complicated? Tools for filtering & error correction. Edward Buckler USDA-ARS Cornell University

GBS Usage Cases: Non-model Organisms. Katie E. Hyma, PhD Bioinformatics Core Institute for Genomic Diversity Cornell University

Pedigree transmission disequilibrium test for quantitative traits in farm animals

H3A - Genome-Wide Association testing SOP

Supplementary Note: Detecting population structure in rare variant data

High-density SNP Genotyping Analysis of Broiler Breeding Lines

Molecular and Applied Genetics

BST227 Introduction to Statistical Genetics. Lecture 3: Introduction to population genetics

RNA-SEQUENCING ANALYSIS

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

GREG GIBSON SPENCER V. MUSE

5/18/2017. Genotypic, phenotypic or allelic frequencies each sum to 1. Changes in allele frequencies determine gene pool composition over generations

Module 1 Principles of plant breeding

Statistical Methods for Quantitative Trait Loci (QTL) Mapping

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

An introduction to genetics and molecular biology

Methods for linkage disequilibrium mapping in crops

Let s call the recessive allele r and the dominant allele R. The allele and genotype frequencies in the next generation are:

Algorithms for Genetics: Introduction, and sources of variation

Quantitative Genetics

Anthro 101: Human Biological Evolution. Lecture 3: Genetics & Inheritance. Prof. Kenneth Feldmeier feldmekj.weebly.

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Prof. Dr. Konstantin Strauch

Anthro 101: Human Biological Evolution. Lecture 3: Genetics & Inheritance. Prof. Kenneth Feldmeier feldmekj.weebly.

Why can GBS be complicated? Tools for filtering, error correction and imputation.

Prostate Cancer Genetics: Today and tomorrow

The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential

Supplementary Figure 1 Genotyping by Sequencing (GBS) pipeline used in this study to genotype maize inbred lines. The 14,129 maize inbred lines were

LD Mapping and the Coalescent

Crash-course in genomics

Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato

latestdevelopments relevant for the Ag sector André Eggen Agriculture Segment Manager, Europe

Lecture 21: Association Studies and Signatures of Selection. November 6, 2006

Brassica carinata crop improvement & molecular tools for improving crop performance

University of York Department of Biology B. Sc Stage 2 Degree Examinations

Monday, November 8 Shantz 242 E (the usual place) 5:00-7:00 PM

I.1 The Principle: Identification and Application of Molecular Markers

Ecological genomics and molecular adaptation: state of the Union and some research goals for the near future.

Genomic Selection with Linear Models and Rank Aggregation

Questions we are addressing. Hardy-Weinberg Theorem

AlphaSim software for

DESIGNS FOR QTL DETECTION IN LIVESTOCK AND THEIR IMPLICATIONS FOR MAS

ABSTRACT : 162 IQUIRA E & BELZILE F*

Capabilities & Services

Transcription:

Conifer Translational Genomics Network Coordinated Agricultural Project Genomics in Tree Breeding and Forest Ecosystem Management ----- Module 11 Association Genetics Nicholas Wheeler & David Harry Oregon State University

What is association genetics? Association genetics is the process of identifying alleles that are disproportionately represented among individuals with different phenotypes. It is a population-based survey used to identify relationships between genetic markers and phenotypic traits Two approaches for grouping individuals By phenotype (e.g. healthy vs. disease) By marker genotype (similar to approach used in QTL studies) Two approaches for selecting markers for evaluation Candidate gene Whole genome

Association genetics: conceptual example Figure Credit: Nicholas Wheeler, Oregon State University

Comparing the approaches Criteria Family-based QTL Mapping Population-based Association Mapping Number of markers Relatively few (50 100 s) Many (100 s 1000 s) Populations QTL analysis Few parents or grandparents with many offspring (>500) Easy or complex. Sophisticated tools minimize ghost QTL and increase mapping precision Many individuals with unknown or mixed relationships. If pedigreed, family sizes are typically small (10 s) relative to sampled population (>500) Easy or complex. Sophisticated tools reduce risk of false positives Detection depends on Mapping precision Variation detected Extrapolation to other families or populations QTL segregation in offspring, and marker-trait linkage within-family(s) Poor (0.1 to 15 cm). QTL regions may contain many positional candidate genes Subset (only the portion segregating in sampled pedigrees) Poor. (Other families not segregating QTL, changes in marker phase, etc) QTL segregation in population, and markertrait LD in mapping population Can be excellent (10 s to 1000 s kb). Depends on population LD Larger subset. Theoretically all variation segregating in targeted regions of genome Good to excellent. (Although not all QTL will be segregate in all population / pedigree subsamples)

Essential elements of association genetics Appropriate populations Detection Verification Good phenotypic data Good genotypic data Markers (SNPs): Number determined by experimental approach Quality of SNP calls Missing data Appropriate analytical approach to detect significant associations

Flowchart of a gene association study Figure Credit: Modified from Flint-Garcia et al., 2005

An association mapping population with known kinship 32 parents 64 families ~1400 clones Figure Credits: Cooperative Forest Genetics Research Program, University of Florida

Phenotyping: Precision, accuracy, and more Figure Credits: Gary Peter, University of Florida

Genotyping: Potential genomic targets Figure Credit: Nicholas Wheeler and David Harry, Oregon State University

Whole genome or candidate gene? Let s look again at how this works Figure Credit: Reprinted from Current Opinion in Plant Biology, Vol 5, Rafalski, Applications of single nucleotide polymorphisms in crop genetics, pages 94-100, 2002, with permission from Elsevier

Local distribution of SNPs and genes Figure Credit: Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics, Jorgenson and Witte, 2006

Candidate genes for novel (your) species Availability of candidate genes Positional candidates Functional studies Model organisms Genes identified in other forest trees

Stems Candidate genes for association studies Functional candidates By homology to genes in other species By direct evidence in forest trees Positional candidates QTL analyses in pedigrees zinc finger protein tub u lin u biq u itin 2% 1% 4% aq u ap o rin un kn o w n 1% 2% tra ns lation initia tion tr na atp synthase 1% 1% 2% transc ription fac tor 3% structure recognition 1% rna /dna binding 2% rna polymerase 1% rib os o m a l 17 % o xid as e 1% ph o sp h ata se p e ptid as e r ed u cta se 1% 3% 3% e fh a nd 2% heat shock protein 2% he lica s e 1% h isto ne 2% h ydr o las e 2% k in a se 1% 0.0 8.7 8.9 16.3 19.8 23.7 26.9 30.8 31.7 35.7 40.0 43.3 43.8 44.6 45.4 47.1 50.0 50.3 51.3 55.6 61.1 64.1 65.2 73.0 78.5 81.0 91.4 95.6 97.8 98.9 104.3 115.4 122.4 P-UBC_BC_350_1000 P-PmIFG_1275_a/Thaumatin-like_precursor P-PmIFG_1173_a F-PmIFG_1548_a(fr8-1) M-PmIFG_1514_c P-UBC_BC_570_425 M-estPmIFG_Pt9022+/2/translation-factor P-OSU_BC_570_360 B-PmIFG_1474_a M-PmIFG_1474_b F-PmIFG_0339_a(fr8-2) P-PmIFG_0320_a P-PmIFG_1005_e(fr17-2) M-UBC_BC_245_750 P-PmIFG_1427_a F-PmIFG_1567_a(fr8-3) Antifreeze protein P-PmIFG_1570_a P-PmIFG_0005_a(fr17-1) P-PmIFG_1308_a F-PtIFG_2885_a/2 P-PmIFG_1145_a(fr8-4) P-PmIFG_1601_a M-IFG_OP_K14_825 M-estPtIFG_8510/11 M-PmIFG_0005_b P-UBC_BC_446_600 P-PmIFG_0315_a(fr15-1) B-estPmaLU_SB49/3 P-PmIFG_1060_a Linkage group M-IFG_OP_H09_0650 F-PmIFG_1123_a(fr15-2) 8 M-UBC_BC_506_800 M-PmIFG_1591_a Expression candidates Microarray analyses Proteomics Metabolomics Figure Credits: Kostya Krutovsky, Texas A&M University

Potential genotyping pitfalls Quality of genotype data Contract labs, automated base calls Minor allele frequency Use minimum threshold, e.g. MAF 0.05 or MAF 0.10 Rare alleles can cause spurious associations due to small samples (recall that D is unstable with rare alleles) Missing data!!! Alternative methods for imputing missing data

Statistical tests for marker/trait associations SNP by trait association testing is, at its core, a simple test of correlation/regression between traits In reality such cases rarely exist and more sophisticated approaches are required. These may take the form of mixed models that account for potential covariates and other sources of variance The principle covariates of concern are population structure and kinship or relatedness, both of which may result in LD between a marker and a QTN that is not predictive for the population as a whole

Causes of population structure Geography Adaptation to local conditions (selection) Non-random mating Isolation / bottlenecks (drift) Assortative mating Geographic isolation Population admixture (migration) Co-ancestry

Case-control and population structure Figure Credit: Reprinted by permission from Macmillan Publishers Ltd: Nature Genetics Marchini et al., 2004.

Accommodating population structure Avoid the problem by avoiding admixted populations or working with populations of very well defined co-ancestry Use statistical tools to make appropriate adjustments

Detecting and accounting for population structure Family based methods Population based methods Genomic control (GC) Structured association (SA) Multivariate Mixed model analyses (test for association)

Family based approaches Avoid unknown population structure by following marker-trait inheritance in families (known parent-offspring relationships) Common approaches include Transmission disequilibrium test (TDT) for binary traits Quantitative transmission disequilibrium test (QTDT) for quantitative traits Both methods build upon Mendelian inheritance of markers within families Test procedure Group individuals by phenotype Look for markers with significant allele frequency differences between groups For a binary trait such as disease, use families with affected offspring Constraints Family structures must be known (e.g. pedigree) Limited samples

Population based: Genomic control Because of shared ancestry, population structure should translate into an increased level of genetic similarity distributed throughout the genome of related individuals By way of contrast, the expectation for a causal association would be a gene specific effect Genomic control (GC) process Neutral markers (e.g. 10-100 SSRs) are used to estimate the overall level of genetic similarity within a sampled population In turn, this proportional increase in similarity is used as an inflation factor, sometimes called, used to adjust significance probabilities (pvalues) For example, p-value (adj) = p-value (unadj) /(1+ ) Typical values of are in the range of ~0.02-0.10

Structured association The general idea behind structured association (SA) is that cryptic population history (or admixture) causes increased genetic similarity within groups The challenge is to determine how many groups (K) are represented, and then to quantify group affinities for each individual Correction factors are applied separately to each individual, based upon the inferred group affinities SA is computationally demanding

Multivariate methods Multivariate methods build upon co-variances among marker genotypes Multivariate methods such as PCA offer several advantages over SA Downstream analysis of SA and PCA data are similar

Mixed model approaches Mixed models test for association by taking into account factors such as kinship and population structure, provided by other means Provides good control of both type 1 (false positive associations) and type 2 (false negative associations) errors

Tassel mixed model : y i 1 2 3 4 5 6 7 8 X S Qv Zu e Location ID SNP ID Population ID Genotype ID Trait L1 L2 SNP1 P1 P2 G1 G2 G3 G4 y 1 = 1 0 1 1 0 1 0 0 0 e 1 y 2 = 1 0 1 0 1 0 0 1 0 e 2 y 3 = 1 0 1 0 1 0 0 1 0 u 1 e 3 y 4 = 1 0 b 1 0 0 1 v 1 0 0 1 0 u 2 e 4 * + * a 1 + * + * + y 5 = 0 1 b 2 0 1 0 v 2 0 1 0 0 u 3 e 5 y 6 = 0 1 0 0 1 0 0 0 1 u 4 e 6 y 7 = 0 1 1 1 0 0 1 0 0 e 7 y 8 = 0 1 1 0 1 0 0 0 1 e 8 y i = Xβ + Sα + Qv + Zu + e i y 3 = b 1 + a 1 + v 2 + u 3 + e 3 = Measured trait = Fixed effects (BLUE = Best Linear Unbiased Esitimates) = Random effects (BLUP = Best Linear Unbiased Predictions) Figure Credit: Fikret Isik, North Carolina State University

Significant associations for diabetes distributed across the human genome Figure Credit: Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics McCarthy et al., 2008.

Association genetics: Concluding comments Advantages Populations Mapping precision Scope of inference Drawbacks Resources required Confounding effects Repeatability

References cited Bradbury, P. J., Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ramdos, and E. S. Buckler. 2007. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633-2635. (Available online at: http://dx.doi.org/10.1093/bioinformatics/btm308) (verified 2 June 2011). Flint-Garcia, S. A., A. C. Thuillet, J. M. Yu, G. Pressoir, S. M. Romero, S. E. Mitchell, J. Doebley, S. Kresovich, M. M. Goodman, and E. S. Buckler. 2005. Maize association population: A high-resolution platform for quantitative trait locus dissection. Plant Journal 44: 1054-1064. (Available online at: http://dx.doi.org/10.1111/j.1365-313x.2005.02591.x) (verified 2 June 2011). Jorgenson, E. and J. Witte. 2006. A gene-centric approach to genome-wide association studies. Nature Reviews Genetics 7: 885-891. (Available online at: http://dx.doi.org/10.1038/nrg1962) (verified 2 June 2011). Marchini, J., L. R. Cardon, M. S. Phillips, and P. Donnelly. 2004. The effects of human population structure on large genetic association studies. Nature Genetics 36: 512-517. (Available online at: http://dx.doi.org/10.1038/ng1337) (verified 2 June 2011). McCarthy, M. I., G. R. Abecasis, L. R. Cardon, D. B. Goldstein, J. Little, J.P.A. Iaonnidis, and J. N. Hirschhorn. 2008. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics 9: 356-369. (Available online at: http://dx.doi.org/10.1038/nrg2344) (verified 2 June 2011).

References cited Neale, D. B., and O. Savolainen. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9: 325-330. (Available online at: http://dx.doi.org/10.1016/j.tplants.2004.05.006) (verified 2 June 2011). Pearson, T. A., and T. A. Manolio. 2008. How to interpret a genome-wide association study. Journal of the American Medical Association 299: 1335-1344. (Available online at: http://dx.doi.org/10.1001/jama.299.11.1335) (verified 2 June 2011). Rafalski, A. 2002. Applications of single nucleotide polymorphisms in crop genetics. Current Opinion in Plant Biology 5: 94-100. Wheeler, N. C., K. D. Jermstad, K. Krutovsky, S. N. Aitken, G. T. Howe, J. Krakowski, and D. B. Neale. 2005. Mapping of quantitative trait loci controlling adaptive traits in coastal Doublas-fir. IV. Cold-hardiness QTL verification and candidate gene mapping. Molecular Breeding 15: 145-156. (Available online at: http://dx.doi.org/10.1007/s11032-004-3978-9) (verified 2 June 2011). Yu, J., G. Pressoir, W. H. Briggs, I. V. Bi, M. Yamasaki, J. F. Doebley, M. D. McMullen, et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genetics 38: 203-208. (Available online at: http://dx.doi.org/10.1038/ng1702) (verified 2 June 2011).

Thank You. Conifer Translational Genomics Network Coordinated Agricultural Project