http://genemapping.org/

Epistasis in Association Studies David Evans

Law of Independent Assortment

Biological Epistasis
Bateson (1909): a masking effect whereby a variant or allele at one locus prevents the variant at another locus from manifesting its effect
-Phenotypic differences among individuals with various genotypes at one locus depend on their genotypes at other loci
-Does NOT depend on allele frequency

Epistasis in the labrador retriever dog

Recessive Epistasis
B = black, b = brown; E/e = gold locus

Statistical Epistasis
-Deviation of multilocus genotypic values from the additive combination of the single locus components
-Close to the statistical concept of interaction
-Depends on allele frequencies
-Population specific
-May be scale dependent
Different ways the epistatic values/variance components can be calculated:
-Hierarchical ANOVA (Sham, 1998)
-Method of Contrasts (Cockerham, 1954)
-Using partial derivatives of the population mean (Kojima, 1959; Tiwari & Elston, 1997)

No Epistasis
Effect at locus B is independent of the genotype at locus A
[Figure: trait value by genotype at locus A (AA, Aa, aa) and locus B (BB, Bb, bb)]

Epistasis: Two Loci
Locus A modifies the effect at locus B
[Figure: trait value by genotype at locus A (AA, Aa, aa) and locus B (BB, Bb, bb)]

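Genetic Variance: One Locus

Genotype          AA      Aa      aa
Genotypic mean    Y_AA    Y_Aa    Y_aa
Frequency         f_AA    f_Aa    f_aa

$$\mu = \sum_{i \in \{AA, Aa, aa\}} f_i Y_i$$
$$\sigma^2_G = \sigma^2_A + \sigma^2_D = \sum_{i \in \{AA, Aa, aa\}} f_i (Y_i - \mu)^2$$
$$\sigma^2_A = 2 f_A f_a \left[ f_A (Y_{AA} - Y_{Aa}) + f_a (Y_{Aa} - Y_{aa}) \right]^2$$
$$\sigma^2_D = f_A^2 f_a^2 \left( Y_{AA} - 2 Y_{Aa} + Y_{aa} \right)^2$$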
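The one-locus partition can be checked numerically. The sketch below is illustrative only: it assumes Hardy-Weinberg genotype proportions (which the closed forms for the additive and dominance variances rely on), and the function name and the example genotypic means are hypothetical, not taken from the slides.

```python
def one_locus_variance(p, y_AA, y_Aa, y_aa):
    """Partition the genetic variance at one biallelic locus.

    p is the frequency of allele A; genotype frequencies are assumed
    to follow Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    means = {"AA": y_AA, "Aa": y_Aa, "aa": y_aa}

    # Population mean and total genetic variance, straight from the definitions
    mu = sum(freqs[g] * means[g] for g in freqs)
    var_g = sum(freqs[g] * (means[g] - mu) ** 2 for g in freqs)

    # Closed forms for the additive and dominance components
    var_a = 2 * p * q * (p * (y_AA - y_Aa) + q * (y_Aa - y_aa)) ** 2
    var_d = (p * q) ** 2 * (y_AA - 2 * y_Aa + y_aa) ** 2
    return mu, var_g, var_a, var_d


# Hypothetical fully dominant locus: AA and Aa share the same mean
mu, var_g, var_a, var_d = one_locus_variance(0.8, 1.0, 1.0, 0.0)
print(mu, var_g, var_a, var_d)
```

In this example the additive and dominance components add up to the total genetic variance, as the partition requires.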

Partitioning the Variance: One Locus
[Figure: two panels, Additive Model and Dominant Model, each showing genotypes aa, Aa, AA]
-Additive genetic variance is the variance explained by the regression
-Dominance variance is the residual variance not explained by the regression

Least Squares Regression: One Locus
Represent the genotype of each individual by indicator variables:

            Additive coefficient    Additive and dominance coefficients
Genotype    X                       X       Z
aa          -1                      -1      -½
Aa          0                       0       ½
AA          1                       1       -½

Y = µ + aX + dZ + ε
-Fit by least squares (or maximum likelihood)
-Can provide tests of significance
-Partitions data into variance components
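As a worked illustration of this coding (a sketch, not part of the original slides): with three genotype means and three parameters the model is saturated, so the least squares fit reproduces the genotype means exactly. Substituting the codes X = 1, 0, -1 and Z = -½, ½, -½ gives three equations that can be solved directly.

```latex
\begin{aligned}
Y_{AA} &= \mu + a - \tfrac{1}{2} d, \qquad
Y_{Aa} = \mu + \tfrac{1}{2} d, \qquad
Y_{aa} = \mu - a - \tfrac{1}{2} d \\[4pt]
a &= \tfrac{1}{2}\left( Y_{AA} - Y_{aa} \right) \\
d &= Y_{Aa} - \tfrac{1}{2}\left( Y_{AA} + Y_{aa} \right) \\
\mu &= \tfrac{1}{4}\left( Y_{AA} + 2 Y_{Aa} + Y_{aa} \right)
\end{aligned}
```

So a is half the difference between the two homozygote means, and d is the deviation of the heterozygote mean from the homozygote midpoint.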

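Genetic Variance: Two Loci

        BB                 Bb                 bb
AA      Y_AABB, f_AABB     Y_AABb, f_AABb     Y_AAbb, f_AAbb
Aa      Y_AaBB, f_AaBB     Y_AaBb, f_AaBb     Y_Aabb, f_Aabb
aa      Y_aaBB, f_aaBB     Y_aaBb, f_aaBb     Y_aabb, f_aabb

$$\mu = \sum_{i \in A} \sum_{j \in B} f_{ij} Y_{ij}, \quad A = \{AA, Aa, aa\}, \; B = \{BB, Bb, bb\}$$
$$\sigma^2_G = \sum_{i \in A} \sum_{j \in B} f_{ij} (Y_{ij} - \mu)^2$$
$$\sigma^2_I = \sigma^2_G - \sigma^2_A - \sigma^2_D$$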
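A minimal sketch of the two-locus mean and total genetic variance follows. It assumes Hardy-Weinberg proportions at each locus and independence between the two loci, so that each two-locus frequency is a product of single-locus frequencies; neither assumption is stated on the slide. The function names and the example table of means are hypothetical.

```python
def hwe(p, labels):
    """Genotype frequencies under Hardy-Weinberg proportions."""
    q = 1.0 - p
    return dict(zip(labels, (p * p, 2 * p * q, q * q)))


def two_locus_variance(p_A, p_B, means):
    """Return (mu, total genetic variance) for a 3 x 3 table of genotypic means."""
    f_A = hwe(p_A, ("AA", "Aa", "aa"))
    f_B = hwe(p_B, ("BB", "Bb", "bb"))
    # Two-locus frequencies as products (assumes the loci are independent)
    freqs = {(i, j): f_A[i] * f_B[j] for i in f_A for j in f_B}
    mu = sum(freqs[k] * means[k] for k in freqs)
    var_g = sum(freqs[k] * (means[k] - mu) ** 2 for k in freqs)
    return mu, var_g


# Hypothetical table: mean 1 only when both loci carry an upper-case allele
means = {(i, j): 1.0 if (i != "aa" and j != "bb") else 0.0
         for i in ("AA", "Aa", "aa") for j in ("BB", "Bb", "bb")}
mu, var_g = two_locus_variance(0.5, 0.5, means)
print(mu, var_g)
```

The epistatic variance would then be obtained by subtracting the single-locus additive and dominance components from this total.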

Components of Variance for a Two Locus Model
-Additive genetic variance, Locus 1
-Additive genetic variance, Locus 2
-Dominance genetic variance, Locus 1
-Dominance genetic variance, Locus 2
Epistatic Variance:
-Additive x Additive genetic variance
-Additive x Dominance genetic variance
-Dominance x Additive genetic variance
-Dominance x Dominance genetic variance

[Figure: surface plots of each variance component against the allele frequencies p(A) and p(B). Panels: additive variance at locus A, dominance variance at locus A, additive variance at locus B, dominance variance at locus B, additive x additive variance, additive x dominance variance, dominance x additive variance, dominance x dominance variance]

Least Squares Regression: Two Loci

$$Y = \mu + a_1 X_1 + a_2 X_2 + d_1 Z_1 + d_2 Z_2 + i_{aa} w_{aa} + i_{ad} w_{ad} + i_{da} w_{da} + i_{dd} w_{dd} + \varepsilon$$

X_1 = 1 if AA, 0 if Aa, -1 if aa
X_2 = 1 if BB, 0 if Bb, -1 if bb
Z_1 = ½ if Aa, -½ otherwise
Z_2 = ½ if Bb, -½ otherwise
w_aa = X_1 X_2, w_ad = X_1 Z_2, w_da = Z_1 X_2, w_dd = Z_1 Z_2

Can also formulate using logistic regression for dichotomous traits
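The two-locus coding can be sketched as a function that builds one design row per individual. This only illustrates the coding scheme; the function names, and the choice to represent a genotype as a count of upper-case alleles (2, 1 or 0), are assumptions made for the example.

```python
def code_locus(n_upper):
    """Return (X, Z) for a genotype given as the number of upper-case alleles.

    X is the additive code (1, 0, -1) and Z the dominance code
    (1/2 for the heterozygote, -1/2 otherwise).
    """
    x = n_upper - 1
    z = 0.5 if n_upper == 1 else -0.5
    return x, z


def design_row(n_A, n_B):
    """Main-effect and interaction covariates for one individual."""
    x1, z1 = code_locus(n_A)
    x2, z2 = code_locus(n_B)
    return {
        "X1": x1, "X2": x2, "Z1": z1, "Z2": z2,
        "w_aa": x1 * x2,  # additive x additive
        "w_ad": x1 * z2,  # additive x dominance
        "w_da": z1 * x2,  # dominance x additive
        "w_dd": z1 * z2,  # dominance x dominance
    }


# An AA Bb individual: 2 upper-case alleles at locus A, 1 at locus B
print(design_row(2, 1))
```

Stacking these rows over all individuals gives the covariate matrix that a least squares or logistic regression routine would take as input.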

Why model Epistasis in GWAS?
-Epistasis is important in model organisms, cf. studies in Guinea fowl, yeast, Drosophila
-Single locus tests will not always detect epistasis!!!
-Modeling epistasis may improve power to detect loci (?)
Epistasis is ignored in human studies:
-Main effects hard enough to find!
-Multiple testing problem: the number of marker pairs gives a very stringent p-value cutoff!!!
-Computational problems
-Storage problems
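The scale of the multiple testing problem can be illustrated with a little arithmetic. In the sketch below the marker count, the 0.05 family-wise error rate and the use of a Bonferroni correction are all assumptions chosen for illustration, not values confirmed by the slides.

```python
from math import comb


def pairwise_bonferroni(n_markers, alpha=0.05):
    """Number of marker pairs and the Bonferroni per-test cutoff."""
    n_pairs = comb(n_markers, 2)
    return n_pairs, alpha / n_pairs


# Hypothetical panel of 100,000 markers
n_pairs, cutoff = pairwise_bonferroni(100_000)
print(f"{n_pairs:,} pairwise tests, per-test cutoff {cutoff:.2e}")
```

With a panel of this size there are roughly five billion pairs, so an exhaustive scan needs a per-test cutoff on the order of 1e-11.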

Epistasis in GWAS?
What are the consequences of fitting two locus models when epistasis is absent?
-If it isn't there, what happens if we go looking for it?
What are the consequences of fitting two locus models when epistasis is present?
-If it IS there, what happens if we go looking for it?
What are the consequences of fitting single locus models when epistasis is present?
-If it's there, what happens when we ignore it?

METHOD: Simulate a quantitative variable
-BOTH loci combined are responsible for a small percentage of the total variance (levels include 5% and 2%)
-Several sample sizes
-Assume a genome-wide panel of markers
-Perfect LD between marker and trait locus
-Comprehensive range of allele frequencies at both loci
-A range of different models incorporating varying degrees of epistasis

[Figure: grid of the two-locus models considered, each labelled by a model number (M1, M2, M3, ...). Evans et al. (2006) PLoS Genet]

[Figure: trait mean by genotype at locus A (AA, Aa, aa) and locus B (bb, Bb, BB) for each model, grouped in three columns]
CONTROL:
-Additive
-Dominant
-Heterozygote Advantage
LESS FREAKY:
-Complementary Gene Action
-Dominant x Dominant Complementary Gene Action
-Dominant x Recessive Complementary Gene Action
-Threshold Model
WAY FREAKY:
-Modifying Effects
-D x D Epistasis
-XOR
-Chequer Board

Simulating Genetic Data
Genetic model and allele frequencies:
-Allele frequencies at each locus will determine the frequency of genotypes, e.g. p_A = 0.8 and p_a = 0.2 give p_AA = 0.64, p_Aa = 0.32, p_aa = 0.04
-Need to specify trait means for each genotype combination
-A random normal deviate can then be added to these means to simulate the action of the environment
Each combination of parameters will result in a unique variance profile
[Figure: trait mean by genotype at locus A (AA, Aa, aa) and locus B (bb, Bb, BB) for the Complementary Gene Action model]
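The simulation recipe can be sketched in a few lines. This is an illustrative sketch rather than the code used for the results in these slides: it assumes Hardy-Weinberg proportions, independent loci and a unit-variance normal environmental deviate, and the table of trait means is a hypothetical example.

```python
import random


def hwe_freqs(p):
    """Genotype frequencies (upper homozygote, heterozygote, lower homozygote)."""
    q = 1.0 - p
    return [p * p, 2 * p * q, q * q]


def simulate(n, p_A, p_B, means, env_sd=1.0, seed=0):
    """Simulate n individuals as (genotype A, genotype B, trait value)."""
    rng = random.Random(seed)
    labels_A = ["AA", "Aa", "aa"]
    labels_B = ["BB", "Bb", "bb"]
    data = []
    for _ in range(n):
        # Draw each genotype from its Hardy-Weinberg frequencies
        g_A = rng.choices(labels_A, weights=hwe_freqs(p_A))[0]
        g_B = rng.choices(labels_B, weights=hwe_freqs(p_B))[0]
        # Genotypic mean plus a random normal deviate for the environment
        y = means[(g_A, g_B)] + rng.gauss(0.0, env_sd)
        data.append((g_A, g_B, y))
    return data


# Hypothetical means: raised only when both loci carry an upper-case allele
means = {(a, b): 0.5 if (a != "aa" and b != "bb") else 0.0
         for a in ("AA", "Aa", "aa") for b in ("BB", "Bb", "bb")}
sample = simulate(500, 0.8, 0.8, means)
print(sample[:3])
```

Changing the allele frequencies or the table of means changes how the genetic variance splits into additive, dominance and epistatic components, which is the "variance profile" referred to above.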

Simulating Genetic Data
[Figure: variance components for the Complementary Gene Action model, plotted against the allele frequencies at the two loci. Panels: additive variance at locus 1, additive variance at locus 2, dominance variance at locus 1, dominance variance at locus 2, epistatic variance components]

METHOD: Quantify power to detect association
-Simulations run for each combination of parameters
Single locus test of association:
-Power to detect BOTH loci
-Power to detect EITHER locus
Two locus test of association:
-Power to detect BOTH loci
-Different from power to detect the epistatic variance component explicitly
All models fit via maximum likelihood:
-Significance assessed by a likelihood ratio test (minus twice the difference in log-likelihood, compared with a chi-square distribution)
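The likelihood ratio step can be sketched with the standard library alone. The helper below is a simplified illustration: it handles only 1 degree of freedom or an even number of degrees of freedom, where the chi-square tail has a simple closed form, and the function names and example log-likelihoods are hypothetical.

```python
from math import erfc, exp, sqrt


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable (df = 1 or even df only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return erfc(sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return exp(-x / 2.0) * total
    raise ValueError("this sketch handles only df = 1 or even df")


def lrt_pvalue(loglik_null, loglik_full, df):
    """Minus two times the log-likelihood difference, and its p-value."""
    stat = -2.0 * (loglik_null - loglik_full)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods; df = 8 would correspond to comparing a full
# two-locus model (8 genetic parameters) with a mean-only model
stat, p = lrt_pvalue(-1250.0, -1230.0, df=8)
print(stat, p)
```

A general-purpose implementation would normally use a statistics library for the chi-square distribution; the closed forms are used here only to keep the example self-contained.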

Epistasis Isn't There
[Figure: power against allele frequency at locus A and locus B, Additive Model. Panels: Two Locus Test; Power to Detect EITHER Locus, Single Locus Test; Power to Detect BOTH Loci, Single Locus Test]
-The power to detect EITHER locus is greater using the single locus test when epistasis is NOT present
-The power to detect BOTH loci is actually less using single locus tests even when epistasis is NOT present

Epistasis IS There
-The power to detect BOTH loci is always better using the two locus test when epistasis is present
[Figure: power to detect BOTH loci against allele frequency at locus A and locus B, Dominant x Dominant Complementary Gene Action Model. Panels: Two Locus Test; SINGLE Locus Tests]

Power to Detect EITHER Locus
-When the model is EXTREME, the power of the two locus test is often better than the power to detect EITHER locus using single locus tests
[Figure: Dominant x Dominant Epistasis model (trait mean by genotype at locus A and locus B), with power against allele frequency at the two loci. Panels: Two Locus Test; Power to detect EITHER locus, SINGLE Locus Tests]

Power to Detect EITHER Locus
-When the model is less extreme, the power to detect EITHER locus using the single locus tests is often better than the two locus test
[Figure: Dominant x Dominant Complementary Gene Action Model (trait mean by genotype at locus A and locus B), with power against allele frequency at the two loci. Panels: Two Locus Test; Power to detect EITHER locus, SINGLE Locus Tests]

-The power to detect BOTH loci is always better fitting the two locus model, regardless of the underlying model
-There are situations where fitting the full two-locus model will reveal effects which are not identified using single-locus methodology
-Multiple testing doesn't kill you as much as you think!!!

Exhaustive or Two Stage Strategy?
Idea: Two Stage Strategies
-Test only a subset of all possible pairwise comparisons
-Which comparisons are chosen depends on their performance in the single locus tests
PLUS: Reduces the cost due to multiple testing
MINUS: Throws away some comparisons which would be significant in the two locus test, yet are not significant in the single locus tests

Two-stage Models: Strategy One
-Only markers which pass the first stage threshold in the single locus analysis are tested
-All pair-wise combinations of these markers are tested
[Figure: single locus p-values for Markers 1 to 10 against the first stage threshold; Markers 2, 6 and 9 pass]
-Three comparisons:
-Marker 2 vs Marker 6
-Marker 2 vs Marker 9
-Marker 6 vs Marker 9
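Strategy One can be sketched as a filter followed by all pairs among the survivors. The p-values and the threshold below are hypothetical values chosen so that Markers 2, 6 and 9 pass, matching the example on the slide; the function name is also an assumption.

```python
from itertools import combinations


def strategy_one_pairs(pvalues, threshold):
    """All pairs among the markers whose single locus p-value passes the threshold."""
    passed = sorted(m for m, p in pvalues.items() if p < threshold)
    return list(combinations(passed, 2))


# Hypothetical single locus p-values for ten markers
pvalues = {1: 0.40, 2: 0.01, 3: 0.75, 4: 0.22, 5: 0.61,
           6: 0.03, 7: 0.90, 8: 0.18, 9: 0.02, 10: 0.55}
pairs = strategy_one_pairs(pvalues, threshold=0.05)
print(pairs)
```

Only three of the 45 possible pairs are carried forward, which is where both the saving in multiple testing and the risk of discarding true interactions come from.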

Two Stage Procedure: Strategy One
[Figure: power to detect both loci against allele frequency at locus 1 and locus 2, Dominant x Dominant Complementary Gene Action Model; four panels, one per first stage p-value threshold]
-As the first stage threshold becomes more stringent, the power to detect both loci DECREASES for the majority of the parameter space

Why Does Strategy One Perform Poorly?
-For BOTH loci to be included in the second stage, Strategy One requires BOTH single locus tests to meet some threshold
-This threshold will not be met when the single locus variance is close to zero
-Therefore Strategy One will tend to fail whenever EITHER single locus component is close to zero
[Figure: proportion of variance against allele frequency at locus 1 and locus 2, Dominant x Dominant Complementary Gene Action Model. Panels: Variance Locus One; Variance Locus Two; Epistatic Variance]

Two-stage Models: Strategy Two
-Markers which pass the first stage threshold are tested with ALL other markers
[Figure: single locus p-values for Markers 1 to 10 against the first stage threshold; Markers 2, 6 and 9 pass]
-24 comparisons:
-Marker 2 vs Markers 1, 3, 4, 5, 6, 7, 8, 9, 10
-Marker 6 vs Markers 1, 3, 4, 5, 7, 8, 9, 10
-Marker 9 vs Markers 1, 3, 4, 5, 7, 8, 10
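Strategy Two can be sketched in the same way: every marker that passes the first stage is paired with every other marker, and each pair is counted once. The p-values and threshold are hypothetical values chosen so that Markers 2, 6 and 9 pass, as in the slide's example.

```python
def strategy_two_pairs(pvalues, threshold):
    """Pairs in which at least one marker passes the first stage threshold."""
    passed = {m for m, p in pvalues.items() if p < threshold}
    pairs = set()
    for m in passed:
        for other in pvalues:
            if other != m:
                # Store each pair in sorted order so it is counted only once
                pairs.add((min(m, other), max(m, other)))
    return sorted(pairs)


# Hypothetical single locus p-values for ten markers
pvalues = {1: 0.40, 2: 0.01, 3: 0.75, 4: 0.22, 5: 0.61,
           6: 0.03, 7: 0.90, 8: 0.18, 9: 0.02, 10: 0.55}
pairs = strategy_two_pairs(pvalues, threshold=0.05)
print(len(pairs))
```

With three passing markers out of ten this gives 9 + 8 + 7 = 24 comparisons, compared with 45 for an exhaustive scan.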

Two-Stage Procedure: Strategy Two
[Figure: power against allele frequency at the two loci, Dominant x Dominant Complementary Gene Action Model; four panels, one per first stage p-value threshold]
-As the first stage threshold becomes more stringent, there is no increase in power
-There is a decrease in power at more stringent levels

-Because of the need to condition on the first stage results being significant, there is no increased power in the second stage
-Since loci are included in the second stage if the variance is in EITHER single locus component, the strategy fails when the majority of variance is in the epistatic variance component or the first stage threshold is too severe
[Figure: surface plots against allele frequency at the two loci for eight models. Less Extreme Models: CGA, DxD CGA, DxR CGA, Threshold. Extreme Models: XOR, Chequer, ME, DxD]

-Many simple looking models contain regions of their parameter space where loci would not be able to be identified using single locus analyses or two stage analyses
-An exhaustive search involving all pair-wise combinations of markers across the genome is superior to performing a two stage strategy

Conclusions
-An exhaustive search involving all pair-wise combinations of markers across the genome is superior to performing a two stage strategy
-Many simple looking models contain sizeable regions of their parameter space where loci would not be able to be identified using single locus analyses
-Despite the increased penalty due to multiple testing, it is possible to detect interacting loci which contribute to moderate proportions of the phenotypic variance with realistic sample sizes
Is it worth incorporating epistasis in GWA?

Using PLINK to test for Epistasis
Two tests of epistasis are implemented in PLINK
Testing for additive x additive epistasis:
$$Y = \mu + a_1 X_1 + a_2 X_2 + i_{aa} w_{aa} + \varepsilon$$
    plink --file mydata --epistasis
Output files: plink.epi.cc and plink.epi.cc.summary
"Fast epistasis":
    plink --file mydata --fast-epistasis
Possible to control output using the flags --epi1 and --epi2

Practical
-Copy the files epistasis.ped and epistasis.map from H:/davide/LEUVEN28
-Run single locus tests of association in this dataset:
    plink --file epistasis --assoc
-Run a scan for epistasis using the --fast-epistasis option:
    plink --file epistasis --fast-epistasis
-What are the two top interactions from this analysis?
-Are these loci flagged in the single locus analysis?
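One way to pick out the top interactions is to sort the results table by p-value. The sketch below assumes a whitespace-delimited table whose header includes a column named P; the exact column layout of the PLINK output should be checked against the file itself, and the SNP names and values in the example are made up.

```python
def top_interactions(table_text, n=2):
    """Return the n rows with the smallest value in the P column."""
    rows = [line.split() for line in table_text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    p_col = header.index("P")
    scored = []
    for row in body:
        try:
            scored.append((float(row[p_col]), row))
        except ValueError:
            continue  # skip rows whose p-value is not a plain number (e.g. "NA")
    scored.sort(key=lambda item: item[0])
    return [dict(zip(header, row)) for _, row in scored[:n]]


# Hypothetical results table (made-up SNP names and values)
example = """
CHR1 SNP1   CHR2 SNP2   STAT  P
22   snp_a  22   snp_b  4.10  0.0429
22   snp_c  22   snp_d  15.7  7.4e-05
22   snp_e  22   snp_f  0.50  NA
22   snp_g  22   snp_h  9.20  0.0024
"""
for hit in top_interactions(example):
    print(hit["SNP1"], hit["SNP2"], hit["P"])
```

The SNPs in the top pairs can then be looked up in the single locus results to see whether they would have been flagged on their own.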

Practical
Two locus results:

CHR   SNP        CHR2   SNP2       STAT   P
22    rs244      22     rs967957   5.68   7.485e-5
22    rs276672   22     rs2769     5.72   7.348e-5

Single locus results:

CHR   SNP        CHISQ   P       OR
22    rs244      .269    .8697   .
22    rs967957   3.333   .679    .738
22    rs276672   3.7     .7969   .22
22    rs2769     .572    .4763   .9587