Introduction to statistics for Genome-Wide Association Studies (GWAS) Day 2 Section 8

1 Introduction to statistics for Genome-Wide Association Studies (GWAS)

2 Outline
Background on GWAS
Presentation of GenABEL
Data checking with GenABEL
Data analysis with GenABEL
Display of results

3 R Packages for GWAS
PLINK: developed at Massachusetts General Hospital and the Broad Institute of Harvard and MIT, but available only on the Linux platform.
SNPassoc (Juan R. González et al., Bioinformatics, (5))
GenABEL (Aulchenko Y.S., Ripke S., Isaacs A., van Duijn C.M., Bioinformatics, 2007, 23(10))

4 What is a GWAS?
A genome-wide association study is an approach that involves rapidly scanning markers across the genomes (0.5M or 1M markers) of many people (2K) to find genetic variations associated with a particular disease.
A large number of subjects is needed because:
(1) associations between SNPs and causal variants are expected to show low odds ratios, typically below 1.5;
(2) given the very large number of tests required, an association must reach a high level of significance to survive the multiple-testing correction and count as a reliable signal.
Such studies are particularly useful for finding genetic variations that contribute to common, complex diseases.
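To make the multiple-testing point concrete, here is a minimal sketch in base R. It assumes a Bonferroni correction at a family-wise error rate of 0.05; both the choice of correction and the 0.05 level are assumptions for illustration, since the slide does not name a method.

```r
# Hypothetical illustration: Bonferroni threshold for a genome-wide scan.
alpha <- 0.05                  # assumed family-wise error rate
n_tests <- c(5e5, 1e6)         # marker counts quoted on the slide
threshold <- alpha / n_tests   # per-SNP significance threshold
names(threshold) <- c("0.5M", "1M")
print(threshold)
```

Under these assumptions a single SNP must reach a p-value near 1e-7 or below, which is why weak effects (odds ratios below 1.5) need thousands of subjects.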

5 What is a GWAS?

6 Why are such studies possible now?
With the completion of the Human Genome Project in 2003 and the International HapMap Project in 2005, researchers now have a set of research tools that makes it possible to find the genetic contributions to common diseases.

7 GWAS for complex diseases

8 Overview of the general design and workflow of a genome-wide association (GWA) study

9 What have GWAS found?
In 2005, a GWAS showed that age-related macular degeneration is associated with variation in the gene for complement factor H, which produces a protein that regulates inflammation (Klein et al. (2005) Science, 308).
In 2007, the Wellcome Trust Case-Control Consortium (WTCCC) carried out GWAS for coronary heart disease, type 1 diabetes, type 2 diabetes, rheumatoid arthritis, Crohn's disease, bipolar disorder and hypertension. This study uncovered many new genes underlying these diseases.
See the next slide for more GWAS publications.

10 Examples of GWAS
Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet.
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Wellcome Trust Case Control Consortium. Nature. 2007;447.
Genomewide association analysis of coronary artery disease. Samani et al. N Engl J Med. 2007;357.
Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Parkes et al. Nat Genet. 2007;39;830-2.
Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Todd et al. Nat Genet. 2007;39.
A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Frayling et al. Science. 2007;316.
Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Zeggini et al. Science. 2007;316.
A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Scott et al. Science. 2007;316.

11 Example: Data & Results

12 Problem(s)
How to make inference about SNP-disease associations?
Which computational tools to use?

13 Features of GenABEL
Specifically designed for GWAS
Provides specific facilities for storage and manipulation of large data
Very fast tests for GWAS
Specific functions to analyze and display the results
More efficient than the R package genetics

14 GenABEL: gwaa.data class

15 Exploring gwaa.data class objects
    library("GenABEL")
    data(ge03d2ex)
    # phenotype data
    summary(ge03d2ex@phdata)
R output
          id                 sex            age              dm2
     Length:136         Min.   :       Min.   :23.84   Min.   :
     Class :character   1st Qu.:       1st Qu.:        1st Qu.:
     Mode  :character   Median :       Median :48.71   Median :
                        Mean   :       Mean   :49.07   Mean   :
                        3rd Qu.:       3rd Qu.:        3rd Qu.:
                        Max.   :       Max.   :81.57   Max.   :
         height          weight          diet           bmi
     Min.   :150.2   Min.   :        Min.   :      Min.   :
     1st Qu.:        1st Qu.:        1st Qu.:      1st Qu.:24.56
     Median :169.4   Median :        Median :      Median :28.35
     Mean   :169.4   Mean   :        Mean   :      Mean   :
     3rd Qu.:        3rd Qu.:        3rd Qu.:      3rd Qu.:35.69
     Max.   :191.8   Max.   :        Max.   :      Max.   :59.83
     NA's   :  1.0   NA's   :  1.00                NA's   :

16 Exploring gwaa.data class objects
    library("GenABEL")
    data(ge03d2ex)
    # phenotype data
    summary(ge03d2ex@phdata)
    # number of people in study
    ge03d2ex@gtdata@nids
    # number of SNPs
    ge03d2ex@gtdata@nsnps
    # SNP names
    ge03d2ex@gtdata@snpnames[1:10]
    # chromosome labels
    ge03d2ex@gtdata@chromosome[1:10]
    # SNP map positions
    ge03d2ex@gtdata@map[1:10]
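The `@` operator in the calls above reads slots of an S4 object. The following sketch shows the same nested-slot pattern on two toy classes; `toyGwaa` and `toyGenotypes` are hypothetical names invented for this illustration and are not GenABEL classes.

```r
library(methods)

# Hypothetical toy classes that mimic nested phenotype/genotype slots.
setClass("toyGenotypes",
         representation(nids = "integer", nsnps = "integer",
                        snpnames = "character"))
setClass("toyGwaa",
         representation(phdata = "data.frame", gtdata = "toyGenotypes"))

toy <- new("toyGwaa",
           phdata = data.frame(id = c("p1", "p2", "p3"), dm2 = c(0, 1, 0)),
           gtdata = new("toyGenotypes", nids = 3L, nsnps = 2L,
                        snpnames = c("snp1", "snp2")))

toy@gtdata@nids            # number of people
toy@gtdata@snpnames[1:2]   # first SNP names
summary(toy@phdata$dm2)    # phenotype column from the phenotype slot
```

Keeping phenotypes in an ordinary data frame and genotypes in a separate slot is what lets the usual data-frame tools work on one part while the other uses a compact storage format.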

17 Descriptive statistics: phenotypes
    descriptives.trait(ge03d2ex)
R output
              No   Mean   SD
    id       136     NA   NA
    sex
    age
    dm2      (type 2 diabetes status)
    height
    weight
    diet
    bmi
By case-control status:
    descriptives.trait(ge03d2ex, by=ge03d2ex@phdata$dm2)
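A rough base-R analogue of such a trait summary (count of non-missing values, mean, SD, overall and by case-control status) might look like the sketch below. The data frame `ph` and the helper `describe` are hypothetical, and the values are made up.

```r
# Hypothetical phenotype table; values are invented for illustration.
ph <- data.frame(dm2 = c(0, 0, 0, 1, 1, 1),
                 age = c(40, 50, 60, 45, 55, NA),
                 bmi = c(22, 25, 28, 30, 33, 36))

# Number of non-missing values, mean and standard deviation of one trait.
describe <- function(x) {
  c(No = sum(!is.na(x)), Mean = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
}

traits <- c("age", "bmi")
overall <- t(sapply(ph[traits], describe))
by_status <- lapply(split(ph[traits], ph$dm2),
                    function(d) t(sapply(d, describe)))

print(overall)
print(by_status)
```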

18 Descriptive statistics: markers
    descriptives.marker(ge03d2ex)
R output
    $`Minor allele frequency distribution`
         X<=   <X<=   <X<=   <X<=0.2   X>0.2
    No
    Prop
    $`Distribution of number of SNPs out of HWE, at different alpha`
         X<=1e-04   X<=0.001   X<=0.01   X<=0.05   X>0.05
    No
    Prop
    $`Distribution of porportion of successful genotypes (per SNP)`
         X<=   <X<=   <X<=   <X<=0.99   X>0.99
    No
    Prop
    $`Distribution of porportion of successful genotypes (per person)`
         X<=   <X<=   <X<=   <X<=0.99   X>0.99
    No
    Prop
    $`Mean heterozygosity for a SNP`
    [1]
    $`Standard deviation of the mean heterozygosity for a SNP`
    [1]
    $`Mean heterozygosity for a person`
    [1]
    $`Standard deviation of mean heterozygosity for a person`
    [1]
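The marker summaries above are built from per-SNP quantities such as minor allele frequency (MAF) and call rate. A minimal sketch of how these can be computed from a 0/1/2 genotype matrix follows; the matrix `geno` and the bin boundaries are assumptions chosen for illustration, not values taken from GenABEL.

```r
# Hypothetical genotypes: rows = people, columns = SNPs,
# coded as the number of copies of allele B (0/1/2), NA = failed call.
geno <- cbind(snp1 = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
              snp2 = c(0, 1, 1, 2, 0, 1, 0, 2, 1, 0),
              snp3 = c(2, 2, 1, 2, 2, NA, 2, 2, 1, 2))

freq_b <- colMeans(geno, na.rm = TRUE) / 2   # frequency of allele B
maf <- pmin(freq_b, 1 - freq_b)              # minor allele frequency
call_rate <- colMeans(!is.na(geno))          # proportion of successful calls

# Assumed bin boundaries, for illustration only.
maf_bins <- cut(maf, breaks = c(0, 0.01, 0.05, 0.1, 0.2, 0.5),
                include.lowest = TRUE)
print(table(maf_bins))
print(call_rate)
```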

19 Test of Hardy-Weinberg equilibrium
    # Test of Hardy-Weinberg equilibrium in control group
    s<-summary(ge03d2ex@gtdata[(ge03d2ex@phdata$dm2 == 0),])
    pexcon<-s[,"Pexact"]
    estlambda(pexcon)
    # Test of Hardy-Weinberg equilibrium in case group
    s<-summary(ge03d2ex@gtdata[(ge03d2ex@phdata$dm2 == 1),])
    pexcas<-s[,"Pexact"]
    estlambda(pexcas)
R output (plots): Controls, Cases
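For intuition about what an HWE test measures at one SNP, the sketch below uses the 1-df chi-square approximation on genotype counts. This is a swapped-in technique for illustration: the slide's code reads exact-test p-values from the GenABEL summary, and the function `hwe_chisq` is a hypothetical helper, not part of GenABEL.

```r
# Chi-square test of Hardy-Weinberg equilibrium from genotype counts.
# n_aa, n_ab, n_bb: numbers of AA, AB and BB genotypes at one SNP.
hwe_chisq <- function(n_aa, n_ab, n_bb) {
  n <- n_aa + n_ab + n_bb
  p <- (2 * n_aa + n_ab) / (2 * n)                  # frequency of allele A
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)
  observed <- c(n_aa, n_ab, n_bb)
  stat <- sum((observed - expected)^2 / expected)
  list(statistic = stat,
       p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

in_hwe  <- hwe_chisq(25, 50, 25)   # counts exactly at HWE proportions
out_hwe <- hwe_chisq(50, 0, 50)    # no heterozygotes at all
print(in_hwe)
print(out_hwe)
```

A SNP with no heterozygotes, as in the second call, is the kind of pattern that often points to a genotyping problem rather than biology, which is why HWE is checked in controls during QC.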

20 Data checking: procedure
    # p.level=0: no marker is excluded for HWE deviation in this run
    qc1<-check.marker(ge03d2ex, p.level=0)
R output
    RUN
    markers and 134 people in total
    304 ( %) markers excluded as having low (< %) minor allele frequency
    36 ( %) markers excluded because of low (<95%) call rate
    0 (0%) markers excluded because they are out of HWE (P <0)
    1 ( %) people excluded because of low (<95%) call rate
    3 ( %) people excluded because too high autosomal heterozygosity (FDR <1%)
    Mean autosomal HET was (s.e ), people excluded had HET >=
    ( %) people excluded because of too high IBS (>=0.95)
    Mean IBS was (s.e ), as based on 2000 autosomal markers
    In total, 3653 ( %) markers passed all criteria
    In total, 129 ( %) people passed all criteria
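The call-rate and MAF filters reported in this output can be sketched in base R as below. This is only an illustration of the idea on simulated data: the matrix `geno`, the 95% call-rate cutoff and the 0.01 MAF cutoff are assumptions, and the heterozygosity and IBS checks are not reproduced.

```r
set.seed(1)
# Hypothetical genotypes (0/1/2 copies of allele B, NA = failed call).
geno <- matrix(rbinom(20 * 50, size = 2, prob = 0.3), nrow = 20,
               dimnames = list(paste0("id", 1:20), paste0("snp", 1:50)))
geno[1, 1:10] <- NA    # person id1 has many failed calls
geno[, 2] <- 0         # snp2 is monomorphic
geno[1:5, 3] <- NA     # snp3 has a low call rate

snp_call <- colMeans(!is.na(geno))           # call rate per SNP
id_call  <- rowMeans(!is.na(geno))           # call rate per person
freq_b   <- colMeans(geno, na.rm = TRUE) / 2
maf      <- pmin(freq_b, 1 - freq_b)

# Names of SNPs and people that pass the assumed thresholds.
snpok <- colnames(geno)[snp_call >= 0.95 & maf >= 0.01]
idok  <- rownames(geno)[id_call >= 0.95]
clean <- geno[idok, snpok]
dim(clean)
```

The two name vectors play the same role as the lists of passing SNPs and individuals returned by the QC procedure: subsetting by them yields the cleaned data set.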

21 Data checking: summary table
    summary(qc1)
R output
    $`Per-SNP fails statistics`
              NoCall NoMAF NoHWE Redundant Xsnpfail
    NoCall
    NoMAF         NA
    NoHWE         NA    NA
    Redundant     NA    NA    NA         0        0
    Xsnpfail      NA    NA    NA        NA        1
    $`Per-person fails statistics`
             IDnoCall HetFail IBSFail isfemale ismale
    IDnoCall
    HetFail        NA
    IBSFail        NA      NA
    isfemale       NA      NA      NA        2      0
    ismale         NA      NA      NA       NA      0

22 Data checking: output
The procedure returns the list of individuals (idok) and the list of SNPs (snpok) that passed all QC criteria. A clean dataset can then be obtained:
    data1<-ge03d2ex[qc1$idok, qc1$snpok]

23 Data checking: HW plots after cleaning
    … == 1),])
    pexcas1<-s1[,"Pexact"]
    estlambda(pexcas1)
R output (plots): After, Before

24 Finding genetic sub-structure
    # matrix of genomic kinship between all pairs of individuals
    data1.gkin <-ibs(data1[,data1@gtdata@chromosome!= "X"], weight="freq")
    # distance matrix
    data1.dist<-as.dist(0.5-data1.gkin)
    # use classical multidimensional scaling
    data1.mds<-cmdscale(data1.dist)
    # plot the first two components
    plot(data1.mds)
Annotation on the plot: Exclude these individuals
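The distance-then-scaling step can be tried on simulated data with base R alone. In this sketch the genotype matrix is hypothetical, and the scaled Manhattan distance (one minus the mean identity-by-state) is a simple stand-in for the 0.5 minus kinship distance used above, not the same quantity.

```r
set.seed(42)
# Hypothetical genotypes for two groups with different allele frequencies:
# 30 people in the main group and 5 in a second group.
n_snps <- 200
freq_a <- runif(n_snps, 0.1, 0.9)
freq_b <- runif(n_snps, 0.1, 0.9)
geno <- rbind(
  matrix(rbinom(30 * n_snps, 2, rep(freq_a, each = 30)), nrow = 30),
  matrix(rbinom(5 * n_snps, 2, rep(freq_b, each = 5)), nrow = 5))
rownames(geno) <- paste0("id", 1:35)

# Pairwise distance: 1 - mean identity-by-state (assumed stand-in).
ibs_dist <- dist(geno, method = "manhattan") / (2 * n_snps)

# Classical multidimensional scaling on the distance matrix.
mds <- cmdscale(ibs_dist, k = 2)
plot(mds, xlab = "MDS 1", ylab = "MDS 2")
```

With allele frequencies this different, the five people from the second group should appear as a separate cloud on the plot, which is the pattern the slide flags for exclusion.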

25 Remove outliers
    km<-kmeans(data1.mds, centers=2, nstart=1000)
    cl1<-names(which(km$cluster==1))
    cl2<-names(which(km$cluster==2))
    # cluster labels are arbitrary: check that cl1 is the main cluster before subsetting
    data2<-data1[cl1,]
Then repeat the QC analysis, allowing for HWE checks (using controls and excluding markers with FDR 0.2):
    qc2<-check.marker(data2, … ==0), fdr=0.2)
    summary(qc2)
R output
              NoCall NoMAF NoHWE Redundant Xsnpfail
    NoCall
    NoMAF         NA
    NoHWE         NA    NA
    Redundant     NA    NA    NA         0        0
    Xsnpfail      NA    NA    NA        NA        0
             IDnoCall HetFail IBSFail isfemale ismale
    IDnoCall
    HetFail        NA
    IBSFail        NA      NA
    isfemale       NA      NA      NA        0      0
    ismale         NA      NA      NA       NA      0
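Because k-means numbers its clusters arbitrarily, a safer way to keep the main group is to select the largest cluster rather than a fixed label. A minimal sketch on hypothetical MDS coordinates (the matrix `coords` is simulated, not the course data):

```r
set.seed(7)
# Hypothetical MDS coordinates: 30 people in a main cluster, 5 outliers.
coords <- rbind(matrix(rnorm(60, mean = 0,   sd = 0.01), ncol = 2),
                matrix(rnorm(10, mean = 0.2, sd = 0.01), ncol = 2))
rownames(coords) <- paste0("id", 1:35)

km <- kmeans(coords, centers = 2, nstart = 100)

# Pick the label of the largest cluster instead of assuming it is 1.
main_label <- as.integer(names(which.max(table(km$cluster))))
keep <- names(km$cluster)[km$cluster == main_label]
length(keep)
```

The vector `keep` can then be used to subset the data in the same way as the cluster name vector above.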

26 GWA scan: raw data
Scan of the raw data (before quality control) using a score test, as implemented in the qtscore() function.
    an0<-qtscore(dm2, ge03d2ex, trait="binomial")
    plot(an0)
    # add corrected p-values in green
    add.plot(an0, df="Pc1df", col="green")
R output (plot). Annotation on the plot: interesting results?
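The corrected p-values come from genomic control: the test statistics are divided by an inflation factor lambda before p-values are computed. The sketch below uses the median-based estimate of lambda on simulated statistics; this estimator is an assumption for illustration and may differ from the one GenABEL applies by default.

```r
set.seed(3)
# Hypothetical 1-df chi-square statistics, inflated by a factor of 1.2.
chi2 <- 1.2 * rchisq(5000, df = 1)

# Median-based inflation factor (assumed estimator).
lambda <- median(chi2) / qchisq(0.5, df = 1)
lambda <- max(lambda, 1)   # by convention, never correct for deflation

p_raw <- pchisq(chi2, df = 1, lower.tail = FALSE)
p_gc  <- pchisq(chi2 / lambda, df = 1, lower.tail = FALSE)

round(lambda, 2)
summary(p_gc - p_raw)      # corrected p-values are never smaller
```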

27 GWA scan: raw data
Scan of the raw data (before quality control) using a score test, as implemented in the qtscore() function.
    # descriptive table
    descriptives.scan(an0)
R output: Top 10 results
    Chromosome Position effB P1df Pc1df effAB effBB P2df
    (one row per SNP, rs identifiers; values not preserved)

28 GWA scan: cleaned data
    data2<-data2[qc2$idok, qc2$snpok]
    # plot
    an1<-qtscore(dm2, data2, trait="binomial")
    plot(an1)
    # add corrected p-values
    add.plot(an1, df="Pc1df", col="green")
R output (plot). Annotation on the plot: interesting results

29 Comparison of the two scans
    # compare with previous results: cleaned-data scan in green
    plot(an1, col="green")
    # overlay the raw-data scan in red
    add.plot(an0, col="red")
Plot legend: Clean data, Raw data. Annotation on the plot: false signal?

30 GWA scan: cleaned data
    # descriptive table
    descriptives.scan(an1)
Clean data
    Chromosome Position effB P1df Pc1df effAB effBB P2df
    (one row per SNP, rs identifiers; values not preserved)
Raw data
    Chromosome Position effB P1df Pc1df effAB effBB P2df
    (one row per SNP, rs identifiers; values not preserved)

31 GWA in presence of genetic stratification
Assess population structure
Account for pop. structure in the analysis
    # Assess pop. structure
    pop<-as.numeric(data1@phdata$id %in% cl1)
    pop
    # Stratified association
    data1.sa<-qtscore(dm2, data=data1, strata=pop)
    # plot the results and compare with the analysis removing the outliers
    plot(an1, cex=0.5, pch=19, ylim=c(1, 5))
    add.plot(data1.sa, col="green", cex=1.2)
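The idea of a stratified test, combining evidence within each population group instead of pooling everyone, can be illustrated with the Cochran-Mantel-Haenszel test from base R. This is a related, swapped-in technique rather than the stratified score test called above, and the allele counts are invented.

```r
# Hypothetical allele counts at one SNP: one 2x2 table per stratum,
# rows = allele (A, B), columns = status (control, case).
counts <- array(c(60, 40, 50, 50,    # stratum "main"
                  20, 10, 15, 15),   # stratum "outliers"
                dim = c(2, 2, 2),
                dimnames = list(allele  = c("A", "B"),
                                status  = c("control", "case"),
                                stratum = c("main", "outliers")))

cmh <- mantelhaen.test(counts)
cmh$p.value    # association test adjusted for stratum
cmh$estimate   # common odds ratio across strata
```

Here the per-stratum odds ratios are 1.5 and 2, and the test reports a single common estimate that weights the two strata by their size.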

32 GWA in presence of genetic stratification
Adjust both phenotypes and genotypes for possible stratification using principal component analysis (Price's method)
    data1.eg<-egscore(dm2, data=data1, kin=data1.gkin)
    plot(an1, cex=0.5, pch=19, ylim=c(1, 5))
    add.plot(data1.sa, col="green", cex=1.2)
    add.plot(data1.eg, col="red", cex=1.3)
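A simplified sketch of the principal-component adjustment: compute the leading components of the genotype matrix, regress them out of both the phenotype and each genotype, and test the residuals. The data are simulated, the number of components (2) is an assumption, and components are taken from a plain `prcomp` of the genotypes rather than from a kinship matrix as in the call above.

```r
set.seed(11)
n <- 100; m <- 300; k <- 2
pop <- rep(c(0, 1), times = c(60, 40))

# Hypothetical genotypes whose allele frequencies differ between groups.
freq <- cbind(runif(m, 0.1, 0.9), runif(m, 0.1, 0.9))
geno <- sapply(seq_len(m), function(j) rbinom(n, 2, freq[j, pop + 1]))

# Disease status depends on the group only, not on any SNP.
pheno <- rbinom(n, 1, ifelse(pop == 1, 0.9, 0.1))

# Leading principal components of the genotype matrix.
pcs <- prcomp(geno, center = TRUE, scale. = FALSE)$x[, 1:k]
adj <- function(v) resid(lm(v ~ pcs))   # remove the PC-explained part

pheno_adj <- adj(pheno)
stat_raw <- apply(geno, 2, function(g) n * cor(g, pheno)^2)
stat_adj <- apply(geno, 2,
                  function(g) (n - k - 1) * cor(adj(g), pheno_adj)^2)

# Median-based inflation before and after adjustment.
c(raw = median(stat_raw), adjusted = median(stat_adj)) / qchisq(0.5, df = 1)
```

Since no SNP is causal in this simulation, any inflation in the raw statistics is due to stratification alone, and the adjusted statistics should sit much closer to their null distribution.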

33 Other interesting features
Genetic data imputations
Meta-analysis of GWA scans
Analysis of selected regions
Conversion of PLINK files

34 Conclusion
GWAS is becoming a major area of research
New computational tools and statistical methods are needed
GenABEL is an interesting program, especially for easy data cleaning and display of results
PLINK has more features for statistical analysis but is not yet available in R for Windows!

35 Thank you!

GenABEL: an R package for Genome Wide Association Analysis Archana Bhardwaj

GenABEL: an R package for Genome Wide Association Analysis Archana Bhardwaj GenABEL: an R package for Genome Wide Association Analysis Archana Bhardwaj GBIO2 1 Outline R : conditional statements : if, else and for loop GeneABEL Genetic data QC GWA association analysis GBIO2 2

More information

Practical aspects of GWAS

Practical aspects of GWAS Practical aspects of GWAS GenABEL hands-on tutorial GBIO0009-1 (Oct 6 2015) 1 Table of contents Introduction to genetic statistical analysis in GWA Typical study designs / general idea Popular genetic

More information

H3A - Genome-Wide Association testing SOP

H3A - Genome-Wide Association testing SOP H3A - Genome-Wide Association testing SOP Introduction File format Strand errors Sample quality control Marker quality control Batch effects Population stratification Association testing Replication Meta

More information

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE : GENETIC DATA UPDATE April 30, 2014 Biomarker Network Meeting PAA Jessica Faul, Ph.D., M.P.H. Health and Retirement Study Survey Research Center Institute for Social Research University of Michigan HRS

More information

PLINK gplink Haploview

PLINK gplink Haploview PLINK gplink Haploview Whole genome association software tutorial Shaun Purcell Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA Broad Institute of Harvard & MIT, Cambridge,

More information

Genome-wide association studies (GWAS) Part 1

Genome-wide association studies (GWAS) Part 1 Genome-wide association studies (GWAS) Part 1 Matti Pirinen FIMM, University of Helsinki 03.12.2013, Kumpula Campus FIMM - Institiute for Molecular Medicine Finland www.fimm.fi Published Genome-Wide Associations

More information

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey Genome-Wide Association Studies Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey Introduction The next big advancement in the field of genetics after the Human Genome Project

More information

A genome wide association study of metabolic traits in human urine

A genome wide association study of metabolic traits in human urine Supplementary material for A genome wide association study of metabolic traits in human urine Suhre et al. CONTENTS SUPPLEMENTARY FIGURES Supplementary Figure 1: Regional association plots surrounding

More information

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls BMI/CS 776 www.biostat.wisc.edu/bmi776/ Mark Craven craven@biostat.wisc.edu Spring 2011 1. Understanding Human Genetic Variation!

More information

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros DNA Collection Data Quality Control Suzanne M. Leal Baylor College of Medicine sleal@bcm.edu Copyrighted S.M. Leal 2016 Blood samples For unlimited supply of DNA Transformed cell lines Buccal Swabs Small

More information

Genome Wide Association Studies

Genome Wide Association Studies Genome Wide Association Studies Liz Speliotes M.D., Ph.D., M.P.H. Instructor of Medicine and Gastroenterology Massachusetts General Hospital Harvard Medical School Fellow Broad Institute Outline Introduction

More information

Understanding genetic association studies. Peter Kamerman

Understanding genetic association studies. Peter Kamerman Understanding genetic association studies Peter Kamerman Outline CONCEPTS UNDERLYING GENETIC ASSOCIATION STUDIES Genetic concepts: - Underlying principals - Genetic variants - Linkage disequilibrium -

More information

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics S G ection ON tatistical enetics Design and Analysis of Genetic Association Studies Hemant K Tiwari, Ph.D. Professor & Head Section on Statistical Genetics Department of Biostatistics School of Public

More information

Genome wide association studies. How do we know there is genetics involved in the disease susceptibility?

Genome wide association studies. How do we know there is genetics involved in the disease susceptibility? Outline Genome wide association studies Helga Westerlind, PhD About GWAS/Complex diseases How to GWAS Imputation What is a genome wide association study? Why are we doing them? How do we know there is

More information

Genotype quality control with plinkqc Hannah Meyer

Genotype quality control with plinkqc Hannah Meyer Genotype quality control with plinkqc Hannah Meyer 219-3-1 Contents Introduction 1 Per-individual quality control....................................... 2 Per-marker quality control.........................................

More information

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls BMI/CS 776 www.biostat.wisc.edu/bmi776/ Colin Dewey cdewey@biostat.wisc.edu Spring 2012 1. Understanding Human Genetic Variation

More information

From genome-wide association studies to disease relationships. Liqing Zhang Department of Computer Science Virginia Tech

From genome-wide association studies to disease relationships. Liqing Zhang Department of Computer Science Virginia Tech From genome-wide association studies to disease relationships Liqing Zhang Department of Computer Science Virginia Tech Types of variation in the human genome ( polymorphisms SNPs (single nucleotide Insertions

More information

Introduction to Genome Wide Association Studies 2015 Sydney Brenner Institute for Molecular Bioscience Shaun Aron

Introduction to Genome Wide Association Studies 2015 Sydney Brenner Institute for Molecular Bioscience Shaun Aron Introduction to Genome Wide Association Studies 2015 Sydney Brenner Institute for Molecular Bioscience Shaun Aron Many sources of technical bias in a genotyping experiment DNA sample quality and handling

More information

Supplementary Figures

Supplementary Figures 1 Supplementary Figures exm26442 2.40 2.20 2.00 1.80 Norm Intensity (B) 1.60 1.40 1.20 1 0.80 0.60 0.40 0.20 2 0-0.20 0 0.20 0.40 0.60 0.80 1 1.20 1.40 1.60 1.80 2.00 2.20 2.40 2.60 2.80 Norm Intensity

More information

Genomics Resources in WHI. WHI ( ) Extension Study Steering Committee Meeting Seattle, WA May 05-06, 2011

Genomics Resources in WHI. WHI ( ) Extension Study Steering Committee Meeting Seattle, WA May 05-06, 2011 Genomics Resources in WHI WHI (2010-2015) Extension Study Steering Committee Meeting Seattle, WA May 05-06, 2011 WHI Genomic Resources in dbgap Outcomes and traits in AA and Hispanics GWAS-SHARe Sequencing-ESP

More information

Genome-Wide Association Studies (GWAS): Computational Them

Genome-Wide Association Studies (GWAS): Computational Them Genome-Wide Association Studies (GWAS): Computational Themes and Caveats October 14, 2014 Many issues in Genomewide Association Studies We show that even for the simplest analysis, there is little consensus

More information

Genome-wide analyses in admixed populations: Challenges and opportunities

Genome-wide analyses in admixed populations: Challenges and opportunities Genome-wide analyses in admixed populations: Challenges and opportunities E-mail: esteban.parra@utoronto.ca Esteban J. Parra, Ph.D. Admixed populations: an invaluable resource to study the genetics of

More information

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011 EPIB 668 Genetic association studies Aurélie LABBE - Winter 2011 1 / 71 OUTLINE Linkage vs association Linkage disequilibrium Case control studies Family-based association 2 / 71 RECAP ON GENETIC VARIANTS

More information

SUPPLEMENTARY METHODS AND RESULTS

SUPPLEMENTARY METHODS AND RESULTS SUPPLEMENTARY METHODS AND RESULTS With: Genetic variation in the KIF1B locus influences susceptibility to multiple sclerosis Yurii S. Aulchenko 1,8, Ilse A. Hoppenbrouwers 2,8, Sreeram V. Ramagopalan 3,

More information

Analysis of genome-wide genotype data

Analysis of genome-wide genotype data Analysis of genome-wide genotype data Acknowledgement: Several slides based on a lecture course given by Jonathan Marchini & Chris Spencer, Cape Town 2007 Introduction & definitions - Allele: A version

More information

Using the Association Workflow in Partek Genomics Suite

Using the Association Workflow in Partek Genomics Suite Using the Association Workflow in Partek Genomics Suite This user guide will illustrate the use of the Association workflow in Partek Genomics Suite (PGS) and discuss the basic functions available within

More information

Familial Breast Cancer

Familial Breast Cancer Familial Breast Cancer SEARCHING THE GENES Samuel J. Haryono 1 Issues in HSBOC Spectrum of mutation testing in familial breast cancer Variant of BRCA vs mutation of BRCA Clinical guideline and management

More information

Single Nucleotide Polymorphisms (SNPs)

Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Polymorphisms (SNPs) Sequence variations Single nucleotide polymorphisms Insertions/deletions Copy number variations (large: >1kb) Variable (short) number tandem repeats Single Nucleotide

More information

emerge-ii site report Vanderbilt

emerge-ii site report Vanderbilt emerge-ii site report Vanderbilt 29 June 2015 Vanderbilt activities emerge II PGx implementation locally and emerge-pgx SCN5A/KCNH2 project provider attitudes Phenotype contributions Methods development

More information

Statistical challenges to genome-wide association study

Statistical challenges to genome-wide association study 1 Statistical challenges to genome-wide association study Naoyuki Kamatani, M.D., Ph.D. 1. Director and Professor, Institute of Rheumatology, Tokyo Women s Medical University 2. Director, Medical Informatics

More information

SNPTransformer: A Lightweight Toolkit for Genome-Wide Association Studies

SNPTransformer: A Lightweight Toolkit for Genome-Wide Association Studies GENOMICS PROTEOMICS & BIOINFORMATICS www.sciencedirect.com/science/journal/16720229 Application Note SNPTransformer: A Lightweight Toolkit for Genome-Wide Association Studies Changzheng Dong * School of

More information

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary

Personal Genomics Platform White Paper Last Updated November 15, Executive Summary Executive Summary Helix is a personal genomics platform company with a simple but powerful mission: to empower every person to improve their life through DNA. Our platform includes saliva sample collection,

More information

Population stratification. Background & PLINK practical

Population stratification. Background & PLINK practical Population stratification Background & PLINK practical Variation between, within populations Any two humans differ ~0.1% of their genome (1 in ~1000bp) ~8% of this variation is accounted for by the major

More information

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Genotype calling Genotyping methods for Affymetrix arrays Genotyping

More information

SUPPLEMENTARY INFORMATION. Common variants in TMPRSS6 are associated with iron status and erythrocyte volume

SUPPLEMENTARY INFORMATION. Common variants in TMPRSS6 are associated with iron status and erythrocyte volume SUPPLEMENTARY INFORMATION Common variants in TMPRSS6 are associated with iron status and erythrocyte volume Beben Benyamin, Manuel A. R. Ferreira, Gonneke Willemsen, Scott Gordon, Rita P. S. Middelberg,

More information

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical. S SG ection ON tatistical enetics Metabolomics meets Genomics Hemant K. Tiwari, Ph.D. Professor and Head Section on Statistical Genetics Department of Biostatistics School of Public Health Metabolomics:

More information

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill Introduction to Add Health GWAS Data Part I Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill Outline Introduction to genome-wide association studies (GWAS) Research

More information

SNPs - GWAS - eqtls. Sebastian Schmeier

SNPs - GWAS - eqtls. Sebastian Schmeier SNPs - GWAS - eqtls s.schmeier@gmail.com http://sschmeier.github.io/bioinf-workshop/ 17.08.2015 Overview Single nucleotide polymorphism (refresh) SNPs effect on genes (refresh) Genome-wide association

More information

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Traditional QTL approach Uses standard bi-parental mapping populations o F2 or RI These have a limited number of

More information

SNPassoc: an R package to perform whole genome association studies

SNPassoc: an R package to perform whole genome association studies SNPassoc: an R package to perform whole genome association studies Juan R González, Lluís Armengol, Xavier Solé, Elisabet Guinó, Josep M Mercader, Xavier Estivill, Víctor Moreno November 16, 2006 Contents

More information

Nature Genetics: doi: /ng.3143

Nature Genetics: doi: /ng.3143 Supplementary Figure 1 Quantile-quantile plot of the association P values obtained in the discovery sample collection. The two clear outlying SNPs indicated for follow-up assessment are rs6841458 and rs7765379.

More information

Derrek Paul Hibar

Derrek Paul Hibar Derrek Paul Hibar derrek.hibar@ini.usc.edu Obtain the ADNI Genetic Data Quality Control Procedures Missingness Testing for relatedness Minor allele frequency (MAF) Hardy-Weinberg Equilibrium (HWE) Testing

More information

Population and Statistical Genetics including Hardy-Weinberg Equilibrium (HWE) and Genetic Drift

Population and Statistical Genetics including Hardy-Weinberg Equilibrium (HWE) and Genetic Drift Population and Statistical Genetics including Hardy-Weinberg Equilibrium (HWE) and Genetic Drift Heather J. Cordell Professor of Statistical Genetics Institute of Genetic Medicine Newcastle University,

More information

Department of Psychology, Ben Gurion University of the Negev, Beer Sheva, Israel;

Department of Psychology, Ben Gurion University of the Negev, Beer Sheva, Israel; Polygenic Selection, Polygenic Scores, Spatial Autocorrelation and Correlated Allele Frequencies. Can We Model Polygenic Selection on Intellectual Abilities? Davide Piffer Department of Psychology, Ben

More information

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by author] Statistical methods: All hypothesis tests were conducted using two-sided P-values

More information

1b. How do people differ genetically?

1b. How do people differ genetically? 1b. How do people differ genetically? Define: a. Gene b. Locus c. Allele Where would a locus be if it was named "9q34.2" Terminology Gene - Sequence of DNA that code for a particular product Locus - Site

More information

Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis

Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis 10/7/15: ASHG 015 Po-Ru Loh Harvard T.H. Chan School of Public Health Heritability

More information

Lecture 3: Introduction to the PLINK Software. Summer Institute in Statistical Genetics 2015

Lecture 3: Introduction to the PLINK Software. Summer Institute in Statistical Genetics 2015 Lecture 3: Introduction to the PLINK Software Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 PLINK Overview PLINK is a free, open-source whole genome association analysis

More information

Lecture 3: Introduction to the PLINK Software. Summer Institute in Statistical Genetics 2017

Lecture 3: Introduction to the PLINK Software. Summer Institute in Statistical Genetics 2017 Lecture 3: Introduction to the PLINK Software Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 20 PLINK Overview PLINK is a free, open-source whole genome

More information

Papers for 11 September

Papers for 11 September Papers for 11 September v Kreitman M (1983) Nucleotide polymorphism at the alcohol-dehydrogenase locus of Drosophila melanogaster. Nature 304, 412-417. v Hishimoto et al. (2010) Alcohol and aldehyde dehydrogenase

More information

Topics in Statistical Genetics

Topics in Statistical Genetics Topics in Statistical Genetics INSIGHT Bioinformatics Webinar 2 August 22 nd 2018 Presented by Cavan Reilly, Ph.D. & Brad Sherman, M.S. 1 Recap of webinar 1 concepts DNA is used to make proteins and proteins

More information

Supplementary Information. Werner Koch, Petra Hoppmann, Jakob C. Mueller, Albert Schömig & Adnan Kastrati

Supplementary Information. Werner Koch, Petra Hoppmann, Jakob C. Mueller, Albert Schömig & Adnan Kastrati Supplementary Information Werner Koch, Petra Hoppmann, Jakob C. Mueller, Albert Schömig & Adnan Kastrati The Supplementary Information has the following sections in order: 1. Supplementary Methods 2. Supplementary

More information

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Mark J. Rieder Department of Genome Sciences mrieder@u.washington washington.edu Epidemiology Studies Cohort Outcome Model to fit/explain

More information

Global Screening Array (GSA)

Global Screening Array (GSA) Technical overview - Infinium Global Screening Array (GSA) with optional Multi-disease drop in (MD) The Infinium Global Screening Array (GSA) combines a highly optimized, universal genome-wide backbone,

More information

Module 2: Introduction to PLINK and Quality Control

Module 2: Introduction to PLINK and Quality Control Module 2: Introduction to PLINK and Quality Control 1 Introduction to PLINK 2 Quality Control 1 Introduction to PLINK 2 Quality Control Single Nucleotide Polymorphism (SNP) A SNP (pronounced snip) is a

More information

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer Multi-SNP Models for Fine-Mapping Studies: Application to an association study of the Kallikrein Region and Prostate Cancer November 11, 2014 Contents Background 1 Background 2 3 4 5 6 Study Motivation

More information

GENOME WIDE ASSOCIATION STUDY OF INSECT BITE HYPERSENSITIVITY IN TWO POPULATION OF ICELANDIC HORSES

GENOME WIDE ASSOCIATION STUDY OF INSECT BITE HYPERSENSITIVITY IN TWO POPULATION OF ICELANDIC HORSES GENOME WIDE ASSOCIATION STUDY OF INSECT BITE HYPERSENSITIVITY IN TWO POPULATION OF ICELANDIC HORSES Merina Shrestha, Anouk Schurink, Susanne Eriksson, Lisa Andersson, Tomas Bergström, Bart Ducro, Gabriella

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ejhg.2015.

van Loon, J., Dehghan, A., Weihong, T., Trompet, S., McArdle, W. L., Asselbergs, F. F. W.,... O'Donnell, C. (2016). Genome-wide association studies identify genetic loci for low von Willebrand factor levels.

Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis

correction notice Nat. Genet. 45, 613-620 (2013); published online 14 April 2013; corrected online 1 October 2013

General aspects of genome-wide association studies

Abstract number 20201 Session 04 Correctly reporting statistical genetics results in the genomic era Pekka Uimari University of Helsinki Dept. of Agricultural

Linking Genetic Variation to Important Phenotypes

BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY

E. ZEGGINI and J.L. ASIMIT Wellcome Trust Sanger Institute, Hinxton, CB10 1HH,

Axiom Biobank Genotyping Solution

The power of discovery is in the design. GWAS has evolved: why and how? More than 2,000 genetic loci have been

Computational Workflows for Genome-Wide Association Study: I

Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases

Linkage Disequilibrium

Why do we care about linkage disequilibrium? Determines the extent to which association mapping can be used in a species o Long distance LD Mapping at the tens of kilobase level

Nutrigenomics and nutrigenetics are they the keys for healthy nutrition?

Maria Koziołkiewicz Faculty of Biotechnology and Food Sciences, Technical University of Lodz, Lodz, Poland Basic definitions Nutrigenomics

Algorithms for Genetics: Introduction, and sources of variation

Scribe: David Dean Instructor: Vineet Bafna 1 Terms Genotype: the genetic makeup of an individual. For example, we may refer to an individual

Office Hours. We will try to find a time

If you haven't done so yet, please mark times when you are available at: https://tinyurl.com/666-office-hours Thanks! Hardy Weinberg Equilibrium Biostatistics 666

Supplementary Figure 1. Study design of a multi-stage GWAS of gout.

Supplementary Figure 2. Plot of the first two principal components from the analysis of the genome-wide study (after QC) combined with

SAC review Haplotype mapping in human disease

10.1576/toag.11.4.277.27532 http://onlinetog.org Author Linda Morgan Key content: Many obstetric and gynaecological disorders result from complex interactions between

Supplementary Note: Detecting population structure in rare variant data

Inferring ancestry from genetic data is a common problem in both population and medical genetic studies, and many methods exist to

Genetic Association Analysis with R Dr. Jing Hua Zhao

MRC Epidemiology Unit & Institute of Metabolic Science, Addenbrooke's Hospital, Cambridge CB2 0QQ, UK http://www.mrc-epid.cam.ac.uk/~jinghua.zhao jinghua.zhao@mrc-epid.cam.ac.uk

arXiv:1407.8259v1 [stat.AP] 31 Jul 2014

Fast Genome-Wide QTL Analysis Using MENDEL Hua Zhou Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Email: hua_zhou@ncsu.edu Tao

Whole Genome Sequencing. Biostatistics 666

Genomewide Association Studies Survey 500,000 SNPs in a large sample An effective way to skim the genome and find common variants associated with a trait of interest

Human Genetics and Gene Mapping of Complex Traits

Advanced Genetics, Spring 2018 Human Genetics Series Thursday 4/5/18 Nancy L. Saccone, Ph.D. Dept of Genetics nlims@genetics.wustl.edu / 314-747-3263 What

B I O I N F O R M A T I C S

Bioinformatics LECTURE 3-16 Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be Bioinformatics LECTURE

Polygenic Influences on Boys & Girls Pubertal Timing & Tempo. Gregor Horvath, Valerie Knopik, Kristine Marceau Purdue University

Timing & Tempo of Puberty Varies by individual (Marceau et al., 2011) Risk

Core Resources Working Group Report. Opportunities for Investigator Engagement

Goals of Core Resource Working Group Initial purpose was to explore intervention effects in the 4 clinical trials Extend definition

5/18/2017. Genotypic, phenotypic or allelic frequencies each sum to 1. Changes in allele frequencies determine gene pool composition over generations

Topics How to track evolution allele frequencies Hardy Weinberg principle applications Requirements for genetic equilibrium Types of natural selection Population genetic polymorphism in populations, pp.

Genetics and Bioinformatics

Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be Lecture 3: Genome-wide Association Studies 1 Setting

Association studies (Linkage disequilibrium)

Positional cloning: statistical approaches to gene mapping, i.e. locating genes on the genome Linkage analysis Association studies (Linkage disequilibrium) Linkage analysis Uses a genetic marker map (a

Cross Haplotype Sharing Statistic: Haplotype length based method for whole genome association testing

André R. de Vries a, Ilja M. Nolte b, Geert T. Spijker c, Dumitru Brinza d, Alexander Zelikovsky d,

Data quality control in genetic case-control association studies

Carl A Anderson 1,2, Fredrik H Pettersson 1, Geraldine M Clarke 1, Lon R Cardon 3, Andrew P Morris 1 & Krina T Zondervan 1 1 Genetic and

Downloaded from:

Dudbridge, F (2013) Power and predictive accuracy of polygenic risk scores. PLoS Genetics, 9 (3). e1003348. ISSN 1553-7390 DOI: https://doi.org/10.1371/journal.pgen.1003348 Downloaded from: http://researchonline.lshtm.ac.uk/74877/

linkage signal sufficiently to identify a causative gene. GWA studies build on the valuable lessons learned from candidate gene and family linkage stu

SPECIAL COMMUNICATION How to Interpret a Genome-wide Association Study Thomas A. Pearson, MD, MPH, PhD Teri A. Manolio, MD, PhD Genome-wide association (GWA) studies use high-throughput genotyping technologies

Concepts and relevance of genome-wide association studies

Science Progress (2016), 99(1), 59-67 Paper 1500149 doi:10.3184/003685016x14558068452913 ANDREAS SCHERER and G. BRYCE CHRISTENSEN Dr Andreas Scherer

Reviewers' comments: Reviewer #1 (Remarks to the Author):

This is an interesting paper and a demonstration that diversity in the allelic spectrum, such as those in founder populations, can be leveraged

Prediction and Meta-Analysis

May 13, 2015 Greta Linse Peterson Director of Product Management & Quality Questions during the presentation Use the Questions pane in your GoToWebinar window Golden About

Introduction to Statistical Genetics: emphasis on Genetic Association Studies

Lisa J. Strug, PhD Guest Lecturer Biostatistics Laboratory Course (CHL5207/8) March 5, 2015 Gene Mapping in the News Study Finds Gene

Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip

Chris C. A. Spencer, Zhan Su, Peter Donnelly, Jonathan Marchini* Department of Statistics, University of Oxford, Oxford, United

OVERVIEW OF GOALS EXAMPLE DATASETS AND SOFTWARE

PLINK WGAS Practical Exercise; March 2009 (http://pngu.mgh.harvard.edu/purcell/plink/) Shaun Purcell (shaun@pngu.mgh.harvard.edu) The first part of this tutorial can be approached in

Package snpready. April 11, 2018

Version 0.9.6 Date 2018-04-11 Title Preparing Genotypic Datasets in Order to Run Genomic Analysis Three functions to clean, summarize and prepare genomic datasets to Genome

BIOINFORMATICS ORIGINAL PAPER

Vol. 25 no. 4 2009, pages 497-503 doi:10.1093/bioinformatics/btn641 Genetics and population analysis ATOM: a powerful gene-based association test by combining optimally weighted

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Haplotype testing and Minimum GWAS analysis steps Jason Mezey jgm45@cornell.edu April 17, 2017 (T) 8:40-9:55 Announcements Project

Human Genetics and Gene Mapping of Complex Traits

Advanced Genetics, Spring 2015 Human Genetics Series Thursday 4/02/15 Nancy L. Saccone, nlims@genetics.wustl.edu ancestral chromosome present day chromosomes:

BTRY 7210: Topics in Quantitative Genomics and Genetics

Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu January 29, 2015 Why you're here

Author's response to reviews

Title: A pooling-based genome-wide analysis identifies new potential candidate genes for atopy in the European Community Respiratory Health Survey (ECRHS) Authors: Francesc

Statistical Tools for Predicting Ancestry from Genetic Data

Timothy Thornton Department of Biostatistics University of Washington March 1, 2015 Basic Genetic Terminology A gene is the most fundamental