Genome-wide association studies (GWAS) Part 1

Matti Pirinen, FIMM, University of Helsinki. 03.12.2013, Kumpula Campus. FIMM - Institute for Molecular Medicine Finland, www.fimm.fi

Published Genome-Wide Associations through 12/2012. Published GWA at p ≤ 5e-8 for 17 trait categories. NHGRI GWA Catalog: www.genome.gov/gwastudies, www.ebi.ac.uk/fgpt/gwas/

Contents: 1. Concepts, Rationale 2. Technology and Data 3. Examples and criticism 4. Applications

What is an association? [Figure panels: ideal case, typical case]

Concepts. Individuals (humans, mice, flies, plants). Phenotypes, traits: quantitative (height, cholesterol levels) or discrete (case-control, Parkinson's disease). EXPLAIN: understanding, mechanisms, therapeutics. PREDICT: intervention, agriculture, no need to understand.

Y = μ + G + E + (G×E), where Y is the phenotype and μ the population mean. Environment (E): chemicals, temperature, food, physical activity... Genetics (G): DNA (A, C, G, T), 3e+9 bases, 22 chrs + X + Y, diploid, meiosis. Genes, proteins: about 2e+4 genes, 1-2 % of DNA. Single nucleotide polymorphisms (SNPs). G×E is not covered in this course.

Transmission of genomes. From parents to offspring with recombination and mutation. Variation at two physically close genetic loci tends to be statistically correlated.

Linkage disequilibrium (LD). Non-independence of alleles at (two) SNPs in a population. SNPs in LD tag each other in a population sample: knowing the allele at locus 1 gives some information about the allele at locus 2. This is the basis for gene mapping with a SNP panel.

Two loci, A/a and B/b, with allele frequencies A: 50%, a: 50%, B: 50%, b: 50%. Haplotype frequencies in three examples:

Haplotype   Example 1   Example 2   Example 3
A-B         0.25        0.3         0.5
A-b         0.25        0.2         0
a-B         0.25        0.2         0
a-b         0.25        0.3         0.5
r^2         0           0.13        1
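The r^2 values on this slide can be checked from the haplotype frequencies with the standard definition D = p(A-B) - p(A)p(B) and r^2 = D^2 / (p(A)p(a)p(B)p(b)). The sketch below is an illustration and is not taken from the lecture; the function name ld_r2 is hypothetical. With the middle set of frequencies as transcribed (0.3, 0.2, 0.2, 0.3) the formula gives 0.04 rather than 0.13. A value of 0.13 would correspond to frequencies of about 0.34 and 0.16, so the displayed frequencies may have been rounded.

```python
def ld_r2(p_AB, p_Ab, p_aB, p_ab):
    """Squared correlation r^2 between two biallelic loci, from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2
    d = p_AB - p_A * p_B       # LD coefficient D
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# The three sets of haplotype frequencies from the slide (A-B, A-b, a-B, a-b).
examples = [
    (0.25, 0.25, 0.25, 0.25),
    (0.3, 0.2, 0.2, 0.3),
    (0.5, 0.0, 0.0, 0.5),
]
for freqs in examples:
    print(freqs, round(ld_r2(*freqs), 3))
```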

1000 genomes project

Chr 2 pos 136608646 From 1000 Genomes database

Does genetics affect the trait? Do closer relatives have more similar phenotypes (on average)? Twin studies compare monozygotic twins with dizygotic twins. Environment can be a confounder in humans. Recently, the same question has been asked with unrelated individuals (Yang 2010, Nat Gen), where environment is not an (obvious) confounder.

Variance component analysis. Decompose trait variance into genetic and environmental components (Y = G + E). For height in 14,500 Finnish samples, the genetic component explains 52% of the variation. Heritability: the proportion of variance explained by genetics.
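As a toy illustration of the definition of heritability, the sketch below simulates Y = G + E with independent normal components and reports Var(G) / Var(Y). This is an assumption-laden simulation, not the variance component method itself: in real data G is not observed and has to be estimated, for example from relatedness between individuals. The function name, sample size and seed are hypothetical; the 0.52 target is taken from the height example above.

```python
import random
import statistics


def simulate_heritability(n=20000, var_g=0.52, var_e=0.48, seed=1):
    """Simulate Y = G + E and return the realised Var(G) / Var(Y)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]  # genetic component
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]  # environmental component
    y = [gi + ei for gi, ei in zip(g, e)]                 # phenotype
    return statistics.pvariance(g) / statistics.pvariance(y)


h2 = simulate_heritability()
print(f"simulated heritability: {h2:.2f}")
```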

Gene mapping. Once we know there is a genetic component to the trait, we want to find the relevant positions (loci; singular: locus) on the genome affecting the trait. Two approaches: linkage analysis in pedigrees and association analysis in populations.

Linkage analysis: trace whether some segments of the genome are co-transmitted with the disease. (Palles et al. Nat Gen 2013)

Linkage analysis. Families: does the trait correlate with genetic sharing at some parts of the genome? Parametric linkage analysis in pedigrees; affected sib-pairs (non-parametric). The need for families restricts sample size and thus the size of genetic effects that can be found. Localisation is coarse since large blocks of genome are linked in close relatives. Sequence data and annotations of variants help.

Collect a population sample and test whether trait is associated with any genetic variants

Association analysis. If a large panel of SNPs can be genotyped, association between each SNP and the trait can be tested. Role of LD (HapMap project, 1000 Genomes). Only samples from a (homogeneous) population (not families) are needed. Genome coverage and localisation depend on the number of SNPs and LD in the population.
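One simple way to test a single SNP against a case-control phenotype is an allelic chi-square test on the 2x2 table of allele counts. The lecture does not say which test it uses, so the sketch below is only one possible choice, and the function name and genotype counts are made up for illustration. In a GWAS the same test would be repeated for every SNP and the p-values compared against a genome-wide threshold such as 5e-8.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """Allelic 2x2 chi-square test (1 df).

    Each argument is a tuple of genotype counts (n_AA, n_Aa, n_aa).
    Returns (chi2 statistic, p-value).
    """
    def allele_counts(counts):
        n_AA, n_Aa, n_aa = counts
        return 2 * n_AA + n_Aa, n_Aa + 2 * n_aa   # counts of A and a alleles

    a_case, b_case = allele_counts(case_counts)
    a_ctrl, b_ctrl = allele_counts(control_counts)
    n = a_case + b_case + a_ctrl + b_ctrl
    denom = ((a_case + b_case) * (a_ctrl + b_ctrl)
             * (a_case + a_ctrl) * (b_case + b_ctrl))
    if denom == 0:
        raise ValueError("test undefined: empty group or monomorphic SNP")
    chi2 = n * (a_case * b_ctrl - b_case * a_ctrl) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up genotype counts for one SNP in 1,000 cases and 1,000 controls.
chi2, p = allelic_chi2((300, 500, 200), (250, 500, 250))
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```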

Risch: Searching for genetic determinants in the new millennium. Nature, June 2000

Genotyping. Current human SNP chips contain probes for several million SNPs. Price ~100 euros/sample. Figure from Steven M. Carr, http://www.mun.ca/biology/scarr/dna_chips.html

Genotype calling OptiCall Shah et al.

Output from genotyping. [Figure panels: GOOD!; ERROR, clustering algorithm; ERROR, monomorphic; ERROR, structural variant. From D.Phil thesis of Damjan Vukcevic, Oxford, 2009]

Quality control. Remove individuals and SNPs that show low quality. Individuals: sex discrepancies, missingness, heterozygosity, relatedness, ancestry. SNPs: missingness, deviation from Hardy-Weinberg equilibrium, low minor allele frequency.

[Figure: X intensity by sex]

Relatedness. [Figure axes: genome, pairs of individuals. Colour: number of chromosomes identical by descent (0: gray, 1: blue, 2: red)]

Population structure

QC genotype calls. The optimal approach is either to model the calling errors or to look at all cluster plots. SNP filtering is then a short cut. The level of SNP filtering is therefore a trade-off.

Minor allele frequency. Genotype calling algorithms like to model the extra clusters.

Hardy-Weinberg. Clustering errors or putative structural variation often distort HW equilibrium.
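A deviation from Hardy-Weinberg equilibrium can be quantified by comparing the observed genotype counts with the counts expected from the allele frequency (p^2, 2pq, q^2). The sketch below uses a chi-square goodness-of-fit test with 1 degree of freedom. This choice is an assumption for illustration: the lecture does not name a test, and exact tests are often preferred in practice, especially for rare alleles. The function name and example counts are hypothetical.

```python
import math


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Returns (chi2 statistic, p-value) for the observed genotype counts.
    """
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    # Skip classes with zero expected count (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


print(hwe_chi2(250, 500, 250))   # counts that match HWE proportions exactly
print(hwe_chi2(400, 200, 400))   # strong heterozygote deficit
```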

Missingness. High missingness is also indicative of clustering failure or poor cluster resolution.

Testing for association. [Figure sequence: association results as SNP filters are added. Share of SNPs remaining after each step:]

100% of SNPs: no filtering
80.69% of SNPs: MAF > 1%
78.36% of SNPs: MAF > 1% & info > 0.975
78.36% of SNPs: MAF > 1% & info > 0.975 & HW < 1e-20
77.92% of SNPs: MAF > 1% & info > 0.975 & HW < 1e-20 & miss < 2%
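The successive filters can be sketched as a chain of keep-rules applied to per-SNP summary statistics. Everything in the example is hypothetical: the field names, the five SNP records and the helper apply_filters are made up for illustration. The thresholds are the ones on the slides, and reading "HW < 1e-20" as "exclude SNPs whose Hardy-Weinberg p-value is below 1e-20" is an assumption.

```python
# Hypothetical per-SNP QC summaries (values invented for illustration).
snps = [
    {"id": "snp1", "maf": 0.250, "info": 0.990, "hwe_p": 0.40, "missing": 0.001},
    {"id": "snp2", "maf": 0.004, "info": 0.990, "hwe_p": 0.70, "missing": 0.001},
    {"id": "snp3", "maf": 0.100, "info": 0.900, "hwe_p": 0.20, "missing": 0.002},
    {"id": "snp4", "maf": 0.300, "info": 0.990, "hwe_p": 1e-30, "missing": 0.003},
    {"id": "snp5", "maf": 0.150, "info": 0.980, "hwe_p": 0.50, "missing": 0.050},
]

# Each filter is (label, rule); a SNP is kept when the rule returns True.
FILTERS = [
    ("MAF > 1%", lambda s: s["maf"] > 0.01),
    ("info > 0.975", lambda s: s["info"] > 0.975),
    ("HWE p-value >= 1e-20", lambda s: s["hwe_p"] >= 1e-20),
    ("missingness < 2%", lambda s: s["missing"] < 0.02),
]


def apply_filters(snps, filters):
    """Apply filters in order; return survivors and the fraction kept after each step."""
    kept = list(snps)
    fractions = []
    for label, rule in filters:
        kept = [s for s in kept if rule(s)]
        fractions.append((label, len(kept) / len(snps)))
    return kept, fractions


passed, fractions = apply_filters(snps, FILTERS)
for label, fraction in fractions:
    print(f"after {label}: {fraction:.0%} of SNPs remain")
print("passing SNPs:", [s["id"] for s in passed])
```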

Given an individual's disease status, their genotype at a SNP is independent of, and identically distributed to, other samples, conditional on the underlying genotype frequencies in cases and controls. Related QC steps: fingerprint markers, gender checks; genotype QC, intensity outlier; duplicate checks, relatedness; population structure analysis.

A good Manhattan plot. Wellcome Trust Case Control Consortium, Crohn's disease, Nature 2007. Shows signals supported by many neighboring SNPs.

A retracted paper. Sebastiani et al. Genetic signatures of exceptional longevity in humans. Science, July 2010. Was retracted in July 2011 because QC had not been done properly!

Principal component analysis (PCA). Dimension reduction technique in statistics. Represents high-dimensional data points in a few (usually one or two) dimensions. Optimization criterion is maximal empirical variance. (See handout.)
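The first principal component is the direction of maximal empirical variance of the column-centred data. The sketch below finds it with power iteration, which is a simple stand-in chosen for illustration (standard PCA software typically uses an eigendecomposition or SVD). The function first_pc and the simulated genotypes are hypothetical: two groups of 30 individuals with allele frequencies 0.2 and 0.8 at 40 SNPs, chosen so that the first PC separates the groups clearly.

```python
import math
import random


def first_pc(rows, n_iter=100, seed=0):
    """Return (direction, scores) of the first principal component via power iteration.

    rows: list of samples, each a list of numeric features (e.g. genotypes 0/1/2).
    """
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]   # centre each column

    def project(v):
        return [sum(xij * vj for xij, vj in zip(xi, v)) for xi in x]

    v = [rng.gauss(0.0, 1.0) for _ in range(m)]               # random start vector
    for _ in range(n_iter):
        s = project(v)                                        # X v
        w = [sum(x[i][j] * s[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:                                       # data has no variance
            break
        v = [wj / norm for wj in w]
    return v, project(v)


rng = random.Random(42)


def genotype(freq):
    """Draw a genotype (0, 1 or 2 copies of an allele with the given frequency)."""
    return sum(rng.random() < freq for _ in range(2))


pop_a = [[genotype(0.2) for _ in range(40)] for _ in range(30)]
pop_b = [[genotype(0.8) for _ in range(40)] for _ in range(30)]
direction, scores = first_pc(pop_a + pop_b)
print("PC1 score range, group A:", min(scores[:30]), max(scores[:30]))
print("PC1 score range, group B:", min(scores[30:]), max(scores[30:]))
```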

PC plot of Europeans Novembre et al. 2008 Nat Gen

PCA in Northern Europe Salmela E, et al. (2008) Genome-Wide Analysis of Single Nucleotide Polymorphisms Uncovers Population Structure in Northern Europe. PLoS ONE 3(10): e3519

Uses of PCA in GWAS. PCs tell about ancestry. Exclude individuals of unexpected ancestry. Use PCs as covariates in the analysis to correct for possible biases induced by sample collection or non-genetic geographical effects on phenotype.

Population structure in case-control association studies. Are there differences in genotype frequencies between cases and controls? If yes, then the locus is possibly interesting. But the difference could also reflect the ascertainment scheme if cases and controls are not well matched w.r.t. genetic background! Example: let's look at an analysis where 5,000 UK controls are compared against 1,800 UK + 500 Irish psoriasis cases.

UK+Irish psoriasis study. Not well matched w.r.t. genetic background!

[Figure panels: basic analysis; first PC as covariate; lactase region. Diagram: SNP, lactase region?, population, phenotype]
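The effect of adjusting for population structure can be illustrated with a small simulation. A SNP whose allele frequency differs between two populations has no effect on the phenotype, while the phenotype mean differs between the populations. The basic regression of phenotype on genotype then shows a spurious slope, which disappears once the population covariate is regressed out of both variables. All numbers are made up, a quantitative phenotype is used for simplicity, and the 0/1 population label stands in for the first PC; these are assumptions, not the analysis from the lecture.

```python
import random
import statistics


def slope(y, x):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(values, covariate):
    """Remove the mean and the linear effect of the covariate from values."""
    b = slope(values, covariate)
    mv, mc = statistics.fmean(values), statistics.fmean(covariate)
    return [v - mv - b * (c - mc) for v, c in zip(values, covariate)]


rng = random.Random(7)
pop, geno, pheno = [], [], []
# Two populations with different allele frequencies at the SNP.
for label, freq in ((0, 0.2), (1, 0.6)):
    for _ in range(2000):
        g = (rng.random() < freq) + (rng.random() < freq)   # genotype 0/1/2
        pop.append(float(label))
        geno.append(float(g))
        # Phenotype depends on population only, not on the SNP.
        pheno.append(1.0 * label + rng.gauss(0.0, 1.0))

naive = slope(pheno, geno)
# Regressing the covariate out of both variables gives the same SNP slope as
# a joint regression of phenotype on SNP + covariate.
adjusted = slope(residualize(pheno, pop), residualize(geno, pop))
print(f"basic analysis slope:        {naive:.3f}")
print(f"slope with covariate adjust: {adjusted:.3f}")
```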