Potential of human genome sequencing. Paul Pharoah Reader in Cancer Epidemiology University of Cambridge

Similar documents
Computational Workflows for Genome-Wide Association Study: I

Genomics Resources in WHI. WHI ( ) Extension Study Steering Committee Meeting Seattle, WA May 05-06, 2011

Genome-Wide Association Studies (GWAS): Computational Them

Crash-course in genomics

SNPs - GWAS - eqtls. Sebastian Schmeier

Association studies (Linkage disequilibrium)

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

H3A - Genome-Wide Association testing SOP

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey

Roadmap: genotyping studies in the post-1kgp era. Alex Helm Product Manager Genotyping Applications

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Axiom Biobank Genotyping Solution

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Understanding genetic association studies. Peter Kamerman

Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip

BTRY 7210: Topics in Quantitative Genomics and Genetics

AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY

Illumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

POPULATION GENETICS studies the genetic. It includes the study of forces that induce evolution (the

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Pharmacogenetics: A SNPshot of the Future. Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001

Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato

Familial Breast Cancer

1b. How do people differ genetically?

Prostate Cancer Genetics: Today and tomorrow

Human Genetics and Gene Mapping of Complex Traits

Benno Pütz. MPI of Psychiatry

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Linking Genetic Variation to Important Phenotypes

Genomic resources. for non-model systems

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Redefine what s possible with the Axiom Genotyping Solution

Genome-wide analyses in admixed populations: Challenges and opportunities

Association Mapping. Mendelian versus Complex Phenotypes. How to Perform an Association Study. Why Association Studies (Can) Work

BTRY 7210: Topics in Quantitative Genomics and Genetics

Analysis of genome-wide genotype data

Global Screening Array (GSA)

POLYMORPHISM AND VARIANT ANALYSIS. Matt Hudson Crop Sciences NCSA HPCBio IGB University of Illinois

Genetic Analysis Platform. Wendy Winckler, Ph.D. October 7, 2010

PERSPECTIVES. A gene-centric approach to genome-wide association studies

BST227 Introduction to Statistical Genetics. Lecture 3: Introduction to population genetics

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Brassica carinata crop improvement & molecular tools for improving crop performance

Danika Bannasch DVM PhD. School of Veterinary Medicine University of California Davis

Genome Wide Association Studies

LD Mapping and the Coalescent

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Exploring the Genetic Basis of Congenital Heart Defects

B I O I N F O R M A T I C S

Human linkage analysis. fundamental concepts

Human Genetics and Gene Mapping of Complex Traits

EPIB 668 Introduction to linkage analysis. Aurélie LABBE - Winter 2011

Exome Sequencing Exome sequencing is a technique that is used to examine all of the protein-coding regions of the genome.

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Introduction. RESEARCHERS DATA DICTIONARY Genetic Data (RDD-Gen)

Genetic dissection of complex traits, crop improvement through markerassisted selection, and genomic selection

Whole Genome-Assisted Selection. Alison Van Eenennaam, Ph.D. Cooperative Extension Specialist

Genetics and Bioinformatics

Bioinformatics Advice on Experimental Design

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

Translational Medicine in the Era of Big Data: Hype or Real?

Genomic Selection in Cereals. Just Jensen Center for Quantitative Genetics and Genomics

Introduction to Quantitative Genomics / Genetics

Genome-wide association studies (GWAS) Part 1

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

Strategic Research Center. Genomic Selection in Animals and Plants

Goal: To use GCTA to estimate h 2 SNP from whole genome sequence data & understand how MAF/LD patterns influence biases

Experimental Design and Sample Size Requirement for QTL Mapping

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

GENOME-WIDE ASSOCIATION STUDIES: THEORETICAL AND PRACTICAL CONCERNS

Trudy F C Mackay, Department of Genetics, North Carolina State University, Raleigh NC , USA.

Blood Pressure and Hypertension Genetics

GENETICS - CLUTCH CH.20 QUANTITATIVE GENETICS.

Core Resources Working Group Report. Opportunities for Investigator Engagement

Next Generation Sequencing Technologies. Rob Mitra 1/30/17

ACCEPTED. Victoria J. Wright Corresponding author.

HLA and other tales: The different perspectives of Celiac Disease Gutierrez Achury, Henry Javier

Lecture 2: Height in Plants, Animals, and Humans. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

UKPMC Funders Group Author Manuscript Nature. Author manuscript; available in PMC 2011 April 1.

Quantitative Genetics for Using Genetic Diversity

I.1 The Principle: Identification and Application of Molecular Markers

Using the Association Workflow in Partek Genomics Suite

DNA METHYLATION RESEARCH TOOLS

DNA Pooling Rationale. GWAS pooling study. DNA pooling. Cost. Larry Kuehn, U.S. Meat Animal Research Center. June 10, 2015

The Revolution in Human Genetics: Deciphering Complexity. David Galas Institute for Systems Biology, Seattle, WA

Centre for Genetic Epidemiology & Biostatistics

GENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON

Supplementary Note: Detecting population structure in rare variant data

S SG. Metabolomics meets Genomics. Hemant K. Tiwari, Ph.D. Professor and Head. Metabolomics: Bench to Bedside. ection ON tatistical.

Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Genotyping requirements for complex disease studies

INTRODUCTION TO MOLECULAR GENETICS. Andrew McQuillin Molecular Psychiatry Laboratory UCL Division of Psychiatry 22 Sept 2017

BST227 Introduction to Statistical Genetics. Lecture 3: Introduction to population genetics

REVIEWS GENOME-WIDE ASSOCIATION STUDIES FOR COMMON DISEASES AND COMPLEX TRAITS. Joel N. Hirschhorn* and Mark J. Daly*

Transcription:

Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge

Key considerations Strength of association Exposure Genetic model Outcome Quantitative trait Binary Prevalence Measurement

Timeline Understanding of architecture of human genome Developments in technology Understanding of (inherited) genetic basis of complex disease 1866 1980 1990 2000 2010

What is meant by the genetic model? Allele frequency Common >5% Uncommon 1-5% Rare 0.1 1% Very rare <0.1% Mode of inheritance Dominant/recessive Co-dominant Effect size

What is likely genetic model? Common alleles conferring relative risk >2 not been identified for any complex phenotype Uncommon and rare alleles conferring modest risks (RR = 2-4) have been identified

Finding common variation for complex phenotypes Association study Compare allele frequencies in subjects with (cases) and without (controls) phenotype of interest Statistical methods generally straightforward

Candidate gene studies: results Many putative associations published None replicated robustly Inappropriate reliance on P<0.05 as significant

An aside on P-values The P-value is the probability of data IF null hypothesis is true It is NOT the probability that the null hypothesis is true (true negative probability) TNN depends on prior and power If prior and power are small, even a very small P is more likely to represent a false positive Number of tests irrelevant

Prior The probability of association for locus under study Given that a locus detectable by association must have a detectable effect size explain >0.1% variance Thus 1,000 loci at most 10 million common variants Prior 1:10,000 at best Low signal to noise ratio May improve this by 10x with good candidate selection

Genome-wide association studies Possible to select a set of SNPs that tag all the common variation in genome Genotyping technology enable this to be done on large sample sizes Still too expensive to genotype everything Phased study design

Data analysis Simple test of association SNP by SNP Genome-wide significance Based on the principle of adjusting for multiple testing 10M SNPs are correlated, so not 10M independent tests 5 x 10-8 is approx equal to 0.05 corrected However, small biases can have disproportionate effects at the extremes of the distribution of test statistics Large samples sizes are needed

Power to detect association Prevalence of radiosensitivity phenotype = 10 percent P < 5 x 10-8 N = 4000 N = 8000

Phased GWAS designs Genome-wide array 200k 1M 2000 cases and 2000 controls Test for association P>0.05 P<0.05 Custom array 4000 cases and 4000 controls P>0.00001 Test for association P<0.00001 Top hits 10,000 cases and 10,000 controls SNPs with P<10-8 declared significant

GWAS results Over 1,400 loci for 250 complex phenotypes Relative risks modest Proportion of genetic component of phenotypic variance explained <20% Missing heritability Range of underlying genetic models possible Large scale GWAS for radiosensitivity phenotypes just being established Radiogenomics Consortium

Searching for uncommon and rare variants Understanding of human genomic architecture continues to increase 1000 Genomes Project Resequencing different populations Many studies now sequencing exomes of subjects with selected phenotypes ~40 million different variants detected by 1000GP

Next generation association studies High throughput genotyping arrays now being designed to capture rarer variation Illumina Omni 5M >1% variation across the genome Illumina and Affymetrix exome arrays ~270K variants Selected from an analysis of 12,000 exomes Variant present in at least 3 exomes Testing for association as with common variants

Problems Population substructure may cause substantial bias that is difficult to correct for using standard methods developed for analysis of common variants Small errors in genotype calling may result in bias Sample size remains a major problem

Power to detect rare variants Prevalence of radiosensitivity phenotype = 10 percent P < 5 x 10-8 N = 4000 N = 8000

Resequencing Alternative to genotyping Whole genome sequencing now ~$5,000 Exome sequencing $500 Targeted sequencing 10 genes $50

Problems Data management Very large data files Exome 1-5 Gbytes Data analysis Variant calling Each genome will include very large number of variants including private variants of unknown function Methods for statistical analysis not yet developed Sample size

Conclusions Only fraction of the inter-individual variation in radiation sensitivity explained by well known sensitivity syndromes Technological advances have made it possible to genotype/sequence thousands of subjects making large scale genetic epidemiological studies feasible Considerable obstacles still to be overcome

Questions?