Genotyping requirements for complex disease studies

Genotyping requirements for complex disease studies. Grant Montgomery, Molecular Epidemiology, Queensland Institute of Medical Research, Australia

Outline: Background; Genetic markers; Genome-wide association studies; Genotyping technologies; High quality genotypes and QC; Interpreting the signals

The Challenge of Complex Disease: Understanding the link between DNA sequence (genotype) and biology/disease (phenotype), together with environment. [Figure: example sequence ATTCGCATGGACC with alternative bases C and A marked.]

Complex Trait Model. [Diagram labels: Marker; Linkage disequilibrium; Association; Gene 1; Gene 2; Gene 3; Mode of inheritance; Polygenic background; Individual environment; Common environment; Disease phenotype.]

DNA polymorphisms. Minisatellites. Microsatellites: >100,000; many alleles, (CA)n, very informative, even, easily automated. SNPs: 10,054,521 (25 Jan 05); most with 2 alleles (up to 4), not very informative, even, easily automated. Detecting SNPs: RFLPs; mass spectrometry; bead arrays. [Figure: base-paired DNA sequence diagrams (panels A and B) showing a (CA)n repeat and a single base difference.]

Microsatellites or short tandem repeats (STRs): Detected by PCR; Multiple alleles; Widely used in linkage analysis and forensics

STR Profiles. Jobling & Gill (2004)

The Positional Cloning Problem: Chromosome region; Linkage to broad region; Difficult to define more precise location; Many possible genes; Nature of the mutations/variants; Deciding whether any variant is causal

Genetic architecture of complex genetic disorders. [Figure: effect size (very very small to large) against allele frequency (very very rare to common). Regions: Mendelian disorders; Highly unusual; Possible and detectable spectrum of common complex genetic disorders; Not detectable / not useful.]

"There have been few, if any, similar bursts of discovery in the history of medical research." Hunter DJ and Kraft P, N Engl J Med 2007; 357:436-439. (Stephen Chanock)

Single Nucleotide Polymorphisms (SNPs). Example: GGCTTCAGAATGGCC versus GGCTTCAAAATGGCC. Single base changes; Human SNPs = 10,054,521; Validated SNPs = 5,054,675; Frequency ~1 every 300 bp; Can cause functional changes
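As a small illustration of this slide, the two example sequences can be compared position by position to locate the single base change, and the quoted density can be checked against a genome size. This is only a sketch: the function name is hypothetical, and the 3,000,000,000-base genome size is borrowed from the "GWAS in humans" slide rather than stated here.

```python
def find_single_base_changes(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two sequences shown on the slide.
changes = find_single_base_changes("GGCTTCAGAATGGCC", "GGCTTCAAAATGGCC")

# Rough count implied by ~1 SNP every 300 bp.
# Assumption: genome size of 3,000,000,000 bases (from a later slide).
GENOME_BASES = 3_000_000_000
expected_snps = GENOME_BASES // 300

print(changes)        # positions where the sequences differ
print(expected_snps)  # order-of-magnitude check against the ~10 million quoted
```

The two sequences differ at a single position (G versus A), and one SNP per 300 bp over 3 billion bases gives about 10 million SNPs, which agrees with the count on the slide.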

Association studies to 2006: Candidate regions. Some successes but generally: poor replication; small sample sizes. Conclusion: effect sizes are smaller than expected; selection of candidate regions

Candidate gene studies in endometriosis: Reviewed >100 papers; Results for >60 genes; Candidate genes chosen based on biology; Mostly tested a few variants; Small numbers of cases and controls (<250 individuals); No associations widely replicated. Montgomery et al., 2008

GWAS in humans: Better understanding of patterns of human sequence variation; 3,000,000,000 bases in human genome; Advances in genotyping technology; Sample collections of adequate size. Genome-wide association scans: samples of interest; ~10,000,000 positions commonly variant in Europeans; 80% of these captured by typing ~500k; test for evidence of association

Development of genome-wide association studies (GWAS): Risch & Merikangas, Science 1996; Human Genome Sequence 2003; ~10 million SNPs (dbSNP); HapMap project: 270 samples from 4 populations, >3 million validated SNPs, linkage disequilibrium (LD); SNP chips: Affymetrix (500k, 1M), Illumina (370k, 550k, 1M)

Haplotype Map of the Human Genome. Goals: Define patterns of genetic variation across the human genome; Guide selection of SNPs to tag common variants efficiently; Public release of all data (assays, genotypes). Phase I: 1.3 M markers in 269 people. Phase II: +2.8 M markers in 270 people

Pairwise tagging. Six SNPs observed on four haplotypes (one haplotype per column):
SNP 1 (A/T): A A T T
SNP 2 (G/A): G G A A
SNP 3 (G/C): G C G C
SNP 4 (T/C): T C C C
SNP 5 (G/C): G C G C
SNP 6 (A/C): A C C C
Tags: SNP 1, SNP 3, SNP 6 (3 in total). Test for association: SNP 1, SNP 3, SNP 6, each in high r^2 with the SNPs it tags. After Carlson et al. (2004) AJHG 74:106. (Mark Daly, HapMap Consortium)
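The tagging idea on this slide can be sketched in a few lines. The code below reads the slide's allele grid as four haplotypes of six SNPs, which is an interpretation of the flattened figure. It also assumes the four haplotypes are equally frequent, uses an r^2 threshold of 0.8, and picks tags with a simple greedy rule. None of these choices is stated on the slide, and the function names are illustrative.

```python
# Four haplotypes across six SNPs, read from the slide's allele grid.
# Assumption: each haplotype is equally frequent.
HAPLOTYPES = ["AGGTGA", "AGCCCC", "TAGCGC", "TACCCC"]


def r_squared(i, j, haps):
    """Pairwise r^2 between SNP i and SNP j (0-based) from phased haplotypes."""
    n = len(haps)
    ref_i, ref_j = haps[0][i], haps[0][j]
    a = [h[i] == ref_i for h in haps]
    b = [h[j] == ref_j for h in haps]
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x and y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def greedy_tags(haps, threshold=0.8):
    """Greedily pick tag SNPs until every SNP has r^2 >= threshold with a tag."""
    m = len(haps[0])
    covers = {
        i: {j for j in range(m) if i == j or r_squared(i, j, haps) >= threshold}
        for i in range(m)
    }
    untagged = set(range(m))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(range(m), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


tags = [t + 1 for t in greedy_tags(HAPLOTYPES)]  # report 1-based SNP numbers
print(tags)
```

Under these assumptions three tags are enough, matching the "3 in total" on the slide. The greedy rule returns SNP 4 rather than SNP 6 for the third bin; the two are perfectly correlated here, so either serves as the tag.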

Progress in genotyping technology. [Figure: cost per genotype in US cents (log scale, 0.1 to 10^2) against number of SNPs (log scale, 1 to 10^6), 2001 to 2007. Platforms shown: ABI TaqMan; ABI SNPlex; Sequenom; PyroSeq; Illumina Golden Gate; Affymetrix 10K; Perlegen; Affymetrix 100K/500K; Illumina Infinium/Sentrix.] (Stephen Chanock)

Genome-wide association: >500k SNPs (Hirschhorn & Daly, Nat. Rev. Genet. 6: 95, 2005); then 1-30k SNPs in Replication, Replication, Replication (NCI-NHGRI Working Group on Replication, Nature 447: 655, 2007)

SNP Genotyping Platforms. [Figure: TaqMan 7900, Illumina BeadStation and Sequenom MassARRAY compared by throughput in SNPs per assay (axis values 1; 35; >7500), cost per assay, and flexibility in project design.]

Sequenom MassARRAY: Medium throughput; Primer extension; Detection by mass spectrometry; 30-35 assays per sample; 384 to many 1000s of samples

Sequenom SNP Platform: Multiple or Single Base Primer Extension Chemistry. [Diagram: for Allele 1 and Allele 2 (labelled CTA and GTA), an EXTEND primer (23-mer) plus enzyme, ddGTP/ddATP and dCTP/dTTP yields extended primers (24-mer and 25-mer); mass axis marked 21 to 28.] 4-level specificity: 1: PCR, two primers; 2: extension primer hybridization; 3: primer extension traps the event; 4: mass resolution, expected masses. Unambiguous, high-confidence results

Example of 25-plex assay using iPLEX on Compact MassARRAY. Lowers the cost per genotype to under USD 0.06

Illumina BeadStation: Linkage mapping; Custom genotyping (1536 to 60,0000 SNPs); Genome-wide association; Gene expression

Whole Genome Genotyping: Infinium

Human610-Quad BeadChip Coverage. Populations: CEU: U.S. (residents with ancestry from N and W Europe, collected in 1980 by the Centre d'Etude du Polymorphisme Humain, CEPH); CHB: Japan, China; YRI: Nigeria (Yoruba)

Custom SNP Set: Custom genotyping, 96-well format. First custom SNP set: 1536 SNPs; 1482 tag SNPs; 225 coding SNPs; 39 double tag SNPs in larger SNP bins

Illumina BeadStation 500: 1 million markers across all chromosomes; Comparison in four MZ twin pairs; Mean error rate: 6 SNPs in 1.06 million calls
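The error rate quoted on this slide can be expressed as a proportion, and the same kind of comparison can be sketched for any pair of samples expected to carry identical genotypes. The code is a minimal illustration rather than the method used for the slide: the function name, the genotype coding ("AA", "AB", "BB") and the use of None for a failed call are assumptions, and the short genotype lists are made-up toy data.

```python
def discordance_rate(calls_a, calls_b):
    """Compare two genotype lists; skip markers where either call is missing (None)."""
    compared = 0
    discordant = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            discordant += 1
    rate = discordant / compared if compared else float("nan")
    return discordant, compared, rate


# Toy data (hypothetical): five markers typed in a pair of MZ twins.
twin_1 = ["AA", "AB", "BB", None, "AB"]
twin_2 = ["AA", "AB", "AB", "BB", "AB"]
toy_result = discordance_rate(twin_1, twin_2)

# The figure quoted on the slide: 6 discordant SNPs in 1.06 million calls.
slide_rate = 6 / 1_060_000
print(toy_result)
print(f"{slide_rate:.2e}")  # roughly 5.7 errors per million calls
```

A discordance between twins reflects an error in either sample, so a per-sample error rate derived this way is an approximation.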

Producing High Quality Genotypes. Minimum finished genotypes (>98.5%). Quality of DNA: measure concentrations; dispense in large volumes. Assay design: repeat sequences; SNPs in primer sequences. Quality of assays: check cluster plots; test for Hardy-Weinberg equilibrium. Analysis of SNP data is particularly sensitive to assay problems: genotype failures are not random; heterozygous individuals fail most often; all SNP typing platforms. Include controls and check error rates: check controls; repeat assays
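Two of the checks on this slide, the minimum proportion of finished genotypes and the test for Hardy-Weinberg equilibrium, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline behind the slides: the function names are hypothetical, the Hardy-Weinberg test is a one-degree-of-freedom chi-square test for a biallelic SNP, and the genotype counts are invented to show the effect of losing heterozygotes.

```python
import math

MIN_CALL_RATE = 0.985  # ">98.5%" finished genotypes, from the slide


def call_rate(n_called, n_attempted):
    """Proportion of attempted genotypes that produced a call."""
    return n_called / n_attempted


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: genotypes in exact Hardy-Weinberg proportions...
in_hwe = hwe_chi_square(25, 50, 25)
# ...and the same allele frequency with a marked deficit of heterozygotes,
# the pattern expected when heterozygous samples fail most often.
het_deficit = hwe_chi_square(40, 20, 40)

print(in_hwe, het_deficit)
print(call_rate(990, 1000) > MIN_CALL_RATE)
```

The chi-square approximation becomes unreliable when expected counts are small, so an exact test is often preferred for rare alleles.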

Producing High Quality Genotypes: Sample collection; Sample storage and tracking; Laboratory technique; Mixed samples; Data interpretation; True mixtures

Standard Blood Collection and Processing. Samples are collected in the following tubes: 2 x EDTA; 1 x SERUM; 1 x ACD; 1 x PAX; 1 x BUCCAL. The 2 x EDTA & 1 x SERUM tubes are centrifuged at 3000 rpm for 10 min and then the fractions are collected. All fractions & 1 x Buffy Coat are stored in the -80 °C freezers. [Flowchart labels: MNC Processing; Buccal Extraction; 4 x Red Blood Cells; 4 x Plasma; 4 x Serum; Stored in Freezer for later RNA work; 2 x Buffy Coats; 1 x Buffy Coat Extraction.]

DNA Quantitation. Stock DNA (400 µl). 1:5 dilution (100 µl stock + 400 µl 1 x TE) in a 96 deep well plate. 1:100 dilution (5 µl of the 1:5 + 495 µl 1 x TE). 50 µl of the 1:100 transferred to black OptiPlates in duplicate, alongside known DNA standards. PicoGreen added to plates and standards; fluorescence detected by Ascent Fluoroskan. Based on Fluoroskan results, the 1:5 dilution is modified to 50 ng/µl by addition of more buffer or stock. Result: new DNA dilution at 50 ng/µl (500 µl+) plus remaining stock DNA (300 µl). Expensive, but costs offset by savings in better quality genotypes, less DNA used and reduced reaction volumes
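The adjustment step on this slide, bringing the 1:5 dilution to 50 ng/µl by adding buffer or stock, is a simple mass-balance calculation. The sketch below shows one way to do the arithmetic. The function name and example concentrations are hypothetical, and the stock concentration is assumed to be known (for instance, estimated as five times the measured 1:5 concentration).

```python
TARGET_NG_PER_UL = 50.0  # working concentration named on the slide


def normalise_dilution(measured_ng_per_ul, volume_ul, stock_ng_per_ul,
                       target=TARGET_NG_PER_UL):
    """Return ("buffer" | "stock" | "none", volume in µl to add) to reach target."""
    if measured_ng_per_ul > target:
        # Too concentrated: dilute with buffer (C1 * V1 = C2 * V2).
        extra = volume_ul * (measured_ng_per_ul / target - 1)
        return "buffer", extra
    if measured_ng_per_ul < target:
        # Too dilute: add stock, which must itself exceed the target.
        if stock_ng_per_ul <= target:
            raise ValueError("stock is not concentrated enough to reach target")
        extra = volume_ul * (target - measured_ng_per_ul) / (stock_ng_per_ul - target)
        return "stock", extra
    return "none", 0.0


# Hypothetical examples with 500 µl of the 1:5 dilution in the well.
too_strong = normalise_dilution(80.0, 500.0, stock_ng_per_ul=400.0)
too_weak = normalise_dilution(40.0, 500.0, stock_ng_per_ul=200.0)
print(too_strong)  # add buffer
print(too_weak)    # add stock
```

In the first example 300 µl of buffer brings 500 µl at 80 ng/µl down to 50 ng/µl; in the second, about 33 µl of 200 ng/µl stock brings 500 µl at 40 ng/µl up to the same target.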

Multiplex Assays Must be Tested: Poor markers redesigned

Genotyping artifacts: Base change (SNP) under the primer site. [Diagram: the primer extension assay for Allele 1 and Allele 2 (labelled CTA and GTA), with EXTEND primer (23-mer) plus enzyme, ddGTP/ddATP and dCTP/dTTP yielding extended primers (24-mer and 25-mer); mass axis marked 21 to 28.] 4-level specificity: 1: PCR, two primers; 2: extension primer hybridization; 3: primer extension traps the event; 4: mass resolution, expected masses. Unambiguous, high-confidence results

Producing High Quality Genotypes: Null allele?

Producing High Quality Genotypes: Plate variation

Genotype Quality Control. [Figure panels: Control Group 1; Control Group 2; Case Group.]

CNVs in MZ Twins: CNV analysis of one twin pair showing a 1.6 Mb deletion on chromosome 2. Bruder et al. (2008) AJHG 82, 1-9

DNA mixtures: Mixed samples; Blood transfusions; Chimeras (rare cases): share cells with co-twin in utero; blood chimeras; true chimeras

DNA Mixtures. Science 308: 1864, 24 June 2005

"The 'semi-identical' twins are the result of two sperm cells fusing with a single egg, a previously unreported way for twins to come about. The twins are chimaeras, meaning that their cells are not genetically uniform. Each sperm has contributed genes to each child." (news@nature.com)

Allele sharing in chimeric twins (GoldenGate, 6008 SNPs):
Parent: heterozygous markers; shared alleles
Father: 779; 52.1%
Mother: 675; 100%

Possible Mechanisms: (A) the three-gamete model: immediate cleavage secondary to parthenogenetic activation of the egg, followed by fertilization of the identical cells formed by two different sperm containing different sex chromosomes; (B) dispermic fertilization of an ovum followed by the postzygotic diploidization of triploids, a concept postulated by Golubovsky (2003).

Wellcome Trust Sanger Institute SNP QC

Genotype Quality Control: All SNPs which exhibit phenotype association(s) should have their hybridization intensity cluster plots manually examined for potential biases or failures. This check can halve the false positive rate and reduce the cost of a replication experiment. Each plot is inspected for: 1. over-dispersion of the genotype clusters or overlap; 2. biased no-calling; 3. erroneous genotype assignment. A SNP failing any of the above QC criteria is excluded from further analyses. (WTSI QC Pipeline)

Tag SNPs are probably not the causal variants. [Diagram: marker SNPs flanking a disease allele within a gene; SNP association with the disease allele.] Linkage and LD assume markers have an indirect association with the trait. Large SNP collections and cheaper genotyping may allow testing for direct, physiologically relevant associations with the trait

Human OCA2 and blue/brown eye colour: A three-SNP haplotype in the first intron of OCA2 explains most human eye colour variation. Zhu et al., Twins Res 7:197-210 (2004); Duffy et al., AJHG Feb, 2007

Single variant upstream of OCA2 determines eye colour. [Figure: eye colour frequencies (0 to 1) by rs12913832 genotype (C/C, T/C, T/T); map showing the variant in HERC2, 21 kb from OCA2.] Sturm et al., AJHG 82: 424-431, 2008

A single SNP within intron 86 of HERC2 determines blue-brown eye colour: rs12913832 C = Blue; rs12913832 T = Brown. [Figure label: HLTF.] Sturm et al., AJHG 82: 424-431, 2008; Sulem et al., Nat Genet 39: 1443, 2007; Kayser et al., AJHG 82: 411-423, 2008; Eiberg et al., Hum Genet 123: 177-187, 2008

Block III Log(p) values for Illumina SNPs located in a 2 Mb region centred around the 122 Kb block III (marked by solid vertical lines)

High throughput sequencing: Whole genome and targeted resequencing; Discovery of rare variants; Additional SNP variation; Copy number variations and chromosomal rearrangements

DNA Requirements. [Figure: amount of DNA (ng); axis labels 1, 1, 30, 600k.] Some sequencing applications might require 20 µg

DNA Requirements. [Figure: amount of DNA (µg); axis labels 1, 1, 30, 600k, Genome Sequence.] Some sequencing applications might require >20 µg

Conclusions: Rapid advances in genome technologies; Accurate high throughput SNP typing platforms; Discovery of many genes/variants contributing to risk for common diseases; Errors and artefacts still occur; Careful QC from sample collection to data analyses; Typical data set (4000 individuals x 600k SNPs) = 2.4 x 10^9 genotypes; A good quality data set comes thanks to good lab people
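The size of the typical data set in the conclusions follows directly from the two numbers given, and it shows why even a very low error rate still matters. The sketch below checks the arithmetic. Applying the twin-comparison error rate from the Illumina BeadStation slide (6 in 1.06 million calls) to this data set is an assumption made purely for illustration, and the function name is hypothetical.

```python
def dataset_summary(individuals, snps, error_rate):
    """Total genotypes and the number of erroneous calls expected at a given rate."""
    genotypes = individuals * snps
    expected_errors = genotypes * error_rate
    return genotypes, expected_errors


# Figures from the conclusions slide.
INDIVIDUALS = 4000
SNPS = 600_000

# Assumption: the error rate reported for the MZ twin comparison applies here.
ERROR_RATE = 6 / 1_060_000

genotypes, expected_errors = dataset_summary(INDIVIDUALS, SNPS, ERROR_RATE)
print(f"{genotypes:.1e} genotypes")
print(f"about {expected_errors:,.0f} erroneous calls expected")
```

At that rate a data set of 2.4 x 10^9 genotypes would still contain on the order of ten thousand wrong calls, which is consistent with the slide's point that errors and artefacts still occur and QC remains necessary.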