Illumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era Anthony Green Sr. Genotyping Sales Specialist North America 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iselect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
Illumina s Mission Innovating for the Future of Genetic Analysis From Genome Wide Discovery To Targeted Validation and Beyond To be the leading provider of integrated solutions that advance the understanding of genetics and health 2
Illumina s Genotyping Product Portfolio 5,000,000 WGA 200,000 6000 # of Markers 3000 1536 768 384 192 144 96 iselect GoldenGate GoldenGate Indexing Options: Illumina s Genotyping toolbox provides an entire spectrum of plex ranges from 1 to 5M SNPs Quality: Robust assay technologies (GoldenGate and Infinium) provide industry-best data quality Flexibility: Interrogate virtually any SNP Reliability: Proven GG and Infinium platforms support researchers with effective and dependable tools 48 24 ASPE <10 100 1000 2000+ Samples per day*: 280 days/year 3
Overview First-gen GWAS vs. Next-gen GWAS Next-gen Sequencing and the 1kGP Revolution Illumina s GWAS Roadmap 4
Why Does Understanding Genetics Matter? Disease Is Largely Heritable Many diseases have familial risk, or heritability Heritability is made up of the sum of effects of different genetic factors Identification of the genetic factors is needed To understand genes and pathways involved in disease To develop better diagnosis, prognosis, treatment To analyse non-genetic factors (environment) 5
The Spectrum of Variants in Disease First Generation GWAS (Circa 2005) Effect size Small Large LINKAGE Very Rare Variants Large Effect Size (Rare Variants Minimal Effect Size) HGSP & HapMap (Common Variants Large Effect Size) Common Variants First-Gen GWAS Low Allele Frequency High 6
Linkage Disequilibrium and Haplotype Blocks The Phenomenon that Makes GWAS Possible 2005: First GWAS *Adapted from Klein et al. 2005, Science 7
The GWAS Approach is Successful in Human Genetics Year 2006 2007 2008 2009 # of Pubs 8 89 151 222 674 GWAS Publications, 3275 Associations as of 9/28/2010 8
NHGRI GWA Catalog www.genome.gov/gwastudies 9
The Case of the Missing Heritability For most common diseases, the sum of individual effects found so far is much less than the total estimated heritability 100% Rare Common Disease/Traits 80% 60% 40% Heritability Missing Explained 20% 0% 10 Adapted from Manolio et al 2009
The Missing Heritability For most common diseases, the sum of individual effects found with common variants is much less than the total estimated heritability Potential Reasons: Many studies might have been underpowered because of: Overestimation of genomic coverage; HapMap vs. 1000 Genomes Low Sample size Exclusion of rare variants (0.3-5%) Additional contributions from Epigenetic, CNV other unknown factors New Hypotheses: CNV (rare and common) Methylation Gene-Gene Interaction Importance of Rare Variants 11
Next-gen Sequencing and the 1kGP Revolution: Bridging the Gap improved coverage of common variants unprecedented access to rare variation 12
Sequencing and Arrays driving next wave of discovery Targeted resequencing Next-Gen Sequencing Custom Genotyping Arrays Next-gen GWAS Arrays Array Design 13
The 1,000 Genomes Project Sequence 2,500 genomes to complete the picture of genetic variation Achieve a nearly complete catalog of common human genetic variants with frequency 1% or higher. Project Goals 1. Accelerate fine-mapping efforts in gene regions indentified through genome-wide association studies or candidate gene studies 2. Improve the power of future genetic association studies by enabling design of next-generation genotyping microarrays that more fully represent human genetic variation 14 3. Enhance the analysis of ongoing and already completed association studies by improving our ability to impute or predict untyped genetic variants
HapMap Represents a Small Part of All Variation SNPs by observations in 60 CEU Samples 1.0 0.8 HapMap 1000 Genomes Count (x 10 6 ) 0.6 0.4 0.2 0.0 0 10 20 30 40 50 60 Minor Alleles 15
New Content for Next-gen GWAS Arrays Rich content to explore new hypotheses and enable new discoveries Sequence to discover SNPs >1% MAF (1000-Genomes project) Leverage the power of LD to select tagsnps and remove redundancy Include progressively more SNPs at lower allele frequencies (5%, 2.5%, 1%) Project Year Approx. Cumulative SNPs found Tag SNPs needed for max coverage Lower limit of allele frequency targeted % variation tagged (r 2 >0.8) HapMap 2003-2007 3M ~0.6M 5% >90% 1kG Pilot Project 2008-2009 13M ~2.5M 2.5% ~80% 1kG Full Project 2010 35M* ~5.0M 1% >90% * Estimated 16
Genomic Coverage for HapMap versus 1000 Genomes OMNI 2.5 vs Hap610/660 MAF: > 2.5% Blue: HapMap ~3M SNPs Pink: 1000 Genomes ~17M SNPs 17
Tackling the Full Spectrum of Variants in Disease Effect size Small Large Very Rare Variants Large Effect Size SEQUENCING LINKAGE NEW! Rich GWAS Rare/Intermediate Variants Intermediate Effect Size (Rare Variants Minimal Effect Size) NEW! (Common Variants Large Effect Size) Common Variants Small Effect Size Ongo ing GWAS GWAS Low Allele Frequency High 18
19 Illumina s GWAS Roadmap
GWAS Roadmap Review Content Source HapMap Phase 1 HapMap Phase 2 HapMap Phase 3 1,000 Genomes Project Future GWAS Products Array Products HumanOmni1-Quad & OmniExpress Human1M-Duo Human660-Quad HumanHap500 HumanHap300 2.5M 5M Data Points per Sample 1M 550K 660K 317K Current Projected Announcement at ASHG, Oct 2009 20
The Omni Family of Microarrays Next-generation GWAS. NOW. Omni Express* Omni1- Quad Omni1S-8 Omni2.5- Quad Omni2.5S Omni5 Highestthroughput array with industryproven quality at an exceptional price. Optimal combination of common SNPs, CNVs, and content from 1kGP. Takes researchers from Omni1/Express to 2.5M The most optimal and comprehensive set of both common and rare SNP content from the 1kGP ~2.5M additional markers providing rare 1kGP content The ultimate GWAS tool providing near complete coverage of common and rare variation MAF > 5% MAF >2.5% MAF > 1% 21
Illumina s GWAS Roadmap Content optimized from next-gen re-sequencing efforts such as 1000 Genomes. Pushing the boundary of GWAS content into unexplored territory Cost effective path for researchers that want to ride the cutting edge today 22
Roadmap Paths Path Step 1 Step 2 Step 3 Total Markers 1 ~4.4 Million OmniExpress Omni1S Omni2.5S 2 ~5 Million Omni1 Omni1S Omni2.5S 3 ~5 Million Omni2.5 Omni2.5S 4 ~5 Million Omni5 23
2010 GWAS Roadmap Multiple chips made Easier with the Multi-use Workflow Roadmap Entry Point Second Array Third Array Omni1-Quad Multi-use Omni1S Multi-use Omni2.5S Multi-use OmniExpress Multi-use Omni1S Multi-use Omni2.5S Multi-use Omni2.5 Multi-use Omni2.5S Multi-use 24
25 Omni2.5 Details
HapMap represents only a small part of the common variation SNPs by observations in 60 CEU samples 1.0 0.8 HapMap 1000 Genomes Count (x 10 6 ) 0.6 0.4 0.2 26 0.0 0 10 20 30 40 50 60 Minor Alleles
This is not just an array with new content! The Omni2.5 array is a complete game-changer! 100% CEU Coverage Estimates: HapMap vs. 1kGP Reference Data 90% 80% % Captured at r 2 >0.8 70% 60% 50% 40% 30% 20% 10% 0% Competitor New Array * Competitor Old 900K 660W Omni1/ OmniExpress Omni2.5 HapMap 5% 1kGP 5% 1kGP 2.5% 27 *Base content only
Genomic Coverage Stats for African Populations 1 YRI Coverage Estimates: HapMap vs. 1kGP Pilot Data 0.9 0.8 0.7 % Captured at r 2 >0.8 0.6 0.5 0.4 HapMap 5% 1kGP 5% 1kGP 2.5% 0.3 0.2 0.1 0 Competitor "New Array" * Competitor "Old 900K" 660W Omni1/ OmniExpress Omni2.5 28 *Base content only
Genomic Coverage Stats for Asian Populations 1 CHB/JPT Coverage Estimates: HapMap vs. 1kGP Pilot Data 0.9 0.8 0.7 % Captured at r 2 >0.8 0.6 0.5 0.4 0.3 HapMap 5% 1kGP 5% 1kGP 2.5% 0.2 0.1 0 Competitor "New Array" * Competitor "Old 900K" 660W Omni1/ OmniExpress Omni2.5 29 *Base content only
Final Omni2.5 Content Category Count SNPs from previous products i.e. OmniExpress 720,219 SNPs selected from 1000 Genomes 1,729,781 Polymorphic in CEU 1,028,961 Polymorphic in YRI 1,344,573 Polymorphic in CHB+JPT 824,800 Already in dbsnp 808,471 Discovered SNPs 921,310 Total after gentrain 2,450,000 30
Enabling discoveries with next-gen GWAS Arrays Effect size 12.0 6.0 3.0 1.5 Sequencing Very Rare Variants Large Effect Size Rare/Intermediate Variants Intermediate Effect Size Omni5M Omni2.5M Omni1 (Common Variants Large Effect Size) 1.1 (Rare Variants Minimal Effect Size) Common Variants Small Effect Size 1% 5% Allele Frequency 25% 31
Summary First-generation GWAS has provided a foundation for beginning to understand the genetic architecture of many diseases and traits. However, first-generation GWAS was limited by the extent of knowledge about the spectrum of variation in humans in the HapMap era. NGS re-sequencing efforts, such as 1kGP, are providing a much more comprehensive catalog of common variation (>1% MAF) in diverse populations Next-gen GWAS tools are leveraging this expanded catalog of variation to drive a new wave of genetic discovery by enabling exploration of the rare-variant hypothesis and higher resolution CNV research in a costeffective tools. 32
33 Thank you!