The Agilent Technologies SureSelect Platform for Target Enrichment
  Focus your next-gen sequencing on DNA that matters
  Kimberly Troutman, Field Applications Scientist
  January 27th, 2011
Agenda
  1 Introduction: SureSelect™
  2 Exome Approach for Genetic Diseases
  3 Complex Diseases
  4 Custom Biomarker Discovery and Profiling
  5 Targeted RNA Sequencing
  6 Kinome Kit
  7 New SureSelect Products
Target Enrichment: A Highly Enabling Process
  What?
    Also referred to as genome partitioning, targeted re-sequencing, or DNA capture
    Captures genomic material of interest for a next-generation sequencer (e.g., Illumina, SOLiD, 454)
  Why?
    Sequence your regions of interest!
    Enables focus on a subset of the genome
    Saves both time and money for downstream sequencing
    Identify homozygous and heterozygous variants in targets relative to the reference genome
  [Figure labels: gDNA, enriched library]
Agilent's SureSelect Platform: Two Options
  SureSelect Target Enrichment System (flagship method)
    Developed in collaboration with the Broad Institute (Dr. Chad Nusbaum et al.)
    3 µg gDNA
    Released February 2009
  SureSelect DNA Capture Array
    Developed in collaboration with Cold Spring Harbor (Dr. Greg Hannon et al.)
    Agilent 60-mer array, 244k & 1M features
    1-5 µg gDNA (with WGA) or 20 µg gDNA (unamplified)
    Released July 2009
Baits
  cRNA probes
  Long (120 bases)
  Biotin labeled
  Sequencers shown: Illumina GAIIx, Illumina HiSeq, SOLiD 3, SOLiD 4, 5500, GS FLX & GS Junior
SureSelect Target Enrichment Kit Choices
  Columns: product; target amount (Mb); reactions/kit; product definition
  Human X-demo; 3.05; 5; Human X-chr exons
  Human All Exon v1; 38; 5-10,000; catalog content from CCDS 2008 plus >1000 ncRNA
  Human All Exon Plus; 38-50 plus up to 6.8 of custom content; 5-10,000; add custom content to All Exon catalog content
  Human All Exon v2; 44; 5-10,000; CCDS Sept. 2009 plus additional RefSeq
  Human All Exon 50Mb; 50; 5-10,000; GENCODE content, most comprehensive coverage
  Kinome; 3.2; 5-10,000; all kinases
  Custom content; <0.2, 0.2-0.49, 0.5-1.49, 1.5-2.9, 3-6.8; 10-5,000; custom offering
  Side labels: multiplexable; indexed for Illumina (12 indexes) and SOLiD (16 barcodes)
SureSelect Kits Multiplexing Capability
  Target enrichment size range   Illumina GA  Illumina HiSeq 2000  SOLiD Octet  SOLiD Quadrant  SOLiD Flow Cell  SOLiD Full Run
  <200 Kb targets                12           12                   16           16              16               16
  200 Kb - 499 Kb targets        12           12                   16           16              16               16
  500 Kb - 1.49 Mb targets       12           12                   5            10              16               16
  1.5 Mb - 2.99 Mb targets       12           12                   3            7               16               16
  3.0 Mb - 6.0 Mb targets        8            12                   2            3               16               16
  Human All Exon 38 Mb           1            4                    0            1               3                7
  Human All Exon 50 Mb           1            3                    0            0               3                5
Agilent SureSelect XT Kits
  gDNA kit + library prep kit + SureSelect reagents = SureSelect XT Kit
  Coupled with an optimized gDNA prep and library prep kit, SureSelect XT allows the use of one kit for the entire sample-prep-to-sequencing target enrichment workflow
  Kit composition:
    gDNA isolation: lysis buffer and enzymes required for isolation
    Library prep: buffers, reagents, enzymes, and indexes needed for prep
    SureSelect Target Enrichment Kit: hybridization buffers and baits
  All kits are available in the XT format, both catalog kits and custom content:
    SureSelect XT All Exome and SureSelect XT All Exome Plus
    SureSelect XT Human Kinome and SureSelect XT Human X Chromosome
    SureSelect XT Custom, from <200 Kb to >6.8 Mb (up to 34 Mb in Spring 2011)
  Platform support:
    Illumina GAIIx and HiSeq 2000 (Protocol v1.0, Nov 2010)
    SOLiD 3 / 4 and 5500 (available soon)
SureSelect XT: complete sample-to-sequencer solutions for your target enrichment needs
  Manual procedure for a small number of samples
  Workflow shown: genomic DNA prep, library prep (GA, SOLiD), Bioanalyzer and qPCR quantification, sequencer
Exon Capture is a Powerful Tool to Study Mendelian Diseases
  Mendelian diseases are caused by coding mutations (with some exceptions)
  Exons are only ~1-1.4% of the human genome (30-50 Mb), primarily protein-coding regions
  Advantages: much less sequencing (~5% of WGS), so up to 20x more samples
  Why coding?
    More interpretable
    Easier to follow up
    Especially adapted to the study of Mendelian diseases
  Designs shown:
    All exons on the X chromosome: 7674 exons, 3 Mb
    v1: CCDS exons, 38 Mb
    v2 (Broad): CCDS + RefSeq
    50 Mb (Sanger): GENCODE, includes ncRNA
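The savings argument on this slide is simple arithmetic, sketched below. The genome size of about 3,100 Mb is an assumed round figure that does not appear on the slide, and the function names are hypothetical; the ~5% sequencing share comes from the slide. The percentage computed for each design depends on the genome size and exon definition used, so it will not exactly reproduce the slide's ~1-1.4% range.

```python
# Assumption: haploid human genome size in Mb (round figure, not from the slide).
HUMAN_GENOME_MB = 3100.0


def target_fraction(target_mb, genome_mb=HUMAN_GENOME_MB):
    """Fraction of the genome covered by a capture design of target_mb megabases."""
    return target_mb / genome_mb


def sample_multiplier(sequencing_fraction):
    """How many more samples fit in the same sequencing budget.

    sequencing_fraction is the share of whole-genome sequencing effort
    needed per captured sample (the slide quotes ~5%).
    """
    return 1.0 / sequencing_fraction


if __name__ == "__main__":
    for size_mb in (38, 50):
        print(f"{size_mb} Mb design: {target_fraction(size_mb):.2%} of the genome")
    print(f"At ~5% of WGS effort: ~{sample_multiplier(0.05):.0f}x more samples")
```

The sequencing share (~5%) is larger than the target's share of the genome because a capture experiment also sequences some off-target reads and is typically run at higher depth than whole-genome sequencing.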
Applications to Mendelian disorders, and many more to come
SureSelect Human All Exon Kits
  Content definitions:
    All Exon v1: CCDS Sept. 2008
    All Exon v2: CCDS Sept. 2008 + additional RefSeq content, including CCDS Sept. 2009 exons
    All Exon 50 Mb: GENCODE and Sanger (includes CCDS and Broad-defined v2 content as well)
  Database coverage                All Exon v1  All Exon v2  All Exon 50 Mb
  CCDS (Nov. 2010)                 89.6%        98.2%        99.5%
  CNV (Mar. 2010)                  23.98%       27.49%       30.62%
  Ensembl (Aug. 2010)              79.9%        90.9%        96.2%
  miRNA (miRBase 14)               90.0%        90.0%        92.8%
  GenBank (6/16/2010)              75.96%       89.07%       90.74%
  RefSeq Genes (Nov. 2010)         85.0%        96.9%        99.0%
  RefSeq Transcripts (6/16/2010)   88.85%       95.07%       97.50%
  Target size                      38 Mb        44 Mb        50 Mb
  Developed with                   Broad        Broad        Sanger
  Human All Exon kits can be customized (PLUS) with up to 6.8 Mb of additional custom content
  Human All Exon kits can be multiplexed on SOLiD4 and HiSeq2000
Human All Exon 50Mb
  The most comprehensive Human All Exon content available
  38 Mb design = a subset of 50 Mb
  Sequencing capacity:
    0.5-1 sample / lane (GAIIx)
    1-3 samples / lane (HiSeq)
    5-10 samples / full slide (SOLiD4)
  Chemistry recommended:
    PE 2x76 bp (Illumina v4)
    PE 50+25 (SOLiD)
  Multiplexing: Illumina, SOLiD
  [Chart: 2x76 bp, 50-60M HQ reads. Values in legend order: % on target +/- 100 bp, 76.32%; uniformity (3/4 mean with upper tail), 85.07%; % bases with 1x coverage, 96.65%; % bases with 10x coverage, 87.93%; % bases with 20x coverage, 77.46%]
Comparison of SNP Calls with HapMap
  [Chart, two panels: Genotype Sensitivity vs. HapMap and Genotype Concordance vs. HapMap. Series: Human All Exon v2 and Human All Exon 50Mb. Categories: GT is REF, GT is variant HET, GT is variant HOM, plus OVERALL in the concordance panel. Data labels range from 94.9% to 99.8%.]
All Exon Plus
  Is the Human All Exon Kit not hitting all of your regions of interest?
  Enter your custom regions in eArray
  All Exon Library (CCDS exons, >1000 ncRNAs, 38 Mb) + Your Custom Library (your regions of interest, 6.8 Mb)
Human All Exon Plus Performance
  1 tube capture, 1 lane seq. at 2x76 bp on GAIIx = ~2 Gb
  [Chart, two panels: SNP Analysis vs. HapMap (Sensitivity, Concordance) and SNP Analysis vs. dbSNP (Concordant, Novel, Mismatched; y-axis No. SNPs). Categories: Exome + 0.87 Mb, Exome + 1.7 Mb, Exome + 3.4 Mb, Exome + 6.8 Mb, Exome Control. Data labels as extracted: 22394, 23224, 24337, 21976, 26352; 4480, 5173, 5528, 5816, 6182.]
Beyond Mendelian Diseases: Complex Diseases
Beyond Mendelian Diseases: Complex Diseases
  [Figure labels: 25 bp deletion, 7 bp deletion, 10 bp deletion, 11 bp deletion]
Other Applications of Targeted Re-Sequencing
  Capture any custom genomic regions (introns, exons, UTRs, regulatory regions, etc.)
  Ideal for biomarker discovery and profiling (e.g., cancer)
  Ideal for custom SNP follow-up
  Ideal for characterization of large sample cohorts
  Key enabling features:
    High throughput
      12 Illumina indexes / up to 96 samples per run
      16 SOLiD barcodes / up to 128 samples per run
    Pay only for what you capture: scalable from 0.2 to 6.9 Mb (sweet spot for 3rd-gen sequencing)
      Size tiers: <0.2 Mb, 0.2-0.5 Mb, 0.5-1.5 Mb, 1.5-3 Mb, 3-6.9 Mb
    Very reproducible, with excellent allelic balance for accurate heterozygote calls
    Custom and catalog content (kinome)
    Automation (library prep and capture)
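The per-run sample counts quoted above follow from multiplying the number of indexes by the number of independent sequencing units in a run. The sketch below assumes 8 lanes per Illumina flow cell and 8 segments per SOLiD run (an octet layout); those unit counts are assumptions that are not stated on the slide, and the function name is hypothetical.

```python
def samples_per_run(indexes_per_unit, units_per_run):
    """Maximum samples per run when every lane/segment carries a full index set."""
    return indexes_per_unit * units_per_run


# Assumption: 8 lanes per Illumina flow cell, 8 segments per SOLiD octet run.
ILLUMINA_SAMPLES = samples_per_run(indexes_per_unit=12, units_per_run=8)
SOLID_SAMPLES = samples_per_run(indexes_per_unit=16, units_per_run=8)

if __name__ == "__main__":
    print(f"Illumina: up to {ILLUMINA_SAMPLES} samples per run")
    print(f"SOLiD: up to {SOLID_SAMPLES} samples per run")
```

These are upper bounds for small targets. For larger designs, the multiplexing table earlier in the deck lists fewer samples per lane or segment.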
Target Enrichment Design Application in eArray
  eArray is a tool to design and order custom microarrays, qPCR primers, and SureSelect products (and it is free!)
  eArray is divided into Application Spaces, which allow for application-specific functionality
  Target Enrichment application space features:
    Create custom baits and bait libraries
    Search existing designs/baits (catalog and custom)
    Upload custom bait designs
    Download design files
    Share designs
    Get quotes
Customize your SureSelect™ Kit
  Create your own design or add extra custom sequence to a catalog design, up to 6.8 Mb
  eArray web portal: https://earray.chem.agilent.com
  [Workflow diagram: customers (A, B) submit gene IDs (Gene ID 1-3); each gets a bait design (Baits #1-3); library design sets the kit size as a virtual bait library of up to 55,000 unique baits; quote; order library; DNA bait library; RNA bait library; assemble kit; ship kit to customer]
Sequence Any Genome: eArray XD
The Power of Smaller SureSelect Panels
Inherited loss-of-function mutations in the tumor suppressor genes BRCA1, BRCA2, and multiple other genes predispose to high risks of breast and/or ovarian cancer. Cancer-associated inherited mutations in these genes are collectively quite common, but individually rare or even private. To determine whether massively parallel, next-generation sequencing would enable accurate, thorough, and cost-effective identification of inherited mutations for breast and ovarian cancer, we developed a genomic assay to capture [with Agilent's custom SureSelect], sequence, and detect all mutations in 21 genes, including BRCA1 and BRCA2, with inherited mutations that predispose to breast or ovarian cancer. There were zero false-positive calls of nonsense mutations, frameshift mutations, or genomic rearrangements for any gene in any test sample. This approach enables widespread genetic testing and personalized risk assessment for breast and ovarian cancer.
[Figure labels: deletion up to 19 bp; excellent allelic balance]
Efficient Capture of 5 bp Deletion on Chr X: Menkes Syndrome
  SureSelect Target Enrichment Kit efficiently captures the 5 bp mutant; readout on Illumina GA
  Wild-type bait design, hg18_chrx_77131408_77131467_+, followed by the aligned reads:
    CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT
    CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA
      ATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAG
       TTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGC
         GTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAG
            TATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATT
             ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
             ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
             ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
               CAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAA
                  CCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAAGCT
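The gapped reads on this slide can be checked against the wild-type bait sequence with a few lines of string handling. The sketch below is a toy illustration, not a real aligner or variant caller: `call_deletion` is a hypothetical helper that assumes each read is already shown with its gap (as on the slide) and contains exactly one run of gap characters. The reference and the three reads are copied from the slide.

```python
# Wild-type bait sequence and three of the gapped reads, copied from the slide.
REFERENCE = "CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT"
READS = [
    "CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA",
    "TATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATT",
    "CCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAAGCT",
]


def call_deletion(reference, gapped_read, gap_char="-"):
    """Return (offset, deleted_bases) for a read with one gap run, else None.

    offset is 0-based within `reference`. Hypothetical helper for illustration.
    """
    if gap_char not in gapped_read:
        return None
    left = gapped_read.split(gap_char, 1)[0]      # bases before the gap
    right = gapped_read.rsplit(gap_char, 1)[-1]   # bases after the gap
    gap_len = len(gapped_read) - len(left) - len(right)
    middle = gapped_read[len(left):len(left) + gap_len]
    if not left or not right or middle != gap_char * gap_len:
        return None  # gap at a read end, or more than one gap run
    anchor = reference.find(left)
    if anchor < 0:
        return None  # left flank not found in the reference
    start = anchor + len(left)
    end = start + gap_len
    if not reference.startswith(right, end):
        return None  # right flank does not continue after the deletion
    return start, reference[start:end]


if __name__ == "__main__":
    for read in READS:
        print(call_deletion(REFERENCE, read))
```

For each of the three reads, the helper reports the same 5-base deletion, `ATCTC`, at offset 23 of the 59-base bait sequence shown on the slide.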
SureSelect™ RNA Target Enrichment
  First in class: the first RNA capture product on the market
  Custom and catalog kits
  Design kits from 200 Kb to 3.4 Mb using the RNA Target Enrichment space on the eArray portal (PN G7581-G7585)
  RNA Capture Kinome Kit (catalog), containing the same content as the current SureSelect Kit (PN G7580)
SureSelect RNA Enrichment Protocol
  Start with 0.1-0.5 µg RNA
  Similar process to DNA target enrichment, except that it uses a cDNA NGS library
  Protocol time: ~3-4 days
  Protocols available for Illumina and SOLiD
  Individual or multiplexed samples
SureSelect Kinome
  Discovery and profiling of biomarkers related to disease and/or drug response
  Definition of the SureSelect Human Kinome Kit: 3.2 Mb (incl. UTRs)
  Original content defined by Prof. René Bernards (NKI):
    518 putative kinases
    12 PI3K domain-containing genes
    13 diglyceride kinases
    6 PI3K regulatory components
    9 inositol polyphosphate kinases
    9 PIP4/PIP5 kinases
    28 genes frequently mutated in human cancer
    19 genes specifically known to be mutated in breast cancer
    612 genes total
  G. Manning et al., Science 298, 1912 (2002)
  Slide courtesy of René Bernards
Kinome Kit Performance
  3-5 samples per GAIIx lane / SOLiD quad
  [Chart: Reproducible Performance Across Indexes. Kinome Index 1-5; metrics: % on target +/- 200 bp, % bases at 1X coverage, % bases at 10X coverage, % bases at 20X coverage; uniform read depth distribution]
  [Chart: Even Index Representation Across Single Lane. Values in label order for Kinome Index 1-5: 0.98, 1.15, 1.00, 1.15, 0.84]
SureSelect + 454
  SureSelect support for both 454 FLX and GS Junior sequencers
  Simplified protocol with a rapid library protocol (<3 hrs), the shortest in-solution capture protocol
  Only 500 ng of starting material required
  Full SureSelect product line available: custom bait libraries from <200 kb to 6.8 Mb capture size, or catalog kits (Human All Exon, Kinome, and X chromosome)
  Allows for detection of mutations, SNPs, indels, CNVs, and fusions/translocations
454 FLX SureSelect Custom Capture: 0.5 Mb
  1/4 PicoTiterPlate run, 67 Mb of sequence
  >95% of capture sequenced at 20X depth or greater
  Sample: DNA Bait and Pond NA10831 00.5Mb_B
  Average read length: 360.3 bases
  Total number of bases mapped: 67,157,190
  Percentage of reads in targeted regions: 57.07%
  Percentage of reads in regions +/- 300 bp: 59.35%
  Average read depth: 52.7-fold
  Percentage of targeted bases covered by:
    at least 1 read: 99.34%
    at least 5 reads: 98.99%
    at least 10 reads: 98.12%
    at least 20 reads: 95.42%
    at least 30 reads: 88.83%
    at least 40 reads: 76.10%
454 FLX SureSelect Custom Capture: 0.5 Mb SNP Detection
  343 HapMap SNPs were assayed in replicate samples of NA10831
  [Chart data labels: 98%, 95%, 97%, 100%, 100%, 100%]
Cancer Research: Gene Fusions
  Problem:
    Genomic rearrangement in tyrosine kinase (TK) genes can lead to deregulation of cellular signaling and cancer
    Identification of novel TK fusions is laborious
    TKs are attractive therapeutic targets
  Solution:
    SureSelect Custom Capture (908 Kb) based on known cancer-derived TK fusions
    Designed baits to a conserved GXGXXG motif in 90 TKs + ATK and BRAF
    Regions extended to include preceding exons/introns
    454 long-read sequencing
Agilent SureSelect XT Mouse All Exon Kit
  For SOLiD, Illumina & 454 platforms
  Available in 5 to 10,000 reactions
  Designed against UCSC mm9 / NCBI build 37 (July 2007)
  Exon definition derived from Ensembl + RefSeq
  Complete mouse exome coverage: 49.6 Mb capture, 221,784 exons and 24,306 genes
  Excellent coverage uniformity, on-target specificity, and SNP detection and accuracy
SureSelect XT Mouse All Exon Kit: Illumina GAIIx, single lane, 2x76 PE, 5.2 Gb
  C3H mouse genomic DNA, 49.6 Mb Mouse All Exon capture
  On-target reads: 69%
  98% of bases were covered at 1x or greater, and the average read depth was 54X
  84% of the targeted bases were sequenced at a depth of 20X or greater, enabling high-confidence SNP calling
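As a rough sanity check on these figures, a naive mean depth can be estimated as total bases times on-target fraction divided by target size. The sketch below is a back-of-the-envelope calculation under stated assumptions, not the method used to produce the slide's numbers: it treats the 69% on-target read fraction as a base fraction and ignores unmapped and duplicate reads. The function name is hypothetical.

```python
def naive_mean_depth(total_bases, on_target_fraction, target_bases):
    """Upper-bound mean depth: on-target bases spread evenly over the target.

    Assumes every sequenced base maps and that the on-target read fraction
    equals the on-target base fraction (both are simplifications).
    """
    return total_bases * on_target_fraction / target_bases


# Figures from the slide: 5.2 Gb of sequence, 69% on target, 49.6 Mb capture.
ESTIMATE = naive_mean_depth(5.2e9, 0.69, 49.6e6)
REPORTED = 54.0  # average read depth stated on the slide

if __name__ == "__main__":
    print(f"Naive upper bound: {ESTIMATE:.1f}x, reported: {REPORTED:.0f}x")
    print(f"Reported / naive: {REPORTED / ESTIMATE:.2f}")
```

The naive figure comes out near 72x, above the reported 54X. A gap in that direction is what one would expect if some bases are lost to filtering, duplicates, or reads that only partly overlap a target, though the slide does not say which of these apply.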
SureSelect Mouse All Exon Kit: SNP Sensitivity & Concordance
  [Chart data labels: 99%, 99%, 98%, 95%]
  A) Sensitivity of SNP detection relative to the Perlegen Mouse SNP data set was very high with the SureSelect XT Mouse All Exon Kit on the Illumina GAIIx platform, with 99 percent of reference SNPs detected for the C3H and DBA samples. Variant SNPs were also detected at high rates (99 percent) for both samples.
  B) Of the SNPs detected, concordance with the Perlegen Mouse SNP data set for both the C3H and DBA samples was 98 percent for the variants and 95 percent overall.
SureSelect XT Catalog Exome Kits
  Coming soon in 2011: SureSelect XT Exome Kits for Bovine, Canine, Xenopus, and Zebrafish
Acknowledgements
  Collaborators:
    Broad Institute: Chad Nusbaum et al., Stacey Gabriel et al., Sheila Fisher et al.
    Sanger Institute: Daniel Turner et al.
    NKI: René Bernards, Ian Majewski
    RIKEN Institute: Yoichi Gondo
    All our early access collaborators (over 20 institutions worldwide)