The Agilent Technologies SureSelect Platform for Target Enrichment
Focus your next-gen sequencing on DNA that matters
Emily M. Leproust, Ph.D., Director, Applications and Chemistry R&D, Genomics
Agenda:
- Introduction
- Applications for Mendelian diseases research
- Custom biomarker discovery and profiling
- Applications for cancer biomarker discovery and profiling
- Automation
Page 2
Target Enrichment: A Highly Enabling Process
What?
- Also referred to as genome partitioning, targeted re-sequencing, or DNA capture
- Captures genomic material of interest for next-generation sequencers (e.g. Illumina, SOLiD, 454)
Why?
- Sequence your regions of interest!
- Enables focus on a subset of the genome
- Saves both time and money for downstream sequencing
- Identify homozygous and heterozygous variants in targets relative to the reference genome
[Figure: gDNA -> enriched library]
Page 3
Sequencing Technology Keeping up with Science
- Older methods, appropriate for lower-throughput discovery: PCR + Sanger sequencing
- Newer methods, appropriate for high-throughput discovery: SureSelect Platform + next-gen sequencing
(Figure from http://www.sciencemag.org/cgi/content/full/291/5507/1221/f1)
Agilent's SureSelect Platform: New Options
- SureSelect Target Enrichment System*: developed in collaboration with the Broad Institute (Dr. Chad Nusbaum et al.); 1-3 µg gDNA
- SureSelect DNA Capture Array: developed in collaboration with Cold Spring Harbor (Dr. Greg Hannon et al.); Agilent 60-mer array; 1-5 µg gDNA (with WGA) or 20 µg gDNA (unamplified)
*Flagship method
Page 5
Distinct Target Enrichment Products for Distinct Project Needs
Feature | SureSelect Target Enrichment System | SureSelect DNA Capture Array
Throughput | High | Low
Study sizes | 10-1,000s of samples | 1-10 samples
DNA input | 1-3 µg* | 1-20 µg**
Capture of target DNA | Up to 6.8 Mb (1x tiling, 120-mer baits) for custom, or catalog products | 1 Mb (20x tiling, custom 60-mer baits)
*Minimum of 500 ng of prepped sample required.
**Depends upon use of whole genome amplification.
Page 6
Workflow: library preparation (<3 µg gDNA; Illumina or SOLiD) -> hybridization / capture (0.5 µg prepped library, 24 hours) -> bead separation -> wash / elution / amplification
Advantages of Agilent Target Enrichment:
- Long baits tolerate mismatches
- RNA-DNA hybrids are stronger than DNA-DNA hybrids
- RNA probe is strand-specific: allows a large molar excess of bait; target-limited, which improves uniformity
- Easily automated: all steps are liquid handling
- Low input DNA (<3 µg)
- 24-hour hybridization
- Same high-quality nucleic acids as Agilent arrays
- Working on solution enrichment since 2006, under license from Broad
Baits:
- cRNA probes
- Long (120 bp)
- Biotin labeled
- User-defined (eArray)
- SurePrint synthesis
Page 7
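SureSelect Target Enrichment System: Workflow (SOLiD 3, SOLiD 4): <3 µg gDNA input; 0.5 µg prepped library into hybridization; 24 hours hybridization. Page 8
SureSelect Target Enrichment System: Workflow (Illumina GAIIx, HiSeq 2000): <3 µg gDNA input; 0.5 µg prepped library into hybridization; 24 hours hybridization. Page 9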
Broad Paper on Cover of February, 2009 Nature Biotechnology Underlying Technology of SureSelect Target Enrichment System Page 10
Agilent SureSelect Target Enrichment Efficacy: Sample Coverage Plot
[Figure: UCSC Genome Browser coverage track, y-axis 0 to 269. SureSelect bait coordinates shown as black bars; 2x tiling design approach.]
This plot (generated in the UCSC Genome Browser) shows sequence coverage data for a representative portion of the Agilent SureSelect Target Enrichment demo kit, which covers the RefSeq exons on the non-pseudoautosomal portion of Chromosome X using 2x tiling. In red is the log of the sequence coverage. Bait coordinates are denoted in black and the exons that constitute the target region are in blue. Sequence coverage is higher for those exons that are covered by more than one overlapping bait.
Page 11
Exon capture is powerful to study Mendelian diseases
- Mendelian diseases are caused by coding mutations (with some exceptions)
- Exons are only ~1-1.4% of the human genome (30-50 Mb); primarily protein-coding regions
- Advantage: much less sequencing, ~5% of WGS, so up to 20x more samples
- Disadvantage: misses non-coding variants
- Why coding+? More interpretable; easier to follow up; especially adapted to the study of Mendelian diseases
Designs:
- All exons on the X chromosome: 7674 exons, 3 Mb
- CCDS exons: v1
- CCDS + RefSeq, 38 Mb: v2 (Broad)
- GENCODE, 50 Mb (Sanger); includes ncRNA
Page 13
Applications to Mendelian disorders and many more to come Page 14
The Problem: Linkage analysis mapped the family inheritance pattern to a 28 Mb region (build 36). The region contains >200 genes, so sequencing and validating all the variants is a daunting task.
Solution: Identify a 2nd family that manifests the syndrome, determine linkage, and perform targeted sequencing using the X-demo kit on heterozygote females of both families.
Page 15
Target: 2.7 Mb; 36 bp SE sequencing on Illumina GA
Findings:
- Family 1 heterozygote showed a frameshift due to an A insertion in RBM10 (RNA binding motif 10)
- Family 2 heterozygote showed a G to A substitution in RBM10 (RNA binding motif 10)
- Both mutations were confirmed by Sanger sequencing
- Inhibition of the orthologous Rbm10 expression in developing mouse embryos using anti-sense probes showed patterns that were consistent with the observed human malformations
Page 16
Human All Exon Kits: Most comprehensive coverage
Database | Original design | Exome V2 | 50Mb design (New)
Content | CCDS Sept. 2008 | CCDS Sept. 2008 + additional RefSeq content, including CCDS Sept. 2009 exons | GENCODE and Sanger (includes CCDS and Broad-defined v2 content as well)
CCDS (Sept. 2009) | 93.76% | 99.01% | 99.86%
CNV (Mar. 2010) | 23.98% | 27.49% | 30.62%
Ensembl (6/16/2010) | 65.58% | 71.37% | 75.24%
miRNA (miRBase 14) | 90.00% | 90.00% | 92.78%
GenBank (6/16/2010) | 75.96% | 89.07% | 90.74%
RefSeq Genes (6/16/2010) | 86.69% | 93.29% | 96.47%
RefSeq Transcripts (6/16/2010) | 88.85% | 95.07% | 97.50%
Total | 37 Mb | 38 Mb | 50 Mb
Developed with | Broad | Broad | Sanger
- Human All Exon kits can be customized (PLUS) with up to 6.8 Mb of additional custom content
- Human All Exon kits can be multiplexed on SOLiD4 and HiSeq2000
Page 17
Human All Exon 50Mb (2x76bp, 50-60M HQ reads)
- Most comprehensive Human All Exon content available
- 38Mb design = a subset of 50Mb
Performance (bar chart, 0-100%; values listed in legend order):
- % on target +/- 100bp: 76.3%
- Uniformity (3/4 mean with upper tail): 85.1%
- % bases with 1x coverage: 96.7%
- % bases with 10x coverage: 87.9%
- % bases with 20x coverage: 77.5%
Sequencing capacity:
- 0.5-1 sample / lane on GAIIx
- 1-3 samples / lane on HiSeq
- 5-10 samples / flowcell on SOLiD4
Chemistry recommended: PE 2x76bp (Illumina); PE 50+25 (SOLiD)
Multiplexing: Illumina, SOLiD
Page 18
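The read counts and on-target figure on this slide can be turned into a rough mean-coverage estimate. The sketch below is a back-of-the-envelope calculation, not an Agilent formula. It assumes that "50-60M HQ reads" counts individual 76 bp reads, that 76.3% is the on-target (+/- 100bp) value, and that all on-target bases fall within the 50 Mb design. The function name `mean_target_coverage` is hypothetical.

```python
def mean_target_coverage(reads, read_len_bp, on_target_fraction, target_bp):
    """Approximate mean depth over the target: usable bases / target size."""
    if target_bp <= 0:
        raise ValueError("target_bp must be positive")
    return reads * read_len_bp * on_target_fraction / target_bp


# Assumed inputs: 55M reads is the midpoint of the slide's 50-60M range,
# 0.763 is taken as the on-target (+/- 100bp) fraction, 50 Mb is the design size.
estimate = mean_target_coverage(55_000_000, 76, 0.763, 50_000_000)
print(f"approximate mean coverage: {estimate:.1f}x")
```

Because the +/- 100bp window includes flanking bases outside the baits, the true depth over the design itself would be somewhat lower than this estimate.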
Comparison of SNP calls with HapMap
[Figure: two bar charts comparing Human All Exon v2 and Human All Exon 50Mb. Panels: Genotype Sensitivity vs. HapMap; Genotype Concordance vs. HapMap. Categories: GT is REF, GT is variant HET, GT is variant HOM, OVERALL. Labeled bar values range from 94.9% to 99.8%.]
Page 19
All Exon Plus (add custom content to your All Exon)
Is the Human All Exon not hitting all of your regions of interest? Enter your custom regions in eArray.
All Exon library (CCDS exons, >1000 ncRNAs, 38 Mb) + your custom library (your regions of interest)
Page 20
Human All Exon Plus (All Exon + custom content)
1 tube capture, 1 lane sequenced at 2x76bp on GAIIx = ~2 Gb
[Figure: two bar charts across five samples (Exome + 0.87 Mb, Exome + 1.7 Mb, Exome + 3.4 Mb, Exome + 6.8 Mb, Exome Control). Panel 1: SNP Analysis vs. HapMap (Sensitivity, Concordance; 0-100%). Panel 2: SNP Analysis vs. dbSNP (No. SNPs: Concordant, Novel, Mismatched; 0-35,000).]
Data labels on the SNP count chart (series assignment not recoverable from the extraction): 22394, 23224, 24337, 21976, 26352; 4480, 5173, 5528, 5816, 6182
Page 21
Efficient Capture of a 5 bp Deletion on the X Chromosome: Menkes Syndrome
SureSelect Target Enrichment Kit efficiently captures the 5 bp mutant. Readout on Illumina GA.
Wildtype bait design (hg18_chrx_77131408_77131467_+), followed by the aligned reads:
CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT
CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA
  ATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAG
   TTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGC
     GTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAG
        TATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATT
         ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
         ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
         ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
           CAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAA
              CCTCATCTT-----A-TAGAGGAAATGAAAAAGCAGATTGAAGCT
Page 22
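The deletion can be read directly off the gapped reads on this slide. The sketch below is a toy illustration only: it takes the wildtype sequence and three of the gapped reads as printed, anchors each read by exact match of its left flank, and reports the first gap. The function name `first_deletion` is hypothetical, and a real analysis would use an aligner and a variant caller rather than string matching.

```python
import re

# Wildtype reference and three of the gapped reads, copied from the slide.
REFERENCE = "CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT"
GAPPED_READS = [
    "CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA",
    "TTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGC",
    "CAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAA",
]


def first_deletion(reference, gapped_read):
    """Return (0-based start, deleted bases) for the first gap run, or None."""
    gap = re.search(r"-+", gapped_read)
    if gap is None:
        return None
    left_flank = gapped_read[:gap.start()]
    # Toy placement: exact match of the bases to the left of the gap.
    offset = reference.find(left_flank)
    if offset < 0:
        return None
    start = offset + len(left_flank)
    gap_length = gap.end() - gap.start()
    return start, reference[start:start + gap_length]


calls = [first_deletion(REFERENCE, read) for read in GAPPED_READS]
for read, call in zip(GAPPED_READS, calls):
    print(call, read)
```

Positions are offsets into the 59-base window shown on the slide, not genomic coordinates.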
Other Applications of Targeted Re-Sequencing
- Capture any custom genomic regions (introns, exons, UTRs, regulatory, etc.)
- Ideal for biomarker discovery and profiling (e.g. cancer)
- Ideal for custom SNP follow-up
- Ideal for characterization of large sample cohorts
Key enabling features:
- High throughput: 12 Illumina indexes / up to 96 samples per run; 16 SOLiD barcodes / up to 128 samples per run
- Only pay for what you capture, scalable from 0.2 to 6.8 Mb (sweet spot for 3rd-gen sequencing): <0.2 Mb, 0.2-0.5 Mb, 0.5-1.5 Mb, 1.5-3 Mb, 3-6 Mb
- Very reproducible, excellent allelic balance for accurate heterozygote calls
- Custom and catalog content (kinome)
- Automation (library prep and capture)
Page 24
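The per-run sample counts quoted here follow from simple multiplication. The sketch below assumes 8 lanes per Illumina flow cell and 8 segments per SOLiD run. Those divisors are not stated on the slide; they are the values that reproduce its 96 and 128 figures. The helper names are hypothetical.

```python
import math


def samples_per_run(indexes_per_lane, lanes_per_run):
    """Samples that fit in one run when every lane carries a full index set."""
    return indexes_per_lane * lanes_per_run


def runs_needed(cohort_size, per_run):
    """Whole runs required to sequence a cohort of the given size."""
    return math.ceil(cohort_size / per_run)


# Assumption: 8 lanes (Illumina) or 8 segments (SOLiD) per run.
ILLUMINA_PER_RUN = samples_per_run(12, 8)
SOLID_PER_RUN = samples_per_run(16, 8)

print(ILLUMINA_PER_RUN, SOLID_PER_RUN)
print(runs_needed(1000, ILLUMINA_PER_RUN), runs_needed(1000, SOLID_PER_RUN))
```

How many samples can really share a lane depends on capture size and the coverage required, so these counts are upper bounds.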
Inherited loss-of-function mutations in the tumor suppressor genes BRCA1, BRCA2, and multiple other genes predispose to high risks of breast and/or ovarian cancer. Cancer-associated inherited mutations in these genes are collectively quite common, but individually rare or even private. To determine whether massively parallel, next-generation sequencing would enable accurate, thorough, and cost-effective identification of inherited mutations for breast and ovarian cancer, we developed a genomic assay to capture [with Agilent's custom SureSelect], sequence, and detect all mutations in 21 genes, including BRCA1 and BRCA2, with inherited mutations that predispose to breast or ovarian cancer. There were zero false-positive calls of nonsense mutations, frameshift mutations, or genomic rearrangements for any gene in any test sample. This approach enables widespread genetic testing and personalized risk assessment for breast and ovarian cancer. Page 25
Deletion up to 19bp Excellent allelic balance Page 26
eArray: a free tool to design custom SureSelect libraries
Workflow through the eArray web portal (shown for Customer A and Customer B):
1. Enter genomic locations and design parameters (one set per region list, e.g. #1, #2, #3)
2. Bait design produces a bait set for each list (Baits #1, #2, #3)
3. Library design: combine bait sets and choose a kit size, giving a virtual bait library (up to 55,000 unique baits per kit)
4. Quote, then order the library
5. Send to manufacturing (kit size info, etc. is passed on): real DNA bait library -> RNA bait library
6. Assemble the kit and ship it to the customer
Page 27
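To make the idea of bait tiling concrete, the sketch below lays fixed-length baits across a target interval at a chosen tiling depth. This is a naive illustration and not the eArray design algorithm, which the slides do not describe. The function name `tile_baits`, the half-open coordinates, and the edge handling are all assumptions. The 120-base bait length and the 55,000-bait limit come from the slides.

```python
BAIT_LENGTH = 120            # bait length quoted in the slides
MAX_BAITS_PER_KIT = 55_000   # "Up to 55,000 unique baits/kit"


def tile_baits(start, end, bait_len=BAIT_LENGTH, tiling=2):
    """Return half-open (start, end) bait intervals covering [start, end).

    With tiling=1 baits sit end to end; with tiling=2 each bait starts half
    a bait length after the previous one, so interior bases are covered twice.
    """
    if end <= start or tiling < 1 or bait_len < tiling:
        raise ValueError("invalid interval or tiling parameters")
    step = bait_len // tiling
    baits = []
    position = start
    while True:
        baits.append((position, position + bait_len))
        if position + bait_len >= end:
            break
        position += step
    return baits


print(tile_baits(0, 300, tiling=1))
print(tile_baits(0, 300, tiling=2))

# Bases covered by a full kit if no two baits overlap (1x tiling).
max_bases_1x = MAX_BAITS_PER_KIT * BAIT_LENGTH
print(max_bases_1x)
```

At 1x tiling a full kit of 55,000 baits spans 6.6 Mb, which is in the same range as the 6.8 Mb maximum custom capture size quoted elsewhere in the deck.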
eArray supported species: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, C. familiaris, S. cerevisiae, S. pombe, G. gallus, B. taurus, A. thaliana Page 28
eArrayXD: import your own genomes. Available now. Page 29
Indexing/Barcoding Procedure with SureSelect
Steps:
1. Library prep
2. Pre-capture PCR, giving a prepared library without index tag
3. SureSelect hybridization/capture
4. Post-capture PCR, giving a captured library with index tag
Primers named in the diagram: Illumina / SOLiD primer; SureSelect Universal Primer; SureSelect Pre-Capture Primer; Illumina / SOLiD Index Primer (up to 12/16)
For optimum performance: capture, index, pool, sequence
- Combine multiple samples per sequencing lane
- Save on capture costs with production scale
- Pay only for the Mb you capture: <0.2 Mb (12-16 samples), 0.2-0.5 Mb, 0.5-1.5 Mb, 1.5-3 Mb (3-4 samples), 3-6.8 Mb
- Similar protocol available for SOLiD (x16 primers)
Page 30
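The capture-size tiers listed above can be expressed as a small lookup. The sketch below is illustrative only: the function name `size_tier` is hypothetical, and the choice to place a boundary value such as exactly 0.2 Mb in the upper tier is an assumption, since the slide does not say how boundaries are handled.

```python
import bisect

# Upper bounds (in Mb) of every tier except the last, as listed on the slide.
TIER_UPPER_BOUNDS_MB = [0.2, 0.5, 1.5, 3.0]
TIER_LABELS = ["<0.2 Mb", "0.2-0.5 Mb", "0.5-1.5 Mb", "1.5-3 Mb", "3-6.8 Mb"]
MAX_CAPTURE_MB = 6.8


def size_tier(capture_mb):
    """Map a custom capture size in Mb to its tier label."""
    if not 0 < capture_mb <= MAX_CAPTURE_MB:
        raise ValueError("capture size must be in (0, 6.8] Mb")
    # bisect_right places a boundary value in the tier above it (assumption).
    return TIER_LABELS[bisect.bisect_right(TIER_UPPER_BOUNDS_MB, capture_mb)]


for size in (0.1, 0.2, 1.0, 3.0, 6.8):
    print(size, size_tier(size))
```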
SOLiD Barcoding: 16 barcodes of 0.2 Mb capture in 1 SOLiD quad, 1x50bp, 30M reads
[Figure: Standard index representation in a single SOLiD quad. Per-barcode bars (0-100%) for percentage of reads in targeted regions, in regions +/- 100bp, and in regions +/- 200bp.]
Fold representation relative to the mean (16 barcodes, in chart order): 0.86, 1.20, 0.98, 0.90, 1.03, 0.93, 0.96, 1.05, 0.90, 0.79, 1.02, 0.96, 1.17, 1.15, 1.05, 1.05
HQ reads: mean/barcode 1,844,819; median/barcode 1,843,950; total/quad 29,517,104
Page 31
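Fold representation is each barcode's read count divided by the mean count across barcodes. The sketch below checks the slide's summary numbers against that definition and back-calculates approximate per-barcode read counts. The per-barcode counts are estimates derived from the rounded fold values and are not data from the slide. The function name `fold_representation` is hypothetical.

```python
from statistics import mean

TOTAL_HQ_READS = 29_517_104  # total per quad, from the slide
FOLD = [0.86, 1.20, 0.98, 0.90, 1.03, 0.93, 0.96, 1.05,
        0.90, 0.79, 1.02, 0.96, 1.17, 1.15, 1.05, 1.05]


def fold_representation(counts):
    """Each count divided by the mean of all counts."""
    average = mean(counts)
    return [count / average for count in counts]


mean_reads = TOTAL_HQ_READS / len(FOLD)

# Approximate per-barcode reads implied by the rounded fold values.
estimated_reads = [round(fold * mean_reads) for fold in FOLD]

print(f"mean reads per barcode: {mean_reads:,.0f}")
print(f"fold range: {min(FOLD):.2f} to {max(FOLD):.2f}")
print(estimated_reads)
```

The 16 fold values average to 1.00 and the total divided by 16 matches the quoted mean per barcode, so the figures on the slide are internally consistent.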
SOLiD Barcoding: Comparison with dbSNP
16 barcodes of 0.2 Mb capture in 1 SOLiD quad, 1x50bp, 50M reads
[Figure: dbSNP concordance across multiple barcodes. Stacked bars of observed SNP calls vs. dbSNP for BC1-BC16; y-axis: number of SNP sites (0-350); series: Novel Sites, dbSNP Mismatches, dbSNP Concordant.]
Data labels per barcode (BC1-BC16, in legend order):
- Novel sites: 85, 87, 97, 97, 91, 108, 101, 96, 99, 108, 79, 99, 96, 102, 87, 92
- dbSNP mismatches: 0 for every barcode
- dbSNP concordant: 199, 201, 205, 203, 206, 206, 202, 201, 202, 204, 199, 204, 199, 200, 200, 199
We see a 100% concordance rate with dbSNP across all called SNPs. There are, on average, ~80 novel SNPs called for each sample.
http://www.broadinstitute.org/gsa/wiki/index.php/the_genome_analysis_toolkit
Page 32
Targeted Therapies: Expensive, Can they be more effective?
Herceptin cost: 40,000 per patient/year
[Figure: survival (0-100%) vs. time (months)]
Tito Fojo and Christine Grady, J Natl Cancer Inst. 2009;10:1044-8. Slide courtesy of Rene Bernards
KRAS mutation alters response to EGFR therapy in Colon Cancer
Mutation frequencies: BRAF mut 10%; KRAS mut 32%; PIK3CA mut 13%
[Figure: two panels, mutant KRAS and wild-type KRAS, each comparing +EGFR and -EGFR]
Amado et al., J Clin Oncol 2008;26:1626-1634. Slide courtesy of Rene Bernards
Current clinically useful correlations between kinome alterations and drug responses
- HER2 expression (breast): Herceptin
- BCR-ABL translocation (CML): Gleevec
- KRAS mutation (colon): Cetuximab (no response)
- EGFR mutation (lung): Erlotinib
Somewhat further away, but promising:
- BRAF mutation (melanoma): PLX4032
- ALK mutation/translocation (lung, neuroblastoma): PF02341066
- PIK3CA mutation (breast): Herceptin (no response)
Slide courtesy of Rene Bernards
SureSelect kinome: discovery and profiling of biomarkers related to disease and/or drug response
Content (G. Manning et al., Science 298, 1912 (2002)):
- 518 putative kinases
- 12 PI3K domain-containing genes
- 6 PI3K regulatory components
- 13 diglyceride kinases
- 18 genes frequently mutated in human cancer
- 19 genes specifically known to be mutated in breast cancer
Slide courtesy of Rene Bernards
Kinome Kit Performance: 3-5 samples per GAIIx lane / SOLiD quad Page 38
Illumina Library Prep and SureSelect Enrichment on the Bravo Automated Liquid Handling Platform Page 40
Next-Gen Sequencing and SureSelect Overview
Library prep: shear genomic DNA -> repair ends -> 3'-dA addition -> adapter ligation -> PCR enrichment -> prepped library
SureSelect target enrichment: prepped library + SureSelect oligo capture library -> library hybridization -> magnetic bead capture -> PCR -> QA
Then: Illumina cluster & sequence
Page 41
Next-Gen Sequencing and SureSelect Overview (automated on the Agilent Automation Bravo Liquid Handler)
Library prep: shear genomic DNA -> repair ends -> 3'-dA addition -> adapter ligation -> PCR enrichment -> prepped library
SureSelect target enrichment: prepped library + SureSelect oligo capture library -> library hybridization -> automated magnetic bead capture -> PCR -> QA
Then: Illumina cluster & sequence
The diagram marks six automated purification steps (five alongside library prep, one at the enrichment stage).
Page 42
Most Comprehensive, Ever Expanding SureSelect Menu
Product | Target amount | Reactions/kit | Product definition
X-demo | 3.4 Mb | 5 | Exons in the human X-chr
All Exon v1 | 38 Mb | 5-10,000 | Catalog content from CCDS + >1000 ncRNA
All Exon Plus | 38 Mb + up to 6.8 Mb of custom content | 5-10,000 | Add custom content to All Exon catalog content
All Exon v2 | 38 Mb + RefSeq | 5-10,000 | CCDS Sept. 2009 + additional RefSeq
All Exon 50Mb | 50 Mb | 5-10,000 | GENCODE content; most comprehensive coverage; multiplexable
Kinome | <3 Mb | 5-10,000 | All kinases
Indexed custom content | <0.2 Mb, 0.2-0.49 Mb, 0.5-1.49 Mb, 1.5-2.9 Mb, 3-6.8 Mb | 10-5,000 | Cost-saving custom offering; Illumina (12 indexes) and SOLiD (16 barcodes)
Page 43
Summary: Efficient Enrichment for Re-Sequencing
Most comprehensive Human All Exon catalog products:
- New 50Mb catalog content
- Multiplexable (HiSeq and SOLiD4)
- With option to add up to 6Mb custom (Exon Plus)
Enables biomarker discovery and profiling:
- New Kinome catalog content and custom content
- Indexing/Barcoding for Illumina and SOLiD
- Scalable and affordable from 0.2 to 6.8 Mb
- Free web portal, eArray, enables fully custom design
Fastest way to a biological answer:
- Low DNA input
- Accurate SNP calls
- Fast, reproducible, and automatable
- Available for SE and PE on Illumina and SOLiD
Page 44
Acknowledgements
Collaborators:
- Broad Institute: Chad Nusbaum et al., Stacey Gabriel et al., Sheila Fisher et al.
- Sanger Institute: Daniel Turner et al.
- NKI: Rene Bernards, Ian Majewski
- All our early access collaborators (over 20 institutions worldwide)
- Life Technologies and Illumina R&D/Marketing
Page 45
Thank you QUESTIONS?