The Agilent Technologies


1 The Agilent Technologies SureSelect Platform for Target Enrichment: Focus your next-gen sequencing on DNA that matters. Emily M. Leproust, Ph.D., Director, Applications and Chemistry R&D, Genomics

2 Agenda: Introduction; Custom probe design / eArray update; Human All Exon (Plus); Indexing/barcoding update

3 Target Enrichment: A Highly Enabling Process. What? Also referred to as genome partitioning, targeted re-sequencing, or DNA capture. Captures genomic material of interest for next-generation sequencers (e.g., Illumina, SOLiD, 454); the remaining genomic material is discarded. Why? Addresses a major workflow bottleneck. Enables focus on a subset of the genome. Saves both time and money for downstream sequencing. Identifies homozygous and heterozygous variants in targets relative to the reference genome.

4 Agilent's SureSelect Platform: New Options. SureSelect Target Enrichment System* (developed in collaboration with the Broad Institute, Dr. Chad Nusbaum et al.): 1-3 µg gDNA. SureSelect DNA Capture Array (developed in collaboration with Cold Spring Harbor, Dr. Greg Hannon et al.): Agilent 60mer array; 1-5 µg gDNA (with WGA) or 20 µg gDNA (unamplified). *Flagship method

5 Workflow: Library Preparation (<3 µg gDNA; Illumina or SOLiD), Hybridization / Capture (0.5 µg prepared library, 24 hours), Bead Separation, Wash / Elution / Amp. Baits: - cRNA probes - Long (120 bp) - Biotin labeled - User-defined (eArray) - SurePrint synthesis

6 Workflow: Library Preparation (<3 µg gDNA; Illumina or SOLiD), Hybridization / Capture (0.5 µg prepared library, 24 hours), Bead Separation, Wash / Elution / Amp. Advantages of Agilent Target Enrichment: Long baits tolerate mismatches. RNA-DNA hybrids are stronger than DNA-DNA hybrids. The RNA probe is strand specific, which allows a large molar excess of bait, so capture is target limited and uniformity improves. Easily automated: all steps are liquid handling. Low input DNA (<3 µg). 24-hour hybridization. Same high-quality nucleic acids as Agilent arrays. Working on solution enrichment since 2006, under license from the Broad. Baits: - cRNA probes - Long (120 bp) - Biotin labeled - User-defined (eArray) - SurePrint synthesis

7 SureSelect Target Enrichment System: Workflow (<3 µg gDNA input; 0.5 µg prepared library; 24-hour hybridization) for SOLiD 3 and SOLiD 4

8 SureSelect Target Enrichment System: Workflow (<3 µg gDNA input; 0.5 µg prepared library; 24-hour hybridization) for Illumina GAIIx and HiSeq 2000

9 Broad Paper on Cover of February 2009 Nature Biotechnology: Underlying Technology of SureSelect Target Enrichment System

10 Agilent SureSelect Target Enrichment Efficacy: Sample Coverage Plot. SureSelect bait coordinates: black bars; 2x tiling design approach. This plot (generated in the UCSC Genome Browser) shows sequence coverage data for a representative portion of the Agilent SureSelect Target Enrichment demo kit, which covers the RefSeq exons on the non-pseudoautosomal portion of Chromosome X using 2x tiling. In red is the log of the sequence coverage. Bait coordinates are denoted in black and the exons that constitute the target region are in blue. Sequence coverage is higher for those exons that are covered by more than one overlapping bait.
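A minimal sketch of what a 2x tiling layout could look like, assuming "2x tiling" means that a new 120 bp bait starts every 60 bp so interior bases of a target are covered by two overlapping baits. The function names and coordinates are hypothetical, and this is not eArray's actual design algorithm.

```python
def tile_baits(start, end, bait_len=120, tiling=2):
    """Tile baits across the half-open target interval [start, end).

    Assumption: with tiling=2 a new bait begins every bait_len // 2 bases.
    Returns a list of (bait_start, bait_end) half-open intervals.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    step = bait_len // tiling
    baits = []
    pos = start
    while True:
        baits.append((pos, pos + bait_len))
        if pos + bait_len >= end:  # target fully covered
            break
        pos += step
    return baits


def bait_depth(baits, position):
    """Count how many baits overlap a single base position."""
    return sum(1 for s, e in baits if s <= position < e)


# Hypothetical 300 bp exon starting at coordinate 1000
baits = tile_baits(1000, 1300)
print(baits)
print(bait_depth(baits, 1100))
```

Under this layout the first and last 60 bp of a target are covered by only one bait, while interior bases are covered by two.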

11 eArray: a free tool to design custom SureSelect libraries. [Flow diagram: in the eArray web portal, each customer enters genomic locations and design parameters; bait design produces bait sets (#1, #2, #3); library design (kit size) combines them into a virtual bait library of up to 55,000 unique baits per kit; after quote and order, the library is sent to manufacturing with the kit size information (real DNA bait library, then RNA bait library); the kit is assembled and shipped to the customer.]

12 eArray supported species: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, C. familiaris, S. cerevisiae, S. pombe, G. gallus, B. taurus, A. thaliana

13 eArray XD: Import your own genomes. Available now

14 SureSelect Target Enrichment Kit: Agilent provides all reagents necessary to complete a SureSelect capture, aside from SPRI beads and Herculase II, in a fully quality-controlled kit format.

15 Human All Exons in a Tube - 38 Mb: CCDS + >1,000 ncRNA - 1 sample = 1 tube - 1 or 2 lane(s) (2 x 76 bp) on GAIIx - 1 or 2 quad(s) (1 x 50 bp) on SOLiD 3 - No-gel library preparation on GAIIx - 3 µg starting gDNA

16 Efficient All Exon Capture: 38 Mb = CCDS + >1,000 ncRNA; 1 tube capture, seq. at 2x76 bp on GAIIx. [Plot: percentage of bases covered (y-axis, 0-100%) vs. sequencing coverage per sample (x-axis: 1 lane, 2.1 Gb; 2 lanes, 4.2 Gb; 3 lanes, 6.2 Gb; 4 lanes, 8.2 Gb); series: at least 1, 5, 10, 20 and 30 reads. Annotations: 90% at 10x; 80% at 20x.]
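The "at least N reads" series in a coverage plot like this one can be derived from per-base read depths. The sketch below assumes the depths over the targeted bases are already available as a list of integers; the function name and the toy depth values are hypothetical and are not data from the slide.

```python
def breadth_at_depth(depths, thresholds=(1, 5, 10, 20, 30)):
    """Fraction of targeted bases covered by at least N reads, for each N."""
    total = len(depths)
    if total == 0:
        raise ValueError("no targeted bases supplied")
    return {t: sum(1 for d in depths if d >= t) / total for t in thresholds}


# Toy per-base depths for ten targeted bases (illustrative only)
toy_depths = [0, 3, 12, 25, 40, 8, 15, 22, 31, 5]
for threshold, fraction in breadth_at_depth(toy_depths).items():
    print(f"at least {threshold} reads: {fraction:.0%}")
```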

17 Efficient All Exon Capture: 38 Mb, 1 lane/sample, seq. at 2x76 bp on GAIIx v4 (4-5 Gb/lane). [Bar chart: Enrichment Specificity and Coverage Depth (0-100%) for breast cancer samples; metrics: Specificity (on- or near-target reads), Bases with 10X coverage or greater, Sensitivity (targeted regions with at least 1 read).]

18 Accurate SNP calling and discovery of novel SNPs (GAIIx). [Bar chart: number of SNP sites (axis up to 35,000); categories: Novel Sites, dbSNP Mismatch, dbSNP Concordant.]

19 Efficient All Exon Capture: 38 Mb = CCDS + >1,000 ncRNA; 1 tube capture, seq. on SOLiD. [Plot: All Exon Coverage for Single and Paired End Sequencing; percentage of targeted bases covered (y-axis, 0-100%) vs. total number of mapped bases (x-axis: 1.29 Gb, 2.29 Gb, 2.2 Gb and 4.45 Gb across single-end and paired-end runs); series: at least 1, 5, 10, 20, 30 and 40 reads.]

20 Efficient All Exon Capture: 38 Mb, 1 quad/sample, seq. at 1x50 bp on SOLiD 3 (1 Gb/quad). [Bar chart (0-100%) for two lung cancer samples; metrics: Specificity (on- or near-target reads), Bases with 10X coverage or greater, Sensitivity (targeted regions with at least 1 read).] Note on slide: need to format data like for Illumina (have samples as X axis and metrics as different bars).

21 Efficient Capture of 5 bp deletion on the X-Chromosome: Menkes Syndrome. SureSelect Target Enrichment Kit Efficiently Captures 5 bp Mutant. Readout on Illumina GA. hg18_chrx_ _ _+ : (WT = wildtype bait design; each read is aligned beneath it, with "-" marking deleted bases)
WT    CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT
read  CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA
read    ATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAG
read     TTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGC
read       GTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAG
read          TATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATT
read           ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
read                              AGTAGAGGAAATGAAAAAGCAGATTG
read           ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
read           ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG
read             CAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAA
read                CCTCATCTT-----A-TAGAGGAAATGAAAAAGCAGATTGAAGCT

22 All Exon Plus (add custom content to your All Exon). Is the Human All Exon not hitting all of your regions of interest? Enter your custom regions in eArray. All Exon Library (CCDS exons, >1,000 ncRNAs, 38 Mb) + Your Custom Library (your regions of interest)

23 Human All Exon Plus (All Exon + custom content): 1 tube capture, 1 lane seq. at 2x76 bp on GAIIx = ~2 Gb. [Two bar charts: SNP Analysis vs. HapMap (sensitivity and concordance, 0-100%) and SNP Analysis vs. dbSNP (number of SNPs: concordant, novel, mismatched), each shown for exome-plus-custom captures and an exome control.]

24 Indexing / Barcoding. Increasing sequencing capacity = lower costs, more samples. Combine multiple samples per sequencing lane. Save on capture costs with production scale. Pay only for the Mb you capture: <0.2Mb samples per 2x36bp lanes <0.5Mb <1.5Mb <3Mb 3-4 samples per 2x36bp lanes <6.6Mb. For optimum performance: Capture, Index, Pool, Sequence (in that order)
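As a rough illustration of why smaller captures allow more samples per lane, the sketch below estimates mean on-target depth from lane yield, on-target fraction, sample count and target size. The formula is a back-of-the-envelope assumption (it ignores duplicates, uneven pooling and non-uniform coverage), the function names are hypothetical, and the input numbers are illustrative rather than Agilent specifications.

```python
import math


def mean_on_target_depth(lane_yield_bases, on_target_fraction, n_samples, target_bases):
    """Approximate mean depth per sample when n_samples share one lane equally."""
    return lane_yield_bases * on_target_fraction / (n_samples * target_bases)


def max_samples_per_lane(lane_yield_bases, on_target_fraction, target_bases, min_mean_depth):
    """Largest sample count that still reaches min_mean_depth on average."""
    return math.floor(
        lane_yield_bases * on_target_fraction / (target_bases * min_mean_depth)
    )


# Illustrative inputs: ~2 Gb lane, 50% of bases on target, 0.2 Mb capture
lane = 2_000_000_000
print(mean_on_target_depth(lane, 0.5, n_samples=12, target_bases=200_000))
print(max_samples_per_lane(lane, 0.5, target_bases=200_000, min_mean_depth=400))
```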

25 Illumina Indexing Procedure with SureSelect. Steps: 1. Library prep; 2. Pre-capture PCR (Illumina InPE1.0 primer and SureSelect Pre-Capture Primer), giving a prepared library without index tag; 3. SureSelect hybridization/capture; 4. Post-capture PCR with an Illumina Index Primer (up to 12), giving a captured library with index tag. The diagram also shows a SureSelect Universal Primer. The library preparation protocol for indexing with SureSelect requires: 1. Agilent SureSelect Indexing Kit; 2. Illumina PE Library Prep Kit; 3. Illumina Multiplexing Sample Preparation Oligonucleotide Kit. All SPRI = automation friendly; no gel extraction required. Herculase II amplification = high-fidelity performance. Index added after capture = even performance across indexes; eliminates cumbersome 3-primer PCR for index addition. Similar protocol available for SOLiD (x16 primers).

26 Mass Balancing of Indexed Samples Post-Capture Results in Uniform Representation of Sequencing Reads. 0.2 Mb Capture Library (with 12 indexes per lane): ATCACG, CGATGT, TTAGGC, TGACCA, ACAGTG, GCCAAT, CAGATC, ACTTGA, GATCAG, TAGCTT, GGCTAC, CTTGTA. Fold-representation of each index is expressed as the number of high-quality reads relative to the median across all indexes. Shown is a typical result of 12 indexed samples captured with a 0.2 Mb SureSelect library. Performing individual SureSelect captures for each sample before addition of index tags results in uniform representation and minimizes potential sample competition during hybridization.
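A minimal sketch of the fold-representation metric, assuming it is each index's high-quality read count divided by the median count across all indexes. The 12 index sequences are the ones listed on the slide; the function names, the exact-match demultiplexing (real demultiplexers usually tolerate mismatches) and the toy reads are assumptions for illustration.

```python
from statistics import median

# The 12 index sequences listed on the slide
INDEXES = [
    "ATCACG", "CGATGT", "TTAGGC", "TGACCA", "ACAGTG", "GCCAAT",
    "CAGATC", "ACTTGA", "GATCAG", "TAGCTT", "GGCTAC", "CTTGTA",
]


def count_reads_per_index(index_reads, indexes=INDEXES):
    """Count index reads per known index using exact matching only."""
    counts = {idx: 0 for idx in indexes}
    for tag in index_reads:
        if tag in counts:  # unknown tags are ignored
            counts[tag] += 1
    return counts


def fold_representation(counts):
    """Read count per index divided by the median count across indexes."""
    med = median(counts.values())
    if med == 0:
        raise ValueError("median read count is zero")
    return {idx: n / med for idx, n in counts.items()}


# Toy example restricted to the first three indexes
toy_reads = ["ATCACG"] * 3 + ["CGATGT"] * 2 + ["TTAGGC"] + ["NNNNNN"]
toy_counts = count_reads_per_index(toy_reads, indexes=INDEXES[:3])
print(fold_representation(toy_counts))
```

A value near 1.0 for every index corresponds to the uniform representation described on the slide.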

27 Library quantification in NGS workflow (GA). Workflow: isolate genomic DNA, shear DNA, library preparation, PCR, target enrichment, PCR (index tagging), library validation, sequencing. Brilliant III UF SYBR qPCR features: - Accurate quantification of PCR-amplifiable DNA (both adapters) - Higher sensitivity (>4 orders) - Requires less DNA, so fewer PCR cycles to amplify (less bias), and can quantify SureSelect libraries prior to PCR. 2100 Bioanalyzer features: - Assess size distribution of fragmented DNA - Accurate quantification at DNA fragmentation and post-PCR steps - Independent quantification of library insert, primer dimers and large-MW product

28 Indexing/Barcoding Coverage: 0.2 Mb library - 12 indexes in 1 Illumina lane (~2 Gb). [Bar chart: Coverage by Index; percentage of reads or bases per index; metrics: % On-Target, % Bases with >20X Coverage.]

29 Comparison vs. HapMap: 0.2 Mb library - 12 indexes in 1 Illumina lane (~2 Gb). [Two charts by index sequence, y-axis 70-100%: Genotype Sensitivity and Genotype Concordance; series: IS_REF, IS_VAR_HOM, IS_VAR_HET, OVERALL, VARIANT.]

30 Comparison vs. dbSNP: 0.2 Mb library - 12 indexes in 1 Illumina lane (~2 Gb). [Bar chart: number of SNP sites by index sequence; categories: dbSNP mismatch, novel sites, dbSNP concordant.]

31 0.2 Mb Custom Index Capture, Illumina GAII. [Bar chart: % of Bases with 10 0X or Greater Coverage (y-axis, 0-100%).]

32 SOLiD Barcoding: 7 barcodes of 3.0 Mb Capture in 1 SOLiD Quad, 1x50 bp, 50M reads. [Chart 1: 7 Indexes of 3.0 Mb Capture in 1 SOLiD Quad (0-100%), Index A to Index G; series: percentage of reads in targeted regions, in regions +/- 100 bp, and in regions +/- 200 bp. Chart 2: Index Representation in Single SOLiD Quad; fold representation relative to the median, Index A to Index G.] Data represents average of two replicate barcodes. HQ reads: Mean/Barcode 6,431,190; Median/Barcode 3,537,291; Total/Quad 48,233,927.

33 SOLiD Barcoding: SNP concordance and sensitivity. 7 barcodes of 3.0 Mb Capture in 1 SOLiD Quad, 1x50 bp, 50M reads. [Two charts by barcode (A to G), y-axis 70-100%: Genotype Sensitivity and Genotype Concordance; series: isref, isvarhet, isvarhom, Overall, Variant.]

34 SOLiD Barcoding: Comparison with dbSNP. 7 barcodes of 3.0 Mb Capture in 1 SOLiD Quad, 1x50 bp, 50M reads. [Bar chart: Comparison of observed SNP calls vs. dbSNP; number of SNP sites per barcode (A to G); categories: Novel Sites, dbSNP mismatches, dbSNP concordant.] We see >= 99.79% concordance rate with dbSNP across all called SNPs. There are, on average, ~200 novel SNPs called for each sample. The same 2 SNP calls are in disagreement across all samples. The SNP calls all pass the Phred Confidence threshold and have an average read depth of 80.

35 SOLiD Barcoding: 16 barcodes of 0.2 Mb Capture in 1 SOLiD Quad, 1x50 bp, 30M reads. [Chart 1 (0-100%) by barcode: percentage of reads in targeted regions, in regions +/- 100 bp, and in regions +/- [...] bp. Chart 2: Standard Index Representation in Single SOLiD Quad; fold representation relative to the mean, by barcode (BC1 onward).] HQ reads: Mean/Barcode 1,844,819; Median/Barcode 1,843,950; Total/Quad 29,517,104.

36 SOLiD Barcoding: Comparison with dbSNP. 16 barcodes of 0.2 Mb Capture in 1 SOLiD Quad, 1x50 bp, 50M reads. [Bar chart: dbSNP concordance across multiple barcodes; comparison of observed SNP calls vs. dbSNP; number of SNP sites per barcode (BC1 to BC16); categories: Novel Sites, dbSNP Mismatches, dbSNP Concordant.] We see 100% concordance rate with dbSNP across all called SNPs. There are, on average, ~80 novel SNPs called for each sample.

37 Why not pool first, then capture? Blocking of multiple indexes is too inefficient for now; it needs more development. Capture first enables highly efficient blocking = high % on-target. Pooling before capture creates capture bias: higher-quality samples are captured better (i.e. have more reads) than low-quality samples, which creates variations in read distributions between samples and necessitates recapture/sequencing of poor performers. Capture first enables mass balancing after capture/indexing = uniform index distribution. Main driver is cost saving per sample: higher production scale enables lower bait costs, and indexing enables lower sequencing costs. Capture first enables savings on BOTH capture and sequencing.

38 SureSelect Target Enrichment Kit Configurations. Columns: Product (catalog number); Target amount; Reactions/kit; Product definition. X demo: 34Mb; exons in the human X chr. Custom content: 3.4 Mb and 6.6 Mb; 10 5,000 reactions/kit; design custom content using eArray. All Exon: 38 Mb; 5 10,000 reactions/kit; catalog content from CCDS + >1,000 ncRNA. All Exon Plus: 38 Mb + up to 6.8 Mb of custom content; 5 10,000 reactions/kit; add custom content to All Exon catalog content. Indexed custom content: <0.2 Mb, Mb, Mb, Mb Mb; 10 5,000 reactions/kit; cost-saving custom offering for Illumina (12 indexes) and SOLiD (16 barcodes).

39 SureSelect Experiment Workflow. Discovery: SureSelect All-Exon (Plus) enrichment and sequencing; discover 10,000s of SNPs; filter to 100s - 1,000s of novel, high-quality, high-confidence, non-synonymous variants (SNPs, InDels). Validation: SureSelect custom SNP re-sequencing (deep coverage on up to 1,000s of variants; custom SureSelect + Indexing) and High Resolution G3 CGH/CNV Arrays (deep tiling on 100s of variants; custom G3 arrays with 1M probes). Profiling: custom SureSelect enrichment of variants (0.2 Mb, 0.5 Mb, 1.5 Mb, 3 Mb or 6.6 Mb; indexing + automation; sample cohorts) and Multiplexed G3 CGH/CNV Arrays (8x60k or 4x180k per slide; custom design; sample cohorts).

40 Summary: Efficient Enrichment for Re-Sequencing. Available for SE and PE on Illumina and SE on SOLiD. Single-tube Human All Exon catalog product with option to add up to 6 Mb custom. Indexing/Barcoding for Illumina and SOLiD. Free web portal, eArray, enables fully custom design o Future launch of eArray XD would enable bait design for any genome. Low DNA input (3 µg gDNA; 500 ng of prepared library). Fast, reproducible and efficient capture o Specificity: 40-60% on-target capture o Reproducibility: R^2 of 0.96 between technical replicates o Uniformity: >80% of captured bases at 20x coverage; targets with one bait captured effectively o Negligible bias to DNA containing mutations as well as a 5 bp deletion (SE). Accurate SNP calls. Simple workflow, scalable, automatable: increase sample size cost-effectively. For more information and product announcements please go to

41 Recent publications using SureSelect Target Enrichment

42 The Problem: Linkage analysis confirmed family inheritance pattern to a 28 Mb region (build 36). >200 genes are contained in the 28 Mb region; it is a daunting task to sequence and validate all the variants. Solution: Identify a 2nd family that manifests the syndrome, determine linkage, and perform targeted sequencing using the X demo kit on heterozygote females of both families.

43 Target: 2.7 Mb; 36 bp SE seq on Illumina GA. Findings: Family 1 heterozygote showed a frameshift due to an insertion in RBM10 (RNA binding motif 10). Family 2 heterozygote showed a G to A substitution in RBM10. Both mutations were confirmed by Sanger sequencing. Inhibition of the orthologous Rbm10 expression in developing mouse embryo using antisense probes showed patterns that were consistent with observed human malformations.

44 Future Trends: The Agilent SureSelect Platform Scales with Sequencing Technologies. Other catalogue content: Mouse All Exon. SOLiD Paired-End support. Automation. Support for 454. Improved protocol: simplified, improved performance.

45 Illumina Library Prep and SureSelect Enrichment on the Bravo Automated Liquid Handling Platform

46 Next Gen Sequencing and SureSelect Overview. Library Prep: Shear Genomic DNA, Repair Ends, 3'-dA Addition, Adapter Ligation, PCR Enrichment, giving the Prepped Library. SureSelect Target Enrichment: Library Hybridization (Prepped Library + SureSelect Oligo Capture Library), Magnetic Bead Capture, PCR, QA. Then Illumina Cluster & Sequence.

47 Next Gen Sequencing and SureSelect Overview, automated on the Agilent Automation Bravo Liquid Handler. Library Prep: Shear Genomic DNA, Repair Ends, 3'-dA Addition, Adapter Ligation, PCR Enrichment, with Automated Purification between steps, giving the Prepped Library. SureSelect Target Enrichment: Library Hybridization (Prepped Library + SureSelect Oligo Capture Library), Automated Magnetic Bead Capture, PCR, QA. Then Illumina Cluster & Sequence.

48 More information: Google "Agilent To Enrich or Not to Enrich"

50 Updated Illumina GA Library Quantification Method Recommends Agilent Mx3005P QPCR. Standard run in triplicate (1:2 dilutions, 100 pM to 1.5 pM); dilute samples to ~10nM in EB. Ensure quality of run with standard efficiency (90-110%). Interpolate library samples against the standard.
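A sketch of the standard-curve arithmetic behind this kind of quantification, assuming the usual qPCR model in which Ct is linear in log10(concentration) and efficiency is 10^(-1/slope) - 1. The function names and the synthetic Ct values are hypothetical; this is not output from the Mx3005P software, and in practice triplicate Ct values would be averaged per standard first.

```python
import math


def fit_standard_curve(concs_pm, cts):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concs_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 = 100%, perfect doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def interpolate_concentration(ct, slope, intercept):
    """Concentration (pM) of a diluted library from its Ct."""
    return 10 ** ((ct - intercept) / slope)


# 1:2 dilution series from 100 pM down to about 1.5 pM (7 points)
standards_pm = [100 / 2 ** i for i in range(7)]
# Synthetic Ct values for a perfectly efficient reaction (illustrative only)
standard_cts = [20 - 3.321928 * math.log10(c) for c in standards_pm]

slope, intercept = fit_standard_curve(standards_pm, standard_cts)
eff = amplification_efficiency(slope)
print(f"slope={slope:.3f}, efficiency={eff:.1%}, run OK={0.90 <= eff <= 1.10}")
print(interpolate_concentration(17.0, slope, intercept))
```

The interpolated value is for the diluted sample; multiplying by the dilution factor gives the concentration of the original library.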

51 Pre/Post QPCR for exons of 2 different genes. NTC (water blank) is negative: no detection of exon, no Ct. Pre-hyb samples are positive for selected exons, Ct=30. Post-hyb samples are positive for selected exons, Ct=20. ΔCt (pre - post) = ~10, or ~1024-fold enrichment of selected exons, in line with NGS enrichment metrics. Similar Ct for pre, post and ΔCt across genes supports similar exon representation in the amplified library before and after enrichment, and thus no bias.
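The 1024-fold figure follows from assuming the target doubles every cycle, so a ΔCt of 10 corresponds to 2^10. A short sketch of that arithmetic; the function name is hypothetical and the optional efficiency parameter is an added assumption for reactions that amplify less than perfectly.

```python
def fold_enrichment(ct_pre, ct_post, efficiency=1.0):
    """Fold enrichment from pre- and post-hybridization Ct values.

    Assumes each cycle multiplies the target by (1 + efficiency);
    efficiency=1.0 means perfect doubling.
    """
    delta_ct = ct_pre - ct_post
    return (1 + efficiency) ** delta_ct


print(fold_enrichment(30, 20))  # Ct values from the slide
```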

52 Minimized or Amplification-Free Sequencing. Key aspect is minimizing amplification in the NGS protocol: eliminate or reduce the amount of PCR required, reducing artifact bias and time. Consolidate steps where possible. The Agilent Mx3005P QPCR system provides real-time amplification curve management, with a pause feature that allows start and stop of cycling. PCR of the adapter-ligated library can be consolidated with the quantification step by QPCR: amplification of the library proceeds in the presence of SYBR green; the user stops cycling as library samples begin to amplify, then restarts cycling for lower-concentration samples. All samples would get a Ct to be interpolated against the standard curve as in previous protocols. Amplification and QPCR Titration. Agilent Restricted

53 Minimized or Amplification-Free Sequencing. Because the Agilent Mx3005P can pause, it provides the minimal required amplification for each library expansion and simultaneous quantification by running only the required number of cycles. [Figure labels: Library sample 1 done; Standard Curve; Quant Report of library concentration.]

54 Acknowledgements. Collaborators: o Broad Institute: Chad Nusbaum et al., Stacey Gabriel et al., Sheila Fisher et al. o Sanger Institute: Daniel Turner et al. o All our early access collaborators (over 20 institutions worldwide) o Illumina and Life Technologies R&D/Marketing