Whole Genome, Exome, or Custom Targeted Sequencing: How do I choose? Aaron Thorner, PhD Clinical Genomics Group Leader


1 Whole Genome, Exome, or Custom Targeted Sequencing: How do I choose? Aaron Thorner, PhD, Clinical Genomics Group Leader, Center for Cancer Genome Discovery (CCGD), Dana-Farber Cancer Institute

2 Outline
- Center for Cancer Genome Discovery (CCGD)
- Power of massively parallel sequencing (MPS)
- MPS workflow
- Genome, Exome, or Custom Targeted Sequencing
- Sequencing analysis and reporting


4 CCGD Mission
To advance precision cancer medicine by developing new technologies for the analysis of cancer genomes and to provide basic, translational, and clinical investigators with access to these technologies.
- Technology development: To develop new technologies for the analysis of cancer genomes.
- Collaborations: To provide access to these genomic technologies to basic, translational, and clinical investigators at Dana-Farber and beyond.
- Translation: To translate technologies to the clinical setting.

5 CCGD Structure
CCGD is the research and development group within the Precision Cancer Medicine effort at Dana-Farber Cancer Institute, Brigham and Women's Hospital, and Boston Children's Hospital.

6 Massively Parallel Sequencing (MPS)
- Enables comprehensive genome analysis quickly, accurately, and economically.
- However, the cost of analysis and storage has not followed the same trend!

7 The Power of MPS
- Whole-genome, whole-exome, transcriptome, and targeted sequencing
- Detect rare alleles/mutations
- Discover indels, translocations, and copy number variations
- Determine potential driver mutations
- Identify mechanisms of drug resistance
- Measure expression changes
Image: Integrative Genomics Viewer (IGV), Thorvaldsdóttir, Robinson et al., 2012


9 CCGD Instrumentation
Instruments (values below are listed in this order): HiSeq 3000 (patterned flow cell) | HiSeq 2500 Rapid Run Mode | MiSeq
- Read length: 2 x 100 paired end (PE) | 2 x 100 PE | 2 x 100 PE
- Lanes per flow cell:
- PF (passing filter) reads per lane: > 600 million | 300 million | 28 million
- Target coverage: 80% > 30x | 80% > 30x | 80% > 30x
- Time: 40 hours | 40 hours | 17 hours
Images courtesy of Illumina, Inc. Illumina, Inc. All rights reserved.
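The "80% > 30x" target can be read as "at least 80% of target bases are covered by more than 30 reads"; that reading is an assumption, since the slide does not spell it out. Under that assumption, a minimal Python sketch of the check follows. The function name and the per-base depth values are hypothetical and purely illustrative.

```python
def fraction_covered(depths, min_depth=30):
    """Return the fraction of target bases whose depth exceeds min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths)


# Hypothetical per-base depths for a tiny 10-base target (not real data).
example_depths = [45, 52, 38, 31, 29, 60, 75, 12, 33, 41]

frac = fraction_covered(example_depths)
# Assumed reading of the slide's goal: at least 80% of bases above 30x.
meets_goal = frac >= 0.80
print(f"{frac:.0%} of target bases above 30x; goal met: {meets_goal}")
```

In practice the depths would come from a per-base coverage report for the captured regions rather than a hand-written list.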

10 Workflow
- gDNA samples received ( ng FFPE, FF, etc.)
- Quantify DNA (*PicoGreen dsDNA quantification)
- DNA shearing (Covaris)
- QC analysis, clean-up (Agilent Bioanalyzer or Agilent TapeStation)
- Library construction (Illumina, Beckman, Kapa); QC: Bioanalyzer and MiSeq quant
- Manual or automated (Beckman-Coulter Biomek FXp)
- Agilent SureSelect Hybrid Capture
- Sequencing

11 Agilent SureSelect Hybrid Capture


13 Genome, Exome, or Targeted Sequencing
Should I sequence the whole genome, the whole exome, or a targeted set of exons?
- Currently, the functional genome is more clinically relevant/actionable
- More cost-efficient to sequence portions of the genome
Targeted Sequencing:
- Higher multiplexing of samples in flow cell lanes reduces cost
- Greater sequencing depth
- Simultaneous detection of mutations, copy number alterations, and translocations
Targeted panels (Agilent Technologies):
- Whole Exome v5: 50.4 Mb
- OncoPanel v3: 550 genes + translocations, 2.8 Mb Mb
- Custom targeted panels: sizes vary

14 Genome, Exome, or Targeted Sequencing
Whole Genome Sequencing (3000 Mb)
Pros:
- Unbiased: sequences all genes, regulatory regions, etc.
- Detect structural variants and copy number aberrations
- Coverage uniformity
- Longer reads
- De novo genome assembly
Cons:
- High cost
- Data storage
- Difficult clinical research interpretation
- Lower coverage (somatic mutation detection more difficult)

15 Genome, Exome, or Targeted Sequencing
Whole Exome (50.4 Mb; < 2% of genome)
Pros:
- Reduced cost
- Higher coverage
- Lower data storage costs
- Exome more highly characterized than whole genome
- Faster turnaround
- More samples sequenced per lane
- Copy number and structural variants (with limitations)
- Quicker/cheaper analysis
Cons:
- Copy number and structural variant limitations
- Difficult clinical research interpretation (variants of unknown significance, VUS)
- Genes with unknown function
- Virtually no coverage of regulatory regions
- Less uniform coverage versus WGS

16 Genome, Exome, or Targeted Sequencing
Custom Targeted Sequencing (OncoPanel + translocations) (3.6 Mb; ~0.12% of genome)
Pros:
- Choose your genes/regions of interest
- Reduced cost
- Higher coverage
- Lower data storage costs
- Less material needed for capture
- Easier clinical research interpretation
- Exome more highly characterized than whole genome
- Faster turnaround
- More samples sequenced per lane
- Some copy number and structural variant analysis
- Quicker/cheaper analysis
Cons:
- Copy number and structural variants highly limited
- False negatives (variant of significance is missed)
- VUS
- Virtually no coverage of regulatory regions, unless targeted
- Less uniform coverage
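The genome fractions quoted on the exome and custom targeted slides follow directly from the target sizes and the 3000 Mb genome size given on the whole genome slide. A small Python sketch of that arithmetic is shown here; the function and constant names are made up for illustration.

```python
GENOME_MB = 3000.0  # whole genome size as quoted on the slides


def percent_of_genome(target_mb, genome_mb=GENOME_MB):
    """Return the target size as a percentage of the genome size."""
    return 100.0 * target_mb / genome_mb


exome_pct = percent_of_genome(50.4)  # Whole Exome v5 target, in Mb
panel_pct = percent_of_genome(3.6)   # OncoPanel + translocations, in Mb

print(f"Exome: about {exome_pct:.2f}% of the genome")
print(f"Custom panel: about {panel_pct:.2f}% of the genome")
```

Both results agree with the slides: the exome comes in under 2% and the custom panel at roughly 0.12%.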

17 Genome, Exome, or Targeted Sequencing
Q: Should I sequence the whole genome, whole exome, or a targeted set of exons?
A: It depends on your scientific question!
- Do you need deeper coverage?
- Are potential causative genes known or unknown?
- Regulatory regions?
- Structural and copy number variations?
- How much money and material do you have?
- What is the starting quality of your gDNA?

18 Genome, Exome, or Targeted Sequencing
Columns (values below are listed in this order): Whole Genome | Exome v5 | OncoPanel
- Instrument: HiSeq 2500 | HiSeq 2500 | HiSeq 2500
- Read length: 2 x 100 PE | 2 x 100 PE | 2 x 100 PE
- # Lanes:
- # Samples:
- # Reads per lane: 300 million | 300 million | 300 million
- Mean target coverage*: 10x | 150x | 200x
*Estimated. Results vary based on sample quality.
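A rough way to relate reads per lane to mean coverage is total sequenced bases divided by target size and by the number of samples sharing the lane. The sketch below assumes the 300 million figure counts individual 100-base reads and that one whole-genome sample occupies one lane; under those assumptions it gives 10x for the 3000 Mb genome, in line with the whole genome column. The function name and the `samples` and `usable_fraction` parameters are hypothetical, and the exome call uses made-up values because the lane and sample counts did not survive in this transcription.

```python
def mean_coverage(reads_per_lane, read_length_bp, target_size_bp,
                  lanes=1, samples=1, usable_fraction=1.0):
    """Estimate mean per-sample coverage of a target.

    usable_fraction is a hypothetical knob for reads lost to duplicates
    or off-target capture; 1.0 means every sequenced base counts.
    """
    total_bases = reads_per_lane * lanes * read_length_bp * usable_fraction
    return total_bases / (target_size_bp * samples)


# Whole genome: 300 million 100-base reads over a 3000 Mb target, one sample.
wgs = mean_coverage(300e6, 100, 3000e6)

# Illustrative only: 4 samples per lane and 60% usable reads are assumptions,
# not values taken from the slides.
hypothetical_exome = mean_coverage(300e6, 100, 50.4e6,
                                   samples=4, usable_fraction=0.6)

print(f"Whole genome estimate: {wgs:.0f}x")
print(f"Hypothetical exome estimate: {hypothetical_exome:.0f}x")
```

The same relation shows why smaller targets allow both deeper coverage and more samples per lane: shrinking the target size frees bases that can be spent on depth, on multiplexing, or on both.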


20 Bioinformatics Analysis
Steps:
- Demultiplex reads in lane
- Alignment to reference genome
- Merge aligned and unaligned BAM
- Duplicate marking
- Realignment around known indels
- Quality score recalibration
- Variant and indel detection
Outputs along the way: lane-level demultiplexed BAM files; lane-level aligned BAM file; BAM files ready for aggregation; lane-level quality metrics; SNP site genotype calls
MuTect: Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnology (2013). doi: /nbt.2514
GATK Indel Locator:
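To make the duplicate marking step concrete, here is a toy Python sketch of the general idea: reads that align to the same chromosome, start position, and strand are treated as copies of one original fragment, and all but the highest-quality read are flagged. This is a simplified illustration, not the pipeline CCGD runs; production pipelines use dedicated tools operating on BAM files, and the `Read` record, its fields, and the example reads are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal read record; real pipelines work on BAM records.
Read = namedtuple("Read", "name chrom start strand mean_quality")


def mark_duplicates(reads):
    """Return {read name: True if flagged as duplicate, else False}.

    Reads sharing (chrom, start, strand) form one duplicate group; the
    read with the highest mean quality is kept, with ties going to the
    first read seen.
    """
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in best or read.mean_quality > best[key].mean_quality:
            best[key] = read
    return {
        r.name: best[(r.chrom, r.start, r.strand)].name != r.name
        for r in reads
    }


# Made-up example reads (not real data).
reads = [
    Read("r1", "chr1", 100, "+", 30),
    Read("r2", "chr1", 100, "+", 35),  # same position and strand as r1
    Read("r3", "chr1", 100, "-", 20),  # opposite strand, separate group
    Read("r4", "chr2", 500, "+", 40),
]

flags = mark_duplicates(reads)
print(flags)
```

Flagging rather than deleting duplicates keeps the data intact while letting downstream variant callers ignore reads that would otherwise inflate the apparent support for an allele.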

21 Reporting of Targeted Seq
- In-depth pre- and post-project discussions
- Comprehensive reporting with QCs and data:
  - Post-fragmentation
  - Post-library construction
  - Sequencing metrics
  - Variant detection
  - Copy number analysis


23 Thank you!
(Project Initiation) (Informatics)
Directors: Matthew Meyerson, MD, PhD; Laura MacConaill, PhD; Paul Van Hummelen, PhD
Faculty: William Hahn, MD, PhD; Adam Bass, MD; Rameen Beroukhim, MD, PhD; Matthew Freedman, MD; Levi Garraway, MD; Todd Golub, MD; Massimo Loda, MD; Kornelia Polyak, MD, PhD; Charles Roberts, MD, PhD; Kimberly Stegmaier, MD
Genomics Team: Aaron Thorner, PhD; Andrea Clapp; Haley Coleman; Samantha Drinian; Angelica Laing; Suzanne McShane; Edwin Thai; Liuda Ziaugra
Bioinformatics Team: Matthew Ducar; Joshua Bohannon; Robert Burns; Johann Hoeftberger; Phani Kishore; Monica Manam; Neil Patel; Paul Rapoza; Priyanka Shivdasani; Bruce Wollison

24 Agilent products described are For Research Use Only. Not for use in diagnostic procedures.