Emerging applications of SMRT Sequencing


1 Emerging applications of SMRT Sequencing N Lance Hepler For Research Use Only. Not for use in diagnostics procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved.

2 AGENDA -Minor variants -Structural variants -Multiplexed microbial WGS

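4 WITHIN-PATIENT VIRAL DIVERSITY
[Figure: viral population on Day 1 compared with the population after months. Variant frequency labels: 60.0%, 10.2%, 7.3%, 5.0%, 3.1%, 2.7%, 1.0%, 0.8%, 0.2%. Annotations: virions per day; 3x10^-5 mutations/RT/base.]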

5 HOW TO AVOID TREATMENT FAILURES? Identify minor variants Adjust treatment Avert failure

6 UNIFIED MINOR VARIANT SOFTWARE PIPELINE - JULIETFLOW
Pipeline components shown: BLASR, fuse, cleric, juliet

7 MINOR VARIANT CALLING - JULIET
-Targeted amplicon approach, reference guided, one-click analysis
  -4 kb amplicon (fully spanned by long reads)
-Initial viral focus (HIV, HCV, HBV), oncology (BCR-ABL)
-De-novo codon variant discovery now, small in/dels later
-Reliable 1% minor variant detection
  -high-quality CCS per sample (RQ > 0.99, 5 passes)
  -false-negative rate and 1% false-positive rate
-High multiplexing per 1M ZMW Sequel cell, 6 hour run
  -8 samples for 1% detection: minimum 48k CCS reads yield
  ->50* samples for 10% detection: minimum 30k CCS reads yield
-Extensible for new disease areas and organisms
-Drug-resistance mutation/phenotype annotation
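The multiplexing figures on this slide can be checked with a little arithmetic. The sketch below assumes the stated yields are split evenly across samples (48k reads over 8 samples, 30k reads over 50 samples) and asks how likely it is to observe at least a given number of variant-supporting reads under a simple binomial model. The support threshold of 30 reads and the function name `prob_at_least` are assumptions for illustration only; they are not juliet's actual calling criteria.

```python
from math import exp, lgamma, log


def prob_at_least(min_support: int, depth: int, freq: float) -> float:
    """P(X >= min_support) for X ~ Binomial(depth, freq), with 0 < freq < 1."""
    def log_pmf(k: int) -> float:
        # Log-space binomial pmf, so large depths do not overflow.
        return (lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
                + k * log(freq) + (depth - k) * log(1.0 - freq))

    below = sum(exp(log_pmf(k)) for k in range(min_support))
    return max(0.0, 1.0 - below)


# Even split of the stated minimum yields (an assumption, see lead-in).
reads_at_1pct = 48_000 // 8     # reads per sample for 1% detection
reads_at_10pct = 30_000 // 50   # reads per sample for 10% detection

# Expected number of reads carrying the minor variant in each design.
expected_1pct = reads_at_1pct * 0.01
expected_10pct = reads_at_10pct * 0.10

# Chance of seeing at least 30 supporting reads (hypothetical threshold).
p_detect_1pct = prob_at_least(30, reads_at_1pct, 0.01)
```

Under these assumptions both designs work out to roughly the same expected number of variant-supporting reads per sample, which may be why the two yield figures differ by the ratio they do.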

8 SCALES OF RECONSTRUCTION
H1 (50%): ACCGTGAACGTTTCTGGACTTAGAGATATCTAGCTGTCATAGGCCATGTGTGACAGTCAGTTTGCATA
H2 (30%): ACGGTGAACATTTCAGCACTTAGGGAAATCTAGCTGTCATAGGCCATGTGTGACAGTCAGCTTGAATA
H3 (20%): ACGTTGAACGTTTCAGGACTTAGGGAGATCTAGCTGTCATAGGCCATGTGTGACAGTCACCTTGCATA
Cons: ACSGTGAACRTTTCWGGACTTAGRGADATCTAGCTGTCATAGGCCATGTGTGACAGTCAGYTTGMATA
Reads:
ACCGTGACCGTTTCTGGACTTAGAGATATCTAGCTGTCATAGGCCATGTGTGACAGTCAGTATGC
ACCGTGAACGTTTCTGGACTTAGAGATATCTAGCTGGCATAGGCCATGTGTGACAGTCAGTTTGCA
ACGGTGAACATTTCAGCACTTAGGGAAATCTAGCTGTCATAGGCCATGTGTGACAGTCAGCTTGAAT
ACGGTGAACATTTCAGCACTTAGGGAAATCTAACTGTCATAGGCCATGTGTGACAG-CAGCTTGAATA
ACGTTGAACGTTTCAGGACTTAGGGAGATCTAGCTGTCATAGGCCATGTGTGACAGTCACCTTGCATA
ACCGTGAACGTTTCTGGACTTAGAGATATCTAGCTGTCATAGGCCATGTGTGACAGTCAGTTTGCATA
CGTGAACGGTTCTGGACTTA-AGATATCTAGCTGTCATAGGCCATGTGTGACAGTCAGTTTGCATA
GGTGAACATTTCAGCACTTAGGGAAATCTAGCTGTCATAGGCCATGTGTGACAGTCAGCTTGAA
CGTGAACGTTTCTGGACTTAGAGATATCTAGCTGTCATAGACCATGTGTGACAGTCAGTTTGCAT
GTGAACATTTCAGCACTTAGGGAAATCTAGCTGTCATAGGCCATGTGTGACAGTCAGCTTGAATA
TTGAACGTTTCAGGACTTAGGGAGATCTAGCTGTCATAGGCCATGTGTGACAGTCACCTTGCATA
TGCACGTTTCTGGACTTAGAGATATCTAGCTGTCATAGGCCATGTGTGACAGTCAGTTTGCATA
Labels: long-range, local, SNV, sequencing error

9 CHALLENGE Reliably distinguish 1% variants from sequencing noise

10 MINOR VARIANT OR PCR HETERODUPLEXES? (1) Subread for a CCS read with a heteroduplex (2) Filtering via QVs and conversion to N forward and reverse strand
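Step (2) can be illustrated with a small sketch: bases whose quality value (QV) falls below a cutoff are replaced by N so they cannot contribute to a variant call. The cutoff of 20 and the function name `mask_low_qv` are assumptions for illustration; the slide does not state which threshold juliet uses or how forward- and reverse-strand evidence is combined.

```python
from typing import Sequence


def mask_low_qv(seq: str, qvs: Sequence[int], min_qv: int = 20) -> str:
    """Return seq with every base whose QV is below min_qv replaced by 'N'."""
    if len(seq) != len(qvs):
        raise ValueError("sequence and QV list must have the same length")
    return "".join(base if qv >= min_qv else "N" for base, qv in zip(seq, qvs))


# Toy read: the third and fifth bases have low QVs and are masked.
masked = mask_low_qv("ACGTAC", [40, 38, 7, 41, 12, 39])
print(masked)  # ACNTNC
```

Masking rather than discarding the whole read keeps the remaining high-confidence positions available for phasing.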

11 JULIET OUTPUT Report can include directly interpretable outcomes, like drug resistance variants.

12 PERFORMANCE BENCHMARK In-silico mixture of five known HIV strains, 1-3 SNVs apart: 96% WT, 1% minors. 10,000 data sets at 3000x. Results: 0.07% FP rate, 0.0% FN rate.
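The mixture design can be sketched as a weighted draw of read origins. This reads the slide as one wild-type strain at 96% plus four minor strains at 1% each (which sums to 100% across five strains); that reading, the strain labels, and the function name are assumptions, and the sketch only simulates which strain each read comes from, not the reads themselves or the variant calling.

```python
import random
from collections import Counter

# Assumed mixture: 1 wild type + 4 minors (labels are hypothetical).
STRAIN_FRACTIONS = {
    "WT": 0.96,
    "minor1": 0.01,
    "minor2": 0.01,
    "minor3": 0.01,
    "minor4": 0.01,
}


def sample_read_origins(depth: int, fractions: dict, seed: int) -> Counter:
    """Draw `depth` read origins according to the strain fractions."""
    rng = random.Random(seed)  # seeded so each data set is reproducible
    names = list(fractions)
    weights = [fractions[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=depth))


# One simulated data set at 3000x; the benchmark repeats this 10,000 times.
counts = sample_read_origins(3000, STRAIN_FRACTIONS, seed=1)
for strain in STRAIN_FRACTIONS:
    print(strain, counts[strain])
```

At 3000x each 1% minor is expected to contribute about 30 reads, so the per-data-set counts fluctuate noticeably, which is what makes the false-positive and false-negative rates informative.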

13 EXAMPLE: PHI29 INTERNAL MIXTURE Phasing of distant co-occurring mutations

14 JULIET EXAMPLE: WHY PHASING MATTERS

15 JULIET EXAMPLE: HIV POL

16 JULIET EXAMPLE: HIV POL

17 WHERE TO GET IT Official release in SMRT Link 5.0 (soon); as-is version

18 AGENDA -Minor variants -Structural variants -Multiplexed microbial WGS

19 USE CASES: MEDICAL GENETICS
"Clinical application of whole-exome sequencing across clinical indications", Retterer K, Juusola J, Cho MT, et al. RESULTS: The overall diagnostic yield of WES was 28.8%. The diagnostic yield was 23.6% in proband-only cases and 31.0% when three family members were analyzed.
"Exome sequencing: the expert view", Biesecker LG, Shianna KV, Mullikin JC. What are the major limitations of exome sequencing? KS: The major limitation of exome sequencing may be the inability to comprehensively represent genomic SVs. Many groups have designed algorithms that use a read depth or read pair-based approach for predicting structural variation; however, these approaches are not very efficient at identifying SVs with exome data. Another approach uses a split read method, but this will not be comprehensive and will miss many of the SVs.
-Identify pathogenic mutations for individuals with a genetic disease
-Two main classes: rare Mendelian disease and cancer
-Success is high diagnostic yield

20 STRUCTURAL VARIANTS CONTRIBUTE TO DISEASE, TRAITS, AND EVOLUTION
Disease: chrX microduplication and microdeletion syndromes
Traits: AMY1 copy number is tied to starch consumption
Evolution: human-specific deletions are tied to human-specific traits
- Stankiewicz and Lupski. (2010) Annu Rev Med 61:
- Perry et al. (2007) Nat Genet 39:
- McLean et al. (2011) Nature 471:216-9.

21 PACBIO LONG READS DISCOVER MANY STRUCTURAL VARIANTS MISSED BY ILLUMINA SHORT READS
Personal Genome; PacBio Coverage; Deletions ≥50 bp; Insertions ≥50 bp
AK; fold; 7,358; 10,077
CHM1 (haploid) [2]; 62-fold; 7,557; 12,998
CHM13 (haploid) [2]; 66-fold; 7,306; 13,118
HX; fold; 9,891; 10,284
Structural Variants in a Typical Human Genome [1-4]: PacBio 20,000; Illumina 4,000
- 1 Seo et al. (2016) Nature 538:
- 2 Huddleston J, et al. (2016) Genome Res
- 3 Shi et al. (2016) Nat Commun 7:
- 4 Sudmant et al. (2015) Nature 526:75-81.

22 CASE UNDIAGNOSED BY ILLUMINA WGS Euan Ashley, Jason Merker. Negative clinical gene sequencing. Negative Illumina WGS.

23 LOW COVERAGE SEQUENCING ON SEQUEL
Run: SMRT Cells 10; Run time 60 hrs; Basepairs 26.7 Gb; Reads 4.3 M; Sequel Chemistry V1.2
Filter; Deletions ≥50 bp; Insertions ≥50 bp
Initial call set; 6,971; 6,821
Not in segdup; 5,893; 6,254
Not in NA12878 healthy control
Overlaps RefSeq coding exon
Gene linked to some disease in OMIM
2, ,
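The run statistics on this slide imply an approximate mean read length and genome coverage. The short calculation below uses only the base pair and read totals from the slide; the human genome size of 3.1 Gb is an assumption added for the coverage estimate and does not appear on the slide.

```python
# Run totals taken from the slide.
total_bases = 26.7e9   # 26.7 Gb
total_reads = 4.3e6    # 4.3 M reads

# Assumption: approximate haploid human genome size (not from the slide).
HUMAN_GENOME_BP = 3.1e9

mean_read_length = total_bases / total_reads
fold_coverage = total_bases / HUMAN_GENOME_BP

print(f"mean read length: about {mean_read_length:,.0f} bp")
print(f"coverage: about {fold_coverage:.1f}-fold")
```

Under that genome-size assumption the run comes out at a little under 9-fold coverage with reads averaging about 6.2 kb, which is the sense in which the slide calls it low coverage.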

24 USE CASES: POPULATION GENETICS
The 1000 Genomes Project aims to create a complete and detailed catalogue of human genetic variations.
deCODE Genetics aims to identify human genes associated with common disease using population studies.
-Build a database of common structural variation
-Apply database in GWAS and as population controls
-Success is a complete population database

25 MOST STRUCTURAL VARIANTS IN THE HUMAN POPULATION LIKELY REMAIN UNDISCOVERED
[Figure: sensitivity (% of structural variants found, 0% to 100%) versus variant frequency in population (5%, 1%, 0.5%). Series: 10 humans at 50-fold (Today).]
- Poisson model with assumptions: sample of population is random; no population substructure; equal sequencing coverage per individual; sequencing across genome is Poisson random; reads are long enough to identify all variants; infinite variants of each frequency; variant caller uses pooled discovery and requires 10 reads of support

26 INCREASED SAMPLE SIZES WILL YIELD HIGHER RATES OF STRUCTURAL VARIANT DISCOVERY
[Figure: sensitivity (% of structural variants found, 0% to 100%) versus variant frequency in population (5%, 1%, 0.5%). Series: 10 humans at 50-fold (Today); 100 humans at 50-fold; 1,000 humans at 50-fold.]
- Poisson model with the same assumptions as slide 25.

27 VARIANT DISCOVERY DOES NOT REQUIRE DEEP SEQUENCING OF EVERY INDIVIDUAL
[Figure: sensitivity (% of structural variants found, 0% to 100%) versus variant frequency in population (5%, 1%, 0.5%). Series: 10 humans at 50-fold (Today); 100 humans at 50-fold; 1,000 humans at 50-fold; 1,000 humans at 5-fold. Annotation: same total sequencing.]
- Poisson model with the same assumptions as slide 25.
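The sensitivity curves on slides 25 to 27 can be approximated with a short calculation. The sketch below is one possible reading of the stated assumptions, not the presenter's actual model: individuals are diploid, the number of chromosome copies carrying a variant is binomial in the population frequency, each carrying copy contributes Poisson-distributed reads at half the per-individual fold coverage, and a variant counts as found when the pooled support reaches 10 reads. The diploid split of coverage and the function names are assumptions.

```python
from math import exp, lgamma, log


def poisson_tail(min_reads: int, lam: float) -> float:
    """P(X >= min_reads) for X ~ Poisson(lam)."""
    term = exp(-lam)  # P(X = 0)
    below = 0.0
    for i in range(min_reads):
        below += term            # accumulate P(X = i)
        term *= lam / (i + 1)    # advance to P(X = i + 1)
    return max(0.0, 1.0 - below)


def sensitivity(n_humans: int, fold: float, freq: float, min_reads: int = 10) -> float:
    """Probability that a variant of population frequency `freq` is discovered."""
    chromosomes = 2 * n_humans   # assumption: diploid individuals
    per_copy = fold / 2.0        # assumption: coverage splits evenly per haplotype
    total = 0.0
    for k in range(chromosomes + 1):
        # Log-space binomial pmf for k carrying copies in the sample.
        log_p = (lgamma(chromosomes + 1) - lgamma(k + 1)
                 - lgamma(chromosomes - k + 1)
                 + k * log(freq) + (chromosomes - k) * log(1.0 - freq))
        # Pooled discovery: reads from all carrying copies are summed.
        total += exp(log_p) * poisson_tail(min_reads, k * per_copy)
    return total


for n, fold in [(10, 50), (100, 50), (1000, 50), (1000, 5)]:
    row = ", ".join(f"{f:.1%}: {sensitivity(n, fold, f):.2f}" for f in (0.05, 0.01, 0.005))
    print(f"{n} humans at {fold}-fold -> {row}")
```

In this reading, 1,000 humans at 5-fold use the same total sequencing as 100 humans at 50-fold yet recover more of the rare variants, because sampling more chromosomes matters more than depth per individual once pooled support is allowed.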

28 RESEQUENCING FOR STRUCTURAL VARIANTS
READS -> ALIGNMENTS -> VARIANTS
Labels: PacBio & SVs; NGM-LR; pbsv

29 RESEQUENCING FOR STRUCTURAL VARIANTS
READS -> ALIGNMENTS -> VARIANTS
Labels: PacBio & SVs; NGM-LR; pbsv
[Figure: SVs from full set discovered (0% to 100%) versus coverage. Series: Insertion; Deletion.]

30 IGV: BASELINE VIEW PACBIO ILLUMINA -Credit: Jim Robinson

31 IGV: LABEL LARGE INSERTIONS PACBIO ILLUMINA

32 IGV: PACBIO SUPPORT IS IN PREFERENCES
-purple insertion labels
-hide small indels
-only show consistent mismatches

33 AGENDA -Minor variants -Structural variants -Multiplexed microbial WGS

34 MULTIPLEXED MICROBIAL WGS
-PacBio sequencing is the gold standard for microbial WGS.
-Multiplexing bacterial samples is the key to maximizing throughput and efficiency, and minimizing cost, on the Sequel System.
-Multiplexing of microbial genomes is achievable using standard barcodes; however, the process is complicated by competing priorities: maximizing barcoded yield and maximizing read length.
-We've optimized the protocol and have released an end-to-end workflow for bacterial multiplexing.
[Figure labels: Early bases; Distal barcode]

35 MULTIPLEXED MICROBIAL WGS SUMMARY
-Workflow compatible with both the PacBio RS II and Sequel systems
-Complete assembly with <10 contigs with 50-fold coverage: multiplex up to 12 microbes (<5 Mb)
-Recommended # of SMRT Cells: PacBio RS II: 2 microbes = 1 SMRT Cell; Sequel System: 12 microbes (4-5 Mb) / 1 SMRT Cell
-Base Modification: minimum 100-fold coverage required
-Under Investigation: 10 hour movies and pre-extension increase read length and barcoded yield
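The coverage targets on this slide translate into a per-cell yield requirement. The calculation below assumes equal pooling, an even spread of reads across barcodes, and genomes at the 5 Mb upper bound; it counts only demultiplexed, barcoded bases, and the function name is hypothetical.

```python
def required_yield_bp(n_genomes: int, genome_size_bp: int, fold_coverage: int) -> int:
    """Barcoded bases needed so every genome reaches the target coverage."""
    return n_genomes * genome_size_bp * fold_coverage


# 12 microbes of 5 Mb each on one Sequel SMRT Cell.
assembly_yield = required_yield_bp(12, 5_000_000, 50)    # 50-fold for assembly
base_mod_yield = required_yield_bp(12, 5_000_000, 100)   # 100-fold for base modification

print(f"assembly: {assembly_yield / 1e9:.1f} Gb barcoded yield")
print(f"base modification: {base_mod_yield / 1e9:.1f} Gb barcoded yield")
```

Doubling the coverage target for base modification doubles the required barcoded yield, so a 12-plex pool that is sufficient for assembly may need fewer samples per cell or additional cells for modification analysis.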

36 WHERE TO GET IT
Sample preparation: content/uploads/procedure-checklist-Preparing-SMRTbell-Libraries-PacBio-Barcoded-Adapters-Multiplex-SMRT-Sequencing.pdf
Data analysis: content/uploads/analysis-procedure-Multiplexed-Microbial-Assembly-SMRT-Link.pdf
ASM 2016 poster:
PacBio data set: wiki/8-plex-ecoli-multiplexed-microbial-Assembly

37 SMRT LINK 5.0 COMING SOON

38 ACKNOWLEDGMENTS
-Minor Variants: Armin Töpfer, Michael Brown, Matt Boitano
-Structural Variants: Aaron Wenger, Yuan Li
-Multiplexed Microbial WGS: John Harting, Christine Lambert, Primo Baybayan

39 For Research Use Only. Not for use in diagnostics procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.