Structural Variant Detection in SMRT Link 5 with pbsv

Size: px
Start display at page:

Download "Structural Variant Detection in SMRT Link 5 with pbsv"

Transcription

1 Structural Variant Detection in SMRT Link 5 with pbsv Aaron Wenger For Research Use Only. Not for use in diagnostics procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved.

2 STRUCTURAL VARIANT = DIFFERENCE 50 BP Deletion Insertion Duplication Inversion Tandem Repeat Translocation

3 VARIATION BETWEEN TWO HUMAN GENOMES vs variants basepairs affected 5 Mb 3 Mb 10 Mb SNVs indels structural variants Huddleston et al. (2017) Genome Research 27(5):

4 STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME PacBio 20,000 Short reads 4,000 repeats + GC-rich + large insertions Huddleston et al. (2017) Genome Research 27(5): Seo et al. (2016) Nature 538: Sudmant et al. (2016) Nature 526:75-81.

5 SEQUENCING + ANALYSIS pbsv? Structural Variants Short reads Li and Durbin (2009) Bioinformatics 25: McKenna et al. (2010) Genome Research 20: BWA SNVs + Indels

6 3 COMPONENTS TO PBSV pbsv command line utility for top-level commands pbsvutil command line utility for detailed commands SMRT Link web interface

7 TOP-LEVEL PBSV COMMANDS pbsv generate-config [-h] [-o sv.cfg] (optional) Generate a configuration file to specify options for other stages. pbsv align [-h] [--cfg_fn sv.cfg] ref.fa subreads.bam ref.align.bam Map reads to a reference genome with a structural variant aware aligner. pbsv call [-h] [--cfg_fn sv.cfg] ref.fa ref.align.bam ref.sv.bed vcf Call structural variants from aligned reads.

8 TOP-LEVEL PBSV COMMANDS pbsv generate-config [-h] [-o sv.cfg] (optional) Generate a configuration file to specify options for other stages. pbsv align [-h] [--cfg_fn sv.cfg] ref.fa subreads.bam ref.align.bam Map reads to a reference genome with a structural variant aware aligner. pbsv call [-h] [--cfg_fn sv.cfg] ref.fa ref.align.bam ref.sv.bed vcf Call structural variants from aligned reads.

9 PBSV ALIGN UTILIZES NGM-LR pbsvutil ngmlr sequencing errors (frequent & independent) penalty structural variants (infrequent & correlated) gap size Rescheneder, Sedlazeck, and Schatz.

10 PBSV ALIGN UTILIZES NGM-LR BWA NGM-LR pbsvutil ngmlr sequencing errors sequencing errors penalty structural variants penalty structural variants gap size gap size

11 PBSV ALIGN CHAINS CO-LINEAR ALIGNMENTS Reference X W Y W Z pbsvutil chain Read X Z W X W Y W Z

12 TOP-LEVEL PBSV COMMANDS pbsv generate-config [-h] [-o sv.cfg] (optional) Generate a configuration file to specify options for other stages. pbsv align [-h] [--cfg_fn sv.cfg] ref.fa subreads.bam ref.align.bam Map reads to a reference genome with a structural variant aware aligner. pbsv call [-h] [--cfg_fn sv.cfg] ref.fa ref.align.bam ref.sv.bed vcf Call structural variants from aligned reads.

13 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CLUSTER SV SIGNATURES SUMMARIZE INTO SV GENOTYPE SV ANNOTATE SV CIGAR D & I 50 bp nearby with similar sequence consensus of supporting reads supporting reads / covering reads Alu, LINE, SVA, tandem repeat FILTER SV 2 and 20% reads support

14 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CLUSTER SV SIGNATURES SUMMARIZE INTO SV GENOTYPE SV ANNOTATE SV CIGAR D & I 50 bp nearby with similar sequence consensus of supporting reads supporting reads / covering reads Alu, LINE, SVA, tandem repeat FILTER SV 2 and 20% reads support

15 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CLUSTER SV SIGNATURES SUMMARIZE INTO SV GENOTYPE SV ANNOTATE SV CIGAR D & I 50 bp nearby with similar sequence consensus of supporting reads supporting reads / covering reads Alu, LINE, SVA, tandem repeat FILTER SV 2 and 20% reads support

16 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CIGAR D & I 50 bp CLUSTER SV SIGNATURES nearby with similar sequence SUMMARIZE INTO SV consensus of supporting reads 63 bp insertion 329 bp deletion GENOTYPE SV supporting reads / covering reads ANNOTATE SV Alu, LINE, SVA, tandem repeat FILTER SV 2 and 20% reads support

17 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CIGAR D & I 50 bp CLUSTER SV SIGNATURES nearby with similar sequence SUMMARIZE INTO SV consensus of supporting reads 63 bp insertion 329 bp deletion GENOTYPE SV supporting reads / covering reads heterozygous (1 of 10) heterozygous (4 of 10) ANNOTATE SV Alu, LINE, SVA, tandem repeat FILTER SV 2 and 20% reads support

18 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CIGAR D & I 50 bp CLUSTER SV SIGNATURES nearby with similar sequence SUMMARIZE INTO SV consensus of supporting reads 63 bp insertion 329 bp deletion GENOTYPE SV supporting reads / covering reads heterozygous (1 of 10) heterozygous (4 of 10) ANNOTATE SV Alu, LINE, SVA, tandem repeat - Alu FILTER SV 2 and 20% reads support

19 PBSV CALL: STAGED STRUCTURAL VARIANT CALLER FIND SV SIGNATURES CIGAR D & I 50 bp CLUSTER SV SIGNATURES nearby with similar sequence SUMMARIZE INTO SV consensus of supporting reads 63 bp insertion 329 bp deletion GENOTYPE SV supporting reads / covering reads heterozygous (1 of 10) heterozygous (4 of 10) ANNOTATE SV Alu, LINE, SVA, tandem repeat - Alu FILTER SV 2 and 20% reads support

20 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis

21 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis

22 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis chr ACGCGGCCGCCTCCTCCTCCGAACGTGGCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGA A PASS IMPRECISE;SVTYPE=DEL;END=904587;SVLEN=-97;SVANN=TANDEM GT:AD:DP 0/1:9:15

23 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis chr Deletion -97. GT:AD:DP 0/1:9:15 SVANN=TANDEM

24 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis

25 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis

26 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis

27 PBSV: SMRT LINK STRUCTURAL VARIANT CALLER SMRT Analysis

28 3 COMPONENTS TO PBSV pbsv command line utility for top-level commands pbsvutil command line utility for detailed commands SMRT Link web interface

29 ACKNOWLEDGMENTS Schatz Lab Michael Schatz Philipp Rescheneder Fritz Sedlazeck PacBio Yuan Li Chris Dunn Ben Lerch Jim Drake Nat Echols Aaron Klammer Mary Budagyan NGM-LR penalty convex errors indels gap size

30 For Research Use Only. Not for use in diagnostics procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.