Structural Variant Detection in SMRT Link 5 with pbsv


Aaron Wenger
2017-06-27
For Research Use Only. Not for use in diagnostics procedures. Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved.

STRUCTURAL VARIANT = DIFFERENCE ≥50 BP
Types: deletion, insertion, duplication, inversion, tandem repeat, translocation

VARIATION BETWEEN TWO HUMAN GENOMES
Variant type          Variants    Basepairs affected
SNVs                  5 × 10^6    5 Mb
Indels                4 × 10^5    3 Mb
Structural variants   2 × 10^4    10 Mb
Huddleston et al. (2017) Genome Research 27(5):677-85.

STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME
PacBio: 20,000
Short reads: 4,000
repeats + GC-rich + large insertions
Huddleston et al. (2017) Genome Research 27(5):677-85. Seo et al. (2016) Nature 538:243-7. Sudmant et al. (2016) Nature 526:75-81.

SEQUENCING + ANALYSIS
Short reads + BWA: SNVs + indels
pbsv?: structural variants
Li and Durbin (2009) Bioinformatics 25:1754-60. McKenna et al. (2010) Genome Research 20:1297-303.

3 COMPONENTS TO PBSV
pbsv: command line utility for top-level commands
pbsvutil: command line utility for detailed commands
SMRT Link: web interface

TOP-LEVEL PBSV COMMANDS

pbsv generate-config [-h] [-o sv.cfg]
    (optional) Generate a configuration file to specify options for other stages.

pbsv align [-h] [--cfg_fn sv.cfg] ref.fa subreads.bam ref.align.bam
    Map reads to a reference genome with a structural variant aware aligner.

pbsv call [-h] [--cfg_fn sv.cfg] ref.fa ref.align.bam ref.sv.{bed|vcf}
    Call structural variants from aligned reads.
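
As a rough illustration of how the three top-level commands fit together, the sketch below assembles their argument lists in Python. The helper name `pbsv_commands` and the file names are hypothetical; only the subcommands, the `-o` and `--cfg_fn` options, and the positional argument order are taken from the usage lines above. Nothing is executed here.

```python
from typing import List, Optional


def pbsv_commands(ref_fasta: str,
                  subreads_bam: str,
                  aligned_bam: str,
                  sv_output: str,
                  config: Optional[str] = None) -> List[List[str]]:
    """Build argument lists for the pbsv stages in the order they would run.

    Hypothetical helper: it only assembles commands and does not run them.
    """
    # The configuration file is optional, so the flag is added only when given.
    config_args = ["--cfg_fn", config] if config else []
    commands = []
    if config:
        commands.append(["pbsv", "generate-config", "-o", config])
    commands.append(["pbsv", "align", *config_args,
                     ref_fasta, subreads_bam, aligned_bam])
    commands.append(["pbsv", "call", *config_args,
                     ref_fasta, aligned_bam, sv_output])
    return commands


for command in pbsv_commands("ref.fa", "subreads.bam", "ref.align.bam",
                             "ref.sv.vcf", config="sv.cfg"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` on a machine where pbsv is installed, so that a failed alignment stops the pipeline before calling.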


PBSV ALIGN UTILIZES NGM-LR
pbsvutil ngmlr
[Figure: gap penalty versus gap size, comparing BWA and NGM-LR. Sequencing errors are frequent and independent; structural variants are infrequent and correlated.]
Rescheneder, Sedlazeck, and Schatz. https://github.com/philres/ngmlr/.
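
The acknowledgments slide labels the NGM-LR penalty curve as convex. A minimal sketch of why that shape matters, assuming an affine (linear in gap size) penalty for comparison and a logarithmic curve as one possible convex model: the functional forms and every parameter value below are illustrative assumptions, not the scoring used by BWA or NGM-LR.

```python
import math


def affine_gap_penalty(gap_size: int,
                       gap_open: float = 6.0,
                       gap_extend: float = 1.0) -> float:
    """Affine model (assumed): every extra gap base costs the same amount."""
    return gap_open + gap_extend * gap_size


def convex_gap_penalty(gap_size: int,
                       gap_open: float = 6.0,
                       scale: float = 4.0) -> float:
    """Convex model (assumed log form): each extra gap base costs less than
    the one before, so one long gap stays affordable."""
    return gap_open + scale * math.log(gap_size)


# One 300 bp gap (a structural variant) versus 300 separate 1 bp gaps
# (independent sequencing errors), scored under both models.
for name, penalty in (("affine", affine_gap_penalty),
                      ("convex", convex_gap_penalty)):
    one_long_gap = penalty(300)
    many_short_gaps = 300 * penalty(1)
    print(f"{name}: one 300 bp gap = {one_long_gap:.1f}, "
          f"300 x 1 bp gaps = {many_short_gaps:.1f}")
```

Under the convex curve, many small independent gaps remain expensive, while a single large gap costs only slightly more than a medium one. This is the behavior wanted when sequencing errors are frequent and independent and structural variants are infrequent and correlated.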

PBSV ALIGN CHAINS CO-LINEAR ALIGNMENTS
pbsvutil chain
[Figure: reference with segments X W Y W Z; read alignments X and Z; chained alignment spanning X W Y W Z]
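
The idea of chaining co-linear alignments can be sketched as follows. This is a simplified greedy illustration, not the `pbsvutil chain` implementation: the `Segment` type, the coordinate convention (read and reference coordinates both increase along an alignment), and the strict no-overlap rule are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One local alignment of part of a read (hypothetical record type)."""
    read_start: int
    read_end: int
    ref_name: str
    ref_start: int
    ref_end: int
    strand: str


def chain_colinear(segments: List[Segment]) -> List[List[Segment]]:
    """Greedily group alignments of one read into co-linear chains.

    A segment joins the current chain when it lies on the same reference
    sequence and strand as the previous segment and starts at or after the
    previous segment's end in both read and reference coordinates.
    """
    chains: List[List[Segment]] = []
    for seg in sorted(segments, key=lambda s: s.read_start):
        if chains:
            last = chains[-1][-1]
            same_target = (seg.ref_name == last.ref_name
                           and seg.strand == last.strand)
            moves_forward = (seg.read_start >= last.read_end
                             and seg.ref_start >= last.ref_end)
            if same_target and moves_forward:
                chains[-1].append(seg)
                continue
        chains.append([seg])
    return chains


# Two alignments of one read that flank a 329 bp stretch of reference
# missing from the read: they chain into a single alignment.
left = Segment(0, 5000, "chr1", 100000, 105000, "+")
right = Segment(5000, 9000, "chr1", 105329, 109329, "+")
print(len(chain_colinear([left, right])))
```

Once the two pieces are chained, the skipped reference bases between them can be represented as a single deletion within one alignment.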


PBSV CALL: STAGED STRUCTURAL VARIANT CALLER
FIND SV SIGNATURES: CIGAR D & I ≥50 bp
CLUSTER SV SIGNATURES: nearby with similar sequence
SUMMARIZE INTO SV: consensus of supporting reads
    Example: 63 bp insertion; 329 bp deletion
GENOTYPE SV: supporting reads / covering reads
    Example: heterozygous (1 of 10); heterozygous (4 of 10)
ANNOTATE SV: Alu, LINE, SVA, tandem repeat
    Example: -; Alu
FILTER SV: ≥2 and ≥20% reads support
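
Two of the stages above, finding signatures in CIGAR strings and filtering on read support, are simple enough to sketch. The code below is an illustrative approximation rather than the pbsv implementation: the function names are hypothetical, and treating the thresholds as inclusive (at least 50 bp, at least 2 reads, at least 20% of covering reads) is an assumption.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

MIN_SV_LENGTH = 50               # bp, smallest D or I kept as a signature
MIN_SUPPORTING_READS = 2         # absolute read support required
MIN_SUPPORTING_FRACTION = 0.20   # share of covering reads required


def find_sv_signatures(ref_start: int, cigar: str,
                       min_length: int = MIN_SV_LENGTH
                       ) -> List[Tuple[str, int, int]]:
    """Return (type, reference position, length) for each long D or I."""
    signatures = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "ID" and length >= min_length:
            signatures.append(("INS" if op == "I" else "DEL", ref_pos, length))
        if op in "MDN=X":  # only these operations consume reference bases
            ref_pos += length
    return signatures


def passes_filter(supporting: int, covering: int,
                  min_reads: int = MIN_SUPPORTING_READS,
                  min_fraction: float = MIN_SUPPORTING_FRACTION) -> bool:
    """Keep a call only if enough reads, in count and in share, support it."""
    if covering <= 0:
        return False
    return supporting >= min_reads and supporting / covering >= min_fraction


# A made-up alignment carrying a 63 bp insertion, a 5 bp deletion that is
# too short to count, and a 329 bp deletion.
print(find_sv_signatures(1000, "500M63I200M5D100M329D50M"))
print(passes_filter(1, 10), passes_filter(4, 10))
```

With these thresholds, a variant supported by 1 of 10 reads is filtered out while one supported by 4 of 10 reads is kept.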

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER
[Screenshot: SMRT Analysis in SMRT Link]


VCF output:
chr1 904490 ACGCGGCCGCCTCCTCCTCCGAACGTGGCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGA A PASS IMPRECISE;SVTYPE=DEL;END=904587;SVLEN=-97;SVANN=TANDEM GT:AD:DP 0/1:9:15
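
To make the record above easier to read, here is a small sketch that unpacks its INFO and sample fields using only string operations. The helper names are hypothetical, and reading AD as the number of supporting reads and DP as the number of covering reads is an assumption based on the genotyping stage described earlier.

```python
from typing import Dict, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO string into key/value pairs; bare flags map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        key, separator, value = item.partition("=")
        fields[key] = value if separator else True
    return fields


def parse_sample(format_keys: str, sample_values: str) -> Dict[str, str]:
    """Pair the FORMAT keys with one sample's colon-separated values."""
    return dict(zip(format_keys.split(":"), sample_values.split(":")))


pos = 904490
info = parse_info("IMPRECISE;SVTYPE=DEL;END=904587;SVLEN=-97;SVANN=TANDEM")
sample = parse_sample("GT:AD:DP", "0/1:9:15")

# The deleted span implied by POS and END should match the SVLEN magnitude.
span = int(info["END"]) - pos
support_fraction = int(sample["AD"]) / int(sample["DP"])

print(info["SVTYPE"], span, info["SVANN"])
print(sample["GT"], f"{support_fraction:.0%} of reads support the variant")
```

Read this way, the call is a 97 bp deletion in a tandem repeat, genotyped heterozygous, with 9 of 15 reads supporting it.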

BED output:
chr1 904490 904587 Deletion -97. GT:AD:DP 0/1:9:15 SVANN=TANDEM



ACKNOWLEDGMENTS
Schatz Lab: Michael Schatz, Philipp Rescheneder, Fritz Sedlazeck
PacBio: Yuan Li, Chris Dunn, Ben Lerch, Jim Drake, Nat Echols, Aaron Klammer, Mary Budagyan
[Figure: NGM-LR convex gap penalty versus gap size, for errors and indels]
