Next Generation Sequencing: Data analysis for genetic profiling

Similar documents
Welcome to the NGS webinar series

Variant calling in NGS experiments

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist

C3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère

Your Best Data: Teaming QIAGEN Chemistry & Bioinformatics to Drive Samples to Insight

Variant Finding. UCD Genome Center Bioinformatics Core Wednesday 30 August 2016

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd

SNP calling and VCF format

Prioritization: from vcf to finding the causative gene

QIAseq SPE technology for Illumina : Redefining amplicon sequencing

Variant Analysis. CB2-201 Computational Biology and Bioinformatics! February 27, Emidio Capriotti!

NGS in Pathology Webinar

Molecular analysis via next-generation sequencing

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017

Variant Discovery. Jie (Jessie) Li PhD Bioinformatics Analyst Bioinformatics Core, UCD

Get to Know Your DNA. Every Single Fragment.

SNP calling. Jose Blanca COMAV institute bioinf.comav.upv.es

Analytics Behind Genomic Testing

HaloPlex HS. Get to Know Your DNA. Every Single Fragment. Kevin Poon, Ph.D.

From raw reads to variants

QIAseq Targeted Panel Analysis Plugin USER MANUAL

Processing Ion AmpliSeq Data using NextGENe Software v2.3.0

Read Mapping and Variant Calling. Johannes Starlinger

Variant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

IDENTIFYING A DISEASE CAUSING MUTATION

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

QIAGEN s Preanalytic NGS Solutions

BICF Variant Analysis Tools. Using the BioHPC Workflow Launching Tool Astrocyte

MPG NGS workshop I: SNP calling

Genomics: Human variation

Comparing a few SNP calling algorithms using low-coverage sequencing data

Next Generation Sequencing. Target Enrichment

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Annotating your variants: Ensembl Variant Effect Predictor (VEP) Helen Sparrow Ensembl EMBL-EBI 2nd November 2016

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014

Accelerate your NGS performance through Sample to Insight solutions

Exome Sequencing and Disease Gene Search

Next-Generation Sequencing. Technologies

Agilent NGS Solutions : Addressing Today s Challenges

Mapping errors require re- alignment

Variant Callers. J Fass 24 August 2017

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Development and characterization of a high throughput targeted genotypingby-sequencing solution for agricultural genetic applications

Cancer Genetics Solutions

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail

SureSelect Target Enrichment for the Ion Proton TM Next Generation Sequencing System

Ion S5 and Ion S5 XL Systems

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl

Variant detection analysis in the BRCA1/2 genes from Ion torrent PGM data

Implementation of Ion AmpliSeq in molecular diagnostics

Ion S5 and Ion S5 XL Systems

Oncomine cfdna Assays Part III: Variant Analysis

Reads to Discovery. Visualize Annotate Discover. Small DNA-Seq ChIP-Seq Methyl-Seq. MeDIP-Seq. RNA-Seq. RNA-Seq.

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

Introducing QIAseq. Accelerate your NGS performance through Sample to Insight solutions. Sample to Insight

Variant prioritization in NGS studies: Annotation and Filtering "

The Agilent Technologies SureSelect Platform for Target Enrichment

Enterprise Interest I am an employee of ThermoFisher Scientific.

Targeted Sequencing Using Droplet-Based Microfluidics. Keith Brown Director, Sales

2017 HTS-CSRS COMMUNITY PUBLIC WORKSHOP

What is genetic variation?

SureSelect XT HS. Target Enrichment

Access Array BRCA1 / BRCA2 / TP53 Target-Specific Panel Build the highest quality amplicon libraries with qualified assays

Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang

HiSeq Whole Exome Sequencing Report. BGI Co., Ltd.

Detection of Rare Variants in Degraded FFPE Samples Using the HaloPlex Target Enrichment System

VARIANT ANNOTATION. Vivien Deshaies.

G E N OM I C S S E RV I C ES

Using VarSeq to Improve Variant Analysis Research

Published online 15 May 2014 Nucleic Acids Research, 2014, Vol. 42, No. 12 e101 doi: /nar/gku392

Normal-Tumor Comparison using Next-Generation Sequencing Data

CITATION FILE CONTENT / FORMAT

Targeted Sequencing of Leukemia-Associated Genes Using 454 Sequencing Systems

Services Presentation Genomics Experts

SUPPLEMENTARY INFORMATION

Analytical verification methods for the Oncomine Lung cfdna Assay using the Ion S5 XL System

Practical Considerations for Implementation of Clinical Sequencing. Emily Winn-Deen, Ph.D. April 2017

A new paradigm in testing for NSCLC-targeted therapies

About Strand NGS. Strand Genomics, Inc All rights reserved.

Strand NGS Variant Caller

Targeted Sequencing in the NBS Laboratory

SEQUENCING FROM SAMPLE TO SEQUENCE READY

New Frontiers of Genetic Profiling Achieve Higher Sensitivity and Greater Insights with Molecular Barcodes, Long Read Capture and Optimized Exomes

Surely Better Target Enrichment from Sample to Sequencer

Fast, Accurate and Sensitive DNA Variant Detection from Sanger Sequencing:

Edge effects in calling variants from targeted amplicon sequencing

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

Create a Planned Run. Using the Ion AmpliSeq Pharmacogenomics Research Panel Plugin USER BULLETIN. Publication Number MAN Revision A.

Novel Variant Discovery Tutorial

Sanger vs Next-Gen Sequencing

Target Enrichment Strategies for Next Generation Sequencing

Variation detection based on second generation sequencing data. Xin LIU Department of Science and Technology, BGI

Maximizing your NGS sequencing with IDT. Adam Chernick, PhD Field Applications Manager, Functional Genomics

Proteogenomics. Kelly Ruggles, Ph.D. Proteomics Informatics Week 9

An Automated Pipeline for NGS Testing and Reporting in a Commercial Molecular Pathology Lab: The Genoptix Case

Digital DNA/RNA sequencing enables highly accurate and sensitive biomarker detection and quantification

Supplementary Figures and Data

Transcription:

Next Generation Sequencing: Data analysis for genetic profiling Raed Samara, Ph.D. Global Product Manager Raed.Samara@QIAGEN.com

Welcome to the NGS webinar series - 2015 NGS Technology Webinar 1 NGS: Introduction to technology, and applications January 5 th, 1:00 am EDT, 10 am PDT, 6 pm GMT Webinar 2 Targeted NGS for Cancer Research January 12 th, 1:00 am EDT, 10 am PDT, 6 pm GMT NGS in cancer Webinar 3 NGS: Data analysis for genetic profiling January 19th, 1:00 am EDT, 10 am PDT, 6 pm GMT Data analysis pre-set parameters Webinar 4 NGS: Advanced analysis with IVA & CLC bio Cancer Research Workbench January 26th, 1:00 am EDT, 10 am PDT, 6 pm GMT Advance data analysis & interpretation

Legal Disclaimer QIAGEN products shown here are intended for molecular biology applications. These products are not intended for the diagnosis, prevention, or treatment of a disease. For up-to-date licensing information and product-specific disclaimers, see the respective QIAGEN kit handbook or user manual. QIAGEN kit handbooks and user manuals are available at www.qiagen.com or can be requested from QIAGEN Technical Services or your local distributor. Title, Location, Date 3

Agenda NGS Data Analysis Read Mapping Variant Calling Variant Annotation Targeted Enrichment GeneRead Gene Panels GeneRead Data Analysis Portal Workflow Interface Data Interpretation 4

Millions of reads from a single run Read Mapping Reads mapped to a reference genome Alignment Mapping Quality Programs for read-mapping Hash-based: MAQ, ELAND, SOAP, Novoalign Suffix array/burrows Wheeler Transform based: BWA, BowTie, BowTie2, SOAP2 Title, Location, Date 5

Variant Calling Determine if there is enough statistical support to call a variant Reference sequence ACAGTTAAGCCTGAACTAGACTAGGATCGTCCTAGATAGTCTCGATAGCTCGATATC Aligned reads AACTAGACTAGGATCGTCCTAGATAGTCTCG AACTAGACTAGGATCGTCCTACATAGTCTCG AACTAGACTAGGATCGTCCTACATAGTCTCG GATCGTCCTAGATAGTCTCGATAGCTCGAT GATCGTCCTAGATAGTCTCGATAGCTCGAT GATCGTCCTAGATAGTCTCGATAGCTCGAT Multiple factors are considered in calling variants No. of reads with the variant Mapping qualities of the reads Base qualities at the variant position Strand bias (variant is seen in only one of the strands) Variant Calling Software GATK Unified Genotyper, Torrent Variant Caller, SamTools, Mutect, Title, Location, Date 6

Variant Representation VCF Variant Call Format http://www.1000genomes.org/wiki/analysis/variant%20call%20format/vcf-variant-call-format-version-41 Header lines ##fileformat=vcfv4.1 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered"> ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias"> ##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality"> ##INFO=<ID=OND,Number=1,Type=Float,Description="Overall non-diploid ratio (alleles/(alleles+nonalleles))"> ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth"> ##contig=<id=chrm,length=16571,assembly=hg19> ##contig=<id=chr1,length=249250621,assembly=hg19> ##contig=<id=chr2,length=243199373,assembly=hg19> ##contig=<id=chr3,length=198022430,assembly=hg19> Column labels ##contig=<id=chr4,length=191154276,assembly=hg19> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample chr1 11181327 rs11121691 C T 100.0 PASS DP=1000;MQ=87.67 GT:AD:DP 0/1:146,45:191 chr1 11190646 rs2275527 G A 100.0 PASS DP=1000;MQ=67.38 GT:AD:DP 0/1:462,121:583 chr1 11205058 rs1057079 C T 100.0 PASS DP=1000;MQ=79.57 GT:AD:DP 0/1:49,143:192 Variant calls Title, Location, Date 7

Variant Annotation dbsnp/cosmic ID Actual change and position within the codon or amino acid sequence Effect of the variant on protein coding Chro R Gene Mutation Codon AA Filtered Pos ID Alt m ef Name type Change Change Coverage Allele Frequency Variant Frequency snpeff Effect chr1 11181327 rs11121691 C T MTOR SNP c.6909c>t p.l2303 1,924 C=0.761 T=0.239 0.239 SYNONYMOUS_CODING chr1 11190646 rs2275527 G A MTOR SNP c.5553g>a p.s1851 5,842 G=0.791 A=0.208 0.208 SYNONYMOUS_CODING chr1 11205058 rs1057079 C T MTOR SNP c.4731c>t p.a1577 1,928 C=0.254 T=0.746 0.746 SYNONYMOUS_CODING chr1 11288758 rs1064261 G A MTOR SNP c.2997g>a p.n999 5,186 G=0.212 A=0.788 0.788 SYNONYMOUS_CODING chr1 11300344 rs191073707 C T MTOR SNP 210 C=0.924 T=0.076 0.076 INTRON chr1 11301714 rs1135172 A G MTOR SNP c.1437a>g p.d479 3,965 A=0.248 G=0.752 0.752 SYNONYMOUS_CODING chr1 11322628 rs2295080 G T MTOR SNP 339 G=0.239 T=0.755 0.755 UPSTREAM chr1 186641626 rs2853805 G A PTGS2 SNP 97 G=0.0 A=1.0 1 UTR_3_PRIME chr1 186642429 rs2206593 A G PTGS2 SNP 3,552 A=0.167 G=0.833 0.833 UTR_3_PRIME chr1 186643058 rs5275 A G PTGS2 SNP 237 A=0.759 G=0.241 0.241 UTR_3_PRIME chr1 186645927 rs2066826 C T PTGS2 SNP 209 C=0.88 T=0.12 0.12 INTRON chr2 29415792 rs1728828 G A ALK SNP 2,520 G=0.0 A=1.0 1 UTR_3_PRIME chr2 29416366 rs1881421 G C ALK SNP c.4587g>c p.d1529e NON_SYNONYMOUS_CODI G=0.907 C=0.093 4,361 0.093 NG chr2 29416481 rs1881420 T C ALK SNP c.4472t>c p.k1491r NON_SYNONYMOUS_CODI T=0.954 C=0.045 3,061 0.045 NG chr2 29416572 rs1670283 T C ALK SNP c.4381t>c p.i1461v NON_SYNONYMOUS_CODI T=0.0 C=0.999 5,834 0.999 NG chr2 29419591 rs1670284 G T ALK SNP 739 G=0.093 T=0.907 0.907 INTRON chr2 29445458 rs3795850 G T ALK SNP c.3375g>t p.g1125 1,776 G=0.917 T=0.082 0.082 SYNONYMOUS_CODING chr2 29446184 rs2276550 C G ALK SNP 475 C=0.895 G=0.105 0.105 INTRON SIFT score Predicts the deleterious effect of an amino acid change based on how conserved the sequence is among related species Polyphen score Predicts the impact of the variant on protein structure Title, Location, Date 8

GeneRead DNAseq Gene Panel: Targeted Sequencing What is targeted sequencing? Sequencing a sub set of regions in the whole-genome Why do we need targeted sequencing? Not all regions in the genome are of interest or relevant to a specific study Exome Sequencing: sequencing most of the exonic regions of the genome (exome). Protein-coding regions constitute less than 2% of the entire genome Focused panel/hot spot sequencing: focused on the genes or regions of interest What are the advantages of focused panel sequencing? More coverage per sample, more sensitive mutation detection More samples per run, lower cost per sample Title, Location, Date 9

Target Enrichment - Methodology Multiplex PCR Small DNA input (40ng) Short processing time (several hrs) Relatively small throughput (KB - MB region) Sample preparation (DNA isolation) PCR target enrichment (2 hours) Library construction Sequencing Data analysis Title, Location, Date 10

Variants Identifiable through Multiplex PCR SNPs single nucleotide polymorphisms Indels Indels < 20 bp in length Variants callable with the help of a reference Copy number variants (CNVs) CNV Variants not callable Structural variants Large indels Inversions Large insertion Inversion Title, Location, Date 11

GeneRead DNAseq Gene Panel Multiplex PCR technology based targeted enrichment for DNA sequencing Cover all human exons (coding regions) Division of gene primers sets into 4 tubes; up to 2400 plex in each tube 12

GeneRead DNASeq Gene Panels Type Panel name # Genes # Amplicons Target region (bases) Solid tumors Hematologic malignancy Tumor Actionable Mutations Clinically Relevant Tumor 8 118 7104 24 602 39603 Myeloid Neoplasms 50 2536 236319 Genes Involved in Disease Genes with High Relevance Disease-specific Comprehensive Breast Cancer 44 2915 268621 Colorectal Cancer 38 1954 182851 Liver Cancer 33 2052 191170 Lung Cancer 45 3586 332999 Ovarian Cancer 32 2021 198058 Prostate Cancer 32 1837 167195 Gastric Cancer 29 2377 222333 Cardiomyopathy 58 2657 249727 Carrier Testing 157 6943 664735 Cancer Predisposition 143 6582 620318 Comprehensive Cancer 160 7951 744835 13

GeneRead DNAseq Custom Panel 14

Data Analysis for Targeted Sequencing GeneRead data analysis work flow Read Mapping Primer Trimming Variant Calling Variant Annotation Read mapping Identify the possible position of the read within the reference Align the read sequence to reference sequences Primer trimming Remove primer sequences from the reads Variant calling Identify differences between the reference and reads Variant filtering and annotation Functional information about the variant Title, Location, Date 15

Reads from Targeted Sequencing Typical NGS raw read from targeted sequencing Adapter Barcode Primer Insert sequence Primer Adapter 5 - -3 Removal of adapters and de-multiplexing Primer Insert sequence Primer 5 - -3 Read length can vary: only part of the insert or the 3 primer may be present 5 - -3 5 - -3 5 - -3 Title, Location, Date 16

Read Mapping Align reads to the reference genome Reference sequence Amplicon 1 Aligned reads Amplicon 2 Title, Location, Date 17

Primer Trimming Primer sequences must be trimmed for accurate variant calling Reference sequence Amplicon 1 C C Frequency of `C` without primer trimming = 4/13 = 31% C C Aligned reads Amplicon 2 Frequency of `C` after primer trimming = 4/7 = 57% Title, Location, Date 18

GeneRead Variant Calling Overview Raw reads from the sequencer (de-multiplexed) GeneRead panel ID Analysis mode Annotation snpeff (basic annotation) Read mapping and post-processing BowTie2/BWA BAM GATK Indel Realigner Primer Trimming BAM GATK UG/ VCF Strelka/CLC LF MiSeq/HiSeq BAM BAM Sequencing platform Variant calling and filtering CNV calling w/ref VCF GATK Base Quality Score Recalibrator BAM BAQ Computation IonTorrent VCF TMAP BAM Primer Trimming BAM Torrent Variant Caller (TVC) VCF SnpSift (links to dbsnp, Cosmic and computation of Sift scores, etc.) VCF dbsnp Cosmic ClinVar dbnsfp GATK Variant Annotator GATK Variant Filtration VCF Additional filtering (based on frequency and coverage) VCF Variants in Excel format Separate interface, free preview available Variants in Ingenuity Variant Analysis Title, Location, Date 19

Indel Realignment DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May; 13(5):191-8. PMID: 21178889 Eliminates some false-positive variant calls around indels Read aligners can not eliminate these alignment errors since they align reads independently Multiple sequence alignment can identify these errors and correct them Title, Location, Date 20

Base Quality Recalibration DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011 May; 13(5):191-8. PMID: 21178889 Eliminates sequencer-specific biases Lane-specific/sample-specific biases Instrument-specific under-reporting/over-reporting of quality scores Systematic errors based on read position Di-nucleotide-specific sequencing errors Recalibration leads to improved variant calls Title, Location, Date 21

Variant Filtration Variant Frequency Somatic mode SNPs with frequency < 4% and indels with frequency < 20% Germline mode SNPs with frequency < 20% and indels with frequency < 25% Strand Bias SNPs with FS 60 Indels with FS 200 Mapping Quality SNPs with MQ 40.0 Haplotype Score SNPs with HaplotypeScore 13.0 Not applicable for pooled samples C C C Strand Bias: variants that are present in reads from only one of the two strands Title, Location, Date 22

Copy Number Variants CNV calls made by comparison to normalized read depths in a reference sample High specificity and sensitivity possible due pooling evidence from multiple amplicons Title, Location, Date 23

Specificity Analysis Specificity: the percentage of sequences that map to the intended targets region of interest number of on-target reads / total number of reads Coverage depth (or depth of coverage): how many times each base has been sequenced Coverage uniformity: evenness of the coverage depth along the target region Reference sequence ROI 1 ROI 2 NGS reads Off-target reads On-target reads On-target reads Title, Location, Date 24

GeneRead Data Analysis Web Portal FREE Complete & Easy to use Data Analysis with Web-based Software.bam file (Ion Torrent).fastq or.fastq.gz (MiSeq/HiSeq) Submit runs for analysis View output files Delete/manage files 25

Features of Variant Report SNP detection Indel detection 26

Biological Interpretation Using IVA VCF to Ingenuity Variant Analysis Built-in optimized support for uploading called variant files in VCF format Create analysis by answering a few questions about study Study Type Repeat workflows Title, Location, Date 27

Features of Ingenuity Variant Analysis Annotate samples with Ingenuity Knowledge Base Compare what s genetically distinctive Identify and share the most promising variants Interactive filter cascade Share / Publish Ingenuity Annotation Title, Location, Date 28

QIAGEN s GeneRead DNAseq Gene Panel System FOCUS ON YOUR RELEVANT GENES Focused: Biologically relevant content selection enables deep sequencing on relevant genes and identification of rare mutations Flexible: Mix and match any gene of interest NGS platform independent: Functionally validated for PGM, MiSeq/HiSeq Free analysis: Free, complete and easy to use data analysis tool

Welcome to the NGS webinar series - 2015 NGS Technology Webinar 1 NGS: Introduction to technology, and applications January 5 th, 1:00 pm EDT, 10 am PDT, 6 pm GMT Webinar 2 Targeted NGS for Cancer Research January 12 th, 1:00 pm EDT, 10 am PDT, 6 pm GMT NGS in cancer Webinar 3 NGS: Data analysis for genetic profiling January 19th, 1:00 pm EDT, 10 am PDT, 6 pm GMT Data analysis pre-set parameters Webinar 4 NGS: Advanced analysis with IVA & CLC bio Cancer Research Workbench January 26th, 1:00 pm EDT, 10 am PDT, 6 pm GMT Advance data analysis & interpretation

Thank you for attending! Contact Technical Support Phone: 1-800-362-7737 BRCsupport@qiagen.com Webinar related questions: Qiawebinars@qiagen.com 31