Analytics Behind Genomic Testing

A Quick Guide to the Analytics Behind Genomic Testing
Elaine Gee, PhD
Director, Bioinformatics
ARUP Laboratories

Learning Objectives
- Catalogue various types of bioinformatics analyses that support clinical genomic testing
- Enumerate types of variant classes
- Describe algorithmic methods for variant detection by NGS
- Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
- Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
- Explain validation strategies for bringing best-in-class pipelines into clinical production

The Human Reference Genome
- ~3B base pairs structured into 23 chromosome pairs
- 3,098,825,702 base pairs
- 20,805 coding genes
- 14,181 pseudogenes
- 196,501 gene transcripts

Why Genomic Testing?
- KRAS G12D
- 1 in 4 cancer deaths are from lung cancer.
- ~222,500 new cases of lung cancer in the U.S. in 2017.

Genomic Testing
- Short-read sequencers: Illumina, Ion Torrent
- Long-read sequencers: PacBio, Nanopore
- 10X, NanoString

Types of NGS Testing: Somatic & Germline

Types of NGS Testing: cfDNA and ctDNA
- Non-Invasive Prenatal Testing (NIPT): Trisomy 21 (Down syndrome)
- Liquid biopsy: EGFR, non-small cell lung cancer

Types of NGS Testing: Infectious Disease

Types of NGS Testing: RNA-Seq
- Alternate transcripts
- Novel gene isoforms
- Gene fusions

Role of Clinical Bioinformatics
- Build pipelines
- Provide supplemental information for clinical interpretation and quality control
- Other computationally heavy analytics are involved in evaluating:
  - Design of new panels
  - Identification of genetic patterns in patient cohorts
  - Discovery of gene pathways

Understanding bioinformatics requires understanding the laboratory process.

Variant Calling Pipeline
Steps in a bioinformatics pipeline:
1. Sample demultiplexing
2. Read alignment
3. BAM polishing steps
4. Variant calling
5. Variant annotations
6. QC calculations

Step 1: Sample Demultiplexing
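
A minimal sketch of what demultiplexing does: each read carries an index (barcode) sequence, and reads are binned per sample by matching that index against the known sample barcodes. The function names, the one-mismatch tolerance, and the example barcodes are all assumptions made for illustration; this is not the method used by any particular vendor tool.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_barcodes, max_mismatches=1):
    """Bin reads by sample.

    reads: iterable of (index_sequence, read_sequence) pairs.
    sample_barcodes: dict mapping barcode -> sample name (hypothetical names).
    A read is assigned only if exactly one barcode matches within the
    mismatch tolerance; otherwise it goes to "undetermined".
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        matches = [
            name
            for barcode, name in sample_barcodes.items()
            if len(barcode) == len(index_seq)
            and hamming(barcode, index_seq) <= max_mismatches
        ]
        key = matches[0] if len(matches) == 1 else "undetermined"
        bins[key].append(read_seq)
    return dict(bins)


# Made-up barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = [
    ("ACGTAC", "AAAA"),  # exact match to sample_A
    ("ACGTAA", "CCCC"),  # one mismatch from sample_A
    ("TGCATG", "GGGG"),  # exact match to sample_B
    ("GGGGGG", "TTTT"),  # matches nothing within tolerance
]
print(demultiplex(reads, barcodes))
```

Sending ambiguous or unmatched reads to a separate bin, rather than guessing, keeps one sample's reads from contaminating another's.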

Step 2: Read Alignment
SAM Format
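
To make the SAM format concrete, here is a small sketch that parses the 11 mandatory tab-separated fields of one alignment line and checks a few standard flag bits. The record is invented for illustration, and the `SamRecord` and `parse_sam_line` names are hypothetical helpers; production pipelines would normally use an established SAM/BAM library instead.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)


def parse_sam_line(line):
    """Parse the 11 mandatory fields of a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


# Standard SAM flag bits.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400

# Invented record, for illustration only.
example = "read1\t16\tchr1\t12345\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar)
print("reverse strand:", bool(record.flag & FLAG_REVERSE))
print("unmapped:", bool(record.flag & FLAG_UNMAPPED))
```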

Step 3: BAM Polishing Steps
- PCR duplicate removal
- Base quality score recalibration
- Q30 Phred base quality score: 99.9% base call accuracy (1 error in 1,000)
[Figure: reads aligned to a reference sequence across a homopolymer run]
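
The Q30 figure follows from the Phred definition Q = -10 * log10(p), where p is the probability that the base call is wrong. The sketch below converts in both directions and decodes a quality string, assuming the Phred+33 ASCII encoding; the function names are illustrative.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode an ASCII quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


p = phred_to_error_probability(30)
print(f"Q30 error probability: {p:.4f}, accuracy: {1 - p:.1%}")
print(decode_phred33("?I5"))
```

Base quality score recalibration adjusts these scores so that the reported Q values better match the observed error rates.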

Step 4: Variant Calling by Class

Example Variant Calling Algorithms
SNV/Insertion/Deletion
- Position-based callers (GATK UnifiedGenotyper, LoFreq)
- Local de novo assembly of haplotypes (GATK HaplotypeCaller)
- Graph-based variant callers (Graph Genome)
- Neural networks (DeepVariant)
Duplications/Structural variants
- Pattern growth approach (Pindel)
- Split reads, discordant paired-end reads (Manta, DELLY, CREST)
- k-mer + de novo assembly (BreaKmer)
- Unmapped or partially mapped reads (ITD Assembler)
- Depth of coverage + background error correction + principal component analysis (XHMM)
- Tumor/normal B allele frequency
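
As a toy illustration of the position-based idea only: look at the bases that aligned reads place at one reference position, and report the most common non-reference base if it clears depth and allele-fraction thresholds. The thresholds and names here are assumptions; real callers such as those listed above use statistical models of base and mapping quality, which this sketch leaves out.

```python
from collections import Counter


def call_snv(ref_base, pileup_bases, min_depth=10, min_alt_fraction=0.2):
    """Naive single-position SNV call from a pileup of observed bases.

    Returns a dict describing the call, or None if no variant is called.
    Thresholds are illustrative defaults, not recommended settings.
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        return None
    ref_base = ref_base.upper()
    counts = Counter(base.upper() for base in pileup_bases)
    # Only A, C, G, T can be alternate alleles; N still counts toward depth.
    alt_counts = {
        base: n for base, n in counts.items()
        if base != ref_base and base in ("A", "C", "G", "T")
    }
    if not alt_counts:
        return None
    alt_base, alt_count = max(alt_counts.items(), key=lambda item: item[1])
    alt_fraction = alt_count / depth
    if alt_fraction < min_alt_fraction:
        return None
    return {
        "ref": ref_base,
        "alt": alt_base,
        "depth": depth,
        "alt_fraction": alt_fraction,
    }


# Made-up pileup: 12 reads support the reference C, 8 reads support T.
print(call_snv("C", "C" * 12 + "T" * 8))
# One stray base in 20 falls below the allele-fraction threshold.
print(call_snv("C", "C" * 19 + "T"))
```

A germline pipeline expects allele fractions near 0.5 or 1.0, while a somatic pipeline has to call much lower fractions, so any fixed threshold like this one is a simplification.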

Example KRAS G12D Variant Cell

Step 5: Variant Annotations
The VCF variant includes:
- chromosome
- position
- ID
- reference base
- alternate base
- variant quality
- meta-information
- information and individual format fields
- filter flags
The annotated variant includes:
- Gene
- Gene transcript
- Nucleotide change (cdot)
- Protein change (pdot)
- Variant type: polymorphism, synonymous, non-synonymous, nonsense, missense, frame shift
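
The VCF fields listed above map onto the tab-separated columns of a VCF data line (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then optional FORMAT and per-sample columns). A minimal parsing sketch follows; the record is invented, `parse_vcf_line` is a hypothetical helper, and real pipelines would typically rely on an established VCF library.

```python
def parse_vcf_line(line):
    """Parse one VCF data line (not a header line) into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, variant_id, ref, alt, qual, filters, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    # Optional FORMAT column followed by one column per sample.
    samples = []
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        samples = [dict(zip(format_keys, s.split(":"))) for s in fields[9:]]

    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if variant_id == "." else variant_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filters.split(";"),
        "info": info_dict,
        "samples": samples,
    }


# Invented record, for illustration only.
example_vcf = "chr1\t12345\t.\tC\tT\t50\tPASS\tDP=100;DB\tGT:AD\t0/1:60,40"
variant = parse_vcf_line(example_vcf)
print(variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
print(variant["info"], variant["samples"])
```

Annotation then attaches gene, transcript, and nucleotide and protein change information to each parsed variant.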

Step 6: QC Calculations
[Figure: Sample, Sequencing, QC metrics, Report]
- Run level: cluster density, base call quality score, fragment size
- Sample level: depth of coverage, uniformity, mapping quality, duplication rate
- Variant level: novel variants, known variants, transition-to-transversion ratio
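
One of the variant-level metrics, the transition-to-transversion (Ti/Tv) ratio, is simple to compute. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes (A<->G, C<->T); every other single-base substitution is a transversion. The sketch below uses a made-up variant list, and the function name is illustrative.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Transition-to-transversion ratio for (ref, alt) single-base pairs.

    Entries that are not single-base substitutions are skipped.
    Returns infinity if there are transitions but no transversions.
    """
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return float("inf") if transitions else float("nan")
    return transitions / transversions


# Made-up calls: three transitions, one transversion, one skipped deletion.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]
print(ti_tv_ratio(calls))
```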

Sample-Level QC Metrics for Targeted Capture
- Uniformity
- On-target
- Off-target
- Minimum depth of coverage
- Read depth
- Mapping quality
- Duplication rate
[Figure: gene with exons and intronic regions]
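
A sketch of how several of these sample-level metrics could be computed from per-base depths over the target regions and a few read counts. Uniformity has several definitions; the one used here (fraction of target bases covered at 0.2x the mean depth or more) is an assumption, as are the function name and the example numbers.

```python
def sample_qc(depths, total_reads, on_target_reads, duplicate_reads,
              uniformity_fraction=0.2):
    """Summarize sample-level QC for a targeted capture.

    depths: per-base read depth across the target regions.
    uniformity_fraction: bases at or above this fraction of the mean depth
        count as uniformly covered (an assumed definition).
    """
    if not depths or total_reads <= 0:
        raise ValueError("need at least one target base and one read")
    mean_depth = sum(depths) / len(depths)
    threshold = uniformity_fraction * mean_depth
    covered = sum(1 for d in depths if d >= threshold)
    return {
        "mean_depth": mean_depth,
        "min_depth": min(depths),
        "uniformity": covered / len(depths),
        "on_target_rate": on_target_reads / total_reads,
        "off_target_rate": 1 - on_target_reads / total_reads,
        "duplication_rate": duplicate_reads / total_reads,
    }


# Made-up numbers, for illustration only.
qc = sample_qc(
    depths=[100, 100, 100, 10, 90],
    total_reads=1000,
    on_target_reads=800,
    duplicate_reads=50,
)
for name, value in qc.items():
    print(f"{name}: {value:.3f}")
```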

Compute Infrastructure for Data Processing
How does a bioinformatics job get executed in clinical production?
[Figure: Jobs 1 through 5]

Data Storage Infrastructure
- Cloud based: database, object storage (hot), archive (cold)
- 99.9% availability, 99.999999999% durability
- BCL (500 to 550 GB): raw output for a single HiSeq 4000 run
- FASTQs (12 to 15 GB): exome at ~150x
- FASTQs, BAMs, VCFs
- In-house bioinformatics

Bioinformatics Pipeline Validation
Recommendations from CAP/AMP:
- 17 recommendation statements
- 59 variants tested in each variant class
Example statistics:
- Positive percentage agreement (PPA)
- Positive predictive value (PPV)
- Reproducibility
- Allelic fraction lower limit of detection
Validation is required prior to use in clinical production.
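
PPA and PPV can be computed by comparing pipeline calls against a reference (truth) set: PPA = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below treats each variant as a (chromosome, position, ref, alt) tuple. The variant sets are invented and the function name is hypothetical.

```python
def agreement_metrics(truth_variants, called_variants):
    """Compare called variants against a truth set.

    Variants are hashable keys such as (chrom, pos, ref, alt).
    Returns counts plus PPA and PPV (None when a denominator is zero).
    """
    truth = set(truth_variants)
    called = set(called_variants)
    true_positives = len(truth & called)
    false_negatives = len(truth - called)
    false_positives = len(called - truth)
    ppa_denominator = true_positives + false_negatives
    ppv_denominator = true_positives + false_positives
    return {
        "tp": true_positives,
        "fn": false_negatives,
        "fp": false_positives,
        "ppa": true_positives / ppa_denominator if ppa_denominator else None,
        "ppv": true_positives / ppv_denominator if ppv_denominator else None,
    }


# Invented variant sets, for illustration only.
truth_set = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr2", 400, "T", "C"),
]
call_set = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr3", 500, "A", "T"),  # not in the truth set: false positive
]
print(agreement_metrics(truth_set, call_set))
```

In a validation these statistics would be reported separately for each variant class.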

Summary
- Catalogue various types of bioinformatics analyses that support clinical genomic testing
- Enumerate types of variant classes
- Describe algorithmic methods for variant detection by NGS
- Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
- Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
- Explain validation strategies for bringing best-in-class pipelines into clinical production

Questions?
Elaine Gee, PhD
Director of Bioinformatics
ARUP Laboratories
elaine.gee@aruplab.com