NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT-GENERATION WHOLE GENOME RESEQUENCING

Ken Chen, Ph.D.
kchen@genome.wustl.edu
The Genome Center, Washington University in St. Louis

The path to genomic medicine
- Human genome sequencing finished in 2003
- Genomic medicine: healthcare tailored to the individual based on genomic information
- "The central mission for NHGRI and the field of genomics is to establish the path to the realization of genomic medicine." (Eric Green, Personal Genomes, 2010)
- Question: How do we establish the path?
[Photo: Dr. Ellis and Rhonda Levan. Rhonda Levan, breast cancer clinical trial participant, has been given a new lease on life, thanks to Matthew Ellis, MB, BChir, PhD, a Washington University medical oncologist at the Siteman Cancer Center.]

Medical Genomes @ WUGC
- Identification of genomic variants in individual genomes
  - Single Nucleotide Variants (SNVs)
  - Structural Variants (SVs)
- Genotypical characterization of the variants
  - Frequency in population
  - Heterogeneity, origin, and progression
- Phenotypical characterization of the variants
  - Functional annotation
  - Integration with functional/clinical data (expression, interaction)
  - Network biology
  - Model organisms

The genomic variants
Our 3 Gb genomes are ~99% identical; however, each individual genome differs from the reference:
- Single nucleotide variants (SNVs), ~3-4 M
- Structural variants (SVs), ~300-500 K: genomic alterations that involve segments of DNA longer than 1 bp
- Novel sequences, ~5 Mb [Li et al., Nature Biotech, 2010]

Classes of Structural Variants (SVs)
[Figure: reference vs. new allele for each class]

Change in:              # copies of refseq   Sequence spacing/orientation   Novel sequence
Deletion                Yes                  Yes                            Yes
Tandem duplication      Yes                  Yes                            Sometimes
VNTR                    Yes                  Yes                            No
Dispersed duplication   Yes                  Yes                            Yes
Novel insertion         No                   Yes                            Yes
SINE/LINE insertion     Yes                  Yes                            Yes
Inversion               No                   Yes                            Yes
Translocation           Sometimes            Yes                            Yes

Courtesy: Matt Hurles, 1000 Genomes

SVs in cancer
Nature Genetics 36, 331-334 (2004)
Total cases: 59,570 (Nov. 22, 2010), Mitelman Database of Chromosome Aberrations

Identification of variants: the resequencing approach
[Workflow diagram; labels: DNA samples, sequencer, reads, reference, computer, SNVs, SVs]

SV detection: paired-end read mapping
[Figure: paired-end reads from 200-500 bp DNA fragments mapped to the variant (Var) and reference (Ref) sequences around an SV. 3 types of evidence; labels: normal read-pair, discordant read-pair, split-read, microhomology]

Approach      Read Depth       Read Pair                          Split Reads
Classes       DEL, DUP         All except large novel insertion   DEL, small INS, DI, inversion
Size range    > kb             > 50 bp                            > 1 bp, < 1 Mb
Resolution    kb               50 bp                              1 bp
Tools         CNV-HMM, CMDS    BreakDancer, GASV                  Pindel
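The read-pair evidence on this slide can be illustrated with a minimal sketch. It assumes forward/reverse paired-end reads and a library whose insert size has a known mean and standard deviation. The `ReadPair` type, the cutoff values, and the label names are all hypothetical choices for illustration; this is not BreakDancer's or GASV's actual classifier.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    # Hypothetical minimal record for one mapped read pair.
    chrom1: str
    pos1: int        # leftmost mapped position of read 1
    strand1: str     # '+' or '-'
    chrom2: str
    pos2: int        # leftmost mapped position of read 2
    strand2: str
    read_len: int = 100


def classify_pair(pair, mean_insert=350.0, sd_insert=50.0, n_sd=3.0):
    """Label a read pair as normal or as one kind of discordant pair.

    The mean, standard deviation, and n_sd cutoff are assumed values.
    """
    if pair.chrom1 != pair.chrom2:
        return "inter-chromosomal"
    # Order the two reads by mapped position.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion-like"      # both reads on the same strand
    if left_strand == "-":
        return "everted"             # reads point away from each other
    # Properly oriented pair: compare the mapped span to the library.
    span = right_pos + pair.read_len - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion-like"       # reads map too far apart
    if span < mean_insert - n_sd * sd_insert:
        return "insertion-like"      # reads map too close together
    return "normal"


if __name__ == "__main__":
    far_apart = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_pair(far_apart))
```

A real caller would estimate the insert size distribution per library from the data and would cluster discordant pairs before reporting anything; a single pair is weak evidence on its own.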

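SV detection: paired-end read mapping
[Figure: paired-end reads from 200-500 bp DNA fragments mapped to the variant (Var) and reference (Ref) sequences around an SV. 3 types of evidence; labels: normal read-pair, discordant read-pair, split-read, microhomology]

Detection power
Methods compared: Read Depth, Read Pair, Split Reads, Targeted Assembly
Factors: homology, SV size, insert size, read length, physical coverage, sequence coverage
[Table: dependence of each method on each factor; only scattered "Maybe" entries are recoverable]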

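BreakDancer: detect SVs from discordant read pairs
[Figure: variant (Var) vs. reference read-pair signatures, with anchor regions a and b, for each SV type: deletion, insertion, inversion, intra-chromosomal translocation, inter-chromosomal translocation]
[Figure: insert size density plot; axes: insert size, density; marker: c_del]

For an SV supported by $k_i$ read pairs of type $i$:
  $P(n_i \ge k_i)$, where $n_i \sim \mathrm{Poisson}(\lambda_i)$ and $\lambda_i = \frac{(a + b)\,n}{G}$

Jointly analyze multiple libraries:
  $\chi^2_{2m} = -2 \sum_{j=1}^{m} \log_e(P_j)$
  $Q = -\log_{10}(P)$

The SV score summarizes:
1. Number of supporting reads
2. Size of the anchor region a, b
3. Physical coverage of each library
4. Insert size distribution

Chen et al., Nature Methods, 2009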
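The scoring formulas on this slide can be worked through numerically. The sketch below is a reading of the slide, not BreakDancer's source code: it assumes G is the genome length and n the total number of discordant pairs of the given type, takes the signs in the Fisher statistic and in Q to be negative (minus signs appear to have been lost from the slide text), and uses the standard closed form of the chi-square survival function for an even number of degrees of freedom. All function names are hypothetical.

```python
import math


def expected_pairs(a, b, n, G):
    """lambda = (a + b) * n / G, as on the slide.

    Assumption: a and b are the anchor region sizes, n the total number
    of discordant pairs of this type, and G the genome length.
    """
    return (a + b) * n / G


def poisson_tail(k, lam):
    """P(n >= k) for n ~ Poisson(lam), lam > 0, summed from the tail."""
    if k <= 0:
        return 1.0
    # First tail term: exp(-lam) * lam**k / k!, computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    n = k
    while term > 0.0 and term > total * 1e-17:
        total += term
        n += 1
        term *= lam / n
    return min(total, 1.0)


def fisher_combine(p_values):
    """Fisher's method: the statistic -2 * sum(ln P_j) and its p-value.

    Under the null the statistic is chi-square with 2m degrees of
    freedom, whose survival function is exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    """
    m = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    term, total = 1.0, 0.0
    for i in range(m):
        total += term
        term *= half / (i + 1)
    return stat, math.exp(-half) * total


def q_score(p):
    """Q = -log10(P); returns infinity if P underflows to zero."""
    return math.inf if p <= 0.0 else -math.log10(p)


if __name__ == "__main__":
    # Made-up numbers: two libraries supporting the same candidate SV.
    lam1 = expected_pairs(a=300, b=300, n=20000, G=3.0e9)
    lam2 = expected_pairs(a=300, b=300, n=5000, G=3.0e9)
    p1 = poisson_tail(4, lam1)
    p2 = poisson_tail(2, lam2)
    stat, p_joint = fisher_combine([p1, p2])
    print(lam1, p1, p2, stat, q_score(p_joint))
```

Summing the Poisson tail directly, rather than computing one minus the cumulative sum, keeps very small p-values from rounding to zero before the logarithm is taken.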

CNVs detected by BreakDancer

TIGRA_SV: assemble SVs to nucleotide resolution
[Figure: variant (Var) and reference (Ref) sequences around an SV; workflow labels: BreakDancer, TIGRA_SV, Integration]
Example alignment:
  AGCTGT---CA
  AGCTGTTGTCA
Chen et al., in revision
1000 Genomes Consortium, Nature, 2010
Mills et al., Nature, 2011
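Once a variant contig has been assembled and aligned to the reference, the breakpoint can be read off the gapped alignment. The sketch below is a minimal illustration using the two alignment strings from the slide; it assumes the gapped string is the variant and the ungapped one the reference, and `gap_events` is a hypothetical helper, not part of TIGRA_SV.

```python
def gap_events(var_aln, ref_aln):
    """List gap runs in a pairwise alignment of variant vs. reference.

    Each event is (kind, ref_start, length, sequence), where ref_start
    is a 0-based offset into the ungapped reference string.
    """
    if len(var_aln) != len(ref_aln):
        raise ValueError("aligned strings must have equal length")
    events = []
    ref_pos = 0
    i = 0
    while i < len(ref_aln):
        if var_aln[i] == "-":
            # Gap in the variant: reference bases are deleted.
            j = i
            while j < len(var_aln) and var_aln[j] == "-":
                j += 1
            events.append(("DEL", ref_pos, j - i, ref_aln[i:j]))
            ref_pos += j - i
            i = j
        elif ref_aln[i] == "-":
            # Gap in the reference: variant bases are inserted.
            j = i
            while j < len(ref_aln) and ref_aln[j] == "-":
                j += 1
            events.append(("INS", ref_pos, j - i, var_aln[i:j]))
            i = j
        else:
            ref_pos += 1
            i += 1
    return events


if __name__ == "__main__":
    print(gap_events("AGCTGT---CA", "AGCTGTTGTCA"))
```

In this example the deleted TGT sits next to another TGT, so removing either copy yields the same variant sequence. That ambiguity is the microhomology noted on the SV detection slides: the aligner's gap placement is one of several equivalent breakpoint positions.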

Soft-clipping at SV breakpoints

CREST: SV identification from soft-clipped reads
Wang et al., submitted
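Soft-clipped reads mark candidate breakpoints because the aligner could place only part of the read. A minimal sketch of how such positions could be collected from SAM-style POS and CIGAR fields follows. It is an illustration of the idea only and not CREST's implementation; the function names and the `min_clip` and `min_reads` thresholds are assumptions.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def softclip_breakpoints(pos, cigar, min_clip=10):
    """Return (reference_position, side) for each long soft clip.

    pos is the 1-based leftmost aligned reference position (SAM POS).
    A left clip puts the breakpoint at the first aligned base; a right
    clip puts it at the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [o for o in ops if o[1] != "H"]   # hard clips carry no bases
    found = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        found.append((pos, "left"))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        # Reference bases consumed by the aligned part of the read.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        found.append((pos + ref_len - 1, "right"))
    return found


def recurrent_breakpoints(reads, min_reads=2):
    """Keep breakpoints supported by at least min_reads clipped reads.

    reads is an iterable of (pos, cigar) tuples from one chromosome.
    """
    counts = Counter()
    for pos, cigar in reads:
        counts.update(softclip_breakpoints(pos, cigar))
    return {bp: c for bp, c in counts.items() if c >= min_reads}


if __name__ == "__main__":
    reads = [(4990, "70M30S"), (5010, "50M50S"), (5060, "40S60M"), (7000, "100M")]
    print(recurrent_breakpoints(reads))
```

Requiring several reads to clip at the same position filters out clips caused by low-quality read ends or adapter sequence.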

Washington University Genome Center: genomics landmarks
Software, 2007-2011 (http://genome.wustl.edu/software/):
PolyScan, cnvhmm, SomaticSniper, BreakDancer, Pairoscope, Varscan, CMDS, TIGRA_SV, Pindel, CREST
Publications:
- Nature 455, TSP Lung
- Nature 455, TCGA Brain
- Nature 456, AML1 first cancer genome
- NEJM, AML recurrent mutation IDH1
- Nature 464, Breast cancer metastasis xenograft
- Nature 467, 1000 Genomes Pilot
- NEJM, AML DNMT3A connect genomics and epigenomics
- Nature, 1000 Genomes Pilot SV
- JAMA, Clinical diagnosis of atypical APL fusion

The genomic tsunami
Dec. 2010, WashU Genome Center. Sequenced:
- Total number of cases: 765
- Total number of cases completed: 408
- Total number of bases produced: ~100 Tb
More in 2011!

Diagnose a cryptic fusion using whole genome sequencing
Case history
- 39 y.o. female
- Presented with pancytopenia and DIC
- Histology: promyelocytic morphology

Complex cytogenetics, inconsistent with APL
- Cytogenetics: 46,XX,del(9)(q12q32),del(12)(q12q21)[6]
- Complex (poor risk), with no t(15;17)
- Interphase FISH: most consistent with an RARA-PML fusion, not the pathogenic PML-RARA fusion

Diagnostic conundrum
Question: Does this patient have APL?
- Leukemia with promyelocytic features
- FISH: no PML-RARA
- Cytogenetics: complex (poor risk), with no t(15;17): 46,XX,del(9)(q12q32),del(12)(q12q21)[6]
Options:
- APL: all-trans retinoic acid (ATRA); low-cost, non-toxic, and good outcome
- Cytogenetically complex AML: an allo-transplant; expensive, with a risk of lethal GvHD

Whole genome sequencing
- Tumor DNA: 187.1 Gigabases (~43.7X)
  - 99.74% heterozygous SNPs
  - 99.53% homozygous SNPs
- Skin (normal) DNA: 200.1 Gigabases (~46.8X)
  - 99.76% heterozygous SNPs
  - 99.64% homozygous SNPs
- Single HiSeq run (two flow cells), $10,000 (sequencing + analysis + validation), completed in <6 weeks from sample receipt

Sequence based copy number analysis

The identification of a cryptic insertional translocation
[Figure labels: BreakDancer/TIGRA_SV, chr15]

Gene fusions produced by the insertional translocation
[Figure labels: truncated protein; out of frame; expressed, pathogenic]

Standard FISH failed in diagnosis

Conclusions
- The patient's unique oncogenotype was determined within the required clinical time frame
- The correct clinical decision was made: ATRA treatment was indicated; the patient is in remission and doing very well
- A new set of fosmid-based FISH probes (each 30-40 Kb in size) was made based on this novel discovery
- Two additional cases of cryptic insertional fusions have been identified so far
- It is time to begin applying whole genome sequencing as a diagnostic approach for understanding atypical cases of disease

Looking forward
Towards clinical sequencing
 - Higher standard and better algorithms
Cancer genomics
 - Recurrent mutations
 - Tumor heterogeneity and progression
 - Tumor genome architecture
Novel algorithms
 - Complex structural rearrangements
 - New technology
Integrative analysis
 - RNA profiling and transcriptome assembly
 - Epigenomic profiling
 - Networks

Acknowledgements
- Richard Wilson, Elaine Mardis, Timothy Ley, George Weinstock
- Collaborators at WashU: Medical Genomics, The Genome Center
- Heng Li, Matt Hurles, Evan Eichler, Charles Lee
- 1000 Genomes Structural Variation Group

Thank you!