Introduction to Next Generation Sequencing (NGS)
Andrew Parrish
Exeter, 2nd November 2017


Topics to cover today
- What is Next Generation Sequencing (NGS)?
- Why do we need NGS?
- Common approaches to NGS
- NGS workflow

What is Next Generation Sequencing (NGS)?
- Historically we have used Sanger sequencing to investigate genetic diseases.
- This looks at one stretch of DNA from one patient at a time (~600 base pairs in length).
- It measures the fluorescence given off when dye-labelled nucleotides are excited by a laser, to determine the order of bases.

What is Next Generation Sequencing (NGS)?
- NGS is also referred to as high-throughput sequencing or massively parallel sequencing.
- It generates hundreds of millions of overlapping short sequences (up to 300 bp) in a single run.
- These have to be computationally put back together.
- It can look at multiple patients in one run.

Why do we need Next Generation Sequencing (NGS)?
- The Human Genome Project took 15 years to complete using Sanger-based technology, at an estimated cost of $3 billion.
- Today, using NGS, this could be completed in a day or two for under $1000.

Common approaches to NGS
- Targeted panels (tNGS): pull out specific genes from the patient's DNA and only obtain the sequence data from these genes (up to about 150 genes).
- Rare disease / Mendeliome / Clinical exome: essentially a very large panel (6,110 genes) that looks at the exons of genes known to cause human disease (at the time of design!).
- Whole exome: looks at the exons of 23,244 expressed genes, which together make up 1-2% of the human genome.
- Genome sequencing: looks at the complete (ish) DNA sequence from a patient.

Common approaches to NGS
Single gene disease
- Easily clinically recognisable disease
- Single genetic aetiology (mutations in one gene cause this disorder)
- Existing tests widely available in diagnostic laboratories
Small number of genes for a disease
- Clinically recognisable disease
- Multiple sub-types caused by mutations in different genes
- Highly developed clinical expertise and knowledge available in specialist centres
Large number of possible causes (or no known cause)
- Strong suggestion of monogenic disease, but no clear clue to which gene to test

Workflow for NGS
- Patient
- Extract DNA, prepare library and sequence
- Raw reads (FASTQ)
- Assess quality and process reads
- Processed reads (FASTQ)
- Map to reference genome
- Assess depth and breadth of coverage
- Aligned reads (SAM/BAM)
- Call variants (VCF)
- Variant and sample quality control
- Annotate variants
- Filter and prioritise variants
- Integrate with clinical data
- Shortlist of disease related variants
- Diagnostic report
- Visualise data

Workflow for NGS (summary)
- Patient
- Extract DNA, prepare library and sequence
- Quality control
- Map to reference genome
- Call variants
- Annotate variants
- Shortlist of disease related variants
- Diagnostic report
- Visualise data
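The quality control step of the workflow can be illustrated with a minimal Python sketch. It assumes the common four-line FASTQ layout and the Phred+33 quality encoding; the read shown and the function names are hypothetical and are not taken from any real run or pipeline.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from FASTQ text, 4 lines per read."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, sequence, _separator, quality = lines[i:i + 4]
        yield header[1:], sequence, quality  # drop the leading '@'


def phred_scores(quality_string, offset=33):
    """Decode a quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(character) - offset for character in quality_string]


# Hypothetical single-read example, for illustration only.
example = """@read1
ATCTTGTAGG
+
IIIIIIII##
"""

for read_id, sequence, quality in parse_fastq(example):
    scores = phred_scores(quality)
    print(read_id, sequence, "mean quality:", round(mean(scores), 1))
```

A real pipeline would typically trim or discard reads whose quality falls below a chosen threshold before mapping.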

DNA extraction and library preparation
- Genomic DNA
- Fragment
- Target
- Attach adaptors for paired-end sequencing

Sequencing

Read mapping
- After base calling, align/map sequences onto the reference genome.
- Determine coordinates (chromosomal position) and add basic annotations (coding, non-coding, etc.) if known.

Reads:              ATCTTGTAGG   GAAACACAAAGTG      GTCTAGGGAAGAAGG..
Reference: TAGTACCCCATCTTGTAGGTCTGAAACACAAAGTGTGGGGTGTCTAGGGAAGAAGGTGTGTGACCAGGGAGGTCCC..
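As a rough illustration of what mapping means, the sketch below places the three example reads from this slide onto the example reference by exact string matching. This is a deliberately naive stand-in: real aligners use indexed references and tolerate mismatches and gaps, and the function name `map_read` is hypothetical.

```python
REFERENCE = (
    "TAGTACCCCATCTTGTAGGTCTGAAACACAAAGTGTGGGGTGTCTAGGGAAGAAGG"
    "TGTGTGACCAGGGAGGTCCC"
)
READS = ["ATCTTGTAGG", "GAAACACAAAGTG", "GTCTAGGGAAGAAGG"]


def map_read(read, reference):
    """Return the 0-based position of the first exact match, or None if unmapped."""
    position = reference.find(read)
    return position if position >= 0 else None


for read in READS:
    print(read, "maps at position", map_read(read, REFERENCE))
```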

Read mapping

Coverage
- Vertical coverage (depth): how many times a particular base has been sequenced (e.g. 20X, 30X).
- Greater depth of coverage means improved accuracy for variant detection (but is more expensive).
- Horizontal coverage (breadth): how much of the genome has been sequenced.
- Greater target size means more of the genome is sequenced (but is more expensive).
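The two kinds of coverage can be computed from aligned read positions. The sketch below is a simplified illustration that assumes each alignment is just a (start, length) pair on a short target region; the alignments and function names are hypothetical.

```python
def depth_per_base(region_length, alignments):
    """Vertical coverage: count how many reads overlap each base of the region."""
    depth = [0] * region_length
    for start, length in alignments:
        for position in range(start, min(start + length, region_length)):
            depth[position] += 1
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Horizontal coverage: fraction of bases sequenced at least min_depth times."""
    return sum(d >= min_depth for d in depth) / len(depth)


# Hypothetical alignments on a 20-base target: (0-based start, read length).
alignments = [(0, 10), (5, 10), (8, 10)]
depth = depth_per_base(20, alignments)

print("depth per base:", depth)
print("mean depth:", sum(depth) / len(depth))
print("breadth at >=1X:", breadth_of_coverage(depth))
print("breadth at >=2X:", breadth_of_coverage(depth, min_depth=2))
```

Raising `min_depth` shows the trade-off on the slide: the stricter the depth requirement, the smaller the fraction of the target that qualifies.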

Coverage

Variant calling

Variant calling

#CHROM  POS    ID  REF  ALT  QUAL  FILTER  INFO                      FORMAT          germline
chr4    27668  .   T    C    8.65  .       DP=2;AF1=1;AC1=4;         GT:PL:DP:SP:GQ  0/1:0,0,0:0:0:3
chr4    27669  .   G    T    4.77  .       DP=2;AF1=1;AC1=4;         GT:PL:DP:SP:GQ  0/1:0,0,0:0:0:3
chr4    27712  .   T    C    44    .       DP=2;AF1=1;AC1=4;         GT:PL:DP:SP:GQ  1/1:40,3,0:1:0:8
chr4    27774  .   G    A    5.47  .       DP=2;AF1=0.5011;AC1=2;    GT:PL:DP:SP:GQ  0/1:34,0,23:2:0:28
chr4    36523  .   A    T    10.4  .       DP=1;AF1=1;AC1=4;         GT:PL:DP:SP:GQ  0/1:0,0,0:0:0:3
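To show how a VCF record is structured, here is a minimal Python sketch that splits one of the lines above into its fixed columns, its INFO key/value pairs and its per-sample fields. It assumes tab-separated input, as in a real VCF file; the function name `parse_vcf_line` is hypothetical, and a production pipeline would use an established VCF library instead.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def parse_vcf_line(line, sample_names):
    """Parse one tab-separated VCF data line into a nested dictionary."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["POS"] = int(record["POS"])

    # INFO is a ';'-separated list of KEY=VALUE pairs (or bare flags).
    info = {}
    for item in record["INFO"].split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, _, value = item.partition("=")
        info[key] = value if value else True
    record["INFO"] = info

    # FORMAT names the ':'-separated fields given for each sample.
    keys = record["FORMAT"].split(":")
    record["samples"] = {
        name: dict(zip(keys, values.split(":")))
        for name, values in zip(sample_names, fields[9:])
    }
    return record


line = "chr4\t27774\t.\tG\tA\t5.47\t.\tDP=2;AF1=0.5011;AC1=2;\tGT:PL:DP:SP:GQ\t0/1:34,0,23:2:0:28"
record = parse_vcf_line(line, sample_names=["germline"])
print(record["CHROM"], record["POS"], record["REF"], ">", record["ALT"])
print("genotype:", record["samples"]["germline"]["GT"])
```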

Variants
- A variant is a DNA sequence that is different to the normal sequence for a particular species.
- Variants should be named according to standardised nomenclature (HGVS).
- This allows consistent reporting and must include:
  - Reference sequence, e.g. NM_0000123.4
  - cDNA change, e.g. c.123A>G
  - Protein change, e.g. p.(V59M) or p.(Val59Met)
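The shape of these descriptions can be checked with a small Python sketch. The regular expressions below are a simplification that only covers a single-base cDNA substitution and a three-letter missense change; they are illustrative assumptions, not a full HGVS parser, and the function names are hypothetical.

```python
import re

# Simplified patterns: one-base substitutions and three-letter missense changes only.
CDNA_SUBSTITUTION = re.compile(r"^c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")
PROTEIN_MISSENSE = re.compile(
    r"^p\.\(?(?P<ref>[A-Z][a-z]{2})(?P<position>\d+)(?P<alt>[A-Z][a-z]{2})\)?$"
)


def parse_cdna_substitution(description):
    """Return (position, ref_base, alt_base), or None if the pattern does not match."""
    match = CDNA_SUBSTITUTION.match(description)
    if match is None:
        return None
    return int(match["position"]), match["ref"], match["alt"]


def parse_protein_missense(description):
    """Return (ref_residue, position, alt_residue), or None if the pattern does not match."""
    match = PROTEIN_MISSENSE.match(description)
    if match is None:
        return None
    return match["ref"], int(match["position"]), match["alt"]


print(parse_cdna_substitution("c.123A>G"))
print(parse_protein_missense("p.(Val59Met)"))
print(parse_cdna_substitution("c.123a>g"))  # lowercase bases are rejected
```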

Variant types
Reference: The sun was hot but the man did not get his hat.
SNV (a change to a single base pair):
- The sun wos hot but the man did not get his cat.
- The sun was.ot but the man did not get his hat.
Small insertion/deletion (InDel), in frame:
- The sun hot but the man did not get his hat.
- The sun was too hot but the man did not get his hat.
Small insertion/deletion (InDel), frameshift:
- The sun wah otb utt hem and idn otg eth ish at
- The sun wwa sho tbu tth ema ndi dno tge thi sha t
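The difference between an in-frame and a frameshift InDel comes down to whether the length change is a multiple of three. The Python sketch below illustrates this using the three-letter-word analogy from the slide; the function names and the returned labels are hypothetical.

```python
def classify_small_variant(ref, alt):
    """Classify a small variant from its reference and alternate alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    length_change = abs(len(alt) - len(ref))
    if length_change == 0:
        return "multi-base substitution"
    # A length change divisible by 3 keeps the downstream reading frame intact.
    return "in-frame InDel" if length_change % 3 == 0 else "frameshift InDel"


def reading_frame(sequence):
    """Split a sequence into consecutive three-letter 'codons'."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


print(reading_frame("THESUNWASHOT"))  # reference sentence read in threes
print(reading_frame("THESUNWAHOT"))   # one letter deleted: every later word shifts
print(classify_small_variant("T", "C"))
print(classify_small_variant("TWAS", "T"))
print(classify_small_variant("AS", "A"))
```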

Variant pathogenicity
A variant is pathogenic if it interferes with normal protein production. There are many ways that this can happen:
- Regulatory region
- Change amino acid
- Change splice site, add intron
- Change splice site, remove exon
- Frameshift, causing stop codon later
- New stop codon
- Stop codon

Variant prioritisation
- Frameshift and stop gain (nonsense substitution) variants are highly likely to be pathogenic.
- Splicing variants are likely to be pathogenic, but need checking with a splicing predictor.
- Missense variants can be pathogenic, and there are in-silico tools to predict the effect. The effect depends on how the amino acids are changed.
- Synonymous substitutions are very unlikely to be pathogenic unless they affect splicing.
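The ordering described above can be expressed as a simple ranking. The Python sketch below is illustrative only: the consequence labels, the numeric ranks and the variant identifiers are hypothetical, and real prioritisation weighs much more evidence than consequence alone.

```python
# Lower rank = reviewed first. Labels and ranks are illustrative assumptions.
CONSEQUENCE_RANK = {
    "frameshift": 0,
    "stop_gained": 0,
    "splice_site": 1,
    "missense": 2,
    "synonymous": 3,
}


def prioritise(variants):
    """Sort variants so the consequences most likely to be pathogenic come first."""
    unknown_rank = len(CONSEQUENCE_RANK)  # unrecognised consequences go last
    return sorted(
        variants,
        key=lambda variant: CONSEQUENCE_RANK.get(variant["consequence"], unknown_rank),
    )


# Hypothetical variants for illustration.
variants = [
    {"id": "var1", "consequence": "synonymous"},
    {"id": "var2", "consequence": "missense"},
    {"id": "var3", "consequence": "stop_gained"},
    {"id": "var4", "consequence": "splice_site"},
]

for variant in prioritise(variants):
    print(variant["id"], variant["consequence"])
```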

Variant prioritisation
- ~30,000 variants
- Exclude common variants
- Identify potential pathogenic mutation(s)
- Causal mutation(s)

Variant annotation and filtering
We can pull information in from a variety of external sources, including:
- Population databases, e.g. ExAC and dbSNP. These provide an approximation of the variants that are common in the population and may be excluded from consideration.
- Disease databases, e.g. HGMD. These provide a list of the known disease-causing mutations seen in a variety of settings and may be a flag for prioritisation.
- In silico analysis packages, e.g. SIFT, PolyPhen.
- Phenotypic terms provided by the clinician using HPO.
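A filtering step of this kind might look like the Python sketch below. It assumes each variant has already been annotated with a population allele frequency and that a set of known disease variant identifiers is available; the 1% frequency cut-off, the field names and the example data are all hypothetical choices for illustration, not values taken from any database.

```python
def filter_variants(variants, known_disease_ids, max_population_af=0.01):
    """Drop common variants and flag those already listed as disease-causing."""
    kept = []
    for variant in variants:
        frequency = variant.get("population_af")  # None: not seen in the population data
        if frequency is not None and frequency > max_population_af:
            continue  # common in the population, excluded from consideration
        flagged = variant["id"] in known_disease_ids
        kept.append({**variant, "known_disease_variant": flagged})
    return kept


# Hypothetical annotated variants for illustration.
variants = [
    {"id": "var1", "population_af": 0.25},
    {"id": "var2", "population_af": 0.0001},
    {"id": "var3", "population_af": None},
]
known_disease_ids = {"var2"}

for variant in filter_variants(variants, known_disease_ids):
    print(variant["id"], "known disease variant:", variant["known_disease_variant"])
```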

Questions?