Understanding the science and technology of whole genome sequencing

Similar documents
SEQUENCING. M Ataei, PhD. Feb 2016

Matthew Tinning Australian Genome Research Facility. July 2012

Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017

Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

Next-generation sequencing Technology Overview

Deep Sequencing technologies

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

resequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics

Third Generation Sequencing

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Welcome to the NGS webinar series

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Get to Know Your DNA. Every Single Fragment.

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie. Sander van Boheemen Medical Microbiology

Illumina s Suite of Targeted Resequencing Solutions

Outline. General principles of clonal sequencing Analysis principles Applications CNV analysis Genome architecture

INTRODUCTION NEW GENETIC TECHNIQUES IN METABOLIC DISEASES 26/01/2016 FROM ONE GENERATION TO THE NEXT. Image challenge of the week D.

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

How much sequencing do I need? Emily Crisovan Genomics Core

How much sequencing do I need? Emily Crisovan Genomics Core September 26, 2018

Next-Generation Sequencing. Technologies

A Crash Course in NGS for GI Pathologists. Sandra O Toole

NGS technologies: a user s guide. Karim Gharbi & Mark Blaxter

Using New ThiNGS on Small Things. Shane Byrne

Sequencing Theory. Brett E. Pickett, Ph.D. J. Craig Venter Institute

SUPPLEMENTARY INFORMATION

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

Genome Resequencing. Rearrangements. SNPs, Indels CNVs. De novo genome Sequencing. Metagenomics. Exome Sequencing. RNA-seq Gene Expression

Next Gen Sequencing. Expansion of sequencing technology. Contents

The Journey of DNA Sequencing. Chromosomes. What is a genome? Genome size. H. Sunny Sun

ILLUMINA SEQUENCING SYSTEMS

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday September 15, 2014

Introduction to Next Generation Sequencing (NGS)

Overview of Next Generation Sequencing technologies. Céline Keime

Targeted Sequencing in the NBS Laboratory

Wet-lab Considerations for Illumina data analysis

Ultrasequencing: Methods and Applications of the New Generation Sequencing Platforms

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow

ngs metagenomics target variation amplicon bioinformatics diagnostics dna trio indel high-throughput gene structural variation ChIP-seq mendelian

NGS-based innovations within the Leiden Network

Informed Consent for Columbia Combined Genetic Panel (CCGP) for Adults Please read the following

solid S Y S T E M s e q u e n c i n g See the Difference Discover the Quality Genome

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Research school methods seminar Genomics and Transcriptomics

Next Generation Sequencing. Tobias Österlund

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014


Ultrasequencing: methods and applications of the new generation sequencing platforms

TREE CODE PRODUCT BROCHURE

G E N OM I C S S E RV I C ES

Introduction to NGS. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Next Generation Sequencing (NGS)

Ion S5 and Ion S5 XL Systems

DNA. bioinformatics. epigenetics methylation structural variation. custom. assembly. gene. tumor-normal. mendelian. BS-seq. prediction.

Sequencing techniques

THE ERA OF INDIVIDUAL GENOMES. Sandra Viz Lasheras Advanced Genetics ( )

Centro Nacional de Análisis Genómico. Where are the Bottlenecks of Genome Analysis Today? Teratec. Ecole Polytechnique, Palaiseau, F.

Sequencing techniques and applications

Ion S5 and Ion S5 XL Systems

Crash-course in genomics

Next Generation Sequencing in Genetic Diagnostics Alan Pittman, PhD

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Human genome sequence

latestdevelopments relevant for the Ag sector André Eggen Agriculture Segment Manager, Europe

Introduction to NGS. Simon Rasmussen Associate Professor DTU Bioinformatics Technical University of Denmark 2018

SureSelect Clinical Research Exome V2. Optimized for Rare Diseases

Next-Generation Sequencing Services à la carte

High Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015

Plant Breeding and Agri Genomics. Team Genotypic 24 November 2012

Whole Human Genome Sequencing Report This is a technical summary report for PG DNA

AUDREY FARBOS JEREMIE POSCHMANN PAUL O NEILL KONRAD PASZKIEWICZ KAREN MOORE

Applied Bioinformatics

Galaxy Workshop

DNA Sequencing by Ion Torrent. Marc Lavergne CHEM 4590

14 March, 2016: Introduction to Genomics

NEXT GENERATION SEQUENCING. Farhat Habib

Trimethylaminuria (TMAU) Yiran Guo, Ph.D. Center for Applied Genomics Children's Hospital of Philadelphia

Design a super panel for comprehensive genetic testing

Course Overview: Mutation Detection Using Massively Parallel Sequencing

2nd (Next) Generation Sequencing 2/2/2018

RNA-Seq data analysis course September 7-9, 2015

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

Introduction to Bioinformatics

Ion S5 and Ion S5 XL Systems

Overview of techniques in Molecular Diagnostics ELKE BOONE 23 MAART 2017

DNA-Sequencing. Technologies & Devices

Analytics Behind Genomic Testing

SureSelect Clinical Research Exome V2 Definitive Answers Where it Matters Most

Next Generation Sequencing. Target Enrichment

HaloPlex HS. Get to Know Your DNA. Every Single Fragment. Kevin Poon, Ph.D.

Course Overview. Objectives

Exome Sequencing Exome sequencing is a technique that is used to examine all of the protein-coding regions of the genome.

Transcription:

Understanding the science and technology of whole genome sequencing Dag Undlien Department of Medical Genetics Oslo University Hospital University of Oslo and The Norwegian Sequencing Centre d.e.undlien@medisin.uio.no

DNA sequencing technology

DNA 4 bases - A, G, C, T G A T C C T A G Human genome ~3 billion bases How can we read the sequence of bases?

DNA sequencing Mimic the way DNA replication occurs in living cells in a test tube. Monitor the chemical reactions that take place Sanger sequencing Detect nucleotide extension with radioactivity or fluorescence Accurate but slow

High-throughput sequencing also known as next generation sequencing, deep sequencing, massively parallel sequencing

Sequencing: old and next LTS (Sanger) Molecules sequenced Fragment Clone/PCR Sequence 1, 48, 96... genomic DNA...unless you have a lot of machines HTS (High-throughput sequencing) Fragment Array Sequence 4x10-5 9 1x10 Massively parallel...on one machine

Illumina sequence data Random DNA library of short fragments ~300 bp 3 billion DNA sequence reads 50, 100 bp long Single-end reads Paired-end reads Run time: 1-9 days Data volume: 1-500 GB

Sequencing platforms Platform 454 Illumina HiSeq Illumina MiSeq PacBio Ion Torrent System cost - -- ++ --- +++ Prep - + ++ + + Running cost -- + ++ ++ ++ Run time 10 hours 1-9 days 27 hours 2 hours 2 hours Read accuracy 99% 98% 98% 87% 98.8% Read number 100000 3000000000 3500000 75000 6 x 10^6 Read length 400 bp 2x100 2x150 ~2700 (10kb) 2-400 bp Output 35 Mb 600 Gb >1 Gb 90 Mb >1 Gb

HT sequencing - commonalities Many short sequence fragments assembly Stochastic how many times a given sequence is sequenced The technology can be used for quantitative investigations E.g. CNVs, gene expression Well suited for many non-genetic studies

High throughput sequencing - applications Platform 454 Illumina HiSeq Illumina MiSeq* PacBio Ion Torrent Resequencing - +++ ++ + +++ de novo +++ + + +++ +++ metagenomics +++ ++ + ++ +++ mrna ++ +++ ++ ++ ++ mirna - +++ +++ - - ChIP - +++ ++ - - DNA meth - +++ +??? -

Illumina throughput 150 112,5 75 37,5 600 450 300 150 200x human genome HiSeq2000 2007 2008 2009 2010 2007 2008 2009 2010 2011 Read length Gb/run Human genome 3 billion bases - 3 Gb Illumina HiSeq 600 Gb per run 200 x human genome per run

So what? Parameter ABI 3100 ABI 3730 Illumina HiSeq Read length ~700 ~700 100 (x2) Reads per run 16 96 3000000000 Run time 2 hours 30 minutes 9 days Time for 1x human genome (3 Gb) 120 years 15 years 1 hour

HTS and medical genetics Finding mutations which cause disease

Genetic disease: the challenge Single gene (monogenic) disorders Approximately 50% of children with congenital syndromes/mental retardation do not receive a firm etiological diagnosis Many of these children have disorders that are primarily genetic in origin. Many disorders are individually rare and difficult to recognize even for experts. Clinical phenotypes can be non-specific and variable. Many phenotypes are genetically heterogeneous. Human genome is (quite) big ~23 000 genes ~3 billion bases pairs How can we (rapidly) identify a mutation causing disease?

Mendelian disease in man 2951 of the well-characterized phenotypes registered in OMIM have a known molecular basis 3743 registered phenotypes with known or suspected Mendelian basis, no associated gene has been identified Improve speed/cost of diagnosis of known genetic disorders Improve speed/cost of identification of the cause new/suspected genetic disorders

Aim @EAS54_6_R1_2_1_413_324 CCCTTCTTGTCTTCAGCGTTTCTCC + ;;3;;;;;;;;;;;;7;;;;;;;88 @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA + ;;;;;;;;;;;7;;;;;-;;;3;83 @EAS54_6_R1_2_1_443_348 GTTGCTTCTGGCGTGGGTGGGGGGG +EAS54_6_R1_2_1_443_348 ;;;;;;;;;;;9;7;;.7;39333 Compare to reference R G FASTQ format 3 billion reads 3 billion bases 1 mutation

Exome sequencing still more common than whole genome oligonucleotides Design exome capture array Make sequence library from patient DNA Hybridize and capture Sequence

Exome sequencing data

What s in an exome? > 20 000 variants Many loss-of-function variants MacArthur & Tyler-Smith Hum Mol Genet 19 (2010).

Many, many variants will be found Which variants are deleterious? Novel? (dbsnp, 1000genomes, HGMD) Synonymous/non-synonymous? Conserved? Alter protein structure? SNPnexus PolyPhen2 MutationTaster ANNOVAR SeattleSeq Annotation

Strategies to identify mutations multiple individuals same disease large multigenerational pedigrees de novo mutations population frequency for complex diseases Bamshad et al. Nature Reviews Genetics Nov 2011

stricter criteria Family data - Shendure table more exomes Comparing two exomes identifies ~20 000 SNPs Which is the causal variant? In a family, compare more exomes

Genetic diagnosis and rare diseases 2009 Confirm diagnosis Test series of single genes Often international labs Very expensive 2012 Confirm or clarify diagnosis 1000s genes tested at once Costs decreasing Fast (weeks) Time-consuming (years) A diagnostic revolution

The future? Oxford Nanopore MinION disposable device plug directly into computer USB port compatible with blood, serum and environmental samples ~1 Gb sequence ~$900 Personal genomics?

Summary High-throughput sequencing Dramatic increase in sequence production Many applications on one platform Field new and moving very quickly Diagnostic (exome) sequencing in place Huge impact on human/medical genetics Challenges/opportunities Data storage/backup/distribution Data analysis Whole-genome sequencing? Incidental findings