Introduction to Next Generation Sequencing (NGS)

Introduction to Next Generation Sequencing (NGS)
Simon Rasmussen, Assistant Professor
Center for Biological Sequence analysis, Technical University of Denmark
2012

Today
9.00-9.45: Introduction to NGS, how it works
10.00-10.30: Data basics: what does the data look like?
10.30-11.00: De novo assembly exercise
11.15-12.00: Alignment of reads
13.00-13.30: Introduction to exercise (variations, alignment processing, genotyping)
13.30-16.30: Afternoon exercise

DNA sequencing
Reading the order of bases in DNA fragments

Why NGS?
It is transforming how we do biological science (and bioinformatics): it allows experiments that could not be done before, and makes existing experiments much faster.

How?
By producing massive amounts of sequence data, very quickly.

1st generation to NGS
[Figure: sequencing throughput in kilobases per day per machine (log scale, 10 to 1,000,000,000) against year (1980 to 2010 and future). Gel-based systems: manual slab gel, automated slab gel. Capillary sequencing: first-generation capillary, second-generation capillary sequencer. Massively parallel sequencing: microwell pyrosequencing, short-read sequencers, single molecule? Annotations: 1977, Sanger chain-termination method; human genome. Platforms marked: Illumina, SOLiD, 454, Ion Torrent, Pacific Biosciences, Oxford Nanopore.]
Stratton et al., Nature 2009

Read throughput
1977: x 1
1998: x 384
2006: x very big number
2007: x even bigger number
2008: x gigantic number
2011: x big number
First generation: 1-384. NGS: 10^5-10^9.

Sequencing costs
The drop in costs is faster than Moore's Law (computer power doubles every 2 years)

Human sequencing
First draft of the human genome in 2001, final version in 2004
Estimated cost $3 billion, time 13 years
Today:
Illumina: 1 week, $4000
Exome: 6 weeks*, $998
Towards the $1000 genome?
* Real time, not machine time
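To put the comparison with Moore's Law in numbers, here is a rough sketch using only the figures on these slides. It assumes the comparison runs from the 2001 draft to 2012 (the date of this lecture) and treats the $3 billion project total as the cost of one genome, which is a simplification.

```python
# Rough comparison of the sequencing cost drop with Moore's Law.
# Assumptions (not from a measured dataset): the period is 2001-2012 and
# the $3 billion project cost is treated as the price of a single genome.

def moore_fold_change(years: float, doubling_time: float = 2.0) -> float:
    """Fold improvement expected if capacity doubles every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)

FIRST_GENOME_COST = 3_000_000_000  # dollars, from the slide
GENOME_COST_2012 = 4_000           # dollars, from the slide
YEARS = 2012 - 2001

observed_fold = FIRST_GENOME_COST / GENOME_COST_2012
predicted_fold = moore_fold_change(YEARS)

print(f"Observed cost reduction: {observed_fold:,.0f}-fold")
print(f"Moore's Law over {YEARS} years: {predicted_fold:.0f}-fold")
```

Under these assumptions the observed drop is several orders of magnitude larger than a doubling every two years would give.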

Storage and analysis
The highest cost is (almost) not the sequencing but the storage and analysis
A standard human (30-40x) whole-genome sequencing experiment would create 100 Gb of data
BGI, based in China, is the world's largest genomics research institute, with 167 DNA sequencers producing the equivalent of 2-4,000 human genomes a day.
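The 100 Gb figure follows from coverage times genome size. A minimal sketch, assuming a human genome of about 3.2 billion bases and 100-base reads (both are assumptions for illustration; the size of the files on disk will be larger, since quality values and read names are stored too):

```python
# Sequenced bases = coverage x genome size.
# HUMAN_GENOME_BASES and the 100-base read length are illustrative assumptions.

HUMAN_GENOME_BASES = 3_200_000_000

def sequenced_bases(coverage: int, genome_size: int = HUMAN_GENOME_BASES) -> int:
    """Total number of bases that must be sequenced to reach `coverage`."""
    return coverage * genome_size

def reads_needed(coverage: int, read_length: int,
                 genome_size: int = HUMAN_GENOME_BASES) -> int:
    """Number of reads of `read_length` bases needed, rounded up."""
    total = sequenced_bases(coverage, genome_size)
    return -(-total // read_length)  # ceiling division on integers

for coverage in (30, 40):
    gigabases = sequenced_bases(coverage) / 1e9
    print(f"{coverage}x: {gigabases:.0f} Gb, "
          f"{reads_needed(coverage, 100):,} reads of 100 bases")
```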

The X Genomes projects
1000 Genomes Project: catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts
Sequence 2500 unidentified people from about 25 populations around the world
10,000 microbial genomes project, Earth Microbiome Project, Cancer Genome Project, plants and animals, ...

NGS & Bioinformatics
Extreme data size causes problems
Just transferring and storing the data is a challenge
Standard all-against-all comparisons fail (N*N)
Standard tools cannot be used
Think in terms of fast and parallel programs
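Why N*N comparisons fail can be seen by counting them. The sketch below is only an illustration: the rate of one million comparisons per second is an assumed number, not a benchmark.

```python
# Number of pairwise comparisons grows quadratically with the number of reads.
# COMPARISONS_PER_SECOND is an assumed rate, chosen only for illustration.

COMPARISONS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600

def pairwise_comparisons(n_reads: int) -> int:
    """Number of distinct read pairs in an all-against-all comparison."""
    return n_reads * (n_reads - 1) // 2

def years_needed(n_reads: int, rate: int = COMPARISONS_PER_SECOND) -> float:
    """Wall-clock years for all pairwise comparisons at `rate` per second."""
    return pairwise_comparisons(n_reads) / rate / SECONDS_PER_YEAR

for n in (384, 10**5, 10**9):
    print(f"{n:>13,} reads: {pairwise_comparisons(n):.3g} pairs, "
          f"{years_needed(n):.3g} years")
```

A capillary run of 384 reads is trivial, while a billion reads would take thousands of years at this rate, which is why NGS tools rely on indexing and parallel execution rather than all-against-all comparison.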

What can we use it for?
Whole genome re-sequencing
Ancient genomes
Metagenomics
Cancer genomics
Exome sequencing (targeted)
RNA sequencing
ChIP-seq
Genomic epidemiology
Anything with DNA

How does it work?

First generation: Sanger (dye)
Fragment DNA
Clone into plasmid and amplify
Sequence using dNTPs + labelled ddNTPs (which stop the reaction)
Run capillary electrophoresis and read the DNA code
Low output, long reads (~300-1000 nt), high quality
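The chain-termination idea can be sketched as a small simulation: each extending copy stops at a random position when a labelled ddNTP is incorporated, and sorting the fragments by length (the job of the capillary) reveals the sequence one base at a time. The function names, the 5% termination probability, and the number of molecules are assumptions for illustration, and `sequence` stands for the strand being synthesized.

```python
import random

def chain_termination_fragments(sequence, n_molecules=2000,
                                ddntp_fraction=0.05, seed=0):
    """Simulate extension of many copies; return (length, terminal_base) pairs.

    At every position a copy terminates with probability `ddntp_fraction`
    (a labelled ddNTP was incorporated instead of a dNTP). Copies that
    never terminate are not recorded.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(sequence, start=1):
            if rng.random() < ddntp_fraction:
                fragments.append((position, base))
                break
    return fragments

def read_capillary(fragments):
    """Order fragments by length and read the dye on each terminal base.

    A length with no fragments would simply be missing from the read.
    """
    dye_by_length = {length: base for length, base in fragments}
    return "".join(dye_by_length[length] for length in sorted(dye_by_length))

template = "ATGCGTACGTTAGCCTAGGA"
fragments = chain_termination_fragments(template)
print(read_capillary(fragments))
```

Long fragments become rare because a copy must survive every earlier position without terminating, which is one intuition for why signal quality drops towards the end of a Sanger read.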

2nd generation
1. Create library molecule: DNA from extract, fragment & polish DNA, add adapters, library molecule
2. Amplification (PCR)
3. Massively parallel sequencing

Amplification and immobilization
Emulsion PCR (454, SOLiD, IonTorrent): water, oil, beads, one DNA template per droplet
Bridge PCR (Illumina): one DNA template per cluster, primers on the surface, clusters grow by bridging to neighbouring primers
Metzker, Nat. Rev. Genet. 2010

Fluorescence detection
Illumina: cyclic reversible termination
Add all four dNTPs, each labelled with a different dye
Incorporate one nucleotide
Wash, four-colour imaging
Cleave dye and terminating groups, wash
Repeat cycles
454: pyrosequencing
Load template beads into wells
Flow one dNTP across the wells
Polymerase incorporates nucleotide
Release of PPi leads to light
Imaging, then next dNTP

[Figure: four-colour and one-colour cyclic reversible termination (CRT) methods. Four-colour CRT uses Illumina/Solexa's 3'-O-azidomethyl reversible terminator chemistry on solid-phase-amplified template clusters; after imaging, a cleavage step removes the fluorescent dyes and regenerates the 3'-OH group using the reducing agent tris(2-carboxyethyl)phosphine (TCEP). The Helicos Virtual Terminators are instead labelled with the same dye and dispensed individually in a predetermined order: incorporate single dye-labelled nucleotides, wash, one-colour imaging, cleave dye and inhibiting groups, cap the free sulphhydryl groups with iodoacetamide, wash, repeat cycles.]
Metzker, Nat. Rev. Genet. 2010
27803 - Biological Sequence Analysis
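The difference between one-base-per-cycle chemistry and flowing one dNTP at a time shows up in the data. Below is a minimal sketch of an idealized pyrosequencing flowgram, where the light signal of each flow equals the number of identical bases incorporated. The flow order "TACG" and the noise-free signal are assumptions for illustration.

```python
# Idealized 454-style flowgram: one dNTP is flowed at a time and the signal
# is proportional to how many bases in a row are incorporated.
# The flow order and the absence of noise are illustrative assumptions.

def flowgram(sequence: str, flow_order: str = "TACG"):
    """Return a list of (nucleotide, signal) pairs, one per flow."""
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases missing from flow_order")
    signals = []
    position = 0
    flow = 0
    while position < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        run_length = 0
        # The polymerase keeps incorporating while the next base matches.
        while position < len(sequence) and sequence[position] == nucleotide:
            run_length += 1
            position += 1
        signals.append((nucleotide, run_length))
        flow += 1
    return signals

def call_bases(signals) -> str:
    """Turn flow signals back into a base sequence."""
    return "".join(nucleotide * signal for nucleotide, signal in signals)

signals = flowgram("TTAGGGC")
print(signals)
print(call_bases(signals))
```

With real, noisy signals a run of 7 identical bases is hard to tell from a run of 8, which is the origin of the homopolymer indel errors listed for 454 and IonTorrent in the platform comparison. With reversible terminators only one base is added per cycle, so the run length is never inferred from signal intensity.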

2.5 generation: Ion Torrent
Based on semiconductors, i.e. no fluorescence
The hydrogen ion (H+) released when a nucleotide is incorporated is measured by a pH meter
Small machine, low price per run

3rd generation
No amplification (PCR introduces bias!)
Simple sample preparation
Helicos, Pacific Biosciences, Oxford Nanopore

Platform comparison

3730XL
Method of amplification: Clonal plasmid amplification
Chemistry: Chain termination
Instrument cost: $376k
Yield per run: 60 kb
Read length (bases): 650
Reagent cost (library + run): $96
Cost per Mb: $1600
Primary error & error rate: Substitution, 0.1-1%
Primary advantage: Low cost for small study
Primary disadvantage: High cost for large study

454 FLX
Method of amplification: emPCR on beads
Chemistry: Synthesis (pyrosequencing)
Instrument cost: $500k
Yield per run: 900 Mb
Read length (bases): 750
Reagent cost (library + run): $6,200
Cost per Mb: $7
Primary error & error rate: Indel, 1%
Primary advantage: Long read length
Primary disadvantage: Unreliable for homopolymer regions; high cost for NGS

454 GS Jr
Method of amplification: emPCR on beads
Chemistry: Synthesis (pyrosequencing)
Instrument cost: $108k
Yield per run: 50 Mb
Read length (bases): 400
Reagent cost (library + run): $1,100
Cost per Mb: $22
Primary error & error rate: Indel, 1%
Primary advantage: Long read length
Primary disadvantage: High cost per Mb

HiSeq 2000
Method of amplification: Bridge PCR amplification
Chemistry: Synthesis (reversible termination)
Instrument cost: $690k
Yield per run: 600 Gb
Read length (bases): 100
Reagent cost (library + run): $23,610
Cost per Mb: $0.039
Primary error & error rate: Substitution, >0.1%
Primary advantage: Most output at lowest cost
Primary disadvantage: High capital cost & computation need

MiSeq
Method of amplification: Bridge PCR amplification
Chemistry: Synthesis (reversible termination)
Instrument cost: $125k
Yield per run: 1 Gb
Read length (bases): 150
Reagent cost (library + run): $1,035
Cost per Mb: $1
Primary error & error rate: Substitution, >0.1%
Primary advantage: Easy workflow & fast run
Primary disadvantage: Few reads & higher cost per Mb

SOLiD 5500
Method of amplification: emPCR on beads
Chemistry: Ligation (dual-base encoding)
Instrument cost: $595k
Yield per run: 155 Gb
Read length (bases): 75 + 35
Reagent cost (library + run): $10,503
Cost per Mb: $0.068
Primary error & error rate: Indel, >0.01%
Primary advantage: Each lane can be run independently & ability to rescue failed cycles
Primary disadvantage: Relatively short reads, more gaps in assemblies

IonTorrent
Method of amplification: emPCR on beads
Chemistry: Synthesis (H+ detection)
Instrument cost: $67.5k
Yield per run: 1 Gb
Read length (bases): 200 (318 chip)
Reagent cost (library + run): $925
Cost per Mb: $0.93
Primary error & error rate: Indel, ~1%
Primary advantage: Fast run, low cost, and trajectory to longer reads
Primary disadvantage: Unreliable for long homopolymer regions

PacBio RS
Method of amplification: None
Chemistry: Synthesis
Instrument cost: $695k
Yield per run: 20-80 Mb
Read length (bases): <1,800 - >5,000
Reagent cost (library + run): $272
Cost per Mb: $3.4-13.6
Primary error & error rate: Indel, ~15%
Primary advantage: Longest read length, single-molecule real-time sequencing
Primary disadvantage: High error rates, low output, expensive
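The cost-per-Mb row of the comparison can be recomputed from the reagent cost and the yield per run. The sketch below assumes yields are in megabases with 1 Gb = 1000 Mb, and reads the HiSeq 2000, MiSeq, SOLiD 5500 and IonTorrent yields as gigabases. The dictionary layout and names are illustrative only.

```python
# Cost per Mb = reagent cost (library + run) / yield per run in Mb.
# Values are taken from the platform comparison; yields are converted to Mb
# assuming 1 Gb = 1000 Mb. The structure and names here are illustrative.

PLATFORMS = {
    # name: (reagent cost in dollars, yield per run in Mb)
    "3730XL": (96, 0.06),
    "454 FLX": (6_200, 900),
    "454 GS Jr": (1_100, 50),
    "HiSeq 2000": (23_610, 600_000),
    "MiSeq": (1_035, 1_000),
    "SOLiD 5500": (10_503, 155_000),
    "IonTorrent": (925, 1_000),
}

def cost_per_mb(reagent_cost: float, yield_mb: float) -> float:
    """Reagent cost in dollars divided by the yield of one run in megabases."""
    return reagent_cost / yield_mb

for name, (reagent_cost, yield_mb) in PLATFORMS.items():
    print(f"{name:<12} ${cost_per_mb(reagent_cost, yield_mb):.3g} per Mb")

# PacBio RS has a yield range (20-80 Mb), so its cost per Mb is a range too.
pacbio_low, pacbio_high = cost_per_mb(272, 80), cost_per_mb(272, 20)
print(f"{'PacBio RS':<12} ${pacbio_low:.1f}-{pacbio_high:.1f} per Mb")
```

The recomputed values agree with the listed costs per Mb, which also supports reading the large yields as gigabases.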