Next Generation Sequencing Lecture, Saarbrücken, 19 March 2012: Sequencing Platforms

Contents
- Introduction
- Sequencing Workflow
- Platforms
  - Roche 454
  - ABI SOLiD
  - Illumina Genome Analyzer / HiSeq
- Problems
- Quality Scores
- Tweaks
- Data Formats
- Resources

Introduction
- Rapid technological development
- 3 NGS platforms mainly used:
  - Roche 454
  - ABI SOLiD
  - Illumina Genome Analyzer / HiSeq
- Others
  - Sanger ("1st generation")
  - Polonator
  - Ion Torrent
  - Budget systems: Illumina MiSeq, 454 Junior
  - Helicos Heliscope
  - Oxford Nanopore
  - Pacific Biosciences SMRT
Source: Baker, Nature Methods, 2010
Sequencing Platforms, Fabian Müller, 2012-03-19

Distribution of Platforms
[Figures: distribution of platforms, shown for all platforms, GAII, HiSeq, SOLiD, and 454. Source: http://pathogenomics.bham.ac.uk/hts/]

Workflow
- Library preparation
  - Fragmentation
  - Adapter ligation
  - Size selection
  - Amplification
- Dehybridization
- Immobilization
- (2nd) Amplification
  - Emulsion PCR
  - Solid phase
Source: Mardis, Annu. Rev. Genomics Hum. Genet., 2008

Workflow
- Library preparation
  - Fragmentation
  - Adapter ligation
  - Size selection
  - Amplification
- Dehybridization
- Immobilization
- (2nd) Amplification
  - Emulsion PCR
  - Solid phase
- Sequencing
  - By synthesis
    o Cyclic reversible termination
    o Single-nucleotide addition
  - By ligation
- Imaging
- Base calling
- Alignment/assembly
- Higher level data processing
- Quality control

Illumina Genome Analyzer / HiSeq
- Previously: Solexa
- Then: Genome Analyzer
- State of the art: HiSeq 2500
- Currently the most widely used platform
- Cluster generation step
- Sequencing by synthesis
  - Reversible terminators

Illumina Genome Analyzer / HiSeq
- Hybridization to flow cell
  - Lawn of sequences complementary to the adapters
- 1 flow cell has 8 lanes
Source: Mardis, Annu. Rev. Genomics Hum. Genet., 2008; www.illumina.com

Illumina Genome Analyzer / HiSeq
- Cluster generation
  - Bridge amplification
  - Denaturation
- Random cluster distances
- Strand removal such that a single direction can be sequenced
Source: Mardis, Annu. Rev. Genomics Hum. Genet., 2008

Illumina Genome Analyzer / HiSeq
- Sequencing by synthesis using reversible terminator chemistry
  - Add sequencing primer
  - Add labeled nucleotides
  - Excite and detect light emission
  - Remove blocking group
- Image analysis for cluster identification
- Sequence of images yields DNA sequence
Source: Metzker, Nature Reviews Genetics, 2010

Illumina Genome Analyzer / HiSeq
- Extensive output
  - HiSeq processes 2 flow cells simultaneously and images each from top and bottom
- Substitutions are the most common error
- Spike-ins facilitate quality control and base calling calibration
  - E.g. ΦX174 phage genome
- Limitations
  - Read length limited by dephasing
  - Quality decreases towards read ends as signal intensities decline
  - Substitution biases, as only 2 lasers excite the 4 dNTPs (A/C, G/T)
    o Alternative base callers (e.g. Ibis)
  - Low complexity reads
    o Result from sequencing junk (e.g. dust, lint, ...)

Roche 454 GS20 / FLX / Titanium
- Current machine version: GS FLX+
- Emulsion PCR amplification
- Sequencing by synthesis
  - Pyrosequencing

Roche 454 GS20 / FLX / Titanium
- Emulsion PCR
  - Use a water-in-oil emulsion to isolate single DNA molecules
  - Amplification in microreactors produces millions of copies on each bead
  - Also applies to ABI SOLiD
- Molecule-to-bead ratio is chosen to ensure 1 molecule per bead
- Occupied beads can be separated from empty ones via the second adapter sequence
Source: Metzker, Nature Reviews Genetics, 2010
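The molecule-to-bead ratio can be explored with a small sketch. It assumes, as a simplification not stated on the slide, that the number of template molecules ending up with a bead follows a Poisson distribution with mean `lam`. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k template molecules, assuming Poisson loading."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def loading_fractions(lam: float) -> dict:
    """Split beads into empty, single-template and mixed (2 or more templates)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {"empty": empty, "single": single, "mixed": 1.0 - empty - single}


# Illustrative ratios only; real protocols titrate this experimentally.
for lam in (0.1, 0.5, 1.0):
    f = loading_fractions(lam)
    print(f"lam={lam}: empty={f['empty']:.3f} "
          f"single={f['single']:.3f} mixed={f['mixed']:.3f}")
```

Under this assumption a low ratio keeps mixed beads rare at the price of many empty beads, which is why occupied beads need to be enriched afterwards.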

Roche 454 GS20 / FLX / Titanium
- A picotiter plate contains 1 bead per well
  - ~2M wells
- Reagents are added
- Nucleotides (unlabeled) are successively washed across the plate
- ATP-driven luciferase light reactions make it possible to monitor which and how many bases are incorporated
- Imaging via high-resolution CCD camera
Source: Metzker, Nature Reviews Genetics, 2010
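How light intensities become bases can be sketched as follows. This is a simplified illustration, not the vendor's base caller: it assumes signals are already normalized so that 1.0 corresponds to one incorporated base, and it assumes a cyclic flow order of T, A, C, G. The signal values are made up.

```python
def flowgram_to_sequence(signals, flow_order="TACG"):
    """Turn normalized per-flow light signals into a base sequence.

    Each flow offers one nucleotide; the rounded signal is taken as the
    number of bases of that type incorporated (the homopolymer length).
    """
    bases = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        count = max(0, int(round(signal)))
        bases.append(base * count)
    return "".join(bases)


# Made-up signals for two rounds of the T, A, C, G flow cycle.
signals = [1.03, 0.05, 0.98, 2.10, 0.02, 3.45, 1.01, 0.00]
print(flowgram_to_sequence(signals))
```

A signal such as 3.45 sits almost halfway between 3 and 4 bases, which illustrates why long homopolymers tend to produce insertion and deletion errors.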

Roche 454 GS20 / FLX / Titanium
- Problems
  - Mixed beads
    o Software postprocessing
  - Long homopolymers can lead to inconsistent calls
    o Primary errors are insertions and deletions
  - Bleed-over signals and ghost wells
    o Strong light emissions may influence neighboring well readout
    o Software correction
- Limitations
  - Emulsion PCR technically challenging
  - Polymerase and luciferase efficiency drops during run
- Long reads
- Deep sequencing

ABI SOLiD
- Life Technologies / Applied Biosystems
- Current machine versions: SOLiD4, 5500XL
- Emulsion PCR
  - Similar to 454
- Sequencing by ligation
Source: http://www.appliedbiosystems.com

ABI SOLiD
- Use labeled oligonucleotides
  - Degenerate positions 3-5
  - Specific dinucleotides at positions 1-2
  - 1 of 4 fluorescent dyes
Source: Mardis, Annu. Rev. Genomics Hum. Genet., 2008

ABI SOLiD
- Sequencing (repeat for 5 primer offsets; for each offset, repeat for n cycles):
  - Ligation of oligos from mixture
    o First 2 bases will match the template
  - Imaging
  - Capping of unextended probes
    o Phosphatase treatment to prevent any remaining unextended strands from contributing to out-of-phase ligation events
  - Cleaving off the fluor
Source: Mardis, Annu. Rev. Genomics Hum. Genet., 2008

ABI SOLiD
- Imaging cycling produces a chain of colors (color space)
- Each base is captured twice
- If the first base is known (we know the adapter), then for a given color sequence the remaining bases follow
- Alignment in color space
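The step from colors back to bases can be illustrated with a short sketch of the two-base encoding, in which each color depends only on a pair of adjacent bases. With the 2-bit codes A=0, C=1, G=2, T=3, the color is the XOR of the two codes. Function names are hypothetical, and missing color calls (written as "." in csfasta files) are not handled.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colors(primer_base, seq):
    """Encode a base sequence as a color-space read led by the known primer base."""
    prev = CODE[primer_base]
    colors = []
    for b in seq:
        cur = CODE[b]
        colors.append(str(prev ^ cur))  # color = XOR of adjacent base codes
        prev = cur
    return primer_base + "".join(colors)


def decode_colors(cs_read):
    """Decode a color-space read; the leading primer base is not part of the output."""
    prev = CODE[cs_read[0]]
    bases = []
    for c in cs_read[1:]:
        prev ^= int(c)  # next base follows from previous base and color
        bases.append(BASE[prev])
    return "".join(bases)


read = encode_colors("T", "ACGGT")
print(read, "->", decode_colors(read))
# A single wrong color changes every base decoded after it.
print("T32301", "->", decode_colors("T32301"))
```

Because one miscalled color corrupts all downstream bases when decoding naively, reads are usually aligned in color space rather than translated first.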

ABI SOLiD
- Double interrogation of each base facilitates discrimination of errors from true polymorphisms (SNPs)
  - If a reference sequence is present
  - Works better in theory than in practice
- High accuracy
- Problems
  - Probes do not necessarily ligate next to the primer, which leads to signal decline
- Limitations
  - Emulsion PCR technically challenging
  - Long run times
  - Short read lengths

Problems: All Platforms
- Dephasing
  - Sequencing cycles out of sync
  - Source
    o Multiple bases inserted
    o No base inserted
    o Terminator stuck or ineffective
- Adapter problems
  - Adapter chimeras
  - Sequencing into the adapter
- PCR artifacts
  - E.g. coverage variation
- Library contamination
- Local effects
  - E.g. bubbles, machine calibration, incomplete mixing of reagents
- Broken chemistry
  - E.g. degraded fluorophores/polymerase
- QC is essential!
  - Tools: FastQC, SuperDeDuper, samtools, GATK, ...
    o See exercise
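Why dephasing limits read length can be sketched with a toy model. It assumes, hypothetically, that in every cycle each strand of a cluster independently falls behind with probability `p_lag` or runs ahead with probability `p_lead`, and never resynchronizes. The rates below are illustrative, not measured values.

```python
def in_phase_fraction(cycle, p_lag=0.005, p_lead=0.002):
    """Fraction of strands in a cluster still in sync after `cycle` cycles."""
    return (1.0 - p_lag - p_lead) ** cycle


for cycle in (25, 50, 100, 150):
    print(f"cycle {cycle:3d}: {in_phase_fraction(cycle):.2f} of strands in phase")
```

Even small per-cycle failure rates compound geometrically, so the in-phase signal shrinks with every cycle while the out-of-phase background grows.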

Platform Comparison

                   Roche 454      ABI SOLiD              Illumina HiSeq
Read length        700-1000 bp    75 bp (75+35 bp PE)    2 * 105 bp
Runtime            23 h           7 d / genome           12 d
Initial release    10/2005        2007                   Early 2007
#reads             1*10^6         6*10^8                 3*10^9 (SE), 6*10^9 (PE)
Error rates        ~1%            ~0.1%                  ~1%
Sequencing cost    ~20 $ / Mb     ~0.5 $ / Mb            ~30 $ / Gb

Machine cost: ~690,000 $
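As a rough way to read these numbers, total yield is reads times read length, and dividing by the genome size gives the average coverage. The sketch below assumes a human-sized genome of 3.2 billion bases and uniformly distributed reads, neither of which comes from the slide; the inputs are the read counts and lengths from the comparison (lower bound of the 454 read length, paired-end count for HiSeq).

```python
GENOME_SIZE = 3.2e9  # assumption: approximate human genome size in bases


def expected_coverage(n_reads, read_length, genome_size=GENOME_SIZE):
    """Average coverage C = N * L / G for N reads of length L on a genome of size G."""
    return n_reads * read_length / genome_size


runs = {
    "Roche 454": (1e6, 700),
    "ABI SOLiD": (6e8, 75),
    "Illumina HiSeq (PE)": (6e9, 105),
}
for name, (n_reads, read_length) in runs.items():
    print(f"{name}: {n_reads * read_length:.2e} bases, "
          f"{expected_coverage(n_reads, read_length):.2f}x coverage")
```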

Quality Scores
- Phred score (Q): $Q = -10 \log_{10} P$
  - Here P denotes the estimated base calling error probability
- Base quality scores tend to decline towards the end of the read
  - Reads are often trimmed before or in the alignment step
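The Phred relation and a very simple form of end trimming can be sketched as below. The trimming rule (cut trailing bases below a threshold) and the threshold of 20 are assumptions chosen for illustration; real trimmers use more robust criteria.

```python
import math


def phred_from_error(p):
    """Q = -10 * log10(P), with P the estimated error probability."""
    return -10.0 * math.log10(p)


def error_from_phred(q):
    """Inverse relation: P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def trim_3prime(quals, min_q=20):
    """Return how many leading bases to keep after dropping low-quality trailing bases."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return end


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {error_from_phred(q):.4f}")

quals = [40, 38, 35, 12, 30, 8, 5]
print("keep first", trim_3prime(quals), "bases")
```

So Q20 corresponds to 1 error in 100 base calls and Q30 to 1 in 1000.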

Tweaks
- Paired-end sequencing
  - Virtually increases read length
  - Better mapping
  - Long inserts allow for efficient assembly
  - Helpful in resolving structural variations and repetitive regions
- Mate-pair libraries
  - Similar to paired end, but involves circularization
  - Used for larger DNA molecules
  - Provides distance information
Source: www.illumina.com, Kircher 2011

Tweaks
- Directional libraries
  - Sequence only 1 strand
- Barcoding
  - Aka multiplexing
  - Adding sample-specific tags allows for sequencing multiple samples in a single lane
  - The samples can be separated based on their tags
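Separating samples by their tags can be sketched as follows. The sketch assumes the tag sits at the start of each read, that all tags have the same length, and that only exact matches count; barcodes, sample names and reads are made up. In practice the tag is often read separately and mismatches are tolerated.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode and strip the barcode.

    reads: iterable of sequence strings
    barcodes: dict mapping barcode sequence -> sample name
    """
    lengths = {len(bc) for bc in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    bc_len = lengths.pop()

    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:bc_len], read[bc_len:]
        by_sample[barcodes.get(tag, "undetermined")].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "sampleA", "TGCA": "sampleB"}
reads = ["ACGTGGGTTT", "TGCAAAACCC", "NNNNGGGG"]
print(demultiplex(reads, barcodes))
```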

File Formats
- Image data
  - Usually discarded after base calling
- FASTA/FASTQ
  - Identifier (typically specifies flow cell location)
  - Sequence
  - Quality scores (FASTQ only)
- SAM/BAM
  - File format for aligned reads
  - However, due to good compression and annotation, also often used for storing unaligned reads
  - More in the alignment lecture

File Formats
- FASTA/FASTQ
  - Identifier (typically specifies flow cell location and read number)
  - Sequence
  - Quality scores (ASCII encoded, FASTQ only)
- Color space equivalents exist for SOLiD
- Different ASCII encodings for quality scores exist

*.fastq
    @HWUSI-EAS100R:6:73:941:1973#0/1
    GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
    +HWUSI-EAS100R:6:73:941:1973#0/1
    IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
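Reading such a record and decoding its quality string can be sketched as below. The parser assumes the simple four-lines-per-record layout and is not a full FASTQ implementation. The ASCII offset is a parameter because different encodings exist; 33 is used here on the assumption that the example is Phred+33 encoded, since its "9" character has an ASCII code below 64.

```python
def parse_fastq(text):
    """Yield (identifier, sequence, quality string) from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record starting at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


def decode_quality(qual, offset=33):
    """Convert ASCII-encoded qualities to integer Phred scores."""
    return [ord(c) - offset for c in qual]


example = "\n".join([
    "@HWUSI-EAS100R:6:73:941:1973#0/1",
    "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC",
    "+HWUSI-EAS100R:6:73:941:1973#0/1",
    "I" * 30 + "9IG9IC",  # quality string of the example record
])

for name, seq, qual in parse_fastq(example):
    scores = decode_quality(qual)
    print(name, len(seq), "bases, last six scores:", scores[-6:])
```

Decoding the same string with a different offset (for example 64) would shift every score, so the encoding has to be known before any quality filtering.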

File Formats
- FASTA/FASTQ
  - Identifier (typically specifies flow cell location and read number)
  - Sequence
  - Quality scores (ASCII encoded, FASTQ only)
- Color space equivalents exist for SOLiD
- Different ASCII encodings for quality scores exist

*.csfasta
    >186_2041_1641_F3
    T122233110.3012011122133012030.1110.31220022220.120
    >186_2041_1706_F3
    T11132121312201321220103230123.2113.31201112230.031
    >186_2041_1709_F3
    T2103022220322301123212223030330323320201102233.123

*.qual
    >97_2040_1850_F3
    38 36 26 33 41 26 24 33 28 31 27 23 5 35 32 31 11 10 24 38 22 24 7 12 15 21 12 18 34 31 27
    >97_2040_1898_F3
    41 41 41 38 32 29 39 24 23 36 32 38 25 30 28 21 27 33 34 33 24 27 9 35 34 14 30 18 33 8 13 32

Resources
- Seqanswers
  - Forum and wiki for all sorts of questions concerning NGS
  - http://seqanswers.com
- NCBI Sequence Read Archive (SRA), formerly Short Read Archive
  - Data archive for NGS data
  - Discontinued? Maybe not
  - http://www.ncbi.nlm.nih.gov/sra
- European Nucleotide Archive (ENA)
  - http://www.ebi.ac.uk/ena/
- DNAnexus
  - Cloud-based data management and analysis capabilities for sequencing providers and researchers
  - https://dnanexus.com/

Sequencing Platforms
- Questions?