Sequencing Theory. Brett E. Pickett, Ph.D. J. Craig Venter Institute

Applications of Genomics and Bioinformatics to Infectious Diseases. GABRIEL Network

Agenda: Sequencing instruments (Sanger, Illumina, Ion Torrent, Oxford Nanopore, PacBio)

Virus (or Pathogen) Sequencing. Applications of NGS to the study of virus evolution and molecular epidemiology: tracking the evolution of viruses over time; better understanding the selective pressures that drive virus evolution; identifying the origins (reservoirs) of outbreak strains; investigating transmission dynamics; identifying molecular determinants of host range; and identifying evolutionarily conserved regions for targeted vaccines.


Some Trivia. What year was the first whole genome sequence reported? a) 1969 b) 1977 c) 1981 d) 1985. For which organism? Bacteriophage ΦX174 (5,375 bp). What method was used? Dideoxy chain termination with ³²P (aka Sanger sequencing).
What year was the first whole genome sequence for a free-living organism reported? a) 1979 b) 1984 c) 1989 d) 1995. For which organism? Haemophilus influenzae (1.8 × 10⁶ bp). What method was used? Sanger sequencing with fluorescence.

JCVI Joint Technology Core: ABI 3730xl. Capacity: 240,000 sequences/day, or 80 million lanes/year, at 24 runs per day.

New JCVI Joint Technology Core: Illumina NextSeq/MiSeq (800 million reads/run) and Oxford Nanopore MinION.

Sanger vs NGS

Change in Cost. [Figure: sequencing cost over time, spanning the 1st generation, next generation, and next generation with broad adoption.] Source: Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at: http://www.genome.gov/sequencingcosts/. Accessed 23 Aug 2017.

Sanger

Sanger 1 => Sanger 2

A chromatogram

Illumina

Illumina Instruments https://www.illumina.com/systems/sequencing-platforms/comparison-tool.html

Illumina Sequencing (Optics-Based)

Ion Torrent

Ion Torrent Sequencing (H+-Based). High throughput.
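Ion Torrent flows one nucleotide type at a time and detects the H+ released when that nucleotide is incorporated, so a homopolymer run releases proportionally more H+ in a single flow. The toy, noise-free model below illustrates why homopolymer length is the hard call: the only thing separating GGG from GGGG is the size of one signal. The flow order `TACG` and the helpers `simulate_flows` and `call_bases` are hypothetical illustrations, not Ion Torrent's software.

```python
# Toy model of Ion Torrent flow-space base calling (illustration only).
# FLOW_ORDER is a hypothetical repeating flow order chosen for the example.
FLOW_ORDER = "TACG"


def simulate_flows(sequence, flow_order=FLOW_ORDER):
    """Ideal, noise-free signal per flow: each flow of one nucleotide
    incorporates the entire homopolymer run of that base (if any), so the
    signal equals the run length (0 when the base does not match)."""
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals = []
    pos, flow = 0, 0
    while pos < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append(float(run))
        flow += 1
    return signals


def call_bases(flow_signals, flow_order=FLOW_ORDER):
    """Turn per-flow signals back into bases by rounding each signal to the
    nearest whole homopolymer length (Python's round() sends exact halves
    to the even integer; real base callers use a statistical model)."""
    calls = []
    for flow, signal in enumerate(flow_signals):
        nucleotide = flow_order[flow % len(flow_order)]
        calls.append(nucleotide * max(0, round(signal)))
    return "".join(calls)


clean = simulate_flows("TTAGGGC")
print(clean)              # [2.0, 1.0, 0.0, 3.0, 0.0, 0.0, 1.0]
print(call_bases(clean))  # TTAGGGC

# A 4-base G run whose signal comes out low (3.4) is under-called by one base.
print(call_bases([0.1, 0.05, 0.0, 3.4]))  # GGG
```

Rounding each flow independently is the simplest possible caller; it makes the homopolymer weakness listed for Ion Torrent easy to see, since a small signal error turns directly into an insertion or deletion.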

PacBio

Single-molecule detection; sequencing by synthesis with single-base incorporation. The same molecule is sequenced multiple times, so errors are random and easy to detect, and a consensus is easy to generate.
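Because the same molecule is read several times and the errors are random rather than systematic, even a simple column-wise vote recovers the true base at each position. A minimal sketch, assuming the passes have already been aligned to equal length (real circular consensus must also handle insertions and deletions, which this sketch ignores); `majority_consensus` is a hypothetical helper.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over repeated reads of the same molecule.

    Assumes the passes are pre-aligned and of equal length. On a tie,
    Counter.most_common returns the base that was seen first."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


passes = [
    "ACGTTAGC",
    "ACGATAGC",  # random error at position 3
    "ACGTTAGC",
    "ACGTTGGC",  # random error at position 5
    "ACGTTAGC",
]
print(majority_consensus(passes))  # ACGTTAGC
```

The vote only works because the errors land at different positions in different passes; a systematic error repeated in every pass would survive it.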

Oxford Nanopore

Oxford Nanopore sequencing: DNA is pushed through a nanopore in a lipid membrane, with speed control provided by a Phi29 DNA polymerase, and the sequence is read by measuring changes in the ionic current through the pore under an applied electric field. It is often combined with another platform to improve the quality of an assembly.

[Figure: read length versus read GC content and versus read quality score, with read-count distributions, for Oxford Nanopore and PacBio reads.]

Long Read Technology Comparison. Advantages: full-length transcriptomes, including splice variants; resolution of long repeat regions in genomes; genomic structural variants; haplotype phasing. Disadvantages: high error rates; higher cost; lower throughput.

Instrument Comparison (platform: advantages; disadvantages)
HiSeq PE run (2x75): high throughput, lowest per-base cost; short reads, long run time
MiSeq PE run (2x300): high throughput, low per-base cost, fast turnaround?
Ion Torrent (200 bp, 318 chip): fast turnaround; data quality, homopolymers
Oxford Nanopore: fast turnaround, various use cases, long reads, low-cost instrument; base-calling
PacBio: various use cases, long reads; intensive library prep, instrument cost
Sanger run (96 wells): high-quality data, long read length; high cost, low throughput

FASTQ Format. Each record spans four lines: the header (starting with @), the read (sequence), a + separator, and the quality scores (Phred+33), one character per base.
@HWUSI-EAS582_157:6:1:1:1501/1
NCACAGACACACACGAACACACAAAGACATGCCCATATGAAGAT
+
%.7786867:778556858746575058873/347777476035
@HWUSI-EAS582_157:6:1:1:1606/1
NCTGGCACCTTGATTTTGGACTTCCCAGCCTCCAGAACTGTGAG
+
%1948988888798988366898888648998788898888588
@HWUSI-EAS582_157:6:1:1:453/1
NCTGCTTGCACCCCTGAAGTCACTGATCACATTTCAGGGTCACC
+
%/868998988888867668888986644788988413488885
@HWUSI-EAS582_157:6:1:1:1844/1
NGATTGACATTGGCAAAGAGGACAACTGATTGCAAACTTCACAC
+
%-7;:::::;86499;75574586::635:62687666887879
@HWUSI-EAS582_157:6:1:1:1707/1
NAGGCTCAGGCGCACGGCCTACATCGTCGCTGTCGGCCAAGGGG
+
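A minimal parser for the four-line FASTQ layout, assuming Phred+33 encoding and unwrapped (one line per sequence) records; `parse_fastq` and `FastqRecord` are illustrative names, not a library API. Decoding a quality character is just its ASCII code minus 33, so `%`, `.` and `7` at the start of the first record become Q4, Q13 and Q22 (the low Q4 sits on the leading `N`).

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str           # header text after the leading '@'
    sequence: str         # the read
    qualities: List[int]  # Phred scores decoded from the quality line


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a 4-line-per-record FASTQ stream.

    offset=33 corresponds to Phred+33 encoding. This sketch does not
    handle sequences wrapped over several lines."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, [ord(c) - offset for c in quality])


example = """@HWUSI-EAS582_157:6:1:1:1501/1
NCACAGACACACACGAACACACAAAGACATGCCCATATGAAGAT
+
%.7786867:778556858746575058873/347777476035
"""

for rec in parse_fastq(io.StringIO(example)):
    print(rec.header, len(rec.sequence), rec.qualities[:3])
    # HWUSI-EAS582_157:6:1:1:1501/1 44 [4, 13, 22]
```

The length check catches truncated records, such as a final record whose quality line is missing.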

Assessing Quality: Phred Scores. Phred quality scores were originally produced by the Phred base-calling program, using a statistical analysis of Sanger chromatogram trace files, in support of the Human Genome Project. They were subsequently adapted to NGS technologies for judging the quality of sequences:
$$Q = -10 \log_{10} P_e$$
where $P_e$ is the error probability of a given base call.
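The formula inverts to $P_e = 10^{-Q/10}$, so every 10 points of Q is a tenfold drop in error probability: Q20 means a 1 in 100 chance that the base call is wrong, and Q30 means 1 in 1,000. A small sketch of both directions (the function names are hypothetical):

```python
import math


def phred_to_error(q: float) -> float:
    """P_e = 10^(-Q/10), the inverse of Q = -10 log10(P_e)."""
    return 10 ** (-q / 10)


def error_to_phred(p_e: float) -> float:
    """Q = -10 log10(P_e); P_e must lie in (0, 1]."""
    if not 0 < p_e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_e)


for q in (10, 20, 30, 40):
    print(q, f"{phred_to_error(q):.4f}")
# 10 0.1000
# 20 0.0100
# 30 0.0010
# 40 0.0001
```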

Acknowledgements. JCVI: Vinita Puri; William Nierman, Ph.D.; Karen Nelson, Ph.D.; Alan Durbin; Torrey Williams; Kari A. Dilley, Ph.D.; Lauren Oldfield, Ph.D.; Susmita Shrivastava; Nadia Fedorova; Mark Novotny; Paolo Amedeo, Ph.D.; Reed S. Shabman, Ph.D.; Gene Tan, Ph.D. Grant: U19AI110819.

Questions? bpickett@jcvi.org