Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Similar documents
Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Matthew Tinning Australian Genome Research Facility. July 2012

NEXT GENERATION SEQUENCING. Farhat Habib

Overview of Next Generation Sequencing technologies. Céline Keime

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

Next generation sequencing techniques" Toma Tebaldi Centre for Integrative Biology University of Trento

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

NGS technologies: a user s guide. Karim Gharbi & Mark Blaxter

Next Generation Sequencing. Tobias Österlund

Research school methods seminar Genomics and Transcriptomics

Next-generation sequencing Technology Overview

Third Generation Sequencing

Next Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms

Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie. Sander van Boheemen Medical Microbiology

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies

Next-Generation Sequencing. Technologies

Deep Sequencing technologies

Contact us for more information and a quotation

Next Generation Sequencing (NGS)

The Journey of DNA Sequencing. Chromosomes. What is a genome? Genome size. H. Sunny Sun

Next Gen Sequencing. Expansion of sequencing technology. Contents

Lecture 7. Next-generation sequencing technologies

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Ultrasequencing: methods and applications of the new generation sequencing platforms

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014

SEQUENCING FROM SAMPLE TO SEQUENCE READY

Understanding the science and technology of whole genome sequencing

Sequencing Theory. Brett E. Pickett, Ph.D. J. Craig Venter Institute

A Crash Course in NGS for GI Pathologists. Sandra O Toole

Bioinformatics: A perspective

Welcome to the NGS webinar series

The Genome Analysis Centre. Building Excellence in Genomics and Computa5onal Bioscience

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

Chapter 7. DNA Microarrays

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

Next-generation sequencing technologies

Next Generation Sequencing Technologies

Human genome sequence

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday September 15, 2014

SEQUENCING. M Ataei, PhD. Feb 2016

Next- gen sequencing. STAMPS 2015 Hilary G. Morrison Joe Vineis, Nora Downey, Be>e Hecox- Lea, Kim Finnegan

CM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION

Introduction to Next Generation Sequencing (NGS)

Sequencing techniques and applications

Next-Generation Sequencing Services à la carte

Genomic resources. for non-model systems

Bioinformatics: A perspective

Outline. General principles of clonal sequencing Analysis principles Applications CNV analysis Genome architecture

Genome Resequencing. Rearrangements. SNPs, Indels CNVs. De novo genome Sequencing. Metagenomics. Exome Sequencing. RNA-seq Gene Expression

NGS technologies approaches, applications and challenges!

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1

HLA-Typing Strategies

Bioinformatics: A perspective

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Bioinformatics Advice on Experimental Design

Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

DNA-Sequencing. Technologies & Devices

The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow

High throughput omics and BIOINFORMATICS

Genome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias

DNA-Sequencing. Technologies & Devices

The New Genome Analyzer IIx Delivering more data, faster, and easier than ever before. Jeremy Preston, PhD Marketing Manager, Sequencing

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Using New ThiNGS on Small Things. Shane Byrne

Sequencing techniques

Molecular Biology and Functional Genomic Core Facility

Functional Genomics Research Stream. Research Meetings: November 2 & 3, 2009 Next Generation Sequencing

Concepts and methods in sequencing and genome assembly

Sequencing technologies

2nd (Next) Generation Sequencing 2/2/2018

Illumina (Solexa) Throughput: 4 Tbp in one run (5 days) Cheapest sequencing technology. Mismatch errors dominate. Cost: ~$1000 per human genme

CSC Assignment1SequencingReview- 1109_Su N_NEXT_GENERATION_SEQUENCING.docx By Anonymous. Similarity Index

Single Cell Transcriptomics scrnaseq

TruSPAdes: analysis of variations using TruSeq Synthetic Long Reads (TSLR)

Understanding Accuracy in SMRT Sequencing

CBC Data Therapy. Metagenomics Discussion

GENOTYPING-BY-SEQUENCING USING CUSTOM ION AMPLISEQ TECHNOLOGY AS A TOOL FOR GENOMIC SELECTION IN ATLANTIC SALMON

NEXT GENERATION SEQUENCING Whole Gene Sequencing

DNA and genome sequencing. Matthew Hudson Dept of Crop Sciences University of Illinois

DNA. bioinformatics. genomics. personalized. variation NGS. trio. custom. assembly gene. tumor-normal. de novo. structural variation indel.

High Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015

Illumina s Suite of Targeted Resequencing Solutions

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014

RNA-Seq data analysis course September 7-9, 2015

DNA Sequencing. Happiness Kumburu BSU- workshop Nov, 2016

Galaxy Workshop

Analytics Behind Genomic Testing

Experimental Design Microbial Sequencing

High throughput sequencing technologies

Introduction to Microbial Sequencing

Ultrasequencing: Methods and Applications of the New Generation Sequencing Platforms


solid S Y S T E M s e q u e n c i n g See the Difference Discover the Quality Genome

Transcription:

Sequencing technologies Jose Blanca COMAV institute bioinf.comav.upv.es

Outline Sequencing technologies: Sanger 2nd generation sequencing: 3er generation sequencing: 454 Illumina SOLiD Ion Torrent PacBio Nanopore General considerations Reducing the complexity

Cost per raw Megabase of DNA

Technological exponential growth Nokia 1200 (2007) iphone 5S (2013)

Sanger sequencing

Sanger sequencing Traditional DNA sequencing method Ideal for small sequencing projects Read length around 600-700 bp Around 5-10$ per reaction 384 reactions in parallel at most Applied Biosystems is the main technological provider

Sanger sequencing

Sanger sequencing

ABI files Sequences are usually delivered as ABI files. Binary files. Contain real signal, chromatogram and called sequence and quality They can be opened with: chromas (win) (www.technelysium.com.au/chromas.html) Sequence scanner (win) (Applied Biosystems) FinchTv (win, mac, linux) (www.geospiza.com/products/finchtv.shtml) trev (win, mac, linux) (part of Staden). Usually they have a.ab1 extension.

Sequence and quality Phred score = - 10 log (prob error)

Sequence and quality Sequence in fasta format >SGN-E577821 TAACAATAACAGTCAGAGCCCGTGTGGGAGCAGTACCGTGGAGTCATCCAGCGGAG AAACGGTTGTTCACGCGCCTAATACGCGACACGCGCCGC >SGN-E577821 54 57 54 57 48 48 48 48 57 57 57 47 47 41 42 41 47 57 57 57 57 47 44 44 44 44 50 50 54 57 57 46 43 37 44 43 57 37 37 37 57 57 57 57 52 52 52 52 57 50 47 47 52 52 47 52 52 50 50 50 50 50 57 57 54 57 57 57 57 57 57 57 46 46 57 57 57 57 57 57 57 57 57 57 57 57 57 57 57 57 57 57 57 29 29

Sequence and quality Due to technical limitations different technologies have different errors patterns.

Sanger sequencing Quality is worst at the beginning and at the end.

Other sources of error Pre-sequencing: PCR mutation-like errors Cloning artifacts, chimeric molecules Post-sequencing: Assembly artifacts Alignment errors due to: Reference Alignment algorithms SNV calling software

2nd generation sequencing

Sanger vs NGS sequencing Sanger NGS Num. sequences per reaction 1 clone Millions of molecules Max. parallelization 384 Several millions Sequence quality High Low Sequence length 600-800 bp 35-20000 (depends on the platform) Throughtput Low High

Sanger vs NGS

Library preparation Fragmentation Sonication Nebulization Shearing Size selection End repair Sequencing adaptor ligation Purification

454 First NGS platform (and first to be phased out) Pirosequencing based chemistry Long reads (400-700bp) Most expensive cost per base Ideal for de novo sequencing projects Owned by Roche ½, ¼ and 1/8 run can be ordered >1 million reads GS FLX+ and GS Junior

454

454 quality The lengthiest the homopolymer the less quality. It is very difficult to differentiate AAAAAA from AAAAA. Quality diminishes with the sequence length.

Illumina Previously known as Solexa Reversible terminators based sequencing technique Short reads (50 or 250bp depending on the version) Lowest cost per base Ideal for resequencing projects Highest throughput Runs divided in 8 lanes Up to 4000 million reads Can sequence both ends of the molecules (paired ends) HiSeq2500, NextSeq 500 and MiSeq

Illumina 1 sample: 30X human genome

HiSeq X Ten

Illumina

Illumina Quality diminishes with sequence length. No homopolymer problem, mainly substitution errors.

SOLiD Ligation based sequencing chemistry Short reads (35-75bp depending on the version) Only for resequencing projects It used to produce nucleotide sequences, but colors Color sequences have poor quality, but nucleotide sequences have high quality 115 or 320 million reads

SOLiD

SOLiD Really?

Ion Torrent Around 60-80 M reads. 200 pb length. Sequences based on H+ production Error rates lower than other 2nd generation Error pattern similar to 454, with homopolymer problem. Very cheap per run. Belongs to Life technologies (Applied Biosystems)

Ion Torrent

3rd generation sequencing

PacBio 3rd generation platform (single molecule) Polymerase based chemistry (SMRT) Longest NGS reads (up to 20,000bp) Very high error rate Ideal for de novo sequencing projects 45000 reads

PacBio 3rd generation, single molecule detection. No amplification step required. Nucleotides labeled on the phosphate removed during the polymerization. Sequencing based on the time required by the polymerase to incorporate a nucleotide (Polymerase requires milliseconds versus microseconds for the stochastic diffusion)

PacBio length distribution Longest lengths Limited by the polymerase damages due to the laser Sequencing strategies Standard Circular consensus Taken from flxlexblog.wordpress.com Distributions for standard sequencing, for circular the mean is around 250 pb.

PacBio quality distribution Distributions for standard sequencing. For circular sequencing the quality is at least 99.9% Taken from flxlexblog.wordpress.com

Nanopore Senses differences in ion flow USB powered

Nanopore-first data Not very reliable (yet)

Nanopore alignments

Nanopore alignments

Nanopore accuracy

Nanopore-Illumina hybrid error correction

Sequencing technologies Modified from Michael Stromberg

NGS capabilities Modified from ngsbuzz.blogspot.com Ngsbuzz gives 2.94 Gb yield to Pacbio.

Throughput

Cost per nucleotide

Cost per reaction

Read length

Bioinformatic challenges Huge data files handling. Beefy computers required. Software still being developed or missing. Ad-hoc software required during the analysis. Existing software tailored to experienced bioinformaticians. Dollar for dollar rule proposed EDSAC by Computer Laboratory Cambridge

Bioinformatic challenges Amount of data managed on a typical transcriptome assembly

NGS vs Hard Drive Stein Genome Biology 2010 11:207 doi:10.1186/gb-2010-11-5-207

Reducing the complexity

Genome Pros: Finest resolution Reproducible Cons: Expensive ($600 per sample) Lots of information will be lost if no reference is available, especially in the repetitive regions.

RNASeq Pros: Cheaper than whole genome sequencing ($300) Well proven methodologies Cons: RNA handling For many samples is pricier than GBS Follows gene density Reproducible Follows gene density

Exome

Exome

Exome Pros: More complete representation than RNASeq More reproducible than RNASeq Cons: Exome capture platforms only available in model species Pricier than RNASeq

doi: 10.1534/genetics.112.147710 Genotyping by Sequencing (GBS) GBS review: Nature Reviews Genetics 12, 499510 (July 2011) doi:10.1038/nrg3012

Genotyping by Sequencing (GBS) Pros: Cheap ($50 per sample) Lots of variation Cons: Prone to artifacts (e.g. false SNPs due to repetitive DNA) if no reference genome is available. Degree of coverage along the genome depends on the Restriction Enzyme chosen How reproducible is it? GBS based SNPs in Soy doi: 10.1534/genetics.112.147710

Amplicons Pros: Cons: Cheap for few genes Not scalable for lots of genes Labor intensive Previous sequence information is required Amplicon sets can be ordered, but the design is expensive

This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Jose Blanca COMAV institute bioinf.comav.upv.es