BIOINFORMATICS 1 SEQUENCING TECHNOLOGY. DNA story. DNA story. Sequencing: infancy. Sequencing: beginnings 26/10/16. bioinformatic challenges

Similar documents
Matthew Tinning Australian Genome Research Facility. July 2012

Overview of Next Generation Sequencing technologies. Céline Keime

Human genome sequence

Next Gen Sequencing. Expansion of sequencing technology. Contents

Aaron Liston, Oregon State University Botany 2012 Intro to Next Generation Sequencing Workshop

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

DNA-Sequencing. Technologies & Devices. Matthias Platzer. Genome Analysis Leibniz Institute on Aging - Fritz Lipmann Institute (FLI)

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday September 15, 2014

The Journey of DNA Sequencing. Chromosomes. What is a genome? Genome size. H. Sunny Sun

Third Generation Sequencing

Genome Sequencing. I: Methods. MMG 835, SPRING 2016 Eukaryotic Molecular Genetics. George I. Mias

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014

Contact us for more information and a quotation

Chapter 7. DNA Microarrays

Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Introduction to Next Generation Sequencing (NGS)

Phenotype analysis: biological-biochemical analysis. Genotype analysis: molecular and physical analysis

Next Generation Sequencing (NGS)

Next Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms

Functional Genomics Research Stream. Research Meetings: November 2 & 3, 2009 Next Generation Sequencing

Next-generation sequencing Technology Overview

Nuts and bolts of phage genome sequencing. the 5 5 and 5 8 perspective. Allison Johnson & Anneke Padolina

Next generation sequencing techniques" Toma Tebaldi Centre for Integrative Biology University of Trento

Next Generation Sequencing. Jeroen Van Houdt - Leuven 13/10/2017

Concepts and methods in sequencing and genome assembly

DNA Sequencing by Ion Torrent. Marc Lavergne CHEM 4590

High Throughput Sequencing Technologies. J Fass UCD Genome Center Bioinformatics Core Tuesday December 16, 2014

High Throughput Sequencing Technologies. UCD Genome Center Bioinformatics Core Monday 15 June 2015

NGS technologies approaches, applications and challenges!

High throughput DNA Sequencing. An Equal Opportunity University!

Next Generation Sequencing. Simon Rasmussen Assistant Professor Center for Biological Sequence analysis Technical University of Denmark

Next Generation Sequencing. Josef K Vogt Slides by: Simon Rasmussen

DNA-Sequencing. Technologies & Devices

Introduction to NGS. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Next-Generation Sequencing. Technologies

Sequencing Theory. Brett E. Pickett, Ph.D. J. Craig Venter Institute

Welcome to the NGS webinar series

A Crash Course in NGS for GI Pathologists. Sandra O Toole

DNA-Sequencing. Technologies & Devices

Sequencing technologies

Genome 373: High- Throughput DNA Sequencing. Doug Fowler

DNA and genome sequencing. Matthew Hudson Dept of Crop Sciences University of Illinois

Research school methods seminar Genomics and Transcriptomics

Genome Sequencing Technologies. Jutta Marzillier, Ph.D. Lehigh University Department of Biological Sciences Iacocca Hall

Wheat CAP Gene Expression with RNA-Seq

2/5/16. Honeypot Ants. DNA sequencing, Transcriptomics and Genomics. Gene sequence changes? And/or gene expression changes?

DNA-Sequenzierung. Technologien & Geräte

Introduction to metagenome assembly. Bas E. Dutilh Metagenomic Methods for Microbial Ecologists, NIOO September 18 th 2014

Molecular Biology. DNA structure and function. Recombinant DNA technology. DNA amplification. DNA sequencing

NB536: Bioinformatics

Genome Sequencing and Assembly

Introduction to NGS. Simon Rasmussen Associate Professor DTU Bioinformatics Technical University of Denmark 2018

Next Generation Sequencing Technologies

Multiple choice questions (numbers in brackets indicate the number of correct answers)

DNA SEQUENCING BY SANGER METHOD

De novo assembly of human genomes with massively parallel short read sequencing. Mikk Eelmets Journal Club

CSC Assignment1SequencingReview- 1109_Su N_NEXT_GENERATION_SEQUENCING.docx By Anonymous. Similarity Index

NEXT-GENERATION SEQUENCING AND BIOINFORMATICS

CHEM 4420 Exam I Spring 2013 Page 1 of 6

Next Generation Sequences & Chloroplast Assembly. 8 June, 2012 Jongsun Park

How is genome sequencing done?

Ultrasequencing: methods and applications of the new generation sequencing platforms

2 nd Genera-on ( NextGen ) Sequencing Technologies

Introduction to Bioinformatics. Lecture 20: Sequencing genomes

Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie. Sander van Boheemen Medical Microbiology

Genetic Fingerprinting

Introductory Next Gen Workshop

Sequencing techniques

Next Generation Sequencing. Tobias Österlund

Ultrasequencing: Methods and Applications of the New Generation Sequencing Platforms

Genetic Fingerprinting

Genetics and Genomics in Medicine Chapter 3. Questions & Answers

4. Analysing genes II Isolate mutants*

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

Deep Sequencing technologies

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

MHC Region. MHC expression: Class I: All nucleated cells and platelets Class II: Antigen presenting cells

Outline General NGS background and terms 11/14/2016 CONFLICT OF INTEREST. HLA region targeted enrichment. NGS library preparation methodologies

SEQUENCING TARU SINGH UCMS&GTBH

High throughput sequencing technologies

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

A shotgun introduction to sequence assembly (with Velvet) MCB Brem, Eisen and Pachter

13-2 Manipulating DNA Slide 1 of 32

AUDREY FARBOS JEREMIE POSCHMANN PAUL O NEILL KONRAD PASZKIEWICZ KAREN MOORE

Sequencing technologies. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequencing techniques and applications

Course summary. Today. PCR Polymerase chain reaction. Obtaining molecular data. Sequencing. DNA sequencing. Genome Projects.

DNA Sequencing and Assembly

Bioinformatics for Genomics

Gene Expression Technology

Molecular Genetics Techniques. BIT 220 Chapter 20

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Genetics Lecture 21 Recombinant DNA

Bio(tech) Interlude. 3 Nobel Prizes: PCR: Kary Mullis, 1993 Electrophoresis: A.W.K. Tiselius, 1948 DNA Sequencing: Frederick Sanger, 1980

Next- gen sequencing. STAMPS 2015 Hilary G. Morrison Joe Vineis, Nora Downey, Be>e Hecox- Lea, Kim Finnegan

Understanding the science and technology of whole genome sequencing

you can see that if if you look into the you know the capability kilobases per day, per machine kind of calculation if you do.

The Expanded Illumina Sequencing Portfolio New Sample Prep Solutions and Workflow

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

BIOTECHNOLOGY. Biotechnology is the process by which living organisms are used to create new products THE ORGANISMS

Transcription:

BIOINFORMATICS 1 or why biologists need computers SEQUENCING TECHNOLOGY bioinformatic challenges http://www.bioinformatics.uni-muenster.de/teaching/courses-2012/bioinf1/index.hbi Prof. Dr. Wojciech Makałowski" Dr. Victoria Shabardina " Institute of Bioinformatics" Prof. Dr. Wojciech Makałowski" Dr. Victoria Shabardina " Institute of Bioinformatics" 1870 Friedrich Miescher discovers DNA DNA story 1944 Oswald Avery proves that DNA is a genetic material 1953 James Watson and Francis Crick discover DNA structure ( Double Helix ) DNA story Sequencing: beginnings Sequencing: infancy 1964 Robert Holley determines nucleotide sequences (77 nt) of the yeast Alanine trna J. Biol. Chem. 240: 2122-2128 1968 Ray Wu and A. Dale Kaiser sequenced 12 bases (!) of λ phage s 5 cohesive ends using chain termination and polyacrylamide gel electrophoresis J. Mol. Biol. 35: 523-537 1977 - Allan Maxam and Walter Gilbert develop DNA sequencing method by chemical degradation 1977 Fred Sanger develops 2,3 -dideoxy chain termination method 1

Chemical degradation sequencing" (Maxam&Gilbert) Chemical degradation sequencing" (Maxam&Gilbert) Polyacrylamide gel electrophoresis can resolve single-stranded DNA molecules that differs in length by just one nucleotide Figure 4.8 Genomes 3 ( Garland Science 2007) Chain termination DNA sequencing (Sanger) Sequencing: maturity Figure 4.2 Genomes 3 ( Garland Science 2007) Different types of primer for chain termination sequencing Sequencing: maturity Sequencing: maturity 1983 - Marvin Caruthers developed a method to construct fragments of DNA of predetermined sequence from five to about 75 base pairs long. He and Leroy Hood invented instruments that could make such fragments automatically. 1983 - Kary Mullis invented the polymerase chain reaction (PCR) technique 1987 - ABI 370; first fully automated sequencing machine 1995 - Craig Venter uses wholegenome shotgun sequencing technique to determine complete genome of bacterium Haemophilus influenzae 2005 - introduction of GS20 sequencing machine; first in the line of Next Generation Sequencing 2

Sequencing: maturity Sequencing: maturity Chromatogram of a DNA sequence generated by ABI sequencing machine (https://www.dnalc.org/view/15912-sequencing-dna.html ) Sequencing: maturity Next Generation Sequencing 1983 - Marvin Caruthers developed a method to construct fragments of DNA of predetermined sequence from five to about 75 base pairs long. He and Leroy Hood invented instruments that could make such fragments automatically. 1983 - Kary Mullis invented the polymerase chain reaction (PCR) technique 1987 - ABI 370; first fully automated sequencing machine 1995 - Craig Venter uses wholegenome shotgun sequencing technique to determine complete genome of bacterium Haemophilus influenzae 2005 - introduction of GS20 sequencing machine; first in the line of Next Generation Sequencing Massive parallelization of the sequencing process Relatively short reads Different approaches from improving Sanger s technique to direct observation of DNA through a microscope Attempts to sequence single molecules without amplification step Next Generation Sequencing Incorporation of a nucleotide -> formation of phosphodiester bond 1 Pyrosequencing (Roche454) 2 Ion torrent 3 Illumina 4 Pacific Bioscience (PacBio) 5 MinIon (Oxford NanoTechnologies) 3

NGS pyrosequencing" " library preparation NGS pyrosequencing" " loading granules with enzymes NGS - pyrosequencing After the emulsion PCR has been performed, the oil is removed, and the beads are put into a picotiter plate. Each well is just big enough to hold a single bead. The pyrosequencing enzymes are attached to much smaller beads, which are then added to each well. The plate is then repeatedly washed with the each of the four dntps, plus other necessary reagents, in a repeating cycle. The plate is coupled to a fiber optic chip. A CCD camera records the light flashes from each well. NGS - pyrosequencing Extension with individual dntps gives a readout. The readout is recorded by a detector that measures position of light flashes and intensity of light flashes. NGS - pyrosequencing Ten times faster workflow than other NGS systems ~2 hour sequencing runs (real-time detection of sequence extension) Batch sample preparation (six samples in six hours) Capable of six samples/day on two PGM Systems 4

NGS -ion torrent Simple Natural Chemistry NGS -ion torrent Fast Direct Detection Sequencing by synthesis Four nucleotides flow sequentially Four nucleotides flow sequentially Base call Base call A A T 5

Base call Base call A T C A T C 4 x T Base call Workflow ATCGTGTTTTAGGGTCCCCGGGGTTAAAA The flow cell - a core component Preparation of template 6

Preparation of template The flow cell is mounted on the cbot Another sheering method: transposomes enzymes for DNA cleavage Hybridization of template Amplification of template Incorporation Scanning 7

Cleavage Millions of clusters are sequenced in parallel A picture is taken every time a new base is added The flow cell is mounted on the sequencer Third generation sequencing (PacBio) TGS MinIon Single molecule sequencing: MinION by Oxford Nanopore https://www.youtube.com/watch?v=_b_cuz8hsyu 8

MinIon: Sequencing using nanopores Nanopores as polymer sensors. The idea emerged in early 1990s. Fundamental work done by David Deamer and Daniel Branton in collaboration with John Kasianowicz. (PNAS 1996 146:13770-13773) Hundreds of papers and patents since then. MinION basics https://nanoporetech.com/science-technology/introductionto-nanopore-sensing/introduction-to-nanopore-sensing MinION basics https://nanoporetech.com/science-technology/introductionto-nanopore-sensing/introduction-to-nanopore-sensing 512 channels (pores) per flow cell. Usually about 90% are working. Synthetic membrane Nanopore (2) is created by modified protein pores: α-hemolysin, CsgG from E.coli Non-destructive motor protein (1) (actually serves as a break) Read length: > 10 Kb (Phage λ DNA, 50 Kb) Read speed: 8 bases to 20 bases/sec Run time: max 48 hours Error rate = 5-10 % Sequence yield per flow cell: 0.5-1.4 Gb MinION dataflow Easy, standard template preparation Time of library preparation: 1D - about ten minutes 2D - up to two hours Cost of a single run: reagents $1000 flow cell $1000 MinION the device Nanopore sensing is carried out on the sensor chip, contained in the flow cell inside the MinION device. Data is processed by an Application-Specific Integrated Circuit (ASIC) also in the flow cell and processed in real time by the MinKNOW software MinKNOW the software MinKNOW is the software that controls the MinION. It carries out several core data tasks and can be used to change experimental workflows or parameters. MinKNOW runs on the user s computer. METRICHOR base calling Metrichor is an on-demand, cloud-based, bioinformatics data analysis platform. It supports Oxford Nanopore base calling software. Base calling may be made available locally. 9

MiniKNOW - Data Render MiniKNOW - Channels Panel Numerous applications explored by MinION Access Program (MAP) Comparison table Genomic DNA sequencing Metagenomic analysis Direct RNA sequencing Species identification in the field Splice variants identification Method all sequence by synthesis 454 Illumina Ion Torrent PacBio MinIon Pyrosequencing: Bridge amplificaaon; Ion semiconductor: Single-molecule in Nanopores: pyrophosphates detecaon of label free detecaon real-ame: modified pore detecaon by fluorescently labeled of released protons. detecaon of proteins detect chemoluminicent nucleoades. Detector: ion sensor fluorescently current change when reacaon (luciferase Detector: CCD labeled cleaved different nucleoades enzyme). camera pyrophosphates. pass the pore. Detector: CCD Detector: ZMW Detector: ASIC - camera camera (sensiave!) measures ionic current flow Direct determination of modified nucleotides And many more to come 454: https://www.youtube.com/watch?v=nffgwgfe0aa Illumina: https://www.youtube.com/watch?v=fcd6b5hraz8 J Ion Torrent: https://www.youtube.com/watch?v=wybzbxifuks J PacBio: https://www.youtube.com/watch?v=_b_cuz8hsyu J MioIon: https://nanoporetech.com/how-it-works Comparison table SEQUENCING INFORMATICS ASSEMBLY AS A GIANT PUZZLE 454 Illumina Ion Torrent PacBio MinIon Read length 700 bp 50-250 bp 200 bp 3000 bp 5000-7000 Reads per run 1M up to 3.2G up to 4M 35000-75000 30-400 M Time per run 24 hours 1-10 days 2 hours 30 min 2 hours 6 hours several days Cost per million 10$ 0.05-0.15$ 1$ 2$ bases Machine cost 120.000-80.000$ 695.000$ 1500$ 650.000$ Error rate 0.1-1% 0.5-1% 1-2% 12% 5-10% 10

Sequencing informatics Sequence assembly A fundamental goal of DNA sequencing has been to generate large, continuous regions of DNA sequence CONTIGS In principle, assembling a sequence is just a matter of finding overlaps and combining them. In practice: most genomes contain multiple copies of many sequences, there are random mutations (either naturally occurring cell-to-cell variation or generated by PCR or cloning), there are sequencing errors Assembly problems Solving assembly problems: " " pair end reads 11

Assembly problems: sequencing gaps Assembly evaluation - N50 If one orders the set of contigs produced by the assembler by size, then N50 is the size of the contig such that 50% of the total bases are in contigs of equal or greater size. 15+12+9+7+6+5+2 = 56 56/2 = 28 -> N50 is 9kb (15+12 = 27 is less than 50%) Sequence assembly NGS case De bruijn graph construction Volume and read length of data from next-gen sequencing machines meant that the read-centric overlap approaches were not feasible already in 1980 s Pevzner et al. introduced an alternative assembly framework based on de Bruijn graph Based on a idea of a graph with fixed-length subsequences (k-mers) Key is that not storing read sequences just k-mer abundance information in a graph structure The k-mers in the reads (4-mers in this example) are collected into nodes and the coverage at each node is recorded (numbers at nodes). Features: continuous linear stretches within the graph Sequencing errors are low frequency tips in the graph Flicek & Birney (2009) Nat Meth, 6: S6-S12. Next-gen assemblers Graph is simplified to combine nodes that are associated with the continuous linear stretches into single, larger nodes of various k-mer sizes. Error correction removes the tips and bubbles that result from sequencing errors. Final graph structure that accurately and completely describes in the original genome sequence First de Bruijn based assembler was Newbler developed by 454 Life Sciences Adapted to handle main source of error in 454 data indels in homopolymer tracts Many de Bruijn assemblers subsequently developed SHARCGS, VCAKE, VELVET, EULER-SR, EDENA, ABySS and ALLPATHS, SOAP Most can use mate-pair information Slightly different approach to transcriptome assembly It has to allow many discontinuous graphs representing single transcript, including paralogs and alternatively spliced ones. SOAP-Trans, Trinity Flicek & Birney (2009) Nat Meth, 6: S6-S12. 12

Data storage problem a consequence of sequencing technology success Storage prices versus DNA sequencing costs BIOINFORMATICS CREED Remember about biology Do not trust the data Use comparative approach Use statistics Know the limits Remember about biology!!! 13