RNA Sequencing. Next gen insight into transcriptomes, Elio Schijlen


RNA Sequencing Next gen insight into transcriptomes 05-06-2013, Elio Schijlen

Transcriptome: the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition. Understanding the transcriptome is essential for interpreting the functional elements of the genome. The key aims of transcriptomics are: to catalogue all species of transcripts, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5' and 3' ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions.

Recently, the development of novel high-throughput DNA sequencing methods has provided a new method for determining, mapping and quantifying transcriptomes. This method, termed RNA-Seq (RNA sequencing), has clear advantages over previous approaches and is revolutionizing the manner in which eukaryotic transcriptomes are analysed.

Platforms: Illumina HiSeq2000, 454, SOLiD 5500, Ion Proton, PacBio RS

From next to 3rd generation sequencing: detection method
Illumina HiSeq: fluorescent nucleotide scanning
SOLiD: ligation of fluorescent oligos
454: pyrosequencing
Ion Proton: hydrogen detection
PacBio: real-time fluorescent detection

From next to 3rd generation sequencing: sequence template
Illumina HiSeq: ssDNA sequence template, clonally amplified into clusters on a glass slide (flow cell)
SOLiD 5500: idem
454: ssDNA sequence template, clonally amplified on beads (emPCR)
Ion Proton: idem
PacBio: dsDNA sequence template, single molecule / polymerase molecule complex

Illumina HiSeq2000 (instrument figure; labelled parts: syringe pumps, optics, flow cell access door, reagents compartment, flow cell with 8 channels)

Illumina HiSeq2000 (workflow figure; steps in order: DNA (0.1-5.0 μg), library preparation, single molecule array, cluster growth, sequencing, image acquisition, base calling)

Eusol BACs: 177.14 M PF clusters; 33.8 Gb > Q30

Lane  Sample ID  Sample Ref       Index         Yield (Mbases)  % PF   # Reads     % of raw clusters per lane
1     lane1      unknown          Undetermined  3,234           87.47  36,608,108   9.74
1     plate10    EUsol_fill_gaps  TAGCTT        3,359           94.77  35,088,534   9.34
1     plate1     EUsol_fill_gaps  ATCACG        4,150           95.35  43,091,246  11.47
1     plate2     EUsol_fill_gaps  CGATGT        3,480           95.66  36,020,422   9.59
1     plate3     EUsol_fill_gaps  TTAGGC        3,496           95.27  36,331,200   9.67
1     plate4     EUsol_fill_gaps  TGACCA        4,674           95.4   48,508,022  12.91
1     plate5     EUsol_fill_gaps  ACAGTG        2,305           93.65  24,365,574   6.49
1     plate6     EUsol_fill_gaps  GCCAAT        1,895           94.83  19,783,144   5.27
1     plate7     EUsol_fill_gaps  CAGATC        3,366           94.9   35,115,836   9.35
1     plate8     EUsol_fill_gaps  ACTTGA        2,592           95.29  26,934,126   7.17
1     plate9     EUsol_fill_gaps  GATCAG        3,232           94.59  33,829,830   9.01

Description (lane1, Undetermined): clusters with unmatched barcodes for lane 1.
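The last column of the lane 1 report can be recomputed from the read counts. The sketch below assumes that the percentage is each sample's read count divided by the lane total; the slide does not define the column, so this is an interpretation, and the names `reads_per_sample` and `lane_shares` are illustrative only.

```python
# Read counts per sample for lane 1, copied from the report above.
reads_per_sample = {
    "lane1 (undetermined)": 36_608_108,
    "plate10": 35_088_534,
    "plate1": 43_091_246,
    "plate2": 36_020_422,
    "plate3": 36_331_200,
    "plate4": 48_508_022,
    "plate5": 24_365_574,
    "plate6": 19_783_144,
    "plate7": 35_115_836,
    "plate8": 26_934_126,
    "plate9": 33_829_830,
}


def lane_shares(counts):
    """Return each sample's percentage of the lane total (assumed definition)."""
    total = sum(counts.values())
    return {sample: 100.0 * n / total for sample, n in counts.items()}


shares = lane_shares(reads_per_sample)
for sample, pct in shares.items():
    print(f"{sample:22s} {pct:6.2f} %")
```

Under this assumption the recomputed shares agree with the reported column to within rounding, and the spread between samples (roughly 5% to 13%) shows how unevenly a pooled lane can be divided among barcodes.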

SOLiD 5500

454 sequencing technology & workflow

NGS - 454 pyrosequencing raw read GCTAAG

Ion semiconductor sequencing

Ion Torrent PGM & Proton

3rd Gen Sequencing: PacBio SMRT sequencing. kb read length; <50,000 reads; <100 Mb

PacBio sequencing. Phospholinked nucleotides: the fluorophore is clipped off by the DNA polymerase during incorporation, so the DNA synthesized is natural, with no steric hindrance or accumulation of background signal. ZMW: Zero Mode Waveguide.

Sequence read length (raw), quality
Illumina HiSeq: fixed, 50 or 100 nt, SR and PE
SOLiD 5500: fixed, 75 nt
454: range 50-1,000 nt (av. ~750)
Ion Torrent: range 50-200 nt (av. ~170)
PacBio: range 50-20,000 nt (av. ~3-4 kb)

Sequence read quality
Illumina HiSeq: HQ reads, systematic errors; lower quality 3' ends; low GC coverage
SOLiD: very HQ reads; lower quality 3' ends
454: HQ reads, systematic errors; homopolymer problems; clonality; lower quality 3' ends
Ion Torrent: idem, but lower overall quality
PacBio: low quality (0.8-0.85); random errors; no decrease in read quality at the 3' end
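The quality figures on these slides use two scales: the run report quotes Q30, while the PacBio entry quotes 0.8-0.85. A small sketch of the standard Phred relation, Q = -10 log10(error probability), puts them on one scale. Reading the PacBio figure as per-base accuracy is an assumption here, and the function names are illustrative.

```python
import math


def accuracy_to_phred(accuracy):
    """Phred quality for a per-base accuracy: Q = -10 * log10(1 - accuracy)."""
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_error(q):
    """Per-base error probability implied by a Phred score."""
    return 10.0 ** (-q / 10.0)


# PacBio raw-read range from the slide, read as accuracy (assumption).
for acc in (0.80, 0.85):
    print(f"accuracy {acc:.2f} -> Q{accuracy_to_phred(acc):.1f}")

# Threshold used in the Illumina run summary.
print(f"Q30 -> error probability {phred_to_error(30):.3f}")
```

On this reading, raw PacBio reads sit around Q7 to Q8, whereas Q30 corresponds to one expected error per 1,000 bases.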

Sequence reads & throughput/run
Illumina HiSeq: 1.5 E+09 reads (full flow cell), 12 days/run, up to 550 Gb (2 flow cells)
SOLiD 5500XL: 1.5 E+09 reads (full flow cell), 6 days/run, up to 240 Gb (2 flow chips)
454: 1 E+06 reads (full PTP), 1 day/run, up to 1 Gb
Ion Torrent: 60-80 E+06 reads (Ion PI chip), 4 hours/run, up to 10 Gb
PacBio: 300,000 reads (8 cell strip), 1 day/run, up to 0.75 Gb

Transcript coverage

Samples for sequencing: genomic DNA, mRNA, small RNA, active chromatin (ChIP-sequencing), other applications. Library preparation: ligate adapters to both ends of the fragmented nucleic acid.

RNA input requirements. RNA: DNA-free, RNase-free, non-degraded, no contaminants (proteins, polysaccharides)

Protocol variations
Fragmentation methods. RNA: nebulization, hydrolysis. cDNA: sonication, DNase I treatment.
Depletion of highly abundant transcripts. Positive selection of mRNA: poly(A) selection or target-specific. Negative selection: RiboMinus, RNase H.
Strand specificity. Most RNA sequencing is not strand-specific.
Single-end or paired-end sequencing.

(Illumina) RNA seq workflow

Aligning the millions of reads to a "reference genome". Many tools are available for aligning genomic reads to a reference genome (sequence alignment tools). However, special attention is needed when aligning a transcriptome to a genome, mainly when dealing with genes that contain introns. As discussed above, the sequence libraries are created by extracting mRNA using its poly(A) tail, which is added to the mRNA molecule post-transcriptionally, so splicing has already taken place. The library and the short reads obtained therefore contain no intronic sequence. When a standard genomic aligner maps these reads to a reference genome, only reads lying entirely inside an exon will be matched, while reads that span an exon-exon junction will not. Several software packages exist for short read alignment, and specialized, splice-aware tools have been developed for transcriptome data, e.g. TopHat for spliced alignment, with Cufflinks for downstream transcript assembly and quantification.
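The junction problem can be shown with a toy example. The sketch below uses a made-up two-exon gene and exact string matching; it is not how TopHat or any real aligner works internally, and `find_spliced` and the sequences are hypothetical. It only illustrates that a junction read has no contiguous match in the genome but can be placed once it is allowed to split across an intron.

```python
def find_spliced(read, genome, min_anchor=4):
    """Toy spliced search: try a contiguous match, then a two-part split match.

    Returns a list of (genome_start, length) blocks, or None if nothing fits.
    Exact matching only; real aligners tolerate mismatches and score splice sites.
    """
    pos = genome.find(read)
    if pos != -1:
        return [(pos, len(read))]  # read lies within a single exon
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        lpos = genome.find(left)
        while lpos != -1:
            # The right part must start after a gap (the intron).
            rpos = genome.find(right, lpos + cut + 1)
            if rpos != -1:
                return [(lpos, cut), (rpos, len(right))]
            lpos = genome.find(left, lpos + 1)
    return None


# Hypothetical toy gene: two exons separated by a GT...AG intron.
exon1 = "ATGGCGTACCTGAAC"
intron = "GTAAGTTTTTCCCTTTTCAG"
exon2 = "GTTCACGGATCCTAA"
genome = exon1 + intron + exon2
mrna = exon1 + exon2  # spliced transcript, as captured in the library

exonic_read = mrna[2:12]     # entirely inside exon1
junction_read = mrna[9:21]   # spans the exon1/exon2 junction

print(genome.find(junction_read))           # no contiguous match
print(find_spliced(exonic_read, genome))    # one block
print(find_spliced(junction_read, genome))  # two blocks flanking the intron
```

The two-block result for the junction read corresponds to what a spliced alignment reports: the read is anchored on both sides of the intron rather than discarded.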

Sequence coverage. A. thaliana: approx. 60 E+06 mapped reads result in a plateau of unique expressed gene models (approx. 20,000).
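A plateau of this kind is usually found by subsampling the mapped reads and counting how many genes are detected at each depth. The sketch below does this on simulated data; the gene names, expression weights and read numbers are invented for illustration and have no connection to the A. thaliana data set.

```python
import random


def saturation_curve(read_genes, depths, seed=0):
    """Count distinct genes detected in nested random subsamples of the reads."""
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)
    # Prefixes of a single shuffle are nested, so the curve never decreases.
    return [(d, len(set(shuffled[:d]))) for d in depths]


# Hypothetical toy data: 200 genes with skewed expression (few high, many low).
sim = random.Random(1)
genes = [f"gene{i}" for i in range(200)]
weights = [1.0 / (i + 1) for i in range(200)]
reads = sim.choices(genes, weights=weights, k=20_000)

curve = saturation_curve(reads, depths=[100, 1_000, 5_000, 10_000, 20_000])
for depth, n_genes in curve:
    print(f"{depth:>6} reads -> {n_genes} genes detected")
```

Once the count stops rising between depths, extra sequencing mostly adds reads to genes that are already detected, which is the situation the slide describes at around 60 million mapped reads.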

Multi-mapped 50 nt SR reads (A. thaliana ~5%) can cause inaccurate expression estimates.
Figure 1: tubulin B chain reads mapped to the reference genome (gray). Blue lines: intron-spanning reads. Histograms: read coverage (blue: contributed by multi-mapped reads; green: contributed by uniquely mapped reads). Including multi-mapped reads artificially increases the expression value.
Figure 2: read mapping of 2 genes that share a genome region at their 3' ends on opposite strands. Multi-mapped reads derived from the + strand would severely overestimate expression of the - strand gene.
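The effect of multi-mapped reads on counts can be sketched with a few hypothetical alignments. The gene names and read assignments below are invented, and `count_reads` is an illustrative helper rather than the method of any particular counting tool; it simply contrasts unique-only counting with the naive scheme in which every hit counts in full.

```python
from collections import Counter


def count_reads(alignments, include_multimapped):
    """Count reads per gene from a mapping of read id -> list of genes hit.

    With include_multimapped=True every hit counts fully, which is the
    naive scheme that inflates expression values.
    """
    counts = Counter()
    for genes in alignments.values():
        if len(genes) == 1 or include_multimapped:
            counts.update(genes)
    return counts


# Hypothetical alignments: two paralogous genes sharing some sequence.
alignments = {
    "r1": ["TUB_A"],
    "r2": ["TUB_A"],
    "r3": ["TUB_A", "TUB_B"],  # multi-mapped
    "r4": ["TUB_A", "TUB_B"],  # multi-mapped
    "r5": ["TUB_B"],
}

unique_only = count_reads(alignments, include_multimapped=False)
with_multi = count_reads(alignments, include_multimapped=True)
print(dict(unique_only))
print(dict(with_multi))
```

In this toy case five reads produce seven counts when multi-mapped hits are included, and the count for the second gene triples. Discarding multi-mapped reads avoids the inflation but can underestimate genes with close paralogs, so either choice should be reported with the results.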

Ekblom et al., 2012 Comparative and Functional Genomics doi:10.1155/2012/281693

Wenger and Galliot BMC Genomics 2013, 14:204 doi:10.1186/1471-2164-14-204

Some considerations
The information gathered by RNA-seq has similar limitations to other RNA expression analysis pipelines.
RNA status dependent. Biological variable: tissue specific; time dependent. Triplicates! During a cell's lifetime and context, its gene expression levels change.
Strongly RNA quality dependent.
Library prep method dependent.
Sequencing technology dependent.
Analysis method dependent.
Because of this, care must be taken when drawing conclusions from the sequencing experiment. Results must be verified using independent technology.