Benchmarking of RNA-seq data processing pipelines using whole transcriptome qPCR expression data

Benchmarking of RNA-seq data processing pipelines using whole transcriptome qPCR expression data
Jan Hellemans
7th international qPCR & NGS Event, Freising, March 24th, 2015

Therapeutics, lncRNA, oncology, antisense, qPCR solutions, qbase+, qPCR training, lab services (mRNA, miRNA, lncRNA; qPCR, ddPCR, RNA-seq)

The Biogazelle team & collaborators
  Lab team: Ariane De Ganck, Gaelle Vanseveren, Nele Nijs, Anthony Van Driessche, Shana Robbrecht, Tom Maes
  Bio-IT team: Manuel Luypaert, Sander Claus
  Project management & data analysis: Pieter Mestdagh, Jo Vandesompele, Anneleen Beckers
  Collaborators: Bio-Rad, UGent

RNA-seq workflow
  experiment design
  sample collection & RNA extraction
  library prep & sequencing
  QC, data processing & interpretation

RNA-seq workflow: experiment design
  https://www.biogazelle.com/knowledge-center/selected-video-presentations

Data processing tools
  Included in this analysis: TopHat + HTSeq, TopHat + Cufflinks, Sailfish
  Running & planned: STAR + HTSeq, Salmon (Sailfish successor), StringTie
  Not considered: interpretation & statistics of differential gene expression

Data processing tools
  What is your preferred RNA-seq data mapper? TopHat / MapSplice / STAR / alignment-free (e.g. Sailfish) / don't know / other
  What is your favorite RNA-seq quantification tool? Cufflinks / RSEM / HTSeq / don't know / Sailfish (Salmon) / other
  What is the front end for your analysis? none (command line) / BaseSpace / Galaxy / commercial solution

Data processing tools: TopHat
  TopHat input
    RNA-seq reads (FASTQ files)
    genome index
    transcriptome annotation (optional GTF file)
  Procedure
    transcriptome mapping (when annotation is provided)
    genome mapping (non-junction reads)
    spliced mapping (de novo, based on canonical donor/acceptor sites)
  TopHat output
    SAM/BAM files

Data processing tools: HTSeq
  HTSeq input
    SAM/BAM file with mapping results (e.g. from TopHat); only the uniquely mapping fraction is considered
    GTF/GFF file with genome features (e.g. gene models)
  Processing
    htseq-count tool
    different modes to deal with reads overlapping multiple features
  HTSeq output
    table with counts for each feature (exon, gene, ...)
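The overlap handling mentioned on this slide can be sketched in a few lines. This is a toy illustration of the idea behind a "union"-style counting mode, not HTSeq's implementation: it assumes a single chromosome, ignores strand and read quality, uses half-open coordinates, and the gene names and positions are made up. The `__no_feature` and `__ambiguous` labels mirror the special counters that htseq-count reports.

```python
from collections import Counter

def count_union(reads, features):
    """Assign each read to a feature, union-style.

    reads:    list of (start, end) intervals, half-open
    features: dict mapping feature name -> (start, end), half-open
    A read overlapping exactly one feature is counted for it; a read
    overlapping none or several goes to a special counter instead.
    """
    counts = Counter()
    for r_start, r_end in reads:
        hits = {
            name
            for name, (f_start, f_end) in features.items()
            if r_start < f_end and f_start < r_end  # intervals overlap
        }
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts

# Hypothetical toy data: two overlapping genes and four reads.
features = {"geneA": (100, 200), "geneB": (180, 300)}
reads = [(110, 160), (185, 195), (250, 290), (400, 450)]
counts = count_union(reads, features)
```

The read at 185-195 falls in the region shared by both genes, so it is set aside as ambiguous rather than counted twice; other modes resolve such reads differently.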

Data processing tools: Cufflinks
  Cufflinks input
    SAM/BAM file with mapping results (e.g. from TopHat)
    GTF/GFF file with genome features (e.g. gene models)
  Processing
    assemble transcripts
    estimate transcript abundances
  Cufflinks output
    GTF file with transcript coordinates, abundances & class codes
    abundances in FPKM (expected fragments per kilobase of transcript per million fragments sequenced)
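The FPKM unit follows directly from its name. The sketch below applies only the textbook definition to a plain fragment count; Cufflinks itself estimates the expected fragment count per transcript with a statistical model, which is not reproduced here. The fragment count and transcript length are hypothetical; the library size reuses the 18M reads per sample mentioned in the benchmark setup.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million fragments sequenced."""
    kilobases = length_bp / 1_000
    millions = total_fragments / 1_000_000
    return fragments / (kilobases * millions)

# Hypothetical transcript: 500 fragments on a 2,000 bp transcript
# in a library of 18 million fragments.
value = fpkm(fragments=500, length_bp=2_000, total_fragments=18_000_000)
```

Doubling the transcript length or the library size halves the FPKM for the same raw count, which is the point of the normalization.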

Data processing tools: Sailfish
  Sailfish input
    RNA-seq reads (FASTQ files)
    transcriptome index
  Processing
    counting k-mer occurrence in reads rather than aligning reads
  Sailfish output
    table with transcript abundance in
      KPKM (K-mers Per Kilobase per Million mapped k-mers)
      TPM (Transcripts Per Million)
      FPKM
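Two ideas from this slide are easy to illustrate: counting k-mers in reads without aligning them, and expressing abundance as TPM. The sketch below is a toy, not Sailfish's algorithm (Sailfish additionally assigns k-mers to transcripts through an index and an iterative estimation step). The reads, the value of k, the per-transcript counts and the lengths are all hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def tpm(counts, lengths):
    """Transcripts Per Million from per-transcript counts and lengths (bp)."""
    rates = {t: counts[t] / lengths[t] for t in counts}  # length-normalized
    total = sum(rates.values())
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}

# Hypothetical toy data.
kmers = kmer_counts(["ACGTAC", "CGTA"], k=3)
abundance = tpm({"tx1": 100, "tx2": 100}, {"tx1": 1_000, "tx2": 3_000})
```

With equal counts, the shorter transcript gets the higher TPM because the same number of observations is spread over fewer bases, and TPM values always sum to one million within a sample.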

Data processing tools: comparison
                        TopHat + Cufflinks          TopHat + HTSeq            Sailfish
  mapping               alignment based             alignment based           alignment free
  quantification level  transcript                  gene                      transcript
  reported metric       normalized expression       read counts               normalized expression
  novel transcripts     known & novel transcripts   only known transcripts    only known transcripts

Benchmarking: published comparisons (Chandramohan et al., 2013)
  Comparison of public MAQC RNA dataset against matching TaqMan data
  Good overall correlation for the tools evaluated
  Limitations
    Gene-based analysis not considering transcript variability
    Small comparison (531 data points) based on historic assays
    No detailed analysis of potential causes of different interpretations

Benchmarking: a qPCR transcriptome reference data set
  Studies relying on MAQC samples
    Microarray quality control study (Shi 2006)
    Sequencing quality control study (Su 2014)
    MicroRNA quality control study (Mestdagh 2014)
  MAQC samples
    A = universal RNA
    B = brain RNA
    C = ¾ A + ¼ B
    D = ¼ A + ¾ B
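The mixing proportions of samples C and D give an expected value for every gene once A and B are known. The sketch below shows that arithmetic on a linear expression scale. It assumes the measured signal scales with the mixing fraction, which ignores any difference in mRNA content between the two source RNAs; the expression values are hypothetical.

```python
import math

def expected_mix(expr_a, expr_b, frac_a):
    """Expected linear-scale expression in a mix of frac_a * A + (1 - frac_a) * B."""
    return frac_a * expr_a + (1 - frac_a) * expr_b

# Hypothetical gene: 8 units in sample A, 2 units in sample B.
a, b = 8.0, 2.0
c = expected_mix(a, b, frac_a=0.75)  # C = 3/4 A + 1/4 B
d = expected_mix(a, b, frac_a=0.25)  # D = 1/4 A + 3/4 B
log2_c_over_d = math.log2(c / d)     # expected C vs D fold change
```

For a gene higher in A than in B, the expected order is A > C > D > B; a measurement that breaks this order fails the titration response check.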

Benchmarking: a qPCR transcriptome reference data set
  Benefits of MAQC samples
    commercially available
    multiple large-scale published datasets
    built-in truths (relying on known mixing proportions) allow evaluation of
      reproducibility
      titration response
      accuracy
      dynamic range
  Biogazelle & Bio-Rad have performed a full qPCR transcriptome profiling on the MAQC samples using validated PrimePCR assays
    4 x 22 238 data points

Benchmarking: setup
  Samples
    MAQC-A & MAQC-B
    RNA-seq: 2 replicates
    qPCR: 1 replicate
  Sequencing
    poly-A sequencing on NextSeq 500
    18M reads per sample
  qPCR
    PrimePCR assays for all human coding genes
    5 µl assays on CFX384

Benchmarking: setup
  Data taken into consideration
    qPCR signal in MAQC-A & B
    Cq values between 11 & 32
    Only transcripts detected by the qPCR assays (n = 15 087)
  Comparing qPCR & RNA-seq
    Sailfish: sum of expression for transcripts detected by qPCR
    other tools: gene-level quantification based on a custom GTF file containing only transcripts detected by qPCR
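The Cq filter above can be written as a small predicate. This sketch assumes the window applies to both samples, that the bounds are inclusive, and that a missing signal is represented as `None`; none of these details is stated on the slide, and the gene names and Cq values are hypothetical.

```python
def passes_cq_filter(cq_a, cq_b, low=11.0, high=32.0):
    """True when both MAQC-A and MAQC-B have a Cq inside [low, high]."""
    return all(cq is not None and low <= cq <= high for cq in (cq_a, cq_b))

# Hypothetical (Cq in MAQC-A, Cq in MAQC-B) pairs.
cq_values = {
    "GENE1": (24.1, 26.3),  # inside the window in both samples
    "GENE2": (9.8, 12.0),   # too abundant in A
    "GENE3": (31.5, 35.2),  # too weak in B
    "GENE4": (18.0, None),  # no signal in B
}
kept = sorted(g for g, (a, b) in cq_values.items() if passes_cq_filter(a, b))
```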

Benchmarking: gene expression levels
  qPCR data: global mean normalized Cq values (already log2 scale)
  Sailfish: log2 of sum of KPKM values of transcripts detected by qPCR
  HTSeq: log2 of normalized gene counts
  Cufflinks: log2 of gene-level FPKM values
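Putting both platforms on a log2 scale makes them directly comparable. The sketch below is one possible reading of these transformations, not the authors' code: global mean normalization is taken as subtracting each Cq from the sample mean (so that higher means more expressed), RNA-seq values are log2-transformed with an optional pseudocount, and agreement is summarized with a Pearson correlation. The sign convention, the pseudocount and all values are assumptions.

```python
import math

def global_mean_normalize(cq):
    """Express each Cq relative to the sample's mean Cq (higher = more expressed)."""
    mean_cq = sum(cq.values()) / len(cq)
    return {gene: mean_cq - value for gene, value in cq.items()}

def log2_expression(values, pseudocount=0.0):
    """log2-transform linear expression values (e.g. KPKM, FPKM, counts)."""
    return {gene: math.log2(v + pseudocount) for gene, v in values.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data for three genes in one sample.
qpcr = global_mean_normalize({"g1": 20.0, "g2": 22.0, "g3": 24.0})
rnaseq = log2_expression({"g1": 16.0, "g2": 4.0, "g3": 1.0})
genes = sorted(qpcr)
r = pearson([qpcr[g] for g in genes], [rnaseq[g] for g in genes])
```

A nonzero pseudocount is needed as soon as any RNA-seq value is zero, since log2 of zero is undefined.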

Benchmarking Gene expression levels

Benchmarking Relative expression levels

Benchmarking More non-concordance @ low expression

Benchmarking Looking at the differences in Q2-Q4

Impact of Sailfish reference
  Completeness of reference impacts mapping % (total RNA-seq, 4 MAQC samples)
  Mapping % for different reference transcriptomes
    Ensembl cDNA: 40-45%
    + Ensembl ncRNA: ~60%
    + LNCipedia: ~61%

Impact of Sailfish reference: Ensembl (cDNA + ncRNA) vs Ensembl + LNCipedia
  [Figure; values shown without labels: 17.7%, 0.5% (1.1%), 4.5% (10.6%), 34.9%, 4.9%]
  [Figure; KPKM values shown: 10.4, 0.0; 1.5, 10.3]

Impact of reference transcriptome: Ensembl vs RefGene (Zhao, BMC Genomics 2015)
  [Figure panels: expression level; differential expression. Axes: Ensembl vs RefGene]

Conclusions
  Overall good concordance between qPCR & all different RNA-seq analysis tools
  Some genes show strongly different absolute expression levels between qPCR & RNA-seq
  Some genes show pronounced differences in differential expression
    RNA-seq vs qPCR
    analysis method specific: specific differences may be called / missed by a single tool
  The choice of reference transcriptome significantly impacts transcript expression analysis by Sailfish
  Validation required because some genes show strongly deviating results depending on quantification and data processing method