RNA spike-in controls & analysis methods for trustworthy genome-scale measurements


RNA spike-in controls & analysis methods for trustworthy genome-scale measurements. Sarah A. Munro, Ph.D., Genome-Scale Measurements Group. ABRF Meeting, March 29, 2015.

Overview: External RNA Controls Consortium (ERCC) RNA spike-in controls; erccdashboard analysis tool; ERCC 2.0: Building an updated suite of RNA controls.


How can we have trustworthy gene expression results? We're simultaneously measuring thousands of RNA molecules in gene expression experiments. But are we getting it right?

External RNA Controls Consortium (ERCC): initiated by industry, hosted by NIST. Initiated by Janet Warrington, VP Clinical Genomics at Affymetrix. Open to all interested parties; voluntary. More than 90 participants from industry, academia, and government, including all major microarray technology developers and other gene expression assay developers.

ERCC control sequences are in NIST Standard Reference Material 2374, a DNA sequence library: 96 unique control sequences in DNA plasmids. Controls are intended to mimic mammalian mRNA. In vitro transcription is used to make RNA controls. NIST SRM 2374 and related data files are available directly from NIST at http://tinyurl.com/erccsrm

Making ERCC ratio mixtures with true positive and true negative ratios: NIST plasmid DNA library, then in vitro transcription to RNA transcripts, then pooling into mixtures with known abundance ratios.
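As a rough illustration of what "known abundance ratios" gives the analyst, the sketch below converts a set of subpool design ratios into nominal log2 ratios and marks which subpools act as true negatives (ratio 1:1) and which as true positives. The subpool names and every ratio other than 4:1 are illustrative assumptions, not values taken from the slides.

```python
import math

# Hypothetical subpool design ratios (mixture 1 : mixture 2).
# Only the 4:1 ratio appears in the talk; the rest are assumed for illustration.
DESIGN_RATIOS = {"A": 4.0, "B": 1.0, "C": 2.0 / 3.0, "D": 0.5}


def expected_log2_ratios(design_ratios):
    """Return the nominal log2 ratio for each subpool."""
    return {name: math.log2(ratio) for name, ratio in design_ratios.items()}


def split_controls(design_ratios, tol=1e-9):
    """Split subpools into true negatives (1:1) and true positives (anything else)."""
    negatives = [s for s, r in design_ratios.items() if abs(r - 1.0) < tol]
    positives = [s for s, r in design_ratios.items() if abs(r - 1.0) >= tol]
    return positives, negatives


nominal = expected_log2_ratios(DESIGN_RATIOS)
true_positives, true_negatives = split_controls(DESIGN_RATIOS)
```

Any differential expression call on a control in a true negative subpool is then a known false positive, which is what makes the ratio mixtures usable as a truth set.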

Using ERCC ratio mixtures Treated (n>3) Control (n>3)


Using ERCC ratio mixtures: Treated (n>3) and Control (n>3) samples go through the measurement process (multiple steps, many people & labs, takes days to weeks) to give expression measures, followed by statistical analysis.

Example gene expression data: Treated vs. Control.

Are the RNA molecule ratios statistically different across the samples (Treated vs. Control)?

Evaluate technical performance with ERCC true positive and true negative ratios (Treated vs. Control).


Use erccdashboard to produce standard performance metrics for any experiment. The R package is available from Bioconductor and the NIST GitHub site. It is open source and open access for use in other analysis tools and pipelines and in commercial software.

Gauge technical performance with 4 erccdashboard figures. Developed as part of the SEQC study, with ABRF partners. Technology-independent ratio performance measures. Assessed differences in performance across experiments, laboratories, and measurement processes. Munro, S. A. et al. Nature Communications 5:5125, doi: 10.1038/ncomms6125 (2014).

Ambion ERCC Ratio Mixtures: 23 controls per subpool. Design abundance spans a 2^20 range within each subpool.

Spike-in design for SEQC RNA sequencing experiments (samples and replicates for sequencing). Rat experiment: treated and control rat RNA, biological replicates. Interlaboratory experiment: human reference RNA samples, technical replicates.

What is the dynamic range of my experiment? [Figure, two panels: Rat Experiment (typical sequencing, ~40 million sequence reads per replicate) and Interlaboratory Experiment (deep sequencing, ~260 million sequence reads per replicate). x-axis: Log2 ERCC Spike Amount (attomol nt µg^-1 total RNA); y-axis: Log2 Normalized ERCC Counts.]
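A dynamic range plot of this kind is, in essence, a dose-response line: log2 signal against log2 spiked amount for the detected controls. The sketch below fits that line by ordinary least squares and reports the span of detected spike amounts. It is a simplified stand-in and not the erccdashboard implementation; the function names and the example numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dynamic_range(log2_amounts, log2_counts):
    """Fit log2 counts vs. log2 spike amount for detected controls only.

    Returns (slope, intercept, span), where span is the log2 range of
    spike amounts that were actually detected.
    """
    slope, intercept = fit_line(log2_amounts, log2_counts)
    span = max(log2_amounts) - min(log2_amounts)
    return slope, intercept, span


# Made-up example: five detected controls with a perfectly linear response.
example_amounts = [0.0, 5.0, 10.0, 15.0, 20.0]
example_counts = [a + 3.0 for a in example_amounts]
slope, intercept, span = dynamic_range(example_amounts, example_counts)
```

A slope near 1 indicates a proportional response; deeper sequencing typically extends the span by bringing low-abundance controls above the detection floor.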

What was the diagnostic power? [Figure, two panels: ROC curves for the Rat Experiment and the Interlaboratory Experiment. x-axis: False Positive Rate; y-axis: True Positive Rate.] Area Under the Curve (AUC) depends on the number of controls detected!
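To make the AUC concrete: it equals the probability that a randomly chosen true positive control receives a smaller differential expression P-value than a randomly chosen true negative control. The sketch below computes it by direct pairwise comparison. The P-values are invented for illustration, and controls that were not detected simply do not appear in the input lists, which is why the statistic depends on how many controls were detected.

```python
def roc_auc(tp_pvalues, tn_pvalues):
    """AUC as the fraction of (true positive, true negative) pairs in which
    the true positive has the smaller P-value; ties count as half."""
    wins = 0.0
    for p_pos in tp_pvalues:
        for p_neg in tn_pvalues:
            if p_pos < p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(tp_pvalues) * len(tn_pvalues))


# Invented P-values for detected controls only.
tp_example = [0.001, 0.01, 0.2]   # controls with a design ratio other than 1:1
tn_example = [0.05, 0.5, 0.9]     # controls with a 1:1 design ratio
auc = roc_auc(tp_example, tn_example)
```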

AUC is a reasonable summary statistic, but we'd like to evaluate our diagnostic performance as a function of abundance.

[Figure: Rat Experiment MA plot. x-axis: Log2 Normalized Average Counts; y-axis: Log2 Normalized Ratio of Counts.]
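For readers unfamiliar with MA plots, the two coordinates can be computed per transcript as sketched below, following the axis labels of the figure: M is the log2 ratio of normalized counts and A is the log2 of the average normalized count. The function name and the example counts are illustrative assumptions.

```python
import math


def ma_values(treated_count, control_count):
    """Return (M, A) for one transcript from normalized counts.

    M = log2 ratio of treated to control counts.
    A = log2 of the average of the two counts.
    Both counts must be positive.
    """
    m = math.log2(treated_count / control_count)
    a = math.log2((treated_count + control_count) / 2.0)
    return m, a


# Made-up counts for a control with a 4:1 design ratio.
m_example, a_example = ma_values(400.0, 100.0)
```

Plotting M against A for the ERCC controls shows how closely measured ratios track the design ratios at each abundance level.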

LODR: Limit of Detection of Ratios. [Figure: DE test P-values vs. average counts; panels for the Rat Experiment and Reference RNA.] Model P-values as a function of average signal. Find the P-value threshold based on the chosen false discovery rate (here FDR = 0.1; default is FDR = 0.05). Estimate LODR from the intersection of the model confidence interval upper bound and the P-value threshold.

LODR: Limit of Detection of Ratios. [Figure: DE test P-values vs. average counts; panels for the Rat Experiment and Reference RNA.] LODR provides: specified confidence in the differentially expressed transcripts above LODR (90% chance of <10% FDR); guidance for experimental design (increase signal for transcripts above LODR estimate).
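The LODR idea can be sketched in a few lines: model P-values as a function of average signal, then find the signal at which the model crosses the chosen P-value threshold. The version below is deliberately simplified and is not the erccdashboard procedure: it assumes a straight line in log10-log10 space and intersects the fitted line itself, whereas the slides describe using the upper bound of the model confidence interval. All names and numbers are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lodr_estimate(avg_counts, pvalues, p_threshold):
    """Average count at which the fitted log10(P) line meets the threshold.

    Simplification: uses the fitted line rather than a confidence bound.
    Returns None if P-values do not decrease with increasing signal.
    """
    xs = [math.log10(c) for c in avg_counts]
    ys = [math.log10(p) for p in pvalues]
    slope, intercept = fit_line(xs, ys)
    if slope >= 0:
        return None
    x_cross = (math.log10(p_threshold) - intercept) / slope
    return 10 ** x_cross


# Made-up controls from one ratio subpool: P-value falls tenfold per tenfold signal.
counts_example = [10.0, 100.0, 1000.0, 10000.0]
pvals_example = [1e-1, 1e-2, 1e-3, 1e-4]
lodr = lodr_estimate(counts_example, pvals_example, p_threshold=1e-3)
```

Using an upper confidence bound instead of the fitted line would move the estimate to a higher count, which is the conservative direction.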

[Figure: Rat Experiment MA plot with the 4:1 LODR marked. x-axis: Log2 Normalized Average Counts; y-axis: Log2 Ratio of Normalized Counts.] Increased sequencing depth shifts endogenous transcript ratio measurements above LODR.

What are the LODR estimates for my experiment? [Figure, two panels: Rat Experiment and Interlaboratory Experiment. x-axis: Average Counts; y-axis: DE Test P-values.]

How do the endogenous samples relate to LODR? [Figure, two panels: Rat Experiment and Interlaboratory Experiment MA plots with the 4:1 LODR marked. x-axis: Log2 Normalized Average Counts; y-axis: Log2 Ratio of Normalized Counts.]

How much technical variability & bias is there? [Figure, two panels: Rat Experiment and Interlaboratory Experiment. y-axis: Log2 Ratio of Normalized Counts. Annotations: decreased variability; significant ratio bias.]

mRNA fraction differences between samples contribute to bias in ERCC ratios. [Figure: spike-in added to total RNA (mRNA and rRNA) of Sample 1 and Sample 2, before and after mRNA enrichment. The RNA fractions are exaggerated for illustration purposes.]

[Figure: the four erccdashboard panels. Dynamic range; AUC (diagnostic performance); variability and bias; LODR (Limit of Detection of Ratios) and sample transcripts.]

EVALUATE REPRODUCIBILITY ACROSS LABORATORIES

Good Performance Poor Performance

Interlaboratory analysis using erccdashboard performance metrics. Labs 1-6: Illumina + poly-A selection (Illumina kit). Labs 7-9: Life Tech + poly-A selection (Life Tech kit). Labs 10-12: Illumina + ribosomal RNA depletion.

Consistent LODR across 11 of 12 labs. Diagnostic performance was consistent within and amongst measurement processes. Lab 7 was an outlier for diagnostic performance. LODR is in agreement with AUC. [Figure: LODR (average counts) by laboratory.]

Ratio bias is highly variable amongst experiments. Ratio bias (r_m) can be attributed to the mRNA fraction difference between samples (Shippy et al. 2006). Symbols: R_S = nominal subpool ratio; (E_1/E_2)_S = empirical ratio. Large standard errors indicate that mRNA fraction isn't the only factor contributing to ERCC ratio bias; the mRNA enrichment protocol is a factor. [Figure: log(r_m) by laboratory.]
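One simple way to put a number on this bias is sketched below: average the difference between the log empirical ratio and the log nominal ratio across subpools, and report its standard error. The sign convention (empirical ratio = r_m times nominal ratio) is an assumption made for this sketch, and the plain mean used here is a simplification rather than the estimator used in erccdashboard. The input values are invented.

```python
import math


def log_rm_estimate(nominal_ratios, empirical_ratios):
    """Estimate log(r_m) and its standard error across subpools.

    Assumed convention: empirical ratio = r_m * nominal ratio, so
    log(r_m) = log(empirical) - log(nominal) for each subpool.
    """
    diffs = [math.log(e) - math.log(r)
             for r, e in zip(nominal_ratios, empirical_ratios)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean, math.sqrt(variance / n)


# Invented example: every empirical ratio is 0.8 times its nominal ratio.
nominal_example = [4.0, 1.0, 2.0 / 3.0, 0.5]
empirical_example = [r * 0.8 for r in nominal_example]
log_rm, log_rm_se = log_rm_estimate(nominal_example, empirical_example)
```

A standard error near zero, as in this made-up case, means one shared factor explains the bias in every subpool; a large standard error is the signal the slide describes, where something beyond mRNA fraction is contributing.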

Protocol-dependent bias from poly-A selection affects ERCC controls due to short poly-A tails. [Figure panels: Labs 1-6, ILM Poly-A; Labs 7-9, LIF Poly-A; Labs 10-12, ILM Ribo.]

mRNA enrichment protocol biases vary across individual ERCCs but are consistent for a protocol.


Results of the erccdashboard publication: ratio performance measures for any technology platform and any experiment (diagnostic power, novel LODR metric, technical variability & bias); comparison across experiments; quantification of mRNA fraction differences between samples; demonstration of protocol-dependent bias.


ERCC 2.0: A New Suite of RNA Controls. NIST was approached by industry and academia to build new RNA controls. NIST hosted an open, public ERCC 2.0 workshop; the workshop report and presentations are available at slideshare.net/ercc-workshop. All interested parties are welcome to participate through sequence contributions and interlaboratory analysis. Planned control types: new and improved mRNA mimics, transcript isoforms, miRNA.

New and improved mRNA mimics: additional controls to expand the distributions of RNA control properties, including length (> 2 kb), GC content, and poly-A tail length.

Transcript isoform controls (Lukas Paul, Lexogen). Transcript design: non-cognate Spike-in RNA Variants (SIRVs) developed by Lexogen; cognate sequence selection in progress (Schizosaccharomyces pombe). Mixture design: Dynamic Range 2 4; design ratios < 2:1.

Small RNA and miRNA controls (Karol Thompson, FDA). Needed for validation of clinical applications (Early Detection Research Network, TGen). Other applications are relevant to bacterial RNA-Seq. Non-cognate miRNA controls; include some pre-miRNA. Direct RNA control synthesis by Agilent, with no need for DNA templates.

Recap: External RNA Controls Consortium (ERCC) RNA spike-in controls; erccdashboard analysis tool; ERCC 2.0: Building an updated suite of RNA controls.

Acknowledgements. All External RNA Controls Consortium participants. NIST: Marc Salit, Steve Lund, P. Scott Pine, Justin Zook, David Duewer, Jerod Parsons, Jennifer McDaniel, Margaret Klein. Empa: Matthias Roesslein. SEQC study participants. Co-authors on the erccdashboard manuscript: S. P. Lund, P. S. Pine, H. Binder, D. Clevert, A. Conesa, J. Dopazo, M. Fasold, S. Hochreiter, H. Hong, N. Jafari, D. P. Kreil, P. P. Łabaj, S. Li, Y. Liao, S. M. Lin, J. Meehan, C. E. Mason, J. Santoyo-Lopez, R. A. Setterquist, L. Shi, W. Shi, G. K. Smyth, N. Stralis-Pavese, Z. Su, W. Tong, C. Wang, J. Wang, J. Xu, Z. Ye, Y. Yang, Y. Yu, & M. Salit. For more information contact: sarah.munro@nist.gov