SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS

Similar documents
Experimental Design. Dr. Matthew L. Settles. Genome Center University of California, Davis

Single Cell Transcriptomics scrnaseq

Introduction to Microbial Sequencing

Experimental Design Microbial Sequencing

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.

Integrated NGS Sample Preparation Solutions for Limiting Amounts of RNA and DNA. March 2, Steven R. Kain, Ph.D. ABRF 2013

CM581A2: NEXT GENERATION SEQUENCING PLATFORMS AND LIBRARY GENERATION

Deep Sequencing technologies

Novel methods for RNA and DNA- Seq analysis using SMART Technology. Andrew Farmer, D. Phil. Vice President, R&D Clontech Laboratories, Inc.

Single Cell Genomics

TECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory

RNA-Seq Analysis. Simon Andrews, Laura v

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

TECH NOTE Stranded NGS libraries from FFPE samples

RNA-Seq data analysis course September 7-9, 2015

Guidelines Analysis of RNA Quantity and Quality for Next-Generation Sequencing Projects

Quality Control of Next Generation Sequence Data

Single Cell Genomics

Next Generation Sequencing

SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA

SOLiD Total RNA-Seq Kit SOLiD RNA Barcoding Kit

RNA-Sequencing analysis

Obtain superior NGS library performance with lower input amounts using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina

Obtain superior NGS library performance with lower input amounts using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina

RAPID, ROBUST & RELIABLE

Introduction to RNA-Seq

RNAseq Differential Gene Expression Analysis Report

G E N OM I C S S E RV I C ES

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility

RNA-SEQUENCING ANALYSIS

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Application Note Selective transcript depletion

dbcamplicons pipeline Amplicons

Differential gene expression analysis using RNA-seq

Sequence Analysis 2RNA-Seq

Wet-lab Considerations for Illumina data analysis

Transcriptome analysis

FFPE in your NGS Study

Introduction to RNA-Seq

Reading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction

NEBNext. Ultra II RNA Library Prep Kit for Illumina

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

rnaseqcore.vet.cornell.edu

NEXTFLEX Rapid Directional RNA-Seq Kit (For Illumina Platforms) Catalog #NOVA (Kit contains 8 reactions)

Applications and Sample Preparation for SOLiD Runs

dbcamplicons pipeline Amplicons

How much sequencing do I need? Emily Crisovan Genomics Core

How to deal with your RNA-seq data?

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Genome Resequencing. Rearrangements. SNPs, Indels CNVs. De novo genome Sequencing. Metagenomics. Exome Sequencing. RNA-seq Gene Expression

How much sequencing do I need? Emily Crisovan Genomics Core September 26, 2018

Transcriptome Assembly, Functional Annotation (and a few other related thoughts)

Introduction to RNA-Seq in GeneSpring NGS Software

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

Advanced RNA-Seq course. Introduction. Peter-Bram t Hoen

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

RNA standards v May

Intro to RNA-seq. July 13, 2015

Considerations for Illumina library preparation. Henriette O Geen June 20, 2014 UCD Genome Center

measuring gene expression December 11, 2018

High-quality stranded RNA-seq libraries from single cells using the SMART-Seq Stranded Kit Product highlights:

Low input RNA-seq library preparation provides higher small non-coding RNA diversity and greatly reduced hands-on time

RNA SEQUINS LABORATORY PROTOCOL

Multiplexed Strand-specific RNA-Seq Library Preparation for Illumina Sequencing Platforms

RNA-Seq with the Tuxedo Suite

Wheat CAP Gene Expression with RNA-Seq

Increased transcription detection with the NEBNext Single Cell/Low Input RNA Library Prep Kit

measuring gene expression December 5, 2017

Overcome limitations with RNA-Seq

Figure S1: NUN preparation yields nascent, unadenylated RNA with a different profile from Total RNA.

Functional Genomics Research Stream. Research Meeting: November 8, 2011 cdna Library Construction for RNA-Seq

3.1 RNA Fragmentation, Priming and First Strand cdna Synthesis. 3.1A RNA Fragmentation and Priming Starting from Intact or Partially Degraded RNA:

Differential gene expression analysis using RNA-seq

Applications of short-read

Finding Genes with Genomics Technologies

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

Gene Expression Technology

Parts of a standard FastQC report

Genomic resources. for non-model systems

Assay Standards Working Group Nov 2012 Assay Standards Working Group Recommendations, November 2012

CBC Data Therapy. Metatranscriptomics Discussion

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq

Preparing normalized cdna libraries for transcriptome sequencing (Illumina HiSeq)

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

RNA-Seq de novo assembly training

Quantitative Real Time PCR USING SYBR GREEN

Development of quantitative targeted RNA-seq methodology for use in differential gene expression

Analysis of Differential Gene Expression in Cattle Using mrna-seq

RNA SEQUINS LABORATORY PROTOCOL

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ),

NextGen Sequencing Technologies Sequencing overview

SMARTer for NGS. SMARTer Solutions 다카라코리아바이오메디칼

KAPA RNA HyperPrep Workflow: Recommendations and Expectations for RNA-sequencing Using Degraded Inputs

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Successful gene expression studies using validated qpcr assays. Jan Hellemans, CEO Biogazelle webinar October 28 th, 2015

Total RNA isola-on End Repair of double- stranded cdna

Introduction Bioo Scientific

Welcome to the NGS webinar series

RNA-seq Data Analysis

Transcription:

SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS SETTLES@UCDAVIS.EDU Bioinformatics Core Genome Center UC Davis BIOINFORMATICS.UCDAVIS.EDU

DISCLAIMER This talk/workshop is full of opinion, there are as many different way to perform analysis as there are Bioinformaticians. My opinion is based on over a decade of experience and spending a considerable amount of time to understand the data and how it relates to the biological question. Each experiment is unique, this workshop is a starting place and should be adapted to the specific characteristics of your experiment.

OUTLINE 1. Introduction to RNA-seq RNA-seq Experimental Design RNA Sequencing and Sample Preparation 2. Overview of RNA-seq Data Analysis Pipeline Workflow/Stages Software Metadata input Files and Directory Structure

RNA-SEQ PIPELINE OVERVIEW RNA-seq (Differential Expression) Data Analysis Pipeline Experiment Metadata Sequence Data (fastq) Software Setup Reference Genome preprocess QA/QC mapping Gene Annotation counting QA/QC DE Analysis Output

GENERATING RNA-SEQ LIBRARIES Considerations QA/QC of RNA samples RNA of interest Library Preparation Stranded Vs. Unstranded Size Selection/Cleanup Final QA Sequencing

QA/QC OF RNA SAMPLES RNA Quality and RIN (RQN on AATI Fragment Analyzer) RNA sequencing begins with high-quality total RNA, only an Agilant BioAnalyzer (or equivalent) can adequately determine the quality of total RNA samples. RIN values between 7 and 10 are desirable. BE CONSISTANT!!!

RNA OF INTEREST From total RNA we extract RNA of interest. Primary goal is to NOT sequence 90% (or more) ribosomal RNAs, which are the most abundant RNAs in the typical sample. there are two main strategies for enriching your RNA of interest. polya selection. Enrich mrna (those with polya tails) from the sample by oligo dt affinity. rrna depletion. rrna knockdown using RiboZero (or Ribominus) is mainly used when your experiment calls for sequencing non-polya RNA transcripts and non-coding RNA (ncrna) populations. This method is also usually more costly. rrna depletion will result in a much larger proportion of reads align to intergenic and intronic regions of the genome.

LIBRARY PREPARATION Some library prep methods first require you to generate cdna, in order to ligate on the Illumina barcodes and adapters. cdna generation using oligo dt (3 biased transcripts) cdna generation using random hexomers (less biased) full-length cdnas using SMART cdna synthesis method Also, can generate strand specific libraries, which means you only sequence the strand that was transcribed. This is most commonly performed using dudp rather than dntps in cdna generation and digesting the rna strand. Can also use a RNA ligase to attach adapters and then PCR the second strand and remainder of adapters. Single stranded adapter ligation is also becoming popular

SIZE SELECTION/CLEANUP/QA Final insert size optimal for DE are ~ 150bp Very important to be consistent across all samples in an experiment on how you size select your final libraries. You can size select by: Fragmenting your RNA, prior to cdna generation. Chemically heat w/magnesium Mechanically (ex. ultra-sonicator) Cleanup/Size select after library generation using SPRI beads or (gel cut) QA the samples using an electrophoretic method (Bioanalyzer) and quantify with qpcr. Most important thing is to be consistent!!!

SEQUENCING Reasons to use Paired-end sequencing Can better detect adapter sequence (using overlaps) Can better infer PCR duplicates Increased mapping confidence Can better detect splice variation (if looking) Reasons to use Single-end sequencing No really good reason

METADATA FILE Variables that describe each sample, including those that will be used to compare samples to each other (experimental factors). Also good to include all technical factors that may influence experimental results (ex. Day of RNA isolation) in order to test for effect later. No such thing as too much information!!!

QA/QC Beyond generating better data for downstream analysis, cleaning statistics also give you an idea as to the quality of the sample, library generation, and sequencing technique used to generate the data. This can help inform you of what you might do in the future. I ve found it best to perform QA/QC on both the run as a whole (poor samples can affect other samples) and on the samples themselves as they compare to other samples (REMEMBER, BE CONSISTANT). Reports such as Basespace for Illumina, are great ways to evaluate the runs as a whole. PCA/MDS plots of the preprocessing summary are a great way to look for technical bias across your experiment

GENERAL RULES FOR PREPARING SAMPLES Prepare more samples then you are going to need, i.e. expect some will be of poor quality, or fail Preparation stages should occur across all samples at the same time (or as close as possible) and by the same person Spend time practicing a new technique to produce the highest quality product you can, reliably Quality should be established using Fragment analysis traces (pseudo-gel images, RNA RIN > 7.0) DNA/RNA should not be degraded 260/280 ratios for RNA should be approximately 2.0 and 260/230 should be between 2.0 and 2.2. Values over 1.8 are acceptable Quantity should be determined with a Fluorometer, such as a Qubit. BE CONSISTENT ACROSS ALL SAMPLES!!!

ILLUMINA SEQUENCING http://www.illumina.com/systems/hiseq-3000-4000/specifications.html 2500 MiSeq

SEQUENCING DEPTH The first and most basic question is how many base pairs of sequence data will I get Factors to consider are: 1. Number of reads being sequenced 2. Read length (if paired consider then as individuals) 3. Number of samples being sequenced 4. Expected percentage of usable data The number of reads and read length data are best obtained from the manufacturer s website (search for specifications) and always use the lower end of the estimate.

SEQUENCING COVERAGE Once you have the number of base pairs per sample you can then determine expected coverage Factors to consider are: 1. Length of the genome (in bp) 2. Any extra-genomic sequence, or contamination Coverage is determined differently for Counting based experiments (RNAseq, amplicons) where an expected number of reads per sample is typically more suitable.

RNA-SEQ Characterization of transcripts or differential gene expression Factors to consider are: Read length needed depends on likelihood of mapping uniqueness, but generally longer is better and paired-end is better than single-end. (2 x >75bp is recommended) Interest in measuring genes expressed at low levels ( << level, the >> the depth and necessary complexity of library) The fold change you want to be able to detect ( < fold change more replicates, more depth) Detection of novel transcripts, or quantification of isoforms requires >> sequencing depth The amount of sequencing needed for a given sample is determined by the goals of the experiment and the nature of the RNA sample.

COST ESTIMATION RNA extraction QA/QC (Bioanalyzer) Enrichment of RNA of interest + Library Preparation Library QA/QC (Bioanalyzer and Qubit) Pooling ($10/library) Sequencing (Number of Lanes) Bioinformatics (General rule is to estimate the same amount as data generation, i.e. double your budget) http://dnatech.genomecenter.ucdavis.edu/prices/ Example: 12 samples, ribo-depletion libraries, target 30M reads per sample, Hiseq 3000 (2x100).

COST ESTIMATION 12 Samples QA Bioanalyzer = $98 for all 12 samples Library Preparation (ribo-depletion) = $383/sample = $4,596 Sequencing = $2346 per lane 2.1-2.5 Billion reads per run / 8 lanes = Approximately 300M reads per lane Multiplied by a 0.8 buffer equals 240M expected good reads Divided by 12 samples in the lane = 20M reads per sample per lane. Target 30M reads means 2 lanes of sequencing $2346x2 = $4692 Bioinformatics, simple pairwise comparison design, DE only $2000 Total = $98 + $4596 + $4692 + $2000 = $11,386 Approximately $950 per sample