RNAseq Differential Gene Expression Analysis Report

Similar documents
Next Generation Sequencing

10/06/2014. RNA-Seq analysis. With reference assembly. Cormier Alexandre, PhD student UMR8227, Algal Genetics Group

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

SCALABLE, REPRODUCIBLE RNA-Seq

VM origin. Okeanos: Image Trinity_U16 (upgrade to Ubuntu16.04, thanks to Alexandros Dimopoulos) X2go: LXDE

SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory

RNA-Seq Analysis. Simon Andrews, Laura v

RNA-seq Data Analysis

Introduction to RNA-Seq in GeneSpring NGS Software

RNA-Seq with the Tuxedo Suite

Experimental Design. Dr. Matthew L. Settles. Genome Center University of California, Davis

TECH NOTE Stranded NGS libraries from FFPE samples

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

TECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA

RNA-Seq Module 2 From QC to differential gene expression.

Gene Expression analysis with RNA-Seq data

Transcriptome analysis

Deep Sequencing technologies

Novel methods for RNA and DNA- Seq analysis using SMART Technology. Andrew Farmer, D. Phil. Vice President, R&D Clontech Laboratories, Inc.

SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA

Introduction of RNA-Seq Analysis

Introduction to RNA-Seq

Galaxy for Next Generation Sequencing 初探次世代序列分析平台 蘇聖堯 2013/9/12

RNA-Seq Software, Tools, and Workflows

Sequence Analysis 2RNA-Seq

Introduction to RNAseq Analysis. Milena Kraus Apr 18, 2016

Increased transcription detection with the NEBNext Single Cell/Low Input RNA Library Prep Kit

Demo of mrna NGS Concluding Report

RNA Seq: Methods and Applica6ons. Prat Thiru

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Single Cell Transcriptomics scrnaseq

Wheat CAP Gene Expression with RNA-Seq

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Introduction to RNA-Seq

Applications of short-read

Total RNA isola-on End Repair of double- stranded cdna

Course Presentation. Ignacio Medina Presentation

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ),

RNA-Seq data analysis course September 7-9, 2015

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq

Sanger vs Next-Gen Sequencing

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS)

Integrated NGS Sample Preparation Solutions for Limiting Amounts of RNA and DNA. March 2, Steven R. Kain, Ph.D. ABRF 2013

High Throughput Sequencing the Multi-Tool of Life Sciences. Lutz Froenicke DNA Technologies and Expression Analysis Cores UCD Genome Center

Benchmarking of RNA-seq data processing pipelines using whole transcriptome qpcr expression data

Low input RNA-seq library preparation provides higher small non-coding RNA diversity and greatly reduced hands-on time

Long and short/small RNA-seq data analysis

Wet-lab Considerations for Illumina data analysis

NGS Data Analysis and Galaxy

Automated size selection of NEBNext Small RNA libraries with the Sage Pippin Prep

Canadian Bioinforma3cs Workshops

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility

Eucalyptus gene assembly

How to deal with your RNA-seq data?

High-quality stranded RNA-seq libraries from single cells using the SMART-Seq Stranded Kit Product highlights:

Statistical Genomics and Bioinformatics Workshop. Genetic Association and RNA-Seq Studies

Transcriptome Assembly and Evaluation, using Sequencing Quality Control (SEQC) Data

1. Introduction Gene regulation Genomics and genome analyses

Computational & Quantitative Biology Lecture 6 RNA Sequencing

RNA Sequencing: Experimental Planning and Data Analysis. Nadia Atallah September 12, 2018

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

An introduction to RNA-seq. Nicole Cloonan - 4 th July 2018 #UQWinterSchool #Bioinformatics #GroupTherapy

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.

Guidelines Analysis of RNA Quantity and Quality for Next-Generation Sequencing Projects

Differential gene expression analysis using RNA-seq

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

Zika infected human samples

Next-generation sequencing technologies

TECH NOTE Ligation-Free ChIP-Seq Library Preparation

Protein and transcriptome quantitation using BD AbSeq Antibody-Oligonucleotide

Why QC? Next-Generation Sequencing: Quality Control. Illumina data format. Fastq format:

Next-Generation Sequencing: Quality Control

De novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly

GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference owing to RNA-seq reads misalignment

RAPID, ROBUST & RELIABLE

Obtain superior NGS library performance with lower input amounts using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina

Obtain superior NGS library performance with lower input amounts using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina

Introduction to transcriptome analysis using High Throughput Sequencing technologies. D. Puthier 2012

RNA

ANAQUIN : USER MANUAL

Quantifying gene expression

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1

RNA SEQUINS LABORATORY PROTOCOL

Single Cell Genomics

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010

Application Note Selective transcript depletion

Session 8. Differential gene expression analysis using RNAseq data

Technical Note. GeneChip 3 IVT PLUS Reagent Kit vs. GeneChip 3 IVT Express Reagent Kit Comparison. Introduction:

Parts of a standard FastQC report

RNA Sequencing Analyses & Mapping Uncertainty

NEBNext. RNA First Strand Synthesis Module LIBRARY PREPARATION. Instruction Manual. NEB #E7525S/L 24/96 reactions Version 4.

Cufflinks Assembly & DE v2.0 BaseSpace App Guide

Lecture 7. Next-generation sequencing technologies

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

DNBseq TM SERVICE OVERVIEW Plant and Animal Whole Genome Re-Sequencing

Differential gene expression analysis using RNA-seq

Introductory Next Gen Workshop

Short Read Alignment to a Reference Genome

Transcription:

RNAseq Differential Gene Expression Analysis Report Customer Name: Institute/Company: Project: NGS Data: Bioinformatics Service: IlluminaHiSeq2500 2x126bp PE Differential gene expression analysis Sample Species: Number of Samples: Date: Otogenetics Contact: Bioinformatics Phone: Email: bioinfor@otogenetics.com Page 1

1. Description of Workflow Total RNA was submitted to Otogenetics Corporation (Atlanta, GA USA) for RNA-Seq assays. Briefly, the integrity and purity of total RNA were assessed using Agilent Bioanalyzer and OD260/280 using Nanodrop. 1-2 μg of cdna was generated using Clontech Smart cdna kit (Clontech Laboratories, Inc., Mountain View, CA USA, catalog# 634925) from 100ng of total RNA. cdna was fragmented using Covaris (Covaris, Inc., Woburn, MA USA), profiled using Agilent Bioanalyzer, and subjected to Illumina library preparation usi ng NEBNext reagents (New England Biolabs, Ipswich, MA USA, catalog# E6040). 1.1 Illumina RNA-Seq sample preparation workflow Figure 1.1 RNA sample preparation. A. mrnas are purified using Poly(A) selection from total RNA sample, and then fragmented. B. First strand of cdna is synthesized using random priming, followed by the synthesis of the second strand of cdna. C. The resulting double-strand cdna from step B is end repaired, phosphorylated and A-tailed. D. Adapter ligation and PCR amplification are performed, the library is ready for clustering and sequencing. Page 2

2. Raw Data Overview Table 2.1. Quality control. Data summary of generated reads. Ot## Ot## Ot## Ot## Ot## The quality, quantity and size distribution of the Illumina libraries were determined using an Agilent Bioanalyzer 2100. The libraries were then submitted for Illumina HiSeq2500 sequencing according to the standard operation. Paired-end 90-100 nucleotide (nt) reads were generated and checked for data quality using FASTQC (Babraham Institute, Cambridge, UK). After achieving optimum QC results, samples were analyzed. Page 3

3. Bioinformatics analysis workflow To analyze the data we used the following pipeline. 3.1. Trimming sequence reads to remove adapters and low quality bases at ends 3.2. Fastqc quality control of Fastq file per sample 3.3. Mapping sequence reads to reference genome 3.4. Calculating reads count (FPKM) values of gene expression 3.5. Differential gene expression Analysis with unique read counts among samples Figure 3.1. RNA-seq workflow data analysis. Page 4

3.1. Trimming sequence reads to remove adapters and low quality bases at ends. Poor quality or technical sequences can affect the downstream analysis and data interpretation, which lead to inaccurate results. To assess quality of raw sequenced data, we used FastQC before and after trimming the adapters. Sequence reads were trimmed to remove possible adapter sequences and nucleotides with poor quality (error rate < 0.05) at the end. After trimming, sequence reads shorter than 30 nucleotides were discarded. Figure 3. 2. Data quality control. Before triming (left) and after triming (right). You will see data after triming quality control in your folder under fastq with an extension of.fastqc.gz. (There are two per sample, left and right reads). Page 5

Figure 3.3. Data analysis pipeline. Pipeline use to perform the RNA-seq data analysis. 3.2 Mapping sequence reads to reference genome The short reads were then mapped to a reference genome assembly to discover their locations with respect to that reference using HiSat a popular spliced aligner for RNA-sequence (RNA-seq) experiments, from the tuxedo protocol. The results of the mapping can be found in the file: mapping/yoursample.bam The table below shows the summary of the mapping for the samples. Summary Stats of Mapping (MappingStats.log.gz) Page 6

Table 3.1. Summary of the mapping for the samples. Started job on X/XX/2017 12:40:11 AM Started mapping on X/XX/2017 12:40:21 AM Finished on X/XX/2017 1:00:16 AM Mapping speed, Million of reads per hour 149.47 Number of input reads 49615041 Average input read length 252 UNIQUE READS: Uniquely mapped reads number 40503208 Uniquely mapped reads % 81.63% Average mapped length 239.96 Number of splices: Total 37682359 Number of splices: Annotated (sjdb) 37093908 Number of splices: GT/AG 37253900 Number of splices: GC/AG 288473 Number of splices: AT/AC 29937 Number of splices: Non-canonical 110049 Mismatch rate per base, % 0.41% Deletion rate per base 0.01% Deletion average length 1.62 Insertion rate per base 0.01% Insertion average length 1.53 MULTI-MAPPING READS: Number of reads mapped to multiple loci 3016636 % of reads mapped to multiple loci 6.08% Number of reads mapped to too many loci 20654 % of reads mapped to too many loci 0.04% UNMAPPED READS: % of reads unmapped: too many mismatch 0.00% % of reads unmapped: too short 12.22% % of reads unmapped: other 0.03% CHIMERIC READS: Number of chimeric reads 0 % of chimeric reads 0.00% Page 7

4. Calculating reads count (FPKM) values of gene expression The mapping bam files are then imported to the Cufflinks software from the tuxedo protocol, and FPKM values are calculated. You can find these results in the Cufflinks directory of your results. Your results will have: 4.1. Transcriptome assembly: transcripts.gtf This GTF file contains Cufflinks assembled isoforms. The first 7 columns are standard GTF, and the last column contains attributes, some of which are also standardized ( gene_id, and transcript_id ). 4.2. Transcript-level expression: isoforms.fpkm_tracking This file contains the estimated isoform-level expression values in the generic FPKM Tracking Format. 4.3. Gene-level expression: genes.fpkm_tracking This file contains the estimated gene-level expression values in the generic FPKM Tracking Format. For more information about the file format visit: http://cole-trapnelllab.github.io/cufflinks/cufflinks/ Page 8

5. Differential gene expression analysis with unique read counts among samples. Identification of differentially expressed genes/transcripts using Cuffdiff: Cufflinks includes Cuffdiff, which is a program used to find significant changes in transcript expression. Cuffdiff: uses all the bam files from tophat output and compares (differential) across samples or group of samples and will generate similar files with suffix.diff ; you can use reported fold change information for comparing samples. Cuffdiff like Cufflinks calculates the FPKM of each transcript, primary transcript, and gene in each sample. FPKMs are computed by summing the FPKMs of transcripts in each primary transcript group or gene group. The results are output in FPKM tracking files in the format described here: 5.1. FPKM tracking files: There are four FPKM tracking files. Sum the FPKMs of transcripts in each primary transcript group or gene group. isoforms.fpkm_tracking: Transcript FPKMs genes.fpkm_tracking: Gene FPKMs cds.fpkm_tracking: Coding sequence FPKMs. tss_groups.fpkm_tracking: Primary transcript FPKMs. Page 9

5.2. Count tracking files: There are four Count tracking files. Cuffdiff estimates the number of fragments that originated from each transcript. isoforms.count_tracking: Transcript counts genes.count_tracking: Gene counts. cds.count_tracking: Coding sequence counts. tss_groups.count_tracking: Primary transcript counts. 5.3. Read group tracking files: There are four read group tracking files. Cuffdiff calculates the expression and fragment count for each transcript, primary transcript, and gene in each replicate. isoforms.read_group_tracking: Transcript read group tracking genes.read_group_tracking: Gene read group tracking. cds.read_group_tracking: Coding sequence FPKMs. tss_groups.read_group_tracking: Primary transcript FPKMs. 5.4. Differential expression tests: Four files are created. Tab delimited file lists the results of differential expression testing between samples isoform_exp.diff: Transcript-level differential expression. gene_exp.diff: Gene-level differential expression. tss_group_exp.diff: Primary transcript differential expression. cds_exp.diff: Coding sequence differential expression. For more information about the output files format visit: http://cole-trapnelllab.github.io/cufflinks/cuffdiff/ Page 10

6. Data visualization via CummeRbund This is an R package designed to help with the visualization of the large amount Cuffdiff RNA-Seq outputs. Figures reproduce from CummeRbund are showed in figure 6.1. Figure 6.1. Visualization of the results cummerbund. A. Box plots of the groups. B. Expression level distribution for all genes in the experimental conditions. C. Scatter plots highlight general similarities and specific outliers between conditions. D. Volcano plots reveal genes, transcripts, TSS groups or CDS groups that differ significantly between the pairs of conditions. Page 11

6. Deliverables Raw fastq and fastqc files: Fastq dataset Yoursamplename_R1_001.fastq.gz Yoursamplename _R1_001.fastq.gz.md5.txt Yoursamplename _R1_001_fastqc.zip Yoursamplename _R2_001.fastq.gz Yoursamplename _R2_001.fastq.gz.md5.txt Yoursamplename _R2_001_fastqc.zip Yoursamplename _SampleSheet.csv Mapping Yoursamplename.genome.bam Yoursamplename.genome.deduplicated.bai Yoursamplename.genome.deduplicated.bam Yoursamplename.genome.deduplicated.RNAseqmetric Yoursamplename.genome.duplication_metrics Yoursamplename.MappingStats.log.gz Yoursamplename.SJ.out.tsv.gz Page 12

Cufflinks Yoursamplename.genes.fpkm_tracking Yoursamplename.isoforms.fpkm_tracking Yoursamplename.transcripts.gtf.gz skipped.gtf transcripts.gtf.gz cuffdiff/ Each comparison data analysis output tables: GroupA_Vs_GroupB/ bias_params.info cds.count_tracking cds.diff cds.fpkm_tracking cds.read_group_tracking cds_exp.diff gene_exp.diff genes.count_tracking genes.fpkm_tracking genes.read_group_tracking isoform_exp.diff isoforms.count_tracking Page 13

isoforms.fpkm_tracking isoforms.read_group_tracking promoters.diff read_groups.info run.info splicing.diff tss_group_exp.diff tss_groups.count_tracking tss_groups.fpkm_tracking tss_groups.read_group_tracking var_model.info vennt-report.html Page 14

CummeRbund fugures Box.pdf Density.pdf Dispersion.pdf FPKM.pdf FPKM_replicates Volcano.pdf Overall data analysis report. Page 15

References Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR: STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15-21. Trapnell C, Roberts A, Goff L,Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562 578. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105-1111. Page 16