RNA-Seq with the Tuxedo Suite

Monica Britton, Ph.D.
Sr. Bioinformatics Analyst
September 2015 Workshop

The Basic Tuxedo Suite
Cufflinks assembles transcripts
Cuffdiff identifies differential expression of genes/transcripts/promoters

References
Trapnell C, et al. 2009. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics.
Trapnell C, et al. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology.
Kim D, et al. 2013. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology.
Roberts A, et al. 2011. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology.
Roberts A, et al. 2011. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics.
Trapnell C, et al. 2013. Differential analysis of gene regulation at transcript resolution with RNA-Seq. Nature Biotechnology.

Alignment and Differential Expression
Inputs: read set(s), existing annotation (GTF)
Workflow: read set(s) -> TopHat -> BAM file(s) -> Cuffdiff -> top tables, etc.
We followed these steps with the single-end reads.

But, do we have all the genes?
For organisms with genomes, gene models are stored in GTF files
Assumptions:
  The GTF file contains annotation for ALL transcripts and genes
  All splice sites, start/stop codons, etc. are correct
Are these assumptions correct for every sequenced organism?
RNA-Seq reads can be used to independently construct genes and splice variants using limited or no annotation
The method used depends on how much sequence information there is for the organism
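To check what a GTF file actually contains, one can tally the transcripts recorded for each gene. The sketch below is a minimal illustration in Python, not part of the Tuxedo tools. The annotation text, gene names, and the function name `transcripts_per_gene` are hypothetical, and it assumes the usual nine tab-separated GTF columns with `gene_id` and `transcript_id` attributes on exon lines.

```python
import re
from collections import defaultdict

# Hypothetical annotation: two genes, one of them with two splice variants.
EXAMPLE_GTF = "\n".join([
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\texample\texon\t300\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.2";',
    'chr1\texample\texon\t900\t990\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB.1";',
])

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(gtf_text):
    """Return {gene_id: sorted list of transcript_ids} from exon lines."""
    genes = defaultdict(set)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon records are tallied in this sketch
        attrs = dict(ATTR_RE.findall(fields[8]))
        genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: sorted(ids) for gene, ids in genes.items()}


summary = transcripts_per_gene(EXAMPLE_GTF)
for gene, ids in sorted(summary.items()):
    print(gene, len(ids), ",".join(ids))
```

A gene with a single transcript in such a tally is not necessarily a single-isoform gene; it may only mean that the annotation is incomplete, which is the point of the questions above.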

Gene Construction (Alignment) vs. Assembly
Genome-sequenced organisms: gene construction by alignment
Novel or non-model organisms: assembly (Trinity software)
Haas and Zody (2010) Nat. Biotech. 28:421-3

Gene / Transcriptome Construction
Annotation can be improved even for well-annotated model organisms
  Identify all expressed exons
  Combine expressed exons into genes
  Find all splice variants for a gene
  Discover novel transcripts
For newly sequenced organisms
  Validate ab initio annotation
  Comparison between different annotation sets
Can assist in finding some types of contamination
  Reconstruction of rRNA genes
  Genomic/mitochondrial DNA in RNA library preps

Reference Annotation Based Transcript (RABT) Assembly
Inputs: read set(s), existing annotation (GTF) [optional]
Workflow:
  TopHat: read set(s) -> BAM file(s)
  Cufflinks: BAM file(s) -> read-set-specific GTF(s)
  Cuffmerge: read-set-specific GTF(s) -> merged GTF
  Cuffcompare: final assembly (GTF and stats)
  Cuffdiff: top tables, etc.
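The RABT workflow can be sketched as a list of command lines. The Python below only builds the commands and does not run them. All file names, sample labels, and the function name `rabt_commands` are hypothetical, and the flags shown (`-G`, `-g`, `-s`, `-r`, `-L`, `-o`, `-p`) should be checked against the manuals for the installed versions before use.

```python
import shlex


def rabt_commands(samples, bowtie_index, genome_fa, ref_gtf, threads=4):
    """Build Tuxedo command lines for a RABT run.

    samples: dict mapping a condition label to one FASTQ path
    (a simplification: one read set per condition).
    Returns (commands, assembly_list), where assembly_list holds the
    lines to write into the text file that cuffmerge reads.
    """
    p = str(threads)
    commands, assembly_list, bams = [], [], []
    for label, fastq in samples.items():
        bam = f"tophat_{label}/accepted_hits.bam"
        # Spliced alignment, guided by the existing annotation (-G).
        commands.append(["tophat", "-p", p, "-G", ref_gtf,
                         "-o", f"tophat_{label}", bowtie_index, fastq])
        # -g (GTF guide) requests reference annotation based assembly.
        commands.append(["cufflinks", "-p", p, "-g", ref_gtf,
                         "-o", f"cufflinks_{label}", bam])
        assembly_list.append(f"cufflinks_{label}/transcripts.gtf")
        bams.append(bam)
    # Merge the read-set-specific assemblies into one GTF.
    commands.append(["cuffmerge", "-p", p, "-g", ref_gtf, "-s", genome_fa,
                     "-o", "merged_asm", "assemblies.txt"])
    # Compare the merged assembly with the reference annotation.
    commands.append(["cuffcompare", "-r", ref_gtf, "-o", "cuffcmp",
                     "merged_asm/merged.gtf"])
    # Differential expression against the merged annotation.
    commands.append(["cuffdiff", "-p", p, "-o", "diff_out",
                     "-L", ",".join(samples), "merged_asm/merged.gtf"] + bams)
    return commands, assembly_list


commands, assembly_list = rabt_commands(
    {"control": "control.fastq", "treated": "treated.fastq"},
    bowtie_index="genome_index", genome_fa="genome.fa", ref_gtf="genes.gtf")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the commands as lists makes it straightforward to pass each one to `subprocess.run` later, after writing `assembly_list` to `assemblies.txt`.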

TopHat Spliced Alignment to a Genome

Reference Annotation Based Transcript (RABT) Assembly

Cufflinks Identification of Incompatible Fragments Incompatible alignment

Cufflinks Minimum Paths to Transcripts

Cufflinks Abundance Estimation

Merging Cufflinks Assemblies

So Now We've Explored These Tools

We've Used Other Software in Conjunction
HTSeq-count -> raw counts -> edgeR
(But HTSeq-count and edgeR are independent)

And Then Came Some Extensions

Modules Introduced in 2014
Cuffquant
  Improves efficiency of running multiple samples
  Stores data in .cxb compressed format, which can later be analyzed with Cuffdiff or Cuffnorm
Cuffnorm
  Generates tables of expression values that are normalized for library size
  Tables are used as input to Monocle
Monocle
  Used to analyze single-cell expression data
  Trapnell, et al., 2014, Nat. Biotech. 32:381
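To illustrate what "normalized for library size" means, the Python sketch below rescales raw counts so that every sample has the same total. This is plain total-count scaling, chosen for simplicity; it is not Cuffnorm's own normalization, which this sketch does not attempt to reproduce. The sample names, gene names, and counts are made up.

```python
def scale_to_common_library_size(counts):
    """Rescale each sample so all samples share the mean library size.

    counts: dict mapping sample name -> dict of gene -> raw count.
    """
    totals = {sample: sum(genes.values()) for sample, genes in counts.items()}
    target = sum(totals.values()) / len(totals)  # mean library size
    return {
        sample: {gene: c * target / totals[sample] for gene, c in genes.items()}
        for sample, genes in counts.items()
    }


# Hypothetical data: sampleB was sequenced four times as deeply as sampleA,
# but the relative expression of the two genes is identical.
raw = {
    "sampleA": {"gene1": 10, "gene2": 30},
    "sampleB": {"gene1": 40, "gene2": 120},
}
normalized = scale_to_common_library_size(raw)
for sample, genes in normalized.items():
    print(sample, genes)
```

After scaling, the two samples show the same values for each gene, so the fourfold difference in raw counts is attributed to sequencing depth rather than to expression.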

But Software Continues to Evolve
HISAT (Hierarchical Indexing for Spliced Alignment of Transcripts)
  Kim et al., 2015, Nat. Methods
  Planned to be TopHat3
  Faster than other aligners
  More accurate on simulated reads

But Software Continues to Evolve
StringTie
  Pertea et al., 2015, Nat. Biotech
  Probable successor to Cufflinks2
  Assembles more transcripts (based on simulated reads)
Ballgown
  Frazee et al., 2015, Nat. Biotech
  Bioconductor R package
  Probable successor to Cuffdiff2
  Includes useful Tablemaker preprocessor

A New Potential Game-Changer (2015)
Kallisto ("Near-Optimal RNA-Seq Quantification")
  Bray et al. (http://arxiv.org/abs/1505.02710)
  Extremely fast; uses pseudo-alignment based on k-mers and de Bruijn graphs
Figure panels: Speed; Accuracy
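The idea behind k-mer based pseudo-alignment can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions, not Kallisto's implementation. It uses a plain k-mer lookup table instead of a de Bruijn graph, ignores reverse complements, and simply skips k-mers that are absent from the index. The transcript sequences and function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with all indexed k-mers of a read.

    No base-level alignment is computed: the answer is the intersection
    of the transcript sets of the read's k-mers.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if not hits:
            continue  # k-mer not in the index; skipped in this sketch
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()


# Two toy transcripts sharing a common 5' end.
transcripts = {"t1": "AAACCCGGGTTT", "t2": "AAACCCTTTGGG"}
K = 5
index = build_kmer_index(transcripts, K)

for read in ["AAACCC", "ACCCGGG", "AACCCTT", "GGGGGGG"]:
    print(read, sorted(pseudoalign(read, index, K)))
```

A read from the shared region stays compatible with both transcripts, while a read spanning a distinguishing region narrows to one. Quantification then works from these compatibility sets rather than from full alignments, which is where the speed comes from.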

A Few Words About Bacterial RNA-Seq

Eukaryotic and Bacterial Gene Structures are Different
Eukaryotes
  Gene structure includes introns and exons
  Splicing, poly-adenylation
  Each mRNA is a discrete molecule when translated
Bacteria / Prokaryotes
  Individual genes and groups of genes in operons
  Generally, no splicing, no poly-A
  One mRNA can contain coding sequences for multiple proteins

Bacterial RNA-Seq Considerations
rRNA depletion strategies may leave considerable amounts of non-coding RNA molecules
Splicing-aware aligners (such as TopHat) may not be useful
Reads from polycistronic mRNA may overlap two genes
  How would HTSeq-count handle this?
Compare alignments to the genome to alignments to the transcriptome
  Some aligners, such as bwa-mem, will report secondary alignments
  Transcriptome alignments can be used to generate a counts table for edgeR
Specialized software, such as Rockhopper (stand-alone, http://cs.wellesley.edu/~btjaden/rockhopper/)
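On the question of how HTSeq-count handles a read that overlaps two genes: in its "union" mode, such a read is reported as ambiguous and is not counted for either gene. The Python sketch below mimics that rule on simple intervals. It is a simplification that ignores strand, CIGAR strings, and paired ends, and the gene coordinates and function name are hypothetical.

```python
def assign_read(read_start, read_end, genes):
    """Classify a read in the spirit of HTSeq-count's 'union' mode.

    genes: dict of gene name -> (start, end), with 1-based inclusive
    coordinates on a single reference sequence.
    """
    hits = {
        name
        for name, (start, end) in genes.items()
        if read_start <= end and start <= read_end  # intervals overlap
    }
    if not hits:
        return "__no_feature"
    if len(hits) == 1:
        return next(iter(hits))
    return "__ambiguous"  # overlaps more than one gene


# Hypothetical operon: two adjacent genes transcribed on one mRNA.
operon = {"geneA": (100, 500), "geneB": (520, 900)}

for start, end in [(200, 250), (450, 540), (505, 515)]:
    print(start, end, assign_read(start, end, operon))
```

In a tightly packed operon, a noticeable share of reads can fall into the ambiguous class, which is one reason to compare genome alignments with transcriptome alignments as suggested above.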