Long and short/small RNA-seq data analysis


Long and short/small RNA-seq data analysis
GEF5, 4.9.2015
Sami Heikkinen, PhD, Dos.

Topics
1. RNA-seq in a nutshell
2. Long vs short/small RNA-seq
3. Bioinformatic analysis workflows
GEF5 / Heikkinen S / 4.9.2015

RNA-seq in a nutshell

RNA-seq project workflow (bench to bedside)
Planning:
- Define problem
- Consult bioinformatician (et al)!
- Get ethical permits
- Select/get samples (groups, N)
- Define sequencing strategy
Execution:
- Extract RNA
- Generate sequencing libraries
- Sequence
Bioinformatics:
- Perform QC
- Preprocess, align, analyze / test
- Summarize, visualize
- Interpret!
Bedside:
- Individualized treatment? Genetic risk? etc!

Next Generation Sequencing (NGS)
= deep sequencing (Finnish: syväsekvensointi)
- counting applications: the count of reads aligning to a genomic location matters (the most), e.g. ChIP-seq, RNA-seq, many others
- qualitative applications: the sequence itself matters, e.g. whole genome / targeted / exome sequencing
Sources: Nature Reviews Genetics, Metzger 2010; Annu. Rev. Anal. Chem., Mardis 2013

Sample barcoding
Sequencing allows for multiplexing
- take advantage of modern high-capacity sequencers: ~200 million reads per run on an old Illumina HiSeq, recent versions up to 20x that!
- typically up to 48 or even 96 barcodes
[Figure: barcoded sequencing libraries for Sample 1 and Sample 2 (barcode e.g. ATCACG; each DNA fragment flanked by linkers & adapters) are sequenced together; all reads are then de-barcoded into Sample 1 and Sample 2 reads]
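The de-barcoding step can be sketched in a few lines of Python. This is a minimal illustration only, assuming the barcode sits at the start of each read and must match exactly; on Illumina instruments the index is usually read separately and demultiplexing is done by the vendor software. The barcode ATCACG is from the slide, while the second barcode and the read sequences are made up.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode.

    Assumption: inline barcodes with exact matching (no mismatches allowed).
    Reads without a known barcode go to the "undetermined" group unchanged.
    """
    lengths = sorted({len(b) for b in barcode_to_sample})
    by_sample = defaultdict(list)
    for read in reads:
        for n in lengths:
            sample = barcode_to_sample.get(read[:n])
            if sample is not None:
                by_sample[sample].append(read[n:])
                break
        else:
            by_sample["undetermined"].append(read)
    return dict(by_sample)


# Hypothetical toy input: only ATCACG comes from the slide.
barcodes = {"ATCACG": "Sample 1", "CGATGT": "Sample 2"}
reads = ["ATCACGAAAA", "CGATGTCCCC", "GGGGGGTTTT"]
groups = demultiplex(reads, barcodes)
```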

Read length
- most counting applications: 50 bp
- genome sequencing: 100-600 bp
long RNA-seq
- only measure gene expression levels (etc)? 50 bp OK
- interested in alternative splicing? Need 100+ bp!
short/small RNA-seq
- e.g. mature miRNAs (~22 nt): ~40 bp
[Figure: 50 bp and 100 bp sequence reads from sequenced fragments of a target mRNA with Exon 1, Exon 2 and Exon 3]

Single or paired-end?
single end
- most counting applications, including typical long RNA-seq
paired-end
- helps in alignment
- alternative splicing in RNA-seq
- genome sequencing
- higher cost, longer sequencer run times
[Figure: single end reads vs paired-end read pairs from a sequenced fragment, aligned to the genome and to a target mRNA with Exon 1, alternative exon 2 and Exon 3; 50 bp and 100 bp reads]

Sequencing depth
In RNA-seq, more depth = more reliability (for lower expressed genes)
Read counts per gene:
                         N reads                4 * N reads
                         Sample 1   Sample 2    Sample 1   Sample 2
  Low expressed gene     6          4           24         16
  Higher expressed gene  60         40          240        160
(6 vs 4 reads: random result?!)
long RNA-seq on mammalian-size transcriptome
- gene expression: need 10-40 million single end reads per sample -> multiplex ~6-12x
- gene expression + alternatively spliced mRNA isoforms: need 100 million paired-end reads per sample -> no/low multiplexing
short/small RNA-seq
- need 2-3 million single end, 40 bp reads per sample -> use lower capacity sequencer, and multiplex e.g. 12x
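The depth arithmetic can be checked with a short Python sketch. It rests on two assumptions that are not on the slide: read counts are treated as Poisson counts, so the relative noise of a count is 1/sqrt(count), and the run capacity is taken as the ~200 million reads quoted earlier for an old HiSeq. The function names are illustrative.

```python
import math


def poisson_cv(count):
    """Relative counting noise (sd / mean) of a Poisson count. Assumption: Poisson."""
    return 1.0 / math.sqrt(count)


def samples_per_run(run_capacity_reads, reads_per_sample):
    """How many samples fit in one run at the requested depth per sample."""
    return run_capacity_reads // reads_per_sample


# Low expressed gene from the table: 6 reads at depth N, 24 reads at depth 4 * N.
noise_at_n = poisson_cv(6)
noise_at_4n = poisson_cv(24)   # four times the depth halves the relative noise

# Multiplexing for long RNA-seq gene expression (10-40 million reads per sample).
max_multiplex = samples_per_run(200_000_000, 10_000_000)
min_multiplex = samples_per_run(200_000_000, 40_000_000)
```

Under these assumptions the multiplexing level comes out at 5 to 20 samples per run, which brackets the ~6-12x given on the slide.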

Replicates
RNA-seq, too, suffers from the inherent variation in e.g. gene expression levels between individuals
- need samples from many individuals per group
- probably at the very least tens
- the smaller the expected difference, the bigger the N must be
- power calculations?
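A rough power calculation shows why smaller expected differences need a bigger N. The sketch below uses the textbook normal-approximation formula for comparing two group means, n = 2 * ((z_(1-alpha/2) + z_power) * sigma / delta)^2 per group. It is a generic illustration, not an RNA-seq-specific method: real differential expression tools model counts differently, and testing many genes would call for a stricter alpha. All parameter values are assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Per-group N for a two-sided, two-sample comparison (normal approximation).

    delta: expected difference between group means (e.g. in log2 expression)
    sigma: between-individual standard deviation, in the same units
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Halving the expected difference roughly quadruples the required N.
n_large_effect = samples_per_group(delta=1.0, sigma=1.0)
n_small_effect = samples_per_group(delta=0.5, sigma=1.0)
```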

Long vs short RNA-seq

Long RNA-seq: target
- any RNA present in the extracted RNA sample
  - messenger RNAs (mRNAs)
  - long non-coding RNAs (lncRNAs), processed pseudogenes etc
- typical min length: ~200 bp
- all expressed isoforms included
Short/small RNA-seq: target
- small non-coding RNAs (sncRNAs)
  - microRNAs (miRNAs)
  - PIWI-interacting RNAs (piRNAs)
  - small nucleolar RNAs (snoRNAs)
- utilizes chemical properties at the ends of small RNAs
[Figure: e.g. genome, mRNA (5' to 3'), protein]

Long RNA-seq: starting material
- total RNA
- mRNA only sample
Long RNA-seq: complexity
- e.g. 19797 known protein coding genes (through 79795 transcripts)
- variable tissue specificity
- variable lengths
-> high complexity
Short/small RNA-seq: starting material
- total RNA THAT MUST also include the <50 bp small RNA species. RNA extraction method matters!
- small RNA only sample
Short/small RNA-seq: complexity
- e.g. 2588 known mature miRNAs
- higher tissue specificity
- very short
-> low complexity

RNA-seq analysis work flows

Long RNA-seq data analysis workflow
Stages: Download, Initial QC, Preprocessing, Decontam., Alignment, Quantitate, Analyze. QC results: fastqc is run at several points along the way.
1. Download: raw data on server -> raw data locally
2. Initial QC (fastqc)
3. Preprocess: trim for 3'-A(n) (homertools trim), then trim adapter and Q-filter (Trimmomatic)
4. Decontaminate: align to rRNA + chrM + etc (bowtie2 or tophat2, with a bowtie2 index); reads are split into Aligned and Unaligned
5. Align, sort, index, and visualize: align and index (tophat2 & samtools) to transcriptome + genome (public data, bowtie2 index); visualize as .tdf (igvtools); cufflinks?; transcriptome (re-)annotation (cuffcompare)
6. Quantitate (cuffquant) and export (cuffnorm): gene expressions, normalized gene expression & counts
7. Test (cuffdiff): DEG results
8. Analyze: pathway analysis, associations to clinical data etc

RNA-seq pipeline architecture
- Input (folder)
- Master Unix shell script, with pipeline_settings.txt
- Step scripts called by the master script: run_fastqc.sh (fastqc), run_homertools.sh (homertools trim), run_trimmomatic.sh (trimmomatic), etc.sh
- Each step has its own output (folder) with filename(s).suffix and log.txt
- reporting.sh(s): summary.txt etc
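The master-script pattern can be sketched as follows. The slide describes a Unix shell master script; this Python version is only an illustration of the same idea, with one output folder and one log.txt per step. The function name `run_pipeline` and the stop-at-first-failure behaviour are assumptions, and the step commands in the usage comment simply reuse the script names from the slide.

```python
import subprocess
from pathlib import Path


def run_pipeline(steps, out_root):
    """Run (name, argv) steps in order, one output folder and log.txt per step.

    Stops at the first failing step (an assumption, not stated on the slide)
    and returns the names of the steps that completed successfully.
    """
    done = []
    for name, argv in steps:
        step_dir = Path(out_root) / name
        step_dir.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(argv, capture_output=True, text=True)
        (step_dir / "log.txt").write_text(result.stdout + result.stderr)
        if result.returncode != 0:
            break
        done.append(name)
    return done


# Hypothetical usage, assuming the step scripts exist in the working directory:
# run_pipeline(
#     [("fastqc", ["sh", "run_fastqc.sh"]),
#      ("homertools", ["sh", "run_homertools.sh"]),
#      ("trimmomatic", ["sh", "run_trimmomatic.sh"])],
#     "output",
# )
```

Keeping each step in its own script and folder makes it easy to rerun a single step and to see from its log.txt what was actually executed.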

Some data formats and types
- Raw sequence data (.fastq.gz)
- FastQC quality control (.html)
- Aligned reads (.sam, .bam, indexed and sorted .bam)
- Visualization in genome browser (.tdf, .bigbed, .bigwig, etc)
- Gene expression test results (from cuffdiff, tab-delimited .txt)
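As an illustration of the raw data format, the sketch below reads a gzipped FASTQ file with the Python standard library. It assumes the common four-line record layout (header, sequence, "+" line, quality string) with no line wrapping, and Phred+33 quality encoding; older Illumina data used other offsets. The function names are illustrative.

```python
import gzip


def read_fastq_gz(path):
    """Yield (read_id, sequence, quality_string) from a gzipped FASTQ file.

    Assumption: strict four-line records, no wrapped sequence lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean base quality of one read. Assumption: Phred+33 encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)
```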


Pilot RNA-seq sample from human blood
[Figure: read count across processing steps]
[Figure: Tissue Specific Expression Analysis (TSEA) (top 1000 expressed genes)]

Pilot RNA-seq sample from human blood

Small RNA-seq data analysis workflow
Stages: Initial QC, Preprocessing, Decontaminate; Align, QC, sort, index, visualize; Quantitate; Annotate; Test
- Raw data on server -> initial QC -> preprocessing -> decontaminate
- NOTE: with e.g. MiSeq (Mediteknia), adapter clipping is done already on the sequencer: 40 bp reads -> e.g. 22 bp
- Alignment indexes: mature miRNA, hairpin miRNA, sncRNA, piRNA; each alignment gives QC, aligned reads (visualized) and unaligned reads
- Quantitate and annotate (DESeq2 dependent), test (DESeq2)
- Phenotype data for testing: groups, clin chem, histology, disease risk etc
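The repeated aligned/unaligned split in this workflow suggests a cascade, where reads left unaligned by one index are passed on to the next. The sketch below illustrates that idea only. The order of the indexes is inferred from the flattened diagram, exact set lookup stands in for a real aligner, and all sequences are made up.

```python
def cascade_assign(reads, indexes):
    """Assign each read to the first index that contains it.

    indexes: ordered list of (name, set_of_sequences). Exact lookup is a
    stand-in for real alignment (assumption for illustration only).
    """
    assigned = {name: [] for name, _ in indexes}
    remaining = list(reads)
    for name, sequences in indexes:
        still_unaligned = []
        for read in remaining:
            if read in sequences:
                assigned[name].append(read)
            else:
                still_unaligned.append(read)
        remaining = still_unaligned  # only unaligned reads move on
    assigned["unaligned"] = remaining
    return assigned


# Hypothetical toy indexes and reads.
toy_indexes = [
    ("mature miRNA", {"AAAA"}),
    ("hairpin miRNA", {"AAAA", "CCCC"}),
    ("sncRNA", {"GGGG"}),
    ("piRNA", {"TTTT"}),
]
result = cascade_assign(["AAAA", "CCCC", "TTTT", "ACGT"], toy_indexes)
```

Because a read is consumed by the first index it matches, the order of the indexes decides where ambiguous reads are counted.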

Thank you! uef.fi