RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

RNA-Seq Workshop AChemS 2017 Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

Benefits & downsides of RNA-Seq
Benefits:
- High resolution, sensitivity and large dynamic range
- Independent of prior knowledge (in contrast to predesigned probes in microarray analysis)
- Unravels previously inaccessible complexities
Downsides:
- Data analysis is not straightforward; methods continue to evolve
- Cost

Types of RNA-Seq analysis
- Gene expression analysis
- Single cell RNA-Seq (scRNA-seq)
- Small RNA-Seq (miRNA-seq)
- Analysis of RNA-protein/RNA-RNA interactions

Goals of typical RNA-Seq analysis
- Identify expressed genes and transcripts
- Quantify gene expression in different conditions or tissues (differential expression)
- Identify novel transcripts and genes (de novo assembly)
  - Alternative splicing
  - Novel transcribed genes
  - Transcriptome from non-model organisms

Comparison of sequencing platforms
[Figure: 1st, 2nd and 3rd generation sequencing platforms]

Overview of Illumina RNA-Seq https://www.slideshare.net/ueb52/uebuat-bioinformatics-course-session-23-vhir-barcelona

Sequencing strategies
- Which library preparation protocol to use?
- How many replicates?
- What is the optimal library size (sequencing depth)?
- Paired end or single end?
- Which data analysis pipeline to use?

Not all types of RNA encode information
- The bulk (~95%) of cellular RNA is rRNA and tRNA.
http://finchtalk.blogspot.com/2009/05/small-rnas-get-smaller.html

Quality and quantity of input RNA
- High quality RNA is preferred, but is often not available.
- Needle biopsies, laser microdissection and formalin-fixed paraffin-embedded (FFPE) samples yield low integrity RNA.
- The amount of RNA may be low by necessity or by design (e.g. scRNA-seq).

mRNA has to be selectively enriched
- Poly(A) selection
- RNase H (rRNA depletion)
- Ribo-Zero (rRNA depletion)
[Figure legend: magnetic bead]

Stranded libraries are better!
- Stranded libraries preserve information on the strand of origin of the transcript.
- Helpful when overlapping antisense transcripts occur in a genomic region (~19% of genes in the human genome!), e.g. the mouse Gng13 and Chtf18 genes.

How many replicates?
Considerations include:
- Technical variability of the RNA-Seq protocol
- The intrinsic biological variability
- The desired statistical power
Multiple samples can be sequenced in the same lane (multiplexing).
Prepare all replicate libraries at once to avoid batch effects.

Sequencing mode and length
- Paired end is preferred for de novo transcriptome assembly and isoform-level analysis.
- Single end sequencing is sufficient for gene expression studies.
- Illumina sequencer read lengths vary from 50 to 150 bp.
- Longer read length = better mappability.

Library size
- Only a subset of the genome is transcribed.
- The dynamic range of gene expression is huge.
- Reliable detection of genes expressed at lower levels needs a bigger library size.
- scRNA-seq needs lower depth.
- Tools such as Scotty and RNASeqPower can help calculate the optimum library size and number of replicates based on pilot data.
- The ENCODE consortium guidelines: http://encodeproject.org/encode/experiment_guidelines.html

RNA-Seq Library preparation

Library-specific index sequences allow pooling multiple libraries
- ~6 libraries are pooled per lane for typical RNA-Seq.
- Hundreds of libraries are pooled for scRNA-seq.
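As a rough illustration of how index sequences let pooled libraries be separated again after sequencing, here is a minimal Python sketch. The index-to-library table, the read sequences and the function name `demultiplex` are illustrative assumptions, not part of the workshop material; real demultiplexing tools also tolerate index mismatches, which this sketch does not.

```python
from collections import defaultdict

# Illustrative 6-nt index sequences and library names (assumptions for this sketch).
INDEX_TO_LIBRARY = {
    "ATCACG": "library_1",
    "CGATGT": "library_2",
    "TTAGGC": "library_3",
}

def demultiplex(reads, index_to_library=INDEX_TO_LIBRARY):
    """Group (index, sequence) pairs by library.

    Reads whose index is not in the table are collected under 'undetermined'.
    Only exact index matches are accepted.
    """
    bins = defaultdict(list)
    for index, sequence in reads:
        library = index_to_library.get(index, "undetermined")
        bins[library].append(sequence)
    return dict(bins)

# Toy pooled lane: two reads from library_1, one from library_2, one unreadable index.
reads = [
    ("ATCACG", "GATTTGGGG"),
    ("CGATGT", "TTCAAAGCA"),
    ("ATCACG", "GTATCGATC"),
    ("NNNNNN", "AAATAGTAA"),
]
pooled = demultiplex(reads)
for library, sequences in sorted(pooled.items()):
    print(library, len(sequences))
```

Libraries that received no reads (here `library_3`) simply do not appear in the result, which is one quick way to spot a failed library in a pool.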

Digital RNA-Seq uses barcodes to correct PCR bias
- Particularly useful when many cycles of PCR amplification are used (e.g. scRNA-seq)
Proc Natl Acad Sci U S A. 2012 Jan 24;109(4):1347-52
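The idea behind barcode-based correction can be sketched in a few lines of Python: reads that share both a gene and a molecular barcode are treated as PCR copies of one original molecule and counted once. The gene names, barcodes and the function `count_molecules` are hypothetical, and the sketch assumes barcodes are read without sequencing errors, which real tools do not assume.

```python
from collections import defaultdict

def count_molecules(tagged_reads):
    """Return {gene: (read_count, unique_barcode_count)}.

    tagged_reads is an iterable of (gene, barcode) pairs. The unique barcode
    count is the PCR-corrected estimate of the number of original molecules.
    """
    reads_per_gene = defaultdict(int)
    barcodes_per_gene = defaultdict(set)
    for gene, barcode in tagged_reads:
        reads_per_gene[gene] += 1
        barcodes_per_gene[gene].add(barcode)
    return {
        gene: (reads_per_gene[gene], len(barcodes_per_gene[gene]))
        for gene in reads_per_gene
    }

# Hypothetical data: GeneA has one molecule that was amplified three times.
tagged_reads = [
    ("GeneA", "AACGT"), ("GeneA", "AACGT"), ("GeneA", "AACGT"), ("GeneA", "TTGCA"),
    ("GeneB", "CCATG"), ("GeneB", "GGTAC"),
]
counts = count_molecules(tagged_reads)
for gene, (n_reads, n_molecules) in sorted(counts.items()):
    print(gene, "reads:", n_reads, "molecules:", n_molecules)
```

In this toy input GeneA has twice as many reads as GeneB, but both genes are represented by two distinct molecules once duplicates are collapsed.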

Illumina Sequencing

From sequence to biological insights
- Reads: FASTQ files; QC by FastQC/R
- Mapping: to genome/transcriptome/de novo
- Expression quantification: summarize read counts (EM/union of exons); QC by RSeQC
- Differential expression analysis: gene/transcript level
- Functional interpretation: enriched pathways/GO terms, integration with other data
- Biological insights & hypothesis

FASTQ file format
- FASTQ format is used by modern sequencers.
- Bundles a FASTA sequence and its quality data.
- Line 1: sequence identifier (begins with @)
- Line 2: raw sequence
- Line 3: separator line beginning with +; may optionally repeat the sequence identifier
- Line 4: quality values for the sequence, one character per base (! = lowest, ~ = highest)
Example:
@HWUSI-EAS100R:6:73:941:1973#0/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
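The four-line record layout described above can be read with a short Python sketch. It assumes Phred+33 quality encoding (consistent with "!" being the lowest value) and strictly four lines per record; the function name `read_fastq` is an illustrative choice, and production work would normally rely on an established parser instead.

```python
import io

def read_fastq(handle):
    """Yield (identifier, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        # Assumption: Phred+33 encoding, so '!' (ASCII 33) decodes to quality 0.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores

# The record shown on the slide, held in memory instead of a file.
example = io.StringIO(
    "@HWUSI-EAS100R:6:73:941:1973#0/1\n"
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n"
    "+\n"
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65\n"
)
records = list(read_fastq(example))
identifier, sequence, scores = records[0]
print(identifier, len(sequence), min(scores), max(scores))
```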

Sequencing QC using FastQC
- Basic information (total reads, sequence length, etc.)
- Per base sequence quality
- Overrepresented sequences
- GC content
- Duplication level
- Etc.
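To make two of these metrics concrete, the following Python sketch computes GC content per read and the mean quality at each read position. It is not how FastQC itself is implemented; the function names, the toy reads and the Phred+33 offset are assumptions made for illustration.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def per_base_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position; reads may differ in length."""
    totals, counts = [], []
    for quality in quality_strings:
        for position, char in enumerate(quality):
            if position == len(totals):
                totals.append(0)
                counts.append(0)
            totals[position] += ord(char) - offset
            counts[position] += 1
    return [total / count for total, count in zip(totals, counts)]

# Toy reads as (sequence, quality string) pairs.
reads = [("GATTACA", "IIIIIII"), ("GGCC", "II5!")]
gc_values = [gc_fraction(seq) for seq, _ in reads]
mean_quality = per_base_mean_quality(qual for _, qual in reads)
print(gc_values)
print(mean_quality)
```

A drop in mean quality toward later positions, as in the second toy read, is the pattern the per base sequence quality plot is meant to reveal.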

FastQC report http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Per base sequence quality

Overrepresented sequences: adapter

Challenges in RNA-Seq read alignment
- How to correctly align short reads to the parent gene?
- Theoretically, the chance of a 100 bp read occurring more than once in a genome is infinitesimally small (4^100 ≈ 1.6 × 10^60 possible sequences, compared to the size of a mammalian genome, ~3 × 10^9 bp).
- But repeated sequences, such as conserved regions in gene families and overlapping antisense genes, abound in the genome.
- About 1/3 of RNA-Seq reads span exon-exon junctions!
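The back-of-the-envelope comparison above is easy to check in Python. The calculation assumes a purely random genome with equal base frequencies, which is exactly the assumption that repeats and gene families violate; the variable names are illustrative.

```python
read_length = 100
genome_size = 3 * 10 ** 9  # approximate mammalian genome size in bp

# Number of distinct sequences of the given length over the alphabet A, C, G, T.
possible_sequences = 4 ** read_length

# Under the random-genome assumption, genome positions per possible sequence.
ratio = genome_size / possible_sequences

print(f"possible 100-mers: {possible_sequences:.2e}")
print(f"genome positions per possible 100-mer: {ratio:.2e}")
```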

Are you awake?

The shredded book analogy for short read alignment Adapted from a lecture by Michael Schatz, JHU

Nature Methods 10, 1165-1166 (2013)

de Bruijn graph assembly
- Repeat sequences make correct reconstruction and quantification difficult.
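A minimal Python sketch shows why repeats are a problem in a de Bruijn graph: nodes are (k-1)-mers, each distinct k-mer contributes one edge, and a sequence shared by two transcripts creates a node with more than one way out. The toy reads, the choice of k = 4 and the function names are assumptions for illustration; real assemblers such as Trinity add error correction and coverage information that are omitted here.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it.

    Every distinct k-mer in the reads contributes one edge from its prefix
    to its suffix.
    """
    kmers = set()
    for read in reads:
        for start in range(len(read) - k + 1):
            kmers.add(read[start:start + k])
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}

def branching_nodes(graph):
    """Nodes with more than one outgoing edge, where the path is ambiguous."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

# Two toy transcripts that share the repeat GGCG.
reads = ["ATGGCGTA", "CCGGCGAA"]
graph = de_bruijn_graph(reads, k=4)
print(branching_nodes(graph))
print(graph["GCG"])
```

After the shared GGCG repeat the graph cannot tell which continuation belongs to which transcript without extra information such as paired ends or longer reads.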

Overview of read mapping and transcript identification
- Model organisms (reference sequence available), RNA-Seq reads:
  - Align to genome with a splice-aware mapper (TopHat, STAR, HISAT)
    - With GTF, analyze known transcripts: EM algorithm (Cufflinks, RSEM) or union of exons (featureCounts)
    - Without GTF, discover novel transcripts: identify all transcripts (StringTie, Cufflinks)
  - Align to transcriptome with an ungapped mapper (BWA, Bowtie); analyze known transcripts (RSEM, Kallisto)
- Non-model organisms (poor or no reference sequence), RNA-Seq reads:
  - De novo assembler (Trinity)
  - Align to de novo transcriptome (BWA, Bowtie)
  - Analyze (RSEM, Kallisto)
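The "union of exons" counting strategy named in the overview can be sketched as follows: all exons of a gene are merged into one set of non-overlapping intervals, and a read is counted for a gene if it overlaps that union and no other gene. This is a simplified illustration and not the featureCounts implementation; it assumes a single chromosome, half-open coordinates, unspliced reads and no strand information, and the gene names and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def count_reads(read_intervals, gene_exons):
    """Count reads per gene using the union of each gene's exons.

    Reads overlapping more than one gene are treated as ambiguous and skipped.
    """
    unions = {gene: merge_intervals(exons) for gene, exons in gene_exons.items()}
    counts = {gene: 0 for gene in gene_exons}
    for read_start, read_end in read_intervals:
        hits = [
            gene for gene, exons in unions.items()
            if any(read_start < end and start < read_end for start, end in exons)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts

# Hypothetical annotation: GeneA has two overlapping isoform exons plus a third exon.
gene_exons = {
    "GeneA": [(100, 200), (150, 300), (500, 600)],
    "GeneB": [(580, 700)],
}
reads = [(120, 170), (290, 340), (590, 640), (650, 700), (350, 400)]
counts = count_reads(reads, gene_exons)
print(counts)
```

The read at 590-640 overlaps both genes and is dropped, which is the kind of ambiguity that EM-based quantifiers try to resolve probabilistically instead.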

Alignment and annotation files
- SAM is a text-based file format for storing sequences aligned to a reference sequence.
- Consists of an optional header section (metadata such as reference sequence names) and a mandatory alignment section.
- Each line of the alignment section has 11 mandatory fields specifying alignment information.
- BAM files are compressed binary forms of SAM files.
- GTF, GFF and BED files contain annotations of features such as the coordinates of genes, transcripts and exons.
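The 11 mandatory SAM fields can be made concrete with a small Python sketch that splits one alignment line into named fields. The field names follow the SAM specification; the example line, the class name `SamRecord` and the function `parse_sam_line` are made up for illustration, and optional tag fields after column 11 are ignored.

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)

def parse_sam_line(line):
    """Parse the 11 mandatory tab-separated fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)

# Hypothetical alignment line for a read mapped to the forward strand of chr1.
line = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tGATTTGGG\tIIIIIIII"
record = parse_sam_line(line)
print(record.rname, record.pos, record.cigar)
```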

Genome browsers
- Web based: UCSC
- Desktop: IGV
[Figure: Sashimi plot]

Taste cell and tissue isolation for RNA-Seq analysis
[Figure, panels A-C: tissue before and after cutting; circumvallate and fungiform papillae. Labels: Type III salt, Type III sour, T1R3GFP (sweet/umami), GustGFP (mostly bitter), GADGFP (Type III), Lgr5GFP (stem)]

Mapping QC
- Percentage of reads properly mapped or uniquely mapped
- Among the mapped reads, the percentage of reads in exon, intron and intergenic regions
- Splice junctions
- 5' or 3' bias
- Etc.
Popular software includes RSeQC and RNA-SeQC.
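The first metric in the list above can be approximated directly from SAM FLAG and MAPQ values, as in this Python sketch. The FLAG bits used (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification; treating MAPQ >= 10 as "confidently placed" is an assumption, since the meaning of MAPQ and of "uniquely mapped" differs between aligners. The function name and the toy records are illustrative.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def mapping_summary(records, min_mapq=10):
    """Percentages of mapped and confidently mapped reads.

    records is an iterable of (flag, mapq) pairs. Secondary and supplementary
    alignments are skipped so that each read is counted once.
    """
    total = mapped = confident = 0
    for flag, mapq in records:
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
            if mapq >= min_mapq:  # assumed threshold, aligner dependent
                confident += 1
    if total == 0:
        return {"mapped_pct": 0.0, "confident_pct": 0.0}
    return {
        "mapped_pct": 100 * mapped / total,
        "confident_pct": 100 * confident / total,
    }

# Toy records: two well-mapped reads, one unmapped, one low-MAPQ, one secondary alignment.
records = [(0, 60), (16, 60), (4, 0), (0, 1), (256, 0)]
summary = mapping_summary(records)
print(summary)
```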

Read mapping to gene features

Splice junction saturation

Taste cells express many novel isoforms and genes
- 42%-45% of the splice junctions in taste libraries are either completely or partially novel.
- But these novel splice junctions were rarely used (<5%).
- Taste and olfactory tissues are barely represented in public gene annotation efforts.

Gene body coverage
[Figure: normalized read mapping intensity (0-100) against normalized distance along transcript, 5'->3' (%), for bulk taste libraries and single cell libraries]
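A gene body coverage profile of this kind can be sketched in Python by scaling each read position to the length of its transcript, binning from the 5' to the 3' end, and normalizing to the highest bin. This is a simplified stand-in for what tools such as RSeQC compute from per-base coverage; the function name, the bin count and the toy transcripts are assumptions.

```python
def gene_body_coverage(transcripts, bins=10):
    """Normalized read density along the transcript body, 5' to 3'.

    transcripts is a list of (length, positions) pairs, where positions are
    0-based read start offsets from the 5' end. The profile is scaled so that
    the highest bin equals 1.0.
    """
    profile = [0] * bins
    for length, positions in transcripts:
        for position in positions:
            # Scale the position to a bin index, clamping to the last bin.
            index = min(position * bins // length, bins - 1)
            profile[index] += 1
    peak = max(profile)
    if peak == 0:
        return [0.0] * bins
    return [count / peak for count in profile]

# Toy data with most reads near the 3' end, as expected for 3' end biased libraries.
transcripts = [
    (1000, [950, 980, 990, 500]),
    (200, [190, 199, 10]),
]
profile = gene_body_coverage(transcripts, bins=4)
print(profile)
```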

Motivation for re-annotating the taste transcriptome
- Not all transcripts are fully annotated, even in human and mouse.
- Transcriptomes are annotated from well-studied tissues by RefSeq and Gencode.
- The 3' and 5' UTRs of genes are poorly annotated.
  - This causes problems for 3' end sequencing.
  - Especially problematic for scRNA-seq.

Strategies vary for model organisms

And non-model organisms

Methods for transcriptome assembly
- Reference-based assembly
- De novo assembly
Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671-682

Transcriptome assembly when reference genome is available https://galaxyproject.org/tutorials/rb_rnaseq/

When reference genome and transcriptome are available
- Reference annotation based transcriptome assembly (RABT assembly) leverages existing gene annotations for discovering novel transcripts.
- Appropriate for model organisms.
Bioinformatics (2011) 27 (17): 2325-2329

Strategy for RABT assembly of taste RNA-Seq data
- Reference annotation based transcriptome assembly of taste bud libraries using the Cufflinks and StringTie packages
- Results from the two workflows were combined.
- Non-coding transcripts, pre-mRNAs and transcripts containing premature stop codons were removed.
- Potential coding transcripts were functionally annotated.
More info: Poster #520

Many novel genes and isoforms of known genes were identified in the taste buds

Transcript type                    De novo gene annotations
Identical to known                 111512*
Novel intronic                     115
Novel isoforms of known genes      50110
Novel intergenic transcripts       1649
Novel antisense transcripts        303

*Out of a total of 111706 transcripts in Gencode M7

Improved bitter taste receptor gene annotations
- 23/35 Tas2r genes are multi-exonic.
- Ten of them were verified by RT-PCR using cDNA from taste tissue.
[Figure legend: blue = de novo model, red = RefSeq model]

Novel isoform of known genes: e.g. Chromogranin A

Improved mouse OR gene annotations
- 913 (73.1%) OR and 246 (45.9%) VR genes had extended gene models.
- The de novo models are more sensitive at detecting OR gene expression (B).
A: From PLoS Genet. 2014 Sep 4;10(9):e1004593
B: From Scientific Reports 5, Article number: 18178 (2015) doi:10.1038/srep18178

Thanks for your attention!
ssukumaran@monell.org
Many figures and slides in this presentation came from publications, presentations, web pages etc. I am grateful to the authors for making them available.