Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

RNA is...
Diverse: rRNA, tRNA, mRNA
Dynamic: figure of abundance against time
Central: DNA, epigenetics, RNA, protein
Qualitative, Quantitative, Integrative
Understand the molecular basis of gene function. Classify and transform cellular states.

RNA studies involve...
Biological system, questions, project, available resources, technology (DB, ~/bin)
This talk: focusing on reference-based mammalian RNA-seq analysis

Transcriptional Complexity
Figure: a genomic locus with multiple transcription start sites (TSS) and polyadenylation signals (pA), annotated with mutations, allelic expression, RNA editing, tiRNA, PASR, Alu and miRNA.
Legend: genomic DNA; microRNAs; spliced intron; TSS = transcription start site; protein-coding regions; pA = polyadenylation signal; translation start site; non-coding regions; polyadenylation

RNA-seq
Figure: a locus (TSS, pA, tiRNA, PASR, Alu, miRNA) overlaid with RNA-seq evidence: non-spliced reads, junction reads, mutations, strand specific.
Cloonan et al. Nat Methods 2008; 5:613-619

Advantages of RNA-seq
Discovery: genes, exons, junctions, UTRs, fusions (present and future)
Dynamic range (figure; axis labels not recoverable). Mortazavi et al. Nat. Methods 2008; 5:621-628
Nucleotide specific

Typical experiment workflow
Settings: Field / Clinic, Wet Lab, Dry Lab
Design experiment, then run experiment
Steps: sample acquisition (field / clinic / lab), obtain RNA, make library, sequencing, base calling (1), mapping (2), library QC (2), analysis (3), interpretation (3), publish
Also shown: sample acquisition for verification and validation

Library Construction
Cellular RNA: rRNA (80%), tRNA (15%), remaining 5%
Target RNA: deplete rRNA, enrich polyA RNA, profile (ribosomes), or capture (tiling arrays)
Then: fragment, ds-cDNA synthesis, ligate adaptors + amplify, sequencing


RNA-seq Mapping
Challenge #1: Introns. Align to a database of junctions or to the transcriptome, or use split-read alignments. (Wood et al. Bioinformatics 2011; 27:580-581. Trapnell et al. Bioinformatics 2009; 25:1105-11)
Challenge #2: Correctness. Sufficient overlap, sufficient evidence.
Challenge #3: Multi-mappers. Align to the transcriptome; sequence similarity.
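The split-read idea from Challenge #1, combined with the "sufficient overlap" check from Challenge #2, can be illustrated with a toy example. This is only a sketch under simplifying assumptions (exact matches, a single junction, tiny made-up sequences); it is not how X-MATE or TopHat are implemented. The function names and the `min_anchor` and `max_intron` parameters are hypothetical.

```python
def find_all(text, pattern):
    """Yield every start position of pattern in text (overlaps allowed)."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)


def split_read_alignments(read, genome, min_anchor=4, max_intron=50):
    """Return (left_start, split, right_start, intron_length) for every way
    of cutting the read in two so that both parts match the genome exactly,
    in order, separated by a gap of 1..max_intron bases."""
    hits = []
    # Require at least min_anchor bases on each side of the junction.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        for left_start in find_all(genome, left):
            left_end = left_start + split
            for right_start in find_all(genome, right):
                intron = right_start - left_end
                if 1 <= intron <= max_intron:
                    hits.append((left_start, split, right_start, intron))
    return hits


# Hypothetical toy sequences: two exons separated by a 13-base intron.
exon1, intron, exon2 = "ACGTACGGAT", "GTAAGTTTTTCAG", "CCTGAGTTCA"
genome = exon1 + intron + exon2
read = exon1[-6:] + exon2[:6]  # a read spanning the exon-exon junction

hits = split_read_alignments(read, genome)
print(hits)  # [(4, 6, 23, 13)]
```

The read does not occur contiguously in the genome, so an unspliced aligner would miss it. Raising `min_anchor` makes the call more conservative: with `min_anchor=7` this 12-base read can no longer be split at all.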

RNA-seq Mapping
Pipeline: data QC (clipping), align to filter set, align to genome, align to junctions, split-read alignment, then choose alignments and disambiguate. (TopHat: Trapnell et al. Bioinformatics 2009; 25:1105-11)
Read outcomes shown: exclude; flag and exclude
Decisions: filter set (rRNA, tRNA?), genome (reference? diploid?), junctions (gene model? ESTs?), split-read alignment (algorithm?)
Output: BAM files, followed by alignment filtering, library QC and analysis


Library Quality Control (QC)
Cellular RNA: rRNA (80%), tRNA (15%), remaining 5%
Target RNA (deplete rRNA, enrich polyA RNA, profile ribosomes, capture with tiling arrays): affects RNA content (expression quantification)
Fragment: affects insert size (transcript identification)
ds-cDNA synthesis: affects strand specificity
Ligate adaptors + amplify: affects library complexity (tag uniqueness)
Sequencing (paired-end?): affects mapping rate
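Several of these QC measures reduce to simple ratios over the aligned reads. The sketch below shows one possible way to compute them from a toy list of reads. The `Read` record, its fields, and the definitions used (for example, complexity as distinct start positions divided by mapped reads) are assumptions made for illustration, not the definitions used in the talk. Insert size is omitted because it needs paired-end data.

```python
from collections import namedtuple

# Hypothetical minimal read record; gene_strand is None outside annotated genes.
Read = namedtuple("Read", "mapped chrom start strand feature gene_strand")


def library_qc(reads):
    """Return simple QC ratios for a list of Read records."""
    mapped = [r for r in reads if r.mapped]
    genic = [r for r in mapped if r.gene_strand is not None]
    # Distinct (chromosome, start, strand) tags approximate library complexity.
    unique_tags = {(r.chrom, r.start, r.strand) for r in mapped}
    return {
        "mapping_rate": len(mapped) / len(reads),
        "rrna_fraction": sum(r.feature == "rRNA" for r in mapped) / len(mapped),
        "strand_specificity": sum(r.strand == r.gene_strand for r in genic) / len(genic),
        "complexity": len(unique_tags) / len(mapped),
    }


# Made-up reads: one duplicate, one antisense, one rRNA, one unmapped.
reads = [
    Read(True, "chr1", 100, "+", "mRNA", "+"),
    Read(True, "chr1", 100, "+", "mRNA", "+"),
    Read(True, "chr1", 250, "-", "mRNA", "+"),
    Read(True, "chr2", 40, "+", "rRNA", "+"),
    Read(False, None, None, None, None, None),
]

qc = library_qc(reads)
for name, value in qc.items():
    print(f"{name}: {value:.2f}")
```

In practice these values would be derived from the BAM files produced by the mapping step rather than from hand-built records.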


Calculate Gene Expression
Gene A: 3500 nt, 700 reads, RPKM = 2.0
Gene B: 400 nt, 160 reads, RPKM = 4.0
RPKM: Reads Per Kilobase per Million
$\mathrm{RPKM} = \dfrac{10^{3} \times 10^{6} \times R}{L \times N}$
R = gene read count, L = length of gene, N = library size
Mortazavi et al. Nat. Methods 2008; 5:621-628
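The two example genes can be checked directly against the RPKM definition. The slide does not state the library size; the value of 100 million reads used below is an assumption, chosen because it is the library size that reproduces both quoted RPKM values.

```python
def rpkm(read_count, length_nt, library_size):
    """Reads per kilobase of feature per million reads in the library."""
    return 1e9 * read_count / (length_nt * library_size)


# Assumed library size (not given on the slide): 100 million reads.
N = 100_000_000

gene_a = rpkm(read_count=700, length_nt=3500, library_size=N)
gene_b = rpkm(read_count=160, length_nt=400, library_size=N)
print(gene_a, gene_b)  # 2.0 4.0
```

Gene A has more than four times as many reads as Gene B, yet half the RPKM, because it is almost nine times longer.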

Further Normalisation
Repeat: normalise to mappable gene length (Koehler et al. Bioinformatics 2010)
Scale expression values by TMM (Robinson et al. Genome Biology 2010; 11:R25). Figure: cellular RNA compared with RPKM for Cond. 1 and Cond. 2.
Normalise to GC content of region (Benjamini et al. NAR 2012)
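The motivation for TMM scaling can be seen with a small example in which one condition expresses an extra, very abundant gene that soaks up half of the reads. The code below is a simplified sketch of a trimmed mean of M-values, written from the description in Robinson et al.; it is not the edgeR implementation, and the trimming details, the function name `tmm_factor` and its parameters are assumptions. It also assumes enough genes survive trimming.

```python
import math


def tmm_factor(sample, ref, m_trim=0.3, a_trim=0.05):
    """Simplified TMM scaling factor of `sample` relative to `ref` (raw counts)."""
    n_s, n_r = sum(sample), sum(ref)
    genes = []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue  # log ratios are undefined for zero counts
        m = math.log2((y_s / n_s) / (y_r / n_r))        # log fold change
        a = 0.5 * math.log2((y_s / n_s) * (y_r / n_r))  # mean log abundance
        # Inverse of the approximate variance of m, used as a precision weight.
        w = 1.0 / ((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r))
        genes.append((m, a, w))

    def kept(values, trim):
        """Indices that remain after trimming `trim` from both tails."""
        order = sorted(range(len(values)), key=values.__getitem__)
        cut = int(len(values) * trim)
        return set(order[cut:len(values) - cut])

    keep = kept([g[0] for g in genes], m_trim) & kept([g[1] for g in genes], a_trim)
    weighted_m = sum(genes[i][2] * genes[i][0] for i in keep)
    return 2 ** (weighted_m / sum(genes[i][2] for i in keep))


# Made-up counts: ten shared genes, plus one gene expressed only in cond2
# that takes half of that library's reads.
cond1 = [100] * 10 + [0]
cond2 = [50] * 10 + [500]

factor = tmm_factor(cond2, cond1)
print(factor)  # 0.5
```

Both libraries contain 1000 reads, so library-size scaling alone would report every shared gene as halved in the second condition. Multiplying the second library size by the factor gives an effective library size of 500, after which the shared genes agree (50/500 = 100/1000).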

Calculate Feature Expression
Calculate RPKM for any feature:
Exonic region; exon junction; intronic region; exon boundary; intergenic region; extended 3' UTR; retained intron

Calculate Transcript Expression
Approach #1: Expression calculated using diagnostic features (ALEXA-seq: Griffith et al. Nat. Methods 2010; 7:843-847)
For: strong evidence; easy to calculate
Against: sampling variability; lacks statistical robustness; dependent on gene model; excludes transcripts
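A diagnostic feature is one that belongs to a single transcript of a gene, so reads on it can be attributed unambiguously. The sketch below finds such features in a made-up gene model and shows why the approach "excludes transcripts": a transcript whose features are all shared gets no estimate. Averaging the feature RPKMs is an assumption made for illustration and is not claimed to be the ALEXA-seq method; all names and values are hypothetical.

```python
from collections import Counter


def diagnostic_features(transcripts):
    """Map each transcript to the features found in no other transcript."""
    counts = Counter(f for feats in transcripts.values() for f in set(feats))
    return {
        tx: sorted(f for f in set(feats) if counts[f] == 1)
        for tx, feats in transcripts.items()
    }


def transcript_expression(transcripts, feature_rpkm):
    """Mean RPKM over diagnostic features, or None if a transcript has none."""
    result = {}
    for tx, feats in diagnostic_features(transcripts).items():
        if feats:
            result[tx] = sum(feature_rpkm[f] for f in feats) / len(feats)
        else:
            result[tx] = None
    return result


# Hypothetical gene model: exons (e*) and junctions (j*) per transcript.
gene_model = {
    "tx1": ["e1", "e2", "e3", "j1-2", "j2-3"],
    "tx2": ["e1", "e3", "j1-3"],        # skips exon 2
    "tx3": ["e1", "e2", "j1-2"],        # every feature is shared with tx1
}
# Made-up feature-level RPKM values.
feature_rpkm = {"e1": 8.0, "e2": 6.5, "e3": 7.5, "j1-2": 6.0, "j2-3": 6.0, "j1-3": 2.0}

diag = diagnostic_features(gene_model)
expr = transcript_expression(gene_model, feature_rpkm)
print(diag)
print(expr)
```

Each estimate here rests on a single junction, which is the sampling-variability weakness listed above.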

Calculate Transcript Expression
Approach #2: Expression estimated. Construct a bipartite graph, then find a minimum path cover (Cufflinks: Trapnell et al. Nat. Biotech. 2010; 28:511-515)
For: estimates expression for all transcripts; incorporates ambiguous reads; more statistically robust
Against: model can fail in complex / highly expressed regions; error rate largely unknown
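To show what "incorporates ambiguous reads" can mean in practice, here is a generic expectation-maximisation sketch that shares reads compatible with several transcripts in proportion to the current abundance estimates. It is a textbook-style illustration under strong simplifications (transcript length and fragment bias are ignored, and the compatibility counts are made up). It is not the Cufflinks implementation, and `em_abundance` is a hypothetical name.

```python
def em_abundance(compat, transcripts, n_iter=200):
    """Estimate the fraction of reads from each transcript.

    compat maps a tuple of compatible transcript names to a read count.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts = dict.fromkeys(transcripts, 0.0)
        # E-step: split each group of reads according to current estimates.
        for tx_set, n_reads in compat.items():
            total = sum(theta[t] for t in tx_set)
            for t in tx_set:
                counts[t] += n_reads * theta[t] / total
        # M-step: re-estimate the fractions from the expected counts.
        all_reads = sum(counts.values())
        theta = {t: counts[t] / all_reads for t in transcripts}
    return theta


# Made-up data: 30 reads unique to A, 10 unique to B, 60 compatible with both.
compat = {("A",): 30, ("B",): 10, ("A", "B"): 60}
theta = em_abundance(compat, ["A", "B"])
print({t: round(v, 3) for t, v in theta.items()})  # {'A': 0.75, 'B': 0.25}
```

The 60 ambiguous reads end up split 3:1, following the ratio of the uniquely assigned reads, rather than being discarded as in the diagnostic-feature approach.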

Expressed or not?
Need to determine an expression cut-off value. Figure: frequency against log2(expression) for Cond. 1, Cond. 2 and Cond. 3, separating "not expressed" from "expressed".
1. Expressed if > 1 RPKM. Has literature support; lacks sensitivity; arbitrary.
2. Expressed if above intergenic background (95th percentile of the intergenic log2 expression distribution). Cut-off based on empirical evidence; still somewhat arbitrary.
3. Incorporate replicate information. Based on observed reproducibility; requires replicates. Figure: np IDR against log2(expression) bins for Rep 1 vs Rep 2, Rep 2 vs Rep 1 and their mean, with the cut-off marked.
Choose what is reasonable for your experiment, and be consistent!
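Option 2 can be sketched as follows: take the 95th percentile of expression over intergenic regions as the background level, then call a gene expressed only if it exceeds that level. The nearest-rank percentile, the function names and all values below are assumptions made for illustration.

```python
def percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def expressed_genes(gene_rpkm, intergenic_rpkm, q=95):
    """Return the background cut-off and the set of genes above it."""
    cutoff = percentile(intergenic_rpkm, q)
    return cutoff, {g for g, v in gene_rpkm.items() if v > cutoff}


# Made-up background: 20 intergenic regions with RPKM 0.0, 0.1, ..., 1.9.
intergenic = [i / 10 for i in range(20)]
genes = {"geneA": 25.0, "geneB": 1.2, "geneC": 0.0, "geneD": 2.0}

cutoff, expressed = expressed_genes(genes, intergenic)
print(cutoff, sorted(expressed))  # 1.8 ['geneA', 'geneD']
```

Here geneB would pass the fixed 1 RPKM rule of option 1 but falls below this library's intergenic background, which shows how the two rules can disagree.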

Nucleotide-Resolution Analysis
Applications: imprinting (ICR), eQTL, sQTL, complex traits
Allelic fraction measured at SNPs (A, B, C)
Reference bias: expected mean compared with observed mean. Figure: density against fraction of RNA-seq reads matching the reference allele. (Degner et al. Bioinformatics 2009)
Map to a diploid genome (AlleleSeq: Rozowsky et al. Mol. Sys. Bio 2011)
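The allelic fraction at a heterozygous SNP, and a simple test for allelic imbalance, can be sketched with the standard library alone. The exact two-sided binomial test below is a common generic choice and is not taken from the cited papers; the read counts are made up. The null proportion `p` is exposed as a parameter because, with reference bias, the expected reference fraction is not exactly 0.5.

```python
from math import comb


def allelic_fraction(ref_reads, alt_reads):
    """Fraction of reads at a SNP that carry the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than observing k successes out of n."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Made-up counts at one SNP: 15 reference reads, 5 alternate reads.
ref_reads, alt_reads = 15, 5
fraction = allelic_fraction(ref_reads, alt_reads)
p_unbiased = binomial_two_sided_p(ref_reads, ref_reads + alt_reads, p=0.5)
p_biased = binomial_two_sided_p(ref_reads, ref_reads + alt_reads, p=0.6)
print(fraction, round(p_unbiased, 4))
print(p_biased > p_unbiased)
```

If mapping favours the reference allele, testing against 0.5 overstates the evidence for imbalance; the second call uses a hypothetical null of 0.6 to show that the same counts then look less extreme.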


The future of RNA-seq (now)
Single cell (Shalek et al. Nature 2013)
Huge cohort: Genotype-Tissue Expression project (GTEx), 900 donors, 30,000 RNA-seq data sets! (Lonsdale et al. Nature Genetics 2013)

Summary
1. Choose an alignment approach suitable for your experiment, available resources and tools.
2. Assess library quality, specifically rRNA contamination, insert size, strand specificity and library complexity.
3. Gene and feature expression can be calculated using count data, and normalised by length, library size and GC content.
4. Transcript expression calculation requires alternative approaches and algorithms which, although common, are largely unproven.
5. RNA-seq can interrogate nucleotide-specific questions, but be careful of alignment biases (diploid mapping can help here).

Questions and References
Cloonan et al. Nat Methods 2008; Stem cell transcriptome profiling via massive-scale mRNA sequencing
Mortazavi et al. Nat. Methods 2008; Mapping and quantifying mammalian transcriptomes by RNA-Seq
Wood et al. Bioinformatics 2011; X-MATE: A flexible system for mapping short read data
Trapnell et al. Bioinformatics 2009; TopHat: discovering splice junctions with RNA-Seq
Koehler et al. Bioinformatics 2010; The Uniqueome: A mappability resource for short-tag sequencing
Robinson et al. Genome Biology 2010; A scaling normalization method for differential expression analysis of RNA-seq data
Benjamini et al. NAR 2012; Summarizing and correcting the GC content bias in high-throughput sequencing
Griffith et al. Nat. Methods 2010; Alternative expression analysis by RNA sequencing
Trapnell et al. Nat. Biotech. 2010; Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform
Degner et al. Bioinformatics 2009; Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing
Rozowsky et al. Mol. Sys. Bio 2011; AlleleSeq: analysis of allele-specific expression and binding in a
Shalek et al. Nature 2013; Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells
Lonsdale et al. Nature Genetics 2013; The Genotype-Tissue Expression (GTEx) project