CS-E5870 High-Throughput Bioinformatics RNA-seq analysis
|
|
- Dale Jennings
- 5 years ago
- Views:
Transcription
1 CS-E5870 High-Throughput Bioinformatics RNA-seq analysis Harri Lähdesmäki Department of Computer Science Aalto University September 30, 2016 Acknowledgement for J Salojärvi and E Czeizler for the previous version of the slides Contents Background Alignment of RNA-seq data Transcriptome assembly Gene expression quantification Differential gene expression analysis Transcript-level analysis
2 Gene transcription A process of making an RNA copy of a gene sequence in DNA Figure from Alternative splicing A process of making alternative mrna molecules from the same precursor RNA In humans, 95% of multi-exonic genes are are alternatively spliced Figure from
3 Alternative splicing mechanisms Alternative splicing happens co-transcriptionally and is largely regulated by splicing factors (proteins) that bind RNA motifs located in the pre-mrna Figure from (Chen & Manley, 2009) Different types of alternative splicing Basic modes of alternative splicing Figure from (Cartegni et al, 2002)
4 RNA-seq High-throughput sequencing of RNA (or cdna) to provide a comprehensive picture of the transcriptome Types of RNA molecules ribosomal RNA rrna 85-90% transfer RNA trna 10% mrna messenger RNA 1-5% micro RNA and other mirna, pirna, etc rest RNA-seq basic pipeline 1 RNA population is converted to a library of cdna fragments with adaptors attached to one or both ends 2 High-throughput sequencing for the cdna fragment library (single-end or paired-end), read length bp 3 Alignment against reference genome or transcriptome, transcriptome reconstruction, expression quantification, etc Figure from (Wang et al, 2009)
5 RNA-seq vs microarrays RNA-seq measures any transcripts in a sample while microarrays measure only genes corresponding to predetermined probes on a microarray With RNA-seq, there is no need to identify/synthesize probes prior to a measurement RNA-seq provides count data which can be a better proxy for the amount of mrna in a sample than the fluorescence measurements produced with microarray technology RNA-seq provides information about transcript sequence in addition to information about transcript abundance Thus, with RNA-seq it is possible to measure the expression of different alternatively spliced transcripts that has turned out to be difficult with microarray technology Sequence information also permits the identification of SNPs, allele specific expression, single nucleotide polymorphisms (SNPs), and other forms of sequence variation What can we do with RNA-seq data? Transcript assembly Construct full-length transcript sequences from the RNA-seq data (either with or without the knowledge of the reference genome) Identify variants Transcript quantification Given transcript sequence annotations (reference), estimate the gene expression or abundances of all different transcripts (alternative transcript isoforms for a gene) Differential expression Statistical inference for differential gene expression or alternative splicing
6 Contents Background Alignment of RNA-seq data Transcriptome assembly Gene expression quantification Differential gene expression analysis Transcript-level analysis RNA-seq read alignment If full-length transcript annotations are known, then reads can be aligned exactly as aligning against a reference genome If transcript annotations are not known, still similar approaches as for aligning DNA sequence reads will work with some modifications Transcriptomic reads can span exon junctions Transcriptomic reads can contain poly(a) ends (from post-transcriptional RNA processing) Exon A Exon B Exon C Processed mrna Mapping to genome Figure from (Trapnell & Salzberg, 2009)
7 TopHat pipeline We will look at TopHat, a commonly used tool for RNA-seq alignment All reads are mapped to the reference genome using Bowtie Reads that do not map to the genome are set aside as initially unmapped reads (IUM reads) Figure from (Trapnell et al, 2009) TopHat pipeline Consensus assemble of initially mapped reads with Maq assemble For low-quality or low-coverage positions, use reference genome to call the base Consensus exons are likely missing some amount of sequence at ends TopHat considers flanking sequences from reference genome (default=45bp) Merge neighboring exons with very short gap to a single exon (low transcribed genes can Figure from (Trapnell et al, 2009) have coverage gap)
8 TopHat pipeline To map reads to splice junctions: Enumerate all canonical donor and acceptor sites in consensus exons Consider all possible pairings (allowed intron length is an adjustable parameter) For each candidate splice junction, find initially unmapped reads that span them: seed-and-extend approach 234 _ Figure from (Trapnell et al, 2009) TopHat pipeline 004 _ B3gat1 B3gat B3gat1 BC Seed-and-extend: Pre-compute an index of reads: lookup table based on 2k-mer keys, which are associated with reads that contain the 2k-mer in their high-quality region (default k = 5) For candidate splice junction, concatenate the k bases downstream of the acceptor to the k bases upstream Query this 2k-mer against the read index (exact seed match, no mismatch allowed) Align remaining part of read left and right of the exact match (allowing fixed number of mismatches) erlapped by the 5 -UTR of another transcript Both isoforms are present in the brain tissue RNA sample The top track is the le read coverage reported by ERANGE for this region (Mortazavi et al, 2008) The lack of a large coverage gap causes TopHat ining both exons TopHat looks for introns within single islands in order to detect this junction The figure shows the normalized coverage of ns by uniquely mappable reads as reported by pts are clearly present in the RNA-Seq sample, region as a single island In order to detect such performance and specificity, TopHat looks for deeply sequenced During the island extraction rithm computes the following statistic for each to j in the map: j m=i d m 1 j i nm=0 (1) d m erage at coordinate m in the Bowtie map, and ce genome When scaled to range [0, 1000], malized depth of coverage for an island We nctions tend to fall within islands with high D s looks for junctions contained in islands with er can be changed by the user A high D -value Fig 3 The seed and Figure extendfrom alignment (Trapnell usedet to match al, 2009) reads to possible splice sites For each possible splice site, a seed is formed by combining a small amount of sequence upstream of the donor and downstream of the acceptor Downloaded from bioinformaticsoxfordjournalsorg at Helsinki University of Technology on
9 Contents Background Alignment of RNA-seq data Transcriptome assembly Gene expression quantification Differential gene expression analysis Transcript-level analysis Transcriptome assembly Goal: define precise map of all transcripts and isoforms that are expressed in a particular sample Challenges Gene expression spans several orders of magnitude, with some genes represented by only few reads Reads can originate from mature mrna or from incompletely spliced precursor RNA For short reads, hard to determine from which isoform they were produced Two main classes of methods Genome-independent (de Brujin graph, see previous lecture) Genome-guided (after read alignment)
10 Transcriptome reconstruction with Cufflinks Genome-guided: takes TopHat spliced alignments as input With paired-end RNA-seq data, Bowtie and TopHat produce alignments where paired reads of the same fragment are treated together as single alignment a Map paired cdna fragment sequences to genome TopHat Spliced fragment alignments Figure from (Trapnell et al, 2010) Transcriptome reconstruction with Cufflinks Connect fragments in an overlap graph Each fragment (read pair) corresponds to a node Directed edge from node x to node y if the alignment for x started at a lower coordinate than y, the alignments overlapped in the genome, and the fragments were compatible (every implied intron in one fragment matched an identical implied intron in the other fragment) If two reads originate from different isoforms they are likely incompatible b Assembly Overlap graph Mutually incompatible fragments Figure from (Trapnell et al, 2010)
11 Transcriptome reconstruction with Cufflinks Construct from the overlap graph minimal set of transcript isoforms explaining all the fragments Minimum path cover problem Minimum path cover Dilworth s theorem: maximum number of mutually incompatible fragments equals minimum number of paths covering the whole graph (=minimum number of transcripts needed to explain all the fragments) Transcripts Figure from (Trapnell et al, 2010) Contents Background Alignment of RNA-seq data Transcriptome assembly Gene expression quantification Differential gene expression analysis Transcript-level analysis
12 Gene expression quantification Expression of a gene: sum of the expression of all its isoforms Computing isoform abundances can be computationally challenging Simplified counting schemes without computing isoform abundances Exon union method: count reads mapped to any of the exons Exon intersection method: count reads mapped to constitutive exons c Isoform 1 Exon union method Isoform 2 Exon intersection method Figure from (Garber et al, 2011) Gene expression quantification REVIEW Cost of the simplified models REVIEW The union model tends to underestimate expression for verage across each path to decide alternatively a most likely to originate from the spliced genes The intersection 1 can 2 Low High reduce power for differential expression e similar computational requirepersonal computer Both assem- Short transcript Long transcript analysis 3 4 igh expression levels but differ sed transcripts where Cufflinks b d 10 ersus 25,000) most of which do 4 Exon union model Isoform 1 Transcript model 10 ance threshold used by Scripture 3 Isoform 2 Transcript expression method 10 pplementary Fig 3) In ccontrast, 2 10 s per locus (average of 16 veronly for a handful of transcripts 0 Exon union method 1 Isoform he most extreme case, Scripture 1 Isoform 2 a single locus whereas Cufflinks e gene Likelihood of isoform 2 0% 25% 25% Read count Exon intersection Isoform method 1 Isoform 2 100% Estimated FPKM FPKM True FPKM 10 4 truction Rather than mapping first, genome-independent tranrithms such as transabyss 53 use nsus transcripts Consensus ed to a genome or aligned to a nnotation purposes The central dent approaches is to partition Confidence interval Figure 3 An overview of gene expression quantification with RNA-seq Figure from (Garber et al, 2011) (a) Illustration of transcripts c of different lengths with different read coverage levels (left) as well as total read counts observed for each transcript (middle) and FPKM-normalized read counts (right) (b) Reads from alternatively spliced Isoform genes 1may be attributable to a single isoform or more than one isoform Reads are color-coded when their isoform of origin is clear Black reads indicate reads with uncertain origin Isoform expression methods estimate isoform abundances that best explain the Isoform 2 observed read counts under a generative model Samples near the original maximum likelihood estimate (dashed line) improve the robustness of the estimate and provide a confidence interval around each isoform s abundance Exon union method
13 Gene expression quantification Basic idea: read count corresponds to the expression level Basic assumption θ i = P(sample a read from transcript i) = 1 Z µ il i, where µi is the expression level of transcript i l i is the length of transcript i Normalizing constant is Z = i µ il i Gene expression quantification Use the frequency estimator to estimate the probability that a read originates from a given transcript i ˆθ i = k i N, where k i is the number of reads mapping to transcript i N is the total number of mapped reads Convert the estimates into expression values by normalizing by the transcript length ˆµ i ˆθ i l i
14 Gene expression quantification Common units to measure gene expression RPKM: reads per kilobases per million reads RPKM i = k i l i = 10 9 k i N 10 l i N RPKM is for single-end reads, FPMK is for paired-end reads Gene expression quantification Consider the 4 four transcripts with different lengths and expression levels illustrated below (left) On the right panel: the read counts normalized by the transcript length using the FPKM metric Transcripts 2 and 4 have comparable read-counts, transcript 2 has a significantly higher normalized expression level After normalization, transcripts 3 and 4 have similar expression values a 1 3 Low Short transcript 2 4 High Long transcript Read count FPKM Figure from (Garber et al, 2011) When the same gene is compared between conditions, the read counts (normalized by sequencing depth) are just fine
15 Gene expression quantification The above formulation assumes that all reads can be assigned uniquely to a single transcript That is generally not true Genes belonging to the same gene families have similar genome/rna sequence For gene expression estimate, different transcript isoforms can share a large fraction of their exons Contents Background Alignment of RNA-seq data Transcriptome assembly Gene expression quantification Differential gene expression analysis Transcript-level analysis
16 Gene expression quantification Recall the microarray based differential expression analysis using t-tests Two aspects Expression difference: how large is the expression difference between two groups? Statistical significance: how sure are we that there is a true difference? The latter is a statistical question: hypothesis testing Gene expression quantification Sequence count data is found to have non-gaussian distribution t-test based methods are not appropriate Recall: gene expression quantification using the frequency estimate For a single sample, assume read counts per unique transcripts have a multinomial (sampling) distribution Multinomial distribution can be approximated by Poisson distribution Read counts across replicates is observed to have a larger variance than what Poisson model suggests So-called overdispersed noise Negative binomial has been found to provide a good fit to the data
17 Gene expression quantification Assume that the number of aligned reads in sample j that are assigned to gene i can be modelled by negative binomial distribution K ij NB(µ ij, σ 2 ij) An alternative parametrization: the number of successes in a sequence of independent and identically distributed Bernoulli trials with probability p before a specified (non-random) number r of failures occurs K ij NB(r, p), where p = µ ij σ 2 ij r = µ2 ij σ 2 ij µ The parametric form P(K = k) = ( ) k + r 1 p r (1 p) k r 1 Gene expression quantification Estimation of mean and variance can be difficult because number of replicates is typically small Make further modeling assumptions (Anders & Huber, 2010) 1 Mean is modeled as product of (unknown) expected concentration of fragments from gene i under the condition of sample j and the sequencing depth for sample j µ ij = q i,ρ(j) s j 2 Variance is sum of shot noise and raw variance term σ 2 ij = µ ij + s 2 j v i,ρ(j) 3 The raw variance v i,ρ(j) is assumed to be a smooth function of q i,ρ(j) v i,ρ(j) = v ρ (q i,ρ(j) ) The last assumption allows us to pool the data from genes with similar expression strength for the purpose of variance estimation
18 Gene expression quantification The model of (Anders & huber, 2010) can be fitted to the data Assume we have m A replicate samples for biological condition A and m B samples for condition B We would like to test the null hypothesis of q i,a = q i,b q i,a is the expression strength for condition A (see previous page) Define a test statistic K is = K ia + K ib,where K ia = K ij, K ib = j:ρ(j)=a j:ρ(j)=b K ij Gene expression quantification Denote and assume independence (under the null hypothesis) p(a, b) =P(K ia = a, K ib = b) =P(K ia = a)p(k ib = b) The p-value of a pair of observed count sums (k ia, k ib )isthen the sum of all probabilities less or equal to p(k ia, k ib ), given that the overall sum is k is : p(a, b) p i = a + b = k is p(a, b) p(k ia, k ib ) p(a, b) a+b=k is (1) Correct p-values for multiple testing
19 1 000! 002! An example (Anders Wolfgang, 2010) Anders andfrom Huber Genome Biology& 2010, 11:R106 x7 RNA-Seq in fly embryos in two conditions, A and B Variance dependence on the mean: Poisson model (purple), orange (DEseq), dashed orange (edger) density! Gene expression quantification log10 mean 10^2 10^3 10^4 mean e SCV for the neural cell data Comes 1 and S9 See caption to Figure 1 Figure S6: Same plot as Figure 4, now for the neural cell data Again, red is DESeq s and blue edger s result (light blue, using read count sum for library size adjustment; dark blue, using Equation (5); solid, with common dispersion; dashed, with tagwise dispersion) Figure from (Anders & Wolfgang, 2010) Figure 1 Dependence of the variance on the mean for condition A in the fly RNA-Seq data (a) The scatter plot sh sample variances (Equation (7)) plotted against the common-scale means (Equation (6)) The orange line is the fit w(q) The p variance implied by the Poisson distribution for each of the two samples, that is, ^s j q^ i, A The dashed orange line is the varia edger (b) Same data as in (a), with the y-axis rescaled to show the squared coefficient of variation (SCV), that is all quantities square of the mean In (b), the solid orange line incorporated the bias correction described in Supplementary Note C in Add only shows SCV values in the range [0, 02] For a zoom-out to the full range, see Supplementary Figure S9 in Additional file Gene expression quantification! Figure S7: Type-I error control for the neural cell data Compare Volcano with plotfigure for 2the fly RNA-seq data from (Anders & Wolfgang, 2010) ential expression between conditions ll data Compare to Figure Empirical CDF obtained from each library var36 millions A good fraction of ample, from 32% to 53%) could ed to annotated genes, and End the data in a table of counts six samples and 18,760 rows, he di erential expression analye four GNS samples (excluding pathology and using only one of he same patient) is taken from a di erent sub- 008 Empirical CDF dealing with very di erent noise same analysis here for the Tagm et al [18] The data set comures derived from glioblastomas (condition GNS ) with two om non-cancerous neural stem p value Figure from (Anders & Wolfgang, 2010) p value 0 Figure S8: Volcano plot for comparison of conditions A and Figure 2 Type-I error control Thethe panels show empirical cumulative distribution functions (ECDFs) for P values from B in the flycondition RNA-Seq data replicate from A of the fly RNA-Seq data with the other one No genes are truly differentially expressed, and the should remain below the diagonal (gray) Panel (a): top row corresponds to DESeq, middle row to edger and bottom row test The right column shows the distributions for all genes, for genes the left and middle columns show them separately mean of 100 Panel (b) shows the same data, but zooms into the range of small P values The plots indicate that edger an 3 error at (and in fact slightly below) the nominal rate, while the Poisson-based c2 test fails to do so edger has an excess o counts: the blue line lies above the diagonal This excess is, however, compensated by the method being more conservat
20 Gene expression quantification MA plot for the fly RNA-seq data from (Anders & Wolfgang, 2010) MA plot: a scatter plot where x-axis shows mean and y-axis shows log Signal in A Signal in B Figure from (Anders & Wolfgang, 2010) Contents Background Alignment of RNA-seq data Transcriptome assembly Gene expression quantification Differential gene expression analysis Transcript-level analysis
21 Transcript-level expression quantification Let us assume that each gene i is associated with J i transcripts indexed by j, then θ ij = P(sample a read from transcript j associated with gene i) = 1 Z µ ijl ij, where µ ij is the expression level of transcript j associated with gene j lij is the length of transcript j of gene i Normalizing constant is Z = ij µ ijl ij The true expression level of gene i is µ i = J i j=1 µ ij Transcript-level expression quantification Lets denote the aligned RNA-seq reads as R 1, R 2,,R N Let us also make an unrealistic assumption that all reads are assigned uniquely to one of the transcripts Then the frequency estimator gives us ˆθ ij = k ij N, where k ij is the number of reads assigned uniquely to µ ij Correspondingly, we can convert the estimates into expression values by normalizing by the transcript length ˆµ ij j ˆθ ij l ij = j k ij l ij N
22 Transcript-level expression quantification Recall the union method for estimating the gene expression level k i = k ij j and the frequency estimator ˆθ i = k i l i, where l i is the length of the gene i Union method tends to underestimate the gene expression level because ˆθ i = j k ij l i = k i1 l i k i1 l i1 + + k ij i l iji, + + k ij i l i where l i l ij Transcript-level expression quantification Consider a simple case of skipped exon Intron Skipped exon Constitutive exons Inclusion reads Constitutive reads Exclusion reads Constitutive reads Figure from (Katz et al, 2010) We can use eg the reads in the skipped exon and the inclusion and exclusion reads together with the frequency estimator to estimate the relative expression of the two transcripts
23 Transcript-level expression quantification With paired end reads we can try to use all (non-uniquely) aligned reads assuming we can estimate insert variability c Insert variability: = d Insert length (nt) Paired-end estimate, MISO Figure from (Katz et al, 2010) Estimation can be done Markov chain Monte Carlo (MCMC) sampling (Katz et al, 2010) References Simon Anders & Wolfgang Huber, Differential expression analysis for sequence count data, Genome Biology, 11:R106, 2010 Cartegni, L, Chew, S L, & Krainer, A R Listening to silence and understanding nonsense: exonic mutations that affect splicing Nature Reviews Genetics 3, (2002) Mo Chen & James L Manley, Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches Nature Reviews Molecular Cell Biology 10, , 2009 Manuel Garber, et al, Computational methods for transcriptome annotation and quantification using RNA-seq, Nature Methods 8, , 2011 Katz Y, et al, Analysis and design of rna sequencing experiments for identifying isoform regulation, Nature Methods, 7(12): , 2010 Cole Trapnell, Lior Pachter and Steven L Salzberg, TopHat: discovering splice junctions with RNA-Seq, Bioinformatics, 25(9): , 2009 Cole Trapnell & Steven L Salzberg, How to map billions of short reads onto genomes, Nature Biotechnology 27, , 2009 Cole Trapnell, et al, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nature Biotechnology 28, , 2010 Zhong Wang, Mark Gerstein & Michael Snyder, RNA-Seq: a revolutionary tool for transcriptomics Nature Reviews Genetics 10, 57-63, 2009
Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ),
Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ), 2012-01-26 What is a gene What is a transcriptome History of gene expression assessment RNA-seq RNA-seq analysis
More informationChIP-seq and RNA-seq
ChIP-seq and RNA-seq Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions (ChIPchromatin immunoprecipitation)
More informationChIP-seq and RNA-seq. Farhat Habib
ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions
More informationIntroduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013
Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Abundance RNA is... Diverse Dynamic Central DNA rrna Epigenetics trna RNA mrna Time Protein Abundance
More informationStatistical Genomics and Bioinformatics Workshop. Genetic Association and RNA-Seq Studies
Statistical Genomics and Bioinformatics Workshop: Genetic Association and RNA-Seq Studies RNA Seq and Differential Expression Analysis Brooke L. Fridley, PhD University of Kansas Medical Center 1 Next-generation
More informationTranscriptome analysis
Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize
More informationCS-E5870 High-Throughput Bioinformatics RNA-seq analysis
CS-E5870 High-Throughput Bioinformatics RNA-seq analysis Harri Lähdesmäki Department of Computer Science Aalto University November 14, 2017 Acknowledgement for J. Salojärvi and E. Czeizler for the previous
More informationAnalysis of RNA-seq Data
Analysis of RNA-seq Data A physicist and an engineer are in a hot-air balloon. Soon, they find themselves lost in a canyon somewhere. They yell out for help: "Helllloooooo! Where are we?" 15 minutes later,
More informationDeep sequencing of transcriptomes
1 / 40 Deep sequencing of transcriptomes An introduction to RNA-seq Michael Dondrup UNI BCCS 2. november 2010 2 / 40 Transcriptomics by Ultra-Fast Sequencing Microarrays have been the primary transcriptomics
More informationless sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput
Chapter 11: Gene Expression The availability of an annotated genome sequence enables massively parallel analysis of gene expression. The expression of all genes in an organism can be measured in one experiment.
More informationRNA-Sequencing analysis
RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges
More informationRNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS)
RNA-sequencing Next Generation sequencing analysis 2016 Anne-Mette Bjerregaard Center for biological sequence analysis (CBS) Terms and definitions TRANSCRIPTOME The full set of RNA transcripts and their
More informationMapping strategies for sequence reads
Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements
More informationmeasuring gene expression December 5, 2017
measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA
More informationBackground Wikipedia Lee and Mahadavan, JCB, 2009 History (Platform Comparison) P Park, Nature Review Genetics, 2009 P Park, Nature Reviews Genetics, 2009 Rozowsky et al., Nature Biotechnology, 2009
More informationRNA-Seq with the Tuxedo Suite
RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with
More informationRNA-SEQUENCING ANALYSIS
RNA-SEQUENCING ANALYSIS Joseph Powell SISG- 2018 CONTENTS Introduction to RNA sequencing Data structure Analyses Transcript counting Alternative splicing Allele specific expression Discovery APPLICATIONS
More informationmeasuring gene expression December 11, 2018
measuring gene expression December 11, 2018 Intervening Sequences (introns): how does the cell get rid of them? Splicing!!! Highly conserved ribonucleoprotein complex recognizes intron/exon junctions and
More informationSequence Analysis 2RNA-Seq
Sequence Analysis 2RNA-Seq Lecture 10 2/21/2018 Instructor : Kritika Karri kkarri@bu.edu Transcriptome Entire set of RNA transcripts in a given cell for a specific developmental stage or physiological
More informationEucalyptus gene assembly
Eucalyptus gene assembly ACGT Plant Biotechnology meeting Charles Hefer Bioinformatics and Computational Biology Unit University of Pretoria October 2011 About Eucalyptus Most valuable and widely planted
More informationRNA
RNA sequencing Michael Inouye Baker Heart and Diabetes Institute Univ of Melbourne / Monash Univ Summer Institute in Statistical Genetics 2017 Integrative Genomics Module Seattle @minouye271 www.inouyelab.org
More informationCSE 549: RNA-Seq aided gene finding
CSE 549: RNA-Seq aided gene finding Finding Genes We ll break gene finding methods into 3 main categories. ab initio latin from the beginning w/o experimental evidence comparative make use of knowledge
More informationSystematic evaluation of spliced alignment programs for RNA- seq data
Systematic evaluation of spliced alignment programs for RNA- seq data Pär G. Engström, Tamara Steijger, Botond Sipos, Gregory R. Grant, André Kahles, RGASP Consortium, Gunnar Rätsch, Nick Goldman, Tim
More informationAnalysis of RNA-seq Data. Bernard Pereira
Analysis of RNA-seq Data Bernard Pereira The many faces of RNA-seq Applications Discovery Find new transcripts Find transcript boundaries Find splice junctions Comparison Given samples from different experimental
More informationFinding Genes with Genomics Technologies
PLNT2530 Plant Biotechnology (2018) Unit 7 Finding Genes with Genomics Technologies Unless otherwise cited or referenced, all content of this presenataion is licensed under the Creative Commons License
More informationMachine Learning. HMM applications in computational biology
10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly
More informationRNA-Seq Analysis. Simon Andrews, Laura v
RNA-Seq Analysis Simon Andrews, Laura Biggins simon.andrews@babraham.ac.uk @simon_andrews v2018-10 RNA-Seq Libraries rrna depleted mrna Fragment u u u u NNNN Random prime + RT 2 nd strand synthesis (+
More informationDavid M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis
David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis Outline RNA-Seq for differential expression analysis Statistical methods for RNA-Seq: Structure
More informationGeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference owing to RNA-seq reads misalignment
GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference owing to RNA-seq reads misalignment Zhaojun Zhang, Shunping Huang, Jack Wang, Xiang Zhang, Fernando Pardo
More informationFrom reads to results: differential. Alicia Oshlack Head of Bioinformatics
From reads to results: differential expression analysis with ihrna seq Alicia Oshlack Head of Bioinformatics Murdoch Childrens Research Institute Benefits and opportunities ii of RNA seq All transcripts
More informationIntro to RNA-seq. July 13, 2015
Intro to RNA-seq July 13, 2015 Goal of the course To be able to effectively design, and interpret genomic studies of gene expression. We will focus on RNA-seq, but the class will provide a foothold into
More information1. Introduction Gene regulation Genomics and genome analyses
1. Introduction Gene regulation Genomics and genome analyses 2. Gene regulation tools and methods Regulatory sequences and motif discovery TF binding sites Databases 3. Technologies Microarrays Deep sequencing
More informationDifferential expression analysis for sequencing count data. Simon Anders
Differential expression analysis for sequencing count data Simon Anders Two applications of RNA-Seq Discovery find new transcripts find transcript boundaries find splice junctions Comparison Given samples
More informationIntroduction of RNA-Seq Analysis
Introduction of RNA-Seq Analysis Jiang Li, MS Bioinformatics System Engineer I Center for Quantitative Sciences(CQS) Vanderbilt University September 21, 2012 Goal of this talk 1. Act as a practical resource
More informationCS-E5870 High-Throughput Bioinformatics Microarray data analysis
CS-E5870 High-Throughput Bioinformatics Microarray data analysis Harri Lähdesmäki Department of Computer Science Aalto University September 20, 2016 Acknowledgement for J Salojärvi and E Czeizler for the
More informationWhole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist
Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data
More informationDe novo assembly in RNA-seq analysis.
De novo assembly in RNA-seq analysis. Joachim Bargsten Wageningen UR/PRI/Plant Breeding October 2012 Motivation Transcriptome sequencing (RNA-seq) Gene expression / differential expression Reconstruct
More informationNGS Data Analysis and Galaxy
NGS Data Analysis and Galaxy University of Pretoria Pretoria, South Africa 14-18 October 2013 Dave Clements, Emory University http://galaxyproject.org/ Fourie Joubert, Burger van Jaarsveld Bioinformatics
More informationExperimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis
-Seq Analysis Quality Control checks Reproducibility Reliability -seq vs Microarray Higher sensitivity and dynamic range Lower technical variation Available for all species Novel transcript identification
More informationNext Generation Sequencing
Next Generation Sequencing Complete Report Catalogue # and Service: IR16001 rrna depletion (human, mouse, or rat) IR11081 Total RNA Sequencing (80 million reads, 2x75 bp PE) Xxxxxxx - xxxxxxxxxxxxxxxxxxxxxx
More informationSCIENCE CHINA Life Sciences. Comparative analysis of de novo transcriptome assembly
SCIENCE CHINA Life Sciences SPECIAL TOPIC February 2013 Vol.56 No.2: 156 162 RESEARCH PAPER doi: 10.1007/s11427-013-4444-x Comparative analysis of de novo transcriptome assembly CLARKE Kaitlin 1, YANG
More informationHigh performance sequencing and gene expression quantification
High performance sequencing and gene expression quantification Ana Conesa Genomics of Gene Expression Lab Centro de Investigaciones Príncipe Felipe Valencia aconesa@cipf.es Next Generation Sequencing NGS
More information10/06/2014. RNA-Seq analysis. With reference assembly. Cormier Alexandre, PhD student UMR8227, Algal Genetics Group
RNA-Seq analysis With reference assembly Cormier Alexandre, PhD student UMR8227, Algal Genetics Group Summary 2 Typical RNA-seq workflow Introduction Reference genome Reference transcriptome Reference
More informationTECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA
TECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA Stranded, Illumina ready library construction in
More informationRNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford
RNAseq Applications in Genome Studies Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Protocols Next generation sequencing protocol cdna, not RNA sequencing
More informationIntroduction to RNA sequencing
Introduction to RNA sequencing Bioinformatics perspective Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden November 2017 Olga (NBIS) RNA-seq November 2017 1 / 49 Outline Why sequence
More informationQuantifying gene expression
Quantifying gene expression Genome GTF (annotation)? Sequence reads FASTQ FASTQ (+reference transcriptome index) Quality control FASTQ Alignment to Genome: HISAT2, STAR (+reference genome index) (known
More informationMODULE 5: TRANSLATION
MODULE 5: TRANSLATION Lesson Plan: CARINA ENDRES HOWELL, LEOCADIA PALIULIS Title Translation Objectives Determine the codons for specific amino acids and identify reading frames by looking at the Base
More informationGene Expression Technology
Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene
More informationDNA polymorphisms and RNA-Seq alternative splicing blow bubbles in de Bruijn Graphs
DNA polymorphisms and RNA-Seq alternative splicing blow bubbles in de Bruijn Graphs Nadia Pisanti University of Pisa & Leiden University Outline New Generation Sequencing (NGS), and the importance of detecting
More informationCanadian Bioinforma3cs Workshops
Canadian Bioinforma3cs Workshops www.bioinforma3cs.ca Module #: Title of Module 2 1 Module 3 Expression and Differen3al Expression (lecture) Obi Griffith & Malachi Griffith www.obigriffith.org ogriffit@genome.wustl.edu
More informationApplications of short-read
Applications of short-read sequencing: RNA-Seq and ChIP-Seq BaRC Hot Topics March 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ Sequencing applications RNA-Seq includes experiments
More informationIntroduction to RNAseq Analysis. Milena Kraus Apr 18, 2016
Introduction to RNAseq Analysis Milena Kraus Apr 18, 2016 Agenda What is RNA sequencing used for? 1. Biological background 2. From wet lab sample to transcriptome a. Experimental procedure b. Raw data
More informationIntroduction to transcriptome analysis using High Throughput Sequencing technologies. D. Puthier 2012
Introduction to transcriptome analysis using High Throughput Sequencing technologies D. Puthier 2012 A typical RNA-Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,
More informationRNA standards v May
Standards, Guidelines and Best Practices for RNA-Seq: 2010/2011 I. Introduction: Sequence based assays of transcriptomes (RNA-seq) are in wide use because of their favorable properties for quantification,
More informationOptimal Calculation of RNA-Seq Fold-Change Values
International Journal of Computational Bioinformatics and In Silico Modeling Vol. 2, No. 6 (2013): 285-292 Research Article Open Access ISSN: 2320-0634 Optimal Calculation of RNA-Seq Fold-Change Values
More informationGenomic resources. for non-model systems
Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing
More informationSequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq
Sequencing applications Applications of short-read sequencing: RNA-Seq and ChIP-Seq BaRC Hot Topics March 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ RNA-Seq includes experiments
More informationCalculating Sample Size Estimates for RNA Sequencing Data ABSTRACT
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 20, Number 12, 2013 # Mary Ann Liebert, Inc. Pp. 970 978 DOI: 10.1089/cmb.2012.0283 Calculating Sample Size Estimates for RNA Sequencing Data STEVEN N. HART, 1 TERRY
More informationSingle-cell sequencing
Single-cell sequencing Harri Lähdesmäki Department of Computer Science Aalto University December 5, 2017 Contents Background & Motivation Single cell sequencing technologies Single cell sequencing data
More informationSMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA
SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA The most sensitive cdna synthesis technology, combined with next-generation
More informationIntroduction to RNA-Seq in GeneSpring NGS Software
Introduction to RNA-Seq in GeneSpring NGS Software Dipa Roy Choudhury, Ph.D. Strand Scientific Intelligence and Agilent Technologies Learn more at www.genespring.com Introduction to RNA-Seq In a few years,
More informationRNAseq Differential Gene Expression Analysis Report
RNAseq Differential Gene Expression Analysis Report Customer Name: Institute/Company: Project: NGS Data: Bioinformatics Service: IlluminaHiSeq2500 2x126bp PE Differential gene expression analysis Sample
More informationRNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University
RNA-Seq Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University joshua.ainsley@tufts.edu Day five Alternative splicing Assembly RNA edits Alternative splicing
More informationSUPPLEMENTARY INFORMATION
doi:1.138/nature11233 Supplementary Figure S1 Sample Flowchart. The ENCODE transcriptome data are obtained from several cell lines which have been cultured in replicates. They were either left intact (whole
More informationTranscriptomics analysis with RNA seq: an overview Frederik Coppens
Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)
More informationBasics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility
2018 ABRF Meeting Satellite Workshop 4 Bridging the Gap: Isolation to Translation (Single Cell RNA-Seq) Sunday, April 22 Basics of RNA-Seq (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly,
More informationIntegrative Genomics 1a. Introduction
2016 Course Outline Integrative Genomics 1a. Introduction ggibson.gt@gmail.com http://www.cig.gatech.edu 1a. Experimental Design and Hypothesis Testing (GG) 1b. Normalization (GG) 2a. RNASeq (MI) 2b. Clustering
More informationYear III Pharm.D Dr. V. Chitra
Year III Pharm.D Dr. V. Chitra 1 Genome entire genetic material of an individual Transcriptome set of transcribed sequences Proteome set of proteins encoded by the genome 2 Only one strand of DNA serves
More informationSupplementary Information for Single-cell sequencing of the small-rna transcriptome
Supplementary Information for Single-cell sequencing of the small-rna transcriptome Omid R. Faridani 1,6,*, Ilgar Abdullayev 1,2,6, Michael Hagemann-Jensen 1,3, John P. Schell 4, Fredrik Lanner 4,5 and
More informationFrom reads to results: differen1al expression analysis with RNA seq. Alicia Oshlack Bioinforma1cs Division Walter and Eliza Hall Ins1tute
From reads to results: differen1al expression analysis with RNA seq Alicia Oshlack Bioinforma1cs Division Walter and Eliza Hall Ins1tute Purported benefits and opportuni1es of RNA seq All transcripts are
More informationVM origin. Okeanos: Image Trinity_U16 (upgrade to Ubuntu16.04, thanks to Alexandros Dimopoulos) X2go: LXDE
VM origin Okeanos: Image Trinity_U16 (upgrade to Ubuntu16.04, thanks to Alexandros Dimopoulos) X2go: LXDE NGS intro + Genome-Based Transcript Reconstruction and Analysis Using RNA-Seq Data Based on material
More informationRNA Sequencing Analyses & Mapping Uncertainty
RNA Sequencing Analyses & Mapping Uncertainty Adam McDermaid 1/26 RNA-seq Pipelines Collection of tools for analyzing raw RNA-seq data Tier 1 Quality Check Data Trimming Tier 2 Read Alignment Assembly
More informationHomework 4. Due in class, Wednesday, November 10, 2004
1 GCB 535 / CIS 535 Fall 2004 Homework 4 Due in class, Wednesday, November 10, 2004 Comparative genomics 1. (6 pts) In Loots s paper (http://www.seas.upenn.edu/~cis535/lab/sciences-loots.pdf), the authors
More informationIntroduction to Bioinformatics and Gene Expression Technologies
Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a
More informationIntroduction to Bioinformatics and Gene Expression Technologies
Vocabulary Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 Gene: Genetics: Genome: Genomics: hereditary
More informationCollect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018
Annotation Annotation for D. virilis Chris Shaffer July 2012 l Big Picture of annotation and then one practical example l This technique may not be the best with other projects (e.g. corn, bacteria) l
More informationIntegrated NGS Sample Preparation Solutions for Limiting Amounts of RNA and DNA. March 2, Steven R. Kain, Ph.D. ABRF 2013
Integrated NGS Sample Preparation Solutions for Limiting Amounts of RNA and DNA March 2, 2013 Steven R. Kain, Ph.D. ABRF 2013 NuGEN s Core Technologies Selective Sequence Priming Nucleic Acid Amplification
More informationSO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS
SO YOU WANT TO DO A: RNA-SEQ EXPERIMENT MATT SETTLES, PHD UNIVERSITY OF CALIFORNIA, DAVIS SETTLES@UCDAVIS.EDU Bioinformatics Core Genome Center UC Davis BIOINFORMATICS.UCDAVIS.EDU DISCLAIMER This talk/workshop
More informationTranscriptomics. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona
Transcriptomics Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Central dogma of molecular biology Central dogma of molecular biology Genome Complete DNA content of
More informationRNA-Seq analysis using R: Differential expression and transcriptome assembly
RNA-Seq analysis using R: Differential expression and transcriptome assembly Beibei Chen Ph.D BICF 12/7/2016 Agenda Brief about RNA-seq and experiment design Gene oriented analysis Gene quantification
More informationCBC Data Therapy. Metatranscriptomics Discussion
CBC Data Therapy Metatranscriptomics Discussion Metatranscriptomics Extract RNA, subtract rrna Sequence cdna QC Gene expression, function Institute for Systems Genomics: Computational Biology Core bioinformatics.uconn.edu
More informationA normalization method based on variance and median adjustment for massive mrna polyadenylation data
ISSN : 0974-7435 Volume 8 Issue 4 A normalization method based on variance and median adjustment for massive mrna polyadenylation data Guoli Ji, Ying Wang, Mingchen Wu, Yangzi Zhang, Xiaohui Wu* Department
More informationBenchmarking of RNA-seq data processing pipelines using whole transcriptome qpcr expression data
Benchmarking of RNA-seq data processing pipelines using whole transcriptome qpcr expression data Jan Hellemans 7th international qpcr & NGS Event - Freising March 24 th, 2015 Therapeutics lncrna oncology
More informationCollect, analyze and synthesize. Annotation. Annotation for D. virilis. Evidence Based Annotation. GEP goals: Evidence for Gene Models 08/22/2017
Annotation Annotation for D. virilis Chris Shaffer July 2012 l Big Picture of annotation and then one practical example l This technique may not be the best with other projects (e.g. corn, bacteria) l
More informationIntroduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute
Introduction to Microarray Data Analysis and Gene Networks Alvis Brazma European Bioinformatics Institute A brief outline of this course What is gene expression, why it s important Microarrays and how
More informationSUPPLEMENTARY FIGURES AND TABLES
SUPPLEMENTARY FIGURES AND TABLES Supplementary Figure S1: Plot shows the percentage of initially identified backsplice junctions against the Read depth value. Junctional circrna candidate have one mate
More informationNature Methods: doi: /nmeth Supplementary Figure 1. DMS-MaPseq data are highly reproducible at elevated DMS concentrations.
Supplementary Figure 1 DMS-MaPseq data are highly reproducible at elevated DMS concentrations. a, Correlation of Gini index for 202 yeast mrna regions with 15x coverage at 2.5% or 5% v/v DMS concentrations
More informationFrom Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow
From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with
More informationGenomics. Data Analysis & Visualization. Camilo Valdes
Genomics Data Analysis & Visualization Camilo Valdes cvaldes3@miami.edu https://github.com/camilo-v Center for Computational Science, University of Miami ccs.miami.edu Today Sequencing Technologies Background
More informationTECH NOTE Stranded NGS libraries from FFPE samples
TECH NOTE Stranded NGS libraries from FFPE samples Robust performance with extremely degraded FFPE RNA (DV 200 >25%) Consistent library quality across a range of input amounts (5 ng 50 ng) Compatibility
More informationWheat CAP Gene Expression with RNA-Seq
Wheat CAP Gene Expression with RNA-Seq July 9 th -13 th, 2018 Overview of the workshop, Alina Akhunova http://www.ksre.k-state.edu/igenomics/workshops/ RNA-Seq Workshop Activities Lectures Laboratory Molecular
More informationAnalysis of RNA-seq Data. Feb 8, 2017 Peikai CHEN (PHD)
Analysis of RNA-seq Data Feb 8, 2017 Peikai CHEN (PHD) Outline What is RNA-seq? What can RNA-seq do? How is RNA-seq measured? How to process RNA-seq data: the basics How to visualize and diagnose your
More informationBi 8 Lecture 5. Ellen Rothenberg 19 January 2016
Bi 8 Lecture 5 MORE ON HOW WE KNOW WHAT WE KNOW and intro to the protein code Ellen Rothenberg 19 January 2016 SIZE AND PURIFICATION BY SYNTHESIS: BASIS OF EARLY SEQUENCING complex mixture of aborted DNA
More informationAnnotation of contig27 in the Muller F Element of D. elegans. Contig27 is a 60,000 bp region located in the Muller F element of the D. elegans.
David Wang Bio 434W 4/27/15 Annotation of contig27 in the Muller F Element of D. elegans Abstract Contig27 is a 60,000 bp region located in the Muller F element of the D. elegans. Genscan predicted six
More informationConsensus Ensemble Approaches Improve De Novo Transcriptome Assemblies
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department
More informationIntroduction to Microarray Analysis
Introduction to Microarray Analysis Methods Course: Gene Expression Data Analysis -Day One Rainer Spang Microarrays Highly parallel measurement devices for gene expression levels 1. How does the microarray
More informationCourse Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses
Course Information Introduction to Algorithms in Computational Biology Lecture 1 Meetings: Lecture, by Dan Geiger: Mondays 16:30 18:30, Taub 4. Tutorial, by Ydo Wexler: Tuesdays 10:30 11:30, Taub 2. Grade:
More informationUser s Manual Version 1.0
User s Manual Version 1.0 University of Utah School of Medicine Department of Bioinformatics 421 S. Wakara Way, Salt Lake City, Utah 84108-3514 http://genomics.chpc.utah.edu/cas Contact us at issue.leelab@gmail.com
More informationThe first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks.
Open Seqmonk Launch SeqMonk The first thing you will see is the opening page. SeqMonk scans your copy and make sure everything is in order, indicated by the green check marks. SeqMonk Analysis Page 1 Create
More information