CS-E5870 High-Throughput Bioinformatics RNA-seq analysis


Harri Lähdesmäki
Department of Computer Science, Aalto University
September 30, 2016

Acknowledgement to J. Salojärvi and E. Czeizler for the previous version of the slides.

Contents
- Background
- Alignment of RNA-seq data
- Transcriptome assembly
- Gene expression quantification
- Differential gene expression analysis
- Transcript-level analysis

Gene transcription
- The process of making an RNA copy of a gene sequence in DNA

Alternative splicing
- The process of making alternative mRNA molecules from the same precursor RNA
- In humans, 95% of multi-exonic genes are alternatively spliced

Alternative splicing mechanisms
- Alternative splicing happens co-transcriptionally and is largely regulated by splicing factors (proteins) that bind RNA motifs located in the pre-mRNA
- Figure from (Chen & Manley, 2009)

Different types of alternative splicing
- Basic modes of alternative splicing
- Figure from (Cartegni et al, 2002)

RNA-seq
- High-throughput sequencing of RNA (or cDNA) to provide a comprehensive picture of the transcriptome
- Types of RNA molecules

  RNA type                                      Share of total RNA
  ribosomal RNA (rRNA)                          85-90%
  transfer RNA (tRNA)                           10%
  messenger RNA (mRNA)                          1-5%
  micro RNA and other (miRNA, piRNA, etc.)      rest

RNA-seq basic pipeline
1. The RNA population is converted to a library of cDNA fragments with adaptors attached to one or both ends
2. High-throughput sequencing of the cDNA fragment library (single-end or paired-end), read length bp
3. Alignment against a reference genome or transcriptome, transcriptome reconstruction, expression quantification, etc.
- Figure from (Wang et al, 2009)

RNA-seq vs microarrays
- RNA-seq measures any transcripts in a sample, while microarrays measure only genes corresponding to predetermined probes on the array
  - With RNA-seq, there is no need to identify or synthesize probes prior to a measurement
- RNA-seq provides count data, which can be a better proxy for the amount of mRNA in a sample than the fluorescence measurements produced by microarray technology
- RNA-seq provides information about transcript sequence in addition to information about transcript abundance
  - Thus, with RNA-seq it is possible to measure the expression of different alternatively spliced transcripts, which has turned out to be difficult with microarray technology
  - Sequence information also permits the identification of single nucleotide polymorphisms (SNPs), allele-specific expression, and other forms of sequence variation

What can we do with RNA-seq data?
- Transcript assembly
  - Construct full-length transcript sequences from the RNA-seq data (either with or without knowledge of the reference genome)
  - Identify variants
- Transcript quantification
  - Given transcript sequence annotations (reference), estimate the gene expression or the abundances of all different transcripts (alternative transcript isoforms of a gene)
- Differential expression
  - Statistical inference for differential gene expression or alternative splicing

RNA-seq read alignment
- If full-length transcript annotations are known, reads can be aligned to the transcripts exactly as when aligning against a reference genome
- If transcript annotations are not known, approaches similar to those used for aligning DNA sequence reads still work, with some modifications
  - Transcriptomic reads can span exon junctions
  - Transcriptomic reads can contain poly(A) ends (from post-transcriptional RNA processing)
- Figure from (Trapnell & Salzberg, 2009): reads from a processed mRNA with exons A, B and C mapped to the genome

TopHat pipeline
- We will look at TopHat, a commonly used tool for RNA-seq alignment
- All reads are mapped to the reference genome using Bowtie
- Reads that do not map to the genome are set aside as initially unmapped reads (IUM reads)
- Figure from (Trapnell et al, 2009)

TopHat pipeline
- Consensus assembly of the initially mapped reads with the Maq assembler
- For low-quality or low-coverage positions, the reference genome is used to call the base
- Consensus exons are likely missing some amount of sequence at their ends
  - TopHat includes flanking sequence from the reference genome (default = 45 bp)
- Neighboring exons separated by a very short gap are merged into a single exon (lowly transcribed genes can have coverage gaps)
- Figure from (Trapnell et al, 2009)

TopHat pipeline
- To map reads to splice junctions:
  - Enumerate all canonical donor and acceptor sites in the consensus exons
  - Consider all possible pairings (the allowed intron length is an adjustable parameter)
  - For each candidate splice junction, find the initially unmapped reads that span it: seed-and-extend approach
- Figure from (Trapnell et al, 2009)

TopHat pipeline
- Seed-and-extend:
  - Pre-compute an index of reads: a lookup table based on 2k-mer keys, each associated with the reads that contain the 2k-mer in their high-quality region (default k = 5)
  - For a candidate splice junction, concatenate the k bases downstream of the acceptor to the k bases upstream of the donor
  - Query this 2k-mer against the read index (exact seed match, no mismatches allowed)
  - Align the remaining part of the read to the left and right of the exact match (allowing a fixed number of mismatches)
- Figure from (Trapnell et al, 2009), Fig 3: the seed-and-extend alignment used to match reads to possible splice sites. For each possible splice site, a seed is formed by combining a small amount of sequence upstream of the donor and downstream of the acceptor.
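The seed-and-extend steps above can be sketched in a few lines of Python. This is a minimal illustration and not TopHat's implementation: the function names, the 0-based half-open coordinates, and the `max_mismatches` default are assumptions made for the example, and the whole read is indexed rather than only its high-quality region.

```python
from collections import defaultdict


def build_read_index(reads, k):
    """Lookup table: every 2k-mer -> list of (read_id, offset) where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - 2 * k + 1):
            index[read[offset:offset + 2 * k]].append((read_id, offset))
    return index


def count_mismatches(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def reads_spanning_junction(genome, donor_end, acceptor_start, reads, index, k,
                            max_mismatches=2):
    """Ids of reads that align across a candidate junction.

    donor_end: index one past the last exonic base upstream of the intron.
    acceptor_start: index of the first exonic base downstream of the intron.
    (Both conventions are assumptions of this sketch.)
    """
    # Seed: k bases upstream of the donor joined to k bases downstream of the acceptor.
    seed = genome[donor_end - k:donor_end] + genome[acceptor_start:acceptor_start + k]
    hits = []
    for read_id, offset in index.get(seed, []):  # exact seed match only
        read = reads[read_id]
        left_len = offset                         # read bases left of the seed
        right_len = len(read) - offset - 2 * k    # read bases right of the seed
        left_start = donor_end - k - left_len
        right_start = acceptor_start + k
        if left_start < 0 or right_start + right_len > len(genome):
            continue
        # Extend: compare the rest of the read with the flanking exonic sequence.
        mismatches = (
            count_mismatches(read[:offset], genome[left_start:donor_end - k])
            + count_mismatches(read[offset + 2 * k:],
                               genome[right_start:right_start + right_len])
        )
        if mismatches <= max_mismatches:
            hits.append(read_id)
    return hits


# Toy example: two exons separated by a 10 bp intron.
genome = "ACGTACGTAC" + "GTAAAAAAAG" + "TTGCATGCAT"
reads = ["ACGTACTTGCAT", "CGTACGTA", "ACCTACTTGCAT"]
index = build_read_index(reads, k=3)
spanning = reads_spanning_junction(genome, 10, 20, reads, index, k=3)
```

In the toy example the first and third reads cross the junction (the third with one mismatch in the extension), while the second read lies entirely inside the first exon and never matches the seed.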

Transcriptome assembly
- Goal: define a precise map of all transcripts and isoforms that are expressed in a particular sample
- Challenges
  - Gene expression spans several orders of magnitude, with some genes represented by only a few reads
  - Reads can originate from mature mRNA or from incompletely spliced precursor RNA
  - For short reads, it is hard to determine which isoform they were produced from
- Two main classes of methods
  - Genome-independent (de Bruijn graph, see previous lecture)
  - Genome-guided (after read alignment)

Transcriptome reconstruction with Cufflinks
- Genome-guided: takes TopHat spliced alignments as input
- With paired-end RNA-seq data, Bowtie and TopHat produce alignments where the paired reads of the same fragment are treated together as a single alignment
- Figure from (Trapnell et al, 2010), panel a: paired cDNA fragment sequences are mapped to the genome with TopHat, giving spliced fragment alignments

Transcriptome reconstruction with Cufflinks
- Connect fragments in an overlap graph
  - Each fragment (read pair) corresponds to a node
  - There is a directed edge from node x to node y if the alignment for x started at a lower coordinate than y, the alignments overlapped in the genome, and the fragments were compatible (every implied intron in one fragment matched an identical implied intron in the other fragment)
  - If two reads originate from different isoforms, they are likely incompatible
- Figure from (Trapnell et al, 2010), panel b: assembly, overlap graph, mutually incompatible fragments
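The edge rule of the overlap graph can be sketched as follows. This is a simplified illustration rather than Cufflinks code: the `Fragment` record, the half-open coordinates, and the reading of "compatible" as "the two fragments imply exactly the same introns inside their shared region" are assumptions of the sketch.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Fragment(NamedTuple):
    start: int                              # first aligned genomic position
    end: int                                # one past the last aligned position
    introns: FrozenSet[Tuple[int, int]]     # implied introns as (start, end), half-open


def introns_in(frag, lo, hi):
    """Introns of a fragment that intersect the region [lo, hi)."""
    return {(s, e) for (s, e) in frag.introns if s < hi and e > lo}


def overlapping_and_compatible(x, y):
    """True if x and y overlap and imply identical introns in the shared region."""
    lo, hi = max(x.start, y.start), min(x.end, y.end)
    if lo >= hi:
        return False  # the alignments do not overlap
    return introns_in(x, lo, hi) == introns_in(y, lo, hi)


def overlap_graph(fragments):
    """Directed edges (i, j): fragment i starts before j, overlaps it, and is compatible."""
    edges = set()
    for i, x in enumerate(fragments):
        for j, y in enumerate(fragments):
            if x.start < y.start and overlapping_and_compatible(x, y):
                edges.add((i, j))
    return edges


# Toy example: two spliced fragments sharing an intron, and one unspliced fragment.
fragments = [
    Fragment(0, 120, frozenset({(50, 100)})),
    Fragment(40, 150, frozenset({(50, 100)})),
    Fragment(30, 130, frozenset()),
]
edges = overlap_graph(fragments)
```

Here the two spliced fragments are connected, while the unspliced fragment covers the intron as exonic sequence and is therefore incompatible with both.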

Transcriptome reconstruction with Cufflinks
- Construct from the overlap graph a minimal set of transcript isoforms that explains all the fragments
  - Minimum path cover problem
  - Dilworth's theorem: the maximum number of mutually incompatible fragments equals the minimum number of paths covering the whole graph (= the minimum number of transcripts needed to explain all the fragments)
- Figure from (Trapnell et al, 2010): minimum path cover, transcripts
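The number of transcripts given by Dilworth's theorem can be computed with a standard reduction to bipartite matching: the minimum number of chains equals the number of nodes minus the size of a maximum matching in the transitive closure of the graph. The sketch below illustrates that reduction under the assumption that the overlap graph is acyclic and that reachability acts as the partial order; the function names are hypothetical and this is not the Cufflinks implementation.

```python
def transitive_closure(n, edges):
    """reach[u] = set of nodes reachable from u (Warshall's algorithm on sets)."""
    reach = [set() for _ in range(n)]
    for u, v in edges:
        reach[u].add(v)
    for k in range(n):
        for u in range(n):
            if k in reach[u]:
                reach[u] |= reach[k]
    return reach


def minimum_chain_cover_size(n, edges):
    """Minimum number of paths (chains) needed to cover all n nodes of a DAG."""
    reach = transitive_closure(n, edges)
    match_of_right = [None] * n  # match_of_right[v] = left node matched to v

    def try_augment(u, seen):
        # Kuhn's augmenting-path search for left node u.
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_of_right[v] is None or try_augment(match_of_right[v], seen):
                match_of_right[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    return n - matching


# Toy overlap graph: fragments 0 and 1 both precede 2, which precedes 3 and 4.
n_transcripts = minimum_chain_cover_size(5, {(0, 2), (1, 2), (2, 3), (2, 4)})
```

In the toy graph the largest sets of mutually incompatible fragments are {0, 1} and {3, 4}, so two paths are needed, for example 0-2-3 and 1-2-4 (paths may share nodes because the closure is used).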

Gene expression quantification
- Expression of a gene: the sum of the expression of all its isoforms
- Computing isoform abundances can be computationally challenging
- Simplified counting schemes without computing isoform abundances
  - Exon union method: count reads mapped to any of the exons
  - Exon intersection method: count reads mapped to constitutive exons
- Figure from (Garber et al, 2011), panel c: isoforms 1 and 2 under the exon union method and the exon intersection method

Gene expression quantification
- Cost of the simplified models
  - The union model tends to underestimate expression for alternatively spliced genes
  - The intersection model can reduce power for differential expression analysis
- Figure from (Garber et al, 2011), Figure 3: an overview of gene expression quantification with RNA-seq. (a) Illustration of transcripts of different lengths with different read coverage levels (left), the total read counts observed for each transcript (middle), and the FPKM-normalized read counts (right). (b) Reads from alternatively spliced genes may be attributable to a single isoform or to more than one isoform. Reads are color-coded when their isoform of origin is clear; black reads indicate reads with uncertain origin. Isoform expression methods estimate the isoform abundances that best explain the observed read counts under a generative model. Samples near the original maximum likelihood estimate (dashed line) improve the robustness of the estimate and provide a confidence interval around each isoform's abundance.

Gene expression quantification
- Basic idea: the read count corresponds to the expression level
- Basic assumption
  \[ \theta_i = P(\text{sample a read from transcript } i) = \frac{1}{Z} \mu_i l_i, \]
  where
  - $\mu_i$ is the expression level of transcript $i$
  - $l_i$ is the length of transcript $i$
- The normalizing constant is $Z = \sum_i \mu_i l_i$

Gene expression quantification
- Use the frequency estimator to estimate the probability that a read originates from a given transcript $i$
  \[ \hat{\theta}_i = \frac{k_i}{N}, \]
  where
  - $k_i$ is the number of reads mapping to transcript $i$
  - $N$ is the total number of mapped reads
- Convert the estimates into expression values by normalizing by the transcript length
  \[ \hat{\mu}_i \propto \frac{\hat{\theta}_i}{l_i} \]
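A small numerical check makes the model and its estimator concrete. The sketch below is only an illustration with made-up numbers: the function names and the toy expression levels, lengths and counts are hypothetical, and the estimates are rescaled to sum to one because $\hat{\mu}_i$ is defined only up to a constant.

```python
def read_sampling_probs(mu, lengths):
    """theta_i = mu_i * l_i / Z with Z = sum_i mu_i * l_i."""
    z = sum(m * l for m, l in zip(mu, lengths))
    return [m * l / z for m, l in zip(mu, lengths)]


def relative_expression(counts, lengths):
    """Frequency estimator theta_hat_i = k_i / N, then mu_hat_i proportional to theta_hat_i / l_i."""
    n = sum(counts)
    theta_hat = [k / n for k in counts]
    unnormalized = [t / l for t, l in zip(theta_hat, lengths)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]  # rescaled to sum to 1


# Hypothetical transcripts: equal expression for the first two, 4x for the third.
mu = [1.0, 1.0, 4.0]
lengths = [1000, 2000, 500]
theta = read_sampling_probs(mu, lengths)

# Expected read counts for N = 1000 reads under the model.
counts = [round(1000 * t) for t in theta]
mu_hat = relative_expression(counts, lengths)
```

Although the second and third transcripts receive the same number of reads, dividing by length recovers the 1 : 1 : 4 expression ratio.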

Gene expression quantification
- Common units to measure gene expression
  - RPKM: reads per kilobase per million reads
    \[ \mathrm{RPKM}_i = \frac{k_i}{\frac{l_i}{10^3} \cdot \frac{N}{10^6}} = \frac{10^9 \, k_i}{l_i N} \]
  - RPKM is for single-end reads, FPKM is for paired-end reads

Gene expression quantification
- Consider the four transcripts with different lengths and expression levels illustrated in the figure (left)
- Right panel: the read counts normalized by the transcript length using the FPKM metric
  - Transcripts 2 and 4 have comparable read counts, but transcript 2 has a significantly higher normalized expression level
  - After normalization, transcripts 3 and 4 have similar expression values
- Figure from (Garber et al, 2011): four transcripts (short or long, low or high expression) with their read counts and FPKM values
- When the same gene is compared between conditions, read counts normalized by sequencing depth are sufficient, because the transcript length is the same in both conditions
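The RPKM formula translates directly into code. The function name and the numbers below are hypothetical and serve only to show the unit conversion.

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * count / (length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000 bp transcript, 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)

# Same read count on a transcript half as long gives twice the RPKM.
value_short = rpkm(500, 1000, 10_000_000)
```

For the first transcript this is 500 reads / (2 kb x 10 million reads) = 25 RPKM.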

Gene expression quantification
- The above formulation assumes that all reads can be assigned uniquely to a single transcript
- That is generally not true
  - Genes belonging to the same gene family have similar genome/RNA sequences
  - For gene expression estimation, different transcript isoforms can share a large fraction of their exons

Gene expression quantification
- Recall the microarray-based differential expression analysis using t-tests
- Two aspects
  - Expression difference: how large is the expression difference between two groups?
  - Statistical significance: how sure are we that there is a true difference?
- The latter is a statistical question: hypothesis testing

Gene expression quantification
- Sequence count data are found to have a non-Gaussian distribution
  - t-test based methods are not appropriate
- Recall: gene expression quantification using the frequency estimate
  - For a single sample, assume the read counts per unique transcript have a multinomial (sampling) distribution
  - The multinomial distribution can be approximated by a Poisson distribution
- Read counts across replicates are observed to have a larger variance than the Poisson model suggests
  - So-called overdispersed noise
  - The negative binomial distribution has been found to provide a good fit to the data

Gene expression quantification
- Assume that the number of aligned reads in sample $j$ that are assigned to gene $i$ can be modelled by a negative binomial distribution
  \[ K_{ij} \sim \mathrm{NB}(\mu_{ij}, \sigma_{ij}^2) \]
- An alternative parametrization: the number of failures in a sequence of independent and identically distributed Bernoulli trials with success probability $p$ before a specified (non-random) number $r$ of successes occurs
  \[ K_{ij} \sim \mathrm{NB}(r, p), \quad \text{where} \quad p = \frac{\mu_{ij}}{\sigma_{ij}^2}, \quad r = \frac{\mu_{ij}^2}{\sigma_{ij}^2 - \mu_{ij}} \]
- The parametric form
  \[ P(K = k) = \binom{k + r - 1}{r - 1} p^r (1 - p)^k \]

Gene expression quantification
- Estimation of the mean and variance can be difficult because the number of replicates is typically small
- Make further modeling assumptions (Anders & Huber, 2010)
  1. The mean is modeled as the product of the (unknown) expected concentration of fragments from gene $i$ under the condition of sample $j$ and the sequencing depth for sample $j$
     \[ \mu_{ij} = q_{i,\rho(j)} \, s_j \]
  2. The variance is the sum of a shot noise term and a raw variance term
     \[ \sigma_{ij}^2 = \mu_{ij} + s_j^2 \, v_{i,\rho(j)} \]
  3. The raw variance $v_{i,\rho(j)}$ is assumed to be a smooth function of $q_{i,\rho(j)}$
     \[ v_{i,\rho(j)} = v_\rho(q_{i,\rho(j)}) \]
- The last assumption allows us to pool the data from genes with similar expression strength for the purpose of variance estimation
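The conversion between the two parametrizations can be checked numerically. The sketch below is an illustration with hypothetical function names and toy values: it computes $(r, p)$ from a mean and variance, evaluates the probability mass function through log-gamma functions (so that non-integer $r$ is allowed), and builds the mean and variance from assumptions 1 and 2.

```python
import math


def nb_params(mu, var):
    """Convert (mean, variance) to (r, p); requires var > mu (overdispersion)."""
    if var <= mu:
        raise ValueError("negative binomial requires variance > mean")
    return mu * mu / (var - mu), mu / var


def nb_pmf(k, r, p):
    """P(K = k) = C(k + r - 1, r - 1) * p**r * (1 - p)**k, computed in log space."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def mean_and_variance(q, s, v):
    """Assumptions 1 and 2: mu = q * s and sigma^2 = mu + s^2 * v."""
    mu = q * s
    return mu, mu + s * s * v


# Toy check: mean 10 and variance 20 give r = 10, p = 0.5.
r, p = nb_params(10.0, 20.0)
probs = [nb_pmf(k, r, p) for k in range(400)]
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
```

Summing the probability mass function recovers the requested mean and variance, which confirms that the formulas for $p$ and $r$ agree with the parametric form.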

Gene expression quantification
- The model of (Anders & Huber, 2010) can be fitted to the data
- Assume we have $m_A$ replicate samples for biological condition A and $m_B$ samples for condition B
- We would like to test the null hypothesis $q_{iA} = q_{iB}$
  - $q_{iA}$ is the expression strength for condition A (see previous page)
- Define a test statistic
  \[ K_{iS} = K_{iA} + K_{iB}, \quad \text{where} \quad K_{iA} = \sum_{j:\rho(j)=A} K_{ij}, \quad K_{iB} = \sum_{j:\rho(j)=B} K_{ij} \]

Gene expression quantification
- Denote and assume independence (under the null hypothesis)
  \[ p(a, b) = P(K_{iA} = a, K_{iB} = b) = P(K_{iA} = a) \, P(K_{iB} = b) \]
- The p-value of a pair of observed count sums $(k_{iA}, k_{iB})$ is then the sum of all probabilities less than or equal to $p(k_{iA}, k_{iB})$, given that the overall sum is $k_{iS}$:
  \[ p_i = \frac{\sum_{\substack{a + b = k_{iS} \\ p(a,b) \le p(k_{iA}, k_{iB})}} p(a, b)}{\sum_{a + b = k_{iS}} p(a, b)} \]
- Correct the p-values for multiple testing
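The p-value definition can be evaluated by enumerating all splits of the observed total. The sketch below is a minimal illustration, not the DESeq implementation: the function names are hypothetical, the distributions of the count sums $K_{iA}$ and $K_{iB}$ under the null are passed in as ready-made functions, and the toy means and variances are made up.

```python
import math


def nb_pmf(k, mu, var):
    """Negative binomial pmf parametrized by mean and variance (var > mu)."""
    p = mu / var
    r = mu * mu / (var - mu)
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))


def exact_test_pvalue(k_a, k_b, pmf_a, pmf_b):
    """Sum of p(a, b) over a + b = k_a + k_b with p(a, b) <= p(k_a, k_b), normalized."""
    k_s = k_a + k_b
    p_obs = pmf_a(k_a) * pmf_b(k_b)
    total = 0.0
    as_or_more_extreme = 0.0
    for a in range(k_s + 1):
        p_ab = pmf_a(a) * pmf_b(k_s - a)
        total += p_ab
        if p_ab <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            as_or_more_extreme += p_ab
    return as_or_more_extreme / total


# Toy null model: both conditions have mean 10 and variance 20.
def null_pmf(k):
    return nb_pmf(k, 10.0, 20.0)


p_balanced = exact_test_pvalue(10, 10, null_pmf, null_pmf)
p_extreme = exact_test_pvalue(0, 40, null_pmf, null_pmf)
```

A perfectly balanced observation is the most probable split and therefore gets a p-value of 1, while an observation with all reads in one condition gets a very small p-value.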

An example from (Anders & Huber, 2010)
- RNA-Seq in fly embryos in two conditions, A and B
- Variance dependence on the mean: Poisson model (purple), DESeq (orange), edgeR (dashed orange)
- Figure from (Anders & Huber, 2010), Figure 1: dependence of the variance on the mean for condition A in the fly RNA-Seq data. (a) Sample variances plotted against the common-scale means. (b) The same data with the y-axis rescaled to show the squared coefficient of variation (SCV).

Gene expression quantification
- Volcano plot for the fly RNA-seq data from (Anders & Huber, 2010)
- Figure from (Anders & Huber, 2010), Figure 2: type-I error control. The panels show empirical cumulative distribution functions (ECDFs) of p-values from a comparison of one replicate from condition A of the fly RNA-Seq data with the other one. No genes are truly differentially expressed, so the ECDF curves should remain below the diagonal (gray). Rows correspond to DESeq, edgeR, and a Poisson-based test.

Gene expression quantification
- MA plot for the fly RNA-seq data from (Anders & Huber, 2010)
- MA plot: a scatter plot where the x-axis shows the mean and the y-axis shows $\log \frac{\text{Signal in A}}{\text{Signal in B}}$
- Figure from (Anders & Huber, 2010)

Transcript-level expression quantification
- Let us assume that each gene $i$ is associated with $J_i$ transcripts indexed by $j$, then
  \[ \theta_{ij} = P(\text{sample a read from transcript } j \text{ associated with gene } i) = \frac{1}{Z} \mu_{ij} l_{ij}, \]
  where
  - $\mu_{ij}$ is the expression level of transcript $j$ associated with gene $i$
  - $l_{ij}$ is the length of transcript $j$ of gene $i$
- The normalizing constant is $Z = \sum_{i,j} \mu_{ij} l_{ij}$
- The true expression level of gene $i$ is $\mu_i = \sum_{j=1}^{J_i} \mu_{ij}$

Transcript-level expression quantification
- Let us denote the aligned RNA-seq reads as $R_1, R_2, \ldots, R_N$
- Let us also make the unrealistic assumption that all reads are assigned uniquely to one of the transcripts
- Then the frequency estimator gives us
  \[ \hat{\theta}_{ij} = \frac{k_{ij}}{N}, \]
  where $k_{ij}$ is the number of reads assigned uniquely to transcript $j$ of gene $i$
- Correspondingly, we can convert the estimates into expression values by normalizing by the transcript length
  \[ \hat{\mu}_i \propto \sum_j \frac{\hat{\theta}_{ij}}{l_{ij}} = \sum_j \frac{k_{ij}}{l_{ij} N} \]

Transcript-level expression quantification
- Recall the union method for estimating the gene expression level
  \[ k_i = \sum_j k_{ij} \]
  and the length-normalized frequency estimator (omitting the constant factor $1/N$)
  \[ \hat{\theta}_i = \frac{k_i}{l_i}, \]
  where $l_i$ is the length of gene $i$
- The union method tends to underestimate the gene expression level because
  \[ \hat{\theta}_i = \frac{\sum_j k_{ij}}{l_i} = \frac{k_{i1}}{l_i} + \cdots + \frac{k_{iJ_i}}{l_i} \le \frac{k_{i1}}{l_{i1}} + \cdots + \frac{k_{iJ_i}}{l_{iJ_i}}, \]
  where $l_i \ge l_{ij}$

Transcript-level expression quantification
- Consider the simple case of a skipped exon
- Figure from (Katz et al, 2010): an intron, a skipped exon and its flanking constitutive exons, with inclusion reads, exclusion reads and constitutive reads
- We can use e.g. the reads in the skipped exon and the inclusion and exclusion reads together with the frequency estimator to estimate the relative expression of the two transcripts
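The inequality for the union method is easy to see with numbers. The sketch below uses made-up counts and lengths for a hypothetical gene with two isoforms; the function names are illustrative, and the common factor $1/N$ is omitted on both sides.

```python
def union_estimate(isoform_counts, union_length):
    """Union method: all reads of the gene divided by the union exon length l_i."""
    return sum(isoform_counts) / union_length


def isoform_sum_estimate(isoform_counts, isoform_lengths):
    """Transcript-level estimate: sum over isoforms of k_ij / l_ij."""
    return sum(k / l for k, l in zip(isoform_counts, isoform_lengths))


# Hypothetical gene: the union of all exons is 2000 bp, the isoforms are shorter.
counts = [300, 100]            # reads assigned uniquely to isoform 1 and 2
isoform_lengths = [1500, 1000]
union_length = 2000

union = union_estimate(counts, union_length)
by_isoform = isoform_sum_estimate(counts, isoform_lengths)
```

Because every isoform is at most as long as the union of the exons, each term of the isoform-level sum is at least as large as the corresponding term of the union estimate, so `union` (0.2) falls below `by_isoform` (about 0.3).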

Transcript-level expression quantification
- With paired-end reads we can try to use all (including non-uniquely) aligned reads, assuming we can estimate the insert variability
- Figure from (Katz et al, 2010): insert variability and insert length (nt) distribution, paired-end estimate with MISO
- Estimation can be done with Markov chain Monte Carlo (MCMC) sampling (Katz et al, 2010)

References
- Simon Anders & Wolfgang Huber, Differential expression analysis for sequence count data, Genome Biology 11:R106, 2010
- Luca Cartegni, Shern L Chew & Adrian R Krainer, Listening to silence and understanding nonsense: exonic mutations that affect splicing, Nature Reviews Genetics 3, 2002
- Mo Chen & James L Manley, Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches, Nature Reviews Molecular Cell Biology 10, 2009
- Manuel Garber, et al, Computational methods for transcriptome annotation and quantification using RNA-seq, Nature Methods 8, 2011
- Yarden Katz, et al, Analysis and design of RNA sequencing experiments for identifying isoform regulation, Nature Methods 7(12), 2010
- Cole Trapnell, Lior Pachter & Steven L Salzberg, TopHat: discovering splice junctions with RNA-Seq, Bioinformatics 25(9), 2009
- Cole Trapnell & Steven L Salzberg, How to map billions of short reads onto genomes, Nature Biotechnology 27, 2009
- Cole Trapnell, et al, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nature Biotechnology 28, 2010
- Zhong Wang, Mark Gerstein & Michael Snyder, RNA-Seq: a revolutionary tool for transcriptomics, Nature Reviews Genetics 10, 57-63, 2009


More information

Introduction of RNA-Seq Analysis

Introduction of RNA-Seq Analysis Introduction of RNA-Seq Analysis Jiang Li, MS Bioinformatics System Engineer I Center for Quantitative Sciences(CQS) Vanderbilt University September 21, 2012 Goal of this talk 1. Act as a practical resource

More information
