1

2

3 Background

4

5 Wikipedia

6

7 Lee and Mahadevan, JCB, 2009

8 History (Platform Comparison)

9 P Park, Nature Reviews Genetics, 2009

10 P Park, Nature Reviews Genetics, 2009

11 Rozowsky et al., Nature Biotechnology, 2009

12 Chromatin Immunoprecipitation (ChIP)

13 PJ Farnham, 2009

14

15 VS.

16 P Park, 2009

17 [Figure: transcription factors (TF1, TF2, TF3, TF4) bound to DNA; region of DNA isolated via ChIP]

18 PJ Farnham, Nature Reviews Genetics, 2009

19 Reference Samples

20

21 Auerbach et al, PNAS, 2009

22 Bioinformatics

23

24

25

26

27

28

29 Pepke et al., Nature Methods, 2009

30 Rozowsky et al., Nature Biotechnology, 2009

31 P Park, 2009

32 Transcriptome Analysis using RNA-Seq. CBB752, Lukas Habegger, 04/07/2010.

33 Outline: Background; Comparison of RNA-Seq to previous methods; Informatics (mapping of RNA-Seq reads, calculating gene expression values); Transcriptome analysis of human embryonic stem cells undergoing neural differentiation.

34 Background: The transcriptome is the complete set of transcripts in a cell population for a specific developmental stage or physiological condition. Understanding the transcriptome is essential to interpret the functional elements of the genome and to understand development and disease.

35 Aims of Transcriptomics: The key aims of transcriptomics are (1) to catalog the types of transcripts present in a cell population (mRNAs, long non-coding RNAs, small RNAs); (2) to determine the transcriptional structure of genes; (3) to quantify the abundance of transcript isoforms.

37 Previous methods: sequence-based approaches. Sanger sequencing of cDNA/EST libraries: low throughput, expensive, not very quantitative. Tag-based methods: SAGE (Serial Analysis of Gene Expression) and MPSS (Massively Parallel Signature Sequencing). Limitations: expensive (based on Sanger sequencing); unable to distinguish transcript isoforms.

38 Previous methods: hybridization-based approaches. Gene expression arrays, exon arrays, tiling arrays. Limitations: cross-hybridization, resolution, dynamic range. Source: Wikipedia.

39 Overview of an RNA-Seq experiment. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, (2009).

40 Benefits of RNA-Seq: ability to distinguish different isoforms; ability to distinguish allelic gene expression; detection of RNA editing events; de novo transcript assembly; detection of fusion transcripts; high throughput; nucleotide-level resolution.

41 Comparison between methods. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, (2009).

43 Mapping RNA-Seq reads: short vs. long reads. Short-read mappers: algorithms based on seed-based indexing; algorithms based on the Burrows-Wheeler Transform (BWT).

44 Index-based short-read mappers. Adapted from Trapnell, C. & Salzberg, S.L. How to map billions of short reads onto genomes. Nat Biotech 27, (2009).
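The seed-based indexing idea can be illustrated with a minimal Python sketch. It is not the method of any particular mapper: the function names `build_seed_index` and `map_read`, the toy reference, and the seed length are assumptions chosen for illustration. The sketch indexes every k-mer of the reference, looks up non-overlapping seeds from the read, and verifies each candidate position by counting mismatches.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted reference positions where the read aligns with at most
    max_mismatches substitutions (no indels; hypothetical toy mapper)."""
    hits = set()
    # Use non-overlapping seeds. If the read contains at least
    # max_mismatches + 1 such seeds, at least one of them must match exactly
    # (pigeonhole principle), so the true location is among the candidates.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


reference = "ACGTACGTTAGCCGATAGCTAGGCTA"  # toy reference, not real data
index = build_seed_index(reference, k=4)
print(map_read("TAGCCGAT", reference, index, k=4))  # exact match
print(map_read("TAGCCGTT", reference, index, k=4))  # one substitution
```

The index trades memory for speed: lookups are constant time per seed, but the table grows with the size of the reference, which is one motivation for the BWT-based alternative.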

45 BWT-based short-read mappers. Adapted from Flicek, P. & Birney, E. Sense from sequence reads: methods for alignment and assembly. Nat Meth 6, S6-S12 (2009). Adapted from Trapnell, C. & Salzberg, S.L. How to map billions of short reads onto genomes. Nat Biotech 27, (2009).
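As a rough illustration of how a BWT-based mapper finds exact matches, the following Python sketch builds a Burrows-Wheeler Transform from a naively sorted suffix array and runs the standard backward search over it. The names `bwt_index` and `backward_search` are hypothetical, the construction is deliberately simple (real mappers use compressed indexes and far more efficient construction), and inexact matching is not covered.

```python
def bwt_index(text):
    """Build the suffix array, the 'first' table and the occurrence table
    for text + '$' (naive construction, suitable for toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return sa, first, occ


def backward_search(pattern, sa, first, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


reference = "ACGTACGTTAGCCGATAGCTAGGCTA"  # toy reference, not real data
sa, first, occ = bwt_index(reference)
print(backward_search("TAGC", sa, first, occ))
```

Each step of the search narrows a range of suffix-array rows, so the cost of an exact lookup depends on the read length rather than on the size of the reference.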

46 Mapping of RNA-Seq reads (figure labels: DNA, cDNA/RNA). Map reads to the reference genome and a splice junction library. Adapted from Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat Meth 6, S22-S32 (2009).
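One way to picture a splice junction library is sketched below in Python. This is an illustrative assumption about how such a library could be built, not the construction used in the cited work: for every ordered pair of annotated exons in a gene, the last `read_length - 1` bases of the upstream exon are joined to the first `read_length - 1` bases of the downstream exon, so that any read crossing that junction maps to the joined sequence. The function name, coordinates, and sequences are hypothetical.

```python
from itertools import combinations


def build_junction_library(genome, exons, read_length):
    """Build junction sequences for all ordered exon pairs of one gene.

    exons: sorted list of (start, end) half-open intervals on one strand.
    Returns a dict keyed by (donor_end, acceptor_start).
    """
    flank = read_length - 1  # a junction read has at least 1 base on each side
    library = {}
    for (s1, e1), (s2, e2) in combinations(exons, 2):
        left = genome[max(s1, e1 - flank):e1]      # tail of upstream exon
        right = genome[s2:min(e2, s2 + flank)]     # head of downstream exon
        library[(e1, s2)] = left + right
    return library


# Toy gene with three exons and two introns (made-up sequences).
exon_a, intron_1, exon_b, intron_2, exon_c = (
    "ATGGCC", "GTAAAG", "TTCGGA", "GTCCAG", "CATTGA")
genome = exon_a + intron_1 + exon_b + intron_2 + exon_c
exons = [(0, 6), (12, 18), (24, 30)]

library = build_junction_library(genome, exons, read_length=4)
read = "CCTT"  # spans the exon_a / exon_b junction
print(read in genome)            # the read does not map to the genome itself
print(read in library[(6, 12)])  # but it maps to the junction sequence
```

Including non-adjacent exon pairs, such as `(6, 24)` here, allows reads from exon-skipping isoforms to map as well, at the cost of a larger library.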

47 TopHat: identification of novel splice junctions. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics,

48 Transcript assembly: assemble transcripts directly from RNA-Seq data. Advantage: detects structural alterations that are not present in the reference genome. Limitations: computationally intensive; not suitable for lowly expressed genes.

50 Calculating exon/gene expression values: reads per kilobase per million mapped reads (RPKM). Composite gene model (figure labels: Isoform 1, Isoform 2, Composite Model). Transcript isoform quantification.
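The RPKM calculation can be sketched in a few lines of Python. RPKM is the number of reads mapped to a feature, divided by the feature length in kilobases and by the total number of mapped reads in millions, i.e. 10^9 x reads / (length in bp x total mapped reads). The composite gene model is assumed here to be the union of the exons of all isoforms; the function names and the numbers in the example are made up for illustration.

```python
def composite_model(isoforms):
    """Merge the exons of all isoforms into non-overlapping intervals.

    isoforms: list of isoforms, each a list of (start, end) half-open exons.
    """
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Two hypothetical isoforms sharing the first exon.
isoform_1 = [(0, 500), (1000, 1500)]
isoform_2 = [(0, 500), (1200, 2000)]
model = composite_model([isoform_1, isoform_2])
length = sum(end - start for start, end in model)

print(model)    # union of exons across both isoforms
print(length)   # composite length in bp
print(rpkm(read_count=300, length_bp=length, total_mapped_reads=20_000_000))
```

Normalizing by both length and sequencing depth makes values comparable across genes and across samples, but a gene-level value computed on the composite model does not by itself say how expression is split between the isoforms.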

51 Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Meth 5, (2008).

53 Transcriptome analysis of human embryonic stem cells undergoing neural differentiation. Technologies: 454 (2M reads); Illumina (250M reads; single-end, paired-end). Samples: human embryonic stem cells; neuronal precursors (3 stages).

54 Example: long/short RNA-Seq reads. Wu, J.Q., Habegger, L. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proceedings of the National Academy of Sciences 107, (2010).

55 Differential gene expression. Figure labels: embryonic stem cells (hESC), RPKM; neural progenitor cells (N2), RPKM. Adapted from Wu, J.Q., Habegger, L. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proceedings of the National Academy of Sciences 107, (2010).

56 Differential gene expression. NCAM: Neural Cell Adhesion Molecule. Figure labels: embryonic stem cells (hESC), RPKM; neural progenitor cells (N2), RPKM. Adapted from Wu, J.Q., Habegger, L. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proceedings of the National Academy of Sciences 107, (2010).

57 Differential splicing. SLK: Serine / Threonine Kinase 2. Figure labels: embryonic stem cells (hESC), RPKM; neural progenitor cells (N2), RPKM. Adapted from Wu, J.Q., Habegger, L. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proceedings of the National Academy of Sciences 107, (2010).

58 Fraction of genes detected. Figure labels: N2 and hESC at 1x coverage; N2 and hESC at 5x coverage. Adapted from Wu, J.Q., Habegger, L. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proceedings of the National Academy of Sciences 107, (2010).
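To make the 1x and 5x thresholds concrete, here is a small Python sketch of how a "fraction of genes detected" could be computed. It assumes that a gene's coverage is its mapped read count times the read length divided by the gene length; this definition, the function names, and the example numbers are illustrative assumptions and may differ from the definition used in the cited study.

```python
def mean_coverage(read_count, read_length, gene_length):
    """Average number of reads covering each base of the gene (assumed definition)."""
    return read_count * read_length / gene_length


def fraction_detected(genes, read_length, threshold):
    """Fraction of genes whose mean coverage reaches the threshold.

    genes: list of (read_count, gene_length_bp) pairs.
    """
    detected = sum(
        1 for count, length in genes
        if mean_coverage(count, read_length, length) >= threshold
    )
    return detected / len(genes)


# Hypothetical genes: (mapped reads, length in bp).
genes = [(100, 1000), (10, 2000), (400, 2000), (0, 1500)]

print(fraction_detected(genes, read_length=35, threshold=1))  # 1x coverage
print(fraction_detected(genes, read_length=35, threshold=5))  # 5x coverage
```

Repeating the calculation on random subsets of the reads would give a saturation curve showing how the detected fraction grows with sequencing depth.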

59 Number of splice junctions detected. Figure labels: known splice junctions; novel splice junctions. Adapted from Wu, J.Q., Habegger, L. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proceedings of the National Academy of Sciences 107, (2010).

60 Summary: RNA-Seq has many advantages compared to conventional methods: higher resolution and larger dynamic range. It is the method of choice to study the structure and dynamics of the transcriptome: connectivity of exons and alternative splicing.