Transcriptomics analysis with RNA-seq: an overview Frederik Coppens

1 Transcriptomics analysis with RNA-seq: an overview Frederik Coppens

2 Platforms, Applications, Analysis (Quantification, RNA content)

3 Platforms

4 Platforms
- Short reads (a few hundred bases)
- Long reads (multiple kilobases)

5 What is your biological question?

6 Applications

7 Applications
- Quantification (known features)
  - Gene expression
  - Targeted gene expression
  - Non-coding RNA
  - Small RNA
- RNA content (discovery)
  - De novo transcriptome assembly
  - Whole transcripts
  - Single-cell RNA-seq

8 Library prep -> Sequencing -> Analysis

9 Quantification
- Gene expression
- Targeted gene expression
- Non-coding RNA
- Small RNA

10 Quantification
- Select for features in library preparation
- High coverage per feature
- Assign each read to a feature

11 Quantification
- Single reads versus paired end
- Read length: from 50 to 600 bases
- Stranded
- Sequencing depth

12 Library preparation
- RNA extraction
- Selection, e.g. poly(A), ribosomal depletion
- Fragmentation, e.g. 250 nt
- cDNA synthesis, e.g. random primed
- Adapter ligation & PCR amplification

13 Single read versus Paired End

14 Longer and paired reads
- Increase mapping specificity, unique mapping, and quantification accuracy
- Good-quality reference genome: short single reads
- Low-quality or repeat-rich genome: long paired-end reads

15 Strandedness
- Original protocol: non-stranded
- Stranded is now standard
- Most common protocol is dUTP based: reverse complement
  - Also referred to as: reverse, first strand, template, antisense
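
A minimal sketch of what "reverse complement" means in practice, assuming a single-end read from a dUTP-based library: the read has to be reverse-complemented to get the sequence in the orientation of the original transcript. The function name and the example read are illustrative and do not come from any tool named in these slides.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical read from a dUTP-based (reverse-stranded) library.
read = "TTTGCAT"
# Sequence in the orientation of the original transcript.
transcript_sense = reverse_complement(read)
print(transcript_sense)
```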

16-19 Sequencing depth

20 Coverage
- For expression quantification, 20 million fragments are sufficient: 20K expressed genes at an average of 1,000 counts each
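
The arithmetic behind this rule of thumb can be sketched as follows, under the simplifying assumption that every sequenced fragment is assigned to exactly one expressed gene. The function name is illustrative.

```python
def mean_counts_per_gene(total_fragments: int, expressed_genes: int) -> float:
    """Average fragments per gene, assuming every fragment is assigned to one expressed gene."""
    if expressed_genes <= 0:
        raise ValueError("expressed_genes must be positive")
    return total_fragments / expressed_genes


# 20 million fragments spread over 20K expressed genes.
print(mean_counts_per_gene(20_000_000, 20_000))  # 1000.0
```

This is only an average: expression levels are strongly skewed, so weakly expressed genes receive far fewer counts than the mean.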

21-22 Lowly expressed genes? Bioinformatics. 2011;27(13):i383-i391

23 Targeted expression profiling
- Hybridization + RNA-seq
- Select for a subset
  - Exome sequencing
  - Gene panels, e.g. disease related
  - Genome regions
- Library prep contains a hybridization step
- Decreases the read depth needed
- Relies on good selection probes!

24 RNA content
- De novo transcriptome assembly
- Whole transcripts
- Single-cell RNA-seq

25 Determine transcripts expressed
- Resequencing
- De novo assembly
- Whole transcripts

26 What is your biological question?

27 Transcriptome resequencing
- Determine variants relative to the reference
  - Single Nucleotide Polymorphism (SNP)
  - Copy Number Variation (CNV)
  - Insertion & Deletion (InDel)
  - Structural Variant (SV)

28 De novo transcript assembly
- Assemble short reads into transcripts
- Combine data
  - Paired-end data
  - Mate-pair data
  - Long reads
- Labour intensive
- Results vary with algorithm

29 Sequence the whole transcript
- Pacific Biosciences: Iso-Seq
- Oxford Nanopore
  - cDNA
  - Direct RNA (in early access)
- Normalization in library preparation?
- Combining size selections

30 Analysis
- Quantification
  - Genome or transcriptome mapping
  - Transcriptome inference
- RNA content
  - Assembly
  - Whole transcripts
  - Single cell

31 Quantification analysis

32 Quantification analysis
Quality Control -> Quality Filtering -> Mapping -> Summarization
Quality Control
- Technical control
- Discover biases in your data
- Tools: FastQC, RSeQC, ...

33 Quantification analysis: Quality Filtering
- Remove low-quality reads
- Trim adapters
- Remove reads that are too short after trimming
- For quantification: no need to be very strict
- Tools: Trimmomatic, FASTX-Toolkit, cutadapt, ...
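
The three filtering steps can be sketched in a few lines of Python. This is a toy illustration, not how Trimmomatic, FASTX-Toolkit or cutadapt are implemented: it only finds exact adapter matches, whereas real trimmers tolerate mismatches and partial adapters. The function name, the thresholds (`min_qual=20`, `min_len=25`) and the example adapter sequence are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def trim_and_filter(
    seq: str,
    quals: List[int],
    adapter: str,
    min_qual: int = 20,
    min_len: int = 25,
) -> Optional[Tuple[str, List[int]]]:
    """Trim adapter and low-quality 3' tail; return None if the read ends up too short.

    quals holds one Phred score per base of seq.
    """
    # 1. Adapter trimming: cut at the first exact occurrence of the adapter.
    pos = seq.find(adapter)
    if pos != -1:
        seq, quals = seq[:pos], quals[:pos]

    # 2. Quality trimming: drop trailing bases with a score below min_qual.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # 3. Length filter: discard reads that are too short after trimming.
    if len(seq) < min_len:
        return None
    return seq, quals


# Example: 40 bases of insert followed by an adapter; the last two insert bases are low quality.
adapter = "AGATCGGAAGAGC"
read_seq = "ACGT" * 10 + adapter
read_quals = [30] * 38 + [10, 10] + [30] * len(adapter)
print(trim_and_filter(read_seq, read_quals, adapter))
```

Lowering `min_qual` or `min_len` gives the more lenient filtering that the slide suggests is enough for quantification.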

34 Quantification analysis: Mapping
- Map to reference genome
  - Splice-site capable!
  - Tools: GSNAP, HISAT2, STAR, ...
- Map to reference transcripts
  - Tools: same + Bowtie, BWA, ...
- Number of mismatches allowed?

35 Quantification analysis: Summarization
- Genome: assign to annotated feature
  - Tools: htseq-count, featureCounts
- Transcripts: count
  - Tools: SAMtools
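
Assigning genome-mapped reads to annotated features comes down to an interval-overlap count. The sketch below is a simplified illustration rather than the algorithm of htseq-count or featureCounts: it ignores strand, splicing and multi-mapping, and uses a linear scan instead of an interval index. The function name, the `no_feature` and `ambiguous` labels, and the coordinates are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def count_reads_per_feature(
    alignments: Iterable[Interval],
    features: Dict[str, Interval],
) -> Counter:
    """Count alignments per feature; reads hitting zero or several features are tallied separately."""
    counts: Counter = Counter()
    for chrom, start, end in alignments:
        # A read overlaps a feature if both are on the same chromosome and the intervals intersect.
        hits = [
            name
            for name, (f_chrom, f_start, f_end) in features.items()
            if f_chrom == chrom and start < f_end and f_start < end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


# Hypothetical annotation with two overlapping genes and five aligned reads.
features = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 150, 300)}
alignments = [
    ("chr1", 110, 140),  # geneA only
    ("chr1", 160, 190),  # overlaps both genes
    ("chr1", 400, 450),  # no annotated feature
    ("chr1", 250, 290),  # geneB only
    ("chr2", 110, 140),  # different chromosome
]
print(count_reads_per_feature(alignments, features))
```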

36 Inference based
- Mapping based: Quality Control -> Quality Filtering -> Mapping -> Summarization
- Inference based: Quality Control -> Quality Filtering -> Transcriptome inference

37 Inference based: Transcriptome inference
- Infer transcript
- Handles isoforms
- Fast!
- Tools: Salmon, kallisto, ...
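
One idea that alignment-free tools build on is asking which transcripts a read is compatible with, based on shared k-mers, without computing a full alignment. The toy sketch below illustrates only that idea and is not the actual algorithm of Salmon or kallisto, which use far more efficient index structures and a statistical model to resolve reads shared between isoforms. The function names, the sequences and `k = 5` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Set


def build_kmer_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcripts that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_transcripts(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Return the transcripts that contain every k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript can explain this read
    return compatible or set()


# Two hypothetical isoforms that share a first exon and differ in the second.
shared_exon = "ATGGCGTACGTTAGC"
transcripts = {"iso1": shared_exon + "GGATCCA", "iso2": shared_exon + "TTCAGGA"}
k = 5
index = build_kmer_index(transcripts, k)

print(compatible_transcripts("GCGTACGTT", index, k))  # read inside the shared exon
print(compatible_transcripts("TAGCGGATC", index, k))  # read spanning the iso1-specific junction
```

A read from the shared exon stays compatible with both isoforms, while a read across the isoform-specific junction narrows the set to one. Counting reads per compatibility set is what lets this family of methods handle isoforms.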

38 Thank you