Transcriptomics analysis with RNA-seq: an overview. Frederik Coppens
Transcription
1 Transcriptomics analysis with RNA-seq: an overview. Frederik Coppens
2 Platforms; Applications; Analysis; Quantification; RNA content
3 Platforms
4 Platforms: short reads (a few hundred bases); long reads (multiple kilobases)
5 What is your biological question?
6 Applications
7 Applications. Quantification (known features): gene expression; targeted gene expression; non-coding RNA; small RNA. RNA content (discovery): de novo transcriptome assembly; whole transcripts; single-cell RNA-seq.
8 Library prep; Sequencing; Analysis
9 Quantification: gene expression; targeted gene expression; non-coding RNA; small RNA
10 Quantification: select for features in library preparation; high coverage per feature; assign each read to a feature
11 Quantification: single reads versus paired-end; read length from 50 to 600 bases; stranded; sequencing depth
12 Library preparation: RNA extraction; selection (e.g. poly(A), ribosomal depletion); fragmentation (e.g. 250 nt); cDNA synthesis (e.g. random primed); adapter ligation & PCR amplification
13 Single read versus Paired End
14 Longer & paired reads increase mapping specificity, unique mapping, and quantification accuracy. Good-quality reference genome: short single reads. Low-quality or repeat-rich genome: long paired-end reads.
15 Strandedness. Original protocol: non-stranded. Stranded is now standard. Most common protocol is dUTP-based: reverse complement (reverse; first strand; template; anti-sense).
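A minimal sketch of what "reverse complement" means in practice for a dUTP-based library: read 1 corresponds to the anti-sense strand, so taking its reverse complement gives the sequence in transcript (sense) orientation. The helper name and the example read are illustrative assumptions, not taken from the slides.

```python
# Hypothetical helper: recover the sense-orientation sequence from read 1
# of a dUTP-based (reverse-stranded) library.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


read1 = "TTTGCAT"  # made-up read, anti-sense to the transcript
sense = reverse_complement(read1)
print(sense)
```

Counting tools need to be told about this orientation, otherwise reads are attributed to the wrong strand.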
16 Sequencing depth
20 Coverage. For expression quantification, 20 million fragments is sufficient: 20K expressed genes at an average of 1000 counts each.
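The arithmetic behind that figure, as a small sketch. It assumes every sequenced fragment is assigned to an expressed gene, which is an idealization; the function name is hypothetical.

```python
def fragments_needed(expressed_genes: int, mean_counts_per_gene: int) -> int:
    """Fragments required so that genes reach the given average count.

    Assumption: every fragment is assigned to exactly one expressed gene.
    """
    return expressed_genes * mean_counts_per_gene


total = fragments_needed(expressed_genes=20_000, mean_counts_per_gene=1_000)
print(f"{total:,} fragments")
```

Because expression is highly skewed in practice, an average of 1000 counts still leaves many genes with far fewer reads.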
21 Lowly expressed genes? Bioinformatics. 2011;27(13):i383-i391
23 Targeted expression profiling: hybridization + RNA-seq. Select for a subset: exome sequencing; gene panels (e.g. disease related); genome regions. Library prep contains a hybridization step. Decreases the read depth needed. Relies on good selection probes!
24 RNA content: de novo transcriptome assembly; whole transcripts; single-cell RNA-seq
25 Determine transcripts expressed: resequencing; de novo assembly; whole transcripts
26 What is your biological question?
27 Transcriptome resequencing: determine variants relative to the reference. Single Nucleotide Polymorphism (SNP); Copy Number Variation (CNV); Insertion & Deletion (InDel); Structural Variant (SV).
28 De novo transcript assembly: assemble short reads into transcripts. Combine data: paired-end data; mate-pair data; long reads. Labour intensive. Results vary with algorithm.
29 Sequence the whole transcript. Pacific Biosciences: Iso-Seq. Oxford Nanopore: cDNA; direct RNA in early access. Normalization in library preparation? Combining size selections.
30 Analysis. Quantification: genome or transcriptome mapping; transcriptome inference. RNA content: assembly; whole transcripts; single cell.
31 Quantification analysis
32 Quantification analysis (quality control, quality filtering, mapping, summarization). Quality control: technical control; discover biases in your data. Tools: FastQC, RSeQC, ...
33 Quantification analysis (quality control, quality filtering, mapping, summarization). Quality filtering: remove low-quality reads; trim adapters; remove reads that are too short after trimming. For quantification there is no need to be very strict. Tools: Trimmomatic, FASTX-Toolkit, cutadapt, ...
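The filtering steps on this slide can be sketched in a few lines. This is a simplified illustration, not how Trimmomatic, FASTX-Toolkit or cutadapt are implemented: it only trims low-quality bases from the 3' end and then discards reads that end up too short. The function name and both thresholds are assumptions chosen for the example.

```python
from typing import List, Optional, Tuple


def trim_and_filter(
    seq: str,
    quals: List[int],
    min_qual: int = 20,  # assumed Phred cutoff
    min_len: int = 25,   # assumed minimum length after trimming
) -> Optional[Tuple[str, List[int]]]:
    """Trim low-quality bases from the 3' end; return None if too short."""
    end = len(seq)
    # Walk back from the 3' end while base quality is below the cutoff.
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    if end < min_len:
        return None  # read removed: too short after trimming
    return seq[:end], quals[:end]


# Made-up read: seven good bases followed by three poor ones.
kept = trim_and_filter("ACGTACGTAC", [30] * 7 + [10, 5, 2], min_len=5)
dropped = trim_and_filter("ACGTACGTAC", [30] * 3 + [2] * 7, min_len=5)
print(kept, dropped)
```

Lenient thresholds match the slide's point that quantification does not require strict filtering.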
34 Quantification analysis (quality control, quality filtering, mapping, summarization). Mapping: map to the reference genome (splice-site capable!), tools: GSNAP, HISAT2, STAR, ...; or map to reference transcripts, tools: same + Bowtie, BWA, ... Number of mismatches allowed?
35 Quantification analysis (quality control, quality filtering, mapping, summarization). Summarization: for genome mapping, assign reads to annotated features, tools: htseq-count, featureCounts; for transcript mapping, count reads per transcript, tools: Samtools.
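A toy sketch of "assign to annotated feature", assuming a single chromosome, unspliced reads and no strand information. The gene coordinates and reads are invented, and the logic is far simpler than htseq-count or featureCounts; only the labels __no_feature and __ambiguous are borrowed from htseq-count's output naming.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)

# Hypothetical annotation: two genes that overlap between 450 and 500.
genes: Dict[str, Interval] = {"geneA": (100, 500), "geneB": (450, 900)}


def assign(read: Interval, annotation: Dict[str, Interval]) -> str:
    """Return the single gene a read overlaps, or a special label."""
    start, end = read
    hits = [g for g, (s, e) in annotation.items() if start <= e and end >= s]
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits[0]


def count_reads(reads: Iterable[Interval], annotation: Dict[str, Interval]) -> Counter:
    """Summarize mapped reads into per-feature counts."""
    return Counter(assign(r, annotation) for r in reads)


reads = [(120, 170), (460, 510), (600, 650), (1000, 1050)]
counts = count_reads(reads, genes)
print(dict(counts))
```

Reads overlapping more than one gene are set aside rather than guessed, which is why overlapping annotations lower the usable count.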
36 Inference based. Mapping-based pipeline: quality control; quality filtering; mapping; summarization. Inference-based pipeline: quality control; quality filtering; transcriptome inference.
37 Inference based (quality control, quality filtering, transcriptome inference). Transcriptome inference: infer the transcript; handles isoforms; fast! Tools: Salmon, kallisto, ...
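To give a feel for why inference without full alignment can be fast and still handle isoforms, here is a toy k-mer compatibility sketch. It is not the Salmon or kallisto implementation: those tools use more elaborate indexes and a statistical model to distribute ambiguous reads. The transcript sequences, names and k value are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcripts that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def compatible_transcripts(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Intersect the transcript sets of all k-mers in the read."""
    compat = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k], set())
        compat = set(hits) if compat is None else compat & hits
        if not compat:
            break  # no transcript is compatible with every k-mer
    return compat or set()


# Two made-up isoforms sharing their first six bases.
transcripts = {"tx1": "ACGTACGGA", "tx2": "ACGTACCTT"}
index = build_index(transcripts, k=4)

shared = compatible_transcripts("ACGTAC", index, k=4)    # in the common part
specific = compatible_transcripts("GTACGG", index, k=4)  # spans the tx1-only part
print(shared, specific)
```

Reads from the shared region stay compatible with both isoforms, while reads spanning an isoform-specific region identify a single one; a real tool then uses such sets to estimate abundances.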
38 Thank you