RNAseq analysis -it s complicated. November 2017

Size: px
Start display at page:

Download "RNAseq analysis -it s complicated. November 2017"

Transcription

1 RNAseq analysis -it s complicated November 2017

2 RNA reads are not enough to identify functional RNAs Defining functional DNA elements in the human genome Kellis M et al. PNAS 2014;111:

3 All the steps will affect the results All RNA

4 All the steps will affect the results Experimental setup All R A

5 All the steps will affect the results Lab work + RNA extraction Expeimental setu All R A

6 All the steps will affect the results RNA enrichment protocoll Expeimental setu All R A

7 All the steps will affect the results Sequencing machine Expeimental setu All R A

8 All the steps will affect the results Reference Expeimental setu All R A

9 All the steps will affect the results Mapping program Expeimental setu All R A

10 All the steps will affect the results Differential expression analysis program Expeimental setu All R A

11 Try to be as consistent as possible Differential expression analysis program Expeimental setu All R A Differential expression analysis Expeimental program setu All R A Differential expression analysis Expeimental program setu All R A Differential expression analysis Expeimental programsetu All R A Use a pipeline!

12 Gene and Isoform detection

13 Promises and pitfalls Long reads Low throughput (-) Complete transcripts (++) Not quantitative (-) Only highly expressed genes (-) Expensive (-) Low background noise (+) Easy downstream analysis (++) short reads High throughput (++) Quantitative (++) Fractions of transcripts (-) Full dynamic range (+-) Unlimited dynamic range (+) Cheap (+) Low background noise (+) Strand specificity (+) Re-sequencing (+)

14 RNA-seq analysis workflow reads.fastq.gz( Reads( STAR( Mapping( Reference( genome.fa( mappedreads.bam( Mapped(reads( Gene( expression( Gene(annotaCon: ref.bed(/(ref.ga(

15 Do a lot of QC Read(QC( $(FastQC( Mapping( stacsccs( Mapping(QC( $(RseQC( reads.fastq.gz( Reads( STAR( Mapping( Reference( genome.fa( Outlier( deteccon,( sample(swaps( mappedreads.bam( Mapped(reads( Gene( expression( Gene(annotaCon:( ref.bed(/(ref.ga(

16 species C.grandiflora hybrid C.rubella C.grandiflora Cg4a_KS2 Cg88_14_KS3 Cg88_44_KS2 Cg89_16_KS2 Cg89_3_KS1 Cg8c_KS2 Cg8f_KS1 Cg94_6_KS1 Cr1GR1_2_KS1 Cr1GR1_2_KS2 Cr1GR1_2_KS3 Cr23_9_KS2 Cr39_1_TS1_KS1 Cr75_2_3_KS1 Cr79_29_extra1 Cr84_21_KS1 CrN22561_KS2 Inter3_1 Inter4_1_1 Inter4_1_2 Inter4_1_3 Inter4_1_4 Inter5_1 Intra6_3 Intra7_2_1 Intra7_2_2 Intra7_2_3 Intra8_2 Percent properly mapped paired end reads More variation when using top hat 2 with default settings than when using STAR or Stampy with default setting Compare mapping efficacy of different RNA seq assemblers 70 Colour Stampy STAR Tophat2 Shape Flower Leaf

17 RNA QC PCA(analysis(detected(potenCal( sample(swaps(

18 C. rubella Principal component 1 separates samples from flowers and leaves PC Cg4a_KS2_F Cg4a_KS2_L Cg88_14_KS3_F Cg88_14_KS3_L Cg88_44_KS2_F Cg88_44_KS2_L Cg89_16_KS2_F Cg89_16_KS2_L Cg89_3_KS1_F Cg89_3_KS1_L Cg8c_KS2_F Cg8c_KS2_L Cg8f_KS1_F Cg8f_KS1_L Cg94_6_KS1_F Cg94_6_KS1_L Cr1GR1_2_KS1_F Cr1GR1_2_KS1_L Cr1GR1_2_KS2_F Cr1GR1_2_KS2_L Cr1GR1_2_KS3_F Cr1GR1_2_KS3_L Cr23_9_KS2_F Cr23_9_KS2_L Cr39_1_TS1_KS1_F Cr39_1_TS1_KS1_L Cr75_2_3_KS1_F Cr75_2_3_KS1_L Cr79_29_extra1_F Cr79_29_extra1_L Cr84_21_KS1_F Cr84_21_KS1_L CrN22561_KS2_F CrN22561_KS2_L Inter3_1_F Inter3_1_L Inter4_1_1_F Inter4_1_1_L Inter4_1_2_F Inter4_1_2_L Inter4_1_3_F Inter4_1_4_L Inter5_1_F Inter5_1_L Intra6_3_F Intra6_3_L Intra7_2_1_F Intra7_2_1_L Intra7_2_2_F Intra7_2_2_L Intra7_2_3_F Intra7_2_3_L Intra8_2_F Intra8_2_L PC1

19 Expression levels are similar between RT-qPCR and RNA-seq data Figure 1. Gene expression correlation between RT-qPCR and RNA-seq data. The Pearson correlation coefficients and linear regression line are indicated. Results are based on RNA-seq data from dataset 1.

20 Most problems are consistent so they disappear when you do diff-exp analysis Figure 3. High fold change correlation between RT-qPCR and RNA-seq data for each workflow. The correlation of the fold changes was calculated by the Pearson correlation coefficient. Results are based on RNAseq data from dataset 1.

21 Gene-set analysis (GSA) Samples Genes Immune response Pyruvate PPARG GO-terms Pathways Chromosomal locations Transcription factors Histone modifications Diseases etc Gene-level data Gene-set analysis Gene-set data (results) We will focus on transcriptomics and differential expression analysis However, GSA can in principle be used on all types of genome-wide data.

22 Analysis regarding Type II Diabetes p values up p values down mean, distinct directional (both) e e KEGG_STEROID_BIOSYNTHESIS KEGG_AMINOACYL_TRNA_BIOSYNTHESIS KEGG_RIBOSOME KEGG_TERPENOID_BACKBONE_BIOSYNTHESIS Max node size: 177 genes Min node size: 15 genes Max edge width: 84 genes Min edge width: 1 genes KEGG_PYRIMIDINE_METABOLISM KEGG_DNA_REPLICATION KEGG_PURINE_METABOLISM KEGG_CELL_CYCLE KEGG_PENTOSE_PHOSPHATE_PATHWAY KEGG_GLUTATHIONE_METABOLISM KEGG_GLYCOLYSIS_GLUCONEOGENESIS KEGG_AMYOTROPHIC_LATERAL_SCLEROSIS_ALS KEGG_HUNTINGTONS_DISEASE KEGG_PARKINSONS_DISEASE KEGG_ALZHEIMERS_DISEASE KEGG_FOCAL_ADHESION KEGG_CARDIAC_MUSCLE_CONTRACTION KEGG_VIBRIO_CHOLERAE_INFECTION KEGG_ALDOSTERONE_REGULATED_SODIUM_REABSORPTION KEGG_OXIDATIVE_PHOSPHORYLATION KEGG_PROTEIN_EXPORT KEGG_PROTEASOME KEGG_LINOLEIC_ACID_METABOLIS KEGG_ECM_RECEPTOR_INTERACTION KEGG_GLYCOSPHINGOLIPID_BIOSYNTHESIS_LACTO_AND

23 Expression of genes on pathway

24 Exercises Mapping STAR HISAT2 Tutorial for reference guided assembly Cufflinks Stringtie Tutorial for de novo assembly Trinity Visualise mapped reads and assembled transcripts on reference IGV RNA quality controll Tutorial for RNA seq Quality Control Differential expression analysis DEseq2 Calisto and Sleuth multi variate analysis in SIMCA Introductory Introduction to the RNA seq data provided Short introduction to R Short introduction to IGV Beta labs Single cell RNA PCA and clustering Gene set analysis UPPMAX sbatch script example small RNA analysis mirna analysis

25 Need help?? We are here for you. Apply for help.