Gene Expression analysis with RNA-Seq data


Gene Expression analysis with RNA-Seq data
C3BI Hands-on NGS course, November 24th 2016
Frédéric Lemoine

Plan
2. Quality Control
3. Read Mapping
4. Gene Expression Analysis
5. Splicing/Transcript Analysis
6. Other Analyses
7. Visualization
10/12/15 Institut Pasteur - C3BI Hands-on NGS course - RNA-Seq

Sequencing: Reminder
High-throughput sequencing (HTS), also called next-generation sequencing (NGS), is a set of methods developed in 2005 that produce millions of sequences in a single run at low cost.
Example (figure): «reads» aligned on a genome, coverage: 10

Sequencers (Illumina)
NextSeq 500: max output 120 Gb; max read number 800 M; max read length 2x150 bp; run time 29 h; 9 exomes per run
HiSeq 4000: output 1500 Gb; read number 4 to 5 B; read length 2x150 bp; run time 1 to 3.5 days; 12 genomes per run
HiSeq X Ten: max output 1800 Gb; max read number 6 B; max read length 2x150 bp; run time < 3 days; > human genomes per year

Sequencing: Applications
DNA-Seq: DNA sequencing
ChIP-Seq: study of protein/DNA interactions
CLIP-Seq: study of protein/RNA interactions
RNA-Seq: RNA sequencing


RNA-Seq: Definition
RNA-Seq reveals the presence and quantity of RNA in a sample at a given moment in time.
Figure: «reads» aligned to a chromosome and its genes, showing exonic reads, junction reads, and the resulting coverage.
Qualitative + Quantitative!

RNA-Seq: Data types
Figure: an Illumina flowcell, its sequencing lanes, and the reads produced per lane.

RNA-Seq: Data types
A small modification of the library preparation makes it possible to read both ends (forward and reverse) of each fragment.
Output: FastQ file

RNA-Seq: Data types
FASTQ format:
  1 length=76
  GTCGATGATGCCTGCTAAACTGCAGCTTGACGTACTGCGGACCCTGCAGTCCAGCGCTCGTCATGGAACGCAAACG
  +
  HHHHHHHHHHHHFGHHHHHHFHHGHHHGHGHEEHHHHHEFFHHHFHHHHBHHHEHFHAH?CEDCBFEFFFFAFDF9
FASTA format:
  >SRR length=76
  GTCGATGATGCCTGCTAAACTGCAGCTTGACGTACTGCGGACCCTGCAGTCCAGCGCTCGTCATGGAACGCAAACG
SFF (Standard Flowgram Format): binary format for 454 reads
Colorspace (SOLiD):
  2_34_121_F3 T ;;9:;>+0*&:*.*1-.5($2$3&$570*$575&$9966$5835'665

RNA-Seq: FASTQ
A FASTQ record contains a read ID, the read sequence, and a per-base quality string.
Read ID: @HWI-ST946:381:C2ABHACXX:1:1101:1154:2156 1:N:0:GCCAAT
Read sequence: GGAAAACATATTCACCCAAGACCTGT
Read ID details:
  Column      Description
  HWI-ST946   Unique identifier of the sequencer
  381         Project (run) identifier
  C2ABHACXX   Flowcell identifier
  1           Lane number in the flowcell
  1101        Tile number in the lane
  1154        X coordinate of the read cluster in the tile
  2156        Y coordinate of the read cluster in the tile
  1           Read number in the pair (1 or 2); only for paired-end/mate-pair sequencing
  N           Filter flag: «Y» if the read was filtered out (it did not pass the quality filter), «N» otherwise
  0           0 when no control bit is activated
  GCCAAT      Index of the sequence, used when several samples are multiplexed
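The read ID fields listed above can be split programmatically. The following is a minimal sketch, assuming the colon-separated Illumina header layout implied by the table; the example header is reassembled from the table values, and the names `IlluminaReadId`, `parse_read_id`, and `filter_flag` are illustrative, not part of any library.

```python
from typing import NamedTuple


class IlluminaReadId(NamedTuple):
    """Fields of an Illumina FASTQ header (illustrative names)."""
    instrument: str
    run_id: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read_number: int
    filter_flag: str   # raw "Y" or "N" value, kept uninterpreted
    control_bits: int
    index: str


def parse_read_id(header: str) -> IlluminaReadId:
    """Split a FASTQ header line into its named fields."""
    if not header.startswith("@"):
        raise ValueError("a FASTQ header line starts with '@'")
    # The header has two space-separated parts: cluster location, then read info.
    location, _, description = header[1:].partition(" ")
    instrument, run_id, flowcell, lane, tile, x, y = location.split(":")
    read_number, filter_flag, control_bits, index = description.split(":")
    return IlluminaReadId(
        instrument, run_id, flowcell, int(lane), int(tile), int(x), int(y),
        int(read_number), filter_flag, int(control_bits), index,
    )


read_id = parse_read_id("@HWI-ST946:381:C2ABHACXX:1:1101:1154:2156 1:N:0:GCCAAT")
print(read_id.flowcell, read_id.lane, read_id.index)
```

Headers produced by other instruments or older pipelines may use a different layout, in which case the unpacking raises a `ValueError` instead of returning wrong fields.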

RNA-Seq: FASTQ Scores

RNA-Seq: FASTQ Scores
  Phred quality score   Probability of incorrect identification   Base identification precision
  10                    1/10                                      90%
  20                    1/100                                     99%
  30                    1/1000                                    99.9%
  40                    1/10000                                   99.99%
  50                    1/100000                                  99.999%
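The table follows the Phred relation Q = -10 log10(p), where p is the probability that the base call is wrong. A short sketch of the conversion is given below; it assumes the Phred+33 ASCII encoding (Sanger / Illumina 1.8+) for quality strings, and the function names are illustrative.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(quality: str, offset: int = 33) -> list[int]:
    """Turn a FASTQ quality string into Phred scores (offset 33 is an assumption)."""
    return [ord(char) - offset for char in quality]


for q in (10, 20, 30, 40, 50):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.3f}%")
```

Older Illumina pipelines used an offset of 64, so the `offset` argument has to match the encoding of the file being read.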

RNA-Seq: Applications
Measure gene expression
Measure alternative splicing
Detect expressed mutations
Gene annotation (new exons)
Detect fusion transcripts

RNA-Seq: Whole pipeline
Raw data -> Preprocessing -> Mapping -> Quality control
Analyses: Expression; Splicing (exon, splicing patterns, transcript); Fusion transcripts; SNPs
Visualization


RNA-Seq: Quality control
Sequence quality controls
Mapping quality controls
We use FastQC and Picard Tools.

Sequence quality: FastQC
Documentation

Sequence quality: Per base
A warning is issued if the lower quartile for any base is less than 10, or if the median for any base is less than 25.
A failure is raised if the lower quartile for any base is less than 5, or if the median for any base is less than 20.
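The per-base rule above can be sketched as follows. This is not FastQC's implementation: the quartile method is an assumption (Python's `statistics.quantiles` with the inclusive method), the failure threshold of 20 for the median is the value given in the FastQC documentation, and the function names are illustrative.

```python
import statistics


def per_base_status(qualities_at_position):
    """Classify one read position from the Phred scores observed there."""
    lower_quartile = statistics.quantiles(
        qualities_at_position, n=4, method="inclusive"
    )[0]
    median = statistics.median(qualities_at_position)
    if lower_quartile < 5 or median < 20:
        return "fail"
    if lower_quartile < 10 or median < 25:
        return "warn"
    return "pass"


def per_base_report(reads_qualities):
    """Apply the rule to every position; assumes all reads have the same length."""
    return [per_base_status(list(column)) for column in zip(*reads_qualities)]


# Four hypothetical reads of length 4, one list of Phred scores per read.
reads = [
    [38, 36, 22, 12],
    [37, 35, 24, 14],
    [39, 30, 21, 16],
    [38, 34, 23, 18],
]
print(per_base_report(reads))
```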

Sequence quality: Per sequence
Figure: samples S1 and S2.
A warning is raised if the most frequently observed mean quality is below 27, which equates to a 0.2% error rate.
An error is raised if the most frequently observed mean quality is below 20, which equates to a 1% error rate.

Sequence quality: GC content per sequence
Figure: samples S1 and S2.
A warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads.
A failure is raised if the sum of the deviations from the normal distribution represents more than 30% of the reads.

Sequence quality: N content
Figure: samples S2 and S3.
A warning is raised if any position shows an N content of >5%.
An error is raised if any position shows an N content of >20%.

Sequence quality: Overrepresented sequences
Figure: samples S1 and S2.
A warning is issued if any sequence is found to represent more than 0.1% of the total.
An error is issued if any sequence is found to represent more than 1% of the total.

Sequence quality: Nucleotide content
Figure: samples S1 and S2.
A warning is issued if the difference between A and T, or between G and C, is greater than 10% at any position.
A failure is raised if the difference between A and T, or between G and C, is greater than 20% at any position.

Quality control: Mapping (STAR)
  Sample  Uniquely mapped reads  Mapped to too many loci  Mapped to multiple loci  Unmapped reads: too short  Unmapped reads: other
  s       %                      4.93%                    2.54%                    11.97%                     0.64%
  s       %                      6.73%                    3.38%                    5.29%                      1.03%
  s       %                      4.10%                    2.32%                    6.38%                      0.52%
  s       %                      5.21%                    2.84%                    4.29%                      0.79%
  s       %                      2.92%                    2.08%                    8.98%                      0.44%
  s       %                      5.82%                    3.45%                    7.36%                      1.02%
  s       %                      3.28%                    2.04%                    6.27%                      0.42%
  s       %                      3.80%                    2.24%                    11.67%                     0.48%
  s       %                      4.17%                    2.43%                    10.32%                     0.56%
  s       %                      4.31%                    2.58%                    4.52%                      0.70%
  s       %                      4.96%                    2.41%                    14.16%                     0.70%
  s       %                      11.45%                   3.85%                    4.65%                      2.16%
  s       %                      4.13%                    2.42%                    7.22%                      0.56%
  s       %                      7.21%                    3.68%                    6.54%                      1.28%
  s       %                      5.39%                    2.92%                    6.11%                      0.82%
  s       %                      3.57%                    2.67%                    24.05%                     0.34%
  s       %                      2.88%                    2.12%                    1.90%                      0.58%

Quality control: Transcript coverage
Figure: transcript coverage profiles for samples S1 to S17 (Picard Tools).

Quality control: Read localization
Genes: exons, introns, UTRs
Intergenic regions
rRNAs

Quality control: Read localization (Picard Tools)
% of bases per category:
  Sample  RIBOSOMAL  CODING  UTR  INTRONIC  INTERGENIC  MRNA  USABLE
  s1      0%         79%     15%  2%        3%          95%   76%
  s2      0%         76%     18%  2%        4%          93%   80%
  s3      0%         83%     13%  2%        2%          96%   84%
  s4      0%         80%     15%  2%        3%          95%   84%
  s5      0%         84%     13%  2%        2%          97%   82%
  s6      0%         76%     17%  2%        4%          93%   78%
  s7      0%         84%     12%  2%        2%          97%   85%
  s8      0%         83%     13%  2%        2%          96%   78%
  s9      0%         81%     14%  2%        2%          96%   79%
  s10     0%         83%     13%  2%        3%          96%   85%
  s11     0%         83%     13%  2%        3%          95%   74%
  s12     0%         73%     19%  3%        5%          92%   74%
  s13     0%         82%     14%  2%        2%          96%   82%
  s14     0%         74%     19%  3%        5%          93%   77%
  s15     0%         78%     16%  2%        3%          95%   81%
  s16     0%         75%     19%  2%        4%          94%   62%
  s17     0%         87%     10%  1%        2%          97%   91%

Quality control: Read localization
Figure: read localization per sample (Picard Tools).


Mapping RNA-Seq

Mapping RNA-Seq: Difficult
Splice junctions!

Mapping RNA-Seq: TopHat
Figure: TopHat pipeline.
Trapnell et al., Bioinformatics, 2009

Mapping RNA-Seq: STAR
1) Search for Maximal Mappable Prefixes (MMP) using a suffix array (SA)
2) Alignment clustering
Dobin et al., Bioinformatics, 2013

SAM Format
SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. Each alignment line has 11 mandatory fields for essential alignment information such as mapping position, and a variable number of optional fields for flexible or aligner-specific information. (*)
Columns: Sequence ID, Flag, Chr, Position, Map Qual, Cigar, Paired-end info
  HWI-ST1136:196:HS113:4:1101:4333: chr M =
  HWI-ST1136:196:HS113:4:1101:4333: chr M1S =
  HWI-ST1136:196:HS113:4:1101:4320: chr M =
  HWI-ST1136:196:HS113:4:1101:4320: chr M =
  HWI-ST1136:196:HS113:4:1101:4274: chr M =
  HWI-ST1136:196:HS113:4:1101:4274: chr M =
  HWI-ST1136:196:HS113:4:1101:4333: chr M =
  HWI-ST1136:196:HS113:4:1101:4333: chr M =
  HWI-ST1136:196:HS113:4:1101:4353: chr M =
  HWI-ST1136:196:HS113:4:1101:4353: chr M =
(*) https://samtools.github.io/hts-specs/samv1.pdf

SAM Format (remaining columns)
Columns: Sequence, Base qualities, Optional tags
  AGAGAATCGACAAAAGGCTCTGGCCCG CCCFFFFFHHHHHJJJIJIJJJJJJJIJJJB NH:i:1 HI:i:1 AS:i:197 nm:i:0
  TCTGGCCCGCAGAGCTGAGAAGTTATT DDDDDBDBDCDDDDDEDDDEDDCCAACDEEE NH:i:1 HI:i:1 AS:i:197 nm:i:0
  AACGAATGTAACTTTAAGGCAGGAAAG CCCFFFFFHHHHHJJJJJJJJJJIJJJIIII NH:i:1 HI:i:1 AS:i:198 nm:i:0
  ATAGAGGCCCTCTAAATAAGGAATAAA DDDDDDDFFFDDHHHHHJIIGJJJIJIGGCJ NH:i:1 HI:i:1 AS:i:198 nm:i:0
  CCTGAGATGTGCGTAGCCTCCGTGTAA CCCFFFFFHHHHHJJJJJJJJJIJIJJJJJJ NH:i:1 HI:i:1 AS:i:198 nm:i:0
  ACCCAGCCTTTACCAGCAGCGTACGGC ADDDDDDCDDDCDDDDDDDDDDDFFFFHHHH NH:i:1 HI:i:1 AS:i:198 nm:i:0
  GCTGGCATGGTGGTGGGCACCCATAAT CCCFFFFDHHFHHHGIJIJJJJJJJJJJIJJ NH:i:1 HI:i:1 AS:i:198 nm:i:0
  GGGCACCCATAATCCTAGCTGCTCAGG DDDBCDCDDDDDCDDDDDDEEECCCFFFEHH NH:i:1 HI:i:1 AS:i:198 nm:i:0
  GCCCTTTCAACTTTCCCTCTGGTCCTT CCCFFFFFHHHHHJJIJJIJJJGIJJJJJJJ NH:i:1 HI:i:1 AS:i:196 nm:i:1
  CACATCCCCATCTGGGCCCTCTCCTTT DDDDDDDDDCBDDDDDDDDCDEFFFFFHHHH NH:i:1 HI:i:1 AS:i:196 nm:i:1
(*) https://samtools.github.io/hts-specs/samv1.pdf


SAM Fields: FLAG
FLAG: combination of bitwise flags
Example: decimal flag value 83, binary flag value 1010011
Each bit corresponds to a meaning.
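Decoding a FLAG amounts to testing each bit in turn. The sketch below uses the bit meanings defined in the SAM specification; the dictionary and function names are illustrative.

```python
# Bit values and meanings as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def explain_flag(flag: int) -> list[str]:
    """Return the meaning of every bit that is set in a SAM FLAG value."""
    return [meaning for bit, meaning in SAM_FLAG_BITS.items() if flag & bit]


print(format(83, "b"))      # binary representation of the example flag
for meaning in explain_flag(83):
    print("-", meaning)
```

For the example value, 83 = 64 + 16 + 2 + 1, so the read is paired, in a proper pair, on the reverse strand, and first in its pair.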

SAM Fields: «Explain FLAG» tool
https://broadinstitute.github.io/picard/explain-flags.html


SAM Fields: Mapping quality
Higher mapping quality = more unique
$\mathrm{MAPQ} = -10 \log_{10}(p)$
with $p$: estimate of the probability that the alignment does not correspond to the read's true point of origin.
A mapping quality < 10 indicates that there is a greater than 1 in 10 chance that the read truly originated elsewhere.


SAM Fields: CIGAR
String representation of the alignment
Example: 52M36890N45M3S
Figure: a read aligned to the reference (chr20): 52M (52 aligned bases), 36890N (skipped region of the reference), 45M (45 aligned bases), 3S (3 soft-clipped bases)
Table: all CIGAR operations
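A CIGAR string can be split into (length, operation) pairs to work out how much of the reference and of the read it covers. This is a minimal sketch using only the operation letters defined in the SAM specification; the helper names are illustrative.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference or along the read (SAM specification).
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_READ = set("MIS=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    ops = [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]
    # Rebuilding the string catches characters the pattern silently skipped.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar: str) -> int:
    """Number of reference bases covered, skipped regions (N) included."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REFERENCE)


def read_length(cigar: str) -> int:
    """Number of read bases described, soft-clipped bases included."""
    return sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_READ)


print(parse_cigar("52M36890N45M3S"))
print(reference_span("52M36890N45M3S"), read_length("52M36890N45M3S"))
```

For the example, the read is 52 + 45 + 3 = 100 bases long but spans 52 + 36890 + 45 = 36987 reference bases, because the N operation jumps over an intron.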


SAM Fields: Paired-end information
Figure: first read and mate read aligned to the reference (chr20).
3 fields:
1) Chromosome: «=» if the paired read maps on the same chromosome, the other chromosome otherwise
2) Position: mapping position of the paired read
3) Template length: number of bases from the leftmost mapped base to the rightmost mapped base
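The template length definition above can be illustrated with a small sketch. It assumes 1-based inclusive coordinates for both reads and follows the SAM convention that the leftmost read gets a positive value and the rightmost a negative one; the function name and coordinates are hypothetical.

```python
def template_length(read_start: int, read_end: int,
                    mate_start: int, mate_end: int) -> int:
    """Signed template length for a read, given its mate (1-based, inclusive)."""
    leftmost = min(read_start, mate_start)
    rightmost = max(read_end, mate_end)
    length = rightmost - leftmost + 1
    # Positive for the leftmost read of the pair, negative for the other one.
    return length if read_start <= mate_start else -length


# Hypothetical pair: one read at 100-175, its mate at 300-375.
print(template_length(100, 175, 300, 375))
print(template_length(300, 375, 100, 175))
```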


SAM Fields: Optional tags
Examples of optional tags:
NM:i : edit distance between the read and the reference (number of mismatches, for example)
RG:Z : read group, used for example when several samples are merged into one BAM file
NH:i : number of reported alignments for that read
XS:A:+/- : strand, for stranded RNA-Seq libraries
AS:i : alignment score

SAM Fields: Optional tags
Alignment score AS:i (example: Bowtie)
The higher the score, the more similar the read sequence is to the reference sequence it is aligned to.
The score is calculated by subtracting penalties for each difference (mismatch, gap, etc.) and, in local alignment mode, adding bonuses for each match.
Example:
  ACGCGATCGGACTACCATCTAGCATCGACTGCGCATAC
  ACGCGATCGGACTAGCATCTAGCAT--ACTGCGCATAC
  Type        #   Score
  Matches
  Mismatches  1   -5
  Gap         2   -11 (= -5 -3 -3)
  Total           81
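The scoring scheme can be sketched for the two aligned sequences of the example. The mismatch penalty (5) and the gap penalties (5 to open, 3 per gapped position) are read from the slide; the match bonus of 2 is an assumption, so the total printed here is not expected to reproduce the slide's total. The function name is illustrative.

```python
def alignment_score(reference: str, read: str, match_bonus: int = 2,
                    mismatch_penalty: int = 5, gap_open: int = 5,
                    gap_extend: int = 3):
    """Score two already-aligned sequences of equal length ('-' marks a gap)."""
    if len(reference) != len(read):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    in_gap = False
    for ref_base, read_base in zip(reference, read):
        if ref_base == "-" or read_base == "-":
            if not in_gap:          # the opening penalty is paid once per gap
                score -= gap_open
                in_gap = True
            score -= gap_extend     # the extension penalty is paid per position
            counts["gap"] += 1
            continue
        in_gap = False
        if ref_base == read_base:
            score += match_bonus
            counts["match"] += 1
        else:
            score -= mismatch_penalty
            counts["mismatch"] += 1
    return score, counts


reference = "ACGCGATCGGACTACCATCTAGCATCGACTGCGCATAC"
read = "ACGCGATCGGACTAGCATCTAGCAT--ACTGCGCATAC"
print(alignment_score(reference, read))
```

With these parameters the example has 35 matches, 1 mismatch, and one gap of length 2, giving 35 x 2 - 5 - 11 = 54. Setting `match_bonus=0` mimics an end-to-end mode in which only penalties are applied.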

BAM Format
BAM: binary SAM (BGZF compressed)
BGZF is block compression implemented on top of the standard gzip file format. The goal of BGZF is to provide good compression while allowing efficient random access to the BAM file for indexed queries. The BGZF format is gunzip compatible, in the sense that a compliant gunzip utility can decompress a BGZF compressed file.
A BAM file can be:
1) Sorted by coordinates (or read names)
2) Indexed (bai file)

Converting SAM to BAM
SAMtools: Samtools is a suite of programs for interacting with high-throughput sequencing data.
SAM to BAM conversion:
> samtools view -Shb -o sample.bam sample.sam
Sorting the BAM file:
> samtools sort -o sample_sorted.bam sample.bam

Indexing BAM
Indexing aims to achieve fast retrieval of alignments overlapping a specified region without going through all the alignments. The BAM file must be sorted by reference ID and then by leftmost coordinate before indexing.
Indexing:
Indexing creates a .bai file next to each indexed BAM file.
Each index uses virtual file offsets into the BGZF file.
And after?
A sorted BAM file with its index makes it possible to answer queries such as "which reads overlap chrX:start-end?" without decompressing and traversing the whole BAM file.
Indexing the BAM file:
> samtools index sample_sorted.bam


RNA-Seq: Differential expression
Two main steps:
1) Counting reads on genes
2) Statistical analysis

RNA-Seq: Differential expression
Counting reads on genes
Figure: gene model with exons, introns, and junctions.


GFF Format
GFF (General Feature Format) lines are based on the Sanger GFF2 specification. GFF lines have nine required fields that must be tab-separated.
It is a text file with the following fields:
1. seqname - The name of the sequence (chromosome/scaffold)
2. source - The program that generated this feature
3. feature - Type of feature ("CDS", "start_codon", "stop_codon", "exon")
4. start - Starting position of the feature in the sequence (starts at 1)
5. end - Ending position of the feature (inclusive)
6. score - Score between 0 and 1000 (or "." if no value)
7. strand - '+', '-', or '.'
8. frame - If coding exon, frame should be 0-2: reading frame of the first base
9. group - All lines with the same group are linked together into a single item
GTF format: refinement of the GFF format

GTF Format Example
GTF (Gene Transfer Format) is a refinement of GFF that tightens the specification. The first eight GTF fields are the same as in GFF. The group field has been expanded into a list of attributes. Each attribute consists of a type/value pair.
  chr9 hg38_refgene stop_codon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene CDS gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene start_codon gene_id NM_020469; transcript_id NM_020469;
  chr9 hg38_refgene exon gene_id NM_020469; transcript_id NM_020469;
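A GTF line can be read by splitting the nine tab-separated fields and then breaking the attribute field into type/value pairs. The sketch below is a simplified parser, not a full implementation of the format: it assumes attribute values contain no semicolons, and the coordinates and strand in the example line are hypothetical (only the sequence name, source, feature type, and identifiers come from the example above).

```python
GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame")


def parse_attributes(text: str) -> dict:
    """Turn 'gene_id "X"; transcript_id "Y";' into a dictionary."""
    attributes = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return attributes


def parse_gtf_line(line: str) -> dict:
    """Parse one GTF line into a dictionary of its nine fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])   # 1-based
    record["end"] = int(record["end"])       # inclusive
    record["attributes"] = parse_attributes(fields[8])
    return record


# Hypothetical coordinates (1000-1200) and strand, for illustration only.
line = ('chr9\thg38_refgene\texon\t1000\t1200\t.\t-\t.\t'
        'gene_id "NM_020469"; transcript_id "NM_020469";')
record = parse_gtf_line(line)
print(record["feature"], record["end"] - record["start"] + 1,
      record["attributes"]["gene_id"])
```

Because start and end are 1-based and inclusive, the feature length is `end - start + 1`.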

RNA-Seq: Differential expression
Subread featureCounts
Inputs: gene model (GTF file) and alignments (BAM file)
Output: one count per gene (G1: count1, G2: count2, ..., Gn: countn)

RNA-Seq: Differential expression
Subread featureCounts options:
  -t gene : feature type to count
  -s 1 : stranded
  -a gtf_file : annotation file
  -o output_file : count file
«A read is said to overlap a feature if at least one read base is found to overlap the feature. For paired-end data, a fragment (or template) is said to overlap a feature if any of the two reads from that fragment is found to overlap the feature.»
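The overlap rule quoted above can be sketched as follows. This is an illustration of the rule only, not of how featureCounts is implemented: it assumes a single chromosome, 1-based inclusive coordinates, no strand handling, and it counts a fragment once for every feature it overlaps. All names and coordinates are hypothetical.

```python
def overlaps(read_start, read_end, feature_start, feature_end):
    """True if at least one base is shared (1-based, inclusive coordinates)."""
    return read_start <= feature_end and feature_start <= read_end


def count_fragments(fragments, features):
    """Count, per feature, the fragments with at least one overlapping read.

    fragments: list of fragments, each a list of (start, end) reads
               (one read for single-end data, two for paired-end data).
    features:  dict mapping a feature name to its (start, end) interval.
    """
    counts = {name: 0 for name in features}
    for reads in fragments:
        for name, (f_start, f_end) in features.items():
            if any(overlaps(r_start, r_end, f_start, f_end)
                   for r_start, r_end in reads):
                counts[name] += 1
    return counts


features = {"G1": (100, 200), "G2": (500, 600)}
fragments = [
    [(90, 100), (250, 300)],   # first read touches G1 on a single base
    [(201, 250), (300, 350)],  # starts right after G1: no overlap
    [(550, 560)],              # single-end read inside G2
]
print(count_fragments(fragments, features))
```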

RNA-Seq: Differential expression
Statistical analysis: DESeq2 (see next presentation)


Splicing analysis
3 levels of analysis:
Exon
Patterns
Full transcript


Splicing analysis: Exon level
We search for exons that are differentially included in the genes.
Tool: DEXSeq

Splicing analysis: DEXSeq
Counts per exon
Counts are modeled by a negative binomial distribution, with one model per gene.

Splicing analysis: DEXSeq

Splicing analysis: Pattern level
Different kinds of splicing patterns:
Alternative first exons
Alternative last exons
Cassette exons
Mutually exclusive exons
Intron retentions
Alternative acceptor splice sites
Alternative donor splice sites

Splicing analysis: Pattern level
Assign each read to a pattern
Compare counts between samples
Sort and list regulated patterns

Splicing analysis: Pattern level
Griffith et al., Nature Methods, 2010


Splicing analysis: Transcript level
Transcript assembly: Cufflinks, Trinity, Kissplice, etc.
Analysis: Cuffdiff, RSEM, EBSeq, BitSeq, etc.

Splicing analysis: Cufflinks
Assembly with Cufflinks
Output:
  transcripts.gtf
  isoforms.fpkm_tracking
  genes.fpkm_tracking

Splicing analysis: Cufflinks
Cuffcompare:
  Compares assembled transcripts with reference annotations
  Tracks transcripts between different replicates
  Output: output.stats, output.combined.gtf, output.tracking
Cuffdiff:
  Finds significant differences in transcript expression, splicing, and promoter usage
  Output: FPKM tracking files, count tracking files, read group tracking files, differential expression tests, differential splicing tests (splicing.diff)


Integrative Genomics Viewer
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
Helga Thorvaldsdóttir, James T. Robinson, Jill P. Mesirov. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14 (2013).


Integrative Genomics Viewer: Basic usage
Figure labels: organism, chromosome, genome positions, data tracks.

Integrative Genomics Viewer: Basic usage
Figure labels: gene annotations, read coverage, read alignments.

Integrative Genomics Viewer: Basic usage
Figure labels: gene annotations; SNV C->A (coverage: 73, %: 96%).

Integrative Genomics Viewer: Basic usage
Mapping information

Integrative Genomics Viewer: Basic usage
Annotations

Integrative Genomics Viewer: Advanced usage
Importing a new genome

Integrative Genomics Viewer: Advanced usage
Importing datasets


Integrative Genomics Viewer: Advanced usage
IGV Tools: computing coverage data


Integrative Genomics Viewer: RNA-Seq


RNA-Seq
Lots of applications: variant detection, gene expression, splicing analysis, fusion transcripts
Difficult analysis: no fully detailed recommendations for all analyses


More information

RNA-Seq Analysis. Simon Andrews, Laura v

RNA-Seq Analysis. Simon Andrews, Laura v RNA-Seq Analysis Simon Andrews, Laura Biggins simon.andrews@babraham.ac.uk @simon_andrews v2018-10 RNA-Seq Libraries rrna depleted mrna Fragment u u u u NNNN Random prime + RT 2 nd strand synthesis (+

More information

DNASeq: Analysis pipeline and file formats Sumir Panji, Gerrit Boha and Amel Ghouila

DNASeq: Analysis pipeline and file formats Sumir Panji, Gerrit Boha and Amel Ghouila DNASeq: Analysis pipeline and file formats Sumir Panji, Gerrit Boha and Amel Ghouila Bioinforma>cs analysis and annota>on of variants in NGS data workshop Cape Town, 4th to 6th April 2016 DNA Sequencing:

More information

RNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford

RNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Applications in Genome Studies Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Protocols Next generation sequencing protocol cdna, not RNA sequencing

More information

Introduction to RNAseq Analysis. Milena Kraus Apr 18, 2016

Introduction to RNAseq Analysis. Milena Kraus Apr 18, 2016 Introduction to RNAseq Analysis Milena Kraus Apr 18, 2016 Agenda What is RNA sequencing used for? 1. Biological background 2. From wet lab sample to transcriptome a. Experimental procedure b. Raw data

More information

Bioinformatics in next generation sequencing projects

Bioinformatics in next generation sequencing projects Bioinformatics in next generation sequencing projects Rickard Sandberg Assistant Professor Department of Cell and Molecular Biology Karolinska Institutet May 2013 Standard sequence library generation Illumina

More information

RNA-Seq Software, Tools, and Workflows

RNA-Seq Software, Tools, and Workflows RNA-Seq Software, Tools, and Workflows Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 1, 2016 Some mrna-seq Applications Differential gene expression analysis Transcriptional profiling Assumption:

More information

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1

BST 226 Statistical Methods for Bioinformatics David M. Rocke. March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1 BST 226 Statistical Methods for Bioinformatics David M. Rocke March 10, 2014 BST 226 Statistical Methods for Bioinformatics 1 NGS Technologies Illumina Sequencing HiSeq 2500 & MiSeq PacBio Sequencing PacBio

More information

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Abundance RNA is... Diverse Dynamic Central DNA rrna Epigenetics trna RNA mrna Time Protein Abundance

More information

Reference genomes and common file formats

Reference genomes and common file formats Reference genomes and common file formats Dóra Bihary MRC Cancer Unit, University of Cambridge CRUK Functional Genomics Workshop September 2017 Overview Reference genomes and GRC Fasta and FastQ (unaligned

More information

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010 Mapping Next Generation Sequence Reads Bingbing Yuan Dec. 2, 2010 1 What happen if reads are not mapped properly? Some data won t be used, thus fewer reads would be aligned. Reads are mapped to the wrong

More information

Quantifying gene expression

Quantifying gene expression Quantifying gene expression Genome GTF (annotation)? Sequence reads FASTQ FASTQ (+reference transcriptome index) Quality control FASTQ Alignment to Genome: HISAT2, STAR (+reference genome index) (known

More information

Reference genomes and common file formats

Reference genomes and common file formats Reference genomes and common file formats Overview Reference genomes and GRC Fasta and FastQ (unaligned sequences) SAM/BAM (aligned sequences) Summarized genomic features BED (genomic intervals) GFF/GTF

More information

Ecole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech

Ecole de Bioinforma(que AVIESAN Roscoff 2014 GALAXY INITIATION. A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech GALAXY INITIATION A. Lermine U900 Ins(tut Curie, INSERM, Mines ParisTech How does Next- Gen sequencing work? DNA fragmentation Size selection and clonal amplification Massive parallel sequencing ACCGTTTGCCG

More information

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis

Experimental Design. Sequencing. Data Quality Control. Read mapping. Differential Expression analysis -Seq Analysis Quality Control checks Reproducibility Reliability -seq vs Microarray Higher sensitivity and dynamic range Lower technical variation Available for all species Novel transcript identification

More information

C3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère

C3BI. VARIANTS CALLING November Pierre Lechat Stéphane Descorps-Declère C3BI VARIANTS CALLING November 2016 Pierre Lechat Stéphane Descorps-Declère General Workflow (GATK) software websites software bwa picard samtools GATK IGV tablet vcftools website http://bio-bwa.sourceforge.net/

More information

Transcriptome analysis

Transcriptome analysis Statistical Bioinformatics: Transcriptome analysis Stefan Seemann seemann@rth.dk University of Copenhagen April 11th 2018 Outline: a) How to assess the quality of sequencing reads? b) How to normalize

More information

Introduction to Next Generation Sequencing

Introduction to Next Generation Sequencing The Sequencing Revolution Introduction to Next Generation Sequencing Dena Leshkowitz,WIS 1 st BIOmics Workshop High throughput Short Read Sequencing Technologies Highly parallel reactions (millions to

More information

02 Agenda Item 03 Agenda Item

02 Agenda Item 03 Agenda Item 01 Agenda Item 02 Agenda Item 03 Agenda Item SOLiD 3 System: Applications Overview April 12th, 2010 Jennifer Stover Field Application Specialist - SOLiD Applications Workflow for SOLiD Application Application

More information

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl Next Generation Sequencing Bioinformatics small variants Data Analysis Guidelines genomescan.nl GenomeScan s Guidelines for Small Variant Analysis on NGS Data Using our own proprietary data analysis pipelines

More information

ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June Introduction

ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June Introduction INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC 1/SC 29/WG 11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC 1/SC 29/WG 11 N15527 Warsaw, CH June

More information

How to deal with your RNA-seq data?

How to deal with your RNA-seq data? How to deal with your RNA-seq data? Rachel Legendre, Thibault Dayris, Adrien Pain, Claire Toffano-Nioche, Hugo Varet École de bioinformatique AVIESAN-IFB 2017 1 Rachel Legendre Bioinformatics 27/11/2018

More information

RNA-Sequencing analysis

RNA-Sequencing analysis RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges

More information

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory

Bioinformatics Monthly Workshop Series. Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory Bioinformatics Monthly Workshop Series Speaker: Fan Gao, Ph.D Bioinformatics Resource Office The Picower Institute for Learning and Memory Schedule for Fall, 2015 PILM Bioinformatics Web Server (09/21/2015)

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing Complete Report Catalogue # and Service: IR16001 rrna depletion (human, mouse, or rat) IR11081 Total RNA Sequencing (80 million reads, 2x75 bp PE) Xxxxxxx - xxxxxxxxxxxxxxxxxxxxxx

More information

Applications of short-read

Applications of short-read Applications of short-read sequencing: RNA-Seq and ChIP-Seq BaRC Hot Topics March 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ Sequencing applications RNA-Seq includes experiments

More information

ChIP-seq and RNA-seq

ChIP-seq and RNA-seq ChIP-seq and RNA-seq Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions (ChIPchromatin immunoprecipitation)

More information

Long and short/small RNA-seq data analysis

Long and short/small RNA-seq data analysis Long and short/small RNA-seq data analysis GEF5, 4.9.2015 Sami Heikkinen, PhD, Dos. Topics 1. RNA-seq in a nutshell 2. Long vs short/small RNA-seq 3. Bioinformatic analysis work flows GEF5 / Heikkinen

More information

ChIP-seq and RNA-seq. Farhat Habib

ChIP-seq and RNA-seq. Farhat Habib ChIP-seq and RNA-seq Farhat Habib fhabib@iiserpune.ac.in Biological Goals Learn how genomes encode the diverse patterns of gene expression that define each cell type and state. Protein-DNA interactions

More information

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia

RNA-Seq Workshop AChemS Sunil K Sukumaran Monell Chemical Senses Center Philadelphia RNA-Seq Workshop AChemS 2017 Sunil K Sukumaran Monell Chemical Senses Center Philadelphia Benefits & downsides of RNA-Seq Benefits: High resolution, sensitivity and large dynamic range Independent of prior

More information

Introduction to NGS analyses

Introduction to NGS analyses Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015 Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 1

More information

Lecture 7. Next-generation sequencing technologies

Lecture 7. Next-generation sequencing technologies Lecture 7 Next-generation sequencing technologies Next-generation sequencing technologies General principles of short-read NGS Construct a library of fragments Generate clonal template populations Massively

More information

RNA

RNA RNA sequencing Michael Inouye Baker Heart and Diabetes Institute Univ of Melbourne / Monash Univ Summer Institute in Statistical Genetics 2017 Integrative Genomics Module Seattle @minouye271 www.inouyelab.org

More information

Introduction to bioinformatics (NGS data analysis)

Introduction to bioinformatics (NGS data analysis) Introduction to bioinformatics (NGS data analysis) Alexander Jueterbock 2015-06-02 1 / 45 Got your sequencing data - now, what to do with it? File size: several Gb Number of lines: >1,000,000 @M02443:17:000000000-ABPBW:1:1101:12675:1533

More information

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq

Sequencing applications. Today's outline. Hands-on exercises. Applications of short-read sequencing: RNA-Seq and ChIP-Seq Sequencing applications Applications of short-read sequencing: RNA-Seq and ChIP-Seq BaRC Hot Topics March 2013 George Bell, Ph.D. http://jura.wi.mit.edu/bio/education/hot_topics/ RNA-Seq includes experiments

More information

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail

Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer. Project XX1001. Customer Detail Analysis of neo-antigens to identify T-cell neo-epitopes in human Head & Neck cancer Project XX Customer Detail Table of Contents. Bioinformatics analysis pipeline...3.. Read quality check. 3.2. Read alignment...3.3.

More information

NGS Data Analysis and Galaxy

NGS Data Analysis and Galaxy NGS Data Analysis and Galaxy University of Pretoria Pretoria, South Africa 14-18 October 2013 Dave Clements, Emory University http://galaxyproject.org/ Fourie Joubert, Burger van Jaarsveld Bioinformatics

More information

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University

RNA-Seq. Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University RNA-Seq Joshua Ainsley, PhD Postdoctoral Researcher Lab of Leon Reijmers Neuroscience Department Tufts University joshua.ainsley@tufts.edu Day five Alternative splicing Assembly RNA edits Alternative splicing

More information

Short Read Alignment to a Reference Genome

Short Read Alignment to a Reference Genome Short Read Alignment to a Reference Genome Shamith Samarajiwa CRUK Summer School in Bioinformatics Cambridge, September 2018 Aligning to a reference genome BWA Bowtie2 STAR GEM Pseudo Aligners for RNA-seq

More information

Francisco García Quality Control for NGS Raw Data

Francisco García Quality Control for NGS Raw Data Contents Data formats Sequence capture Fasta and fastq formats Sequence quality encoding Quality Control Evaluation of sequence quality Quality control tools Identification of artifacts & filtering Practical

More information

Course Presentation. Ignacio Medina Presentation

Course Presentation. Ignacio Medina Presentation Course Index Introduction Agenda Analysis pipeline Some considerations Introduction Who we are Teachers: Marta Bleda: Computational Biologist and Data Analyst at Department of Medicine, Addenbrooke's Hospital

More information

Sequence Analysis 2RNA-Seq

Sequence Analysis 2RNA-Seq Sequence Analysis 2RNA-Seq Lecture 10 2/21/2018 Instructor : Kritika Karri kkarri@bu.edu Transcriptome Entire set of RNA transcripts in a given cell for a specific developmental stage or physiological

More information

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ),

Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ), Analysis of data from high-throughput molecular biology experiments Lecture 6 (F6, RNA-seq ), 2012-01-26 What is a gene What is a transcriptome History of gene expression assessment RNA-seq RNA-seq analysis

More information

SCALABLE, REPRODUCIBLE RNA-Seq

SCALABLE, REPRODUCIBLE RNA-Seq SCALABLE, REPRODUCIBLE RNA-Seq SCALABLE, REPRODUCIBLE RNA-Seq Advances in the RNA sequencing workflow, from sample preparation through data analysis, are enabling deeper and more accurate exploration

More information

Introduction to transcriptome analysis using High Throughput Sequencing technologies. D. Puthier 2012

Introduction to transcriptome analysis using High Throughput Sequencing technologies. D. Puthier 2012 Introduction to transcriptome analysis using High Throughput Sequencing technologies D. Puthier 2012 Transcriptome: the old school Cyanine 5 (Cy5) Cy-3: - Excitation 550nm - Emission 570nm Cy-5: - Excitation

More information

measuring gene expression December 5, 2017

measuring gene expression December 5, 2017 measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA

More information

UAB DNA-Seq Analysis Workshop. John Osborne Research Associate Centers for Clinical and Translational Science

UAB DNA-Seq Analysis Workshop. John Osborne Research Associate Centers for Clinical and Translational Science + UAB DNA-Seq Analysis Workshop John Osborne Research Associate Centers for Clinical and Translational Science ozborn@uab.,edu + Thanks in advance You are the Guinea pigs for this workshop! At this point

More information

Why QC? Next-Generation Sequencing: Quality Control. Illumina data format. Fastq format:

Why QC? Next-Generation Sequencing: Quality Control. Illumina data format. Fastq format: Why QC? Next-Generation Sequencing: Quality Control BaRC Hot Topics January 2017 Bioinformatics and Research Computing Whitehead Institute Do you want to include the reads with low quality base calls?

More information

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis

Data Basics. Josef K Vogt Slides by: Simon Rasmussen Next Generation Sequencing Analysis Data Basics Josef K Vogt Slides by: Simon Rasmussen 2017 Generalized NGS analysis Sample prep & Sequencing Data size Main data reductive steps SNPs, genes, regions Application Assembly: Compare Raw Pre-

More information

Statistical Genomics and Bioinformatics Workshop. Genetic Association and RNA-Seq Studies

Statistical Genomics and Bioinformatics Workshop. Genetic Association and RNA-Seq Studies Statistical Genomics and Bioinformatics Workshop: Genetic Association and RNA-Seq Studies RNA Seq and Differential Expression Analysis Brooke L. Fridley, PhD University of Kansas Medical Center 1 Next-generation

More information

Next-Generation Sequencing: Quality Control

Next-Generation Sequencing: Quality Control Next-Generation Sequencing: Quality Control Bingbing Yuan BaRC Hot Topics January 2017 Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Why QC? Do you want to

More information

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

L3: Short Read Alignment to a Reference Genome

L3: Short Read Alignment to a Reference Genome L3: Short Read Alignment to a Reference Genome Shamith Samarajiwa CRUK Autumn School in Bioinformatics Cambridge, September 2017 Where to get help! http://seqanswers.com http://www.biostars.org http://www.bioconductor.org/help/mailing-list

More information

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Transcriptomics analysis with RNA seq: an overview Frederik Coppens Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)

More information

measuring gene expression December 11, 2018

measuring gene expression December 11, 2018 measuring gene expression December 11, 2018 Intervening Sequences (introns): how does the cell get rid of them? Splicing!!! Highly conserved ribonucleoprotein complex recognizes intron/exon junctions and

More information

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 Alignment & Variant Discovery J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

Introduc)on to Genomics

Introduc)on to Genomics Introduc)on to Genomics Libor Mořkovský, Václav Janoušek, Anastassiya Zidkova, Anna Přistoupilová, Filip Sedlák h1p://ngs-course.readthedocs.org/en/praha-january-2017/ Genome The genome is the gene,c material

More information

Introduction to RNA-Seq in GeneSpring NGS Software

Introduction to RNA-Seq in GeneSpring NGS Software Introduction to RNA-Seq in GeneSpring NGS Software Dipa Roy Choudhury, Ph.D. Strand Scientific Intelligence and Agilent Technologies Learn more at www.genespring.com Introduction to RNA-Seq In a few years,

More information

Differential gene expression analysis using RNA-seq

Differential gene expression analysis using RNA-seq https://abc.med.cornell.edu/ Differential gene expression analysis using RNA-seq Applied Bioinformatics Core, August 2017 Friederike Dündar with Luce Skrabanek & Ceyda Durmaz Day 3 QC of aligned reads

More information

Introduction to RNA-Seq

Introduction to RNA-Seq Introduction to RNA-Seq Monica Britton, Ph.D. Bioinformatics Analyst September 2014 Workshop Overview of Today s Activities Morning RNA-Seq Concepts, Terminology, and Work Flows Two-Condition Differential

More information

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits Incorporating Molecular ID Technology Accel-NGS 2S MID Indexing Kits Molecular Identifiers (MIDs) MIDs are indices used to label unique library molecules MIDs can assess duplicate molecules in sequencing

More information

Deep Sequencing technologies

Deep Sequencing technologies Deep Sequencing technologies Gabriela Salinas 30 October 2017 Transcriptome and Genome Analysis Laboratory http://www.uni-bc.gwdg.de/index.php?id=709 Microarray and Deep-Sequencing Core Facility University

More information

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer. DNA Preparation and QC Extraction DNA was extracted from whole blood or flash frozen post-mortem tissue using a DNA mini kit (QIAmp #51104 and QIAmp#51404, respectively) following the manufacturer s recommendations.

More information

Galaxy Platform For NGS Data Analyses

Galaxy Platform For NGS Data Analyses Galaxy Platform For NGS Data Analyses Weihong Yan wyan@chem.ucla.edu Collaboratory Web Site http://qcb.ucla.edu/collaboratory http://collaboratory.lifesci.ucla.edu Workshop Outline ü Day 1 UCLA galaxy

More information

NGS part 2: applications. Tobias Österlund

NGS part 2: applications. Tobias Österlund NGS part 2: applications Tobias Österlund tobiaso@chalmers.se NGS part of the course Week 4 Friday 13/2 15.15-17.00 NGS lecture 1: Introduction to NGS, alignment, assembly Week 6 Thursday 26/2 08.00-09.45

More information

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS)

RNA-sequencing. Next Generation sequencing analysis Anne-Mette Bjerregaard. Center for biological sequence analysis (CBS) RNA-sequencing Next Generation sequencing analysis 2016 Anne-Mette Bjerregaard Center for biological sequence analysis (CBS) Terms and definitions TRANSCRIPTOME The full set of RNA transcripts and their

More information

Illumina Sequencing Error Profiles and Quality Control

Illumina Sequencing Error Profiles and Quality Control Illumina Sequencing Error Profiles and Quality Control RNA-seq Workflow Biological samples/library preparation Sequence reads FASTQC Adapter Trimming (Optional) Splice-aware mapping to genome Counting

More information

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics

Genomic Technologies. Michael Schatz. Feb 1, 2018 Lecture 2: Applied Comparative Genomics Genomic Technologies Michael Schatz Feb 1, 2018 Lecture 2: Applied Comparative Genomics Welcome! The primary goal of the course is for students to be grounded in theory and leave the course empowered to

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

High performance sequencing and gene expression quantification

High performance sequencing and gene expression quantification High performance sequencing and gene expression quantification Ana Conesa Genomics of Gene Expression Lab Centro de Investigaciones Príncipe Felipe Valencia aconesa@cipf.es Next Generation Sequencing NGS

More information

Canadian Bioinforma3cs Workshops

Canadian Bioinforma3cs Workshops Canadian Bioinforma3cs Workshops www.bioinforma3cs.ca Module #: Title of Module 2 1 Module 3 Expression and Differen3al Expression (lecture) Obi Griffith & Malachi Griffith www.obigriffith.org ogriffit@genome.wustl.edu

More information

Bioinformatics Core Facility IDENTIFYING A DISEASE CAUSING MUTATION

Bioinformatics Core Facility IDENTIFYING A DISEASE CAUSING MUTATION IDENTIFYING A DISEASE CAUSING MUTATION MARCELA DAVILA 2/03/2017 Core Facilities at Sahlgrenska Academy www.cf.gu.se 5 statisticians, 3 bioinformaticians Consultation 7-8 Courses / year Contact information

More information

Analytics Behind Genomic Testing

Analytics Behind Genomic Testing A Quick Guide to the Analytics Behind Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories 1 Learning Objectives Catalogue various types of bioinformatics analyses that support clinical

More information

Eucalyptus gene assembly

Eucalyptus gene assembly Eucalyptus gene assembly ACGT Plant Biotechnology meeting Charles Hefer Bioinformatics and Computational Biology Unit University of Pretoria October 2011 About Eucalyptus Most valuable and widely planted

More information

RNA-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland

RNA-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland RNA-seq data analysis with Chipster Eija Korpelainen CSC IT Center for Science, Finland chipster@csc.fi What will I learn? How to operate the Chipster software Short introduction to RNA-seq Analyzing RNA-seq

More information

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

Introduction to RNA sequencing

Introduction to RNA sequencing Introduction to RNA sequencing Bioinformatics perspective Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden November 2017 Olga (NBIS) RNA-seq November 2017 1 / 49 Outline Why sequence

More information

NGS in Pathology Webinar

NGS in Pathology Webinar NGS in Pathology Webinar NGS Data Analysis March 10 2016 1 Topics for today s presentation 2 Introduction Next Generation Sequencing (NGS) is becoming a common and versatile tool for biological and medical

More information

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist Whole Transcriptome Analysis of Illumina RNA- Seq Data Ryan Peters Field Application Specialist Partek GS in your NGS Pipeline Your Start-to-Finish Solution for Analysis of Next Generation Sequencing Data

More information

Wheat CAP Gene Expression with RNA-Seq

Wheat CAP Gene Expression with RNA-Seq Wheat CAP Gene Expression with RNA-Seq July 9 th -13 th, 2018 Overview of the workshop, Alina Akhunova http://www.ksre.k-state.edu/igenomics/workshops/ RNA-Seq Workshop Activities Lectures Laboratory Molecular

More information

Next-Generation Sequencing. Technologies

Next-Generation Sequencing. Technologies Next-Generation Next-Generation Sequencing Technologies Sequencing Technologies Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062

More information

Genomic resources. for non-model systems

Genomic resources. for non-model systems Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing

More information

RNA-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland

RNA-seq data analysis with Chipster. Eija Korpelainen CSC IT Center for Science, Finland RNA-seq data analysis with Chipster Eija Korpelainen CSC IT Center for Science, Finland chipster@csc.fi What will I learn? 1. What you can do with Chipster and how to operate it 2. What RNA-seq can be

More information

SNP calling and VCF format

SNP calling and VCF format SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide

More information

RNA-Seq analysis workshop

RNA-Seq analysis workshop RNA-Seq analysis workshop Zhangjun Fei Boyce Thompson Institute for Plant Research USDA Robert W. Holley Center for Agriculture and Health Cornell University Outline Background of RNA-Seq Application of

More information

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen

RNA Sequencing. Next gen insight into transcriptomes , Elio Schijlen RNA Sequencing Next gen insight into transcriptomes 05-06-2013, Elio Schijlen Transcriptome complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological

More information

De Novo Assembly of High-throughput Short Read Sequences

De Novo Assembly of High-throughput Short Read Sequences De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,

More information

Genome 373: Mapping Short Sequence Reads II. Doug Fowler

Genome 373: Mapping Short Sequence Reads II. Doug Fowler Genome 373: Mapping Short Sequence Reads II Doug Fowler The final Will be in this room on June 6 th at 8:30a Will be focused on the second half of the course, but will include material from the first half

More information