Introduction to NGS analyses

1 Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015 Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/ / 28

5 Overview: 1 Introduction (DNA sequencing, Applications); 2 Primary Analysis (ChIPseq, RNAseq); 3 Interpretation (Visualization, Analysis Software, Downstream Analysis); 4 Online Resources (ENCODE, GEO, ENA)

9 Introduction, DNA sequencing: Fred Sanger, late 1970s... Several million times... Late 2000s

10 Introduction, Applications (figure from Reuter et al, Mol Cell, 2015)

18 Introduction, ChIPseq: Genome Wide Occupancy Profiling. Overlapping reads come from different cells! NGS is a cell population readout (~50 million cells).

19 Introduction, ChIPseq: Primary Sequencing Data

21 Introduction, ChIPseq

22 Introduction, ChIPseq: genome wide, highly accurate representation of genomic states; fairly unbiased; complementary and overlapping information.

28 Primary Analysis, ChIPseq workflow (diagram, in build order): Raw Data; Aligned Data; Enriched Regions and Read Density; Visualization; Target Genes and Functional Analysis; Overlaps, Distribution and Motifs.

35 Primary Analysis, ChIPseq: Raw Data. Operations: DeMultiplexing, Quality Control (FastQC). Next step: Align Data...
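
The quality-control step can be illustrated with a minimal Python sketch. It is not FastQC and reproduces none of its reports; it only parses four-line FASTQ records and flags reads whose mean Phred score is low. The Phred+33 offset, the threshold of 20 and all function names are assumptions made for the example.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from an iterable of FASTQ lines.

    Assumes the simple four-line record layout (no wrapped sequences).
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at non-empty line {i + 1}")
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of one quality string (offset 33 is an assumption)."""
    return mean(ord(c) - offset for c in qual)


def flag_low_quality(records, min_mean_q=20):
    """Return the ids of reads whose mean Phred score is below min_mean_q."""
    return [rid for rid, _seq, qual in records if mean_phred(qual) < min_mean_q]


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!!"]
    print(flag_low_quality(parse_fastq(demo)))
```

In practice FastQC looks at many more properties (per-base quality, adapter content, duplication levels), so this is only a way to see what a quality string encodes.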

44 Primary Analysis, ChIPseq: Align Data. Aligner (Mapper): Bowtie 2 (fast, versatile, documented and supported). Reference Genome: Human (hg18, hg19), Mouse (mm9, mm10)... available from iGenomes, UCSC, NCBI, EBI... Different mappers use different index formats. Check the % of alignment: a low percentage points to contamination, missing adapter/barcode trimming, a wrong reference... or a bad library.
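
The "% of alignment" check can be sketched from the aligner's SAM output. This is a minimal sketch, not the summary that Bowtie 2 prints itself: it counts primary records and uses the SAM FLAG bit 0x4 (read unmapped). The function name and the choice to skip secondary and supplementary records are assumptions of the example.

```python
def alignment_rate(sam_lines):
    """Fraction of primary SAM records that are mapped.

    Header lines (starting with '@') are skipped. Secondary (0x100) and
    supplementary (0x800) records are ignored so each read is counted once
    per mate.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM alignment line has 11 mandatory fields
            continue
        flag = int(fields[1])
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    def rec(name, flag):
        # Hypothetical minimal record: only the FLAG matters for this sketch.
        return "\t".join([name, str(flag), "chr1", "100", "30", "4M", "*", "0", "0", "ACGT", "IIII"])

    sam = ["@HD\tVN:1.6", rec("r1", 0), rec("r2", 16), rec("r3", 0), rec("r4", 4), rec("r1", 256)]
    print(f"{alignment_rate(sam):.0%}")
```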

54 Primary Analysis, ChIPseq: Aligned Data (SAM, BAM, BED). Enriched Regions: a SET of GENOMIC COORDINATES (PEAKS...) with columns Chr, Start, Stop, ID, Score. Questions: How many? Where? With who? Function? HOW?
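
A peak set with the columns Chr, Start, Stop, ID, Score can be read and summarised in a few lines, which answers the "How many?" and "Where?" questions at the coarsest level. The sketch below assumes BED-style coordinates (0-based start, end-exclusive stop); the names `Peak`, `read_peaks` and `summarize` are hypothetical.

```python
from collections import Counter, namedtuple

Peak = namedtuple("Peak", "chrom start stop name score")


def read_peaks(lines):
    """Parse five-column BED-like lines into Peak records."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and browser/track lines
        chrom, start, stop, name, score = line.split()[:5]
        peaks.append(Peak(chrom, int(start), int(stop), name, float(score)))
    return peaks


def summarize(peaks):
    """Number of peaks, peaks per chromosome and mean peak width."""
    widths = [p.stop - p.start for p in peaks]
    return {
        "n_peaks": len(peaks),
        "per_chrom": dict(Counter(p.chrom for p in peaks)),
        "mean_width": sum(widths) / len(widths) if widths else 0.0,
    }


if __name__ == "__main__":
    demo = ["chr1\t100\t300\tpeak1\t50", "chr1\t1000\t1100\tpeak2\t10", "chr2\t5\t305\tpeak3\t7.5"]
    print(summarize(read_peaks(demo)))
```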

58 Primary Analysis, Peak Calling. Peak calling is a computational method used to identify areas in a genome that have been enriched with aligned reads. Challenge: there is no single peak shape... Peaks differ in number, distribution, height and range of measurements, and are subject to experimental variation (IP...).
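
To make "enriched with aligned reads" concrete, here is a deliberately naive fixed-window fold-enrichment scan on one chromosome. It is not the model used by MACS or SICER, and it ignores peak shape entirely; the window size, thresholds, pseudocount and function names are all assumptions chosen for illustration.

```python
from collections import Counter


def window_counts(positions, window=200):
    """Count read start positions per fixed-size window (window index -> count)."""
    return Counter(pos // window for pos in positions)


def call_enriched_windows(chip_pos, input_pos, window=200,
                          min_fold=4.0, min_reads=5, pseudocount=1.0):
    """Return (start, stop, chip_reads, fold) for windows enriched over INPUT.

    INPUT counts are scaled to the ChIP sequencing depth before the ratio
    is taken; the pseudocount avoids division by zero.
    """
    chip = window_counts(chip_pos, window)
    ctrl = window_counts(input_pos, window)
    scale = len(chip_pos) / len(input_pos) if input_pos else 1.0
    calls = []
    for w in sorted(chip):
        n = chip[w]
        fold = (n + pseudocount) / (ctrl.get(w, 0) * scale + pseudocount)
        if n >= min_reads and fold >= min_fold:
            calls.append((w * window, (w + 1) * window, n, round(fold, 2)))
    return calls


if __name__ == "__main__":
    chip_reads = [1000 + i for i in range(20)] + [5000, 9000]
    input_reads = list(range(0, 22000, 1000))
    print(call_enriched_windows(chip_reads, input_reads))
```

A fixed window cannot adapt to sharp versus broad signals, which is one reason dedicated callers exist for different kinds of targets.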

65 Primary Analysis, Peak Calling. Solutions: different peak callers, optimal parameters, correct experimental design, good ChIP... Peaks represent only a summary of a ChIPseq experiment... Useful metrics: FRiP, Fraction of Reads in Peaks (>1%); IDR, Irreproducible Discovery Rate; Significance: p-value, ChIP-to-INPUT ratio, number of reads...
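
FRiP is simple enough to compute by hand: the number of reads falling inside called peaks divided by the total number of reads. The sketch below assumes each read is reduced to a single (chromosome, position) point, that peaks are non-overlapping BED-style intervals (stop exclusive), and that the function name `frip` is only illustrative.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of Reads in Peaks.

    read_positions: list of (chrom, pos) tuples.
    peaks: list of (chrom, start, stop) tuples, assumed non-overlapping.
    """
    by_chrom = {}
    for chrom, start, stop in peaks:
        by_chrom.setdefault(chrom, []).append((start, stop))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in ivs] for c, ivs in by_chrom.items()}

    in_peaks = 0
    for chrom, pos in read_positions:
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Last peak whose start is <= pos; the read is inside if pos < stop.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions) if read_positions else 0.0


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
    reads = [("chr1", 150), ("chr1", 199), ("chr1", 200), ("chr1", 550),
             ("chr2", 150), ("chr1", 50), ("chr1", 300), ("chr1", 100)]
    print(frip(reads, peaks))
```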

67 Primary Analysis, Peak Calling. Empirical checks: Known Motif: enriched in peaks, central position. Distribution: near TSS, within gene body, enhancers. Target Genes: gene expression changes, functional annotation. Browser inspection and previously known sites.

75 Primary Analysis, Peak Calling. TFs (transcription factors): MACS. HMs (histone modifications): SICER. Running MACS: macs14 -t ChIP -c Input -g mm -n Name. Output shown: two Peak Model plots (x axis: Distance to the middle; y axis: Percentage; curves: forward tags, reverse tags, shifted tags; d=50 in one plot). WIG file: normalized read density, text format. bigWig file: binary read density format.
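
What a WIG file contains can be shown with a short sketch that bins read start positions and writes them as a fixedStep WIG block. This is not how MACS produces its own WIG output; the bin size, the reads-per-million style scaling and the function names are assumptions for the example.

```python
def binned_density(read_starts, chrom_length, bin_size=10, scale_to=1_000_000):
    """Count 0-based read starts per bin and scale so the bins sum to scale_to."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    total = sum(counts)
    factor = scale_to / total if total else 0.0
    return [c * factor for c in counts]


def to_fixedstep_wig(chrom, density, bin_size=10):
    """Render one chromosome as fixedStep WIG text (WIG positions are 1-based).

    If the chromosome length is not a multiple of bin_size, the last bin
    extends past the chromosome end and should be trimmed by the caller.
    """
    lines = [f"fixedStep chrom={chrom} start=1 step={bin_size} span={bin_size}"]
    lines += [f"{value:g}" for value in density]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    density = binned_density([0, 5, 12, 35], chrom_length=40, bin_size=10, scale_to=4)
    print(to_fixedstep_wig("chr1", density), end="")
```

bigWig stores the same kind of signal in an indexed binary form, which is why browsers load it faster than plain WIG text.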

77 Primary Analysis, ChIPseq Summary. General Guidelines: biological replicates (2x); INPUT (no-IP sample); [...] mln reads per sample; optimize ChIP conditions before library preparation; minimize experimental variation; look at the data. Further reading: "ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia", Landt et al, Genome Res, 2012; "Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data", Bailey et al, PLoS Comput Biol, 2013.

81 Primary Analysis, RNAseq workflow (diagram, in build order): Raw Data; Aligned Data; Read Counts (Condition 1, Condition 2); Differential Expression.

87 Primary Analysis, RNAseq: Aligner (Mapper). Reference Genome: Human (hg18, hg19), Mouse (mm9, mm10)... Spliced alignment. Build reference based on known transcripts. STAR: VERY fast... (45 million paired reads per hour per processor). TopHat2: if you want to use Cufflinks...
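
Spliced alignment shows up in the aligner output as `N` operations in the SAM CIGAR string, marking reference bases skipped by the read (the intron). The sketch below recovers intron coordinates from POS and CIGAR; it is independent of STAR or TopHat2, and the function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(pos, cigar):
    """Return introns as (start, end), 1-based inclusive reference coordinates.

    pos is the SAM POS field (1-based leftmost mapped base). Operations
    M, D, N, = and X consume reference bases; I, S, H and P do not.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in "MDN=X":
            ref += length
    return introns


if __name__ == "__main__":
    print(introns_from_cigar(100, "50M1000N50M"))
```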

90 Primary Analysis, RNAseq workflow with tools: Raw Data; Aligned Data; Read Counts per condition (htseq-count); Differential Expression (edgeR, DESeq).
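
The step from read counts to differential expression can be illustrated with counts-per-million scaling and a log2 fold change between two conditions. This is a simplification, not what edgeR or DESeq do: both fit negative binomial models across replicates and test for significance, which this sketch does not attempt. The pseudocount and function names are assumptions.

```python
import math


def cpm(counts):
    """Scale a {gene: raw_count} table to counts per million."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_changes(counts_a, counts_b, pseudocount=1.0):
    """log2(condition B / condition A) per gene on the CPM scale.

    Genes missing from condition B are treated as having zero counts.
    """
    cpm_a = cpm(counts_a)
    cpm_b = cpm(counts_b)
    return {
        gene: math.log2((cpm_b.get(gene, 0.0) + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


if __name__ == "__main__":
    cond1 = {"geneA": 100, "geneB": 300}
    cond2 = {"geneA": 300, "geneB": 100}
    for gene, lfc in log2_fold_changes(cond1, cond2).items():
        print(gene, round(lfc, 2))
```

Without replicates there is no estimate of variability, so a fold change alone says nothing about significance.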

91 Primary Analysis, RNAseq Design: Raw Data

101 Interpretation, Visualization. Local: IGV (Integrative Genomics Viewer): cross platform (Win, Mac, Linux), different file formats, fast. Online: UCSC Genome Browser: web based, slow, comprehensive database.

102 Interpretation, Visualization. [Figure: UCSC Genome Browser view of the Meis1 locus, mm9 chr11:18,780,000-18,920,000 (50 kb scale bar). Tracks: STS Markers; UCSC Genes; DNaseI HS peaks and signal for G1E and G1E-ER4 (estradiol 24 hr), Rep 2, from ENCODE/PSU; TFBS ChIP-seq peaks and signal for CH12 (CTCF, p300 SC-584, Pol2, c-jun) and MEL (CTCF, GATA-1, p300 SC-584, Pol2) from ENCODE/Stanford/Yale; Placental Mammal Basewise Conservation by PhyloP; Multiz Alignments of 30 Vertebrates; RepeatMasker; SNPs (dbSNP build 128).]

103 Interpretation, Visualization (same UCSC Genome Browser figure as the previous slide). Local: GenomeGraphs (R package). Online: UCSC Genome Browser: web based, slow, comprehensive database.

104 Interpretation, Visualization. Local: GenomeGraphs (R package). Online: Ariadne (GeneViewer).

111 Interpretation, Analysis Software. GALAXY server: online collection of widely used bioinformatics tools (e.g. FastQC, Bowtie 2, Cufflinks, MACS, SICER...). Learn Galaxy... Chipster: user-friendly analysis software for high-throughput data (GUI).

122 Interpretation, Downstream Analysis. Genomic Regions Enrichment of Annotations Tool (GREAT): functional annotation of cis-regulatory regions. Input: BED file. Output: list of potential target genes, peak distribution plots, gene set enrichment analysis (GO, Pathways, Phenotypes...). http://bejerano.stanford.edu/great/public/html/index.php MEME-ChIP: motif discovery, motif enrichment analysis and clustering. Input: FASTA file. Output: enriched motifs, motif distribution plots.
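
GREAT takes the peaks as a BED file while MEME-ChIP takes sequences as FASTA, so the peak coordinates have to be turned into sequence first. The sketch below cuts a fixed window around each peak midpoint out of an in-memory genome. The `genome` dictionary, the `flank` value and the function name are hypothetical; with real data the sequences would come from the reference FASTA of the same assembly used for alignment.

```python
def peaks_to_fasta(peaks, genome, flank=50):
    """Return FASTA text with one record per peak, centred on the peak midpoint.

    peaks: iterable of (chrom, start, stop, name), BED-style coordinates.
    genome: dict mapping chromosome name to its sequence string.
    Windows are clipped at chromosome ends; headers carry 0-based,
    end-exclusive coordinates of the extracted window.
    """
    records = []
    for chrom, start, stop, name in peaks:
        seq = genome[chrom]
        mid = (start + stop) // 2
        lo = max(0, mid - flank)
        hi = min(len(seq), mid + flank)
        records.append(f">{name} {chrom}:{lo}-{hi}\n{seq[lo:hi]}\n")
    return "".join(records)


if __name__ == "__main__":
    toy_genome = {"chr1": "ACGT" * 10}  # 40 bp toy chromosome
    print(peaks_to_fasta([("chr1", 10, 20, "p1")], toy_genome, flank=4), end="")
```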

124 Online Resources, ENCODE (Encyclopedia of DNA Elements). Aim: build a comprehensive parts list of functional elements in the human genome. New website is extremely easy to use!

125 Online Resources, ENCODE (Encyclopedia of DNA Elements)

139 Online Resources, GEO (Gene Expression Omnibus). GEO is a public functional genomics data repository. All NGS datasets have to be deposited to GEO prior to publication. All datasets included in GEO are publicly available for academic use. The search function is pretty close to awful. The best way to query the GEO database is by accession number, as declared in the manuscript (e.g. GSE30142, GSM746581). FTP access to the SRA (Sequence Read Archive) experiment.
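
Since accession numbers are the reliable entry point, it helps to pull them out of a manuscript's data availability text. The sketch below finds GEO accessions and labels them by prefix (GSE for series, GSM for samples, GPL for platforms, GDS for curated datasets). It does not contact GEO, and the function name is hypothetical.

```python
import re

GEO_PREFIXES = {
    "GSE": "series",
    "GSM": "sample",
    "GPL": "platform",
    "GDS": "dataset",
}
ACCESSION = re.compile(r"\b(GSE|GSM|GPL|GDS)(\d+)\b")


def find_geo_accessions(text):
    """Return (accession, record_type) pairs in order of appearance."""
    return [(m.group(0), GEO_PREFIXES[m.group(1)]) for m in ACCESSION.finditer(text)]


if __name__ == "__main__":
    sentence = "Data are available under GSE30142 (e.g. sample GSM746581)."
    print(find_geo_accessions(sentence))
```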

144 Online Resources, ENA (European Nucleotide Archive). ENA provides a comprehensive record of the world's nucleotide sequencing information. It covers raw sequencing data, sequence assembly information and functional annotation. The search function is pretty close to awful. Large overlap with the GEO database.

145 Comprehensive database of genomic features. Systemic understanding of biological processes.

146 Thank you...