Pathway analysis. Martina Kutmon. Department of Bioinformatics, Maastricht University, The Netherlands

1 Pathway analysis Martina Kutmon Department of Bioinformatics, Maastricht University, The Netherlands Utrecht, 11 September 2013

2 Why Pathway Analysis? 2

3 Why Pathway Analysis?
Workflow: Scanned microarrays -> Image analysis -> Raw intensities -> QC and normalization -> Normalized intensities -> Statistical analysis -> Gene level statistics
Gene level statistics feed three analyses:
- Clustering: set of co-regulated genes
- GO analysis: set of overrepresented GO terms
- Pathway analysis: set of affected pathways
All three lead to biological interpretation.

4 Why Pathway Analysis? Hmgcr Dgat1 Ldlr Mttp Soat1 Lipc Pltp Lcat 4

5 Why Pathway Analysis? A picture is worth a thousand words. Intuitive Puts data into biological context More efficient than looking up single gene information 5

6 Why Pathway Analysis?
- Involvement in pathways: group genes, proteins and other biological molecules
- Reducing complexity: several hundred pathways instead of thousands of genes
- Analysis on functional level: identify active pathways that differ between two conditions
- Higher explanatory power than a simple list of genes

7 Pathway Analysis Methods
- Overrepresentation analysis (ORA)
- Functional Class Scoring (FCS)
- Pathway Topology (PT) based
Khatri, Purvesh, Marina Sirota, and Atul J. Butte. "Ten years of pathway analysis: current approaches and outstanding challenges." PLoS Computational Biology 8.2 (2012): e

8 Pathway Analysis Methods
Overrepresentation analysis:
- Define the input list through criteria
- Count the genes in the pathway for the input list and for the background list
- Perform a statistical test for over- or under-representation (e.g. hypergeometric test)
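The hypergeometric test named above can be sketched in a few lines. This is a minimal illustration, not PathVisio code: the function name `ora_p_value` is hypothetical, and the counts are illustrative. N is the number of measured genes (background), R the size of the input list, n the number of measured genes in the pathway, and r the number of input-list genes in the pathway. The p-value is the probability of seeing r or more input-list genes in the pathway by chance.

```python
from math import comb


def ora_p_value(N, R, n, r):
    """Upper-tail hypergeometric p-value P(X >= r) for over-representation.

    N: genes measured (background), R: genes in the input list,
    n: measured genes in the pathway, r: input-list genes in the pathway.
    """
    total = comb(N, n)  # all ways to pick the n pathway genes from N
    upper = min(n, R)   # the overlap cannot exceed either set size
    hits = sum(comb(R, k) * comb(N - R, n - k) for k in range(r, upper + 1))
    return hits / total


# Illustrative counts (assumed for this sketch)
p = ora_p_value(N=35, R=9, n=12, r=6)
print(f"p = {p:.4f}")
```

A small p-value suggests that the pathway contains more input-list genes than expected by chance. Testing for under-representation would sum the lower tail instead.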

9 Pathway Analysis Methods
1. Total number of genes measured in experiment = N
2. Set criteria to define total number of differentially expressed genes = R
3. Total number of genes in pathway that are measured in experiment = n
4. Number of genes changed in the pathway = r
Example: N = 35 genes, R = 9 genes, n = 12 genes, r = 6 genes
Z-Score =
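The formula after "Z-Score =" is not preserved in this transcript. The sketch below assumes the z-score commonly described for this kind of overrepresentation analysis: the observed count r minus its expected value n·R/N, divided by the standard deviation of the hypergeometric distribution. Treat the formula and the function name `pathway_z_score` as assumptions rather than the slide's exact content.

```python
from math import sqrt


def pathway_z_score(N, R, n, r):
    """Z-score of r changed genes in a pathway (assumed hypergeometric form).

    expected = n * R / N
    variance = n * (R/N) * (1 - R/N) * (1 - (n - 1) / (N - 1))
    """
    fraction_changed = R / N
    expected = n * fraction_changed
    variance = (n * fraction_changed * (1 - fraction_changed)
                * (1 - (n - 1) / (N - 1)))
    return (r - expected) / sqrt(variance)


# Counts from the slide: N = 35, R = 9, n = 12, r = 6
z = pathway_z_score(N=35, R=9, n=12, r=6)
print(f"Z-Score = {z:.2f}")  # prints "Z-Score = 2.34"
```

Under this assumed formula, about 3.1 of the 12 pathway genes would be expected to change by chance. The 6 observed therefore give a z-score of roughly 2.34.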

10 Pathway Analysis Methods
What does the Z-Score tell you?
- Z-Score > 1.96: significantly more genes are changed in the pathway compared to the complete data set
- Z-Score = 0: the distribution of changed genes in the pathway is the same as in the complete data set
- Z-Score < -1.96: significantly fewer genes are changed in the pathway compared to the complete data set
BUT...
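The 1.96 cutoff corresponds to a two-sided p-value of about 0.05 if the z-score is treated as approximately standard normal. That is an approximation, and it is rough for small pathways. The helper names below are hypothetical and serve only to make the interpretation rules concrete.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided p-value for z under a standard normal approximation."""
    return erfc(abs(z) / sqrt(2))


def interpret(z, cutoff=1.96):
    """Apply the interpretation rules for a pathway z-score."""
    if z > cutoff:
        return "over-represented"
    if z < -cutoff:
        return "under-represented"
    return "not significantly different"


for z in (2.34, 0.0, -2.5):
    print(f"z = {z:+.2f}: {interpret(z)} (p ~ {two_sided_p(z):.3f})")
```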

11 Pathway Analysis Methods Be aware! Overrepresentation analysis and functional class scoring DO NOT take pathway topology into account! Always verify the pathways manually to draw the right conclusions!

12 Pathway Analysis with PathVisio 12

13 PathVisio a tool to edit and analyze biological pathways 13

14 PathVisio
What can you do with PathVisio?
- Draw your pathways
- Visualize your data on pathways
- Find pathways regulated in your data set
New PathVisio 3 version since March 2013
- Modular system that can be extended by plugins
- Plugin manager: one-click installation of plugins from the PathVisio repository

15 PathVisio application Toolbar: drawing objects, layout, view, visualization Pathway display area Status bar: Loaded databases, data sets Sidepanel: Drawing, Editing, Backpage, Search, Legend 15

16 PathVisio plugins
- Currently 9 plugins in the repository
- 10 plugins in pre-release testing state (will be submitted in the next few months)
- Open source developers all over the world
- 3 students in Google Summer of Code 2013: WikiPathways plugin, SBML plugin, PathBuilder plugin

17 Plugin Repository 17

18 PathVisio plugins Current and soon to be released plugins 18

19 PathVisio plugin manager 19

20 PathVisio walkthrough 20

21 PathVisio Walkthrough
0 Load identifier mapping databases
1 Draw pathways: create new pathway; share/upload to WikiPathways or download from WikiPathways
2 Visualize data (on a pathway drawn in PathVisio or downloaded from WikiPathways): load dataset, create data visualization, visualize data on pathway, export pathway image with data
3 Pathway statistics: select pathway collection, find regulated pathways

22 0 Load identifier mapping databases
- Identifier mapping databases from
- Download bridge files from
- Gene products (based on Ensembl)
- Metabolites (based on HMDB)
- Data -> Select Gene/Metabolite Database
- Check status bar!

23 Identifier Mapping Annotation of data nodes and interactions Xref identifier + data source 23

24 Identifier Mapping Annotation How does it work in PathVisio? ID Mapping 24
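The idea of an Xref (identifier plus data source) and of mapping between data sources can be illustrated with a toy lookup table. This is only a sketch under assumptions: the `Xref` class, the `MAPPINGS` table and `map_xref` are hypothetical stand-ins for an identifier mapping database, and the identifiers are illustrative examples rather than entries taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Xref:
    """An annotation: an identifier together with its data source."""
    identifier: str
    data_source: str


# Hypothetical hand-written mapping table; a real mapping database
# holds many thousands of such cross-references.
MAPPINGS = {
    Xref("3156", "Entrez Gene"): {Xref("ENSG00000113161", "Ensembl")},
    Xref("3949", "Entrez Gene"): {Xref("ENSG00000130164", "Ensembl")},
}


def map_xref(xref, target_source):
    """Return all known equivalents of xref in the target data source."""
    return {x for x in MAPPINGS.get(xref, set())
            if x.data_source == target_source}


node_annotation = Xref("3156", "Entrez Gene")
print(map_xref(node_annotation, "Ensembl"))
```

Because a data node and a data set row can use different data sources, this kind of mapping is what allows measured values to be matched to pathway elements.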

25 1 Draw / Edit Pathways Default set of drawing objects Plugins: MIM SBGN PathwayValidator MAPPBuilder PathBuilder 25

26 Draw/Edit Pathways Share your pathways on WikiPathways Plugin: wp-client Search Browse Upload Update GSoC 2013 project until September 23 Student: Sravanthi Sinha 26

27 2 Data Visualization
Load dataset
- Quantitative data
- Comma-separated file (.csv or .txt)
- Identifier column
- System code column (1-2 letter code) giving the data source (optional)
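A data set in the described layout could look like the sample below. The column names, identifiers and values are all made up for illustration. The system codes are assumed to follow the usual 1-2 letter convention (for example `L` for Entrez Gene and `Ch` for HMDB). The loader is a plain Python sketch for checking such a file before import, not part of PathVisio.

```python
import csv
import io

# Hypothetical file content: column names and values are made up.
SAMPLE = """identifier,system_code,logfc,p_value
3156,L,1.8,0.003
3949,L,-0.4,0.210
HMDB0000067,Ch,0.9,0.040
"""


def load_dataset(text):
    """Parse comma-separated text and check the 1-2 letter system codes."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        code = row["system_code"].strip()
        if not 1 <= len(code) <= 2:
            raise ValueError(f"unexpected system code: {code!r}")
        rows.append({
            "identifier": row["identifier"].strip(),
            "system_code": code,
            "logfc": float(row["logfc"]),
            "p_value": float(row["p_value"]),
        })
    return rows


dataset = load_dataset(SAMPLE)
for entry in dataset:
    print(entry)
```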

28 Data Visualization
Create visualization
- Basic: one value (e.g. logfc)
- Advanced: multiple values (e.g. logfc + p.value)
- Gradient-based visualization: logfc
- Rule-based visualization: p.value
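The difference between the two visualization types can be sketched as two small color functions. A gradient maps a continuous value such as the log fold change onto a color scale. A rule assigns a fixed color when a condition such as p.value < 0.05 holds. The function names, color choices and cutoffs below are assumptions for illustration and do not reflect PathVisio's internals.

```python
def gradient_color(value, low=-2.0, high=2.0):
    """Map a value onto a blue-white-red gradient, returned as (R, G, B).

    Values are clamped to [low, high]; the midpoint is white.
    """
    mid = (low + high) / 2
    v = max(low, min(high, value))
    if v < mid:
        t = (v - low) / (mid - low)  # 0 at low (blue), 1 at mid (white)
        return (round(255 * t), round(255 * t), 255)
    t = (v - mid) / (high - mid)     # 0 at mid (white), 1 at high (red)
    return (255, round(255 * (1 - t)), round(255 * (1 - t)))


def rule_color(p_value, cutoff=0.05):
    """Rule-based coloring: green if the rule p_value < cutoff holds."""
    return (0, 255, 0) if p_value < cutoff else (200, 200, 200)


# Hypothetical measurements for one data node
print(gradient_color(1.8), rule_color(0.003))
```

In an advanced visualization, both colors would be drawn side by side on the same data node.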

29 Visualize data on pathway Data Visualization 29

30 Data Visualization Export high quality pathway image with data for publications Plugins: HTMLexport plugin BioPAX plugin 30

31 3 Pathway Statistics Download pathway collection from WikiPathways 20 different species Find regulated pathways Default method: Z-Score statistics 31
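Finding regulated pathways amounts to applying a criterion to the data set, counting changed genes per pathway, and ranking the pathways by score. The sketch below is a toy version. The gene names are taken from an earlier slide, but the values, the pathway memberships and the criterion are hypothetical. The z-score uses the commonly described hypergeometric form, which is an assumption here.

```python
from math import sqrt

# Hypothetical log fold changes and pathway memberships.
LOG_FC = {"Hmgcr": 1.5, "Ldlr": 1.2, "Soat1": 1.1, "Dgat1": 0.1,
          "Mttp": -0.2, "Lipc": 0.3, "Pltp": -0.1, "Lcat": 0.0}
PATHWAYS = {
    "Pathway A": {"Hmgcr", "Ldlr", "Soat1", "Dgat1"},
    "Pathway B": {"Mttp", "Lipc", "Pltp", "Lcat"},
}


def z_score(N, R, n, r):
    """Assumed hypergeometric z-score for r changed genes in a pathway."""
    f = R / N
    return (r - n * f) / sqrt(n * f * (1 - f) * (1 - (n - 1) / (N - 1)))


def rank_pathways(data, pathways, criterion):
    """Return (name, r, n, z) tuples sorted by descending z-score."""
    measured = set(data)
    changed = {gene for gene, value in data.items() if criterion(value)}
    N, R = len(measured), len(changed)
    results = []
    for name, genes in pathways.items():
        in_pathway = measured & genes
        n, r = len(in_pathway), len(in_pathway & changed)
        # Skip cases where the variance would be zero.
        if n == 0 or n == N or R == 0 or R == N:
            continue
        results.append((name, r, n, z_score(N, R, n, r)))
    return sorted(results, key=lambda item: item[3], reverse=True)


ranked = rank_pathways(LOG_FC, PATHWAYS, lambda value: abs(value) > 1)
for name, r, n, z in ranked:
    print(f"{name}: {r}/{n} changed, z = {z:.2f}")
```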

32 Pathway Statistics
Default method: Z-Score statistics
Plugins:
- GSEA: gene set enrichment analysis
- GeneOntology analysis: find overrepresented GO terms

33 PathVisio Workflow Integration Use PathVisio functionality from R, Perl, Python,... Perform analysis for many data sets PathVisioRPC Multiple data sets... Statistics Results Pathway Figures 33

34 Pathway Data Visualization 34

35 Tissue expression data
Jennen, Danyel GJ, et al. "Biotransformation pathway maps in WikiPathways enable direct visualization of drug metabolism related expression changes." Drug Discovery Today (2010):
Meta pathway for biotransformation showing basal gene expression data from human tissues: (i) kidney cortex; (ii) kidney medulla; (iii) liver; (iv) lung

36 Multi-omics data visualization Visualize multiple experiments together on the pathway 36

37 Time series data
Tisoncik, Jennifer R., et al. "Into the eye of the cytokine storm." Microbiology and Molecular Biology Reviews 76.1 (2012):
Each column in a datanode represents one time point

38 Acknowledgements
Many developers/collaborators around the world!
- Martijn van Iersel (General Bioinformatics)
- Thomas Kelder (TNO)
- Anwesha Bohler (Maastricht University)
- Nuno Nunes (Maastricht University)

39 Questions? 39