Protein-Protein-Interaction Networks. Ulf Leser, Samira Jaeger


Student assistant (SHK) position open
- Starting 1 September 2015, 2 years, 41 h/month
- Joint project MaptTorNet: pancreatic endocrine tumors
- In particular: statistical processing and integration of large biomedical data sets (transcriptome, genome, proteome)
- See the WBI homepage

Ulf Leser: Bioinformatics, Summer Semester 2015

This Lecture
- Protein-protein interactions
  - Characteristics
  - Experimental detection methods
  - Databases
- Biological networks

Motivation
- Interaction: physical binding of two or more proteins
  - E.g. signal transduction, gene regulation, metabolism, ...
  - Transient or permanent
  - Directed effect (regulates), undirected (binds), specific (activates)
- Changes in protein structure may hinder binding and thus perturb natural cellular processes
  - Influence on all downstream proteins, i.e., proteins reachable through a path of interactions
- Interactome: set of all PPIs in a cell (cell type, species, ...)
- Complex: permanent binding of two or more proteins

Context-dependency
- PPIs are often context-dependent
  - Cell type, cell cycle phase and state
  - Environmental conditions
  - Developmental stage
  - Protein modification
  - Presence of cofactors and other binding partners
- Disregarded by many PPI detection methods
  → Low quality of typical data sets

Experimental detection methods
- PPIs have been studied extensively using different experimental methods
- Many are small-scale: two given proteins in a given condition
- High-throughput methods
  - Yeast two-hybrid assays (Y2H)
  - Tandem affinity purification and mass spectrometry (TAP-MS)

Yeast two-hybrid screens
- Test whether protein A (bait) interacts with protein B (prey)
- Choose a transcription factor T and a reporter gene G such that
  - If the activated T binds to the promoter of G, G is expressed
  - Expression of G can be measured
  - T can be split into two domains: DNA binding and activation
- Bait is fused to the DNA-binding domain (BD) of T
- Prey is fused to the activation domain (AD) of T
- Both are expressed in genetically engineered yeast cells
- If A binds to B, T is assembled and G is expressed
[Figure: bait protein fused to BD, prey protein fused to AD; RNA polymerase at the promoter; transcription of the reporter gene]

Properties (Y2H)
- Advantages
  - Throughput: many preys can be tested with the same bait (and vice versa)
  - Can be automated → high coverage of the interactome
  - Readout can be very sensitive
- Problems
  - High rate of false positives (up to 50%)
  - Artificial environment: yeast cells
    - No post-translational modifications
    - No protein transport
    - Unclear whether the proteins are ever expressed at the same time in vivo
    - ...
  - Fusion influences binding behavior → false negatives

Tandem affinity purification and mass spectrometry
1. Tag the protein of interest (bait)
2. Protein binds in its natural environment
3. Complexes are fished by affinity chromatography
4. Purification
5. Purified protein complexes
6. Identification of associated proteins by mass spectrometry

Properties (TAP-MS)
- Advantages
  - Can capture PPIs under natural conditions (almost: the tag is artificial)
  - A single bait can detect many interactions in one experiment
  - Few false positives
- Disadvantages
  - Tag may hinder PPI → false negatives
  - Purification and MS are delicate processes
  - MS is difficult because the input is a mixture of different proteins
  - Individual complexes are not identified
  - Internal structure of a complex is not resolved

Matrix / Spokes Model
- Direct interactions cannot be distinguished from interactions mediated by other proteins in a complex
- Matrix model: infers interactions between all N proteins of a purified complex: $N(N-1)/2$
- Spokes model: infers only interactions between the bait and the co-purified proteins: $N-1$

# Proteins   Matrix   Spokes
4            6        3
10           45       9
80           3160     79
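
The two counting rules can be checked with a short Python sketch. The function names and the example protein names (`Bait`, `P1`, ...) are illustrative assumptions and do not come from the lecture.

```python
from itertools import combinations


def matrix_model_edges(n):
    """Number of interactions inferred by the matrix model: all pairs among n proteins."""
    return n * (n - 1) // 2


def spokes_model_edges(n):
    """Number of interactions inferred by the spokes model: bait with each other protein."""
    return n - 1


def matrix_pairs(members):
    """All unordered pairs of proteins in one purified complex."""
    return list(combinations(sorted(members), 2))


def spokes_pairs(bait, members):
    """Only the pairs (bait, co-purified protein)."""
    return [(bait, p) for p in sorted(members) if p != bait]


# Hypothetical purification result: one bait and three co-purified proteins.
purified = {"Bait", "P1", "P2", "P3"}
print(matrix_pairs(purified))
print(spokes_pairs("Bait", purified))

for n in (4, 10, 80):
    print(n, matrix_model_edges(n), spokes_model_edges(n))
```

The gap between the two models grows quadratically with complex size, which is why the choice of model strongly affects the density of the resulting network.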

PPI Databases [KP10]
- There are >300 DBs related to PPI and pathways
  - See http://www.pathguide.org
- Manually curated source DBs (blue)
  - Gather data from low-throughput methods
- DBs integrating other DBs and HT data sets (red)
- Predicted interactions (yellow)
- Pathway DBs (green)
[Figure: overview of PPI and pathway databases, color-coded by the categories above]

A Mess [KP10]
- Different definitions of a PPI
  - Binary, physical interaction
  - Complexes, pairs, pathways
  - Transient, functional association
- Consistency: some integrated DBs have imported more data than there is in the sources
- Databases overlap to varying degrees
- Different reliability of content
- Literature-curated DBs do not guarantee higher quality than high-throughput experiments [CYS08]
  - Re-annotation reveals inconsistencies, subjective judgments, errors in gene name assignment, ...

Concrete Examples

Experimentally verified:

Database   Species          Proteins   Interactions
IntAct     No restriction   53,276     271,764
BioGrid    No restriction   30,712     131,638
DIP        No restriction   23,201     71,276
MINT       No restriction   31,797     90,505
HPRD       Human only       30,047     39,194
MMPPI      Mammals

Experimentally verified and/or predicted:

Database   Species                Proteins    Interactions
STRING     No restriction (630)   2,590,259
UniHI      Human only
OPID       Human only

This Lecture
- Protein-protein interactions
- Biological networks
  - Scale-free graphs
  - Cliques and dense subgraphs
  - Centrality and diseases

Some Fundamental Observations
- Proteins that are close in the network share function more frequently
- Central proteins are vital
- Complexes form dense subgraphs
- Functional modules are subgraphs
- Certain subgraphs can be found significantly more often than expected by chance (why?)

Protein-protein interaction networks
- Networks are represented as undirected graphs
- Definition of a graph: G = (V,E)
  - V is the set of nodes (proteins)
  - E is the set of binary, undirected edges (interactions)
- Computational representation
  - Adjacency lists
    { (A,C), (A,D), (B,D), (C,A), (C,D), (D,B), (D,C), (D,A) }
  - Adjacency matrix

        A  B  C  D
    A   0  0  1  1
    B   0  0  0  1
    C   1  0  0  1
    D   1  1  1  0
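
Both representations can be derived from the same undirected edge list. The following Python sketch uses the four-node example graph from this slide; the function names are illustrative assumptions.

```python
from collections import defaultdict

# Undirected edges of the example graph with nodes A, B, C, D.
edges = [("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]


def to_adjacency_list(edge_list):
    """Map every node to the sorted list of its neighbors."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)  # store both directions because edges are undirected
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in sorted(adj.items())}


def to_adjacency_matrix(edge_list):
    """Return the sorted node labels and a symmetric 0/1 matrix."""
    nodes = sorted({n for edge in edge_list for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edge_list:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return nodes, matrix


adjacency_list = to_adjacency_list(edges)
nodes, adjacency_matrix = to_adjacency_matrix(edges)
print(adjacency_list)
for label, row in zip(nodes, adjacency_matrix):
    print(label, row)
```

Adjacency lists need memory proportional to the number of edges, which suits sparse PPI networks; the matrix needs |V|² entries but answers "is there an edge?" in constant time.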

Degree distribution
- Degree distribution P(k): relative frequency of nodes with degree k
- Used to define different classes of networks
- Common distributions
  - Poisson (random networks): $P(k) \sim \frac{\lambda^k}{k!} e^{-\lambda}$
  - Power-law (scale-free networks): $P(k) \sim k^{-\gamma}$
[Figure: Barabasi et al., 2004]
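
As a minimal sketch, the empirical degree distribution of a graph can be computed from an adjacency list and compared against the two reference shapes. The function names are assumptions, the example graph is a small hypothetical one, and the power-law function is left unnormalized.

```python
import math
from collections import Counter


def degree_distribution(adj):
    """Relative frequency P(k) of nodes with degree k."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    counts = Counter(degrees)
    return {k: c / len(degrees) for k, c in sorted(counts.items())}


def poisson_pmf(k, lam):
    """Poisson probability, the degree distribution shape of random networks."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def power_law(k, gamma):
    """Unnormalized power law k^(-gamma) for k >= 1 (scale-free networks)."""
    return k ** (-gamma)


# Hypothetical four-node graph given as node -> set of neighbors.
adj = {"A": {"C", "D"}, "B": {"D"}, "C": {"A", "D"}, "D": {"A", "B", "C"}}
print(degree_distribution(adj))

# The power law decays far more slowly than the Poisson tail,
# which is why scale-free networks contain high-degree hubs.
for k in (1, 5, 20):
    print(k, poisson_pmf(k, 2.0), power_law(k, 2.0))
```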

Scale-free Networks
- Biological networks are (presumably) scale-free
  - Few nodes are highly connected (hubs)
  - Most nodes have very few connections
- Also true for many other graphs: electricity networks, public transport, social networks, ...
- Evolutionary explanation
  - Growth: networks grow by addition of new nodes
  - Preferential attachment: new nodes prefer linking to highly connected nodes
    - Possible explanation: gene duplication → interaction with the same targets
  - Older nodes have more chances to connect to nodes
  - Hub structure emerges naturally
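
Growth with preferential attachment can be simulated in a few lines. This is a simplified sketch in the spirit of the Barabasi-Albert model, not the lecture's own code; the function name, the parameter `m` (edges added per new node), and the seed graph are assumptions.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=1, seed=0):
    """Grow an undirected graph; each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed seed graph: m+1 fully connected nodes.
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks a node proportionally to its degree.
    stubs = [v for v, nbrs in adj.items() for _ in nbrs]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend([new, t])
    return adj


graph = preferential_attachment(1000, m=2, seed=1)
degree_counts = Counter(len(nbrs) for nbrs in graph.values())
# Most nodes keep a low degree while a few old nodes become hubs.
print(sorted(degree_counts.items())[:5])
print("max degree:", max(len(nbrs) for nbrs in graph.values()))
```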


Other Biological Networks
- Regulatory networks: how genes / transcription factors influence the expression of each other
  - TFs regulate the expression of genes and of other TFs
  - Edge semantics: activate / inhibit / regulate
  - Important, for instance, in cell differentiation
- Signal networks: molecular reaction to an external stimulus
  - Transient interactions including small molecules
  - Temporal dimension important (fast)
  - Important, for instance, in oncology
- Metabolic networks
- Protein-protein interaction networks

Modular network organization
- Cellular function is carried out by modules
  - Sets of proteins interacting to achieve a certain function
- Function is reflected in a modular network structure
- Don't be fooled by layout
  - Modules must be dense, not close
[Figure: Costanzo et al., Nature, 2010]

Functional Modules
[Figure: network with labeled functional modules]
- Protein transport
- Ribosome subunits: translation
- Pathways in cancer
- MAPK/VEGF/ErbB signaling pathway
- Proteasome subunits: protein degradation

Clustering Coefficient
- Modules (clusters) are densely connected groups of nodes
- Cluster coefficient C reflects network modularity by measuring the tendency of nodes to cluster ("triangle density")

  $C_v = \frac{2 E_v}{d_v (d_v - 1)}$

  - $E_v$ = number of edges between neighbors of v
  - $d_v$ = number of neighbors of v
  - $\frac{d_v (d_v - 1)}{2}$ = maximum number of edges between neighbors

  $C = \frac{1}{|V|} \sum_{v \in V} C_v$
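
The two formulas translate directly into code. In this Python sketch the function names are assumptions, and nodes with fewer than two neighbors are given C_v = 0, a common convention that the slide does not specify.

```python
from itertools import combinations


def local_clustering(adj, v):
    """C_v = 2 * E_v / (d_v * (d_v - 1)) for node v."""
    nbrs = adj[v]
    d_v = len(nbrs)
    if d_v < 2:
        return 0.0  # assumed convention: no pair of neighbors exists
    # E_v: edges that connect two neighbors of v
    e_v = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * e_v / (d_v * (d_v - 1))


def average_clustering(adj):
    """C = (1 / |V|) * sum of C_v over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical four-node graph given as node -> set of neighbors.
adj = {"A": {"C", "D"}, "B": {"D"}, "C": {"A", "D"}, "D": {"A", "B", "C"}}
for node in adj:
    print(node, local_clustering(adj, node))
print("C =", average_clustering(adj))
```

In this graph A and C each have two neighbors that are connected to each other (C_v = 1), B has a single neighbor (C_v = 0), and D has three neighbors with one edge among them (C_v = 1/3).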

Example
[Figure: three graphs in which node v has five neighbors]
- All neighbors connected: $C_v = 10/10 = 1$
- Three edges between neighbors: $C_v = 3/10 = 0.3$
- No edges between neighbors: $C_v = 0/10 = 0$
- Cluster coefficient C is a measure for the entire graph
- We also want to find modules, i.e., regions in the graph with high cluster coefficient
- A clique is a maximal complete subgraph, i.e., a maximal set of nodes where every pair is connected by an edge

Finding Modules / Cliques
- Finding all (maximal) cliques in a graph is intractable (the clique decision problem is NP-complete)
- Finding quasi-cliques is equally complex
  - Cliques with some missing edges
  - Same as subgraphs with high cluster coefficient
- Various heuristics
  - E.g. a good quasi-clique probably contains a (smaller) clique
- Level-wise clique search:

    build set S_2 of all cliques of size 2;
    i := 2;
    repeat
      i := i+1;
      S_i := ∅;
      for j := 1 to |S_{i-1}|
        for k := j+1 to |S_{i-1}|
          T := S_{i-1}[j] ∩ S_{i-1}[k];
          if |T| = i-2 then
            N := S_{i-1}[j] ∪ S_{i-1}[k];
            if N is a clique then
              S_i := S_i ∪ {N};
            end if;
          end if;
        end for;
      end for;
    until |S_i| = 0;

Example
[Figure: example graph with nodes 1 to 7]
- 4-cliques: (1,3,4,5), (1,3,4,6), (1,3,4,7)
- Merge phase
  - |(1,3,4,6) ∩ (1,3,4,7)| = 3
    (1,3,4,6) ∪ (1,3,4,7) = (1,3,4,6,7)
    Edge (6,7) exists → 5-clique
  - |(1,3,4,5) ∩ (1,3,4,6)| = 3
    (1,3,4,5) ∪ (1,3,4,6) = (1,3,4,5,6)
    Edge (5,6) does not exist → no clique
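
The level-wise merge can be sketched in Python as follows. Two cliques of size i-1 are merged when they share i-2 nodes, which matches the example where two 4-cliques share 3 nodes, and the merged set is a clique exactly when the two non-shared nodes are connected. The function name is an assumption, and the edge list is a hypothetical graph that is only consistent with the example: the figure's full edge set is not recoverable, so the edges of node 2 and the missing edge (5,7) are guesses.

```python
from itertools import combinations


def levelwise_cliques(edges):
    """Return {size: set of cliques of that size}, built level by level."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    level = {frozenset(e) for e in edges}  # S_2: every edge is a 2-clique
    levels = {2: level}
    i = 2
    while level:
        i += 1
        merged = set()
        for a, b in combinations(level, 2):
            if len(a & b) == i - 2:  # the two cliques differ in one node each
                (x,) = a - b
                (y,) = b - a
                if y in adj[x]:  # only the edge (x, y) still needs checking
                    merged.add(a | b)
        if merged:
            levels[i] = merged
        level = merged
    return levels


# Assumed example graph (no self-loops).
edges = [(1, 2), (1, 3), (1, 4), (3, 4),
         (1, 5), (3, 5), (4, 5),
         (1, 6), (3, 6), (4, 6),
         (1, 7), (3, 7), (4, 7),
         (6, 7)]
cliques = levelwise_cliques(edges)
for size, found in sorted(cliques.items()):
    print(size, sorted(sorted(c) for c in found))
```

Every level still compares all pairs of cliques from the previous level, so the running time remains exponential in the worst case; the sketch only illustrates the merge step.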


Network centrality
- Central proteins exhibit interesting properties
  - Essentiality: knock-out is lethal
  - Much higher evolutionary conservation
  - Often associated with (certain types of) human diseases
- Various measures exist
  - Degree centrality: rank nodes by degree
  - Betweenness centrality: rank nodes by the number of shortest paths between any pair of nodes on which they lie
  - Closeness centrality: rank nodes by their average distance to all other nodes
  - PageRank
  - ...
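
Two of these measures, degree and closeness, can be sketched with a breadth-first search. The function names are assumptions, the graph is a small hypothetical one, and the code assumes a connected graph with at least two nodes; betweenness and PageRank are left out to keep the sketch short.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Degree of every node; a higher degree means a more central node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def average_distance(adj):
    """Average distance to all other nodes; a lower value means a more central node."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        others = [d for node, d in dist.items() if node != v]
        result[v] = sum(others) / len(others)
    return result


# Hypothetical four-node graph given as node -> set of neighbors.
adj = {"A": {"C", "D"}, "B": {"D"}, "C": {"A", "D"}, "D": {"A", "B", "C"}}
degrees = degree_centrality(adj)
avg_dist = average_distance(adj)
print(sorted(degrees, key=degrees.get, reverse=True))  # ranking by degree
print(sorted(avg_dist, key=avg_dist.get))              # ranking by closeness
```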

Network-based Disease Gene Ranking

Centrality of Seeds in (OMIM) Disease Networks
- Fraction of seeds among the top k% of proteins; ~600 diseases from OMIM
[Figure: fraction of seeds vs. rank in % within disease network; d1 = direct interactions, d2 = direct and indirect interactions]

Cross-Validation
- If a disease gene is not yet known, can we find it?
[Figure: cross-validation recovery rate (20%); x-axis: rank in % within disease network]

Further Reading
- Jaeger, S. (2012). "Network-based Inference of Protein Function and Disease-Gene Associations". Dissertation, Humboldt-Universität zu Berlin.
- Goh, K. I., Cusick, M. E., Valle, D., Childs, B., Vidal, M. and Barabasi, A. L. (2007). "The human disease network." Proc Natl Acad Sci U S A 104(21): 8685-90.
- Ideker, T. and Sharan, R. (2008). "Protein networks in disease." Genome Res 18(4): 644-52.
- Barabasi, A. L. and Oltvai, Z. N. (2004). "Network biology: understanding the cell's functional organization." Nat Rev Genet 5(2): 101-13.