Network System Inference

Francis J. Doyle III, University of California, Santa Barbara
Douglas Lauffenburger, Massachusetts Institute of Technology

WTEC Systems Biology Final Workshop, March 11, 2005

What is Systems Inference?

Estimation of the interactions among elements in a network, given time series data on the activities of different nodes (e.g., gene interactions from expression data)

Goals
  - Hypothesis generation
  - Design of experiment
  - Understanding cellular function
  - Unraveling design principles

Data sources
  - Large-scale deletion projects
  - DNA microarray
  - ChIP assay

Spectrum of Network Modeling [Stelling, 2005]

[Figure: modeling approaches plotted by complexity versus coverage. Labels: gene-protein sequence, protein biochemistry, cell dynamics, genomics, systems analysis of networks]

Network Model Identification Problem

Basic transcription model

  $$\log Y_{eg} = \sum_{f \in F(g)} \alpha_{fg} X_{ef} + n_{eg}$$

  - e = experiment, f = factor, g = gene
  - X = TF level, Y = expression, α = connection strength
  - F = connection knowledge (i.e., localization constraint)
  - Not considering details of clustering methods, etc.

Basic signal transduction model

  $$\dot{x} = \Omega_{\mathrm{gen}} + S r - \mu x, \qquad r = f(x, p)$$

  - Ω_gen: rate of generation
  - μ x: rate of degradation
  - S r: rate of inter-conversion

Applications of Network Inference
  - Metabolic networks
  - Signaling networks
  - Development models
  - Gene regulatory networks
  - Physical interaction networks
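
As a rough illustration of the basic transcription model, the sketch below fits the connection strengths α by ordinary least squares, one gene at a time, using only the factors allowed by the connectivity set F(g). The function name `fit_connection_strengths`, the array layout, and the choice of least squares are assumptions made for this example; the slides do not specify an estimation method.

```python
import numpy as np

def fit_connection_strengths(X, log_Y, F):
    """Estimate alpha in  log Y_eg = sum_{f in F(g)} alpha_fg X_ef + n_eg.

    X     : array (experiments, factors), TF levels X_ef
    log_Y : array (experiments, genes), log expression
    F     : dict mapping gene index g -> list of allowed factor indices
            (hypothetical encoding of the connection knowledge F(g))
    Returns alpha with shape (factors, genes); entries outside F(g) stay 0.
    """
    n_factors = X.shape[1]
    n_genes = log_Y.shape[1]
    alpha = np.zeros((n_factors, n_genes))
    for g in range(n_genes):
        factors = F.get(g, [])
        if not factors:
            continue  # no known regulators for this gene
        # Least-squares fit restricted to the columns allowed by F(g).
        coef, *_ = np.linalg.lstsq(X[:, factors], log_Y[:, g], rcond=None)
        alpha[factors, g] = coef
    return alpha
```

Restricting each regression to F(g) keeps the number of fitted coefficients per gene small, which matters when the number of experiments is limited.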

Tools and Mathematical Approaches

Many approaches
  - Boolean networks + optimization [Ideker et al., 2000]
  - Signed directed graphs (SDGs) + graph theory [Kyoda et al., 2000]
  - Bayesian networks + machine learning [Pe'er et al., 2001]
  - Acyclic graphs + graph theory [Wagner, 2001]
  - S-functions + decomposition [Kimura et al., 2004]

Promising technical refinements
  - Exploit sparsity
  - Exploit modularity
  - Refined design of experiment

Representations
  - S-functions, Bayesian nets, SDGs, etc.

Motifs and Modules

Interconnection of function, dynamics & network topology
  - Role in robustness compensation [Stelling et al., 2004]

Recurring regulatory motifs
  - Switches, oscillators, filters, memory, amplifiers [Wolf & Arkin, 2000]
  - Technical approaches: pattern matching, constrained optimization, others

Modules
  - Larger size, inter-connectivity, autonomy
  - Confers robustness, evolvability [Lee et al., 2002; Shen-Orr et al., 2002]
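
The pattern-matching approach to recurring motifs can be illustrated with a minimal sketch. It searches a directed regulatory graph for feed-forward loops (A regulates B, B regulates C, and A also regulates C), chosen here only as a familiar three-node motif. The function name and the edge-list representation are assumptions for this example, and the brute-force search is only practical for small graphs.

```python
from itertools import permutations

def find_feed_forward_loops(edges):
    """Return all (a, b, c) with edges a->b, b->c and a->c.

    edges: iterable of (regulator, target) pairs describing a directed graph.
    """
    edge_set = set(edges)
    nodes = sorted({node for edge in edge_set for node in edge})
    loops = []
    # Brute-force pattern match over ordered triples of distinct nodes.
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            loops.append((a, b, c))
    return loops
```

For example, calling `find_feed_forward_loops([("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")])` reports the single loop formed by X, Y, and Z.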

Modules in Yeast Molecular Network [Tanay et al., 2004]

Issue of Dynamics
  - Largely static interaction maps are constructed
  - Explosion of parameters for dynamic network models
  - Limited dynamic data available
  - Isolated exceptions: circadian rhythm work of Ueda (RIKEN, Kobe) [Ueda et al., Nature Genetics, 2005]

Iterations Between Models and Experiments
  - Very few groups report studies with multiple model-experiment iterations
      - e.g., Klingmüller (Germany), Noble (UK)
  - Convergence is an issue
  - Reporting modeling failures is an issue
  - Multiple groups are at an early stage
  - Important issues
      - Model validation/invalidation
      - Identifiability
      - Design of experiment
    [Kitano, 2002]

Validation, Verification, Consistency, etc.
  - Validation or verification is a critical step in any model identification problem [Ljung, 1999]
  - Matching of data (to date): consistency
  - In practice, only invalidation is possible [Poolla et al., 1994]
  - Contradiction with data is often the most valuable role of a model
      - Model discrimination can suggest new experiments
      - Competing hypotheses can be resolved
      - Data sets can be invalidated
  - Various statistical tools for model invalidation
      - Measure of error
      - Confidence intervals
      - Likelihood ratios
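
To make the likelihood-ratio idea concrete, here is a minimal sketch that tests whether a constant model (null) can be rejected in favor of a straight-line model (alternative) for a single time series. The two candidate models, the Gaussian-error assumption, and the function names are choices made for this example. The statistic n·ln(RSS_null / RSS_alt) is compared with a chi-square distribution with one degree of freedom, which holds only asymptotically.

```python
import math

def residual_sum_of_squares(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))

def likelihood_ratio_test(t, y):
    """Compare a constant model with a linear-in-time model.

    Assumes independent Gaussian errors and noisy data (a perfect
    linear fit would make the ratio undefined).
    Returns (statistic, approximate p-value).
    """
    n = len(y)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Null model: y = constant (best constant is the mean).
    rss_null = residual_sum_of_squares(y, [y_mean] * n)
    # Alternative model: y = intercept + slope * t, fitted by least squares.
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    rss_alt = residual_sum_of_squares(y, [intercept + slope * ti for ti in t])
    # Likelihood-ratio statistic with the error variance profiled out.
    statistic = max(n * math.log(rss_null / rss_alt), 0.0)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small p-value invalidates the simpler model; a large one only says the data do not contradict it, in line with the point that invalidation, not validation, is what is possible in practice.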

Identifiability Issues
  - General questions [Ljung, 1999]
      - Will the identification procedure yield a unique parameter set?
      - Is the resulting model equal to the true system?
      - Can experimental conditions lead to model discrimination?
  - Design of experiment issues
      - Inputs (ligand, environmental, knockouts, etc.)
      - Measurements (localization information, expression, etc.)
      - Perturbation richness; duration of experiment; sampling protocol
  - Few applications in biological network inference
      - Statistical mechanics approaches [Brown & Sethna, 2003]
      - Formal identifiability of gene networks [Zak et al., 2003]
      - Design of experiment [Faller et al., 2003]
      - Measurement selection [Stelling & Gilles, 2005]

Role of Perturbations vs. Data [Stelling & Gilles, MPI, Magdeburg, Germany]
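
The question of whether an identification procedure yields a unique parameter set can be probed numerically. The sketch below is one simple local check, not the method of any cited work: it builds the sensitivity matrix of model outputs with respect to parameters by central differences and inspects its numerical rank. A rank lower than the number of parameters suggests that some parameters cannot be distinguished from the chosen measurements. The toy models, step size, and tolerance are assumptions for this example.

```python
import numpy as np

def sensitivity_matrix(model, params, times, rel_step=1e-4):
    """Central-difference sensitivities d(output)/d(parameter) at each time."""
    params = np.asarray(params, dtype=float)
    columns = []
    for i in range(params.size):
        step = rel_step * max(abs(params[i]), 1.0)
        up, down = params.copy(), params.copy()
        up[i] += step
        down[i] -= step
        columns.append((model(up, times) - model(down, times)) / (2.0 * step))
    return np.column_stack(columns)

def numerical_rank(matrix, rel_tol=1e-6):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def product_model(p, t):
    # y = a * b * exp(-k t): only the product a*b affects the output.
    return p[0] * p[1] * np.exp(-p[2] * t)

def lumped_model(p, t):
    # y = c * exp(-k t): the product is replaced by a single parameter c.
    return p[0] * np.exp(-p[1] * t)
```

With `times = np.linspace(0.0, 5.0, 20)`, the three-parameter `product_model` should give a sensitivity matrix of rank 2, while the two-parameter `lumped_model` has full rank 2. Reparameterizing is one way to remove this kind of non-identifiability; richer perturbations or additional measurements are another.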

Regional Highlights - Japan
  - Kitano Lab
      - Systems Biology Markup Language [SBML]
      - Difference-based regulation finding - minimum equivalent gene network
  - RIKEN
      - Cooperativity coevolutionary inference algorithm
      - Receptor Tyrosine Kinase Regulatory Networks Consortium
      - Ueda lab: profiling of circadian regulation
  - Miyano Lab (U. Tokyo)
      - Bayesian networks and non-parametric regression for yeast gene networks
      - Collaboration with pharma company for drug studies
  - Computational Biology Research Center
      - Gene regulatory network inference (lung cancer studies)
  - KEGG (U. Kyoto)
      - Portability of network data (SBML, GON, etc.)
      - Reconstruction of networks via kernel methods (incl. dynamics)
  - Ito Lab (U. Tokyo)
      - Heterogeneous measurements for network inference (MS proteomics, FRET metabolomics, ChIP & GATC-PCR TF-binding, MS phosphorylation, MS protein complex)

Regional Highlights - Europe
  - MPI Dynamics of Complex Systems
      - Design of experiment, model iterations, identifiability, perturbations
  - Collaborative Research Center for Theoretical Biology (Humboldt U.)
      - Dynamic modeling, driving new microarray data applied to Ras pathway
  - German Cancer Research Center
      - Klingmüller: design of experiment, novel hypothesis testing, model discrimination; application to Jak/STAT pathway to predict unobservable behaviors
  - Reuss: bioinformatics studies of cytochrome P450
  - Oxford University
      - Center for Mathematical Biology (Armitage): bottom-up approach to pathway analysis in histidine sensing
      - Dept. of Physiology (Noble): challenging the bottom-up approach, advocating a combination that starts in the middle

General Observations

Problems
  - Many people using limited data to regress lots of coefficients
  - Little true validation out there
  - Limited success with dynamic network inference

Promise
  - Many, many approaches to fit models to data (regression)
  - Motifs and modules (structure) are being factored into inference methods
  - Nicer interplay possible between network modelers (static) and dynamic modelers with portability of tools and data
  - Lots of curricular development in this area (bioinformatics)