Alice in Bio-Land: Engineering Challenges in the World of Life Sciences



Life Sciences Computing

Alfredo Benso, Stefano Di Carlo, Gianfranco Politano, and Alessandro Savino, Politecnico di Torino, Italy
Enrico Bucci, BioDigitalValley, Italy

When Alice the engineer ventures into the researchers' world of life sciences, she realizes that their research methodology works backward compared to her own top-down engineering approach. In this realm, she tries to understand whether systems and computational biology is the much-needed methodological middle ground.

The life sciences world typically sees computer science as a tool to analyze laboratory data, a reasonable role given that computers play a significant part in interpreting and statistically analyzing biological data. Computers started to become necessary in life sciences when biologists were no longer able to manually infer plausible hypotheses and models from the voluminous data arising from high-throughput biotechnologies and the use of the Web as a primary big data repository. Nevertheless, there's a wide gap between what computer science is doing for the life sciences research community and what it could be doing. This isn't due to technical reasons but rather to a methodological limitation: in life sciences, researchers often apply a bottom-up approach in studying problems, using computation mostly to make sense of lab data.

Here, we ask a few simple questions: Is this data analysis role the only plausible and effective approach for engineers and computer scientists to help life sciences researchers with their work? Is it the most efficient? And do life scientists know what they really need from computational tools? In our opinion, computational work shouldn't be limited to the analysis of an experiment's results; it should also drive the design of the experiment itself. This habit, however, requires an engineering frame of mind that's often missing or neglected in the life sciences community.
IT Pro, July/August 2014. Published by the IEEE Computer Society.

Biology Is Reverse Engineering

Unlike engineers, biologists don't design systems. They instead try to understand how a certain biological system works. This task is known as reverse engineering, a concept very clear to engineers but not so easy to understand for other scientists, especially in its implications. According to Eldad Eilam and Elliot J. Chikofsky, reverse engineering is the process of discovering the functional principles of a device, object, or system through analysis of its structure, function, and operation. [1] Life sciences are the reverse engineering sciences by definition: their task is to understand, model, and predict the dynamics of the biological systems that allow, define, and regulate life.

One key concept to understand is that designing a system and reverse engineering it are two opposite tasks whose complexity can differ by orders of magnitude. Given this, the question now is: Should the target system's complexity (in mathematical terms) drive the methodological approach followed to reverse engineer it? Answering this question is fundamental, because it could have important consequences not only in terms of how research is planned but also how IT professionals plan and organize their life sciences curricula. The choice of a methodological approach is a consequence not just of the notions scientists learned during their studies but even more of the mindset that a particular course of studies shaped in each of them.

Bridging the Cultural Gap

Since we started applying computer engineering to life sciences, we have observed a definite cultural gap between biologists and engineers. Biologists seem to be very comfortable with ambiguity, whereas engineers always expect to achieve established knowledge and certainty. The methodological differences between the two groups take root during undergraduate and graduate studies.
Engineers are trained to work top-down, with an approach that's focused on modeling as much as possible, stopping only when the model is detailed enough to make all the available results and knowledge work smoothly together. Biologists are taught a bottom-up approach, analytically and precisely studying highly specific and controlled problems, but lacking reliable methodological tools that let them easily generalize their findings into a higher-level model of the systems under study. What's important to understand is whether the success of one approach or the other is an absolute characteristic of the approach itself, or whether it depends on the target system's complexity and properties.

An Example: Nyquist, a Movie Plot, and Genetic Regulation

To better understand why having a system's complexity drive the reverse engineering approach is so important, we'll use a simple example. In many engineering fields, a common problem is to determine the sampling frequency required to reconstruct a signal of any kind. The Nyquist-Shannon sampling theorem comes to our aid here, basically stating that to obtain a reliable signal reconstruction, the sampling frequency must be greater than twice the highest frequency in the signal. To make this concept easier to understand for readers unfamiliar with signal reconstruction problems, let's assume that our task is to figure out the plot of a movie by seeing the fewest possible screenshots of it. The question is: How often should we grab a screenshot (the sampling rate) to be able to reliably reconstruct the plot of the movie? The answer depends only on how often a significant event occurs in the plot.
If we treat the movie plot as an electronic signal, and we assume that two consecutive significant events never take place less than 10 minutes apart, then the sampling theorem tells us that we must get a screenshot at least every five minutes to be able to infer anything reliable about the plot (see Figure 1).

Figure 1. Choosing the sampling frequency to reconstruct a given signal: (a) in a movie plot, samples are represented by frames; (b) in a biological experiment, samples can be represented by microarrays.

We used this example, despite its not being completely theoretically correct, because it's closely related to a typical bioinformatics problem: understanding the wide range of mechanisms that genes use to regulate the production of specific products (proteins). A gene's expression is the process by which the information it encodes is used to synthesize a product, such as a protein. Genes aren't expressed all the time; instead, a gene's expression level depends on the interactions among the products of several other genes. The most effective way to represent this network of interactions is to use graphs in which each node is a gene, and each edge connecting two genes represents a direct regulatory interaction between them. These network topologies don't change in time, but each gene's expression profile does change in time depending on the other nodes' status, and this defines the network dynamics.

Biotechnologies (such as microarrays) let us take snapshots of the expression levels of thousands of genes. We can thus formalize the problem as: Given a set of snapshots of a group of selected genes' expression levels taken at variable time intervals (usually minutes or hours), is it possible to reconstruct the network of interactions that models the system dynamics? Using the analogy introduced before, each gene expression profile is like a movie screenshot, and the final network topology (and consequently, the dynamics) is the movie plot.
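As a rough illustration of the sampling argument, the sketch below samples an idealized periodic signal at different intervals and reports the period the samples appear to have. The pure cosine with a 10-minute period and the function names `sample` and `apparent_period` are assumptions made for this example only; real gene expression profiles are not clean sinusoids.

```python
import math

def sample(period_min, interval_min, n_samples):
    """Sample cos(2*pi*t/period_min) every interval_min minutes, starting at t = 0."""
    return [math.cos(2 * math.pi * k * interval_min / period_min)
            for k in range(n_samples)]

def apparent_period(period_min, interval_min):
    """Period the samples appear to have after aliasing.

    Returns math.inf when the samples look constant.
    """
    f = 1.0 / period_min       # true frequency (cycles per minute)
    fs = 1.0 / interval_min    # sampling frequency (samples per minute)
    # A sampled sinusoid is indistinguishable from one at this folded frequency.
    f_alias = abs(f - round(f / fs) * fs)
    return math.inf if f_alias < 1e-12 else 1.0 / f_alias

if __name__ == "__main__":
    for interval in (2, 8, 10):
        values = [round(v, 2) for v in sample(10, interval, 5)]
        print(interval, apparent_period(10, interval), values)
```

Sampling every 2 minutes preserves the 10-minute period, sampling every 8 minutes makes the signal look four times slower than it is, and sampling every 10 minutes makes it look constant. This is the sense in which snapshots taken too rarely can suggest dynamics that are not there.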
Before even considering the different approaches proposed to build the gene interaction network from the expression profiles, we have another question: If current biotechnologies let us photograph the gene expression, for example, every 10 minutes, will this sampling frequency be compatible, following the Nyquist-Shannon sampling theorem, with the network operating frequency? In this example, the network would be reconstructable only if its characteristic time scale is in the range of tens of minutes or longer. So, what is the frequency at which a gene regulatory network works? Apparently, the dynamics of some genes are in the order of seconds, while others work in the order of minutes, others of hours, and so on. So, how accurately can we reconstruct a network in this way given that, if the sampling frequency is incompatible with the system dynamics, then no expression sample can be reliably used? Does it make sense to identify drug targets, associate pathologies with genes, or design a therapy with data that give no guarantee of revealing anything about the real system's functionality? Further interesting discussions about these issues are available in an article that focuses specifically on reverse engineering regulatory systems. [2]

Bottom-Up vs. Top-Down

Figure 2 summarizes the key characteristics of the bottom-up and top-down approaches to reverse engineering a biological system.

Figure 2 panel text, top-down approach: A high-level hypothesis is formulated specifying, but not detailing, any first-level subsystems (= top). Each subsystem is then refined in yet greater detail, sometimes in many additional subsystem levels, to possibly encompass all experimental observations.
Figure 2 panel text, bottom-up approach: The variables believed to be important in determining the state of the system (such as a set of genes) are selected (in most cases almost randomly, or based on previous experiments). The actual important variables are unknown, as well as their relationship. Relationships between variables are inferred from a set of local observations, such as gene expression experiments (= bottom-up). Local observations are merged to create a higher-level model (= up).

The bottom-up approach works extremely well with linear systems. Linear systems are subject to the principle of superposition, which states that the net response at a given place and time caused by two or more stimuli is the sum of the responses that each stimulus would have caused individually. So, if input A produces response X and input B produces response Y, then input (A + B) produces response (X + Y). The direct consequence of this property is that it's possible to study the whole system's dynamics by studying, individually, the dynamics of each component. Linear systems are also easy for non-mathematicians to understand and to visualize. Our observation has been that a large part of the life sciences world, particularly the medical field, continues to reason in linear terms, neglecting the fact that few biological systems are likely to be truly linear.
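The superposition property described above can be checked numerically. The following is a minimal sketch, not taken from the article: `linear_system` and `saturating_system` are hypothetical toy responses, and the saturating form x / (k + x) is chosen only because it is a familiar shape for a response that levels off.

```python
def linear_system(x):
    """Toy linear response: the output is proportional to the input."""
    return 3.0 * x

def saturating_system(x, k=1.0):
    """Toy nonlinear response that levels off as the input grows."""
    return x / (k + x)

def obeys_superposition(system, a, b, tol=1e-9):
    """True if the response to (a + b) equals the sum of the separate responses."""
    return abs(system(a + b) - (system(a) + system(b))) <= tol

if __name__ == "__main__":
    print(obeys_superposition(linear_system, 1.0, 2.0))      # True
    print(obeys_superposition(saturating_system, 1.0, 2.0))  # False
```

For the saturating response, the two inputs studied separately give 0.5 and about 0.67, while the combined input gives 0.75. Summing results obtained on isolated components would therefore overestimate the behavior of the whole system, which is the difficulty a bottom-up approach runs into when the system is not linear.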


Figure 2. Key characteristics of the (a) top-down and (b) bottom-up approaches to reverse engineering. In bottom-up approaches, relationships between variables are inferred from a set of local observations. In top-down approaches, a high-level hypothesis is formulated by specifying, but not detailing, any first-level subsystems. Neither approach can answer, by itself, the needs of life sciences. A combination of the two approaches is required.

The main obstacle appears when it's time to merge all experimental observations to build a higher-level model (the "up" part of the methodology). If the system is complex rather than linear, building this model might not be possible because the superposition property no longer holds. These complex (nonlinear) systems are much more difficult to understand or visualize. They consist of many diverse and autonomous, yet interrelated and interdependent, components or parts linked together through many (usually dense) interconnections. They can't be described by a single rule and, importantly, they exhibit properties that emerge from the interaction of their parts and that can't be predicted from each individual part's properties alone.

In practice, it seems to us that what often happens is that observations that don't make sense are discarded as lab errors, or the results are manipulated to make everything fit together. However, this failure in the abstraction process might not be caused by errors made during the research, but instead by a limitation of the methodology itself, which is intrinsically unable to elucidate some of the system's most important functional properties. In this situation, a bottom-up approach has strong limitations, as the huge variations and poor controllability seen in lab experiments suggest.
However, the bottom-up approach has a very misleading characteristic that makes its use very tempting: it can show early insights into the system behavior. The mistake is to consider these insights reliable enough to produce a higher-level model (or even a therapy for a disease); it's likely that some key system dynamics are missed because of the observations' specificity or, worse, because the complexity itself hides the basic mechanics.

The top-down approach has its limitations, too. Because it works on abstractions and inferences, the conclusions reached are often general enough to try to explain the overall mechanics, but they're based on computational assumptions that might be incorrect. Thus, the risk of a top-down solution is that, in the first phases, it might provide limited correlation with experimental data and, consequently, its potential benefits might not be immediately evident. There's also the risk that, while refining the model, its complexity will increase to the point where it's no longer computationally manageable. However, this risk is mitigated by the steady increase in modern computers' computational power. The top-down approach's implementation overhead is also likely to be higher, because it can require several iterations and refinements to correctly adapt the model to the available experimental data. Moreover, some model features aren't comprehensively confirmable by experimental evidence; we believe this is one of the primary reasons that life sciences researchers are often skeptical about computational approaches. Furthermore, the models resulting from top-down approaches aren't always quickly and easily translated to real applications or therapies (that is, the new drugs and treatments that medical and biological research requires).

Figure 3. The role of systems and computational biology in life sciences research as a link between top-down and bottom-up methodologies. (Diagram labels: lab activity; raw data; top-down; systems and computational biology; bottom-up; chromosome; genes; identification of new genes, drug targets, and regulatory motifs.)

The Role of Systems Biology

For a long time, the scientific community assumed that the top-down and bottom-up approaches could be pursued independently and would eventually meet somewhere in the middle.
This proved true in the reverse engineering of almost-linear systems, but it showed huge limitations for complex systems. Even big pharmaceutical companies are now understanding that integrating the two approaches is the only way to provide new momentum to the drug target discovery market, which is currently stagnant due to the lack of new molecules and targets to test. [3]

Systems and computational biology aren't just the latest fashion in biology; they are a necessary step to overcome the limitations of the top-down and bottom-up approaches and find a middle ground. Starting from high-level hypotheses and models, systems and computational biology should drive the experimentation and laboratory phases, establishing a loop that could combine the best advantages of both approaches while overcoming most of their limitations (see Figure 3). Not surprisingly, similar concepts were introduced in the 1950s (including Karl Ludwig von Bertalanffy's General System Theory, which emphasizes holism over reductionism and organism over mechanism) but paradoxically appear to be novel only now that the amount of data and the systems' complexity are emerging as overwhelming obstacles in life sciences.

The Case of Biological Networks

A good example of how an engineering mind can (and did) contribute to biological research is in the field of biological networks, where graphs and network structures are used to model interactions and relationships among biological entities. These models are now increasingly used to discover emergent properties among genes, proteins, and other relevant biomolecules that refer to specific phenotypes or diseases. Nevertheless, the idea of correlating biological properties to a network's topological features didn't come directly from the life sciences world; it was actually born almost by chance. Two scientists, Albert-László Barabási, a physicist, and Zoltán N. Oltvai, a cell biologist, were neighbors. Barabási had already shown that the Internet is a non-random network, and that its connectivity structure influences its function. In 1999, the pair proved that the metabolic pathways of yeast define a network whose structure is very similar to the Internet. Thereafter, several theoretical studies confirmed that biological networks share many features with other types of networks, such as computer or social networks, and, even more importantly, that several mathematical and computational methods of graph theory apply to biological studies as a result. [4,5]

Network biology's likely keystone was found in 2000 when three scientists explained in a now classical Nature article, "Surfing the p53 Network," [6] the tumor-suppressor role of the gene p53, which was found to be mutated in about 50 percent of cancer patients, by analyzing the gene's network topology. "The cell, like the Internet, appears to be a scale-free network," they wrote, and "one way to understand the p53 network is to compare it to the Internet."
The surprising point here is that, in 1999, more than 15,000 articles had already been published on p53 and its role in cancer biology, yet some aspects of p53 biology could be unveiled thanks only to a methodology that had nothing to do with the traditional biological research approach. Since then, the computational analysis of biological networks has become increasingly used to mine the complexity of cellular processes and signaling pathways.

Many types of biological networks exist, depending on the information associated with their nodes and edges. In general, they can be classified as directed or undirected networks. [7] In directed networks, nodes are molecules and edges indicate causal biological interactions among them (such as transcription and translation regulations). In contrast, in undirected networks, an edge indicates a shared property, such as sequence similarity, [8] gene coexpression, [9,10] protein-protein interaction, [11] or term co-occurrence in the scientific literature. [12]

Related Work in IT Professional
The following IT Professional articles cover various aspects of IT opportunities in life sciences:
See-Kiong Ng and Limsoon Wong, "Accomplishments and Challenges in Bioinformatics," IT Professional, vol. 6, no. 1, 2004.
Marti Szczur, "Delivering Environmental Health Information," IT Professional, vol. 6, no. 2, 2004.
Thomas Jepsen et al., "Healthcare IT," IT Professional, vol. 12, no. 2, 2010.
James Teeter, "Flexible Medical-Grade Networks," IT Professional, vol. 13, no. 3, 2011.

Approaches to Network Analysis

Because a complex system shows properties that can't be understood from the isolated study of its components, the main contribution of the network model is that it provides a way to study interactions.
In general, there are two main approaches to network analysis: topological analysis attempts to identify recurrent network substructures and correlate them with particular functional characteristics of the system; dynamics analysis focuses instead on studying the network's evolution over time. As shown later, both are important when studying biological systems.

Topological Network Analysis

To give a general idea of the potential of this analysis, here we focus on three well-known topological features of a graph: hubs, cliques, and graphlets. Identifying these features in biological networks lets researchers unveil interesting biological mechanisms that emerge from network component interactions and would be impossible to observe by studying individual network nodes.

Hubs. Hubs are nodes with many more (input and/or output) connections than the overall node average. In a protein interaction network (where nodes are proteins and two proteins are connected if some kind of direct or indirect interaction between them has been observed in a laboratory), hubs are nodes with many different molecular partners, and therefore are likely involved in most cellular processes. Also, proteins found in many different macromolecular complexes, represented as interaction network hubs, could be either a key component of a single molecular complex or elements shared by many different molecular complexes in which they work as switches to coordinate the activation or repression of different molecular processes.

An important consequence of these observations is that any alteration of a protein-protein interaction network's hubs is predicted to have large effects on the cell biology, in the same way that hacking an Internet hub server would cause widespread network malfunction. Researchers have extensively explored this assumption, known as the centrality-lethality rule, in the lab by experimentally knocking down protein interaction hubs and quantitatively assessing the effects in different models. [13] Furthermore, researchers have also hypothesized that mutations affecting these proteins are likely to be particularly related to the appearance of diseases. Some experimental validation of this prediction has indeed been obtained: Davide Rambaldi and his research group provided evidence that, in the human protein-protein interaction network, virtually all hubs with more than 80 edges (interactions) are targets of known cancer-related mutations. [14] Similarly, after building an interaction network comprising all human proteins involved in immune response, Csaba Ortutay and Mauno Vihinen found that the network hubs included known disease-causing genes as well as 26 new genes related to primary immunodeficiency. [15] In a further example, Wenjun Chang and his colleagues [16] found new gastric cancer candidate markers by looking at hubs in a protein-protein interaction network built from genes differentially expressed in the patient tissues. This is a good example of a possible role of systems biology as an input to the experimental research, and not vice versa.

Cliques. A clique is a set of nodes in which each node is connected to all the others. In biological networks, cliques proved extremely important in identifying molecular complexes and functional modules. In 2003, Victor Spirin and Leonid Mirny described the presence of densely connected modules in protein-protein interaction networks, that is, neighborhoods whose internal connectivity was very high compared to the average network connectivity. [17] The authors were thus able to identify a full set of previously unknown functional modules and molecular machineries. This seminal work was crucial because it shifted the concept of biological network analysis from the centrality of a single node to a community of nodes. This trend culminated in several complex applications of clique analysis, such as a recent work that nicely illustrates how the functioning of the mitotic spindle is regulated by a cascade of events involving cliques (that is, molecular complexes) instead of single proteins. [11]

Graphlets. A graphlet is a small, connected network subgraph with a predetermined number of nodes. The biological meaning of graphlets in networks is a relatively new topic.
In one significant paper, [18] Tijana Milenkovic and Natasa Przulj discuss the biological meaning of the graphlet degree signature, observing how, in a human protein-protein interaction network, oncogenes have a similar graphlet degree signature, which is different from that of genes unrelated to cancer. This observation let the researchers use this signature to identify new possible oncogenes. If this finding is confirmed by other researchers, it will prove that the detailed topology around a node in a global protein-protein interaction map is as important in determining the corresponding protein's function as its sequence and three-dimensional structure. This result is even more important if we consider the fact that protein-protein interaction networks are only abstract models of all interactions that have been observed, without spatial and temporal resolution, and don't correspond to any real physical entity.
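To make the definitions of hubs and cliques concrete, here is a small sketch on a made-up undirected network. The node names, the edge list, and the hub threshold (degree greater than 1.25 times the mean degree) are all assumptions chosen for illustration; the studies cited above use their own criteria, and the brute-force clique search shown here is only practical for very small graphs. Graphlet signatures are not covered.

```python
from itertools import combinations

# Hypothetical toy interaction network; node names are placeholders.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
         ("B", "C"), ("B", "D"), ("C", "D"), ("E", "F")]

def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def find_hubs(adj, factor=1.25):
    """Nodes whose degree exceeds `factor` times the mean degree."""
    mean_degree = sum(len(nb) for nb in adj.values()) / len(adj)
    return sorted(n for n, nb in adj.items() if len(nb) > factor * mean_degree)

def is_clique(adj, nodes):
    """True if every pair of the given nodes is directly connected."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def find_cliques_of_size(adj, size):
    """Brute-force search for all cliques with exactly `size` nodes."""
    return [c for c in combinations(sorted(adj), size) if is_clique(adj, c)]

if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(find_hubs(adj))                # ['A']
    print(find_cliques_of_size(adj, 4))  # [('A', 'B', 'C', 'D')]
```

In this toy network, node A is the only hub (degree 4 against a mean of about 2.67), and A, B, C, and D form the only four-node clique, the kind of densely connected module that the text associates with a molecular complex.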

Analysis of Network Dynamics

Researchers have recently obtained highly interesting biological insights by studying biological network dynamics. For some networks, it's possible to define, for each node, a status (that is, a condition that can be correlated to a biological condition). For example, in gene regulatory networks, each node is a gene with an expression level (that is, a status) that corresponds to its transcription rate. The connection between two genes models the causality between them: the expression of gene A directly causes the expression of gene B. Collecting the status of all nodes in a network constitutes the network status. In biological systems, we clearly must understand the evolution of a network's status (that is, its dynamics) when it starts from different initial conditions, because decisions are reached and actions are taken within a cell through methods that are exceedingly parallel and extraordinarily integrated.

In biological networks, as well as in many other fields, the key to understanding a complex system's behavior is to identify and study its attractors. An attractor is a stable state toward which a system evolves (that is, a state to which it's attracted). When a system evolves, the collection of states through which the system transits can either settle to one particular state (point attractor), repeatedly return to a group of states (cyclic attractor), or not clearly return to a defined set of states (strange attractor). Recently, researchers have applied the study of attractors, state spaces, and state trajectories to gene regulatory networks with interesting results that seem to plausibly explain complex mechanisms, such as cell differentiation and the unexpected reactions of tumor cells to standard therapies.
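The notions of network status and attractor can be illustrated with a tiny Boolean network, a common simplification in which each gene is either on (1) or off (0) and all genes update at the same time. The three genes and their update rules below are invented for this sketch and do not come from the article or from any real regulatory network. Because such a network is finite and deterministic, it can only show point and cyclic attractors; strange attractors require continuous dynamics.

```python
from itertools import product

def step(state):
    """One synchronous update of a hypothetical three-gene network.

    Assumed rules: gene a copies b, gene b copies a, gene c is on
    only when both a and b were on.
    """
    a, b, c = state
    return (b, a, a & b)

def find_attractor(state):
    """Follow the trajectory from `state` until a state repeats.

    Returns the set of states that make up the attractor reached.
    """
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    # States from the first repeated state onward form the attractor.
    return frozenset(seen[seen.index(state):])

def all_attractors():
    """Attractors reachable from every possible initial network status."""
    return {find_attractor(s) for s in product((0, 1), repeat=3)}

if __name__ == "__main__":
    for attractor in sorted(all_attractors(), key=lambda a: (len(a), sorted(a))):
        kind = "point" if len(attractor) == 1 else "cyclic"
        print(kind, sorted(attractor))
```

Starting from all eight initial conditions, this toy network ends up in one of three attractors: the point attractors (0, 0, 0) and (1, 1, 1), and a cyclic attractor that alternates between (0, 1, 0) and (1, 0, 0). Different initial conditions lead to different stable behaviors of the same wiring, which is the idea behind reading cell types or disease states as attractors.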
In their 2009 paper, [19] Sui Huang and colleagues introduced the idea of cancer attractors, which considers tumors as a particular natural state of cell regulation that is normally inaccessible and hidden beneath layers of complex molecular networks. Studying gene regulatory network dynamics might show us how cancer states can be reached as well as avoided. Huang and his colleagues' paper is fascinating in that the idea of cancer attractors provides a simple formal framework that explains several complex and unexpected behaviors of cancer cells that traditional biology hasn't yet been able to make clear.

Similarly fascinating works, such as the one published by Tariq Enver, [20] explain how a population of stem cells can diversify into completely different phenotypes (such as epithelial cells, nervous cells, and so on). The idea is that this transition is allowed by a regulatory network's evolution through several transformations, so that, once the cell is fully differentiated, the regulatory network settles into a set of possible attractors that is different for every cell type, thus allowing the expression of different phenotypes in the cell behavior and morphology.

These pioneering works show that, to fully understand the nature of cellular functions, it's necessary to study the behavior of genes in a holistic rather than individual manner, because the expressions and activities of genes aren't isolated or independent of each other. Moreover, this type of translation of well-established approaches into a new field of research (life sciences) is exactly the type of objective that the engineering community should promote to actively support research in the medical/biological community.

We've discussed here some of the methodological issues that, as engineers, we observed when approaching the life sciences world.
Our goal was to illuminate the great opportunities this research field offers to researchers with expertise in different areas. Understanding the extreme complexity of life's machinery shouldn't be a task left to biologists, but rather should be a multidisciplinary effort involving a shift in the research methodology used to build models and analyze data. The typical bottom-up approach followed by biologists lacks generality as much as the engineers' top-down approach lacks specificity. The optimal solution is probably a new modus operandi that integrates specific observations made by biologists with engineering's functional modeling.

We're already seeing the first applications of network analysis in human therapy. Instead of the classical "magic bullet" pharmacological paradigm, which aims at ultra-specific targeting of a single protein, a new kind of approach is emerging, based on simultaneously targeting several molecular processes. The topological and dynamical analysis of the molecular network underlying a specific disease is the only way to implement this new approach, because such analysis lets us look for modulators acting on different network areas so as to simultaneously attack different cellular pathways.

Alfredo Benso is a tenured associate professor of computer engineering at Politecnico di Torino, Italy. His research interests include bioinformatics and systems biology. Benso has a PhD in information technologies from Politecnico di Torino. He is an IEEE Computer Society Golden Core Member and a senior member of IEEE. Contact him at alfredo.benso@polito.it.

Stefano Di Carlo is an assistant professor in the Department of Control and Computer Engineering at the Politecnico di Torino, Italy. His research interests include design for testability, built-in self tests, and dependability, as well as bioinformatics and systems biology. Di Carlo has a PhD in information technologies from the Politecnico di Torino, Italy. He is a Golden Core Member of IEEE Computer Society and a member of IEEE. Contact him at stefano.dicarlo@polito.it.

IT Pro July/August 2014


Gianfranco Politano is a postdoc in the Department of Control and Computer Engineering at the Politecnico di Torino, Italy. His research interests include system reliability and machine learning techniques. Politano has a PhD in information technologies from the Politecnico di Torino, Italy. He is a member of IEEE and IEEE Computer Society. Contact him at gianfranco.politano@polito.it.

Alessandro Savino is a postdoc in the Department of Control and Computer Engineering at the Politecnico di Torino, Italy, and an associate with the Institute for Cancer Research and Treatment of Candiolo. His research interests include microprocessor test and software-based self-test, as well as bioinformatics and image processing. Savino has a PhD in information technologies from the Politecnico di Torino, Italy. Contact him at alessandro.savino@polito.it.

Enrico M. Bucci is an Italian National Research Agency (CNR) researcher and CEO of BioDigitalValley, a bioinformatics company dedicated to translating research results into products and services for the pharmaceutical industry. His research interests include biological data mining and analysis, and complex biological systems simulation and analysis. He holds a PhD in biochemistry and molecular biology from IMB, Jena, Germany. He is a member of the American Association for the Advancement of Science. Contact him at enrico.bucci@biodigitalvalley.com.
