SIREN Cytoscape plugin: Interaction Type Discrimination in Gene Regulatory Networks

Similar documents
Expanding BioPAX format by integrating Gene Regulation. Biological information. EMBnet.journal 16.1 RESEARCH PAPERS 39

Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data

GeneNetMiner: accurately mining gene regulatory networks from literature

Identifying Candidate Informative Genes for Biomarker Prediction of Liver Cancer

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Random matrix analysis for gene co-expression experiments in cancer cells

Deakin Research Online

Statistical Inference and Reconstruction of Gene Regulatory Network from Observational Expression Profile

Finding Compensatory Pathways in Yeast Genome

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

Biology 644: Bioinformatics

Introduction to 'Omics and Bioinformatics

dmgwas: dense module searching for genome wide association studies in protein protein interaction network

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

Shannon pipeline plug-in: For human mrna splicing mutations CLC bio Genomics Workbench plug-in CLC bio Genomics Server plug-in Features and Benefits

Research Powered by Agilent s GeneSpring

Ganatum: a graphical single-cell RNA-seq analysis pipeline

User Guide. MAGNET : MicroArray & RNAseq Gene expression Network Evalua=on Toolkit. Page 1

Knowledge-Guided Analysis with KnowEnG Lab

Inferring Gene Networks from Microarray Data using a Hybrid GA p.1

Clinical and Translational Bioinformatics

Understanding protein lists from proteomics studies. Bing Zhang Department of Biomedical Informatics Vanderbilt University

Integrating Biological Databases in the Context of Transcriptional Regulatory Networks

Agilent GeneSpring GX 10: Beyond. Pam Tangvoranuntakul Product Manager, GeneSpring October 1, 2008

Fast, Accurate and Sensitive DNA Variant Detection from Sanger Sequencing:

Changing Mutation Operator of Genetic Algorithms for optimizing Multiple Sequence Alignment

Agilent Genomics Software Future Directions

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X

KEGG: Kyoto Encyclopedia of Genes and Genomes

Dna Technology And Genomics Study Guide READ ONLINE

Agilent GeneSpring GX Software

March Product Release Information. About IPA. IPA Spring Release (2016): Release Notes. Table of Contents

Introduction to Bioinformatics

Introduction to Microarray Analysis

Package TIN. March 19, 2019

Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX

Codon usage diversity in city microbiomes

Computing with large data sets

China National Grid --- BioNode. Jun Wang Beijing Genomics Institute

Cell Stem Cell, volume 9 Supplemental Information

Understanding protein lists from comparative proteomics studies

NextSeq 500 System WGS Solution

An Analytical Upper Bound on the Minimum Number of. Recombinations in the History of SNP Sequences in Populations

Bioinformatic Tools. So you acquired data.. But you wanted knowledge. So Now What?

Lees J.A., Vehkala M. et al., 2016 In Review

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

Towards Gene Network Estimation with Structure Learning

Seven Keys to Successful Microarray Data Analysis

Gene Expression Data Analysis

MetaGO: Predicting Gene Ontology of non-homologous proteins through low-resolution protein structure prediction and protein-protein network mapping

Introduction to BIOINFORMATICS

Loosely Coupled Architecture for Bio-Network Reverse Engineering

Statistical Methods for Network Analysis of Biological Data

BIOINFORMATICS AND SYSTEM BIOLOGY (INTERNATIONAL PROGRAM)

A comparative analytical assay of gene regulatory networks inferred using microarray and RNA-seq datasets

Cancer Informatics. Comprehensive Evaluation of Composite Gene Features in Cancer Outcome Prediction

DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences

A note on oligonucleotide expression values not being normally distributed

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

MulCom: a Multiple Comparison statistical test for microarray data in Bioconductor.

NGS in Pathology Webinar

Exploring Similarities of Conserved Domains/Motifs

WebMeV A Cloud Based Platform for Genomic Analysis Yaoyu E. Wang

Jianguo (Jeff) Xia, Assistant Professor McGill University, Quebec Canada June 26, 2017

Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018

Applied Bioinformatics

Bioinformatics for Biologists

Study on the Application of Data Mining in Bioinformatics. Mingyang Yuan

Protein-Protein-Interaction Networks. Ulf Leser, Samira Jaeger

Machine Learning. HMM applications in computational biology

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Cory Brouwer, Ph.D. Xiuxia Du, Ph.D. Anthony Fodor, Ph.D.

Package goseq. R topics documented: December 23, 2017

Diagnostic visualizations for normalization of gene expression microarray data. Visualization of LC-MS/MS data and quality scores

DRAGON DATABASE OF GENES ASSOCIATED WITH PROSTATE CANCER (DDPC) Monique Maqungo

Scanned chip images were analyzed in two ways: levels and obtain Presence /Absence (P/A) calls for targeted genes. The metrics sheets

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data

Confident Protein ID using Spectrum Mill Software

Supplementary Information Supplementary Figures

Product Applications for the Sequence Analysis Collection

Exercise1 ArrayExpress Archive - High-throughput sequencing example

Rapidly regulated genes are intron poor

Network System Inference

Comp/Phys/Mtsc 715. Example Videos. Administrative 4/12/2012. Bioinformatics Visualization. Vis 2005, Bertram. Vis 2005, Cantarel(tighten.

APA Version 3. Prerequisite. Checking dependencies

Oracle Enterprise Manager

1. Klenk H.-P. and colleagues published the complete genome sequence of the organism Archaeoglobus fulgidus in Nature. [ Nature 390: (1997)].

Permutation Clustering of the DNA Sequence Facilitates Understanding of the Nonlinearly Organized Genome

Biological Interpretation of Metabolomics Data. Martina Kutmon Maastricht University

KnetMiner USER TUTORIAL

Ingenuity Pathway Analysis (IPA )

BIOSTATISTICS AND MEDICAL INFORMATICS (B M I)

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics

UPIC: Perl scripts to determine the number of SSR markers to run

Concepts of Genetics, 10e (Klug/Cummings/Spencer/Palladino) Chapter 1 Introduction to Genetics

Transcription:

SIREN Cytoscape plugin: Interaction Type Discrimination in Gene Regulatory Networks Jason Montojo, Pegah Khosravi 2,, Vahid H. Gazestani 3, Gary D. Bader * The Donnelly Centre, University of Toronto, Toronto, Canada 2 School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran 3 Institute of Parasitology, McGill University, Montreal, Quebec, Canada * Corresponding authors: Gary D. Bader E-mail: gary.bader@utoronto.ca Abstract Integrating expression data with gene interactions in a network is essential for understanding the functional organization of the cells. Consequently, knowledge of interaction types in biological networks is important for data interpretation. Signing of Regulatory Networks (SIREN) plugin for Cytoscape is an open-source Java tool for discrimination of interaction type (activatory or inhibitory) in gene regulatory networks. Utilizing an information theory based concept, SIREN seeks to identify the interaction type of pairs of genes by examining their corresponding gene expression profiles. We introduce SIREN, a fast and memory efficient tool with low computational complexity, that allows the user to easily consider it as a complementary approach for many network reconstruction methods. SIREN allows biologists to use independent expression data to predict interaction types for known gene regulatory networks where reconstruction methods do not provide any information about the nature of their interaction types. The SIREN Cytoscape plugin is implemented in Java and is freely available at http://baderlab.org/software/sirenplugin and via the Cytoscape app manager. Findings Introduction Integrating expression data with gene interactions in a network is essential for understanding cellular process. Many tools within the Cytoscape ecosystem were developed to analyse and interpret biological networks []. The SIREN Cytoscape app is a standalone tool for making fast and efficient gene interaction type determination. The app implements the SIREN algorithm which uses association-based approach to derive interaction type from expression data. It has been shown that signed gene co-expression networks provide a sound foundation for module-centric analysis, network simulation and is essential for data interpretation [2, 3]. Analysis of gene expression data and other omics data commonly use co-expression methods. Most coexpression measures are categorized as either correlation coefficient or mutual information (MI) measures. MI-based methods, in particular, are used to measure non-linear associations and able to detect additional correlations invisible to the linear Pearson coefficient [4-6]. SIREN uses an information-based approach. This new method is capable of identifying the type of interaction between two interacting genes. It is important to note SIREN detects the interaction types but it cannot detect the direction of interactions. The resulting signed network of

relationships among genes is available as a signed Cytoscape network for further analysis (Figure ). Figure : The resulting signed network SIREN applied on reconstructed prostate cancer network. Network contains 495 nodes and 43 edges, laid out in s. Methods and implementation The fundamental idea in SIREN is that if two connected nodes in a network have similar expression patterns, there is probably synergy between them. On the other hand, if their expression patterns are distinct, the interacting genes probably have a negative influence on each other [7]. Our method can be useful for the analysis of reconstructed networks and is unaffected by the choice of reconstruction method used to generate the input network. As inputs, SIREN uses expression data and a genetic network. SIREN determines a regulation type for every pair of connected nodes in the network by comparing the expression profiles of each pair and assigning a similarity score to it. To determine the distance between two nodes, SIREN rescales the raw similarity according to a rescaling matrix, and then integrates this score with a derivative form of point-wise mutual information formula. SIREN = W(x, y)p(x, y)log (p(x, y) p(x)p(y)) Where W(x,y) is the rescaling matrix, p(x,y) is the joint probability of a particular co-occurrence of events and p(x)p(y) is the marginal probability. Using a reliable cut-off threshold (±0.58) which SIREN does not detect any interaction type in random data to evaluate this SIREN score, a regulation type (positive or negative) is assigned to the edges in the network. For more information please read the original paper in [7]. We implemented SIREN algorithm as an app for Cytoscape, which is a collection of powerful tools for network analysis and inference. It can be installed using Cytoscape s menu-driven app 2

manager. Upon first use, the user must download the latest version of SIREN. The user then imports the network from file and uses the SIREN app to score the network. Finally, the software asks the user to import expression data from file. The network should be in two column format. There is no limitation for the number of row and column of expression data but the first column should contains IDs related to each gene which is the same with IDs in network [Additional file and Additional file 2). SIREN does not impose any limitations for gene identifier but it should be the same for both the expression data and network. The plugin can determine interaction type for all nodes with available expression data. The resulting signed network can also be exported in standard formats, e.g. XGMML, SIF and PDF. Users can update networks with their favorite threshold. Cytoscape s customizable filters of allow them to select edges with the highest SIREN score among all edges in a network. Users can easily add their own attributes to the networks with different colors based on activatory and inhibitory activity or SIREN weight. For example, the user can use a red line for activatory or blue line for inhibitory and different thickness of edges based on SIREN score for each edge (Figure 2). Figure 2: Add attributes to the networks The blue edge shows inhibitory and the red one shows activatory. Also, the thicker edge width illustrates higher SIREN score and vice versa. The size of the network data is limited by the amount of available memory and disk space. We recommend using a system with at least 4GB of total RAM. To evaluate time and memory efficiency of SIREN, we used human prostate cancer network contains 436 interactions of 526 genes from STRING platform [8] and microarray data were extracted from GEO (GDS2545), we tested SIREN on the derived network. We also evaluated SIREN efficiency on E. coli data contains 4005 interactions among 696 genes that the network 3

extracted from the RegulonDB database [9] and microarray dataset consisting of 907 samples from the Many Microbe Microarray Database (M 3D ) Web site [0]. Detecting interaction types during these data took just second on an Intel Core i5 system with 4GB of RAM. These examples show how the SIREN Cytoscape plugin can be used to determine interaction types for genes that may be involved in prokaryotic and eukaryotic gene regulatory networks (GRNs). Conclusions We described a statistical framework algorithm for inferring the regulatory type of interactions in a known GRN given corresponding genome-wide gene expression data. The algorithm is highly flexible and can be adopted by many reconstructed gene regulatory network with a number of large-scale data. For example, user can easily determine the interaction association (positive or negative) by integrating their reconstructed networks with available data from different sources. Availability and Requirements Project name: SIREN app Project home page: http://baderlab.org/software/sirenplugin Operating system: Platform independent Programming language: Java Other requirements: Cytoscape version 3.0.0 or newer, Java SE 7 License: GNU LGPL Any restrictions to use by non-academics: None Acknowledgements This work was supported by NRNB (U.S. National Institutes of Health, National Center for Research Resources grant number P4 GM03504). PK is supported by the School of Biological Sciences of Institute for Research in Fundamental Sciences (IPM). VHG is supported by CIHR Systems Biology Fellowship. The authors would like to warmly thank the SIREN team for support this project. Author details The Donnelly Centre, University of Toronto, Toronto, Canada. 2 School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran. 3 Institute of Parasitology, McGill University, Montreal, QC, Canada. List of abbreviations GRN: Gene regulatory network; MI: Mutual information; M 3D : Many Microbe Microarray Database; SIREN: Signing of Regulatory Networks Authors contributions JM and VHG developed the code for SIREN. PK analyzed the data and created the figures. PK and JM wrote the paper. GDB headed the research program. All authors made substantial written contributions to the manuscript, and have given approval to the final version presented here. Competing interests The authors declare that they have no competing interests. 4

References. Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, Pico AR, Bader GD, Ideker T: A travel guide to Cytoscape plugins. Nature methods 202, 9:069-076. 2. Guziolowski C, Kittas A, Dittmann F, Grabe N: Automatic generation of causal networks linking growth factor stimuli to functional cell state changes. The FEBS journal 202, 279:3462-3474. 3. Mason MJ, Fan G, Plath K, Zhou Q, Horvath S: Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells. BMC Genomics 2009, 0:327. 4. Numata J, Ebenhoh O, Knapp EW: Measuring correlations in metabolomic networks with mutual information. Genome informatics International Conference on Genome Informatics 2008, 20:2-22. 5. Daub CO, Steuer R, Selbig J, Kloska S: Estimating mutual information using B-spline functions--an improved similarity measure for analysing gene expression data. BMC Bioinformatics 2004, 5:8. 6. Song L, Langfelder P, Horvath S: Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics 202, 3:328. 7. Khosravi p, Gazestani VH, Pirhaji L, Law B, Sadeghi M, Goliaei B, Bader GD: Inferring interaction type in gene regulatory networks using co-expression data. Algorithms for Molecular Biology 205, 0:23. 8. Snel B, Lehmann G, Bork P, Huynen MA: STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res 2000, 28:3442-3444. 9. Gama-Castro S, Jimenez-Jacinto V, Peralta-Gil M, Santos-Zavaleta A, Penaloza-Spinola MI, Contreras-Moreira B, Segura-Salazar J, Muniz-Rascado L, Martinez-Flores I, Salgado H, et al: RegulonDB (version 6.0): gene regulation model of Escherichia coli K-2 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res 2008, 36:D20-24. 0. Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, Gardner TS: Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Res 2008, 36:D866-870. High resolution versions of all figures, associated details and Supplementary Information available at http://baderlab.org/pegahkhosravi/siren 5