Understanding protein lists from comparative proteomics studies

1 Understanding protein lists from comparative proteomics studies Bing Zhang, Ph.D. Department of Biomedical Informatics Vanderbilt University School of Medicine

2 A typical comparative shotgun proteomics study
[Figure: study workflow ending in a list of identified proteins, each labeled with an IPI identifier.]
Li et al. JPR, 2010
CSHL Proteomics Course, 07/30/2010

3 Understanding a protein list
Level I: What are the proteins/genes behind the IDs, and what do we know about their functions?
Level II: Which biological processes and pathways are the most interesting in terms of the experimental question?
Level III: How do the proteins work together, and are we missing some important proteins?

4 A typical result table from comparative shotgun proteomics studies
Table columns: all identified proteins; sequence coverage; count data; group totals; normalized totals; log ratio; p value; false discovery rate (FDR).
Li et al. JPR, 2010
Sample data for this lecture can be downloaded from [URL not preserved].

5 Level I: information retrieval
Query interface, then output.
Limitations of web query interfaces:
One protein at a time
Time-consuming
Information is local and isolated
Hard to automate the information retrieval process

6 Biomart: a batch information retrieval system
Biomart is a query-oriented data management system.
It is particularly suited to 'data mining'-style searches of complex descriptive data, such as data related to genes and proteins.
It is open source and can be customized.
Projects using Biomart for information retrieval include Ensembl, UniProt, InterPro, Reactome, and many others (a complete list, with access to the tools, is available at [URL not preserved]).

7 Ensembl Biomart analysis
Choose dataset
  Choose database: Ensembl Genes 58
  Choose dataset: Homo sapiens genes (GRCh37)
Set filters
  Gene: a list of genes/proteins identified by various database IDs (e.g. IPI IDs)
  Gene Ontology: filter for proteins with specific GO terms (e.g. cell cycle)
  Protein domains: filter for proteins with specific protein domains (e.g. SH2 domain)
  Region: filter for genes in a specific chromosome region (e.g. a coordinate range on chromosome 1, or band 11q13)
  Others
Select output features
  Gene annotation information in the Ensembl database, e.g. gene description, chromosome name, gene start, gene end, strand, band, gene name
  External data: Gene Ontology, IDs in other databases
  Expression: anatomical system, development stage, cell type, pathology
  Protein domains: SMART, PFAM, InterPro, etc.
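The dataset, filter, and output choices above can also be expressed as a BioMart XML query, which makes the retrieval scriptable. The following is a minimal sketch, not the lecture's own procedure: it only builds the query text and does not contact any server. The dataset name `hsapiens_gene_ensembl` and the filter and attribute names are assumptions that should be checked against the mart being queried, and the gene symbols are simply taken from the example list in these slides.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string for a single dataset.

    filters maps a filter name to a list of values; attributes is the
    list of output columns. All names are supplied by the caller.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include a header row
        "uniqueRows": "1",    # collapse duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, values in filters.items():
        # Multiple values for one filter are sent as a comma-separated list.
        ET.SubElement(ds, "Filter", {"name": name, "value": ",".join(values)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names: verify them in the mart's own filter/attribute listing.
query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"external_gene_name": ["FN1", "THBS1", "MMP9"]},
    attributes=[
        "ensembl_gene_id", "external_gene_name", "chromosome_name",
        "start_position", "end_position", "strand", "band", "description",
    ],
)
print(query_xml)
```

The resulting string would typically be submitted as the `query` parameter of the mart's service endpoint; the exact URL depends on the BioMart installation and is not shown here.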

8 Ensembl Biomart: getting information for all proteins in a list
Export all results to a file.

9 Ensembl Biomart: filtering for a specific group of proteins
Use the filter to select a specific group or groups of proteins, e.g. cell cycle proteins, transcription factors, or proteins with transmembrane domains.

10 Level II: understanding a gene list at the functional group level
Enrichment analysis: is a functional group (e.g. cell cycle) significantly associated with the experimental question?
[Figure: all identified proteins (1733; 1590 annotated) are filtered for FDR<0.05 to give the differentially expressed protein list (260 proteins). This list is compared with the functional group "extracellular space" (670 proteins), and the observed overlap is contrasted with the overlap expected at random.]
Proteins shown in the overlap: MMP9, SERPINF1, A2ML1, F2, FN1, LYZ, TNXB, FGG, MPO, FBLN1, THBS1, HDLBP, GSN, FBN1, CA2, P11, CCL21, FGB

11 Enrichment analysis: hypergeometric test

                         Significant proteins   Non-significant proteins   Total
Proteins in the group    k                      j-k                        j
Other proteins           n-k                    m-n-j+k                    m-j
Total                    n                      m-n                        m

Hypergeometric test: given a total of m proteins, of which j are in the functional group, if we pick n proteins randomly, what is the probability of having k or more proteins from the group?

p = \sum_{i=k}^{\min(n,j)} \frac{\binom{m-j}{n-i}\binom{j}{i}}{\binom{m}{n}}

Zhang et al. Nucleic Acids Res. 33:W741, 2005
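The tail probability can be computed directly from binomial coefficients. The sketch below is an illustration using only the Python standard library, with the same symbols as the slide (m, j, n, k); the function name and the toy numbers are assumptions made for the example and are not taken from the HNSCC data.

```python
from math import comb


def enrichment_p_value(m, j, n, k):
    """Probability of drawing k or more group members.

    m: total number of proteins (reference set)
    j: proteins in the functional group
    n: proteins picked (e.g. the significant list)
    k: picked proteins that belong to the group
    """
    upper = min(n, j)
    total = comb(m, n)
    # Sum the hypergeometric probabilities for i = k .. min(n, j).
    tail = sum(comb(j, i) * comb(m - j, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 proteins, 4 in the group, 3 picked, 2 of them in the group.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p = enrichment_p_value(m=10, j=4, n=3, k=2)
print(round(p, 4))  # 0.3333
```

In practice one p-value is computed per functional group, so the results need a multiple-test adjustment before interpretation.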

12 Commonly used functional groups
Gene Ontology
  Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products
  Three organizing principles: molecular function, biological process, and cellular component
Pathways
  KEGG
  Pathway Commons
  WikiPathways
Cytogenetic bands

13 WebGestalt: Web-based Gene Set Analysis Toolkit
8 organisms; 132 ID types; 73,986 functional groups
Zhang et al. Nucleic Acids Res. 33:W741, 2005
Duncan et al. BMC Bioinformatics 11(Suppl 4):P10, 2010

14 WebGestalt analysis
1. Select the organism of interest.
2. Upload a gene/protein list in txt format, one ID per row. Optionally, a value can be provided for each ID; in that case, put the ID and value in the same row and separate them with a tab. Then pick the ID type that corresponds to the list of IDs.
3. Categorize the uploaded ID list based on GO Slim (a simplified version of Gene Ontology that focuses on high-level classifications).
4. Analyze the uploaded ID list for enrichment in various biological contexts. Select an appropriate predefined reference set or upload a reference set; if a customized reference set is uploaded, its ID type also needs to be selected. Then select the analysis parameters (e.g. significance level, multiple test adjustment method).
5. Retrieve enrichment results by opening the respective results files. You may also open and/or download a TSV file, or download the zipped results to a directory on your desktop.
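The upload format described in step 2 (one ID per row, with an optional tab-separated value) is easy to produce from a script. This is a small sketch under that description only; the function name and the placeholder IDs are hypothetical, and real identifiers of the chosen ID type would be used in practice.

```python
def format_id_list(ids, values=None):
    """Return upload text: one ID per row, optionally 'ID<TAB>value'.

    ids: iterable of identifier strings
    values: optional dict mapping every ID to its value (e.g. a log ratio)
    """
    lines = []
    for identifier in ids:
        if values is None:
            lines.append(identifier)
        else:
            # Every ID must have a value when values are supplied.
            lines.append(f"{identifier}\t{values[identifier]}")
    return "\n".join(lines) + "\n"


# Placeholder identifiers, for illustration only.
text = format_id_list(["ID_A", "ID_B"], values={"ID_A": 1.8, "ID_B": -2.3})
print(text, end="")
```

The returned text can be written to a `.txt` file with any standard file API before upload.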

15 WebGestalt: ID mapping
Input list: 260 significant proteins identified in the HNSCC study (CSHL2010_hnscc_sig_withLogRatio.txt)
Mapping result:
  Total number of User IDs: 260
  Unambiguously mapped User IDs to Entrez IDs: 229
  Unique User Entrez IDs: 224
The enrichment analysis is based on the unique IDs.

16 WebGestalt: GOSlim classification
[Figure: GO Slim classification charts for biological process, molecular function, and cellular component.]

17 WebGestalt: top 10 enriched GO biological processes
Reference list: CSHL2010_hnscc_all_proteins.txt

18 WebGestalt: top 10 enriched cytogenetic bands
In head and neck squamous cell carcinoma (HNSCC), 11q13 amplification occurs in 30-40% of tumors and correlates with an increase in tumor grade, lymph node metastases, and recurrence, and with decreased survival. (Clark et al. Oncogene, 2008)

19 WebGestalt: top 10 enriched KEGG pathways

20 WebGestalt: top 10 enriched WikiPathways

21 Level III: understanding gene lists at the network level
What is a protein-protein interaction network?
  Most proteins mediate their function by interacting with other proteins to form molecular machines or to participate in various regulatory processes.
  A protein-protein interaction network is a graph model of this complex system in which nodes (proteins) are connected by edges (interactions).
How to get protein interaction information?
  Experiments: yeast two-hybrid; tandem affinity purification
  Computational prediction
  Databases: DIP, MINT, BIND, BioGRID, HPRD, MIPS, etc.
Why network-based analysis?
  It is not limited by existing knowledge of protein functions and pathway annotations.
  It allows a better understanding of the mechanisms by which the identified proteins work together to produce a specific phenotype change.
  It can reveal proteins that were missed in the original experiment.
  It can identify drug targets based on network topology.
  It supports function prediction for unannotated proteins.
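The graph model described above can be sketched with an adjacency map, where each protein points to the set of its interaction partners. The code below is an illustration only: the function names and the toy interactions (P1 to P5) are hypothetical and do not come from any interaction database. It also ranks nodes by their number of interactions, which is one simple way to list hubs.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected graph: protein -> set of interacting proteins."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def hubs(adjacency, top=3):
    """Return the `top` nodes with the most interactions as (node, degree)."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return [(node, len(adjacency[node])) for node in ranked[:top]]


# Hypothetical interactions, for illustration only.
toy_interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5"),
]
network = build_network(toy_interactions)
print(hubs(network, top=2))  # [('P1', 3), ('P2', 2)]
```

Ties in degree are broken alphabetically here so that the output is deterministic; a real analysis might instead report all nodes above a degree threshold.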

22 Genes2Networks
Genes2Networks takes a list of genes/proteins as seeds and identifies all interacting proteins that fall on paths between them through the background network.
Berger et al. BMC Bioinformatics 8:372, 2007
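The idea of collecting proteins that lie on short paths between seeds can be sketched with breadth-first search. This is a simplified illustration and not the Genes2Networks implementation: a node is kept when its two nearest distinct seeds are together within a maximum path length, which treats walks and paths alike. The function names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source, limit):
    """Shortest distances from source to every node within `limit` steps."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue  # do not expand beyond the path-length limit
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def connecting_subnetwork(edges, seeds, max_path_length):
    """Nodes lying on a walk of at most max_path_length between two seeds."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seeds = [s for s in seeds if s in adj]
    dist = {s: bfs_distances(adj, s, max_path_length) for s in seeds}
    keep = set()
    for node in adj:
        # Distances from this node to each seed that can reach it.
        reach = sorted(d[node] for d in dist.values() if node in d)
        if len(reach) >= 2 and reach[0] + reach[1] <= max_path_length:
            keep.add(node)
    return keep


# Hypothetical background network; A, B, C are seeds, X, Y, Z are not.
edges = [("A", "X"), ("X", "B"), ("B", "C"), ("C", "Y"), ("Y", "Z")]
print(sorted(connecting_subnetwork(edges, ["A", "B", "C"], 2)))  # ['A', 'B', 'C', 'X']
print(sorted(connecting_subnetwork(edges, ["A", "B", "C"], 1)))  # ['B', 'C']
```

With a maximum path length of 1, only seeds that interact directly with another seed are returned, so no intermediates appear; with a length of 2, the intermediate X is added because it links seeds A and B.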

23 Genes2Networks: identifying significant intermediates
Genes2Networks uses a Z-score to evaluate the significance of intermediate nodes in the output subnetwork. The Z-score is computed for each intermediate node using a binomial proportions test.

                 Links from the node   Total links
Subnetwork       a                     c
Whole network    b                     d

z = \frac{a/c - b/d}{\sqrt{\frac{b}{d}\left(1 - \frac{b}{d}\right) / c}}
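As a worked illustration of the binomial proportions test, the Z-score can be computed from the four counts in the table. The sketch below uses the slide's symbols (a, b, c, d); the function name and the example counts are assumptions chosen for the illustration, not values from the HNSCC network.

```python
from math import sqrt


def intermediate_z_score(a, c, b, d):
    """Z-score for an intermediate node.

    a: links from the node within the subnetwork
    c: total links in the subnetwork
    b: links from the node in the whole (background) network
    d: total links in the whole network
    """
    observed = a / c      # node's share of subnetwork links
    expected = b / d      # node's share of background links
    return (observed - expected) / sqrt(expected * (1 - expected) / c)


# Hypothetical counts: the node holds 5 of 50 subnetwork links
# but only 10 of 1000 links in the background network.
z = intermediate_z_score(a=5, c=50, b=10, d=1000)
print(z > 0)  # True: the node is over-represented in the subnetwork
```

A node whose share of links is the same in the subnetwork and in the background gets a score of zero; large positive scores mark intermediates that connect to the seeds more often than their overall connectivity would suggest.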

24 Genes2Networks analysis
1. Specify the input gene/protein list (CSHL2010_hnscc_sig_entrezSymbol.txt).
2. Select the Max Path Length (a value of 1 returns only connections between seed nodes).
3. Choose a z-score cutoff for labeling graph nodes.
4. Choose colors for graph nodes based on whether a node is in the seed list or above the significance cutoff.
5. Select the databases to use for generating the background network.
6. Select filtering options for the background dataset: remove low-quality interactions by excluding interactions from high-throughput experiments or interactions without enough evidence.

25 Genes2Networks: HNSCC network
Number of nodes: 139
Number of interactions: 225
Hubs (number of interactions): FN1 (27), THBS1 (16), FGA (11), FLNA (10)
Significant intermediates (z-score): F13A1 (11.6), NID1 (8.0), MFAP5 (7.5), LCN1 (7.5), BMP1 (7.2)
Fibronectin (FN1) plays a major role in cell adhesion, growth, migration, and differentiation. It is important for processes such as wound healing and embryonic development. Altered fibronectin expression, degradation, and organization have been associated with a number of pathologies, including cancer and fibrosis.

26 WebGestalt: pathway enrichment of the network
Network analysis enriches proteins in the same pathways.
Nodes in the subnetwork: CSHL2010_genes2networks_subnet_nodes.txt
Reference list: Human genome

27 Understanding a protein list: summary
Level I: What are the proteins/genes behind the IDs, and what do we know about their functions?
  Biomart
Level II: Which biological processes and pathways are the most interesting in terms of the experimental question?
  WebGestalt
  Related tools: DAVID, GenMAPP
Level III: How do the proteins work together, and are we missing some important proteins?
  Genes2Networks
  Related tools: Cytoscape, STRING, GeneMANIA, Ingenuity, Pathway Studio


More information

How to view Results with Scaffold. Proteomics Shared Resource

How to view Results with Scaffold. Proteomics Shared Resource How to view Results with Scaffold Proteomics Shared Resource Starting out Download Scaffold from http://www.proteomes oftware.com/proteom e_software_prod_sca ffold_download.html Follow installation instructions

More information

High-throughput scale. Desktop simplicity.

High-throughput scale. Desktop simplicity. High-throughput scale. Desktop simplicity. NextSeq 500 System. Flexible power. Speed and simplicity for whole-genome, exome, and transcriptome sequencing. Harness the power of next-generation sequencing.

More information

TELEBIOINFORMATICS SERVICES OF POLITECNICO DI MILANO FOR POST-GENOMIC BIOMEDICAL APPLICATIONS

TELEBIOINFORMATICS SERVICES OF POLITECNICO DI MILANO FOR POST-GENOMIC BIOMEDICAL APPLICATIONS TELEBIOINFORMATICS SERVICES OF POLITECNICO DI MILANO FOR POST-GENOMIC BIOMEDICAL APPLICATIONS Marco Masseroli and Francesco Pinciroli Biomedical Informatics and Telemedicine Laboratory, Bioengineering

More information

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017 Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017 Agenda What is Functional Genomics? RNA Transcription/Gene Expression Measuring Gene

More information

Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation

Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation Immunity Supplemental Information Compendium of Immune Signatures Identifies Conserved and Species-Specific Biology in Response to Inflammation Jernej Godec, Yan Tan, Arthur Liberzon, Pablo Tamayo, Sanchita

More information

How to view Results with. Proteomics Shared Resource

How to view Results with. Proteomics Shared Resource How to view Results with Scaffold 3.0 Proteomics Shared Resource An overview This document is intended to walk you through Scaffold version 3.0. This is an introductory guide that goes over the basics

More information

Fundamentals of Bioinformatics: computation, biology, computational biology

Fundamentals of Bioinformatics: computation, biology, computational biology Fundamentals of Bioinformatics: computation, biology, computational biology Vasilis J. Promponas Bioinformatics Research Laboratory Department of Biological Sciences University of Cyprus A short self-introduction

More information

Training Account. Account: ~ Password: ingenuity123. Sample & Assay Technologies

Training Account. Account: ~ Password: ingenuity123. Sample & Assay Technologies Training Account Account: asininca21@ingenuity.com ~ asinica40@ingenuity.com Password: ingenuity123 1 IPA Introductory Training Course Academia Sinica 2014 September Chris (Yu-Lun Kuo) 2 About me Chris

More information

The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches

The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches DOI 10.1186/s13742-015-0083-4 RESEARCH The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches Ishita K. Khan 1, Qing Wei 1, Samuel Chapman 3, Dukka

More information

ONLINE BIOINFORMATICS RESOURCES

ONLINE BIOINFORMATICS RESOURCES Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 The larger picture.. Lower

More information

The hidden transcriptome: discovery of novel, stress-responsive transcription in Daphnia pulex

The hidden transcriptome: discovery of novel, stress-responsive transcription in Daphnia pulex University of Iowa Iowa Research Online Theses and Dissertations Spring 2011 The hidden transcriptome: discovery of novel, stress-responsive transcription in Daphnia pulex Stephen Butcher University of

More information

Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice

Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice Using 2-way ANOVA to dissect gene expression following myocardial infarction in mice Thank you for waiting. The presentation will be starting in a few minutes at 9AM Pacific Daylight Time. During this

More information

Bayesian Networks as framework for data integration

Bayesian Networks as framework for data integration Bayesian Networks as framework for data integration Jun Zhu, Ph. D. Department of Genomics and Genetic Sciences Icahn Institute of Genomics and Multiscale Biology Icahn Medical School at Mount Sinai New

More information

Access to Information from Molecular Biology and Genome Research

Access to Information from Molecular Biology and Genome Research Future Needs for Research Infrastructures in Biomedical Sciences Access to Information from Molecular Biology and Genome Research DG Research: Brussels March 2005 User Community for this information is

More information

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE BIOMOLECULES COURSE: COMPUTER PRACTICAL 1 Author of the exercise: Prof. Lloyd Ruddock Edited by Dr. Leila Tajedin 2017-2018 Assistant: Leila Tajedin (leila.tajedin@oulu.fi)

More information

Introduction to BIOINFORMATICS

Introduction to BIOINFORMATICS Introduction to BIOINFORMATICS Antonella Lisa CABGen Centro di Analisi Bioinformatica per la Genomica Tel. 0382-546361 E-mail: lisa@igm.cnr.it http://www.igm.cnr.it/pagine-personali/lisa-antonella/ What

More information

Lecture 11: Bioinformatics tools and databases Vladimir Rogojin. Fall 2015

Lecture 11: Bioinformatics tools and databases Vladimir Rogojin. Fall 2015 Introduction to Computational and Systems Biology Lecture 11: Bioinformatics tools and databases Vladimir Rogojin Department of Computer Science, Åbo Akademi http://users.abo.fi/ipetre/compsysbio Fall

More information

The Gene Ontology Providing a Functional Role in Proteomic Studies

The Gene Ontology Providing a Functional Role in Proteomic Studies 2 Education & Training DOI 10.1002/pmic.200800002 Practical Proteomics 1/2008 The Gene Ontology Providing a Functional Role in Proteomic Studies Emily C. Dimmer 1, Rachael P. Huntley 1, Daniel G. Barrell

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

SMISS: A protein function prediction server by integrating multiple sources

SMISS: A protein function prediction server by integrating multiple sources SMISS 1 SMISS: A protein function prediction server by integrating multiple sources Renzhi Cao 1, Zhaolong Zhong 1 1, 2, 3, *, and Jianlin Cheng 1 Department of Computer Science, University of Missouri,

More information

Ontologies examples and applications

Ontologies examples and applications Web Science & Technologies University of Koblenz Landau, Germany Ontologies examples and applications 2 UMLS - Unified Medical Language System Framework consisting of several knowledge bases and according

More information

Types of Databases - By Scope

Types of Databases - By Scope Biological Databases Bioinformatics Workshop 2009 Chi-Cheng Lin, Ph.D. Department of Computer Science Winona State University clin@winona.edu Biological Databases Data Domains - By Scope - By Level of

More information

A PROTEIN INTERACTION NETWORK OF THE MALARIA PARASITE PLASMODIUM FALCIPARUM

A PROTEIN INTERACTION NETWORK OF THE MALARIA PARASITE PLASMODIUM FALCIPARUM A PROTEIN INTERACTION NETWORK OF THE MALARIA PARASITE PLASMODIUM FALCIPARUM DOUGLAS J. LACOUNT, MARISSA VIGNALI, RAKESH CHETTIER, AMIT PHANSALKAR, RUSSELL BELL, JAY R. HESSELBERTH, LORI W. SCHOENFELD,

More information

Biotechnology Explorer

Biotechnology Explorer Biotechnology Explorer C. elegans Behavior Kit Bioinformatics Supplement explorer.bio-rad.com Catalog #166-5120EDU This kit contains temperature-sensitive reagents. Open immediately and see individual

More information

Introduction to RNA sequencing

Introduction to RNA sequencing Introduction to RNA sequencing Bioinformatics perspective Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden November 2017 Olga (NBIS) RNA-seq November 2017 1 / 49 Outline Why sequence

More information

Training materials.

Training materials. Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation

More information

Genomics AGRY Michael Gribskov Hock 331

Genomics AGRY Michael Gribskov Hock 331 Genomics AGRY 60000 Michael Gribskov gribskov@purdue.edu Hock 331 Computing Essentials Resources In this course we will assemble and annotate both genomic and transcriptomic sequence assemblies We will

More information

MICROARRAYS+SEQUENCING

MICROARRAYS+SEQUENCING MICROARRAYS+SEQUENCING The most efficient way to advance genomics research Down to a Science. www.affymetrix.com/downtoascience Affymetrix GeneChip Expression Technology Complementing your Next-Generation

More information

The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential

The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential Applications Richard Finkers Researcher Plant Breeding, Wageningen UR Plant Breeding, P.O. Box 16, 6700 AA, Wageningen, The Netherlands,

More information

Ad hoc Interfaces for Querying Genomic Data

Ad hoc Interfaces for Querying Genomic Data Ad hoc Interfaces for Querying Genomic Data Peter Rieger Stephan Heymann Berlin-Adlershof Exemplified by Alternative Spliceform Evaluation Observations The One Domain per Exon Concept (Mid 90s) Modular

More information

Sequence Databases and database scanning

Sequence Databases and database scanning Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.

More information

TOTAL CANCER CARE: CREATING PARTNERSHIPS TO ADDRESS PATIENT NEEDS

TOTAL CANCER CARE: CREATING PARTNERSHIPS TO ADDRESS PATIENT NEEDS TOTAL CANCER CARE: CREATING PARTNERSHIPS TO ADDRESS PATIENT NEEDS William S. Dalton, PhD, MD CEO, M2Gen & Director, Personalized Medicine Institute, Moffitt Cancer Center JULY 15, 2013 MOFFITT CANCER CENTER

More information

Network System Inference

Network System Inference Network System Inference Francis J. Doyle III University of California, Santa Barbara Douglas Lauffenburger Massachusetts Institute of Technology WTEC Systems Biology Final Workshop March 11, 2005 What

More information

Introduction to genome biology

Introduction to genome biology Introduction to genome biology Lisa Stubbs We ve found most genes; but what about the rest of the genome? Genome size* 12 Mb 95 Mb 170 Mb 1500 Mb 2700 Mb 3200 Mb #coding genes ~7000 ~20000 ~14000 ~26000

More information

PV92 PCR Bio Informatics

PV92 PCR Bio Informatics Purpose of PCR Chromosome 16 PV92 PV92 PCR Bio Informatics Alu insert, PV92 locus, chromosome 16 Introduce the polymerase chain reaction (PCR) technique Apply PCR to population genetics Directly measure

More information