In silico prediction of novel therapeutic targets using gene disease association data


1 In silico prediction of novel therapeutic targets using gene disease association data. PhD, Associate GSK Fellow; Scientific Leader, Computational Biology and Stats, Target Sciences, GSK. Big Data in Medicine

2 Challenges in pharma R&D: time and costs are increasing but the success rate is declining

3 Why focus on targets? Late phase failures cost (a lot) more. [Figure: number of molecules (N molecules) and relative cost (per molecule) by pipeline stage: Lead discovery, Lead optimization, Pre-clinical, FTIH, Phase 2, Phase 3. Source: Manhattan Institute]

4 Rethink the drug discovery pipeline: spend more time and resources in target validation to reduce attrition in later phases. [Figure: two versions of the pipeline, each running Potential targets, Target validation, Lead discovery, Lead optimisation, Pre-clinical, FTIH, Phase 2, Phase 3, Launch]

5 Target discovery and genetics evidence Cook et al., 2014; Nelson et al., % of efficacy failures are due to poor linkage between target and disease. The proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline. Selecting genetically supported targets could double the success rate in clinical development.

6 Open Targets: a platform for therapeutic target identification and validation

7 Predicting therapeutic targets: could it be as easy as spotting spam emails? Is it possible to predict novel therapeutic targets using available gene disease association data?

8 A simple machine learning workflow: predict therapeutic targets using only gene disease association data
- Generate input data matrix
- Assign labels and split into training, test and prediction sets
- Exploratory data analysis
- Tune, train and test classifiers using nested cross-validation
- Evaluate best classifier performance on test set
- Make predictions using best performing classifier
- Validate with literature text mining
- Explore predicted targets across the drug discovery pipeline

9 Data sources and data processing: input data matrix generation
Obtain all gene disease associations and supporting evidence from the Open Targets platform. For all genes, create numeric features by taking the mean score across all diseases:
- Genetic associations (germline)
- Somatic mutations
- Significant gene expression changes
- Disease-relevant phenotype in animal model
- Pathway-level evidence
Gather positive labels from Pharmaprojects: only consider targets with drugs currently on the market, in clinical trials or in preclinical studies. Exclude targets with drugs withdrawn from the market or whose development has been discontinued.
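
The feature construction described on this slide could be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the record layout `(gene, disease, evidence_type, score)`, the evidence-type keys, and the gene and disease names are all hypothetical, and two details the slide does not specify are assumed (the mean is taken over the diseases for which a gene has a score, and a missing evidence type is scored 0.0).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical short keys for the five evidence types listed on the slide.
EVIDENCE_TYPES = [
    "genetic_association",
    "somatic_mutation",
    "rna_expression",
    "animal_model",
    "affected_pathway",
]


def build_feature_matrix(associations):
    """Return {gene: [mean score per evidence type]}.

    `associations` is an iterable of (gene, disease, evidence_type, score).
    """
    scores = defaultdict(lambda: defaultdict(list))
    for gene, _disease, evidence_type, score in associations:
        scores[gene][evidence_type].append(score)

    matrix = {}
    for gene, by_type in scores.items():
        # Assumption: an evidence type with no record for a gene scores 0.0.
        matrix[gene] = [
            mean(by_type[t]) if by_type[t] else 0.0 for t in EVIDENCE_TYPES
        ]
    return matrix


# Made-up toy records, purely for illustration.
records = [
    ("GENE_A", "asthma", "genetic_association", 0.75),
    ("GENE_A", "COPD", "genetic_association", 0.25),
    ("GENE_A", "asthma", "animal_model", 0.5),
    ("GENE_B", "melanoma", "somatic_mutation", 1.0),
]
features = build_feature_matrix(records)
```

Each gene ends up as one row of five numbers, which is the shape a classifier expects as input.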

10 A positive unlabelled (PU) semi-supervised learning approach: split data into training, test and prediction sets
A semi-supervised framework with only positive labels is used: targets according to Pharmaprojects constitute the positive class (P), while the rest of the proteome is used as the unlabelled class (U), which contains both negatives and yet-to-be-discovered positives. All positive cases (1421) and an equal number of randomly selected unlabelled cases (2842 cases in total) are set apart for training (80%) and testing (20%). The remainder is kept as a prediction set, on which predictions from the final model will be made.
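
The split described on this slide can be illustrated with a small sketch. The function name `pu_split`, the toy identifiers and the fixed seed are assumptions for the example. The slide does not say whether the 80/20 split was stratified by class, so a plain shuffled split is used here.

```python
import random


def pu_split(positives, unlabelled, train_fraction=0.8, seed=0):
    """Split PU data into training, test and prediction sets.

    Label 1 marks known targets (P); label 0 marks sampled unlabelled
    cases (U), which are not confirmed negatives.
    """
    rng = random.Random(seed)
    # Draw as many unlabelled cases as there are positives.
    sampled_u = rng.sample(unlabelled, len(positives))
    sampled_set = set(sampled_u)
    # Everything not sampled is held back for final predictions.
    prediction_set = [g for g in unlabelled if g not in sampled_set]

    labelled = [(g, 1) for g in positives] + [(g, 0) for g in sampled_u]
    rng.shuffle(labelled)
    cut = int(round(train_fraction * len(labelled)))
    return labelled[:cut], labelled[cut:], prediction_set


# Toy identifiers: 10 positives and 50 unlabelled genes.
toy_positives = [f"P{i}" for i in range(10)]
toy_unlabelled = [f"U{i}" for i in range(50)]
train, test, prediction = pu_split(toy_positives, toy_unlabelled)
print(len(train), len(test), len(prediction))
```

With the numbers on the slide, the same proportions would give 2842 labelled cases split 80/20, and the rest of the proteome left for prediction.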

11 Dimensionality reduction reveals structure in the data: t-distributed Stochastic Neighbour Embedding (t-SNE)

12 What are the most important features? Chi-squared test + information gain
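
As an illustration of one of the two criteria named on this slide, information gain for a discrete feature can be computed as below. This is a generic textbook sketch, not the analysis behind the slide. It assumes the continuous evidence scores have first been discretised (here simply into 0/1), and the toy vectors are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy example: 1 = target, 0 = unlabelled.
toy_labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # feature separates the classes perfectly
uninformative = [1, 0, 1, 0]  # feature carries no information about the class
gain_informative = information_gain(informative, toy_labels)
gain_uninformative = information_gain(uninformative, toy_labels)
```

Features can then be ranked by their gain, with higher values indicating evidence types that better separate known targets from unlabelled genes.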

13 Nested cross-validation and bagging for tuning and model selection: tuning, training and testing four classifiers
Four classifiers are independently tuned, trained and tested on the training set using a nested cross-validation strategy (4 inner rounds for parameter tuning and 4 outer rounds to assess performance):
- Random forest (tuned parameters: number of trees and number of features)
- Feed-forward neural network with a single hidden layer (tuned parameters: size and decay)
- Support vector machine with radial kernel (tuned parameters: gamma and cost)
- Gradient boosting machine with AdaBoost exponential loss function (tuned parameters: number of trees and interaction depth)
In PU learning, U contains both positive and negative cases, which results in classifier instability. Bagging (bootstrap aggregating) can improve the performance of unstable classifiers by randomly resampling P and U with replacement (bootstrap) and then aggregating the results by majority voting. Bagging with 100 iterations was applied to the neural network, the support vector machine and the gradient boosting machine. Random forests are already a special case of bagging.
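
The bagging idea on this slide (bootstrap P and U separately, fit a model on each resample, aggregate by majority vote) can be sketched as follows. The base learner here is a deliberately simple nearest-centroid classifier, swapped in so the example stays self-contained. It is not one of the classifiers used in the talk, and all data points are made up.

```python
import random
from statistics import mean


def fit_centroids(samples):
    """Toy base learner: store the per-class mean of each feature."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in samples if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def bagged_predict(positives, unlabelled, queries, n_iterations=100, seed=0):
    """Bootstrap P and U with replacement, then aggregate by majority vote."""
    rng = random.Random(seed)
    votes = [0] * len(queries)
    for _ in range(n_iterations):
        boot_p = rng.choices(positives, k=len(positives))
        boot_u = rng.choices(unlabelled, k=len(unlabelled))
        samples = [(x, 1) for x in boot_p] + [(x, 0) for x in boot_u]
        model = fit_centroids(samples)
        for i, x in enumerate(queries):
            votes[i] += predict_centroids(model, x)
    # A query is called a target if more than half of the models vote for it.
    return [1 if 2 * v > n_iterations else 0 for v in votes]


# Made-up feature vectors; the last unlabelled point mimics a hidden positive.
toy_p = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
toy_u = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.1], [0.8, 0.8]]
calls = bagged_predict(toy_p, toy_u, [[0.85, 0.9], [0.05, 0.1]])
```

Because each resample draws a different mix of true negatives and hidden positives from U, individual models vary, and the vote averages out that instability.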

14 Evaluating classifier performance: receiver operating characteristic (ROC) curves and area under the curve (AUC)
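
For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that definition follows, with made-up labels and scores. In a PU setting the "negatives" are unlabelled cases, which is an added caveat rather than something stated on the slide.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy scores: one positive (0.4) is ranked below one negative (0.5).
auc_example = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
auc_perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration. Library implementations use a sort-based calculation instead.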

15 Disease association evidence is higher for more advanced targets: the model predicts late-stage targets more easily than early-stage ones

16 Literature text mining validation of predictions: highly significant overlap between predictions and text mining results

17 Conclusions: in silico predictions of novel therapeutic targets using gene disease association data
- The gene disease association data from Open Targets contain enough information to predict, with decent accuracy (71%), whether or not a protein can make a therapeutic target.
- Aside from standard cross-validation and testing, prediction results were also validated by mining the scientific literature for therapeutic targets and assessing the significance of the overlap.
- The ability of the neural network model to predict late-stage targets with greater accuracy confirms that a clear linkage between target and disease is essential to maximise the chances of success in the clinic.
- Of the evidence types tested, animal models showing disease-relevant phenotypes, dysregulated gene expression in disease tissue and genetic associations between gene and disease appear to be the most informative.

18 Acknowledgements: Ian Dunham, Philippe Sanseau, Gautier Koscielny, Giovanni Dall'Olio, Pankaj Agarwal, Mark Hurle, Steven Barrett, Nicola Richmond, Jin Yao

19 Thank you

20 Pharmaprojects: an industry-wide drug development database

21 Exploratory data analysis reveals sparse data with little structure: hierarchical clustering + principal component analysis

22 Tune, train and test classifiers using cross-validation: decision tree classification criteria

23 Evaluating classifier performance: performance measures for supervised learning

24 Neural network performance on independent test set: selected classifier with most balanced overall performance for further analyses. [Table: misclassification error, accuracy, AUC, recall/sensitivity, specificity, precision and F1 score, reported for cross-validation and for the test set; values not preserved in the transcript]
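
The performance measures named in this table all derive from the four cells of a confusion matrix. A short sketch of the standard definitions follows. The counts used are invented for illustration and are not the results from the talk.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures from true/false positive/negative counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "misclassification_error": (fp + fn) / total,
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Invented counts: 60 true targets and 40 non-targets.
metrics = classification_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In PU learning these figures should be read with care: a "false positive" may be a genuine but not yet discovered target sitting in the unlabelled class.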

25 Tune, train and test classifiers using cross-validation: misclassification error

26 Evaluate best classifier performance on test set: confusion matrices. [Figure: two confusion matrices, one for cross-validation and one for the test set, each comparing actual value against prediction outcome for the classes Unknown and Target; counts not preserved in the transcript]

27 Split into training, test and prediction sets: assess the effect of randomly sampling from the unlabelled class with a Monte Carlo simulation

28 Tune, train and test classifiers using cross-validation: precision recall curves

29 Tune, train and test classifiers using cross-validation: overlap between predictions on training set. [Figure panels: Predicted targets, Predicted non-targets]

30 Targets with lower disease association fail more often: majority of targets with discontinued programmes not predicted as targets

31 Generating predictions on remaining 15K genes: run model on prediction set (not used for training/testing)

32 Validate with literature text mining: assess the significance of the literature-based validation with a permutation test
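
One way such a permutation test could work is sketched below. The slide gives no details of the procedure, so this is an assumed, generic version: the observed overlap between predicted targets and text-mined targets is compared with the overlaps obtained when the same number of genes is drawn at random from the prediction set. Gene identifiers, set sizes and the number of permutations are all made up.

```python
import random


def permutation_p_value(predicted, text_mined, universe,
                        n_permutations=1000, seed=0):
    """Empirical p-value for the overlap between two gene sets.

    `universe` must be a list of all genes that could have been predicted.
    """
    rng = random.Random(seed)
    text_mined_set = set(text_mined)
    observed = len(text_mined_set & set(predicted))
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Random gene set of the same size as the real prediction list.
        random_set = rng.sample(universe, len(predicted))
        if len(text_mined_set.intersection(random_set)) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Toy universe of 200 genes; the two lists share 10 genes by construction.
toy_universe = [f"G{i}" for i in range(200)]
toy_predicted = toy_universe[:20]
toy_text_mined = toy_universe[10:30]
overlap, p_value = permutation_p_value(toy_predicted, toy_text_mined, toy_universe)
print(overlap, p_value)
```

With 1000 permutations the smallest p-value this can report is 1/1001, so more permutations are needed to resolve stronger significance levels.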


More information

BioXplain The Alliance for Integrative Biology

BioXplain The Alliance for Integrative Biology The Alliance for Integrative Biology The First Open Platform for Iterative, Predictive and Integrative Biology Manuel GEA Co-founder & CEO Bio-Modeling Systems BioXplain 2009 Pharma R&D critical challenge

More information

Antibody Discovery at Evotec

Antibody Discovery at Evotec Antibody Discovery at Evotec - Overview - Evotec Antibodies Adding value to our partners research Innovative and flexible solutions from target ID to pre-clinical candidate The people A wide therapeutic

More information

Pharmacogenetics: A SNPshot of the Future. Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001

Pharmacogenetics: A SNPshot of the Future. Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001 Pharmacogenetics: A SNPshot of the Future Ani Khondkaryan Genomics, Bioinformatics, and Medicine Spring 2001 1 I. What is pharmacogenetics? It is the study of how genetic variation affects drug response

More information

Lecture 6: Decision Tree, Random Forest, and Boosting

Lecture 6: Decision Tree, Random Forest, and Boosting Lecture 6: Decision Tree, Random Forest, and Boosting Tuo Zhao Schools of ISyE and CSE, Georgia Tech CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Tree? Tuo Zhao Lecture 6: Decision

More information

Pioneering Clinical Omics

Pioneering Clinical Omics Pioneering Clinical Omics Clinical Genomics Strand NGS An analysis tool for data generated by cutting-edge Next Generation Sequencing(NGS) instruments. Strand NGS enables read alignment and analysis of

More information

LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases Round 2

LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases Round 2 LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases Round 2 DEADLINE: March 13, 2019, 2:00pm EDT Applicants will be notified of Proposal invitations in May 2019 This Letter of Intent

More information

LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases Round 2

LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases Round 2 LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases Round 2 DEADLINE: May 15, 2019, 2:00pm EDT Applicants will be notified of Proposal invitations in August 2019 This Letter of

More information

- OMICS IN PERSONALISED MEDICINE

- OMICS IN PERSONALISED MEDICINE SUMMARY REPORT - OMICS IN PERSONALISED MEDICINE Workshop to explore the role of -omics in the development of personalised medicine European Commission, DG Research - Brussels, 29-30 April 2010 Page 2 Summary

More information

LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases

LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases LETTER OF INTENT Rapid Response: Canada 2019 Parkinson s & Related Diseases DEADLINE: Wednesday, August 1, 2018, 2:00pm EDT Applicants will be notified of Proposal invitations in September 2018. This Letter

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Summary of genetic association data and their traits and gene mappings.

Nature Genetics: doi: /ng Supplementary Figure 1. Summary of genetic association data and their traits and gene mappings. Supplementary Figure 1 Summary of genetic association data and their traits and gene mappings. Distribution of the (a) number of publications or sources and (b) reported associations for each unique MeSH

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

BioXplain The Alliance for Integrative Biology The First Open Platform for Iterative, Predictive and Integrative Biology

BioXplain The Alliance for Integrative Biology The First Open Platform for Iterative, Predictive and Integrative Biology The Alliance for Integrative Biology The First Open Platform for Iterative, Predictive and Integrative Biology BioXplain 2009 Pharma R&D critical challenge Despite increasing investments in Technology

More information

Biomarker discovery. Enabling pharmaceutical and biotech partners to discover relevant biomarkers in diseases of interest

Biomarker discovery. Enabling pharmaceutical and biotech partners to discover relevant biomarkers in diseases of interest Biomarker discovery Enabling pharmaceutical and biotech partners to discover relevant biomarkers in diseases of interest Biomarker discovery A biomarker is a measurable indicator of the severity or presence

More information

CS262 Lecture 12 Notes Single Cell Sequencing Jan. 11, 2016

CS262 Lecture 12 Notes Single Cell Sequencing Jan. 11, 2016 CS262 Lecture 12 Notes Single Cell Sequencing Jan. 11, 2016 Background A typical human cell consists of ~6 billion base pairs of DNA and ~600 million bases of mrna. It is time-consuming and expensive to

More information

Computational Approaches to Analysis of DNA Microarray Data

Computational Approaches to Analysis of DNA Microarray Data 2006 IMI and Schattauer GmbH 91 Computational pproaches to nalysis of DN Microarray Data J. Quackenbush Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department

More information

Dynamic Advisor-Based Ensemble (dynabe): Case Study in Stock Trend Prediction of Critical Metal Companies

Dynamic Advisor-Based Ensemble (dynabe): Case Study in Stock Trend Prediction of Critical Metal Companies Dynamic Advisor-Based Ensemble (dynabe): Case Study in Stock Trend Prediction of Critical Metal Companies Zhengyang Dong Middlesex School, Concord, MA, USA Corresponding Author: Zhengyang Dong Middlesex

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

PharmaPerspectiveonCDx. DrGillian Ellison

PharmaPerspectiveonCDx. DrGillian Ellison PharmaPerspectiveonCDx DrGillian Ellison Pharma Perspective Need for CDx Partnering with Dx company Co- development Launch readiness & LCM Need for CDx Drug development Drug development is a challenging

More information

How Targets Are Chosen. Chris Wayman 12 th April 2012

How Targets Are Chosen. Chris Wayman 12 th April 2012 How Targets Are Chosen Chris Wayman 12 th April 2012 A few questions How many ideas does it take to make a medicine? 10 20 20-50 50-100 A few questions How long does it take to bring a product from bench

More information

Our website:

Our website: Biomedical Informatics Summer Internship Program (BMI SIP) The Department of Biomedical Informatics hosts an annual internship program each summer which provides high school, undergraduate, and graduate

More information

From Bench to Bedside: Role of Informatics. Nagasuma Chandra Indian Institute of Science Bangalore

From Bench to Bedside: Role of Informatics. Nagasuma Chandra Indian Institute of Science Bangalore From Bench to Bedside: Role of Informatics Nagasuma Chandra Indian Institute of Science Bangalore Electrocardiogram Apparent disconnect among DATA pieces STUDYING THE SAME SYSTEM Echocardiogram Chest sounds

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Microarray Data Analysis: Lecture 2. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data

More information

Azure ML Studio. Overview for Data Engineers & Data Scientists

Azure ML Studio. Overview for Data Engineers & Data Scientists Azure ML Studio Overview for Data Engineers & Data Scientists Rakesh Soni, Big Data Practice Director Randi R. Ludwig, Ph.D., Data Scientist Daniel Lai, Data Scientist Intersys Company Summary Overview

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Functional Genomics: Microarray Data Analysis Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Outline Introduction Working with microarray data Normalization Analysis

More information

BIOINFORMATICS THE MACHINE LEARNING APPROACH

BIOINFORMATICS THE MACHINE LEARNING APPROACH 88 Proceedings of the 4 th International Conference on Informatics and Information Technology BIOINFORMATICS THE MACHINE LEARNING APPROACH A. Madevska-Bogdanova Inst, Informatics, Fac. Natural Sc. and

More information

Complex Adaptive Systems Forum: Transformative CAS Initiatives in Biomedicine

Complex Adaptive Systems Forum: Transformative CAS Initiatives in Biomedicine Complex Adaptive Systems Forum: Transformative CAS Initiatives in Biomedicine January 18, 2013 Anna D. Barker, Ph.D. Director, Transformative Healthcare Networks C-Director, Complex Adaptive Systems Initiative

More information

Course Agenda. Day One

Course Agenda. Day One Course Agenda BioImmersion: Biotech for the Non-Scientist A three-day, in-depth course that provides the background required for understanding today s fast-paced biotech marketplace. Beginning with an

More information

Smart India Hackathon

Smart India Hackathon TM Persistent and Hackathons Smart India Hackathon 2017 i4c www.i4c.co.in Digital Transformation 25% of India between age of 16-25 Our country needs audacious digital transformation to reach its potential

More information

Micar Innovation. Drug Discovery Factory for novel drug molecules

Micar Innovation. Drug Discovery Factory for novel drug molecules Problem There are so many incurable diseases around the world that need adequate novel compounds for the treatment, like: - Chronic pain still a pandemic in the 21 st century and affecting 1.5bn people

More information

Statistical Analysis of Gene Expression Data Using Biclustering Coherent Column

Statistical Analysis of Gene Expression Data Using Biclustering Coherent Column Volume 114 No. 9 2017, 447-454 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu 1 ijpam.eu Statistical Analysis of Gene Expression Data Using Biclustering Coherent

More information

PREDICTION OF CONCRETE MIX COMPRESSIVE STRENGTH USING STATISTICAL LEARNING MODELS

PREDICTION OF CONCRETE MIX COMPRESSIVE STRENGTH USING STATISTICAL LEARNING MODELS Journal of Engineering Science and Technology Vol. 13, No. 7 (2018) 1916-1925 School of Engineering, Taylor s University PREDICTION OF CONCRETE MIX COMPRESSIVE STRENGTH USING STATISTICAL LEARNING MODELS

More information

Exon Skipping. Wendy Erler Patient Advocacy Wave Life Sciences

Exon Skipping. Wendy Erler Patient Advocacy Wave Life Sciences Exon Skipping Wendy Erler Patient Advocacy Wave Life Sciences Forward Looking Statements This document contains forward-looking statements. All statements other than statements of historical facts contained

More information