Gene expression connectivity mapping and its application to Cat-App

Similar documents
Gene Signature Lab: Exploring integrative LINCS (ilincs) Data and Signatures Analysis Portal & Other LINCS Resources

Bioinformatics : Gene Expression Data Analysis

Aiming to minimize testing in laboratory animals for the regulatory safety evaluation of petroleum substances

Progress and Future Directions in Integrated Systems Toxicology. Mary McBride Agilent Technologies

How Targets Are Chosen. Chris Wayman 12 th April 2012

- OMICS IN PERSONALISED MEDICINE

Machine Learning. HMM applications in computational biology

Knowledge-Guided Analysis with KnowEnG Lab

"Stratification biomarkers in personalised medicine"

Pioneering Clinical Omics

Lecture 23: Clinical and Biomedical Applications of Proteomics; Proteomics Industry

Data Mining for Biological Data Analysis

Our website:

The Human Toxome Project a test case for pathway identification by multiomics. Thomas Hartung

IPA Advanced Training Course

Gene Expression Data Analysis

Research Powered by Agilent s GeneSpring

Proteomics. Manickam Sugumaran. Department of Biology University of Massachusetts Boston, MA 02125

EECS730: Introduction to Bioinformatics

Alexander Statnikov, Ph.D.

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Whole Transcriptome Analysis of Illumina RNA- Seq Data. Ryan Peters Field Application Specialist

PREDICTION AND SIMULATION OF MULTI-TARGET THERAPIES FOR TRIPLE NEGATIVE BREAST CANCER THROUGH A NETWORK-BASED DATA INTEGRATION APPROACH

Capabilities & Services

Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs. Bertil Schmidt Christian Hundt

Integrating Molecular Toxicology Earlier in the Drug Development Process

ROAD TO STATISTICAL BIOINFORMATICS CHALLENGE 1: MULTIPLE-COMPARISONS ISSUE

Proteomics And Cancer Biomarker Discovery. Dr. Zahid Khan Institute of chemical Sciences (ICS) University of Peshawar. Overview. Cancer.

Functional genomics + Data mining

Introducing a Highly Integrated Approach to Translational Research: Biomarker Data Management, Data Integration, and Collaboration

Integration of FDSS7000 into a modular robotic system for Open Innovation drug discovery

MICROARRAYS+SEQUENCING

Microarrays & Gene Expression Analysis

University of Pennsylvania Perelman School of Medicine High Throughput Screening Core

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics

Information Driven Biomedicine. Prof. Santosh K. Mishra Executive Director, BII CIAPR IV Shanghai, May

Basics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility

Top 5 Lessons Learned From MAQC III/SEQC

Course Agenda. Day One

DNA Studies (Genetic Studies) Disclosure Statement. Definition of Genomics. Outline. Immediate Present and Near Future

Stefano Monti. Workshop Format

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

Bayer Pharma s High Tech Platform integrates technology experts worldwide establishing one of the leading drug discovery research platforms

AIT - Austrian Institute of Technology

Enabling more sophisticated analysis of proteomic profiles. Limsoon Wong (Joint work with Wilson Wen Bin Goh)

Gene Expression Analysis with Pathway-Centric DNA Microarrays

Introduction to Drug Design and Discovery

BioXplain The Alliance for Integrative Biology The First Open Platform for Iterative, Predictive and Integrative Biology

Predictive and Causal Modeling in the Health Sciences. Sisi Ma MS, MS, PhD. New York University, Center for Health Informatics and Bioinformatics

Towards a framework for transcriptomics and other Big Data analysis for regulatory application

Computational methods in bioinformatics: Lecture 1

The essentials of microarray data analysis

Maximizing opportunities towards achieving clinical success D R U G D I S C O V E R Y. Report Price Publication date

CellFateScout a bioinformatics tool for elucidating small molecule signaling pathways that drive cells in a specific direction

April transmart v1.2 Case Study for PredicTox

Recent years have witnessed an expansion in the disciplines encompassing drug

Microarray Informatics

Bioinformatics for Biologists

The Irish Biomarker Network - Inaugural Workshop Improved Translation of Biomarkers to the Clinic

Combining Target-Based and Phenotypic Discovery Assays for Drug Repurposing. Sharlene Velichko, Ph.D. Director of Research Biology November 11, 2017

Introduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Hector R. Wong, MD Division of Critical Care Medicine Cincinnati Children s Hospital Medical Center Cincinnati Children s Research Foundation

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

Biomarkers and clinical trials. Patricia woo UCL

Introduction to BIOINFORMATICS

Enrichment Design with Patient Population Augmentation

Introduction into single-cell RNA-seq. Kersti Jääger 19/02/2014

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models

Measuring gene expression

Terminology for personalized medicine

Case Study: Dr. Jonny Wray, Head of Discovery Informatics at e-therapeutics PLC

Improving coverage and consistency of MS-based proteomics. Limsoon Wong (Joint work with Wilson Wen Bin Goh)

MANIFESTO OF STUDIES 2012

BIOCRATES Life Sciences AG Short Company Presentation

Machine Learning in Computational Biology CSC 2431

TARGET VALIDATION. Maaike Everts, PhD (with slides from Dr. Suto)

BioXplain The Alliance for Integrative Biology

Genomics and Proteomics *

Introduction to Bioinformatics

Application of Deep Learning to Drug Discovery

Application of Deep Learning to Drug Discovery

Extending the small molecule similarity principle to all levels of biology

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data

GENOMICS for DUMMIES

Medicinal Chemistry of Modern Antibiotics

Medicinal Chemistry of Modern Antibiotics

NGS Approaches to Epigenomics

UAMS ADVANCED DIAGNOSTICS FOR ADVANCING CURE

From Bench to Bedside: Role of Informatics. Nagasuma Chandra Indian Institute of Science Bangalore

Towards unbiased biomarker discovery

Outline. Analysis of Microarray Data. Most important design question. General experimental issues

Personalized Medicine A new challenge for applied human pharmacology?

QIAGEN s NGS Solutions for Biomarkers NGS & Bioinformatics team QIAGEN (Suzhou) Translational Medicine Co.,Ltd

Introduction to Bioinformatics

Bilateral technical meeting between the EFSA Panel on Genetically Modified Organisms and AFSSA

Yoshiaki Uyama,, Ph.D. Office of New Drug III (PMDA)

Comp/Phys/Mtsc 715. Example Videos. Administrative 4/12/2012. Bioinformatics Visualization. Vis 2005, Bertram. Vis 2005, Cantarel(tighten.

Mayo Clinic Arizona/Rochester and U. Minnesota

The Major Function Of Rna Is To Carry Out The Genetic Instructions For Protein Synthesis

Transcription:

Gene expression connectivity mapping and its application to Cat-App Shu-Dong Zhang Northern Ireland Centre for Stratified Medicine University of Ulster

Outline TITLE OF THE PRESENTATION Gene expression connectivity mapping Initial success and further development Typical applications Application to Cat-App Discussion

Cat-App Workprogramme TITLE OF THE PRESENTATION

Work package 4b Connectivity mapping analysis 1. Similarity matrix of the ~ 160 Concawe UVCBs: establishing significant connections among the assayed UVCBs themselves. 2. Performing robustness analysis on the connections and groupings of Concawe substances obtained.

01 Introduction to gene expression connectivity mapping

The Lamb Connectivity Map Key components 1. Reference Profiles: A set of gene expression profiles, obtained from systematic microarray gene expression profiling. ( MCF7 and other cell lines, 164 bioactive small molecule compounds, 564 microarrays, 453 individual reference profiles) 2. Gene signature: a short list of important genes, selected by the researchers as a result of some experiments investigating a particular biological condition. 3. Connection score, defined as a function of a Reference Profile and a Gene Signature. It should reflect the underlying biological connection between them. J Lamb et al., Science 313, 1929 1935 (2006)

How to make a reference gene-expression profile An example: Cells: Compound: Dose: Duration: MCF7 (Breast cancer epithelial cell line) Estradiol 0.1 mu M 6 hours Estradiol MCF7 cells ----------------- mrna levels assayed by microarray(s) control MCF7 cells --------------- mrna levels assayed by microarray(s) expression ratios ----- reference profile 7

The context of Connectivity Map concept Using some form of molecular profile to characterize a biological state The basis of many practices in medicine Various biochemical tests measure the molecular profile of a patient To learn about the biological state of the patient (being with a particular disease or healthy). Connectivity Map concept is the generalization of molecular profile (bio-marker) measurement. The sheer number of different mrnas measured enables a much richer and complex description of the biological state than a few marker molecules could achieve. The specification of the mrna profiles as measured by high throughput technologies (microarrays, NGS) can provide an adequate description of the biological state.

02 Initial success and further development

Example: HDAC inhibitors Keith B. Glaser et al (2003), Molecular Cancer Therapeutics, Vol. 2, 151 163. Gene Expression Profiling of Multiple Histone Deacetylase (HDAC) Inhibitors Three Cell lines: T24 (bladder) MDA435 (breast carcinoma) MDA468 (breast carcinoma) Treated with three HDAC inhibitors: vorinostat MS 27 275 trichostatin A The result gene signature: ( > 2 fold, p< 0.01) 8 up regulated genes 5 down regulated genes

J Lamb et al., Science 313, 1929 1935 (2006) TITLE OF THE PRESENTATION

A simpler and more robust framework Zhang & Gant s Improvements upon the Lamb Connectivity Map: 1. A simpler, unified framework for ranking genes 2. More principled statistical testing procedures 3. Effective safeguard against false connections creeping in 4. Increased sensitivity in picking up real biological connections Reference Profiles: To treat up and down regulated genes in a unified framework, on a equal footing basis. To give a signed rank to each gene, with the most differentially expressed genes (either up or down) receiving the highest ranks. Connection strength: Zhang & Gant 2008, BMC Bioinformatics 2008, 9:258. Connection Strength Connection Score = Maximum possible connection strength Statistical testing procedures: The null hypothesis H 0 : R is a reference gene expression profile, s is a gene signature. The null hypothesis H 0 states that there is no underlying biological connection between R and s, and that s is merely a random gene signature. The calculation of p value: After observing a connection score between a reference profile R and a gene signature s, we generate a large number (10,000) of random gene signatures, and calculate the 10,000 connection scores. The proportion of scores higher than the observed score ( in absolute values) is the p value. 12

453 Ref profiles (164 compounds) 5 human cell lines ER agonists: Estradiol Alpha estradiol Genistein NDGA (nordihydroguaiaretic acid ) Estrogen Zhang & Gant 2008, BMC Bioinformatics 2008, 9:258. ER antagonists: Fulvestrant Vorinostat Tamoxifen Raloxifene Fujimoto et al (2004), Estrogenic activity of an antioxidant, nordihydroguaiaretic acid (NDGA), Life Sciences, 74, 1417 1425, where NDGA has been shown to have estrogenic activity and able to elicit an estrogen like response.

sscmap: A Java software for the connectivity mapping http://purl.oclc.org/net/sscmap Bundled with 6100 individual reference profiles, covering over 1000 compounds. The implementation of the improved framework Zhang & Gant 2009, BMC Bioinformatics 2009, 10:236.

Three fronts to push forward Methodological and algorithmic advancement Novel applications: phenotypic targeting and manipulation Software and Tools: High performance computing model: multi core, GPU, Cluster

Methodology and software development O'Reilly PG, Wen Q, Bankhead P, Dunne PD, McArt DG, McPherson S, Hamilton PW, Mills KI, Zhang SD. QUADrATiC: scalable gene expression connectivity mapping for repurposing FDA approved therapeutics. BMC Bioinformatics. 2016 May 4;17(1):198. McArt DG, Bankhead P, Dunne PD, Salto Tellez M, Hamilton P, Zhang SD. cudamap: a GPU accelerated program for gene expression connectivity mapping. BMC Bioinformatics. 2013 Oct 11;14:305. McArt DG, Dunne PD, Blayney JK, Salto Tellez M, Van Schaeybroeck S, Hamilton PW, Zhang SD. Connectivity Mapping for Candidate Therapeutics Identification Using Next Generation Sequencing RNA Seq Data. PLoS One. 2013 Jun 26;8(6):e66902. McArt DG, Zhang SD. Identification of candidate small molecule therapeutics to cancer by gene signature perturbation in connectivity mapping. PLoS One. 2011 Jan 31;6(1):e16382..

Connectivity mapping to LINCS FDA reference profiles TITLE OF THE PRESENTATION

03 Applications

The Principle of Phenotypic targeting Biological states can be adequately characterised by geneexpression profiles A phenotype (biological state) represented by localized points in the high dimensional gene expression space. Use connectivity mapping to identify compounds that can move a state point towards its desired locations.

Schematic view: a two gene world Gene 2 Gene 1

Schematic view: a two gene world Gene 2 Gene 1

Schematic view: a two gene world Gene 2 Gene 1

The general principles of phenotypic targeting: A Schematic view in a two gene world Gene 2 Gene 1

Recent data release from the Broad Institute LINCS project TITLE OF THE PRESENTATION

Gene signature constructions with three major types of analyses Differential expression analysis: typically two condition comparisons. Disease vs Normal, Resistant vs sensitive Co expression analysis: Closest correlates of key genes of interest; new players of biological processes or pathways. Survival analysis: Integrating clinical and genomics data for Prognostic markers identification.

Differential expression based analyses and applications: Disease vs Normal Ramsey et al 2013, Stem Cells. 31(7):1434 45.

Differential expression based analyses and applications: Disease vs Normal Liberante et al 2016, Oncotarget. 7(6):6609 19.

Co expression based analyses and applications: Seed genes Malcomson et al 2016, PNAS 113(26):E3725 34.

The utility of an effective connectivity map 1. Developing new biological hypotheses following the leads of novel connections 2. Chemical screening process, drug development pipeline 3. Identify novel pharmacological and toxicological properties of candidate compounds 4. Predictive toxicology 5. Discover new uses for old drugs 6. Extensible to other omics technologies, eg. Protein profiles, metabolite profiles.

04 Application to Cat-App

Apply connectivity mapping to Cat App 1. Connectivity mapping scores as similarity measure for the substances 2. Unsupervised clusters as newly discovered categories 3. Supervised classification to assign substances to known categories. 4. Connectivity mapping scores as slices into the ToxPi framework. 5. Per cell line connectivity scores as slices in a genomics only ToxPi.

Work package 4B Connectivity mapping analysis 1. Similarity matrix of the ~ 160 Concawe UVCBs: establishing significant connections among the assayed UVCBs themselves. (a) Creation of the expression profiles for the ~ 160 petroleum substances. Substance concentration Cell line time point (b) Clustering at the substance level or individual profile level (substance cell line concentration combinations)

Work package 4B Connectivity mapping analysis 2. Performing robustness analysis on the connections and groupings of Concawe substances obtained. Using a suitable public dataset to develop and test the algorithms and analysis pipeline. Applying the tested procedures to Cat App data.

Summary TITLE OF THE PRESENTATION Gene expression connectivity mapping Initial success and further development Typical applications Application to Cat-App Discussion

Boulevard du Souverain, 165 B 1160 Brussels Belgium T. +32 2 566 91 60 F. +32 2 566 91 81 www.concawe.eu

Work package 4B Connectivity mapping analysis 1. Similarity matrix of the ~ 160 Concawe UVCBs: establishing significant connections among the assayed UVCBs themselves. (a) Creation of the expression profiles for the ~ 160 petroleum substances. Substance concentration Cell line time point (b) Clustering at the substance level or individual profile level (substance cell line concentration combinations)

Work package 4B.1 Connectivity mapping analysis (a) Creation of the expression profiles for the ~ 160 petroleum substances. Factors: Substance, Cell line, Concentration, time point Input data: Processed and normalized gene expression data from WP4a Output: singed ranking of all genes assayed Methods: For individual dosage: ranking genes by log ratio and p value (if available) For multiple dosage (dose response data), ranking genes by their correlations with dosage, and statistical significance (p value); or by maximum log ratio across doses.

Work package 4B.1 Connectivity mapping analysis (b) Clustering at the substance level or individual profile level (substance cell lineconcentration combinations) Connectivity mapping scores used as a similarity measure For a given cell line, a similarity matrix is constructed for all the substances An overall similarity matrix is constructed by averaging across cell lines Clustering be performed using the overall similarity matrix

Work package 4B Connectivity mapping analysis 2. Performing robustness analysis on the connections and groupings of Concawe substances obtained. Using a suitable public dataset to develop and test the algorithms and analysis pipeline. Applying the tested procedures to Cat App data.

Work package 4B.2 Connectivity mapping analysis 2. Robustness analysis: suitable public dataset to use Gene expression profiling data. Perturbgen vs control design, ideally with replicates. Share a common set of transcripts measured.

Work package 4B.2 Connectivity mapping analysis 2. Robustness analysis: Methodology Simulation based approach Controlled levels of random noise Gene signature perturbations Effect of reduced gene expression space

Boulevard du Souverain, 165 B 1160 Brussels Belgium T. +32 2 566 91 60 F. +32 2 566 91 81 www.concawe.eu