Informatics challenges in data management, high-throughput screening and development of predictive models of ADME properties

1 Informatics challenges in data management, high-throughput screening and development of predictive models of ADME properties
Peter Gund (1), Janet Cohen (1), William J. Egan (1,2), Osman F. Güner (3), and Kirk McMillan (1)
(1) Pharmacopeia Laboratories, Princeton NJ
(2) Center for Informatics and Drug Discovery, Princeton NJ
(3) Molecular Simulations, San Diego CA

2 The importance of ADME properties
- 9 out of 10 clinical candidates fail!
  - 40% due to pharmacokinetics (ADME) problems
  - 10% due to animal toxicity
  - 10% due to side effects (human toxicity)
  - Source: Prentis, et al., Br. J. Clin. Pharmac. 1988, 25,
- Cost of new drug approval: $ million
  - Including cost of failed drugs
  - Est. $70 million cost for ADME/Tox failures
  - Source: Pharma. Exec. Jan 2000; Windhover Information; Prentis

3 Drug Discovery and Development
- A preclinical candidate has a 10% chance of becoming a drug
[Figure: Typical success rates at each step of drug discovery and development. Stages: Lead Discovery, Lead Optimization, Candidate Selection, Preclinical Development, Clinical Development. Milestones: Hit, Lead, Preclinical Candidate, Clinical Candidate, NDA, Drug. Success rates shown: < 1%, 0.1-1%, 1-10%, 50%, 20%, 70+%. Typical discovery and development costs shown: $120MM, $60MM, $180MM (preclinical, clinical).]

4 Why test a candidate with poor ADME properties in the clinic?
- Difficult to predict ADME properties in humans
- ADME properties are variable, poorly understood
- Example: Bioavailability (e.g., as measured by drug blood levels) is due to a combination of effects:
  - Absorption: variable; depends on route of delivery, formulation, patient's physiology
  - Distribution: variable for different patients
  - Metabolism: different patients have different enzyme levels and variants
  - Excretion: great variation among patients

5 Other properties contribute
- Water solubility
  - Different from absorption (but partially correlated)
  - Measurement a bit tricky: pH, salt, cosolvent effects
- Distribution to brain (through the blood-brain barrier)
- Toxicity
  - Varies with individuals and with species
  - Some but not all mechanisms are understood
- Specificity
  - Which of thousands of other enzymes/receptors are hit?

6 How can we improve the discovery and development success rate?
- Identify/eliminate early candidates that will ultimately fail!
- Have better ADME/Tox prediction tools for selection of candidates
- Optimize ADME/Tox properties in parallel with potency
- Discover hits and leads with better ADME/Tox properties

7 The challenge: Use ADME/Tox data earlier in drug discovery
- Lead Discovery
  - Fast ADME/Tox descriptor calcs
  - Approximate models/guidelines
  - Set up assays
  - Design drug-like libraries
  - Acquire drug-like samples
  - Run assays
  - Select most promising leads
- Lead Optimization
  - More accurate descriptors
  - More complex models
  - Verify target relevance
  - Design drug-like, bioavailable analogs
  - Optimize affinity
  - Obtain selectivity
- Candidate Selection
  - Optimal experimental and computational models
  - Secondary assays
  - Design drug-like analogs
  - Achieve safety, efficacy, bioavailability, etc.

8 Experimental vs. computational models for ADME and toxicity
- Experimental Models
  - Clinical: performance in healthy and sick humans
  - Preclinical: performance in manipulated animals
  - Discovery: performance in tissue, organ, egg, etc., in vivo systems
  - Screening: performance in in vitro isolated systems
- Computational Models
  - Receptor-ligand models
  - Reactivity models
  - Transport models
  - (Sub)structure-activity, property-activity models
  - Appropriate descriptors
  - Statistical/physical models
- Hybrid models
- To be successful, we must improve the accuracy of both models!

9 Can we predict ADME properties from computer models?
- Absorption
  - Human absorption bivariate model (CIDD)
  - Talk: COMP44 - W.J. Egan, K.M. Merz, Mon. 4:15 pm
- Aqueous Solubility
  - QSPR model (CIDD)
  - Talk: COMP137 - A. Cheng, K. M. Merz, Wed. 11:15 am
- Distribution, Metabolism, Toxicity
  - Plasma binding model (CIDD)
  - Blood-brain barrier model (CIDD)
  - Etc.

10 Egan bivariate absorption model
[Figure legend: compounds >90% absorbed; actively transported compounds >90% absorbed; compounds <30% absorbed; PCOP compounds (>100 nm/s Caco-2); PCOP compounds (<34 nm/s Caco-2)]
Egan, W. J., Merz, K. M., and Baldwin, J. J., Prediction of Drug Absorption Using Multivariate Statistics. J. Med. Chem. 2000, in press
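As a rough illustration of what a bivariate classification model can look like, the sketch below fits a confidence ellipse to two descriptors of well-absorbed training compounds and checks whether a new compound falls inside it. The two descriptors, the toy values, and the 95% chi-square threshold are assumptions made for illustration; they are not the descriptors or parameters of the model cited on this slide.

```python
import numpy as np

def fit_ellipse(points):
    """Return the mean and inverse covariance of a 2-D descriptor cloud."""
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)  # 2 x 2 sample covariance
    return mean, np.linalg.inv(cov)

def inside_ellipse(x, mean, inv_cov, threshold=5.991):
    """True if the squared Mahalanobis distance is within the threshold.

    5.991 is the 95% chi-square quantile for 2 degrees of freedom
    (an assumed cutoff, not one taken from the slide).
    """
    d = np.asarray(x, dtype=float) - mean
    return float(d @ inv_cov @ d) <= threshold

# Hypothetical (descriptor_1, descriptor_2) values for well-absorbed compounds
well_absorbed = [(60, 2.0), (80, 3.0), (50, 1.0), (90, 2.5), (70, 3.5), (40, 2.2)]
mean, inv_cov = fit_ellipse(well_absorbed)

print(inside_ellipse((65, 2.4), mean, inv_cov))   # near the centre of the cloud
print(inside_ellipse((300, -5.0), mean, inv_cov)) # far outside the cloud
```

A compound inside the ellipse would be predicted as well absorbed under these assumptions; one outside would be flagged for closer inspection.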

11 Performance of CIDD blood-brain penetration model
- Test Set Results
- Model r2 = 0.86, RMSE = 0.27, RMSEP =
[Figure: predicted log BB vs. log BB]
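For readers unfamiliar with these statistics, the sketch below shows one common way to compute them. It assumes r2 is the coefficient of determination and that RMSEP is simply the RMSE evaluated on a held-out test set; the slide does not state which definitions were used, and the log BB values here are invented.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented log BB values for a small held-out test set
observed = [-1.0, -0.5, 0.0, 0.5, 1.0]
predicted = [-0.8, -0.6, 0.1, 0.4, 1.1]

rmsep = rmse(observed, predicted)  # RMSE on the test set
r2 = r_squared(observed, predicted)
print(round(rmsep, 3), round(r2, 3))
```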

12 How do we build better models?
- Need high quality data
  - Literature
  - Experimentally measured
- Need well designed datasets
- Need good access to the resulting data
  - By structure, by property ranges
  - Use advanced data visualization methods

13 ADME/Tox data sources
- Literature - requires culling
  - Highly variable results, poorly described
  - Some databases available, e.g., Metabolism (Synopsys)
- Need for consistent assays
  - Pharmacopeia Laboratories experiments

14 Pharmacopeia Labs ADME/Tox assays
- Absorption
  - Caco-2 Cell Permeability
  - P-glycoprotein Efflux
- Metabolic Stability
  - Microsomal Oxidation
  - CYP450 mediated, isoforms identification
- Solubility
- Drug Interaction
  - Cytochrome P450 Inhibition
- Others under Development

15 Use of experimental design of a dataset to derive a predictive model
[Figure: two panels in chemical space, "A well-designed dataset" and "A poorly designed dataset", each plotted on Var 1 (min to max) vs. Var 2 (min to max)]
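One simple way to quantify the difference between the two panels is to measure how much of the descriptor space a dataset occupies. The sketch below divides the Var 1 / Var 2 plane into a grid and reports the fraction of cells containing at least one compound. The grid size, the function name, and the sample points are illustrative assumptions, not the design method used by the authors.

```python
def grid_coverage(points, var1_range, var2_range, bins=4):
    """Fraction of cells in a bins x bins grid holding at least one point.

    Points are assumed to lie within the given ranges.
    """
    (lo1, hi1), (lo2, hi2) = var1_range, var2_range
    occupied = set()
    for v1, v2 in points:
        # Map each value to a cell index; clamp the upper edge into the last cell
        i = min(int((v1 - lo1) / (hi1 - lo1) * bins), bins - 1)
        j = min(int((v2 - lo2) / (hi2 - lo2) * bins), bins - 1)
        occupied.add((i, j))
    return len(occupied) / (bins * bins)

centres = (0.125, 0.375, 0.625, 0.875)
well_designed = [(x, y) for x in centres for y in centres]  # spread over the space
poorly_designed = [(0.10, 0.10), (0.12, 0.15), (0.20, 0.20), (0.15, 0.05)]  # one cluster

print(grid_coverage(well_designed, (0, 1), (0, 1)))
print(grid_coverage(poorly_designed, (0, 1), (0, 1)))
```

A model trained on the clustered set would have to extrapolate across most of the space, which is the risk the slide illustrates.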

16 Providing access to the data: Pharmacopeia Information Environment (PIE)
- For Combinatorial Library Members (6.5 million)
  - Store computed descriptors, predicted properties
- For Virtual Combinatorial Libraries
  - Compute predicted properties for each member
  - Compute property contributions for each component
- For Discrete Compounds
  - Store computed descriptors
  - Store observed and predicted properties
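One plausible reading of "property contributions for each component" is that, for a property that is approximately additive, each building block is scored once and member values are obtained by summing. The sketch below enumerates a tiny virtual library that way. The additivity assumption, the component names, and the contribution values are all hypothetical; the slide does not describe how PIE performs this calculation.

```python
from itertools import product

# Hypothetical per-component contributions to an additive property
scaffold = {"S1": 1.2}
r1 = {"A": 0.5, "B": -0.3}
r2 = {"X": 0.8, "Y": 0.1, "Z": -0.6}

def enumerate_library(*component_sets):
    """Yield (component names, summed property) for every combination."""
    for combo in product(*(cs.items() for cs in component_sets)):
        names = tuple(name for name, _ in combo)
        total = sum(value for _, value in combo)
        yield names, total

library = dict(enumerate_library(scaffold, r1, r2))
print(len(library))                # 1 x 2 x 3 members
print(library[("S1", "A", "X")])   # 1.2 + 0.5 + 0.8
```

Scoring components rather than products keeps the cost proportional to the number of building blocks instead of the number of library members.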

17 Some ADME datafields in PIE
- Assay Name
- Target Name
- Target Type
- Protocol/version
- Data Type (e.g., P_app)
- Data Unit (e.g., nm/sec)
- Data Value (numeric)
- Text (e.g., inactive)
- Property measured or computed
- AvgData (average value)
- StdErr (standard error)
- N (no. of observations)
- Sample identifier
- Chemical Structure
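To make the summary fields concrete, the sketch below shows how AvgData, StdErr, and N could be derived from replicate measurements. The record layout and function name are hypothetical and cover only a subset of the fields listed; this is not the PIE schema. StdErr is assumed to be the sample standard deviation divided by the square root of N.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev

@dataclass
class AssaySummary:
    """Hypothetical record holding a subset of the listed datafields."""
    sample_id: str
    assay_name: str
    data_type: str
    data_unit: str
    avg_data: float
    std_err: float
    n: int

def summarize(sample_id, assay_name, data_type, data_unit, values):
    """Collapse replicate measurements into AvgData, StdErr and N."""
    n = len(values)
    # Standard error is undefined for a single observation
    std_err = stdev(values) / sqrt(n) if n > 1 else float("nan")
    return AssaySummary(sample_id, assay_name, data_type, data_unit,
                        mean(values), std_err, n)

# Invented Caco-2 permeability replicates in nm/sec
record = summarize("SAMPLE-001", "Caco-2", "P_app", "nm/sec", [100.0, 110.0, 120.0])
print(record)
```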

18 Derivation of models from the data
- Use published descriptors and methods where possible
  - Example: Cerius2 QSAR descriptors and methods used to derive CIDD aqueous solubility model
  - Talk: COMP137 - A. Cheng, K. M. Merz, Wed. 11:15 am
- Develop new descriptors or methods where necessary
  - Example: novel descriptors, statistics developed for CIDD human absorption model
  - Talk: COMP44 - W.J. Egan, K.M. Merz, Mon. 4:15 pm

19 Uses of ADME data in drug discovery at Pharmacopeia
- Creating drug-like sample libraries
- Selecting from virtual libraries
- Optimizing a lead
- Selecting a candidate

20 Predicted absorption of members of a proposed combinatorial library
[Figure: Spotfire output]

21 Examining Structure and Properties of Individual Library Members

22 Absorption Profile of the Pharmacopeia Lead Discovery Services Collection
- 3.3 million compounds, 64 libraries
- Absorption classes: Best 43%, Good 21%, Mediocre 23%, Ugly 13%
- 64% should be moderately to well-absorbed
- 87% should be good leads
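The two summary figures follow from the class percentages by simple arithmetic, as the short check below shows. Reading 64% as Best plus Good, and 87% as everything except Ugly, is an interpretation of the slide rather than something it states explicitly.

```python
# Class percentages as given on the slide
profile = {"Best": 43, "Good": 21, "Mediocre": 23, "Ugly": 13}

# Assumed grouping: Best + Good are "moderately to well-absorbed"
moderately_to_well = profile["Best"] + profile["Good"]

# Assumed grouping: every class except Ugly counts as a "good lead"
good_leads = sum(pct for cls, pct in profile.items() if cls != "Ugly")

print(moderately_to_well, good_leads)
```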

23 Optimizing affinity and absorbability
[Figure: Affinity → plotted against Project timeline →; points labeled by absorbability class (ABSCLS 0, ABSCLS 1, ABSCLS 2, ABSCLS 3)]

24 Use of ADME/Tox data: Conclusions
- Database of results useful for SAR, design
- Descriptors, predicted properties stored in PIE
- Better quality models are being generated
- Models successfully used in drug discovery
  - Design of drug-like combinatorial libraries
  - Able to find better screening hits
  - Parallel optimization of affinity and absorbability for lead compounds
  - Selection of best candidate for development

25 Acknowledgements
- Center for Informatics and Drug Discovery
  - Ken Merz, Ailan Cheng, I-Ping Cheng, George Lauri, Wenguang Zheng
- Pharmacopeia Laboratories
  - Jack Baldwin, Tau Guo, Doug Hobbs, Anfan Wu