Bayesian Designs for Early Phase Clinical Trials

1 Bayesian Designs for Early Phase Clinical Trials
Peter F. Thall, PhD
Department of Biostatistics, M.D. Anderson Cancer Center
Fourteenth International Kidney Cancer Symposium, Miami, Florida, November 6-7, 2015

2 Bayesian Statistics in a Nutshell
Bayesians consider model parameters, q, to be random and give them Prior Probability Distributions.
q = a treatment effect, covariate effect, median survival, etc. Typically, one cannot observe q, but its meaning is clear.
Bayes Theorem: [ Prior Knowledge About q ] + [ Data ] → [ Posterior of q ]
After data are observed, compute the posterior distribution of q using Bayes Theorem, and use it to make inferences.
Bayesian Learning: As new data are obtained sequentially in a clinical trial, the posterior at each stage is the prior for the next stage.
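The "Bayesian Learning" statement can be illustrated with a minimal sketch. It assumes a conjugate beta prior on q and binomial toxicity counts; the beta(1/2, 1/2) prior and the cohort data are hypothetical values chosen only for illustration. Updating cohort by cohort, with each posterior reused as the next prior, gives the same result as a single update on the pooled data.

```python
from fractions import Fraction

def update(prior, toxicities, patients):
    """Conjugate beta-binomial update: beta(a, b) prior -> beta posterior."""
    a, b = prior
    return (a + toxicities, b + patients - toxicities)

prior = (Fraction(1, 2), Fraction(1, 2))   # assumed beta(1/2, 1/2) prior
cohorts = [(0, 3), (1, 3), (2, 3)]         # hypothetical (toxicities, patients)

# Sequential learning: the posterior after each cohort is the next prior.
posterior = prior
for tox, n in cohorts:
    posterior = update(posterior, tox, n)

# One-shot update using all the data at once.
batch = update(prior,
               sum(t for t, _ in cohorts),
               sum(n for _, n in cohorts))
```

Both `posterior` and `batch` hold the same beta parameters, which is the sense in which the order of learning does not matter under this model.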

3 Bayesian Estimation: 95% Credible Intervals for q under 4 different beta distributions, all with mean 2/

4 Why Bayesian? A Very Simple Statistical Problem
q = Pr(Toxicity) of an experimental agent at a fixed dose. Observe X = [# toxicities] in n = 3 patients. How to estimate q?
Usual frequentist estimator: the sample mean (proportion, percent).
Estimate of q = [# toxicities] / [sample size] has 4 possible values: 0/3 = 0, 1/3, 2/3, 3/3 = 1 (0%, 33%, 67%, 100%).
But the estimate may not make sense:
Estimate = 0 says "I believe that toxicity is impossible."
Estimate = 1 says "I believe that toxicity is certain."
The usual textbook 95% CI for q is [0, 0] if X = 0 and [1, 1] if X = 3.
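A small sketch of why the textbook interval collapses. It assumes that "the usual textbook 95% CI" means the normal-approximation (Wald) interval, which the slide does not state explicitly; the function name is hypothetical.

```python
import math

def wald_ci(x, n, z=1.96):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    # Clip to [0, 1], since a probability cannot fall outside that range.
    return (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))

for x in range(4):
    print(x, wald_ci(x, 3))
```

When X = 0 or X = 3 the estimated variance is zero, so the interval has zero width no matter how small the sample is.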

5 A Simple Bayesian Solution
q = Pr(Toxicity) is considered to be random. Assume a non-informative prior on q with prior mean .50.
Posterior mean of q given X = # toxicities in 3 patients is
(.50) × ¼ + (X / 3) × ¾
that is, a weighted average of the Prior Mean (weight ¼) and the Sample Mean (weight ¾).
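The weights ¼ and ¾ are what a beta(a, b) prior with a + b = 1 and mean .50 produces when n = 3. The sketch below assumes the prior is beta(1/2, 1/2), which is consistent with the formula but is not named on this slide; the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def posterior_mean(x, n, a=HALF, b=HALF):
    """Posterior mean of q under a beta(a, b) prior and x toxicities in n."""
    return (a + x) / (a + b + n)

def weighted_form(x, n, prior_mean=HALF, prior_weight=Fraction(1, 4)):
    """Same quantity written as prior_weight * prior mean + rest * sample mean."""
    return prior_mean * prior_weight + Fraction(x, n) * (1 - prior_weight)

for x in range(4):
    print(x, posterior_mean(x, 3), weighted_form(x, 3))
```

Under this assumed prior the posterior mean is never exactly 0 or 1, so it avoids the "impossible" and "certain" conclusions that the sample mean gives at X = 0 and X = 3.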

6 Frequentist versus Bayesian Estimation
[Table columns: Observed Number of Toxicities; Sample Mean; Posterior Mean of q; Posterior 95% Credible Interval* for q]
* Prob[ L < q < U | data ] = .95

7 Commonly Used 3+3 Algorithms
Implicitly target a dose with Pr(Toxicity) = either .17 or .33
# Patients with Toxicity: Decision
0/3: Escalate one level
1/3: Treat 3 more at current level
0/3 + 0/3 (to get here, a de-escalation rule must have been applied at the next higher dose level): Stop & choose current level as MTD
1/3 + 0/3: Escalate one level unless a de-escalation rule was applied at next higher level, in which case choose current level as MTD
1/3 + 1/3: Stop & choose previous level as MTD (3+3 A). Stop & choose current level as MTD (3+3 B) unless previous level has only 3 patients, in which case treat 3 more at previous level
1/3 + { 2/3 or 3/3 }: Stop & choose previous level as MTD unless previous level has only 3 patients, in which case treat 3 more at previous level
2/3 or 3/3: Stop & choose previous level as MTD unless previous level has only 3 patients, in which case treat 3 more at previous level

8 Typical Data from a Phase I Trial after 3+3 Algorithm
[Table columns: Dose (mg/m²); #Toxicities / #Patients; Posterior 95% Credible Interval]
Usual claim: "The MTD is 200 mg/m²." Really?
At the end of phase I, you know almost nothing about Pr(Toxicity | dose):
1) A 95% CI for Pr(Tox | 200) at the MTD runs from .01 to .52.
2) Toxicity type and severity level are ignored.
3) Efficacy is ignored. What if Pr(response | d=300) = .50 and Pr(response | d=200) = .25?

9 How a Bayesian sees Pr(Toxicity) : Distributions that vary with dose

10 Continual Reassessment Method (CRM)
O'Quigley, Pepe, Fisher, plus dozens of later "Me too!" papers
The CRM is a model-based Bayesian method that chooses doses sequentially for successive cohorts of patients in a phase I clinical trial.
It requires the physicians to specify:
1) a definition of toxicity
2) a fixed target p* for Prob(toxicity | dose). Usually, p* = .20, .25, .30, or .33.
For each new cohort, the CRM chooses the dose with posterior expected Prob(toxicity | dose) closest to p*.
Additional Safety Rules that we always use at MDACC:
Do not skip an untried dose when escalating.
If the lowest dose is too toxic, stop the trial.
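The dose-choice step can be sketched as follows. This is an illustrative sketch, not the MDACC implementation: it assumes a one-parameter power model Pr(tox | dose d) = skeleton[d] ** exp(a) with a normal prior on a, integrated on a grid. The skeleton values, sigma, and the function name are all hypothetical. The "do not skip an untried dose" rule is included; the stopping rule for an overly toxic lowest dose is omitted.

```python
import math

def crm_next_dose(skeleton, data, target, sigma=1.34, span=4.0, grid=801):
    """Return (next dose index, posterior mean Pr(toxicity) per dose).

    skeleton: assumed prior guesses of Pr(toxicity) at each dose, increasing.
    data:     list of (dose_index, toxicity) pairs with toxicity in {0, 1}.
    Model (an assumption): Pr(tox | dose d) = skeleton[d] ** exp(a),
    with prior a ~ Normal(0, sigma^2), integrated on a grid over [-span, span].
    """
    step = 2 * span / (grid - 1)
    a_grid = [-span + i * step for i in range(grid)]
    log_skel = [math.log(s) for s in skeleton]

    # Unnormalized posterior weight of each grid value of a.
    weights = []
    for a in a_grid:
        log_w = -0.5 * (a / sigma) ** 2
        for d, tox in data:
            log_p = math.exp(a) * log_skel[d]
            log_w += log_p if tox else math.log1p(-math.exp(log_p))
        weights.append(math.exp(log_w))
    total = sum(weights)

    # Posterior expected Pr(toxicity | dose) for every dose.
    post_mean = [
        sum(w * math.exp(math.exp(a) * ls) for w, a in zip(weights, a_grid)) / total
        for ls in log_skel
    ]
    best = min(range(len(skeleton)), key=lambda d: abs(post_mean[d] - target))

    # Safety rule: never skip an untried dose when escalating.
    highest_tried = max((d for d, _ in data), default=-1)
    return min(best, highest_tried + 1), post_mean

skeleton = [0.05, 0.12, 0.25, 0.40]   # hypothetical prior guesses
data = [(0, 0), (0, 0), (0, 0)]       # first cohort: no toxicities at dose 0
next_dose, post = crm_next_dose(skeleton, data, target=0.25)
```

With only the lowest dose tried, the no-skipping rule caps the recommendation at the second dose level even if the model favors a higher one.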

11 Properties of the CRM
1) Assumes toxicity is binary (like the 3+3)
2) Requires a fixed target Ptox (unlike the 3+3)
3) It explicitly assumes Pr(Tox | dose) increases with dose.
4) More work to implement than 3+3: A computer program is required for simulations and trial conduct

12 Computer Simulations of 3+3 vs CRM with target .25 for Pr(toxicity)
Simulated trials of each method for each assumed dose-toxicity curve

13 Selection Percentages Under C
algorithm A selects no MTD ~ 10% of the time
[Figure: selection percentages by dose for CRM, 3+3 A, 3+3 B, and None, with the true Prob(Toxicity) at each dose]

14 Selection Percentages Under C
A is much more likely to select an ineffective dose as the MTD
B is much more likely to select an unsafe dose as the MTD
[Figure: selection percentages by dose for CRM, 3+3 A, 3+3 B, and None, with the true Prob(Toxicity) at each dose]

15 Two Outcomes: Toxicity and Efficacy in a phase I-II trial
p T (dose) and p E (dose) are Random Quantities that vary with dose
[Figure: posterior distributions of p T (dose) and of p E (dose) for 4 dose levels (Dose = 1, 2, 3, 4), based on data from 12 patients; axes Prob(Toxicity | dose) and Prob(Efficacy | dose)]

16 Phase I-II Trial Designs based on Efficacy and Toxicity
For each successive cohort, adaptively optimize
dose of one agent
(dose 1, dose 2) of a two-agent combination
(dose, schedule)
Efficacy - Toxicity Trade-Offs
[Figure residue: Pr(Toxicity) versus Pr(Efficacy); toxicity levels Mild, Moderate, High, Severe; efficacy score up to 3]

17 Bayesian Utility-Based Dose-Finding in Pediatric Brain Tumors Diffuse Intrinsic Pontine Gliomas (DIPGs) Very aggressive pediatric brain tumors, median age = 5 years No effective treatment exists, median survival < 1 year. Radiation Therapy (RT) is standard treatment, mainly palliative. RT dose-toxicity and dose-efficacy profiles not understood. Goal: Study three RT doses, given serially per a fixed fractionation schedule

18 Bayesian Utility-Based Dose-Finding in Pediatric Brain Tumors
Toxicity = Low, Moderate, High, or Severe
Efficacy = Total number of improvements in
(i) Clinical Symptoms
(ii) Radiographic Appearance of the Tumor
(iii) Quality of Life
Possible Efficacy values = 0, 1, 2, or 3
(Toxicity, Efficacy) scored at day 42

19 Elicited Joint Outcome Utilities Toxicity Efficacy

20 Some Properties of the Utilities
[Table: utilities by Efficacy Score and Toxicity Severity (Low, Moderate, High, Severe)]
Question: Why not use DLT = {High, Severe} and apply a usual dose-finding method (e.g. 3+3 or CRM)?
Answer: U(0, Moderate) = U(3, High) = 25. Scoring these two outcomes as "No DLT" and "DLT" makes no sense!

21 Conduct of the Radiation Therapy Trial
1) Accrual rate = 6 to 10 patients/year
2) N = 30 children maximum, cohorts of size 3
3) Treat the first cohort of 3 patients at the lowest dose, then apply the adaptive utility-based criterion.
4) Do not skip the middle dose when escalating.
5) A dose is unacceptably toxic if it is likely to have Pr(High or Severe toxicity) > 10%

22 Computer Simulations: Operating Characteristics of RT Trial Design

23 Computer Simulations: Operating Characteristics of RT Trial Design

24 Which of these two doses, in terms of their (p E, p T ) pairs, is more desirable?

25 Efficacy-Toxicity Trade-Offs: Which of these two doses, in terms of their (p E, p T ) pairs, is more desirable?

26 Bayesian Design to Optimize Lenalidomide Dose in Myeloma Patients for Autologous Stem Cell Transplant
Lenalidomide studied at doses { 25, 50, 75, 100 } mg/m² on each of days -8, -7, ..., -2 before transplant + fixed dose of IV melphalan as preparative regimen
Toxicity = Regimen-related death, graft failure, or grade 3,4 atrial fibrillation, deep venous thrombosis, or pulmonary embolism within 30 days post transplant
Efficacy = Alive and in Remission at day 30 post transplant
.20 = Upper Limit on p T (x), .15 = Lower Limit on p E (x)
N max = 60, cohort size = 3, first cohort treated at 25 mg/m²

27 Lenalidomide Autologous SCT Trial: Bayesian Design Simulated Under Scenario 1
[Table: Pr(Toxicity), Pr(Response), Desirability, and % Selected for Dose = 25, 50, 75, 100]

28 Lenalidomide Autologous SCT Trial: Bayesian Design Simulated Under Scenario 2
[Table: Pr(Toxicity), Pr(Response), Desirability, and % Selected for Dose = 25, 50, 75, 100]

29 Lenalidomide Autologous SCT Trial: Bayesian Design Simulated Under Scenario 3
[Table: Pr(Toxicity), Pr(Response), Desirability, and % Selected for Dose = 25, 50, 75, 100]

30 Phase II: Misinterpreting the Simon Design (Ratain and Karrison, 2007)
Suppose standard treatment has Pr(response) = p 0 = .05.
Three Simon designs with α = .05, β = .10:
Design 1: Target p a = .20. Reject p 0 = .05 if you observe > 5/41 responses (12.2%)
Design 2: Target p a = .30. Reject p 0 = .05 if you observe > 3/17 responses (17.6%)
But both designs specify an empirical response rate to reject p 0 = .05 that is smaller than .20. What is going on?

31 Misinterpreting the Simon Design (Ratain and Karrison, 2007)
In a test of hypotheses, one can only accept or reject the null, but "reject the null" does not mean "accept the alternative."
If X/n > cut-off, one may conclude p > .05, not p = p a.
The target p a = .20 or .30 is a straw man. It is just a device to compute sample size.

32 Simple Bayesian Analyses
Assuming a non-informative beta(½,½) prior
Design 1: 5/41 (12.2%) responses. Pr(p > .20 | data) = .10. The 95% CI for p includes both p 0 = .05 and p a = .20.
Design 2: 3/17 (17.6%) responses. Pr(p > .30 | data) = .13. The 95% CI for p includes p a = .30.
From a Bayesian viewpoint, neither of these data sets gives convincing evidence that the alternative hypothesis is true!
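The two posterior probabilities can be checked numerically. With a beta(½,½) prior and x responses in n patients, the posterior is beta(½ + x, ½ + n - x). The sketch below uses only the standard library and a simple midpoint-rule integral, which is an implementation choice made here for self-containment; the function names are hypothetical.

```python
import math

def beta_tail(a, b, cut, steps=20000):
    """Pr(p > cut) for p ~ beta(a, b), by midpoint-rule integration on (cut, 1).

    Adequate when b >= 1, so the density is bounded near p = 1.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = (1.0 - cut) / steps
    total = 0.0
    for i in range(steps):
        p = cut + (i + 0.5) * h
        total += math.exp(log_norm
                          + (a - 1) * math.log(p)
                          + (b - 1) * math.log(1 - p))
    return total * h

def posterior_tail(x, n, cut, a0=0.5, b0=0.5):
    """Pr(p > cut | x responses in n) under a beta(a0, b0) prior."""
    return beta_tail(a0 + x, b0 + n - x, cut)

design_1 = posterior_tail(5, 41, 0.20)   # slide reports .10
design_2 = posterior_tail(3, 17, 0.30)   # slide reports .13
print(round(design_1, 2), round(design_2, 2))
```

In both cases the data sit exactly at the rejection boundary of the frequentist design, yet the posterior probability that p exceeds the design's target is small.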

33 Bayesian Decision Rules for Target p = .30
1) Assume a Bayesian model, with prior p ~ beta(.30, .70) (uninformative)
2) Monitor more frequently: Stop if Pr( p > .30 | data ) < .015
Pr(p > .30 | 0/6) = .016 → Continue
Pr(p > .30 | 0/7) = .010 → STOP
Pr(p > .30 | 1/13) = .019 → Continue
Pr(p > .30 | 1/14) = .014 → STOP
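The four decisions above can be reproduced with a short sketch. It takes the beta(.30, .70) prior and the .015 cut-off from the slide; the numerical integration and the function names are assumptions made here to keep the example free of third-party libraries.

```python
import math

def beta_tail(a, b, cut, steps=20000):
    """Pr(p > cut) for p ~ beta(a, b), by midpoint-rule integration on (cut, 1).

    Requires cut > 0 when a < 1 and b >= 1, so the integrand stays bounded.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = (1.0 - cut) / steps
    total = 0.0
    for i in range(steps):
        p = cut + (i + 0.5) * h
        total += math.exp(log_norm
                          + (a - 1) * math.log(p)
                          + (b - 1) * math.log(1 - p))
    return total * h

def decision(responses, patients, target=0.30, cutoff=0.015,
             prior=(0.30, 0.70)):
    """Apply the stopping rule: stop if Pr(p > target | data) < cutoff."""
    a = prior[0] + responses
    b = prior[1] + patients - responses
    prob = beta_tail(a, b, target)
    return ("STOP" if prob < cutoff else "Continue"), prob

for x, n in [(0, 6), (0, 7), (1, 13), (1, 14)]:
    print(f"{x}/{n}:", decision(x, n))
```

Read as a boundary, the rule says: stop after 7 patients with no responses, or after 14 patients with only one response.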

34 Bayesian Hierarchical Models to Borrow Strength
Problem: Design a phase II trial to evaluate p = Pr(Tumor Response) with a new Magic Molecule in 10 different sarcoma subtypes (Thall et al. 2003)
Approach 1: Assume the 10 subtypes have the same p and conduct one trial with one futility stopping rule. What if the subtypes have different Pr(Tumor Response)?
Approach 2: Assume the subtypes have different p 1, ..., p 10, and conduct 10 trials, each with its own stopping rule. Are the 10 subtypes really independent? Is conducting 10 trials feasible? What about rare subtypes?

36 Bayesian Hierarchical Model (Also Useful for Meta-Analysis)
[Diagram, top to bottom: Response data in S 1, S 2, ..., S 10; response probabilities p 1, p 2, ..., p 10; Prior on p 1, Prior on p 2, ..., Prior on p 10; Hyper Prior]

37 Posteriors for 5 Response Probabilities
[Figure panels: N=50 and N=150; Independent, Identical Priors versus Highly Informative Hierarchical Prior]

38 N= N=150 Independent, Identical Priors Posteriors for 5 Response Probabilities Hierarchical Prior, Moderately Informative

39 A Futile Futility Rule
Response = [PFS > 12 months]
The Rule: Stop early if < 23/39 responses
Assumed accrual rate = 10 patients per year
[Table columns: Months; # accrued; # evaluated; # observed responses; Decision. Decisions: GO at each of five looks, then Accrual Done, then STOP]
By the time the futility stopping rule can be applied, all 39 patients have been accrued.

40 A Simple Bayesian Solution (Thall et al., 2006)
Monitor PFS time
Bayesian Stopping Rule based on possibly right-censored PFS time data: Stop an arm if it is unlikely that its median PFS time is at least 11 months longer than the null median, corresponding to 50% one-year PFS.
Apply the rule either continuously, or monthly.
[Figure: Pr(PFS > 12 months) versus Median PFS (Months), with Undesirable and Desirable regions marked]

41 Operating Characteristics of the Bayesian Stopping Rule
[Figure: mean number of patients (N) and probability of early stopping (Prob) versus true median progression-free survival time; labeled medians include 13.9, 16.3, 19.3, and 23.3 months]

42 General Conclusions
Bayesian Statistics provides a practical basis for design and conduct of complex clinical trials.
Computer Simulation is an essential tool for calibrating design parameters.
Major Caveat: Developing statistical models, methods, and computer programs is extremely labor-intensive and time-consuming.