Efficacy, Safety and Futility Stopping Boundaries


1 Efficacy, Safety and Futility Stopping Boundaries
ExL Pharma Workshop, Philadelphia, PA, Feb 25-26, 2007. Cyrus R. Mehta, President, Cytel Inc.
ExL Pharma. Feb 25-26, Philadelphia

2 Contents of the Talk
Three real examples with early stopping boundaries: efficacy boundaries alone (CHARM trial), efficacy and safety boundaries (CRASH trial), and efficacy and futility boundaries (COMET trial).
Acknowledgement: I thank Professor Stuart Pocock for providing me with these examples.

3 1. Tough Efficacy Boundaries: CHARM Trial
Candesartan vs. placebo for reducing mortality in heart failure patients (CHARM). American Heart Journal (Pocock, 2005). The primary endpoint is all-cause mortality. The design requires 85% power to detect a 14% reduction in annual mortality from 8% in the placebo group, with a 2-sided level-0.05 test.

4 How Many Events Needed?
The number of events needed to achieve power $1 - \beta$ is
$$D = 4\left[\frac{z_{\alpha/2} + z_{\beta}}{\ln(\mathrm{HR})}\right]^2 = 1570.$$
The investigators want a 4-year study: enroll for 2 years, then follow up for 2 additional years after the last patient is enrolled. What sample size will produce 1570 events in 4 years?
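The events formula can be evaluated in a few lines. This is a minimal sketch, assuming the 14% reduction is read directly as a hazard ratio of 0.86 (the slides do not say how the hazard ratio was derived); the function name `events_needed` is illustrative. Under that assumption the result lands within about 1% of the 1570 quoted on the slide, and the exact figure depends on how the mortality reduction is converted to a hazard ratio.

```python
from math import log
from statistics import NormalDist

def events_needed(hr, alpha=0.05, power=0.85):
    """Events for a two-sided level-alpha test with 1:1 allocation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # upper alpha/2 normal quantile
    z_beta = z(power)            # upper beta quantile, with beta = 1 - power
    return 4 * ((z_alpha + z_beta) / log(hr)) ** 2

# Assumption: a 14% reduction in mortality corresponds to HR = 0.86.
d = events_needed(hr=0.86)
print(round(d))
```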

5 Sample Size Calculation
Patients enroll at the rate of $A$ per month for $S_a$ months and are followed for an additional $S_f$ months. The accrual period runs from 0 to $S_a$; accrual plus follow-up runs from 0 to $S_a + S_f$. For exponential survival with hazard rate $\lambda$, the expected number of failures by calendar time $l$ is
$$D_\lambda(l) = \begin{cases} A\left(l - \dfrac{1 - e^{-\lambda l}}{\lambda}\right) & \text{for } l \le S_a, \\[1ex] A\left\{S_a - \dfrac{e^{-\lambda l}}{\lambda}\left(e^{\lambda S_a} - 1\right)\right\} & \text{for } l > S_a. \end{cases}$$

6 $S_a = 24$, $S_a + S_f = 48$. Find the accrual rate $A$ such that $D_{\lambda_e}(48) + D_{\lambda_c}(48) = 1570$. We must enroll $A = 317$ subjects/month for 24 months. Sample size $N = 24 \times 317 = 7608$.
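The accrual-rate calculation can be sketched as follows. Two assumptions are not stated on the slides: annual mortality $p$ is converted to a monthly hazard as $-\ln(1 - p)/12$, and the total accrual rate $A$ is split equally between the two arms. Under those assumptions the solution comes out close to the 317 subjects per month quoted above. The function name `expected_events` is illustrative.

```python
from math import exp, log

def expected_events(rate, lam, l, s_a):
    """Expected failures by calendar time l (months) in one arm.

    rate: subjects enrolled per month in this arm, for s_a months
    lam:  exponential hazard per month
    """
    if l <= s_a:
        return rate * (l - (1 - exp(-lam * l)) / lam)
    return rate * (s_a - exp(-lam * l) / lam * (exp(lam * s_a) - 1))

# Assumptions: monthly hazard = -ln(1 - annual mortality) / 12,
# and equal allocation, so each arm accrues at half the total rate.
lam_c = -log(1 - 0.08) / 12           # placebo: 8% annual mortality
lam_e = -log(1 - 0.08 * 0.86) / 12    # treatment: 14% lower annual mortality

# Expected events are linear in the accrual rate, so solve for A directly.
events_per_unit_rate = 0.5 * (expected_events(1.0, lam_e, 48, 24)
                              + expected_events(1.0, lam_c, 48, 24))
A = 1570 / events_per_unit_rate
print(round(A), round(A) * 24)
```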

7 Single Look Design

8 Single Look Design: Blinded Monitoring of Events
The single-look design monitors events in blinded fashion until 1570 events have been observed, then performs the final analysis. Statistical significance is declared if $|Z| \ge 1.96$, or equivalently $p \le 0.05$. Designs with unblinded interim monitoring and possible early stopping have more complicated criteria for declaring significance.

9 Group Sequential Design: Unblinded Monitoring of Events
In a group sequential design a DMC performs unblinded efficacy analyses up to $K$ times, after observing $D_1, D_2, \ldots, D_K$ events. Let $c_1, c_2, \ldots, c_K$ be the corresponding stopping boundaries. Statistical significance is declared the first time that $|Z_j| \ge c_j$. We require the $c_j$'s to satisfy the level condition:
$$P_0\left\{\bigcup_{j=1}^{K} \left(|Z_j| \ge c_j\right)\right\} = \alpha.$$

10 Spending Function Boundaries
Specify a monotone increasing function $\alpha(t)$ for $t \in [0, 1]$ with $\alpha(0) = 0$ and $\alpha(1) = \alpha$. Lan and DeMets (1983) proposed
$$\alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right),$$
but any other monotone function could also be used. Let $t_j = D_j / D_K$ be the information fraction at look $j$. Solve recursively for $c_1, c_2, \ldots, c_K$:
$$P_0\{|Z_1| \ge c_1\} = \alpha(t_1),$$
$$\alpha(t_1) + P_0\{|Z_1| < c_1,\ |Z_2| \ge c_2\} = \alpha(t_2),$$
and for $j = 3, \ldots, K$,
$$\alpha(t_{j-1}) + P_0\{|Z_1| < c_1, \ldots, |Z_{j-1}| < c_{j-1},\ |Z_j| \ge c_j\} = \alpha(t_j).$$
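The recursion can be carried out numerically. The following is a rough sketch, not the algorithm used by East or any other package: it works on the score scale $S_j = Z_j\sqrt{t_j}$, which under $H_0$ has independent normal increments with variance $t_j - t_{j-1}$, and carries forward the density of paths that have not yet stopped using a simple trapezoid grid. The five equally spaced looks are hypothetical, and the names `obf_spend` and `boundaries` are illustrative.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

N01 = NormalDist()

def obf_spend(t, alpha=0.05):
    """4 - 4*Phi(z_{alpha/4}/sqrt(t)), written with the lower tail for accuracy."""
    return 4 * N01.cdf(-N01.inv_cdf(1 - alpha / 4) / sqrt(t))

def boundaries(times, spend, n=301):
    """Two-sided boundaries c_1..c_K on the Z scale for information fractions `times`."""
    cs, carried = [], None          # carried: (weight, sub-density, score) triples
    t_prev, spent = 0.0, 0.0
    for t in times:
        target = spend(t) - spent   # type-1 error to spend at this look
        sd = sqrt(t - t_prev)       # sd of the score increment since the last look
        if carried is None:
            # First look: closed form, S_1 ~ N(0, t_1).
            b = -N01.inv_cdf(target / 2) * sd
            def density(s, sd=sd):
                return exp(-0.5 * (s / sd) ** 2) / (sd * sqrt(2 * pi))
        else:
            def exit_prob(b, old=carried, sd=sd):
                # P(no earlier stop and |S_j| >= b)
                return sum(w * g * (N01.cdf((u - b) / sd) + N01.cdf((-b - u) / sd))
                           for w, g, u in old)
            lo, hi = 0.0, 10.0 * sqrt(t)
            for _ in range(60):     # bisection: exit_prob decreases in b
                mid = 0.5 * (lo + hi)
                if exit_prob(mid) > target:
                    lo = mid
                else:
                    hi = mid
            b = 0.5 * (lo + hi)
            def density(s, old=carried, sd=sd):
                return sum(w * g * exp(-0.5 * ((s - u) / sd) ** 2)
                           for w, g, u in old) / (sd * sqrt(2 * pi))
        # Trapezoid grid over the new continuation region [-b, b].
        h = 2 * b / (n - 1)
        grid = [-b + i * h for i in range(n)]
        weights = [h / 2 if i in (0, n - 1) else h for i in range(n)]
        carried = [(w, density(s), s) for w, s in zip(weights, grid)]
        cs.append(b / sqrt(t))      # back to the Z scale
        t_prev, spent = t, spend(t)
    return cs

looks = [0.2, 0.4, 0.6, 0.8, 1.0]   # hypothetical equally spaced looks
cs = boundaries(looks, obf_spend)
for t, c in zip(looks, cs):
    print(f"t = {t:.1f}  c = {c:.3f}")
```

The first boundary is very high and the later ones fall toward the single-look value, which is the characteristic O'Brien-Fleming shape. A finer grid (larger `n`) trades run time for accuracy.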

11 Lan-DeMets (OBF) α-spending Function
$$\alpha(t) = 4 - 4\Phi\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right)$$

12 Lan-DeMets (PK) α-spending Function
$$\alpha(t) = \alpha \log\{1 + (e - 1)t\}$$

13 A Parametric Family of Spending Functions
Gamma family: Hwang IK, Shih WJ and DeCani JS (1990). Statistics in Medicine, 9.
$$\alpha(t) = \alpha\,\frac{1 - e^{-\gamma t}}{1 - e^{-\gamma}}, \quad \text{where } \gamma \ne 0.$$
Setting γ to -4 or -5 generates boundaries similar to O'Brien-Fleming, while setting γ to 1 generates boundaries similar to Pocock. Very conservative or very aggressive boundaries can be generated by choice of γ.
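To see how the three spending functions compare, the sketch below tabulates the cumulative error spent at a few information fractions for a two-sided α of 0.05. The function names are illustrative, and the chosen information fractions are arbitrary.

```python
from math import e, exp, log, sqrt
from statistics import NormalDist

def ld_obf(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function."""
    z = NormalDist().inv_cdf(1 - alpha / 4)
    return 4 - 4 * NormalDist().cdf(z / sqrt(t))

def ld_pocock(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending function."""
    return alpha * log(1 + (e - 1) * t)

def gamma_family(t, gamma, alpha=0.05):
    """Hwang-Shih-DeCani gamma family, defined for gamma != 0."""
    return alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  OBF={ld_obf(t):.5f}  Gamma(-4)={gamma_family(t, -4):.5f}  "
          f"PK={ld_pocock(t):.5f}  Gamma(1)={gamma_family(t, 1):.5f}")
```

All four columns reach 0.05 at $t = 1$. The O'Brien-Fleming-type function and Gamma(-4) spend very little early on, while the Pocock-type function and Gamma(1) spend much more evenly across the looks.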

14 LD(OF) Boundaries of CHARM

15 Impact of Group Sequential Design on Power and P-value Penalty
Taking multiple looks for early stopping decreases power and raises the p-value hurdle at the final look. The magnitude of these changes increases with the number of looks and with the aggressiveness of the stopping boundaries. The Lan-DeMets O'Brien-Fleming-type boundary is popular because it is not too aggressive. For the CHARM trial, however, the DMC wanted much tougher boundaries.

16 LD(OF), Haybittle-Peto and Gamma(-12) Boundaries

17 Haybittle-Peto Boundaries
The DMC decided that they would use the Haybittle-Peto boundaries. These are very simple rules based purely on the p-value at each look. Plan for 7 looks. Reject if p< at the first 3 looks. Reject if p < 0.001 at the next 3 looks. Adjust the p-value at the final look to get a level-α test.

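18 Last-look P-Value for HP
Let $c_j = \Phi^{-1}(1 - p_j/2)$, where $p_j$ is the nominal two-sided p-value threshold at look $j$: the threshold for the first 3 looks when $j = 1, 2, 3$, and $p_j = 0.001$ when $j = 4, 5, 6$. Then $c_7$ satisfies
$$P_0\left\{\bigcup_{j=1}^{6} \left(|Z_j| \ge c_j\right) \cup \left(|Z_7| \ge c_7\right)\right\} = \alpha.$$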
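The interim critical values are simply the normal quantiles that match the nominal p-value thresholds. A minimal sketch of that conversion for the 0.001 threshold used at looks 4 to 6 follows; the helper name `z_boundary` is illustrative. Solving for the final value $c_7$ additionally requires the joint distribution of the seven test statistics, which depends on the look times and is not attempted here.

```python
from statistics import NormalDist

def z_boundary(p_two_sided):
    """Critical value c such that P(|Z| >= c) = p_two_sided under H0."""
    return NormalDist().inv_cdf(1 - p_two_sided / 2)

c_mid = z_boundary(0.001)        # looks 4, 5, 6
c_unadjusted = z_boundary(0.05)  # single-look reference value
print(round(c_mid, 3), round(c_unadjusted, 3))
```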

19 Comments on HP Boundaries
Developed as an ad hoc method to enable interim monitoring when there is no serious intention to stop early. The final p-value depends on the number and spacing of interim looks, and must be recalculated if these design parameters change.

20 Interim Monitoring of CHARM
Efficacy results at each DMC meeting. Table columns: Date, # Deaths, HP Boundary, P-Value, Hazard Ratio.

22 Why Didn't They Stop at Look 4?
The secondary endpoints, CV death and CHF hospitalization, were still awaiting adjudication. The average length of follow-up was very short. No previous trial had shown evidence of benefit from candesartan. The results did not appear strong enough to influence clinical practice.

23 2. Asymmetric Efficacy and Safety Boundaries: The CRASH Trial
Large international multicenter trial to determine the efficacy and safety of administering intravenous corticosteroids to subjects with significant head injury (Lancet, vol 364, 2004). The endpoint is death within 14 days of randomization. Subjects with Glasgow Coma Score ≤ 14 are randomized to placebo or corticosteroids. Placebo-arm 14-day mortality is estimated to be 15%. The design calls for 90% power to detect a 2% drop in 14-day mortality with a two-sided test conducted at level α = 0.05. The risk-benefit ratio is unclear: corticosteroids are believed to be beneficial, but evidence from meta-analysis suggests the possibility of harm.

24 Single-Look Design patients required to achieve 90% power

25 Drawback of the Single-Look Design
Very large sample size commitment with no possibility of early termination for benefit, harm or futility. Suppose the corticosteroids are actually beneficial? Do we really have to randomize 6320 patients to placebo before we know for sure? What if the meta-analysis results are correct and corticosteroids are actually harmful? In that case we will have randomized 6320 patients to a treatment that is worse than placebo.
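The figure of 6320 patients per arm can be checked against the standard normal-approximation formula for comparing two proportions. This sketch assumes the 2% drop is absolute (15% to 13%) and uses the unpooled variance; the slides do not say which formula the design software applied, and the function name `n_per_arm` is illustrative.

```python
from statistics import NormalDist

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided test of two proportions (unpooled variance)."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    var_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return z_sum ** 2 * var_sum / (p_control - p_treat) ** 2

# Assumption: the "2% drop" is absolute, from 15% to 13%.
n = n_per_arm(0.15, 0.13)
print(round(n), 2 * round(n))
```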

26 Group Sequential Design
Monitor the interim data. Stop the trial early if evidence of benefit or harm emerges.

27 Using East for the Design

28 Evaluate Properties by Simulation

29 Interim Monitoring of CRASH
Recruitment began in April. The DMC met twice. The efficacy results at the two meetings are tabulated below along with the final data. Table columns: date of DMC meeting; deaths and subjects on corticosteroid; deaths and subjects on placebo; statistics $\hat\delta$, $\mathrm{se}(\hat\delta)$, $Z$.
June: corticosteroid deaths (20.2%), placebo deaths (18.2%).
May: corticosteroid deaths (20.6%), placebo deaths (17.9%).
Final data: corticosteroid deaths 1052 (21.1%), placebo deaths (17.9%).
The safety boundary was crossed at the second look and the DMC stopped the trial, declaring that use of corticosteroids was unsafe. The final analysis confirmed the conclusions of the DMC.

30 Tracking the Path of the Test Statistic

31 3. Futility Boundaries: The COMET Trial
Dornase alpha versus placebo for patients with a hospitalized exacerbation of chronic bronchitis. The primary endpoint is 90-day all-cause mortality. There were encouraging results from a 244-patient pilot (p = 0.002). The investigators plan a 3-look, 5600-patient trial, which provides 90% power to detect a 20% drop in 90-day mortality (15% to 12%) with a two-sided level-0.05 test.

32 Trial with No Futility Boundary

33 Options for Futility Boundaries
1. Low conditional power. Somewhat arbitrary. How low? Impact on overall power must be evaluated.
2. Lower confidence bound rules out clinical benefit. Has the same drawbacks as conditional power.
3. Formal futility boundary. Overall power is preserved via a β-spending function. The boundary can be expressed in terms of conditional power.

34 Benefit of Formal Futility Boundaries
Stopping a trial for futility (as opposed to safety) is a difficult recommendation for the DMC to make, and a painful decision for the sponsor. As a result many trials continue to the end, consuming resources that could have been better utilized on other compounds. The presence of a formal futility boundary whose operating characteristics have been examined ahead of time can encourage more aggressive early stopping for futility.

35 COMET with Futility Boundary

36 Implications of Adding a Futility Boundary
A larger sample size (5699 patients) is needed for the same amount of power. The expected sample size is lower both if the risk reduction is 20% and if it is 0%. The two boundaries meet at $l_3 = u_3 = 1.959$, an easier hurdle than the corresponding boundary $c_3 = 1.993$ for the design with no futility boundary.

38 Boundary Interaction
Because of the futility boundary, the final efficacy boundary became easier to cross (from to ). This is sometimes referred to as buying back the α. In this case the final efficacy boundary is more favorable even than -1.96, the one-sided efficacy cut-off for a single-look trial. We shall see, however, that there is a price to be paid for this windfall: the futility boundary is binding.

39 Why Futility Boundary Relaxes the Efficacy Criterion
Multiple looks give you extra opportunities to cross an efficacy boundary under $H_0$. Therefore you pay a penalty ($c_3 = 1.993$ in Plan 1) to prevent excess false positives. But multiple looks also give you extra opportunities to cross a futility boundary under $H_1$. Therefore you receive a reward ($l_3 = u_3 = 1.959$ in Plan 5) to prevent excess false negatives.

40 Warning! This Futility Boundary is Binding
The sponsor should be aware that taking advantage of this reward (i.e., relaxing the standard for declaring statistical significance at the final look) will make the futility boundary binding. If you overrule the futility boundary, the type-1 error will be inflated.

42 Non-Binding Futility Boundary
To make the futility boundary non-binding, you must leave the efficacy boundary untouched. Only in that way can you be assured that the type-1 error will be preserved. This will cost you a slight loss of power, since you cannot pull up the efficacy boundary anymore.

43 Plan 3 has a non-binding futility boundary. Notice how the sample size increases when going from Plan 1 to Plan 2 to Plan 3 to compensate for the power loss.

44 Verify Properties by Simulation

45 Interim Monitoring of COMET
At the DMC meeting in July 1995, the trial was stopped for futility with 197/1866 (10.6%) deaths on dornase alpha and 163/1865 (8.7%) deaths on placebo.
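To get a feel for how unpromising these interim data were, the sketch below computes the test statistic and a conditional power under the design alternative. It rests on several assumptions that are not taken from the slides: the information fraction is approximated by patients observed out of the planned 5600, the final critical value is taken as 1.96 with no group sequential adjustment, and the drift is the fixed-sample value giving 90% power. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_benefit(d_trt, n_trt, d_ctl, n_ctl):
    """Z for the difference in death rates; positive values favor treatment."""
    p_t, p_c = d_trt / n_trt, d_ctl / n_ctl
    se = sqrt(p_t * (1 - p_t) / n_trt + p_c * (1 - p_c) / n_ctl)
    return (p_c - p_t) / se

def conditional_power(z, t, drift, c_final=1.96):
    """P(final Z >= c_final | interim Z = z at information fraction t),
    where drift is the expected value of the final Z under the alternative."""
    return NormalDist().cdf((z * sqrt(t) + drift * (1 - t) - c_final) / sqrt(1 - t))

z = z_benefit(197, 1866, 163, 1865)
t = (1866 + 1865) / 5600   # assumption: information is proportional to patients
# Assumption: drift for 90% power at two-sided 0.05 in a fixed-sample design.
drift = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.90)
cp = conditional_power(z, t, drift)
print(f"Z = {z:.2f}, information fraction = {t:.2f}, conditional power = {cp:.1e}")
```

The observed difference points in the direction of harm, so even if the originally hoped-for 20% benefit held for the remainder of the trial, the chance of finishing with a significant result in favor of dornase alpha would be negligible under these assumptions.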

46 Final Comments on COMET
In the actual trial there was no futility boundary. The trial was indeed stopped, but only after much discussion. That decision still remains a topic for discussion. Had there been a futility boundary, it would have been crossed by a wide margin and there would be no further discussion. What would the DMC have done had the results come out 163/1865 (8.7%) for placebo and 153/1866 (8.2%) for dornase alpha? Without a futility boundary the decision would have been more difficult.