
1 ADAPTIVITY IN CLINICAL TRIAL DESIGNS: OLD AND NEW
Professor Christopher Jennison, Dept of Mathematical Sciences, University of Bath, UK
and Professor Bruce Turnbull, Cornell University, Ithaca, New York
51st German Biometric Conference, Halle, March 2005
© 2005 Jennison, Turnbull

2 Adaptivity
- Adaptive choice of test statistic as information on assumptions emerges, e.g., scores in a linear rank test, logrank vs Gehan test
- Adaptive allocation to achieve balance within strata
- Adaptive allocation to assign fewer patients to an inferior treatment
- Adaptivity to accruing information on nuisance parameters
- Adaptivity to accruing information on safety/secondary endpoints
- Adaptivity to adjust power based on accruing information on primary endpoints
- Adaptivity to drop arms in a multi-arm study based on accruing information on primary endpoints
- And more

3 Outline of Presentation
1. Interim monitoring of clinical trials
   Adapting to observed data
2. Distribution theory, the role of information
3. Error spending tests
   Adapting to unpredictable information
   Adapting to nuisance parameters
4. Most efficient group sequential tests
   Adapting optimally to observed data
5. More recent adaptive proposals
6. Example of inefficiency in an adaptive design
7. Conclusions

4 1. Interim monitoring of clinical trials
It is standard practice to monitor progress of clinical trials for reasons of ethics, administration (accrual, compliance) and economics.
Special methods are needed since multiple looks at accumulating data can lead to over-interpretation of interim results.
Methods developed in manufacturing production were first transposed to clinical trials in the 1950s. Traditional sequential methods assumed continuous monitoring of data, whereas it is only practical to analyse a clinical trial on a small number of occasions.
The major step forward was the advent of Group Sequential methods in the 1970s.

5 Pocock's repeated significance test (1977)
To test H0: θ = 0 vs θ ≠ 0, where θ represents the treatment difference.
Use standardised test statistics Z_1, ..., Z_K.
Stop to reject H0 at analysis k if |Z_k| > c.
After analysis K, stop and accept H0 if H0 has not been rejected.
Choose c to give overall type I error rate α.
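A Monte Carlo sketch of why the common critical value must exceed 1.96 (stdlib Python only; the unit-information groups, simulation size and seed are illustrative choices, not from the talk). Testing at the fixed-sample level 1.96 at each of K = 5 looks inflates the two-sided type I error well above 0.05, while Pocock's constant, about 2.413 for K = 5 and α = 0.05, restores it.

```python
import random
from math import sqrt

def repeated_test_error(c, K=5, n_sims=40000, seed=1):
    """Estimate P(reject H0 at some analysis) when |Z_k| > c is tested K times."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        s = 0.0                       # running score statistic under H0
        for k in range(1, K + 1):
            s += rng.gauss(0.0, 1.0)  # one new group's contribution
            if abs(s) / sqrt(k) > c:  # standardised statistic Z_k
                rejections += 1
                break
    return rejections / n_sims

naive = repeated_test_error(1.96)     # fixed-sample critical value, reused K times
pocock = repeated_test_error(2.413)   # Pocock's constant for K = 5, alpha = 0.05
print(f"naive: {naive:.3f}, Pocock: {pocock:.3f}")
```

The naive rule's error rate lands near 0.14, illustrating the "over-interpretation of interim results" that slide 4 warns about.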

6 Types of hypothesis testing problems
Two-sided test: testing H0: θ = 0 against θ ≠ 0.
One-sided test: testing H0: θ ≤ 0 against θ > 0, e.g., to show treatment A is as good as treatment B, within a margin (non-inferiority).
Equivalence tests: two-sided, to show two treatment formulations are equal within an accepted tolerance.

7 Types of early stopping
1. Stopping to reject H0: No treatment difference
   Allows rapid progress from a positive outcome
   Avoids exposing further patients to the inferior treatment
   Appropriate if no further checks are needed on safety or long-term effects
2. Stopping to accept H0: No treatment difference
   Stopping for futility or abandoning a lost cause
   Saves time and effort when a study is unlikely to lead to a positive conclusion

8 One-sided tests
To look for superiority of a new treatment, test H0: θ ≤ 0 against θ > 0.
If the new treatment is not effective, it is not appropriate to keep sampling to find out whether θ < 0 or θ = 0.
Specify type I error rate α and power 1 − β at θ = δ:
Pr{Reject H0 | θ = 0} = α,  Pr{Reject H0 | θ = δ} = 1 − β.
A sequential test can reduce expected sample size under θ = 0, under θ = δ, and at effect sizes in between.

9 One-sided tests
A typical one-sided testing boundary:
[Figure: score statistic plotted against information, with an upper "Reject H0" boundary and a lower "Accept H0" boundary.]
1. The test adapts to data, stopping when a decision is possible.
2. Expected sample size can be around 50 to 70% of the fixed sample size.

10 Joint distribution of parameter estimates
Let θ̂_k be the estimate of the parameter of interest, θ, based on data at analysis k.
The information for θ at analysis k is I_k = {Var(θ̂_k)}^{-1}.
Canonical joint distribution of θ̂_1, ..., θ̂_K: in very many situations, (θ̂_1, ..., θ̂_K) is approximately multivariate normal with
θ̂_k ~ N(θ, I_k^{-1}),  k = 1, ..., K,
and Cov(θ̂_{k1}, θ̂_{k2}) = Var(θ̂_{k2}) = I_{k2}^{-1} for k1 ≤ k2.

11 Canonical joint distribution of z-statistics
In a test of H0: θ = 0, the standardised statistic at analysis k is Z_k = θ̂_k √I_k.
For θ̂_1, ..., θ̂_K with the canonical joint distribution:
(Z_1, ..., Z_K) is multivariate normal,
Z_k ~ N(θ √I_k, 1),  k = 1, ..., K,
Cov(Z_{k1}, Z_{k2}) = √(I_{k1}/I_{k2}) for k1 ≤ k2.

12 Canonical joint distribution of score statistics
The score statistics S_k = Z_k √I_k are also multivariate normal with
S_k ~ N(θ I_k, I_k),  k = 1, ..., K.
The score statistics possess the independent increments property: S_1, S_2 − S_1, ..., S_K − S_{K−1} are independent.
It can be helpful to know that the score statistics behave as Brownian motion with drift θ, observed at times I_1, ..., I_K.
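The canonical covariance structure can be checked empirically. A stdlib-only sketch (the sample sizes, effect size and seed are made-up illustration values): sequential sample means of normal data should satisfy Cov(θ̂_1, θ̂_2) = Var(θ̂_2) = 1/I_2.

```python
import random

def simulate(n1=50, n2=100, n_sims=20000, seed=2, theta=0.3):
    """Sequential estimates of a normal mean at two analyses (sigma = 1)."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n_sims):
        x = [rng.gauss(theta, 1.0) for _ in range(n2)]
        t1.append(sum(x[:n1]) / n1)   # theta-hat at analysis 1, I_1 = n1
        t2.append(sum(x) / n2)        # theta-hat at analysis 2, I_2 = n2
    return t1, t2

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

t1, t2 = simulate()
# Canonical structure: Cov(theta1-hat, theta2-hat) = Var(theta2-hat) = 1/I_2 = 0.01
print(cov(t1, t2), cov(t2, t2))
```

Both printed values sit close to 1/I_2 = 0.01, the covariance identity on slide 10 that drives the whole sequential distribution theory.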

13 Sequential distribution theory
The preceding results for the joint distribution of θ̂_1, ..., θ̂_K can be demonstrated directly for:
- a single normal mean,
- the effect size in a comparison of two normal means.
The results also apply when θ is a parameter in:
- a general normal linear model,
- a general model fitted by maximum likelihood (large sample theory).
So, we have the theory to support general comparisons, including adjustment for covariates if required.

14 Survival data
The canonical joint distributions also arise for:
- parameter estimates in Cox's proportional hazards regression model,
- a sequence of log-rank statistics (score statistics) for comparing two survival curves, and the z-statistics formed from these.
For survival data, observed information is roughly proportional to the number of failures seen.
Special types of group sequential test are needed to handle unpredictable and unevenly spaced information levels.

15 Error spending tests
Lan & DeMets (Biometrika, 1983) presented two-sided tests which spend type I error probability as a function of observed information.
The error spending function, f(I), gives the cumulative type I error probability to be spent up to the current analysis, as a function of the information observed so far.
[Figure: a typical error spending function f(I), increasing from 0 at I = 0 to α at I = I_max.]

16 Maximum information design
Specify the error spending function f(I).
For each k = 1, 2, ..., set the boundary at analysis k to give cumulative type I error probability f(I_k).
Accept H0 if I_max is reached without rejecting H0.
Precise rules are available to protect the type I error rate if the information sequence over-runs the target I_max, or if the study ends without reaching I_max. See slides 20 and 21, or Chapter 7 of Jennison & Turnbull (2000).

17 Implementing error spending tests
Analysis 1: Observed information I_1.
Reject H0 if |Z_1| > c_1, where Pr_{θ=0}{|Z_1| > c_1} = f(I_1).
Analysis 2: Cumulative information I_2.
Reject H0 if |Z_2| > c_2, where Pr_{θ=0}{|Z_1| < c_1, |Z_2| > c_2} = f(I_2) − f(I_1).
And so on, adapting to the unpredictable information levels I_1, I_2, etc.

18 One-sided error spending tests
Define f(I) for spending type I error probability and g(I) for spending type II error probability.
Power family of error spending tests:
f(I) = α min{1, (I/I_max)^ρ},  g(I) = β min{1, (I/I_max)^ρ}.
At analysis k, set boundary values (a_k, b_k) so that
Pr{Reject H0 by analysis k} = f(I_k),  Pr{Accept H0 by analysis k} = g(I_k).
[Figure: one-sided boundaries a_k (accept) and b_k (reject) plotted against information.]

19 Implementing one-sided error spending tests
1. Computation of (a_k, b_k) does not depend on future information levels.
2. A maximum information design continues until a boundary is crossed or an analysis with I_k ≥ I_max is reached.
3. The value of I_max is chosen so that boundaries converge at the final analysis under a typical sequence of information levels, e.g., equally spaced levels.
For type I error rate α and power 1 − β at θ = δ, I_max = R I_fix, where R is the inflation factor for this design.

20 Over-running
If I_K > I_max, solving for (a_K, b_K) is liable to give a_K > b_K.
So, reduce a_K to b_K. Keeping b_K as calculated guarantees a type I error probability of exactly α, and we gain extra power.
Over-running may also occur if the information levels I_1, ..., I_{K−1} deviate from the equally spaced values (say) used in choosing I_max.
[Figure: one-sided boundaries with the final accept and reject values brought together at b_K.]

21 Under-running
If a final information level I_K < I_max is imposed, solving for (a_K, b_K) is liable to give a_K < b_K.
This time, increase b_K to a_K. Again, with a_K as calculated, the type I error probability is exactly α; the attained power will be just below 1 − β.
[Figure: one-sided boundaries with the final accept and reject values brought together at a_K.]

22 Error spending designs and nuisance parameters
(1) Survival data, log-rank statistics
Information depends on the number of observed failures:
I_k ≈ {Number of failures by analysis k} / 4.
With fixed dates for analyses, continue until information reaches I_max.
If the overall failure rate is low or censoring is high, one may decide to extend the patient accrual period.
Changes affecting I_1, I_2, ... can be based on observed information; they should not be influenced by the estimated treatment effect.
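The "information ≈ failures/4" rule gives a quick failure-count target. A back-of-envelope sketch (Schoenfeld-style approximation under 1:1 allocation and proportional hazards; the α, power and hazard ratio below are illustrative, not the talk's numbers):

```python
from statistics import NormalDist
from math import log, ceil

def required_failures(alpha, beta, hazard_ratio):
    """Failures needed so that I = d/4 reaches the fixed-sample information."""
    z = NormalDist().inv_cdf
    theta = log(hazard_ratio)                           # log hazard ratio
    I_fix = ((z(1 - alpha) + z(1 - beta)) / theta) ** 2
    return ceil(4 * I_fix)                              # since I ~ d / 4

d = required_failures(0.025, 0.1, 0.7)
print(d)   # about 331 failures for alpha = 0.025, power 0.9, HR = 0.7
```

Because the target is a number of *failures*, not patients, the accrual period or follow-up can be stretched until the failure count delivers I_max, exactly the adaptation the slide describes.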

23 Error spending designs and nuisance parameters
(2) Normal responses with unknown variance
In a two treatment comparison, a fixed sample test with type I error rate α and power 1 − β at θ = δ requires information
I_fix = (z_α + z_β)² / δ².
A group sequential design with inflation factor R needs maximum information I_max = R I_fix.
The maximum required information is fixed, but the sample size needed to provide this level of information depends on the unknown variance, σ².
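The fixed-sample information formula is a one-liner; a minimal sketch (the α = 0.025, β = 0.1, δ = 1 and R = 1.05 below are illustrative choices, not values stated on the slide):

```python
from statistics import NormalDist

def fixed_sample_information(alpha, beta, delta):
    """I_fix = (z_alpha + z_beta)^2 / delta^2 for a one-sided test."""
    z = NormalDist().inv_cdf
    return ((z(1 - alpha) + z(1 - beta)) / delta) ** 2

I_fix = fixed_sample_information(0.025, 0.1, 1.0)
I_max = 1.05 * I_fix    # maximum information with inflation factor R = 1.05
print(I_fix, I_max)     # about 10.51 and 11.03
```

Note that I_fix and I_max are on the information scale; only the conversion to a sample size involves the unknown σ².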

24 Adjusting sample size as variance is estimated
The information from n_A observations on treatment A and n_B on treatment B is
I = {σ²(1/n_A + 1/n_B)}^{-1}.
Initially: set maximum sample sizes to give information I_max if σ² is equal to an initial estimate.
As updated estimates of σ² are obtained: adjust future group sizes so the final analysis has information I_max.
At interim analyses, apply the error spending boundary based on observed (estimated) information.
NB: state α, 1 − β, δ and I_max in the protocol, not the initial targets for n_A and n_B.
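The re-estimation step is just the information formula read backwards. A minimal sketch (the initial and revised variance estimates and the I_max value are made up for illustration):

```python
def information(sigma2, nA, nB):
    """I = {sigma^2 (1/nA + 1/nB)}^{-1} for a two treatment comparison."""
    return 1.0 / (sigma2 * (1.0 / nA + 1.0 / nB))

def n_per_arm(sigma2, I_target):
    """Equal-allocation sample size per arm giving information I_target."""
    return 2.0 * sigma2 * I_target   # from I = n / (2 sigma^2) when nA = nB

I_max = 11.0
n0 = n_per_arm(1.0, I_max)   # planned with initial estimate sigma^2 = 1.0
n1 = n_per_arm(1.4, I_max)   # revised upward after sigma^2 is re-estimated
print(n0, n1)                # 22.0 and 30.8 per arm
```

Since the protocol fixes I_max rather than n_A and n_B, revising the per-arm count from 22 to 31 changes nothing about the error spending boundary, which is indexed by observed information.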

25 4. Optimal group sequential tests
Formulate the testing problem:
- fix type I error rate α and power 1 − β at θ = δ,
- fix the number of analyses, K,
- fix maximum sample size (information), if desired.
Find the design which minimises average sample size (information) at one particular θ, or averaged over several θ values.
Optimal designs may be used directly, or they can serve as a benchmark for judging the efficiency of designs with other desirable features.

26 Derivation of optimal group sequential tests
Create a Bayes decision problem with a prior on θ, sampling costs and costs for a wrong decision.
Write a program to solve this Bayes problem by backwards induction (dynamic programming).
Search for a set of costs such that the Bayes test has the desired frequentist properties: type I error rate α and power 1 − β at θ = δ.
This is essentially a Lagrangian method for solving a constrained optimisation problem; the key is that the unconstrained Bayes problem can be solved accurately and quickly.

27 Example of properties of optimal tests
One-sided tests with equal group sizes, minimising {E_0(N) + E_δ(N)}/2.
[Table: minimum values of {E_0(N) + E_δ(N)}/2 as a percentage of the fixed sample size, for various numbers of analyses K and inflation factors R; specific entries lost in transcription.]
Note: E(N) decreases as K increases, but with diminishing returns; E(N) also decreases as R increases, up to a point.

28 One-sided tests: assessing families of group sequential tests
Pampallona & Tsiatis (1994): a parametric family indexed by Δ; boundaries for the score statistics involve I_k^Δ.
Error spending, ρ-family: error spent is proportional to (I/I_max)^ρ; each ρ implies an inflation factor R such that I_max = R I_fix.
Error spending, γ-family (Hwang, Shih & DeCani, 1990): error spent is proportional to 1 − e^{−γI/I_max}; γ determines the inflation factor R.
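The two error spending families are simple closed forms; a minimal sketch (the α, ρ and γ values are illustrative):

```python
from math import exp

def rho_family(t, alpha, rho):
    """Cumulative error spent at information fraction t = I / I_max."""
    return alpha * min(1.0, t) ** rho

def gamma_family(t, alpha, gamma):
    """Hwang, Shih & DeCani (1990) family, gamma != 0, normalised so f(1) = alpha."""
    t = min(1.0, t)
    return alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))

# Half-way through the planned information, each family has spent only a
# fraction of alpha = 0.025:
print(rho_family(0.5, 0.025, 2), gamma_family(0.5, 0.025, -4))
```

Larger ρ, or more negative γ, spends error later, giving more conservative early boundaries and an inflation factor R closer to 1.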

29 Families of tests
[Figure: {E_0(N) + E_δ(N)}/2 as a percentage of the fixed sample size, plotted against the inflation factor R, for the ρ-family, the γ-family, the Pampallona & Tsiatis family and the optimal tests.]
Both error spending families are highly efficient, but the Pampallona & Tsiatis tests are sub-optimal.
Adapting optimally to observed data

30 Squeezing a little extra efficiency
Schmitz (1993) proposed group sequential tests in which group sizes are chosen adaptively:
Initially, fix I_1 and observe S_1 ~ N(θ I_1, I_1).
Then choose I_2 as a function of S_1 and observe S_2, where S_2 − S_1 ~ N(θ(I_2 − I_1), I_2 − I_1), and so forth.
Specify the sampling rule and stopping rule to achieve the desired overall type I error rate and power.

31 Examples of Schmitz designs
To test H0: θ ≤ 0 versus θ > 0 with type I error rate α and power 1 − β at θ = δ.
Aim for low values of ∫ E_θ(N) f(θ) dθ, where f(θ) is a chosen normal density for θ.
Constraints:
- maximum sample information, a multiple of the fixed sample information,
- maximum number of analyses, K.
Again, optimal designs can be found by solving related Bayes decision problems.

32 Examples of Schmitz designs
[Table: ∫ E_θ(N) f(θ) dθ as a percentage of the fixed sample information, for the optimal adaptive (Schmitz) design, the optimal non-adaptive design with optimised group sizes, and the optimal non-adaptive design with equal group sizes; specific entries lost in transcription.]
Varying group sizes adaptively makes for a complex procedure, and the efficiency gains are slight.
Adapting super-optimally to observed data

33 Examples of Schmitz designs
Tests of H0: θ ≤ 0 versus θ > 0 with type I error rate α, power 1 − β at θ = δ, and K = 2 analyses. Designs minimise average ASN.
[Figure: average ASN against θ/δ for A: ρ-family test, equal group sizes; B: ρ-family test, optimal 1st group; C: optimal non-adaptive test; D: optimal adaptive test.]

34 Examples of Schmitz designs
Tests as on slide 33 but with more analyses (K > 2). Designs minimise average ASN.
[Figure: average ASN against θ/δ for A: ρ-family test, equal group sizes; B: ρ-family test, optimal 1st group; C: optimal non-adaptive test; D: optimal adaptive test.]

35 Recent adaptive methods
Adaptivity = Flexibility
Pre-planned extensions: The way the design changes in response to interim data is pre-determined; the particular designs of Proschan and Hunsberger (1995), Li et al (2002) and Hartung and Knapp (2003) are very much like Schmitz designs.
Partially pre-planned: The time of the first interim analysis is pre-specified, as is the method for combining results from different stages: Bauer (1989), Bauer & Köhne (1994).
Re-design may be unplanned: The method of combining results from different stages is implicit in the original design and carried over into any re-design: Fisher (1998), Cui et al (1999), Denne (2001) or Müller & Schäfer (2001).

36 Bauer (1989) and Bauer & Köhne (1994) proposed mid-course design changes to one or more of:
- Treatment definition
- Choice of primary response variable
- Sample size:
  in order to maintain power under an estimated nuisance parameter,
  to change power in response to external information,
  to change power for internal reasons:
    a) a secondary endpoint, e.g., safety,
    b) the primary endpoint, i.e., θ̂.

37 Bauer & Köhne's two-stage scheme
Investigators decide at the design stage to split the trial into two parts. Each part yields a one-sided P-value, and these are combined.
Run part 1 as planned. This gives P_1 ~ U(0, 1) under H0.
Make design changes, conditionally on P_1 and other part 1 information.
Run part 2 with these changes, giving P_2 ~ U(0, 1) under H0, conditionally on part 1.
Combine P_1 and P_2 by Fisher's combination test: reject H0 if P_1 P_2 ≤ exp(−χ²_{4,1−α}/2).
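The critical value for the product P_1 P_2 can be found without chi-squared tables: for independent uniform p-values, Pr(P_1 P_2 ≤ c) = c(1 − ln c), which is increasing in c, so a bisection suffices. A minimal stdlib sketch (the α = 0.05 is an illustrative choice):

```python
from math import log

def fisher_critical_value(alpha, tol=1e-12):
    """Solve c(1 - ln c) = alpha: reject H0 when p1 * p2 <= c.

    Uses Pr(P1 P2 <= c) = c - c ln c for independent U(0,1) p-values;
    the left side is increasing in c on (0, 1), so bisection converges."""
    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * (1 - log(mid)) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = fisher_critical_value(0.05)
print(c)   # about 0.0087, i.e. exp(-chi2_{4, 0.95} / 2)
```

So at α = 0.05 the trial rejects only when the product of the two stage-wise p-values falls below roughly 0.0087; the validity of this rule survives any data-dependent re-design between the stages because P_2 remains U(0,1) conditionally.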

38 B & K: Major design changes before part 2
With major changes, the two parts are rather like separate studies in a drug development process, such as:
Phase IIb: Compare several doses and select the best. Use a rapidly available endpoint (e.g., tumour response).
Phase III: Compare the selected dose against control. Use a long-term endpoint (e.g., survival).
Applying Fisher's combination test to P_1 and P_2 gives a meta-analysis of the two stages with a pre-specified rule.
Note: Each stage has its own null hypothesis, and the overall H0 is the intersection of these.

39 B & K: Minor design changes before stage 2
With only minor changes, the form of responses in stage 2 stays close to the original plan. Bauer & Köhne's method provides a way to handle this.
Or, an error spending test could be used:
- Slight departures from the original design will perturb the observed information levels, which can be handled in an error spending design.
- After a change of treatment definition, one can stratify with respect to patients admitted before and after the change.
- As long as the overall score statistic can be embedded in a Brownian motion, one can use an error spending test with a maximum information design.

40 B & K: Nuisance parameters
Example: Normal response with unknown variance, σ².
Aiming for type I error rate α and power 1 − β at θ = δ, the necessary sample size depends on σ².
One can choose the second stage's sample size to meet this power requirement assuming the variance is equal to the estimate from stage 1.
P_1 and P_2 from t-tests are independent U(0, 1) under H0.
Other methods:
(a) Many internal pilot designs are available.
(b) Error spending designs can use estimated information (from the current estimate of σ²).
(c) The two-stage design of Stein (1945) attains the type I error rate exactly and the power precisely!

41 External factors or internal, secondary information
Bauer and Köhne design
At an interim stage suppose, for reasons not concerning the primary endpoint, investigators wish to achieve a higher power than 1 − β at θ = δ.
If this happens after part 1 of a B & K design, the part 2 sample size can be increased, e.g., to give the desired conditional power at θ = δ.
If no re-design was planned:
Recent work shows similar re-design, maintaining the type I error rate, is possible within a fixed sample study or a group sequential design, by preserving the conditional type I error probability under H0; see Denne (2001) or Müller & Schäfer (2001).
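The conditional type I error idea can be sketched for a fixed-sample level-α test (stdlib only; the information fraction t = 0.5, α = 0.025 and seed are illustrative assumptions). At an interim look with information fraction t, compute the probability, under H0, that the original test would still reject given the current z-value; any re-design of the remainder that spends no more than this conditional level preserves the overall α.

```python
import random
from math import sqrt
from statistics import NormalDist

def conditional_error(z1, t, alpha=0.025):
    """Pr(Z_final > z_{1-alpha} | Z_1 = z1) under H0, information fraction t.

    Z_final = sqrt(t) Z_1 + sqrt(1-t) Z2~, with Z2~ an independent N(0,1)."""
    nd = NormalDist()
    c = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf((c - sqrt(t) * z1) / sqrt(1 - t))

# Averaging the conditional error over Z_1 ~ N(0,1) recovers alpha:
rng = random.Random(6)
avg = sum(conditional_error(rng.gauss(0, 1), t=0.5) for _ in range(200000)) / 200000
print(avg)   # close to 0.025
```

The averaging check is the whole point: whatever data-dependent rule maps z1 to a redesigned remainder, the unconditional error stays at α as long as each branch respects its conditional budget.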

42 Responding to θ̂, an estimate of the primary endpoint
Motivation may be:
- to rescue an under-powered study,
- a "wait and see" approach to choosing a study's power requirement,
- trying to be efficient.
Many methods have been proposed to do this, often by fixing conditional power under θ = θ̂.
If re-design is unplanned, the conditional type I error rate approach is available.
It is good to be able to rescue a poorly designed study.
But, group sequential tests already base the decision for early stopping on θ̂, and optimal GSTs do this optimally!

43 L. Fisher (1998), Cui et al (1999): the variance spending method
For a study with two parts, combine Z_1 and Z_2 by the weighted inverse normal method (Mosteller and Bush, 1954), in the overall statistic
Z = w_1 Z_1 + w_2 Z_2,
where w_1 and w_2 are pre-specified with w_1² + w_2² = 1.
Then Z ~ N(0, 1) under H0.
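A Monte Carlo sketch of why the weighted inverse normal statistic stays valid under adaptation (stdlib only; the equal weights, the toy re-sizing rule and the seed are illustrative assumptions, not from the talk). The stage-2 sample size below depends on the stage-1 data, yet the combined statistic keeps its N(0, 1) null law because the weights are fixed in advance and the stage-2 z-statistic is N(0, 1) under H0 whatever n_2 turns out to be.

```python
import random
from math import sqrt
from statistics import NormalDist

w1, w2 = sqrt(0.5), sqrt(0.5)          # pre-specified weights, w1^2 + w2^2 = 1
crit = NormalDist().inv_cdf(0.975)     # one-sided level alpha = 0.025

rng = random.Random(4)
rej, n_sims = 0, 20000
for _ in range(n_sims):
    z1 = rng.gauss(0, 1)               # stage-1 z-statistic under H0
    n2 = 100 if z1 < 1.0 else 400      # data-dependent stage-2 size (toy rule)
    x2 = [rng.gauss(0, 1) for _ in range(n2)]
    z2 = sum(x2) / sqrt(n2)            # stage-2 z-statistic, N(0,1) under H0
    if w1 * z1 + w2 * z2 > crit:
        rej += 1
rate = rej / n_sims
print(rate)                            # close to 0.025
```

The flip side, which slide 44 makes precise, is that when n_2 is increased, each stage-2 observation carries less weight than its stage-1 counterpart, which is the source of the inefficiency discussed next.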

44 Variance spending method: normal observations
Suppose observations are independent, normally distributed.
The overall statistic Z = w_1 Z_1 + w_2 Z_2 is the usual efficient test statistic if the stage 2 sample size is as originally planned. But, after flexible, adaptive re-design this is typically not the case.
In one standard scenario:
- θ̂ at stage 1 is smaller than hoped for,
- the second stage sample size is increased to enhance power,
- hence, second stage observations receive lower weight than their first stage counterparts.

45 Example of inefficiency in an adaptive design
Example A: a Cui, Hung & Wang (1999) style example.
Scenario: We wish to design a test with type I error probability α.
Investigators are optimistic the effect, θ, could be as high as δ. However, effect sizes as low as about δ/2 are clinically relevant and worth detecting (cf. the example cited by Cui et al).
First, consider a fixed sample study attaining power 0.9 at θ = δ. We suppose this requires a sample size n_fix.
An adaptive design starts out as a fixed sample test with n_fix observations, but the data are examined after an initial group of responses to see if there is a need to adapt.

46 Cui et al adaptive design
Denote the estimated effect based on the first stage observations by θ̂.
If θ̂ falls below a small threshold, stop the trial for futility, accepting H0.
Otherwise, re-design the remainder of the trial, preserving the conditional type I error rate given θ̂.
Choose the remaining sample size to give conditional power 0.9 if in fact θ = θ̂.
Then, truncate this additional sample size so that no decrease in sample size is allowed and the total sample size is capped, thereby maintaining the overall type I error rate.
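A sketch of this style of rule (stdlib only; the planned sizes, futility threshold at θ̂ ≤ 0, truncation bounds, conditional power target and seed are all illustrative assumptions, not the numbers from the talk's example). The stage-2 size is chosen to give conditional power 0.9 at θ = θ̂, and a Monte Carlo check confirms the weighted test still controls the type I error (slightly conservatively, because of the futility stop).

```python
import random
from math import sqrt
from statistics import NormalDist

alpha, n_planned, n1 = 0.025, 500, 250     # made-up planned sample sizes
w1, w2 = sqrt(n1 / n_planned), sqrt(1 - n1 / n_planned)
crit = NormalDist().inv_cdf(1 - alpha)
z_cp = NormalDist().inv_cdf(0.9)           # target conditional power 0.9

def stage2_size(z1, sigma=1.0):
    """Stage-2 size for conditional power 0.9 at theta = theta-hat (toy bounds)."""
    theta_hat = z1 * sigma / sqrt(n1)
    if theta_hat <= 0:
        return None                        # stop for futility, accept H0
    b = (crit - w1 * z1) / w2              # stage-2 hurdle on the z scale
    n2 = ((b + z_cp) * sigma / theta_hat) ** 2
    return min(max(n2, n_planned - n1), 4 * n_planned)   # no decrease; capped

# Type I error check under theta = 0: since the combination weights are fixed,
# the stage-2 z-statistic is N(0,1) under H0 whatever n2 is chosen.
rng = random.Random(5)
rej, n_sims = 0, 100000
for _ in range(n_sims):
    z1 = rng.gauss(0, 1)
    if stage2_size(z1) is None:
        continue
    z2 = rng.gauss(0, 1)
    if w1 * z1 + w2 * z2 > crit:
        rej += 1
err = rej / n_sims
print(err)   # at most alpha = 0.025
```

Running the same simulation at small positive θ would show the large, highly variable n_2 values this rule produces when θ̂ is small, which is exactly the inefficiency the next slides quantify.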

47 Power of the Cui et al adaptive test
[Figure: power against θ/δ for the adaptive test and the fixed sample test with n = n_fix.]
The adaptive test improves on the power of the fixed sample test, achieving power 0.8 at θ = δ/2.
If continuing past the first stage, the total sample size ranges from 100 up to the permitted maximum.

48 A conventional group sequential test
Similar overall power can be obtained by a non-adaptive GST designed to attain power 0.9 when θ = δ/2.
We have compared a power family, error spending test with the same type I error rate and K = 2 analyses: with the two analyses suitably placed, it meets the requirement of power 0.9 at θ = δ/2.
This test dominates the Cui et al adaptive design with respect to both power and ASN. It also has a much lower maximum sample size.

49 Cui et al adaptive test vs non-adaptive GST
[Figure: power and ASN plotted against θ/δ for the 2-group GST and the adaptive test.]
The advantages of the conventional GST are clear. It has higher power, a lower average sample size function, and a much smaller maximum sample size.
We have found similar inefficiency in many more of the adaptive designs proposed in the literature.

50 Conditional power and overall power
It might be argued that only conditional power is important once a study is under way, so overall power is irrelevant once data have been observed.
However:
- Overall power integrates over conditional properties in just the right way.
- It is overall power that is available at the design stage, when a stopping rule and sampling rule (even an adaptive one) are chosen.
- As the example shows, chasing conditional power can be a trap, leading to very large sample sizes when the estimated effect size is low; given the variability of this estimate, the true effect size could well be zero.
- To a pharmaceutical company conducting many trials, long term performance is determined by overall properties, i.e., the power and average sample size of each study.

51 7. Conclusions
Error spending tests using information monitoring can adapt to:
- unpredictable information levels,
- nuisance parameters,
- observed data, i.e., efficient stopping rules.
Methods preserving conditional type I error allow re-design in response to external developments or internal evidence from secondary endpoints.
Recently proposed adaptive methods can:
- facilitate re-sizing for nuisance parameters,
- support re-sizing to rescue an under-powered study,
- allow an on-going approach to study design.
But, they will not improve on the efficiency of standard Group Sequential Tests, and they can be substantially inferior.

52 References
Bauer, P. (1989). Multistage testing with adaptive designs (with discussion). Biometrie und Informatik in Medizin und Biologie 20, 130-148.
Bauer, P. and Köhne, K. (1994). Evaluation of experiments with adaptive interim analyses. Biometrics 50, 1029-1041. Correction: Biometrics 52 (1996), 380.
Chi, G.Y.H. and Liu, Q. (1999). The attractiveness of the concept of a prospectively designed two-stage clinical trial. J. Biopharmaceutical Statistics 9, 537-547.
Cui, L., Hung, H.M.J. and Wang, S.-J. (1999). Modification of sample size in group sequential clinical trials. Biometrics 55, 853-857.
Denne, J.S. (2001). Sample size recalculation using conditional power. Statistics in Medicine 20, 2645-2660.
ICH Topic E9. Note for Guidance on Statistical Principles for Clinical Trials. ICH Technical Coordination, EMEA: London, 1998.
Fisher, L.D. (1998). Self-designing clinical trials. Statistics in Medicine 17, 1551-1562.
Fisher, R.A. (1932). Statistical Methods for Research Workers, 4th Ed. Oliver and Boyd, London.
Hartung, J. and Knapp, G. (2003). A new class of completely self-designing clinical trials. Biometrical Journal 45, 3-19.
Hwang, I.K., Shih, W.J. and DeCani, J.S. (1990). Group sequential designs using a family of type I error probability spending functions. Statistics in Medicine 9, 1439-1445.
Jennison, C. and Turnbull, B.W. (1993). Group sequential tests for bivariate response: Interim analyses of clinical trials with both efficacy and safety endpoints. Biometrics 49, 741-752.
Jennison, C. and Turnbull, B.W. (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, Boca Raton.
Jennison, C. and Turnbull, B.W. (2003). Mid-course sample size modification in clinical trials based on the observed treatment effect. Statistics in Medicine 22, 971-993.

53 Jennison, C and Turnbull, BW (2004a,b,c) Preprints available at mascj/ or at (Click on Technical Reports, Faculty Authors, Turnbull and finally on Submit ) a Efficient group sequential designs when there are several effect sizes under consideration b Adaptive and non-adaptive group sequential tests c Meta-analyses and adaptive group sequential designs in the clinical development process Li, G, Shih, W J, Xie, T and Lu J (2002) A sample size adjustment procedure for clinical trials based on conditional power Biostatistics, Mehta, C R and Tsiatis, A A (2001) Flexible sample size considerations using information-based interim monitoring Drug Information J, Müller, H-H and Schäfer, H (2001) Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and of classical group sequential procedures Biometrics 7, Pampallona, S and Tsiatis, AA (1994) Group sequential designs for one-sided and two-sided hypothesis testing with provision for early stopping in favor of the null hypothesis J Statist Planning and Inference, 42, 19 Pocock, SJ (1977) Group sequential methods in the design and analysis of clinical trials Biometrika, 4, Proschan, MA and Hunsberger, SA (199) Designed extension of studies based on conditional power Biometrics 1, Schmitz, N (199) Optimal Sequentially Planned Decision Procedures Lecture Notes in Statistics, 79, Springer-Verlag: New York