Workshop on flexible designs for diagnostic studies: from diagnostic accuracy to personalized medicine


2 Structure: Background; Sources of error; Summary
06/11/2017 Introduction 2

3 Structure: Background; Sources of error; Summary

4 Development process of a diagnostic test
Phase I: pilot studies (technical and methodological evaluations)
Phase II: case-control studies (first assessment of the diagnostic accuracy)
Phase III: confirmatory diagnostic accuracy studies (reliable estimation of the diagnostic accuracy)
Phase IV: randomised diagnostic studies (evaluation of the effectiveness)
[EMA, 2010; Köbberling et al., 1990]

5 Comparison of index test versus reference standard
                      Reference standard
Index test            diseased               non-diseased           overall
positive result       true positives (TP)    false positives (FP)   TP + FP
negative result       false negatives (FN)   true negatives (TN)    FN + TN
overall               TP + FN                FP + TN                N
Primary endpoints: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)
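A minimal Python sketch of the two primary endpoints, computed from the cells of the 2x2 table. The Wilson score interval is our assumption (the slides name no interval method), and the counts in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity, each with a Wilson interval."""
    se = tp / (tp + fn)  # denominator: all diseased (TP + FN)
    sp = tn / (tn + fp)  # denominator: all non-diseased (FP + TN)
    return {
        "sensitivity": (se, wilson_ci(tp, tp + fn)),
        "specificity": (sp, wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts: 150 diseased, 450 non-diseased
result = accuracy(tp=136, fp=108, fn=14, tn=342)
print(result)
```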

6 Comparison of index test versus standard test (paired design)
Diseased individuals
                  Standard test
Index test        positive    negative    overall
positive          a           b           a + b
negative          c           d           c + d
overall           a + c       b + d       n_D
Δ sensitivity = [(a + b) − (a + c)] / n_D = (b − c) / n_D

Non-diseased individuals
                  Standard test
Index test        positive    negative    overall
positive          e           f           e + f
negative          g           h           g + h
overall           e + g       f + h       n_ND
Δ specificity = [(g + h) − (f + h)] / n_ND = (g − f) / n_ND
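The difference in sensitivities depends only on the discordant cells b and c. A common way to test it, assumed here because the slide names no test, is McNemar's test; the sketch uses the uncorrected statistic and hypothetical counts a = 40, b = 12, c = 3, d = 5. For specificity, the same function applies to the non-diseased table if a negative result is counted as a "success", i.e. called as paired_difference(h, g, f, e).

```python
from math import sqrt
from statistics import NormalDist


def paired_difference(a: int, b: int, c: int, d: int):
    """Paired 2x2 table in diseased patients.

    Rows: index test positive/negative; columns: standard test positive/negative.
    Returns (se_index, se_standard, delta, mcnemar_chi2, p_value).
    """
    n = a + b + c + d
    se_index = (a + b) / n
    se_standard = (a + c) / n
    delta = (b - c) / n  # identical to se_index - se_standard
    if b + c == 0:
        return se_index, se_standard, delta, 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)  # McNemar statistic, no continuity correction
    # Chi-square with 1 df: P(X > chi2) = 2 * (1 - Phi(sqrt(chi2)))
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
    return se_index, se_standard, delta, chi2, p_value


print(paired_difference(40, 12, 3, 5))
```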

7 Comparison of mass spectrometry (index test) versus biopsy (reference standard)
Condition: suspicion of acute rejection after renal transplantation (assumed prevalence = 25 %)
Assumptions: mass spectrometry: se = 91 %, sp = 76 %
Hypotheses: H0: se ≤ 0.83; H0: sp ≤ …
Sample size: 150 diseased (for se); N = 600 overall (150 / 0.25)
[Zapf et al., 2015]
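A sketch of how a diseased sample size of this order can arise, assuming a one-sided test at α = 2.5 % with 80 % power and the usual normal approximation for a single proportion. α and power are not stated on the slide, so this is a plausibility check, not the study's actual calculation. Dividing by the assumed prevalence of 25 % gives the overall N.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_sample(p0: float, p1: float, alpha: float = 0.025, power: float = 0.80) -> int:
    """Subjects needed to reject H0: p <= p0 when the true proportion is p1.

    One-sided test, normal approximation (assumed design parameters).
    """
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    num = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
    return ceil((num / (p1 - p0)) ** 2)


n_diseased = n_one_sample(0.83, 0.91)  # null threshold 0.83, assumed se 0.91
n_total = ceil(n_diseased / 0.25)      # inflate by the assumed prevalence of 25 %
print(n_diseased, n_total)
```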

8 Paired comparison of an autofluorescence device (Velscope) with the standard (white light), biopsy as reference standard
Condition: suspicion of oral cancer (assumed prevalence = 10 %)
Assumptions: white light: se = 50 %, sp = 100 %; Velscope: se = 100 %, sp = 96 %
Hypotheses: …: 0.2
Sample size: 15 diseased (for se), 115 non-diseased (for sp); N = 150
Results: N = 123, prevalence = 5 %; white light: se = 17 %, sp = 97 %; Velscope: se = 100 %, sp = 74 %
[Rana et al., 2012]
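How much the lower-than-assumed prevalence matters can be seen with a little arithmetic. Assuming the observed 5 % applies to all 123 patients, only about 6 diseased patients are available instead of the planned 15. If all of them test positive (Velscope se = 100 %), the exact Clopper-Pearson lower limit has a closed form; the choice of this interval is ours, not the study's.

```python
def cp_lower_all_positive(n: int, alpha: float = 0.05) -> float:
    """Exact (Clopper-Pearson) two-sided lower limit for se when all n diseased test positive."""
    # With x = n, P(X >= n | p) = p**n; setting p**n = alpha / 2 gives the limit
    return (alpha / 2) ** (1 / n)


planned_diseased = round(150 * 0.10)   # 15 under the planning prevalence of 10 %
observed_diseased = round(123 * 0.05)  # about 6 at the observed prevalence of 5 %

for n in (planned_diseased, observed_diseased):
    print(n, round(cp_lower_all_positive(n), 3))
```

With 15 diseased patients the lower limit would be about 0.78; with about 6 it drops to about 0.54.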

9 Assumptions: % appropriately treated: culture …; culture +: 10 % more
Sample size: α = 5 %, power = 80 %, N = 750 patients
Results: % appropriately treated: culture 75 %, culture + 67 %; OR = 1.44 (95 % CI = …)
[Holm et al., 2017]

10 Aim of the DFG project: development of flexible study designs for diagnostic studies

11 Structure: Background; Sources of error; Summary

12 Cutoff value: data-driven selection of the cutoff leads to overestimation of sensitivity and specificity [Leeflang et al., 2008]
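The optimism from data-driven cutoff selection can be reproduced with a small simulation. Assumptions (ours, not the design of Leeflang et al.): a binormal marker with equal variances and mean shift 1, 30 diseased and 30 non-diseased per sample, and a cutoff chosen to maximise the Youden index se + sp − 1 in each sample. The apparent Youden index at the selected cutoff is compared with its true value in the populations.

```python
import random
from statistics import NormalDist


def best_cutoff(diseased, healthy):
    """Data-driven cutoff maximising the sample Youden index (positive if x >= c)."""
    best_c, best_j = None, float("-inf")
    for c in sorted(set(diseased) | set(healthy)):
        se = sum(x >= c for x in diseased) / len(diseased)
        sp = sum(x < c for x in healthy) / len(healthy)
        if se + sp - 1 > best_j:
            best_c, best_j = c, se + sp - 1
    return best_c, best_j


def simulate(n=30, shift=1.0, reps=200, seed=1):
    rng = random.Random(seed)
    dist_d, dist_h = NormalDist(shift, 1.0), NormalDist(0.0, 1.0)
    apparent_sum = true_sum = 0.0
    for _ in range(reps):
        diseased = [rng.gauss(shift, 1.0) for _ in range(n)]
        healthy = [rng.gauss(0.0, 1.0) for _ in range(n)]
        c, j_hat = best_cutoff(diseased, healthy)
        apparent_sum += j_hat
        # True se + sp - 1 of the selected cutoff in the two populations
        true_sum += (1 - dist_d.cdf(c)) + dist_h.cdf(c) - 1
    return apparent_sum / reps, true_sum / reps


apparent_j, true_j = simulate()
print(f"apparent Youden {apparent_j:.3f} vs true {true_j:.3f}")
```

Validating the selected cutoff in independent data removes this optimism, at the cost of a larger sample.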

13 Comparison of SPCR and SACR versus reference standard (NICE criteria)
Condition: suspicion of severe pre-eclampsia (assumed prevalence = 5 %)
Assumptions: SPCR, SACR: se = 95 %, sp 'high'
Hypotheses: H0: se ≤ 0.90; hypothesis for sp: ???
Sample size calculation: α = 5 %, power = 80 %, 'negligible number' of missing reference standard results; 240 with severe PE, N = 3000 overall (? 240 / 0.05 = 4800 ?)
Interim analysis after 500 women: prevalence = 15.6 %, reference standard missing for 14 %
Consequences: surrogate reference standard; N = 1790
Results: SPCR, SACR: se = 94 %, sp = 57 %
[Waugh et al., 2017]
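The planning inconsistency and the interim re-estimation can both be checked with one formula. Assuming (the slide does not say so) that the total is the required number of diseased patients divided by the expected fraction of patients who are diseased and have a reference result, the planning values give 4800 rather than 3000, and the interim values (prevalence 15.6 %, 14 % missing) give 1789, close to the reported N = 1790. Neither input requires the index test results.

```python
from math import ceil


def required_total(n_diseased: int, prevalence: float, missing_ref: float = 0.0) -> int:
    """Patients to enrol so that n_diseased verified diseased patients are expected.

    Assumes the reference standard is missing completely at random at rate missing_ref.
    """
    expected_fraction = prevalence * (1 - missing_ref)
    # round() guards against floating-point noise such as 4800.000000000001
    return ceil(round(n_diseased / expected_fraction, 6))


planned = required_total(240, 0.05)               # planning assumptions
recalculated = required_total(240, 0.156, 0.14)   # interim estimates after 500 women
print(planned, recalculated)
```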

14 Biased accuracy estimators [Lijmer et al., 1999]

15 Missing values
Reference standard
Reasons: technical problems, not feasible, not ethical, …
Problem: complete-case analysis leads to biased results
Solutions: multiple imputation, sensitivity analyses, Begg and Greenes method, different reference standard
Index test
Reasons: technical problems, …
Problems: usefulness of the test? Ignoring or imputing non-evaluable results leads to biased results
Solutions: multinomial GLMM with a third category
[Reitsma et al., 2009; Alonzo et al., 2011; Begg et al., 1986; Pham and Schlattmann, 2017]
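The Begg and Greenes correction can be sketched in a few lines under its key assumption: whether the reference standard is obtained may depend on the index test result, but on nothing else related to disease status. P(disease | test result) is estimated from verified patients only, the distribution of test results from all patients, and Bayes' theorem recombines them. The counts are hypothetical.

```python
def begg_greenes(n_pos, n_neg, pos_d, pos_nd, neg_d, neg_nd):
    """Verification-bias-corrected sensitivity and specificity (Begg and Greenes).

    n_pos, n_neg:  all patients with a positive / negative index test
    pos_d, pos_nd: verified test-positive patients with / without disease
    neg_d, neg_nd: verified test-negative patients with / without disease
    Assumes verification depends only on the index test result.
    """
    p_pos = n_pos / (n_pos + n_neg)   # P(T+) from all patients
    p_neg = 1 - p_pos
    d_pos = pos_d / (pos_d + pos_nd)  # P(D | T+) from verified patients
    d_neg = neg_d / (neg_d + neg_nd)  # P(D | T-) from verified patients
    se = d_pos * p_pos / (d_pos * p_pos + d_neg * p_neg)
    sp = (1 - d_neg) * p_neg / ((1 - d_neg) * p_neg + (1 - d_pos) * p_pos)
    return se, sp


# Hypothetical: all 200 test positives verified, only 200 of 800 test negatives
se, sp = begg_greenes(200, 800, 160, 40, 20, 180)
naive_se = 160 / (160 + 20)  # complete-case estimates
naive_sp = 180 / (180 + 40)
print(round(se, 3), round(sp, 3), round(naive_se, 3), round(naive_sp, 3))
```

In this example the complete-case analysis overstates sensitivity (0.889 vs 0.667) and understates specificity (0.818 vs 0.947).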

16 Prevalence, non-inferiority margins, % discordant results, benefit-risk ratio
Prevalence: estimation from former studies / literature / databases
Prevalence in phase II vs. in phase III; prevalence in phase III vs. in phase IV???
Uncertainty: may change during the course of the study (better screening, comparators, therapy)
Non-inferiority margins: science-based
% discordant results: from former studies
Benefit-risk ratio: science-based
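The proportion of discordant results drives the sample size of a paired comparison. As an illustration, assuming Connor's (1987) normal-approximation formula for McNemar's test (our choice; the slides name no method), the number of diseased patients needed to detect a difference δ = 0.10 in sensitivity roughly doubles when the discordant proportion ψ rises from 0.2 to 0.4.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_paired(psi: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Diseased patients needed for a two-sided McNemar test (Connor's approximation).

    psi:   expected proportion of discordant pairs, (b + c) / n
    delta: expected difference in sensitivities, (b - c) / n
    """
    if not 0 < abs(delta) <= psi <= 1:
        raise ValueError("need 0 < |delta| <= psi <= 1")
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = z_a * sqrt(psi) + z_b * sqrt(psi - delta**2)
    return ceil(num**2 / delta**2)


for psi in (0.2, 0.3, 0.4):
    print(psi, n_paired(psi, 0.10))
```

A misjudged ψ at the planning stage therefore translates directly into an under- or over-powered study.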

17 Structure: Background; Sources of error; Summary

18 Adaptive study designs
Diagnostic accuracy studies
  Blinded interim analysis: adaptations regarding prevalence, % discordant results, % missing values of the reference standard; external: non-inferiority margins, cutoff value
  Unblinded interim analysis: adaptations regarding estimated accuracy, target population, comparator, hypotheses
Randomized diagnostic studies
  Blinded interim analysis: adaptations regarding diagnostic accuracy or benefit-risk ratio (external)
  Unblinded interim analysis: adaptations regarding proportion of TP, TN, FP, FN or benefit-risk ratio (internal)
Adaptive seamless designs: adaptations regarding study design

19 References I
Alonzo et al. (2011). Bias in estimating accuracy of a binary screening test with differential disease verification. Stat Med, 30(15).
Begg et al. (1986). The influence of uninterpretability on the assessment of diagnostic tests. J Chronic Dis, 39(8).
Chen et al. (2014). Biomarker adaptive designs in clinical trials. TCR, 3(3).
EMA (2010). Guideline on clinical evaluation of diagnostic agents. Doc. Ref. CPMP/EWP/1119/98/Rev.1. …/document_library/Scientific_guideline/2009/09/WC….pdf (date of last access 01/11/17).
Holm et al. (2017). Effect of point-of-care susceptibility testing in general practice on appropriate prescription of antibiotics for patients with uncomplicated urinary tract infection: a diagnostic randomised controlled trial. BMJ Open, 7:e….
Köbberling et al., Ed. (1990). Memorandum for the Evaluation of Diagnostic Measures. J Clin Chem Clin Biochem, 28(12).
Leeflang et al. (2008). Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clin Chem, 54(4).

20 References II
Lijmer et al. (1999). Empirical evidence of design-related bias in studies of diagnostic tests. JAMA, 282.
Pham and Schlattmann (2017). Comparison of the multinomial generalized linear mixed model with three alternative generalized mixed models in the meta-analysis of diagnostic accuracy trials with non-evaluable index test results. GMDS abstract.
Rana et al. (2012). Clinical evaluation of an autofluorescence diagnostic device for oral cancer detection: a prospective randomized diagnostic study. Eur J Cancer Prev, 21(5).
Reitsma et al. (2009). A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidem, 62.
Waugh et al. (2017). Spot protein-creatinine ratio and spot albumin-creatinine ratio in the assessment of pre-eclampsia: a diagnostic accuracy study with decision-analytic model-based economic evaluation and acceptability analysis. Health Technology Assessment, 21(61):1-90.
Zapf et al. (2015). Non-invasive diagnosis of acute rejection in renal transplant patients using mass spectrometry of urine samples: a multicentre phase 3 diagnostic accuracy study. BMC Nephrol, 16:153.

21 Appendix

22-24 Design-related bias [Rutjes et al., 2006, CMAJ; 174(3)]