Designing and Validating a Multicolor Flow Cytometry Assay Brent Wood MD PhD Department of Laboratory Medicine University of Washington
Specimen Handling Sample Requirements 5 ml Peripheral blood (EDTA, Heparin, ACD) 1-2 ml Bone marrow aspirate (EDTA, Heparin) Ideally 1st pull to limit hemodilution Body Fluids Tissues in RPMI (1 cm³) Bone marrow biopsy Lymph node biopsies Tissue biopsies (GI, Skin, etc.)
Specimen Transport Rapid delivery to lab is important Assays require viable cells Deterioration is significant after ~48 hours. Cells more stable in heparin than EDTA Heparin commonly used for reference testing Refrigeration retards degradation Heating/Cooling cycles probably not good Useful for storage when transport is delayed Prolonged transport leads to poor viability
Specimen Transport Australian COG B-ALL MRD sample - 4 days old Seattle MRD = 0.015% Local MRD = 0.13% HP12-05174
Transfix [Figure: CD14 PE-Cy5.5 vs. CD16 APC-A700 on myelomonocytic cells. Panels: Day 0, Day 3, Day 3 - Transfix RT; Day 0, Day 4, Day 4 - Transfix 4C]
Specimen Processing
Principle
  Assay as rapidly as possible with minimal manipulation and stabilize early
Method
  Stain / lyse / wash
    NH4Cl + 0.25% formaldehyde
    FACSlyse (some loss of compromised cells)
    Versalyse + formaldehyde
  Pre-lysis or Bulk lysis
    Facilitates plasma removal
    Used to concentrate cells
    Cell recovery reduced
    Activates some cell populations (monocytes)
Assay Design and Validation Purpose - Why? Design - What? Verification - What? Validation - What? Implementation - How? Documentation - If it isn't documented, it didn't happen
Assay Life Cycle Design Validation Plan Execute Validation Failure Validation Report Implementation Plan Yes Revision? No Retirement
Define Purpose of Assay Most important question What information is required? What information is optional? What information is most important? Prioritize Compromises are inevitable Simplest assay is best Less likely to fail Easier to maintain
Assay Design Target Identification Population(s) Minimal antigens required to obtain information Redundancy? Fluorochrome matching Intensity and background Instrument capabilities Intrinsic expression level of target Reagent availability Reagent performance Fluorochrome performance Conjugate performance Antibody titration Compensation effects Specimen processing Lyse/stain vs. Stain/lyse vs. No lyse Wash vs. No wash Verification Iterative process
Example - Progenitor evaluation Objective Evaluate novel antigen expression on progenitor subpopulations of defined lineages Antigens to be evaluated: Progenitor gating - CD45 Early progenitors - CD34, CD117, CD38 Erythroid - CD71 B cell - CD19 pdcs, basophils - CD123, HLA-DR Myelomonocytic - CD15, HLA-DR Test reagents - CD45RA, CD133, CD7, etc.
Possibilities
Fluorochromes: PB / V450, FITC, PE, PE-TR, P-X, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Possibilities - Availability
Fluorochromes: PB / V450, FITC, PE, PE-TR, P-X, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Fluorochrome Matching Match level of expression with fluorochrome intensity Bright expression = Dim fluorochrome Dim expression = Bright fluorochrome Note increased background from other fluorochromes
Fluorochrome Intensity Small molecule Phycobiliproteins and tandems CD4 FITC = Green PE = Dark Blue PE-TR = Light blue PerCP-Cy5.5 = Magenta PE-Cy7 = Orange Pacific Blue = Red A594 = Yellow green APC = Light green A700 = Blue APC-Cy7 = Purple
CD34
Surface Antigen Titration 2 ul + 5 ul + 10 ul Titer for signal to noise Saturation desirable Use same total volume as assay Use 5 ul
Cytoplasmic Antigen Titration 10 ul, 5 ul, 2 ul, 1ul of neat Titer for signal to noise Not saturation Particularly important for cytoplasmic antigens 10 ul, 5 ul, 2 ul, 1ul of 1:10 Use 5 ul of 1:10 Use 10 ul of 1:10
Possibilities
Fluorochromes: PB / V450, FITC, PE, PE-TR, P-X, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Compensation Spectral overlap between fluorochromes Critical to success of method For 10 color experiment Need to determine 90 values for Comp Matrix Software compensation required Maximum flexibility Non-destructive
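The figure of 90 values follows from the size of the spillover matrix: an n-color panel has n x n entries, and the n diagonal entries are fixed at 100%, leaving n(n-1) to determine. The sketch below checks that count and shows, for a two-color case only, how software compensation solves for the true signals. The function names and the spillover fractions used in the usage note are hypothetical and chosen for illustration; they are not taken from the slides or from any instrument software.

```python
def n_compensation_values(n_colors: int) -> int:
    """Number of off-diagonal spillover values in an n x n compensation matrix."""
    return n_colors * (n_colors - 1)


def compensate_two_color(observed, spill_ab, spill_ba):
    """Recover true signals (a, b) from uncompensated detector values.

    Assumed linear model (an illustration, not a vendor algorithm):
        obs_a = a + spill_ba * b
        obs_b = spill_ab * a + b
    spill_ab is the fraction of fluorochrome A signal seen in detector B,
    spill_ba the fraction of fluorochrome B signal seen in detector A.
    """
    obs_a, obs_b = observed
    det = 1.0 - spill_ab * spill_ba
    a = (obs_a - spill_ba * obs_b) / det
    b = (obs_b - spill_ab * obs_a) / det
    return a, b
```

For example, a cell stained only with fluorochrome A at a true intensity of 1000, with a hypothetical 20% spill into detector B, is observed as (1000, 200); `compensate_two_color((1000, 200), 0.20, 0.01)` returns approximately (1000, 0). Because the raw data are left unchanged and the correction is recomputed from the matrix, the operation is non-destructive.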
FITC = Green PE = Orange Excitation = Dotted Emission = Solid
Compensation - Method Single stained controls used One for each individual fluorochrome One for each individual tandem As bright as brightest reagent to be used Samples run without compensation Compensation calculated in software Applied either at acquisition or analysis
Compensation Correct Undercompensated Overcompensated
Compensation
Compensation
Compensation Don't worry unduly about PMT voltage 245% 103% 47.5% 23.0% 12.2% 1.9% 4.7% 10.3% 21.0% 41.0% PE = 400 volts PE-TR = 550 volts 450 volts 500 volts 550 volts 600 volts Compensation values should reflect relative spectral overlap, i.e. detector gains should be equal
Compensation Validation Fluorescence minus one (FMO) control Removal of one reagent from the antibody combination
Compensation Validation Fluorescence minus one FMO control for PE Intensity of PE signal should be reduced to background
Compensation Background
Avoid increased background due to fluorochromes
  Adjacent with longer wavelength emission
    PE / PE-TR, PE-TR / PE-Cy5, PE-Cy5.5 or PerCP-Cy5.5 / PE-Cy7
    APC / APC-A700, APC-A700 / APC-Cy7
  Primary fluorochrome of tandem
    PE and PE-TR, PE-Cy5, PE-Cy5.5, or PE-Cy7
    APC and APC-A700 or APC-Cy7
  Interlaser excitation and emission
    PE-Cy5 and APC
    PE-Cy5.5 or PerCP-Cy5.5 and APC-A700
    PE-Cy7 and APC-Cy7
    PE-TR and A594
Adjacent fluorochromes Fluorochromes cause increased background that is dependent on emission spectra and signal intensity
Primary of Tandems Antibody Aggregates 10.5% 1.6% Tandem leakage causes increased background
Interlaser compensation Uncompensated Compensated Interlaser compensation is dependent on excitation and emission spectra of fluorochrome
Strategies to deal with compensation background Avoid detection of dim expression in presence of high background Avoid bright fluorescence Put fluorochromes on different populations Put fluorochromes brightly on same population
Avoid bright fluorescence
Different populations
Bright dual positive
Possibilities
Fluorochromes: PB / V450, FITC, PE, PE-TR, P-X, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Possibilities
Fluorochromes: PB / V450, FITC, PE, PE-TR, PE-Cy5, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Final
Fluorochromes: PB / V450, FITC, PE, PE-TR, PE-Cy5, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Compromises
Fluorochromes: PB / V450, FITC, PE, PE-TR, PE-Cy5, PE-Cy7, A594, APC, APC-A700, APC-X7
Antigens: DR, CD15, CD19, CD123, CD117, CD38, CD34, CD71, CD45, X
Compensation Compromises 09-13546
PE background 09-15566
Verification Fluorescence minus one controls Reveal antibody interactions Avoid IgG2 class unless plasma removed Understand background Troubleshooting Run samples Under same conditions as to be used Representative samples Positive and negative controls
Validation Objective Document, through the use of specific laboratory investigations, that the performance characteristics of a method are suitable and reliable for the intended analytical applications. Two main components Validation Plan Define validation approach and methods Define acceptance criteria Review and sign prior to execution Validation Report Assess if acceptance criteria met Identify deviations in performance Review and sign at completion
Categories of Methods
Category: Definition
  Quantitative: Calibration standard, Reference material
  Relative Quantitative: Calibration standard, Reference material not representative
  Quasi-Quantitative: No calibration standard, continuous numeric data
  Qualitative: Lacks proportionality, categorical data reported
Flow cytometry
  Quantitative (MESF) = Relative quantitative
  Cell enumeration (CD34, T cell subsets, MRD) = Quasi-quantitative
  Immunophenotyping = Qualitative
Wood B, et al (2013) Validation of Cell-based Fluorescence Assays: Practice Guidelines from the ICSH and ICCS - Part V - Assay performance criteria. Cytometry Part B 84B:315-323.
Validation Plan Methodology Accuracy Specificity Sensitivity Limit of Detection (LOD) Limit of Quantitation (LOQ) Linearity Imprecision Within Run Between Run Intermediate (analysts, equipment) Carryover Robustness
Accuracy
Quasi-Quantitative Methods Accuracy Assay average vs. reference value No reference material for cell-based assays Cannot determine accuracy Surrogates Reference method Rare Differences may reflect methodologies Inter-laboratory comparison Samples from patients with known condition Assay must be uniquely diagnostic Recommendation Minimum 10 samples, 90% concordance
Accuracy vs. Precision
Quasi-Quantitative Methods Imprecision
Intra-assay
  Use sample of same composition as to be assayed
  Imprecision related to number of events
    % CV = 100 x SQRT(N) / N, where N = number of events
  Recommendation
    5 samples run in triplicate on same run
    Span range for analytical decisions
    Use % CV as the acceptance criterion
      Target of 10% good for many assays
Inter-assay
  Sample stability is major issue
    May use stabilized control material, if available and appropriate
    May perform on same day if instrument powered down
  Recommendation
    2-3 levels of material assayed in triplicate over 3-5 runs
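The relationship between event count and imprecision comes from counting (Poisson) statistics: the standard deviation of a count N is SQRT(N), so the coefficient of variation is SQRT(N) / N, or 100 / SQRT(N) when expressed as a percentage. A minimal sketch, with hypothetical function names, of the forward calculation and of the number of events needed to reach a target % CV:

```python
import math


def poisson_cv_percent(n_events: int) -> float:
    """Counting-statistics %CV for a population of n_events cells."""
    return 100.0 * math.sqrt(n_events) / n_events


def events_needed(target_cv_percent: float) -> int:
    """Smallest event count whose counting %CV does not exceed the target."""
    return math.ceil((100.0 / target_cv_percent) ** 2)
```

Under this model a 10% CV target requires at least 100 events in the population of interest, and a 5% target requires 400. This is only the floor set by counting statistics; measured imprecision also includes processing and instrument variation.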
Quasi-Quantitative Methods Specificity Analytical Reagent specificity Provided by manufacturer Based on Leukocyte Antigen Differentiation Workshop Lab developed reagents require supporting data Gating specificity Clinical Distinguish population of interest Correlation with clinical situation of interest and not others Recommendation Assay material containing population of interest and other populations expected to be in sample
Quasi-Quantitative Methods Sensitivity
Analytical (LOB, LOD)
  Limit of Blank = Highest signal expected in absence of measurand
    LOB = Mean (blank) + 1.645 SD (blank)
  Limit of Detection = Level at which 95% of signal is above LOB
    LOD = LOB + 1.645 SD (low positive)
Functional (LLOQ)
  Lower Limit of Quantitation
  Principle
    LOD with total error (bias + imprecision) meeting clinical criteria
    May be >= LOD, but never lower
  Replicate assay of samples with dim fluorescence or low population frequency
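The two formulas above can be written out directly. The sketch below assumes the sample standard deviation (n - 1 denominator), which the slide does not specify, and uses hypothetical function names; the input lists would be replicate measurements of blank and low positive samples.

```python
import statistics


def limit_of_blank(blank_values):
    """LOB = mean(blank) + 1.645 * SD(blank)."""
    return statistics.mean(blank_values) + 1.645 * statistics.stdev(blank_values)


def limit_of_detection(blank_values, low_positive_values):
    """LOD = LOB + 1.645 * SD(low positive)."""
    return limit_of_blank(blank_values) + 1.645 * statistics.stdev(low_positive_values)
```

The factor 1.645 is the one-sided 95% point of a normal distribution, so both estimates assume roughly normally distributed replicates. With illustrative (invented) blank values of 0.001, 0.002, 0.003 and low positive values of 0.010, 0.012, 0.014, the LOB is about 0.0036 and the LOD about 0.0069.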
Sensitivity
Quasi-Quantitative Methods Sensitivity (analytical) Ideal (soluble analytes) 100 low positive and 60 negative samples X samples assayed X times over X days, X = # samples / 10 Not realistic due to sample availability and reagent cost Recommendation Listmode file consists of many individual measurements 5 low positive and 5 negative samples run in 5 replicates Assay over minimum of 3 days Effect of daily QC and instrument start up LOB confirmed if < 5% exceed target LOD confirmed if < 5% below target
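The two confirmation rules can be expressed as simple proportion checks over the replicate measurements (5 samples x 5 replicates gives 25 values of each type). This is a sketch under one reading of the slide: "target" is taken to be the claimed LOB for blank measurements and the claimed LOD for low positive measurements. The function names and the default fraction argument are hypothetical.

```python
def lob_confirmed(blank_values, lob_target, max_fraction=0.05):
    """True if fewer than max_fraction of blank measurements exceed the target."""
    exceed = sum(1 for v in blank_values if v > lob_target)
    return exceed / len(blank_values) < max_fraction


def lod_confirmed(low_positive_values, lod_target, max_fraction=0.05):
    """True if fewer than max_fraction of low positive measurements fall below the target."""
    below = sum(1 for v in low_positive_values if v < lod_target)
    return below / len(low_positive_values) < max_fraction
```

With 25 measurements, one outlier (4%) still confirms the limit, while two outliers (8%) do not.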
Quasi-Quantitative Methods Sensitivity (analytical) Alternate methods for LOB FMO control Does not take into account non-specific reagent binding Isotype control Accounts for isotype-mediated non-specific binding Must be carefully matched to binding characteristics of reagent Known negative populations Must be matched for autofluorescence and background binding Alternate method for LOD Each file consists of numerous measurements Minimum 1 low positive sample + 1 negative sample Only relevant for antigen intensity measurements 5 samples of each type will improve confidence
Quasi-Quantitative Methods Sensitivity (clinical) Definitions (LLOQ = bias + imprecision) Bias = difference of mean from true value Total error = Bias + 2SD For cell assays assume Bias = 0 and use %CV Ideal 40 replicates of 3-5 samples over 3 days Recommendation Assay 5 replicates near LLOQ Acceptable if imprecision meets acceptance criterion Serial dilution to create samples with low population frequency near LLOQ Dilution with unlabeled antibody to create low level intensity
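The definitions above reduce to a short calculation. The sketch below computes total error as bias + 2SD (taking the absolute value of the bias, which is an assumption) and then applies the cell-assay simplification of bias = 0 by checking replicate % CV against an acceptance criterion. Function names are hypothetical, and the acceptance limit is whatever the validation plan specifies.

```python
import statistics


def total_error(bias: float, sd: float) -> float:
    """Total error = |bias| + 2 * SD."""
    return abs(bias) + 2.0 * sd


def cv_percent(replicates) -> float:
    """Percent coefficient of variation of replicate measurements (sample SD)."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def lloq_acceptable(replicates, max_cv_percent: float) -> bool:
    """With bias assumed to be 0, accept the level if %CV meets the criterion."""
    return cv_percent(replicates) <= max_cv_percent
```

For five invented replicates of 0.010, 0.011, 0.009, 0.010 and 0.010 (percent of white cells), the % CV is about 7.1, which would pass a 10% criterion and fail a 5% criterion.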
Linearity
Quasi-Quantitative Methods Linearity Not directly applicable for quasi-quantitative assays Exceptions Enumeration of population frequency Quantitation of fluorescence intensity Recommendations Assess instrument linearity semi-annually Enumeration of population frequency Serial dilution of positive sample into negative background Can be performed as part of LLOQ experiment Quantitation of fluorescence intensity Serial dilution with increasing unlabeled antibody Can use calibrated fluorescence beads Must confirm equivalence of emission spectrum and environmental influences
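For a serial dilution of a positive sample into negative background, linearity is commonly summarized by regressing observed frequency on expected frequency and reporting the slope and R². A minimal least-squares sketch in plain Python follows; the function name is hypothetical, and the slides do not say whether the regression was done on raw or log-transformed values.

```python
def linear_fit(expected, observed):
    """Ordinary least-squares fit of observed on expected.

    Returns (slope, intercept, r_squared).
    """
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

A slope near 1 with R² near 1 indicates that the measured frequency tracks the dilution. Note that an unweighted fit on raw values is dominated by the highest dilution level, so low-level performance should also be inspected directly.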
Quasi-Quantitative Methods Stability Determine stability of fresh specimen, processed specimen, reagents Recommendations Fresh specimen stability 5 healthy or disease state specimens Assay at baseline and at intervals to desired stability limit Must perform for each processing or storage condition Accept < 20% variation from baseline or 80% within assay imprecision Processed specimen stability Same as fresh specimens Reagent stability Stability data for at least 3 lots of reagent Inter-lot CVs < 10% Validate under conditions to be used
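The first acceptance criterion (less than 20% variation from baseline) can be applied to a time course as follows. This sketch implements only that criterion, not the alternative based on assay imprecision, and the function names and example values are hypothetical.

```python
def percent_change(baseline: float, value: float) -> float:
    """Signed percent change of a measurement relative to baseline."""
    return 100.0 * (value - baseline) / baseline


def stability_limit(timepoints, values, max_change_percent=20.0):
    """Last time point before the first one that deviates by >= the limit.

    values[0] is the baseline measurement taken at timepoints[0].
    """
    baseline = values[0]
    limit = timepoints[0]
    for t, v in zip(timepoints[1:], values[1:]):
        if abs(percent_change(baseline, v)) >= max_change_percent:
            break
        limit = t
    return limit
```

For an invented time course of 100, 98, 95, 85 and 70 measured on days 0 through 4, the changes are -2%, -5%, -15% and -30%, so the stability limit is day 3. The calculation is repeated for each specimen and for each processing or storage condition.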
Quasi-Quantitative Methods Carryover More an instrument rather than assay performance issue Recommendation Sequentially measure specimens in a low-high-low sequence Beads can substitute in part for cells, but may not be as sticky Sporadic release of accumulated cellular material very difficult to assess Consistent acquisition procedures best prevention Run water/buffer between tubes and samples until background is low Periodic use of cleaning solutions, e.g. bleach, detergent, etc.
Quasi-Quantitative Methods Robustness A measure of the assay capacity to remain unaffected by small but deliberate changes in test conditions. Robustness provides an indication of the ability of the assay to perform under normal usage. Robustness measures the effect of deliberate changes (incubation time, temperature, sample preparation, buffer pH) that can be controlled through specifications in the assay protocol. Recommendation No recommendation Often not assessed Ruggedness The reproducibility of the assay under a variety of normal, but variable, test conditions. Variable conditions might include different machines, operators, and reagent lots. Ruggedness provides an estimate of experimental reproducibility with unavoidable error. Also called Intermediate Imprecision when within laboratory.
Example AML MRD Assay Three tube assay for AML MRD detection Reference method = Serial dilution and Clinical outcome Sensitivity + Imprecision + Linearity Serial dilution of 3 samples (2 BM, 1 PB): 0%, 0.01%, 0.1%, 1% Each positive run in triplicate Accuracy Serial dilution of known leukemia, see above. Clinical outcome Specificity 3 negative normal marrows + 3 MRD positive samples Stability Same as those used for sensitivity assessment 3 samples run at baseline and daily for 3 days
Results AML MRD
Specificity
  3 MRD negative marrows confirmed assay negative
  3 MRD positive marrows confirmed assay positive
Sensitivity
  LOB = 0%, no false positives (0.009% LAIP)
  LOD = 0.003%
  LLOQ = LOD
Imprecision
  Within run % CV = 3.5% at 1%, 6.6% at 0.1%, 6.4% at 0.01%
  Between run % CV = 3.7% at 1%, 6.9% at 0.1%, 4.2% at 0.01%
Linearity
  Mean R² = 0.999897, Slope mean = 0.968
Accuracy
  Concordance on 100%
Stability
  15% deviation after 4 days, 2 days judged to be stability limit
Example AML CD123 Assay Two tube assay for AML CD123 quantitation + Quantibrite beads Reference method = None Accuracy No reference materials Sensitivity + Imprecision (within run) 3 PB + 3 BM AML run in 5 replicates (% and ABC) FMO used to estimate LOB, CD123+ blasts (low) used to estimate LOD Imprecision + Specificity Between run: Immunotrol on 5 sequential days (Lymphs=Neg, Mono=dim, pdc=bright) Linearity Quantibrite beads + CS&T instrument linearity Stability See Imprecision
Results AML CD123 Accuracy Not done Specificity Lymphocyte confirmed negative, Monocytes dim positive, pdc bright Sensitivity LOB = 406 ABC BM and 242 PB, Median = 347 LOD = 433 ABC BM and 351 PB, Median = 439 LLOQ = LOD Imprecision CD123 % = 0.58 % CV BM, 0.18 % CV BM CD123 ABC = 1.48 % CV BM, 1.02 % CV BM Linearity Instrument linearity confirmed Stability Confirmed assay stability over 5 days
Qualitative Methods Accuracy Correlation with clinical or laboratory findings Recommendation 20 normal and 20 abnormal samples assayed Spiked specimens are acceptable for rare specimen types Specificity Analytical, same as for quasi-quantitative assays Clinical, specificity = TN / (TN + FP) Can determine at same time as accuracy Sensitivity Ability to recognize finding above background Highly variable depending on assay Sensitivity = TP / (TP + FN)
Qualitative Methods
Imprecision
  Recommendation
    Assay 3 replicates each of positive and negative sample
Linearity
  Not relevant
Stability
  Recommendation
    Concordance on 4 of 5 samples (80%) at each time point or condition
Example Mast cell Disease Assay Two tube assay for mast cell disease diagnosis Reference method = morphology + clinical findings Accuracy Run all samples and compare with morphology + clinical findings until 10 true positives Specificity Specificity = TN / (TN + FP) Sensitivity Sensitivity = TP / (TP + FN) Precision 5 positive and 5 negative samples run in triplicate Stability 5 positive and 5 negative samples run at baseline and daily for 3 days
Example Mast cell Disease
Accuracy
  108 samples assayed over ~9 months, results compared with morphology + clinical findings
Specificity + Sensitivity
                         Flow +   Flow -
  Mast Cell Disease +      10        0
  Mast Cell Disease -      17       81
  Sensitivity 100%, Specificity 83%, PPV 37%, NPV 100%
Stability
  40% decrease in # mast cells after 24-48 hours, slight decrease in CD2 & CD25
  No increase in CD2 or CD25 on normal mast cells
Precision (%CV)
  Positive: CD117+ = 11.5%, CD2+ = 15.5%, CD25+ = 2.6%
  Negative: CD117+ = 7.5%, CD2+ = 50.4%, CD25+ = 52.2%
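The reported sensitivity, specificity, PPV and NPV can be reproduced from the 2 x 2 counts on this slide (10 true positives, 0 false negatives, 17 false positives, 81 true negatives; 108 samples in total). The sensitivity and specificity formulas are those given in the deck; PPV = TP / (TP + FP) and NPV = TN / (TN + FN) are the standard definitions. The function name is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from 2 x 2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from the mast cell disease validation slide.
metrics = diagnostic_metrics(tp=10, fp=17, fn=0, tn=81)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The low PPV reflects the 17 flow-positive cases without mast cell disease, which is what prompted the closer look at CD2 and CD25 expression after therapy.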
Example Mast cell Disease
Mastocytosis (N=10)
               % Mast cells   % CD2+      % CD25+
  Range        0.03-37.5      10.2-34.2   62-97.9
  Average      4.0            16.0        85.9
  % cases +                   70          100
  70% + both
Not mastocytosis false positives (N=17)
               % Mast cells   % CD2+      % CD25+
  Range        0.02-0.24      12.1-98.8   22.2-95.7
  Average      0.07           46.7        46.4
  % cases +                   18          88
  6% + both
  All cases have history of treatment for a hematopoietic neoplasm
Mast Cell Disease Discovered CD2 and CD25 expression not specific Present following therapy for hematopoietic neoplasms Sensitivity and Specificity = 100% Using 10% threshold for positivity Accounting for prior therapy Assessment of CD2 is unnecessary CD25 alone detects all positive cases Mast cells are progressively lost after 48 hours Precision is within expectation for level of expression Cherian, et al (2016) Expression of CD2 and CD25 on mast cell populations can be seen outside the setting of systemic mastocytosis. Cytometry B Clin Cytom. 90(4):387-92. doi: 10.1002/cyto.b.21336
Conclusions Assay validation is critical Assay validation requires careful experimental design Assay validation in flow cytometry is often inadequate Laboratories with insufficient resources to validate an assay should not perform it
Example B-ALL MRD Assay Single tube assay for B-ALL MRD detection Reference method = COG assay Accuracy 5 positive and 5 negative MRD samples sent to Johns Hopkins University Specificity 10 negative normal pediatric marrows + 10 MRD positive samples 5 negatives were those used for accuracy assessment Sensitivity + Imprecision + Linearity Serial dilution of 5 samples: 0%, 0.01%, 0.1%, 1% Each run in triplicate on 5 different days Stability 1 sample run at baseline and daily for 5 days
Results B-ALL MRD
Accuracy
  Concordance on 90% (1 false positive)
Specificity
  10 MRD negative marrows confirmed assay negative
  10 MRD positive marrows confirmed assay positive
Sensitivity
  LOB = 0%, no false positives
  LOD = 0.007%
  LLOQ = LOD
Imprecision
  % CV = 2.0% at 1%, 5.0% at 0.1%, 11.0% at 0.01%
Linearity
  R² = 0.999946
Stability
  5% deviation after 5 days using WBC denominator
  16% deviation after 4 days using mononuclear cell denominator