Flow Cytometry Method Validation Protocols


Nithianandan Selliah,1 Steven Eck,2 Cherie Green,3 Teri Oldaker,4 Jennifer Stewart,5 Alessandra Vitaliti,6 and Virginia Litwin7,8

1 Covance Central Laboratories, Indianapolis, Indiana
2 MedImmune, Gaithersburg, Maryland
3 Genentech, Inc., A Member of the Roche Group, Development Sciences Department, South San Francisco, California
4 Independent Consultant, San Clemente, California
5 Flow Contract Site Laboratory, LLC, Bothell, Washington
6 Novartis Institute for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
7 Caprion Biosciences, Montreal, Quebec, Canada
8 Corresponding author: vlitwin@caprion.com

Analytical method validation provides a means to ensure that data are credible and reproducible. This unit provides a brief introduction to analytical method validation as applied to cellular analysis by flow cytometry, along with practical procedures for three different types of validation. The first is a limited validation protocol applicable to research settings and non-regulated laboratories. The second is a validation protocol that presents the minimum validation requirements for regulated laboratories. The third is a transfer validation protocol to be used when methods are transferred between laboratories. The recommendations presented in this unit are consistent with the white papers published by the American Association of Pharmaceutical Scientists and the International Clinical Cytometry Society, as well as with Clinical Laboratory Standards Institute Guideline H62: Validation of Assays Performed by Flow Cytometry (currently in preparation). © 2018 by John Wiley & Sons, Inc.

Keywords: biomarker, fit-for-purpose validation, precision, sensitivity, stability

How to cite this article: Selliah, N., Eck, S., Green, C., Oldaker, T., Stewart, J., Vitaliti, A., & Litwin, V. (2019). Flow cytometry method validation protocols, 87, e53.
doi: /cpcy.53

INTRODUCTION

The classic definition of analytical method validation is the confirmation by examination and the provision of objective evidence that the particular specifications for an intended use are fulfilled (CFR Title 21; see Internet Resources). More simply stated, method validation is conducted to characterize assay performance in order to provide a framework for interpreting the data. Unless the investigator understands the inherent variation and limitations of a method, it is not possible to accurately interpret the data or transfer methods from one laboratory to another. For example, intra-assay precision (repeatability) involves assaying a sample several times in one analytical run and then determining the coefficient of variation (CV) of the replicates. Once the intra-assay precision/imprecision is established, the researcher can conclude that differences greater than the intra-assay imprecision are meaningful and not due to noise in the system. This information is also useful when transferring assays between laboratories, instruments, or analysts, as it provides a benchmark for comparing assay performance in each laboratory.

Table 1 Validation Parameters Achievable in Flow Cytometry

Validation parameters which can be addressed with flow cytometric methods: specificity; precision/robustness; sensitivity; limit of detection; limit of quantification; stability (specimen stability, processed sample stability); reference intervals.

Validation parameters which can sometimes be addressed with flow cytometric methods: linearity; standard calibrators; interference (matrix, drug).

Validation parameters which cannot be addressed with flow cytometric methods: accuracy; selectivity; range of quantification; incurred sample reanalysis; normal signal distribution; prozone effect.

(a) Because of the complexity of the technology, the lack of qualified reference materials, and the bioanalytical category of the data, not every validation parameter can be assessed for flow cytometric methods.

During the process of analytical method validation, a variety of assay performance characteristics may be evaluated: accuracy, specificity, precision (repeatability/robustness), sensitivity (limits of detection/quantification), stability, reference ranges, linearity, and interference (matrix, drug) (Table 1). The intended use of the data will dictate which validation parameters need to be evaluated. If, during the lifetime of the assay, the intended use of the data changes, additional validation parameters can be addressed at that time. This practical, iterative approach is referred to as fit-for-purpose validation (Lee et al., 2006). Owing to the complexity of the technology, the lack of qualified reference materials, and the bioanalytical category of the data, not every validation parameter can be assessed for flow cytometric methods (O'Hara et al., 2011; Wood et al., 2013).
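As an illustration of the repeatability calculation described above, the following is a minimal sketch in Python; the replicate values are hypothetical:

```python
import statistics

def intra_assay_cv(replicates):
    """Percent CV of replicate results from one analytical run."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation
    return 100 * sd / mean

# Hypothetical %CD19+ results from six replicates of one sample
replicates = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
cv = intra_assay_cv(replicates)
print(f"intra-assay CV = {cv:.1f}%")
```

Differences between test samples that are smaller than this CV cannot be distinguished from assay noise.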
Assay specificity, sensitivity (limits of detection/quantification), precision (repeatability/robustness), stability, and reference ranges can always be validated for flow cytometric methods. Linearity, standard calibration, and interference (matrix, drug) can sometimes be validated, but accuracy, selectivity, range of quantification, incurred sample reanalysis, normal signal distribution, and prozone effect cannot normally be directly evaluated for flow cytometric methods.

Accuracy (or trueness) is the closeness of the agreement between a measured result and the true value. Data obtained from flow cytometry are considered quasi-quantitative because the numerical result generated from the test sample is proportional to the analyte in the sample but is not derived from a calibrator or reference material; thus, true accuracy is not determined during flow cytometry method validation (Lee et al., 2006) (Table 2). The inability to demonstrate accuracy is the most controversial aspect of cell-based assay validation. The enumeration of CD19-positive B cells provides an illustration of this point. The simplest way to measure these cells in whole blood is with CD45 (for lymphocyte

Table 2 Bioanalytical Data Categories

Definitive quantitative
- Is there a calibration curve? Yes. The reference standards are well defined and fully representative of the endogenous analyte.
- What type of results are generated? Continuous numeric results generated from a definitive standard curve.
- How can the data be used? To determine absolute quantitative values for unknown samples.
- How is accuracy demonstrated in validation? True accuracy is demonstrated by spike/recovery experiments of well-defined standards into matrix samples.
- Examples: pharmacokinetic data.

Relative quantitative
- Is there a calibration curve? Yes. Although there are reference standards, they are not fully representative of the endogenous analyte.
- What type of results are generated? Continuous numeric results generated from a relative standard curve.
- How can the data be used? Not to define absolute concentrations; the data are used for comparison of various conditions, such as tracking temporal changes in concentration values.
- How is accuracy demonstrated in validation? Relative accuracy is demonstrated by spike/recovery experiments of well-defined standards into matrix samples.
- Examples: cytokine enzyme immunoassays.

Quasi-quantitative
- Is there a calibration curve? No. Data are not generated from a calibration curve.
- What type of results are generated? Numeric results expressed in terms of a characteristic of the test sample.
- How can the data be used? Not to define absolute concentrations; the data are used for comparison of various conditions, such as tracking temporal changes.
- How is accuracy demonstrated in validation? Owing to the lack of reference standards, traditional accuracy validation cannot be demonstrated. As an alternative to true accuracy measurements, many labs will compare their method to a gold-standard method or exchange samples with another laboratory (Wood et al., 2013; Oldaker, 2018).
- Examples: flow cytometric methods where population frequencies are reported.

Qualitative
- Is there a calibration curve? No. Numeric results are not generated.
- What type of results are generated? Categorical results expressed in terms of a characteristic of the test sample, in ordinal or nominal formats (e.g., yes/no, positive/negative, dim/bright).
- How can the data be used? To characterize the test sample.
- How is accuracy demonstrated in validation? Given that results are not numeric, traditional accuracy validation cannot be demonstrated. As an alternative to true accuracy measurements, many labs will use specimens from a confirmed diagnosis or exchange samples with another laboratory (Wood et al., 2013; Oldaker, 2018).
- Examples: flow cytometric methods for characterization of leukemia/lymphoma specimens; genetic methods for polymorphism identification.

(a) Understanding the type of data generated by a method is critical to understanding how a method can be validated. The concept of categories of bioanalytical data is new to some laboratories.

4 identification) and CD19 (for B cell identification). In this example, CD19-positive B lymphocytes (CD45 bright, SSC low,cd19 + ) are typically reported as the relative percentage of lymphocytes (CD45 bright, SSC low ). Because there is no reference standard containing a certified number of B cells, we cannot determine if our method is accurate. If we want to evaluate a more complex phenotype such as Breg (CD45 bright, SSC low, CD19 +,CD24 bright,cd38 bright ), the challenges of finding a certified reference material are magnified. If, on the other hand, the level of CD19 expression is to be reported and the fluorescence intensity units are calibrated using fluorescence quantitation beads, the resulting data would be considered relative quantitative. In this case, a calibrator is available in the form of the antigen-binding beads or beads with a pre-defined number of fluorescent molecules, but the calibrator is not representative of the test sample. In this case, the fluorescence intensity results are relative quantitative; the other data remain quasi-quantitative. The classification of leukemia and lymphoma specimens based on immunophenotyping would be considered qualitative data. Regulatory agencies may be required to address the accuracy validation by alternative methods. The guidelines published by the International Council for Standards in Hematology (ICSH) and the International Clinical Cytometry Society (ICCS) Working Group (Wood et al., 2013) provide suggestions for alternative methods for establishing accuracy such as: testing proficiency samples; comparing to a reference method; inter-laboratory comparison; and testing disease-state samples. These approaches are of questionable value given that the true value of the measurand cannot be determined without a reference standard. Moreover, due to the differences in the technologies, it is unlikely that flow cytometric methods will correlate well with other methodologies. 
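To make the quasi-quantitative nature of such a readout concrete, the following sketch computes a CD19+ B-cell frequency on synthetic event data. All values, distributions, and gate boundaries are invented for illustration; real gating is performed on compensated listmode data in analysis software:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000  # events in one hypothetical whole-blood acquisition

# Synthetic per-event measurements (arbitrary fluorescence/scatter units)
cd45 = rng.normal(1000, 150, n)
ssc = rng.normal(300, 80, n)
# ~12% of events carry a bright CD19 signal, the rest are negative
cd19 = np.where(rng.random(n) < 0.12,
                rng.normal(800, 100, n),
                rng.normal(50, 30, n))

# Gate lymphocytes (CD45-bright, SSC-low), then CD19+ B cells
lymphs = (cd45 > 700) & (ssc < 400)
b_cells = lymphs & (cd19 > 400)

# The reportable result: a relative percentage, with no calibrator
pct_b = 100 * b_cells.sum() / lymphs.sum()
print(f"CD19+ B cells: {pct_b:.1f}% of lymphocytes")
```

The numeric result is proportional to the cells in the sample, but nothing in the calculation ties it to a reference standard, which is why such data are categorized as quasi-quantitative.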
Linearity verification is not applicable for quasi-quantitative methods such as flow cytometry assays (Armbruster & Pry, 2008). When the fluorescence intensity signal output is quantified with calibration beads, the linearity of the quantitation beads can be evaluated (Wang, Gaigalas, Marti, Abbasi, & Hoffman, 2008). Alternatively, linearity can be evaluated using results obtained from serially diluted samples. In this case, acceptance criteria should not be applied, as the data depend on pipetting accuracy and no calibrated reference materials are available for statistical analysis.

In this unit, three validation plans, or protocols, are described for assays that report numeric data. The first is a Limited Assay Validation (Basic Protocol 1), which is recommended for research and non-regulated laboratories. The next is an Initial Assay Validation (Basic Protocol 2), which is designed to meet the minimal basic requirements for low- and moderate-risk assays conducted in a regulated laboratory. The third is a Transfer Validation (Basic Protocol 3), which should be applied whenever a method is being transferred from one facility to another. All three validation plans follow the fit-for-purpose validation approach, where the validation parameters are selected based on the intended use of the assay. These validation protocols represent the minimal requirement and may not be applicable for every intended use, such as high-risk assays or data to be used for a primary endpoint in a clinical trial. This unit does not present validation plans for assays that report qualitative (i.e., categorical or non-numeric) data.

STRATEGIC PLANNING

1. Assessment of Validation Parameters for Flow Cytometric Methods

1.1 Specificity

Analytical specificity is defined as the ability of an assay to identify or quantify a specific analyte in the presence of many other analytes or interfering substances.
Specificity in flow cytometry translates into how well the assay measures the cellular population or antigen of interest. Are the events in a particular gate what the investigator intended, and not artifacts due to doublets, a contaminating cellular population, compensation errors, or tandem dye degradation? Is each particular antigen optimally detected? When measuring antigen levels, is the measurement free from interference by other compounds, such as cross-reactivity with biologic drugs or related analytes? These concerns are addressed during the panel design and assay optimization phase through the selection of cellular markers and corresponding monoclonal antibody clones, antibody titration, wash steps, buffers, and a gating strategy to identify the intended population (Mahnke, Chattopadhyay, & Roederer, 2010). At a very high level, assay development and optimization should include the following considerations:

- Panel design: the overall objective of the assay, the instrumentation, the available reagents, the monoclonal antibody clone, fluorochrome assignment, reagent titration, buffer selection, and staining and gating procedures.
- The need for fluorescence minus one (FMO) or other gating control tubes.
- Sample type and selection of anticoagulant (collection tube).
- QC material selection, if required: the type of QC material is based on the populations in the assay. All the populations and/or all the antibodies should be monitored in QC material.
- Test run: evaluate the final assay with a sample of the same type that will be used during the testing phase (e.g., whole blood, bone marrow, cell lines, etc.) in order to verify the antibody and fluorochrome selection. This step is also critical for assessing the instrument setup and the compensation matrix.

1.2 Precision

Assay precision is one of the most critical parameters in flow cytometry validation. Intra-assay precision is determined by how close the results are when the same sample is tested repeatedly under the same conditions.
The precision acceptance criterion for cell-based assays is 10% to 25% CV (coefficient of variation) (Lee et al., 2006; O'Hara et al., 2011; Wood et al., 2013). Higher imprecision (30% to 35% CV) is often acceptable for rare populations or dimly expressed antigens (O'Hara et al., 2011; Wood et al., 2013). Reported values from assays with higher imprecision should be regarded skeptically; for such tests, it should be demonstrated that the assay has sufficient precision to achieve meaningful distinctions in its intended application. Intermediate precision measures the variability between analysts, instruments, and laboratories. After intra-assay precision is established, inter-assay (reproducibility), inter-instrument, and inter-analyst variabilities should be evaluated, if required.

1.3 Sensitivity

There are multiple sensitivity parameters that can be assessed. Sensitivity refers to the precision and accuracy of the measurement of rare events or dim antigens. Assessment of sensitivity is important in monitoring rare populations or dimly expressed antigens, and is critical when establishing minimal residual disease (MRD) in leukemia and lymphoma. The limit of blank (LOB) is the highest signal in the absence of analyte (i.e., a blank sample). The limit of detection (LOD) is the lowest level of the analyte that can be reliably distinguished from the blank. The lower limit of quantitation (LLOQ) is the lowest level of analyte that can be reliably detected (i.e., with acceptable precision). The LLOQ may be the same as the LOD under some circumstances, but is never lower than the LOD (Armbruster & Pry, 2008; Shrivastava & Gupta, 2011; Wood et al., 2013). For flow cytometric methods, the major challenge in sensitivity validation is finding or creating validation samples for the LOB/LOD and LLOQ. The LOB/LOD sample is similar to the buffer blank used in spectrophotometric methods. Approaches to creating surrogate matrix samples that lack the population

of interest include: partially staining a sample by omitting antibodies in such a way that the population of interest will not be detected; depleting the population of interest with immunomagnetic beads; or using patient or healthy donor samples that lack the population of interest. Similar strategies can be used to create samples with low levels of the population of interest to assess the LLOQ: admixing partially stained samples into fully stained samples; partially depleting the population of interest; and, for leukemia and lymphoma, admixing disease-state samples into normal donor samples.

1.4 Stability

Specimen stability assessment is critical in all cases where samples will not be assayed within a few hours of collection. Certain markers or cellular subsets may be lost or altered during storage/shipment of the sample (typically whole blood or bone marrow). Antigen expression, cellular composition, and viability can change over time in the anticoagulant tube. The time points for the stability evaluation should be based on when the samples are expected to arrive at the testing laboratory, and should include at least one time point beyond the expected transit time. The generally accepted change for a particular marker between the baseline specimen value and the stored specimen value is a 20% difference, or a change within the acceptable assay precision (10% to 30% CV). The stability of the processed (stained/fixed) sample prior to acquisition should be evaluated if the samples will not be acquired on the instrument within 1 hr of processing (staining), for example, in laboratories with a high workload and/or few instruments. The generally accepted change for a particular marker is the same as for specimen stability (Brown et al., 2015; Wood et al., 2013).
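The stability comparison described above can be sketched as follows; the marker values and time points are hypothetical, and the 20% window is the generally accepted criterion cited above:

```python
def stability_flag(baseline, stored, max_pct_change=20.0):
    """Percent difference of a stored-specimen result from baseline,
    flagged against a 20% acceptance window."""
    pct = 100 * abs(stored - baseline) / baseline
    return pct, pct <= max_pct_change

# Hypothetical %CD3+ results: fresh draw vs. 48-hr and 72-hr storage
baseline = 62.0
for hours, value in [(48, 58.5), (72, 47.0)]:
    pct, ok = stability_flag(baseline, value)
    print(f"{hours} hr: {pct:.1f}% change -> {'pass' if ok else 'fail'}")
```

In this example the 48-hr time point would pass and the 72-hr time point would fail, so the validated stability window would end at 48 hr.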
If processed sample stability is not assessed, then the sample should be acquired within 1 hr of completing the staining/fixing process.

1.5 Carryover

It is critical to evaluate instrument carryover from one sample tube to the next when evaluating high-sensitivity assays reporting rare events. Carryover can be assessed during the initial validation by placing a tube with buffer between sample tubes. Data from the blank tubes are evaluated in the same gating template as the samples.

1.6 Reference ranges

Reference ranges (age-, gender-, and disease-specific ranges, healthy ranges) are essential in the interpretation of clinical chemistry laboratory results but are not always required for flow cytometric methods. The initial evaluation for establishing reference ranges is to test a minimum of 120 samples from each group (typically 60 male and 60 female, depending on the assay). Data are evaluated first for the distribution type (parametric or nonparametric), followed by the appropriate statistical test, such as mean ± 2 SD for a Gaussian (normal) distribution or a 90% to 95% confidence interval applied to transformed, nonparametric data. An in-depth procedure for establishing reference ranges can be found in the Clinical and Laboratory Standards Institute (CLSI) guideline EP28-A3c (Horowitz, Altaie, & Boyd, 2010). When validating an assay approved by the regulatory agencies for diagnostic use (IVD/CE), the laboratory can evaluate as few as 20 samples in order to verify the reference ranges provided by the manufacturer (Horowitz, 2010). Another approach is to compile global ranges from scientific publications (Maecker, McCoy, & Nussenblatt, 2012). In fit-for-purpose validation, fewer samples can be evaluated in order to gain information on intra-subject variability; a minimum of ten donors is recommended.

2. DOCUMENTATION

Validation should follow a three-step approach (Fig. 1): (1) Say It (the Validation Plan or Protocol); (2) Do It (the Experimental Phase); (3) Prove It (the Validation Report).
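The reference-range statistics outlined above (mean ± 2 SD for normally distributed data, or central percentiles for nonparametric data) can be sketched as follows, with 120 hypothetical healthy-donor results:

```python
import random
import statistics

def reference_interval(values, gaussian=True):
    """Central ~95% reference interval: mean +/- 2 SD for normally
    distributed data, otherwise the 2.5th-97.5th percentiles."""
    if gaussian:
        m, sd = statistics.mean(values), statistics.stdev(values)
        return m - 2 * sd, m + 2 * sd
    q = statistics.quantiles(values, n=40)  # cut points every 2.5%
    return q[0], q[-1]

random.seed(1)
# 120 hypothetical healthy-donor results (e.g., %CD4+ of lymphocytes)
donors = [random.gauss(45, 6) for _ in range(120)]
lo, hi = reference_interval(donors)
print(f"reference interval: {lo:.1f} to {hi:.1f}")
```

In practice, the distribution type should be tested first, and the EP28-A3c procedure followed for the formal evaluation; this sketch only illustrates the two central calculations named above.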

Figure 1 Flow cytometry method validation workflow overview. [Workflow: Assay Optimization/Development, then Validation Plan, then Experimental Phase, then Data Analysis, then "Data Acceptable?"; if NO, Assay Modification and repeat; if YES, Validation Report, then Method-Specific SOP, then Assay Implementation.]

Documentation is critical because, in a regulated environment, "if you don't document what you did, it didn't happen."

2.1 Method validation plan

The Method Validation Plan, or Validation Protocol, provides detailed documentation of the validation experimental design. The Validation Plan should include:

- A full description of the method, including the gating strategy and a list of the assay read-outs to be validated.
- The source (disease, healthy) and type of validation samples, including anticoagulants when using whole blood or bone marrow samples.
- Quality control material, if applicable. The control material should mimic the actual sample as closely as possible. If the test sample will be whole blood or bone marrow, preserved whole-blood controls such as CD-Chex (Streck), Multi-Check (BD), or

IMMUNO-TROL™ (Beckman Coulter) are the most appropriate. If the test sample will be PBMC, cryopreserved or lyophilized PBMC preparations are appropriate.
- Critical reagents (manufacturer, catalog number). For monoclonal antibodies, the fluorochrome and clone designation must be specified.
- Buffers: list all buffers, fixatives, lysing reagents, and permeabilization reagents, with vendor and catalog number. Describe any dilutions or buffer preparation to be conducted, as well as the storage conditions and expiration dates of all prepared reagents.
- Equipment (manufacturer, model, serial number).
- Software (manufacturer and version).
- Responsibilities (the name and role of the staff members who will conduct the validation).
- A description of each validation parameter to be evaluated: the number of samples, the number of replicates, and the number of analytical runs used to evaluate each validation parameter, along with the statistical test applied to each validation experiment and the acceptance criterion.

2.2 Method validation report

The Method Validation Report describes the results of the experimental phase. The Validation Report should mirror the Validation Plan, and any deviations from the Validation Plan which occurred during the experimental phase should be explained clearly. The validation report should include:

- A reagent table with the lot numbers and expiration dates of all reagents.
- The statistical results, clearly summarized with summary tables and/or figures. If any of the parameters did not meet the acceptance criteria, this should be discussed within that section of the document. If any outliers were identified and excluded from the statistical analysis, justification must be provided along with the statistical tool used to identify the outliers; note that the outliers must still be included in the report. Individual results should be presented in an appendix in table format.

Changes in the gating strategy or other deviations from the validation plan must be clearly described in the validation report.
The documentation for deviations typically includes an impact assessment, but must follow the institution's quality processes. Copies of the gated data (from the data analysis software) for every sample should be available in a validation binder and/or in electronic format. In addition, the location of the listmode files should either be included in the Validation Report or described in an SOP. Each laboratory must follow its institution's procedures for review, approval, and archiving of the validation report and data files.

2.3 Standard operating procedure

The standard operating procedure (SOP) for the assay being validated is typically prepared before the validation begins; however, it is also acceptable to provide the method procedure within the validation plan and then create the SOP after the validation report is approved. The SOP should be organized and written in a manner that fulfills the applicable regulatory requirements and is easy for end users to understand and follow. It should include

the information listed below or reference a source document, such as another SOP, where the information is located. Some laboratories will issue a final SOP after the validation is completed which includes the stability and LLOQ information, whereas other laboratories will capture stability and LLOQ information in a Laboratory Information Management System (LIMS). Each laboratory must follow its institution's quality processes regarding the specifics of how this information is handled. Information included in the SOP:

- The reagent tables, including storage and handling instructions.
- Complete details for sample accessioning, processing, and reporting.
- Detailed instrument setup and compensation procedures.
- Detailed instrument acquisition procedure (e.g., how many events to collect).
- Detailed gating instructions.

Additional specific requirements for a method SOP depend on the regulatory environment in which the testing is conducted.

BASIC PROTOCOL 1: LIMITED ASSAY VALIDATION

The Limited Assay Validation represents the minimal recommended parameters for research environments and non-regulated laboratories. This example assumes that the samples will not be shipped to a testing facility and that there is one operator and one instrument; thus, specimen stability as well as inter-operator and inter-instrument variation are not included in this validation plan. Note that although sample stability is not included in this validation plan, it is assumed that stability was demonstrated during assay optimization and that samples are analyzed within that window.

Materials

The materials are specific to each method undergoing validation. Refer to the Validation Plan (Section 2.1).

SAY IT! Prepare the Method Validation Plan

The Limited Assay Validation is recommended for research environments and non-regulated laboratories where formal documentation practices are not mandatory. Nonetheless, it is highly recommended that the validation plan be well thought out prior to initiating the analytical method validation. Details regarding the Validation Plan are provided in Section 2.1. Using the fit-for-purpose validation approach, the validation parameters should be selected based on the intended use of the assay. The example shown below represents the minimal validation protocol and assumes one operator and one instrument.

a. Validation Samples

i. Intra-assay precision: Intra-assay precision (repeatability) must be performed in the assay matrix, e.g., whole blood, PBMC, bone marrow aspirate, CSF, cell lines, etc. If disease-state samples are not readily available, precision may be evaluated initially in samples obtained from apparently healthy donors (HD), but it should be verified in a limited number (ideally at least three) of disease-state samples as they become available. It is acceptable to use the initial clinical specimens for this purpose.
For the validation of assays designed to evaluate leukemic populations or MRD, it is acceptable to spike cryopreserved disease-state samples into HD blood or bone marrow, but precision should be verified in a limited number (ideally at least three) of disease-state samples as they become available.

ii. Inter-assay precision: Ideally, inter-assay precision (reproducibility) would also be assessed in matrix samples. If this approach is followed, then the independent analytical runs must be performed on the same day to avoid introducing variability due to specimen stability. An analytical run is defined as all steps from sample processing to acquisition and analysis. When inter-assay testing is conducted on the same day, the sample processing and acquisition for each analytical run must be done separately. QC material may also be used for inter-assay precision. The control material should mimic the actual sample as closely as possible. If the test sample will be whole blood or bone marrow, preserved whole-blood controls such as CD-Chex (Streck), Multi-Check (BD), or IMMUNO-TROL™ (Beckman Coulter) are the most appropriate. If the test sample will be PBMC, cryopreserved or lyophilized PBMC preparations are appropriate. The advantage of using QC material is that inter-assay precision may be assessed on separate days.

iii. Limit of Blank/Detection: An assay must be assessed for performance adequacy on the lowest sample it is intended to test. Assessment of the LOB/LOD helps ensure that the lower limit of quantitation is set appropriately, and never below the LOB. Creating samples for the LOB/LOD requires some creative thinking: the investigator needs to find a way to create a sample which would be the equivalent of the buffer blank used in spectrophotometric methods while remaining as close to the matrix sample as possible. Approaches to creating such samples include:

- A sample partially stained by omitting antibodies in such a way that the target cell population or antigen of interest will not be detected. For some assays, this may be the gating control samples, such as a fluorescence minus one (FMO) tube, where one antigen is omitted from the staining tube, or an FMX tube, where several antigens are omitted from the staining tube.
- A sample from which the target cell population has been depleted, for example by immunomagnetic beads. While this approach seems like an ideal solution, the depletion is rarely 100% effective.
- A healthy or disease-state sample that lacks the target cell population. For example, a healthy donor will lack leukemic cells.

iv. Limit of Quantification: Similar strategies to those described for the LOB/LOD would be used to create samples with low levels of the population of interest for the evaluation of the LLOQ. Typically, five dilution levels (e.g., 1:3 dilutions) would be created for each of three donors. Approaches to creating the samples include:

- Admixing fully stained samples into partially stained samples.
- Partially depleting the population of interest.
- For leukemia and lymphoma, admixing disease-state samples into normal donor samples.

v. Fixed Sample Stability: Samples for fixed sample stability should all be processed at the same time. A separate tube should be processed for each time point and storage condition evaluated.

b. Validation Experiments:

i.
Specificity (assay optimization): As discussed in Section 1.1, specificity in flow cytometry is for the most part addressed during assay development. The validation documentation should describe the pre-validation information related to specificity.

ii. Sensitivity:

Limit of blank/detection (LOB/LOD). If FMO/FMX samples are used for the LOB/LOD, separate experiments are not required; data from all the validation runs can be used for the calculation. A minimum of 10 data points should be used. Note that it is not always necessary to establish the LOB/LOD; in many cases the LLOQ is sufficient. Moreover, if the assay is not expected to detect rare events, neither the LOB/LOD nor the LLOQ may be required. Note that once the LOB/LOD is established, data below the LOD should not be used in the calculations for the other validation parameters (precision, stability, LLOQ). The data cannot be removed from the validation records, but should be displayed as <LOD. Each laboratory needs to decide how to report <LOD data, but, given that the data are below the detection limits of the assay, they should not be considered meaningful in any way.
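A minimal sketch of the LOB/LOD calculation and the <LOD reporting convention described above, assuming the common LOB = mean of blanks and LOD = mean + 3 SD convention; the FMO-control values are hypothetical:

```python
import statistics

# Hypothetical FMO-control (blank) results: % positive events in the
# target gate, pooled from 10 validation data points as recommended.
blanks = [0.02, 0.05, 0.03, 0.04, 0.01, 0.06, 0.03, 0.02, 0.05, 0.04]

lob = statistics.mean(blanks)              # limit of blank
lod = lob + 3 * statistics.stdev(blanks)   # LOD = mean + 3 SD (assumed)

def report(value_pct):
    """Results below the LOD stay in the records but are reported <LOD."""
    return f"{value_pct}%" if value_pct >= lod else "<LOD"

print(f"LOB = {lob:.3f}%, LOD = {lod:.3f}%, 0.03 reports as {report(0.03)}")
```

This parameter is informative only, so no acceptance criterion is applied; the value's role is to anchor the LLOQ and the <LOD reporting rule.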

Validation Samples: As described above
Samples: 10
Replicates: 1 or more
Runs: Data from all validation runs
Instruments: 1
Operators: 1
Statistical Evaluation:
- Record the events and/or relative % for the main populations (reportable results) where the LOB/LOD is relevant
- LOB: calculate the mean (all donors/all runs)
- LOD: calculate the mean (all donors/all runs) + 3 SD
Acceptance Criteria: There is no acceptance criterion for the LOB/LOD; this parameter is informative only.

Lower limit of quantification (LLOQ). It is not always necessary to establish the LLOQ if the assay is not expected to be detecting rare events. Note that once the LLOQ is established, data below the LLOQ should not be used in the calculations for the other validation parameters (precision, stability, technology transfer). The data cannot be removed from the validation records, but should be displayed as <LLOQ. Each laboratory needs to decide how to report <LLOQ data, but, because the data are below the quantitation limits of the assay, they should not be considered as valuable.

Validation Samples: As described in Section a, iv
Samples: 15 (5 levels from each of three donors)
Replicates: 3
Runs: 1 or more; all samples from each individual donor must be acquired in the same analytical run
Instruments: 1
Operators: 1
Statistical Evaluation:
- Record the events and/or relative % for the main populations (reportable results) where the LLOQ is relevant
- Mark all values <LOD
- Calculate the mean, SD, and %CV for each reportable result from each sample where all three replicates are >LOD and the number of events is greater than the minimum determined in each laboratory. Note that although several publications suggest a minimum event count, each individual laboratory should set its own limit based on the validation data and intended use of the data (Rawstron et al., 2008).
Data Evaluation and Acceptance Criteria: Review the data tables for events and/or relative %.
The LLOQ is established at the lowest concentration where the following criteria are met for all three donors:
- All three replicates are >LOD
- The minimum number of events is present for all three replicates
- An appropriate titration effect is evident (i.e., values are roughly proportional to the dilution scheme applied, e.g., 1:3)
The cutoff for the LLOQ is at the discretion of each investigator and should be based on the intended use of the data. The value of data with imprecision >30%-35% CV may be called into question.
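Two of these criteria (all replicates above the LOD, and the minimum event count) plus a %CV cutoff can be screened programmatically; the titration-effect check is left to visual review of the dilution series. The data, LOD, event minimum, and CV cutoff below are hypothetical illustrations, not recommended values.

```python
import statistics

def passes_level(replicates, lod, min_events, cv_cutoff=30.0):
    """Check one dilution level for one donor against LLOQ criteria:
    all replicates above the LOD, the minimum event count present, and
    replicate imprecision (%CV) within the chosen cutoff.
    `replicates` is a list of (percent_positive, event_count) tuples."""
    values = [pct for pct, _ in replicates]
    events = [n for _, n in replicates]
    if any(v <= lod for v in values) or any(n < min_events for n in events):
        return False
    cv = 100 * statistics.stdev(values) / statistics.mean(values)
    return cv <= cv_cutoff

# Hypothetical 1:3 dilution series for one donor, highest to lowest level:
levels = [
    [(9.1, 4500), (9.3, 4600), (8.9, 4400)],
    [(3.0, 1500), (3.2, 1600), (2.9, 1450)],
    [(1.0, 500), (1.1, 520), (0.9, 480)],
    [(0.35, 170), (0.30, 150), (0.33, 160)],
    [(0.10, 40), (0.12, 50), (0.05, 20)],  # fails the event minimum
]
passing = [passes_level(lvl, lod=0.08, min_events=50) for lvl in levels]
# The LLOQ candidate is the lowest level that still passes for all donors.
```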

iii. Intra-Assay Precision (see Fig. 2):
Validation Samples: Matrix samples for the assay
Samples: 3-6
Replicates: 3
Runs: 1 or more (note that all replicates per sample are tested in the same run)
Instruments: 1
Operators: 1
Statistical Evaluation:
- Calculate the mean, SD, and %CV for each reportable result from each sample where all three replicates are >LLOQ, if established
- Calculate the mean of the %CVs for each reportable result from all samples (Mean %CV). The precision is then described by the Mean %CV and the %CV range.
Acceptance Criteria:
- 10% CV is desired; up to 25% CV is acceptable
- When the population or antigen is less well defined (rare events), 30%-35% CV can be acceptable
- It is not necessary for each sample to meet the criteria as long as the Mean %CV meets the criteria. If one sample falls outside the acceptance criteria, the reasons should be explained in the validation report.

Figure 2  Repeatability: intra-assay precision experimental design and statistical analysis, Example 1. In this example, four samples are each assayed in triplicate in the same analytical run. All testing is performed by one operator on one instrument.
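The intra-assay calculation above (per-sample mean, SD, and %CV, then the Mean %CV across samples) can be sketched as follows. The triplicate values are hypothetical %CD3+ results for four samples, mirroring the design in Figure 2.

```python
import statistics

def intra_assay_precision(samples):
    """Per-sample mean, SD, and %CV from triplicate results, plus the
    Mean %CV across samples used for the acceptance decision
    (10% CV desired, up to 25% acceptable; 30%-35% for rare events).
    `samples` maps a sample ID to its replicate values."""
    per_sample = {}
    for name, reps in samples.items():
        mean = statistics.mean(reps)
        sd = statistics.stdev(reps)
        per_sample[name] = {"mean": mean, "sd": sd, "cv": 100 * sd / mean}
    cvs = [s["cv"] for s in per_sample.values()]
    return per_sample, statistics.mean(cvs), (min(cvs), max(cvs))

# Hypothetical %CD3+ results: four samples in triplicate, one run, one operator.
data = {
    "S1": [62.1, 61.8, 62.5],
    "S2": [45.0, 44.2, 45.6],
    "S3": [70.3, 69.9, 70.8],
    "S4": [12.1, 12.9, 12.4],
}
per_sample, mean_cv, cv_range = intra_assay_precision(data)
```

The precision would then be reported as the Mean %CV followed by the %CV range, as described above.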

iv. Inter-Assay Precision (see Fig. 3):
Validation Samples: Matrix samples for the assay or QC samples, if available
Samples: 2-3
Replicates: 3
Runs: 2-4 (Note: if only one instrument is available, ideally the runs would occur on different days. When QC material is being used for the inter-assay validation samples, this is an acceptable approach. When matrix samples are being used, the samples must be assayed on the same day to avoid the influence of sample instability. If the inter-assay samples are run on the same day, every effort must be made to ensure that the runs are independent of each other, e.g., separate sample processing and separate instrument setups.)
Instruments: 1
Operators: 1
Statistical Evaluation:
- For each analytical run, calculate the mean, SD, and %CV for each reportable result from each sample where all three replicates are >LLOQ, if established
- Calculate the SD and %CV from the mean of all the runs for each reportable result from each sample
- For each reportable result, calculate the mean of the %CVs (calculated from the mean of all runs) from each sample (Mean %CV). The inter-assay precision for each reportable result should be reported as the Mean %CV followed by the %CV range.
Acceptance Criteria:
- 10% CV is desired; up to 25% CV is acceptable
- When the population or antigen is less well defined (rare events), 30%-35% CV can be acceptable
v. Fixed Sample Stability:
Validation Samples: Matrix samples for the assay (whole blood, bone marrow, cell lines, etc.)
Samples: 3-6
Time points: Baseline (within 1 hr of staining), 6 hr at ambient temperature, 24 hr at 4°C (the time points should be aligned with the maximal delay between staining and the time that the sample would be expected to be acquired on the instrument)
Replicates: 1 per time point
Runs: 1 per time point
Instruments: 1
Operators: 1
Statistical Evaluation:
- Calculate the % change between baseline and each time point
- Calculate the %CV between baseline and each time point
Acceptance Criteria: Stability is established at the latest time point where the assay meets the precision criteria set for the inter-assay evaluation; normally, less than a 20% change from the baseline samples, or a change comparable to the intra-assay imprecision, for 80% of the samples evaluated.
2. DO IT! The Experimental Phase: Execute the testing phase as described above.
3. PROVE IT! Prepare the validation report as described in Section 2.2. It is important to note that if any parameter is not within the acceptance criteria for one sample, the reason for the difference should be explained. Review the raw data to see if this parameter has low events (near the LOD/LLOQ) or if there is a technical error or instrument error. If one or

more parameters in one of the validation samples do not meet the acceptance criteria, the assay may still be acceptable if all the other samples are within the acceptance criteria. A solid rationale must be provided and documented. If one or more reportable results fail the validation criteria, the assay will still be valid for the other reportable results, but the readouts that failed validation cannot be reported.

Figure 3  Reproducibility: inter-assay precision experimental design and statistical analysis, Example 1. In this example, two samples are assayed in triplicate in two analytical runs. All testing is performed by one operator on one instrument.
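The inter-assay calculation illustrated in Figure 3 (a grand mean and %CV across the run means for each sample, then the mean of the per-sample %CVs) can be sketched as follows, using hypothetical per-run means for two samples assayed in two runs.

```python
import statistics

def inter_assay_precision(run_means_by_sample):
    """Inter-assay (reproducibility) summary following Figure 3: for each
    sample, take the mean result of each analytical run, compute the grand
    mean and the %CV across runs, then average the per-sample %CVs."""
    per_sample_cv = {}
    for name, run_means in run_means_by_sample.items():
        grand = statistics.mean(run_means)
        sd = statistics.stdev(run_means)
        per_sample_cv[name] = 100 * sd / grand
    cvs = list(per_sample_cv.values())
    return per_sample_cv, statistics.mean(cvs)

# Hypothetical example: two samples, each assayed in triplicate in two runs;
# the values below are the per-run means of those triplicates.
runs = {"S1": [62.1, 63.0], "S2": [44.9, 43.8]}
per_sample_cv, mean_cv = inter_assay_precision(runs)
```

The result would be reported as the Mean %CV followed by the %CV range and compared against the same acceptance criteria as intra-assay precision.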

BASIC PROTOCOL 2
INITIAL ASSAY VALIDATION
Initial Assay Validation is performed for a new assay in a laboratory. Using the fit-for-purpose validation approach, the validation parameters should be selected based on the intended use of the assay. The example shown below represents the minimal validation protocol for low- and moderate-risk assays in regulated laboratories. This example uses an efficient Design of Experiment (DOE) approach in which total imprecision is determined but the individual sources of the variation are not identified. In cases where the assay does not meet the precision criteria, it may be necessary to conduct additional testing to identify the source of the error. For example, one instrument may not be fully optimized, or one operator may not be fully trained. If it is determined that the instrument and operators are not the source of the validation failure, then the assay will need to be re-optimized. Note that the validation of high-risk assays, as described in the Clinical Laboratory Standards Institute (CLSI) Guideline H62, Validation of Assays Performed by Flow Cytometry, is beyond the scope of this unit (Litwin & Oldaker, manuscript in preparation).
Materials
The materials are specific to each method undergoing validation. Refer to the Validation Plan (Section 2.1).
1. SAY IT! Prepare Method Validation Plan. The Initial Assay Validation is recommended for regulated laboratories where Good Documentation Practices (GDP) are followed. Details regarding the Validation Plan are provided in Section 2.1. Using the fit-for-purpose validation approach, the validation parameters should be selected based on the intended use of the assay. The example shown below represents the minimal validation protocol for low- and moderate-risk assays in regulated laboratories.
In this example, it is assumed that the samples will be shipped to a testing facility and that two operators and two instruments will be included in the validation; thus, precision, LOB/LOD, LLOQ, specimen stability, inter-operator variation, and inter-instrument variation are included in this protocol.
a. Validation Samples:
i. Intra-assay precision (repeatability) must be performed in the assay matrix samples, e.g., whole blood, PBMC, bone marrow aspirate, CSF, cell lines, etc. If disease-state samples are not readily available, the assay may be validated initially in samples obtained from apparently healthy donors (HD), but precision and stability should be verified in a limited number (ideally at least three) of disease-state samples as they become available. It is acceptable to use the initial clinical specimens for this purpose. For the validation of assays designed to evaluate leukemic populations or MRD, it is acceptable to spike disease-state cryopreserved samples into HD blood or bone marrow, but precision and stability may be verified in a limited number (ideally at least three) of disease-state samples if they become available.
ii. Inter-assay precision: Ideally, inter-assay precision (reproducibility) would also be conducted in matrix samples. If this approach is followed, then independent analytical runs must be performed on the same day to avoid introducing variability due to specimen stability. An analytical run is defined as all steps from sample processing to acquisition and analysis. When inter-assay testing is conducted on the same day, the sample processing and acquisition for each analytical run must be done separately. QC material may also be used for the inter-assay precision. The control material should mimic the actual sample as closely as possible. If the test sample will be in whole blood or bone marrow,

preserved whole-blood controls such as CD-Chex (Streck), Multi-Check (BD), or IMMUNO-TROL™ (Beckman Coulter) are the most appropriate. If the test sample will be PBMC, cryopreserved or lyophilized PBMC preparations are appropriate. The advantage of using QC material is that the inter-assay precision may be conducted on separate days.
b. Limit of Blank/Detection: An assay must be assessed for performance adequacy on the lowest sample it is intended to test. Assessment of the LOB/LOD helps ensure that the lower limit of quantitation is set appropriately and never below the LOB. Creating samples for the LOB/LOD requires some creative thinking. The investigator needs to find a way to create a sample that would be the equivalent of a buffer blank used in spectrophotometric methods while remaining as close to the sample matrix as possible. Approaches to creating samples include:
- A sample partially stained by omitting antibodies in such a way that the target cell population or antigen of interest will not be detected. For some assays, this may be the gating control sample, such as a Fluorescence Minus One (FMO), where one antigen is omitted from the staining tube, or FMX, where several antigens are omitted from the staining tube.
- A healthy or disease-state sample that lacks the target cell population. For example, a healthy donor will lack leukemic cells.
c. Limit of Quantification: Similar strategies to those described for the LOB/LOD would be used to create samples with low levels of the population of interest for the evaluation of the LLOQ. Typically, five dilution levels (e.g., 1:3 dilutions) would be created for each of three donors. Approaches to creating the samples include:
- Admixing fully stained samples into partially stained samples.
- Partially depleting the population of interest.
- For leukemia and lymphoma, admixing disease-state samples into normal donor samples.
d.
Specimen Stability: Stability must be performed in the assay matrix samples, e.g., whole blood, PBMC, bone marrow aspirate, CSF, etc. Samples must be stored under the same conditions (i.e., temperature) as the actual samples. Ideally, the baseline sample will be processed within 2 hr of collection, but in cases where disease-state samples are procured from a commercial vendor, this may not be possible. In such cases, the baseline will be the time post-collection when the sample arrives at the testing facility. If samples are stored under refrigerated conditions, it is best to collect a separate sample for each time point so that the storage temperature remains constant for the stored samples.
e. Fixed Sample Stability: Samples for fixed sample stability should all be processed at the same time. A separate tube should be processed for each time point and storage condition evaluated.
f. Validation Experiments:
i. Specificity (Assay Optimization): As discussed in Section 1.1 (Assessment of Validation Parameters for Flow Cytometric Methods), specificity for flow cytometry is, for the most part, addressed during assay development. The validation documentation should describe the pre-validation information related to specificity.
g. Sensitivity (lower limit of blank/detection): If using FMO/FMX samples for the LOB/LOD, separate experiments are not required; data from all the validation runs can be used for the calculation. A minimum of 10 data points should be used. Note that it is not always necessary to establish the LOB/LOD; in many cases, the LLOQ is sufficient. Moreover, if the assay is not expected to be detecting rare events,

the LOB/LOD or LLOQ may not be required. Note that once the LOB/LOD is established, data below the LOD should not be used in the calculations for the other validation parameters (precision, stability, LLOQ). The data cannot be removed from the validation records, but should be displayed as <LOD. Each laboratory needs to decide how to report <LOD data, but, because the data are below the detection limits of the assay, they should not be considered meaningful.
Validation Samples: As described above
Samples: 10
Replicates: 1 or more
Runs: Data from all validation runs
Instruments: 1-2
Operators: 1-2
Statistical Evaluation:
- Record the events and/or relative % for the main populations (reportable results) where the LOB/LOD is relevant
- LOB: calculate the mean (all donors/all runs)
- LOD: calculate the mean (all donors/all runs) + 3 SD
Acceptance Criteria: There is no acceptance criterion for the LOB/LOD; this parameter is informative only.
h. Sensitivity: Lower Limit of Quantification (LLOQ): It is not always necessary to establish the LLOQ if the assay is not expected to be detecting rare events. Note that once the LLOQ is established, data below the LLOQ should not be used in the calculations for the other validation parameters (precision, stability). The data cannot be removed from the validation records, but should be displayed as <LLOQ. Each laboratory needs to decide how to report <LLOQ data, but, because the data are below the quantitation limits of the assay, they should not be considered as valuable.
Validation Samples: As described in Section a, iv
Samples: 15 (5 levels from each of three donors)
Replicates: 3
Runs: 1 or more; all samples from each individual donor must be acquired in the same analytical run
Instruments: 1-2
Operators: 1-2
Statistical Evaluation:
- Record the events and/or relative % for the main populations (reportable results) where the LLOQ is relevant
- Mark all values <LOD
- Calculate the mean, SD, and %CV for each reportable result from each sample where all three replicates are >LOD and the number of events is greater than the minimum determined in each laboratory. Note that although several publications suggest a minimum event count, each individual laboratory should set its own limit based on the validation data and intended use of the data (Rawstron et al., 2008).

Data Evaluation and Acceptance Criteria: Review the data tables for events and/or relative %. The LLOQ is established at the lowest concentration where the following criteria are met for all three donors:
- All three replicates are >LOD
- The minimum number of events is present for all three replicates
- An appropriate titration effect is evident (i.e., values are roughly proportional to the dilution scheme applied, e.g., 1:3)
The cutoff for the LLOQ is at the discretion of each investigator and should be based on the intended use of the data. The value of data with imprecision >30%-35% CV may be called into question.
i. Intra-Assay Precision (see Fig. 4):
Validation Samples: Matrix samples for the assay
Samples: 3-6
Replicates: 3
Runs: 1 or more (note that all replicates per sample are tested in the same run; if the same samples will also be used for inter-assay precision, each of two operators will test all of the samples)
Instruments: 1-2 (note that all replicates per sample are tested on the same instrument; each operator conducts testing on one instrument)
Operators: 1-2
Statistical Evaluation:
- Calculate the mean, SD, and %CV for each reportable result from each sample where all three replicates are >LLOQ, if established
- Calculate the mean of the %CVs for each reportable result from all samples (Mean %CV). The intra-assay precision for each reportable result should be reported as the Mean %CV followed by the %CV range.
Acceptance Criteria:
- 10% CV is desired; up to 25% CV is acceptable
- When the population or antigen is less well defined (rare events), 30%-35% CV is acceptable
- It is not necessary for each sample to meet the criteria as long as the Mean %CV meets the criteria. If one sample falls outside the acceptance criteria, the reasons should be explained in the validation report.
j. Inter-Assay Precision (see Fig.
5): Inter-assay precision will follow a factorial design approach that allows for the simultaneous evaluation of multiple factors, as opposed to separate studies that evaluate one factor at a time. Using this efficient approach, each operator will acquire the samples on one of the two instruments being validated, and neither operator will acquire samples on both instruments. Thus, the inter-assay precision evaluation encompasses inter-operator and inter-instrument variability, but it is not possible to distinguish the variability contribution of the analyst from that of the instrument. If inter-assay precision does not meet the acceptance criteria, an investigation into the contributions of the instrument and the analyst should be conducted.
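Because each operator is paired with a single instrument, operator and instrument effects are confounded; the most an initial investigation can do with the validation data itself is compare the two operator/instrument arms. A minimal sketch under that assumption, using entirely hypothetical run results, groups results by arm and compares imprecision. Separating the two factors would require a follow-up experiment crossing operators and instruments.

```python
import statistics

def precision_by_factor(results):
    """When inter-assay precision fails under the factorial design (each
    operator paired with one instrument), group the run results by the
    confounded operator/instrument cell to see which arm drives the
    variability. `results` is a list of (operator, instrument, value)."""
    cells = {}
    for operator, instrument, value in results:
        cells.setdefault((operator, instrument), []).append(value)
    summary = {}
    for cell, values in cells.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        summary[cell] = {"mean": mean, "cv": 100 * sd / mean}
    return summary

# Hypothetical runs: operator A on instrument 1, operator B on instrument 2.
runs = [
    ("A", "inst1", 61.8), ("A", "inst1", 62.4),
    ("B", "inst2", 58.0), ("B", "inst2", 66.0),  # wider spread in this arm
]
summary = precision_by_factor(runs)
```

A clearly worse %CV in one arm points the investigation toward that operator/instrument pair, but, as noted above, it cannot say which of the two is responsible.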