ASVCP quality assurance guidelines: control of general analytical factors in veterinary laboratories


Veterinary Clinical Pathology ISSN

SPECIAL REPORT

ASVCP quality assurance guidelines: control of general analytical factors in veterinary laboratories

Bente Flatland 1, Kathy P. Freeman 2, Kristen R. Friedrichs 3, Linda M. Vap 4, Karen M. Getzy 5, Ellen W. Evans 6, Kendal E. Harr 7

1 Department of Pathobiology, College of Veterinary Medicine, University of Tennessee, Knoxville, TN, USA; 2 Rynachulaig Farm, Killin, Perthshire, Scotland, UK; 3 Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, WI, USA; 4 Department of Microbiology, Immunology and Pathology, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO, USA; 5 Poudre Valley Health System Laboratories, Fort Collins, CO, USA; 6 Schering-Plough Research Institute, Lafayette, NJ, USA; and 7 Phoenix Central Laboratory for Veterinarians, Everett, WA, USA

Key Words: Analytical error, clinical chemistry, hematology, laboratory management, quality control

Correspondence: Bente Flatland, Department of Pathobiology, College of Veterinary Medicine, University of Tennessee, 2407 River Drive, Knoxville, TN. bflatlan@utk.edu

DOI: /j X x

Abstract: Owing to the lack of governmental regulation of veterinary laboratory performance, veterinarians ideally should demonstrate a commitment to self-monitoring and regulation of laboratory performance from within the profession. In response to member concerns about quality management in veterinary laboratories, the American Society for Veterinary Clinical Pathology (ASVCP) formed a Quality Assurance and Laboratory Standards (QAS) committee in 1996. This committee recently published updated and peer-reviewed Quality Assurance Guidelines on the ASVCP website.
The Quality Assurance Guidelines are intended for use by veterinary diagnostic laboratories and veterinary research laboratories that are not covered by the US Food and Drug Administration Good Laboratory Practice standards (Code of Federal Regulations Title 21, Chapter 58). The guidelines have been divided into 3 reports on 1) general analytic factors for veterinary laboratory performance and comparisons, 2) hematology and hemostasis, and 3) clinical chemistry, endocrine assessment, and urinalysis. This report documents recommendations for control of general analytical factors within veterinary clinical laboratories and is based on section 2.1 (Analytical Factors Important In Veterinary Clinical Pathology, General) of the newly revised ASVCP QAS Guidelines. These guidelines are not intended to be all-inclusive; rather, they provide minimum guidelines for quality assurance and quality control for veterinary laboratory testing. It is hoped that these guidelines will provide a basis for laboratories to assess their current practices, determine areas for improvement, and guide continuing professional development and education efforts.

Introduction

In the United States, performance of human clinical laboratories is federally regulated through the Clinical Laboratory Improvement Amendments of 1988 (CLIA), which is administered by the Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS). 1 Lack of such regulation for veterinary laboratories requires that veterinarians demonstrate a commitment to self-monitoring and regulation from within the profession. In 1996 the American Society for Veterinary Clinical Pathology (ASVCP) formed a Quality Assurance and Laboratory Standards (QAS) Committee in response to concerns by ASVCP members about quality management in laboratories performing veterinary testing.
The QAS committee was charged with encouraging and promoting the establishment of standards for the performance of laboratory procedures on veterinary samples. 2 By providing leadership in this area, the ASVCP hopes to raise the quality of veterinary laboratory medicine and improve the health of veterinary patients. The ASVCP recently published the newly revised quality assurance guidelines online. 2 These guidelines underwent rigorous review by peers and the ASVCP Executive Board during 2009, with approval by the ASVCP membership in December 2009.

Vet Clin Pathol 39/3 (2010) © 2010 American Society for Veterinary Clinical Pathology

The guidelines are

aimed predominantly at laboratory professionals and are not intended to be all-inclusive; rather, they provide minimum guidelines for quality assurance (QA) and quality control (QC) for veterinary laboratory testing. This report documents guidelines for the control of general analytical factors within veterinary clinical laboratories and is adapted from section 2.1 (Analytical Factors Important In Veterinary Clinical Pathology, General) of the newly revised ASVCP quality assurance guidelines. A glossary of relevant terms can be found in Table 1. It is hoped that these guidelines will provide a basis for laboratories to assess their current practices, determine areas for improvement, and guide continuing professional development and education efforts.

Table 1. Glossary of terms, listed alphabetically, relevant to quality assurance guidelines.

Accuracy: Refers to the trueness of a test result (how close the test result is to the true value of the analyte). 6 In clinical chemistry, an analyte's true value is represented by a thoroughly researched definitive method or reference method (eg, absorption spectrometry). Clinical method comparison studies may compare a newer field method to a definitive or reference method or, more commonly, compare 2 field methods. 6 For definitions of method types, see Tietz 12 and Westgard. 36

Analytical range (analytical measurement range, reportable range): The range of numeric results that a method can produce without manipulation of the sample (eg, dilutions) (definition from College of American Pathologists). 36 Also defined as the analyte concentration range over which measurements are within pre-determined tolerances for imprecision (random error) and bias (systematic error). 6 The term reportable range is a CLIA term and is defined as the range of concentration of the substance in the specimen for which method performance is reliable and test results can be reported. 36

Analytical sensitivity: The ability of a test or method to assess small variations of the concentration of an analyte. 6,37 Use of this term is controversial, and it has been used by some as a synonym for limit of detection (LoD). 38

Analytical specificity: The ability of a test or method to determine the concentration of target analyte, despite presence of potentially interfering substances. 6

Association: Refers to the fact that results yielded by 2 methods trend in the same direction. Represented by r, the correlation coefficient. An r value of 1.0 represents perfect association. Association is not a measure of agreement, as it is possible for 2 methods to be highly associated but still have considerable systematic error between them. 13 This value should be used in method comparison studies to help determine if a sufficient range of values has been obtained and to ensure that simple linear regression is appropriate for analysis. It should not be used as a means of determining method suitability.

Bias: Synonym for systematic error (SE). 39 The term bias is used to refer to the mean difference between two methods, as mean difference represents SE. 36 For example, in Bland-Altman difference plot analysis, the mean difference may be referred to as the bias. Bias (average difference of paired values, or mathematical difference between means of each method) as derived from paired t-testing or regression analysis may be used in formulas for calculating TE. 14 (See also systematic error, constant bias, proportional bias, and total error.)

Bland-Altman plot (Bland-Altman difference plot, Bland-Altman bias plot): A graph plotting the mathematical difference between results obtained using 2 methods on the y-axis and the mean of results obtained using the 2 methods on the x-axis. The mean difference (plotted as a line parallel to the x-axis) represents systematic error and is often referred to as the bias. Commercially available software programs may also plot limits of agreement, 2 lines surrounding the mean difference and parallel to the x-axis, representing ±1.96 SD from the mean. For information on the use of Bland-Altman plots, see Linnet and Boyd 6 or Bland and Altman. 19

Calibrator: A material or device of known or assigned quantitative characteristics that is used to calibrate, graduate, or adjust a measurement procedure. 3 (Contrast with control.)

Clinical reportable range: The lowest and highest numeric results that can be reported after accounting for any specimen manipulation (dilution or concentration) that is used to extend the analytical measurement range (definition from College of American Pathologists 36 ).

Coefficient of variation (CV): Standard deviation divided by mean, multiplied by 100 (expressed as %). 40 CV varies with analyte concentration and is often higher at lower analyte concentrations. 37 (See also precision.)

Constant bias: A type of systematic error where the degree of error remains the same over the range of analyte concentrations (the results of 1 method are consistently above or below the other method). 39 If the y-intercept in regression analysis deviates markedly from 0, this is evidence of constant bias. Constant bias can also be inferred from the mean difference of a Bland-Altman plot, if the mean difference is markedly above or below zero. 13 (See also systematic error.)

Control: A material (plasma, serum, whole blood, or other type of patient-derived specimen, lyophilized preparation, solution, or device) that is used in the quality control process. 3,41 (Contrast with calibrator.)

Control chart: A graph plotting observed control material concentration on the y-axis and run number, time, or date on the x-axis. A variety of computer software packages are available to generate control charts, including Levey-Jennings plots, Westgard multi-rule control charts, etc. 3 (See also Levey-Jennings plot.)

Table 1. Continued.

Interferograph: A graph plotting analyte concentration as a percentage of an original result (adulterated/non-adulterated × 100) on the y-axis and interferent concentration on the x-axis. Used to evaluate and predict the effects of an interferent on analyte measurements. 26

Levey-Jennings plot: A type of control chart, originally developed by Levey and Jennings in 1950, plotting observed control concentration on the y-axis and run number (or time) on the x-axis. Control limits are set as the mean ± 1, 2, or 3 SD and/or fractions of these values. 3,36 (See also control chart.)

Limit of blank (LoB): The highest measurement result that is likely to be observed for a blank sample. Typically estimated as a 95% confidence limit (one-sided) using: LoB = mean(blank) + 1.65(SD of blank). LoB replaces the previously used term Lower Limit of Detection (LLD). 27

Limit of detection (LoD): The lowest amount of analyte in a sample that can be detected with a given probability (may not be quantified as an exact value). Particularly important for drug tests, forensic tests, cancer markers (and other biomarkers), and endocrine tests. 27 Estimated as a 95% confidence interval (one-sided) using: LoD = LoB + 1.65(SD of low concentration sample). LoD replaces the previously used term Biological Limit of Detection (BLD). 27

Limit of quantification (LoQ): Lowest amount of analyte that can be quantitatively determined with stated acceptable precision and trueness, given stated experimental conditions. LoQ is similar to a previously used term, functional sensitivity. 27

Linear range: That part of the analytical range in which measured and expected values have a linear relationship (based on a study of samples having known concentrations). 6,10

Method decision chart (MEDx chart): A graph plotting inaccuracy of a method (from 0 to TEa) on the y-axis and imprecision of the same method (from 0 to 0.5TEa) on the x-axis. The chart is divided into zones delineating unacceptable, poor, marginal, good, and excellent method performance. Used to evaluate method performance and help determine what type of statistical quality control is most appropriate. 8,23 An example of method decision chart use can be found in Jensen AL, et al. 42 Method decision charts can also incorporate sigma metrics as part of the decision-making process corresponding to quality of performance; worksheets are available on the Westgard website. 43

Method validation: A complex set of procedures designed to show that a method or instrument works as expected and achieves the intended results. 36 Method validation studies may include all or some of the following: comparison of performance to that of a reference method, assessment of linearity, reportable range, detection limits, precision, analytical specificity, analytical sensitivity, interferents, and recovery. 44

Method verification: A set of procedures designed to show that a method or instrument performs according to manufacturer's claims (confirmation that specified requirements have been met). 36 Generally considered less rigorous than method validation.

Precision: The ability of a test or method to get the same result if a sample is analyzed multiple times. 37 The opposite of random error or imprecision. Precision is estimated via replication studies and is often expressed as the coefficient of variation (CV) of a test or method. 11 (See also repeatability and reproducibility.)

Proportional bias: A type of systematic error (SE) where the magnitude of the error changes as the analyte concentration changes. 39 Often proportional bias increases as the analyte concentration increases (but it also may increase as the analyte concentration decreases). If the slope in regression analysis deviates markedly from 1.0, this is evidence of proportional bias. Proportional bias can also be inferred from the spatial relationship of points to the zero line in a Bland-Altman plot (eg, if points hug the zero line at low analyte concentrations but scatter away at high analyte concentrations). 13 (See also systematic error.)

Quality assurance: A set of quality goals for laboratory performance. Concerned with all aspects of laboratory performance. Includes quality planning, implementation, monitoring, and assessment with the goal of providing an environment of continuous quality improvement. 3,45

Quality control: Includes daily statistical and non-statistical procedures and strategies for ensuring that accurate results, appropriate comments, and correct interpretations are reported to clients. 3

Quality control, multilevel: A control procedure that uses different statistical control rules for testing differing levels of control materials (eg, different rules are used for control materials testing low, normal, and high analyte concentrations).

Quality control, multirule: A control procedure that uses 2 or more statistical control rules for testing control measurements and determining acceptability of a control run. At least 1 rule is chosen for its ability to detect random errors, and another is chosen to detect systematic errors. 36

Quality control, multistage: A control procedure that uses multiple statistical control rules for different facets of method or instrument performance (eg, different rules may be used at start-up or following a change in reagent lot or instrument service, versus during a routine instrument run).

Table 1. Continued.

Quality requirement: A goal for the analytical quality of a method or instrument. Quality requirements must be individually defined for each analyte. Most commonly expressed as allowable total error (TEa) or as a clinical decision interval requirement. A hierarchy of quality requirement models has been recommended, consisting of (from most desirable to least desirable): (1) quality requirements based on clinical outcomes in specific settings, (2) quality requirements based on general clinical decision-making, (3) quality requirements based on published professional recommendations, (4) performance goals established by regulatory bodies or proficiency testing organizations, and (5) quality requirements based on current state of the art, based on proficiency testing or information in current publications. 46,47

Random error (RE, imprecision): Error occurring in a positive and/or negative direction, and whose occurrence, direction, and magnitude cannot be predicted. Synonymous with imprecision. 39

Regression analysis: Commonly used regression models include linear regression (ordinary least squares regression), Passing-Bablok regression, and Deming regression.

Repeatability (within-run precision, intra-assay precision): Closeness of agreement between results of successive measurements carried out under the same conditions (short-term replication study). 6,11 (See also precision and random error.)

Reproducibility (between-run precision, interassay precision): Closeness of agreement between results of successive measurements carried out under different conditions (different times, operators, calibrators, reagent lots, etc). Also known as a long-term replication study. 6,11 (See also precision and random error.)

Run length: The length (time duration) of a shift during which specimens are analyzed. The run length begins with start-up QC and ends when specimen analysis is finished and final QC is performed (if any). If multistage QC is being performed, a run length may include QC during the middle of the shift, eg, if QC is performed every 50 samples.

Systematic error (SE): An error occurring in 1 direction (a systematic shift) between an observed measured value and the analyte's true value. SE may be the same over a range of analyte concentrations (constant bias) or change as the analyte's concentration changes (proportional bias). 39

Total allowable error (TEa): An analytical quality requirement that places a limit on the amount of random and systematic error tolerated for a single measurement or test result (in other words, the amount of total error that is acceptable or tolerable for a single measurement or test result). 36 TEa should reflect the degree of change that needs to be detected for clinical decision-making and should be determined by individual laboratories for individual instruments. 48

Total error (TE): The combined effect of random and systematic errors: TE = SE + RE. 39 Calculated as TEcalc = biasmeas + 3(smeas), where biasmeas (representing SE) is the mathematical difference between the means of the 2 methods (as calculated during a method comparison study using t-testing or regression statistics) and smeas (representing RE) is the standard deviation of the test method (as calculated from a replication study). 14

t-test, paired: A parametric statistical test used to compare the means of 2 groups. The non-parametric equivalent is the Wilcoxon rank sum test (Mann-Whitney U test), which compares the medians of 2 groups. Paired t-testing may be used to compare 2 methods if the measured analytes have a narrow measured range and if proportional bias is not present. 16

Definitions of many other terms applicable to quality assurance and quality control can be found in on-line glossaries. 3,12 CLIA is the Clinical Laboratory Improvement Amendments, the piece of legislation that governs federal regulation of human clinical laboratory performance.

Monitoring Laboratory Performance

Monitoring laboratory performance may be divided into internal and external monitoring. Internal monitoring refers to within-laboratory monitoring of procedures and equipment; internal monitoring of all laboratory equipment is recommended and should include assessment of electronic safety, calibration, maintenance, and performance. Maintenance of an instrument performance log for each instrument is recommended and should include information regarding any problems encountered, their investigation, and their resolution. Accumulated QC data should be reviewed systematically on a regular schedule through use of Levey-Jennings plots (or other control charts); appropriate actions should be taken when QC results exceed predetermined limits or demonstrate undesirable trends. 3 External monitoring refers to proficiency testing and inspections by accrediting organizations. Participation in an external proficiency testing program that is specific to veterinary laboratories is recommended. All participating laboratories should analyze the same materials. Results

should be tabulated regularly (monthly, quarterly, or annually) by the testing agency and distributed to participants with statistical summaries expressing the closeness of individual laboratory means to the group mean. Means should be calculated and analyzed based on identification of the method (same methods compared). Each laboratory should carefully assess its reported performance; a marked deviation from the group mean should prompt inquiry. A more complete description of proficiency testing has been published. 4

Method Validation

Prior to adopting a new test procedure or bringing a new instrument on-line, method or instrument validation should be performed to ensure the procedure performs according to the laboratory's standards and manufacturer's claims. Method or instrument validation studies may include assessment of linearity, precision, accuracy, analytical range, and detection limits (limit of detection, limit of blank, limit of quantification) of the method and examine the effects of interfering substances. 5 Reference intervals and QC procedures for the new method should be determined before patient testing begins. If limited data are available for reference interval determination, this should be explained in an addendum to the test, and the basis for interpretation of results should be explained. 6 Analytical quality requirements, such as allowable total error (TEa) or clinical decision limits, should ideally be established for each test prior to beginning method or instrument validation studies. 7 These requirements serve as a benchmark for test performance. The total error (TE) inherent in the new method or instrument, as determined during validation studies, must fall within these quality requirements or the new method should be rejected. 8
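The acceptance check just described can be sketched in a few lines. TEcalc = bias + 3 × SD follows the formula given in Table 1; the bias, SD, and TEa values below are hypothetical.

```python
def total_error_acceptable(bias, sd, tea):
    """TEcalc = |bias| + 3*SD (Table 1); acceptable when TEcalc < TEa.

    bias: systematic error from a method comparison study
    sd:   standard deviation from a replication study
    tea:  allowable total error, same units as the analyte
    """
    te_calc = abs(bias) + 3 * sd
    return te_calc, te_calc < tea

# Hypothetical validation data for a new method (analyte units arbitrary)
te_calc, ok = total_error_acceptable(bias=0.25, sd=0.25, tea=1.5)
print(te_calc, ok)  # prints: 1.0 True
```

A laboratory would repeat this check at each medical decision concentration, since bias and SD can both vary across the analytical range.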
Numerous commercial software programs are available to facilitate the statistical analysis of results collected during method validation studies. Additional information and graphing tools for method validation are accessible online. 9

Linearity study

Linearity studies determine that part of the analytical measurement range (reportable range) of a method that is linear by assaying various analyte concentrations. Analyte solutions having matrices that approximate real samples are preferred over water or saline dilution. Using 5 levels of analyte concentration is recommended. 10

Level 1: close to the detection limit of the assay
Level 2: 3 parts low pool plus 1 part high pool
Level 3: 2 parts low pool and 2 parts high pool
Level 4: 1 part low pool and 3 parts high pool
Level 5: exceeding the expected upper analytical limit of the assay

Three to 4 replicate measurements for each analyte concentration (each solution) are recommended. 10 To evaluate data, the mean value for each solution is plotted on the y axis and the expected value is plotted on the x axis. 10 The plot is visually inspected for outliers, linearity, and best fit line. 10 If the assay is not linear within the manufacturer's recommended reportable range, the method should be rejected. Alternatively, the reportable range can be modified to lie within the linear region.

Short-term replication study

A short-term replication study, also known as repeatability, within-run precision, or intra-assay precision, is an estimation of the random error (RE), or imprecision, of the method over a short time interval (typically < 24 hours). Samples are analyzed during a single 8-hour shift or within a single analytical run. 11 Standard solutions, commercially available control materials, or pooled fresh patient samples can be used for analysis. The concentration of analyte should approximate important clinical decision-making concentrations.
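Replicate results from such a study are conventionally summarized as the mean, SD, and CV defined in Table 1. A minimal sketch in plain Python follows; the replicate values and the CV limit are hypothetical.

```python
from statistics import mean, stdev

def replication_summary(results):
    """Mean, sample SD, and CV (%) of a replication study."""
    m = mean(results)
    sd = stdev(results)
    return m, sd, 100 * sd / m  # CV = SD / mean * 100

# Hypothetical 20 within-run replicates of a control material
replicates = [100.2, 99.8, 100.5, 99.9, 100.1, 100.3, 99.7, 100.0,
              100.4, 99.6, 100.2, 100.1, 99.9, 100.0, 100.3, 99.8,
              100.1, 100.2, 99.9, 100.0]
m, sd, cv = replication_summary(replicates)

# Imprecision is then judged against the laboratory's quality
# requirement for this analyte (a hypothetical CV limit here):
CV_LIMIT_PERCENT = 5.0
acceptable = cv < CV_LIMIT_PERCENT
```

The same summary applies to the long-term (between-run) study; only the conditions under which the replicates are collected change.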
A minimum of 2 levels (normal and high) is recommended if the analyte is medically significant when increased. At least 3 levels (low, normal, and high) are recommended if the analyte is medically significant when decreased or increased. Performing a minimum of 20 replications is recommended during the time interval of interest. 11 Analysis should begin by determining the distribution of the data; if Gaussian distribution is not present, data should be examined for outliers. The cause(s) of outliers should be investigated and corrected as needed. If Gaussian distribution is not achieved following elimination of outliers, then transformation of the data and additional statistical analyses may be required. Data analysis should also include calculation of mean, standard deviation (SD), and coefficient of variation (CV). SD and CV, as measures of RE, should be compared to the laboratory standard (TEa or clinical decision limit). If the SD or CV exceeds this standard, the method should be rejected. 11 For this initial assessment of method precision, SE (bias) is assumed to be zero. Additional analysis of bias (determined from a comparison of methods study) should be conducted after replication studies are completed.

Long-term replication study

A long-term replication study, also known as reproducibility, between-run precision, or inter-assay

precision, estimates RE, or imprecision, of the method over a longer time interval that approximates real working conditions. At a minimum, 20 replications are performed during different shifts (and runs) over a minimum of 20 days, ideally by different operators. Sample selection and data analysis are the same as for the short-term replication experiment. 11

Comparison of methods study

A method comparison study provides estimation of systematic error (SE, or bias) of a new test method compared with an established method, if one exists. Both constant and proportional bias should be investigated. The comparison method should be chosen with consideration for known accuracy and quality and may be a definitive method, a reference method, or another field method. 12 Comparison of a new test method with proficiency testing data may also be considered; however, careful attention to the known accuracy of such data is recommended. A minimum of 40 patient samples tested by both methods is recommended, and studies should be carried out over a period of 5 to 20 days. 13,14 Specimens should represent the spectrum of results expected in clinical application of the method and should span the entire reportable range, with adequate sample numbers at the limits of the range. 13 Duplicate measurements by each method are desirable, but single measurements are acceptable. 13 Results should be examined at the time they are performed. If a marked difference is detected in values obtained by the 2 methods, immediate retesting should be performed to determine if the discrepancy is repeatable or if an error occurred. Specimens should be analyzed within 2 hours of each other (or sooner, depending upon analyte stability) by the test and comparative methods. 14 Sample handling should be defined in advance, to avoid artifactual variation in results based on differences in sample handling.
If samples are analyzed at different laboratories (> 2 hour interval between testing), sample stability must be considered. Regarding data analysis, visual inspection using data comparison graphs is recommended. Outliers should be investigated and samples re-analyzed as needed (if still fresh). Scatter plots (referred to as a comparison plot 14 ) can be used for visual inspection of data range, adequacy of data distribution, and association; the convention is to plot results of the new test method on the y axis and results of the comparative method on the x axis. A best fit line can be drawn based on visual assessment of the data or may be drawn by commercial software packages. 13 A correlation coefficient (r) also assesses association between methods and helps provide guidance regarding which type of regression analysis can be used to estimate SE. 13 Correlation alone is not acceptable as a measure of agreement between methods. For analytes covering a wide range, regression statistics are typically used to determine SE (described below). 13,14 For analytes covering a narrow range (eg, electrolytes), t-test statistics may be used to determine SE. 14 Paired t-testing can be used to compare means of the results of the 2 methods and determine whether SE (bias, expressed as difference between the means) is statistically significant. Paired t-testing, however, is not appropriate in the presence of proportional bias. 16 If r is ≥ 0.99 (for data with a wide range) or (for data with a narrow range), then linear regression (ordinary linear regression, ordinary least squares regression) can be used to estimate the SE at medical decision concentrations.
13,14,17 Using linear regression, SE (bias) at a particular medical decision concentration (Xc) can be determined by calculating the difference between it and the corresponding y value (Yc) from the regression line:

Yc = a + bXc (a = y-intercept; b = slope)
SE (bias) = Yc - Xc (same units as the analyte)

If r is < 0.99 (for data with a wide range) or (for data with a narrow range), data should be improved by collecting more data or decreasing variance by doing replicate measurements before linear regression is performed. Linear regression assumes the comparative method is free of error and that any error in the new test method is normally distributed and constant over the range studied; these assumptions are not often met in clinical chemistry (especially where 2 field methods are compared). 13,18 As an alternative, Passing-Bablok or Deming regression (models that do not make these assumptions) may be used to estimate SE. 13,18 Regardless of the particular regression model, constant bias may be inferred if the y intercept differs significantly from 0, and proportional bias may be inferred if the slope differs significantly from 1.0. 16 Difference between the slope and its ideal value of 1.0 expressed as a percentage can also be used to describe proportional error (eg, an observed slope of 0.8 indicates a proportional error of 20%). 16 Subdivision of test results into groups (eg, below, within, or above the reference interval) may be used to provide additional analysis of data within ranges that are clinically significant. Creation of a difference plot (eg, Bland-Altman plot) is also recommended to provide a graphical analysis of the data. The mathematical difference between

results obtained using the new test and comparative method is plotted on the y axis, and the mathematical mean of results obtained using both methods is plotted on the x axis. 19 The mean difference, plotted on the graph as a line parallel to the x axis, reflects SE and is often referred to as the bias. For methods with good agreement, results are scattered around the line of zero difference (the zero line, also plotted as a line parallel to the x axis), with approximately half the differences above and half below this line throughout the range of the analyte concentration. 13,20,21 Statistical programs creating Bland-Altman plots may also show limits of agreement (LOA), lines drawn parallel to the x axis representing ±1.96 SDs from the mean difference. Judgment of method acceptability in part depends upon whether these limits (and the 95% of differences falling within them) are large enough to be of clinical significance. 22 Criteria for acceptable performance of a test depend on clinical relevance of the identified SE and on analytical quality requirements (eg, TEa) for the test, as determined by each laboratory. Calculated total error (TEcalc) includes SE, as determined by a method comparison experiment, and RE, as determined by a long-term replication experiment 14 : TEcalc = SEmeas + 3REmeas. SEmeas (also written as biasmeas) is obtained from paired t-testing or regression analysis and is the mathematical difference between mean results of the test and comparative methods (mean difference). SEmeas can also be calculated as the mean of the differences between paired values. REmeas (also written as smeas) is the SD of the replication experiment. Performance is considered acceptable if TEcalc < TEa. 14 A method evaluation decision chart (MEDx chart), which takes into account the TEa, SE, and RE, also can be used to determine method acceptability. 8,23
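The calculations above (ordinary least squares bias at a medical decision concentration, and the Bland-Altman mean difference with its limits of agreement) can be sketched in plain Python. All paired data below are hypothetical.

```python
from statistics import mean, stdev

def ols_fit(x, y):
    """Ordinary least squares: returns (a, b) for y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def bias_at(a, b, xc):
    """SE (bias) at decision concentration Xc: Yc - Xc, with Yc = a + b*Xc."""
    return (a + b * xc) - xc

def bland_altman(new, comp):
    """Mean difference (bias) and 95% limits of agreement (±1.96 SD)."""
    d = [n - c for n, c in zip(new, comp)]
    bias, sd = mean(d), stdev(d)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired results: comparative method (x) vs new method (y)
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [2.2, 4.1, 6.4, 8.3, 10.6]
a, b = ols_fit(x, y)
print(round(bias_at(a, b, xc=7.0), 2))  # prints: 0.37

ba_bias, loa = bland_altman(y, x)  # mean difference and LOA
```

In practice, a real study would use Passing-Bablok or Deming regression when the OLS assumptions fail, as the text notes; the structure of the bias calculation is the same.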
Interference study

Interference studies estimate systematic error (typically constant bias) caused by substances within the specimen being analyzed. 24 Common interfering substances include hemoglobin, lipids, and bilirubin. 25 Additional comparisons may be made between heparinized plasma and serum, between serum samples collected in gel tubes and plain tubes, or with other possible interferents as indicated by the test methodology, analyte, or instrument of interest. To perform an interference study, standard solutions, individual patient specimens, or pooled patient samples can be used; the latter 2 are preferred because of their ready availability and complex matrix. 24 Samples with varying analyte concentrations spanning the clinically reportable range should be chosen. Defined quantities of hemoglobin (from lysed RBCs), lipid (commercially available solutions), or bilirubin (commercial standard solutions) are added to samples to reach an increased concentration anticipated to occur in patient samples. 24 The volume of interferent added should be minimized to avoid changes in the sample matrix. 24 Duplicate measurements on all samples are recommended; small differences in the measured analyte concentration caused by the interferent may be masked by RE inherent to the method, and duplicate measurements help obviate this problem. 24 Measurements should be performed by both the new method and a comparative method, if one exists. If both methods show similar SE caused by the interferent, then presence of SE alone may not be sufficient to reject the new method. 24 A paired t-test is recommended for comparing results from the interferent-containing sample and the unadulterated control, as regression statistics are typically not applicable (the data are unlikely to have a wide range). 24 The criterion for acceptable performance is SEmeas (expressed as the difference between means) < TEa. 24
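The paired comparison recommended above can be sketched as follows; all results and the quality requirement TE_a are hypothetical:

```python
# Sketch of the paired t-test comparison used in interference studies.
# Values are hypothetical duplicate-mean results for aliquots measured
# with and without added interferent (eg, hemoglobin).
import math

control     = [3.1, 5.4, 7.8, 9.9, 12.2]   # unadulterated aliquots
interferent = [3.4, 5.9, 8.1, 10.5, 12.6]  # interferent-containing aliquots

diffs = [i - c for i, c in zip(interferent, control)]
n = len(diffs)
mean_d = sum(diffs) / n                     # SEmeas (difference between means)
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t_stat = mean_d / (sd_d / math.sqrt(n))     # paired t statistic, df = n - 1

# t_stat is compared with a tabled critical t value for significance;
# acceptability itself is judged against the quality requirement:
TE_a = 1.0                                  # hypothetical requirement
acceptable = abs(mean_d) < TE_a
print(round(mean_d, 2), round(t_stat, 2), acceptable)
```

Here the interference is statistically detectable but smaller than the assumed TEa, so the method would not be rejected on this basis alone.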
If SEmeas > TEa, the laboratory should decide whether specimens likely to contain interfering substances can be readily identified, whether such specimens should be rejected when potential interferents are present, or whether their effect can be quantitated or semiquantitated based on additional studies. Interferences cannot always be avoided, and interferographs (definition in Table 1) can be constructed to examine and predict the effects of lipid, bilirubin, and hemoglobin on test results. 26 Interferences are species-specific; ideally, interferographs should be created for each analyte and species tested.

Recovery study

Recovery studies estimate the amount of systematic error that is due to proportional bias. Proportional bias occurs when a substance within the sample matrix reacts with the analyte and competes for analytical reagent; typically, the magnitude of proportional bias increases as the analyte concentration increases. Proportional bias is determined by calculating the percent recovery of an amount of standard analyte added to a patient specimen. 24 Standard solutions of high concentration are often used, as they can be added in small amounts that minimize specimen dilution while still achieving a recognizable, significant change in the analyte concentration. Dilution of the original specimen should not exceed 10%. 24 The amount of analyte added should result in a sample that reaches the next medical decision level for that analyte. As in the interference experiment, small additions will be

affected by the inherent imprecision of the method (RE) more than large additions. Replicate measurements of both recovery (spiked) and control specimens are recommended. Recovery samples should be analyzed by both the new test method and a comparison method, if available. The number of patient specimens to be tested depends on the number and types of reactions anticipated to produce SE. Instructions for recovery study data calculations are available. 24 As for the interference study, the criterion for acceptable performance is SEmeas < TEa. Small amounts of proportional bias may be acceptable; however, the method should be rejected if a large amount of proportional bias (> TEa) is observed.

Detection limit study

Detection limit studies estimate the lowest concentration of an analyte that can be reliably measured. Confirming detection limits is recommended for all assays in which a low value is of clinical significance, eg, forensic tests, drug levels, endocrine assays, immunoassays, and cancer markers. 27 Detection limit studies may quantify the limit of blank, limit of detection, or limit of quantification (definitions in Table 1). 27 A blank sample that does not contain the analyte and a recovery (spiked) sample containing a low concentration of the analyte are used in detection limit studies. Several spiked samples containing analyte at the detection concentrations claimed by the manufacturer may need to be evaluated. Twenty replicate measurements of each sample are recommended. The blank solution measurements can be performed as short-term replication studies (within-run, intra-assay) on the same day; however, the spiked sample should be analyzed over a longer period of time to evaluate day-to-day or between-run variation. A minimum of 5 days is commonly used.
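As an illustration (not taken from the cited reference, whose exact calculations should be consulted), the widely used parametric formulas LoB = mean_blank + 1.645 × SD_blank and LoD = LoB + 1.645 × SD_low could be applied to such replicate data; all values below are hypothetical:

```python
# Sketch of detection-limit estimates from replicate measurements,
# using the commonly cited parametric formulas for limit of blank (LoB)
# and limit of detection (LoD). Replicate values are hypothetical.
import statistics

blank_reps = [0.02, 0.00, 0.03, 0.01, 0.02, 0.00, 0.01, 0.02]  # blank sample
low_reps   = [0.18, 0.22, 0.20, 0.25, 0.17, 0.21, 0.19, 0.23]  # low spiked sample

# LoB: highest result expected from a blank; LoD: lowest concentration
# reliably distinguished from the blank.
LoB = statistics.mean(blank_reps) + 1.645 * statistics.stdev(blank_reps)
LoD = LoB + 1.645 * statistics.stdev(low_reps)
print(round(LoB, 4), round(LoD, 4))
```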
Mean and SD from replicate measurements are used to calculate the limit of interest (limit of blank, limit of detection, or limit of quantification). Additional details and calculations are available. 27

Personnel

Knowledge

Laboratory personnel should have a thorough working knowledge of laboratory equipment and its use, including, but not limited to, the following topics:

- Linearity differences in animal samples compared with human samples
- Effects of hemolysis, lipemia, icterus, carotenoid pigments (especially in large animals), and different anticoagulants on each assay
- Reportable ranges
- Partitioned reportable ranges and reference intervals, if applicable (eg, breed-, gender-, or age-specific)
- Expected physiologic ranges. Repeat criteria may be established that trigger reanalysis of a sample. Criteria for repeating a test should include equipment-generated error messages or flags, as well as results that are grossly outside of the normal physiologic range. For the latter, consider use of panic values pre-programmed into the operating system of the biochemical analyzer. Retesting to confirm an abnormal result should be communicated to the client as part of the report.
- Common problems encountered with veterinary samples and appropriate steps to take with various error messages, flags, or sample conditions
- Regular instrument maintenance schedule (daily, weekly, monthly, and as needed)
- Repair or replacement of inadequate or faulty equipment
- Problem-solving procedures (how to troubleshoot instruments and assays)
- Appropriate use of comments and species-specific criteria. Comments and species-specific criteria may be of interpretive benefit to clients. Direct communication with clients should be limited to those in the organization who are qualified to provide data interpretation in the context of clinical history and previous therapies.

Additional information about laboratory management is available. 28
Instrumentation

Instrument performance

Instrumentation and methods must be capable of providing test results within the laboratory's stated quality goals. Performance characteristics of particular importance include precision, accuracy, analytical range, detection limits, and analytical specificity (definitions in Table 1). 6 Information about analytical specificity is provided by demonstrating the presence or absence of interference by various substances that may be present in the sample (hemoglobin, lipid, bilirubin, drugs). Instruments with adjustable settings for different substances, species, or both should be carefully checked to determine setting accuracy and validity. Laboratory- and manufacturer-defined performance characteristics should be compared and adjustments made as needed. The instrument manufacturer's technical representatives generally assist in this portion of instrument qualification and setup.

Instrument function checks

Manufacturer's instructions for routine maintenance (daily, weekly, monthly) should be followed unless laboratories have modified them for their own use and documented appropriate instructions in a manual of standard operating procedures. A log of instrument maintenance, calibration, and repair should be maintained in the laboratory or by a metrology unit. Appropriate function checks of critical operating characteristics should be made on all instruments; these include, but are not limited to, stray light, zeroing, electrical levels, optical alignment, and background checks. Prior to sample testing, laboratory personnel should perform QC, calibration, or both for each instrument daily or once per shift. Instruments should be operated by appropriately trained personnel according to manufacturer instructions.

Calibration

A calibrator is a material or device of known or assigned quantitative characteristics that is used to calibrate, graduate, or adjust a measurement procedure. 3 Most instruments should be calibrated at least every 6 months. More frequent calibration may be needed if required by the manufacturer, after major service, when quality control data are outside predetermined limits or troubleshooting otherwise indicates the need, or when workload, equipment performance, or reagent stability indicates the need. 10 After calibration, control materials should be analyzed and must be within range according to the laboratory's standard operating procedure before patient results are reported.

Quality Control (QC)

Quality control may be non-statistical or statistical.
Examples of non-statistical QC include, but are not limited to 3 :

- Use of standard operating procedures (suggested outline below)
- Checking water quality and electrical power sources
- Checking calibration of balances, pipettes, and centrifuges
- Checking temperature stability of water baths, refrigerators, and freezers

Additional non-statistical strategies for monitoring the quality of laboratory results may include 3 :

- Use of repeat criteria (ie, which results automatically trigger repeat analysis?)
- Criteria for review by pathologist or medical personnel (ie, which results trigger review by a medical technologist or veterinarian?)
- Send-out criteria (ie, which results trigger a need to send duplicate samples to a reference laboratory for comparative testing?)
- Monitoring patient data using key indicator tests (ie, is there a trend in measured values of a key test that deviates from prior means?)
- Delta checks (comparison of serial patient test results)
- Limit checks (ie, are patient results within physiologic limits?)

Statistical QC refers to the analysis of control materials and interpretation of QC results using pre-established acceptance and rejection criteria ("rules") and control charts to evaluate instrument performance. The process of selecting statistical rules for analyzing QC data is referred to as QC validation. Control materials are not the same as calibrators (definitions in Table 1), although calibrators can be used as control material to perform QC.

Reagents and materials used for QC

Frequency of control material use should be documented as part of the laboratory's quality plan, and annual (or more frequent) review of quality policies and procedures by staff should be documented. Actions resulting from failed QC should also be documented and should follow standard operating procedures determined by the laboratory for each department and/or type of instrument.
Such procedures may include confirmation of results, appropriate data entry, and appropriate use of charts and graphs. A mechanism (eg, audits, competency testing) should be in place to determine whether testing personnel follow policies and procedures correctly. QC records should be reviewed frequently to ensure that suitable action is taken when QC results fail to meet the criteria for acceptability, and a reporting structure should exist to inform management of QC issues. Problems requiring attention should be forwarded to appropriate individuals, and corrective actions should routinely be evaluated to determine their effectiveness. Control materials can be pooled patient samples or commercially purchased controls or calibrators. If calibrators are used as controls, different lots should be used for QC and for calibration. If pooled patient samples are used, a mean value should be established for all analytes (minimum n = 10 to establish a mean). Regardless of the control material used, and in addition to analysis of QC data and QC validation, laboratory results should be monitored using the non-statistical strategies described

above. For maximal stability and consistency on any given instrument, laboratories should ideally purchase a minimum 1-year supply of control materials (or calibrators) having the same lot number. 29 Control materials should be clearly labeled with the dates received and opened and should be stored according to the manufacturer's recommendations. Expiration dates should be carefully observed, and expired reagents should be discarded appropriately. Verification of reagent stability over the run length should be done during method validation by assaying control materials multiple times throughout an entire run length and comparing the resulting mean and SD with results from short-term repeatability (within-run or intra-assay precision) experiments. Additionally, laboratories should establish criteria (or verify the manufacturer's criteria) for an acceptable range of performance for QC materials. Mean, SD, and CV should be calculated from a minimum of 20 replications. 11 Control material analyte concentrations/activities often represent low, normal, and high results for people; if the thresholds for pathologic changes in animals differ significantly from these, it may be necessary to include additional control materials that more closely reflect pathologic analyte concentrations/activities in animals. The number of control materials used will depend, in part, on instrument performance and on the manufacturer's recommendations. QC guided by Westgard multirules or other rules based on QC validation is recommended (see below). Selection of the number of control materials is part of the QC validation process; 2 (normal analyte concentration and either high or low analyte concentration) to 3 (low, normal, and high analyte concentration) control materials are typically used.
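A minimal sketch of establishing a control material's acceptable range from the recommended minimum of 20 replicates; the results are hypothetical, and the ±2 SD and ±3 SD limits shown are the conventional Levey-Jennings chart limits rather than a prescribed choice:

```python
# Sketch of characterizing one QC material level from replicate
# measurements (minimum of 20 recommended). Values are hypothetical.
import statistics

reps = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.2,
        5.1, 4.8, 5.0, 4.9, 5.1, 5.0, 5.2, 4.9, 5.0, 5.1]  # n = 20

mean = statistics.mean(reps)
sd = statistics.stdev(reps)
cv = 100 * sd / mean                      # CV, %

# Conventional control limits for a Levey-Jennings chart
limits_2sd = (mean - 2 * sd, mean + 2 * sd)
limits_3sd = (mean - 3 * sd, mean + 3 * sd)
print(round(mean, 3), round(sd, 3), round(cv, 2))
```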
Additional QC data points (obtained either by analyzing control materials in duplicate or triplicate or by using additional control materials) may be needed for some assays to ensure a high probability of error detection and a low probability of false rejection. Use of additional control materials may also be needed if there are changes in reagent lots or instrument operators or if certain equipment maintenance (eg, a software update) is performed. Controls should be assayed in the same manner as patient specimens. Controls should be run once daily (maximum run length of 24 hours) unless the instrument manufacturer recommends more frequent control runs. QC frequency should be established with the following considerations:

- Test frequency (throughput, the number of tests performed during each run or each day)
- Degree to which quality requirements for the test depend on precise analytical performance
- Analyte or reagent stability
- Frequency of QC failures
- Training and experience of personnel
- Cost (increasing QC frequency adds to the overall cost per test)

Selection of rules for the statistical monitoring of method performance (QC validation)

QC validation can be done using normalized OPSpecs charts, the EZRUNS calculator, or other quality assurance programs. 30 QC validation utilizes an analytical quality requirement (TEa or clinical decision interval) for the test, along with CV (RE, estimated from replication studies) and bias (SE, estimated from method comparison studies), to determine possible rules that can be applied for statistical QC. 3 For most automated methods, a probability of error detection > 90% and a probability of false rejection < 5% are sufficient. For extremely stable assays with few anticipated problems, a probability of error detection as low as 50% may be acceptable. 31 Use of multiple QC rules to monitor 1 assay is referred to as multirule QC. Different QC rules may be required for different levels of a single analyte (multilevel QC). 32,33
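As an illustrative sketch (not the guideline's prescribed rule set), two common Westgard rules can be expressed as checks on control results converted to z-scores (deviations from the control mean in SD units):

```python
# Sketch of multirule ("Westgard") evaluation of a run's control results.
# Only two rules are shown (1_3s and 2_2s); a laboratory's validated
# rule set may differ. Inputs are z-scores: (result - mean) / SD.
def evaluate(z_scores):
    """Return (reject, reasons) for one run's control z-scores."""
    reasons = []
    if any(abs(z) > 3 for z in z_scores):
        reasons.append("1_3s")   # one control exceeds mean +/- 3 SD
    pairs = list(zip(z_scores, z_scores[1:]))
    if any(z1 > 2 and z2 > 2 for z1, z2 in pairs) or \
       any(z1 < -2 and z2 < -2 for z1, z2 in pairs):
        reasons.append("2_2s")   # 2 consecutive controls beyond 2 SD, same side
    return (bool(reasons), reasons)

print(evaluate([0.5, 3.2]))    # violates 1_3s
print(evaluate([2.1, 2.4]))    # violates 2_2s
print(evaluate([1.0, -1.5]))   # in control
```

In practice the rule set, and the number of control levels it is applied to, comes from the QC validation process described above.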
For example, more stringent multilevel QC may be required to detect error at lower analyte levels than at higher analyte levels. More stringent QC rules may also be necessary during initial adoption of a new method, or after calibration and maintenance, than during routine operation. Multistage QC refers to the use of different QC rules for start-up (more stringent) and for routine operation (less stringent). 34

Manuals of Standard Operating Procedures (SOP)

SOP manuals may be organized as paper copies, stored electronically, or both, and personnel should archive and back up electronic copies appropriately. All procedures currently in use should be included in SOP manuals that are easily accessible to all laboratory personnel performing the assay. Appropriately identified individual(s) should perform editing. Organization of the manual(s) will vary with the size, needs, and requirements of the testing facility. Certain accrediting organizations may have specific requirements, and specific SOPs may be recommended or required. On completion of training of new personnel, a checklist should be used to document competency in performing the assay and knowledge of aspects related to the assay. When an SOP is revised, a review with all applicable