Scientific Opinion on Performances of Brucellosis Diagnostic Methods for Bovines, Sheep, and Goats. Adopted on 11 December 2006. EFSA-Q


The EFSA Journal (2006) 432, 1-44

Index

Summary
Mandate from the European Commission
Background
Bovine brucellosis
Sheep and goat brucellosis
Terms of reference
List of abbreviations
List of definitions
Introduction and objectives
Bovine brucellosis diagnostics
Sheep and goat brucellosis diagnostics
Materials
Tests available for diagnosis of brucellosis
Systematic collection of pertinent information
Systematic literature review
Data from collaborative trial (CCT)
Data generated by Member State National Reference Laboratories
Methods
Variables for the meta-analysis
Statistical analyses
Community collaborative trial
Analysis plan
EQUIVAL: Meta-analysis of one-sided equivalence tests
LOGREG: Multi-variable analysis of Se and Sp
Results and discussion
Community collaborative trial for the evaluation of serological tests for brucellosis in sheep and goats
Meta-analysis
Summary of data sources and explorative analyses
EQUIVAL: Meta-analysis of one-sided equivalence tests for Se and Sp
LOGREG: Multi-variable analysis of Se and Sp
Specificity data from National Reference Laboratories
Conclusions
Specific conclusions on tests evaluated for bovines
Specific conclusions on tests evaluated for sheep and goats
Recommendations
Specific recommendations on tests evaluated for bovines
Specific recommendations on tests evaluated for sheep and goats
General recommendations
References
Working group members and acknowledgements
AHAW Scientific Panel Members

Summary

EFSA has been requested by the European Commission to assess the available scientific data on brucellosis diagnostic tests, and to issue a scientific opinion on the suitability of current and new tests for the diagnosis of brucellosis in bovines, sheep, and goats. At the Plenary Meeting of 25/26 May 2005, the AHAW Panel decided to entrust the collection and analysis of available data to a working group. A report on this task was given to the AHAW Panel. The conclusions and recommendations were adopted at the Plenary Meeting on 11/12 December 2006.

The Scientific Report reviewed all the available scientific data on brucellosis diagnostic tests for bovines, sheep, and goats using a meta-analysis approach. Based on the data of a systematic literature review, meta-analytical estimates of the animal-level diagnostic sensitivity (Se) and specificity (Sp) for all new and standard tests, as well as estimates of the difference in Se and Sp for all combinations of tests, were obtained. Using an equivalence analysis approach, the null hypotheses that the Se or Sp of a new brucellosis test is lower than that of the standard tests were investigated. The Report elaborated on the importance of the negative predictive value and the Se with regard to safety in intra-Community trade and in addition explored the role of the Sp for the given purpose.

For bovines the key conclusions and recommendations of the AHAW Panel were:

i) the following brucellosis diagnostic tests currently considered as standard tests in the EU legislation on intra-Community trade showed comparable Se and Sp and were found to be suitable for remaining as standard tests: Rose Bengal Test (RBT), Complement Fixation Test (CFT), and Indirect Enzyme Linked Immunosorbent Assay (iELISA);

ii) the Fluorescence Polarization Assay test (FPA) showed Se and Sp comparable to those of the standard tests and was found to be suitable for inclusion in the EU legislation on intra-Community trade of bovines as a standard test for brucellosis diagnosis;

iii) the Radial Immunodiffusion Test with Native Hapten (RIDNH) showed lower Se than, and Sp equal to, that of the standard tests and could be suitable for inclusion in the EU legislation on intra-Community trade as a complementary test for brucellosis diagnosis;

iv) for the Competition Enzyme Linked Immunosorbent Assay (cELISA), it is recommended that this type of test should remain in the EU legislation on intra-Community trade, where it is currently included as a complementary test, pending the conduct of further studies;

v) the Serum Agglutination Test (SAT) showed lower Se and Sp compared to the other standard tests and was not found to be suitable for remaining in the EU legislation on intra-Community trade.

For sheep and goats the key conclusions and recommendations of the AHAW Panel were:

i) the two brucellosis diagnostic tests currently considered as standard tests in the EU legislation on intra-Community trade (RBT and CFT) showed comparable Se and Sp and were found suitable for remaining as standard tests;

ii) the modified Rose Bengal Test (MRBT), the iELISA, the cELISA, the FPA, and the Brucellin Skin Test (BST) are suitable for inclusion in the EU legislation on intra-Community trade because their Se was equal to that of the standard tests. However, with the exception of the BST, these tests have Sp lower than that of the standard tests or their Sp is not sufficiently documented. Thus, when using Se and Sp as criteria for assessing the fitness for the purpose of intra-Community trade, the AHAW Panel concludes that these tests, except the BST, are not suitable for inclusion in the EU legislation on intra-Community trade of sheep and goats unless new data demonstrate that these tests are at least as specific as the standard tests;

iii) the RIDNH is not suitable for inclusion in the EU legislation on intra-Community trade because its Se was lower than that of the standard tests. It can be pointed out that this test has Sp equal to that of the standard tests.

Key words: Brucellosis, Bovines, Sheep, Goats, Animal Health, Diagnostic tests, Meta-analysis, Intra-Community trade, Zoonosis.

1. Mandate from the European Commission

Letter SANCO/E2/FR/rd (05) D/ of 30 March 2005. Request for a scientific opinion concerning Brucellosis diagnostic methods for bovines, sheep and goats

1.1. Background

Bovine brucellosis

Article 16 of Council Directive 64/432/EEC 1 provides for the possibility to amend Annex C on diagnostic methods for bovine brucellosis following the comitology procedure.

On 11 October 1999 the Scientific Committee on Animal Health and Animal Welfare (SCAHAW) adopted a report 2 on the modification of technical Annexes to Directive 64/432/EEC to take account of scientific developments regarding tuberculosis, brucellosis and enzootic bovine leucosis. This report proposed that updates of the OIE Manual of Standards for Diagnostic Tests and Vaccines on the diagnostic tests for brucellosis should be assessed as to their suitability for incorporation into EU legislation.

The Brucella Fluorescent Polarization Assay (FPA) is a new diagnostic test that has recently been included, in chapter of the 4th edition of the Manual of Diagnostic Tests and Vaccines for Terrestrial Animals, published in 2001, as a prescribed test for international trade.

Annex C to Council Directive 64/432/EEC was last amended by Commission Regulation (EC) No 535/. It differentiates two groups of tests: those that can be used for the purpose of certification for intra-Community trade, and the complementary tests.

In order to assist the Commission in developing and updating current legislation in this area, the European Food Safety Authority is requested to issue a scientific opinion on the suitability of FPA for inclusion in Annex C to Council Directive 64/432/EEC.

Sheep and goat brucellosis

Article 14 of Council Directive 91/68/EEC 4 provides for the possibility to amend Annex C on diagnostic methods following the comitology procedure.
On 12 July 2001 the SCAHAW adopted an opinion on Brucellosis in sheep and goats 5. It was concluded that the tests used for serological diagnosis of B. melitensis in sheep and goats were developed for the diagnosis of B. abortus in cattle and that, despite the fact that no proper validation of the tests had been done, a combination of tests showed adequate sensitivity and specificity for the control and eradication of the disease under certain conditions. However, it was considered essential to improve and validate the existing diagnostic tests.

The Task Force (TF) on monitoring animal disease eradication 6 recommended discussing the possibilities to further improve tests and testing procedures. The eradication programmes, especially those which foresee the use of vaccination, would immediately benefit from the availability of new improved diagnostic tools.

1 OJ 121, , p. 1977/64. Directive as last amended by Regulation (EC) No 21/2004 (OJ L 5, , p. 8.) OJ L 80, , p OJ L 46, , p. 19. Directive as last amended by Commission Decision 2003/78/EC (OJ L 258, , p. 11.)

The Working Group on Sheep and Goat Brucellosis of the TF concluded at the meeting held in Brussels on September 2002 that modifications of Annex C to Council Directive 91/68/EEC should be considered only after proper assessment of the characteristics of available diagnostic methods. New tests could be taken into account as candidates for complementary diagnosis of sheep and goat brucellosis.

The TF at the meeting held in Brussels on 2 December 2002 supported the proposal for a ring-trial (RT) in sheep and goat brucellosis serology, the protocol of which was further elaborated by a working group on sheep and goat brucellosis of the TF on 30 January. The purpose of this RT was to propose serological tests that could potentially be included in Annex C to Council Directive 91/68/EEC.

In view of the above, the Commission asks the European Food Safety Authority to assess the available scientific data, including the outcome of the above mentioned ring-trial, and issue a scientific opinion on the suitability of new tests for inclusion in Annex C to Council Directive 91/68/EEC.

Terms of reference

The Commission asks the European Food Safety Authority:

With regard to bovine brucellosis:

1. to issue a scientific opinion on the suitability of the Brucella Fluorescent Polarization Assay (FPA) for inclusion in Annex C to Council Directive 64/432/EEC for the purpose of certification for intra-Community trade or as a complementary test, and

2. to issue a scientific opinion on the suitability of inclusion of other new tests, if any, or the replacement of existing tests in Annex C to Council Directive 64/432/EEC for the purpose of certification for intra-Community trade or as a complementary test.

With regard to sheep and goat brucellosis:

3. to issue a scientific opinion on the suitability of the available diagnostic tests for inclusion in Annex C to Council Directive 91/68/EEC, and

4. in the event of a negative opinion to (3), to inform the Commission on which further validation studies are necessary to evaluate the reliability of each new test.

2. List of abbreviations

AGID: agar gel immunodiffusion test
BST: brucellosis skin test
BF: Brucellosis-Free
ca: Canada
cELISA: competitive ELISA
CI: confidence interval
CFT: complement fixation test
CYT: cytosol
de: Germany
dpi: days post infection
EC: European Commission
EDA: explorative data analysis
GD: Agar Gel Immunodiffusion Test
es: Spain
iELISA: indirect ELISA
irl: Ireland
it: Italy
FPA: Fluorescent Polarization Assay
FPSR: False Positive Serological Reaction to brucellosis serological testing
fr: France
LPS: smooth lipo-polysaccharide
LR: logistic regression
MA: meta-analysis
MCMC: Markov Chain Monte Carlo technique
melisa: milk ELISA
mfpa: microplate fluorescence polarization assay
MRBT: modified Rose Bengal test
mmp: modified Micro Plate Rose Bengal Test
mprbt: microplate Rose Bengal test
NPV: Negative predictive value
OPS: O-polysaccharide
OBF: Officially Brucellosis Free
OIE: World Organisation for Animal Health
ot: other
P: Prevalence
PPV: Positive Predictive Value
pt: Portugal
RBT: rose Bengal plate test
RIDNH: radial immunodiffusion-native hapten test
SAT: serum agglutination test
SE: standard error
Se: diagnostic sensitivity
SeDS: Sensitivity Data Set
SOP: standard operating procedure
Sp: diagnostic specificity
SpDS: Specificity Data Set
YO9: Yersinia enterocolitica O:9
WC: whole cells

3. List of definitions

Bac: bacteriology used as gold standard

Epi: epidemiology used as gold standard

Complementary test: a test which has been listed in Annex C to Council Directive 64/432/EEC but cannot be used for intra-Community trade. These currently are the BST and the cELISA.

New test: a recently developed test for individual animals, currently not approved as a standard test.

Standard test: for the purpose of this study, all those tests conducted on individual animals which, for cattle, are listed in Annex C of Council Directive 64/432/EEC as being approved for intra-Community trade (iELISA, CFT, RBT, SAT) and, for sheep and goats, in Annex C of Council Directive 91/68/EEC (RBT and CFT).

4. Introduction and objectives

This opinion of the AHAW Panel of the EFSA is based on a Scientific Report drafted by the working group on Brucellosis and accepted by the AHAW Panel as the basis for this Scientific Opinion. The data, methods and results reported by the working group are summarised in this Scientific Opinion to the extent necessary to appreciate the analytical process and the basis for the conclusions and recommendations. The full Scientific Report is available on the EFSA website.

The OIE has adopted a procedure for the validation of diagnostic tests used for purposes related to international trade with live animals or animal products. It comprises the following steps in a diagnostic test validation process:

Stage I. Calibration, repeatability, analytical Se, analytical Sp
Stage II. Cut-off value, diagnostic Se, diagnostic Sp, agreement between tests
Stage III. Reproducibility
Stage IV. Test applications, international recognition

Tests under consideration by the working group should have completed stage I. However, it was recognised that there appears to have been a paucity of data on the performance of tests for the detection of brucellosis in sheep and goats. The mandate covers stage II.
Specifically, the aim of the working group was to assess, through their diagnostic sensitivity and specificity, the fitness of the test(s) for the purpose of certifying freedom from brucellosis. Stages III and IV were not within the aim of the current mandate. Cost effectiveness of tests is also not within the remit of this mandate. However, certain elements of cost are highlighted in the Discussion section, when considered relevant.

The background provided in the mandate from the European Commission (EC) indicates that the cattle diagnostics and the sheep and goat diagnostics needed to be evaluated separately. Brucellosis diagnostics in pigs was outside the scope of this mandate.

The purpose of this study was therefore the evaluation of brucellosis diagnostic tests for use in intra-Community trade. In the case of cattle, a prerequisite for intra-Community trade is that the herd be Officially Brucellosis-Free (OBF), and in the case of sheep and goats that the herd has either the OBF status or the Brucellosis-Free (BF) status. To achieve this status all animals in the herd need repeated testing. In addition, for cattle the individual animals that are to be exported require testing prior to departure. In order for a herd test to be satisfactory the results of all individual animal tests need to be negative. Hence, it is of interest that the characteristics of a diagnostic test at the herd level be known. However, these characteristics are largely determined by the characteristics of the test when used at the level of the individual animal. Hence, it is essential that these tests be evaluated at the individual-animal level. This is the focus of the comparisons that were carried out between the various tests. The implications of any differences that were found between tests were also discussed.

Any evaluation of a diagnostic test needs to be made with its proposed use in mind. The overall objective of the study was to assess fitness for purpose of brucellosis diagnostic tests for use in intra-Community trade, consistent with the relevant directive. Thus it is recognised that the request was not for the evaluation of these diagnostic tests for other purposes such as the control or eradication of brucellosis. It should therefore be noted that the current evaluation of diagnostic tests does not imply fitness or lack thereof for those purposes.

For each of the questions posed by the risk manager, the text below indicates the hypothesis that will be tested.

Bovine brucellosis diagnostics

The first objective was to assess whether the FPA can be added to Annex C of 64/432/EEC as a bovine brucellosis diagnostic test for intra-Community trade of cattle. To address this question it was determined whether the performance of the FPA, when carried out as described in its standard operating procedure (SOP), was equivalent to that of the current standard tests used in the EU.

Fitness for the purpose of intra-Community trade was interpreted to mean that the test should have a high likelihood of detecting previously unrecognised cases of brucellosis in animals that are considered for shipment to an OBF or BF herd in another Member State. To achieve equivalence it was therefore necessary to meet the following requirements:

For standard tests:
o to have sufficient diagnostic sensitivity (Se). This was interpreted as not having significantly inferior Se to any of the current standard tests used in the EU; and
o to have reasonable Sp.
This was also interpreted as not having significantly inferior Sp to any of the current standard tests used in the EU.

For complementary tests: tests that did not qualify as standard tests, but have additional features to support the use of standard tests in brucellosis free or officially free herds when:
o positive reactions are suspected to have been caused by either cross-reacting infections (e.g. as is currently indicated for the BST in Annex C of 64/432/EEC) or other events such as vaccination (e.g. as is currently indicated for the cELISA);
o there are other technical justifications, such as haemolysed or anticomplementary sera.

Other objectives:

If, in the review of the information, scientific evidence demonstrates that any other test is equivalent to the current standard tests, such candidate new tests should be identified. Various tests are already approved for intra-Community trade. In addition, a test can only be proposed if its characteristics are well-defined. Hence, the working group decided to examine this question for the cELISA.

If, in the review of the information, scientific evidence demonstrates that any current standard test is performing significantly worse than other standard tests and therefore represents a threat for intra-Community trade, such non-adequate standard tests should be identified.

In the event no new test could be recommended, the additional objective was to inform the Commission on which further validation studies were necessary to evaluate the reliability of each new test.

To gather evidence to support or otherwise the consideration of any new test as a complementary test, based on its diagnostic performance characteristics.

Sheep and goat brucellosis diagnostics

The first objective was to assess whether any available brucellosis diagnostic test can be added to Annex C of Council Directive 91/68/EEC as a sheep and goat brucellosis diagnostic test for intra-Community trade of sheep and goats. For brucellosis in sheep and goats the tests to be used for intra-Community trade are the RBT and the CFT, as prescribed in Council Directive 91/68/EEC and as described in Council Decision 90/242/EEC. Since there currently are only two standard tests, various other available tests were considered, in addition to the FPA and the cELISA. The same approach was used as described above for the evaluation of the FPA in cattle.

Other objectives:

If, in the review of the information, scientific evidence demonstrates that any current standard test is significantly less sensitive (or less specific) than other standard tests and therefore represents a threat for intra-Community trade, such non-adequate standard tests should be identified.

In the event no new test could be recommended, the additional objective was to inform the Commission on which further validation studies were necessary to evaluate the reliability of each new test.

It should be noted though that the above does not represent the sequence in which the hypotheses were addressed. Specifically, it was decided that, before addressing the objective about inclusion of a new test, the results comparing standard tests would be assessed to decide whether any standard test should be considered inferior and thus not appropriate as a comparator for new candidates. The benchmark for comparison of new tests was the worst of the standard tests (to address the formal aspects of the mandate) and the worst of all equivalent standard tests after removal of poorer-performing outliers.
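The comparison of a candidate test against a benchmark standard test can be illustrated with a confidence interval (CI) on the difference in sensitivity, with an allowed difference of zero. The following is a minimal sketch only: it assumes two independent samples of truly infected animals and a simple Wald interval, which is not the meta-analytic method of the Scientific Report. The function names, the 1.96 multiplier and the "superior" label for the case where the whole interval lies below zero are assumptions made for illustration.

```python
import math


def difference_ci(pos_a, n_a, pos_b, n_b, z=1.96):
    """Return (diff, lower, upper) for Se_A - Se_B.

    pos_a / n_a: positives and infected animals tested with test A (benchmark).
    pos_b / n_b: the same for test B (candidate).
    Uses a Wald interval for two independent proportions (an assumption).
    """
    se_a = pos_a / n_a
    se_b = pos_b / n_b
    diff = se_a - se_b
    half_width = z * math.sqrt(se_a * (1 - se_a) / n_a + se_b * (1 - se_b) / n_b)
    return diff, diff - half_width, diff + half_width


def classify(pos_a, n_a, pos_b, n_b):
    """Classify test B against test A with an allowed difference of zero."""
    _, lower, upper = difference_ci(pos_a, n_a, pos_b, n_b)
    if lower > 0:
        # Difference A - B is positive and the CI excludes zero.
        return "inferior"
    if upper < 0:
        # Not covered by the text; labelled here for completeness.
        return "superior"
    # The CI of the difference includes zero.
    return "equivalent"


if __name__ == "__main__":
    print(classify(95, 100, 94, 100))
    print(classify(98, 100, 80, 100))
```

The same check applies to specificity by passing counts of negative results among truly uninfected animals.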
For the comparison of the Se, we set the allowed difference to zero. This choice of critical value for the difference means that test B is considered equivalent to test A with regard to sensitivity if the Confidence Interval (CI) of the difference between the two sensitivities includes zero. Test B is considered inferior to test A if the difference (sensitivity A - sensitivity B) is positive and the CI does not include zero. For the comparison of the Sp we chose the allowed difference to be zero as well. The choice of these values is motivated by the purpose of the testing (intra-Community trade) defined in the mandate. A high level of Se is desirable to avoid false negative test results and thereby to reduce the risk of disease transmission via trade. A high level of Sp is relevant in relation to the purpose of intra-Community trade for reasons discussed in this scientific opinion.

Whereas from the mandate it is clear that the tasks differ between the brucellosis diagnostic tests for cattle and the ones for sheep and goats, the working group adopted the same general approach to address these questions. The approach taken consisted of a meta-analysis (MA), which encompasses evaluating all pertinent studies through a statistical analysis that takes into consideration differences between studies. It comprised the following steps:
o the examination of the questions posed and their discussion with the risk manager;
o the formulation of the study objectives and their implementation in statistical hypotheses;
o a systematic collection, evaluation, and summarisation of pertinent information;
o the adaptation or development, implementation, and application of suitable statistical methodology for analysis; and

o the reporting of the results through the present scientific report.

5. Materials

5.1. Tests available for diagnosis of brucellosis

The OIE Manual of Diagnostic Tests and Vaccines for Terrestrial Animals was considered as a source of information on tests to consider for potential evaluation. The working group considered the rivanol test, and decided not to include it. Furthermore, the SAT was not considered for sheep and goats. The test matrices considered were blood serum, milk, and skin. Milk whey was excluded. The milk ring test was also excluded as it pertains to pooled samples (see below Inclusion criteria).

In order to be able to recommend tests they need to be sufficiently standardised. The working group used EU and OIE standards and manufacturer SOPs, as appropriate. Table 1 and Table 2 list the tests for which EU or OIE standards exist for diagnosis of brucellosis in cattle and in sheep and goats, respectively.

The requirements for diagnostic tests approved for intra-Community trade of cattle are listed in Annex C of Council Directive 64/432/EEC of 26 June 1964 on animal health problems affecting intra-Community trade in bovine animals and swine. The requirements for diagnostic tests approved for intra-Community trade of small ruminants are listed in Annex C of Council Directive 91/68/EEC of 28 January 1991 on animal health conditions governing intra-Community trade in ovine and caprine animals, and the methods are described in the Annex of Council Decision 90/242/EEC of 21 May 1990 introducing a Community financial measure for the eradication of brucellosis in sheep and goats.

Table 1. Tests available for the diagnosis of bovine brucellosis and their status according to EU legislation and the OIE Manual a.
Name of the test                                                    EU   OIE
Indirect enzyme-linked immunosorbent assay in serum (iELISA)        b    d
Indirect enzyme-linked immunosorbent assay in milk (Milk iELISA)    b    e
Complement Fixation Test (CFT)                                      b    d
Milk Ring Test (MRT)                                                b    e
Rose Bengal Plate Test (RBT)                                        b    d
Serum Agglutination Test (SAT)                                      b    e
Brucellosis Skin Test (BST)                                         c    e
Competitive enzyme-linked immunosorbent assay (cELISA)              c    d
Fluorescence Polarisation Assay (FPA)                                    d
Native hapten and polyb tests                                            e

a) accessed on 1st October 2006
b) Approved for intra-Community trade in the EU (64/432/EEC)
c) Complementary test in the EU (64/432/EEC)
d) Prescribed test for international trade (OIE Manual)
e) Other test (OIE Manual)

Table 2. Tests available for the diagnosis of sheep and goats brucellosis (excluding B. ovis) and their status according to EU legislation and the OIE Manual a.

Name of the test EU OIE
Rose Bengal Plate Test (RBT) b c
Complement Fixation Test (CFT) b c
Brucellosis Skin Test (BST) d
Indirect enzyme-linked immunosorbent assay in serum (iELISA)
Modified Rose Bengal Plate Test (MRBT) b7
Serum Agglutination Test (SAT)
Competitive enzyme-linked immunosorbent assay (cELISA)
Fluorescence Polarisation Assay (FPA)
Native hapten tests
d e

a) accessed on 1st October 2006
b) Approved for intra-Community trade (91/68/EEC)
c) Complementary test
d) Prescribed test for international trade (OIE Manual)
e) Alternative test (OIE Manual)
f) Other test (OIE Manual)

The WG has interpreted the legal requirements for diagnostic tests for brucellosis in terms of the following selection criteria.

For cattle:
o the cELISA needed to meet EU and OIE requirements as well as the supplier's SOP (VLA, SVANOVA);
o the FPA needed to meet the OIE and the supplier's SOP requirements.

For sheep and goats:
o the BST needed to meet the description in the OIE specifications;
o the FPA test needed to meet the supplier's SOP;
o the AGID test with native hapten needed to meet the procedure provided by J.M. Blasco;
o the cELISA needed to meet the supplier's SOP (SVANOVA, VLA);
o the iELISA needed to meet the supplier's SOP (Institut Pourquier, SVANOVA, VLA).

Systematic collection of pertinent information

The estimation of diagnostic performance of the tests considered in the report is based on three sources of evidence: data published in peer-reviewed journals or documented in a way that ensures scientific quality similar to the level of a peer-reviewed article; data from a collaborative trial sponsored by the EC on the performance of certain brucellosis tests for sheep and goats (referred to as ring trial in section ); and data generated by Member State National Reference Laboratories.

Systematic literature review

A systematic literature review was conducted by the working group, supported by EU national reference laboratories and Dr. Darrell Abernethy. Suppliers of the FPA, cELISA and iELISA were requested by the EFSA to submit dossiers or unpublished studies. A total of articles pertaining to the diagnostics of brucellosis were evaluated for potential inclusion in the meta-analysis. This procedure was adopted to minimise the risk of excluding articles through a failure of the original search criteria. For sheep and goat brucellosis the number of available studies was much more limited.

For initial inclusion, the study data had to:

pertain to brucellosis diagnostic tests in cattle or small ruminants, respectively;

provide an opportunity for direct comparison between an approved test and a new test, and hence the study needed to include:
o tests described or referenced in sufficient detail to be able to reproduce the study. This includes origin of control sera and control of kit reagents, and cut-off used;
o at least one standard test which meets current EC specifications; and
o at least one potential new test which meets EC, OIE, and/or supplier's requirements. These are listed in the next section. Hence, for cattle the study needed to include results on either the FPA or the cELISA. For sheep and goat cELISA and iELISA there are no well-defined international standards. Hence, cELISA and iELISA tests which did not comply with supplier requirements were still accepted but were identified as such;

provide additional assurance that no test would have been conducted according to obsolete methods, and therefore no publications prior to 1970 were considered. In addition, no iELISA results were considered prior to the date on which the iELISA was standardised in the EU. There was no explicit time limit for the FPA. In addition, since it was understood that few publications are available on brucellosis diagnostics in sheep and goats, the timeline was set to 1970 for all studies in either of those two species;

allow for comparison of individual animal test results. Hence, tests on pooled (milk) samples were not considered; and

permit re-classification of cases reported as suspect, i.e. close to but slightly below the threshold of positivity.
Some studies reported results as doubtful (sometimes called suspect or retest). However, in order to be able to compare different tests it was decided that only two categories would be allowed: positive or negative. Considering the mandate, the following rule was applied to this end: a suspect case was not classified as positive for Se; and a suspect case was classified as negative for the purpose of Sp.

Articles that were considered pertinent for inclusion were subsequently evaluated in detail and a decision was made whether they met the exclusion criteria:

Study population aspects:
o As intra-Community trade of vaccinated cattle is not permitted, vaccinated cattle were excluded. Hence, studies in cattle for which information on the vaccination status was not provided were excluded.
o Sheep and goats can be exported from a herd that is either OBF or BF. In the latter case the herd may contain animals that have been vaccinated with the Rev. 1 vaccine provided they were vaccinated before the age of 7 months and sampled after 18 months of age. Hence, as with cattle, studies where the vaccination status of the sheep or goats was unknown were excluded. Also excluded were studies where these animals had not been vaccinated in compliance with the EU Directive. However, data on vaccinated sheep and goats were only utilised to calculate Sp and not for estimation of Se.

Study design elements:
o The sample selection criteria needed to be well-defined and permit identification of the tests used to select samples. The criteria to select samples for evaluation of Se or Sp could e.g. have been based on a combination of two tests as long as it was clear from the study report which tests had been used for this selection. On the other hand, use of a criterion such as two out of four tests positive was considered a basis for exclusion.
o In case of experimental infections: for studies with repeated sampling post infection the following reasoning was applied. It was assumed that infected herds are tested every 6 months and that therefore animals will then have been infected for a period between 0 and 6 months. In addition, it was assumed that most tests will not often detect infections that have occurred less than 2 weeks earlier. Hence, only samples taken between 2 weeks and 6 months after inoculation were retained; and studies with strains not relevant for the epidemiology of brucellosis in the EU were excluded, e.g. experimental infection with the Rev. 1 strain and infection with Brucella ovis.
o For the evaluation of Sp only data on animals from herds reported to be OBF were considered.

The country where the study was conducted, a minimum threshold for Sp, and a minimum number of animals included in the study were considered by the working group as potential selection criteria but none of them were retained. Instead, it was decided that these characteristics would be noted and their potential impact could be evaluated and adjusted for, as needed, in the statistical analysis. Similarly, publications that used a cut-off which was optimized based on the results from the study were not excluded even though the evaluation is intended to assess methods that have an established cut-off. The reason for leaving them in the database for analysis is that the alternative cut-off that was used can then perhaps be evaluated, e.g. to determine whether it could be recommended as an alternative cut-off if the proposed cut-off is not found to be adequate.
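Two of the data-handling rules described above, the reduction of test results to positive or negative and the retention of samples taken between 2 weeks and 6 months after inoculation, can be sketched as follows. This is an illustrative sketch, not the working group's actual data-management code: the function names, the set of labels treated as suspect, and the conversion of "2 weeks" and "6 months" to 14 and 183 days are assumptions.

```python
# Labels treated as "suspect"; the exact wording used in each study is an assumption.
SUSPECT_LABELS = {"suspect", "doubtful", "retest", "intermediate"}

MIN_DPI = 14   # 2 weeks after inoculation
MAX_DPI = 183  # roughly 6 months; the exact day count is an assumption


def dichotomise(result):
    """Map a reported result to 'positive' or 'negative'.

    A suspect result is never counted as positive, so it lowers Se
    and does not lower Sp.
    """
    label = result.strip().lower()
    if label == "positive":
        return "positive"
    if label == "negative" or label in SUSPECT_LABELS:
        return "negative"
    raise ValueError(f"unrecognised result: {result!r}")


def in_window(dpi):
    """True if a sample taken `dpi` days post inoculation is retained."""
    return MIN_DPI <= dpi <= MAX_DPI


def retained(samples):
    """Filter (dpi, result) pairs to the window and dichotomise the results."""
    return [(dpi, dichotomise(result)) for dpi, result in samples if in_window(dpi)]


if __name__ == "__main__":
    samples = [(7, "positive"), (30, "suspect"), (120, "Positive"), (200, "negative")]
    print(retained(samples))
```

Raising an error on unknown labels, rather than silently treating them as negative, keeps unexpected categories visible during data cleaning.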
The data from the qualifying studies were collected by multiple reviewers using a web-based relational database (MySQL) with four major tables in three levels of hierarchy (article, test, Se and Sp estimate). Data management was done using SAS (SAS version 9.1; SAS Institute Inc., Cary, NC, USA). The scientific report provides details of the data management, validation and cleaning processes.

Data from collaborative trial (CCT)

Data from a Community collaborative trial (CCT, also referred to as ring trial in section 2.1.2) on performance evaluation of brucellosis serological tests for sheep and goats were made available to the WG for analysis and consideration. Results indicated as intermediate in the original data were considered as negative for the analysis. The participating laboratories are listed in the scientific report. A panel of 158 field sera provided by some of the participating laboratories was tested. The sera belonged to one of nine distinct groups according to the species and the infection/vaccination status (Table 3). A total of 33 test entities, i.e. different tests or tests with identifiable suppliers, were included in the analysis. The CCT was conducted prior to the establishment of the EFSA WG on brucellosis serology. The objective of the trial was to compare the performance of serological tests for sheep and goats, including the standard tests (CFT and RBT) and new tests, in order to select serological tests that potentially could be included in Annex C of Council Directive 91/68/EC.

Data generated by Member State National Reference Laboratories

The Member State National Reference Laboratories were invited by EFSA to provide published or unpublished study results that merited consideration. Several national reference laboratories from OBF Member States kindly provided results of their national test programmes to permit the evaluation of Sp data on a large number of animals.

Table 3. Nine groups of sera included in the Community collaborative trial for the evaluation of serological tests for brucellosis in sheep and goats.

Group   Species   Infection   Vaccination   Number
1       G         Y           U             5
2       G         N           N             22
3       G         N           R             10
4       S         Y           N             18
5       S         Y           U             15
6       S         N           N             22
7       S         N           R             39
8       S         N           Y             5
9       S         U           U             22

S = sheep, G = goats, N = no, Y = yes, U = unknown, R = recently vaccinated (1-11 months).

6. Methods

6.1. Variables for the meta-analysis

The data used for the meta-analysis were derived from the systematic literature review. Descriptive statistics were generated on the validated dataset to identify variables with many missing data, variables where only one category of entry had been used, variables where some categories contained very few entries, and variables that were highly correlated. The scientific report contains explorative summary analyses of the variables. The variables considered in the various steps of the analysis of either Se or Sp are shown in Table 4.

Each reference population was identified by a unique number. When subsets of the same reference population were used, only those subsets with a size of at least 90% of the largest sample set were assumed to belong to the reference population from which the largest sample set was drawn. A so-called counter parameter was constructed for the data set for Se (and Sp) as the sample-size-weighted mean of all Sp (and Se) estimates observed within the same source document with the same test. The rationale for the counter parameters in a multivariable model is that they theoretically control for the variability that arises through the choice of a cut-off value, which affects the Se and Sp of a given test in opposite directions.

For each publication and reference population only one record for each test was kept in case of repeated Se estimations at different dpi (days post infection). This prevents the weight of a study from being unduly inflated by the number of times the animals were tested. The Se results from a study were averaged across all dpi as described in the scientific report. No repeated measurements over time were observed in the Sp data set (SpDS).
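The construction of the counter parameter and the 90% rule for reference populations described above can be sketched as follows. The record layout (`Estimate`) and the function names are hypothetical and chosen only for this illustration; the actual data management was done in SAS and is documented in the scientific report.

```python
# Illustrative sketch only: the record layout and names are assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Estimate:
    idpub: int    # identifier of the source document
    test: str     # test entity
    value: float  # reported Se or Sp as a proportion
    n: int        # sample size behind the estimate


def counter_parameter(estimates: List[Estimate], idpub: int, test: str) -> Optional[float]:
    """Sample-size-weighted mean of all estimates reported for one test
    within one source document. For the Se data set, pass the Sp estimates
    (and vice versa). Returns None when no matching estimate exists."""
    matching = [e for e in estimates if e.idpub == idpub and e.test == test]
    total_n = sum(e.n for e in matching)
    if total_n == 0:
        return None
    return sum(e.value * e.n for e in matching) / total_n


def same_reference_population(subset_size: int, largest_size: int) -> bool:
    """True if the subset has at least 90% of the size of the largest sample set.
    Integer arithmetic avoids floating-point rounding at the 90% boundary."""
    return 10 * subset_size >= 9 * largest_size
```

As a usage example, two Sp estimates of 0.98 (n = 100) and 1.00 (n = 300) for the same test in the same document give a counter parameter of (98 + 300) / 400 = 0.995 for the corresponding Se record.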

Table 4. Study variables considered as potential covariates for sensitivity and specificity (a).

Description of variable (b)                   Variable name in       Variable name in       Part of study
                                              sensitivity data set   specificity data set   where used (c)
Test entity (T*)                              testpos                testneg                2, 3, 4 (s)
Isotype of immunoglobulin detected (T)        isotypepos             isotypeneg             2, 4 (c)
Biological matrix of the test (T)             matrixpos              matrixneg              2, (3)
Status of standardisation of the test (T)     standardisedpos        standardisedneg        2, 3, 4 (c)
Antigen class of test (T)                     antigenclasspos        antigenclassneg        2
Species (P)                                   speciespos             speciesneg             2, 3, 4 (s,c)
Characterisation of reference population (P)  charactpos             charactneg             2, (3)
Country of origin of the sample (P)           countrypos             countryneg             2, (3)
Vaccination status of the sample (P)          vaccstatpos            vaccstatneg            2, (3)
Vaccination dose (P)                          vaccdosepos            vaccdoseneg            2
Age group (P)                                 agepos                 ageneg                 2, (3)
Percent females in the sample (P)             femalepos              femaleneg              2, (3)
Percent pregnant females in the sample (P)    pregnantpos            pregnantneg            2, (3)
Approach for selection of cut-off (D)         cutoffpos              cutoffneg              2, 3, 4 (c)
Criterion for determining true status (D)     criterionpos           criterionneg           2, (3)
Sampling design of the study (D)              designpos              designneg              2, (3)
Method considered as gold standard (D)        goldpos                goldneg                2, 3, 4 (c)
Days post infection of the sample (D)         Dpi                    not applicable         2, 3
Time between vaccination and testing (D)      Vacctimepos            vacctimeneg            2
Yersinia infection (DP)                       not applicable         yersinia               2, (3)
Sample includes repeated measurements (D)     repeatpos              repeatneg              2
Sample size (D)                               samplepos              sampleneg              2, 3, 4 (o)
Counter parameter (D*)                        cparapos               cparaneg               2
Number of true results (R)                    Truepos                trueneg                (2), (3), 4 (o)
Identification of source document (I)         Idpubpos               idpubneg               implicitly used
Identification of test modification (I)       Idtestpos              idtestneg              implicitly used
Identification of reference population (I)    Idrefpos               idrefneg               implicitly used

(a) The scientific report contains explorative summary statistics for the underlined variables.
(b) It is indicated whether the variable describes the test (T), the study design (D), the reference population (P), a random outcome (R) or serves as identifier (I). Variables marked with an asterisk (*) were constructed from the data.
(c) 2 = EDA, explorative descriptive analysis; 3 = EQUIVAL, equivalence analysis (number in parentheses if the variable is used implicitly via the choice of reference population); 4 = LOGREG, logistic regression modelling, for stratification (s), as potential confounder (c) or as outcome variable (o).

6.2. Statistical analyses

Community collaborative trial

The design of the trial as a multi-centre study led to repeated measurements of the same samples in different laboratories. This feature qualifies the data for separate analysis rather than for combination with the data obtained from the literature review. The detection rates (i.e. the proportion of samples testing positive) were determined for nine different groups of samples (see Table 3) and for each of 33 test entities. As a measure of reproducibility, the probability that two identical test materials (i.e. two aliquots of the same serum sample) sent to two different laboratories will both be given the same result (i.e. both negative or both positive) was considered. This definition is equivalent to the concordance in the methodology of ring trials. A method described by van der Voet and van Raamsdonk (2004) was used, with a modification arising from the fact that only one test result for each serum, test and laboratory was available.

Analysis plan

The analyses of the data set generated through systematic literature review are structured in three modules: EDA, an explorative analysis of the data set; EQUIVAL, a meta-analysis (MA) to summarise evidence for diagnostic equivalence among the tests studied; and LOGREG, an MA based on logistic-regression analysis to (a) summarise the estimates (= model predictions) of Se and Sp for each test and (b) summarise all pairwise differences of these performance parameters. The analyses were conducted on the Se data set (SeDS) and Sp data set (SpDS) established in the project data management phase as described above.
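For the collaborative trial, the concordance defined above (the probability that two aliquots of the same serum sent to two different laboratories receive the same result) can be illustrated with a simple pairwise-agreement estimate. This is a sketch under stated assumptions: it counts agreeing pairs of laboratories per serum and pools them over sera, and it does not reproduce the modified method of van der Voet and van Raamsdonk (2004) used in the opinion. The function name and the input layout are hypothetical.

```python
# Illustrative sketch only: a plain pairwise-agreement estimate, not the
# modified van der Voet and van Raamsdonk method used in the opinion.
from typing import List, Sequence


def pairwise_concordance(results_by_serum: List[Sequence[bool]]) -> float:
    """Proportion of between-laboratory pairs giving the same result.

    results_by_serum holds, for one test entity, one sequence per serum with
    one dichotomised result (True = positive) per laboratory.
    """
    agreeing_pairs = 0
    total_pairs = 0
    for results in results_by_serum:
        n = len(results)
        if n < 2:
            continue  # a single laboratory gives no pair to compare
        k = sum(1 for r in results if r)  # number of positive results
        # Pairs that are both positive plus pairs that are both negative.
        agreeing_pairs += k * (k - 1) // 2 + (n - k) * (n - k - 1) // 2
        total_pairs += n * (n - 1) // 2
    if total_pairs == 0:
        raise ValueError("at least one serum tested by two laboratories is required")
    return agreeing_pairs / total_pairs
```

With four laboratories, a serum found positive by all four contributes six agreeing pairs out of six, whereas a serum found positive by only one laboratory contributes three agreeing pairs out of six.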
All statistical methods described below have been implemented in the statistical package R (8) unless stated otherwise.

(8) R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

EDA: Explorative data analysis of the diagnostic Se and Sp

The objective of the EDA was to describe the reported values of animal-level Se and Sp of standard tests and new tests for cattle, sheep and goats, to assess whether there is evidence for a publication bias and to explore the potential effect of confounding variables. The results are purely illustrative and will not be used for statistical inference. Exact binomial 95% confidence intervals (CI) were calculated for each estimate of Se and Sp using the Clopper and Pearson method implemented in R and were compared graphically.

Funnel plots of sample size against the diagnostic parameters were constructed. Funnel plots and box plots were generated using arcsine-transformed data because this transformation is useful for achieving an approximately normal distribution for proportion data. Presupposing that results from larger studies converge to the true parameter, and if parameter estimates from individual studies are symmetrically distributed, the shape of the plot resembles a funnel with the large opening down and the tip pointing upwards to the true parameter value. This shape is distorted in the presence of publication bias, for example if lower estimates are less likely to be published than higher estimates. Funnel plots are considered a quasi-statistical approach for detecting publication bias in MA.

The potential relationship between covariate values (characteristics of the diagnostic tests, the study populations, or study design factors) and the diagnostic parameters was visualised using scatter plots for continuous variables and box plots for categorical variables. All explorative analyses were conducted on the reduced data set for Se after removal of repeated Se estimates over time on identical reference populations, with the exception of the variable dpi.

EQUIVAL: Meta-analysis of one-sided equivalence tests

The objectives of the EQUIVAL module were (a) to test, for each individual study, the null hypothesis that the animal-level Se and Sp of a new test are inferior to the diagnostic performance of any of the standard tests and (b) to summarise the evidence for such differences into meta-analytical summary measures. Similar comparisons among standard tests were done to identify less performant standard tests. As Sp and Se were evaluated separately and tests for cattle and small ruminants were considered separately, these analyses were conducted on four different datasets.

In the EQUIVAL module, all possible pairs of tests evaluated on one reference population were compared using the statistical approach of non-inferiority (or one-sided equivalence) testing with regard to Se and, in a separate analysis, Sp, using a modified approach based on Chen et al. The analysis was restricted to those pairs of tests that had actually been compared in the source document and on the same reference population ("matching approach"). Criteria for selection of Se (and Sp) estimates for analysis were defined using the status of the test and design characteristics (standardised test protocol, cut-off value defined by the SOP, bacteriology used as gold standard). The meta-analytical summary measure of the difference between the Se (and Sp) of two tests was obtained using the inverse-variance method. If the CI of the meta-analytical difference measure for a given pair of tests included zero, it was concluded that there was no evidence that one of the two tests was inferior to the other.
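The inverse-variance summary of a difference between two tests can be sketched as below. This is a simplified illustration under several assumptions that are not taken from the opinion: each study contributes the difference of two proportions with a Wald-type variance that treats the two samples as independent (the pairing of tests on the same reference population is ignored), a continuity correction of 0.5 is applied so that proportions of 0 or 1 do not yield a zero variance, and a two-sided 95% normal interval is reported. The modified approach based on Chen et al. is not reproduced here.

```python
# Illustrative sketch only: fixed-effect inverse-variance pooling of
# differences between two proportions under the assumptions stated above.
import math
from typing import List, Tuple

# (x1, n1, x2, n2): correct results and sample size for test 1 and test 2.
Study = Tuple[int, int, int, int]


def corrected_proportion(x: int, n: int) -> Tuple[float, float]:
    """Proportion and its variance with a 0.5 continuity correction (assumption)."""
    p = (x + 0.5) / (n + 1)
    return p, p * (1 - p) / (n + 1)


def pool_differences(studies: List[Study], z: float = 1.96) -> Tuple[float, float, float]:
    """Return the pooled difference (test 1 minus test 2) and its CI limits."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x1, n1, x2, n2 in studies:
        p1, v1 = corrected_proportion(x1, n1)
        p2, v2 = corrected_proportion(x2, n2)
        weight = 1.0 / (v1 + v2)  # inverse of the variance of the difference
        weighted_sum += weight * (p1 - p2)
        weight_total += weight
    pooled = weighted_sum / weight_total
    half_width = z * math.sqrt(1.0 / weight_total)
    return pooled, pooled - half_width, pooled + half_width
```

Under the decision rule stated above, a pooled interval that includes zero would be read as no evidence that one of the two tests is inferior to the other.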
The impact of potential confounding variables was addressed through the matching approach and through the study selection criteria that were chosen (see the scientific report for details of the three selection rules). The use of the least performant standard test as benchmark for comparison requires identification of this test. Therefore, all possible combinations of tests were considered. All test principles were included as separate entities in the comparison, except for the cELISA and iELISA, where each identifiable supplier was considered individually and all generic or in-house versions were considered jointly. The worst of the standard tests was not used as benchmark for comparison if this test was in fact inferior (less sensitive or less specific) to all other standard tests.

LOGREG: Multi-variable analysis of Se and Sp

The objective of the LOGREG module is to obtain meta-analytic (MA) summary estimates of the Se and Sp for all new tests and standard tests for cattle and small ruminants. Moreover, MA summary estimates of the differences in these parameters are obtained for all combinations of tests for cattle and small ruminants. The MA summaries are based on the data of the systematic literature review. The Se and Sp of the tests are investigated as outcome variables while adjusting for important confounding variables using logistic regression (LR) analysis. The impact of important potential confounders was thus addressed through statistical adjustment.

An innovative element of the LOGREG module lies in the use of the Markov Chain Monte Carlo (MCMC) technique for generating probability distributions for the coefficients of the logistic models. These probability distributions are then used to generate the posterior probability distributions of Se and Sp via the inverse logit link. From the posterior distributions of Se and Sp, the meta-analytical difference statistic is simulated through Monte-Carlo sampling.
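The chain of steps described above (MCMC draws of a logistic coefficient, transformation by the inverse logit, and Monte-Carlo simulation of a difference) can be illustrated with a deliberately small sketch. It is not the model used in the opinion: it assumes an intercept-only logistic model per test, a flat prior on the coefficient, a random-walk Metropolis sampler, and hypothetical counts, function names and margin `delta`. The opinion's models additionally adjust for covariates such as the gold standard.

```python
# Illustrative sketch only: intercept-only logistic model with a flat prior,
# sampled by random-walk Metropolis. All names and settings are assumptions.
import math
import random
from typing import List


def inv_logit(beta: float) -> float:
    """Inverse logit link: maps a logistic coefficient to a proportion."""
    return 1.0 / (1.0 + math.exp(-beta))


def log_likelihood(beta: float, x: int, n: int) -> float:
    """Binomial log-likelihood (up to a constant) of x successes out of n
    when the success probability is inv_logit(beta)."""
    return x * beta - n * math.log1p(math.exp(beta))


def metropolis(x: int, n: int, rng: random.Random, iterations: int = 6000,
               burn_in: int = 1000, step: float = 0.3) -> List[float]:
    """Posterior draws of beta under a flat prior (posterior proportional to likelihood)."""
    beta = 0.0
    current = log_likelihood(beta, x, n)
    draws = []
    for i in range(iterations):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, x, n)
        # Accept with probability min(1, likelihood ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta)
    return draws


def prob_difference_at_most(draws_1: List[float], draws_2: List[float],
                            delta: float) -> float:
    """Proportion of iterations in which D = Se_1 - Se_2 is at most delta."""
    diffs = [inv_logit(a) - inv_logit(b) for a, b in zip(draws_1, draws_2)]
    return sum(1 for d in diffs if d <= delta) / len(diffs)
```

A possible use is to call `metropolis(180, 200, rng)` and `metropolis(176, 200, rng)` with `rng = random.Random(2006)` for two hypothetical tests and then pass both sets of draws to `prob_difference_at_most` with a chosen margin such as `delta = 0.05`.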
The distribution of the differences (D) was used for the model-based statistical inference about equivalence. The proportion of iterations in which D is less than or equal to the critical value δ is in fact the probability (confidence) that D ≤ δ. This approach is possible in the Bayesian framework, which allows prior knowledge about a parameter (prior information) to be combined with information derived from the data to generate the posterior probability distribution of the parameter. For the reported analyses, only non-informative priors have been used.

Similar to the EQUIVAL approach, the choice of the gold standard method was considered the most important covariate for the assessment of Se and Sp. For the analysis of Se, two levels for the gold standard were considered: bacteriology (as reference level) and criteria other than bacteriology. For the analysis of Sp, epidemiology (as reference level) and criteria other than epidemiology were used. Note that the reference levels of these variables are those that were used for the strict selection rule in EQUIVAL. The EDA gave clear evidence for a potentially strong effect of the presence of YO9 infection on the specificity estimate for tests evaluated for cattle. Hence, for the cattle/specificity model, this variable was included along with the gold standard in the LR models. Data pertaining to cattle and small ruminants were modelled separately.

The LR can be used as a prediction model of Se and Sp for a defined set of covariates. This was exploited to obtain model-based estimates (in the report also referred to as predictions) of the Se and Sp for the level of each covariate that is preferred for interpretation. The preferred levels for the confounder gold standard were bacteriology for the analysis of Se and epidemiology for the analysis of Sp. The preferred level for the covariate related to Yersinia infection in non-infected animals was "not infected". Simple LR models were also fitted for sensitivity and specificity separately, for all tests under consideration, with other explanatory variables, but the results were beyond the scope of the mandate. However, these results are of interest for generating new hypotheses about potential confounders of the diagnostic sensitivity and specificity of serological tests for brucellosis.

7. Results and discussion

7.1. Community collaborative trial for the evaluation of serological tests for brucellosis in sheep and goats

The detection rates (i.e. the proportion of positive results) were compared among the tests for all of the nine groups of samples investigated in the Community collaborative trial (CCT). The summary results were combined for all laboratories (Table 5). The results are presented in detail in the scientific report. Here, the results of group 2 (22 non-infected, non-vaccinated goats) are shown as an example. The detection rate was zero for all standard tests except the generic version of the RBT (line 30 in Table 5), where a mean of 1.1% was observed across all laboratories. For new tests, the detection rate was zero as well, except for the FPA (line 10), with 4.5% in one laboratory and a mean of 0.9% across all laboratories. Individual laboratories reported a rate of 9% for mprbt (line 28), MRBT (line 29) and RBT (line 30), and a rate of 31.8% for iELISA H (line 24). The corresponding mean detection rates for all laboratories were 0.5%, 1.1%, 1.1% and 15.9%, respectively.

The expected detection rate is close to zero for non-infected animals and close to 100% for infected animals. None of the tests evaluated reached these performance goals in all groups of the serum panel and all laboratories. For the cELISAs, no substantial differences among the different versions (suppliers) were found. Comparing their detection rates with those for standard tests, no important differences were found. The FPA was evaluated in three distinguishable variants. The results for different groups of sera suggest that the version FPA (line 10) shows more variability among laboratories than the two other variants (lines 11, 12). The iELISA results show some variation in the detection rate depending on the variant of the test. Two modifications of the RBT (mprbt and MRBT) showed variable detection rates among the laboratories.
Unequivocally positive results with infected sheep and goats (vaccination status unknown) were obtained with the mmp (line 27) and the MRBT (line 29). The mprbt (line 28) showed more variable outcomes.