Use of whole genome sequencing of mycobacteria as a first line diagnostic clinical service in England. Tim Peto University of Oxford


1 Use of whole genome sequencing of mycobacteria as a first line diagnostic clinical service in England. Tim Peto University of Oxford

2 Disclosures: No financial conflicts of interest. Funding: NIHR Oxford Biomedical Research Centre; National Institute for Health Research; Wellcome Trust; Gates Foundation; Medical Research Council; Public Health England

3 Modernising Molecular Microbiology, Oxford: Large-scale whole genome sequencing of systematically collected clinical isolates. Aim: to introduce WGS into routine clinical service. Main pathogens studied to date: Clostridium difficile, Mycobacteria, Staphylococcus aureus, Enterobacteriaceae, Gonococcus

4 Whole genome sequencing applications: Single pathogen-agnostic platform. Species identification. In silico antimicrobial resistance testing and virulence factor identification. Relatedness: local, regional and national transmission; strain-based surveillance; tracking disease reservoirs

5 What do we need for WGS? Sequencing platform; computer hardware; database to store data; software to analyze the data; knowledge base to interpret the data; reporting system; accreditation for clinical service

7 Official PHE launch of WGS-based mycobacterial diagnostics in England on World TB day, 2017

8 Advantages of Whole Genome Sequencing: Promises to be quicker, better and cheaper than conventional techniques (Pankhurst et al., Lancet Infect Dis 2016)

9 Workflow for sequencing: Sample collection; primary culture; extract DNA; prepare DNA, including tagging with unique nucleotides so that samples can be pooled; load sample into sequencing machine (local)
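
Illustrative sketch (not from the slides): the tagging step can be pictured as giving every read a short sample-specific index, so that pooled reads can be separated again after sequencing. The Python below is a toy that assumes the index sits at the start of each read; the function name `demultiplex`, the 4-base tags and the sample names are hypothetical, and on a real instrument this step is handled by the vendor software.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_len):
    """Group pooled reads by the index (tag) found at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:index_len], read[index_len:]
        # Reads whose tag matches no known sample are kept aside, not discarded.
        sample = index_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical pooled reads: 4-base tag followed by the sample's own DNA.
pooled = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCGGAA", "GGGGAAAATT"]
tags = {"ACGT": "sample_1", "TGCA": "sample_2"}

for sample, inserts in demultiplex(pooled, tags, index_len=4).items():
    print(sample, inserts)
```

Keeping an explicit "unassigned" bin makes it easy to see how many reads were lost to tag errors.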

10 Sequencing platforms: Two flavours, short read and long read. Short read (~300 bp): hard to assemble. Long read strand sequencer (30,000 bp): high error rate. Platforms: Ion Proton, MiSeq, PacBio RS II, MinION (will it work?)

11 Sequencing: Load sample (with 10 others + 1 control) into Illumina MiSeq machine in local lab. Machine runs for hours to get bp read lengths. Results sent by internet to sequencing center via cloud (currently BaseSpace)

12 Current Oxford Software Solution: COMPASS (not-for-profit). Automatically linked pipeline (Taverna). Speciation (Kraken, Centrifuge); remove human DNA. Fast detection of resistance and virulence determinants (Mykrobe). Mapping against reference for accurate detection of point mutations (BWA/Stampy) and obtaining near-complete sequences. [De novo assembly for novel genetic insertions and mobile elements (SPAdes/Velvet)]. Comparison of sequences with other isolates. Characterization of transmission chains

13 Resources for Pipeline (IT): Hardware intensive. Currently in Public Health England for England service; copy in Oxford for research and back-up for PHE. 10 servers; 70 cores for sequencing; 1 x 125 GB RAM server for Kraken speciation; 3 x 125 GB RAM servers for distance matrices

14 WGS TB report: Automated report created by our open source software (COMPASS pipeline). Contents: Mycobacterium species; resistance prediction; nearest neighbour description of transmission chains
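
Illustrative sketch (not the COMPASS implementation): a nearest-neighbour description can be thought of as counting single-nucleotide differences between an isolate and previously sequenced isolates, then listing those within a calibrated distance. The Python below assumes sequences already mapped to the same reference (equal length, `N` for uncalled bases); the function names, the toy sequences and the `max_snps` cut-off are hypothetical.

```python
def snp_distance(seq_a, seq_b):
    """Count differing positions, skipping any position where either base is uncalled (N)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be mapped to the same reference")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b and "N" not in (a, b))


def nearest_neighbours(query, database, max_snps):
    """Return (name, distance) pairs within max_snps of the query, closest first."""
    hits = [(name, snp_distance(query, seq)) for name, seq in database.items()]
    return sorted((h for h in hits if h[1] <= max_snps), key=lambda h: h[1])


# Hypothetical toy data; real genomes are millions of bases long.
database = {"iso1": "ACGTACGA", "iso2": "ACNTACGT", "iso3": "TTTTACGT"}
print(nearest_neighbours("ACGTACGT", database, max_snps=1))
```

The cut-off would in practice come from the calibration against epidemiologically defined controls mentioned on the Knowledge Base slide.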

17 Relatedness

18 Resources: Knowledge Base. Catalogue of sequenced species for Kraken: international archive; locally sequenced Mycobacterium species. Catalogue of resistance determinants: validated data for each species-drug combination. Transmission chains: collection of previously sequenced isolates; calibration of genetic distances against epidemiologically defined controls

19 TB Species: 168 typed reference Mycobacterium sequenced. Software developed to speciate TB samples using Mykrobe software (Bradley et al., Nature Communications 2015). Software run on 5348 samples, giving the following top species:

20 Consecutive samples from Birmingham, UK
    Species          Count    Species         Count
    tuberculosis     3,812    fortuitum          33
    bovis               70    peregrinum         27
    africanum           21    xenopi             22
    intracellulare     362    mucogenicum        18
    avium              322    lentiflavum         7
    abscessus          257    paraffinicum        7
    gordonae           174    simiae              7
    chelonae           107    marinum             5
    chimaera            62    porcinum            5
    malmoense           48    szulgai             5
    kansasii            34

21 Validation of Mycobacterium Speciation: 1936 consecutive strains were compared between conventional genotyping (LPA, Hain) and WGS (Quan et al., J Clin Micro, Feb 2018). Discordants: clinical samples 15/1825 (<1%) (no LPA-identified TB complex was discordant); rare species 29/40; mixtures 5/23

22 Species, causes of discrepants: WGS catalogue is bigger than the Hain catalogue: 30 cases; most common examples M. llatzerense (11), M. ratisbonense (5). Problems with LPA identifying M. fortuitum. WGS speciating TB (2), M. avium (1). Typed strain of M. tomidae probably wrong. Need to further study the classification of rarer clinical Mycobacterium species

23 Prediction of resistance, current techniques: Conventional phenotyping: slow and expensive; error rates are unclear. Molecular testing for key mutations using LPA: different strips for different groups of drugs

24 WGS resistance prediction: Lists all relevant mutations in one step; potentially gives resistance information on all drugs; provides information on mixed strains; is able to report the quality of the sample

25 How good is the data? Identification of the catalogue of resistance-determining variants; interpretation of the results; validation of the results; determination of the error rates

26 Development of catalogue of resistance determinants: Walker et al Lancet Inf Dis : 1202

27 Development of catalogue of resistance determinants: Literature review showed the likely candidate genes determining resistance for each drug

28 Development of catalogue of resistance determinants: Literature review showed the likely candidate genes determining resistance for each drug: 16 genes

29 Strategy for identifying variants: TB isolates have little variation. Sequencing genes in TB shows that most isolates have only 0 or 1 mutation compared to the wild type. This was exploited to create a simple algorithm to identify resistance-determining variants
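
Illustrative sketch (an assumption about the general idea, not the published algorithm): if most isolates carry at most one mutation in the candidate genes, then an isolate with exactly one mutation links that mutation directly to the isolate's phenotype. The Python below tallies such "solo" mutations; the function name, the rule for mixed evidence and the labels `geneA_m1` and `geneB_m2` are hypothetical, while katG S315T is taken from a later slide.

```python
from collections import Counter


def classify_solo_mutations(isolates):
    """isolates: iterable of (set_of_mutations, phenotype), phenotype 'R' or 'S'.

    Only isolates with exactly one mutation contribute evidence.
    """
    seen_r, seen_s = Counter(), Counter()
    for mutations, phenotype in isolates:
        if len(mutations) != 1:
            continue  # wild type or multiple mutations: not attributable here
        (mutation,) = mutations
        (seen_r if phenotype == "R" else seen_s)[mutation] += 1

    calls = {}
    for mutation in set(seen_r) | set(seen_s):
        if seen_r[mutation] and not seen_s[mutation]:
            calls[mutation] = "resistant"
        elif seen_s[mutation] and not seen_r[mutation]:
            calls[mutation] = "sensitive"
        else:
            calls[mutation] = "uncharacterised"  # conflicting evidence
    return calls


toy = [
    ({"katG_S315T"}, "R"), ({"katG_S315T"}, "R"),
    ({"geneA_m1"}, "S"),
    ({"geneB_m2"}, "R"), ({"geneB_m2"}, "S"),
    (set(), "S"),
    ({"katG_S315T", "geneA_m1"}, "R"),
]
print(classify_solo_mutations(toy))
```

A real catalogue would also need minimum counts and a way to handle isolates with several mutations; both are left out here.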

30 Step 1: Creation of resistance catalogue. Drugs: INH*, RIF*, EMB*, PZA*, QUI, STREP, AG. 16 candidate genes explored for resistance mutations to 4 front-line drugs
    Training set (for Catalogue 1), sequenced and phenotyped       N
    Midlands, UK (unselected)                                   1122
    Midlands, UK (outbreaks)                                     412
    Midlands, UK (resistant isolates)                             94
    Oxfordshire, UK (unselected)                                 338
    Gauteng province, South Africa                                54
    Kenema district, Sierra Leone                                 79
    TOTAL                                                       2099

31 Mutations identified in relevant genes (CAT 1) Catalogue I (2099 strains) Resistant Sensitive INH RIF EMB PYR 35 55

32 Step 2: Creation of resistance catalogue. Validation with 1552 new strains
    Validation set (CAT 1), sequenced and phenotyped       N
    Hamburg, Germany                                     841
    Nukus, Uzbekistan                                    261
    Gauteng province, South Africa                       450
    TOTAL                                               1552

33 Validation test (catalogue I) Pheno RES Pheno SENS Drug total R S U R S U sens spec U (%) INH RIF EMB PYR

34 Validation test (catalogue I) Pheno RES Pheno SENS Drug total R S U R S U sens spec U (%) INH RIF EMB PYR Very major error

35 Validation test (catalogue I) Pheno RES Pheno SENS Drug total R S U R S U sens spec U (%) INH RIF EMB PYR Very major errors (false negatives); major errors (false positives)
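
Illustrative sketch (the slide's numbers did not survive transcription): the columns of the validation table can be related as follows, with predictions of R, S or U (unclassified) cross-tabulated against phenotype. The Python below assumes that U predictions are excluded from the sensitivity and specificity denominators, which the slides do not state; the function name and all counts are hypothetical.

```python
def validation_metrics(res_r, res_s, res_u, sus_r, sus_s, sus_u):
    """Arguments are counts: phenotype (res/sus) crossed with prediction (r/s/u)."""
    total = res_r + res_s + res_u + sus_r + sus_s + sus_u
    return {
        # Assumption: unclassified (U) predictions are left out of both ratios.
        "sensitivity": res_r / (res_r + res_s),
        "specificity": sus_s / (sus_r + sus_s),
        "unclassified_pct": 100 * (res_u + sus_u) / total,
        "very_major_errors": res_s,  # phenotypically resistant, predicted S
        "major_errors": sus_r,       # phenotypically susceptible, predicted R
    }


# Made-up counts for one drug, for illustration only.
print(validation_metrics(res_r=90, res_s=10, res_u=5, sus_r=4, sus_s=396, sus_u=20))
```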

36 Step 3: Creation of resistance catalogue 2. Use the validation set, together with the derivation set, to improve the catalogue. 102 extra mutations classified as resistant to make a new catalogue; other mutations were classified as sensitive. In addition, other mutations were included: LPA / Xpert mutations; mutations from a systematic review of the literature by P. Miotto et al for ReSeqTB (Eur Respir J 2017); pncA mutations (Yadon A et al. A comprehensive characterization of pncA polymorphisms that confer resistance to pyrazinamide. Nat Comms 2017;:1-10); any frame-shift mutation in a non-essential gene (pncA, katG); fabG1 L203 synonymous mutation

37 Mutations identified in relevant genes. Columns: Catalogue v1 (2099 strains): Resistant, Sensitive; Catalogue v2 (+1552 strains + literature search): Resistant, Sensitive. Rows: INH, RIF, EMB, PYR

38 Step 4 Testing of Catalogue 2

39 CRyPTIC (Comprehensive Resistance Prediction for Tuberculosis: an international consortium): Whole genome sequencing a global collection of ~ M. tuberculosis genomes; >30,000 strains will also be phenotyped to 13 drugs. Aim: to define the M. tuberculosis resistome

40 Sample collection

41 Results from the first strains: Drug resistance and sensitivity predicted from genotype and validated against conventional phenotype testing. First-line drugs: isoniazid, rifampicin, ethambutol, pyrazinamide

42 Strains phenotyped (4 drugs). Columns: Country; Total; All sens; INH-R, RIF-S; INH-S, RIF-R; INH-R, RIF-R; Other. Rows: Australia, Belgium, Canada, China, Germany, Italy, NL, Pakistan, Peru, Russia, Serbia, S Africa, Spain, Swazi, Thailand, UK, Total

43 Testing of catalogue 2: Do the extra mutations decrease the number of uncharacterized mutations that have not been assessed?

45 Testing of catalogue 2: Do the extra mutations decrease the number of uncharacterized mutations that have not been assessed? Do they increase sensitivity at the expense of specificity?

47 For all isolates Resistant phenotype, n (%) Susceptible phenotype, n (%) R S U F Total R S U F Total INH RIF ETH PYR TOTAL Res PPV NPV Sensitivity Specificity No prediction (%) (%) (%) (%) (%) (%) INH RIF ETH PYR

48 For all isolates Resistant phenotype, n (%) Susceptible phenotype, n (%) R S U F Total R S U F Total INH RIF ETH PYR TOTAL. False negatives (very major errors): 90% had no mutations in relevant genes; 9% had only sensitive mutations; 1 new pncA mutation, found in 2 PZA-resistant strains, is probably a true resistance mutation

49 For all isolates Resistant phenotype, n (%) Susceptible phenotype, n (%) R S U F Total R S U F Total INH RIF ETH PYR TOTAL. False negatives (very major errors): 90% had no mutations in relevant genes; 9% had only sensitive mutations; 1 new pncA mutation, found in 2 PZA-resistant strains, is probably a true resistance mutation. False positives (major errors): on review they had mutations that regularly predicted resistance in other isolates

50 Repeat testing: 22 discordant isolates were repeat tested; 19/22 were successfully tested; 18 were reconciled; 1 had intermediate MICs with unstable phenotyping. Most discordants were due to known intermediate sensitivities or to clerical errors

51 Estimation of clerical errors: katG S315T and rpoB S450L are common mutations that invariably cause resistance to INH and RIF respectively. The discordance rates for these mutations varied between the participating sites and were used to estimate the site-specific clerical error rate. This rate was used to estimate the proportion of false negatives that could be attributed to labelling errors
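
Illustrative sketch (a simplification, not the consortium's method): if a sentinel mutation such as katG S315T essentially always causes resistance, then isolates that carry it but were phenotyped as susceptible point to a labelling or laboratory error. The Python below treats the per-site discordance rate among sentinel carriers as a proxy for that site's clerical error rate; the function name, site names and counts are hypothetical.

```python
from collections import Counter


def sentinel_discordance(records):
    """records: iterable of (site, phenotype) for isolates carrying a sentinel mutation.

    Returns, per site, the fraction phenotyped 'S' despite the sentinel mutation.
    """
    totals, discordant = Counter(), Counter()
    for site, phenotype in records:
        totals[site] += 1
        if phenotype == "S":
            discordant[site] += 1
    return {site: discordant[site] / totals[site] for site in totals}


# Made-up data: site_A has 1 discordant in 100 carriers, site_B has 2 in 50.
records = ([("site_A", "R")] * 99 + [("site_A", "S")]
           + [("site_B", "R")] * 48 + [("site_B", "S")] * 2)
print(sentinel_discordance(records))
```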

52 Estimated labelling errors: Overall labelling error rate about 0.8%, varying between sites. The effect of labelling errors on discordance depends on the prevalence of resistance. About 40% of isoniazid and 16% of rifampicin resistance missed by genotyping might be explained by labelling errors

53 Real life performance: What are the Major Error and Very Major Error rates for genomics using Catalogue II? Use consecutive results from UK, Italy, Germany and Italy (c. 4000 strains). Columns: Drug; Genotype Resistance (Pheno R, Pheno S, False + (%)); Genotype Sensitive (Pheno R, Pheno S, False - (%)). Rows: INH, RIF, EMB, PZA

56 Can we predict 4-drug-sensitive profiles? It is well known that resistance to one drug predicts resistance to another drug. This was seen in the 10,000 strains.

57 Can we predict 4-drug-sensitive profiles? It is well known that resistance to one drug predicts resistance to another drug. This was seen in the 10,000 strains. We postulated that genomics would predict 4-drug sensitivity accurately enough to be useful clinically.

58 Genotypic predicted results (any drug resistance / 4-drug sensitive) against phenotypic results (4-drug sensitive / any phenotypic resistance). Total (2.8%) 2665 (35%). Very major error rate: 2.8% at a prevalence of 35%; 1% at a prevalence of 10%. The error rate depends on the prevalence of resistance
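
Illustrative sketch (not from the slides): the prevalence dependence follows from Bayes' rule if the very major error rate is read as the share of isolates predicted fully sensitive that are in fact resistant. The Python below makes that assumption explicit; the sensitivity and specificity values are hypothetical and are not meant to reproduce the slide's figures.

```python
def missed_resistance_rate(sensitivity, specificity, prevalence):
    """Fraction of 'predicted sensitive' isolates that are phenotypically resistant.

    Equivalent to 1 - NPV for a test with the given sensitivity and specificity.
    """
    false_negatives = prevalence * (1 - sensitivity)
    true_negatives = (1 - prevalence) * specificity
    return false_negatives / (false_negatives + true_negatives)


# Hypothetical test characteristics, evaluated at two resistance prevalences.
for prevalence in (0.35, 0.10):
    rate = missed_resistance_rate(sensitivity=0.95, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: missed resistance {rate:.2%}")
```

With sensitivity and specificity held fixed, the rate falls as prevalence falls, which is the qualitative pattern the slide reports.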

59 Conclusions: WGS gives acceptable results compared to conventional phenotyping. Unknown mutations are probably rare. Results are given as R, S and U. Discrepant results are being repeat tested. Laboratory or clerical errors in conventional phenotyping occur at about the same rate as errors in WGS predictions of sensitive strains for the 4 first-line drugs

60 Birmingham TB Lab Performance, week to Jan 8th 2018
    DNA for extraction                               101 (7 failed)
    Total MiSeq runs                                 10 (0 failed)
    Samples sequenced and reported                   135
    Failed to identify species                       0
    No. with TB complex                              41
    Failed resistance (at least 1/4 main drugs)      1
    Failed rifampicin                                0
    Turn round time                                  7 days
    Pipeline turn round                              0.5 days

61 Current work, CRyPTIC Collaboration I: Measure MICs for 13 common drugs. Compare the results of WGS on 13 drugs on c. 40,000 extra strains. Drugs: isoniazid, ethionamide; rifampicin, rifabutin; ethambutol; amikacin, kanamycin; moxifloxacin, levofloxacin; linezolid; delamanid; bedaquiline, clofazimine

62 Current work, CRyPTIC II: Widen the repertoire of drug-resistance predictions. For front-line drugs: improve prediction accuracy; identify intermediate levels of resistance; reduce the number of unknown mutations

63 Microtitre plate for semiquantitative DST to 14 drugs

65 Reading of Thermofisher plates: Human reading of a digital image. Digital image can be stored. Digital analysis of results avoids human error. Crowd-sourced reading (digital v human): Bashthebug.net (please try it!). Each drug is read by 15 different readers via the internet. 600,000 drug MICs have so far been successfully read
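
Illustrative sketch (the slides do not say how the 15 readings are combined): one simple way to turn several independent readings of the same drug into a single call is to take their median. The Python below assumes each reading is the index of the first well with no growth in a dilution series; the function name, the `min_readers` default and the example readings are hypothetical.

```python
from statistics import median_low


def consensus_reading(readings, min_readers=15):
    """Return the median reading, or None if too few readers have classified the image.

    median_low always returns one of the submitted values, never an average of two.
    """
    if len(readings) < min_readers:
        return None
    return median_low(readings)


# Fifteen made-up readings (well indices) for one drug on one plate.
readings = [3, 3, 4, 3, 2, 3, 3, 5, 3, 3, 4, 3, 3, 2, 3]
print(consensus_reading(readings))
```

A median is robust to a few careless readers, which is the main attraction of collecting many readings per drug.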

66 Future work: Public Health England has already introduced routine WGS into clinical practice and stopped routine phenotyping for fully sensitive strains. We plan to: widen drug resistance reporting to other drugs as soon as CRyPTIC results validate WGS prediction models; develop techniques for WGS from sputum without culture; develop the use of newer sequencing platforms for quicker, cheaper and easier sequencing (e.g. Oxford Nanopore); develop open-source software solutions that can be accessed worldwide

67 Acknowledgements. Oxford, UK: Derrick Crook, Tim Walker, Sarah Walker, David Clifton, Danny Wilson, Philip Fowler, Clara Grazian, Yang Yang, Jessica Hedge, Zam Iqbal, Phelim Bradley, Ana Gibertoni Cruz, Sarah Hoosdally, Carlos Del Ojo Elias, Tanya Golubchik. PHE Birmingham, UK: Grace Smith + team. Brighton, UK: John Paul, Kevin Cole. Leeds, UK: Mark Wilcox, Deborah Gascoyne-Binzi. National Institute for Communicable Diseases, South Africa: Nazir Ismail, Shaheed Valley Omar. Forschungs Institute Borstel, Germany: Stefan Niemann, Thomas Kohl, Matthias Merker. Genoscreen, Lille: Philip Supply. Serbia: Irena Zivanovic. Pakistan TB control programme: Sabira Tahseen. Mumbai, India: Nerges Mistry, Camilla Rodrigues, Anirvan Chatterjee, Kayzad Nilgiriwala. China / CDC China: Guangxue He, Qian Gao, Yanlin Zhao, Joy Flemming, Baoli Zhu. Sydney, Australia: Vitali Sinchenko. Vancouver, Canada: Jennifer Gardy. London / Russia: Francis Drobniewski. Valencia, Spain: Iñaki Comas. Thailand / Singapore: Ong Twee Hee. San Rafaele, Milano: Daniela Cirillo, Paolo Miotto, Andrea Cabbibe, Maria Rosaria De Filippo, Lele Borroni. RIVM, Netherlands: Dick van Soolingen, Han de Neeling. Harvard Medical School: Maha Farhat. LSHTM / Peru: David Moore, Loui Grandjean. OUCRU, Vietnam: Guy Thwaites, Thuong Nguyen Thuy Thuong