Sequence quality: GMI Proficiency Tests for Whole Genome Sequencing of bacteria


1 Sequence quality: GMI Proficiency Tests for Whole Genome Sequencing of bacteria
Presented by Pimlapas Leekitcharoenphon (Shinny) (DTU Food)
Research Group of Genomic Epidemiology, National Food Institute, Technical University of Denmark
EURL-AR Training course /09/17

2 GMI (

3 Objectives of GMI PT
The main objective of the annual proficiency test (PT) is to facilitate the production of reliable laboratory results of consistently good quality within the area of whole genome sequencing (WGS) by:
- Selecting two strains of each of three species of public health importance
- Selecting species that range in sequencing difficulty
- Assessing sequencing quality based on a set of quality markers (e.g. N50, number of contigs), but also the ability to identify epidemiological markers such as MLST and resistance genes
- Identifying underperforming participants
- Facilitating harmonization and standardization of whole genome sequencing and data analysis
- Setting tentative, arbitrary quality control thresholds

4 Objectives of GMI PT
- To quantify differences among laboratories in order to facilitate the development of reliable laboratory results of consistently good quality within the area of DNA preparation, sequencing, and analysis (e.g. phylogeny).
- To facilitate harmonization and standardization in whole genome sequencing and data analysis.

5 Structure of GMI PT, wet-lab
- Component 1a
  Material provided: bacterial cultures (lyophilized)
  DNA extraction and purification, library preparation, and whole-genome sequencing of six bacterial cultures
- Component 1b
  Material provided: purified DNA (pre-prepared, dried)
  Library preparation and whole-genome sequencing of the same six bacterial cultures
- Results submission
  Reads (via a portal or FTP site)
  Survey response
  Method details
  MLST (optional)
  Resistance genes (optional)

6 Development of GMI PT
- 2014: pilot PT
- 2015: full roll-out with Salmonella (2), E. coli (2), S. aureus (2)
- 2016: K. pneumoniae (2), L. monocytogenes (2), C. coli (1), C. jejuni (1)

7 Participation in the 2016 GMI PT
46 laboratories in 22 countries provided data for at least one of the PT components:
Australia (3), Austria, Belgium (2), Canada (2), Denmark (3), Finland, France, Germany (3), Hong Kong, Italy (7), Latvia, Luxembourg, Mexico, the Netherlands (3), Poland, Portugal, Singapore (2), Sweden (2), Switzerland, Taiwan, the United Kingdom (2), the United States (6)

8 Measured QC parameters
- Number of reads mapped to the reference (total DNA sequence, reference chromosome, reference plasmids #1, #2, and #3) and number of unmapped reads
- Proportion of reads mapped to each of the above
- Depth of coverage of each of the above
- Size of assembled genome
- Size of assembled genome per total size of DNA sequence (%)
- Total number of contigs
- Number of contigs > 200 bp
- N50
- NG50
DTU Food, Technical University of Denmark, 24 September 2017
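Depth of coverage can be approximated per replicon as the number of mapped bases divided by the replicon length. The Python sketch below assumes every mapped read aligns over its full mean read length (ignoring clipping, indels, and duplicates); the function names and any numbers are hypothetical illustrations, not the PT's actual pipeline.

```python
from typing import Dict


def approx_depth(mapped_reads: int, mean_read_length: float, replicon_length: int) -> float:
    """Approximate mean depth: (mapped reads x mean read length) / replicon length.

    Hypothetical simplification: each mapped read is assumed to align over its full length.
    """
    if replicon_length <= 0:
        raise ValueError("replicon_length must be positive")
    return mapped_reads * mean_read_length / replicon_length


def depth_per_replicon(
    mapped: Dict[str, int], lengths: Dict[str, int], mean_read_length: float
) -> Dict[str, float]:
    """Approximate depth for each replicon (chromosome, plasmids, ...) named in `lengths`."""
    return {
        name: approx_depth(mapped.get(name, 0), mean_read_length, length)
        for name, length in lengths.items()
    }
```

For example, 1,000,000 reads of 150 bp mapped to a 1,600,000 bp chromosome give an approximate depth of 93.75x. Replicons with no mapped reads simply report a depth of 0.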

9 Individual participants' reports
Pending for the 2016 trial.

10 QC parameters output
Legend: resistance gene as expected; resistance gene partly as expected; resistance gene not as expected. Thresholds are shown at 2 times and 3 times the standard deviation.
Data from participants with obvious errors will be omitted prior to analysis.
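The 2 SD and 3 SD thresholds can be reproduced by comparing each participant's value with the mean of all participants. A minimal sketch, assuming outliers are judged on absolute deviation from the mean using the sample standard deviation (the slides do not say which variant was used); `sd_flags` is a hypothetical helper. Because a single extreme value inflates the SD, omitting obvious errors first, as described above, keeps the thresholds meaningful.

```python
import statistics
from typing import Dict


def sd_flags(values: Dict[str, float]) -> Dict[str, str]:
    """Label each participant by how far its value lies from the group mean (hypothetical helper)."""
    data = list(values.values())
    if len(data) < 2:
        raise ValueError("need at least two participants to estimate a standard deviation")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (assumption: not population SD)
    flags = {}
    for participant, value in values.items():
        deviation = abs(value - mean)
        if sd > 0 and deviation > 3 * sd:
            flags[participant] = "beyond 3 SD"
        elif sd > 0 and deviation > 2 * sd:
            flags[participant] = "beyond 2 SD"
        else:
            flags[participant] = "within 2 SD"
    return flags
```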

11 Proportion of reads mapped to reference DNA sequence (%)
The proportion of reads produced that map directly to the closed genome of the same strain (so it cannot exceed 100%).
Outliers, only in the Bact samples: #83 and #115 (missing reads), an indication of contamination or strain mix-up.
Figure: Campylobacter; GMI (#114 omitted)
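A minimal sketch of the proportion calculation, assuming each read is counted once (primary alignment only), so mapped plus unmapped reads equal all reads produced; this is why the value cannot exceed 100%. The helper names and the `min_expected_pct` cut-off are hypothetical, since the slides give no numeric threshold.

```python
from typing import Dict


def mapped_read_summary(mapped: Dict[str, int], unmapped: int) -> Dict[str, float]:
    """Percentage of all reads mapped to each replicon, to the whole reference, and unmapped.

    Assumes each read is counted once (primary alignments only).
    """
    total_mapped = sum(mapped.values())
    total_reads = total_mapped + unmapped
    if total_reads == 0:
        raise ValueError("no reads supplied")
    summary = {name: 100.0 * count / total_reads for name, count in mapped.items()}
    summary["total DNA sequence"] = 100.0 * total_mapped / total_reads
    summary["unmapped"] = 100.0 * unmapped / total_reads
    return summary


def possible_contamination(summary: Dict[str, float], min_expected_pct: float) -> bool:
    """Flag a sample whose reads map to its own closed genome less often than expected.

    min_expected_pct is a hypothetical, caller-chosen cut-off.
    """
    return summary["total DNA sequence"] < min_expected_pct
```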

12 Size of assembled genome per total size of DNA sequence (%)
The proportion of contigs that map directly to the closed genome of the same strain (so it should not exceed 100%).
Outliers: #83 and #79, and #71 for both sample types. These indicate clear contamination: the assembly exceeds the expected size of the reference.
Figure: Campylobacter; GMI16-001
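The size ratio is simple arithmetic: the summed length of all contigs divided by the length of the closed reference (chromosome plus plasmids), times 100. A sketch with hypothetical function names; `tolerance_pct` is an assumed allowance, because a clean assembly can differ slightly from the reference size.

```python
from typing import Iterable


def assembly_size_pct(contig_lengths: Iterable[int], reference_length: int) -> float:
    """Assembled genome size as a percentage of the reference DNA sequence size."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * sum(contig_lengths) / reference_length


def exceeds_reference(size_pct: float, tolerance_pct: float = 0.0) -> bool:
    """True when the assembly is larger than the reference (plus an assumed tolerance),
    e.g. because contaminating DNA was assembled alongside the target strain."""
    return size_pct > 100.0 + tolerance_pct
```

For example, 1,580,000 bp of contigs against a 1,600,000 bp reference gives 98.75%, while 2,000,000 bp gives 125%, well above what a single strain can produce.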

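13 N50
Figure: contigs ranked by size (number of contigs: fewer is better) plotted against the cumulative total size of contigs; N50 is the size of the contig at which 50% of the total size is reached.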

14 N50
Definition: the length for which the collection of all contigs of that length or longer contains at least half of the sum of the lengths of all contigs, and for which the collection of all contigs of that length or shorter also contains at least half of the sum of the lengths of all contigs.
An N50 more than … normally indicates good quality.
Poor performance (short contigs): #79 and #105 for the Bact sample; #71, #105, and #110 for the DNA sample.
Figure: Campylobacter; GMI16-001
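The N50 definition translates directly into code: sort contigs from longest to shortest and return the length at which the running total first reaches half of the assembly size. NG50, listed among the measured parameters, is the same calculation with half of the reference genome size as the target instead of half of the assembly size. A minimal sketch; the function names are illustrative.

```python
from typing import Iterable, Optional


def nx50(contig_lengths: Iterable[int], target_total: int) -> Optional[int]:
    """Length of the contig at which the running sum (longest first) reaches half of target_total.

    Returns None if the contigs never add up to half of target_total.
    """
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= target_total:  # integer comparison avoids rounding half of the total
            return length
    return None


def n50(contig_lengths: Iterable[int]) -> Optional[int]:
    """N50: the target is the total size of the assembly itself."""
    lengths = list(contig_lengths)
    return nx50(lengths, sum(lengths))


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """NG50: the target is the (reference) genome size instead of the assembly size."""
    return nx50(contig_lengths, genome_size)
```

For contigs of 100, 200, 300, and 400 bp, N50 is 300 because 400 + 300 = 700 bp already covers half of the 1,000 bp assembly. Against a 1,600 bp genome, NG50 drops to 200.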

15 Total number of contigs
The total number of contigs assembled. Fewer than 1000 contigs normally indicates good quality.
Poor performance (large number of contigs): #71, #79, and #105 for the Bact sample; #71 and #105 for the DNA sample.
Figure: Campylobacter; GMI16-001
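Contig counts come straight from the assembly FASTA file. A dependency-free sketch that reads contig lengths from FASTA lines and counts all contigs and those longer than 200 bp; the 1000-contig guideline is the one quoted above, and the helper names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

CONTIG_GUIDELINE = 1000  # fewer contigs than this normally indicates good quality


def contig_lengths_from_fasta(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_contigs(lengths: List[int], min_length: int = 200) -> Tuple[int, int]:
    """Return (total number of contigs, number of contigs longer than min_length bp)."""
    return len(lengths), sum(1 for n in lengths if n > min_length)


def within_contig_guideline(total_contigs: int) -> bool:
    """True when the assembly has fewer contigs than the quoted guideline."""
    return total_contigs < CONTIG_GUIDELINE
```

With a file on disk, an open file handle can be passed directly, e.g. `with open(path) as fh: lengths = contig_lengths_from_fasta(fh)`, where `path` is whatever the assembler wrote.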

16 SNP analysis
Number of SNPs per strain (Campylobacter; GMI16-001)

Strain   Participant   Sample type   Number of SNPs
GMI      #83           Culture       3
GMI      #83           DNA           0

#83: 3 SNPs difference to the reference (culture sample).
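SNPs are typically called by mapping reads to the closed reference and running a variant caller. As a simplified illustration only, the sketch below counts single-base differences between a reference and a sample consensus that are already aligned to the same coordinates, skipping ambiguous bases (N) and gaps (-); `count_snps` is a hypothetical helper, not the PT's pipeline.

```python
IGNORED = {"N", "-"}  # ambiguous bases and alignment gaps are not counted as SNPs


def count_snps(reference: str, sample: str) -> int:
    """Count positions where two aligned sequences carry different, unambiguous bases.

    Assumes both sequences are already aligned to the same coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for ref_base, sample_base in zip(reference.upper(), sample.upper())
        if ref_base != sample_base and ref_base not in IGNORED and sample_base not in IGNORED
    )
```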

17 Overall results: poor performance
Obvious outliers removed: #114 submitted data from another strain.
- #83 Bact: indication of contamination
  - Detected AMR genes not present in the reference genome
  - Proportion of reads mapping to the reference much less than 100%
  - Proportion of assembly size per total size of the reference much higher than 100%
  - 3 SNPs difference to the reference
- #79 Bact: indication of contamination and poor performance
  - Detected AMR genes not present in the reference genome
  - Proportion of assembly size per total size of the reference much higher than 100%
  - Total number of contigs higher than …
  - N50 lower than … bp

18 Overall results: poor performance (cont.)
- #115 Bact: indication of contamination
  - Proportion of reads mapping to the reference much less than 100%
- #71, both sample types: indication of contamination and poor performance
  - Proportion of assembly size per total size of the reference much higher than 100%
  - Total number of contigs higher than …
  - N50 lower than … bp for DNA
- #105, both sample types: indication of poor performance
  - Total number of contigs higher than …
  - N50 lower than … bp for DNA

19 Overall results: poor performance (cont.)
- #110, both sample types: indication of poor performance
  - N50 lower than … bp for DNA
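The criteria on slides 17 to 19 can be combined into a simple rule set that separates signs of contamination from signs of poor sequencing performance. A hypothetical sketch: the class and function names are illustrative, and every cut-off is a caller-supplied parameter rather than a value taken from the PT.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record."""
    pct_reads_mapped: float   # proportion of reads mapped to the reference (%)
    pct_assembly_size: float  # assembled size per total reference size (%)
    n_contigs: int
    n50: int
    unexpected_amr_genes: List[str] = field(default_factory=list)  # AMR genes absent from the reference
    snps_to_reference: int = 0


def classify(
    qc: SampleQC,
    min_pct_mapped: float,
    max_pct_size: float,
    max_contigs: int,
    min_n50: int,
) -> Dict[str, List[str]]:
    """Collect the reasons a sample suggests contamination or poor performance.

    All thresholds are caller-supplied assumptions.
    """
    contamination: List[str] = []
    poor_performance: List[str] = []
    if qc.unexpected_amr_genes:
        contamination.append("AMR genes not present in the reference genome")
    if qc.pct_reads_mapped < min_pct_mapped:
        contamination.append("proportion of reads mapped much less than 100%")
    if qc.pct_assembly_size > max_pct_size:
        contamination.append("assembly size much higher than the reference")
    if qc.snps_to_reference > 0:
        contamination.append(f"{qc.snps_to_reference} SNPs difference to the reference")
    if qc.n_contigs > max_contigs:
        poor_performance.append("high total number of contigs")
    if qc.n50 < min_n50:
        poor_performance.append("low N50")
    return {"contamination": contamination, "poor_performance": poor_performance}
```

Keeping the two lists separate mirrors the slides, where a sample such as #79 can be flagged for both contamination and poor performance at once.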

20 Summary of PT 2016
- The interpretation of the MLST data and the final layout of the QC are pending but scheduled to be finished in May 2017
- The individual participants' reports will be disseminated before July 2017
- PT report 2016 online before July 2017, and the 2015 report before September 2017
- Satisfactory results for most labs, except for:
  - #71, #79, #83, #114, #115 due to contamination
  - #71, #79, #105, #110 due to poor sequencing performance
- Continuation in 2017 focusing on Salmonella, E. coli and S. aureus

21 Acknowledgement
Oksana Lukjancenko (DTU Food), Susanne Klarsmose Pedersen (DTU Food), Pimlapas Leekitcharoenphon (DTU Food), Rolf Sommer Kaas (DTU Food), Inge Marianne Hansen (DTU Food), Jacob Dyring Jensen (DTU Food), Frank Aarestrup (DTU Food), Ole Lund (DTU Systems Biology), Jose Luis Bellod Cisneros (DTU Systems Biology), James Pettengill (US FDA), Division of Microbiology (CFSAN/FDA), Anthony Underwood (PHE), Brian Beck (Microbiologics), Isabel Cuesta de la Plaza (ISCIII), Angel Zaballos (ISCIII), Jorge De La Barrera Martinez (ISCIII), and the rest of WG 4 (advisory group)
GMI is supported by:

22 Thank you for your attention
Pimlapas Leekitcharoenphon (Shinny), PhD
Research Group of Genomic Epidemiology
WHO Collaborating Centre for Antimicrobial Resistance in Foodborne Pathogens and Genomics
European Union Reference Laboratory for Antimicrobial Resistance
National Food Institute, Technical University of Denmark