Cover Page. The handle holds various files of this Leiden University dissertation.


Author: Knetsch, Cornelis Willem (Wilco)
Title: Molecular typing & evolution of Clostridium difficile
Issue Date:

Chapter 1
General Introduction and outline thesis
C.W. Knetsch

Wilco Knetsch: Molecular typing & Evolution of Clostridium difficile

GENERAL INTRODUCTION AND OUTLINE THESIS

This thesis describes the application of molecular typing methods to study the epidemiology and evolution of the bacterium Clostridium difficile. This research started at a time when the so-called hypervirulent types, claimed to cause more severe C. difficile infections (CDI), had just emerged worldwide, including in the Netherlands. The rise in severe CDI prompted the development of rapid molecular assays that could detect hypervirulent C. difficile types and their involvement in hospital outbreaks. The aim of this PhD project was to identify novel molecular markers for hypervirulent C. difficile and to use these markers to develop new rapid molecular typing assays. During this PhD project ( ), whole genome sequencing (WGS) combined with single nucleotide polymorphism (SNP) analysis emerged as a novel typing tool for strain characterisation. This method has already been applied to study the epidemiology of various pathogens, including Mycobacterium tuberculosis 1, Staphylococcus aureus 2, 3, Streptococcus pneumoniae 4 and C. difficile 5. This thesis reflects the rise of sequence-based typing: we started by developing and testing new (multiplex) real-time PCR assays and gradually shifted our focus towards sequence-based typing methods such as multilocus sequence typing and whole genome SNP typing.

THE GUT BACTERIUM CLOSTRIDIUM DIFFICILE

Clostridium difficile is a rod-shaped, anaerobic gut bacterium and a major cause of antibiotic-associated diarrhea in the developed world. Human CDI is usually acquired within the healthcare system and is therefore known as a nosocomial infection. However, CDI is increasingly found outside hospitals as well (i.e. community-acquired CDI), demonstrating that CDI is not merely a nosocomial disease.
Approximately one quarter of all diagnosed CDI cases are acquired in the community 6. Risk factors for CDI are antibiotic use, advanced age, comorbidity and hospital stay. The economic burden of CDI on the healthcare system in the United States has been estimated at $4.8 billion annually, excluding the costs of treating CDI in long-term care facilities such as nursing homes 13. In 2006, CDI was estimated to confer a total direct cost to the EU of approximately €3 billion/year 14. Assuming a 3% annual inflation rate, this approximates to €3.7 billion in …. These costs refer to healthcare-associated CDI, including community-associated and nursing home-associated CDI, and the indirect socio-economic costs. Recently, the median incidence rate of CDI in Europe was estimated to be 6.0 per patient days, with an overall in-hospital mortality among CDI cases of 12% 15.
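The inflation adjustment above is ordinary compound growth: a cost C0 grows to C0(1 + r)^n after n years at a constant annual rate r. The minimal Python sketch below makes that arithmetic explicit. The function name `inflate` is hypothetical, and a constant 3% rate is the simplifying assumption stated in the text; real price indices vary from year to year.

```python
def inflate(amount: float, annual_rate: float, years: int) -> float:
    """Project a cost forward with a constant annual inflation rate,
    compounded once per year: amount * (1 + annual_rate) ** years."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return amount * (1.0 + annual_rate) ** years


# Starting point from the text: roughly EUR 3.0 billion per year (2006 estimate), 3% inflation.
for n in range(0, 9):
    print(f"after {n} year(s): EUR {inflate(3.0, 0.03, n):.2f} billion")
```

Because the growth is multiplicative, each additional year adds slightly more than the previous one in absolute terms.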

Clostridia are members of the phylum Firmicutes, one of the four major phyla of the normal gut flora, also called the gut microbiota. The other three major phyla are the Bacteroidetes, the Proteobacteria and the Actinobacteria. The human gut microbiota consists of trillions of bacteria that inhabit our intestine; these bacteria can be beneficial, harmful, or both. The gut microbiota plays a crucial role in nutrition and health 16. Certain metabolites produced by commensal gut bacteria, such as short-chain fatty acids (SCFAs), are important for the functioning of our immune system. In particular, the SCFA butyrate is known to promote the induction of regulatory T cells in the colonic environment 17. As such, a diverse and balanced microbiota is required to establish an efficient immune response against invading pathogens 17. The composition of the microbiota is host-specific and may change over a lifetime under different environmental influences 18. For example, when a healthy, balanced microbiota is disrupted by the (over)use of broad-spectrum antibiotics, potentially harmful bacteria such as C. difficile may get the opportunity to colonize the gut as a first step towards disease. In mice, a single dose of clindamycin was shown to induce rapid changes in the gut microbiota, with a reduction of the family Lachnospiraceae and dominance of Enterobacteriaceae 19, 20. Moreover, the reduction in microbial diversity induced by antimicrobial therapy may persist for a prolonged period 19. In line with this, the study by Hensgens et al. 21 showed that patients have an increased susceptibility to CDI during the 3 months after cessation of antibiotic treatment.
Consequently, a disrupted gut microbiota plays an important role in the onset of CDI and potentially in various metabolic diseases that are currently being studied, including obesity, diabetes and inflammatory bowel disease (IBD). For example, it has been shown that the phylum Bacteroidetes is decreased in obese people, counterbalanced by a proportional increase in the phylum Firmicutes, resulting in a gut microbiota with an increased capacity to harvest nutrients 22. Faecal microbiota transplantation (discussed later) is currently envisaged as an efficient treatment to restore an unbalanced microbiota.

C. difficile is a commensal bacterium present in the intestine of approximately 5% of healthy individuals 23. C. difficile can become part of our gut microbiota when highly infectious spores are ingested. A spore is a dormant stage of C. difficile that allows the bacterium to survive outside a host for weeks to years. The robust outer shell of the spore 24 protects the genetic material and enables the bacterium to survive unfavourable conditions such as limited nutrient availability. Because spores are extremely resistant to various disinfectants 25, heat and low pH, they are very hard to eradicate 26 and consequently play a key role in disease transmission. After ingestion, the spores survive the acidic environment of the stomach. Upon exposure to bile acids in the small intestine, C. difficile spores germinate into vegetative cells.

The flagella of the vegetative cells promote movement to the colon, where the bacteria adhere to the epithelial lining and await an opportunity to expand and colonize the colon 27. In the lumen of the colon, C. difficile produces its two major virulence factors, toxin A (TcdA) and toxin B (TcdB), which damage the epithelial lining of the colon 28. The toxin genes (tcdA and tcdB) are located on the so-called pathogenicity locus, together with the genes for the positive regulator (tcdR) and a putative negative regulator (tcdC) 12. Disruption of the epithelial lining triggers the production of pro-inflammatory molecules and the subsequent recruitment and influx of neutrophils and monocytes, thereby initiating inflammation. This eventually results in the characteristic symptoms of CDI: watery diarrhea and inflammation of the colon, with pseudomembranous colitis as its most fulminant form 27, 29. It is commonly accepted that at least one of the toxins is required to cause disease, while the exact contribution of each individual toxin is still under debate 30, 31.

THE DIAGNOSIS OF CLOSTRIDIUM DIFFICILE INFECTION (CDI)

Accurate and rapid diagnosis of CDI is essential for starting antibiotic treatment and applying appropriate infection prevention measures in the hospital. Frequently applied diagnostic assays for CDI include the cytotoxicity assay, toxigenic culture, enzyme immunoassays (EIAs) targeting the toxins or glutamate dehydrogenase (GDH), and real-time polymerase chain reaction (PCR) assays, primarily targeting the toxin genes 12, 32. Each of these assays has its own strengths and limitations. An ideal diagnostic assay would be fast, inexpensive and easy to perform, with high sensitivity and specificity; unfortunately, none of the assays mentioned above fulfils all of these requirements. The performance of a diagnostic assay is evaluated by comparing it to a gold standard reference method.
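Comparing an assay with a gold standard yields a 2×2 table of true and false positives and negatives, from which sensitivity and specificity follow directly. The predictive values that matter for the screening step of a two-step algorithm, however, also depend on disease prevalence. The Python sketch below illustrates this with Bayes' theorem. The names `TwoByTwo` and `predictive_values` are hypothetical; the inputs are the NAAT figures (88.5% sensitivity, 95.4% specificity versus toxigenic culture) and the 5-10% prevalence range discussed in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an index test with a reference (gold standard) method."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' theorem)."""
    tp = sens * prevalence               # expected fraction of true positives
    fp = (1 - spec) * (1 - prevalence)   # expected fraction of false positives
    fn = (1 - sens) * prevalence         # expected fraction of false negatives
    tn = spec * (1 - prevalence)         # expected fraction of true negatives
    return tp / (tp + fp), tn / (tn + fn)


for prev in (0.05, 0.10):
    ppv, npv = predictive_values(0.885, 0.954, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At these prevalences the NPV stays close to 99% while the PPV falls to roughly 50-68%. This is why a sensitive first test is well suited to ruling CDI out, whereas positives need confirmation by a more specific second test.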
Currently, two methods for diagnosing C. difficile are considered gold standards: the cytotoxicity assay 33 and toxigenic culture 34. The cytotoxicity assay detects free toxins in stool samples, whereas toxigenic culture evaluates the capacity of cultured isolates to produce toxins in vitro 35. Toxigenic culture is considered the more sensitive of the two 36 and was consequently preferred as the gold standard. However, a major disadvantage of toxigenic culture is its inability to differentiate patients with CDI from individuals with asymptomatic colonization. EIAs are the most rapid and easy-to-perform diagnostic tests, but it has become evident that they lack the sensitivity to accurately diagnose CDI, especially in low-prevalence settings where prevalence ranges between 5 and 10% 32, 35, 37, 38. In contrast, assays such as culture, cytotoxigenic culture and the cytotoxicity assay generally display sufficient sensitivity, but are laborious and require expertise. Real-time PCR assays, or nucleic acid amplification tests (NAATs), have become a popular method for diagnosing CDI, since they offer rapid diagnosis with high sensitivity (88.5%) and specificity (95.4%) compared with toxigenic culture 39. As a result, many commercial and in-house real-time PCR assays have been developed in recent years 40. Most NAATs target the toxin B gene (tcdB), which encodes a major virulence factor.

Multistep test approaches (algorithms) have been proposed as a strategy to improve clinical interpretation 32, 41, 42. Ideally, such a two-step algorithm starts with a screening test that has a high negative predictive value (NPV), to reliably exclude patients without CDI. A GDH EIA or a NAAT targeting the tcdB gene are examples of suitable screening assays. Samples that test positive in the initial screening must then be confirmed by a second, more specific assay, which preferably detects free toxins in stool. A two-step algorithm was recommended in 2009 by the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) as an efficient approach to diagnose CDI 32. This guideline is currently being updated, and the new guideline will recommend toxin detection in combination with clinical evaluation of the patient's symptoms.

Other methods are available for the identification of C. difficile, but some of them require C. difficile to be isolated from the stool sample first. One method to identify and characterize C. difficile after primary (selective) culture is matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) mass spectrometry 43. This easy-to-perform method identifies micro-organisms such as bacteria, yeasts and fungi rapidly and reliably. Single bacterial colonies can be spotted directly on a target plate and analysed in the mass spectrometer without extensive sample preparation.
The MALDI-TOF method is based on the analysis of ionized proteins, which are arranged in a spectrum of increasing protein mass, resulting in a species-specific molecular fingerprint. This characteristic fingerprint is compared (pattern matching) with a database of reference spectra from various organisms, resulting in an identification at species level. One example of this approach is the commercially available Biotyper system (Bruker, Germany). The medical microbiology department of the Leiden University Medical Center currently applies MALDI-TOF combined with the Biotyper for the primary identification of pathogens 44. Although MALDI-TOF-based identification is relatively fast, diarrheal samples, such as those of patients with suspected CDI, need to be cultured first, which significantly increases the overall duration of the workflow. The majority of peaks in a MALDI-TOF fingerprint derive from highly abundant and conserved ribosomal proteins. This lack of diversity makes the method less suitable for molecular typing.
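The pattern-matching step described above can be illustrated with a toy peak-list comparison. This is a minimal Python sketch under stated assumptions: each spectrum is reduced to a list of peak masses (m/z), two peaks are paired when they lie within a hypothetical tolerance of 500 ppm, and similarity is scored with a Dice coefficient. It is not the Biotyper's scoring algorithm. The function names, species labels and peak values are invented placeholders, not real reference spectra.

```python
def matched_peaks(a: list[float], b: list[float], tol_ppm: float) -> int:
    """Greedy one-to-one pairing of two peak lists (m/z values).

    Both lists are sorted and walked with two pointers; a pair is accepted when
    the masses differ by at most tol_ppm parts per million of the peak from the
    first list. Greedy pairing is simple but not guaranteed to be optimal."""
    a, b = sorted(a), sorted(b)
    i = j = n = 0
    while i < len(a) and j < len(b):
        tol = a[i] * tol_ppm / 1e6
        if abs(a[i] - b[j]) <= tol:
            n += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return n


def dice_score(a: list[float], b: list[float], tol_ppm: float = 500.0) -> float:
    """Dice similarity between two peak lists: 2 * shared peaks / total peaks."""
    if not a or not b:
        return 0.0
    return 2 * matched_peaks(a, b, tol_ppm) / (len(a) + len(b))


def identify(query: list[float], library: dict[str, list[float]],
             tol_ppm: float = 500.0) -> tuple[str, float]:
    """Return the library entry with the highest score, and that score."""
    scores = {name: dice_score(query, peaks, tol_ppm) for name, peaks in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented reference spectra (peak m/z values) for two hypothetical species.
library = {
    "species_A": [4365.0, 5096.0, 6253.0, 7271.0, 9175.0],
    "species_B": [3640.0, 4820.0, 6110.0, 7590.0, 8960.0],
}
query = [4366.1, 5095.2, 6254.0, 7269.9, 9600.0]
print(identify(query, library))
```

Four of the five query peaks pair with species_A, giving a score of 0.8, while none pair with species_B. The sketch also hints at why typing below species level is hard with this approach: if strains of one species share nearly all of their (ribosomal) peaks, their scores barely differ.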

DISEASE MANAGEMENT

Once a patient has been diagnosed with CDI, three distinct interventions should be made. First, ongoing antibiotic treatment should be discontinued or altered. Second, standard infection prevention measures, such as patient isolation with restricted contact, appropriate hygiene and effective environmental cleaning, are essential to prevent further spread of the pathogen 45. Third, C. difficile-specific antibiotic treatment should be started if clinical symptoms do not resolve rapidly 46. Current antibiotic treatments for CDI are oral metronidazole or vancomycin, of which the latter is approved by the US Food and Drug Administration (FDA). A major concern when treating CDI with these antibiotics is the high number of recurrences after treatment. Both antibiotics have a broad spectrum and wipe out the protective normal flora in addition to killing vegetative C. difficile. If spores in the gut survive the antibiotic treatment, they can easily reinfect the host. Another FDA-approved antibiotic for treating CDI is fidaxomicin. This narrow-spectrum antibiotic results in significantly lower recurrence rates than treatment with vancomycin 47. However, this beneficial effect was absent when treating the notorious PCR ribotype (RT) 027/BI/NAP01 strains 48. The cost-effectiveness of fidaxomicin is still under debate and needs to be determined in a large-scale non-commercial study. A therapy used in the past that has recently been rediscovered as an alternative treatment for recurrent CDI is the infusion of donor faeces, so-called faecal microbiota transplantation (FMT) 49. Infusion of donor faeces is a simple but highly effective method to restore the disrupted microbiota to a healthy and balanced state.
A recent clinical trial demonstrated that 81% (13 out of 16) of patients with multiple recurrences of CDI were cured after a single faecal transplant 50. Moreover, after transplantation of donor faeces, patients showed increased bacterial diversity in their microbiota, similar to that of healthy donors. One concern regarding the infusion of donor faeces is the uncertainty about long-term harmful effects for the recipient 51, such as the transmission of infectious diseases. This risk is limited by screening the donor (faeces) for transmissible diseases, although such screening is costly. This issue may be overcome by frozen faecal microbiota transplantation, which has been shown to be effective in treating relapsing CDI 52. In addition, novel oral, capsulized, frozen FMT may remove practical barriers such as the discomfort of a nasogastric tube for the patient, although larger studies are needed to confirm the effectiveness and safety of frozen FMT capsules 53. Alternatively, a defined and simplified mixture of culturable bacterial strains with a similar restorative effect on the microbiota could avoid the high costs associated with screening donor faeces 54, 55. The key to accomplishing this is to find out which components of the faeces are responsible for the beneficial effect. A first attempt to understand this was recently reported by Buffie et al. 56. This study found that Clostridium scindens, which is able to metabolize primary bile acids, is associated with resistance to C. difficile infection. The proposed mechanism is that the bacterium converts primary bile acids into secondary bile acids, which promote resistance to infection. However, further studies are needed to ensure the safety of manipulating intestinal bile acids, since some secondary bile acids have been linked to gastrointestinal cancers 57. Although treatment and general infection prevention measures are important in all cases of CDI, additional efforts are sometimes required to control CDI caused by certain C. difficile subtypes with increased pathogenicity, such as the so-called hypervirulent PCR RTs 027 and 078. The next section discusses the molecular basis of this increased virulence.

MOLECULAR FACTORS THAT POTENTIALLY CONTRIBUTE TO HYPERVIRULENCE

Certain C. difficile strains are associated with a more severe course of CDI (i.e. severe diarrhea, pseudomembranous colitis, high recurrence rates and increased mortality) and are therefore often referred to as hypervirulent strains. An example of hypervirulent C. difficile is the epidemic type 027, which in 2005 was responsible for major epidemics associated with increased mortality rates 14. In addition, C. difficile PCR RT 078, an important cause of human CDI particularly in Europe, is also frequently associated with a more severe course of CDI 61, 62. In recent years, considerable research has been carried out to find a molecular basis for the hypervirulent status of specific C. difficile strains. Although these efforts have yielded some clues, the exact nature of this increased virulence remains to be elucidated. Obviously, if certain C. difficile strains produce increased levels of the major clostridial toxins, this could contribute to their virulent status. The literature has suggested a clear relation between mutations in the tcdC gene, a putative negative regulator of toxin expression, and increased toxin production 63. Notably, the hypervirulent PCR RTs 027 and 078 mentioned above have been reported to contain such mutations in the tcdC gene 64, and this has consequently been proposed as a possible explanation for their hypervirulent status. However, several studies have also demonstrated that TcdC is not a major regulator of toxin production 65, 66, thereby questioning the role of tcdC mutations as a molecular factor contributing to increased virulence.

Various other potential virulence factors have been suggested, including binary toxin, an increased rate of sporulation and resistance to specific antibiotics. Binary toxin has been reported to induce cell protrusions in host colon cells that are potentially involved in colonization 67. However, the exact role of binary toxin in disease is not yet fully understood. An increased sporulation rate for strains of hypervirulent PCR RT 027 has been suggested in the literature 68, 69. However, another study demonstrated that the sporulation rate of PCR RT 027 strains was no higher than that of non-027 strains 70, challenging the contribution of sporulation dynamics to the hypervirulent status. Furthermore, resistance to certain antibiotics, such as fluoroquinolones, may have contributed to the worldwide epidemic spread of specific C. difficile PCR RTs. Clear examples are the two distinct epidemic C. difficile lineages that emerged and spread globally after independently acquiring the same fluoroquinolone resistance-conferring mutation 5.

MOLECULAR TYPING AND EPIDEMIOLOGY OF C. DIFFICILE

Rapid and adequate characterisation (typing) of C. difficile strains is crucial, since this information guides disease management and enables infection prevention teams to implement effective measures. Molecular typing can be described as the characterisation of bacteria beyond the species level and the clustering of individual bacterial isolates in a meaningful way 71. Molecular typing is also important for epidemiological surveillance, since it provides insight into national and global changes in circulating C. difficile PCR RTs and can highlight the emergence of new problematic (hypervirulent) PCR RTs.
Furthermore, molecular typing is used for the early recognition of hospital outbreaks and to study the transmission routes by which bacteria spread. Finally, certain molecular typing methods are well suited to study the evolutionary relatedness of bacterial isolates. Chapter 2 provides a full overview of the molecular typing methods most often applied to C. difficile; here, the description is limited to a brief introduction of the most widely applied methods, together with the latest trends in the field. Figure 1 provides an overview of the most frequently applied molecular typing methods for C. difficile and specifies which methods are preferred for which types of analysis. In general, typing methods with lower discriminatory power are useful for surveillance studies, whereas high-resolution methods are suitable for studying transmission in hospital outbreaks. Band-based typing methods are frequently applied for strain characterisation, while sequence-based methods are also used to study evolution (Figure 1).
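Discriminatory power is not quantified in this chapter. One common way to express it, assumed here rather than taken from the text, is Simpson's index of diversity as adapted for typing methods by Hunter and Gaston: D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number of isolates assigned to type j. D is the probability that two isolates drawn at random are assigned different types. In the minimal Python sketch below, the type labels are hypothetical and merely contrast a coarse method with a finer one on the same ten isolates.

```python
from collections import Counter


def simpsons_index_of_diversity(type_assignments: list[str]) -> float:
    """Probability that two randomly drawn isolates have different types."""
    counts = Counter(type_assignments)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Ten hypothetical isolates typed by a coarse method and by a finer method.
coarse = ["type_a"] * 4 + ["type_b"] * 3 + ["type_c"] * 2 + ["type_d"]
fine = ["sub_1", "sub_1", "sub_2", "sub_2",
        "sub_3", "sub_4", "sub_5", "sub_6", "sub_7", "sub_8"]
print(round(simpsons_index_of_diversity(coarse), 3), round(simpsons_index_of_diversity(fine), 3))
```

The finer method splits the same isolates into more, smaller groups and therefore scores closer to 1.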

Figure 1. Commonly applied molecular typing methods for C. difficile. [Figure panels: routine surveillance and hospital outbreaks; strain characterisation and evolution. Methods shown: PCR ribotyping, CE ribotyping, PFGE, MLVA, MLST and whole genome SNP typing. Axis: discriminatory power, low to high.] The figure illustrates the most frequently applied typing methods for C. difficile, categorized according to their usefulness, i.e. whether they are applicable to routine surveillance or to hospital outbreak analysis. In addition, the methods are divided into methods used for strain characterisation and/or evolutionary analysis. The bottom scale indicates the relative difference in discriminatory power, ranging from low to high.

Routine surveillance of C. difficile

Routine surveillance studies provide insight into the variety of C. difficile PCR RTs circulating within our healthcare system. Over the last decade, PCR ribotyping, pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST) have contributed significantly to the epidemiological surveillance of C. difficile. PCR ribotyping has been adopted as the preferred method in Europe, whereas PFGE is primarily used in North America. PCR ribotyping discriminates bacterial isolates based on the variability of the intergenic spacer region (ISR) between the 16S and 23S ribosomal DNA (rDNA) 72, as described in more detail in chapter 2. Currently, this technique can discriminate approximately 400 distinct PCR RTs. In 2004, MLST was introduced as an alternative method to type C. difficile isolates 73. MLST discriminates bacterial isolates using the DNA sequence variation (single nucleotide polymorphisms, or SNPs) present in multiple highly conserved genomic loci (housekeeping genes). This variation can also be used to cluster bacterial strains in a meaningful way, i.e. to reconstruct their phylogeny, making MLST an appropriate tool to study the evolution of bacterial species (Figure 1).

With various typing methods in use worldwide, it is not surprising that a strain can be assigned several different types, depending on the typing method used. For example, the epidemic strains responsible for the large C. difficile outbreaks were typed as PCR RT027/ST-1/NAP01, corresponding to the three typing methods mentioned above. Using different typing methods is clearly suboptimal, since it limits the comparison of typing results from laboratories around the world. A consensus typing approach would facilitate such comparisons and would eventually improve our understanding of international epidemiology. A Molecular Esperanto is therefore needed: a common language of harmonized and standardized typing methodology 74, 75. One way to develop a consensus typing approach is to initiate a multi-centre feasibility study that adopts one typing approach and develops a consensus protocol for it. Such a study needs to address all kinds of laboratory anomalies, including interlaboratory reproducibility and differences in data interpretation.
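The allele-numbering logic behind MLST shows why a shared, curated nomenclature acts as the common language called for above. Every distinct sequence at a locus receives an allele number, and every distinct combination of allele numbers (the allelic profile) receives a sequence type (ST). The Python sketch below is a toy, local version of that bookkeeping under stated assumptions: the class `MlstScheme`, the locus names and the short sequences are hypothetical. In practice, allele and ST numbers are assigned centrally by a curated database, which is what makes them comparable between laboratories.

```python
class MlstScheme:
    """Toy MLST scheme: each distinct sequence at a locus gets the next allele
    number, and each distinct allelic profile gets the next sequence type (ST)."""

    def __init__(self, loci: list[str]) -> None:
        self.loci = tuple(loci)
        self.alleles: dict[str, dict[str, int]] = {locus: {} for locus in self.loci}
        self.sts: dict[tuple[int, ...], int] = {}

    def allele_number(self, locus: str, seq: str) -> int:
        table = self.alleles[locus]
        if seq not in table:
            table[seq] = len(table) + 1  # a new sequence becomes a new allele
        return table[seq]

    def type_isolate(self, seqs: dict[str, str]) -> tuple[tuple[int, ...], int]:
        """Return (allelic profile, ST) for an isolate's locus sequences."""
        profile = tuple(self.allele_number(locus, seqs[locus]) for locus in self.loci)
        if profile not in self.sts:
            self.sts[profile] = len(self.sts) + 1  # a new profile becomes a new ST
        return profile, self.sts[profile]


scheme = MlstScheme(["locus_1", "locus_2", "locus_3"])  # hypothetical loci
iso1 = scheme.type_isolate({"locus_1": "ACGTTG", "locus_2": "GGCATA", "locus_3": "TTAGCC"})
iso2 = scheme.type_isolate({"locus_1": "ACGTTG", "locus_2": "GGCTTA", "locus_3": "TTAGCC"})  # one SNP in locus_2
iso3 = scheme.type_isolate({"locus_1": "ACGTTG", "locus_2": "GGCATA", "locus_3": "TTAGCC"})
print(iso1, iso2, iso3)
```

Note that a single SNP and many SNPs at a locus both produce just one new allele number. MLST therefore records whether a locus differs, not by how much.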
In addition, an online, standardised and well-curated database is needed to compare typing results from different laboratories. If not addressed properly, these issues will hamper the use of typing methods for surveillance on a broader (European or worldwide) scale. Currently, the Dutch reference laboratory organises workshops that aim to improve typing capacity in Europe 76. In addition, the adaptation of conventional PCR ribotyping to high-resolution capillary gel electrophoresis (CE) PCR ribotyping, together with the availability of a standardised and widely accepted protocol, will hopefully overcome these issues (chapter 4).

In the last decade, an increased incidence of CDI has been observed, mainly as a result of the large hospital outbreaks caused by the epidemic PCR RT027/ST-1/NAP01 strains 58, 59. A recent study by He et al. 5 demonstrated that the strains associated with these outbreaks originated in North America (Montreal and Pittsburgh) and from there spread worldwide to Europe, Asia and Australia. In response to Dutch outbreaks, a national surveillance program was launched in the Netherlands, combined with strict guidelines to rapidly recognize this epidemic PCR RT and prevent its further spread. After the large outbreaks, a decrease in the incidence of PCR RT 027 was noticed, whereas the overall mean incidence of CDI in the Netherlands remained stable 77. This decrease may partially be explained by the success of the implemented surveillance program and the subsequent hospital infection prevention measures. However, more recent data show that PCR RT 027 re-emerged in several Dutch healthcare facilities and caused outbreaks in the last year 78. This clearly demonstrates that vigilance is essential for long-term eradication. Other types commonly found within the Dutch healthcare system are PCR RTs 001, 014/020 and 078 (Figure 2). Their presence is not unexpected, since these types are also frequently found within the European healthcare system 10 and in other parts of the world. Notably, the surveillance data revealed new emerging types such as PCR RT078/ST-11/NAP07-08, which has been stably present as the third most frequently found PCR RT in the Dutch healthcare system. The rise of PCR RT 078 was also observed in several other European countries, including the United Kingdom 79 and in particular Scotland 80. In addition, PCR RT 078/ST-11 has been reported to be associated with severe CDI 62.

Figure 2. The cumulative proportion of the major C. difficile PCR ribotypes present in Dutch acute care hospitals (May 2009 to September 2013). [Stacked chart. y-axis: cumulative proportion, 0-100%. x-axis: trimesters from the 2nd trimester of 2009 to the 2nd trimester of 2013. Categories: RT001, RT014, RT078, RT002, RT005, RT027, RT015, RT023, RT126, RT087, other RT and unknown RT.] Shown are the cumulative proportions of the major C. difficile PCR ribotypes in the Netherlands. All faecal samples collected in the sentinel surveillance from which C. difficile was isolated at the National Reference Laboratory Leiden were included (n = 2,498, May 2009 to September 2013). Data were collected by hospitals distributed across the Netherlands and include CDI patients older than 2 years. The three most frequently observed PCR RTs were 001 (18.3%), 014/020 (14.5%) and 078 (11.2%).

Molecular typing tools to study hospital outbreaks

The typing methods described so far have been very useful for highlighting national and global changes in circulating C. difficile PCR RTs. However, these methods have also been commonly applied to identify hospital outbreaks. A hospital outbreak is considered to occur when there is a temporal increase in the incidence of a bacterial species caused by transmission of a particular strain 81. Although useful for detecting sudden increases in incidence, conventional typing methods often lack the discriminatory power to determine whether the isolates of the subtype responsible for a presumptive outbreak are clonal. In other words, CDI patients infected with the same subtype are not necessarily infected with a clonal strain. This makes it hard to separate isolates that are actually part of an outbreak from novel, unrelated introductions of the same subtype. For this reason, conventional typing results are often interpreted in the context of clinical epidemiological data. For example, when CDI patients are infected with the same PCR RT and have stayed on the same hospital ward at the same time, this most likely represents patient-to-patient transmission or infection from a common source. Thus, the interpretation of typing data depends heavily on the quality of the corresponding clinical epidemiological data. To study hospital outbreaks in more detail, a high-resolution typing tool with high discriminatory power is needed. Multilocus variable number tandem repeat analysis (MLVA) and the more recently emerged whole genome single nucleotide polymorphism (SNP) typing are powerful tools to study hospital outbreaks (Figure 1).
Chapter 2 discusses the mechanisms and utility of these high-resolution typing methods in detail. MLVA exploits the variable number of tandem repeats in DNA sequences present at carefully selected genomic loci 82. The numbers of repeats at each locus can be combined into an MLVA profile, i.e. a string of numbers, which can easily be compared between bacterial isolates. MLVA is useful for studying hospital outbreaks because clear cut-off values have been defined for when isolates are clonal or genetically related: MLVA-typed isolates are grouped into clonal and/or genetically related clusters using the summed tandem repeat difference (STRD) as a measure of genetic distance. Current practice in the Netherlands is that isolates suspected to be part of a hospital outbreak are first characterised by PCR ribotyping, followed by MLVA of the isolates that belong to the same PCR RT. A number of published studies have used this approach to study hospital outbreaks 8. Nevertheless, the lack of standardisation of MLVA, especially of the cluster analysis, has hampered its broader use for international surveillance 71. One study compared MLVA and whole genome SNP typing as tools to investigate the transmission of C. difficile and reported 80% concordance in typing results between the two methods 88.
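The STRD comparison described above sums, over all loci, the absolute difference in repeat counts between two MLVA profiles. Clusters then follow by linking every pair of isolates whose STRD does not exceed a cut-off. This minimal Python sketch assumes single-linkage grouping, which gives the same clusters as cutting a minimum spanning tree at the cut-off. The cut-offs of 2 (clonal) and 10 (genetically related) are placeholder assumptions for illustration, not values stated in this chapter, and the seven-locus profiles are invented.

```python
def strd(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Summed tandem repeat difference between two MLVA profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


def single_linkage_clusters(profiles: dict[str, tuple[int, ...]], max_strd: int) -> list[set[str]]:
    """Group isolates so that any two linked by STRD <= max_strd share a cluster."""
    names = list(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if strd(profiles[a], profiles[b]) <= max_strd:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda s: sorted(s))


# Invented repeat counts at seven hypothetical MLVA loci.
profiles = {
    "iso1": (5, 10, 3, 12, 7, 2, 20),
    "iso2": (5, 11, 3, 12, 7, 2, 20),
    "iso3": (5, 10, 3, 14, 8, 2, 23),
    "iso4": (9, 4, 6, 12, 7, 8, 11),
}
print("clonal:", single_linkage_clusters(profiles, max_strd=2))
print("related:", single_linkage_clusters(profiles, max_strd=10))
```

With these toy profiles, iso1 and iso2 (STRD 1) form a clonal pair, iso3 joins them only at the looser cut-off (STRD 6 from iso1), and iso4 stays unrelated. The sensitivity of the result to the chosen cut-offs and linkage rule is one reason why standardising the cluster analysis matters.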

Whole genome sequencing (WGS) and the associated SNP typing analysis allow bacterial isolates to be characterised and compared at the level of single nucleotide differences. This level of discrimination makes it possible to determine precisely whether sequenced isolates belong to the same outbreak. As such, the boundaries of a potential hospital outbreak, as well as the pathways by which C. difficile has spread, can be accurately defined 89, 90. Bacterial isolates can be sequenced on a variety of platforms, each with its own characteristics 91, 92. Re-sequencing bacterial isolates and subsequently mapping the sequence reads against a high-quality reference genome is an efficient method to reconstruct the whole genome and call sequence variants (i.e. phylogenetic SNPs). Initially, high-throughput WGS was restricted to dedicated core facilities because of the costs, the specialized sequencing equipment required and the complexity of the associated data analysis. A transition of this high-resolution technique from core facilities to molecular epidemiology laboratories worldwide is currently ongoing. The main goals of these laboratories are to confirm or refute hospital outbreaks by WGS, to study the spread of bacteria in the community and to identify novel reservoirs for human disease. Whole genome SNP typing has already been applied to investigate local transmission within healthcare systems and, more globally, to study the worldwide spread of C. difficile 5. The high resolution of this typing method and the availability of the whole bacterial genome offer many opportunities to answer a variety of epidemiological questions. However, the wider use of this method is still restrained by the complexity of the underlying SNP typing data analysis.
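Once reads from each isolate have been mapped to the same reference genome, every isolate can be represented as a pseudo-genome in reference coordinates, and comparing isolates reduces to counting positions where their base calls differ. The Python sketch below assumes that this mapping and variant calling has already been done by dedicated tools, which are not shown. It treats 'N' and '-' as missing data and ignores such positions. Real pipelines additionally filter low-quality and, for example, recombinant sites. The function names and the ten-base sequences are hypothetical.

```python
MISSING = frozenset("N-")  # assumed codes for uncalled or deleted positions


def snp_distance(a: str, b: str) -> int:
    """Count positions where both isolates have a base call and the calls differ."""
    if len(a) != len(b):
        raise ValueError("sequences must share the same reference coordinates")
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in MISSING and y not in MISSING)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every unordered pair of isolates."""
    names = sorted(seqs)
    return {(p, q): snp_distance(seqs[p], seqs[q])
            for i, p in enumerate(names) for q in names[i + 1:]}


# Hypothetical reference-based pseudo-genomes (uppercase bases).
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACNTAT",
}
print(distance_matrix(seqs))
```

Here A and B differ at one position, while A and C differ at two; the uncalled position in C is skipped rather than counted as a difference.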
Host to host transmission of a bacterial strain can also be studied by whole genome SNP typing. Host to host transmission is evident whenever sequenced isolates have identical genomes (i.e. 0 SNP differences). When isolates differ by only a few SNPs, more sophisticated analyses are required to confirm or refute transmission. This requires knowledge of the molecular clock (i.e. the rate at which bacterial genomes acquire mutations), combined with robust definitions (cut-off values) of when strains are considered to be clonal. In a recent study, clonality was defined as at most 2 SNP differences between bacterial isolates 94. This study also applied a cut-off value of >10 SNPs as sufficient genetic diversity for cases to be considered genetically unrelated. A concern when using a standard cut-off value to determine whether strains are clonal is that the mutation rate can vary between C. difficile strains.

CDI is not a simple nosocomial disease

Traditionally, symptomatic CDI patients are considered to be responsible for the large majority of CDI transmission within the healthcare system. However, a recent study using high resolution SNP typing has demonstrated that we need to reconsider the traditional view of transmission solely by symptomatic patients 94. The authors showed that approximately 45% of the presumed outbreak cases (i.e. same subtype, same hospital, same period) could not be linked to a previous CDI case by whole genome SNP typing. This demonstrates that high resolution whole genome SNP typing enables us to study host to host transmission during hospital outbreaks more precisely. Another interesting observation in the study of Eyre et al. 94 was that novel C. difficile genotypes (i.e. genetically unrelated, >10 SNP differences) were continuously introduced into hospitals during the study period, suggesting that these novel genotypes originated from a large and diverse C. difficile reservoir outside the healthcare system (i.e. the community). Various sources are potentially involved in the establishment of this diverse community reservoir, including the environment, food and animals. The presence of C. difficile in environmental sources (soil and water) is reviewed by Hensgens et al. 6, who concluded that clinically relevant strains can be found in these sources. However, the infectious dose required to acquire CDI is unknown, and accordingly the contribution of environmental sources to the community reservoir is also unknown. In addition, foodborne transmission of C. difficile has been hypothesized as a potential source of community-acquired CDI 101, 102. Asymptomatic colonized individuals may also contribute to CDI outside medical settings (i.e. community acquired CDI). Interestingly, recent studies have demonstrated an increased incidence of CDI outside medical settings 96, 103, 104.
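The cut-offs from the study cited above (at most 2 SNPs consistent with transmission, >10 SNPs genetically unrelated) and the role of the molecular clock can be sketched as follows. The Poisson model is a simplifying assumption: mutations accumulate at a constant rate along a single lineage over the time separating two samples. The rate is a parameter, and the value in the usage lines is a placeholder, not an estimate for C. difficile. `fraction_linked` is a hypothetical, simplified version of the linkage analysis in which each case is compared with all earlier cases; the case identifiers and distances are invented.

```python
import math

# Cut-offs as reported for the whole-genome SNP study cited in the text (ref. 94).
CLONAL_MAX_SNPS = 2       # <= 2 SNPs: consistent with transmission
UNRELATED_MIN_SNPS = 11   # > 10 SNPs: genetically unrelated


def classify_pair(snps):
    """Classify an isolate pair by its whole-genome SNP distance."""
    if snps <= CLONAL_MAX_SNPS:
        return "consistent with transmission"
    if snps >= UNRELATED_MIN_SNPS:
        return "genetically unrelated"
    return "indeterminate"


def prob_at_most(k, rate_per_genome_per_year, years):
    """Poisson probability of observing <= k SNPs after `years` of evolution.

    Assumes a constant clock rate along a single lineage (simplification).
    """
    lam = rate_per_genome_per_year * years
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def fraction_linked(cases, dist):
    """Fraction of cases (after the first) within CLONAL_MAX_SNPS of any earlier case.

    cases: list of (case_id, onset_day); dist: dict frozenset({id1, id2}) -> SNPs.
    """
    ordered = sorted(cases, key=lambda case: case[1])
    if len(ordered) < 2:
        raise ValueError("need at least two cases")
    linked = 0
    for i, (case_id, _) in enumerate(ordered[1:], start=1):
        earlier = (prev_id for prev_id, _ in ordered[:i])
        if any(dist[frozenset((case_id, prev))] <= CLONAL_MAX_SNPS for prev in earlier):
            linked += 1
    return linked / (len(ordered) - 1)


if __name__ == "__main__":
    # Hypothetical cases and SNP distances, for illustration only.
    cases = [("c1", 0), ("c2", 10), ("c3", 20), ("c4", 30)]
    snps = {frozenset(p): d for p, d in [
        (("c1", "c2"), 1), (("c1", "c3"), 15), (("c2", "c3"), 14),
        (("c1", "c4"), 3), (("c2", "c4"), 2), (("c3", "c4"), 16)]}
    print(fraction_linked(cases, snps))
    # Placeholder clock rate (1 SNP per genome per year), NOT a C. difficile estimate.
    print(prob_at_most(CLONAL_MAX_SNPS, 1.0, 1.0))
```

In this toy example c3 cannot be linked to any earlier case, mirroring the kind of unlinked case described above, while the clock function shows why a strict 0-SNP rule would miss genuine transmission once time has passed between samples.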
A definition of community acquired CDI is provided by the European Centre for Disease Prevention and Control (ECDC) 105: patients with symptoms of CDI starting in the community, or within 48 hours of admission to a healthcare facility, provided that the onset was more than 12 weeks after the last discharge from a healthcare facility. In addition to causing community acquired CDI, asymptomatic colonized individuals may also be responsible for introducing C. difficile into the healthcare system and for onward transmission to other patients. Asymptomatic carriage of C. difficile can be quite common among patients visiting hospitals, ranging from 4% to 12% of tested stools. The C. difficile subtypes observed in the community are similar to those observed in the healthcare system, with the exception of PCR RT …, which suggests that C. difficile is circulating between both domains. However, a relatively small study conducted by Eyre et al. 108 found no evidence of onward transmission from asymptomatic colonized individuals. Another question that needs to be addressed is where and how these asymptomatic individuals become colonized. One possible route is household transmission. CDI patients who are discharged from the hospital and return home may still shed spores and potentially transmit C. difficile to their relatives or household contacts (9.5%), pets (18.2%) and the home environment (1.4%) 111. However, larger studies are needed to quantify the relative contribution of household transmission as a source of human CDI. Another reservoir that has been hypothesized as a potential source of human CDI is the animal population, which is discussed in the next paragraph.

C. difficile in (farm) animals

C. difficile is increasingly recognized in a variety of animal species 6, 102, …. It is prominently present in farm animals, including cattle, horses and pigs, but it can also be found in companion animals such as cats and dogs 116. Interestingly, the PCR RTs isolated from farm animals are very similar to the clinically relevant PCR RTs, including types 014/020, 002, …, with the exception of PCR RT 027. Notably, PCR RT 078, which is believed to be more strongly associated with community acquired CDI 61, was the dominant PCR RT found among farm animals, especially in pigs 117, 118, 120, 121. The increased association of PCR RT 078 with human CDI, as well as its presence in pigs, has driven the hypothesis of zoonotic transmission 6, 99, 100, 116. So far, several studies have analysed the overlap of C. difficile genotypes in animals and humans 83, 84, 118, 122. Some of these studies used MLVA to demonstrate that C. difficile PCR RT 078 isolates from animals and humans were very closely related, i.e. part of the same clonal cluster 83, 84, 123. Furthermore, these studies reported similarities in antimicrobial resistance phenotypes between pig and human isolates, although in some cases human strains were significantly more resistant to tetracycline than pig strains 83. This, combined with the overlap in C. difficile genotypes, suggests that transmission between the human and the animal populations is indeed occurring. An alternative hypothesis is that both host populations acquire the bacterium from a yet to be identified common environmental source 6.
The true extent to which zoonotic transmission of C. difficile occurs, and the importance of farm animals as a reservoir for human disease, remain to be elucidated. Future studies should preferably exploit molecular typing methods with high discriminatory power, such as whole genome SNP typing, that allow human and animal isolates to be compared very precisely.

COMPARATIVE GENOMICS AND EVOLUTION OF C. DIFFICILE

In addition to typing of bacterial isolates, whole genome sequencing has a wider range of applications, including comparative genomics, marker development and evolutionary analysis. In 2006, the Wellcome Trust Sanger Institute in the United Kingdom finished sequencing the first complete C. difficile genome, that of strain 630. Strain 630 (PCR RT 012) was isolated in Switzerland in 1982 from a patient with severe pseudomembranous colitis, and subsequently spread to multiple other patients. The sequenced circular chromosome of 630 consists of approximately 4.3 million base pairs with a low G+C content of approximately 29%, and the strain carries an additional circular plasmid of 7.8 kilobases. The 630 genome encodes almost 3800 proteins. Since then, WGS of C. difficile has expanded, resulting in several other high quality genomes and numerous unfinished draft genomes in the National Center for Biotechnology Information (NCBI) database. These genomes represent the major problematic PCR RTs found within the healthcare system, including but not limited to PCR RTs 027 (R20291, BI-1 and CD196), 078 (M120), 001 (BI-9) and 017 (M68 and CF5) 125. Horizontal gene transfer (HGT) is a common mechanism by which bacteria acquire genes during evolution. Integrative and conjugative elements (ICEs) are transposable mobile elements that facilitate HGT. They are integrated into the bacterial genome and are capable of transfer to, and integration into, the genomes of other prokaryotes. The Tn916 family is a well-known family of mobile elements that has a broad host range and is also commonly found in the genus Clostridium 126, 127. ICEs (conjugative transposons) often carry antimicrobial resistance determinants that give the bacterium a (multi)drug resistant phenotype. For example, Tn916 harbours the tetracycline resistance determinant tet(M) 128.
Furthermore, it has become evident that transposons can also mediate the transfer of various other beneficial accessory genes, allowing the bacterium to adapt to and survive in new environmental conditions, such as the human gut 126, …. HGT has also contributed to the diversity of the C. difficile genome 124, 125, 133. Analysis of the C. difficile 630 genome revealed that it harbours a considerable number of mobile elements, including 7 putative conjugative transposons (CTn1 to CTn7) 124. In addition, it has been demonstrated that the presence of these elements can vary between and within subtypes 134. This was recently highlighted by the discovery of a large transposable element, Tn6164 (approximately 100 kb), in the M120 genome, whose presence is restricted to a few PCR RT 078 genomes 135. Various mobile elements have been identified so far, and with the increasing number of sequenced C. difficile genomes, many more novel mobile elements are likely to be identified.

Figure 3. The macroevolution of C. difficile. Shown is a phylogenetic tree based on the whole genomes of various C. difficile strains (n=9). The phylogeny illustrates the evolutionary relationships between different lineages (highlighted by different colours): M120 = PCR RT 078, M68 and CF5 = PCR RT 017, BI-9 = PCR RT 001, 630 = PCR RT 012 and 027s = PCR RT 027. For each lineage, the identified mobile elements are indicated as CTnCD#, together with the drug resistance genes present on those elements (in red). The dashed line represents the root of the phylogenetic tree. Proc Natl Acad Sci U S A. Apr 20, 2010; 107(16):

WGS of various C. difficile genomes has boosted the study of the C. difficile population structure. He et al. 125 were the first to demonstrate that C. difficile is a heterogeneous species that has diversified into several lineages (Figure 3). This WGS study also demonstrated that PCR RT 078 is the most distant relative within the C. difficile species. However, at that time WGS was expensive and complex to perform; therefore, multilocus sequence typing (MLST), which is less expensive and easier to perform, attracted much attention.
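As a minimal illustration of how a rooted tree can be derived from pairwise whole-genome SNP distances, the sketch below implements UPGMA, the simplest distance-based clustering method, and returns the tree in Newick format. This is a swapped-in technique: UPGMA assumes a constant molecular clock and is not necessarily the method used by He et al. The labels and distance matrix are hypothetical; the matrix is chosen so that one genome (D) is far from the rest, loosely echoing the distant position of PCR RT 078.

```python
def upgma(names, dist):
    """Build a rooted UPGMA tree and return it as a Newick string.

    names: leaf labels; dist: symmetric matrix (list of lists) of pairwise
    distances, e.g. whole-genome SNP counts.
    """
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): float(dist[i][j])
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # merge the closest pair of clusters
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = dab / 2.0  # ultrametric tree: new node sits at half the distance
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        # size-weighted average distance from the merged cluster to each other cluster
        new_d = {}
        for k in clusters:
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            new_d[k] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for k, v in new_d.items():
            d[(k, next_id)] = v  # k < next_id, so keys stay (smaller, larger)
        clusters[next_id] = (subtree, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    snp_matrix = [
        [0, 2, 6, 40],
        [2, 0, 6, 40],
        [6, 6, 0, 40],
        [40, 40, 40, 0],
    ]
    print(upgma(labels, snp_matrix))
```

Branch lengths come out in the same units as the input distances (here, SNPs), halved because each merge places the new node midway between the two clusters.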

Various large MLST studies were initiated that included many diverse C. difficile subtypes. These MLST studies showed that C. difficile evolved through at least five lineages (macroevolution) and demonstrated which subtypes were allocated to the respective lineages (microevolution). Insight into microevolution in particular greatly improved our understanding of closely related subtypes, for example the distinct PCR RTs that share a sequence type (ST) with hypervirulent PCR RT 027 (016, 036 and 176; all ST-1) or with PCR RT 078 (033, 045, 066 and 126; all ST-11). Since PCR RTs 027 and 078 are notorious C. difficile types, we sometimes refer to these related PCR RTs as 027-likes and 078-likes. Future (whole genome) sequencing studies should include many more diverse C. difficile strains, which will potentially improve current knowledge of the population structure, for example by identifying novel lineages, and of microevolution (i.e. evolution within lineages). Comprehensive knowledge of the macro- and microevolution of C. difficile, as well as the online availability of the underlying whole genome sequences, also facilitates comparative genomics. Clearly, the search for type specific molecular markers and virulence associated genes has become much easier with extensive knowledge of the C. difficile subtypes that belong to the same phylogenetic cluster. However, comparing bacterial genomes with the aim of finding a robust and unique marker can still be rather challenging, especially for large and highly mobile bacterial genomes such as the C. difficile genome 124. A robust marker needs to be stably integrated into the bacterial genome and preferably has a link to virulence. HGT has been important in shaping the C.
difficile genome 125, and has introduced many unique genomic regions with genes that potentially encode virulence associated proteins. However, many of these genomic regions are associated with mobility, i.e. they were transferred by mobile elements. Mobile elements can excise from genomes and integrate into other (less closely related) genomes, which makes these regions less suitable for marker discovery. Another challenge in comparative genomics is the speed at which newly sequenced draft genomes are released in publicly available databases. Many of these draft genomes are of lower quality (i.e. unfinished) than the initially sequenced reference genomes, or lack the (high quality) annotation needed for interpretation. Likewise, important metadata such as the designated type (i.e. PCR RT/ST/NAP), which is crucial when comparing genomes, is often unavailable. Smart bioinformatics approaches are needed to overcome these challenges and to support future marker development for the detection of specific pathogenic bacteria.

OUTLINE OF THIS THESIS

With this thesis we aim to improve the currently available typing capacity for C. difficile using different approaches. With the discovery of novel genomic markers, we contributed to the rapid detection of hypervirulent C. difficile PCR RTs. In addition, we intend to improve current knowledge on the genetic diversity of C. difficile strains by characterising a diverse collection of reference strains using multiple molecular typing methods. The consensus CE ribotyping protocol we propose will facilitate accurate transfer and comparability of typing data, thereby supporting future international C. difficile surveillance programs. Finally, we want to expand the existing typing capacity by developing and applying high resolution SNP typing for C. difficile PCR RT 078.

Chapter 1 (this chapter) provides a general introduction to this thesis.

Chapter 2 provides an overview of the typing methods currently most frequently applied to characterise C. difficile, together with some future perspectives. It reviews typing methods applied in surveillance studies as well as methods that are suitable for investigating hospital outbreaks.

Chapter 3 reports on the performance of an in-house developed real-time PCR targeting the tcdB gene. This assay was evaluated against the appropriate gold standard and compared with a commercially available assay and another in-house developed assay.

Chapter 4 describes the development and validation of an international consensus protocol for capillary electrophoresis (CE) PCR ribotyping. CE PCR ribotyping had never been validated in a multi-centre study addressing interlaboratory variability. Here, we report the reproducibility, accuracy and portability of a consensus protocol across four reference centres spanning Europe and North America.

Chapter 5 is focused on two genomic markers that are unique to C.
difficile lineages linked to hypervirulence (i.e. lineage 5 with PCR RT 078 and lineage 2 with PCR RT 027). We also report on other PCR RTs that contain the described markers, thereby providing valuable information on strains that are highly related to hypervirulent strains. Furthermore, we describe the bioinformatic approach underlying the marker discovery.

Chapter 6 describes a comparative analysis of various C. difficile strains belonging to a reference collection that was well characterized by several typing methods, including MLST (using 7 housekeeping genes; HG) and PCR ribotyping, combined with