How Stable Are the Core Genes of Bacterial Pathogens?

High rates of recombination within the meningococci and pneumococci mean even their core genomes are subject to rapid change

Edward J. Feil

Edward J. Feil is an MRC Research Fellow in the Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath.

ASM News, Volume 69, Number 5, 2003

One big surprise of the microbial genomic era is the remarkable evolutionary potential bestowed on many species by frequent horizontal genetic transfer. The loss or gain of individual genes or large genomic islands accounts for the emergence of many specific metabolic, virulence, or drug resistance phenotypes. However, even for the most dynamic microbial genomes, these rapid changes typically are superimposed upon a backbone of core genes that are presumably indispensable for a particular species. Those core genes are considered likely components in all isolates of a given species or genus. Because selection at these loci exerts a stabilizing rather than a diversifying influence, variations detected in these genes tend to be neutral, or nearly so, and will accumulate in a clock-like manner. Thus, these genes should be reliable indicators of evolutionary history. For instance, core 16S rRNA sequences are frequently used to reconstruct the phylogenies of microbial species and, for an individual bacterial species, metabolic or housekeeping genes are commonly used for typing.

Multilocus enzyme electrophoresis (MLEE) and the more recently developed multilocus sequence typing (MLST) are two typing techniques that are based on such assumptions. MLST characterizes isolates of a bacterial species by the alleles present (as defined by nucleotide sequence data) at each of seven core genes.
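As a rough illustration of what an MLST allelic profile looks like, the sketch below numbers the distinct sequences seen at each locus and records, for every isolate, the tuple of allele numbers across seven loci. The locus names, isolate names, and toy sequences are invented for the example, and numbering alleles in order of first appearance is an assumption made here; real MLST schemes assign allele numbers through a curated database.

```python
def assign_profiles(isolate_seqs, loci):
    """Return {isolate: allelic profile}, numbering alleles per locus.

    Allele numbers are assigned in order of first appearance (an assumption
    for this sketch; real schemes use a curated allele database).
    """
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for isolate, seqs in isolate_seqs.items():
        profile = []
        for locus in loci:
            known = allele_ids[locus]
            seq = seqs[locus]
            if seq not in known:
                known[seq] = len(known) + 1  # new sequence gets a new allele number
            profile.append(known[seq])
        profiles[isolate] = tuple(profile)
    return profiles


# Hypothetical loci and toy sequences, not taken from any real MLST scheme.
LOCI = [f"locus{i}" for i in range(1, 8)]
base = {locus: "ACGTACGT" for locus in LOCI}
isolates = {
    "isolate_A": dict(base),
    "isolate_B": dict(base, locus3="ACGTACGA"),  # differs at one locus
    "isolate_C": dict(base),                     # same genotype as isolate_A
}
profiles = assign_profiles(isolates, LOCI)
```

Two isolates with the same profile, such as isolate_A and isolate_C here, would be treated as the same genotype, while isolate_B differs from them at a single locus.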
The method is now being widely used for molecular typing, and large datasets have been generated for important gram-negative and gram-positive bacterial pathogens, including Neisseria meningitidis, Streptococcus pneumoniae, and Staphylococcus aureus. In addition to their primary use as a typing resource, these data are providing a means for examining the degree to which homologous recombination occurs within the core genome. Such analyses are providing some surprises. In particular, the evidence from analyses of the meningococcal and pneumococcal MLST datasets demonstrates that even the core genomes of these species are subject to rapid evolutionary change by homologous recombination.

Clonality and Recombination within Bacterial Populations

Bacterial populations are typically characterized by the presence of a limited number of widespread, predominant genotypes (clones) and a much larger number of rare strains. In many populations, these predominant clones belong to larger clusters of very closely related strains, called clonal complexes, that evolve by radial ("starlike") diversification from a recent common ancestor. Together, these clonal complexes may account for a large proportion of all the individual strains within a population.

Molecular-typing techniques aim to assign individual isolates to clones or clonal complexes, but different techniques vary in their discriminatory power. MLST using seven neutral loci distinguishes clones and clonal complexes in most
bacterial species examined so far, with some notable exceptions. The core genome of Mycobacterium tuberculosis, for example, is very uniform, and if MLST were attempted for this species, almost all M. tuberculosis strains would appear identical, rendering the data useless for epidemiological purposes. At the other extreme, recombination within Helicobacter pylori occurs so frequently that alleles within isolates from a defined geographical area are at complete linkage equilibrium, or panmixis, meaning that most isolates possess a unique genotype and discrete clones cannot be readily discerned. Between these two extremes lie a number of species in which MLST readily identifies multiple examples of the same or very similar genotypes.

Fred Cohan, of Wesleyan University, Middletown, Conn., favors the term "ecotype" for these clusters. This term alludes to the idea that clusters emerge and are maintained in the population because they represent adaptations to specific ecological microniches. Extending this argument, Cohan has recently suggested that ecotypes provide a biologically meaningful species definition, and that traditionally named bacterial species should in fact be considered clusters of related species, or genera. However, with the important exception of the emergence of clones resistant to one or multiple antibiotics, the precise selective advantage of most ecotypes within most named bacterial species remains unclear. Owing to the general scarcity of very old strains within strain collections, the stability and relative rates of emergence and extinction of clonal complexes are also unknown, although it is likely that these rates are in part determined by the frequency of homologous recombination within the population.

Estimating the Impact of Recombination and Mutation on Clonal Divergence

It is widely accepted that the frequency of homologous recombination within a given bacterial species should be directly reflected in the degree of linkage observed between alleles, as recombination will act to break up linkage disequilibrium. Put another way, the extent to which the population falls into discrete clonal clusters should be lessened in populations where recombination is frequent because this process might, over time, act to break these clusters down.

However, this argument fails to take into account two main difficulties. First, it is necessary to consider the effect of the selective maintenance of adaptive genotypes, where specific clones might survive even in a freely recombining population. Second, it is necessary to assume that the sample of strains being examined is representative of the natural population. This sampling condition is rarely met in those species such as the meningococcus or pneumococcus that typically live as commensals but occasionally cause invasive disease; strain collections of these species tend to massively underrepresent isolates recovered from asymptomatic carriage. The emphasis on the study of disease-causing isolates of these species is understandable, but from a population biology perspective it has led to a seriously skewed view of the overall diversity and microevolution of these species. Fortunately, this problem is currently being addressed by, among others, Brian Spratt of Imperial College, London, United Kingdom (UK), and Martin Maiden of the University of Oxford, UK, who are using MLST to characterize large samples of the carriage populations of these species.

Many analytical techniques have been devised to detect homologous recombination directly from alignments of sequence data, and interested readers are referred to a recent review by Philip Awadalla of the University of California, Davis.
A potential drawback with many sophisticated approaches is that they require difficult estimates of population genetics parameters. David Guttman of the University of Toronto, Ontario, Canada, and Dan Dykhuizen of the State University of New York, Stony Brook, developed a method that neatly sidesteps many of the complexities involved in estimating recombination rates from sequence data. Their approach restricts the analysis to very closely related strains of Escherichia coli belonging to the group A lineage. This simplifies the task of identifying and scoring recombination and mutation events because it is unlikely, in the short time since the strains diverged from a common ancestor, that more than a single event will have occurred at any given site in the genome. For example, a single point mutation is unlikely to have been covered up by a subsequent recombinational replacement. There may still be difficulties in distinguishing
recombination events from mutation, particularly if the parental recombining strains are very similar and homologous replacement results in only a small number of nucleotide changes. In principle, however, the relative frequencies of events over the very short term should reflect their relative contributions to evolution over the longer term.

FIGURE 1. Estimating the impact of recombination on clonal diversification. MLST genotypes (1) are first subdivided into clonal complexes (2). These groups are defined on the basis that each strain within the group shares at least 5 of 7 identical loci with at least one other member of the group. Each genotype in turn is compared against all other genotypes within the clonal complex, and the putative ancestral (founder) genotype is assigned on the basis that this genotype differs from the largest number of other genotypes within the clonal complex at only one locus out of seven (3). The variant alleles within single-locus variants (SLVs) are then compared to the corresponding allele in the ancestral genotype (4), and each SLV is assigned as having arisen by recombination, or by point mutation, on the basis of the number of nucleotide changes and the frequency of the changed allele within the dataset.

Examining Intraclonal Diversification within Bacterial Populations

While in the laboratory of Brian Spratt, I extended the work of Guttman and Dykhuizen and developed a similar approach based on MLST data. This approach follows three main steps.
First, clonal complexes are defined as groups of strains in which each strain has a defined level of similarity (typically five out of seven identical alleles) to at least one other strain in the group. In step two, the most likely ancestral, or founder, genotype of each clonal complex is assigned parsimoniously on the basis that it defines, out of all the genotypes within the clonal complex, the highest number of very close relatives. Initial diversification of a clone will result in clonal variants that differ at only one of seven MLST loci; the most parsimonious founder is thus the genotype associated with the greatest number of such single-locus variants (SLVs). Finally, the likely patterns of descent of members of the clonal complex from the predicted founder genotype are graphically displayed (Fig. 1).

Having identified the most parsimonious founder genotype of each clonal complex, those SLVs in which the variant allele arose by point mutation are distinguished from those that arose by recombination. If an allele changed by a single de novo point mutation, we expect the resulting allele to have two characteristics. First, it will differ from the corresponding allele in the founder genotype only at a single nucleotide site (multiple point mutations within a single gene will be unlikely, as the remaining six genes have remained unchanged). Second, the mutated allele is likely to be novel, that is, not present in unrelated lineages elsewhere in the MLST dataset. Hence, all of the variant alleles in SLVs that are both novel and have single-nucleotide changes can be assigned as putative point mutations, and all other allelic changes can be assigned as putative recombinational replacements. Because many of the MLST datasets contain at least 500 isolates, most of the common alleles in the population will be represented, and those that have been imported by recombination are likely to be found elsewhere within the database.
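The steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the work described: profiles are tuples of seven allele numbers, complexes are built by simple single-linkage grouping, ties for founder are broken arbitrarily, "novel" is taken to mean "absent from every genotype outside the complex", and allele sequences are assumed to be aligned and of equal length. All function names and toy data are invented for the example.

```python
from collections import Counter
from itertools import combinations


def shared_loci(a, b):
    """Number of loci at which two allelic profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))


def clonal_complexes(profiles, min_shared=5):
    """Group profiles so each shares >= min_shared loci with another member."""
    profiles = sorted(set(profiles))
    parent = {p: p for p in profiles}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    for a, b in combinations(profiles, 2):
        if shared_loci(a, b) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for p in profiles:
        groups.setdefault(find(p), []).append(p)
    # Genotypes with no close relative are singletons, not complexes.
    return [g for g in groups.values() if len(g) > 1]


def founder(complex_):
    """Genotype with the most single-locus variants (ties: first in sort order)."""
    def n_slvs(p):
        return sum(shared_loci(p, q) == len(p) - 1 for q in complex_)
    return max(complex_, key=n_slvs)


def classify_slv(founder_profile, slv, complex_, all_profiles, allele_seqs):
    """Label an SLV of the founder as 'point mutation' or 'recombination'."""
    locus = next(i for i, (x, y) in enumerate(zip(founder_profile, slv)) if x != y)
    old = allele_seqs[(locus, founder_profile[locus])]
    new = allele_seqs[(locus, slv[locus])]
    n_changes = sum(a != b for a, b in zip(old, new))  # aligned, equal length
    members = set(complex_)
    novel = all(p[locus] != slv[locus] for p in all_profiles if p not in members)
    return "point mutation" if n_changes == 1 and novel else "recombination"


# Toy data (invented): one founder, three SLVs, one double-locus variant,
# and two unrelated genotypes that form a second complex.
F = (1, 1, 1, 1, 1, 1, 1)
all_profiles = [
    F,
    (2, 1, 1, 1, 1, 1, 1),
    (1, 3, 1, 1, 1, 1, 1),
    (1, 1, 4, 1, 1, 1, 1),
    (2, 1, 1, 1, 1, 1, 5),
    (9, 9, 4, 9, 9, 9, 9),
    (9, 9, 4, 9, 9, 9, 8),
]
allele_seqs = {
    (0, 1): "ACGTACGT", (0, 2): "ACGTACGA",  # one change, allele novel
    (1, 1): "GGGGCCCC", (1, 3): "GGTTCCAC",  # three changes
    (2, 1): "TTTTAAAA", (2, 4): "TTTTAAAC",  # one change, allele seen elsewhere
}

complex_ = next(g for g in clonal_complexes(all_profiles) if F in g)
root = founder(complex_)
slvs = [p for p in complex_ if shared_loci(root, p) == len(root) - 1]
calls = {p: classify_slv(root, p, complex_, all_profiles, allele_seqs) for p in slvs}
tally = Counter(calls.values())  # counts of each kind of event among the SLVs
```

In this toy dataset the tally gives two putative recombinational replacements and one putative point mutation; dividing one count by the other is the kind of per-allele comparison the approach relies on, although a real analysis would pool SLVs across many clonal complexes.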
Using this approach, the recombination-to-mutation ratio can be determined for any species for which a suitable MLST database is available. The results so far suggest that for both the meningococci and the pneumococci, a single
allele is 5- to 10-fold more likely to change by recombination than by point mutation. In contrast, estimates for S. aureus suggest that alleles are approximately 15-fold more likely to change by point mutation than by recombination.

[Figure 2 plot. Labels: -ln likelihood, tree number, pneumococcal gdh tree, trees of other six genes, random trees, 99th percentile.]

Detecting the Intraspecies Phylogenetic Signal

High rates of recombination will eliminate any intraspecies phylogenetic signal within those genes affected. Hence, for a given set of strains, phylogenetic trees reconstructed from the sequences of different genes will show a low level of similarity, or congruence. Because MLST datasets are based on multiple loci, they are ideal for assessing the effects of recombination through an examination of the congruence between gene trees. This approach differs from that just described because it requires only a small number of strains (say, 30 to 50). Because these strains are chosen to be distantly related to each other (on the basis of the MLST data), this analysis examines the effect of recombination over the long-term diversification between clones rather than on the short-term microevolutionary events occurring within clones. This statistical method determines whether pairs of gene trees are more similar in topology to each other than would be expected by chance (Fig. 2).

This test for congruence has been applied to MLST data for six different bacterial pathogens, including 30 strains belonging to N. meningitidis. The results demonstrate that in the great majority of cases gene trees constructed from meningococcal housekeeping genes are no more similar to each other than to trees of random topology. A similar picture emerged for S. pneumoniae. Although the MLST data originally reported for S.
aureus also suggest low levels of congruence, these data contained errors, and an analysis using a corrected dataset provides some evidence of a consistent phylogenetic signal between gene trees, although several incongruences remain. Thus, S. aureus, along with Haemophilus influenzae, exhibits a level of congruence intermediate between that observed in the pathogenic E. coli (where all trees were congruent with each other) and the meningococcus or pneumococcus, where no phylogenetic signal could be detected.

FIGURE 2. A statistical test of congruence using a maximum likelihood approach. A maximum likelihood tree for the first gene (in this case the pneumococcal gdh gene) is computed, and the -ln likelihood score of this tree is compared to the -ln likelihood scores obtained for 200 trees of random topology as a fit to the gdh data. As expected, all of the random trees are a poor fit to the gdh data. The -ln likelihood scores for the maximum likelihood trees from the other six genes, as a fit to the gdh data, are also computed and compared against the distribution of scores from the trees of random topology. If the likelihood score of the tree from one of the other six genes falls to the left of the 99th percentile of this distribution, the trees are significantly congruent. In the figure, none of the six other gene trees showed a -ln likelihood score that indicated a better fit to the gdh data than the 99th percentile for the trees of random topology; hence the trees from these six genes show no significant congruence with the gdh tree.

Two MLST Data-Based Approaches Have Distinctive Applications

The two approaches for detecting recombination among bacterial species outlined above both demonstrate high rates of recombination for the meningococcus and the pneumococcus, and lower levels of recombination in S. aureus. Despite this broad agreement, caution is still required in assuming that the microevolutionary events occurring within clones reflect the relative importance of recombination and point mutation over longer evolutionary time scales. For example, it is possible that many of the point mutations observed within clonal complexes of S. aureus are deleterious and will, in time, be removed from the population by purifying selection. This may increase the relative contribution of homologous recombination as one moves further back in the tree. In support of this notion, it has been noted that the proportion of nonsynonymous point mutations increases with overall sequence similarity when comparing different strains of S. aureus, and rises sharply when intraclonal comparisons are made.

These two approaches also have distinct pros and cons. The main advantage of the method for estimating recombination parameters within clonal complexes is that it is quantitative, and thus allows meaningful comparisons between species. It is possible that such comparisons may eventually lead to evidence concerning which ecological and/or biological factors are important in determining the recombination rate within a given population. A disadvantage is that the accuracy of the estimates is related to the proportion of alleles in the natural population that are represented in the dataset; thus the technique is only appropriate once a particular MLST scheme contains data from several hundred isolates. However, current estimates are likely to improve as MLST is adopted for routine typing and the amount of data available increases. The usefulness of this method is also dependent on the population structure of the species. For example, it would not be applicable to highly recombinogenic species such as H. pylori, for which every strain tends to correspond to a unique MLST genotype, as the analysis focuses on events occurring within clonal complexes.

In contrast, the statistical test of congruence does not require a very large MLST dataset or the presence of clonal complexes, although differences between species are more difficult to quantify. Strikingly, different intraspecies gene trees in the meningococcus and pneumococcus typically bear no more similarity to each other than to trees of random topology.
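The comparison at the heart of the congruence test can be sketched as follows. This is a simplified illustration rather than the published procedure: it assumes the -ln likelihood scores have already been computed with phylogenetics software (not shown), that a lower -ln likelihood means a better fit, and that "beyond the 99th percentile" can be read as "fewer than 1% of the random trees fit at least as well". The gene names and scores are invented.

```python
def fraction_fitting_as_well(candidate, random_scores):
    """Fraction of random-topology trees whose -ln L is <= the candidate's."""
    return sum(s <= candidate for s in random_scores) / len(random_scores)


def is_congruent(candidate, random_scores, alpha=0.01):
    """True if fewer than a fraction alpha of random trees fit as well."""
    return fraction_fitting_as_well(candidate, random_scores) < alpha


# Invented -ln likelihood scores: 200 random trees and three candidate trees,
# each scored as a fit to the same reference gene's data.
random_scores = [5000.0 + i for i in range(200)]
candidates = {"gene_x": 4800.0, "gene_y": 5100.0, "gene_z": 5300.0}
results = {gene: is_congruent(score, random_scores)
           for gene, score in candidates.items()}
```

Here only the hypothetical gene_x tree would count as significantly congruent with the reference gene; the other two fit no better than typical random topologies, which is the pattern reported for the meningococcus and pneumococcus.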
The absence of congruence between gene trees in the meningococcus and pneumococcus highlights the futility of attempting to reconstruct phylogenies in these species. The presence of a phylogenetic signal in some species, but not in others, has important consequences. For example, in E. coli or Salmonella enterica, trees reconstructed using core genes are a good approximation of the phylogenetic history of the species. Thomas Whittam of Michigan State University and coworkers have demonstrated the usefulness of a validated phylogenetic tree for pathogenic strains of E. coli by tracking the gain of important virulence determinants along the different branches of the tree. If there were no phylogenetic signal, however, such an approach would be pointless.

The evidence for a high frequency of recombination within meningococcal and pneumococcal housekeeping genes is now indisputable, indicating that even the core genomes of these two species are subject to rapid evolutionary change by horizontal gene transfer. Although homologous replacement of genes certainly contributes to the diversification and apparent phylogenetic history of gene sequences, this process is not directly comparable with many of the large-scale genetic events frequently documented through whole genome comparisons. Unlike changes in gene content, or the import of genomic islands, there may be no obvious selective benefits conferred by homologous replacements of core genes, and these events almost always occur between the same or similar species. However, the large differences in base composition seen in some imported genes indicate that genes may occasionally be acquired across vast evolutionary distances. Most of these acquisitions will be detrimental, or of no selective advantage, and will not be observed. Nonetheless, acquiring genes that provide a microbial species with the ability to colonize a new niche, or to exploit a new mode of transmission, occasionally may launch an isolate into a new lifestyle, and one that may have serious public health consequences.

SUGGESTED READING

Awadalla, P. The evolutionary genomics of pathogen recombination. Nat. Rev. Gen. 4:
Cohan, F. M. What are bacterial species? Annu. Rev. Microbiol. 56:
Feil, E. J., J. E. Cooper, H. Grundmann, D. A. Robinson, M. C. Enright, T. Berendt, S. J. Peacock, J. Maynard Smith, M. Murphy, B. G. Spratt, C. E. Moore, and N. P. J. Day. How clonal is Staphylococcus aureus? J. Bacteriol., in press.
Feil, E. J., and B. G. Spratt. Recombination and the population structures of bacterial pathogens. Annu. Rev. Microbiol. 55:
Guttman, D. S., and D. E. Dykhuizen. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science 266:
Maiden, M. C. J., J. A. Bygraves, E. Feil, G. Morelli, J. E. Russell, R. Urwin, Q. Zhang, J. Zhou, K. Zurth, D. A. Caugant, I. M. Feavers, M. Achtman, and B. G. Spratt. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95:
Ochman, H. Lateral and oblique gene transfer. Curr. Opin. Genet. Dev. 11:
Reid, S. D., C. J. Herblin, A. C. Bumbaugh, R. K. Selander, and T. S. Whittam. Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406: