Supplementary Material

Manuscript title: Cross-immunity and community structure of a multiple-strain pathogen in the tick vector

Description of the primers used in the second PCR: The forward and reverse primers both carried the 454 Life Sciences A or B sequencing adapter (blue italic font) and the key (bold black font). The forward primer also included a 10 bp Multiplex Identifier (MID) to tag each sample individually (red italic font). The template-specific sequences target the ospC gene (green font).

Forward: CCATCTCATCCCTGCGTGTCTCCGACTCAG-MID-TATTAATGACTTTATTTTTATTTATATCT
Reverse: CCTATCCCCTGTGTGCCTTGGCAGTCTCAGTTGATTTTAATTAAGGTTTTTTTGG
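The layout of the forward primer (adapter and key, then the 10 bp MID, then the template-specific sequence) can be illustrated with a minimal Python sketch. The function name, the constant names, the MID sequences, and the sample names below are hypothetical and serve only as an illustration; the adapter, key, and template-specific sequences are copied from the primer description.

```python
# Sequences copied from the primer description; constant names are illustrative.
ADAPTER_A_AND_KEY = "CCATCTCATCCCTGCGTGTCTCCGACTCAG"  # adapter plus key (forward)
FORWARD_TEMPLATE = "TATTAATGACTTTATTTTTATTTATATCT"     # ospC-specific part (forward)
REVERSE_PRIMER = "CCTATCCCCTGTGTGCCTTGGCAGTCTCAGTTGATTTTAATTAAGGTTTTTTTGG"

MID_LENGTH = 10  # the MID is 10 bp long


def build_forward_primer(mid: str) -> str:
    """Return adapter + key + MID + template-specific sequence for one sample."""
    mid = mid.upper()
    if len(mid) != MID_LENGTH or set(mid) - set("ACGT"):
        raise ValueError(f"MID must be {MID_LENGTH} unambiguous nucleotides: {mid!r}")
    return ADAPTER_A_AND_KEY + mid + FORWARD_TEMPLATE


# Hypothetical MID-to-sample mapping, for illustration only.
EXAMPLE_MIDS = {"ACGAGTGCGT": "tick_001", "ACGCTCGACA": "tick_002"}

# One tagged forward primer per sample; the reverse primer is shared by all samples.
primers = {sample: build_forward_primer(mid) for mid, sample in EXAMPLE_MIDS.items()}
```

Because only the forward primer carries a MID, each sample needs its own forward primer, whereas a single reverse primer serves all samples.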
Description of the method used to clean the sequences

Preliminary cleaning of the ospC gene sequences: The 454 sequencing run produced 352,882 sequences. The split_libraries.py script of QIIME (1) was used for a preliminary cleaning of each pool of sequences with the following parameters: a maximum of four primer mismatches, a minimum length of 300 bp, a maximum homopolymer length of 8 bp, and primers retained. During this step, sequences were assigned to their sample name according to their 10 bp MID tag.

Removing indels and chimeric sequences: After the preliminary cleaning, we checked the sequences for methodological errors such as insertions or deletions (indels) produced by the PCR or the sequencing technique. For each pool, the nucleotide sequences were aligned individually to an OspC amino acid sequence using Exonerate (2). This software can align genomic sequences to a protein sequence and takes frame shifts into account. Indels were softmasked (written in lower case) and all sequences were aligned together using psa2msa. Softmasked nucleotides were coded as gaps in sequences that did not contain that particular indel. The resulting sequences contained many gaps and were very long (1697 bp). Using trimAl (3), we therefore deleted all positions at which more than 60% of the sequences had a gap. After these uninformative positions were deleted, the resultant sequences were 521 bp long.

The last cleaning step removed chimeric and other artificial sequences that can arise during either PCR or 454 sequencing. In Exonerate, sequences that align poorly contain a large number of gaps. Based on the distribution of the sequence lengths, we therefore removed all sequences that contained fewer than 350 bp, which corresponds to more than 171 gaps in the 521 bp alignment. In addition, some poorly aligned sequences were split into two separate sequences by the software but were assigned the same name. We therefore deleted all sequences whose name appeared more than once.
The purpose of the previous cleaning steps was to eliminate low-quality sequences and sequences that were artificially created by the molecular methods. The resultant cleaned data set of 240,410 sequences (521 bp long) is expected to contain only those sequences that are biologically real.
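The gap-based steps of the cleaning procedure (dropping alignment positions where more than 60% of the sequences have a gap, then discarding sequences with fewer than 350 nucleotides or with a repeated name) can be sketched in plain Python. This is a simplified stand-in under stated assumptions, not the trimAl or Exonerate implementation: the function names are hypothetical, the alignment is assumed to be a list of (name, aligned sequence) pairs of equal length, and gaps are assumed to be written as "-".

```python
from collections import Counter


def trim_gappy_columns(records, max_gap_fraction=0.6):
    """Drop alignment columns in which more than max_gap_fraction of sequences have a gap."""
    n = len(records)
    length = len(records[0][1])
    keep = [
        i for i in range(length)
        if sum(seq[i] == "-" for _, seq in records) <= max_gap_fraction * n
    ]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


def filter_records(records, min_bases=350):
    """Remove sequences with too few nucleotides and all sequences whose name is repeated."""
    name_counts = Counter(name for name, _ in records)
    return [
        (name, seq)
        for name, seq in records
        if name_counts[name] == 1 and len(seq) - seq.count("-") >= min_bases
    ]


# Toy alignment (hypothetical data); min_bases is lowered to suit the short sequences.
toy = [
    ("a", "AC-GT"),
    ("b", "A--GT"),
    ("c", "AC-G-"),
    ("c", "-C-GT"),  # same name as the previous record: both are removed
    ("d", "A----"),  # too few nucleotides: removed
]
trimmed = trim_gappy_columns(toy)
cleaned = filter_records(trimmed, min_bases=3)
```

In the toy alignment, the third column is a gap in every sequence and is dropped, after which only the records named "a" and "b" pass the filter.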
Table S1. Comparison of the nomenclature of the major ospC groups of B. afzelii between the present study and four earlier studies (4-7). We based the new names of the major ospC groups on the classification by Bunikis et al. (8) and Strandh and Råberg (9). The studies of Baranton et al. (4) and Theisen et al. (6) did not name the major ospC groups. We therefore used the order of appearance in the original paper to rename the major ospC groups as unknown 1 (Unk1), unknown 2 (Unk2), and so on. A dash marks an empty cell.

New name   Lagal 2003   Baranton 2001   Theisen 1995   GenBank accession number
A1         A2           Unk2            Unk1           AY363710
A2         ME*          Unk14           Unk3           FJ750334
A3         A3           Unk8            -              AY363712
A4         A8           Unk13           -              AY363713
A5         -            Unk3            Unk2           AY363714
A6         -            -               -              AY363715
A7         A6           Unk11           -              AY363718
A8         -            -               -              AY150201
A9         A1           Unk5            Unk4           AY363719
A10        YU*          -               -              AY363720
A11        -            Unk7            -              AY363721
A12        A4           Unk9            -              AY150205
A13        -            -               -              FJ750336
A14        A5           Unk4            -              AY150203
A15        -            Unk12           -              AB000348
A16        -            -               -              FJ546555
A17        -            -               -              X83552
A18        -            Unk10           -              AY491403
A19        -            Unk6            -              AF230184
A20        -            -               -              AY491407

*YU and ME were named in Pérez et al. (7).
Table S2. Comparison of the nomenclature of the major ospC groups of B. garinii between the present study and three earlier studies (4-6). We based the new names of the major ospC groups on the classification by Lagal et al. (5). The studies of Baranton et al. (4) and Theisen et al. (6) did not name the major ospC groups. We therefore used the order of appearance in the original paper to rename the major ospC groups as unknown 1 (Unk1), unknown 2 (Unk2), and so on. A dash marks an empty cell.

New name   Lagal 2003   Baranton 2001   Theisen 1995   GenBank accession number
G1         G1           -               Unk3           L42879
G2         G2           Unk4            Unk2           X81526
G3         G3           Unk9            -              L42870
G4         G4           Unk21           Unk1           AJ132797
G5         G5           Unk8            -              AY150185
G6         G6           Unk16           -              AY150186
G7         G7           -               -              AY150199
G8         G8           Unk19           -              AY150193
G9         G9           Unk20           -              AY150191
G10        G10          Unk11           -              AY150189
G11        G11          Unk18           -              AY150195
G12        -            Unk17           -              L42886
G13        -            Unk6            -              L42875
G14        -            Unk10           -              L42863
G15        -            Unk13           -              D49376
G16        -            Unk1            -              X83556
G17        -            Unk2            -              D49509
G18        -            Unk3            -              AF098941
G19        -            Unk5            -              D49505
G20        -            Unk7            -              D49504
G21        -            Unk12           -              D49381
G22        -            Unk14           -              D49499
G23        -            Unk15           -              D49508
G24        -            Unk22           Unk4           X84773
Table S3. The numbers of substitutions, insertions, and deletions were counted by comparing 30 ospC gene sequences within each of the 23 major ospC groups. The substitutions were split into synonymous (S) and non-synonymous (NS) substitutions. The insertions and deletions (indels) were categorized into changes involving one, two, or three nucleotides (1nt, 2nt, 3nt). The sequences included the first 350 bp of the ospC gene.

             Substitutions         Indels
ospC group   Total   S     NS      1nt   2nt   3nt
A1           7       4     3       6     0     2
A2           13      9     4       14    1     5
A3           3       3     0       4     2     3
A5           5       1     4       10    0     0
A7           10      7     3       14    4     7
A9           9       4     5       8     0     2
A10          1       1     0       0     0     4
A11          7       4     3       2     0     3
A12          5       3     2       3     0     3
A14          7       3     4       2     1     2
V1           10      5     5       6     1     4
Q            10      7     3       17    1     7
G2           3       1     2       4     0     4
G4           5       5     0       9     0     6
G6           6       3     3       6     1     5
G7           3       1     2       8     1     3
G8           0       0     0       2     1     3
G9           2       1     1       3     1     4
G10          13      8     5       4     0     4
G11          5       4     1       3     0     2
G13          4       1     3       4     2     4
G14          1       1     0       3     0     1
G15          12      7     5       11    4     1
Total        141     83    58      143   20    79
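The arithmetic in Table S3 can be checked with a short Python sketch: for each group the total number of substitutions should equal S + NS, and each column should sum to the reported total. The values are transcribed from the table; the variable and function names are hypothetical.

```python
# (group, total, S, NS, indel_1nt, indel_2nt, indel_3nt), transcribed from Table S3.
ROWS = [
    ("A1", 7, 4, 3, 6, 0, 2), ("A2", 13, 9, 4, 14, 1, 5),
    ("A3", 3, 3, 0, 4, 2, 3), ("A5", 5, 1, 4, 10, 0, 0),
    ("A7", 10, 7, 3, 14, 4, 7), ("A9", 9, 4, 5, 8, 0, 2),
    ("A10", 1, 1, 0, 0, 0, 4), ("A11", 7, 4, 3, 2, 0, 3),
    ("A12", 5, 3, 2, 3, 0, 3), ("A14", 7, 3, 4, 2, 1, 2),
    ("V1", 10, 5, 5, 6, 1, 4), ("Q", 10, 7, 3, 17, 1, 7),
    ("G2", 3, 1, 2, 4, 0, 4), ("G4", 5, 5, 0, 9, 0, 6),
    ("G6", 6, 3, 3, 6, 1, 5), ("G7", 3, 1, 2, 8, 1, 3),
    ("G8", 0, 0, 0, 2, 1, 3), ("G9", 2, 1, 1, 3, 1, 4),
    ("G10", 13, 8, 5, 4, 0, 4), ("G11", 5, 4, 1, 3, 0, 2),
    ("G13", 4, 1, 3, 4, 2, 4), ("G14", 1, 1, 0, 3, 0, 1),
    ("G15", 12, 7, 5, 11, 4, 1),
]

# Bottom row of Table S3, in the same column order as the counts in ROWS.
REPORTED_TOTALS = (141, 83, 58, 143, 20, 79)


def column_totals(rows):
    """Sum each numeric column across all groups."""
    return tuple(sum(column) for column in zip(*(row[1:] for row in rows)))


def inconsistent_groups(rows):
    """Return the groups whose substitution total differs from S + NS."""
    return [group for group, total, s, ns, *_ in rows if total != s + ns]


totals_match = column_totals(ROWS) == REPORTED_TOTALS
```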
Figure S1. The number of major ospC groups found by the clustering analysis is stable across a range of similarity thresholds (93% to 98%). In this analysis, the ospC gene sequences were combined for B. afzelii and B. garinii.

References

1. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI. 2010. QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7:335-336.
2. Slater GS, Birney E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31.
3. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972-1973.
4. Baranton G, Seinost G, Theodore G, Postic D, Dykhuizen D. 2001. Distinct levels of genetic diversity of Borrelia burgdorferi are associated with different aspects of pathogenicity. Research in Microbiology 152:149-156.
5. Lagal V, Postic D, Ruzic-Sabljic E, Baranton G. 2003. Genetic diversity among Borrelia strains determined by single-strand conformation polymorphism analysis of the ospC gene and its association with invasiveness. Journal of Clinical Microbiology 41:5059-5065.
6. Theisen M, Borre M, Mathiesen MJ, Mikkelsen B, Lebech A-M, Hansen K. 1995. Evolution of the Borrelia burgdorferi outer surface protein OspC. Journal of Bacteriology 177:3036-3044.
7. Pérez D, Kneubühler Y, Rais O, Jouda F, Gern L. 2011. Borrelia afzelii ospC genotype diversity in Ixodes ricinus questing ticks and ticks from rodents in two Lyme borreliosis endemic areas: Contribution of co-feeding ticks. Ticks and Tick-borne Diseases 2:137-142.
8. Bunikis J, Garpmo U, Tsao J, Berglund J, Fish D, Barbour AG. 2004. Sequence typing reveals extensive strain diversity of the Lyme borreliosis agents Borrelia burgdorferi in North America and Borrelia afzelii in Europe. Microbiology 150:1741-1755.
9. Strandh M, Råberg L. 2015. Within-host competition between Borrelia afzelii ospC strains in wild hosts as revealed by massively parallel amplicon sequencing. Phil Trans R Soc B 370:20140293.