SUPPLEMENTARY INFORMATION


Highly heterogeneous mutation rates in the hepatitis C virus genome

Ron Geller, Úrsula Estada, Joan B. Peris, Iván Andreu, Juan-Vicente Bou, Raquel Garijo, José M. Cuevas, Rosario Sabariegos, Antonio Mas, and Rafael Sanjuán.

NATURE MICROBIOLOGY

Supplementary Table S1. Genome regions containing low mutable (LM) and highly mutable (HM) clusters.

Type  Gene  H77 sites  Sequence (a)
LM    Core             GCC AGG GCC CUG GCG CAU GGC GUC CGG GUU CUG GAA
LM    E                GGG GAC UUG UGC GGG UCA
LM    E                GGU GCC CAC UGG GGA GUC CUA GCG
LM    P                UCC UUC CUC GUG
LM    P                AGG UGG GUG
LM    NS               UCG GGA GGC CCC CUG CUG UGC CCC GUG GGA CAC GCU GUA GGC
LM    NS4B             AGU GUU GGA
LM    NS5A             CGG ACG GUG GUC
LM    NS5A             CCC CUG GAG GGG GAG
LM    NS5A             UGG UCG ACA GUC
LM    NS5A             GAU GAC GCG GAG GAU GUC
LM    NS5B             UCC GUG UGG AAA GAC UUU CUG
LM    NS5B             AAG UCC CUC ACC GAG AGG CUU
HM    NS               GCG UAC GCA GCU GAG
HM    NS               UCA GGG GGU GCU UAU GAC AUA AUA
HM    NS4B             AUC ACU CCU GCU GUA CAG ACC AAC UGG CAA AGA CUC GAG ACC
HM    NS5A             GAG GAA UAC GUG GAA AUA AGG
HM    NS5A             UCG UUU AGA GUA GGA CUA CAC GAU
HM    NS5B             GAA GUU UUC

(a) Bases showing no mutations despite >10,000-fold coverage in each of the lines (LM) or showing median mutation frequencies >10^-3 (HM) are underlined. Triplets correspond to polyprotein codons.
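The footnote criteria for LM and HM bases can be sketched in code. This is only an illustration under stated assumptions: the function name, the per-line input layout, the reading of "median" as the median across replicon lines, and the HM threshold of 10^-3 are assumptions, not the authors' pipeline.

```python
from statistics import median

LM_MIN_COVERAGE = 10_000     # "no mutations despite >10,000-fold coverage in each of the lines"
HM_MIN_MEDIAN_FREQ = 1e-3    # assumed reading of the HM threshold in the footnote


def classify_site(mutations_per_line, coverage_per_line):
    """Label one genome site as 'LM', 'HM', or None (hypothetical helper).

    mutations_per_line: mutation counts at this site, one value per line.
    coverage_per_line:  sequencing coverage at this site, one value per line.
    """
    if any(c <= 0 for c in coverage_per_line):
        return None  # a frequency cannot be computed without coverage

    # LM: no mutation in any line although every line is deeply covered.
    if all(m == 0 for m in mutations_per_line) and all(
        c > LM_MIN_COVERAGE for c in coverage_per_line
    ):
        return "LM"

    # HM: median per-line mutation frequency above the threshold.
    freqs = [m / c for m, c in zip(mutations_per_line, coverage_per_line)]
    if median(freqs) > HM_MIN_MEDIAN_FREQ:
        return "HM"
    return None
```

A site with counts [0, 0, 0] and coverages [12000, 15000, 11000] would be labelled LM under these assumptions. Grouping labelled sites into clusters is not covered by this sketch.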

Supplementary Table S2. Primers for first and second round PCRs used to amplify HCV fragments.

          PCR 1                              PCR 2
Fragment  Primer  Sequence                   Primer  Sequence
1         239F    CCGCAAGACTGCTAGCCGAGTA     342F    ATGAGCACGAATCCTAAACCTCAA
          1063R   GCCACCCAACATTTCGAGG        1063R   GCCACCCAACATTTCGAGG
2         491F    GAAGACTTCCGAGCGGTCGC       915F    TACCAAGTACGCAACTCTTCGGG
          3277R   CCATGTAATGAGCTTGGTCTCCA    2579R   TGCCTCTGCCTGGGATATGAG
3         F       GACAGGGACAGGTCCGAGCT       2408F   CATCCACCTCCACCAGAACAT
          4345R   TGGTCAAGGACAGTGCCGAT       4214R   GATGGGGCTGCCAGTAGTAATTGT
4         F       CTAGAGACAACCATGAGGTCCCC    4047F   AAGAGCACCAAGGTCCCGG
          5754R   GGCTAGTGGTTAGTGGGCTGGT     5658R   GTATCCCACTGATGAAGTTCCACAT
5         F       GAGTTCGATGAGATGGAAGAGTG    5517F   GAGCAGTTTAAACAGAAGGCCCT
          7343R   GACTACCGTCCGCTTCTTCCG      7231R   AACGGGGGGTTGTAGTCCG
6         F       GCTCCATCTCTCAAGGCAACTTG    7053F   GGCGGTAACATCACCAGGGTT
          8936R   GCAATCAAGGGCCTGTTCAAG      8741R   CCTCTTTCCAGCGCCGTC
7         F       GGCGGTAACATCACCAGGGTT      8530F   ACTGCACCATGCTCGTGTG
          9377R   TCATCGGTTGGGGAGGAGG        9377R   TCATCGGTTGGGGAGGAGG

Supplementary Fig. S1. Duplex Sequencing coverage for each of the three replicon lines. The cloned fragments 1-7 are indicated. Sites with coverage < 5000 (dashed lines) or more than 5% of readings with indeterminate bases were not included in the analysis.
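The two exclusion criteria in this caption amount to a simple per-site filter. The sketch below is a minimal illustration, assuming that coverage counts all readings at a site (indeterminate ones included); the function and argument names are hypothetical.

```python
MIN_COVERAGE = 5000         # sites with coverage below this are excluded
MAX_INDETERMINATE = 0.05    # sites with more than 5% indeterminate readings are excluded


def keep_site(coverage, n_indeterminate):
    """Return True if a site passes both filters stated in the caption.

    coverage:        total number of readings at the site (assumed to
                     include readings with an indeterminate base).
    n_indeterminate: number of those readings with an indeterminate base.
    """
    if coverage < MIN_COVERAGE:
        return False
    return n_indeterminate / coverage <= MAX_INDETERMINATE
```

For example, a site with 5000 readings of which 250 are indeterminate sits exactly on both thresholds and is retained, since the caption excludes only coverage below 5000 and fractions above 5%.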

Supplementary Fig. S2. Mutation sampling depth and among-line reproducibility of mutation frequencies obtained by Duplex Sequencing.

a. Rarefaction curves showing mutation sampling as a function of sampling effort. For each fragment (1-7, color legend) and line (panels), the number of unique mutations detected (variant richness) is shown for different levels of sequencing coverage (sample size). The sample size was varied by taking a random subsample of sequencing reads at each nucleotide site. The estimated total number of mutations present in each fragment was inferred by the Chao1 method (ref. 1) and is indicated with asterisks.

b. Between-line per-site mutation frequency correlations. For each pairwise comparison between lines, the observed mutation frequency in each of the individual genome sites analyzed is plotted in log scale. Pearson correlations are indicated on top. Comparison of among-site versus among-line log mutation frequency variance using 1000 randomly sampled triplets of sites showed that among-site variance was 2.72-fold higher (0.1137) than among-line variance (0.0417).
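The richness estimate and the subsampling idea behind panel a can be sketched as follows. The caption does not say which form of the Chao1 estimator was used, so the bias-corrected form is an assumption here. The subsampling helper is also a simplification: it draws from a flat list of observed mutations, not from the reads at each nucleotide site. All names are hypothetical.

```python
import random


def chao1(mutation_counts):
    """Bias-corrected Chao1 estimate of total richness.

    mutation_counts: one entry per distinct mutation, giving how many
    times that mutation was observed.
    """
    counts = [c for c in mutation_counts if c > 0]
    s_obs = len(counts)                      # distinct mutations observed
    f1 = sum(1 for c in counts if c == 1)    # seen exactly once (singletons)
    f2 = sum(1 for c in counts if c == 2)    # seen exactly twice (doubletons)
    # Bias-corrected form, which stays defined when f2 == 0.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def richness_at_depth(observations, depth, seed=0):
    """Number of distinct mutations in a random subsample of size `depth`.

    observations: list with one label per observed mutation event.
    """
    rng = random.Random(seed)
    subsample = rng.sample(observations, depth)  # sampling without replacement
    return len(set(subsample))
```

Calling `richness_at_depth` over a range of depths gives the points of one rarefaction curve, and `chao1` gives the asymptote that such a curve would be compared against.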

Supplementary Fig. S3. Mutation frequencies obtained by Duplex Sequencing for each gene and for paired/unpaired RNA.

a. Box plot of mutation frequencies for each HCV gene. Box lower and upper limits indicate percentiles 25 and 75, and the middle line shows the median. Whiskers show the 10th and 90th percentiles, and outlying points are individually plotted.

b. Base composition has a stronger effect on mutation frequency than RNA structure. For each base type, the box plot indicates mutation frequencies for sites predicted to form RNA base pairs (p.) versus those not forming pairs (u.). A functionally important structure located in the NS5B gene (stem-loop encompassing H77 positions 8967 to 9299) is also plotted.

Supplementary Fig. S4. Coomassie-stained SDS-PAGE gel (a, b) and in vitro extension of an HCV RNA fragment using the purified NS5B RNA polymerase (c).

a. Ni chromatography (1: cell lysis product, 2: flow through, 3: molecular weight marker, 4: wash in 30 mM imidazole, 5 to 9: elution in Ni with 500 mM imidazole).

b. Heparin chromatography (1: first elution, 2: second elution, 3: third elution, 4: fourth elution, 5: molecular weight marker).

c. Autoradiography of the RNA template produced by T7 transcription and of the in vitro polymerization of a 521-nt fragment corresponding to the HCV E1-E2 region. Lanes 1 and 3 show two independent run-off transcriptions of the template RNA. Lanes 2 and 4 show the product synthesized by NS5B using two independent run-off transcriptions as templates. These data are representative of six different independent experiments. All images represent full raw scan data.

Supplementary references

1. Chao, A. Non-parametric estimation of the number of classes in a population. Scand. J. Stats. 11, (1984).