Supplementary material and methods

rAAV production and animal experiments

The pAAV-RSVp-GFP-pA vector plasmid contained the Rous Sarcoma Virus promoter (RSVp), followed by a short synthetic intron (pCI plasmid, Promega), the eGFP coding sequence and the SV40 polyadenylation (pA) signal. The expression cassette was cloned between AAV2 inverted terminal repeats (ITRs). Research-grade single-stranded AAV2/8 vectors were produced by transient transfection of HEK293 cells, purified by double cesium chloride gradient ultracentrifugation (Ayuso et al. 2010) and dialyzed against DPBS (INSERM UMR 1089 vector core). Four six-week-old male C57BL/6J mice (SPF grade) received, under general anesthesia, an injection in each tibialis anterior muscle of either DPBS or 7.2e11 vg of the ssAAV2/8-RSVp-GFP-pA vector diluted in DPBS (approximately 2.4e13 vg/kg). Animals were euthanized 15 days post-injection (pi), and the whole injected muscles were collected and frozen in less than 10 minutes. Because this small-scale study was designed as a technical validation, we arbitrarily chose a limited number of animals, randomly picked from a small C57BL/6J colony purchased from Charles River Laboratories (L'Arbresle, France). All experimental procedures were approved by the research ethics board of Loire Atlantique (#CEEA ).

DNA extraction

Genomic DNA was extracted from 50 to 100 mg of muscle using the Gentra Puregene Blood Kit (QIAGEN). Briefly, frozen tissues were mixed with Cell Lysis Solution and shredded in a QIAGEN TissueLyser II (30 s at 30 Hz). DNA was then recovered according to the manufacturer's instructions and stored at -20 °C.

Control preparation

In addition to the muscle samples from the 4 mice, we included 2 negative controls prepared by mixing DNA extracted from non-injected mouse muscles with artificial circular forms of rAAV that mimic the episomal forms of the vector genome found in vivo.
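The controls are meant to contain a defined excess of vector genomes over host genomes (about 5 vector genomes per mouse diploid genome, as stated in the procedure that follows). A minimal sketch of the copy-number arithmetic is shown below, assuming a haploid mouse genome of about 2.7e9 bp and an average of 650 g/mol per base pair; both constants, the function names and the example titer are illustrative assumptions rather than values taken from the protocol.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_G_PER_MOL = 650.0       # assumed mean mass of one double-stranded base pair (g/mol)
MOUSE_HAPLOID_BP = 2.7e9   # assumed approximate haploid mouse genome size (bp)


def diploid_genomes(dna_ng: float) -> float:
    """Number of mouse diploid genomes contained in `dna_ng` nanograms of DNA."""
    grams_per_genome = 2 * MOUSE_HAPLOID_BP * BP_G_PER_MOL / AVOGADRO  # about 5.8 pg
    return dna_ng * 1e-9 / grams_per_genome


def vector_genomes_needed(dna_ng: float, vg_per_genome: float = 5.0) -> float:
    """Vector genome copies required to reach the target vg per diploid genome."""
    return vg_per_genome * diploid_genomes(dna_ng)


def spike_in_ul(dna_ng: float, circles_vg_per_ul: float, vg_per_genome: float = 5.0) -> float:
    """Volume of the qPCR-titrated rAAV circle preparation to add to the mouse DNA."""
    return vector_genomes_needed(dna_ng, vg_per_genome) / circles_vg_per_ul


# Example: 5 µg of non-injected muscle DNA (the input used per library) and a
# hypothetical circle preparation titrated at 1e6 vg/µl by eGFP qPCR.
print(f"{diploid_genomes(5000):.3g} diploid genomes")
print(f"{vector_genomes_needed(5000):.3g} vector genomes to add")
print(f"{spike_in_ul(5000, 1e6):.2f} µl of the circle preparation")
```

Under these assumptions, 5 µg of mouse DNA holds roughly 8.6e5 diploid genomes, so about 4.3e6 vector genomes would be added; the actual titer of the circle preparation comes from the eGFP qPCR.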
To generate the rAAV circles, linear vector genomes were first released from the vector plasmid by restriction digestion (XmaI). The band corresponding to the linear rAAV genome was excised from an agarose gel and purified with NucleoTrap (Macherey-Nagel). An optimized ligation protocol was then performed by overnight incubation (T4 DNA ligase, NEB), and the remaining linear forms were digested by incubating the DNA for 2 h at 37 °C with an exonuclease (Plasmid-Safe DNase, Epicentre). The mixture was purified with NucleoTrap, and the presence of circular forms (ranging from monomers to high-molecular-weight concatemers) was verified by agarose gel electrophoresis. Finally, the concentration of rAAV genomes was quantified by qPCR targeting the eGFP coding sequence. We mixed non-injected mouse muscle DNA and rAAV circular forms to obtain an approximate ratio of 5 vector genomes per mouse diploid genome.

NGS library preparation protocol

All enzymes used in the protocol were purchased from New England Biolabs and DNA oligonucleotides from Sigma-Aldrich. In 0.5 ml tubes (Diagenode), 5 µg of injected or control muscle DNA were diluted in 100 µl of TE (10:1) and sheared with a Bioruptor standard UCD200 (Diagenode; program: 30 s ON / 90 s OFF, 4 cycles, low power). High-molecular-weight fragments were then excluded by adding 0.5x (50 µl) SPRIselect (Beckman Coulter); the supernatants were transferred to new tubes and purified by adding 70 µl SPRIselect. End repair was performed by mixing the sheared DNA with 10 µl T4 DNA ligase buffer 10x, 4 µl dNTP 10 mM, 5 µl T4 DNA polymerase 3 U/µl, 1 µl Klenow DNA polymerase 5 U/µl, 5 µl T4 PNK and 30 µl water in a final volume of 100 µl. After a 30 min incubation at room temperature (RT), samples were purified and low-molecular-weight fragments were removed using 0.7x SPRIselect. A-tailing was performed by adding 5 µl NEBuffer 2 10x, 10 µl dATP 1 mM and 3 µl Klenow exo- 5 U/µl to the blunted DNA. The mix was incubated for 30 min at 37 °C in a heated block and purified using 1.6x SPRIselect. Illumina adapters were synthesized by annealing the P5 oligo with a P7 indexed oligo as previously described (Kozarewa, I. & Turner, D. J. Methods Mol. Biol. 2011).
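The bead and reagent volumes above translate into ratios and final concentrations with simple arithmetic. For the double-sided size selection, the second bead addition is made to a supernatant that still contains the first 50 µl of bead buffer, so the effective ratio is usually computed cumulatively: (50 + 70) / 100 = 1.2x. The sketch below does that and computes final concentrations for the end-repair and A-tailing mixes; the helper names are ours, and the A-tailing final volume of 50 µl is an assumption inferred from the 1x dilution of the 10x NEBuffer 2, not a value stated in the text.

```python
def cumulative_spri_ratio(sample_ul: float, *bead_additions_ul: float) -> float:
    """Effective bead:sample ratio after successive additions to the same sample.

    Assumed convention for double-sided size selection: the second addition is
    made to the supernatant that still contains the first bead volume.
    """
    return sum(bead_additions_ul) / sample_ul


def final_conc(stock: float, volume_ul: float, total_ul: float) -> float:
    """C_final = C_stock * V_added / V_total (any concentration unit)."""
    return stock * volume_ul / total_ul


# Size selection on 100 µl of sheared DNA: 50 µl, then 70 µl SPRIselect.
right_side = cumulative_spri_ratio(100, 50)      # 0.5x: large fragments bind and are discarded
left_side = cumulative_spri_ratio(100, 50, 70)   # 1.2x: fragments above the lower cut-off are kept

# End repair, 100 µl final volume (values from the protocol text).
end_repair_reagents_ul = 10 + 4 + 5 + 1 + 5 + 30   # buffer, dNTP, T4 pol, Klenow, PNK, water
end_repair_dna_ul = 100 - end_repair_reagents_ul   # volume left for the sheared DNA
end_repair_dntp_mM = final_conc(10, 4, 100)        # 10 mM dNTP stock
t4_pol_units = 5 * 3                               # 5 µl at 3 U/µl

# A-tailing: 50 µl total assumed (5 µl of 10x NEBuffer 2 gives 1x).
a_tail_total_ul = 50
a_tail_dna_ul = a_tail_total_ul - (5 + 10 + 3)     # buffer, dATP, Klenow exo-
a_tail_datp_mM = final_conc(1, 10, a_tail_total_ul)
klenow_exo_units = 3 * 5                           # 3 µl at 5 U/µl

print(f"SPRI: {right_side:.1f}x then {left_side:.1f}x")
print(f"End repair: {end_repair_dna_ul} µl DNA, {end_repair_dntp_mM:.1f} mM dNTP, {t4_pol_units} U T4 pol")
print(f"A-tailing: {a_tail_dna_ul} µl DNA, {a_tail_datp_mM:.1f} mM dATP, {klenow_exo_units} U Klenow exo-")
```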
Adapter ligation was performed by mixing the A-tailed DNA with 2 µl adapters 40 µM, 25 µl Quick Ligase buffer 2x and 5 µl T4 DNA ligase 2000 U/µl. The mix was incubated for 15 min at RT and purified twice with 1x SPRIselect to remove adapter dimers. Samples were quantified on a Bioanalyzer D1K chip (Agilent Technologies). Volumes corresponding to 500 ng of DNA were transferred to new 1.5 ml tubes and fully dehydrated in a SpeedVac at low temperature (approximately 2 hours). The DNA was resolubilized in 3.4 µl of nuclease-free water and captured using custom biotinylated 80-mer RNA baits following the manufacturer's protocol (MYbaits, MYcroarray). The resulting single-stranded (ss) DNA samples were purified with 1.6x SPRIselect. Captured DNA (10 µl/reaction) was added to a mix containing 1 µl PfuUltra II Fusion HS DNA Polymerase (Agilent Technologies), 5 µl PfuUltra II buffer 10x, 0.5 µl dNTP 25 mM, 1 µl primer P5 10 µM (5'-AATGATACGGCGACCACCGAG-3'), 1 µl primer P7 10 µM (5'-CAAGCAGAAGACGGCATACGAG-3') and 2.5 µl DMSO in a final volume of 50 µl. PCR was performed on a Veriti thermocycler (Applied Biosystems) with the following program: 2 min at 98 °C; 18 cycles of 98 °C for 20 s, 60 °C for 20 s and 72 °C for 30 s; and a final extension of 3 min at 72 °C. PCR products were purified using 1x SPRIselect, and the DNA size distribution was checked on a High Sensitivity DNA chip (Bioanalyzer, Agilent Technologies). The indexed samples were quantified by qPCR (KAPA Biosystems), pooled and sequenced on a MiSeq platform (Illumina). Sequencing yielded 13,640,928 paired-end reads (2x150 bp).

Bioinformatics analyses

Raw sequencing data were processed independently by 2 bioinformaticians using different pipelines that extract chimeric read pairs (i.e. one read of the pair mapped on the AAV genome and its mate mapped on host cell DNA) and chimeric reads (i.e. a single read overlapping both references). For all analyses, the mouse reference genome assembly (Genome Reference Consortium GRCm38, UCSC version mm10) and the rAAV RSVp-eGFP-SV40-pA genome (see reference sequence below) were used as templates for mapping NGS data. Source code for Chimera_Finder and FindMyVirus, as well as additional information (dependencies, requirements, etc.), is freely available at the following links: … and …. Raw FASTQ data and the BAM files generated by FindMyVirus are available from the Dryad Digital Repository: …. Both pipelines gave comparable results; in particular, a similar frequency of false positives was found in the negative controls. Coverage data were visualized with the Integrative Genomics Viewer (IGV; …).
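The two categories of chimeras can be illustrated with a small classifier. This is a minimal sketch, not the logic of Chimera_Finder or FindMyVirus: it assumes reads were aligned to a combined mm10 + rAAV reference in which the vector contig is named "rAAV" (a hypothetical name), and that the references hit by each read (primary alignment plus any split parts, for example those listed in the SAM SA tag) have already been collected into a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VECTOR_CONTIG = "rAAV"  # hypothetical name of the vector contig in the combined reference


@dataclass
class ReadAlignment:
    """References hit by one read: primary alignment plus any split (supplementary) parts."""
    segments: List[str] = field(default_factory=list)  # empty list means unmapped

    def hits_vector(self) -> bool:
        return VECTOR_CONTIG in self.segments

    def hits_host(self) -> bool:
        return any(ref != VECTOR_CONTIG for ref in self.segments)


def classify_pair(read1: ReadAlignment, read2: ReadAlignment) -> Optional[str]:
    """Return 'chimeric_read', 'chimeric_pair' or None for one read pair."""
    # A single read split across vector and host sequence spans the junction itself;
    # this category is checked first, so such pairs are not counted twice.
    if any(r.hits_vector() and r.hits_host() for r in (read1, read2)):
        return "chimeric_read"
    # Otherwise, look for one mate entirely on the vector and the other on host DNA.
    vector_only = [r.hits_vector() and not r.hits_host() for r in (read1, read2)]
    host_only = [r.hits_host() and not r.hits_vector() for r in (read1, read2)]
    if (vector_only[0] and host_only[1]) or (vector_only[1] and host_only[0]):
        return "chimeric_pair"
    return None


examples = {
    "pairA": (ReadAlignment(["rAAV"]), ReadAlignment(["chr7"])),
    "pairB": (ReadAlignment(["rAAV", "chr2"]), ReadAlignment(["chr2"])),
    "pairC": (ReadAlignment(["rAAV"]), ReadAlignment(["rAAV"])),
    "pairD": (ReadAlignment(["chr1"]), ReadAlignment([])),
}
for name, (r1, r2) in examples.items():
    print(name, classify_pair(r1, r2))
```

Here pairA is a chimeric pair, pairB a chimeric read, and pairC and pairD are not chimeric. Mitochondrial hits (chrM) are treated as host DNA, in line with the genomic distribution reported in Supplementary Table 1.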

Oligonucleotide sequences for library indexing

P5 oligo: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3'
P7 indexed oligo 1: 5'-P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[ACAGTG]ATCTCGTATGCCGTCTTCTGCTTG-3'
P7 indexed oligo 2: 5'-P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[GCCAAT]ATCTCGTATGCCGTCTTCTGCTTG-3'
P7 indexed oligo 3: 5'-P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[CTTGTA]ATCTCGTATGCCGTCTTCTGCTTG-3'
P7 indexed oligo 4: 5'-P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[GTGAAA]ATCTCGTATGCCGTCTTCTGCTTG-3'
P7 indexed oligo 5: 5'-P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[CGATGT]ATCTCGTATGCCGTCTTCTGCTTG-3'
P7 indexed oligo 6: 5'-P-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[CAGATC]ATCTCGTATGCCGTCTTCTGCTTG-3'
*: phosphorothioate linkage; P: 5' phosphate; [ ]: index sequence.

rAAV2/8 RSVp-eGFP-SV40-pA reference sequence

ctgcgcgctcgctcgctcactgaggccgcccgggcaaagcccgggcgtcgggcgacctttggtcgcccggcctcagtgagcgagcgagcgcgca
gagagggagtggccaactccatcactaggggttccttgtagttaatgattaacccgcatgctacttatctacgtagccatgctctaggaagatc
tcgacgcgtcatgtttgacagcttatcatcgcagatccgtatggtgcactctcagtacaatctgctctgatgccgcatagttaagccagtatct
gctccctgcttgtgtgttggaggtcgctgagtagtgcgcgagcaaaatttaagctacaacaaggcaaggcttgaccgacaattgcatgaagaat
ctgcttagggttaggcgttttgcgctgcttcgcgatgtacgggccagatattcgcgtatctgaggggactagggtgtgtttaggcgaaaagcgg
ggcttcggttgtacgcggttaggagtcccctcaggatatagtagtttcgcttttgcatagggagggggaaatgtagtcttatgcaatactcttg
tagtcttgcaacatggtaacgatgagttagcaacatgccttacaaggagagaaaaagcaccgtgcatgccgattggtggaagtaaggtggtacg
atcgtgccttattaggaaggcaacagacgggtctgacatggattggacgaaccactaaattccgcattgcagagatattgtatttaagtgccta
gctcgatacaataaacgccatttgaccattcaccacattggtgtgcacctccaagctgggtaccagctgctagcaagcttgagatctgcttcag
ctggaggcactgggcaggtaagtatcaaggttacaagacaggtttaaggagaccaatagaaactgggcttgtcgagacagagaagactcttgcg
tttctgataggcacctattggtcttactgacatccactttgcctttctctccacaggtgcagctgctgcagcgggaattcatggtgagcaaggg
cgaggagctgttcaccggggtggtgcccatcctggtcgagctggacggcgacgtaaacggccacaagttcagcgtgtccggcgagggcgagggc
gatgccacctacggcaagctgaccctgaagttcatctgcaccaccggcaagctgcccgtgccctggcccaccctcgtgaccaccctgacctacg
gcgtgcagtgcttcagccgctaccccgaccacatgaagcagcacgacttcttcaagtccgccatgcccgaaggctacgtccaggagcgcaccat
cttcttcaaggacgacggcaactacaagacccgcgccgaggtgaagttcgagggcgacaccctggtgaaccgcatcgagctgaagggcatcgac
ttcaaggaggacggcaacatcctggggcacaagctggagtacaactacaacagccacaacgtctatatcatggccgacaagcagaagaacggca
tcaaggtgaacttcaagatccgccacaacatcgaggacggcagcgtgcagctcgccgaccactaccagcagaacacccccatcggcgacggccc
cgtgctgctgcccgacaaccactacctgagcacccagtccgccctgagcaaagaccccaacgagaagcgcgatcacatggtcctgctggagttc
gtgaccgccgccgggatcactctcggcatggacgagctgtacaagtaagatgcggccgcagttacgctagggataacagggtaatataggcggc
cgcttcgagcagacatgataagatacattgatgagtttggacaaaccacaactagaatgcagtgaaaaaaatgctttatttgtgaaatttgtga
tgctattgctttatttgtaaccattataagctgcaataaacaagttaacaacaacaattgcattcattttatgtttcaggttcagggggagatg
tgggaggttttttaaagcaagtaaaacctctcaaatgtggtaaaatcgattaggatcttcctagagcatggctacgtagataagtagcatggcg
ggttaatcattaactacaaggaacccctagtgatggagttggccactccctctctgcgcgctcgctcgctcactgaggccgggcgaccaaaggt
cgcccgacgcccgggctttgcccgggcggcctcagtgagcgagcgagcgcgcag

FASTA, GenBank and BED files are available from the Dryad Digital Repository: …
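As a consistency check on the oligonucleotides listed above, the PCR primers given in the library preparation protocol can be compared with the adapter oligos: primer P5 should match the 5' end of the P5 oligo, and primer P7 should be the reverse complement of the 3' end of each P7 indexed oligo. The sketch below also extracts the 6-nt index of each P7 oligo and checks that the first 12 nt of each P7 oligo can anneal to the 3' end of the P5 oligo, leaving its 3' T overhang. The index position (between ...CAGTCAC and ATCTCGTATG...) and the 12-nt annealing window are our reading of this TruSeq-style layout, not statements from the original protocol.

```python
import re

# Sequences copied from the list above; the '*' phosphorothioate mark and 5' phosphate are omitted.
P5_OLIGO = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_OLIGOS = {
    1: "GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG",
    2: "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG",
    3: "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTAATCTCGTATGCCGTCTTCTGCTTG",
    4: "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTGAAAATCTCGTATGCCGTCTTCTGCTTG",
    5: "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG",
    6: "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG",
}
PRIMER_P5 = "AATGATACGGCGACCACCGAG"
PRIMER_P7 = "CAAGCAGAAGACGGCATACGAG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Assumed layout: constant part ... CAGTCAC [6-nt index] ATCTCGTATG ... constant part.
INDEX_RE = re.compile(r"CAGTCAC([ACGT]{6})ATCTCGTATG")

assert P5_OLIGO.startswith(PRIMER_P5)
indexes = {}
for n, oligo in P7_OLIGOS.items():
    # Primer P7 anneals to the 3' end of the P7 oligo.
    assert oligo.endswith(revcomp(PRIMER_P7)), n
    # The 5' end of the P7 oligo pairs with the 3' end of the P5 oligo (T overhang left free).
    assert P5_OLIGO.endswith(revcomp(oligo[:12]) + "T"), n
    match = INDEX_RE.search(oligo)
    indexes[n] = match.group(1) if match else None

print(indexes)
```

All assertions pass for the six P7 oligos, and the extracted indexes are ACAGTG, GCCAAT, CTTGTA, GTGAAA, CGATGT and CAGATC.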

Supplementary data

[Table layout: rows, mouse chromosomes (Chr 1 to 19, Chr X, Chr Y) and mitochondria; columns, Controls (%) and Mouse samples (%)]
Supplementary Table 1: Genomic distribution of rAAV-host DNA junctions. Percentages of junctions mapped on each mouse chromosome (Chr) and on the mitochondrial genome are shown for the two controls and the four mouse samples.
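The percentages in Supplementary Table 1 are the share of junctions assigned to each host sequence within a sample or group. A minimal sketch of that tally follows; the input list of junction chromosomes is hypothetical and would in practice come from the junction calls of either pipeline.

```python
from collections import Counter
from typing import Dict, Iterable


def junction_percentages(chromosomes: Iterable[str]) -> Dict[str, float]:
    """Percentage of rAAV-host junctions assigned to each host sequence (chromosome or chrM)."""
    counts = Counter(chromosomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {chrom: 100.0 * n / total for chrom, n in sorted(counts.items())}


# Hypothetical junction calls (host sequence of each junction) for one sample.
calls = ["chr1", "chr1", "chr7", "chrM"]
print(junction_percentages(calls))
```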

Supplementary Figure 1: Distribution of junction breakpoints along rAAV genomes. The upper panel shows an annotated map of the rAAV2/8 RSVp-eGFP-SV40-pA genome. The coverage of reads mapped along the rAAV genome whose mate maps to the mouse genome (GRCm38/mm10) is shown below for the negative control replicates and the experimental samples. Maximum coverages of 119, 61, 201, 213, 188 and 172 overlapping reads were obtained for Control 1, Control 2, Mouse 1, Mouse 2, Mouse 3 and Mouse 4, respectively.
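The maximum values quoted in this legend correspond to the peak of a per-base coverage track. A minimal sketch of how such a track can be computed from read intervals with a difference array is shown below; the intervals and genome length are hypothetical, and in practice the intervals would be the vector-side alignments of the chimeric pairs (for example, taken from the BAM files).

```python
from typing import Iterable, List, Tuple


def coverage_profile(reads: Iterable[Tuple[int, int]], genome_length: int) -> List[int]:
    """Per-base depth from 0-based, half-open read intervals [start, end)."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        # Clamp intervals to the reference so indices stay within the array.
        start = max(0, min(start, genome_length))
        end = max(start, min(end, genome_length))
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


# Hypothetical example: three overlapping reads on a 20-bp reference.
depth = coverage_profile([(0, 10), (5, 15), (8, 12)], 20)
peak = max(depth)
print(peak, depth.index(peak))  # maximum depth and first position reaching it
```

The difference array makes the computation linear in the number of reads plus the genome length, which is ample for a vector genome of a few kilobases.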

[Figure panels: Polyclonal rAAV integration (left); Clonal samples / Pre-amplified IS (right)]
Supplementary Figure 2: Schematic representation of the paired-end sequencing protocol applied by Kaeppel et al. and in our study (left), compared with clonal or pre-amplified rAAV integration sites (IS) (right). When each rAAV IS is represented only once, bioinformatic analyses cannot distinguish it from artefacts (left). When IS are represented several times, repeated retrieval of the same event allows the IS to be correctly identified among the artefacts (right). ITR: inverted terminal repeat; #: chimeric fragment generated during the adapter ligation step; *: rAAV integration site.
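The argument of Supplementary Figure 2 rests on counting how many independent fragments support each junction. The sketch below illustrates that counting idea and is not the procedure of either pipeline: exact breakpoint matching, the use of the sheared host-side end as a fragment identifier (reads sharing it are treated as PCR duplicates) and the threshold of 2 fragments are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Junction = Tuple[str, int]        # (host chromosome, breakpoint position)
Fragment = Tuple[str, int, int]   # (host chromosome, breakpoint, sheared host-side end)


def supported_junctions(fragments: Iterable[Fragment], min_fragments: int = 2) -> Dict[Junction, int]:
    """Keep junctions seen in at least `min_fragments` independent fragments."""
    support: Dict[Junction, Set[int]] = defaultdict(set)
    for chrom, breakpoint, shear_end in fragments:
        # Reads with the same shear end count once (assumed PCR duplicates).
        support[(chrom, breakpoint)].add(shear_end)
    return {j: len(ends) for j, ends in support.items() if len(ends) >= min_fragments}


# Hypothetical data: one junction seen in two independent fragments (plus a duplicate),
# and one singleton that cannot be told apart from a ligation artefact.
frags = [("chr5", 1000, 1200), ("chr5", 1000, 1350), ("chr5", 1000, 1200), ("chr9", 500, 700)]
print(supported_junctions(frags))
```

In a polyclonal sample most true integration events behave like the singleton here, which is why they remain indistinguishable from artefacts generated during adapter ligation.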

Supplementary Figure 3: Comparison of the sequencing protocols used by Kaeppel et al. (left) and by our laboratory (right), from DNA extraction to library preparation. The standard library preparation procedure is indicated in the center, and protocol-specific steps are shown in bold under the corresponding protocol. IM: intramuscular; vg/kg: vector genomes per kilogram; (#): in the protocol used by Kaeppel et al., this step was replaced by direct blunt ligation of the adapters; (*): steps prone to generating artificial chimeric reads.