Ad5 genome. Before filtering. After filtering. Suppl. File 4 PAGE mated reads. Ad5 DNA insertion site 5' 3' 293 unmated reads

Size: px
Start display at page:

Download "Ad5 genome. Before filtering. After filtering. Suppl. File 4 PAGE mated reads. Ad5 DNA insertion site 5' 3' 293 unmated reads"

Transcription

1 Before filtering Ad5 DNA insertion site 5' 3' After filtering B Ad5 genome : [0-144] : [0-5771] 293 chr19 position chr19:48,389,121 Ad5 DNA Ad5 DNA chr19:48,389,144 Ad5 genome position (A) Plasmid insertion site determination plot before and after filtering for mapping uniquely to the human genome sequence. The filtering significantly improves the specificity to find the insert site, as illustrated here for the adenoviral insertion in the 293 genome. (B) Adenoviral sequence segment present in the 293 genome. Both and map to a ~4 kb region of the Ad5 viral genome that includes E1A and E1B coding sequences. Inset: the breakpoint-spanning map to a specific region on chromosome 19. The insertion site was confirmed on both sides by PCR and Sanger sequencing of the amplicons. All mapping coverage plots and scatter plots (including those on the next few pages) represent not filtered for unique mapping to the human genome, for maximal sensitivity. is calculated per bp (RTG). PAGE 1

2 chr3 position (bp) chr3 position (bp) SV40T plasmid prsv1609 SV40T plasmid prsv1609 "Large T CDS" : [0-66] 293T : [0-103] 293T prsv1609 plasmid insertion site discovery in the 293T genome. Both and that align to the putative prsv1609 sequence are illustrated here. We found 2 distinct sets of breakpoint, suggesting 2 different insertions on human chromosome 3: one located in the plasmid origin of replication, the other at the end of the Large T CDS and 3' region of its expression cassette. PAGE 2

3 A pfrt/laczeo B [0-193] Chr12 position (bp) [0-196] pfrt/laczeo position (bp) C D pcdna6/tr Chr20 position (bp) [0-146] [0-196] pfrt/laczeo and pcdna6/tr plasmid discovery in the 293FRT genome. (A and B) The breakpoint for the pfrt/laczeo plasmid align in the neighbourhood of the SV40 early promoter of the plasmid, and chromosome 12 in the human genome. Note the lack of coverage around the puc ori of the plasmid; this suggests that part of this region is not present in the 293FRT genome. (C, D and E) The breakpoint for the pcdna6/tr plasmid align in the neighbourhood of the CMV promoter of the plasmid, and chromosome 20 in the human genome. The few hits on chromosome 11 are false positives as the map to very short stretches of sequence which are almost identical on both the plasmid and the genome. PAGE 3 False positive Chr11 position (bp) E

4 A pxp2d2-rpap -luci B [0-199] chr9 position (bp) [0-186] pxp2d2-rpap-luci position (bp) C D Chr9 position (bp) pm5neo-mecor [0-193] False positive F False positive (Ecotropic receptor) E Chr13 position (bp) Chr8 position (bp) [0-198] pm5neo-mecor position (bp) pm5neo-mecor position (bp) Text pm5neo-mecor position (bp) pxp2d2-rpap-luci and pm5neo-mecor plasmid discovery in the 293FRT genome. (A and B) The breakpoint for the pxp2d2-rpap-luci plasmid align in the neighbourhood of the rpap promoter of the plasmid, and chromosome 9 in the human genome. (C and D) Similarly, the breakpoint for the pm5neo-mecor plasmid align to part of the backbone of the plasmid, and the same position as pxp2d2-rpap-luci in the human genome. The absence of read coverage between position ~ on the plasmid suggests that this part was deleted. The few hits on chromosome 8 are likely false positives (see panel E), while the hits on chromosome 13 are due to the presence of a homolog of the mouse ecotropic receptor in the human genome (panel F). The mecor coding sequence is present on this vector. The map to the exons of the human homolog in the genome (note the >20 kbp region to which these map in a discontinuous manner). See also Figure 6 for the results of the PCR and Sanger sequencing validation of these insertion sites. PAGE 4

5 pcdna6/tr [0-349] 293SG [0-619] 293SG B C D chr5 position (bp) chr6 position (bp) chr6 position (bp) E F chr7 position (bp) chr7 position (bp) chr17 position (bp) chr11 position (bp) G pcdna6/tr plasmid insertion site discovery in the 293SG genome. (A, B, C, D, E, F, G) The entire pcdna6/tr plasmid sequence is well-covered by both and. We found several sets of breakpoint, matching to positions on chromosome 5, 6, 7, 11 and 17, suggesting that this plasmid is present in multiple sites across the genome. One or more copies of the plasmid has integrated in a inverted repeat, reflected in scatter plots that are each other's mirror image. This region of chromosome 7 is known to be misassembled in the current builds of the human genome. Such complex integrations are difficult to resolve using short-read sequencing technologies. PAGE 5

6 pcdna6/tr [0-299] [0-240] B C D E chr7 position (bp) chr6 position (bp) chr11 position (bp) chr17 position (bp) pcdna6/tr plasmid insertion site discovery in the genome. (A) The entire pcdna6/tr plasmid sequence is well-covered by both and. (B, C, D, E) Breakpoint match chromosome 6, 7, 11 and 17. PAGE 6

7 pcdna3.1-zeo- STendoT [0-249] [0-240] chr3 position (bp) B False positive C D (ST6GAL1) chr13 position (bp) pcdna3.1-zeo-stendot insertion site, chr13 5' 3' chr13:56,361,622 STendoT plasmid pcdna3.1-zeo-stendot position (bp) pcdna3.1-zeo-stendot position (bp) pcdna3.1-zeo-stendot plasmid insertion site discovery in the genome. (A and C) The breakpoint for the pcdna3.1-zeo-stendot plasmid suggest 2 different insertion sites on chromosome 13, one in which the plasmid breakpoint is located in the middle of the hst-endot-myc cassette of the plasmid which would disrupt the function of this transgene. As cells express functional EndoT protein activity (manuscript in preparation), the other integrated plasmid copy must be responsible for the expression of hst-endot in this cell line. We were able to validate one breakpoint of that copy by PCR and Sanger sequencing (panel D). Additionally, we found one set of breakpoint that map to chromosome 3; these are falsepositive associated with the sequence identity of the first section of the hst-endot CDS. PAGE 7