Targeted complete next generation sequencing and quality control of transgenes and integration sites in CHO cell line development


Targeted Locus Amplification Technology

Cergentis B.V.
Yalelaan CM Utrecht
The Netherlands

Contents
Introduction
TLA technology
Experimental set-up
Multiple vectors and large transgenes
TLA in the cell line development process
A. Development and quality control of new cell line generation techniques
B. Clone selection
C. Genetic characterisation of MCBs
D. Clonality assessment
E. Assessment of genetic stability
TLA-based sequencing data
Transgene sequence
Single nucleotide variants and InDels
Structural variants
Identification of unknown transgene sequences and co-integrations
Transgene copy numbers
Integration sites
The CHO genome
Whole-genome coverage plots
Breakpoint sequences
Structural changes in integration sites
Targeted sequencing of individual integration sites
Sequencing of targeted integrations, knock-outs, etc.
Genetic stability of MCBs, WCBs and EOP cells
References
Cergentis services and kits
Contact

CHO Cell Line Development page 2 of 10

Summary
- Cergentis TLA technology is a powerful tool to completely sequence transgenes and their integration sites.
- TLA can detect all single nucleotide variants (SNVs) and structural variants in the integrated transgene sequence.
- TLA analyses are used in the development and quality control of cell lines, for clone selection, and for the assessment of clonality and genetic stability.

Introduction
Chinese hamster ovary (CHO) cells are widely used for the production of recombinant proteins and therapeutic antibodies [1]. The efficient generation of high-quality CHO cell lines is a critical step in the development and production of (candidate) drugs. The quality and protein expression of a transgenic CHO cell line depend on the location of the transgene integration site(s) and on the integrity of the integrated transgene sequences. Multiple methods are used to create transgenic cell lines, and new techniques continue to be developed. All of these techniques can result in undesired (off-target) integrations, multiple integration sites, unexpected integrations of backbone sequences, and undesired sequence or structural variants in the integrated transgene sequence [2,3,4,5,6]. Southern blot and FISH analyses are often used to analyse the genetics of the generated cell lines. However, the data they generate are incomplete and can be hard to interpret. Cergentis' proprietary Targeted Locus Amplification (TLA) technology [7] is a powerful tool to sequence transgenes and their integration sites. TLA-based transgene and integration site sequencing therefore presents a cost-effective, high-quality alternative to conventional approaches for controlling the quality of transgenic cells.
TLA-based transgene analyses can:
- identify the transgene integration site(s);
- detect structural changes in the host DNA at the transgene integration site(s);
- sequence the entire transgene and detect any single nucleotide variants as well as structural changes within the transgene;
- provide an estimate of the transgene copy number.

Figure 1: Overview of TLA-based amplification and sequencing of a locus of interest. TLA amplifications require one primer pair complementary to a short locus-specific sequence. The generated NGS sequencing coverage (i.e. the number of NGS sequencing reads) is highest in the immediate vicinity of the locus-specific sequence and declines with increasing physical distance from it.

TLA technology
The TLA technology uniquely enables the targeted amplification and next generation sequencing (NGS) of any locus of interest using just one primer pair complementary to a short locus-specific sequence. TLA uses the physical proximity of sequences as the basis of selection, which results in the amplification and sequencing of hundreds of kilobases surrounding the locus-specific sequence [7] (Figure 1).

Experimental set-up
In transgene sequencing analyses, primer pairs complementary to short transgene-specific sequences are used. In this manner, TLA generates sequence information across the entire integrated transgene and across the genomic locus in which the transgene has integrated (Figure 2). (NB: "transgene" refers to the entire transgenic sequence that was introduced, i.e. including any plasmid/vector/backbone sequences.)

Figure 2: TLA-based transgene sequencing. Using one TLA primer pair complementary to a sequence unique to the transgene, complete sequence information is generated across the transgene and its integration site(s).

Cergentis recommends using two transgene-specific primer pairs, complementary to two different transgene-specific sequences, in two independent TLA amplifications and sequencing analyses. This enables the detection of partial integrations that contain the binding sequence of only one of the two primer pairs. In addition, it yields two independent data sets, which provide an excellent quality control of all identified genetic variants.

Multiple vectors and large transgenes
TLA analyses can also be applied if a sample contains multiple different integrated vectors. Depending on the experimental requirements and the nature of the integrated transgene sequences, TLA primers can be designed complementary to sequences that occur in all vectors, or specific to each individual integrated vector. In the analysis of larger transgenes, additional TLA primer pairs can be added to ensure that sufficient sequencing coverage is generated on the integration site(s) and the surrounding host genome.

TLA in the cell line development process
The TLA technology can be applied in various stages of the cell line development process (Figure 3). The TLA protocol can be executed manually and has also been automated [8].

Figure 3: Overview of different stages of cell line development and the purpose of TLA analyses.

A. Development and quality control of new cell line generation techniques
Several methods are used to generate transgenic CHO cells, and new techniques continue to be developed. The genetic characterisation of transgenes and genomic integration sites gives detailed information about the quality, efficiency and reliability of the method used for transgenesis. TLA analyses can determine the number of integration sites and their exact positions, and assess whether integrated transgenic sequences contain single nucleotide variants or structural changes.
Additionally, TLA can be used to identify integration sites in high-producer lines as candidate sites for targeted integration. The TLA data show whether an integration site is clean or subject to (large) genomic rearrangements. This matters because large structural changes cannot be reconstituted by targeted integration. Clean integration sites in high-producing cell lines are therefore the most favourable candidates for targeted transgene integrations.
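In breakpoint terms, a clean integration typically appears as exactly two transgene-genome breakpoints on the same scaffold, close together, with host sequence flanking the transgene on either side. The sketch below encodes that intuition as a hypothetical rule of thumb only; the `Breakpoint` fields, the "left"/"right" convention and the 10 kb default for `max_gap` are illustrative assumptions, not Cergentis' reporting format or criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    # Hypothetical record of one transgene-genome fusion.
    scaffold: str    # host scaffold of the genome-side fusion partner
    position: int    # host coordinate of the genomic base adjacent to the fusion
    host_side: str   # "left": host lies upstream of the transgene; "right": downstream


def is_clean_site(breakpoints: list[Breakpoint], max_gap: int = 10_000) -> bool:
    """Hypothetical rule of thumb: one left and one right breakpoint on the same
    scaffold, with at most roughly `max_gap` bp of host DNA deleted (positive
    difference) or duplicated (negative difference) between them."""
    if len(breakpoints) != 2:
        return False  # more or fewer junctions suggest a more complex site
    if {bp.host_side for bp in breakpoints} != {"left", "right"}:
        return False
    left = next(bp for bp in breakpoints if bp.host_side == "left")
    right = next(bp for bp in breakpoints if bp.host_side == "right")
    if left.scaffold != right.scaffold:
        return False  # e.g. a translocation-like rearrangement
    return abs(right.position - left.position) <= max_gap


# Hypothetical example sites.
site_a = [Breakpoint("scaffold1", 1_204_310, "left"), Breakpoint("scaffold1", 1_204_398, "right")]
site_b = [Breakpoint("scaffold3", 55_000, "left"), Breakpoint("scaffold12", 910_000, "right")]
print(is_clean_site(site_a), is_clean_site(site_b))  # True False
```

In practice a site flagged as not clean would be inspected against coverage profiles such as those in Figure 9 rather than rejected automatically.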

B. Clone selection
TLA analyses can increase the efficiency and quality of clone selection [9]. TLA can be used to:
- Deselect siblings: TLA analyses can determine which candidate cell lines share the same integration site(s) and are therefore of the same genetic origin.
- Select cell lines without transgene sequence variation in the gene of interest (GOI): TLA analyses enable the early deselection of cell lines with potentially problematic single nucleotide or structural variants in the plasmid or GOI sequence.
- Select cell lines with the desired (number of) transgene integration sites: the genetic stability of cell lines is affected by the number and complexity of integration sites, which can be a reason to select cell lines with a small number of clean integration sites. Furthermore, the use of genetically clean CHO cell lines can facilitate regulatory filings.

C. Genetic characterisation of MCBs
Since TLA generates high-quality data on the transgene sequence, TLA analyses can be used for the complete genetic characterisation of transgenes and integration sites in master cell banks (MCBs), in accordance with the ICH Q5B guideline [10].

D. Clonality assessment
TLA analyses detect transgene integration site(s) at base pair resolution. Identified breakpoint-spanning reads are unique to an integration site. PCRs based on these breakpoint sequences can therefore be used to test subclones and assess the clonality of an MCB (if an MCB is clonal, all subclones are expected to share the same integration site(s)), in accordance with the ICH Q5D guideline [11,12].

E. Assessment of genetic stability
TLA-based analyses of multiple generations (e.g. the MCB, working cell banks (WCBs) of different generations and end-of-production (EOP) cells) provide information about the stability of a clone over time; stable cell lines are expected to consistently show the same integration sites and (if present) transgene sequence variants.

TLA-based sequencing data

Transgene sequence
TLA analyses result in high sequencing coverage across the integrated transgene sequence (Figure 4), which enables the sensitive detection of sequence variants.

Figure 4: Example of an NGS coverage profile across a transgene. A large part of the transgene has been sequenced with > 1000x coverage (i.e. with at least 1000 NGS reads). The y-axis shows NGS coverage, cut off at 1000x; the x-axis shows the position within the transgene sequence.

Single nucleotide variants and InDels
Identified single nucleotide variants (SNVs) and small InDels (insertions or deletions) are reported in tables (Figure 5).

Structural variants
TLA enables the detection of structural variants of the transgene sequence by detecting fusions between different parts of the transgene sequence. Identified transgene-transgene fusions either represent the junction between two transgene copies that have integrated as a concatemer, or result from a structural rearrangement (inversion, deletion or duplication) within one copy of the transgene (Figure 6).

Identification of unknown transgene sequences and co-integrations
TLA analyses can be used to sequence and identify unexpected (partial) vector sequences and unknown transgene sequences, and to detect co-integrations of other DNA sequences [2]. Unexpected sequences can be analysed in detail by mapping the generated sequence information to appropriate reference (genome) sequences.
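The SNV/InDel tables report, per variant, the sequencing coverage and the percentage of reads carrying the mutation, and the two independent primer-set data sets recommended under Experimental set-up allow each call to be cross-checked. A minimal sketch of such a cross-check follows; the input layout (variant key mapped to mutant-read and total-read counts) and the 20x / 5% thresholds are hypothetical choices for illustration, not Cergentis' calling criteria.

```python
# Hypothetical data layout: (position, reference, mutation) -> (mutant reads, coverage)
Variant = tuple[int, str, str]
Counts = dict[Variant, tuple[int, int]]


def percent(mut_reads: int, coverage: int) -> float:
    """Percentage of reads at a position that carry the mutation."""
    return 100.0 * mut_reads / coverage if coverage else 0.0


def concordant_variants(set1: Counts, set2: Counts,
                        min_cov: int = 20, min_pct: float = 5.0) -> dict[Variant, tuple[float, float]]:
    """Keep variants that pass the (hypothetical) coverage and frequency thresholds
    in BOTH primer-set data sets; report the percentage seen in each set."""
    def passing(counts: Counts) -> dict[Variant, float]:
        return {v: percent(m, c) for v, (m, c) in counts.items()
                if c >= min_cov and percent(m, c) >= min_pct}

    p1, p2 = passing(set1), passing(set2)
    return {v: (p1[v], p2[v]) for v in sorted(p1.keys() & p2.keys())}


# Hypothetical counts for two primer sets.
primer_set_1 = {(1523, "C", "T"): (480, 1000), (2210, "G", "A"): (3, 900)}
primer_set_2 = {(1523, "C", "T"): (525, 1050), (2210, "G", "A"): (2, 850)}
print(concordant_variants(primer_set_1, primer_set_2))  # {(1523, 'C', 'T'): (48.0, 50.0)}
```

Requiring agreement between both sets trades a little sensitivity for robustness against primer-specific artefacts; a variant seen in only one set could equally point to a partial integration and merits a closer look rather than silent removal.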

Transgene copy numbers
TLA analyses enable the estimation of transgene copy numbers. Copy number estimations are based on three variables: the number of integration sites, the number of transgene-transgene fusions, and the ratio of the sequencing coverage on the transgene side and the genome side of the integration site. With transgene-specific primer sets, only alleles carrying an integrated transgene sequence are sequenced. The estimate therefore describes the copy number present in transgenic alleles. Because alleles containing transgenes may have been duplicated, and not all cells necessarily contain a transgenic allele, this is not always the same as the copy number per cell in the analysed sample.

Figure 5: Example of a table with identified SNVs and InDels in a transgene sequence. Columns:
Region: annotated region of the transgene sequence.
Pos: position of the mutation in the specified transgene sequence.
Ref: reference nucleotide in the transgene sequence.
Mut: identified mutation compared to the reference sequence.
Cov: sequencing coverage at the position of the mutation (for primer set 1 or 2).
%: percentage of reads in which the mutation was identified.

Figure 6: TLA-based sequencing of a transgene concatemer. A. Schematic depiction of an integration site consisting of a concatemer of three (partial) transgene sequences. Arrows indicate the orientation of the transgene; the numbers below indicate positions in the transgene sequence. B. Example of a table with identified transgene-transgene fusions. For each fusion, the exact base pair position of the fusion within the transgene sequence and the relative orientation of the fusion partners are specified.
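How the three variables are combined in Cergentis' analyses is not described here. Purely as an illustration, the sketch below computes two naive, hypothetical estimates under loudly stated assumptions: a structural count (one copy per integration site plus one per transgene-transgene fusion, assuming every fusion is a concatemer junction) and a coverage-based count (transgene-side over genome-side coverage per site, assuming coverage scales linearly with copy number).

```python
def structural_copy_estimate(n_sites: int, n_fusions: int) -> int:
    """Hypothetical: one copy per integration site plus one extra copy per
    transgene-transgene fusion. Assumes every fusion is a concatemer junction
    (a linear array of k copies has k - 1 junctions); fusions caused by
    rearrangements within a single copy make this an overestimate."""
    if n_sites < 0 or n_fusions < 0:
        raise ValueError("counts must be non-negative")
    return n_sites + n_fusions


def coverage_copy_estimate(transgene_cov: list[float], genome_cov: list[float]) -> float:
    """Hypothetical: per integration site, copies ~ transgene-side coverage divided
    by genome-side coverage at the breakpoint, summed over sites. Assumes the
    genome flank is present once per transgenic allele and coverage is linear."""
    if len(transgene_cov) != len(genome_cov):
        raise ValueError("need one transgene/genome coverage pair per site")
    return sum(t / g for t, g in zip(transgene_cov, genome_cov))


# Two integration sites and three transgene-transgene fusions (hypothetical numbers).
print(structural_copy_estimate(2, 3))                               # 5
print(coverage_copy_estimate([3000.0, 1000.0], [1000.0, 1000.0]))  # 4.0
```

When the two numbers disagree, as in this toy example, one possible reading is that some fusions are rearrangements within a copy rather than concatemer junctions. Both estimates refer to transgenic alleles, not to copies per cell.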

Integration sites
TLA-based transgene sequencing enables the detection of the exact genomic positions of integration site(s). Integration sites are detected based on:
- sequencing coverage peaks in the host genome;
- breakpoint sequences between the host genome and the transgene sequence.

The CHO genome
The generated data is mapped against a reference genome. In standard TLA-based CHO analyses, Cergentis uses the CHO-K1GS assembly CHOK1GS_HZDv1, released in …. Upon request, other publicly available reference genome sequences or genome assemblies provided by the customer can be used.

Whole-genome coverage plots
TLA results in high sequencing coverage across the genomic positions of transgene integration sites. Integration sites are therefore clearly visible in whole-genome coverage plots (Figure 7).

Breakpoint sequences
TLA analyses can determine transgene integration sites at base pair resolution and identify the breakpoint sequence between an integrated transgene and the flanking host genome. For each identified breakpoint, the position and the orientation of the fusion are reported (Figure 8).

Figure 7: Examples of whole-genome coverage plots of the CHO genome. The top 50 scaffolds by NGS coverage are shown. The identified TLA coverage peaks, indicative of integration events, are clearly visible. The left plot shows a single integration site in scaffold 1; the right plot shows two integrations, one in scaffold 8 and the other in scaffold 28. The identified integration sites are circled in blue.

Figure 8: Transgene-genome breakpoint sequences. A. Schematic depiction of a transgene integrated into the host genome, with the sequence positions indicated below the breakpoints. B. Breakpoint sequences as depicted in a TLA report. For each breakpoint, the exact sequence and the relative orientation of the fusion partners are provided.
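Spotting integration sites as coverage peaks can be approximated by flagging genomic bins whose coverage stands far above the genome-wide background and merging adjacent flagged bins. The sketch below is a minimal, hypothetical version of that idea; the binned input format, the 20-fold-over-median rule and the 50x floor are illustrative assumptions, not Cergentis' pipeline, and real TLA peaks decline gradually with distance rather than forming sharp spikes.

```python
from statistics import median

# Hypothetical binned-coverage record: (scaffold name, bin start, coverage)
Bin = tuple[str, int, float]


def call_coverage_peaks(bins: list[Bin], bin_size: int,
                        fold: float = 20.0, min_cov: float = 50.0) -> list[tuple[str, int, int, float]]:
    """Flag bins with coverage >= `fold` x the genome-wide median (and >= `min_cov`),
    then merge adjacent flagged bins on the same scaffold.
    Returns (scaffold, start, end, max coverage) tuples."""
    background = median(cov for _, _, cov in bins) or 1.0  # guard against an all-zero median
    threshold = max(min_cov, fold * background)
    peaks: list[tuple[str, int, int, float]] = []
    for scaffold, start, cov in sorted(bins):
        if cov < threshold:
            continue
        if peaks and peaks[-1][0] == scaffold and peaks[-1][2] == start:
            # This bin directly continues the previous peak: extend it.
            _, peak_start, _, peak_max = peaks[-1]
            peaks[-1] = (scaffold, peak_start, start + bin_size, max(peak_max, cov))
        else:
            peaks.append((scaffold, start, start + bin_size, cov))
    return peaks


# Hypothetical coverage: low background with elevated bins on two scaffolds.
bins = [("scaffold1", i * 100_000, 5) for i in range(10)]
bins += [("scaffold8", i * 100_000, 5) for i in range(10)]
bins[3] = ("scaffold1", 300_000, 2000)
bins[4] = ("scaffold1", 400_000, 1500)
bins[17] = ("scaffold8", 700_000, 800)
print(call_coverage_peaks(bins, bin_size=100_000))
# [('scaffold1', 300000, 500000, 2000), ('scaffold8', 700000, 800000, 800)]
```

A peak found this way only localises a candidate site; the base pair position comes from the breakpoint-spanning reads.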

Breakpoint-based genotyping
Since identified breakpoint sequences are unique to a particular integration site, they can be used to design conventional PCR primers that discriminate between the integrated allele and the wild-type allele [2]. Such PCRs can then be used to test daughter cells for the presence of the same integration site and thus to assess the clonality of the original parental cell line.

Structural changes in integration sites
Transgene integrations frequently result in structural changes in the host genome [2,3]. TLA analyses enable the detection of such structural changes in integration sites. Examples of sequencing coverage profiles resulting from a clean integration and from possible rearrangements are shown in Figure 9. If complex structural rearrangements are identified, e.g. chromosomal translocations or co-integrations, paired-end sequencing analyses can be used to determine which breakpoints occur in physical proximity to each other and constitute the same integration site [14]. TLA analyses using primers specific for identified integration sites can be performed to further characterise rearrangements resulting from transgene integrations.

Figure 9: Schematic depictions of the sequencing coverage profiles resulting from different rearrangements. The coverage profiles show results from mapping the generated reads to the host genome. A schematic map of the host genome with the integrated transgene sequence is shown below each coverage plot. Arrows in the host genome indicate the orientation of the sequence after integration. Red arrows show the positions of transgene-genome breakpoint sequences.
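The breakpoint-based genotyping described above relies on a primer pair whose binding sites sit on opposite sides of a transgene-genome junction, so that a product is expected only from the integrated allele. The sketch below shows that placement logic on toy sequences; the function name, the fixed `primer_len` and `offset` defaults, and the absence of melting-temperature or specificity checks are simplifying assumptions (dedicated primer-design software would normally handle those).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_primer_pair(genome_flank: str, transgene_flank: str,
                         primer_len: int = 20, offset: int = 100) -> tuple[str, str, int]:
    """Hypothetical placement of an integration-specific PCR pair.

    genome_flank:    host sequence ending at the breakpoint (5'->3')
    transgene_flank: transgene sequence starting at the breakpoint, same strand
    The forward primer lies in the host, `offset` bp before the breakpoint; the
    reverse primer binds the transgene, `offset` bp after it. The wild-type allele
    lacks the transgene-side binding site, so no product is expected from it.
    Returns (forward primer, reverse primer, expected amplicon length).
    """
    needed = primer_len + offset
    if len(genome_flank) < needed or len(transgene_flank) < needed:
        raise ValueError(f"each flank must be at least {needed} bp")
    start = len(genome_flank) - needed
    forward = genome_flank[start:start + primer_len]
    reverse = reverse_complement(transgene_flank[offset:offset + primer_len])
    return forward, reverse, 2 * needed


# Toy sequences; real designs use longer flanks and primers.
print(junction_primer_pair("AAAACCCCGGGGTTTT", "TTGACCATGGTACGAA", primer_len=4, offset=2))
# ('GGTT', 'GGTC', 12)
```

A second pair spanning the intact (wild-type) locus would typically be run alongside as a control, so that each subclone can be scored for both alleles.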

Targeted sequencing of individual integration sites
TLA analysis with primers complementary to a wild-type sequence next to an individual integration site provides sequence information across both the transgenic and the wild-type alleles. This enables the quantification of integration events in that locus (Figure 10).

Figure 10: TLA-based analyses of individual integration sites. A TLA analysis with a primer pair in close proximity to the integration site provides sequence information across the entire locus. Such analyses can also be used to determine which genetic variants within the transgene sequence (i.e. SNVs and transgene-transgene fusions) occur in which integration site.

Sequencing of targeted integrations, knock-outs, etc.
TLA analyses can be used to perform targeted sequencing of loci that were targeted using homology arm-driven integrations, as well as loci in which genetic alterations have been introduced by targeted gene editing (e.g. targeted knock-outs or integrations using ZFN, TALEN or CRISPR/Cas9). TLA can thus be used to assess whether genetic alterations have been generated successfully and efficiently [4] (Figure 11).

Figure 11: TLA-based analyses of targeted genetic modifications. A TLA analysis with a primer pair in close proximity to the targeted site provides sequence information across the entire locus, including the target site.

Genetic stability of MCBs, WCBs and EOP cells
TLA enables detailed analyses of the genetic stability of integration sites and the transgene sequence. A TLA-based analysis of an MCB and of different generations of WCB and EOP cells can establish whether the cells continue to share the same integration site(s) and transgene sequence, or whether genetic alterations have occurred [10,11] (Figure 12).

Figure 12: Examples of comparisons between MCB, WCB and EOP cells of different cell lines. The WCBs and EOP cells of cell lines 1 and 2 share all integration sites, SNVs and transgene fusions identified in the original MCB. Cell line 3 shows genetic differences: over time, 2 integration sites (A) and 2 transgene-transgene fusions (C) have been lost, and 2 SNVs (B) have arisen. (The frequencies of the SNVs and transgene-transgene fusions in the generated sequencing data are not shown in these tables; this information is available, however, and indicates the relative abundance of these variants.)
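The comparison summarised in Figure 12 amounts to a per-feature set difference between the MCB and a later generation: anything present in the MCB but missing later was lost, anything new was gained. A minimal sketch under that framing follows; the feature names and variant labels are hypothetical placeholders, and the example data merely mimic the pattern described for cell line 3.

```python
def compare_to_mcb(mcb: dict[str, set[str]],
                   later: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """For each feature type (integration sites, SNVs, fusions, ...), report what
    was lost or gained in a later generation relative to the MCB."""
    report = {}
    for feature in sorted(mcb.keys() | later.keys()):
        before, after = mcb.get(feature, set()), later.get(feature, set())
        report[feature] = {"lost": before - after, "gained": after - before}
    return report


def is_stable(report: dict[str, dict[str, set[str]]]) -> bool:
    """Stable if no feature type has lost or gained anything."""
    return all(not change["lost"] and not change["gained"] for change in report.values())


# Hypothetical labels mimicking cell line 3: 2 sites and 2 fusions lost, 2 SNVs gained.
mcb = {"integration_sites": {"IS1", "IS2", "IS3"},
       "snvs": {"SNV1"},
       "fusions": {"F1", "F2", "F3"}}
eop = {"integration_sites": {"IS1"},
       "snvs": {"SNV1", "SNV2", "SNV3"},
       "fusions": {"F1"}}
report = compare_to_mcb(mcb, eop)
print(report["integration_sites"]["lost"] == {"IS2", "IS3"}, is_stable(report))  # True False
```

This presence/absence view deliberately ignores variant frequencies; since the frequency information is available, a fuller comparison could also flag variants whose abundance shifts markedly between generations.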

References
1. Zhu J., Hatton D. (2016) New Mammalian Expression Systems. In: Advances in Biochemical Engineering/Biotechnology. Springer, Berlin, Heidelberg.
2. Goodwin L.O. et al. Large-scale discovery of mouse transgenic integration sites reveals frequent structural variation and insertional mutagenesis. bioRxiv, December 18, …
3. Cain-Hom C. et al. Efficient mapping of transgene integration sites and local structural changes in Cre transgenic mice using targeted locus amplification. Nucleic Acids Research, …
4. Eyquem J. et al. Targeting a CAR to the TRAC locus with CRISPR/Cas9 enhances tumour rejection. Nature 543: …
5. Boyd D. et al. Isolation and characterization of a monoclonal antibody containing an extra heavy-light chain Fab arm. mAbs 2018, Vol 0, No 0, …
6. Kaas C. et al. Deep sequencing reveals different compositions of mRNA transcribed from the F8 gene in a panel of FVIII-producing CHO cell lines. Biotechnology Journal 10: …
7. De Vree P.J. et al. Targeted sequencing by proximity ligation for comprehensive variant detection and local haplotyping. Nat Biotechnol 32: …
8. Splinter E. et al. Automated Targeted Locus Amplification (TLA) Technology for Targeted Complete Gene Sequencing Using the JANUS NGS Express Workstation. PerkinElmer Application Note.
9. Mickus B. (Gilead Sciences). Targeted sequencing for comprehensive genetic characterization of a recombinant CHO cell line. Cell Culture Engineering XV, May …
10. ICH Q5B - Analysis of the Expression Construct in Cells Used for Production of r-DNA Derived Protein Products. ICH, November 1995.
11. ICH Q5D - Derivation and Characterisation of Cell Substrates Used for Production of Biotechnological/Biological Products. ICH, July 1997.
12. Frye C. et al. Industry view on the relative importance of clonality of biopharmaceutical-producing cell lines. Biologicals 44 (2016) …
13. Horizon genome: CHO-K1GS assembly CHOK1GS_HZDv1 (GCA_ …): ftp://ftp.ensembl.org/pub/release-90/fasta/cricetulus_griseus_chok1gshd/dna/
14. Snyder M. et al. Haplotype-resolved genome sequencing: experimental methods and applications. Nature Reviews Genetics 16: …
15. Cergentis Application Note: Targeted complete next generation sequencing of transgenes and integration sites.

Cergentis services and kits
Cergentis provides TLA as a service and as kits.

Contact
Address: Cergentis B.V., Yalelaan CM Utrecht, The Netherlands
General: info@cergentis.com
Sales: sales@cergentis.com