Single Molecule Variant Detection: From Heteroduplexes in a Single DNA Molecule to Whole Chromosome Rearrangements


1 Single Molecule Variant Detection: From Heteroduplexes in a Single DNA Molecule to Whole Chromosome Rearrangements. Joris Vermeesch, PacBio User Meeting, 10 November 2015, Barcelona

2 Outline: (1) Polymerase Specific Error Rates and Profiles; (2) PAR1 Length Polymorphism in the Human Population; (3) A Distinct Class of Chromoanagenesis Events Characterized by Focal Copy Number Gains

3 Polymerase error rates? Traditional methods are indirect. Can we measure mutation rates directly?

4 PacBio Circular Sequencing. DNA fragment w/PacBio adaptors; subread alignment = accurate consensus sequence. Figure legend: marker = random sequencing error.
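The idea on this slide, that aligning many passes over the same circular template averages out random errors, can be sketched with a simple per-column majority vote. This is only an illustration under strong assumptions: the subreads are taken to be already aligned and of equal length, and the function name `consensus` is hypothetical. It is not the consensus algorithm used by PacBio software.

```python
from collections import Counter


def consensus(aligned_subreads):
    """Majority-vote consensus over equal-length, pre-aligned subreads.

    Assumption: alignment has already been done, so column i of every
    subread refers to the same template position.
    """
    if not aligned_subreads:
        return ""
    length = len(aligned_subreads[0])
    if any(len(s) != length for s in aligned_subreads):
        raise ValueError("subreads must be aligned to the same length")
    called = []
    for column in zip(*aligned_subreads):
        # Random errors hit different passes at different positions,
        # so the most common base per column is usually the true one.
        base, _count = Counter(column).most_common(1)[0]
        called.append(base)
    return "".join(called)


# Five passes over the same template, two of them with one random error each.
passes = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAC", "AGGTAC"]
print(consensus(passes))  # prints ACGTAC
```

Because the errors are random rather than systematic, more passes make it less likely that the same wrong base wins a column.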

5 Experimental Setup
Amplicons: different PCR products (primers with PacBio barcode per polymerase)
Polymerase: usage
- KAPA HiFi DNA Polymerase: Illumina library preps
- Phusion High-Fidelity PCR Master Mix: exome amplification
- Illumina TruSeq Nano: NIPT
- Takara LA Taq: optimized for long range PCR
- HotStar HiFidelity DNA Polymerase: NGS amplicons
- Platinum Taq: general usage
PacBio RSII:
- 4 SMRTcells
- DNA/Polymerase Binding Kit P4
- 180 minute movies

6 Number of Passes and Errors. Figure labels: Phusion, KAPA, HotStar, Takara, PlatTaq, Nano; rows: CASK, FAM120C. Avg. 18 passes

7 Results by Nt (Min. 10 passes) Reproducibility p-value between SMRTcells: between polymerases: x10^-39 between duplicate PCR/library:

8 Results by Nt (Min.10 passes) Overall Error Rate: 1.81x x x x x x10-5
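A per-nucleotide error rate restricted to molecules with at least 10 passes could be computed along the following lines. This is a minimal sketch: the tuple layout `(num_passes, num_bases, num_errors)` and the example numbers are made up for illustration and are not data from the talk.

```python
def error_rate(reads, min_passes=10):
    """Errors per sequenced nucleotide, using only well-covered molecules.

    reads: iterable of (num_passes, num_bases, num_errors) tuples, one per
    consensus read. Molecules with fewer than min_passes passes are skipped
    because their consensus is less reliable.
    """
    bases = 0
    errors = 0
    for num_passes, num_bases, num_errors in reads:
        if num_passes >= min_passes:
            bases += num_bases
            errors += num_errors
    return errors / bases if bases else float("nan")


# Hypothetical input: the third molecule has only 5 passes and is ignored.
reads = [(18, 1000, 0), (12, 1000, 1), (5, 1000, 7)]
print(error_rate(reads))
```

Pooling bases and errors before dividing weights every nucleotide equally, which matches a "by Nt" summary rather than a per-read average.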

9 PacBio Heteroduplex Sequencing. DNA fragment w/PacBio adaptors; subread alignment = accurate consensus sequence. Figure legend: H = heteroduplex sequence variant; marker = random sequencing error.

10 PacBio Heteroduplex Sequencing Forward Strand Reads Reverse Strand Reads
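The strand comparison on these two slides can be sketched as a small classifier. A change seen consistently in the passes from one strand but not the other is treated as a heteroduplex. A change seen on both strands is an ordinary variant. A change seen only in scattered passes is random sequencing error. The function names, the labels, and the 0.8 agreement threshold are assumptions for illustration. Calls from both strands are assumed to be expressed in the same orientation.

```python
from collections import Counter


def dominant_base(calls, min_fraction):
    """Return the base seen in at least min_fraction of calls, else None."""
    if not calls:
        return None
    base, count = Counter(calls).most_common(1)[0]
    return base if count / len(calls) >= min_fraction else None


def classify_site(ref_base, forward_calls, reverse_calls, min_fraction=0.8):
    """Classify one template position from per-strand pass calls."""
    fwd = dominant_base(forward_calls, min_fraction)
    rev = dominant_base(reverse_calls, min_fraction)
    if fwd is None or rev is None:
        return "ambiguous"  # no clear per-strand consensus
    if fwd == rev:
        return "reference" if fwd == ref_base else "variant on both strands"
    return "heteroduplex"  # the two strands disagree consistently


# One stray G among nine forward passes is ignored as random error.
print(classify_site("A", "AAAAAAAGA", "AAAAAAAAA"))
# Every forward pass says G, every reverse pass says A.
print(classify_site("A", "GGGGGGGGG", "AAAAAAAAA"))
```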

11 PacBio Heteroduplex Sequencing. Polymerases: Phusion, KAPA, HotStar, Nano, TaKaRa, Plat. Taq. Substitution classes: T>C, A>G, C>T, G>A. Observation: pyrimidine transitions are more frequent than purine transitions.
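For reference, the four substitution classes on this slide split into pyrimidine transitions (T>C, C>T) and purine transitions (A>G, G>A). A short sketch of that bookkeeping follows. The function name and the example observations are hypothetical, not results from the talk.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_class(ref, alt):
    """Label a single-base substitution ref>alt."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    if ref in PURINES and alt in PURINES:
        return "purine transition"
    if ref in PYRIMIDINES and alt in PYRIMIDINES:
        return "pyrimidine transition"
    return "transversion"


# Hypothetical strand-resolved observations.
observed = ["T>C", "C>T", "A>G", "T>C", "G>T"]
tally = Counter(substitution_class(*s.split(">")) for s in observed)
print(tally)
```

Strand-resolved calls are what make this split meaningful: in ordinary double-stranded calling, T>C on one strand is indistinguishable from A>G on the other.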

12 Outline: (1) Polymerase Specific Error Rates and Profiles; (2) PAR1 Length Polymorphism in the Human Population; (3) A Distinct Class of Chromoanagenesis Events Characterized by Focal Copy Number Gains

13 Pseudoautosomal region of X and Y

14 Detection of duplication flanking PAR

15 X-specific amplicon is paternally transmitted! Figure labels: ChrX, ChrY; 2xFather-PAR, PAR, 2xControl-PAR

16 X-specific amplicon is paternally transmitted! Figure labels: ChrX, ChrY; 2xFather-PAR, PAR, Dup, 2xControl-PAR, 2xFather-Dup, 1xControl-Dup

17 BAC-based baits and resequencing show a 5 kb region with three copies (Illumina targeted sequencing)

18 Extended PAR region validation and functional proof. Schematic: insertional translocation by non-allelic homologous recombination. 1) Validate the breakpoint. 2) Is this a single ancestral event or recurrent? 3) Does it function as a pseudoautosomal sequence (i.e. is there evidence of recombination)?

19 Population structure of extended Y

20 Junc2 Junc1 Validation of PAR breakpoint by PacBio sequencing: evidence of recombination? Junction Primers Two Junction Sequences Identified: TCTTGTGTTGTACCCGAGCGAGTTAGAAAAACGCCACACTTTGAGACGATTTAAGAGTCCTTTATTAGCCGGCGACCGAGAGACGGCTA ACGCTCAAAATTCTCTCGGCCCCGAGGAAGGGGCTTGATTAACTTTTAGATCTTGGTTTAGGAAGGGGAGGGCGGGGGGTCTAGTGAA AACCATTTTACAGAAGTAAAGTAGGCAAAAAGTTAAAAGGATAAATGGTTGCAGGAAAGTAAACAGTTCCAGGTGCAGGGGCTTTAAGAC TATTACAAGGTGATAGACGCG_G_GGCTTTGGGCGTTACTAATCAGACGAATTCCCGGGAACTGCGGATGTAGCTCGCCACAGTATCTTA TCAGTTAACTGCATTCTTGGATGTGCTGGGAGTCAGCCTGCACGAGTTCAGTCCTTGAGGAAGGGGCTGCCAGTGAAAGAGCCAAGGT GGAGTCTGGC_G_GGCTCTCTTAGCTAAGGGAGAGTCCATTCAGGTGGAAAGAAGGCTAGGTGAGTAGAGGAAAAGGGAGAGTCTAAA AACAGGTTAGTAAAAACCAGGTTGGGCATTACAGGTGAAACCCCGTCTCTACTAAAAAATACAAAAAAAATTTGCCAGGCATGGTGGCGG GCGCCTGTAGTCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGC_A_TGAACC_T_GGGAGGCGGAGCTTGCAGTGAGCCGAGAT CTCCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCT CAACAACAACAACAAAAAGAAATAACTCCCAGACTTCCAGCA GACTCCTTGACTGCCATGAGAGATGTCAGG TCTTGTGTTGTACCCGAGCGAGTTAGAAAAACGCCACACTTTGAGACGATTTAAGAGTCCTTTATTAGCCGGCGACCGAGAGACGGCTA ACGCTCAAAATTCTCTCGGCCCCGAGGAAGGGGCTTGATTAACTTTTAGATCTTGGTTTAGGAAGGGGAGGGCGGGGGGTCTAGTGAA AACCATTTTACAGAAGTAAAGTAGGCAAAAAGTTAAAAGGATAAATGGTTGCAGGAAAGTAAACAGTTCCAGGTGCAGGGGCTTTAAGAC TATT ATAGACGCG_A_GGCTTTGGGCGTTACTAATCAGACGAATTCCCGGGAACTGCGGATGTAGCTCGCCACAGTATCTTATC AGTTAACTGCATTCTTGGATGTGCTGGGAGTCAGCCTGCACGAGTTCAGTCCTTGAGGAAGGGGCTGCCAGTGAAAGAGCCAAGGTGG AGTCTGGC_T_GGCTCTCTTAGCTAAGGGAGAGTCCATTCAGGTGGAAAGAAGGCTAGGTGAGTAGAGGAAAAGGGAGAGTCTAAAAA CAGGTTAGTAAAAACCAGGTTGGGCATTACAGGTGAAACCCCGTCTCTACTAAAAAATACAAAAAAAATTTGCCAGGCATGGTGGCGGGC GCCTGTAGTCTCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGC_A_TGAACC_T_GGGAGGCGGAGCTTGCAGTGAGCCGAGATCTC CCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCT CAACAACAACAACAAAAAGAAATAACTCCCAGACTTCCAGCAGAC TCCTTGACTGCCATGAGAGATGTCAGG

21 Extended PAR: evidence of recombination? Duplication primers. Example family 1: paternally inherited allele

22 Extended PAR: evidence of recombination? Duplication haplotypes: rs (A>G) rs (G>C) chrX: (G>A) rs (A>G) rs (G>T) chrX: (A>G) rs (A>G) rs (C>G) rs (G>T) rs (A>G) rs (C>T) rs (C>G) chrX: (C>T) rs (C>G) rs (C>G) rs (G>A) rs (A>G) rs (deltg) rs (G>T). Legend: B = brother, F = father, P = patient, # = family

23 Extended PAR Region Conclusions. The duplicon is an insertional translocation from the X to the Y chromosome, caused by non-allelic homologous recombination, and is flanked by a long terminal repeat (LTR6B). This is a recurrent event: the rare insertion occurs in different Y-chromosome haplogroups and in different geographic regions, and a reciprocal deletion has been found. X/Y recombination occurs in ancestrally related individuals, based on 4 different extended PAR duplication haplotypes and 2 different PAR junction haplotypes. This finding represents a novel mechanism shaping sex chromosome evolution.

24 Outline: (1) Polymerase Specific Error Rates and Profiles; (2) PAR1 Length Polymorphism in the Human Population; (3) A Distinct Class of Chromoanagenesis Events Characterized by Focal Copy Number Gains

25 aCGH of patients identifies chromosomes characterized by multiple duplicons on a single chromosome. aCGH screen of patients with developmental anomalies: Chr22, 11 duplications; Chr18, 1 triplication and 7 duplications; Chr22, 8 duplications

26 FISH Patient 1

27 FISH Patient 1

28 FISH Patient 1

29 Breakpoint Characterization Patient DNA Illumina, Inc. HiSeq Coverage: Patient x Patient x Patient x Pacific Biosciences PacBio Coverage: Patient x Patient x

30 Breakpoint Characterization Analysis: Illumina, Inc.
Align w/bwa v (Li and Durbin 2009)
Copy # w/SeqCBS (Shen and Zhang 2012), filtered for:
- gains
- p-value <
- min deviation of 15% from 1, but not higher than 2
- retain adjacent to non-filtered regions
Structural Variation w/BreakDancer v1.1.2 (Chen et al. 2009)
- filtered for intrachromosomal translocations (ITX) and inversions (INV) > 20 kb
- PCR and Sanger confirm
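The two filters on this slide can be sketched as simple predicates. The record types and field names below are hypothetical and are not the SeqCBS or BreakDancer output formats. The p-value cutoff is not preserved in this transcript, so it is a required argument rather than a default. The "retain adjacent to non-filtered regions" rule is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical copy-number segment: ratio is relative to a normal of 1."""
    chrom: str
    start: int
    end: int
    ratio: float
    p_value: float


@dataclass
class SVCall:
    """Hypothetical structural-variant call with two breakpoint positions."""
    chrom: str
    pos1: int
    pos2: int
    sv_type: str


def is_candidate_gain(seg, max_p_value, min_ratio=1.15, max_ratio=2.0):
    """Gain deviating at least 15% from 1 but not higher than 2."""
    return seg.p_value < max_p_value and min_ratio <= seg.ratio <= max_ratio


def keep_sv(call, min_size=20_000):
    """Keep intrachromosomal translocations and inversions larger than 20 kb."""
    return call.sv_type in {"ITX", "INV"} and abs(call.pos2 - call.pos1) > min_size


# The cutoff 1e-3 is a placeholder chosen only so the example runs.
gain = Segment("chr22", 20_000_000, 20_400_000, ratio=1.5, p_value=1e-6)
print(is_candidate_gain(gain, max_p_value=1e-3))
print(keep_sv(SVCall("chr22", 1_000_000, 1_050_000, "INV")))
```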

31 Breakpoint Characterization Analysis: Pacific Biosciences
Structural Variation: RS_BridgeMapper.1 with default settings, except:
- minimum subread length 500 bp
- minimum polymerase read length 500 bp
- diploid analysis
BridgeMapper split reads filtered for:
- both fragments on chr22
- >20 kb apart
- connected two CNVs
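The split-read filter listed here can be sketched as a single predicate. The types below are hypothetical and do not reflect BridgeMapper's actual output. "Connected two CNVs" is interpreted, as an assumption, to mean that the two fragments fall inside two different copy-number gain intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical mapped piece of a split read."""
    chrom: str
    pos: int


def cnv_index(frag, cnvs):
    """Index of the (chrom, start, end) interval containing frag, or None."""
    for i, (chrom, start, end) in enumerate(cnvs):
        if frag.chrom == chrom and start <= frag.pos <= end:
            return i
    return None


def keep_split_read(a, b, cnvs, chrom="chr22", min_distance=20_000):
    """Both fragments on chrom, more than 20 kb apart, in two different CNVs."""
    if a.chrom != chrom or b.chrom != chrom:
        return False
    if abs(a.pos - b.pos) <= min_distance:
        return False
    ia, ib = cnv_index(a, cnvs), cnv_index(b, cnvs)
    return ia is not None and ib is not None and ia != ib


# Made-up coordinates for two gained regions on chr22.
cnvs = [("chr22", 100_000, 200_000), ("chr22", 500_000, 600_000)]
print(keep_split_read(Fragment("chr22", 150_000), Fragment("chr22", 550_000), cnvs))
```

A real pipeline would probably allow some tolerance around CNV boundaries, since array-derived and read-depth-derived segment edges are imprecise.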

32 Breakpoint Characterization, Patient 1. Tracks: aCGH copy #, SeqCBS copy #. Junction by: Illumina, PacBio, Both

33 Breakpoint Characterization, Patient 2 and Patient 3. Tracks: aCGH copy #, SeqCBS copy #. Chromosomes: Chr18, Chr22. Junction by: Illumina, Both

34 Junction Sequences Patient 1

35 Mechanism: Chromoanagenesis
Chromothripsis (Stephens et al Cell;144(1):)
- Repair by non-homologous end joining (NHEJ) (Kloosterman et al Cell;1(6): )
Chromoanasynthesis (Liu et al Cell; 146(6):)
- Replication errors by:
- fork stalling and template switching (FoSTeS): does not generate insertions at breakpoints
- microhomology-mediated break-induced replication (MMBIR): non-templated insertions <20 bp

36 New class of chromoanagenesis events
Markedly different from chromothripsis and chromoanasynthesis because:
- Only duplications and no deletions on a single chromosome
- Retention of original chromosome structure
- Clustering of duplications
- A combination of microhomology and non-templated insertions at the breakpoints
Mechanism? We hypothesize this to be a repair process driven by noncanonical non-homologous end joining mediated by polymerase theta.

37 Ongoing
- Tandem repeat variability
- Genome-wide haplotyping
Aims:
- Unravel complex LCRs in patients with genomic disorders
- Improve preimplantation genetic diagnosis
Methods:
- Direct PacBio sequencing
- Targeted enrichment via cell hybrids/microdissection
Vacancies!

38 Acknowledgements
Chromoanasynthesis: Heleen Masset, Hilde Van Esch, Pascale Kleinfinger, Julie Plaisancié, Caroline Schluth-Bolard, Damien Sanlaville, Greet Peeters, Matthias Declercq, Matthew Hestand, Wim Meert, Jeroen Van Houdt
PAR1 Project: Martin A. Mensah, Maarten H.D. Larmuseau, Mala Isrie, Nancy Vanderheyden, Erika L. Souche, Radka Stoeva, Hilde Van Esch, Koen Devriendt, Thierry Voet, Ronny Decorte, Peter N. Robinson
F+ Fellowship

39 Overall Conclusions
PacBio sequencing: random error rates and circular sequencing provide high accuracy consensuses
- identify error rates in the 10^-6 range!
- ideal for identifying low-frequency mosaics!
Multi-kb reads permit:
- sequencing over repetitive elements
- variant phasing
- structural variation at bp resolution
Future (now): gold and platinum quality genome assemblies