mtDNA Heteroplasmy and NextGen Sequencing: A Perfect Marriage

1 Mitchell M. Holland, Ph.D.; Associate Professor, Biochem & MolBio; Former Director, Forensic Science Program; Penn State University, University Park, PA. mtDNA Heteroplasmy and NextGen Sequencing: A Perfect Marriage. ISABS 25 June

2 Marriage is work, and requires planning and commitment: why NGS is the best approach for forensic mtDNA analysis; the challenges we face in leveraging the advantages; some recent findings that help address the challenges; where we're headed from here.

3 Why bother changing from Sanger sequencing to NGS for forensic mtDNA analysis? Heteroplasmy: Detecting & Reporting. Platforms shown: 454 GS Junior; Illumina MiSeq; LT PGM. Croat Med J (2011), 52, pp

4 Identification of Nicholas Romanov: Tsar Nicholas II; Family Reference 5 Generations Removed

5 DGGE to Identify Heteroplasmy: Family Reference 5 Generations Removed. DGGE analysis was used to identify the heteroplasmic sequences, including those from the distant maternal relative.

6 Identification of Nicholas Romanov: Weight of the Evidence. Tsar Nicholas II and Georgij Romanov: LR = ~150; LR = ~380,000 when the heteroplasmy was considered.

7 Likelihood Ratio: $\mathrm{LR} = \dfrac{p(E_1 \mid R)\, p(E_2 \mid R)}{p(E_1 \mid R')\, p(E_2 \mid R')}$, where $p(E_1 \mid R)$ = the probability of the evidence (match between Georgij and Nicholas) given the hypothesis that the remains are those of Nicholas Romanov; $E_2$ = the co-occurrence of heteroplasmy; $R'$ = the hypothesis that the remains are unrelated.

8 Likelihood Ratio, Likelihood of a Match: $\mathrm{LR} = \dfrac{p(E_1 \mid R)}{p(E_1 \mid R')}$, with $p(E_1 \mid R) = e^{-\mu g} = 0.96$; $g$ = number of generations between individuals = 2; $\mu$ = mutation rate, which we estimated as 1/50 generations.

9 Likelihood Ratio: $\mathrm{LR} = \dfrac{p(E_1 \mid R)}{p(E_1 \mid R')} = \dfrac{0.96}{0.0065} = 148$, rounded to 150. $p(E_1 \mid R) = e^{-\mu g} = 0.96$; $g$ = number of generations between individuals = 2; $\mu$ = mutation rate, which we estimated as 1/50 generations; $p(E_1 \mid R')$ = the haplotype frequency of 2/308 = 0.0065 (one observation in a database of 307 unrelated people, and one observation in the case). EMPOP now has ~35,000 sequences in the database, f = [value missing] (upper bound of the 95% CI).

10 Likelihood Ratio, Likelihood of Sharing Heteroplasmy: $\mathrm{LR} = \dfrac{p(E_2 \mid R)}{p(E_2 \mid R')}$, with $p(E_2 \mid R) = e^{-\lambda g} = 0.61$; $g$ = number of generations between individuals = 2; $\lambda$ = rate of fixation to (apparent) homoplasmy = 1/4 for the Romanov lineage (fixation in 4 generations) = 0.25.

11 Likelihood Ratio: $\mathrm{LR} = \dfrac{p(E_2 \mid R)}{p(E_2 \mid R')} = \dfrac{0.61}{2.4 \times 10^{-4}} = 2.5 \times 10^{3}$. $p(E_2 \mid R) = e^{-\lambda g} = 0.61$; $g$ = number of generations between individuals = 2; $\lambda$ = rate of fixation to (apparent) homoplasmy = 1/4 for the Romanov lineage (fixation in 4 generations) = 0.25; $p(E_2 \mid R')$ = the chance of randomly sampling another individual with heteroplasmy at any position = $4/50 \times 1/332 = 2.4 \times 10^{-4}$; 4/50 = the chance of randomly selecting individuals with heteroplasmy; 1/332 = polymorphic sites identified in HV1/HV2 (332/610).

12 Likelihood Ratio: $\mathrm{LR} = \dfrac{p(E_1 \mid R)\, p(E_2 \mid R)}{p(E_1 \mid R')\, p(E_2 \mid R')} = (150)(2.5 \times 10^{3}) = 3.8 \times 10^{5}$. Therefore, the DNA evidence is 380,000 times more likely if the remains are those of Nicholas Romanov.
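
The arithmetic on slides 8 to 12 can be checked with a short script. This is a minimal sketch in Python, not software used in the original analysis: the function and variable names are assumptions made for illustration, and only the numeric inputs (2 generations, 1/50, 1/4, 2/308, 4/50, 1/332) come from the slides.

```python
import math


def prob_no_event(rate_per_generation: float, generations: int) -> float:
    """Probability that no event occurs over the given generations: e^(-rate * g)."""
    return math.exp(-rate_per_generation * generations)


def romanov_likelihood_ratios():
    """Recompute the two likelihood ratios and their product from the slide inputs."""
    generations = 2            # g, generations between individuals
    mutation_rate = 1 / 50     # mutation rate per generation
    fixation_rate = 1 / 4      # rate of fixation to apparent homoplasmy

    # E1: the haplotype match
    p_e1_related = prob_no_event(mutation_rate, generations)    # about 0.96
    p_e1_unrelated = 2 / 308                                    # haplotype frequency

    # E2: the shared heteroplasmy
    p_e2_related = prob_no_event(fixation_rate, generations)    # about 0.61
    p_e2_unrelated = (4 / 50) * (1 / 332)                       # about 2.4e-4

    lr_match = p_e1_related / p_e1_unrelated
    lr_heteroplasmy = p_e2_related / p_e2_unrelated
    return lr_match, lr_heteroplasmy, lr_match * lr_heteroplasmy


lr_match, lr_heteroplasmy, lr_combined = romanov_likelihood_ratios()
print(f"LR (match):        {lr_match:.0f}")
print(f"LR (heteroplasmy): {lr_heteroplasmy:.0f}")
print(f"LR (combined):     {lr_combined:.2e}")
```

Without the intermediate rounding the product comes to roughly 3.7 × 10^5; the 3.8 × 10^5 on the slide results from multiplying the rounded values 150 and 2.5 × 10^3.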

13 Things to consider: empirical mutation rate; transmission of heteroplasmic variants between generations, and the drift between tissues; rate of heteroplasmy (on a per-nucleotide basis, within population groups). NIJ 2014-DN-BX-K022

14 The topics are closely related: mutations lead to heteroplasmic states; mutations in the germline are transmitted to the next generation. At what point does the sequence become homoplasmic within a lineage? Mutation or reversion? Regeneration?

15 Bottleneck: At what point does the sequence become homoplasmic within a lineage? Mutation or reversion?

16 NGS = Massive Cloning Experiment; Simple Library Preparation

17 Our choice of instrument: MiSeq. Bridge amplification; cluster generation.

18 Development of Mito NGS Protocols: development of an NGS approach on the MiSeq; evaluation of an NGS approach on the PGM.

19 Illumina Sequencing Chemistry: Sequencing By Synthesis with Reversible Terminators. All four fluorescently labeled dNTPs (reversible terminators) are added together; one is incorporated at each cluster and read by a CCD camera.

20 Reversible Terminators: Once the terminator is incorporated and read by the CCD camera, the dye is removed from it along with the protective group on the 3'-hydroxyl group.

21 Recent Studies: 160+ whole mtGenome sequences; initial assessment of low-level heteroplasmy rates. Included in the data set were 50 maternal pairs that allowed us to assess inheritance patterns of low-level mtDNA variants (blind to us), including tissue-specific differences, empirical mutation rates, and the germline bottleneck.

22 Genetic Bottlenecks & Empirical Mutation Rates: Given our data, the number of mtDNA molecules transmitted to the next generation is [value missing]. Given our data, the germ-line mutation rate is 0.13 mutations/site/Myr (compared to phylogenetic rate estimates of 0.118). Non-synonymous mutations showed signs of purifying selection. Proceedings of the National Academy of Sciences

23 Maternal Age Effects: No age association was found for the children, but there was a correlation between the age of the mothers and their rate of heteroplasmy, and a correlation between the age of the mothers at fertilization and the rate of heteroplasmy in the children.

24 Age Effects on Tissue-Associated Heteroplasmy: Heteroplasmy allele frequencies diverged less in the tissues of a child than in those of a mother. The frequencies were more strongly correlated between the two tissues for children ($R^2$ = 92%) than between the two tissues for mothers ($R^2$ = 49%). Tissues compared: hematopoietic stem cells (derived from mesoderm in the early embryo) versus epithelial cells (stratified squamous).

25 Effects of the Germ-line Bottleneck: The lack of correlation between the tissue types of mothers and children is a result of the action of the germ-line bottleneck.

26 Rate of Heteroplasmy. Data Set = 109 Individual Lineages (50 Pairs of Maternal Relatives)

Region           % Heteroplasmy   >1% Heteroplasmy   >10% Heteroplasmy
Coding Region    69%              50%                14%
Control Region   50%              26%                8.6%*

*Consistent with previous reports: for example, Irwin et al, J Mol Evol 2009

27 Maternal Transmission: 1 of 50 maternal pairs (mother:child) had no sites of heteroplasmy and matched completely; 49 of 50 pairs could be differentiated through heteroplasmic differences at the mtGenome level.

28 Site Differences Between Maternal Relatives

#1098
  Primary Haplotype: A263G T16093C C16261T C16291C T16311C T16362C T16519C
  Heteroplasmy Positions: 200 A/G (3.0%); 16093T/C (12.6%)
#1100
  Primary Haplotype: A263G T16093C C16261T C16291C T16311C T16362C T16519C
  Heteroplasmy Positions: 16093C/T (3.4%)

29 Site Differences Between Maternal Relatives

#1161
  Primary Haplotype: G203A T204C T239C A263G C16193T A16219G T16362C A16482G
  Heteroplasmy Positions: A16037A/G (44%); G16390G/A (0.9%)
#1279
  Primary Haplotype: G203A T204C T239C A263G C16193T A16219G T16362C A16482G
  Heteroplasmy Positions: C150C/T (0.58%); A16037G/A (3.0%); C16193T/C (0.53%)

30 Passive & Active Damage: Active damage via sodium citrate, low pH and temperature. Relatively random mutation pattern. Poster FG18, Molly Rathbun. Heteroplasmy-based ratios of 15:1 to 21:1 for the Control Region (Irwin et al, J Mol Evol 2009 & Ramos et al, PLOS One 2013).

31 Reporting Thresholds: Poisson distribution = predicts the degree of spread around a known average. Poisson distribution with mean corresponding to the Analytical Threshold (AT); 95% of the error is below this level, the Reporting (Interpretation) Threshold (RT). Plot axes: # of Minor Site Observations versus Nucleotide Position.
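
One way to read the Poisson approach: treat the analytical threshold as the mean number of erroneous minor-variant observations at a position, then place the reporting threshold at the count below which 95% of that error falls. The following is a minimal sketch under that reading. The function name and the example mean of 5 observations are hypothetical and are not values from the talk.

```python
import math


def poisson_quantile(mean: float, q: float = 0.95) -> int:
    """Smallest count k such that P(X <= k) >= q for X ~ Poisson(mean).

    Uses the recurrence pmf(k) = pmf(k - 1) * mean / k, so no external
    library is needed. Intended for modest means: exp(-mean) underflows
    for very large values.
    """
    if mean < 0:
        raise ValueError("mean must be non-negative")
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    pmf = math.exp(-mean)   # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < q:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


# Hypothetical example: an analytical threshold of 5 minor-site observations
analytical_threshold = 5.0
reporting_threshold = poisson_quantile(analytical_threshold, 0.95)
print(f"AT (mean) = {analytical_threshold}, RT (95th percentile) = {reporting_threshold}")
```

With real data the mean would be estimated from the observed error at each nucleotide position, which the sketch leaves out.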

32 Reporting Thresholds: ECDF = Empirical Cumulative Distribution Function
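
An ECDF makes no distributional assumption: its value at x is simply the fraction of observed error values at or below x, and a threshold can be read off at a chosen quantile. The sketch below illustrates the idea only. The function names, the 95% level, and the error counts are all made up for the example and do not come from the talk.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of sample values <= x."""
    xs = sorted(sample)
    if not xs:
        raise ValueError("sample must not be empty")
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ecdf_threshold(sample, q=0.95):
    """Smallest observed value whose ECDF is at least q."""
    xs = sorted(sample)
    if not xs:
        raise ValueError("sample must not be empty")
    n = len(xs)
    for rank, value in enumerate(xs, start=1):
        if rank / n >= q:
            return value
    return xs[-1]


# Hypothetical minor-variant error counts at 20 nucleotide positions
errors = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 9, 12]
F = ecdf(errors)
threshold = ecdf_threshold(errors, 0.95)
print(f"ECDF threshold at 95%: {threshold}; F(threshold) = {F(threshold):.2f}")
```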

33 False Heteroplasmy

34 Impact of False Heteroplasmy on Thresholds. Plot labels: Minor Variant % Heteroplasmy; Number of Observations; Nucleotide Position.

35 Validation of the Illumina D-loop Protocol. Poster FG17, Laura Wilson. Transposase Adapted (TA) Primer: F TCGTCGGCAGCGTCAGATGTGTATAAGAGACACACCATTAGCACCCAAAGCT-3'

36 Amplicon Input for Library Preparation

37 Mixture Studies

38 Summary: If we all agree that NGS should be employed in forensic cases, then we need to better understand: rates of heteroplasmy; transmission and drift of heteroplasmic variants; where to set thresholds for reporting heteroplasmy; how DNA damage will impact the reporting thresholds; whether the NGS approach will be comparable to the current approach when dealing with forensic samples; statistical approaches when reporting heteroplasmy.

39 Thanks!! Illumina: Cydne Holt, Kathy Stephens, Joe Valaro, Carey Davis, Dan Gheba, etc. SoftGenetics NextGENe: John Fosnacht, Teresa Snyder-Leiby, etc. Penn State: Kateryna Makova, Anton Nekrutenko. Mitotyping Technologies: Bob Bever, et al. Battelle Memorial Institute. National Institute of Justice (NIJ 2014-DN-BX-K022). Eberly College of Science, Forensic Science Program.

40 Walther Parson & Ann Gross; Jen McElhoe, Research Associate (NIJ); Master's Students: Molly Rathbun (damage), Laura Wilson (D-loop val); UG Students: Jaclyn Junod (enzymes); Current Research Group: Alyssa Duffy, Lindsay Domdrosky, Jillian Baker, Sean Lynch
