Title: Quantitative heteroduplex analysis for SNP genotyping.


Elsevier Editorial System(tm) for Analytical Biochemistry
Manuscript Draft

Manuscript Number:
Title: Quantitative heteroduplex analysis for SNP genotyping.
Article Type: Full Length Article
Section/Category: DNA Recombinant Techniques and Nucleic Acids
Keywords: High-resolution melting; SNP; mixing; spiking; genotyping; heteroduplex analysis; quantitative TGCE analysis; nearest-neighbor symmetry.
Corresponding Author: Dr. Robert A. Palais
Corresponding Author's Institution: University of Utah
First Author: Robert A. Palais
Order of Authors: Robert A. Palais; Michael A Liew; Carl T Wittwer
Manuscript Region of Origin:

Cover Letter

Dear Editor:

We are pleased to submit our manuscript, "Quantitative heteroduplex analysis for SNP genotyping", for your consideration in Analytical Biochemistry. Based on recently developed high-resolution DNA melting techniques, the manuscript both develops a theoretical background and presents experimental evidence for complete SNP genotyping by single-step quantitative heteroduplex analysis. We have found that the optimal fraction of homozygous reference DNA added to unknown samples is 1/7. In contrast to prior methods where mixing is performed after PCR, reference DNA is added before PCR, allowing closed-tube homogeneous analysis. The method is simple, rapid, and inexpensive, requiring only PCR, a dsDNA dye, and a melting apparatus.

Sincerely,
Bob Palais, Ph.D., for the authors

Quantitative heteroduplex analysis for SNP genotyping.

Short title: Quantitative heteroduplex analysis.

Robert A Palais 1,*, Michael A Liew 2, and Carl T Wittwer 3

1 Department of Mathematics, University of Utah, Salt Lake City, UT.
2 Institute for Clinical and Experimental Pathology, ARUP, Salt Lake City, UT.
3 Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT.

* Address correspondence to this author at: Department of Mathematics, University of Utah, Salt Lake City, UT. Tel , Fax ; palais@math.utah.edu.

ABSTRACT

High-resolution melting of PCR products can detect heterozygous mutations and most homozygous mutations without electrophoretic or chromatographic separations. However, some homozygous SNPs have melting curves identical to the wild type as predicted by nearest-neighbor thermodynamic models. In these cases, if DNA of a known reference genotype is added to each unknown before PCR, quantitative heteroduplex analysis can differentiate heterozygous, homozygous, and wild type genotypes if the fraction of reference DNA is chosen carefully. Theoretical calculations suggest that melting curve separation is proportional to heteroduplex content difference, and that the addition of reference homozygous DNA at one-seventh of total DNA results in the best discrimination between the three genotypes of bi-allelic SNPs. This theory was verified experimentally by quantitative analysis of both high-resolution melting and temperature gradient capillary electrophoresis data. Reference genotype proportions other than one-seventh of total DNA were suboptimal and failed to distinguish some genotypes. Optimal mixing before PCR followed by high-resolution melting analysis permits genotyping of all SNPs with a single closed-tube analysis.

Keywords: High-resolution melting; SNP; mixing; spiking; genotyping; heteroduplex analysis; quantitative TGCE analysis; nearest-neighbor symmetry.

INTRODUCTION

Heteroduplex analysis is a popular technique to screen for sequence variants in diploid DNA. After PCR, heteroduplexes are analyzed by separation techniques such as conventional gel electrophoresis [1,2,3], denaturing high pressure liquid chromatography (dHPLC) [4], and temperature gradient capillary electrophoresis (TGCE) [5]. Recently, heteroduplexes have been detected directly in solution after PCR by high-resolution melting analysis. Either labeled primers [6] or a saturating DNA dye [7] were used to detect a change in shape of the fluorescent melting curve resulting from heteroduplexes. High-resolution melting of PCR products from diploid DNA has been used for mutation scanning [8-11], HLA matching [12], and genotyping [7, 13].

Heteroduplex analysis techniques using separation are seldom used for genotyping because different homozygotes are usually not resolved. Both dHPLC and TGCE usually fail to detect homozygous single nucleotide polymorphisms (SNPs), as well as small homozygous insertions and deletions. If suspected, these homozygous changes can be detected by mixing the PCR product of the unknown sample with the PCR product from a known homozygous reference sample. The mixture is first denatured, then cooled to form heteroduplexes, and the separation is repeated. If heteroduplexes are detected in the mixture, the two samples are of different genotype. Two sequential analyses and manual processing are required, however, exposing the concentrated PCR product to the laboratory and increasing the chance of PCR product contamination of subsequent reactions.

High-resolution melting can usually distinguish different homozygotes by a difference in melting temperature. Complete genotyping of human SNPs by high-resolution melting is possible in over 90% of cases using short PCR products [13].
However, with some SNPs, the melting curves of the two homozygotes are identical because of a nearest-neighbor thermodynamic symmetry [14, 15], where the bases adjacent to the SNP are identical on both DNA strands, and the SNP consists of an interchange between complementary bases. An example is the SNP causative for hemochromatosis (HFE) H63D, 187C>G. The two homozygous genotypes 5'-TCA-3' and 5'-TGA-3' have identical nearest-neighbor pairs (TC/AG and CA/GT). For complete genotyping of these symmetric SNPs, post-PCR mixing and separation studies could be performed, but the advantage of closed-tube analysis is then lost.

It is known that when DNA mixtures are amplified by PCR for heteroduplex detection, the strand proportions of different genotypes before and after amplification do not change [2, 3]. This suggests that complete genotyping of these SNPs should also be possible by mixing unknown and reference samples before PCR instead of after. Depending on both the fraction of reference DNA and the genotype of the unknown sample, different amounts of heteroduplexes will be produced that should allow discrimination of all genotypes. Previously, we empirically determined that such discrimination was possible with the addition of 15% homozygous reference DNA to 85% of unknown DNA prior to PCR [13]. However, the optimal percentage of reference DNA was not known.

We now present a predictive mathematical model for the heteroduplex content of mixtures in terms of the fraction of homozygous reference DNA added and the DNA of unknown genotype. The resultant heteroduplex content determines the extent to which the melting curve deviates from wild type samples and the relative intensity of the heteroduplex TGCE peaks. The reference DNA fraction (wild type) that optimally distinguishes genotypes is 1/7 of the total DNA, resulting in predicted heteroduplex contents of 0 (wild type), 24/49 (heterozygote), and 12/49 (homozygous mutant). This prediction was tested by amplifying various fractions of reference wild type DNA mixed into each genotype, followed by quantitative high-resolution melting curve analysis and TGCE.
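The nearest-neighbor symmetry described above for H63D can be checked mechanically. The following is a minimal sketch, not the authors' software: it only counts base-pair stacks and ignores end effects and any other terms a full thermodynamic model would include. The function names are hypothetical, and the ACA/AGA contrast case is an illustration that does not come from the manuscript.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def nearest_neighbor_pairs(seq):
    """Count the nearest-neighbor doublets of the duplex formed by seq and
    its perfect complement. A doublet and its reverse complement describe
    the same base-pair stack read from opposite strands, so each doublet
    is stored under whichever spelling sorts first."""
    pairs = Counter()
    for i in range(len(seq) - 1):
        doublet = seq[i:i + 2]
        pairs[min(doublet, reverse_complement(doublet))] += 1
    return pairs


# H63D context from the text: wild type 5'-TCA-3', mutant 5'-TGA-3'.
print(nearest_neighbor_pairs("TCA") == nearest_neighbor_pairs("TGA"))  # True
# Hypothetical non-symmetric context for contrast (not from the paper).
print(nearest_neighbor_pairs("ACA") == nearest_neighbor_pairs("AGA"))  # False
```

Equal doublet counts mean that a nearest-neighbor model assigns both homoduplexes the same stacking contributions, which is why their melting curves are expected to coincide.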

MATERIALS AND METHODS

Oligonucleotides were obtained from Integrated DNA Technologies and quantified by absorbance at 260 nm (A260). Four oligonucleotides, of sequence CCAGCTGTTCGTGTTCTATGATXATGAGAGTCGCCGTGTG and its complement CACACGGCGACTCTCATYATCATAGAACACGAACAGCTGG, where X and Y were either C or G, were purified by HPLC. Homoduplexes (X=C, Y=G or X=G, Y=C) or heteroduplexes (X=C, Y=C or X=G, Y=G) of the HFE 187C>G SNP were formed by binary combinations.

Nine whole blood samples (from three different people of each of the three HFE genotypes: wild type, homozygous 187C>G, and heterozygous 187C>G) were obtained from ARUP Laboratories after identifications were removed. Human genomic DNA was extracted from these samples (QIAamp DNA Blood Kit, QIAGEN), concentrated by ethanol precipitation, and quantified by A260. One of the wild type samples was selected as the reference and mixed with the other samples prior to PCR. The final fraction of reference DNA ranged from 0 to 1. For each DNA sample, 21 different reference fractions and the original unmixed sample were amplified by PCR and analyzed by high-resolution melting and TGCE.

PCR

For high-resolution melting analysis, 40 bp products were amplified in a LightCycler (Roche) [13]. Ten-microliter reaction mixtures consisted of 25 ng of genomic DNA, 3 mM MgCl2, 1x LightCycler FastStart DNA Master Hybridization Probes master mix, 1x LCGreen Plus (Idaho Technology), 0.5 µM forward (CCAGCTGTTCGTGTTCTATGAT) and reverse (CACACGGCGACTCTCAT) primers, and 0.01 U/µl E. coli UNG (Roche). The PCR was initiated with a 10 min hold at 50 °C followed by a 10 min hold at 95 °C. Rapid thermal cycling was performed between 85 °C and 63 °C at a programmed transition rate of 20 °C/s for 40 cycles. After denaturation at 94 °C and rapid cooling to 40 °C, a melting curve was generated on the LightCycler at 0.05 °C/s to confirm the presence of amplicon. All DNA samples with a common reference fraction were amplified together, along with two heterozygous control samples with no added reference.

For TGCE analysis, a 242 bp product was amplified on an ABI thermal cycler. Reaction components were as given above, except that 0.4 µM forward (CACATGGTTAAGGCCTGTTG) and reverse (GATCCCACCCTTTCAGACTC) primers were used. The PCR was initiated with a 10 min hold at 25 °C and a 6 min hold at 95 °C. Thermal cycling consisted of a 30 s hold at 94 °C, a 30 s hold at 62 °C, and a 1 min hold at 72 °C for 40 cycles, followed by a 7 min hold at 72 °C. The samples were then heated to 95 °C for 5 min, followed by cooling over 60 min to 25 °C for heteroduplex formation.

Analysis by high-resolution melting

Synthesized oligonucleotide duplexes (0.5 µM) were melted in PCR solution (see above). Prior to high-resolution melting analysis, all samples (synthesized and PCR generated) were heated to 94 °C and cooled to 40 °C at a programmed rate of 20 °C/s. High-resolution melting curves were obtained with an HR-1 instrument (Idaho Technology) by melting at a rate of 0.3 °C/s between 65 °C and 85 °C. Data were analyzed by custom software written in LabView as previously described [7]. After normalization and removal of background fluorescence, data were usually temperature-shifted by superimposing their high-temperature regions, where only the most stable homoduplexes remain. The difference between melting curves was quantified as the maximum vertical distance between curves. The heteroduplex proportion of a curve was estimated by this distance (from wild type), normalized so that unmixed heterozygous samples equaled 0.5. Difference plots of the temperature-shifted data were created by subtracting the average of wild type curves from sample curves to highlight the variation between genotypes.
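The distance-based estimate of heteroduplex proportion described for the melting analysis can be illustrated on synthetic data. This is a hedged sketch rather than the LabView software used in the study: the sigmoid curve shape, the melting temperatures, and all function names are assumptions made for illustration, and background removal and temperature shifting are omitted. The mixture curve is built as a weighted average of a homoduplex and a heteroduplex curve, which is the assumption the model itself makes in the Discussion.

```python
import math


def sigmoid_melt(temps, tm, width=0.8):
    """Idealized normalized melting curve: fraction of duplex remaining."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def mix(homo, hetero, het_fraction):
    """Weighted average of homoduplex and heteroduplex curves."""
    return [(1 - het_fraction) * a + het_fraction * b
            for a, b in zip(homo, hetero)]


def max_vertical_distance(curve, reference):
    """Largest absolute difference between two curves on the same grid."""
    return max(abs(c - r) for c, r in zip(curve, reference))


def heteroduplex_proportion(curve, wild_type, unmixed_het):
    """Distance from wild type, scaled so an unmixed heterozygote reads 0.5."""
    scale = 0.5 / max_vertical_distance(unmixed_het, wild_type)
    return scale * max_vertical_distance(curve, wild_type)


temps = [65 + 0.05 * i for i in range(401)]  # 65 to 85 degrees C
homo = sigmoid_melt(temps, tm=78.0)          # hypothetical homoduplex Tm
hetero = sigmoid_melt(temps, tm=75.0)        # hypothetical, about 3 degrees lower
wild_type = homo
unmixed_het = mix(homo, hetero, 0.5)
sample = mix(homo, hetero, 12 / 49)          # heteroduplex content to recover

estimate = heteroduplex_proportion(sample, wild_type, unmixed_het)
print(round(estimate, 4))  # 0.2449
```

Because every synthetic curve differs from wild type by a multiple of the same shape, the scaled distance recovers the heteroduplex fraction that was mixed in; real data would add noise and instrument effects that this sketch does not model.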

Analysis by TGCE

PCR amplicons were transferred to 24-well TGCE trays, diluted 1:1 with 1x FastStart Taq polymerase PCR buffer, and overlaid with mineral oil. TGCE was performed using the Reveal mutation discovery system, reagents, and Revelation software (Spectrumedix) as previously described [16]. DNA samples were injected electrokinetically at 2 kV for 45 seconds, resulting in peak heights ranging from 5,000 to 40,000 intensity units with ethidium bromide staining. Optimal results were obtained when the temperature was ramped over 21 minutes and data were acquired over 35 minutes. Sequential camera images were converted to plots of image frame number (time) versus intensity units (DNA concentration). TGCE data were fit to exponential decay distributions, one for each peak. Each exponential distribution was one-sided (i.e., zero to the left of the initial peak), and an iterative solution for the amplitudes and decay rates of successive peaks was obtained. The heteroduplex proportion was obtained by dividing the sum of the amplitudes of the two heteroduplex peaks by the sum of all peaks. This ratio was then adjusted by a factor (close to 1) so that heteroduplex controls were equal to the expected value of 0.5.
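The final two steps of the TGCE quantification, the amplitude ratio and the control-based adjustment, amount to simple arithmetic. The sketch below assumes the peak amplitudes have already been obtained from the exponential fits, which it does not attempt to reproduce. All amplitudes are made-up numbers for illustration, and the function names are hypothetical.

```python
def heteroduplex_ratio(homoduplex_amplitudes, heteroduplex_amplitudes):
    """Sum of heteroduplex peak amplitudes divided by the sum of all peaks."""
    hetero = sum(heteroduplex_amplitudes)
    return hetero / (hetero + sum(homoduplex_amplitudes))


def control_factor(control_ratios, expected=0.5):
    """Factor that maps the mean control ratio onto the expected value."""
    return expected / (sum(control_ratios) / len(control_ratios))


# Hypothetical fitted amplitudes in intensity units (illustrative only).
controls = [
    heteroduplex_ratio([5200, 5200], [4800, 4800]),  # unmixed heterozygote
    heteroduplex_ratio([2600, 2600], [2400, 2400]),  # second control
]
factor = control_factor(controls)

raw = heteroduplex_ratio([7600, 7600], [2400, 2400])
adjusted = raw * factor
print(round(raw, 3), round(factor, 4), round(adjusted, 3))  # 0.24 1.0417 0.25
```

Averaging the control ratios before computing the factor is a design choice of this sketch; the manuscript states only that the factor is close to 1 and brings the heteroduplex controls to 0.5.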

RESULTS

Experimental melting curves of the thermodynamically symmetric SNP, HFE H63D, are shown in Figs. 1 and 2. Duplexes melted in Fig. 1 were formed by annealing equal concentrations of synthetic 40-mer oligonucleotides. After normalization of the melting curves between 0 and 100% fluorescence, both homoduplexes (wild type TCA and mutant TGA) show identical melting transitions, as predicted by nearest-neighbor thermodynamics. Melting curves of the two pure heteroduplexes are distinguishable from each other and destabilized by about 3 °C from the homoduplexes. A mixture of all four strands in equal concentrations simulates melting of a heterozygous DNA sample after PCR amplification, resulting in a composite melting curve between the homoduplex and heteroduplex melting curves.

Fig. 2A shows the melting curves produced from amplifying the three possible genotypes of H63D from human genomic DNA. Similar to the study with synthetic oligonucleotides, the homozygous mutant and wild type samples are superimposed. The melting curve that results from amplification of heterozygous DNA includes both homoduplex and heteroduplex components, resulting in a curve very similar to that observed when all synthetic duplexes are mixed (Fig. 1). Fig. 2B shows the same data presented as a difference plot where each sample is subtracted from the average wild type curve. Although heterozygotes are readily identified on such plots, the two different homozygotes cannot be distinguished.

When a homozygous reference DNA is added to each sample before PCR, all genotypes can usually be distinguished, depending on the fraction of reference DNA present. Fig. 3 shows both the theoretical prediction and experimental data correlating the fraction of reference wild type DNA (x) to the resulting heteroduplex proportion. A detailed derivation of our heteroduplex model is presented as Supplementary Material. If the sample genotype is wild type, the mixture has a heteroduplex content of zero for all values of x. If the sample is homozygous mutant, the mixture has heteroduplex content $m(x) = 2x(1-x)$, shown as a symmetric curve with a peak at x = 0.5 in Fig. 3. This formula agrees with the intuition that when x is 0 or 1, the heteroduplex content is 0, and when x is 0.5, equal amounts of wild type DNA and mutant homozygous DNA result in a heteroduplex content of 0.5. If the sample is heterozygous, the mixture has heteroduplex content $h(x) = \tfrac{1}{2}(1 - x^2)$. This formula predicts a steadily decreasing curve with a maximum of 0.5 for the heteroduplex content of a heterozygote without added reference (Fig. 3).

The reference DNA fraction predicted to maximize the smallest difference in heteroduplex content between any two genotypes is x = 1/7, shown as the thin vertical line in Fig. 3. At x = 1/7, the heteroduplex content of the homozygous mutant sample, $m(1/7) = 12/49$, is exactly half of the heteroduplex content of the heterozygous sample, $h(1/7) = 24/49$.

Experimental melting curve data confirming the model are also displayed in Fig. 3. The peak heights on difference plots (e.g., Fig. 2B) were used to establish the heteroduplex proportion resulting from the addition of various fractions of wild type reference DNA. These values were scaled so that the heteroduplex proportion of unmixed heterozygotes was 0.5, allowing a direct comparison of the experimental values to the model. The correlation between the experiment and the model satisfies $R^2 > 0.99$ for the heterozygous samples and $R^2 > 0.98$ for the homozygous samples.

Melting curves of the different H63D genotypes at various values of reference DNA fraction are shown in Figs. 4 and 5. Fig. 4 shows the melting curves corresponding to the optimal reference DNA fraction, 4/28 = 1/7. Melting curves of the three genotypes are well separated and easily classified by the observer's eye or by automatic hierarchical clustering. Difference plots (Fig. 4B) show that the peak height of the homozygous mutant genotype is approximately half that of the heterozygous genotype, as predicted by theory and Fig. 3.
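The two formulas above can be recovered from a single expression. The sketch below is one way to see this and is not taken from the Supplementary Material: it assumes, as the Discussion does, that PCR preserves strand proportions and that strands reanneal at random, so a mixture in which a fraction p of strands carries the wild type allele has heteroduplex content 2p(1-p). Function and variable names are hypothetical, and exact fractions are used so the values at x = 1/7 print without rounding.

```python
from fractions import Fraction


def heteroduplex_content(p):
    """Heteroduplex fraction after random reannealing when a fraction p of
    the strands carries the wild-type allele: 2 * p * (1 - p)."""
    return 2 * p * (1 - p)


def wild_type_strand_fraction(x, genotype):
    """Wild-type allele fraction of a mixture made of reference (wild-type)
    fraction x and sample fraction 1 - x."""
    sample_wt = {
        "wild type": 1,
        "heterozygous": Fraction(1, 2),
        "homozygous mutant": 0,
    }
    return x + (1 - x) * sample_wt[genotype]


def m(x):
    """Homozygous mutant sample, as given in the text."""
    return 2 * x * (1 - x)


def h(x):
    """Heterozygous sample, as given in the text."""
    return (1 - x * x) / 2


x = Fraction(1, 7)
for genotype in ("wild type", "heterozygous", "homozygous mutant"):
    p = wild_type_strand_fraction(x, genotype)
    print(genotype, heteroduplex_content(p))
# wild type 0
# heterozygous 24/49
# homozygous mutant 12/49
```

Requiring equal spacing of the three contents, h(x) = 2m(x), reduces to (1 + x)/2 = 4x for x < 1, whose solution is x = 1/7.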

Fig. 5 shows melting curves corresponding to suboptimal reference DNA fractions. Fig. 5A and B show melting curves at two reference DNA fractions where, as predicted by Fig. 3, it is difficult to differentiate homozygous mutant and heterozygous samples. Fig. 5C shows the melting curves at a reference DNA fraction of 19/28, near the value predicted to give the best separation among reference DNA fractions greater than 1/3. Although the homozygous and heterozygous mutant genotypes are distinguishable, much better separation is obtained at the optimal reference DNA fraction of 1/7.

The dependence of heteroduplex proportion on the reference DNA fraction was also studied by TGCE. Results similar to melting analysis were obtained, with a correlation between model and experiment of $R^2 > 0.97$ for both heterozygous and homozygous samples (data not shown). TGCE traces at the theoretically optimal reference DNA fraction of 1/7 are shown in Fig. 6. The heteroduplex peaks of the mixtures with homozygous mutant DNA appear approximately halfway between those of the heterozygous and wild type genotypes. Replicates of a common genotype cluster well, though not as perfectly as their melting curves. Mixtures with suboptimal reference DNA fractions, including 9/28, failed to distinguish homozygous and heterozygous mutant samples, similar to results obtained by melting analysis (data not shown).
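The contrast between optimal and suboptimal fractions can be made concrete by computing, for each reference fraction, the smallest gap between the three predicted heteroduplex contents. This is an illustrative sketch under the model's formulas only. The grid of 28ths is an assumption suggested by the fractions quoted in the text (4/28, 9/28, 19/28), and the function names are hypothetical.

```python
from fractions import Fraction


def contents(x):
    """Predicted heteroduplex contents for wild type, heterozygous, and
    homozygous mutant samples at wild-type reference fraction x."""
    return (Fraction(0), (1 - x * x) / 2, 2 * x * (1 - x))


def min_separation(x):
    """Smallest gap between any two of the three predicted contents."""
    a, b, c = sorted(contents(x))
    return min(b - a, c - b)


grid = [Fraction(k, 28) for k in range(29)]
best = max(grid, key=min_separation)
print(best, min_separation(best))  # 1/7 12/49

# Fractions discussed in the text: close to the crossing at x = 1/3, and
# close to the secondary optimum at x = 2/3.
for x in (Fraction(9, 28), Fraction(19, 28), Fraction(2, 3)):
    print(x, float(min_separation(x)))
```

Under these formulas the heterozygous and homozygous mutant curves cross at x = 1/3, so a fraction such as 9/28 leaves almost no gap between them, while the best separation above 1/3 (at x = 2/3, a gap of 1/6) is still smaller than the 12/49 obtained at 1/7.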

DISCUSSION

The objective of this study was to find the optimal fraction of homozygous reference DNA to add to an unknown sample before PCR in order to distinguish all genotypes by quantitative heteroduplex analysis. Theoretical considerations suggest that this optimal fraction is 1/7, and a detailed derivation is provided as Supplementary Material. High-resolution melting analysis [13] and TGCE [5] were used to provide experimental verification of the theory.

Our quantitative heteroduplex model is based on three assumptions. First, when two genotypes of DNA are mixed, we assume that PCR generates all four strands with equal efficiency. Therefore, at the end of PCR, the relative proportion of these strands, whether in homoduplex or heteroduplex form, is the same as that of the corresponding strands in the initial template. Second, when PCR products are denatured and annealed after amplification, the proportion of homoduplex and heteroduplex species formed is given by the product of the relative concentrations of each strand, i.e., there is no preference for association with exactly vs. nearly complementary strands. Finally, the model assumes that the normalized melting curve resulting from such a mixture of duplexes is given by the weighted average of the melting curves corresponding to each duplex, with coefficients given by their relative proportion in the mixture.

As a consequence of this model, the expected difference between two melting curves is the difference in their heteroduplex proportions multiplied by a universal curve. The heteroduplex proportion depends on the genotype and reference DNA fraction, while the universal curve does not. The universal curve is the difference between the mean homoduplex and mean heteroduplex melting curves. The temperature at the maximum difference between curves is constant.
However, the magnitude of that difference and the area between curves are directly proportional to the difference in the heteroduplex proportion between samples. That is, for any particular PCR product and SNP, the position and shape of the difference curve remain constant at different allele fractions, while its magnitude and area are proportional to the difference in heteroduplex proportions. These predictions are supported by data showing that the shape and position of the difference curves remain nearly constant, while the peak height and area depend on the heteroduplex proportion difference.

For experimental demonstration, an SNP with nearest-neighbor thermodynamic identity between the wild type and mutant homozygous DNA was chosen. The added reference DNA was wild type, though mutant homozygous DNA could have been used without affecting the results. Adding reference DNA to samples before PCR allows complete genotyping if the amounts of heteroduplexes formed after PCR are sufficiently different between all three genotypes. When wild type reference DNA is mixed with wild type samples, no heteroduplexes result. However, when wild type DNA is added to heterozygous or homozygous mutant DNA, the amount of heteroduplexes formed after PCR depends on the genotype and the fraction of wild type DNA present. The melting curve of a heteroduplex-enhanced homozygous mutant sample moves away from the stationary wild type melting curve, while the melting curve of a heteroduplex-reduced heterozygous sample moves toward the other two and eventually crosses between them.

Both melting analysis and TGCE results follow the quadratic behavior of the model qualitatively and quantitatively over the full range of reference DNA fractions. The optimal fraction of reference DNA for genotype discrimination by theory and experiment was 1/7. This fraction should be independent of amplicon size and type of sequence variant.

TGCE is not usually used for heteroduplex quantification. The two heteroduplexes are delayed several frames compared to homoduplexes, and may be separated from each other as well.
TGCE peaks exhibit simple mathematical behavior, which makes it possible to separate and quantify the relative contributions of the heteroduplexes. Agreement between the results of TGCE analysis, melting curve experiments, and theory not only confirms the theory but also indicates that both of these methods can be used to estimate the heteroduplex content of samples mixed with reference DNA or of pooled samples.

Most heteroduplex-based techniques do not detect homozygous sequence variants. An exception is high-resolution melting analysis, where most, but not all, homozygous SNPs can be detected [13]. The technique is attractive as a closed-tube method that allows both scanning for any sequence variant and specific genotyping without probes [7, 8, 13]. Scanning sensitivity and specificity approach 100% for PCR products less than 400 bp [8]. The utility of high-resolution amplicon melting for genotyping was recently demonstrated in one study where 21 out of 21 pairs of different heterozygotes within the same amplicon were distinguishable [17]. However, some homozygous variants (4-16% of human SNPs) have very similar, if not identical, melting transitions compared to wild type [13]. Detection of these SNPs usually requires the addition of a reference amplicon after PCR is completed, followed by a second heteroduplex analysis. This study demonstrates that complete genotyping in a single step is possible if a reference homozygous DNA is added to the unknown samples before PCR and quantitative heteroduplex analysis is performed. It provides an experimentally validated model for performing this analysis, which also determines the optimal amount of reference DNA to be added.

Acknowledgments

This research was supported by grants from the State of Utah Centers of Excellence Program, Associated Regional and University Pathologists (ARUP), and NIH grants GM and GM. We would also like to thank Noriko Kusukawa for reviewing the manuscript.

References

[1] W.E. Highsmith Jr., Q. Jin, A.J. Nataraj, J.M. O'Connor, V.D. Burland, W.R. Baubonis, F.P. Curtis, N. Kusukawa, M.M. Garner, Use of a DNA toolbox for the characterization of mutation scanning methods. I: construction of the toolbox and evaluation of heteroduplex analysis, Electrophoresis 20(6) (1999)
[2] G. Ruano, K.K. Kidd, Modeling of heteroduplex formation during PCR from mixtures of DNA templates, PCR Methods Appl. 2 (1992)
[3] G. Ruano, A.S. Deinard, S. Tishkoff, K.K. Kidd, Detection of DNA sequence variation via deliberate heteroduplex formation from genomic DNAs amplified en masse in "population tubes", PCR Methods Appl. 4 (1994)
[4] W. Xiao, P.J. Oefner, Denaturing high-performance liquid chromatography: A review, Hum. Mutat. 17(6) (2001)
[5] Q. Li, Z. Liu, H. Monroe, C.T. Culiat, Integrated platform for detection of DNA sequence variants using capillary array electrophoresis, Electrophoresis 23(10) (2002)
[6] C.N. Gundry, J.G. Vandersteen, G.H. Reed, R.J. Pryor, J. Chen, C.T. Wittwer, Amplicon melting analysis with labeled primers: a closed-tube method for differentiating homozygotes and heterozygotes, Clin. Chem. 49(3) (2003)
[7] C.T. Wittwer, G.H. Reed, C.N. Gundry, J.G. Vandersteen, R.J, High-resolution genotyping by amplicon melting analysis using LCGreen, Clin. Chem. 49(6 Pt 1) (2003)
[8] G.H. Reed, C.T. Wittwer, Sensitivity and specificity of single-nucleotide polymorphism scanning by high-resolution melting analysis, Clin. Chem. 50(10) (2004)
[9] J.T. McKinney, N. Longo, S.H. Hahn, D. Matern, P. Rinaldo, A.W. Strauss, S. Dobrowolski, Rapid, comprehensive screening of the human medium chain acyl-CoA dehydrogenase gene, Mol. Genet. Metab. 82(2) (2004)
[10] C. Willmore, J.A. Holden, L. Zhou, S. Tripp, C.T. Wittwer, L.J. Layfield, Detection of c-kit-activating mutations in gastrointestinal stromal tumors by high-resolution amplicon melting analysis, Am. J. Clin. Pathol. 122(2) (2004)
[11] S.F. Dobrowolski, J.T. McKinney, C. Amat di San Filippo, K. Giak Sim, B. Wilcken, N. Longo, Validation of dye-binding/high-resolution thermal denaturation for the identification of mutations in the SLC22A5 gene, Hum. Mutat. 25(3) (2005)
[12] L. Zhou, J. Vandersteen, L. Wang, T. Fuller, M. Taylor, B. Palais, C.T. Wittwer, High-resolution DNA melting curve analysis to establish HLA genotypic identity, Tissue Antigens 64(2) (2004)
[13] M. Liew, R. Pryor, R. Palais, C. Meadows, M. Erali, E. Lyon, C. Wittwer, Genotyping of single-nucleotide polymorphisms by high-resolution melting of small amplicons, Clin. Chem. 50(7) (2004)
[14] K.J. Breslauer, R. Frank, H. Blocker, L.A. Marky, Predicting DNA duplex stability from the base sequence, Proc. Natl. Acad. Sci. USA 83 (1986)
[15] N. Peyret, P.A. Seneviratne, H.T. Allawi, J. SantaLucia Jr., Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A:A, C:C, G:G, and T:T mismatches, Biochemistry 38 (1999)
[16] R.L. Margraf, et al., Genotyping hepatitis C virus by heteroduplex mobility analysis using temperature gradient capillary electrophoresis, J. Clin. Microbiol. 42(10) (2004)
[17] R. Graham, M. Liew, C. Meadows, E. Lyon, C.T. Wittwer, Distinguishing different DNA heterozygotes by high-resolution melting, Clin. Chem. May 19 (2005) [Epub ahead of print].

Figure Legends

Figure 1. Melting curves of synthesized oligonucleotide homoduplexes, heteroduplexes and their mixture for an SNP with nearest neighbor symmetry. Normalized melting curves of synthetic 40 bp homoduplexes and heteroduplexes are shown. The sequences of the duplexes are the same as the HFE H63D (187C>G) PCR products used in the rest of the study. Binary mixtures of strands with complementary bases (C:G or G:C) at the SNP form homoduplexes that have identical melting curves because of nearest neighbor symmetry. Strands with mismatched base pairs at the SNP form heteroduplexes with destabilized melting curves (C:C less stable than G:G). Equal parts of all strands were mixed, and the resulting melting curve in the center resembles that of the genomic heterozygous sample after PCR amplification (see Fig. 2).

Figure 2. Melting curves of an SNP with nearest neighbor symmetry after PCR amplification. The same sequence as in Fig. 1 was amplified from genomic DNA by PCR, and melting curves were obtained for wild type (WT), mutant homozygous (MUT) and heterozygous (HET) genotypes. For each genotype, three different DNA samples are displayed. In panel A, normalized, temperature-shifted melting curves are displayed. All six homozygous samples appear to trace the same path. Melting curves of heterozygous samples show a low temperature transition from the contribution of heteroduplexes, as predicted from Fig. 1. In panel B, the difference of each curve from the mean wild type curve is displayed.

Figure 3. The dependence of heteroduplex proportion on genotype and the fraction of added wild type DNA. When wild type reference DNA is mixed with homozygous mutant (MUT) or heterozygous (HET) DNA, the resultant heteroduplex proportion depends on the fraction of reference DNA (x) in the mixture. For homozygous mutant DNA, the heteroduplex proportion is 2x(1-x), shown in the figure as a solid curve. For heterozygous DNA, the heteroduplex proportion is ½(1-x²), shown as a dashed curve. The optimal separation between genotypes occurs when the reference DNA fraction is 1/7 (left vertical line), giving heteroduplex proportions of 0/49 (WT), 12/49 (MUT) and 24/49 (HET). When the reference DNA fraction is 1/3 (right vertical line), both MUT and HET samples have the same heteroduplex proportion (4/9) and cannot be differentiated. Detailed derivations are provided as Supplementary Material. Experimental high-resolution melting data for homozygous mutant (circles) and heterozygous (squares) samples are shown as the mean and standard deviation of triplicate samples. These values are the maximum fluorescence difference between the wild type melting curves and each mixture, scaled to make unmixed heterozygous samples equal to 0.5. This scaling allows for direct comparison to predicted heteroduplex proportions.

Figure 4. Melting curve analysis of an SNP with nearest neighbor symmetry after addition of an optimal amount of wild type DNA and PCR amplification. Triplicate samples of each genotype (WT = wild type, HET = heterozygous, MUT = homozygous mutant) were mixed with wild type DNA to obtain a reference DNA fraction of 1/7. After amplification, melting, and normalization, mutant homozygous curves were equidistant from the wild type curves and the barely altered heterozygous curves. All samples of the same genotype appear as single curves because of the reproducibility of high-resolution analysis. Individual curves can be seen in panel B, where the difference of each curve from the mean wild type curve is displayed. The heterozygous and homozygous curves have the same shape and peak position, while their magnitudes differ by a factor of two, as predicted (Fig. 3). The heteroduplex content of the homozygous mutant mixture is 0.5 times that of the heterozygous mixture.

Figure 5. Melting curve analysis of an SNP with nearest neighbor symmetry after addition of suboptimal amounts of wild type DNA and PCR amplification. Three samples of each genotype (WT = wild type, HET = heterozygous, MUT = homozygous mutant) were mixed with different amounts of wild type DNA to obtain different suboptimal reference DNA fractions. Wild type DNA fractions of 9, 14 19, and are shown in Panels A, B, and C, respectively, and can be compared to the predicted heteroduplex proportions of Fig. 3. Samples of the same genotype usually appear as single curves because of the reproducibility of high-resolution analysis.

Figure 6. TGCE data of an SNP with nearest neighbor symmetry after addition of an optimal amount of wild type DNA and PCR amplification. Triplicate samples of each genotype were mixed with wild type DNA to obtain a reference DNA fraction of 1/7. After amplification and separation by TGCE, data were normalized by shifting and scaling the largest peaks to a common location and magnitude. Mutant homozygous curves were approximately equidistant from the wild type curves and the barely altered heterozygous curves.
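The heteroduplex proportions quoted in the Figure 3 legend can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the function names are hypothetical, "best separation" is assumed to mean the largest possible value of the smallest gap between adjacent genotypes, and the grid search over x is an illustrative stand-in for the analytical derivation in the Supplementary Material.

```python
from fractions import Fraction


def heteroduplex_proportion(genotype, x):
    """Predicted heteroduplex proportion for a sample mixed with wild type
    reference DNA at fraction x (formulas from the Figure 3 legend)."""
    if genotype == "WT":
        return Fraction(0)        # identical strands only, no heteroduplexes
    if genotype == "MUT":
        return 2 * x * (1 - x)    # solid curve in Figure 3
    if genotype == "HET":
        return (1 - x * x) / 2    # dashed curve in Figure 3
    raise ValueError(f"unknown genotype: {genotype}")


def min_separation(x):
    """Smallest gap between adjacent genotypes at reference fraction x
    (assumed separation criterion, see the lead-in)."""
    values = sorted(heteroduplex_proportion(g, x) for g in ("WT", "MUT", "HET"))
    return min(values[1] - values[0], values[2] - values[1])


def best_reference_fraction(steps=420):
    """Grid search over x = k/steps using exact rational arithmetic."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return max(grid, key=min_separation)


if __name__ == "__main__":
    x_opt = best_reference_fraction()
    print("optimal reference fraction:", x_opt)
    for g in ("WT", "MUT", "HET"):
        print(g, heteroduplex_proportion(g, x_opt))
```

With the default grid, the search returns 1/7, where the proportions are 0, 12/49 and 24/49. Evaluating both formulas at x = 1/3 gives 4/9 for MUT and HET alike, which is the crossover at which the two genotypes cannot be told apart.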





Supplementary Material

Here we provide details of our model for quantitative heteroduplex analysis of SNPs. First we determine an expression for the difference curve obtained by subtracting the melting curves from samples of different genotypes mixed with homozygous reference DNA at fraction x. This curve is given by the difference in heteroduplex fractions of the mixtures multiplied by the difference between a weighted average of the homoduplex curves and the mean of the heteroduplex curves. When the melting curves of wild type and mutant homozygotes are indistinguishable, the weighted average of the homoduplex curves is just the common homozygote curve. In this case the reference DNA fraction that maximizes separation of the difference curves of all three genotypes of bi-allelic diploid DNA is x = 1/7.

Samples of different genotypes are designated W for wild type, M for homozygous mutant, and H for heterozygous mutant. The forward and reverse strands of the wild type amplicon are designated w and w′, respectively, with m and m′ for the homozygous mutant strands. We assume that the genotype of the reference DNA is wild type. We assume amplification preserves the relative proportion of all strand species. Let x be the homozygous reference DNA fraction at the beginning of PCR. If all strands are fully extended during PCR and no strand re-association occurs, Table 1 gives the homoduplex fractions at the end of PCR.

Table 1. Homoduplex fractions without strand re-association

Genotype    [ww′]                      [mm′]
W           1                          0
M           x                          1-x
H           x + ½(1-x) = ½(1+x)        ½(1-x)
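The entries of Table 1 follow from adding the reference wild type fraction x to the wild type share of the remaining sample DNA. The sketch below reproduces them with exact rational arithmetic; the function name and the dictionary of wild type shares are illustrative assumptions, not part of the original material.

```python
from fractions import Fraction

# Wild type share of the unknown sample itself, by genotype code from the text:
# W = wild type, M = homozygous mutant, H = heterozygous.
SAMPLE_WILD_SHARE = {"W": Fraction(1), "M": Fraction(0), "H": Fraction(1, 2)}


def homoduplex_fractions(genotype, x):
    """Return (wild type, mutant) homoduplex fractions at the end of PCR,
    assuming no strand re-association and a wild type reference at fraction x."""
    x = Fraction(x)
    wild = x + (1 - x) * SAMPLE_WILD_SHARE[genotype]
    return wild, 1 - wild


if __name__ == "__main__":
    for genotype in ("W", "M", "H"):
        wild, mutant = homoduplex_fractions(genotype, Fraction(1, 7))
        print(genotype, wild, mutant)
```

At x = 1/7 the heterozygous row gives 4/7 wild type and 3/7 mutant homoduplexes, in agreement with ½(1+x) and ½(1-x).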