APPLICATION NOTE

Size: px
Start display at page:

Download "APPLICATION NOTE"

Transcription

1 APPLICATION NOTE NGS Library Preparation Produces Balanced, Comprehensive Methylome Coverage from Low Input Quantities ABSTRACT Next-generation sequencing (NGS) of bisulfite-converted DNA to detect methylation status with per-base resolution is currently restricted by input requirements, requiring at least 50 ng of DNA. This Application Note shows how the unique chemistry of the DNA Library Kit enables the construction of high complexity libraries for: Genome-wide methylation analysis from 5 ng of human cell-free DNA (cfdna). Targeted sequencing of ng of human genomic DNA using SeqCap Epi CpGiant hybridization capture. Direct sequencing of 10 ng human genomic DNA. Direct sequencing of 1 ng of genomic DNA from the model organism Arabidopsis thalania. By utilizing the Library Kit, scientists will no longer be forced to sacrifice data when using low quantities of starting material. Introduction Epigenetic research has been impeded by the amount of starting material required to perform whole genome bisulfite sequencing (WGBS) and targeted sequencing of bisulfite-converted DNA samples. Large input requirements preclude sequencing of samples with limited genomic copies, which can arise when tissue collection is constrained or the sample is restricted to a small number of cells to avoid heterogeneity. To address this limitation, Swift Biosciences has reduced the amount of input DNA by up to 100X compared to other commercially available library preparation kits. In addition to lowering input quantity requirements, the efficient repair and adaptation of template DNA by Swift Biosciences technology also enables improved overall genomic coverage across multiple input quantities, thereby granting researchers more high quality data at any input. The Kit is based on Swift s Adaptase technology which enables unbiased adapter attachment to single-stranded bisulfite-converted DNA (Figure 1A). This preferable workflow eliminates significant (up to 90%) library loss associated with workflows where the NGS library is constructed prior to bisulfite conversion (Figure 1B). The templateindependent adapter attachment of the Kit also provides significant improvements in methylome coverage uniformity compared to methods that incorporate adapters by random priming of single-stranded DNA (Figure 1C).

2 Figure 1: Workflow for Three Library Preparations Evaluated (A) ACCEL-NGS METHYL-SEQ (B) TRADITIONAL (C) RANDOM PRIMING dsdna Fragment dsdna Fragment Unfragmented dsdna Bisulfite Conversion End Repair, Tailing and Ligation Reactions Bisulfite Conversion Adaptase Extension and Ligation Reactions Unconverted Library Molecules Bisulfite Conversion Broken Library Molecules Converted Library Molecule Primed DNA Synthesis ly Primed Fragment Inefficiently Primed Fragment 3 Tagging Functional Library Molecules Functional Library Molecule Functional Library Molecule For both and random priming kits, bisulfite conversion is performed prior to library construction. With the traditional library kit, bisulfite conversion is performed on the completed library. The lightning bolts represent bisulfite-induced fragmentation, NGS adapters are depicted in green and blue, and non-uracil containing library products are shown in yellow. METHODS OVERVIEW Bisulfite conversion was performed using the EZ DNA Methylation-Gold Kit 1 according to supplier s instructions. Bisulfite-converted DNA was quantified using the RNA setting on a NanoDrop instrument, and final libraries were quantified by qpcr. Sequencing data was aligned to its respective reference genome using BSMAP and normalized to a specified number of reads for comparison purposes 2. Bisulfite-Conversion Control (BCC) or unmethylated Lambda DNA (Promega Cat. No. D1521) was added to each sample to measure bisulfite conversion efficiency. For all samples, conversion efficiency was consistently 99%. Genome-Wide Hypomethylation Examining whole genome methylation density (MD) from 5 ng of cfdna, the Library Kit was used to construct libraries from cfdna obtained from the blood plasma of 8 cancer patients and 5 healthy controls. Libraries were PCR amplified with 7 cycles of PCR and sequenced on a MiSeq using V3 chemistry and 2 x 75 read length. Each sample was normalized to 10 million total reads, which provided sufficient coverage to determine repetitive element MD across the genome. Percent hypomethylation was calculated by comparing the MD of 1 Mb bins to the average MD of those from the 5 healthy controls. Bins were assigned as hypomethylated if the MD was > 3 SD below that of the 5 healthy controls (Table 1 and Figure 2). 2

3 Table 1: Genome-wide Hypomethylation in Cancer Samples Sample Pathology % Hypomethylation 1 Fallopian tube high-grade papillary serous carcinoma pt3c N1 with 2 nodes involved by micrometasasis cm ovarian borderline serous content (cancer-like) Recurrent pt2, pn0 mammary carcinoma, 2.15 cm pt1/pn1 pancreatic adenocarcinoma with neoadjuvant therapy Metastatic colon cancer to the liver (previously treated) cm ovarian borderline serous content (cancer-like) Colon-cancer, non-resectable Adenocarcinoma T4a by imaging Metastatic colorectal adenocarcinoma with liver metastasis, 2 cm primary 43.4 Percent hypomethylation of 8 cancer samples was calculated by comparing the MD of 1 Mb bins to the average of the 5 healthy control samples. Bins were assigned as hypomethylated if MD was > 3 SD below the average MD. Figure 2: Genome-wide Hypomethylation of Sample 8 This Circos plot represents the MD of 1 Mb bins across chromosomes 1-22 for Sample 8 (Metastatic colorectal adenocarcinoma with liver metastasis, 2 cm primary). Comparative Analysis of Targeted Bisulfite Sequencing Using Human DNA Targeted sequencing of CpG-rich regions was performed in order to compare sequencing metrics from libraries constructed by either the Methyl- Seq Library Kit or the Roche-Kapa Library Preparation Kit (supplied by Roche-NimbleGen). Human HapMap DNA NA12878 from the Coriell Institute for Medical Research was fragmented to an average 200 bp by Covaris M220. The Kit was used to construct libraries from 100 ng, 10 ng, and 1 ng inputs (all inputs are within specification of the Kit). The Roche-Kapa Library Preparation Kit was used to construct libraries from inputs of 1 µg (within specification) and 10 ng (below specification). Libraries were amplified with 11, 14, and 17 cycles of PCR (Swift Biosciences) and 12 and 18 cycles of PCR (Roche-Kapa). The SeqCap Epi CpGiant Enrichment Kit (Roche-NimbleGen) was used for hybridization capture of libraries. Each sample was normalized to 80 million total reads for direct comparison of performance metrics. To analyze differentially methylated regions (DMRs), 10 ng libraries from both Swift Biosciences and Roche-Kapa (NA12878) were compared to 10 ng libraries from an H1 ES cell line and sequenced to approximately 40X coverage. Libraries were sequenced on a HiSeq 2500 using V4 chemistry and 2 x 125 read length (Table 2 and Figure 3). 3

4 Table 2: Targeted Methylation Sequencing from ng with SeqCap Epi CpGiant input Method % Aligned % on Target % Duplication Mean Coverage % Covered 2X % Covered 20X 100 ng SWIFT x μg Kapa x ng SWIFT x ng Kapa x ng SWIFT x NoT Covered Sequencing metrics for libraries created from Roche-Kapa library prep (1 µg, 10 ng) and Swift s Kit (100 ng, 10 ng, 1 ng). All samples were normalized to 80M reads for comparison. Figure 3: DMRs Called from 10 ng Libraries with Swift s Kit chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 0 Mb 50 Mb 100 Mb 150 Mb 200 Mb 250 Mb hyper hypo DMRs were identified from 10 ng libraries from an H1 ES cell line and a B-lymphocyte cell line (NA12878). Libraries created with the Kit identified 294,130 DMRs (shown above). Libraries created with the Roche-Kapa kit identified only 464 DMRs (not shown). WGBS of Arabidopsis and Human DNA To evaluate coverage of the model organism Arabidopsis thaliana 3, the Library Kit, a traditional dsdna library preparation kit (requiring methylated adapters), and a random priming library preparation kit were used to construct libraries from inputs of 100 ng, 10 ng, or 1 ng. DNA was fragmented to an average 350 bp using a Covaris M220, if specified. Technical replicate libraries were constructed according to suppliers instructions and PCR amplified accordingly. Swift utilized 4, 7, and 10 cycles, respectively 4. For traditional and random priming, the recommended cycling for 100 ng was used plus three additional cycles for each 10X lower input. Amplified libraries were quantified by qpcr and sequenced on a HiSeq 2500 using V4 chemistry and 2 X 125 read length. Each sample was normalized to 30 million total reads for direct comparison of performance metrics (Table 3A). An library was also constructed using 10 ng NA12878 DNA. This sample was processed and analyzed in the same manner as the Arabidopsis samples, with the exception that million reads were analyzed (Table 3B and Figure 4). 4

5 Table 3A: Library Coverage Metrics 100 ng Arabidopsis 10 ng Arabidopsis 1 ng Arabidopsis Sample % Reads Aligned Genome Coverage % Duplicate Reads Estimated Library Size (MILLIONS) Relative Library Size % CpX Missing X Traditional X X X Traditional X X X Traditional X X % CpX Covered 10X Each Arabidopsis file was normalized to 30.2 million reads and data reported as an average of duplicate bisulfite-converted samples. The relative library size is the ratio of alternative methods to the library size. Table 3B: Library Coverage Metrics Sample 10 ng Human % Reads Aligned Genome Coverage % Duplicate Reads X 7.9 1,393 Estimated Library Size (MILLIONS) To assess coverage for the human genome, an library was sequenced using HapMap NA12878 DNA. The human sequencing data was normalized to million reads. Figure 4: Coverage of Unique CpX and CpG Dinucleotide Sequences (A) 10% 9% 8% % CpX Missing 7% 6% 5% 4% 3% 2% 1% 0% 100 ng 10 ng 1 ng Traditional (B) 100% 90% % CpX Covered 10X 80% 70% 60% 50% 40% 30% 20% 10% 0% Traditional 100 ng 10 ng 1 ng (C) 100% 80% 60% 40% 20% 0% % CpG Covered < 1x 2.2% 1x 97.8% 5x 96.4% (A, B) To assess methylome-specific coverage, 17.5 million unique CpX (CpG + CpH; where H is A, T, or C) dinucleotides were assessed from the Arabidopsis TAIR10 reference, including both completeness of coverage (A) and uniformity of coverage (B). (C) To assess methylome-specific coverage of the human genome, 28.1 million unique CpG dinucleotides were assessed for both completeness of coverage and coverage uniformity at 1X and 5X (where the average depth was 8.9X). 5

6 RESULTS Determination of Genome-Wide Hypomethylation from 5 ng of cfdna Bisulfite sequencing of plasma cfdna has recently proven useful for non-invasive detection and monitoring of cancer burden 5. Detection of cancer-associated genome-wide hypomethylation from patient blood plasma samples has the potential to serve as a high sensitivity and specificity diagnostic test for multiple cancer types. To examine this application with the Library Kit, we performed WGBS on cfdna from 8 cancer patient samples and 5 normal controls. 5 ng of input cfdna and 10 million mapped reads per sample provided sufficient coverage to determine genome-wide hypomethylation status in cancer patient samples. This methylation status ranged from % hypomethylation for various cancer pathologies (see Table 1). Cancer samples exhibiting a low percentage of cfdna hypomethylation may shed less tumor DNA into the blood plasma or have insufficient tumor burden. Sample 8 demonstrated the highest percentage of hypomethylated DNA, supported by a pathology report that indicated an advanced stage colorectal cancer. When comparing to the normal control samples, a Circos plot for Sample 8 illustrated the specific genomic regions exhibiting hypomethylation (see Figure 2). These data support the correlation of plasma cfdna hypomethylation status and tumor burden. This correlation potentially provides a non-invasive means to detect and monitor tumor burden in patients. Targeted Methylation Sequencing from ng with SeqCap Epi CpGiant Library enrichment by hybridization capture enables methylation analysis of a targeted region of the genome, reducing sequencing costs compared to WGBS. The SeqCap Epi CpGiant Enrichment Kit (Roche-NimbleGen) captures 80.5 Mb of the human genome, which contains greater than 5.5 million CpG dinucleotides. When compared, sequencing metrics for libraries prepared with the DNA Library Kit outperformed those from the Roche- Kapa library preparation kit supplied by Roche-NimbleGen at every input. Duplicate reads cripple hybridization capture methylation studies and must be removed prior to analysis. A proficient library preparation minimizes PCR duplicates by adapting input DNA molecules with high efficiency. Roche-Kapa libraries constructed from 1 µg of DNA exhibited duplication rates comparable to those from tenfold less DNA with the Swift Accel- NGS Kit (9.4% and 6.5%) (see Table 2). Similarly, Roche-Kapa libraries constructed from 10 ng of DNA demonstrated duplication rates that were comparable to those from Swift 1 ng libraries (71% and 62%). Fewer duplicate reads reduce sequencing costs and increase coverage of unique library molecules, potentially increasing the number of samples that can be multiplexed on a single sequencing run. Coverage metrics from Roche-Kapa libraries highlight a significant deficit in coverage of the target region. This deficit in coverage increases as template amount decreases. When comparing 10 ng input libraries from Swift and Roche-Kapa, mean coverage with the Roche-Kapa library was only 1X, compared to 35X with Swift, from an equivalent read depth. While the Swift library covers 98.5% of the target region at least 2X, the Roche-Kapa library covers only 24.7% of the target 2X. At 20X, the Swift library covers 71% of the target region while the Roche-Kapa library covers a mere 0.2%. Finally, while the Roche-Kapa library only covers 52.3% of the target region at any depth (1X or greater), the Swift library covers 99.2% of the target region at least 1X. Importantly, the coverage metrics from Swift 1 ng libraries outperform those from the Roche-Kapa 10 ng libraries (see Table 2). Taken together, these results illustrate that the Swift Biosciences kit yields significantly more usable data than the Roche-Kapa kit. As epigenetic methylation status has been shown to correlate with transcriptional expression of genes important in cancer and other diseases, identifying CpG sites of the genome exhibiting differential methylation can be informative for both researchers and physicians. To underscore the significance of comprehensive coverage, DMR analysis was compared 6

7 between libraries constructed with either Roche-Kapa or Swift Biosciences kits. Libraries constructed from 10 ng of DNA from an H1 ES cell line were compared to the libraries from 10 ng of DNA from a B-lymphocyte cell line (NA12878). These results illustrate the utility of DMR identification from a relatively low input quantity, which diminishes noise from cellular heterogeneity. Swift s Kit called a total of 294,130 DMRs between the two sample types (37,799 hypomethylated and 256,331 hypermethylated) (see Figure 3). In contrast, the low coverage exhibited by the Roche-Kapa kit resulted in a total of only 464 DMRs called (not shown). Whole Genome Sequencing of Arabidopsis thalania from 1 ng Input with Kit To evaluate WGBS of a small genome, three library preparation technologies were assessed at 100, 10, and 1 ng inputs of Arabidopsis thalania gdna: Library Kit, a traditional dsdna library preparation kit (requiring methylated adapters), and a random priming library preparation kit. All input quantities are within the specifications of the Swift Biosciences kit, while the 10 and 1 ng inputs are below the specifications for the traditional and random priming kits. Across all inputs, the Kit exhibited superior performance for the percentage of reads aligned to the genome, the depth of coverage over the entire genome and for CpX (CpG and CpH) sites in particular, the percentage of duplicate reads, and estimated library size. Maximizing alignment data and minimizing duplicate reads expand the usable data generated from each sequencing run. An increase in usable data enables a specific depth of coverage to be reached with fewer reads, reducing per sample sequencing costs. The Kit improves average read alignment to 87% compared to the traditional kit s 79% and the random priming kit s 72% (see Table 3A). With tenfold lower duplication at 100 and 10 ng inputs than the random priming kit, and 3.4-fold lower duplication at 1 ng than the traditional kit, the Kit demonstrated high efficiency in adapting bisulfite-converted fragments (see Table 3A). This gain in sequence data was also apparent in the library complexity as measured by the estimated library size (see Table 3A). The Kit demonstrated the highest library complexity at 100 ng, greater than tenfold higher than the random priming kit. While all three kits had reduced estimated library size at 1 ng, the traditional kit was most sensitive as its complexity fell sixfold lower than the Kit. The gain in data output for the Accel- NGS Kit was also observed in the average genome coverage (see Table 3A). Accurate cytosine methylation detection in the Arabidopsis genome requires comprehensive CpX dinucleotide coverage with balanced read depth across these genomic sequences; ideally, all unique CpX dinucleotides would be represented and covered at equal read depth. Two metrics were used to evaluate completeness and uniformity of CpX coverage the percent CpX missing (see Figure 4A) and the percent CpX covered 10X (see Figure 4B). With a mere 0.6% missing, the Kit demonstrated the most comprehensive unique CpX coverage at all inputs (see Figure 4A). The traditional kit had reduced representation at 1 ng (2% missing) while the random priming kit showed the highest percentage of CpX missing across all three inputs (6% missing) (see Figure 4A). The Methyl- Seq Kit also demonstrated the most balanced CpX coverage uniformity across all three input quantities, with a remarkable advantage at 1 ng input (77% at 10X coverage compared to 17% and 31%) (see Figure 4B). At all inputs tested, the Kit from Swift Biosciences provided the most comprehensive and uniform methylome coverage. The traditional and random priming kits produced biased representations of the methylome with less usable data. 7

8 WGBS of the Human Methylome While Arabidopsis is a useful model organism, it lacks unique elements, such as CpG islands, which can be found in the human genome. These CpG islands and other CpG-rich promotor regions can lack full representation using previously available methods. Data generated from the Kit with human DNA demonstrated comprehensive coverage of unique CpG dinucleotides with only 2.2% missing from only 9X average sequencing depth. Further, 96.4% of the methylome was covered at least 5X (see Table 3B, Figure 4C). This uniformity of coverage from low-pass sequencing validates that the Kit performs with great efficiency across different genomes, including those with a complex organization of base composition. Conclusion The DNA Library Kit preserves high library complexity, delivers low duplication rates, and achieves comprehensive coverage from low sample inputs. This remarkable performance with low input quantities enables analysis of cancer-associated, genome-wide hypomethylation from just 5 ng of cfdna. For targeted sequencing, the Kit demonstrated superior duplication rates and coverage metrics at all input quantities tested, maximizing the amount of usable sequencing data. Similarly, at all DNA inputs tested, the Accel- NGS WGBS libraries yield data displaying increased library complexity and methylome coverage when compared to traditional and random priming kits. These results demonstrate the comprehensive superiority of libraries constructed with the Library Kit, empowering low input methylome analysis. References 1. EZ DNA Methylation-Gold Kit. Zymo Research Xi et al. BMC Bioinformatics Jul 27;10:232. doi: / Lister et al. Cell May 2; 133(3): doi: /j.cell Swift Biosciences Chan et al. Proc Natl Acad Sci Nov 19; 110(47): doi: /pnas Swift Biosciences, Inc. 58 Parkland Plaza, Suite 100 Ann Arbor, MI , Swift Biosciences, Inc. The Swift logo and Adaptase are trademarks and is a registered trademark of Swift Biosciences. SeqCap is a trademark of Roche-NimbleGen, Inc. EZ DNA Methylation-Gold is a trademark of Zymo Research. NanoDrop is a registered trademark of Thermo Fisher Scientific Inc. HiSeq and MiSeq are registered trademarks of Illumina, Inc , 05/16