TECH NOTE Ligation-Free ChIP-Seq Library Preparation

The DNA SMART ChIP-Seq Kit

Ligation-free template switching technology: Minimize sample handling in a single-tube workflow
Simplified protocol with post-PCR size selection: Higher yield with a combined post-PCR size selection and clean-up step
Sensitive, reproducible data: Libraries have high non-redundant rates, numbers of peaks identified, and overlap with ENCODE data

Overview

Pushing good science forward sometimes demands working through experimental challenges. Preparing next-generation sequencing (NGS) libraries from chromatin immunoprecipitation (ChIP) experiments can be one of those challenges due to the small amount of DNA available. Most library preparation methods rely on ligation to add sequencing adapters when generating ChIP sequencing (ChIP-seq) libraries. However, these methods require double-stranded (ds) DNA inputs, limiting the types of DNA samples that can be used for library preparation.

The DNA SMART ChIP-Seq Kit utilizes a modified version of SMART template switching technology to provide a ligation-free method for addition of Illumina sequencing adapters. Template switching technology allows for single-step adapter addition, and has the sensitivity required for library preparation from picogram quantities of nucleic acids. The DNA SMART ChIP-Seq Kit provides a robust and reliable tool for ChIP-seq applications, particularly at low input levels (100 pg–10 ng), and library preparation can be completed in approximately four hours. Either dsDNA or single-stranded (ss) DNA templates may be used, making this kit ideal for ChIP-seq library preparation.

Template Switching Technology for DNA

DNA SMART technology eliminates the need for an adapter ligation step and associated clean-up. This streamlined protocol is enabled by the SMARTScribe Reverse Transcriptase (RT), which copies the DNA template and adds a few additional nucleotides to the 3′ end of the newly synthesized DNA. The carefully designed DNA SMART Oligonucleotide base-pairs with these additional non-template nucleotides and creates an extended template, enabling the SMARTScribe RT to continue replicating to the end of the oligonucleotide. Sequencing libraries are then amplified by PCR using primers containing Illumina adapters.

Flowchart of technology in the DNA SMART ChIP-Seq Kit. This single-tube workflow allows users to generate Illumina-compatible libraries for ChIP-seq experiments. After library size selection and purification, the total time from input DNA to ChIP-seq library is approximately four hours.

Protocol Improvements

The DNA SMART ChIP-Seq Kit uses a combined size selection and clean-up step after library amplification by PCR. Compared to other protocols that perform size selection before PCR amplification, post-PCR size selection results in a higher yield while maintaining the quality of the libraries.

Sequencing metrics comparing pre- and post-PCR size selection (ChIP antibody: CTCF)

Size selection                               pre-PCR      post-PCR     post-PCR (single selection)
Library yield (nM)
No. of uniquely mapping reads                5,119,363    5,908,181    4,302,276
Non-redundant rate
No. of peaks identified with 3.85 M reads
  (uniquely mapped, non-duplicates)          32,827       34,011       33,398
Number of overlapping peaks                  28,039  28,039  28,469  28,924

Sequencing metrics from ChIP-seq libraries purified before or after amplification. ChIP-seq libraries were generated from 200 pg of the same input ChIP DNA with size selection before or after library amplification (16 cycles of PCR). Size selection removed both small and large inserts (except for the single size selection sample indicated, which only removed small fragments/adapter dimers).

Performing size selection after amplification does not alter the quality of the data generated from these libraries. The location and shape of the peaks identified using post-PCR size selection still matched reported ENCODE data.

Size selection before or after PCR amplification does not affect library quality. Peaks identified from pre- or post-PCR size selection and purification (Panel A) and electropherograms showing the different libraries generated using pre- or post-PCR size selection (Panel B) are shown.

Sensitive Library Production

The DNA SMART ChIP-Seq Kit has the sensitivity to generate sequencing libraries from very small amounts of fragmented DNA. The number of unique, non-duplicate reads is high across all input levels, and the number of peaks identified is similar across input amounts.

Sequencing metrics from various amounts of input DNA (ChIP antibody: H3K4me3)

Input amount (ng)
No. of PCR cycles
Library yield (nM)
Total no. reads (millions)
% reads mapped
No. of uniquely mapped reads (millions)
No. of unique reads without duplicates (millions)
% useful reads (uniquely mapped, non-duplicates)
No. of peaks identified                      16,738  16,811  16,366  17,277  16,584  19,601

Sequencing metrics from specific amounts of input DNA. ChIP-seq libraries generated from different dilutions of the same input DNA sample have very similar sequencing metrics, showing the high sensitivity and reproducibility of this kit.

Libraries generated with this kit have high reproducibility; technical replicates generated from the same amount of input DNA have >93% overlap for input levels greater than 100 pg, and libraries generated from different input amounts have >94% overlap with each other. Complexity (the non-redundant rate) is also very high for these libraries.
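The two summary metrics used above can be sketched in a few lines. This is a minimal illustration, not the kit's or ENCODE's reference implementation: it assumes the non-redundant rate is the fraction of uniquely mapped reads that occupy a distinct (chromosome, position, strand) combination, and that "% useful reads" is the share of all sequenced reads that are uniquely mapped and non-duplicate. The function names and the tuple layout are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical read representation: (chromosome, 5' start position, strand).
Read = Tuple[str, int, str]


def non_redundant_rate(reads: Iterable[Read]) -> float:
    """Fraction of uniquely mapped reads that fall on a distinct position.

    Assumption: two reads are redundant when chromosome, start position,
    and strand are all identical.
    """
    total = 0
    distinct = set()
    for read in reads:
        total += 1
        distinct.add(read)
    if total == 0:
        raise ValueError("no reads supplied")
    return len(distinct) / total


def percent_useful(total_reads: int, unique_nondup_reads: int) -> float:
    """Percentage of all reads that are uniquely mapped and non-duplicate."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * unique_nondup_reads / total_reads


# Toy example: four reads, two of which share the same position and strand.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 5, "+"),
]
example_nrr = non_redundant_rate(example_reads)  # 3 distinct / 4 total
```

When libraries of different depth are compared, the rate would normally be computed on the same number of uniquely mapped reads for each library, because deeper sequencing of the same library tends to lower it.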

ChIP-seq library complexity and reproducibility are maintained across input amounts. The reproducibility between technical replicates was similar across input amounts (Panel A). The non-redundant rate (normalized for 10 million uniquely mapped reads) was well above the standard recommended by the ENCODE project (0.8) for inputs >0.5 ng (Panel B; error bars indicate the standard deviation of two technical replicates). Compared to the 4 ng library, the number of peaks was similar across lower input libraries (Panel C). The shape and location of the peaks were similar across input levels, and matched ENCODE data very well (293 cells, anti-H3K4me3 antibody, U. Washington), even for as little as 50 pg input DNA (Panel D).

Reproducible Libraries from Specific Numbers of Cells

Exact quantification of DNA obtained by ChIP can be very difficult due to the low concentrations. Typically, researchers must use the entire ChIP DNA sample obtained for sequencing library preparation. With DNA SMART technology, ChIP-seq data from total (unquantified) ChIP DNA is consistent across different starting cell numbers.

Sequencing metrics from total DNA from specified numbers of cells (ChIP antibody: H3K4me3)

Input amount (millions of cells)
No. of PCR cycles
Library yield (nM)
Total no. reads (millions)
% reads mapped
No. of uniquely mapped reads (millions)
No. of unique reads without duplicates (millions)
% useful reads (uniquely mapped, non-duplicates)
Non-redundant rate
No. of peaks identified                      19,459  19,339  18,549  22,564

The DNA SMART ChIP-Seq Kit generates high-quality libraries from low cell number ChIP experiments. Total DNA recovered from ChIP experiments using the number of cells indicated was used as input for the DNA SMART ChIP-Seq Kit. Mapping statistics were very good across all input levels.

DNA SMART ChIP-Seq Kit libraries maintain consistent representation of sequences even at the lowest input levels. Between 86% and 91% of the identified peaks overlapped between different input amounts. Across all cell inputs, the DNA SMART ChIP-Seq Kit generates data that matches well with the reported data from the ENCODE project (>87% overlap in peaks).
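Peak overlap percentages like those quoted above can be computed in several ways, and the note does not say which definition was used. The sketch below assumes one common choice: the fraction of peaks in one set that share at least one base with any peak in another set, with peaks given as half-open (chromosome, start, end) intervals. The function name and the toy coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def fraction_overlapping(peaks_a: Sequence[Peak], peaks_b: Sequence[Peak]) -> float:
    """Fraction of peaks in A that overlap at least one peak in B."""
    if not peaks_a:
        raise ValueError("peaks_a is empty")

    # Group B by chromosome and sort by start coordinate.
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    starts: Dict[str, List[int]] = {}
    max_end_so_far: Dict[str, List[int]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]
        running, prefix = 0, []
        for _, e in intervals:
            running = max(running, e)
            prefix.append(running)
        # prefix[i] is the largest end among the first i + 1 intervals.
        max_end_so_far[chrom] = prefix

    hits = 0
    for chrom, start, end in peaks_a:
        if chrom not in starts:
            continue
        # Number of B intervals that begin before this peak ends.
        n = bisect_left(starts[chrom], end)
        # An overlap exists if any of those intervals ends after this peak starts.
        if n > 0 and max_end_so_far[chrom][n - 1] > start:
            hits += 1
    return hits / len(peaks_a)


# Toy example: only the first peak in set A overlaps a peak in set B.
set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
set_b = [("chr1", 150, 250), ("chr1", 600, 700), ("chr3", 10, 20)]
example_fraction = fraction_overlapping(set_a, set_b)
```

Note that this measure is not symmetric: the fraction of A overlapping B generally differs from the fraction of B overlapping A, so a reported percentage should state which set is the denominator.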

Peak overlap between libraries generated from various inputs. Peaks from different numbers of cells were identified using 6 million (for 10,000 cells) or 9–10 million (for 50,000–1,000,000 cells) uniquely mapped reads. There was a high degree of overlap across the number of input cells (Panel A). The peaks were of similar shape across cell inputs and matched the peaks obtained by the ENCODE project (293 cells, anti-H3K4me3 antibody, U. Washington; Panel B). Additionally, the total number of peaks overlapping with those identified in the ENCODE project was high (Panel C).

Summary

The modified template switching technology at the core of the DNA SMART ChIP-Seq Kit provides a sensitive means for generating sequencing libraries in a ligation-independent manner from either ssDNA or dsDNA templates. The post-PCR library size selection and clean-up step generates libraries with higher yields than if size selection and clean-up are performed before library amplification, simplifies the workflow, and makes this kit ideal for low-input DNA samples. The DNA SMART ChIP-Seq Kit generates sensitive, robust, reproducible ChIP-seq libraries for Illumina sequencing using a simplified, single-tube protocol that can be completed in about four hours.

Methods

For ChIP assays, HEK 293T cells were grown to 80% confluence and fixed with 1% formaldehyde for 10 minutes. ChIP was performed with ChIP-grade anti-H3K4me3 or anti-CTCF antibodies according to standard methods (chromatin shearing by sonication with Bioruptor Pico; Diagenode). DNA was purified with a Macherey-Nagel NucleoSpin Gel and PCR Clean-Up kit. Sequencing libraries were generated using the DNA SMART ChIP-Seq Kit, and size selection was performed with AMPure XP beads (Beckman Coulter) using Option 1 or Option 4 as described in the DNA SMART ChIP-Seq Kit User Manual.

Sequencing was carried out on Illumina MiSeq or HiSeq 2500 instruments. All runs were paired-end; some runs used the Custom Read2 Seq Primer from the DNA SMART ChIP-Seq Kit. Mapping of reads (unpaired) to the human genome (hg19) was performed using Bowtie2 with default settings (plus trimming of the first three 5′ nucleotides of the reads obtained with the Read Primer 1). Uniquely mapping reads were selected, and the SAM files were sorted and converted to BAM files using SAMtools. Peaks were identified using MACS version 1.4 (default settings except the p-value cutoff, set at 1 x 10^-7). Raw data generated by the ENCODE consortium were downloaded as FASTQ files and analyzed similarly to the data generated with the DNA SMART ChIP-Seq Kit. Reads and peaks were visualized using IGV or the UCSC genome browser.
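The analysis steps described in the Methods could be assembled roughly as follows. This is a sketch under stated assumptions rather than the authors' actual commands: the file names and index name are hypothetical, a recent SAMtools 1.x command syntax is assumed, and a minimum mapping quality filter is used as a stand-in for "uniquely mapping reads", since the note does not say how those reads were selected. The function only builds the argument lists; it does not run anything.

```python
from typing import List


def build_pipeline(fastq: str, index: str, prefix: str, min_mapq: int = 10) -> List[List[str]]:
    """Return the command lines for mapping, filtering, sorting, and peak calling."""
    sam = f"{prefix}.sam"
    filtered_bam = f"{prefix}.filtered.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Unpaired mapping with Bowtie2, trimming three bases from the 5' end.
        ["bowtie2", "--trim5", "3", "-x", index, "-U", fastq, "-S", sam],
        # ASSUMPTION: a MAPQ threshold approximates "uniquely mapping" reads.
        ["samtools", "view", "-b", "-q", str(min_mapq), "-o", filtered_bam, sam],
        # Coordinate-sort the filtered BAM file.
        ["samtools", "sort", "-o", sorted_bam, filtered_bam],
        # MACS 1.4 with the p-value cutoff from the Methods (1e-7), human genome size.
        ["macs14", "-t", sorted_bam, "-f", "BAM", "-g", "hs", "-n", prefix, "-p", "1e-7"],
    ]


# Hypothetical inputs; pass each list to subprocess.run(cmd, check=True) to execute.
example_commands = build_pipeline("reads_R1.fastq", "hg19", "h3k4me3_1ng")
```

Keeping each command as a list of arguments avoids shell quoting problems and makes it easy to log exactly what was run for each library.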