Chemical and/or enzymatic deamination across various sequencing methods to localize modified cytosines.

Supplementary Figure 1 Chemical and/or enzymatic deamination across various sequencing methods to localize modified cytosines. (a) Upon treatment with bisulfite under acidic conditions and at elevated temperatures, unmodified cytosine becomes sulfonated at the 6-position, which facilitates its hydrolytic deamination. Moving to alkaline conditions promotes desulfonation to yield uracil. In contrast, APOBEC3A catalyzes enzymatic hydrolytic deamination at physiological temperature and pH. (b) Workflows of various deamination-based sequencing methods to localize modified cytosines. Traditional bisulfite sequencing (BS-Seq) localizes 5mC and 5hmC together after deamination of unmodified C. TET-assisted bisulfite sequencing (TAB-Seq) changes reactivities to specifically localize 5hmC by first protecting 5hmC with a glucose moiety and then oxidizing 5mC with TET to form 5-formylcytosine or 5-carboxylcytosine, which will be read as T following bisulfite conversion and sequencing. Oxidative bisulfite sequencing (oxBS-Seq) directly localizes 5mC by oxidizing 5hmC with potassium perruthenate before bisulfite treatment. When oxBS-Seq is used in conjunction with standard bisulfite sequencing, 5hmC can be indirectly localized by subtracting the oxBS-Seq signal from that of BS-Seq. Finally, APOBEC-Coupled Epigenetic Sequencing (ACE-Seq) does not rely on bisulfite, and instead utilizes enzymatic deamination of C and 5mC by APOBEC3A after protecting 5hmC with a glucose moiety. The sequencing readout is therefore comparable to that of TAB-Seq, without the need for the destructive bisulfite treatment.
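The readouts described in panel (b) can be summarized as a small lookup table. The following is a minimal Python sketch, not part of the original methods: the names `READOUT`, `bases_read_as_c`, and `hmc_by_subtraction` are hypothetical, and flooring the subtraction at zero is an assumption made here for illustration.

```python
# Base call observed after sequencing for each starting cytosine state,
# following the workflows described in the legend above.
READOUT = {
    "BS-Seq":   {"C": "T", "5mC": "C", "5hmC": "C"},
    "TAB-Seq":  {"C": "T", "5mC": "T", "5hmC": "C"},
    "oxBS-Seq": {"C": "T", "5mC": "C", "5hmC": "T"},
    "ACE-Seq":  {"C": "T", "5mC": "T", "5hmC": "C"},
}


def bases_read_as_c(method):
    """Return the cytosine states that resist deamination in a given method."""
    return sorted(base for base, call in READOUT[method].items() if call == "C")


def hmc_by_subtraction(bs_signal, oxbs_signal):
    """Estimate 5hmC as BS-Seq signal (5mC + 5hmC) minus oxBS-Seq signal (5mC).

    Flooring at zero is an assumption of this sketch, not a stated method.
    """
    return max(bs_signal - oxbs_signal, 0.0)


if __name__ == "__main__":
    for method in READOUT:
        print(method, "reads as C:", bases_read_as_c(method))
```

In this table, ACE-Seq and TAB-Seq share the same row, which is the sense in which their readouts are comparable.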

Supplementary Figure 2 Phenotypic validation of modifications in phage DNA. (a-c) Validation of T4 mutant phenotypes by (a) restriction digest and (b,c) liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). (a) AluI cleaves only unmodified cytosine-containing DNA, while MspJI selectively cleaves 5hmC-containing DNA. Exogenous treatment of T4-hmC with a glucosyltransferase also causes a mass shift to generate a band the same size as T4-ghmC. The gels are adapted from Bryson, et al. mBio, 2015 (PMID: ) with permission from mBio; specifically, only panel A of the original figure was used, and the labels were altered slightly to align with nomenclature used in this article. (b) Raw MS/MS traces of C, 5mC, 5hmC, and 5ghmC in T4-C, T4-hmC, and T4-ghmC phage stocks. α and β denote peaks representing the α- and β-anomers of 5ghmC. The MS/MS transitions are provided in the methods. Note that the peak area does not directly correlate with abundance given different ionization efficiencies. Independent experiments were performed in triplicate; representative traces are shown. (c) Levels of C and 5ghmC in each stock are quantified relative to the T4-C and T4-ghmC reference genomes. Mean values from triplicate experiments are listed above each bar; error bars report the standard deviation. The presence of 5ghmC in T4-C is likely from rare reversion of the phage during propagation. T4 phage can package some of its host E. coli gDNA (strain DH10B). The presence of C in T4-hmC and T4-ghmC correlates with the amount expected from the frequency of sequencing reads that map to the E. coli DH10B genome (bottom table). Notably, the presence of small amounts of E. coli gDNA does not impact any of the ACE-Seq quality metrics, as these are derived from reads that map to the T4-hmC genome alone (as in Fig. 2d,f). (d,e) Validation of CG methylation of phage genomic DNA by (d) restriction digest and (e) LC-MS/MS.
(d) 75 ng of phage DNA that was either untreated or treated with M.SssI was incubated in the presence of no restriction enzyme (-), MspI, or HpaII, with appropriate buffers. While MspI cleaves both unmethylated and methylated CG-containing sites, HpaII is blocked from cleavage by CG methylation. Experiments were performed several times with similar results; representative image shown. (e) Raw MS/MS traces of C, 5mC, 5hmC, and 5ghmC in the methylated phage stock. Experiment was performed in triplicate (n = 3); representative traces shown.

Supplementary Figure 3 ACE-Seq optimization to promote full deamination of a locus in the T4-C phage genome. (a) Highlighter plot showing deamination events (red) and non-deamination events (cyan) compared to the non-deaminated master sequence. Clones from experiments without DMSO and with 10% DMSO are shown. Secondary structure prediction software predicted strong hairpin formation from position , likely contributing to the lack of deamination in this area without DMSO. (b) Highlighter plot showing deamination events (red) and non-deamination events (cyan) compared to the non-deaminated master sequence. Clones from experiments performed at 25 °C, 37 °C, and under slow ramping conditions (4 °C to 50 °C over 2 hours) are shown.

Supplementary Figure 4 Phenotypic validation of modifications in phage DNA. 15 ng of a 1:1 pooled mixture of methylated phage gDNA and T4-hmC DNA was treated with or without GT and then with or without A3A (4 samples, each in triplicate). These reactions were then digested to nucleosides and analyzed via LC-MS/MS. Levels of (a) C, (b) 5mC, and (c) 5hmC were quantified using standard curves generated from purified nucleoside controls, and the amount of each base is graphed relative to the untreated sample (-A3A, -GT). Individual data points from triplicate experiments are overlaid on the bar graph; mean percentages are listed above each bar, and error bars represent standard deviations from the mean. (d) Representative raw traces of MS/MS counts for 5hmC and 5ghmC. Upon GT treatment, the 5hmC signal decreased significantly, accompanied by conversion to 5ghmC. Mean percentages of 5hmC remaining from triplicate experiments (n = 3) are listed above the remaining peaks in the +GT samples. (e) Chart outlining predicted/observed products for both spike-in controls under each condition tested.
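The normalization to the untreated sample described for panels (a-c) amounts to a simple ratio. Below is a minimal Python sketch under stated assumptions: the function name `percent_of_untreated` is hypothetical, the baseline is taken as the mean of the untreated replicates, and the sample standard deviation is used (the legend does not say whether sample or population standard deviation was reported).

```python
from statistics import mean, stdev


def percent_of_untreated(treated, untreated):
    """Express treated replicates as a percentage of the untreated mean.

    treated, untreated: lists of nucleoside amounts (same units) from
    replicate LC-MS/MS measurements. Returns (mean percent, sample SD).
    """
    baseline = mean(untreated)
    percents = [100.0 * value / baseline for value in treated]
    return mean(percents), stdev(percents)


if __name__ == "__main__":
    # Hypothetical replicate values, for illustration only.
    print(percent_of_untreated([5.0, 5.0, 5.0], [10.0, 10.0, 10.0]))
```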

Supplementary Figure 5 Quantitative PCR (qPCR) of bisulfite-treated versus A3A-treated samples. ACE-Seq or BS-Seq samples of unsheared mESC DNA analyzed qualitatively in Fig. 3a were used to seed qPCR reactions. 0.5 μL of each treated sample was combined with 500 nM each of the forward and reverse primers (to amplify either (a) the 200-bp amplicon or (b) the 1-kb amplicon) and amplified using the KAPA SYBR Fast Rox Low qPCR Mastermix kit (KAPA Biosystems). For the 200-bp amplicon, a two-step PCR protocol was used in which the samples were initially denatured at 95 °C for 3 minutes, and then cycled between 95 °C (15 seconds) and 63 °C (20 seconds) for a total of 35 cycles. For the 1-kb amplicon, a two-step PCR protocol was used in which the samples were initially denatured at 95 °C for 5 minutes, and then cycled between 95 °C (30 seconds) and 66 °C (90 seconds) for a total of 41 cycles. Resulting qPCR products were run on 1% agarose gels and stained with SYBR Safe to confirm specific amplification. The gray dashed line in (b) represents the mean C_T value of the no-template control, which arises from primer-dimer amplification. Notably, the signals from the no-template and bisulfite samples for the 1-kb amplicon were exclusively from primer-dimer amplification and were not specific to the desired 1-kb product. Mean values from triplicate experiments (n = 3) are shown, with error bars representing standard deviation from the mean. The data from 1 µg input samples are the same as those presented in Fig. 3b. (c) Shown is the uncropped gel from Fig. 3b.
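The two cycling protocols above can be written down as structured parameters, which makes them easy to compare. This is an illustrative Python sketch only: the class `TwoStepProtocol` and its field names are hypothetical, and the computed time counts programmed holds only, ignoring instrument ramp time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStepProtocol:
    """Two-step qPCR program: one initial denaturation, then repeated cycles."""
    initial_denature_s: int   # initial hold at 95 °C, in seconds
    denature_s: int           # per-cycle hold at 95 °C, in seconds
    anneal_extend_s: int      # per-cycle combined anneal/extend hold, in seconds
    anneal_extend_temp_c: int
    cycles: int

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds (ramp time not included)."""
        per_cycle = self.denature_s + self.anneal_extend_s
        return self.initial_denature_s + self.cycles * per_cycle


# Parameters taken from the legend above.
AMPLICON_200BP = TwoStepProtocol(180, 15, 20, 63, 35)
AMPLICON_1KB = TwoStepProtocol(300, 30, 90, 66, 41)

if __name__ == "__main__":
    print(AMPLICON_200BP.hold_time_s(), AMPLICON_1KB.hold_time_s())
```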

Supplementary Figure 6 Comparison of ACE-Seq as a function of gDNA input. (a) Browser snapshot showing base-resolution 5hmCG (blue) maps near the Neurod6 gene (chr6:55,614,961-55,646,769; mm9). ACE-Seq raw signals of CGs on both strands were combined, and only CG dyads sequenced to depth ≥2 are shown. Gray tracks denote sequencing coverage for each CG dyad. The experiment was performed once each for 2 ng and 20 ng input of gDNA (n = 2). (b) Correlation density plot between ACE-Seq experiments with 2 ng or 20 ng of DNA as input. Mean ACE-Seq raw signals were calculated for tiled 10-kb bins across the mouse genome. Correlation analysis was performed with 10-kb bins spanning the genome (n = 238,401 bins).
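The binning and correlation in panel (b) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline used for the figure: the function names are hypothetical, positions are treated as 0-based, only bins covered in both data sets are compared, and Pearson correlation is assumed because the legend does not name the correlation measure.

```python
from collections import defaultdict
from math import sqrt

BIN_SIZE = 10_000  # 10-kb tiled bins, as in the legend


def bin_means(sites, bin_size=BIN_SIZE):
    """Average per-site signal within tiled bins.

    sites: iterable of (chrom, position, signal) tuples.
    Returns {(chrom, bin_index): mean signal}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for chrom, pos, signal in sites:
        key = (chrom, pos // bin_size)
        totals[key][0] += signal
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_inputs(sites_a, sites_b):
    """Correlate bin means of two experiments over the bins they share."""
    a, b = bin_means(sites_a), bin_means(sites_b)
    shared = sorted(a.keys() & b.keys())
    return pearson([a[k] for k in shared], [b[k] for k in shared])
```

Here `sites_a` and `sites_b` would hold the per-CG raw signals from the 2 ng and 20 ng experiments, respectively.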

Supplementary Figure 7 Relationship of DNA modifications and chromatin states at representative loci. Browser snapshot showing base-resolution CG (green), 5mCG (red), and 5hmCG (blue) maps, as well as RNA-Seq, ATAC-Seq, and ChIP-Seq of major histone modifications near (a) the transcriptionally active Neurod6 gene (chr6:55,667,934-55,690,102; mm10) and (b) the inactive Gad2 gene (chr2:22,607,043-22,660,967; mm10). Only CGs sequenced to depth ≥2 are shown. Gray tracks denote sequencing coverage for each position in each base-resolution map. ACE-Seq traces represent merged data sets from single experiments at 2 ng and 20 ng of input DNA (n = 2).

Supplementary Figure 8 Base-resolution analysis of 5hmCGs and 5mCGs at imprinted regions in mouse cortical excitatory neurons. (a) The average 5hmC (left) or 5mC (right) levels are shown within two groups of imprinted regions (hmC-high: blue; hmC-low: green) and their flanking regions. (b) Snapshot of base-resolution C (green), 5mC (red), and 5hmC (blue) maps near the Kcnq1ot1 gene locus (chr7:143,290, ,300,000; genome build: mm10). Only CGs sequenced to depth ≥2 are shown. Gray tracks denote sequencing coverage for each position. ACE-Seq traces represent merged data sets from single experiments at 2 ng and 20 ng of input DNA (n = 2). (c) Heat-map representation of normalized RNA-Seq, H3K4me3 (ChIP-Seq), H3K27me3 (ChIP-Seq), 5hmC (ACE-Seq), 5mC (derived from BS-Seq and ACE-Seq), and 5hmC/5mC ratios within 30 imprinted regions. Imprinted regions were ranked by their 5hmC levels in cortical excitatory neurons.

Supplementary Figure 9 Distribution of C, 5mC, and 5hmC levels at representative genomic elements. Ternary plots show the levels of C, 5mC, and 5hmC within 1-kb bins overlapping with representative genomic elements.
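A ternary plot places each bin according to three fractions that sum to one. The sketch below shows one way such a composition could be derived from a BS-Seq signal (5mC + 5hmC) and an ACE-Seq signal (5hmC). It is an illustration under stated assumptions: the function name `composition` is hypothetical, and the clipping of noisy values into the valid range is a choice made here, not a described method.

```python
def composition(bs_signal, ace_signal):
    """Return (C, 5mC, 5hmC) fractions for one bin; they sum to 1.

    bs_signal:  fraction read as C after bisulfite (5mC + 5hmC).
    ace_signal: fraction read as C after ACE-Seq (5hmC only).
    Values are clipped so that 0 <= 5hmC <= (5mC + 5hmC) <= 1.
    """
    hmc = min(max(ace_signal, 0.0), 1.0)
    modified = min(max(bs_signal, hmc), 1.0)
    return (1.0 - modified, modified - hmc, hmc)


if __name__ == "__main__":
    # Hypothetical bin with 80% BS-Seq signal and 30% ACE-Seq signal.
    print(composition(0.8, 0.3))
```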

Supplementary Figure 10 Enrichment of high-level 5hmCG sites. (a) The distribution of abundances of called 5hmCG (blue) and 5mCG (red) at individual sites in cortical excitatory neurons. The dashed line denotes the signal level of 0.6, the cutoff used to select high-level 5hmCG sites. (b) The fraction of high-level 5hmCG (purple) and all 5hmCG (yellow) sites within various genomic elements (relative to the total number of CG sites in each group).
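The fractions in panel (b) follow from counting sites against the 0.6 cutoff. Below is a minimal Python sketch: the function name `site_fractions` is hypothetical, and treating the cutoff as inclusive (signal ≥ 0.6) is an assumption, since the legend does not say whether the boundary value is included.

```python
HIGH_LEVEL_CUTOFF = 0.6  # signal level used to select high-level 5hmCG sites


def site_fractions(called_hmc_signals, total_cg_sites, cutoff=HIGH_LEVEL_CUTOFF):
    """Return (fraction of all called 5hmCG, fraction of high-level 5hmCG).

    called_hmc_signals: signal levels of called 5hmCG sites in one genomic element.
    total_cg_sites: total number of CG sites in that element (the denominator).
    """
    if total_cg_sites <= 0:
        raise ValueError("total_cg_sites must be positive")
    high = sum(1 for signal in called_hmc_signals if signal >= cutoff)
    return len(called_hmc_signals) / total_cg_sites, high / total_cg_sites


if __name__ == "__main__":
    # Hypothetical element with 10 CG sites, 4 of them called as 5hmCG.
    print(site_fractions([0.1, 0.6, 0.9, 0.3], 10))
```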

Supplementary Figure 11 Possible applications of enzymatic deamination for 5mC localization. (a) Comparison of chemical and enzymatic deamination. Chemical deamination is efficient on C, while 5mC is largely resistant to the reaction. In addition, 5hmC is converted to a CMS adduct, while 5fC is inefficiently deaminated. Efforts to drive this reaction further start to increase aberrant 5mC deamination (see Wu et al., Nature Prot, 2016, 11: ). Enzymatic deamination is efficient on C and 5mC and discriminates against all ox-mCs (see Schutsky et al., NAR, 2017, 45: ). (b) Schemes for detection of 5mC. Potential schemes for localizing 5mC and all ox-mCs involve TET-mediated oxidation with or without coupled protection with GT, which would lead to conversion of C bases only. As another alternative, after bisulfite treatment, A3A can be used to deaminate 5mC but not CMS to differentiate 5hmC and 5mC, although such a method would carry forward the limitations of bisulfite.