Supplemental Information. Figure S1 shows the characterization of the AFF4, ELL2 and CDK9 antibodies.


Supplementary Materials and Methods
Supplementary References
Supplementary Figure Legends
Supplementary Figures

Supplemental Information

Figure S1 shows the characterization of the AFF4, ELL2 and CDK9 antibodies.
Figure S2 shows the genome-wide occupancy of the SEC components at highly transcribed genes such as the histone loci.
Figure S3 shows the presence of the pausing form of Pol II, with DSIF/NELF on Hoxa1, but not on Hoxb1.
Figure S4 shows SEC recruitment on Hoxa1, but not Hoxb1, upon RA treatment.
Figure S5 shows the requirement of CDK9 on both Hoxa1 and Hoxb1 gene induction by RA.
Figure S6 shows SEC is required for the activation of Cdx1 and Chac1, but not Csn3 and Nrip1, by RA.
Figure S7 shows the requirement of CDK9 for all of the RA highly induced genes.
Figure S8 shows data concerning SEC recruitment to serum-response genes.
Figure S9 shows that Brd4 is not required for the Cyp26a1 gene activation by RA.

Supplemental Materials and Methods

Affymetrix Microarray Analysis

Affymetrix Mouse 430 v2 arrays were analyzed in R, version , using the packages affy (Gautier et al. 2004), version , and limma (Smyth et al. 2005), version . Normalization was done using rma. Annotation information for the probes was taken from Ensembl 62.

Solexa/Illumina Data Analysis

Reads for the H3K27me3 data presented were sourced from the NCBI Sequence Read Archive from the original work of Mikkelsen et al. (2007). Sequencing reads were acquired through the primary Solexa image analysis pipeline, where bases were called and reads were filtered for quality according to default Solexa standards. Filtered reads were then aligned to the mouse genome (NCBI build 37, UCSC mm9) or the human genome (UCSC hg19) using the bowtie (Langmead et al. 2009) alignment tool, version . Only those sequences that matched uniquely to the genome with up to two mismatches were retained for subsequent analysis.

Enriched regions of ChIP-seq signal were determined by the MACS (Zhang et al. 2008) peak-finding program, version 1.4.0rc2. Sequence reads for each ChIP-seq dataset and their associated whole-cell extract controls were used for the input and control file, respectively. The effective genome size was configured appropriately for mouse and human datasets, the p-value cutoff was set to 1.00e-08 or FDR <= 1%, and a fold change greater than five was required. All other MACS parameters were left at their defaults.

RNA-seq analysis was done using TopHat v (Trapnell et al. 2009) and Bowtie v . Only uniquely mapping reads were used. Human transcript annotations were from Ensembl 59. Cufflinks v0.9.3 (Trapnell et al. 2010) was used to quantify FPKM values for each transcript. The expression value of each gene was taken as the maximum FPKM among its underlying transcripts.

Gene Annotation

Annotations for all mouse transcripts were from Ensembl release 62. All human transcript annotations were from Ensembl 59. For all ChIP-seq samples, a gene was called bound if an enriched peak region was found within 1 kb of the transcription start site of any transcript isoform of the gene.

Track Figures

Read coverage information in the track figures was created using R by extending the reads 150 bases toward the interior of the sequenced fragment and then computing the number of extended reads in 25 bp windows, expressed as the count of extended reads per million reads sequenced (RPM; counts/million). The resulting coverage object was exported and visualized using the UCSC genome browser (Kent et al. 2002).
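The read-extension and windowing step described for the track figures can be illustrated with a minimal Python sketch. It is not the original R code. It assumes that "extending the reads 150 bases" means each read is represented by a 150-base interval starting at its 5' end, that a read counts toward every 25 bp window it overlaps, and that all reads lie on a single chromosome. The function names `extend_read` and `rpm_coverage` are hypothetical.

```python
from collections import defaultdict

EXTEND = 150  # assumed total length of each extended read, in bases
WINDOW = 25   # window size in bp


def extend_read(start, end, strand):
    """Return the half-open interval covering EXTEND bases from the read's 5' end.

    Plus-strand reads are extended to the right of their start;
    minus-strand reads are extended to the left of their end.
    """
    if strand == "+":
        return start, start + EXTEND
    return max(0, end - EXTEND), end


def rpm_coverage(reads, total_reads):
    """Count extended reads overlapping each window and scale to reads per million.

    reads: iterable of (start, end, strand) tuples on one chromosome (assumption).
    total_reads: number of reads sequenced, used for the RPM scaling.
    Returns a dict mapping window start position to RPM.
    """
    counts = defaultdict(int)
    for start, end, strand in reads:
        ext_start, ext_end = extend_read(start, end, strand)
        first = ext_start // WINDOW
        last = (ext_end - 1) // WINDOW
        for w in range(first, last + 1):
            counts[w * WINDOW] += 1
    scale = 1_000_000 / total_reads
    return {pos: n * scale for pos, n in sorted(counts.items())}


# Toy input: one plus-strand and one minus-strand read out of 2 million sequenced.
coverage = rpm_coverage([(100, 136, "+"), (300, 336, "-")], total_reads=2_000_000)
```

Windows covered by both extended reads receive twice the value of windows covered by only one, which is the behavior expected when two opposing reads flank the same fragment.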

Histogram and Heatmap Figures

Histogram representations of ChIP-seq binding for Pol II and SEC members were generated in R. First, all gene annotations and enriched peak regions were loaded. For each gene, the region spanning +/- 5 kb around the transcription start site was calculated. The 10 kb regions were tiled with 50 bp windows, and the enriched peak regions were used to label each tile as either enriched or not enriched. The resulting matrix contained 200 columns and one row per annotated gene in the genome, with a one or zero in each position indicating enrichment.

The heatmap representation of the microarray expression values was also generated in R, using all probes that showed at least a two-fold change in expression, up or down, at six hours of induction versus no induction. For each time-point and replicate depicted (2, 4, and 6 hour), expression values were converted to fold-changes relative to the 0 hour (wild-type) time-point. Log2 fold-changes were then binned into nine equally spaced groups from >2 to <-2 in 0.5 value increments. The three replicates for each of the three time-points were combined into a matrix, which was then sorted by the total sum of bin magnitudes.
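The binary enrichment matrix described above for the histogram figures can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original R implementation: peaks are half-open (start, end) intervals on the same chromosome as the gene, a tile is labeled enriched if any peak overlaps it, and strand orientation is ignored. The names `enrichment_row`, `enrichment_matrix` and `histogram` are hypothetical.

```python
HALF_WINDOW = 5000                  # +/- 5 kb around the TSS
TILE = 50                           # tile width in bp
N_TILES = 2 * HALF_WINDOW // TILE   # 200 columns


def enrichment_row(tss, peaks):
    """Return a list of 200 zeros/ones marking tiles overlapped by any peak.

    tss: transcription start site coordinate.
    peaks: list of half-open (start, end) intervals on the same chromosome
           (assumption; chromosome bookkeeping is omitted).
    """
    region_start = tss - HALF_WINDOW
    region_end = tss + HALF_WINDOW
    row = [0] * N_TILES
    for p_start, p_end in peaks:
        lo = max(p_start, region_start)
        hi = min(p_end, region_end)
        if lo >= hi:
            continue  # peak does not overlap the 10 kb region
        first = (lo - region_start) // TILE
        last = (hi - 1 - region_start) // TILE
        for i in range(first, last + 1):
            row[i] = 1
    return row


def enrichment_matrix(tss_list, peaks):
    """One row per gene (TSS), 200 columns, entries 0 or 1."""
    return [enrichment_row(tss, peaks) for tss in tss_list]


def histogram(matrix):
    """Column sums: number of genes enriched at each tile position."""
    return [sum(col) for col in zip(*matrix)]


# Toy input: two genes and a single peak centered on the first TSS.
matrix = enrichment_matrix([10000, 50000], [(9900, 10100)])
profile = histogram(matrix)
```

Summing the columns gives the number of genes enriched at each position relative to the TSS, which is the quantity a histogram of this kind would plot.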

Supplementary References

Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res 12:

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448:

Rahl, P.B., Lin, C.Y., Seila, A.C., Flynn, R.A., McCuine, S., Burge, C.B., Sharp, P.A., and Young, R.A. (2010). c-Myc regulates transcriptional pause release. Cell 141:

Smyth, G.K., Michaud, J., and Scott, H.S. (2005). Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21:

Zhang, X., Lian, Z., Padden, C., Gerstein, M.B., Rozowsky, J., Snyder, M., Gingeras, T.R., Kapranov, P., Weissman, S.M., and Newburger, P.E. (2009). A myelopoiesis-associated regulatory intergenic noncoding RNA transcript within the human HOXA cluster. Blood 113:

Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9: R137.

Supplementary Figure Legends

Figure S1. Characterization of the AFF4, ELL2 and CDK9 antibodies. (A) CDK9 antiserum specifically recognizes endogenous CDK9 protein. Whole cell extracts from ES cells infected with CDK9 shRNA or non-targeting shRNA lentivirus were prepared and analyzed by Western blotting with CDK9 antibody. (B-C) AFF4 and ELL2 antibodies specifically recognize their endogenous proteins. Whole cell extracts from HCT-116 cells infected with AFF4 or ELL2 shRNA or non-targeting shRNA lentivirus were prepared and analyzed by Western blotting with AFF4 or ELL2 antibody. Triangles indicate titrations of cell extracts. Tubulin serves as a loading control.

Figure S2. Genome-wide analysis of SEC components AFF4, ELL2 and CDK9 by ChIP-seq in ES cells finds SEC enriched at highly transcribed genes such as the histone loci. SEC subunits are enriched at the transcription start sites (TSS) of these genes, but can also travel with Pol II into the gene body.

Figure S3. Hoxa1, but not Hoxb1, contains the pausing form of RNA Polymerase II in untreated mouse ES (mES) cells. (A) The Hoxa1 promoter was preloaded with the S5, but not S2, phosphorylated form of Pol II, indicative of TFIIH activity. Also present on Hoxa1 are DSIF (represented by Spt5) and NELF (represented by NELFA). In contrast, Hoxb1 is devoid of any of these factors. ChIP-seq data are from Rahl et al. (2010). (B) By ChIP analysis, the general transcription factor TFIIB is present at the Hoxa1 promoter before RA treatment, but not at the Hoxb1 promoter. However, little or no TBP was detectable on the Hoxa1 and Hoxb1 gene promoters. Gapdh is a highly expressed gene and Hba1 is a non-transcribed gene in mES cells; these serve as positive and negative controls, respectively. Error bars represent the standard deviation.

Figure S4. The Hoxa1, but not the Hoxb1, promoter is preloaded with Pol II and recruits SEC after RA treatment in ES cells. (A) Bivalent marks, paused Pol II and SEC recruitment to the Hoxa1 promoter. In ES cells, the Hoxa1 and Hotairm1 regions are bivalently marked by H3K27me3 and H3K4me3. The Hoxa1 gene promoter is also preloaded with Pol II. (B) Bivalent marks and paused Pol II are absent from the Hoxb1 gene, which does not have detectable SEC after 6 hour RA treatment. While H3K27me3 also heavily marks the whole Hoxb1 region, there is no detectable H3K4me3 or Pol II at its promoter. Before RA treatment, there is no detectable AFF4 or ELL2 signal on the Hoxa1 or Hoxb1 genes. Both AFF4 and ELL2 are recruited to the Hoxa1, but not the Hoxb1, gene region after exposure to RA for 6 hours. The non-coding RNA Hotairm1, which shares its promoter region with Hoxa1, can also be induced by 6 hour RA treatment (Zhang et al. 2009), and both AFF4 and ELL2 are recruited to the Hotairm1 gene after RA treatment.

Figure S5. CDK9 is required for both Hoxa1 and Hoxb1 gene activation by RA. (A) ELL2 mRNA is specifically and significantly knocked down by ELL2 shRNA. Either an shRNA targeting ELL2 or a non-targeting shRNA (NonT) was introduced by lentiviral infection for 3 days. RT-qPCR was used to measure the mRNA levels of ELL, ELL2 and ELL3. (B) The Cdk9 inhibitor flavopiridol inhibits the activation of both Hoxa1 and Hoxb1 by RA treatment. ES cells were induced with RA for 1, 3 and 6 hours in the presence and absence of 1 μM flavopiridol. RT-qPCR was used to measure the mRNA levels of Hoxa1 and Hoxb1 at the indicated time points. Error bars represent the standard deviation.

Figure S6. ELL2 RNAi inhibits the induction of Cdx1 and Chac1, but not Csn3 and Nrip1, by RA. shRNA targeting ELL2 was introduced by lentiviral infection for 3 days before RA treatment. RT-qPCR was used to measure the mRNA levels of Cdx1, Chac1, Csn3 and Nrip1. RARg is a non-SEC target gene control. Error bars represent the standard deviation.

Figure S7. The P-TEFb complex is required for the activation of all genes highly induced by RA. (A) Cdk9 is recruited to the promoters of all genes highly induced by RA. Cdk9 ChIP was performed with ES cells in the presence and absence of RA for 6 and 18 hours (RA0, RA6 and RA8, respectively). (B) The Cdk9 inhibitor flavopiridol (FP) abolished RA-mediated gene activation. ES cells were induced with RA for 1, 3 and 6 hours in the presence and absence of 1 μM flavopiridol. RT-qPCR was used to measure the mRNA levels at the indicated time points. Error bars represent the standard deviation.

Figure S8. SEC is recruited to serum-induced genes. ChIP-seq of SEC subunits and Pol II in HCT-116 cells was performed before and after serum stimulation. (A) Histogram of the genome-wide occupancy of AFF4, ELL2 and Pol II. The TSS of each gene in the genome was used to measure the distance to the nearest bound region, which is plotted if it falls within 5 kb of the TSS. This analysis also shows that SEC components are enriched over the TSS, similar to Pol II occupancy. (B) Venn diagram analysis shows that 15 of the serum-induced genes recruit SEC (AFF4 with ELL2). SEC is newly recruited to 55 genes, where both AFF4 and ELL2 are co-bound after serum stimulation and not co-bound before stimulation. Of these 55 genes, 15 were induced more than 2-fold after serum treatment by RNA-seq analysis. The gene numbers reflect all genes meeting the above criteria that were not annotated with the biotype pseudogene or processed_transcript. (C) Comparison of RNA-seq expression levels after serum stimulation for gene subsets of all Pol II-bound and active genes. Genes co-bound by SEC show a statistically significant difference in expression versus all Pol II-bound and active genes (p < 1e-9 by Wilcoxon rank sum test). Expression is measured as fragments per kilobase of transcript per million reads aligned (FPKM) and shown as log2(FPKM). Active genes are defined as having an FPKM >= . (D) RNA-seq analysis of the fold-change in expression after serum stimulation compared to before stimulation for gene subsets of all Pol II-bound and active genes. SEC co-bound genes after serum stimulation show a statistically significant difference in fold-change compared to all Pol II-bound and active genes (p < 0.05 by Welch's two sample t-test). (E) RT-qPCR analysis of the induction kinetics of 17 serum-inducible genes. Genes that recruit SEC are shown in yellow and genes that do not recruit SEC are shown in blue. Thus, SEC is frequently associated with the most rapidly activated genes after serum stimulation.

Figure S9. The SEC/P-TEFb complex, but not the Brd4/P-TEFb complex, is required for Cyp26a1 gene activation by RA. (A) The P-TEFb inhibitor flavopiridol (FP) abolishes the induction of Cyp26a1 by RA. ES cells were induced with RA for 1, 3 and 6 hours in the presence and absence of 1 μM flavopiridol. RT-qPCR was used to measure the Cyp26a1 mRNA levels at the indicated time points. (B) Induction of Cyp26a1 with RA is not affected by Brd4 knockdown. Two different shRNA constructs targeting Brd4 and a non-targeting shRNA were introduced by lentiviral infection for 3 days before RA treatment. Total RNA was extracted from the treated cell samples and then subjected to RT-qPCR analysis. Error bars represent the standard deviation.
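The gene-selection criteria stated in the legend of Figure S8B (co-bound by AFF4 and ELL2 after serum stimulation, not co-bound before, and induced more than 2-fold) can be expressed as set operations. The following Python sketch is only an illustration of that logic. The gene identifiers and FPKM values are invented toy data, the function names are hypothetical, and skipping genes with zero FPKM before stimulation is an assumption made here to avoid division by zero.

```python
def newly_recruited(aff4_before, ell2_before, aff4_after, ell2_after):
    """Genes co-bound by AFF4 and ELL2 after stimulation but not co-bound before."""
    cobound_before = aff4_before & ell2_before
    cobound_after = aff4_after & ell2_after
    return cobound_after - cobound_before


def induced(fpkm_before, fpkm_after, min_fold=2.0):
    """Genes whose FPKM rose by more than min_fold after stimulation.

    Genes absent from fpkm_before or with zero FPKM before stimulation
    are skipped (assumption; the source does not say how these were handled).
    """
    return {
        gene
        for gene, after in fpkm_after.items()
        if fpkm_before.get(gene, 0) > 0 and after / fpkm_before[gene] > min_fold
    }


# Toy data with invented gene identifiers.
aff4_before = {"g1", "g2"}
ell2_before = {"g1"}
aff4_after = {"g1", "g2", "g3"}
ell2_after = {"g1", "g2", "g3"}
fpkm_before = {"g1": 10.0, "g2": 2.0, "g3": 5.0}
fpkm_after = {"g1": 11.0, "g2": 9.0, "g3": 6.0}

new_sec_genes = newly_recruited(aff4_before, ell2_before, aff4_after, ell2_after)
induced_sec_genes = new_sec_genes & induced(fpkm_before, fpkm_after)
```

In this toy input, two genes newly gain SEC and one of them also passes the 2-fold induction threshold, mirroring the way the legend narrows 55 newly bound genes to the 15 that are induced.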