SUPPLEMENTARY INFORMATION

[Figure 1 image: Northern blot panels for individual lincRNAs across tissues, each paired with a GAPDH control (1.5 kb).]
Figure 1: Northern blot validation of lincRNAs. RNA blot analysis was performed on 7 tissues (Brain, Liver, Lung, Thymus, Heart, Embryo, and Kidney). Hybridizations are shown for 12 randomly selected lincRNAs. A reference name for each lincRNA is shown on the left, and the predicted sizes are indicated on the right. For each tissue, GAPDH was hybridized as a control.

[Figure 2 image: scatter plot with a frequency graph on the right. Y-axis: normalized CSF score (CSF > 0, protein-coding RNA; CSF < 0, non-coding RNA). X-axis: Number of Informative Mammalian Alignments. Legend: Known Protein Coding Transcribed Segments; Intergenic K36me3 Transcribed Segments.]
Figure 2: Transcribed segments in intergenic K4/K36 domains do not have coding potential. The normalized CSF score for each exon determined from our tiling microarray is plotted. Red dots represent known protein-coding controls and blue dots represent novel exons, both determined by hybridization to a tiling array. The y-axis represents the normalized CSF score (CSF < 0, non-coding; CSF > 0, protein-coding). The x-axis is the number of mammalian species that contributed to the determination of coding potential. The frequency graph on the right indicates the proportion of novel intergenic exons that were seen at each value. The horizontal dashed line indicates the threshold separating protein-coding from non-coding calls.
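The thresholding described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the example scores, and the handling of a score of exactly zero (which the caption does not define) are assumptions.

```python
def classify_by_csf(score, threshold=0.0):
    """Label an exon from its normalized CSF score.

    Assumption: the caption only defines CSF > 0 and CSF < 0,
    so a score exactly at the threshold is left undetermined.
    """
    if score > threshold:
        return "protein-coding"
    if score < threshold:
        return "non-coding"
    return "undetermined"


def noncoding_fraction(scores, threshold=0.0):
    """Fraction of exons whose score falls below the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < threshold)
    return below / len(scores)


# Hypothetical normalized CSF scores for four novel intergenic exons.
example_scores = [-0.4, -0.1, 0.2, -0.3]
labels = [classify_by_csf(s) for s in example_scores]
fraction = noncoding_fraction(example_scores)
```

Here `fraction` corresponds to the proportion of exons lying below the dashed threshold line in the plot.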

[Guttman et al. Supplemental Figure 3 image: lincRNA Exon Conservation. Y-axis: Cumulative Frequency. X-axis: Pi LOD Enrichment. Curves: Introns, Exons, lincRNA Exons, UTRs, Intergenic FANTOM (alignable), Intergenic FANTOM (all).]
Figure 3. lincRNA exon conservation compared with FANTOM and UTRs. Sequence conservation across 21 mammalian species is plotted cumulatively for exons of lincRNA transcripts (blue), protein-coding exons (green), and introns of protein-coding genes (red), as well as alignable FANTOM exons (pink), all FANTOM exons (black), and UTRs (orange). The x-axis is the enrichment of the log odds (LOD) score of the Pi estimator (see Methods) normalized by random genomic regions; larger LOD scores therefore indicate higher conservation.

[Figure 4 image: (A) RNA Polymerase II Enrichment plotted against Location Relative to Start of K36me3 Unit (-10 kb to 10 kb). (B) Density plotted against Enrichment Level for intergenic lincRNA promoters and protein-coding promoters.]
Figure 4. lincRNA promoters are enriched for RNA Polymerase II binding. (A) The average enrichment of RNA Polymerase II as a function of the distance from the start of the K36me3 unit. (B) Distributions of the enrichment of RNA Polymerase II over the promoter regions of lincRNAs (blue), protein-coding genes (green), and random (non-masked) genomic regions (red).

[Figure 5 image: (a) heatmap of lincRNA exons across conditions. (b) Hierarchical tree of the conditions annotated with bootstrap confidence values.]
Figure 5. lincRNAs cluster experiments in biologically meaningful ways. A hierarchical clustering heatmap of lincRNAs across 19 conditions (Whole Embryo E9.5, E10.5, E13.5, Hindlimb E9.5, E10.5, E13.5, Forelimb E9.5, E10.5, E13.5, Embryonic Stem Cells, Embryonic Fibroblasts, Neural Progenitor Cells, Ovary, Testis, Lung, Brain, Dendritic Cells, TLR2, TLR4, TLR9) is shown. Blue represents low expression and red represents high expression. Bootstrapping was performed to determine the reproducibility of these clusters. The hierarchical tree and associated bootstrap values are shown at the bottom.

[Figure 6 image: (a) heatmap of protein-coding genes (rows) against lincRNAs (columns). (b) lincRNA-by-lincRNA correlation matrix.]
Figure 6. lincRNAs are correlated with groups of protein-coding genes. (A) A hierarchically clustered heatmap of the correlation between lincRNAs and protein-coding genes across 19 conditions is shown. Blue represents negative correlation, red represents positive correlation, and white indicates no correlation. lincRNAs are indicated along the columns and protein-coding genes along the rows. (B) Correlation matrix showing numerous blocks of lincRNAs with correlated expression (red) as well as blocks of lincRNAs with anticorrelated expression (blue).
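A matrix of the kind shown in Figure 6A can be sketched as below. The caption does not say which correlation measure was used, so Pearson correlation is an assumption here, and the function names and expression profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (norm_x * norm_y)


def correlation_matrix(gene_profiles, linc_profiles):
    """Rows are protein-coding genes, columns are lincRNAs (as in panel A)."""
    return {
        gene: {linc: pearson(g, l) for linc, l in linc_profiles.items()}
        for gene, g in gene_profiles.items()
    }


# Hypothetical expression values across three conditions.
genes = {"geneA": [1.0, 2.0, 3.0]}
lincs = {"linc1": [2.0, 4.0, 6.0], "linc2": [3.0, 2.0, 1.0]}
matrix = correlation_matrix(genes, lincs)
```

In this toy input, `linc1` rises with `geneA` and `linc2` falls as `geneA` rises, which would appear as a red and a blue cell respectively.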

[Figure 7 image: heatmaps for a Brain Cluster (gene sets such as GPCRS_CLASS_A_RHODOPSIN_LIKE) and a Muscle Cluster (gene sets such as MUSCLE_MYOSIN), each with a bar graph of Gene Ontology terms below. Bar graph axis: log(enrichment p-value).]
Figure 7. lincRNAs are associated with many biological processes. We performed biclustering on lincRNAs by functional terms and identified many significant biological processes, including (a) a brain cluster, (b) a muscle development cluster, and (c) a miRNA-regulated cluster. In the heatmaps of these biclusters, lincRNAs are indicated in columns and the gene set terms that they associate with are indicated in rows. Blue boxes represent negative association, red boxes represent positive association, and white boxes represent no association. Representative functional terms are indicated on the right of each heatmap. Gene Ontology was used to determine general categories of each bicluster, and the results are plotted as a bar graph below each cluster. The length of each bar represents the log(p-value) for the enrichment of each term.

[Figure 8 image: (a) DNA standards (Recombined:Un-Recombined), AdGFP, AdCre, and H2O lanes; Dox time course (U, 1h, 3h, 6h, 9h) for AdGFP and AdCre with p53 and Hsp90 blots. (b) Heatmap across T0, T3, T6, and T9 (117 exons, 39 lincRNAs). (c) Motif Score distributions, from low to high binding affinity, for p53 lincRNA promoters and all lincRNA promoters.]
Figure 8. p53-regulated lincRNAs. (a) DNA gel and Western blot showing the accurate reactivation of p53, as described (ref. 27). (b) A heatmap of 39 lincRNAs that show temporal induction across a p53 induction time course. (c) These RNAs are enriched for the p53 binding motif: p53-induced lincRNA promoters (red) compared with all lincRNA promoters (blue).

[Figure 9 image: screen figure from Ivanova et al. (top); K4-K36 tracks for ES, MEF, MLF, and NPC with ES RNA peaks. Gene Set Association (FDR < .001): 1) Embryonic Stem Cell Upregulated, 2) Cell proliferation, 3) DNA replication.]
Figure 9. A lincRNA functionally associated with cell proliferation in ES cells is required to maintain proper cell proliferation rates in ES cells. A figure from Ivanova et al. describing the screen they performed to identify genes involved in ES pluripotency (top). One of the top 10 hits is a lincRNA expressed only in ES cells and functionally associated with cell proliferation in ES cells (boxed gene). The K4-K36 domain across ES, MEF, MLF, and NPC is shown along with the RNA peaks identified in ES (red).

Figure 10. lincRNAs are positionally enriched near transcription factors. (a) Gene Ontology (GO) enrichment was computed for all lincRNA neighbors and is plotted as the log(p-value). (b) Swiss-Prot keywords from DAVID and enriched domains are plotted as the log(p-value). (c) The distribution of enriched GO terms for protein-coding genes that neighbor lincRNAs is displayed. (d) The distribution of enriched GO terms for randomized lincRNA datasets (see Methods) and their associated gene neighbors is indicated. (e) Pie charts show the proportion of genes annotated as transcription factors (red) and other genes (blue). The proportion among lincRNA neighbor genes is shown on the left, compared with all known protein-coding genes on the right.

[Figure 11 image: two bar charts, by Repeat Type (top) and Repeat Subtype (bottom). Y-axis: Repeat Enrichment over random. Series: lincRNA and Protein Coding Genes.]
Figure 11. Repeat content enrichment in lincRNAs. Bar charts show the enrichment of repeat elements of various types (top) and subtypes (bottom) for lincRNAs (dark gray bars) and protein-coding genes (light gray bars). The average enrichment is indicated by the height of each bar, and the error bars indicate the standard deviation of the estimate. Enrichment is defined as the number of repeats divided by the expected number of repeats for random regions of equal size.
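The enrichment definition in the Figure 11 caption can be illustrated with a short sketch. The caption does not state how the expected count or the standard deviation was estimated; here the expected count is assumed to be the mean count over size-matched random regions, and the bar height and error bar are assumed to be the mean and sample standard deviation of per-region enrichments. All names and numbers are hypothetical.

```python
from statistics import mean, stdev


def repeat_enrichment(observed_count, random_counts):
    """Observed repeats divided by the expected number of repeats.

    Assumption: the expectation is the mean repeat count over
    random regions of equal size.
    """
    expected = mean(random_counts)
    if expected == 0:
        raise ValueError("expected repeat count is zero; enrichment undefined")
    return observed_count / expected


def summarize(enrichments):
    """Bar height (mean) and error bar (sample standard deviation)."""
    return mean(enrichments), stdev(enrichments)


# Hypothetical counts: 6 repeats observed, 2 to 4 in random regions.
single = repeat_enrichment(6, [2, 4, 3, 3])
bar_height, error_bar = summarize([1.0, 2.0, 3.0])
```

An enrichment above 1 means more repeats than expected in random regions of equal size, and a value below 1 means fewer.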