Understanding Transcripts in UCSC Genome Browser


1 Understanding Transcripts in UCSC Genome Browser. Cline and Kent, Nature Biotechnology 27, (2009)

2 Gene Annotations

3 Gene Annotations 2

4 Galaxy

Databases are not analysis tools
Databases are where you get the data. Browsers are where you visualize the results. For a bench biologist there is not much in between besides spreadsheets or Perl scripting.

No tools for new datatypes
Some datatypes generated by high-throughput genomics are so new that there are no tools to analyze them. For example, how do you extract sequences of coding exons from the latest 28-way alignments of vertebrate genomes, or analyze quality scores from 454/Solexa/SOLiD? With Galaxy.

Genomics is not really reproducible
The Methods sections of too many papers read as if the data were analyzed using a collection of in-house scripts. How do you repeat such a study? Galaxy saves every step of your analysis and allows you to share these workflows with others.

Too many tools
Bioinformatics publishes hundreds of application notes per year. How does one know which tool to use? Galaxy integrates a multitude of different tools by giving them the same look and feel and linking them to data warehouses.


6 Screencasts and documentation

Introduction and Examples
A picture is worth a thousand words, and by the same token a screencast is worth a million. This page lists screencasts highlighting fundamental aspects of Galaxy's functionality. For more screencasts and for older versions click here

* Introduction to Galaxy Interface (12 min): This screencast is a quick introduction to the Galaxy interface. As an example we download coordinates of all human exons and analyze their lengths to find the longest one.
* Galaxy and UCSC (4 min): Explains how to use the direct connection between the UCSC Table Browser and Galaxy.
* More on History (3 min): An in-depth look at sharing data and analyses within Galaxy.
* Promoters and SNP (9 min): How to find promoters under relaxed selective constraint?
* DNase I hypersensitive sites (5 min): How many genes have DNase I hypersensitive sites at the 5'-end?
* ChIP-on-chip: Affy Sp1 versus CpGs (7 min): Which of the Sp1-enriched sites overlap with genomic annotations?
* ChIP-on-chip: Affy Sp1 versus Genes (7 min): Which of the Sp1-enriched sites are within upstream 5 kb of protein-coding genes?
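The exon-length example from the introductory screencast can also be reproduced outside Galaxy. The following is a minimal sketch, assuming the exon coordinates have been exported in BED format (0-based start, exclusive end); the names `Exon`, `parse_bed`, and `longest_exon` and the sample records are hypothetical and do not come from the screencast.

```python
from typing import Iterable, List, NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def parse_bed(lines: Iterable[str]) -> List[Exon]:
    """Parse the first four columns of tab-separated BED lines."""
    exons = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and browser/track/comment header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # The name column is optional in BED; use "." when it is absent.
        name = fields[3] if len(fields) > 3 else "."
        exons.append(Exon(chrom, start, end, name))
    return exons


def longest_exon(exons: List[Exon]) -> Exon:
    """Return the exon with the greatest length (end - start)."""
    return max(exons, key=lambda e: e.end - e.start)


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        "chr1\t100\t250\texonA",
        "chr1\t300\t900\texonB",
        "chr2\t0\t50\texonC",
    ]
    best = longest_exon(parse_bed(sample))
    print(best.name, best.end - best.start)
```

Because BED intervals are half-open, the length is simply `end - start`, with no off-by-one correction.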

7 ChIP-Seq

8 ChIP-Seq,...

9 ChIP-Seq data integration

Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells

Xi Chen (1,2,6), Han Xu (3,6), Ping Yuan (1), Fang Fang (1,2), Mikael Huss (4), Vinsensius B. Vega (3), Eleanor Wong (5), Yuriy L. Orlov (4), Weiwei Zhang (1,2), Jianming Jiang (1,2), Yuin-Han Loh (1,2), Hock Chuan Yeo (4), Zhen Xuan Yeo (4), Vipin Narang (3), Kunde Ramamoorthy Govindarajan (3), Bernard Leong (3), Atif Shahab (3), Yijun Ruan (5), Guillaume Bourque (3), Wing-Kin Sung (3), Neil D. Clarke (4), Chia-Lin Wei (5,*), and Huck-Hui Ng (1,2,*)

1 Gene Regulation Laboratory, Genome Institute of Singapore, Singapore
2 Department of Biological Sciences, National University of Singapore, Singapore
3 Computational and Mathematical Biology
4 Computational and Systems Biology Group
5 Genome Technology and Biology Group
Genome Institute of Singapore, Singapore
6 These authors contributed equally to this work
*Correspondence: weicl@gis.a-star.edu.sg (C.-L.W.), nghh@gis.a-star.edu.sg (H.-H.N.)
DOI /j.cell

SUMMARY
Transcription factors (TFs) and their specific interactions with targets are crucial for specifying gene-expression programs. To gain insights into the transcriptional regulatory networks in embryonic stem (ES) cells, we use chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-myc, n-myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12). These factors are known to play different roles in ES-cell biology as components of the LIF and BMP signaling pathways, self-renewal regulators, and key reprogramming factors. Our study provides insights into the integration of the signaling pathways into the ES-cell-specific transcription circuitries. Intriguingly, we find specific genomic regions extensively targeted by different TFs. Collectively, the comprehensive mapping of TF-binding ...

... creation of genetically altered animals (Thomas and Capecchi, 1986). In addition, human ES cells can potentially serve as an inexhaustible source of cells for the derivation of clinically useful cells for regenerative medicine and cell-based therapy. Mouse ES cells were first isolated in 1981 from mouse blastocysts (Smith, 2001). Maintenance of the self-renewing state of mouse ES cells requires the cytokine leukemia inhibitory factor (LIF). The binding of LIF to its receptor activates STAT3 through phosphorylation (Niwa et al., 1998). LIF alone, however, is not sufficient to maintain ES cells, as their maintenance requires the presence of fetal calf serum. Bone morphogenetic proteins (BMPs) appear to be key serum-derived factors that act in conjunction with LIF to enhance the self-renewal and pluripotency of mouse ES cells (Ying et al., 2003). The binding of BMP4 to its receptors triggers the phosphorylation of Smad1 and activates the expression of members of the Id (inhibitor of differentiation) gene family. As ES cells overexpressing Ids can self-renew in the absence of BMP4, it is proposed that induction of Id expression is the critical contribution of the BMP/Smad pathway. Hence, the LIF and BMP signaling pathways play a central role in the maintenance of a pluripotential stem cell phenotype. Besides these signaling pathways, which sense the presence of extrinsic growth factors in the environment, intrinsic factors ...

We converted their supplemental Excel sheets, which contain all peaks for the factors, into BED files.
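The spreadsheet-to-BED conversion can be illustrated as follows. This is only a sketch under stated assumptions, not the procedure actually used for the slides: it assumes each sheet was first saved as tab-delimited text with a header row naming the columns `chrom`, `start`, and `end`, and that the coordinates are 1-based and inclusive. The real layout of the supplemental sheets is not given here, and the function name `peaks_to_bed` and the peak-naming scheme are hypothetical.

```python
import csv
from typing import Iterable, Iterator


def peaks_to_bed(table: Iterable[str], factor: str,
                 one_based: bool = True) -> Iterator[str]:
    """Yield 4-column BED lines from a tab-delimited peak table.

    Assumed (hypothetical) input columns: chrom, start, end.
    """
    reader = csv.DictReader(table, delimiter="\t")
    for i, row in enumerate(reader, start=1):
        start = int(row["start"])
        end = int(row["end"])
        if one_based:
            # BED starts are 0-based; a 1-based inclusive end already
            # equals the 0-based exclusive end, so only start shifts.
            start -= 1
        yield f"{row['chrom']}\t{start}\t{end}\t{factor}_peak{i}"


if __name__ == "__main__":
    # Made-up rows for illustration only.
    table = ["chrom\tstart\tend", "chr1\t101\t200", "chr2\t5\t10"]
    for bed_line in peaks_to_bed(table, "Nanog"):
        print(bed_line)
```

If the sheets already use 0-based starts, pass `one_based=False` so the coordinates are written unchanged.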