The MethDB DAS Server
[Epigenetics 1:2, e1-e5, EPUB Ahead of Print: March/April 2006]; 2006 Landes Bioscience

Research Paper

The MethDB DAS Server: Adding an Epigenetic Information Layer to the Human Genome

Vincent Negre, Christoph Grunau*

Institut de Génétique Humaine; CNRS UPR 1142; Montpellier, France

*Correspondence to: Christoph Grunau; Institut de Génétique Humaine; CNRS UPR 1142; 141 rue de la Cardonille; Montpellier; France; Tel.: ; Fax: ;

Received 10/03/06; Accepted 04/05/06

This manuscript has been published online, prior to printing for Epigenetics, Volume 1, Issue 2. Definitive page numbers have not been assigned. The current citation is: Epigenetics 2006; 1(2): Once the issue is complete and page numbers have been assigned, the citation will change accordingly.

KEY WORDS

biological database, distributed annotation system, DNA methylation, human genome

ABBREVIATIONS

MethDB: DNA methylation database
DAS: distributed annotation system
LDAS: lightweight DAS server
CGI: CpG island

ACKNOWLEDGEMENTS

This work was supported by a grant of the BioSTIC Languedoc-Roussillon. We are grateful for technical support from the bioinformatics group of the Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier (LIRMM).

ABSTRACT

The DNA methylation database MethDB was developed in order to standardize and collect the dispersed data about this epigenetic phenomenon in a common resource. In the first version of MethDB, data was gathered by annotators and the database could only be queried. In a second step, we added an on-line data submission system that is open to the public. Here we present the DAS annotation server of MethDB that allows integration of MethDB into the network of biological databases via the Distributed Annotation System (DAS) and the representation of DNA methylation data as an epigenetic information layer to the human genome.
In order to validate our system and to incorporate the data of the first large-scale methylation analysis of the human genome, we assembled the sequences of the human CpG island tagging project into CpG islands and imported them into MethDB. The database now contains methylation content data and 5382 methylation patterns or profiles for 48 species, 1511 individuals, 198 tissues and cell lines and 79 phenotypes.

INTRODUCTION

Methylation of cytosine residues is a covalent modification of genomic DNA that adds epigenetic information to the primary DNA sequence. The goal of the DNA methylation database MethDB is to collect, standardize and annotate the available DNA methylation data and to make them available. The database is the major source for experimentally confirmed DNA methylation data. These data are regularly consulted via a dedicated web server. Recently, MethDB provided the entire training data set for the development of an algorithm to predict methylated sites in the human genome (1). Here we describe the establishment of a new service that allows the integration of epigenetic data stored in MethDB into the network of biological databases via the Distributed Annotation System (DAS) (2).

METHODS

Annotation files for the MethDB LDAS were generated with el-dasionator ( atgc.lirmm.fr/cgi-bin/ldas/form.pl) (3). CpG island sequences were downloaded as a FASTA file and assembled into contigs with the Staden package (4). Details are shown in Table 1.

RESULTS

Establishment of a distributed annotation system annotation server for MethDB. Methylation can be thought of as an additional information layer on the DNA. The idea of information layers is also applied in the Distributed Annotation System, where annotations are anchored to a reference sequence (e.g. the human genome) and superposed in appropriate browsers as layers of annotations.
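To make the layer idea concrete, the following minimal sketch shows how a client might ask a DAS annotation server for the features anchored to one segment of the reference sequence and read the reply. It assumes the DAS 1.x convention of a "features" request with a segment=reference:start,stop argument answered by a DASGFF XML document; the host name, the data source name "methdb" and the sample feature are hypothetical placeholders, not actual MethDB values, and no network access is performed.

```python
import xml.etree.ElementTree as ET


def features_url(base_url, source, reference, start, stop):
    """Build a DAS 1.x 'features' request for one segment of a reference sequence."""
    return f"{base_url.rstrip('/')}/das/{source}/features?segment={reference}:{start},{stop}"


# Hypothetical reply in DASGFF layout; the feature below is invented for illustration.
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/methdb/features">
    <SEGMENT id="17" start="1000" stop="5000">
      <FEATURE id="example_feature_1" label="example feature">
        <TYPE id="methylation">methylation</TYPE>
        <METHOD id="experimental">experimental</METHOD>
        <START>1200</START>
        <END>1800</END>
        <SCORE>0</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""


def parse_features(xml_text):
    """Return one dict per FEATURE element, with coordinates on the reference sequence."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            score_text = feature.findtext("SCORE")
            # DAS servers may report '-' when no score is available.
            score = None if score_text in (None, "", "-") else float(score_text)
            features.append({
                "reference": segment.get("id"),
                "id": feature.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "score": score,
            })
    return features


if __name__ == "__main__":
    print(features_url("http://das.example.org", "methdb", "17", 1000, 5000))
    for feature in parse_features(SAMPLE_RESPONSE):
        print(feature)
```

Because every annotation server answers the same kind of request against the same reference coordinates, a browser can stack the replies of several servers as independent layers.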
Data exchange between reference server and annotation server follows standardized protocols; consequently, several independent annotation servers can be connected to a reference sequence. We established an annotation server for MethDB using the Lightweight DAS (LDAS) server package ( servers/). Human DNA sequences for which methylation data are available in MethDB were aligned to the Ensembl reference sequence, and LDAS-compatible annotation files were generated with el-dasionator. The MethDB DAS server is updated at regular intervals and can be accessed using the DAS protocol. The Ensembl ContigView (5) is a popular genome browser that uses the DAS
protocol. It provides a reference server and allows for the attachment of external DAS sources. A detailed description of how the MethDB DAS server can be attached to the Ensembl ContigView is available on the MethDB home page. For a given computer and a given web browser, this attachment procedure has to be done only once. Since several JavaScripts have recently been included in the Ensembl browser, it appears that for compatibility reasons Firefox (www.mozilla.com) is the web browser of choice. A list of web browsers that have been successfully tested is available on our instruction page. After attachment, regions for which methylation data are available are represented in the Ensembl ContigView as colored rectangles superimposed on the corresponding genomic sequence. These rectangles are hyperlinked to MethDB; clicking on them displays basic information and provides direct access to the corresponding methylation data in MethDB. For a particular human locus, MethDB can now be queried by two alternative approaches: either directly via the generic query form of MethDB or via the Ensembl browser (or any other DAS-compatible system). To our knowledge, MethDB is the only DNA methylation database that can be directly integrated into Ensembl.

Integration of human CpG island sequences into MethDB. In mammals, methylation is predominantly found in cytosine residues followed by a guanine residue (CpG pair). CpG pairs are underrepresented in the genome except for CpG islands (CGI), where they occur at the statistically expected frequency. CpG islands in the 5' region of genes can be free of methylation while the rest of the genome is methylated. In an experimental approach based on this characteristic hypomethylation, Cross and colleagues had generated a library of human CpG islands using an affinity column and dividing the genome into methylated and unmethylated parts (6). Later, this library was sequenced by the CpG island tagging project, and sequences were made available for download at HGP/cgi.shtml as a single file of concatenated unordered FASTA sequences. In this form, the sequences could not be queried, and no information about the location on the genome was present. In addition, the sequence files still contained vector contaminations. In order to make accessible the information that these particular DNA sequences are hypomethylated, and to show that the MethDB DAS server is capable of handling large data sets, we downloaded the sequences, filtered them and assembled them into contigs. These sequences, representing experimentally confirmed hypomethylated areas in the genome, were imported into MethDB, and an arbitrary methylation score of 0 was assigned to them (0 = no methylation, 1 = maximum methylation). % (1618 sequences) do not contain CpG pairs and must probably be considered false positives. Data upload to MethDB was accomplished through a simple script, and further processing to the DAS server was done as outlined above.

Table 1. Assembly of sequences from the CpG island tagging project

Treatment: Number of sequences after treatment
Download: [value missing]
First filter ( 100 bp, 5% unknown bases): [value missing]
Pre-gap with automatic vector clipping: [value missing]
Gap4 alignment with 20 nucleotides initial match, 25 maximum pads per read, 5% maximum mismatch: [value missing] contigs assembled, average length 163 bp (2265 without CpGs)
Final filter ( 50 bp, 5% unknown bases): [value missing] (1618 without CpGs)

Figure 1. Representation of independent cross-confirmation of experimental results. Ensembl ContigView representation of a region of chromosome 17. The upper lane of the annotations for the reverse strand shows the exon-intron structure ("Ensembl trans."). The following lane ("CpG islands") shows predicted CpG islands. The annotation layer "CPG island clones" is empty. The following layer contains DNA methylation annotations in MethDB and is labeled "MethDB". Clicking on each colored rectangle provides additional information (not shown). Annotations in the MethDB layer are hyperlinked to MethDB. The predicted CGI and the independent methylation data in MethDB point toward the same region around Mb.

Figure 2. Superposition of independent annotation layers allows for the reconstruction of CGI. ContigView of a region of chromosome 11 around the CALCA gene. The green rectangle corresponds to the CGI assembled from annotation layers "MethDB" and "CPG island clones".

DISCUSSION

Using a large CGI sequence data set, we showed that MethDB is capable of integrating several thousand sequence-anchored methylation data and representing them as an annotation layer via DAS. Because it is still not entirely clear what defines the genomic regions that become unmethylated or methylated during development, we believe that the superposition of different information such as sites of transcription, exon positions and epigenetic data will help to put forward experimentally testable hypotheses. In the following, we give examples of how MethDB data can be used in combination with other information layers to cross-confirm methylation data from different sources, identify the borders of CGI and visualize the methylation within, reconstruct methylation profiles along a gene, rapidly identify problems in the experimental set-up, identify new CGI, and better plan experiments based on readily available data.

Example 1: CGI of BRCA1. Cross-confirmation of experimental data. Figure 1 shows the superposition of three information sources representing the exon-intron structure of BRCA1, the location of a predicted CGI, and experimental methylation data in MethDB.
The location of the transcription start, the G + C content and the CpG density suggest a CGI around Mb (Fig. 1). The MethDB annotation CpG_island:10521 confirms the existence of an experimentally confirmed hypomethylated area in this region. MethDB annotations 5mC:101, 102 and 106 represent independent experiments that analyzed the methylation state of this region. These are literature data linked to the corresponding publication for further details. All independent data sources point toward the same area, resulting in mutual confirmation.

Figure 3. Methylation data along the GSTP1 gene ( &h=ensg ). Clicking on the MethDB annotation links leads to detailed methylation data for each sequence segment represented by blue rectangles. The information can be used to reconstitute methylation profiles using different data sources.

Figure 4. Direct visualization of strand-specific data. Ensembl ContigView of a region around the GLA gene ( contigview?region=x&vc_start= &vc_end= &h=otthumg ). Two annotations are available in MethDB confirming the presence of a CGI. Annotation 5mC:6 points towards experimental data that were obtained analyzing the GLA locus, but the reverse strand was actually investigated.

Example 2: CALCA. Reconstruction of a CGI. CGI can be predicted with bioinformatics tools. However, the sensitivity and specificity of such a prediction naturally depend on the chosen parameters, and, like any prediction, it needs experimental confirmation. A more appropriate approach would be to identify CGI based on their hypomethylation state. In large-scale CGI cloning approaches, large CGI will for technical reasons not be covered entirely but must be reconstructed. Superposition of different data sources facilitates this task. An example is shown in Figure 2. Three different annotation layers are shown: the intron-exon structure of CALCA, CPG island clones of an independent CGI cloning project, and annotations of MethDB. No CGI is predicted in this region by means of bioinformatics. Merging the data from the independent CGI cloning projects shows that the CGI actually spans from approximately position 14,949,850 to 14,953,050. For a subregion, further methylation data are available in MethDB (5mC:71). None of the individual projects alone would have delivered this result, but the combination of data results in a natural reconstruction of the most probable CGI.

Example 3: GSTP1. Generation of methylation profiles along a region. Technical constraints limit the length of sequence stretches that can be analyzed for site-specific methylation.
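Both the CGI reconstruction of Example 2 and the profile reconstruction introduced here come down to merging overlapping or neighboring intervals on the reference sequence. The sketch below illustrates that step under stated assumptions: the function name, the max_gap parameter and the coordinates are hypothetical and chosen for illustration only; they are not taken from MethDB or from the CpG island tagging project.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (start, end) intervals that overlap or lie within max_gap bases of each other."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            # Overlapping or close enough: extend the current merged region.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # Too far from the previous region: start a new one.
            merged.append([start, end])
    return [tuple(region) for region in merged]


if __name__ == "__main__":
    # Hypothetical fragments from two annotation layers covering one locus.
    clone_layer = [(100, 200), (400, 500)]
    methylation_layer = [(150, 300)]
    print(merge_intervals(clone_layer + methylation_layer))
    print(merge_intervals(clone_layer + methylation_layer, max_gap=100))
```

A non-zero max_gap treats neighboring fragments as one region, which is a design choice that should be made per locus rather than applied blindly.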
In general, overlapping or neighboring sequence fragments are analyzed. The reconstruction of methylation profiles can be relatively tedious. The representation as an annotation layer makes this task straightforward. An example is shown in Figure 3. Analysis of MethDB annotations 5mC:37-56 shows that the hypomethylated area actually spans from position 67,107,300 to 67,108,200.

Example 4: Problematic data. Misassignments of methylation patterns to specific genes or genomic locations are surprisingly frequent in the literature. Since MethDB is a curated database, most of these incorrect results have been eliminated during data processing. However, errors cannot be entirely excluded. Representation as an information layer facilitates the identification of problematic data. The MethDB annotation 5mC:6 in Figure 4 actually refers to a methylation analysis of the GLA gene on the reverse strand. Here, the wrong strand was investigated. Since DNA methylation is probably symmetric, this does not invalidate the conclusions drawn in the original publication. However, annotation CpG_island:597 (Fig. 4) confirms the existence of a CGI in this area that is shared between GLA and HNRPH2, and critical re-evaluation of the original findings might be useful.

Example 5: Identification of new CGIs. The superposition of different annotation layers allows for the identification of new CGI with potentially interesting methylation profiles. An example is shown in Figure 5. In a region upstream of p16-ink4a (CDKN2A), both the CPG island clones and MethDB contain several hints for previously unidentified CGI. We decided to further analyze this region and extracted a FASTA sequence of chromosome 9 from position 21,954,000 to 21,960,000. This can easily be done using the corresponding feature of the Ensembl ContigView. We next used MethPrimer for the prediction of CGI.
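As an illustration of what sequence-based CGI prediction measures, the sketch below computes the G + C content and the observed/expected CpG ratio of a sequence and scans it with a sliding window. This is not MethPrimer's implementation; the thresholds (200 bp window, G + C content above 50%, observed/expected ratio above 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are assumptions here, as are the function names and the toy sequence.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / n
    # Expected number of CpG pairs if C and G were distributed independently.
    expected = c_count * g_count / n
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def candidate_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start offsets of windows exceeding both thresholds."""
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        gc_fraction, ratio = cpg_stats(seq[start:start + window])
        if gc_fraction > min_gc and ratio > min_ratio:
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence: a CpG-rich stretch followed by an AT-rich stretch.
    toy = "CG" * 100 + "AT" * 100
    print(cpg_stats(toy[:200]))
    print(candidate_windows(toy, window=200, step=100))
```

Adjacent hit windows would then be merged into candidate islands; as noted in Example 2, the outcome depends on the chosen parameters and needs experimental confirmation.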
This combination of experimental data and subsequent detailed bioinformatics analysis allowed the identification of three new CGI. The most 5' CGI is probably associated with a gene for an mRNA with unknown function (EMBL ac.nr. AK128836) that was isolated from a testes library (Fig. 5, annotation layer "Ensembl mrna"); the two 3' CGI are located within CDKN2A and might have a function in the regulation of this gene. Investigation of their methylation status appears to be worthwhile.

Figure 5. Identification of new CpG islands. Upper lane: Ensembl ContigView of a region downstream of the CDKN2A gene with experimental data in annotation layers "CPG island clones" and "MethDB" that suggest the existence of three previously unidentified CGI, represented by green rectangles. Lower lane: representation of the bioinformatics analysis of the same region using MethPrimer. Both experimental data and the prediction software point towards the same regions.

Example 6: MethDB allows better experiment planning. Predicted CGI often span several hundred base pairs, and a detailed analysis of the entire region is costly and labor-intensive. Additional experimental information would be welcome to narrow down unmethylated candidate regions. We have recently investigated the methylation status of the promoter region of the human gene for cyclin D1 (CCND1) ( start= &vc_end= ). The predicted CGI had a length of more than 3 kb, and the analysis of the full sequence was not feasible. MethDB data from the CGI mapping project indicated two hypomethylated subregions within the predicted CGI (CpG_island: and CpG_island: ). Further examination of the sequence with bioinformatics tools and reporter assays allowed us to identify the actual promoter region adjacent to CpG_island:13960, and the DNA methylation status was studied in a sequence fragment overlapping this annotated region. A detailed description of the experiments has been published (7), and data will be available in MethDB with the next regular update.

From a technical point of view, it is noteworthy that the sequences of most CpG islands are now probably available in MethDB, and further methylation data that relate in general to CpG islands (e.g. from CGI micro-arrays) can be anchored to these sequences. External data can also be linked to MethDB using the sequence ID (example: ). The MethDB DAS server is hosted by the Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier (LIRMM) on a recently acquired mainframe, resulting in excellent performance of this server. The speed of information display in the Ensembl ContigView depends on the number of information layers the user chooses to view. The user should be aware that each time the display is refreshed, an enormous amount of data is requested via the network, and should restrict the displayed features accordingly to shorten response times. The increased amount of data in MethDB had also led to increased access times using the conventional web form, and we decided to update the server hardware. Even complex searches now take less than 6 seconds.
Large-scale DNA methylation mapping projects will in general develop their own databases and often provide some way to represent their data in relation to the genomic sequence. However, medium-size and single-locus studies will generally not make their data available in a database-compatible format. This is regrettable, since these studies usually deliver high-quality data that must be unearthed by conventional bibliographic searches. The availability of a DAS server for MethDB now allows integrating such data into an epigenetic information layer for the human genome, using them concomitantly with other data sources, and profiting from mutual confirmation or the highlighting of conflicting results.

References

1. Bhasin M, Zhang H, Reinherz EL, Reche PA. Prediction of methylated CpGs in DNA sequences using a support vector machine. FEBS Lett 2005; 579:
2. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics 2001; 2:7
3. Negre V, Grunau C. el-dasionator: an LDAS upload file generator. BMC Bioinformatics 2004; 5:55
4. Staden R, Beal KF, Bonfield JK. The Staden package, Methods Mol Biol 2000; 132:
5. Hubbard T, Andrews D, Caccamo M, Cameron G, Chen Y, Clamp M, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Gilbert J, Hammond M, Herrero J, Hotz H, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Kokocinski F, London D, Longden I, McVicker G, Melsopp C, Meidl P, Potter S, Proctor G, Rae M, Rios D, Schuster M, Searle S, Severin J, Slater G, Smedley D, Smith J, Spooner W, Stabenau A, Stalker J, Storey R, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Birney E. Ensembl Nucleic Acids Res 2005; 33 Database Issue:D
6. Cross SH, Charlton JA, Nan X, Bird AP. Purification of CpG islands using a methylated DNA binding column. Nat Genet 1994; 6:
7. Krieger S, Grunau C, Sabbah M, Sola B. Cyclin D1 gene activation in human myeloma cells is independent of DNA hypomethylation or histone hyperacetylation. Exp Hematol 2005; 33:
More informationPackage DTA. April 11, 2018
Type Package Title Dynamic Transcriptome Analysis Version 2.24.0 Date 2012-03-22 Package DTA April 11, 2018 Author Bjoern Schwalb, Benedikt Zacher, Sebastian Duemcke, Achim Tresch Maintainer Bjoern Schwalb
More information2/19/13. Contents. Applications of HMMs in Epigenomics
2/19/13 I529: Machine Learning in Bioinformatics (Spring 2013) Contents Applications of HMMs in Epigenomics Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Background:
More informationGenome annotation & EST
Genome annotation & EST What is genome annotation? The process of taking the raw DNA sequence produced by the genome sequence projects and adding the layers of analysis and interpretation necessary
More informationThe Genome Analysis Centre. Building Excellence in Genomics and Computational Bioscience
Building Excellence in Genomics and Computational Bioscience Wheat genome sequencing: an update from TGAC Sequencing Technology Development now Plant & Microbial Genomics Group Leader Matthew Clark matt.clark@tgac.ac.uk
More informationReading Lecture 8: Lecture 9: Lecture 8. DNA Libraries. Definition Types Construction
Lecture 8 Reading Lecture 8: 96-110 Lecture 9: 111-120 DNA Libraries Definition Types Construction 142 DNA Libraries A DNA library is a collection of clones of genomic fragments or cdnas from a certain
More informationSection C: The Control of Gene Expression
Section C: The Control of Gene Expression 1. Each cell of a multicellular eukaryote expresses only a small fraction of its genes 2. The control of gene expression can occur at any step in the pathway from
More informationConstruction of plant complementation vector and generation of transgenic plants
MATERIAL S AND METHODS Plant materials and growth conditions Arabidopsis ecotype Columbia (Col0) was used for this study. SALK_072009, SALK_076309, and SALK_027645 were obtained from the Arabidopsis Biological
More informationEPIGENTEK. EpiQuik Methylated DNA Immunoprecipitation Kit. Base Catalog # P-2019 PLEASE READ THIS ENTIRE USER GUIDE BEFORE USE
EpiQuik Methylated DNA Immunoprecipitation Kit Base Catalog # PLEASE READ THIS ENTIRE USER GUIDE BEFORE USE The EpiQuik MeDIP Kit can be used for immunoprecipitating the methylated DNA from a broad range
More informationBioinformatics Course AA 2017/2018 Tutorial 2
UNIVERSITÀ DEGLI STUDI DI PAVIA - FACOLTÀ DI SCIENZE MM.FF.NN. - LM MOLECULAR BIOLOGY AND GENETICS Bioinformatics Course AA 2017/2018 Tutorial 2 Anna Maria Floriano annamaria.floriano01@universitadipavia.it
More informationUnit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression
Unit 1: DNA and the Genome Sub-Topic (1.3) Gene Expression Unit 1: DNA and the Genome Sub-Topic (1.3) Gene Expression On completion of this subtopic I will be able to State the meanings of the terms genotype,
More informationGuided tour to Ensembl
Guided tour to Ensembl Introduction Introduction to the Ensembl project Walk-through of the browser Variations and Functional Genomics Comparative Genomics BioMart Ensembl Genome browser http://www.ensembl.org
More informationResult Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for
Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for you to discover methylation changes at specific genomic
More information(Candidate Gene Selection Protocol for Pig cdna Chip Manufacture Using TIGR Gene Indices)
(Candidate Gene Selection Protocol for Pig Chip Manufacture Using TIGR Gene Indices) Chip Chip Chip Red Hat Linux 80 MySQL Perl Script TIGR(The Institute for Genome Research http://wwwtigrorg) SsGI (Sus
More informationBasics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility
2018 ABRF Meeting Satellite Workshop 4 Bridging the Gap: Isolation to Translation (Single Cell RNA-Seq) Sunday, April 22 Basics of RNA-Seq (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly,
More informationEnsembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets
Ensembl workshop Thomas Randall, PhD tarandal@email.unc.edu bioinformatics.unc.edu www.unc.edu/~tarandal/ensembl handouts, papers, datasets Ensembl is a joint project between EMBL - EBI and the Sanger
More informationHomework 4. Due in class, Wednesday, November 10, 2004
1 GCB 535 / CIS 535 Fall 2004 Homework 4 Due in class, Wednesday, November 10, 2004 Comparative genomics 1. (6 pts) In Loots s paper (http://www.seas.upenn.edu/~cis535/lab/sciences-loots.pdf), the authors
More informationSars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway
Joseph F. Ryan* Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen, Norway Current Address: Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine,
More informationMarch 9, Hidden Markov Models and. BioInformatics, Part I. Steven R. Dunbar. Intro. BioInformatics Problem. Hidden Markov.
and, and, March 9, 2017 1 / 30 Outline and, 1 2 3 4 2 / 30 Background and, Prof E. Moriyama (SBS) has a Seminar SBS, Math, Computer Science, Statistics Extensive use of program "HMMer" Britney (Hinds)
More informationGATCGTGCACGATCTCGGCAATTCGGGATGCCGGCTCGTCACCGGTCGCT
Problem. (pts) A. (5pts) Your colleague professor Eugene Mathew Lateed generated a genome-wide DNA methylation map for normal colon cells using MRE-seq and MeDIP-seq. In an intergenic region, he found
More informationPrimePCR Assay Validation Report
Gene Information Gene Name heat shock 10kDa protein 1 (chaperonin 10) Gene Symbol Organism Gene Summary Gene Aliases RefSeq Accession No. UniGene ID Ensembl Gene ID HSPE1 Human This gene encodes a major
More informationEpigenetic Analysis: ChIP-chip and ChIP-seq
Chapter 25 Epigenetic Analysis: ChIP-chip and ChIP-seq Matteo Pellegrini and Roberto Ferrari Abstract The access of transcription factors and the replication machinery to DNA is regulated by the epigenetic
More informationReference genomes and common file formats
Reference genomes and common file formats Overview Reference genomes and GRC Fasta and FastQ (unaligned sequences) SAM/BAM (aligned sequences) Summarized genomic features BED (genomic intervals) GFF/GTF
More informationGENEXPLORER: AN INTERACTIVE TOOL TO STUDY REPEAT GENE SEQUENCE IN THE HUMAN GENOME DEVANGANA KAR. (Under the Direction of Eileen T.
GENEXPLORER: AN INTERACTIVE TOOL TO STUDY REPEAT GENE SEQUENCE IN THE HUMAN GENOME by DEVANGANA KAR (Under the Direction of Eileen T. Kraemer) ABSTRACT A large part of the human genome is made up of repeating
More informationMODULE 5: TRANSLATION
MODULE 5: TRANSLATION Lesson Plan: CARINA ENDRES HOWELL, LEOCADIA PALIULIS Title Translation Objectives Determine the codons for specific amino acids and identify reading frames by looking at the Base
More informationBrowser Exercises - I. Alignments and Comparative genomics
Browser Exercises - I Alignments and Comparative genomics 1. Navigating to the Genome Browser (GBrowse) Note: For this exercise use http://www.tritrypdb.org a. Navigate to the Genome Browser (GBrowse)
More informationBionano Access v1.1 Release Notes
Bionano Access v1.1 Release Notes Document Number: 30188 Document Revision: C For Research Use Only. Not for use in diagnostic procedures. Copyright 2017 Bionano Genomics, Inc. All Rights Reserved. Table
More informationab initio and Evidence-Based Gene Finding
ab initio and Evidence-Based Gene Finding A basic introduction to annotation Outline What is annotation? ab initio gene finding Genome databases on the web Basics of the UCSC browser Evidence-based gene
More informationMicroarray Data Analysis in GeneSpring GX 11. Month ##, 200X
Microarray Data Analysis in GeneSpring GX 11 Month ##, 200X Agenda Genome Browser GO GSEA Pathway Analysis Network building Find significant pathways Extract relations via NLP Data Visualization Options
More informationA Short Sequence Splicing Method for Genome Assembly Using a Three- Dimensional Mixing-Pool of BAC Clones and High-throughput Technology
Send Orders for Reprints to reprints@benthamscience.ae 210 The Open Biotechnology Journal, 2015, 9, 210-215 Open Access A Short Sequence Splicing Method for Genome Assembly Using a Three- Dimensional Mixing-Pool
More informationBIO4342 Lab Exercise: Detecting and Interpreting Genetic Homology
BIO4342 Lab Exercise: Detecting and Interpreting Genetic Homology Jeremy Buhler March 15, 2004 In this lab, we ll annotate an interesting piece of the D. melanogaster genome. Along the way, you ll get
More informationDe Novo Assembly of High-throughput Short Read Sequences
De Novo Assembly of High-throughput Short Read Sequences Chuming Chen Center for Bioinformatics and Computational Biology (CBCB) University of Delaware NECC Third Skate Genome Annotation Workshop May 23,
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationEnsEMBL and the process of genebuild
EnsEMBL and the process of genebuild Julio Fernández Banet (jb16@sanger.ac.uk) Wellcome Trust Sanger Institute EnsEMBL Group (Genebuild Team) 01 - Dec- 2006 Overview What is Ensembl? Ensembl project Open
More informationArray-Ready Oligo Set for the Rat Genome Version 3.0
Array-Ready Oligo Set for the Rat Genome Version 3.0 We are pleased to announce Version 3.0 of the Rat Genome Oligo Set containing 26,962 longmer probes representing 22,012 genes and 27,044 gene transcripts.
More informationYong Wang and Frederick C.C. Leung
BIOINFORMATICS Vol. 20 no. 7 2004, pages 1170 1177 DOI: 10.1093/bioinformatics/bth059 An evaluation of new criteria for CpG islands in the human genome as gene markers Yong Wang and Frederick C.C. Leung
More informationRNA-Sequencing analysis
RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges
More informationTutorial for Stop codon reassignment in the wild
Tutorial for Stop codon reassignment in the wild Learning Objectives This tutorial has two learning objectives: 1. Finding evidence of stop codon reassignment on DNA fragments. 2. Detecting and confirming
More informationTwo Mark question and Answers
1. Define Bioinformatics Two Mark question and Answers Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three
More informationFinishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome
Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome Ruth Howe Bio 434W 27 February 2010 Abstract The fourth or dot chromosome of Drosophila species is composed primarily of highly condensed,
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationThe Human Genome Project
The Human Genome Project The Human Genome Project Began in 1990 The Mission of the HGP: The quest to understand the human genome and the role it plays in both health and disease. The true payoff from the
More informationLecture 2: Biology Basics Continued. Fall 2018 August 23, 2018
Lecture 2: Biology Basics Continued Fall 2018 August 23, 2018 Genetic Material for Life Central Dogma DNA: The Code of Life The structure and the four genomic letters code for all living organisms Adenine,
More informationChimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang
Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang Ruth Howe Bio 434W April 1, 2010 INTRODUCTION De novo annotation is the process by which a finished genomic sequence is searched for
More informationGo to Bottom Left click WashU Epigenome Browser. Click
Now you are going to look at the Human Epigenome Browswer. It has a more sophisticated but weirder interface than the UCSC Genome Browser. All the data that you will view as tracks is in reality just files
More informationIntroduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013
Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Abundance RNA is... Diverse Dynamic Central DNA rrna Epigenetics trna RNA mrna Time Protein Abundance
More informationBasic Bioinformatics: Homology, Sequence Alignment,
Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi
More informationSelected Techniques Part I
1 Selected Techniques Part I Gel Electrophoresis Can be both qualitative and quantitative Qualitative About what size is the fragment? How many fragments are present? Is there in insert or not? Quantitative
More informationGalaxy Platform For NGS Data Analyses
Galaxy Platform For NGS Data Analyses Weihong Yan wyan@chem.ucla.edu Collaboratory Web Site http://qcb.ucla.edu/collaboratory http://collaboratory.lifesci.ucla.edu Workshop Outline ü Day 1 UCLA galaxy
More informationReference genomes and common file formats
Reference genomes and common file formats Dóra Bihary MRC Cancer Unit, University of Cambridge CRUK Functional Genomics Workshop September 2017 Overview Reference genomes and GRC Fasta and FastQ (unaligned
More informationChapter 24: Promoters and Enhancers
Chapter 24: Promoters and Enhancers A typical gene transcribed by RNA polymerase II has a promoter that usually extends upstream from the site where transcription is initiated the (#1) of transcription
More information