Tutorial section. VEGA, the genome browser with a difference
Keywords: vertebrate, annotation, database, manual, curation

Abstract
The Vertebrate Genome Annotation (Vega) database is a community resource for browsing manual annotation from a variety of vertebrate genomes of finished sequence (vega.sanger.ac.uk). Vega differs from other genome browsers in that it has a standardised classification of genes which encompasses pseudogenes and non-coding transcripts. The data are manually curated, and manual curation is more accurate than current automated methods at identifying splice variants, pseudogenes, poly(A) features, non-coding and complex gene structures and arrangements. The database also contains annotation from regions, not just whole genomes, and displays annotation from multiple species (human, mouse, dog and zebrafish) for comparative analysis. Vega encourages community feedback that results in annotation updates, and accepts manual annotation of finished vertebrate sequence.

Since completion of the draft human genome sequence in 2001 [1,2] and the subsequent finishing of this in 2004 [3], many different genome browsers have been developed to enable scientists to access genome data. The initial interpretation of the human genome was through automated annotation such as that provided by the Ensembl [4] and UCSC [5] genome browsers. There are currently limits to an automated approach for the analysis of genomes, for example in identifying unprocessed pseudogenes in duplicated regions, and therefore there is still a need for manual intervention. As the genome sequence became finished, quality curated browsers such as MapView [6,7] and the H-InvDB [8,9] were developed.

The Vertebrate Genome Annotation (Vega) database [10] is a community resource for browsing manual annotation from a variety of vertebrate genomes of finished sequence [11]. Vega is based on the Ensembl schema, with gene objects shown in shades of blue, and also incorporates curation-specific data. The database allows users to view the manual annotation provided by the Havana group at the Wellcome Trust Sanger Institute (WTSI) [12], IMB-Jena, the Joint Genome Institute, Genoscope and Washington University. It currently contains the manual annotation of ten human chromosomes (6, 7, 9, 10, 13, 14, 20, 22, X and Y). As the genome sequencing centres publish the annotation and analysis of their chromosomes, the data will become accessible in Vega.

Why is Vega different from other browsers?
- It has a standardised classification of genes which encompasses pseudogenes and non-coding transcripts.
- Poly(A) sites/signals are annotated.
- The data are manually curated.
- The data are periodically updated.
- It contains annotation of haplotypes.

© HENRY STEWART PUBLICATIONS. BRIEFINGS IN BIOINFORMATICS. VOL 6. NO JUNE 2005
- Single nucleotide polymorphisms (SNPs) are mapped to manual curation.
- It is multispecies, and small regions of finished sequence can be submitted and annotated as well as whole genomes.
- It encourages community feedback, which results in annotation updates.

GENE CLASSIFICATION
A standardised set of definitions has been used to categorise the annotation of the different gene features (Table 1). Irrespective of the category to which gene objects have been assigned, all annotated gene structures are supported by homology to cDNAs, expressed sequence tags (ESTs) or protein sequences.

Table 1: Vega annotation definitions
Known: Identical to human cDNA or protein sequences in the Entrez Gene database ( query.fcgi?db=gene/)
Novel: Have an open reading frame and are identical or homologous to known vertebrate cDNAs and/or proteins from all species
Novel transcript: Similar to novel gene but no open reading frame or open reading frame ambiguous
Putative: Homologous to spliced vertebrate expressed sequence tags (ESTs) with no significant open reading frame
Pseudogene: Homologous to protein sequences with a disrupted CDS and an active gene can be found at another locus
Predicted: Based on ab initio prediction for which at least one exon is supported by biological data (unspliced ESTs, protein sequence similarity with mouse or tetraodon genomes). Only used in chromosome 14
Ig segment: Immunoglobulin gene segments
Ig pseudogene segment: Inactivated immunoglobulin segment

GENE NAMING
It is important to use the correct gene nomenclature to maintain consistency in the annotation database, especially when comparing haplotypic or syntenic regions. The Vega annotators interact closely with the nomenclature committees from the Human Genome Organisation (HUGO, HGNC) [13], Zebrafish Information Network (ZFIN) [14] and Mouse Genome Database (MGD) [15]. If an approved symbol is not available for a gene locus, an interim identifier is used in the format of the international clone identifier followed by a number, eg RP11-695B14.2. All loci and their associated transcripts and exons are given stable versioned database IDs (eg OTTHUMG) that are generated and tracked in the Otter database [16] that underlies Vega (see Figure 1). Whenever a locus is edited, the version number increases and the date of the change is saved.

Figure 1: Curated Locus Report giving information about the NFB1 locus on chromosome 9. Callouts in the figure mark the official HUGO ID, the gene last modified date, the stable Otter ID for the gene locus, and the splice variants (7 coding, 1 non-coding, each with a stable Otter transcript ID).

MAIN FEATURES OF VEGA
Manual annotation is currently more accurate than automated methods at identifying splice variants, pseudogenes, polyadenylation features, non-coding genes, complex gene arrangements and clusters.

Splice variants account for approximately 50 per cent of gene loci in finished chromosomes 9, 10 and X, with an average of 2.5 alternative transcripts per locus. Splice variants must be supported by spliced EST/cDNA evidence, but the presence of a coding sequence (CDS) is not essential. Hence the majority of variants are annotated without a CDS, although they have canonical splice sites. ESTs and cDNAs from different species are also used as evidence to predict alternative transcripts, as genome comparison studies have shown that gene structures are generally conserved between human and mouse [17].

Pseudogenes are defined as non-functional copies of genes and are categorised in Vega into unprocessed and processed pseudogenes (viewed in two shades of grey). They are generated by either of two mechanisms: retrotransposition or duplication of genomic DNA. Those that arise from retrotransposition are called processed pseudogenes [18]; they have no 5′ promoter sequence or introns but generally have an integrated poly(A) tail at the 3′ end that often retains the poly(A) signal. Unprocessed pseudogenes have arisen from genomic duplication, often have a structure that is very similar to the ancestral gene, and may even splice correctly. The majority of pseudogenes of both types contain frameshifts and/or stop codons in the coding region. Pseudogenes are valuable in annotation as they have been implicated in human disease [19] and can be used to study evolution.

Poly(A) sites/signals are annotated and may be browsed in Vega. In ContigView, poly(A) signals are displayed in light red and poly(A) sites in dark red. Alternative polyadenylation appears to affect many higher eukaryotes, mainly in a tissue-dependent manner, and may be implicated in disease [20]. All poly(A) features are checked manually, using large numbers of ESTs marking out the 3′ ends of genes and the fact that signals (of which there are 10 variants in human [21]) are usually found within 60 bases of the poly(A) site.

SNPs can be viewed in ContigView and are mapped from the Glovar database [22] onto the clones within Vega. Glovar contains all the data from dbSNP together with SNPs found from comparisons of the trace repository [23] with the current genome build. Using Vega annotation, SNPs are classified as coding (red), untranslated region (pink), intronic (blue) or other (grey).
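The four SNP classes just described can be illustrated with a minimal Python sketch. It is not Vega's or Glovar's code: the `Transcript` structure, the 1-based inclusive coordinates and the function name are hypothetical, and treating exonic positions of a transcript with no CDS as "other" is an assumption, since the text does not say how such positions are handled. Only the class names and colours come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Display colours for each SNP class, as listed in the text.
SNP_COLOURS = {"coding": "red", "utr": "pink", "intronic": "blue", "other": "grey"}


@dataclass
class Transcript:
    """Hypothetical, simplified transcript on the forward strand."""
    exons: List[Tuple[int, int]]      # sorted (start, end), 1-based inclusive
    cds_start: Optional[int] = None   # None for transcripts without a CDS
    cds_end: Optional[int] = None


def classify_snp(pos: int, transcript: Transcript) -> str:
    """Return 'coding', 'utr', 'intronic' or 'other' for a genomic position."""
    first_base = transcript.exons[0][0]
    last_base = transcript.exons[-1][1]
    if pos < first_base or pos > last_base:
        return "other"                # outside the transcript altogether
    in_exon = any(start <= pos <= end for start, end in transcript.exons)
    if not in_exon:
        return "intronic"             # inside the transcript, between exons
    if transcript.cds_start is None or transcript.cds_end is None:
        return "other"                # assumption: no CDS, so no UTR either
    if transcript.cds_start <= pos <= transcript.cds_end:
        return "coding"
    return "utr"                      # exonic but outside the CDS


if __name__ == "__main__":
    coding_tx = Transcript(exons=[(100, 200), (300, 400), (500, 600)],
                           cds_start=150, cds_end=550)
    for position in (50, 120, 180, 250, 580):
        snp_class = classify_snp(position, coding_tx)
        print(position, snp_class, SNP_COLOURS[snp_class])
```

A real implementation would also have to handle the reverse strand and SNPs that overlap several transcripts with different structures; the sketch ignores both.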
ACCESSING AND QUERYING DATA
As the Vega browser is based on Ensembl web code, it has similar standard entry points such as keyword search and similarity searching (BLAST, SSAHA). ExportView can be used to download data in formats such as FastA, Gene Feature Format (GFF) and flat files. There is also direct access to annotation via a distributed annotation server (DAS). If required, the Ensembl API [24] can be used to perform more comprehensive searches of the Vega data. Vega genes mapped to the current genome assembly can also be downloaded from Ensembl using EnsMart.

MHC HAPLOTYPE ANNOTATION
Unlike other browsers, Vega can also contain annotation from regions, not just whole chromosomes. Regions available include the COX haplotype for the major histocompatibility complex (MHC) on human chromosome 6, with more haplotypes to follow [25].

ACCESSING MULTISPECIES ANNOTATION IN VEGA
Vega can display annotation from multiple species for comparative analysis. The mouse annotation browser displays selected regions such as the Del36H deletion region on chromosome 13 and the insulin-dependent diabetes (IDD) susceptibility loci regions. The latter are annotated in both the reference mouse strain (C57BL/6) and the non-obese diabetic (NOD) strain [26].

The zebrafish genome is being sequenced in its entirety at the Sanger Institute, and Vega will be the main site for browsing the manually curated data. The reference is the Tuebingen strain, and Vega currently displays chromosomes/linkage groups 1-25 plus one artificial chromosome, U, that contains all clones with unknown chromosomal locations. The AB chromosome displays clones from the AB strain. Manual annotation is added on a monthly basis, and clones which have not yet been annotated (displayed in grey) are shown with features from automated computational analysis (repeat masking, BLAST searches, etc).

Recently the finished sequence of the MHC (DLA) class II region from the Doberman dog breed has been annotated and is available in Vega [27]. The sequence displays a high level of conservation with the human, cat and mouse class II regions.
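For readers who download annotation as GFF through ExportView (see ACCESSING AND QUERYING DATA), the following Python sketch shows one way such a file might be read. It relies only on the general GFF convention of nine tab-delimited columns; the exact columns and attribute syntax that ExportView emits are not described in this article, so the sample record, the function names and the `GffFeature` type are all hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class GffFeature(NamedTuple):
    """The nine standard GFF columns; coordinates are 1-based inclusive."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line: str) -> GffFeature:
    """Split one tab-delimited GFF record into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
    return GffFeature(fields[0], fields[1], fields[2],
                      int(fields[3]), int(fields[4]), *fields[5:])


def read_gff(lines: Iterable[str]) -> Iterator[GffFeature]:
    """Yield features, skipping blank lines and '#' comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_gff_line(line)


if __name__ == "__main__":
    # Invented record for illustration only; not real Vega output.
    sample = ["# hypothetical export\n",
              "chr6\tvega\texon\t1000\t1200\t.\t+\t.\texample_attribute\n"]
    for feat in read_gff(sample):
        print(feat.feature, feat.seqname, feat.end - feat.start + 1, "bp")
```

The same loop works on an open file handle, since iterating over a text file yields one line at a time.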
COMMUNITY FEEDBACK
Vega is a community annotation database, and feedback from researchers is therefore essential to keep the annotation up to date. A webform [28] is available through which users can contact the Vega team to improve or correct annotation where there is additional evidence. Manual annotation of finished vertebrate sequence may also be submitted if it has been peer reviewed and/or meets the annotation standards [29].

FUTURE DEVELOPMENTS IN VEGA
Currently available genome browsers often display different transcript structures for the same loci. In order to produce a single standard human gene set, the Consensus CDS (CCDS) project has been set up between NCBI, UCSC, Ensembl and the Havana group. The aim is to compare the human gene sets produced by RefSeq, Ensembl and Vega and then identify transcripts where the protein coding region is agreed on by all collaborators. These CDSs will be identified by stable CCDS identifiers in all the browsers.

In the near future, manual annotation of the regions for the ENCODE project [30,31] will be displayed in Vega. As the mouse and zebrafish genomes reach completion, it is hoped that the manually annotated orthologues may be browsed using MultiContigView, which is already available in Ensembl.

Acknowledgments
I gratefully acknowledge the help of Dr Jennifer Ashurst and Dr Laurens Wilming at the Wellcome Trust Sanger Institute.

Dr Jane Loveland
HAVANA Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK
Tel: +44 (0) Fax: +44 (0)

References
1. Lander, E. S., Linton, L. M., Birren, B. et al. (2001), Initial sequencing and analysis of the human genome, Nature, Vol. 409(6822), pp.
2. Venter, J. C., Adams, M. D., Myers, E. W. et al. (2001), The sequence of the human genome, Science, Vol. 291(5507), pp.
3. International Human Genome Sequencing Consortium (2004), Finishing the euchromatic sequence of the human genome, Nature, Vol. 431(7011), pp.
4. Hubbard, T., Andrews, D., Caccamo, M. et al. (2005), Ensembl 2005, Nucleic Acids Res., Vol. 33 (Database issue), pp. D
5. Kent, W. J., Sugnet, C. W., Furey, T. S. et al. (2002), The human genome browser at UCSC, Genome Res., Vol. 12(6), pp.
6. Wheeler, D. L., Chappey, C., Lash, A. E. et al. (2002), Database resources of the National Center for Biotechnology Information: 2002 update, Nucleic Acids Res., Vol. 30(1), pp.
7. URL: mapview/
8. Imanishi, T., Itoh, T., Suzuki, Y. et al. (2004), Integrative annotation of 21,037 human genes validated by full-length cDNA clones, PLoS Biol., Vol. 2(6), p. e
9. URL:
10. Ashurst, J. L., Chen, C.-K., Gilbert, J. G. R. et al. (2005), The Vertebrate Genome Annotation (Vega) database, Nucleic Acids Res., Vol. 33 (Database issue), pp. D
11. URL:
12. URL:
13. Wain, H. M., Lush, M. J., Ducluzeau, F. et al. (2004), Genew: The Human Gene Nomenclature Database, 2004 updates, Nucleic Acids Res., Vol. 32 (Database issue), pp. D
14. Sprague, J., Clements, D., Conlin, T. et al. (2003), The Zebrafish Information Network (ZFIN): The zebrafish model organism database, Nucleic Acids Res., Vol. 31(1), pp.
15. Eppig, J. T., Bult, C. J., Kadin, J. A. et al. (2005), The Mouse Genome Database (MGD): From genes to mice - a community resource for mouse biology, Nucleic Acids Res., Vol. 33 (Database issue), pp. D
16. Searle, S. M., Gilbert, J., Iyer, V. and Clamp, M. (2004), The otter annotation system, Genome Res., Vol. 14(5), pp.
17. Batzoglou, S., Pachter, L., Mesirov, J. P. et al. (2000), Human and mouse gene structure: Comparative analysis and application to exon prediction, Genome Res., Vol. 10(7), pp.
18. Vanin, E. F. (1985), Processed pseudogenes: Characteristics and evolution, Annu. Rev. Genet., Vol. 19, pp.
19. Kenmochi, N., Yoshihama, M., Higa, S. and Tanaka, T. (2000), The human ribosomal protein L6 gene in a critical region for Noonan syndrome, J. Human Genet., Vol. 45(5), pp.
20. Edwalds-Gilbert, G., Veraldi, K. L. and Milcarek, C. (1997), Alternative poly(A) site selection in complex transcription units: Means to an end?, Nucleic Acids Res., Vol. 25(13), pp.
21. Beaudoing, E., Freier, S., Wyatt, J. R. et al. (2000), Patterns of variant polyadenylation signal usage in human genes, Genome Res., Vol. 10(7), pp.
22. URL: Homo_sapiens/
23. URL:
24. URL:
25. Stewart, C. A., Horton, R., Allcock, R. J. N. et al. (2004), Complete MHC haplotype sequencing for common disease gene mapping, Genome Res., Vol. 14(6), pp.
26. Hill, N. J., Lyons, P. A., Armitage, N. et al. (2000), NOD Idd5 locus controls insulitis and diabetes and overlaps the orthologous CTLA4/IDDM12 and NRAMP1 loci in humans, Diabetes, Vol. 49(10), pp.
27. Debenham, S. L., Hart, E. A., Ashurst, J. L. et al. (2005), Genomic sequence of the class II region of the canine MHC: Comparison with the MHC of other mammalian species, Genomics, Vol. 85(1), pp.
28. URL: index.html
29. URL: guidelines.pdf
30. ENCODE Project Consortium (2004), The ENCODE (ENCyclopedia Of DNA Elements) Project, Science, Vol. 306(5696), pp.
31. URL:
Ensembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets
Ensembl workshop Thomas Randall, PhD tarandal@email.unc.edu bioinformatics.unc.edu www.unc.edu/~tarandal/ensembl handouts, papers, datasets Ensembl is a joint project between EMBL - EBI and the Sanger
More informationGuided tour to Ensembl
Guided tour to Ensembl Introduction Introduction to the Ensembl project Walk-through of the browser Variations and Functional Genomics Comparative Genomics BioMart Ensembl Genome browser http://www.ensembl.org
More informationArray-Ready Oligo Set for the Rat Genome Version 3.0
Array-Ready Oligo Set for the Rat Genome Version 3.0 We are pleased to announce Version 3.0 of the Rat Genome Oligo Set containing 26,962 longmer probes representing 22,012 genes and 27,044 gene transcripts.
More informationThe University of California, Santa Cruz (UCSC) Genome Browser
The University of California, Santa Cruz (UCSC) Genome Browser There are hundreds of available userselected tracks in categories such as mapping and sequencing, phenotype and disease associations, genes,
More informationuser s guide Question 1
Question 1 How does one find a gene of interest and determine that gene s structure? Once the gene has been located on the map, how does one easily examine other genes in that same region? doi:10.1038/ng966
More informationab initio and Evidence-Based Gene Finding
ab initio and Evidence-Based Gene Finding A basic introduction to annotation Outline What is annotation? ab initio gene finding Genome databases on the web Basics of the UCSC browser Evidence-based gene
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More informationGenome annotation & EST
Genome annotation & EST What is genome annotation? The process of taking the raw DNA sequence produced by the genome sequence projects and adding the layers of analysis and interpretation necessary
More informationUCSC Genome Browser. Introduction to ab initio and evidence-based gene finding
UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene
More informationBrowsing Genomes with Ensembl
April Feb 2006 2007 Browsing Genomes with Ensembl Joint project Ensembl - Project EMBL European Bioinformatics Institute (EBI) Wellcome Trust Sanger Institute Produce accurate, automatic genome annotation
More informationEnsembl: A New View of Genome Browsing
28 TECHNICAL NOTES EMBnet.news 15.3 Ensembl: A New View of Genome Browsing Giulietta M. Spudich and Xosé M. Fernández- Suárez European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxon, Cambs,
More informationTraining materials.
Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation
More informationBCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers
BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html UCSC
More informationGENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,
More informationWeek 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers
Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationIdentifying Genes and Pseudogenes in a Chimpanzee Sequence Adapted from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. M.
Identifying Genes and Pseudogenes in a Chimpanzee Sequence Adapted from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. M. Brent Prerequisites: A Simple Introduction to NCBI BLAST Resources: The GENSCAN
More informationAaditya Khatri. Abstract
Abstract In this project, Chimp-chunk 2-7 was annotated. Chimp-chunk 2-7 is an 80 kb region on chromosome 5 of the chimpanzee genome. Analysis with the Mapviewer function using the NCBI non-redundant database
More informationGene-centered resources at NCBI
COURSE OF BIOINFORMATICS a.a. 2014-2015 Gene-centered resources at NCBI We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about genes, serving
More informationTraining materials.
Training materials - Ensembl training materials are protected by a CC BY license - http://creativecommons.org/licenses/by/4.0/ - If you wish to re-use these materials, please credit Ensembl for their creation
More informationGene-centered databases and Genome Browsers
COURSE OF BIOINFORMATICS a.a. 2015-2016 Gene-centered databases and Genome Browsers We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about
More informationGene-centered databases and Genome Browsers
COURSE OF BIOINFORMATICS a.a. 2016-2017 Gene-centered databases and Genome Browsers We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about
More informationNiemann-Pick Type C Disease Gene Variation Database ( )
NPC-db (vs. 1.1) User Manual An introduction to the Niemann-Pick Type C Disease Gene Variation Database ( http://npc.fzk.de ) curated 2007/2008 by Dirk Dolle and Heiko Runz, Institute of Human Genetics,
More informationLecture 7 Motif Databases and Gene Finding
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 7 Motif Databases and Gene Finding Motif Databases & Gene Finding Motifs Recap Motif Databases TRANSFAC
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationWeb-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.
Page 1 of 18 Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide. When and Where---Wednesdays 1-2pm Room 438 Library Admin Building Beginning September
More informationAnnotation Practice Activity [Based on materials from the GEP Summer 2010 Workshop] Special thanks to Chris Shaffer for document review Parts A-G
Annotation Practice Activity [Based on materials from the GEP Summer 2010 Workshop] Special thanks to Chris Shaffer for document review Parts A-G Introduction: A genome is the total genetic content of
More informationChimp BAC analysis: Adapted by Wilson Leung and Sarah C.R. Elgin from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. Michael R.
Chimp BAC analysis: Adapted by Wilson Leung and Sarah C.R. Elgin from Chimp BAC analysis: TWINSCAN and UCSC Browser by Dr. Michael R. Brent Prerequisites: BLAST exercise: Detecting and Interpreting Genetic
More informationIl trascrittoma dei mammiferi
29 Novembre 2005 Il trascrittoma dei mammiferi dott. Manuela Gariboldi Gruppo di ricerca IFOM: Genetica molecolare dei tumori (responsabile dott. Paolo Radice) Copyright 2005 IFOM Fondazione Istituto FIRC
More informationInvestigating Inherited Diseases
Investigating Inherited Diseases The purpose of these exercises is to introduce bioinformatics databases and tools. We investigate an important human gene and see how mutations give rise to inherited diseases.
More informationAccess to genes and genomes with. Ensembl. Worked Example & Exercises
Access to genes and genomes with Ensembl Worked Example & Exercises September 2006 1 CONTENTS WORKED EXAMPLE... 2 BROWSING ENSEMBL... 21 Exercises... 21 Answers... 22 BIOMART... 25 Exercises... 25 Answers...
More informationGap Filling for a Human MHC Haplotype Sequence
American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human
More informationGene Identification in silico
Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction
More informationChapter 2: Access to Information
Chapter 2: Access to Information Outline Introduction to biological databases Centralized databases store DNA sequences Contents of DNA, RNA, and protein databases Central bioinformatics resources: NCBI
More informationGREG GIBSON SPENCER V. MUSE
A Primer of Genome Science ience THIRD EDITION TAGCACCTAGAATCATGGAGAGATAATTCGGTGAGAATTAAATGGAGAGTTGCATAGAGAACTGCGAACTG GREG GIBSON SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc.
More informationBIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP
Jasper Decuyper BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP MB&C2017 Workshop Bioinformatics for dummies 2 INTRODUCTION Imagine your workspace without the computers Both in research laboratories and in
More informationINTRODUCTION TO BIOINFORMATICS. SAINTS GENETICS Ian Bosdet
INTRODUCTION TO BIOINFORMATICS SAINTS GENETICS 12-120522 - Ian Bosdet (ibosdet@bccancer.bc.ca) Bioinformatics bioinformatics is: the application of computational techniques to the fields of biology and
More informationGenome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)
Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation AGACAAAGATCCGCTAAATTAAATCTGGACTTCACATATTGAAGTGATATCACACGTTTCTCTAAT AATCTCCTCACAATATTATGTTTGGGATGAACTTGTCGTGATTTGCCATTGTAGCAATCACTTGAA
More informationComparison of human (and other) genome browsers
SOFTWARE REVIEW Comparison of human (and other) genome browsers Terrence S. Furey* Institute for Genome Sciences and Policy, Duke University, 101 Science Drive, Box 3382, Durham, NC 27708, USA * Correspondence
More informationAfter the draft sequence, what next for the Human Genome Mapping Project Resource Centre?
Comparative and Functional Genomics Comp Funct Genom 2001; 2: 176 179. DOI: 10.1002 / cfg.83 Interview: Duncan Campbell After the draft sequence, what next for the Human Genome Mapping Project Resource
More informationGenomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010
Genomic Annotation Lab Exercise By Jacob Jipp and Marian Kaehler Luther College, Department of Biology Genomics Education Partnership 2010 Genomics is a new and expanding field with an increasing impact
More informationHands-On Four Investigating Inherited Diseases
Hands-On Four Investigating Inherited Diseases The purpose of these exercises is to introduce bioinformatics databases and tools. We investigate an important human gene and see how mutations give rise
More informationAgenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence
Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase
More informationEnsembl and ENA. High level overview and use cases. Denise Carvalho-Silva. Ensembl Outreach Team
Ensembl and ENA High level overview and use cases Denise Carvalho-Silva Ensembl Outreach Team On behalf of Ensembl and ENA teams European Molecular Biology Laboratories Euroepan Bioinformatics Institute
More informationNCBI & Other Genome Databases. BME 110/BIOL 181 CompBio Tools
NCBI & Other Genome Databases BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2011 Admin Reading Dummies Ch 3 Assigned Review: "The impact of next-generation sequencing technology on genetics" by E.
More informationHUMAN GENOME BIOINFORMATICS. Tore Samuelsson, Dec 2009
HUMAN GENOME BIOINFORMATICS Tore Samuelsson, Dec 2009 The sequenced (gray filled) and unsequenced (white) portions of the human genome. Peter F.R. Little Genome Res. 2005; 15: 1759-1766 Human genome organisation
More informationBacterial Genome Annotation
Bacterial Genome Annotation Bacterial Genome Annotation For an annotation you want to predict from the sequence, all of... protein-coding genes their stop-start the resulting protein the function the control
More informationExperimental validation of candidates of tissuespecific and CpG-island-mediated alternative polyadenylation in mouse
Karin Fleischhanderl; Martina Fondi Experimental validation of candidates of tissuespecific and CpG-island-mediated alternative polyadenylation in mouse 108 - Biotechnologie Abstract --- Keywords: Alternative
More informationOutline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions
Outline Introduction to ab initio and evidence-based gene finding Overview of computational gene predictions Different types of eukaryotic gene predictors Common types of gene prediction errors Wilson
More informationBioinformatics for Proteomics. Ann Loraine
Bioinformatics for Proteomics Ann Loraine aloraine@uab.edu What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data
More informationGenomics: Genome Browsing & Annota3on
Genomics: Genome Browsing & Annota3on Lecture 4 of 4 Introduc/on to BioMart Dr Colleen J. Saunders, PhD South African National Bioinformatics Institute/MRC Unit for Bioinformatics Capacity Development,
More informationIdentification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources
Identification of Single Nucleotide Polymorphisms and associated Disease Genes using NCBI resources Navreet Kaur M.Tech Student Department of Computer Engineering. University College of Engineering, Punjabi