Bioinformatics
Biomolecular databases

Jacques van Helden
Former address: Bioinformatique des Génomes et des Réseaux (BiGRe lab)
New address (since Nov 1st, 2011): Jacques.van-Helden@univ-amu.fr, Université d'Aix-Marseille, France, Lab. Technological Advances for Genomics and Clinics (TAGC, INSERM Unit U1090)

Contents
- Examples of biological databases
  - Nucleic sequences: GenBank, EMBL, and DDBJ
  - Protein sequences: UniProt
  - The Gene Ontology (GO) project
- Issues and perspectives for biological databases

Examples of biomolecular databases

Sequence and structure databases
- Protein sequences (UniProt)
- DNA sequences (EMBL, GenBank, DDBJ)
- 3D structures (PDB)
- Structural motifs (CATH)
- Sequence motifs (PROSITE, PRODOM)

Genome sequences and annotations
- Genome-specific databases (SGD, FlyBase, AceDB, PlasmoDB, ...)
- Multiple genomes (Integr8, NCBI, KEGG, TIGR, ...)

Molecular functions
- Transcriptional regulation (TRANSFAC, RegulonDB, InteractDB)
- Enzymatic catalysis (Expasy, LIGAND/KEGG, BRENDA)
- Transport (YTPdb)

Biological processes
- Metabolic pathways (EcoCyc, LIGAND/KEGG, Biocatalysis/biodegradation)
- Signal transduction pathways (CSNdb, Transpath)
- Protein-protein interactions (DIP, BIND, MINT)
- Gene networks (GeneNet, FlyNets)

Databases of databases
- There are hundreds of databases related to molecular biology and biochemistry. New databases are created every year.
- Every year, the first issue of Nucleic Acids Research is dedicated to biological databases.
  - 2011 issue:
- The same journal maintains a database of databases: the Molecular Biology Database Collection.
- Some bioinformatics centres maintain multiple databases, with cross-links between them. The SRS server at EBI holds an impressive collection of databases.

Nucleic sequence databases: GenBank, EMBL, and DDBJ
Nucleic sequence databases
- Before publishing an article dealing with a sequence, scientific journals require the sequence to have been deposited in a reference database.
- There are 3 main repositories for nucleic acid sequences.
- Sequences deposited in any of these 3 databases are automatically synchronized with the 2 others.

The sequencing pace

Nucleic sequences
- GenBank (April 2011)
  - 126,551,501,141 bases in 135,440,924 sequence records in the traditional GenBank divisions
  - 191,401,393,188 bases in 62,715,288 sequence records in the Whole Genome Sequencing

Entire genomes
- GOLD Release V.2 (Oct 2011) contains ~2000 completely sequenced genomes.

Protein sequences
- Essentially obtained by translation of putative genes in nucleic sequences (almost no direct protein sequencing).
- UniProtKB/TrEMBL (2011) contains 17 million protein sequences.

Okubo et al. (2006) NAR 34: D6-D9

Size of the nucleotide database
EMBL Nucleotide Sequence Database: Release Notes - Release 113 September
(In the tables below, "…" marks digits that are missing from the transcript.)

Class                                    entries         nucleotides
CON: Constructed                         7,236,…         …,112,791,043
EST: Expressed Sequence Tag              73,715,376      40,997,082,803
GSS: Genome Sequence Scan                34,528,104      21,985,922,905
HTC: High Throughput cDNA sequencing     491,…           …,229,662
HTG: High Throughput Genome sequencing   152,599         25,159,746,658
PAT: Patents                             24,364,832      12,117,896,594
STD: Standard                            13,920,617      37,665,112,606
STS: Sequence Tagged Site                1,322,…         …,037,867
TSA: Transcriptome Shotgun Assembly      8,085,693       5,663,938,279
WGS: Whole Genome Shotgun                88,288,…        …,661,696,…
Total                                    252,106,…       …,481,663,919

Division                                 entries         nucleotides
ENV: Environmental Samples               30,908,230      14,420,391,278
FUN: Fungi                               6,522,586       11,614,472,226
HUM: Human                               32,094,500      38,072,362,804
INV: Invertebrates                       31,907,138      52,527,673,643
MAM: Other Mammals                       40,012,…        …,678,620,711
MUS: Mus musculus                        11,745,671      19,701,637,499
PHG: Bacteriophage                       8,511           85,549,111
PLN: Plants                              52,428,994      55,570,452,118
PRO: Prokaryotes                         2,808,489       28,807,572,238
ROD: Rodents                             6,554,012       33,326,106,733
SYN: Synthetic                           4,045,…         …,174,055
TGN: Transgenic                          285,…           …,743,891
UNC: Unclassified                        8,617,225       4,957,442,673
VRL: Viruses                             1,358,528       1,518,575,082
VRT: Other Vertebrates                   22,809,428      42,568,889,…
Total                                    252,106,…       …,481,663,919

Adapted from Didier Gonze

The three repositories
- GenBank (NCBI - USA)
- The EMBL Nucleotide Sequence Database (EBI - UK)
- DDBJ - DNA Data Bank of Japan
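As a quick sanity check on the GenBank (April 2011) figures quoted above, the average record length in each part of the database can be derived by simple division. The sketch below only uses the two pairs of numbers given in the slide; the dictionary and function names are illustrative.

```python
# Figures quoted in the GenBank (April 2011) summary above.
TRADITIONAL = {"bases": 126_551_501_141, "records": 135_440_924}
WGS = {"bases": 191_401_393_188, "records": 62_715_288}


def mean_record_length(division):
    """Average number of bases per sequence record."""
    return division["bases"] / division["records"]


if __name__ == "__main__":
    for name, division in (("traditional", TRADITIONAL), ("WGS", WGS)):
        print(f"{name}: {mean_record_length(division):,.0f} bases per record")
```

Records in the traditional divisions average a little under 1 kb, whereas whole-genome shotgun records average about 3 kb.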
Size of the nucleic sequence databases
Summary of database contents for the 3 main databases of nucleic sequences. Source: NAR database issue January
Columns: URL, sequences, bases (without shotgun), bases (including shotgun), organisms. Most values are missing from the transcript; the surviving fragments are:
- DDBJ: 2.0E E+09
- EMBL: 1.0E E+05
- GenBank: 4.6E E E E+05

UniProt: protein sequences and functional annotations

UniProt - the Universal Protein Resource

Database content (Sept 2012)
- UniProtKB: 24,532,088 entries
  - Translation of EMBL coding sequences (non-redundant with Swiss-Prot)
- UniProtKB/Swiss-Prot section (reviewed): 537,505 entries
  - annotation by experts
  - high information content
  - many references to the literature
  - good reliability of the information
- The rest (90% of the entries)
  - Automatic annotation by sequence similarity.

Features
- The most comprehensive protein database in the world.
- A huge team: >100 annotators + developers.
- Annotation by experts: annotators are specialized for different types of proteins or organisms.
- Recognized worldwide as an essential resource.

References
- Bairoch et al. The SWISS-PROT protein sequence data bank. Nucleic Acids Res (1991) vol. 19 Suppl pp
- The UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res (2008). Database Issue.

Taxonomic distribution of the sequences
- Number of entries (polypeptides) in Swiss-Prot
- Within Eukaryotes

Content of a Swiss-Prot entry
- Header: name and synonyms
- Human-based annotation by specialists
- Structured annotation: keywords and Gene Ontology terms
- Protein interactions; alternative products
- Detailed description of regions, variations, and secondary structure
- Peptidic sequence
- References to original publications
- Cross-references to many databases (fragment shown)

3D Structure of macromolecules
PDB - The Protein Data Bank

Genome browsers
- EnsEMBL Genome Browser (Sanger Institute + EBI)
- UCSC Genome Browser (University of California, Santa Cruz - USA)
  - Human gene Pax6 aligned with vertebrate genomes
  - Drosophila gene eyeless (homolog to Pax6) aligned with insect genomes
  - Drosophila 120 kb chromosomal region covering the Achaete-Scute Complex
- ECR Browser
- EnsEMBL - Example: Drosophila gene Pax6

Comparative genomics
- Integr8 - access to complete genomes and proteomes
- Integr8 - genome summaries
- Integr8 - clusters of orthologous genes (COGs)
- Integr8 - clusters of paralogous genes

Databases of protein domains

Prosite - protein domains, families and functional sites

Prosite - aligned sequences and logo
Some of the sequences that were used to build the Prosite profile for the Zn(2)-C6 fungal-type DNA-binding domain (ZN2_CY6_FUNGAL_2, PS50048). The sequence logo (below) indicates the level of conservation of each residue in each column of the alignment. Note the 6 cysteines, characteristic of this domain.

Prosite - Example of profile matrix

Prosite - Example of sequence logo
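The "level of conservation" displayed in a sequence logo is commonly measured as the information content of each alignment column. The following is a minimal sketch of that calculation, assuming the simplest formulation (log2 of the alphabet size minus the Shannon entropy of the column, with no small-sample correction). The alignment is a made-up toy example, not the actual Prosite alignment, and the function names are illustrative.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (in bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    residue frequencies. Gaps ("-") are ignored.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences, alphabet_size=20):
    """Information content of every column of a gapped alignment."""
    return [column_information(col, alphabet_size) for col in zip(*sequences)]


# Hypothetical toy alignment: columns 2 and 5 are fully conserved cysteines.
TOY_ALIGNMENT = ["ACKRC", "ACRKC", "SCKHC", "ACEKC"]

if __name__ == "__main__":
    for position, bits in enumerate(alignment_information(TOY_ALIGNMENT), start=1):
        print(f"column {position}: {bits:.2f} bits")
```

Fully conserved columns, such as the cysteines of this toy example, reach the maximum of log2(20) bits, while variable columns score lower.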
Prosite - Example of domain signature
The domain signature is a string-based pattern representing the residues that are characteristic of a domain.

PFAM (Sanger Institute - UK)
Protein families represented by multiple sequence alignments and hidden Markov models (HMMs).

CATH - Protein Structure Classification
CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels:
- Class (C)
- Architecture (A)
- Topology (T)
- Homologous superfamily (H)
The boundaries and assignments for each protein domain are determined using a combination of automated and manual procedures, which include computational techniques, empirical and statistical evidence, literature review and expert analysis.

References
- Orengo et al. The CATH Database provides insights into protein structure/function relationships. Nucleic Acids Res (1999) vol. 27 (1) pp
- Cuff et al. The CATH classification revisited--architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Res (2008) pp.

InterPro (EBI - UK)
Antennapedia-like Homeobox (entry IPR001827)
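To illustrate what a string-based domain signature looks like in practice, the sketch below converts a pattern written in the usual PROSITE notation into a Python regular expression and scans a sequence with it. The pattern and the sequences are hypothetical examples chosen for readability; they are not an actual Prosite entry, and the converter only covers the basic syntax elements (`x`, `[...]`, `{...}`, repetition counts, and the `<` / `>` anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported syntax: '-' separators, 'x' (any residue), '[ABC]' (allowed
    residues), '{ABC}' (forbidden residues), '(n)' / '(n,m)' repetitions,
    '<' (N-terminal anchor) and '>' (C-terminal anchor).
    """
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")
    regex = regex.replace("x", ".")                      # any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repetition counts
    regex = regex.replace("<", "^").replace(">", "$")    # sequence ends
    return regex


# Hypothetical signature: two cysteines 2 residues apart, a basic residue,
# 2 arbitrary residues, any residue except proline, then a third cysteine.
TOY_PATTERN = "C-x(2)-C-[RK]-x(2)-{P}-C"

if __name__ == "__main__":
    regex = prosite_to_regex(TOY_PATTERN)
    print(regex)
    for sequence in ("MACAECKLLSCQ", "MACAECKLLPCQ"):
        match = re.search(regex, sequence)
        print(sequence, "->", match.group() if match else "no match")
```

A signature of this kind is either matched or not, which is what distinguishes it from the profile matrices and HMMs mentioned above, where each position contributes a score.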
Ontology definition
Ontology: the part of metaphysics that focuses on being as being, independently of its particular determinations.
Source: Le Petit Robert - dictionnaire alphabétique et analogique de la langue française

The Gene Ontology (GO) database

The "bio-ontologies"
Answer to the problem of inconsistencies in the annotations
- Controlled vocabulary
- Hierarchical classification between the terms of the controlled vocabulary
E.g.: The Gene Ontology
- molecular function ontology
- process ontology
- cellular component ontology

Gene ontology: processes
Gene ontology: molecular functions
Gene ontology: cellular components
Gene Ontology Database
Example: methionine biosynthetic process

Status of GO annotations (NAR DB issue 2006)
Term definitions
- Biological process terms: 9,805
- Molecular function terms: 7,076
- Cellular component terms: 1,574
- Sequence Ontology terms: 963
Genomes with annotation: 30
- Excludes annotations from UniProt, which represent 261 annotated proteomes.
Annotated gene products
- Total: 1,618,739
- Electronic only: 1,460,632
- Manually curated: 158,107

QuickGO Web site
- A user-friendly Web interface to the Gene Ontology.
- Graphical display of the hierarchical relationships between terms.
- Convenient browsing between classes.

Remarks on "bio-ontologies"
Improvement compared to free text
- controlled vocabulary (choice among synonyms)
- hierarchical relationships between the concepts
Nothing to do with the philosophical concept of ontology
- A "bio-ontology" is usually nothing more than a taxonomical classification of the terms of a controlled vocabulary.
Multiple possibilities of classification criteria
- e.g. compartment subtypes (plasma membrane is a membrane)
- e.g. compartment locations (nucleus is inside cytoplasm, which is inside plasma membrane)
To be useful, should remain purpose-based
- each biologist might wish to define his/her own classification based on his/her needs and scope of interest
- impossible to define a unifying standard for all biologists
No representation of molecular interactions
- relationships between objects are only hierarchical, not horizontal or cyclic
- e.g. does not describe which genes are the targets of a given transcription factor

What is biological function?
A general definition
- Function: characteristic action (role) of an element (organ) within a set (often opposed to structure).
  Source: Le Petit Robert - dictionnaire alphabétique et analogique de la langue française
Function and gene ontology
- Understanding the function requires establishing the link between the molecular activity and the context in which it takes place (process).
- Multifunctionality
  - The same activity can play different roles in different processes. Example: the scute gene in Drosophila melanogaster encodes a transcription factor (activity) involved in sex determination, determination of neural precursors, and malpighian tubules (3 processes).
  - The same protein can have multiple activities in a given process. Example: the PutA protein in Escherichia coli contains 2 enzymatic domains (enzymatic activities) + a DNA-binding domain (DNA-binding transcription factor) -> 3 molecular activities in the same process (proline utilization).
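The hierarchical relationships between the terms of a controlled vocabulary can be sketched as a small directed graph in which each term points to its parents, and a term may have several parents. The graph below is a hand-written, simplified illustration built around the examples mentioned in the slides ("plasma membrane is a membrane", methionine biosynthetic process); it is not an extract of the actual Gene Ontology, and the names `TOY_IS_A` and `ancestors` are assumptions made for this example.

```python
# Simplified, hand-written "is a" relationships (child -> list of parents).
# This is an illustration only, not the actual Gene Ontology graph.
TOY_IS_A = {
    "plasma membrane": ["membrane"],
    "membrane": ["cellular component"],
    "methionine biosynthetic process": [
        "methionine metabolic process",
        "sulfur amino acid biosynthetic process",
    ],
    "methionine metabolic process": ["sulfur amino acid metabolic process"],
    "sulfur amino acid biosynthetic process": ["sulfur amino acid metabolic process"],
    "sulfur amino acid metabolic process": ["biological process"],
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following "is a" links."""
    found = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


if __name__ == "__main__":
    for term in ("plasma membrane", "methionine biosynthetic process"):
        print(term, "->", sorted(ancestors(term, TOY_IS_A)))
```

Such a structure answers questions like "is this term a kind of membrane?", but, as noted in the remarks above, it says nothing about horizontal relationships such as which genes are regulated by a given transcription factor.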
LIGAND - Small compounds and metabolic reactions

Small compounds, reactions and metabolic pathways
- KEGG - Kyoto Encyclopedia of Genes and Genomes
- EcoCyc, BioCyc and MetaCyc - Metabolic pathways

Protein interaction networks and transduction pathways

Microarray databases
Human genome resources

HapMap
- The International HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings.
- Associations between genetic variations (SNPs, ...) and diseases + response to pharmaceuticals.

Issues for biological databases

Issues for biomolecular databases
- Dealing with biological complexity
- Data content
  - Coverage
  - Information content
- Data quality
  - Data structure
  - Consistency
- Query capabilities
- Interfaces
  - User interfaces
  - Programmatic interfaces
- Annotation
- Funding

Towards biological complexity
The main databases currently available are focused on one type of molecular entity: nucleic sequences, proteins, compounds, ...
This type of organization is very convenient as long as the information to be represented is simple (e.g. DNA sequences, structures of small molecules and macromolecules).
It becomes more difficult if we want to represent
- the interactions between biological objects,
- the integration of various elements in a biological process (metabolic pathways, protein interaction networks, regulatory networks, ...),
- complex concepts such as biological function.

Data content
- Scope of the database: types of biological objects represented
- Number of entries: coverage of the current knowledge
- Information content: level of detail in the description of the biological objects
- References to the source of information
Data quality
Data consistency
- always use the same name to indicate the same object (this seems trivial, but it is unfortunately still not always the case)
- even better: define an ID for each object, and allow it to be retrieved by any of its synonyms
- spelling mistakes
Data structuring
- distinct fields for distinct attributes of the biological objects
Reliability
- Evidence? Level of confidence?
- Assignment of function by similarity: a recursive process, hence propagation of errors

Query capabilities
- Browsing (click and read)
- Simple search: select records with some constraints
- More elaborate search: select specific fields of some records with constraints on some fields (~SQL SELECT)
- Complex querying: ability to return an answer that results from a "live" computation, and was not part of any record of the database

Interfaces
User interfaces
- user-friendly
- convenient browsing
- intuitive query forms
- visualization (graphical output)
Programmatic interfaces
- communication with external programs: other databases (concept of distributed database), analysis tools

Annotation
Problem
- The flow of available data is increasing exponentially
Strategies
- internal curators
- selected external experts
- public submission
- computer-based extraction of information from biological texts

Funding
Public funding
- Problem: it is easier to obtain public funds for creating a new database than for maintaining or expanding existing resources
Private funding
- Industrial companies are ready to invest in good data and good query capabilities
- They are interested in academic expertise
Solutions
- All users pay (per query, for example). Note: academic users are funded by public funds anyway.
- Hybrid solution
  - access is free for academic users, not for companies
  - companies can buy the whole database and install it in-house (+ add their own private data)
  - the academia-industry interface is often ensured by a spin-off company
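Two of the points above, giving each object a unique ID that can be retrieved by any of its synonyms, and selecting specific fields with constraints on other fields (~SQL SELECT), can be sketched together with Python's built-in `sqlite3` module. The schema, the identifiers (`G0001`, `G0002`) and the helper names are hypothetical and chosen only for illustration; they do not correspond to any real database.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with distinct fields per attribute."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            id TEXT PRIMARY KEY,      -- one stable ID per object
            organism TEXT,
            description TEXT
        );
        CREATE TABLE synonym (
            name TEXT,                -- any name used for the gene
            gene_id TEXT REFERENCES gene(id)
        );
        """
    )
    # Hypothetical identifiers and records, for illustration only.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [
            ("G0001", "Drosophila melanogaster", "eyeless"),
            ("G0002", "Homo sapiens", "paired box 6"),
        ],
    )
    conn.executemany(
        "INSERT INTO synonym VALUES (?, ?)",
        [
            ("eyeless", "G0001"),
            ("ey", "G0001"),
            ("PAX6", "G0002"),
            ("paired box 6", "G0002"),
        ],
    )
    return conn


def find_gene(conn, name):
    """Select specific fields of the genes known under `name` (case-insensitive)."""
    query = """
        SELECT g.id, g.organism
        FROM gene AS g
        JOIN synonym AS s ON s.gene_id = g.id
        WHERE s.name = ? COLLATE NOCASE
    """
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for query_name in ("ey", "Pax6", "unknown"):
        print(query_name, "->", find_gene(connection, query_name))
```

Keeping the synonyms in a separate table means a record is found regardless of which name the user types, while the stable ID remains the single key used for cross-references.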
More informationIntroduc)on to Databases and Resources Biological Databases and Resources
Introduc)on to Bioinforma)cs Online Course : IBT Introduc)on to Databases and Resources Biological Databases and Resources Learning Objec)ves Introduc)on to Databases and Resources - Understand how bioinforma)cs
More informationSequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases
Chapter 2 Paul Rangel Abstract DNA and Protein sequence databases are the cornerstone of bioinformatics research. DNA databases such as GenBank and EMBL accept genome data from sequencing projects around
More informationWeek 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers
Week 1 BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html
More informationAccess to Information from Molecular Biology and Genome Research
Future Needs for Research Infrastructures in Biomedical Sciences Access to Information from Molecular Biology and Genome Research DG Research: Brussels March 2005 User Community for this information is
More informationLecture 7 Motif Databases and Gene Finding
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 7 Motif Databases and Gene Finding Motif Databases & Gene Finding Motifs Recap Motif Databases TRANSFAC
More informationDRAGON DATABASE OF GENES ASSOCIATED WITH PROSTATE CANCER (DDPC) Monique Maqungo
DRAGON DATABASE OF GENES ASSOCIATED WITH PROSTATE CANCER (DDPC) Monique Maqungo South African National Bioinformatics Institute University of the Western Cape RELEVEANCE OF DATA SHARING! Fragmented data
More informationTraining materials.
Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation
More informationIdentifying Regulatory Regions using Multiple Sequence Alignments
Identifying Regulatory Regions using Multiple Sequence Alignments Prerequisites: BLAST Exercise: Detecting and Interpreting Genetic Homology. Resources: ClustalW is available at http://www.ebi.ac.uk/tools/clustalw2/index.html
More informationBioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases
Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras Lecture - 5a Protein sequence databases In this lecture, we will mainly discuss on Protein Sequence
More informationLecture 2 Introduction to Data Formats
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 2 Introduction to Data Formats Introduction to Data Formats Real world, data and formats Sequences and
More informationCompiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology
Bioinformatics Model Answers Compiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology Page 1 of 15 Previous years questions asked. 1. Describe the software used in bioinformatics 2. Name four
More informationBIOINFORMATICS TO ANALYZE AND COMPARE GENOMES
BIOINFORMATICS TO ANALYZE AND COMPARE GENOMES We sequenced and assembled a genome, but this is only a long stretch of ATCG What should we do now? 1. find genes What are the starting and end points for
More informationBasic Bioinformatics: Homology, Sequence Alignment,
Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi
More informationIntroduction to 'Omics and Bioinformatics
Introduction to 'Omics and Bioinformatics Chris Overall Department of Bioinformatics and Genomics University of North Carolina Charlotte Acquire Store Analyze Visualize Bioinformatics makes many current
More informationUCSC Genome Browser. Introduction to ab initio and evidence-based gene finding
UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene
More informationEnsembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets
Ensembl workshop Thomas Randall, PhD tarandal@email.unc.edu bioinformatics.unc.edu www.unc.edu/~tarandal/ensembl handouts, papers, datasets Ensembl is a joint project between EMBL - EBI and the Sanger
More informationFunction Prediction of Proteins from their Sequences with BAR 3.0
Open Access Annals of Proteomics and Bioinformatics Short Communication Function Prediction of Proteins from their Sequences with BAR 3.0 Giuseppe Profiti 1,2, Pier Luigi Martelli 2 and Rita Casadio 2
More informationIntroduction to Molecular Biology Databases
Introduction to Molecular Biology Databases Laboratorio de Bioinformática Centro de Astrobiología INTA-CSIC Centro de Astrobiología PRESENT BIOLOGY RESEARCH Data sources Genome sequencing projects: genome
More informationRetrieval of gene information at NCBI
Retrieval of gene information at NCBI Some notes 1. http://www.cs.ucf.edu/~xiaoman/fall/ 2. Slides are for presenting the main paper, should minimize the copy and paste from the paper, should write in
More informationDina El-Khishin (Ph.D.) Bioinformatics Research Facility. Deputy Director of AGERI & Head of the Genomics, Proteomics &
Dina El-Khishin (Ph.D.) Deputy Director of AGERI & Head of the Genomics, Proteomics & Bioinformatics Research Facility Agricultural Genetic Engineering Research Institute (AGERI) Giza EGYPT Bioinformatics
More informationMS bioinformatics analysis for proteomics. Protein anotations
MS bioinformatics analysis for proteomics Protein anotations UCO - Córdoba Organized by: ProteoRed, EUPA and Seprot Alberto Medina January, 23rd 2009 Summary Introduction Some issues Software: Fatigo -
More informationAnalysis of Microarray Data
Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review Visualizing
More informationGenome and DNA Sequence Databases. BME 110: CompBio Tools Todd Lowe April 5, 2007
Genome and DNA Sequence Databases BME 110: CompBio Tools Todd Lowe April 5, 2007 Admin Reading: Chapters 2 & 3 Notes available in PDF format on-line (see class calendar page): http://www.soe.ucsc.edu/classes/bme110/spring07/bme110-calendar.html
More informationSmall Genome Annotation and Data Management at TIGR
Small Genome Annotation and Data Management at TIGR Michelle Gwinn, William Nelson, Robert Dodson, Steven Salzberg, Owen White Abstract TIGR has developed, and continues to refine, a comprehensive, efficient
More informationThe RNA tools registry
university of copenhagen f a c u lt y o f h e a lt h a n d m e d i c a l s c i e n c e s The RNA tools registry A community effort to catalog RNA bioinformatics resources and their relationships Anne Wenzel
More informationTextbook Reading Guidelines
Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science
More informationFACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE
FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE BIOMOLECULES COURSE: COMPUTER PRACTICAL 1 Author of the exercise: Prof. Lloyd Ruddock Edited by Dr. Leila Tajedin 2017-2018 Assistant: Leila Tajedin (leila.tajedin@oulu.fi)
More informationEngineering Genetic Circuits
Engineering Genetic Circuits I use the book and slides of Chris J. Myers Lecture 0: Preface Chris J. Myers (Lecture 0: Preface) Engineering Genetic Circuits 1 / 19 Samuel Florman Engineering is the art
More informationUpstream/Downstream Relation Detection of Signaling Molecules using Microarray Data
Vol 1 no 1 2005 Pages 1 5 Upstream/Downstream Relation Detection of Signaling Molecules using Microarray Data Ozgun Babur 1 1 Center for Bioinformatics, Computer Engineering Department, Bilkent University,
More informationOverview of Health Informatics. ITI BMI-Dept
Overview of Health Informatics ITI BMI-Dept Fellowship Week 5 Overview of Health Informatics ITI, BMI-Dept Day 10 7/5/2010 2 Agenda 1-Bioinformatics Definitions 2-System Biology 3-Bioinformatics vs Computational
More informationBioinformatic Tools. So you acquired data.. But you wanted knowledge. So Now What?
Bioinformatic Tools So you acquired data.. But you wanted knowledge So Now What? We have a series of questions What the Heck is That Ion? How come my MW does not match? How do I make a DB to search against?
More information