MOLECULAR BIOLOGY DATABASES. Juan Carlos Sánchez Ferrero
1 MOLECULAR BIOLOGY DATABASES Juan Carlos Sánchez Ferrero Centro Nacional de Biotecnología, CSIC July 2008
2 GROWING NUMBER OF DATA Molecular biology data explosion in the omics era: genome sequencing, high-throughput proteomics, structural genomics, functional genomics. [Figure: growth by year of protein structures (PDB) and nucleotide sequences (GenBank)] Bioinformatics: to uncover relevant biological information hidden in the huge amount of biological data. All that information (primary data and derived data) is stored in databases.
3 Ouellette, 2000
4 PRIMARY BIOLOGICAL DATA SOURCES
- Genome sequencing projects: genome sequences
- Functional genomics: gene expression data
- Proteomics: protein catalogues, post-translational modifications
- Structural genomics: protein structures
- Functional interactions between cellular components: gene regulatory networks
- Physical interactions between proteins: protein interaction networks
- Diverse experimental data: unstructured information => publications
5 GROWING NUMBER OF DATABASES The number of molecular biology databases is also increasing rapidly: to a lesser extent because of the huge amount of data and the new types of biological data, and mostly because of database specialization, e.g. by taxonomic group.
6 Environmental genome shotgun sequencing of the Sargasso Sea. Venter et al. Science March billion nucleotides 1.2 million new genes 782 new rhodopsin like photoreceptors 1800 genomic species
7 HOW TO ACCESS DATABASES Databases are organized: As flat files with all the information in a single text file As relational databases (based on MySQL, Oracle, etc) Users can: Query a database through a web form and get the information on the screen (HTML) or download it as text (e.g. sequences) and other formats such as XML. Download all or the most relevant information of a database as text or XML
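The two ways of organizing a database can be contrasted with a minimal sketch. It assumes a toy tab-separated flat file and uses Python's built-in sqlite3 module as a stand-in for a relational engine such as MySQL or Oracle; the table layout and the second entry are invented for illustration.

```python
import sqlite3

# Toy flat file: one tab-separated entry per line (accession, organism, length).
# The second entry is invented for illustration.
FLAT_FILE = "X93081\tBacillus subtilis\t2664\nTOY00001\tEscherichia coli\t1200\n"

def flat_lookup(text, accession):
    """Scan the flat file line by line until the accession is found."""
    for line in text.splitlines():
        acc, organism, length = line.split("\t")
        if acc == accession:
            return organism, int(length)
    return None

# Load the same entries into an in-memory relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (accession TEXT PRIMARY KEY, organism TEXT, length INTEGER)"
)
rows = [line.split("\t") for line in FLAT_FILE.splitlines()]
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [(acc, organism, int(length)) for acc, organism, length in rows],
)

def relational_lookup(accession):
    """Let the database engine find the row through the primary key."""
    return conn.execute(
        "SELECT organism, length FROM entry WHERE accession = ?", (accession,)
    ).fetchone()

print(flat_lookup(FLAT_FILE, "X93081"))
print(relational_lookup("X93081"))
```

Both lookups return the same entry; the difference is that the flat file is read sequentially, while the relational table is queried through an index.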
8 CLASSIFICATION OF MOLECULAR BIOLOGY DATABASES First classification scheme: Classification scheme based on the source of the database content (core data): Primary DBs: the content consists of experimentally obtained data Secondary DBs: the content is the result of analyses of data in primary DBs
9 CONTENT OF PRIMARY DATABASES Experimentally derived information:
- Nucleic acid sequences: complete genomes, cloned genome fragments, cDNAs, ESTs, SNPs, small RNAs
- Protein or nucleic acid structures: atomic coordinates obtained by NMR and X-ray crystallography
- Transcript or protein expression data: obtained in microarray experiments and by proteomic approaches, respectively
- Cellular processes, such as experimentally determined metabolic or regulatory pathways
10 CONTENT OF SECONDARY DATABASES Predictions or interpretations based on information contained in primary databases Protein sequences, deduced from nucleotide sequences Alignments of protein or nucleic acid sequences Protein families, inferred by sequence similarity or by the presence of common motifs or domains Protein families, inferred by structural similarity Reconstructed (predicted) cellular processes, such as metabolic pathways
11 CLASSIFICATION OF MOLECULAR BIOLOGY DATABASES Second classification scheme: Follows the well known layout to describe the levels of organization of protein structures. Primary DBs: the content consists of SEQUENCES (primary structure of nucleic acids or proteins). Secondary DBs: the content consists of PATTERNS (regions of local regularity, for example, conserved motifs or domains). Structure DBs: the content consists of sets of ATOMIC COORDINATES (three-dimensional packing of secondary structure elements). This scheme applies only to biological molecules (for example, a database of results from micro-array expression experiments would not fit).
12 DATABASE CONTENT & ANNOTATIONS Additional information complementing the DB content information Annotations can refer to: Authorship of the entry Experimental conditions Source of the biological material Sub-cellular location Molecular function or cellular process Bibliographic references Cross-references: entry attributes that make reference to related entries in other databases. Some annotations consist of information of secondary type, since they are predictions, and some of them have been transferred from other databases
13 DATABASE SEARCHES: QUERY TYPES TEXT QUERIES This type of fixed-form query is a search performed against the ANNOTATIONS, which, almost by definition, consist of text. It is usual to allow the combination of words with Boolean operators (and, or, not), the use of wildcards (*), and the specification of the search field. For example: ponb and ayala* [AUTH] would search for the word ponb in any field, combined with a search for words starting with ayala in the author field.
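The query ponb and ayala* [AUTH] can be mimicked with a minimal sketch. The annotation records, the TITL field name and the helper functions below are hypothetical, and real retrieval systems use prebuilt indexes rather than a linear scan.

```python
from fnmatch import fnmatchcase

# Hypothetical annotation records; field names and contents are illustrative only.
RECORDS = [
    {"id": "rec1", "AUTH": "Ayala J", "TITL": "Cloning of the ponB gene"},
    {"id": "rec2", "AUTH": "Ayalon M", "TITL": "Unrelated membrane protein"},
    {"id": "rec3", "AUTH": "Smith P", "TITL": "ponB mutants of E. coli"},
]

def term_matches(record, pattern, field=None):
    """True if any word in the chosen field (or in any field) matches the pattern."""
    fields = [field] if field else [k for k in record if k != "id"]
    words = " ".join(record.get(f, "") for f in fields).lower().split()
    return any(fnmatchcase(word, pattern.lower()) for word in words)

def search_and(records, terms):
    """Combine (pattern, field) terms with a Boolean AND."""
    return [r["id"] for r in records
            if all(term_matches(r, pattern, field) for pattern, field in terms)]

# "ponb" in any field AND "ayala*" restricted to the author field.
hits = search_and(RECORDS, [("ponb", None), ("ayala*", "AUTH")])
print(hits)
```

Only the first record satisfies both terms: the third mentions ponB but has a different author, and the second has an author that does not match the wildcard.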
14 DATABASE SEARCHES: QUERY TYPES QUERIES by CONTENT This type of fixed-form query refers to searches performed against the CONTENT or CORE DATA, which, almost by definition, consists of abstract representations: strings of characters that represent nucleotide or protein sequences; tables of atomic coordinates that represent three-dimensional objects; bitmap files that represent 2D gel images.
15 DATABASE SEARCHES: QUERY TYPES QUERIES by CONTENT For example, in the case of a sequence database, we may be asking: does a sequence exactly like LLLIHRLH, or similar to it, exist in the database? Because biological sequences change over time and differ between individuals, this type of search is not a matter of finding exact matches in strings of characters. Instead, it must take the principles of molecular evolution into account. A number of algorithms have been developed to cope with that task, and BLAST is the most popular.
16 NUCLEOTIDE DATABASES International collaboration among the three main nucleotide sequence databases: NCBI GenBank, EMBL Nucleotide Sequence Database and DNA Data Bank of Japan (DDBJ). Every sequence that is cited in a publication must be submitted to one of these databases and made publicly available. Submissions come from individual laboratories and as batch submissions from large-scale sequencing projects. Daily exchange of information between databases.
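The difference between an exact match and a similarity search can be illustrated with a minimal sketch. It is not BLAST: it only slides the query along each sequence without gaps and counts identical positions, with no substitution matrix or statistics. The toy database and the 0.75 identity threshold are assumptions.

```python
def exact_hits(query, database):
    """Names of entries that contain the query as an exact substring."""
    return [name for name, seq in database.items() if query in seq]

def best_ungapped_identity(query, seq):
    """Highest fraction of identical positions over all ungapped placements."""
    n = len(query)
    best = 0.0
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        same = sum(a == b for a, b in zip(query, window))
        best = max(best, same / n)
    return best

def similar_hits(query, database, min_identity=0.75):
    """Names of entries with at least one window above the identity threshold."""
    return [name for name, seq in database.items()
            if best_ungapped_identity(query, seq) >= min_identity]

# Toy protein database, invented for illustration.
DB = {
    "protA": "MKTLLLIHRLHAAG",  # contains the query exactly
    "protB": "MKTLLLVHRLHAAG",  # one substitution (I to V)
    "protC": "MSDNGTEEQWPYKR",  # unrelated
}
QUERY = "LLLIHRLH"

print(exact_hits(QUERY, DB))
print(similar_hits(QUERY, DB))
```

The exact search finds only protA, while the similarity search also recovers protB, which differs from the query at a single position.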
17 NUCLEOTIDE DATABASES GenBank Maintained at the National Center for Biotechnology Information (NCBI) More than 61 million nucleotide sequences from ~240,000 organisms Diverse sources: expressed sequence tag (EST), high throughput genomic (HTG), environmental sample (ENV), whole genome shotgun (WGS),... Features with biological significance such as coding regions and their translations, transcription units, repeat regions, and sites of mutations Complete bimonthly releases and daily updates Accessible through Entrez (NCBI's search and retrieval system) and FTP RefSeq: comprehensive, integrated, non-redundant set of sequences including genomic DNA, transcripts and proteins
18 NUCLEOTIDE DATABASES EMBL Nucleotide Sequence Database Maintained at the European Bioinformatics Institute (EBI) 80.5 million entries from ~260,000 organisms (as of Sept. 2006) Entry types include standard (STD), constructed (CON), third party annotation (TPA), whole genome shotgun (WGS), annotated constructed (ANN) and mass genome annotation library (MGA) Complete releases every three months Accessible through the EBI Sequence Retrieval System (SRS), other web services and FTP
19 GENOME DATABASES
- Ensembl, EBI: mostly vertebrates
- Entrez Genome, NCBI: eukaryotes, prokaryotes and viruses
- UCSC Genome Browser: mostly animals
- The Institute for Genomic Research (TIGR): bacteria, fungi, parasites, plants
20 GENOME DATABASES Ensembl
- Joint project between EMBL-EBI and the Sanger Institute
- Comprehensive source of annotation of (mostly) chordate genome sequences
- Automated annotation system of genes from unannotated species
- Currently 37 genomes (Feb. 2008)
- Features include: chromosome maps, contig views, gene predictions, annotations, gene structures (exons-introns), regulatory regions, SNPs, mRNAs, peptides, comparative genomics and evolutionary trees
Hubbard et al. (2007) Nucl. Acids Res. 35:D610-D617
21 GENOME DATABASES Entrez Genome Includes complete chromosomes, organelles and plasmids as well as draft genome assemblies Chromosome views, contig maps, sequence maps (NCBI Map Viewer) Integrated with Entrez Nucleotide and Entrez protein Entrez Genome Project: 22 eukaryotic and 646 prokaryotic complete genomes (Feb. 2008)
22 GENOME DATABASES OTHER SPECIALIZED DATABASES Rat Genome Database: laboratory rat, Rattus norvegicus VectorBase: invertebrate vectors of human pathogens (e.g. Anopheles) FlyBase: genus Drosophila BeetleBase: genus Tribolium WormBase: C. elegans and C. briggsae PlasmoDB: Plasmodium falciparum TAIR: Arabidopsis thaliana Candida Genome Database: Candida albicans Saccharomyces Genome Database, CYGD: Saccharomyces cerevisiae EcoCyc: Escherichia coli PBRC: family Poxviridae
23 GENE-CENTERED DATABASES Complete information about a gene: from genomic location to protein function Integrated with NCBI Entrez, contains genes from multiple genomes Only human genes, maintained by the Weizmann Institute of Science
24 GENE-CENTERED DATABASES Genomic context Transcripts and links to Entrez Nucleotide Proteins and links to Entrez Protein Functional information extracted from Pubmed Protein interactions Gene Ontology terms
25 PROTEIN DATABASES Non-redundant data from SwissProt, PIR, PDB, and translations of all coding sequences present in the EMBL/GenBank/DDBJ Sources: protein databases including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq
26 PROTEIN DATABASES Consortium by the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB) The world's most comprehensive catalog of information on protein
27 PROTEIN DATABASES UniProtKB/Swiss-Prot Manually curated protein sequences from all organisms Functional information with bibliographic references Domains and sites Secondary and tertiary structures Posttranslational modifications 356,194 entries in UniProtKB/Swiss-Prot (Feb. 2008)
28 PROTEIN DATABASES UniProtKB/TrEMBL Translations of all coding sequences present in the EMBL/GenBank/DDBJ Automatic annotation and classification Proteins that get manually annotated go to UniProtKB/Swiss-Prot 5,395,414 entries in UniProtKB/TrEMBL (Feb. 2008)
29 STRUCTURE DATABASES Protein Data Bank RCSB PDB (USA), MSD-EBI (Europe), PDBj (Japan) and BMRB (USA) To maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available Protein and nucleic acid structures X-ray crystallography, NMR and electron microscopy More than structures (Feb 2008)
30 STRUCTURE DATABASES Protein structure classification databases Structural Classification of Proteins Folds, superfamilies and families of structural domains Classified by manual inspection Currently more than 34,000 PDB structures (Feb 2008) Hierarchical classification of protein domain structures Class(C), Architecture(A), Topology(T) and Homologous(H) superfamily Combination of automated and manual procedures More than 30,000 classified PDB structures (Feb 2008)
31 PROTEIN DOMAINS DATABASES Protein domains Elements that usually fold independently of the rest of the protein chain They often play an important role in the biological function of the protein Domains defined with sequence patterns and profiles Domains defined with hidden Markov models (HMM) Domains defined with HMMs, and Pfam domains Domain definitions from multiple databases Domain definitions from Pfam, SMART and COG
32 PROTEIN DOMAIN DATABASES Around ~9,300 protein families/domains Covers 73% of proteins in UniProtKB Tool for searching domains in protein sequences Domains are defined with HMM profiles Multiple sequence alignments Protein domain architectures Species distribution Cross-references with protein sequence and structure databases Maintained at the Sanger Institute
33 GENE EXPRESSION DATABASES Gene expression profiling studies, array comparative genomic hybridization, chromatin immunoprecipitation on arrays (ChIP-chip) studies, etc. Experiment-centered and gene-centered views.
- Repository maintained by NCBI: more than 8,000 experiments
- ArrayExpress, maintained by the EBI: around 3,300 experiments; link to Expression Profiler, an online data analysis tool
34 PROTEIN INTERACTION DATABASES Information on interaction partners, interaction type and experimental evidence Experimental protein-protein interactions (literature) Direct (physical) and indirect (functional) associations More than 62,000 proteins and 162,000 interactions Maintained at the EBI Known and predicted protein-protein interactions Direct and indirect associations More than 1.5 million proteins from 373 organisms At EMBL and UniZH
35 PATHWAY DATABASES Display metabolic and signaling pathways and how genes are functionally connected Comprehensive catalogue of genes and pathways at Kyoto University Human signaling pathways database by Nature and National Cancer Institute Metabolic pathways mostly in microorganisms and plants
36 BIBLIOGRAPHIC DATABASES Published experimental information Over 16 million citations from MEDLINE and other life science journals Links to full text articles Links to related articles Links to gene information and other NCBI databases US National library of Medicine
37 ONTOLOGIES IN MOLECULAR BIOLOGY Controlled and structured vocabularies, constructed with two purposes Proposing standard collections of terms for annotation Organizing the knowledge of a given field around its language Examples of ontologies Enzyme Commission Nomenclature: EC numbers for enzymes MeSH (Medical Subject Headings) terms: NLM controlled vocabulary Gene Ontology: controlled vocabulary to describe gene and gene product attributes in any organism
38 ONTOLOGIES IN MOLECULAR BIOLOGY Enzyme Commission Nomenclature.
- EC 1.-.-.- Oxidoreductases.
- EC 1.1.-.- Acting on the CH-OH group of donors.
- EC 1.1.1.- With NAD(+) or NADP(+) as acceptor.
- EC 1.1.2.- With a cytochrome as acceptor.
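The hierarchical nature of EC numbers can be made explicit with a minimal sketch that expands a number into the classes that contain it. The function name is hypothetical, and the sketch assumes the usual four-field notation in which "-" marks an unspecified level.

```python
def ec_lineage(ec_number):
    """Return the EC classes containing a number, from most general to most specific."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number has four dot-separated fields")
    lineage = []
    for depth in range(1, 5):
        if parts[depth - 1] == "-":
            break  # levels below this one are unspecified
        lineage.append(".".join(parts[:depth] + ["-"] * (4 - depth)))
    return lineage

print(ec_lineage("1.1.1.1"))
print(ec_lineage("1.1.-.-"))
```

Annotating an enzyme with a full EC number therefore also places it in every broader class, which is what makes the nomenclature usable as a controlled, structured vocabulary.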
39 GENE ONTOLOGY Developed and maintained by the GO Consortium of laboratories and institutions involved in molecular biology database management Gene Ontology terms have been grouped in three ontologies: Molecular functions Biological processes Cellular components Most molecular biology databases have joined this initiative, and have included annotations following this standard
40 GENE ONTOLOGY
- Annotators: database curators
- Gene Ontology Annotation (GOA) database: central repository that applies this controlled vocabulary to a non-redundant set of proteins
- GO browsers: search GO terms for a gene/protein; retrieve genes/proteins associated with a GO term
41 GenBank and EMBL GenBank and the EMBL nucleotide databases were founded in Contain ALL DNA sequences ever published. Submissions are made by the laboratories or centers that obtain them. Depositing sequences in GenBank / EMBL is a requisite of most journals to accept manuscripts in which new sequences are reported. In August 2003 they contained: 18, sequence files , nucleotides Every two months a new complete version of the database is published.
42 Organismal Divisions in GenBank
- PRI: Primate
- ROD: Rodent
- MAM: Mammalian
- VRT: Vertebrate
- INV: Invertebrate
- PLN: Plant
- BCT: Bacterial
- RNA: Structural RNA
- VRL: Viral
- PHG: Phage
- SYN: Synthetic
- UNA: Unannotated
Functional Divisions in GenBank
- PAT: Patent
- EST: Expressed Sequence Tags
- STS: Sequence Tagged Sites
- GSS: Genome Survey Sequences
- HTG: High Throughput Genome
43 Entrez is the indexing and data retrieval system developed by the NCBI The Entrez Global Query page searches NCBI Entrez databases, either individually or globally
44 MAIN DATABASES ACCESSIBLE WITH ENTREZ
- PubMed: Bibliographic references in Molecular Biology and Medicine.
- Nucleotide: Composite database of DNA sequences from GenBank, EMBL and DDBJ, plus other databases or projects such as RefSeq. RefSeq contains nucleotide sequences from the Nucleotide database that have been curated or re-annotated by the NCBI.
- Protein: Protein sequences derived from translation of DNA sequences in GenBank, EMBL and DDBJ, plus sequences from PIR, SWISSPROT and Protein Data Bank (PDB).
- Genome: Complete genomes, chromosomes, contig maps, physical maps.
45 MAIN DATABASES ACCESSIBLE WITH ENTREZ
- Entrez Gene: locus-centered database that integrates information from other databases (replaces LocusLink).
- Structure: (Molecular Modeling Database, MMDB) experimentally obtained structures from Protein Data Bank (PDB).
- Taxonomy: Names and taxonomy of organisms that have at least one sequence in the NCBI databases.
- OMIM: Online Mendelian Inheritance in Man, catalog of human mutations and associated diseases.
46 SEQUENCE FILES FLAT FILE FORMATS For sequence databases, the main formats are: FASTA, GenBank, EMBL and SwissProt
47 FASTA format
The ">" character is essential; the rest of the header line can carry the GI, Accession.version, Locus and additional information.
>gi emb X BSBOFCGEN B.subtilis bofc gene
CTGCAGCGGCTGACAATAGCAGGCCGACAACGGTTGAGGTGTCAACAGCTGATTTTGTGATGAAGGATAA
ACCGCATTTCTTTTTCCTTGAACGCTATAAGGATTCATATGAGGAGGAGATTCTCCGTTTTGCAGAAGCG
ATCGGCACAAACCAGGAGACTCCCTGCACCGGCAATGACGGTTTACAGGCCGGGAGGATCGCCAGAGCAG
CACAGCAATCGCTTGCTTTTGGCATGCCTGTTAGCATTGAGCACACTGAAAAAATCGCTTTTTAATCTAA
CAGGATTACAATTCAGCAAGCTTGGGTATATACTCCATTGATACTTTAAGTAGGCGGTGGAGAAAATGAA
TACAGTACATGCTAAAGGAAATGTTTTGAACAAAATCGGAATTCCTTCTCACATGGTTTGGGGTTATATT
GGCGTTGTCATCTTTATGGTTGGAGACGGCCTCGAACAAGGCTGGCTGTCTCCTTTTCTCGTTGATCATG
GTCTCAGTATGCAGCAATCCGCATCGTTATTTACCATGTACGGCATTGCTGTCACCATCTCAGCTTGGCT
TTCAGGAACGTTTGTGGAAACTTGGGGGCCGAGAAAAACGATGACTGTCGGATTGCTTGCATTTATCCTC
Minimal form, with only the essential ">":
>
CTGCAGCGGCTGACAATAGCAGGCCGACAACGGTTGAGGTGTCAACAGCTGATTTTGTGATGAAGGATAA
ACCGCATTTCTTTTTCCTTGAACGCTATAAGGATTCATATGAGGAGGAGATTCTCCGTTTTGCAGAAGCG
ATCGGCACAAACCAGGAGACTCCCTGCACCGGCAATGACGGTTTACAGGCCGGGAGGATCGCCAGAGCAG
CACAGCAATCGCTTGCTTTTGGCATGCCTGTTAGCATTGAGCACACTGAAAAAATCGCTTTTTAATCTAA
CAGGATTACAATTCAGCAAGCTTGGGTATATACTCCATTGATACTTTAAGTAGGCGGTGGAGAAAATGAA
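Reading the FASTA format takes very little code, as the minimal sketch below suggests. The parser is a plain-Python illustration (the function name is hypothetical) and the sample is a shortened version of the record above, with the header reduced to the locus name and description.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Shortened sample: the second record has a bare ">" header.
SAMPLE = """>BSBOFCGEN B.subtilis bofc gene
CTGCAGCGGCTGACAATAGC
AGGCCGACAACGGTTGAGGT
>
CTGCAGCGG
"""

records = list(parse_fasta(SAMPLE))
for header, sequence in records:
    print(repr(header), len(sequence))
```

Sequence lines are simply concatenated, so the line width used in the file does not affect the parsed sequence.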
49 GBFF: HEADER
LOCUS       BSBOFCGEN 2664 bp DNA linear BCT 15 APR 1997
DEFINITION  B.subtilis bofc, orf1, csbx, and orf4 genes.
ACCESSION   X93081
VERSION     X GI:
KEYWORDS    bofc gene; csbx gene; ORF1; ORF4.
SOURCE      Bacillus subtilis
  ORGANISM  Bacillus subtilis
            Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
REFERENCE   1
  AUTHORS   Gomez,M. and Cutting,S.M.
  TITLE     BofC encodes a putative forespore regulator of the Bacillus subtilis sigma K checkpoint
  JOURNAL   Microbiology 143 (Pt 1), (1997)
  MEDLINE
  PUBMED
REFERENCE   2 (bases 1 to 2664)
  AUTHORS   Cutting,S.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (14 NOV 1995) S.M. Cutting, Dept. of Microbiology, University of Pennsylvania School of Medicine, 346 Johnson Pavillon, 3610 Hamilton Walk, Philadelphia, PA , USA
50 GBFF: FEATURES FEATURES source gene CDS gene CDS Location/Qualifiers /organism="bacillus subtilis" /strain="py79" /isolate="168" /db_xref="taxon:1423" /germline /gene="orf1" < /gene="orf1" /codon_start=3 /transl_table=11 /protein_id="caa " /db_xref="gi: " /db_xref="sptrembl:o05389" /translation="aaadnsrpttvevstadfvmkdkphffflerykdsyeeei EAIGTNQETPCTGNDGLQAGRIARAAQQSLAFGMPVSIEHTEKIAF" /gene="csbx" /gene="csbx" /note="sigma B transcribed gene" /codon_start=1
51 GBFF: SEQUENCE BASE COUNT 670 a 518 c 690 g 786 t ORIGIN 1 ctgcagcggc tgacaatagc aggccgacaa cggttgaggt gtcaacagct gattttgtga 61 tgaaggataa accgcatttc tttttccttg aacgctataa ggattcatat gaggaggaga 121 ttctccgttt tgcagaagcg atcggcacaa accaggagac tccctgcacc ggcaatgacg 181 gtttacaggc cgggaggatc gccagagcag cacagcaatc gcttgctttt ggcatgcctg 241 ttagcattga gcacactgaa aaaatcgctt tttaatctaa caggattaca attcagcaag 301 cttgggtata tactccattg atactttaag taggcggtgg agaaaatgaa tacagtacat 361 gctaaaggaa atgttttgaa caaaatcgga attccttctc acatggtttg gggttatatt 421 ggcgttgtca tctttatggt tggagacggc ctcgaacaag gctggctgtc tccttttctc 481 gttgatcatg gtctcagtat gcagcaatcc gcatcgttat ttaccatgta cggcattgct 541 gtcaccatct cagcttggct ttcaggaacg tttgtggaaa cttgggggcc gagaaaaacg 601 atgactgtcg gattgcttgc atttatcctc ggttcggccg cttttatcgg ctgggcgatt 661 cctcatatgt attatccggc tctcttgggc agctatgctc ttagaggctt gggatatccg 721 ctgtttgcat actcttttct cgtatgggtg tcatacagca cctctcaaaa tattcttgga 781 aaagccgtcg gctggttttg gtttatgttt acgtgcggcc ttaacgtgct cggtccgttc 841 tattccagct atgcagttcc ggcctttgga gaaatcaata cgctttggag cgctttactg 901 tttgtggcgg caggcggaat tcttgcctta ttttttaaca aagataaatt tactccgata 961 caaaaacaag atcagccgaa atggaaagaa ctgtcgaagg catttacgat tatgtttgaa 1021 aaccctaagg taggcatcgg cggagtggtc aagacgatta atgcgatagg acaatttgga 1081 tttgccatct ttcttcctac ttatttagca cgatacgggt attcggtttc ggaatggctg 1141 caaatatggg ggactctgtt ttttgtgaat // LOCUS BSOTHERGENE 4356 bp DNA linear BCT 15 APR 1997
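The sequence block of a GenBank flat file can be recovered with a minimal sketch like the one below, which reads everything between the ORIGIN line and the "//" terminator and recounts the bases, a check that can be compared with the BASE COUNT line. The function name and the shortened toy record (the first 70 bases of the entry above under a made-up locus name) are assumptions.

```python
from collections import Counter

def read_origin(flatfile_text):
    """Return the sequence found between the ORIGIN line and the '//' terminator."""
    bases, in_origin = [], False
    for line in flatfile_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop the leading position number, keep the 10-base blocks.
            bases.extend(tok for tok in line.split() if not tok.isdigit())
    return "".join(bases)

# Shortened toy record; TOYRECORD is a hypothetical locus name.
SAMPLE = """LOCUS       TOYRECORD 70 bp DNA linear BCT
ORIGIN
        1 ctgcagcggc tgacaatagc aggccgacaa cggttgaggt gtcaacagct gattttgtga
       61 tgaaggataa
//
"""

seq = read_origin(SAMPLE)
counts = Counter(seq)
print(len(seq), dict(counts))
```

Because "//" closes each entry, the same loop can be restarted at the next LOCUS line to walk through a file holding many records.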
52 EMBL AND SwissProt FORMAT
The field name is specified by a TWO-CHARACTER keyword at the beginning of each line, which makes it easier to parse the file and extract specific information.
ID   BSBOFCGEN standard; genomic DNA; PRO; 2664 BP.
XX
AC   X93081;
XX
SV   X
XX
DT   15-APR-1997 (Rel. 51, Created)
DT   15-APR-1997 (Rel. 51, Last updated, Version 12)
XX
DE   B.subtilis bofc, orf1, csbx, and orf4 genes
XX
KW   bofc gene; csbx gene; ORF1; ORF4.
XX
OS   Bacillus subtilis
OC   Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
XX
53 EMBL AND SwissProt FORMAT RN [1] RX MEDLINE; RX PUBMED; RA Gomez M., Cutting S.M.; RT "BofC encodes a putative forespore regulator RT of the Bacillus subtilis sigma K checkpoint"; RL Microbiology 143: (1997). XX XX DR GOA; O DR GOA; O DR GOA; O DR GOA; O DR SWISS-PROT; O05389; YRBE_BACSU. DR SWISS-PROT; O05390; CSBX_BACSU. DR SWISS-PROT; O05391; BOFC_BACSU. DR SWISS-PROT; O05392; RUVA_BACSU.
54 EMBL AND SwissProt FORMAT FT source FT /db_xref="taxon:1423" FT /germline FT /mol_type="genomic DNA" FT /organism="bacillus subtilis" FT /strain="py79" FT /isolate="168" FT CDS < FT /codon_start=3 FT /db_xref="goa:o05389" FT /db_xref="swiss-prot:o05389" FT /transl_table=11 FT /gene="orf1" FT /protein_id="caa " FT /translation="aaadnsrpttvevstadfvmkdkphfffl ERYKDSYEEEILRFAEFTAIGTNQETPCTGNDGLQAGRIARAA QQSLAFGMPVSIEHTEKIAF"
55 EMBL AND SwissProt FORMAT XX SQ Sequence 2664 BP; 670 A; 518 C; 690 G; 786 T; 0 other; ctgcagcggc tgacaatagc aggccgacaa cggttgaggt gtcaacagct gattttgtga 60 tgaaggataa accgcatttc tttttccttg aacgctataa ggattcatat gaggaggaga 120 ttctccgttt tgcagaagcg atcggcacaa accaggagac tccctgcacc ggcaatgacg 180 gtttacaggc cgggaggatc gccagagcag cacagcaatc gcttgctttt ggcatgcctg 240 ttagcattga gcacactgaa aaaatcgctt tttaatctaa caggattaca attcagcaag 300 cttgggtata tactccattg atactttaag taggcggtgg agaaaatgaa tacagtacat 360 gctaaaggaa atgttttgaa caaaatcgga attccttctc acatggtttg gggttatatt 420 ggcgttgtca tctttatggt tggagacggc ctcgaacaag gctggctgtc tccttttctc 480 //
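The two-character line codes are what make this format easy to process. As a minimal sketch, the function below (a hypothetical helper, not part of any EMBL or SwissProt tool) groups line contents by code, skipping the XX spacer lines and stopping at the "//" terminator; the sample reuses a few lines of the entry shown above.

```python
from collections import defaultdict

def parse_embl_header(text):
    """Group line contents by their two-character line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code, content = line[:2], line[2:].strip()
        if code == "//":
            break  # end of entry
        if code == "XX" or not code.strip():
            continue  # spacer lines and sequence lines carry no line code
        fields[code].append(content)
    return dict(fields)

SAMPLE = """ID   BSBOFCGEN standard; genomic DNA; PRO; 2664 BP.
XX
AC   X93081;
XX
DE   B.subtilis bofc, orf1, csbx, and orf4 genes
XX
OS   Bacillus subtilis
OC   Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
//
"""

entry = parse_embl_header(SAMPLE)
accession = entry["AC"][0].rstrip(";")
print(accession, entry["OS"])
```

Extracting one kind of information, such as the accession or the organism, then reduces to looking up a single key.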
58 Sequence Retrieval System is a data warehouse developed by Lion and EBI It allows the connection of related information from many databases
59 SEQUENCE RETRIEVAL SYSTEM
- Developed at successive institutions: started by Thure Etzold at the EMBL; continued at the EBI (Cambridge), financed in part by EMBnet; then at Lion Biosciences.
- It is not a database. It is a data warehouse.
- SRS uses flat text file versions of many databases (EMBL, Swiss-Prot, MEDLINE, etc.), which are copied and indexed to allow the connection of related information from several databases.
- There are many mirrors installed. The most popular is probably the one at the EBI.
- SRS offers access to more than 700 databases.
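The idea of indexing flat-file entries so that related information can be connected across databases can be sketched in a few lines. This is only an illustration of the principle, not a description of how SRS is implemented: the dictionaries stand in for already parsed entries, and the cross-references reuse the SWISS-PROT DR lines of the EMBL entry shown earlier.

```python
def build_index(entries, key):
    """Map each value of `key` to its entry, as an index over a flat file would."""
    return {entry[key]: entry for entry in entries}

# Toy parsed entries; only two of the four cross-referenced proteins are loaded.
EMBL = [{"AC": "X93081", "DR": ["O05389", "O05390", "O05391", "O05392"]}]
SWISSPROT = [
    {"AC": "O05391", "ID": "BOFC_BACSU"},
    {"AC": "O05390", "ID": "CSBX_BACSU"},
]

embl_index = build_index(EMBL, "AC")
sp_index = build_index(SWISSPROT, "AC")

def linked_proteins(nucleotide_ac):
    """Follow cross-references from a nucleotide entry to indexed protein entries."""
    refs = embl_index[nucleotide_ac]["DR"]
    return [sp_index[ref]["ID"] for ref in refs if ref in sp_index]

print(linked_proteins("X93081"))
```

Cross-references that point to entries missing from the local copy are skipped, which is one reason a warehouse of this kind has to keep its copies of the source databases up to date.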
60 ACKNOWLEDGMENTS This presentation contains material from previous presentations by: - Manuel J. Gómez, Centro de Astrobiología, CSIC-INTA - Rodrigo López, European Bioinformatics Institute
Introduction to 'Omics and Bioinformatics Chris Overall Department of Bioinformatics and Genomics University of North Carolina Charlotte Acquire Store Analyze Visualize Bioinformatics makes many current
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools News About NCBI Site Map
More informationThis place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.
G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic
More informationGene-centered databases and Genome Browsers
COURSE OF BIOINFORMATICS a.a. 2015-2016 Gene-centered databases and Genome Browsers We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about
More informationWhat You NEED to Know
What You NEED to Know Major DNA Databases NCBI RefSeq EBI DDBJ Protein Structural Databases PDB SCOP CCDC Major Protein Sequence Databases UniprotKB Swissprot PIR TrEMBL Genpept Other Major Databases MIM
More informationGene-centered databases and Genome Browsers
COURSE OF BIOINFORMATICS a.a. 2016-2017 Gene-centered databases and Genome Browsers We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about
More informationAgenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence
Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase
More informationAnnotation. (Chapter 8)
Annotation (Chapter 8) Genome annotation Genome annotation is the process of attaching biological information to sequences: identify elements on the genome attach biological information to elements store
More informationDatabases in Bioinformatics. Molecular Databases. Molecular Databases. NCBI Databases. BINF 630: Bioinformatics Methods
Databases in Bioinformatics BINF 630: Bioinformatics Methods Iosif Vaisman Email: ivaisman@gmu.edu Molecular Databases Molecular Databases Nucleic acid sequences: GenBank, DNA Data Bank of Japan, EMBL
More informationIntroduction and Public Sequence Databases. BME 110/BIOL 181 CompBio Tools
Introduction and Public Sequence Databases BME 110/BIOL 181 CompBio Tools Todd Lowe March 29, 2011 Course Syllabus: Admin http://www.soe.ucsc.edu/classes/bme110/spring11 Reading: Chapters 1, 2 (pp.29-56),
More informationSequence Based Function Annotation
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Sequence Based Function Annotation 1. Given a sequence, how to predict its biological
More informationONLINE BIOINFORMATICS RESOURCES
Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 The larger picture.. Lower
More informationWeb-based Bioinformatics Applications in Proteomics
Web-based Bioinformatics Applications in Proteomics Chiquito Crasto ccrasto@genetics.uab.edu January 30, 2009 NCBI (National Center for Biotechnology Information) http://www.ncbi.nlm.nih.gov/ 1 Pubmed
More informationCompiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology
Bioinformatics Model Answers Compiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology Page 1 of 15 Previous years questions asked. 1. Describe the software used in bioinformatics 2. Name four
More informationab initio and Evidence-Based Gene Finding
ab initio and Evidence-Based Gene Finding A basic introduction to annotation Outline What is annotation? ab initio gene finding Genome databases on the web Basics of the UCSC browser Evidence-based gene
More informationAccess to Information from Molecular Biology and Genome Research
Future Needs for Research Infrastructures in Biomedical Sciences Access to Information from Molecular Biology and Genome Research DG Research: Brussels March 2005 User Community for this information is
More informationThis software/database/presentation is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part
This software/database/presentation is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the author's official duties as a United States Government
More informationGenetics and Bioinformatics
Genetics and Bioinformatics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be Lecture 1: Setting the pace 1 Bioinformatics what s
More informationGenome Resources. Genome Resources. Maj Gen (R) Suhaib Ahmed, HI (M)
Maj Gen (R) Suhaib Ahmed, I (M) The human genome comprises DNA sequences mostly contained in the nucleus. A small portion is also present in the mitochondria. The nuclear DNA is present in chromosomes.
More informationRegulation of eukaryotic transcription:
Promoter definition by mass genome annotation data: in silico primer extension EMBNET course Bioinformatics of transcriptional regulation Jan 28 2008 Christoph Schmid Regulation of eukaryotic transcription:
More informationNCBI & Other Genome Databases. BME 110/BIOL 181 CompBio Tools
NCBI & Other Genome Databases BME 110/BIOL 181 CompBio Tools Todd Lowe March 31, 2011 Admin Reading Dummies Ch 3 Assigned Review: "The impact of next-generation sequencing technology on genetics" by E.
More informationDatabases in genomics
Databases in genomics Search in biological databases: The most common task of molecular biologist researcher, to answer to the following ques7ons:! Are they new sequences deposited in biological databases
More informationSequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University
Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned
More informationWeb-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide.
Page 1 of 18 Web-based tools for Bioinformatics; A (free) introduction to (freely available) NCBI, MUSC and World-wide. When and Where---Wednesdays 1-2pm Room 438 Library Admin Building Beginning September
More informationGuided tour to Ensembl
Guided tour to Ensembl Introduction Introduction to the Ensembl project Walk-through of the browser Variations and Functional Genomics Comparative Genomics BioMart Ensembl Genome browser http://www.ensembl.org
More informationThe Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica
The Ensembl Database Dott.ssa Inga Prokopenko Corso di Genomica 1 www.ensembl.org Lecture 7.1 2 What is Ensembl? Public annotation of mammalian and other genomes Open source software Relational database
More informationBioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases
Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras Lecture - 5a Protein sequence databases In this lecture, we will mainly discuss on Protein Sequence
More informationProtein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)
Protein Sequence Analysis BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl) Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical
More informationEnsembl workshop. Thomas Randall, PhD bioinformatics.unc.edu. handouts, papers, datasets
Ensembl workshop Thomas Randall, PhD tarandal@email.unc.edu bioinformatics.unc.edu www.unc.edu/~tarandal/ensembl handouts, papers, datasets Ensembl is a joint project between EMBL - EBI and the Sanger
More informationData Retrieval from GenBank
Data Retrieval from GenBank Peter J. Myler Bioinformatics of Intracellular Pathogens JNU, Feb 7-0, 2009 http://www.ncbi.nlm.nih.gov (January, 2007) http://ncbi.nlm.nih.gov/sitemap/resourceguide.html Accessing
More informationA WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING
A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING D. Martucci a, F. Pinciroli a,b, M. Masseroli a a Dipartimento di Bioingegneria, Politecnico di Milano, Milano,
More informationMotivation From Protein to Gene
MOLECULAR BIOLOGY 2003-4 Topic B Recombinant DNA -principles and tools Construct a library - what for, how Major techniques +principles Bioinformatics - in brief Chapter 7 (MCB) 1 Motivation From Protein
More informationG4120: Introduction to Computational Biology
G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Lecture 3 February 13, 2003 Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics
More information11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11
Proteomics, functional genomics, and systems biology Biosciences 741: Genomics Fall, 2013 Week 11 1 Figure 6.1 The future of genomics Functional Genomics The field of functional genomics represents the
More informationIntroduction to Plant Genomics and Online Resources. Manish Raizada University of Guelph
Introduction to Plant Genomics and Online Resources Manish Raizada University of Guelph Genomics Glossary http://www.genomenewsnetwork.org/articles/06_00/sequence_primer.shtml Annotation Adding pertinent
More informationApplied Bioinformatics
Applied Bioinformatics Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Course overview What is bioinformatics Data driven science: the creation and advancement
More informationIntroduction to Bioinformatics. What are the goals of the course? Who is taking this course? Textbook. Web sites. Literature references
Introduction to Bioinformatics Who is taking this course? People with very diverse backgrounds in biology Some people with backgrounds in computer science and biostatistics Most people (will) have a favorite
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationMARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305
MARINE BIOINFORMATICS & NANOBIOTECHNOLOGY - PBBT305 UNIT-1 MARINE GENOMICS AND PROTEOMICS 1. Define genomics? 2. Scope and functional genomics? 3. What is Genetics? 4. Define functional genomics? 5. What
More informationGENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.
!! www.clutchprep.com CONCEPT: OVERVIEW OF GENOMICS Genomics is the study of genomes in their entirety Bioinformatics is the analysis of the information content of genomes - Genes, regulatory sequences,
More informationuser s guide Question 3
Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.
More informationEntrez Gene: gene-centered information at NCBI
D54 D58 Nucleic Acids Research, 2005, Vol. 33, Database issue doi:10.1093/nar/gki031 Entrez Gene: gene-centered information at NCBI Donna Maglott*, Jim Ostell, Kim D. Pruitt and Tatiana Tatusova National
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics If the 19 th century was the century of chemistry and 20 th century was the century of physic, the 21 st century promises to be the century of biology...professor Dr. Satoru
More informationLeonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015
Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck
More informationDina El-Khishin (Ph.D.) Bioinformatics Research Facility. Deputy Director of AGERI & Head of the Genomics, Proteomics &
Dina El-Khishin (Ph.D.) Deputy Director of AGERI & Head of the Genomics, Proteomics & Bioinformatics Research Facility Agricultural Genetic Engineering Research Institute (AGERI) Giza EGYPT Bioinformatics
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 7
More informationSequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases
Chapter 2 Paul Rangel Abstract DNA and Protein sequence databases are the cornerstone of bioinformatics research. DNA databases such as GenBank and EMBL accept genome data from sequencing projects around
More informationIntroduction to Bioinformatics for Medical Research. Gideon Greenspan TA: Oleg Rokhlenko. Lecture 1
Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il TA: Oleg Rokhlenko Lecture 1 Introduction to Bioinformatics Introduction to Bioinformatics What is Bioinformatics?
More informationBacterial Genome Annotation
Bacterial Genome Annotation Bacterial Genome Annotation For an annotation you want to predict from the sequence, all of... protein-coding genes their stop-start the resulting protein the function the control
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org kcoombes@mdanderson.org
More informationGene Finding Genome Annotation
Gene Finding Genome Annotation Gene finding is a cornerstone of genomic analysis Genome content and organization Differential expression analysis Epigenomics Population biology & evolution Medical genomics
More informationBig picture and history
Big picture and history (and Computational Biology) CS-5700 / BIO-5323 Outline 1 2 3 4 Outline 1 2 3 4 First to be databased were proteins The development of protein- s (Sanger and Tuppy 1951) led to the
More informationGlobal Biomolecular Information Infrastructure and Australia. Graham Cameron Director The EMBL Australia Bioinformatics Resource
Global Biomolecular Information Infrastructure and Australia Graham Cameron Director The EMBL Australia Bioinformatics Resource What is bioinformatics? Methods, data, IT to exploit biomolecular information
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Brad Broom Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org bmbroom@mdanderson.org 8
More informationINTRODUCTION TO BIOINFORMATICS. SAINTS GENETICS Ian Bosdet
INTRODUCTION TO BIOINFORMATICS SAINTS GENETICS 12-120522 - Ian Bosdet (ibosdet@bccancer.bc.ca) Bioinformatics bioinformatics is: the application of computational techniques to the fields of biology and
More information2. The dropdown box has a number of databases that are searchable. Select the gene option and search for dihydrofolate reductase.
Bioinformatics Introduction Worksheet The first part of this exercise is aimed at walking you through some of the key tools used by scientists to explore the relationship between genes and proteins throughout
More informationBIMM 143: Introduction to Bioinformatics (Winter 2018)
BIMM 143: Introduction to Bioinformatics (Winter 2018) Course Instructor: Dr. Barry J. Grant ( bjgrant@ucsd.edu ) Course Website: https://bioboot.github.io/bimm143_w18/ DRAFT: 2017-12-02 (20:48:10 PST
More informationA New Database of Genetic and. Molecular Pathways. Minoru Kanehisa. sequencing projects have been. Mbp) and for several bacteria including
Toward Pathway Engineering: A New Database of Genetic and Molecular Pathways Minoru Kanehisa Institute for Chemical Research, Kyoto University From Genome Sequences to Functions The Human Genome Project
More informationBiology 644: Bioinformatics
Processes Activation Repression Initiation Elongation.... Processes Splicing Editing Degradation Translation.... Transcription Translation DNA Regulators DNA-Binding Transcription Factors Chromatin Remodelers....
More informationEvolutionary Genetics. LV Lecture with exercises 6KP. Databases
Evolutionary Genetics LV 25600-01 Lecture with exercises 6KP Databases HS2018 Bioinformatics - R R Assignment The Minimalistic Approach!2 Bioinformatics - R Possible Exam Questions for R: Q1: The function
More informationDigital information cycle. Database. Database. BINF 630: Bioinformatics Methods
Digital information cycle BINF 630: Bioinformatics Methods Iosif Vaisman Email: ivaisman@gmu.edu Creation and capture Storage and management Rights management Search and access Distribution Electronic
More informationuser s guide Question 1
Question 1 How does one find a gene of interest and determine that gene s structure? Once the gene has been located on the map, how does one easily examine other genes in that same region? doi:10.1038/ng966
More information