Big picture and history

Big picture and history (and Computational Biology) CS-5700 / BIO-5323

Outline 1 2 3 4

First to be databased were proteins

The development of protein-sequencing methods (Sanger and Tuppy 1951) led to the sequencing of representatives of several of the more common protein families, such as cytochromes, from a variety of organisms. Margaret Dayhoff (1972, 1978) and her collaborators at the National Biomedical Research Foundation (NBRF), Washington, DC, were the first to assemble databases of these sequences into a protein sequence atlas in the 1960s, and their collection center eventually became known as the Protein Information Resource (PIR, formerly Protein Identification Resource). Dayhoff and her coworkers organized the proteins into families and superfamilies based on the degree of sequence similarity.

DNA sequence databases

DNA sequence databases were first assembled at Los Alamos National Laboratory (LANL), New Mexico, by Walter Goad and colleagues in the GenBank database and at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany. Initially, a sequence entry included a computer filename and DNA or protein sequence files. These entries were eventually expanded to include much more information about the sequence, such as function, mutations, encoded proteins, regulatory sites, and references. This information was then placed along with the sequence into a database format that could be readily searched for many types of information.

Sequence retrieval from public databases

An important step in providing sequence database access was the development of Web pages that allow queries to be made of the major sequence databases (GenBank, EMBL, etc.). An early example of this technology at NCBI was a menu-driven program called GEN-INFO developed by D. Benson, D. Lipman, and colleagues. This program searched rapidly through previously indexed sequence databases for entries that matched a biologist's query. Subsequently, a derivative program called ENTREZ with a simple window-based interface, and eventually a Web-based interface, was developed at NCBI. The idea behind these programs was to provide an easy-to-use interface with a flexible search procedure to the sequence databases.
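The idea of searching "previously indexed" entries can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: the entry IDs and annotation strings are made up, and `build_index` and `search` are hypothetical helper names, not part of GEN-INFO, ENTREZ, or any NCBI interface.

```python
from collections import defaultdict

# Made-up entries for illustration only (not real accession numbers).
ENTRIES = {
    "X0001": "cytochrome c mRNA from yeast",
    "X0002": "hemoglobin beta chain gene from human",
    "X0003": "cytochrome b gene from human mitochondrion",
}


def build_index(entries):
    """Map each lowercase annotation word to the set of entry IDs containing it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index


def search(index, query):
    """Return the sorted IDs of entries whose annotation contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the ID sets of all query words; unknown words give an empty set.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


INDEX = build_index(ENTRIES)
print(search(INDEX, "cytochrome human"))
```

Building the index once and answering each query by set intersection is what makes the lookup fast compared with rescanning every entry per query.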

Because DNA sequencing involves ordering a set of peaks (A, G, C, or T) on a sequencing gel, the process can be quite error-prone, depending on the quality of the data. As more DNA sequences became available in the late 1970s, interest also increased in developing computer programs to analyze these sequences in various ways. In 1982 and 1984, Nucleic Acids Research published two special issues devoted to the application of computers for sequence analysis, including programs for large mainframe computers down to the then-new microcomputers.

Method for comparing sequences

A.J. Gibbs and G.A. McIntyre (1970) described a new method for comparing two amino acid and nucleotide sequences in which a graph was drawn with one sequence written across the page and the other down the left-hand side. Whenever the same letter appeared in both sequences, a dot was placed at the intersection of the corresponding sequence positions on the graph.
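The dot-placing rule described above is simple enough to sketch in a few lines. The function names `dot_matrix` and `render` and the example sequences are illustrative assumptions. This minimal version marks every single-letter match and applies no window or threshold filtering.

```python
def dot_matrix(seq_across, seq_down):
    """Return one string per letter of seq_down: '*' where letters match, ' ' elsewhere."""
    return [
        "".join("*" if across == down else " " for across in seq_across)
        for down in seq_down
    ]


def render(seq_across, seq_down):
    """Lay out the plot with one sequence across the top and the other down the left."""
    rows = dot_matrix(seq_across, seq_down)
    lines = ["  " + seq_across]
    lines += [f"{down} {row}" for down, row in zip(seq_down, rows)]
    return "\n".join(lines)


print(render("GATTACA", "GATCA"))
```

Runs of matching letters show up as diagonal lines of dots, which is how regions of similarity are spotted by eye.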

Sequence alignment: global, local, and multiple

Various methods exist for aligning entire matching segments, small matching adjacent segments, and multiple variable-length segments.
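The slide does not name a specific algorithm. As one common illustration, the sketch below scores a global alignment with Needleman-Wunsch-style dynamic programming and a local alignment with the Smith-Waterman variant. The scoring values (match +1, mismatch -1, gap -1) and the function name `alignment_score` are assumptions chosen for the example. Multiple alignment is not covered.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of a and b under a simple linear gap penalty.

    local=False: global score (whole sequences, Needleman-Wunsch style).
    local=True:  local score (best matching segment, Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment starts free.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


print(alignment_score("GATTACA", "GATCA"))                    # global
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))    # local
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from.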

Prediction of RNA secondary structure

Methods for predicting RNA secondary structure on computers were also developed at an early time. For example, if the complement of a sequence on an RNA molecule is repeated down the sequence in the opposite chemical direction, the regions may base-pair and form a hairpin structure.
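The hairpin condition described above can be checked directly by looking for a segment whose reverse complement appears further along the molecule. This is a rough sketch, not a structure-prediction method: it assumes only Watson-Crick pairs (no G-U wobble), ignores energetics, and the names `find_hairpins`, `stem`, and `min_loop` are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately left out of this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Complement each base and reverse the order (the opposite chemical direction)."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_hairpins(rna, stem=4, min_loop=3):
    """Return (stem_start, partner_start) pairs of candidate hairpin stems.

    A candidate is a stem-length segment whose reverse complement occurs
    at least min_loop bases downstream of the segment's end.
    """
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        target = reverse_complement(rna[i:i + stem])
        j = rna.find(target, i + stem + min_loop)
        while j != -1:
            hits.append((i, j))
            j = rna.find(target, j + 1)
    return hits


print(find_hairpins("GGGCAAAAGCCC"))
```

In the example, `GGGC` at the start can pair with `GCCC` at the end, leaving `AAAA` as the loop.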

Prediction of protein and RNA structure

Many proteins have known sequences, but very few have solved structures. Solving protein structures involves the time-consuming and highly specialized procedures of X-ray crystallography and nuclear magnetic resonance (NMR). Consequently, there is much interest in trying to predict the structure of a protein, given its sequence. Early attempts were made at predicting protein structure from sequence.

Protein, DNA, and RNA sequences

Variations within a family of related nucleic acid or protein sequences provide an invaluable source of information for evolutionary biology, enabling the discovery of relationships between species in an objectively quantifiable manner.
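One simple way to make such variation quantifiable is the proportion of differing positions between two aligned sequences (often called the p-distance). The slide does not specify a measure, so this is an assumed illustration. The species labels and sequences are made up, and the sequences are assumed to be already aligned to equal length.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(aligned):
    """Pairwise p-distances for a dict of {label: aligned sequence}."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


# Made-up aligned sequences for three hypothetical species.
ALIGNED = {
    "sp_a": "ACGTACGT",
    "sp_b": "ACGTACGA",
    "sp_c": "TCGTTCGA",
}

for pair, distance in distance_table(ALIGNED).items():
    print(pair, distance)
```

A table like this is the usual starting point for distance-based tree building, where pairs with smaller distances are grouped together first.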

The first genome database

The first genome database was called ACEDB (a C. elegans database), and the methods to access this database were developed by Mike Cherry and colleagues (Cherry and Cartinhour 1993). This database was accessible through the internet and allowed retrieval of sequences, information about genes and mutants, investigator addresses, and references. Similar databases were subsequently developed using the same methods for A. thaliana and S. cerevisiae.

And then the field of bioinformatics exploded

From 1982 to the present, the number of bases in GenBank has doubled approximately every 18 months. As of 15 August 2017, GenBank release 221.0 has 203,180,606 loci and 240,343,378,258 bases, from 203,180,606 reported sequences.
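To get a feel for what "doubling every 18 months" implies, the growth factor after t years is 2^(t / 1.5). The sketch below only does that arithmetic. The function name `growth_factor` is an assumption, and a constant doubling time is an idealization of the slide's "approximately".

```python
def growth_factor(years, doubling_time_years=1.5):
    """Multiplicative growth after `years`, assuming a constant doubling time."""
    return 2 ** (years / doubling_time_years)


# 1982 to 2017 spans 35 years, i.e. a little over 23 doublings at 18 months each.
print(35 / 1.5)
print(f"{growth_factor(35):.3g}")
```

Under this idealized rule, 35 years of growth corresponds to an increase of roughly ten-million-fold.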

Nexus of many fields

Contrasted to data science

Same job, way worse pay...

Slightly more detail

Even more detail

A different perspective

AI s in

Bioinformatics and computational biology

https://en.wikipedia.org/wiki/computational_epidemiology
https://en.wikipedia.org/wiki/mathematical_modelling_of_infectious_disease
https://en.wikipedia.org/wiki/compartmental_models_in_epidemiology
https://en.wikipedia.org/wiki/computational_biology
https://en.wikipedia.org/wiki/
https://en.wikipedia.org/wiki/_assembly
https://en.wikipedia.org/wiki/_analysis
https://en.wikipedia.org/wiki/comparative_genomics
https://en.wikipedia.org/wiki/health_informatics
https://en.wikipedia.org/wiki/imaging_informatics
https://en.wikipedia.org/wiki/neuroinformatics
https://en.wikipedia.org/wiki/computational_neuroscience
https://en.wikipedia.org/wiki/modelling_biological_systems
https://en.wikipedia.org/wiki/computational_phylogenetics
https://en.wikipedia.org/wiki/computational_genomics
https://en.wikipedia.org/wiki/biodiversity_informatics
https://en.wikipedia.org/wiki/biological_network
https://en.wikipedia.org/wiki/structural_bioinformatics
https://en.wikipedia.org/wiki/ecosystem_model
https://en.wikipedia.org/wiki/models_of_dna_evolution
https://en.wikipedia.org/wiki/translational_bioinformatics
https://en.wikipedia.org/wiki/gene_
https://en.wikipedia.org/wiki/gene_prediction
https://en.wikipedia.org/wiki/bioimage_informatics
https://en.wikipedia.org/wiki/ prediction
https://en.wikipedia.org/wiki/computational_anatomy
https://en.wikipedia.org/wiki/cellular_model

Ontology

In computer science and information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really exist in a particular domain of discourse.

An upper ontology (or foundation ontology) is a model of the common objects that are generally applicable across a wide range of domain ontologies. It usually employs a core glossary that contains the terms and associated object descriptions as they are used in various relevant domain sets, for example, the Basic Formal Ontology (BFO).

Domain ontology: Open Biomedical Ontologies (abbreviated OBO; formerly Open Biological Ontologies) is an effort to create controlled vocabularies for shared use across different biological and medical domains. As of 2006, OBO forms part of the resources of the U.S. National Center for Biomedical Ontology, where it will form a central element of the NCBO's BioPortal.

The Sequence Ontology (SO) at www.sequenceontology.org/ is a collaborative project for the definition of sequence features used in biological sequence annotation. For example, an X element combinatorial repeat is a repeat region located between the X element and the telomere or adjacent Y' element.

The Gene Ontology (GO) is a controlled vocabulary that connects each gene to one or more functions. The ontology is intended to categorize gene products rather than the genes themselves. Different products of the same gene may play very different roles, and labelling and treating all of these functions under the same gene name may (and often does) lead to confusion.
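The point about annotating gene products rather than genes can be made concrete with a small lookup table. Everything below is illustrative: the gene and product names are made up, the term labels stand in for real GO terms without their identifiers, and `terms_by_product`, `terms_by_gene`, and `products_with_term` are hypothetical helpers, not part of any GO tool.

```python
from collections import defaultdict

# Hypothetical annotations: (gene, gene product, term label).
ANNOTATIONS = [
    ("geneX", "geneX-isoform1", "DNA binding"),
    ("geneX", "geneX-isoform2", "protein kinase activity"),
    ("geneY", "geneY-protein", "DNA binding"),
]


def terms_by_product(annotations):
    """Group term labels under each gene product (the intended granularity)."""
    table = defaultdict(set)
    for _gene, product, term in annotations:
        table[product].add(term)
    return dict(table)


def terms_by_gene(annotations):
    """Group term labels under the gene name, merging all of its products."""
    table = defaultdict(set)
    for gene, _product, term in annotations:
        table[gene].add(term)
    return dict(table)


def products_with_term(annotations, term):
    """Sorted gene products annotated with the given term label."""
    return sorted(product for _gene, product, t in annotations if t == term)


print(terms_by_product(ANNOTATIONS)["geneX-isoform2"])
print(sorted(terms_by_gene(ANNOTATIONS)["geneX"]))
print(products_with_term(ANNOTATIONS, "DNA binding"))
```

Grouping by gene makes `geneX` look as if a single entity both binds DNA and acts as a kinase, which is the kind of confusion that per-product annotation avoids.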

Databases

More to come later