ONLINE BIOINFORMATICS RESOURCES

Similar documents
Protein Bioinformatics Part I: Access to information

Protein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)

Sequence Databases and database scanning

Web-based Bioinformatics Applications in Proteomics

Textbook Reading Guidelines

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Bioinformatics for Cell Biologists

Types of Databases - By Scope

NCBI web resources I: databases and Entrez

I nternet Resources for Bioinformatics Data and Tools

ELE4120 Bioinformatics. Tutorial 5

Klinisk kemisk diagnostik BIOINFORMATICS

Sequence Based Function Annotation

Protein structure. Wednesday, October 4, 2006

Biological databases an introduction

Protein Structure Databases, cont. 11/09/05

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Two Mark question and Answers

Web based Bioinformatics Applications in Proteomics. Genbank

Gene-centered resources at NCBI

Biological databases an introduction

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Since 2002 a merger and collaboration of three databases: Swiss-Prot & TrEMBL

BIMM 143: Introduction to Bioinformatics (Winter 2018)

FACULTY OF BIOCHEMISTRY AND MOLECULAR MEDICINE

Genome Informatics. Systems Biology and the Omics Cascade (Course 2143) Day 3, June 11 th, Kiyoko F. Aoki-Kinoshita

Textbook Reading Guidelines

BIOLOGY 200 Molecular Biology Students registered for the 9:30AM lecture should NOT attend the 4:30PM lecture.

Introduction to Bioinformatics

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

B I O I N F O R M A T I C S

Access to Information from Molecular Biology and Genome Research

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

MOL204 Exam Fall 2015

Introduction to EMBL-EBI.

Lecture 7 Motif Databases and Gene Finding

Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases

Dina El-Khishin (Ph.D.) Bioinformatics Research Facility. Deputy Director of AGERI & Head of the Genomics, Proteomics &

G4120: Introduction to Computational Biology

The University of California, Santa Cruz (UCSC) Genome Browser

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

GREG GIBSON SPENCER V. MUSE

RESEARCH METHODOLOGY, BIOSTATISTICS AND IPR

Introduction to Bioinformatics

Chapter 2: Access to Information

Array-Ready Oligo Set for the Rat Genome Version 3.0

Introduction to Bioinformatics

An insilico Approach: Homology Modelling and Characterization of HSP90 alpha Sangeeta Supehia

Introduction to BIOINFORMATICS

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

VALLIAMMAI ENGINEERING COLLEGE

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

G4120: Introduction to Computational Biology

Computational methods in bioinformatics: Lecture 1

!"#$!"#$%&'&()#*'+*,-*%#".+/&0+("%&1)#.+*%,+#')%)#.

CMSE 520 BIOMOLECULAR STRUCTURE, FUNCTION AND DYNAMICS

Will discuss proteins in view of Sequence (I,II) Structure (III) Function (IV) proteins in practice

Compiled by Mr. Nitin Swamy Asst. Prof. Department of Biotechnology

Functional analysis using EBI Metagenomics

Computational Biology and Bioinformatics

Protein Bioinformatics PH Final Exam

BGGN 213: Foundations of Bioinformatics (Fall 2017)

Function Prediction of Proteins from their Sequences with BAR 3.0

G4120: Introduction to Computational Biology

Bioinformatics. Biomolecular databases

Basic Bioinformatics: Homology, Sequence Alignment,

NiceProt View of Swiss-Prot: P18907

The Gene Ontology Annotation (GOA) project application of GO in SWISS-PROT, TrEMBL and InterPro

Community-assisted genome annotation: The Pseudomonas example. Geoff Winsor, Simon Fraser University Burnaby (greater Vancouver), Canada

Genome Resources. Genome Resources. Maj Gen (R) Suhaib Ahmed, HI (M)

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004

CFSSP: Chou and Fasman Secondary Structure Prediction server

Product Applications for the Sequence Analysis Collection

CHEM 436 / 630. Molecular modelling of proteins. Winter 2018 Term. Instructor: Guillaume Lamoureux Concordia University, Montréal, Canada

Browsing Genomes with Ensembl Genomes

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

EE550 Computational Biology

MATH 5610, Computational Biology

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

The RNA tools registry

How to use Variant Effects Report

Translating Biological Data Sets Into Linked Data

Bioinformatics for Proteomics. Ann Loraine

Sequence Analysis. Introduction to Bioinformatics BIMMS December 2015

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. Evidence Based Annotation. GEP goals: Evidence for Gene Models 08/22/2017

Metabolism and metabolic networks. Metabolism is the means by which cells acquire energy and building blocks for cellular material

Sequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018

Browsing Genes and Genomes with Ensembl

STRUCTURAL BIOLOGY. α/β structures Closed barrels Open twisted sheets Horseshoe folds

Introduction to Bioinformatics for Medical Research. Gideon Greenspan TA: Oleg Rokhlenko. Lecture 1

Databases in genomics

Introduc)on to Databases and Resources Biological Databases and Resources

The Ensembl Database. Dott.ssa Inga Prokopenko. Corso di Genomica

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools

BIO 152 Principles of Biology III: Molecules & Cells Acquiring information from NCBI (PubMed/Bookshelf/OMIM)

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Structure & Function. Ulf Leser

ab initio and Evidence-Based Gene Finding

Transcription:

Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014

The larger picture.. Lower cost, Improved efficiency and modern technology: The sequencers, computers, and rate at which data is being produced have all increased in speed - huge amounts of biological data produced at a FASTER rate than interpreted. Storage and Exchange of data: Need to improve database design, develop better software for database access / manipulation, harmonize data-entry procedures to compensate for the varied computer procedures and systems used in different laboratories Computing power, access to information (data) and right software (tools) has enabled scientists increase rate at answering research questions and new discoveries

Online Bioinformatics resources INFORMATION Journals / publications, biological research data. TOOLS Programs to analyse data

Information Journal Website: Almost every major journal has a web access to abstracts is usually free, even when the content is subscription. E-journals: Some electronic journals are online-only journals; some are online versions of printed journals, and some consist of the online equivalent of a printed journal, but with additional online-only (sometimes video and interactive media) material.

Information Servers (eg NCBI Pubmed; Google scholar; SCOPUS) A search engine to search references and abstracts on life sciences and biomedical topics in multiple databases

Sequence Databases Primary Databases: International Nucleotide Sequence Database Collaboration comprises of GenBank (USA), European Nucleotide Archive (Europe) and DNA DataBase of Japan. http://www.ncbi.nlm.nih.gov They cooperate to make all publicly available sequences available. http://www.ebi.ac.uk/ena/

Sequence Databases Secondary Databases UniProt: It is a database of protein sequence and functional information. Information about the biological function of proteins derived from the research literature Available under Uniprot SWISS PROT: Manually annotated and reviewed TrEMBL: Automatically annotated and not reviewed. Uniprot: www.uniprot.org

Other Databases Multigenome: ensembl: genome databases for vertebrates and other eukaryotic species, Genome specific databases: eg Wormbase; Saccharomyces database; Vectorbase; Flu DB; Zebrafish Model Organisms, Flybase etc. NB: You can also access to these genomes from the public databases Biochemical pathway databases: Biological activities are orchestrated by various molecules These include KEGG; ExPASY; MetaCyc; BioPath Gene expression Data: DNA Microarrays: array of probe molecules that can bind specific DNA / mrna. Fluorolabelling enables viewing level of expression of genes; eg NCBI Geo (Gene experssion omnibus) Expression Atlas (EBI) 2D PAGE: allows quantitative study of protein concentration in www.yeastgenome.org/ the cell. Eg SWISS-2D PAGE www.fudb.org www.zfin.org www.wormbase.org/ http://vectorbase.org/ http://www.metacyc.org/ KEGG: http://www.genome.jp http://web.expasy.org/pathways/ http://www.molecular-networks.com/databases/biopath http://www.ncbi.nlm.nih.gov/geo/ http://www.ebi.ac.uk/gxa http://world-2dpage.expasy.org/swiss-2dpage/

Protein Structure Databases Protein Databank (PDB) (experimentally validated) Secondary & Others: ModBase: A database of annotated comparative protein structure models Modelled proteins) SCOP: Structural classification of Proteins Depending on α ; β ; α+β ; membrane & cell surface proteins; small proteins; coiled coil proteins, etc CATH: hierarchical domain classification of protein structures in the Protein Data Bank (Class Architecture Topology Homologous superfamilies) www.pdb.org https://modbase.compbio.ucsf.edu/modweb/

Softwares / Tools Sources of (informative) tools: Journals eg Bioinformatics, Nucleic Acids Research, Journal of Molecular Biology, Protein science publish papers on cutting edge developments and innovations in computational biology methods Most biological databases have software resource listings- eg Sequence searching, visualisation resources (genome / alignment / genome level). Web servers: Simple web implementation of the softwares. Clear inputs, outputs, parameters, graphical data representation and downloadable results.

Basic tasks Task Objective Tool Sequence similarity Sequence alignment Gene finding DNA Translation Local pairwise alignment Global pairwise alignment Identify homologous sequences to gain information Identify conserved regions, domains, motifs Identify coding regions in gdna sequences Convert DNA Sequence to protein Detect short regions of homology in longer sequences Find the best full length alignment in 2 sequences BLAST; SSEARCH; FASTA, ENA search CLUSTAL; Muscle; GENSCAN; ORFinder; GRAIL Various servers and tools eg ExPasy, Transeq, EBI, and ORFinders BLAST; FASTA ALIGN http://web.expasy.org/translate/

http://www.ebi.ac.uk/tools

http://www.ebi.ac.uk/tools

Protein Domains A protein domain is a conserved part of a given protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. Databases include: SMART; PROSITE; NCBI; CATH GLYCINAMIDE RIBONUCLEOTIDE SYNTHETASE (GAR-SYN) FROM E. COLI. http://prosite.expasy.org/ http://smart.embl-heidelberg.de/

Motifs A motif is a locally, conserved region / short sequence pattern shared by set of sequence; (Multiple sequence analysis) Thus can be indicative of function / structural similarities. Can be displayed via Sequence logos, or as patterns of amino acids. Patterns of amino acids [PROSITE]: For example N-glycosylation site motif takes the form: N{P}[ST]{P} To mean: Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro

DSSP: Wolfgang Kabsch and Chris Sander EXPASY: seconadary structure prediction JPRED 3: from university of Dundee PDB: DSSP prediction

Honorable mention About PredictProtein PredictProtein integrates feature prediction for secondary structure, solvent accessibility, transmembrane helices, globular regions, coiled-coil regions, structural switch regions, B-values, disorder regions, intraresidue contacts, protein-protein and protein-dna binding sites, sub-cellular localization, domain boundaries, beta-barrels, cysteine bonds, metal binding sites and disulphide bridges. Listed below is a comprehensive list of methods and databases currently incorporated into the server.

Integrated Methods Method Name Description Reference Web Server Download PROFphd Prediction of secondary structure and solvent accessibility - - ftp://rostlab.org/profphd PHDhtm Prediction of transmembrane helices - - ftp://rostlab.org/profphd PROFtmb Prediction of transmembrane beta-barrels - - ftp://rostlab.org/proftmb NORSp Predictor of non-regular secondary structure - - ftp://rostlab.org/norsp NCOILS Calculates the probability that the sequence will adopt a coiled-coil conformation - - - SEG Identifies low complexity regions - - - DISULFIND Prediction of disulfide bridges - - ftp://rostlab.org/profbval PROFBval Prediction of residue mobility - - ftp://rostlab.org/profbval NorsNet Prediction protein disordered sites - - ftp://rostlab.org/norsnet UCON Contact based prediction of disordered sites - - METADISORDER Consensus based prediction of protein disorder - - ISIS Prediction of protein-protein interaction sites - - LocTree2 Prediction of sub-cellular localization for all domains of life - - - MetaStudent Prediction of GO terms for Molecular Function and Biological Process - - ftp://rostlab.org/metastudent SNAP2 Prediction of functional changes due to single nucleotide polymorphism - - - ConSurf Prediction of functional changes due to single nucleotide polymorphism - - - Supporting Databases Database Name Description Reference Download UniRef Used for sequence homology searches by PSI-BLAST abstract http://www.uniprot.org/help/uniref PDB The PDB archive contains information about experimentally-determined structures of proteins, nucleic acids, and complex assemblies. As a member of the wwpdb, the RCSB PDB curates and annotates PDB data according to agreed upon standards. abstract http://www.rcsb.org/pdb/download/download.do Pfam A collection of protein families abstract ftp://ftp.sanger.ac.uk/pub/databases/pfam/releases/ PROSITE Database of biologically significant sites, patterns and profiles abstract ftp://ftp.expasy.org/databases/prosite/

In a Nutshell There is vast amount of available out there. There is a vast number of tools out there.. Select best that is applicable for your project..

Thank you Dedan Githae Bioinformatician BecA-ILRI Hub d.githae@cgiar.org Online Bioinformatics resources http://hub.africabiosciences.org