Bioinformatics for Cell Biologists

Bioinformatics for Cell Biologists
15-19 March 2010
Developmental Biology and Regenerative Medicine (DBRM)

Schedule

Monday, March 15
  09.00-11.00  Introduction to course and Bioinformatics (L1), D224, Helena Storvall
  12.00-17.00  Core databases for bioinformatics (C1), Space, Ersen Kavak, Helena Storvall, Daniel Ramskold

Tuesday, March 16
  09.00-09.45  Alignments (L2), D224, Daniel Ramskold
  10.00-10.45  Phylogenetics (L3), D224, Prof. Bengt Persson, CMB and LIU
  11.00-11.45  Protein Sequence Bioinformatics (L4), D224, Prof. Bengt Persson, CMB and LIU
  13.00-17.00  Computer Exercise 1: Alignments, Genomes and Browsers (C2), Space, Rickard Sandberg

Wednesday, March 17
  09.00-12.00  Computer Exercise 2: Phylogenetics and Proteins (C3), Space, Rickard Sandberg
  13.30-15.00  Invited Speaker 1: Transcriptome and translational regulation, D224, Dr. Ola Larsson, McGill University
  15.30-17.00  Next generation sequencing bioinformatics (L5), D224, Rickard Sandberg

Thursday, March 18
  09.00-12.00  Computer Exercise 3: Tools for Next Gen Sequencing, Galaxy (C4), Space, Rickard Sandberg
  13.00-14.30  Bioinformatics of microRNA target predictions (L6), D224
  15.00-16.30  Statistical issues with genome-wide experiments (L7), D224, Yudi Pawitan, Dept of Medical Epidemiology and Biostatistics

Friday, March 19
  09.00-11.00  Project work
  12.00-15.00  Project presentations (20 min per group)
  15.00-16.00  Wrap up and course evaluation, Rickard Sandberg

Examination
  Project in groups of 2 (or 3). Form groups today!
  Apply bioinformatics resources to gather all the information you can about your gene of interest.
  Save all information in a wiki.
  Each group will present its project at the end of the course, and each group member is expected to take part in the presentation.
  Examination date: Friday, 19 March 2010

Getting to know you better
  Name
  Department
  Areas of research
  Bioinformatics resources you currently use
  Expectations of the course

Introduction to bioinformatics
Helena Storvall
Department of Cell and Molecular Biology, Karolinska Institutet, Stockholm, Sweden

Overview
  What is bioinformatics?
  Why is it important?
  Uses of bioinformatics
  Example problems
  Databases and tools
  Which databases solve the problem?
  Take home messages
  Goals of the course

What is bioinformatics?
Bioinformatics is the use of computer technology to manage, analyze and understand biological information:
  Storage and sharing (databases)
  Computations and statistics
  Visualization of data
  Simulations
  Comparisons of data

Why is it important?
Data is abundant:
  Genome assemblies
  Expression data
  Protein sequence and structure
Challenges:
  Storing the data
  Visualizing the data
  Translating it into knowledge!

Sequence data
The amount of sequencing data is increasing exponentially:
  1988: ~20 000 sequences
  1998: ~3 million sequences
  2008: ~99 million sequences
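To put the growth figures above in perspective, the following minimal sketch estimates how often the number of sequences doubled in each decade. It assumes steady exponential growth between the quoted years and uses only the three approximate counts from the slide; the function name is illustrative.

```python
import math

# Approximate sequence counts quoted on the slide.
counts = {1988: 20_000, 1998: 3_000_000, 2008: 99_000_000}

def doubling_time(year_a, year_b):
    """Years per doubling, assuming steady exponential growth between the two years."""
    ratio = counts[year_b] / counts[year_a]
    return (year_b - year_a) * math.log(2) / math.log(ratio)

for start, end in [(1988, 1998), (1998, 2008)]:
    print(f"{start}-{end}: doubling roughly every {doubling_time(start, end):.1f} years")
```

With these rounded numbers the doubling time comes out at well under two years for the first decade and about two years for the second, so the growth rate itself is not constant.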

Examples of uses
  De novo genome assembly: revolutionized by next-generation sequencing
  Transcriptomics: genome-wide expression measurements
  Alignments: structure and function prediction, ancestry
  Protein folding simulation: folding@home, Blue Gene
  QSAR: quantitative structure-activity relationship

Genome-wide mindset
  HeLa cells transfected with microRNA, expression measured by microarray
  Is downregulation due to direct interaction or a secondary effect?
  Simple approach: search for sequence complementarity to the miRNA
  Bioinformatics approach: search for enriched sequence motifs
  Lim et al. Nature 2005
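The "simple approach" above can be sketched in a few lines. This is an illustration only: it assumes that the relevant part of the miRNA is the "seed" at positions 2-8 (a common convention, not stated on the slide), and both sequences are invented toy data rather than real transcripts.

```python
# Map each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna, start=2, end=8):
    """mRNA sequence (5'->3') that pairs with miRNA positions start..end (1-based)."""
    seed = mirna[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement

def find_sites(mirna, utr):
    """0-based start positions of every seed-complementary site in the UTR."""
    site = seed_site(mirna)
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)
    return positions

toy_mirna = "UAGGCAUCGAUUCGGAUCCAUA"   # hypothetical miRNA
toy_utr = "CCAGAUGCCUUUGAAGAUGCCUAA"   # hypothetical 3' UTR
print(seed_site(toy_mirna), find_sites(toy_mirna, toy_utr))
```

The bioinformatics approach on the slide goes one step further: instead of checking one transcript, it asks which short motifs are over-represented across all downregulated transcripts.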

Scenarios

What kind of data is out there?
  Others: OMIM, PDB, Pfam, UniProt

Sequence databases are synchronized

Entrez
Cross-database search in NCBI resources. Results include:
  PubMed
  Entrez Gene
  RefSeq
  OMIM
  Protein sequence
  Protein structure

Entrez Gene
Focuses on genomes that have been completely sequenced, that have an active research community contributing gene-specific information, or that are scheduled for intense sequence analysis.
Content of Entrez Gene comes from:
  RefSeq
  Collaborating model organism databases
  Many other databases available from NCBI

Gene annotations
Annotation = descriptive summary
Gene annotations encompass:
  Genomic position, strand information
  Intron-exon boundaries
  Gene name
  Isoforms
Sources:
  RefSeq: manually curated
  Ensembl gene set and UCSC Known Genes: automatic annotations
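As a rough illustration of what such an annotation record holds, the sketch below stores a gene's position, strand and exons and derives the intron coordinates from the exon boundaries. The gene name, coordinates and field names are all hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
# A toy annotation record; every value here is invented for illustration.
annotation = {
    "name": "GENE1",
    "chrom": "chr1",
    "strand": "+",
    "exons": [(100, 200), (301, 400), (551, 700)],  # (start, end), 1-based inclusive
}

def introns_from_exons(exons):
    """Introns are the gaps between consecutive exons on the genome."""
    exons = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]

print(introns_from_exons(annotation["exons"]))
```

Different isoforms of the same gene would simply be additional records that share the name and position but carry a different exon list.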

RefSeq
  RefSeq is the set of NCBI curated reference sequences.
  It is manually curated and contains useful annotations.
  RefSeq entries are genomic, mRNA or protein sequences.
  All RefSeq sequences are assembled or taken from data deposited in GenBank.
  Not all sequences are in RefSeq.

Ensembl gene set and UCSC Known Genes
  UniProt, RefSeq
  Automatically annotated
  Contain predicted genes
  Contain more non-protein-coding genes

Gene Ontology
GO describes how gene products behave in a cellular context. Three organizing principles:
  Molecular function: describes activities, such as catalytic or binding activities, at the molecular level.
  Biological process: involvement in a multistep process, e.g. signal transduction, cell physiological process.
  Cellular component: what the gene product is localized to or is a subcomponent of, e.g. localized to the nucleus, subcomponent of the ribosome.

OMIM
OMIM = Online Mendelian Inheritance in Man
  Summaries of human gene function
  Manually curated
  Focused on the relationship between genotype and phenotype
  Originally focused on human disease, now encompasses all kinds of genes
  Good place to start searching for information about a gene

Alignment tools
Alignment = matching your sequence to known sequences
BLAST: Basic Local Alignment Search Tool
  Nucleotide, protein, translated nucleotides
  Maps against all known sequences
  Useful for tracing inheritance: maps to several organisms
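To make "local alignment" concrete, here is a minimal Smith-Waterman sketch. Note that this is the classic exact dynamic-programming method, not the faster heuristic that BLAST itself uses; the scoring values and the two toy sequences are arbitrary assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b) for a local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + score:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(smith_waterman("TTTACGTACGTTT", "GGACGTACGGG"))
```

The alignment is "local" because scores are never allowed to fall below zero, so only the best-matching region of each sequence is reported and the unrelated flanks are ignored.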

Alignment tools
BLAT: BLAST-Like Alignment Tool
  Faster than BLAST
  Simultaneous queries
  Maps only to a genome assembly
  Only one organism at a time
  Might miss divergent or short alignments
  Connected to the UCSC genome browser

Genome browsers
UCSC genome browser and Ensembl
Collect genomic information:
  Alignments to the genome
  Expression data
  Different isoforms
  Position on the genome
Provide a comprehensive visualization of this collection
Among the most important tools in bioinformatics!

UCSC genome browser

Protein annotations
UniProt
  Protein sequence and annotation data
  Merger of Swiss-Prot, TrEMBL and PIR
  Contains both reviewed and unreviewed proteins
Reviewed proteins = Swiss-Prot
  Manually curated
  Brings together experimental results, computed features and scientific conclusions
Unreviewed proteins = TrEMBL (Translated EMBL)
  Contains the translations of all coding sequences (CDS) in the EMBL Nucleotide Sequence Database that are not yet integrated into Swiss-Prot
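The idea behind TrEMBL, turning each annotated coding sequence into a protein sequence, can be sketched as follows. This is only an illustration of conceptual translation with the standard genetic code; it is not the pipeline UniProt actually runs, and the input sequence is a toy example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_cds(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate_cds("ATGGCCAAATGGTAA"))
```

Because such translations are produced automatically, they are unreviewed until a curator checks them, which is the distinction between TrEMBL and Swiss-Prot described above.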

UniProt

Pfam
Pfam = Protein Family database
  Collection of protein domain families
  Pfam-A: built from UniProt
  Pfam-B: unannotated, automatically generated
Pfam entries are classified in one of four ways:
  Family: a collection of related proteins
  Domain: a structural unit that can be found in multiple protein contexts
  Repeat: a short unit that is unstable in isolation but forms a stable structure when multiple copies are present
  Motif: a short unit found outside globular domains

Other protein tools
  PDB (Protein Data Bank): protein structures (NMR, X-ray crystallography)
  STRING: protein-protein interactions
  EMBOSS pepinfo: physico-chemical properties of a protein (hydrophobicity, polarity, charge)
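The kind of property that pepinfo reports can be approximated with a short script. The sketch below is not pepinfo's implementation: it assumes the Kyte-Doolittle hydropathy scale, an arbitrary window of 9 residues, and a very crude charge estimate that counts only Lys/Arg and Asp/Glu side chains. The peptide is a made-up example.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Average hydropathy in each sliding window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]

def net_charge(seq):
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (ignores His and termini)."""
    seq = seq.upper()
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")

toy_peptide = "MKTLLILAVVAAALAKDEERK"  # hypothetical sequence
print([round(x, 2) for x in hydropathy_profile(toy_peptide)])
print(net_charge(toy_peptide))
```

A stretch of consistently high window averages suggests a hydrophobic segment, for example a candidate membrane-spanning region.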

Pathways
  KEGG: biochemical pathways
  BioCarta: intracellular signaling pathways

Functional enrichment in gene sets
DAVID functional annotation tool
DAVID = Database for Annotation, Visualization and Integrated Discovery
  Screens both Gene Ontology and pathways
  Searches for enrichment of functional features
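What "enrichment" means statistically can be shown with a one-sided hypergeometric test. This is a minimal sketch of the general idea, not DAVID's own statistic, which may differ; the function name and the gene counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(at least `hits` annotated genes in a random sample of `sample` genes).

    population: genes in the background (e.g. all genes on the array)
    annotated:  background genes carrying the term (e.g. a GO term)
    sample:     genes in your list
    hits:       genes in your list carrying the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20 000 background genes, 200 with the term,
# a list of 100 genes of which 8 carry the term (1 expected by chance).
print(enrichment_p_value(20_000, 200, 100, 8))
```

When many terms are screened at once, as tools like DAVID do, the resulting p-values also need a multiple-testing correction before they are interpreted.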

Expression patterns
Antibody-based data:
  Human Protein Atlas
  MAMEP: mouse development
  Allen Brain Map: mouse and human brain
Sequencing and array data:
  Gene Expression Omnibus
  ArrayExpress

Scenario 1
  Entrez > OMIM, PubMed
  UCSC genome browser
  UCSC genome browser
  BLAST
  OMIM
  BLAST > OMIM

Scenario 2
  BLAT, UCSC genome browser
  BLAST
  EMBOSS pepinfo
  UniProt
  Pfam

Scenario 3
  STRING
  Gene Ontology
  KEGG
  BioCarta
  PDB

Take home messages
  Bioinformatics is needed to translate data into knowledge.
  A genome-wide approach gives a broader result.
  There are many tools and databases out there; this lecture covers only a selection.
  There are several ways to solve a problem, so find your own preference.

Goals of the course
After taking this course, you will:
  know about the most commonly used bioinformatics databases
  have a better understanding of how they work
  know how to find and use genomic data and genome-wide datasets, such as transcriptomes
  see how bioinformatics can be part of your own research projects