Introduction to Bioinformatics


Changhui (Charles) Yan
Old Main 401 F
http://www.cs.usu.edu/~cyan

How Old Is the Discipline?
"The term bioinformatics is a relatively recent invention, not appearing in the literature until 1991 ... However, [researchers] had been building databases, developing algorithms and making biological discoveries by sequence analysis since the 1960s, long before anyone thought to label this activity with a special term ... so bioinformatics has, in fact, been in existence for more than 30 years."
(Mark S. Boguski, Trends Guide to Bioinformatics, Elsevier Trends Supplement, 1998, p. 1)

What Is Bioinformatics?
- Any use of computers to handle biological information
- The use of computers to characterize biological molecules or to simulate the dynamics of molecules
- The use of computers to store, compare, retrieve, or analyze biological information
- Related fields: Computational Biology, Proteomics, Genomics, Medical Informatics

Bioinformatic Problems

Central Dogma

Genome

Bioinformatic Problems: Genome Sequencing

Human Genome Project (HGP): Goals
- To determine the sequence of the 3 billion bases that make up human DNA
- To identify the approximately 100,000 genes in human DNA (the estimate was revised to 20,000-25,000 by October 2004)
- To store this information in databases
- To develop tools for data analysis

Human Genome Project (HGP): Timeline
- The HGP began in October 1990 and was completed in 2003
- 99% of the human DNA sequence finished to 99.99% accuracy (April 2003)
- 15,000 full-length human genes identified (March 2003)
- Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster (April 2003)
- Post-genome era

Completely Sequenced Genomes

Genome Projects
- More than 60 eukaryotic genome sequencing projects are underway

Genome Sequencing

Difficulties in Genome Sequencing
- Repeats
- Uncertainty
- Missing data
- Huge size

Gene Finding
Genome Sequencing, Gene Finding
- 15,000 human genes identified
- Estimated number of human genes: 100,000 (1990); 20,000-25,000 (Oct 2004)
- 3 billion bases make up human DNA

Gene-finders

Sequence Alignment
Genome, Gene Finding, Sequence Alignment

Longest Common Subsequences
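
The longest common subsequence (LCS) problem named on this slide is a simple model of sequence comparison: find the longest string of characters that appears, in order but not necessarily contiguously, in both sequences. A minimal sketch in Python, assuming the standard dynamic-programming formulation; the function name lcs and the example strings are illustrative and do not come from the slides.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AGGTAB", "GXTXAYB"))  # GTAB
```

The table takes time and memory proportional to the product of the two sequence lengths, which is one reason genome-scale comparison relies on heuristics rather than the full table.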

Sequence Alignment
- Pair-wise alignment
- Multiple sequence alignment
- Searching databases: http://www.ncbi.nlm.nih.gov/blast/

Sequence Alignment: Global vs. Local
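
The difference between global and local alignment can be made concrete with a score-only sketch. Global alignment (the Needleman-Wunsch recurrence) scores both sequences end to end, while local alignment (the Smith-Waterman recurrence) resets negative scores to zero and reports the best-scoring pair of substrings. The function name alignment_score and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for illustration, not values from the slides.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Score of the best global (default) or local alignment of a and b."""
    n, m = len(a), len(b)
    # First row: global alignment pays for leading gaps, local does not.
    prev = [0 if local else j * gap for j in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        curr = [0 if local else i * gap]
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Global score sits in the last cell; local score is the table maximum.
    return best if local else prev[m]


x, y = "TTTTACGTTTTT", "GGACGTGG"
print(alignment_score(x, y))              # global: penalised for the flanks
print(alignment_score(x, y, local=True))  # local: 4, from the shared ACGT
```

Here the two sequences share only the short region ACGT, so the local score is positive while the global score is negative. This is why database searches for shared domains use local alignment.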

Gene Expression
Genome Sequencing, Gene Finding, Sequence Alignment, Gene Expression

Protein Folding
Genome Sequencing, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure

Protein Structure
- Visualization of protein structure
- Protein structure alignment
- Protein structure prediction

Protein Structure Prediction
- Comparative modeling: used when the sequence is similar to another one whose structure is known.
- Fold recognition: in the absence of a significantly similar sequence with known structure, these methods try to determine how well a known structure fits the sequence to be modeled.
- Ab initio prediction: can predict structures that have not yet been discovered, using Monte Carlo search for the lowest energy.

Protein Function Prediction
Genome Sequencing, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure, Protein Function

Protein Function Prediction
- The "similar sequence, similar structure, similar function" paradigm
- Identification of homologous sequences (BLAST, PSI-BLAST) when sequence identity is above 30%
- Identification of conserved functional sites when sequence identity is 30% or below
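
The 30% threshold on this slide presupposes a way to measure identity between two aligned sequences. A minimal sketch, assuming identity is counted over alignment columns in which neither sequence has a gap; other denominators are also in use, so this is one convention and not a definitive one. The function names percent_identity and suggest_strategy, and the example alignment, are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns where the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue ('-' marks a gap).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def suggest_strategy(identity: float, threshold: float = 30.0) -> str:
    """Apply the slide's rule of thumb: above 30% identity, rely on homology."""
    if identity > threshold:
        return "homology search"
    return "conserved-site (motif) search"


pid = percent_identity("ACDE-FG", "ACDQWFG")
print(round(pid, 1), suggest_strategy(pid))
```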

Conserved Functional Sites -- Motifs
[AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G
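
A motif written in PROSITE pattern syntax can be scanned against a protein sequence by translating it into a regular expression. In that syntax, x stands for any residue, square brackets list the allowed residues, curly braces list excluded residues, and a number in parentheses gives a repeat count or range. The sketch below assumes the pattern on this slide reads [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G once the duplicated fragments are dropped. The function name prosite_to_regex and the test sequence are hypothetical, and the anchors < and > are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    element_re = re.compile(
        r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+(?:,\d+)?)\))?")
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        if residue == "x":
            residue = "."                         # x matches any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"  # {..} excludes residues
        # A count such as (6) or (0,1) becomes {6} or {0,1}.
        parts.append(residue + ("{" + count + "}" if count else ""))
    return "".join(parts)


motif = prosite_to_regex("[AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G")
print(motif)

# Made-up sequence built to contain the motif; not a real protein.
sequence = "MKAGPKNKTAAAAAASAAAAAAAAAG"
hit = re.search(motif, sequence)
print(hit.start() if hit else "no match")
```

A regular expression reports only exact matches to the pattern. Profile- and HMM-based databases such as Pfam instead score how well a sequence fits a family.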

Motifs

Conserved Functional Sites -- Motifs
- Single motif
- PROSITE: a database of biologically significant sites

Conserved Functional Sites -- Motifs
- Multiple motifs
- PRINTS: a database of protein fingerprints. A fingerprint is a group of conserved motifs characterizing a protein function.

PRINTS
>ATHA_PIG

Conserved Functional Sites -- Motifs
- Hidden Markov Model: Pfam

Protein Interaction Network
Genome, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure, Protein Function, Protein Interaction Network

Bioinformatic Problems
Genome, Gene Finding, Sequence Alignment, Gene Expression, Protein Structure, Protein Function, Protein Interaction Network

Bioinformatic Problems
There are more:
- Phylogeny analysis: tree of life
- Database and tool development

Bioinformatic Databases
- GenBank (DNA sequences)
- Protein Data Bank (protein structures)
- PIR (protein sequences)
- Nucleic Acids Research (2005) lists 719 databases

Bioinformatic Programs
- Sequence analysis: BLAST, ClustalX, EMBOSS, GCG
- Molecular imaging/modeling: PyMol, MOLMOL, RasMol