Chapter 7: Similarity searches on sequence databases


"All science is either physics or stamp collecting." (Ernest Rutherford)

Outline
Why is similarity important
BLAST: protein and DNA
Interpreting BLAST
Individualizing BLAST
PSI-BLAST

Evolution
Example: evolution of the forelimbs of vertebrates.
Evolution has duplicated and shuffled bits and pieces of molecules to produce new linear arrangements that combine functions in novel ways. Regions of similarity often suggest an evolutionary tie and/or common functional properties between very different molecules.

Adaptive convergence
Shared morphology does NOT necessarily imply common ancestry. When similarity is due to common ancestry, we call it homology.

Common similarity problems
Start with a query sequence with unknown properties and search within a database of millions of sequences to find those that share similarity with the query.
Start with a small set of sequences and identify similarities and differences among them.
In many sequences, or in very long sequences, detect commonly occurring patterns.

Common similarity problems (rephrased)
One against many.
Common among several.
Common part of many.

How homology helps
Given molecular sequences X and Y: X ~ Y AND INFO(Y) => INFO(X) ("~" means "is similar to"). In words: if X is similar to Y and something is known about Y, the same can be proposed for X.
The question is therefore: are the sequences similar?

Why is similarity important
Similar sequences are often homologues: they derive from the same ancestor, tend to share the same structure, and have a similar biological function. This allows findings to be extrapolated from one sequence to another.
Similarity judgements should be based on:
The types of changes or mutations that occur within sequences.
The characteristics of those different types of mutations.
The frequency of those mutations.

Crude similarity thresholds
Proteins: 25% similarity.
Nucleic acids: 75% similarity.
Below 25%/75% lies the twilight zone, where everything is possible.
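The crude thresholds can be made concrete with a small sketch. It assumes that "similarity" is measured as percent identity over the aligned, non-gap columns of an existing pairwise alignment. That is only one possible reading of the slide, and the function names are illustrative.

```python
# Crude 25% / 75% thresholds from the slide; measuring them as percent
# identity over non-gap columns is an assumption made for this sketch.
CRUDE_THRESHOLDS = {"protein": 25.0, "nucleic": 75.0}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of non-gap alignment columns where both residues are equal."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def crude_call(identity: float, kind: str) -> str:
    """Classify an identity value against the crude threshold for its type."""
    if identity >= CRUDE_THRESHOLDS[kind]:
        return "likely similar"
    return "twilight zone"


if __name__ == "__main__":
    ident = percent_identity("ACGT-A", "ACGTTA")
    print(ident, crude_call(ident, "nucleic"))
```

A call in the twilight zone does not mean "unrelated"; it only means that this crude measure cannot decide, which is why the refined criteria are needed.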

Refined similarity thresholds
E-value (expectation value): the number of hits with at least this score that would be expected by chance.
Length of the segments that are similar between two sequences.
Patterns of amino acid conservation.
Number of indels.

BLAST
Basic Local Alignment Search Tool.
BLAST at NCBI and BLAST at EMBnet use different databases and yield slightly different results.
Standard BLAST uses a substitution matrix (e.g. PAM or BLOSUM) that rewards identical matches, gives positive points for similar amino acids, and penalizes dissimilar amino acids.

Different BLASTs
blastp: compares your protein with a protein database.
tblastn: compares your protein with a nucleotide database ("t" is for translated).

Protein vs. nucleotide database
There are six ways to translate DNA to protein: the direct and the reverse strand, with 3 reading frames each. tblastn runs all 6 possibilities.

BLASTing protein at NCBI
Input your sequence 5' to 3' (N to C).
You run a query sequence against target databases to get hits (matches).
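The six reading frames that a translated search has to consider can be sketched as follows. This is a minimal illustration using the standard genetic code, not the code BLAST itself runs; the function names are made up for this example, and codons containing unknown bases are written as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Translate both strands in all three reading frames."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

For the toy input `ATGGCCTAA`, frame +1 reads ATG GCC TAA, i.e. Met-Ala-stop; the other five frames give unrelated peptides, which is why all six have to be searched.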

blastp input by accession number
blastp input by sequence (FASTA format); CD (conserved domain) search deselected
Intermediate result
Waiting for results
Waiting for results: European server
If the page indicates that the search would take more than 10 minutes, then use another BLAST server: in the morning use the USA server, in the afternoon use the Japan server. Click just once.

BLAST output
Graphics: shows where your query is similar to other sequences.
Hit list: ranked names of similar sequences.
Alignments: one to one (query against each hit).
Parameters used for the search.

Graphics part
Pass the mouse over a bar to see more.

Hit list
Accession number and description.
Score (bits): must be >50 to be reliable.
E-value: the number of matches with at least this score expected by chance (given the database); must be <0.001 to be reliable.

Alignment
Alignments do NOT lie if you know how to look at them.
X marks masked residues (a low-complexity segment).
+ in the consensus line marks a similar, non-identical residue pair.
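The two reliability rules of thumb (bit score above 50, E-value below 0.001) are easy to apply automatically. The sketch below assumes the hits were saved in the 12-column tabular format of command-line BLAST+ (`-outfmt 6`), in which column 2 is the subject ID, column 3 the percent identity, column 11 the E-value and column 12 the bit score. The sample rows are invented purely to exercise the code.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> list:
    """Parse 12-column tab-separated BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), float(f[10]), float(f[11])))
    return hits


def reliable(hits, min_bits: float = 50.0, max_evalue: float = 1e-3) -> list:
    """Keep hits that pass both rules of thumb from the slide."""
    return [h for h in hits if h.bitscore > min_bits and h.evalue < max_evalue]


# Hypothetical rows, made up for illustration only.
SAMPLE = (
    "query1\tsubjA\t62.5\t200\t75\t0\t1\t200\t1\t200\t2e-40\t150\n"
    "query1\tsubjB\t28.0\t50\t36\t0\t10\t59\t5\t54\t0.5\t30.2\n"
)

if __name__ == "__main__":
    for hit in reliable(parse_tabular(SAMPLE)):
        print(hit.subject, hit.bitscore, hit.evalue)
```

Hits that fail the filter are not necessarily wrong; they are the ones whose alignments deserve a closer look by eye.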

Saving BLAST results
Reproducibility over time is low because the database, the BLAST program, and the default program parameters all change. Save what you see: convert to PDF, "Save as Complete Webpage", "Save Picture as".

Common mistake
"Friends of my friends are my friends." NOT necessarily. BLAST runs local alignments, so hits are NOT transitive unless the alignments overlap.
Sequence 1: AAAAATTTTTT
Sequence 2: AAAAA
Sequence 3: TTTTTT
Sequences 2 and 3 both match sequence 1, but they do not match each other.

BLASTing nucleic acid
BLASTs for DNA
blastn: DNA against DNA; for noncoding DNA.
tblastx: translated DNA against translated DNA; for protein discovery.
blastx: translated DNA against protein; for proteins encoded in your query DNA and for DNA sequence of unknown quality.

Using filters
Correct database (nucleotide/protein).
Organism database.
Repetitions.
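The three toy sequences make the non-transitivity of local hits easy to verify. The sketch below uses the longest common substring as a stand-in for a local match; this is a deliberate simplification of what BLAST does, chosen only to keep the example short.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of characters shared by a and b (exact match)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the match that ended at the previous pair of positions
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


SEQ1 = "AAAAATTTTTT"
SEQ2 = "AAAAA"
SEQ3 = "TTTTTT"

if __name__ == "__main__":
    print("1 vs 2:", longest_common_substring(SEQ1, SEQ2))
    print("1 vs 3:", longest_common_substring(SEQ1, SEQ3))
    print("2 vs 3:", repr(longest_common_substring(SEQ2, SEQ3)))
```

Sequence 1 shares a region with sequence 2 and a different region with sequence 3; because those two regions do not overlap, sequences 2 and 3 share nothing.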

Use of BLAST
Finding genes in a genome.
Predicting a protein function.
Predicting a protein 3-D structure.
Finding protein family members.

Finding genes in a genome
Quick and dirty BLAST way: cut your genome into 5 kb overlapping sequences and run blastx against the nonredundant (NR) protein database for every piece.
Proper way: run gene prediction software.

Predicting a protein function
Quick and dirty BLAST way: use blastp against Swiss-Prot. If identity is >25% over the whole protein length, you have a probable function for your protein.
Proper way: conduct domain analysis and wet-lab ("blue fingers") experiments.

Predicting a protein 3-D structure
Quick and dirty BLAST way: use blastp against PDB. If identity is >25% over the whole protein length, you know the probable structure of your protein.
Proper way: conduct homology modelling, X-ray, and NMR experiments.

Finding protein family members
Quick and dirty BLAST way: use blastp (or PSI-BLAST) against the nonredundant protein database. Make a multiple sequence alignment of all members of the family and draw a phylogenetic tree.
Proper way: clone new family members using PCR.

BLAST parameters
"Power is nothing without control."
Reasons for changing default parameters:
The sequence has a biased composition (use masking).
NO results (change the substitution matrix and gap penalties).
Too many results (change the NR database to Swiss-Prot, use an Entrez keyword query with Boolean operators, and lower the E-value threshold).
Testing the robustness of findings.
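The "quick and dirty" gene-finding recipe starts by cutting the genome into overlapping 5 kb pieces. A minimal sketch of that step follows; the slide does not say how large the overlap should be, so the 1 kb default used here is an assumption.

```python
def overlapping_chunks(seq: str, size: int = 5000, overlap: int = 1000) -> list:
    """Split seq into (start, piece) windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break  # this window already reaches the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    genome = "ACGT" * 3000  # 12 kb toy sequence
    for start, piece in overlapping_chunks(genome):
        print(start, len(piece))
```

Keeping the start offset with each piece makes it possible to map a blastx hit on a piece back to genome coordinates. The overlap exists so that a gene spanning a cut point is still seen whole in at least one piece, provided it is shorter than the overlap.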

BLAST protein masking
Low-complexity regions (many prolines, many glutamic acids) produce false matches.
Masking is done by replacement with X.
Use InterPro, CD search, or Pfscan to find and mask common domains (e.g. the Zn finger domain and the fibronectin domain).

BLAST DNA masking
About 60% of human DNA consists of repeats.
Large-scale genome sequencing introduces errors, such as remnants of vectors in the human database.

PSI-BLAST
Position-Specific Iterated BLAST, for distantly related sequences.
1st iteration: finds relatives by blastp with the BLOSUM62 matrix.
2nd iteration: uses the results of the 1st run to generate a new, position-specific substitution matrix (the same amino acid is scored differently at different positions) and looks for more relatives.
3rd ...
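The idea of a matrix in which the same amino acid is scored differently at different positions can be sketched with a toy position-specific scoring matrix. This is a heavily simplified illustration: it assumes an ungapped alignment of the first-round hits, a uniform background frequency, and a flat pseudocount, whereas PSI-BLAST itself uses a more elaborate weighting scheme. All names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list, pseudocount: float = 1.0) -> list:
    """Return one {amino acid: log2-odds score} dict per alignment column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: list, segment: str) -> float:
    """Sum the column-specific scores of an ungapped segment."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, segment))


if __name__ == "__main__":
    hits = ["ACD", "ACE", "ACD", "GCD"]  # toy alignment of first-round hits
    pssm = build_pssm(hits)
    print(round(pssm[0]["A"], 2), round(pssm[1]["A"], 2))
    print(round(score(pssm, "ACD"), 2), round(score(pssm, "WWW"), 2))
```

In the toy alignment, alanine scores positively in column 1, where it is conserved, and negatively in column 2, where it never occurs. A single matrix such as BLOSUM62 cannot express this.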

PSI-BLASTing protein
PSI-BLASTing format

PSI-BLAST output
Check box: the hit will be used for the next iteration; the selection can be edited.
Green dot: the hit was used in previous iterations.
"new": the hit is reported for the first time.

Avoiding mistakes with PSI-BLAST
When we look for hemoglobin and alcohol dehydrogenase appears among the hits after the 2nd iteration, it is time to stop.
Read the annotation to distinguish between an interesting finding and a false finding.
Check domains with InterPro, the CD server, or Pfscan, and cut proteins into pieces of about 200 aa with one domain each.

BLAST alternatives
Smith and Waterman (ssearch): the slowest, more accurate.
FASTA: slower, good for DNA (the name originally meant "fast all").
BLAT: for locating cDNA in a genome; keeps an index of the entire genome in memory. The index consists of all non-overlapping 11-mers except for those heavily involved in repeats.
FLASH: Fast Look-up Algorithm for String Homology.
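Among the alternatives, the Smith and Waterman method is slowest because it fills a full dynamic-programming table for every pair of sequences. A minimal score-only sketch is given below; the match, mismatch and linear gap values are arbitrary assumptions for illustration, whereas real searches use a substitution matrix and separate gap-opening and gap-extension penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b under a simple scoring scheme."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # a local alignment may restart anywhere, hence the floor at 0
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("AAAAATTTTTT", "TTTTTT"))
    print(smith_waterman_score("AAAAA", "TTTTTT"))
```

The work grows with the product of the two sequence lengths, which is why exhaustive dynamic programming against a whole database is much slower than a heuristic such as BLAST.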

Task 1
We compared 4 homologs of papain by structural comparison: kiwi actinidin, human procathepsins L and B, and Staphylococcus aureus staphopain. Run papaya papain through BLAST and PSI-BLAST. Which of the 4 homologs mentioned above are hit by BLAST, and which by PSI-BLAST?

Task 2
How many cytokinin dehydrogenase sequences are in the databases?


2/23/16. Protein-Protein Interactions. Protein Interactions. Protein-Protein Interactions: The Interactome

2/23/16. Protein-Protein Interactions. Protein Interactions. Protein-Protein Interactions: The Interactome Protein-Protein Interactions Protein Interactions A Protein may interact with: Other proteins Nucleic Acids Small molecules Protein-Protein Interactions: The Interactome Experimental methods: Mass Spec,

More information

Bioinformatics, in general, deals with the following important biological data:

Bioinformatics, in general, deals with the following important biological data: Pocket K No. 23 Bioinformatics for Plant Biotechnology Introduction As of July 30, 2006, scientists around the world are pursuing a total of 2,126 genome projects. There are 405 published complete genomes,

More information

Glossary of Commonly used Annotation Terms

Glossary of Commonly used Annotation Terms Glossary of Commonly used Annotation Terms Akela a general use server for the annotation group as well as other groups throughout TIGR. Annotation Notebook a link from the gene list page that is associated

More information

Getting To Know Your Protein

Getting To Know Your Protein Getting To Know Your Protein Comparative Protein Analysis: Part II. Protein Domain Identification & Classification Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research

More information

Protein Bioinformatics PH Final Exam

Protein Bioinformatics PH Final Exam Name (please print) Protein Bioinformatics PH260.655 Final Exam => take-home questions => open-note => please use either * the Word-file to type your answers or * print out the PDF and hand-write your

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

Integration of data management and analysis for genome research

Integration of data management and analysis for genome research Integration of data management and analysis for genome research Volker Brendel Deparment of Zoology & Genetics and Department of Statistics Iowa State University 2112 Molecular Biology Building Ames, Iowa

More information

Sequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases

Sequence Databases. Chapter 2. caister.com/bioinformaticsbooks. Paul Rangel. Sequence Databases Chapter 2 Paul Rangel Abstract DNA and Protein sequence databases are the cornerstone of bioinformatics research. DNA databases such as GenBank and EMBL accept genome data from sequencing projects around

More information

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming Lecture 2: Central Dogma of Molecular Biology & Intro to Programming Central Dogma of Molecular Biology Proteins: workhorse molecules of biological systems Proteins are synthesized from the genetic blueprints

More information

Assigning Sequences to Taxa CMSC828G

Assigning Sequences to Taxa CMSC828G Assigning Sequences to Taxa CMSC828G Outline Objective (1 slide) MEGAN (17 slides) SAP (33 slides) Conclusion (1 slide) Objective Given an unknown, environmental DNA sequence: Make a taxonomic assignment

More information

JPred and Jnet: Protein Secondary Structure Prediction.

JPred and Jnet: Protein Secondary Structure Prediction. JPred and Jnet: Protein Secondary Structure Prediction www.compbio.dundee.ac.uk/jpred ...A I L E G D Y A S H M K... FUNCTION? Protein Sequence a-helix b-strand Secondary Structure Fold What is the difference

More information

Bio11 Announcements. Ch 21: DNA Biology and Technology. DNA Functions. DNA and RNA Structure. How do DNA and RNA differ? What are genes?

Bio11 Announcements. Ch 21: DNA Biology and Technology. DNA Functions. DNA and RNA Structure. How do DNA and RNA differ? What are genes? Bio11 Announcements TODAY Genetics (review) and quiz (CP #4) Structure and function of DNA Extra credit due today Next week in lab: Case study presentations Following week: Lab Quiz 2 Ch 21: DNA Biology

More information

Gap Filling for a Human MHC Haplotype Sequence

Gap Filling for a Human MHC Haplotype Sequence American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human

More information

Tools and Opportunities to Enhance Risk Analysis. Nathan J. Hillson

Tools and Opportunities to Enhance Risk Analysis. Nathan J. Hillson Tools and Opportunities to Enhance Risk Analysis Nathan J. Hillson njhillson@lbl.gov Future Biotechnology Products and Opportunities to Enhance the Capabilities of the Biotechnology Regulatory System National

More information

Analysis of large deletions in human-chimp genomic alignments. Erika Kvikstad BioInformatics I December 14, 2004

Analysis of large deletions in human-chimp genomic alignments. Erika Kvikstad BioInformatics I December 14, 2004 Analysis of large deletions in human-chimp genomic alignments Erika Kvikstad BioInformatics I December 14, 2004 Outline Mutations, mutations, mutations Project overview Strategy: finding, classifying indels

More information

Genetics Lecture 21 Recombinant DNA

Genetics Lecture 21 Recombinant DNA Genetics Lecture 21 Recombinant DNA Recombinant DNA In 1971, a paper published by Kathleen Danna and Daniel Nathans marked the beginning of the recombinant DNA era. The paper described the isolation of

More information

LAB. WALRUSES AND WHALES AND SEALS, OH MY!

LAB. WALRUSES AND WHALES AND SEALS, OH MY! Name Period Date LAB. WALRUSES AND WHALES AND SEALS, OH MY! Walruses and whales are both marine mammals. So are dolphins, seals, and manatee. They all have streamlined bodies, legs reduced to flippers,

More information

Molecular Cell Biology - Problem Drill 11: Recombinant DNA

Molecular Cell Biology - Problem Drill 11: Recombinant DNA Molecular Cell Biology - Problem Drill 11: Recombinant DNA Question No. 1 of 10 1. Which of the following statements about the sources of DNA used for molecular cloning is correct? Question #1 (A) cdna

More information

Lecture Four. Molecular Approaches I: Nucleic Acids

Lecture Four. Molecular Approaches I: Nucleic Acids Lecture Four. Molecular Approaches I: Nucleic Acids I. Recombinant DNA and Gene Cloning Recombinant DNA is DNA that has been created artificially. DNA from two or more sources is incorporated into a single

More information

Computational aspects of ncrna research. Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics

Computational aspects of ncrna research. Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics Computational aspects of ncrna research Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics Computational aspects on ncrna Bacterial ncrnas research Gene discovery Target discovery Discovery

More information

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,

More information

user s guide Question 3

user s guide Question 3 Question 3 During a positional cloning project aimed at finding a human disease gene, linkage data have been obtained suggesting that the gene of interest lies between two sequence-tagged site markers.

More information

Read Mapping and Variant Calling. Johannes Starlinger

Read Mapping and Variant Calling. Johannes Starlinger Read Mapping and Variant Calling Johannes Starlinger Application Scenario: Personalized Cancer Therapy Different mutations require different therapy Collins, Meredith A., and Marina Pasca di Magliano.

More information

Protein Synthesis. Lab Exercise 12. Introduction. Contents. Objectives

Protein Synthesis. Lab Exercise 12. Introduction. Contents. Objectives Lab Exercise Protein Synthesis Contents Objectives 1 Introduction 1 Activity.1 Overview of Process 2 Activity.2 Transcription 2 Activity.3 Translation 3 Resutls Section 4 Introduction Having information

More information

Homology Modeling of Mouse orphan G-protein coupled receptors Q99MX9 and G2A

Homology Modeling of Mouse orphan G-protein coupled receptors Q99MX9 and G2A Quality in Primary Care (2016) 24 (2): 49-57 2016 Insight Medical Publishing Group Research Article Homology Modeling of Mouse orphan G-protein Research Article Open Access coupled receptors Q99MX9 and

More information

Protein 3D Structure Prediction

Protein 3D Structure Prediction Protein 3D Structure Prediction Michael Tress CNIO ?? MREYKLVVLGSGGVGKSALTVQFVQGIFVDE YDPTIEDSYRKQVEVDCQQCMLEILDTAGTE QFTAMRDLYMKNGQGFALVYSITAQSTFNDL QDLREQILRVKDTEDVPMILVGNKCDLEDER VVGKEQGQNLARQWCNCAFLESSAKSKINVN

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

Changing Mutation Operator of Genetic Algorithms for optimizing Multiple Sequence Alignment

Changing Mutation Operator of Genetic Algorithms for optimizing Multiple Sequence Alignment International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 11 (2013), pp. 1155-1160 International Research Publications House http://www. irphouse.com /ijict.htm Changing

More information

BIOINFORMATICS AND FUNCTIONAL GENOMICS

BIOINFORMATICS AND FUNCTIONAL GENOMICS BIOINFORMATICS AND FUNCTIONAL GENOMICS third edition Jonathan Pevsner Bioinformatics and Functional Genomics Bioinformatics and Functional Genomics Third Edition Jonathan Pevsner Department of Neurology,

More information

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417. SENIOR BIOLOGY Blueprint of life and Genetics: the Code Broken? NAME SCHOOL / ORGANISATION DATE Bay 12, 1417 Bay number Specimen number INTRODUCTORY NOTES Blueprint of Life In this part of the workshop

More information

Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm

Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm Prateek Kumar 1, Steven Henikoff 2,3 & Pauline C Ng 1,3 1 Department of Genomic Medicine, J. Craig

More information