From Sequence to Knowledge: The Art & Science of Phage Genome Annotation. Ramy K. Aziz Cairo University

1 From Sequence to Knowledge: The Art & Science of Phage Genome Annotation Ramy K. Aziz Cairo University

2 A helping hand through the annotation bottleneck. From Sequence to Knowledge: PhAnToMe, RAST, and the Ultimate Kropinski Toolkit. Compiled by: Andrew Kropinski and Ramy Aziz

3 Data & links: Online material. Slides. Old tutorials (more detailed, but missing latest ): Evergreen 2011 (Karin); Evergreen 2013; Evergreen 2015.

4 INTRODUCTION

5 The analysis bottleneck. Observation: we generate sequence data faster than we can analyze them. Opinion: bottlenecks are not created equal! It is important to define the question(s) before working on the answer(s)!

6 The analysis bottleneck The Lavigne paradox

8 Quick group activity. Defining the question(s):
How many of you have annotated a genome?
How many phage genomes have you sequenced (or are in the process of sequencing)? a) None b) 1-5 c) 5-50 d) > 50
What is the single most pressing question you want to answer from genome analysis?

9 DEFINING THE QUESTION(S): Begin with the end in mind (Covey, The 7 Habits)

10 What You Want. The goal: a complete, accurate genome. Pitfalls: frameshifts; incomplete genome termini; faulty assembly (chimeric fragments).

11 A process of reconstruction

12 Annotation → Reconstruction, from a genome or from a metagenome. Goal: complete, accurate. Pitfalls: frameshift, incomplete, faulty assembly. Credits: Andrew Kropinski, Bas Dutilh.

13 Annotation → Reconstruction (same content as slide 12). 21 July 2016, Phage Genomics - VoM 2016

14 A process of reconstruction Experimentally DNA TGATTGTGTGTTTGCGCAATGCG TGATTGGTCTNNNTCTCTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG

15 A process of reconstruction Experimentally DNA TGATTGTGTGTTTGCGCAATGCG TGATTGGTCTNNNTCTCTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG Computationally TGATTGTGTGTTTGCGCAATGCG TGATTGGTCTNNNTCTCTTGCGCAATGCG ATGTGTATATATAGTGAGCTTGCCC GTCTCTCTNNNTCTCTTG

16 From Sequence to Knowledge: from raw sequence data to genome submission/publication. Steps shown: assembly; tRNA calling; validation (segmenter); orienting; fixing frameshifts; gene finding/ORF calling; annotation (assigning functions); introns and inteins; subsystem assignment; refinement/secondary annotation loop; special purpose (toxins, morons, integrases, lifestyle prediction); regulatory elements (promoters, terminators); output (files and graphics).

17 Countless tools

18 Authority figures Andrew Kropinski Rob Lavigne Rob Edwards

19 General outline. Part I: The Kropinski toolkit: tools approved and recommended by Andrew Kropinski, from seq to pub. Part II: SEED-based tools: the RAST family; the PhAnToMe database/portal.

20 The Kropinski Toolkit

21 What we want, according to Andrew: a well-characterized genome in which, ideally, we know: the location & function of all the genes; the location of promoters & terminators; the correct taxonomy. Lineage shown: Viruses; dsDNA viruses, no RNA stage; Caudovirales; Siphoviridae; T1virus. [Figure: genome map with PstI restriction sites, scale in kb]

22 Desired outcome: create a GenBank submission with a good title and a complete, accurate description of the genome and its taxonomy

23 Desired outcome (2)

24 Desired outcome (3)

25 Desired outcome (4): protein products of concern, particularly for those interested in phage therapy: integrases; toxins. [Figure: genome map with PstI restriction sites, scale in kb]
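The screening idea on this slide can be sketched as a keyword pass over annotated product names. This is only a minimal illustration: the keyword list and the function name `flag_products_of_concern` are assumptions made here, not part of the toolkit, and a keyword hit is a prompt for manual review rather than evidence of function.

```python
import re

# Hypothetical keyword patterns; extend to match your own safety criteria.
# "toxin" is matched as a substring on purpose, so "antitoxin" or "exotoxin"
# are also flagged and left to a human to judge.
CONCERN_PATTERNS = {
    "integrase": re.compile(r"\bintegrase\b", re.IGNORECASE),
    "toxin": re.compile(r"toxin", re.IGNORECASE),
}


def flag_products_of_concern(products):
    """Return (product, category) pairs for names matching a concern keyword."""
    hits = []
    for product in products:
        for category, pattern in CONCERN_PATTERNS.items():
            if pattern.search(product):
                hits.append((product, category))
    return hits


example_products = [
    "major capsid protein",
    "putative integrase",
    "Shiga toxin subunit A",
    "hypothetical protein",
]

for product, category in flag_products_of_concern(example_products):
    print(f"{category}: {product}")
```

A name-based screen only catches what has already been annotated, so it complements, and does not replace, homology searches against the proteins themselves.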

26 Processes and Steps
I. Primary analysis (QC/pre-annotation proofreading, e.g., orient with BLASTN)
II. Genome annotation: gene finding (ORF calling); automated annotation; massaging (editing, functional assignment)
III. Second dimension (regulatory elements)
IV. Comparative genomics
V. Metadata
VI. Visualization
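To make the "gene finding (ORF calling)" step concrete, here is a deliberately naive sketch of an ORF scan. It is not how RAST or any dedicated gene caller works: it assumes ATG as the only start codon (real callers also consider alternative starts such as GTG and TTG, plus ribosome-binding sites and codon statistics), and the names `find_orfs` and `min_codons` are illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Return (start, end) pairs, 0-based half-open, for ATG...stop ORFs."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                # Length counts coding codons only (stop excluded).
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def find_orfs(seq, min_codons=30):
    """Scan both strands; '-' coordinates refer to the reverse-complemented sequence."""
    seq = seq.upper()
    calls = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    calls += [("-", s, e) for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return calls


toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

Comparing the output of a scan like this with a real gene caller is a quick way to see which open frames the caller rejected.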

27 From Sequence to Knowledge (workflow overview slide, repeated from slide 16)

28 II. Genome Annotation AUTOMATED ANNOTATION

29 RAST (subsystems-based tools): the major focus of this short tutorial. The goals are:
1. Quick demo of how to use RAST
2. Quick preview of batch annotation in RAST
3. Optimize RAST for phage annotation
4. Demonstrate & discuss how to improve RAST output

30 RAST (subsystems-based tools) But, before getting there

31 The Kropinski wisdom
1. Always use more than one tool
2. Never blindly trust any automated (or manual) process
3. Use your eyes and hands: visual inspection/manual proofreading, re-annotation. Every apparently complicated file is still editable in your favorite text editor (e.g., Notepad)
4. If you don't know a gene's function (if you can't convince your grandma), better keep it unnamed than contribute to error propagation
2 Aug 2015 Phage Genomics - Evergreen 2015
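One way to act on "always use more than one tool" is to cross-check the gene calls from two programs before trusting either. The sketch below assumes each call has already been parsed into a `(strand, start, stop)` tuple given in gene orientation; that tuple layout and the function name `compare_gene_calls` are assumptions for illustration, not any tool's output format. Calls are keyed by strand and stop position because two callers often agree on where a gene ends while choosing different starts.

```python
def compare_gene_calls(calls_a, calls_b):
    """Bucket gene calls from two tools by agreement.

    Each call is (strand, start, stop); calls sharing strand and stop are
    treated as the same gene, and their starts are then compared.
    """
    by_stop_a = {(strand, stop): start for strand, start, stop in calls_a}
    by_stop_b = {(strand, stop): start for strand, start, stop in calls_b}
    report = {"identical": [], "start_differs": [], "only_a": [], "only_b": []}
    for key in sorted(by_stop_a.keys() | by_stop_b.keys()):
        in_a, in_b = key in by_stop_a, key in by_stop_b
        if in_a and in_b:
            same_start = by_stop_a[key] == by_stop_b[key]
            bucket = "identical" if same_start else "start_differs"
        elif in_a:
            bucket = "only_a"
        else:
            bucket = "only_b"
        report[bucket].append(key)
    return report


tool_a = [("+", 100, 400), ("+", 500, 900), ("-", 2000, 1500)]
tool_b = [("+", 100, 400), ("+", 530, 900), ("+", 1000, 1300)]

for bucket, genes in compare_gene_calls(tool_a, tool_b).items():
    print(bucket, genes)
```

Everything outside the "identical" bucket is a candidate for the visual inspection recommended in point 3.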

32 What do I call my gene product (i.e., protein)? "phage hypothetical protein" is redundant. "gp87" (gp = gene product) should be "hypothetical protein": gp200 describes radically different proteins in Listeria, Enterococcus, Mycobacterium, Rhodococcus, Sphingomonas, Pseudomonas, Bacillus and Synechococcus phage genomes. Add /note="similar to gp43 of Escherichia coli phage T4"

33 What do I call my gene product (i.e., protein)? /product="UboA"; "NrdA"; "hypothetical protein SA5_0153/152"; "ORF184" (as bad as gp184); "RNAP1"; "32 kDa protein". Bad because they don't mean anything to the casual (or informed) reader. Unless you are a bioinformatician or biostatistician, be conservative in recording hits. Could you convince your grandma? If not, list as a "hypothetical protein". But do take a stand: "putative DNA polymerase" is cowardly.
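The naming advice on these two slides lends itself to a simple lint pass over /product values before submission. The following is a sketch under stated assumptions: the regular expressions are my reading of the bad examples listed above, the function name `check_product_name` is hypothetical, and names such as "UboA" or "RNAP1" cannot be caught by pattern alone, so manual review is still needed.

```python
import re

# Illustrative patterns derived from the slide's examples of poor names.
UNINFORMATIVE = [
    (re.compile(r"^gp\d+$", re.IGNORECASE), "bare gp number"),
    (re.compile(r"^ORF\d+$", re.IGNORECASE), "bare ORF number"),
    (re.compile(r"^\d+(\.\d+)?\s*kDa protein$", re.IGNORECASE), "size-only name"),
    (re.compile(r"^phage hypothetical protein$", re.IGNORECASE), "redundant 'phage'"),
]


def check_product_name(name):
    """Return a reason string if the name looks uninformative, else None."""
    cleaned = name.strip()
    for pattern, reason in UNINFORMATIVE:
        if pattern.match(cleaned):
            return reason
    return None


for candidate in ["gp87", "ORF184", "32 kDa protein", "hypothetical protein", "DNA polymerase"]:
    verdict = check_product_name(candidate)
    print(f"{candidate!r}: {verdict or 'ok'}")
```

A flagged name would typically be replaced by "hypothetical protein", with any similarity recorded in a /note qualifier as the previous slide suggests.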

34 Nomenclature sins. Naming a "hypothetical protein" as "DNA polymerase" with no or poor-quality evidence is far worse than naming a "DNA polymerase" as "hypothetical protein". Be cautious about using BLASTP hits in naming gps: is there additional evidence to back the designation up?

35 Consistent nomenclature. All of these describe homologs of the product of the coliphage T4 rIIA gene! rIIA protector from prophage-induced early lysis; protector from prophage-induced early lysis; protector from prophage-induced early lysis; rIIA membrane-associated affects host membrane ATPase; rIIA membrane-associated affects host membrane ATPase; phage rIIA lysis inhibitor; rIIA protector; rIIA; rIIA protein; membrane integrity protector; hypothetical protein; unnamed protein product!!!!!!; protein of unknown function

36 Bottom line: manual vs. automated. "Turtles know the road better than rabbits" (Khalil Gibran) ... but they may never reach the end! The best approach? Human expert-based annotation

37 From Sequence to Knowledge (workflow overview slide, repeated from slide 16)

38 IV. COMPARATIVE GENOMICS

39 Genomic pairwise comparisons: EMBOSS Stretcher (N.B. genomes must be collinear); BLASTN (NCBI); ANI (Average Nucleotide Identity); GGDC 2.0 (Genome-to-Genome Distance Calculator); JSpeciesWS ANI
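The quantity these tools report can be illustrated with a toy calculation of percent identity over an existing pairwise alignment. This is a simplification and not the ANI algorithm of any listed server: the input is assumed to be two already-aligned strings of equal length (for example, the two rows of a global alignment), and gapped columns are skipped, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are ignored in this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


row_1 = "ACGT-ACGT"
row_2 = "ACGTTACCT"
print(f"{percent_identity(row_1, row_2):.1f}% identity")
```

Because identity values depend on how gaps and unaligned regions are treated, numbers from different tools are best compared only within the same tool.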

40 Proteomic pairwise comparisons: CoreGenes; TBLASTX. Remember: protein sequence is more conserved than DNA sequence, so these are probably useful for more distant relationships
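A small worked example shows why protein comparisons reach further than DNA comparisons: synonymous codon changes lower nucleotide identity without changing the protein. The two sequences below are made up for illustration, the standard genetic code is assumed, and the helper names `translate` and `identity` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def identity(seq_a, seq_b):
    """Percent identity of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        return 0.0
    same = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * same / len(seq_a)


# Two invented coding sequences that differ only at third codon positions.
dna_a = "ATGGCTCTGAAA"
dna_b = "ATGGCACTTAAG"

print("DNA identity:    ", identity(dna_a, dna_b))
print("Protein identity:", identity(translate(dna_a), translate(dna_b)))
```

Here the nucleotide sequences differ at three of twelve positions while the translated peptides are identical, which is the effect translated searches such as TBLASTX exploit.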

41 VI. POLISH IT TO PUBLISH IT

42 From Sequence to Knowledge (workflow overview slide, repeated from slide 16)

43 Servers & software: BLAST Ring Image Generator; CGView; CGView Comparison Tool; Circos; DNAPlotter; Easyfig; GenomeVx; GView Server; progressiveMauve and ACT

44 EasyFig

45 CGView Comparison Tool

46 BLAST Ring Image Generator


Product Applications for the Sequence Analysis Collection Product Applications for the Sequence Analysis Collection Pipeline Pilot Contents Introduction... 1 Pipeline Pilot and Bioinformatics... 2 Sequence Searching with Profile HMM...2 Integrating Data in a

More information

Interpretation of sequence results

Interpretation of sequence results Interpretation of sequence results An overview on DNA sequencing: DNA sequencing involves the determination of the sequence of nucleotides in a sample of DNA. It use a modified PCR reaction where both

More information

GREG GIBSON SPENCER V. MUSE

GREG GIBSON SPENCER V. MUSE A Primer of Genome Science ience THIRD EDITION TAGCACCTAGAATCATGGAGAGATAATTCGGTGAGAATTAAATGGAGAGTTGCATAGAGAACTGCGAACTG GREG GIBSON SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc.

More information

Eukaryotic Gene Structure

Eukaryotic Gene Structure Eukaryotic Gene Structure Terminology Genome entire genetic material of an individual Transcriptome set of transcribed sequences Proteome set of proteins encoded by the genome 2 Gene Basic physical and

More information

C. Incorrect! Threonine is an amino acid, not a nucleotide base.

C. Incorrect! Threonine is an amino acid, not a nucleotide base. MCAT Biology - Problem Drill 05: RNA and Protein Biosynthesis Question No. 1 of 10 1. Which of the following bases are only found in RNA? Question #01 (A) Ribose. (B) Uracil. (C) Threonine. (D) Adenine.

More information

Genetic Engineering & Recombinant DNA

Genetic Engineering & Recombinant DNA Genetic Engineering & Recombinant DNA Chapter 10 Copyright The McGraw-Hill Companies, Inc) Permission required for reproduction or display. Applications of Genetic Engineering Basic science vs. Applied

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

From DNA to Protein: Genotype to Phenotype

From DNA to Protein: Genotype to Phenotype 12 From DNA to Protein: Genotype to Phenotype 12.1 What Is the Evidence that Genes Code for Proteins? The gene-enzyme relationship is one-gene, one-polypeptide relationship. Example: In hemoglobin, each

More information

Bundle 5 Test Review

Bundle 5 Test Review Bundle 5 Test Review DNA vs. RNA DNA Replication Gene Mutations- Protein Synthesis 1. Label the different components and complete the complimentary base pairing. What is this molecule called? _Nucleic

More information

Bioinformatics for Proteomics. Ann Loraine

Bioinformatics for Proteomics. Ann Loraine Bioinformatics for Proteomics Ann Loraine aloraine@uab.edu What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data

More information

Chapter 11 Quiz #8: February 13 th You will distinguish between the famous scientists and their contributions towards DNA You will demonstrate replication, transcription, and translation from a sample

More information

DNA is the genetic material. DNA structure. Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test

DNA is the genetic material. DNA structure. Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test DNA is the genetic material Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test Dr. Amy Rogers Bio 139 General Microbiology Hereditary information is carried by DNA Griffith/Avery

More information

Having the same or similar function frequently occurs with homologs. True/False

Having the same or similar function frequently occurs with homologs. True/False Which of the following is NOT part of the explanation for how complex functional molecules were assembled, despite the vastness of protein space? Gaia directs protein evolution, through negative feedback

More information

Answer: Sequence overlap is required to align the sequenced segments relative to each other.

Answer: Sequence overlap is required to align the sequenced segments relative to each other. 14 Genomes and Genomics WORKING WITH THE FIGURES 1. Based on Figure 14-2, why must the DNA fragments sequenced overlap in order to obtain a genome sequence? Answer: Sequence overlap is required to align

More information

MAKER: An easy to use genome annotation pipeline. Carson Holt Yandell Lab Department of Human Genetics University of Utah

MAKER: An easy to use genome annotation pipeline. Carson Holt Yandell Lab Department of Human Genetics University of Utah MAKER: An easy to use genome annotation pipeline Carson Holt Yandell Lab Department of Human Genetics University of Utah Introduction to Genome Annotation What annotations are Importance of genome annotations

More information

Sequence Annotation & Designing Gene-specific qpcr Primers (computational)

Sequence Annotation & Designing Gene-specific qpcr Primers (computational) James Madison University From the SelectedWorks of Ray Enke Ph.D. Fall October 31, 2016 Sequence Annotation & Designing Gene-specific qpcr Primers (computational) Raymond A Enke This work is licensed under

More information

Genomic region (ENCODE) Gene definitions

Genomic region (ENCODE) Gene definitions DNA From genes to proteins Bioinformatics Methods RNA PROMOTER ELEMENTS TRANSCRIPTION Iosif Vaisman mrna SPLICE SITES SPLICING Email: ivaisman@gmu.edu START CODON STOP CODON TRANSLATION PROTEIN From genes

More information

SAMPLE LITERATURE Please refer to included weblink for correct version.

SAMPLE LITERATURE Please refer to included weblink for correct version. Edvo-Kit #340 DNA Informatics Experiment Objective: In this experiment, students will explore the popular bioninformatics tool BLAST. First they will read sequences from autoradiographs of automated gel

More information

Gene Expression - Transcription

Gene Expression - Transcription DNA Gene Expression - Transcription Genes are expressed as encoded proteins in a 2 step process: transcription + translation Central dogma of biology: DNA RNA protein Transcription: copy DNA strand making

More information

Exploring a fatal outbreak of Escherichia coli using PATRIC

Exploring a fatal outbreak of Escherichia coli using PATRIC Exploring a fatal outbreak of Escherichia coli using PATRIC On May 19, 2011, the Robert Koch Institute, Germany's national-level public health authority, was informed about a cluster of three cases of

More information

Microbial Metabolism Systems Microbiology

Microbial Metabolism Systems Microbiology 1 Microbial Metabolism Systems Microbiology Ching-Tsan Huang ( 黃慶璨 ) Office: Agronomy Hall, Room 111 Tel: (02) 33664454 E-mail: cthuang@ntu.edu.tw MIT OCW Systems Microbiology aims to integrate basic biological

More information

Pauling/Itano Experiment

Pauling/Itano Experiment Chapter 12 Pauling/Itano Experiment Linus Pauling and Harvey Itano knew that hemoglobin, a molecule in red blood cells, contained an electrical charge. They wanted to see if the hemoglobin in normal RBC

More information

Computational Biology I LSM5191

Computational Biology I LSM5191 Computational Biology I LSM5191 Lecture 5 Notes: Genetic manipulation & Molecular Biology techniques Broad Overview of: Enzymatic tools in Molecular Biology Gel electrophoresis Restriction mapping DNA

More information

Bundle 6 Test Review

Bundle 6 Test Review Bundle 6 Test Review DNA vs. RNA DNA Replication Gene Mutations- Protein Synthesis 1. Label the different components and complete the complimentary base pairing. What is this molecule called? Deoxyribonucleic

More information

number Done by Corrected by Doctor Hamed Al Zoubi

number Done by Corrected by Doctor Hamed Al Zoubi number 3 Done by Neda a Baniata Corrected by Waseem Abu Obeida Doctor Hamed Al Zoubi Note: it is important to refer to slides. Bacterial genetics *The main concepts we will talk about in this lecture:

More information

Section 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein?

Section 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein? Section 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein? Messenger RNA Carries Information for Protein Synthesis from the DNA to Ribosomes Ribosomes Consist

More information

Sequencing the Human Genome

Sequencing the Human Genome Revised and Updated Edvo-Kit #339 Sequencing the Human Genome 339 Experiment Objective: In this experiment, students will read DNA sequences obtained from automated DNA sequencing techniques. The data

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

Chapter 12 Packet DNA 1. What did Griffith conclude from his experiment? 2. Describe the process of transformation.

Chapter 12 Packet DNA 1. What did Griffith conclude from his experiment? 2. Describe the process of transformation. Chapter 12 Packet DNA and RNA Name Period California State Standards covered by this chapter: Cell Biology 1. The fundamental life processes of plants and animals depend on a variety of chemical reactions

More information

Antigen 43 Primer Design

Antigen 43 Primer Design Antigen 43 Primer Design 7-29-2010 Background We want to amplify the flu operon off of the E. coli K12 chromosome using PCR in order to make the cell surface of E. coli and other Pseudomonas species frizzy.

More information