Gene Prediction. Lab & Preliminary Results. Faction 2 Saturday, March 11, 2017


1 Gene Prediction Lab & Preliminary Results Faction 2 Saturday, March 11, 2017 Group Members: Michelle Kim Khushbu Patel Krithika Xinrui Zhou Chen Lin Sujun Zhao Hannah Hatchell rohini mopuri Jack Cartee

2 GENE PREDICTION
BLAST Validation
Ab-initio Prediction: GeneMark, Glimmer, Prodigal, FGenesB
RNA Prediction: tRNAscan-SE, RNAmmer, Rfam

3 Glimmer
Sections: Glimmer, GeneMark, FGenesB, Prodigal, RNAmmer, tRNAscan-SE, Rfam

4 Usage
Ab-initio gene prediction method
long-orfs -n -t 1.15 genome.fasta run1.longorfs
extract -t genome.fasta run3.longorfs > run3.train
build-icm -r run3.icm < run3.train
glimmer3 -o50 -g110 -t30 genome.fasta run3.icm run3.run1
tail +2 run3.run1.predict > run3.coords
upstream-coords.awk 25 0 run3.coords | extract genom.seq - > run3.upstream
elph run3.upstream LEN=6 | get-motif-counts.awk > run3.motif
set startuse = `start-codon-distrib -3 genome.fasta run3.coords`
glimmer3 -o50 -g110 -t30 -b run3.motif -P $startuse genom.seq run3.icm run3
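The final glimmer3 call writes a .predict file, which has to be converted to interval records before it can be compared with the other tools using bedtools. A minimal Python sketch of that conversion follows. It assumes the usual Glimmer3 layout (a ">" header line per contig, then whitespace-separated ORF id, start, end, frame, score), and the function name is hypothetical, not part of Glimmer.

```python
def glimmer_predict_to_bed(lines):
    """Convert Glimmer3 .predict lines into BED-style tuples.

    Assumed input layout (not taken from the slides):
        >contig_name
        orf00001   337  2799  +1  12.89
    Reverse-strand ORFs are listed with start > end. ORFs that wrap
    around the origin of a circular sequence are NOT handled here.
    """
    records = []
    seq_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: names the contig for the ORFs that follow.
            seq_id = line[1:].split()[0]
            continue
        orf_id, start, end, frame, score = line.split()[:5]
        start, end = int(start), int(end)
        strand = "-" if frame.startswith("-") else "+"
        low, high = min(start, end), max(start, end)
        # BED coordinates are 0-based and half-open.
        records.append((seq_id, low - 1, high, orf_id, float(score), strand))
    return records


# Made-up example lines, for illustration only.
example = [
    ">NODE_1",
    "orf00001      337     2799  +1    12.89",
    "orf00002     3800     2900  -2     8.10",
]
bed_records = glimmer_predict_to_bed(example)
```

Each tuple can be written out tab-separated to give a BED file for the intersect step.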

5 Output

6 Output Statistics
NCBI Reference
Genes: NCBI Annotation | Glimmer Genes Predicted | TP | FN | FP | Sensitivity | Positive Prediction Rate
Samples 1-5 Assembled in SPAdes:
                   Contigs 1   Contigs 2   Contigs 3   Contigs 4   Contigs 5
# of Predictions   4,895       5,253       4,889       4,830       4,754
# of Nodes

7 GeneMark

8 GeneMarkS Platform
> Web-based ab initio gene prediction method
> Self-training program based on the GeneMark.hmm heuristics model
> Handles sequences 1Mb
> Input: copy or upload FASTA files using the web interface
> Output: file containing predicted genes (strand, start, end) delivered via

9 GeneMarkS Platform

10 Output

11 Output Statistics
NCBI Reference
Genes: NCBI Annotation | GeneMarkS Annotation | True Positive (TP) | False Negative (FN) | False Positive (FP) | Sensitivity | Positive Prediction Rate
Samples 1-5
Columns: Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5
Rows: # of Predictions, # of Nodes

12 FGenesB

13 Command
Command Line: Web Interface
Input: Nucleotide sequence (plain or FASTA format)
Output: FgenesB output (html file)

14 FgenesB-Web Interface

15 Results

16 Results

17 Output Statistics
NCBI Reference
Genes: NCBI Annotation | FGenesB Annotation | True Positive (TP) | False Negative (FN) | False Positive (FP) | Sensitivity (%) | Positive Prediction Rate (%)

18 Prodigal

19 Prodigal provides 3 modes of gene prediction
Mode Type
Normal mode: learns the properties of the sequence provided, then predicts genes based on these properties
Anonymous mode: applies pre-calculated training files to the sequence provided and predicts genes based on the best results
Training mode: works like normal mode, but saves a training file for further use
Command: -p <mode> specifies the mode (normal, train, or anon)

20 Command
Command used: prodigal -i input.fasta -o output.gff -f gff
(-f gff selects the output format)
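To count or compare the predictions in output.gff, the CDS rows can be pulled out with a few lines of Python. This is a sketch only: it assumes the standard nine-column, tab-separated GFF layout with "CDS" in the third column, and the function name is hypothetical.

```python
def read_gff_cds(lines):
    """Return (seqid, start0, end, strand) for every CDS row in GFF text.

    Assumes standard 9-column tab-separated GFF; comment lines start with '#'.
    GFF coordinates are 1-based and inclusive, so the start is shifted to
    0-based to match BED-style half-open intervals.
    """
    genes = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        genes.append((cols[0], int(cols[3]) - 1, int(cols[4]), cols[6]))
    return genes


# Made-up GFF rows, for illustration only.
example_gff = [
    "##gff-version  3",
    "NODE_1\tProdigal\tCDS\t337\t2799\t338.7\t+\t0\tID=1_1",
    "NODE_1\tProdigal\tCDS\t2900\t3800\t95.2\t-\t0\tID=1_2",
]
cds = read_gff_cds(example_gff)
```

The number of predictions for a sample is then simply len(cds).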

21 Result
NCBI Reference Statistics
Genes: NCBI Annotation | Prodigal Annotation | True Positive (TP) | False Negative (FN) | False Positive (FP) | Sensitivity | Positive Prediction Rate

22 RNAmmer

23 RNAmmer
Predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences.
Usage: rnammer [-S kingdom] [-m molecules] [-xml xmlfile] [-gff gff-file] [-d] [-p] [-h hmmreport] [-f fasta-file] [sequence]
RNAmmer consists of two components: a core Perl program, 'core-rnammer', and a wrapper, 'rnammer'. The wrapper sets up the search by writing one or more temporary configuration(s).

24 RNAmmer

25 Prediction Result
Tools: RNAmmer (rRNAs), tRNAscan-SE (tRNAs), Rfam
NCBI Annotation / Predictions for Reference (rRNAs = 8, tRNAs = 79)
Assembly Contigs: Sample 1, Sample 2, Sample 3, Sample 4, Sample 5 (rows: tRNAscan-SE, RNAmmer)

26 tRNAscan-SE

27 Working & Command
Working: tRNAscan and EufindtRNA are used as first-pass prefilters to identify "candidate" tRNA regions of the sequence. These subsequences are then passed to Cove for further analysis, and output if Cove confirms the initial tRNA prediction.
Usage: tRNAscan-SE [-options] <FASTA file(s)>
  -B : search for bacterial tRNAs (use bacterial tRNA model)
  -o <file> : save final results in <file>
  -f <file> : save tRNA secondary structures to <file>

28 Tabular Output
A sequence with a Cove score over 20 is counted as a tRNA.
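The score cutoff can be applied to the tabular file with a short filter. The sketch below is illustrative: it assumes a tab-separated table in which the tRNA type is the fifth column and the Cove score the ninth, and it skips any row whose ninth column is not a number (such as header rows). The function name is hypothetical, and the column positions should be checked against the file actually produced.

```python
def filter_trnas(lines, min_score=20.0):
    """Keep rows whose Cove score (assumed 9th column) exceeds min_score.

    Returns (sequence_name, trna_type, score) tuples. Rows with fewer than
    nine columns or a non-numeric score column are treated as headers.
    """
    hits = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        try:
            score = float(cols[8])
        except ValueError:
            continue  # header or separator row
        if score > min_score:
            hits.append((cols[0].strip(), cols[4].strip(), score))
    return hits


# Made-up rows, for illustration only.
example_rows = [
    "Name\ttRNA #\tBegin\tEnd\tType\tCodon\tBegin\tEnd\tScore",
    "NODE_1\t1\t100\t175\tAla\tTGC\t0\t0\t75.53",
    "NODE_1\t2\t300\t372\tPseudo\t???\t0\t0\t18.20",
]
trnas = filter_trnas(example_rows)
```

The per-sample tRNA count reported in the results table would correspond to len(trnas) under these assumptions.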

29 Secondary Structure file
Uppercase nucleotides match the consensus tRNA model
Lowercase nucleotides are non-conserved

30 Rfam

31 Using Rfam 12.0 and Infernal 1.1
Rfam provides a library of covariance models.
In conjunction with the Infernal software, Rfam identifies non-coding RNAs by homology search.
Output is available in different formats (tabular, standard cmscan output).

32 Rfam

33 RNAcon?

34 MERGE

35 Sample Salmonella enterica Reference genome: Salmonella enterica subsp. enterica serovar Typhi str. CT annotated genes

36 Array_Intersect_Raw
Rows and columns: Glimmer, FGenesB, GeneMark, Prodigal
Glimmer: 4989

37 bedtools intersect
bedtools intersect -a [file1] -b [file2] -f 0.80 -r
  -f : Minimum overlap required as a fraction of A
  -F : Minimum overlap required as a fraction of B
  -r : Require that the fraction of overlap be reciprocal for A and B
-f 0.80 -r <=> -f 0.80 -F 0.80
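The reciprocal 80% criterion can be illustrated in a few lines of Python. This is a sketch of the rule itself, not of how bedtools implements it: intervals are assumed to be 0-based, half-open (start, end) pairs on the same sequence, strand is ignored, and the function name is hypothetical.

```python
def reciprocal_overlap(a, b, min_frac=0.80):
    """True if a and b overlap by at least min_frac of BOTH interval lengths.

    a and b are (start, end) pairs, 0-based and half-open.
    """
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False  # disjoint or merely adjacent intervals
    frac_of_a = overlap / (a_end - a_start)
    frac_of_b = overlap / (b_end - b_start)
    return frac_of_a >= min_frac and frac_of_b >= min_frac


# Two predictions of the same gene with different start sites: accepted.
same_gene = reciprocal_overlap((0, 100), (10, 100))
# A short ORF nested in a long one covers 100% of B but only 10% of A: rejected.
nested = reciprocal_overlap((0, 100), (50, 60))
```

The nested case shows why the reciprocal requirement matters: with -f alone, the result would depend on which file is given as A.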

38 Array_Intersect_Raw
Rows and columns: Glimmer, FGenesB, GeneMark, Prodigal
Glimmer: 4989

39 Calculation
Raw and Ratio are arrays with rows and columns Glimmer, FGenesB, GeneMark, Prodigal (Raw, Glimmer: 4989; Ratio, Glimmer: 1).
\[ \mathrm{Ratio}[i][j] = \frac{2\,\mathrm{Raw}[i][j]}{\mathrm{Raw}[i][i] + \mathrm{Raw}[j][j]} \]
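The Ratio calculation can be written out directly. The sketch below assumes Raw is a square list of lists in which Raw[i][i] is the number of genes predicted by tool i and Raw[i][j] is the number shared by tools i and j. The counts in the example are toy values, not the results from the slides.

```python
def intersect_ratio(raw):
    """Ratio[i][j] = 2 * Raw[i][j] / (Raw[i][i] + Raw[j][j])."""
    n = len(raw)
    return [
        [2 * raw[i][j] / (raw[i][i] + raw[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy counts for two hypothetical tools: 100 and 80 predictions, 60 shared.
toy_raw = [
    [100, 60],
    [60, 80],
]
toy_ratio = intersect_ratio(toy_raw)
```

By construction the diagonal is always 1, which matches the value shown for Glimmer in the Ratio array, and the off-diagonal entries fall between 0 and 1.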

40 Array_Intersect_Ratio
Rows and columns: Glimmer, FGenesB, GeneMark, Prodigal
Glimmer: 1

41 Venn Diagram: Prodigal, FGenesB, GeneMark (surviving region counts: 80, 47)

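42 Venn Diagram: Prodigal (Pr), FGenesB (Fg), GeneMark (Ge)
Prodigal only: 1.71% Pr
Prodigal & FGenesB: 2.41% Pr, 2.39% Fg
Prodigal & GeneMark: 2.69% Pr, 2.66% Ge
Prodigal & FGenesB & GeneMark: 93.18% Pr, 92.25% Fg, 92.00% Ge
FGenesB only: 1.00% Fg
FGenesB & GeneMark: 4.36% Fg, 4.35% Ge
GeneMark only: 0.99% Ge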

43 Methods
Array_Intersect_Ratio (rows and columns: Glimmer, FGenesB, GeneMark, Prodigal; Glimmer: 1)
Merge: Prodigal, GeneMark & FGenesB

44 Criteria
            Total Genes   Overlapped Genes
Predicted   P_TG          P_OG
Reference   R_TG          R_OG
\[ \mathrm{Sensitivity} = \frac{TP}{TP + FN} = \frac{R\_OG}{R\_TG} \times 100\% \]
Proportion of genes in the reference that have been predicted.
\[ \mathrm{PositivePredictionRate} = \frac{P\_OG}{P\_TG} \times 100\% \]
Proportion of predicted genes that are in the reference.
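Both criteria reduce to a ratio of an overlapped count to a total count. A minimal Python sketch, using the slide's variable names as arguments and toy numbers rather than the actual results:

```python
def sensitivity(r_og, r_tg):
    """Percentage of reference genes that were predicted: R_OG / R_TG * 100."""
    return 100.0 * r_og / r_tg


def positive_prediction_rate(p_og, p_tg):
    """Percentage of predicted genes found in the reference: P_OG / P_TG * 100."""
    return 100.0 * p_og / p_tg


# Toy counts: 100 reference genes of which 90 are recovered,
# and 120 predictions of which 90 match the reference.
toy_sensitivity = sensitivity(r_og=90, r_tg=100)
toy_ppr = positive_prediction_rate(p_og=90, p_tg=120)
```

With these toy counts the predictor recovers most reference genes but also reports extra predictions, so sensitivity comes out higher than the positive prediction rate.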

45 bedtools intersect
bedtools intersect -a [file1] -b [file2] -f 0.80 -r -u
  -f : Minimum overlap required as a fraction of A
  -r : Require that the fraction of overlap be reciprocal for A and B
  -u : Write original A entry once if any overlaps found in B

46 Positive Prediction Rate & Sensitivity (chart comparing merge, Prodigal, GeneMark, FGenesB, Glimmer)

47 Ab initio Gene Prediction RNA prediction Final Prediction RNA Prediction

48 bedtools intersect
bedtools intersect -a [file1] -b [file2] -f 0.80 -r -u
  -f : Minimum overlap required as a fraction of A
  -r : Require that the fraction of overlap be reciprocal for A and B
  -u : Write original A entry once if any overlaps found in B

49 RNA prediction
Merge3 & RNAmmer: 0 overlap
Merge3 & tRNAscan: 0 overlap
Merge3 & Rfam: 0 overlap

50 Ab initio Gene Prediction GENE PREDICTION Final Prediction RNA Prediction

51 GENE PREDICTION
BLAST Validation
Ab-initio Prediction: GeneMark, Prodigal, FGenesB
RNA Prediction: tRNAscan, RNAmmer, Rfam

52 Thanks for listening. Saturday, March 11, 2017 Group Members: Michelle Kim Khushbu Patel Krithika Xinrui Zhou Chen Lin Sujun Zhao Hannah Hatchell rohini mopuri Jack Cartee

53 Glimmer?
Merge3: Sensitivity, Positive Prediction Rate
Merge3+Glimmer: Sensitivity, Positive Prediction Rate

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015 Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck

More information

GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition

GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition Alexandre Lomsadze 1^, Shiyuyun Tang 2^, Karl Gemayel 3^ and Mark Borodovsky 1,2,3 ^ joint first authors 1 Wallace H. Coulter Department of

More information

Genome Annotation. Stefan Prost 1. May 27th, States of America. Genome Annotation

Genome Annotation. Stefan Prost 1. May 27th, States of America. Genome Annotation Genome Annotation Stefan Prost 1 1 Department of Integrative Biology, University of California, Berkeley, United States of America May 27th, 2015 Outline Genome Annotation 1 Repeat Annotation 2 Repeat

More information

Analysis Report. Institution : Macrogen Japan Name : Macrogen Japan Order Number : 1501APB-0004 Sample Name : 8380 Type of Analysis : De novo assembly

Analysis Report. Institution : Macrogen Japan Name : Macrogen Japan Order Number : 1501APB-0004 Sample Name : 8380 Type of Analysis : De novo assembly Analysis Report Institution : Macrogen Japan Name : Macrogen Japan Order Number : 1501APB-0004 Sample Name : 8380 Type of Analysis : De novo assembly 1 Table of Contents 1. Result of Whole Genome Assembly

More information

Outline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation

Outline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation Tues, Nov 29: Gene Finding 1 Online FCE s: Thru Dec 12 Thurs, Dec 1: Gene Finding 2 Tues, Dec 6: PS5 due Project presentations 1 (see course web site for schedule) Thurs, Dec 8 Final papers due Project

More information

Genome sequence of Acinetobacter baumannii MDR-TJ

Genome sequence of Acinetobacter baumannii MDR-TJ JB Accepts, published online ahead of print on 11 March 2011 J. Bacteriol. doi:10.1128/jb.00226-11 Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.

More information

COMPUTER RESOURCES II:

COMPUTER RESOURCES II: COMPUTER RESOURCES II: Using the computer to analyze data, using the internet, and accessing online databases Bio 210, Fall 2006 Linda S. Huang, Ph.D. University of Massachusetts Boston In the first computer

More information

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation AGACAAAGATCCGCTAAATTAAATCTGGACTTCACATATTGAAGTGATATCACACGTTTCTCTAAT AATCTCCTCACAATATTATGTTTGGGATGAACTTGTCGTGATTTGCCATTGTAGCAATCACTTGAA

More information

Protein Synthesis: Transcription and Translation

Protein Synthesis: Transcription and Translation Protein Synthesis: Transcription and Translation Proteins In living things, proteins are in charge of the expression of our traits (hair/eye color, ability to make insulin, predisposition for cancer, etc.)

More information

Gene Identification in silico

Gene Identification in silico Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction

More information

MicroSEQ Rapid Microbial Identification System

MicroSEQ Rapid Microbial Identification System MicroSEQ Rapid Microbial Identification System Giving you complete control over microbial identification using the gold-standard genotypic method The MicroSEQ ID microbial identification system, based

More information

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1 BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

Genomics and Transcriptomics of Spirodela polyrhiza

Genomics and Transcriptomics of Spirodela polyrhiza Genomics and Transcriptomics of Spirodela polyrhiza Doug Bryant Bioinformatics Core Facility & Todd Mockler Group, Donald Danforth Plant Science Center Desired Outcomes High-quality genomic reference sequence

More information

Computational analysis of non-coding RNA. Andrew Uzilov BME110 Tue, Nov 16, 2010

Computational analysis of non-coding RNA. Andrew Uzilov BME110 Tue, Nov 16, 2010 Computational analysis of non-coding RNA Andrew Uzilov auzilov@ucsc.edu BME110 Tue, Nov 16, 2010 1 Corrected/updated talk slides are here: http://tinyurl.com/uzilovrna redirects to: http://users.soe.ucsc.edu/~auzilov/bme110/fall2010/

More information

RNA Genomics II. BME 110: CompBio Tools Todd Lowe & Andrew Uzilov May 17, 2011

RNA Genomics II. BME 110: CompBio Tools Todd Lowe & Andrew Uzilov May 17, 2011 RNA Genomics II BME 110: CompBio Tools Todd Lowe & Andrew Uzilov May 17, 2011 1 TIME Why RNA? An evolutionary perspective The RNA World hypotheses: life arose as self-replicating non-coding RNA (ncrna)

More information

Themes: RNA and RNA Processing. Messenger RNA (mrna) What is a gene? RNA is very versatile! RNA-RNA interactions are very important!

Themes: RNA and RNA Processing. Messenger RNA (mrna) What is a gene? RNA is very versatile! RNA-RNA interactions are very important! Themes: RNA is very versatile! RNA and RNA Processing Chapter 14 RNA-RNA interactions are very important! Prokaryotes and Eukaryotes have many important differences. Messenger RNA (mrna) Carries genetic

More information

Why learn sequence database searching? Searching Molecular Databases with BLAST

Why learn sequence database searching? Searching Molecular Databases with BLAST Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results

More information

Bioinformatic analysis of phage AB3, a phikmv-like virus infecting Acinetobacter baumannii

Bioinformatic analysis of phage AB3, a phikmv-like virus infecting Acinetobacter baumannii Bioinformatic analysis of phage AB3, a phikmv-like virus infecting Acinetobacter baumannii J. Zhang 1 *, X. Liu 1 * and X.-J. Li 2 1 Department of Geriatrics Medicine, The Third People s Hospital of Chongqing,

More information

Computational aspects of ncrna research. Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics

Computational aspects of ncrna research. Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics Computational aspects of ncrna research Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics Computational aspects on ncrna Bacterial ncrnas research Gene discovery Target discovery Discovery

More information

Small Genome Annotation and Data Management at TIGR

Small Genome Annotation and Data Management at TIGR Small Genome Annotation and Data Management at TIGR Michelle Gwinn, William Nelson, Robert Dodson, Steven Salzberg, Owen White Abstract TIGR has developed, and continues to refine, a comprehensive, efficient

More information

Gene-centered resources at NCBI

Gene-centered resources at NCBI COURSE OF BIOINFORMATICS a.a. 2014-2015 Gene-centered resources at NCBI We searched Accession Number: M60495 AT NCBI Nucleotide Gene has been implemented at NCBI to organize information about genes, serving

More information

Design of E. coli chip v2 (pan-genome chip)

Design of E. coli chip v2 (pan-genome chip) Design of E. coli chip v2 (pan-genome chip) Sequences We have included the following strains in the design, primarily obtained from NCBI GenomeProjects [34]: Strain Accession NCBI Proj ID contigs ORFs

More information

TSSpredator User Guide v 1.00

TSSpredator User Guide v 1.00 TSSpredator User Guide v 1.00 Alexander Herbig alexander.herbig@uni-tuebingen.de Kay Nieselt kay.nieselt@uni-tuebingen.de June 3, 2013 1 Getting Started TSSpredator is a tool for the comparative detection

More information

ABSTRACT METHODS FOR MICROBIAL GENOMICS. Professor Steven L. Salzberg Department of Computer Science

ABSTRACT METHODS FOR MICROBIAL GENOMICS. Professor Steven L. Salzberg Department of Computer Science ABSTRACT Title of Dissertation: COMPARATIVE AND COMPUTATIONAL METHODS FOR MICROBIAL GENOMICS Derrick Edward Wood, Doctor of Philosophy, 2014 Directed by: Professor Steven L. Salzberg Department of Computer

More information

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene

More information

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

Study Guide for Chapter 12 Exam DNA, RNA, & Protein Synthesis

Study Guide for Chapter 12 Exam DNA, RNA, & Protein Synthesis Name: Date: Period: Study Guide for Chapter 12 Exam DNA, RNA, & Protein Synthesis ***Completing this study guide in its entirety will result in extra credit on the exam. You must show me the DAY OF the

More information

RNA folding & ncrna discovery

RNA folding & ncrna discovery I519 Introduction to Bioinformatics RNA folding & ncrna discovery Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Contents Non-coding RNAs and their functions RNA structures RNA folding

More information

Complete genome sequence of Clostridium acetobutylicum. DSM 1731, a solvent producing strain with multi-replicon

Complete genome sequence of Clostridium acetobutylicum. DSM 1731, a solvent producing strain with multi-replicon JB Accepts, published online ahead of print on 8 July 2011 J. Bacteriol. doi:10.1128/jb.05596-11 Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.

More information

Three-Way Comparison and Investigation of Annotated Halorhabdus utahensis Genome

Three-Way Comparison and Investigation of Annotated Halorhabdus utahensis Genome Three-Way Comparison and Investigation of Annotated Halorhabdus utahensis Genome Peter Bakke, Nick Carney, Will DeLoache, Mary Gearing, Matt Lotz, Jay McNair, Pallavi Penumetcha, Samantha Simpson, Laura

More information

Applied Bioinformatics

Applied Bioinformatics Applied Bioinformatics Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Course overview What is bioinformatics Data driven science: the creation and advancement

More information

Genscan. The Genscan HMM model Training Genscan Validating Genscan. (c) Devika Subramanian,

Genscan. The Genscan HMM model Training Genscan Validating Genscan. (c) Devika Subramanian, Genscan The Genscan HMM model Training Genscan Validating Genscan (c) Devika Subramanian, 2009 96 Gene structure assumed by Genscan donor site acceptor site (c) Devika Subramanian, 2009 97 A simple model

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

Training materials.

Training materials. Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation

More information

Genome Annotation Genome annotation What is the function of each part of the genome? Where are the genes? What is the mrna sequence (transcription, splicing) What is the protein sequence? What does

More information

NCBI web resources I: databases and Entrez

NCBI web resources I: databases and Entrez NCBI web resources I: databases and Entrez Yanbin Yin Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1 Homework assignment 1 Two parts: Extract the gene IDs reported in table

More information

NOTES Gene Expression ACP Biology, NNHS

NOTES Gene Expression ACP Biology, NNHS Name Date Block NOTES Gene Expression ACP Biology, NNHS Model 1: Transcription the process of genes in DNA being copied into a messenger RNA 1. Where in the cell is DNA found? 2. Where in the cell does

More information

Glossary of Commonly used Annotation Terms

Glossary of Commonly used Annotation Terms Glossary of Commonly used Annotation Terms Akela a general use server for the annotation group as well as other groups throughout TIGR. Annotation Notebook a link from the gene list page that is associated

More information

ELE4120 Bioinformatics. Tutorial 5

ELE4120 Bioinformatics. Tutorial 5 ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar

More information

Bioinformatic tools for metagenomic data analysis

Bioinformatic tools for metagenomic data analysis Bioinformatic tools for metagenomic data analysis MEGAN - blast-based tool for exploring taxonomic content MG-RAST (SEED, FIG) - rapid annotation of metagenomic data, phylogenetic classification and metabolic

More information

Analysis of Microarray Data

Analysis of Microarray Data Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review

More information

7.2 Protein Synthesis. From DNA to Protein Animation

7.2 Protein Synthesis. From DNA to Protein Animation 7.2 Protein Synthesis From DNA to Protein Animation Proteins Why are proteins so important? They break down your food They build up muscles They send signals through your brain that control your body They

More information

An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis

An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis Lina L. Faller Department of Computer Science University of New Hampshire June 2008

More information

Introduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks

Introduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional

More information

Protein Synthesis. DNA to RNA to Protein

Protein Synthesis. DNA to RNA to Protein Protein Synthesis DNA to RNA to Protein From Genes to Proteins Processing the information contained in DNA into proteins involves a sequence of events known as gene expression and results in protein synthesis.

More information

#28 - Promoter Prediction 10/29/07

#28 - Promoter Prediction 10/29/07 BCB 444/544 Required Reading (before lecture) Lecture 28 Mon Oct 29 - Lecture 28 Promoter & Regulatory Element Prediction Chp 9 - pp 113-126 Gene Prediction - finish it Wed Oct 30 - Lecture 29 Phylogenetics

More information

Post-assembly Data Analysis

Post-assembly Data Analysis Assembled transcriptome Post-assembly Data Analysis Quantification: the expression level of each gene in each sample DE genes: genes differentially expressed between samples Clustering/network analysis

More information

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing

More information

Ensembl Tools. EBI is an Outstation of the European Molecular Biology Laboratory.

Ensembl Tools. EBI is an Outstation of the European Molecular Biology Laboratory. Ensembl Tools EBI is an Outstation of the European Molecular Biology Laboratory. Questions? We ve muted all the mics Ask questions in the Chat box in the webinar interface I will check the Chat box periodically

More information

Introduction to RNA sequencing

Introduction to RNA sequencing Introduction to RNA sequencing Bioinformatics perspective Olga Dethlefsen NBIS, National Bioinformatics Infrastructure Sweden November 2017 Olga (NBIS) RNA-seq November 2017 1 / 49 Outline Why sequence

More information

ORTHOMINE - A dataset of Drosophila core promoters and its analysis. Sumit Middha Advisor: Dr. Peter Cherbas

ORTHOMINE - A dataset of Drosophila core promoters and its analysis. Sumit Middha Advisor: Dr. Peter Cherbas ORTHOMINE - A dataset of Drosophila core promoters and its analysis Sumit Middha Advisor: Dr. Peter Cherbas Introduction Challenges and Motivation D melanogaster Promoter Dataset Expanding promoter sequences

More information

Transcription in Eukaryotes

Transcription in Eukaryotes Transcription in Eukaryotes Biology I Hayder A Giha Transcription Transcription is a DNA-directed synthesis of RNA, which is the first step in gene expression. Gene expression, is transformation of the

More information

PRESENTING SEQUENCES 5 GAATGCGGCTTAGACTGGTACGATGGAAC 3 3 CTTACGCCGAATCTGACCATGCTACCTTG 5

PRESENTING SEQUENCES 5 GAATGCGGCTTAGACTGGTACGATGGAAC 3 3 CTTACGCCGAATCTGACCATGCTACCTTG 5 Molecular Biology-2017 1 PRESENTING SEQUENCES As you know, sequences may either be double stranded or single stranded and have a polarity described as 5 and 3. The 5 end always contains a free phosphate

More information

SSA Signal Search Analysis II

SSA Signal Search Analysis II SSA Signal Search Analysis II SSA other applications - translation In contrast to translation initiation in bacteria, translation initiation in eukaryotes is not guided by a Shine-Dalgarno like motif.

More information

Computational gene finding. Devika Subramanian Comp 470

Computational gene finding. Devika Subramanian Comp 470 Computational gene finding Devika Subramanian Comp 470 Outline (3 lectures) The biological context Lec 1 Lec 2 Lec 3 Markov models and Hidden Markov models Ab-initio methods for gene finding Comparative

More information

RNA-Seq Software, Tools, and Workflows

RNA-Seq Software, Tools, and Workflows RNA-Seq Software, Tools, and Workflows Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 1, 2016 Some mrna-seq Applications Differential gene expression analysis Transcriptional profiling Assumption:

More information

Gene and Translation Initiation Site Prediction in Metagenomic Sequences

Gene and Translation Initiation Site Prediction in Metagenomic Sequences Bioinformatics Advance Access published July 12, 2012 Gene and Translation Initiation Site Prediction in genomic Sequences Doug Hyatt 1,2*, Philip F. LoCascio 1, Loren J. Hauser 1,2,and Edward C. Uberbacher

More information

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase

More information

DNA - DEOXYRIBONUCLEIC ACID

DNA - DEOXYRIBONUCLEIC ACID DNA - DEOXYRIBONUCLEIC ACID blueprint of life (has the instructions for making an organism) established by James Watson and Francis Crick codes for your genes shape of a double helix made of repeating

More information

Guided tour to Ensembl

Guided tour to Ensembl Guided tour to Ensembl Introduction Introduction to the Ensembl project Walk-through of the browser Variations and Functional Genomics Comparative Genomics BioMart Ensembl Genome browser http://www.ensembl.org

More information

Discovering Common Sequence Variation in A. thaliana. Gunnar Rätsch

Discovering Common Sequence Variation in A. thaliana. Gunnar Rätsch Machine Learning Methods for Discovering Common Sequence Variation in A. thaliana Gunnar Rätsch Friedrich Miescher Laboratory, Max Planck Society, Tübingen Technical University Berlin March 31, 2008 Current

More information

Transcription and Translation

Transcription and Translation Biology Name: Morales Date: Period: Transcription and Translation Directions: Read the following and answer the questions in complete sentences. DNA is the molecule of heredity it determines an organism

More information

Complete Genome Sequence of the Polycyclic Aromatic Hydrocarbon-Degrading. Bacterium Alteromonas sp. Strain SN2

Complete Genome Sequence of the Polycyclic Aromatic Hydrocarbon-Degrading. Bacterium Alteromonas sp. Strain SN2 JB Accepts, published online ahead of print on 24 June 2011 J. Bacteriol. doi:10.1128/jb.05252-11 Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.

More information

PROTEIN SYNTHESIS. copyright cmassengale

PROTEIN SYNTHESIS. copyright cmassengale PROTEIN SYNTHESIS 1 DNA and Genes 2 Roles of RNA and DNA DNA is the MASTER PLAN RNA is the BLUEPRINT of the Master Plan 3 RNA Differs from DNA RNA has a sugar ribose DNA has a sugar deoxyribose 4 Other

More information

PROTEIN SYNTHESIS. copyright cmassengale

PROTEIN SYNTHESIS. copyright cmassengale PROTEIN SYNTHESIS 1 DNA and Genes 2 Roles of RNA and DNA DNA is the MASTER PLAN RNA is the BLUEPRINT of the Master Plan 3 RNA Differs from DNA RNA has a sugar ribose DNA has a sugar deoxyribose 4 Other

More information

Self-test Quiz for Chapter 12 (From DNA to Protein: Genotype to Phenotype)

Self-test Quiz for Chapter 12 (From DNA to Protein: Genotype to Phenotype) Self-test Quiz for Chapter 12 (From DNA to Protein: Genotype to Phenotype) Question#1: One-Gene, One-Polypeptide The figure below shows the results of feeding trials with one auxotroph strain of Neurospora

More information

Genome Sequence Assembly

Genome Sequence Assembly Genome Sequence Assembly Learning Goals: Introduce the field of bioinformatics Familiarize the student with performing sequence alignments Understand the assembly process in genome sequencing Introduction:

More information

BIO 311C Spring Lecture 36 Wednesday 28 Apr.

BIO 311C Spring Lecture 36 Wednesday 28 Apr. BIO 311C Spring 2010 1 Lecture 36 Wednesday 28 Apr. Synthesis of a Polypeptide Chain 5 direction of ribosome movement along the mrna 3 ribosome mrna NH 2 polypeptide chain direction of mrna movement through

More information

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Multiple choice questions (numbers in brackets indicate the number of correct answers) 1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer

Ab initio gene identification in metagenomic sequences

Nucleic Acids Research Advance Access, published April 19, 2010. Nucleic Acids Research, 2010, 1-15. doi:10.1093/nar/gkq275. Wenhan Zhu, Alexandre Lomsadze

Chapter 10: Gene Expression and Regulation

Fact 1: DNA contains information but is unable to carry out actions. Fact 2: Proteins are the workhorses but contain no information. Thus, information in DNA must

Gene Structure & Gene Finding Part II

David Wishart, david.wishart@ualberta.ca. 30,000 metabolite. Gene Finding in Eukaryotes. Eukaryotes: complex gene structure; large genomes (0.1 to 10 billion bp); exons and

DNA RNA PROTEIN. Professor Andrea Garrison Biology 11 Illustrations 2010 Pearson Education, Inc. unless otherwise noted

DNA: molecule of heredity; contains all the genetic info our cells inherit; determines

What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!!

Protein Synthesis/Gene Expression. Why do we need to make proteins? To build parts for our body as

Complete Genome Sequence of Pathogenic Bacterium

JB Accepts, published online ahead of print on 25 March 2011. J. Bacteriol. doi:10.1128/jb.00301-11. Copyright 2011, American Society for Microbiology and/or the Listed Authors/Institutions. All Rights Reserved.

Basic Bioinformatics: Homology, Sequence Alignment,

Basic Bioinformatics: Homology, Sequence Alignment, and BLAST. William S. Sanders, Institute for Genomics, Biocomputing, and Biotechnology (IGBB), High Performance Computing Collaboratory (HPC²), Mississippi

Gene Signal Estimates from Exon Arrays

I. Introduction: With exon arrays like the GeneChip Human Exon 1.0 ST Array, researchers can examine the transcriptional profile of an entire gene (Figure 1). Being

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

BLAST Exercise: Detecting and Interpreting Genetic Homology. Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler. Prerequisites: None. Resources: The BLAST web

DNA Structure and Replication, and Virus Structure and Replication Test Review

What does DNA stand for? Deoxyribonucleic acid. DNA is what type of macromolecule? DNA is a nucleic acid. The building blocks

RNA & PROTEIN SYNTHESIS

DNA & RNA. Genes are coded DNA instructions that control the production of proteins within the cell. The first step in decoding these genetic messages is to copy part of the nucleotide

Product Applications for the Sequence Analysis Collection

Pipeline Pilot. Contents: Introduction; Pipeline Pilot and Bioinformatics; Sequence Searching with Profile HMM; Integrating Data in a

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018

Annotation of Drosophila Primer, January 2018. Wilson Leung and Chris Shaffer. Outline: overview of the GEP annotation projects; GEP annotation workflow; practice applying the GEP annotation strategy. AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT

MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes

Resource. Brandi L. Cantarel, Ian Korf, Sofia M.C. Robb, Genis Parra, Eric Ross, Barry Moore, Carson Holt,

Agenda. Annotation of Drosophila. Muller element nomenclature. Annotation: Adding labels to a sequence. GEP Drosophila annotation projects 01/03/2018

Annotation of Drosophila, January 2018. Agenda: overview of the GEP annotation project; GEP annotation strategy; types of evidence; analysis tools; web databases; annotation of a single isoform (walkthrough). Wilson

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Computational Methods in Science and Technology 9(1-2), 93-100 (2003/2004). Dariusz Plewczynski and Leszek Rychlewski, BioInfoBank

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions

Introduction to ab initio and evidence-based gene finding. Outline: overview of computational gene predictions; different types of eukaryotic gene predictors; common types of gene prediction errors. Wilson

ChIP-Seq Data Analysis. J Fass UCD Genome Center Bioinformatics Core Wednesday 15 June 2015

What's the Question? Where do Transcription Factors (TFs) bind genomic DNA? (Where do other things bind DNA

PROTEIN SYNTHESIS. Flow of Genetic Information. The flow of genetic information can be symbolized as: DNA → RNA → Protein

This is also known as the central dogma of molecular biology. Protein: Proteins are made

Year III Pharm.D Dr. V. Chitra

Genome: entire genetic material of an individual. Transcriptome: set of transcribed sequences. Proteome: set of proteins encoded by the genome. Only one strand of DNA serves

Axiom mydesign Custom Array design guide for human genotyping applications

Technical Note: Axiom mydesign Custom Genotyping Arrays. Overview: In the past, custom genotyping arrays were expensive, required

Genomics and Gene Recognition Genes and Blue Genes

November 3, 2004. Eukaryotic Gene Structure: eukaryotic genomes are considerably more complex than those of prokaryotes; eukaryotic cells have organelles

TERTIARY MOTIF INTERACTIONS ON RNA STRUCTURE

Bioinformatics Senior Project, Wasay Hussain, Spring 2009. Overview of RNA: The central dogma of molecular biology is DNA → RNA → Proteins. The RNA (Ribonucleic

RNA and Protein Synthesis

CTE: Agriculture and Natural Resources: C5.3 Understand various cell actions, such as osmosis and cell division. C5.4 Compare and contrast plant and animal cells, bacteria, and

Chapter 1 Analysis of ChIP-Seq Data with Partek Genomics Suite 6.6

Overview: ChIP-Sequencing technology (ChIP-Seq) uses high-throughput DNA sequencing to map protein-DNA interactions across the entire genome.

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

Usage scenarios for sequence based function annotation: function prediction of newly cloned

Regulation of eukaryotic transcription:

Promoter definition by mass genome annotation data: in silico primer extension. EMBNET course, Bioinformatics of transcriptional regulation, Jan 28 2008. Christoph Schmid. Regulation of eukaryotic transcription:

Bioinformatics for Proteomics. Ann Loraine

Ann Loraine, aloraine@uab.edu. What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data

Protein Synthesis Notes

Protein Synthesis: Overview. Transcription: synthesis of mRNA under the direction of DNA. Translation: actual synthesis of a polypeptide under the direction of mRNA. Transcription

Last Update: 12/31/2017. Recommended Background Tutorial: An Introduction to NCBI BLAST

BLAST Exercise: Detecting and Interpreting Genetic Homology. Adapted by T. Cordonnier, C. Shaffer, W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler. Recommended Background
