Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)

1 Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013)

2 Genome annotation
[Slide background: raw genomic DNA sequence, overlaid with the annotation below]
Gene: SL6G
Description: Putative disease resistance gene, Mi-homolog
Domains: NBS, LRR
Best Blast hit: SA

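3 Gene prediction: the eukaryotic gene model
[Diagram labels, 5' to 3': intergenic, promoter, 5' UTR, ATG (start), initial exon, intron, internal exon, intron, terminal exon, * (stop), 3' UTR, poly-A, intergenic]
Splice sites mark the exon/intron boundaries; the protein coding region runs from ATG to *.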

4 Structural gene annotation: alignment based (1)
Prediction of gene structures based on alignments
  Transcripts and proteins provide direct evidence
  Requires experimental data for each gene
[Diagram: two gene structures, each with a start (ATG) and a stop (*)]

5 Structural gene annotation: alignment based (2)
Genome-to-genome alignment
  Requires annotated genome of closely related species

6 Example tomato genome browser

7 Structural gene annotation: ab initio (1)
Prediction of gene structures based on gene model
  Start, stop, splice sites
  Exon, intron, intergenic length distributions
  Triplet/hexamer frequencies (coding vs. non-coding)
[Diagram: candidate gene structures with possible starts (ATG) and stops (*)]
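The triplet/hexamer criterion above can be illustrated with a log-odds score. This is only a minimal sketch, not the method of any particular gene predictor: the function names `kmer_model` and `coding_score` and the toy training sequences are invented for illustration, and real predictors typically use in-frame hexamers trained on thousands of verified genes.

```python
import math
from collections import Counter


def kmer_model(sequences, k, pseudocount=1.0):
    """Return a function mapping a k-mer to its smoothed frequency."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Add-one style smoothing over all 4**k possible k-mers.
    total = sum(counts.values()) + pseudocount * 4 ** k
    return lambda kmer: (counts[kmer] + pseudocount) / total


def coding_score(seq, coding, noncoding, k):
    """Sum of log-odds (coding vs. non-coding) over all overlapping k-mers."""
    seq = seq.upper()
    return sum(
        math.log(coding(seq[i:i + k]) / noncoding(seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


# Toy training data (hypothetical, not real annotations); k=3 keeps it small.
coding = kmer_model(["ATGGCCGCCGCCGCC"], k=3)
noncoding = kmer_model(["ATATATATATATATAT"], k=3)

gc_rich = coding_score("GCCGCCGCC", coding, noncoding, k=3)
at_rich = coding_score("ATATATAT", coding, noncoding, k=3)
```

A positive score means the window looks more like the coding training set, a negative score more like the non-coding set.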

8 Structural gene annotation: ab initio (2)
Different predictors produce different results
  Underlying models (HMM, SVM, ...)
  Quality of training
  Lack of understanding of biology

9 Structural gene annotation: ab initio (3)
Requirements for ab initio gene predictors
  Training through verified transcript (and protein) alignments
  Sufficient sequence context in order to make accurate predictions
Some properties are common to all eukaryotes
  Start, stop, splice site consensus
Many properties differ, even between related species
  Intron and intergenic length distribution
  Codon usage
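Codon usage is one of the species-specific properties a predictor has to learn from training data. As a rough sketch (the function name `codon_usage` and the example sequence are made up for illustration), relative codon frequencies can be tabulated from a coding sequence like this:

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical toy CDS: ATG GCT GCT GCC TAA
usage = codon_usage("ATGGCTGCTGCCTAA")
```

Comparing such tables between two species gives a quick impression of why a predictor trained on one genome may perform poorly on another.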

10 Example: differences in intron lengths
[Figure labels: ~200 nt, ~300 nt, ~1200 nt, ~2200 nt]
Source: Bradnam and Korf, PLoS One, 2008

11 Generation of a consensus gene structure
[Diagram: genomic DNA sequence with evidence tracks GP1, GP2, GP3 (gene predictions), EST and prot (protein alignment), combined into a consensus gene]
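One simple way to combine evidence tracks such as GP1, GP2, GP3, EST and prot is a per-base vote. The sketch below is an assumption made for illustration, not the integration method used in the lecture: it keeps every region supported by at least `min_support` tracks. The function name, the half-open coordinate convention and the example coordinates are all hypothetical.

```python
def consensus_exons(tracks, min_support=2):
    """Regions covered by at least min_support tracks.

    Each track is a list of half-open (start, end) exon intervals.
    """
    events = []
    for exons in tracks:
        for start, end in exons:
            events.append((start, +1))
            events.append((end, -1))
    events.sort()  # at equal positions, ends (-1) sort before starts (+1)

    consensus, depth, region_start = [], 0, None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_support <= depth:
            region_start = pos  # support just reached the threshold
        elif depth < min_support <= previous:
            if consensus and consensus[-1][1] == region_start:
                # Touching regions are merged into one.
                consensus[-1] = (consensus[-1][0], pos)
            elif pos > region_start:
                consensus.append((region_start, pos))
    return consensus


# Hypothetical evidence: three predictors and one EST alignment.
gp1 = [(100, 200), (300, 400)]
gp2 = [(110, 200), (300, 390)]
gp3 = [(100, 190), (310, 400)]
est = [(105, 200)]

strict = consensus_exons([gp1, gp2, gp3, est], min_support=3)
lenient = consensus_exons([gp1, gp2, gp3, est], min_support=2)
```

Real integration tools usually weight evidence types differently (for example, trusting transcript alignments over ab initio predictions) instead of counting every track equally.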

12 Example tomato genome browser

13 Functional gene annotation: alignment based
Inferring function through sequence similarity
  Proteins with similar sequence often share function
Annotation quality of database sequences
  Many proteins with unknown function
  Propagation of erroneous annotation
Example alignment:
GQPKSKITHVVFCCTSGVDMPGADYQLTKLLGLRPSVKRLMMYQQG
GQPKEKLGHVVFCTTSGVDMPGA--QLTKLMGLRPSIKKLMMYQQG
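Sequence similarity is usually summarised as percent identity. As a small sketch, the two aligned protein fragments from this slide can be compared column by column; the function name `alignment_identity` is an illustrative choice, and whether gap columns count in the denominator is a convention that differs between tools.

```python
def alignment_identity(a, b, gap="-"):
    """Count identical columns and gap columns in a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = gaps = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            gaps += 1
        elif x == y:
            matches += 1
    return matches, gaps, len(a)


# The two aligned sequences shown on the slide.
QUERY = "GQPKSKITHVVFCCTSGVDMPGADYQLTKLLGLRPSVKRLMMYQQG"
HIT = "GQPKEKLGHVVFCTTSGVDMPGA--QLTKLMGLRPSIKKLMMYQQG"

matches, gaps, columns = alignment_identity(QUERY, HIT)
identity_all = 100 * matches / columns               # gaps count as mismatches
identity_ungapped = 100 * matches / (columns - gaps)  # gap columns ignored
```

For this pair, 37 of 46 columns are identical, which is roughly 80% identity over the full alignment.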

14 Functional gene annotation: domain based
Inferring function through domain searches
  Domains are the functional parts of a protein
Global functional annotation of the protein
  E.g. kinase, ATP-binding
  Gene Ontology (GO) terms
[Diagram: domain architecture CC, NBS, LRR, LRR, LRR]

15 The annotated gene
[Diagram: genomic DNA sequence with annotation tracks]
Tracks: model, blastn, blastx
Hits: Putative disease resistance gene, Mi-homolog; Unknown protein [Arabidopsis thaliana]
Domains: NBS, LRR, LRR, LRR
Gene Ontology terms: GO: ATP binding; GO: apoptosis

16 Genome annotation: more than gene finding
[Slide background: raw genomic DNA sequence]
Non-coding RNAs
  tRNA genes
  rRNA
  miRNAs
Repetitive sequences
  Interspersed repeats (e.g. transposons)
  Tandem repeats, SSRs

17 Repeat identification and masking
Repeats may contain coding elements
  E.g. reverse transcriptase in a retrotransposon
This may result in many false gene predictions
  56,797 genes predicted in rice
  16,220 of these are repeat-related!
Prior to gene prediction, repeats should be masked
  Sequence similarity: requires database of known repeats
  De novo: must distinguish between gene families and repeats
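Masking itself is a simple operation once repeat coordinates are known. The following sketch assumes half-open (start, end) intervals supplied by some repeat finder; the function name `mask_repeats` and the toy sequence are illustrative only. Hard masking replaces repeat bases with N, soft masking lowercases them. The last lines work out the share of repeat-related rice predictions quoted on the slide.

```python
def mask_repeats(seq, repeats, hard=True):
    """Mask half-open (start, end) repeat intervals in a sequence.

    hard=True replaces bases with 'N'; hard=False lowercases them.
    """
    chars = list(seq)
    for start, end in repeats:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


# Toy example: a 10 bp sequence with one repeat at positions 2..4.
hard_masked = mask_repeats("ACGTACGTAC", [(2, 5)])
soft_masked = mask_repeats("ACGTACGTAC", [(2, 5)], hard=False)

# Share of rice gene predictions that were repeat-related (slide figures).
repeat_fraction = 16220 / 56797
repeat_percent = round(100 * repeat_fraction, 1)
```

With the numbers from the slide, roughly 29% of the predicted rice genes were repeat-related, which is why masking comes before gene prediction.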

18 Genome annotation pipeline
[Flow diagram]
Inputs: genome sequence, transcripts, repeat database
Components: repeat masker, tRNA and miRNA detection, three gene predictors, integration, BLAST, domain search

19 The annotated genome
[Diagram: two genomic regions, each with tracks for genes, blastx, repeats and tRNAs]
Example blastx labels: ABC transporter, Cytochrome P450, MADS box transcription factor

20 Example tomato genome browser

21 Beyond genome annotation
Automated annotation can provide candidate genes
  Similarity to known genes from other species
  Targets for crop improvement, treatment of (genetic) diseases, etc.
Comparative genomics
  Study the evolution of species
  What makes a species unique?
  What makes an individual unique?

22 Activities
Read
  A beginner's guide to eukaryotic genome annotation
  Mark Yandell, Daniel Ence
  Nature Reviews Genetics 2012 vol. 13 (5) pp
Explore the tomato genome annotation
  Tomato Genome Project
Explore the Arabidopsis genome annotation
  Tools, gbrowse

23 The End Wageningen UR

Genome Annotation. Stefan Prost 1. May 27th, States of America. Genome Annotation

Genome Annotation. Stefan Prost 1. May 27th, States of America. Genome Annotation Genome Annotation Stefan Prost 1 1 Department of Integrative Biology, University of California, Berkeley, United States of America May 27th, 2015 Outline Genome Annotation 1 Repeat Annotation 2 Repeat

More information

Genome Annotation Genome annotation What is the function of each part of the genome? Where are the genes? What is the mrna sequence (transcription, splicing) What is the protein sequence? What does

More information

Gene Identification in silico

Gene Identification in silico Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction

More information

Genome Annotation. What Does Annotation Describe??? Genome duplications Genes Mobile genetic elements Small repeats Genetic diversity

Genome Annotation. What Does Annotation Describe??? Genome duplications Genes Mobile genetic elements Small repeats Genetic diversity Genome Annotation Genome Sequencing Costliest aspect of sequencing the genome o But Devoid of content Genome must be annotated o Annotation definition Analyzing the raw sequence of a genome and describing

More information

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding

UCSC Genome Browser. Introduction to ab initio and evidence-based gene finding UCSC Genome Browser Introduction to ab initio and evidence-based gene finding Wilson Leung 06/2006 Outline Introduction to annotation ab initio gene finding Basics of the UCSC Browser Evidence-based gene

More information

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence Annotating 7G24-63 Justin Richner May 4, 2005 Zfh2 exons Thd1 exons Pur-alpha exons 0 40 kb 8 = 1 kb = LINE, Penelope = DNA/Transib, Transib1 = DINE = Novel Repeat = LTR/PAO, Diver2 I = LTR/Gypsy, Invader

More information

TIGR THE INSTITUTE FOR GENOMIC RESEARCH

TIGR THE INSTITUTE FOR GENOMIC RESEARCH Introduction to Genome Annotation: Overview of What You Will Learn This Week C. Robin Buell May 21, 2007 Types of Annotation Structural Annotation: Defining genes, boundaries, sequence motifs e.g. ORF,

More information

Chimp Sequence Annotation: Region 2_3

Chimp Sequence Annotation: Region 2_3 Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker

More information

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions

Outline. Introduction to ab initio and evidence-based gene finding. Prokaryotic gene predictions Outline Introduction to ab initio and evidence-based gene finding Overview of computational gene predictions Different types of eukaryotic gene predictors Common types of gene prediction errors Wilson

More information

Genomics and Gene Recognition Genes and Blue Genes

Genomics and Gene Recognition Genes and Blue Genes Genomics and Gene Recognition Genes and Blue Genes November 3, 2004 Eukaryotic Gene Structure eukaryotic genomes are considerably more complex than those of prokaryotes eukaryotic cells have organelles

More information

Gene Structure & Gene Finding Part II

Gene Structure & Gene Finding Part II Gene Structure & Gene Finding Part II David Wishart david.wishart@ualberta.ca 30,000 metabolite Gene Finding in Eukaryotes Eukaryotes Complex gene structure Large genomes (0.1 to 10 billion bp) Exons and

More information

Relationship of Gene s Types and Introns

Relationship of Gene s Types and Introns Chi To BME 230 Final Project Relationship of Gene s Types and Introns Abstract: The relationship in gene ontology classification and the modification of the length of introns through out the evolution

More information

Eukaryotic Gene Prediction. Wei Zhu May 2007

Eukaryotic Gene Prediction. Wei Zhu May 2007 Eukaryotic Gene Prediction Wei Zhu May 2007 In nature, nothing is perfect... - Alice Walker Gene Structure What is Gene Prediction? Gene prediction is the problem of parsing a sequence into nonoverlapping

More information

Outline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation

Outline. Gene Finding Questions. Recap: Prokaryotic gene finding Eukaryotic gene finding The human gene complement Regulation Tues, Nov 29: Gene Finding 1 Online FCE s: Thru Dec 12 Thurs, Dec 1: Gene Finding 2 Tues, Dec 6: PS5 due Project presentations 1 (see course web site for schedule) Thurs, Dec 8 Final papers due Project

More information

MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes

MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes Resource MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes Brandi L. Cantarel, 1 Ian Korf, 2 Sofia M.C. Robb, 3 Genis Parra, 2 Eric Ross, 4 Barry Moore, 1 Carson Holt,

More information

Themes: RNA and RNA Processing. Messenger RNA (mrna) What is a gene? RNA is very versatile! RNA-RNA interactions are very important!

Themes: RNA and RNA Processing. Messenger RNA (mrna) What is a gene? RNA is very versatile! RNA-RNA interactions are very important! Themes: RNA is very versatile! RNA and RNA Processing Chapter 14 RNA-RNA interactions are very important! Prokaryotes and Eukaryotes have many important differences. Messenger RNA (mrna) Carries genetic

More information

DNA makes RNA makes Proteins. The Central Dogma

DNA makes RNA makes Proteins. The Central Dogma DNA makes RNA makes Proteins The Central Dogma TRANSCRIPTION DNA RNA transcript RNA polymerase RNA PROCESSING Exon RNA transcript (pre-mrna) Intron Aminoacyl-tRNA synthetase NUCLEUS CYTOPLASM FORMATION

More information

Genscan. The Genscan HMM model Training Genscan Validating Genscan. (c) Devika Subramanian,

Genscan. The Genscan HMM model Training Genscan Validating Genscan. (c) Devika Subramanian, Genscan The Genscan HMM model Training Genscan Validating Genscan (c) Devika Subramanian, 2009 96 Gene structure assumed by Genscan donor site acceptor site (c) Devika Subramanian, 2009 97 A simple model

More information

Agenda. Annotation of Drosophila. Muller element nomenclature. Annotation: Adding labels to a sequence. GEP Drosophila annotation projects 01/03/2018

Agenda. Annotation of Drosophila. Muller element nomenclature. Annotation: Adding labels to a sequence. GEP Drosophila annotation projects 01/03/2018 Agenda Annotation of Drosophila January 2018 Overview of the GEP annotation project GEP annotation strategy Types of evidence Analysis tools Web databases Annotation of a single isoform (walkthrough) Wilson

More information

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015 Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH BIOL 7210 A Computational Genomics 2/18/2015 The $1,000 genome is here! http://www.illumina.com/systems/hiseq-x-sequencing-system.ilmn Bioinformatics bottleneck

More information

CHAPTERS , 17: Eukaryotic Genetics

CHAPTERS , 17: Eukaryotic Genetics CHAPTERS 14.1 14.6, 17: Eukaryotic Genetics 1. Review the levels of DNA packing within the eukaryote nucleus. Label each level. (A similar diagram is on pg 188 of your textbook.) 2. How do the coding regions

More information

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence

Agenda. Web Databases for Drosophila. Gene annotation workflow. GEP Drosophila annotation projects 01/01/2018. Annotation adding labels to a sequence Agenda GEP annotation project overview Web Databases for Drosophila An introduction to web tools, databases and NCBI BLAST Web databases for Drosophila annotation UCSC Genome Browser NCBI / BLAST FlyBase

More information

Make the protein through the genetic dogma process.

Make the protein through the genetic dogma process. Make the protein through the genetic dogma process. Coding Strand 5 AGCAATCATGGATTGGGTACATTTGTAACTGT 3 Template Strand mrna Protein Complete the table. DNA strand DNA s strand G mrna A C U G T A T Amino

More information

Training materials.

Training materials. Training materials Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ If you wish to re-use these materials, please credit Ensembl for their creation

More information

From DNA to Protein: Genotype to Phenotype

From DNA to Protein: Genotype to Phenotype 12 From DNA to Protein: Genotype to Phenotype 12.1 What Is the Evidence that Genes Code for Proteins? The gene-enzyme relationship is one-gene, one-polypeptide relationship. Example: In hemoglobin, each

More information

Assessing De-Novo Transcriptome Assemblies

Assessing De-Novo Transcriptome Assemblies Assessing De-Novo Transcriptome Assemblies Shawn T. O Neil Center for Genome Research and Biocomputing Oregon State University Scott J. Emrich University of Notre Dame 100K Contigs, Perfect 1M Contigs,

More information

BEADLE & TATUM EXPERIMENT

BEADLE & TATUM EXPERIMENT FROM DNA TO PROTEINS: gene expression Chapter 14 LECTURE OBJECTIVES What Is the Evidence that Genes Code for Proteins? How Does Information Flow from Genes to Proteins? How Is the Information Content in

More information

Methods and Algorithms for Gene Prediction

Methods and Algorithms for Gene Prediction Methods and Algorithms for Gene Prediction Chaochun Wei 韦朝春 Sc.D. ccwei@sjtu.edu.cn http://cbb.sjtu.edu.cn/~ccwei Shanghai Jiao Tong University Shanghai Center for Bioinformation Technology 5/12/2011 K-J-C

More information

Machine Learning Methods for RNA-seq-based Transcriptome Reconstruction

Machine Learning Methods for RNA-seq-based Transcriptome Reconstruction Machine Learning Methods for RNA-seq-based Transcriptome Reconstruction Gunnar Rätsch Friedrich Miescher Laboratory Max Planck Society, Tübingen, Germany NGS Bioinformatics Meeting, Paris (March 24, 2010)

More information

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018

Outline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018 Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT

More information

DNA Function: Information Transmission

DNA Function: Information Transmission DNA Function: Information Transmission DNA is called the code of life. What does it code for? *the information ( code ) to make proteins! Why are proteins so important? Nearly every function of a living

More information

measuring gene expression December 5, 2017

measuring gene expression December 5, 2017 measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA

More information

Fig Ch 17: From Gene to Protein

Fig Ch 17: From Gene to Protein Fig. 17-1 Ch 17: From Gene to Protein Basic Principles of Transcription and Translation RNA is the intermediate between genes and the proteins for which they code Transcription is the synthesis of RNA

More information

Unit 6: Molecular Genetics & DNA Technology Guided Reading Questions (100 pts total)

Unit 6: Molecular Genetics & DNA Technology Guided Reading Questions (100 pts total) Name: AP Biology Biology, Campbell and Reece, 7th Edition Adapted from chapter reading guides originally created by Lynn Miriello Chapter 16 The Molecular Basis of Inheritance Unit 6: Molecular Genetics

More information

Molecular Biology Primer. CptS 580, Computational Genomics, Spring 09

Molecular Biology Primer. CptS 580, Computational Genomics, Spring 09 Molecular Biology Primer pts 580, omputational enomics, Spring 09 Starting 19 th century What do we know of cellular biology? ell as a fundamental building block 1850s+: ``DNA was discovered by Friedrich

More information

Transcription in Eukaryotes

Transcription in Eukaryotes Transcription in Eukaryotes Biology I Hayder A Giha Transcription Transcription is a DNA-directed synthesis of RNA, which is the first step in gene expression. Gene expression, is transformation of the

More information

Gene Expression: Transcription

Gene Expression: Transcription Gene Expression: Transcription The majority of genes are expressed as the proteins they encode. The process occurs in two steps: Transcription = DNA RNA Translation = RNA protein Taken together, they make

More information

Bioinformatics for plant genome annotation. Mark Fiers

Bioinformatics for plant genome annotation. Mark Fiers Bioinformatics for plant genome annotation Mark Fiers Promoter: Prof. Dr. W.J. Stiekema Hoogleraar Genoominformatica Laboratorium voor Bioinformatica Wageningen Universiteit Copromoter: Dr. Ir. J.P. Nap

More information

Year III Pharm.D Dr. V. Chitra

Year III Pharm.D Dr. V. Chitra Year III Pharm.D Dr. V. Chitra 1 Genome entire genetic material of an individual Transcriptome set of transcribed sequences Proteome set of proteins encoded by the genome 2 Only one strand of DNA serves

More information

CHAPTER 21 LECTURE SLIDES

CHAPTER 21 LECTURE SLIDES CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz]

Files for this Tutorial: All files needed for this tutorial are compressed into a single archive: [BLAST_Intro.tar.gz] BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Prequisites: None Resources: The BLAST web

More information

Student Learning Outcomes (SLOS)

Student Learning Outcomes (SLOS) Student Learning Outcomes (SLOS) KNOWLEDGE AND LEARNING SKILLS USE OF KNOWLEDGE AND LEARNING SKILLS - how to use Annhyb to save and manage sequences - how to use BLAST to compare sequences - how to get

More information

Eukaryotic Gene Structure

Eukaryotic Gene Structure Eukaryotic Gene Structure Terminology Genome entire genetic material of an individual Transcriptome set of transcribed sequences Proteome set of proteins encoded by the genome 2 Gene Basic physical and

More information

Chapter 13. From DNA to Protein

Chapter 13. From DNA to Protein Chapter 13 From DNA to Protein Proteins All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequenceof a gene The Path From Genes to

More information

Introduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks

Introduction to Bioinformatics CPSC 265. What is bioinformatics? Textbooks Introduction to Bioinformatics CPSC 265 Thanks to Jonathan Pevsner, Ph.D. Textbooks Johnathan Pevsner, who I stole most of these slides from (thanks!) has written a textbook, Bioinformatics and Functional

More information

Discovering Common Sequence Variation in A. thaliana. Gunnar Rätsch

Discovering Common Sequence Variation in A. thaliana. Gunnar Rätsch Machine Learning Methods for Discovering Common Sequence Variation in A. thaliana Gunnar Rätsch Friedrich Miescher Laboratory, Max Planck Society, Tübingen Technical University Berlin March 31, 2008 Current

More information

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing

More information

Wednesday, November 22, 17. Exons and Introns

Wednesday, November 22, 17. Exons and Introns Exons and Introns Introns and Exons Exons: coded regions of DNA that get transcribed and translated into proteins make up 5% of the genome Introns and Exons Introns: non-coded regions of DNA Must be removed

More information

Chapter 12. DNA TRANSCRIPTION and TRANSLATION

Chapter 12. DNA TRANSCRIPTION and TRANSLATION Chapter 12 DNA TRANSCRIPTION and TRANSLATION 12-3 RNA and Protein Synthesis WARM UP What are proteins? Where do they come from? From DNA to RNA to Protein DNA in our cells carry the instructions for making

More information

PROTEIN SYNTHESIS Flow of Genetic Information The flow of genetic information can be symbolized as: DNA RNA Protein

PROTEIN SYNTHESIS Flow of Genetic Information The flow of genetic information can be symbolized as: DNA RNA Protein PROTEIN SYNTHESIS Flow of Genetic Information The flow of genetic information can be symbolized as: DNA RNA Protein This is also known as: The central dogma of molecular biology Protein Proteins are made

More information

RNA : functional role

RNA : functional role RNA : functional role Hamad Yaseen, PhD MLS Department, FAHS Hamad.ali@hsc.edu.kw RNA mrna rrna trna 1 From DNA to Protein -Outline- From DNA to RNA From RNA to Protein From DNA to RNA Transcription: Copying

More information

RNA-Seq with the Tuxedo Suite

RNA-Seq with the Tuxedo Suite RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop The Basic Tuxedo Suite References Trapnell C, et al. 2009 TopHat: discovering splice junctions with

More information

MOLECULAR GENETICS PROTEIN SYNTHESIS. Molecular Genetics Activity #2 page 1

MOLECULAR GENETICS PROTEIN SYNTHESIS. Molecular Genetics Activity #2 page 1 AP BIOLOGY MOLECULAR GENETICS ACTIVITY #2 NAME DATE HOUR PROTEIN SYNTHESIS Molecular Genetics Activity #2 page 1 GENETIC CODE PROTEIN SYNTHESIS OVERVIEW Molecular Genetics Activity #2 page 2 PROTEIN SYNTHESIS

More information

Analysis of Biological Sequences SPH

Analysis of Biological Sequences SPH Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,

More information

Genomes summary. Bacterial genome sizes

Genomes summary. Bacterial genome sizes Genomes summary 1. >930 bacterial genomes sequenced. 2. Circular. Genes densely packed. 3. 2-10 Mbases, 470-7,000 genes 4. Genomes of >200 eukaryotes (45 higher ) sequenced. 5. Linear chromosomes 6. On

More information

ELE4120 Bioinformatics. Tutorial 5

ELE4120 Bioinformatics. Tutorial 5 ELE4120 Bioinformatics Tutorial 5 1 1. Database Content GenBank RefSeq TPA UniProt 2. Database Searches 2 Databases A common situation for alignment is to search through a database to retrieve the similar

More information

Mapping strategies for sequence reads

Mapping strategies for sequence reads Mapping strategies for sequence reads Ernest Turro University of Cambridge 21 Oct 2013 Quantification A basic aim in genomics is working out the contents of a biological sample. 1. What distinct elements

More information

The Genetic Code and Transcription. Chapter 12 Honors Genetics Ms. Susan Chabot

The Genetic Code and Transcription. Chapter 12 Honors Genetics Ms. Susan Chabot The Genetic Code and Transcription Chapter 12 Honors Genetics Ms. Susan Chabot TRANSCRIPTION Copy SAME language DNA to RNA Nucleic Acid to Nucleic Acid TRANSLATION Copy DIFFERENT language RNA to Amino

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

Bioinformatics for Proteomics. Ann Loraine

Bioinformatics for Proteomics. Ann Loraine Bioinformatics for Proteomics Ann Loraine aloraine@uab.edu What is bioinformatics? The science of collecting, processing, organizing, storing, analyzing, and mining biological information, especially data

More information

Genetics Biology 331 Exam 3B Spring 2015

Genetics Biology 331 Exam 3B Spring 2015 Genetics Biology 331 Exam 3B Spring 2015 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) DNA methylation may be a significant mode of genetic regulation

More information

Chapter 14 Active Reading Guide From Gene to Protein

Chapter 14 Active Reading Guide From Gene to Protein Name: AP Biology Mr. Croft Chapter 14 Active Reading Guide From Gene to Protein This is going to be a very long journey, but it is crucial to your understanding of biology. Work on this chapter a single

More information

Chapter 17: From Gene to Protein

Chapter 17: From Gene to Protein Name Period This is going to be a very long journey, but it is crucial to your understanding of biology. Work on this chapter a single concept at a time, and expect to spend at least 6 hours to truly master

More information

Genie Gene Finding in Drosophila melanogaster

Genie Gene Finding in Drosophila melanogaster Methods Gene Finding in Drosophila melanogaster Martin G. Reese, 1,2,4 David Kulp, 2 Hari Tammana, 2 and David Haussler 2,3 1 Berkeley Drosophila Genome Project, Department of Molecular and Cell Biology,

More information

Last Update: 12/31/2017. Recommended Background Tutorial: An Introduction to NCBI BLAST

Last Update: 12/31/2017. Recommended Background Tutorial: An Introduction to NCBI BLAST BLAST Exercise: Detecting and Interpreting Genetic Homology Adapted by T. Cordonnier, C. Shaffer, W. Leung and SCR Elgin from Detecting and Interpreting Genetic Homology by Dr. J. Buhler Recommended Background

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang

Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang Ruth Howe Bio 434W April 1, 2010 INTRODUCTION De novo annotation is the process by which a finished genomic sequence is searched for

More information

7.2 Protein Synthesis. From DNA to Protein Animation

7.2 Protein Synthesis. From DNA to Protein Animation 7.2 Protein Synthesis From DNA to Protein Animation Proteins Why are proteins so important? They break down your food They build up muscles They send signals through your brain that control your body They

More information

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer:

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer: Sequence Variations Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms NCBI SNP Primer: http://www.ncbi.nlm.nih.gov/about/primer/snps.html Overview Mutation and Alleles Linkage Genetic variation

More information
