Gene & genome organisation. Computational gene identification


2 Eubacterial gene


6 Eukaryotic gene
[Figure: a eukaryotic gene on DNA with labels: regulatory elements, promoter, transcription start, translation start, exons, introns, translation stop, poly(A) signal, transcription stop]

7 Promoters in eukaryotic DNA
1. TATA box
2. Initiators: 5' Y Y A(+1) N [T/A] Y Y Y 3' (Y = pyrimidine, N = any nucleotide)
3. CpG islands

8 EMBOSS CpGPlot
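
CpG island plots are usually built from two window statistics: the GC fraction and the observed/expected CpG ratio, (number of CG dinucleotides x window length) / (number of C x number of G). The sketch below is only an illustration of those two statistics, not the EMBOSS implementation; the function names, window size and step are assumptions made for the example.

```python
def gc_fraction(window: str) -> float:
    """Fraction of C and G bases in the window."""
    s = window.upper()
    return (s.count("C") + s.count("G")) / len(s) if s else 0.0


def cpg_obs_exp(window: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G)."""
    s = window.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio is undefined without both bases; report 0
    return s.count("CG") * len(s) / (c * g)


def scan(seq: str, window: int = 100, step: int = 1):
    """Yield (start, GC fraction, obs/exp CpG) for each window (illustrative defaults)."""
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        yield start, gc_fraction(w), cpg_obs_exp(w)
```

Windows with a high GC fraction and an observed/expected ratio well above the genomic background are candidates for CpG islands; the exact thresholds are a parameter choice of the tool used.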

9 Splicing
[Figure labels: RNA (primary transcript), poly(A) tail, RNA (spliced), translation, protein]


14 Regulated splicing
Exons may be combined differently during splicing. In this way one gene can give rise to multiple forms of a protein.
[Figure: primary RNA transcript and its splice variants]

15 Alternative splicing of the α-tropomyosin pre-mRNA
[Figure: splice variants in nonmuscle, smooth muscle, striated muscle, striated muscle', hepatoma and brain]


17 Higher eukaryote genomes contain a substantial amount of non-coding sequence

Genome                   Size      No. of genes   Genes / Mb
Homo sapiens             3000 Mb   ~40,000?       ~13
Mycoplasma genitalium    0.6 Mb    ~600           ~1000

18 Repetitive DNA
~50 % of human genomic DNA
Mobile elements:
- viral retrotransposons: common in yeast & Drosophila
- non-viral retrotransposons: common in mammals (LINEs, SINEs)
LINEs: long interspersed elements, 6-7 kb
LINE1: 600,000 copies in the human genome = 15 % of genomic DNA


20 SINEs: short interspersed elements, ~300 bp
- Sequence conservation ~80 % within the same species
- Alu sequences are the most abundant class of SINE: 1 million copies = 10 % of genomic DNA
- Many Alu sequences have cleavage sites for the restriction enzyme AluI (AGCT), hence the name
- Originally derived from SRP RNA by reverse transcription

21 Repetitive sequences

           Repetitive sequences   Genome size
Mammals    [value missing] %      ~3 Gb
Fugu       < 15 %                 ~365 Mb

22 Why so many repetitive elements in higher mammals?
Mobile elements probably had a significant influence on the evolution of higher organisms: novel genes and new controls on gene expression were created because mobile elements have served as sites for recombination, leading to gene duplications and other gene rearrangements (exon shuffling).

23 Detection of repetitive DNA Dotplot analysis
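
A dot plot marks every pair of positions at which two sequences share a matching word. When a sequence is compared with itself, the main diagonal is trivial and repeats show up as off-diagonal hits. The following is a minimal sketch under simple assumptions (exact word matches only, word length `k` chosen arbitrarily, function names hypothetical); real dot plot tools typically add scoring and thresholds.

```python
from collections import defaultdict


def dotplot_matches(x: str, y: str, k: int = 4):
    """Return (i, j) pairs where the k-mer at x[i] equals the k-mer at y[j]."""
    index = defaultdict(list)
    for j in range(len(y) - k + 1):
        index[y[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(x) - k + 1)
            for j in index.get(x[i:i + k], [])]


def render(x: str, y: str, k: int = 4) -> str:
    """Text dot plot: columns are positions in x, rows are positions in y."""
    hits = set(dotplot_matches(x, y, k))
    rows = []
    for j in range(len(y) - k + 1):
        rows.append("".join("*" if (i, j) in hits else "."
                            for i in range(len(x) - k + 1)))
    return "\n".join(rows)


if __name__ == "__main__":
    seq = "ACGTTTACGT"  # toy sequence containing the word ACGT twice
    print(render(seq, seq))
```

In the toy example the repeated word ACGT produces two off-diagonal dots, one on each side of the main diagonal.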

24 Detection of repetitive DNA: RepeatMasker
RepeatMasker is a program that screens DNA sequences for interspersed repeats known to exist in mammalian genomes, as well as for low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all annotated repeats have been masked (replaced by Ns). On average, over 40% of a human genomic DNA sequence is masked by the program. Sequence comparisons in RepeatMasker are performed by the program cross_match, an efficient implementation of the Smith-Waterman-Gotoh algorithm developed by Phil Green.

Example output (most numeric columns are missing):
HSU  (22462)  +  MER7A    DNA/MER2_type  (109)
HSU  (21523)  C  TIGGER1  DNA/MER2_type  (0)
HSU  (21222)  C  AluSx    SINE/Alu       (4)
HSU  (20544)  C  TIGGER1  DNA/MER2_type  (943)
HSU  (20243)  C  AluSg    SINE/Alu       (0)
HSU  (19548)  C  TIGGER1  DNA/MER2_type  (1608)
HSU  (19427)  +  MER7A    DNA/MER2_type  (1)
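
The masking step described above (annotated repeats replaced by Ns) can be illustrated with a few lines of code. This is not RepeatMasker's own code: the function names are hypothetical, and the annotation is assumed to be a list of 1-based, inclusive (begin, end) coordinates on the query.

```python
def mask_repeats(seq: str, annotations) -> str:
    """Replace annotated repeat intervals (1-based, inclusive) with N."""
    masked = list(seq)
    for begin, end in annotations:
        begin, end = max(1, begin), min(len(seq), end)  # clamp to the sequence
        for pos in range(begin - 1, end):
            masked[pos] = "N"
    return "".join(masked)


def masked_fraction(masked_seq: str) -> float:
    """Fraction of the sequence that has been masked."""
    return masked_seq.count("N") / len(masked_seq) if masked_seq else 0.0


if __name__ == "__main__":
    toy = mask_repeats("ACGTACGTAC", [(3, 5), (9, 10)])  # invented coordinates
    print(toy, masked_fraction(toy))
```

The masked sequence keeps its original length, so coordinates of genes predicted on it remain valid for the unmasked sequence.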

25 The human genome contains a substantial number of pseudogenes (non-functional gene variants)
Non-processed pseudogenes:
- Gene duplication has resulted in a new copy of the gene
- The copy has mutated to become non-functional
Processed pseudogenes:
- Non-functional genomic copies of mRNAs
- Often contain multiple mutations

26 Human genome:
* variation of GC content
* longer introns in AT-rich regions


28 Gene prediction methods
- Ab initio, pattern recognition
- Database searching

Identification of ORFs: finding long ORFs
- 3 of the 64 codons are stop codons (UAA, UAG, UGA), so in a random sequence a stop codon is expected about every 64/3 ≈ 21 codons
- Average proteins are much longer
Disadvantages:
- short genes are not detected
- some ORFs are false positives
- not suitable for eukaryotes
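
The long-ORF approach can be sketched as a scan of each reading frame for an ATG followed by an in-frame stop codon, keeping only ORFs above a length cutoff. The code below is a minimal illustration under stated assumptions: only the three forward frames are scanned (the reverse strand would be handled by scanning the reverse complement), only ATG is accepted as a start, and the function name and the cutoff `min_codons` are choices made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA

# In a random sequence with equal base frequencies, 3 of 64 codons are stops.
EXPECTED_CODONS_BETWEEN_STOPS = 64 / 3


def find_orfs(seq: str, min_codons: int = 100):
    """Return (start, end, frame) for ATG...stop ORFs in the forward frames.

    start and end are 0-based, end-exclusive, and the stop codon is included.
    The length test counts codons from the ATG up to, not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG after the previous stop
            elif start is not None and codon in STOPS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs
```

Choosing `min_codons` well above 21 removes most chance ORFs, at the price named in the list of disadvantages: genuinely short genes fall below the cutoff.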


30 Example of awful ORF prediction

LOCUS       AAB aa BCT 03-MAR-1995
DEFINITION  aepH=putative exoenzyme production regulatory peptide [Erwinia
            carotovora, carotovora, Peptide, 47 aa].
ACCESSION   AAB32243
PID         g
VERSION     AAB GI:
DBSOURCE    locus S74077 accession S
KEYWORDS    .
SOURCE      Pectobacterium carotovorum carotovora.
  ORGANISM  Pectobacterium carotovorum
            Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
            Pectobacterium.
REFERENCE   1 (residues 1 to 47)
  AUTHORS   Murata,H., Chatterjee,A., Liu,Y. and Chatterjee,A.K.
  TITLE     Regulation of the production of extracellular pectinase,
            cellulase, and protease in the soft rot bacterium Erwinia
            carotovora subsp. carotovora: evidence that aepH of E.
            carotovora subsp. carotovora 71 activates gene expression in E.
            carotovora subsp. car
  JOURNAL   Appl. Environ. Microbiol. 60 (9), (1994)
  MEDLINE
  REMARK    GenBank staff at the National Library of Medicine created this
            entry [NCBI gibbsq ] from the original journal article.
            This sequence comes from Fig. 2A.
COMMENT     Method: conceptual translation supplied by author.
FEATURES             Location/Qualifiers
     source
                     /organism="Pectobacterium carotovorum"
                     /db_xref="taxon:554"
     Protein
                     /product="aepH"
                     /name="putative exoenzyme production regulatory peptide"
     CDS
                     /gene="aepH+"
                     /coded_by="s : "
                     /note="author translates GTG start as Val"
ORIGIN
        1 vgqepkgies rkiqdghvrk kvgrqqglwv rttkkekfsr msrdanv

31 Compositional bias in coding regions
Codon usage for enteric bacterial (highly expressed) genes 7/19/83
(the values of the Number, /1000 and Fraction columns are missing)

AmAcid  Codon
Gly     GGG
Gly     GGA
Gly     GGU
Gly     GGC
Glu     GAG
Glu     GAA
Asp     GAU
Asp     GAC
Val     GUG
Val     GUA
Val     GUU
Val     GUC
Ala     GCG
Ala     GCA
Ala     GCU
Ala     GCC
Arg     AGG
Arg     AGA
Ser     AGU
Ser     AGC

32 CodonPreference
A codon preference plot is constructed by calculating a codon preference statistic for each position of each of the three reading frames. The statistic is calculated over a window of length w, and the window is moved along the sequence in increments of three bases, maintaining the reading frame. The magnitude of the codon preference statistic is a measure of the likeness of a particular window of codons to a predetermined preferred usage.

p = preference parameter = relative likelihood of a codon being found in a gene as opposed to a random sequence:

$p = \dfrac{f_{ABC} / F_{ABC}}{r_{ABC} / R_{ABC}}$

$f_{ABC}$: frequency of codon ABC (found in the frequency table)
$F_{ABC}$: sum of frequencies for all codons that are members of ABC's synonymous family
$r_{ABC}$: frequency of codon ABC in a random sequence
$R_{ABC}$: sum of frequencies of ABC's synonymous family in a random sequence

Codon preference statistic P:

$P = \exp\left( \frac{1}{w} \sum_{i=1}^{w} \log p_i \right)$

w is between 25 and 50
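
The two formulas can be turned into a short sketch. It is not the CodonPreference program itself: the function names are hypothetical, the random-sequence frequency r is assumed to be the product of single-base frequencies (uniform by default), the frequency table is assumed to use the DNA alphabet (T rather than U), the input is assumed to be clean uppercase A/C/G/T, and a small floor is applied to p so that the logarithm stays defined for codons absent from the table.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_preference(codon, freq, base_freq=None):
    """p = (f/F) / (r/R) for one codon, given a codon frequency table `freq`."""
    base_freq = base_freq or {b: 0.25 for b in BASES}
    family = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]

    def random_freq(c):
        return math.prod(base_freq[b] for b in c)

    f = freq.get(codon, 0.0)
    F = sum(freq.get(c, 0.0) for c in family)
    r = random_freq(codon)
    R = sum(random_freq(c) for c in family)
    if F == 0:
        return 1.0  # family absent from the table: treat as uninformative
    return (f / F) / (r / R)


def preference_profile(seq, freq, frame=0, w=25, floor=1e-3):
    """P = exp(sum(log p_i) / w) for each window of w codons in one frame."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    logs = [math.log(max(codon_preference(c, freq), floor)) for c in codons]
    return [math.exp(sum(logs[i:i + w]) / w) for i in range(len(logs) - w + 1)]
```

Running `preference_profile` once per frame (0, 1, 2) gives the three curves of a codon preference plot; the frame with consistently high P is the likely coding frame.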

33 CodonPreference is a frame-specific gene finder that tries to recognize protein coding sequences by virtue of the similarity of their codon usage to a codon frequency table or by the bias of their composition (usually GC) in the third position of each codon.

34 Compositional bias of exons: the k-tuple method
(from Bishop, ed., Guide to Human Genome Computing)

Consider a sequence $S = \{s_i\}$ of length $L$. It can be transformed into a sequence of k-tuples (i.e. oligonucleotides of length k):

$W = \{W_{k,i}\}, \quad i = 1, \dots, L - k + 1; \quad W_{k,i} \in \Omega$

Here $\Omega = \{W_k\}$ is the set of all possible oligonucleotides $W_k$ of length k. In this way it is possible to construct a table $F$ with the occurrence frequency $F(W_k)$ for all possible k-tuples of the set of sequences $\{S\}$ having the function of interest.

Consider two sets of sequences $\{S^{(1)}\}$ and $\{S^{(2)}\}$ with mutually exclusive functions, for instance intron and exon. It is possible to calculate the k-tuple frequency tables $F_1$ and $F_2$ for these two sets of sequences. The difference in frequencies between these tables can be used for discrimination.

To analyze a test sequence using the $F_1$ and $F_2$ tables, calculate the local discriminant index for the ith position, where $S_{k,i}$ is the k-tuple starting at position i of the test sequence:

$d(i) = \dfrac{F_1(S_{k,i})}{F_1(S_{k,i}) + F_2(S_{k,i})}$

d(i) is smoothed using an averaging window of 2w+1 consecutive positions:

$\bar{d}(i) = \dfrac{1}{2w+1} \sum_{j=i-w}^{i+w} d(j)$

k = 6 is often used.
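
A compact sketch of the k-tuple method follows: build the two frequency tables, compute d(i) along a test sequence, and smooth it. The function names are hypothetical, and two details not specified above are assumptions of this example: a k-tuple seen in neither training set gets the neutral value 0.5, and the averaging window is truncated at the ends of the sequence.

```python
from collections import Counter


def ktuple_freqs(seqs, k=6):
    """Occurrence frequency F(W_k) of every k-tuple in a set of sequences."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def discriminant(seq, f1, f2, k=6):
    """d(i) = F1 / (F1 + F2) for the k-tuple starting at each position i."""
    d = []
    for i in range(len(seq) - k + 1):
        a = f1.get(seq[i:i + k], 0.0)
        b = f2.get(seq[i:i + k], 0.0)
        d.append(a / (a + b) if a + b > 0 else 0.5)  # unseen k-tuple: neutral
    return d


def smooth(d, w):
    """Average d over up to 2w+1 positions centred on i (truncated at the ends)."""
    out = []
    for i in range(len(d)):
        lo, hi = max(0, i - w), min(len(d), i + w + 1)
        out.append(sum(d[lo:hi]) / (hi - lo))
    return out
```

With exon sequences as set 1 and intron sequences as set 2, smoothed values near 1 suggest exon-like composition and values near 0 suggest intron-like composition.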

35 Local sites (= signals) used in the prediction of genes
- Promoters
- Terminators of transcription
- Start and stop codons
- Splice sites
- Branch points
- Polyadenylation sites
Signal sensors = methods for detecting signals
Content sensors: hexamer counts to discriminate between exons and introns
Gene finding methods: combination of signal and content sensors

36 Gene prediction methods
Ab initio, HMM methods: Genscan, HMMGene, Genie, GeneMark.hmm, FGENEH, GeneID
Ab initio, neural network methods: GRAIL, NetGene2
Homology based: Blast, Procrustes, Genewise

37 Limitations
- Non-coding parts, 5' and 3' UTRs, and non-coding RNAs are not detected
- Lack of suitable training sets of very long genomic sequences
- Methods are conservative: they have been trained on typical genes


Question 2: There are 5 retroelements (2 LINEs and 3 LTRs), 6 unclassified elements (XDMR and XDMR_DM), and 7 satellite sequences. Bio4342 Exercise 1 Answers: Detecting and Interpreting Genetic Homology (Answers prepared by Wilson Leung) Question 1: Low complexity DNA can be described as sequences that consist primarily of one or

More information

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018

Collect, analyze and synthesize. Annotation. Annotation for D. virilis. GEP goals: Evidence Based Annotation. Evidence for Gene Models 12/26/2018 Annotation Annotation for D. virilis Chris Shaffer July 2012 l Big Picture of annotation and then one practical example l This technique may not be the best with other projects (e.g. corn, bacteria) l

More information

How does the human genome stack up? Genomic Size. Genome Size. Number of Genes. Eukaryotic genomes are generally larger.

How does the human genome stack up? Genomic Size. Genome Size. Number of Genes. Eukaryotic genomes are generally larger. How does the human genome stack up? Organism Human (Homo sapiens) Laboratory mouse (M. musculus) Mustard weed (A. thaliana) Roundworm (C. elegans) Fruit fly (D. melanogaster) Yeast (S. cerevisiae) Bacterium

More information

From DNA to Protein. Chapter 14

From DNA to Protein. Chapter 14 From DNA to Protein Chapter 14 What do genes code for? How does DNA code for cells & bodies? How are cells and bodies made from the instructions in DNA? DNA proteins cells bodies The Central Dogma Flow

More information

Problem Set 3

Problem Set 3 Name: 1 Topic 1. 2007 7.013 Problem Set 3 Due before 5 PM on FRIDAY, March 16, 2007. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. You have been studying transcription

More information

Gene Prediction 10/21/05

Gene Prediction 10/21/05 Gene Prediction 1/21/5 1/21/5 Gene Prediction Announcements Eam 2 - net Friday Posted online: Eam 2 Study Guide 544 Reading Assignment (2 papers) (formerly Gene Prediction - ) 1/21/5 D Dobbs ISU - BCB

More information

Analysis of Biological Sequences SPH

Analysis of Biological Sequences SPH Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,

More information

How life. constructs itself.

How life. constructs itself. How life constructs itself Life constructs itself using few simple rules of information processing. On the one hand, there is a set of rules determining how such basic chemical reactions as transcription,

More information

Understanding Genes & Mutations. John A Phillips III May 16, 2005

Understanding Genes & Mutations. John A Phillips III May 16, 2005 Understanding Genes & Mutations John A Phillips III May 16, 2005 Learning Objectives Understand gene structure Become familiar with genetic & mutation databases Be able to find information on genetic variation

More information

Codon Bias with PRISM. 2IM24/25, Fall 2007

Codon Bias with PRISM. 2IM24/25, Fall 2007 Codon Bias with PRISM 2IM24/25, Fall 2007 from RNA to protein mrna vs. trna aminoacid trna anticodon mrna codon codon-anticodon matching Watson-Crick base pairing A U and C G binding first two nucleotide

More information

6.C: Students will explain the purpose and process of transcription and translation using models of DNA and RNA

6.C: Students will explain the purpose and process of transcription and translation using models of DNA and RNA 6.C: Students will explain the purpose and process of transcription and translation using models of DNA and RNA DNA mrna Protein DNA is found in the nucleus, but making a protein occurs at the ribosome

More information

measuring gene expression December 5, 2017

measuring gene expression December 5, 2017 measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA

More information

Genscan. The Genscan HMM model Training Genscan Validating Genscan. (c) Devika Subramanian,

Genscan. The Genscan HMM model Training Genscan Validating Genscan. (c) Devika Subramanian, Genscan The Genscan HMM model Training Genscan Validating Genscan (c) Devika Subramanian, 2009 96 Gene structure assumed by Genscan donor site acceptor site (c) Devika Subramanian, 2009 97 A simple model

More information

Chemistry 121 Winter 17

Chemistry 121 Winter 17 Chemistry 121 Winter 17 Introduction to Organic Chemistry and Biochemistry Instructor Dr. Upali Siriwardane (Ph.D. Ohio State) E-mail: upali@latech.edu Office: 311 Carson Taylor Hall ; Phone: 318-257-4941;

More information

CSE 527 Computational Biology" Gene Prediction"

CSE 527 Computational Biology Gene Prediction CSE 527 Computational Biology" Gene Prediction" Gene Finding: Motivation" Sequence data flooding in" What does it mean?" "protein genes, RNA genes, mitochondria, chloroplast, regulation, replication, structure,

More information

Improved Splice Site Detection in Genie

Improved Splice Site Detection in Genie Improved Splice Site Detection in Genie Martin Reese Informatics Group Human Genome Center Lawrence Berkeley National Laboratory MGReese@lbl.gov http://www-hgc.lbl.gov/inf Santa Fe, 1/23/97 Database Homologies

More information

DNA strands and label their 5 and 3 Top strand. Direction of

DNA strands and label their 5 and 3 Top strand. Direction of Question 1 (20 points) a) In the table below, name the sub-cellular location or organelle(s) of the eukaryotic cell that will fluoresce when the following macromolecules are tagged with a fluorescent dye.

More information

From Gene to Protein. How Genes Work (Ch. 17)

From Gene to Protein. How Genes Work (Ch. 17) From Gene to Protein How Genes Work (Ch. 17) What do genes code for? How does DNA code for cells & bodies? how are cells and bodies made from the instructions in DNA DNA proteins cells bodies The Central

More information

Genes and Proteins. Objectives

Genes and Proteins. Objectives Genes and Proteins Lecture 15 Objectives At the end of this series of lectures, you should be able to: Define terms. Explain the central dogma of molecular biology. Describe the locations, reactants, and

More information

Genomics and Gene Recognition Genes and Blue Genes

Genomics and Gene Recognition Genes and Blue Genes Genomics and Gene Recognition Genes and Blue Genes November 3, 2004 Eukaryotic Gene Structure eukaryotic genomes are considerably more complex than those of prokaryotes eukaryotic cells have organelles

More information

DNA Evolution of knowledge about gene. Contains information about RNAs and proteins. Polynucleotide chains; Double stranded molecule;

DNA Evolution of knowledge about gene. Contains information about RNAs and proteins. Polynucleotide chains; Double stranded molecule; Evolution of knowledge about gene G. Mendel Hereditary factors W.Johannsen, 1909 G.W.Beadle, E.L.Tatum, 1945 Ingram, 1957 Actual concepts The gene hereditary unit located in chromosomes Hypotheses One

More information

GENOME ANNOTATION INTRODUCTION TO CONCEPTS AND METHODS. Olivier GARSMEUR & Stéphanie SIDIBE-BOCS

GENOME ANNOTATION INTRODUCTION TO CONCEPTS AND METHODS. Olivier GARSMEUR & Stéphanie SIDIBE-BOCS GENOME ANNOTATION INTRODUCTION TO CONCEPTS AND METHODS Olivier GARSMEUR & Stéphanie SIDIBE-BOCS Introduction two main concepts: Identify the different elements of the genome, (location and stucture) :

More information

7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided.

7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided. 7.014 Quiz II 3/18/05 Your Name: TA's Name: Write your name on this page and your initials on all the other pages in the space provided. This exam has 10 pages including this coversheet. heck that you

More information

Chapter 10. The Structure and Function of DNA. Lectures by Edward J. Zalisko

Chapter 10. The Structure and Function of DNA. Lectures by Edward J. Zalisko Chapter 10 The Structure and Function of DNA PowerPoint Lectures for Campbell Essential Biology, Fifth Edition, and Campbell Essential Biology with Physiology, Fourth Edition Eric J. Simon, Jean L. Dickey,

More information

You are genetically unique

You are genetically unique BNF 5106 - Lecture 1 Genetics, Genes, Genetic codes, and Mutations You are genetically unique Since each parent has 23 pairs of chromosomes, the probability that each parent gives twice the same chromosomes

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS Nucleic acids are extremely large molecules that were first isolated from the nuclei of cells. Two kinds of nucleic acids are found in cells: RNA (ribonucleic

More information

DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences

DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences Huiqing Liu Hao Han Jinyan Li Limsoon Wong Institute for Infocomm Research, 21 Heng Mui Keng Terrace,

More information