Grundlagen der Bioinformatik, SoSe 11, D. Huson, July 4, 2011
Gene Prediction Using HMMs

This exposition is based on the following source, which is recommended reading:

1. Chris Burge and Samuel Karlin. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology, 268:78-94 (1997).

Introduction

In the 1960s, it was discovered that a gene and its protein product are colinear structures, with a direct correspondence between the triplets of nucleotides in the gene and the amino acids in the protein. It soon became clear that genes can be difficult to determine, due to the existence of overlapping genes, genes within genes, etc. Moreover, the paradox arose that the genome size of many eukaryotes does not correspond to genetic complexity; for example, the salamander genome is 10 times the size of that of human.

In 1977, the surprising discovery of split genes was made: genes that consist of multiple pieces of coding DNA called exons, separated by stretches of non-coding DNA called introns.

[Figure: gene expression in a prokaryote (DNA, transcription, mRNA, translation, protein) and in a eukaryote (DNA, transcription in the nucleus, RNA, splicing, mRNA, translation, protein).]

The existence of split genes and a large proportion of non-coding DNA makes the following problem challenging in eukaryotes:

The gene finding problem: Given a DNA sequence, correctly predict the structure of every gene contained in the sequence.

Types of genes

Until quite recently, the word gene was usually understood to mean a coding gene that is translated into a protein sequence. We now distinguish between protein-coding genes and non-coding genes. The latter, also known as RNA genes, code for RNAs such as rRNA, tRNA, snRNA (small nuclear RNA), snoRNA (small nucleolar RNA),
miRNAs (micro RNAs), ...

In this chapter, we focus on the prediction of protein-coding genes.

ORF prediction in prokaryotes

The simplest way to detect potential coding regions in prokaryotes is to look for so-called open reading frames:

Definition (Open reading frame (ORF)) An ORF is a sequence of codons in DNA that starts with a start codon (ATG), ends with a stop codon (TAA, TAG or TGA) and has no other (in-frame) stop codons inside.

Assume that we are at a start codon. In random DNA, the number $k$ of codons up to and including the next stop codon is geometrically distributed with parameter

$p = \frac{\#\,\text{stop codons}}{\#\,\text{all codons}} = \frac{3}{64}$,

so the probability that the next stop codon is exactly $k$ codons away is $P(k) = (1-p)^{k-1}\,p$. The mean value is $\frac{1}{p} = \frac{64}{3} \approx 21.3$ codons. This is much smaller than the number of codons in an average protein ($\approx 300$). Essentially, long ORFs indicate genes, whereas short ORFs may or may not indicate genes or short exons.

Codon usage

Additionally, codon usage can be taken into account:

Definition (Codon usage) The codon usage of a stretch of DNA sequence is given by a 64-component vector that counts how many times each codon is present in the sequence.

These counts differ substantially between coding and non-coding regions. Example: part of the codon usage vector for coding regions of E. coli:

Amino acid             Codon   Freq. per 1000 codons
Gly (glycine)          GGG     1.89
Gly                    GGA     0.44
Gly                    GGT     ...
Gly                    GGC     ...
Glu (glutamic acid)    GAG     ...
Glu                    GAA     57.2
Asp (aspartic acid)    GAT     ...
Asp                    GAC     ...

Eukaryotic gene structure

Here is a simple model of the structure of a eukaryotic gene:

[Figure: promoter (TATA), 5' UTR, start site (ATG), initial exon, intron (donor site GT ... acceptor site AG), internal exon(s), intron (GT ... AG), terminal exon, stop site (TAA, TAG or TGA), 3' UTR, poly-A signal (AAATAAAA).]
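Returning to the prokaryotic case, the ORF definition, the geometric stop-codon model and the codon usage vector described above can be illustrated with a minimal Python sketch. The function names and the `min_codons` parameter are assumptions made for this illustration; only the given strand is scanned, and only the first ATG before each in-frame stop codon is reported.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_distance_probability(k, p=3 / 64):
    """P(k) = (1 - p)^(k - 1) * p: the next stop codon is exactly k codons away."""
    return (1 - p) ** (k - 1) * p


def find_orfs(dna, min_codons=0):
    """Return (start, end) index pairs of ORFs in the three frames of one strand.

    start is the index of the A of ATG, end is one past the stop codon.
    min_codons (hypothetical parameter) filters out short ORFs; the length
    counted here includes the start and the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def codon_usage(dna):
    """Count the in-frame codons of a sequence (a sparse 64-component vector)."""
    dna = dna.upper()
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
    print(codon_usage("ATGAAATAG"))
    print(1 / (3 / 64))  # mean distance to the next stop codon, in codons
```

A practical ORF scanner would also process the reverse complement and choose `min_codons` well above the mean of about 21 codons expected in random DNA.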
Given a long segment of DNA sequence, how can we find such genes in it? Statistical properties of the different components of such a gene model can help to predict genes in unannotated DNA. For example, for the bases around the start site, the observed frequencies can be described using a position weight matrix:

[Table: position weight matrix giving the frequency of A, C, G and T at each position (Pos) around the start site.]

This type of matrix is obtained by counting the frequencies of different nucleotides near the start sites in a training set of annotated sequences.

GENSCAN's model

In the remainder of this chapter, we will discuss the details of GENSCAN, a classic program for predicting genes in eukaryotes, which is based on an HMM:

[Figure: GENSCAN's model. Forward (+) strand states: E0+, E1+, E2+ (internal exons), I0+, I1+, I2+ (introns), Einit+ (initial exon), Eterm+ (terminal exon), Esngl+ (single exon gene), F+ (5' UTR), T+ (3' UTR), P+ (promoter), A+ (poly-A signal). Reverse (-) strand states: the same thirteen states with suffix -. N (intergenic region) connects the two strands.]

GENSCAN has 27 states:
- The N state models intergenic regions.
- The P and A states model a promoter region and a poly-A attachment site.
- The F and T states model the 5' and 3' untranslated regions.
- The Einit, Eterm and Esngl states model initial, terminal and single exons, respectively.
- There are three intron states I0, I1 and I2, one for each intron phase.

GENSCAN predicts genes on both strands of the DNA simultaneously, which is why there are two copies of each state (except the intergenic one).
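As a quick check on the state count, the 27 states can be enumerated as one intergenic state plus thirteen states per strand. This is only a sketch: the labels follow the figure, and writing the phase explicitly as E0..E2 and I0..I2 is an assumption about how the phase-0 states are labelled.

```python
# Per-strand states as listed in the figure; the digits denote phase.
PER_STRAND = [
    "P", "F", "Esngl", "Einit",
    "E0", "E1", "E2",
    "I0", "I1", "I2",
    "Eterm", "T", "A",
]


def genscan_states():
    """Return the state labels: N plus a + and a - copy of every other state."""
    states = ["N"]
    for strand in ("+", "-"):
        states.extend(name + strand for name in PER_STRAND)
    return states


if __name__ == "__main__":
    print(len(genscan_states()))
```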
GENSCAN uses an explicit state duration HMM. This is an HMM in which a duration period is explicitly modeled for each state, using a probability distribution. The model is thought of as generating a parse $\pi$, consisting of:
- a sequence of states $q = (q_1, q_2, \ldots, q_n)$, and
- an associated sequence of durations $d = (d_1, d_2, \ldots, d_n)$,
which, using probabilistic models for each of the state types, generates a DNA sequence $S$ of length $L = \sum_{i=1}^{n} d_i$.

The generation of a parse of a given sequence length $L$ proceeds as follows:
1. An initial state $q_1$ is chosen according to an initial distribution $p$ on the states, i.e. $p_i = P(q_1 = Q^{(i)})$, where $Q^{(j)}$ ($j = 1, \ldots, 27$) is an indexing of the states of the model.
2. A state duration or length $d_1$ is generated conditional on the value of $q_1 = Q^{(i)}$ from the duration distribution $f_{Q^{(i)}}$.
3. A sequence segment $s_1$ of length $d_1$ is generated, conditional on $d_1$ and $q_1$, according to an appropriate sequence-generating model for state type $q_1$.
4. The subsequent state $q_2$ is generated, conditional on the value of $q_1$, from the (first-order Markov) state transition matrix $T$, i.e. $T_{i,j} = P(q_{k+1} = Q^{(j)} \mid q_k = Q^{(i)})$.

This process is repeated until the sum $\sum_{i=1}^{n} d_i$ of the state durations first equals or exceeds $L$, at which point the last state duration is appropriately truncated, the final stretch of sequence is generated and the process stops. The resulting sequence is simply the concatenation of the sequence segments, $S = s_1 s_2 \ldots s_n$.

In addition to its topology involving the 27 states and 46 transitions depicted above, the model has four main components:
- a vector of initial probabilities $p$,
- a matrix of state transition probabilities $T$,
- a set of length distributions $f$, and
- a set of sequence generating models $P$.
(Recall that an HMM has initial, transition and emission probabilities.)

Computation of a gene prediction

As the GENSCAN model is an HMM, augmented by explicit durations, the algorithms discussed in the HMMs chapter can all be applied (after appropriate modifications to take the durations into account). In particular, a gene prediction can be performed by using the Viterbi algorithm to compute a most probable path through the HMM. Note that the model is set up in such a way that a gene prediction is performed on both strands of the DNA simultaneously.
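The four-step generative procedure described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not GENSCAN's actual model: the two states "N" and "C", the geometric duration samplers and the i.i.d. base emitters in `toy_model` are all hypothetical stand-ins for the real 27 states, length distributions and sequence-generating models.

```python
import random


def sample_parse(L, init, trans, duration, emit, rng):
    """Generate a parse (states, durations) and a sequence of length L."""
    names = list(init)
    # Step 1: choose the initial state from the initial distribution p.
    state = rng.choices(names, weights=[init[s] for s in names])[0]
    states, durations, segments = [], [], []
    total = 0
    while total < L:
        d = duration[state](rng)        # Step 2: duration from f for this state
        d = min(d, L - total)           # truncate the last duration at length L
        segments.append(emit[state](d, rng))  # Step 3: segment of length d
        states.append(state)
        durations.append(d)
        total += d
        # Step 4: next state from the transition matrix T.
        successors = list(trans[state])
        weights = [trans[state][s] for s in successors]
        state = rng.choices(successors, weights=weights)[0]
    return states, durations, "".join(segments)


def geometric(mean):
    """Hypothetical duration model: geometric on 1, 2, ... with the given mean."""
    def draw(rng):
        d = 1
        while rng.random() >= 1.0 / mean:
            d += 1
        return d
    return draw


def iid_bases(weights):
    """Hypothetical emission model: independent bases with fixed A, C, G, T weights."""
    def draw(d, rng):
        return "".join(rng.choices("ACGT", weights=weights, k=d))
    return draw


def toy_model():
    """A two-state toy model (assumption): N = intergenic-like, C = coding-like."""
    init = {"N": 0.9, "C": 0.1}
    trans = {"N": {"C": 1.0}, "C": {"N": 1.0}}
    duration = {"N": geometric(50), "C": geometric(20)}
    emit = {"N": iid_bases([3, 2, 2, 3]), "C": iid_bases([2, 3, 3, 2])}
    return init, trans, duration, emit


if __name__ == "__main__":
    states, durations, seq = sample_parse(120, *toy_model(), random.Random(0))
    print(list(zip(states, durations)))
    print(seq)
```

Because durations are drawn explicitly, the toy transition matrix has no self-transitions; the time spent in a state is governed entirely by its duration distribution.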
Details of the model

So far, we have discussed the topology and the other main components of the GENSCAN model in general terms. The following details still need to be discussed:
- different intron phases,
- the initial and transition probabilities,
- the state length distributions,
- transcriptional and translational signals,
- splice signals, and
- reverse-strand states.
Due to time constraints, we will only consider a few of these issues.

Intron phase

Splicing in genes does not respect codon boundaries. Hence, any exon (except the terminal one) may end with an overhang of 0, 1 or 2 nucleotides, and the HMM tracks this by transitioning to an intron state of phase 0, 1 or 2, respectively. When we enter an exon state from an intron state of phase i, we must first generate the appropriate number of nucleotides to complete the overhanging codon of the previous exon before generating further codons.

[Figure: an exon TA TGT GTT ACT CGC GCT CGC TT lying between a phase 1 intron and a phase 2 intron. The leading TA completes the codon left open by the previous exon, and the trailing TT is an overhang of two nucleotides.]

12.9 State length distributions for exons and introns

The different states of the model correspond to sequence segments with different length distributions. For some states, especially internal exon states $E_k$, length is important for proper biological function, i.e. proper splicing and inclusion in the final processed mRNA. It has been shown in vivo that internal deletions of exons to sizes below about 50 bp may often lead to exon skipping, and steric interference between factors recognizing splice sites may make splicing of small exons more difficult. Spliceosomal assembly may be inhibited if internal exons are expanded beyond 300 bp. In summary, these arguments support the observation that internal exons are usually of intermediate length, with only a few of length less than 50 bp or more than 300 bp. Constraints for initial and terminal exons are slightly different.
The duration in initial, internal and terminal exon states is modeled by a different empirical distribution for each of the types of states.
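The intron phase bookkeeping described earlier in this section can be made concrete with a short Python sketch. It reuses the exon from the intron phase figure; the function names are assumptions for this illustration, and the exon is assumed to be long enough to complete the codon carried over from the previous exon.

```python
def next_phase(phase_in, exon_length):
    """Phase of the intron that follows an exon entered from an intron of phase_in."""
    return (phase_in + exon_length) % 3


def split_exon(exon, phase_in):
    """Split an exon into (completing nucleotides, whole codons, overhang)."""
    head = (3 - phase_in) % 3            # nucleotides needed to finish the open codon
    completing, rest = exon[:head], exon[head:]
    full = len(rest) - len(rest) % 3     # length covered by whole codons
    codons = [rest[i:i + 3] for i in range(0, full, 3)]
    return completing, codons, rest[full:]


if __name__ == "__main__":
    # Exon taken from the figure: preceded by a phase 1 intron.
    exon = "".join(["TA", "TGT", "GTT", "ACT", "CGC", "GCT", "CGC", "TT"])
    print(split_exon(exon, 1))
    print(next_phase(1, len(exon)))
```

For this exon, two nucleotides complete the previous codon, six whole codons follow, and the two-nucleotide overhang means the next intron has phase 2, in agreement with the figure.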
[Figure: histograms of the number of exons against length (bp), in three panels: initial exons, internal exons, terminal exons.]

In contrast to exons, the length of introns does not seem biologically critical, although a minimum length of 70-80 bp may be preferred. The length distribution for introns appears to be approximately geometric (exponential), with the average length depending on the C+G content of the sequence (the higher the content, the shorter the introns, on average). Hence, the duration in intron states is modeled by a geometric distribution with parameter $q$ that depends on the C+G content.

[Figure: histogram of the number of introns against length (bp), with lengths up to 8k.]

12.10 Simple signal models

There are a number of different models of biological signal sequences, such as donor and acceptor sites, promoters, etc. One simple approach is the weight matrix method (WMM). Here, the frequency $p^{(i)}_a$ of each nucleotide $a$ at position $i$ of a signal of length $n$ is derived from a collection of aligned signal sequences. The product

$P(A) = \prod_{i=1}^{n} p^{(i)}_{a_i}$

is used to estimate the probability of generating a particular sequence $A = a_1 a_2 \ldots a_n$.

Here is a WMM for recognition of a start site:

[Table: WMM giving the frequency of A, C, G and T at each position (Pos) around the start site.]

Under this model, the sequence ...ccgccacc ATG GCGC... has the highest probability of containing a start site, namely: P = ... = .6. The sequence ...agtttttt ATG TAAT... has the lowest non-zero probability of containing a start site at the indicated position, namely: P = ...
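To make the weight matrix method concrete, the following Python sketch estimates a WMM from aligned signal sequences and evaluates the product $P(A)$ for a candidate sequence. The four training sequences are invented for illustration and are not the data behind the start-site matrix of these notes; the function names are likewise assumptions.

```python
from collections import Counter


def build_wmm(aligned, alphabet="ACGT"):
    """Estimate position-specific nucleotide frequencies from aligned signals."""
    n = len(aligned[0])
    if any(len(s) != n for s in aligned):
        raise ValueError("all training sequences must have the same length")
    wmm = []
    for i in range(n):
        counts = Counter(s[i] for s in aligned)
        wmm.append({a: counts[a] / len(aligned) for a in alphabet})
    return wmm


def wmm_probability(wmm, seq):
    """P(A): product over positions i of the frequency of a_i at position i."""
    if len(seq) != len(wmm):
        raise ValueError("sequence length must match the signal length")
    prob = 1.0
    for column, a in zip(wmm, seq):
        prob *= column[a]
    return prob


if __name__ == "__main__":
    # Hypothetical aligned start-site windows (illustrative data only).
    training = ["CACCATGG", "CGCCATGG", "AACCATGA", "CACCATGG"]
    wmm = build_wmm(training)
    print(wmm_probability(wmm, "CACCATGG"))
    print(wmm_probability(wmm, "AGCCATGA"))
    print(wmm_probability(wmm, "TACCATGG"))
```

A nucleotide never seen at some position gives probability zero for the whole sequence, which is why practical implementations often add pseudocounts and work with log-probabilities; neither refinement is shown here.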
Splice signals

The donor and acceptor splice signals are probably the most important signals, as the majority of exons are internal ones. The consensus region of the donor splice sites covers the last 3 bp of the exon (positions -3 to -1) and the first 6 bp of the succeeding intron (positions 1 to 6):

             ... exon     | intron ...
Position     -3    -2  -1 | 1  2  3    4  5  6
Consensus    c/a   A   G  | G  T  a/g  A  G  t

[Table: WMM giving the frequency of A, C, G and T at each of these nine positions.]

Example of GENSCAN summary output

GENSCAN 1.0   Date run: 28-Apr-14   Time: 2:56:56
Sequence HUMAN DNA : ... bp : 52.9% C+G : Isochore 3 (51-57 C+G%)
Parameter matrix: HumanIso.smat
Predicted genes/exons:
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
[Rows: the Type column reads Intr, Intr, Term, PlyA, PlyA, Term, Intr, Intr, Init, Prom, PlyA, Sngl, Prom, Prom.]

Performance of GENSCAN

GENSCAN was run on a test set of 570 vertebrate sequences, and the forward-strand exons in the optimal GENSCAN parse of each sequence were compared to the annotated exons. The following table shows the results and compares them with results obtained using other programs:
(Source: Burge and Karlin 1997)

GENSCAN performs very well here and is a widely used gene finding method.

Summary

A straightforward approach to finding genes in prokaryotes is to search for long open reading frames that have a plausible codon usage. Gene finding in eukaryotes is more complicated, due to splicing. GENSCAN is a popular tool for solving this problem. It uses an explicit-duration HMM to model genes, and the Viterbi algorithm is used to produce a parse. GENSCAN can be run at
More informationDo you think DNA is important? T.V shows Movies Biotech Films News Cloning Genetic Engineering
DNA Introduction Do you think DNA is important? T.V shows Movies Biotech Films News Cloning Genetic Engineering At the most basic level DNA is a set of instructions for protein construction. Structural
More informationInterpretation of sequence results
Interpretation of sequence results An overview on DNA sequencing: DNA sequencing involves the determination of the sequence of nucleotides in a sample of DNA. It use a modified PCR reaction where both
More informationBio 101 Sample questions: Chapter 10
Bio 101 Sample questions: Chapter 10 1. Which of the following is NOT needed for DNA replication? A. nucleotides B. ribosomes C. Enzymes (like polymerases) D. DNA E. all of the above are needed 2 The information
More informationThr Gly Tyr. Gly Lys Asn
Your unique body characteristics (traits), such as hair color or blood type, are determined by the proteins your body produces. Proteins are the building blocks of life - in fact, about 45% of the human
More informationDisease and selection in the human genome 3
Disease and selection in the human genome 3 Ka/Ks revisited Please sit in row K or forward RBFD: human populations, adaptation and immunity Neandertal Museum, Mettman Germany Sequence genome Measure expression
More informationSAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer
TEACHER S GUIDE SAY IT WITH DNA: Protein Synthesis Activity by Larry Flammer SYNOPSIS This activity uses the metaphor of decoding a secret message for the Protein Synthesis process. Students teach themselves
More informationGene Prediction Chengwei Luo, Amanda McCook, Nadeem Bulsara, Phillip Lee, Neha Gupta, and Divya Anjan Kumar
Gene Prediction Chengwei Luo, Amanda McCook, Nadeem Bulsara, Phillip Lee, Neha Gupta, and Divya Anjan Kumar Gene Prediction Introduction Protein-coding gene prediction RNA gene prediction Modification
More informationChapter 10: Gene Expression and Regulation
Chapter 10: Gene Expression and Regulation Fact 1: DNA contains information but is unable to carry out actions Fact 2: Proteins are the workhorses but contain no information THUS Information in DNA must
More informationGene Prediction. Srivani Narra Indian Institute of Technology Kanpur
Gene Prediction Srivani Narra Indian Institute of Technology Kanpur Email: srivani@iitk.ac.in Supervisor: Prof. Harish Karnick Indian Institute of Technology Kanpur Email: hk@iitk.ac.in Keywords: DNA,
More informationmeasuring gene expression December 5, 2017
measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA
More information90 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 4, 2006
90 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 4, 2006 8 RNA Secondary Structure Sources for this lecture: R. Durbin, S. Eddy, A. Krogh und G. Mitchison. Biological sequence analysis,
More informationAdv Biology: DNA and RNA Study Guide
Adv Biology: DNA and RNA Study Guide Chapter 12 Vocabulary -Notes What experiments led up to the discovery of DNA being the hereditary material? o The discovery that DNA is the genetic code involved many
More informationRNA and PROTEIN SYNTHESIS. Chapter 13
RNA and PROTEIN SYNTHESIS Chapter 13 DNA Double stranded Thymine Sugar is RNA Single stranded Uracil Sugar is Ribose Deoxyribose Types of RNA 1. Messenger RNA (mrna) Carries copies of instructions from
More informationPauling/Itano Experiment
Chapter 12 Pauling/Itano Experiment Linus Pauling and Harvey Itano knew that hemoglobin, a molecule in red blood cells, contained an electrical charge. They wanted to see if the hemoglobin in normal RBC
More informationBio11 Announcements. Ch 21: DNA Biology and Technology. DNA Functions. DNA and RNA Structure. How do DNA and RNA differ? What are genes?
Bio11 Announcements TODAY Genetics (review) and quiz (CP #4) Structure and function of DNA Extra credit due today Next week in lab: Case study presentations Following week: Lab Quiz 2 Ch 21: DNA Biology
More informationBiomolecules: lecture 6
Biomolecules: lecture 6 - to learn the basics on how DNA serves to make RNA = transcription - to learn how the genetic code instructs protein synthesis - to learn the basics on how proteins are synthesized
More informationBioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012
Bioinformatics ONE Introduction to Biology Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012 Biology Review DNA RNA Proteins Central Dogma Transcription Translation
More informationGene Expression - Transcription
DNA Gene Expression - Transcription Genes are expressed as encoded proteins in a 2 step process: transcription + translation Central dogma of biology: DNA RNA protein Transcription: copy DNA strand making
More informationMultiple choice questions (numbers in brackets indicate the number of correct answers)
1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer
More informationChimp Sequence Annotation: Region 2_3
Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker
More informationAnnotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence
Annotating 7G24-63 Justin Richner May 4, 2005 Zfh2 exons Thd1 exons Pur-alpha exons 0 40 kb 8 = 1 kb = LINE, Penelope = DNA/Transib, Transib1 = DINE = Novel Repeat = LTR/PAO, Diver2 I = LTR/Gypsy, Invader
More information