May 16. Gene Finding
- Dennis Atkinson
1 Gene Finding
2 Automaton M = (Q, T, π, Σ, e). Q is a set of states. T is a matrix of transition probabilities; T[j,k] is the probability of moving from state j to state k. Σ is a set of symbols. $e_j(S)$ is the probability of emitting S while in state j. At first, M goes to initial state j with probability $\pi_j$. In state j, M emits a symbol from Σ according to $e_j$, and moves to state k with probability T[j,k].
3 Viterbi (best path): $P_{\max}(i,j \mid M) = \max_k P_{\max}(i-1,k)\, T[k,j]\, e_j(S_i)$. Sum over all paths: $P_{\mathrm{sum}}(i,j \mid M) = \sum_k \left(P_{\mathrm{sum}}(i-1,k)\, T[k,j]\right) e_j(S_i)$.
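The two recurrences can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the emission values $e_F(H) = 0.5$ and $e_L(H) = 0.1$ come from the slides, while the initial probabilities `PI`, the transition matrix `T`, and all function names are assumptions made for the example.

```python
# Toy two-state model. Emissions for H follow the slides; PI and T are assumed.
STATES = ("F", "L")
PI = {"F": 0.5, "L": 0.5}                      # assumed initial probabilities
T = {"F": {"F": 0.9, "L": 0.1},                # assumed transition matrix
     "L": {"F": 0.1, "L": 0.9}}
E = {"F": {"H": 0.5, "T": 0.5},
     "L": {"H": 0.1, "T": 0.9}}


def viterbi(obs, states, pi, T, e):
    """Return (probability of the best state path, that path)."""
    # P[i][j] = P_max(i, j): best path emitting obs[0..i] and ending in state j
    P = [{j: pi[j] * e[j][obs[0]] for j in states}]
    back = []
    for i in range(1, len(obs)):
        row, ptr = {}, {}
        for j in states:
            k_best = max(states, key=lambda k: P[i - 1][k] * T[k][j])
            row[j] = P[i - 1][k_best] * T[k_best][j] * e[j][obs[i]]
            ptr[j] = k_best
        P.append(row)
        back.append(ptr)
    last = max(states, key=lambda j: P[-1][j])
    path = [last]
    for ptr in reversed(back):                 # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return P[-1][last], path[::-1]


def forward(obs, states, pi, T, e):
    """Return the total probability of obs, summed over all state paths."""
    F = {j: pi[j] * e[j][obs[0]] for j in states}
    for symbol in obs[1:]:
        F = {j: sum(F[k] * T[k][j] for k in states) * e[j][symbol]
             for j in states}
    return sum(F.values())


if __name__ == "__main__":
    print(viterbi("HHTTT", STATES, PI, T, E))
    print(forward("HHTTT", STATES, PI, T, E))
```

The only difference between the two functions is `max` versus `sum` over the previous state k, which mirrors the two formulas.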
4 $e_F(H) = 0.5$, $e_L(H) = 0.1$. M = (Q, T, π, Σ, e).
5 $e_F(H) =$ …, $e_L(H) = 0.1$. H H T T T is the observed sequence. [Table: $P_{\max}(i,j)$ for each position i and each state j, including $P_{\max}(2,F)$.]
6 HMMs allow us to model position-specific gap penalties, and allow for automated training to get a good alignment. Patterns/profiles/HMMs allow us to represent families and focus on key residues. Each has its advantages and disadvantages, and needs special algorithms to query efficiently.
7 Input: a protein sequence of unknown function. To get function: compare against a database of protein sequences with known function; create a database of multiple alignments of diverged family members; search using patterns such as regular expressions; search using profiles; search using HMMs. In all cases, search domains, not the entire sequence.
8 A number of databases capture proteins (domains) using various representations. Each domain is also associated with structure/function information, parsed from the literature. Each database has specific query mechanisms that allow us to compare our sequences against them, and assign function.
9 What is a Gene? 5/16/16 CSE 182
10 In our discussion of BLAST, we alternated between looking at DNA and protein sequences, treating them as strings. DNA, RNA, and proteins are the 3 important molecules. What is the relation between the three?
11
12 We define a gene as a location on the genome that codes for proteins. The genic information is used to manufacture proteins through transcription and translation. There is a unique mapping from triplets to amino acids.
13
14
15 [Figure: gene structure on the sequence ATAGATGATGTACGATGAGAATGTGATTAATG, with labels: transcription start, translation start, donor, acceptor.]
16 The ribosomal machinery reads mRNA. Each triplet is translated into a unique amino acid until the STOP codon is encountered. There is also a special signal where translation starts, usually at the ATG (M) codon.
17 Given a DNA sequence, how many ways can you translate it?
18 The gene can lie on either strand (relative to the reference genome). The code can be in one of 3 frames on each strand. Example: AGTAGAGTATAGTGGACG (−ve strand: TCATCTCATATCACCTGC). Frame 1: S R V * W T. Frame 2: V E Y S G. Frame 3: * S I V D.
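A short Python sketch of six-frame translation (3 frames on each strand) makes the count concrete. The standard genetic code is encoded with the usual compact TCAG ordering; the function names are hypothetical, and the input is assumed to contain only A, C, G, T.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[dna[p:p + 3]]
                   for p in range(frame, len(dna) - 2, 3))


def six_frames(dna):
    """Map (strand, frame number) to the translated peptide string."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for f in range(3):
            frames[(strand, f + 1)] = translate(seq, f)
    return frames


if __name__ == "__main__":
    for key, peptide in six_frames("AGTAGAGTATAGTGGACG").items():
        print(key, peptide)
```

A `*` in the output marks a stop codon, so long stop-free stretches in one frame are candidate coding regions.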
20 [Figure: eukaryotic gene structure, with labels: transcription start, 5' UTR, translation start (ATG), exon, donor splice site, intron, acceptor, 3' UTR.]
21 Eukaryotic gene definitions: the location that codes for a protein; the transcript sequence(s) that encode the protein; the protein sequence(s). Suppose you want to know all of the genes in an organism. This was a major problem in the 1970s. PhDs and careers were spent isolating a single gene sequence. All of that changed with better reagents and the development of high-throughput methods like EST sequencing.
22 Proteins are the molecular machinery of the cell. Drugs target proteins, binding to activate/inhibit.
23 Only a few (protein) targets were known in the 1970s. The focus was on designing drugs that interact with the target.
24
25
26 It is possible to extract all of the mRNA from a cell. However, mRNA is unstable. An enzyme called reverse transcriptase is used to make a DNA copy of the RNA. DNA polymerase is then used to get a complementary DNA strand. Sequence the (stable) cDNA from both ends. This leads to a collection of transcripts/expressed sequences (ESTs). Many might be from the same gene.
27 The expressed transcript (mRNA) has a poly-A tail at the end, which can be used as a template for reverse transcriptase. This collection of DNA has only the spliced message! It is sampled at random and sequenced from one (3'/5') or both ends. Each message is sampled many times. The resulting collection of sequences is called an EST database.
28 Often, reverse transcriptase breaks off early. Why is this a good thing? The 3' end may not have much coding sequence. We can assemble the 5' ends to get more of the coding sequence.
29 Newer methods like RNA-seq offer a more comprehensive sampling of the set of transcripts. They can be used for gene finding, but: there are differences in expression/abundance of transcripts; the gene sequence is in small pieces; the fragments must still be mapped back to the genome to get the coordinates. There are other features of the gene that are not revealed by transcript sequencing.
30 Given genomic DNA, identify all the coordinates of the gene. TRIVIA QUIZ! What is the name of the FIRST gene finding program? (Google TESTCODE.) [Figure: gene structure, with labels: transcription start, 5' UTR, translation start (ATG), exon, donor splice site, intron, acceptor, 3' UTR.]
31 Given genomic DNA, does it contain a gene (or not)? Key idea: the distribution of nucleotides is different in coding (translated exons) and non-coding regions. Therefore, a statistical test can be used to discriminate between coding and non-coding regions.
32 You are given a collection of exons, and a collection of intergenic sequence. Count the number of occurrences of ATGATG in the intergenic sequence and in the exons. Suppose 1% of the hexamers in exons are ATGATG, and only 0.01% of the hexamers in intergenic sequence are ATGATG. How can you use this idea to find genes?
33 [Table: frequencies (×10^-5) of each hexamer (AAAAAA, AAAAAC, AAAAAG, AAAAAT, ...) in I, E, and X.] Compute a frequency count for all hexamers. Exons, intergenic sequence, and the sequence X are all vectors in a multi-dimensional space. Use this to decide whether a sequence X is exonic or intergenic.
34 Plot the following vectors: E = [10, 20], I = [10, 5], $V_3$ = [6, 10], $V_4$ = [9, 15]. Is $V_3$ more like E or more like I?
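35 Normalize: $V = V / \lVert V \rVert$. All vectors then have the same length (they lie on the unit circle). Next, compute the angle to E and to I. Choose the feature that is closer (smaller angle). With β the angle between $V_3$ and E, and α the angle between $V_3$ and I: E-score($V_3$) = $\frac{\alpha}{\alpha + \beta}$.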
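The vector comparison can be sketched as follows. The reading of the two angles (β measured to E, α measured to I, so that a score near 1 means "exon-like") is an assumption about the slide's figure, and the helper names are hypothetical.

```python
import math
from collections import Counter


def hexamer_counts(seq, k=6):
    """Count every overlapping k-mer (hexamer by default) in seq."""
    return Counter(seq[p:p + k] for p in range(len(seq) - k + 1))


def angle(u, v):
    """Angle in radians between two non-zero vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def e_score(v, exon_vec, intergenic_vec):
    """alpha / (alpha + beta): near 1 if v points towards the exon vector.

    Assumes v is not parallel to both reference vectors at once
    (otherwise alpha + beta would be zero).
    """
    beta = angle(v, exon_vec)          # assumed: beta is the angle to E
    alpha = angle(v, intergenic_vec)   # assumed: alpha is the angle to I
    return alpha / (alpha + beta)


if __name__ == "__main__":
    E_VEC, I_VEC = [10, 20], [10, 5]
    print(e_score([6, 10], E_VEC, I_VEC))   # V_3
    print(e_score([9, 15], E_VEC, I_VEC))   # V_4
```

Because only angles are used, $V_3$ = [6, 10] and $V_4$ = [9, 15] receive the same score: they point in the same direction and differ only in length, which is exactly what normalization removes.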
36 Fickett and Tung (1992) compared various measures. Measures that preserve the triplet frame are the most successful. Genscan uses a 5th-order Markov model.
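37 [Table: hexamer counts in exons and introns, e.g. AAAAAA: 20 (exon), 1 (intron); AAAAAG: 5, 30; AAAAAT: 3, ..; regrouped as transitions from the 5-mer context (AAAAA, AAAAC, AAAAG, ...) to the next base, with row totals.] $\Pr_{\mathrm{Exon}}[\mathrm{AAAAAACGAGAC}\ldots] = T[\mathrm{AAAAA},\mathrm{A}]\; T[\mathrm{AAAAA},\mathrm{C}]\; T[\mathrm{AAAAC},\mathrm{G}]\; T[\mathrm{AAACG},\mathrm{A}] \cdots = (20/78)\,(50/78) \cdots$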
38 $\mathrm{CodingDifferential}[x] = \log\left(\frac{\Pr_{\mathrm{Exon}}[x]}{\Pr_{\mathrm{Intron}}[x]}\right)$. The coding differential can be computed as the log odds of the probability that a sequence is an exon vs. an intron. In Genscan, separate transition matrices are trained for each frame, as different frames have different hexamer distributions.
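A minimal Python sketch of a 5th-order Markov model and the resulting coding differential is given below. It is not Genscan's implementation: it trains a single transition table per class rather than one per frame, and the pseudocount (added so that unseen hexamers do not give a zero probability) and all function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_counts(seqs, order=5):
    """counts[context][base] = times `base` follows the length-`order` context."""
    counts = defaultdict(Counter)
    for s in seqs:
        for p in range(order, len(s)):
            counts[s[p - order:p]][s[p]] += 1
    return counts


def log_prob(x, counts, order=5, pseudo=1.0):
    """Log of the product of transition probabilities T[context, base] along x.

    `pseudo` is an assumed smoothing constant; pseudo=0 gives raw frequencies
    (and fails on transitions that were never observed).
    """
    total = 0.0
    for p in range(order, len(x)):
        ctx = counts.get(x[p - order:p], {})
        numerator = ctx.get(x[p], 0) + pseudo
        denominator = sum(ctx.values()) + 4 * pseudo
        total += math.log(numerator / denominator)
    return total


def coding_differential(x, exon_counts, intron_counts, order=5):
    """Log odds of x under the exon model versus the intron model."""
    return (log_prob(x, exon_counts, order)
            - log_prob(x, intron_counts, order))


if __name__ == "__main__":
    # Tiny made-up training sets, for illustration only.
    exon_model = train_counts(["ATGATGATGATGATG"])
    intron_model = train_counts(["GTAAGTGTAAGTGTAAGT"])
    print(coding_differential("ATGATGATG", exon_model, intron_model))
```

With the slide's counts for the context AAAAA (A: 20, C: 50, G: 5, T: 3) and `pseudo=0`, `log_prob("AAAAAAC", ...)` equals log(20/78) + log(50/78), matching the first two factors of the product on the slide.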
39
40 Plot the coding score using a sliding window of fixed length. The (large) exons will show up reliably. This is not enough to predict gene boundaries reliably.
41 Signals at exon boundaries (ATG, GT, AG) are precise but not specific. Coding signals are specific but not precise. When combined they can be effective.
42 We can compute the following: E-score[i,j], I-score[i,j], D-score[i], A-score[i]. The goal is to find coordinates that maximize the total score.
43 Ex: Grail II. It used statistical techniques to combine various signals into a coherent gene structure. It was not easy to train on many parameters. The Guigó & Burset test revealed that accuracy was still very low. There was also a problem with multiple genes in a genomic region.
44 An HMM is the best way to model and optimize the combination of signals. Here, we will use a simpler approach which is essentially the same as the Viterbi algorithm for HMMs, but without the formalism.
45 [Figure: the labeling IIIIIEEEEEEIIIIIIEEEEEEIIIIEEEEEEIIIII, with segment boundaries $i_1$, $i_2$, $i_3$, $i_4$.] Identifying a gene is equivalent to labeling each nucleotide as E/I/intergenic etc. These labels are the hidden states. For simplicity, consider only two states, E and I.
46 [Figure: the labeling IIIIIEEEEEEIIIIIIEEEEEEIIIIEEEEEEIIIII, with segment boundaries $i_1$, $i_2$, $i_3$, $i_4$.] Given a labeling L, we can score it as I-score[$0..i_1-1$] + E-score[$i_1..i_2$] + D-score[$i_2+1$] + I-score[$i_2+1..i_3-1$] + A-score[$i_3-1$] + E-score[$i_3..i_4$] + … The goal is to compute a labeling with maximum score.
47 Define $V_E(i)$ = best score of a labeling of the prefix 1..i such that the i-th position is labeled E. Define $V_I(i)$ = best score of a labeling of the prefix 1..i such that the i-th position is labeled I. Why is it enough to compute $V_E(i)$ and $V_I(i)$?
48 $V_E(i) = \max_{j<i} \left\{ \mathrm{E\_score}[j..i] + V_I(j-1) + \mathrm{A\_score}[j-1] \right\}$ and $V_I(i) = \max_{j<i} \left\{ \mathrm{I\_score}[j..i] + V_E(j-1) + \mathrm{D\_score}[j] \right\}$.
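The recurrence can be sketched as an O(n²) dynamic program. Several choices here are assumptions, not part of the lecture: segment scores are taken to be sums of per-position scores (so prefix sums give E_score[j..i] in constant time), j is allowed to equal i so that single-position segments exist, and the first segment of a labeling is scored without any splice signal as a simple boundary case. All names are hypothetical.

```python
def label_exons_introns(e_pos, i_pos, d_score, a_score):
    """Best E/I labeling of positions 1..n under the two-state recurrence.

    All four lists are 1-indexed (index 0 is unused padding).
    e_pos[t], i_pos[t]: score of labeling position t as E or as I.
    d_score[t]: donor signal score for an intron starting at t.
    a_score[t]: acceptor signal score for an intron ending at t.
    Returns (best score, labeling string of length n).
    """
    n = len(e_pos) - 1
    pre_e, pre_i = [0.0] * (n + 1), [0.0] * (n + 1)
    for t in range(1, n + 1):                  # prefix sums of segment scores
        pre_e[t] = pre_e[t - 1] + e_pos[t]
        pre_i[t] = pre_i[t - 1] + i_pos[t]

    neg_inf = float("-inf")
    V = {"E": [neg_inf] * (n + 1), "I": [neg_inf] * (n + 1)}
    back = {"E": [0] * (n + 1), "I": [0] * (n + 1)}
    for i in range(1, n + 1):
        for j in range(1, i + 1):              # last segment covers j..i
            if j == 1:                         # assumed boundary case
                prev_for_e = prev_for_i = 0.0
            else:
                prev_for_e = V["I"][j - 1] + a_score[j - 1]
                prev_for_i = V["E"][j - 1] + d_score[j]
            cand_e = pre_e[i] - pre_e[j - 1] + prev_for_e
            cand_i = pre_i[i] - pre_i[j - 1] + prev_for_i
            if cand_e > V["E"][i]:
                V["E"][i], back["E"][i] = cand_e, j
            if cand_i > V["I"][i]:
                V["I"][i], back["I"][i] = cand_i, j

    # Trace back, alternating between the two states.
    state = "E" if V["E"][n] > V["I"][n] else "I"
    best = V[state][n]
    labels, i = [""] * (n + 1), n
    while i > 0:
        j = back[state][i]
        for t in range(j, i + 1):
            labels[t] = state
        i, state = j - 1, ("I" if state == "E" else "E")
    return best, "".join(labels[1:])


if __name__ == "__main__":
    e = [0, -1, -1, -1, 1, 1, 1, -1, -1, -1]
    intron = [-x for x in e]
    zeros = [0.0] * len(e)
    print(label_exons_introns(e, intron, zeros, zeros))
```

Only two values per position are stored because the score of everything to the right of position i depends on the prefix only through the label at i, which is why computing $V_E(i)$ and $V_I(i)$ is enough.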
49 Note that we deal with two states, and consider all paths that move between the two states. [Figure: paths through the states E and I up to position i.]
50 We did not deal with the boundary cases in the recurrence. Instead of labeling with two states, we can label with multiple states: $E_{init}$, $E_{fin}$, $E_{mid}$, I, $I_G$ (intergenic). [Figure: state diagram over $I_G$, I, $E_{init}$, $E_{mid}$, $E_{fin}$. Note: not all links are shown here.]
51
52 Gene finding can be interpreted as a d.p. approach that threads genomic sequence through the states of a gene HMM: $E_{init}$, $E_{fin}$, $E_{mid}$, I, $I_G$ (intergenic). [Figure: state diagram over $I_G$, I, $E_{init}$, $E_{mid}$, $E_{fin}$, threaded up to position i. Note: not all links are shown here.]
53 A probabilistic model for each of the states (ex: exon, splice site) needs to be described. In standard HMMs, there is an exponential (geometric) distribution on the duration of time spent in a state. This is violated by many states of the gene structure HMM. The solution is to model these using generalized HMMs.
54
55 Each state also emits a duration for which it will cycle in the same state. The time is generated according to a random process that depends on the state.
56 $F_k(i) = \sum_{j<i} P_{q_k}(X_{j,i})\; f_{q_k}(i-j+1) \sum_{l \in Q} a_{lk}\, F_l(j)$. Duration prob. $f_{q_k}(i-j+1)$: probability that you stayed in state $q_k$ for $i-j+1$ steps. Emission prob. $P_{q_k}(X_{j,i})$: probability that you emitted $X_j..X_i$ in state $q_k$ (given by the 5th-order Markov model). Forward prob. $F_k(i)$: probability that you emitted i symbols and ended up in state $q_k$.
57 Various signals distinguish coding regions from non-coding. HMMs are a reasonable model for gene structures, and provide a uniform method for combining various signals. Further improvement may come from improved signal detection.
58 Coding versus non-coding. Splice signals. Translation start. [Figure: gene structure, with labels: transcription start, 5' UTR, translation start (ATG), exon, donor splice site, intron, acceptor, 3' UTR.]
59 The donor site marks the junction where an exon ends and an intron begins. For gene finding, we are interested in computing a probability D[i] = Prob[donor site at position i]. Approach: collect a large number of donor sites, align, and look for a signal.
60 Fixed length for the splice signal. Each position is generated independently according to a distribution. The figure shows data from > 1200 donor sites. Examples: AAGGTGAGT, CCGGTAAGT, GAGGTGAGG, TAGGTAAGG.
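The independent-position model is a position weight matrix (PWM). The sketch below builds one from the four example donor sites on the slide and scores a candidate by log odds against a uniform background. The pseudocount, the uniform background of 0.25, and the function names are assumptions; a real matrix would be trained on the full set of aligned sites.

```python
import math

# The four example sites from the slide, not the full training set.
DONOR_SITES = ["AAGGTGAGT", "CCGGTAAGT", "GAGGTGAGG", "TAGGTAAGG"]


def build_pwm(sites, pseudocount=1.0):
    """One {base: probability} dict per aligned position."""
    pwm = []
    for pos in range(len(sites[0])):
        column = {base: pseudocount for base in "ACGT"}
        for site in sites:
            column[site[pos]] += 1
        total = sum(column.values())
        pwm.append({base: count / total for base, count in column.items()})
    return pwm


def log_odds(pwm, candidate, background=0.25):
    """Sum over positions of log2(P(base at position) / background)."""
    return sum(math.log2(pwm[pos][base] / background)
               for pos, base in enumerate(candidate))


if __name__ == "__main__":
    pwm = build_pwm(DONOR_SITES)
    print(log_odds(pwm, "AAGGTGAGT"))   # a training site
    print(log_odds(pwm, "AAGCCGAGT"))   # the conserved GT replaced by CC
```

Sliding `log_odds` along a genomic sequence, one window per position i, gives a donor score that can play the role of D-score[i].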
62 Nature Science
63
64 Gene prediction is harder with alternative splicing. One approach might be to use comparative methods to detect genes. Given a similar mRNA/protein (from another species, perhaps?), can you find the best parse of a genomic sequence that matches that target sequence? Yes, with a variant on alignment algorithms that penalizes separately for introns versus other gaps.
65 Observed donor sites: GGTA GGTA GGTA GGTA CGTG CGTG CGTG CGTG. Pr[GGTA is a donor site] = 0.5 × 0.5. Pr[CGTA is a donor site] = 0.5 × 0.5. Is something wrong with this explanation?
66 PWMs do not capture correlations between positions. Many position pairs in the donor signal are correlated.
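The arithmetic behind the GGTA/CGTA example can be checked directly. In this small sketch (function names are hypothetical, no pseudocounts are used) the independent-column model assigns both sequences the same probability, although one of them makes up half of the observed sites and the other never occurs.

```python
from collections import Counter

# The eight observed sites from the example.
SITES = ["GGTA"] * 4 + ["CGTG"] * 4


def column_freqs(sites):
    """Per-position base frequencies, i.e. a PWM without pseudocounts."""
    n = len(sites)
    return [
        {base: count / n
         for base, count in Counter(s[pos] for s in sites).items()}
        for pos in range(len(sites[0]))
    ]


def pwm_prob(freqs, seq):
    """Probability of seq if every position is generated independently."""
    prob = 1.0
    for pos, base in enumerate(seq):
        prob *= freqs[pos].get(base, 0.0)
    return prob


def empirical_prob(sites, seq):
    """Fraction of the observed sites that equal seq exactly."""
    return sites.count(seq) / len(sites)


if __name__ == "__main__":
    freqs = column_freqs(SITES)
    for seq in ("GGTA", "CGTA"):
        print(seq, pwm_prob(freqs, seq), empirical_prob(SITES, seq))
```

Positions 1 and 4 are perfectly correlated in the data (G pairs with A, C pairs with G), and that dependence is exactly what the product of column frequencies discards.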
67 Choose the position i which has the highest correlation score. Split the sequences into two sets: those which have the consensus at position i, and the remaining ones. Recurse until <Terminating conditions>. Stop if #sequences is small enough.
68
73 Procrustes/Sim4: mRNA vs. genomic. Genewise: proteins versus genomic. CEM: genomic versus genomic. Twinscan: combines comparative and de novo approaches. Mass spec related? Later in the class we will consider mass spectrometry data. Can we use this data to identify genes in eukaryotic genomes? (Research project)
74 RefSeq and other databases maintain sequences of full-length transcripts/genes. We can query using sequence.
75 Sequence comparison (BLAST & other tools). Protein motifs: profiles/regular expressions/HMMs. Discovering protein coding genes: gene finding HMMs; DNA signals (splice signals). How is the genomic sequence itself obtained? [Figure labels: ESTs, gene finding, protein sequence analysis.]
Transcription and Translation DANILO V. ROGAYAN JR. Faculty, Department of Natural Sciences Protein Structure Made up of amino acids Polypeptide- string of amino acids 20 amino acids are arranged in different
More informationFrom RNA To Protein
From RNA To Protein 22-11-2016 Introduction mrna Processing heterogeneous nuclear RNA (hnrna) RNA that comprises transcripts of nuclear genes made by RNA polymerase II; it has a wide size distribution
More informationTranscription steps. Transcription steps. Eukaryote RNA processing
Transcription steps Initiation at 5 end of gene binding of RNA polymerase to promoter unwinding of DNA Elongation addition of nucleotides to 3 end rules of base pairing requires Mg 2+ energy from NTP substrates
More informationTranscription in Eukaryotes
Transcription in Eukaryotes Biology I Hayder A Giha Transcription Transcription is a DNA-directed synthesis of RNA, which is the first step in gene expression. Gene expression, is transformation of the
More informationLecture for Wednesday. Dr. Prince BIOL 1408
Lecture for Wednesday Dr. Prince BIOL 1408 THE FLOW OF GENETIC INFORMATION FROM DNA TO RNA TO PROTEIN Copyright 2009 Pearson Education, Inc. Genes are expressed as proteins A gene is a segment of DNA that
More informationDNA Function: Information Transmission
DNA Function: Information Transmission DNA is called the code of life. What does it code for? *the information ( code ) to make proteins! Why are proteins so important? Nearly every function of a living
More informationIntroduction to Cellular Biology and Bioinformatics. Farzaneh Salari
Introduction to Cellular Biology and Bioinformatics Farzaneh Salari Outline Bioinformatics Cellular Biology A Bioinformatics Problem What is bioinformatics? Computer Science Statistics Bioinformatics Mathematics...
More informationKey Area 1.3: Gene Expression
Key Area 1.3: Gene Expression RNA There is a second type of nucleic acid in the cell, called RNA. RNA plays a vital role in the production of protein from the code in the DNA. What is gene expression?
More informationI. Gene Expression Figure 1: Central Dogma of Molecular Biology
I. Gene Expression Figure 1: Central Dogma of Molecular Biology Central Dogma: Gene Expression: RNA Structure RNA nucleotides contain the pentose sugar Ribose instead of deoxyribose. Contain the bases
More informationAnalysis of Biological Sequences SPH
Analysis of Biological Sequences SPH 140.638 swheelan@jhmi.edu nuts and bolts meet Tuesdays & Thursdays, 3:30-4:50 no exam; grade derived from 3-4 homework assignments plus a final project (open book,
More informationThemes: RNA and RNA Processing. Messenger RNA (mrna) What is a gene? RNA is very versatile! RNA-RNA interactions are very important!
Themes: RNA is very versatile! RNA and RNA Processing Chapter 14 RNA-RNA interactions are very important! Prokaryotes and Eukaryotes have many important differences. Messenger RNA (mrna) Carries genetic
More informationChapter 17: From Gene to Protein
Name Period Chapter 17: From Gene to Protein This is going to be a very long journey, but it is crucial to your understanding of biology. Work on this chapter a single concept at a time, and expect to
More informationStudying the Human Genome. Lesson Overview. Lesson Overview Studying the Human Genome
Lesson Overview 14.3 Studying the Human Genome THINK ABOUT IT Just a few decades ago, computers were gigantic machines found only in laboratories and universities. Today, many of us carry small, powerful
More informationBis2A 12.0 Transcription *
OpenStax-CNX module: m56068 1 Bis2A 12.0 Transcription * Mitch Singer Based on Transcription by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License
More informationThe Nature of Genes. The Nature of Genes. The Nature of Genes. The Nature of Genes. The Nature of Genes. The Genetic Code. Genes and How They Work
Genes and How They Work Chapter 15 Early ideas to explain how genes work came from studying human diseases. Archibald Garrod studied alkaptonuria, 1902 Garrod recognized that the disease is inherited via
More informationMachine Learning. HMM applications in computational biology
10-601 Machine Learning HMM applications in computational biology Central dogma DNA CCTGAGCCAACTATTGATGAA transcription mrna CCUGAGCCAACUAUUGAUGAA translation Protein PEPTIDE 2 Biological data is rapidly
More informationReplication, Transcription, and Translation
Replication, Transcription, and Translation Information Flow from DNA to Protein The Central Dogma of Molecular Biology Replication is the copying of DNA in the course of cell division. Transcription is
More informationChapter 12. DNA TRANSCRIPTION and TRANSLATION
Chapter 12 DNA TRANSCRIPTION and TRANSLATION 12-3 RNA and Protein Synthesis WARM UP What are proteins? Where do they come from? From DNA to RNA to Protein DNA in our cells carry the instructions for making
More informationTranscription and Post Transcript Modification
Transcription and Post Transcript Modification You Should Be Able To 1. Describe transcription. 2. Compare and contrast eukaryotic + prokaryotic transcription. 3. Explain mrna processing in eukaryotes.
More informationBiotechnology Project Lab
Only for teaching purposes - not for reproduction or sale Advanced Cell Biology & Biotechnology Biotechnology Project Lab Giovanna Gambarotta COMPETENCES THAT YOU WILL ACQUIRE - compare DNA sequences -
More informationLecture Summary: Regulation of transcription. General mechanisms-what are the major regulatory points?
BCH 401G Lecture 37 Andres Lecture Summary: Regulation of transcription. General mechanisms-what are the major regulatory points? RNA processing: Capping, polyadenylation, splicing. Why process mammalian
More informationChimp Sequence Annotation: Region 2_3
Chimp Sequence Annotation: Region 2_3 Jeff Howenstein March 30, 2007 BIO434W Genomics 1 Introduction We received region 2_3 of the ChimpChunk sequence, and the first step we performed was to run RepeatMasker
More informationThe Structure of Proteins The Structure of Proteins. How Proteins are Made: Genetic Transcription, Translation, and Regulation
How Proteins are Made: Genetic, Translation, and Regulation PLAY The Structure of Proteins 14.1 The Structure of Proteins Proteins - polymer amino acids - monomers Linked together with peptide bonds A
More informationGene finding: putting the parts together
Gene finding: putting the parts together Anders Krogh Center for Biological Sequence Analysis Technical University of Denmark Building 206, 2800 Lyngby, Denmark 1 Introduction Any isolated signal of a
More informationAn Overview of Probabilistic Methods for RNA Secondary Structure Analysis. David W Richardson CSE527 Project Presentation 12/15/2004
An Overview of Probabilistic Methods for RNA Secondary Structure Analysis David W Richardson CSE527 Project Presentation 12/15/2004 RNA - a quick review RNA s primary structure is sequence of nucleotides
More informationDNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences
DNAFSMiner: A Web-Based Software Toolbox to Recognize Two Types of Functional Sites in DNA Sequences Huiqing Liu Hao Han Jinyan Li Limsoon Wong Institute for Infocomm Research, 21 Heng Mui Keng Terrace,
More informationSection 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein?
Section 10.3 Outline 10.3 How Is the Base Sequence of a Messenger RNA Molecule Translated into Protein? Messenger RNA Carries Information for Protein Synthesis from the DNA to Ribosomes Ribosomes Consist
More informationEukaryotic Gene Structure
Eukaryotic Gene Structure Terminology Genome entire genetic material of an individual Transcriptome set of transcribed sequences Proteome set of proteins encoded by the genome 2 Gene Basic physical and
More information