
BLAST Basics

Elements of Bioinformatics
Spring, 2003
Tom Carter
tom/
March,

Sequence Comparison

One of the fundamental tasks in bioinformatics is to compare two sequences of nucleotides or amino acids. In general, if two sequences are similar, we can hope (or presume?) that the two sequences (molecules) have similar biological functions, or share (meaningful) evolutionary history, or both.

We could look at the global similarity of two sequences (according to some probability model) to get an overall statistic relating the two sequences, or we could look at local areas of similarity between the two sequences, considering two sequences to be similar if they share similar subsequences. In what follows, we will look at an approach to local sequence comparisons.

Basic model for local sequence comparisons

Our general model for the similarity of molecular biological sequences is common origin, and therefore we will imagine that one sequence can be derived from the other via a succession of basic changes or operations (i.e., mutations). These basic operations are:

- Substitution: this change replaces a single nucleotide or amino acid by another. Thus, for example, the sequence AGCTTTTCAT results in the sequence AGCATTTCAT via the substitution of an A for the T in the fourth location.

- Insertion: this adds a single nucleotide or amino acid to the sequence. The sequence AGCTTTTCAT results in the sequence AGACTTTTCAT via the insertion of an A after the G.

- Deletion: this removes a single nucleotide or amino acid from the sequence. The sequence AGCTTTTCAT results in the sequence ACTTTTCAT via the deletion of the G.
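The three operations can be written as small string functions. This is only an illustrative sketch: the function names and the 1-based position convention are assumptions chosen to match the wording of the examples above.

```python
def substitute(seq: str, pos: int, new: str) -> str:
    """Replace the residue at 1-based position pos with new."""
    return seq[:pos - 1] + new + seq[pos:]


def insert_after(seq: str, pos: int, new: str) -> str:
    """Insert new immediately after the residue at 1-based position pos."""
    return seq[:pos] + new + seq[pos:]


def delete(seq: str, pos: int) -> str:
    """Remove the residue at 1-based position pos."""
    return seq[:pos - 1] + seq[pos:]


seq = "AGCTTTTCAT"
print(substitute(seq, 4, "A"))    # A for the T in the fourth location
print(insert_after(seq, 2, "A"))  # A inserted after the G
print(delete(seq, 2))             # the G deleted
```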

We are then interested in developing probability models to describe biological sequences. We will also be interested in models to describe the conversion of one sequence into another via a succession of the three basic operations.

There are a variety of possible approaches to developing such a probability model. The general approach is to start with a very simple model, and then add features to the model to get a better description.

In the simplest model, the Random model (R), we could assume, for example, that a DNA sequence arises from a random process whereby each nucleotide is equally likely to occur (i.e., each of A, C, G, T has probability 1/4 of occurring), and thus the probability of observing a given specific sequence of n nucleotides would be $(1/4)^n$.

More generally, given two nucleotide or amino acid sequences x and y, we can look at the joint probability of observing the pair of sequences (assuming the Random model R):

$$P(x, y \mid R) = \prod_i q_{x_i} \prod_j q_{y_j}$$

where $q_{x_i}$ is the probability of occurrence of the ith nucleotide in sequence x, etc.

What we want to do is build an alternative probability model which reflects evolutionary history (or at least biochemistry). Let us call this alternative model M. In a first version of this model, we assume that the two sequences have the same length, and think of aligning the two sequences. We can then look at the joint probability of the occurrence of the two nucleotides or amino acids in each position, assuming that they derive from a common ancestor at some time in the past via substitution. We would then have, for each pair (a, b) of nucleotides or amino acids, a probability $p_{ab}$ that the given pair resulted from substitutions from a common ancestor c. This would give us a probability of observing the pair of sequences x and y given by:

$$P(x, y \mid M) = \prod_i p_{x_i y_i}.$$

We then want to develop a way of estimating the likelihood that similarity between a pair of sequences reflects actual biological similarity, or is just a random occurrence. We can calculate the ratio of the two probabilities as an odds ratio:

$$\frac{P(x, y \mid M)}{P(x, y \mid R)} = \frac{\prod_i p_{x_i y_i}}{\prod_i q_{x_i} \prod_j q_{y_j}} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}.$$

We can convert this into an additive system by taking logarithms, so that we can calculate by adding up a log-odds ratio:

$$S = \sum_i s(x_i, y_i).$$

In this formula,

$$s(a, b) = \log\left(\frac{p_{ab}}{q_a q_b}\right)$$

is the log-likelihood ratio of the pair (a, b) resulting from a substitution from a common ancestor rather than just by a random occurrence.

The next step, then, would be to build a table (an array) of s(a, b) log-likelihoods for all possible pairs of nucleotides or amino acids. On the next page is an example of such a table for amino acids, called the BLOSUM-62 Clustered Scoring Matrix in 1/2 Bit Units.
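As a rough illustration of the random model and the log-odds sum, the sketch below scores two equal-length DNA sequences. The pair probabilities `p` and background frequencies `q` are made-up values for illustration only (they are not taken from any published matrix), and the choice of base-2 logarithms is likewise an assumption.

```python
import math


def random_model_prob(seq, q):
    """P(seq | R): product of independent residue frequencies."""
    return math.prod(q[a] for a in seq)


def log_odds_score(x, y, p, q, base=2.0):
    """S = sum_i log(p[x_i, y_i] / (q[x_i] * q[y_i])) for equal-length x and y."""
    if len(x) != len(y):
        raise ValueError("the ungapped model needs equal-length sequences")
    return sum(math.log(p[(a, b)] / (q[a] * q[b]), base) for a, b in zip(x, y))


# Hypothetical parameters: uniform background, identical pairs favored.
q = {n: 0.25 for n in "ACGT"}
p = {(a, b): (0.22 if a == b else 0.01) for a in "ACGT" for b in "ACGT"}

print(random_model_prob("ACGT", q))          # (1/4)^4 under the random model
print(log_odds_score("ACGT", "ACGT", p, q))  # positive: more likely under M
print(log_odds_score("ACGT", "TGCA", p, q))  # negative: more likely under R
```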

BLOSUM62 Clustered Scoring Matrix in 1/2 Bit Units

Cluster Percentage: >= 62

[Table: 24 x 24 scoring matrix with rows and columns in the order A R N D C Q E G H I L K M F P S T W Y V B Z X *. The entropy, the expected score, and the numeric entries did not survive transcription.]

Source: Blocks Substitution Matrices for Protein Sequence Comparisons (Blosum)

Basics of the BLAST algorithm

Once we have a scoring matrix (such as the BLOSUM62 matrix), we can develop the BLAST algorithm. BLAST stands for Basic Local Alignment Search Tool.

The fundamental approach is to develop a similarity score between pairs of sequences. This is done by finding short, very close local matches between sections of a given sequence and the target, or comparison, sequence. Typically these short matches are about 3 amino acids for proteins, and about 11 nucleotides for nucleic acids.

This short match is then extended one residue at a time to find the local alignment with the maximum score. This score can then be compared with the scores for other sequences to find the best match.

In a more general form of the algorithm, we can include the possibility of a gap in which we don't require a match. Typically we will include a penalty in our scoring for opening a gap, and an additional penalty for each residue added to the gap. These will then be included in the score for the local alignment match.

In practice, we often want to BLAST a given sequence against a whole database of sequences. In order to make this process reasonably fast, we can start by building a table of the short sequences that occur in the database. Our given sequence is then matched against the table of short sequences, which then refers the algorithm to appropriate sequences in the database for possible extensions. We can keep a running total of the best scores we have seen so far, and abandon extensions which are not as good as what we have already found.
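The word table and the extension step can be sketched as follows. This is a deliberately simplified illustration, not the actual BLAST implementation: it seeds only on exact word matches, extends without gaps, uses a hypothetical +1/-1 scoring function, and stops an extension once the running score falls more than `x_drop` below the best score seen so far. All function and parameter names are assumptions.

```python
from collections import defaultdict


def build_word_index(database, k):
    """Map every length-k word to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index


def extend(query, target, qi, ti, k, score, x_drop):
    """Extend an exact k-word hit in both directions without gaps.

    Returns (best score, query start, query end), with the end exclusive.
    """
    seed = sum(score(query[qi + d], target[ti + d]) for d in range(k))
    best_r, run, right, d = 0, 0, 0, k
    while qi + d < len(query) and ti + d < len(target):
        run += score(query[qi + d], target[ti + d])
        if run > best_r:
            best_r, right = run, d - k + 1
        if best_r - run > x_drop:  # abandon: too far below the best so far
            break
        d += 1
    best_l, run, left, d = 0, 0, 0, 1
    while qi - d >= 0 and ti - d >= 0:
        run += score(query[qi - d], target[ti - d])
        if run > best_l:
            best_l, left = run, d
        if best_l - run > x_drop:
            break
        d += 1
    return seed + best_l + best_r, qi - left, qi + k + right


def search(query, database, k=3, score=lambda a, b: 1 if a == b else -1, x_drop=3):
    """Return {sequence id: (best score, query start, query end)} for seeded hits."""
    index = build_word_index(database, k)
    best = {}
    for qi in range(len(query) - k + 1):
        for seq_id, ti in index.get(query[qi:qi + k], []):
            hit = extend(query, database[seq_id], qi, ti, k, score, x_drop)
            if hit[0] > best.get(seq_id, (float("-inf"),))[0]:
                best[seq_id] = hit
    return best


print(search("ACGTACG", ["TTTACGTACGTTT", "GGGGGGGG"]))
```

Sequences that share no word with the query are never examined at all, which is where most of the speed comes from.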

Building the BLOSUM matrices

The BLOSUM scoring matrices were developed from a database of typical blocks of amino acids observed in proteins. First, the blocks database was clustered at a given percentage level. In other words, blocks that were at least the given percentage identical were clustered together to give a typical example of a cluster of similar proteins. From each cluster, a representative amino acid sequence was developed.

The entries in the BLOSUM matrix are then calculated as the actual frequency of occurrence of the amino acid pair in the clustered blocks database, divided by the expected probability of occurrence. The expected value is calculated from the frequency of occurrence of each of the two individual amino acids in the blocks database, which gives an estimate of a chance (random) alignment of the two amino acids.

The actual/expected ratio is expressed as a log-odds score in so-called half-bit units. These units are obtained by taking the base 2 logarithm of the ratio, and then multiplying by 2. A zero score means that the frequency of the amino acid pair in the database was the same as a chance alignment, a positive score that the pair was found more often than by chance, and a negative score that the pair was found less often than by chance.

The accumulated score of a given alignment of several amino acids in two sequences is calculated by adding up the respective scores of each individual pair of amino acids in the alignment.

The BLOSUM matrix values are based on the observed amino acid substitutions in a large set of approximately 2000 conserved amino acid patterns, called blocks. These blocks come from a database of protein sequences representing over 500 groups of related proteins, and can act as signatures for protein families.

The BLOSUM matrices are based on a different principle and a larger data set than the Dayhoff PAM (point accepted mutation) matrices, which are derived from the observed rate of mutation during predicted evolutionary changes in a relatively small number of protein families.

To build the BLOSUM matrices, the sequences of the proteins in 500 families were aligned in the regions defined by the blocks. Each column in the aligned sequences then provided a set of possible amino acid substitutions. For this analysis, it is assumed that the probability of change from amino acid X to amino acid Y is the same as the probability of the reverse change from Y to X, and thus the resulting matrix is symmetric.

The various substitutions were then tabulated for all the aligned patterns in the database. More common substitutions should represent a closer relationship between two amino acids in related proteins, and thus should show a higher score in sequence alignment. Rare substitutions will show lower scores.

This approach, however, can result in too high a representation of amino acid substitutions which occur in the most closely related members of the protein families. To reduce this dominant contribution from the most similar proteins, the sequences of the most similar proteins were grouped together into a single sequence before scoring the amino acid substitutions in the aligned blocks. The amino acid changes within these clustered sequences were then averaged. Patterns with 60% agreement were grouped together to make one substitution matrix called BLOSUM60, those 80% alike to make the BLOSUM80 matrix, and so on.
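The half-bit log-odds calculation can be sketched on a toy block. The sketch assumes the unordered-pair counting convention of Henikoff and Henikoff (1992), skips the clustering step entirely, and leaves the scores unrounded (published matrices round to integers); the function name and the toy blocks are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def half_bit_scores(block):
    """Score every residue pair observed in a gap-free block of aligned sequences.

    Returns {(a, b): 2 * log2(observed / expected)} with a <= b.
    """
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Frequency of each residue among the pairs (mixed pairs contribute half each).
    p = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * math.log2(freq / expected)
    return scores


print(half_bit_scores(["AS", "AS"]))  # only identities observed
print(half_bit_scores(["AA", "AS"]))  # one A/S substitution observed
```

Pairs that never occur in the block get no entry, since their observed frequency is zero and the logarithm is undefined.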

As the clustering percentage was increased, the ability of the resulting matrix to distinguish actual from chance alignments also increased. This discriminating capability of the scoring system depends on the relative entropy, or average information content per residue pair. However, at the same time, the dominance effect of the most similar proteins also increases, which biases the matches. BLOSUM62 represents a generally reasonable balance between information content and match bias, and is therefore often used as the default matrix for predicting alignments among typical protein families.

References

Henikoff S. and Henikoff J.G. (1992). Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89.

Henikoff S. and Henikoff J.G. (1993). Performance evaluation of amino acid substitution matrices. Proteins 17.

Building the PAM matrices

The PAM scoring (substitution) matrices are intended to reflect actual changes in sequences over evolutionary time. The general idea is to compare sequences for which we (believe we) can infer an evolutionary history.

PAM stands for Point Accepted Mutation. The idea is that we build a scoring matrix according to point (substitution) mutations that have actually been accepted by evolution. The original PAM matrices were developed by M. O. Dayhoff.

The steps for building the original PAM matrices involved:

- Gather sequences and align pairs with at least 85% agreement, being sure to minimize ambiguity and the number of coincident mutations.

- Build phylogenetic trees, inferring ancestral sequences. Here is an example of what an inferred phylogenetic tree might look like, with amino acid substitutions indicated:

                      ABIH
                    /      \
               I-G /        \ J-H
                  /          \
              ABGH            ABIJ
             /    \          /    \
        B-C /      \ A-D B-D/      \ A-C
           /        \      /        \
        ACGH       DBGH  ADIJ       CBIJ

- Count the residue replacements accepted by natural selection. We let $A_{ij}$ be the number of times amino acid i was replaced by amino acid j. We take $A_{ii}$ to be 0.

- For each amino acid j, compute the relative mutability. Call this $m_j$. We compute the relative mutability by counting the number of changes of an amino acid between aligned sequences, divided by the number of occurrences of the amino acid. These numbers are then scaled to the number of replacements of the given amino acid per 100 residues in the alignments.

For example:

    Aligned sequences          A C A
                               A C B

    Amino acids                A      B      C
    Number of changes          1      1      0
    Frequency of occurrence    3      1      2
    Relative mutability        0.33   1      0

The relative mutabilities found by Dayhoff were (with Ala arbitrarily set to 100):

    Ser 149    Asp 90    His 50
    Met 122    Thr 90    Phe 45
    Asn 111    Gap 84    Arg 44
    Ile 110    Val 80    Leu 38
    Glu 102    Gly 48    Tyr 34
    Ala 100    Lys 57    Cys 27
    Gln  98    Pro 56    Trp 22
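The counting rule in the small example can be checked with a few lines of code. This is a sketch for a single pair of aligned sequences only; it omits the final scaling (replacements per 100 residues, Ala set to 100), and the function name is hypothetical.

```python
from collections import Counter


def relative_mutability(seq1, seq2):
    """Changes per occurrence for each residue in one pairwise alignment."""
    changes, occurrences = Counter(), Counter()
    for a, b in zip(seq1, seq2):
        occurrences[a] += 1
        occurrences[b] += 1
        if a != b:
            # A mismatched column counts as one change for each residue in it.
            changes[a] += 1
            changes[b] += 1
    return {aa: changes[aa] / occurrences[aa] for aa in occurrences}


print(relative_mutability("ACA", "ACB"))
```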

- We now compute the Mutation Probability Matrix (called one PAM of evolutionary distance) by the formulae

$$M_{ij} = \frac{m_j A_{ij}}{\sum_i A_{ij}}, \quad (i \neq j),$$

and

$$M_{jj} = 1 - m_j.$$

With these definitions, each column of the matrix sums to one:

$$\sum_i M_{ij} = 1.$$

(The matrix is not symmetric in general, because each column j is scaled by its own mutability $m_j$.)

- Count the number $f_i$ of occurrences of each amino acid i, and then form the relatedness odds matrix:

$$R_{ij} = \frac{M_{ij}}{f_i}.$$

We can then calculate the log-odds scoring matrix by

$$S_{ij} = \log(R_{ij}).$$
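The formulae for M, R, and S translate directly into code. The sketch below uses a three-letter toy alphabet with invented replacement counts `A`, mutabilities `m`, and frequencies `f`; none of these numbers come from Dayhoff's data, and the use of base-10 logarithms is an assumption.

```python
import math


def mutation_probability_matrix(A, m):
    """M[i][j] = m[j] * A[i][j] / sum_i A[i][j] for i != j, and M[j][j] = 1 - m[j]."""
    n = len(m)
    col_totals = [sum(A[i][j] for i in range(n)) for j in range(n)]
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i == j:
                M[j][j] = 1.0 - m[j]
            else:
                M[i][j] = m[j] * A[i][j] / col_totals[j]
    return M


def log_odds(M, f):
    """S[i][j] = log10(R[i][j]) with R[i][j] = M[i][j] / f[i]."""
    n = len(f)
    return [[math.log10(M[i][j] / f[i]) for j in range(n)] for i in range(n)]


# Hypothetical toy inputs for a 3-residue alphabet.
A = [[0, 2, 1],
     [2, 0, 3],
     [1, 3, 0]]
m = [0.01, 0.02, 0.015]
f = [0.5, 0.3, 0.2]

M = mutation_probability_matrix(A, m)
S = log_odds(M, f)
for j in range(3):
    print("column", j, "sums to", sum(M[i][j] for i in range(3)))
```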

Some properties of the Mutation Probability Matrix:

- The Mutation Probability Matrix $M_1$ defines a unit of evolutionary change; that is, 1 PAM is an average of one Accepted Point Mutation per 100 residues. Note that there is no obvious direct connection between this rate of change and real (million years, or whatever) rates of evolutionary change.

- We can use $M_1$ to simulate evolutionary change. Given an amino acid sequence, we use a random number generator to apply $M_1$ to each residue in the sequence. $M_1$ is scaled so that for an average amino acid sequence, applying $M_1$ to each residue once will result, on average, in one amino acid change per 100 residues. We could repeat this process, to get 2, 3, 4, ... PAMs of simulated evolutionary change.

- The following are equivalent:

  - Successive applications of $M_1$ to a sequence n times.

  - Matrix multiplication of $M_1$ times itself n times, and applying the resulting $M_1^n$ to the sequence.

  If the $m_j$ are all quite small, then the following is also approximately equivalent:

  - Scaling the elements of $M_1$ according to the formulae

$$M_{ij} = \frac{n \, m_j A_{ij}}{\sum_i A_{ij}}, \quad (i \neq j),$$

    and

$$M_{jj} = 1 - n \, m_j,$$

    and applying the resulting matrix to the sequence.

The last formulae give an easy way to approximate the matrix for any desired PAM distance. Thus, for example, we could build a PAM100 or PAM250 matrix.

What would be the difference between using a PAM100 matrix versus a PAM250 matrix for similarity scoring between protein sequences? Why might one choose one or the other?

One way to think about this is that PAM100 corresponds to an average of 100 substitutions per 100 amino acids in the sequence (and PAM250 corresponds to 250 substitutions per 100 amino acids). At first, this may not seem very meaningful. How could there be an average of 250 substitutions per 100 amino acids?

In general, we want to think of a scoring matrix as a way to measure the evolutionary distance between two sequences. We can think of the PAM matrices as telling us about evolutionary distance in terms of PAM units. One PAM unit is the length of time it takes for a sequence to experience an average of one (accepted) mutation per 100 amino acids in the sequence. Remember that most mutations will not confer a survival advantage on the organism (or in fact will have a deleterious effect), and hence will not be (differentially) reproduced, and hence will not be accepted by evolution.

Thus, PAM1 matrix scoring of sequence similarity will be sensitive to sequences that have diverged from each other (have a common ancestor) in the recent past (within 1 PAM). It will be much less sensitive to similarities between sequences that diverged longer ago than 1 PAM. (Note: 1 PAM in years may be different for different families of proteins, or different regions of a protein. Why?)
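The relation between repeated application (matrix powers) and the linear scaling shortcut can be checked numerically. The 3 x 3 matrix below is a made-up, column-normalized stand-in for $M_1$ on a toy alphabet, and the function names are hypothetical.

```python
def mat_mul(X, Y):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, n):
    """M multiplied by itself n times (n = 0 gives the identity)."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


def scaled_approx(M, n):
    """Scale off-diagonal entries by n and set each diagonal entry to 1 - n * m_j."""
    size = len(M)
    return [[1.0 - n * (1.0 - M[i][j]) if i == j else n * M[i][j]
             for j in range(size)] for i in range(size)]


# Hypothetical toy M1; every column sums to one.
M1 = [[0.990, 0.008, 0.005],
      [0.006, 0.980, 0.010],
      [0.004, 0.012, 0.985]]

exact = mat_pow(M1, 2)
approx = scaled_approx(M1, 2)
print(max(abs(exact[i][j] - approx[i][j]) for i in range(3) for j in range(3)))
print(scaled_approx(M1, 250)[0][0])  # negative: the shortcut has broken down
```

In this toy case the shortcut is close for small n, but at n = 250 it produces a negative "probability" on the diagonal, so the matrix power is the safer route for large PAM distances.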

PAM100 (or PAM250) will be sensitive to similarities between sequences that have diverged within the past 100 PAM (or 250 PAM). In general, we can expect there to have been significantly more divergence between sequences in 100 PAM or 250 PAM than in 1 PAM.

We can think of the PAM1, PAM100, and PAM250 scoring matrices as similarity measuring tools with differing levels of focus. In order for two sequences to have a relatively high PAM1 similarity score, they must have diverged very recently. PAM1 would in general give a low score to a pair of sequences that diverged 50 PAM ago. On the other hand, PAM100 scoring could still give a reasonably high similarity score to two such sequences (as could PAM250). Thus we can think of PAM100 as being sensitive to a broader range of sequence similarities than PAM1, and PAM250 to a still broader range.

Thus, if we BLAST a sequence against a protein database with PAM1, we would expect relatively few sequences in the database to return a high score (only those that diverged within about 1 PAM). On the other hand, if we BLAST with PAM250, we would expect many more sequences to return a high score (including many that diverged more than 1 PAM ago, but less than 250 PAM ago).

Suppose we had two sequences that had a high PAM1 similarity score. What would we expect them to look like? They would be nearly identical. We would expect them to differ only by about one residue per 100 (i.e., we would expect about 99 out of 100 residues to be the same).

Suppose we had two sequences that had a relatively high PAM250 similarity score. What would we expect them to look like? Said another way, suppose we applied $M_1$ to a sequence 250 times. How different would we expect it to look?

Suppose we follow individual residues through 250 successive applications of $M_1$. What sorts of things could happen? A residue could just stay the same through all 250 steps (on average, we could expect about 8% of the residues not to have changed at all). It could have changed somewhere along the way, and then at some later step changed back (remember that $M_1$ also assigns a probability to the reverse substitution). Thus, a certain portion of the residues would be the same as they started. Other residues would have changed, but the changes would be expected to reflect the probabilities in the $M_1$ matrix.

In some sense, we could think of PAM1 as representing a relatively small cloud of sequences around a given sequence. PAM100 would represent a larger cloud, and PAM250 an even larger cloud. High scores would occur for sequences within the cloud.

Why would we choose one PAM matrix over another? PAM250 will give relatively high scores to more distantly related sequences (as well as to closely related sequences). On the other hand, PAM250 is less discriminating than PAM100, and thus is more likely to (mistakenly) give a relatively high score to a sequence that is similar simply by chance, rather than because of evolutionary relatedness. There is a tradeoff between how broad a range of sequences will be recognized as similar, and how many randomly similar sequences will be (mistakenly) included.
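The figure of about 8% unchanged residues follows from a one-line calculation, sketched here under the simplifying assumption that every residue has the same 1% chance of changing at each step (in the real matrix the diagonal entries differ by amino acid). The function name is hypothetical.

```python
def fraction_never_changed(per_step_change=0.01, steps=250):
    """Probability that a residue survives every step without any substitution."""
    return (1.0 - per_step_change) ** steps


print(round(fraction_never_changed(), 3))  # roughly 0.08 after 250 PAM
```

Residues that changed and later changed back are not included in this figure, so the fraction of positions that look identical after 250 PAM is somewhat higher.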

Log Odds Matrix for a 250 PAM evolutionary distance

[Table: lower-triangular 20 x 20 matrix. Rows, top to bottom: Cys C, Ser S, Thr T, Pro P, Ala A, Gly G, Asn N, Asp D, Glu E, Gln Q, His H, Arg R, Lys K, Met M, Ile I, Leu L, Val V, Phe F, Tyr Y, Trp W. Columns, left to right: C S T P A G N D E Q H R K M I L V F Y W. Only the first entries survive transcription: C/C = 12, S/C = 0, S/S = 2.]

In this table, the amino acids are grouped according to the chemistry of the side group: C (sulfhydryl); S, T, P, A, G (small hydrophilic); N, D, E, Q (acid, acid amide, and hydrophilic); H, R, K (basic); M, I, L, V (small hydrophobic); and F, Y, W (aromatic).

The matrix was obtained by taking the log of each element in the relatedness odds matrix for 250 PAM. The elements in this matrix are multiplied by 10 for readability. A score of -10 means that a given pair would be expected to be aligned only one tenth as frequently in related sequences as random chance would predict; a score of 2 means that the pair would be expected to align 1.6 times as frequently.

The amino acids were arranged by assuming that positive values represent evolutionarily conservative replacements; the clusters correspond to groupings based on the physicochemical properties of the amino acids.
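The two worked values in the text (-10 and 2) can be reproduced by undoing the scaling and the logarithm. The sketch assumes base-10 logarithms, which is what the "one tenth" and "1.6 times" figures imply; the function name is hypothetical.

```python
def odds_from_pam_score(score, scale=10.0):
    """Convert a matrix entry of the form scale * log10(odds) back into the odds ratio."""
    return 10.0 ** (score / scale)


for s in (-10, 0, 2, 12):
    print(s, round(odds_from_pam_score(s), 2))
```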

About the PAM model

Some assumptions in the PAM model:

- This is a pointwise Markov model. In other words, a substitution at any given site depends only on the amino acid at that site and the probability given by the table. In particular, a substitution does not depend on nearby residues.

- The model assumes that sequences being compared have typical amino acid composition.

Some sources of error in the PAM model:

- Many sequences are not typical in composition.

- Rare substitutions may be observed too infrequently to reflect relative probabilities accurately. For example, in the original work, 36 amino acid pair substitutions were not observed at all.

- Extrapolating to higher order PAMs will multiply errors in PAM1.

- Markov models are an imperfect representation of evolutionary processes. For example, even distantly related sequences often have islands or blocks of conserved residues. This means that substitutions are not equally likely across entire sequences.

Scoring, Statistics and Expectations

Suppose we BLAST a sequence X against a database, and it reports a sequence Y with score S. What does the score S mean, and how much confidence can we have that the similarity is meaningful, and not just the result of a random coincidence?

One way to think about this is to imagine that we have a database of random sequences. How many sequences in the database would we expect to have a similarity score as high as S with our sequence? Or, perhaps, how large would the database have to be in order for us to expect at least one sequence with a score as high as S?

Let's assume that our sequence X has length m, and that each sequence in the random database has length n. If we assume that each sequence in the random database is constructed by an independent, identically distributed random process for each residue in the sequence, then we would expect the similarity scores with our sequence X to be normally distributed (this is by the central limit theorem).

We would expect the maximum scores across the database to follow an Extreme Value Distribution, and thus the expected values associated with scores S to be given by:

$$E(S) = K m n e^{-\lambda S}.$$

Here K and λ are scaling parameters for the size of the database and the scoring method, respectively.

If things are appropriately scaled, then an E(S) = 1 would mean that we would expect there to be about 1 sequence Y in the random database with a similarity score as high as S. An E(S) = 5 would mean we expect about 5 sequences in the database to have a score as high as S. If E(S) is less than one, that would mean that in order to expect even one random sequence to score as high as S, the random database would have to be larger. For example, an E(S) = 0.01 would mean that the database would have to be 100 times as big for us to expect even one match with a score as high as S.

In evaluating the results of BLASTing a sequence against a database, we can then use an E-value to assess our results. If the E-value is very small, then it is extremely unlikely that the S score is the result of a random coincidence. On the other hand, if the E-value is 1 or more, then there is a fairly high likelihood that the match is the result of a random coincidence (and the higher the E-value, the higher the likelihood). Fairly typically, people insist on an E-value less than about 0.05 before they have confidence that the match is likely to be meaningful.

One other thing we can do is to normalize the scores according to

$$S' = \frac{\lambda S - \ln K}{\ln 2}.$$

In effect, this sets the units of the score. The ln(2) in the denominator means that we are using bits as our units. If we use these normalized units, then the expectation is

$$E(S') = m n 2^{-S'}.$$

This normalization allows us to do reasonable comparisons of the results of various BLASTs.

Another issue is the fact that not all the sequences in the database will have the same length. In effect, we can account for this by treating the database as though it were one long sequence, and then, since the E-value scales as the length of the sequences in the database, we can simply multiply the pairwise E-values by the number of sequences in the database. Thus, as the database grows, so do the E-values.

It should be noted that the theoretical analysis underlying the foregoing has only really been done for ungapped scoring (i.e., no gaps allowed). On the other hand, empirical evidence suggests that the same general results hold for gapped scoring.

NOTE: much of this material is discussed in the NCBI BLAST tutorial at 1.html
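The E-value formula and the bit-score normalization are easy to check against each other numerically. In the sketch below the values of `K` and `lam` are placeholders chosen only for illustration (real values depend on the scoring matrix and gap penalties), and the function names are hypothetical.

```python
import math


def expect_value(S, m, n, K, lam):
    """E(S) = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * S)


def bit_score(S, K, lam):
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * S - math.log(K)) / math.log(2)


def expect_from_bits(S_bits, m, n):
    """E(S') = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-S_bits)


K, lam = 0.13, 0.32          # hypothetical scaling parameters
m, n = 250, 1_000_000        # query length and (total) database length

for S in (40, 60, 80):
    bits = bit_score(S, K, lam)
    print(S, round(bits, 1), expect_value(S, m, n, K, lam), expect_from_bits(bits, m, n))
```

The last two columns agree, which is the point of the normalization: once a score is expressed in bits, the E-value no longer depends on which K and λ produced it.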

General issues of scoring matrices

The results you will get from applying a local alignment algorithm will depend on the scoring matrix being used. All the scoring matrices in general use are of the form:

$$S_{ij} = \frac{1}{\lambda} \ln\left(\frac{q_{ij}}{p_i p_j}\right),$$

where $q_{ij}$ are the target frequencies (positive numbers that sum to 1), $p_i$ are the background frequencies of the residues, and λ is a scaling number (the same λ as above).

It is important to remember that the best scoring matrix to use for a given class of alignments is one whose target frequencies best characterize the class. The class of alignments depends on the specific characteristics of the research being done, the sample sequences being used, and the databases being searched. It is worth your while to think about these issues as you choose parameters for your BLAST searches.

It may also be worthwhile to explore various BLAST parameters to see what sorts of results are returned for a particular case at hand, and to progressively refine your search depending on the intermediate results you get.

One more example (codons)

Mutation costs for amino acids

[Table: rows Ala=A, Ser=S, Gly=G, Leu=L, Lys=K, Val=V, Thr=T, Pro=P, Glu=E, Asp=D, Asn=N, Ile=I, Gln=Q, Arg=R, Phe=F, Tyr=Y, Cys=C, His=H, Met=M, Trp=W, Glx=Z, Asx=B, ???=X; columns A S G L K V T P E D N I Q R F Y C H M W Z B X. Apart from zero diagonal entries and Ser/Ala = 1, the numeric entries did not survive transcription.]

The table is generated by calculating the minimum number of base changes required to convert an amino acid in row i to an amino acid in column j. Note that Met->Tyr is one change that requires all 3 codon positions to change.
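The rule used to generate the table (minimum number of base changes between any codon of one amino acid and any codon of the other) can be sketched as below. The sketch uses the standard genetic code written in the DNA alphabet; the compact table construction and the function names are illustrative choices, and the ambiguity codes Z, B, and X are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the one-letter amino acid aa."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def mutation_cost(aa1, aa2):
    """Minimum number of base changes turning some codon for aa1 into one for aa2."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in codons_for(aa1) for c2 in codons_for(aa2))


print(mutation_cost("A", "S"))  # Ala to Ser
print(mutation_cost("M", "Y"))  # Met to Tyr
```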

Nucleotide BLASTs

Many of these ideas can also be applied to the comparison of nucleotide sequences. However, there are a variety of differences:

- There are only four nucleotides, as opposed to the twenty amino acids. This means that the scoring matrix is much smaller. A typical very simple scoring matrix would look like:

[Table: 4 x 4 matrix with rows and columns A T C G; the numeric entries did not survive transcription.]

  In this matrix, we only score for a match or mismatch of nucleotides.

  A slightly more sophisticated scoring matrix might take into account the differences between purines and pyrimidines:

[Table: 4 x 4 matrix with rows and columns A T C G; the numeric entries did not survive transcription.]

  In this version of a scoring matrix, a purine to purine (A <-> G) or pyrimidine to pyrimidine (T <-> C) transition is considered more likely than a purine <-> pyrimidine (A <-> T, A <-> C, T <-> G, or C <-> G) transversion.
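Both kinds of nucleotide scoring matrix can be generated from a scoring rule. The numeric scores below (+1, -1, -2) are hypothetical stand-ins chosen only to show the structure; the function names are assumptions as well.

```python
PURINES = {"A", "G"}


def simple_score(a, b, match=1, mismatch=-1):
    """Score only for match versus mismatch."""
    return match if a == b else mismatch


def ts_tv_score(a, b, match=1, transition=-1, transversion=-2):
    """Penalize transversions (purine <-> pyrimidine) more than transitions."""
    if a == b:
        return match
    same_class = (a in PURINES) == (b in PURINES)
    return transition if same_class else transversion


def matrix(score, alphabet="ATCG"):
    """Tabulate a scoring function as a nested dict, matrix[row][column]."""
    return {a: {b: score(a, b) for b in alphabet} for a in alphabet}


for row, cols in matrix(ts_tv_score).items():
    print(row, [cols[c] for c in "ATCG"])
```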

- Straight nucleotide sequence to nucleotide sequence comparisons can be useful, but they do not easily reflect the fact that much of the genome information in which we are interested consists of sequences that are translated into amino acid sequences (proteins). Thus, a very typical approach is first to translate a given nucleotide sequence into a corresponding amino acid sequence, and then BLAST the resulting sequence against protein databases.

  For a typical nucleotide sequence (e.g., a sequence derived from the direct analysis of the genome of some species), there are six possible translation frames: three in the forward direction, and three in the reverse direction, using the Watson-Crick complementary sequence. These translations ordinarily do not take account of possible introns, but allowing gaps in the alignments may handle this.
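The six translation frames can be sketched as follows, using the standard genetic code in the DNA alphabet. The frame labels (+1 to +3, -1 to -3) and function names are illustrative assumptions; the sketch handles only the letters A, C, G, T and ignores introns.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Watson-Crick complement, read in the opposite direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a STOP codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the three forward and three reverse translations of dna."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translated sequences would then be compared against the protein database.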

  Because of the redundancy of the genetic code (i.e., multiple codons code for the same amino acid), it is not particularly easy to recode amino acid sequences as nucleotide sequences. There might be trillions of different nucleotide sequences, each of which encodes the same amino acid sequence.

- There is another potential difficulty, for which provision is sometimes made. The genetic code is often thought of as universal, in the sense that the same codons code for the same amino acids and the same START and STOP codes are used in the vast majority of genes in nearly all species. However, some exceptions have been found. Exceptions often involve using one or two of the three STOP codons to code for an amino acid instead.

  Mitochondrial genes are one place where alternative codings have been discovered. Animal and microorganism (but not plant, apparently) mitochondria use UGA to encode tryptophan (Trp) rather than as a chain terminator. In addition, most animal mitochondria use AUA to code for methionine instead of isoleucine. However, all vertebrate mitochondria seem to use AGA and AGG as chain terminators (STOP codons). Yeast mitochondria assign all codons beginning with CU to threonine instead of leucine (which is still encoded by UUA and UUG, as it is in normal cytosolic mRNA).

  Plant mitochondria use the universal code, and this has permitted angiosperms to transfer mitochondrial genes to their nucleus.

  Exceptions to the universal code seem to be far rarer for nuclear genes. A few unicellular eukaryotes have been found that use one or two (of their three) STOP codons for amino acids instead.

  These examples are all simple code substitutions (where a codon is used for another purpose, but the same 20 amino acids are used). The vast majority of proteins are constructed from the standard 20 amino acids, although some of these may be chemically altered, e.g. by phosphorylation, after mRNA to amino acid translation has occurred.

  However, at least two cases have been found where an amino acid other than one of the standard 20 is inserted by a tRNA into the growing polypeptide. The two nonstandard amino acids that have been observed are:

  - Selenocysteine. In certain Archaea, eubacteria, and animals, the codon UGA sometimes codes for selenocysteine, but UGA is still often used as a STOP codon.

  - Pyrrolysine. In one gene found in a member of the Archaea, the codon UAG is sometimes used for pyrrolysine. Again, UAG may still be used as a STOP codon.

  In both of these cases, the codon (UGA for selenocysteine, or UAG for pyrrolysine) is sometimes used to code for the alternative amino acid, but is often still used as a STOP codon. How the ribosomal translation machinery knows, when it encounters a UGA or UAG codon, whether to use a special tRNA to insert selenocysteine or pyrrolysine, or simply to stop translation, is not yet known.

For example, in the Biology Workbench BLASTx (nucleotide to protein translation and then BLAST), the following genetic codes are available:

- Standard
- Vertebrate mitochondrial
- Yeast mitochondrial
- Mold mitochondrial
- Invertebrate mitochondrial
- Ciliate nuclear
- Echinoderm mitochondrial
- Euplotid nuclear
- Bacterial
- Alternative yeast nuclear
- Ascidian mitochondrial
- Flatworm mitochondrial
- Blepharisma macronuclear
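The effect of choosing a different genetic code can be sketched by overriding a few entries of the standard table. The overrides below encode only the vertebrate mitochondrial differences mentioned in the text (UGA for Trp, AUA for Met, AGA and AGG as STOP), written in the DNA alphabet; the names are hypothetical and this is not how any particular BLAST implementation stores its tables.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def make_code(overrides):
    """Copy the standard table and replace the listed codon assignments."""
    table = dict(STANDARD)
    table.update(overrides)
    return table


# Differences described above for vertebrate mitochondria.
VERTEBRATE_MITO = make_code({"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"})


def translate(dna, table=STANDARD):
    """Translate complete codons from the start of dna; '*' marks a STOP codon."""
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


dna = "ATGATATGAAGA"
print(translate(dna))                   # standard code
print(translate(dna, VERTEBRATE_MITO))  # vertebrate mitochondrial code
```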


More information

www.lessonplansinc.com Topic: Gene Mutations WS Summary: Students will learn about frame shift mutations and base substitution mutations. Goals & Objectives: Students will be able to demonstrate how mutations

More information

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with

More information

Problem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below.

Problem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below. Problem Set Unit 3 Name 1. Which molecule is found in both DNA and RNA? A. Ribose B. Uracil C. Phosphate D. Amino acid 2. Which molecules form the nucleotide marked in the diagram? A. phosphate, deoxyribose

More information

Key Concept Translation converts an mrna message into a polypeptide, or protein.

Key Concept Translation converts an mrna message into a polypeptide, or protein. 8.5 Translation VOBLRY translation codon stop codon start codon anticodon Key oncept Translation converts an mrn message into a polypeptide, or protein. MIN IDES mino acids are coded by mrn base sequences.

More information

Daily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos

Daily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos Daily Agenda Warm Up: Review Translation Notes Protein Synthesis Practice Redos 1. What is DNA Replication? 2. Where does DNA Replication take place? 3. Replicate this strand of DNA into complimentary

More information

Biomolecules: lecture 6

Biomolecules: lecture 6 Biomolecules: lecture 6 - to learn the basics on how DNA serves to make RNA = transcription - to learn how the genetic code instructs protein synthesis - to learn the basics on how proteins are synthesized

More information

7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided.

7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided. 7.014 Quiz II 3/18/05 Your Name: TA's Name: Write your name on this page and your initials on all the other pages in the space provided. This exam has 10 pages including this coversheet. heck that you

More information

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1 BSCI348S Fall 2003 Midterm 1 Multiple Choice: select the single best answer to the question or completion of the phrase. (5 points each) 1. The field of bioinformatics a. uses biomimetic algorithms to

More information

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments BLAST 100 times faster than dynamic programming. Good for database searches. Derive a list of words of length w from query (e.g., 3 for protein, 11 for DNA) High-scoring words are compared with database

More information

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem. Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif

More information

DNA Begins the Process

DNA Begins the Process Biology I D N A DNA contains genes, sequences of nucleotide bases These Genes code for polypeptides (proteins) Proteins are used to build cells and do much of the work inside cells DNA Begins the Process

More information

Evolutionary Genetics. LV Lecture with exercises 6KP

Evolutionary Genetics. LV Lecture with exercises 6KP Evolutionary Genetics LV 25600-01 Lecture with exercises 6KP HS2017 >What_is_it? AATGATACGGCGACCACCGAGATCTACACNNNTC GTCGGCAGCGTC 2 NCBI MegaBlast search (09/14) 3 NCBI MegaBlast search (09/14) 4 Submitted

More information

DNA. translation. base pairing rules for DNA Replication. thymine. cytosine. amino acids. The building blocks of proteins are?

DNA. translation. base pairing rules for DNA Replication. thymine. cytosine. amino acids. The building blocks of proteins are? 2 strands, has the 5-carbon sugar deoxyribose, and has the nitrogen base Thymine. The actual process of assembling the proteins on the ribosome is called? DNA translation Adenine pairs with Thymine, Thymine

More information

From DNA to Protein: Genotype to Phenotype

From DNA to Protein: Genotype to Phenotype 12 From DNA to Protein: Genotype to Phenotype 12.1 What Is the Evidence that Genes Code for Proteins? The gene-enzyme relationship is one-gene, one-polypeptide relationship. Example: In hemoglobin, each

More information

Mutation Rates and Sequence Changes

Mutation Rates and Sequence Changes s and Sequence Changes part of Fortgeschrittene Methoden in der Bioinformatik Computational EvoDevo University Leipzig Leipzig, WS 2011/12 From Molecular to Population Genetics molecular level substitution

More information

Problem: The GC base pairs are more stable than AT base pairs. Why? 5. Triple-stranded DNA was first observed in 1957. Scientists later discovered that the formation of triplestranded DNA involves a type

More information

Chapter 14 Active Reading Guide From Gene to Protein

Chapter 14 Active Reading Guide From Gene to Protein Name: AP Biology Mr. Croft Chapter 14 Active Reading Guide From Gene to Protein This is going to be a very long journey, but it is crucial to your understanding of biology. Work on this chapter a single

More information

Degenerate Code. Translation. trna. The Code is Degenerate trna / Proofreading Ribosomes Translation Mechanism

Degenerate Code. Translation. trna. The Code is Degenerate trna / Proofreading Ribosomes Translation Mechanism Translation The Code is Degenerate trna / Proofreading Ribosomes Translation Mechanism Degenerate Code There are 64 possible codon triplets There are 20 naturally-encoding amino acids Several codons specify

More information

Thr Gly Tyr. Gly Lys Asn

Thr Gly Tyr. Gly Lys Asn Your unique body characteristics (traits), such as hair color or blood type, are determined by the proteins your body produces. Proteins are the building blocks of life - in fact, about 45% of the human

More information

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns Folding simulation: self-organization of 4-helix bundle protein yellow = helical turns Protein structure Protein: heteropolymer chain made of amino acid residues R + H 3 N - C - COO - H φ ψ Chain of amino

More information

Disease and selection in the human genome 3

Disease and selection in the human genome 3 Disease and selection in the human genome 3 Ka/Ks revisited Please sit in row K or forward RBFD: human populations, adaptation and immunity Neandertal Museum, Mettman Germany Sequence genome Measure expression

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS 1 CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Jean Gao at UT Arlington Mingon Kang, PhD Computer Science, Kennesaw State University 2 Genetics The discovery of

More information

Level 2 Biology, 2017

Level 2 Biology, 2017 91159 911590 2SUPERVISOR S Level 2 Biology, 2017 91159 Demonstrate understanding of gene expression 2.00 p.m. Wednesday 22 November 2017 Credits: Four Achievement Achievement with Merit Achievement with

More information

Basic Biology. Gina Cannarozzi. 28th October Basic Biology. Gina. Introduction DNA. Proteins. Central Dogma.

Basic Biology. Gina Cannarozzi. 28th October Basic Biology. Gina. Introduction DNA. Proteins. Central Dogma. Cannarozzi 28th October 2005 Class Overview RNA Protein Genomics Transcriptomics Proteomics Genome wide Genome Comparison Microarrays Orthology: Families comparison and Sequencing of Transcription factor

More information

DNA/RNA. Transcription and Translation

DNA/RNA. Transcription and Translation DNA/RNA Transcription and Translation Review DNA is responsible for controlling the production of proteins in the cell, which is essential to life DNA RNA Proteins Chromosomes contain several thousand

More information

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS

UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS UNIT (12) MOLECULES OF LIFE: NUCLEIC ACIDS Nucleic acids are extremely large molecules that were first isolated from the nuclei of cells. Two kinds of nucleic acids are found in cells: RNA (ribonucleic

More information

Database Searching and BLAST Dannie Durand

Database Searching and BLAST Dannie Durand Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is

More information

2. From the first paragraph in this section, find three ways in which RNA differs from DNA.

2. From the first paragraph in this section, find three ways in which RNA differs from DNA. Name Chapter 17: From Gene to Protein Begin reading at page 328 Basic Principles of Transcription and Translation. Work on this chapter a single concept at a time, and expect to spend at least 6 hours

More information

FROM MOLECULES TO LIFE

FROM MOLECULES TO LIFE Chapter 7 (Strickberger) FROM MOLECULES TO LIFE Organisms depended on processes that transformed materials available outside of the cell into metabolic products necessary for cellular life. These processes

More information

PROTEIN SYNTHESIS Study Guide

PROTEIN SYNTHESIS Study Guide PART A. Read the following: PROTEIN SYNTHESIS Study Guide Protein synthesis is the process used by the body to make proteins. The first step of protein synthesis is called Transcription. It occurs in the

More information

7.012 Final Exam

7.012 Final Exam 7.012 Final Exam 2006 You have 180 minutes to complete this exam. There are 19 pages including this cover page, the AMINO AID page, and the GENETI ODE page at the end of the exam. Please write your name

More information

7.013 Problem Set 3 FRIDAY October 8th, 2004

7.013 Problem Set 3 FRIDAY October 8th, 2004 MIT Biology Department 7.012: Introductory Biology - Fall 2004 Instructors: Professor Eric Lander, Professor Robert. Weinberg, Dr. laudette ardel Name: T: 7.013 Problem Set 3 FRIDY October 8th, 2004 Problem

More information

DNA- THE MOLECULE OF LIFE. Link

DNA- THE MOLECULE OF LIFE. Link DNA- THE MOLECULE OF LIFE Link STRUCTURE OF DNA DNA (Deoxyribonucleic Acid): DNA is a long, stringy, twisted molecule made up of nucleotides that carries genetic information. DISCOVERIES Rosalind Franklin,

More information

Just one nucleotide! Exploring the effects of random single nucleotide mutations

Just one nucleotide! Exploring the effects of random single nucleotide mutations Dr. Beatriz Gonzalez In-Class Worksheet Name: Learning Objectives: Just one nucleotide! Exploring the effects of random single nucleotide mutations Given a coding DNA sequence, determine the mrna Based

More information

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing

More information

Zool 3200: Cell Biology Exam 3 3/6/15

Zool 3200: Cell Biology Exam 3 3/6/15 Name: Trask Zool 3200: Cell Biology Exam 3 3/6/15 Answer each of the following questions in the space provided; circle the correct answer or answers for each multiple choice question and circle either

More information

Lecture 11: Gene Prediction

Lecture 11: Gene Prediction Lecture 11: Gene Prediction Study Chapter 6.11-6.14 1 Gene: A sequence of nucleotides coding for protein Gene Prediction Problem: Determine the beginning and end positions of genes in a genome Where are

More information

Chapter 17: From Gene to Protein

Chapter 17: From Gene to Protein Name Period This is going to be a very long journey, but it is crucial to your understanding of biology. Work on this chapter a single concept at a time, and expect to spend at least 6 hours to truly master

More information

Transcription and Translation

Transcription and Translation Biology Name: Morales Date: Period: Transcription and Translation Directions: Read the following and answer the questions in complete sentences. DNA is the molecule of heredity it determines an organism

More information

Adv Biology: DNA and RNA Study Guide

Adv Biology: DNA and RNA Study Guide Adv Biology: DNA and RNA Study Guide Chapter 12 Vocabulary -Notes What experiments led up to the discovery of DNA being the hereditary material? o The discovery that DNA is the genetic code involved many

More information

Chapter 14: Gene Expression: From Gene to Protein

Chapter 14: Gene Expression: From Gene to Protein Chapter 14: Gene Expression: From Gene to Protein This is going to be a very long journey, but it is crucial to your understanding of biology. Work on this chapter a single concept at a time, and expect

More information

Honors packet Instructions

Honors packet Instructions Honors packet Instructions The following are guidelines in order for you to receive FULL credit for this bio packet: 1. Read and take notes on the packet in full 2. Answer the multiple choice questions

More information

Chapter 10: Gene Expression and Regulation

Chapter 10: Gene Expression and Regulation Chapter 10: Gene Expression and Regulation Fact 1: DNA contains information but is unable to carry out actions Fact 2: Proteins are the workhorses but contain no information THUS Information in DNA must

More information

CFSSP: Chou and Fasman Secondary Structure Prediction server

CFSSP: Chou and Fasman Secondary Structure Prediction server Wide Spectrum, Vol. 1, No. 9, (2013) pp 15-19 CFSSP: Chou and Fasman Secondary Structure Prediction server T. Ashok Kumar Department of Bioinformatics, Noorul Islam College of Arts and Science, Kumaracoil

More information

DNA- THE MOLECULE OF LIFE

DNA- THE MOLECULE OF LIFE DNA- THE MOLECULE OF LIFE STRUCTURE OF DNA DNA (Deoxyribonucleic Acid): DNA is a long, stringy, twisted molecule made up of nucleotides that carries genetic information. DISCOVERIES Rosalind Franklin,

More information

Solutions to Quiz II

Solutions to Quiz II MIT Department of Biology 7.014 Introductory Biology, Spring 2005 Solutions to 7.014 Quiz II Class Average = 79 Median = 82 Grade Range % A 90-100 27 B 75-89 37 C 59 74 25 D 41 58 7 F 0 40 2 Question 1

More information

RNA and Protein Synthesis

RNA and Protein Synthesis Harriet Wilson, Lecture Notes Bio. Sci. 4 - Microbiology Sierra College RNA and Protein Synthesis Considerable evidence suggests that RNA molecules evolved prior to DNA molecules and proteins, and that

More information

RNA and PROTEIN SYNTHESIS. Chapter 13

RNA and PROTEIN SYNTHESIS. Chapter 13 RNA and PROTEIN SYNTHESIS Chapter 13 DNA Double stranded Thymine Sugar is RNA Single stranded Uracil Sugar is Ribose Deoxyribose Types of RNA 1. Messenger RNA (mrna) Carries copies of instructions from

More information

1. DNA replication. (a) Why is DNA replication an essential process?

1. DNA replication. (a) Why is DNA replication an essential process? ame Section 7.014 Problem Set 3 Please print out this problem set and record your answers on the printed copy. Answers to this problem set are to be turned in to the box outside 68120 by 5:00pm on Friday

More information

36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L-

36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L- 36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L- 37. The essential fatty acids are A. palmitic acid B. linoleic acid C. linolenic

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Exam Chapter 17 Genes to Proteins Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. The following questions refer to Figure 17.1, a simple metabolic

More information

Welcome to Genome 371!

Welcome to Genome 371! Genome 371, 4 Jan 2010, Lecture 1 Welcome to Genome 371! If you are not registered - please don t take a seat! (class is full) - see Anne Paul (outside) to get on the wait list If you are registered and

More information

CHAPTER 21 LECTURE SLIDES

CHAPTER 21 LECTURE SLIDES CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

Neurospora mutants. Beadle & Tatum: Neurospora molds. Mutant A: Mutant B: HOW? Neurospora mutants

Neurospora mutants. Beadle & Tatum: Neurospora molds. Mutant A: Mutant B: HOW? Neurospora mutants Chapter 10: Central Dogma Gene Expression and Regulation Mutant A: Neurospora mutants Mutant B: Not made Not made Fact 1: DNA contains information but is unable to carry out actions Fact 2: Proteins are

More information

DNA Structure and Replication, and Virus Structure and Replication Test Review

DNA Structure and Replication, and Virus Structure and Replication Test Review DNA Structure and Replication, and Virus Structure and Replication Test Review What does DNA stand for? Deoxyribonucleic Acid DNA is what type of macromolecule? DNA is a nucleic acid The building blocks

More information

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores.

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores. 1 Introduction 2 Chromosomes Topology & Counts 3 Genome size 4 Replichores and gene orientation 5 Chirochores 6 7 Codon usage 121 marc.bailly-bechet@univ-lyon1.fr Bacterial genome structures Introduction

More information

Name Date of Data Collection. Class Period Lab Days/Period Teacher

Name Date of Data Collection. Class Period Lab Days/Period Teacher Comparing Primates (adapted from Comparing Primates Lab, page 431-438, Biology Lab Manual, by Miller and Levine, Prentice Hall Publishers, copyright 2000, ISBN 0-13-436796-0) Background: One of the most

More information

PROTEIN SYNTHESIS Flow of Genetic Information The flow of genetic information can be symbolized as: DNA RNA Protein

PROTEIN SYNTHESIS Flow of Genetic Information The flow of genetic information can be symbolized as: DNA RNA Protein PROTEIN SYNTHESIS Flow of Genetic Information The flow of genetic information can be symbolized as: DNA RNA Protein This is also known as: The central dogma of molecular biology Protein Proteins are made

More information

Genetics I. DNA, RNA and protein structure

Genetics I. DNA, RNA and protein structure DNA, RNA and protein structure Genetic information is stored in: - nucleus genome - mitochondria mitochondriome - chloroplasts plastome - cellular parasites (viruses) - genomic parasites (transposons)

More information

INTRODUCTION TO THE MOLECULAR GENETICS OF THE COLOR MUTATIONS IN ROCK POCKET MICE

INTRODUCTION TO THE MOLECULAR GENETICS OF THE COLOR MUTATIONS IN ROCK POCKET MICE The Making of the The Fittest: Making of the Fittest Natural Selection Natural and Adaptation Selection and Adaptation Educator Materials TEACHER MATERIALS INTRODUCTION TO THE MOLECULAR GENETICS OF THE

More information

C. Incorrect! Threonine is an amino acid, not a nucleotide base.

C. Incorrect! Threonine is an amino acid, not a nucleotide base. MCAT Biology - Problem Drill 05: RNA and Protein Biosynthesis Question No. 1 of 10 1. Which of the following bases are only found in RNA? Question #01 (A) Ribose. (B) Uracil. (C) Threonine. (D) Adenine.

More information

Gene Expression REVIEW Packet

Gene Expression REVIEW Packet Name Pd. # Gene Expression REVIEW Packet 1. Fill-in-the-blank General Summary Transcription & the Big picture Like, ribonucleic acid (RNA) is a acid a molecule made of nucleotides linked together. RNA

More information

Gene Expression Translation U C A G A G

Gene Expression Translation U C A G A G Why? ene Expression Translation How do cells synthesize polypeptides and convert them to functional proteins? The message in your DN of who you are and how your body works is carried out by cells through

More information

DNA & Protein Synthesis UNIT D & E

DNA & Protein Synthesis UNIT D & E DNA & Protein Synthesis UNIT D & E How this Unit is broken down Chapter 10.1 10.3 The structure of the genetic material Chapter 10.4 & 10.5 DNA replication Chapter 10.6 10.15 The flow of genetic information

More information

MOLECULAR GENETICS PROTEIN SYNTHESIS. Molecular Genetics Activity #2 page 1

MOLECULAR GENETICS PROTEIN SYNTHESIS. Molecular Genetics Activity #2 page 1 AP BIOLOGY MOLECULAR GENETICS ACTIVITY #2 NAME DATE HOUR PROTEIN SYNTHESIS Molecular Genetics Activity #2 page 1 GENETIC CODE PROTEIN SYNTHESIS OVERVIEW Molecular Genetics Activity #2 page 2 PROTEIN SYNTHESIS

More information

Worksheet: Mutations Practice

Worksheet: Mutations Practice Worksheet: Mutations Practice There are three ways that DNA can be altered when a mutation (change in DNA sequence) occurs. 1. Substitution one base-pairs is replaced by another: Example: G to C or A to

More information

KEY CONCEPT DNA was identified as the genetic material through a series of experiments. Found live S with R bacteria and injected

KEY CONCEPT DNA was identified as the genetic material through a series of experiments. Found live S with R bacteria and injected Section 1: Identifying DNA as the Genetic Material KEY CONCEPT DNA was identified as the genetic material through a series of experiments. VOCABULARY bacteriophage MAIN IDEA: Griffith finds a transforming

More information

Biology: The substrate of bioinformatics

Biology: The substrate of bioinformatics Bi01_1 Unit 01: Biology: The substrate of bioinformatics What is Bioinformatics? Bi01_2 handling of information related to living organisms understood on the basis of molecular biology Nature does it.

More information

7.2 Protein Synthesis. From DNA to Protein Animation

7.2 Protein Synthesis. From DNA to Protein Animation 7.2 Protein Synthesis From DNA to Protein Animation Proteins Why are proteins so important? They break down your food They build up muscles They send signals through your brain that control your body They

More information

Sequence Databases and database scanning

Sequence Databases and database scanning Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.

More information

Alpha-helices, beta-sheets and U-turns within a protein are stabilized by (hint: two words).

Alpha-helices, beta-sheets and U-turns within a protein are stabilized by (hint: two words). 1 Quiz1 Q1 2011 Alpha-helices, beta-sheets and U-turns within a protein are stabilized by (hint: two words) Value Correct Answer 1 noncovalent interactions 100% Equals hydrogen bonds (100%) Equals H-bonds

More information

Problem Set 8. Answer Key

Problem Set 8. Answer Key MCB 102 University of California, Berkeley August 11, 2009 Isabelle Philipp Online Document Problem Set 8 Answer Key 1. The Genetic Code (a) Are all amino acids encoded by the same number of codons? no

More information

Protein Synthesis. Lab Exercise 12. Introduction. Contents. Objectives

Protein Synthesis. Lab Exercise 12. Introduction. Contents. Objectives Lab Exercise Protein Synthesis Contents Objectives 1 Introduction 1 Activity.1 Overview of Process 2 Activity.2 Transcription 2 Activity.3 Translation 3 Resutls Section 4 Introduction Having information

More information

Protein Synthesis

Protein Synthesis HEBISD Student Expectations: Identify that RNA Is a nucleic acid with a single strand of nucleotides Contains the 5-carbon sugar ribose Contains the nitrogen bases A, G, C and U instead of T. The U is

More information

Codon Bias with PRISM. 2IM24/25, Fall 2007

Codon Bias with PRISM. 2IM24/25, Fall 2007 Codon Bias with PRISM 2IM24/25, Fall 2007 from RNA to protein mrna vs. trna aminoacid trna anticodon mrna codon codon-anticodon matching Watson-Crick base pairing A U and C G binding first two nucleotide

More information

Do you think DNA is important? T.V shows Movies Biotech Films News Cloning Genetic Engineering

Do you think DNA is important? T.V shows Movies Biotech Films News Cloning Genetic Engineering DNA Introduction Do you think DNA is important? T.V shows Movies Biotech Films News Cloning Genetic Engineering At the most basic level DNA is a set of instructions for protein construction. Structural

More information

Mutagenesis. Classification of mutation. Spontaneous Base Substitution. Molecular Mutagenesis. Limits to DNA Pol Fidelity.

Mutagenesis. Classification of mutation. Spontaneous Base Substitution. Molecular Mutagenesis. Limits to DNA Pol Fidelity. Mutagenesis 1. Classification of mutation 2. Base Substitution 3. Insertion Deletion 4. s 5. Chromosomal Aberration 6. Repair Mechanisms Classification of mutation 1. Definition heritable change in DNA

More information

GENE EXPRESSION AT THE MOLECULAR LEVEL. Copyright (c) The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

GENE EXPRESSION AT THE MOLECULAR LEVEL. Copyright (c) The McGraw-Hill Companies, Inc. Permission required for reproduction or display. GENE EXPRESSION AT THE MOLECULAR LEVEL Copyright (c) The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1 Gene expression Gene function at the level of traits Gene function

More information

M I C R O B I O L O G Y WITH DISEASES BY TAXONOMY, THIRD EDITION

M I C R O B I O L O G Y WITH DISEASES BY TAXONOMY, THIRD EDITION M I C R O B I O L O G Y WITH DISEASES BY TAXONOMY, THIRD EDITION Chapter 7 Microbial Genetics Lecture prepared by Mindy Miller-Kittrell, University of Tennessee, Knoxville The Structure and Replication

More information

What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!!

What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!! What happens after DNA Replication??? Transcription, translation, gene expression/protein synthesis!!!! Protein Synthesis/Gene Expression Why do we need to make proteins? To build parts for our body as

More information

PROTEIN SYNTHESIS TRANSCRIPTION AND TRANSLATION

PROTEIN SYNTHESIS TRANSCRIPTION AND TRANSLATION ame Biology eriod Date RTEI SYTHESIS TRSRITI D TRSLTI D is the molecule that stores the genetic information in your cells. That information is coded in the four s of D: (cytosine), (guanine), (adenine),

More information

translation The building blocks of proteins are? amino acids nitrogen containing bases like A, G, T, C, and U Complementary base pairing links

translation The building blocks of proteins are? amino acids nitrogen containing bases like A, G, T, C, and U Complementary base pairing links The actual process of assembling the proteins on the ribosome is called? translation The building blocks of proteins are? Complementary base pairing links Define and name the Purines amino acids nitrogen

More information

CHAPTER 1. DNA: The Hereditary Molecule SECTION D. What Does DNA Do? Chapter 1 Modern Genetics for All Students S 33

CHAPTER 1. DNA: The Hereditary Molecule SECTION D. What Does DNA Do? Chapter 1 Modern Genetics for All Students S 33 HPER 1 DN: he Hereditary Molecule SEION D What Does DN Do? hapter 1 Modern enetics for ll Students S 33 D.1 DN odes For Proteins PROEINS DO HE nitty-gritty jobs of every living cell. Proteins are the molecules

More information

Protein Synthesis Making Proteins

Protein Synthesis Making Proteins Protein Synthesis Making Proteins 2009-2010 Bodies Cells DNA Bodies are made up of cells All cells run on a set of instructions spelled out in DNA DNA Cells Bodies How does DNA code for cells & bodies?

More information

Protein Synthesis: Transcription and Translation

Protein Synthesis: Transcription and Translation Review Protein Synthesis: Transcription and Translation Central Dogma of Molecular Biology Protein synthesis requires two steps: transcription and translation. DNA contains codes Three bases in DNA code

More information

DNA Replication and Protein Synthesis

DNA Replication and Protein Synthesis DNA Replication and Protein Synthesis DNA is Deoxyribonucleic Acid. It holds all of our genetic information which is passed down through sexual reproduction DNA has three main functions: 1. DNA Controls

More information

Molecular Genetics. Before You Read. Read to Learn

Molecular Genetics. Before You Read. Read to Learn 12 Molecular Genetics section 3 DNA,, and Protein DNA codes for, which guides protein synthesis. What You ll Learn the different types of involved in transcription and translation the role of polymerase

More information

Bio 102 Practice Problems Genetic Code and Mutation

Bio 102 Practice Problems Genetic Code and Mutation Bio 102 Practice Problems Genetic Code and Mutation Multiple choice: Unless otherwise directed, circle the one best answer: 1. Beadle and Tatum mutagenized Neurospora to find strains that required arginine

More information

Gene Expression Transcription/Translation Protein Synthesis

Gene Expression Transcription/Translation Protein Synthesis Gene Expression Transcription/Translation Protein Synthesis 1. Describe how genetic information is transcribed into sequences of bases in RNA molecules and is finally translated into sequences of amino

More information

Chem 465 Biochemistry II

Chem 465 Biochemistry II Chem 465 Biochemistry II Name: 2 points Multiple choice (4 points apiece): 1. Which of the following is not true of trna molecules? A) The 3'-terminal sequence is -CCA. B) Their anticodons are complementary

More information