ENGR 213 Bioengineering Fundamentals April 25, A very coarse introduction to bioinformatics

In this exercise, you will get a quick primer on how DNA is used to manufacture proteins. You will learn a little about how the building blocks of these proteins may change due to point mutations, and you will investigate how a statistical description of these mutations can be used as evidence supporting hypothesized evolutionary paths.

Summary of genetic machinery

DNA sequences are composed of four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). The two strands of DNA are composed of nucleotides with complementary bases, where A pairs with T and C pairs with G. An example of the two strands of a fragment of DNA is shown here:

DNA   (3' end)...cacctacgttcaggaggtcaggactggtac... (5' end)
      (5' end)...gtggatgcaagtcctccagtcctgaccatg... (3' end)

A DNA sequence is transcribed into mRNA starting at the 3' end of the DNA template strand. Each nucleotide of the DNA strand attracts its complementary RNA base through a pairing rule. mRNA is composed of A, G, and C bases, with uracil (U) substituted for thymine. The pairing rule is shown below:

DNA -> mRNA
 A  ->  U
 T  ->  A
 G  ->  C
 C  ->  G

DNA   (3' end)...cacctacgttcaggaggtcaggactggtac... (5' end)
mRNA  (5' end)...guggaugcaaguccuccaguccugaccaug... (3' end)

mRNA is translated into a sequence of amino acids by ribosomes. This process is initiated by a ribosome with the amino acid methionine attached to it. The ribosome latches onto the 5' end of the mRNA strand and moves downstream until it finds a sequence of three nucleotides, AUG, known as the start codon. This codon triggers the ribosome to start building a polypeptide with methionine as the first amino acid in the chain. The ribosome continues down the mRNA strand by binding to the next triplet (without re-using any of the mRNA nucleotides) and matching each subsequent triplet with a particular amino acid.

This continues until the ribosome reaches a stop codon, which signifies the end of the chain of amino acids that make up the particular protein coded by this genetic sequence. The triplets and their corresponding amino acids are displayed in the matrix on the following page.
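The pairing rule above can be applied mechanically. The following is a minimal Python sketch, not part of the original exercise; the names `DNA_TO_MRNA` and `transcribe` are made up for illustration. It assumes the template strand is written from its 3' end to its 5' end, as in the example above, so that the resulting mRNA reads 5' to 3'.

```python
# Pairing rule from the text: DNA template base -> mRNA base.
DNA_TO_MRNA = {"a": "u", "t": "a", "g": "c", "c": "g"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5' to 3') for a DNA template strand written 3' to 5'."""
    return "".join(DNA_TO_MRNA[base] for base in template_3_to_5.lower())


# The DNA fragment used in the example above.
template = "cacctacgttcaggaggtcaggactggtac"
mrna = transcribe(template)
print(mrna)  # guggaugcaaguccuccaguccugaccaug
```

The printed string matches the mRNA strand shown in the example.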

An alternate way of visualizing this information is the codon wheel:

The red letters in the codon wheel are one-letter codes for the twenty amino acids. They can be used to write out the sequence of amino acids that makes up a single protein. The ribosome finds the beginning of the gene by looking for a start codon and then reads the mRNA sequence as a series of triplets until it reaches a stop codon.

             start codon
                  v
mRNA (5' end) GUGGAUGCAAGUCCUCCAGUCCUGACCAUG (3' end)

mRNA triplets   AUG  CAA  GUC  CUC  CAG  UCC  UGA
              (start)                       (stop)
Amino acids      M    Q    V    L    Q    S

Notice that the codon table (or wheel) shows two different mRNA codes for glutamine (Q), which differ in the third base of the triplet. Either a CAG or a CAA triplet will cause a glutamine to be added to the polypeptide chain. The code is redundant, since several codons may specify the same amino acid, but it is not ambiguous, since no codon specifies more than one amino acid. Redundancy is a trait that protects the genetic code from some point mutations.

A point mutation is the accidental substitution of a single nucleotide with another nucleotide. A point mutation is responsible for sickle cell disease: a T is substituted for the A in a GAG triplet. The sequence GAG codes for glutamic acid, whereas the sequence GTG codes for valine. The substitution of one amino acid for another can cause the resulting protein to have a different shape and/or functionality. Point mutations in the third base position are sometimes silent: they change the genetic code but do not change the sequence of amino acids in the resulting protein. Point mutations in the first and second positions are much more likely to change the structure of a protein.
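The translation procedure and the idea of a silent mutation can be sketched in Python as follows. This is only an illustration under stated assumptions: the codon table is the standard genetic code written in the usual U, C, A, G ordering, and the helper names `translate` and `is_silent` are hypothetical, not part of the exercise.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end of the strand)."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_silent(codon: str, position: int, new_base: str) -> bool:
    """True if replacing the base at `position` (0, 1, or 2) keeps the amino acid."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


print(translate("GUGGAUGCAAGUCCUCCAGUCCUGACCAUG"))  # MQVLQS
print(is_silent("CAA", 2, "G"))  # True: CAA and CAG both code for glutamine
print(is_silent("GAG", 1, "U"))  # False: GAG is glutamic acid, GUG is valine
```

The last call is the mRNA form of the sickle cell substitution described above (GAG to GUG in mRNA corresponds to GAG to GTG in the DNA coding sequence).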

Properties of amino acids

This Venn diagram relates some of the properties of amino acids. Substitutions between amino acids with similar properties are less likely to change the function of the protein than substitutions between amino acids whose properties are very different.

Evolution and bioinformatics

Bioinformatics is the study of genetics and molecular biology using tools derived from the fields of computer science and information technology. By studying many samples of genetic material from a single species, it is possible to estimate the probability that mutations to the genetic code will cause one amino acid in a protein to be changed to a different amino acid. This can be quantified by a substitution matrix. In the matrix below, each entry is a log-odds score: the logarithm of how much more (or less) often a given amino acid substitution is observed in a fixed number of generations than would be expected by chance. A positive number represents a substitution that is relatively common. A negative number represents one that is less common. These numbers are known as PAM scores (for Point Accepted Mutation).

To compare a sequence of amino acids in two related proteins:

sequence 1:  PEYDLLV
sequence 2:  PERDILV

To get from sequence 1 to sequence 2, the first two amino acids must be preserved, the third substituted, the fourth conserved, etc. These two sequences would be scored as follows:

sequence 1:   P   E   Y   D   L   L   V
sequence 2:   P   E   R   D   I   L   V
PAM score:   +7  +5  -2  +6  +2  +4  +4  = 26

A third sequence might get a PAM score of 24 when compared to sequence 1. The lower PAM score suggests that the third sequence is less likely to have evolved from sequence 1 over a given amount of time.

A simplified look at the study of evolution using bioinformatics

1. Isolate and decode a particular gene in two different species. This is not always as easy as it sounds: the genes may have different lengths and various insertions, deletions, and mutations.
2. Align the base pairs for the two genes.
3. Determine the sequence of amino acids coded by the genes.
4. Calculate a quantitative measure of the differences between the proteins produced by the two genes.
5. Assess this difference in evolutionary terms.

The average protein in the human body is composed of about 400 amino acids, but there might be tens of thousands of base pairs in the DNA sequence that produces this protein. Analyzing two alleles of a single gene would be virtually impossible to do by hand, and there are more than 20,000 genes in the human genome. Most of the raw work of bioinformatics must therefore be done by computers.
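The scoring of the PEYDLLV / PERDILV comparison above can be sketched in Python. The dictionary below holds only the substitution scores quoted in that worked example, not a full substitution matrix; the names `QUOTED_SCORES` and `score_ungapped` are made up for this illustration.

```python
# Substitution scores quoted in the worked example, keyed by unordered pair.
# frozenset("P") stands for the pair (P, P); frozenset("YR") for (Y, R) or (R, Y).
QUOTED_SCORES = {
    frozenset("P"): 7,
    frozenset("E"): 5,
    frozenset("D"): 6,
    frozenset("L"): 4,
    frozenset("V"): 4,
    frozenset("YR"): -2,
    frozenset("LI"): 2,
}


def score_ungapped(seq1: str, seq2: str, scores: dict) -> int:
    """Sum the substitution scores position by position (no insertions or deletions)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(scores[frozenset((a, b))] for a, b in zip(seq1, seq2))


print(score_ungapped("PEYDLLV", "PERDILV", QUOTED_SCORES))  # 26
```

Keying the table by unordered pairs reflects the assumption that the score for substituting Y with R is the same as for substituting R with Y.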

Genetic Sequence Alignment

One of the more difficult and time-consuming tasks involved in the study of genetic information is deciding how two sequences should be aligned for comparison. Evolution of the genetic code includes not only point mutations, but also dropped information and inserted information. The following notes describe how you can use PAM scores to decide how sequences of amino acids in two different species should be compared.

To solve alignment problems, we use the PAM table from the previous section. This table gives a relative measure of the likelihood of a substitution between amino acids due to point mutations. We will now consider one more type of mutation: an insertion or deletion of one or more amino acids. Each instance of an insertion or deletion has an equivalent PAM score of -5. Using this information, and given two sequences of possibly different lengths, one must decide on the most likely alignment of the amino acids. For example,

sequence 1:  PEYLLV
sequence 2:  PERDILV

To align these two sequences, we must take into account a deletion of a single codon in the DNA code of sequence 1. Equivalently, this could be an insertion of a single codon in sequence 2. Several alignments might reasonably be considered:

sequence 1:   P   E   Y   -   L   L   V
sequence 2:   P   E   R   D   I   L   V
PAM score:   +7  +5  -2  -5  +2  +4  +4  = 15

sequence 1:   P   E   -   Y   L   L   V
sequence 2:   P   E   R   D   I   L   V
PAM score:   +7  +5  -5  -3  +2  +4  +4  = 14

sequence 1:   P   E   Y   L   -   L   V
sequence 2:   P   E   R   D   I   L   V
PAM score:   +7  +5  -2  -4  -5  +4  +4  = 9

The higher PAM score suggests that the first alignment is the most likely to arise from evolutionary processes.

There is a tabular solution technique that can simplify the alignment process. To use this technique, we first write one sequence at the top of the table and the other along the left hand side:

        P        E        Y        L        L        V
   0
P
E
R
D
I
L
V

In the upper left hand corner of the table, the 0 is a starting point for calculating the relative probability of producing one of these sequences from the other through point mutations, insertions, or deletions. The goal is to sum the PAM scores of each link in the replication/mutation sequence that gets to the bottom right corner of the table. The most probable path through this table is usually the best alignment of the two sequences.

The first amino acid in each sequence is proline (P). The relative probability of an accurate replication of this amino acid is given by the PAM score for a P to P replication, +7. This is the most likely event, so we add this PAM score to our current score of 0 (from the cell diagonally above and to the left of the current one). We enter the new sum in the corresponding cell, 0+7 = 7.

        P        E        Y        L        L        V
P       0+7
E
R
D
I
L
V

Moving down in the table indicates that the vertical sequence has an insertion compared to the horizontal sequence (PAM score = -5). Moving to the right indicates a deletion (PAM score = -5). Moving diagonally down and to the right indicates that the next two amino acids represent either an accurate duplication or a point mutation. In our case, this would be an accurate duplication of E (PAM score = +5). This last option is more probable (higher PAM score), as indicated in the table below.

        P        E        Y        L        L        V
P       7        7-5=2
E       7-5=2    7+5=12
R
D
I
L
V

We continue this process for each cell in our table, choosing the most probable outcome and retaining some of the neighboring PAM values for comparison purposes.

        P        E        Y        L        L        V
P       7        2
E       2        12       12-5=7
R                12-5=7   12-2=10
D
I
L
V
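The three moves described above can be written as a small Python helper that, given the score in the current cell, lists the scores of the three cells reachable from it. This is a sketch for checking the table arithmetic only; `next_moves` is a hypothetical name and `GAP` is the insertion/deletion score of -5 stated in the text.

```python
GAP = -5  # PAM-equivalent score for one insertion or deletion, as given in the text


def next_moves(current_score: int, substitution_score: int) -> dict:
    """Scores of the three cells reachable from a cell holding `current_score`.

    `substitution_score` is the score for pairing the next amino acid of the
    vertical sequence with the next amino acid of the horizontal sequence.
    """
    return {
        "diagonal": current_score + substitution_score,  # duplication or point mutation
        "down": current_score + GAP,                      # insertion in the vertical sequence
        "right": current_score + GAP,                     # deletion
    }


# From the P/P cell (score 7), the next pair is E/E with a score of +5.
print(next_moves(7, 5))    # {'diagonal': 12, 'down': 2, 'right': 2}
# From the E/E cell (score 12), the next pair is R/Y with a score of -2.
print(next_moves(12, -2))  # {'diagonal': 10, 'down': 7, 'right': 7}
```

Both results agree with the entries shown in the tables above.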

The alignment should generally follow a path to the right and downward. Inspection of the two sequences shows that the last amino acids should match up directly. So, the task is to find the most probable path from the upper left corner to the lower right corner. We will continue to choose the most likely outcome at each step to see where that takes us. The next step is shown below:

        P        E        Y        L        L        V
P       7        2
E       2        12       7
R                7        10       5
D                         5        6
I
L
V

And, continuing to the end of the sequence, we get the following:

        P        E        Y        L        L        V
P       7        2
E       2        12       7
R                7        10       5
D                         5        6        1
I                                  1        8        3
L                                           3        9
V                                                    4

This path has a net PAM score of +4. However, this is only one of many possible paths through the table. Another one is shown below:

        P        E        Y        L        L        V
P       7        2
E       2        12       7
R                7        10       5
D                         5        6
I                                  1
L                                           5
V                                                    9
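The "choose the most likely outcome at each step" walk that produced the +4 path can be sketched in Python. This is an illustration, not the exercise's own code. The score table contains only the pairs this walk visits; the values for P/P, E/E, Y/R, D/L, and I/L are quoted in the text, while L/V = +1 is an assumption inferred from the step from 8 to 9 in the table above. The names `pair_score` and `greedy_walk` are hypothetical.

```python
GAP = -5  # score for one insertion or deletion, as given in the text

# Only the pairs visited by this walk. L/V = +1 is inferred from the table (8 -> 9).
SCORES = {
    frozenset("P"): 7,
    frozenset("E"): 5,
    frozenset("YR"): -2,
    frozenset("DL"): -4,
    frozenset("IL"): 2,
    frozenset("LV"): 1,
}


def pair_score(a: str, b: str) -> int:
    return SCORES[frozenset((a, b))]


def greedy_walk(top: str, side: str):
    """At each cell take the highest-scoring move; return the final score and the moves."""
    row = col = 0
    total = 0
    history = []
    while row < len(side) or col < len(top):
        options = []
        if row < len(side) and col < len(top):
            options.append((total + pair_score(side[row], top[col]), "diagonal"))
        if row < len(side):
            options.append((total + GAP, "down"))
        if col < len(top):
            options.append((total + GAP, "right"))
        total, move = max(options, key=lambda option: option[0])
        if move in ("diagonal", "down"):
            row += 1
        if move in ("diagonal", "right"):
            col += 1
        history.append((move, total))
    return total, history


total, history = greedy_walk("PEYLLV", "PERDILV")
print(total)                            # 4
print([score for _, score in history])  # [7, 12, 10, 6, 8, 9, 4]
```

The running totals reproduce the diagonal of the +4 table, which shows that the locally best move at each step does not necessarily lead to the best overall path.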

Yet another path through this table is shown below. This is the alignment that we previously found to be the most probable, with a net PAM score of +15.

        P        E        Y        L        L        V
P       7        2
E       2        12       7
R                7        10       5
D                         5        0
I                         0        7
L                                           11
V                                                    15

This path formulation allows the more mathematically oriented scientists to use graph theory to solve for the most probable alignment between any two sequences. However, that optimization problem is beyond our present needs. A brute force approach will also work: one tries all possible paths through the table and chooses the one with the highest PAM score.
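As an alternative to trying every path by hand, the best path can be found by filling the whole table so that each cell keeps the best score of any path reaching it. This is the standard dynamic-programming approach to global alignment (often called Needleman-Wunsch), not a method given in this exercise. The Python sketch below rests on labeled assumptions: the quoted scores come from the worked examples, L/V = +1 is inferred from the table, and `DEFAULT_MISMATCH = -4` is a hypothetical placeholder for every pair the text never scores. A real analysis would use a complete substitution matrix.

```python
GAP = -5               # insertion/deletion score given in the text
DEFAULT_MISMATCH = -4  # hypothetical placeholder for pairs not scored in the text

# Scores quoted in the worked examples (L/V = +1 is inferred from the table).
QUOTED = {
    frozenset("P"): 7, frozenset("E"): 5, frozenset("D"): 6,
    frozenset("L"): 4, frozenset("V"): 4,
    frozenset("YR"): -2, frozenset("YD"): -3, frozenset("LD"): -4,
    frozenset("LI"): 2, frozenset("LV"): 1,
}


def pair_score(a: str, b: str) -> int:
    return QUOTED.get(frozenset((a, b)), DEFAULT_MISMATCH)


def best_alignment(top: str, side: str):
    """Return (score, aligned_top, aligned_side) for the highest-scoring path."""
    rows, cols = len(side), len(top)
    best = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        best[i][0] = i * GAP
    for j in range(1, cols + 1):
        best[0][j] = j * GAP
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + pair_score(side[i - 1], top[j - 1]),  # diagonal
                best[i - 1][j] + GAP,                                      # down
                best[i][j - 1] + GAP,                                      # right
            )
    # Trace back from the bottom right corner to recover the alignment.
    aligned_top, aligned_side = [], []
    i, j = rows, cols
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + pair_score(side[i - 1], top[j - 1]):
            aligned_top.append(top[j - 1])
            aligned_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + GAP:
            aligned_top.append("-")
            aligned_side.append(side[i - 1])
            i -= 1
        else:
            aligned_top.append(top[j - 1])
            aligned_side.append("-")
            j -= 1
    return best[rows][cols], "".join(reversed(aligned_top)), "".join(reversed(aligned_side))


score, aligned_1, aligned_2 = best_alignment("PEYLLV", "PERDILV")
print(score)      # 15
print(aligned_1)  # PEY-LLV
print(aligned_2)  # PERDILV
```

Under these assumptions the sketch recovers the same alignment and score (+15) as the most probable path shown above, while examining each cell of the table only once instead of enumerating every path.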