MATH 5610, Computational Biology

Lecture 2: Intro to Molecular Biology (cont.)
Stephen Billups, University of Colorado at Denver

Announcements
Error on syllabus: class meets 5:30-6:45, not 7:00-8:15.
Office hours for Christiaan Van Woudenberg: TR after class (6:45-7:45).

Key Ideas from Last Lecture
DNA and RNA are strings over a 4-letter alphabet.
Orientation
Complementarity
Central Dogma

Tonight
More molecular biology:
  Proteins
  Translation & the genetic code
  Genes: reading frames, introns/exons
  Protein structure and function
  Hydrophobicity/hydrophilicity
Sequence alignment

Proteins
Proteins do most of the work in a cell:
  Enzymes: catalysts for chemical reactions.
  Structural proteins: form cellular structure.
  Regulatory proteins: control the expression of genes or the activities of other proteins.
  Transport proteins: carry molecules across membranes or around the body.
Proteins are composed of strings of amino acids. They are translated from mRNA by ribosomes, and they fold up into a 3-dimensional structure that largely determines their function.

Translation
mRNA is translated to form proteins.
Genetic code: 3 nucleotides (one codon) translate to 1 amino acid.
20 amino acids + a stop signal = 21 codes needed.
Question: How many possible sequences of 3 nucleotides are there?
Answer: 4^3 = 64.
Redundancy: different codons can code for the same amino acid. A change in the 3rd position of a codon often still yields the same amino acid.
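The counting argument can be checked directly. The following is a minimal Python sketch that assumes the standard genetic code (NCBI translation table 1), written as a 64-character string in U, C, A, G order. The names CODON_TABLE and third_position_synonymous are hypothetical and chosen for illustration. The sketch counts stop -> stop as "unchanged".

```python
from itertools import product

# Standard genetic code (NCBI translation table 1) for RNA codons.
# Codons are enumerated in U, C, A, G order at each position, first base
# varying slowest; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def third_position_synonymous() -> int:
    """Count single-base changes at the 3rd codon position that leave the
    translation unchanged (same amino acid, or stop -> stop)."""
    same = 0
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base != codon[2] and CODON_TABLE[codon[:2] + base] == aa:
                same += 1
    return same


print(len(CODON_TABLE))                # 64 codons
print(len(set(CODON_TABLE.values())))  # 21 symbols: 20 amino acids + stop
print(third_position_synonymous(), "of", 64 * 3)  # 128 of 192
```

Under this table, two thirds of all possible 3rd-position changes leave the translation unchanged. This is one way to quantify "often".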

Genes
A gene is a sequence of nucleotides in the DNA molecule coding for one unit of genetic information.
A gene is expressed when that segment of DNA is transcribed into mRNA. RNA polymerase binds to the DNA molecule and then moves along it, building the complementary RNA molecule.
For binding to occur, there must be a promoter on the DNA: a set of specific nucleotide sequences in just the right positions relative to the gene.
Gene expression is regulated by proteins (or RNA molecules) that bind to the promoter region. Sometimes such a protein makes it easier for RNA polymerase to bind to the DNA and begin transcription (positive regulation). Other times it makes binding harder (negative regulation).
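The step in which RNA polymerase "builds the complementary RNA molecule" can be sketched as base pairing alone. This sketch ignores promoters, regulation, and RNA processing. It assumes the template strand is supplied as an uppercase DNA string, and the function names are hypothetical.

```python
# Base pairing between a DNA template base and the RNA base placed opposite it.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (written 5'->3') complementary to a DNA template strand
    given in the 3'->5' direction, i.e. in the order RNA polymerase reads it."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in template_3_to_5.upper())


def transcribe_5_to_3(template_5_to_3: str) -> str:
    """Same, for a template strand written in the usual 5'->3' direction."""
    return transcribe(template_5_to_3[::-1])


print(transcribe("TACGGATT"))         # template 3'-TACGGATT-5' -> AUGCCUAA
print(transcribe_5_to_3("TTAGGCAT"))  # same strand written 5'->3' -> AUGCCUAA
```

Because the RNA is antiparallel to its template, reading the template 3'->5' and complementing base by base yields the mRNA in 5'->3' order.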

Reading Frames
Recall: nucleotides in the mRNA molecule are translated to proteins in triplets. If you shift by one nucleotide, you get an entirely different amino acid sequence:

    starting at base 1:  UAC CUU AGA CUC G   ->  tyr leu arg leu
    starting at base 2:  U ACC UUA GAC UCG   ->  thr leu asp ser

There are 3 possible reading frames, which correspond to the 3 possible ways of dividing the sequence into triplets.
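The example can be reproduced by translating the slide's sequence in all three frames. This is a minimal sketch that assumes the standard genetic code (NCBI table 1, one-letter amino acid codes, '*' for stop). The helper translate_frame is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(mrna: str, frame: int) -> str:
    """Translate mrna starting at offset `frame` (0, 1 or 2).
    Leftover bases that do not fill a whole codon are ignored."""
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(frame, len(mrna) - 2, 3)
    )


seq = "UACCUUAGACUCG"
for frame in range(3):
    print(frame, translate_frame(seq, frame))
# 0 YLRL   (tyr leu arg leu)
# 1 TLDS   (thr leu asp ser)
# 2 P*T    (the third frame hits the stop codon UAG)
```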

Open Reading Frames
The choice of reading frame depends on where the ribosome begins translating the mRNA. This is at a start codon, usually AUG (which also codes for methionine). Translation begins at the start codon and continues until a stop codon (UAA, UAG, or UGA) is reached.
An open reading frame (ORF) is a stretch of mRNA beginning with a start codon and ending at the first in-frame stop codon.
All proteins are translated from ORFs, but not all ORFs are translated.
Long ORFs rarely arise by chance, so a long ORF usually corresponds to an actual protein. In a random nucleotide sequence, stop codons make up 3/64 of the codons, so the average ORF length is about 64/3 ≈ 21 codons. Most proteins are hundreds of amino acids long.
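The ORF definition translates into a short scan over the three frames of one strand. This is a minimal sketch under a stated simplification: only AUG is treated as a start codon. AUGs that fall inside an ORF already found are not reported separately, so each stop codon yields only its longest ORF. The function name find_orfs is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str) -> list[tuple[int, int]]:
    """Return (start, end) slice indices of ORFs on one mRNA strand, all 3 frames.
    Each ORF runs from an AUG through the first in-frame stop codon (inclusive)."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(mrna):
            if mrna[i:i + 3] != "AUG":
                i += 3
                continue
            j = i  # scan codon by codon for the first in-frame stop
            while j + 3 <= len(mrna) and mrna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(mrna):
                break  # no stop codon before the end: not an ORF
            orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
    return orfs


# Random model: each codon is a stop with probability 3/64, so the number of
# codons up to and including the first stop is geometric with mean 64/3.
print(64 / 3)                           # about 21.3 codons
print(find_orfs("CAUGCCCUGAAUG"))       # [(1, 10)]: AUG CCC UGA in frame 1
```

The second AUG in the example is not reported because no stop codon follows it before the end of the sequence.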

Introns and Exons
In prokaryotes, the mRNA transcribed from the DNA is generally translated directly, without splicing.
In eukaryotes, the mRNA can be modified by splicing before it is translated. Splicing removes internal sequences called introns from the mRNA. Introns do not code for proteins.
The parts of the mRNA that are not removed are called exons. Exons contain the sequence information for the protein.
Alternative splicing: many RNA sequences can be spliced in more than one way (called alternative splicings), so a single gene can give rise to several different proteins.
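Splicing and alternative splicing can be modeled as keeping a chosen list of exon intervals and joining them. This is a minimal sketch with a made-up transcript and exon coordinates. The sequence, the intervals, and the splice helper are all hypothetical, and real splice-site recognition is not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of pre_mrna, given as half-open [start, end)
    intervals in transcript order. Everything between them (introns) is dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaag" + "UUUGGG" + "gucuag" + "CCCUAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 30)

full = splice(pre_mrna, [exon1, exon2, exon3])  # all three exons kept
skipped = splice(pre_mrna, [exon1, exon3])      # alternative: exon 2 skipped
print(full)     # AUGGCCUUUGGGCCCUAA
print(skipped)  # AUGGCCCCCUAA
```

The two outputs show how one transcript can yield two different mRNAs, and hence two different proteins.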

Protein Structure and Function
The 3-D structure of a protein largely determines its function.
Hierarchy of structure:
  Primary structure: linear order of the amino acids.
  Secondary structure: location and direction of common structures called α-helices and β-sheets.
  Tertiary structure: the 3-dimensional shape of the protein.
  Quaternary structure: the overall 3-D structure of a complex of multiple proteins.
(Image from www.biology.bnl.gov/structure/images/swami_p18.jpg)

Hydrophobicity/Hydrophilicity
Some amino acids have polar side chains (the charge on the side chain is not distributed symmetrically). These residues are attracted to water, and such amino acids are called hydrophilic.
Other amino acids have nonpolar side chains. These are called hydrophobic.
Because proteins reside in water, they tend to fold so that the hydrophilic residues are on the outside and the hydrophobic residues are on the inside.
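One common way to turn this distinction into numbers is a hydropathy scale averaged over a sliding window. The Kyte-Doolittle scale used below is a swapped-in choice that the lecture does not use. The default window of 9 residues and the helper name hydropathy_profile are illustrative assumptions. Positive averages suggest hydrophobic stretches and negative averages suggest hydrophilic ones.

```python
# Kyte-Doolittle hydropathy index (assumed scale; positive = hydrophobic).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Average hydropathy of every length-`window` stretch of a one-letter
    protein sequence; returns [] if the sequence is shorter than the window."""
    values = [KD[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


print(hydropathy_profile("IIIVLKKDER", window=3))
```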

Topics not covered from Ch. 1 (read about these on your own):
  Chemical details
  Molecular biology tools
  Genomic information content (optional)

Sequence Comparison/Alignment
  Dot plots
  Sequence alignment
  Scoring methods
  Derivation of scoring matrices
  Dynamic programming

Dot Plot
[Figure: dot plot of two short DNA sequences; axis labels A C T C G A G C and A C A G T A G C]
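A dot plot marks every pair of positions where the two sequences share the same character. Runs along a diagonal then reveal similar regions. This is a minimal text-mode sketch. The two example strings are read off the slide figure's axis labels, so they are an assumption, and the helper names are hypothetical.

```python
def dot_plot(seq1: str, seq2: str) -> list[list[bool]]:
    """Cell [i][j] is True when seq1[i] == seq2[j]."""
    return [[a == b for b in seq2] for a in seq1]


def render(seq1: str, seq2: str) -> str:
    """Text rendering: seq2 across the top, seq1 down the side, '*' = match."""
    lines = ["  " + " ".join(seq2)]
    for a, row in zip(seq1, dot_plot(seq1, seq2)):
        lines.append(a + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("ACTCGAGC", "ACAGTAGC"))
```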

Sequence Alignment
Definition: an alignment is a pairwise matching between the characters of two sequences. It often requires inserting gaps into the sequences.
Example: two alignments of the same pair of sequences. Which is better?

    AATCTATA        AATCTATA
    AAG-AT-A        AA--GATA

Types of Alignments
  Global: best alignment of two entire sequences.
  Semiglobal: finds the best overlap of two sequences; doesn't penalize gaps at the beginning or end.
  Local: finds the best-scoring alignment of subsequences.
  Multiple sequence alignment: aligns more than two sequences at once.

Scoring Alignments
Goal: devise a scoring function for an alignment such that the best alignment gets the highest score. Once the scoring function is defined, we can devise algorithms to search for the highest-scoring alignments.
Simple example: match = +1, mismatch = 0, gap = -1.

    AATCTATA
    AAG-AT-A
    ++0-00-+    = +1

    AATCTATA
    AA--GATA
    ++--0+++    = +3
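The column-by-column scores above can be reproduced mechanically. This is a minimal sketch that assumes alignments are given as two equal-length rows with '-' for gaps. The function name and the keyword defaults (+1/0/-1, matching the slide) are illustrative.

```python
def score_alignment(top: str, bottom: str,
                    match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Sum of per-column scores for two aligned rows ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


print(score_alignment("AATCTATA", "AAG-AT-A"))  # 1
print(score_alignment("AATCTATA", "AA--GATA"))  # 3
```

Because the scoring parameters are arguments, the same function can be used to explore how changing them affects which alignment wins.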

Discussion Question
What makes a good scoring function?

Ideas for Scoring Functions
  Edit distance: how many edits (substitutions, insertions, deletions) are needed to transform one sequence into the other?
  Homology: assume both sequences evolved from a common (but unknown) ancestor. Which alignment best reflects this evolutionary relationship?
  Avoid unintended mathematical biases.
  Computational efficiency: complex scoring functions may be harder to compute with.
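Edit distance is a concrete case where a small dynamic program finds the optimum. This is a minimal sketch of the standard Levenshtein recurrence over prefixes, keeping one row of the table at a time. The function name is illustrative.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of substitutions, insertions and deletions that turn s
    into t (Levenshtein distance), by dynamic programming over prefixes."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute (free when a == b)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("AATCTATA", "AAGATA"))  # 3
```

Minimizing edits is equivalent to maximizing an alignment score with match = 0, mismatch = -1, gap = -1. The best such score is the negative of the edit distance. For the lecture's example, 3 is achieved by the second alignment (two gaps, one mismatch).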

Substitution Score Matrix
Evolutionarily, some substitutions are more probable than others, because of:
  Physical/chemical properties. Example: in DNA, transitional substitutions (purine ↔ purine or pyrimidine ↔ pyrimidine) are more probable than transversional substitutions (purine ↔ pyrimidine).
  Selective pressure during evolution. Example: in proteins, substitutions that change structure are selected against.
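The transition/transversion distinction depends only on whether the two bases fall in the same chemical class. This is a minimal sketch with a hypothetical helper name. It also counts how many of the 12 possible point substitutions fall in each class.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x: str, y: str) -> str:
    """Classify a DNA point substitution x -> y."""
    if x == y:
        return "identity"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


kinds = [substitution_type(x, y) for x, y in permutations("ACGT", 2)]
print(kinds.count("transition"), kinds.count("transversion"))  # 4 8
```

Transversions outnumber transitions 8 to 4 among the possible substitutions. So when transitions are observed more often, the difference reflects biology rather than counting.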

Nucleotide Score Matrices
Usually quite simple.

BLAST matrix (match = 5, mismatch = -4):

         A    T    C    G
    A    5   -4   -4   -4
    T   -4    5   -4   -4
    C   -4   -4    5   -4
    G   -4   -4   -4    5

Transition/transversion matrix:

         A    T    C    G
    A    1   -5   -5   -1
    T   -5    1   -1   -5
    C   -5   -1    1   -5
    G   -1   -5   -5    1
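Both matrices can be generated from a few parameters and used to score ungapped alignments. This is a minimal sketch. The helper names and the parameterization (match / transition / transversion) are illustrative assumptions, with defaults chosen to reproduce the two tables.

```python
PURINES = set("AG")


def blast_matrix(match: int = 5, mismatch: int = -4) -> dict[tuple[str, str], int]:
    return {(x, y): match if x == y else mismatch for x in "ATCG" for y in "ATCG"}


def ti_tv_matrix(match: int = 1, transition: int = -1,
                 transversion: int = -5) -> dict[tuple[str, str], int]:
    matrix = {}
    for x in "ATCG":
        for y in "ATCG":
            if x == y:
                matrix[(x, y)] = match
            elif (x in PURINES) == (y in PURINES):  # same class: transition
                matrix[(x, y)] = transition
            else:
                matrix[(x, y)] = transversion
    return matrix


def ungapped_score(s: str, t: str, matrix: dict[tuple[str, str], int]) -> int:
    return sum(matrix[(a, b)] for a, b in zip(s, t))


for other in ("GCAT", "CCTT"):  # two transitions vs. two transversions
    print(other, ungapped_score("ACGT", other, blast_matrix()),
          ungapped_score("ACGT", other, ti_tv_matrix()))
# GCAT 2 0
# CCTT 2 -8
```

The BLAST matrix cannot tell the two cases apart, while the transition/transversion matrix penalizes the transversions more heavily.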

Amino Acid Substitution Score Matrix
Based on a statistical model of accepted mutations (i.e., mutations that survive evolution).
Example: BLOSUM62 amino acid substitution matrix (excerpt; the matrix is symmetric, so only the lower triangle is shown):

         C    S    T    P    A    G
    C    9
    S   -1    4
    T   -1    1    5
    P   -3   -1   -1    7
    A    0    1    0   -1    4
    G   -3    0   -2   -2    0    6
    ...
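A lower-triangular table like the one above can be used directly by swapping the indices when needed. This is a minimal sketch that covers only the six residues in the excerpt. Any other residue raises ValueError from str.index. The names ORDER, LOWER, and blosum are hypothetical.

```python
# BLOSUM62 excerpt (residues C, S, T, P, A, G), stored as a lower triangle.
ORDER = "CSTPAG"
LOWER = [
    [9],
    [-1, 4],
    [-1, 1, 5],
    [-3, -1, -1, 7],
    [0, 1, 0, -1, 4],
    [-3, 0, -2, -2, 0, 6],
]


def blosum(a: str, b: str) -> int:
    """Look up a score; the matrix is symmetric, so use the lower triangle."""
    i, j = ORDER.index(a), ORDER.index(b)
    if i < j:
        i, j = j, i
    return LOWER[i][j]


def ungapped_score(s: str, t: str) -> int:
    return sum(blosum(a, b) for a, b in zip(s, t))


print(blosum("C", "P"), blosum("P", "C"))    # -3 -3
print(ungapped_score("CSTPAG", "CSTPAG"))   # 35
```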