Protein design. CS/CME/BioE/Biophys/BMI 279 Oct. 24, 2017 Ron Dror

Similar documents
An enzyme is a protein that acts as a biological catalyst that is, it speeds up a metabolic reaction without itself being permanently charged.

Protein Structure Prediction. christian studer , EPFL

Structural bioinformatics

GENETIC ALGORITHMS. Narra Priyanka. K.Naga Sowjanya. Vasavi College of Engineering. Ibrahimbahg,Hyderabad.

Introduction to Proteins

2/23/16. Protein-Protein Interactions. Protein Interactions. Protein-Protein Interactions: The Interactome

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

BIRKBECK COLLEGE (University of London)

How Do You Clone a Gene?

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski.


Chapter 13 Section 2: DNA Replication

Analysis of Biological Sequences SPH

Proteins Higher Order Structures

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

ENZYMES. Unit 3 - Energy

What is RNA? Another type of nucleic acid A working copy of DNA Does not matter if it is damaged or destroyed

All Rights Reserved. U.S. Patents 6,471,520B1; 5,498,190; 5,916, North Market Street, Suite CC130A, Milwaukee, WI 53202

CHAPTER 17 FROM GENE TO PROTEIN. Section C: The Synthesis of Protein

Nucleic Acids, Proteins, and Enzymes

Final exam. Please write your name on the exam and keep an ID card ready.

Mapping strategies for sequence reads

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming

The common structure of a DNA nucleotide. Hewitt

Creation of a PAM matrix

Genetic Algorithms For Protein Threading

Overview. Things to do Prior to Protein Work. Preparation of Protein for Crystallization. Crystallization. Strategies when Crystal is Absent

Accurate Prediction for Atomic-Level Protein Design and its Application in Diversifying the Near-Optimal Sequence Space

Nanobiotechnology. Place: IOP 1 st Meeting Room Time: 9:30-12:00. Reference: Review Papers. Grade: 50% midterm, 50% final.

Bi 8 Lecture 7. Ellen Rothenberg 26 January Reading: Ch. 3, pp ; panel 3-1

Structural Bioinformatics (C3210) DNA and RNA Structure

Bio-inspired Models of Computation. An Introduction

Genetics Lecture 21 Recombinant DNA

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur...

Protein Folding Problem I400: Introduction to Bioinformatics

Nucleic Acids and the RNA World. Pages Chapter 4

Name 10 Molecular Biology of the Gene Test Date Study Guide You must know: The structure of DNA. The major steps to replication.

BIOINFORMATICS Introduction

Metaheuristics and Cognitive Models for Autonomous Robot Navigation

DNA is the genetic material. DNA structure. Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test

Thursday 26 May 2016 Afternoon Time allowed: 1 hour 30 minutes

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

Ligand docking and binding site analysis with pymol and autodock/vina

DNA Replication. The Organization of DNA. Recall:

TEKS and S.E.s. B.9C identify and investigate the role of enzymes

RNA Structure Prediction and Comparison. RNA Biology Background

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

Lesson Overview DNA Replication

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

Predicting Microarray Signals by Physical Modeling. Josh Deutsch. University of California. Santa Cruz

Characterization of Aptamer Binding using SensíQ SPR Platforms

Lecture Four. Molecular Approaches I: Nucleic Acids

Investigating Protein Stability with the Optical Tweezer

STUDY GUIDE SECTION 10-1 Discovery of DNA

Biological immune systems

COMPUTER SIMULATION OF ENZYME KINETICS

green B 1 ) into a single unit to model the substrate in this reaction. enzyme

translation The building blocks of proteins are? amino acids nitrogen containing bases like A, G, T, C, and U Complementary base pairing links

Proteins the primary biological macromolecules of living organisms

Covalently bonded sugar-phosphate backbone with relatively strong bonds keeps the nucleotides in the backbone connected in the correct sequence.

TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS. Liviu Lalescu, Costin Badica

MATH 5610, Computational Biology

Peptide libraries: applications, design options and considerations. Laura Geuss, PhD May 5, 2015, 2:00-3:00 pm EST

Chapter 3 Nucleic Acids, Proteins, and Enzymes

Electrophoresis and transfer

DNA Technology. Asilomar Singer, Zinder, Brenner, Berg

DNA & Protein Synthesis UNIT D & E

Residue Contact Prediction for Protein Structure using 2-Norm Distances

From DNA to Protein Structure and Function

DNA, RNA, PROTEIN SYNTHESIS, AND MUTATIONS UNIT GUIDE Due December 9 th. Monday Tuesday Wednesday Thursday Friday 16 CBA History of DNA video

CFSSP: Chou and Fasman Secondary Structure Prediction server

TRANSPORTATION PROBLEM AND VARIANTS

Section DNA: The Molecule of Heredity

Τάσος Οικονόµου ιαλεξη 8. Kινηση, λειτουργια, ελεγχος.

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

Kinetics Review. Tonight at 7 PM Phys 204 We will do two problems on the board (additional ones than in the problem sets)

From Gene to Protein transcription, messenger RNA (mrna) translation, RNA processing triplet code, template strand, codons,

developer.* The Independent Magazine for Software Professionals Automating Software Development Processes by Tim Kitchens

Zool 3200: Cell Biology Exam 3 3/6/15

The Double Helix. DNA and RNA, part 2. Part A. Hint 1. The difference between purines and pyrimidines. Hint 2. Distinguish purines from pyrimidines

Dr. Jeffrey P. Thompson bio350

Answers to Module 1. An obligate aerobe is an organism that has an absolute requirement of oxygen for growth.

Astronomy picture of the day (4/21/08)

Worksheet Structure of DNA and Replication

All atom protein folding with massively parallel computers

DNA Structure and Replication. Higher Human Biology

Learning to Use PyMOL (includes instructions for PS #2)

Active Learning Exercise 9. The Hereditary Material: DNA

Immune Programming. Payman Samadi. Supervisor: Dr. Majid Ahmadi. March Department of Electrical & Computer Engineering University of Windsor

Machine learning in neuroscience

The mechanism(s) of protein folding. What is meant by mechanism. Experimental approaches

Components of DNA. Components of DNA. Aim: What is the structure of DNA? February 15, DNA_Structure_2011.notebook. Do Now.

2. The instructions for making a protein are provided by a gene, which is a specific segment of a molecule.

DNA RNA PROTEIN SYNTHESIS -NOTES-

DNA replication: Enzymes link the aligned nucleotides by phosphodiester bonds to form a continuous strand.

DNA, RNA, PROTEIN SYNTHESIS, AND MUTATIONS UNIT GUIDE Due December 9 th. Monday Tuesday Wednesday Thursday Friday 16 CBA History of DNA video

Nucleic acids deoxyribonucleic acid (DNA) ribonucleic acid (RNA) nucleotide

Optimizing Synthetic DNA for Metabolic Engineering Applications. Howard Salis Penn State University

Transcription:

Protein design CS/CME/BioE/Biophys/BMI 279 Oct. 24, 2017 Ron Dror 1

Outline Why design proteins? Overall approach: Simplifying the protein design problem Protein design methodology Designing the backbone Select sidechain rotamers: the core optimization problem Optional: giving the backbone wiggle room Optional: negative design Optional: complementary experimental methods Examples of recent designs How well does protein design work? 2

Why design proteins? 3

Problem definition Given the desired three dimensional structure of a protein, design an amino acid sequence that will assume that structure. Of course, a precise set of atomic coordinates would determine sequence. Usually we start with an approximate desired structure. Alternatively, we may want to design for a particular function (e.g., the ability to bind a particular ligand). 4 http://www.riken.jp/zhangiru/images/sequence_protein.jpg

Sample applications Designing enzymes (proteins that catalyze chemical reactions) Useful for production of industrial chemicals and drugs Designing proteins that bind specifically to other proteins Potential for HIV, cancer, Alzheimer s treatment Special case: antibody design Designing sensors (proteins that bind to and detect the presence of small molecules for example, by lighting up or changing color) Calcium sensors used to detect neuronal activity in imaging studies Proteins that detect TNT or other explosives, for mine detection Making a more stable variant of an existing protein Or a water-soluble variant of a membrane protein You re not responsible for these 5

Overall approach: simplifying the protein design problem 6

Direct approach Given a target structure, search over all possible protein sequences For each protein sequence, predict its structure, and compare to the target structure Choose the best match 7

Direct approach has two major problems Computationally intractable We d need to use ab initio structure prediction Ab initio structure prediction for even one sequence is computationally intensive 20 N possible sequences with N residues May not be good enough! Ab initio structure prediction is far from perfect, in part because energy functions are imperfect Given an energy function, what we really want is to maximize the probability of the desired structure (compared to all other possible folded and unfolded structures) We could do this by sampling the full Boltzmann distribution for each candidate sequence but that s even more computationally intensive! 8

We can dramatically simplify this problem by making a few assumptions 1. Assume the backbone geometry is fixed 2. Assume each amino acid can only take on a finite number of geometries (rotamers) 3. Assume that what we want to do is to maximize the energy drop from the completely unfolded state to the target geometry In other words, simply ignore all the other possible folded structures that we want to avoid We ll first address the problem under these assumptions, then consider relaxing them a bit 9

The simplified problem At each position on the backbone, choose a rotamer (an amino acid type and a side-chain geometry) to minimize overall energy We assume the energy is a free energy. The Rosetta all-atom force field (physics-based/knowledge-based hybrid) is a common choice. For each amino acid sequence, energy is measured relative to the unfolded state. In practice a reference energy for each amino acid is subtracted off, corresponding roughly to how much that amino acid favors folded states You re not responsible for this Assume that energy can be expressed as a sum of terms that depend on one rotamer or two rotamers each. This is the case for the Rosetta force fields (and for most molecular mechanics force fields as well). Thus, we wish to minimize total energy ET, where E T = E i (r i ) + E ij (r i,r j ) i i j Note that ri specifies both the amino acid at position i and its side-chain geometry 10

Protein design methodology 11

Protein design methodology Designing the backbone 12

Designing the backbone The first step of most protein design protocols is to select one or more target backbone structures. This is as much art as science. Often multiple target structures are selected, because some won t work. (Apparently proteins can only adopt a limited set of backbone structures, but we don t have a great description of what that set is.) Methods to do this: Use an experimentally determined backbone structure Assemble secondary structural elements by hand Use a fragment assembly program like Rosetta, selecting fragment combinations that fit some approximate desired structure 13

Example of backbone design To design Top7, a protein with a novel fold, Kuhlman et al. started with a schematic, then used Rosetta fragment assembly to find 172 backbone models that fit it. Initial schematic of target fold. Hexagons = β sheet. Squares = α helix. Arrows = hydrogen bonds. Letters indicate amino acids in final designed sequence (these were not determined until much later). Final structure Kuhlman et al., Science 302:1364-8 (2003) 14

Protein design methodology Select sidechain rotamers: the core optimization problem 15

The optimization problem Given a desired backbone geometry, we wish to select rotamers at each position to minimize total energy E T = E i (r i ) + E ij (r i,r j ) i i j where ri specifies both the amino acid at position i and its side-chain geometry 16

Optimization methods Heuristic methods Not guaranteed to find optimal solution, but faster Most common is Metropolis Monte Carlo Moves may be as simple as randomly choosing a position, then randomly choosing a new rotamer at that position May decrease temperature over time (simulated annealing) Exact methods Guaranteed to find optimal solution, but slow for larger proteins Most common is likely Dead-End Elimination Method, which prunes branches of the exhaustive search tree by proving that certain rotamers are incompatible with the global optimum The A* optimization algorithm (originally developed at Stanford, for robot path-finding) is also used You re not responsible for the details of how these exact methods work. 17

Protein design methodology Optional: giving the backbone wiggle room 18

Flexible backbone design One of our key simplifying assumptions was that of a fixed backbone geometry. For many applications, protein design works better if you give the backbone some limited wiggle room. This requires optimizing simultaneously over rotamers and backbone geometry. Often addressed through a Monte Carlo search procedure that alternates between local tweaks to backbone dihedrals and changes to side-chain rotamers 19

Protein design methodology Optional: negative design 20

Negative design Another simplifying assumption was that we simply minimize the energy of the desired structure We do not consider all other possible structures. It s possible that their energy ends up even lower. In negative design, we identify a few structures that we want to avoid, and we try to keep their energies high during the design process. This can help, but we cannot explicitly avoid all possible incorrect structures without making the problem much more complicated. So the overall approach is still heuristic. 21

Protein design methodology Optional: complementary experimental methods 22

Complementary experimental methods Computational protein design is often combined with experimental protein engineering methods For example, computational designs can often be improved by directed evolution Directed evolution involves introducing random mutations to proteins and picking out the best ones Usually this is done in living cells, with the fittest cells (i.e., those containing the best version of the protein) selected by some measure 23

Examples of recent designs 24

Designing proteins that bind specific ligands The example below required specification of the position of certain side chains that will form favorable interactions with the ligand Protein designed to bind tightly to a specific steroid, but not to related molecules Tinberg et al., Nature 501:212-6 (2013) 25

Designing enzymes In the example below, the protein holds two molecules in just the right relative positions for them to react. This speeds up the reaction. Molecule 1 Molecule 2 Siegel et al., Science 329:309-13 (2010) 26

Design of a transporter De novo design of a protein that transports zinc ions (Zn 2+ ), but not calcium ions (Ca 2+ ), across a cell membrane 27 Joh et al., Science 346:1520-24 (2014)

How well does protein design work? 28

How well does protein design work? Very impressive recent successes However, one should keep in mind that: Successful protein design projects often involve making and experimentally testing dozens of candidate proteins to find a good one Projects and design strategies that fail generally aren t published Design of membrane proteins remains difficult 29

A caveat This course covers a rapidly developing field. The literature is not always self-consistent. This includes papers in scientific journals sometimes even those recommended on the course website as additional reading.