Protein Structure Prediction by Constraint Logic Programming

MPRI C2-19
Protein Structure Prediction by Constraint Logic Programming
François Fages, Constraint Programming Group, INRIA Rocquencourt
mailto:francois.fages@inria.fr
http://contraintes.inria.fr/

Molecules in the Cell
Small molecules: covalent bonds, 50-200 kcal/mol
- 70% water, 1% ions, 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP
Macromolecules: hydrogen bonds, ionic, hydrophobic and van der Waals interactions, 1-5 kcal/mol
- 20% proteins (50 to $10^4$ amino acids)
- RNA ($10^2$ to $10^4$ nucleotides, AGCU)
- DNA ($10^2$ to $10^6$ nucleotides, AGCT)
Stability and binding are determined by the number of weak bonds: 3D shape

Amino Acids
20 amino acids: Alanine (A), Cysteine (C), Aspartic Acid (D), Glutamic Acid (E), Phenylalanine (F), Glycine (G), Histidine (H), Isoleucine (I), Lysine (K), Leucine (L), Methionine (M), Asparagine (N), Proline (P), Glutamine (Q), Arginine (R), Serine (S), Threonine (T), Valine (V), Tryptophan (W), Tyrosine (Y).
Same backbone, centered on the Cα atom and linked by covalent C-N bonds, with different side chains (residues) ranging from 1 atom (for G) to 18 atoms (for W).

Glycine (G): C2H5NO2, 9+1 = 10 atoms
Tryptophan (W): C11H12N2O2, 9+18 = 27 atoms
Color code: White = H, Blue = N, Red = O, Grey = C
http://www.chemie.fu-berlin.de/chemistry/bio/amino-acids en.html

Primary Structure of Proteins
Protein = word of n amino acids ($20^n$ possibilities) linked by C-N bonds
n = 25-100 for hormones, 100-500 for globular proteins, 3000 for fibrous proteins
Example: MPRI = Methionine-Proline-Arginine-Isoleucine (unstable!)
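As a small illustration of the "protein = word" view, the sketch below spells out a one-letter sequence and counts the $20^n$ candidate words of length n. The function names `spell_out` and `count_primary_structures` are hypothetical and chosen only for this example; the one-letter codes are those listed on the amino acid slide.

```python
# One-letter codes as listed on the amino acid slide.
AMINO_ACIDS = {
    "A": "Alanine", "C": "Cysteine", "D": "Aspartic Acid", "E": "Glutamic Acid",
    "F": "Phenylalanine", "G": "Glycine", "H": "Histidine", "I": "Isoleucine",
    "K": "Lysine", "L": "Leucine", "M": "Methionine", "N": "Asparagine",
    "P": "Proline", "Q": "Glutamine", "R": "Arginine", "S": "Serine",
    "T": "Threonine", "V": "Valine", "W": "Tryptophan", "Y": "Tyrosine",
}


def spell_out(sequence):
    """Expand a one-letter primary structure into full residue names."""
    return "-".join(AMINO_ACIDS[code] for code in sequence)


def count_primary_structures(n):
    """Number of distinct words of n amino acids, i.e. 20**n."""
    return len(AMINO_ACIDS) ** n


if __name__ == "__main__":
    print(spell_out("MPRI"))
    print(count_primary_structures(4))
```

Even for the four-letter example the word count is already 160000, which hints at why exhaustive enumeration over sequences is hopeless for real proteins.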

Secondary Structure of Proteins
Protein = word of m forms among three forms ($3^m$ possibilities), stabilized by hydrogen bonds H---O:
1. α-helices of 5-40 contiguous residues (with 3.6 residues per turn)
2. β-sheets of strands composed of 5-10 contiguous residues
3. random coils

Tertiary Structure of Proteins
Protein = 3D spatial conformation
Native conformation in a determined environment (e.g. water)
30000 structures in the Protein Data Bank
Bonded interactions between atoms:
- covalent bonds (unbreakable)
- disulfide bonds (slow to form/break), e.g. between cysteine residues
- hydrogen bonds H---O (fast to form/break)
Non-bonded interactions between atoms:
- electrostatic (long range)
- van der Waals (short range)
Conformation that minimizes the free energy (Anfinsen's hypothesis)

Importance of Protein Structure Prediction
Understand protein folding, interaction capabilities, protein docking
Domain prediction, function prediction
Drug design and/or optimization
- More than 50% of drugs target receptor proteins
Enzyme design and/or optimization
Inverse problem: synthesis of a protein of a given shape
- Can restrict the number of amino acids
One of the most important problems of bioinformatics
Methods are evaluated in the CASP competition (every two years)
Protein Data Bank: repository of tertiary structures (weekly updates)

Protein Structure Prediction and Folding Problems
Hypotheses:
- Each amino acid is a sphere centered on its Cα atom.
- An energy function is given for each pair of amino acids that are close.
Input: the primary structure of a protein, i.e. a sequence of n amino acids (plus some information on its secondary structure)
Output (PSP): the tertiary structure of the protein that minimizes the free energy.
Output (PFP): the folding sequence leading to the tertiary structure of the protein (with secondary structures formed first)

HP Energy Model
Hydrophobic (H) amino acids: Cys (C), Ile (I), Leu (L), Phe (F), Met (M), Val (V), Trp (W), His (H), Tyr (Y), Ala (A)
Polar (P) amino acids: Lys (K), Glu (E), Arg (R), Ser (S), Gln (Q), Asp (D), Asn (N), Thr (T), Pro (P), Gly (G)
The protein is in water: hydrophobic elements tend to occupy the center of the protein.
- H amino acids tend to stay close to each other (hydrophobic)
- P amino acids tend to stay at the surface (polar)
Energy(HH) = -1 and Energy(HP, PH, PP) = 0 for pairs of amino acids in contact
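The HP abstraction can be sketched in a few lines: each residue is mapped to H or P using the two lists above, and a contact contributes -1 only when both residues are H. The names `to_hp` and `hp_energy` are hypothetical helpers introduced for this illustration.

```python
# Classification taken from the two lists on the HP Energy Model slide.
HYDROPHOBIC = set("CILFMVWHYA")
POLAR = set("KERSQDNTPG")


def to_hp(sequence):
    """Translate a one-letter amino acid sequence into an H/P string."""
    classes = []
    for code in sequence:
        if code in HYDROPHOBIC:
            classes.append("H")
        elif code in POLAR:
            classes.append("P")
        else:
            raise ValueError(f"unknown amino acid code: {code!r}")
    return "".join(classes)


def hp_energy(a, b):
    """Contact energy of two residue classes: -1 for HH, 0 otherwise."""
    return -1 if a == "H" and b == "H" else 0


if __name__ == "__main__":
    print(to_hp("MPRI"))
```

For the MPRI example this yields the string HPPH, so only a contact between the first and last residue could lower the energy.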

Matrix Energy Model
Same assumption: only pairs of amino acids in contact contribute to the energy value.
A potential matrix stores the contribution of each pair of amino acids in contact.
Values are either positive or negative.
The global energy must be minimized.
Energy(AB) = $w_{AB}$ for pairs of amino acids in contact

Spatial Models
In known protein structures, the distance between two consecutive Cα atoms is essentially fixed (3.8 Å)
1. Lattice (discrete) models.
2. Continuous models.

Crystal Lattice Models
Def. A crystal lattice is a graph (P,E) where P is a set of points in $\mathbb{Z}^3$ connected by undirected edges E.
Def. Squared Euclidean distance: $\mathrm{eucl}(A,B) = (x_B - x_A)^2 + (y_B - y_A)^2 + (z_B - z_A)^2$
Def. A cubic lattice is a crystal lattice (P,E) such that
$P = \{(x,y,z) \mid x,y,z \in \mathbb{Z}\}$
$E = \{(A,B) \mid A,B \in P,\ \mathrm{eucl}(A,B) = k = 1\}$
A cubic lattice is 6-connected (solve $x^2+y^2+z^2=1$ from (0,0,0)).
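The 6-connectivity claim can be checked by enumerating the integer solutions of $x^2+y^2+z^2=k$ around a point. The sketch below does this by brute force; `cubic_neighbours` is a hypothetical helper name, and `eucl` follows the squared-distance definition above.

```python
from itertools import product


def eucl(a, b):
    """Squared Euclidean distance between two lattice points."""
    return sum((q - p) ** 2 for p, q in zip(a, b))


def cubic_neighbours(point, k=1):
    """All points of Z^3 at squared distance k from `point`."""
    x, y, z = point
    # Each coordinate offset d satisfies d*d <= k, hence |d| <= k.
    offsets = range(-k, k + 1)
    return [
        (x + dx, y + dy, z + dz)
        for dx, dy, dz in product(offsets, repeat=3)
        if dx * dx + dy * dy + dz * dz == k
    ]


if __name__ == "__main__":
    print(len(cubic_neighbours((0, 0, 0))))
```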

PSP in a Cubic Lattice Model
Cubes of size 1, vertices labeled by amino acid names
Find a conformation $w : \{1,\dots,n\} \to \mathbb{Z}^3$ such that:
- $w(i) \neq w(j)$ for $i \neq j$
- $\|w(i)-w(i+1)\| = 1$ for the Euclidean norm
- w minimizes the energy $E = \sum_{\|w(i)-w(j)\|=1,\ i+1<j} \mathrm{Energy}(w(i),w(j))$
Thm. Deciding the existence of an admissible conformation with energy less than a constant k is NP-complete.
Proof. Reduce the bin packing problem to a PSP problem.
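To make the problem statement concrete, here is a minimal brute-force sketch for the HP energy on the cubic lattice. It enumerates every self-avoiding walk starting at the origin, so it is only usable for very short sequences; it is a plain exhaustive search, not the constraint-based method discussed later. The names `energy` and `best_fold` are hypothetical.

```python
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def energy(hp, w):
    """HP energy: -1 per pair of non-consecutive H residues at distance 1."""
    total = 0
    for i in range(len(hp)):
        for j in range(i + 2, len(hp)):  # i+1 < j
            sq_dist = sum((a - b) ** 2 for a, b in zip(w[i], w[j]))
            if sq_dist == 1 and hp[i] == "H" and hp[j] == "H":
                total -= 1
    return total


def best_fold(hp):
    """Return (minimal energy, one optimal conformation) by exhaustive search."""
    best_energy, best_w = None, None

    def extend(w):
        nonlocal best_energy, best_w
        if len(w) == len(hp):
            e = energy(hp, w)
            if best_energy is None or e < best_energy:
                best_energy, best_w = e, list(w)
            return
        x, y, z = w[-1]
        for dx, dy, dz in MOVES:
            p = (x + dx, y + dy, z + dz)
            if p not in w:  # self-avoidance: w(i) != w(j)
                w.append(p)
                extend(w)
                w.pop()

    if hp:
        extend([(0, 0, 0)])
    return best_energy, best_w


if __name__ == "__main__":
    print(best_fold("HPPH"))
```

The number of walks grows roughly like $6^{n-1}$, which is consistent with the NP-completeness result and motivates pruning the search with constraints.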

Admissible Foldings
Let $S = s_1,\dots,s_n$ be a sequence of amino acids.
Def. A folding in a crystal lattice (P,E) is a mapping $w : \{1,\dots,n\} \to P$, $w(i) = p_i$.
Def. An admissible folding is a folding such that:
- Constant distance k for consecutive pairs: $\mathrm{eucl}(w(i),w(i+1)) = k$
- Non-overlapping: $\mathrm{eucl}(w(i),w(j)) > 0$ for $i \neq j$
- Occupancy of side chain: $\mathrm{eucl}(w(i),w(j)) > k$ for $|i-j| \geq 2$
- Bend angles in [90°..150°]: $k_1 \leq \mathrm{eucl}(w(i),w(i+2)) \leq k_2$ for $i \in \{1,\dots,n-2\}$
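The four conditions translate directly into a checker. The sketch below is an illustration only: `is_admissible` is a hypothetical name, and the bounds used in the example (k = 2, k1 = 4, k2 = 7) are an assumption obtained from the law of cosines, eucl(w(i),w(i+2)) = 2k(1 - cos θ), for bend angles θ between 90° and 150°.

```python
def eucl(a, b):
    """Squared Euclidean distance between two lattice points."""
    return sum((q - p) ** 2 for p, q in zip(a, b))


def is_admissible(w, k, k1, k2):
    """Check the admissibility conditions on a folding w (list of points)."""
    n = len(w)
    # Constant squared distance k between consecutive residues.
    for i in range(n - 1):
        if eucl(w[i], w[i + 1]) != k:
            return False
    # Side-chain occupancy: non-consecutive residues further apart than k.
    # Together with the previous check this also implies non-overlapping.
    for i in range(n):
        for j in range(i + 2, n):
            if eucl(w[i], w[j]) <= k:
                return False
    # Bend angle bounds expressed on the squared distance between i and i+2.
    for i in range(n - 2):
        if not k1 <= eucl(w[i], w[i + 2]) <= k2:
            return False
    return True


if __name__ == "__main__":
    right_angle = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]
    straight = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]
    print(is_admissible(right_angle, 2, 4, 7), is_admissible(straight, 2, 4, 7))
```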

Face-Centered Cubic Lattice
Def. A face-centered cubic (FCC) lattice is a crystal lattice (P,E) such that
$P = \{(x,y,z) \mid x,y,z \in \mathbb{Z},\ x+y+z \text{ is even}\}$
$E = \{(A,B) \mid A,B \in P,\ \mathrm{eucl}(A,B) = 2\}$
An FCC lattice is:
- 6-connected at distance 2 (solve $x^2+y^2+z^2=4$ from (0,0,0))
- 12-connected at distance $\sqrt{2}$ (solve $x^2+y^2+z^2=2$ from (0,0,0))
- with 18 contact neighbours within distance 2
- with (60°), 90°, 120° and 180° bend angles
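These counts and angles can be reproduced by enumeration. The sketch below lists the FCC points at a given squared distance from the origin and computes the angles between pairs of distinct edge vectors; `fcc_points_at` and `bend_angles` are hypothetical helper names.

```python
import math
from itertools import product


def fcc_points_at(sq_dist):
    """FCC points (x+y+z even) at squared distance sq_dist <= 4 from the origin."""
    offsets = range(-2, 3)
    return [
        (x, y, z)
        for x, y, z in product(offsets, repeat=3)
        if (x + y + z) % 2 == 0 and x * x + y * y + z * z == sq_dist
    ]


def bend_angles():
    """Angles (degrees) between two distinct edges leaving the same point."""
    edges = fcc_points_at(2)
    angles = set()
    for u in edges:
        for v in edges:
            if u != v:
                dot = sum(a * b for a, b in zip(u, v))
                # Both edge vectors have length sqrt(2), so |u||v| = 2.
                angles.add(round(math.degrees(math.acos(dot / 2))))
    return angles


if __name__ == "__main__":
    print(len(fcc_points_at(2)), len(fcc_points_at(4)), sorted(bend_angles()))
```

The 12 points at squared distance 2 plus the 6 points at squared distance 4 give the 18 neighbours within distance 2.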

Face-Centered Cubic Lattice Model
Abstraction:
- Each amino acid is a sphere centered on its Cα atom
- Fixed distance of 3.8 Å between Cα atoms
- Table Pot(x,y) of energy values per pair of amino acids x and y
Face-centered cubic lattice model:
- Cubes of size 2 where points are vertices and central points of faces
- 12 connected neighbors at distance $\sqrt{2}$ (corresponding to 3.8 Å)
- 18 contact neighbors within distance 2 (corresponding to 5.4 Å < 6.4 Å)
Conformation $w : \{1,\dots,n\} \to \mathbb{Z}^3$ such that:
- $w(i) \neq w(j)$ for $i \neq j$ and $\|w(i)-w(i+1)\| = \sqrt{2}$
- w minimizes the energy $E = \sum_{\|w(i)-w(j)\|=2,\ i+1<j} \mathrm{Pot}(w(i),w(j))$
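A possible reading of the energy sum is sketched below: every pair of non-consecutive residues at Euclidean distance 2 (squared distance 4) adds the potential of the two amino acids. The table `POT` contains a single made-up value for illustration and is not taken from any published potential; `fcc_energy` and `LATTICE_UNIT_ANGSTROM` are hypothetical names.

```python
import math

# One lattice unit in angstroms, so that sqrt(2) units match 3.8 angstroms.
LATTICE_UNIT_ANGSTROM = 3.8 / math.sqrt(2)

# Made-up potential table, keyed by the sorted pair of one-letter codes.
POT = {("C", "C"): -3.0}


def eucl(a, b):
    """Squared Euclidean distance between two lattice points."""
    return sum((q - p) ** 2 for p, q in zip(a, b))


def fcc_energy(sequence, w, pot):
    """Sum pot over non-consecutive residue pairs at squared distance 4."""
    total = 0.0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # i+1 < j
            if eucl(w[i], w[j]) == 4:
                key = tuple(sorted((sequence[i], sequence[j])))
                total += pot.get(key, 0.0)  # pairs missing from the table add 0
    return total


if __name__ == "__main__":
    fold = [(0, 0, 0), (1, 1, 0), (2, 0, 0)]
    print(fcc_energy("CGC", fold, POT))
    print(round(2 * LATTICE_UNIT_ANGSTROM, 1))
```

The second printed value recovers the 5.4 Å contact distance quoted on the slide.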

Principles of Constraint Logic Programming
Constrain-and-generate versus generate-and-test:

    find(X) :- constrain(X), generate(X).    % constrain-and-generate
    find(X) :- generate(X), test(X).         % generate-and-test

Active use of constraints to prune the search tree in advance:
- symbolic/numerical solving of constraints
- constraint propagation over finite domains
Optimization by branch-and-bound procedure:

    optim(X, C, R) :- cost(X) < C, find(X), optim(Y, cost(X), R).
    optim(X, C, C).
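The branch-and-bound schema can be paraphrased in Python as a loop that keeps asking for a solution strictly cheaper than the current bound and stops when none exists. This is a sketch under simplifying assumptions: `find` is any user-supplied search returning a solution with cost below the bound or `None`, and the candidate list is toy data.

```python
import math


def optim(find, cost, bound=math.inf):
    """Repeatedly tighten the bound until no cheaper solution is found.

    find(bound) must return a solution with cost < bound, or None.
    Returns (best solution, its cost); (None, bound) if nothing is found.
    """
    best = None
    while True:
        x = find(bound)
        if x is None:
            return best, bound
        best, bound = x, cost(x)


# Toy search space: each candidate is its own cost.
CANDIDATES = [7, 3, 9, 5, 1]


def find_below(bound):
    """Return the first candidate strictly below the bound, if any."""
    for c in CANDIDATES:
        if c < bound:
            return c
    return None


if __name__ == "__main__":
    print(optim(find_below, cost=lambda c: c))
```

On the toy data the bound tightens from 7 to 3 to 1, after which no cheaper candidate remains.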

Placing N queens on an NxN Chessboard

    queens(N, L) :-
        length(L, N),
        fd_domain(L, 1, N),
        safe(L),
        % GNU Prolog expects a list of labeling options
        fd_labeling(L, [variable_method(first_fail)]).

    safe([]).
    safe([X|L]) :-
        noattack(L, X, 1),
        safe(L).

    noattack([], _, _).
    noattack([Y|L], X, I) :-
        X #\= Y,
        X #\= Y + I,
        X + I #\= Y,
        I1 is I + 1,
        noattack(L, X, I1).
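For readers without a CLP system at hand, the same three disequalities can be exercised in plain Python. The sketch below uses simple backtracking that rejects a partial placement as soon as it violates a constraint; it does not perform finite-domain propagation or first-fail labeling, so it only mirrors the model, not the solver.

```python
def noattack(x, others):
    """Queen x is compatible with the queens placed 1, 2, ... entries later."""
    for i, y in enumerate(others, start=1):
        if x == y or x == y + i or x + i == y:
            return False
    return True


def safe(placement):
    """All pairs of queens in the (possibly partial) placement are compatible."""
    return all(
        noattack(placement[k], placement[k + 1:]) for k in range(len(placement))
    )


def queens(n):
    """Return every solution as a list of n values in 1..n."""
    solutions = []

    def place(prefix):
        if len(prefix) == n:
            solutions.append(prefix)
            return
        for value in range(1, n + 1):
            candidate = prefix + [value]
            if safe(candidate):  # prune as soon as a constraint is violated
                place(candidate)

    place([])
    return solutions


if __name__ == "__main__":
    print(len(queens(6)))
```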

Optimal Scheduling of a Project of 50 Tasks

CLP Approaches to Protein Structure Prediction
HP cubic model [Backofen-Will 03]
HP face-centered cubic model [Dal Palù-Dovier-Fogolari 04]
- from seconds for n < 50 to tens of minutes for n < 100

    fcc_pf(ID, Time, Compact) :-
        protein(ID, Primary, Secondary),
        constrain(Primary, Secondary, Indexes,
                  Tertiary, Energy, Matrix, Freq, Compact),
        writetime(video, 'constraint time: '), nl, !,
        solution_search(Time, Primary, Secondary, Indexes,
                        Tertiary, Energy, Matrix, Freq),
        print_results(ID, Time, Primary, Secondary, Tertiary, Compact).

Constraint Logic Program [Dal Palù-Dovier-Fogolari 04]

Domain bounds

    domain_bounds(_, []).
    domain_bounds(N, [X,Y,Z|Rest]) :-
        CUBESIZE is N * 2,
        domain([X,Y,Z], 0, CUBESIZE),
        sum([X,Y,Z], #=, Sum),
        even(Sum),
        domain_bounds(N, Rest).

Circuit constraint

    avoid_self_loops(Tertiary, N) :-
        BASE is 2*N+1,
        positions_to_integers(Tertiary, ListIntegers, BASE),
        all_different(ListIntegers).

Connectivity constraints

    next(X1,Y1,Z1,X2,Y2,Z2) :-
        domain([Dx,Dy,Dz], 0, 1),
        Dx #= abs(X1-X2),
        Dy #= abs(Y1-Y2),
        Dz #= abs(Z1-Z2),
        sum([Dx,Dy,Dz], #=, 2).

Non-connectivity constraints

    non_next(X1,Y1,Z1,X2,Y2,Z2) :-
        Dx #= (X1-X2)*(X1-X2),
        Dy #= (Y1-Y2)*(Y1-Y2),
        Dz #= (Z1-Z2)*(Z1-Z2),
        sum([Dx,Dy,Dz], #>, 2).

Contact energy constraint
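The circuit constraint relies on mapping each 3D position to a single integer so that one `all_different` covers all residues. The slide does not show `positions_to_integers`, so the encoding below is an assumption: with coordinates in 0..2N and base 2N+1, reading (x, y, z) as a three-digit number in that base is injective. The flat coordinate list mirrors the `[X,Y,Z|Rest]` layout used in `domain_bounds`.

```python
def positions_to_integers(tertiary, base):
    """Encode a flat list [x1, y1, z1, x2, y2, z2, ...] as one integer per point.

    Assumed encoding (not shown in the source): x*base^2 + y*base + z,
    which is injective when every coordinate lies in 0..base-1.
    """
    xs, ys, zs = tertiary[0::3], tertiary[1::3], tertiary[2::3]
    return [(x * base + y) * base + z for x, y, z in zip(xs, ys, zs)]


def avoid_self_loops(tertiary, n):
    """True when no two residues occupy the same lattice point."""
    base = 2 * n + 1
    codes = positions_to_integers(tertiary, base)
    return len(set(codes)) == len(codes)  # plain-Python all_different


if __name__ == "__main__":
    print(positions_to_integers([0, 0, 0, 1, 1, 0], base=5))
    print(avoid_self_loops([0, 0, 0, 1, 1, 0], n=2))
```

Collapsing three coordinates into one integer lets a single global constraint replace a quadratic number of pairwise disequalities over triples.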

Explored Search Tree for Protein 1KVG (n=12)

Size    Time
17      0,04
27      1,76
34      0,8
36      4,31
54      45,3
63      58mn
69      2mn
78      1,18
97      35mn
104     1 0mn