Structural bioinformatics


Eran Eyal, Plant Sciences Department, Weizmann Institute of Science

Why structures?
The 3D representation of a molecule is more informative than its sequence: it reveals properties of the molecule that cannot be detected from the sequence alone. Similar sequence implies similar structure, and similar structure implies similar function.

Source of data: the PDB (http://www.rcsb.org/pdb/, http://pdb.weizmann.ac.il/) holds crystal structures, NMR models and structures from other methods. The PDB is the main repository for the processing and distribution of 3D biological macromolecular structure data.

(Figure: PDB content growth, from http://www.rcsb.org/pdb/)

Data source: X-ray crystallography
Clone/express/purify → crystallize → collect X-ray diffraction data and solve the phase problem → interpret the electron density map → coordinates of the atoms in the protein molecule.

Data source: NMR spectroscopy
NMR gives information about spatially close atoms → a list of distance constraints plus dihedral angle constraints → multiple models of the protein structure → coordinates of the atoms in the protein molecule.

X-ray crystallography vs. NMR

                         X-ray crystallography    NMR
    Atomic resolution    Good                     Reasonable
    Hydrogens            Rarely determined        Determined
    Molecule size        No restriction           Small proteins
    Dynamics             Snapshot                 Multiple models
    Procedure            Very long                Long

Membrane proteins: problematic.

What information is included in the PDB?
File format:
Header section: protein description, literature, data about the experiment, sequence.
Coordinate section: structure (atomic coordinates), connectivity.
http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html
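Because NMR delivers a list of distance constraints and an ensemble of models rather than a single coordinate set, a natural first check is whether each model actually satisfies those constraints. The sketch below assumes a deliberately simplified, hypothetical representation (a model is a dict of atom label to (x, y, z) in Å, a constraint is an (atom_a, atom_b, upper_bound) tuple); real NMR restraint files also carry lower bounds, dihedral restraints and their own formats.

```python
import math

# Hypothetical, simplified representation (not a real NMR restraint file format):
# a model maps atom labels to (x, y, z) in angstroms, and each constraint is
# (atom_a, atom_b, upper_bound) with the bound in angstroms.


def violations(model, constraints, tolerance=0.0):
    """Return (atom_a, atom_b, distance, bound) for each violated upper bound."""
    violated = []
    for a, b, upper in constraints:
        d = math.dist(model[a], model[b])
        if d > upper + tolerance:
            violated.append((a, b, round(d, 2), upper))
    return violated


# An NMR entry contains several models; each is checked against the same list.
ensemble = [
    {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 0.0, 0.0), "H3": (0.0, 4.0, 0.0)},
    {"H1": (0.0, 0.0, 0.0), "H2": (6.0, 0.0, 0.0), "H3": (0.0, 4.0, 0.0)},
]
constraints = [("H1", "H2", 5.0), ("H1", "H3", 5.0)]

for i, model in enumerate(ensemble, start=1):
    print(f"model {i}: {violations(model, constraints)}")
# model 1: []
# model 2: [('H1', 'H2', 6.0, 5.0)]
```

The `tolerance` parameter is a common convenience for ignoring tiny violations; its value here is left to the caller.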

Example PDB file (entry 1IGT), header section:

    HEADER    IMMUNOGLOBULIN                          25-OCT-96   1IGT
    COMPND    MOLECULE: IGG2A INTACT ANTIBODY MAB231;
    SOURCE    MOUSE (MUS MUSCULUS, STRAIN BALB/C)
    KEYWDS    INTACT IMMUNOGLOBULIN V REGION C REGION, IMMUNOGLOBULIN
    EXPDTA    X-RAY DIFFRACTION
    AUTHOR    L.J.HARRIS,S.B.LARSON,K.W.HASEL,A.MCPHERSON
    REVDAT   1   07-JUL-97 1IGT    0
    JRNL        AUTH   L.J.HARRIS,S.B.LARSON,K.W.HASEL,A.MCPHERSON
    JRNL        TITL   REFINED STRUCTURE OF AN INTACT IGG2A MONOCLONAL
    JRNL        TITL 2 ANTIBODY
    JRNL        REF    BIOCHEMISTRY                V.  36  1581 1997
    JRNL        REFN   ASTM BICHAW  US ISSN 0006-2960  0033
    REMARK   2 RESOLUTION. 2.8 ANGSTROMS.
    REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;
    REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;
    REMARK 470 I=INSERTION CODE):
    REMARK 470   M RES CSSEQI  ATOMS
    REMARK 470     LEU A   6    CG   CD1  CD2
    REMARK 470     ARG A   8    CG   CD   NE   CZ   NH1  NH2
    HELIX    1   1 PRO A   80  ASP A   82  5
    SHEET    1   A 4 LEU A   4  SER A   7  0
    SHEET    2   A 4 ILE A  19  HIS A  24  1  N  HIS A  24   O  THR A
    SHEET    3   A 4 GLY A  70  ILE A  75  1  N  ILE A  75   O  ILE A
    SHEET    4   A 4 PHE A  62  SER A  67  1  N  SER A  67   O  GLY A
    SSBOND   1 CYS A   23    CYS A   88
    CRYST1   65.820   76.770  100.640  88.05  92.35  97.23 P 12
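Every PDB record is identified by its record name in the first six columns, so header fields can be picked out line by line. A minimal sketch, assuming the `REMARK   2 RESOLUTION. ... ANGSTROMS.` wording shown in the excerpt; it extracts only the entry ID, experimental method and resolution and does not cover every header variant (e.g. NMR entries, which report no resolution).

```python
import re


def parse_header(lines):
    """Extract a few header fields from PDB-format lines (record name = columns 1-6)."""
    info = {}
    for line in lines:
        record = line[:6].strip()
        if record == "HEADER":
            # The PDB ID code is the last token of the HEADER record.
            info["id"] = line.split()[-1]
        elif record == "EXPDTA":
            info["method"] = line[6:].strip()
        elif record == "REMARK" and line[6:10].strip() == "2":
            match = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if match:
                info["resolution"] = float(match.group(1))
    return info


header = [
    "HEADER    IMMUNOGLOBULIN                          25-OCT-96   1IGT",
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2",
    "REMARK   2 RESOLUTION. 2.8 ANGSTROMS.",
]
print(parse_header(header))
# {'id': '1IGT', 'method': 'X-RAY DIFFRACTION', 'resolution': 2.8}
```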

Sequence records:

    SEQRES   1 A  214  ASP ILE VAL LEU THR GLN SER PRO SER SER LEU SER
    SEQRES   2 A  214  SER LEU GLY ASP THR ILE THR ILE THR CYS HIS ALA
    SEQRES   3 A  214  GLN ASN ILE ASN VAL TRP LEU SER TRP TYR GLN GLN

Coordinate records (fields: atom number, atom name, residue name, chain, residue number, X, Y, Z, occupancy, B-factor):

    ATOM      1  N   ASP A   1       1.600  85.453  44.624  1.00 43.02
    ATOM      2  CA  ASP A   1       1.649  84.304  45.569  1.00 38.99

Heterogen and connectivity records:

    HET    NAG  D   1      26
    HETNAM     NAG N-ACETYL-D-GLUCOSAMINE
    FORMUL   5  NAG    8(C8 H15 N1 O6)
    HETATM 3568 CA    CA     0      12.108  17.156  78.830  1.00  7.31
    HETATM 3569  O   HOH     1      12.160  19.496  78.042  1.00 33.27
    HETATM 3570  O   HOH     2      23.163  36.984  67.113  1.00 18.80
    HETATM 3571  O   HOH     3      10.102  42.843  63.995  1.00 24.28
    HETATM 3572  O   HOH     4      22.311  19.282  69.877  1.00 27.58
    CONECT  482  480 3568
    CONECT  509  507 3568
    CONECT 3568  482  509 3799

Visualization
Molecular graphics: what do we need?
- Rotation and translation
- Coloring specific parts of the molecule
- Labeling of residues and atoms
- Geometrical measurements (distances and angles)
- Schematic representations: atoms/bonds, secondary structures, molecular surfaces
- Comparison of structures
- Saving pictures
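ATOM and HETATM records like those above are fixed-column text, so they should be parsed by column position rather than by splitting on whitespace (neighbouring fields can run together, e.g. large negative coordinates). A minimal sketch following the column layout of the published PDB format (e.g. X in columns 31-38), plus the distance and angle measurements listed for molecular graphics; the `Atom` class and function names are our own, not a library API.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    record: str      # "ATOM" or "HETATM"
    serial: int
    name: str
    res_name: str
    chain: str       # empty string when the chain column is blank
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column positions (0-based slices)."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def angle(p, q, r) -> float:
    """Angle p-q-r in degrees (q is the vertex); p, q, r are (x, y, z) tuples."""
    u = [pi - qi for pi, qi in zip(p, q)]
    v = [ri - qi for ri, qi in zip(r, q)]
    cos_t = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


records = [
    "ATOM      1  N   ASP A   1       1.600  85.453  44.624  1.00 43.02",
    "ATOM      2  CA  ASP A   1       1.649  84.304  45.569  1.00 38.99",
    "HETATM 3569  O   HOH     1      12.160  19.496  78.042  1.00 33.27",
]
atoms = [parse_atom(r) for r in records]
print(atoms[0].res_name, atoms[0].chain, atoms[0].res_seq)  # ASP A 1
print(round(distance(atoms[0], atoms[1]), 2))              # N-CA distance: 1.49
print(atoms[2].record, atoms[2].res_name)                  # HETATM HOH
```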

Representation of molecules (1)
Stick model (ball size 0, stick size 0.2); ball-and-stick (ball size 0.4, stick size 0.2); space-filled model (ball size 0.8, stick size 0); molecular surfaces.

Representation of molecules (2)
Backbone only (connections between Cα atoms); schematic (helix as a cylinder, strand as an arrow); surface.

How to search in the PDB?
The OCA browser, developed at the WIS (Weizmann Institute of Science) by Jaime Prilusky, is the best interface to the PDB. Entries can be retrieved by a variety of criteria: http://bip.weizmann.ac.il/ocabin/ocamain
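The "backbone only" view keeps just the Cα atoms and joins consecutive ones. A minimal sketch, assuming atoms have already been reduced to hypothetical `(atom_name, residue_number, (x, y, z))` tuples; it builds the Cα trace and flags suspiciously long Cα-Cα gaps. Consecutive Cα atoms in a trans peptide are about 3.8 Å apart, but the 4.2 Å cut-off used here is only an illustrative heuristic.

```python
import math

# Simplified input (hypothetical): (atom_name, residue_number, (x, y, z)) tuples.


def ca_trace(atoms):
    """Keep only C-alpha atoms, ordered by residue number: the 'backbone only' view."""
    cas = sorted((res, xyz) for name, res, xyz in atoms if name == "CA")
    return [xyz for _, xyz in cas]


def chain_breaks(trace, max_gap=4.2):
    """Indices i where the CA(i)-CA(i+1) distance exceeds max_gap angstroms.

    Consecutive C-alpha atoms are ~3.8 A apart, so a much larger gap usually
    means missing residues (the threshold is a heuristic, not a standard).
    """
    return [i for i in range(len(trace) - 1)
            if math.dist(trace[i], trace[i + 1]) > max_gap]


atoms = [
    ("N", 1, (0.0, 0.0, 0.0)),
    ("CA", 1, (1.0, 0.0, 0.0)),
    ("CA", 2, (4.8, 0.0, 0.0)),
    ("CA", 3, (8.6, 0.0, 0.0)),
    ("CA", 5, (16.0, 0.0, 0.0)),   # residue 4 is missing
]
trace = ca_trace(atoms)
print(len(trace), chain_breaks(trace))  # 4 [2]
```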

Problems in the PDB database
Missing data; quality of the data; format problems (e.g. residue numbering); doubtful independence of the data (the same or closely related proteins are deposited many times).

Structural analysis of proteins
Examination of atomic interactions, secondary structures, cavities and buried/exposed regions; analysis of ligands.

Topics in structural bioinformatics
Structural alignment, structural classification, secondary structure prediction, structure prediction, molecular docking, molecular dynamics.
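One of the analyses listed above is separating buried from exposed regions. Proper tools compute solvent-accessible surface area; a much cruder proxy, used here only as an illustration, is to count how many other Cα atoms lie within a sphere around each residue. The 10 Å radius and the `buried_min` cut-off are hypothetical parameters, and the toy coordinates are far smaller than a real protein.

```python
import math


def neighbor_counts(ca_coords, radius=10.0):
    """Number of other C-alpha atoms within `radius` angstroms of each residue."""
    counts = []
    for i, p in enumerate(ca_coords):
        counts.append(sum(1 for j, q in enumerate(ca_coords)
                          if j != i and math.dist(p, q) <= radius))
    return counts


def classify(counts, buried_min):
    """Crude burial call: 'B' if a residue has at least buried_min neighbours, else 'E'."""
    return "".join("B" if c >= buried_min else "E" for c in counts)


# Toy example: four residues packed together and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0),
          (0.0, 0.0, 3.8), (30.0, 0.0, 0.0)]
counts = neighbor_counts(coords)
print(counts, classify(counts, buried_min=3))  # [3, 3, 3, 3, 0] BBBBE
```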

Structural alignment
Why compare protein structures? Structures are more conserved in evolution than sequences: two homologous proteins have the same overall structure, and two proteins without detectable sequence similarity may still have the same structure. In the twilight zone of sequence similarity, structural alignment can help to determine correctly the relationship between two proteins. Structural similarity is therefore a more sensitive way than sequence alignment to infer protein function.

What properties of a protein might be used to detect structural similarity to other proteins?
- Sequence
- Type and number of secondary structures (sheets, helices)
- Structural arrangement of secondary structures
- Structural attributes of individual amino acids
- Distances between amino acids in the protein

Structural classification: all-α, all-β, α/β, α+β.
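"Distances between amino acids in the protein" can be compared directly: internal distance matrices do not change when a structure is rotated or translated, so two structures can be compared without superposing them first. A minimal sketch of the distance RMSD (dRMSD), assuming the residue-to-residue correspondence is already known; many structural alignment programs instead find that correspondence and report a superposition-based RMSD, which this code does not do.

```python
import math
from itertools import combinations


def distance_matrix(coords):
    """All pairwise distances between (x, y, z) points."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally long, residue-by-residue aligned structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of aligned residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = sum((da[i][j] - db[i][j]) ** 2 for i, j in pairs)
    return math.sqrt(squared / len(pairs))


a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
# Same shape, rotated 90 degrees about z and translated: dRMSD is ~0.
b = [(-y + 10.0, x + 5.0, z) for x, y, z in a]
# Last residue moved: dRMSD becomes clearly positive.
c = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 4.0)]
print(round(drmsd(a, b), 6), round(drmsd(a, c), 2))
```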

The best-known structural classification databases are SCOP and CATH.

Secondary structure prediction
Predicting the tertiary structure from the amino acid sequence is still a very difficult task, whereas predicting more local structural properties is easier. Secondary structure prediction is therefore both important and more feasible: it is a bridge between the linear sequence information and the 3D structure. Programs in this field often employ various machine learning approaches. Example (H = helix, S = strand, O = other):

    ACHYTTEKRGGSGTKKREA
    HHHHHHHHOOOOOSSSSSS

Building 3D models of proteins
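Machine-learning predictors typically label each residue from a window of its sequence neighbours. The sketch below shows only the data side of such a method, under our own assumptions: a one-hot encoding of a sliding window (half-width 3 is an arbitrary choice) that a classifier could be trained on, and Q3, the fraction of residues whose three-state label is predicted correctly. No classifier is trained, and the "observed" string in the example is invented for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_features(sequence, half_width=3):
    """One-hot encode a window of 2*half_width+1 residues centred on each position.

    Window positions past either end of the chain are encoded as all zeros,
    so every residue yields a vector of length (2*half_width+1) * 20.
    """
    features = []
    for centre in range(len(sequence)):
        vector = []
        for pos in range(centre - half_width, centre + half_width + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence):
                one_hot[AMINO_ACIDS.index(sequence[pos])] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


def q3(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


sequence = "ACHYTTEKRGGSGTKKREA"
predicted = "HHHHHHHHOOOOOSSSSSS"
observed = "HHHHHHHOOOOOOSSSSOO"   # hypothetical "true" states for illustration
feats = window_features(sequence)
print(len(feats), len(feats[0]))         # 19 140
print(round(q3(predicted, observed), 3)) # 0.842
```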

There are three approaches, each producing a structural model: building by homology (homology modelling), based on alignment with proteins of known structure; fold recognition (threading), in which the sequence is tested against known protein folds; and ab initio prediction.
(Figure: the query sequence aligned with sequences of proteins of known structure.)

Building by homology
There are millions of proteins but only several thousand different folds. If we can find a similar protein with a known structure, we can use its fold as the basic template for the structure of our protein (e.g. the sequence MAAGYAVLS...). The positions of loops and side chains are constructed in a second stage.
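The template idea can be illustrated by walking along a target-template alignment and copying the template's backbone position onto every target residue that is aligned to a template residue; target residues opposite a gap (typically insertions in loops) are left empty for the second stage. This is only a simplified sketch with hypothetical names and one point per residue; Modeller, for example, works from spatial restraints rather than by copying coordinates.

```python
def transfer_coordinates(target_aln, template_aln, template_coords):
    """Copy template backbone coordinates onto aligned target residues.

    target_aln / template_aln: equal-length aligned strings, '-' marks a gap.
    template_coords: one (x, y, z) per template residue, in ungapped order.
    Returns one entry per target residue: copied coordinates, or None where the
    target residue faces a gap (to be built later, e.g. as part of a loop).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    model = []
    t_index = 0  # position in the ungapped template sequence
    for t_res, tpl_res in zip(target_aln, template_aln):
        if t_res != "-":
            model.append(template_coords[t_index] if tpl_res != "-" else None)
        if tpl_res != "-":
            t_index += 1
    return model


template_coords = [(float(i), 0.0, 0.0) for i in range(8)]  # toy template, 8 residues
model = transfer_coordinates("MAAGYAVL-S", "MAA--AVLKT", template_coords)
print([i for i, xyz in enumerate(model) if xyz is None])  # [3, 4]
```

Template residue K (aligned to a target gap) is simply skipped, so the target model keeps one entry per target residue.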

Homology modelling steps: find proteins of known structure that are similar to your sequence → build an alignment → build a structural model → check the model → finish.

Loops can be constructed either by using a database of loops, classified according to their length, the geometry of their ends and their sequence, or without any use of previous data, using physical and chemical principles.
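The "build alignment" step is, at its simplest, a global pairwise alignment. A minimal Needleman-Wunsch sketch with linear gap costs; the scores (match +1, mismatch -1, gap -2) are hypothetical placeholders, whereas practical modelling pipelines use substitution matrices such as BLOSUM62 and affine gap penalties. Percent identity is computed here over gap-free aligned pairs, which is only one of several definitions in use.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (aligned_a, aligned_b, score)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aln_a, aln_b):
    """Identical positions as a percentage of aligned pairs that contain no gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


aln_a, aln_b, s = needleman_wunsch("MAAGYAVLS", "MAAYAVLS")
print(aln_a, aln_b, s)                 # MAAGYAVLS MAA-YAVLS 6
print(percent_identity(aln_a, aln_b))  # 100.0
```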

Several web pages for homology modeling:
COMPOSER: felix.bioccam.ac.uksoftbase.html
MODELLER: guitar.rockefeller.edu/modeller/modeller.html
WHAT IF: www.sander.embl-heidelberg.de/whatif/
SWISS-MODEL: www.expasy.ch/swissmodel.html, http://www.expasy.ch/swissmod/swissmodel.html

Modeller (http://guitar.rockefeller.edu/modeller/about_modeller.shtml)
An advanced program for homology modeling, based on distance constraints. It is implemented in several popular modelling packages such as InsightII, and its source is available for Unix platforms at the above URL.

Threading (fold recognition)
The input sequence is threaded onto different folds from a library of known folds. A scoring function gives a score for the compatibility between the sequence and each structure. A statistically significant score indicates that the input protein adopts a 3D structure similar to that of the examined fold.
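A toy version of threading: describe each fold in the library as a string of per-position environments, score how well each residue suits the environment it lands in, and keep the best-scoring fold. This sketch simplifies heavily under stated assumptions: only two environments (B = buried, E = exposed, instead of the richer H-bond donor/acceptor, glycine and hydrophobic classes), a hypothetical +1/-1 preference table, and ungapped placement of the sequence on the fold.

```python
HYDROPHOBIC = set("AVILMFWC")


def pair_score(residue, environment):
    """Hypothetical preference: +1 if a residue suits its environment, -1 otherwise.

    Hydrophobic residues are taken to suit buried ('B') sites and all other
    residues to suit exposed ('E') sites.
    """
    suits = (residue in HYDROPHOBIC) == (environment == "B")
    return 1 if suits else -1


def thread(sequence, fold):
    """Ungapped threading: score the sequence placed position by position on a fold."""
    if len(sequence) != len(fold):
        raise ValueError("ungapped threading needs equal lengths")
    return sum(pair_score(r, e) for r, e in zip(sequence, fold))


def best_fold(sequence, library):
    """Return (name, score) of the highest-scoring fold in the library."""
    scores = {name: thread(sequence, env) for name, env in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


library = {"fold_1": "BEBBEEBB", "fold_2": "EBEEBBEE"}  # made-up fold library
print(best_fold("VKVLEKAI", library))  # ('fold_1', 8)
```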

This method is less accurate than homology modelling but can be applied to more cases. When the fold of our protein is not represented in the library, this method cannot give a correct solution. The most important part is the accuracy of the scoring function, which evaluates the compatibility between a structure and a sequence.
(Figure: the input sequence is threaded onto a library of folds of known proteins, whose positions are labelled H-bond donor, H-bond acceptor, glycine or hydrophobic; raw scores S are compared through Z-scores.)

Web sites for fold recognition
Profiles:
3D-PSSM: http://www.bmm.icnet.uk/~3dpssm
Libra I: http://www.ddbj.nig.ac.jp/htmls/email/libra/libra_i.html
UCLA-DOE: http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html
Contact potentials:
123D: http://wwwimmb.ncifcrf.gov/~nicka/123d.html
Profit: http://lore.came.sbg.ac.at/home.html
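Whether a threading score is "statistically significant" is commonly judged with a Z-score: compare the real score with scores of shuffled versions of the same sequence, which keep the composition but destroy the order. A minimal sketch of that idea; the scoring function `buried_hydrophobics` and the example sequence and folds are hypothetical, and real servers use their own background models.

```python
import random
import statistics


def z_score(score_fn, sequence, fold, n_shuffles=1000, seed=0):
    """Z-score of a sequence-fold score against shuffled versions of the sequence."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = score_fn(sequence, fold)
    residues = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(score_fn("".join(residues), fold))
    mean = statistics.fmean(background)
    sd = statistics.pstdev(background)
    return (observed - mean) / sd if sd > 0 else 0.0


def buried_hydrophobics(sequence, fold):
    """Hypothetical score: +1 for each hydrophobic residue on a buried ('B') site."""
    return sum(1 for r, e in zip(sequence, fold) if e == "B" and r in "AVILMFWC")


sequence = "VKLEIKAEVKLEFKMD"   # 8 hydrophobic and 8 polar residues (made up)
matching_fold = "BE" * 8        # buried sites line up with the hydrophobic residues
mismatched_fold = "EB" * 8      # buried sites line up with the polar residues
print(round(z_score(buried_hydrophobics, sequence, matching_fold), 2))
print(round(z_score(buried_hydrophobics, sequence, mismatched_fold), 2))
```

With these inputs the matching fold gets a Z-score of roughly +4 and the mismatched fold roughly -4; the exact values depend on the random shuffles.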

Ab initio methods for modelling
Of great theoretical interest but not practical. The basic idea is to build an empirical function that simulates real physical forces and the potentials of chemical contacts. If we had a perfect function and were able to scan all the possible conformations, we would be able to detect the correct fold.

Docking
Docking: finding the binding orientation of two molecules with known structures.
According to the molecules involved: protein-ligand docking and protein-protein docking. Specific docking algorithms are usually designed to deal with one of these problems but not with both (they differ in contact area, flexibility, level of representation, etc.). Docking can be local or global.
Why? Understanding interactions and the roles of specific amino acids; design of mutations and changes of activity; comparison of the affinities of different molecules; drug design.
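The ab initio recipe (an energy function plus a search over all conformations) can be made concrete with a toy model swapped in for illustration: the HP lattice model, in which residues are only hydrophobic (H) or polar (P), the chain lives on a 2D square lattice, and each non-bonded H-H contact contributes -1 to the energy. It is nowhere near a real force field, but exhaustive enumeration shows both the idea and why it is impractical: the number of conformations grows exponentially with chain length.

```python
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Yield every conformation of an n-residue chain (n >= 2) on the 2D square lattice.

    The first bond is fixed along +x, which removes copies that differ only by
    translation or rotation (mirror images are still counted separately).
    """
    if n < 2:
        raise ValueError("need at least two residues")

    def extend(path, occupied):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                occupied.remove(nxt)
                path.pop()

    start = [(0, 0), (1, 0)]
    yield from extend(start, set(start))


def hp_energy(sequence, conformation):
    """-1 for every pair of H residues adjacent on the lattice but not in the chain."""
    position = {xy: i for i, xy in enumerate(conformation)}
    energy = 0
    for i, (x, y) in enumerate(conformation):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):  # right/up only, so each contact counts once
            j = position.get(nb)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


def best_conformation(sequence):
    """Exhaustive search: return (lowest energy, one conformation reaching it)."""
    best = min(self_avoiding_walks(len(sequence)), key=lambda c: hp_energy(sequence, c))
    return hp_energy(sequence, best), best


print(best_conformation("HPPH"))       # energy -1: the chain closes into a square
print(best_conformation("HPHPPHHPH")[0])
```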