Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Similar documents
Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Structural bioinformatics

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center

Protein 3D Structure Prediction

Computational Methods for Protein Structure Prediction

Introduction to Proteins

Programme Good morning and summary of last week Levels of Protein Structure - I Levels of Protein Structure - II

CFSSP: Chou and Fasman Secondary Structure Prediction server

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

LIST OF ACRONYMS & ABBREVIATIONS

Amino Acids and Proteins

Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 29 September 2005

Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor

Molecular Structures

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

Molecular Structures

Ch Biophysical Chemistry

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Fundamentals of Protein Structure

6-Foot Mini Toober Activity

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski.

Algorithms in Bioinformatics ONE Transcription Translation

X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004


Dr. R. Sankar, BSE 631 (2018)

Computer Aided Drug Design. Protein structure prediction. Qin Xu

Suppl. Figure 1: RCC1 sequence and sequence alignments. (a) Amino acid

Structure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München

Protein Folding Problem I400: Introduction to Bioinformatics

Protein Structure/Function Relationships

Basic concepts of molecular biology

Solutions to Problem Set 1

Protein Structure Prediction

Protein Structure Databases, cont. 11/09/05

Computational Methods for Protein Structure Prediction and Fold Recognition... 1 I. Cymerman, M. Feder, M. PawŁowski, M.A. Kurowski, J.M.

Amino Acid Sequences and Evolutionary Relationships

Structure and Function of the First Full-Length Murein Peptide Ligase (Mpl) Cell Wall Recycling Protein

Amino Acid Sequences and Evolutionary Relationships

Protein Structure Prediction. christian studer , EPFL

Virtual bond representation

4/10/2011. Rosetta software package. Rosetta.. Conformational sampling and scoring of models in Rosetta.

Basic concepts of molecular biology

Problem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below.

Alpha-helices, beta-sheets and U-turns within a protein are stabilized by (hint: two words).

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

ECS 129: Structural Bioinformatics March 15, 2016

2018 Protein Modeling Exam Key

Immune system IgGs. Carla Cortinas, Eva Espigulé, Guillem Lopez-Grado, Margalida Roig, Valentina Salas. Group 2

Sequence Databases and database scanning

First&year&tutorial&in&Chemical&Biology&(amino&acids,&peptide&and&proteins)&! 1.&!

11 questions for a total of 120 points

Bioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012

Protein Structure Analysis

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

CENTRE FOR BIOINFORMATICS M. D. UNIVERSITY, ROHTAK

Cristian Micheletti SISSA (Trieste)

N.A. Lacher, Q. Wang, and C.W. Demarest. June 24th, Pfizer BioTherapeutics Pharmaceutical Sciences

Homework Project Protein Structure Prediction Contact me if you have any questions/problems with the assignment:

Are specialized servers better at predicting protein structures than stand alone software?

Prediction of protein disorder Zsuzsanna Dosztányi

BC 367, Exam 2 November 13, Part I. Multiple Choice (3 pts each)- Please circle the single best answer.

Bi 8 Lecture 7. Ellen Rothenberg 26 January Reading: Ch. 3, pp ; panel 3-1

Fourier-based classification of protein secondary structures

Protein analysis. Dr. Mamoun Ahram Summer semester, Resources This lecture Campbell and Farrell s Biochemistry, Chapters 5

Molecular design principles underlying β-strand swapping. in the adhesive dimerization of cadherins

Unit 1. DNA and the Genome

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Protein design. CS/CME/BioE/Biophys/BMI 279 Oct. 24, 2017 Ron Dror

Sequence Determinants of a Conformational Switch in a Protein

Structure Prediction. Modelado de proteínas por homología GPSRYIV. Master Dianas Terapéuticas en Señalización Celular: Investigación y Desarrollo

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

1. DNA replication. (a) Why is DNA replication an essential process?

From code to translation

De novo sequencing in the identification of mass data. Wang Quanhui Liu Siqi Beijing Institute of Genomics, CAS

If you wish to have extra practice with swiss pdb viewer or to familiarize yourself with how to use the program here is a tutorial:

Nucleic acid and protein Flow of genetic information

Protein design. CS/CME/BioE/Biophys/BMI 279 Oct. 24, 2017 Ron Dror

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Dynamic Programming Algorithms

7.014 Problem Set 4 Answers to this problem set are to be turned in. Problem sets will not be accepted late. Solutions will be posted on the web.

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Examining the components of your peptide sample with AccuPep QC. Lauren Lu, Ph.D. October 29, 2015, 9:00-10:00 AM EST

Pacific Symposium on Biocomputing 4: (1999)

03-511/711 Computational Genomics and Molecular Biology, Fall

ONLINE BIOINFORMATICS RESOURCES

7.014 Solution Set 4

Important points from last time

SUPPLEMENTARY INFORMATION Figures. Supplementary Figure 1 a. Page 1 of 30. Nature Chemical Biology: doi: /nchembio.2528

Protein structure. Wednesday, October 4, 2006

Zool 3200: Cell Biology Exam 3 3/6/15

Chem 465 Biochemistry II

Supporting Information Contents

Workunit: P Title:MT

1/4/18 NUCLEIC ACIDS. Nucleic Acids. Nucleic Acids. ECS129 Instructor: Patrice Koehl

NUCLEIC ACIDS. ECS129 Instructor: Patrice Koehl

7.013 Problem Set 3 FRIDAY October 8th, 2004

Amino Acids. Amino Acid Structure

Lecture for Wednesday. Dr. Prince BIOL 1408

Transcription:

Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features, such as active site, substrate specificity, allosteric regulation etc. They aid in rational drug design and protein engineering. They can elucidate evolutionary relationships undetectable by sequence comparisons. They can be used to put mutations in the proper structural context.

Learning Objectives Outline the basic steps in comparative protein structure modelling. Explain how structure models can be used to support biological hypotheses. Perform simple homology modelling using web servers and evaluate the results.

The Protein Data Bank Sep. 2001 May 2006 Oct. 2012 X-ray 13116 30860 74567 NMR 2451 5368 9605 Other 338 200 674 Total 15905 36428 84846 The PDB also contains nucleotide and nucleotide analogue structures. 90000 80000 70000 60000 50000 40000 Yearly Total 30000 20000 10000 0 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012

Growth of Genbank

No Structure = No Go? Do homology modelling. Use known protein structures to make reliable models. Validate the model. Adjust expectations and use accordingly!

Why Do We Need Homology Modelling? Ab Initio protein folding (random sampling): 100 aa, 3 conf./residue gives approximately 10 48 different overall conformations! Random sampling is NOT feasible, even if conformations can be sampled at picosecond (10-12 sec) rates. Levinthal s paradox Do homology modelling instead.

How Is It Possible? The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough): prions ph, ions, cofactors, chaperones Structure is conserved much longer than sequence in evolution. Structure > Function > Sequence

How Often Can We Do It? Currently 85000 structures in the PDB Reduces to 20000 structures (chains) <30 % identical (sequence) with a resolution <3.0 Å. These fall in ~1400 different structure classes (folds). ~25% of all sequences can be modelled. ~50% can be assigned to a fold class.

Protein Folds (SCOP) in PDB 1600 1400 1200 1000 No new folds! 800 Yearly Total 600 400 200 0 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012

Worldwide Structural Genomics Fold space coverage Complete genomes Disease-causing organisms Model organisms Membrane proteins Protein-ligand interactions Hou et al., PNAS 2003, 100: 2386-2390

Homology Modeling & Structural Genomics Roberto Sánchez et al. Nature Structural Biology 7, 986-990 (2000)

How Well Can We Do It? Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20 M24 (1999)

How Is It Done? Identify template(s) Initial alignment Improve alignment Backbone generation Loop modelling Side chains Refinement Validation ß

Template Identification Search with sequence Blast Psi-Blast Fold recognition methods

Sequence vs. Structure Residues in the same column in an alignment are either: Structurally equivalent/similar Evolutionary equivalent/related/homologous Different types of similarity not necessarily equivalent. Use biological information to guide/adjust your alignment. Functional annotation in databases Active site/motifs

Alignment

F D I C R L P G S A E A V C F 6-2 0-3 -2 2-2 -3-1 -2-3 -2 0-3 N -3 2-2 -2 0-2 -2 0 2 0 1 0-2 -2 V 0-2 2-2 -1 2-1 -1-1 0-1 0 5-2 C R -2-2 -2-2 5-1 0 0 1-1 0-1 -1-2 T P E -3 2-2 -3 0-2 1 0 1 1 5 1-1 -3 A -2 0-1 -2-1 -1 1 0 1 5 1 5 0-2 I 0-3 5-2 -2 2-2 -2-1 -1-2 -1 2-2 C -3-2 -2 8-2 -3-3 -2-1 -2-3 -2-2 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

F D I C R L P G S A E A V C F 6-2 0-3 -2 2-2 -3-1 -2-3 -2 0-3 N -3 2-2 -2 0-2 -2 0 2 0 1 0-2 -2 V 0-2 2-2 -1 2-1 -1-1 0-1 0 5-2 C -3-2 -2 8-2 -3-3 -2-1 -2-3 -2-2 8 R -2-2 -2-2 5-1 0 0 1-1 0-1 -1-2 T -2 0 0-1 0 0 0-1 2 0 1 0 0-1 P -2 0-2 -3 0-2 8 0 0 1 1 1-1 -3 E -3 2-2 -3 0-2 1 0 1 1 5 1-1 -3 A -2 0-1 -2-1 -1 1 0 1 5 1 5 0-2 I 0-3 5-2 -2 2-2 -2-1 -1-2 -1 2-2 C -3-2 -2 8-2 -3-3 -2-1 -2-3 -2-2 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

Improving the Alignment 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS From Professional Gambling by Gert Vriend http://www.cmbi.kun.nl/gv/articles/text/gambling.html

Template Quality Selecting the best template is crucial! The best template may not be the one with the highest % id (best p-value ) Template 1: 93% id, 3.5 Å resolution Template 2: 90% id, 1.5 Å resolution

The Importance of Resolution 4 Å low 3 Å 2 Å 1 Å high

Key Parameters Resolution R values Agreement between data and model. Usually between 0.15 and 0.25, should not exceed 0.30. R + 0.05 > R free > R. Ramachandran plot B factors Contributions from static and dynamic disorder Well determined ~10-20 Å 2, intermediate ~20-30 Å 2, flexible 30-50 Å 2, invisible >60 Å 2.

Template Quality Ramachandran Plot X-ray structure good data. NMR structure low quality data

Error Recovery Errors in the model can NOT be recovered at a later step The alignment can not make up for a bad choice of template. Loop modeling can not make up for a poor alignment. The step where the errors were introduced should be redone.

Validation Most programs will get the bond lengths and angles right. Model Rama. plot ~ template Rama. plot. select a high quality template! Inside/outside distributions of polar and apolar residues. 26

Model Validation ProQ ProQ is a neural network-based predictor Structural features quality of a protein model. ProQ is optimized to find correct models NOT (necessarily) native structures. Two quality measures: MaxSub & LGscore Arne Elofssons group: http://www.sbc.su.se/~bjorn/proq/ 27

No Template No Go? Use De novo/ab initio/free modelling methods: find the fold of native protein by simulating the biological process of protein folding A VERY DIFFICULT task because a protein chain can fold into millions of different conformations. Use it only when no detectable homologues can be found. Methods can also be useful for fold recognition in cases of extremely low homology (e.g. convergent evolution). 28

Fold recognition models in CASP6 Two-high-scoring predictions by the top groups in FR/H (top) and FR/A (bottom). The assigned z-scores are given for the top predictions (center) as well as for two average predictions (right). G. Wang Assessment of fold recognition predictions in CASP6, Proteins 61, S7, Pages 46-66 29

Fragment-based ab initio modelling Rosetta method of the Baker group: Secondary structure prediction Fragments library of 3 and 9 residues from known structures Link fragments together, use only backbone and CB atoms Contact/pair potential Energy minimization techniques (Monte Carlo optimization) to calculate tertiary structure Refine structure including side chains Das R, Baker D, Annu. Rev. Biochem. 2008. 77:363 82 http://robetta.bakerlab.org/ 30

Problems with empirical potentials Fragments with correct local structure 31

Human intervention The best groups in CASP use maximum knowledge of query proteins Specialists can help to find a correct template and correct alignments Knowledge of function Cysteines forming disulfide bridges or binding e.g. zinc molecules Proteolytic cleavage sites Other metal binding residues Antibody epitopes or escape mutants Ligand binding Results from CD or fluorescence experiments 32

Summary Successful homology modelling depends on the following: Template quality Alignment (add biological information) Modelling program/procedure (try more than one) Always validate your final model!

Modelling Servers Comparative (homology) modelling: CPHmodels (simple) http://www.cbs.dtu.dk/services/cphmodels/ SwissModel (intermediate) http://swissmodel.expasy.org HHpred (complex) http://toolkit.tuebingen.mpg.de/hhpred Ab initio Robetta (also comparative; intermediate) http://robetta.bakerlab.org

A Modelling Example

TMX3 A Novel Thioredoxin-Like Protein of the Endoplasmic Reticulum

Background Disulfide bond formation Mostly by protein disulfide isomerase (PDI) in the ER Also by ERp57 (a PDI homologue), calreticulin and calnexin Many other PDI relatives with unknown function a) a a' b' b b) * c)

Disulfide Metabolism

Discovery of TMX3 Search for Trx-like proteins Found several conserved proteins TMX3 Conserved in all higher eukaryotes Signal peptide Trx domain Transmembrane helix Glycosylation sites ER retention signal

Overview of TMX3

Experimental Data I Expression in most tissues highest in muscle tissues. Inside ER Transmembrane protein Glycosylated

Experimental Data II CD spectroscopy Secondary structure content Fluorescence spectroscopy Functional implications

Experimental Data III Redox titration Redox potential: -157 mv Similar to PDI Reduced fraction 1 0.8 0.6 0.4 0.2 Similar function to PDI? 0 10-5 10-4 0.001 0.01 0.1 1 [GSH] 2 [GSSG], (M)

Building a Model Alignments showed similarity to PDI domains 40% sequence ID to a domain ERp57 domains Calsequestrin An acidic Trx-domain protein involved in calcium storage in muscle cells. 25% sequence ID to Trx2+3 CSQ PDI abb

Modelling Alignment Model built with MODELLER

Model vs. Template a) b) N N C C (C)

ProQ ProQ - Results Prediction using secondary structure ============================= Predicted LGscore : 4.109 Predicted MaxSub : 0.487 ============================= Different ranges of quality: LGscore>1.5 fairly good model LGscore>2.5 very good model LGscore>4 extremly good model MaxSub>0.1 fairly good model MaxSub>0.5 very good model MaxSub>0.8 extremly good model http://www.sbc.su.se/~bjornw/proq/proq.cgi

Surface Potential CSQ is an acidic protein +128 180º TMX3 is not... and is probably not regulated by calcium ions. -128 +81 180º -81

The Catalytic Domain PDI a Trx PDI abb TMX3 a

A Hydrophobic Pocket? TMX3 c PDI PDI b a' * b' b * a 25º TMX3 b *

Missing Parts Transmembrane helix Dimerisation? PDI c a' b' * 180º * a b

Levamisole Unc-74 and C. elegans Cysteine-loop ligand-gated ion channels

Conclusions Suggestions from model & experiments TMX3 is a thioredoxin-like molecule of the ER TMX3 is responsible for formation of cysteine-loop-gated ion channels involved in voluntary muscle contraction. Future experiments Structure(!) Test suggestions from model Hydrophobic pocket/target? Identify interaction partner(s)

Acknowledgements Johannes Haugstetter (ETH, Zürich) Lars Ellgaard (University of Copenhagen) Further reading: Haugstetter J, Blicher T, Ellgaard.L. Identification and characterization of a novel thioredoxin-related transmembrane protein of the endoplasmic reticulum. J Biol Chem. 2005 Mar 4;280(9): 8371-80. Haugstetter J, Maurer MA, Blicher T, Pagac M, Wider G, Ellgaard.L. Structure-function analysis of the endoplasmic reticulum oxidoreductase TMX3 reveals interdomain stabilization of the N-terminal redox-active domain. J Biol Chem. 2007 Nov 16;282(46): 33859-67.