Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Similar documents
Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Protein 3D Structure Prediction

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center

Structural bioinformatics

Computational Methods for Protein Structure Prediction

Introduction to Proteins

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

Programme Good morning and summary of last week Levels of Protein Structure - I Levels of Protein Structure - II

Molecular Structures

Molecular Structures

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

LIST OF ACRONYMS & ABBREVIATIONS

Amino Acids and Proteins

6-Foot Mini Toober Activity

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski.

CFSSP: Chou and Fasman Secondary Structure Prediction server

Protein Structure Prediction

Basic concepts of molecular biology

11 questions for a total of 120 points

4/10/2011. Rosetta software package. Rosetta.. Conformational sampling and scoring of models in Rosetta.

Dynamic Programming Algorithms


Amino Acid Sequences and Evolutionary Relationships

Amino Acid Sequences and Evolutionary Relationships

Fundamentals of Protein Structure

Are specialized servers better at predicting protein structures than stand alone software?

Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 29 September 2005

Basic concepts of molecular biology

ECS 129: Structural Bioinformatics March 15, 2016

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Computer Aided Drug Design. Protein structure prediction. Qin Xu

MATH 5610, Computational Biology

Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor

Dr. R. Sankar, BSE 631 (2018)

Virtual bond representation

Structure Prediction. Modelado de proteínas por homología GPSRYIV. Master Dianas Terapéuticas en Señalización Celular: Investigación y Desarrollo

3D Structure Prediction with Fold Recognition/Threading. Michael Tress CNB-CSIC, Madrid

X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004

Sequence Databases and database scanning

Bioinformatics Practical Course. 80 Practical Hours

Protein Structure Prediction. christian studer , EPFL

Algorithms in Bioinformatics ONE Transcription Translation

Alpha-helices, beta-sheets and U-turns within a protein are stabilized by (hint: two words).

Suppl. Figure 1: RCC1 sequence and sequence alignments. (a) Amino acid

First&year&tutorial&in&Chemical&Biology&(amino&acids,&peptide&and&proteins)&! 1.&!

Protein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)

Solutions to Problem Set 1

Problem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below.

Structure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München

Homework Project Protein Structure Prediction Contact me if you have any questions/problems with the assignment:

Workunit: P Title:MT

Cristian Micheletti SISSA (Trieste)

Protein Structure Databases, cont. 11/09/05

Protein Structure Analysis

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Protein NMR II. Lecture 5

Computational Methods for Protein Structure Prediction and Fold Recognition... 1 I. Cymerman, M. Feder, M. PawŁowski, M.A. Kurowski, J.M.

Protein structure. Wednesday, October 4, 2006

Structure and Function of the First Full-Length Murein Peptide Ligase (Mpl) Cell Wall Recycling Protein

Ch Biophysical Chemistry

Immune system IgGs. Carla Cortinas, Eva Espigulé, Guillem Lopez-Grado, Margalida Roig, Valentina Salas. Group 2

If you wish to have extra practice with swiss pdb viewer or to familiarize yourself with how to use the program here is a tutorial:

Protein design. CS/CME/BioE/Biophys/BMI 279 Oct. 24, 2017 Ron Dror

Important points from last time

CENTRE FOR BIOINFORMATICS M. D. UNIVERSITY, ROHTAK

Unit 1. DNA and the Genome

03-511/711 Computational Genomics and Molecular Biology, Fall

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

doi: /nature09408 Figure S1

Protein design. CS/CME/BioE/Biophys/BMI 279 Oct. 24, 2017 Ron Dror

Sequence Determinants of a Conformational Switch in a Protein

7.014 Solution Set 4

Supplement for the manuscript entitled

Introduction to Bioinformatics

Protein Structure/Function Relationships

Amino Acid Sequences and Evolutionary Relationships. How do similarities in amino acid sequences of various species provide evidence for evolution?

466 Asn (N) to Ala (A) Generate beta dimer Interface

N.A. Lacher, Q. Wang, and C.W. Demarest. June 24th, Pfizer BioTherapeutics Pharmaceutical Sciences

Laboratory Evolution of Robust and Enantioselective Baeyer-Villiger Monooxygenases for Asymmetric Catalysis

Prediction of Protein Structure by Emphasizing Local Side- Chain/Backbone Interactions in Ensembles of Turn

From code to translation

An insilico Approach: Homology Modelling and Characterization of HSP90 alpha Sangeeta Supehia

1. DNA replication. (a) Why is DNA replication an essential process?

7.014 Problem Set 4 Answers to this problem set are to be turned in. Problem sets will not be accepted late. Solutions will be posted on the web.

Supplementary Data for Monti, et al.

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Prediction of protein disorder Zsuzsanna Dosztányi

Protein Folding Problem I400: Introduction to Bioinformatics

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Bioinformatics. ONE Introduction to Biology. Sami Khuri Department of Computer Science San José State University Biology/CS 123A Fall 2012

In silico measurements of twist and bend. moduli for beta solenoid protein self-

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Supporting Information Contents

BC 367, Exam 2 November 13, Part I. Multiple Choice (3 pts each)- Please circle the single best answer.

CAFASP3: The Third Critical Assessment of Fully Automated Structure Prediction Methods

Zool 3200: Cell Biology Exam 3 3/6/15

Visualizing proteins with PyMol

Transcription:

Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features, such as active site, substrate specificity, allosteric regulation etc. They aid in rational drug design and protein engineering. They can elucidate evolutionary relationships undetectable by sequence comparisons. They can be used to put mutations in the proper structural context.

Learning Objectives Outline the basic steps in comparative protein structure modelling. Explain how structure models can be used to support biological hypotheses. Perform simple homology modelling using web servers and evaluate the results.

The Protein Data Bank Sep. 2001 May 2006 Oct. 2012 X-ray 13116 30860 74567 NMR 2451 5368 9605 Other 338 200 674 Total 15905 36428 84846 The PDB also contains nucleotide and nucleotide analogue structures. 90000 80000 70000 60000 50000 40000 Yearly Total 30000 20000 10000 0 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012

Growth of Sequences http://www.kurzweilai.net/dna-sequencing-data

No Structure = No Go? Do structure modelling. Use known protein structures to make models. Comparative modeling (Easy) Fold recognition (Difficult) New/unknown fold (Ab initio methods very difficult) Validate the model. Adjust expectations and use accordingly!

Do We Need Homology Modelling? Ab Initio protein folding (random sampling): 100 aa, 3 conf./residue gives approximately 10 48 different overall conformations! Random sampling is NOT feasible, even if conformations can be sampled at picosecond (10-12 sec) rates. Levinthal s paradox Do homology modelling instead.

How Is It Possible? The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough): prions ph, ions, cofactors, chaperones Structure is conserved much longer than sequence in evolution. Structure > Function > Sequence

How Often Can We Do It? Currently 85000 structures in the PDB Reduces to 20000 structures (chains) <30 % identical (sequence) with a resolution <3.0 Å. These fall in ~1400 different structure classes (folds). ~25% of all sequences can be modelled. ~50% can be assigned to a fold class.

Protein Folds (SCOP) in PDB 1600 1400 1200 1000 No new folds! 800 Yearly Total 600 400 200 0 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012

Worldwide Structural Genomics Fold space coverage Complete genomes Disease-causing organisms Model organisms Membrane proteins Protein-ligand interactions Hou et al., PNAS 2003, 100: 2386-2390

Homology Modeling & Structural Genomics What a single new fold gives. Roberto Sánchez et al. Nature Structural Biology 7, 986-990 (2000)

Protein Folds http://www.jcsg.org/

How Well Can We Do It? Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20 M24 (1999)

How Is It Done? Identify template(s) Initial alignment Improve alignment Backbone generation Loop modelling Side chains Refinement Validation ß

Template Identification Search with sequence Blast Psi-Blast Fold recognition methods

Sequence vs. Structure Residues in the same column in an alignment are either: Structurally equivalent/similar Evolutionary equivalent/related/homologous Different types of similarity not necessarily equivalent. Use biological information to guide/adjust your alignment. Functional annotation in databases Active site/motifs

Alignment

F D I C R L P G S A E A V C F 6-2 0-3 -2 2-2 -3-1 -2-3 -2 0-3 N -3 2-2 -2 0-2 -2 0 2 0 1 0-2 -2 V 0-2 2-2 -1 2-1 -1-1 0-1 0 5-2 C R -2-2 -2-2 5-1 0 0 1-1 0-1 -1-2 T P E -3 2-2 -3 0-2 1 0 1 1 5 1-1 -3 A -2 0-1 -2-1 -1 1 0 1 5 1 5 0-2 I 0-3 5-2 -2 2-2 -2-1 -1-2 -1 2-2 C -3-2 -2 8-2 -3-3 -2-1 -2-3 -2-2 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

F D I C R L P G S A E A V C F 6-2 0-3 -2 2-2 -3-1 -2-3 -2 0-3 N -3 2-2 -2 0-2 -2 0 2 0 1 0-2 -2 V 0-2 2-2 -1 2-1 -1-1 0-1 0 5-2 C -3-2 -2 8-2 -3-3 -2-1 -2-3 -2-2 8 R -2-2 -2-2 5-1 0 0 1-1 0-1 -1-2 T -2 0 0-1 0 0 0-1 2 0 1 0 0-1 P -2 0-2 -3 0-2 8 0 0 1 1 1-1 -3 E -3 2-2 -3 0-2 1 0 1 1 5 1-1 -3 A -2 0-1 -2-1 -1 1 0 1 5 1 5 0-2 I 0-3 5-2 -2 2-2 -2-1 -1-2 -1 2-2 C -3-2 -2 8-2 -3-3 -2-1 -2-3 -2-2 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

Improving the Alignment 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS From Professional Gambling by Gert Vriend http://www.cmbi.kun.nl/gv/articles/text/gambling.html

Template Quality Selecting the best template is crucial! The best template may not be the one with the highest % id (best p-value ) Template 1: 93% id, 3.5 Å resolution Template 2: 90% id, 1.5 Å resolution

The Importance of Resolution 4 Å low 3 Å 2 Å 1 Å high

Key Parameters Resolution R values Agreement between data and model. Usually between 0.15 and 0.25, should not exceed 0.30. R + 0.05 > R free > R. Ramachandran plot B factors Contributions from static and dynamic disorder Well determined ~10-20 Å 2, intermediate ~20-30 Å 2, flexible 30-50 Å 2, invisible >60 Å 2.

Template Quality Ramachandran Plot X-ray structure good data. NMR structure low quality data

Error Recovery Errors in the model can NOT be recovered at a later step The alignment can not make up for a bad choice of template. Loop modeling can not make up for a poor alignment. The step where the errors were introduced should be redone.

Validation Most programs will get the bond lengths and angles right. Model Rama. plot ~ template Rama. plot. select a high quality template! Inside/outside distributions of polar and apolar residues. 27

Model Validation ProQ ProQ is a neural network-based predictor Structural features quality of a protein model. ProQ is optimized to find correct models NOT (necessarily) native structures. Two quality measures: MaxSub & LGscore Arne Elofssons group: http://www.sbc.su.se/~bjorn/proq/ 28

Summary Successful homology modelling depends on the following: Template quality Alignment (add biological information) Modelling program/procedure (try more than one) Always validate your final model!

Finding Remote Homologues (Fold Recognition)

Why %id Is a Poor Measure 1200 models sharing 25-95% sequence identity with the submitted sequences (www.expasy.ch/swissmod)

Identification of Correct Fold % ID is a poor measure Many evolutionarily related proteins share low sequence identity A short alignment of 5 amino acids can share 100% id, but what does it mean? Alignment score even worse Many sequences will score high against each other (especially in hydrophobic stretches) P-value or E-value more reliable.

What are P and E values? E-value Number of expected hits in database with score higher than match Depends on database size P-value Probability that a random hit will have score higher than match Database size independent P(Score) Score 150 10 hits with higher score (E=10) 10000 hits in database => P=10/10000 = 0.001 Score

Sequence Profiles Not all positions in a protein are equally likely to mutate Some amino acids (active sites) are highly conserved, and the score for mismatch must be very high. Other amino acids can mutate almost freely, and the score for mismatch should be lower than the BLOSUM score. Sequence profiles can capture these differences.

Sequence Profiles ADDGSLAFVPSEF--SISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLN TVNGAI--PGPLIAERLKEGQNVRVTNTLDEDTSIHWHGLLVPFGMDGVPGVSFPG---I -TSMAPAFGVQEFYRTVKQGDEVTVTIT-----NIDQIED-VSHGFVVVNHGVSME---I IE--KMKYLTPEVFYTIKAGETVYWVNGEVMPHNVAFKKGIV--GEDAFRGEMMTKD--- -TSVAPSFSQPSF-LTVKEGDEVTVIVTNLDE------IDDLTHGFTMGNHGVAME---V ASAETMVFEPDFLVLEIGPGDRVRFVPTHK-SHNAATIDGMVPEGVEGFKSRINDE---- TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI TKAVVLTFNTSVEICLVMQGTSIV----AAESHPLHLHGFNFPSNFNLVDGMERNTAGVP Matching anything but G à large negative score Anything can match

How to Make Sequence Profiles PSIBLAST Align (BLAST) sequence against large sequence database (Swiss-Prot). Select significant alignments and make profile (weight matrix) using techniques for sequence weighting and pseudo counts. Use weight matrix to align against sequence database to find new significant hits. Repeat 2 and 3 (normally 3 times!).

Ab Initio Methods

No Template No Go? De novo / ab initio / free modelling methods: simulate the biological process of protein folding A VERY DIFFICULT task because a protein chain can fold into millions of different conformations. Use it only when no detectable homologues can be found. Methods can also be useful for fold recognition in cases of extremely low homology (e.g. convergent evolution). 38 http://cnx.org/content/m11461/latest/

Fragment-based ab initio modelling Rosetta method of the Baker group: Secondary structure prediction Fragments library of 3 and 9 residues from known structures Link fragments together, use only backbone and CB atoms Contact/pair potential Energy minimization techniques (Monte Carlo optimization) to calculate tertiary structure Refine structure including side chains Das R, Baker D, Annu. Rev. Biochem. 2008. 77:363 82 http://robetta.bakerlab.org/ 39

Problems with Empirical Potentials Fragments with correct local structure http://www.cs.ucl.ac.uk/staff/d.jones/t42morph.html

Fold recognition models in CASP6 Two-high-scoring predictions by the top groups in FR/H (top) and FR/A (bottom). The assigned z-scores are given for the top predictions (center) as well as for two average predictions (right). G. Wang Assessment of fold recognition predictions in CASP6, Proteins 61, S7, Pages 46-66 41

Human intervention The best groups in CASP use maximum knowledge of query proteins Specialists can help to find a correct template and correct alignments Knowledge of function Cysteines forming disulfide bridges or binding e.g. zinc molecules Proteolytic cleavage sites Other metal binding residues Antibody epitopes or escape mutants Ligand binding Results from CD or fluorescence experiments 42

Human Intervention II Fold It: The Protein Folding Game Rosetta Energy Potentials http://fold.it/portal/ Uses the HUMAN pattern recognition abilities for finding the lowest energy fold.

How Fast-folding Proteins Fold Lindorff-Larsen et al. Science. 334 (6055): 517-520.

Modelling Servers Comparative (homology) modelling: CPHmodels (simple) http://www.cbs.dtu.dk/services/cphmodels/ SwissModel (intermediate) http://swissmodel.expasy.org HHpred (complex) http://toolkit.tuebingen.mpg.de/hhpred Ab initio Robetta (also comparative; intermediate) http://robetta.bakerlab.org

Summary Methods using sequence profiles are best Use only ab initio methods if necessary and know that the quality is really low! Try to use as much knowledge as possible for alignment and template selections in difficult cases. Use meta-servers when you can. TRY FOLDIT!