3D Structure Prediction with Fold Recognition/Threading. Michael Tress CNB-CSIC, Madrid

Similar documents
Protein 3D Structure Prediction

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

JPred and Jnet: Protein Secondary Structure Prediction.

CAFASP2:TheSecondCriticalAssessmentofFully AutomatedStructurePredictionMethods

Practical lessons from protein structure prediction

CAFASP3: The Third Critical Assessment of Fully Automated Structure Prediction Methods

Structure Prediction. Modelado de proteínas por homología GPSRYIV. Master Dianas Terapéuticas en Señalización Celular: Investigación y Desarrollo

Bioinformatics Practical Course. 80 Practical Hours

Protein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Protein Structure Prediction

Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Creation of a PAM matrix

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments

EVAcon: a protein contact prediction evaluation service

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

G4120: Introduction to Computational Biology

Computational Methods for Protein Structure Prediction

Protein structure prediction in 2002 Jack Schonbrun, William J Wedemeyer and David Baker*

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004

Secondary Structure Prediction. Michael Tress CNIO

An Overview of Protein Structure Prediction: From Homology to Ab Initio

MOL204 Exam Fall 2015

Methods and tools for exploring functional genomics data

Exploiting Sequence Relationships for Predicting Protein Function

Secondary Structure Prediction. Michael Tress, CNIO

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Dynamic Programming Algorithms

SECONDARY STRUCTURE AND OTHER 1D PREDICTION MICHAEL TRESS, CNIO

Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction. N. von Öhsen, I. Sommer, R. Zimmer

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

Designing and benchmarking the MULTICOM protein structure prediction system

G4120: Introduction to Computational Biology

Why learn sequence database searching? Searching Molecular Databases with BLAST

Sequence Based Function Annotation

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

Protein Structure Prediction. christian studer , EPFL

Are specialized servers better at predicting protein structures than stand alone software?

BIOINFORMATICS Introduction

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

What I hope you ll learn. Introduction to NCBI & Ensembl tools including BLAST and database searching!

Simple jury predicts protein secondary structure best

BIOINFORMATICS Sequence to Structure

Comparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs.

LiveBench-1: Continuous benchmarking of protein structure prediction servers

Protein structure. Wednesday, October 4, 2006

Sequence Databases and database scanning

Protein Tertiary Model Assessment Using Granular Machine Learning Techniques

Protein structure. Predictive methods

proteins Predicted residue residue contacts can help the scoring of 3D models Michael L. Tress* and Alfonso Valencia

A REVIEW OF ARTIFICIAL INTELLIGENCE TECHNIQUES APPLIED TO PROTEIN STRUCTURE PREDICTION

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

Genetic Algorithms For Protein Threading

Supplement for the manuscript entitled

Evaluation and Improvement of Multiple Sequence Methods for Protein Secondary Structure Prediction

A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs

Assessing a novel approach for predicting local 3D protein structures from sequence

Protein Structure Prediction

Textbook Reading Guidelines

Università degli Studi di Roma La Sapienza

An insilico Approach: Homology Modelling and Characterization of HSP90 alpha Sangeeta Supehia

Two Mark question and Answers

Prediction of Protein Structure by Emphasizing Local Side- Chain/Backbone Interactions in Ensembles of Turn

Match the Hash Scores

VALLIAMMAI ENGINEERING COLLEGE

MATH 5610, Computational Biology

Exploring Similarities of Conserved Domains/Motifs

Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices

Christian Sigrist. January 27 SIB Protein Bioinformatics course 2016 Basel 1

Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized

Template Based Modeling and Structural Refinement of Protein-Protein Interactions:

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

VL Algorithmische BioInformatik (19710) WS2013/2014 Woche 3 - Mittwoch

NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation

Cascaded walks in protein sequence space: Use of artificial sequences in remote homology detection between natural proteins

Advanced topics in bioinformatics

An introduction to multiple alignments

Basic Local Alignment Search Tool

Protein annotation and modelling servers at University College London

Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous

STRUM: Structure-based stability change prediction on. single-point mutation

Protein Bioinformatics

FUNCTIONAL BIOINFORMATICS

Bioinformation by Biomedical Informatics Publishing Group

Database Searching and BLAST Dannie Durand

In silico Protein Recombination: Enhancing Template and Sequence Alignment Selection for Comparative Protein Modelling

Structural bioinformatics

What s New in Discovery Studio 2.5.5

Consensus Prediction of Protein Secondary Structures

Pacific Symposium on Biocomputing 5: (2000)

Getting To Know Your Protein

Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics

Applying Hidden Markov Model to Protein Sequence Alignment

Transcription:

3D Structure Prediction with Fold Recognition/Threading Michael Tress CNB-CSIC, Madrid

MREYKLVVLGSGGVGKSALTVQFVQGIFVDEYDPTIEDSY RKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQGFAL VYSITAQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDL EDERVVGKEQGQNLARQWCNCAFLESSAKSKINVNEIFYD LVRQINR MLEILDTAGTEQFTAMRDLYMKNGQGFALVYSIT AQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDL EDERV

?? MREYKLVVLGSGGVGKSALTVQFVQGIF VDEYDPTIEDSYRKQVEVDCQQCMLEIL DTAGTEQFTAMRDLYMKNGQGFALVYS ITAQSTFNDLQDLREQILRVKDTEDVPM ILVGNKCDLEDERVVGKEQGQNLARQW CNCAFLESSAKSKINVNEIFYDLVRQINR? Given a new sequence that we want to know as much as possible about, one of the fundamental questions is "how does it fold?"

Homology Modelling vs Fold Recognition 0 30 100 % seq. ID Application Fold Recognition Homology Modelling Target Sequence Model Quality Any Sequence Fold Level >= 30-50% ID with template Atomic Level If the sequence is similar to a known structure (>30-50% identity) you can usually generate an all atom model by homology modelling.

The Motivation for Fold Recognition. What if the sequence shares very little evolutionary similarity with other known structures? Is there anything that can be done then? Pairwise sequence search methods can detect folds when sequence similarity is high, but are very poor at detecting relationships that have less than 20% identity. This is where fold recognition comes in.

Sequence Space is Much More Diverse than Structure Space Sequence space Structural space Homology Modelling Targets Fold Recognition Targets The development of prediction methods that used structural information came from the observation that many proteins with apparently unrelated sequences had very similar 3-dimensional structures.

MREYKLVVLGSGGVGKSALTVQFVQGIFVDEYDPTIEDS YRKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQ GFALVYSITAQSTFNDLQDLREQILRVKDTEDVPMILVG NKCDLEDERVVGKEQGQNLARQWCNCAFLESSAKSKIN VNEIFYDLVRQINR? In this case the problem to be solved is the inverse of the protein folding problem: given a protein structure, what amino acid sequences are likely to fold into that structure? ASKGEELFTGVVPILVELDGDVNGHKFSVSGEGE GDATYGKLTLKFICTTPVPWPTLVTTFSYGVQCFS RYPDHMKRHDFFKSAMPEGYVQERTIFFKDDGN YKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGH KLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIE DGSVQLADHYQQNTPIGDGPVLLPDNHYSTQSA LSKDPNEKRDHMVLLEFVTAAGIT HGMDELYK?? MREYPVKKGFPTDYDSIKRKISELG FDVKSEGDLIIASIPGISRIEIKPDK RKILVNTGDYDSDADKLAVVRTYN DFIEKLTGYSAKERKKMMTKD

The Reach of Fold Recognition In fact, studies show that no method can reliably detect proteins that share the same basic structure (or fold), but have no common ancestor. However, fold recognition methods can often detect proteins that are too distantly related to be detected by sequence based methods. Sequence search methods have evolved greatly (PSI-BLAST), but fold recognition methods can find folds that PSI-BLAST cannot find, because they evaluate the structural fit of the detected fold as well as the evolutionary fit.

The Quality Difference Between Fold Recognition Predictions and Homology Modelling Models

The General Principle of Fold Recognition Algorithms Fold recognition methods are based on the idea of superimposing a target sequence on database of known folds and evaluating the sequence-fold alignments. Fold recognition alignments are quite different from ordinary sequence alignments since they are evaluated from a structural perspective. Each fold recognition algorithm has its own method of scoring the alignments.

Threading Algorithms in Detail I Fold recogntion methods employ libraries of known protein folds that usually contain a representative subset all known structures. The target sequences, or the multiple sequence profiles built up around the sequences, are aligned against the fold library, or against representations of the folds in the library. These abstractions can be strings of descriptors that represent 3D structual features or secondary structural features.

Threading Algorithms in Detail II Alignments are usually generated by dynamic programming using scoring matrices that relate each amino acid to structural features or to 3D structural environments. Each alignment between target sequence and fold is scored for a range of factors, some evolutionary, some secondary structure-based and some based on physical potentials. The final score is the goodness of fit of the target sequence to each fold and is usually reported as a probability.

Scoring Functions for Fold Recognition Scoring functions measure some or more of the following: How similar is the structural environment of the residue to those in which the aligned amino acid is usually found? Pair potentials Solvation energy Coincidence of real and predicted secondary structure and accessibility Evolutionary information (from aligned structures and sequences)

Structural Environments Bowie et al. (1991) created a fold recognition approach that described each position of a fold template as being in one of eighteen environments. These environments were defined by measuring the area of side chain that was buried, the fraction of the side chain area that was exposed to polar atoms, and the local secondary structure. Other researches have developed similar methods, where the structural environments described include exposed atomic areas and type of residue-residue contacts.

How Structural Environments Scores are Used First a scoring matrix is generated from the probabilities of finding each of the twenty amino acids in each of the environment classes. Probabilitles are drawn from databases of known structures. A 3D profile matrix is created for each fold in the fold library using these probabilities. The matrix defines the probability of finding a certain amino acid in a certain position in each fold. The target sequence is aligned with the 3D profile and the fit evaluated from the resultant 3D alignment score.

Solvation Energy Solvation potential is a term used to describe the preference of an amino acid for a specific level of residue burial. It is derived by comparing the frequency of occurrence of each amino acid at a specific degree of residue burial to the frequency of occurrence of all other amino acid types with this degree of burial. The degree of burial of a residue is defined as the ratio between its solvent accessible surface area and its overall surface area.

Pair or Contact Potentials - the Tendency of residues to be in Contact counts d Counts become propensities (frequency at each distance separation) or energies (Boltzmann principle, -KT ln) Make count of interacting pairs of each residue type at different distance separations E d

Pair Potentials in Fold Recognition The energy that results from aligning a certain target sequence residue at a certain position depends on its interactions with other residues. This creates problems when pair potentials are used to create sequence structure alignments, since you do not know the position of all the residues in the model before threading them. Threading methods that use pair potentials in this way, such as THREADER (Jones et al, 1992) have to use clever programming methods to get round this problem.

Coincidence Between Secondary Structure and Accessibility Secondary structure prediction TOPITS uses predicted secondary structure and accessibilities for the target sequence and compares them with the known values of the template. http://cubic.bioc.colu mbia.edu/predictprotein

A Fold Recognition Example - 3D-PSSM 3D-PSSM combines: Target sequence profiles, Template sequence profiles, Residue equivalence, Secondary structure matching Solvation potentials Sequences are aligned to folds using dynamic programming with the alignments scored by a range of 1D and 3D profiles.

Preparing the Query Sequence Query sequences have their secondary structure predicted by PSI-Pred. PSI-BLAST profiles are also generated for the query sequence to allow bi-directional scoring. The 3D-FSSP dynamic programming algorithm is used to scan the fold library with the query sequence.

3D-PSSM Fold Profile Library PSI-BLAST and a non-redundant database are used to create profiles for each of the folds in the library. Each fold is aligned with members of the same superfamily using the structural alignment program SSAP. Those folds from SCOP with sufficient structural similarity are then also used to create profiles using PSI-BLAST in the same way. All the related profiles are merged

Secondary Structure and Solvation Potentials Secondary structure is assigned to each fold based on the annotation in the STRIDE database. Each residue in the fold is also assigned a solvation potential. The degree of burial of each residue is defined as the ratio between its solvent accessible surface area and its overall surface area. Solvation potential is divided into 21 bins, ranging from 0% (buried) to 100%(exposed).

Sequence and Secondary Structure Profiles 3D-PSSM also uses the coincidence of predicted secondary structure (target sequence) and known secondary structure (fold). Here a simple scoring scheme is used for matching secondary structure types, +1 for a match, otherwise -1.

3D-PSSM - Dynamic Programming Three passes of dynamic programming are performed for each query-template alignment. Each pass uses a different matrix to score the alignment, but secondary structure and solvation potential are used in each pass. The score for a match between a query residue and a fold residue is calculated the sum of the secondary structure, solvation potential and profile scores. The final score is simply the maximum of the scores from the three passes.

Fold Recognition Servers I 3D-PSSM - www.sbg.bio.ic.ac.uk/~3dpssm/ Based on sequence profiles, solvatation potentials and secondary structure. TOPITS - www.embl-heidelberg.de/predictprotein/ It employs the coincidence of secondary structure and accesibility. GenTHREADER - www.psipred.net/ Combines profiles and sequence-structure alignments. A neural network-based jury system calculates the final score based on solvation and pair potentials. ORFeus - grdb.bioinfo.pl/ ORFeus is a sensitive sequence similarity search tool. It aligns two profiles and also uses predicted secondary

Fold Recognition Servers II SAM T02 - www.cse.ucsc.edu/research/compbio/hmmapps/t02-query.html The query is checked against a library of hidden Markov models. This is NOT a threading technique, it is sequence based. FUGUE - www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html FUGUE uses environment-based fold profiles that are created from structural alignments. RAPTOR - www.cs.uwaterloo.ca/~j3xu/raptor_form.htm Best-scoring server in CAFASP competition; strangely is out of action at the moment.

Consensus Fold Recognition In the CASP series of world-wide fold recognition conferences it had long been recognised that human experts were better at fold prediction than the methods these same experts had developed. Human experts usually use several different fold recognition methods and predict folds after evaluating the results (not just the top hits) from a range of methods. So why not produce an algorithm that does at least some of what a human expert could do?

Consensus Fold Recognition - PCONS First consensus server Pcons did the following: The target sequence was sent to six publicly available fold recognition web servers. The scores from each method were re-scaled so they were equivalent. Models were built from all the predictions. The models were then structurally superimposed and evaluated for their similarity. The quality of the model was predicted from the rescaled score and from its similarity to other predicted models.

Consensus Servers - Libellula De Juan et al., 2001 http://www.pdg.cnb.uam.es:8081/libellula.html

Consensus Fold Recognition Servers INGBU - www.cs.bgu.ac.il/~bioinbgu/form.html This produces a consensus prediction based on five methods that exploit sequence and structure information in different ways. 3D Shotgun - http://bioinfo.pl/meta/ ShotGun is a consensus predictor that utilizes the results of fold recognition servers, such as FFAS, 3D- PSSM, FUGUE and mgenthreader, and uses a jury system to select structures Pcons - www.sbc.su.se/~arne/pcons/ Pcons was the first consensus server for fold recognition. It selects the best prediction from several servers. PMOD can also generate models using the alignment, template and MODELLER

Structure Prediction (Re: Pazos) Target sequence BLAST search Fold Recognition Servers Eg 3DPSSM GenTHREADER Model 1 model 2 model 3 model 4... Fold visualisation and model comparison: Threadlize Structural classifcation: FSSP, SCOP model 1 model 2 Are there domains? PFAM/ProDom/ InterPro - BLAST results dom1 dom2... Multiple Alignment (ClustalW, PSI-BLAST) Conserved residues, correlated mutations Secondary structure, accessibility, Trans-membrane segments PHD, PSIPRED Similarity to Known structure? BLAST against PDB NO Text mining for biological information: (MedLine, Swissprot) Active sites, domains, cofactors etc. 3D backbone SI Homology modelling servers SWISSMODEL Align with template core Loops... 3D model Model Evaluation - ProSa - Biotech suite Side chain canonical loops MaxSprout Complete 3D model

Acknowledgements Florencio Pazos Lawrence Kelley Arne Eloffson and anyone else whose figures I used...