3D Structure Prediction with Fold Recognition/Threading. Michael Tress CNB-CSIC, Madrid

Size: px
Start display at page:

Download "3D Structure Prediction with Fold Recognition/Threading. Michael Tress CNB-CSIC, Madrid"

Transcription

1 3D Structure Prediction with Fold Recognition/Threading Michael Tress CNB-CSIC, Madrid

2 MREYKLVVLGSGGVGKSALTVQFVQGIFVDEYDPTIEDSY RKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQGFAL VYSITAQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDL EDERVVGKEQGQNLARQWCNCAFLESSAKSKINVNEIFYD LVRQINR MLEILDTAGTEQFTAMRDLYMKNGQGFALVYSIT AQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDL EDERV

3 ?? MREYKLVVLGSGGVGKSALTVQFVQGIF VDEYDPTIEDSYRKQVEVDCQQCMLEIL DTAGTEQFTAMRDLYMKNGQGFALVYS ITAQSTFNDLQDLREQILRVKDTEDVPM ILVGNKCDLEDERVVGKEQGQNLARQW CNCAFLESSAKSKINVNEIFYDLVRQINR? Given a new sequence that we want to know as much as possible about, one of the fundamental questions is "how does it fold?"

4 Homology Modelling vs Fold Recognition % seq. ID Application Fold Recognition Homology Modelling Target Sequence Model Quality Any Sequence Fold Level >= 30-50% ID with template Atomic Level If the sequence is similar to a known structure (>30-50% identity) you can usually generate an all atom model by homology modelling.

5 The Motivation for Fold Recognition. What if the sequence shares very little evolutionary similarity with other known structures? Is there anything that can be done then? Pairwise sequence search methods can detect folds when sequence similarity is high, but are very poor at detecting relationships that have less than 20% identity. This is where fold recognition comes in.

6 Sequence Space is Much More Diverse than Structure Space Sequence space Structural space Homology Modelling Targets Fold Recognition Targets The development of prediction methods that used structural information came from the observation that many proteins with apparently unrelated sequences had very similar 3-dimensional structures.

7 MREYKLVVLGSGGVGKSALTVQFVQGIFVDEYDPTIEDS YRKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQ GFALVYSITAQSTFNDLQDLREQILRVKDTEDVPMILVG NKCDLEDERVVGKEQGQNLARQWCNCAFLESSAKSKIN VNEIFYDLVRQINR? In this case the problem to be solved is the inverse of the protein folding problem: given a protein structure, what amino acid sequences are likely to fold into that structure? ASKGEELFTGVVPILVELDGDVNGHKFSVSGEGE GDATYGKLTLKFICTTPVPWPTLVTTFSYGVQCFS RYPDHMKRHDFFKSAMPEGYVQERTIFFKDDGN YKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGH KLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIE DGSVQLADHYQQNTPIGDGPVLLPDNHYSTQSA LSKDPNEKRDHMVLLEFVTAAGIT HGMDELYK?? MREYPVKKGFPTDYDSIKRKISELG FDVKSEGDLIIASIPGISRIEIKPDK RKILVNTGDYDSDADKLAVVRTYN DFIEKLTGYSAKERKKMMTKD

8 The Reach of Fold Recognition In fact, studies show that no method can reliably detect proteins that share the same basic structure (or fold), but have no common ancestor. However, fold recognition methods can often detect proteins that are too distantly related to be detected by sequence based methods. Sequence search methods have evolved greatly (PSI-BLAST), but fold recognition methods can find folds that PSI-BLAST cannot find, because they evaluate the structural fit of the detected fold as well as the evolutionary fit.

9 The Quality Difference Between Fold Recognition Predictions and Homology Modelling Models

10 The General Principle of Fold Recognition Algorithms Fold recognition methods are based on the idea of superimposing a target sequence on database of known folds and evaluating the sequence-fold alignments. Fold recognition alignments are quite different from ordinary sequence alignments since they are evaluated from a structural perspective. Each fold recognition algorithm has its own method of scoring the alignments.

11 Threading Algorithms in Detail I Fold recogntion methods employ libraries of known protein folds that usually contain a representative subset all known structures. The target sequences, or the multiple sequence profiles built up around the sequences, are aligned against the fold library, or against representations of the folds in the library. These abstractions can be strings of descriptors that represent 3D structual features or secondary structural features.

12 Threading Algorithms in Detail II Alignments are usually generated by dynamic programming using scoring matrices that relate each amino acid to structural features or to 3D structural environments. Each alignment between target sequence and fold is scored for a range of factors, some evolutionary, some secondary structure-based and some based on physical potentials. The final score is the goodness of fit of the target sequence to each fold and is usually reported as a probability.

13 Scoring Functions for Fold Recognition Scoring functions measure some or more of the following: How similar is the structural environment of the residue to those in which the aligned amino acid is usually found? Pair potentials Solvation energy Coincidence of real and predicted secondary structure and accessibility Evolutionary information (from aligned structures and sequences)

14 Structural Environments Bowie et al. (1991) created a fold recognition approach that described each position of a fold template as being in one of eighteen environments. These environments were defined by measuring the area of side chain that was buried, the fraction of the side chain area that was exposed to polar atoms, and the local secondary structure. Other researches have developed similar methods, where the structural environments described include exposed atomic areas and type of residue-residue contacts.

15 How Structural Environments Scores are Used First a scoring matrix is generated from the probabilities of finding each of the twenty amino acids in each of the environment classes. Probabilitles are drawn from databases of known structures. A 3D profile matrix is created for each fold in the fold library using these probabilities. The matrix defines the probability of finding a certain amino acid in a certain position in each fold. The target sequence is aligned with the 3D profile and the fit evaluated from the resultant 3D alignment score.

16 Solvation Energy Solvation potential is a term used to describe the preference of an amino acid for a specific level of residue burial. It is derived by comparing the frequency of occurrence of each amino acid at a specific degree of residue burial to the frequency of occurrence of all other amino acid types with this degree of burial. The degree of burial of a residue is defined as the ratio between its solvent accessible surface area and its overall surface area.

17 Pair or Contact Potentials - the Tendency of residues to be in Contact counts d Counts become propensities (frequency at each distance separation) or energies (Boltzmann principle, -KT ln) Make count of interacting pairs of each residue type at different distance separations E d

18 Pair Potentials in Fold Recognition The energy that results from aligning a certain target sequence residue at a certain position depends on its interactions with other residues. This creates problems when pair potentials are used to create sequence structure alignments, since you do not know the position of all the residues in the model before threading them. Threading methods that use pair potentials in this way, such as THREADER (Jones et al, 1992) have to use clever programming methods to get round this problem.

19 Coincidence Between Secondary Structure and Accessibility Secondary structure prediction TOPITS uses predicted secondary structure and accessibilities for the target sequence and compares them with the known values of the template. mbia.edu/predictprotein

20 A Fold Recognition Example - 3D-PSSM 3D-PSSM combines: Target sequence profiles, Template sequence profiles, Residue equivalence, Secondary structure matching Solvation potentials Sequences are aligned to folds using dynamic programming with the alignments scored by a range of 1D and 3D profiles.

21 Preparing the Query Sequence Query sequences have their secondary structure predicted by PSI-Pred. PSI-BLAST profiles are also generated for the query sequence to allow bi-directional scoring. The 3D-FSSP dynamic programming algorithm is used to scan the fold library with the query sequence.

22 3D-PSSM Fold Profile Library PSI-BLAST and a non-redundant database are used to create profiles for each of the folds in the library. Each fold is aligned with members of the same superfamily using the structural alignment program SSAP. Those folds from SCOP with sufficient structural similarity are then also used to create profiles using PSI-BLAST in the same way. All the related profiles are merged

23 Secondary Structure and Solvation Potentials Secondary structure is assigned to each fold based on the annotation in the STRIDE database. Each residue in the fold is also assigned a solvation potential. The degree of burial of each residue is defined as the ratio between its solvent accessible surface area and its overall surface area. Solvation potential is divided into 21 bins, ranging from 0% (buried) to 100%(exposed).

24 Sequence and Secondary Structure Profiles 3D-PSSM also uses the coincidence of predicted secondary structure (target sequence) and known secondary structure (fold). Here a simple scoring scheme is used for matching secondary structure types, +1 for a match, otherwise -1.

25 3D-PSSM - Dynamic Programming Three passes of dynamic programming are performed for each query-template alignment. Each pass uses a different matrix to score the alignment, but secondary structure and solvation potential are used in each pass. The score for a match between a query residue and a fold residue is calculated the sum of the secondary structure, solvation potential and profile scores. The final score is simply the maximum of the scores from the three passes.

26 Fold Recognition Servers I 3D-PSSM - Based on sequence profiles, solvatation potentials and secondary structure. TOPITS - It employs the coincidence of secondary structure and accesibility. GenTHREADER - Combines profiles and sequence-structure alignments. A neural network-based jury system calculates the final score based on solvation and pair potentials. ORFeus - grdb.bioinfo.pl/ ORFeus is a sensitive sequence similarity search tool. It aligns two profiles and also uses predicted secondary

27 Fold Recognition Servers II SAM T The query is checked against a library of hidden Markov models. This is NOT a threading technique, it is sequence based. FUGUE - www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html FUGUE uses environment-based fold profiles that are created from structural alignments. RAPTOR - Best-scoring server in CAFASP competition; strangely is out of action at the moment.

28 Consensus Fold Recognition In the CASP series of world-wide fold recognition conferences it had long been recognised that human experts were better at fold prediction than the methods these same experts had developed. Human experts usually use several different fold recognition methods and predict folds after evaluating the results (not just the top hits) from a range of methods. So why not produce an algorithm that does at least some of what a human expert could do?

29 Consensus Fold Recognition - PCONS First consensus server Pcons did the following: The target sequence was sent to six publicly available fold recognition web servers. The scores from each method were re-scaled so they were equivalent. Models were built from all the predictions. The models were then structurally superimposed and evaluated for their similarity. The quality of the model was predicted from the rescaled score and from its similarity to other predicted models.

30 Consensus Servers - Libellula De Juan et al.,

31 Consensus Fold Recognition Servers INGBU - This produces a consensus prediction based on five methods that exploit sequence and structure information in different ways. 3D Shotgun - ShotGun is a consensus predictor that utilizes the results of fold recognition servers, such as FFAS, 3D- PSSM, FUGUE and mgenthreader, and uses a jury system to select structures Pcons - Pcons was the first consensus server for fold recognition. It selects the best prediction from several servers. PMOD can also generate models using the alignment, template and MODELLER

32 Structure Prediction (Re: Pazos) Target sequence BLAST search Fold Recognition Servers Eg 3DPSSM GenTHREADER Model 1 model 2 model 3 model 4... Fold visualisation and model comparison: Threadlize Structural classifcation: FSSP, SCOP model 1 model 2 Are there domains? PFAM/ProDom/ InterPro - BLAST results dom1 dom2... Multiple Alignment (ClustalW, PSI-BLAST) Conserved residues, correlated mutations Secondary structure, accessibility, Trans-membrane segments PHD, PSIPRED Similarity to Known structure? BLAST against PDB NO Text mining for biological information: (MedLine, Swissprot) Active sites, domains, cofactors etc. 3D backbone SI Homology modelling servers SWISSMODEL Align with template core Loops... 3D model Model Evaluation - ProSa - Biotech suite Side chain canonical loops MaxSprout Complete 3D model

33 Acknowledgements Florencio Pazos Lawrence Kelley Arne Eloffson and anyone else whose figures I used...

Protein 3D Structure Prediction

Protein 3D Structure Prediction Protein 3D Structure Prediction Michael Tress CNIO ?? MREYKLVVLGSGGVGKSALTVQFVQGIFVDE YDPTIEDSYRKQVEVDCQQCMLEILDTAGTE QFTAMRDLYMKNGQGFALVYSITAQSTFNDL QDLREQILRVKDTEDVPMILVGNKCDLEDER VVGKEQGQNLARQWCNCAFLESSAKSKINVN

More information

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading Molecular Modeling 9 Protein structure prediction, part 2: Homology modeling, fold recognition & threading The project... Remember: You are smarter than the program. Inspecting the model: Are amino acids

More information

JPred and Jnet: Protein Secondary Structure Prediction.

JPred and Jnet: Protein Secondary Structure Prediction. JPred and Jnet: Protein Secondary Structure Prediction www.compbio.dundee.ac.uk/jpred ...A I L E G D Y A S H M K... FUNCTION? Protein Sequence a-helix b-strand Secondary Structure Fold What is the difference

More information

CAFASP2:TheSecondCriticalAssessmentofFully AutomatedStructurePredictionMethods

CAFASP2:TheSecondCriticalAssessmentofFully AutomatedStructurePredictionMethods PROTEINS: Structure, Function, and Genetics Suppl 5:171 183 (2001) DOI 10.1002/prot.10036 CAFASP2:TheSecondCriticalAssessmentofFully AutomatedStructurePredictionMethods DanielFischer, 1 * ArneElofsson,

More information

Practical lessons from protein structure prediction

Practical lessons from protein structure prediction 1874 1891 Nucleic Acids Research, 2005, Vol. 33, No. 6 doi:10.1093/nar/gki327 SURVY AND SUMMARY Practical lessons from protein structure prediction Krzysztof Ginalski 1,2,3, Nick V. Grishin 3,4, Adam Godzik

More information

CAFASP3: The Third Critical Assessment of Fully Automated Structure Prediction Methods

CAFASP3: The Third Critical Assessment of Fully Automated Structure Prediction Methods PROTEINS: Structure, Function, and Genetics 53:503 516 (2003) CAFASP3: The Third Critical Assessment of Fully Automated Structure Prediction Methods Daniel Fischer, 1 * Leszek Rychlewski, 2 Roland L. Dunbrack,

More information

Structure Prediction. Modelado de proteínas por homología GPSRYIV. Master Dianas Terapéuticas en Señalización Celular: Investigación y Desarrollo

Structure Prediction. Modelado de proteínas por homología GPSRYIV. Master Dianas Terapéuticas en Señalización Celular: Investigación y Desarrollo Master Dianas Terapéuticas en Señalización Celular: Investigación y Desarrollo Modelado de proteínas por homología Federico Gago (federico.gago@uah.es) Departamento de Farmacología Structure Prediction

More information

Bioinformatics Practical Course. 80 Practical Hours

Bioinformatics Practical Course. 80 Practical Hours Bioinformatics Practical Course 80 Practical Hours Course Description: This course presents major ideas and techniques for auxiliary bioinformatics and the advanced applications. Points included incorporate

More information

Protein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl)

Protein Sequence Analysis. BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl) Protein Sequence Analysis BME 110: CompBio Tools Todd Lowe April 19, 2007 (Slide Presentation: Carol Rohl) Linear Sequence Analysis What can you learn from a (single) protein sequence? Calculate it s physical

More information

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

1-D Predictions. Prediction of local features: Secondary structure & surface exposure Programme 8.00-8.30 Last week s quiz results 8.30-9.00 Prediction of secondary structure & surface exposure 9.00-9.20 Protein disorder prediction 9.20-9.30 Break get computers upstairs 9.30-11.00 Ex.:

More information

Protein Structure Prediction

Protein Structure Prediction Homology Modeling Protein Structure Prediction Ingo Ruczinski M T S K G G G Y F F Y D E L Y G V V V V L I V L S D E S Department of Biostatistics, Johns Hopkins University Fold Recognition b Initio Structure

More information

Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics

Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics Statistical Machine Learning Methods for Bioinformatics VI. Support Vector Machine Applications in Bioinformatics Jianlin Cheng, PhD Computer Science Department and Informatics Institute University of

More information

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics The molecular structures of proteins are complex and can be defined at various levels. These structures can also be predicted from their amino-acid sequences. Protein structure prediction is one of the

More information

Creation of a PAM matrix

Creation of a PAM matrix Rationale for substitution matrices Substitution matrices are a way of keeping track of the structural, physical and chemical properties of the amino acids in proteins, in such a fashion that less detrimental

More information

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments BLAST 100 times faster than dynamic programming. Good for database searches. Derive a list of words of length w from query (e.g., 3 for protein, 11 for DNA) High-scoring words are compared with database

More information

EVAcon: a protein contact prediction evaluation service

EVAcon: a protein contact prediction evaluation service Nucleic Acids Research, 2005, Vol. 33, Web Server issue W347 W351 doi:10.1093/nar/gki411 EVAcon: a protein contact prediction evaluation service Osvaldo Graña, Volker A. Eyrich 1, Florencio Pazos, Burkhard

More information

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2004 G4120: Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2004 Oliver Jovanovic, All Rights Reserved. Analysis of Protein Sequences Coding

More information

Computational Methods for Protein Structure Prediction

Computational Methods for Protein Structure Prediction Computational Methods for Protein Structure Prediction Ying Xu 2017/12/6 1 Outline introduction to protein structures the problem of protein structure prediction why it is possible to predict protein structures

More information

Protein structure prediction in 2002 Jack Schonbrun, William J Wedemeyer and David Baker*

Protein structure prediction in 2002 Jack Schonbrun, William J Wedemeyer and David Baker* sb120317.qxd 24/05/02 16:00 Page 348 348 Protein structure prediction in 2002 Jack Schonbrun, William J Wedemeyer and David Baker* Central issues concerning protein structure prediction have been highlighted

More information

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004

CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004 CS273: Algorithms for Structure Handout # 5 and Motion in Biology Stanford University Tuesday, 13 April 2004 Lecture #5: 13 April 2004 Topics: Sequence motif identification Scribe: Samantha Chui 1 Introduction

More information

Secondary Structure Prediction. Michael Tress CNIO

Secondary Structure Prediction. Michael Tress CNIO Secondary Structure Prediction Michael Tress CNIO Why do we Need to Know About Secondary Structure? Secondary structure prediction is an important step towards deducing protein 3D structure. Secondary

More information

An Overview of Protein Structure Prediction: From Homology to Ab Initio

An Overview of Protein Structure Prediction: From Homology to Ab Initio An Overview of Protein Structure Prediction: From Homology to Ab Initio Final Project For Bioc218, Computational Molecular Biology Zhiyong Zhang Abstract The current status of the protein prediction methods,

More information

MOL204 Exam Fall 2015

MOL204 Exam Fall 2015 MOL204 Exam Fall 2015 Exercise 1 15 pts 1. 1A. Define primary and secondary bioinformatical databases and mention two examples of primary bioinformatical databases and one example of a secondary bioinformatical

More information

Methods and tools for exploring functional genomics data

Methods and tools for exploring functional genomics data Methods and tools for exploring functional genomics data William Stafford Noble Department of Genome Sciences Department of Computer Science and Engineering University of Washington Outline Searching for

More information

Exploiting Sequence Relationships for Predicting Protein Function

Exploiting Sequence Relationships for Predicting Protein Function MASTER IN TECNOLOGIE BIOINFORMATICHE APPLICATE ALLA MEDICINA PERSONALIZZATA Protein Sequence Analysis Exploiting Sequence Relationships for Predicting Protein Function Florencio Pazos (CNB-CSIC) Florencio

More information

Secondary Structure Prediction. Michael Tress, CNIO

Secondary Structure Prediction. Michael Tress, CNIO Secondary Structure Prediction Michael Tress, CNIO Why do we need to know about secondary structure? Secondary structure prediction is one important step towards deducing the 3D structure of a protein.

More information

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling Molecular Modeling 2018 -- Lecture 8 Local structure Database search Multiple alignment Automated homology modeling An exception to the no-insertions-in-helix rule Actual structures (myosin)! prolines

More information

Dynamic Programming Algorithms

Dynamic Programming Algorithms Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

SECONDARY STRUCTURE AND OTHER 1D PREDICTION MICHAEL TRESS, CNIO

SECONDARY STRUCTURE AND OTHER 1D PREDICTION MICHAEL TRESS, CNIO SECONDARY STRUCTURE AND OTHER 1D PREDICTION MICHAEL TRESS, CNIO Amino acids have characteristic local features Protein structures have limited room for manoeuvre because the peptide bond is planar, there

More information

Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction. N. von Öhsen, I. Sommer, R. Zimmer

Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction. N. von Öhsen, I. Sommer, R. Zimmer Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction N. von Öhsen, I. Sommer, R. Zimmer Pacific Symposium on Biocomputing 8:252-263(2003) PROFILE-PROFILE ALIGNMENT: A POWERFUL TOOL

More information

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction Conformational Analysis 2 Conformational Analysis Properties of molecules depend on their three-dimensional

More information

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Usage scenarios for sequence based function annotation Function prediction of newly cloned

More information

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center Comparative Modeling Part 1 Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center Function is the most important feature of a protein Function is related to structure Structure is

More information

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction

Sequence Analysis '17 -- lecture Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction Sequence Analysis '17 -- lecture 16 1. Secondary structure 3. Sequence similarity and homology 2. Secondary structure prediction Alpha helix Right-handed helix. H-bond is from the oxygen at i to the nitrogen

More information

Designing and benchmarking the MULTICOM protein structure prediction system

Designing and benchmarking the MULTICOM protein structure prediction system Li et al. BMC Structural Biology 2013, 13:2 RESEARCH ARTICLE Open Access Designing and benchmarking the MULTICOM protein structure prediction system Jilong Li 1, Xin Deng 1, Jesse Eickholt 1 and Jianlin

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2009 G4120: Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology & Immunology Copyright 2009 Oliver Jovanovic, All Rights Reserved. Analysis of Protein

More information

Why learn sequence database searching? Searching Molecular Databases with BLAST

Why learn sequence database searching? Searching Molecular Databases with BLAST Why learn sequence database searching? Searching Molecular Databases with BLAST What have I cloned? Is this really!my gene"? Basic Local Alignment Search Tool How BLAST works Interpreting search results

More information

Sequence Based Function Annotation

Sequence Based Function Annotation Sequence Based Function Annotation Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University Sequence Based Function Annotation 1. Given a sequence, how to predict its biological

More information

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries. PRACTICAL 1: BLAST and Sequence Alignment The EBI and NCBI websites, two of the most widely used life science web portals are introduced along with some of the principal databases: the NCBI Protein database,

More information

Protein Structure Prediction. christian studer , EPFL

Protein Structure Prediction. christian studer , EPFL Protein Structure Prediction christian studer 17.11.2004, EPFL Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results Definition of the problem Massive amounts of

More information

Are specialized servers better at predicting protein structures than stand alone software?

Are specialized servers better at predicting protein structures than stand alone software? African Journal of Biotechnology Vol. 11(53), pp. 11625-11629, 3 July 2012 Available online at http://www.academicjournals.org/ajb DOI:10.5897//AJB12.849 ISSN 1684 5315 2012 Academic Journals Full Length

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS* COMPUTATIONAL METHODS IN SCIENCE AND TECHNOLOGY 9(1-2) 93-100 (2003/2004) Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS* DARIUSZ PLEWCZYNSKI AND LESZEK RYCHLEWSKI BiolnfoBank

More information

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,

More information

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004 CSE 397-497: Computational Issues in Molecular Biology Lecture 19 Spring 2004-1- Protein structure Primary structure of protein is determined by number and order of amino acids within polypeptide chain.

More information

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Bioinformatics Tools Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine Overview This lecture will

More information

What I hope you ll learn. Introduction to NCBI & Ensembl tools including BLAST and database searching!

What I hope you ll learn. Introduction to NCBI & Ensembl tools including BLAST and database searching! What I hope you ll learn Introduction to NCBI & Ensembl tools including BLAST and database searching What do we learn from database searching and sequence alignments What tools are available at NCBI What

More information

Simple jury predicts protein secondary structure best

Simple jury predicts protein secondary structure best Simple jury predicts protein secondary structure best Burkhard Rost 1,*, Pierre Baldi 2, Geoff Barton 3, James Cuff 4, Volker Eyrich 5,1, David Jones 6, Kevin Karplus 7, Ross King 8, Gianluca Pollastri

More information

BIOINFORMATICS Sequence to Structure

BIOINFORMATICS Sequence to Structure BIOINFORMATICS Sequence to Structure Mark Gerstein, Yale University gersteinlab.org/courses/452 (last edit in fall '06, includes in-class changes) 1 (c) M Gerstein, 2006, Yale, gersteinlab.org Secondary

More information

Comparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs.

Comparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs. Page 1 REMINDER: BMI 214 Industry Night Comparative Genomics Russ B. Altman BMI 214 CS 274 Location: Here (Thornton 102), on TV too. Time: 7:30-9:00 PM (May 21, 2002) Speakers: Francisco De La Vega, Applied

More information

LiveBench-1: Continuous benchmarking of protein structure prediction servers

LiveBench-1: Continuous benchmarking of protein structure prediction servers LiveBench-1: Continuous benchmarking of protein structure prediction servers JANUSZ M. BUJNICKI, 1 ARNE ELOFSSON, 2 DANIEL FISCHER, 3 AND LESZEK RYCHLEWSKI 1 1 Bioinformatics Laboratory, International

More information

Protein structure. Wednesday, October 4, 2006

Protein structure. Wednesday, October 4, 2006 Protein structure Wednesday, October 4, 2006 Introduction to Bioinformatics Johns Hopkins School of Public Health 260.602.01 J. Pevsner pevsner@jhmi.edu Copyright notice Many of the images in this powerpoint

More information

Sequence Databases and database scanning

Sequence Databases and database scanning Sequence Databases and database scanning Marjolein Thunnissen Lund, 2012 Types of databases: Primary sequence databases (proteins and nucleic acids). Composite protein sequence databases. Secondary databases.

More information

Protein Tertiary Model Assessment Using Granular Machine Learning Techniques

Protein Tertiary Model Assessment Using Granular Machine Learning Techniques Georgia State University ScholarWorks @ Georgia State University Computer Science Dissertations Department of Computer Science 3-21-2012 Protein Tertiary Model Assessment Using Granular Machine Learning

More information

Protein structure. Predictive methods

Protein structure. Predictive methods Protein structure Predictive methods Sources for this lecture Park et al, Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. JMB 1998 David

More information

proteins Predicted residue residue contacts can help the scoring of 3D models Michael L. Tress* and Alfonso Valencia

proteins Predicted residue residue contacts can help the scoring of 3D models Michael L. Tress* and Alfonso Valencia proteins STRUCTURE O FUNCTION O BIOINFORMATICS Predicted residue residue contacts can help the scoring of 3D models Michael L. Tress* and Alfonso Valencia Structural Biology and Biocomputing Programme,

More information

A REVIEW OF ARTIFICIAL INTELLIGENCE TECHNIQUES APPLIED TO PROTEIN STRUCTURE PREDICTION

A REVIEW OF ARTIFICIAL INTELLIGENCE TECHNIQUES APPLIED TO PROTEIN STRUCTURE PREDICTION A REVIEW OF ARTIFICIAL INTELLIGENCE TECHNIQUES APPLIED TO PROTEIN STRUCTURE PREDICTION Jiang Ye B.Sc., University of Ottawa, 2003 A PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

More information

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem. Dec-82 Oct-84 Aug-86 Jun-88 Apr-90 Feb-92 Nov-93 Sep-95 Jul-97 May-99 Mar-01 Jan-03 Nov-04 Sep-06 Jul-08 May-10 Mar-12 Growth of GenBank 160,000,000,000 180,000,000 Introduction to Bioinformatics Iosif

More information

Genetic Algorithms For Protein Threading

Genetic Algorithms For Protein Threading From: ISMB-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. Genetic Algorithms For Protein Threading Jacqueline Yadgari #, Amihood Amir #, Ron Unger* # Department of Mathematics

More information

Supplement for the manuscript entitled

Supplement for the manuscript entitled Supplement for the manuscript entitled Prediction and Analysis of Nucleotide Binding Residues Using Sequence and Sequence-derived Structural Descriptors by Ke Chen, Marcin Mizianty and Lukasz Kurgan FEATURE-BASED

More information

Evaluation and Improvement of Multiple Sequence Methods for Protein Secondary Structure Prediction

Evaluation and Improvement of Multiple Sequence Methods for Protein Secondary Structure Prediction PROTEINS: Structure, Function, and Genetics 34:508 519 (1999) Evaluation and Improvement of Multiple Sequence Methods for Protein Secondary Structure Prediction James A. Cuff 1,2 and Geoffrey J. Barton

More information

A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs

A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs CHANGHUI YAN and JING HU Department of Computer Science Utah State University Logan, UT 84341 USA cyan@cc.usu.edu http://www.cs.usu.edu/~cyan

More information

Assessing a novel approach for predicting local 3D protein structures from sequence

Assessing a novel approach for predicting local 3D protein structures from sequence Assessing a novel approach for predicting local 3D protein structures from sequence Cristina Benros*, Alexandre G. de Brevern, Catherine Etchebest and Serge Hazout Equipe de Bioinformatique Génomique et

More information

Protein Structure Prediction

Protein Structure Prediction Protein Structure Prediction 1 Protein Structure Prediction Primary Structure Secondary Structure Prediction Tertiary Structure Prediction Prediction of Coiled Coil Domains Prediction of Transmembrane

More information

Textbook Reading Guidelines

Textbook Reading Guidelines Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: May 1, 2009 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science

More information

Università degli Studi di Roma La Sapienza

Università degli Studi di Roma La Sapienza Università degli Studi di Roma La Sapienza. Tutore Prof.ssa Anna Tramontano Coordinatore Prof. Marco Tripodi Dottorando Daniel Carbajo Pedrosa Dottorato di Ricerca in Scienze Pasteuriane XXIV CICLO Docente

More information

An insilico Approach: Homology Modelling and Characterization of HSP90 alpha Sangeeta Supehia

An insilico Approach: Homology Modelling and Characterization of HSP90 alpha Sangeeta Supehia 259 Journal of Pharmaceutical, Chemical and Biological Sciences ISSN: 2348-7658 Impact Factor (SJIF): 2.092 December 2014-February 2015; 2(4):259-264 Available online at http://www.jpcbs.info Online published

More information

Two Mark question and Answers

Two Mark question and Answers 1. Define Bioinformatics Two Mark question and Answers Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three

More information

Prediction of Protein Structure by Emphasizing Local Side- Chain/Backbone Interactions in Ensembles of Turn

Prediction of Protein Structure by Emphasizing Local Side- Chain/Backbone Interactions in Ensembles of Turn PROTEINS: Structure, Function, and Genetics 53:486 490 (2003) Prediction of Protein Structure by Emphasizing Local Side- Chain/Backbone Interactions in Ensembles of Turn Fragments Qiaojun Fang and David

More information

Match the Hash Scores

Match the Hash Scores Sort the hash scores of the database sequence February 22, 2001 1 Match the Hash Scores February 22, 2001 2 Lookup method for finding an alignment position 1 2 3 4 5 6 7 8 9 10 11 protein 1 n c s p t a.....

More information

VALLIAMMAI ENGINEERING COLLEGE

VALLIAMMAI ENGINEERING COLLEGE VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur 603 203 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER BM6005 BIO INFORMATICS Regulation 2013 Academic Year 2018-19 Prepared

More information

MATH 5610, Computational Biology

MATH 5610, Computational Biology MATH 5610, Computational Biology Lecture 2 Intro to Molecular Biology (cont) Stephen Billups University of Colorado at Denver MATH 5610, Computational Biology p.1/24 Announcements Error on syllabus Class

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices

Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Article No. jmbi.1999.3091 available online at http://www.idealibrary.com on J. Mol. Biol. (1999) 292, 195±202 COMMUNICATION Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices

More information

Christian Sigrist. January 27 SIB Protein Bioinformatics course 2016 Basel 1

Christian Sigrist. January 27 SIB Protein Bioinformatics course 2016 Basel 1 Christian Sigrist January 27 SIB Protein Bioinformatics course 2016 Basel 1 General Definition on Conserved Regions Conserved regions in proteins can be classified into 5 different groups: Domains: specific

More information

Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized

Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized 1 2 3 Imaging informatics computer assisted mammogram reading Clinical aka medical informatics CDSS combining bioinformatics for diagnosis, personalized medicine, risk assessment etc Public Health Bio

More information

Template Based Modeling and Structural Refinement of Protein-Protein Interactions:

Template Based Modeling and Structural Refinement of Protein-Protein Interactions: Template Based Modeling and Structural Refinement of Protein-Protein Interactions: by Brandon Govindarajoo A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of

More information

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases

Outline. Evolution. Adaptive convergence. Common similarity problems. Chapter 7: Similarity searches on sequence databases Chapter 7: Similarity searches on sequence databases All science is either physics or stamp collection. Ernest Rutherford Outline Why is similarity important BLAST Protein and DNA Interpreting BLAST Individualizing

More information

VL Algorithmische BioInformatik (19710) WS2013/2014 Woche 3 - Mittwoch

VL Algorithmische BioInformatik (19710) WS2013/2014 Woche 3 - Mittwoch VL Algorithmische BioInformatik (19710) WS2013/2014 Woche 3 - Mittwoch Tim Conrad AG Medical Bioinformatics Institut für Mathematik & Informatik, Freie Universität Berlin Vorlesungsthemen Part 1: Background

More information

NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation

NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation www.bioinformation.net Web server Volume 11(8) NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation Seethalakshmi Sakthivel, Habeeb S.K.M* Department of Bioinformatics,

More information

Cascaded walks in protein sequence space: Use of artificial sequences in remote homology detection between natural proteins

Cascaded walks in protein sequence space: Use of artificial sequences in remote homology detection between natural proteins Supporting text Cascaded walks in protein sequence space: Use of artificial sequences in remote homology detection between natural proteins S. Sandhya, R. Mudgal, C. Jayadev, K.R. Abhinandan, R. Sowdhamini

More information

Advanced topics in bioinformatics

Advanced topics in bioinformatics Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

More information

An introduction to multiple alignments

An introduction to multiple alignments An introduction to multiple alignments original version by Cédric Notredame, updated by Laurent Falquet Overview! Multiple alignments! How-to, Goal, problems, use! Patterns! PROSITE database, syntax, use!

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool 14.06.2010 Table of contents 1 History History 2 global local 3 Score functions Score matrices 4 5 Comparison to FASTA References of BLAST History the program was designed by Stephen W. Altschul, Warren

More information

Protein annotation and modelling servers at University College London

Protein annotation and modelling servers at University College London Published online 27 May 2010 Nucleic Acids Research, 2010, Vol. 38, Web Server issue W563 W568 doi:10.1093/nar/gkq427 Protein annotation and modelling servers at University College London D. W. A. Buchan*,

More information

Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous

Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous. Two proteins sharing a common ancestor are said to be homologs. Homologyoften implies structural

More information

STRUM: Structure-based stability change prediction on. single-point mutation

STRUM: Structure-based stability change prediction on. single-point mutation STRUM: Structure-based stability change prediction on single-point mutation Lijun Quan, Qiang Lv, Yang Zhang Supplemental Information Figure S1. Histogram distribution of TM-score of the I-TASSER models

More information

Protein Bioinformatics

Protein Bioinformatics Protein Bioinformatics MSRKGPRAEVCADCSA PDPGWASISRGVLVCDE CCSVHRLGRHISIVKHLR HSAWPPTLLQMVHTLA SNGANSIWEHSLLDPA QVQSGRRKAN Building Useful Relationships Between Multiple Protein Sequences and Structures

More information

FUNCTIONAL BIOINFORMATICS

FUNCTIONAL BIOINFORMATICS Molecular Biology-2018 1 FUNCTIONAL BIOINFORMATICS PREDICTING THE FUNCTION OF AN UNKNOWN PROTEIN Suppose you have found the amino acid sequence of an unknown protein and wish to find its potential function.

More information

Bioinformation by Biomedical Informatics Publishing Group

Bioinformation by Biomedical Informatics Publishing Group Algorithm to find distant repeats in a single protein sequence Nirjhar Banerjee 1, Rangarajan Sarani 1, Chellamuthu Vasuki Ranjani 1, Govindaraj Sowmiya 1, Daliah Michael 1, Narayanasamy Balakrishnan 2,

More information

Database Searching and BLAST Dannie Durand

Database Searching and BLAST Dannie Durand Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is

More information

In silico Protein Recombination: Enhancing Template and Sequence Alignment Selection for Comparative Protein Modelling

In silico Protein Recombination: Enhancing Template and Sequence Alignment Selection for Comparative Protein Modelling doi:10.1016/s0022-2836(03)00309-7 J. Mol. Biol. (2003) 328, 593 608 In silico Protein Recombination: Enhancing Template and Sequence Alignment Selection for Comparative Protein Modelling Bruno Contreras-Moreira,

More information

Structural bioinformatics

Structural bioinformatics Structural bioinformatics Why structures? The representation of the molecules in 3D is more informative New properties of the molecules are revealed, which can not be detected by sequences Eran Eyal Plant

More information

What s New in Discovery Studio 2.5.5

What s New in Discovery Studio 2.5.5 What s New in Discovery Studio 2.5.5 Discovery Studio takes modeling and simulations to the next level. It brings together the power of validated science on a customizable platform for drug discovery research.

More information

Consensus Prediction of Protein Secondary Structures

Consensus Prediction of Protein Secondary Structures Consensus Prediction of Protein Secondary Structures by Zheng Wang Bachelor of Management Information System, Shandong Economic University, Jinan, P. R. China, 2004 A THESIS SUBMITTED IN PARTIAL FULFILLMENT

More information

Pacific Symposium on Biocomputing 5: (2000)

Pacific Symposium on Biocomputing 5: (2000) HOW UNIVERSAL ARE FOLD RECOGNITION PARAMETERS. A COMPREHENSIVE STUDY OF ALIGNMENT AND SCORING FUNCTION PARAMETERS INFLUENCE ON RECOGNITION OF DISTANT FOLDS. KRZYSZTOF A. OLSZEWSKI Molecular Simulations

More information

Getting To Know Your Protein

Getting To Know Your Protein Getting To Know Your Protein Comparative Protein Analysis: Part II. Protein Domain Identification & Classification Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research

More information

Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics

Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics KBM Journal of Science Education (2010) 1 (1): 7-12 doi: 10.5147/kbmjse/2010/0013 Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics Pablo Sobrado Department of Biochemistry,

More information

Applying Hidden Markov Model to Protein Sequence Alignment

Applying Hidden Markov Model to Protein Sequence Alignment Applying Hidden Markov Model to Protein Sequence Alignment Er. Neeshu Sharma #1, Er. Dinesh Kumar *2, Er. Reet Kamal Kaur #3 # CSE, PTU #1 RIMT-MAEC, #3 RIMT-MAEC CSE, PTU DAVIET, Jallandhar Abstract----Hidden

More information