Protein Structure Prediction. christian studer , EPFL

Similar documents
CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

BIOINFORMATICS Introduction

Structural bioinformatics

Introduction to Bioinformatics

Database Searching and BLAST Dannie Durand

JPred and Jnet: Protein Secondary Structure Prediction.

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Introduction to Proteins

Protein 3D Structure Prediction

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

BLAST. compared with database sequences Sequences with many matches to high- scoring words are used for final alignments

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Secondary Structure Prediction

Università degli Studi di Roma La Sapienza

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

SMISS: A protein function prediction server by integrating multiple sources

What is Bioinformatics? Bioinformatics is the application of computational techniques to the discovery of knowledge from biological databases.

3D Structure Prediction with Fold Recognition/Threading. Michael Tress CNB-CSIC, Madrid

Dynamic Programming Algorithms

Ranking Beta Sheet Topologies of Proteins

APPLYING FEATURE-BASED RESAMPLING TO PROTEIN STRUCTURE PREDICTION

TERTIARY MOTIF INTERACTIONS ON RNA STRUCTURE

How Do You Clone a Gene?

Sequence Databases and database scanning

Proteins Higher Order Structures

Exploring Similarities of Conserved Domains/Motifs

B CELL EPITOPES AND PREDICTIONS

Nucleic Acids, Proteins, and Enzymes

Creation of a PAM matrix

Teaching Principles of Enzyme Structure, Evolution, and Catalysis Using Bioinformatics

Sequence Based Function Annotation. Qi Sun Bioinformatics Facility Biotechnology Resource Center Cornell University

Protein Synthesis Notes

Why learn sequence database searching? Searching Molecular Databases with BLAST

Protein annotation and modelling servers at University College London

Typically, to be biologically related means to share a common ancestor. In biology, we call this homologous

CS612 - Algorithms in Bioinformatics

VL Algorithmische BioInformatik (19710) WS2013/2014 Woche 3 - Mittwoch

NUCLEIC ACIDS Genetic material of all known organisms DNA: deoxyribonucleic acid RNA: ribonucleic acid (e.g., some viruses)

The String Alignment Problem. Comparative Sequence Sizes. The String Alignment Problem. The String Alignment Problem.

Distributions of Beta Sheets in Proteins with Application to Structure Prediction

Chapter 3 Nucleic Acids, Proteins, and Enzymes

Theory and Application of Multiple Sequence Alignments

Lecture 2: Central Dogma of Molecular Biology & Intro to Programming

Building an Antibody Homology Model

Protein Bioinformatics PH Final Exam

MATH 5610, Computational Biology

Green Genes: a DNA Curriculum

SAMPLE LITERATURE Please refer to included weblink for correct version.

Structural Bioinformatics (C3210) DNA and RNA Structure

Supersecondary Structure Motifs and De Novo Protein Structure Prediction

Introductie en Toepassingen van Next-Generation Sequencing in de Klinische Virologie. Sander van Boheemen Medical Microbiology

Genome Sequence Assembly

Alignment to a database. November 3, 2016

Nucleic Acids: DNA and RNA

Protein single-model quality assessment by feature-based probability density functions

Recommendations from the BCB Graduate Curriculum Committee 1

All Rights Reserved. U.S. Patents 6,471,520B1; 5,498,190; 5,916, North Market Street, Suite CC130A, Milwaukee, WI 53202

Characterization of transcription factor binding sites by high-throughput SELEX. Overview of the HTPSELEX Database

Machine Learning Methods for RNA-seq-based Transcriptome Reconstruction

Comparative Bioinformatics. BSCI348S Fall 2003 Midterm 1

RNA Structure Prediction and Comparison. RNA Biology Background

Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM

The Double Helix. DNA and RNA, part 2. Part A. Hint 1. The difference between purines and pyrimidines. Hint 2. Distinguish purines from pyrimidines

Nanobiotechnology. Place: IOP 1 st Meeting Room Time: 9:30-12:00. Reference: Review Papers. Grade: 50% midterm, 50% final.

90 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 4, 2006

Profile-Profile Alignment: A Powerful Tool for Protein Structure Prediction. N. von Öhsen, I. Sommer, R. Zimmer

ONLINE BIOINFORMATICS RESOURCES

Introduction to BIOINFORMATICS

proteins Predicted residue residue contacts can help the scoring of 3D models Michael L. Tress* and Alfonso Valencia

CS612 - Algorithms in Bioinformatics

BLAST. Basic Local Alignment Search Tool. Optimized for finding local alignments between two sequences.

Homology Modeling of Mouse orphan G-protein coupled receptors Q99MX9 and G2A

Is Abalone, Bio designer and Fold it, the best software for Protein Structure Prediction of AIDS Virus?

Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes

Gene-centered resources at NCBI

Bio11 Announcements. Ch 21: DNA Biology and Technology. DNA Functions. DNA and RNA Structure. How do DNA and RNA differ? What are genes?

Modeling of Protein Production Process by Finite Automata (FA)

The Integrated Biomedical Sciences Graduate Program

Badri Adhikari, Debswapna Bhattacharya, Renzhi Cao, and Jianlin Cheng*

Nucleic acids and protein synthesis

DNA & Protein Synthesis UNIT D & E

Create a model to simulate the process by which a protein is produced, and how a mutation can impact a protein s function.

BIOCHEMISTRY Nucleic Acids

DNA is the genetic material. DNA structure. Chapter 7: DNA Replication, Transcription & Translation; Mutations & Ames test

proteins Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11

Gene Expression Transcription/Translation Protein Synthesis

BETA STRAND Prof. Alejandro Hochkoeppler Department of Pharmaceutical Sciences and Biotechnology University of Bologna

LAB 6: Agarose Gel Electrophoresis of Restriction Digested Plasmid DNA

Packing of Secondary Structures

ALGORITHMS IN BIO INFORMATICS. Chapman & Hall/CRC Mathematical and Computational Biology Series A PRACTICAL INTRODUCTION. CRC Press WING-KIN SUNG

Solutions to Quiz II

Prot-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures

Transcription Gene regulation

From DNA to Protein Structure and Function

Project Developer Forum Introduction for Executive Board

Protein Bioinformatics Part I: Access to information

Retrieving and Viewing Protein Structures from the Protein Data Base

Leonardo Mariño-Ramírez, PhD NCBI / NLM / NIH. BIOL 7210 A Computational Genomics 2/18/2015

Introduction to Bioinformatics Online Course: IBT

Biotechnology Explorer

Transcription:

Protein Structure Prediction christian studer 17.11.2004, EPFL

Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results

Definition of the problem Massive amounts of data from DNA sequencing available Protein sequences can be derived Knowing the tertiary structure is useful in drug design etc.

Primary structure

Secondary structure

Tertiary structure

Quarternary structure Two and more protein molecules assembled Called protein subunits

Zur Anzeige wird der QuickTime Dekompressor YUV420 codec benötigt. What we are looking for...

Possible Approaches 2D prediction Ab initio structure prediction Homology based modeling

2D prediction Classical approach X-ray crystallography NMR techniques

Zur Anzeige wird der QuickTime Dekompressor GIF benötigt. Myoglobin

2D prediction limitations High experimental costs Large computing power required Depending on parameters: time, chemical environment, location in the metabolical pathway

Ab initio structure prediction Predicts 3D structure from 1D sequence data Ignores additional informations Target function (Free energy minimalization, molecular dynamics)

Levinthal paradox The number of possible protein sequences and its folding possibilities are astronomical Even real proteins cannot try all possible confirmations during folding Paradox: In nature, the folding is very quick and reliable

Further problems The physics behind structural stability not completely understood Primary sequence may not completely define tertiary structure

Folding at home

Homology based modeling Reduce to a known problem Similar amino sequence yields similar protein structure

Templates Known structure of another protein Sequence alignment called threading

Threading Extension of sequence analysis Predicts the course of the protein backbone Aligns a sequence to the templates structure Scoring function

Folding 3D alignment of secondary structures In nature: folds more important than exact protein sequence

DSSP Application in Bioinformatics Define Secondary Structure Of Proteins Objective classification No prediction

Protein Data Bank

Application Input: 3D coordinates of atoms Output: secondary structure Usage: dssp [-na] [-v] pdb_file [dssp_file]

Codes H: alpha-helix B: beta-bridge E: beta-sheet, extended strand G: 3-turn-helix I: 5-turn-helix S: bend

Output...;...1...;...2...;...3...;...4...;...5...;...6...;...7...-- sequential resnumber, including chain breaks as extra residues.-- original PDB resname, not nec. sequential, may contain letters.-- amino acid sequence in one letter code.-- secondary structure summary based on columns 19-38 xxxxxxxxxxxxxxxxxxxx recommend columns for secstruc details.-- 3-turns/helix.-- 4-turns/helix.-- 5-turns/helix.-- geometrical bend.-- chirality.-- beta bridge label.-- beta bridge label.-- beta bridge partner resnum.-- beta bridge partner resnum.-- beta sheet label.-- solvent accessibility # RESIDUE AA STRUCTURE BP1 BP2 ACC 35 47 I E + 0 0 2 36 48 R E > S- K 0 39C 97 37 49 Q T 3 S+ 0 0 86 (example from 1EST) 38 50 N T 3 S+ 0 0 34 39 51 W E < -KL 36 98C 6

Stereoscopic rendering

PSI-BLAST Application in Bioinformatics Position-Specific Iterative BLAST Iterative use of BLAST on a specific database

Application Input: Protein sequence Output: Protein family / sequence profile Usage: Web based

Protein family PSI-BLAST helps to find homologies Finds distant relatives Generates multiple alignments aka. sequence profile

Generalization Input: Target sequence Output: Template structure

Measurement Two values to be optimized: Minimum rms between matched points Maximum number of matched residues

Scoring Determine the best alignment Minimize free energy Pair-interaction and contact capacity are important

Scoring function calibration Different factors weighted I.E. sequence, secondary structure, contact capacities... Substitution matrices, linear programming, supervised learning, support vector machines... help to improve the scoring function

Secondary structure prediction The better the secondary structure prediction, the better the tertiary structure prediction In special cases knowing secondary structures don t help Requires more computing power

Assessment Simple calculation: Number of matching residues Total number of residues

Tertiary structure prediction Brute force Sequence methods Sequence and secondary structure Tertiary structure based methods

Brute force

Sequence methods PSI-BLAST and fold libraries allow for efficient searching Other methods: FASTA, HMM... Include secondary structures for further information

Tertiary structure based methods Two main types: Profile method Threading approaches

Profile threading methods Many variations, based on different input data Example: 3D-PSSM

Other methods Optimal threading via exhaustive enumeration Branch-and-bound algorithms for fragment threading Heuristic search procedures based on simulations Dynamic programming / Double dynamic programming Recursive dynamic programming

Algorithms and support Major databases: ~40 Tools: ~50

Evaluation Every two years: CASP Algorithms are tested against resolved, but unpublished proteins

Results CASP history teaches: Progress is being made, but humans can still outperform computers Not everything can be solved

CASP 3 43 targets 15 easy ones solved More than half of the 21 difficult ones solved by at least one method (completely automatic)