CS612 - Algorithms in Bioinformatics

Similar documents
CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

Introduction to Bioinformatics

Retrieving and Viewing Protein Structures from the Protein Data Base

Molecular Structures

Molecular Structures

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Comparative Modeling Part 1. Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center

Protein Folding Problem I400: Introduction to Bioinformatics

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Computational methods in bioinformatics: Lecture 1

CMSE 520 BIOMOLECULAR STRUCTURE, FUNCTION AND DYNAMICS

BMB/Bi/Ch 170 Fall 2017 Problem Set 1: Proteins I

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Searching and Analysing 3D Protein-ligand Structures using PDBe web services. Abhik Mukhopadhyay.

PREDICTION OF PROTEIN TERTIARY STRUCTURE USING CROSS VALIDATION TECHNIQUE

Protein Structure Databases, cont. 11/09/05

Protein structure. Wednesday, October 4, 2006

Lecture 1 - Introduction to Structural Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics

Structural bioinformatics

Protein Structure. Protein Structure Tertiary & Quaternary

This practical aims to walk you through the process of text searching DNA and protein databases for sequence entries.

CHEM 436 / 630. Molecular modelling of proteins. Winter 2018 Term. Instructor: Guillaume Lamoureux Concordia University, Montréal, Canada

Textbook Reading Guidelines

Biochemistry Prof. S. DasGupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture - 5 Protein Structure - III

GMOL: An Interactive Tool for 3D Genome Structure Visualization

Overview. Secondary Structure. Tertiary Structure

Bioinformatics Practical Course. 80 Practical Hours

Bioinformatics Tools. Stuart M. Brown, Ph.D Dept of Cell Biology NYU School of Medicine

Biological databases an introduction

Biochemistry Proteins

Homology Modeling of Mouse orphan G-protein coupled receptors Q99MX9 and G2A

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Two Mark question and Answers

Protein Structure Prediction. christian studer , EPFL

Regulatory Consideration for the Characterization of HOS in Biotechnology Products

Virtual bond representation

Learning to Use PyMOL (includes instructions for PS #2)

Structure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München

Combining PSSM and physicochemical feature for protein structure prediction with support vector machine

ONLINE BIOINFORMATICS RESOURCES

Textbook Reading Guidelines

Molecular Dynamics of Proteins

Outline. Computational Genomics and Molecular Biology. Genes Encode Proteins. DNA forms a double stranded helix. DNA replication T C A G

Predicting Protein Structure and Examining Similarities of Protein Structure by Spectral Analysis Techniques

Protein structure and evolution in the light of flexibility

systemsdock Operation Manual

What s New in Discovery Studio 2.5.5

Protein Bioinformatics PH Final Exam

Structures of Biomolecules by NMR Spectroscopy

1) In no more than 3 sentences and using your own words: What were the key observations in Meselson & Stahl s paper? (6 points)

CHEM/ENCH 323: Biological Chemistry Winter term 2018

Proteins and folding

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

Project ideas. CS/BioE/CME/Biophys/BMI 279 Nov. 9, 2017 Ron Dror

4/10/2011. Rosetta software package. Rosetta.. Conformational sampling and scoring of models in Rosetta.

Comments. polyproline ( PXXP ) motif for SH3 binding RGD motif for integrin binding GXXXG motif within the TM domain of membrane protein

Protein Data Bank and Structure Display with PyMOL

PROTEINS & NUCLEIC ACIDS

Molecular Modeling 9. Protein structure prediction, part 2: Homology modeling, fold recognition & threading

ProBiS: a web server for detection of structurally similar protein binding sites

Protein Structure Prediction

Molecular Dynamics Simulations of the NF-κB. Inducing Kinase:

By Dr. Abo Bakr Eltayeb

Ali Yaghi. Tamara Wahbeh. Mamoun Ahram

A Balanced Secondary Structure Predictor

Poster Project Extended Report: Protein Folding and Computational Techniques Blake Boling. Abstract. Introduction

Computational Methods for Protein Structure Prediction

ViewFeature: Integrated Feature Analysis and Visualization

Protein Bioinformatics Part I: Access to information

Bioinformatics Prof. M. Michael Gromiha Department of Biotechnology Indian Institute of Technology, Madras. Lecture - 5a Protein sequence databases

A Molecular Replacement Pipeline

(13 February 2016) Figure 1: Max Perutz and John Kendrew, with models of hemoglobin and myoglobin, respectively (1962;

An Overview of Protein Structure Prediction: From Homology to Ab Initio

Molecular Forces in Antibody Maturation*

Protein Structure/Function Relationships

Exploring Suboptimal Sequence Alignments and Scoring Functions in Comparative Protein Structural Modeling

R = G A (purine) Y = T C (pyrimidine) K = G T (Keto) M = A C (amino) S = G C (Strong bonds) W = A T (Weak bonds)

If you wish to have extra practice with swiss pdb viewer or to familiarize yourself with how to use the program here is a tutorial:

Getting to know RNA Polymerase: A Tutorial For Exploring Proteins in VMD By Sharlene Denos (UIUC)

What can you tell me about this picture?

Introduction to molecular docking

A REVIEW OF ARTIFICIAL INTELLIGENCE TECHNIQUES APPLIED TO PROTEIN STRUCTURE PREDICTION

Project Manual Bio3055. Lung Cancer: Cytochrome P450 1A1

Protein Peeling 2: a web server to convert protein structures into series of Protein Units.

Project ideas. CS/BioE/CME/Biophys/BMI 279 Nov. 9, 2017 Ron Dror

Predicting protein-protein interactions on a proteome scale by matching evolutionary and structural similarities at interfaces using PRISM

Near-Native Protein Folding

Research in Structural Bioinformatics and Molecular Biophysics. OUTLINE: What is it and why is it useful? EXAMPLES: b. Improving enzyme s function.

Protein Folding and Misfolding

Comparing the Structure of the Light Harvesting Complex II across Species of Purple Bacteria

Introduction to Proteins

Lecture 1a Introduction to Protein Structures - Molecular Graphics Tool. Software Widely Used by Scientific Community

Nanobiotechnology. Place: IOP 1 st Meeting Room Time: 9:30-12:00. Reference: Review Papers. Grade: 50% midterm, 50% final.

Bi 8 Lecture 7. Ellen Rothenberg 26 January Reading: Ch. 3, pp ; panel 3-1

KEMM15 Lecture note in structural bioinformatics: A practical guide. S Al-Karadaghi, Biochemistry & Structural Biology, Lund University

Navigating BioLuminate for PyMOL Users: A Practical Approach. 2nd European Life Science Bootcamp Ana Negri, PhD

Problem Set #1

CS612 - Algorithms in Bioinformatics

Transcription:

Spring 2012 Class 6 February 9, 2012

Backbone and Sidechain Rotational Degrees of Freedom

Protein 3-D Structure The 3-dimensional fold of a protein is called a tertiary structure. Many proteins consist of more than one polypeptide folded together. The spatial relationship between these separate polypeptide chains is called the quaternary structure.

Protein Folds Mainly α

Protein Folds Mainly β

Protein Folds α/β

Protein Structure Determination How does a protein know its fold? It is believed that all the information is encoded in its primary structure (amino acid sequence). Yet, no algorithm exists as of today to successfully predict this structure the protein folding problem. There are experimental methods to determine protein structure.

X-ray Crystallography Crystallize the protein. Pass an X-ray to create a diffraction pattern. Reconstruct atomic model from electron density map.

NMR (Nuclear Magnetic Resonance) Spectroscopy Magnetic field is applied to a solution containing the protein. Atomic nuclei are aligned by the field. When unaligned, the nuclei give off a typical signal. Inter-atomic distances can be inferred. The 3-D structure can be modeled. Done in solutions atoms are free to move. Usually produces an ensemble of structures.

The Protein Databank (PDB) Most of the protein structures discovered to date can be found in a large protein repository called the The RCSB Protein DataBank (PDB): http://www.rcsb.org. PDB is a public domain repository that contains experimentally determined structures of three-dimensional proteins. The majority of the proteins in the PDB have been determined by x-ray crystallography. The number of proteins determined using NMR methods has been increasing as efficient computational techniques to derive structures from NMR data have been developed.

Retrieving Protein Structures from the PDB Starting with 7 structures in 1971, the number has been growing exponentially since then. There are tens of thousands of structures as of today. All PDB entries are 4-letter words! 1CRZ, 2BHL... Sometimes the chain number is added: 1CRZA, 1CRZB... You can download the coordinates and display the structure The BLAST server and other databases contain links to PDB entries if the sequence has a known structure.

The PDB

The PDB File Format

Classification of Protein Structures - The SCOP Database Chothia, Murzin (Cambridge) Hand-curated hierarchical taxonomy of proteins based on their structural and evolutionary relationships. Classes Fold Level Super Family Family Domain

Visualization of Protein Structures Numerous tools are available for visualizing the structures stored in the PDB and other repositories. Most such tools allow a detailed examination of the molecule in a variety of rendering modes. For example, sometimes it may be useful to have a detailed image of the surface of the molecule as experienced by a molecule of water. For other purposes, a simple, cartoonish representation of the major structural features may be sufficient.

Visualization Tools Full featured academic packages UCSF Chimera (http://chimera.ucsf.edu/) PyMOL (http://pymol.sourceforge.net/) VMD (http://www.ks.uiuc.edu/research/vmd/) Viewers RasMol/Chime (http://www.openrasmol.org/) Jmol (http://jmol.sourceforge.net/) SwissProt PDB-Viewer (DeepView) ( http://www.expasy.org/spdbv/) RCSB Protein Workshop (http://www.rcsb.org/)

Modeling Tools Amber (http://amber.scripps.edu/) Charmm (http://www.charmm.org/) NAMD (http://www.ks.uiuc.edu/research/namd/) Gaussian (http://www.gaussian.com/) ModBase (http://modbase.compbio.ucsf.edu/) Modeller (http://www.salilab.org/modeller/) DOCK (http://dock.compbio.ucsf.edu/) Many, many more (see http://cmm.info.nih.gov/)

Comparison of Visualization Packages JMol Best-in-class viewer Web enabled Scriptable Input only Compatible with RasMol scripts Limited analytical capabilities Mostly through Javascript wrappers

Comparison of Visualization Packages PyMol Best-in-class for peptidometics, speed Single-screen interface (+command line) Extensible Some modeling capabilities Good publication tools (built-in ray tracer) Scriptable

Comparison of Visualization Packages VMD Best-in-class for MD (integrated with NAMD) and other analysis tools Scriptable Excellent stereo capabilities Embedded ray tracer (Tachyon) Extensible Now supports Python, previously was only TCL/TK

Comparison of Visualization Packages Chimera Best-in-class for visualizing very large structures Multiscale extension, volume viewer Focus on extensibility, broad functionality Primarily analytical interface Familiar GUI interface (+command line) Scriptable Reasonable tools for publication & presentation Embedded ray tracer (POV-Ray) Excellent sequence/structure capabilities Eeasonable interface to modeling programs

Comparison of Visualization Packages There is no best package for everything (IMHO) YMMV (Your Mileage May Vary) What we think is easy, you may think is hard What we think is hard, you may think is easy Choosing the best package for you Does what you need Good documentation Good support (either local or from the authors)

Representation of Protein Structures Wireframe

Representation of Protein Structures Cartoon

Representation of Protein Structures Color by Secondary Structure

Representation of Protein Structures Spheres

Representation of Protein Structures Solvent Accessible Surface, Colored by Chain

The Protein Folding Problem Protein folding is the translation of primary sequence information into secondary, tertiary and quaternary structural information Dont forget post-translational modifications. They change the chemical nature of the primary sequence and thus affect the final structure

Relationship Between Structure and Function H. Wu, 1931: First formulation of the protein folding problem on record Mirsky and Pauling 1936: Chemical and physical properties of protein molecules attributed to amino-acid composition and structural arrangement of the amino-acid chain Denaturing conditions like heating assumed to abolish chemical and physical properties of a protein by melting away the protein structure The relationship between protein structure and function was under significant debate until the revolutionizing experiments of Christian Anfinsen and colleagues at the National Institute of Health (NIH) in the 1960s

Anfinsenś Experiments: Spontaneous Refolding Experiments by Christian B. Anfinsen showed that the small ribonuclease enzyme would re-assume structure and enzymatic activity after denaturation This ability to regain both structure and function was confirmed on thousands of other proteins. It seemed to be an inherent property of amino-acid chains After a decade of experiments, Anfinsen concluded that the amino-acid sequence governed the folding of a protein chain into a biologically-active conformation under a normal physiological milieu He received the Nobel prize in chemistry for his formulation of this relationship between the amino acid sequence and the biologically-active (functional) structure of a protein The telegram that I received from the Swedish Royal Academy of Sciences specifically cites...studies on ribonuclease, in particular on the relationship between the amino acid sequence and the biologically active conformation... Studies on the Principles that Govern the Folding of Protein Chains Nobel Lecture, Dec. 11, 1972

From Anfinsenś Experiments to Simulations Let the 3D spatial arrangement of atoms constituting the protein chain be referred to as a conformation How does the amino-acid sequence determine the biologically-active conformation? If one knows the answer to the above, one can formulate instructions for a computer algorithm to compute the biologically-active conformation There are many conformations (possible arrangements of the chain) of a protein chain What makes the biologically-active conformation different from the rest? Can this information be used to guide a computer algorithm to this special conformation? The telegram that I received from the Swedish Royal Academy of Sciences specifically cites...studies on ribonuclease, in particular on the relationship between the amino acid sequence and the biologically active conformation... Studies on the Principles that Govern the Folding of Protein Chains Nobel Lecture, Dec. 11, 1972