Spring 2012 Class 6 February 9, 2012
Backbone and Sidechain Rotational Degrees of Freedom
Protein 3-D Structure The 3-dimensional fold of a protein is called a tertiary structure. Many proteins consist of more than one polypeptide folded together. The spatial relationship between these separate polypeptide chains is called the quaternary structure.
Protein Folds Mainly α
Protein Folds Mainly β
Protein Folds α/β
Protein Structure Determination How does a protein know its fold? It is believed that all the information is encoded in its primary structure (amino acid sequence). Yet, no algorithm exists as of today to successfully predict this structure the protein folding problem. There are experimental methods to determine protein structure.
X-ray Crystallography Crystallize the protein. Pass an X-ray to create a diffraction pattern. Reconstruct atomic model from electron density map.
NMR (Nuclear Magnetic Resonance) Spectroscopy Magnetic field is applied to a solution containing the protein. Atomic nuclei are aligned by the field. When unaligned, the nuclei give off a typical signal. Inter-atomic distances can be inferred. The 3-D structure can be modeled. Done in solutions atoms are free to move. Usually produces an ensemble of structures.
The Protein Databank (PDB) Most of the protein structures discovered to date can be found in a large protein repository called the The RCSB Protein DataBank (PDB): http://www.rcsb.org. PDB is a public domain repository that contains experimentally determined structures of three-dimensional proteins. The majority of the proteins in the PDB have been determined by x-ray crystallography. The number of proteins determined using NMR methods has been increasing as efficient computational techniques to derive structures from NMR data have been developed.
Retrieving Protein Structures from the PDB Starting with 7 structures in 1971, the number has been growing exponentially since then. There are tens of thousands of structures as of today. All PDB entries are 4-letter words! 1CRZ, 2BHL... Sometimes the chain number is added: 1CRZA, 1CRZB... You can download the coordinates and display the structure The BLAST server and other databases contain links to PDB entries if the sequence has a known structure.
The PDB
The PDB File Format
Classification of Protein Structures - The SCOP Database Chothia, Murzin (Cambridge) Hand-curated hierarchical taxonomy of proteins based on their structural and evolutionary relationships. Classes Fold Level Super Family Family Domain
Visualization of Protein Structures Numerous tools are available for visualizing the structures stored in the PDB and other repositories. Most such tools allow a detailed examination of the molecule in a variety of rendering modes. For example, sometimes it may be useful to have a detailed image of the surface of the molecule as experienced by a molecule of water. For other purposes, a simple, cartoonish representation of the major structural features may be sufficient.
Visualization Tools Full featured academic packages UCSF Chimera (http://chimera.ucsf.edu/) PyMOL (http://pymol.sourceforge.net/) VMD (http://www.ks.uiuc.edu/research/vmd/) Viewers RasMol/Chime (http://www.openrasmol.org/) Jmol (http://jmol.sourceforge.net/) SwissProt PDB-Viewer (DeepView) ( http://www.expasy.org/spdbv/) RCSB Protein Workshop (http://www.rcsb.org/)
Modeling Tools Amber (http://amber.scripps.edu/) Charmm (http://www.charmm.org/) NAMD (http://www.ks.uiuc.edu/research/namd/) Gaussian (http://www.gaussian.com/) ModBase (http://modbase.compbio.ucsf.edu/) Modeller (http://www.salilab.org/modeller/) DOCK (http://dock.compbio.ucsf.edu/) Many, many more (see http://cmm.info.nih.gov/)
Comparison of Visualization Packages JMol Best-in-class viewer Web enabled Scriptable Input only Compatible with RasMol scripts Limited analytical capabilities Mostly through Javascript wrappers
Comparison of Visualization Packages PyMol Best-in-class for peptidometics, speed Single-screen interface (+command line) Extensible Some modeling capabilities Good publication tools (built-in ray tracer) Scriptable
Comparison of Visualization Packages VMD Best-in-class for MD (integrated with NAMD) and other analysis tools Scriptable Excellent stereo capabilities Embedded ray tracer (Tachyon) Extensible Now supports Python, previously was only TCL/TK
Comparison of Visualization Packages Chimera Best-in-class for visualizing very large structures Multiscale extension, volume viewer Focus on extensibility, broad functionality Primarily analytical interface Familiar GUI interface (+command line) Scriptable Reasonable tools for publication & presentation Embedded ray tracer (POV-Ray) Excellent sequence/structure capabilities Eeasonable interface to modeling programs
Comparison of Visualization Packages There is no best package for everything (IMHO) YMMV (Your Mileage May Vary) What we think is easy, you may think is hard What we think is hard, you may think is easy Choosing the best package for you Does what you need Good documentation Good support (either local or from the authors)
Representation of Protein Structures Wireframe
Representation of Protein Structures Cartoon
Representation of Protein Structures Color by Secondary Structure
Representation of Protein Structures Spheres
Representation of Protein Structures Solvent Accessible Surface, Colored by Chain
The Protein Folding Problem Protein folding is the translation of primary sequence information into secondary, tertiary and quaternary structural information Dont forget post-translational modifications. They change the chemical nature of the primary sequence and thus affect the final structure
Relationship Between Structure and Function H. Wu, 1931: First formulation of the protein folding problem on record Mirsky and Pauling 1936: Chemical and physical properties of protein molecules attributed to amino-acid composition and structural arrangement of the amino-acid chain Denaturing conditions like heating assumed to abolish chemical and physical properties of a protein by melting away the protein structure The relationship between protein structure and function was under significant debate until the revolutionizing experiments of Christian Anfinsen and colleagues at the National Institute of Health (NIH) in the 1960s
Anfinsenś Experiments: Spontaneous Refolding Experiments by Christian B. Anfinsen showed that the small ribonuclease enzyme would re-assume structure and enzymatic activity after denaturation This ability to regain both structure and function was confirmed on thousands of other proteins. It seemed to be an inherent property of amino-acid chains After a decade of experiments, Anfinsen concluded that the amino-acid sequence governed the folding of a protein chain into a biologically-active conformation under a normal physiological milieu He received the Nobel prize in chemistry for his formulation of this relationship between the amino acid sequence and the biologically-active (functional) structure of a protein The telegram that I received from the Swedish Royal Academy of Sciences specifically cites...studies on ribonuclease, in particular on the relationship between the amino acid sequence and the biologically active conformation... Studies on the Principles that Govern the Folding of Protein Chains Nobel Lecture, Dec. 11, 1972
From Anfinsenś Experiments to Simulations Let the 3D spatial arrangement of atoms constituting the protein chain be referred to as a conformation How does the amino-acid sequence determine the biologically-active conformation? If one knows the answer to the above, one can formulate instructions for a computer algorithm to compute the biologically-active conformation There are many conformations (possible arrangements of the chain) of a protein chain What makes the biologically-active conformation different from the rest? Can this information be used to guide a computer algorithm to this special conformation? The telegram that I received from the Swedish Royal Academy of Sciences specifically cites...studies on ribonuclease, in particular on the relationship between the amino acid sequence and the biologically active conformation... Studies on the Principles that Govern the Folding of Protein Chains Nobel Lecture, Dec. 11, 1972