Protein Folding Problem I400: Introduction to Bioinformatics

Protein Folding Problem I400: Introduction to Bioinformatics November 29, 2004

Protein biomolecule, macromolecule more than 50% of the dry weight of cells is proteins polymer of amino acids connected into linear chains strings of symbols machinery of life play central role in the structure and function of cells regulate and execute many biological functions a) amino acid b) peptide bond

Peptide bonds they are planar, nearly they are very strong a) Pauling s theoretical model (1951) b) experimentally determined bond lengths

Ramachandran plot 1. α region corresponds to the residues found typically in the alpha helices 2. β region corresponds to the residues found typically in the beta sheets 3. L region corresponds to the residues typically found in the left handed helices most angles are not allowed because of steric collisions between atoms of the same residue between atoms of the neighboring residues the allowed combinations can be calculated

Ramachandran plot a) observed Ramachandran angles for all residues except glycine b) observed Ramachandran angles for glycine

Levels of protein structure Native state (conformation) conformation at which protein shows its activity

Helical structure a) idealized diagram b) the same as a) but with approximate positions for main-chain atoms c) schematic diagram of an alpha helix d) a ball and stick model

Anti-parallel beta sheet

Anti-parallel beta sheet Typical length: 5-10 residues

Structural motifs - Hairpin loop a) histogram showing the frequency of hairpin loops of different lengths in 62 proteins b) the two most frequently occurring two-residue hairpin loops

Parallel beta sheet

Helix-loop-helix motif calmodulin helix-loop-helix motif with Ca 2+ atom attached

Four helix bundle hydrophobic residues tend to be on the inside polar residues tend to be on the outside of proteins a) four helix bundle red cylinders are helices while green parts are loops b) projection from above

Disulphide bonds (bridges) covalent bond between two cysteins (i.e. their sulfur atoms) require oxidative environment present in extracellular proteins stabilize proteins create so-called longrange interactions

Central dogma of molecular biology How do proteins fold?

Levinthal s calculation Cyrus Levinthal 1968 Q: do proteins explore all possible conformations before they adopt a specific 3-D structure? A: let s consider a simplified problem each residue can adopt one of the three discrete groups from the Ramachandran plot (alpha, beta, L) a switch between conformations can be done in 10-12 seconds then, a protein with 150 residues would need to explore 3 150 possible states, which is 10 71 at the rate of 10-12 a protein would need ~10 50 years we know that protein folds between 0.1s and 1000s

Protein folding problem How do proteins fold into a specific 3-D structure? How does the primary structure of a protein determine its secondary and tertiary structure? ----- there are two conditions a protein needs to meet there must be a single, stable, folded conformation (thermodynamic condition) a protein must fold on an appropriate time scale (kinetic condition) thus, only a small amount of conformational space is explored also, there must exist a specific folding pathway the paradox how proteins quickly fold into specific 3-D conformations is called a protein folding problem

Anfinsen s experiment Urea: There is sufficient information contained in the protein sequence to guarantee correct folding from any of a large number of unfolded states.

Thermodynamic hypothesis native conformation of a protein is adopted spontaneously i.e. amino acid sequence 3-D structure Anfinsen s demonstration of this fundamental property of proteins opened the problem to a massive amount of experimental and theoretical effort. His summary of the experiments was presented as a Nobel Prize Lecture and published in: Anfinsen, C.B. (1973) "Principles that govern the folding of protein chains." Science 181 223-230.

Structural features present in any folded globular protein most main chain hydrogen bonds are formed core is formed of hydrophobic residues, but not all hydrophobic residues are buried well-packed hydrophobic core secondary structural elements (helices, sheet) show a hydrophobic and hydrophilic face (amphipathic). Amhipathic helices have hydrophobic residues every third/fourth residue; amphipathic sheets alternate hydrophobic and hydrophilic residues along each strand most polar residues are found on the surface buried polar residues have H-bonded partners all (usually) charged residues are on the surface

Hierarchical folding model there is a major folding pathway that most proteins follow local neighborhoods interact and create folding hydrophobic units then, domains and entire proteins are created however, not all local neighborhoods show propensities towards one preferred conformations

Folding intermediates Molten Globule the first observable event in the protein folding pathway; partially organized ordered state resembles a liquid state but has most secondary structures formed

Can proteins misfold? the lack of function is not always the worst-case scenario the role of molecular chaperones (also proteins) is to assist folding of many proteins (so-called chaperone clients)

Fischer s experiment Hermann Emil Fischer 1894 An enzyme and a substrate have to fit each other like a lock and key in order to exert chemical effect on each other lock-and-key theory later, lock-and-key paradigm was expanded to contain socalled induced fit theory Lock-and-key is a structure-function paradigm!

Sequence-structure-function paradigm Standard protein structure/function paradigm (Fischer, 1894, Anfinsen 1973) Amino Acid Sequence 3-D Structure Protein Function Classification: Gene Transfer EC Number: 1.2.1.13 Dominant view: 3-D structure is prerequisite for protein function

Calcineurin-calmodulin counter example Calcineurin: Kissinger et al., 1995 CALMODULIN calcium-dependent phosphatase regulated by calmodulin (calcium-binding protein) induces conformational change of calmodulin upon binding may be involved in human hart failure when calcium concentration is chronically increased disorder is important for the binding mechanism

How calmodulin works? calmodulin (CaM) wraps around an autoinhibitory helix and pulls it away this exposes an active site of the kinase activity is shown only in the presence of calcium (with exceptions) binding is driven by hydrophobic and electrostatic interactions

Intrinsically disordered proteins Proteins with regions that have no stable 3-D conformation have regions absent in protein crystal structures exist as ensembles of conformations Ramachandran angles vary widely in time Detected by x-ray crystallography NMR spectroscopy other methods (limited proteolysis, circular dichroism...) Found in all 3 kingdoms more often in higher eukaryotes Have function and required for function cell signaling transcription regulation

Major areas of research Prediction of secondary structure, fold, and function Identification of targets for high-throughput experimental studies Molecular simulation (docking) Evolutionary analyses System sciences (developing models of cells)