Protein Structure, Function, and Folding

Dan E. Krane and Michael L. Raymer
Wright State University - Main Campus (dan.krane@wright.edu, michael.raymer@wright.edu)
Wright State University CORE Scholar, Computer Science and Engineering Faculty Publications, 2003
Repository citation: Krane, D. E., & Raymer, M. L. (2003). Protein Structure, Function, and Folding.

Disulfide Bonds
Two cysteines in close proximity can form a covalent bond between their side-chain sulfur atoms: a disulfide bond (also called a disulfide bridge or dicysteine bond).
Disulfide bonds significantly stabilize tertiary structure.

Determining Protein Structure
There are O(100,000) distinct proteins in the human proteome.
3D structures have been determined for 14,000 proteins from all organisms. This count includes duplicates, such as the same protein with different ligands bound.
Most coordinates are determined by X-ray crystallography.

X-Ray Crystallography
[Figure: a protein crystal, scale ~0.5 mm]
The crystal is a mosaic of millions of copies of the protein.
As much as 70% of the crystal is solvent (water)!
A crystal may take months (and a green thumb) to grow.

X-Ray diffraction
The image is averaged over:
space (many copies of the protein in the crystal)
time (the duration of the diffraction experiment)

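Electron Density Maps
Resolution depends on the quality and regularity of the crystal.
The R-factor measures how far the diffraction data predicted by the model depart from the observed data; lower values mean a better fit.
Further steps: solvent fitting and refinement.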

The Protein Data Bank
Example ATOM records from a PDB coordinate file:
    ATOM   1  N    ALA E  APR 213
    ATOM   2  CA   ALA E  APR 214
    ATOM   3  C    ALA E  APR 215
    ATOM   4  O    ALA E  APR 216
    ATOM   5  CB   ALA E  APR 217
    ATOM   6  N    GLY E  APR 218
    ATOM   7  CA   GLY E  APR 219
    ATOM   8  C    GLY E  APR 220
    ATOM   9  O    GLY E  APR 221
    ATOM  10  N    VAL E  APR 222
    ATOM  11  CA   VAL E  APR 223
    ATOM  12  C    VAL E  APR 224
    ATOM  13  O    VAL E  APR 225
    ATOM  14  CB   VAL E  APR 226
    ATOM  15  CG1  VAL E  APR 227
    ATOM  16  CG2  VAL E  APR 228
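
PDB records use fixed column positions, so they are usually parsed by slicing rather than by splitting on whitespace (adjacent fields, such as large negative coordinates, can run together). The sketch below assumes the column layout of the PDB format version 3.3 for ATOM/HETATM records; the sample line and its coordinates are invented for illustration, and the names `Atom` and `parse_atom_record` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM/HETATM line by fixed columns (1-based columns in comments)."""
    if line[0:6] not in ("ATOM  ", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {line[0:6]!r}")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Hypothetical record: coordinates invented for illustration.
SAMPLE = "ATOM      1  N   ALA E   1      11.104   6.134  -6.504  1.00  0.00"

atom = parse_atom_record(SAMPLE)
print(atom.name, atom.res_name, atom.chain, atom.x)  # N ALA E 11.104
```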

A Peek at Protein Function
Serine proteases cleave other proteins.
Catalytic triad: Asp, His, Ser

Cleaving the peptide bond

Three Serine Proteases
Chymotrypsin: cleaves the peptide bond on the carboxyl side of aromatic (ring) residues (Trp, Phe, Tyr) and of large hydrophobic residues (Met).
Trypsin: cleaves after Lys (K) or Arg (R), which carry a positive charge.
Elastase: cleaves after small residues (Gly, Ala, Ser, Cys).
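
The cleavage rules above can be turned into a toy in-silico digest. This sketch cuts after every residue in each enzyme's specificity set exactly as listed on the slide; real proteases follow further rules (for example, trypsin generally does not cut before proline), which are ignored here. The names `CLEAVES_AFTER` and `digest` are illustrative.

```python
# Residues after which each enzyme cuts, as listed on the slide (simplified).
CLEAVES_AFTER = {
    "chymotrypsin": set("WFYM"),  # aromatic residues plus Met
    "trypsin": set("KR"),         # positively charged Lys, Arg
    "elastase": set("GASC"),      # small residues
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Split a one-letter sequence after every cleavage site of the given enzyme."""
    sites = CLEAVES_AFTER[enzyme]
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in sites:
            fragments.append(sequence[start:i + 1])  # cut on the carboxyl side
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing fragment with no site
    return fragments


print(digest("AAKGRWV", "trypsin"))       # ['AAK', 'GR', 'WV']
print(digest("AAKGRWV", "chymotrypsin"))  # ['AAKGRW', 'V']
```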

Specificity Binding Pocket

The Protein Folding Problem
A central question of molecular biology: given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?
Input: AAVIKYGCAL
Output: $(\phi_1, \psi_1), (\phi_2, \psi_2), \ldots$ = the backbone conformation (no side chains yet)
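
Each backbone angle is a torsion (dihedral) angle defined by four consecutive backbone atoms: φ_i by C(i-1), N(i), Cα(i), C(i), and ψ_i by N(i), Cα(i), C(i), N(i+1). A minimal sketch of the standard torsion-angle calculation from four 3-D points, in plain Python; the helper names are illustrative.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle about the p1-p2 bond, in degrees between -180 and 180."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # 0.0 (cis)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # 180.0 (trans)
```

Computing φ and ψ for every residue of a structure turns its coordinates into exactly the backbone description used as the "output" above.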

Protein Folding: Biological perspective
"Central dogma": sequence specifies structure.
Denaturing unfolds a protein back to a random-coil configuration:
β-mercaptoethanol breaks disulfide bonds
urea or guanidine hydrochloride act as denaturants
heat or pH changes also denature proteins
Anfinsen's experiments:
denatured ribonuclease spontaneously regained enzymatic activity once the denaturant was removed
evidence that it refolded to its native conformation

Folding intermediates
Levinthal's paradox: consider a 100-residue protein. If each residue can take only 3 positions, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations. Even if converting from one structure to another is very fast, an exhaustive search would take an astronomical number of years!
Folding must therefore proceed by progressive stabilization of intermediates.
Molten globules: most secondary structure is formed, but the chain is much less compact than in the native conformation.
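
The arithmetic behind Levinthal's argument is easy to check. The time per conversion used below, 1e-13 s (on the order of a bond vibration), is an assumption chosen for illustration, not a value from the slide; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_search_years(n_residues: int, states_per_residue: int,
                            seconds_per_conversion: float) -> float:
    """Years needed to visit every conformation once, one conversion at a time."""
    conformations = states_per_residue ** n_residues  # exact integer, e.g. 3**100
    return conformations * seconds_per_conversion / SECONDS_PER_YEAR


print(f"{3 ** 100:.2e} conformations")                         # 5.15e+47 conformations
print(f"{exhaustive_search_years(100, 3, 1e-13):.1e} years")   # 1.6e+27 years
```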

Forces driving protein folding
Hydrophobic collapse is believed to be a key driving force for protein folding:
a hydrophobic core
a polar surface interacting with solvent
minimum volume (no cavities)
Other stabilizing contributions:
disulfide bond formation
hydrogen bonds
polar and electrostatic interactions

Folding help
Proteins are, in fact, only marginally stable: the native state is typically only 5 to 10 kcal/mol more stable than the unfolded form.
Many proteins help other proteins fold:
protein disulfide isomerase catalyzes the shuffling of disulfide bonds
chaperones break up aggregates and (in theory) unfold misfolded proteins
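
To see what "marginally stable" means in numbers, a stability ΔG (native relative to unfolded) can be converted into an equilibrium ratio [N]/[U] = exp(ΔG/RT). This sketch assumes a simple two-state model at 298 K; the temperature and the function name are assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def native_to_unfolded_ratio(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """[N]/[U] for a two-state protein whose native state is delta_g_kcal more stable."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


for dg in (5.0, 10.0):
    ratio = native_to_unfolded_ratio(dg)
    print(f"{dg:4.1f} kcal/mol: N/U = {ratio:.2g}, fraction unfolded = {1 / (1 + ratio):.1e}")
```

At 5 kcal/mol this gives roughly 4,600 native molecules per unfolded one, and at 10 kcal/mol roughly 21 million: a small free-energy margin still keeps nearly every molecule folded.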

The Hydrophobic Core
Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
The mutation E6V (glutamate at position 6 replaced by valine) in the β chain places a hydrophobic Val on the surface of hemoglobin.
The resulting sticky patch causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cell; the deformed cells do not carry oxygen efficiently.
Sickle cell anemia was the first identified molecular disease.

Sickle Cell Anemia
Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.

Computational Problems in Protein Folding
Two key questions:
Evaluation: how can we tell a correctly folded protein from an incorrectly folded one?
Consider H-bonds, electrostatics, the hydrophobic effect, etc.
Derive a function and see how well it does on real proteins.
Optimization: once we have an evaluation function, can we optimize it?
Simulated annealing/Monte Carlo
EC (evolutionary computation)
Heuristics
We'll talk more about these methods later.

Fold Optimization
Simple lattice models (HP models):
two types of residues, hydrophobic (H) and polar (P)
a 2-D or 3-D lattice
the only force is hydrophobic collapse
Score = number of H-H contacts

Scoring Lattice Models
H/P model scoring: count noncovalent hydrophobic interactions, i.e. H-H lattice contacts between residues that are not neighbors in the chain.
Sometimes: penalize buried polar or surface hydrophobic residues.
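
A minimal sketch of H/P scoring on a 2-D square lattice, assuming the fold is given as a string of absolute moves (U, D, L, R) with the first residue at the origin. Only the basic contact count is implemented; the optional penalties are left out. Function names are illustrative.

```python
from typing import Optional

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def lattice_coords(moves: str) -> list[tuple[int, int]]:
    """Place residue 0 at the origin and follow one absolute move per later residue."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEP[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hh_contacts(sequence: str, moves: str) -> Optional[int]:
    """Count H-H contacts between non-neighbouring residues; None if the fold has a bump."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per residue after the first")
    coords = lattice_coords(moves)
    if len(set(coords)) != len(coords):
        return None  # two residues on the same lattice point
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in STEP.values():
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 skips covalent neighbours and counts each pair once
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return contacts


print(hh_contacts("HPPH", "RUL"))  # 1: residues 0 and 3 touch across the square
```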

What can we do with lattice models?
For smaller polypeptides, exhaustive search can be used.
Looking at the best fold, even in such a simple model, can teach us interesting things about the protein folding process.
For larger chains, other optimization and search methods must be used:
greedy, branch and bound
evolutionary computing, simulated annealing
graph-theoretical methods
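
For short chains, exhaustive search takes only a few lines: enumerate every move string, discard self-intersecting folds, and keep the highest H-H contact count. This sketch uses absolute moves on a 2-D square lattice and fixes the first move to R (rotating a fold does not change its score); its cost grows as 4^(n-2), so it is only practical for a dozen or so residues. Names are illustrative.

```python
from itertools import product
from typing import Optional

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def score(sequence: str, moves: str) -> Optional[int]:
    """H-H contacts for a fold given as absolute moves, or None if it has a bump."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEP[m]
        coords.append((coords[-1][0] + dx, coords[-1][1] + dy))
    if len(set(coords)) != len(coords):
        return None
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        for dx, dy in STEP.values():
            j = index_at.get((x + dx, y + dy))
            # count each non-covalent H-H pair once (j > i + 1)
            if j is not None and j > i + 1 and sequence[i] == sequence[j] == "H":
                contacts += 1
    return contacts


def best_fold(sequence: str) -> tuple[int, str]:
    """Try every fold of a short chain; return (best score, absolute moves)."""
    if len(sequence) < 2:
        return 0, ""
    best_score, best_moves = -1, ""
    for rest in product("UDLR", repeat=len(sequence) - 2):
        moves = "R" + "".join(rest)  # fix the first move: rotations score the same
        s = score(sequence, moves)
        if s is not None and s > best_score:
            best_score, best_moves = s, moves
    return best_score, best_moves


print(best_fold("HPPH"))  # (1, 'RUL')
```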

Learning from Lattice Models
The hydrophobic zipper effect: Ken Dill ~

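Representing a lattice model
Absolute directions: UURRDLDRRU
Relative directions: LFRFRRLLFFL
Advantage: with absolute directions we must forbid UD or RL (the chain would fold straight back onto itself); relative directions rule these out, leaving only three directions: L, R, F.
What about bumps? A string such as LFRRR still runs the chain into itself.
Options: give such folds a bad score, or use a better representation.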
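
Converting relative directions back to absolute ones only requires tracking the current heading. This sketch assumes the chain initially points right (R), so the first relative letter turns from that heading; the helper names are hypothetical. It also illustrates the two points above: relative strings never produce an immediate reversal, but they can still produce bumps, as LFRRR does.

```python
LEFT_OF = {"U": "L", "L": "D", "D": "R", "R": "U"}  # heading after a left turn
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}       # heading after a right turn
STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def relative_to_absolute(relative: str, heading: str = "R") -> str:
    """Translate L/R/F turns into absolute U/D/L/R moves, starting from `heading`."""
    moves = []
    for turn in relative:
        if turn == "L":
            heading = LEFT_OF[heading]
        elif turn == "R":
            heading = RIGHT_OF[heading]
        elif turn != "F":
            raise ValueError(f"unknown relative direction {turn!r}")
        moves.append(heading)
    return "".join(moves)


def has_bump(absolute: str) -> bool:
    """True if the walk, starting at the origin, revisits a lattice point."""
    pos, seen = (0, 0), {(0, 0)}
    for m in absolute:
        dx, dy = STEP[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return True
        seen.add(pos)
    return False


print(relative_to_absolute("LFRR"))             # UURD
print(has_bump(relative_to_absolute("LFRRR")))  # True
```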

Preference-order representation
Each position has two preferences. If it can't take either of the two, it takes the least favorite path if possible.
Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}
This can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
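
The decoding rule is stated informally on the slide; this sketch assumes that each pair {XY} lists the first and second choice of relative move for the next residue, that the remaining (least favorite) move is tried third, and that the chain starts at the origin heading right. Under those assumptions the first example decodes without a collision (its seventh step falls back to the least favorite move), while the second gets trapped. All names are hypothetical.

```python
from typing import Optional

ORDER = "URDL"  # headings in clockwise order
STEP = {"U": (0, 1), "R": (1, 0), "D": (0, -1), "L": (-1, 0)}


def turn(heading: str, move: str) -> str:
    """New heading after a relative move F (forward), L (left) or R (right)."""
    i = ORDER.index(heading)
    return {"F": heading, "R": ORDER[(i + 1) % 4], "L": ORDER[(i - 1) % 4]}[move]


def decode_preferences(prefs: list[str], heading: str = "R") -> Optional[list[tuple[int, int]]]:
    """Decode preference pairs such as "LR" into lattice coordinates.

    Returns None if some residue has no free neighbour left (an unavoidable bump)."""
    coords = [(0, 0), STEP[heading]]  # first two residues, chain pointing `heading`
    occupied = set(coords)
    for first, second in prefs:
        least = ({"L", "R", "F"} - {first, second}).pop()
        for move in (first, second, least):
            new_heading = turn(heading, move)
            dx, dy = STEP[new_heading]
            nxt = (coords[-1][0] + dx, coords[-1][1] + dy)
            if nxt not in occupied:
                heading = new_heading
                coords.append(nxt)
                occupied.add(nxt)
                break
        else:
            return None  # all three options collide
    return coords


ok = decode_preferences(["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"])
trapped = decode_preferences(["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"])
print(ok[-1], trapped)  # (2, -1) None
```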

Decoding the representation
The optimizer works on the representation, but to score a fold we have to decode it into a structure that lets us check for bumps and compute the score.
Example: how many bumps are in URDDLLDRURU?
We can do it on graph paper: start at (0,0) and fill in the grid.
In Perl, we use a two-dimensional array.

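A two-dimensional array in Perl
    use strict; use warnings;

    my $configuration = 'URDDLLDRURU';
    my $sequence      = 'HPPHHPHPHHH';

    # Mark every cell of a 101 x 101 grid as empty
    my @grid;
    foreach my $i (0..100) {
        foreach my $j (0..100) {
            $grid[$i][$j] = 'empty';
        }
    }

    # Start in the middle of the grid rather than at (0,0):
    # a negative index would wrap around to the end of a Perl array
    my ($x, $y) = (50, 50);
    my @moves    = split(//, $configuration);
    my @residues = split(//, $sequence);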

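Setting up the grid
    my $bumps = 0;
    foreach my $move (@moves) {
        my $residue = shift(@residues);
        # Take one lattice step ('eq' compares strings; '=' would assign)
        if    ($move eq 'U') { $y++ }
        elsif ($move eq 'D') { $y-- }
        elsif ($move eq 'R') { $x++ }
        elsif ($move eq 'L') { $x-- }

        if ($grid[$x][$y] ne 'empty') {
            $bumps++;                  # BUMP! the cell is already occupied
        } else {
            $grid[$x][$y] = $residue;  # place this residue on the lattice
        }
    }
    print "Bumps: $bumps\n";           # prints "Bumps: 3" for URDDLLDRURU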

More realistic models
Higher-resolution lattices (45 lattice, etc.)
Off-lattice models
Local moves
Optimization/search methods and φ/ψ representations:
greedy search
branch and bound
EC, Monte Carlo, simulated annealing, etc.

The Other Half of the Picture
Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).
Theoretical force field:
$G = G_{\text{van der Waals}} + G_{\text{h-bonds}} + G_{\text{solvent}} + G_{\text{Coulomb}}$
Empirical force fields:
start with a database
look at neighboring residues: are they similar to known protein folds?

Threading: Fold recognition
Given:
a sequence: IVACIVSTEYDVMKAAR
a database of molecular coordinates
Map the sequence onto each fold and evaluate it.
Objective 1: improve the scoring function
Objective 2: folding
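
A toy version of the map-and-evaluate loop, assuming each fold in the database has been reduced to a burial profile (B = buried site, E = exposed site) of the same length as the query, with no gaps. The score rewards hydrophobic residues at buried sites and polar residues at exposed ones, in the spirit of the lattice-model penalties for buried polar or surface hydrophobic residues. The hydrophobic set, the two profiles, and all names are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set for this toy


def environment_score(sequence: str, profile: str) -> int:
    """+1 for each hydrophobic residue at a buried site or polar residue at an exposed site."""
    if len(sequence) != len(profile):
        raise ValueError("toy model: sequence and fold must have the same length")
    return sum((aa in HYDROPHOBIC) == (site == "B") for aa, site in zip(sequence, profile))


def thread(sequence: str, folds: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Score the sequence on every fold and return the best-scoring fold name."""
    scores = {name: environment_score(sequence, profile) for name, profile in folds.items()}
    return max(scores, key=scores.get), scores


# Hypothetical fold library: burial profiles invented for illustration.
FOLDS = {
    "fold_A": "BBBBBBEEEEEBBEBBE",
    "fold_B": "EBEBEBEBEBEBEBEBE",
}

print(thread("IVACIVSTEYDVMKAAR", FOLDS))  # ('fold_A', {'fold_A': 17, 'fold_B': 9})
```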

Secondary Structure Prediction
    AGVGTVPMTAYGNDIQYYGQVT
    A-VGIVPM-AYGQDIQY-GQVT
    AG-GIIP--AYGNELQ--GQVT
    AGVCTVPMTA---ELQYYG--T
    AGVGTVPMTAYGNDIQYYGQVT
    ----hhhhhhhhhhhh--eeee
The bottom row marks secondary structure (h = helix, e = strand).

Secondary Structure Prediction
Easier than folding.
Current algorithms can predict secondary structure with 70-80% accuracy.
Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13
based on frequencies of occurrence of residues in helices and sheets
PhD: neural-network based; uses a multiple sequence alignment
Rost & Sander, Proteins, 1994, 19

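Chou-Fasman Parameters
Columns: Name, Abbreviation, P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)
P(a), P(b) and P(turn) are the conformational parameters for α-helix, β-sheet and turn; f(i) to f(i+3) are the frequencies with which the residue occurs at each of the four positions of a hairpin turn.
Rows, one per amino acid: Alanine (A), Arginine (R), Aspartic Acid (D), Asparagine (N), Cysteine (C), Glutamic Acid (E), Glutamine (Q), Glycine (G), Histidine (H), Isoleucine (I), Leucine (L), Lysine (K), Methionine (M), Phenylalanine (F), Proline (P), Serine (S), Threonine (T), Tryptophan (W), Tyrosine (Y), Valine (V)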

Chou-Fasman Algorithm
Identify α-helices: find a region where 4 out of 6 contiguous amino acids have P(a) > 100.
Extend the region until 4 contiguous amino acids with P(a) < 100 are found.
Compute ΣP(a) and ΣP(b); if the region is longer than 5 residues and ΣP(a) > ΣP(b), identify it as a helix.
Repeat for β-sheets [use P(b)].
If an α and a β region overlap, the overlapping region is predicted according to ΣP(a) and ΣP(b).
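
The helix nucleation and extension steps can be sketched as follows. The P(a) and P(b) values are the ones commonly reproduced from Chou and Fasman's table; treat them as an assumption to check against the original (they do agree with the sums in the Chou-Fasman example). Following the wording above, extension stops before four consecutive residues that each have P(a) < 100; growing in both directions is an assumption, since the example only needs the rightward step. Only one nucleus is handled, and sheet prediction and overlap resolution are omitted. Names are illustrative.

```python
# Chou-Fasman helix (P_A) and sheet (P_B) parameters as commonly tabulated.
P_A = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
       "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
       "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_B = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
       "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
       "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def helix_nuclei(seq: str) -> list[int]:
    """Start positions of 6-residue windows with at least 4 residues of P(a) > 100."""
    return [i for i in range(len(seq) - 5)
            if sum(P_A[r] > 100 for r in seq[i:i + 6]) >= 4]


def is_breaker(window: str) -> bool:
    """Four consecutive residues that all have P(a) < 100 end a helix."""
    return len(window) == 4 and all(P_A[r] < 100 for r in window)


def extend_helix(seq: str, start: int) -> tuple[int, int]:
    """Grow the nucleus at `start` in both directions; returns half-open (lo, hi)."""
    lo, hi = start, start + 6
    while hi < len(seq) and not is_breaker(seq[hi:hi + 4]):
        hi += 1
    while lo > 0 and not is_breaker(seq[max(0, lo - 4):lo]):
        lo -= 1
    return lo, hi


seq = "CAENKLDHVRGPTCILFMTWYNDGP"
lo, hi = extend_helix(seq, helix_nuclei(seq)[0])
region = seq[lo:hi]
sum_a = sum(P_A[r] for r in region)
sum_b = sum(P_B[r] for r in region)
print(region, sum_a, sum_b, len(region) > 5 and sum_a > sum_b)  # CAENKLDHV 972 843 True
```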

Chou-Fasman, cont'd
Identify hairpin turns. For a window starting at residue i:
$P(t) = f(i) \times f(i+1) \times f(i+2) \times f(i+3)$
where f(i) is taken for the residue at position i, f(i+1) for the next residue, f(i+2) for the following residue, and f(i+3) for the residue at position i+3.
Predict a hairpin turn starting at positions where:
P(t) exceeds a threshold
the average P(turn) for the four residues is > 100
ΣP(a) < ΣP(turn) > ΣP(b) for the four residues
Accuracy: 60-65%
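
The turn test can be sketched the same way. Because the P(t) cutoff is not specified above, the function takes it as a parameter; the example call uses 0.000075, the cutoff usually quoted for this rule, which is an assumption here. The parameters for V, R, G and P are likewise the commonly reproduced Chou-Fasman values, given only for the four residues of the VRGP example; the names are illustrative.

```python
from math import prod

# Subset of Chou-Fasman parameters:
# (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3))
PARAMS = {
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
}


def turn_stats(tetra: str) -> tuple[float, float, float, float]:
    """Return P(t) and the average P(a), P(b), P(turn) for a four-residue window."""
    rows = [PARAMS[r] for r in tetra]
    p_t = prod(rows[k][3 + k] for k in range(4))  # f(i) * f(i+1) * f(i+2) * f(i+3)
    avg_a, avg_b, avg_turn = (sum(row[c] for row in rows) / 4 for c in range(3))
    return p_t, avg_a, avg_b, avg_turn


def is_hairpin_turn(tetra: str, p_t_cutoff: float) -> bool:
    """Apply the three turn conditions to a four-residue window."""
    p_t, avg_a, avg_b, avg_turn = turn_stats(tetra)
    return p_t > p_t_cutoff and avg_turn > 100 and avg_a < avg_turn > avg_b


print(turn_stats("VRGP"))
print(is_hairpin_turn("VRGP", p_t_cutoff=0.000075))  # True
```

Averages are used instead of sums in the last condition; for a fixed window of four residues the comparison is the same.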

Chou-Fasman Example
Sequence: CAENKLDHVRGPTCILFMTWYNDGP
CAENKL: potential helix (4 of the 6 residues have P(a) > 100; C and N do not)
Residues with P(a) < 100: R, N, C, G, P, S, T, Y
Extend: when we reach RGPT (four residues with P(a) < 100), we must stop.
CAENKLDHV: ΣP(a) = 972, ΣP(b) = 843, so declare an alpha helix.
Identifying a hairpin turn at VRGP: compute P(t), the average P(turn), the average P(a) (79.5), and the average P(b).