Homology Modelling. Credits: Stephen College Vanathi 2080

Size: px
Start display at page:

Download "Homology Modelling. Credits: Stephen College Vanathi 2080"

Transcription

1 Homology Modelling Credits: Stephen College Vanathi 2080

2 Why we should care more about structure and less about informatics

3 Why we should care more about structure and less about informatics Structure and dynamics are needed to infer function.

4 Why we should care more about structure and less about informatics Structure and dynamics are needed to infer function. There is more to bioinformatics than AGCT or protein sequence

5 (from Mount, Bioinformatics Sequence and Genome Analysis primary (1º) secondary (2º) quaternary (4º) tertiary (3º) From: Brandon & Tooze, Introduction to Protein Structure

6 (from Mount) primary (1º) secondary (2º) quaternary (4º) tertiary (3º) From: Brandon & Tooze, Introduction to Protein Structure

7 om: Brandon & Tooze, Introduction to Protein Structure amino acids/proteins: Glycine G Alanine A Phenylalanine PHE Leucine L Isoleucine ILE Valine VAL Proline PRO Methionine MET Glutamic acid E Aspartic acid ASP Glutamine GLN Asparagine ASN Lysine LYS Arginine R GLY ALA F LEU I V P M GLU D Q N K ARG

8 amino acids/proteins: Hydrophobi c amino acids Glycine G Alanine A Phenylalanine PHE Leucine L Isoleucine ILE Valine VAL Proline PRO Methionine MET Glutamic acid E Aspartic acid ASP Glutamine GLN Asparagine ASN Lysine LYS Arginine R GLY ALA F LEU I V P M GLU D Q N K ARG

9 HO Hydrophobic amino acids H2N H2N Alanine ALA A Phenylalanine PHE Leucine LEU L Isoleucine ILE Valine VAL Proline PRO Methionine MET O HO O CH3 Phenylalanine Alanine HO HO H2N H2N O O H2N OH N Leucine HO Isoleucine O F I V P M OH O S Proline H2N O Valine Methionine

10 HO Hydrophobic amino acids H2N H2N Alanine ALA A Phenylalanine PHE Leucine LEU L Isoleucine ILE Valine VAL Proline PRO Methionine MET O HO O CH3 Phenylalanine Alanine HO HO H2N H2N O O H2N OH N Leucine HO Isoleucine O F I V P M OH O S Proline H2N O Valine Methionine Helix breaker (except at N-caps)

11 amino acids/proteins: Charged amino acids Charged residues tend to reside on protein surface Glycine G Alanine A Phenylalanine PHE Leucine L Isoleucine ILE Valine VAL Proline PRO Methionine MET Glutamic acid E Aspartic acid ASP Glutamine GLN Asparagine ASN Lysine LYS Arginine R GLY ALA F LEU I V P M GLU D Q N K ARG

12 OH H2N Charged amino acids: Glutamic acid GLU E Aspartic acid ASP D Lysine LYS K Arginine ARG R Histidine HIS H OH H2N O O O -O O- O Glutamic acid Aspartic acid OH OH H2N OH H2N O O H2N HN NH NH3 Lysine NH + H2N O NH2 Arginine + + Histidine

13 OH H2N Charged amino acids: Glutamic acid GLU E Aspartic acid ASP D Lysine LYS K Arginine ARG R Histidine HIS H OH H2N O O O -O O- O Glutamic acid Aspartic acid OH OH H2N OH H2N O O H2N HN NH NH3 Lysine NH + H2N O NH2 Arginine + + Histidine

14 amino acids/proteins: Polar amino acids Glycine G Alanine A Phenylalanine PHE Leucine L Isoleucine ILE Valine VAL Proline PRO Methionine MET Glutamic acid E Aspartic acid ASP Glutamine GLN Asparagine ASN Lysine LYS Arginine R GLY ALA F LEU I V P M GLU D Q N K ARG

15 OH Polar amino acids: OH H2N O H2N Glutamine Asparagine Serine Threonine Tyrosine TYR Y Tryptophan W Histidine Cysteine O O H2N NH2 O Glutamine Asparagine OH OH H2N H2N O O OH OH Serine OH H2N H2N GLN ASN SER THR Q N S T TRP HOHIS H C CYS O O Threonine NH HO H2N O SH Cysteine OH Tyrosine Tryptophan

16 Protein Architecture Proteins consist of amino acids linked by peptide bonds Each amino acid consists of: a central carbon atom an amino group a carboxyl group and a side chain Differences in side chains distinguish the various amino acids

17 Peptide Bonds

18 Amino Acid Side Chains Vary in: Size Shape Polarity

19 What determines fold? Anfinsen s experiments in 1957 demonstrated that proteins can fold spontaneously into their native conformations under physiological conditions. This implies that primary structure does indeed determine folding or 3-D stucture. Some exceptions exist Chaperone proteins assist folding Abnormally folded Prion proteins can catalyze misfolding of normal prion proteins that then aggregate

20 Other factors Physical properties of protein that influence stability & therefore, determine its fold: Rigidity of backbone Amino acid interaction with water Hydropathy index for side chains Interactions among amino acids Electrostatic interactions

21 Conformation Flexibility Backbone (main chain of atoms in peptide bonds, minus side chains) conformation: Torsion or rotation angles around: C-N bond (Φ) C-C bond (Ψ) Sterical hinderance: Most Pro Least - Gly

22 Levels of Description of Structural Complexity Primary Structure (AA sequence) Secondary Structure Spatial arrangement of a polypeptide s backbone atoms without regard to sidechain conformations α, β, coil, turns (Venkatachalam, 1968) Super-Secondary Structure α, β, α/β, α+β (Rao and Rassman, 1973) Tertiary Structure 3-D structure of an entire polypeptide Quarternary Structure Spatial arrangement of subunits (2 or more

23 Primary Structure

24 Structure: Helices ALPHA HELIX : a result of H-bonding between every fourth peptide bond (via amino and carbonyl groups) along the length of the polypeptide chain Individual Amino acid H-bond

25 Structure: Beta Sheets BETA PLEATED SHEET: a result of H-bonding between polypeptide chains

26 Tertiary Structure: Hexokinase (6000 atoms, 48 kd, 457 amino acids) polypeptides with a tertiary level of structure are usually referred to as globular proteins, since their shape is irregular and globular in form

27 Quarternary Structure: Haemoglobin

28 Phi and Psi Angles

29 How does a protein fold? Assume that the 2n torsional angles, φ and ϕ, of a n-residue protein each have three stable conformations. This yields 3 2n ~ 10n possible conformations for the protein. If a protein can explore new conformations at the rate that single bonds can reorient, it can find 1013 conformations per second. We can then calculate the time, t, required for a protein to explore all the conformations available to it: t = 10n/1013 For a small protein of n = 100 residues, t = 1087 s, which is immensely more than the apparent age of the universe (20 billion years = 6 x s). But we know that many proteins fold to their native conformation in less than a few seconds.

30

31 In theory Protein sequence contains all the information the cell needs to create a folded protein (Anfinsen s principle) Final conformation corresponds to the lowest energy state for that sequence Chaperones speed up folding but should not change the end

32 Therefore!!! Given the primary sequence, and the solvent properties, you have enough information to calculate the exact 3D structure of a protein. No extra information should be necessary

33 However!!! The computation required is orders of magnitude greater than anything generally available Blue Gene, the largest supercomputer ever, will predict the fold of one protein in about a year!!

34 Other: PS3 example

35 The Answer The only currently viable method is homology modelling Assumes that sequence identity translates into structural identity Uses known structures to supply a base set of co-ordinates around which to build a model

36 Predicting Protein Structure: Comparative Modeling (formerly, homology modeling) KQFTKCELSQNLYDIDGYGRIALPELICTMFH TSGYDTQAIVENDESTEYGLFQISNALWCKSS QSPQSRNICDITCDKFLDDDITDDIMCAKKIL DIKGIDYWIAHKALCTEKLEQWLCEKE? 1alc Homologous Share Similar Sequence Use as template & model KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKF ESNFNTQATNRNTDGSTDYGILQINSRWWCNDGR TPGSRNLCNIPCSALLSSDITASVNCAKKIVSDG NGMNAWVAWRNRCKGTDVQAWIRGCRL 8lyz

37 Structure prediction In an ideal world, we would be able to accurately predict protein structure from the sequence only! Because of the myriad possible configurations of a protein chain This goal can t reliably be achieved, yet. Knowledge based prediction vs. Simulation based on physical forces. Here we will only concern ourselves with knowledge-based methods, although we might use simulation in order to optimize our models.

38 Can we predict protein structures? MNIFEMLRID HLLTKSPSLN DEAEKLFNQD LDAVRRCALI LQQKRWDEAA TTFRTGTWDA EGLRLKIYKD AAKSELDKAI VDAAVRGILR NMVFQMGETG VNLAKSRWYN YKNL TEGYYTIGIG GRNCNGVITK NAKLKPVYDS VAGFTNSLRM QTPNRAKRVI ab initio folding simulation: not yet... Rosetta approach: neither... Fold recognition (threading): Often works, but...???

39 Approaches to predicting protein structures obtain sequence (target) fold assignment comparative modeling ab initio modeling build, assess model

40 Homology Modelling of Proteins Definition: Prediction of three dimensional structure of a target protein from the amino acid sequence (primary structure) of a homologous (template) protein for which an X-ray or NMR structure is available. Why a Model: A Model is desirable when either X-ray crystallography or NMR spectroscopy cannot determine the structure of a protein in time or at all. The built model provides a wealth of information of how the protein functions with information at residue property level. This information can than be used for mutational studies or for drug design.

41 Homology modeling = Comparative protein modeling = Knowledge-based modeling Idea: Extrapolation of the structure for a new (target) sequence from the known 3D-structures of related family members (templates).

42 How valid is this approach? Look at the PDB database of all experimentally solved structures Compare structures of sequences with homology alpha-carbon co-ordinates for two proteins with 50% identity differ by around 1Å (RMSD)

43 Homology models can be very smart Homology models have RMSDs less than 2Å more than 70% of the time.

44 Sequence similarity implies structural similarity? identity/similarity Percentage sequence identity Sequence identity implies structural similarity Don t know 20 0 region... (B.Rost, Columbia, NewYork) Number of residues aligned 250

45 How does it work?

46 Identify Template Biggest hurdle!! Uses NRL3D database Search by BLAST or FASTA Look for P value lower than 10-5 Equates to ~30% identity Choose one or more templates

47 Modeling Fold Identification Aim: To find a template or templates structures from protein data base pairwise sequence alignment - finds high homology sequences BLAST Improved Multiple sequence alignment methods improves sensitivity - remote homologs PSIBLAST, CLUSTAL

48 Known Structures (Templates) Comparative Modeling Target Sequence Alignment Template - Target Protein Data Bank PDB Database of templates Separate into single chains Remove bad structures (models) Create BLAST database Template Selection Structure modeling Homology Model(s) Structure Evaluation & Assessment

49 Generate Alignment Critical for good model Most common cause of inaccuracy Trivial if identity > 70% Extremely difficult if identity < 40% Spend time, use all available data Integrate available biochemical data

50 Build Framework Builds a model for only the conserved areas in the alignment Averages the position of conserved Calpha atoms in the target sequence from the positions of the corresponding atoms in the templates Where multiple templates are used the position is biased based on the amount of local sequence homology

51 Model Building from template Core conserved regions Protein Fold Variable Loop regions Side chains Multiple templates Calculate the framework from average of all template structures Generate one model for each template and evaluate

52

53 I. Manual Modeling [ ]

54 II. Template based fragment assembly a) Build conserved core framework averaging core template backbone atoms (weighted by local sequence similarity with the target sequence) Leave non-conserved regions (loops) for later.

55 Dressing up the Core Model Core ModelRigid Body Assembly Add loops Add Side chains End Game in protein folding Molecular dynamics of all atoms in explicit solvent

56 Build Loops Not all of sequence will be conserved Divergence tends to be in loops, not main structural elements Short loops are built from a database of loop structures Different kinds of loops join different secondary structure elements Closest database match for each loop is built

57 II. Template based fragment assembly b) Loop modeling use the spare part algorithm to find compatible fragments in a Loop-Database ab-initio rebuilding of loops (Monte Carlo, molecular dynamics, genetic algorithms, etc.)

58 Loops result from substitution Mini protein folding s, insertions problem- 3 to 10 and residues longer in deletions in membrane proteins the same family Some Homology Ab Initio Compare the modeling methods loop sequence methods have generates string to DB less number of various and get hits loops to be random and evaluate. added because conformations of extensive of loops and multiple score sequence alignment of profiles Loop Builders

59 Construction of loops might be done by: Using database of loops which appear in known structures. The loops could be catagorised by their length or sequence Ab initio methods - without any prior knowledge. This is done by empirical scoring functions that check large number of conformations and evaluates each of them.

60

61

62 Add sidechains Often little information from template All side chains have preferred conformations (rotamers) More complex than backbone since more bonds to rotate (? angles) Preferred rotamers passed through VDW exclusion test Run a second pass to eliminate clashes

63 II. Template based fragment assembly c) Side Chain placement Find the most probable side chain conformation, using homologues structures back-bone dependent rotamer libraries energetic and packing criteria

64 Complete Backbone Initial backbone and loop construction only adds Calpha atoms Other N, C and O must be added Scans library of pentapeptides Accurate method Co-ordinates differ by ~0.2Å

65 Refine Automatic modelling will produce some unfavourable contacts Energy minimisation programs will alleviate these local problems Only used sparingly, only some atoms allowed to move Extensive use moves model away from actual co-ordinates

66 II. Template based fragment assembly d) Energy minimization modeling will produce unfavorable contacts and bonds idealization of local bond and angle geometry extensive energy minimization will move coordinates away keep it to a minimum SwissModel is using GROMOS 96 force field for a steepest descent

67 II. Template based fragment assembly d) Energy minimization

68 Evaluate More important for lower homology models Need to know whether model is any good Just because it s pretty doesn t make it right! Several programs available

69 What Check Checks Geometry of structure Looks at bond lengths and angles Compares to accepted values Highlights areas of concern Some concerns, even in experimental structures

70 1D 3D Check Each amino acid has certain probability of occurring in a given environment Environment is defined by solvent accessibility, buried / exposed, secondary structure etc. Can calculate total probability for all amino acids in protein Deviation from expected may indicate problems

71 Empirical Pair Potentials Looks at all pairwise interactions between amino acids in a structure Calculates the energy state for these interactions Unfavourable energy states indicate a problem Good for finding global errors

72 What if there are problems? Most common error is misaligned sequence with template Often unclear which alignment is correct from 1D view Construct several models from alignment variants and see which looks best

73 Comparable with Crystal Structure? No!!! A model is a tool - it is not a result Different models have different levels of accuracy Accuracy is usually determined by identity with template and correctness of alignment

74 How good is my model? Regions of sequence homology between structures generally differ by ~0.5Å. Best possible result Models based on >70% identity will approach this resolution Lower identity (40-50%) models may have greater variability but core should still be accurate Very low (30-40%) or misaligned models will give general shape only. May still position catalytic residues correctly

75 What can be modelled? Sequence Identity 90 % 50% 25% <25% Errors expected Similar to chrystallographically derived structures except for a few side chains RMS errors up to 2 Å with considerably larger local errors Alignment becomes extremely critical and large errors may follow Homology often undetected

76 Available Software Program MODELLER InsightII SWISS-MODEL SEGMOD Availabilit y Free WWW Method * Installatio n 3 Local Commercial 1 Local Free swissmod/ 1 Extern (WWW) Free cla.edu/genemine/ 2 Local *Method key: 1, comparative modelling by rigid-body assembly; 2, comparative modelling by segment matching; 3, comparative modelling by satisfaction of spatial restraints.

77 Swiss-Model Method: Knowledge-based approach. Requirements: At least one known 3D-structure of a related protein. Good quality sequence alignements. Procedures: Superposition of related 3D-structures. Generation of a multiple a alignement. Generation of a framework for the new sequence. Rebuild lacking loops. Complete and correct backbone. Correct and rebuild side chains. Verify model structure quality and check packing. Refine structure by energy minimisation and molecular dynamics.

78 Practical Example TNF muteins Given: htnf and mtnf are very similar htnf binds to mtnf-r1 but not to mtnf-r2 Observations: If we humanize mtnf by constructing a mutein mtnf-dy71-3sth/e89t we get a TNF that binds only to mtnf-r1 but is also much less stable! Why? Additional mutation of AA 102 -> mtnf-dy713sth/e89t/p102q restores the trimer stability. Why?

79 Using Deep View to perform protein modelling SPDBV (or Deep View) interacts with the SWISS Model server such that demanding modelling requests can be formulated. Deep View (or SPDBV) can be obtained from:

80 Step 1: Formatting your target sequence FastA format required = a raw sequence (no numbers / spaces / headers - just single letter amino acid codes), with a single header line which begins with a ">". => In our case this will look like: >mtnf-dy71-3sth/e89t LRSSSQNSSDKPVAHVVANHQVEEQLEWLSQRANALLANGMDLKDNQLVV PADGLYLVYSQVLFKGQGCPSTHVLLTHTVSRFAISYQTKVNLLSAVKSP CPKDTPEGAELKPWYEPIYLGGVFQLEKGDQLSAEVNLPKYLDFAESGQV YFGVIAL

81 Step 2: Loading your sequence Select: SwissModel => Load raw sequence to model Your protein will be loaded into the main view window. Because SPDBV has no structural information about this protein, it will temporarily model all of it as a perfect alpha-helix.

82 Step 3: Finding a template Identify a suitable homologue in the NRL3D database => Blast search of the ExPDB database => ExPDB database is taken directly from the PDB database (if a structure contains multiple chains then every chain is given a separate entry) Create a good sequence alignment between that sequence and your query

83 Step 3: Finding a template (II) Select: SwissModel => Find appropriate ExPDB templates Your default web browser opens up at the Blast search page at Expasy. Your sequence should already have been entered into the search form and you only have to press the submit button.

84 Step 3: Finding a template (III) The column you should first take interest in is the Blast Score. This is a measure of the statistical significance of the hits Blast has found. Basically, the lower this score, the better. The default cutoff score for automated modelling with Swiss Model is 1.0e-5, so anything lower than that is a good start. Clicking on the PDB reference on the right hand side will take you to the summary sheet for that structure in the PDB database.

85 Step 3: Finding a template (III)

86 Step 3: Finding a template (III)

87 Step 3: Finding a template (III) Remark! The search from within Swiss Model does not use the latest version of Blast (Blast 2.0), but rather the older Blast 1.4, not able to put gaps in the alignments. Therefore it is advisable to complement this search with other search tools. If your sequence happens to be 90% identical to another sequence in the database (as in our case) then it doesn't really matter which program you use to search it - they will all work, but if you have a sequence which is of lower homology then other programs may

88 Step 4: Saving and incorporating the template Use the import function ( File => Import ). You need to look at the PDB code(s) you have decided to use before performing the import. If your structure contains multiple protein chains then you only want to import one of them. For single chain structures simply enter the PDB code into the name window and press the: Grab from server => PDB file button. If you have a multi-chain structure then use: Grab from server => ExPDB file.

89 Step 4: Saving and incorporating the template (II) In our case type 2TNFC in the name box and press the PDB file button. This should automatically load the structure into the main view. Notice that although both the sequence and the template are now present in the same window they are not at all superimposed and SPDBV will not automate the modelling process.

90 Step 5: Threading your sequence Make sure you can see the alignment window. If you cannot, then open it by selecting: Window =>Alignment

91 Step 5: Threading your sequence (II) Ensure that the following items are selected (have a tick next to them): Swiss Model => Update threading display automatically Swiss Model => Auto colour by threading energy Make sure that the mutein1 sequence is active (click on its name in Layers infos ), it should appear in red in the Layers infos and Align windows. Then select: Fit =>Fit Raw Sequence

92 Step 5: Threading your sequence (III) Our long alpha helix disappears magically and wraps itself around the template. The program has superimposed the alpha-carbon backbones of the template and our query sequence. I have made the template invisible by clicking on the vis tab in the Layers infos, so we see just

93 Step 5: Threading your sequence (IV) Our model sequence has been recoloured and is mostly of a green colour. The colouring you can see on the model is the Threading energy. If a particular amino acid is happy in its environment, and isn't clashing with any surrounding amino acids, then it will have a low energy state, indicated by a green colour. If, however, the side chain of an amino acid comes too close to that of an adjacent group then this is an energetically unfavourable position and the residue will be coloured in orange or red.

94 Step 5: Threading your sequence (V) Before we submit our sequence to model the alignment should be optimized. This is done with two considerations in mind: How good is my alignment on a sequence level? Are all the critical parts of my protein aligned with their counterparts in the template? How good is my alignment in 3 dimentions? Does the alignment I have generated cause any nasty clashes with surrounding amino acids? In our case with 97% sequence identity between target and template these are obviously trivial

95 Step 6: Submit your modelling request Swiss Model => Submit Modelling Request This should bring up a file selector window asking you where you want to save the modelling request file. Save this with the other files you have generated on this project.

96 Step 6: Submit your modelling request (II) SPDBV will now automatically open a web browser and load your project file into the optimise mode of Swiss Model. After filling in all the requested information press the Send Request Button.

97 Step 6: Submit your modelling request (III) We receive back a confirmation form indicating that our submission was successful. This contains also a table indicating our choice of preferences.

98 Step 7: Evaluating the resulting model SwissModel_News Welcome_to_SwissModel TraceLog-AAAa005Pd WhatCheck-AAAa005Pd Files received from SWISS Model: News from Swiss Model - a bit of promotion for them, and reassurance for you that your address was OK Confirmation that your request has been received and has been submitted to the queue. This also contains your Trace log number which will uniquely identify your process if anything goes wrong! A log of all the processes Swiss Model has gone through to generate (hopefully!) the model you will receive. This will also tell you where and why a request failed. analysis report on your model. It The geometric gets to see the model before you do!! Obviously, you won't get this if you deselected the What Check option when you submitted your request. Model-AAAa005Pd Hopefully! If you get this file then Swiss Model has been able to generate a model for you. This message will have your model as an attachment in PDB format. You can open this up straight into SPDBV.

99 The trace file ProModII trace log for Batch.0 ============================================================ ProModII: ProMod version 3.5 date Jul :15 ProModII: SPDBV version 3.5 ProModII: Loop version 2.60 ProModII: LoopDB version 2.60 ProModII: ProModII: The trace file is the output from the Promod II program at Expasy Parameters version 3.5 which actually calculates your model. Topologies version 3.5 ProModII: Loading Project File: Batch.0 ProModII: N-terminal overhang trimmed for chain ' '. Start at residue: 9 ProModII: adding blocking groups ProModII: Adding Missing Sidechains ProModII: Building CSP loop with anchor residues SER 71 and VAL 74 ProModII: Number of Ligations found: 6 ProModII: all loops are bad; continuing CSP with larger segment ProModII: Building CSP loop with anchor residues PRO 70 and VAL 74 ProModII: Number of Ligations found: 87 ProModII: ACCEPTING loop 25: clash= 0 FF= ProModII: Optimizing Sidechains ProModII: Dumping Preliminary Model ProModII: Adding Hydrogens ProModII: Optimizing loops and OXT (nb = 5) ProModII: Final Total Energy: ProModII: Removing Hydrogens KJ/mol ProModII: Fixing Atom Nomenclature ProModII: Dumping Sequence Alignment *** -9.7 PP= -5.00

100 The model file Click on the right mouse button when pointing at the attachments icon at the bottom of the message and choose Save Link As. Pick a suitable location for the file.

101 Viewing our model Select: File =>Open PDB File. overlay of our model file with the template used in the modelling process Make the template invisible (Using vis in the Layers infos) so you can see just the model Default view: the atoms in CPK colouring with a ribbon representation overlaid, coloured by B-factor.

102 B-factors in models In a model the B-factor is a measure of how much structural information the modelling program was able to transfer from the template(s). A high B-factor doesn t necessarily mean that that part of the model is wrong, and more importantly, a low B-factor doesn t make it right. In general models are more accurate where they can take more information from the templates, but unless the underlying alignment is correct then this information may be applied to the wrong residues in the model.

103 The B-factor in mutein1

104 Looking for problems Bfactors colour the model by Bfactor and look for warm colours Only means that the program isn't confident about the structural assignment at a particular position (doesn't necessarily mean that the model is wrong at that point!).

105 Phi and Psi Angles

106 Restraints Only some psi and phi angles are allowed Van der Waals radii cannot overlap Get areas of allowed regions on a plot of psi vs phi These correspond to different types of secondary structure

107 Ramachandran Plot

108 Exceptions Anything outside allowed regions usually has problems Glycine can occupy excluded regions as it has no side chain Proline has an unusual conformation where the side chain binds to the backbone

109 Looking for problems Phi-Psi problems select all groups within the structure: Select => All view the plot: Window => Ramachandran Plot It is permissible only for Glycines (and occasionally Prolines) to have angles outside the usually accepted boundaries. His73 just fells outside the allowable region for RH alphahelices

110 What Check reports This report contains a lot of information about your model structure and most of it is probably not relevant. Different scales of problems: Errors - These are the really important ones. If a group has an error associated with it then the structure at that point is probably wrong. Warnings - This is a less serious message. A warning occurs when a group has bond angles or lengths which are unusual in protein structures, but not impossible. Crystal structures will often produce warnings when passed through What Check. Notes - This is not an error message at all but simply a place for putting a summary of statistical data about your As a modeller you don t have to sort out every error. Look where the problem is in protein. the protein. If it is on the end of an exposed loop - or far away from your active site, then it's probably not worth spending too much time trying to sort it out!!

111 Looking for problems Force field energies SPDBV provides the facility to be able to calculate the average energy for each group within a structure, based on its conformation and interaction with its neighbours. Select: Tools =>Compute energy (force field) Although you can select a subset of energies to calculate it is easiest to do everything.

112

113 Looking for problems Force field energies It would take a long time to go through each residue in the report and see whether there were any problems - however, what this energy report is especially good for it spotting general areas of concern. This is because it is possible to colour your model by Force Field energy using: Color => Force Field Energy. The absence of stretches of warm colours indicates that we have a reasonable model to work with.

114

115 If we humanize mtnf by constructing a mutein mtnfdy71-3sth/e89t we get a TNF that binds only to mtnfr1 but is also much less stable! Why?

116 Surface Determines What Binds Steric access Shape Hydrophobic accessible surface Electrostatic surface Sequence and structure optimized to generate surface properties for requisite binding event(s)

117 Hydrogen Bonds Some elements (esp. O and N) can draw electrons away from surrounding hydrogens This creates a partial charge These partial charges can interact to form hydrogen bonds Water can form very good H-bonds

118 Hydrogen Bonds

119 Hydrogen Bonding Amino Acids

120 Additional mutation of AA 102 -> mtnf-dy71-3sth/e89t/p102q restores the trimer stability. Why?

121 Scheme Any given protein sequence Check sequence identity with proteins with known structure > 35% < 35% Homology Modeling Fold Recognition < 35% ab initio Folding Structure selection Structure refinement Final Structure

122 Hydrophobic Effects Main driving force for protein folding Greatest free energy change from random to folded state Brings together hydrophobic residues Increases the entropy (disorder) of surrounding solvent Counteracts decreased entropy in protein

123 Hydrophobic Effects In bulk water the molecules are disordered and form H-bonds in all directions Around a hydrophobic molecule water can only form H-bonds away from the molecule and thus becomes more ordered To minimise this decrease in entropy the surface area of the hydrophobic molecules must be reduced. This is best achieved by gathering them together, as in a folded protein - or a separated oil / water mix.