Structure formation and association of biomolecules. Prof. Dr. Martin Zacharias Lehrstuhl für Molekulardynamik (T38) Technische Universität München

Motivation: Many biomolecules are synthesized in the cell as linear chain molecules. Chain molecules (proteins, RNA) need to form a defined three-dimensional structure; this is the structure formation or folding process. The folded three-dimensional structure of a biomolecule is directly related to its function.

Biomolecule function depends on three-dimensional biomolecular structure. [Figure: protein-peptide complex]

Protein folding: Proteins can fold from an extended chain into a compact globular structure (some proteins need help to adopt the correct structure in vivo). The fold is determined by the amino acid sequence. The final three-dimensional (3D) protein structure has a well-packed core. The 3D structure has limited stability: the folded structure can be disrupted by heat and by changes in solution conditions.

Interactions of biomolecules: Most biomolecules form functional complexes. Many functions are mediated by multiple protein-protein interactions. What are the driving forces for such interactions? Specific vs. non-specific interactions. What happens during a binding process? Can we predict which proteins interact and how?

Outline
- Protein structure
- Folded and unfolded state
- Molecular interactions and forces in solution
- Thermodynamics of protein folding
- Experimental methods to study protein folding
- Theories of protein folding
- Structure formation of nucleic acids
- Computer simulation of structure formation
- Mechanism of biomolecular association
- Prediction of binding (geometry and affinity)
- Talks on interesting papers

Aim: Qualitative and quantitative understanding of folding and binding processes. What can be observed? What are the current theories on folding and association? What are the forces that drive structure formation and folding? Topics: the protein folding problem, folding of nucleic acids, prediction of 3D structures, prediction of how biomolecules interact, computer simulation of folding and binding.

Overview of protein structure (folded state)
- Primary structure: sequence of the protein, e.g. -Gly-Ala-Arg-Phe-Val- (G A R F V)
- Secondary structure: regular local structural elements stabilized by hydrogen bonds
- Tertiary structure: three-dimensional fold of the protein
- Quaternary structure: non-covalent assembly of several proteins to form a functional complex

Primary structure of proteins
- Amino acid sequence of a protein: primary protein structure.
- 20 naturally occurring amino acids are found in proteins.
- Amino acids have a common chemical structure: a tetrahedral (sp3) carbon atom (Cα) bound to four different groups (making Cα an asymmetric center; glycine, with two H atoms, is the exception): 1. amino group (NH2), 2. carboxy group (COOH), 3. H atom, 4. functional chemical group (side chain), characteristic for the amino acid.
- Naturally occurring amino acids are typically L-amino acids.
[Figure: amino acid with Cα, Cβ, Hα, NH2 and COOH labeled]

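The geometry of the peptide bond: Amino acids are connected via peptide bonds, giving a polypeptide chain. A peptide bond is formed by a condensation reaction. The peptide unit NH-CO (dihedral angle omega, ω) is planar (small deviations are possible) and is mostly in the trans configuration (the cis configuration occurs at a notable frequency mainly for bonds preceding Pro). [Figure: dipeptide backbone with the dihedral angles phi, psi and omega and the side chains R1 and R2]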
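The backbone angles phi, psi and omega are dihedral angles, each defined by four consecutive backbone atoms. The following is a minimal sketch of how such an angle can be computed from Cartesian coordinates; the function name `dihedral` and the example coordinates are hypothetical and chosen only so that the planar trans and cis arrangements are easy to recognize.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (p1-p2 is the central bond)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)          # normal of the plane p0, p1, p2
    n2 = cross(b2, b3)          # normal of the plane p1, p2, p3
    len_b2 = math.sqrt(dot(b2, b2))
    y = len_b2 * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical planar test geometry, central bond along the x axis.
trans_angle = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0))
cis_angle = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
```

For omega the four atoms would be Cα(i), C(i), N(i+1) and Cα(i+1); a planar trans peptide unit then corresponds to an angle close to 180°, a cis unit to an angle close to 0°.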

Amino acids can be grouped based on side chain character
- Non-polar side chains (hydrophobic): Alanine (Ala), Valine (Val), Isoleucine (Ile), Leucine (Leu), Methionine (Met), Phenylalanine (Phe), Proline (Pro) and Tryptophan (Trp), also Glycine (Gly)
- Polar side chains (hydrophilic): Serine (Ser), Threonine (Thr), Cysteine (Cys), Asparagine (Asn), Glutamine (Gln), Tyrosine (Tyr) and Histidine (His)
- Charged side chains (hydrophilic), positive charge: Arginine (Arg) and Lysine (Lys). Depending on the pH, Histidine (His) can also be positively charged.
- Charged side chains (hydrophilic), negative charge: Glutamic acid (Glu) and Aspartic acid (Asp)

Secondary structure of proteins: Regular structural elements which are formed by hydrogen bonds between relatively small segments of the protein sequence. Important secondary structures: α-helix, β-sheet, turns, coil structures. α-helix: hydrogen bonds between the CO of residue i and the NH of residue i+4; 3.6 residues per helical turn; helical rise of 1.5 Å per residue. [Figure: β-sheet and β-turn]
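The two α-helix parameters given above (3.6 residues per turn, 1.5 Å rise per residue) fix the remaining helix geometry. A short sketch of the arithmetic follows; the function names are hypothetical and introduced only for illustration.

```python
RESIDUES_PER_TURN = 3.6   # from the slide
RISE_PER_RESIDUE = 1.5    # in Å, from the slide

def helix_pitch():
    """Length of one full helical turn along the helix axis, in Å."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE

def rotation_per_residue():
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN

def helix_length(n_residues):
    """Approximate length along the axis of an ideal helix, in Å."""
    return n_residues * RISE_PER_RESIDUE

if __name__ == "__main__":
    print("pitch (Å):", helix_pitch())
    print("rotation per residue (deg):", rotation_per_residue())
    print("length of a 20-residue helix (Å):", helix_length(20))
```

The pitch comes out at about 5.4 Å per turn and the rotation at 100° per residue, so an ideal 20-residue helix spans roughly 30 Å along its axis.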

Super-secondary structures: To some degree, tertiary structure can be viewed as a 3D 'packing' of secondary structural elements. Analysis of 3D protein structures indicates the existence of recurring patterns of secondary structure arrangements, called motifs or super-secondary structures. Common super-secondary structures (motifs) are: β-hairpin, β-meander, βαβ-motif, helix-loop-helix motif. [Figure: examples of a β-hairpin, a βαβ-motif and a helix-loop-helix motif]

Tertiary structure of proteins: The tertiary structure of a protein is the 3D arrangement (atomic coordinates) of its folded chain. This tertiary structure is directly related to its function. Some combinations of super-secondary elements are frequently found in proteins: the four-α-helix bundle (two α-helix-loop-α-helix motifs connected by a loop) and the βαβαβ (Rossmann) fold (two βαβ motifs sharing one β-strand). Not all helices and strands in a folded protein belong to super-secondary structural elements. Database of protein structures: Brookhaven Protein Data Bank, http://www.rcsb.org/pdb

Classification of protein tertiary structures: Tertiary structure describes the packing of structural elements within one domain. Classification: all (or mostly) α-helical, also called all-α; all (or mostly) β-sheet containing, called all-β; α/β proteins, which contain both helices and β-sheets (sometimes an additional class, α+β, is used for segregated α-helices and β-sheets); proteins with very little secondary structure content may form another class.

Spatial distribution of amino acids in folded proteins: The spatial distribution of amino acids with respect to the center of a folded globular protein is not random. Hydrophobic amino acids are found preferentially inside the folded protein. Hydrophilic and charged amino acids are more frequently located at the surface of the protein. This observation indicates that the solvent plays a dominant role in the structure formation of a protein. The 3D fold can be further stabilized by disulfide bonds.

Protein domains: Domains are contiguous portions of the protein chain that fold into compact (semi-independent) tertiary structures (definition by Richardson, 1981). Some proteins consist of only one domain. However, proteins with several hundred residues in particular often consist of several domains. In some cases, different functions of one protein can be associated with different domains. Example: SHP2 (phosphotyrosine phosphatase 2) with an N-SH2 domain, a PTP domain and a C-SH2 domain.

Is the number of possible 3D protein folds limited? Observation: many sequences form similar structures.

Atomic-resolution structures of biomolecules are stored at the Protein Data Bank (PDB). It contains ~35000 structures (mostly determined by X-ray crystallography), with several new structures added per day.

Methods to determine the three-dimensional structure of proteins: X-ray crystallography is the most powerful and most successful technique to obtain high-resolution structures of biomolecules. ~80% of all structures in the PDB have been solved by X-ray crystallography (X-ray diffraction of protein/nucleic acid crystals), ~20% by NMR spectroscopy, and very few by other experimental techniques. A wide range of biomolecules and complexes can be analyzed by X-ray crystallography, but it requires high-quality (well-ordered) single crystals of the biomolecule. NMR spectroscopy allows the structure of a (small) protein to be determined in solution.

Examples of structures solved by X-ray crystallography: nucleosome, lysozyme, K+ channel, RNA polymerase, ribosome. Structures from a few hundred up to millions of daltons can be solved if high-quality crystals can be obtained.

The unfolded state of proteins: The term unfolded state can have several different meanings: (1) a random coil structure of a peptide with negligible residual interactions (except chain connectivity); (2) the state formed immediately after synthesis (extended chain); (3) the state(s) formed after denaturation of a protein. Denaturation can be achieved by changing the temperature or the solution conditions. What is the unfolded state of a protein?

Random walk or stochastic chain model: Freely jointed chain with constant bond length and no restriction on the orientation of segments. The average (root mean square) end-to-end distance R of the chain scales with $N^{0.5}$, where N is the number of segments. Very common (but also most unrealistic) model of an unfolded protein chain.
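The square-root scaling of the freely jointed chain can be checked numerically. The sketch below is a minimal Monte Carlo illustration, not a model of a real protein: segment directions are drawn uniformly on the unit sphere, and the function names, chain counts and seed are arbitrary choices made for this example.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly distributed on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return r * math.cos(phi), r * math.sin(phi), z

def rms_end_to_end(n_segments, n_chains=1000, bond_length=1.0, seed=0):
    """Root mean square end-to-end distance of freely jointed chains."""
    rng = random.Random(seed)
    sum_sq = 0.0
    for _ in range(n_chains):
        x = y = z = 0.0
        for _ in range(n_segments):
            ux, uy, uz = random_unit_vector(rng)
            x += bond_length * ux
            y += bond_length * uy
            z += bond_length * uz
        sum_sq += x * x + y * y + z * z
    return math.sqrt(sum_sq / n_chains)

if __name__ == "__main__":
    # Compare the sampled value with bond_length * sqrt(N).
    for n in (25, 100, 400):
        print(n, round(rms_end_to_end(n), 2), round(math.sqrt(n), 2))
```

Quadrupling the number of segments should roughly double the sampled distance, within the statistical noise of the finite number of chains.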

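Experiment: $R_g = R_0 N^{\nu}$ with $\nu = 0.598 \pm 0.028$. Theoretical prediction for a random coil with excluded volume (good solvent): $\nu = 0.588$. For comparison, an unperturbed chain in a Θ-solvent, as described by the rotational isomeric state model (Flory, 1969), has $\nu = 0.5$.

Rigid segment simulations (rigid segments connected by flexible linkers): $R_g = R_0 N^{\nu}$ with $\nu = 0.602 \pm 0.028$ (excluded-volume random coil: $\nu = 0.588$).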
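A scaling exponent of this kind is typically obtained from a straight-line fit of log(Rg) against log(N). The sketch below shows the fit on synthetic data only: the prefactor `R0 = 2.0` Å and the chain lengths are hypothetical values chosen for illustration, and the exponent used to generate the data is the experimental value quoted above.

```python
import math

def fit_power_law(lengths, radii):
    """Least-squares fit of log(Rg) = log(R0) + nu * log(N); returns (R0, nu)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx
    r0 = math.exp(mean_y - nu * mean_x)
    return r0, nu

# Synthetic, noise-free data (hypothetical R0, exponent taken from the slide).
R0_ASSUMED = 2.0      # Å, illustrative assumption
NU_ASSUMED = 0.598
chain_lengths = [10, 30, 100, 300, 500]
rg_values = [R0_ASSUMED * n ** NU_ASSUMED for n in chain_lengths]

r0_fit, nu_fit = fit_power_law(chain_lengths, rg_values)
```

With real measurements the points scatter around the line, and the quoted uncertainty of the exponent reflects that scatter.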

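Energetic evaluation with a continuum solvent model. Contributions to the total energy of a molecular conformation: $E_{vacuum} + E_{solv,nonpolar} + E_{solv,polar} + E_{solv,salt}$, where $E_{vacuum}$ comes from the force field, $E_{solv,nonpolar}$ is approximately proportional to the surface area, $E_{solv,polar}$ is obtained from the Poisson equation, and $E_{solv,salt}$ from the Poisson-Boltzmann equation. [Figure: charges in a continuum solvent]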

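$\Delta G_{solv} = \Delta G_{solv}(elec) + \Delta G_{solv}(nonpolar)$
$\Delta G_{solv}(elec) = E_{PB}(\varepsilon_{in} = 2;\ \varepsilon_{ex} = 80) - E_{PB}(\varepsilon_{in} = 2;\ \varepsilon_{ex} = 1)$
$\Delta G_{solv}(nonpolar) = b + \gamma \cdot SASA$, with $b = 0.86$ kcal mol$^{-1}$ and $\gamma = 0.005$ kcal mol$^{-1}$ Å$^{-2}$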
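The decomposition above is simple arithmetic once the two Poisson-Boltzmann energies and the solvent-accessible surface area (SASA) are available. The sketch below only combines these numbers; it does not solve the Poisson-Boltzmann equation. The function names and the input values in the example are hypothetical, while `b` and `γ` are the values stated above.

```python
B_OFFSET = 0.86   # kcal mol^-1, from the slide
GAMMA = 0.005     # kcal mol^-1 Å^-2, from the slide

def nonpolar_solvation(sasa):
    """Non-polar solvation free energy b + gamma * SASA (SASA in Å^2)."""
    return B_OFFSET + GAMMA * sasa

def electrostatic_solvation(e_pb_eps80, e_pb_eps1):
    """Difference of the PB energies with exterior dielectric 80 and 1."""
    return e_pb_eps80 - e_pb_eps1

def solvation_free_energy(e_pb_eps80, e_pb_eps1, sasa):
    """Total solvation free energy in kcal mol^-1."""
    return electrostatic_solvation(e_pb_eps80, e_pb_eps1) + nonpolar_solvation(sasa)

if __name__ == "__main__":
    # Hypothetical inputs: PB energies in kcal mol^-1, SASA in Å^2.
    print(solvation_free_energy(e_pb_eps80=-12.0, e_pb_eps1=-2.0, sasa=200.0))
```

For a small solute the non-polar term is only a few kcal mol$^{-1}$, so the electrostatic difference usually dominates for polar molecules.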

Good agreement with experiment for the PARSE set of optimized charges and radii.

Pure SASA model. The authors use two different solvation models to calculate the helix nucleation and propagation parameters σ and s for a zipper model of (poly-Ala) α-helix folding. All models give good agreement of the calculated solvation free energy with experiment for a model compound (N-methylacetamide): $\Delta G_{solv}(exp) = -10.0$ kcal mol$^{-1}$

Plotting $\Delta G_i$ vs. $i$ gives $RT\ln(\sigma)$ and $RT\ln(s)$ (up to the sign convention chosen for $\Delta G_i$) as intercept and slope, respectively.
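The linear plot described above can be sketched as a straight-line fit. The sign convention is an assumption made here, not taken from the slide: the free energy of a helix of i residues is written as ΔG_i = -RT ln(σ s^i), so the intercept equals -RT ln(σ) and the slope equals -RT ln(s). The values of σ, s and the temperature used to generate the synthetic data are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def zipper_parameters(helix_lengths, delta_g, temperature=298.0):
    """Extract (sigma, s) assuming dG_i = -RT ln(sigma) - i * RT ln(s)."""
    rt = R_KCAL * temperature
    intercept, slope = fit_line(helix_lengths, delta_g)
    return math.exp(-intercept / rt), math.exp(-slope / rt)

def synthetic_delta_g(i, sigma, s, temperature=298.0):
    """Synthetic dG_i (kcal mol^-1) under the same assumed convention."""
    return -R_KCAL * temperature * math.log(sigma * s ** i)

# Hypothetical parameters used only to generate test data.
lengths = list(range(1, 11))
energies = [synthetic_delta_g(i, sigma=0.003, s=1.3) for i in lengths]
sigma_fit, s_fit = zipper_parameters(lengths, energies)
```

Under this convention, a small σ gives a large positive intercept (nucleation is costly), while s > 1 gives a negative slope (each added helical residue is favorable).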

Improved continuum modelling of non-polar (hydrophobic) solvation: Continuum modelling of the hydrophobic contribution using a single surface tension parameter gives good predictions for the solvation of linear alkanes but poor results for cyclic alkanes. [Figure: cyclo-alkanes vs. branched and n-alkanes] Separation into two parts: A) water reordering at the molecule-water interface (proportional to surface area); B) molecule-water van der Waals interaction (surface integral). This gives a significant improvement for cyclic alkanes and for conformational changes. Zacharias, M., J. Phys. Chem. 2003