The Structure Lectures

Boris Steipe
boris.steipe@utoronto.ca
http://biochemistry.utoronto.ca/steipe
Departments of Biochemistry and Molecular and Medical Genetics, Program in Proteomics and Bioinformatics, University of Toronto

The Structure Lectures
8.0 Lecture: Use of protein structures
8.1 Lab: Visualization of structure
9.2 Lecture: Homology and structural similarity
9.3 Lab: Homology modeling

License: http://creativecommons.org/licenses/by-sa/2.0/

Lecture 8.0: Use of Protein Structure
(Some slides have been taken from a lecture given by Chris Hogue, Toronto, for CBW in 2002.)

Concepts
1. "Sequence" and "structure" are abstractions of biopolymers.
2. Structure can be determined experimentally.
3. Structure abstractions can be stored, retrieved and visualized.
4. Knowledge of structure allows mechanistic explanations.
5. Structure is not arbitrary, but comes in units: motifs, helices, strands, domains and complexes.
6. Domains are folding units, functional units and units of inheritance.

Concept 1: "Sequence" and "structure" are abstractions of biopolymers.

Physical Amino Acids and Amino Acid Abstractions
Formula (residue in a chain): C9H9NO2
SMILES string: [CH]([NH][R])([C](=[O])[R])[CH2]-[c]1([cH][cH][c]([cH][cH]1)[OH])
Name: Tyrosine; 3-letter code: Tyr; 1-letter code: Y
ATOM 1091 N TYR 145 -35.676 -13.136 50.622 1.00 10.36
ATOM 1092 CA TYR 145 -36.931 -13.763 51.019 1.00 10.63
ATOM 1093 C TYR 145 -37.676 -12.879 52.016 1.00 11.16
ATOM 1094 O TYR 145 -37.061 -12.316 52.926 1.00 13.91
ATOM 1095 CB TYR 145 -36.660 -15.140 51.638 1.00 9.52
ATOM 1096 CG TYR 145 -37.845 -15.737 52.361 1.00 6.36
ATOM 1097 CD1 TYR 145 -38.144 -15.357 53.663 1.00 3.30
ATOM 1098 CD2 TYR 145 -38.691 -16.652 51.727 1.00 6.14
ATOM 1099 CE1 TYR 145 -39.248 -15.856 54.311 1.00 5.57
ATOM 1100 CE2 TYR 145 -39.804 -17.165 52.376 1.00 4.89
ATOM 1101 CZ TYR 145 -40.076 -16.757 53.670 1.00 4.35
ATOM 1102 OH TYR 145 -41.170 -17.231 54.345 1.00 4.44
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

The Concept of Abstract Amino Acids Allows Highly Compressed Information
Abstract properties of tyrosine (Y): bulky, hydrophobic, aromatic, H-bond donor, H-bond acceptor, nucleophile, phospho-acceptor, two side-chain rotational degrees of freedom.

2D map of amino acid similarity
[Figure: 2D map of amino acid similarity.]
Legend: A Ala Alanine; C Cys Cysteine; D Asp Aspartic acid; E Glu Glutamic acid; F Phe Phenylalanine; G Gly Glycine; H His Histidine; I Ile Isoleucine; K Lys Lysine; L Leu Leucine; M Met Methionine; N Asn Asparagine; P Pro Proline; Q Gln Glutamine; R Arg Arginine; S Ser Serine; T Thr Threonine; V Val Valine; W Trp Tryptophan; Y Tyr Tyrosine. C(S-S) is cysteine in a disulfide bond; C(SH) indicates the free thiol.

The Concept of Abstract Amino Acid Similarity is Lossy
- Bulky: FILQRYW
- Hydrophobic: FAMILYVW
- H-bond donor: CHKNQRSTWY
- Nucleophile: CDESTY
- Phospho-acceptor: STY
- H-bond acceptor: DEHNQSTY
- Two side-chain rotational degrees of freedom: CDFHSW
- Aromatic: FWH
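To make the loss concrete, here is a small sketch that encodes the eight property sets exactly as listed on the slide and groups residues whose property profiles come out identical. The dictionary keys are names chosen here for illustration, not a standard vocabulary.

```python
from collections import defaultdict

# Property sets copied from the slide (one-letter codes); key names are illustrative.
PROPERTY_SETS = {
    "bulky": set("FILQRYW"),
    "hydrophobic": set("FAMILYVW"),
    "h_bond_donor": set("CHKNQRSTWY"),
    "nucleophile": set("CDESTY"),
    "phospho_acceptor": set("STY"),
    "h_bond_acceptor": set("DEHNQSTY"),
    "two_side_chain_rotations": set("CDFHSW"),
    "aromatic": set("FWH"),
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(aa):
    """Return the set of abstract properties of one residue."""
    return frozenset(name for name, members in PROPERTY_SETS.items() if aa in members)


def indistinguishable_groups():
    """Residues sharing an identical profile cannot be told apart by this abstraction."""
    groups = defaultdict(list)
    for aa in AMINO_ACIDS:
        groups[profile(aa)].append(aa)
    return sorted("".join(g) for g in groups.values() if len(g) > 1)


print(indistinguishable_groups())  # ['AMV', 'GP', 'IL']
```

Ile and Leu, for example, receive the same profile although their side chains differ in shape; whatever distinguishes them is lost in this representation.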

Structure Contextualizes Sequence
Example: Tyr262 in 1ERQ.pdb, in the sequence context V V I Y T T G.

Structural Abstraction
To store structures we need coordinate, topology, and chemical type information.
[Figure: methionine (Met) on x, y, z axes, side-chain atoms labelled α to ε; atom types sulphur, carbon, oxygen, nitrogen.]
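A minimal sketch of such a structural abstraction, assuming a simple in-memory model (the class and method names are made up for illustration): each atom carries a chemical type and coordinates, and the topology is an explicit list of bonds. The coordinates are the side-chain atoms of TYR 145 from the ATOM records shown earlier in this lecture; combining topology with coordinates yields bond lengths.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str      # atom name, e.g. "CZ"
    element: str   # chemical type
    xyz: tuple     # Cartesian coordinates in Å


@dataclass
class Residue:
    name: str
    atoms: dict = field(default_factory=dict)   # atom name -> Atom
    bonds: list = field(default_factory=list)   # topology: pairs of atom names

    def add_atom(self, name, element, xyz):
        self.atoms[name] = Atom(name, element, xyz)

    def add_bond(self, a, b):
        self.bonds.append((a, b))

    def bond_lengths(self):
        """Bond lengths in Å, computed from topology plus coordinates."""
        return {
            (a, b): round(math.dist(self.atoms[a].xyz, self.atoms[b].xyz), 3)
            for a, b in self.bonds
        }


tyr = Residue("TYR")
for name, element, xyz in [
    ("CB", "C", (-36.660, -15.140, 51.638)),
    ("CG", "C", (-37.845, -15.737, 52.361)),
    ("CD1", "C", (-38.144, -15.357, 53.663)),
    ("CD2", "C", (-38.691, -16.652, 51.727)),
    ("CE1", "C", (-39.248, -15.856, 54.311)),
    ("CE2", "C", (-39.804, -17.165, 52.376)),
    ("CZ", "C", (-40.076, -16.757, 53.670)),
    ("OH", "O", (-41.170, -17.231, 54.345)),
]:
    tyr.add_atom(name, element, xyz)

# Side-chain topology of tyrosine: CB-CG, the aromatic ring, and the ring-OH bond.
for a, b in [("CB", "CG"), ("CG", "CD1"), ("CG", "CD2"), ("CD1", "CE1"),
             ("CD2", "CE2"), ("CE1", "CZ"), ("CE2", "CZ"), ("CZ", "OH")]:
    tyr.add_bond(a, b)

print(tyr.bond_lengths()[("CZ", "OH")])  # 1.37
```

Coordinates alone do not say which atoms are bonded; without the explicit topology, bonds would have to be guessed from distances.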

Concept 2: Structure can be determined experimentally.

Experimental sources of structure: X-ray
- Crystallization required
- Diffraction → data collection
- The phase problem: MAD, heavy-metal isomorphous derivatives, or "molecular replacement" give phase approximations
- Model building in electron density maps
- Refinement

Experimental sources of structure: X-ray
Crystallization is limiting. Diffraction is not imaging! Refinement is required.
[Figure: data vs. model.]
http://www-structure.llnl.gov/xray/101index.html

Experimental sources of structure: NMR
- High concentration required (~1 mM)
- Assignment of peaks, determination of cross-peaks → distance constraints
- Calculation of models from distance constraints
- Refinement

Experimental sources of structure: NMR
Result: an ensemble of structures that are compatible with the experimental distance constraints, and a consensus model (1DRO.PDB).
Key issues: concentration/solubility, assignment and NOEs, refinement.

Assessing structure quality
Metrics: resolution, R-factor and R-free; bond length and angle deviations. Coordinate error can be estimated from diffraction data.
The programs Whatcheck and Procheck calculate quality metrics:
http://www.sci.sdsu.edu/tfrey/bio750/bio750x-ray.html
http://swift.cmbi.kun.nl/wiwwwi//fullcheck.html
http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html (also NMR)
Rules of thumb for "good structures": resolution 2 Å or better, R-factor 20% or lower, mean coordinate error 0.2 Å or lower, RMSD of bond lengths 0.02 Å or lower.

Concept 3: Structure abstractions can be stored, retrieved and visualized.

The PDB
The PDB is the primary repository of protein structure data. http://www.rcsb.org/pdb

What's in a Structure File?
- Population experiments: X-ray gives one structure; NMR sometimes gives many structures.
- Incomplete: not all atoms are there (hydrogens, parts of the protein in motion).
- Crystallographic space: correct, but not always relevant.

The PDB format
Flat file, column oriented. Human readable. Human editable. Huge legacy problems.
Flat file: a data file without indexing structure or hierarchy, in contrast to a relational database or a data grammar.

Header
HEADER IMMUNOGLOBULIN 01-MAR-93 2IMM 2IMM 2
COMPND IMMUNOGLOBULIN VL DOMAIN (VARIABLE DOMAIN OF KAPPA LIGHT 2IMM 3
COMPND 2 CHAIN) OF MCPC603 2IMM 4
SOURCE HUMAN (HOMO $SAPIENS) RECOMBINANT SYNTHETIC M603 GENE 2IMM 5
AUTHOR B.STEIPE,R.HUBER 2IMM 6
REVDAT 1 15-JUL-93 2IMM 0 2IMM 7
REMARK 1 2IMM 8
REMARK 1 REFERENCE 1 2IMM 9
REMARK 1 AUTH B.STEIPE,A.PLUCKTHUN,R.HUBER 2IMM 10
REMARK 1 TITL REFINED CRYSTAL STRUCTURE OF A RECOMBINANT 2IMM 11
REMARK 1 TITL 2 IMMUNOGLOBULIN DOMAIN AND A 2IMM 12
REMARK 1 TITL 3 COMPLEMENTARITY-DETERMINING REGION 1-GRAFTED MUTANT 2IMM 13
REMARK 1 REF J.MOL.BIOL. V. 225 739 1992 2IMM 14
REMARK 1 REFN ASTM JMOBAK UK ISSN 0022-2836 070 2IMM 15
[...]
REMARK 2 2IMM 23
REMARK 2 RESOLUTION. 2.00 ANGSTROMS. 2IMM 24
REMARK 3 2IMM 25
[...]

Seqres
[...]
SEQRES 1 114 ASP ILE VAL MET THR GLN SER PRO SER SER LEU SER VAL 2IMM 35
SEQRES 2 114 SER ALA GLY GLU ARG VAL THR MET SER CYS LYS SER SER 2IMM 36
SEQRES 3 114 GLN SER LEU LEU ASN SER GLY ASN GLN LYS ASN PHE LEU 2IMM 37
SEQRES 4 114 ALA TRP TYR GLN GLN LYS PRO GLY GLN PRO PRO LYS LEU 2IMM 38
SEQRES 5 114 LEU ILE TYR GLY ALA SER THR ARG GLU SER GLY VAL PRO 2IMM 39
SEQRES 6 114 ASP ARG PHE THR GLY SER GLY SER GLY THR ASP PHE THR 2IMM 40
SEQRES 7 114 LEU THR ILE SER SER VAL GLN ALA GLU ASP LEU ALA VAL 2IMM 41
SEQRES 8 114 TYR TYR CYS GLN ASN ASP HIS SER TYR PRO LEU THR PHE 2IMM 42
SEQRES 9 114 GLY ALA GLY THR LYS LEU GLU LEU LYS ARG 2IMM 43
[...]
Explicit (SEQRES, above) and implicit (from the ATOM records) sequences may differ!
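A sketch of how the explicit sequence can be read and compared with the implicit one. It assumes the standard PDB column layout, where SEQRES residue names occupy columns 20-70; slicing by column rather than splitting on whitespace also copes with old entries like 2IMM whose chain-ID column is blank. The comparison assumes the observed (ATOM-derived) sequence differs from SEQRES only by missing, e.g. disordered, residues; the observed sequence in the usage example is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequence(lines):
    """One-letter explicit sequence from SEQRES records (residue names in columns 20-70)."""
    residues = []
    for line in lines:
        if line.startswith("SEQRES"):
            residues.extend(line[19:70].split())
    return "".join(THREE_TO_ONE.get(r, "X") for r in residues)


def unobserved_positions(explicit, observed):
    """0-based indices of explicit residues with no counterpart in observed.

    Assumes observed is explicit with some residues deleted; greedy matching
    may place a gap at the wrong end of a run of identical residues.
    """
    missing, j = [], 0
    for i, aa in enumerate(explicit):
        if j < len(observed) and observed[j] == aa:
            j += 1
        else:
            missing.append(i)
    if j != len(observed):
        raise ValueError("observed sequence is not a subsequence of the explicit one")
    return missing


# A SEQRES record built in the standard column layout from the first ten 2IMM residues.
record = f"SEQRES {1:>3} A {10:>4}  " + "ASP ILE VAL MET THR GLN SER PRO SER SER"
explicit = seqres_sequence([record])
print(explicit)                                   # DIVMTQSPSS
print(unobserved_positions(explicit, "VMTQSPS"))  # [0, 1, 9]
```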

Atom
ATOM 119 CA ARG 18 8.386 51.105 35.847 1.00 7.30 2IMM 179
Fields, left to right: record type, atom number, atom name, amino acid type, sequence number, X, Y, Z, occupancy (Occ), B (temperature factor).
Pitfalls:
- The atom name is a mix of chemical element and bond distance (remoteness from Cα): "CA.." is calcium, ".CA." is an alpha carbon (dots mark blank columns).
- The sequence number is actually a string: chain ID and insertion code are required to make it unique (e.g. B 123A).
PDB format is strictly column oriented!
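Because the format is strictly column oriented, a parser should slice fixed columns instead of splitting on whitespace. The sketch below uses the column positions of the wwPDB ATOM/HETATM record as 0-based Python slices; the class and function names are illustrative. It keeps the raw 4-character atom name, so " CA " and "CA  " stay distinguishable, and it builds a residue key from chain, number and insertion code.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str      # "ATOM" or "HETATM"
    serial: int
    name: str        # raw 4-character field: " CA " (C-alpha) differs from "CA  " (calcium)
    alt_loc: str
    res_name: str
    chain: str
    res_seq: int
    i_code: str
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

    @property
    def residue_id(self):
        # The sequence number alone is not unique: chain and insertion code are needed.
        return (self.chain, self.res_seq, self.i_code)


def parse_atom_line(line):
    """Parse one ATOM/HETATM record by fixed columns."""
    line = line.rstrip("\n").ljust(80)   # pad so short lines do not raise IndexError
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16],
        alt_loc=line[16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        i_code=line[26].strip(),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


rec = parse_atom_line("ATOM    119  CA  ARG    18       8.386  51.105  35.847  1.00  7.30")
print(rec.res_name, rec.residue_id, (rec.x, rec.y, rec.z))
# ARG ('', 18, '') (8.386, 51.105, 35.847)
```

The chain column is blank in this old-style 2IMM record, so the chain part of the key is an empty string.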

Hetero Atoms
[...]
HETATM 877 O HOH 1 -4.169 60.050 40.145 1.00 3.00 2IMM 937
[...]
http://xray.bmc.uu.se/hicup/

The crystallographic asymmetric unit does not necessarily contain a functional molecule
Example: 1qpi.pdb, Tet repressor/operator complex.
The contents of a crystal lattice unit cell can be generated from the asymmetric unit by applying the symmetry operations of the crystallographic space group. But this is neither trivial for the non-crystallographer, nor is it obvious which of the symmetry copies might make physiological contacts.

... Biological Unit
PQS reasons automatically about how a monomer might be correctly completed to a functional biomolecular complex (and is often correct).
http://pqs.ebi.ac.uk/

NCBI structure group: MMDB
Very well integrated, but somewhat impenetrable.

NDB (Nucleic Acid Database)
http://ndbserver.rutgers.edu/ndb/
Example: urx035.pdb (hammerhead ribozyme)

PDBsum and "secondary" structure databases
http://www.biochem.ucl.ac.uk/bsm/pdbsum/

PDBsum - Information

Others
- Macromolecular Structure Database at EBI (Relibase, PQS...): http://www.ebi.ac.uk/msd/
- Macromolecular structure related resources at the PDB: http://www.rcsb.org/pdb/links.html
- Structure links at the Southwestern Biotechnology and Informatics Center: http://www.swbic.org/links/1.19.2.5.php
- Molecular Models from Chemistry: http://people.ouc.bc.ca/woodcock/molecule/molecule.html
- Molecular Library: http://www.nyu.edu/pages/mathmol/library/
... many, many more.

Concept 4: Knowledge of structure allows mechanistic explanations.

Structure as an integrated map: example questions
- Which part of my structure appears to be conserved?
- Are two functionally important residues possibly in contact?
- Where is Asn220 relative to the active site?
- Could the mutation E123A have something to do with protein stability?
- Is Leu234 on the surface, or in the core?
- I want to clone my protein into a yeast two-hybrid system: should I fuse the DNA-binding domain to the N- or the C-terminus?

Geometric relationships
- Bonds
- Angles, plain and dihedral
- Surfaces
- Chemical potential, amino acid functions
- Static and dynamic disorder
- Structural similarity
- Electrostatics
- Conservation patterns (structural and functional)
- Quaternary structure
- Post-translational modification sites
- Unexpected homology
[...]

Distances from coordinates
XYZ coordinates are vectors in an orthogonal coordinate system, in Å. All the rules of analytical geometry apply.
[...]
ATOM 687 OH TYR 86 7.415 62.584 32.900 1.00 3.37
[...]
ATOM 651 O ASP 82 9.996 62.571 32.488 1.00 5.18
[...]
$$
\begin{aligned}
d &= \left[(9.996-7.415)^2 + (62.571-62.584)^2 + (32.488-32.900)^2\right]^{0.5}\\
  &= \left[(2.581)^2 + (-0.013)^2 + (-0.412)^2\right]^{0.5}\\
  &= \left[6.661561 + 0.000169 + 0.169744\right]^{0.5}\\
  &= \left[6.831474\right]^{0.5}\\
  &= 2.614\ \text{\AA} = 0.2614\ \text{nm} = 2.614 \times 10^{-10}\ \text{m}
\end{aligned}
$$
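The same arithmetic as a short sketch, extended to a brute-force contact search of the kind needed for questions like "are two residues possibly in contact?". The function names are illustrative, and the 3.5 Å cutoff in the usage line is an assumed value for the example, not one given on the slides.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two points in Å."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def contacts(atoms, cutoff):
    """All atom pairs closer than `cutoff` Å (brute force, O(n^2) pairs)."""
    return [
        (a, b, round(distance(pa, pb), 3))
        for (a, pa), (b, pb) in combinations(atoms.items(), 2)
        if distance(pa, pb) < cutoff
    ]


atoms = {
    "TYR86 OH": (7.415, 62.584, 32.900),
    "ASP82 O": (9.996, 62.571, 32.488),
}
print(f"{distance(atoms['TYR86 OH'], atoms['ASP82 O']):.3f} Å")  # 2.614 Å
print(contacts(atoms, cutoff=3.5))  # [('TYR86 OH', 'ASP82 O', 2.614)]
```

For whole structures, a spatial grid or k-d tree avoids testing every pair, but the distance test itself stays the same.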

Dihedral angles
[Figure: four consecutive atoms i, i+1, i+2, i+3, with rotation +φ about the central bond.]
Single bonds: freely rotatable, but constrained by steric overlap. Small energetic barrier; preference for staggered conformations.
Double bonds: constrained to planar geometry. Large energetic barrier to isomerization.
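A dihedral angle can be computed from the coordinates of four consecutive atoms. The sketch below is one common formulation (project the two outer bonds onto the plane perpendicular to the central bond, then take `atan2`); it is written in plain Python with illustrative helper names, and follows the usual convention that the angle is positive for clockwise rotation of the far bond when viewed along the central bond.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Backbone angles of residue i use the standard atom choices:
#   phi(i)   = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi(i)   = dihedral(N[i], CA[i], C[i], N[i+1])
#   omega(i) = dihedral(CA[i], C[i], N[i+1], CA[i+1])
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))))   # 0 (cis, eclipsed)
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))   # 90
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))  # 180 (trans)
```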

Backbone dihedral angles: Ramachandran plots
[Figure: peptide backbone with the dihedral angles φ, ψ and ω.]
The backbone dihedral angles are named φ, ψ and ω. Due to steric overlap, not all combinations of (φ, ψ) are allowed. Allowed and forbidden regions of (φ, ψ) space are shown on the Ramachandran plot. Observed (φ, ψ) values agree well with the theoretical boundaries.

Side-chain rotamers
[Figure: 100 randomly chosen Phe residues superimposed; side-chain dihedrals χ1, χ2, χ3 labelled.]
Ponder & Richards (1987) J. Mol. Biol. 193, 775-791
http://dunbrack.fccc.edu/bbdep/

H-bond patterns
Example: Tyr side-chain donor. OH can donate a single hydrogen. (The O-H bond is 1.00 Å long and lies in the plane of CE1, CE2, CZ and OH, forming an angle of 110 degrees with the CZ-OH bond.)
[Figure: distribution of H-bond counts in all and in buried residues; D-A distances, H-A distances and D-H-A angles in Tyr side chains.]
Tyr-Thr side-chain H-bond: despite canonical geometry, the correct topology may be ambiguous!
McDonald & Thornton (1994) J. Mol. Biol. 238, 777-793
http://www.biochem.ucl.ac.uk/bsm/atlas/
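The quantities plotted on this slide (D-A distance, H-A distance, D-H-A angle) follow directly from coordinates. A minimal sketch with illustrative names; the cutoffs in `looks_like_hbond` (D-A at most 3.5 Å, D-H-A at least 120°) are assumed rules of thumb for the example, not values taken from the slides or from McDonald & Thornton.

```python
import math


def angle(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


def hbond_geometry(donor, hydrogen, acceptor):
    """D-A distance and H-A distance (Å), and D-H-A angle (degrees)."""
    return (math.dist(donor, acceptor),
            math.dist(hydrogen, acceptor),
            angle(donor, hydrogen, acceptor))


def looks_like_hbond(donor, hydrogen, acceptor, max_da=3.5, min_dha=120.0):
    # ASSUMED cutoffs: common rules of thumb, not values from the slides.
    d_a, _, dha = hbond_geometry(donor, hydrogen, acceptor)
    return d_a <= max_da and dha >= min_dha


print(looks_like_hbond((0, 0, 0), (1, 0, 0), (2.9, 0, 0)))  # True: linear D-H...A
print(looks_like_hbond((0, 0, 0), (1, 0, 0), (0, 2.9, 0)))  # False: D-H-A is about 71 degrees
```

Geometry alone cannot decide which of two hydroxyl groups is the donor, which is exactly the Tyr-Thr ambiguity noted on the slide: the hydrogen position has to be assumed or modelled first.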

Molecular surface
[Figure: chain "A" of 1AON.PDB, GroEL/ES complex; surface rendering of the GroEL/ES complex (D. Goodsell).]

Molecular surface
The surface provides a visual metaphor, and a useful tool to map properties. But how can a molecular surface be defined? Obviously, the hard-sphere surface is chemically not very relevant.
[Figure: van der Waals surface.]

Molecular surface
[Figure: a solvent probe of radius r = 1.4 Å on the van der Waals surface.]

Molecular surface
[Figure: van der Waals surface, contact surface, reentrant surface and accessible surface, with "accessible" and "buried" regions marked.]

Calculating solvent accessible surfaces
1. Draw a sphere around each atom, with a radius of (van der Waals radius + solvent probe radius).
2. Erase every part of each sphere's surface that lies inside another atom's sphere.
3. The remaining area is the accessible surface.
Probe radius r = 1.4 Å; van der Waals radii: C 1.75 Å, N 1.55 Å, O 1.4 Å, H 1.17 Å.
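The three steps translate into a numerical sketch in the style of the Shrake-Rupley method: scatter probe points on each expanded sphere and count those not inside any other expanded sphere. Radii are the ones on the slide (there is no value for sulfur, so S would raise a `KeyError`). As simplifying assumptions, this sketch uses the same number of points for every atom and places them randomly without enforcing a minimum separation; both choices are discussed under parameters and assumptions.

```python
import math
import random

PROBE_RADIUS = 1.4                                       # Å, from the slide
VDW_RADII = {"C": 1.75, "N": 1.55, "O": 1.4, "H": 1.17}  # Å, from the slide


def unit_sphere_points(n, rng):
    """n random points, uniformly distributed on the unit sphere."""
    points = []
    for _ in range(n):
        theta = 2.0 * math.pi * rng.random()
        phi = math.acos(2.0 * rng.random() - 1.0)
        points.append((math.sin(phi) * math.cos(theta),
                       math.sin(phi) * math.sin(theta),
                       math.cos(phi)))
    return points


def accessible_areas(atoms, n_points=1000, seed=0):
    """Per-atom solvent accessible area in Å^2; atoms is a list of (element, (x, y, z))."""
    rng = random.Random(seed)
    unit = unit_sphere_points(n_points, rng)   # placed once, scaled and translated per atom
    radii = [VDW_RADII[el] + PROBE_RADIUS for el, _ in atoms]
    areas = []
    for i, (_, center) in enumerate(atoms):
        r = radii[i]
        exposed = 0
        for ux, uy, uz in unit:
            point = (center[0] + r * ux, center[1] + r * uy, center[2] + r * uz)
            buried = any(
                math.dist(point, other) < radii[j]
                for j, (_, other) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * r * r * exposed / n_points)
    return areas


# Two carbon atoms 1.5 Å apart: each loses part of its accessible sphere.
print([round(a, 1) for a in accessible_areas([("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0))])])
```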

Parameters and assumptions
- Problem: analytical solution inefficient. Solution: numerical solution with probe points.
- Problem: regular placement of n probe points is not possible for arbitrary n. Solution: stochastic placement.
- Problem: stochastic placement quite irregular. Solution: enforce minimum separation.
- Problem: efficiency. Solution: place points only once, translate as needed.
- Problem: what is a good value for n? Solution: try different n, evaluate standard deviation.
- Problem: should n be constant per atom, or per area? Solution: dots per area; need to scale dots with r_VdW.
- Problem: hydrogens; where to get united-atom radii? Solution: literature search.
- Problem: reference areas for relative SAA needed. Solution: model explicitly, as tripeptides.
[...]
Random points on a sphere: $u, v \in [0,1]$, $\theta = 2\pi u$, $\phi = \cos^{-1}(2v - 1)$ (http://mathworld.wolfram.com/SpherePointPicking.html).
Even a straightforward algorithm has its hidden parameters and assumptions. Results are meaningful only in this context. Any comparison is problematic.
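"Try different n, evaluate standard deviation" can be illustrated on a case with a known answer. For two equal spheres of radius R whose centres are d apart (0 < d < 2R), the part of one sphere lying inside the other is a spherical cap of height R - d/2, so the exposed area is 4πR² - 2πR(R - d/2). The sketch below, with illustrative function names and assumed parameter values (R = 3.15 Å, i.e. carbon plus probe, and d = 3.0 Å), repeats the random-point estimate for several n and reports mean and standard deviation.

```python
import math
import random
import statistics


def random_unit_point(rng):
    """Uniform point on the unit sphere: theta = 2*pi*u, phi = acos(2v - 1)."""
    theta = 2.0 * math.pi * rng.random()
    phi = math.acos(2.0 * rng.random() - 1.0)
    return (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))


def exposed_area_estimate(radius, d, n, rng):
    """Numerical area of sphere A (at the origin) not inside an equal sphere B at (d, 0, 0)."""
    exposed = 0
    for _ in range(n):
        x, y, z = random_unit_point(rng)
        if math.dist((radius * x, radius * y, radius * z), (d, 0.0, 0.0)) >= radius:
            exposed += 1
    return 4.0 * math.pi * radius ** 2 * exposed / n


def exposed_area_exact(radius, d):
    """Analytical value: full sphere minus a cap of height R - d/2."""
    return 4.0 * math.pi * radius ** 2 - 2.0 * math.pi * radius * (radius - d / 2.0)


def convergence(radius=3.15, d=3.0, ns=(100, 1000, 4000), trials=20, seed=1):
    """(n, mean, standard deviation) of repeated estimates for each n."""
    rng = random.Random(seed)
    rows = []
    for n in ns:
        estimates = [exposed_area_estimate(radius, d, n, rng) for _ in range(trials)]
        rows.append((n, statistics.mean(estimates), statistics.stdev(estimates)))
    return rows


print(f"exact: {exposed_area_exact(3.15, 3.0):.2f} A^2")
for n, mean, sd in convergence():
    print(f"n={n:5d}  mean={mean:7.2f}  sd={sd:5.2f}")
```

The standard deviation shrinks roughly as 1/sqrt(n), which gives a practical way to pick n for a required precision.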

Mapping properties on surfaces
- Properties of atoms (B-factors)
- Ensemble properties of residues (hydrophobicity, conservation)
- Geometry (local curvature)
- Fields and potentials (isosurfaces, binding potential)
[Figure: AChE (1ACL.PDB), color-coded by electrostatic potential with GRASP (http://trantor.bioc.columbia.edu/grasp/).]

Concept 5: Structure is not arbitrary, but contains recurring units.

Basic building blocks of structure
E.g. PROMOTIF, as used in PDBsum.
But: classical descriptions of structural building blocks are as much based on idealized concepts of geometry as on observations of nature. An unbiased analysis may arrive at significantly different classifications!

Unbiased structure motifs: alignment with added value
Motif alignments... Why are particular amino acids conserved? What is essential in a sequence?
A structure-motif consensus sequence, compiled from unrelated segments, averages out features of conservation that are due only to incomplete divergence (homology). A consensus sequence taken from different structural contexts averages out features of sequence that are due to specific functional requirements (binding, catalysis) or non-local structural requirements (packing, interaction). What remains is information about the sequence propensities of local structural elements.

A schematikon motif example: complex loop
[Figure: position-wise plot (positions 1 to 7) for motif 1icf_I_215_0_7, against nrdb distribution. Motif: 1icf 215; Length: 7; Support: 7; Unique: 7; Rank: 399.]

A schematikon motif example: strand N-cap
[Figure: position-wise plot (positions 1 to 4) for motif 1whi_0_35_0_4, Support = 23, against nrdb distribution. Motif: 1whi 35; Length: 4; Support: 7; Unique: 7; Rank: 444.]

Concept 6: Domains are folding units, functional units, and units of inheritance.

Domains are ubiquitous in proteins
Large proteins are composed of compact, semi-independent units: domains.
Reasons: modularity, folding efficiency.
[Figure: 2MCP.PDB]

Domains in proteins
[Figure: number of domains in 787 representative proteins used as the basis for the CATH database.]
Jones S et al. (1998) Protein Science 7:233

Domains in proteins
[Figure: non-random relationship between domain number and chain length in the 787 representative proteins used as the basis for the CATH database.]
Jones S et al. (1998) Protein Science 7:233

Domains in proteins
[Figure: domain size in the 787 representative proteins used as the basis for the CATH database.]
Jones S et al. (1998) Protein Science 7:233

There is no universal definition of "domains"
Possible definitions are based on independently inherited (sub)sequences (sequence domain), modular protein functions (functional domain), or folding units or atomic contacts (structural domain).
Domain: a part of a structure that can fold irrespective of the presence of other parts of the structure.
But: what is commonly measured is sequence, function, or structure, NOT FOLDING!

Further complications: Analogous structure, Domain insertions, Circular permutations, Domain swapping.
[Figure: domain insertion; 1A2J.PDB, protein disulfide isomerase, and 2TRX.PDB, thioredoxin.]

Further complications: Analogous structure, Domain insertions, Circular permutations, Domain swapping.
[Figure: circular permutation; 1ERQ.PDB, beta-lactamase (label "253"), and 1ALQ.PDB, beta-lactamase.]

Further complications: Analogous structure, Domain insertions, Circular permutations, Domain swapping.
[Figure: domain swapping; 11BG.PDB, bull seminal ribonuclease.]

Domains can be elusive
The separation of a structure into domains requires the arbitrary definition of thresholds in a continuum of possibilities.

Why care?
Function: evolution works on sequence, but selects function. Defining domains in structure can uncover functional units that may evolve independently.
Sequence searches, alignments etc. with domains are much more specific. Once structural domains have been defined, sequence profiles, HMMs or other computational procedures can be used to pick out more members of the domain family from the database.
Domains can be defined from sequence patterns, or from the analysis of structure.

Automated (objective) domain definition: sequence (CDD)
http://www.ncbi.nlm.nih.gov/structure/cdd/cdd.shtml
CDD from SMART and Pfam; CDART from CDD and GenBank.

Semi-automated consensus domain definition: structure (CATH)
[Figure: dihydrolipoamide dehydrogenase, 1LPFA.]
Jones S et al. (1998) Domain assignment for protein structures using a consensus approach: Characterization and analysis. Protein Science 7:233-242

SCOP & CATH: structural classification
[Figure: the eight most frequent SCOP superfolds.]
http://scop.mrc-lmb.cam.ac.uk/scop/
http://www.biochem.ucl.ac.uk/bsm/cath/

CATH - Class
Class 1: Mainly Alpha; Class 2: Mainly Beta; Class 3: Mixed Alpha/Beta; Class 4: Few Secondary Structures

CATH - Architecture
Examples: Roll, Super Roll, Barrel, 2-Layer Sandwich

CATH - Topology
Examples: L-fucose Isomerase, Serine Protease, Aconitase domain 4, TIM Barrel

CATH - Homology
Examples: Alanine racemase, Dihydropteroate (DHP) synthetase, FMN-dependent fluorescent proteins, 7-stranded glycosidases

CATH - Entry (Example)

IV: Open Issues
I: Integration into processes, scriptable APIs
II: Sequence-based identification of domains
III: Analysing domains in context
IV: Defining modular domain functions

Bioinformaticians apparently do not like structure!
Sequence: discrete alphabet; easy to manipulate; well-developed data structures; well-developed libraries.
Structure: continuous space; linear algebra, complicated energy functions; databases and data structures are difficult; paucity of libraries.
Meet the challenge!

Questions? Feedback?
boris.steipe@utoronto.ca
http://biochemistry.utoronto.ca/steipe/