Protein Structure Function


1 Protein Engineering 15hp Protein Structure Function

2 The Protein Domain

3 What is a domain? An evolutionarily conserved, independently folding unit within a protein. A domain often, but not always, consists of a continuous segment of amino-acid sequence. Domains vary in size, but are usually around 200 amino acids. A domain can have an independent function or contribute to the function of a multi-domain protein.

4 Some proteins are made of a single domain, but the majority of proteins are multi-domain proteins. Figure labels: single-domain protein; multi-domain protein; heterotrimeric protein; N-terminal; linker regions; C-terminal.

5 Alpha & Beta Domains

6 Alpha domains are protein domains composed entirely of alpha helices. Beta domains are protein domains composed entirely of beta sheets.

7 Alpha domains Myohemerythrin (2mhr)

8 Alpha domains Myoglobin (1a6k)

9 Beta domains Immunoglobulin Light chain (1cfv)

10 Beta domains Neuraminidase One subunit (1a4q)

11 Beta domains Transthyretin One subunit (1tta)

12 Beta domains Satellite Tobacco Necrosis Virus Coat Protein (2buk)

13 Alpha/Beta, Alpha+Beta & Cross-Linked Domains

14 Alpha/Beta domains are protein domains composed of beta strands connected by alpha helices. Alpha+Beta domains are protein domains composed of separate alpha-helical and beta-sheet regions. Cross-linked domains are protein domains with little or no secondary structure that are stabilized by disulphide bridges or metal ions.

15 Alpha/Beta domains Triosephosphate isomerase (TIM) One subunit (1tim)

16 Alpha/Beta domains Aspartate semi-aldehyde dehydrogenase One domain (1brm)

17 Alpha+Beta domains TATA-binding protein (1tgh)

18 Cross-linked domains Neurotoxin from Brazilian scorpion Tityus serrulatus (1b7d)

19 Cross-linked domains Human Zinc-finger DNA-binding domain (3znf)

20 Protein Interaction Domains

21 Protein Interaction Domains Human C-SRC Tyrosine kinase SH2 domain (1shd)

22 Protein Interaction Domains Human Calmodulin (1cll)

23 Protein Interaction Domains Yeast Ski8p (1sq9)

24 CATH Protein Structure Classification
The CATH database is a hierarchical domain classification of protein structures.
C, class: Alpha domain, Beta domain, Alpha-Beta domain etc.
A, architecture: Barrel, Sandwich, Propeller, Bundle etc.
T, topology (fold family): Structures are grouped into fold groups at this level depending on both the overall shape and connectivity of the secondary structures.
H, homologous superfamily: This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous.
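The four CATH levels are commonly written as a dotted numeric code, one number per level (C.A.T.H). The Python sketch below shows one way to hold such a code and to ask how many levels two domains share. The class name `CathCode`, the helper `shared_depth` and the two example codes are illustrative assumptions, not part of the slides, and only the four long-standing class numbers are listed.

```python
from dataclasses import dataclass

# Top-level (C) class numbers as used by CATH; newer special classes are omitted.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


@dataclass(frozen=True)
class CathCode:
    """A four-level CATH code: class, architecture, topology, homologous superfamily."""
    c: int
    a: int
    t: int
    h: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        parts = code.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated numbers, got {code!r}")
        return cls(*(int(p) for p in parts))

    @property
    def levels(self) -> tuple:
        return (self.c, self.a, self.t, self.h)

    @property
    def class_name(self) -> str:
        return CATH_CLASSES.get(self.c, "Unknown")

    def prefix(self, depth: int) -> str:
        """Return the code truncated to the first `depth` levels, e.g. '3.40.50'."""
        return ".".join(str(n) for n in self.levels[:depth])


def shared_depth(x: CathCode, y: CathCode) -> int:
    """Count how many leading levels (0 to 4) two codes have in common."""
    depth = 0
    for a, b in zip(x.levels, y.levels):
        if a != b:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    # Two hypothetical example codes that differ only at the H level.
    d1 = CathCode.parse("3.40.50.720")
    d2 = CathCode.parse("3.40.50.300")
    print(d1.class_name, d1.prefix(3), shared_depth(d1, d2))
```

Two domains that share three levels have the same class, architecture and fold but are not assigned to the same homologous superfamily.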

25 The Universe of Protein Structures

26 The number of protein folds is large but limited. Protein structures are modular and proteins can be grouped into families on the basis of the domains they contain. The modular nature of protein structure allows for sequence insertions and deletions.

27 Domain superfamilies Multi-domain proteins Supradomain

28 Domain superfamilies Different geometries Different functions

29 Mycobacterium tuberculosis IdeR Iron dependent Regulator

30 Hepatitis C virus NS3 Protease/Helicase

31 Soybean Lipoxygenase-1

32 Felis domesticus Pyruvate kinase

33 Saccharomyces cerevisiae FAS Fatty Acid Synthase (2uv8)

34 Why do proteins have domains? Mixing and matching of domains, as an evolutionary process, has given rise to the great diversity of proteins we see today. Folding a large chain as a single unit is more likely to introduce incorrectly folded regions. Folding as separate domains is more energetically favorable.

35 Why predict domains? Sequence alignments at the domain level can detect homologous sequences that are otherwise hard to find. Secondary structure prediction works better when applied to single domains. Domain prediction can give insight into protein function. Truncating at domain borders can help expression, solubility and crystallization. Dividing large proteins into domains may be necessary to solve the structure by X-ray crystallography or NMR.

36 Domain prediction - Theoretical approach Soybean Lipoxygenase-1 MFSAGHKIKGTVVLMPKNELEVNPDGSAVDNLNAFLGRSVSLQLISATKADAHGKGKVGKDTFLEGINT SLPTLGAGESAFNIHFEWDGSMGIPGAFYIKNYMQVEFFLKSLTLEAISNQGTIRFVCNSWVYNTKLYKS VRIFFANHTYVPSETPAPLVSYREEELKSLRGNGTGERKEYDRIYDYDVYNDLGNPDKSEKLARPVLGG SSTFPYPRRGRTGRGPTVTDPNTEKQGEVFYVPRDENLGHLKSKDALEIGTKSLSQIVQPAFESAFDLK STPIEFHSFQDVHDLYEGGIKLPRDVISTIIPLPVIKELYRTDGQHILKFPQPHVVQVSQSAWMTDEEFARE MIAGVNPCVIRGLEEFPPKSNLDPAIYGDQSSKITADSLDLDGYTMDEALGSRRLFMLDYHDIFMPYVRQI NQLNSAKTYATRTILFLREDGTLKPVAIELSLPHSAGDLSAAVSQVVLPAKEGVESTIWLLAKAYVIVNDSC YHQLMSHWLNTHAAMEPFVIATHRHLSVLHPIYKLLTPHYRNNMNINALARQSLINANGIIETTFLPSKYS VEMSSAVYKNWVFTDQALPADLIKRGVAIKDPSTPHGVRLLIEDYPYAADGLEIWAAIKTWVQEYVPLYYA RDDDVKNDSELQHWWKEAVEKGHGDLKDKPWWPKLQTLEDLVEVCLIIIWIASALHAAVNFGQYPYGG LIMNRPTASRRLLPEKGTPEYEEMINNHEKAYLRTITSKLPTLISLSVIEILSTHASDEVYLGQRDNPHWTS DSKALQAFQKFGNKLKEIEEKLVRRNNDPSLQGNRLGPVQLPYTLLYPSSEEGLTFRGIPNSISI

37 Pfam - Protein families database. Bateman A, Coin L et al. The Pfam protein families database. Nucleic Acids Res Jan 1;32 (Database issue):d

38 DomPred - Protein Domain Prediction Server. Marsden, McGuffin & Jones. Rapid protein domain assignment from amino acid sequence using predicted secondary structure. Protein Science, 11 (2002),

39 Armadillo - Domain Linker Prediction. Dumontier M, Yao R et al. Armadillo: domain boundary prediction by amino acid composition. J Mol Biol Jul 29;350(5):

40 IUPred - Dissecting proteins into ordered and disordered parts. Dosztanyi Z, Csizmok V et al. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics Aug 15; 21(16):

41 FoldIndex - Finds unfolded regions in protein sequence. Prilusky J, Felder CE et al. FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics Aug 15; 21(16):
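FoldIndex scores a sequence from its mean hydrophobicity and mean net charge. The Python sketch below illustrates that idea, assuming the published charge-hydropathy formulation 2.785<H> - |<R>| - 1.151 with Kyte-Doolittle hydropathy rescaled to the range 0 to 1. The function names, the charge assignment (K and R = +1, D and E = -1, histidine ignored) and the default window of 51 residues are assumptions made for this example. It is not the FoldIndex server's own code.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed charges at neutral pH; histidine is treated as uncharged.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def fold_index(seq: str) -> float:
    """Charge-hydropathy score: positive suggests folded, negative suggests unfolded."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Mean hydropathy rescaled from [-4.5, 4.5] to [0, 1]; raises KeyError on
    # non-standard residue letters.
    mean_h = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    # Absolute mean net charge per residue.
    mean_r = abs(sum(CHARGE.get(aa, 0) for aa in seq)) / len(seq)
    return 2.785 * mean_h - mean_r - 1.151


def windowed_fold_index(seq: str, window: int = 51) -> list:
    """Score every full-length window; returns an empty list if seq is shorter than window."""
    return [fold_index(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # First residues of the pyruvate kinase sequence from slide 42.
    example = "SKPHSDVGTAFIQTQQLHAAMADTFLEHMCRLDIDSPPITARNTGIICTIGPASRSVEIL"
    print(round(fold_index(example), 3))
    print(len(windowed_fold_index(example, window=21)))
```

Long runs of negative window scores are candidate disordered or linker regions, which is why this kind of profile is useful when looking for domain borders.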

42 Domain prediction - Experimental approach Felis domesticus Pyruvate kinase SKPHSDVGTAFIQTQQLHAAMADTFLEHMCRLDIDSPPITARNTGIICTIGPASRSVEILKEMI KSGMNVARLNFSHGTHEYHAETIKNVRAATESFASDPIRYRPVAVALDTKGPEIRTGLIKGS GTAEVELKKGATLKITLDNAYMEKCDENVLWLDYKNICKVVEVGSKVYVDDGLISLLVKEKG ADFLVTEVENGGSLGSKKGVNLPGAAVDLPAVSEKDIQDLKFGVEQDVDMVFASFIRKASD VHEVRKVLGEKGKNIKIISKIENHEGVRRFDEILEASDGIMVARGDLGIEIPAEKVFLAQKMMI GRCNRAGKPVICATQMLESMIKKPRPTRAEGSDVANAVLDGADCIMLSGETAKGDYPLEAV RMQHLIAREAEAAMFHRKLFEELVRGSSHSTDLMEAMAMGSVEASYKCLAAALIVLTESG RSAHQVARYRPRAPIIAVTRNHQTARQAHLYRGIFPVVCKDPVQEAWAEDVDLRVNLAMN VGKARGFFKHGDVVIVLTGWRPGSGFTNTMRVVPVP

43 Limited proteolysis
Workflow: Proteolysis, SDS-PAGE, HPLC-MS
Sequence fragments shown on the slide, in N- to C-terminal order:
SKPHSDVGTAFIQTQQLHAAMADTFLEHMCRLDIDSPPITARNTGIICTIGPASRSVEILKEMIKSGMNVARLNFSHGTHEYHAETIKNVRAATESFASDPIRYRPVAVALDTKGPEIRTGLIKGSGTAEVELKKGATLKITLDNAYMEKCDENVLWLDYKNICKVVEVGSKVYVDDGLISLLVKEKGADFLVTEVENGGSLGSKKGVNLPGAAVDL
PAVSEKDIQDLKFGVEQDVDMVFASFIRKASDVHEVRKVLGEKGKNIKIISKIENHEGVRRFDEILEASDGIMVARGDLGIEIPAEKVFLAQKMMIGRCNRAGKPVICATQMLESMIKKPRPTRAEGSDVANAVLDGADCIMLSGETAKGDYPLEAVRMQHLIAREAEAAMFHRKLFE
ELVRGSSHSTDLMEAMAMGSVEASYKCLAAALIVLTESGRSAHQVARYRPRAPIIAVTRNHQTARQAHLYRGIFPVVCKDPVQEAWAEDVDLRVNLAMNVGKARGFFKHGDVVIVLTGWRPGSGFTNTMRVVPVP
Gao X, Bain K, Bonanno JB et al. High-throughput limited proteolysis/ mass spectrometry for protein domain elucidation. J Struct Funct Genomics. 2005;6(2-3):
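After limited proteolysis, a measured fragment mass has to be mapped back onto the parent sequence to locate the fragment, and hence a likely domain border. The Python sketch below shows one simple way to do that by brute-force search over contiguous subsequences. It assumes monoisotopic residue masses and unmodified fragments; the function names, the tolerance and the minimum fragment length are illustrative choices, and the method in the cited paper may differ.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide chain


def fragment_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def match_fragments(seq: str, observed_mass: float,
                    tol: float = 0.5, min_len: int = 5) -> list:
    """Return (start, end, fragment) for every contiguous stretch of seq whose
    mass is within tol Da of observed_mass. Positions are 1-based, inclusive."""
    # Prefix sums make each subsequence mass a single subtraction.
    prefix = [0.0]
    for aa in seq:
        prefix.append(prefix[-1] + RESIDUE_MASS[aa])

    hits = []
    n = len(seq)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            mass = prefix[end] - prefix[start] + WATER
            if abs(mass - observed_mass) <= tol:
                hits.append((start + 1, end, seq[start:end]))
            elif mass > observed_mass + tol:
                break  # mass only grows as the fragment is extended
    return hits


if __name__ == "__main__":
    # First residues of the pyruvate kinase sequence from slide 42.
    parent = "SKPHSDVGTAFIQTQQLHAAMADTFLEHMCRLDIDSPPITARNTGIICTIGPASRSVEIL"
    target = fragment_mass(parent[10:40])  # pretend this mass was measured
    for start, end, frag in match_fragments(parent, target, tol=0.01):
        print(start, end, frag)
```

Several stretches can match the same mass within tolerance, so in practice the candidates would be narrowed using protease specificity or N-terminal sequencing.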

44 GFP-fusion. Figure labels: nuclease treatment; expression vectors; E. coli bacteria; culture plate. Hart DJ, Tarendeau F. Combinatorial library approaches for improving soluble protein expression in Escherichia coli. Acta Crystallogr D Biol Crystallogr Jan;62(Pt 1):

45 CoFi-blot Erase-a-base process

46 CoFi-blot. Figure labels: expression vectors; E. coli bacteria; colony plate; filters. Cornvik T, Dahlroth SL, Magnusdottir A, Herman MD, Knaust R, Ekberg M, Nordlund P. Colony filtration blot: a new screening method for soluble protein expression in Escherichia coli. Nat Methods Jul;2(7):507-9.

47 Textbook
Petsko & Ringe, Protein Structure and Function
1-14 The Protein Domain (p30-31)
1-15 The Universe of Protein Structures (p32-33)
1-17 Alpha Domains and Beta Domains (p36-37)
1-18 Alpha/Beta, Alpha+Beta and Cross-Linked Domains (p38-39)
3-1 Protein Interaction Domains (p88-89)