The Brutlag Bioinformatics Group Conserved Structural and Functional Motifs in Proteins


1 The Brutlag Bioinformatics Group Conserved Structural and Functional Motifs in Proteins Doug Brutlag Department of Biochemistry & Medicine (by courtesy)

2 The Zinc Finger A Typical Protein Motif C..C...H...H

3 eblocks - Discovering Protein Motifs
Jane Su, Serge Saxonov
Figure labels: A; B; C; Higher Specificity; Higher Sensitivity

4 Generating Sequence Motifs from Aligned Protein Sequences
TEAESNMNDPVAEYQQY
TDARQDLYELEVDYANL
TEARENIAVLERDFEEV
TEAESNMNDLVSEYQQY
TEVRANMNDLVAEYQQY
SEAESNMNDLVSEYQQY
TEAREDLAALEKDYEEV
TEAREDLAALERDYIEV
SEAREDLAALEKDYEEV
AEAREDLAALEKDYIEV
SEAREDLAALEKDYEEV
SEAREDLAALERDYEEV
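
One way to picture how a motif can be read off an alignment like the one on this slide is to look at each column and record which residues occur there. The sketch below is only an illustration under stated assumptions: the function name `column_motif` and the `max_group` cutoff are hypothetical, and the real emotif method selects from predefined amino-acid substitution groups rather than taking whatever residues happen to appear.

```python
import re

# The twelve aligned sequences shown on slide 4.
ALIGNED = [
    "TEAESNMNDPVAEYQQY", "TDARQDLYELEVDYANL", "TEARENIAVLERDFEEV",
    "TEAESNMNDLVSEYQQY", "TEVRANMNDLVAEYQQY", "SEAESNMNDLVSEYQQY",
    "TEAREDLAALEKDYEEV", "TEAREDLAALERDYIEV", "SEAREDLAALEKDYEEV",
    "AEAREDLAALEKDYIEV", "SEAREDLAALEKDYEEV", "SEAREDLAALERDYEEV",
]


def column_motif(sequences, max_group=3):
    """Build a regex-like motif from gap-free aligned sequences.

    Hypothetical helper: a column with one residue becomes that letter,
    a column with up to max_group residues becomes a bracketed group,
    and anything more variable becomes the wildcard '.'.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must all have the same aligned length")
    parts = []
    for column in zip(*sequences):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0].lower())
        elif len(residues) <= max_group:
            parts.append("[" + "".join(residues).lower() + "]")
        else:
            parts.append(".")
    return "".join(parts)


motif = column_motif(ALIGNED)
print(motif)

# By construction the motif matches every sequence it was built from.
pattern = re.compile(motif, re.IGNORECASE)
print(all(pattern.fullmatch(s) for s in ALIGNED))
```

Lowering `max_group` yields more wildcards (higher sensitivity), while raising it keeps more explicit groups (higher specificity); that trade-off is the reason a single alignment can give rise to many candidate motifs.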

5 emotifs Craig Nevill-Manning Tom Wu [fly]..h...h[st]..[kr]p[fy].c
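
The motif on this slide uses a notation that Python's `re` module can read directly: bracketed groups are alternatives and each dot is a wildcard position. As a rough sketch, assuming the dot counts survived transcription intact, a sequence can be scanned as follows. The function name `scan` and the demo sequence are made up for illustration and are not real protein data.

```python
import re

# Motif exactly as it appears on slide 5.
EMOTIF = "[fly]..h...h[st]..[kr]p[fy].c"


def scan(sequence, motif=EMOTIF):
    """Return (start, matched_text) for each non-overlapping motif hit."""
    pattern = re.compile(motif, re.IGNORECASE)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Synthetic sequence with one site embedded after a three-residue prefix.
demo = "MKT" + "FAAHAAAHSAAKPFAC" + "GG"
hits = scan(demo)
print(hits)
```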

6 emotif Makes Many Motifs

7 ematrices Tom Wu Jimmy Huang

8 Protein Functional Analysis Using BLOCKS+ or eblocks Motifs Significant at an Expectation of 10^-3
Data sets: Human Proteins (TREMBL); Non-Human Proteins (TREMBL); Drosophila melanogaster
Categories: Total; ematrix; emotif; ematrix+emotif
Values: (70%) 2961 (60%) (46%) 2063 (41%) (50%) (29%)
Legend: Blue: eblocks; Black: BLOCKS+

9 eproteome: Functional Genomics Database Serge Saxonov Peter Tan

10 esignal: Signal Transduction Motif Database Jes Alexander

11 3MOTIFs & 3MATRICES Steve Bennett Lin Lu

12 Protein Sequence Motifs & Feature Microenvironments Mike Liang

13 Protein Structural Superposition Amit Singh Jessica Shapiro

14 Applications of Three-Dimensional Protein Superposition Structure Similarity Search Multiple Structural Superposition Discover Conserved Structural Motifs Discover Conserved Substructures Automatic Classification of Protein Structure

15 Structural Similarity Search: Web Interface

16 Ligand Docking Using Robotic Motion Planning Serkan Apaydin Amit Singh Chris Varma

17 Why Robotics? Ligand =? Articulated Robot

18 Robotic Motion Planning Articulated Robot Ligand Binding

19 Results - Characterizing Binding Sites & Catalytic Sites
Robotic ligand docking indicates that:
The catalytic site is not necessarily the one with the lowest ligand energy.
The catalytic site is instead characterized by a distinct energy barrier around the site.
The difficulty of leaving the catalytic site is higher than for other potential sites.
The difficulty of entering the catalytic site is also correspondingly higher.
Figure: energy (kcal/mol) at three sites: Other Low Energy Site, Catalytic Site, Other Low Energy Site.

20 Protein Folding using Stochastic Roadmap Simulation
Serkan Apaydin, Carlos Guestrin
Figure labels: Unfolded set; Folded set

21 Stochastic Roadmap Construction
Sample nodes from conformation space; edge weights are probabilities.
Figure: nodes v_i and v_j joined by an edge with weight P_ij.

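22 Calculating Edge Probabilities
Follow Metropolis criteria:
\[ P_{ij} = \begin{cases} \dfrac{1}{N_i}\exp\left(-\dfrac{\Delta E_{ij}}{k_B T}\right), & \text{if } \Delta E_{ij} > 0; \\[4pt] \dfrac{1}{N_i}, & \text{otherwise.} \end{cases} \]
Self-transition probabilities:
\[ P_{ii} = 1 - \sum_{j \neq i} P_{ij} \]
Correspond to probabilities in Monte Carlo simulation.
Figure: node v_i with self-loop P_ii and an edge P_ij to node v_j.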
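
The Metropolis rule on this slide can be turned into a small routine that assigns a probability to every roadmap edge. What follows is a minimal sketch, not the group's implementation. It assumes that N_i is the number of neighbors of node v_i, that the energy difference is E_j - E_i, and that energies are in kcal/mol. The function name `transition_probabilities` and the three-node example are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def transition_probabilities(energies, neighbors, temperature=300.0):
    """Return P[i][j] for a roadmap, following the Metropolis criteria.

    energies:  dict node -> energy (kcal/mol, assumed unit)
    neighbors: dict node -> list of adjacent nodes (node itself excluded)
    """
    kt = K_B * temperature
    P = {}
    for i, nbrs in neighbors.items():
        n_i = len(nbrs)
        row = {}
        for j in nbrs:
            delta = energies[j] - energies[i]  # assumed sign convention
            accept = math.exp(-delta / kt) if delta > 0 else 1.0
            row[j] = accept / n_i
        # Whatever probability is not spent on moves stays on the node.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P


# Toy roadmap: three mutually connected conformations.
energies = {0: 0.0, 1: 1.0, 2: 0.5}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
P = transition_probabilities(energies, neighbors)
for i in sorted(P):
    print(i, {j: round(p, 4) for j, p in sorted(P[i].items())})
```

Downhill moves receive the full 1/N_i share, uphill moves are damped by the Boltzmann factor, and each row sums to one, so the roadmap can be treated as a Markov chain.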

23 Computation time on 1ROP
Monte Carlo: 36 conformations; 100 days of computer time; over 10^9 energy computations
SRS: 5000 conformations; 1 hour of computer time; 5000 energy computations

24 Conclusion
Roadmap for analysis of molecular motion; efficiently considers many MC paths simultaneously.
Application to p_fold: more accurate results; fewer energy computations; many orders of magnitude speed-up.
Other applications: ligand-protein docking; order of formation of secondary structure.
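
To illustrate how a roadmap can consider many Monte Carlo paths at once, p_fold (the probability that a conformation reaches the folded set before the unfolded set) can be computed from the transition probabilities by first-step analysis: each interior node satisfies p_i = sum_j P_ij p_j, with p fixed at 1 on folded nodes and 0 on unfolded nodes. The sketch below solves this by simple iteration. The function name `p_fold`, the iteration scheme, and the five-node chain are assumptions chosen for illustration; a linear solver would serve equally well.

```python
def p_fold(P, folded, unfolded, sweeps=10000, tol=1e-12):
    """Estimate p_fold for every node of a roadmap.

    P:        dict node -> dict of {next_node: transition probability}
    folded:   set of nodes treated as folded   (p_fold fixed at 1)
    unfolded: set of nodes treated as unfolded (p_fold fixed at 0)
    """
    p = {i: (1.0 if i in folded else 0.0) for i in P}
    for _ in range(sweeps):
        change = 0.0
        for i in P:
            if i in folded or i in unfolded:
                continue
            new = sum(prob * p[j] for j, prob in P[i].items())
            change = max(change, abs(new - p[i]))
            p[i] = new
        if change < tol:
            break
    return p


# Toy chain 0-1-2-3-4 with equal left/right step probabilities.
chain = {i: {i - 1: 0.5, i + 1: 0.5} for i in (1, 2, 3)}
chain[0] = {0: 1.0}  # unfolded end
chain[4] = {4: 1.0}  # folded end
result = p_fold(chain, folded={4}, unfolded={0})
print({i: round(v, 6) for i, v in sorted(result.items())})
```

No path is ever simulated explicitly: every route through the chain contributes through the linear system, which is the sense in which one roadmap stands in for a large ensemble of Monte Carlo runs.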

25 Transcription Factor Binding Sites Seed1 m-matches Xiaole Liu

26 TF Finding Helps Understanding Transcription Regulation
Upstream Regions; Co-expressed Genes; Transcription Start
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC   Pho 5
CACATCGCATCACGTGACCAGT...GACATGGACGGC   Pho 8
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA   Pho 81
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG   Pho 84
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT   Pho

27 TF Finding Helps Understanding Transcription Regulation
Upstream Regions; Co-expressed Genes
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT

28 TF Finding Helps Understanding Transcription Regulation
Upstream Regions; Co-expressed Genes; Pho4 binding
GATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT
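
The idea of locating a shared binding site in the upstream regions of co-expressed genes can be sketched as a seed search that tolerates a few mismatches. The example below uses only the leading fragments printed on slide 28 (the elided middle of each region is left out) and takes `CACGTG` as the seed purely for illustration. The helper names `hamming`, `best_hit`, and `seed_matches` are hypothetical and do not reproduce the group's actual search program.

```python
# Leading upstream fragments as printed on slide 28.
UPSTREAM = [
    "GATGGCTGCACCACGTTTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TCTCGTTAGGACCATCACGTGA",
    "CGCTAGCCCACGTTGATCTTGT",
]


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(sequence, seed):
    """Return (mismatches, position, site) for the closest window to seed."""
    k = len(seed)
    windows = (
        (hamming(sequence[i:i + k], seed), i)
        for i in range(len(sequence) - k + 1)
    )
    distance, position = min(windows)  # fewest mismatches, then leftmost
    return distance, position, sequence[position:position + k]


def seed_matches(sequences, seed, max_mismatches):
    """Keep each sequence's best site if it is within max_mismatches."""
    hits = [best_hit(s, seed) for s in sequences]
    return [h for h in hits if h[0] <= max_mismatches]


sites = [site for _, _, site in seed_matches(UPSTREAM, "CACGTG", 1)]
print(sites)
```

With one mismatch allowed, every fragment contributes a site; with zero mismatches only the three exact `CACGTG` occurrences remain.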

29 Discovering the function of proteins & proteomes
Thomas D. Wu, Craig G. Nevill-Manning, Jane (Qiaojuan) Su, Jimmy Y. Huang, Steven P. Bennett, Serge Saxonov, Mike Liang, Peter Tan, Lin Lu, Jes Alexander, Serkan Apaydin, Carlos Guestrin, Russ Altman, Daphne Koller, Jean-Claude Latombe, Douglas L. Brutlag
Department of Biochemistry, Biomedical Informatics, Computer Science, Stanford University
NHGRI, HHMI