Protein Structural Motifs Search in Protein Data Base


Virginio Cantoni (1), Alessio Ferone (2), Ozlem Ozbudak (3), Alfredo Petrosino (2)
(1) Dept. of Electrical Engineering and Computer Science, Pavia Univ., Italy
(2) Dept. of Applied Science, Univ. Naples Parthenope, Italy
(3) Dept. of Electronics and Communication Engineering, Istanbul Tech. Univ., Turkey

Protein Data Bank (PDB): http://www.rcsb.org/pdb/

Levels of protein structure representation: primary structure, secondary structure, tertiary structure, quaternary structure.

Primary structure: the sequence of amino acids.

Secondary structures. Three basic components: α-helix, β-sheet, and loops (linear connections between the components).

The α-helices. One of the most closely packed arrangements of residues. About 40% of the residues in globular proteins are in helices.

The β-sheet. A loosely packed arrangement of residues. Types: parallel, antiparallel, twisted.

Secondary Structures Representation. Secondary structures are represented as linear vectors (segments): the axis for a helix and the best-fit segment for a sheet. An alignment algorithm matches each helix segment against helices with known axes to determine the helix axis. Sheet strands are fitted directly with segments.

Secondary Structure Determination. Programs: DSSP and STRIDE. On average, 4.8% of the target residues were assigned differently, and this figure reached 12% for certain targets.

Protein Structure Comparison. Given a new protein, what are the most similar folds in the PDB?

Secondary structure representation. Each secondary structure is displayed as a cylinder. The protein is represented by an ordered sequence of cylinders, each carrying one of two labels: helix or sheet.

GHT applied to proteins. For every protein, two quantities are first calculated for each secondary structure: its distance from a reference point (RP, e.g. the geometric center of the protein), and the angle (theta) between the direction of the secondary structure in 3D space and the segment linking the center of that secondary structure with the RP. These values form the GH reference table (RT).
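The reference-table computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `reference_table`, the `(midpoint, direction)` input format, the name `rho` for the distance, and the use of the mean of the segment midpoints as the geometric center are all assumptions made for the example.

```python
import math

def reference_table(segments, rp=None):
    """Return one (rho, theta) pair per secondary structure.

    segments: list of (midpoint, direction) pairs, each a 3-tuple of floats
              (hypothetical input format).
    rp:       reference point; if omitted, the mean of the midpoints is used
              as a stand-in for the geometric center (an assumption).
    """
    if rp is None:
        n = len(segments)
        rp = tuple(sum(mid[k] for mid, _ in segments) / n for k in range(3))

    table = []
    for mid, direction in segments:
        # Vector linking the center of the secondary structure with the RP.
        to_rp = tuple(rp[k] - mid[k] for k in range(3))
        rho = math.sqrt(sum(c * c for c in to_rp))
        dir_len = math.sqrt(sum(c * c for c in direction))
        if rho == 0.0 or dir_len == 0.0:
            theta = float("nan")  # the angle is undefined in this case
        else:
            cos_theta = sum(d * t for d, t in zip(direction, to_rp)) / (dir_len * rho)
            # Clamp to guard against rounding slightly outside [-1, 1].
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        table.append((rho, theta))
    return table
```

For two segments with midpoints (0, 0, 0) and (2, 0, 0), the default RP is (1, 0, 0), so both distances equal 1; the angle depends only on how each segment's direction relates to the line towards the RP.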

In the way of GHT (simplified 2D representation). [Figure: helices and sheets of the query protein (scaled 0.5); mapping rule; votes space.]

In the way of GHT. [Figure: helices and sheets of the query protein; mapping rule; votes space.]
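The voting step suggested by the simplified 2D figure can be sketched as below. The slides do not spell out the mapping rule, so this is a generic 2D Generalized Hough Transform under stated assumptions: each secondary structure is a tuple `(kind, x, y, phi)` with a signed orientation `phi` in radians, the votes space is a grid of square cells of side `cell`, and the names `build_r_table` and `vote` are hypothetical.

```python
import math
from collections import Counter

def build_r_table(motif, rp):
    """For each motif segment store (kind, rho, alpha): the distance to the RP
    and the direction to the RP measured relative to the segment orientation."""
    table = []
    for kind, x, y, phi in motif:
        dx, dy = rp[0] - x, rp[1] - y
        table.append((kind, math.hypot(dx, dy), math.atan2(dy, dx) - phi))
    return table

def vote(protein, r_table, cell=1.0):
    """Every protein segment votes, through each table entry of the same kind,
    for the position where the motif RP would lie."""
    votes = Counter()
    for kind, x, y, phi in protein:
        for r_kind, rho, alpha in r_table:
            if kind != r_kind:
                continue
            cx = x + rho * math.cos(phi + alpha)
            cy = y + rho * math.sin(phi + alpha)
            votes[(round(cx / cell), round(cy / cell))] += 1
    return votes

# Motif of three segments ("H" = helix, "E" = sheet strand), RP chosen at (1, 1).
motif = [("H", 0.0, 0.0, 0.0), ("E", 4.0, 0.0, math.pi / 2), ("E", 0.0, 3.0, 0.0)]
r_table = build_r_table(motif, (1.0, 1.0))

# Query protein: the motif rotated by 90 degrees and shifted by (10, 5),
# plus one unrelated helix.
protein = [
    ("H", 10.0, 5.0, math.pi / 2),
    ("E", 10.0, 9.0, math.pi),
    ("E", 7.0, 5.0, math.pi / 2),
    ("H", 20.0, 20.0, 0.3),
]
peak_cell, peak_votes = vote(protein, r_table).most_common(1)[0]
```

Because the stored angle is relative to the segment orientation, the votes of a rotated and translated copy of the motif still meet in a single cell, while mismatched pairings scatter across the votes space.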

Generalized Hough Transform (SSS). [Figure: reference point; secondary structure A. Type A: α-helix, l1, TD..]

PROTEIN 1FNB. The protein contains 22 secondary structures. Searched motif: Greek key (4 β-sheets). The red circles are the helices and the blue circles are the sheets. The cyan triangles indicate the orientation of the secondary structures. The black point is the reference point.

PROTEIN 7FAB. The protein contains 46 secondary structures. Searched motif: 3 helices and 2 sheets. The red circles are the helices and the blue circles are the sheets. The cyan triangles indicate the orientation of the secondary structures. The black point is the reference point.

SSC: Secondary Structures Co-occurrences. [Figure: pair of secondary structures A and B with the RP; labels: axis angle, coplanar lines, axis distance, midpoint distance. Type A: α-helix, l1, TD.. Type B: α-helix, l2, TP..]
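The three pair quantities named in the SSC figure (axis angle, axis distance, midpoint distance) could be computed along the following lines. The slide gives only the labels, so the definitions here are assumptions: the axis distance is taken as the shortest distance between the two infinite axis lines, and the function name `pair_features` and its argument layout are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def pair_features(mid_a, dir_a, mid_b, dir_b):
    """Return (axis_angle, axis_distance, midpoint_distance) for two segments
    given by their midpoints and (non-zero) direction vectors."""
    delta = _sub(mid_b, mid_a)

    cos_angle = _dot(dir_a, dir_b) / (_norm(dir_a) * _norm(dir_b))
    axis_angle = math.acos(max(-1.0, min(1.0, cos_angle)))

    normal = _cross(dir_a, dir_b)
    if _norm(normal) > 1e-12:
        # Skew or intersecting lines: project the midpoint offset onto the
        # common normal of the two axes.
        axis_distance = abs(_dot(delta, normal)) / _norm(normal)
    else:
        # Parallel axes: distance from one midpoint to the other line.
        axis_distance = _norm(_cross(delta, dir_a)) / _norm(dir_a)

    return axis_angle, axis_distance, _norm(delta)
```

With this definition, an axis distance of zero means the two axis lines intersect and are therefore coplanar, which may be what the "coplanar lines" label in the figure refers to.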

SST: Secondary Structures Triplets. [Figure: triangle ABC formed by three secondary structures, with the reference point; labels: normal to ABC; side lengths l_AB, l_BC, l_CA. Type A: α-helix, l1. Type B: α-helix, l2. Type C: α-helix, l3.]
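A triplet can be summarized by the quantities labelled in the figure: the three side lengths of triangle ABC and the normal to its plane. The sketch below assumes that A, B and C are the midpoints of the three secondary structures, which the slide does not state explicitly; the function name `triplet_descriptor` is hypothetical.

```python
import math

def triplet_descriptor(a, b, c):
    """Return ((l_AB, l_BC, l_CA), unit_normal) for the triangle with
    vertices a, b, c given as 3-tuples."""
    def sub(p, q):
        return tuple(x - y for x, y in zip(p, q))

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    ab, bc, ca = sub(b, a), sub(c, b), sub(a, c)
    sides = (norm(ab), norm(bc), norm(ca))

    # Normal to the plane ABC: cross product of AB and AC.
    ac = sub(c, a)
    n = (ab[1] * ac[2] - ab[2] * ac[1],
         ab[2] * ac[0] - ab[0] * ac[2],
         ab[0] * ac[1] - ab[1] * ac[0])
    length = norm(n)
    if length == 0.0:
        raise ValueError("A, B and C are collinear: the normal is undefined")
    return sides, tuple(x / length for x in n)
```

The side lengths are invariant under rotation and translation of the protein, so they can serve as a lookup key, while the normal fixes the orientation needed to place the reference point relative to the triangle.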

4 SSs motif: Terns co-occurrence. Four terns: ABC, ACD, BCD, DAB. [Figure: secondary structures A, B, C, D with the reference point. Type A: α-helix, l1. Type B: β-strand, l2. Type C: α-helix, l3. Type D: β-strand, l4.]

5 SSs motif: Terns co-occurrence. Ten terns: ABC, ACD, BCD, BDE, CDE, CEA, DEA, DAB, EAB, EBC. [Figure: secondary structures A, B, C, D, E with the reference point. Type A: α-helix, l1. Type B: α-helix, l2. Type C: α-helix, l3. Type D: α-helix, l4. Type E: α-helix, l5.]
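The tern counts on these two slides are the number of ways to choose 3 secondary structures out of 4 or 5. A short check, using only the standard library (the helper name `terns` is made up for the example):

```python
from itertools import combinations
from math import comb

def terns(labels):
    """All unordered triples of secondary-structure labels, as sorted strings."""
    return ["".join(t) for t in combinations(sorted(labels), 3)]

four = terns("ABCD")    # terns of a 4-SS motif
five = terns("ABCDE")   # terns of a 5-SS motif

# Terns as listed on the 5-SS slide, with the letters of each tern sorted
# so that, for example, CEA and ACE compare equal.
slide_five = {"".join(sorted(t)) for t in
              ["ABC", "ACD", "BCD", "BDE", "CDE", "CEA", "DEA", "DAB", "EAB", "EBC"]}

counts_match = len(four) == comb(4, 3) and len(five) == comb(5, 3)
slide_list_complete = slide_five == set(five)
```

The number of terns grows as n(n-1)(n-2)/6 with the number n of secondary structures in the motif.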

PROTEIN 1FNB. [3D plot with axes x, y, z. Legend: motif RP, RP for SSC, RP for SST, RP for MDM.] The protein contains 22 secondary structures. Searched motif: Greek key (4 β-sheets). The red circles are the helices and the blue circles are the sheets; the motif SSs are in bold.

PROTEIN 7FAB. [3D plot with axes x, y, z. Legend: motif RP, RP for SSC, RP for SST, RP for MDM.] The protein contains 46 secondary structures. Searched motif: 3 helices and 2 sheets. The red circles are the helices and the blue circles are the sheets; the motif SSs are in bold.

Searching performances. Searching a Greek key motif (4 SSs, all β-sheets) in 1FNB. Searching a motif with 5 SSs (3 helices and 2 sheets) in 7FAB.

PV Benchmark (20 proteins)

PV Benchmark: basic features. [Plot: number of candidate motifs (logarithmic scale, 10 to 10000000) versus number of secondary structures; series: 3ss, 4ss, 5ss.]

PV Benchmark: performances. [Plot: searching time (msec) versus number of secondary structures; series: SST and SSC for 3SS, 4SS and 5SS motifs.]

Average performances

SSC
Number of proteins   SSs per motif   Total number of motifs   Total searching time (sec)   Average searching time per motif (msec)
20                   3               105971                   119.882                      1.1
20                   4               918470                   1275.585                     1.4
20                   5               6455009                  11261.911                    1.7

SST
Number of proteins   SSs per motif   Total number of motifs   Total searching time (sec)   Average searching time per motif (msec)
20                   3               105971                   768.508                      7.3
20                   4               918470                   10303.806                    11.2
20                   5               6455009                  111809.428                   17.3
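The last column of each table follows from the two columns before it: total searching time divided by the total number of motifs, converted from seconds to milliseconds. The snippet below recomputes it from the figures above; the dictionary layout and the name `average_ms` are only illustrative.

```python
# (total number of motifs, total searching time in seconds), keyed by SSs per motif.
results = {
    "SSC": {3: (105971, 119.882), 4: (918470, 1275.585), 5: (6455009, 11261.911)},
    "SST": {3: (105971, 768.508), 4: (918470, 10303.806), 5: (6455009, 111809.428)},
}

def average_ms(method, ss_per_motif):
    """Average searching time per motif in milliseconds, rounded to 0.1 msec."""
    motifs, total_sec = results[method][ss_per_motif]
    return round(total_sec * 1000.0 / motifs, 1)

for method in ("SSC", "SST"):
    for k in (3, 4, 5):
        print(method, k, average_ms(method, k))
```

On these figures, the SST search takes roughly six to ten times longer per motif than the SSC search, and the gap widens as the motif size grows.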