A Combination of a Functional Motif Model and a Structural Motif Model for a Database Validation


Minoru Asogawa, Yukiko Fujiwara, Akihiko Konagaya
Massively Parallel Systems NEC Laboratory, RWCP*
4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216, Japan
asogawa@csl.cl.nec.co.jp, yukiko@csl.cl.nec.co.jp, konagaya@csl.cl.nec.co.jp

Abstract

This paper reports results from a study on database validation for a leucine zipper motif that uses both a functional motif model and a structural motif model. The leucine zipper motif serves as the worked example for this method. It is a subsequence consisting of two twisted alpha helix sequences, preceded by a DNA binding site. The functional motif model is an HMM (Hidden Markov Model) trained with leucine zipper subsequences. The structural motif model is a neural network trained to classify alpha helix regions. Only 122 such leucine zipper sequences are in the Swiss Protein database (Release 22), so the HMM may not learn the general mechanism of helical structures completely. Therefore, a structural motif model for an alpha helix is used to eliminate non-helical sequences. Fortunately, numerous secondary structure examples are available in the PDB database. All polypeptides in the Swiss Protein database Release 22 are examined with a combination of an HMM and a neural network. For predicting a leucine zipper region, an HMM achieved percent, and this improved to percent when a neural network was combined with it.

1 Introduction

Predicting a motif from protein sequences is an important problem. Motifs are the sites conserved during evolution, and they are considered to represent the function or structure of proteins. The motif prediction problem grows in importance as more protein sequences are revealed, because sequencing far outpaces the understanding of structures.
*RWCP: Real World Computing Partnership

Until recently, a symbolic pattern representation was used to represent a functional motif. For example, the leucine zipper motif is a well-known motif of DNA binding proteins. Its pattern is L-X(6)-L-X(6)-L-X(6)-L-X(6)-L, a repetition of leucine followed by any six residues. One issue in motif representation is exception handling, which arises from the variety of amino acid sequences. Konagaya [1] represented the exceptions in a motif with a stochastic decision predicate. This predicate consists of conjunctive and disjunctive patterns and their probability parameters. However, a pattern representation cannot achieve satisfactory classification accuracy. For example, the accuracy for leucine zipper motifs is percent. The reason is that proteins usually have different sequences in different species, even around motifs. In leucine zipper motifs, the repeated Ls (Leu) tend to change to other amino acids, such as V (Val), A (Ala) and M (Met). Such variations are considered to be related to the evolution of organisms. These variations might therefore reflect systematic relationships, i.e., the amino acid at one residue varies in relation to the neighboring residues. These systematic relationships represent biological characteristics, and an HMM can represent them.

Another aspect of motifs is that they contain specific structural motifs, that is, secondary structures such as an alpha helix or a beta strand. In the leucine zipper motif, two twisted alpha helix sequences are bonded by leucines (or perhaps other similar amino acids). An HMM is implicitly trained to represent structural motifs, but not enough motif data is available for structural motif learning. Consequently, an HMM might accept a sequence that resembles the leucine zipper motif but does not form a helical structure.
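The symbolic pattern above can be checked mechanically. The sketch below is not the authors' code. It translates the restricted notation used in this paper into a Python regular expression: single residues joined by "-", with X(n) meaning "any n residues". The helper names `pattern_to_regex` and `find_pattern` are hypothetical, and only this small subset of PROSITE-style notation is handled.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a restricted pattern such as 'L-X(6)-L' into a regular expression.

    Only two element kinds are handled (an assumption of this sketch):
    a single residue letter, or X(n), meaning "any n residues".
    """
    parts = []
    for element in pattern.split("-"):
        gap = re.fullmatch(r"[Xx]\((\d+)\)", element)
        if gap:
            parts.append(".{%d}" % int(gap.group(1)))
        elif re.fullmatch(r"[A-Z]", element):
            parts.append(element)
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


def find_pattern(sequence: str, pattern: str) -> list:
    """Return (start, matched subsequence) for every occurrence, overlapping ones included."""
    # A zero-width lookahead lets finditer report matches that share residues.
    regex = re.compile("(?=(%s))" % pattern_to_regex(pattern))
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


LEUCINE_ZIPPER = "L-X(6)-L-X(6)-L-X(6)-L-X(6)-L"

if __name__ == "__main__":
    # Hypothetical sequence: six leucines spaced seven residues apart.
    seq = "L" + "AAAAAAL" * 5
    print(find_pattern(seq, LEUCINE_ZIPPER))
```

On this synthetic sequence the pattern is found twice, at offsets 0 and 7, because six evenly spaced leucines contain two overlapping five-leucine windows. Replacing any one of the required leucines by V, A or M makes the match fail. That rigidity is the weakness the text attributes to symbolic patterns.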
A neural network is used to predict structural motifs, i.e., secondary structures. A large amount of secondary structure data is available for training such a network. To achieve high classification accuracy, the two models are combined into one system: a functional motif modeled with an HMM and a structural motif modeled with a neural network (Fig. 1).

Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS '95), IEEE

Figure 1: Motif prediction outline (components: protein database, protein sequences belonging to a certain category, iterative duplication learning method, motif represented by an HMM, DSSP secondary structure database, neural network learning, category prediction for an unknown sequence)

It is desirable to extract a motif's biological characteristics from the training data alone. When the training sequences come from two different subgroups or families, the resulting HMM topology is expected to branch into two parts. For this purpose, general HMMs containing global loops are needed, instead of the left-to-right models commonly used in speech recognition. One problem to solve is therefore determining the HMM topology, because a general HMM admits a great many candidate topologies.

One way to determine the HMM topology is to train a large fully-connected HMM and delete negligible transitions. However, an HMM derived from a fully-connected one may be very complex and difficult to interpret. Moreover, optimizing its numerous free parameters takes a huge amount of training time.

The iterative duplication method is therefore used for generating an HMM [2] [3]. The method obtains an optimal HMM topology for the given training sequences, as well as optimal HMM parameters for that network. It starts from a small fully-connected network and iterates network generation and parameter optimization. Network generation prunes transitions and adds a state to the previous topology. This method obtains a simpler HMM topology, in less time, than starting from a fully-connected model.

This paper is organized as follows. First, the authors explain HMMs, followed by the iterative duplication method for HMM learning. After that, the authors explain the neural network. Then, the experimental results of leucine zipper motif prediction using only an HMM are given. Finally, the authors show the performance improvement achieved by combining an HMM and a neural network.

2 HMMs

2.1 Overview

An HMM is a nondeterministic finite state automaton that represents a Markov process. HMMs are commonly used in speech recognition [11]. Recently they have also been applied to protein structure grammar estimation [5] and protein modeling [7], [6]. An HMM is characterized by a network with a finite number of states and parameters (see Fig. 2). The parameters are initial probabilities, transition probabilities and observation probabilities. At each discrete instant of time, the process is assumed to be in one state. An observation (or output symbol) is generated according to the observation probability of the current state. The state then changes according to its transition probability.

Figure 2: An example of an HMM (left-to-right)

A special type of HMM, called a left-to-right model (Fig. 2), is commonly used in speech recognition. In this model, states are linearly connected with self-loop transitions, and a state visited once is never revisited later. The reason is that speech recognition has little need to deal with periodic structures. Such periodic structures are, however, rather common in amino acid sequences and are of great significance for forming a geometric structure. The authors therefore adopt a general HMM containing global loops.

The correspondence between motifs and HMMs is as follows. The training set consists of the portions of amino acid sequences that share the same structure or function. The HMM is expected to model the training proteins discriminatively. The alphabet of output symbols corresponds to the 20 amino acids. A test sequence is the portion of an amino acid sequence that might have the target structure or function. The result is the likelihood of the test sequence, calculated by tracing all possible transition paths in the HMM that emit the test sequence.

To use a trained HMM as a classifier, the authors define a threshold value based on the Z score [7], computed from both positive and negative examples. The probability generated for a given sequence is compared with this threshold value. The threshold is obtained by approximating the positive and the negative likelihood distributions as normal distributions. The threshold value for the trained neural network is likewise determined from both positive and negative examples. A great advantage of using HMMs and neural networks is that similarity between the test sequence and the training set can be quantified by comparing their likelihoods on the HMM.

3 Motif Prediction using an HMM

3.1 Learning Algorithm

An iterative duplication method [2] is used to obtain the optimal HMM topology for the given training sequences. This method also produces the optimal HMM parameters for that network.
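Both the Baum-Welch parameter optimization used in this method and the classifier of Section 2 rely on one quantity. That quantity is the likelihood of a sequence summed over all transition paths. The sketch below computes it with the standard forward algorithm under stated assumptions, and it is not the authors' implementation. States are numbered 0..n-1. The containers `init`, `trans` and `emit` hold the initial, transition and observation probabilities. There is no explicit final state, and each step is rescaled to avoid numerical underflow. The model constants are hypothetical.

```python
import math


def forward_log_likelihood(seq, init, trans, emit):
    """Return log P(seq | HMM), summing over every state path (forward algorithm).

    init[i]     : probability that the process starts in state i
    trans[i][j] : probability of a transition from state i to state j
    emit[i]     : dict mapping an amino-acid letter to its observation probability
    Assumes no explicit final state; each prefix is rescaled to avoid underflow.
    """
    n = len(init)
    log_lik = 0.0
    alpha = []
    for t, symbol in enumerate(seq):
        if t == 0:
            # Start in state j and emit the first symbol there.
            alpha = [init[j] * emit[j].get(symbol, 0.0) for j in range(n)]
        else:
            # Reach state j from any state i, then emit the current symbol in j.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j].get(symbol, 0.0)
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # no state path can produce this sequence
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)  # the product of the scales equals P(seq)
    return log_lik


# Hypothetical two-state model over a reduced alphabet, for illustration only.
EXAMPLE_INIT = [0.6, 0.4]
EXAMPLE_TRANS = [[0.7, 0.3], [0.4, 0.6]]
EXAMPLE_EMIT = [{"L": 0.5, "E": 0.5}, {"L": 0.1, "E": 0.9}]

if __name__ == "__main__":
    print(forward_log_likelihood("LEL", EXAMPLE_INIT, EXAMPLE_TRANS, EXAMPLE_EMIT))
```

Summing over paths, rather than taking only the best path, matches the text's description of tracing all possible transition paths. According to the text, the resulting likelihood is normalized into a Z score before it is compared with the threshold. That normalization is not shown here.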
The method consists of transition network generation and parameter optimization, and is summarized in Fig. 4. It starts from a small fully-connected network. To avoid converging to a local maximum, many initial HMMs with random parameters are prepared. The Baum-Welch algorithm is used for parameter optimization. Network generation copies one node selected from the current network. The method iterates the network generation and parameter optimization phases until sufficient discrimination accuracy is obtained.

Network generation works as follows. First, delete the transitions with negligible probability, that is, less than a threshold $\delta$. This threshold is the maximum of a smoothing value $\epsilon$ and a term given by the convergence radius $r$. Next, for each state $S_i$ except the final state, count its incoming and outgoing transitions, that is, the number of transitions from $S_i$ plus the number of transitions to $S_i$. Then select the state with the largest count (denoted $S_c$) and make a copy of it (denoted $S_{new}$), so that $S_{new}$ has the same transitions as $S_c$. If $S_c$ has a self-loop, $S_{new}$ also has a self-loop, together with transitions from $S_c$ to $S_{new}$ and from $S_{new}$ to $S_c$ (see Fig. 3).

Deleting negligible transitions restricts the space of network topologies and thereby reduces the training cost of parameter optimization. The most connected node is split because it might represent an overlap of independent states, and in that case splitting can make the network topology simpler. Fig. 3 shows an example of such a case. In Fig. 3 (a), the most connected state is the hatched state, which outputs E (Glu) with probability 0.26, Q (Gln) with probability 0.15, and so on. Splitting this state into two states yields a new network with additional transitions, drawn as bold lines (see Fig. 3 (b)).
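The network-generation step just described can be sketched as follows. This is an illustration under assumptions, not the authors' code. States are integers 0..n-1, and the topology is a dict from (source, destination) pairs to probabilities. `delta` stands for the pruning threshold δ, and ties in connectivity go to the lowest state index. Only the edge set is returned, since the new transitions' probabilities are left to the subsequent Baum-Welch optimization. The function name and the example topology are hypothetical.

```python
from collections import Counter


def duplicate_most_connected(trans, n_states, final_state, delta):
    """One network-generation step of iterative duplication (topology only).

    trans       : dict mapping (src, dst) state pairs to transition probabilities
    n_states    : states are assumed to be numbered 0 .. n_states - 1
    final_state : the final state, which is never duplicated
    delta       : pruning threshold; transitions below it are deleted
    Returns (copied state S_c, new state S_new, set of resulting edges).
    """
    # 1. Delete negligible transitions.
    edges = {edge for edge, p in trans.items() if p >= delta}

    # 2. Count outgoing plus incoming transitions (a self-loop counts as both).
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    # 3. Pick the most connected non-final state; ties go to the lowest index (assumption).
    candidates = [s for s in range(n_states) if s != final_state]
    s_c = max(candidates, key=lambda s: (degree[s], -s))
    s_new = n_states

    # 4. Give S_new the same incoming and outgoing transitions as S_c.
    new_edges = set(edges)
    for src, dst in edges:
        if dst == s_c and src != s_c:
            new_edges.add((src, s_new))
        if src == s_c and dst != s_c:
            new_edges.add((s_new, dst))

    # 5. A self-loop on S_c yields a self-loop on S_new plus links in both directions.
    if (s_c, s_c) in edges:
        new_edges |= {(s_new, s_new), (s_c, s_new), (s_new, s_c)}
    return s_c, s_new, new_edges


# Hypothetical 3-state topology: state 2 is final, and (0, 2) falls below delta.
EXAMPLE_TOPOLOGY = {(0, 1): 1.0, (0, 2): 0.001, (1, 1): 0.6, (1, 2): 0.4}

if __name__ == "__main__":
    print(duplicate_most_connected(EXAMPLE_TOPOLOGY, n_states=3, final_state=2, delta=0.01))
```

In the example, state 1 has the most connections (four, counting its self-loop twice), so it is copied into state 3. In the text's procedure, all parameters are then re-optimized for the enlarged topology. Transitions that become negligible are pruned again in the next step, which is how the network in Fig. 3 (c) becomes simpler.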
However, the network can be simplified if most of the new transitions become negligible after parameter optimization (see Fig. 3 (c); transitions unrelated to the explanation are omitted from Fig. 3). In each epoch, this algorithm produces an optimal HMM for the training data for each number of states. Selecting the HMM with the highest prediction accuracy gives the optimal number of states for the given data.

3.2 Prediction using an HMM

Prediction compares the Z score of a sequence, which is a normalized likelihood, with a threshold value. If a sequence achieves a higher likelihood than the threshold value, it is predicted to have the target motif, that is, the same structure and/or function. In the current implementation, the threshold value is based on the Z score [7] and is derived from both positive and negative examples. Accuracy $E_{acc}$ is defined as follows:

$$E_{acc} = \frac{1}{2}\left(\frac{C^{+}}{N^{+}} + \frac{C^{-}}{N^{-}}\right) \qquad (1)$$

where $N^{+}$ is the number of positive examples, $N^{-}$ is the number of negative examples, $C^{+}$ is the number of correctly classified positive examples, and $C^{-}$ is the number of correctly classified negative examples. The positive and negative accuracies are each normalized by the number of examples in their class. Negative examples far outnumber positive ones, so without this normalization the negative examples alone would determine the threshold value.

The threshold value for classification is determined as follows. Both the positive and the negative likelihood distributions are assumed to be normal. Indeed, Fig. 5 shows the resulting likelihoods for the close data, and each distribution is almost normal. Under this assumption, Eq. (1) is approximated as

$$E_{acc}(\alpha) \approx \frac{1}{2}\left(\int_{\alpha}^{\infty} P(x;\mu_{pos},\sigma_{pos})\,dx + \int_{-\infty}^{\alpha} P(x;\mu_{neg},\sigma_{neg})\,dx\right) \qquad (2)$$

where $P(\cdot;\mu,\sigma)$ is the normal probability density function with mean $\mu$ and standard deviation $\sigma$. Here $\mu_{neg}$ and $\sigma_{neg}$ are the mean and standard deviation of the negative examples, and $\mu_{pos}$ and $\sigma_{pos}$ are those of the positive examples. To obtain the value of $\alpha$ that maximizes $E_{acc}$, the partial derivative of Eq. (2) with respect to $\alpha$ is set to 0, which gives

$$P(\alpha;\mu_{pos},\sigma_{pos}) = P(\alpha;\mu_{neg},\sigma_{neg}) \qquad (3)$$

and the threshold value $\alpha$ is obtained by solving Eq. (3).

Figure 3: A part of learning. (Left) A general rule for a duplication. (Right) An example of the step from six to seven states: (a) the resulting HMM with 6 states after parameter optimization and negligible transition deletion; (b) a new network obtained by copying the hatched state; (c) the HMM with 7 states obtained after parameter optimization and negligible transition deletion.

Figure 4: Iterative Duplication Method
  input: (protein) sequences and a small fully-connected HMM.
  initialization: optimize parameters for the HMM; choose the best HMM on likelihood as a seed.
  repeat
    network generation: delete negligible transitions; copy the most connected state.
    parameter optimization: optimize parameters for the new topology; choose the best HMM on likelihood.
  until sufficient accuracy is obtained.
  output: the resulting HMM.

Figure 5: Z Score Histogram for Close Data

4 Secondary Structure Prediction using Neural Network

The neural network is a multi-layered perceptron with 3 layers: an input layer, a hidden layer and an output layer. It observes 15 residues simultaneously [12]. Each residue is coded as a binary string of length 20, and an unknown residue sets all 20 of its input cells to 0.05. Thus, the input layer has 300 cells. The hidden layer has 10 cells. The output layer has two cells, one for each prediction category: helix and non-helix region (Fig. 6). This classification refers to the secondary structure of the residue at the center of the input window.

Figure 6: Neural Network Architecture

Learning uses the backpropagation algorithm and continues until sufficient discrimination accuracy is obtained. After 300 epochs, the classification performance reaches percent for the learning data and percent for the test data. In this performance evaluation, the output cell with the higher activation is taken as the answer category. Note that in this experiment the neural network predicts an alpha helix region longer than 15 residues. This differs from the usual application of a neural network, which predicts 3 classes of secondary structure, and explains why this network performs so well.

To apply the trained neural network as an alpha helix region predictor, its output activations are compared with the helix region teacher signal, giving a normalized squared error. A small normalized squared error therefore implies that the current subsequence might have an alpha helix structure.

5 Experiments

5.1 Training Data and Test Data

For predicting a leucine zipper motif with an HMM, 112 positive examples were chosen from the Swiss Protein database Release 22 [10]. They are the subsequences annotated as leucine zipper (or leucine zipper-like). The positive examples are short sequences of length 15, 22, 29 and 36, in proportions of 7.14, 56.25, and 4.46 percent, respectively. Negative examples were randomly selected from the Swiss Protein database Release 22, using only proteins without a leucine zipper annotation. The ratio of sequence lengths is controlled to match that of the positive examples. A randomly selected 80 percent of the positive subsequences are used for training, and the remaining positive examples are used for prediction performance evaluation.

To employ a trained HMM as a classifier, a Z score threshold value is determined from both positive and negative examples. A randomly selected 20 percent of the negative examples were used to determine the threshold value. The remaining 80 percent of the negative examples are used for testing.

To train the neural network, alpha helix subsequences are chosen from the PDB database of August 1992. In practice, the DSSP program determines the secondary structures from the PDB data. Alpha helix subsequences longer than 15 residues are chosen as positive training data. Neither 3₁₀ helices nor π helices are included in the positive data. All subsequences with no helix region within the input window (15 residues) are chosen as negative data. A randomly selected 70 percent of the training subsequences are used for learning, and the remaining data are used for prediction performance evaluation.

To evaluate the performance of the symbolic representation, sequences satisfying the L-X(6)-L-X(6)-L-X(6)-L-X(6)-L pattern, a repetition of leucine and any six residues [11], are used. These sequences resemble the positive examples in terms of symbolic representation, but only some of them are true leucine zipper sequences. Negative data (non-leucine zipper sequences) vastly outnumber leucine zipper sequences, so the symbolic representation yields an accuracy as high as percent.
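The threshold rule described in Section 3.2 can be sketched in Python. The rule fits a normal distribution to the positive scores and another to the negative scores, then maximizes the normalized accuracy. This sketch is an illustration, not the authors' code. The assumptions are as follows. Scores above the threshold count as positive. Setting the derivative of the approximated accuracy to zero means the two normal densities are equal. Taking logarithms of that equality gives a quadratic, and among its real roots the sketch keeps the one with the higher approximate accuracy. All function names are hypothetical.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def approx_accuracy(alpha, mu_pos, sd_pos, mu_neg, sd_neg):
    """Mean of positive and negative accuracy under the normal approximations."""
    positive_ok = 1.0 - normal_cdf(alpha, mu_pos, sd_pos)  # positives scored above alpha
    negative_ok = normal_cdf(alpha, mu_neg, sd_neg)        # negatives scored at or below alpha
    return 0.5 * (positive_ok + negative_ok)


def gaussian_threshold(mu_pos, sd_pos, mu_neg, sd_neg):
    """Alpha where the two normal densities are equal, choosing the best stationary point.

    log N(a; mu_pos, sd_pos) = log N(a; mu_neg, sd_neg) rearranges to
    qa * a**2 + qb * a + qc = 0 with the coefficients below.
    """
    qa = 1.0 / (2 * sd_neg ** 2) - 1.0 / (2 * sd_pos ** 2)
    qb = mu_pos / sd_pos ** 2 - mu_neg / sd_neg ** 2
    qc = (mu_neg ** 2 / (2 * sd_neg ** 2) - mu_pos ** 2 / (2 * sd_pos ** 2)
          + math.log(sd_neg / sd_pos))
    if abs(qa) < 1e-12:  # (nearly) equal variances: the equation is linear
        if qb == 0.0:
            raise ValueError("identical distributions: no unique threshold")
        roots = [-qc / qb]
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            raise ValueError("the two densities never cross")
        root = math.sqrt(disc)
        roots = [(-qb + root) / (2 * qa), (-qb - root) / (2 * qa)]
    return max(roots, key=lambda a: approx_accuracy(a, mu_pos, sd_pos, mu_neg, sd_neg))


def threshold_from_scores(pos_scores, neg_scores):
    """Fit a normal distribution to each score list, then compute the threshold."""
    return gaussian_threshold(statistics.mean(pos_scores), statistics.stdev(pos_scores),
                              statistics.mean(neg_scores), statistics.stdev(neg_scores))
```

With equal standard deviations the threshold reduces to the midpoint of the two means. With unequal spreads the densities cross twice. One crossing lies far out in a tail, and choosing by approximate accuracy discards it. Because both class accuracies are weighted equally, the huge number of negative examples does not pull the threshold toward the negative distribution.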
5.2 Evaluation with a Symbolic Pattern

To measure the performance of the symbolic motif representation, subsequences with lengths of 15, 22, 29 and 36 are collected from the Swiss Protein database. Each is tested to determine whether it satisfies the leucine zipper representation, L-X(6)-L-X(6)-L-X(6)-L-X(6)-L. This classification is validated against the annotation comments in the database, and the result is shown in Table 1. To make the evaluation fair, partial matches are counted separately. A selected subsequence counts as a "partial match" as long as it lies within the correct region, as determined by the database annotation.

Table 1: Prediction Accuracy with a Symbolic Pattern (perfect match and partial match; correct, total and percentage for positive and negative data, and their average)

5.3 Experimental Results with HMM

Table 2: Prediction accuracy (leucine zipper)

          training     test
          pos. data    pos. data    neg. data    average
  test0   98.9%        81.8%        83.0%        82.4%
  test1                91.3%        65.2%        78.2%
  test2   98.9%        68.2%        83.9%        76.1%
  test3   98.9%        87.0%        78.6%        82.8%
  test4   98.9%        77.3%        75.9%        76.6%
  Ave.    99.1%        81.3%        77.3%        79.3%

Table 2 shows the result of cross-validation for leucine zipper motifs. To contrast the ability of an HMM, 112 carefully selected negative examples are used. These are sequences that contain the leucine zipper pattern L-X(6)-L-X(6)-L-X(6)-L-X(6)-L. The positive data are divided into 5 groups and tested with both negative and positive data. The average prediction accuracy is 99.1 percent for the training data and 79.3 percent for the test data (81.3 percent for positive data and 77.3 percent for negative data). To emphasize the HMM's performance, note that every sequence in this negative test set satisfies the L-X(6)-L-X(6)-L-X(6)-L-X(6)-L pattern. On this test data, the symbolic pattern reaches an average prediction accuracy of just 14.8 percent: 29.5 percent for the positive data and 0.0 percent for the negative data.

Fig. 7 shows an HMM for leucine zipper motifs obtained with the iterative duplication method. This HMM contains global loops corresponding to the helix structure of the leucine zipper motif. Such a helical structure, shown in Fig. 8, arises from the presence of seven amino acids per two helical turns, because a pair of aligned leucines forms a zipper-like structure. The right side of Fig. 8 shows this characteristic as a helical wheel viewed from the top. The wheel shows many hydrophobic amino acids on one side, around the combined leucines, and many hydrophilic amino acids on the other. This distribution of hydrophilic and hydrophobic amino acids is key to forming two twisted helices.

Figure 8: (Left) Biological structure of a leucine zipper motif (DNA-binding site and helix). (Right) The helical wheel observed from the top (hydrophobic and hydrophilic sides).

In Fig. 7, each circle on the right corresponds to one HMM path and has a characteristic similar to the helical wheel in Fig. 8. The characters on the circles are the most frequently observed amino acids in each state. To make the helical wheels visible, hydrophobic amino acids, namely I (Ile), V (Val), L (Leu), F (Phe) and C (Cys), are drawn in bold letters. Hydrophilic amino acids, namely R (Arg), K (Lys), N (Asn), D (Asp), Q (Gln), E (Glu) and H (His), are drawn in hatched letters. The others, M (Met), A (Ala), G (Gly), T (Thr), S (Ser), W (Trp), Y (Tyr) and P (Pro), are drawn in broken letters. These circles show three kinds of helical wheels.

Figure 7: Extracted HMM for a leucine zipper motif. (Left) An HMM (leucine zipper). (Right) The helical wheel, i.e., helices observed from the top along the HMM paths. Hydrophobic, hydrophilic and other amino acids are drawn with bold, hatched and broken letters, respectively. The characters on the circles are the most frequently observed amino acids at each state.
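The "seven residues per two turns" periodicity behind these helical wheels can be made concrete with a heptad profile. The sketch below is an illustration, not the authors' analysis. It places each residue on a helical wheel at 720/7 degrees per residue, which is seven residues per two turns. It then assigns heptad positions a to g from an assumed register offset and reports the fraction of hydrophobic residues at each position. The hydrophobic set is I, V, L, F, C, the set drawn in bold in Fig. 7. The function names and the synthetic sequence are hypothetical.

```python
HYDROPHOBIC = set("IVLFC")          # residues drawn in bold in Fig. 7
HEPTAD = "abcdefg"
DEGREES_PER_RESIDUE = 720.0 / 7.0   # seven residues per two helical turns


def wheel_angles(seq):
    """Angle (degrees) of each residue on the helical wheel, first residue at 0."""
    return [(res, (i * DEGREES_PER_RESIDUE) % 360.0) for i, res in enumerate(seq)]


def heptad_hydrophobic_fraction(seq, offset=0):
    """Fraction of hydrophobic residues at each heptad position a..g.

    Residue index `offset` is assigned to position 'a' (an assumed register).
    """
    hits = dict.fromkeys(HEPTAD, 0)
    totals = dict.fromkeys(HEPTAD, 0)
    for i, res in enumerate(seq):
        pos = HEPTAD[(i - offset) % 7]
        totals[pos] += 1
        if res in HYDROPHOBIC:
            hits[pos] += 1
    return {p: (hits[p] / totals[p] if totals[p] else 0.0) for p in HEPTAD}


if __name__ == "__main__":
    synthetic = "IEELKKR" * 4  # hypothetical heptad repeat: I at 'a', L at 'd'
    print(heptad_hydrophobic_fraction(synthetic))
```

For the synthetic repeat, positions a and d are fully hydrophobic and the other five positions are not. In the wheel picture, this is one hydrophobic face around the leucines with hydrophilic residues on the opposite side.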
This shows that the iterative duplication method automatically extracted the helical structures and their characteristics from the positive data.

Fig. 5 shows the Z scores of both positive and negative data for the close data. The threshold value α is determined with the method described in Section 3.2 and is drawn as a dotted line in Fig. 5. With this threshold value, the HMM achieved an accuracy $E_{acc}$ of percent on the close data. The close data consist of positive and negative data. The positive data are 80 percent of the leucine zipper subsequences and are used for HMM learning. The negative data are 20 percent of the non-leucine zipper subsequences and are used for determining the threshold value. The HMM and the threshold value α are then examined with the open data, which are used neither for HMM learning nor for threshold determination. The accuracy $E_{acc}$ is percent on the open data. Details for both are shown in Table 3. Fig. 9 shows the Z scores for the open data.

Figure 9: Z Score Histogram for Open Data. For clarity, all negative Z scores less than the threshold value are omitted from this figure.

Table 3: Prediction accuracy with an HMM (close and open data; correct, total and percentage for positive and negative data, and their average)

6 Experimental Result with an HMM and a Neural Network

Fig. 10 shows the normalized squared error distribution for the close data. The normalized squared error is closely correlated with whether the data are positive or negative. Thus, it could be used to separate the positive data from the negative data. However, some positive data show a large squared error. It is therefore much better to combine this measurement with another, orthogonal measurement, such as the HMM likelihood.

Figure 10: Normalized Squared Error Histogram for the Close Data

Fig. 11 shows a scatter plot of the Z score against the normalized squared error. A dotted line marks the threshold value α for the Z score. Careful examination of Fig. 11 shows that, for the positive data, the normalized squared error increases as the Z score decreases. Thus leucine zipper subsequences tend to show a high squared error when their likelihood is low. A solid line in this figure is drawn on this basis, and data below this line are interpreted as negative data. With this method, the number of misclassified negative examples in the close data decreases from 4394 to 3190. Consequently, the accuracy $E_{acc}$ improves from percent to percent. For the open data, the number of misclassified negative examples decreases to 6274, and the accuracy $E_{acc}$ improves from percent to percent.

Figure 11: Normalized Squared Error Distribution for Close Data (Z score versus normalized squared error)

Figure 12: Normalized Squared Error Distribution for Open Data. For clarity, all negative Z scores less than the threshold value are omitted from this figure.

Table 4: Prediction accuracy with an HMM and a Neural Network (close and open data; correct, total and percentage for positive and negative data, and their average)

7 Conclusion

An HMM can represent a stochastic motif well. However, an HMM trained with only a small number of subsequences may not learn the general mechanism completely. The secondary structure of a motif is usually well known, especially for the leucine zipper motif. In this motif, two twisted alpha helix sequences are bonded by leucines (or perhaps other similar amino acids). In the Swiss Protein database, the lengths of these alpha helix sequences are 15, 22, 29 and 36. A neural network was therefore trained with alpha helix subsequences longer than 15 residues and used to predict alpha helix and non-alpha helix regions.

In leucine zipper motif prediction, the HMM shows higher discrimination performance than a symbolic motif representation. A comparison of Tables 3 and 4 shows that combining a neural network improves prediction performance. The improvement looks small, about percent, because this experiment uses a large amount of negative data that is mostly very easy to classify. The high accuracy of the symbolic pattern method shows how easily the negative data are classified. This is because the negative data far outnumber the positive data, which consist of 112 examples. Therefore, the number of misclassified negative examples is difficult to decrease, yet combining the neural network does decrease it. Moreover, since a motif usually contains specific secondary structures, this combined method is widely applicable.

Acknowledgment

The authors would like to thank the genetic information processing group and Mr. Mamitsuka of NEC for meaningful discussions and valuable help.

References

[1] A. Konagaya and M. Kondo: Stochastic Motif Extraction using a Genetic Algorithm with the MDL Principle, HICSS-26 (1993).
[2] Y. Fujiwara and A. Konagaya: Protein Motif Extraction using Hidden Markov Model, pp. 56-64, Proceedings of Genome Informatics Workshop IV (1993).
[3] Y. Fujiwara, M. Asogawa and A. Konagaya: Protein Motif Extraction using Hidden Markov Model, to appear at ISMB 94 (1994).
[4] S. Nakagawa: Speech Recognition Using Stochastic Models, pp. 29-108, Electronic Society of Information Communication (1988).
[5] K. Asai, S. Hayamizu and K. Onizuka: HMM with Protein Structure Grammar, HICSS-26 (1993).
[6] P. Baldi, Y. Chauvin, T. Hunkapiller and M. A. McClure: Hidden Markov Models of Biological Primary Sequence Information, Neural Computation (1994).
[7] D. Haussler, A. Krogh, I. Mian and K. Sjolander: Protein Modeling using Hidden Markov Models: Analysis of Globins, HICSS-26 (1993).
[8] J. Takami and S. Sagayama: Automatic Generation of the Hidden Markov Network by Successive State Splitting, Proceedings of ICASSP (1991).
[9] A. Bairoch: PROSITE database, SwissProt Release 22 (1992).
[10] A. Bairoch: Sequence database, SwissProt Release 22 (1992).
[11] A. Aitken: Identification of Protein Consensus Sequences, Ellis Horwood Limited (1990).
[12] N. Qian and T. Sejnowski: Predicting the Secondary Structure of Globular Proteins Using Neural Network Models, Journal of Molecular Biology, 202 (1988).


More information

Residue Contact Prediction for Protein Structure using 2-Norm Distances

Residue Contact Prediction for Protein Structure using 2-Norm Distances Residue Contact Prediction for Protein Structure using 2-Norm Distances Nikita V Mahajan Department of Computer Science &Engg GH Raisoni College of Engineering, Nagpur LGMalik Department of Computer Science

More information

Problem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below.

Problem Set Unit The base ratios in the DNA and RNA for an onion (Allium cepa) are given below. Problem Set Unit 3 Name 1. Which molecule is found in both DNA and RNA? A. Ribose B. Uracil C. Phosphate D. Amino acid 2. Which molecules form the nucleotide marked in the diagram? A. phosphate, deoxyribose

More information

All Rights Reserved. U.S. Patents 6,471,520B1; 5,498,190; 5,916, North Market Street, Suite CC130A, Milwaukee, WI 53202

All Rights Reserved. U.S. Patents 6,471,520B1; 5,498,190; 5,916, North Market Street, Suite CC130A, Milwaukee, WI 53202 Secondary Structure In the previous protein folding activity, you created a hypothetical 15-amino acid protein and learned that basic principles of chemistry determine how each protein spontaneously folds

More information

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns

Folding simulation: self-organization of 4-helix bundle protein. yellow = helical turns Folding simulation: self-organization of 4-helix bundle protein yellow = helical turns Protein structure Protein: heteropolymer chain made of amino acid residues R + H 3 N - C - COO - H φ ψ Chain of amino

More information

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University Neural Networks and Applications in Bioinformatics Yuzhen Ye School of Informatics and Computing, Indiana University Contents Biological problem: promoter modeling Basics of neural networks Perceptrons

More information

Structural bioinformatics

Structural bioinformatics Structural bioinformatics Why structures? The representation of the molecules in 3D is more informative New properties of the molecules are revealed, which can not be detected by sequences Eran Eyal Plant

More information

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski.

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski. Chapter 8 One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model Abstract Despite the significant increase in computational power, molecular modeling of protein structure using

More information

Introduction to Proteins

Introduction to Proteins Introduction to Proteins Lecture 4 Module I: Molecular Structure & Metabolism Molecular Cell Biology Core Course (GSND5200) Matthew Neiditch - Room E450U ICPH matthew.neiditch@umdnj.edu What is a protein?

More information

Virtual bond representation

Virtual bond representation Today s subjects: Virtual bond representation Coordination number Contact maps Sidechain packing: is it an instrumental way of selecting and consolidating a fold? ASA of proteins Interatomic distances

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Secondary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Secondary Structure Prediction CMPS 6630: Introduction to Computational Biology and Bioinformatics Secondary Structure Prediction Secondary Structure Annotation Given a macromolecular structure Identify the regions of secondary structure

More information

Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction

Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction Seong-gon KIM Dept. of Computer & Information Science & Engineering, University of Florida Gainesville,

More information

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*

Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS* COMPUTATIONAL METHODS IN SCIENCE AND TECHNOLOGY 9(1-2) 93-100 (2003/2004) Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS* DARIUSZ PLEWCZYNSKI AND LESZEK RYCHLEWSKI BiolnfoBank

More information

Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec6:Interpreting Your Multiple Sequence Alignment Interpreting Your Multiple Sequence

More information

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,

More information

Protein Structure Prediction. christian studer , EPFL

Protein Structure Prediction. christian studer , EPFL Protein Structure Prediction christian studer 17.11.2004, EPFL Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results Definition of the problem Massive amounts of

More information

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004

CSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004 CSE 397-497: Computational Issues in Molecular Biology Lecture 19 Spring 2004-1- Protein structure Primary structure of protein is determined by number and order of amino acids within polypeptide chain.

More information

DNA Begins the Process

DNA Begins the Process Biology I D N A DNA contains genes, sequences of nucleotide bases These Genes code for polypeptides (proteins) Proteins are used to build cells and do much of the work inside cells DNA Begins the Process

More information

Protein Folding Problem I400: Introduction to Bioinformatics

Protein Folding Problem I400: Introduction to Bioinformatics Protein Folding Problem I400: Introduction to Bioinformatics November 29, 2004 Protein biomolecule, macromolecule more than 50% of the dry weight of cells is proteins polymer of amino acids connected into

More information

Immune Programming. Payman Samadi. Supervisor: Dr. Majid Ahmadi. March Department of Electrical & Computer Engineering University of Windsor

Immune Programming. Payman Samadi. Supervisor: Dr. Majid Ahmadi. March Department of Electrical & Computer Engineering University of Windsor Immune Programming Payman Samadi Supervisor: Dr. Majid Ahmadi March 2006 Department of Electrical & Computer Engineering University of Windsor OUTLINE Introduction Biological Immune System Artificial Immune

More information

Thr Gly Tyr. Gly Lys Asn

Thr Gly Tyr. Gly Lys Asn Your unique body characteristics (traits), such as hair color or blood type, are determined by the proteins your body produces. Proteins are the building blocks of life - in fact, about 45% of the human

More information

Genetic Algorithms For Protein Threading

Genetic Algorithms For Protein Threading From: ISMB-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. Genetic Algorithms For Protein Threading Jacqueline Yadgari #, Amihood Amir #, Ron Unger* # Department of Mathematics

More information

From DNA to Protein Structure and Function

From DNA to Protein Structure and Function STO-106 From DNA to Protein Structure and Function Teacher information Summary: Students model how information in the DNA base sequences is transcribed and translated to produce a protein molecule. They

More information

CHAPTER 1. DNA: The Hereditary Molecule SECTION D. What Does DNA Do? Chapter 1 Modern Genetics for All Students S 33

CHAPTER 1. DNA: The Hereditary Molecule SECTION D. What Does DNA Do? Chapter 1 Modern Genetics for All Students S 33 HPER 1 DN: he Hereditary Molecule SEION D What Does DN Do? hapter 1 Modern enetics for ll Students S 33 D.1 DN odes For Proteins PROEINS DO HE nitty-gritty jobs of every living cell. Proteins are the molecules

More information

BIRKBECK COLLEGE (University of London)

BIRKBECK COLLEGE (University of London) BIRKBECK COLLEGE (University of London) SCHOOL OF BIOLOGICAL SCIENCES M.Sc. EXAMINATION FOR INTERNAL STUDENTS ON: Postgraduate Certificate in Principles of Protein Structure MSc Structural Molecular Biology

More information

Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 29 September 2005

Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 29 September 2005 Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 9 September 005 Focus concept Purification of a novel seed storage protein allows sequence analysis and

More information

Daily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos

Daily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos Daily Agenda Warm Up: Review Translation Notes Protein Synthesis Practice Redos 1. What is DNA Replication? 2. Where does DNA Replication take place? 3. Replicate this strand of DNA into complimentary

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores.

G+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores. 1 Introduction 2 Chromosomes Topology & Counts 3 Genome size 4 Replichores and gene orientation 5 Chirochores 6 7 Codon usage 121 marc.bailly-bechet@univ-lyon1.fr Bacterial genome structures Introduction

More information

Time Series Motif Discovery

Time Series Motif Discovery Time Series Motif Discovery Bachelor s Thesis Exposé eingereicht von: Jonas Spenger Gutachter: Dr. rer. nat. Patrick Schäfer Gutachter: Prof. Dr. Ulf Leser eingereicht am: 10.09.2017 Contents 1 Introduction

More information

1. DNA replication. (a) Why is DNA replication an essential process?

1. DNA replication. (a) Why is DNA replication an essential process? ame Section 7.014 Problem Set 3 Please print out this problem set and record your answers on the printed copy. Answers to this problem set are to be turned in to the box outside 68120 by 5:00pm on Friday

More information

Hybrid Learning Algorithm in Neural Network System for Enzyme Classification

Hybrid Learning Algorithm in Neural Network System for Enzyme Classification Int. J. Advance. Soft Comput. Appl., Vol. 2, No. 2, July 2010 ISSN 2074-8523; Copyright ICSRS Publication, 2010 www.i-csrs.org Hybrid Learning Algorithm in Neural Network System for Enzyme Classification

More information

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources

APPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with

More information

X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction

X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction Tomohisa Shimasaki 1, Hiromi Yoshida 2, Shigehiro Kamitori 2 & Koji

More information

Aipotu I & II: Genetics & Biochemistry

Aipotu I & II: Genetics & Biochemistry Aipotu I & II: Genetics & Biochemistry Objectives: To reinforce your understanding of Genetics, Biochemistry, and Molecular Biology To show the connections between these three disciplines To show how these

More information

From Gene to Protein Transcription and Translation

From Gene to Protein Transcription and Translation Name: Hour: From Gene to Protein Transcription and Translation Introduction: In this activity you will learn how the genes in our DNA influence our characteristics. For example, how can a gene cause albinism

More information

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,

More information

Gene Prediction. Srivani Narra Indian Institute of Technology Kanpur

Gene Prediction. Srivani Narra Indian Institute of Technology Kanpur Gene Prediction Srivani Narra Indian Institute of Technology Kanpur Email: srivani@iitk.ac.in Supervisor: Prof. Harish Karnick Indian Institute of Technology Kanpur Email: hk@iitk.ac.in Keywords: DNA,

More information

RNA does not adopt the classic B-DNA helix conformation when it forms a self-complementary double helix

RNA does not adopt the classic B-DNA helix conformation when it forms a self-complementary double helix Reason: RNA has ribose sugar ring, with a hydroxyl group (OH) If RNA in B-from conformation there would be unfavorable steric contact between the hydroxyl group, base, and phosphate backbone. RNA structure

More information

An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis

An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis Lina L. Faller Department of Computer Science University of New Hampshire June 2008

More information

Hmwk # 8 : DNA-Binding Proteins : Part II

Hmwk # 8 : DNA-Binding Proteins : Part II The purpose of this exercise is : Hmwk # 8 : DNA-Binding Proteins : Part II 1). to examine the case of a tandem head-to-tail homodimer binding to DNA 2). to view a Zn finger motif 3). to consider the case

More information

36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L-

36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L- 36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L- 37. The essential fatty acids are A. palmitic acid B. linoleic acid C. linolenic

More information

Genetic Algorithms in Matrix Representation and Its Application in Synthetic Data

Genetic Algorithms in Matrix Representation and Its Application in Synthetic Data Genetic Algorithms in Matrix Representation and Its Application in Synthetic Data Yingrui Chen *, Mark Elliot ** and Joe Sakshaug *** * ** University of Manchester, yingrui.chen@manchester.ac.uk University

More information

Disease and selection in the human genome 3

Disease and selection in the human genome 3 Disease and selection in the human genome 3 Ka/Ks revisited Please sit in row K or forward RBFD: human populations, adaptation and immunity Neandertal Museum, Mettman Germany Sequence genome Measure expression

More information

Protein Synthesis. Application Based Questions

Protein Synthesis. Application Based Questions Protein Synthesis Application Based Questions MRNA Triplet Codons Note: Logic behind the single letter abbreviations can be found at: http://www.biology.arizona.edu/biochemistry/problem_sets/aa/dayhoff.html

More information

Proteins. Amino Acids (APK) Peptides (APK) 5/23/2012. Peptide bond. Acid. Amino

Proteins. Amino Acids (APK) Peptides (APK) 5/23/2012. Peptide bond. Acid. Amino Proteins Amino Acids (APK) Acid Amino Image courtesy of Biotech (biotech.chem.indiana.edu/pages/ protein_intro.html) Peptides (APK) Peptide bond 1 Proteins (polypeptides) Segment of a protein Peptide bonds

More information

1. DNA, RNA structure. 2. DNA replication. 3. Transcription, translation

1. DNA, RNA structure. 2. DNA replication. 3. Transcription, translation 1. DNA, RNA structure 2. DNA replication 3. Transcription, translation DNA and RNA are polymers of nucleotides DNA is a nucleic acid, made of long chains of nucleotides Nucleotide Phosphate group Nitrogenous

More information

The study of protein secondary structure and stability at equilibrium ABSTRACT

The study of protein secondary structure and stability at equilibrium ABSTRACT The study of protein secondary structure and stability at equilibrium Michelle Planicka Dept. of Physics, North Georgia College and State University, Dahlonega, GA REU, Dept. of Physics, University of

More information

Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures

Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures Biomolecules 2015, 5, 41-59; doi:10.3390/biom5010041 Article OPEN ACCESS biomolecules ISSN 2218-273X www.mdpi.com/journal/biomolecules/ Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar

More information

RNA Secondary Structure Prediction Computational Genomics Seyoung Kim

RNA Secondary Structure Prediction Computational Genomics Seyoung Kim RNA Secondary Structure Prediction 02-710 Computational Genomics Seyoung Kim Outline RNA folding Dynamic programming for RNA secondary structure prediction Covariance model for RNA structure prediction

More information

Protein Structure Databases, cont. 11/09/05

Protein Structure Databases, cont. 11/09/05 11/9/05 Protein Structure Databases (continued) Prediction & Modeling Bioinformatics Seminars Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff Computational Epidemiology Armin R. Mikler, Univ. North Texas

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS 1 CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Jean Gao at UT Arlington Mingon Kang, PhD Computer Science, Kennesaw State University 2 Genetics The discovery of

More information

Aipotu I: Genetics & Biochemistry

Aipotu I: Genetics & Biochemistry Aipotu I: Genetics & Biochemistry Objectives: To reinforce your understanding of Genetics, Biochemistry, and Molecular Biology To show the connections between these three disciplines To show how these

More information

Your Name: MID TERM ANSWER SHEET SIN: ( )

Your Name: MID TERM ANSWER SHEET SIN: ( ) MIDTERM EXAMINATION (October 23, 2008) BIOE150. Introduction to Bio-Nanoscience & Bio-Nanotechnology Professor Seung-Wuk Lee Fall Semester, 2008 0. Write down your name and the last digit of your SIN in

More information

Prot-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures

Prot-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures Mol2Net, 2015, 1(Section F), pages 1-6, Proceedings 1 SciForum Mol2Net Prot-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures Miguel de Sousa 1, *, Cristian R. Munteanu 2 and

More information

ONLINE BIOINFORMATICS RESOURCES

ONLINE BIOINFORMATICS RESOURCES Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 The larger picture.. Lower

More information

From Gene to Protein Transcription and Translation i

From Gene to Protein Transcription and Translation i How do genes influence our characteristics? From Gene to Protein Transcription and Translation i A gene is a segment of DNA that provides the instructions for making a protein. Proteins have many different

More information

Biology: The substrate of bioinformatics

Biology: The substrate of bioinformatics Bi01_1 Unit 01: Biology: The substrate of bioinformatics What is Bioinformatics? Bi01_2 handling of information related to living organisms understood on the basis of molecular biology Nature does it.

More information

2. The instructions for making a protein are provided by a gene, which is a specific segment of a molecule.

2. The instructions for making a protein are provided by a gene, which is a specific segment of a molecule. From Gene to Protein Transcription and Translation By Dr. Ingrid Waldron and Dr. Jennifer Doherty, Department of Biology, University of Pennsylvania, Copyright, 2011 1 In this activity you will learn how

More information

Cambridge International Examinations Cambridge International Advanced Subsidiary and Advanced Level

Cambridge International Examinations Cambridge International Advanced Subsidiary and Advanced Level ambridge International Examinations ambridge International Advanced Subsidiary and Advanced Level *8744875516* BIOLOGY 9700/22 Paper 2 AS Level Structured Questions October/November 2016 1 hour 15 minutes

More information

From Gene to Protein via Transcription and Translation i

From Gene to Protein via Transcription and Translation i How do genes influence our characteristics? From Gene to Protein via Transcription and Translation i A gene is a segment of DNA that provides the instructions for making a protein. Proteins have many different

More information

IMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay 1* and Mr. Suman Chakraborty

IMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay 1* and Mr. Suman Chakraborty Volume 2, No. 4, April 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay

More information

TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS. Liviu Lalescu, Costin Badica

TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS. Liviu Lalescu, Costin Badica TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS Liviu Lalescu, Costin Badica University of Craiova, Faculty of Control, Computers and Electronics Software Engineering Department, str.tehnicii, 5, Craiova,

More information

Diversity of chosen effectors in samples of Polish and Norwegian populations of Phytophthora infestans

Diversity of chosen effectors in samples of Polish and Norwegian populations of Phytophthora infestans Diversity of chosen effectors in samples of Polish and Norwegian populations of Phytophthora infestans Emil Stefańczyk 1, Marta Brylińska 1, May Bente Brurberg 2, Ragnhild Naerstad 2, Abdelhameed Elameen

More information

Basic Biology. Gina Cannarozzi. 28th October Basic Biology. Gina. Introduction DNA. Proteins. Central Dogma.

Basic Biology. Gina Cannarozzi. 28th October Basic Biology. Gina. Introduction DNA. Proteins. Central Dogma. Cannarozzi 28th October 2005 Class Overview RNA Protein Genomics Transcriptomics Proteomics Genome wide Genome Comparison Microarrays Orthology: Families comparison and Sequencing of Transcription factor

More information

7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided.

7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided. 7.014 Quiz II 3/18/05 Your Name: TA's Name: Write your name on this page and your initials on all the other pages in the space provided. This exam has 10 pages including this coversheet. heck that you

More information

Proteins typically contain 20 different. Representation of Protein-Sequence Information by Amino Acid Subalphabets

Proteins typically contain 20 different. Representation of Protein-Sequence Information by Amino Acid Subalphabets Representation of Protein-Sequence Information by Amino Acid Subalphabets Claus A. F. Andersen and Søren Brunak Within computational biology, algorithms are constructed with the aim of extracting knowledge

More information

This is the knowledge that you should understand upon completing this section:

This is the knowledge that you should understand upon completing this section: DN 11 Syllabus hecklist This is the knowledge that you should understand upon completing this section: 11.1 DN DN occurs bound to proteins in chromosomes in the nucs and as unbound DN in the mitochondria.

More information

Improving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data

Improving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data Improving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data Justin S. Hogg Department of Computational Biology University of Pittsburgh Pittsburgh, PA 15213 jsh32@pitt.edu Abstract

More information

Permutation Free Encoding Technique for Evolving Neural Networks

Permutation Free Encoding Technique for Evolving Neural Networks Permutation Free Encoding Technique for Evolving Neural Networks Anupam Das, Md. Shohrab Hossain, Saeed Muhammad Abdullah, and Rashed Ul Islam Department of Computer Science and Engineering, Bangladesh

More information

BETA STRAND Prof. Alejandro Hochkoeppler Department of Pharmaceutical Sciences and Biotechnology University of Bologna

BETA STRAND Prof. Alejandro Hochkoeppler Department of Pharmaceutical Sciences and Biotechnology University of Bologna Prof. Alejandro Hochkoeppler Department of Pharmaceutical Sciences and Biotechnology University of Bologna E-mail: a.hochkoeppler@unibo.it C-ter NH and CO groups: right, left, right (plane of the slide)

More information

Model. David Kulp, David Haussler. Baskin Center for Computer Engineering and Computer Science.

Model. David Kulp, David Haussler. Baskin Center for Computer Engineering and Computer Science. Integrating Database Homology in a Probabilistic Gene Structure Model David Kulp, David Haussler Baskin Center for Computer Engineering and Computer Science University of California, Santa Cruz CA, 95064,

More information

Cristian Micheletti SISSA (Trieste)

Cristian Micheletti SISSA (Trieste) Cristian Micheletti SISSA (Trieste) michelet@sissa.it Mar 2009 5pal - parvalbumin Calcium-binding protein HEADER CALCIUM-BINDING PROTEIN 25-SEP-91 5PAL 5PAL 2 COMPND PARVALBUMIN (ALPHA LINEAGE) 5PAL 3

More information

Research Article The Influence of Flanking Secondary Structures on Amino Acid Content and Typical Lengths of 3/10 Helices

Research Article The Influence of Flanking Secondary Structures on Amino Acid Content and Typical Lengths of 3/10 Helices International Journal of Proteomics Volume 214, Article ID 3623, 13 pages http://dx.doi.org/1155/214/3623 Research Article The Influence of Flanking Secondary Structures on Amino Acid Content and Typical

More information

Name Date of Data Collection. Class Period Lab Days/Period Teacher

Name Date of Data Collection. Class Period Lab Days/Period Teacher Comparing Primates (adapted from Comparing Primates Lab, page 431-438, Biology Lab Manual, by Miller and Levine, Prentice Hall Publishers, copyright 2000, ISBN 0-13-436796-0) Background: One of the most

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Contents Cell biology Organisms and cells Building blocks of cells How genes encode proteins? Bioinformatics What is bioinformatics? Practical applications Tools and databases

More information

Basic Bioinformatics: Homology, Sequence Alignment,

Basic Bioinformatics: Homology, Sequence Alignment, Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi

More information

Mutagenesis. Classification of mutation. Spontaneous Base Substitution. Molecular Mutagenesis. Limits to DNA Pol Fidelity.

Mutagenesis. Classification of mutation. Spontaneous Base Substitution. Molecular Mutagenesis. Limits to DNA Pol Fidelity. Mutagenesis 1. Classification of mutation 2. Base Substitution 3. Insertion Deletion 4. s 5. Chromosomal Aberration 6. Repair Mechanisms Classification of mutation 1. Definition heritable change in DNA

More information

Basic Concepts of Human Genetics

Basic Concepts of Human Genetics Basic Concepts of Human Genetics The genetic information of an individual is contained in 23 pairs of chromosomes. Every human cell contains the 23 pair of chromosomes. One pair is called sex chromosomes

More information

Bio-inspired Models of Computation. An Introduction

Bio-inspired Models of Computation. An Introduction Bio-inspired Models of Computation An Introduction Introduction (1) Natural Computing is the study of models of computation inspired by the functioning of biological systems Natural Computing is not Bioinformatics

More information

7.88J Protein Folding Problem Fall 2007

7.88J Protein Folding Problem Fall 2007 MIT OpenCourseWare http://ocw.mit.edu 7.88J Protein Folding Problem Fall 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 7.88 Lecture Notes - 8 7.24/7.88J/5.48J

More information

Green Genes: a DNA Curriculum

Green Genes: a DNA Curriculum Green Genes: a DNA Curriculum Massachusetts 4-H Program Activity #2: Build Your Own DNA Model from Paper Parts Time: 45-60 minutes Introduction to Group: Ask the group what DNA is. Make sure the key concept

More information

Automatic motif discovery in an enzyme database using a genetic algorithm-based approach

Automatic motif discovery in an enzyme database using a genetic algorithm-based approach Soft Comput (2005) DOI 10.1007/s00500-005-0490-z FOCUS D. F. Tsunoda H. S. Lopes Automatic motif discovery in an enzyme database using a genetic algorithm-based approach Published online: 10 May 2005 Springer-Verlag

More information

Baum-Welch and HMM applications. November 16, 2017

Baum-Welch and HMM applications. November 16, 2017 Baum-Welch and HMM applications November 16, 2017 Markov chains 3 states of weather: sunny, cloudy, rainy Observed once a day at the same time All transitions are possible, with some probability Each state

More information

Distributions of Beta Sheets in Proteins with Application to Structure Prediction

Distributions of Beta Sheets in Proteins with Application to Structure Prediction Distributions of Beta Sheets in Proteins with Application to Structure Prediction Ingo Ruczinski Department of Biostatistics Johns Hopkins University Email: ingo@jhu.edu http://biostat.jhsph.edu/ iruczins

More information

PROTEIN MODEL CHALLANGE

PROTEIN MODEL CHALLANGE PROTEIN MODEL CHALLANGE Lin Wozniewski lwoz@iun.edu Disclaimer i This presentation was prepared using draft rules. There may be some changes in the final copy of the rules. The rules which will be in your

More information