A Combination of a Functional Motif Model and a Structural Motif Model for a Database Validation
Minoru Asogawa, Yukiko Fujiwara, Akihiko Konagaya
Massively Parallel Systems NEC Laboratory, RWCP*
4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216, Japan
asogawa@csl.cl.nec.co.jp, yukiko@csl.cl.nec.co.jp, konagaya@csl.cl.nec.co.jp

Abstract

This paper reports results obtained from a study on database validation concerning a leucine zipper motif, utilizing both a functional motif model and a structural motif model. As an example for this method, the leucine zipper motif is chosen, a subsequence consisting of two twisted alpha helix sequences preceded by a DNA binding site. For the functional motif model, an HMM (Hidden Markov Model) trained with leucine zipper subsequences is employed. For the structural motif model, a neural network trained to classify alpha helix regions is employed. Because only 122 such leucine zipper sequences are in the Swiss Protein database (R.22), there is a possibility that the HMM could not learn the general mechanism for helical structures completely. Therefore, a structural motif model for an alpha helix is utilized to eliminate non-helical sequences. Fortunately, numerous secondary structure examples are available in the PDB database. All polypeptides in the Swiss Protein database v.22 are examined with a combination of an HMM and a neural network. For predicting a leucine zipper region, the accuracy achieved by an HMM alone is improved by combining it with a neural network.

1 Introduction

Predicting a motif from protein sequences is an important problem. Motifs, which are the sites preserved through the evolution process, are considered to represent the function or structure of proteins. This motif prediction problem increases in importance as many protein sequences are revealed, because the rate of sequencing far exceeds that of understanding the structures.
* RWCP: Real World Computing Partnership

Until recently, a symbolic pattern representation was used to represent a functional motif. For example, the pattern of the leucine zipper motif, a well-known motif for DNA binding proteins, is L-X(6)-L-X(6)-L-X(6)-L-X(6)-L, representing a repetition of leucine followed by any six residues. One of the issues in motif representation is the exception handling caused by the variety of amino acid sequences. Konagaya [1] employed a stochastic decision predicate, which consists of conjunctive and disjunctive patterns and their probability parameters, to represent the exceptions in the motif. However, a pattern representation cannot achieve satisfactory classification accuracy for leucine zipper motifs. This is because proteins usually have various sequences corresponding to different species, even around motifs. In leucine zipper motifs, the repeated L's (Leu) tend to change to other amino acids, such as V (Val), A (Ala), or M (Met). Such variations are considered to be related to the evolution process of organisms. Thus, these variations might have some systematic relationships; i.e., the variations of amino acids at a residue are related to the neighboring residues. These systematic relationships represent biological characteristics. An HMM can represent these systematic relationships, or biological characteristics.

Another aspect of motifs is that they contain specific structural motifs; these are secondary structures, such as an alpha helix or a beta strand. In the leucine zipper motif, there are two twisted alpha helix sequences bonded by leucines (or perhaps other similar amino acids). Although an HMM is implicitly trained to represent structural motifs, there is not enough motif data available for structural motif learning. Consequently, an HMM might accept a sequence that is similar to the leucine zipper motif but does not form a helical structure.
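The symbolic pattern above can be written directly as a regular expression. A minimal sketch in Python follows; the helper name and the example sequence are ours, not the paper's:

```python
import re

# The leucine zipper consensus L-X(6)-L-X(6)-L-X(6)-L-X(6)-L as a regular
# expression: a leucine, then "any six residues followed by a leucine"
# repeated four times (29 residues in total).
LEUCINE_ZIPPER = re.compile(r"L(?:[A-Z]{6}L){4}")

def find_zipper_candidates(sequence):
    """Return (start, matched subsequence) pairs for the symbolic pattern."""
    return [(m.start(), m.group()) for m in LEUCINE_ZIPPER.finditer(sequence)]

seq = "MKDP" + "LAAAAAALCCCCCCLDDDDDDLEEEEEEL" + "GG"
print(find_zipper_candidates(seq))  # one candidate, starting at index 4
```

As the paper notes, such a pattern accepts every sequence with the right leucine spacing, including the many non-zipper sequences where the intervening residues do not support a helix; this is the weakness the HMM is meant to address.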
A neural network is utilized to predict structural motifs; i.e., secondary structure. A large amount of data is available for secondary structure and can be utilized for training a neural network. To achieve high classification accuracy, a functional motif, modeled with an HMM, and a structural motif, modeled with a neural network, are combined as one system.

Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS '95)

Figure 1: Motif prediction outline

It is desirable to extract the biological characteristics of motifs from the training data only. When the training sequences come from two different subgroups or families, it is expected that the resulting HMM topology would branch into two parts. For this purpose, general HMMs containing global loops are needed, instead of the left-to-right models commonly used in speech recognition. Accordingly, one of the problems to solve is determining the HMM topology, because there are many candidate topologies in general HMMs.

One method to determine the HMM topology is to train a large fully-connected HMM and delete negligible transitions. However, the HMM resulting from a fully-connected one may be very complex and difficult to interpret. Moreover, it takes a huge amount of training time to optimize the numerous free parameters. Thus, the iterative duplication method is utilized for generating an HMM [2] [3]. The method enables us to obtain an optimal HMM topology for the given training sequences, as well as optimal HMM parameters for the network. It starts from a small fully-connected network and iterates network generation and parameter optimization. The network generation prunes transitions and adds a state according to the previous topology. This method obtains a simpler HMM topology in less time than one obtained from a fully-connected model.

This paper is organized as follows. First, the authors explain HMMs, followed by an explanation of the iterative duplication method for HMM learning. After that, the authors explain the neural network. Then, experimental results are given for leucine zipper motif prediction employing only an HMM. Finally, the performance improvement achieved by combining an HMM and a neural network is shown.

2 HMMs

2.1 Overview

An HMM is a nondeterministic finite state automaton that represents a Markov process. HMMs are commonly used in speech recognition [11], and recently have been applied to protein structure grammar estimation [5] and protein modeling [7], [6]. An HMM is characterized by a network with a finite number of parameters and states (see Fig. 2). The parameters represent initial probabilities, transition probabilities, and observation probabilities. At discrete instants of time, the process is assumed to be in one state, and an observation (or output symbol) is generated by the observation probability corresponding to the current state. The state then changes according to its transition probability.

Figure 2: An example of an HMM (left-to-right)

A special type of HMM, called a left-to-right model (Fig. 2), is commonly used in speech recognition. In this model, states are linearly connected with self-loop transitions; a state visited once is never revisited at a later time. This is because there is little need to deal with periodic structures in speech recognition. However, such periodic structures are rather common in amino acid sequences and have great significance for constructing a geometric structure. Therefore, the authors adopt a general HMM containing global loops.

The correspondence between motifs and HMMs is as follows. The training set involves the portions of amino acid sequences that have the same structure or function. An HMM is expected to model the training proteins in terms of discrimination. The alphabet used for the output symbols corresponds to the 20 amino acids. The test sequence is the portion of an amino acid sequence which might have the target structure or function. The result is the likelihood of the test sequence, calculated by tracing all possible transition paths that observe the test sequence in the HMM.

To use a trained HMM as a classifier, the authors define a threshold value determined by Z score [7], generated from both positive examples and negative examples. The probability generated for a given sequence is compared with the threshold value, which is given by approximating both the positive and negative likelihood distributions as normal distributions. A threshold value for the trained neural network is also determined from both positive examples and negative examples. One of the great advantages of using HMMs and neural networks is that it is possible to quantify the similarity between the test sequence and the training set by comparing their likelihood on the HMM.

3 Motif Prediction using an HMM

3.1 Learning Algorithm

In order to obtain the optimal HMM topology for the given training sequences, the iterative duplication method [2] is used. This method also produces the optimal HMM parameters for the network.
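The likelihood described above, computed by tracing all possible transition paths, is the classical forward algorithm. A minimal sketch follows; the dictionary layout and the toy two-state model in the usage are our own illustration, not the paper's HMM:

```python
def forward_likelihood(seq, states, init, trans, emit):
    """Likelihood of seq under an HMM, summing over all transition paths
    (the forward algorithm). init[s] is an initial probability, trans[s][t]
    a transition probability, emit[s][symbol] an observation probability."""
    # alpha[s]: probability of emitting the prefix so far and ending in s.
    alpha = {s: init.get(s, 0.0) * emit[s].get(seq[0], 0.0) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s].get(t, 0.0) for s in states)
               * emit[t].get(symbol, 0.0)
            for t in states
        }
    return sum(alpha.values())

# Toy model: a "helix-like" state h that prefers L, a coil state c.
states = ["h", "c"]
init = {"h": 1.0}
trans = {"h": {"h": 0.5, "c": 0.5}, "c": {"c": 1.0}}
emit = {"h": {"L": 0.9, "A": 0.1}, "c": {"L": 0.2, "A": 0.8}}
p = forward_likelihood("LA", states, init, trans, emit)
```

In practice the recursion is run in log space to avoid underflow on long sequences; the plain-probability form above is kept for readability.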
The method includes transition network generation and parameter optimization, and is summarized in Fig. 4. It starts from a small fully-connected network. In order to avoid converging to a local maximum, many initial HMMs with random parameters are prepared. The Baum-Welch algorithm is used for parameter optimization. Network generation is implemented by copying one node selected from the current network. The method iterates the network generation and parameter optimization phases until sufficient discrimination accuracy is obtained.

The details of network generation follow. First, delete the transitions with negligible transition probability, that is, less than δ = max(ε, r), where ε is a smoothing value and r is a convergence radius. Next, for each state S_i except the final state, count the number of incoming and outgoing transitions of the state, that is, the number of transitions from the state S_i plus that of transitions to the state S_i. Then, select the state with the largest number (denoted S_c) and make a copy of it (denoted S_new) so that S_new has the same transitions as S_c. If the state S_c has a self-loop, S_new also has a self-loop, plus transitions from S_c to S_new and from S_new to S_c (see Fig. 3).

The purpose of deleting the negligible transitions is to restrict the network topology space and eventually to reduce the training cost for parameter optimization. The reason to split the most connected node is that it might represent an overlap of independent states; in this case, the network topology may become simpler by splitting the state. Fig. 3 shows an example of such a case. In Fig. 3 (a), the most connected state is the hatched state, which outputs E (Glu) with probability 0.26, Q (Gln) with probability 0.15, and so on. By splitting the state into two states, a new network is obtained which has additional transitions represented by bold lines (see Fig. 3 (b)).
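The network-generation step (prune negligible transitions, then copy the most connected state) can be sketched over a simple edge table. This is our own sketch, with our own state names and data layout; it omits the paper's final-state exclusion and relies on the subsequent Baum-Welch pass to renormalize the probabilities:

```python
def prune_and_split(trans, eps):
    """One network-generation step of the iterative duplication method.

    trans: dict mapping (src, dst) -> transition probability.
    eps:   threshold below which a transition is considered negligible.
    Returns a new transition table in which the most connected state
    has been copied.
    """
    # 1. Delete negligible transitions.
    trans = {edge: p for edge, p in trans.items() if p >= eps}

    # 2. Count incoming plus outgoing transitions for every state.
    degree = {}
    for (src, dst) in trans:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1

    # 3. Copy the most connected state S_c as S_new, mirroring all of its
    #    transitions; if S_c has a self-loop, S_new gets one too, plus
    #    links in both directions between the pair.
    s_c = max(degree, key=degree.get)
    s_new = (s_c, "copy")
    new_edges = {}
    for (src, dst), p in trans.items():
        if src == s_c:
            new_edges[(s_new, dst if dst != s_c else s_new)] = p
        if dst == s_c:
            new_edges[(src if src != s_c else s_new, s_new)] = p
    if (s_c, s_c) in trans:
        new_edges[(s_c, s_new)] = trans[(s_c, s_c)]
        new_edges[(s_new, s_c)] = trans[(s_c, s_c)]
    trans.update(new_edges)
    return trans
```

After this step the outgoing probabilities of the touched states no longer sum to one; in the method proper, the following parameter-optimization phase restores a proper distribution.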
However, the network can be simplified if most of these transitions become negligible after parameter optimization (see Fig. 3 (c)). In each epoch, this algorithm produces an optimal HMM for the training data at each number of states. Selecting the HMM with the highest prediction accuracy, the optimal number of states for the given data is obtained.

3.2 Prediction using an HMM

Prediction is carried out by comparing data with a Z score, which is a normalized likelihood. If a sequence achieves a higher likelihood than the threshold value, then it is predicted to have the target motif, that is, the same structure and/or function. In the current implementation, a threshold value, determined by Z score [7], is generated from both positive examples and negative examples. Accuracy E_av is defined as follows:

    E_av = (C+/N+ + C-/N-) / 2    (1)

(Transitions unrelated to the explanation are omitted from Fig. 3.)
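The threshold determination just described, approximating both likelihood distributions as normal and choosing the cutoff that maximizes the averaged positive and negative accuracy, can be sketched numerically. Function names are ours; a grid search stands in for solving the derivative condition in closed form:

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative distribution of a normal with mean mu, std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mean_sd(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, math.sqrt(var)

def balanced_accuracy(alpha, pos, neg):
    """E_av under the normal approximation: average of the probability that
    a negative score falls below alpha and a positive score falls above."""
    mu_p, sd_p = mean_sd(pos)
    mu_n, sd_n = mean_sd(neg)
    return 0.5 * (norm_cdf(alpha, mu_n, sd_n) + (1.0 - norm_cdf(alpha, mu_p, sd_p)))

def best_threshold(pos, neg, steps=1000):
    """Scan candidate thresholds between the two means and keep the best."""
    mu_p, _ = mean_sd(pos)
    mu_n, _ = mean_sd(neg)
    lo, hi = min(mu_n, mu_p), max(mu_n, mu_p)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: balanced_accuracy(a, pos, neg))

# Toy Z-score samples: positives score high, negatives low.
alpha = best_threshold([1.8, 2.0, 2.2], [-1.2, -1.0, -0.8])
```

Averaging the two class accuracies, rather than pooling the examples, is what keeps the far more numerous negative examples from dominating the threshold choice.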
4 x C I C.v new (Left) A general rule for a duplication. (Right) A n example of the step from six to seven. (a) The resulting HMM with 6 states after parameter optimization and negligible transition deletion. (b) A new network by a hatched state copy. (c) An obtained HMM with 7 states after parameter optimization and negligible transition deletion. Figure 3: A part of learning R 0.13 Q 0.10 input: (protein) sequences and a small fully-connected HMM. initialization: optimize parameter for the HMM. choose the best HMM on likelihood as a seed. repeat network generation: delete negligible transitions. copy the most connected state. parameter optimization: optimize parameters for the new topology. choose the best HMM on likelihood. until sufficient accuracy is obtained. output: the resulting HMM. values are normalized with respect to the number of each example. Since there are plenty of negative examples, compared to positive examples, the threshold value would be determined only by the negative examples without normalization. The threshold value for classification is determined by the following method. In this method, it is assumed that the likelihood distribution, both positive and negative, is a normal distribution. Actually, Fig. 5 shows resulting likelihood for the close data, and the likelihood distribution is almost a normal distribution. By using this assumption, Eq. (1) is approximated as, Figure 4: Iterative Duplication Method where N+ is the number of positive examples, N- is the number of negative examples and Cf is the number of correctly classified examples in the positive data. C- is the number of correctly classified negative examples. Both the positive and negative accuracy where P(.;,u, U) is the normal distribution probability function with the mean value as p and standard deviation as u, P,,,~, a,,, is the mean value and standard deviation of negative examples, and ppos, upoj is for positive examples. To obtain the a value which 171
5 produces maximum E,,, the partial derivative for Eq. (2) with respect to (Y is taken and the formula which sets this derivative as 0 is solved. Consequently, the threshold value a is determined as follows; (3) accuracy is obtained. After 300 epoch the classification performance achieves as much as percent for the learning data and percent for testing data. In this performance evaluation, the cell which yields higher activation is considered as an answer category. Note that, in this experiment, the neural network is designed to predict an alpha helix region longer than 15 residues. This is different from usual applications of a neural network, predicting 3 classes of the secondary structures. This is the reason why this neural network shows such a high performance. For applying the trained neural network as an alpha helix region, output activations are compared with helix region teacher signal and a normalized squared error is utilized. Therefore, the small normalized squared error implies that the current subsequence might have an alpha helix structure Figure 5: Z Score Histogram for Close Data Figure 6: Neural Network Architect,ure 4 Secondary Structure Prediction using Neural Network For a neural network, a multi-layered perceptron is utilized. The neural network consists of 3 layers, an input layer, a hidden layer and an output layer. The neural network is designed to observe 15 residues simultaneously [12]. Each residue is coded as a binary string with a length of 20. When the residue is unknown, 0.05 is applied to all 20 input cells. Thus, there are 300 input cells at the input layer. There are 10 cells at the hidden layer. Two cells are at the output layer, individual ones correspond to a prediction category, one is for a helix and the other is a non-helix region (Fig. 6). Th is classification corresponds to a secondary structure of residue at the center of the input window. 
Learning is implemented with the backpropagation algorithm, until sufficient discrimination 5 Experiments 5.1 Training Data and Test Data For predicting a leucine zipper motif with an HMM, 112 positive examples, which are the collect8ion of subsequences annotated as leucine zipper (like), were chosen from the Swiss Protein database Release 22 [lo]. Positive examples contain short sequences in length 15, 22, 29 and 36, with proportion sas much as 7.14, 56.25, and 4.46 percent, respectively negative examples were randomly selected from the Swiss Protein database Release 22. These negative examples were chosen only from a protein which doesn t have a leucine zipper annotation. The ratio of sequence length is controlled to coincide to 178
6 Proceedings of the 28th Annual Hawaii lnrernational Conference on System Sciences that for positive examples. Randomly selected 80 percent of the positive subsequences are used for training, and the remaining positive examples are used for prediction performance evaluation. To employ a trained HMM as a classifier, a Z score threshold value is determined by both positive and negative examples. For determining threshold value, randomly selected 20 percent of the negative examples were utilized. Remaining 80 percent of the negative examples are used for testing purposes. To train a neural network, alpha helix subsequences are chosen from PDB data base of August of 92. Practically, to determine secondary structures from PDB data, the DSSP program is used. In the training data, subsequences of alpha helix with lengths more than 15 are chosen as the positive data. Neither 2,3 nor 5 helix is included in the positive data. All subsequences which don t contain any helix region in the input window range (15 residues) are chosen as the negative data. Randomly selected 70 percent of the training subsequences are used for learning, and the remaining data are used for prediction performance evaluation. To evaluate a symbolic representation performance, the following selected sequences are utilized; sequences which satisfy L-X(S)-L-X(6)-L-X(6)-L-X(6)-L pattern, a repetition of leucine and any six residues [ll]. Although, those selected sequences are similar to positive examples in terms of symbolic representation, only some of these are true leucine zipper sequences. Since there are numerous negative data ( nonleucine zipper sequences rather than leucine zipper sequences), the symbolic representation yields as high an accuracy as percent. 
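The input encoding described for the network (a 15-residue window, 20 cells per residue, 0.05 in every cell for an unknown residue) can be sketched as follows. The residue ordering and the use of 'X' for an unknown residue are our conventions, not the paper's:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def encode_window(window):
    """Encode a 15-residue window as the 300-value input vector: each
    residue becomes a 20-cell one-hot code; an unknown residue ('X' here)
    sets all 20 of its cells to 0.05."""
    assert len(window) == 15
    vec = []
    for res in window:
        if res in AMINO_ACIDS:
            vec.extend(1.0 if aa == res else 0.0 for aa in AMINO_ACIDS)
        else:
            vec.extend([0.05] * 20)
    return vec

v = encode_window("LAAAAAALCCCCCCX")
print(len(v))  # 300 input cells
```

The prediction target is the secondary-structure class of the residue at the center of this window, so scanning a protein means sliding the window one residue at a time and encoding each position.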
5.2 Evaluation with a Symbolic Pattern

To measure the performance of the symbolic motif representation, subsequences with lengths of 15, 22, 29 and 36 are collected from the Swiss Protein database and tested to determine whether they satisfy the leucine zipper representation L-X(6)-L-X(6)-L-X(6)-L-X(6)-L. This classification is validated against the annotated comments in the database. The result is shown in Table 1. To make the evaluation fair, partial matching is separately counted as a "partial match"; i.e., as long as a selected subsequence is included in the correct region, which is determined by the database annotation, it is counted as a partial match.

Table 1: Prediction Accuracy with a Symbolic Pattern (perfect-match and partial-match totals and percentages for positive and negative data)

5.3 Experimental Results with an HMM

Table 2 shows the result of cross-validation for leucine zipper motifs. To contrast the ability of the HMM, 112 carefully selected negative examples are used: sequences which contain the leucine zipper pattern L-X(6)-L-X(6)-L-X(6)-L-X(6)-L. The positive data are divided into 5 groups and tested with both negative and positive data.

            training    test        test        test
            pos.data    pos.data    neg.data    average
    test0   98.9%       81.8%       83.0%       82.4%
    test1               91.3%       65.2%       78.2%
    test2   98.9%       68.2%       83.9%       76.1%
    test3   98.9%       87.0%       78.6%       82.8%
    test4   98.9%       77.3%       75.9%       76.6%
    Ave.    99.1%       81.3%       77.3%       79.3%

Table 2: Prediction accuracy (leucine zipper)

The average prediction accuracy is 99.1 percent for the training data and 79.3 percent for the test data (81.3 percent for the positive data). When, to emphasize the HMM performance, the negative test set is chosen from the sequences satisfying the L-X(6)-L-X(6)-L-X(6)-L-X(6)-L pattern, the average prediction accuracy for the test data is just 14.8 percent: 29.5 percent for the positive data and 0.0 percent for the negative data.

Figure 8: (Left) Biological structure of a leucine zipper motif. (Right) The helical wheel, observed from the top.

Fig. 7 shows an HMM for leucine zipper motifs obtained using the iterative duplication method. This HMM contains global loops corresponding to the helix structure in the leucine zipper motif. Such a helical structure, as shown in Fig. 8, is caused by the existence of seven amino acids per every two periods, because a pair of aligned leucines forms a zipper-like structure. In Fig. 8, this characteristic is shown with a helical wheel viewed from the top. These circles depict that there are many hydrophobic amino acids on one side, around the combined leucines, and many hydrophilic amino acids on the other side. This tendency of hydrophilic and hydrophobic amino acids is a key to forming the two twisted helices.

In Fig. 7, each circle at the right, corresponding to an HMM path, has a characteristic similar to the helical wheel above. The characters on the circles are the most frequently observed amino acids at each state. In order to see the helical wheel, hydrophobic amino acids, I (Ile), V (Val), L (Leu), F (Phe) and C (Cys), are described by bold letters. Hydrophilic amino acids, R (Arg), K (Lys), N (Asn), D (Asp), Q (Gln), E (Glu) and H (His), are described by hatched letters. The others, M (Met), A (Ala), G (Gly), T (Thr), S (Ser), W (Trp), Y (Tyr) and P (Pro), are described by broken letters. These circles show three kinds of helical wheels.

Figure 7: Extracted HMM for a leucine zipper motif. (Left) An HMM (leucine zipper). (Right) The helical wheel, i.e., helices observed from the top at HMM paths. Hydrophobic, hydrophilic and the other amino acids are described by bold, hatched and broken letters, respectively. The characters on the circles are the most frequently observed amino acids at each state.
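The helical-wheel reasoning above can be made concrete: an alpha helix advances about 100 degrees per residue (3.6 residues per turn), so a leucine every 7 residues drifts only 20 degrees per heptad and stays on one face of the helix. A small sketch, with hydrophobicity classes taken from the groupings in the text:

```python
# Hydrophobicity classes as grouped in the helical-wheel discussion.
HYDROPHOBIC = set("IVLFC")
HYDROPHILIC = set("RKNDQEH")

def helical_wheel(seq):
    """Project residues onto a helical wheel: residue i sits at angle
    (100 * i) mod 360 degrees. Returns (angle, residue, class) tuples."""
    rows = []
    for i, res in enumerate(seq):
        cls = ("hydrophobic" if res in HYDROPHOBIC
               else "hydrophilic" if res in HYDROPHILIC
               else "other")
        rows.append(((100 * i) % 360, res, cls))
    return rows

# Two heptads of a leucine repeat: the leucines land at 0 and 340 degrees,
# i.e., on the same face, which is what allows the zipper to form.
wheel = helical_wheel("LAAKEAALAAKEAA")
print([row for row in wheel if row[1] == "L"])
```

This 20-degree-per-heptad drift is exactly the "seven amino acids per every two periods" noted in the text: seven residues cover just under two full turns (700 versus 720 degrees).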
Therefore, it is shown that the iterative duplication method automatically extracted the helical structures and their characteristics from the positive data.

Fig. 5 indicates the Z scores for both positive and negative data in the close data. Using the method described in Section 3.2, the threshold value α is determined; it is indicated with a dotted line in Fig. 5. With this threshold value, the HMM achieves its accuracy E_av for the close data. The close data consist of the positive data, 80 percent of the leucine zipper subsequences utilized for HMM learning, and the negative data, the non-leucine zipper subsequences utilized for determining the threshold value. Details are shown in Table 3. This HMM and the threshold value α are then examined with the open data, which are used neither for HMM learning nor for the threshold value determination; the accuracy E_av for the open data is also shown in Table 3. Fig. 9 indicates the Z scores for the open data.

Figure 9: Z Score Histogram for Open Data. For clarity, all negative Z scores less than the threshold value are omitted from this figure.

Table 3: Prediction accuracy with an HMM (correct counts, totals and percentages for positive and negative data, close and open)

6 Experimental Results with an HMM and a Neural Network

Fig. 10 indicates the normalized squared error distribution for the close data. The normalized squared error is closely correlated with the class of the positive and negative data. Thus, the normalized squared error can be utilized for segregating the positive data from the negative data. Since some positive data show a large squared error, it is much better to combine this measurement with another, orthogonal measurement, such as the HMM likelihood.

Figure 10: Normalized Squared Error Histogram for the Close Data

Fig. 11 indicates a scatter plot of the Z score and the normalized squared error. A dotted line corresponds to the threshold value α for the Z score. Careful examination of Fig. 11 shows that the normalized squared error increases when the Z score decreases for the positive data. This indicates that leucine zipper subsequences tend to show a high squared error when their likelihood is low. A solid line in this figure is drawn based on this fact, and the data below this line are interpreted as negative data. By utilizing this method, the number of incorrect classifications for negative data decreases from 4394 to 3190 for the close data; consequently, the accuracy E_av is improved. For the open data, the number of incorrect classifications for negative data decreases to 6274, and the accuracy E_av is improved as well.

Figure 11: Normalized Squared Error Distribution for Close Data

7 Conclusion

An HMM is capable of representing a stochastic motif well. Since the HMM is trained with only a small number of subsequences, there is a possibility that the HMM could not learn the general mechanism completely. Usually, the secondary structure of a motif is well known; this is especially true for the leucine zipper motif, which consists of two twisted alpha helix sequences bonded by leucines (or perhaps other similar amino acids). In the Swiss Protein database, the lengths of the alpha helix sequences are 15, 22, 29 and 36. Therefore, a neural network is trained with alpha helix subsequences more than 15 residues long and is used to predict alpha helix and non-alpha-helix regions.

According to the experience gained from leucine zipper motif prediction, the HMM shows higher discrimination performance than a symbolic motif representation. By comparing Tables 3 and 4, it is shown that the prediction performance is further improved by combining a neural network. Since a large amount of negative data is used in this experiment, and since most of the negative data are very easy to classify, the performance improvement looks small in percentage terms. The ease of classifying the negative data is shown by the prediction accuracy of the symbolic pattern method. Because the amount of negative data far exceeds that of the positive data, which number 112, it is difficult to decrease the number of misclassifications of the negative data much further; nevertheless, combining the neural network does decrease the number of misclassifications of negative data. Moreover, since a motif usually contains specific secondary structures, this combined method is widely applicable.

Acknowledgment

The authors would like to thank the genetic information processing group and Mr. Mamitsuka at NEC for meaningful discussions and valuable help.

References

[1] A. Konagaya and M. Kondo: Stochastic Motif Extraction using a Genetic Algorithm with the MDL Principle, HICSS-26 (1993).
[2] Y. Fujiwara and A. Konagaya: Protein Motif Extraction using Hidden Markov Model, pp. 56-64, Proceedings of Genome Informatics Workshop IV (1993).
[3] Y. Fujiwara, M. Asogawa and A. Konagaya: Protein Motif Extraction using Hidden Markov Model, to appear at ISMB '94 (1994).
[4] S. Nakagawa: Speech Recognition Using Stochastic Models, pp. 29-108, Electronic Society of Information Communication (1988).
[5] K. Asai, S. Hayamizu and K. Onizuka: HMM with Protein Structure Grammar, HICSS-26 (1993).
Figure 12: Normalized Squared Error Distribution for Open Data. For clarity, all negative Z scores less than the threshold value are omitted from this figure.

Table 4: Prediction accuracy with an HMM and a Neural Network (correct counts, totals and percentages for positive and negative data, close and open)

[6] P. Baldi, Y. Chauvin, T. Hunkapiller and M. A. McClure: Hidden Markov Models of Biological Primary Sequence Information, Neural Computation (1994).
[7] D. Haussler, A. Krogh, I. Mian and K. Sjolander: Protein Modeling using Hidden Markov Models: Analysis of Globins, HICSS-26 (1993).
[8] J. Takami and S. Sagayama: Automatic Generation of the Hidden Markov Network by Successive State Splitting, Proceedings of ICASSP (1991).
[9] A. Bairoch: PROSITE database, SwissProt Release 22 (1992).
[10] A. Bairoch: Sequence database, SwissProt Release 22 (1992).
[11] A. Aitken: Identification of Protein Consensus Sequences, Ellis Horwood Limited (1990).
[12] N. Qian and T. Sejnowski: Predicting the Secondary Structure of Globular Proteins Using Neural Network Models, Journal of Molecular Biology, 202 (1988).
More informationFolding simulation: self-organization of 4-helix bundle protein. yellow = helical turns
Folding simulation: self-organization of 4-helix bundle protein yellow = helical turns Protein structure Protein: heteropolymer chain made of amino acid residues R + H 3 N - C - COO - H φ ψ Chain of amino
More informationNeural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University
Neural Networks and Applications in Bioinformatics Yuzhen Ye School of Informatics and Computing, Indiana University Contents Biological problem: promoter modeling Basics of neural networks Perceptrons
More informationStructural bioinformatics
Structural bioinformatics Why structures? The representation of the molecules in 3D is more informative New properties of the molecules are revealed, which can not be detected by sequences Eran Eyal Plant
More informationChapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski.
Chapter 8 One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model Abstract Despite the significant increase in computational power, molecular modeling of protein structure using
More informationIntroduction to Proteins
Introduction to Proteins Lecture 4 Module I: Molecular Structure & Metabolism Molecular Cell Biology Core Course (GSND5200) Matthew Neiditch - Room E450U ICPH matthew.neiditch@umdnj.edu What is a protein?
More informationVirtual bond representation
Today s subjects: Virtual bond representation Coordination number Contact maps Sidechain packing: is it an instrumental way of selecting and consolidating a fold? ASA of proteins Interatomic distances
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Secondary Structure Prediction
CMPS 6630: Introduction to Computational Biology and Bioinformatics Secondary Structure Prediction Secondary Structure Annotation Given a macromolecular structure Identify the regions of secondary structure
More informationBayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction
Bayesian Inference using Neural Net Likelihood Models for Protein Secondary Structure Prediction Seong-gon KIM Dept. of Computer & Information Science & Engineering, University of Florida Gainesville,
More informationAb Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS*
COMPUTATIONAL METHODS IN SCIENCE AND TECHNOLOGY 9(1-2) 93-100 (2003/2004) Ab Initio SERVER PROTOTYPE FOR PREDICTION OF PHOSPHORYLATION SITES IN PROTEINS* DARIUSZ PLEWCZYNSKI AND LESZEK RYCHLEWSKI BiolnfoBank
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec6:Interpreting Your Multiple Sequence Alignment Interpreting Your Multiple Sequence
More informationHomology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen
Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,
More informationProtein Structure Prediction. christian studer , EPFL
Protein Structure Prediction christian studer 17.11.2004, EPFL Content Definition of the problem Possible approaches DSSP / PSI-BLAST Generalization Results Definition of the problem Massive amounts of
More informationCSE : Computational Issues in Molecular Biology. Lecture 19. Spring 2004
CSE 397-497: Computational Issues in Molecular Biology Lecture 19 Spring 2004-1- Protein structure Primary structure of protein is determined by number and order of amino acids within polypeptide chain.
More informationDNA Begins the Process
Biology I D N A DNA contains genes, sequences of nucleotide bases These Genes code for polypeptides (proteins) Proteins are used to build cells and do much of the work inside cells DNA Begins the Process
More informationProtein Folding Problem I400: Introduction to Bioinformatics
Protein Folding Problem I400: Introduction to Bioinformatics November 29, 2004 Protein biomolecule, macromolecule more than 50% of the dry weight of cells is proteins polymer of amino acids connected into
More informationImmune Programming. Payman Samadi. Supervisor: Dr. Majid Ahmadi. March Department of Electrical & Computer Engineering University of Windsor
Immune Programming Payman Samadi Supervisor: Dr. Majid Ahmadi March 2006 Department of Electrical & Computer Engineering University of Windsor OUTLINE Introduction Biological Immune System Artificial Immune
More informationThr Gly Tyr. Gly Lys Asn
Your unique body characteristics (traits), such as hair color or blood type, are determined by the proteins your body produces. Proteins are the building blocks of life - in fact, about 45% of the human
More informationGenetic Algorithms For Protein Threading
From: ISMB-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. Genetic Algorithms For Protein Threading Jacqueline Yadgari #, Amihood Amir #, Ron Unger* # Department of Mathematics
More informationFrom DNA to Protein Structure and Function
STO-106 From DNA to Protein Structure and Function Teacher information Summary: Students model how information in the DNA base sequences is transcribed and translated to produce a protein molecule. They
More informationCHAPTER 1. DNA: The Hereditary Molecule SECTION D. What Does DNA Do? Chapter 1 Modern Genetics for All Students S 33
HPER 1 DN: he Hereditary Molecule SEION D What Does DN Do? hapter 1 Modern enetics for ll Students S 33 D.1 DN odes For Proteins PROEINS DO HE nitty-gritty jobs of every living cell. Proteins are the molecules
More informationBIRKBECK COLLEGE (University of London)
BIRKBECK COLLEGE (University of London) SCHOOL OF BIOLOGICAL SCIENCES M.Sc. EXAMINATION FOR INTERNAL STUDENTS ON: Postgraduate Certificate in Principles of Protein Structure MSc Structural Molecular Biology
More informationCase 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 29 September 2005
Case 7 A Storage Protein From Seeds of Brassica nigra is a Serine Protease Inhibitor Last modified 9 September 005 Focus concept Purification of a novel seed storage protein allows sequence analysis and
More informationDaily Agenda. Warm Up: Review. Translation Notes Protein Synthesis Practice. Redos
Daily Agenda Warm Up: Review Translation Notes Protein Synthesis Practice Redos 1. What is DNA Replication? 2. Where does DNA Replication take place? 3. Replicate this strand of DNA into complimentary
More informationBIOINFORMATICS Introduction
BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea
More informationG+C content. 1 Introduction. 2 Chromosomes Topology & Counts. 3 Genome size. 4 Replichores and gene orientation. 5 Chirochores.
1 Introduction 2 Chromosomes Topology & Counts 3 Genome size 4 Replichores and gene orientation 5 Chirochores 6 7 Codon usage 121 marc.bailly-bechet@univ-lyon1.fr Bacterial genome structures Introduction
More informationTime Series Motif Discovery
Time Series Motif Discovery Bachelor s Thesis Exposé eingereicht von: Jonas Spenger Gutachter: Dr. rer. nat. Patrick Schäfer Gutachter: Prof. Dr. Ulf Leser eingereicht am: 10.09.2017 Contents 1 Introduction
More information1. DNA replication. (a) Why is DNA replication an essential process?
ame Section 7.014 Problem Set 3 Please print out this problem set and record your answers on the printed copy. Answers to this problem set are to be turned in to the box outside 68120 by 5:00pm on Friday
More informationHybrid Learning Algorithm in Neural Network System for Enzyme Classification
Int. J. Advance. Soft Comput. Appl., Vol. 2, No. 2, July 2010 ISSN 2074-8523; Copyright ICSRS Publication, 2010 www.i-csrs.org Hybrid Learning Algorithm in Neural Network System for Enzyme Classification
More informationAPPENDIX. Appendix. Table of Contents. Ethics Background. Creating Discussion Ground Rules. Amino Acid Abbreviations and Chemistry Resources
Appendix Table of Contents A2 A3 A4 A5 A6 A7 A9 Ethics Background Creating Discussion Ground Rules Amino Acid Abbreviations and Chemistry Resources Codons and Amino Acid Chemistry Behind the Scenes with
More informationX-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction
X-ray structures of fructosyl peptide oxidases revealing residues responsible for gating oxygen access in the oxidative half reaction Tomohisa Shimasaki 1, Hiromi Yoshida 2, Shigehiro Kamitori 2 & Koji
More informationAipotu I & II: Genetics & Biochemistry
Aipotu I & II: Genetics & Biochemistry Objectives: To reinforce your understanding of Genetics, Biochemistry, and Molecular Biology To show the connections between these three disciplines To show how these
More informationFrom Gene to Protein Transcription and Translation
Name: Hour: From Gene to Protein Transcription and Translation Introduction: In this activity you will learn how the genes in our DNA influence our characteristics. For example, how can a gene cause albinism
More informationHomology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen
Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,
More informationGene Prediction. Srivani Narra Indian Institute of Technology Kanpur
Gene Prediction Srivani Narra Indian Institute of Technology Kanpur Email: srivani@iitk.ac.in Supervisor: Prof. Harish Karnick Indian Institute of Technology Kanpur Email: hk@iitk.ac.in Keywords: DNA,
More informationRNA does not adopt the classic B-DNA helix conformation when it forms a self-complementary double helix
Reason: RNA has ribose sugar ring, with a hydroxyl group (OH) If RNA in B-from conformation there would be unfavorable steric contact between the hydroxyl group, base, and phosphate backbone. RNA structure
More informationAn Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis
An Investigation of Palindromic Sequences in the Pseudomonas fluorescens SBW25 Genome Bachelor of Science Honors Thesis Lina L. Faller Department of Computer Science University of New Hampshire June 2008
More informationHmwk # 8 : DNA-Binding Proteins : Part II
The purpose of this exercise is : Hmwk # 8 : DNA-Binding Proteins : Part II 1). to examine the case of a tandem head-to-tail homodimer binding to DNA 2). to view a Zn finger motif 3). to consider the case
More information36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L-
36. The double bonds in naturally-occuring fatty acids are usually isomers. A. cis B. trans C. both cis and trans D. D- E. L- 37. The essential fatty acids are A. palmitic acid B. linoleic acid C. linolenic
More informationGenetic Algorithms in Matrix Representation and Its Application in Synthetic Data
Genetic Algorithms in Matrix Representation and Its Application in Synthetic Data Yingrui Chen *, Mark Elliot ** and Joe Sakshaug *** * ** University of Manchester, yingrui.chen@manchester.ac.uk University
More informationDisease and selection in the human genome 3
Disease and selection in the human genome 3 Ka/Ks revisited Please sit in row K or forward RBFD: human populations, adaptation and immunity Neandertal Museum, Mettman Germany Sequence genome Measure expression
More informationProtein Synthesis. Application Based Questions
Protein Synthesis Application Based Questions MRNA Triplet Codons Note: Logic behind the single letter abbreviations can be found at: http://www.biology.arizona.edu/biochemistry/problem_sets/aa/dayhoff.html
More informationProteins. Amino Acids (APK) Peptides (APK) 5/23/2012. Peptide bond. Acid. Amino
Proteins Amino Acids (APK) Acid Amino Image courtesy of Biotech (biotech.chem.indiana.edu/pages/ protein_intro.html) Peptides (APK) Peptide bond 1 Proteins (polypeptides) Segment of a protein Peptide bonds
More information1. DNA, RNA structure. 2. DNA replication. 3. Transcription, translation
1. DNA, RNA structure 2. DNA replication 3. Transcription, translation DNA and RNA are polymers of nucleotides DNA is a nucleic acid, made of long chains of nucleotides Nucleotide Phosphate group Nitrogenous
More informationThe study of protein secondary structure and stability at equilibrium ABSTRACT
The study of protein secondary structure and stability at equilibrium Michelle Planicka Dept. of Physics, North Georgia College and State University, Dahlonega, GA REU, Dept. of Physics, University of
More informationAmino Acid Distribution Rules Predict Protein Fold: Protein Grammar for Beta-Strand Sandwich-Like Structures
Biomolecules 2015, 5, 41-59; doi:10.3390/biom5010041 Article OPEN ACCESS biomolecules ISSN 2218-273X www.mdpi.com/journal/biomolecules/ Amino Acid Distribution Rules Predict Protein Fold: Protein Grammar
More informationRNA Secondary Structure Prediction Computational Genomics Seyoung Kim
RNA Secondary Structure Prediction 02-710 Computational Genomics Seyoung Kim Outline RNA folding Dynamic programming for RNA secondary structure prediction Covariance model for RNA structure prediction
More informationProtein Structure Databases, cont. 11/09/05
11/9/05 Protein Structure Databases (continued) Prediction & Modeling Bioinformatics Seminars Nov 10 Thurs 3:40 Com S Seminar in 223 Atanasoff Computational Epidemiology Armin R. Mikler, Univ. North Texas
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
1 CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Jean Gao at UT Arlington Mingon Kang, PhD Computer Science, Kennesaw State University 2 Genetics The discovery of
More informationAipotu I: Genetics & Biochemistry
Aipotu I: Genetics & Biochemistry Objectives: To reinforce your understanding of Genetics, Biochemistry, and Molecular Biology To show the connections between these three disciplines To show how these
More informationYour Name: MID TERM ANSWER SHEET SIN: ( )
MIDTERM EXAMINATION (October 23, 2008) BIOE150. Introduction to Bio-Nanoscience & Bio-Nanotechnology Professor Seung-Wuk Lee Fall Semester, 2008 0. Write down your name and the last digit of your SIN in
More informationProt-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures
Mol2Net, 2015, 1(Section F), pages 1-6, Proceedings 1 SciForum Mol2Net Prot-SSP: A Tool for Amino Acid Pairing Pattern Analysis in Secondary Structures Miguel de Sousa 1, *, Cristian R. Munteanu 2 and
More informationONLINE BIOINFORMATICS RESOURCES
Dedan Githae Email: d.githae@cgiar.org BecA-ILRI Hub; Nairobi, Kenya 16 May, 2014 ONLINE BIOINFORMATICS RESOURCES Introduction to Molecular Biology and Bioinformatics (IMBB) 2014 The larger picture.. Lower
More informationFrom Gene to Protein Transcription and Translation i
How do genes influence our characteristics? From Gene to Protein Transcription and Translation i A gene is a segment of DNA that provides the instructions for making a protein. Proteins have many different
More informationBiology: The substrate of bioinformatics
Bi01_1 Unit 01: Biology: The substrate of bioinformatics What is Bioinformatics? Bi01_2 handling of information related to living organisms understood on the basis of molecular biology Nature does it.
More information2. The instructions for making a protein are provided by a gene, which is a specific segment of a molecule.
From Gene to Protein Transcription and Translation By Dr. Ingrid Waldron and Dr. Jennifer Doherty, Department of Biology, University of Pennsylvania, Copyright, 2011 1 In this activity you will learn how
More informationCambridge International Examinations Cambridge International Advanced Subsidiary and Advanced Level
ambridge International Examinations ambridge International Advanced Subsidiary and Advanced Level *8744875516* BIOLOGY 9700/22 Paper 2 AS Level Structured Questions October/November 2016 1 hour 15 minutes
More informationFrom Gene to Protein via Transcription and Translation i
How do genes influence our characteristics? From Gene to Protein via Transcription and Translation i A gene is a segment of DNA that provides the instructions for making a protein. Proteins have many different
More informationIMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay 1* and Mr. Suman Chakraborty
Volume 2, No. 4, April 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMAGE HIDING IN DNA SEQUENCE USING ARITHMETIC ENCODING Prof. Samir Kumar Bandyopadhyay
More informationTIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS. Liviu Lalescu, Costin Badica
TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS Liviu Lalescu, Costin Badica University of Craiova, Faculty of Control, Computers and Electronics Software Engineering Department, str.tehnicii, 5, Craiova,
More informationDiversity of chosen effectors in samples of Polish and Norwegian populations of Phytophthora infestans
Diversity of chosen effectors in samples of Polish and Norwegian populations of Phytophthora infestans Emil Stefańczyk 1, Marta Brylińska 1, May Bente Brurberg 2, Ragnhild Naerstad 2, Abdelhameed Elameen
More informationBasic Biology. Gina Cannarozzi. 28th October Basic Biology. Gina. Introduction DNA. Proteins. Central Dogma.
Cannarozzi 28th October 2005 Class Overview RNA Protein Genomics Transcriptomics Proteomics Genome wide Genome Comparison Microarrays Orthology: Families comparison and Sequencing of Transcription factor
More information7.014 Quiz II 3/18/05. Write your name on this page and your initials on all the other pages in the space provided.
7.014 Quiz II 3/18/05 Your Name: TA's Name: Write your name on this page and your initials on all the other pages in the space provided. This exam has 10 pages including this coversheet. heck that you
More informationProteins typically contain 20 different. Representation of Protein-Sequence Information by Amino Acid Subalphabets
Representation of Protein-Sequence Information by Amino Acid Subalphabets Claus A. F. Andersen and Søren Brunak Within computational biology, algorithms are constructed with the aim of extracting knowledge
More informationThis is the knowledge that you should understand upon completing this section:
DN 11 Syllabus hecklist This is the knowledge that you should understand upon completing this section: 11.1 DN DN occurs bound to proteins in chromosomes in the nucs and as unbound DN in the mitochondria.
More informationImproving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data
Improving the Accuracy of Base Calls and Error Predictions for GS 20 DNA Sequence Data Justin S. Hogg Department of Computational Biology University of Pittsburgh Pittsburgh, PA 15213 jsh32@pitt.edu Abstract
More informationPermutation Free Encoding Technique for Evolving Neural Networks
Permutation Free Encoding Technique for Evolving Neural Networks Anupam Das, Md. Shohrab Hossain, Saeed Muhammad Abdullah, and Rashed Ul Islam Department of Computer Science and Engineering, Bangladesh
More informationBETA STRAND Prof. Alejandro Hochkoeppler Department of Pharmaceutical Sciences and Biotechnology University of Bologna
Prof. Alejandro Hochkoeppler Department of Pharmaceutical Sciences and Biotechnology University of Bologna E-mail: a.hochkoeppler@unibo.it C-ter NH and CO groups: right, left, right (plane of the slide)
More informationModel. David Kulp, David Haussler. Baskin Center for Computer Engineering and Computer Science.
Integrating Database Homology in a Probabilistic Gene Structure Model David Kulp, David Haussler Baskin Center for Computer Engineering and Computer Science University of California, Santa Cruz CA, 95064,
More informationCristian Micheletti SISSA (Trieste)
Cristian Micheletti SISSA (Trieste) michelet@sissa.it Mar 2009 5pal - parvalbumin Calcium-binding protein HEADER CALCIUM-BINDING PROTEIN 25-SEP-91 5PAL 5PAL 2 COMPND PARVALBUMIN (ALPHA LINEAGE) 5PAL 3
More informationResearch Article The Influence of Flanking Secondary Structures on Amino Acid Content and Typical Lengths of 3/10 Helices
International Journal of Proteomics Volume 214, Article ID 3623, 13 pages http://dx.doi.org/1155/214/3623 Research Article The Influence of Flanking Secondary Structures on Amino Acid Content and Typical
More informationName Date of Data Collection. Class Period Lab Days/Period Teacher
Comparing Primates (adapted from Comparing Primates Lab, page 431-438, Biology Lab Manual, by Miller and Levine, Prentice Hall Publishers, copyright 2000, ISBN 0-13-436796-0) Background: One of the most
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Contents Cell biology Organisms and cells Building blocks of cells How genes encode proteins? Bioinformatics What is bioinformatics? Practical applications Tools and databases
More informationBasic Bioinformatics: Homology, Sequence Alignment,
Basic Bioinformatics: Homology, Sequence Alignment, and BLAST William S. Sanders Institute for Genomics, Biocomputing, and Biotechnology (IGBB) High Performance Computing Collaboratory (HPC 2 ) Mississippi
More informationMutagenesis. Classification of mutation. Spontaneous Base Substitution. Molecular Mutagenesis. Limits to DNA Pol Fidelity.
Mutagenesis 1. Classification of mutation 2. Base Substitution 3. Insertion Deletion 4. s 5. Chromosomal Aberration 6. Repair Mechanisms Classification of mutation 1. Definition heritable change in DNA
More informationBasic Concepts of Human Genetics
Basic Concepts of Human Genetics The genetic information of an individual is contained in 23 pairs of chromosomes. Every human cell contains the 23 pair of chromosomes. One pair is called sex chromosomes
More informationBio-inspired Models of Computation. An Introduction
Bio-inspired Models of Computation An Introduction Introduction (1) Natural Computing is the study of models of computation inspired by the functioning of biological systems Natural Computing is not Bioinformatics
More information7.88J Protein Folding Problem Fall 2007
MIT OpenCourseWare http://ocw.mit.edu 7.88J Protein Folding Problem Fall 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 7.88 Lecture Notes - 8 7.24/7.88J/5.48J
More informationGreen Genes: a DNA Curriculum
Green Genes: a DNA Curriculum Massachusetts 4-H Program Activity #2: Build Your Own DNA Model from Paper Parts Time: 45-60 minutes Introduction to Group: Ask the group what DNA is. Make sure the key concept
More informationAutomatic motif discovery in an enzyme database using a genetic algorithm-based approach
Soft Comput (2005) DOI 10.1007/s00500-005-0490-z FOCUS D. F. Tsunoda H. S. Lopes Automatic motif discovery in an enzyme database using a genetic algorithm-based approach Published online: 10 May 2005 Springer-Verlag
More informationBaum-Welch and HMM applications. November 16, 2017
Baum-Welch and HMM applications November 16, 2017 Markov chains 3 states of weather: sunny, cloudy, rainy Observed once a day at the same time All transitions are possible, with some probability Each state
More informationDistributions of Beta Sheets in Proteins with Application to Structure Prediction
Distributions of Beta Sheets in Proteins with Application to Structure Prediction Ingo Ruczinski Department of Biostatistics Johns Hopkins University Email: ingo@jhu.edu http://biostat.jhsph.edu/ iruczins
More informationPROTEIN MODEL CHALLANGE
PROTEIN MODEL CHALLANGE Lin Wozniewski lwoz@iun.edu Disclaimer i This presentation was prepared using draft rules. There may be some changes in the final copy of the rules. The rules which will be in your
More information