Predicted residue-residue contacts can help the scoring of 3D models
Michael L. Tress* and Alfonso Valencia

Proteins: Structure, Function, and Bioinformatics
Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

ABSTRACT
During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
Proteins 2010; 78: © 2010 Wiley-Liss, Inc.
Key words: protein structure; structural prediction; residue-residue contacts.

INTRODUCTION
Over the past 20 years, many different methods have been developed for the prediction of residue-residue contacts in proteins. The techniques developed for the prediction of contacting residues are usually based on the extraction of correlated mutations from pairs of columns in a multiple alignment [1-5], the training of machine learning methods on contact maps from real structures [6-10], or some combination of both [11,12]. In addition, methods that make contact predictions based on the contact maps of predicted models were introduced in the residue-residue contact prediction section of the most recent Critical Assessment of Protein Structure Prediction (CASP) experiment. Contact predictions based on the contact maps taken from model structures were made possible because the CASP organizers made the server-predicted model structures available to the predictors as part of the CASP8 structure prediction experiment.
The prediction of residue-residue contacts has been part of CASP since CASP2 [13]. However, contact prediction in its present form has only been an integral part of the experiment since CASP4 [15], and head-to-head evaluation of the participating groups has only been possible since CASP5 [16]. In the early CASP experiments very few groups participated in the contact prediction category, but since CASP7 there has been renewed interest, as is clear from the number of groups that have published new work related to contact prediction since the last experiment.
Although intramolecular residue-residue contacts have traditionally been viewed as a source of information that might be useful in the prediction of protein structure, predicted contacts are rarely used directly by structure prediction programs [28]. Contact information has been incorporated in the form of contact potentials in fold recognition methods [29,30]. It has been suggested that predicted contacts might be used as restraints in NMR distance geometry and simulation techniques [31], and that predicting just a few important residue-residue contacts with sufficient accuracy might be enough to infer approximate 3D model structures directly for many small proteins [32]. Contact predictions used in this way would be most valuable for target structures that must be modeled de novo.
However, the approximately 20% accuracy routinely reported in the CASP experiments for the free modeling targets suggests that residue-residue contact predictions are not yet accurate enough to be used in ab initio structure prediction.

Additional Supporting Information may be found in the online version of this article.
Grant sponsor: Consolider, E-Science; Grant number: CSD ; Grant sponsor: ENCODE Project; Grant number: U54 HG
*Correspondence to: Michael Tress, Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), c./ Melchor Fernandez Almagro, Madrid, Spain. mtress@cnio.es
Received 28 October 2009; Revised 3 February 2010; Accepted 13 February 2010
Published online 12 March 2010 in Wiley InterScience. DOI: /prot
PROTEINS © 2010 WILEY-LISS, INC.

One further use of predicted contacts was suggested by the CASP7 contact prediction assessors [18], among others: reliably predicted intramolecular contacts might be used to select among a range of alternative model structures, or could at least be used to limit conformational searches, either through postprocessing or directly as part of the prediction strategy. At the CASP8 meeting in Cagliari in 2008, it was also suggested that even less reliable residue-residue contact predictions may still be useful enough to choose between a range of de novo modeled loop regions [33]. In fact, the idea that predicted contacts, or at least some form of sequence-based contact prediction, might be able to discriminate between predicted models has a long history [34]. The CASP3 assessors [14] were the first to test the theory that predicted contacts might be useful in scoring models. They carried out a simple test in which they threaded the target sequence onto the target structure itself and scored the threaded targets with the predicted contacts, and they reported some success in selecting the correctly threaded target for fold recognition targets. Recently, several groups have carried out self-assessments that seem to suggest that predicted contacts might be useful in selecting better scoring models from a range of decoys [35,36].
Here, we have revisited the CASP8 contact predictions and attempted to use them to score the models predicted by the structure prediction groups and to score the targets themselves. The results show conclusively that information from predicted contacts can be useful in scoring models in the free modeling and difficult comparative modeling regimes, and that predicted contacts might also be of some use in selecting short ab initio modeled regions.

METHODS
Contact predictors
In CASP8, 22 groups made predictions for the contact prediction category [19]. Predicting groups in CASP submit lists of residue pairs that are predicted to be in contact at 8 Å, along with a probability estimate that each pair is in contact. The prediction format allows predictors a certain amount of flexibility, and this flexibility allows predicting groups to use a wide range of different strategies. Predictors submit different numbers of predictions and use different distance cut-offs. In addition, the density of the predicted contacts varies between groups: some groups avoid predicting adjacent contacts, whereas other groups predict all possible contacts. This means that predictions from these groups are not easily comparable. In CASP, the assessors get around the problem by using the predictor reliability score to rank the predictions of each group and by setting limits on the number of predictions that are assessed for each target. The length of the target or target domain sequence is usually used to ensure that predictors are assessed over a fixed number of predicted pairs. In CASP8, the assessors used the length of the target domain sequence as a fixed reference point to allow comparisons between groups, limiting predictions by each group to the top-ranked L/5 or L/10 predictions, where L was the length of the target domains defined by the assessors. In this study, we also took the first L/10 predictions as a reference point, but we also scored models using all the predicted contacts submitted by each of the groups. Although the length cut-offs used by the CASP contact prediction assessors make comparison between groups more reliable, the differences between the strategies of the predicting groups mean that it is still difficult to compare groups in a completely fair manner, especially when predictors are assessed over all predicted contacts.
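The length-based truncation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used by the CASP assessors or in this study: the `Contact` record and the `select_contacts` helper are hypothetical names, and rounding L/10 down to a whole number of pairs is an assumption. Pairs are first filtered by a minimum sequence separation and then ranked by the predictor's own reliability estimate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    """One predicted pair (hypothetical record): residue numbers and reliability."""
    i: int
    j: int
    prob: float


def select_contacts(contacts, domain_length: int, min_sep: int = 6,
                    fraction: Optional[int] = 10):
    """Filter by sequence separation, rank by reliability, keep the top L/fraction.

    fraction=None keeps every predicted contact that passes the separation filter.
    """
    # Keep only pairs whose sequence separation |i - j| is at least min_sep.
    kept = [c for c in contacts if abs(c.i - c.j) >= min_sep]
    # Rank by the predictor's reliability estimate, most reliable first
    # (Python's sort is stable, so ties keep their submitted order).
    kept.sort(key=lambda c: c.prob, reverse=True)
    if fraction is None:
        return kept
    # Truncate to L/fraction pairs, rounding down (an assumption).
    return kept[: domain_length // fraction]


# Usage: a 20-residue domain gives an L/10 cut-off of 2 pairs.
preds = [Contact(3, 40, 0.91), Contact(10, 14, 0.88), Contact(25, 60, 0.75)]
top = select_contacts(preds, domain_length=20, min_sep=12)
```

Passing `fraction=None` corresponds to the "all predicted contacts" setting used alongside the L/10 reference point.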
Because of these differences, and because we are studying the utility of predicted contacts in general rather than the effectiveness of individual groups, we have chosen not to identify the CASP predicting groups in this study. However, in the interest of furthering study in this field, we will identify them on request.
The CASP8 contact assessment concentrated on a limited number of sequence separation ranges. We also looked at a limited number of sequence separation ranges, but we were interested in assessing the usefulness of short-range contacts as well as long-range contacts. We looked at predicted contacts in three sequence separation ranges, x ≥ 6, x ≥ 12, and x ≥ 24, where x is the number of residues separating the two residues of a predicted pair. In theory, predicted contacts at a sequence separation of 24 residues or more are much more valuable as structural constraints in 3D structure prediction, but the pool of available contact information is much greater if predicted short-range contacts are also taken into consideration. The predictions for the CASP8 contact prediction experiment used in this article are available on the CASP web site [37].
Target selection
In CASP8, contact predictors were only assessed over a set of free modeling target domains. This is because predicted contacts, if they are to be used as part of the structure prediction process, are most useful for targets that are not built from homologous template structures; predicting contacts for targets with known templates was felt to be relatively trivial. To assess the use of predicted contacts in the scoring of model structures, we looked at the contact predictions for all types of targets. Evaluations were made over the CASP8 assessor-defined domains [38]. In CASP, targets are often split into multiple domains, and the domains of a single target may be assigned to different prediction categories. As in the CASP8 analysis, only the residues in the official CASP8 target domain definitions for each target were considered in the analysis. In part, this is because GDT-TS [39] scores for models of multidomain targets are too noisy because of the unpredictability of domain orientation, and in part because domains in multidomain targets often belong to different prediction categories. We divided the CASP8 target domains into three groups: free modeling targets (the 13 free modeling and template-based modeling/free modeling overlap target domains from the CASP8 structure prediction evaluation), hard comparative modeling targets, and easy comparative modeling targets. The hard comparative modeling targets were 27 CASP8 target domains for which the average of the best AL0 alignment scores over all models was less than 65. The remaining 124 CASP8 target domains formed the easy comparative modeling target set.
We also examined the possible use of contact predictions in selecting loops for template-based modeling target domains that we felt could only have been built ab initio. Loops defined as ab initio had at least seven residues that were 4 Å or more away in all close templates. We checked the closest 20 templates according to LGA [40]. Each loop superposition was checked by eye and extended if there were further residues clearly without a reasonable template.
Assessment
We used the predicted contacts from each CASP8 residue-residue contact prediction group to score the 3D models predicted by the CASP8 structure prediction groups. For each predictor, we first generated sets of predicted contacts for each target domain based on the range of sequence separations and distance cut-offs used in the experiment. These sets of predicted contacts were then used to score the predicted 3D models for each of the target domains and the target domain itself. For each structure, we simply calculated the sum of the distances between each of the predicted pairs in the contact set. We used the sum of the distances to make two comparisons: how good the predicted pairs were at recognizing the target structure, and how well they would have selected among the predicted 3D models.
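The scoring scheme is simple enough to write down directly. The sketch below is hypothetical: it assumes each structure is given as a mapping from residue number to the coordinates of one representative atom per residue (assumed here to be Cβ), and the function names are illustrative. It also computes a normalized rank for the native structure as the fraction of models whose sum of distances is larger than the native's, so that 1.0 means the native scores best and about 0.5 is expected at random. The article does not spell out its exact normalization, so that formula is an assumption too.

```python
import math


def contact_score(coords, contacts):
    """Sum of distances (Å) between the two residues of every predicted pair.

    coords: dict residue number -> (x, y, z) of one representative atom
    (assumed here to be C-beta). Lower sums mean the structure agrees
    better with the predicted contacts.
    """
    return sum(math.dist(coords[i], coords[j]) for i, j in contacts)


def normalized_native_rank(native_score, model_scores):
    """Fraction of models whose contact score is worse (higher) than the native's.

    1.0 means the native structure scores best; about 0.5 is expected at random.
    """
    worse = sum(1 for s in model_scores if s > native_score)
    return worse / len(model_scores)
```

Because every pair contributes its full distance, a single badly placed pair can dominate the sum; this is one reason the authors describe the scheme as crude.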
The first comparison, selecting the native structure, should show the real predictive value of the contacts that the predictors are generating. For the second comparison, we left out the real structure and concentrated solely on the predicted 3D models. We wanted to know how the contact predictors would have performed if they had been used as naïve model scoring methods in CASP8. Here, we used the sum of distances we generated for each 3D model to select a single 3D prediction for each target domain. This assessment is somewhat closer to the way that contact predictors might be used to score models and allows a comparison with the structure prediction servers and quality assessment predictors that participated in CASP8. Selecting the model with the lowest sum of the distances between the predicted pairs is not a very sophisticated means of model selection from contacts, and predictors could almost certainly build better systems. However, even this simple tactic worked well in this experiment.
To make the comparisons, we included all predicted 3D models for each target, not just model 1. However, we excluded physically unrealistic models from the evaluation. Models that had Cβ-Cβ distances of less than 3.2 Å, or consecutive residues whose Cα atoms were more than 4.25 Å apart, were eliminated from the assessment a priori. 3D models that only covered a fraction of the predicted contacts for the target domain were also excluded, because if a 3D model does not contain the residues predicted to be in contact, their contribution to the sum of the distances will be zero. The GDT-TS scores of the model structures submitted to CASP were taken from the CASP8 web pages [41].

RESULTS
Using predicted contacts to select the native target structures
We first investigated whether it was possible to use the predicted contacts to select the native structure from among the decoys in the free modeling regime.
Eleven predictors managed to place the native structure among the top three scoring structures for at least five of the 13 free modeling targets. Using all the contacts and a minimum sequence separation of 12 residues, eight of the prediction groups would have been able to identify the native structures of T0443-D2 and T0496 using their predicted contacts, whereas at the other end of the scale, four target domains (T0476, T0482, T0443-D1, and T0405-D1) completely resisted selection using contacts. Three of these latter targets were mostly alpha-helical.
For each predictor, we calculated the mean rank of the native structure when chosen from among the 3D models. The rank of the target structure was normalized over the number of models from which the selection was made. The results can be seen in Figure 1(A). All predictors have a mean normalized rank well over 0.5, the score that would be expected if the ranking of the native structure were entirely random. Most predictors perform better at selecting the native structure when all predicted contacts (rather than the top L/10) are used. One caveat to these results is that there are very few free modeling targets. However, similar results are obtained when predicted contacts are used to select the native structures in the hard comparative modeling section, where there are 26 targets. The mean normalized rankings of the native structure for the harder comparative modeling targets [Fig. 1(B)] show that again all predictors have a ranking well over 0.5 and that many of the better predictors have mean normalized rankings of more than 0.8. Once again, most predictors are better at selecting the native structure when all predicted contacts are used.
Figure 1 Normalized ranks for native structure selection. The mean normalized rank of the native structure after scoring with predicted contacts. In (A) the results for the 13 free modeling targets, in (B) the results from the 26 hard comparative modeling targets. Results are compared using all predicted contacts and a subset of the top-ranked L/10 predicted contacts. In this experiment, the minimum sequence separation was 24 residues. Note that most groups are better at selecting the target when all predicted contacts are used to score models.
Figure 2 Normalized ranks for native structure selection varying the sequence separation. The mean normalized rank of the native structure after scoring with predicted contacts. Results are compared using the hard comparative modeling subset (26 target domains) and three different sequence separations: a minimum of six residues, a minimum of 12 residues, and a minimum of 24 residues. Comparisons are made with all predicted contacts. Similar results can be seen with the free modeling subset.
When all the predicted contacts were used to score the models with a minimum sequence separation of 12 residues, more than half of the contact prediction groups were able to recognize the native structures of target domains T0429-D2, T0430-D2, T0501-D2, T0504-D2, and T0510-D1. The contacts predicted by one of the groups (CP01) for target domain T0429-D2 illustrate the effectiveness of the predicted contacts (Supporting Information Fig. 1). The predictor CP01 predicts many residue-residue contacts (there are a total of 419 predicted contacts for target T0429-D2), and most of them are false positives.
However, the predicted contacts lie exclusively in the core of the protein: no contacts are predicted for residues in the loops that extend from the globular structure. T0429-D2 was one of the hardest template-based targets; only one tertiary structure prediction group managed to generate a model with over 50% of the alignment correct.
We used the hard comparative modeling target subset to assess the effect of limiting the sequence separation of the predicted contacts. As can be seen from Figure 2, limiting the sequence separation of the predicted contacts does not have much influence on the power of the predicted contacts to separate the native structure from the predictions. Equally, there was no clear pattern to be seen with the free modeling targets (results not shown).
Although predicted residue-residue contacts do help to select native structures in the free modeling and hard template-based modeling regimes, they work less well with the easier template-based modeling targets. The comparison of the mean normalized ranks of the selected targets from the three groups (Fig. 3) shows that while the predicted contacts have discriminatory information when the evolutionary divergence between targets and potential structural templates is greatest (the free modeling and hard comparative modeling regimes), they have almost no discriminatory power when the native structure is similar to the predicted 3D models.
Figure 3 Normalized ranks for native structure selection in three different prediction regimes. The mean normalized rank of the native structure after scoring with predicted contacts. Results are compared using the three different subsets: free modeling, and hard and easy comparative modeling. Comparisons are made with all predicted contacts and a minimum sequence separation of 12 residues, and only groups with predictions for a minimum of eight targets in the free modeling category are shown. Most contact prediction methods perform close to random at selecting the native structure in the easy comparative modeling regime.
Model scoring
Although the recognition of the target structure shows that residue-residue contact prediction methods can generate information that is useful for discriminating real structures from remotely similar 3D models, this is not yet a real-world application. This is particularly true in the free modeling and hard comparative modeling regimes, where the best predicted models are not necessarily highly similar to the native structure. In this second experiment, we wanted to see whether the predicted contacts could be used to help score and select predicted model structures. Each contact predictor was treated as if it were a structure prediction method. We took all the models for each CASP8 target domain and excluded those that were not feasible (as detailed in the Methods section), those that were identical to other models, and those that were missing the residues predicted to be in contact. For the remaining models, we calculated the sum of distances between the predicted pairs from each contact prediction method. The model with the lowest sum of distances was chosen as the prediction from the contact server for that target domain. This is one way that contact predictors might be used to select good structural models, and it allows a comparison with the structure prediction servers that participated in CASP8.
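A possible sketch of this naïve selection step, including the feasibility filters from the Methods (Cβ-Cβ distances below 3.2 Å, and Cα-Cα gaps above 4.25 Å between consecutive residues), is shown below. It is hypothetical in several respects: models are passed as dictionaries of Cα and Cβ coordinates keyed by residue number, only residues numbered consecutively are checked for chain breaks, a model is required to contain every residue of every predicted pair (a stricter reading of the coverage rule), and the removal of duplicate models is left out.

```python
import math
from itertools import combinations


def is_feasible(ca, cb, min_cb=3.2, max_ca_gap=4.25):
    """Apply the two physical-realism filters to one model.

    ca, cb: dicts residue number -> (x, y, z) for C-alpha and C-beta atoms.
    """
    # Reject models in which any two C-beta atoms are closer than 3.2 Å.
    for a, b in combinations(cb.values(), 2):
        if math.dist(a, b) < min_cb:
            return False
    # Reject chain breaks: consecutive residues with C-alpha atoms > 4.25 Å apart.
    for r in ca:
        if r + 1 in ca and math.dist(ca[r], ca[r + 1]) > max_ca_gap:
            return False
    return True


def select_model(models, contacts):
    """Return the name of the feasible model with the lowest sum of distances.

    models: dict name -> (ca, cb); contacts: list of (i, j) residue pairs.
    Models missing any residue of a predicted pair are skipped.
    """
    best_name, best_score = None, math.inf
    for name, (ca, cb) in models.items():
        if not is_feasible(ca, cb):
            continue
        if not all(i in cb and j in cb for i, j in contacts):
            continue  # missing pairs would contribute nothing and bias the sum down
        score = sum(math.dist(cb[i], cb[j]) for i, j in contacts)
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

The all-pairs clash check is quadratic in the number of residues, which is acceptable for single domains; a spatial grid would be the usual choice for larger sets.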
It should be noted that the comparison between the server predictions in CASP8 and the contact predictors is not completely fair, in part because the contact predictors are selecting from a different pool of models: the contact predictors have access to all the models submitted to CASP8, including those human group predictions that only became available after the experiment. However, the aim of the article is not to demonstrate whether or not contact prediction servers are better than 3D prediction servers, but to determine whether contact predictions have any use in 3D model prediction strategies.
The results from the free modeling section can be seen in Figure 4. This shows that predictors are markedly better at selecting models when all predicted contacts are used, and that contacts predicted at a sequence separation of 12 or more residues are better at selecting high scoring models than those predicted at a separation of 24 or more residues. However, the results for the free modeling target domains should be viewed somewhat skeptically, in part because GDT-TS is not as effective a measure of prediction accuracy in the free modeling regime and in part because there were so few targets in this subset.
Figure 4 Mean GDT scores for models selected by contact predictors. The mean GDT-TS scores of the models selected as number one by the contact prediction methods. In (A) the results for the 13 free modeling targets, in (B) the results from the 26 hard comparative modeling targets. Results are compared using all predicted contacts and a subset of the top-ranked L/10 predicted contacts, and at two different sequence separations, a minimum of 12 residues and a minimum of 24 residues. Most predictors select models with slightly higher GDT-TS scores at a minimum sequence separation of 12 and using all predicted contacts.
Results similar to those from the free modeling subset can be seen with the subset of harder comparative modeling targets [Fig. 4(B)], although here the effect of limiting the sequence separation of the contacts is less pronounced. Comparisons with predictions made with minimum sequence separations of 6, 12, and 24 show that the effect of restriction by sequence separation is minimal and prediction method specific (Supporting Information Fig. 2). As in the first experiment, we compared how well each predictor performed in each of the three regimes, and the results (Supporting Information Fig. 3) mirror the equivalent comparison in the native structure selection experiment. Contact predictors rank the top scoring models much higher over the harder comparative modeling targets and the free modeling targets than they do over the easy comparative modeling targets.
We also asked how well the contact predictors would have performed in CASP8 if they had been able to select model structures based solely on the predicted contacts. Here, we evaluated the selections made by the contact predictors against the predictions made by the structure prediction groups in CASP8. CASP has traditionally measured comparative structure prediction performance using the mean of the Z-scores calculated from the GDT-TS scores for each target. Since structure prediction groups and contact prediction groups predict over different numbers of targets, and each of the target domains has a different degree of difficulty, we compared the contact predictors and structure prediction servers in the same way as CASP. Z-scores were calculated from the GDT-TS scores for each target, not only from the first models of the structure prediction groups but also from the first selected models from the contact prediction groups. We calculated the mean Z-scores over all the predicted targets. In CASP, negative Z-scores are often set to 0; we did not do that in this comparison.
The results can be seen in Figure 5. Although we calculated the Z-scores over all the predictors, for clarity the figures only show the best official structure prediction servers and the best contact prediction groups. The models selected by the contact prediction groups are clearly competitive with the best servers in both the free modeling [Fig. 5(A)] and hard comparative modeling categories [Fig. 5(B)]. The lack of targets in the free modeling category makes it almost impossible to make reliable statistical comparisons between groups. However, the results from the free modeling category agree with those for the hardest targets, suggesting that the predictive power of residue-residue contacts for the hard-to-predict target domains is real. By contrast, only two contact prediction groups scored well when compared with the best structure prediction servers in the easy comparative modeling section [Fig. 5(C)].
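The per-target Z-score comparison can be sketched as follows, assuming a table of GDT-TS scores indexed by target and then by group. For each target, the mean and (population) standard deviation are computed over all groups, servers' first models and contact predictors' selected models alike, and negative Z-scores are kept rather than set to 0. This is plain standardization under those assumptions; any additional outlier handling that CASP assessors may apply is not reproduced, and `mean_z_scores` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_z_scores(gdt):
    """gdt: dict target -> dict group -> GDT-TS of that group's first
    (or contact-selected) model. Returns dict group -> mean Z-score."""
    per_group = defaultdict(list)
    for scores in gdt.values():
        values = list(scores.values())
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # every group scored the same: nothing to standardize
        for group, x in scores.items():
            # Negative Z-scores are kept, not set to 0.
            per_group[group].append((x - mu) / sd)
    # Average only over the targets each group actually predicted.
    return {group: mean(zs) for group, zs in per_group.items()}
```

Averaging per group over only the targets it predicted is what allows groups that covered different numbers of targets to be placed on one scale.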
The two contact prediction groups that scored well in the easy comparative modeling section both predict contacts using contact maps taken directly from predicted models, a somewhat circular process. CP10 is the best performing contact predictor in the hard template-based modeling section too, better even than the best structure prediction server, suggesting that consensus contacts from predicted models could be a powerful tool for selecting models in the hard template-based modeling regime. Curiously, though, CP10 performs worst of all the methods when attempting to select the native structure, suggesting that methods based on model contact maps may be learning useful consensus-based features of the models rather than native-like structural features. CP17, the best performing contact predictor in the easy comparative modeling regime, almost certainly takes its contacts, directly or indirectly, from the models generated by one of the best performing structure prediction servers in that regime.
Figure 5 Comparison between contact predictors and CASP8 automatic servers. Contact predictors are compared with the best of the automatic servers in CASP8. The comparison is made over the mean Z-scores of the GDT-TS scores of the top-ranked model from each method. Contact predictors (CP) in purple, servers (TS) in light green. In (A) the results for the 13 free modeling targets, in (B) the results from the 26 hard comparative modeling targets, and in (C) the results from the 118 easy comparative modeling targets. Contact predictor selections are made with all predicted contacts and a minimum sequence separation of 12 residues. Contact predictors are comparable to the best servers at selecting models in the free modeling and hard comparative modeling regimes. In the easy comparative modeling regime, the automatic servers are clearly better than all but the contact predictors that take their contact maps from the predicted models in CASP. The comparison between automatic servers and contact predictors cannot be totally fair, but it does demonstrate the efficacy of the contact prediction servers in the free modeling and hard regimes and their poor results over the easier comparative modeling targets.
There were a number of outstanding examples of model scoring with contacts, despite the low accuracy of the predicted contacts. For example, in the free modeling section, a total of 10 predictors would have selected a model that was second only to the number one ranked MULTICOM [42] model for target domain T0416-D2 [Fig. 6(A)], using all the contacts and a sequence separation limit of 12 residues or more. Four contact prediction groups (CP00, CP07, CP08, and CP12) would have selected one of the standout models for target domain T0513-D2 [Fig. 6(B)]. There were also outstanding examples in the hard comparative modeling regime, such as target domain T0429-D2, where 10 contact prediction groups would have selected the top-ranked model (the MUSTER server [43]; Fig. 6(C)) and three other groups would have selected models with higher GDT-TS than the second-ranked predictor. Similarly, nine contact prediction groups would have selected (different) models with GDT-TS scores comparable to those of the top four predictors for target domain T0457-D2 (BAKER [44], the PSI server, IBT-LT [45], and Zhang [33]), whereas for target domain T0487-D4 four contact groups (CP00, CP04, CP12, and CP13) would have selected the best scoring human model [from BAKER, Fig. 6(D)], and three others would have selected slightly inferior models that still scored between 8 and 12 GDT-TS points more than the second-ranked human predictor for this target domain and a full 25 GDT-TS points more than the best number one-ranked server model (Fig. 7).
Figure 6 Outstanding examples of model selection using predicted contact distances. Superpositions of the models selected using the predicted contacts and the native structures of the target domains. Models were scored using all the contacts and with a minimum sequence separation of 12 residues. A: The second-ranked model for target domain T0416-D2 (in red, selected by 10 predictors) and the native structure (yellow). B: One of the outstanding models for target domain T0513-D2 (in red, selected by four predictors) and the native structure (yellow). C: The best model for target domain T0429-D2 (in red, selected by 10 predictors) and the native structure (yellow). D: The best model for target domain T0487-D2 (in red, selected by four predictors) and the native structure (yellow).
The contacts predicted by group CP08 for target domain T0407-D2 would have selected the second highest ranked model using our crude model scoring system, a model that was within three GDT-TS points of the IBT-LT prediction and five GDT-TS points better than the second best human prediction. Comparing the selected model with the best number one-ranked server model, from the PHYRE server [46], it is clear that the regions with the greatest difference between the model selected by CP08 and the PHYRE model are at the terminal ends of the protein. The positioning of the N-terminal strand of the BAKER model is more or less correct, whereas these residues do not form a strand in the PHYRE model (Fig. 7). There is a misaligned strand in the second example in Figure 7 too, although here there are also substantial differences in one of the loop regions between the best comparable server model, from Pcons [47], and the model selected using the contacts predicted by the four contact groups (CP00, CP04, CP12, and CP13).
Figure 7 Best contact predictor models and server models. Native-model superpositions for two targets where contact predictors performed substantially better than the best server predictor. All structures are colored blue to red along their sequence. A: Superposition of the model selected using the contacts predicted by group CP08 and the native structure of target domain T0407-D2. B: Superposition of the top-ranked PHYRE model and the native structure of target domain T0407-D2. The N-terminal (blue) and C-terminal (red) regions are predicted differently in the two models. C: Superposition of the model selected using the contacts predicted by four contact groups (CP00, CP04, CP12, and CP13) and the native structure of target domain T0487-D4. D: Superposition of the top-ranked PCons model and the native structure of target domain T0487-D4. The greatest differences between the two models are that the second strand is displaced and that the loose loop conformation between the second and third strands is entirely different.
Ab initio predicted regions
We also used predicted residue-residue contacts to score regions of structures that did not have template structures. These regions were either loops or terminal regions that could only have been modeled ab initio. All these regions (bar one) were part of assessed template-based target domains in CASP8. There were 36 regions from CASP8 target domains, ranging in size from seven residues (target domain T0510, residues 136 to 142) to 39 residues (target domain T0419, residues 256 to 294).
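The Methods define an ab initio region as a loop of at least seven residues lying 4 Å or more from the corresponding positions in all close templates. A hypothetical sketch of that criterion follows. It assumes per-residue deviations have already been computed from template superpositions (for example with LGA), treats residues missing from a template alignment as unaligned, and assumes the seven residues must be contiguous; the function name is illustrative, and the manual inspection and extension of each loop described in the Methods is not reproduced.

```python
import math


def ab_initio_regions(residues, template_devs, cutoff=4.0, min_len=7):
    """Find contiguous runs of residues that no close template covers.

    residues: residue numbers of the target domain, in order.
    template_devs: one dict per template, residue number -> deviation (Å)
    between the target residue and its superposed template position;
    residues absent from a dict are treated as unaligned.
    Returns a list of (first, last) residue numbers.
    """
    def template_free(r):
        # A residue is template-free if it is >= cutoff Å away in every template.
        return all(devs.get(r, math.inf) >= cutoff for devs in template_devs)

    regions, run = [], []
    for r in residues:
        if template_free(r) and (not run or r == run[-1] + 1):
            run.append(r)
            continue
        # The current run has ended; keep it if it is long enough.
        if len(run) >= min_len:
            regions.append((run[0], run[-1]))
        run = [r] if template_free(r) else []
    if len(run) >= min_len:
        regions.append((run[0], run[-1]))
    return regions
```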
In addition, we included residues 235 to 304 from target T0395, which were excluded from the CASP8 target definition. In Figure 8, we show how successful the contact predictors were at differentiating the native loop from the modeled regions. The better predictors have a mean normalized rank of native selection of 0.7. This is less impressive than their performance on the free modeling targets and the hard template-based modeling targets, but it shows that the contacts have discriminatory power here too. Although we have not identified the predictors, there is a clear difference between those methods that base their contacts on information that can be gleaned from the target sequence (on the left) and those that take their predicted contacts from the contact maps of predicted models (poor scoring predictors on the right). Predictions taken from model contact maps are not good at recognizing native loop conformations, and indeed predictor CP10 performs markedly worse than random on this test. This is in part because structure prediction servers generally perform poorly when modeling these ab initio regions. Of the regions that would have had to be modeled ab initio, 24 were mainly helical, four formed strands and eight were mainly loop. Although the sample was small, contact predictors appeared to perform better at selecting native ab initio regions when the region was part of a beta sheet and less well when it formed a loop (Supporting Information Fig. 4).

The final comparison measures how effective the contact prediction servers would have been at selecting regions that have to be modeled ab initio. Here, we looked at the same 37 regions as in the first experiment, but only considered the modeled ab initio regions in those models that reached a certain overall quality. We chose as a cut-off models that had GDT-TS scores of at least 60. After removing low-quality models, we were left with models for just 18 ab initio regions. Once again, for each predictor we calculated the mean normalized ranking of the selected loops. The results can be seen in Figure 8(B). On the whole, the predictors perform better when all contacts are used to score the loops. Two contact predictors stand out, CP02 and CP08, with a mean normalized ranking greater than 0.7 (comparable to the selective power seen in the best structure predictors, see Fig. 4), and many servers have scores between 0.6 and 0.7, so their selection of the ab initio regions was somewhat better than a random selection (0.5). Once again, the predictors that take their contacts from the contact maps of predicted 3D models were among the worst performing groups in this section, another indication of the poor quality of the CASP8 structure prediction servers when predicting ab initio regions in template-based models.

Figure 8 Normalized rankings for selecting native loops and for selected model loops. (A) The mean normalized rank of the native loop after scoring with predicted contacts. (B) The mean normalized GDT-TS rank of the selected model after scoring with predicted contacts. Results are compared using all predicted contacts and a subset of the top-ranked L/2 predicted contacts. In this experiment, the minimum sequence separation was eight residues. Predictors were only compared if they predicted loops for at least 10 of the targets, and comparisons in (B) were only made for those models that reached a GDT-TS score of at least 60.

A comparison with quality assessment methods

We have used predicted contacts to score predicted 3D models. In theory, contact predictors would be able to estimate the overall quality of 3D models with a slight modification of their methods. CASP has included a quality assessment category since CASP7, and we have been able to use the predictions and results from the CASP quality assessment experiment 48 to compare the performance of contact predictors with that of the quality assessment groups. The comparison with the quality assessment predictors presented a number of difficulties. The quality assessment category was carried out with whole targets, whereas the contact prediction scoring in this article used target domains.
The quality assessment predictors had access only to the CASP server models, whereas the models that the contact predictors were allowed to select came from both human and server groups. We also concentrated only on those target domains that we defined as hard comparative modeling in this article because, as we have already shown, CASP-format predicted contacts are less useful for scoring easy comparative modeling targets. All these difficulties meant that the comparison had to be carried out over a reduced set: the nine targets from the hard comparative modeling section that were not split into domains by the CASP assessors. Comparisons were made only over the server models. The results can be seen in Figure 9.

Figure 9 Mean GDT-TS for the top models chosen by contact predictors and quality assessment groups over the hard comparative modeling subset. The mean GDT-TS score for the models selected as the best model by CASP8 contact prediction groups and quality assessment groups. The means were calculated over a subset of nine hard comparative modeling target domains as described in the main text. Quality assessment group GDT-TS scores are shown in blue and contact predictors in orange. Only the top 33 of the 63 combined quality assessment and contact prediction groups are shown. Models selected by the contact predictors were based on distances calculated from all predicted contacts at a minimum sequence separation of 12 residues. The dark blue bars represent consensus-based quality assessment predictors; the light blue bars show quality assessment groups that use measures calculated for a single model. Single-model predictors QA009, QA365, QA069, QA105, and QA117 all use predicted contacts in some form in their quality assessment measures.

Over the nine targets where it was possible to compare selected server models, 12 quality assessment methods performed better than the best contact predictor. The 12 methods were FAMS-ace2 (human), McGuffin (human 49 ), MULTICOM (human 42 ), SMEG-CCP (human), ModFOLDclust (server 49 ), Pcons_Pcons (server 47 ), selfqmean (server 50 ), MULTICOM-CLUSTER (server 42 ), MODCHECK-Jury (server), and the three servers from the SAM-T08-MQA series. 51 As far as we can tell, 10 of these methods use the CASP server models to generate consensus-based information, the strategy that has been shown to be most effective at selecting models with higher GDT-TS scores. 48 The two that do not use consensus information come from the SAM-T08-MQA series. All of the SAM-T08-MQA servers 51 use predicted contacts to assess the quality of the models, 52,53 and the SAM-T08-MQAO server uses the predicted contacts alone. In addition, SMEG-CCP appears to use contacts predicted from models to make its quality assessment predictions.
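The consensus strategy used by most of the top quality assessment methods can be sketched in a few lines. This is a generic illustration, not the algorithm of any of the servers named above: it assumes a precomputed, symmetric matrix of pairwise similarities between the models in the pool (for example, GDT-TS between each pair of models) and scores each model by its mean similarity to all the others, so that models close to the structural consensus rise to the top. The function name and the toy matrix are hypothetical.

```python
from typing import List


def consensus_scores(similarity: List[List[float]]) -> List[float]:
    """Mean similarity of each model to every other model in the pool.

    `similarity[i][j]` is assumed to be a symmetric pairwise score
    (for example GDT-TS between models i and j); the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        return [0.0] * n  # no other models to agree with
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


# Three mutually similar models and one outlier.
pool = [[100, 80, 80, 20],
        [80, 100, 80, 20],
        [80, 80, 100, 20],
        [20, 20, 20, 100]]
scores = consensus_scores(pool)
top = max(range(len(scores)), key=scores.__getitem__)
print(scores, top)  # [60.0, 60.0, 60.0, 20.0] 0
```

A single-model method, by contrast, scores each model in isolation, without reference to the rest of the pool; that is the setting in which predicted contacts alone, as in the SAM-T08-MQAO server, supply the signal.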
Another nine quality assessment groups score better than the third-best contact prediction group. These also appear to be mainly consensus-based methods (QMEANclust, 50 two servers from the FAMS series, GS-MetaMQAPconsI and II) or to use models (Bilab-UT). MULTICOM-CMFR 42 is a single-model method that has done well over these nine targets, but MULTICOM-CMFR also used predicted and model contact maps as part of its model evaluation scheme. MUFOLD-QA used a range of statistical potentials and machine learning. Although these are results for just nine targets, it is interesting to compare our ranking with that of the quality assessment experiment 48 over all the targets. The quality assessment category had a bias toward easy comparative modeling targets, since there were many more of them in CASP8. 38 Quality assessment groups that performed notably better over the nine hard comparative modeling targets included FAMS-ace2, SAM-T08-MQAO (which uses contacts from alignments to predict quality assessment), SMEG-CCP (predicts quality assessment from consensus contacts from models), Bilab-UT, MUFOLD-QA (uses statistical potentials), selfqmean (uses statistical potentials), MULTICOM-CMFR (uses predicted contacts), and FAMSD. At least over this small sample, predicted contacts do seem to help predict the quality of 3D models for the harder comparative modeling targets.

CONCLUSIONS

In the CASP8 contact prediction experiment, the CASP assessors noted that the number of predictors suggested an increased interest in the use of predicted contacts in structure prediction. Here, we have attempted to use the predicted contacts from the CASP8 residue–residue contact prediction experiment to score the models for the CASP8 targets. We have been able to show that predicted contacts do indeed contain predictive information and that this information can be used to predict structure.
At worst, predicted contacts might be used as an aid in the selection of reliable 3D structural models; at best, models selected using residue–residue contact predictions might even compete with the best of the prediction servers for the harder targets. We split the CASP8 targets into three groups, and it was clear that predicted contacts were much more valuable in the free modeling and harder comparative modeling regimes. Most of the simple predictors built from the predicted contacts for these targets were highly effective at recognizing the target structure, and many were capable of scoring model structures efficiently in these two regimes. Indeed, the experiments show that the contact predictions from the best contact predictors could have been used to select models that were as good as or better than the vast majority of the models submitted by the automatic servers in the CASP8 experiment. Methods that select models based on the contact maps of the predicted models are likely to be particularly effective in the hard comparative modeling regime.

It is important to note that a direct comparison between structure prediction servers and contact predictor scoring is not possible because of the conditions of the experiments. GDT-TS is known not to be an ideal measure of model quality in the free modeling regime, and both the free modeling and harder comparative modeling sets contain few targets. Contact predictors also had the advantage of knowing the domain definitions in hindsight (though our results show that predictors perform equally well on the subset of single-domain proteins) and of being able to select from any model submitted to CASP8 (e.g., both contact-predictor-selected examples in Figure 7 came from the BAKER group, and the servers did not have access to these models). Despite these caveats, our experiments clearly show that even with a crude scoring scheme such as the one in this article, predicted contacts can provide information that is useful for selecting both models and native structures. At present, this information is used in few structure prediction techniques and seems to be orthogonal to the information used by most successful structure prediction strategies.

We also looked at how well groups were able to select ab initio predicted loops. Once again the predictors performed better than expected, though the results were less outstanding than they were for the prediction of whole target domains. Most predictors performed better in the experiments when they used all their predicted contacts, not just the top L/10 ranked contacts. This contrasts with the results of the CASP7 and CASP8 contact prediction experiments, 17,18 which showed that contact prediction accuracy improved as the number of predicted contacts went down. Our experiments demonstrate that even contacts predicted with very low levels of accuracy can be useful for model selection, which runs contrary to the suggestion of Ortiz and Skolnick 54 that just a few highly reliable predicted contacts might be sufficient in structure prediction.
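The contrast between contact prediction accuracy and usefulness for model selection can be made concrete with a small sketch. It assumes, hypothetically, that each predicted contact carries a confidence score, that predictions are ranked by that score, and that accuracy is the fraction of the top-k predictions that are native contacts; the sequence length, predictions and native contacts are invented for illustration. As in the CASP assessments cited above, precision is highest for the shortest list and falls as more contacts are included.

```python
from typing import List, Set, Tuple


def top_k_precision(predicted: List[Tuple[int, int, float]],
                    native: Set[Tuple[int, int]],
                    k: int) -> float:
    """Fraction of the k most confident predictions that are native contacts."""
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked if (min(i, j), max(i, j)) in native)
    return hits / len(ranked)


L = 20  # hypothetical sequence length
predicted = [(1, 15, 0.9), (2, 16, 0.8), (3, 17, 0.7), (4, 18, 0.6),
             (5, 19, 0.5), (1, 20, 0.4), (6, 18, 0.3), (2, 19, 0.2)]
native = {(1, 15), (2, 16), (3, 17), (6, 18)}  # pairs stored as (low, high)

for label, k in [("L/10", L // 10), ("L/5", L // 5), ("all", len(predicted))]:
    print(label, k, top_k_precision(predicted, native, k))
# L/10 2 1.0
# L/5 4 0.75
# all 8 0.5
```

Even at a precision of 0.5, the full list in this toy case contains four native contacts rather than two, which is one way to see why scoring with all predicted contacts could help model selection despite the lower accuracy.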
There was one section in which the contact predictors did not prove useful: no group was able to separate the native structure from the models in a consistent fashion for the easy comparative modeling target domains. The few groups that had some success at scoring models in the easy comparative modeling regime took their contacts from the contact maps of the predicted structures. This may be because contacts predicted at 8 Å are too coarse to capture the subtle differences between models in this regime. We attempted to measure whether the predicted contacts were more useful for scoring certain types of targets, or regions of targets that had to be predicted ab initio. With so few targets, it was difficult to find obvious correlations, although it was clear that the more difficult the target, the better the contact predictors performed, as suggested by the results from the easy comparative modeling section. In addition, contact predictors seemed to perform less effectively with all-helical target domains. There were three poorly predicted all-helical targets in the free modeling regime, and the two domains of T0478 were among the worst predicted in the hard comparative modeling regime. The models chosen by the contact predictors in the easy comparative modeling regime were on average worse than the median scoring model in 19 of the 23 all-helical target domains. It is possible that the residue–residue contact prediction experiment will not be carried out in the usual format in CASP9 because of the lack of suitable targets. 38 However, this work suggests that despite the low levels of accuracy reported in previous CASP experiments, predicted contacts might have a place in structure prediction as part of model scoring strategies.
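The suggestion that an 8 Å contact definition is too coarse for easy comparative modeling targets can be illustrated with a toy example. The sketch assumes contacts are defined between Cβ atoms closer than 8 Å, with a hypothetical minimum sequence separation of six residues. Two models whose coordinates differ by 1.5 Å at one residue produce exactly the same contact map, so any score computed from that map would rank them equally. Coordinates and function names are invented for illustration.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Coords = Dict[int, Tuple[float, float, float]]  # residue -> C-beta (x, y, z)


def contact_map(model: Coords, cutoff: float = 8.0,
                min_sep: int = 6) -> FrozenSet[Tuple[int, int]]:
    """Residue pairs (i < j) whose C-beta atoms lie closer than `cutoff`."""
    return frozenset(
        (i, j)
        for i, j in combinations(sorted(model), 2)
        if j - i >= min_sep and math.dist(model[i], model[j]) < cutoff
    )


model_a = {1: (0.0, 0.0, 0.0), 10: (5.0, 0.0, 0.0), 20: (30.0, 0.0, 0.0)}
model_b = {**model_a, 10: (6.5, 0.0, 0.0)}  # residue 10 moved by 1.5 angstroms

print(contact_map(model_a) == contact_map(model_b))  # True
print(sorted(contact_map(model_a)))                  # [(1, 10)]
```

Finer distinctions would need either a tighter cutoff or real-valued distances rather than a binary map, which is the kind of detail that separates close models in the easy comparative modeling regime.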
It is clear that the best quality assessment methods use some form of consensus information to select models, but the results from the reduced comparison that we were able to carry out with the quality assessment groups suggest that predicted contacts could be a highly effective part of a model scoring strategy. This has also been suggested by the authors of the SAM-T08-MQA servers, 52 and, as our minimal comparison showed, even the SAM-T08-MQA server that used predicted contacts alone was able to compete with the best consensus servers over the harder comparative modeling targets. So predicted contacts do appear to hold promise in the prediction of 3D structures, but rather than guiding the structure prediction itself, they would probably be put to better use scoring or selecting the models generated by structure prediction programs. In addition, this experiment has only looked at the scoring of models for single, compact, globular domains. Predicted contacts might also help in the prediction of other 3D structural features, such as domain–domain orientation and protein docking.

ACKNOWLEDGMENTS

The authors would like to thank Osvaldo Graña and Txema Gonzalez-Izarzugaza for their critical input on the article, the CASP organizers for organizing the CASP experiments, and Andriy Kryshtafovych for maintaining the CASP web pages.

REFERENCES

1. Altschuh D, Lesk AM, Bloomer AC, Klug A. Correlation of coordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J Mol Biol 1987;193:
2. Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins 1994;18:
3. Olmea O, Valencia A. Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold Des Suppl 1997;2:
4. Kundrotas PJ, Alexov EG. Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives. BMC Bioinformatics 2006;7:
5. Fodor AA, Aldrich RW. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins 2004;56:
6. Pollastri G, Baldi P. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics 2002;18:


More information

Are specialized servers better at predicting protein structures than stand alone software?

Are specialized servers better at predicting protein structures than stand alone software? African Journal of Biotechnology Vol. 11(53), pp. 11625-11629, 3 July 2012 Available online at http://www.academicjournals.org/ajb DOI:10.5897//AJB12.849 ISSN 1684 5315 2012 Academic Journals Full Length

More information

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling

Molecular Modeling Lecture 8. Local structure Database search Multiple alignment Automated homology modeling Molecular Modeling 2018 -- Lecture 8 Local structure Database search Multiple alignment Automated homology modeling An exception to the no-insertions-in-helix rule Actual structures (myosin)! prolines

More information

Free Modeling with Rosetta in CASP6

Free Modeling with Rosetta in CASP6 PROTEINS: Structure, Function, and Bioinformatics Suppl 7:128 134 (2005) Free Modeling with Rosetta in CASP6 Philip Bradley, 1 Lars Malmström, 1 Bin Qian, 1 Jack Schonbrun, 1 Dylan Chivian, 1 David E.

More information

CASP6 Assessment of Contact Prediction

CASP6 Assessment of Contact Prediction PROTEINS: Structure, Function, and Bioinformatics Suppl 7:214 224 (2005) CASP6 Assessment of Contact Prediction Osvaldo Graña, 1 David Baker, 2 Robert M. MacCallum, 3 Jens Meiler, 4 Marco Punta, 5 Burkhard

More information

An Overview of Protein Structure Prediction: From Homology to Ab Initio

An Overview of Protein Structure Prediction: From Homology to Ab Initio An Overview of Protein Structure Prediction: From Homology to Ab Initio Final Project For Bioc218, Computational Molecular Biology Zhiyong Zhang Abstract The current status of the protein prediction methods,

More information

Distributions of Beta Sheets in Proteins With Application to Structure Prediction

Distributions of Beta Sheets in Proteins With Application to Structure Prediction PROTEINS: Structure, Function, and Genetics 48:85 97 (2002) Distributions of Beta Sheets in Proteins With Application to Structure Prediction Ingo Ruczinski, 1,2 * Charles Kooperberg, 2 Richard Bonneau,

More information

Protein Tertiary Model Assessment Using Granular Machine Learning Techniques

Protein Tertiary Model Assessment Using Granular Machine Learning Techniques Georgia State University ScholarWorks @ Georgia State University Computer Science Dissertations Department of Computer Science 3-21-2012 Protein Tertiary Model Assessment Using Granular Machine Learning

More information

Experimental design of RNA-Seq Data

Experimental design of RNA-Seq Data Experimental design of RNA-Seq Data RNA-seq course: The Power of RNA-seq Thursday June 6 th 2013, Marco Bink Biometris Overview Acknowledgements Introduction Experimental designs Randomization, Replication,

More information

Title: A topological and conformational stability alphabet for multi-pass membrane proteins

Title: A topological and conformational stability alphabet for multi-pass membrane proteins Supplementary Information Title: A topological and conformational stability alphabet for multi-pass membrane proteins Authors: Feng, X. 1 & Barth, P. 1,2,3 Correspondences should be addressed to: P.B.

More information

RosettainCASP4:ProgressinAbInitioProteinStructure Prediction

RosettainCASP4:ProgressinAbInitioProteinStructure Prediction PROTEINS: Structure, Function, and Genetics Suppl 5:119 126 (2001) DOI 10.1002/prot.1170 RosettainCASP4:ProgressinAbInitioProteinStructure Prediction RichardBonneau, 1 JerryTsai, 1 IngoRuczinski, 1 DylanChivian,

More information

Protein structure. Wednesday, October 4, 2006

Protein structure. Wednesday, October 4, 2006 Protein structure Wednesday, October 4, 2006 Introduction to Bioinformatics Johns Hopkins School of Public Health 260.602.01 J. Pevsner pevsner@jhmi.edu Copyright notice Many of the images in this powerpoint

More information

Protein Domain Boundary Prediction from Residue Sequence Alone using Bayesian Neural Networks

Protein Domain Boundary Prediction from Residue Sequence Alone using Bayesian Neural Networks Protein Domain Boundary Prediction from Residue Sequence Alone using Bayesian s DAVID SACHEZ SPIROS H. COURELLIS Department of Computer Science Department of Computer Science California State University

More information

A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11

A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11 Li et al. BMC Bioinformatics (2015) 16:337 DOI 10.1186/s12859-015-0775-x RESEARCH ARTICLE A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment

More information

proteins Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11

proteins Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11 proteins STRUCTURE O FUNCTION O BIOINFORMATICS Structure prediction using sparse simulated NOE restraints with Rosetta in CASP11 Sergey Ovchinnikov, 1,2 Hahnbeom Park, 1,2 David E. Kim, 2,3 Yuan Liu, 1,2

More information

Protein 3D Structure Prediction

Protein 3D Structure Prediction Protein 3D Structure Prediction Michael Tress CNIO ?? MREYKLVVLGSGGVGKSALTVQFVQGIFVDE YDPTIEDSYRKQVEVDCQQCMLEILDTAGTE QFTAMRDLYMKNGQGFALVYSITAQSTFNDL QDLREQILRVKDTEDVPMILVGNKCDLEDER VVGKEQGQNLARQWCNCAFLESSAKSKINVN

More information

Textbook Reading Guidelines

Textbook Reading Guidelines Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum Last updated: January 16, 2013 Textbook Reading Guidelines Preface: Read the whole preface, and especially: For the students with Life Science

More information

Molecular design principles underlying β-strand swapping. in the adhesive dimerization of cadherins

Molecular design principles underlying β-strand swapping. in the adhesive dimerization of cadherins Supplementary information for: Molecular design principles underlying β-strand swapping in the adhesive dimerization of cadherins Jeremie Vendome 1,2,3,5, Shoshana Posy 1,2,3,5,6, Xiangshu Jin, 1,3 Fabiana

More information

NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation

NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation www.bioinformation.net Web server Volume 11(8) NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation Seethalakshmi Sakthivel, Habeeb S.K.M* Department of Bioinformatics,

More information

Simple jury predicts protein secondary structure best

Simple jury predicts protein secondary structure best Simple jury predicts protein secondary structure best Burkhard Rost 1,*, Pierre Baldi 2, Geoff Barton 3, James Cuff 4, Volker Eyrich 5,1, David Jones 6, Kevin Karplus 7, Ross King 8, Gianluca Pollastri

More information

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen

Homology Modelling. Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Homology Modelling Thomas Holberg Blicher NNF Center for Protein Research University of Copenhagen Why are Protein Structures so Interesting? They provide a detailed picture of interesting biological features,

More information

Critical Assessment of Methods of Protein Structure Prediction (CASP) Round 6

Critical Assessment of Methods of Protein Structure Prediction (CASP) Round 6 PROTEINS: Structure, Function, and Bioinformatics Suppl 7:3 7 (2005) Critical Assessment of Methods of Protein Structure Prediction (CASP) Round 6 John Moult, 1 * Krzysztof Fidelis, 2 Burkhard Rost, 4

More information

Contact Lens: Evaluation of Structure by Contacts

Contact Lens: Evaluation of Structure by Contacts Tim Dreszer Contact Lens: Evaluation of Structure by Contacts 1 Contact Lens: Evaluation of Structure by Contacts Abstract Rapid evaluation of the similarity of two structures is an essential tool in protein

More information

Ph.D. in Information and Computer Science (Area: Bioinformatics), University of California, Irvine, August, (Advisor: Dr.

Ph.D. in Information and Computer Science (Area: Bioinformatics), University of California, Irvine, August, (Advisor: Dr. Jianlin Cheng Assistant Professor School of Electrical Engineering and Computer Science University of Central Florida Orlando, FL 32816 Phone: (407) 968-9746 Email: jianlin.cheng@gmail.com Web: http://www.eecs.ucf.edu/~jcheng

More information

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions Joshua N. Burton 1, Andrew Adey 1, Rupali P. Patwardhan 1, Ruolan Qiu 1, Jacob O. Kitzman 1, Jay Shendure 1 1 Department

More information

Bioinformatics Practical Course. 80 Practical Hours

Bioinformatics Practical Course. 80 Practical Hours Bioinformatics Practical Course 80 Practical Hours Course Description: This course presents major ideas and techniques for auxiliary bioinformatics and the advanced applications. Points included incorporate

More information

Tutorial. Visualize Variants on Protein Structure. Sample to Insight. November 21, 2017

Tutorial. Visualize Variants on Protein Structure. Sample to Insight. November 21, 2017 Visualize Variants on Protein Structure November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information

SMISS: A protein function prediction server by integrating multiple sources

SMISS: A protein function prediction server by integrating multiple sources SMISS 1 SMISS: A protein function prediction server by integrating multiple sources Renzhi Cao 1, Zhaolong Zhong 1 1, 2, 3, *, and Jianlin Cheng 1 Department of Computer Science, University of Missouri,

More information

Distributions of Beta Sheets in Proteins with Application to Structure Prediction

Distributions of Beta Sheets in Proteins with Application to Structure Prediction Distributions of Beta Sheets in Proteins with Application to Structure Prediction Ingo Ruczinski Department of Biostatistics Johns Hopkins University Email: ingo@jhu.edu http://biostat.jhsph.edu/ iruczins

More information

proteins TASSER_low-zsc: An approach to improve structure prediction using low z-score ranked templates Shashi B. Pandit and Jeffrey Skolnick*

proteins TASSER_low-zsc: An approach to improve structure prediction using low z-score ranked templates Shashi B. Pandit and Jeffrey Skolnick* proteins STRUCTURE O FUNCTION O BIOINFORMATICS TASSER_low-zsc: An approach to improve structure prediction using low z-score ranked templates Shashi B. Pandit and Jeffrey Skolnick* Center for the Study

More information

THE LEAD PROFILE AND OTHER NON-PARAMETRIC TOOLS TO EVALUATE SURVEY SERIES AS LEADING INDICATORS

THE LEAD PROFILE AND OTHER NON-PARAMETRIC TOOLS TO EVALUATE SURVEY SERIES AS LEADING INDICATORS THE LEAD PROFILE AND OTHER NON-PARAMETRIC TOOLS TO EVALUATE SURVEY SERIES AS LEADING INDICATORS Anirvan Banerji New York 24th CIRET Conference Wellington, New Zealand March 17-20, 1999 Geoffrey H. Moore,

More information

Functional profiling of metagenomic short reads: How complex are complex microbial communities?

Functional profiling of metagenomic short reads: How complex are complex microbial communities? Functional profiling of metagenomic short reads: How complex are complex microbial communities? Rohita Sinha Senior Scientist (Bioinformatics), Viracor-Eurofins, Lee s summit, MO Understanding reality,

More information

Genetic Algorithm for Predicting Protein Folding in the 2D HP Model

Genetic Algorithm for Predicting Protein Folding in the 2D HP Model Genetic Algorithm for Predicting Protein Folding in the 2D HP Model A Parameter Tuning Case Study Eyal Halm Leiden Institute of Advanced Computer Science, University of Leiden Niels Bohrweg 1 2333 CA Leiden,

More information

Worked Example of Humanized Fab D3h44 in Complex with Tissue Factor

Worked Example of Humanized Fab D3h44 in Complex with Tissue Factor Worked Example of Humanized Fab D3h44 in Complex with Tissue Factor Here we provide an example worked in detail from antibody sequence and unbound antigen structure to a docked model of the antibody antigen

More information

Discovering Sequence-Structure Motifs from Protein Segments and Two Applications. T. Tang, J. Xu, and M. Li

Discovering Sequence-Structure Motifs from Protein Segments and Two Applications. T. Tang, J. Xu, and M. Li Discovering Sequence-Structure Motifs from Protein Segments and Two Applications T. Tang, J. Xu, and M. Li Pacific Symposium on Biocomputing 10:370-381(2005) DISCOVERING SEQUENCE-STRUCTURE MOTIFS FROM

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/19/07 CAP5510 1 HMM for Sequence Alignment 2/19/07 CAP5510 2

More information

Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions Introduction Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions Jamie Duke, Rochester Institute of Technology Mentor: Carlos Camacho, University

More information

Structural bioinformatics

Structural bioinformatics Structural bioinformatics Why structures? The representation of the molecules in 3D is more informative New properties of the molecules are revealed, which can not be detected by sequences Eran Eyal Plant

More information

In 1996, the genome of Saccharomyces cerevisiae was completed due to the work of

In 1996, the genome of Saccharomyces cerevisiae was completed due to the work of Summary: Kellis, M. et al. Nature 423,241-253. Background In 1996, the genome of Saccharomyces cerevisiae was completed due to the work of approximately 600 scientists world-wide. This group of researchers

More information

Nagahama Institute of Bio-Science and Technology. National Institute of Genetics and SOKENDAI Nagahama Institute of Bio-Science and Technology

Nagahama Institute of Bio-Science and Technology. National Institute of Genetics and SOKENDAI Nagahama Institute of Bio-Science and Technology A Large-scale Batch-learning Self-organizing Map for Function Prediction of Poorly-characterized Proteins Progressively Accumulating in Sequence Databases Project Representative Toshimichi Ikemura Authors

More information

Evolution and Similarity Evaluation of Protein Structures in Contact Map Space

Evolution and Similarity Evaluation of Protein Structures in Contact Map Space PROTEINS: Structure, Function, and Bioinformatics 59:196 204 (2005) Evolution and Similarity Evaluation of Protein Structures in Contact Map Space Nitin Gupta,* Nitin Mangal, and Somenath Biswas Department

More information

BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol. 24 no. 14 2008, pages 1575 1582 doi:10.1093/bioinformatics/btn248 Structural bioinformatics Using inferred residue contacts to distinguish between correct and incorrect

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Secondary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Secondary Structure Prediction CMPS 6630: Introduction to Computational Biology and Bioinformatics Secondary Structure Prediction Secondary Structure Annotation Given a macromolecular structure Identify the regions of secondary structure

More information

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski.

Chapter 8. One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model. Sebastian Kmiecik and Andrzej Kolinski. Chapter 8 One-Dimensional Structural Properties of Proteins in the Coarse-Grained CABS Model Abstract Despite the significant increase in computational power, molecular modeling of protein structure using

More information

FUNCTIONAL BIOINFORMATICS

FUNCTIONAL BIOINFORMATICS Molecular Biology-2018 1 FUNCTIONAL BIOINFORMATICS PREDICTING THE FUNCTION OF AN UNKNOWN PROTEIN Suppose you have found the amino acid sequence of an unknown protein and wish to find its potential function.

More information

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics

Bioinformatics & Protein Structural Analysis. Bioinformatics & Protein Structural Analysis. Learning Objective. Proteomics The molecular structures of proteins are complex and can be defined at various levels. These structures can also be predicted from their amino-acid sequences. Protein structure prediction is one of the

More information

Database Searching and BLAST Dannie Durand

Database Searching and BLAST Dannie Durand Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is

More information

Recapitulation of Protein Family Divergence using Flexible Backbone Protein Design

Recapitulation of Protein Family Divergence using Flexible Backbone Protein Design doi:10.1016/j.jmb.2004.11.062 J. Mol. Biol. (2005) 346, 631 644 Recapitulation of Protein Family Divergence using Flexible Backbone Protein Design Christopher T. Saunders 1 and David Baker 2 * 1 Department

More information

proteins PREDICTION REPORT Template-based and free modeling by RAPTOR11 in CASP8 Jinbo Xu,* Jian Peng, and Feng Zhao INTRODUCTION

proteins PREDICTION REPORT Template-based and free modeling by RAPTOR11 in CASP8 Jinbo Xu,* Jian Peng, and Feng Zhao INTRODUCTION proteins STRUCTURE O FUNCTION O BIOINFORMATICS PREDICTION REPORT Template-based and free modeling by RAPTOR11 in CASP8 Jinbo Xu,* Jian Peng, and Feng Zhao Toyota Technological Institute at Chicago, Illinois

More information

proteins Massive integration of diverse protein quality assessment methods to improve template

proteins Massive integration of diverse protein quality assessment methods to improve template proteins STRUCTURE O FUNCTION O BIOINFORMATICS Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11 Renzhi Cao, 1 Debswapna Bhattacharya, 1 Badri

More information

A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs

A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs A Hidden Markov Model for Identification of Helix-Turn-Helix Motifs CHANGHUI YAN and JING HU Department of Computer Science Utah State University Logan, UT 84341 USA cyan@cc.usu.edu http://www.cs.usu.edu/~cyan

More information

Learning Structured Preferences

Learning Structured Preferences Learning Structured Preferences Leon Bergen 1, Owain R. Evans, Joshua B. Tenenbaum {bergen, owain, jbt}@mit.edu Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge,

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

A Simple Agent for Supply Chain Management

A Simple Agent for Supply Chain Management A Simple Agent for Supply Chain Management Brian Farrell and Danny Loffredo December 7th, 2006 1 Introduction The Trading Agent Competition Supply Chain Management (TAC SCM) game is a competitive testbed

More information

An image and the image quality assessment will be useful for the readers to evaluate the data quality.

An image and the image quality assessment will be useful for the readers to evaluate the data quality. Reviewers' comments: Reviewer #1 (Remarks to the Author): MS2 bacteriophage is an important model system for nucleic acid biology and has given rise to critical RNA reagents. This structure represents

More information