Predicted residue-residue contacts can help the scoring of 3D models
Michael L. Tress* and Alfonso Valencia


PROTEINS: STRUCTURE, FUNCTION, AND BIOINFORMATICS

Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

ABSTRACT

During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue-residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment. Our aim was to determine whether predicted contacts might aid in the selection of close-to-native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue-residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue-residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT-TS scores than all but the best 3D prediction groups. Despite the well-known low accuracy of residue-residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies.
Proteins 2010; 78:
© 2010 Wiley-Liss, Inc.

Key words: protein structure; structural prediction; residue-residue contacts.

INTRODUCTION

Over the past 20 years, many different methods have been developed for the prediction of residue-residue contacts in proteins. These methods usually take one of three approaches:
- extracting correlated mutations from pairs of columns in a multiple alignment [1-5];
- training machine learning methods on contact maps from real structures [6-10];
- some combination of both [11,12].

In addition, methods that make contact predictions based on the contact maps of predicted models were introduced in the residue-residue contact prediction section of the most recent Critical Assessment of Protein Structure Prediction (CASP) experiment. These predictions were possible because the CASP organizers made the server-predicted model structures available to predictors as part of the CASP8 structure prediction experiment.

The prediction of residue-residue contacts has been part of CASP since CASP2 [13]. However, contact prediction in its present form has only been an integral part of the experiment since CASP4 [15], and head-to-head evaluation of participating groups has only been possible since CASP5 [16]. In the early CASP experiments, very few groups participated in the contact prediction category. Since CASP7, however, there has been renewed interest, as shown by the number of groups that have published new work on contact prediction since the last experiment.

The prediction of intramolecular residue-residue contacts has traditionally been viewed as a possible source of information for protein structure prediction. Even so, predicted contacts are rarely used directly by structure prediction programs [28]. Contact information has been incorporated in the form of contact potentials in fold recognition methods [29,30]. It has been suggested that predicted contacts might be used as restraints in NMR distance geometry and simulation techniques [31]. It has also been suggested that predicting just a few important residue-residue contacts with sufficient accuracy might be enough to infer approximate 3D model structures directly for many small proteins [32]. Contact predictions used in this way would be most valuable for target structures that would need to be modeled de novo.
However, the CASP experiments routinely report an accuracy of approximately 20% for the free modeling targets. This would suggest that residue-residue contact predictions are not yet accurate enough to be used in ab initio structure prediction.

Additional Supporting Information may be found in the online version of this article.
Grant sponsor: Consolider, E-Science; Grant number: CSD
Grant sponsor: ENCODE Project; Grant number: U54 HG
*Correspondence to: Michael Tress, Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), c./ Melchor Fernandez Almagro, Madrid, Spain. E-mail: mtress@cnio.es
Received 28 October 2009; Revised 3 February 2010; Accepted 13 February 2010
Published online 12 March 2010 in Wiley InterScience. DOI: /prot
PROTEINS © 2010 WILEY-LISS, INC.

The CASP7 contact prediction assessors [18], among others, suggested one further use of predicted contacts. Reliably predicted intramolecular contacts might be used to select among a range of alternative model structures. At the least, they could limit conformational searches, either through postprocessing or directly as part of the prediction strategy. At the CASP8 meeting in Cagliari in 2008, it was also suggested that even these less reliable residue-residue contact predictions may be useful enough to choose between a range of de novo modeled loop regions [33].

In fact, the idea that predicted contacts might discriminate between predicted models has a long history [34]. This applies to sequence-based contact predictions in general, not only to contact predictors as such. The CASP3 assessors [14] were the first to test the theory that predicted contacts might be useful in scoring models. In a simple test, they threaded the target sequence onto the target structure itself and scored the threaded targets with the predicted contacts. They recorded a degree of success in selecting the correctly threaded target for fold recognition targets. More recently, several groups have carried out self-assessments that seem to suggest that predicted contacts might help select better scoring models from a range of decoys [35,36].

Here, we have revisited the CASP8 contact predictions. We used them to score the models predicted by the structure prediction groups and to score the targets themselves. The results show conclusively that information from predicted contacts can be useful in scoring models in the free modeling and difficult comparative modeling regimes. They also indicate that predicted contacts might be of some use in selecting short ab initio modeled regions.

METHODS

Contact predictors

In CASP8, 22 groups made predictions for the contact prediction category [19]. Predicting groups in CASP submit lists of residue pairs that are predicted to be in contact at 8 Å, along with a probability estimate that each pair is in contact.

The prediction format is flexible, and this flexibility lets predicting groups use a wide range of different strategies. Predictors submit different numbers of predictions and use different distance cut-offs. The density of the predicted contacts also varies between groups: some groups avoid predicting adjacent contacts, whereas others predict all possible contacts. As a result, predictions from these groups are not easily comparable.

In CASP, the assessors get around this problem in two ways. They use the predictor reliability score to rank the predictions of each group, and they limit the number of predictions assessed for each target. The length of the target or target domain sequence is usually used to ensure that predictors are assessed over a fixed number of predicted pairs. In CASP8, the assessors used the length of the target domain sequence as a fixed reference point to allow comparisons between groups. They limited the predictions of each group to the top-ranked L/5 or L/10 predictions, where L was the length of the target domain as defined by the assessors. In this study, we also took the first L/10 predictions as a reference point, but we additionally scored models using all the predicted contacts submitted by each group.

The length cut-offs used by the CASP contact prediction assessors make comparison between groups more reliable. Even so, the groups follow different strategies, so it is still difficult to compare them in a completely fair manner. This is especially true when predictors are assessed over all predicted contacts.
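The sketch below illustrates one way to assemble a contact set for a single predictor and target domain. It ranks the submitted pairs by the predictor's own probability and applies a minimum sequence separation; this study uses 6, 12, and 24 residues. It then keeps either the top L/10 pairs or all of them.

This is a minimal sketch under stated assumptions, not the code used in the study:
- The `(i, j, probability)` tuple layout and the name `select_contacts` are hypothetical.
- Applying the separation filter before the L/10 cut-off is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical layout of one predicted pair: (residue i, residue j, probability).
Contact = Tuple[int, int, float]


def select_contacts(predictions: List[Contact], domain_length: int,
                    min_separation: int = 12,
                    fraction: Optional[int] = 10) -> List[Tuple[int, int]]:
    """Return the top-ranked predicted pairs for one target domain.

    predictions    -- pairs submitted by one predictor (assumed layout)
    domain_length  -- L, the length of the assessor-defined target domain
    min_separation -- minimum sequence separation |i - j| to keep a pair
    fraction       -- keep the top L // fraction pairs; None keeps every pair
    """
    # Discard pairs that are too close together in sequence.
    kept = [p for p in predictions if abs(p[0] - p[1]) >= min_separation]
    # Rank by the predictor's reliability score, highest first.
    kept.sort(key=lambda p: p[2], reverse=True)
    if fraction is not None:
        kept = kept[: domain_length // fraction]
    return [(i, j) for i, j, _ in kept]
```

With `fraction=10` and a 120-residue domain, at most 12 pairs survive. Passing `fraction=None` corresponds to the "all predicted contacts" setting.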
Because the groups cannot be compared in a completely fair manner, we have chosen not to identify the CASP predicting groups in this study. We are also studying the utility of predicted contacts in general rather than the effectiveness of individual groups. However, in the interest of furthering study in this field, we will identify them on request.

The CASP8 contact assessment concentrated on a limited number of sequence separation ranges. We also looked at a limited number of ranges, but we were interested in the usefulness of short-range contacts as well as long-range contacts. We therefore looked at predicted contacts at three minimum sequence separations: at least 6, at least 12, and at least 24 residues. In theory, predicted contacts separated by 24 or more residues are much more valuable as structural constraints in 3D structure prediction. However, the pool of available contact information is much larger if predicted short-range contacts are also taken into consideration. The predictions for the CASP8 contact prediction experiment used in this article are available on the CASP web site [37].

Target selection

In CASP8, contact predictors were only assessed over a set of free modeling target domains. If predicted contacts are to be used as part of the structure prediction process, they are most useful for targets that are not built from homologous template structures. Predicting contacts for targets with known templates was felt to be relatively trivial. To assess the use of predicted contacts in the scoring of model structures, we looked at the contact predictions for all types of targets.

Evaluations were made over the CASP8 assessor-defined domains [38]. In CASP, targets are often split into multiple domains, and these domains can be assigned to different categories. As in the CASP8 analysis, only the residues in the official CASP8 target domain definitions for each target were considered. There were two reasons for this. First, GDT-TS [39] scores for models of multidomain targets are too noisy because domain orientation is unpredictable. Second, domains in multidomain targets often belong to different prediction categories.

We divided the CASP8 target domains into three groups:
- Free modeling targets: the 13 free modeling and template-based modeling/free modeling overlap target domains from the CASP8 structure prediction evaluation.
- Hard comparative modeling targets: 27 CASP8 target domains for which the average of the best AL0 alignment scores for all models was less than 65.
- Easy comparative modeling targets: the remaining 124 CASP8 target domains.

We also examined the possible use of contact predictions in selecting loops for template-based modeling target domains that we felt could only have been built ab initio. Loops defined as ab initio had at least seven residues that were 4 Å or more away in all close templates. We checked the closest 20 templates according to LGA [40]. Each loop superposition was checked by eye and extended if there were further residues clearly without a reasonable template.

Assessment

We used the predicted contacts from each CASP8 residue-residue contact prediction group to score the 3D models predicted by the CASP8 structure prediction groups. For each predictor, we first generated sets of predicted contacts for each target domain, based on the range of sequence separations and distance cut-offs used in the experiment. These sets of predicted contacts were then used to score the predicted 3D models for each target domain, as well as the target domain itself. For each structure, we simply calculated the sum of the distances between each of the predicted pairs in the contact set.

We used the sum of the distances to make two comparisons. The first measured how good the predicted pairs were at recognizing the target structure.
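The sketch below shows the scoring step. It sums the distances of the predicted pairs in one structure. For the first comparison, it then expresses the native structure's position among the models as a normalized rank.

Two details are assumptions, because the text does not state them:
- Which atom represents each residue. Cβ, with Cα for glycine, is a common choice for 8 Å contacts.
- Exactly how ranks were normalized. Here the normalized rank is the fraction of models whose sum is worse than the native's, which gives about 0.5 for a random ranking.

The function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_distance_sum(coords: Dict[int, Point],
                         contacts: Sequence[Tuple[int, int]]) -> float:
    """Sum of the distances between the two residues of every predicted pair.

    coords maps residue number -> one representative atom (hypothetical layout).
    A lower sum means the structure brings the predicted pairs closer together.
    """
    return sum(math.dist(coords[i], coords[j]) for i, j in contacts)


def normalized_native_rank(native_sum: float, model_sums: Sequence[float]) -> float:
    """Fraction of models whose distance sum is higher (worse) than the native's.

    1.0 means the native outscores every model; about 0.5 is expected by chance.
    """
    if not model_sums:
        raise ValueError("at least one model is required")
    worse = sum(1 for s in model_sums if s > native_sum)
    return worse / len(model_sums)
```

A structure that lacks one of the residues would raise a `KeyError` here. This is why models missing predicted residues have to be removed before scoring.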
Selecting the native structures should show the real predictive value of the contacts that the predictors are generating.

For the second comparison, we left out the real structure and concentrated solely on the predicted 3D models. We wanted to know how the contact predictors would have performed if they had been used as naïve model scoring methods in CASP8. Here, we used the sum of distances generated for each 3D model to select a single 3D prediction for each target domain. This assessment is closer to the way that contact predictors might be used to score models. It also allows a comparison with the structure prediction servers and quality assessment predictors that participated in CASP8.

Selecting the model with the lowest sum of the distances of the predicted pairs is not a very sophisticated means of model selection, and predictors could almost certainly build better systems. However, even this simple tactic worked well in this experiment.

To make the comparisons, we included all predicted 3D models for each target, not just model 1. However, we excluded physically unrealistic models from the evaluation. A model was eliminated from the assessment a priori if either of the following held:
- it had Cβ-Cβ distances of less than 3.2 Å;
- it had consecutive residues with Cα-Cα distances greater than 4.25 Å.

3D models that covered only a fraction of the predicted contacts for the target domain were also excluded. If a 3D model does not contain a residue predicted to be in contact, that pair contributes zero to the sum of the distances. The GDT-TS scores of the model structures submitted to CASP were taken from the CASP8 web pages [41].

RESULTS

Using predicted contacts to select the native target structures

We first investigated whether the predicted contacts could be used to select the native structure from among the decoys in the free modeling regime.
Eleven predictors managed to place the native structure among the top three scoring structures for at least five of the 13 free modeling targets. Using all the contacts and a minimum sequence separation of 12 residues, eight of the prediction groups would have been able to identify the native structures of T0443-D2 and T0496. At the other end of the scale, four target domains (T0476, T0482, T0443-D1, and T0405-D1) completely resisted selection using contacts. Three of these latter targets were mostly alpha-helical.

For each predictor, we calculated the mean rank of the native structure when chosen from among the 3D models. The rank of the target structure was normalized over the number of models from which the selection was made. The results can be seen in Figure 1(A). All predictors have a mean normalized rank well over 0.5, the score they would expect if the ranking of the native structure were entirely random. Most predictors perform better at selecting the native structure when all predicted contacts, rather than the top L/10, are used.

One caveat to these results is that there are very few free modeling targets. However, similar results are obtained when predicted contacts are used to select the native structures in the hard comparative modeling section, which has 26 targets. Figure 1(B) shows the mean normalized ranking of the native structure for the harder comparative modeling targets. Again, all predictors have a ranking well over 0.5, and many of the better predictors have mean normalized rankings of more than 0.8. Once again, most predictors are better at selecting the native structure when all predicted contacts are used.

Figure 1. Normalized ranks for native structure selection. The mean normalized rank of the native structure after scoring with predicted contacts. (A) Results for the 13 free modeling targets; (B) results for the 26 hard comparative modeling targets. Results are compared using all predicted contacts and a subset of the top-ranked L/10 predicted contacts. In this experiment, the minimum sequence separation was 24 residues. Note that most groups are better at selecting the target when all predicted contacts are used to score models.

Figure 2. Normalized ranks for native structure selection varying the sequence separation. The mean normalized rank of the native structure after scoring with predicted contacts. Results are compared using the hard comparative modeling subset (26 target domains) and three different sequence separations: a minimum of six residues, a minimum of 12 residues, and a minimum of 24 residues. Comparisons are made with all predicted contacts. Similar results can be seen with the free modeling subset.

We then scored the models with all the predicted contacts and a minimum sequence separation of 12 residues. Under these settings, more than half of the contact prediction groups were able to recognize the native structures of target domains T0429-D2, T0430-D2, T0501-D2, T0504-D2, and T0510-D1. The contacts predicted by one of the groups (CP01) for target domain T0429-D2 illustrate the effectiveness of the predicted contacts (Supporting Information Fig. 1). The predictor CP01 predicts many residue-residue contacts (419 in total for target T0429-D2), and most of them are false positives.
However, the predicted contacts are exclusively in the core of the protein: no contacts are predicted for residues in the loops that extend from the globular structure. T0429-D2 was one of the hardest template-based targets. Only one tertiary structure prediction group managed to generate a model with over 50% of the alignment correct.

We used the hard comparative modeling target subset to assess the effect of limiting the sequence separation of the predicted contacts. As Figure 2 shows, limiting the sequence separation has little influence on the power of the predicted contacts to separate the native structure from the predictions. Likewise, there was no clear pattern with the free modeling targets (results not shown).

Predicted residue-residue contacts help to select the native structure in the free modeling and hard template-based modeling regimes, but they work less well with the easier template-based modeling targets. Figure 3 compares the mean normalized ranks of the native structures across the three target subsets. The predicted contacts carry discriminatory information when the evolutionary divergence between targets and potential structural templates is greatest, that is, in the free modeling and hard comparative modeling regimes. They have almost no discriminatory power when the native structure is similar to the predicted 3D models.

Figure 3. Normalized ranks for native structure selection in three different prediction regimes. The mean normalized rank of the native structure after scoring with predicted contacts. Results are compared using the three different subsets: free modeling, and hard and easy comparative modeling. Comparisons are made with all predicted contacts and a minimum sequence separation of 12 residues. Only groups with predictions for at least eight targets in the free modeling category are shown. Most contact prediction methods perform close to random at selecting the native structure in the easy comparative modeling regime.

Model scoring

Recognition of the target structure shows that residue-residue contact prediction methods can generate useful information for discriminating real structures from remotely similar 3D models. However, this is not yet a real-world application. This is particularly true in the free modeling and hard comparative modeling regimes, where the best predicted models are not necessarily highly similar to the native structure.

In this second experiment, we wanted to see whether the predicted contacts could help score and select predicted model structures. Each contact predictor was treated as if it were a structure prediction method. We took all the models for each CASP8 target domain and excluded three kinds of model:
- those that were not feasible, as detailed in Methods;
- those that were identical to other models;
- those that were missing the residues predicted to be in contact.

For the remaining models, we calculated the sum of distances between the predicted pairs from each contact prediction method. The model with the lowest sum of distances was chosen as the contact server's prediction for that target domain. This is one way that contact predictors might be used to select good structural models. It also allows a comparison with the structure prediction servers that participated in CASP8.
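The sketch below shows the feasibility filter and the lowest-sum selection applied to each contact set. The 3.2 Å Cβ-Cβ clash limit and the 4.25 Å Cα-Cα chain-break limit are the values given in Methods. The function names and the dictionary layout of the coordinates are hypothetical.

Several simplifications apply:
- Removal of duplicate models is not shown.
- Glycine handling for the Cβ check is left to the caller, for example by supplying a virtual Cβ.
- A model is skipped if it lacks any residue of a predicted pair, which is a simplified reading of the coverage rule.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Coords = Dict[int, Point]  # residue number -> coordinates (hypothetical layout)


def is_feasible(ca: Coords, cb: Coords,
                min_cb_distance: float = 3.2,
                max_ca_step: float = 4.25) -> bool:
    """Reject physically unrealistic models using the two checks from Methods."""
    # Any two residues whose C-beta atoms are closer than 3.2 A count as a clash.
    residues = sorted(cb)
    for a, res_a in enumerate(residues):
        for res_b in residues[a + 1:]:
            if math.dist(cb[res_a], cb[res_b]) < min_cb_distance:
                return False
    # Consecutive residues whose C-alpha atoms are more than 4.25 A apart
    # indicate a chain break.
    for res in ca:
        nxt = res + 1
        if nxt in ca and math.dist(ca[res], ca[nxt]) > max_ca_step:
            return False
    return True


def select_model(models: Dict[str, Coords],
                 contacts: Sequence[Tuple[int, int]]) -> Optional[str]:
    """Return the name of the model with the lowest sum of contact distances.

    Models are assumed to have passed is_feasible() already. Models lacking any
    residue of a predicted pair are skipped, because a missing pair would add
    nothing to the sum and unfairly favor incomplete models.
    """
    best_name: Optional[str] = None
    best_sum = math.inf
    for name, coords in models.items():
        if any(i not in coords or j not in coords for i, j in contacts):
            continue
        total = sum(math.dist(coords[i], coords[j]) for i, j in contacts)
        if total < best_sum:
            best_name, best_sum = name, total
    return best_name
```

Ties are resolved in favor of the first model encountered. A more careful system might break ties with a second criterion.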
The comparison between the CASP8 server predictions and the contact predictors is not completely fair, in part because the contact predictors select from a different pool of models. They have access to all the models submitted to CASP8, including human group predictions that only became available after the experiment. However, the aim of this article is not to show whether contact prediction servers are better than 3D prediction servers. It is to determine whether contact predictions have any use in 3D model prediction strategies.

Figure 4(A) shows the results for the free modeling section. Predictors are markedly better at selecting models when all predicted contacts are used. Contacts predicted at a sequence separation greater than 11 are also better at selecting high-scoring models than those predicted at separations greater than 23. However, the free modeling results should be viewed with some skepticism. GDT-TS is a less effective measure of prediction accuracy in the free modeling regime, and there were very few targets in this subset.

Figure 4. Mean GDT scores for models selected by contact predictors. The mean GDT-TS scores of the models selected as number one by the contact prediction methods. (A) Results for the 13 free modeling targets; (B) results for the 26 hard comparative modeling targets. Results are compared using all predicted contacts and a subset of the top-ranked L/10 predicted contacts, at two different sequence separations: a minimum of 12 residues and a minimum of 24 residues. Most predictors select models with slightly higher GDT-TS scores at a minimum sequence separation of 12 and using all predicted contacts.

The subset of harder comparative modeling targets gives results similar to the free modeling subset [Fig. 4(B)], although here the effect of limiting the sequence separation of the contacts is less pronounced. We also compared predictions made with minimum sequence separations of 6, 12, and 24. The effect of restricting the sequence separation is minimal and prediction-method specific (Supporting Information Fig. 2). As in the first experiment, we compared how well each predictor performed in each of the three regimes. The results (Supporting Information Fig. 3) mirror the equivalent comparison in the native structure selection experiment. Contact predictors rank the top scoring models much higher over the harder comparative modeling targets and the free modeling targets than over the easy comparative modeling targets.

We also asked how well the contact predictors would have performed in CASP8 if they had been able to select model structures based solely on the predicted contacts. To answer this, we evaluated the selections made by the contact predictors against the predictions made by the structure prediction groups in CASP8. CASP has traditionally measured comparative structure prediction performance using the mean of the Z-scores calculated from the GDT-TS scores for each target.

Structure prediction groups and contact prediction groups predict over different numbers of targets, and each target domain has a different degree of difficulty. We therefore compared the contact predictors and structure prediction servers in the same way as CASP. For each target, Z-scores were calculated from the GDT-TS scores of the first models of the structure prediction groups together with the first selected models of the contact prediction groups. We then calculated the mean Z-scores over all the predicted targets. In CASP, negative Z-scores are often set to 0; we did not do that in this comparison.

The results can be seen in Figure 5. We calculated the Z-scores over all the predictors, but for clarity the figures show only the best official structure prediction servers and the best contact prediction groups. The models selected by the contact prediction groups are clearly competitive with the best servers in both the free modeling [Fig. 5(A)] and hard comparative modeling [Fig. 5(B)] categories. The lack of targets in the free modeling category makes reliable statistical comparisons between groups almost impossible. However, the results from the free modeling category agree with those for the hardest targets. This suggests that the predictive power of residue-residue contacts for the hard-to-predict target domains is real. By contrast, only two contact prediction groups scored well against the best structure prediction servers in the easy comparative modeling section [Fig. 5(C)].
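The Z-score comparison can be sketched as follows. The sketch assumes a table that holds, for every target, the GDT-TS score of each server's first model and each contact predictor's selected model. The function name is hypothetical.

Two choices are specific to this sketch:
- It uses the population standard deviation.
- It assigns a Z-score of 0 on targets where every method scores identically.

Any outlier handling that CASP assessors may apply is not reproduced. As described above, negative Z-scores are not clipped to 0.

```python
from statistics import mean, pstdev
from typing import Dict, List


def mean_z_scores(gdt_ts: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Mean per-target Z-score of GDT-TS for each method.

    gdt_ts maps target -> {method: GDT-TS of the method's first (or selected)
    model}. Structure prediction servers and contact predictors are pooled in
    the same table. Negative Z-scores are kept rather than set to 0.
    """
    per_method: Dict[str, List[float]] = {}
    for scores in gdt_ts.values():
        values = list(scores.values())
        mu = mean(values)
        sigma = pstdev(values)
        for method, value in scores.items():
            # If every method scored the same on a target, it carries no signal.
            z = (value - mu) / sigma if sigma > 0 else 0.0
            per_method.setdefault(method, []).append(z)
    # Each method is averaged only over the targets it actually predicted.
    return {method: mean(zs) for method, zs in per_method.items()}
```

Averaging over each method's own targets is what allows methods that predicted different numbers of targets to be placed on one scale.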
These two methods both predict contacts using contact maps taken directly from predicted models a somewhat circular process. CP10 is the best performing contact predictor in the hard template-based modeling section too, better even than the best structure prediction server, suggesting that consensus contacts from predicted models could be a powerful tool for selecting models in the hard template-based modeling regime. Curiously though, CP10 performs worst of all the methods when attempting to select the native structure, suggesting that methods based on model contact maps may be learning useful consensus-based features of the models rather than native-like structural features. CP17, the best performing contact predictor in the easy comparative modeling regime, almost certainly take their contacts directly or indirectly from the models generated by one of the best performing structure prediction servers in the easy comparative modeling regime. There were a number of outstanding examples of model scoring with contacts, despite the low accuracy of the predicted contacts. For example, in the free modeling Figure 5 Comparison between contact predictors and CASP8 automatic servers. Contact predictors are compared with the best of the automatic servers in CASP8. The comparison is made over the mean Z-scores of the GDT-TS scores of the top-ranked model from each method. Contact predictors (CP) in purple, servers (TS) in light green. In (A) the results for the 13 free modeling targets, in (B) the results from the 26 hard comparative modeling targets, and in (C) the results from the 118 easy comparative modeling targets. Contact predictor selections are made with all predicted contacts and a minimum of 12 residues sequence separation. Contact predictors are comparable to the best servers at selecting models in the free modeling and hard comparative modeling regions. 
In the easy comparative modeling regime, the automatic servers are clearly better than all but the contact predictors that take their contact maps from the predicted models in CASP. The comparison between automatic servers and contact predictors cannot be totally fair, but it does demonstrate the efficacy of the contact prediction servers in the free modeling and hard regimes and the poor results over the easier comparative modeling targets. [Color figure can be viewed in the online issue, which is available at PROTEINS 1985

Figure 6. Outstanding examples of model selection using predicted contact distances. Superpositions of the models selected using the predicted contacts and the native structures of the target domains. Models were scored using all the contacts and with a minimum sequence separation of 12 residues. A: The second-ranked model for target domain T0416-D2 (in red, selected by 10 predictors) and the native structure (yellow). B: One of the outstanding models for target domain T0513-D2 (in red, selected by four predictors) and the native structure (yellow). C: The best model for target domain T0429-D2 (in red, selected by 10 predictors) and the native structure (yellow). D: The best model for target domain T0487-D4 (in red, selected by four predictors) and the native structure (yellow).

section, a total of 10 predictors would have selected a model that was second only to the number-one-ranked MULTICOM⁴² model for target domain T0416-D2 [Fig. 6(A)], using all the contacts and a minimum sequence separation of 12 residues. Four contact prediction groups (CP00, CP07, CP08, and CP12) would have selected one of the standout models for target domain T0513-D2 [Fig. 6(B)]. There were also outstanding examples in the hard comparative modeling regime, such as target domain T0429-D2, where 10 contact prediction groups would have selected the top-ranked model [the MUSTER server,⁴³ Fig. 6(C)] and three other groups would have selected models with a higher GDT-TS than that of the second-ranked predictor. Similarly, nine contact prediction groups would have selected (different) models with GDT-TS scores comparable to those of the top four predictors for target domain T0457-D2 (BAKER,⁴⁴ the PSI server, IBT-LT,⁴⁵ and Zhang³³), whereas for target domain T0487-D4 four contact groups (CP00, CP04, CP12, and CP13) would have selected the best-scoring human model [from BAKER, Fig. 6(D)]. Three other groups would have selected slightly inferior models that still scored 8 to 12 GDT-TS points more than the second-ranked human predictor for this target domain and a full 25 GDT-TS points more than the best number-one-ranked server model (Fig. 7). The contacts predicted by group CP08 for target domain T0407-D2 would have selected the second-highest-ranked model under our crude model scoring system, a model that was within three GDT-TS points of the IBT-LT prediction and five GDT-TS points better than the second-best human prediction. Comparing the selected model with the best number-one-ranked server model, from the PHYRE server,⁴⁶ shows that the two models differ most at the terminal ends of the protein. The N-terminal strand of the BAKER model selected by CP08 is positioned more or less correctly, whereas these residues do not form a strand in the PHYRE model (Fig. 7). A strand is also misaligned in the second example in Figure 7, although here there are also substantial differences in one of the loop regions between the best comparable server model, from Pcons,⁴⁷ and the model selected using the contacts predicted by the four contact groups (CP00, CP04, CP12, and CP13).

Figure 7. Best contact predictor models and server models. Native-model superpositions for two targets where contact predictors performed substantially better than the best server predictor. All structures are colored blue to red along their sequence. A: Superposition of the model selected using the contacts predicted by group CP08 and the native structure of target domain T0407-D2. B: Superposition of the top-ranked PHYRE model and the native structure of target domain T0407-D2. The N-terminal (blue) and C-terminal (red) regions are predicted differently in the two models. C: Superposition of the model selected using the contacts predicted by four contact groups (CP00, CP04, CP12, and CP13) and the native structure of target domain T0487-D4. D: Superposition of the top-ranked Pcons model and the native structure of target domain T0487-D4. The greatest differences between the two models are that the second strand is displaced and that the loose loop conformation between the second and third strands is entirely different.

Ab initio predicted regions

We also used predicted residue-residue contacts to score regions of structures that did not have template structures. These regions were either loops or terminal regions that could only have been modeled ab initio. All but one of these regions were part of assessed template-based target domains in CASP8. There were 36 regions from CASP8 target domains, ranging in size from seven residues (target domain T0510, residues 136 to 142) to 39 residues (target domain T0419, residues 256 to 294).
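Both the whole-domain selections described above and the region test introduced here come down to scoring a set of coordinates against a list of predicted contacts and then asking where the native structure (or the chosen model) falls in the resulting ranking. The sketch below is not the article's scoring scheme, whose exact definition lies outside this excerpt. It assumes, hypothetically, that a model's score is the mean distance between the residue pairs predicted to be in contact (lower is better), that pairs closer than a minimum sequence separation are ignored, and that a region is scored using only the predicted contacts with at least one residue inside it. The helper names `mean_contact_distance` and `normalized_rank` are illustrative; the normalized rank is defined here so that 1.0 means the candidate beats every alternative and 0.5 is the random expectation, which is how the values in this section are read.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]  # pair of residue numbers predicted to be in contact


def mean_contact_distance(
    coords: Dict[int, Coord],
    contacts: Sequence[Contact],
    min_sep: int = 12,
    region: Optional[Tuple[int, int]] = None,
) -> float:
    """Hypothetical score: mean model distance over predicted contact pairs.

    Lower values mean the model agrees better with the predictions.
    Pairs closer than `min_sep` in sequence are skipped; if `region` is
    given, only pairs with at least one residue inside it are used.
    """
    distances = []
    for i, j in contacts:
        if abs(i - j) < min_sep:
            continue  # short-range pair: below the separation cut-off
        if region is not None:
            start, end = region
            if not (start <= i <= end or start <= j <= end):
                continue  # pair does not touch the region being scored
        if i in coords and j in coords:  # ignore residues missing from the model
            distances.append(math.dist(coords[i], coords[j]))
    return sum(distances) / len(distances) if distances else math.inf


def normalized_rank(score: float, other_scores: Sequence[float]) -> float:
    """Fraction of alternatives that `score` beats (lower is better, ties count half).

    1.0 = ranked first, 0.0 = ranked last, about 0.5 expected by chance.
    """
    if not other_scores:
        return 1.0
    beaten = sum(1 for s in other_scores if score < s)
    ties = sum(1 for s in other_scores if score == s)
    return (beaten + 0.5 * ties) / len(other_scores)


# Toy example with made-up coordinates (not CASP data).
predicted = [(1, 16), (4, 21), (2, 3)]  # (2, 3) is dropped by min_sep=12
native = {1: (0.0, 0.0, 0.0), 16: (5.0, 0.0, 0.0), 4: (0.0, 0.0, 0.0), 21: (0.0, 6.0, 0.0)}
decoy = {1: (0.0, 0.0, 0.0), 16: (20.0, 0.0, 0.0), 4: (0.0, 0.0, 0.0), 21: (0.0, 10.0, 0.0)}

native_score = mean_contact_distance(native, predicted)  # (5 + 6) / 2 = 5.5
decoy_score = mean_contact_distance(decoy, predicted)    # (20 + 10) / 2 = 15.0
print(native_score, decoy_score, normalized_rank(native_score, [decoy_score]))  # 5.5 15.0 1.0
```

In this toy case the native structure places its predicted partners closer together than the decoy does, so it is ranked first. Counting the predicted contacts that fall under a distance threshold would be an equally plausible scoring choice; a mean distance has the property of still separating models when none of them satisfies a given contact.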
In addition, we included residues 235 to 304 from target T0395, which were excluded from the CASP8 target definition. In Figure 8, we show how successful the contact predictors were at differentiating the native loop from the modeled regions. The better predictors have a mean normalized rank of native selection of 0.7. This is not as impressive as for the free modeling targets and the hard template-based modeling targets, but it shows that the contacts have discriminatory power here too. Although we have not identified the predictors, there is a clear difference between those methods that base their contacts on information that can be gleaned from the target sequence (on the left) and those that take their predicted contacts from the contact maps of predicted models (poor-scoring predictors on the right). Predictions taken from model contact maps are not good at recognizing native loop conformations, and predictor CP10 performs markedly worse than random on this test. This is in part because structure prediction servers generally perform poorly when modeling these ab initio regions.

Twenty-four of the regions that would have had to be modeled ab initio were mainly helical, four formed strands, and eight were mainly loop. Although it was just a small sample, contact predictors appeared to perform better at selecting native ab initio regions when the region was part of a beta sheet and less well when it formed a loop (Supporting Information Fig. 4).

The final comparison measures how effective the contact prediction servers would have been at selecting regions that have to be modeled ab initio. Here, we looked at the same 37 regions as in the first experiment, but only considered the modeled ab initio regions in those models that reached a certain overall quality. As a cut-off, we kept only models with GDT-TS scores of at least 60. After removing low-quality models, we were left with models for just 18 ab initio regions. Once again, for each predictor we calculated the mean normalized ranking of the selected loops. The results can be seen in Figure 8(B). On the whole, the predictors perform better when all contacts are used to score the loops. Two contact predictors stand out, CP02 and CP08, with a mean normalized ranking greater than 0.7 (comparable to the selective power seen in the best structure predictors; see Fig. 4), and many servers have scores between 0.6 and 0.7, so their selection of the ab initio regions was somewhat better than random selection (0.5). Once again, the predictors that take their contacts from the contact maps of predicted 3D models were among the worst-performing groups in this section, another indication of the poor quality of the CASP8 structure prediction servers when predicting ab initio regions in template-based models.

Figure 8. Normalized rankings for selecting native loops and for selected model loops. (A) The mean normalized rank of the native loop after scoring with predicted contacts. (B) The mean normalized GDT-TS rank of the selected model after scoring with predicted contacts. Results are compared using all predicted contacts and a subset of the top-ranked L/2 predicted contacts. In this experiment, the minimum sequence separation was eight residues. Predictors were only compared if they predicted loops for at least 10 of the targets, and comparisons in (B) were only made for those models that reached a GDT-TS score of at least 60.

A comparison with quality assessment methods

We have used predicted contacts to score predicted 3D models. In principle, contact predictors could estimate the overall quality of 3D models with a slight modification of their methods. CASP has included a quality assessment category since CASP7, and we have been able to use the predictions and results from the CASP quality assessment experiment⁴⁸ to compare the performance of contact predictors with that of the quality assessment groups. The comparison with the quality assessment predictors presented a number of difficulties. The quality assessment category was carried out with whole targets, whereas the contact prediction scoring in this article used target domains.
The quality assessment predictors had access only to the CASP server models, whereas the contact predictors were allowed to select models from both human and server groups. We also restricted the comparison to the target domains that we defined as hard comparative modeling in this article because, as we have already shown, predicted contacts in the CASP format are less useful for scoring easy comparative modeling targets. These difficulties meant that the comparison had to be carried out over a reduced set of target domains: the nine targets from the hard comparative modeling section that were not split into domains by the CASP assessors. Comparisons were made only over the server models.

Figure 9. Mean GDT-TS for the top models chosen by contact predictors and quality assessment groups over the hard comparative modeling subset. The mean GDT-TS score for the models selected as the best model by CASP8 contact prediction groups and quality assessment groups. The means were calculated over a subset of nine hard comparative modeling target domains, as described in the main text. Quality assessment group GDT-TS scores are shown in blue and contact predictors in orange. Only the top 33 of the 63 combined quality assessment and contact prediction groups are shown. Models selected by the contact predictors were based on distances calculated from all predicted contacts at a minimum sequence separation of 12 residues. The dark blue bars represent consensus-based quality assessment predictors; the light blue bars show quality assessment groups that use measures calculated for a single model. Single-model predictors QA009, QA365, QA069, QA105, and QA117 all use predicted contacts in some form in their quality assessment measures.

The results can be seen in Figure 9. Over the nine targets where it was possible to compare selected server models, 12 quality assessment methods performed better than the best contact predictor. The 12 methods were FAMS-ace2 (human), McGuffin (human⁴⁹), MULTICOM (human⁴²), SMEG-CCP (human), ModFOLDclust (server⁴⁹), Pcons_Pcons (server⁴⁷), selfqmean (server⁵⁰), MULTICOM-CLUSTER (server⁴²), MODCHECK-Jury (server), and the three servers from the SAM-T08-MQA series.⁵¹ As far as we can tell, 10 of these methods use the CASP server models to generate consensus-based information, the strategy that has been shown to be most effective at selecting models with higher GDT-TS scores.⁴⁸ The two methods that do not use consensus information come from the SAM-T08-MQA series. All of the SAM-T08-MQA servers⁵¹ use predicted contacts to assess the quality of the models,⁵²,⁵³ and the SAM-T08-MQAO server uses the predicted contacts alone. In addition, SMEG-CCP appears to use contacts predicted from models to make its quality assessment predictions.
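The distinction drawn above between consensus-based and single-model quality assessment can be sketched in a few lines. This is an illustration under stated assumptions, not a reimplementation of any CASP8 method: models are represented only by their contact sets, and the consensus score is each model's mean Jaccard overlap with every other model in the pool, a superposition-free stand-in (assumed here) for the structural comparisons the consensus servers make. The single-model score is the fraction of predicted contacts that a model reproduces. All function names and data are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Contacts stored as (i, j) with i < j so that identical pairs compare equal.
ContactSet = FrozenSet[Tuple[int, int]]


def contact_overlap(a: ContactSet, b: ContactSet) -> float:
    """Jaccard overlap of two contact sets (1.0 = identical contact maps)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consensus_scores(models: Dict[str, ContactSet]) -> Dict[str, float]:
    """Consensus-style score: mean overlap with every other model in the pool."""
    scores = {}
    for name, contacts in models.items():
        overlaps = [contact_overlap(contacts, other)
                    for other_name, other in models.items() if other_name != name]
        scores[name] = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return scores


def contact_agreement(model: ContactSet, predicted: ContactSet) -> float:
    """Single-model score: fraction of predicted contacts present in the model."""
    return len(model & predicted) / len(predicted) if predicted else 0.0


# Toy pool: A and B share most contacts, C is the odd one out (hypothetical data).
pool = {
    "A": frozenset({(1, 20), (2, 30), (5, 40)}),
    "B": frozenset({(1, 20), (2, 30)}),
    "C": frozenset({(7, 50)}),
}
predicted = frozenset({(7, 50)})

print(consensus_scores(pool))
print({name: contact_agreement(c, predicted) for name, c in pool.items()})
```

The consensus score favors A and B only because they agree with each other, while the contact-based score favors C; which choice is right depends on whether the pool or the predicted contacts are closer to the truth. A consensus score can only be as good as the pool it is computed from, whereas predicted contacts bring in information from outside the pool.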
There are another nine quality assessment groups that score better than the third-best contact prediction group. These also appear to be mainly consensus-based methods (QMEANSclust,⁵⁰ two servers from the FAMS series, and GS-MetaMQAPconsI and II) or to use models (Bilab-UT). MULTICOM-CMFR⁴² is a single-model method that did well over these nine targets, but it also used predicted and model contact maps as part of its model evaluation scheme. MUFOLD-QA used a range of statistical potentials and machine learning. Although these are results for just nine targets, it is interesting to compare our ranking with that of the quality assessment experiment⁴⁸ over all the targets. The quality assessment category was biased toward easy comparative modeling targets, since there were many more of them in CASP8.³⁸ Quality assessment groups that performed notably better over the nine hard comparative modeling targets than in the overall ranking included FAMS-ace2, SAM-T08-MQAO (which uses contacts predicted from alignments to assess quality), SMEG-CCP (which assesses quality from consensus contacts taken from models), Bilab-UT, MUFOLD-QA (which uses statistical potentials), selfqmean (which uses statistical potentials), MULTICOM-CMFR (which uses predicted contacts), and FAMSD. At least over this small sample, predicted contacts do seem to help predict the quality of 3D models for the harder comparative modeling targets.

CONCLUSIONS

In the CASP8 contact prediction experiment, the CASP assessors noted that the number of predictors suggested an increased interest in the use of predicted contacts in structure prediction. Here, we have attempted to use the predicted contacts from the CASP8 residue-residue contact prediction experiment to score the models for the CASP8 targets. We have been able to show that predicted contacts do indeed contain predictive information and that this information can be used to predict structure.
At worst, predicted contacts might be used as an aid in the selection of reliable 3D structural models; at best, models selected using residue-residue contact predictions might even compete with the best of the prediction servers for the harder targets. We split the CASP8 targets into three groups, and it was clear that predicted contacts were much more valuable in the free modeling and harder comparative modeling regimes. Most of the simple predictors built from the predicted contacts for these targets were highly effective at recognizing the target structure, and many were capable of scoring model structures efficiently in these two regimes. Indeed, the experiments show that the contact predictions from the best contact predictors could have been used to select models that were as good as or better than the vast majority of the models submitted by the automatic servers in the CASP8 experiment. Methods that select models based on the contact maps of the predicted models are likely to be particularly effective in the hard comparative modeling regime.

It is important to note that a direct comparison between structure prediction servers and contact predictor scoring is not possible because of the conditions of the experiments. GDT-TS is known not to be an ideal measure of model quality in the free modeling regime, and both the free modeling and harder comparative modeling sets contain few targets. Contact predictors also had the advantage of knowing the domain definitions in hindsight (though our results show that predictors perform equally well on the subset of single-domain proteins), and they could select from any model submitted to CASP8 (e.g., both contact-predictor-selected examples in Figure 7 came from the BAKER group, and the servers did not have access to these models). Despite these caveats, our experiments clearly show that even with a crude scoring scheme, such as the one in this article, predicted contacts can provide information that is useful for selecting both models and native structures. At present, this information is used in few structure prediction techniques and seems to be orthogonal to the information used by most successful structure prediction strategies.

We also looked at how well groups were able to select ab initio predicted loops. Once again, the predictors performed better than expected, though the results were less outstanding than they were for whole target domains. Most predictors performed better when they used all their predicted contacts, not just the top L/10 ranked contacts. This contrasts with the results of the CASP7 and CASP8 contact prediction experiments,¹⁷,¹⁸ which showed that contact prediction accuracy improved as the number of predicted contacts went down. Our experiments demonstrate that even contacts predicted with very low levels of accuracy can be useful for model selection, which runs contrary to the suggestion of Ortiz and Skolnick⁵⁴ that just a few highly reliable predicted contacts might be sufficient for structure prediction.
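The observation that most predictors did better with all of their contacts than with the top-ranked L/10 subset concerns only which predictions are passed to the scoring function. A minimal sketch of that filtering step follows, assuming each prediction is an (i, j, probability) triple, roughly the information carried by a line of the CASP RR contact format. `select_contacts` and its parameters are hypothetical names, and L is the target length in residues.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (residue i, residue j, predicted probability).
Prediction = Tuple[int, int, float]


def select_contacts(
    predictions: List[Prediction],
    length: int,
    divisor: Optional[int] = None,
    min_sep: int = 12,
) -> List[Prediction]:
    """Rank long-range predictions by probability and optionally truncate.

    divisor=None keeps every prediction that passes the separation filter;
    divisor=10 keeps the top L/10, divisor=2 the top L/2, and so on.
    """
    long_range = [p for p in predictions if abs(p[0] - p[1]) >= min_sep]
    ranked = sorted(long_range, key=lambda p: p[2], reverse=True)
    if divisor is None:
        return ranked
    return ranked[: max(1, length // divisor)]


preds = [(1, 13, 0.9), (2, 15, 0.4), (3, 18, 0.7), (5, 6, 0.99), (4, 20, 0.2)]
print(select_contacts(preds, length=20))              # all four long-range pairs
print(select_contacts(preds, length=20, divisor=10))  # top L/10 = 2 pairs
```

The short-range pair (5, 6) is discarded despite its high probability because it falls below the separation cut-off. With `divisor=None`, every remaining prediction contributes to the score, which corresponds to the "all contacts" setting that worked best for most predictors here.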
There was one section in which the contact predictors did not prove useful: no group was able to separate the native structure from the models consistently for the easy comparative modeling target domains. The few groups that had some success at scoring models in the easy comparative modeling regime took their contacts from the contact maps of the predicted structures. This may be because contacts predicted at 8 Å are too coarse to capture the subtle differences between models in this regime. We attempted to measure whether the predicted contacts were more useful for scoring certain types of targets, or regions of targets that had to be predicted ab initio. With so few targets, it was difficult to find obvious correlations, although it was clear that the more difficult the target, the better the contact predictors performed, as suggested by the results from the easy comparative modeling section. In addition, contact predictors seemed to perform less effectively with all-helical target domains. There were three poorly predicted all-helical targets in the free modeling regime, and the two domains of T0478 were among the worst predicted in the hard comparative modeling regime. The models chosen by the contact predictors in the easy comparative modeling regime were on average worse than the median-scoring model in 19 of the 23 all-helical target domains. It is possible that the residue-residue contact prediction experiment will not be carried out in its usual format in CASP9 because of the lack of suitable targets.³⁸ However, this work suggests that despite the low levels of accuracy reported in previous CASP experiments, predicted contacts might have a place in structure prediction as part of model scoring strategies.
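The suggestion that 8 Å contacts are too coarse to separate close models can be illustrated with a toy case. In this hypothetical sketch, each residue is represented by a single atom (CASP contact definitions use Cβ, or Cα for glycine), and a pair counts as a contact when it is closer than 8 Å and at least 12 residues apart in sequence. The two made-up models differ by 2.5 Å in one contacting pair, yet they produce identical contact maps, so any score built only on those contacts must treat them as equal. `contact_map` is an illustrative name, not an existing tool.

```python
import math
from typing import Dict, FrozenSet, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Dict[int, Coord], cutoff: float = 8.0,
                min_sep: int = 12) -> FrozenSet[Tuple[int, int]]:
    """Pairs (i, j), i < j, closer than `cutoff` Å and >= `min_sep` apart in sequence."""
    residues = sorted(coords)
    pairs = set()
    for a, i in enumerate(residues):
        for j in residues[a + 1:]:
            if j - i >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
                pairs.add((i, j))
    return frozenset(pairs)


# Two hypothetical models that differ only in how close residue 20 sits to residue 1.
model_a = {1: (0.0, 0.0, 0.0), 20: (5.0, 0.0, 0.0), 40: (30.0, 0.0, 0.0)}
model_b = {1: (0.0, 0.0, 0.0), 20: (7.5, 0.0, 0.0), 40: (30.0, 0.0, 0.0)}

print(contact_map(model_a) == contact_map(model_b))            # True: same 8 Å map
print(contact_map(model_a, 6.0) == contact_map(model_b, 6.0))  # False at a tighter cut-off
```

A tighter threshold separates the two toy models, but the threshold is fixed by how the contacts were predicted, so it cannot simply be changed when the models are scored.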
It is clear that the best quality assessment methods use some form of consensus information to select models, but the results from the reduced comparison that we were able to carry out with the quality assessment groups suggest that predicted contacts could be a highly effective part of a model scoring strategy. This has also been suggested by the authors of the SAM-T08-MQA servers,⁵² and, as our minimal comparison showed, even the SAM-T08-MQAO server, which used predicted contacts alone, was able to compete with the best consensus servers over the harder comparative modeling targets. So, predicted contacts do appear to have promise in the prediction of 3D structures, but rather than guiding structure prediction, they would seem to be put to better use scoring or selecting the models generated by structure prediction programs. This experiment has only looked at the scoring of models for single, compact, globular domains. Predicted contacts might also help in the prediction of other 3D structural features, such as domain-domain orientation and protein docking.

ACKNOWLEDGMENTS

The authors would like to thank Osvaldo Graña and Txema Gonzalez-Izarzugaza for their critical input on the article, the CASP organizers for organizing the CASP experiments, and Andriy Kryshtafovych for maintaining the CASP web pages.

REFERENCES

1. Altschuh D, Lesk AM, Bloomer AC, Klug A. Correlation of coordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J Mol Biol 1987;193:
2. Gobel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins 1994;18:
3. Olmea O, Valencia A. Improving contact predictions by the combination of correlated mutations and other sources of sequence information. Fold Des Suppl 1997;2:
4. Kundrotas PJ, Alexov EG. Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives. BMC Bioinformatics 2006;7:
5. Fodor AA, Aldrich RW. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins 2004;56:
6. Pollastri G, Baldi P. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics 2002;18:
