Competitive Imperialistic Approach for Protein Folding

E. Khaji (a), S. M. Mortazavi (b)
(a) Department of Physics, Göteborg University, Gothenburg, Sweden.
(b) School of Business, University of Colorado Denver, CO, USA.

Abstract
The protein folding problem is a fundamental problem in computational molecular biology and biochemical physics, and solving it helps us understand the function of a given sequence. The problem is NP-hard, and standard computational approaches are not suitable for obtaining a sufficiently accurate structure in the huge conformation space. Because of the complexity of the protein folding problem, simplified models such as the hydrophobic-polar (HP) model have become one of the major tools for studying protein structure. Several optimization methods have been applied to this problem, including Monte Carlo methods, evolutionary algorithms, and ant colony optimization. In this work, we present the results of experiments applying the Imperialist Competitive Algorithm to the 3D HP protein folding problem. The results compare favorably with specialized state-of-the-art methods for this problem: our empirical results indicate that the Imperialist Competitive Algorithm outperforms the existing results for standard benchmark instances from the literature. Furthermore, we compare our folding results with proteins of known folding.

Keywords: Imperialist Competitive Algorithm, metaheuristics, hydrophobic-polar model, protein folding

1. Introduction
The 3D structure of a protein, which is itself a function of its sequence, is crucial to pharmacology and the medical sciences. Most drugs work by attaching themselves to a protein so that they either stabilize the normally folded structure or disrupt a folding pathway that leads to a harmful protein [20]. Thus, knowing exact 3D shapes helps in designing drugs and in understanding the function of a protein.
Although a system of differential equations exists to describe the folding forces, its complexity means that the problem is usually approached with simplified models instead. These models try to reflect the main global characteristics of protein structures [20]. In the hydrophobic-polar (HP) model [4], the primary amino acid sequence of a protein is abstracted to a sequence of hydrophobic (H) and polar (P) residues, represented as a string over the letters H and P. The model is based on the hydrophobicity of amino acids: hydrophobic residues tend to be less exposed to the aqueous solvent than polar ones, which results in the formation of a hydrophobic core in the spatial structure. In this model, the amino acid sequence can be seen as a binary sequence of monomers that are either hydrophobic or polar, and the structure of the protein is a chain of monomers placed on the vertices of a three-dimensional cubic lattice. A contact is a pair of amino acids that are not consecutive in the chain but occupy adjacent lattice sites, and the free energy of a conformation is obtained by counting the contacts between two hydrophobic amino acids, each of which contributes -1. Finding optimal structures of the HP model on a cubic lattice is an NP-complete problem [2]. Consequently, predicting the native structure of a given sequence is an optimization problem to be tackled with a heuristic optimization algorithm such as ant colony optimization (ACO), a genetic algorithm (GA), particle swarm optimization (PSO), or the Imperialist Competitive Algorithm (ICA). ICA is a recently introduced, socio-politically motivated global search strategy that has been applied to a range of optimization tasks, with good performance reported in both convergence rate and the quality of the global optima found [14-19]. In this paper, we apply the Imperialist Competitive Algorithm to protein folding and compare it with existing results.
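The energy definition above can be made concrete with a short sketch. It assumes that a conformation is given as a list of integer (x, y, z) lattice sites, one per residue in chain order; the names `hp_energy` and `brute_force_min` are hypothetical helpers for illustration, not code from this paper. The exhaustive search is included only to show why the problem calls for heuristics: it enumerates every self-avoiding walk, whose number on the cubic lattice grows by a factor of roughly 4.7 per added residue, so it is feasible only for very short chains.

```python
# Unit moves on the 3D cubic lattice: +/-x, +/-y, +/-z.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def hp_energy(sequence, coords):
    """Return E(c): -1 for every H-H pair that is adjacent on the lattice
    but not consecutive in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("one lattice site per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    index_of = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in MOVES:
            j = index_of.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each pair once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


def brute_force_min(sequence):
    """Exact minimum energy by enumerating every self-avoiding walk.

    Only feasible for very short chains: the number of walks grows
    exponentially with the chain length.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    best = {"energy": None, "coords": None}

    def extend(coords, occupied):
        if len(coords) == len(sequence):
            e = hp_energy(sequence, coords)
            if best["energy"] is None or e < best["energy"]:
                best["energy"], best["coords"] = e, list(coords)
            return
        x, y, z = coords[-1]
        for dx, dy, dz in MOVES:
            nxt = (x + dx, y + dy, z + dz)
            if nxt not in occupied:
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                occupied.remove(nxt)
                coords.pop()

    extend([(0, 0, 0)], {(0, 0, 0)})
    return best["energy"], best["coords"]
```

For example, `hp_energy("HPPH", [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)])` bends the chain into a square so that the two end residues touch, giving one H-H contact and an energy of -1; the same sequence laid out in a straight line has energy 0.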

2. The Protein Folding Problem
Efforts to solve the protein folding problem have traditionally been rooted in two schools of thought [20]. From the thermodynamic point of view, the native structure of a protein is the one at the global minimum of its free energy. Alternatively, one can take an evolutionary view of protein folding, in which the native structure has evolved over time. Following this view, methods have been developed that map the sequence of one protein (the target) to the structure of another protein (the template), model the overall fold of the target on that of the template, and infer how the target structure will change, relative to the template, as a result of substitutions [1]. Accordingly, methods for protein structure prediction have been divided into two classes: de novo modeling and comparative modeling. De novo approaches can be further subdivided into those based exclusively on the physics of the interactions within the polypeptide chain and between the polypeptide and the solvent, which use heuristic methods [9], [10], and knowledge-based methods, which use statistical potentials derived from recurrent patterns in known protein structures and sequences. Comparative modeling builds a structure by copying the coordinates of the templates in the aligned core regions; the variable regions are modeled using fragments with similar sequences taken from a database [1]. The processes involved in protein folding are very complex and only partially understood, so simplified models such as Dill's HP model have become one of the major tools for studying proteins [4]. The HP model is based on the observation that hydrophobic interaction is the driving force for protein folding and that the hydrophobicity of amino acids is the main force in the development of the native conformation of small globular proteins.
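As a minimal sketch of the HP abstraction, the hypothetical helper `to_hp` below maps a one-letter amino acid sequence to an H/P string. The hydrophobic set used here (A, C, F, I, L, M, V, W) is an assumption chosen for illustration only; published HP studies use different classifications, and the benchmark instances used later in this paper are already given in H/P form.

```python
# Assumed hydrophobic residues (illustrative choice, not taken from the paper).
HYDROPHOBIC = set("ACFILMVW")
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def to_hp(protein):
    """Map one-letter amino acid codes to HP classes.

    Note the clash of letters: in the input, 'H' is histidine and 'P' is
    proline; in the output, 'H' means hydrophobic and 'P' means polar.
    """
    protein = protein.upper()
    unknown = set(protein) - STANDARD
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in protein)
```

With this classification, `to_hp("MKVL")` gives `"HPHH"`; swapping in a different hydrophobicity scale only requires changing `HYDROPHOBIC`.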
In the HP model, the primary amino acid sequence of a protein is abstracted to a sequence of hydrophobic (H) and polar (P) residues. The conformations of this sequence are restricted to self-avoiding paths on a three-dimensional cubic lattice. One of the most common approaches to protein structure prediction is based on the thermodynamic hypothesis, which states that the native state of a protein is the one with the lowest Gibbs free energy. In the HP model, the energy of a conformation is defined by the number of topological contacts between hydrophobic amino acids that are not neighbors in the given sequence. More specifically, a conformation $c$ with exactly $n$ such H-H contacts has free energy $E(c) = n \cdot (-1) = -n$. The 3D HP protein folding problem can be formally defined as follows: given an amino acid sequence $s = s_1 s_2 \ldots s_n$, find an energy-minimizing conformation of $s$, i.e. find $c^*_s \in C(s)$ such that $E^*_s = E(c^*_s) = \min_{c \in C(s)} E(c)$, where $C(s)$ is the set of all valid conformations for $s$. This problem has been proved to be NP-hard [2]. A number of well-known heuristic optimization methods have been applied to the 3D protein folding problem, including evolutionary algorithms (EA) [9], Monte Carlo (MC) algorithms [10] and Ant Colony Optimization (ACO) [7]. An early application of EA to protein structure prediction was presented by Unger and Moult [12]; their EA incorporates characteristics of Monte Carlo methods. Currently, among the best-known algorithms for the HP protein folding problem is the Pruned-Enriched Rosenbluth Method (PERM) [7]. Other methods include the Hydrophobic Zipper (HZ) method [5], Ant Colony Optimization (ACO), Ant Colony System (ACS) [20], and the Constraint-based Hydrophobic Core Construction method (CHCC) [13]. The Core-directed chain Growth method (CG) [3] biases construction towards finding a good hydrophobic core by using a specifically designed heuristic function.

3. Imperialist Competitive Algorithm for the Protein Folding Problem
The Imperialist Competitive Algorithm (ICA) is a stochastic heuristic algorithm suitable for NP-hard problems. Its first step is to create an initial population, in which each individual is called a country. After the fitness (power) of every country is calculated, some of the most powerful countries are selected as imperialists, and the remaining countries form the colonies, which are assigned randomly to the imperialists. The number of colonies of each imperialist is proportional to its power. When the competition starts, imperialists try to acquire more colonies, and colonies move toward their imperialists. During the competition, powerful imperialists improve or are replaced by more powerful colonies, while the weakest empires collapse. Finally, only one imperialist remains, and its colonies converge to the same position as the imperialist. The flowchart of this algorithm is shown in Figure 2 [11]. More details about this algorithm are presented in [14-19].
In protein folding, as in any other heuristic algorithm, the evaluation step is the crucial point of the Imperialist Competitive Algorithm. Each country is a sequence of random numbers, and these numbers determine the direction in which the next amino acid will be placed: among all possible lattice nodes in 3D space where the amino acid could be placed, the node nearest to the point encoded by the random numbers is chosen. Following this description, the dimension of each country is $(3 \cdot l) - 1$, where $l$ is the length of the polypeptide. The power of each country is then evaluated from the resulting conformation, using the energy defined in the introduction. The pseudocode of the algorithm is as follows:
0) Define the objective function.
1) Create the initial empires.
2) Assimilation: colonies move towards their imperialist states in different directions.
3) Revolution: random changes occur in the characteristics of some countries.
4) Position exchange between a colony and its imperialist: a colony with a better position than its imperialist can take control of the empire by replacing the existing imperialist.
5) Imperialistic competition: all imperialists compete to take possession of each other's colonies.
6) Eliminate the powerless empires: weak empires lose their power gradually and are finally eliminated.
7) If the stop condition is satisfied, stop; otherwise, go to step 2.
8) End.

4. Numerical Experiments
Ten standard benchmark instances of length 48 for 3D HP protein folding, shown in Table 1, have been widely used in the literature [3], [7], [9-11]. Experiments on these standard benchmark instances were conducted by performing 20 independent runs for each problem instance. The following parameter settings were used in all experiments:
Number of initial countries = 500
Number of initial imperialists = 8
AlgorithmParams.NumOfDecades = 200
Revolution rate (the rate at which the socio-political characteristics of a country change suddenly) = 0.3
Assimilation coefficient (beta) = 2
Assimilation angle coefficient (gamma) = 0.5
AlgorithmParams.Zeta =
AlgorithmParams.DampRatio =
The percent of Search Space Size =
Table 2 compares the results achieved by various heuristic algorithms; for each benchmark instance, the best result found by each method is reported. We compared the solution quality obtained by the hydrophobic zipper (HZ) algorithm [5], the constraint-based hydrophobic core construction (CHCC) method [13], the core-directed chain growth (CG) algorithm [3], the contact interactions (CI) algorithm [11], the pruned-enriched Rosenbluth method (PERM) [7], the ACO algorithm of Shmygelska and Hoos (ACO) [10], and the ICA approach presented in this paper. In the majority of the cases, our average results are better than the best results found by the other methods. The main disadvantage of heuristic methods, as other authors have noted, is that they achieve good folding only for short proteins.
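The pseudocode steps 0 to 8 of Section 3, together with the encoding described there and the parameter values listed above, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `decode`, `cost` and `ica` are hypothetical. Each country holds 3·(l - 1) genes in [-1, 1], three per placed residue, which may differ from the paper's exact count. A walk that reaches a dead end costs +1 per unplaced residue, so any complete fold (energy ≤ 0) beats it. Colonies are dealt to imperialists by weighted sampling rather than by rounding. Assimilation uses an independent U(0, beta) factor per gene and omits the deviation angle gamma. Revolution redraws about 10% of a colony's genes. zeta = 0.1 is a placeholder, since the text lists no value for it.

```python
import random

# Unit moves on the 3D cubic lattice.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def decode(sequence, genes):
    """Assumed encoding: gene triple i is read as a 3D vector, and residue i
    goes to the free neighbour of residue i-1 whose unit move is nearest to it.
    Returns a shorter list if the walk runs into a dead end."""
    coords, occupied = [(0, 0, 0)], {(0, 0, 0)}
    for i in range(1, len(sequence)):
        target = genes[3 * (i - 1): 3 * i]
        x, y, z = coords[-1]
        free = [m for m in MOVES if (x + m[0], y + m[1], z + m[2]) not in occupied]
        if not free:
            break
        m = min(free, key=lambda mv: sum((a - b) ** 2 for a, b in zip(mv, target)))
        coords.append((x + m[0], y + m[1], z + m[2]))
        occupied.add(coords[-1])
    return coords


def cost(sequence, genes):
    """HP energy (-1 per non-consecutive H-H contact); dead ends cost +1 per unplaced residue."""
    coords = decode(sequence, genes)
    if len(coords) < len(sequence):
        return len(sequence) - len(coords)
    index_of = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] == "H":
            for dx, dy, dz in MOVES:
                j = index_of.get((x + dx, y + dy, z + dz))
                if j is not None and j > i + 1 and sequence[j] == "H":
                    contacts += 1
    return -contacts


def ica(sequence, n_countries=500, n_imperialists=8, decades=200,
        revolution_rate=0.3, beta=2.0, zeta=0.1, seed=0):
    if len(sequence) < 2 or n_countries <= n_imperialists:
        raise ValueError("need at least 2 residues and more countries than imperialists")
    rng = random.Random(seed)
    dim = 3 * (len(sequence) - 1)
    # 0-1) Evaluate random countries; the lowest-cost ones become imperialists.
    countries = []
    for _ in range(n_countries):
        g = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        countries.append([g, cost(sequence, g)])
    countries.sort(key=lambda gc: gc[1])
    empires = [{"imp": g, "cost": c, "cols": []} for g, c in countries[:n_imperialists]]
    worst = max(e["cost"] for e in empires)
    power = [worst - e["cost"] + 1e-9 for e in empires]
    for col in countries[n_imperialists:]:
        # Colonies are dealt out with probability proportional to imperialist power.
        rng.choices(empires, weights=power)[0]["cols"].append(col)

    for _ in range(decades):
        for emp in empires:
            for k, (col, _) in enumerate(emp["cols"]):
                # 2) Assimilation: move a random fraction (up to beta) of the gap towards the imperialist.
                col = [x + beta * rng.random() * (y - x) for x, y in zip(col, emp["imp"])]
                # 3) Revolution: occasionally redraw about 10% of the genes.
                if rng.random() < revolution_rate:
                    for d in rng.sample(range(dim), max(1, dim // 10)):
                        col[d] = rng.uniform(-1.0, 1.0)
                c = cost(sequence, col)
                if c < emp["cost"]:
                    # 4) Position exchange: the better colony becomes the imperialist.
                    emp["cols"][k] = [emp["imp"], emp["cost"]]
                    emp["imp"], emp["cost"] = col, c
                else:
                    emp["cols"][k] = [col, c]
        if len(empires) > 1:
            # 5) Competition on total cost = imperialist cost + zeta * mean colony cost.
            totals = [e["cost"] + zeta * (sum(c for _, c in e["cols"]) / len(e["cols"]) if e["cols"] else 0.0)
                      for e in empires]
            loser = max(range(len(empires)), key=lambda i: totals[i])
            top = max(totals)
            weights = [top - t + 1e-9 for t in totals]
            weights[loser] = 0.0  # the weakest empire cannot win its own colony
            winner = rng.choices(empires, weights=weights)[0]
            cols = empires[loser]["cols"]
            if cols:
                weakest = max(range(len(cols)), key=lambda k: cols[k][1])
                winner["cols"].append(cols.pop(weakest))
            else:
                # 6) A powerless empire is eliminated; its imperialist joins the winner.
                gone = empires.pop(loser)
                winner["cols"].append([gone["imp"], gone["cost"]])
    # 7-8) Stop after the fixed number of decades and report the best country found.
    pool = [(e["cost"], e["imp"]) for e in empires] + [(c, g) for e in empires for g, c in e["cols"]]
    best_cost, best_genes = min(pool, key=lambda p: p[0])
    return best_cost, decode(sequence, best_genes)
```

With the defaults, `ica(sequence)` follows the settings listed above (500 countries, 8 imperialists, 200 decades, revolution rate 0.3, beta = 2) and returns the best energy found together with its lattice coordinates; calling it with `seed` = 0, 1, ..., 19 mirrors the 20 independent runs per instance.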

Table 1. Standard benchmark instances.
No.  Sequence
1    HPHHPPHHHHPHHHPPHHPPHPHHPHPHHPPHHPPPHPPPPPPPPHHP
2    HHHHPHHPHHHHHPPHPPHHPPHPPPPPPHPPHPPPHPPHHPPHHHPH
3    HPHHPHHHHHHPPHPHPPHPHHPHPHPPPHPPHHPPHHPPHPHPPHP
4    PHPHHPPHPHHHPPHHPHHPPPHHHHHHPPHPHHPHPHPPPHPPHPHP
5    PPHPPPHPHHHHPPHHHHPHHPHHHPPHPHPHPPHPPPPPPHHPHHPH
6    HHHPPPHHPHPHHPHHPHHPHPPPPPPPHPHPPHPPPHPPHHHHHHPH
7    PHPPPPHPHHHPHPHHHHPHHPHHPPPHPHPPPHHHPPHHPPHHPPPH
8    PHPHPPPPHPHPHPPHPHHHHHHPPHHHPHPPHPHHPPHPHHHPPPPH
9    PHPHPPPPHPHPHPPHPHHHHHHPPHHHPHPPHPHHPPHPHHHPPPPH
10   PHHPPPPPPHHPPPHHHPHPPHPHHPPHPPHPPHHPPHHHHHHHPPH

Table 2. Comparison of 3D protein folding results.
Benchmark  HZ  CHCC  CG  CI  PERM  ACO  ICA

Conclusion
The Imperialist Competitive Algorithm can be applied to the 3D protein folding problem simply and successfully. On the benchmark instances considered here, it outperforms well-known approaches from the literature. When applied to short proteins, the folding it achieves is very similar to the real protein folding. The results obtained are encouraging and show the ability of the developed algorithm to generate high-quality solutions rapidly.

References
[1] Balev S., Solving the Protein Threading Problem by Lagrangian Relaxation, Algorithms in Bioinformatics, 3240, Springer, 2004.
[2] Berger B., T. Leighton, Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete, Computational Biology, 5, 1998, 27-40.

[3] Beutler T., K. Dill, A fast conformational method: A new algorithm for protein folding simulations, Protein Sci., 5, 1996.
[4] Dill K., K. Lau, A lattice statistical mechanics model of the conformational sequence spaces of proteins, Macromolecules, 22, 1989.
[5] Dill K., K. M. Fiebig, H. S. Chan, Cooperativity in protein-folding kinetics, Nat. Acad. Sci., USA, 1993.
[6] Dorigo M., L. M. Gambardella, Ant colony system: A cooperative learning approach to the traveling salesman problem, IEEE Transactions on Evolutionary Computation, 1, 1997.
[7] Hsu H. P., V. Mehra, W. Nadler, P. Grassberger, Growth algorithm for lattice heteropolymers at low temperature, Chemical Physics, 118, 2003.
[8] Krasnogor N., D. Pelta, P. M. Lopez, P. Mocciola, E. de la Cana, Genetic algorithms for the protein folding problem: a critical view, Engineering of Intelligent Systems, ICSC Academic Press, 1998.
[9] Liang F., W. H. Wong, Evolutionary Monte Carlo for protein folding simulations, Chemical Physics, 115 (7), 2001.
[10] Shmygelska A., H. H. Hoos, An ant colony optimization algorithm for the 2D and 3D hydrophobic polar protein folding problem, BMC Bioinformatics, 6:30.
[11] Toma L., S. Toma, Contact interaction method: a new algorithm for protein folding simulations, Protein Sci., 5, 1996.
[12] Unger R., J. Moult, Genetic algorithms for protein folding simulations, Molecular Biology, 231, 1993.
[13] Yue K., K. Dill, Forces of tertiary structural organization in globular proteins, Nat. Acad. Sci., USA, 1995.
[14] Atashpaz-Gargari E., C. Lucas, Imperialist Competitive Algorithm: An algorithm for optimization inspired by imperialistic competition, IEEE Congress on Evolutionary Computation.
[15] Atashpaz-Gargari E., F. Hashemzadeh, R. Rajabioun, C. Lucas, Colonial Competitive Algorithm, a novel approach for PID controller design in MIMO distillation column process, International Journal of Intelligent Computing and Cybernetics, 1 (3).
[16] Rajabioun R., E. Atashpaz-Gargari, C. Lucas, Colonial Competitive Algorithm as a Tool for Nash Equilibrium Point Achievement, Lecture Notes in Computer Science, 5073.
[17] Lucas C., Z. Nasiri-Gheidari, F. Tootoonchian, Application of an imperialist competitive algorithm to the design of a linear induction motor, Energy Conversion and Management, 51.
[18] Rajabioun R., E. Atashpaz-Gargari, C. Lucas, Colonial Competitive Algorithm as a Tool for Nash Equilibrium Point Achievement, Springer LNCS Book Chapter, 2008.
[19] Hosseini Nasab E., M. Khezri, M. Sahab Khodamoradi, E. Atashpaz Gargari, An application of Imperialist Competitive Algorithm to Simulation of Energy Demand Based on Economic Indicators: Evidence from Iran, European Journal of Scientific Research, 43 (4), 2010.
[20] Fidanova S., I. Lirkov, Ant Colony System Approach for Protein Folding, Proceedings of the International Multiconference on Computer Science and Information Technology, 2008.