Stochastic Fractal Search Algorithm for 3D Protein Structure Prediction

Chuan SUN 1, Zi-qi WEI 2, Chang-jun ZHOU 1,* and Bin WANG 1

2016 International Conference on Artificial Intelligence and Computer Science (AICS 2016)

1 Key Laboratory of Advanced Design and Intelligent Computing (Dalian University), Ministry of Education, Dalian, China
2 Computing Science Department, University of Alberta, Edmonton, Alberta, T6E P6, Canada
zhou-chang23@63.com
*Corresponding author

Keywords: Protein structure prediction, Stochastic Fractal Search algorithm, AB off-lattice model.

Abstract. Protein structure prediction (PSP) is the process of predicting the three-dimensional structure of a protein from its known amino acid sequence. It is of great significance for bioinformatics research, disease treatment, and drug research and development. PSP is a typical NP problem and is solved in two steps: first, an appropriate mathematical model is established for the problem; second, the model is solved with an appropriate optimization algorithm. In this paper, protein structure prediction is transformed into a numerical optimization problem by adopting the AB off-lattice model, and the minimum free energy of the protein structure is then sought with the Stochastic Fractal Search (SFS) algorithm. Experiments show that the proposed method obtains lower energy values than the compared algorithms on most test sequences, which indicates that it is an effective way to analyze protein structure.

Introduction

Protein structure prediction (PSP) is the process of predicting the three-dimensional structure of a protein from its known amino acid sequence [1], and it is of great significance for bioinformatics research, disease treatment and drug research. With the completion of the human genome map, predicting the structure of a protein from its gene sequence has become an important challenge in biological research.
In the 1960s, Anfinsen showed that the information governing protein folding is contained in the protein sequence. He then proposed that the native conformation of a protein corresponds to its lowest-energy structure [2], which became the theoretical basis of protein structure prediction. Since then, many heuristic algorithms [3] have been used to predict protein structure on the AB off-lattice model, such as Tabu Search-Particle Swarm Optimization (TPSO) [4], Balance-Evolution Artificial Bee Colony (BE-ABC) [5], the Genetic-Particle Swarm Optimization-Tabu Search algorithm (PGATS) [6], the Artificial Bee Colony algorithm (ABC) [7] and others. In this paper, the Stochastic Fractal Search (SFS) algorithm is used for this task.

3D AB Off-lattice Model

In 1993, Stillinger et al. proposed the AB off-lattice model [8]. Its main characteristic is that the bond angles and the torsion angles are both arbitrary, and that the interactions between nonadjacent amino acids in the protein are taken into account. A structure of sequence length n is determined by 2n - 5 angles (n - 2 bond angles and n - 3 torsion angles). In the AB off-lattice model, all amino acids are classified as either hydrophobic (A) or hydrophilic (B). The energy E of the AB off-lattice model consists of the bending potential energy of the backbone and the interaction potential energy between nonadjacent amino acids, as follows:

$$E = \sum_{i=2}^{n-1} \frac{1}{4}\left(1 - \cos\theta_i\right) + 4\sum_{i=1}^{n-2}\sum_{j=i+2}^{n}\left[r_{ij}^{-12} - C(\xi_i,\xi_j)\,r_{ij}^{-6}\right] \tag{1}$$

$$r_{ij} = \left\{\left[1 + \sum_{k=i+1}^{j-1}\cos\left(\sum_{l=i+1}^{k}\theta_l\right)\right]^2 + \left[\sum_{k=i+1}^{j-1}\sin\left(\sum_{l=i+1}^{k}\theta_l\right)\right]^2\right\}^{1/2} \tag{2}$$

$$C(\xi_i,\xi_j) = \frac{1}{8}\left(1 + \xi_i + \xi_j + 5\,\xi_i\xi_j\right) \tag{3}$$

where $r_{ij}$ denotes the distance between residues i and j in the chain, and $\xi_i = 1$ if residue i is hydrophobic and $\xi_i = -1$ otherwise. It follows from Eq. (3) that C equals +1 for AA pairs, +0.5 for BB pairs, and -0.5 for AB or BA pairs. In other words, AA pairs attract each other strongly, BB pairs attract each other weakly, and AB or BA pairs repel each other weakly. Therefore, the energy of a protein under the AB off-lattice model can be calculated once the amino acid sequence, the bond angles and the torsion angles are known.

Algorithms Introduction

Encoding

In this paper, the coordinates of the first three amino acids are $(0, 0, 0)$, $(0, 1, 0)$ and $(\cos\theta_1, 1 + \sin\theta_1, 0)$, and the coordinates of the remaining amino acids ($4 \le i \le n$) are obtained by the following formula:

$$(x_i, y_i, z_i) = \left(x_{i-1} + \cos\theta_{i-2}\cos\beta_{i-3},\; y_{i-1} + \sin\theta_{i-2}\cos\beta_{i-3},\; z_{i-1} + \sin\beta_{i-3}\right)$$

where $\theta$ denotes the bond angles and $\beta$ the torsion angles. The bond and torsion angles are initialized with random numbers. The algorithm below then searches for the angles, and hence the 3D coordinates of the amino acids, that minimize the energy of the protein.

Stochastic Fractal Search algorithm

SFS is a recent metaheuristic proposed by Salimi [9]. Compared with some well-known heuristic search methods [10], it performs well on a variety of nonlinear functions. The SFS algorithm consists of two main processes: the Diffusion process and the Update process. In the Diffusion process, each particle diffuses around its current position, imitating the branching growth of a dielectric breakdown, which makes the process suitable as a search tool for global optimization. The Diffusion process explores and exploits the search space by means of a Gaussian walk.
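The model and encoding described above can be illustrated with a short Python sketch. It is only a minimal reading of Eq. (1), Eq. (3) and the coordinate recursion, not the authors' MATLAB implementation. The function names (`xi`, `coupling`, `decode`, `energy`) are hypothetical, and two assumptions are made: the bending term uses the angle between successive unit bond vectors, as in the toy model of [8], and pair distances are taken directly from the decoded 3D coordinates rather than from Eq. (2).

```python
import math

def xi(residue):
    """Return +1 for a hydrophobic residue 'A' and -1 for a hydrophilic 'B'."""
    return 1.0 if residue == "A" else -1.0

def coupling(a, b):
    """C of Eq. (3): +1 for AA, +0.5 for BB, -0.5 for AB or BA."""
    return (1.0 + xi(a) + xi(b) + 5.0 * xi(a) * xi(b)) / 8.0

def decode(theta, beta):
    """Build 3D coordinates from n-2 bond angles and n-3 torsion angles."""
    if not theta or len(theta) != len(beta) + 1:
        raise ValueError("expected n-2 bond angles and n-3 torsion angles")
    pts = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0),
           (math.cos(theta[0]), 1.0 + math.sin(theta[0]), 0.0)]
    # Residue i (i >= 4, 1-based) uses theta_{i-2} and beta_{i-3}.
    for t, b in zip(theta[1:], beta):
        x, y, z = pts[-1]
        pts.append((x + math.cos(t) * math.cos(b),
                    y + math.sin(t) * math.cos(b),
                    z + math.sin(b)))
    return pts

def energy(sequence, theta, beta):
    """Energy of Eq. (1) for an AB sequence and a set of angles."""
    pts = decode(theta, beta)
    n = len(pts)
    if len(sequence) != n:
        raise ValueError("sequence length must equal len(theta) + 2")
    # Unit bond vectors between consecutive residues.
    bonds = [tuple(q - p for p, q in zip(pts[i], pts[i + 1]))
             for i in range(n - 1)]
    # Assumption: cos(theta_i) is the dot product of successive bond vectors.
    bend = sum((1.0 - sum(u * v for u, v in zip(bonds[i], bonds[i + 1]))) / 4.0
               for i in range(n - 2))
    pair = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):  # nonadjacent pairs only
            r = math.dist(pts[i], pts[j])
            pair += r ** -12 - coupling(sequence[i], sequence[j]) * r ** -6
    return bend + 4.0 * pair
```

For example, `energy("ABAB", [0.3, 0.7], [0.2])` evaluates one candidate conformation of a four-residue chain. An optimizer such as SFS would treat the concatenated angle vector as the position of a particle and this function as its fitness. The sketch does not guard against two residues coinciding, in which case the pair term is undefined.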
The Gaussian walk used in the Diffusion process is defined as

$$GW = G(\mu_{P_{Best}}, \sigma) + (\varepsilon \cdot P_{Best} - \varepsilon' \cdot P_i) \tag{4}$$

$$\sigma = \left|\frac{\log(g)}{g} \cdot (P_i - P_{Best})\right| \tag{6}$$

where $\mu_{P_{Best}}$ and $\sigma$ denote the expectation and the standard deviation of the Gaussian distribution, respectively, $P_i$ is the position of the ith point, $P_{Best}$ is the position of the best point in the group, and $\varepsilon$ and $\varepsilon'$ are uniformly distributed random numbers in [0, 1]. $\sigma$ is calculated by Eq. (6) so that the size of the Gaussian jump shrinks as the generation number g of the Gaussian walk increases, which encourages localized search.

In the Update process, the best particle generated from the diffusion process is the only particle considered, and the remaining particles are discarded. Then the current particles exchange information and generate new individuals. Firstly, all points are ranked by their fitness values and each point is assigned a probability value $Pa_i$. Its purpose is to increase the chance of moving points that do not have a good solution. Secondly, $P_i$ is updated by Eq. (7) if the condition $Pa_i < \varepsilon$ is satisfied; otherwise it remains unchanged.
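A single Gaussian walk step of the Diffusion process, following Eqs. (4) and (6), can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code. The function name `gaussian_walk` is hypothetical, the mean of the Gaussian is assumed to be the coordinate of the best point, and one pair of random numbers is drawn per step and shared by all coordinates.

```python
import math
import random

def gaussian_walk(point, best, generation, rng=random):
    """Return a new position diffused from `point` towards `best`.

    point, best : sequences of floats (e.g. the 2n - 5 angles of a chain)
    generation  : generation number g >= 1 of the Gaussian walk
    rng         : object providing random() and gauss(), e.g. random.Random
    """
    if generation < 1:
        raise ValueError("generation must be >= 1")
    # Assumption: a single epsilon and epsilon' per step, uniform in [0, 1).
    eps, eps_prime = rng.random(), rng.random()
    new_point = []
    for p, b in zip(point, best):
        # Eq. (6): the jump size shrinks as the generation number grows.
        sigma = abs(math.log(generation) / generation * (p - b))
        # Eq. (4), with the Gaussian centred on the best point (assumption).
        new_point.append(rng.gauss(b, sigma) + (eps * b - eps_prime * p))
    return new_point
```

In a full optimizer, each particle would take up to MDN such steps per iteration and keep the diffused position with the lowest energy. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.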

$$P_i' = P_j - \varepsilon \cdot (P_k - P_i) \tag{7}$$

where $P_j$ and $P_k$ are randomly selected points in the group and $\varepsilon$ is a random number in [0, 1].

The strategy of SFS is to strengthen the search of the solution space through the Diffusion process, and to promote diversity through the exchange of information among the population in the Update process, so as to find the global minimum of a potential energy surface with multiple local minima.

Experimental Results and Analysis

Parameter settings

In this paper, we implement SFS in MATLAB 2014a on Windows 7. The parameters are set as follows: Maximum Iteration Number MCN = 5000, Maximum Diffusion Number MDN = 5, Number of Particles NP = 50.

Experiment on Fibonacci sequences

Fibonacci sequences are widely used in protein structure prediction because they reflect certain regularities in the energy of protein amino acid chains. Table 1 shows the lowest free energy values of the Fibonacci sequences obtained by TPSO [4], BE-ABC [5], PGATS [6], ABC [7] and SFS on the 3D AB off-lattice model.

Table 1. Lowest energy values of the Fibonacci sequences.
Length | E(BE-ABC) | E(TPSO) | E(PGATS) | E(ABC) | E(SFS)

The better the algorithm, the lower the energy value it finds and the more stable the predicted protein structure. As can be seen from Table 1, SFS obtains lower energy values than the other algorithms except at length 13, where TPSO reaches a lower energy than SFS. When the length of the Fibonacci sequence is 21, 34 or 55, the lowest energy value obtained by SFS is better than those of all the other algorithms. Therefore, the SFS algorithm proposed in this paper is feasible and has advantages over the other algorithms. It is effective for predicting the structure of Fibonacci sequences.

Experiment on Real Protein Sequences

In this section, we experiment with four real protein sequences, which can be downloaded from the PDB database [12].
Table 2 shows the lowest energy values of the real protein sequences obtained with TPSO [4], Levy Flight Particle Swarm Optimizer (LPSO) [11], PGATS [6], ABC [7] and SFS.

Table 2. Lowest energy values of real protein sequences.
Length | E(TPSO) | E(LPSO) | E(PGATS) | E(ABC) | E(SFS)
Rows: 2KGU, 1CRN, 2KAP, 1PCH

From Table 2, we can see that SFS obtains lower energy values than TPSO, LPSO and PGATS for all four real protein sequences. Compared with ABC, SFS obtains lower energy values except for the 2KAP sequence. For example, for 2KGU the value obtained by SFS is lower than the best result of the other algorithms, which is that of PGATS. For 2KAP, however, the best result among the other four algorithms is that of ABC, which is lower than that of SFS. In general, we can conclude that the SFS algorithm is feasible for predicting real protein structures and performs better than the other four algorithms on most sequences. The conformations predicted by SFS also resemble the native structures of the real proteins.

Figure 1. Comparison between the predicted structure and the real structure of 1CRN.

Fig. 1 shows the predicted structure and the real structure of 1CRN. The structure predicted by SFS is depicted on the left and the real structure on the right. The structure predicted by SFS is similar to the real structure in the PDB. Therefore, SFS is a high-performance algorithm and the proposed scheme is a feasible solution for protein folding prediction.

Conclusion

In this paper, we use a new optimization algorithm, SFS, to predict the structure of proteins. In SFS, each agent in the group generates new agents around its current location through a Gaussian-distributed diffusion process, and the position of each agent is then updated using statistical rules. In this process, information is exchanged among all individuals to produce new individuals, which speeds up convergence. Therefore, SFS performs well in finding the global optimum while avoiding being trapped in local minima. The protein structure experiments in this paper generally yield better results, suggesting that SFS is more accurate and more efficient in predicting protein structure than the other algorithms. However, there is still room to improve and optimize SFS for protein structure prediction. SFS has only been tested on four real protein sequences in the experiments. Therefore, further research is required to examine the performance of the proposed algorithm on other, more complex proteins.

Acknowledgement

This work was partially supported by the National Natural Science Foundation of China (Nos. 66722, ...).

References

[1] Teso S., Risio C.D. An on/off Lattice Approach to Protein Structure Prediction from Contact Maps. Pattern Recognition in Bioinformatics, Springer Berlin Heidelberg, 2010.
[2] Anfinsen C.B. Principles that govern the folding of protein chains. Science. 1973; 181(4096).
[3] Talbi E.-G. Metaheuristics: from design to implementation.
[4] Guo H., Lan R., Chen X. Tabu search-particle swarm algorithm for protein folding prediction. Computer Engineering & Applications.
[5] Li B., Chiong R., Lin M. A balance-evolution artificial bee colony algorithm for protein structure optimization. Computational Biology and Chemistry. 2015; 54.
[6] Zhou C.J., Hou C.X., Wei X.P., Zhang Q. Improved hybrid optimization algorithm for 3D protein structure prediction. Journal of Molecular Modeling. 2014; 20(7).
[7] Li Y.Z., Zhou C.J., Zheng X.D. Artificial Bee Colony Algorithm for the Protein Structure Prediction Based on the Toy Model. Fundam Inform. 2015; 136(3).
[8] Stillinger F.H., Head-Gordon T., Hirshfeld C.L. Toy model for protein folding. Physical Review E Statistical Physics Plasmas Fluids & Related Interdisciplinary Topics. 1993; 48(2).
[9] Salimi H. Stochastic Fractal Search: A powerful metaheuristic algorithm. Knowledge-Based Syst. 2015; 75: 1-18.
[10] Yang X.S., Deb S. Engineering Optimization by Cuckoo Search. International Journal of Modeling & Optimization. 2010; 1(4).
[11] Chen X., Lv M. An Improved Particle Swarm Optimization for Protein Folding Prediction. Electronic Business. 2011.
[12] The PDB database.