Traveling Salesman Problem Based on DNA Computing

Yan Li
Weifang University, College of Computer and Communication Engineering
Weifang, P.R. China
jk97@zjnu.cn

Abstract

Molecular programming is applied to the traveling salesman problem, whose solution requires encoding real values in DNA strands. This paper introduces a new DNA encoding method for representing numerical values. DNA strands are designed to encode real values through variation of their melting temperatures. Based on the characteristics of the problem, a DNA algorithm for solving the traveling salesman problem is given. The effectiveness of the proposed method is verified by simulation.

1. Introduction

In a visionary talk published in 1961, Feynman [1] described the prospect of performing massively parallel computations in nanotechnology. The idea of using DNA for computation matured in the mid-1980s, when Head [2] introduced the first theoretical model of splicing systems. These ideas came to full power with Adleman's seminal experiment, which solved a small instance of the directed Hamiltonian path problem (HPP) using solely DNA molecules and biomolecular laboratory techniques [3].

DNA computing is an interdisciplinary field of research straddling computer science, molecular biology, mathematics and chemistry. One advantage of DNA computing is massive parallelism. Another is its enormous information storage capacity. DNA computing holds vast potential for solving problems that are extremely difficult to solve by conventional computation methods. It has been an area of active investigation since Adleman's paper was published in Science, and several more theoretical results and innovative experimental solutions have been introduced since then. These results are considered significant enough that much of the research has been published in leading scientific journals such as Science and Nature.

DNA is the molecule that plays the central role in DNA-based computing.
DNA is a polymer strung together from monomers called deoxyribonucleotides. Each deoxyribonucleotide contains three components: a sugar, a phosphate group and a nitrogenous base. The sugar has five carbon atoms, which are given a fixed numbering for reference. Because the base also has carbons, the carbons of the sugar are numbered from 1' to 5' to avoid confusion. The phosphate group is attached to the 5' carbon, and the base is attached to the 1' carbon. Within the sugar structure there is a hydroxyl group attached to the 3' carbon.

Nucleotides differ only in their bases, which come in two sorts: purines and pyrimidines. The purines are adenine and guanine, abbreviated A and G. The pyrimidines are cytosine and thymine, abbreviated C and T. Because nucleotides are distinguished only by their bases, they are simply referred to as A, G, C or T nucleotides, depending on the base they carry. Mathematically, this means we have at our disposal a four-letter alphabet X = {A, G, C, T} to encode information.

Nucleotides can link together in two different ways. In the first, the 5'-phosphate group of one nucleotide is joined with the 3'-hydroxyl group of the other, forming a phosphodiester bond. The resulting molecule has the 5'-phosphate group of one nucleotide, denoted the 5' end, and the 3'-OH group of the other nucleotide, denoted the 3' end, available for bonding. This gives the molecule its directionality, and we can speak of the 5'-to-3' direction or the 3'-to-5' direction. In the second, the base of one nucleotide interacts with the base of the other to form hydrogen bonds. This bonding is subject to the following restriction on base pairing: A and T can pair together, and C and G can pair together; no other pairings are possible. This pairing principle is called Watson-Crick complementarity.
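The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the names `COMPLEMENT`, `complement`, `reverse_complement` and `can_hybridize` are our own, and the sketch considers perfect matches between strands of equal length only, ignoring mismatches and reaction conditions. The reversal reflects the fact that the two strands of a duplex run in opposite 5'-to-3' directions.

```python
# Watson-Crick pairing over the alphabet {A, G, C, T}: A<->T and C<->G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complementary strand written 5'->3'.

    The partner strand runs antiparallel, so the complement is reversed.
    """
    return complement(strand)[::-1]


def can_hybridize(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' strands are perfect Watson-Crick partners.

    Simplifying assumption: only full-length, mismatch-free duplexes count.
    """
    return reverse_complement(strand_a) == strand_b


if __name__ == "__main__":
    s = "AGGCTGGGCC"
    print(s, "pairs with", reverse_complement(s))
```

For example, `can_hybridize("AGGCTGGGCC", "GGCCCAGCCT")` holds, while two copies of the same non-palindromic strand do not pair.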
A DNA strand is essentially a sequence (polymer) of four types of nucleotides, distinguished by which of the four bases they contain. Two strands of DNA can form a double strand (under appropriate conditions) if their respective bases are Watson-Crick complements of each other: A pairs with T, C pairs with G, and the 3' end of one strand aligns with the 5' end of the other. Some of the simple operations that can be performed on DNA sequences are carried out by a number of enzymes that execute a few basic tasks. The biological techniques include hybridization, ligation, extraction, polymerase chain reaction (PCR), gel electrophoresis, melting, annealing and others.

Most previous research in DNA computing did not need to consider the representation of numerical data in DNA strands. However, many practical applications involve edge-weighted graph problems. Examples include the traveling salesman problem, the shortest path problem and the minimum spanning tree problem. Representing numerical data in DNA strands is therefore an important step toward expanding the capability of DNA computing to solve numerical optimization problems. There is previous work on representing numerical data with DNA, but the results are not yet satisfactory [4, 5]. In this paper, following the method of Ji Youn Lee et al. [6], we introduce an encoding method that uses a temperature gradient to overcome the drawbacks of previous work.

The paper is organized as follows. Section 2 describes the DNA encoding scheme. Section 3 introduces 3-dimensional DNA graph structures. Section 4 describes the DNA computing procedure for solving the TSP. Section 5 draws conclusions.

2. DNA Coding for TSP

2.1. Traveling Salesman Problem

The traveling salesman problem is to find a minimum cost path for a given set of vertices (cities) and edges (roads). The solution path must visit every given city exactly once, and it must begin and end at the same specified city.
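To make the problem statement concrete, the following is a minimal brute-force sketch on a conventional computer; it is not the DNA algorithm of this paper. The 4-city instance `roads` is hypothetical (the edge costs of Figure 1 are not reproduced here), and the names `tour_cost` and `brute_force_tsp` are our own. Roads are undirected and may be missing, in which case a tour that needs them is rejected.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Optional, Tuple

Edge = FrozenSet[int]


def tour_cost(tour: List[int], cost: Dict[Edge, float]) -> Optional[float]:
    """Cost of the closed tour (last city returns to the first).

    Returns None if the tour uses a road that does not exist.
    """
    total = 0.0
    # Pair each city with its successor; the last city wraps to the first.
    for a, b in zip(tour, tour[1:] + tour[:1]):
        edge = frozenset((a, b))
        if edge not in cost:
            return None
        total += cost[edge]
    return total


def brute_force_tsp(
    n: int, cost: Dict[Edge, float], start: int = 0
) -> Tuple[Optional[List[int]], Optional[float]]:
    """Try every ordering of the other cities; keep the cheapest valid tour."""
    best_tour: Optional[List[int]] = None
    best_cost: Optional[float] = None
    others = [c for c in range(n) if c != start]
    for perm in permutations(others):
        tour = [start, *perm]
        c = tour_cost(tour, cost)
        if c is not None and (best_cost is None or c < best_cost):
            best_tour, best_cost = tour, c
    return best_tour, best_cost


# Hypothetical 4-city instance, for illustration only.
roads: Dict[Edge, float] = {
    frozenset((0, 1)): 3, frozenset((1, 2)): 5, frozenset((2, 3)): 3,
    frozenset((3, 0)): 7, frozenset((0, 2)): 9, frozenset((1, 3)): 11,
}

if __name__ == "__main__":
    print(brute_force_tsp(4, roads))
```

The number of orderings examined grows factorially with the number of cities, which is the motivation for exploring massively parallel approaches such as DNA computing.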
More formally [7], given a set $C$ of $m$ cities, a distance $d(c_i, c_j) \in Z^+$ for each pair of cities $c_i, c_j \in C$, and a positive integer $B$, the TSP is to determine a tour of $C$ having length $B$ or less, i.e., a permutation $(c_{\pi(1)}, c_{\pi(2)}, \ldots, c_{\pi(m)})$ of $C$ such that

$$\left( \sum_{i=1}^{m-1} d(c_{\pi(i)}, c_{\pi(i+1)}) \right) + d(c_{\pi(m)}, c_{\pi(1)}) \le B \tag{1}$$

Figure 1 shows an instance of an edge-weighted graph that has 7 nodes and ten edges with five distinct cost values. Let the starting vertex be 0.

Figure 1. The 7-city traveling salesman problem solved in this work. The paths start and end with city 0. The circles denote the cities and the edges represent the roads. The number on each edge represents the cost of the given road.

2.2. Coding Scheme

In 1998, Narayanan and Zorbalas [8] presented a conceptual encoding method that represents costs by the lengths of DNA strands. However, this method uses a code length that is proportional to the absolute value of the weights, so the code length grows quickly as the resolution of the weights increases. In addition, this method does not seem appropriate for representing real values. Shin et al. [9] proposed a method for representing real numbers in fixed-length DNA strands by varying the number of hydrogen bonds. Although the number of hydrogen bonds is an important factor in the thermal stability of DNA strands, it is not a sufficient one. The melting temperature ($T_m$) is a more direct characteristic indicating the stability of a DNA duplex.

This paper proposes a melting temperature control encoding method. The method uses fixed-length DNA strands and represents weights by the melting temperatures of the given DNA strands. The basic $T_m$ calculations were performed according to the following equation [10]:

$$T_m = 64.9 + 41.0 \times \frac{yG + zC - 16.4}{wA + xT + yG + zC} \tag{2}$$

where $x$, $y$, $w$ and $z$ are the numbers of T, G, A and C bases, respectively.
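As a quick check of Equation (2), the following sketch evaluates it for the five weight sequences listed in Table 2. The function name `melting_temperature` and the dictionary `WEIGHT_SEQUENCES` are our own; the formula and the sequences are taken from the paper. The sketch assumes the strand contains only the letters A, T, G and C.

```python
def melting_temperature(seq: str) -> float:
    """Basic Tm estimate of Equation (2), in degrees Celsius.

    Tm = 64.9 + 41.0 * (#G + #C - 16.4) / (#A + #T + #G + #C)
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ATGC"}
    total = sum(counts.values())
    if total == 0 or total != len(seq):
        raise ValueError("sequence must be non-empty and contain only A, T, G, C")
    gc = counts["G"] + counts["C"]
    return 64.9 + 41.0 * (gc - 16.4) / total


# Weight sequences (5'->3') copied from Table 2 of the paper.
WEIGHT_SEQUENCES = {
    3: "TCTATAAAATGAATTGCATG",
    5: "AGTCTATACGAATGAGTCAC",
    7: "AGGATCCTGTTCTCCTGACG",
    9: "TGCCCAGGCATTACGCGTTC",
    11: "GGCACGACTGCTGGCAGCCG",
}

if __name__ == "__main__":
    for weight, seq in WEIGHT_SEQUENCES.items():
        print(weight, seq, f"{melting_temperature(seq):.2f}")
```

Run on the Table 2 sequences, this yields 41.53, 47.68, 53.83, 55.88 and 62.03, matching the tabulated values and increasing with the edge weight. Since all strands have the same length, the estimate depends only on the G/C count.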
Equation (2) assumes that annealing occurs under standard conditions in a buffered solution of 50 mM Na+ and 50 nM oligonucleotide concentration, with a pH close to 7.0. The basic idea is to design the sequences so that the DNA strands for higher-weight values have higher melting temperatures than those for lower-weight values. The weight sequences describe the proportion of edge weights, so this coding scheme can express real-valued weights, and a more economical path therefore has a lower G/C content. Each vertex sequence is designed to have the same melting temperature, because vertex sequences should contribute equally to the thermal stability of a path.

The coding scheme is illustrated in Figure 2. Figure 2(a) depicts the vertex sequence, which consists of two parts: the first half is a 10-bp (base pair) position sequence ($p_i$), and the second half is a 10-bp weight sequence ($W_i$). Figure 2(b) shows the edge sequence, which consists of two components: one is complementary to the coding of vertex $v_i$, and the other is complementary to the coding of vertex $v_j$. The polarity of the edge codes (3'→5') is opposite to that of the vertex codes. The position sequence represents a specific vertex, and the weight sequence denotes the weight value of an edge.

Figure 2. Coding scheme.

2.3. Genetic Code Optimization

Genetic algorithms (GAs), which mimic the adaptation and evolution of living creatures, have been proposed to design better machines and to make self-evolving computer programs. GAs were inspired by the challenge of rendering tractable those problems that are hardly solvable by conventional computers, using analogies from natural biological phenomena. An application of the DNA-based evolution program to the search for a good DNA encoding is sketched here. The three fundamental operators of GAs based on the DNA coding method are crossover, mutation and inversion.

Crossover: a process of exchanging genetic information. Crossover has one-point and multi-point variants, and the crossover point is set randomly. Two points appears to be the optimal number for multi-point crossover.

Mutation: two kinds of point mutation exist, transition and transversion. In a transition mutation, purines are replaced by purines, and pyrimidines by pyrimidines. In a transversion mutation, purines are replaced by pyrimidines, and pyrimidines by purines.

Inversion: the order of genes between two randomly chosen positions is inverted within the vertex sequences.

The fitness function is defined to promote paths formed with lower costs, so that the minimum cost path can be found. Let $T$ be a temperature and $T_m$ the melting temperature. Then the function is defined as

$$F_i = \begin{cases} T - T_m & \text{if } T - T_m \ge \delta, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where the threshold value $\delta$ is determined by experiments.

3. Construction of Graphs by DNA

In 1998, Jonoska et al. [11] proposed that 3-dimensional (3D) graph structures can be used for solving computational problems with DNA molecules. For the satisfiability and 3-vertex-colorability problems, they gave procedures that use 3D graph structures. In both cases, k-armed branched junction molecules are used to represent the degree-k vertices of the graph. Examples of 2-, 3- and 4-armed branched molecules are presented in Figure 3. The 3' ends of the k-armed branched molecules end with single-stranded extensions. The polarity of the DNA strands is indicated by arrowheads placed at the 3' end. Hydrogen bonds between the antiparallel Watson-Crick complementary strands are depicted as dotted segments between the strands.

Figure 3. Building blocks for vertices and edges.

3D DNA structures that do not contain open ends are referred to as graph structures. To form the graph, all vertex building blocks and all edge molecules are combined, and their ends are allowed to form double-stranded DNA according to Watson-Crick complementarity. Once formed, the edges are locked together by sealing all open nicks in the DNA strands with DNA ligase.

4. DNA Algorithms Solving the TSP

The molecular algorithms adopted in this paper are the same as the iterative version of molecular programming described in Ref. [12]. The simulation was performed on the graph depicted in Figure 1.

Step 1, encoding: determine the code sequences using genetic code optimization. The genetic algorithm parameters for code optimization are listed in Table 1. Table 2 shows the weight sequences. Table 3 shows the position sequences of the vertices.

Table 1. Genetic algorithm parameters for DNA code optimization of the TSP

| Parameter | Value |
|---|---|
| Population size | 300 |
| Max generation | 300 |
| Selection method | Roulette wheel |
| Crossover rate | 0.6 |
| Mutation rate | 0.3 |
| Ligation error rate | 0.001 |
| Threshold | 0.01 |

Table 2. Weight sequences for the 7-city traveling salesman problem

| Weight | Sequence (5'→3') | $T_m$ (°C) |
|---|---|---|
| 3 | TCTATAAAATGAATTGCATG | 41.53 |
| 5 | AGTCTATACGAATGAGTCAC | 47.68 |
| 7 | AGGATCCTGTTCTCCTGACG | 53.83 |
| 9 | TGCCCAGGCATTACGCGTTC | 55.88 |
| 11 | GGCACGACTGCTGGCAGCCG | 62.03 |

Table 3. The first half of the vertex sequences for the 7-city traveling salesman problem

| Vertex | Position sequence (5'→3') |
|---|---|
| 0 | AGGCTGGGCC |
| 1 | ACCGGCCGGA |
| 2 | CCTGTCCGCG |
| 3 | TTACGCGCCC |
| 4 | AGCGGCAGCC |
| 5 | GGGCCAGGCT |
| 6 | CAGGCAGCCG |

Step 2, produce all candidate solutions by molecular operators. Combine multiple copies of all vertex building blocks with all edge molecules at random, and allow the complementary ends to hybridize and be ligated.

Step 3, remove partially formed 3D DNA structures with open ends that have not been matched.

Step 4, select those DNA graphs that contain exactly n vertices and whose path length is n+1.

Step 5, keep only those paths that enter every vertex of the graph at least once.

Step 6, since the starting city and the finishing city are the same in the traveling salesman problem, keep only those paths that start and end with city 0.

Finally, the path that has the lowest $T_m$ is chosen as the solution; it represents the minimum cost path. In this work, the minimum cost path is 0 → 1 → 2 → 3 → 4 → 5 → 6 → 0, and the cost of this path is 27.

5. Conclusions

We have shown that 3D DNA graph structures can be used for solving computational problems with DNA molecules. Vertex building blocks consisting of k-armed branched junction molecules are used to form the graph. A detailed description of forming the building blocks and graph structures can be found in [13].
We presented a weight encoding method, and an associated DNA computing procedure, that use the melting temperature of DNA strands and 3D DNA graph structures to solve the traveling salesman problem efficiently. One of the most important features of our method is that it can handle the quantitative expression of real numbers using fixed-length DNA strands. In terms of efficiency, the use of 3D DNA structures could significantly reduce the time and the number of steps needed to identify a solution.

References

[1] R.P. Feynman. Miniaturization. D.H. Gilbert (Ed.), Reinhold, New York, 282-296, 1961.
[2] T. Head. Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors. Bulletin of Mathematical Biology, 737-759, 1987.
[3] L.M. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266(5187):1021-1024, 1994.
[4] D. Faulhammer, R.J. Lipton, L.F. Landweber. Counting DNA: estimating the complexity of a test tube of DNA. BioSystems, 52:193-196, 1999.
[5] D.G. Susan, P.R. Paul, G.L. Max. Computation with DNA on surfaces. Surface Science, 500:699-721, 2002.
[6] Ji Youn Lee et al. Solving traveling salesman problems with DNA molecules encoding numerical values. BioSystems, 39-47, 2004.
[7] M.R. Garey, D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.

[8] A. Narayanan, S. Zorbalas. DNA algorithms for computing shortest paths. Proceedings of Genetic Programming, Morgan Kaufmann, 718-723, 1998.
[9] S.Y. Shin et al. Evolutionary sequence generation for reliable DNA computing. Proceedings of the Congress on Evolutionary Computation 1999, 994-1000, 1999.
[10] J.G. Wetmur. DNA probes: applications of the principles of nucleic acid hybridization. Critical Reviews in Biochemistry and Molecular Biology, 26:227-259, 1991.
[11] N. Jonoska, S. Kari, M. Saito. Graph structures in DNA computing. In Computing with Bio-Molecules: Theory and Experiments. Springer-Verlag, 93-110, 1998.
[12] Liu Xikui, Li Yan, Xu Jin. Solving minimum spanning tree problem with DNA computing. Journal of Electronics, 22(2):112-117, 2005.
[13] F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, K. Struhl, P. Wang-Iverson and S.G. Bonotz. Current Protocols in Molecular Biology. Greene Publishing Associates and Wiley-Interscience, New York, 1993.