Traveling Salesman Problem Based on DNA Computing

Yan Li
Weifang University, College of Computer and Communication Engineering
Weifang, P.R. China
jk97@zjnu.cn

Abstract

Molecular programming is applied to the traveling salesman problem, whose solution requires encoding real values in DNA strands. This paper introduces a new DNA encoding method for representing numerical values: DNA strands are designed to encode real values through variation of their melting temperatures. Based on the characteristics of the problem, a DNA algorithm for solving the traveling salesman problem is given. The effectiveness of the proposed method is verified by simulation.

1. Introduction

In 1961, Feynman [1] gave a visionary talk about the prospect of performing massively parallel computations in nanotechnology. The idea of using DNA for computation first took shape in the mid-1980s, when Head [2] introduced the first theoretical model of splicing systems. These ideas came to full power with Adleman's seminal experiment, which solved a small instance of the directed Hamiltonian path problem (HPP) using solely DNA molecules and biomolecular laboratory techniques [3].

DNA computing is an interdisciplinary field of research straddling computer science, molecular biology, mathematics and chemistry. One advantage of DNA computing is massive parallelism. Another is its enormous information storage capacity. DNA computing holds vast potential for solving problems that are extremely difficult to solve by conventional computation methods. It has been an area of active investigation since Adleman's paper was published in Science, and several more theoretical results and innovative experimental solutions have been introduced since then. These results are considered significant enough that much of the research has appeared in leading scientific journals such as Science and Nature.

The DNA molecule plays the central role in DNA-based computing.
DNA is a polymer strung together from monomers called deoxyribonucleotides. Each deoxyribonucleotide contains three components: a sugar, a phosphate group and a nitrogenous base. The sugar has five carbon atoms, which are given a fixed numbering for reference. Because the base also contains carbons, the carbons of the sugar are numbered 1′ to 5′ to avoid confusion. The phosphate group is attached to the 5′ carbon, and the base is attached to the 1′ carbon. Within the sugar structure there is a hydroxyl group attached to the 3′ carbon. Nucleotides are distinguished only by their bases, which come in two sorts: purines and pyrimidines. The purines are adenine and guanine, abbreviated A and G. The pyrimidines are cytosine and thymine, abbreviated C and T. Because nucleotides differ only in their bases, they are simply referred to as A, G, C or T nucleotides, depending on the base they carry. Mathematically, this means we have at our disposal a four-letter alphabet X = {A, G, C, T} to encode information.

Nucleotides can link together in two different ways. In the first, the 5′-phosphate group of one nucleotide is joined with the 3′-hydroxyl group of the other, forming a phosphodiester bond. The resulting molecule has the 5′-phosphate group of one nucleotide, denoted the 5′ end, and the 3′-OH group of the other nucleotide, denoted the 3′ end, available for bonding. This gives the molecule a directionality, and we can speak of the 5′-to-3′ direction or the 3′-to-5′ direction. In the second, the base of one nucleotide interacts with the base of the other through hydrogen bonds. This bonding is subject to the following restriction on base pairing: A and T can pair together, and C and G can pair together; no other pairings are possible. This pairing principle is called Watson-Crick complementarity.
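As a small illustration of Watson-Crick complementarity and strand directionality, the following Python sketch computes the complement and the reverse complement of a strand written in the 5′-to-3′ direction. The function names are hypothetical and chosen for this example only; the model ignores mismatches, partial hybridization and all laboratory conditions.

```python
# Watson-Crick pairing rule: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, read in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Complementary strand written 5'->3'.

    Paired strands are antiparallel, so the partner of a 5'->3' strand
    is the complement read in reverse order.
    """
    return complement(strand)[::-1]


def can_hybridize(a: str, b: str) -> bool:
    """True if two 5'->3' strands are perfect Watson-Crick partners.

    Idealized assumption: only exact, full-length matches count.
    """
    return b.upper() == reverse_complement(a)


if __name__ == "__main__":
    strand = "AGGCTGGGCC"
    print(strand, "pairs with", reverse_complement(strand))
```

Reversing the complement is what encodes the antiparallel orientation: two strands written 5′-to-3′ pair only if one is the reverse complement of the other.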
A DNA strand is essentially a sequence (polymer) of four types of nucleotides, distinguished by the base each contains. Under appropriate conditions, two strands of DNA can form a double strand if their respective bases are Watson-Crick complements of each other: A matches T and C matches G, with the 3′ end of one strand aligned to the 5′ end of the other. Some of the simple operations that can be performed on DNA sequences are carried out by enzymes that execute a few basic tasks. The biological techniques used include hybridization, ligation, extraction, polymerase chain reaction (PCR), gel electrophoresis, melting and annealing.

Most previous research in DNA computing has not needed to consider the representation of numerical data in DNA strands. However, many practical applications involve edge-weighted graph problems, such as the traveling salesman problem, the shortest path problem and the minimum spanning tree problem. Representing numerical data in DNA strands is therefore an important step toward extending DNA computing to numerical optimization problems. Previous work exists on representing numerical data with DNA, but the results are not yet satisfactory [4, 5]. In this paper, similar to Ji Youn Lee's method [6], we introduce an encoding method that uses a temperature gradient to overcome the drawbacks of previous work.

The paper is organized as follows. Section 2 describes the DNA encoding scheme. Section 3 introduces 3-dimensional DNA graph structures. Section 4 describes the DNA computation for solving the TSP. Section 5 draws conclusions.

2. DNA Coding for TSP

2.1. Traveling Salesman Problem

The traveling salesman problem is to find a minimum-cost path for a given set of vertices (cities) and edges (roads). The solution path must visit every given city exactly once, and must begin at a specified city and end at that same city.
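As a rough illustration of this informal statement, the following Python sketch checks whether a sequence of cities is a valid tour and computes its cost on an undirected weighted graph. The function names and the 4-city distance table are hypothetical and are not taken from the paper; the graph of Figure 1 is not reproduced here.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def is_valid_tour(tour: List[int], cities: Set[int], start: int) -> bool:
    """A tour lists every city exactly once and begins at the start city.

    The return leg to the start city is implicit.
    """
    return (
        len(tour) == len(cities)
        and set(tour) == cities
        and tour[0] == start
    )


def tour_cost(tour: List[int], dist: Dict[Edge, int]) -> Optional[int]:
    """Cost of the closed tour, or None if it uses a road that does not exist."""
    total = 0
    # Pair each city with its successor, wrapping around to the first city.
    for a, b in zip(tour, tour[1:] + tour[:1]):
        weight = dist.get((a, b), dist.get((b, a)))  # undirected roads
        if weight is None:
            return None
        total += weight
    return total


if __name__ == "__main__":
    # Hypothetical 4-city instance used only for illustration.
    example = {(0, 1): 3, (1, 2): 5, (2, 3): 3, (3, 0): 7, (0, 2): 9}
    print(tour_cost([0, 1, 2, 3], example))
```

Returning `None` for a missing road keeps infeasible tours distinct from expensive ones, which matters when the graph is not complete.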
More formally [7], given a set $C$ of $m$ cities, a distance $d(c_i, c_j) \in Z^+$ for each pair of cities $c_i, c_j \in C$, and a positive integer $B$, the TSP asks for a tour of $C$ having length $B$ or less, i.e., a permutation $(c_{\pi(1)}, c_{\pi(2)}, \ldots, c_{\pi(m)})$ of $C$ such that

$$\left( \sum_{i=1}^{m-1} d(c_{\pi(i)}, c_{\pi(i+1)}) \right) + d(c_{\pi(m)}, c_{\pi(1)}) \le B \tag{1}$$

Figure 1 shows an instance: an edge-weighted graph with 7 nodes and 10 edges taking five distinct cost values. The starting vertex is 0.

Figure 1. The 7-city traveling salesman problem solved in this work. The paths start and end with city 0. The circles denote the cities and the edges represent the roads. The number on each edge represents the cost of the given road.

2.2. Coding Scheme

In 1998, Narayanan and Zorbalas [8] presented a conceptual encoding method that represents costs by the lengths of DNA strands. However, this method uses a code length proportional to the absolute value of the weights, so the code length grows quickly as the resolution of the weights increases. In addition, the method does not seem appropriate for representing real values. Shin et al. [9] proposed a method for representing real numbers in fixed-length DNA strands by varying the number of hydrogen bonds. Although the number of hydrogen bonds is an important factor in the thermal stability of DNA strands, it is not sufficient on its own. The melting temperature ($T_m$) is a more direct indicator of the stability of a DNA duplex.

This paper proposes a melting temperature control encoding method. The method uses fixed-length DNA strands and represents weights by the melting temperatures of the given DNA strands. The basic $T_m$ calculations were performed according to the following equation [10]:

$$T_m = 64.9 + 41.0 \times \frac{yG + zC - 16.4}{wA + xT + yG + zC} \tag{2}$$

where $w$, $x$, $y$ and $z$ are the numbers of A, T, G and C bases, respectively.
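The melting temperature formula above, T_m = 64.9 + 41.0 × (G + C − 16.4) / (A + T + G + C) with each letter standing for a base count, can be evaluated with a short Python sketch. The function name is hypothetical; the five sequences are the weight sequences listed in Table 2 of this paper, and the sketch is meant only as a way to check that higher weights map to higher computed melting temperatures.

```python
def melting_temperature(seq: str) -> float:
    """Basic Tm estimate in degrees Celsius from base counts.

    Tm = 64.9 + 41.0 * (G + C - 16.4) / (A + T + G + C)
    """
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be a non-empty string over A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    # len(seq) equals A + T + G + C because only those four letters are allowed.
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


# Weight sequences (5'->3') copied from Table 2 of the paper.
WEIGHT_SEQUENCES = {
    3: "TCTATAAAATGAATTGCATG",
    5: "AGTCTATACGAATGAGTCAC",
    7: "AGGATCCTGTTCTCCTGACG",
    9: "TGCCCAGGCATTACGCGTTC",
    11: "GGCACGACTGCTGGCAGCCG",
}

if __name__ == "__main__":
    for weight, seq in sorted(WEIGHT_SEQUENCES.items()):
        print(weight, seq, round(melting_temperature(seq), 2))
```

For a fixed strand length the formula depends only on the G/C count, so ordering the weights by melting temperature is the same as ordering them by G/C content.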
Equation (2) assumes that annealing occurs under standard conditions in a buffered solution of 50 mM Na+ and 50 nM oligonucleotide concentration, with a pH close to 7.0. The basic idea is to design the sequences so that DNA strands for higher weight values have higher melting temperatures than those for lower weight values. The weight sequences reflect the proportions of the edge weights, so this coding scheme can express real-valued weights, and a more economical path therefore has a lower G/C content. Each vertex sequence is designed to have the same melting temperature, because vertex sequences should contribute equally to the thermal stability of a path.

The coding scheme is illustrated in Figure 2. Figure 2(a) depicts a vertex sequence, which consists of two parts: the first half is a 10-bp (base pair) position sequence ($p_i$), and the second half is a 10-bp weight sequence ($W_i$). Figure 2(b) shows an edge sequence, which consists of two components: one is complementary to the coding of vertex $v_i$, and the other is complementary to the coding of vertex $v_j$. The polarity of edge codes (3′→5′) is opposite to that of vertex codes. The position sequence represents a specific vertex, and the weight sequence denotes the weight value of an edge.

Figure 2. Coding scheme.

2.3. Genetic Code Optimization

Genetic algorithms (GAs), which mimic the adaptation and evolution of living creatures, have been proposed for designing better machines and for building self-evolving computer programs. GAs were inspired by the challenge of rendering tractable those problems that are hardly solvable by conventional computers, using analogies from natural biological phenomena. Here we sketch an application of a DNA-based evolution program to the search for good DNA encodings. The three fundamental operators of GAs based on the DNA coding method are crossover, mutation and inversion.

Crossover: a process of exchanging genetic information. Crossover has one-point and multi-point variants, and each crossover point is set randomly. It has been concluded that two crossover points seem to be optimal for multi-point crossover.

Mutation: two kinds of point mutation exist, transition and transversion. In a transition mutation, purines are replaced by purines, and pyrimidines by pyrimidines. In a transversion mutation, purines are replaced by pyrimidines, and pyrimidines by purines.

Inversion: the order of genes between two randomly chosen positions is inverted within the vertex sequences.

The fitness function is defined to promote paths formed with lower costs, so that the minimum-cost path can be found. Let $T$ be a temperature and $T_m$ the melting temperature. Then the function is defined as:

$$F_i = \begin{cases} T - T_m & \text{if } T - T_m \ge \delta, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$

where the threshold value $\delta$ is determined by experiments.

3. Construction of Graphs by DNA

In 1998, Jonoska et al. [11] proposed that 3-dimensional (3D) graph structures can be used for solving computational problems with DNA molecules. For the satisfiability and the 3-vertex-colorability problems, they gave procedures that use 3D graph structures. In both cases, k-armed branched junction molecules are used to represent the k-degree vertices of the graph. Examples of 2-, 3- and 4-armed branched molecules are presented in Figure 3. The 3′ ends of the k-armed branched molecules end with single-stranded extensions. The polarity of the DNA strands is indicated by arrowheads placed at the 3′ end. Hydrogen bonds between antiparallel complementary Watson-Crick bases are depicted as dotted segments between the strands.

Figure 3. Building blocks for vertices and edges.

3D DNA structures that contain no open ends are referred to as graph structures. To form the graph, all vertex building blocks and all edge molecules are combined, and their ends are allowed to form double-stranded DNA according to Watson-Crick complementarity. Once formed, the edges are locked together by sealing all open nicks in the DNA strands with DNA ligase.

4. DNA Algorithm for Solving the TSP

The molecular algorithms adopted in this paper are the same as the iterative version of molecular programming described in Ref. [12]. Simulation was performed on the graph depicted in Figure 1.

Step 1, encoding: determine the code sequences using genetic code optimization. The genetic algorithm parameters for code optimization are listed in Table 1. Table 2 shows the weight sequences. Table 3 shows the position sequences of the vertices.

Table 1. Genetic algorithm parameters for DNA code optimization of the TSP

Parameter             Value
Population size       300
Max generation        300
Selection method      Roulette wheel
Crossover rate        0.6
Mutation rate         0.3
Ligation error rate   0.001
Threshold             0.01

Table 2. Weight sequences for the 7-city traveling salesman problem

Weight   Sequence (5′→3′)        Tm (°C)
3        TCTATAAAATGAATTGCATG    41.53
5        AGTCTATACGAATGAGTCAC    47.68
7        AGGATCCTGTTCTCCTGACG    53.83
9        TGCCCAGGCATTACGCGTTC    55.88
11       GGCACGACTGCTGGCAGCCG    62.03

Table 3. The first half of the vertex sequences for the 7-city traveling salesman problem

Vertex   Position sequence (5′→3′)
0        AGGCTGGGCC
1        ACCGGCCGGA
2        CCTGTCCGCG
3        TTACGCGCCC
4        AGCGGCAGCC
5        GGGCCAGGCT
6        CAGGCAGCCG

Step 2, produce all candidate solutions by molecular operators: combine multiple copies of all vertex building blocks with all edge molecules at random, and allow the complementary ends to hybridize and be ligated.

Step 3, remove partially formed 3D DNA structures with open ends that have not been matched.

Step 4, select those DNA graphs that contain exactly n vertices and whose path length is n+1.

Step 5, keep only those paths that enter every vertex of the graph at least once.

Step 6, since the starting city and the finishing city are the same in the traveling salesman problem, keep only those paths that start and end with city 0.

Finally, the path with the lowest Tm is chosen as the solution; it represents the minimum-cost path. In this work, the minimum-cost path is 0 → 1 → 2 → 3 → 4 → 5 → 6 → 0, and the cost of this path is 27.

5. Conclusions

We show that 3D DNA graph structures can be used for solving computational problems with DNA molecules. Vertex building blocks consisting of k-armed branched junction molecules are used to form the graph. A detailed description of forming the building blocks and graph structures can be found in [13].
We presented a weight encoding method, and the associated DNA computation, that uses the melting temperature of DNA strands and 3D DNA graph structures to solve the traveling salesman problem. One of the most important features of our method is that it can handle the quantitative expression of real numbers using fixed-length DNA strands. In terms of efficiency, the use of 3D DNA structures could significantly reduce the time and the number of steps needed to identify a solution.

References

[1] R.P. Feynman. Miniaturization. D.H. Gilbert (Ed.), Reinhold, New York, 282-296, 1961.
[2] T. Head. Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors. Bulletin of Mathematical Biology, 737-759, 1987.
[3] L.M. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266(5187):1021-1024, 1994.
[4] D. Faulhammer, R.J. Lipton, L.F. Landweber. Counting DNA: estimating the complexity of a test tube of DNA. BioSystems, 52:193-196, 1999.
[5] D.G. Susan, P.R. Paul, G.L. Max. Computation with DNA on surfaces. Surface Science, 500:699-721, 2002.
[6] Ji Youn Lee et al. Solving traveling salesman problems with DNA molecules encoding numerical values. BioSystems, 39-47, 2004.
[7] M.R. Garey, D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.
[8] A. Narayanan, S. Zorbalas. DNA algorithms for computing shortest paths. Proceedings of Genetic Programming, Morgan Kaufmann, 718-723, 1998.
[9] S.Y. Shin et al. Evolutionary sequences generation for reliable DNA computing. Proceedings of the Congress on Evolutionary Computation 1999, 994-1000, 1999.
[10] J.G. Wetmur. DNA probes: applications of the principles of nucleic acid hybridization. Critical Reviews in Biochemistry and Molecular Biology, 26:227-259, 1991.
[11] N. Jonoska, S. Kari, M. Saito. Graph Structures in DNA Computing, in Computing with Bio-Molecules, Theory and Experiments. Springer-Verlag, 93-110, 1998.
[12] Liu Xikui, Li Yan, Xu Jin. Solving Minimum Spanning Tree Problem with DNA Computing. Journal of Electronics, 22(2):112-117, 2005.
[13] F.M. Ausubel, R. Brent, R.E. Kingston, D.D. Moore, J.G. Seidman, J.A. Smith, K. Struhl, P. Wang-Iverson and S.G. Bonotz. Current Protocols in Molecular Biology. Greene Publishing Associates and Wiley-Interscience, New York, 1993.