Part 1: Motivation, Basic Concepts, Algorithms


Review of Biological Evolution
Evolution is a long-time-scale process that changes a population of organisms by generating better offspring through reproduction.
Population of chromosomes.
Chromosome: DNA-coded information characterizing an organism.
Gene: Elementary DNA block of information (e.g., eye color).
Allele: One of the possible values for a gene (e.g., brown, blue, ...).
Trait: The physical characteristic encoded by a gene.
Genotype: A particular set of genes.
Phenotype: The physical realization of a genotype (e.g., a person).
Fitness: A measure of success in life for an organism.
Crossover (Recombination): Chromosomes from the parents exchange genetic material to generate a new offspring.
Mutation: Error occurring during DNA replication from parents.

Genetic algorithms (GAs) mimic the processes of biological evolution in order to solve combinatorial optimization problems and to model evolutionary systems.
Developed by John Holland, University of Michigan (1970s), with two aims:
To understand the adaptive processes of natural systems.
To develop ways in which the mechanisms of natural adaptation might be applied to combinatorial optimization problems and machine learning applications.

Why use the mechanisms of natural evolution for solving computational problems?
Evolution searches among an enormous number of possible genetic sequences to create highly fit organisms that survive and reproduce in their environments.
Species evolve by means of random variation (via mutation, recombination, and other operators), followed by natural selection, in which the fittest tend to survive and reproduce, thus propagating their genetic material to future generations.
Likewise, combinatorial optimization problems involve searching through a large solution space for optimal or near-optimal solutions.

GAs have been applied to:
Optimization (e.g., circuit layout, job shop scheduling, ...)
Prediction (e.g., weather forecasting, protein folding, ...)
Classification (e.g., fraud detection, quality assessment, ...)
Economy (e.g., bidding strategies, market evaluation, ...)
Ecology (e.g., biological arms races, host-parasite coevolution, ...)
Automatic programming.
Best suited for:
Large, non-convex search spaces.
Finding a good sub-optimum in a reasonable time, rather than spending years on finding, perhaps, the best solution.

GAs have the following elements in common:
Population of chromosomes (multiple candidate solutions are held).
Selection according to fitness.
Crossover to produce new offspring.
Random mutation of new offspring.
A single solution is represented as a vector of components. The vector is called a chromosome, and its components are called genes.
There is a population of chromosomes (solutions) that cooperatively act towards a common goal (communication between candidates).
The GA is an evolutionary, iterative algorithm that modifies the population of solutions at each epoch (cycle, iteration). The modification is done by crossover and mutation.

{
    choose initial population;
    measure fitness of each individual of the population;
    do while Termination Criteria Not Satisfied
    {
        select parents for reproduction;
        perform recombination and mutation;
        measure fitness of each individual of the population;
    }
}
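The loop above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the deck's own implementation: the chromosomes are bit strings, parents are picked by a two-way tournament, recombination is a single cut point, and termination is a fixed generation count. The name `run_ga` and all of its parameters are hypothetical.

```python
import random

def run_ga(fitness, n_genes, pop_size=20, generations=40, p_mut=0.02, seed=0):
    """Evolve bit strings (n_genes >= 2); return (best chromosome, best-so-far history)."""
    rng = random.Random(seed)

    def pick_parent(pop, scores):
        # Assumed selection scheme: the fitter of two random individuals wins.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    # choose initial population; measure fitness of each individual
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    best_score = max(scores)
    best = pop[scores.index(best_score)]
    history = [best_score]

    # assumed termination criterion: a fixed number of generations
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # select parents for reproduction
            mother, father = pick_parent(pop, scores), pick_parent(pop, scores)
            # perform recombination (one cut point) and mutation (bit flips)
            cut = rng.randint(1, n_genes - 1)
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        # measure fitness of each individual of the new population
        pop = children
        scores = [fitness(ind) for ind in pop]
        if max(scores) > best_score:
            best_score = max(scores)
            best = pop[scores.index(best_score)]
        history.append(best_score)
    return best, history

# Example: maximize the number of ones in a 16-bit string.
best, history = run_ga(sum, n_genes=16)
```

The returned `history` records the best fitness seen so far after each generation, which makes it easy to check whether the search has stalled.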

Chromosomes could be:
Bit strings (0101, ..., 1100)
Integers (7, 5, ..., 1, 99)
Real numbers (43.2, -33.1, ..., 0.0, 89.2)
Permutations of elements (E11, E3, E7, ..., E1, E15)
Lists of rules (R1, R2, R3, ..., R22, R23)
Program elements (genetic programming)
... any data structure ...

Example: Parameter Estimation
Find such that
Possible domain points.
Chromosome with genes (i.e., template representation of a solution):
Population of solutions: members in population.

Travelling Salesperson Problem
Find a tour of a given set of cities so that each city is visited only once and the total distance traveled is minimized.
Chromosome with genes (i.e., representation of a solution):
Population of solutions:

Crossover, or recombination, is the GA's distinguishing feature. It involves mixing and matching parts of two parents to form children.
Crossover was originally based on the premise that highly fit individuals often share certain traits, called building blocks.
For fixed-length vector individuals, a building block was often defined as a collection of genes set to certain values.
For example, perhaps two of the parameters in the parameter estimation problem both need to be small values (they should lie in some range).
For example, in an 8-bit Boolean individual, perhaps ***101*1 might be a building block (where the * positions aren't part of the building block).

Crossover based on the representation of the chromosomes only:
One-Point Crossover
Two-Point Crossover
Uniform Crossover
Crossover based on the representation of the chromosomes and the optimization objective:
Crossover adapts according to the performance of the candidate solutions of the population.
Example: Determine and keep important building blocks of the population.

One-point crossover picks a number c between 1 and l (the vector length), inclusive, and swaps all the indexes below c.
    v: first vector (v_1, ..., v_l) to be crossed over
    w: second vector (w_1, ..., w_l) to be crossed over
    c: random integer chosen uniformly from 1 to l inclusive
    for i from 1 to c - 1 do
        swap the values of v_i and w_i
    return v and w
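A minimal Python sketch of one-point crossover, assuming the convention that a crossover point c is drawn from 1 to the vector length inclusive and that every index below c is swapped (so c = 1 swaps nothing). The function name and the `rng` parameter are illustrative, not taken from the slides.

```python
import random

def one_point_crossover(v, w, rng=random):
    """Return two children made by swapping the genes at 1-based indexes below c."""
    if len(v) != len(w):
        raise ValueError("parents must have the same length")
    length = len(v)
    c = rng.randint(1, length)          # inclusive on both ends
    # Indexes 1 .. c-1 (1-based) are the Python slice [:c-1].
    child_v = list(w[:c - 1]) + list(v[c - 1:])
    child_w = list(v[:c - 1]) + list(w[c - 1:])
    return child_v, child_w

# Example: crossing all-zeros with all-ones shows where the cut fell.
a, b = one_point_crossover([0] * 8, [1] * 8, rng=random.Random(1))
```

Returning new lists rather than swapping in place keeps the parents available for reuse, which matters when the same parent is selected more than once.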

The problem with one-point crossover is that it may break important linkages between components.
Notice that the probability is high that the first and last elements, v_1 and v_l, will be broken up due to crossover, as any choice of the crossover point c will do it, except for c = 1.
If the organization of your vector was such that elements v_1 and v_l had to work well in tandem in order to get a high fitness, you'd be constantly breaking up good pairs that the system discovered.

Two-point crossover is one way to alleviate the linkage problem. Just pick two crossover points c and d, and swap the indexes between them.
Think of the vectors as rings to understand how the endpoints don't get broken.
However, all components between the two points must be swapped, and linkages at the boundary may be broken.
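A short Python sketch of two-point crossover. As an assumption made for convenience, the two cut points are drawn as 0-based slice boundaries (0 to the vector length), and the genes in the half-open range between them are swapped; the function name and `rng` parameter are hypothetical.

```python
import random

def two_point_crossover(v, w, rng=random):
    """Return two children made by swapping the segment between two cut points."""
    if len(v) != len(w):
        raise ValueError("parents must have the same length")
    length = len(v)
    c, d = sorted((rng.randint(0, length), rng.randint(0, length)))
    child_v = list(v[:c]) + list(w[c:d]) + list(v[d:])
    child_w = list(w[:c]) + list(v[c:d]) + list(w[d:])
    return child_v, child_w

# Example: the swapped segment shows up as one contiguous run of ones.
a, b = two_point_crossover([0] * 10, [1] * 10, rng=random.Random(2))
```

Because only the middle segment moves, the first and last genes of each parent always stay together, which is the ring picture described above.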


We can treat all genes fairly with respect to linkage by crossing over each point independently of the others, using Uniform Crossover. Here we simply march down the vectors and swap the values at an individual index if a coin toss comes up heads, which happens with probability p.
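Uniform crossover can be sketched in a few lines of Python. The swap probability `p` defaults to 0.5 here purely as an illustrative choice, and the function name and `rng` parameter are assumptions.

```python
import random

def uniform_crossover(v, w, p=0.5, rng=random):
    """Swap each gene pair independently with probability p; return two children."""
    if len(v) != len(w):
        raise ValueError("parents must have the same length")
    a, b = list(v), list(w)
    for i in range(len(a)):
        if rng.random() < p:            # the "coin toss" for this index
            a[i], b[i] = b[i], a[i]
    return a, b

# Example: p = 0 leaves the parents unchanged, p = 1 swaps every gene.
unchanged = uniform_crossover([1, 2, 3], [4, 5, 6], p=0.0)
swapped = uniform_crossover([1, 2, 3], [4, 5, 6], p=1.0)
```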


Crossover Incapable of Exploring Entire Solution Space
If you cross over two vectors, you can't get every possible vector in the space out of it.
Imagine your vectors were points in space. Now imagine the hypercube formed with those points at its extreme corners. Crossovers will result in new vectors which lie at some other corner of the hypercube.
For example, the possible crossovers of the two vectors (1,2,3) and (4,5,6) are (4,5,3), (4,2,6), (1,5,6), (4,2,3), (1,2,6), and (1,5,3). Other vectors, such as (1,1,1), are not possible.
Thus, to give GAs a chance to explore a wider search space, another form of change is required (i.e., mutation).
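The hypercube argument can be checked by brute force. The sketch below enumerates every child that gene-wise swapping could produce from two parents; the helper name `crossover_children` is hypothetical.

```python
from itertools import product

def crossover_children(v, w):
    """Return every vector reachable by swapping genes between v and w, parents excluded."""
    corners = {
        tuple(pair[pick] for pair, pick in zip(zip(v, w), picks))
        for picks in product((0, 1), repeat=len(v))
    }
    return corners - {tuple(v), tuple(w)}

children = crossover_children((1, 2, 3), (4, 5, 6))
# children holds the six non-parent corners of the hypercube; (1, 1, 1) is not among them.
```

One-point and two-point crossover reach only a subset of these corners, so the limitation applies to them as well.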

Mutation allows the algorithm to explore more of the solution space than crossover alone allows. It maintains genetic diversity from one generation of chromosomes to the next.
Mutation alters one or more gene values in a chromosome from its initial state. The larger the number of gene values that are mutated, the larger the region of the solution space that may be searched.
Mutation occurs during evolution according to a user-definable mutation probability. This probability should be set low. If it is set too high, the search will turn into a primitive random search.
Example: Single-point mutation. A random number is drawn to choose which gene in a chromosome should be randomly altered.
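A minimal sketch of single-point mutation, assuming a bit-string chromosome so that "randomly altered" means flipping the chosen bit. The name `single_point_mutation` and the parameter `p_mut` are illustrative.

```python
import random

def single_point_mutation(chromosome, p_mut=0.05, rng=random):
    """With probability p_mut, flip one randomly chosen bit; return a new list."""
    child = list(chromosome)
    if rng.random() < p_mut:
        i = rng.randrange(len(child))   # which gene to alter
        child[i] = 1 - child[i]         # assumed alteration: bit flip
    return child

# Example: p_mut = 1 always changes exactly one gene.
mutant = single_point_mutation([0, 0, 0, 0, 0], p_mut=1.0, rng=random.Random(4))
```

For integer or real-valued genes the flip would be replaced by drawing a new value or adding noise, which is a design choice outside what the slide specifies.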

More Controlled Form of Mutation: Line Recombination

How do we select parents for crossover?
Selection of parents (chromosomes) for crossover should not be done randomly, but in a way that is focused on achieving the common goal.
For example, selection should be based on fitness: the probability of being selected as a parent is proportional to fitness.
Possible selection methods (there are others):
Tournament selection.
Roulette-wheel selection.
Stochastic Universal Sampling.

Roulette-wheel selection
Divide the wheel into subintervals, one for each individual in the current generation. Interval length is proportional to the individual's fitness.
A uniformly distributed random number, drawn from within the number of notches in the wheel, chooses the subinterval (i.e., parent 1). Do again for parent 2.
Source: hilite_graphics/rhjan07g02.png
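The wheel can be sketched with a cumulative fitness table and a binary search. This is an illustration that assumes non-negative fitness values with a positive total; the function name and `rng` parameter are hypothetical.

```python
import bisect
import itertools
import random

def roulette_wheel_select(fitnesses, rng=random):
    """Return the index of one individual, chosen with probability proportional to fitness."""
    cdf = list(itertools.accumulate(fitnesses))   # running totals = interval ends
    total = cdf[-1]
    if total <= 0:
        raise ValueError("total fitness must be positive")
    r = rng.random() * total                      # spin: a point on the wheel
    index = bisect.bisect_right(cdf, r)           # first interval ending beyond r
    return min(index, len(fitnesses) - 1)         # guard against float round-off

# Example: pick two parents from a population of three.
rng = random.Random(0)
parent1 = roulette_wheel_select([1.0, 3.0, 6.0], rng)
parent2 = roulette_wheel_select([1.0, 3.0, 6.0], rng)
```

Using `bisect_right` means an individual with zero fitness owns an empty interval and is never chosen.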

Roulette-wheel Implementation
Table columns: Population Member #, Fitness Index, Initial Fitness Value, Fitness Index CDF Value

Algorithm 30: Roulette-wheel Selection

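Stochastic Universal Sampling
A problem with Roulette-wheel selection is that the fittest individual may never be chosen, due to the random nature of the selection.
In Stochastic Universal Sampling, a single random offset places equally spaced pointers around the wheel, so the selection is biased such that fit individuals always get picked at least once: any individual whose interval is longer than the pointer spacing (total fitness divided by the number of selections) must contain a pointer.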
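A sketch of Stochastic Universal Sampling in Python, assuming non-negative fitness values with a positive total. One random offset is drawn, then `n` equally spaced pointers are walked around the wheel. The function name and parameters are illustrative.

```python
import random

def stochastic_universal_sampling(fitnesses, n, rng=random):
    """Return n selected indexes using one random offset and equally spaced pointers."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    step = total / n                     # spacing between pointers
    pointer = rng.random() * step        # single random draw in [0, step)
    chosen = []
    i, cumulative = 0, fitnesses[0]
    for _ in range(n):
        # Advance to the individual whose interval contains the pointer.
        while cumulative <= pointer and i < len(fitnesses) - 1:
            i += 1
            cumulative += fitnesses[i]
        chosen.append(i)
        pointer += step
    return chosen

# Example: an individual holding 6/8 of the wheel is picked for 3 of 4 slots.
picks = stochastic_universal_sampling([6, 1, 1], n=4, rng=random.Random(1))
```

With four pointers spaced 2 apart, the individual covering the first 6 units of the wheel always receives exactly three of them, whatever the offset.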

Algorithm 31: Stochastic Universal Sampling

Tournament selection involves running tournaments among a number of individuals chosen at random from the current generation (population). The winner of each tournament (the one with the best fitness) is selected.
Choose a tournament size, t.
Randomly select t chromosomes (solutions) from the current generation, and choose the most fit of these to be the mother design.
Randomly select t chromosomes (solutions) from the current generation, and choose the most fit of these to be the father design.

Algorithm 32: Tournament Selection
Randomly select t chromosomes (designs) from the current generation, and choose the most fit of these to be the chosen one.
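Tournament selection is short enough to sketch directly. In this illustration the contestants are drawn with replacement, which is one common variant and an assumption here; the function name, the tournament-size parameter `t`, and `rng` are hypothetical.

```python
import random

def tournament_select(population, fitness, t=2, rng=random):
    """Return the fittest of t individuals drawn at random (with replacement)."""
    best = rng.choice(population)
    for _ in range(t - 1):
        challenger = rng.choice(population)
        if fitness(challenger) > fitness(best):
            best = challenger
    return best

# Example: select a mother and a father from a population of numbers,
# where the fitness of an individual is simply its value.
rng = random.Random(0)
population = [1, 5, 3, 9, 2]
mother = tournament_select(population, fitness=lambda x: x, t=2, rng=rng)
father = tournament_select(population, fitness=lambda x: x, t=2, rng=rng)
```

Raising `t` increases selection pressure, and `t=1` reduces to picking a parent uniformly at random.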

Effect of Tournament Size
The greater the tournament size, the higher the chance that fitter individuals will be chosen.
Extreme case #1: If the tournament size is equal to the generation size, the most fit solution in the current generation would always be selected as both the mother and the father.
Extreme case #2: If the tournament size is one, fitness is completely ignored and the mother and father are selected randomly.

Tournament Selection Benefits
Tournament selection has become the primary selection technique used for the genetic algorithm.
Efficient to code.
Works on parallel architectures.
Allows the selection pressure to be easily adjusted.
The most popular setting is t = 2.

The elitist genetic algorithm injects the fittest individual(s) from the previous population into the next population. These individuals are called the elites.
By keeping the best individual(s) around in future populations, this algorithm is exploitative.
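Elitism amounts to a small change in how the next population is assembled. The sketch below assumes a caller-supplied `breed` function (hypothetical) that performs selection, crossover, and mutation and returns one child; `n_elites` is likewise an illustrative parameter.

```python
def next_generation_with_elitism(population, fitness, breed, n_elites=1):
    """Copy the n_elites fittest individuals unchanged, then fill up with bred children."""
    elites = sorted(population, key=fitness, reverse=True)[:n_elites]
    children = [breed(population) for _ in range(len(population) - n_elites)]
    return elites + children

# Example with a deliberately bad breeder that always returns the worst individual:
# the best individual still survives into the next generation.
population = [3, 8, 1, 5]
new_population = next_generation_with_elitism(
    population, fitness=lambda x: x, breed=min, n_elites=1
)
```

The population size stays constant, and the best fitness in the population can never decrease from one generation to the next.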

Algorithm 33: Genetic Algorithm with Elitism

Differential Evolution
Designed for multidimensional real-valued spaces.
Children must compete directly against their immediate parents for inclusion in the population.
The size of mutations is based on the current variance in the population. If the population is spread out, mutation will make large changes. If the population is condensed in a certain region, mutations will be small.
Differential mutation is an adaptive mutation algorithm that adapts to the variance of the population.

The idea is to mutate away from one of three chosen individuals, a, by adding a vector to it. This vector is created from the difference between the other two individuals, b - c.
If the population is spread out, b and c are likely to be far from one another and this mutation vector is large; otherwise it is small.
If the child is fitter than the parent, it replaces the parent in the original population; otherwise the child is thrown away.
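One generation of this differential mutation scheme might look like the sketch below. It is a simplification under stated assumptions: the three individuals (called `a`, `b`, `c` here) are drawn from the rest of the population, the difference vector is scaled by a hypothetical factor `alpha`, and no additional crossover between child and parent is performed. The population needs at least four members.

```python
import random

def differential_step(population, fitness, alpha=0.5, rng=random):
    """Return the next population: each child replaces its parent only if it is fitter."""
    new_population = []
    for i, parent in enumerate(population):
        others = [ind for j, ind in enumerate(population) if j != i]
        a, b, c = rng.sample(others, 3)
        # Mutate away from a by a scaled copy of the difference b - c.
        child = [ai + alpha * (bi - ci) for ai, bi, ci in zip(a, b, c)]
        new_population.append(child if fitness(child) > fitness(parent) else parent)
    return new_population

# Example: maximize -(x^2 + y^2), whose optimum is at the origin.
rng = random.Random(3)
population = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(6)]
sphere = lambda x: -sum(xi * xi for xi in x)
new_population = differential_step(population, sphere, rng=rng)
```

Because `b - c` shrinks as the population converges, the step size adapts without any explicit mutation-rate schedule.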


[Figure: panels "Initial Population", "Elites on Contour", "Elites on Surface", "Mating 1", "Mating 2", "Mating 3". Legend: o = parents, * = crossover, x = mutation.]

[Figure: panels "Initial Population", "Elites on Contour", "Elites on Surface", "Mating 3", "Mating 4", "Children + Elites".]

[Figure: histogram. 1: 1 %, 14: 14 %, 211: 21.1 %, 774: 77.4 %.]

[Figure: panels "Initial Population", "Elites on Contour", "Elites on Surface", "Children + Elites", "Next Generation".]