Survey of Annealing Methods for Schedule Optimization


By Steve Morrison, Ph.D.

Annealing methods are applicable to almost any possible problem structure, or lack of structure. Like genetic methods, the choice to use annealing methods implicitly makes some assumptions about the topology of the solution space. Both methods excel at finding locally optimal solutions at the bottom of a valley, even if the valley has multiple local minima, and both find deceptive solutions that resemble a shaft in the top of a hill only with great difficulty.

The idea behind annealing methods comes from the observation in metallurgy that metal that is slowly cooled (annealed) has superior characteristics to metal that is rapidly quenched. Rapid cooling freezes the atoms in a high-energy state, while slow cooling allows the atoms to settle into a lower-energy state. In contrast to a greedy algorithm, which tries to find the lowest solution as soon as possible, a simulated annealing algorithm allows some uphill moves: initially it allows a large percentage of uphill moves, and towards the end it allows nearly none. The way the percentage of allowed uphill moves changes over time is called the annealing schedule. If the annealing schedule cools too rapidly, the algorithm runs rather fast and approximates the greedy algorithm. If the annealing schedule cools too slowly, the algorithm approximates a Monte Carlo method, giving a very good solution but taking a very long time. The most common annealing method is simulated annealing, where the move directions are chosen at random. Mean field annealing is just like simulated annealing except that the (apparently) best move is always chosen.
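For concreteness, the following is a minimal sketch of a simulated annealing loop for a sequencing problem. The swap move, the exponential cooling rule, and all parameter values are illustrative assumptions, not recommendations from the literature surveyed here:

```python
import math
import random

def simulated_annealing(schedule, cost, beta0=100.0, cooling=0.95,
                        steps_per_temp=100, beta_min=0.01):
    """Anneal a job sequence under a user-supplied cost function."""
    current = schedule[:]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    beta = beta0
    while beta > beta_min:
        for _ in range(steps_per_temp):
            # Propose a random swap move; insert moves and sequence
            # reversals are common alternatives.
            i, j = random.sample(range(len(current)), 2)
            candidate = current[:]
            candidate[i], candidate[j] = candidate[j], candidate[i]
            delta = cost(candidate) - current_cost
            # Accept every downhill move; accept an uphill move with a
            # probability that shrinks as the temperature beta falls.
            if delta <= 0 or random.random() < math.exp(-delta / beta):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current[:], current_cost
        beta *= cooling  # exponential cooling; schedules are discussed below
    return best, best_cost
```

Setting `cooling` close to 1 pushes the loop toward the slow, Monte Carlo-like extreme, while a small `cooling` constant approximates the greedy extreme.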
Ogbu and Smith [1990], Suh [1988], Aarts and Korst [1989], Das et al. [1990], Ku and Karimi [1991], and van Laarhoven et al. [1992] are among the many who use simulated annealing for scheduling. While Kirkpatrick et al. [1983] use simulated annealing to solve traveling salesman problems of several thousand cities, Johnson [1987] states that his comparison of simulated annealing with the Lin-Kernighan algorithm on the symmetric traveling salesman problem showed simulated annealing to be a thousand times slower. Barrera and Evans [1989] considered simulated annealing only briefly for their MINLP solver; they rejected it due to its high computational effort and only slightly better solutions than 1-change plus restricted 2-change. Others who either believe simulated annealing is greatly oversold or who have rejected it outright as a solution method are Bounds [1987] and Pekny and Miller [1991]. However, according to Miller et al. [1993], Dupont has both simulated annealing and tabu search in its suite of scheduling algorithms in a commercial environment. This does not contradict Pekny and Miller's earlier statements on simulated annealing: while simulated annealing is not competitive where well-designed conventional algorithms exist, its simplicity and its ease of accommodating complex cost functions allow it to work where there are no conventional algorithms. In Dupont's original implementation of tabu search versus simulated annealing, tabu search was initially not competitive; it gave somewhat better results but was from 8 to 45 times slower. However, with a modified tabu search that used a short-cut for estimating costs, the modified tabu search was only one-third to one-half again as slow as simulated annealing and gave better results for large problems.

[Table: Performance of Annealing and Tabu Search Algorithms, from Miller et al. [1993]. Columns: number of orders; cost and time for simulated annealing, tabu search, and modified tabu search. The numerical entries did not survive transcription.]

The general trend is that simulated annealing is the fastest method, while modified tabu search is slightly slower and gives better answers. Unfortunately, Miller et al. did not report the quality of simulated annealing and tabu search solutions when both ran for the same amount of time. Since tabu search is usually a single-threaded method, one improvement over the work of Miller et al. might be to use simulated annealing up to a point, and then use tabu search to refine the solution. This would hopefully give the best of both methods. Most simulated annealing algorithms use swap moves, insert moves, and, for symmetric problems, sequence reversals. Recalling Johnson's remarks, to this author's knowledge nobody is currently combining Lin-Kernighan-style moves with simulated annealing.

Move Acceptance and the Annealing Schedule

It is important to understand how the probability of a move's acceptance is computed. Whichever of the two equations below is used, the probability is strongly a function of the temperature β. Thus, the initial value of β and how β varies over time are the keys to a particular simulated annealing algorithm. A greedy search has the simplest move acceptance criterion: accept all downhill moves and reject all uphill ones. According to Das et al. [1990], the two common annealing algorithms are the Metropolis algorithm and the Glauber algorithm. The Metropolis algorithm accepts all downhill moves, and accepts or rejects uphill moves according to the annealing schedule. The Glauber algorithm accepts and rejects both downhill and uphill moves according to the annealing schedule. The Metropolis probability of accepting an uphill move follows the Boltzmann probability distribution:

    probability = exp( -ΔC / β )

where ΔC is the change in cost and β is the annealing temperature. Notice that if ΔC is zero the move is automatically accepted, which is rather unusual. The Glauber algorithm accepts all moves, downhill or uphill, with probability

    probability = exp( -ΔC / β ) / ( 1 + exp( -ΔC / β ) )

The Metropolis algorithm is greedier than the Glauber algorithm, and thus, in general, the Metropolis algorithm intensifies better but explores worse. As a side note, it is interesting that replacing the numerator of the Glauber expression with 1.0 minus the old numerator gives a good choice for the transfer function of the hidden nodes of a neural network. Building on this idea, one could calculate the probability as a weighted sum of Metropolis and Glauber probabilities, causing different functions to dominate in different regions, and update the appropriate βs not only as a function of time but also as a function of success.

A key part of the success of a simulated annealing algorithm is the annealing schedule: how to initialize and vary β. For the Metropolis algorithm, Das et al. [1990] suggest

    β₀ = -ΔC_max / ln( p₀ )

Both Das et al. [1990] and Aarts and van Laarhoven [1985] suggest an initial probability value (p₀) of 0.9. For the Glauber algorithm the initial β₀ works out to be

    β₀ = -ΔC_max / ln( p₀ / ( 1 - p₀ ) )

According to Das et al. [1990], the two basic types of annealing schedules are exponential and Aarts and van Laarhoven (AvL). The exponential annealing schedule has the form

    β_next = a β

where a is a constant between 0 and 1.
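The two acceptance rules and the two initializations above translate directly into code. A minimal sketch follows; the function names are this author's, and the formulas are those of Das et al. [1990] as reproduced above:

```python
import math

def metropolis_probability(delta_c, beta):
    """Metropolis: accept all downhill moves (delta_c <= 0); accept an
    uphill move with Boltzmann probability exp(-delta_c / beta)."""
    if delta_c <= 0:
        return 1.0
    return math.exp(-delta_c / beta)

def glauber_probability(delta_c, beta):
    """Glauber: accept any move, downhill or uphill, with probability
    exp(-delta_c / beta) / (1 + exp(-delta_c / beta))."""
    x = math.exp(-delta_c / beta)
    return x / (1.0 + x)

def initial_beta_metropolis(delta_c_max, p0=0.9):
    # beta_0 = -delta_C_max / ln(p0), with p0 the desired initial
    # acceptance probability for the largest uphill move.
    return -delta_c_max / math.log(p0)

def initial_beta_glauber(delta_c_max, p0=0.9):
    # beta_0 = -delta_C_max / ln(p0 / (1 - p0)), as given in the text.
    # Note the Glauber rule assigns probability at most 0.5 to uphill
    # moves, so a p0 above 0.5 yields a positive beta only when
    # delta_c_max is a downhill (negative) change; the text leaves this
    # interpretation point open.
    return -delta_c_max / math.log(p0 / (1.0 - p0))
```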

The Aarts and van Laarhoven annealing schedule is

    β_next = β / ( 1 + β ln( 1 + δ ) / ( 3 σ(β) ) )

where σ(β) is the standard deviation of the objective function at the current temperature and δ is a specified value that measures the desired closeness to equilibrium. Das et al. [1990] initially used a δ value of 0.3. The value of β was only updated at every scheduled temperature change, which could be every N(N-1) moves, where N is the number of products.
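A sketch of the two cooling updates just described, assuming σ(β) is estimated as the standard deviation of objective values sampled at the current temperature (the `recent_costs` argument is an assumed interface):

```python
import math
import statistics

def exponential_update(beta, a=0.95):
    """Exponential schedule: beta_next = a * beta, with 0 < a < 1."""
    return a * beta

def avl_update(beta, recent_costs, delta=0.3):
    """Aarts and van Laarhoven schedule:
        beta_next = beta / (1 + beta * ln(1 + delta) / (3 * sigma(beta)))
    where sigma(beta) is estimated from objective values sampled at the
    current temperature and delta measures the desired closeness to
    equilibrium (Das et al. [1990] started with delta = 0.3)."""
    sigma = statistics.stdev(recent_costs)
    return beta / (1.0 + beta * math.log(1.0 + delta) / (3.0 * sigma))
```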

There are many other possibilities for determining the annealing schedule. Gunnels [1994], Yip and Pao [1995], and Thonemann [1994] use simulated annealing with a genetic algorithm to determine the annealing schedule. Shen, Pao, and Yip [1994] apply Guided Evolutionary Simulated Annealing (GESA) to jobshop scheduling with some success. While many other algorithms can terminate when the change in objective function is less than a certain value, Das et al. [1990 p.1354] suggest terminating when the change in objective function times the temperature, divided by the change in temperature times the initial objective function, is less than a value between zero and one, that is, when (ΔC β) / (Δβ C₀) < ε with 0 < ε < 1.

Das et al. [1990 p.1359] compare the two algorithms, Metropolis and Glauber, and the two annealing schedules, exponential and Aarts and van Laarhoven. In their comparison, the Metropolis algorithm with the Aarts and van Laarhoven annealing schedule gives the best schedules, despite often exploring slightly less solution space than either algorithm with an exponential annealing schedule. As for time required, they solved 16-product, 6-unit problems in on the order of 53 minutes on Sun 3/60 workstations, giving results from 0% to 11% worse than optimum. This was two orders of magnitude more time than a heuristic method that gave results from 30% to 95% worse than optimum. Das et al. [1990] then changed the temperature-change interval to N(N-1)/4 moves and δ to 10, left everything else the same, and the new Metropolis/Aarts and van Laarhoven algorithm still gave results from 11% to 27% worse than optimum while being only one order of magnitude slower than the heuristic. They point out that one key reason simulated annealing is slower than the heuristic is that it calculates all costs rigorously while the heuristic uses some estimations. Stern [1992] uses a temperature-dependent penalty function.

Leinbach [1992] labels any predetermined annealing schedule a serious kludge: once the schedule comes to an end, the problem is finished regardless of the quality of the solution; the schedule is updated without using any feedback on how well the cost is improving; and the annealing temperature is forced to be uniform across the entire solution space. He proposes automatic local annealing, which is simulated annealing where each local unit controls its own temperature, and the temperature can differ from unit to unit.

In summary, the two most common move acceptance criteria are the Metropolis and Glauber algorithms, and the two most common annealing schedules are exponential and Aarts and van Laarhoven. Of these, Das et al. report the best results with the Metropolis algorithm and the Aarts and van Laarhoven annealing schedule, but there are many other variations and possibilities, too. Leinbach has a compelling point, and more research needs to be done in this area.

Issues with Simulated Annealing

Das et al. [1990 p.1357] point out that simulated annealing examines some configurations more than once; in other words, there is no provision to prevent cycling. Cycling is not too frequent, due to the probabilistic nature of the moves, but one suggestion for simulated annealing is to impose a one- or two-move tabu after every uphill move. A second suggestion is also related to tabu search. Tabu search algorithms have aspiration criteria, which say to ignore tabus when certain conditions occur; the most common aspiration criterion is to ignore the tabus if the solution is the best found yet. The same concept could be applied to simulated annealing: when a solution is the best found so far, set the acceptance probability to 1. This could potentially speed up the algorithm late in the annealing schedule. However, this aspiration could also have the harmful effect of limiting exploration and getting stuck on a local optimum, an effect that would be most evident early in the annealing schedule. Thus, one might only want to turn it on late in the annealing schedule. Another suggestion is that when the simulated annealing algorithm slows down significantly, instead of stopping the algorithm, restart it (and roll back the annealing schedule) with a diversification move. A further suggestion is to explore the continuous concept of momentum as applied to the discrete method of simulated annealing. In the Metropolis algorithm, the probability is less than one for any non-improving move. What if the threshold at which the acceptance probability equals 1 were adjusted? For example, give all moves a probability of 1 if the move is no worse than the current answer plus a momentum term. The momentum term does not have to be constant; it too can have an annealing schedule just like the temperature, initially being high and later being close to 0. There are a large number of potential aspects to simulated annealing algorithms, and Table 3.11 provides a partial list.
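As one possible reading of the momentum suggestion above (the text specifies only the probability-1 region; the Boltzmann fallback for larger uphill moves is this author's assumption):

```python
import math
import random

def accept_with_momentum(delta_c, beta, momentum):
    # Any move no worse than the current answer plus the momentum term
    # is accepted outright; the momentum term can follow its own
    # annealing schedule, starting high and decaying toward zero.
    if delta_c <= momentum:
        return True
    # Assumed fallback: Boltzmann acceptance applied to the excess
    # beyond the momentum threshold, keeping the probability
    # continuous (equal to 1) at the threshold itself.
    return random.random() < math.exp(-(delta_c - momentum) / beta)
```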

Mean Field Annealing

Simulated annealing is good for problems with little structure, it is quick to implement, and it has good convergence properties; however, it is often very slow. It works by using randomness, governed by a cooling schedule, to explore the solution space. Mean field annealing (MFA) is identical to simulated annealing except for one aspect: instead of choosing moves randomly, it intelligently selects what appears to be the best candidate. According to Bilbro et al. [1992], mean field annealing is a specific type of graduated nonconvexity algorithm. The advantages of mean field annealing are that, according to Bilbro et al. [1991b], it is one to two orders of magnitude faster than simulated annealing, it is often a simple matter to convert a simulated annealing code to a mean field annealing code, and it often has convergence properties similar to simulated annealing. The drawback of mean field annealing is that it has no guarantee of convergence, even where simulated annealing does converge. Thus mean field annealing is to simulated annealing as Lamarckian genetic methods are to regular genetic methods. According to Bilbro et al. [1991b], MFA essentially replaces the discrete degrees of freedom in simulated annealing with their average values as computed by the mean field approximation. According to Cichocki and Unbehauen [1993 p.486], many consider the terms mean field annealing and mean field theory to be synonymous; however, others distinguish the two, with mean field theory being mean field annealing at a constant annealing temperature. Both Bilbro et al. and Cichocki and Unbehauen explore the relationship between mean field theory and Hopfield neural networks. It should be mentioned that there are different models for mean field annealing, but the most common is an Ising glass model. Zerubia and Chellappa [1993] explore compound Gauss-Markov random fields for image processing.

To convert a simulated annealing code to a mean field annealing code, one must change only one part of the algorithm: use the best apparent search direction in place of a randomized search direction. The algorithm can still decide to accept or reject moves as in simulated annealing. A combination of the two methods is to use mean field annealing as long as the algorithm is converging, and to fall back to simulated annealing when convergence seems elusive. Determining the best apparent search direction is not always straightforward. Bilbro et al. [1991], in an application to range image restoration, resort to perturbing the entire vector at each iteration to calculate a pseudo-derivative. They then perform a gradient-descent type of method until a minimum is found; this is the new equilibrium point. Then they reduce the temperature and restart from the previous equilibrium point.
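A minimal sketch of that single change, with `candidate_moves` as an assumed helper that enumerates the neighbors of the current solution. Note that this captures only the best-apparent-move selection described here, not the full mean field approximation of Bilbro et al.:

```python
import math
import random

def mean_field_step(current, current_cost, cost, candidate_moves, beta):
    # Mean-field-style selection: evaluate the neighbors and take the
    # apparently best one, rather than sampling a neighbor at random.
    candidate = min(candidate_moves(current), key=cost)
    delta = cost(candidate) - current_cost
    # Acceptance is unchanged from simulated annealing.
    if delta <= 0 or random.random() < math.exp(-delta / beta):
        return candidate, current_cost + delta
    return current, current_cost
```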
Mask Annealing

The mask annealing method, proposed by this author, is analogous to the mask genetic method. It splits the problem into a macro level and a micro level. At the micro level, the algorithm uses an intensification technique, such as the greedy method, branch and bound, dynamic programming, beam methods, the conveyor algorithm, or hybrid methods, to find the best answer within a local neighborhood. At the macro level, an annealing method is employed to find the best neighborhood.
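Since the method is described here only at a high level, the following sketch shows one way the two levels might be organized; `macro_move` and `local_search` are hypothetical helpers, and the parameter values are illustrative:

```python
import math
import random

def mask_annealing(initial, cost, macro_move, local_search,
                   beta0=100.0, cooling=0.9, beta_min=0.1):
    """Two-level sketch: macro_move(s) jumps to a nearby neighborhood,
    and local_search(s) is any intensification technique (greedy,
    branch and bound, dynamic programming, ...) that returns the best
    solution it can find within that neighborhood. Annealing governs
    only the macro-level choice of neighborhood."""
    current = local_search(initial)
    current_cost = cost(current)
    beta = beta0
    while beta > beta_min:
        # Macro move to a candidate neighborhood, then intensify.
        candidate = local_search(macro_move(current))
        delta = cost(candidate) - current_cost
        # Standard annealing acceptance applied at the macro level.
        if delta <= 0 or random.random() < math.exp(-delta / beta):
            current, current_cost = candidate, current_cost + delta
        beta *= cooling
    return current, current_cost
```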
Not all of the preceding aspects are reflected in Table 3.11; some aspects are common to many methods and are presented in Table 3.13.

1. Approach
   Philosophy: Simulated annealing, Mean field annealing
   Scope: Regular, GESA, Mask annealing, Hybrid
   Initialization: Initial probability 0.9, other
2. Evaluation
   Global cost calculations: Rigorous, Approximate, Use both, When to change
   Feasibility evaluation: Ensuring feasibility intrinsically or through repair, Constraints, Penalties, Shifting penalty functions
3. Annealing Detail
   Annealing algorithm: Metropolis (all improving moves), Glauber, Other
   Annealing schedule (cooling rate): Exponential, Aarts and van Laarhoven, Automatic local annealing, Multiple probability cells, GA, Any violations of monotonically decreasing?
   Cost function improvement: Regular, Momentum
   Type of move after downhill: 2-swap, 3-swap, Other, Change based on temperature or probability
   No. of intensification searches before diversifying (nesting): 0, 1, Constant, Threshold, Enumeration, Probabilistic, First uphill move, Constant number of moves, Certainty of local optima, Strategic oscillation
   Type of move after uphill: 2-insert, Crossover, Sequence reversal, L-K, Other, When to change
4. Diversification
   Randomized restart policy: None, Constant, Based on past improvement
   Number of concurrent threads: 1, Constant, Deaths, New births
   Genetic crossover: Historical solutions only, Current threads only, Both
   Annealing schedule rollback: None, Depends on events
   Aspiration criteria for non-Metropolis methods: Global best solution, Momentum, Probabilistic, When to change
   Cycling prevention: None, Tabu lists, Long-term memory, Penalizing frequent moves

Table 3.11. Potential simulated annealing implementation aspects.

Benefits and Drawbacks of Annealing Methods

Like genetic methods, annealing methods are not competitive with polynomial-time methods and many specialized integer programming methods, where those methods exist. However, both genetic algorithms and simulated annealing use randomness to excel where the problem does not have a structure amenable to a specialized method. These methods usually produce solutions superior to those of expert systems and dispatching rules, but at the cost of longer computational time. Both methods are easily parallelizable. Like some genetic methods, annealing methods allow one to easily and dynamically change the amount of intensification versus diversification by adjusting the annealing schedule.

List of References

Aarts, E.H.L. and J. Korst. Simulated Annealing and Boltzmann Machines. John Wiley & Sons, 1989.
Aarts, E.H.L. and P.J.M. van Laarhoven. Statistical Cooling: A General Approach to Combinatorial Optimization Problems. Philips Journal of Research, 1985.

Barrera, M.D. and L. Evans. Optimal Selection of Equipment Units for Batch Processes. AIChE Annual Meeting, San Francisco, CA, November 5-10, 1989.
Bilbro, G.L., et al. Mean Field Annealing: A Formalism for Constructing GNC-Like Algorithms. IEEE Transactions on Neural Networks, vol. 3, 1992.
Bilbro, G., et al. Optimization by Mean Field Annealing. In Advances in Neural Information Processing Systems.
Bilbro, G.L. and W.E. Snyder. Range Image Restoration Using Mean Field Annealing. In Advances in Neural Information Processing Systems.
Bounds, D.G. New Optimization Methods from Physics and Biology. Nature, vol. 329, 1987.
Cichocki, A. and R. Unbehauen. Neural Networks for Optimization and Signal Processing. Wiley, 1993, Chapter 9.
Das, H., P.T. Cummings, and M.D. LeVan. Scheduling of Serial Multiproduct Batch Processes via Simulated Annealing. Computers & Chemical Engineering, vol. 14, 1990.
Gunnels, J., et al. Genetic Algorithms and Simulated Annealing for Gene Mapping. Proceedings of the First IEEE Conference on Evolutionary Computation, vol. I, 1994.
Johnson, D. More Approaches to the Travelling Salesman Guide. Nature, 1987, p. 525.
Kirkpatrick, S., C.D. Gelatt, Jr., and M.P. Vecchi. Optimization by Simulated Annealing. Science, 1983.
Ku, H.-M. and I. Karimi. An Evaluation of Simulated Annealing for Batch Process Scheduling. Ind. Eng. Chem. Res., 1991.
Leinbach, J. Automatic Local Annealing. In Advances in Neural Information Processing Systems.
Ogbu, F.A. and D.K. Smith. The Application of the Simulated Annealing Algorithm to the Solution of the n/m/Cmax Flowshop Problem. Computers & Operations Research, vol. 17, 1990.
Pekny, J.F. and D.L. Miller. Exact Solution of the No-Wait Flowshop Scheduling Problem with a Comparison to Heuristic Methods. Computers & Chemical Engineering, vol. 15, 1991.
Pekny, J.F., D.L. Miller, and G.K. Kudva. An Exact Algorithm for Resource Constrained Sequencing with Application to Production Scheduling Under an Aggregate Deadline. Computers & Chemical Engineering, vol. 17, 1993.
Stern, J.M. Simulated Annealing with a Temperature Dependent Penalty Function. ORSA Journal on Computing, vol. 4, no. 4, Summer 1992.

Suh, C.J. Controlled Search Simulated Annealing for Job Scheduling. Dissertation, University of Texas at Austin, 1988.
van Laarhoven, P.J.M., E.H.L. Aarts, and J.K. Lenstra. Job Shop Scheduling by Simulated Annealing. Operations Research, vol. 40, 1992.
Yip, P.P.C. and Y.-H. Pao. Combinatorial Optimization with Use of Guided Evolutionary Simulated Annealing. IEEE Transactions on Neural Networks, vol. 6, no. 2, March 1995.
Zerubia, J. and R. Chellappa. Mean Field Annealing Using Compound Gauss-Markov Random Fields for Edge Detection and Image Estimation. IEEE Transactions on Neural Networks, vol. 4, no. 4, July 1993.