TIMETABLING EXPERIMENTS USING GENETIC ALGORITHMS

Liviu Lalescu, Costin Badica
University of Craiova, Faculty of Control, Computers and Electronics
Software Engineering Department, str. Tehnicii 5, Craiova, RO-1100, Romania
E-mail: c_badica@hotmail.com

Abstract

Scheduling in general and university timetabling in particular are important problems in combinatorial optimization and computer science. There have been many attempts to find suitable algorithms for computing optimal solutions to the university timetabling problem. This paper investigates an approach to solving this problem using a genetic algorithm. It also presents an evolutionary program built on the skeleton of this genetic algorithm, along with the experimental results obtained and the conclusions drawn.

Keywords

genetic algorithm, scheduling, timetable, evolution, genetic operators

1. INTRODUCTION

Timetabling is a well known problem in computer science. It involves scheduling a given set of activities in an optimal way, such that conflicts in the use of a given set of resources are avoided. The resulting schedule must be valid and must satisfy, as far as possible, an additional set of problem-dependent soft constraints. In particular, scheduling university courses fits the specification of the timetabling problem. The university has a set of teachers, a set of students (organized hierarchically in years of study, groups and sub-groups) and a set of activities (lessons). Each activity represents a relation between a teacher, a subject and a set of students. The activities have to be scheduled within a week such that the following constraints are satisfied: i) no teacher teaches more than a single lesson at a time; ii) no student attends more than a single lesson at a time. These constraints are called basic hard constraints. In our approach, additional custom hard constraints may be added to model teachers' and/or students' unavailability for certain periods during the week.
Additional non-critical custom soft constraints may also be introduced. Violating these constraints does not break the consistency of the timetable, but it is desirable that they are satisfied as far as possible. Examples include avoiding gaps between the timetable activities or avoiding heavily loaded days. A complete timetable specification must also include a coherent allocation of classrooms for all the activities. This allocation must be consistent with some hard constraints (e.g. a classroom cannot be occupied by two teachers at the same time, or a certain activity must take place in a certain type of classroom). An idea that saves both time and computational effort is to split the complete timetabling problem into two phases: time (day and hour) allocation and place (classroom)
allocation ([1]). The assumption is that, after obtaining a solution to the first phase, the second phase is much easier to solve, i.e. there are many solutions to the second phase compatible with any solution of the first phase. The first phase is the most important and difficult one: its constraints are much stronger and its computational effort is much higher. After finding a suitable schedule, the allocation of the rooms is a task conceptually similar to the first phase and consequently it can be solved in the same way. The argument behind this separation is that, for n activities, each with a starting time ranging from 0 to m-1 and an allocated room ranging from 0 to p-1, the overall search space has m^n * p^n possible solutions, while the two-phase approach reduces it to m^n + p^n possible solutions. This obviously brings an important improvement in the speed and effort of computation of any solving algorithm. For the sake of simplicity, this paper considers only the first and most difficult task: scheduling the activities' starting times.

2. A GENETIC APPROACH TO THE TIMETABLING PROBLEM

The timetabling problem is, by its nature, a computationally complex problem. Even defining when one solution is better than another is not a trivial task. The only guaranteed perfect algorithm is an exhaustive search through the solution space, choosing the optimal solution. Unfortunately, this algorithm is inapplicable in practice due to the exponential growth of its execution time (it was shown earlier that the size of the search space is m^n). Other approaches to solving the timetabling problem have been reported in the literature. They were based on simulating the way human experts solve the problem heuristically or on more mathematical concepts such as network flow. However, it is believed that the most promising approach is based on genetic algorithms and evolutionary programming ([2]).
The genetic approach is inspired by the concept of evolution observed in real life. The idea is to create a population of individuals, each individual representing a possible timetable. Starting with an initial population, one simulates the natural evolution towards better candidates for the optimal timetable. The main assumption is that, because the best individuals are the most likely to be chosen for reproduction and are also the most likely to survive for longer periods, fitter individuals develop and strengthen along the generations. Thus it becomes possible to obtain the best solution to the timetabling problem, represented by the fittest individual. There have been attempts to give rigorous proofs that genetic algorithms indeed converge to a nearly optimal solution, but so far without success. Such a proof is difficult to give because of the nature of the problem; current justifications are intuitive and mainly based on experimentation. A very interesting argument, although still with a high degree of intuition, can be found in [4].

2.1. Representation of the Problem

The population is made up of a number of individuals, or chromosomes. Although an individual may have more than one chromosome, in our approach each individual has a single chromosome.
Each chromosome is made up of a set of genes (the smallest information-carrying unit of a chromosome). In our approach, the chromosome contains one gene for each activity in the timetable. This gene represents the scheduled time of the corresponding activity. So a chromosome is actually an array of genes, each gene representing the starting day and hour of an activity.

The fitness function gives a measure of the degree of optimality of a chromosome. Each chromosome has a hard fitness factor and a soft fitness factor, each representing the number of unsatisfied constraints (hard and soft, respectively). The fitness function is computed as follows: i) the hard fitness is computed simply by counting the number of hard constraint conflicts; this approach proved to work well in the simulation; ii) the soft fitness is computed as a weighted count of soft constraint conflicts, where every custom soft constraint is assigned a weight giving its relevance. A comparison function is necessary to define when one chromosome is better than another. Comparing two chromosomes (and thus obtaining the better individual) consists of choosing the individual with the better hard fitness and, in case of a tie, the individual with the better soft fitness. Note that the best fitness means the lowest fitness factor (the fitness factor decreases for better individuals), so perhaps a more appropriate name for this factor would have been conflict factor.

The evolution method is the main function of the population. It defines the strategy for generating a new population from an old one, by using mutation, crossover or simple propagation of old chromosomes. Usually, the new population is generated iteratively (new chromosomes are generated one by one from old ones). Two evolution methods were implemented: i) a classical method, three-tournament selection ([1]).
Selecting a candidate chromosome for crossover, mutation or simple propagation is done in the following manner: three random chromosomes are chosen from the current population, each with equal probability. If a simple propagation (conservation of a chromosome) or a mutation is needed, the best chromosome of these three is chosen for the task. If the considered operation is a crossover, the best two chromosomes of the three are chosen. ii) A new, experimental method. Each evolution step consists of generating, from the current population, a population with double the number of individuals, then selecting the half consisting of the best chromosomes. Doubling the population is done by crossover or mutation (no propagation); the chromosomes for crossover or mutation are chosen with equal probability from the current population. In the three-tournament selection method it is possible for weaker individuals to survive and even generate new chromosomes. However, by giving some priority to better individuals, this evolution does not degenerate into random search. This method is said not to implement extensive elitism. The newly proposed method, on the other hand, is said to implement extensive elitism, because weaker individuals do not have any chance of survival (contrary to three-tournament selection). The experimental results of these methods are compared in section 3.

The crossover operator accepts as arguments two chromosomes and creates a new chromosome by combining the leftmost part of the first argument and the rightmost part of the second argument ([3]). It has been noticed empirically that a small part of the descendants combine the qualities of their parents into a better chromosome. These chromosomes are in turn subject to subsequent crossovers and mutations. The selection function plays an important role here, by promoting the fitter chromosomes. The behavior of the crossover operator is displayed in figure 1.

Figure 1. The crossover operator

Mutation involves altering a single gene of a chromosome. Two kinds of mutation were considered: i) swapping two random genes of the chromosome ([3]); ii) a new, experimental method: randomizing a single gene of a chromosome (randomization of an activity's starting time). The mutation is supposed to introduce a random change into the current population. This is needed in order to prevent the population from concentrating in a small zone and converging to a local optimum instead of the global optimum (i.e. the optimal solution). These methods are compared experimentally in section 3.

3. EXPERIMENTAL RESULTS

3.1. A brief description of the program

An experimental implementation was done as an object-oriented C++ program (the program has the status of a research prototype; it is distributed freely under the GPL and can be downloaded from www.algorithms.ro). Some of the classes of this program and their responsibilities are:
i) GeneticTimetable: the main class, encapsulating a set of rules and a population. It has all the necessary interface functions to read/edit/save rules, start/stop the simulation and save/view results;
ii) Rules: the set of rules that make up a timetable. It includes the set of teachers, the subjects (lessons) taught by them, the set of classes of students (organized hierarchically in years, groups and subgroups), the set of activities and the set of constraints;
iii) Population: represents a set of chromosomes and has member functions implementing the presented evolution methods;
iv) Chromosome: represents a candidate solution. Its interface includes the mutation and crossover functions, the function for evaluating the fitness and the function for obtaining a schedule from the information stored in the genes.

The class diagram of the experimental program is shown in figure 2.

Figure 2. The class diagram of the experimental program

The design of the experimental program had two goals in mind: usability and flexibility for experimentation. The user is provided with the facility to allocate some activities by hand, allowing the program to automatically schedule only the remaining activities. The reason is that the user may want to guide the program towards the best solution from his point of view. This feature is generally referred to as a semi-automatic system. It seems to have little impact on the execution time needed to obtain a timetable, and the results do not seem to improve when the semi-automatic facility is used. This is due mainly to the recent increase in computational power, which allows modern computers to perform all the required computations in an acceptable time, with or without the help of the human assistant.
The program was also designed to allow the easy addition of further custom constraints (either hard or soft). It remains a non-trivial task to offer this facility at run-time, without recompiling the sources. The implementation also considered bi-weekly activities, which can be scheduled in an overlapping manner (one activity in the first week, the other activity in the second week). The requirement for bi-weekly activities came from the particularities of the timetable of the Faculty of Automatics, Computers and Electronics, University of Craiova, Romania, which was used as a case study.

3.2. Results and discussions

Several experiments were carried out. The population size was set to 512 (a compromise between speed and memory consumption on one hand and the optimality of the results on the other). The program was allowed to run for 10000 generations in each case. The tests were done on randomly generated data sets, with 70 teachers, 80 subgroups and 500 activities. The loading factor for students (the percentage of occupied hours out of the total number of working hours in a week) was set to about 70%. The tests involved altering four types of parameters and comparing the obtained results. To simplify the task, the experimentation kept, in turn, three parameters fixed and modified the values of the remaining fourth parameter. It is very important to point out that the experiments might not be relevant: the algorithms use pseudo-random numbers, and repeating the experiments might yield totally different results. We only present here the results obtained with the developed program (which has the status of a research prototype) and compare the presented methods using these potentially non-representative results.
Four major kinds of experiments were carried out: i) Choosing the evolution probabilities: for this experiment only the three-tournament selection evolution method was considered, with a propagation probability of 20% (the expected fraction of chromosomes that remain the same from one generation to the next). Varying the probability of crossover from 0% to 80% and that of mutation from 80% to 0%, it was concluded that the fastest converging program is obtained using 60% for mutation and 20% for crossover. This is consistent with the statement in [4] that crossover plays a smaller role in evolution than it is usually credited with, while mutation is more important than usually thought. The results of this experiment are shown in figure 3, which plots on a logarithmic scale the values of the hard fitness function for four sets of evolution parameters. ii) Comparing the evolution methods: the classical three-tournament method was compared with the experimental method proposed by the authors. The expectation was that classical three-tournament selection would be much better than the experimental method, since the experimental method does not maintain population diversity. It encourages the solution to converge to a local optimum, because locally optimal individuals crowd out weaker individuals that are nevertheless better candidates for the global optimum. In other words, the experimental method is expected to show too high a degree of elitism ([4]).
Figure 3. The values of the hard fitness function on a logarithmic scale for four sets of evolution parameters

Graphical analysis showed, however, that the proposed experimental method converges towards 0 hard conflicts faster than three-tournament selection. A possible explanation is that there is a very large number of almost optimal solutions: any good individual is close to a local optimum, and this optimum is not far from the optimal solution. Since the experimental evolution method is more elitist, it converges to a local optimum faster than three-tournament selection converges to the global optimum. Note also that, even though the experimental method showed a much faster convergence towards a hard fitness factor of 0, both simulations reached 0 hard conflicts, and the soft fitness factor was lower (better) for three-tournament selection. This means that the experimental method is indeed trapped in a local minimum, in comparison with the three-tournament selection method. The plots in figures 4 and 5 support this line of reasoning. iii) Comparing the mutation methods: the method presented in [3] (randomly swapping two genes) was compared with the experimental method proposed by the authors (randomization of a gene). The experimental results plotted in figure 6 might suggest a faster convergence of the experimental method. A plausible explanation is that the experimental mutation method introduces a higher degree of randomization than the swapping method does. This method also seems to resemble more closely the mutation concept observed in biological evolution.
Figure 4. The values of the hard fitness function on a logarithmic scale for the three-tournament selection and the experimental evolution methods

Figure 5. The values of the soft fitness function on a logarithmic scale for the three-tournament selection and the experimental method
Figure 6. The values of the hard fitness function on a logarithmic scale for mutation by randomly swapping two genes and by randomization of a gene

iv) Comparing population initialization methods: two population initialization methods were compared: unallocated (an experimental method of the authors) and random allocation (recommended in the literature). Unallocated means that the simulation starts with a set of void timetables (chromosomes), while random allocation means that the initial chromosomes have all the activities scheduled at a random time of the week. The penalty set for an unallocated activity is very high, so at the beginning of the simulation the random allocation method presents a much fitter population of chromosomes than the unallocated initialization. But the real results appear near the end of the simulation, where the two graphs get closer. Examining the logarithmic chart in figure 7, it appears that the experimental method converges much faster to 0 conflicts than the other. So starting with non-initialized chromosomes seems better than starting with randomly initialized chromosomes. A plausible explanation is that the unallocated initialization method conforms to a higher degree to the concept of a schema ([4]). The following citation is given in [4] for binary chromosomes (each gene is either 0 or 1), but it can of course be generalized to non-binary chromosomes: "A schema is built by introducing a don't care symbol (*) into the alphabet of genes. A schema represents all strings [...] which match it on all positions other than * [...] For example, the schema (*1*1100100) matches four [binary] strings, {(0101100100), (0111100100), (1101100100), (1111100100)}. Of course, [...] the schema (**********) represents all [binary] strings of length 10." The
conclusions in [4] indicate that the evolution seems to be guided by the emergence, combination and preservation of the best schemas, through crossover and mutation. The evaluation of two chromosomes in the unallocated initialization closely follows this schema concept, while comparing two chromosomes in the random allocation procedure introduces noise into the evaluation functions. The unallocated initialization and the subsequent crossovers resemble the schema concept more closely than the random initialization does.

Figure 7. The values of the hard fitness function on a logarithmic scale for the random-allocation and unallocated initialization methods

4. CONCLUSIONS

This paper analyzed the timetabling problem experimentally. A particular genetic algorithm was presented, derived from proposals reported in the literature, together with some improvements to the methods and parameters used in the cited papers.

5. REFERENCES

[1] H.S.C. Lee, Timetabling Highly Constrained Systems via Genetic Algorithms, Master's Thesis, University of the Philippines, Diliman, Quezon City, 2000
[2] A. Schaerf, A Survey of Automated Timetabling, Artificial Intelligence Review 13(2), 87-127, 1999
[3] S. Tongchim, Coarse-Grained Parallel Genetic Algorithm for Solving the Timetable Problem, Proc. of the 3rd Annual Nat. Symp. on Computational Science and Engineering, Bangkok, Thailand, 1999
[4] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs (2nd ed.), Springer-Verlag, 1994