Evolving Control for Micro Aerial Vehicles (MAVs)


M. Rhodes, G. Tener, and A. S. Wu

Abstract

This paper further explores the use of a genetic algorithm (GA) to evolve the control systems for distributed micro aerial vehicles (MAVs). The control systems are evolved rule sets that take into account coverage of the target surveillance area and the sensors built into the MAV. As in previous work where the overall task is distributed, each MAV contains the same rule set. The primary extension to previous work is to make the MAV representation and the fitness function evaluation more realistic, so that the evolved rule sets are more useful. A secondary extension would be to parallelize the rule set evolution process.

I. INTRODUCTION

Real world application is the inherent driver for the majority of research in robotics. It is often the current limitations in technology or design constraints that lead the way to new advances, and often workarounds, in the development of these robots. The robots of interest in this research, Micro Aerial Vehicles (MAVs), are limited on many levels as far as robotics is concerned. Size, sensors, and computational power are the primary concerns in developing these airborne spies. Obtaining an optimum control rule set by weeding out the unnecessary and unfit is crucial to a fast and efficient system.

Given these limitations, it seems fitting to optimize the rule sets used to control a group of independent and autonomous MAVs using an algorithm based upon the principles of Darwin's natural selection. As a result, the use of a genetic algorithm is the focus of both the preceding research and the present work. Because of the cost of developing an actual MAV platform, the questions of preliminary research such as this are being pursued through simulation.
After reaching a stage at which the development of rule sets for a realistic model platform is satisfactory, there is potential to create a small population of individual MAVs for verification. Until then, the MASON simulator [1] is being used to simulate the evolved MAV rule sets.

This work was supported by the Office of Naval Research, the Naval Research Laboratory, and the National Research Council.
M. Rhodes is with the Department of Electrical and Computer Engineering, University of Central Florida, Orlando, FL 32816, USA (mrhodes@mail.ucf.edu).
G. Tener is with the Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA (boris@cfl.rr.com).
A. S. Wu is with the Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA (aswu@cs.ucf.edu).

A. Background

The overall research goal for evolving MAV rule sets with a GA is to attain a better understanding of how to properly evolve control rules for distributed robot systems, or pods. In order to fully understand this process, it is necessary to begin with the GA itself. This was done in the preceding research of Wu, Schultz, and Agah [2], where a small, compact rule set is sought using a variable length GA. The base rule element for each rule set defined in that initial research used an eight bit condition string followed by a four bit action string. This base rule element is directly related to the model upon which the simulated MAVs are based.

The subsequent research of Shumaker, De Jong, Luke et al. [3] developed this basic rule set further in order to study the effects of social hierarchies upon the learning process. Their research found that social dominance had no negative impact upon the rule set evolution.

The MAV model used in previous research has been inherited by the current work. Each MAV has eight sensors equally spaced around its circumference. These sensors originally sensed only an Area of Interest (AoI) when it was in range.
Additionally, the MAV can be commanded to move, relative to itself, in the same directions in which its sensors point. The later research in [3] added a second set of sensors to the MAV model, which allows the MAVs to sense each other through simplified proximity sensors. This kind of sensor was required for the use of social behaviors. The newly added sensors are quite realistic; they could probably be replicated with something as simple as Bluetooth or infrared sensors. With this model in place, the base research was ready for extension.
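
To make the base rule element of [2] concrete, the following is a minimal Python sketch of how a variable length genome could be split into rule elements, each an eight bit condition followed by a four bit action. The names `Rule` and `decode_genome`, the string representation of bits, and the choice to ignore trailing bits that do not fill a whole rule are assumptions made for illustration; they are not taken from the original code base.

```python
from dataclasses import dataclass
from typing import List

CONDITION_BITS = 8  # one bit per sensor around the MAV
ACTION_BITS = 4     # action string that follows the condition
RULE_BITS = CONDITION_BITS + ACTION_BITS


@dataclass(frozen=True)
class Rule:
    condition: str  # e.g. "10000001": which of the eight sensors see an AoI
    action: str     # raw four bit action string, left undecoded here


def decode_genome(genome: str) -> List[Rule]:
    """Split a variable length bit string into 12 bit rule elements.

    Trailing bits that do not fill a whole rule are ignored (an assumption).
    """
    if set(genome) - {"0", "1"}:
        raise ValueError("genome must contain only '0' and '1'")
    rules = []
    for start in range(0, len(genome) - RULE_BITS + 1, RULE_BITS):
        chunk = genome[start:start + RULE_BITS]
        rules.append(Rule(chunk[:CONDITION_BITS], chunk[CONDITION_BITS:]))
    return rules
```

Because the genome length is free to vary, the number of rules returned varies with it, which is what allows the GA to search for small rule sets.
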

II. PROBLEM FORMULATION

Extending this base required looking at the same problem from a new perspective. The previous research by Wu, Schultz, and Agah [2] focused on the effect of initial genome lengths and mutation rates on the evolved rule sets. The effects of these factors on the length of the fitness plateaus were also under scrutiny. The new approach was to look more closely at the actual efficiency of the rule set developed by the end of the fitness plateau. To supplement the previous research, the focus was to examine the problem with newly formulated fitness functions, as well as to augment the MAVs with a new kind of social behavior. It is important to note that these additions required data analysis that was not available with the existing code base. As a result, the system was modified to provide this extra data, and a set of scripts was created to collect and organize it.

A. Social Behavior: Operating Modes

One of the primary purposes of this research was to further explore the effect of social behaviors on the development of MAV control rule sets. In order to move toward more realistic operation, the MAVs were given modes of operation to work with while being evolved. Two modes of operation that are naturally inherent to the task of surveillance were integrated into the MAV model. Each MAV now considers itself to be in either a scouting mode, in which it searches for new AoIs, or a monitoring mode, in which it sits and monitors an AoI.

This addition required some change to the basic rule element. The new basic rule element consists of the same two condition strings introduced in [3]. In addition, the four bit action was replaced by two three bit actions, one of which is selected by the current operating mode. The state changes of each MAV from one mode to the other are governed by hard coded rules. Essentially, when a MAV finds an AoI, it goes into monitor mode unless there are other MAVs present.
If and when another MAV is sensed while in monitor mode, the MAV returns to scout mode for a given time count.

1) Condition Matching: Adding the basic operating modes and an additional action string lent itself to another possible improvement. Under normal operating conditions, whether or not operating modes are in effect, rule matching finds the rule with the smallest Hamming distance to the entire 18 bit condition string. Condition matching improves upon this by taking into account the fact that, depending on its mode, the MAV only cares about one set of sensors. If the MAV is in scout mode, only the sensors pertaining to AoIs really matter. As a result, condition matching tells the rule matching algorithm to calculate the Hamming distances for the AoI conditions only. Similarly, in monitor mode, the Hamming distance is calculated only for the MAV sensor condition strings.

B. New Fitness Functions

The other line of experimentation with the GA search technique for this problem was an analysis of fitness function composition. To start, two new fitness functions were added in order to study how well the system responds to different fitness requirements. These fitness functions were compared to the original fitness function. To put the fitness functions on equal footing for analysis, they were all scaled from 0.0 to 1.0. It should be noted that the original fitness function was previously on a scale from 0.0 to 2.0.

1) Original Fitness Function: The original fitness function was simply the sum of two percentages: the survival percentage and the percentage of time during which an AoI was being monitored. The resulting fitness reached a saturated plateau relatively quickly in previous experimentation. Given the MAV population size relative to the surveillance area, the fitness would always be fairly high. The original fitness function was therefore deemed not to be critical enough.
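
The mode-dependent condition matching and the rescaled original fitness function described in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used in the experiments: the names `ModeRule`, `hamming`, `match_rule`, and `original_fitness` are hypothetical, bits are represented as strings of any consistent width, ties are broken in favor of the earliest rule, and both fitness terms are assumed to be given as fractions between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModeRule:
    aoi_condition: str   # bits compared against the AoI sensors
    mav_condition: str   # bits compared against the MAV proximity sensors
    scout_action: str    # three bit action fired in scout mode
    monitor_action: str  # three bit action fired in monitor mode


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_rule(rules: Sequence[ModeRule], aoi: str, mav: str,
               mode: Optional[str] = None) -> ModeRule:
    """Return the rule closest to the current sensor readings.

    mode=None compares the full condition; "scout" compares only the AoI
    part and "monitor" only the MAV part. Ties go to the earliest rule
    (an assumption; the tie-breaking policy is not stated in the text).
    """
    def distance(rule: ModeRule) -> int:
        if mode == "scout":
            return hamming(rule.aoi_condition, aoi)
        if mode == "monitor":
            return hamming(rule.mav_condition, mav)
        return (hamming(rule.aoi_condition, aoi)
                + hamming(rule.mav_condition, mav))
    return min(rules, key=distance)


def original_fitness(survival_fraction: float,
                     monitored_fraction: float) -> float:
    """Sum of two fractions in [0, 1], halved so the result lies in [0, 1]."""
    return (survival_fraction + monitored_fraction) / 2.0
```

With matching restricted by mode, a rule whose AoI condition fits the readings exactly can win in scout mode even when its MAV condition is a poor fit, which is the behavior the condition matching option is meant to allow.
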
New fitness functions would be more distinctive in their requirements for surveillance times.

2) Survival Emphasis: The over survive fitness function is not too dissimilar to the original fitness function. Both measure the MAV survival percentage; however, over survive places extra emphasis upon it. AoI surveillance is also considered in this fitness function. In this iteration, however, it was decided to look at the total count of MAV sensors sensing AoIs across all of the simulation time steps. This should provide a better scaled number. The actual fitness function was the sum of the survival percentage and this AoI sensor count, weighted again by the percentage of MAVs alive at the end of the simulation and scaled.

3) Surveillance Emphasis: The most surveyal fitness function is, once again, not too dissimilar from the previous one. This time, however, the surveillance measurement was more objective. The measurement taken was the sum of the total time each MAV was over an AoI and considered to be surveying, regardless of its mode. A MAV was considered to be surveying if at least 4 of its sensors were sensing AoIs. The most surveyal fitness function is simply the sum of this surveillance measurement and the survival percentage. This value is also scaled between 0.0 and 1.0 by dividing by 2.

C. Visualization & Analysis Tools

To make these experiments go more smoothly, a series of tools was created to run the simulations in the testing environment and to provide visual feedback beyond what MASON already provided. The most important of these were scripts that took the results of all the experiments that had been performed and automatically collected and graphed the data. Reruns of any portion of the experiments could then be evaluated quickly.

Additionally, a rule visualization tool was created. This tool, referred to as Rule2English, takes any binary rule and illustrates the actions the MAV should take when the rule is matched. An example of this output can be seen in figure 1. The m and s in the output indicate which action would be fired in each mode: m represents monitor mode and s represents scout mode. Referencing figure 2, it can be seen that an action of corresponds to moving in the forward direction. The given rule example shown in figure 1 is From this, the last six bits represent the mode actions for scout and monitor modes. The first three are for scout mode, and the second three are for monitor mode. As illustrated, if the MAV were in scout mode, the action to be taken would be to move forward.

Fig. 1. Output of the Rule2English visualization tool. Right represents the forward direction for the MAV representation.

Fig. 2. Illustration of the model MAV. Each group of 3 bits corresponds to the direction of travel the MAV would take if those 3 bits are found as the action of the rule being fired.

III. EXPERIMENTAL DETAILS

Given the additional social behavior of operational modes and the two additional fitness functions to evaluate, there were a number of experiments to be run. Essentially, there was a basic set of experiments which used the operational modes and varied some of the related parameters. This basic set of experiments was run for each of the fitness functions. Table I outlines all of the experiments performed.

Additionally, one final experiment was run for the sake of curiosity. The original fitness function was used to evolve a rule set with the operational modes turned off. This is essentially experiment no. 1 in table I. The only difference is that this experiment takes the resulting best genome from experiment no. 1 and runs the simulation with operational modes turned on.

TABLE I
SUMMARY OF EXPERIMENTS

No.   Op. Modes   Matching   Thresh.   Fitness Function
1     OFF         OFF        NA        original
2     OFF         OFF        NA        over survive
3     OFF         OFF        NA        most surveyal
4     ON          OFF        NA        original
5     ON          OFF        NA        over survive
6     ON          OFF        NA        most surveyal
7     ON          ON         1         original
8     ON          ON         1         over survive
9     ON          ON         1         most surveyal
10    ON          ON         2         original
11    ON          ON         2         over survive
12    ON          ON         2         most surveyal
13    OFF         OFF        NA        original

Experiment 13 is simply a rerun of experiment no. 1, but simulated with operational modes turned on after having the rule set evolved with modes turned off. This is one of the experiments suggested in [3].

A. GA's Adaptability

The GA used for this research demonstrated its ability to adapt to the multiple conditions and parameters that were thrown at it as a result of the social behaviors. For each fitness function used, the run average best fitness per generation was compared for all of the experiments. The results, at first glance, seemed unfruitful. From the data collected, it can be seen that each fitness function showed very similar fitness curves per generation. Of the four fitness curves for each of the three fitness functions, there was no significant difference whether or not any social behaviors were present. Additionally, three of each of the fitness curves were obtained from simulations with different settings for the operational modes social behavior. The results of the fitness analysis for the original, over survive, and most surveyal fitness functions can be seen in figures 4, 5, and 6 respectively.

Fig. 3. Best fitness per generation, averaged over runs. Each of the 3 lines represents a different fitness function being used (original, most_surveyal, over_survive).

Fig. 4. original Fitness Function Evaluation with varied parameters, averaged over runs.

Fig. 5. over survive Fitness Function Evaluation with varied parameters, averaged over runs.

Fig. 6. most surveyal Fitness Function Evaluation with varied parameters, averaged over runs.

Fig. 7. Final surveyor count with original fitness function and varied parameters, averaged over runs.

Fig. 8. Final surveyor count with over survive fitness function and varied parameters, averaged over runs.

Fig. 9. Final surveyor count with most surveyal fitness function and varied parameters, averaged over runs.

B. Fitness Function Improvement

One result evident from this research is that the intuition followed with respect to the fitness functions was sound: the new fitness functions are not as prone to the fitness saturation experienced by the original fitness function. Comparison of the fitness curves without any social behaviors in the system shows that the original fitness function does indeed saturate at around a .9 fitness level. The other two fitness functions provide curves that do not saturate. Additionally, the fitness curves provided by the over survive and most surveyal fitness functions seem to be logarithmic in nature. Fitness changes in the initial generations are rapid, and they quickly settle down to very minor improvements in fitness over time. These results are evident in figure 3.

Fig. 10. Surveyor count progression with original fitness function and operational modes off. The total refers to the total This data is an average of 3 simulation runs with the best genome from experiment no. 1.

The current experimentation also showed that the new fitness functions did indeed provide a performance improvement over the original fitness function.
The primary improvement is evident from the data collected on the final surveyor count per generation. The rule sets evolved with the original fitness function yield a final surveyor count of 9 to 1 MAVs. These results can be seen in figure 7. The over survive fitness function yields a count of 14 to MAVs. These results can be seen in figure 8. The most surveyal fitness function had similar results, which can be seen in figure 9. The runtime statistics provide additional support.

C. Runtime Statistics

Besides the fitness comparison and MAV survival percentage, seven additional noteworthy evaluations were made. For experiments 1 through 6, and experiment no. 13, the best genome at the end of the evolution was used to collect simulation runtime statistics. Each rule set was evaluated in simulation over 4, time steps in order to get a good feel for the trend of the evolved genome. This 4, step run of the simulation was repeated for a total of 3 runs and averaged to obtain the presented data. For each of these evaluations, the focus was to examine the total MAV population in comparison to the total number of MAVs in surveillance, and the number of MAVs in surveillance over each region.

Examining this data yielded much more useful analysis. First of all, it is important to put the results of the previous research on the same footing as those of the current research. Experiment no. 1 within the scope of this research provides the control for these results. The analysis of these results, which are presented in figure 10, can then be compared with the results of the other runtime statistics.

Fig. 11. Surveyor count progression with over survive fitness function and operational modes off. The total refers to the total This data is an average of 3 simulation runs with the best genome from experiment no. 2.

Fig. 12. Surveyor count progression with most surveyal fitness function and operational modes off. The total refers to the total This data is an average of 3 simulation runs with the best genome from experiment no. 3.

1) Experiment No. 1: To begin, it should be noted that the results in figure 10 are not very favorable. There is a transient in all of the runtime experiments which reflects the introduction of MAVs into the system every 2 simulation steps. Even so, MAVs are already being obliterated before the end of this transient. Additionally, the total number of MAVs and the total number of surveyors decrease continually from the end of the transient onwards.
Despite the constant obliteration of MAVs, these results show that shortly after the transient response of the system, the number of surveyors over each region begins to equalize such that each region has approximately the same number of surveyors. Additionally, the gap between the total number of MAVs and the total number of surveyors slowly but continually decreases as time progresses. So aside from the certain doom of the MAV pod conducting the surveillance, things look good in the sense that the percentage of MAVs actually surveying is gradually increasing.

2) Experiment No. 2: The second experiment looks at how the first new fitness function, over survive, compares with the original. All of the parameters for the experiment other than the fitness function are the same as those of experiment no. 1. Overall, the over survive fitness function yielded a better rule set than the original fitness function. Evidence of this is presented in figure 11. The first important detail to note is that, on average, the total MAV population is already greater by three at the end of the transient. By the end of the run, the MAV population is still greater by about 2 MAVs. Additionally, the overall population has a higher percentage of MAVs in surveillance of the AoIs. This can be seen from the smaller gap between the first and second lines, which represent the total MAV population and the portion of it in surveillance at any given time. In figure 10 there is definitely a larger disparity. Unfortunately, as in the first experiment (figure 10), there is a continual loss of MAVs in this experiment as well. Given enough time, the MAV population would probably die out. On a more positive note, the MAVs that are still alive seem to do a decent job of keeping themselves distributed among the different AoIs.

3) Experiment No. 3: The last fitness function introduced in this research, most surveyal, also evolved a better rule set than the original fitness function. These results are illustrated in figure 12. The performance of the most surveyal rule set is only marginally better than that yielded by over survive. For starters, the population at the end of the transient is the same in both. As an improvement, however, the final population size is greater by about 2 MAVs with the most surveyal fitness function. It is also worth noting that the percentage of the MAV population in surveillance is marginally lower throughout the simulation than in the over survive runtime statistics. This is illustrated by the separation of the top two trends in figure 12. This slight decrease in performance, however, is offset by a slightly lower mortality rate in the MAV population. As a result, the final surveyor count, which is equally distributed among the regions, is marginally higher than that illustrated in figure 10.

4) Experiment Nos. 4, 5 & 6: The next three experiments evolved the rule sets with operating modes turned on. Upon examination of the results, it was seen that adding the social behavior to the system added stability as well. In all three experiments, the MAV populations were saved from the demolition seen previously, where the mortality rate remained positive. In this set of experiments, the mortality rate quickly drops to zero.

Fig. 13. Surveyor count progression with original fitness function and operational modes on. The total refers to the total This data is an average of 3 simulation runs with the best genome from experiment no. 4.

Fig. 14. Surveyor count progression with over survive fitness function and operational modes on. The total refers to the total This data is an average of 3 simulation runs with the best genome from experiment no. 5.

Fig. 15. Surveyor count progression with most surveyal fitness function and operational modes on. The total refers to the total This data is an average of 3 simulation runs with the best genome from experiment no. 6.

In experiment no. 4, where the original fitness function is used, the most dramatic improvement is seen. As illustrated in figure 13, the MAV population never drops below 21 MAVs. More importantly, at about 1, time steps, the surveillance population remains constant at about 1 MAVs. It must be noted, however, that the MAVs in surveillance are fewer than half of the population. Additionally, the MAVs that are in surveillance are not equally distributed over the different areas of interest.

Similar improvements are also seen in experiment no. 5. The areas where the over survive fitness function improved over the original fitness function remained. Additionally, the MAV population at the end of the simulation was increased on account of the operating modes. In comparison with the results from the original fitness function seen in figure 13, the distribution of MAVs in surveillance over the different AoIs is roughly the same. It should also be noted that a marginally higher percentage of the MAV population was in surveillance. All of these results can be seen in figure 14.

Some interesting results surface in experiment no. 6, when the operating modes are turned on for the rule set evolution with the most surveyal fitness function. All of the results from experiment nos. 4 and 5 have similar counterparts in experiment no. 6. The most surprising factor is that, in comparison to experiment no. 5, the operational modes do not help the MAV population's numbers as much. Stability is still added, and the percentage of MAVs in surveillance is also marginally improved. But essentially, evolution with operational modes turned on helped the over survive results to a much larger degree. Some of this may be due to the fact that, in the results of experiment no. 6, which can be seen in figure 15, the MAV mortality rate is slightly accelerated for a short period immediately following the transient.

Fig. 16. Surveyor count progression with original fitness function. Experimental run of the best genome evolved with operational modes turned off, but averaged over 3 simulations with operational modes turned on. This is directly from the results of experiment no. 13.

5) Experiment No. 13: In addition to the other runtime statistics collected, it was decided to see how the rule set developed in experiment no. 1 would perform if a simulation was run with the operational modes turned on. These results, illustrated in figure 16, do show that the social behavior adds stability to the MAV population. They also show, however, that the initial survival rate in the MAV population is the worst out of all of the experiments. Regardless, the final numbers for the MAVs covering AoIs are better.

D. Genome Size

One final observation was made as to the resulting genome sizes of these different evolutionary experiments. In general, there is no conclusive correlation between the resulting rule set size and the fitness function used to evolve the rules. When genome size is plotted against generation for all three fitness functions, the results strongly resemble noise. This can be seen in figure 17.

Fig. 17. Rule set size per generation, averaged over runs, for the original, most_surveyal, and over_survive fitness functions. No strong correlation.

IV. CONCLUSIONS AND FUTURE WORKS

A. Conclusions

Overall, the results of this research provided a satisfactory extension to the conclusions made previously. For starters, the variable length GA used in [2] continued to prove useful. Additionally, the results of introducing social behaviors to an evolutionary system extend the conclusions in [3]. The results show that not only do social behaviors have no negative impact on the GA evolutionary process, but that they can in fact add stability to the system and improve performance.

It is also important to note, however, that this research still has many possible extensions worth examining. Although the number of surveyors over the different AoIs can be improved with the different fitness functions and social behaviors, these results alone provide no evidence that the MAVs are doing their jobs any better. The results simply show that measures can be taken to keep the MAVs from being obliterated and to allow them to do their jobs, regardless of how well they are actually conducting surveillance.

B. Future Works

Subsequent research should focus on methods of examining the quality of surveillance. This can most likely be done by making the MAV model itself, as well as the simulation environment, more realistic. Adding factors such as fuel consumption, and more precise positioning methods such as GPS to replace AoI sensors, would be a great start. In doing this, fitness functions can be developed that focus more on how well the surveillance job is being completed, and not just on how well the MAVs manage to stay alive and stay over AoIs.

REFERENCES

[1] (2004) MASON simulator website. [Online]. Available: eclab/projects/mason/
[2] A. S. Wu, A. C. Schultz, and A. Agah, "Evolving control for distributed micro air vehicles," in Proc. IEEE International Symposium on Computational Intelligence in Robotics and Automation, Monterey, California, Nov. 1999, pp
[3] R. Shumaker, K. A. De Jong, S. Luke, A. S. Wu, K. Garfield, and J. HandUber, "Evolving MAV control rules within a social hierarchy," unpublished.