RoboAKUT 2006 Team Description

Size: px
Start display at page:

Download "RoboAKUT 2006 Team Description"

Transcription

1 RoboAKUT 2006 Team Description M. Utku Tatlıdede, Kemal Kaplan and H. Levent Akın Boğaziçi University, PK Bebek, İstanbul, TÜRKİYE, WWW home page: Abstract. RoboAKUT is a multi-agent rescue team developed at the Artificial Intelligence Lab of the Computer Engineering Department of Bogazici University. Our primary goal is to build a rescue team based on market paradigm and show that this method can be successfully implemented in highly dynamic, multi tasking and multi-robot environment. In addition to the general coordination issues we also work on multi-agent path planning which is not addressed by many of the teams. 1 Introduction Multi-agent systems have gained importance and have been implemented in many fields during the last decades since they are more reliable and fault tolerant due to the elimination of single point of failure and faster due to parallelism. Among many coordination paradigms proposed, market based coordination is a promising technique which is well suited to the requirements of the multi-agent systems [1]. Although multi-agent systems have been developed for solving different types of problems, these problems share some common characteristics. Gerkey and Mataric [2] developed a taxonomy based on three attributes of the problem definition. Here after we will refer to this taxonomy as GM taxonomy. First, the problem is categorized as either single robot (SR) or multi robot (MR) depending on whether the task can be achieved by one or more robots. The next categorization is done regarding whether a robot is capable of only a single task (ST)or more (MT). Finally, the problem is categorized as instantaneous (IA) if all the tasks are known by the agents initially. The counterpart definition of the assignment property is time extended (TA) which describes the problems where the tasks are discovered during the course of action. These definitions help to define problems in the multi-agent domain.the RoboCup Rescue Simulation domain contains many complex problems and is therefore very suitable for investigating multi-agent team development approaches. RoboAKUT is a multi-agent rescue team developed at the Artificial Intelligence Lab of the Computer Engineering Department of Bogazici University. RoboAKUT performs rescue operations on a simulation environment provided by the RoboCup Rescue Simulation League. RoboAKUT has participated in the RoboCup Rescue Simulation Competition held in Fukuoka, Japan in 2002,

2 2 Robocup 2003 held in Padua, Italy, Robocup 2004 in Lisbon, Portugal and Robocup 2005 in Osaka, Japan. In the previous competitions our team implemented a multi-queue scheduling algorithm where each task has a predefined priority and they cannot be changed during the simulation. Although we had defined very good task priorities, this only solves the problem of selecting the best action in the queue. However, we also need to switch the priorities of the queues. Moreover we must take advantage of other tasks that can be completed on the way to our target. This year, our primary goal is to build a rescue team based on market paradigm and show that this method can be successfully implemented in highly dynamic, multi tasking and multi-robot environment. In addition to the general coordination issues we also work on multi-agent path planning which is not addressed by many of the teams. 2 Agent Architecture 2.1 Task Representation Task definition is the key for a successful framework because it forms the basis of agent abilities in the market. The properties of a task in our framework are discussed in the following sections. Task Constraints Resource constraints are various for a system due to the fact that there are many different objectives to maximize many resources such as time and fuel used by the agents. For heterogeneous robots, the robot type and capabilities are also a type of resource constraint. Besides the resources for a multi agent system there are also coordination and synchronization constraints for tasks. The tasks may depend on the completion, the start and/or end time of another task. We propose a task decomposition as a graph that represents these properties. Task decomposition is used by police force agents to sell their tasks to other agents to minimize the total time of path clearance. Task Planning Single item auctions, however, are not sufficient for ensuring system optimality. Single item exchanges between robots with or without money lead to poor and suboptimal solutions in some but apparently possible cases. A simple scenario is depicted in Figure 1. The robots auction for the tasks which costs the lowest for them. Therefore after the first auction round R1 adds B1 to its task list and R2 adds B3 to its task list. In the second round R1 will add B2 because it becomes the best alternative for the task. After all the targets are finished B2 will not be exchanged because it is best handled after B1. The auction rounds are finished when all the agents are assigned to tasks. But after that single exchanges cannot remedy the problem. The solution is allowing some of the agents not to be assigned to any task as our work suggests. We propose that by taking a simple plan into account while bidding in auctions, the agent is capable of exchanging multiple items in single item auctions.

3 3 The proposed approach is implemented in multi-robot exploration task. There are many research on the exploration task because of its generality. Centralized solutions do not satisfy the communication and robustness requirements. All agents communicate with the center which introduces the single point of failure problem; moreover if the communication with the center is lost or is noisy, the performance degrades sharply even to non functioning level. The exploration task must be completed in any condition even only one robot survives. The majority of the research is based on single item auctions [3] [4]. In [3] initially, all targets are unallocated. The robots bid on all unallocated targets. The bid for each target is the difference between the total cost for visiting the new target and all targets and the total cost for visiting only the targets are already allocated to the robot. These total costs are computed using a TSP insertion heuristic. The robot with the overall lowest bid is allocated the target of that bid and then is no longer allowed to bid. The auction continues with the remaining robots and all unallocated targets. After every robot has won one target, all robots are again allowed to bid, and the procedure repeats until all targets have been allocated. Finally, single targets are transferred from one robot to another, starting with the target transfer that decreases the total cost the most, until no target transfer decreases the total cost any longer. Fig. 1. A task allocation scenario. Golfarelli et al [5] clusters the tasks and assigns them to agents. Money is not defined in the system so the agents are only allowed to swap tasks to increase the system performance. Although an increase in the performance is achieved, the system is not optimal. However, historical record shows us that if people could have met all their needs by barter, money would not have been invented. In combinatorial auctions the targets are auctioned in combinations so as to minimize the cost by grouping the targets and allocating in an efficient way. One of the implementations [6] uses GRAP HCU T algorithm to clear auctions. The major drawback of the combinatorial auction method is its time complexity. In [7] the GRAP HCU T is outperformed by an algorithm called P rimallocation. Although the main idea was to demonstrate the Prim allocation they represent another one which is called Insertionallocation. In Prim allocation each robot only submits its best (lowest) bid to the auctioneer, since no other bid has any possibility of success at the current round. The auctioneer collects the bids

4 4 and allocates only one target to the robot that submitted the lowest bid over all robots and all targets. The winning robot and the robots that placed their bid on the allocated target are notified and are asked to resubmit bids given the remaining targets. The bids of all other robots remain unchanged. The auction is repeated with the new bids, and so on, until all targets have been allocated. The insertion allocation is same as prim allocation except that the agent generates a path to its selected target and bid accordingly. The results of our studies show that both prim and insertion allocation is better than combinatorial auction implemented by GRAP HCUT. 1. check whether the target is reached 2. plan current tasks 3. bid for the lowest cost item 4. if auction won allocate task 5. else (a) if deadlock detected solve according to total plan cost (b) else stay Fig. 2. Market plan algorithm pseudo code Another approach which is in fact a combination of single auctions and combinatorial auctions is clustering [8] and task decomposition techniques [9]. In Dias et al s work [8], the tasks are clustered and traded as clusters therefore partially eliminating the inefficiencies of single item exchanges. The size of the clusters is the main problem and even the clusters are traded. The approach only removes the intra-cluster inefficiencies however the inter-cluster trading is still achieved as single bid auctions. Dias further defines opportunistic centralization where a leader role coordinates a team in a centralized fashion in order to increase the system s performance. But this approach is limited by communication quality. The task decomposition and clustering have a common point that both try to trade more than one task at a time. Zlot et al [9] decompose a task into subtasks as an AND-OR tree and any branch can be traded at any level. The decomposition s communication requirements are high and there is no general way to decompose tasks. The agent simply plans a route that covers all the known targets by using the simple TSP insertion heuristic. Each agent auctions for the lowest cost target which is in fact the closest target to the agent. Other agents bid in the auction according to their plan cost. The plan cost is the distance between the agent and target if the target is the closest target or the distance to the previous target in the plan. For example, In Figure 1 R1 constructs the path and its cost as B1:6, B2:4 and B3:8, in total 18. R1 constructs the path and its cost as B3:4, B1:4 and B2:4, in total 12. Both agents start auctions for targets which are closest to them, B1 and B3 respectively. R1 looses the auction because the R2 bids 4 for

5 5 B1 whereas R1 bids 6. B1 stays for this time step. R2 wins the auction because it bids 4 for B3 and R1 bids 8. Apparently to solve the problem optimally we need to stop R1 if time is not our first priority.the pseudo code for the algorithm is given in Figure 2. We tested our approach against some common market models and the results were encouraging. We have also implemented it for police forces and ambulances. We plan to collect statistical data in the rescue environment. Task Opportunities We propose for each task type, a list of possible task types that can be executed simultaneously be executed to be defined. Before bidding the agent checks its task list and tries to find a task opportunity that can increase its revenue, i.e. decrease the cost of taking a task. The benefit of this approach is the same as task decomposition which is trading for a task where the combination of tasks is planned behind the scenes and implementing such scheme without using complex task models in the framework. The default task opportunity for a task type is itself. Therefore task opportunities are the common version of the plan based bids paradigm. However its implementation is easier since each task type is encapsulated and complexity is reduced. A police force can remove other blockades on its path to a specific blockade target. This is an example of plans. In contrast, an agent can explore buildings on the way to the blockade and while bidding it can offer less cost to the system. This scenario represents task opportunities. Cost and Revenue Each task has a cost based on the resource requirements. However representations that depend on only the cost is not sufficient for multitasking environments for an agent may only minimize its costs but this does not yield maximization of the overall profit. Therefore the revenue for each task must also be defined and used by the system. In a market economy every asset has its price and the price is determined by the market mechanisms like supply and demand. So the price is not static and it is also subject to change according to the changes in the environment. Therefore revenue is not static. In multi tasking dynamic environments where tasks are assigned in time extended fashion any task can be abandoned if its cost is affordable. As a result the cost is not only a function of the resource usage but also the opportunities that the agent is missing. Our profit formalization is as follows: P rofit i = revenue i - (resourceusagecost) i - (opportunitycost) i for tasks when agent is idle P rofit i = revenue i - (resourceusagecost) i - (opportunitycost) i - (cancellationcost) j for tasks when agent is when agent is executing the task j Task Notifications All the tasks which the agent can execute or can be a part of are stored by all agents. Therefore no task is left unexecuted because the task is never lost due to robot failures or communication errors. But the agents cannot store the tasks forever even the task is finished or expired. The

6 6 task status also needs to be synchronized. Task notifications are defined for each task and sent when necessary. 2.2 Task Execution Preconditions A task is valid for some time frame, a world state or simply still exists in the environment. These preconditions must be checked for each time step. If an inconsistency is discovered other agents are notified as described in Task Allocation and Coalition Formation Coalitions are necessary for most of the heterogeneous agent domains. Because of the diversity of the robot capabilities, one agent is not be able to accomplish a task alone. The agent is single when the task is introduced to it. The task can be categorized as single task ST which only one agent can be able to achieve it or MT which a coalition should be formed. ST case: The task is auctioned and one of the agents wins the auction. After this part if the task is decomposable the winner agent starts auctions for sub tasks. A coalition is formed if any part of the task is sold. MT case: The tasks are auctioned depending on the task graph and a single task is reached. Then the subtasks are treated as single task case. Both cases are almost identical and show that the tasks can be decomposed and auctioned iteratively. For ST tasks there is another case in which the coalition emerges from the needs of heterogeneous agent teams. The task types are predefined as ST or MT but while executing a ST a task that must be done by other types of agents can be discovered that blocks the agent and prevents it from doing the task. For example, while trying to reach a target a block can be discovered on the road and unluckily on the only path to the target. In this case, the agent requests help from other agents by auctioning as in two cases. We call this dynamic coalition formation and the required task is added to the task graph of the requiring task. 2.4 Roles The agents prepare auctions, bid for the auctions and execute the tasks won. All these roles are general for auction based frameworks. An additional leader role is defined for our framework. Leader: The coordination of the auctions requires high quality and high band-with communication. If all the agents are allowed to trade as they want, communication becomes the bottleneck for the system, especially when the communication channel is limited. Therefore, we use a leader role to minimize the communication requirements by communicating the robots within the coalition and inform them accordingly and act as the interface of the coalition

7 7 with regard to other agents. So the leader must exist in order to minimize inter and intra agent communication. For example, for communication types where the range is important the agent which can communicate with most of the agents is selected as the leader. Since the leader is the most knowledgeable agent it can also be used for centralized planning of small tasks. The leader may or may not execute a task and this is not a constraint for the leader. Task Executer: Task executer acts in a coalition and communicates with the leader for the changes in the revenue function and executes the task. Task executer can join any other auction to increase its revenue. All agents are task executers even they are not assigned to any task because the default rest task is executed whenever the agent is idle for all agents and shows that the agent is alive. Bidder: Bidder is a hypothetical entity which is in fact used for representation purposes to ease the understanding of the framework. Since each agent is executing a task all the time there is definition of an idle agent. The bidder is the name of the task executer which is participating in an auction. Auctioneer: The auctioneer role manages the auction process for the agents. Just like the bidder, this role is also hypothetical because an agent can start an auction any time for a task which is not in the scope of the coalition. 2.5 Layered Architecture Layers define abstractions between the framework entities and minimize complexities due to coupled designs. Our framework is composed of three layers namely trading, scheduling and behavioral. The trading layer is responsible for auctions and relevant communication related to them. The scheduling layer is responsible for selecting the most profitable action in the task list, considering synchronizing issues like suspending or revoking of tasks. The assessments are necessary for dynamic environments where the state and the profitability of the tasks are rapidly changing. Assessment involves current task list and revenue cost models of the task in order to estimate the profit. Tasks are executed by means of robotic behaviors. The behaviors can be basic or complex. The robot s interaction with the world is achieved through behaviors. All layers have access to message queue directly. However for environments in which the message bandwidth is limited the queue acts as a priority queue.the framework is depicted in Figure Learning In many applications simple cost functions suffice the needs of the domain. However, in highly dynamic environments where the tasks are assigned in time

8 8 Fig. 3. Layered architecture of the framework. extended fashion some of the tasks need to be canceled, postponed according to other opportunities in the environment. Our main focus in learning will be learning bidding functions and opportunity costs of the task by reinforcement learning. But we also consider other learning issues in the layers. 3 Path Planning Since the performance of planning module depends on the performance of underlying path planning module, we are developing an efficient and robust path planning library for RoboAKUT. Since the disaster space consists of nodes, we mainly deploy our graph solving library project, path4j, which is under development for the robotics applications in our laboratory. Although the structure of the roads on the disaster space forms a relational graph, it is not a good idea to use each node on the disaster space as a node. Therefore, we introduce two new concepts, main road and main node. A main node is a node which has different number of roads than two. For example, for a dead-end road, the final node has only one road, we choose this node as a main road because it is an endpoint for our relational graph. If we consider a crossroad node, there are at least three roads connected to the center node. This node is again a main node for us because it represents a node with a node degree greater than two. Actually, there are no nodes with degree two in our relational graph, which consists of main nodes. Similarly, the main rode is the collection of roads and nodes between adjacent main nodes. Since the complexity of the complete path planning algorithms increases with the number of nodes for relational graph, this technique significantly reduces the computational cost. In addition to global path planning, a local path planning strategy should also be considered for the nodes with in a main road. However, this task is not as complex as finding the global and optimum path since the number of nodes are quite small in a main road. The decomposition of the path planning task is given in Figure 4. The global path planner is responsible for finding a complete path in the relational graph

9 9 of the disaster space. After deciding the next step, which is the next main road that should be taken, the local path planning module tries to find a way from starting main node to the end node. If it fails it simply signals the global path planning module to find an alternative route. Fig. 4. The decomposition of path planning task. 4 Team Implementation Priorities define the priorities of the entities of the same kind. The highest priority of the task is selected as the candidate for that action type: Platoon agents and center agents both implement the same auctioneer role to control the task assignment. Our preference is using the center however without implementing the role for ambulance teams the system is subject to single point of failure because of the ambulance center. The platoon agents leave the auctioneer role to the center if they detect the failure of the center or even it does not exist. Since the auction mechanism is based on messages, due to message delivery time our agents start auction process before finishing the current task. For example, if the auction is among the same type agents, in the first step an agent opens the auction, in the second step all agents who want to participate in the task bid. Depending on the communication method implementation the agents can clear the auction themselves if messages are broadcasted or the auctioneer clears the auction and awards the winner(s) in unicast/multicast messaging cases. If the agent is idle it selects to operate the task first while participating in the auction. 4.1 Ambulance Team and Ambulance Center Rescue Civilians Civilian information is acquired by the auctioneer. The civilian is categorized as helpless, will die soon, will die and will live according to time to live prediction and closeness to disaster area. For the civilian classification, civilian data dimension is decreased to one using PCA and classified using a Bayesian classifier. The life time estimation is based 4 th order polynomial fit

10 10 of the PC data. A civilian which cannot be rescued even if all the agents participate in the operation is classified as helpless and they are neglected. The will live and will die category are not taken into account unless all the buildings are explored. We plan to calculate the opportunity cost of rescuing a civilian in these categories by using reinforcement learning. All agents bid for the civilian which is the one that we can rescue more civilians if we start with him. The bid is the travel cost for an idle agent. If the agent is currently processing a rescue operation and it estimates that it has enough time to participate in the new task and continue its operation thus the total revenue will increase in addition to travel cost agent uses the cancellation cost which is again the travel cost is the bid. The auctioneer clears the auction by deciding the number of agents by calculating the minimum agents required for the task. If resources are not sufficient the civilian is queued for re-auctioning. Canceled rescue operations are re-auctioned immediately. Help Agent Helping an agent is the highest priority task. However if the ambulance is rescuing a civilian which will be died soon and the agent is not under life threat the action is postponed until the current task finishes. Although this task encountered rarely is one of the hardest tasks to behave optimally. We think that this choice is a very good example of opportunity task and plan to train the agents on this issue. 4.2 Police Force and Police Center Clear Route Clear route requests are initiated by platoon agents when they are not able to reach a target. The message includes the last position of the agent (AP) and the target (TP). The clear route task must be completed immediately after it is issued however the time can only be minimized by assigning more than one agent to the task. Since we want to minimize the time of the total operation the cost is the total traveling time and the total time required to clear the road. If the police force agent is executing a clear route process the new request is merged into the current plan. The new task can be executed before the current one, after the current one or intersecting with the current one. Our agents try to minimize time first by concurrently executing the tasks via task merging process. The costs are calculated after adding the route to the plan. After the task is assigned to the agent, the task is decomposed into subtasks and re-sold in the market as the second phase of time minimization. The auction for the task includes the clearance cost of the AP-TP route, TP-AP route and half way cost from AP-TP route. TP-AP half way can be calculated by subtracting first cost from the second cost and adding the third cost. We select the smallest half way costs which the maximum of the both gives the completion time of the action. If any one of the half-way cost is greater than any of full task assignment case cost (the first and the second cost) we assign the task to one of the agents that bids the smallest value for either first or the second route. Agents try to sell their tasks by decomposing it to smaller pieces by half-way upon the changing costs of previously added tasks.

11 11 Clear Blockade Clear blockade is a low priority task therefore the bidding schema designed to utilize less agent to the task compared to other police tasks. The cost for the blockade is the travel and clearance cost. In accordance to our planning paradigm the agent bids for the blockade according to its plan. As we previously mentioned this approach decreases the total cost by assigning most of the tasks to one of the agents if the targets are very close. Therefore our agents have time to do other tasks. Moreover a reward of 5 is given for each blockade removed to allocate the tasks on only some of the agents. The target is selected by Floyd algorithm based priority, notified count and the count of clear route tasks which the block is included. Help Agent The highest priority task for the police force is assigned by single bid auction to the closest agent. The bid is the time needed to rescue the agent. If the police is executing a help agent process it generates a plan of the agents and bid accordingly. 4.3 Fire Brigade and Fire Station 4.4 Extinguish Buildings Fires are very hard to extinguish which need effective collaboration among fire brigades. The first problem is to find the right building to extinguish. In our implementation we define a fire risk map of the environment. The environment is divided into small regions and two variables from threat and to threat are calculated for each time step. A region at the corner of the map is less dangerous than a region at the center because it threats less area. A fire threats its neighbors more than far regions of the environment. The total from threat is the measure of the region threat to the environment. On the other side, the total threat that affects the region is called to threat. The target building is selected first by selecting the target region which is the minimum threatened region that threats the other regions highly. The buildings that are likely to be extinguished easily which reside in the region boundary are selected as target. The agents are distributed among regions by single auction method. After a fire brigade is assigned to a region it is free to select the best building to extinguish because the center is updated with a delay so it is useful for global planning not for specific actions taken locally. Fire brigade selects the building at the edge of the fire region which has the most probability of successful extinguish. 4.5 Common Tasks Exploration Exploration of the buildings is very important for a successful rescue operation. The civilians are injured and buried in their houses but some of them are vital whereas some are not. A good exploration should find all vital damaged civilians before they die. The only way is to explore as many buildings as possible in the short time. The map is divided into regions and the each region

12 12 is assigned to at least one agent. The exploration task uses plan based cost values to bid the unexplored buildings. By this way some of the agents are cannot win the auctions if they are far from the region. In such cases agent request for task sharing and the task allocated agent auctions the last building in its plan by bidding the total plan cost for it. This can be seen as a different type of task decomposition. The buildings are ordered according to their priority which is a function of civilian existence probability which increases by the number of civilians found in the vicinity, closeness to disaster areas. 5 Conclusion Instead of using a multi-queue scheduling algorithm where each task has a predefined priority and they cannot be changed during the simulation as was the case in Robocup 2005, RoboAKUT 2006 is based on market based agents which can utilize resources effectively even under dynamic conditions by using new approaches such as bidding according to plan developed generally for all market applications. We also concentrate on global and local path planning algorithms for multi-agent systems. References 1. M. B. Dias, R. M. Zlot, N. Kalra, and A. T. Stentz. Market-based multirobot coordination: A survey and analysis. Technical Report CMU-RI-TR-05-13, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, April B. Gerkey and M. Mataric. A formal analysis and taxonomy of task allocation in multi-robot systems. International Journal of Robotic Research, 23(9): , R. Zlot, A. Stentz, B. Dias, and S. Thayer. A free market architecture for distributed control of a multirobot system. In Proceedings of the International Conference on Intelligent Autonomous Systems, pages IEEE, B. Gerkey and M. Mataric. Sold! auction methods for multi-robot coordination. IEEE Transactions on Robotics and Automation, 18(5): , M. Golfarelli, D. Maio, and S. Rizzi. Market-driven multirobot exploration. In Proceedings of the UK Planning and Scheduling SIG Workshop, pages 69 82, M. Berhault, H. Huang, P. Keskinocak, S. Koenig, W. Elmaghraby, P. Griffin, and A. Kleywegt. Robot exploration with combinatorial auctions. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages IEEE, M. Berhault, M. Lagoudakis, P. Keskinocak, A. Kleywegt, and S. Koenig. Auctions with performance guarantees for multi-robot task allocation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, September M. B. Dias and A. T. Stentz. Opportunistic optimization for market-based multirobot control. In IROS 2002, page , September R. M. Zlot and A. T. Stentz. Complex task allocation for multiple robots. In Proceedings of the International Conference on Robotics and Automation. IEEE, April 2005.