A Principled Study of the Design Tradeoffs for Autonomous Trading Agents


Ioannis A. Vetsikas*
Computer Science Dept., Cornell University
Ithaca, NY 14853, USA

Bart Selman
Computer Science Dept., Cornell University
Ithaca, NY 14853, USA

ABSTRACT

In this paper we present a methodology for deciding the bidding strategy of agents participating in a significant number of simultaneous auctions, when finding an analytical solution is not possible. We decompose the problem into sub-problems and then use rigorous experimentation to determine the best partial strategies. In order to accomplish this we use a modular, adaptive and robust agent architecture combining principled methods and empirical knowledge. We applied this methodology when creating WhiteBear, the agent that achieved the highest score at the 2002 International Trading Agent Competition (TAC). TAC was designed as a realistic complex test-bed for designing agents trading in e-marketplaces. The agent faced several technical challenges. Deciding the optimal quantities to buy and sell, the desired prices and the time of bid placement was only part of its design. Other important issues that we resolved were balancing the aggressiveness of the agent's bids against the cost of obtaining increased flexibility, and the integration of domain-specific knowledge with general agent design techniques. We present our observations in dealing with these design tradeoffs and back up our conclusions with empirical results.

Categories and Subject Descriptors
I.2.11 [ARTIFICIAL INTELLIGENCE]: Intelligent Agents

General Terms
Design, Experimentation

Keywords
Agent-Mediated Electronic Commerce, Bidding Agents, Bidding Strategies, Electronic Marketplaces, Simultaneous Auctions

1. INTRODUCTION

Auctions are becoming an increasingly popular method for transacting business, either over the Internet (e.g. ebay) or even between

* This is the primary and also corresponding author.
Funded in part by DARPA contract F

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AAMAS'03, July 14-18, 2003, Melbourne, Australia.
Copyright 2003 ACM /03/ $5.00.

businesses and their suppliers. While a good deal of research on auction theory exists, this is mostly from the point of view of auction mechanisms (for a survey see [9]). Strategies for bidding in an auction for a single item are also known. However, in practice agents (or humans) are rarely interested in a single item.¹ They wish to bid in several auctions in parallel for multiple interacting goods. In this case they must bid intelligently in order to get exactly what they need. For example, a person may wish to buy a TV and a VCR, but if she does not have a flexible plan, she may only end up acquiring the VCR. Goods are called complementary if the value of acquiring both together is higher than the sum of their individual values. On the other hand, if that person bids for VCRs in several auctions, she may end up with more than one. Goods are called substitutable if the value of acquiring two of them is less than the sum of their individual values. There have been relatively few studies about agents bidding in multiple simultaneous auctions, and they mostly involve bidding for substitutable goods (e.g. [1], [11], [8]). In this and other related work, such as [7], researchers tested their ideas about agent design in smaller market games that they designed themselves. Time was spent on the design and implementation of the market, but there was no common market scenario that researchers could focus on and use to compare strategies.
In order to provide a universal test-bed, Wellman and his team [17] designed and organized the Trading Agent Competition (TAC). It is a challenging benchmark domain which incorporates several elements found in real marketplaces in the realistic setup of travel agents that organize trips for their clients. It includes several complementary and substitutable goods traded in a variety of different auctions. However, instead of bidding for bundles of goods and letting the auctioneer determine the final allocation that maximizes its income (as in the combinatorial auction mechanism²), in this setting the computational cost is shifted to the agents, who have to deal themselves with the complementarities and substitutabilities between the goods. In order to tackle this problem, we investigate a methodology for decomposing the problem into several subparts and use systematic experimentation to determine the strategies that work best for the problem in question. We chose TAC as our test-bed and we implemented our methodology in WhiteBear, the agent that achieved the top score (1st place) in this year's (2002) TAC. An earlier version of WhiteBear, which was not tuned using the methodology described in this paper, reached 3rd place in last year's competition.³ Our goal is to provide a scalable and robust bidding agent that incorporates

¹ Even on ebay one looks for the same good in several auctions!
² For instances of this problem and algorithms to determine the winners, see [4], [12], [18], [10].
³ It was during this phase that a small number of parameters (e.g. the increment to be added to the current hotel prices when bidding) was determined. During the experiments we only experimented with the different strategies and did not really change these parameters.

principled methods and empirical knowledge. As part of our experimentation, we studied design tradeoffs applicable to many market domains. We examined the tradeoff of the agent paying a premium, either for acquiring information or for having a wider range of actions available, against the increased flexibility that this information provides. We also examined how agents of varying degrees of bidding aggressiveness perform in a range of environments against other agents of different aggressiveness. We find that there is a certain optimal level of aggressiveness: an agent that is just aggressive enough to implement its plan outperforms agents that are either not aggressive enough or too aggressive. We will show that the resulting agent strategy is quite robust and performs well in a range of agent environments. We also show that even though generating a good plan is crucial for the agent to maximize its utility, it may not be necessary to compute the optimal plan. In particular, we will see that a randomized greedy search method produces plans that are close to optimal and of sufficient quality to allow for effective bidding. The use of such a randomized greedy strategy results in a more scalable agent design. Overall our agent is adaptive and robust. Moreover, it appears that its design elements are general enough to work well under a variety of settings that require bidding in a large number of simultaneous auctions. The paper is organized as follows. In section 2, we give the definition of the general problem to which our methodology is applied and the rules of the TAC market game. In section 3, we present our methodology and how it is applied to the TAC domain. In section 4 we explain in detail the controlled experiments needed in order to implement our methodology and to study the agent design tradeoffs. In section 5 we present the results and our observations from the TAC competition. Finally, in section 6 we discuss possible directions for future work and conclude.

2. TRADING GOODS IN SIMULTANEOUS AUCTIONS

We first present the problem setting in section 2.1. In section 2.2, we present the description of the TAC game together with the reasons why this game captures many of the issues of the general problem setting.

2.1 General Problem Setting

The general problem setting that we deal with involves several autonomous agents which wish to trade commodities in order to acquire the goods that they need. There is a predefined time window during which the trades can take place (defining the duration of each "game"), after which each agent calculates the payoff to itself. The agents are not allowed to cooperate in any explicit way (even though implicit cooperation might arise from their behavior) and they are also assumed to be self-interested. In particular, each agent i is trying to maximize its own utility function U_i(θ_i, C_i, t_i), where θ_i is the type of the agent (parameters selected randomly from a given distribution that influence the utility function), C_i is the set of commodities that the agent owns, and t_i is the net monetary transfer, that is, the algebraic sum of payments for selling goods minus the cost of buying goods. In most cases we can assume that the utility is quasilinear, i.e. linear in the monetary transfer t_i; thus U_i(θ_i, C_i, t_i) = u_i(θ_i, C_i) + t_i. The combination of goods C_i owned should include several complementary and substitutable goods in order for the game to be interesting; otherwise one might be able to find an equilibrium to the game analytically. The mechanism used in order to exchange the various commodities is a set of several different auctions, during which each unit of a certain commodity is traded in exchange for a monetary payment. We will assume that there is no discriminatory pricing in these auctions, which means that if two agents wish to buy the same good at the same time they will have to make the same payment. We will also assume that similar goods are sold in auctions with similar rules⁴.
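As a small illustration of the quasilinear form U_i(θ_i, C_i, t_i) = u_i(θ_i, C_i) + t_i and of complementary goods, consider the following sketch. The function names and the TV/VCR valuation numbers are invented for illustration; they are not part of TAC or of the paper's implementation.

```python
# A toy sketch of the quasilinear utility defined above:
# U_i(theta_i, C_i, t_i) = u_i(theta_i, C_i) + t_i.
# The valuation below is a made-up example of complementary goods.

def quasilinear_utility(valuation, goods, net_transfer):
    """Value of the goods owned plus the net monetary transfer
    (payments received for sales minus payments made for purchases)."""
    return valuation(goods) + net_transfer

def tv_vcr_value(goods):
    """TV and VCR are complementary: the pair is worth more than
    the sum of the individual values (120 + 100 < 300)."""
    goods = set(goods)
    if {"tv", "vcr"} <= goods:
        return 300
    return 120 * ("tv" in goods) + 100 * ("vcr" in goods)

# Buying both for a total of 180 yields utility 300 - 180 = 120.
u = quasilinear_utility(tv_vcr_value, {"tv", "vcr"}, -180)
```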
Other than that, we allow the auctions to have a wide variety of rules, the most important of which are:

1. Agents may act as buyers only, sellers only, or both in the auctions, so we can have single-sided and double-sided auctions. If, for example, they act only as buyers, then an external source would have to provide (input) goods into the system and remove money from it.
2. There can be a finite or an infinite number of units of each commodity. In the extreme case, only 1 unit exists.
3. Auctions can clear continuously, several times, or only once. The first case means that trades can take place at any time, while in the last, trades take place exactly once, when the auction closes.
4. Auctions can close at set, known times, or at unspecified times (e.g. determined by random parameters).
5. Clearing prices can be determined only by the bids of the agents, or by external parameters as well (e.g. set by an external seller). Pricing in the first case in particular can follow any pricing scheme (e.g. N-th or (N+1)-th highest bid, where N is the number of identical goods for sale in the auction).

The question we are interested in answering is what bids to place at each auction. There are therefore 3 main parameters to determine: the quantity of each good to be bought, the prices offered for each individual unit, and the times at which the bids are placed.

2.2 TAC: A Description of the Market Game

The TAC game encapsulates most of the issues of the general problem and is thus an appropriate test-bed for evaluating our agent design. Each auction has rules which cover the various options discussed in the previous section: some auctions are single-sided and others double-sided, some offer a finite and some an infinite number of identical goods, some clear continuously and others only once, some close at preset times and some at random times, and some auction clearing prices are determined by the agents' bids and others by outside sellers.
There are 28 auctions running in parallel (and in fact our strategies and methodology scale well and would also work for a larger TAC game with many more auctions). This setting is too complex to allow for analytical derivation of equilibrium (or optimal) strategies. A number of different tradeoffs are present in this game, which makes the determination of an appropriate bidding strategy a difficult design problem. For a detailed analysis of these tradeoffs, see section 3.2. The TAC setting is designed to model a realistic marketplace setting, as might be encountered by, for example, a travel agent. It also includes several complementary and substitutable goods and a complex utility function. In the Trading Agent Competition, an agent competes against 7 other agents in each game. Each agent is a travel agent with the goal of arranging a trip to Tampa for CUST = 8 customers. To do so it must purchase plane tickets, hotel rooms and entertainment tickets. Each good is traded in separate simultaneous online auctions. The agents send their bids to the central server and are informed about price quotes and transaction information. Each game lasts for 12 minutes (720 seconds). For the full details of the mechanism and the rules of TAC, see the TAC web site.

Plane tickets (8 auctions): There is only one flight per day each way, with an infinite supply of tickets, which are sold in separate, continuously clearing auctions in which prices follow a random walk. For each flight auction a hidden parameter x is chosen. The prices tend to increase every 30 seconds or so, and the hidden parameter influences the change. The agents may not resell tickets.

Hotels (8 auctions): There are only two hotels in Tampa: the Tampa Towers (cleaner and more convenient) and the Shoreline Shanties (the not so good and expected to be the less expensive hotel). There are 16 rooms available each night at each hotel. Rooms for each of the days are sold by each hotel in separate, ascending, multi-unit, 16th-price auctions with price quotes announced periodically. A customer must stay in the same hotel for the duration of her stay, and placed bids may not be removed. One randomly selected auction closes at each of minutes 4 to 11 (one each minute, on the minute). No prior knowledge of the closing order exists, and agents may not resell rooms they have bought.

Entertainment tickets (12 auctions): They are traded (bought and sold) among the agents in continuous double auctions (stock market type auctions) that close when the game ends. Bids match continuously. Each agent starts with an endowment of 12 random tickets, and these are the only tickets available in the game.

⁴ We make these assumptions because this usually makes reasoning about bidding strategies easier. However, there are several cases in which our methodology would work even if discriminatory pricing exists or similar goods are sold in auctions with different rules.

The type of each agent is determined by the preferences of its clients. Each customer i has a preferred arrival date PR_i^arr and a preferred departure date PR_i^dep. She also has a preference for staying at the good hotel, represented by a utility bonus UH_i, as well as individual preferences for each entertainment event j, represented by utility bonuses UENT_{i,j}. The parameters of customer i's itinerary that an agent has to decide upon are the assigned arrival and departure dates, AA_i and AD_i respectively, whether the customer is placed in the good hotel, GH_i (which takes value 1 if she is placed in the Towers and 0 otherwise), and ENT_{i,j}, which is the day that a ticket of event j is assigned to customer i (this is e.g.
0 if no such ticket is assigned). Let DAYS be the total number of days and ET the number of different entertainment types. The utility that the travel plan has for each customer i is:

\[
util_i = 1000 - 100\left(\left|PR_i^{arr} - AA_i\right| + \left|PR_i^{dep} - AD_i\right|\right) + UH_i \cdot GH_i + \sum_{d=AA_i}^{AD_i} \max_j \left\{ UENT_{i,j} \cdot I(ENT_{i,j} = d) \right\} \tag{1}
\]

if 1 ≤ AA_i < AD_i ≤ DAYS; otherwise util_i = 0, because the plan is not feasible. It should be noted that only one entertainment ticket can be assigned each day, and this is modeled by taking the maximum utility from each entertainment type on each day. We assume that an infeasible plan means no plan (e.g. AA_i = AD_i = 0). The function I(bool_expr) is 1 if bool_expr = TRUE and 0 otherwise. The total income for an agent is equal to the sum of its clients' utilities. Each agent searches for a set of itineraries (represented by the parameters AA_i, AD_i, GH_i and ENT_{i,j}) that maximize this profit while minimizing its expenses.

3. OUR PROPOSED METHODOLOGY AND THE AGENT ARCHITECTURE

In the first year that the TAC was organized (2000), the auction rules did not introduce any real tradeoffs in the design of the agent. A dominant bidding strategy was found by some of the top-scoring agents: buy everything at the last moment and bid high (the marginal utility) for hotel rooms [5]. This happened because the hotel auctions were not closing at random intervals and the prices of plane tickets remained approximately the same over time. Therefore most top-scoring teams concentrated on solving the optimization problem of maximizing the utility, since bid prices and bid placement times were not an issue. The rule changes in the 2001 TAC introduced the tradeoffs that made the game interesting. Most teams decided to use a learning strategy (e.g. ATTac, the winner of the 2000 TAC, used a boosting-based method [14]) in order to predict the prices at different times, and also decided that decomposing the problem completely was probably not in their best interest, as there are obvious dependencies among the quantity, the price and the placement time of each bid.
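Returning to the customer utility of Eq. (1) in section 2.2, a minimal sketch of the computation follows. All names are illustrative, and treating every day from AA_i through AD_i as eligible for an entertainment bonus is an assumption of this sketch.

```python
# A minimal sketch of the per-customer utility of Eq. (1).
# Names are illustrative; counting days AA..AD for entertainment
# bonuses is an assumption of this sketch.

def customer_utility(pr_arr, pr_dep, uh, uent, aa, ad, gh, ent, days):
    """uent: bonus UENT_{i,j} for each event type j;
    ent: assigned day ENT_{i,j} for each event type j (0 = unassigned);
    gh: 1 if the customer is placed in the good hotel, else 0."""
    if not (1 <= aa < ad <= days):
        return 0  # infeasible plan
    util = 1000 - 100 * (abs(pr_arr - aa) + abs(pr_dep - ad))
    util += uh * gh  # bonus for the good hotel
    # only one ticket counts per day: take the best event assigned to it
    for day in range(aa, ad + 1):
        util += max((uent[j] for j in ent if ent[j] == day), default=0)
    return util

# Perfect dates, good hotel (bonus 100), one event worth 50 on day 2:
u = customer_utility(2, 4, 100, {"AW": 50, "AP": 80},
                     2, 4, 1, {"AW": 2, "AP": 0}, days=5)  # u == 1150
```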
Given that our research interests involve bidding strategies for agents and experimentation on how different agent behaviors influence the behavior of the multi-agent system, we decided to investigate in the opposite direction and further decompose the problem. The high-level description of the methodology we propose is:

A. Decompose the problem into subproblems:
   1. Decide the quantities to buy, assuming that everything will be bought at current prices (optimize utility)
   2. For each different auction type (and good) do:
      a. Determine boundary partial strategies for this auction
      b. Generate intermediate strategies. Main approaches:
         - combine the boundary strategies, or
         - modify them using empirical knowledge from the domain

B. Use rigorous experimentation to evaluate partial strategies:
   1. Keep other partial strategies fixed if possible
   2. Experiment with different mixtures of agents as follows:
      a. Keep fixed the agents using intermediate strategies
      b. Vary the number of agents using boundary strategies
   3. Evaluate differences in performance using statistical tests
   4. Determine the best strategy overall (in all possible mixtures)

The first part of our methodology requires little further explanation; for each different auction type, which corresponds to a different commodity type, we compute (usually analytically, but sometimes also using domain knowledge) the possible boundary partial strategies.⁵ We then combine parts of the boundary strategies, or modify some of their parts, to form intermediate strategies that behave between the extreme bounds (e.g. if one boundary strategy would place a bid at price p_low and the other at price p_high in a certain case, then the intermediate strategy should place its bid at a price p with p_low ≤ p ≤ p_high). For the specifics of how this part of the methodology is applied to the TAC domain, see section 3.2.
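Part B of the methodology can be sketched as a simple experiment schedule: hold the intermediate-strategy agents fixed and sweep the remaining slots from all low-bidding to all high-bidding boundary agents. The function and strategy names, the slot counts, and the `run_games` placeholder are illustrative assumptions, not actual TAC tooling.

```python
# Hypothetical sketch of part B: with N_FIXED intermediate-strategy
# agents held constant, sweep the remaining slots from all "low"
# boundary agents to all "high" boundary agents, one step at a time.

TOTAL_AGENTS = 8   # a TAC game has 8 agents
N_FIXED = 2        # agents using the intermediate strategy under test

def experiment_schedule():
    """Yield one agent mixture per experiment."""
    boundary_slots = TOTAL_AGENTS - N_FIXED
    for n_high in range(boundary_slots + 1):
        yield (["intermediate"] * N_FIXED
               + ["high"] * n_high
               + ["low"] * (boundary_slots - n_high))

for mix in experiment_schedule():
    pass  # e.g. scores = run_games(mix); then apply statistical tests
```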
The quantities placed in each bid are determined independently by maximizing the utility of the agent, assuming that all the goods are bought at some predicted prices and that every unit will be bought instantly. A dedicated module called the planner does this task. The planner for the TAC problem is described in section 3.1. The second part of our methodology deals with the way experiments are run in order to determine the best combination of partial strategies. Each set of experiments is designed to evaluate the partial strategies in different mixes of agents. Determining the mixture of agents is an issue of paramount importance in multi-agent systems, since the performance of each strategy depends on the competition offered by the other agents. In order to explore the whole spectrum of mixtures, we propose to keep a fixed number of agents who are using the intermediate strategies, while systematically changing the mixture of agents using the boundary cases. For example, we start with all boundary-strategy agents using the low-bidding strategy, and in each subsequent experiment replace some of these with others using the high-bidding strategy, until in the last experiment we have only the latter type. This explores sufficiently the different multi-agent environments that the agents can participate in, since the behavior caused by the intermediate strategies is within the bounds of the behavior caused by the boundary strategies. For more details about how this helps us determine the experiments to run in the case of TAC, in order to choose the strategy that performs best across all agent mixtures, see section 4. Using this methodology also allows us to derive general observations about the behavior of certain strategies in different domains.

⁵ We called these strategies partial because they only deal with one particular type of auction.

This methodology and the desire to have a scalable and general system impose some requirements on the agent architecture that we must use. For an agent architecture to be useful in a general setting, it must be adaptive, flexible and easily modifiable, so that it is possible to make changes on the fly and adapt the agent to the system in which it is operating. In addition, as information is gained by participation in the system, the agent architecture must allow and facilitate the incorporation of the knowledge obtained. The architecture should support interchangeable parts, so that different strategies are easy to implement and change; otherwise running experiments with agents using the different strategies would be quite time consuming, and the incorporation of domain-specific knowledge would be an arduous task. These were lessons that we incorporated into the design of our architecture. The general architecture that we use follows the Sense-Model-Plan-Act (SMPA) architecture (this name originated with Brooks [2]). Other trading agents, e.g. [6], have used a similar global design. Including the decomposition in the bidding section of the architecture that we introduced, the overall architecture can be summarized as follows:

    while (not end of game) {
        1. Get price quotes and transaction information
        2. Calculate price estimates
        3. Planner: Form and solve optimization problem
        4. Bidder: Bid to implement plan
           - Determine each bid independently of all other bids
           - Use a different partial strategy for each different bid
    }

This architecture is quite modular, and each component of the agent can be replaced by another or modified. In fact, parts of the components themselves are also substitutable (e.g. the partial strategies). One last desired requirement is to design the modules of the agent to be as fast and adaptive as possible without sacrificing efficiency.
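The sense-model-plan-act loop above can be sketched as a minimal Python skeleton. The component interfaces (`market`, `planner`, `bidder`) are illustrative stand-ins, not the actual WhiteBear code.

```python
# A minimal skeleton of the SMPA control loop described above.
# All interfaces are hypothetical stand-ins for the real components.

class Agent:
    def __init__(self, market, planner, bidder):
        self.market, self.planner, self.bidder = market, planner, bidder

    def run(self):
        while not self.market.game_over():
            quotes, transactions = self.market.poll()            # 1. sense
            prices = self.estimate_prices(quotes, transactions)  # 2. model
            plan = self.planner.solve(prices)                    # 3. plan
            for good, quantity in plan.items():                  # 4. act
                # each bid is determined independently, using the
                # partial strategy registered for this auction type
                strategy = self.bidder.strategy_for(good)
                self.market.submit(strategy.make_bid(good, quantity, prices))

    def estimate_prices(self, quotes, transactions):
        return quotes  # placeholder: the real agent builds PEVs here
```

Because every component is injected, a partial strategy (or the whole planner) can be swapped out between experiments without touching the rest of the loop, which is the interchangeability requirement discussed above.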
Speed is not so much of a problem in the TAC game, since each agent can spend a substantial number of seconds deciding its next bids; but in other domains it is crucial to react fast (within seconds) to domain information and other agents' actions. In the next sections, we present how the first part of our methodology is applied in the TAC domain, that is, the decomposition into subproblems. In section 3.1 we present the planner module and in section 3.2 the selection of boundary and intermediate strategies.

3.1 Planner

The planner is a module of our architecture. In order to formulate the optimization problem that it solves, it is necessary to estimate the prices at which commodities are expected to be bought or sold. We started from the priceline idea presented in [6] and we simplified and extended it where appropriate. We implemented a module which calculates price estimate vectors (PEVs). These contain the value (price) of the x-th unit of each commodity. Let PEV_d^arr(x), PEV_d^dep(x), PEV_d^goodH(x), PEV_d^badH(x) and PEV_{d,t}^ent(x) be the PEVs. For some goods this price is the same for all units, but for others it is not; e.g. buying more hotel rooms usually increases the price one has to pay, since there is a limited supply. Other information to account for is the fact that some commodities, once bought, cannot be sold, so in that case they have to be considered as sunk cost, and thus their PEV is 0. For some goods these values are known accurately, and for others they are estimated based on the current ask and bid prices. The utility function that the agent wishes to maximize is:

\[
\max_{AA_c, AD_c, GH_c, ENT_{c,t}} \left\{ \sum_{c=1}^{CUST} util_c - COST \right\} \tag{2}
\]

where the cost of buying the resources needed is:

\[
COST = \sum_{d=1}^{DAYS} \Bigg[ \sigma\Big(PEV_d^{arr}(x), \sum_{c=1}^{CUST} I(AA_c = d)\Big) + \sigma\Big(PEV_d^{dep}(x), \sum_{c=1}^{CUST} I(AD_c = d)\Big) + \sigma\Big(PEV_d^{goodH}(x), \sum_{c=1}^{CUST} GH_c \cdot I(AA_c \le d < AD_c)\Big) + \sigma\Big(PEV_d^{badH}(x), \sum_{c=1}^{CUST} (1 - GH_c) \cdot I(AA_c \le d < AD_c)\Big) \Bigg] + \sum_{t=1}^{ET} \sum_{d=1}^{DAYS} \sigma\Big(PEV_{d,t}^{ent}(x), \sum_{c=1}^{CUST} I(ENT_{c,t} = d)\Big) \tag{3}
\]

where the operator \(\sigma(f(x), z) = \sum_{i=1}^{z} f(i)\). Once the problem has been formulated, the planner must solve it.
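The σ operator in Eq. (3) simply accumulates the per-unit prices of the first z units from a PEV, so that buying more units can cost progressively more. A small sketch, where PEVs are represented as plain lists (an assumption of this illustration):

```python
# Sketch of the sigma operator from Eq. (3): sigma(f, z) = sum of
# f(1)..f(z). Representing a PEV as a list is an illustrative choice.

def sigma(pev, z):
    """Cost of acquiring the first z units priced by the vector pev."""
    return sum(pev[i] for i in range(z))

def day_room_cost(pev_hotel, stays):
    """Room cost for one day and one hotel, where `stays` is the number
    of customers staying there that day (the inner indicator sum)."""
    return sigma(pev_hotel, stays)

pev = [200, 250, 320, 400]       # price of the 1st..4th room
cost = day_room_cost(pev, 3)     # 200 + 250 + 320 = 770
```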
This problem is NP-complete, but for the size of the TAC problem an optimal solution, that is, the type and total quantity of commodities that should be traded to achieve maximum utility, can usually be produced fast. However, in order to create a more general algorithm, we realized that it should scale well with the size of the problem and should not include elaborate heuristics applicable only to the TAC problem. Thus we chose to implement a greedy algorithm: the order of customers is randomized and then each customer's utility is optimized separately. This is done a few hundred times in order to maximize the chances that the solution will be optimal most of the time.⁶ In practice we have found the following additions (that were not reported by anyone else) to be quite useful:

1. Compute the utility of the plan P1 from the previous loop before considering other plans. Thus the algorithm always finds a plan P2 that is at least as good as P1, and there are relatively few radical changes in plans between loops. We observed empirically that this prevented radical bid changes and improved efficiency.

2. We added a constraint, based on the idea of strategic demand reduction [16], that disperses the bids of the agent for resources in limited quantities (hotel rooms in TAC). Plans which demanded many hotel rooms for any single day were not considered. This leads to some utility loss in rare cases. However, bidding heavily for one room type means that overall demand will very likely be high and therefore prices will skyrocket, which in turn will lower the agent's score significantly. We observed empirically that the utility loss from not obtaining the best plan tends to be quite small compared to the expected utility loss from rising prices.

We have also verified that this randomized greedy algorithm gives solutions which are often optimal and never far from optimal.
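A minimal sketch of this randomized greedy loop, including the carry-over of the previous plan (addition 1), follows. The `allocate_customer` and `evaluate` callables are illustrative stand-ins for the per-customer optimization and the utility computation, not the actual planner code.

```python
import random

# Sketch of the randomized greedy planner: shuffle the customer order,
# allocate greedily one customer at a time, and keep the best plan over
# many restarts. Seeding with the previous loop's plan means the result
# is never worse than the plan from the last iteration (addition 1).

def randomized_greedy(customers, evaluate, allocate_customer,
                      restarts=500, seed_plan=None, rng=random):
    best_plan = seed_plan
    best_util = evaluate(seed_plan) if seed_plan is not None else float("-inf")
    order = list(customers)
    for _ in range(restarts):
        rng.shuffle(order)
        plan = {}
        for c in order:
            plan[c] = allocate_customer(c, plan)  # optimize this customer alone
        util = evaluate(plan)
        if util > best_util:
            best_plan, best_util = plan, util
    return best_plan, best_util
```

The demand-reduction constraint (addition 2) would be enforced inside `allocate_customer`, by rejecting itineraries that demand too many rooms of a single type for any one day.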
We checked the plans (at the game's end) that were produced by 100 randomly selected runs and observed that over half of the plans were optimal, and on average the utility loss was about 15 points (out of typical allocation scores of 9800 or more⁷), namely close to 0.15%. Compared to the usual utility of 2000 to 3000 that our agents score in most games, they achieved about 99.3% to 99.5% of optimal. These observations are consistent with the ones about the related greedy strategy in [13], from which the initial idea for this algorithm was taken. Considering that at the beginning of the game the optimization problem is based on inaccurate values, since the closing hotel prices are not known, a 100%-optimal solution is not necessary and can be replaced by our near-optimal approximation. As commodities are bought and the prices approach their closing values, most of the commodities needed are already bought, and we have observed empirically that bidding is rarely affected by the generation of near-optimal solutions instead of optimal ones. This algorithm takes approximately one second to run through 500 different randomized orders and compute an allocation for each.

⁶ For the size of TAC (where CUST = 8), searching systematically all 8! = 40320 orderings is not recommended, since it would take considerably more time than we are willing to allocate to this subtask and furthermore would not scale well as CUST increases.

⁷ These were the scores of the allocation at the end of the game (no expenses were considered).

Our test bed was a cluster of 8 Pentium III 550 MHz CPUs, with each agent using no more than one CPU. This system was used for all our experiments and our participation in the TAC.⁸ In summary, our planner is fast, relatively domain-independent, and performs near-optimally. Moreover, using a close-to-optimal but nevertheless non-optimal plan does not affect the agent's overall performance.

3.2 Bidding Strategies

Once the planner has generated the desired types and quantities of each good, the bidder module places separate bids for all these goods. According to our methodology, we need to find strategies for each different set of auctions, and this procedure is described in this section. We use principled approaches, where applicable, together with empirical knowledge that we acquired during the competition. In fact, every participating team, including ours, used empirical observations from the games it participated in (some 2000 games over the 2001 and 2002 TAC) in order to improve its strategy. In the next sections we describe how the strategies are generated for the different auctions and the tradeoffs that our agent faced.

3.2.1 Bid Aggressiveness

Bidding for hotel rooms poses some interesting questions. The main issue in this case is how aggressively each agent should bid (the level of the prices it submits in its bids). If it bids low it might get outbid, while if it bids high (aggressively) it is likely to enter into price wars with the other agents. The first boundary strategy is to place low bids: the agent bids an increment higher than the current price. The agent also bids progressively higher for each consecutive unit of a commodity for which it wants more than one unit; e.g. if the agent wants to buy 3 units of a hotel room, it might bid 210 for the first, 250 for the second and 290 for the third. This is the lowest (L) possible aggressiveness, since the agent will never wish to bid less. The other boundary strategy is that the agent bids progressively closer to the marginal utility δu as time passes.⁹
Since the agent will likely lose money if it bids above the marginal utility, this is the highest (H) possible aggressiveness. Now that the boundary strategies are set, our methodology suggests that we try to combine these into intermediate strategies. We therefore selected the following compromise: an agent that bids like the aggressive (H) agent for rooms that have a high marginal utility δu, and bids like the non-aggressive (L) agent otherwise. This is the agent of medium (M) aggressiveness.¹⁰ One further improvement, which was judged necessary for the 2002 TAC, is to use historical data to determine the price of the hotel auction which closes first; this is because we observed that our agent was getting outbid while the bids were still low. As far as the timing of the bids is concerned, there is little ambiguity about what the optimal strategy is. The agent waits until the first hotel auction is about to close to place its first bids. The reason for this is that it does not wish to increase the prices earlier than necessary, nor to give away information to the other agents. We also observed empirically that an added feature which increases performance is to place bids for a small number of rooms at the beginning of the game at a very low price (whether they are needed or not). In case these rooms are eventually bought, the agent pays only a very small price and gains increased flexibility in implementing its plan.

⁸ During the competition only one processor was used, but during the experimentation we used all 8, since 8 different instantiations of the agent were running at the same time.

⁹ The marginal utility δu for a particular hotel room is the change in utility that occurs if the agent fails to acquire it. In fact, for each customer i that needs a particular room, we bid δu/z instead of δu, where z is the number of rooms which are still needed to complete her itinerary. We do this, based on empirical observations, in order not to drive the prices up prematurely.

¹⁰ We decided that this intermediate strategy was more appropriate compared to others, e.g. a weighted average of the boundary bids, based mainly on empirical observations from the competition.

3.2.2 Paying for Adaptability

The purchase of flight tickets presents an interesting dilemma as well. We have calculated (based on the model of price change described in the rules) that ticket prices are expected to increase approximately in proportion to the square of the time elapsed since the start of the game. This means that the more one waits, the higher the prices will get, and the increase is more dramatic towards the end of the game. From that point of view, if an agent knows accurately the plan that it wishes to implement, it should buy the plane tickets immediately. On the other hand, if the plan is not known accurately (which is usually the case), the agent should wait until the prices for hotel rooms have been determined. This is because buying plane tickets early restricts the flexibility (adaptability) that the agent has in forming plans: e.g. if some hotel room that the agent needs becomes too expensive, then if it has already bought the corresponding plane tickets, it must either waste these, or pay a high price to get the room. An obvious tradeoff exists in this case, since delaying the purchase of plane tickets increases the flexibility of the agent and hence provides the potential for a higher income, at the expense of some monetary penalty. One way to resolve this is to use a cost-benefit analysis. In this case the cost of deferring purchase can be computed, but in order to estimate the benefit from delayed buying, one must use models for the opponent agents, which are not easy to obtain. Our first step is to decide the boundary strategies. Since the only issue is the time of bid placement, two obvious strategies are to buy everything at the beginning, or to defer all the ticket purchases to a much later time.
Initially we set this later time to be right after 2 (out of the 8) hotel auctions have closed. The reason is that at that time the intentions of the other agents can be partially observed through their effect on the auctions' bid prices, and thus after this time the room prices approximate their potential closing prices sufficiently well. Hence a plan generated at that time is usually quite similar to the optimal plan computed when the closing prices are known. Another reason is that, since ticket prices are expected to increase approximately in proportion to the square of the elapsed time, the price increases after this point tend to be prohibitive. However, this is still not a very good boundary case; a further improvement is to buy some tickets at the start of the game. We buy about 50% of the tickets at the beginning: these are the almost-certain-to-be-used tickets (computed based on the client preferences and the ticket prices), and we have empirically observed that these tickets are almost never wasted. Given these boundary strategies, we first obtained an intermediate strategy by modifying the latter to wait until only 1 (the first) hotel auction closes. Another intermediate strategy comes from the idea of strategic demand reduction [16]: we compute the minimum number of tickets which, if left unpurchased, will allow the agent to complete its itineraries even if it fails to buy a hotel room on days for which it wants many rooms. A small optimization problem is solved to determine these tickets. 80% to 100% of the tickets are now bought at the beginning. An improvement (for agents which defer the purchase of some tickets) was obtained by estimating the likelihood of price increases. This information is then used to bid earlier for tickets whose price is very likely to increase, and to wait longer for tickets whose price is expected to increase little or not at all. We calculated that the agent approximately halves the cost it would otherwise pay for the deferred purchases.
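The strategic demand reduction step can be sketched as follows. The real agent solves a small optimization problem; this greedy Python toy (all names and the demand threshold are hypothetical) only conveys the idea of leaving a few tickets unpurchased for clients whose itineraries touch high-demand days.

```python
def tickets_to_withhold(day_demand: dict, clients_by_day: dict,
                        high_demand: int = 3) -> set:
    """Toy version of strategic demand reduction: for every day on which
    many rooms are wanted, keep one client's flight tickets unpurchased so
    that client's itinerary can still be re-planned if a room is lost.

    day_demand: day -> number of rooms the agent wants that day.
    clients_by_day: day -> clients whose itineraries include that day.
    Returns the set of clients whose tickets are deferred.
    """
    withheld = set()
    for day, demand in sorted(day_demand.items()):
        if demand > high_demand:
            # withhold the tickets of one not-yet-withheld client on that day
            for client in clients_by_day.get(day, []):
                if client not in withheld:
                    withheld.add(client)
                    break
    return withheld
```

A client already withheld for one contested day also covers later contested days of the same itinerary, which keeps the number of deferred tickets near its minimum.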
The full details can be found in a workshop paper which describes the strategy we used in the 2001 TAC [15]. A further improvement (especially for agents which buy most tickets at the beginning) was obtained by using historical averages of the hotel prices in previous games to set the PEVs at the beginning of the game, since the planner generates a much more accurate plan in this way.

Table 1: Average scores of agents WB-N2L, WB-M2L, WB-M2M and WB-M2H, for experiment 1 (144 games) and experiment 2 (200 games). For experiment 1 the scores of the 2 instances of each agent type are also averaged. The number inside the parentheses is the total number of games for each experiment; this is the case for every table.

Table 4: The effect of using historical averages in the PEVs, for agents WB*xSM, WB*xSH, WB*x2M and WB*x2H with x=M and x=A (206 games). Early bidding agents benefit the most from this.

Entertainment

The entertainment tickets do not present a challenging tradeoff. Therefore we used only the following strategy: the agent buys (sells) the entertainment tickets that it needs (does not need) for implementing its plan, at a price equal to the current price plus (minus) a small increment. The only exceptions to this rule are: (i) at the game's start, and depending on how many tickets the agent begins with, it will offer to buy tickets at low prices, in order to increase its flexibility at a small cost; even if these tickets are not used, the agent sometimes manages to sell them for a profit. (ii) The agent will not buy (sell) at a high (low) price, even if this is beneficial to its utility, because otherwise it helps other agents. This restriction is somewhat relaxed at 11:00, in order for the agent to improve its score further, but it will still avoid some beneficial deals if they would be very profitable for another agent.

4. EXPERIMENTAL RESULTS

In this section we describe the controlled experiments we performed (the majority of which were based on agent mixtures designated by our methodology) in order to determine the best overall strategy, and the conclusions we drew from them concerning the tradeoffs described in section 3.2.
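The entertainment rule above can be sketched as follows; the increment and the refuse-extreme-price caps (max_buy, min_sell) are hypothetical values we chose for illustration, not the thresholds the agent actually used.

```python
def entertainment_order(current_price: float, needed: bool,
                        increment: float = 1.0,
                        max_buy: float = 80.0, min_sell: float = 20.0):
    """Illustrative entertainment-ticket order: buy needed tickets slightly
    above the current price, sell unneeded ones slightly below, but refuse
    extreme prices that would mostly benefit the counterparty.
    Returns ("buy"|"sell", price) or None if the deal is refused."""
    if needed:
        price = current_price + increment
        return ("buy", price) if price <= max_buy else None
    price = current_price - increment
    return ("sell", price) if price >= min_sell else None
```

In the actual agent this refusal is relaxed at 11:00, which in this sketch would correspond to widening the max_buy/min_sell window late in the game.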
To distinguish between the different strategies (or, if you prefer, versions of the agent), we use the notations WB-xyz and WB*xyz,12 where (i) x is M if the agent models the plane ticket prices, N if this feature is not used, and A if historical averages are used in the PEVs, (ii) y takes the values 0, 1 or 2, which means that the agent buys its unpurchased tickets when the y-th hotel auction closes (0 means it does not wait at all), or the value S, which means that the version based on strategic demand reduction is used, and (iii) z characterizes the aggressiveness with which the agent bids for hotel rooms, and takes the values L, M and H for low, medium and high degrees of aggressiveness respectively. To formally evaluate whether one version outperforms another, we use paired t-tests; p-values of less than 10% are considered to indicate a statistically significant difference (in most experiments the values are actually well below 5%). If more than one instance of a certain version participates in an experiment, we compute the t-test for all possible combinations of instances.13 The first set of experiments was aimed at verifying our observation that modeling the plane ticket prices improves the performance of the agent. We expected an improvement,14 since the agent uses this information to bid later for tickets whose price will not increase much (thereby achieving greater flexibility at low cost), while bidding earlier for tickets whose price increases faster (reducing its cost).

11 This is introduced because in the competition the agent is interested in maximizing not just its own utility, but also the difference between its utility and the utilities of the other agents.
12 WB-xyz is based on our 2001 TAC agent and WB*xyz is a slightly improved version based on our 2002 TAC agent.
13 This means that 8 t-tests will be computed if we have 2 instances of version A and 4 of version B, etc. We consider the difference between the scores of A and B to be significant if almost all the tests produce values below 10%.
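This evaluation procedure can be sketched with the Python standard library. Converting the t statistic to a p-value (the paper's 10% threshold) requires a t-distribution CDF, e.g. scipy.stats.ttest_rel, which we omit here; the function and variable names are ours.

```python
import itertools
from math import sqrt
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired t statistic for the per-game score differences xs[i] - ys[i]."""
    d = [x - y for x, y in zip(xs, ys)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

def all_instance_tests(version_a_scores, version_b_scores):
    """One t statistic per (instance of A, instance of B) pair, mirroring the
    'all possible combinations of instances' procedure: 2 instances of A and
    4 of B yield 8 tests."""
    return [paired_t(a, b)
            for a, b in itertools.product(version_a_scores, version_b_scores)]
```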
We ran 2 experiments with the following 4 versions: WB-N2L, WB-M2L, WB-M2M and WB-M2H. In the first we ran 2 instances of each agent, while in the second we ran only one, and the other 4 slots were filled with the standard agent provided by the TAC support team. The results are presented in table 1. The other agents, which model the plane ticket prices, perform better than agent WB-N2L, which does not. The differences between WB-N2L and the other agents are statistically significant, except for the one between WB-N2L and WB-M2L in experiment 2. We also observe that WB-M2L is outperformed by agents WB-M2M and WB-M2H, which in turn achieve similar scores; these results are statistically significant for experiment 1. Having determined that this modeling leads to a significant improvement, we concentrated our attention only on agents using this feature. The next experiment was designed to explore the tradeoff of bid aggressiveness. As proposed by our methodology, we used agents WB-M2z (z=L,M,H), keeping all other partial strategies fixed; we used a constant number of 2 instances of agent WB-M2M, while the number of agents WB-M2H was increased from 0 to 6. The rest of the slots were filled with instances of version WB-M2L. The results of this experiment are presented in table 2. By increasing the number of agents which bid more aggressively, there is more competition between agents and the hotel room prices increase, leading to a decrease in scores. While the number of aggressive agents #WB-M2H ≤ 4, the decrease in score is relatively small for all agents and is approximately linear in #WB-M2H. The aggressive agents (WB-M2H) do relatively better in less competitive environments and the non-aggressive agents (WB-M2L) do relatively better in more competitive environments, but still not well enough compared to the WB-M2M and WB-M2H agents. Overall, WB-M2M (medium aggressiveness) performs comparably to or better than the other agents in almost every instance.
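The three aggressiveness levels compared in this experiment can be sketched as follows. This is a minimal Python illustration, not the WhiteBear implementation: the δu threshold for the medium strategy and the assumption that the low strategy bids no more than the expected closing price are ours; the δu/z scaling is the one described in footnote 9.

```python
def hotel_bid(delta_u: float, expected_price: float, rooms_left: int,
              aggressiveness: str, high_du_threshold: float = 200.0) -> float:
    """Illustrative bid for one hotel room.

    delta_u: marginal utility (utility lost if the room is not acquired),
    divided by the number of rooms z still needed for the itinerary, so the
    agent does not drive prices up prematurely (footnote 9).
    """
    scaled_du = delta_u / max(rooms_left, 1)
    if aggressiveness == "H":      # bid the full (scaled) marginal utility
        return scaled_du
    if aggressiveness == "L":      # assumed: bid only up to the expected price
        return min(expected_price, scaled_du)
    # "M": act like H for high-marginal-utility rooms, like L otherwise
    if scaled_du >= high_du_threshold:
        return scaled_du
    return min(expected_price, scaled_du)
```

The medium strategy thus pays up only for rooms whose loss would be costly, which matches its observed robustness across both calm and price-war games.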
Agents WB-M2L are at a disadvantage compared to the other agents, because they do not bid aggressively enough to acquire the hotel rooms that they need. When an agent fails to get a hotel room it needs, its score suffers a double penalty: (i) it will have to buy at least one more plane ticket at a high price in order to complete the itinerary, or else it will end up wasting at least some of the other commodities it has already bought for that itinerary, and (ii) since the arrival and/or departure date will probably be further from the customer's preference and the stay will be shorter (hence fewer entertainment tickets can be assigned), there is a significant utility penalty for the new itinerary. On the other hand, aggressive agents (WB-M2H) do not face this problem, and they score well when prices do not go up. When there are many of them in a game, though, the price wars hurt them more than the other agents. The reasons for this are: (i) aggressive agents pay more than other agents, since prices rise faster for the rooms that they need most, in comparison to the rooms needed mostly by less aggressive agents, and (ii) the utility penalty for losing a hotel room becomes comparable to the price paid for buying it, so non-aggressive agents suffer only a relatively small penalty for being overbid. Agent WB-M2M performs reasonably well in every situation, since it bids enough to maximize the probability that it is not outbid for critical rooms, and it avoids price wars to a larger degree than WB-M2H. Based on these results we did not use low aggressiveness agents in the next experiments.

14 A gain of 120 to 150 was expected according to a rough estimate.

Table 2: Scores for agents WB-M2L, WB-M2M and WB-M2H as the number of aggressive agents (WB-M2H) participating increases (#WB-M2H = 0, 2, 4 and 6, over 178, 242, 199 and 100 games respectively). In each experiment agents 1 and 2 are instances of WB-M2M. The agents above the stair-step line are WB-M2L, while the ones below are WB-M2H. The average scores for each agent type are presented in the next rows. In the last rows (M2L/M2M, M2M/M2H, M2L/M2H), X indicates a statistically significant difference in the scores of the corresponding agents, while − indicates statistically similar scores.

Table 3: Scores for agents WB-M2M and WB-M0H as the number of early bidding agents (WB-M0H) participating increases (#WB-M0H = 2, 4 and 6, over 343, 282 and 69 games respectively). The agents above the stair-step line are WB-M2M, while the ones below are WB-M0H.

The next set of experiments was intended to further explore the tradeoff of bidding early for plane tickets against waiting longer in order to gain more flexibility in planning. Initially we ran a smaller experiment with 2 instances of each of the following agents: WB-M2M and WB-M2H, together with WB-M1M (which bids for most of its tickets at the beginning) and WB-M0H (which buys immediately all the plane tickets it needs and bids aggressively for hotel rooms).15 We ran 78 games and observed that WB-M2M scores slightly higher than the other agents, while WB-M2H scores slightly lower. These results are, however, not statistically significant. The bigger experiment was done to examine the behavior of the two boundary strategies against each other. We varied the mixture of agents WB-M2M and WB-M0H as shown in table 3. When only 2 of the agents were WB-M0H, the WB-M0H's scored on average close to the WB-M2M's, but as their number increased their score dropped, and, when they were the majority, the WB-M0H's performed much worse than the WB-M2M's. In this case, the WB-M2M's try to stay clear of rooms whose price increases too much (usually, but not always, successfully), while the early bidders do not have this choice, due to their reduced flexibility in changing plans.
One interesting result which we did not expect is that the score of the WB-M2M's increases when there are 2 instances of them compared to the case when there are 4; however, hotel room prices are higher in the former case, so this result seems contradictory! The explanation is that the prices tend to increase quite fast for the rooms needed by the early bidders, so the 2 WB-M2M's avoid these rooms when possible and try to position themselves mostly on the other rooms, so they do not have to pay as much. This behavior also occurs when there are 4 WB-M2M's, but in that case, when they try to move away from the rooms that the early bidders want, they end up on similar rooms (so the good deals are harder to find because they stop being deals much more often once the other WB-M2M's go after them). These results would normally allow us to conclude that it is usually beneficial not to bid for everything at the beginning of the game, but there is a minor catch: without using historical prices, the early bidding agents buy goods blindly. Therefore we introduced this feature and ran an experiment in which we examine the benefit that agents WB*M2M, WB*M2H, WB*MSM and WB*MSH gain if historical prices are used. Note that we did not use WB*M0z, because the agent WB*MSz also buys the vast majority of its tickets at the beginning (but not all). The results are presented in table 4. We observe that the agents which bid earlier are the ones who benefit from the use of this feature, while the benefits for WB*M2M and WB*M2H are virtually non-existent. The increase of the price estimates has the effect that the planner generates itineraries which use slightly fewer rooms than before. This decreases the price wars between agents and improves their scores.

15 An early bidder must be aggressive, because if it fails to get a room, it will pay a substantial cost for changing its plan, due to its lack of flexibility in planning.
The last experiment extends the experiment presented in table 3. This time we examine the effect on agents WB*AyM and WB*AyH (the medium and high aggressiveness agents with historical prices in the PEVs at the beginning of the game) when y = 0, y = 2 and y = S. Since y = S is the intermediate strategy, we always keep 2 agents WB*ASz (z=M,H) in the mixture of agents and change the number of the other agents (which use the boundary strategies) as described by our methodology; half of these are of medium and half of high aggressiveness. The results are presented in table 5. We observe that the strategy y = 2, which leaves the highest number of unpurchased tickets, performs worse than the other two. The other two perform similarly overall. The only case in which the WB*ASz's performance is statistically better than that of the early bidders is when there are many early bidders. From these results we determined that the strategic demand agent probably performs most consistently, and that is the reason we used it in the TAC. Another observation is that the scores of all agents tend to go up as the prices go higher. We believe (but need to check further) that this is a result of the fact that the historical prices are used in the PEVs mainly at the beginning of the game and, once some auctions have closed, are not used any more; the later bidding agents (y = 2) observe the lower prices and try to purchase more rooms, which in turn drives the prices up. As their number decreases, the economy becomes more efficient and all the agents profit. We are continuing the experiments in order to increase the statistical confidence in the interpretation of the results so far. This is quite a time-consuming process, since games are run at 15 to 20 minute intervals.16 It took over 4500 runs (about 9 weeks of continuous running time) to get the controlled experiment results, and some 2000 more for our observations during the competitions. 5.
TRADING AGENT COMPETITION: RESULTS AND OBSERVATIONS

We have entered our agent, WhiteBear, in the last two Trading Agent Competitions. Preliminary and seeding rounds were held before the finals, so that teams could improve their strategies over time. The top 16 teams were invited to participate in the semi-finals and the best 8 advanced to the finals. In the 2001 TAC the WhiteBear variant we used in the competition was WB-M2M. The 4 top scoring agents (scores in parentheses) that year were: livingagents (3670), ATTac (3622), WhiteBear (3513) and Urlaub01 (3421). The scores in the finals were higher than in the previous rounds, because most teams had learned (ourselves included) that it was generally better to have a more adaptive agent than to bid too aggressively.17 A surprising excep-

16 This is a restriction of the game and the TAC servers.
17 This was demonstrated by the second controlled experiment that