University of Southampton Research Repository eprints Soton

Copyright and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g. AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination

Autonomous Agents in Bargaining Games
An Evolutionary Investigation of Fundamentals, Strategies, and Business Applications

DOCTORAL THESIS

to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.dr. R.A. van Santen, to be defended in public before a committee appointed by the Doctorate Board on Tuesday 6 July 2004 at … hours

by

Enrico Harm Gerding

born in Haarlem

This thesis has been approved by the promotors:
prof.dr.ir. J.A. La Poutré and prof.dr. H.M. Amman

The research reported in this thesis has been carried out at the Centrum voor Wiskunde en Informatica (CWI) under the auspices of the Instituut voor Programmatuurkunde en Algoritmiek (IPA). The research is part of the project Autonomous Systems of Trade Agents, funded by the Telematica Instituut.

CIP-DATA LIBRARY TECHNISCHE UNIVERSITEIT EINDHOVEN
Gerding, Enrico Harm
Autonomous agents in bargaining games : an evolutionary investigation of fundamentals, strategies, and business applications / by Enrico Harm Gerding. - Eindhoven : Technische Universiteit Eindhoven, Thesis. - ISBN NUR 788
Keywords: Bargaining / Evolutionary algorithms / Electronic commerce / Multiagent systems
IPA Dissertation Series
Copyright © 2004 E.H. Gerding
Cover by Tobias Baanders

Acknowledgement

I am grateful to many people for their support and collaboration in the course of my Ph.D. research. First of all, I thank my supervisor and first promotor Han La Poutré. Not only have the many discussions been inspiring and motivating, but I have also learnt a lot about research in general. I also thank my colleagues at the CWI with whom I've had the pleasure to co-author papers: David van Bragt, Koye Somefun, Sander Bohte, and Pieter Jan 't Hoen. Their input has been invaluable. A special thanks goes to my CWI roommates and former roommates Floortje Alkemade, Michiel de Jong, and Ivan Vermeulen for their friendship, discussions, and pleasurable talks. I also thank my other colleagues and former colleagues who have contributed through journal clubs, seminars, and other ways: Tomas Klos, Jano van Hemert, Rob van Stee, Valentin Robu and Erich Kutschinski. Furthermore, I am grateful to my second promotor, Hans Amman, for his comments and support, and to the other members of the kernel committee, Nick Jennings and Rudolf Müller, for reading my thesis and providing valuable comments. I would also like to thank my many friends who have helped me to keep my mind off the research! Last but not least, a very special thanks to my parents and to Bethany for your love and support.

Contents

1 Introduction
  Terms and definitions
  General economic concepts
  Game-theoretic concepts related to bargaining
  Concepts from computer science
  Evolutionary algorithms
  Principles of evolutionary algorithms
  Modelling adaptive bargaining agents
  Implementation
  Organisation of the thesis
  Publications
2 Bargaining: an overview
  A brief history of bargaining
  Game theory and artificial intelligence
  Game-theoretic approaches to bargaining
  Cooperative bargaining theory
  Bargaining over a single issue
  Bargaining over multiple issues
  Bargaining with private information
  One-to-many bargaining
  Computational approaches to bargaining
  The evolutionary approach
  Using Q-Learning
  Using Bayesian beliefs
  Argumentation-based negotiation
  Discussion

A Fundamental aspects of bargaining systems
3 Multi-issue bargaining by alternating offers
  Description of the bargaining game
  The evolutionary system
  Representation of the strategies
  Validation and interpretation of the evolutionary experiments
  Efficiency
  Further Analysis
  Social extension: fairness
  Motivation and description: the fairness model
  Fairness check at the deadline
  Fairness check in each round
  Validation and strategy analysis
  Concluding remarks
4 Bargaining with multiple opportunities
  Description of the bargaining game
  Game-theoretical approach
  Evolutionary approach
  Strategy Encoding
  Mutation Operator
  Evolutionary simulation results
  Game-Theoretic Validation
  Incomplete Information
  Integrative Negotiations
  Search Costs and Premature Termination
  Concluding remarks

B Bargaining systems for business applications
5 Competitive market-based allocation of consumer attention space
  Motivation and related research
  Centralised vs. decentralised recommendation
  Use of adaptive software agents
  Related research
  The design of CASy
  Mall manager agent
  Consumers
  Suppliers and supplier agents
  Auctions
  Effectiveness and feasibility
  Evolutionary simulation model of CASy
  Mall manager agent
  Consumer models
  Supplier models
  Evolutionary simulation of supplier agents
  Measure for proper selection of suppliers
  Results
  Simulation settings
  Single advertisement model
  Consumer model 1: independent visits with several purchases
  Consumer model 2: one expected purchase
  Consumer model 3: search-till-found
  Two-dimensional profile
  Conclusions
  Evaluation and further research
  Reflections
  Open problems and future research
  Concluding remarks
6 Automated bargaining and bundling of information goods
  A system for selling information goods
  Bargaining using software agents
  Fairness and one-to-many bargaining
  Bargaining protocol
  Agents and bargaining strategies
  Seller agent
  Buyer agent
  Decomposing the bargaining strategy
  Orthogonal strategy and DF
  Experimental setup and results
  Agent preference settings
  Concession strategies
  Results
  Related approaches
  Fuzzy similarity criteria
  Intermediaries
  Auctions
  Discussion
  The system revisited
  Bargaining and Pareto efficiency
  Concluding remarks
7 Bargaining strategies for one-to-many bargaining
  One-to-many bargaining
  Time pressure
  Bargaining strategies
  Bargaining simulation environment
  The bargaining game
  Buyers and their agents
  Seller agent
  The evolutionary system
  Experimental results
  Settings
  Results
  Bargaining revisited
  Concluding remarks
8 Discussion and conclusion

Appendix
  Game-theoretic analysis
  Multi-issue bargaining
  Model without a risk of breakdown (p = 1)
  Model with a risk of breakdown (p < 1)
  Calculating the Pareto-efficient frontier
  Extended model: Fairness
  General analysis
  Application to a simple case
Bibliography
Samenvatting (Dutch)
Summary (English)
Curriculum Vitae

Chapter 1 Introduction

Many consider autonomous software agents to be the next step in computer automation. Given a set of goals and tasks, an autonomous agent will try to maximally satisfy the interests of its owner. These agents should be capable of autonomously performing tasks that are currently done manually, like searching for information on the Internet, planning, booking a holiday, and buying and selling goods and services. An increased use of autonomous agents is expected especially in the field of electronic commerce [30, 56, 65, 70, 92, 119, 144]. Such agents should be able to autonomously negotiate with other agents about the price and other relevant aspects of a product or service, such as delivery time, quality, quantity, payment methods, and return policies. Furthermore, the agents should be adaptive in order to cope with diverse and changing environments.

Electronic markets are becoming increasingly transparent, with low search costs. From a business perspective, this potentially results in strong price competition and low margins, with a negative effect on aspects such as quality and service. Through automated bargaining about a multitude of aspects, a business can go beyond price competition and gain a competitive advantage by personalising products and services to the needs of individual customers.

In such a setting, where multiple self-interested adaptive agents perform complex negotiations, the key question is how they will behave in a given environment under specific rules of interaction. Moreover, an important challenge is to find effective bargaining strategies for the agents and, if the rules can be changed, to determine the set of rules that achieves the best results. These are the main issues addressed in this thesis.

Game theory is a field that studies the behaviour of interacting agents, and it can be used to address the above issues through mathematical analysis.
The limitation of game theory, however, is that many restrictive assumptions need to be made in order for a mathematical analysis to be feasible. Commonly made assumptions are, for example, that the agents act rationally and are completely informed. This means that the agents completely understand the rules of the game, have infinite reasoning capabilities, make no mistakes, and know everything that needs to be known about the world and about the other agents' preferences to derive optimal outcomes. If such agents really existed, games like chess would no longer be a challenge. In reality, both humans and computational agents have only limited forward-looking capabilities and limited information; instead, many tasks are learned through experience, by a process of trial and error. To analyse such settings with so-called boundedly rational agents, computer simulations are a helpful addition to the set of game-theoretic tools.

In this thesis we consider the setting where agents are adaptive to their environment and learn effective bargaining policies by trial and error. We apply learning techniques from the field of artificial intelligence, specifically evolutionary algorithms, to model the adaptive nature of bargaining agents in practical settings.

In the first part of the thesis, we consider fundamental aspects of bilateral bargaining between a buyer and a seller. We first validate the evolutionary model for bilateral bargaining by comparing the outcomes with game-theoretic results for relatively simple bargaining settings. We then investigate several extensions of game-theoretic bargaining games, which are more complex and closer to real-world settings than traditional models. Such settings are difficult to analyse game-theoretically, but can be approached using computational techniques.

In the second part, a number of business applications of automated bargaining are introduced and investigated using computational simulations. The focus here lies on one-to-many bargaining, where, for example, a seller negotiates with many buyers simultaneously.
Either an auction or a bilateral bargaining protocol is applied to the one-to-many setting, depending on the application. Auctions can be an effective way to allocate scarce resources efficiently, in other words, to ensure that goods are awarded to whoever values them the most. If resources are flexible, however, and negotiation involves multiple aspects, bilateral bargaining can be the preferred way to reach an agreement. For the former case, we use an evolutionary simulation to investigate the effectiveness of various auction rules on problems that are unwieldy to analyse mathematically. For the latter case, we present novel bargaining strategies that agents can use in practical applications. These strategies are able to cope with complex goods and can maximise the gains from trade (i.e., the joint gains that result from an agreement) by adjusting different aspects of the goods to individual needs. We furthermore combine auctions with bilateral bargaining and propose strategies that benefit from the one-to-many setting, even though the actual bargaining is bilateral. The performance of the strategies is evaluated using computational simulations.

1.1 Terms and definitions

This section introduces the general terminology used throughout this thesis. A more detailed explanation of game-theoretic concepts related to bargaining is presented in Chapter 2, particularly in Sections 2.1 and 2.3. Furthermore, additional local definitions are provided in the corresponding chapters. Some definitions are numbered in order to facilitate lookup. Note that each number consists of the page number on which the definition is introduced, followed by an index number.

1.1.1 General economic concepts

In order to analyse the choices that people make, such as in bargaining, it is important to consider the preferences of decision makers for different outcomes. Within economics, and in this thesis, the notion of utility is used to quantify individuals' preferences. Utility can be considered as an individual's measure of goal achievement and is usually expressed as a real number. In general, this measure is subjective and cannot be compared to the utility of other individuals. For many real-world applications, however, utility corresponds to a monetary value, in which case comparison is possible. A utility function describes an individual's preferences over possible outcomes in terms of utility.

In many cases, outcomes depend not only on choices made by individuals, but can also be affected by unpredictable events or lotteries. When such uncertainty exists, the notion of expected utility is used. Expected utility specifies the preferences over lotteries, and is computed by multiplying the utility of each event by the probability that this event occurs, and summing over all events (see [72, Ch. 6] for further details). Often, people have several goals that have to be traded off against each other. For example, when buying a house, trade-offs exist between the location, size, and price of the house.
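The expected-utility computation described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the thesis. The function name `expected_utility`, the representation of a lottery as (probability, outcome) pairs, and the square-root utility used as an example of a concave (risk-averse) utility function are all hypothetical choices.

```python
import math


def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, outcome) pairs.

    Multiplies the utility of each outcome by its probability and sums the
    products over all outcomes.
    """
    if abs(sum(p for p, _ in lottery) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return sum(p * utility(outcome) for p, outcome in lottery)


# Hypothetical lottery: win 100 with probability 0.5, otherwise nothing.
coin_flip = [(0.5, 100.0), (0.5, 0.0)]
print(expected_utility(coin_flip, lambda m: m))  # risk-neutral utility: 50.0
print(expected_utility(coin_flip, math.sqrt))    # concave (risk-averse) utility: 5.0
```

The two calls evaluate the same lottery. Under a utility equal to the monetary outcome it is worth 50.0. Under the concave utility it is worth 5.0, the same as a sure payment of 25 (since √25 = 5). This illustrates why expected utility, rather than expected money, is used to express preferences over lotteries.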
A multi-attribute utility function [10, 101] can be used to represent preferences in the case of several (often independent) goals:

Definition 3.1 (Multi-Attribute Utility Function). A multi-attribute utility function defines the utility over multiple weighted attributes, where each attribute corresponds to a goal, and the weight indicates the relative importance of the corresponding attribute. An attribute is also called a dimension or an issue.

In general, the attributes are assumed to be preferentially independent, or additive. In that case, the utility is calculated by multiplying the value of each attribute by its weight and summing over all attributes.

1.1.2 Game-theoretic concepts related to bargaining

Game theory [11, 90][72, Ch. 8+9] is a collection of mathematical tools designed to analyse situations in which decision-makers interact, for instance when bargaining.

The decision-makers are usually assumed to be fully rational (utility maximising) and to be completely informed of the circumstances in which the game is played¹ [11, Ch. 10+11]. These assumptions are far from realistic, but are often necessary in order to make mathematical analysis feasible. We will elaborate on these assumptions in Chapter 2 (see Section 2.1). A decision-maker in a game is henceforth called a player. We often use the term agent instead of player, especially in a computational context.

Game theory is used in this thesis to investigate situations of bargaining. In a bargaining situation, two or more players have the option to make a joint choice from a set of possible outcomes. The players may benefit from an agreement, but they have different preferences over the various outcomes. In economic terms, the players can jointly produce some type of bargaining surplus, provided that they agree on how to divide it [81]. Examples include bargaining over the price of a house, but also choosing a restaurant together; in both cases, all parties involved benefit from an agreement, but might have conflicting preferences for the different outcomes. The bargaining surplus, or simply surplus, is the joint gain that can be achieved through cooperation. For example, if a seller wants to sell a house for at least $100000, and a buyer is willing to pay up to $150000, then the bargaining surplus that is jointly produced equals $50000. We define bargaining as the corresponding attempt to resolve a bargaining situation, i.e., to determine the particular form of cooperation and the corresponding division of the bargaining surplus. Bargaining is bilateral when it involves two players. We use the term negotiation interchangeably with the term bargaining. The interaction between negotiating agents is usually restricted by certain rules.
For instance, in the alternating-offers game (discussed in Section 2.3.2), the players are restricted to making offers and counter-offers in a sequential order. The rules are set by the so-called bargaining protocol:

Definition 4.1 (Bargaining Protocol). A bargaining protocol (also called negotiation protocol) specifies the rules that govern the negotiation process [5].

Two features of bargaining outcomes are desirable: individual rationality and Pareto efficiency [11, Ch. 5]:

Definition 4.2 (Individually Rational). A bargaining outcome is individually rational² if the utility assigned to each player is at least as large as the utility that the player can achieve by himself without cooperation.

¹ Complete information does not rule out uncertainty (e.g., about the preferences of other players). In the case of uncertainty, however, it is assumed that the probabilities are known to the players. This topic is further discussed in Section 2.1 of the next chapter.
² Individual rationality is also used to denote a property of a mechanism (see Def. 5.1). In short, a mechanism is individually rational if it induces voluntary participation.
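The house example and Definition 4.2 can be made concrete with a small sketch. It assumes, purely for illustration, that each player's utility is the monetary gain relative to his reservation value. The seller's utility is then the price minus $100000 and the buyer's is $150000 minus the price, with a status quo (no-agreement) utility of zero for both. The constant and function names are hypothetical.

```python
SELLER_RESERVATION = 100_000  # seller accepts any price of at least this amount
BUYER_RESERVATION = 150_000   # buyer pays at most this amount


def utilities(price):
    """(seller, buyer) monetary gains from a sale at `price`; no deal gives (0, 0)."""
    return price - SELLER_RESERVATION, BUYER_RESERVATION - price


def is_individually_rational(price, status_quo=(0, 0)):
    """Definition 4.2: every player gets at least its utility without agreement."""
    return all(u >= s for u, s in zip(utilities(price), status_quo))


surplus = BUYER_RESERVATION - SELLER_RESERVATION  # 50000, independent of the price
for price in (90_000, 120_000, 160_000):
    seller_u, buyer_u = utilities(price)
    print(price, seller_u, buyer_u, seller_u + buyer_u == surplus,
          is_individually_rational(price))
```

Under these assumptions the price only determines how the fixed surplus of $50000 is divided. An agreement is individually rational exactly when the price lies between $100000 and $150000, so both players end up with a non-negative share.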

Definition 4.3 (Pareto-Efficient, Pareto-Efficient Frontier). A bargaining outcome is Pareto-efficient if no other outcome exists that is strictly preferred by one player and not less preferred by any other player. The Pareto-efficient frontier connects all the Pareto-efficient points in an N-dimensional space, where each dimension corresponds to the utility level of a player (see Fig. 2.1 on page 21 for an example in a 2-dimensional space).

Loosely put, individual rationality of the bargaining outcome ensures that an agent benefits from the agreement. In most cases, a utility of zero is set as the agent's status quo (i.e., the agent's utility for not participating). Any outcome with non-negative utility is then individually rational. A Pareto-efficient outcome is desirable because there is then no waste in the allocation of the resources [72, p. 313]. If an outcome is not Pareto-efficient, another deal could have been made that would have been better for at least one player (and equally good for the other player), or even better for both.

The players are endowed with strategies that determine how the bargaining proceeds. In general, a player's strategy is a plan that lays out a course of action for each possible state or history [90]. In a bargaining setting, a strategy determines the bids of a player, given the history of the game. Moreover, the strategy decides how the player responds to the bids received from the other player(s) in the game. In the alternating-offers game (see Section 2.3.2), for example, a player can respond by accepting or rejecting the bid received from the opponent.

Mechanism design

An important application area of game theory is setting up the rules of a game, such as voting procedures or auction rules, so as to induce a certain outcome, given that players act rationally and in their own best interest.
For example, game theory can help to understand which penalties, rewards, or tax systems are most effective in inducing industrial companies to apply environmentally friendly production methods. In the context of bargaining, common goals are maximising social welfare (i.e., the sum of the players' utilities) or maximising revenue. In economics, choosing the right rules in order to achieve desired outcomes is known as the problem of mechanism design [133][72, Ch. 23]. First, we define the notion of a mechanism.

Definition 5.1 (Mechanism). A mechanism is a set of decision rules that map the strategies of the agents to a collective outcome.

A mechanism can be viewed as an institution with rules governing the procedure for making the collective choice [72, p. 866]. In a direct mechanism, the agents are asked to state their preferences directly (either truthfully or not). An agent's preferences, or type, are represented by a utility function expressing the valuation of the possible outcomes or allocations. In an indirect mechanism, players do not communicate an entire utility function, but instead, for instance, bids in an iterative auction such as the English auction.³

Mechanism design deals with the problem of finding a mechanism that results in a desired collective outcome. It assumes that the agents maximise their individual utility and that the institution that governs the rules does not know the preferences or types of the agents beforehand (i.e., we are in a setting characterised by incomplete information; see [72, Ch. 23.B] and Section 2.2). In other words, mechanism design tries to answer whether, and if so how, a desired social outcome can be realised in a world of selfish agents. A mechanism is called incentive compatible if it induces the agents to reveal their preferences truthfully. An important result is the revelation principle [11, Ch. 11][72, Ch. 23]. It states that if a desired social outcome can be realised by an indirect mechanism, then there exists an incentive-compatible direct mechanism that also reaches the desired outcome.

1.1.3 Concepts from computer science

In this thesis we describe software agents [144] that fully or partially automate the task of negotiation. We define a software agent as an autonomous software program that operates on behalf of its owner. Software agents have a certain goal, which in this thesis is to maximise a given utility function. The software agents described here can usually learn from experience and adapt their behaviour given feedback from the environment, without any human intervention. When multiple software agents interact, the entire system is called a multi-agent system. Note that in a multi-agent system the agents can reside on different platforms, in which case communication occurs via a physical network. We also use the term evolutionary agent to denote an agent whose strategy is adapted using an evolutionary algorithm.
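The Pareto-efficiency criterion of Definition 4.3 can be illustrated with a brute-force filter over a finite set of candidate outcomes. Each outcome is represented as a tuple of utilities, one per player. This is a hypothetical sketch: the function names and the example utility pairs are made up. The quadratic pairwise comparison is only intended for small, explicitly enumerated outcome sets.

```python
def dominates(a, b):
    """True if outcome a is at least as good as b for every player
    and strictly better for at least one player."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_efficient(outcomes):
    """Keep the outcomes that no other outcome dominates (Definition 4.3)."""
    return [o for o in outcomes if not any(dominates(other, o) for other in outcomes)]


# Hypothetical (player 1, player 2) utility pairs.
outcomes = [(0.2, 0.9), (0.5, 0.5), (0.4, 0.4), (0.9, 0.1), (0.6, 0.6)]
print(pareto_efficient(outcomes))  # [(0.2, 0.9), (0.9, 0.1), (0.6, 0.6)]
```

Both (0.5, 0.5) and (0.4, 0.4) are removed because (0.6, 0.6) is better for both players. The three remaining points lie on the Pareto-efficient frontier of this finite set. None of them can be improved for one player without making the other worse off.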
1.2 Evolutionary algorithms

Evolutionary algorithms (EAs) are powerful search algorithms from the field of artificial intelligence that are based on the principles of natural evolution [8, 45, 51, 75, 103, 115]. EAs were originally applied to solve optimisation problems, such as the travelling salesman problem and the knapsack problem [29], but are now increasingly being used to model societies of learning agents, especially within the field of agent-based computational economics (ACE) [4, 29, 104, 124, 127, 139]. Throughout this thesis, EAs are applied to model adaptive agents that can learn to bargain effectively by means of trial and error. This section first briefly explains the basic principles of EAs. It then motivates and explains the use of EAs in the context of bargaining. Finally, Section 1.2.3 describes in more detail the actual algorithm used in this thesis; the basic approach is the same in all chapters that apply evolutionary algorithms.

³ In an English auction, players call out increasingly higher bids until no more increases are made. The winner is the last bidder.

1.2.1 Principles of evolutionary algorithms

The cornerstones of evolution in nature are survival of the fittest together with the transfer (with some variation) of genetic material from one generation to the next. EAs apply these aspects of biology to evolve an artificial population of individuals. In this case, these individuals are not living organisms but, for instance, solutions to an optimisation problem or bargaining strategies of an agent. The solutions are encoded on the chromosome of an individual, which often consists of a string of real or binary values. As in natural ecosystems, the survival of these individuals depends on their fitness. A suitable fitness measure in artificial ecosystems depends on the problem domain. It can, for instance, be an objective function in the case of an optimisation problem, or the mean utility obtained by a strategy in a game. Using the example of the well-known prisoner's dilemma⁴ [90, p. 16], an individual's chromosome encodes a player's (binary) strategy: confess or not confess. The fitness is determined by the final payoff (or utility) obtained when the game is played. Through reproduction, new individuals are generated that inherit genetic material from the existing individuals in a population. Natural selection then removes individuals with a relatively low fitness from the population. In the long run, this process of evolution causes good traits (i.e., those that contribute to a higher fitness) to remain and bad traits to die out.
Additionally, variation or errors in the transfer of genetic material create new types of individuals or solutions.

⁴ In this game, two suspects in a crime can each choose either to confess or not to confess, without knowing the strategy of the other player. The payoff or final utility of a player depends on both his own choice and the choice made by the other player.

1.2.2 Modelling adaptive bargaining agents

Traditional game-theoretic studies of bargaining rely on strong assumptions, such as full rationality of the agents and common knowledge of beliefs and preferences (for details see Chapter 2). In reality, these criteria are rarely met. Even computational autonomous agents, which can perform calculations much faster than humans, cannot always find optimal or rational solutions. More importantly, since agents can be programmed by different parties, it is better to avoid strict assumptions about other agents' behaviour, in particular concerning their rationality. Rather than assuming full rationality, we assume that bargaining agents have little a priori knowledge and gradually adapt and search for optimal solutions by a process of trial and error. Such agents are called boundedly rational.

In this thesis we apply an EA to model this learning aspect of bargaining agents and to develop effective strategies for these agents. EAs are frequently used for modelling the (adaptive) behaviour of human societies and of societies of computational agents from the bottom up, especially within the field of agent-based computational economics (ACE).⁵ EAs are also increasingly being used to study bargaining situations that are difficult to analyse game-theoretically, as in [31, 34, 73, 88, 126] (see also Section 2.4.1). The advantage of EAs is that they neither assume nor use rationality explicitly; basically, the fitness of the individual agents determines whether a strategy will be used in future situations. Nonetheless, surprisingly rational behaviour often emerges from such agents of low rationality [146] (as we will also show in this thesis).

There are several ways of modelling adaptive agents using EAs. In the approach used in this thesis, agents select their bargaining strategies from a pool of strategies. A separate pool of strategies exists for each agent type, where a type is defined by the preferences (i.e., utility function) of the agent and/or the agent's role (e.g., buyer or seller). Agents of the same type select their strategies from the same pool, as these agents are likely to have similar behaviour. Agents of different types, on the other hand, will usually prefer different strategies, hence the use of separate pools. The pools then evolve independently, i.e., no genetic material is exchanged between the different pools. Note that if there is only a single agent of a certain type, all strategies in a pool belong to that agent. This is also called a model of individual learning.
If there are several agents of the same type, this is called population learning, since a population of agents (of the same type) learns as a whole. Below, the implementation of the EA is explained in more detail.

1.2.3 Implementation

The term evolutionary algorithm refers to a broad class of algorithms. The implementation used in this thesis is based on a branch within EAs called evolution strategies (ES) [8], originally developed by Rechenberg [103] and Schwefel [115]. ES were developed independently of the well-known genetic algorithms (GAs) [45, 75], introduced by Holland [51]. Whereas GAs are more tailored toward binary-coded search spaces, ES were originally designed for real-valued representations. Real-valued encoding is the more natural choice for the type of bargaining strategies we employ in the simulations. Besides GAs and ES, the classes of evolutionary algorithms include genetic programming and evolutionary programming. For an interesting overview of the various approaches within evolutionary computation, see [7].

⁵ For an on-line survey of the field of ACE, see [125].
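The per-type strategy pools described under Modelling adaptive bargaining agents can be pictured with the small data-structure sketch below. It is a hypothetical illustration rather than the thesis implementation. The function names, the agent and type labels, and the uniform random draw of a strategy from the pool are all assumptions. Strategies are real-valued chromosomes in [0, 1], as in the EA described next.

```python
import random


def build_pools(agent_types, pool_size, length, rng):
    """One pool of real-valued chromosomes per agent type (pools never exchange material)."""
    return {
        t: [[rng.random() for _ in range(length)] for _ in range(pool_size)]
        for t in sorted(set(agent_types.values()))
    }


def assign_strategies(agent_types, pools, rng):
    """Each agent draws a strategy from the pool belonging to its own type."""
    return {agent: rng.choice(pools[t]) for agent, t in agent_types.items()}


rng = random.Random(1)
# Hypothetical setting: two sellers share one type (population learning),
# while the single buyer owns its whole pool (individual learning).
agent_types = {"seller_1": "seller", "seller_2": "seller", "buyer_1": "buyer"}
pools = build_pools(agent_types, pool_size=5, length=3, rng=rng)
strategies = assign_strategies(agent_types, pools, rng)
```

Keeping a separate pool per type is what lets the pools evolve independently. An EA step (selection and mutation) would be applied to each entry of `pools` separately, so genetic material never crosses from, say, the seller pool to the buyer pool.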

Figure 1.1: Iteration loop of the evolutionary algorithm.

An outline of the EA is given in Figure 1.1. The EA starts with a randomly initialised parental population of individuals. Each individual contains a bargaining strategy, which is encoded on the chromosome: a fixed-size string $[x_0, \ldots, x_{l-1}]$ of length $l$ with real values $x_i \in [0, 1]$. Subsequently, offspring individuals are created (see Figure 1.1) by first selecting an agent from the parental population (at random, with replacement) and then mutating its chromosome to create a new offspring (the mutation operator is described below). Figure 1.2 depicts the chromosomes of a parent individual and a corresponding (mutated) offspring individual. This process is repeated until the offspring population reaches the required size.

Figure 1.2: The chromosome of a parent individual, $[x_0, x_1, x_2, \ldots, x_{l-1}]$, and of an associated offspring individual, $[x'_0, x'_1, x'_2, \ldots, x'_{l-1}]$. Each chromosome consists of $l$ real values $x_i, x'_i \in [0, 1]$. The offspring individual is created by mutating the chromosome of the selected parent individual.

In the next stage, the fitness or performance of both the offspring and parent individuals is determined by a process of negotiation. The way in which this is achieved depends on the negotiation setup; details are provided in the corresponding chapters. In the final stage of the iteration (see Fig. 1.1), the fittest agents are selected as the new parents for the next iteration. Selection is performed using the deterministic (µ + λ)-ES selection scheme [7, 8], where µ is the number of parents and λ is the number of generated offspring. The µ survivors with the highest fitness are selected (deterministically) from the union of parental and offspring agents. This final step completes one iteration or generation of the EA.
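The iteration loop above can be sketched as a minimal (µ + λ)-ES in Python. The sketch rests on two simplifying assumptions. First, the negotiation-based fitness is replaced by a hypothetical static function `toy_fitness`. Second, mutation applies Eq. (1.1) with a single fixed σ, rather than either of the two σ-control models described below. All names and parameter values are illustrative.

```python
import random


def mutate(parent, sigma, rng):
    """Eq. (1.1) with a fixed, hypothetical sigma; values are clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + sigma * rng.gauss(0.0, 1.0))) for x in parent]


def mu_plus_lambda_es(fitness, length, mu, lam, generations, sigma=0.1, seed=0):
    """Run a (mu + lambda)-ES and return the final parents, fittest first."""
    rng = random.Random(seed)
    # Randomly initialised parental population; genes are reals in [0, 1].
    parents = [[rng.random() for _ in range(length)] for _ in range(mu)]
    for _ in range(generations):
        # Offspring: select a parent uniformly at random (with replacement), then mutate.
        offspring = [mutate(rng.choice(parents), sigma, rng) for _ in range(lam)]
        # Evaluate parents and offspring together; the mu fittest survive deterministically.
        parents = sorted(parents + offspring, key=fitness, reverse=True)[:mu]
    return parents


def toy_fitness(chromosome):
    """Hypothetical stand-in for a negotiation payoff: best when every gene equals 0.7."""
    return -sum((x - 0.7) ** 2 for x in chromosome)


best = mu_plus_lambda_es(toy_fitness, length=4, mu=5, lam=20, generations=200)[0]
print([round(x, 2) for x in best])
```

In the thesis, fitness depends on negotiations with other agents and can therefore change from one generation to the next. This is why the sketch re-evaluates parents and offspring together in every generation instead of caching parental fitness. With a static toy fitness, the elitist (µ + λ) selection guarantees that the best fitness never decreases.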

Mutation and Recombination

Mutation and recombination are the most commonly used EA operators for reproduction. Recombination exchanges parts of the parental chromosomes, whereas mutation produces random changes in a chromosome. In the case of an ES, it is common to use mutation-based models without recombination. This is mainly because the ES mutation operator (explained below) is considerably more sophisticated than the standard operator used in, e.g., genetic algorithms. Moreover, for many computational experiments of the kind discussed in this thesis, the effects of recombination seemed to be negligible when using an ES (see also [126]). We therefore focus on mutation-based models in this thesis.

The mutation operator of an ES implementation works as follows. Each real value $x_i$ of a parent chromosome (see Figure 1.2) is mutated by adding a zero-mean Gaussian variable with standard deviation $\sigma_i$ [8, 126], thereby producing a new value $x'_i$ for the chromosome of the offspring:

$$x'_i := x_i + \sigma_i N_i(0, 1). \qquad (1.1)$$

All resulting values larger than unity (or smaller than zero) are set to unity (respectively zero). In our simulations, we use two mutation models, which we describe below: a mutation model with self-adaptive control of the standard deviations $\sigma_i$ [8, pp. …][126], and a model with exponential decay of the standard deviations.

Self-Adaptive Control

This model allows the strategy and the corresponding standard deviations to evolve at the same time. More formally, an agent consists of strategy variables $[x_0, \ldots, x_{l-1}]$ and ES-parameters $[\sigma_0, \ldots, \sigma_{l-1}]$, where $l$ is the length of the chromosome. The mutation operator first updates an agent's ES-parameters $\sigma_i$ in the following way:

$$\sigma_i := \sigma_i \exp\left[\tau' N(0, 1) + \tau N_i(0, 1)\right], \qquad (1.2)$$

where $\tau'$ and $\tau$ are the so-called learning rates [8, p. 72], and $N(0, 1)$ denotes a normally distributed random variable with expectation zero and standard deviation one.
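The self-adaptive operator can be sketched as a single Python function. It first updates the deviations with Eq. (1.2), then mutates the strategy variables with the updated deviations as in Eq. (1.1), clipping them to [0, 1]. It assumes the learning rates τ' = 1/√(2l) and τ = 1/√(2√l), the commonly recommended settings of [8] given in footnote 6. The lower bound ε_σ is applied as described in the text, but its value (1e-3 here) is a hypothetical choice, as is the function name.

```python
import math
import random


def self_adaptive_mutation(x, sigma, rng, eps_sigma=1e-3):
    """Return a mutated copy (x', sigma') of one individual.

    First the ES-parameters are updated as in Eq. (1.2), then the strategy
    variables are mutated with the *updated* deviations as in Eq. (1.1) and
    clipped to [0, 1]. eps_sigma is a hypothetical lower bound on sigma.
    """
    l = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * l)           # global learning rate tau'
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(l))      # per-gene learning rate tau
    global_draw = rng.gauss(0.0, 1.0)              # N(0, 1): drawn once per mutation
    new_sigma = [
        max(eps_sigma, s * math.exp(tau_prime * global_draw + tau * rng.gauss(0.0, 1.0)))
        for s in sigma                             # N_i(0, 1): fresh draw per gene
    ]
    new_x = [
        min(1.0, max(0.0, xi + si * rng.gauss(0.0, 1.0)))
        for xi, si in zip(x, new_sigma)
    ]
    return new_x, new_sigma


rng = random.Random(42)
child, child_sigma = self_adaptive_mutation([0.5, 0.2, 0.9], [0.1, 0.1, 0.1], rng)
print(child, child_sigma)
```

Because the deviations are carried along with each chromosome and pass through the same selection step, offspring whose σ values suit the current search phase tend to survive. This is the mechanism by which the ES tunes its own step sizes.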
The index $i$ in $N_i$ indicates that the variable is sampled anew for each value of $i$, whereas $N(0, 1)$ is sampled only once per mutation and shared by all $i$. We use commonly recommended settings for these parameters (see [8, p. 72]).⁶ After the strategy parameters have been modified, the strategy variables are mutated as indicated in Eq. (1.1). Note that, since selection acts on the $\sigma_i$ as well as on the strategy variables, the $\sigma_i$ are part of the evolutionary process. The particular initial value chosen for $\sigma_i$ is therefore typically not crucial for this model, as the self-adaptation process rapidly scales $\sigma_i$ into the proper range. For example, if solutions are far from the optimal value, the $\sigma_i$ can increase as a result of the evolutionary process. On the other hand, if good solutions are found, the $\sigma_i$ can converge to smaller values in order to maintain these solutions. To prevent complete convergence of the population, we force all standard deviations to remain larger than a small value $\varepsilon_\sigma$ [8, pp. …].

⁶ Namely, $\tau' = (\sqrt{2l})^{-1}$ and $\tau = (\sqrt{2\sqrt{l}})^{-1}$, where $l$ is the length of the chromosome.

Exponential Decay

Using this model, the standard deviations $\sigma_i$ decay exponentially such that every $t$ generations their value is halved, i.e., $\sigma_i(g) = \sigma_i(0) \cdot 2^{-g/t}$ after $g$ generations. We call $t$ the half-life parameter. This model is similar to the simulated annealing mechanism, where a temperature parameter is slowly lowered to reduce variation in the exploration space. Using this model, the EA always converges if the simulation is run for a sufficient number of generations.

1.3 Organisation of the thesis

Readers who are new to the field of game theory and bargaining are recommended to read the introduction to this topic in Chapter 2. Specific topics include the ultimatum game, the alternating-offers game, bargaining with incomplete information, multi-issue bargaining, and one-to-many bargaining. Chapter 2 also contains a survey of approaches that use techniques from artificial intelligence and are in that way related to the general topic of the thesis. Chapter 8 concludes the thesis with a discussion and an overview of the main results. The remaining chapters of the thesis are grouped into two parts: Part A considers fundamental aspects of bilateral bargaining systems using both game-theoretical and computational techniques. Part B investigates two business applications of automated bargaining, and introduces a number of effective bargaining strategies. Additionally, the Appendix provides a game-theoretic analysis of the games described in Chapter 3. Each chapter of Parts A and B can, in principle, be read independently.
Where necessary, cross-references are indicated within the chapters. A recurring theme is the application of evolutionary algorithms for simulating the strategic behaviour of the agents. The evolutionary algorithm is therefore treated separately in Section 1.2. Parts A and B are organised as follows:

Part A: Fundamental aspects of bargaining systems

Chapter 3 describes a system for bilateral negotiations in which artificial agents are generated by an evolutionary algorithm. The negotiations are governed by a finite-horizon version of the alternating-offers protocol. Several issues are negotiated simultaneously. This can reduce the competitive nature of the game, since trade-offs can be made to obtain mutually beneficial solutions. These so-called Pareto-efficient solutions are indeed found by the evolutionary agents. The outcomes of the evolutionary system are also analysed and validated using the game-theoretic subgame-perfect equilibrium as a benchmark. We furthermore present and investigate an extended model in which the agents take into account the fairness of the obtained payoff. The concept of fairness plays an important role in real-life negotiations and experimental economics. We find that when the fairness norm is consistently applied during the negotiation, the evolving agents reach symmetric outcomes which are robust and rather insensitive to the actual fairness settings.

Chapter 4 extends the above game by allowing both agents to negotiate with other opponents in case of a disagreement. In this way the basics of a competitive market are modelled, where, for instance, a buyer can try several sellers before making a purchase decision. Negotiations are limited to a single round, which corresponds to the so-called ultimatum game. Whereas in the regular ultimatum game the proposer demands the entire surplus, responding agents can now refuse unacceptable take-it-or-leave-it deals and negotiate with another opponent. As before, the game is investigated using an evolutionary simulation. The outcomes appear to depend largely on the information available to the agents. We find that if the agents' number of future bargaining opportunities is common knowledge, the proposer has the advantage. If this information is held private, however, the responder can obtain a larger share of the pie, even if the initial number of bargaining opportunities is equal for both agents. For the first case, a game-theoretic analysis of the game is also presented and compared to the evolutionary results. Although a theoretical analysis is hard for the incomplete-information case, the evolutionary simulation is well suited to analysing both settings.
The game is further extended to allow several issues to be negotiated simultaneously. Furthermore, we investigate the effects of search costs, as well as the case where future opportunities are uncertain and a new opponent cannot always be found.

Part B: Bargaining systems for business applications

Chapter 5 considers a business application of automated negotiation in which several supplier agents of goods and services compete for banner space, or consumer attention space, by bidding in an auction. Bidding is based on information about the consumers, their so-called profile. As a result of the auction, a small selection of banners is short-listed and presented to the consumer, for instance on a web site. The supplier agents are simulated using an evolutionary algorithm and can learn, from consumer feedback and from whether or not they were short-listed, which type of consumers to target and how much to bid. A number of consumer behaviour models are investigated that simulate the consumer's response to the presented banners. In a relatively simple model, the response is independent of other banners displayed concurrently. In other models, the response contains dependencies between the banners. The auctioneer can select the auction rules, or mechanism, that generate the best advertisements for the consumers while at the same time providing the suppliers with sufficient profits. Several mechanisms are investigated using the simulation environment.

Chapter 6 applies automated negotiation to buy and sell bundles of information goods. A single information provider agent, or seller agent, negotiates with a number of buyer agents simultaneously. Whereas in Chapter 5 an auction is used for a one-to-many setting, a bilateral negotiation protocol is applied in this case: the seller negotiates with each buyer by alternating offers and counter-offers, as described in Chapter 3. A bilateral protocol is more suitable here because information goods have no supply constraints and different buyers can be interested in very diverse bundles of goods. Bundles are personalised by bargaining over multiple issues. Bargaining in this setting essentially has a double purpose: (1) division of the surplus, and (2) maximising the joint gains that can be achieved by finding win-win, or Pareto-efficient (see Def. 4.3), outcomes. This chapter focuses on the latter and introduces negotiation strategies for multi-issue negotiations which can approximate Pareto-efficient solutions.

Chapter 7 also considers the one-to-many bargaining setting using a bilateral bargaining protocol, but focuses on the division of the surplus. Although the buyers perceive bargaining as bilateral, the seller can actually benefit from the fact that bargaining occurs with many buyers simultaneously. This is especially the case if buyers have time pressure and prefer early agreements. Several bargaining strategies for the seller are investigated and compared using an evolutionary simulation.
A class of strategies is introduced which is based on the first-price auction. These strategies can especially benefit from the competition arising from the time pressure. The seller's bargaining strategies also take into account a notion of fairness, which should ensure that buyers are treated fairly and do not feel discriminated against based on their individual bargaining behaviour or preferences.

Publications

Chapters 3-6 are based on published work and/or work that has been accepted for publication but has yet to appear. Chapters 2 and 7 are based on technical reports.

Chapter 2 is based on [41]: E.H. Gerding, D.D.B. van Bragt, and J.A. La Poutré. Scientific approaches and techniques for negotiation: A game theoretic and artificial intelligence perspective. Technical Report SEN-R0005, CWI, Amsterdam, 2000.

Chapter 3 is based on [42]: E.H. Gerding, D.D.B. van Bragt, and J.A. La Poutré. Multi-issue negotiation processes by evolutionary simulation: Validation and social extensions. Computational Economics, 22:39–63.

Chapter 4 is based on [38]: E.H. Gerding and J.A. La Poutré. Bargaining with posterior opportunities: An evolutionary social simulation. In M. Gallegati, A. Kirman, and M. Marsili, editors, The Complex Dynamics of Economic Interaction, Springer Lecture Notes in Economics and Mathematical Systems (LNEMS), Vol. 531. Springer-Verlag.

Chapter 5 is based on [17]: S.M. Bohte, E.H. Gerding, and J.A. La Poutré. Market-based recommendation: Agents that compete for consumer attention. ACM Transactions on Internet Technology, August 2004 (to appear). A shorter version appeared earlier as [16]: S.M. Bohte, E.H. Gerding, and H. La Poutré. Competitive market-based allocation of consumer attention space. In M. Wellman, editor, Proceedings of the 3rd ACM Conference on Electronic Commerce (EC-01). The ACM Press.

Chapter 6 is based on [120]: K. Somefun, E.H. Gerding, S. Bohte, and J.A. La Poutré. Automated negotiation and bundling of information goods. In Agent-Mediated Electronic Commerce V, Springer Lecture Notes in Artificial Intelligence (LNAI). Springer-Verlag, Berlin, to appear.

Chapter 7 is based on [40]: E.H. Gerding, K. Somefun, and J.A. La Poutré. Bilateral bargaining in a one-to-many bargaining setting. Technical Report, CWI, Amsterdam, to appear. A shorter version has been accepted for publication as [39]: E.H. Gerding, K. Somefun, and J.A. La Poutré. Bilateral bargaining in a one-to-many bargaining setting. In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS2004), New York City, New York. IEEE Computer Society Press, 2004.

Chapter 2

Bargaining: an overview

This chapter contains an overview of approaches and techniques concerned with bargaining. We focus here on the large body of literature that has been published in the fields of economics (in particular game theory) and artificial intelligence (AI). To give a brief impression of the rapid developments in this field, we first highlight some important breakthroughs in economic bargaining theory in Section 2.1. Section 2.2 discusses assumptions frequently made in game theory to make mathematical analysis feasible, and motivates the use of computational techniques. Details on game-theoretic bargaining approaches follow in Section 2.3. Bargaining approaches using computational techniques from the field of artificial intelligence are the topic of Section 2.4. Finally, Section 2.5 concludes this chapter with a short discussion.

2.1 A brief history of bargaining

Perhaps surprisingly, the bargaining problem has challenged economists for decades. Yet the bargaining problem is stated very easily [110]: "Two individuals have before them several possible contractual agreements. Both have interests in reaching agreement but their interests are not entirely identical. What will be the agreed contract, assuming that both parties behave rationally?"

Before the path-breaking work of Nash [82] and, much later, Rubinstein [110], the bargaining problem was considered to be indeterminate. For example, in their influential work, Von Neumann and Morgenstern [137] argued that the most one can say is that the agreed contract will lie in the so-called bargaining set. The bargaining set is the set of all feasible outcomes (an outcome is feasible if it can be jointly achieved by the players involved) that are individually rational (see Def. 4.2) and Pareto-efficient (see Def. 4.3); i.e., each such outcome is no worse than disagreement, and there is no other agreement that both parties would prefer. But because this bargaining set in general consists of an infinite number of different agreements, this requirement does not yield a unique bargaining solution. A unique solution can be found, however, if the agreed contract satisfies additional axioms such as those proposed by Nash [82]. This solution is called the Nash bargaining solution and is discussed in Section 2.3.1. Because one can argue about which axioms are reasonable and which are not, Nash suggested complementing this axiomatic approach with a strategic game. This route was followed by Rubinstein [110], who proved that an important bargaining game (the alternating-offers game) has a unique solution (see Section 2.3.2). Binmore [12] then connected the fields of axiomatic and strategic bargaining by proving that the solution of Rubinstein's bargaining model coincides with the Nash bargaining solution under special circumstances.

2.2 Game theory and artificial intelligence

Game theory frequently makes simplifying assumptions to facilitate the mathematical analysis. Common assumptions are, for instance, (1) complete knowledge of the circumstances in which the game is played and (2) full rationality of the players. The first assumption implies that the rules of the game and the preferences (i.e., the utility functions) and beliefs¹ of the players are common knowledge.² A game has incomplete information if something about the circumstances in which the game is played, such as the preferences of other players, is not known to the players. Game theorists traditionally model incomplete information about other players' preferences and beliefs by specifying a limited number of player types (see also Section 2.4.3). Each type is then uniquely determined by a set of preferences and beliefs. Players are not completely certain about the exact type of their opponent. However, the probability that an opponent is of a certain type is, again, common knowledge for all players.
In this manner, a game of incomplete information can be transformed into a game of imperfect information.³ The second assumption relates to the need for common knowledge of how players reason. It is assumed that players maximise their expected utility given their beliefs. Players have infinite computational capacity to pursue statements like "if I think that he thinks that I think..." ad infinitum. Furthermore, players are assumed to have a perfect memory.⁴ These assumptions limit the practical applicability of game-theoretic results.

¹ Beliefs are subjective probabilities of events occurring about which the player is uncertain.
² Common knowledge means that the players know what the other players know, know that the others know this, and so on, in an infinite regress.
³ In a game with imperfect information, uncertainty exists about the state of the world. A game is said to have perfect information if (i) there are no simultaneous moves and (ii) at each decision point it is known which choices have previously been made [131, Ch. 1].
⁴ Lately, much research in game theory has focused on the field of bounded rationality, in which players have limited computational power and/or limited hindsight. An overview of recent work in this field can be found in [112]. Binmore also gives a short discussion of this topic in [11, pp. …].

In the field of AI, however, assumptions like complete knowledge or full rationality are not necessary, because the behaviour of individual agents can be modelled directly.⁵ This gives the AI approach an important advantage over more rigorous (but at the same time more simplified) game-theoretical models. Researchers in the field of AI are currently developing software agents (see Section 1.1.3) which should be able (in the near future) to negotiate in an intelligent way on behalf of their users. A survey of the potential of automated negotiation is given in [144, Ch. 9]. The state of the art of agent technology is reviewed in [70]. In future e-commerce applications, multi-agent systems will need to be flexible, especially for trading, brokering, and profiling [128]. In particular, it is important for negotiating (software) agents to be able to adapt their strategies to deal with changing opponents, changing topics and concerns, and changing user preferences. Multi-agent learning (the ability of agents to learn how to communicate, cooperate and compete) becomes crucial in such domains [70, p. 23]. This should lead to much more advanced and universal systems. Nevertheless, due to this rapidly increasing complexity, the connection between the AI approach and a game-theoretic analysis remains important. Game theory may aid in the difficult task of choosing a suitable bargaining protocol [14] (see Def. 4.1). Tools and techniques from AI can be used to develop software applications, bargaining strategies, protocols and mechanisms which are currently beyond the reach of classical game theory.

⁵ For example, agents can be programmed with a certain strategy and use, for instance, reinforcement learning to improve this strategy. These agents are not explicitly rational or fully informed. Nevertheless, after a period of learning, the agents could exhibit behaviour that resembles that of rational and fully informed agents.

2.3 Game-theoretic approaches to bargaining

Traditionally, game theory is divided into two branches: cooperative and non-cooperative game theory. In cooperative game theory, groups of players are taken as primitives and binding agreements can be made. Cooperative game theory abstracts away from the rules of the game and is mainly concerned with finding a solution given a set of feasible outcomes.⁶ A topic like coalition formation is typically analysed using cooperative game theory. Often, in real life, companies can gain profits by working together, for example by securing a larger market share or by reducing direct competition. In such games, a surplus (see Section 1.1.2) is created when two or more players cooperate and form a coalition. Cooperative game theory can then determine how the surplus is to be divided, given a coalition and a set of assumptions (called axioms). Likewise, cooperative bargaining theory determines how the surplus resulting from an agreement is to be divided, given the set of axioms (an example of such axioms resulting in a unique solution, the so-called Nash bargaining solution, is discussed in Section 2.3.1).

⁶ Recall from above that an outcome is feasible if it can be jointly achieved by the players involved.

Non-cooperative game theory, on the other hand, is concerned with specific games with a well-defined set of rules, game strategies, and payoffs rather than axioms. All strategies, rules and payoffs are known beforehand by the players. A player's strategy is a plan which lays out a course of action for each possible state or history. Strategies can be pure or mixed. A pure strategy determines the actions for a given state deterministically. A mixed strategy requires a player to randomise between his pure strategies. Payoffs are the final returns (expressed in utility) to the players when the game is concluded. Non-cooperative game theory uses the notion of a strategic equilibrium, or just equilibrium, to determine rational outcomes of a game. Numerous equilibrium concepts have been proposed in the literature (see [131] for an overview). Some widely used concepts are dominant strategies, Nash equilibrium and subgame-perfect equilibrium. We define these notions below.

Definition 18.1 (Dominant Strategy) A dominant strategy is optimal in all circumstances, that is, the strategy achieves the highest payoff no matter what the strategies of the other players are.

This is obviously a very strong notion of an equilibrium strategy. A slightly weaker, but still very powerful, equilibrium concept is the so-called Nash equilibrium [83, 84]:

Definition 18.2 (Nash Equilibrium) Strategies chosen by all players are said to be in Nash equilibrium if no player can benefit by unilaterally changing his strategy.
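For small strategic-form games, Definitions 18.1 and 18.2 can be checked by brute force. The Python sketch below is illustrative only: it considers pure strategies in a two-player game given as a payoff table (an encoding assumed here, not taken from the thesis) and treats a strategy as dominant if it is a best response to every strategy of the opponent, i.e., weak dominance.

```python
from itertools import product
from typing import List, Tuple

# payoffs[r][c] = (payoff to row player, payoff to column player); an assumed encoding.
Payoffs = List[List[Tuple[float, float]]]


def pure_nash_equilibria(g: Payoffs) -> List[Tuple[int, int]]:
    """All pure-strategy profiles where no player gains by deviating unilaterally."""
    rows, cols = len(g), len(g[0])
    equilibria = []
    for r, c in product(range(rows), range(cols)):
        row_ok = all(g[r][c][0] >= g[other][c][0] for other in range(rows))
        col_ok = all(g[r][c][1] >= g[r][other][1] for other in range(cols))
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


def dominant_strategies(g: Payoffs, player: int) -> List[int]:
    """Pure strategies of `player` (0 = row, 1 = column) that are a best response
    to every pure strategy of the opponent (weak dominance)."""
    rows, cols = len(g), len(g[0])
    if player == 0:
        return [r for r in range(rows)
                if all(g[r][c][0] >= g[other][c][0] for c in range(cols) for other in range(rows))]
    return [c for c in range(cols)
            if all(g[r][c][1] >= g[r][other][1] for r in range(rows) for other in range(cols))]
```

In the prisoner's dilemma, defecting is dominant for both players and the resulting profile is the unique pure Nash equilibrium. Matching pennies has no pure-strategy equilibrium at all; its equilibrium exists only in mixed strategies.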
Nash proved that every finite game has at least one equilibrium point (in pure or mixed strategies) [83, 84]. The concept of dominant strategies is a refinement of the Nash equilibrium. That is, if all players' strategies are dominant, they also constitute a Nash equilibrium. The reverse is not necessarily true, however. Another important refinement of a Nash equilibrium is Selten's subgame-perfect equilibrium [116, 117] for extensive-form games. Extensive-form games are games with a tree structure, i.e., games in which players can make decisions sequentially and at various stages of the game (by contrast, in strategic-form games, players are required to make decisions once and simultaneously). Subgame-perfect equilibrium is defined as follows:

Definition 18.3 (Subgame-Perfect Equilibrium) Strategies in an extensive-form game are in subgame-perfect equilibrium if the strategies constitute a Nash equilibrium at every decision point.

An overview of the main bargaining literature from the field of cooperative game theory is given in Section 2.3.1. We note that the concepts from cooperative game theory are not necessary to understand the remainder of the thesis; they are intended for the interested reader. In Section … several non-cooperative bargaining games are discussed. Particular attention is paid to the most important bargaining protocol: the alternating-offers game. In Section 2.3.2, bargaining over a single issue is assumed. Section … covers work on multiple-issue negotiations.

As we mentioned before, traditional game theory assumes complete information, implying that the players' preferences and beliefs are common knowledge. However, lately many researchers in game theory have focused on the consequences of players having private information. Among other things, incomplete information could explain why inefficient deals are reached or why no deal is reached at all. For instance, the occurrence of strikes and bargaining impasses, but also of delays in negotiations, can theoretically be addressed when complete information is no longer assumed. Literature related to this topic is discussed in Section …. We also consider one-to-many bargaining, i.e., settings where one player interacts with multiple opponents simultaneously. Auctions are the most common approach for such a setting, and will be the topic of Section … (an alternative approach, using bilateral bargaining, is discussed in Chapters 6 and 7).

2.3.1 Cooperative bargaining theory

Cooperative game theory considers the space of possible outcomes of a game, without specifying the game itself in detail. In the case of bargaining, the outcomes are often denoted in terms of utilities (see Section 1.1.1). In two-player games, the outcomes are then represented by utility pairs.
Cooperative bargaining theory is concerned with the question of which outcome will eventually prevail, given the set of all possible utility pairs. A particular set of possible outcomes is also referred to as a bargaining problem. A function which maps a bargaining problem to a single outcome is called a solution concept. Usually, a solution concept is only valid for a certain subset of all possible bargaining problems. For instance, the first and most famous solution concept, the Nash bargaining solution (see below), only applies to convex and compact bargaining sets (see [11, pp. …]). Only if these requirements are satisfied can the bargaining problem properly be called a Nash bargaining problem. An alternative bargaining solution has been proposed by Kalai and Smorodinsky [57]; their approach is discussed below. Both the Nash and the Kalai-Smorodinsky bargaining solutions are invariant with respect to the calibration of the players' utility scales. The utilitarian solution concept differs in that respect and does depend on how the utility functions are scaled. For this reason, its application is limited to those situations where interpersonal utility comparison makes sense. Cooperative theories of bargaining are discussed in more detail in [106].

The Nash bargaining solution

Nash proposed four properties, now called the Nash axioms, which should be satisfied by rational bargainers [82], [11, p. 184]:

1. The final outcome should not depend on how the players' utility scales are calibrated. That is, a utility function specifies a player's preferences, but different utility functions can be used to model the same preferences. Specifically, any strictly increasing affine transformation of a utility function models the same preferences as the original function, and should therefore yield the same outcome.
2. The agreed payoff pair should always be individually rational (see Def. 4.2) and Pareto-efficient (see Def. 4.3).
3. The outcome should be independent of irrelevant alternatives. Stated otherwise, if the players sometimes agree on the utility pair $s$ when $t$ is also a feasible agreement, they never agree on $t$ when $s$ is a feasible agreement.
4. In symmetric situations, both players get the same.

The solution which satisfies these four properties is characterised by the payoff pair $s = (x_1, x_2)$ which maximises the so-called Nash product

$$(x_1 - d_1)(x_2 - d_2),$$

where $d_1$ and $d_2$ are player 1's and player 2's payoffs in case of a disagreement. Nash proved that this is the only solution which satisfies all four axioms [82]. Given a Nash bargaining problem where the set of individually rational agreements is not empty, the Nash bargaining solution then leads to a unique outcome. Figure 2.1 illustrates how to construct the Nash bargaining solution for a given bargaining problem. Due to the fourth axiom, both players are treated symmetrically if the bargaining problem is symmetric as well. In other words, if the players' labels are reversed, each one will still receive the same payoff.
A more general solution attributes so-called bargaining powers $\alpha$ and $\beta$ to player 1 and player 2, respectively. In this generalised or asymmetric Nash bargaining solution, the fourth axiom is abandoned and the bargaining solution comes to depend on the bargaining powers of the two players.⁷ The generalised Nash bargaining solution corresponding to the bargaining powers $\alpha$ and $\beta$ can be characterised, as above, as the pair $s$ which maximises the product $(x_1 - d_1)^{\alpha}(x_2 - d_2)^{\beta}$ [11, p. 189].

⁷ What these bargaining powers represent depends on the actual (non-cooperative) game played. For example, in the case of negotiating companies, the bargaining powers could be determined by the strength of their respective market positions. It should be clear, however, that the bargaining powers have nothing to do with the bargaining skills of the players, since perfect rationality is assumed.
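The cooperative solution concepts of this subsection can be compared numerically. The Python sketch below is a hedged illustration, not a general solver: it assumes the bargaining set is summarised by a decreasing Pareto frontier $u_2 = f(u_1)$ for $u_1 \in [0, 1]$ (a hypothetical parameterisation), approximates the symmetric and generalised Nash solutions by a grid search over individually rational frontier points, and, for comparison with the Kalai-Smorodinsky and utilitarian solutions discussed below, computes those by bisection and grid search respectively. The grid size and the restriction of the utilitarian solution to individually rational points are assumptions.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Frontier = Callable[[float], float]  # hypothetical: u2 = f(u1), decreasing in u1


def rational_points(f: Frontier, d: Point, u1_max: float = 1.0, n: int = 100_001) -> List[Point]:
    """Grid points on the frontier that are individually rational with respect to d."""
    grid = (u1_max * k / (n - 1) for k in range(n))
    return [(u1, f(u1)) for u1 in grid if u1 >= d[0] and f(u1) >= d[1]]


def nash_solution(f: Frontier, d: Point, alpha: float = 1.0, beta: float = 1.0) -> Point:
    """Maximise the (generalised) Nash product (u1 - d1)^alpha * (u2 - d2)^beta."""
    pts = rational_points(f, d)
    return max(pts, key=lambda p: (p[0] - d[0]) ** alpha * (p[1] - d[1]) ** beta)


def kalai_smorodinsky_solution(f: Frontier, d: Point, iters: int = 60) -> Point:
    """Frontier point on the line from d to (m1, m2), found by bisection on u1."""
    pts = rational_points(f, d)
    m1, m2 = max(p[0] for p in pts), max(p[1] for p in pts)
    lo, hi = d[0], m1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # Player 1's relative gain is still below player 2's: move right along the frontier.
        if (mid - d[0]) / (m1 - d[0]) < (f(mid) - d[1]) / (m2 - d[1]):
            lo = mid
        else:
            hi = mid
    u1 = 0.5 * (lo + hi)
    return (u1, f(u1))


def utilitarian_solution(f: Frontier, d: Point) -> Point:
    """Maximise u1 + u2 over the individually rational frontier points."""
    return max(rational_points(f, d), key=lambda p: p[0] + p[1])
```

With the linear frontier $u_2 = 1 - u_1$ and $d = (0, 0)$, the Nash and Kalai-Smorodinsky solutions both give $(0.5, 0.5)$, and bargaining powers $\alpha = 2$, $\beta = 1$ move the generalised Nash solution to $u_1 = 2/3$. With the curved frontier $u_2 = 1 - u_1^2$ the concepts separate: the Nash solution lies at $u_1 = 1/\sqrt{3} \approx 0.577$, the Kalai-Smorodinsky solution at $u_1 = (\sqrt{5} - 1)/2 \approx 0.618$, and the utilitarian solution at $u_1 = 0.5$.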

Figure 2.1: Construction of the Nash bargaining solution. This figure shows the Pareto-efficient frontier (denoted by the solid line, see also Def. 4.3) and the Nash bargaining solution for a specific bargaining problem. The bargaining problem is defined by the set of feasible utility pairs (denoted by the grey area) and the disagreement point $d$, which specifies the players' payoffs in case of a disagreement. To find the (symmetric) Nash bargaining solution, one needs to find a supporting line on the Pareto-efficient frontier which is bounded by lines $r$ and $t$ such that the Nash bargaining solution is exactly halfway between these lines. The lines $r$ and $t$ are, respectively, the horizontal and the vertical lines drawn from the disagreement point $d$.

The Kalai-Smorodinsky bargaining solution

The third of the Nash axioms (independence of irrelevant alternatives) has been the source of great controversy (see the discussion in [69]). Kalai and Smorodinsky therefore proposed an alternative to this axiom, which they refer to as the axiom of monotonicity [57], [72, p. 844]. For a set $S$ of individually rational and Pareto-efficient points, let $m_i(S) = \max\{s_i \mid s \in S\}$ be the maximum utility value that player $i$ could attain (for $i = 1, 2$), given that the players are individually rational. The Kalai-Smorodinsky solution then selects the maximum element in $S$ on the line that joins the disagreement point $(d_1, d_2)$ with the point $(m_1(S), m_2(S))$. An example is given in Figure 2.2.

Figure 2.2: Construction of the Kalai-Smorodinsky solution. $m_1$ and $m_2$ are the maximum utilities for players 1 and 2 respectively, given that the players are individually rational. Point $k$ is the unique solution which satisfies the four axioms proposed by Kalai and Smorodinsky [57].

Utilitarianism

A utilitarian policy in philosophy is one which prefers an outcome that maximises the total welfare of the individuals in a society [80]. Any bargaining solution which maximises the sum of utilities is therefore called a utilitarian solution concept. Stated less formally, the utilitarian principle asserts that you should do something for me if it will hurt you less than it will help me. Clearly, a utilitarian solution concept assumes that interpersonal utility comparisons are possible. Therefore, Nash's first axiom (independence of utility calibrations) no longer holds in utilitarian models.⁸

⁸ Note that the Pareto-efficiency axiom still holds. The other axioms depend on the specific solution concept.

Concluding remarks

Apparently, many different types of solutions to the bargaining problem exist in cooperative game theory. The choice of a specific solution is of course based on norms existing in a society or, more specifically, on which axioms seem reasonable in a specific bargaining context. Certain outcomes might, for instance, be considered unfair; an example is given in [101, pp. …]. Additionally, it is important to consider for which classes of non-cooperative games the solution concepts from cooperative game theory are appropriate. For instance, if no non-cooperative game can be found which results in a solution specified by cooperative game theory, then the results from cooperative game theory have little bearing. Fortunately, such a connection between cooperative and non-cooperative game theory has been observed under special circumstances [12]. More details are given in the next section.

2.3.2 Bargaining over a single issue

Four different negotiation games or protocols (see Def. 4.1) are described in this section. These protocols can be used by two bargainers to divide a given bargaining surplus (see Section 1.1.2), that is, the mutual benefit resulting when the players reach an agreement. Without loss of generality, we assume in the following that the bargaining surplus is of size unity. The following protocols are considered below: (1) the Nash demand game, (2) the ultimatum game, (3) the alternating-offers game and (4) the monotonic concession protocol. The first three games are well known and widely used. The fourth game is described in [105] and is an attempt to model a more realistic negotiation scenario. However, in all games described here, analytical solutions are obtained using the strong assumption of common knowledge. Extrapolating the results obtained here to real-world cases is therefore a non-trivial step.

The protocols described in this section have been applied mainly to evaluate negotiations over a single issue. In real life, this issue is often the price of the good being negotiated. Although this keeps matters simple, important value-added services such as delivery time, warranty or service are left out. Both the supplier and the consumer could, for instance, benefit if negotiations involve multiple issues. Moreover, multiple-issue negotiations can be less competitive because solutions can be sought which satisfy both parties. Multiple-issue negotiations are studied in more detail in Section ….

The Nash demand game

In this game, both players simultaneously demand a certain fraction of the bargaining surplus, without any knowledge of the other player's demand [11, pp. …]. In case the sum of demands exceeds the surplus, both players only receive their disagreement payoff.
Otherwise, the demands are said to be compatible, and both players get what they requested. This game has an infinite number of Nash equilibria: every Pareto-efficient division (a pair of demands that sum exactly to the surplus) is an equilibrium, but so are outcomes in which both players receive their disagreement payoff. For example, if both players demand more than the entire surplus, neither player can gain by unilaterally changing his strategy. The concept of a Nash equilibrium thus places few restrictions on the outcome. Nash therefore suggested a refinement of this game which does yield a unique solution, called the perturbed demand game [89, pp ]. In this perturbed game, the players are not completely certain which outcomes are within the bargaining set (i.e., the set of compatible demands) and which are not. As the degree of uncertainty approaches zero, the Nash equilibrium of the perturbed game approaches the Nash bargaining solution of the regular demand game (without uncertainty).⁹ The reader is referred to [131] for technical details on this subject; a more introductory overview is given by Binmore [11].

The ultimatum game

When playing Nash's demand game, both players could easily end up with nothing, or some of the surplus could be thrown away. Players would do better by choosing a somewhat less competitive game; if they are unable to reach an agreement in this alternative game, the demand game still remains an option. A very simple alternative is the so-called ultimatum game. In this game, one of the players proposes a split of the surplus and the other player has only two options: accept or refuse. In case of a refusal, both players get nothing (or the demand game is played). Although this game again has an infinite number of Nash equilibria, it has only one subgame-perfect equilibrium (provided the bargaining surplus can be divided with arbitrary precision), in which the first player demands the whole surplus and the second player accepts this deal [11, pp ].

The alternating-offers game

Essentially a multiple-stage extension of the ultimatum game, the alternating-offers game is probably the most elegant bargaining model. As in the ultimatum game, player 1 starts by offering a fraction $x$ of the surplus to player 2. If player 2 accepts player 1's offer, he receives $x$ and player 1 receives $1 - x$. Otherwise, player 2 makes a counter offer in the next round, which player 1 then accepts or rejects (a rejection sending the game to the next round). This process is repeated until one of the players accepts or until a deadline, if there is one, is reached. Bargaining over a single issue in an alternating fashion was pioneered by Ingolf Ståhl [121].
This reference also gives a taxonomy and survey of the economic literature on bargaining before 1972. Ståhl analyses bargaining games with a finite number of alternatives. Games of both finite and infinite length are studied, but he primarily evaluates games of finite length. To simplify the theoretical analysis, Ståhl assumes good-faith bargaining, which prevents players from increasing their demands during play. He then identifies optimal strategies for rational players with perfect information by starting at the last stage of the game and working backwards inductively to the beginning of play. This procedure yields the equilibria that can be found with dynamic programming methods.

⁹ Note that only the Nash equilibria which result in solutions within the bargaining set are considered. Nash equilibria in which no agreement is reached still remain [89, p.79].
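The backward-induction procedure can be illustrated with a short Python sketch. It is not Ståhl's model: as simplifying assumptions, the surplus is perfectly divisible, payoffs are discounted with factors `delta1` and `delta2` (as in Rubinstein's model discussed below), both players receive nothing if the deadline passes without agreement, and a responder accepts any offer worth at least as much as continuing. The function name is hypothetical. Working backwards from the last round, each proposer offers the responder exactly the discounted value of what the responder would keep as proposer in the next round; with a single round, the game reduces to the ultimatum game and the proposer keeps the whole surplus.

```python
from fractions import Fraction


def backward_induction(rounds, delta1, delta2):
    """Subgame-perfect split of a unit surplus in a finite alternating-offers game.

    Player 1 proposes in rounds 0, 2, 4, ...; player 2 in rounds 1, 3, 5, ...
    Assumptions (illustration only): perfectly divisible surplus, payoff
    x * delta_i**t for agreement x at round t, zero payoff after the deadline,
    and a responder who accepts when indifferent.
    """
    deltas = (delta1, delta2)
    # In the last round the responder can only accept or get nothing,
    # so the proposer keeps everything (the ultimatum game outcome).
    proposer_share = Fraction(1)
    for t in range(rounds - 2, -1, -1):
        responder = 1 - t % 2  # index of the responder in round t
        # The responder proposes in round t + 1 and would keep proposer_share,
        # worth deltas[responder] * proposer_share one round earlier.
        proposer_share = 1 - deltas[responder] * proposer_share
    # proposer_share now belongs to player 1, the proposer in round 0.
    return proposer_share, 1 - proposer_share


if __name__ == "__main__":
    half = Fraction(1, 2)
    for rounds in (1, 2, 3, 10):
        print(rounds, backward_induction(rounds, half, half))
```

Exact fractions avoid rounding noise. As the number of rounds grows, player 1's share in this sketch approaches the infinite-horizon value $(1 - \delta_2)/(1 - \delta_1\delta_2)$ discussed below (2/3 when both factors equal 1/2).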

A straightforward dynamic programming approach can fail in case of imperfect information [131, Ch. 1]. Sensible strategies can then be found by requiring that each player's optimal strategy for the entire game also prescribes an optimal strategy in every subgame. As mentioned before, this concept of a subgame-perfect equilibrium (SPE, see Def. 18.3) is due to Selten [116, 117]. Rubinstein [110] successfully applied this equilibrium concept to identify a unique solution in his variant of the alternating-offers game.

Rubinstein's game [110] has infinite length and a continuum of alternatives. To simplify the analysis, Rubinstein made several assumptions about the players' preferences. An important difference from Ståhl's model is that time preferences are assumed to be stationary: the preference for getting a share $x$ of the surplus at time $t$ over getting $y$ at time $t + 1$ is independent of $t$. Rubinstein analyses two specific stationary models: one in which each player has a fixed bargaining cost per period ($c_1$ and $c_2$) and one in which each player has a fixed discount factor ($\delta_1$ and $\delta_2$). Discount factors relate the utility of future consumption to the utility of consuming immediately; in other words, they model how impatient a player is [11, p. 202]. We provide a formal definition of a discount factor:

Definition 25.1 (Discount Factor) The discount factor is used to translate expected utility or costs in any given future period into present-value terms. Player $i$'s utility for getting a fraction $x$ of the surplus at time $t$ is equal to $x\,\delta_i^{\,t}$.

If the discount factor is smaller than 1, a deal is therefore worth less if the agreement is reached in the future than if it is reached immediately.
Using stationarity and other assumptions, Rubinstein first demonstrated that the Nash equilibrium concept is too weak to identify a unique solution, by proving that every partition of the surplus can be supported as the outcome of Nash equilibrium play. To overcome this difficulty, Rubinstein then applied the concept of an SPE and proved that there exists a unique SPE in the alternating-offers bargaining model. For example, if the players have fixed discount factors $\delta_1$ and $\delta_2$, the only SPE is one in which player 1 gets $(1 - \delta_2)/(1 - \delta_1\delta_2)$ and player 2 the remainder (of a surplus of size 1). Furthermore, if both players use their SPE strategies, agreement is reached in the first round of the game. Notice that Rubinstein's proof assumes that both players have perfect information about the other player's preferences (i.e., their bargaining cost or discount factor). Bargaining with imperfect information (i.e., where uncertainty plays a crucial role) is discussed further under bargaining with private information below.

Rubinstein's paper has been very influential in bargaining theory, and a vast body of literature on infinite-horizon games now exists. Overviews, with many pointers to the literature, are given in [79, 89]. We conclude this section by discussing a few key papers in this field.
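Rubinstein's closed-form split can be checked with exact arithmetic. In the stationary SPE, let $x^*$ be player 1's share when player 1 proposes and $y^*$ player 2's share when player 2 proposes. Each responder is indifferent between accepting and waiting one round to propose, so $1 - x^* = \delta_2 y^*$ and $1 - y^* = \delta_1 x^*$, which solve to $x^* = (1-\delta_2)/(1-\delta_1\delta_2)$. The Python sketch below (hypothetical function names; discount factors assumed strictly between 0 and 1 and passed as `Fraction` values so that the equality checks are exact) computes the split and verifies both conditions.

```python
from fractions import Fraction


def rubinstein_split(delta1, delta2):
    """Shares (first mover, second mover) in Rubinstein's unique SPE."""
    first = (1 - delta2) / (1 - delta1 * delta2)
    return first, 1 - first


def indifference_holds(delta1, delta2):
    """Check the two stationary indifference conditions of the SPE.

    x_star: player 1's share when player 1 proposes.
    y_star: player 2's share when player 2 proposes (roles swapped).
    """
    x_star, _ = rubinstein_split(delta1, delta2)
    y_star, _ = rubinstein_split(delta2, delta1)
    # Player 2 is indifferent between 1 - x_star now and y_star one round later.
    player2_indifferent = 1 - x_star == delta2 * y_star
    # Player 1 is indifferent between 1 - y_star now and x_star one round later.
    player1_indifferent = 1 - y_star == delta1 * x_star
    return player2_indifferent and player1_indifferent


half = Fraction(1, 2)
print(rubinstein_split(half, half))  # (Fraction(2, 3), Fraction(1, 3))
print(indifference_holds(Fraction(9, 10), Fraction(4, 5)))  # True
```

With equal discount factors $\delta_1 = \delta_2 = \delta$, the expression reduces to $1/(1+\delta)$: the first mover keeps a larger share, but this advantage vanishes as $\delta \to 1$.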

A particularly important paper is [12]. In this paper, a relation between the SPE outcome of the alternating-offers game and the Nash bargaining solution is identified for the case in which the players are only slightly impatient (e.g., discount factors close to unity or small time intervals between rounds). This establishes a link between non-cooperative and cooperative bargaining theory and justifies the use of the Nash bargaining solution to resolve negotiation problems (at least in case of complete information). Van Damme et al. [132] have investigated the role of a smallest monetary unit (i.e., a finite number of alternatives) in the alternating-offers game with payoff discounting. They show that with a finite number of alternatives, any partition of the surplus can be supported as the result of a subgame-perfect equilibrium if the time interval between successive rounds becomes very small. This means that Rubinstein's assumption of a continuous spectrum of bids is essential for deriving a unique solution of the alternating-offers game under these conditions.

Monotonic concession protocol

A more restricted protocol than the alternating-offers game is described in [105]. In this monotonic concession protocol, the two players announce their proposals simultaneously. If each agent's offer matches or exceeds the other agent's demand, an agreement is reached; if the two offers differ, a coin is tossed to choose one of them. If no agreement is reached, the players must make new offers in the next round. The offers need to be monotonic, that is, the players are not allowed to make offers which have a lower utility for their counterpart than their previous offer. Hence, a player can either repeat the same offer (stand firm) or concede. Negotiations end if both agents stand firm in the same round, in which case the players receive their disagreement payoffs.
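The rules of the monotonic concession protocol can be sketched as a small Python simulation for a single issue. The encoding is an assumption made for illustration: the surplus consists of 100 indivisible units, an agent's offer is the number of units it demands for itself (the other agent is offered the remainder), disagreement payoffs are zero, and the strategies `concede_by` and `stand_firm` are hypothetical examples. Offers are compatible when the demands sum to at most 100, and any increase in an agent's demand is rejected as a violation of monotonicity.

```python
import random

SURPLUS = 100  # the surplus, measured in indivisible units (assumption)


def concede_by(step):
    """Hypothetical strategy: lower one's own demand by a fixed step."""
    return lambda own, other: max(own - step, 0)


def stand_firm(own, other):
    """Hypothetical strategy: always repeat the previous demand."""
    return own


def monotonic_concession(strategy1, strategy2, start1, start2, seed=0, max_rounds=1000):
    """Simulate the protocol for one issue and return (payoff1, payoff2).

    Each agent's offer is the number of units it demands for itself, so the
    other agent is offered the remainder. Disagreement payoffs are (0, 0).
    """
    rng = random.Random(seed)
    d1, d2 = start1, start2
    for _ in range(max_rounds):
        if d1 + d2 <= SURPLUS:  # each offer gives the other at least its demand
            # The offers may differ; a coin toss selects which one is implemented.
            if rng.random() < 0.5:
                return d1, SURPLUS - d1
            return SURPLUS - d2, d2
        new1, new2 = strategy1(d1, d2), strategy2(d2, d1)
        if new1 > d1 or new2 > d2:
            raise ValueError("offers must be monotonic: demands may not increase")
        if (new1, new2) == (d1, d2):  # both agents stand firm: disagreement
            return 0, 0
        d1, d2 = new1, new2
    return 0, 0


print(monotonic_concession(concede_by(10), concede_by(10), 100, 100))  # (50, 50)
print(monotonic_concession(stand_firm, stand_firm, 100, 100))  # (0, 0)
```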
Because in each round at least one of the players has to make a concession (or a disagreement occurs), the protocol has a finite execution time if the minimum concession per round is fixed and larger than zero. Note that in order to make a (monotonic) concession, a player needs some knowledge about the other player's preferences. This knowledge is crucial when several issues are negotiated at the same time: in this case, not only the sign of the other player's utility function on each issue (i.e., whether it increases or decreases) but also the relative importance of the issues matters. Rosenschein and Zlotkin discuss which kinds of strategies are stable and efficient under this protocol (in negotiations over a single issue). A strategy pair is efficient in this case if an agreement is always reached. Stability is defined using the notion of a symmetric Nash equilibrium: a strategy $s$ constitutes a symmetric Nash equilibrium (and is stable) if player 1 can do no better than playing $s$, given that player 2 also uses $s$. Note that a strategy $s$ in which both players concede in the same round is not stable: one of the players could do better by standing firm. On the other hand, a strategy where a player tosses a coin to decide whether to concede or stand still is neither efficient nor stable: in each round, both players stand firm with probability $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, in which case a disagreement occurs. The interested reader is referred to [105] for more details on the characteristics of this mechanism.

[Figure 2.3: Four different bargaining procedures used in multiple-issue bargaining [97]: global, separate and sequential bargaining, with sequential bargaining subdivided into simultaneous implementation and independent implementation.]

Bargaining over multiple issues

The situations above can be described as negotiations about how to divide a surplus. This means that the negotiations are distributive: a gain for one player always creates a loss for the other. Such negotiations are also referred to as competitive [48]. When more than a single issue is involved, and the players attach different importance to these issues, trade-offs become an option and negotiations may become integrative. The latter kind of negotiation is the main topic of this section. Results from cooperative game theory are discussed first, followed by an overview of results from non-cooperative game theory.

Cooperative game theory

If several issues are involved, an additive scoring system or an additive multi-attribute utility function (see Def. 3.1) can be used to represent the relationships or trade-offs between the issues.¹⁰ However, these methods are appropriate only if the issues are preferentially independent, that is, if the contribution of one issue is independent of the values of the other issues. Once the preferences are mapped, for instance onto an additive multi-attribute utility function, the bargaining set can be determined. The main goal is again to reach a Pareto-efficient outcome (see Def. 4.3). Previously introduced solution concepts such as the Nash bargaining solution or the Kalai-Smorodinsky solution can be used for this purpose.
Several practical considerations (concerning, for example, the fairness of the outcome) and some instructive real-world examples are given by Raiffa [101].

¹⁰ See [101, pp ] for a discussion of the differences between these methods.
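To illustrate how an additive multi-attribute utility function determines the bargaining set, the following Python sketch enumerates all deals over two issues, keeps the Pareto-efficient ones and selects the Nash bargaining solution, i.e., the deal that maximises the product of the players' utility gains over the disagreement point. The weights, the three-level discretisation of each issue and the disagreement point at zero utility are hypothetical choices. Each player attaches most importance to a different issue, which is exactly the situation in which trade-offs make a negotiation integrative.

```python
from fractions import Fraction
from itertools import product

LEVELS = [Fraction(0), Fraction(1, 2), Fraction(1)]  # share of an issue given to player 1
WEIGHTS_1 = (Fraction(4, 5), Fraction(1, 5))  # player 1 cares mostly about issue A
WEIGHTS_2 = (Fraction(1, 5), Fraction(4, 5))  # player 2 cares mostly about issue B


def utilities(deal):
    """Additive multi-attribute utilities of a deal (x_A, x_B) for both players."""
    u1 = sum(w * x for w, x in zip(WEIGHTS_1, deal))
    u2 = sum(w * (1 - x) for w, x in zip(WEIGHTS_2, deal))
    return u1, u2


def pareto_efficient(deals):
    """Deals for which no other deal gives both players at least as much
    utility and at least one player strictly more."""
    scored = [(d, utilities(d)) for d in deals]
    return [d for d, (a1, a2) in scored
            if not any(b1 >= a1 and b2 >= a2 and (b1, b2) != (a1, a2)
                       for _, (b1, b2) in scored)]


def nash_solution(deals):
    """Deal maximising the Nash product u1 * u2 (disagreement point assumed at 0)."""
    return max(deals, key=lambda d: utilities(d)[0] * utilities(d)[1])


deals = list(product(LEVELS, repeat=2))
efficient = pareto_efficient(deals)
best = nash_solution(efficient)
print("Pareto-efficient deals:", efficient)
print("Nash bargaining solution:", best, "utilities:", utilities(best))
```

In this example, splitting both issues equally gives each player a utility of 1/2 and is Pareto-dominated by the trade-off in which each player receives the whole of the issue he values most, which gives each player a utility of 4/5.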

Non-cooperative game theory

Four different bargaining procedures can be distinguished for multiple-issue bargaining [97] (see Figure 2.3). In global (or simultaneous) bargaining, all issues are negotiated at once. In the second procedure, separate bargaining, the issues are negotiated independently. The final two procedures fall under the heading of sequential bargaining, in which the issues are negotiated one after another; they are distinguished by their rules of implementation. These rules specify when the players can start enjoying the benefits of the issues that have been agreed upon.¹¹ Three possibilities are considered in [35]; here we only mention the two most important ones. Under the so-called independent implementation rule, an agreement on an individual issue takes effect immediately, that is, the agreed-upon issues are no longer discounted. Under the simultaneous implementation rule, on the other hand, the players have to wait until agreement is reached on all issues before they can enjoy any of the benefits. The time it takes to agree on the remaining issues then also reduces the profits gained on the issues already agreed upon.

When bargaining is sequential, an agenda needs to be determined that sets the order in which the issues are negotiated. Agenda setting is of course only relevant if the issues are of different importance. Another concern is whether the players attach the same importance to each issue or whether different players evaluate the importance of the issues differently. The latter is the most interesting case, since it allows for integrative negotiations. Unfortunately, only a limited literature exists on this topic in game theory. Usually, either the issues are of equal importance (as in [6]) or the players have identical preferences (as in [19]).
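The difference between the two implementation rules can be made concrete with a small calculation. The Python sketch assumes, for illustration only, that a player's undiscounted value for each issue is known, that all payoffs are discounted with a single per-round factor `delta`, and that issue $k$ is settled in round `agreement_rounds[k]`; the function name is hypothetical.

```python
def sequential_value(values, agreement_rounds, delta, rule):
    """Present value of a sequential agreement under an implementation rule.

    values[k]: undiscounted value of issue k; agreement_rounds[k]: round in
    which issue k is settled; rule: "independent" or "simultaneous".
    """
    if rule == "independent":
        # Each issue pays off as soon as it has been agreed upon.
        return sum(v * delta ** t for v, t in zip(values, agreement_rounds))
    if rule == "simultaneous":
        # Nothing pays off until the last issue has been settled.
        return sum(values) * delta ** max(agreement_rounds)
    raise ValueError("rule must be 'independent' or 'simultaneous'")


print(sequential_value([1.0, 1.0], [0, 2], 0.5, "independent"))   # 1.25
print(sequential_value([1.0, 1.0], [0, 2], 0.5, "simultaneous"))  # 0.5
```

With two issues of equal value, the first settled immediately and the second after two rounds, and $\delta = 0.5$, the player's present value is 1.25 under independent implementation but only 0.5 under simultaneous implementation, because the delay on the second issue also discounts the first.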
In [97], the assumption is made that preferences are additive over issues, implying that the multi-issue bargaining problem equals the sum of the bargaining problems over the separate issues. One of the few game-theoretic papers on integrative bargaining is [35]. Fershtman considers sequential bargaining over two issues. He states that, when Rubinstein's alternating-offers protocol is used for each issue in sequential order, each player prefers an agenda in which the first issue to be negotiated is the one that is least important to him but most important to his opponent. Notably, it is shown in [35] that the subgame-perfect equilibrium outcome of this problem need not be Pareto-efficient.

¹¹ This is relevant if the payoff is discounted over time.

Bargaining with private information

Private information such as reservation values (i.e., limits on what the players find acceptable), preferences among issues, attitudes towards risk or time preferences is often hidden from the opponent in real-life negotiations. In bargaining it might, for example, be beneficial to be dishonest about one's attitude towards risk in order to obtain a greater share of the surplus (as would be the case in Rubinstein's alternating-offers game). Sometimes, however, a mechanism (see Def. 5.1) can be designed which gives agents a compelling incentive to be honest with the opponent. Such mechanisms are called incentive compatible (see Section 1.1.2, p. 6). The Vickrey auction [136] is an example of an incentive-compatible mechanism (this auction and other incentive-compatible mechanisms are discussed in Section 2.3.5). Unfortunately, a suitable mechanism does not always exist. Moreover, such mechanisms are static and mediated (e.g., by an auctioneer) [5]. In practice, bargaining is often dynamic and involves a sequence of offers and counter offers between two or more players. It is therefore necessary to analyse dynamic or extensive-form bargaining games with incomplete information.

As mentioned in Section 2.2, game theory frequently assumes that the players have complete information. However, in order to analyse situations in which players are unsure of the opponent's type, the notion of imperfect information needs to be applied (see Section 2.2). Imperfect information enables us to address important issues such as reputation building, signalling and self-selection mechanisms [111]. For example, the fact that players are unsure of the other player's type might explain the occurrence of (inefficient) delays in reaching an agreement [89, Ch. 5]. Using such inefficient strategies may be the only way to signal, for instance, one's strength (an example is the outbreak of strikes in wage bargaining). Any utterance which is not backed up by actions can be considered cheap talk.¹² Delays may therefore be required to convey private information credibly [58].
In a wage negotiation, for example, the union is often unsure about the actual value of its workers to a firm. If this value is high, the firm will be more eager to sign an agreement. If the value is low, however, the firm can credibly signal this by bearing the costs of a strike [58]. A firm could try to bluff by enduring a strike even when its valuation is high, using this strategy to signal a lower valuation of the union workers than is actually the case. However, such a strategy can be very costly to the firm. An overview of bargaining with incomplete information is given in [5]. More introductory texts on bargaining with private information can be found in [58] and [11, Ch. 11].

¹² In non-cooperative games, nothing a player says constrains his future behaviour. If a player chooses to honour an agreement or threat that has been made, this is only because it is optimal to do so.

One-to-many bargaining

In a one-to-many bargaining setting, one player negotiates contractual agreements with two or more opponents. A typical example is a seller who has one or more items for sale, with several buyers wishing to purchase an item (or a bundle of items). Auctions are the most common mechanism (see Def. 5.1) for solving the one-to-many bargaining problem. An alternative approach, using bilateral bargaining, is discussed in Chapters 6 and 7. This section explains the most common auctions or mechanisms and discusses optimal bidding behaviour in these auctions. We focus here on sealed-bid auctions, in which buyers submit positive bids to an auctioneer, who selects the winners and the amounts they have to pay.¹³ As will become clear below, the amount that a winner pays in such auctions does not always correspond to her actual bid. The auction is called sealed because a buyer's bid is hidden from the other buyers and is only revealed to the auctioneer. Often, the role of the auctioneer is taken by the seller. Auctions for a single good are discussed first, followed by auctions for more complex cases.

We assume in the following that buyers have independent valuations. In this context, a buyer's valuation is the highest price that she¹⁴ is willing to pay, such that she is indifferent between paying this price and not obtaining the good(s) at all (i.e., both options have equal utility). A player's valuation is independent if it does not depend on information about the preferences of other players, nor on the allocation of the goods to other players.

Single unit

Perhaps the most common sealed-bid auction for selling a single item is the first-price auction. In this auction, the item is awarded to the highest bidder, who pays a price equal to her submitted bid. Game theory can be used to derive optimal strategies for the buyers in this auction.
Take, for example, the case where two buyers with different valuations compete for the good. If a buyer knows the valuation of the other buyer, it is optimal to bid slightly above the other buyer's valuation if she has the higher valuation, and to bid her own valuation otherwise. This strategy constitutes a Nash equilibrium. If the other buyer's valuation is not known but is independently drawn from a distribution, the optimal response can again be calculated (we refer the interested reader to [72, p.865] for details). Clearly, a buyer's bid then depends on her speculation about the valuations of the other bidders, and in general she will bid below her valuation.

¹³ Note that such auctions can be considered direct mechanisms (see Section 1.1.2, page 5), in which the players are asked to submit their preferences directly.
¹⁴ In the following, we use "she" to refer to a buyer and "he" to refer to a seller.

An interesting alternative is the aforementioned Vickrey or second-price auction [136]. In this auction the highest bidder wins as before, but pays the price bid by the second-highest bidder.¹⁵ In contrast to the first-price auction, the optimal strategy in this case is to bid one's true valuation for the good, irrespective of the valuations and bids of the other buyers [27, 136].¹⁶ This is in fact a dominant strategy (see Def. 18.1). The auction is therefore called incentive compatible (see Section 1.1.2, p.6): it gives the players an incentive to reveal their preferences truthfully. Intuitively, this is because the price a winning buyer pays does not depend on her own bid, so bidding lower than her valuation cannot reduce her payment; it can only make her lose an auction she would have won at a profit. Bidding higher than her valuation is not beneficial either, since it can result in paying more than the valuation. In fact, it turns out that an auction is incentive compatible if and only if it is bid-independent [44], i.e., if bidder $i$'s bid does not determine bidder $i$'s payment (but only whether she wins or not).

The Vickrey or second-price auction has several advantages over the first-price auction. First, since the second-price auction is incentive compatible, calculating the optimal strategy for the buyers is straightforward. The auction is also robust, since the buyers' choices do not depend on the behaviour of others. Another advantage is that the second-price auction is efficient; efficient auctions put goods into the hands of the buyers who value them most [27]. Efficiency is a very desirable property, as it maximises the total gains from trade (i.e., the bargaining surplus). In [27] it is shown that any incentive-compatible auction is efficient. By contrast, the first-price auction is not, in general, efficient: when buyers are uncertain about each other's valuations and therefore speculate, inefficient outcomes can occur (see [27] for an example). Below, we consider incentive-compatible (and thus efficient) auctions for the more general case of multiple units.
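The dominant-strategy property of the second-price auction can be checked by brute force on a small grid of integer valuations and bids. The Python sketch is illustrative: the function names are hypothetical, bidder 0 is the buyer under study, ties are broken in favour of the lowest bidder index (an assumption, since no tie-breaking rule is specified above), a lone bidder pays nothing, and utility is valuation minus price for the winner and zero otherwise.

```python
from itertools import product


def second_price(bids):
    """Return (winner index, price) of a sealed-bid second-price auction."""
    # max() returns the first maximal index, so ties go to the lowest index.
    winner = max(range(len(bids)), key=lambda i: bids[i])
    others = [b for i, b in enumerate(bids) if i != winner]
    price = max(others) if others else 0  # a lone bidder pays nothing
    return winner, price


def utility(valuation, own_bid, other_bids):
    """Quasi-linear utility of bidder 0 given everybody's bids."""
    winner, price = second_price([own_bid] + list(other_bids))
    return valuation - price if winner == 0 else 0


def truthful_is_dominant(max_value=6, n_others=2):
    """Check that bidding one's valuation is never worse than any other bid."""
    grid = range(max_value + 1)
    for valuation in grid:
        for others in product(grid, repeat=n_others):
            truthful = utility(valuation, valuation, others)
            if any(utility(valuation, bid, others) > truthful for bid in grid):
                return False
    return True


print(truthful_is_dominant())  # True
```

The check enumerates every combination of opponents' bids, so it confirms dominance (truthful bidding is optimal whatever the others do), not merely a best response to one particular profile.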
Multiple units

If multiple goods are traded, the Generalised Vickrey Auction (GVA) [133] can be used to allocate the goods efficiently. Like the Vickrey auction, the GVA is incentive compatible, that is, truth-telling is a dominant strategy. In this section, we apply the GVA to the case where multiple (homogeneous) units of the same good are sold (for other applications, see e.g. [133]). The GVA then works as follows. In the initial stage, each buyer $i$ reports a utility function $u_i(\vec{x})$ to the auctioneer, which may or may not be her true utility function. The vector $\vec{x}$ specifies the number of units allocated to each buyer.¹⁷ In this application, the utility function expresses the amount of money a buyer is willing to spend for a given allocation $\vec{x}$. The auctioneer then calculates the allocation of units $\vec{x}^*$ that maximises the sum of utilities, under the constraint that the number of allocated units equals the number of available units. For each buyer $i$, the auctioneer also calculates the allocation that maximises the sum of the utilities of all buyers other than $i$; this allocation is denoted by $\vec{x}^*_{-i}$. Each buyer $i$ then receives the bundle according to the allocation $\vec{x}^*$ and has to pay the following amount to the auctioneer:

$$\sum_{j \neq i} u_j(\vec{x}^*_{-i}) - \sum_{j \neq i} u_j(\vec{x}^*).$$

In words, a buyer pays for the loss that her obtaining the bundle imposes on the other buyers. The first sum does not depend on buyer $i$'s report at all, and the second depends on it only through the chosen allocation. Buyer $i$'s net utility is therefore the total utility of the chosen allocation (her own true utility plus the others' reported utilities) minus a term she cannot influence. Since the auctioneer maximises exactly this total when buyer $i$ reports truthfully, truth-telling is optimal for her, and the mechanism is incentive compatible. Below we show the application of this mechanism in two examples.

¹⁵ In case of a single bidder, this bidder gets the good for free.
¹⁶ This holds assuming independent valuations, as stated before.
¹⁷ For the case described here, we assume that buyers only care about the units they receive themselves, and not about the units received by others (which is part of the valuation independence assumption described earlier), i.e., $u_i(\vec{x}) = u_i(x_i)$; there are no so-called allocative externalities [55]. We note, however, that the GVA can also be applied in the presence of allocative externalities, see e.g. [134].

Example 1

In case of a single unit, this mechanism is equivalent to the second-price auction, as we show in the following. We assume (without loss of generality) that a buyer's utility equals zero if no units are allocated to her. If buyer $i$ is not the highest bidder (i.e., does not report the highest utility value for the good), the allocation is not affected by buyer $i$ (i.e., $\vec{x}^*_{-i} = \vec{x}^*$), and the payment is $\sum_{j \neq i} u_j(\vec{x}^*_{-i}) - \sum_{j \neq i} u_j(\vec{x}^*) = 0$. If, on the other hand, buyer $i$ is the highest bidder, then the second term, $\sum_{j \neq i} u_j(\vec{x}^*)$, equals zero, since nobody else gets anything. The first term, $\sum_{j \neq i} u_j(\vec{x}^*_{-i})$, however, equals the reported valuation (i.e., bid) of the second-highest bidder, since this would be the (reported) valuation of the winner if buyer $i$ did not participate. The payment therefore equals the bid of the second-highest bidder.

Example 2

In case of $N$ units, and if each bidder is allocated at most one unit, the GVA mechanism reduces to an $(N + 1)$-price auction, i.e., each winner pays the price of the $(N + 1)$th-highest bidder.¹⁸ To see this, consider first the case where buyer $i$ is not a winner.
As before, buyer $i$ does not affect the allocation, and therefore pays zero. In the other case, i.e., when buyer $i$ is one of the winners, $\sum_{j \neq i} u_j(\vec{x}^*)$ equals the total bids (reported valuations) of the remaining winners. Furthermore, since the unit would go to the $(N + 1)$th-highest bidder if buyer $i$ did not participate (assuming there are at least $N + 1$ participants), $\sum_{j \neq i} u_j(\vec{x}^*_{-i})$ equals the total valuation of the remaining winners of the actual allocation plus the valuation of the $(N + 1)$th-highest bidder. The payment is then exactly the valuation (or bid) of the $(N + 1)$th-highest bidder. This holds for each winner, assuming there are at least $N + 1$ bidders. Note that if there are fewer than $N + 1$ bidders, all bidders receive the good for free.

¹⁸ This auction is applied in Chapter 5.

2.4 Computational approaches to bargaining

Simplifying assumptions frequently made in game-theoretic analyses, such as perfect rationality and common knowledge, need not be made if the behaviour of boundedly rational negotiating agents is modelled directly, for instance using techniques from the field of artificial intelligence (AI). This section provides an overview of the key research related to this thesis, in which AI techniques such as evolutionary algorithms, reinforcement learning (specifically Q-learning) and Bayesian beliefs are applied to develop a negotiation environment consisting of intelligent agents. In addition, we briefly review the relatively new field of argumentation-based negotiation. Note that the evolutionary approach is the main focus of this thesis, and is therefore the most relevant; the other techniques are discussed for the interested reader.

Using these techniques, agents are able to learn from experience and adapt to changing environments. This learning aspect is essential in automated negotiation settings (where software agents, see Section 1.1.3, bargain on behalf of their owners), especially when the behaviour of competitors and the payoffs are not known in advance. Several aspects of learning are potentially important during the negotiation process. First, a bargaining agent needs a strategy which specifies his actions during the course of play. On the basis of his experiences in previous bargaining games, the agent can learn that it might be profitable to adjust his strategy in order to achieve better deals. Second, it might even be useful to update a strategy during play. This may be the case if the agent is initially unsure about the type of his opponent: after playing a bargaining game for a number of rounds, the agent may form a belief about his opponent's type and fine-tune his behaviour accordingly. Third, an agent might first need to learn the preferences of his owner. Here, attention is focussed on the first two kinds of learning. This section is organised as follows.
The main related research in which bargaining agents adapt using evolutionary algorithms (EAs) is discussed first. Next, Q-learning and an application of it to bargaining are described. Learning during the negotiation process using Bayesian beliefs is then considered. Finally, we consider an alternative approach in which negotiation is viewed as a dialogue game and the parties attempt to reach consensus through argumentation.

The evolutionary approach

Oliver [88] was the first to demonstrate that a system of adaptive agents can learn effective negotiation strategies using evolutionary algorithms. Computer simulations of both distributive (i.e., single-issue) and integrative (i.e., multiple-issue) alternating-offers negotiations are presented in [88]. The agents' strategies are represented by binary-coded strings. Two parameters are encoded for each negotiation round: a threshold which determines whether an offer should be accepted, and a counter offer in case the opponent's offer is rejected (and the deadline has not yet been reached). These elementary strategies were then updated in successive generations by a genetic algorithm (GA). Similar models are also investigated in this thesis. A related model was investigated in [126], where a systematic comparison between game-theoretic and evolutionary bargaining models is also made for negotiations over a single issue. Chapters 3 and 4 of this thesis extend similar negotiation models further by considering multiple issues and cases that are unwieldy to analyse mathematically.

More elaborate strategy representations are proposed in [73]. In this model, offers and counter offers are generated by a linear combination of simple bargaining tactics (time-dependent, resource-dependent or behaviour-dependent tactics). As in [88], the parameters of these different negotiation tactics and their relative importance weights are encoded in a string of numbers. Competitions were then held between two separate populations of agents, which were simultaneously evolved by a GA. The time-dependent tactics are further investigated in [34] using GAs, for the case where negotiating agents have different time preferences. Dworman et al. [31] studied negotiations between three players. If two players decide to form a coalition, a surplus is created which needs to be divided between them, while the third party gets nothing. Naturally, all three players want to be part of the coalition and, moreover, want to receive the largest share of the bargaining surplus. In this paper, genetic programming was used to adapt the offers and to decide whether or not to form a coalition, and the results were compared with game-theoretic predictions and human experiments.

Evolutionary algorithms have recently been used not only to generate strategies but also to design auction mechanisms (see Def. 5.1 and Section 2.3.5), notably by Cliff [24] and Phelps et al. [96]. The evolutionary approach has been applied successfully especially to double auctions, for which analytical solutions are typically intractable.
Double auctions allow many buyers and many sellers to exchange goods or services. In this type of auction, buyers submit bids (demanded quantity and price) and sellers submit asks (offered quantity and price), which are then matched by the auctioneer. The auctioneer also determines the trading price for each match. In [96], genetic programming (GP) is used to evolve both the strategies of the traders and the auction mechanism. In this first endeavour towards the automated design of auction mechanisms from scratch, GP is used to determine the rule for setting the trading price, while the matching algorithm is kept fixed. The goal is to optimise market efficiency, that is, the total profit of buyers and sellers as a fraction of the theoretical maximum, given that buyers and sellers are only concerned with maximising their individual profits. In a related approach by Cliff [24], a genetic algorithm is used to evolve both the traders and an additional parameter that selects from a continuum of auction types.
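The notion of market efficiency can be made concrete with a small sketch. It rests on several assumptions that are not taken from [96] or [24]:

- each trader buys or sells a single unit;
- the matching rule pairs the highest remaining bid with the lowest remaining ask for as long as the bid covers the ask;
- the price is set by a hypothetical rule `price = k * bid + (1 - k) * ask`, which stands in for the pricing rule that [96] evolves with GP;
- the private valuations and the shaded bids and asks are made-up numbers.

```python
from typing import List, Sequence, Tuple


def clear_market(bids: Sequence[float], asks: Sequence[float],
                 k: float = 0.5) -> List[Tuple[int, int, float]]:
    """Match the highest remaining bid with the lowest remaining ask.

    Returns (buyer, seller, price) triples. The rule
    price = k * bid + (1 - k) * ask is a hypothetical placeholder
    for an evolved pricing rule.
    """
    buyers = sorted(range(len(bids)), key=lambda b: bids[b], reverse=True)
    sellers = sorted(range(len(asks)), key=lambda s: asks[s])
    trades = []
    for b, s in zip(buyers, sellers):
        if bids[b] < asks[s]:
            break  # no further matches are possible
        trades.append((b, s, k * bids[b] + (1 - k) * asks[s]))
    return trades


def max_surplus(values: Sequence[float], costs: Sequence[float]) -> float:
    """Theoretical maximum total profit with unit demand and unit supply."""
    total = 0.0
    for v, c in zip(sorted(values, reverse=True), sorted(costs)):
        if v < c:
            break
        total += v - c
    return total


def market_efficiency(values, costs, bids, asks, k=0.5) -> float:
    """Realised total profit of buyers and sellers as a fraction of the maximum."""
    realised = 0.0
    for b, s, price in clear_market(bids, asks, k):
        # buyer profit + seller profit; the price itself cancels out
        realised += (values[b] - price) + (price - costs[s])
    return realised / max_surplus(values, costs)


values, costs = [10, 8, 6, 4], [3, 5, 7, 9]  # made-up private valuations
print(market_efficiency(values, costs, values, costs))  # truthful: 1.0
print(market_efficiency(values, costs, [9, 6, 5, 3], [4, 7, 7.5, 9.5]))  # shaded: 0.7
```

When bids and asks are truthful, the matching realises the full surplus. When they are shaded, the profitable trade between the buyer with value 8 and the seller with cost 5 is lost. The trading price does not affect efficiency directly, because it only transfers profit between buyer and seller. It does shape the traders' individual incentives, which is why evolving the pricing rule can matter.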

2.4.2 Using Q-learning

Many learning techniques require feedback each time an action is performed. In many practical cases, however, feedback is only received at the end of a (long) sequence of actions. A good example is a game like chess: only at the end of the game do the players know with certainty how well their strategy performs. In learning models like Q-learning, agents therefore also try to evaluate the effect of intermediate actions. Q-learning is a reinforcement learning algorithm [113, p. 528] which learns an action-value function yielding the expected utility (see Section 1.1.1) of a given action in a given state [113, p. 599]. This algorithm maintains a list of so-called Q-values Q(a, i), which denote the expected utility of performing action a in state i. The action which maximises the expected utility is selected, and the system moves to a new state j. The Q-value is then updated depending on the Q-value of the new state and the received reward (if any). The following equation can be used [113, p. 613] for updating the Q-value after a transition from state i to state j by taking action a:

$$Q(a,i) \leftarrow Q(a,i) + \alpha \left( R(i) + \max_{a'} Q(a',j) - Q(a,i) \right), \qquad (2.1)$$

where R(i) is the actual reward received in state i and α is the learning rate. The value $\max_{a'} Q(a',j)$ represents the expected utility of state j. For example, if the current state i has a relatively low expected utility and the next state j has a high expected utility, the Q-value Q(a,i) is updated so that the difference between these two states is reduced. In this way, rewards given at the terminal state are propagated to the other states in the sequence.

As mentioned before, the action selected in the current state depends on the estimated expected utility of each action. Hence, a trade-off needs to be made between exploitation and exploration.
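The sketch below is a minimal tabular implementation of the undiscounted update rule (2.1). Some choices are illustrative assumptions rather than part of the cited description:

- the dictionary-based Q-table;
- the `set_terminal` helper, which stores a terminal reward so that it can propagate backwards;
- ε-greedy action selection, one simple way of trading off exploitation against exploration;
- the parameter values.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with the undiscounted update of Eq. (2.1)."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.epsilon = epsilon        # exploration rate (assumed epsilon-greedy policy)
        self.q = defaultdict(float)   # Q-values keyed by (action, state), default 0
        self.rng = random.Random(seed)

    def value(self, state):
        """max_a' Q(a', state): the estimated expected utility of a state."""
        return max(self.q[(a, state)] for a in self.actions)

    def set_terminal(self, state, reward):
        """Store the reward of a terminal state so that it can propagate back."""
        for a in self.actions:
            self.q[(a, state)] = reward

    def choose(self, state):
        """Explore with probability epsilon, otherwise exploit the best-known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(a, state)])

    def update(self, action, state, reward, next_state):
        """Eq. (2.1): Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a',j) - Q(a,i))."""
        key = (action, state)
        self.q[key] += self.alpha * (reward + self.value(next_state) - self.q[key])


# Usage: a chain s0 -> s1 -> T in which only the terminal state T is rewarded.
learner = QLearner(["go"], alpha=0.5, epsilon=0.0)
learner.set_terminal("T", 1.0)
for _ in range(50):
    learner.update("go", "s1", 0.0, "T")
    learner.update("go", "s0", 0.0, "s1")
print(learner.q[("go", "s0")])  # close to 1.0: the terminal reward has propagated back
```

In the usage example, no intermediate rewards are given. The reward of the terminal state nevertheless reaches the first state after a number of repetitions, which illustrates how Q-learning assigns value to intermediate actions.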
The question is whether to choose an action that has already proven itself, or to try out new actions that might produce even better results. The problem of finding an optimal exploration policy has been studied extensively in the subfield of statistical theory that deals with so-called bandit problems [113, pp ]. The Q-learning approach was applied by Oliveira and Rocha [87] to the formation of virtual organisations in an e-commerce environment. The idea is that satisfying a user's need often requires a combination of services, provided by different companies. The agent representing the user (called the market agent) negotiates with several organisation agents, after which a selection of these organisations is made and a virtual organisation is created. The protocol used during the negotiation phase is as follows. First, each participating organisation generates a bid based on previous experience, using a Q-learning technique to determine which bid to make, and sends this bid to the market agent. The actions (i.e., the bids) are then evaluated using the feedback given by the market agent. The market agent compares the bids using a multi-criteria evaluation method based on qualitative measures (in which only the preference ordering is assumed to be important). The market agent selects the organisation which proposes a satisfactory evaluation or, when a deadline is reached, the one with the highest evaluation. Organisations that are not selected receive feedback on which attributes were not satisfactory. Negotiations take several rounds, and in each round an organisation is selected.

2.4.3 Using Bayesian beliefs

Bayesian beliefs are used to model an agent's (probabilistic) knowledge of an uncertain environment. Suppose the agent has some a priori knowledge about the likelihood of a set of hypotheses $H_i$, with $i = 1, \ldots, n$. Furthermore, the agent has some conditional knowledge about the probability that an event e will occur, given that one of the hypotheses is true. If event e then occurs, the beliefs about the hypotheses are updated using the Bayesian update rule [148]:

$$P(H_i \mid e) = \frac{P(H_i)\, P(e \mid H_i)}{\sum_{k=1}^{n} P(e \mid H_k)\, P(H_k)}, \qquad (2.2)$$

where $P(H_i \mid e)$ is the a posteriori probability of $H_i$ and $P(H_i)$ the a priori probability. $P(e \mid H_i)$ is the conditional probability that event e occurs given hypothesis $H_i$. When agents have incomplete information about one another, it becomes important to learn about the other agent by observing his behaviour during the negotiation process. Bayesian beliefs are often used to make assumptions about the opponent, such as his type [64] or his reservation price [147, 148], where the reservation price is defined here as an agent's threshold of offer acceptability. These beliefs are updated depending on the opponent's moves. However, once both agents use beliefs to determine their strategies, they also need beliefs about their opponent's beliefs, and so on. This is known as the problem of outguessing regress [148].
In game theory, this problem is solved by having a limited number of different types of players. The beliefs and preferences of each type are common knowledge, but there is uncertainty about which player is of which type. This theory, suggested by Harsanyi, is a technique for transforming a game of incomplete information into a game of imperfect (but complete) information (see also Section 2.2). In reality, however, the number of different types is usually very large. Moreover, it is not always realistic to assume that the preferences and beliefs of the different types are common knowledge. In more practical applications (such as [64] and [147]), the problem of outguessing regress is circumvented by assuming limited reasoning capabilities. In [147], for instance, a player has beliefs about, e.g., the payoff function and the reservation price of the other player, but not about the beliefs of the other player.
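As an illustration of Eq. (2.2), the sketch below lets a buyer update his beliefs about a seller's reservation price after observing a rejection. It follows the limited-reasoning setting of [147]: the buyer holds beliefs about the seller's reservation price, but not about the seller's beliefs. The three hypotheses, the uniform prior and the opponent model are made-up assumptions for illustration only. Under that model, an offer below the reservation price is rejected with probability 0.9, and any other offer with probability 0.2.

```python
from typing import Callable, Dict


def bayes_update(prior: Dict[float, float],
                 likelihood: Callable[[float], float]) -> Dict[float, float]:
    """Eq. (2.2): posterior P(H_i | e) for every hypothesis H_i.

    `likelihood(h)` returns P(e | H = h) for the observed event e.
    """
    joint = {h: p * likelihood(h) for h, p in prior.items()}
    evidence = sum(joint.values())  # sum_k P(e | H_k) P(H_k)
    if evidence == 0:
        raise ValueError("observed event has zero probability under every hypothesis")
    return {h: j / evidence for h, j in joint.items()}


def p_reject(offer: float, reservation: float) -> float:
    """Assumed opponent model: offers below the reservation price are usually rejected."""
    return 0.9 if offer < reservation else 0.2


# Hypotheses about the seller's reservation price, with a uniform prior.
prior = {50.0: 1 / 3, 70.0: 1 / 3, 90.0: 1 / 3}
# Observed event: the seller rejects a price of 60.
posterior = bayes_update(prior, lambda h: p_reject(60.0, h))
print(posterior)  # belief in a reservation price of 50 drops to about 0.1
```

After each rejected offer, the posterior can serve as the prior for the next update, so the buyer's estimate sharpens as the negotiation proceeds.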

2.4.4 Argumentation-based negotiation

An alternative approach to automated negotiation is the use of dialogues or argumentation to resolve conflicts. In recent years, this field has received increasing interest within the agent community [71, 74, 94, 99, 100]. We therefore relate some of the main concepts and highlight some of the research in this field. A more extensive overview of the state of the art in argumentation-based negotiation can be found in [99]. Argumentation can be useful when, for example, negotiations involve several issues and a mutually beneficial situation can be achieved (as described in Section 2.3.3). When agents have incomplete information about each other's preferences, negotiations often result in inefficient deals (see Section 2.3.4). This problem can be resolved using argumentation. The idea is that the agents are able to provide meta-information on why they object to a proposal. In this way, information is exchanged without either party fully disclosing its preferences. A negotiation architecture using this kind of meta-information is described in [94]. This approach was also used in MIT's Tête-à-Tête system, a bilateral integrative negotiation system for online shopping [71]. Agents within this framework can: (1) make a new proposal, (2) accept the proposal of the counter agent, (3) criticise a proposal, or (4) withdraw from the negotiations. This system uses the notion of a critique to enable agents to criticise a particular proposal. A critique is a comment by an agent specifying which part of the proposal he dislikes. When making a new proposal or critique, the agent can also send additional information. For instance, a proposal may include conditions under which it holds (e.g., "I will provide you with X if you provide me with Y"). Argumentation can also be used to influence the preferences, beliefs and/or goals of other players.
In general, preferences are assumed to be fixed. In reality, however, a player's preferences are often not completely formed, or uncertainty exists about the environment. In that case, a player's preferences and beliefs can be influenced upon receipt of new information. The negotiation process then consists not only of dividing the surplus, but also of gathering information. An interesting approach is described in [100], where one player may influence another player's preferences by discussing the underlying motivations and interests behind adopting certain (sub)goals. For example, a buyer may want to negotiate a flight ticket with a travel agent for the more fundamental goal of travelling to Paris. If the fundamental goal is known to the travel agent, she can suggest a train ticket as an alternative means of satisfying the same goal. Another way of influencing a player's behaviour is persuasion, for example by using threats, rewards or appeals [102].

2.5 Discussion

The first part of this chapter reviews, in broad lines, the literature on bargaining from the field of game theory. This overview shows that game theory is a very useful tool for analysing bargaining situations mathematically. Such a rigorous analysis is only tractable, however, if many details of human interaction, for instance emotions or irrational behaviour, are abstracted away. This may undermine the capability of game-theoretic models to explain or predict human behaviour. This aspect may be less problematic for systems in which artificial agents interact with each other, because these agents are often designed to behave rationally (to a good approximation). Game theory may therefore yield fundamental insights into the design of efficient negotiation protocols for automated trading. Furthermore, given a negotiation protocol and under certain assumptions, optimal strategies can sometimes be derived. Nevertheless, game-theoretic assumptions such as common knowledge and perfect rationality often appear to be too strong for modelling practical situations. The issue of common knowledge has been solved only partially in game theory, by introducing a theory for players with imperfect information. The development of game-theoretic models for boundedly-rational players is a relatively young research direction. Our survey shows that techniques from the field of artificial intelligence are potentially very powerful in situations with incomplete information and boundedly-rational players. Learning techniques developed within the AI community can, for instance, be used to adapt the agents' behaviour in complex environments and to construct accurate models of other agents' preferences.

Part A Fundamental aspects of bargaining systems

Chapter 3 Multi-issue bargaining by alternating offers

Automated negotiations have received increasing attention in recent years, especially from the field of electronic trading [14, 56, 65, 71, 73, 88, 128]. In the near future, an increasing use of bargaining agents in electronic marketplaces is expected. Ideally, these agents should not only bargain over the price of a product, but also take into account aspects like the delivery time, quality, payment methods, return policies, or specific product properties. In such multi-issue negotiations, the agents should be able to negotiate outcomes that are beneficial for both parties. The complexity of the bargaining problem increases rapidly, however, once the number of issues exceeds one. This explains the need for intelligent agents that are capable of negotiating successfully over multiple issues at the same time.

In this chapter,¹ we consider negotiations that are governed by a finite-stage version of the Rubinstein-Ståhl multi-round bargaining game with alternating offers (see Chapter 2 and [110, 121]). We investigate the computation of the agents' strategies by evolutionary algorithms (EAs) when negotiations involve multiple issues. We first assess the efficiency of the agreements reached by the evolutionary agents (see Section 1.1.3). We then analyse to what extent the evolutionary outcomes match game-theoretic results. We study models in which time plays no role and models in which there is pressure to reach agreements early, because negotiations risk breaking down after each round. Furthermore, we present and study a more realistic negotiation model, in which agents take into account the fairness of the obtained payoff. This use of fairness is based on the following observation. When no time pressure is present, extreme divisions of the payoff occur in the computational experiments, due to a powerful take-it-or-leave-it position for one of the negotiating agents in the last round of the negotiation. Although such extreme outcomes agree with game-theoretic results, they are usually not observed in real-life situations, where social norms such as fairness play an important role [13, 67, 107, 141]. We therefore introduce a fairness norm and incorporate it in the agents' behaviour. We perform computational experiments with various fairness settings, and show that, depending on the actual settings, fair deals indeed evolve.

¹ The results in this chapter have been published as [42]: E.H. Gerding, D.D.B. van Bragt, and J.A. La Poutré. Multi-issue negotiation processes by evolutionary simulation: Validation and social extensions. Computational Economics, 22:39-63.

A number of related papers demonstrate that, using an EA, artificial agents can learn effective negotiation strategies [34, 73, 88, 126] (see also Section 2.4.1). In [126], a systematic comparison between game-theoretic and evolutionary bargaining models is made for negotiations over a single issue. In [34], single-issue negotiations are also studied using a genetic algorithm, where agents can select between a number of pre-specified strategies. The multi-issue problem is considered in [73, 88]. The main contribution of this chapter lies in two parts. The first is the validation of the evolutionary model for multi-issue negotiations with possible breakdown, using the game-theoretic subgame-perfect equilibrium (see Def. 18.3). The second is the introduction of a fairness norm in such negotiations. The latter in particular is a first attempt to study complex bargaining situations which are more likely to occur in practical settings. A rigorous game-theoretic analysis is typically much more involved, or may even be intractable, under these conditions. The chapter is organised as follows. The alternating-offers negotiation protocol for multiple issues is described in Section 3.1. Section 3.2 outlines the evolutionary simulation environment and how the strategies of the agents are represented. A comparison of the computational results with game-theoretic results is presented in Section 3.3.
The extension with fairness is the topic of Section 3.4. Section 3.5 summarises the main results and concludes.

3.1 Description of the bargaining game

We consider negotiations that are governed by a finite-stage version of the Rubinstein-Ståhl multi-round bargaining game with alternating offers (see Chapter 2 for details). During the negotiation process, the agents exchange offers and counter offers in an alternating fashion at discrete time steps (rounds). In the following, the agent starting the negotiations is called agent 1, whereas his opponent is called agent 2. Bargaining takes place over m issues simultaneously. We assume that mutual gains are possible for each issue by reaching an agreement, i.e., that a positive bargaining surplus is available for each issue (see also Section 1.1.2). We further assume (without loss of generality) that the total bargaining surplus available per issue is equal to unity. We express an offer as a vector $\vec{o}$, where the i-th component $o^i$ specifies the share of the bargaining surplus for issue i that agent 1 receives if the offer is accepted. Agent 2 then receives $1 - o^i$ for issue i. The index i ranges from 1 to m. Note that an offer always specifies the share obtained by agent 1.

The agents evaluate multi-issue offers using an additive multi-attribute utility function (see Def. 3.1 and [73, 88, 101]). Agent 1's utility function is $\vec{w}_1 \cdot \vec{o}_j(r) = \sum_{i=1}^{m} w_1^i\, o_j^i(r)$, where j = 1 if the offer is proposed by agent 1 and j = 2 otherwise. Agent 2's utility function is $\vec{w}_2 \cdot [\vec{1} - \vec{o}_j(r)]$. Here, $\vec{w}_j$ is a vector containing agent j's weights $w_j^i$ for each issue i. The weights are normalised and larger than zero, i.e., $\sum_{i=1}^{m} w_j^i = 1$ and $w_j^i > 0$. Because we assume that $0 \le o_j^i(r) \le 1$ for all i, the utilities are real numbers in [0, 1].

As stated above, agent 1 makes the initial offer. If agent 2 accepts this offer, an agreement is reached and the negotiations stop. Otherwise, play continues to the next round with a certain continuation probability p ($0 \le p \le 1$). When a negotiation is broken off prematurely, both agents receive a utility of zero. If negotiations proceed to the next round, agent 2 proposes a counter offer, which agent 1 can then either accept or refuse. This process of alternating bidding continues for a limited number of n rounds. When this deadline is reached without an agreement, the negotiations end in a disagreement, and both players receive nothing.

3.2 The evolutionary system

We use an EA to evolve the negotiation strategies of the agents. Implementation details of the EA are discussed in Section 1.2.3. Each strategy in the EA is associated with either an agent of type 1 (i.e., initiating the negotiation) or an agent of type 2. The strategies of competing agents evolve in separate populations²: the strategies of agents of type 1 evolve in population 1, and those of type 2 in population 2. In this way, the EA populations co-evolve, since the performance of a strategy depends on the strategies in the opponent's population.
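The protocol and utility functions of Section 3.1 can be sketched in a few lines. The sketch makes two assumptions and uses illustrative names throughout:

- Responses are threshold-based, as used by the evolutionary agents and described for the strategy representation below. An offer is accepted when the responder's utility reaches his threshold for that round.
- The risk of breakdown is evaluated analytically: an agreement in round r is weighted by the probability $p^{r-1}$ that negotiations survive until that round. The alternative, sampling breakdowns at random, is not used here.

```python
from typing import List, Sequence, Tuple, Union

Offer = Sequence[float]     # agent 1's share of each issue, each in [0, 1]
Gene = Union[Offer, float]  # an offer (when proposing) or a threshold (when responding)


def utility(weights: Sequence[float], shares: Sequence[float]) -> float:
    """Additive multi-attribute utility: sum_i w^i * share^i."""
    return sum(w * s for w, s in zip(weights, shares))


def negotiate(strat1: List[Gene], strat2: List[Gene],
              w1: Sequence[float], w2: Sequence[float],
              p: float) -> Tuple[float, float]:
    """Expected utilities (agent 1, agent 2) of an n-round alternating-offers game.

    Agent 1 proposes in odd rounds and agent 2 in even rounds; strat_j[r - 1]
    is agent j's offer or threshold for round r.
    """
    n = len(strat1)
    survive = 1.0  # probability that round r is reached
    for r in range(1, n + 1):
        if r % 2 == 1:  # agent 1 proposes, agent 2 responds
            offer, threshold = strat1[r - 1], strat2[r - 1]
            responder_value = utility(w2, [1.0 - o for o in offer])
        else:  # agent 2 proposes, agent 1 responds
            offer, threshold = strat2[r - 1], strat1[r - 1]
            responder_value = utility(w1, offer)
        if responder_value >= threshold:  # accept unless below the threshold
            u1 = utility(w1, offer)
            u2 = utility(w2, [1.0 - o for o in offer])
            return survive * u1, survive * u2
        survive *= p  # rejected: continue with probability p
    return 0.0, 0.0  # deadline reached: disagreement


w1, w2 = (0.7, 0.3), (0.3, 0.7)  # default weights of Table 3.1
# Agent 1 claims all of issue 1 and leaves issue 2; agent 2 accepts anything worth >= 0.6.
print(negotiate([[1.0, 0.0], 0.4], [0.6, [0.5, 0.5]], w1, w2, p=0.7))  # -> (0.7, 0.7)
```

The agreement in the usage example is the symmetric point S, where each agent receives the whole of the issue he values most. If agent 2's first-round threshold is raised above 0.7, the deal is only reached in round 2 and both payoffs are discounted by p.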
Figure 3.1 gives an overview of the evolutionary system, with separate populations for the strategies of the two agent types. The fitness of the parents is determined by negotiations between the agents in the two parental populations (as shown in Fig. 3.1). Each agent negotiates with all agents in the opponent's population. The utility functions are the same for all agents within a population (i.e., their weight settings are equal). An agent's fitness value is the average utility it obtains over all its negotiations. The fitness of the new offspring is evaluated by negotiation with the parental agents.³ A social or economic interpretation of this parent-offspring interaction is that new agents can only be evaluated by competing against existing or proven strategies.

² It is also possible to use a single population with strategies for both agent types on a single chromosome. The outcomes, however, are then affected by so-called hitchhiking [75], where relatively poor genes are selected because other genes on the chromosome yield a good performance.

³ In an alternative model, not only the parental agents are used as opponents, but also the newly-formed offspring. Similar dynamics have been observed in this alternative model.

Figure 3.1: Iteration loop of the evolutionary algorithm where strategies for competing agents evolve in separate populations.

Table 3.1: Default settings of the evolutionary system.

EA parameters:
Parental population size (µ): 25
Offspring population size (λ): 25
Selection scheme: (µ + λ)-ES
Mutation model: self-adaptive
Initial standard deviations (σ_i(0)): 0.1
Minimum standard deviation (ε_σ):

Negotiation parameters:
Number of issues (m): 2
Number of rounds (n): 10
Weights of agents in population 1 (w_1): (0.7, 0.3)^T
Weights of agents in population 2 (w_2): (0.3, 0.7)^T

3.2.1 Representation of the strategies

An agent's strategy specifies the offers and counter offers proposed during the process of negotiation. In a game-theoretic context, a strategy is a plan which specifies an action for each history [11]. In our model, the strategy of agent $j \in \{1, 2\}$ specifies the offers $\vec{o}_j(r)$ and thresholds $t_j(r)$ for each round r of the negotiation process. The threshold determines whether an offer of the other party is accepted or rejected: if the value of the offer (see below) falls below the threshold, the offer is refused; otherwise an agreement is reached.⁴ This strategy representation is depicted in Fig. 3.2.

Agent 1: o_1(1), t_1(2), o_1(3), t_1(4), ...
Agent 2: t_2(1), o_2(2), t_2(3), o_2(4), ...

Figure 3.2: The strategies for agent $j \in \{1, 2\}$ specify a sequence of offers $\vec{o}_j(r)$ and thresholds $t_j(r)$ for rounds $r \in \{1, 2, \ldots, n\}$ of the negotiation.

Notice that in each round, the strategy of an agent specifies either an offer or a threshold, depending on whether the agent proposes or receives an offer in that round. In odd rounds, agent 1 makes an offer and agent 2 either accepts or rejects it, and vice versa in even rounds. The strategy, consisting of offers and thresholds, is encoded on the chromosome using real values in the unit interval (one offer or threshold for each negotiation round). We use $x_i$ to denote the (real) value at location i of the chromosome. The agents' strategies are initialised at the beginning of each EA run by drawing random numbers from a uniform distribution on the unit interval.

3.3 Validation and interpretation of the evolutionary experiments

Experimental results obtained with the evolutionary system are presented in this section. All relevant settings of the evolutionary system are listed in Table 3.1 (further explanation is provided in Section 1.2.3). A comparison with game-theoretic results is made to validate the evolutionary approach. Section 3.3.1 addresses the evolution of efficient negotiation results. Section 3.3.2 analyses the results further and compares them with predictions from game theory. In the following, we refer to the agents in the evolutionary system as evolutionary agents (see Section 1.1.3).

3.3.1 Efficiency

First, we investigate the experimental results with respect to disagreements. Without breakdown (p = 1), disagreements can only occur when the deadline is reached.
The experiments show that the percentage of disagreements is then very small (around 0.1% after 1000 generations if n = 10). With a risk of breakdown of 30% (p = 0.7), this percentage is between 1% and 10%. Timing is now important for efficiency. The evolutionary agents avoid disagreements by reaching agreements early: after 1000 generations, approximately 75% of the agreements are reached in the first round.

⁴ A similar approach was used in [88, 126].

Figure 3.3: Agreements reached by the evolutionary agents at (a) the start of a typical EA run and (b) after 100 generations. The negotiation settings are p = 0.7 and n = 10. Each agreement is indicated by a point in these two-dimensional spaces. The Pareto-efficient frontier is indicated with a solid line. In point S [at (0.7, 0.7)] both agents obtain the maximum share for their most important issue, and receive nothing for the other issue.

Next, we study the efficiency of the agreements reached in the experiments. The agreements are depicted in Fig. 3.3, which shows the utilities of both agents for the deals reached. Also depicted in Fig. 3.3 is the so-called Pareto-efficient frontier. An agreement is located on the Pareto-efficient frontier when an increase in utility for one agent necessarily results in a decrease in utility for the other agent. Agreements can therefore never be located above the Pareto-efficient frontier. A special point is the symmetric point S [at (0.7, 0.7)], where both agents obtain the maximum share of the issue they value most, and receive nothing of the less important issue. Figure 3.3 shows that, initially, many agreements are located far from the Pareto-efficient frontier. After 100 generations, however, the agreements are chiefly Pareto-efficient. We note that, even in the long run, the agents keep exploring the search space, resulting in a continually moving cloud of agreements along the frontier.

Conclusion. The results in this section thus show that the evolutionary agents reach efficient agreements, viz. on the Pareto-efficient frontier, and that disagreements are avoided. The next section studies the actual outcomes more closely, using results from game theory as a benchmark.

3.3.2 Further analysis

The computational results are analysed in more detail in this section and compared with game-theoretic results, in particular the subgame-perfect equilibrium (SPE) predictions (see Def. 18.3). Rubinstein and (much earlier) Ståhl applied this notion to the alternating-offers bargaining game [110, 121]. Our experimental setup differs in two respects from their model, however.
First, the agents bargain over multiple issues instead of a single issue. Second, the evolutionary agents are myopic: they do not apply any explicit rationality principles in the negotiation process, nor do they maintain any history. They only experience the payoff of their interactions with other agents. The SPE behaviour of rational agents with complete information nevertheless serves as a useful theoretical benchmark. The equations for deriving the SPE outcomes for multiple issues are presented in Appendix 1. We distinguish between three classes of experiments with respect to the breakdown probability: (1) no risk of breakdown (p = 1), (2) a low breakdown probability ($0.8 \le p < 1.0$), and (3) a high breakdown probability (p < 0.8). For each of these classes, we consider the role of n in the outcomes.

We found that in our experiments with p = 1, almost all agreements are delayed until the last round in the long run (about 80% after 1000 generations). Furthermore, the last offering agent makes a take-it-or-leave-it deal and demands almost the entire surplus (on each issue), which is accepted by the opponent. This extreme division of the surplus agrees with game theory (see Appendix 1.1): it is rational for the responder to accept any positive amount in the last round. Note, however, that rational agents are indifferent about the actual round in which the agreement is reached. The deadline-approaching behaviour in our experiments, however, corresponds better to real-world behaviour [108].

Figure 3.4a compares the EA results and SPE outcomes for different values of n (the game length). To guide the eye, the SPE outcomes for successive values of n are connected. Notice that the fitness of agents in population 1 converges to unity if n is odd, and to zero if n is even (the opposite holds for the agents in population 2). Figure 3.4b shows the results for p = …. Note that the partitioning becomes less extreme with a low breakdown probability than with no breakdown. This holds for both the SPE outcomes and the EA results, although the effect is much stronger in the evolutionary system. These differences from SPE are due to the myopic properties of the agents in the EA. The evolutionary agents do not reason backwards from the deadline (as in SPE), since most agreements are reached in the first few rounds (if p < 1). As a result, the deadline is not perceived accurately by the evolving agents; in fact, the game length is strongly overestimated. Furthermore, in SPE all agreements are reached without delay (see [126]). The EA, on the other hand, continues to explore other strategies, which results in a small number of remaining disagreements (see Section 3.3.1). As p becomes smaller, the influence of the game length on the SPE outcome also decreases (see [126]). Instead, the first-mover advantage becomes more important.
Therefore, if p becomes sufficiently small (e.g., p < 0.8), the computational results show a much better match with SPE outcomes than if p is large. The match is then almost perfect, although a small number of disagreements occur due to the continuing exploration of new strategies. This is clearly visible in Figure 3.5, which shows long-term results for n = 5 and different breakdown settings p. Interestingly, in the limit $n \to \infty$, game theory predicts that the agents in population 1 have a fitness of 0.71 when p = 0.95, whereas the agents in population 2 have a fitness of …. This corresponds to a point in the vicinity of the symmetric point S, indicated in Fig. 3.3. The results reported in Fig. 3.4b show that, for $n \ge 5$, the behaviour of the agents corresponds much better to an infinite-horizon model than to the finite-horizon model. The same behaviour was observed for other EA settings (e.g., a larger population size) and other negotiation situations (e.g., other weight settings).

Figure 3.4: Comparison of the long-term evolutionary results with SPE results for (a) p = 1 (time indifference) and (b) p = …. The error bars indicate the standard deviations across 25 runs.

Figure 3.5: Average long-term results (mean fitness over 25 runs versus continuation probability p × 100) using 2 issues for different values of p, where n = 5.

We also studied the performance of the EA when the number of issues m is increased to 8.⁵ We observe that, for p = 1, the long-term outcomes of the EA are unstable and do not converge to the extreme partitioning. When we increase the population size of the EA from 25 to 100 agents,⁶ the extreme partitioning reappears. Results are shown in Figure 3.6. Thus, for more complicated bargaining problems, the EA parameters must be adjusted. For m = 8 and p < 1, observations similar to those reported above (cf. Fig. 3.4) are made when using the adjusted population size.

⁵ The 8-dimensional weight vector for agents in population 1 is set to $\frac{1}{3.9}(0.7, 0.3, 0.5, 0.2, 0.3, 0.4, 0.5, 1.0)^T$ and for agents in population 2 to $\frac{1}{3.9}(0.3, 0.7, 0.5, 1.0, 0.5, 0.5, 0.2, 0.2)^T$. These settings contain both competitive issues (e.g., issue 3) and issues where compromises can be made (e.g., issue 8).

⁶ To avoid a (quadratic) increase in the number of fitness evaluations, each agent negotiates with 25 (random) opponents.

Conclusion. Game-theoretic (SPE) results appear to be a very useful benchmark for investigating the results of the evolutionary simulations. In computational simulations without a risk of breakdown (case 1), agreements are predominantly reached in the final round. This deadline effect is consistent with human behaviour [108]. Furthermore, the last agent in turn successfully exploits his advantage and claims a take-it-or-leave-it deal (as in SPE). In case of a small risk of breakdown (case 2), the deadline is not accurately perceived by the evolving agents, and the last-mover advantage is smaller than predicted by game theory. In fact, if the finite game becomes long enough, the results match the SPE outcomes for the infinite-horizon game. With a high risk of breakdown (case 3), this deviation from SPE becomes negligible. Finally, it appears to be important to adjust the EA parameter settings (e.g., by increasing population sizes) for more complex bargaining problems.
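Appendix 1 gives the equations for the SPE outcomes with multiple issues. The sketch below is an independent illustration of how such outcomes can be computed by backward induction, not a transcription of that appendix. It assumes the setting of this chapter:

- utilities are additive;
- negotiations break down with probability 1 − p after each rejection;
- both agents receive zero at the deadline.

In each round, the proposer offers the responder exactly p times the utility the responder would obtain as proposer in the next round. The proposer concedes issues in decreasing order of the ratio of the responder's weight to his own. With linear utilities this is the cheapest way to meet the responder's demand, and it yields a Pareto-efficient offer. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def concede(w_prop: Sequence[float], w_resp: Sequence[float],
            target: float) -> List[float]:
    """Proposer's shares that give the responder utility `target` at minimal cost.

    Issues are conceded in decreasing order of w_resp / w_prop (fractional knapsack).
    """
    shares = [1.0] * len(w_prop)
    remaining = target
    order = sorted(range(len(w_prop)), key=lambda i: w_resp[i] / w_prop[i], reverse=True)
    for i in order:
        if remaining <= 0.0:
            break
        give = min(1.0, remaining / w_resp[i])  # fraction of issue i handed over
        shares[i] -= give
        remaining -= give * w_resp[i]
    return shares


def spe_utilities(w1: Sequence[float], w2: Sequence[float],
                  n: int, p: float) -> Tuple[float, float]:
    """SPE utilities (agent 1, agent 2) of the n-round game with continuation probability p."""
    u1, u2 = 0.0, 0.0  # payoffs after the deadline: disagreement
    for r in range(n, 0, -1):  # backward induction from the last round
        if r % 2 == 1:  # agent 1 proposes; agent 2 can secure p * u2 by rejecting
            mine = concede(w1, w2, p * u2)
            u1, u2 = dot(w1, mine), dot(w2, [1.0 - s for s in mine])
        else:  # agent 2 proposes; agent 1 can secure p * u1 by rejecting
            mine = concede(w2, w1, p * u1)
            u1, u2 = dot(w1, [1.0 - s for s in mine]), dot(w2, mine)
    return u1, u2


w1, w2 = (0.7, 0.3), (0.3, 0.7)
print(spe_utilities(w1, w2, n=3, p=1.0))     # odd n, no breakdown: agent 1 takes everything
print(spe_utilities(w1, w2, n=101, p=0.95))  # long game: agent 1's utility is close to 0.71
```

With p = 1, the sketch reproduces the extreme partitioning that depends on whether n is odd or even. For a long game with p = 0.95, agent 1's utility approaches the infinite-horizon value of about 0.71 quoted above. The call `concede(w1, w2, 0.7)` returns the offer (1, 0), which corresponds to the symmetric point S.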

Figure 3.6: Average long-term results using 8 issues for different values of p, where n = 5 (mean fitness over 25 runs versus continuation probability p×100, together with SPE data). These results are obtained using a population size of 100.

3.4 Social extension: fairness

In this section we extend the agent model within our evolutionary system to study the influence of fairness, an important aspect of real-life bargaining situations. We first give the motivation and description of the fairness model. In the first variant studied, the evolving agents take the fairness of a proposed deal into account only when the deadline is reached. We then present results obtained when agents perform a fairness check in each round, and finally analyse the model further for a simple case.

Motivation and description: the fairness model

Game-theoretic models for rational agents often predict very asymmetric outcomes for the two parties. We showed in Section 3.3 (see Fig. 3.4a) that such unfair behaviour can also emerge in a system of evolving agents, in particular when p = 1 or n is small. However, large discrepancies between human behaviour in laboratory experiments and game-theoretic outcomes have been found, both for ultimatum (single-round) and multi-stage (several-round) games [13, 25, 67, 107, 109, 141]. A possible explanation for these discrepancies between theory and practice is the strong influence of social or cultural norms on the individual decision-making process. In [107, p. 264] and [50], for example, it is argued that responders tend to reject unfair or insultingly low proposals. Therefore, an anticipating agent should lower his demand in order to avoid a disagreement, thereby taking into account the expectations about his opponent's behaviour. In [67] a model is proposed in line with this hypothesis, in which the probability that an offer is accepted increases with the amount offered to the responder. Such a model, making more realistic assumptions about the agents' behaviour, appears to organise the data from experiments with humans better than the SPE model [67].

Following [67], we introduce a fairness model in our evolutionary system. The agent model is extended as follows. If the value of an offer exceeds the responder's threshold, the agent has the opportunity to re-evaluate his decision. The probability that he finally accepts the agreement is then a function of the acquired utility. This so-called fairness function is assumed to be piece-wise linear (with up to three segments).⁷ The instances that we use are shown in Fig. 3.7.⁸

Figure 3.7: Fairness functions used by the agents in the EA (probability of acceptance versus utility of the responder; curve labels include "no fairness check", "greedy behaviour" and "average fairness").

We now further distinguish between two extended agent models. In the first model, the fairness function is used at the deadline only; in the second model, it is applied at any moment. The first case is motivated by the deadline effect observed in the experiments without a risk of breakdown (see Section 3.3.2), where most agreements are reached in the last round. The second case, however, is more likely to be an appropriate model of human behaviour.

Fairness check at the deadline

In this section, fairness is applied in the last round only. We study the case in which p = 1 and n = 3.
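Before turning to the results, the responder's decision rule in this fairness model can be sketched in Python. This is a minimal sketch under stated assumptions: the breakpoints of the example fairness function are hypothetical (the actual functions 0-5 are only given graphically in Fig. 3.7), and the names make_fairness_function and responder_accepts are illustrative, not taken from the thesis implementation. The flag apply_check distinguishes the two models: it is set only in the last round for the deadline model, and in every round for the second model.

```python
import random
from bisect import bisect_right


def make_fairness_function(breakpoints):
    """Piece-wise linear acceptance probability f(u) through (utility, probability)
    breakpoints with strictly increasing utilities; constant outside their range."""
    us = [u for u, _ in breakpoints]
    ps = [p for _, p in breakpoints]

    def f(u):
        if u <= us[0]:
            return ps[0]
        if u >= us[-1]:
            return ps[-1]
        k = bisect_right(us, u) - 1              # segment with us[k] <= u < us[k + 1]
        t = (u - us[k]) / (us[k + 1] - us[k])    # position within that segment
        return ps[k] + t * (ps[k + 1] - ps[k])

    return f


def responder_accepts(utility, threshold, fairness, apply_check, rng=random):
    """Threshold test first; if it is passed and the fairness check applies,
    accept only with probability fairness(utility)."""
    if utility < threshold:
        return False
    if not apply_check:
        return True
    return rng.random() < fairness(utility)


# Hypothetical fairness function (not one of the thesis's functions 0-5):
# acceptance probability rises from 0 at utility 0 to 1 at utility 0.5.
average = make_fairness_function([(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)])

rng = random.Random(42)
n_rounds = 3
for round_no in range(1, n_rounds + 1):
    check = round_no == n_rounds            # deadline model; use True for every round
    print(round_no, responder_accepts(0.3, 0.2, average, check, rng))
```

With this example function, a responder offered utility 0.25 that passes his threshold accepts with probability 0.5 whenever the check applies.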
Figure 3.8 shows that if the evolving agents in population 2 use fairness function 1 (i.e., a weak fairness model), the partitioning is much less extreme than in the case without a fairness check (function 0). However, the agents in population 1 still reach a relatively high fitness (utility) level. Fair agreements evolve, on the other hand, when the agents in population 2 use function 2 (a case with average fairness). In this case the mean long-term fitness is approximately 0.7 for all agents (most agreements are thus located close to the symmetric point S in Fig. 3.3). When stronger fairness functions (e.g., functions 3 through 5) are used, the roles reverse, and the agents in population 2 reach a higher fitness level than their opponents in population 1 (see Fig. 3.8). Because of the strong fairness check, many last-round agreements are rejected in this case, so the deadline is effectively reached one round earlier and the agents in population 2 can demand a larger share of the surplus in the round before last. This effect is indeed observed in our experiments.

Figure 3.8: Mean fitness when fairness functions 0-5 are applied at the deadline.

⁷ Piece-wise linear functions nicely fit the experimental data reported in [67, 109].
⁸ We want to remark here that, although the fairness function is the same for all agents, the actual fairness function can depend on cultural norms in the real world [67].

Conclusion. Our results show that fair outcomes can evolve in an evolutionary system with a fairness model in the last round. However, the outcomes are rather sensitive to the fairness function used by the evolving agents: an average fairness function yields symmetric results, whereas more extreme fairness functions yield more asymmetric outcomes.

Fairness check in each round

This section studies the second fairness model, in which the responding agent re-evaluates all potential agreements. The EA settings are the same as in the previous section. The results in Fig. 3.9 for fairness function 1 are similar to the previous case (see Fig. 3.8). However, when fairness functions 2 through 5 are used, the agents in both populations reach almost identical fitness levels, and most agreements now occur in the vicinity of point S in Fig. 3.3. Note that the agents have no explicit knowledge about the location of this point, and that this knowledge is not incorporated in the fairness functions either. We also observe that agreements are now reached in different rounds, whereas in earlier experiments without fairness most agreements occur at the very end of the game. Fig. 3.9 thus shows that the agents' long-term behaviour is much less sensitive to the shape of the fairness function: the various stronger fairness functions all yield similar results. Figure 3.9 also indicates, however, that the mean fitness of both agents decreases when fairness function 5 is used. This is due to the larger number of disagreements caused by the strong fairness check.

Figure 3.9: Mean fitness when fairness functions 0-5 are applied each round.

We furthermore studied a 2-issue negotiation problem with an asymmetric Pareto-efficient frontier, as shown in Fig. 3.10. In this case, agent 1 considers both issues equally important, whereas agent 2 values the issues differently (his weights are 0.2 and 0.8 for issues 1 and 2 respectively). If agent 1 obtains the whole surplus of issue 1 and agent 2 the whole surplus of issue 2 (his most important issue), agent 1 obtains 0.5, whereas agent 2 gets 0.8. This outcome corresponds to the Nash bargaining solution (NBS), see Section …. The symmetric point (S), on the other hand, is located at (8/13, 8/13) ≈ (0.62, 0.62), the point on the frontier where agent 1 receives all of issue 1 and 3/13 of issue 2.⁹ Both solutions can be considered fair outcomes in different ways: the first solution maximises the product of the agents' utilities and also splits the surplus equally (each agent receives one whole issue), whereas in the second case equal utility levels are obtained for both agents (see [101, Ch. 16] for a related discussion).

⁹ This outcome corresponds to the Kalai-Smorodinsky solution, see Section ….

Figure 3.10: Resulting agreements in a single generation when the Pareto-efficient frontier is asymmetric and fairness function 4 is used.

In the computational results, we observe that, when fairness functions 2-5 are applied, the agreements are divided and are usually concentrated in two separate clusters ("clouds"), see Fig. 3.10. The choice of, and distribution over, multiple fair agreement points seems an important issue for further research, both in a computational setting and in experimental economics. We also experimented with different weight vectors and with m > 2. A general finding is that extreme outcomes do not occur in the evolutionary process if the agents apply a fairness check.

Conclusion. We have shown that fair agreements can evolve if fairness is evaluated in each round, even with strong fairness norms: the fairness of the deals is much more stable with respect to the choice of the fairness function. Of course, the number of agreements drops if a very strong fairness function is used, resulting in a lower fitness for both parties. In case of two-issue negotiations with a symmetric Pareto-efficient frontier, most agreements are reached in the vicinity of the symmetric point. In the asymmetric case, fair solutions can also be obtained; the agreements are then distributed over several possible outcomes, which can all be considered fair in different ways.

In the following, we first derive the game-theoretic subgame-perfect equilibrium for a relatively simple game (with only a single issue and using fairness function 4), and then compare the results with evolutionary outcomes for this game.
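Before that derivation, the two fair points of the asymmetric two-issue example above can be checked numerically. The sketch below is an illustrative brute-force grid search, not the method used in the thesis. It assumes agent 1 has weights (0.5, 0.5) and agent 2 has weights (0.2, 0.8), a disagreement point of (0, 0) and an ideal point of (1, 1). It should return approximately (0.5, 0.8) for the NBS and approximately (8/13, 8/13) ≈ (0.615, 0.615) for the Kalai-Smorodinsky point S.

```python
from itertools import product

W1 = (0.5, 0.5)   # agent 1: both issues equally important
W2 = (0.2, 0.8)   # agent 2: issue 2 matters most


def utilities(x):
    """x[i] is agent 1's share of issue i; agent 2 receives the remainder."""
    u1 = sum(w * xi for w, xi in zip(W1, x))
    u2 = sum(w * (1 - xi) for w, xi in zip(W2, x))
    return u1, u2


def grid(steps=200):
    return [i / steps for i in range(steps + 1)]


points = [utilities(x) for x in product(grid(), repeat=2)]

# Nash bargaining solution: maximise the product of utilities (disagreement point (0, 0)).
nbs = max(points, key=lambda u: u[0] * u[1])

# Kalai-Smorodinsky with ideal point (1, 1): the efficient point with equal utilities,
# found here by maximising the smaller of the two utilities.
ks = max(points, key=lambda u: min(u))

print("NBS ~", tuple(round(v, 3) for v in nbs))
print("KS  ~", tuple(round(v, 3) for v in ks))
```

Because the grid step is 0.005, the KS point is only found up to grid resolution. An exact value follows from solving 0.5 + 0.5·x₂ = 0.8·(1 − x₂), which gives x₂ = 3/13.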

Validation and strategy analysis

Although our incorporation of fairness aspects makes a game-theoretic analysis much more complicated, SPE strategies can again be derived for a very simple version: the game with only a single issue (m = 1) and fairness function 4. These settings were chosen for mathematical tractability. The general equations are presented in Appendix 2.1. A derivation for m = 1, n = 3, p = 1, and fairness function 4 is given in Appendix 2.2.

Table 3.2 shows both the SPE results and the payoffs obtained by the evolving agents (in the long run) in the EA with m = 1, n = 3, p = 1, and the (rather strong) fairness function 4. Note that since m = 1, an agent's payoff equals the share obtained for issue 1. Results for the EA are obtained after 300 generations (averaged over 25 runs). The SPE payoffs are in good agreement with the outcome of the evolutionary experiments. In SPE, however, agent 1's payoff is slightly larger than agent 2's payoff, whereas in the EA this is reversed, although Table 3.2 shows that the differences between theory and experiment are very small. We further analyse the evolving strategies below.

Table 3.2: Comparison of the agents' payoffs in the EA with SPE results (columns: payoff agent 1, payoff agent 2; rows: SPE, EA; EA standard deviations ±0.022 and ±0.014).

Table 3.3 compares the offers of the evolving agents (for each round) with SPE results, showing a good match. From Table 3.3, it can be derived that agreements are reached in all rounds, with some emphasis on the first round.¹⁰ Table 3.3 also shows the acceptance thresholds (the thresholds are calculated from the payoff that an agent expects to receive if he rejects the current offer, see Appendix 2). Because the thresholds in rounds 2 and 3 are much lower than the obtained utility, they are not binding in SPE. This explains the large variance of the thresholds in the EA in these rounds, and why they can deviate from SPE predictions there. In round 1, by contrast, the threshold is important in SPE and influences the offer made. The experiments show a much lower average threshold value than the SPE (see Table 3.3). Nevertheless, because of the high variance of the threshold values, the thresholds still influence the offers made in the EA.

Table 3.3: Comparison of the evolved strategies with game-theoretic (SPE) results for each round (columns: round, offer (SPE), offer (EA), threshold (SPE), threshold (EA)).

¹⁰ Acceptance rates are approximately 39%, 22% and 20% in SPE in rounds 1-3, and 36 ± 4%, 25 ± 3% and 20 ± 2% for the EA in rounds 1-3.

We analyse this more closely. Figure 3.11 shows the evolution of the threshold value for the first round in a single experiment, together with the variance in the population. Notice that this variance, as well as the volatility of the mean threshold, is rather high. This forces the offers in population 1 to be similar to those in SPE.

Figure 3.11: Average threshold values of the agent strategies in the EA in the first round (mean threshold of population 2 in round 1 versus generation).

In order to obtain an even better match with SPE results, we reduced the occurrence of frequent peaks by using a decreasing mutation step size in the EA (instead of self-adaptive mutation step sizes, see Section 1.2.3). With this approach, the mutation step sizes $\sigma_i$ are gradually decreased in the course of evolution.¹¹ At the beginning of each EA run, $\sigma_i$ is set to 0.1 for all $i$ (as before, see Table 3.1) and then decreases exponentially until $\sigma_i = 0.01$ after 1000 generations. This procedure indeed reduces the fluctuations in the threshold values and the offers in the long run. Results for experiments with this EA setting appear to be in excellent agreement with SPE results, see Table 3.4. We found no significant effect of the new mutation scheme on the evolutionary outcomes for m = 2, however. We suspect that this is due to the integrative nature of that negotiation problem, where the results obtained are already beneficial for both parties.

¹¹ A similar approach was applied in [3] for a genetic algorithm.

Table 3.4: Comparison of the evolutionary agents' payoffs after 1000 generations (using exponentially decreasing mutation step sizes) with SPE results (columns: payoff agent 1, payoff agent 2; rows: SPE, EA with decreasing $\sigma_i$).

Conclusion. This relatively simple bargaining situation shows a good match between theoretical (SPE) and experimental results. Furthermore, when fairness norms are applied, the outcome of the negotiation comes to depend on the round in which an agreement is finally reached, while thresholds play an important role in some of the rounds. We also showed that the EA parameters can be fine-tuned for a more stable situation if needed, which yielded an excellent match with the SPE for m = 1.

3.5 Concluding remarks

We have investigated a system for negotiations in which agents learn effective negotiation strategies using evolutionary algorithms (EAs). Negotiations are governed by a finite-horizon version of the alternating-offers game, and several issues are negotiated simultaneously. Both negotiations with and without a risk of breakdown have been studied. Our approach facilitates the study of cases for which a rigorous mathematical approach is unwieldy or even intractable. In this chapter, we presented computational results for several difficult bargaining problems. We first validated the long-term evolutionary behaviour using the game-theoretic concept of subgame-perfect equilibrium (SPE). When no risk of breakdown exists, the last agent in turn proposes a take-it-or-leave-it deal in the last round and demands most of the surplus for each issue. This extreme division is consistent with SPE predictions. When a risk of breakdown exists, most agreements in the EA are reached in the first round. If the finite game becomes long enough, the deadline is therefore no longer perceived by the evolutionary agents, and the results match SPE predictions for the infinite-horizon game.
We also modelled and studied the concept of fairness, where a responding agent carries out a fairness check before an agreement is finally accepted. This fairness check was modelled in two ways: a responding agent considers fairness either only at the deadline or at all times, for any potential agreement. In both cases, fair outcomes can be obtained, but the outcomes in the second case are much less sensitive to the choice of the fairness function. In an asymmetric bargaining situation (where the players have asymmetric preferences), multiple outcomes exist that can be considered fair in different ways. We also found a good match between the EA results and game-theoretic SPE predictions for a simple bargaining game (concerning a single issue).

An interesting line of research is to explore the notion of fairness further and to compare the computational outcomes with results from experimental studies with human subjects. Of particular interest is the study of asymmetric multi-issue bargaining situations, where more than one outcome can be considered fair. This raises several new research questions for experimental economics as well as for the computational sciences.

Chapter 4 Bargaining with multiple opportunities

With the advent of ubiquitous agent technology, bargaining agents are expected to play an essential role in electronic market places. The agents in a competitive market are self-interested and can be equipped with the ability to autonomously search for products and services and negotiate the terms of an agreement. In this chapter,¹ we focus on strategic aspects of bilateral bargaining within a market-like setting. We use the one-shot ultimatum game, a well-known approach within the field of game theory, as the basic bargaining procedure for our model. In this game (see also Section 2.3.2), two players, a proposer and a responder, negotiate about the division of a bargaining surplus (see Section 1.1.2). The proposer makes an offer and the responder can only accept or reject this offer. The ultimatum game has been extensively researched, both theoretically and in experiments with human subjects [67, 90, 107].

¹ This chapter is based on [38]: E.H. Gerding and J.A. La Poutré. Bargaining with posterior opportunities: An evolutionary social simulation. In M. Gallegati, A. Kirman, and M. Marsili, editors, The Complex Dynamics of Economic Interaction, Springer Lecture Notes in Economics and Mathematical Systems (LNEMS), Vol. 531, pages …, Springer-Verlag, ….

The ultimatum game models a negotiation between an isolated pair of players. In a market setting, however, an agent's behaviour can change if future opportunities are taken into account. This chapter introduces a natural extension of the basic ultimatum game in which fall-back opportunities are explicitly modelled. Both the proposing and the responding agents have several bargaining opportunities with different opponents before their final payoff is determined. In this way, a market place is modelled where several sellers and buyers are available. The game is further extended to allow several issues to be negotiated simultaneously, as in the previous chapter: not only the price, but also other important attributes such as delivery time, package deals, warranty, and other product-related aspects can be taken into account. This can reduce the competitive nature of the game, since trade-offs can be made to obtain win-win solutions. Furthermore, we study the impact of search costs if an offer is refused and a new opponent needs to be found. In addition, we consider the case where uncertainty exists about future opportunities and a new opponent cannot always be found.

An important aspect within this setting is the information available to the agents about their opponents. We distinguish between the complete information case, where an agent's current number of future bargaining opportunities is common knowledge, and the incomplete information case, where this information is known to the agent himself but hidden from the opponent. Given reasonable assumptions, the complete information case can be approached theoretically using the game-theoretic concept of subgame-perfect equilibrium (see Def. 18.3). The subgame-perfect results show an extreme split of the surplus, similar to the ultimatum game: the proposer claims the entire surplus, and the responder accepts this deal. The incomplete information case, on the other hand, seems much more difficult to analyse theoretically. We therefore apply an evolutionary simulation as described in Section 1.2 to investigate this setting. We also compare the evolutionary and the theoretical approach in the complete information case.

The evolutionary outcomes show a good match with the game-theoretic results. Moreover, the simulation shows that results differ significantly if information about the opponent's future bargaining opportunities is not available: if the number of bargaining opportunities is sufficiently high, the responder now obtains the largest share. The outcomes in the incomplete information case, however, also depend on the existence of positive search costs.
Search costs stimulate agents to reach agreements early and discourage both players from exploiting the additional opportunities. In the evolutionary simulation, the agreements are then similar to those of the one-shot ultimatum game. A similar effect is observed if bargaining is terminated with a small probability because no new opponent can be found.

This chapter is organised as follows. Section 4.1 describes the bargaining game with multiple bargaining opportunities. Section 4.2 provides a game-theoretic analysis of the game in case of complete information. Section 4.3 outlines the evolutionary simulation and Section 4.4 discusses the results obtained from the simulation. Finally, Section 4.5 concludes.

4.1 Description of the bargaining game

The modelled market consists of buyers and sellers who exchange a single good through bilateral negotiations. At each bargaining opportunity, an ultimatum-like game is played, where the proposer makes an offer and the responder can reject or accept this offer.² If an agreement is reached, both agents obtain a payoff equal to their utility of the offer. For convenience, we use seller and buyer to denote a proposer and responder, respectively, in the following. (We previously used the terms agent 1 and agent 2, but these are not suitable here since several buyers and sellers can participate in a single market game.)

In our model an offer consists of one or more issues. The utility is calculated as in Chapter 3 (cf. Section 3.1): the seller's utility for an offer $o$ can be written as $u_s(o) = w_s \cdot o = \sum_{i=1}^{m} w_{s,i}\, o_i$, where $w_s$ is a vector containing the seller's weights for each issue and $m$ is the number of issues. Similarly, the buyer's utility is $u_b(o) = w_b \cdot (\mathbf{1} - o)$, where $w_b$ represents the buyer's weights. The utilities of the agents are normalised between 0 and 1. The differences in the weights of the two players determine the degree of competitiveness of the negotiations (i.e., to what extent trade-offs can be beneficial). We formalise the notion of competitiveness and address this issue further in Section 4.4.

Figure 4.1: A two-issue negotiation example in a market where each agent has two initial bargaining opportunities (n = 2).

Each buyer and seller initially has up to $n$ bargaining opportunities to reach an agreement. In case of a disagreement, the agents are newly matched with randomly selected opponents, until no more bargaining opportunities remain. We call the number of remaining bargaining opportunities an agent's bargaining state, denoted by $\gamma_s \in \{0, 1, \ldots, n\}$ for a seller and $\gamma_b \in \{0, 1, \ldots, n\}$ for a buyer. If an agent's bargaining state reaches zero, the agent obtains a disagreement payoff, which is set to zero. An example of a two-issue negotiation is shown in Figure 4.1 from a buyer's perspective.
The buyer, whose initial bargaining state is $\gamma_b = 2$, first encounters a seller, seller 1, with bargaining state $\gamma_s = 1$. The seller proposes an offer $o = (0.5, 0.5)$ and the buyer refuses this offer. Because the seller has no more bargaining opportunities, his bargaining game ends and he obtains the disagreement payoff. The buyer, on the other hand, can continue bargaining when matched with another opponent, seller 2. In the example, this opponent, with $\gamma_s = 2$, offers $(0.6, 0.6)$. The buyer now accepts and the bargaining game ends for both agents. Note that even though the agents initially have equal bargaining opportunities, matched agents can have different bargaining states. Having agents with different states is an important aspect of the market game, particularly when agents are unaware of their opponent's remaining opportunities.

We assume that, once an offer is rejected, agents cannot go back on a previous offer.³ We also assume that there is an equal number of buyers and sellers in the market. This is in contrast to the work in, e.g., [89], where markets with unequal numbers of buyers and sellers are studied.

² Alternatively, the multi-round alternating-offers game (see, e.g., Chapter 3) can be used. As shown in Chapter 3, however, outcomes are equivalent to the ultimatum game if no time pressure exists: agreements are delayed until a take-it-or-leave-it offer is made in the final round.

4.2 Game-theoretical approach

This section considers the game-theoretic subgame-perfect equilibrium (SPE) of the above game when the agents' bargaining states are common knowledge. A game-theoretic analysis seems to be very difficult if the agents have incomplete information about their opponents' bargaining states. We will, however, drop the complete information assumption in the evolutionary approach (Section 4.3). In the following analysis we assume that all agents of a specific type (i.e., buyer or seller) apply the same negotiation strategy. This assumption is reasonable since the preferences are identical for a given type.

In case of a single opportunity, the bargaining game reduces to the ultimatum game. The ultimatum game has a unique SPE in which the seller (here the proposer) claims the total share for each issue, and the buyer (the responder) accepts this take-it-or-leave-it deal [90]. This result can be obtained by applying backward induction.
Intuitively, a rational buyer will accept any positive amount, which is always better than the zero payoff obtained in case of a disagreement. The SPE is precisely the point where the buyer is indifferent between accepting and refusing. We argue that the game with multiple bargaining opportunities and complete information has an SPE with the same outcome as the ultimatum game: the seller obtains the entire share, and the buyer receives the disagreement payoff, which is set to zero.⁴ Consider a buyer with $\gamma_b = 1$, i.e., with a single bargaining opportunity remaining. The buyer will then accept any positive amount offered by the seller. An anticipating seller will therefore claim the entire share, as in the ultimatum game, independent of $\gamma_s$. In SPE, the buyer's payoff for $\gamma_b = 1$ therefore equals zero. Note that this only holds if the seller is informed about the buyer's bargaining state.

³ Agents are said to have no recall [149].
⁴ This holds for continuous divisions of the surplus.

If $\gamma_b = 2$, the buyer has two bargaining opportunities. Since the buyer's payoff in state $\gamma_b = 1$ is zero, we can replace the payoff for refusing the seller's offer when $\gamma_b = 2$ by the disagreement payoff. The situation for $\gamma_b = 2$ is then equal to that for $\gamma_b = 1$: the buyer is indifferent between accepting and refusing an offer worth zero, and in SPE the buyer accepts this deal, independent of $\gamma_s$. By backward induction, the same holds for $\gamma_b = n$. Because the agents are indifferent as to the bargaining state in which the agreement is reached, several subgame-perfect equilibria can actually exist. In all cases, however, the divisions are the same. Note furthermore that the above argument only holds if the seller is informed about the buyer's number of remaining bargaining opportunities. If this information is not available, a game-theoretic analysis seems much more difficult. An evolutionary simulation, however, is well suited to analyse the case of incomplete information. We analyse both the informed and the uninformed case in Section 4.4. First, the evolutionary system is described in detail.

4.3 Evolutionary approach

We use an evolutionary algorithm to evolve the strategies of the agents. The evolutionary simulation is depicted in Figure 4.2. The evolutionary algorithm is based on the implementation described in Section …. As in Chapter 3, each strategy in the EA corresponds to an agent of a certain type (buyer or seller), and we use separate populations to evolve the strategies of the two types of agents. The way in which the fitness of the agents is determined, however, differs from the approach described in Chapter 3. In the previous model, each agent was evaluated against all agents in the opponent's population. Here, however, all agents together constitute a market-like setting, where buyers and sellers can bargain several times with different opponents before their final fitness is determined.
Because the interactions also determine the agents' bargaining states, a different approach is required here. The fitness of the agents is determined as follows. The parental and offspring populations are first combined to form a group of sellers and a group of buyers. The agents are then evaluated by a sequence of pair-wise matches. For each match, two agents (a seller and a buyer) are randomly selected (with replacement) and play the one-shot game. An agent obtains a payoff if an agreement is reached, or the disagreement payoff (which is zero) if no more opportunities are available for this agent. If an agent still has opportunities remaining, his payoff remains undetermined. Note that, since the two agents can be in different bargaining states, the consequences of a disagreement may differ for the individual agents. Because an outcome depends on many random factors, each strategy is evaluated a number of times and its fitness is the average of $r$ payoff values. The parameter $r$ is called the evaluation frequency. In this way, the fitness becomes a more accurate measure of the expected payoff. The bargaining games continue until all agents have obtained at least $r$ payoff values.

Figure 4.2: Iteration loop of the evolutionary algorithm.

Since both buyers and sellers start with the same bargaining state, the opponents' bargaining states in the first periods do not represent an ongoing bargaining society. To prevent such initiatory effects and to model an ongoing bargaining society, a strategy's fitness is only measured after its first payoff is determined. A strategy is thus evaluated at least $r + 1$ times. Furthermore, we model a market situation where the number of agents remains constant over time, also called a steady-state market in [89]. Therefore, once the fitness of a strategy has been established, the strategy can still be selected to play again, but its fitness is no longer affected by the outcome. The bargaining games are continued until the fitness of each strategy has been established.

Strategy Encoding

The strategy encoded on the chromosome specifies either an offer or a threshold for each bargaining state, depending on the type of the agent (i.e., sellers only have offers and buyers only have thresholds). The threshold determines whether an offer of the opponent is accepted or rejected: if the utility falls below the threshold, the offer is refused; otherwise an agreement is reached. A similar representation was used in Chapter 3 for the alternating-offers game, although in that game all strategies contain both offers and thresholds. We distinguish between the complete information setting and the incomplete information setting (see Section 4.1). The strategy representation depends on this setting and is schematically depicted in Figures 4.3 and 4.4 for the complete and incomplete information case, respectively. In the incomplete information case (Figure 4.4), an offer or threshold is specified for each bargaining state of the agent. In case of complete information (Figure 4.3), an offer or threshold is also conditional on the opponent's bargaining state.
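The fitness evaluation described above can be sketched for the incomplete information case, where a seller's strategy is a list of offers $o(\gamma_s)$ and a buyer's strategy a list of thresholds $t(\gamma_b)$. This is a simplified, hypothetical sketch rather than the thesis implementation: the function name evaluate_market is illustrative. It also assumes that an agent whose game has ended (by agreement or by reaching state 0) immediately starts a new game in state $n$, which is one way to keep the number of agents in the market constant. Each agent's first payoff is discarded, and fitness is the mean of the next $r$ payoffs.

```python
import random
from statistics import mean


def dot(w, v):
    """Weighted sum of two equal-length sequences."""
    return sum(a * b for a, b in zip(w, v))


def evaluate_market(seller_offers, buyer_thresholds, w_s, w_b, n, r, rng):
    """Hypothetical sketch of the market evaluation (incomplete information).

    seller_offers[k][g - 1]     offer vector o(g) of seller k in bargaining state g
    buyer_thresholds[k][g - 1]  threshold t(g) of buyer k in bargaining state g
    Returns (seller fitnesses, buyer fitnesses): the mean of r payoffs per agent,
    after discarding each agent's first payoff (initiatory effects).
    """
    n_s, n_b = len(seller_offers), len(buyer_thresholds)
    state_s, state_b = [n] * n_s, [n] * n_b
    pay_s = [[] for _ in range(n_s)]
    pay_b = [[] for _ in range(n_b)]

    def finish(payoffs, states, k, value):
        # Record a final payoff; assumed: the agent re-enters the market in state n.
        payoffs[k].append(value)
        states[k] = n

    # Keep matching until every agent has at least r + 1 recorded payoffs.
    while min(map(len, pay_s)) <= r or min(map(len, pay_b)) <= r:
        s, b = rng.randrange(n_s), rng.randrange(n_b)   # random seller-buyer pair
        offer = seller_offers[s][state_s[s] - 1]         # depends on own state only
        u_s = dot(w_s, offer)
        u_b = dot(w_b, [1.0 - o for o in offer])
        if u_b >= buyer_thresholds[b][state_b[b] - 1]:   # agreement
            finish(pay_s, state_s, s, u_s)
            finish(pay_b, state_b, b, u_b)
        else:                                            # disagreement: one opportunity used
            state_s[s] -= 1
            state_b[b] -= 1
            if state_s[s] == 0:
                finish(pay_s, state_s, s, 0.0)           # disagreement payoff
            if state_b[b] == 0:
                finish(pay_b, state_b, b, 0.0)

    fitness_s = [mean(p[1:r + 1]) for p in pay_s]
    fitness_b = [mean(p[1:r + 1]) for p in pay_b]
    return fitness_s, fitness_b


if __name__ == "__main__":
    rng = random.Random(1)
    n, r = 2, 5
    sellers = [[(0.7,) for _ in range(n)] for _ in range(3)]  # single issue: claim 0.7
    buyers = [[0.0 for _ in range(n)] for _ in range(3)]       # accept any offer
    print(evaluate_market(sellers, buyers, (1.0,), (1.0,), n, r, rng))
```

For the complete information case, the same loop would index offers and thresholds by both bargaining states, as in Figure 4.3.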

Figure 4.3: The strategies of a seller and a buyer for the market game with complete information about the opponent's bargaining state. The seller's strategy is an n × n table of offers o(γ_s | γ_b) and the buyer's strategy an n × n table of thresholds t(γ_b | γ_s); both are conditional on the bargaining state of the opponent, where γ_s, γ_b ∈ {1, ..., n}.

Figure 4.4: The strategies of a seller and a buyer for the market game, where the players are uninformed about the opponent's bargaining state. The seller's strategy consists of the offers o(1), ..., o(n) and the buyer's strategy of the thresholds t(1), ..., t(n). An offer o(γ_s) or threshold t(γ_b) is determined only by an agent's own bargaining state, since no further information is available.

Mutation Operator

Although several mutation models were tried, the mutation model with exponential decay showed the closest match with the game-theoretic benchmark cases. We therefore only report the results obtained with the exponential decay model. This mutation operator is explained in Section

4.4 Evolutionary simulation results

The results are organised as follows. First, the game with complete information is studied in Subsection 4.4.1 and the results are compared with the game-theoretic (SPE) predictions. Subsection 4.4.2 studies the incomplete information case. Subsection 4.4.3 introduces a measure of competitiveness for multi-issue negotiations and compares results for different levels of integrative negotiations. Finally, Subsection 4.4.4 considers the effects of fixed search costs in the market game and of uncertainty about future opportunities.

Table 4.1: Default settings of the evolutionary simulation.

Parameter                                 Value
Parental population size (µ)              30
Offspring population size (λ)             30
Initial standard deviations (σ)           0.1
Mutation model                            exponential decay
Standard deviation half-life (t)          400
Number of generations                     4000
Number of runs per experiment             30
Strategy evaluation frequency (r)         20

4.4.1 Game-Theoretic Validation

This section considers a competitive (i.e., single-issue) scenario with complete information about the agents' bargaining opportunities and compares the evolutionary algorithm (EA) outcomes with the SPE predictions. Default parameter settings for the EA are shown in Table 4.1. Because of random fluctuations, the EA results are averaged over 30 runs with the same settings. In SPE, the share of the buyers is zero and the sellers obtain the whole surplus if the initial number of bargaining opportunities of the players is finite and the bargaining state of the opponent is common knowledge (see also Section 4.2). Figure 4.5 shows the EA outcomes for different values of n (the initial number of bargaining opportunities). The results indicate an almost perfect match between the evolutionary outcomes after 4000 generations and the game-theoretic outcomes, particularly when n is small. For larger values of n we find that, with the same EA parameter settings, the evolutionary outcomes become somewhat less extreme (see also Figure 4.6, which shows the long-term EA outcomes after 4000 generations for n up to 10). This is because the complexity of the problem increases with n due to a larger search space, which makes learning by an EA more difficult. However, a better match for larger values of n can also be obtained by adjusting EA parameters, such as the evaluation frequency and the population size, to handle the increased complexity. Details on tuning the EA are not treated here.
Instead, we refer the interested reader to previous research [126], in which different EA settings are systematically studied for an alternating-offers bargaining game. Henceforth, we only present experiments using uniform EA settings.
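Table 4.1 specifies an exponential-decay mutation model with an initial standard deviation of 0.1 and a half-life of 400 generations. The minimal sketch below assumes Gaussian mutation of each gene, with a standard deviation that halves every half-life, and assumes that genes are clipped to [0, 1]. The exact operator is the one referred to under Mutation Operator; `sigma_at` and `mutate` are hypothetical names.

```python
import random


def sigma_at(generation, sigma0=0.1, half_life=400):
    """Mutation standard deviation that halves every `half_life` generations."""
    return sigma0 * 0.5 ** (generation / half_life)


def mutate(genes, generation, rng=random, sigma0=0.1, half_life=400):
    """Gaussian mutation of every gene, clipped to the unit interval."""
    s = sigma_at(generation, sigma0, half_life)
    return [min(1.0, max(0.0, g + rng.gauss(0.0, s))) for g in genes]
```

Under these assumptions the step size has shrunk to 0.1/2¹⁰ ≈ 9.8 × 10⁻⁵ by generation 4000, so late generations only fine-tune the strategies.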

Figure 4.5: Development of the mean fitness (averaged over 30 runs) for the complete information setting with varying initial numbers of bargaining opportunities.

Figure 4.6: Results after 4000 generations (averaged over 30 runs) in the case of complete information.

Figure 4.7: Results after 4000 generations (averaged over 30 runs) for incomplete information settings with various n: mean population fitness of buyers and sellers against the initial number of bargaining opportunities (n). The error bars indicate the standard deviation of the averaged results.

4.4.2 Incomplete Information

We now examine the results when the agents do not know their opponent's bargaining state; the agents only know their own bargaining states. Although no explicit information is available, the agents implicitly learn the distribution of the bargaining states in the opponent population. This distribution is endogenously determined by the strategies of the agents. The strategies, in turn, adapt to the distribution of the bargaining states. This complex interaction is one reason why a theoretical analysis is difficult. An EA, on the other hand, is well suited to finding outcomes that emerge from such local interactions. Results produced after 4000 generations of the EA for the incomplete information case are shown in Figure 4.7 for different values of n (the initial number of bargaining opportunities). These results are averaged over 30 runs, and the error bars indicate the standard deviation. Whereas in the complete information case the seller obtains almost the entire surplus, the responder (i.e., the buyer) has the best bargaining position in the incomplete information case (see Figure 4.7). This holds as long as the initial number of bargaining opportunities is sufficiently large (i.e., n ≥ 5). Note that these results are obtained even though the buyers' and sellers' initial settings are equal. The results can be explained as follows. If the buyer is in her final state, she will accept any deal (as in the ultimatum game). In other states, however, the buyer can try to find a better deal elsewhere. Consider a seller in his last bargaining state. Because he does not know the buyer's bargaining state, he can no longer anticipate the buyer's behaviour. To prevent a disagreement, the seller will therefore concede in his last bargaining state. The expected payoff in case of a disagreement and the offers in earlier bargaining states then also decrease. After many generations, the simulation converges to an outcome where the seller concedes almost his entire surplus in each bargaining state. We also observe that the seller concedes slightly less if he has more bargaining opportunities remaining, resulting in less extreme deals when n becomes large, as shown in Figure 4.7. In the incomplete information setting, the first mover (here the seller) has no information about his opponent. The responder, on the other hand, can make a better-informed decision based on the seller's offer. Whereas in the ultimatum game the proposer seems to dominate the outcome, a more competitive setting allows the responder to obtain a considerable advantage. This result, however, holds only if the number of bargaining opportunities is finite and equal for both players, and if the players incur no costs for refusing a deal. As we will show in Section 4.4.4, even slight costs completely change these results. When the initial number of bargaining opportunities is set higher than three, a sudden transition in the long-term outcomes can be observed in Figure 4.7: up to n = 3, the seller obtains almost everything, whereas the buyer obtains the largest share if n > 3. Increasing n increases the number of possible states, which makes the buyer's behaviour less predictable for the seller. The value at which the transition occurs depends on game parameters such as r and the competitiveness of the negotiation.
The latter is discussed further in the next section.

4.4.3 Integrative Negotiations

An advantage of bilateral negotiation is the ability to negotiate complex contracts with several issues. When mutually beneficial solutions are available, negotiations are called integrative (see Section 2.3.3). In this section, we consider integrative two-issue negotiations and introduce the notion of competitiveness. We show that information has a very similar impact in the integrative case as in the competitive case. Due to the increased complexity, however, the evolutionary results are less extreme when the number of bargaining opportunities is large. The utility of an offer is an additive, weighted function of the share obtained for each issue (see also Section 4.1). The weights of sellers and buyers for the two issues are $w_s = (0.5 - \alpha,\ 0.5 + \alpha)^T$ and $w_b = (0.5 + \alpha,\ 0.5 - \alpha)^T$, respectively, where $\alpha \in [0.0, 0.5]$ is the so-called degree of competitiveness. When α is set equal to 0, negotiations are purely competitive; if α = 0.5, there is no competition at all. Note that the maximum social welfare, i.e., the maximum total utility that a seller and a buyer can achieve together, equals $2(0.5 + \alpha)$, where each agent obtains $0.5 + \alpha$.
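The weights above make the effect of α easy to check numerically. The minimal sketch below assumes that a deal is described by the seller's share of each issue, with the buyer receiving the remainder; `issue_weights` and `deal_utilities` are hypothetical helper names.

```python
def issue_weights(alpha):
    """Return (w_s, w_b) for a degree of competitiveness alpha in [0, 0.5]."""
    if not 0.0 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [0, 0.5]")
    w_s = (0.5 - alpha, 0.5 + alpha)
    w_b = (0.5 + alpha, 0.5 - alpha)
    return w_s, w_b


def deal_utilities(seller_shares, alpha):
    """Additive utilities when the seller receives seller_shares[i] of issue i
    and the buyer receives the remaining 1 - seller_shares[i]."""
    w_s, w_b = issue_weights(alpha)
    u_s = sum(w * x for w, x in zip(w_s, seller_shares))
    u_b = sum(w * (1.0 - x) for w, x in zip(w_b, seller_shares))
    return u_s, u_b
```

For α = 0.2, splitting both issues equally gives each agent a utility of 0.5, whereas allocating issue 1 to the buyer and issue 2 to the seller gives each agent 0.7, i.e., the maximum social welfare 2(0.5 + α) = 1.4. For α = 0, every split sums to 1, so the negotiation is purely competitive.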

Figure 4.8: Mean long-term outcomes for two-issue negotiations and α = 0.2: mean population fitness of buyers and sellers, with complete and incomplete information, against the initial number of bargaining opportunities (n).

Results for α = 0.2 are visualised in Figure 4.8. The results show that, as in the competitive case, a transition to a buyer-dominated outcome occurs for sufficiently large n under incomplete information. We find, however, that this transition already occurs at n = 2 (see Figure 4.8). Only two bargaining opportunities are needed to give the responder an advantage, as opposed to four in the single-issue game (Figure 4.7). Figure 4.8 also shows a less extreme split than in competitive negotiations, particularly for large n. This occurs, firstly, because the strategy search space is larger (a value for each issue needs to be learned), which makes learning more difficult. Moreover, the win-win possibilities are fully exploited: if one of the agents concedes slightly, the other agent can obtain a relatively large gain by negotiating a Pareto-efficient deal. As shown in Figure 4.9, this effect becomes stronger as α increases. In the extreme case α = 0.5, both agents can obtain the full surplus without any concession. Note that the EA parameters are fixed across the various game settings. As mentioned in Section 4.4.1, the parameters can be adjusted to handle bargaining settings that are more complex because of a larger n and an increased number of issues. By increasing the population size and adjusting other parameters of the EA, we obtain results that are closer to the game-theoretic predictions.

Figure 4.9: Mean long-term outcomes for n = 5 and different values of the competitiveness (α): mean population fitness of buyers and sellers with complete and incomplete information.

4.4.4 Search Costs and Premature Termination

We further extend the bargaining game in two ways. First, we introduce search or negotiation costs that are incurred each time an offer is refused and the agents engage in a new negotiation. Subsequently, we consider the case where there is uncertainty about whether a new bargaining opponent can be found. Whereas until now we have assumed that the number of bargaining opportunities remains fixed, external factors can influence the number of opportunities (e.g., a seller may in the meantime have sold the good to another buyer). This is modelled as a probability that negotiations terminate prematurely, i.e., before all bargaining opportunities are exhausted. Search costs can represent the amount of money, time, or effort that an agent incurs in finding a new opponent. It has been shown theoretically that if buyers have search costs, sellers charge monopolistic prices in equilibrium [22, Ch. 7]. We consider the impact of search costs on the bargaining game where buyers and sellers have equal search costs β. The final utility is reduced by a fixed search cost β for each new bargaining opportunity; only the first bargaining opportunity is free. Evolutionary outcomes for the complete and incomplete information settings with different search costs are depicted in Figure 4.10. Negotiations are competitive, and buyers and sellers each have 5 initial bargaining opportunities. Search costs seem to have little impact on the fitness in the complete information case; the variations are not statistically significant. Although the fitness does not change, the actual behaviour of the agents does: most agreements are now reached immediately, whereas without search costs the agreements are distributed over the various bargaining states. In the incomplete information case, on the other hand, even small search costs have a drastic impact on the fitness of the agents (see Figure 4.10). The sellers claim almost the entire surplus even if search costs are very small (e.g., 0.01) and equal for both agents. These results are robust for different settings of the EA. The outcomes are consistent with economic theory, which states that prices become monopolistic even if search costs are infinitely small. As in the complete information case, search costs stimulate both buyers and sellers to reach agreements early. The final opportunity of the seller is therefore almost never reached, which removes the advantage of the buyer. The game changes from a game with incomplete information into a game where almost all players complete a deal in their first bargaining opportunity. The seller can then again claim the entire surplus, as in the one-shot game.

Figure 4.10: Mean long-term results as a function of the search costs (β) for n = 5: mean population fitness of sellers and buyers with complete and incomplete information.

Similar outcomes are observed when bargaining for a buyer and/or a seller is discontinued with a certain probability after each disagreement.⁵ Figure 4.11 shows the long-term outcomes for different probabilities of premature termination after each bargaining opportunity. The probability is the same for buyers and sellers and for each bargaining opportunity, but it is drawn independently. As with search costs, the seller obtains the largest share if the probability is sufficiently high. Note, however, that the effect of premature termination is less extreme. This is because search costs also reduce the utility if no agreement is reached, which provides an additional incentive to reach agreements (otherwise, a negative utility is obtained). In the case of premature termination, on the other hand, an agent is indifferent between termination after the first bargaining opportunity and a disagreement in the last bargaining opportunity.

Figure 4.11: Long-term fitness values of sellers and buyers for n = 5 and incomplete information, when negotiations are discontinued with a certain probability after each disagreement.

⁵ This is analogous to discount factors or a probability of breakdown in multi-round bargaining, as used in, e.g., Chapter 3.

4.5 Concluding remarks

In this chapter we study the evolutionary dynamics of a market-like game, where a seller sells a single good and has several opportunities to do so. At the same time, a buyer wishes to buy an item by trying several sellers. The terms of an agreement are negotiated using an ultimatum-like game, where the seller proposes an offer and the buyer can choose to accept or reject the offer. The game is extended to allow multiple opportunities for both the seller and the buyer if the deal is rejected. In this way, a competitive market is modelled. We furthermore investigate multi-issue integrative negotiations and the effects of search costs and premature termination if a disagreement occurs. The game-theoretic outcome using subgame-perfect equilibrium (SPE) for the one-shot ultimatum game predicts an extreme split of the surplus: the seller obtains the whole surplus, whereas the buyer obtains her disagreement payoff. We extend the analysis to multiple bargaining opportunities with complete information about the opponent's bargaining state and find an equivalent outcome. A theoretical analysis seems to be very difficult, however, if the bargaining states of the agents are not common knowledge. An evolutionary simulation, on the other hand, is very well suited to investigating such games with incomplete information. We first compare the evolutionary results with the game-theoretic outcomes for the game with complete information to validate the evolutionary approach. If the initial number of bargaining opportunities is small, a very good match is found. In larger games, or when the negotiations become less competitive, the EA shows somewhat deviating outcomes due to the larger search space and the limited computational capacity of the EA. We note that we mainly report experiments using uniform EA settings in this chapter. However, adjusting the EA settings appears to improve the results further for more complex games. The evolutionary simulation shows a large impact of the additional bargaining opportunities if the agents have no information on their opponent's number of future opportunities. Whereas in the complete information game the seller dominates the market, the buyer is better off in the incomplete information setting, as long as the number of bargaining opportunities is sufficiently high. When the initial number of bargaining opportunities is increased, a sudden transition is observed where the buyer obtains the largest share instead of the seller. This occurs because the seller can then no longer anticipate the buyer's response and gives in to avoid a disagreement. Similar outcomes are found for two-issue integrative negotiations.
At the same time, integrative negotiations produce less extreme evolutionary outcomes, in the game with both complete and incomplete information, particularly if the initial number of bargaining opportunities is large. This mainly occurs because the space of possible deals increases. Moreover, the agents find win-win situations that benefit one agent without affecting the payoff obtained by the opponent. An integrative setting also affects small games with incomplete information: we find that for certain settings, a transition from a seller-dominated to a buyer-dominated payoff occurs even when both agents have only two initial bargaining opportunities, whereas in the competitive case more bargaining opportunities are needed to achieve the same result. We also study the effect of search or negotiation costs incurred when a negotiation fails and the agent needs to find a new opponent. Search costs induce players to reach an agreement in the very first bargaining opportunity. This changes an incomplete information game into an ultimatum-like game with only a single bargaining opportunity. Even very small search costs result in an extreme split where the seller obtains almost the entire surplus, similar to the ultimatum game outcome. This is consistent with economic theory, which states that even infinitely small search costs produce monopolistic prices. The outcomes are similar but less extreme if search costs are replaced by a probability that bargaining is discontinued after a disagreement. This models the situation where uncertainty exists about future opportunities.

In this chapter we have shown that evolutionary simulations are extremely useful for investigating negotiations with incomplete information, which are unwieldy to analyse theoretically. Using evolutionary algorithms, we can simulate complex interactions involving a large number of agents, as is the case in bargaining with multiple opportunities. It would be interesting to refine the model further for specific real-world settings where, for instance, agents have incomplete information about their own future number of bargaining opportunities. Another interesting extension is allowing agents to return to previously encountered opponents.
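The two extensions of Section 4.4.4 only change how payoffs are computed. The following minimal sketch illustrates both, assuming that the opportunities an agent uses are counted from 1 and that a final disagreement yields a share of zero. The function names are hypothetical, and the sketch is not the thesis implementation.

```python
import random


def utility_with_search_costs(share, opportunity, beta):
    """Final utility when a deal is struck at the given 1-based opportunity.

    Only the first opportunity is free; each further one costs beta. A final
    disagreement after n opportunities can be modelled as share=0.0 at
    opportunity n, which gives the negative utility -beta * (n - 1).
    """
    return share - beta * (opportunity - 1)


def bargaining_continues(p_terminate, rng=random):
    """After a disagreement, bargaining stops prematurely with probability p.

    The draw is made independently for each agent and each opportunity.
    """
    return rng.random() >= p_terminate


def reach_probability(k, p_terminate):
    """Probability that an agent who keeps disagreeing reaches opportunity k."""
    return (1.0 - p_terminate) ** (k - 1)
```

Under search costs, an agent that fails in all n = 5 opportunities ends with a utility of −4β, whereas under premature termination the worst case is a utility of zero. This matches the explanation in Section 4.4.4 of why the effect of premature termination is less extreme.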

Part B Bargaining systems for business applications

Chapter 5 Competitive market-based allocation of consumer attention space

In this chapter,¹ we consider an e-business application of automated negotiation using software agents. We present a framework for a distributed Competitive Attention-space System, CASy, that allocates the scarce resource of consumer attention via the techniques of dynamic market-based control [20, 23, 43] and adaptive software agents (see [47, 60, 144]). In the example of an electronic shopping mall, CASy recommends shops to a consumer: the task of matching a consumer to a set of suitable shops is delegated to the individual shops, each of which evaluates the information that is available about the consumer and his or her interests (the consumer's interests and other information that the consumer is willing to provide, e.g., keywords, product queries, and available parts of a profile). Based on this information and on their domain knowledge, shops can make a monetary bid in an auction in which a limited amount of consumer attention space, or banners, for the particular consumer is sold. To facilitate CASy, the system is designed as a multi-agent system (see Section 1.1.3) in which each shop is represented by a software agent that executes the task of bidding for the attention of each individual consumer. The use of learning software agents allows shops to rapidly adapt their bidding strategy such that they only bid for consumers that are likely to be interested in their offerings. Furthermore, efficient bidding for each customer is only feasible when automated: hence the use of software agents. These agents allow a shop to process a large number of small transactions and to make a deliberate bid for every customer entering the shopping mall. In CASy, shops react to consumer behaviour and to the behaviour of other shops, yielding various interdependencies in the commercial effects of being displayed together with competitors. For various basic and simple models of on-line consumers, shops, and profiles, we demonstrate the feasibility of our system, i.e., that proper matchings of consumers with shops are achieved, and that shops can learn their niche in the market, even in the presence of such interdependencies. In particular, to validate the economic concept of the market mechanism underlying CASy, we develop an evolutionary system of bidding supplier agents. In this approach, the agent system is investigated like an (adaptive) economic market, as in agent-based computational economics (ACE) (see also Section 1.2, and Chapters 3 and 4). Furthermore, in this chapter we reflect on the merits of the system and assess its advantages and the issues that need further attention, from both a technological and an economic point of view. In [17] we extend this work and also develop adaptive software agents that learn bidding strategies based on neural networks and strategy exploration heuristics. We note that the mechanism we describe is not limited to the example of the electronic shopping mall, but can easily be extended to other domains where the (pre)selection of possibilities has to be guided, such as banners on more general websites, attention spaces on mobile devices, or other types of marketplaces. This chapter is organised as follows. First, Section 5.1 motivates the decentralised, agent-based approach for allocating attention space and discusses related approaches. In Section 5.2, the design of CASy is presented. The evolutionary simulation is explained in Section 5.3, and Section 5.4 contains the results. Section 5.5 reflects on practical implementation issues such as privacy and the communication overhead of the mechanism. Finally, Section 5.6 concludes.

¹ The results of this chapter have been published in [17]: S.M. Bohte, E.H. Gerding, and J.A. La Poutré. Market-based recommendation: Agents that compete for consumer attention. ACM Transactions on Internet Technology, Special Issue on Machine Learning in the Internet, August 2004 (to appear). A shorter version appeared as [16]: S.M. Bohte, E.H. Gerding, and H. La Poutré. Competitive market-based allocation of consumer attention space. In M. Wellman, editor, Proceedings of the 3rd ACM Conference on Electronic Commerce (EC-01), pages The ACM Press,

5.1 Motivation and related research

Before describing CASy in more detail, we first elaborate on the merits of such a system and the motivation for using software agents. We also discuss related work. In Section 5.1.1 we compare the decentralised approach with the more commonly used centralised approach. In Section 5.1.2 we comment on the use of software agents. Section 5.1.3 gives an overview of related work.

5.1.1 Centralised vs. decentralised recommendation

With the advent of electronic marketplaces, scale limitations as encountered in the brick-and-mortar world no longer apply: the supply side of the market is no longer restricted by geographical considerations or a lack of physical (shelf) space. At the same time, novel problems arise, such as how consumers can find their way in a large marketplace where very many suppliers offer their products. To this end, a mechanism provided by a trusted third party is desirable to propose relevant shops and products to a consumer in, e.g., a virtual shopping mall. A central filtering scheme is a feasible solution for several business areas. In such an approach, knowledge about both the user and the shops, as well as knowledge about the product domain, needs to be stored in a central location in order to determine appropriate matches. This approach is used in recommender systems such as those of Amazon and eBay [114] to recommend goods in specific domains such as books and CDs, and in shopbots or pricebots [46], for instance BargainFinder [66]. Keyword profiling is also a popular method for ranking online sites in search engines; this amounts to contracts that charge monetary amounts for increased visibility given specific keyword entries, e.g., [52–54]. A central or personal filtering system works well for suitable and well-demarcated domains, such as a book or music store. However, for a large heterogeneous marketplace with many participating shops and consumers, several complexity difficulties arise. These are due to the amount of relevant information that has to be tracked and processed by the filtering mechanism in the form of relevant, up-to-date knowledge of, e.g.: the consumer's interest in different product domains and shop categories; the shops' products, ways of doing business, and business interests; and ontologies and domain knowledge for various product categories.
Also, the weighing of multiple issues such as service, quality, price, and product diversity (add-ons and customisation of products) can be important. Besides the computational complexity of the information processing, this requires shops to transfer business information to the central system acting as a trusted third party. Such a practice meets with many objections from businesses, even if only product catalogues are concerned [78, 135]. In addition, a central mechanism still needs to decide what to display to a consumer and in which order, in a way that is reasonable to all parties: all the suppliers and consumers. A fair and general comparison of the interests (utilities) of different market parties is usually not possible, however, and concepts such as Pareto-efficiency (see Def. 4.3) are used instead. Thus, central filtering mechanisms may suffer from increasing (computational) complexity as well as serious objections and obstruction from commercial parties in various business areas.

5.1.2 Use of adaptive software agents

We believe that the system as presented is the natural evolution of auction-based allocation systems such as those currently employed by internet companies like Google (for sponsored keywords [52]) and Overture (for banner targeting [53]). Whereas these precursor systems rely on the human factor to set essentially static prices for particular goods, the use of software agents in our system in principle allows a market party to assess the value of each individual prospect, if desired at a very detailed level, and to take into account real-time business-related domain knowledge and strategies. Building adaptivity into the software agents allows the market for consumer attention to function more efficiently: potential prospects can be targeted more precisely, and changing buyer behaviour can be tracked and followed. As such, agent-assisted recommendation in competitive markets represents the next logical step for current auction-based allocation systems.

5.1.3 Related research

Our work relates to the large body of research on market-based control [20, 23, 43]. This paradigm is essentially about controlling complex systems using (distributed) market mechanisms for allocating scarce resources. A large number of applications exist, such as the allocation of computational resources [23, 43], load balancing, and climate control [145]. Our work applies the paradigm of market-based control to generating recommendations in a distributed fashion using software agents. Related to our approach for generating recommendations is a prototype called MATE [91] (Multi-Agent Trading Environment) that performs market matching using agent technology. In [91], merchant agents receive the profile of the consumer, and each suggests one or more products to a personal consumer agent.
The personal consumer agent then filters the appropriate products and ranks the remaining ones according to the customer's preferences. In this approach, selection is done on the consumer side, and the personal consumer agent must incorporate significant knowledge of the product domain, which a central party has to provide. A more recent approach by Wei et al. [142, 143] shares a number of characteristics with CASy: they also use a central auctioneer to shortlist recommendations based on bids made by information providers (called recommending agents). In their approach, a reward agent determines the reward, or feedback, for the recommending agents based on the quality of the recommendations as perceived by the user. The rule used to calculate the reward is shown to be Pareto-efficient (i.e., to maximise social welfare) [142]. Based on this feedback, the bidding (recommending) agents update their strategies using heuristic rules. The bidding strategy proposed here, on the other hand, is more general and is adapted by machine learning algorithms (in this chapter we discuss results using evolutionary algorithms; for an approach using neural networks, see [17]). Also, in CASy the feedback is obtained directly from the consumers, and it is up to each supplier agent to determine the value of this feedback.

5.2 The design of CASy

In this section, we present the framework of CASy (Competitive Attention-space System) for matching consumers with relevant suppliers in an electronic shopping mall. The framework is not limited to this example, however: it can easily be extended to other domains where the (pre)selection of possibilities has to be guided, such as banners on more general websites, attention spaces on mobile devices, or other types of marketplaces. Rather than restricting ourselves to shops, we henceforth mainly use the more general term supplier to refer to suppliers of goods or services.

Figure 5.1: Advertisements are shown in the form of banners. The banner list is tailored towards a consumer's characteristics.

When a consumer enters a shopping mall, he (throughout, "he" stands for "he or she") expresses his interest in certain products and selects the business sector of interest. This information, possibly augmented by additional knowledge, is passed on to potential suppliers in that sector. The suppliers then compete against each other in an auction, placing bids to obtain one of a limited number of entries of attention space for this specific consumer. Finally, the consumer is shown the list of winning suppliers, for instance as banner advertisements. An example is depicted in figure 5.1.

Software agents (see Section 1.1.3) are used to facilitate the fine-grained interaction, bidding, and selection in CASy. For our mechanism, we have software agents for the suppliers and for the enabling intermediary: the mall manager. The model of the electronic shopping mall is depicted in figure 5.2.
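As an illustration of one allocation round (a consumer's request is passed to the suppliers of the selected sector, which bid for a limited number of attention-space entries, and the winners are displayed), the following Python code is a minimal sketch under stated assumptions, not the thesis's implementation. The names `ConsumerRequest`, `SupplierAgent` and `allocate_attention_space` are hypothetical; bidding strategies are fixed functions rather than learned ones; and the pricing rule (the k highest bidders win and each pays the (k+1)-th highest bid) is an assumed placeholder, since the auction actually used in CASy is addressed in more detail elsewhere in this chapter.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration; not taken from the thesis.
@dataclass
class ConsumerRequest:
    sector: str            # business sector selected by the consumer
    interests: List[str]   # products the consumer expressed interest in

@dataclass
class SupplierAgent:
    name: str
    sector: str
    # Bidding strategy: maps a consumer request to a bid. Fixed here; in CASy
    # strategies are adapted by learning algorithms.
    strategy: Callable[[ConsumerRequest], float]
    charged: float = 0.0   # total amount charged by the mall manager so far

    def bid(self, request: ConsumerRequest) -> float:
        # Clamp to non-negative bids.
        return max(0.0, self.strategy(request))


def allocate_attention_space(request: ConsumerRequest,
                             suppliers: List[SupplierAgent],
                             k: int) -> List[str]:
    """One round of a hypothetical mall manager: only suppliers in the
    requested sector bid; the k highest bidders win the k entries.
    ASSUMED pricing rule: every winner pays the (k+1)-th highest bid
    (0 if there are at most k bidders)."""
    bids = [(s.bid(request), s) for s in suppliers if s.sector == request.sector]
    # Highest bid first; ties broken by name so the result is deterministic.
    bids.sort(key=lambda pair: (-pair[0], pair[1].name))
    winners = bids[:k]
    price = bids[k][0] if len(bids) > k else 0.0
    for _, supplier in winners:
        supplier.charged += price
    # Winners in display order (e.g. the order of the banner list).
    return [supplier.name for _, supplier in winners]


suppliers = [
    SupplierAgent("A", "books", lambda r: 3.0 if "novels" in r.interests else 1.0),
    SupplierAgent("B", "books", lambda r: 2.5),
    SupplierAgent("C", "books", lambda r: 1.5),
    SupplierAgent("D", "music", lambda r: 9.0),
]
request = ConsumerRequest(sector="books", interests=["novels"])
print(allocate_attention_space(request, suppliers, k=2))  # ['A', 'B']
print(suppliers[0].charged, suppliers[1].charged)         # 1.5 1.5
```

With k = 2, suppliers A and B obtain the two entries and each is charged 1.5, the third-highest bid; supplier D does not take part because it belongs to a different sector. In a learning setting, the fixed `strategy` functions would be replaced by strategies adapted over repeated rounds, for instance by an evolutionary algorithm.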


Figure 5.2: Components of the shopping mall and their interactions.

The figure shows both the software agents and the actual economic players in the shopping mall: the consumers and the suppliers. The participants within the shopping mall and their roles are discussed in more detail in the subsections that follow.

Mall manager agent

The Mall Manager Agent (MMA) acts as an intermediary between consumers and supplier agents. Its task is to facilitate the bidding and information-dissemination processes by providing the auctions, as well as additional customer-profiling services, to the suppliers. Given privacy concerns, the consumer profile will not automatically be communicated in full to the suppliers (see, e.g., Subsection …). Information on consumers could be stored within the MMA for revisiting consumers, while leaving consumers the option to remain anonymous. The MMA runs the auction: it collects the bids of the supplier agents, selects the winners, charges the selected suppliers, and enables their display. In Section … we address the auction in more detail.

Consumers

In the model of figure 5.2, the consumer communicates his interests and preferences directly to the MMA, e.g. via a web page. Note, however, that the consumer could also be assisted by a personal software agent. Preferences include the product being searched for and desired values for the product's attributes. The MMA can also take into account information from the consumer's profile. This profile consists of more generic information on the consumer. This could