Decision Strategies for Automated Negotiation with Limited Knowledge

Jan Richter

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Faculty of Information & Communication Technologies
Swinburne University of Technology
December 2011

to my parents

Abstract

This thesis focuses on decision strategies in automated negotiation when only limited knowledge about the negotiation partner or environment is available. Negotiation between self-interested agents is a key mechanism in distributed and autonomous software systems that facilitates multistage decision-making between two or more parties in conflict about their goals or preferences. In such systems, when agents are competitive and act rationally, they do not disclose information about their decision models and preferences, and they behave in various ways to achieve their goals. Consequently, the information available for an agent's decision-making is limited, as it can only be derived from the current encounter or previous interactions. An agent therefore needs a decision strategy that obtains high payoffs while still reaching an agreement given this limited knowledge. Decision-making in such situations is known to be hard and, while many approaches have been proposed, most assume either that agents have sufficient or precise knowledge about their opponents in the form of empirical data, domain knowledge or the decision models of their counterparts, or that they have enough time to learn it during their encounters. The thesis proposes novel solutions to this problem and, in particular, focuses on the strategic concession behaviour of agents in competitive environments. The fundamental setting considered is that of bilateral negotiation, in which two agents bargain for a product or service by exchanging offers alternately until one party agrees or withdraws from the encounter.
In such a setting, the work first presents and investigates two decision mechanisms that are suitable for situations in which an agent has only limited knowledge, an existing heuristic-based approach and a novel decision model based on multistage fuzzy decision-making, and then proposes a mechanism for coordinating these strategies in more complex and realistic concurrent negotiation scenarios. The heuristic-based approach linearly combines individual decision functions to create multi-tactic negotiation strategies that can react to a range of factors such as the opponent's behaviour, time, or the state of a resource. While this approach has the advantage of requiring only observable information from the current encounter, the mixing mechanism itself and its effect on the strategic concession behaviour of the agents have not been investigated before. As the traditional linear combination cannot guarantee monotonic concession curves, even when all involved tactics are monotonic and weights are static, agreements can be delayed and outcomes can differ significantly. We propose new mixing mechanisms based on linear combinations of individual negotiation threads or single concessions, which guarantee monotonic concession curves for monotonic tactics in static and dynamic strategies.

The second decision mechanism models the negotiation process as a multistage fuzzy decision problem in which fuzzy state transitions represent the limited knowledge of the opponent's behaviour, for example, by using only a few reference cases. This enables the use of dynamic programming algorithms to find the best course of action that achieves a desired outcome. In this model, the preferences of an agent are modelled using a fuzzy goal and fuzzy constraints, which also allow an agent to combine a preferred strategy with the fuzzy state transitions in order to create different strategic concession behaviours. Due to the fuzzy transition model and the ability to impose fuzzy constraints on the decision-making process, agents are able to negotiate competitively by utilizing their limited knowledge about their opponents.

The coordination of negotiation strategies in concurrent bilateral encounters is demonstrated using an example scenario with one-to-many negotiations in the domain of service-oriented computing. In this scenario, a number of service level agreements need to be negotiated with service providers in order to establish a workflow-based composite service. The scenario shows that the mechanism increases the number of compound agreements through utility boundary decomposition and the redistribution of surplus from successfully finished negotiations, while still allowing the individual agents to use their own decision strategies for negotiation. The major advantage of the proposed mechanisms is their ability to create negotiation strategies that successfully cope with situations in which the available knowledge about the opponents and the environment is limited.
The example scenario also demonstrates the applicability of the mechanisms in a more complex and realistic scenario. Both decision models and the coordination mechanism are validated experimentally.
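The non-monotonicity problem and the two alternative mixing ideas described above can be illustrated with a small numeric sketch. It is a simplified reconstruction under stated assumptions, not the formal definitions used in this thesis: a buyer mixes a linear time-dependent tactic with an absolute tit-for-tat (imitative) tactic, the opponent's offers are hypothetical, offers are capped at the buyer's reservation value, and the names `mix_traditional`, `mix_threads` and `mix_concessions` are illustrative only.

```python
# Illustrative sketch (hypothetical data): a buyer mixes a time-dependent tactic
# with an absolute tit-for-tat tactic. A monotonic buyer never lowers its offer.
LO, HI = 10.0, 20.0                        # buyer's initial offer and reservation value
T = [LO + n for n in range(11)]            # linear time-dependent tactic, one offer per round
seller = [30.0, 20.0] + [20.0] * 9         # opponent concedes 10 once, then stalls
c = [0.0] + [seller[n - 1] - seller[n] for n in range(1, 11)]  # opponent's concessions


def mix_traditional(T, c, w):
    """Weighted sum of absolute offers; tit-for-tat builds on the last *mixed* offer."""
    y = [LO]
    for n in range(1, len(T)):
        imitative = y[-1] + c[n]
        y.append(min(HI, w[n] * T[n] + (1 - w[n]) * imitative))
    return y


def mix_threads(T, c, w):
    """Each tactic keeps its own thread of offers; only the threads are mixed."""
    y, z = [LO], LO
    for n in range(1, len(T)):
        z += c[n]                          # the imitative tactic's private thread
        y.append(min(HI, w[n] * T[n] + (1 - w[n]) * z))
    return y


def mix_concessions(T, c, w):
    """Mix the concession each tactic proposes and add it to the last offer."""
    y = [LO]
    for n in range(1, len(T)):
        step = w[n] * (T[n] - T[n - 1]) + (1 - w[n]) * c[n]  # >= 0 for w in [0, 1] here
        y.append(min(HI, y[-1] + step))
    return y


def is_monotonic(offers):
    return all(b >= a for a, b in zip(offers, offers[1:]))


static = [0.5] * 11
dynamic = [0.0, 0.0] + [1.0] * 9           # weight shifts from imitative to time-dependent

print(mix_traditional(T, c, static)[:4])   # [10.0, 15.5, 13.75, 13.375]
```

In this sketch the traditional mix drops from 15.5 to 13.75 although both tactics only ever propose non-negative steps from their own point of view. With static weights the thread- and concession-based variants coincide, because both start from the same initial offer; with the `dynamic` weights, `mix_threads` falls from 20.0 to 12.0 while `mix_concessions` stays non-decreasing, which mirrors the statement that thread-based mixing guarantees monotonicity for static weights and concession-based mixing also for dynamic weights.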

Acknowledgements

First and foremost, my deepest gratitude goes to my supervisor Professor Ryszard Kowalczyk for taking me on as a PhD student and for his guidance on this research topic throughout these years. Our regular meetings and numerous discussions helped me understand the field and discover my way of doing research. He gave me encouragement and orientation in every moment of difficulty. My heartfelt thanks go to my associate supervisor Professor Matthias Klusch for his kind support and precise feedback, which always got to the core of a problem. I appreciate his way of finding arguments and good solutions, and his motivation, which helped me focus on my research. Special thanks go to Mohan Baruwal Chhetri for his insights when writing publications together, to Irene Moser for her kind support, to Bao Quoc Vo for the interesting and critical discussions that encouraged me to look beyond my research topic, and to all current and former members of the Intelligent Agent Technology Group at Swinburne University of Technology. Swinburne University kindly supported me with a scholarship during my PhD candidature, making it possible for me to concentrate my efforts fully on the research. I also extend my thanks to Professor Bogdan Franczyk, Professor Dieter Ehrenberg, Professor Josef Noll and Professor Hans-Jürgen Kaftan, with whom I enjoyed working before my PhD study, for encouraging me to do the PhD, providing references, and always trusting in my abilities as a computer scientist. This research also benefited tremendously from the many friends I found during my PhD study at the University. In particular, I thank Tino Schlegel, Stefano Bernardi and Bjoern Stütz for the hours spent discussing ideas and our research problems over cups of coffee at the uni or at Mario's cafe.
Most importantly, I thank my wife Franziska for her love and support throughout those years, especially in the difficult times, and my son Philipp, who had to sacrifice so many playing hours with me. I am deeply thankful to my family and friends back in Germany for their motivation and encouragement, especially my parents, to whom I dedicate this thesis.

Declaration

This thesis contains no material which has been accepted for the award of any other degree or diploma, except where due reference is made. To the best of my knowledge, this thesis contains no material previously published or written by another person except where due reference is made in the text of the thesis.

Jan Richter
Date

Publications

Portions of the material in this thesis have previously appeared in the following publications:

1. J. Richter and R. Kowalczyk. New Mechanisms for Mixing Time- and Behaviour-dependent Tactics in Negotiation Strategies. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 08), volume 2.
2. J. Richter and R. Kowalczyk. Mixing Behaviour-dependent and -independent Tactics in Multi-issue Negotiation. In Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 09).
3. J. Richter, R. Kowalczyk, and M. Klusch. Multistage Fuzzy Decision Making in Bilateral Negotiation with Finite Termination Times. In Proceedings of the 22nd Australasian Joint Conference on Advances in Artificial Intelligence, Lecture Notes in Computer Science. Springer Berlin/Heidelberg.
4. J. Richter, M. Klusch, and R. Kowalczyk. On Monotonic Mixed Tactics and Strategies for Multi-issue Negotiation. In Proceedings of the 9th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 10).
5. J. Richter, M. Klusch, and R. Kowalczyk. A Multistage Fuzzy Decision Approach for Modelling Adaptive Negotiation Strategies. In Proceedings of the IEEE International Conference on Fuzzy Systems.
6. J. Richter, M. B. Chhetri, R. Kowalczyk, Q. B. Vo, M. A. Talib, and A. W. Colman. Utility Decomposition and Surplus Redistribution in Composite SLA Negotiation. In Proceedings of the IEEE International Conference on Services Computing.
7. J. Richter, M. Baruwal Chhetri, R. Kowalczyk, and Q. Bao Vo. Establishing Composite SLAs through Concurrent QoS Negotiation with Surplus Redistribution. Concurrency and Computation: Practice and Experience, 2011 (accepted for publication).
8. J. Richter, M. Klusch, and R. Kowalczyk. Monotonic Mixing of Decision Strategies for Agent-based Bargaining. To appear in Proceedings of the Ninth German Conference on Multi-Agent System Technologies (MATES 11), Lecture Notes in Artificial Intelligence. Springer Berlin/Heidelberg.

Contents

Abstract
Acknowledgements
Declaration
Publications
1 Introduction
    Research Questions
    Contributions
    Thesis Overview
2 Background and Preliminaries
    Game-Theoretic Background
    Cooperative Bargaining Theory
    Non-Cooperative Bargaining Theory
    Negotiation Preliminaries
    Negotiation Model
    Negotiation Thread
    Agents' Preferences over Outcomes
    Decision-Making in Automated Negotiation
    Heuristic-based Negotiation Tactics
    Mixing Negotiation Tactics
    Mechanisms for Pareto-Efficient Negotiations
    Learning and Reasoning in Negotiation
    Fuzzy Logic-based Approaches in Negotiation
    Multistage Fuzzy Decision-Making
    Fuzzy Decision-Making
    Multistage Fuzzy Decision Making in Deterministic and Stochastic Systems
    Application Areas for Automated Negotiation
    Simulation Environment and Experimental Evaluation
    Simulation Environment
    General Settings for Experiments
    Summary
3 Monotonic Mixing Mechanisms for Multi-Tactic Negotiation Strategies
    Dynamic Behaviour of Multi-tactic Strategies
    Monotonicity of Negotiation Tactics
    Monotonicity of Multi-tactic Negotiation Strategies
    Constrained linear weighted combination
    Mixing based on Negotiation Threads
    Mixing based on Single Concessions
    Evaluation
    Experiment Settings
    Non-Monotonicity of Concession Curves
    Scenario with Small Overlap and Equal Deadlines
    Scenario with Small Overlap and Different Deadlines
    Scenario with Large Overlap and Equal Deadlines
    Scenario with Large Overlap and Different Deadlines
    Related Work
    Summary
4 Multistage Fuzzy Decision-Making in Automated Negotiation
    Model with Fuzzy State Transitions
    Modelling Negotiation Strategies
    States and Actions
    Fuzzy State Transitions
    Fuzzy Goal
    Fuzzy Constraints
    Modelling Different Negotiation Strategies with Fuzzy Constraints
    Decision Algorithms
    Negotiation Examples
    Agent using Reference Cases
    Agent using Preferred Strategy
    Both Agents using Multistage Fuzzy Decision-Making
    Evaluation
    Experiment Settings
    Scenario with Small Overlap and Equal Deadlines
    Scenario with Small Overlap and Different Deadlines
    Scenario with Large Overlap and Equal Deadlines
    Scenario with Large Overlap and Different Deadlines
    Related Work and Discussion
    Summary
5 Coordinating Strategies in Concurrent Automated Negotiations
    Composite Service Provisioning
    Definitions and Challenges
    Motivating Scenario of Specialized Property Search
    QoS Aggregation
    SLA Negotiation
    Strategic Adjustment of Boundary Values in Multi-tactic Negotiation Strategies
    Strategic Adjustment of Multistage Fuzzy Decision Strategies via Fuzzy Constraints
    Utility Boundary Decomposition and Surplus Redistribution
    Algorithms
    Evaluation
    Experimental Settings
    Scenario with Two Services
    Property Search Scenario
    Related Work
    Summary
6 Conclusions
    Answers to Research Questions
    Outlook and Future Work
Bibliography

List of Figures

2.1 Agreement zone for a single-issue negotiation example
2.2 Polynomial (left) and exponential (right) decision functions (β_poly ∈ {9, 4, 2, 1, 0.5, 0.2, 0.05} and β_exp ∈ {20, 9, 5, 3, 1.8, 1, 0.5, 0.2})
2.3 Fuzzy decision
2.4 Multistage fuzzy decision process
2.5 Interface for single-issue negotiations between two agents
3.1 Offer curves for Examples 3.2 and 3.3 when using the linear weighted combination or pure tactics
3.2 Outcomes for different buyer strategy parameters when using linear weighted combinations of tactics
3.3 Offer and utility curves for Example 3.4 using the traditional linear weighted combination or the negotiation thread-based mixing
3.4 Offer curves for Examples 3.2 and 3.3 when using the constrained linear weighted combination (compared to the traditional linear weighted combination)
3.5 Offer curves for Examples 3.2 and 3.3 when using the negotiation thread-based mixing (compared to the traditional linear weighted combination)
3.6 Offer curves for Example 3.3 when using the concession-based mixing (compared to the traditional linear weighted combination)
3.7 Client (left) and provider (right) average utilities in the one-sided without withdraw scenario (client uses different mixing mechanisms while provider always uses traditional mixing) with small overlap and equal deadlines
3.8 Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with small overlap and equal deadlines
3.9 Client (left) and provider (right) average utilities in the two-sided scenario (both agents use the same mixing mechanism) with small overlap and equal deadlines
3.10 Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with small overlap and equal deadlines
3.11 Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with small overlap and equal deadlines
3.12 Average negotiation length (left) and agreement rates (right) for the two-sided scenario with small overlap and equal deadlines
3.13 Client (left) and provider (right) average utilities in the one-sided without withdraw scenario with small overlap and different deadlines
3.14 Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with small overlap and different deadlines
3.15 Client (left) and provider (right) average utilities in the two-sided scenario with small overlap and different deadlines
3.16 Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with small overlap and different deadlines
3.17 Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with small overlap and different deadlines
3.18 Average negotiation length (left) and agreement rates (right) for the two-sided scenario with small overlap and different deadlines
3.19 Client (left) and provider (right) average utilities in the one-sided without withdraw scenario with large overlap and equal deadlines
3.20 Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with large overlap and equal deadlines
3.21 Client (left) and provider (right) average utilities in the two-sided scenario with large overlap and equal deadlines
3.22 Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with large overlap and equal deadlines
3.23 Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with large overlap and equal deadlines
3.24 Average negotiation length (left) and agreement rates (right) for the two-sided scenario with large overlap and equal deadlines
3.25 Client (left) and provider (right) average utilities in the one-sided without withdraw scenario with large overlap and different deadlines
3.26 Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with large overlap and different deadlines
3.27 Client (left) and provider (right) average utilities in the two-sided scenario with large overlap and different deadlines
3.28 Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with large overlap and different deadlines
3.29 Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with large overlap and different deadlines
3.30 Average negotiation length (left) and agreement rates (right) for the two-sided scenario with large overlap and different deadlines
4.1 Multistage fuzzy decision process of a negotiation agent
4.2 Example fuzzy goal (left) and utility function (right) for a partial overlap of negotiation intervals
4.3 Example fuzzy constraint
4.4 Examples for different time-dependent fuzzy constraints
4.5 Inference example for expected fuzzy goal and fuzzy case constraints
4.6 Example offer curves for an agent using two reference cases and case constraints
4.7 Inference example for the expected fuzzy goal and fuzzy constraints of a preferred strategy
4.8 Example offer curves for an agent using two reference cases and time-dependent fuzzy constraints
4.9 Example offer curves when both agents use the multistage fuzzy decision approach
4.10 Example cases for the multistage fuzzy strategy
4.11 Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with small overlap and equal deadlines
4.12 Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with small overlap and equal deadlines
4.13 Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with small overlap and different deadlines
4.14 Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with small overlap and different deadlines
4.15 Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with large overlap and equal deadlines
4.16 Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with large overlap and equal deadlines
4.17 Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with large overlap and different deadlines
4.18 Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with large overlap and different deadlines
5.1 Composite service provisioning scenario
5.2 Service process fulfilling the business service of finding suitable properties
5.3 Service process in tree form
5.4 Negotiation example with and without smoothing function for changing boundaries
5.5 Average end-to-end utility (top) and agreement rate (bottom) without and with surplus redistribution for the scenario with two services and agents using static mixed strategies with the traditional linear weighted combination
5.6 Average end-to-end utility (top) and agreement rate (bottom) without and with surplus redistribution for the scenario with two services and agents using static mixed strategies with the negotiation thread-based mechanism
5.7 Average end-to-end utility (top) and agreement rate (bottom) without and with surplus redistribution for the scenario with two services and agents using static mixed strategies with the concession-based mechanism
5.8 Average end-to-end utility (left) and agreement rate (right) without and with surplus redistribution for the scenario with two services and agents using the multistage fuzzy strategy with different time-dependent fuzzy constraints
5.9 Average end-to-end utility (top) and agreement rate (bottom) without and with surplus redistribution for the property search scenario with eight services and agents using static mixed strategies with the traditional linear weighted combination
5.10 Average end-to-end utility (top) and agreement rate (bottom) without and with surplus redistribution for the property search scenario with eight services and agents using static mixed strategies with the negotiation thread-based mechanism
5.11 Average end-to-end utility (top) and agreement rate (bottom) without and with surplus redistribution for the property search scenario with eight services and agents using static mixed strategies with the concession-based mechanism
5.12 Average end-to-end utility (left) and agreement rate (right) without and with surplus redistribution for the property search scenario with eight services and agents using the multistage fuzzy strategy with different fuzzy constraints

List of Tables

2.1 Parameters for strategy groups
Negotiation settings for example
Non-monotonicity in negotiations
Aggregation functions

Chapter 1
Introduction

Negotiation is an important form of social interaction that facilitates conflict resolution between individuals, organisations or other human parties. For that reason, negotiation has been studied extensively from different perspectives and in many research areas, including the social sciences [106], economics [108, 83] and psychology [32]. The rapid development of computing systems and networks over the past few decades and the emergence of large and complex distributed systems, such as the Grid [48], the semantic web [10], service-oriented computing [40] or, more recently, cloud computing [22], have led to a demand for new types of interaction between software components and between humans and computing systems. Automated negotiation is a key mechanism for resolving conflicts in decentralized systems composed of intelligent computational agents. The agents, acting autonomously on behalf of their users, interact with each other in order to fulfil certain goals or objectives. Due to the characteristics of these environments, namely their open and dynamic architecture, computational distribution, lack of global or centralized knowledge and dispersed control of resources [48], automated negotiation has been studied widely in game theory [12], artificial intelligence [50] and agent technology [64, 89], and for its potential in many real-world applications. These include e-commerce [137], resource allocation and scheduling [86], task distribution [15] and, lately, service composition [28].

A fundamental setting is that of bilateral negotiation, in which two agents bargain for a product or service by alternately exchanging offers [122]. The preferences of an agent are typically represented by its utility function, including its reservation limits and deadlines. The utility function orders all possible outcomes by assigning a score to each value in the outcome space. In order to find optimal decision strategies and solutions, classical game-theoretic approaches make strict assumptions of complete knowledge and unbounded rationality. When the agents have complete knowledge, their preferences, or at least the beliefs about these preferences, are common knowledge. If an agent is unboundedly rational, it is endowed with an unlimited capacity for reasoning and decision-making. This makes it possible to find optimal decision strategies and solutions, although in large problem domains this can become intractable [78]. More realistic assumptions in open and distributed systems, however, are that agents do not know the decision models and preferences of their opponents, i.e. that these are private information, and that they have limited resources. For that reason, research in artificial intelligence focuses on tractable and more realistic approaches for an agent's decision-making that aim to find good, rather than optimal, solutions. In this field, a common distinction based on the interaction behaviour of an agent is whether it is competitive or cooperative. For example, in multi-issue negotiations cooperative agents can submit partial preferences to a trusted, unbiased third party such as a mediator, who helps the agents to find efficient solutions, i.e. outcomes that maximize social welfare. Such a mediator, however, might not be available or trusted in such environments. When agents are competitive, they neither disclose any information about their decision apparatus or their preferences nor use a third party to mediate. The agents are rational in the sense that they aim to achieve outcomes with the highest possible score, but at the same time have a common interest in finding an agreement before an agent reaches its deadline.
It is this characteristic that makes negotiation under incomplete information unique: the parties involved find themselves in a conflict that incorporates competitive and cooperative elements at the same time [64]. Decision-making in such situations is known to be hard [45, 81, 89], and a large number of models have been proposed and investigated to solve this problem. These range from if-then rules and heuristic-based approaches [44] to more advanced learning and reasoning techniques [15] such as case-based reasoning [132], Bayesian reasoning [148], evolutionary algorithms [93], reinforcement learning [26], neural networks [103] or non-linear regression analysis [18]. To negotiate effectively and efficiently, many of the proposed decision approaches are built upon relatively strong assumptions. For example, in experience-based approaches the agents are required either to have prior knowledge in the form of empirical data, or to learn by exploring the environment and the negotiation partner's behaviour. Probabilistic approaches assume that the agent has partial knowledge in the form of probability distributions over some of the opponent's parameters, e.g. derived from domain knowledge or historical interactions, while regression-based approaches assume a set of underlying decision models from which the opponent may choose. In many situations, these assumptions are difficult to fulfil due to the above-mentioned characteristics of such systems, the competitive interactions, or the long learning times required to gain precise knowledge. Most importantly, the decision models have to cope with the dynamic nature of the system, since the agents exhibit different behaviours and may enter or leave the system at any time. As a result, the knowledge an agent has at its disposal is limited. For example, it might be derived from the information available from a few past interactions, the current encounters, or some states of the environment.

This thesis focuses on decision-making strategies for automated negotiation in competitive environments that are able to react to the dynamic nature of the system and the different behaviours of opponents when only limited knowledge about the negotiation partner or environment is available. Of particular interest is the strategic concession behaviour of an agent, i.e. when and how much an agent should concede in order to obtain the best outcomes.
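The question of when and how much to concede is commonly illustrated with time-dependent tactics. The sketch below uses the polynomial and exponential decision functions in the form popularised by Faratin, Sierra and Jennings, plus a minimal alternating-offers loop for a single price issue. The linear utility, the acceptance rule (accept when the opponent's last offer is at least as good as one's own next offer), the `Agent` class and all numbers are assumptions for illustration, not settings taken from this thesis.

```python
import math
from dataclasses import dataclass

# Illustrative sketch: Faratin-style decision functions; the utility function
# and the acceptance rule below are assumptions, not this thesis's model.


def alpha_poly(t, t_max, beta, kappa=0.0):
    """Polynomial decision function: kappa at t = 0, 1 at the deadline."""
    return kappa + (1.0 - kappa) * (min(t, t_max) / t_max) ** (1.0 / beta)


def alpha_exp(t, t_max, beta, kappa=0.01):
    """Exponential decision function; kappa must be > 0 because of log(kappa)."""
    return math.exp((1.0 - min(t, t_max) / t_max) ** beta * math.log(kappa))


@dataclass
class Agent:
    role: str       # "buyer" (concedes upwards) or "seller" (concedes downwards)
    lo: float       # lower end of the agent's price interval
    hi: float       # upper end of the agent's price interval
    deadline: int
    beta: float     # beta < 1: Boulware (late concession), beta > 1: conceder

    def offer(self, t):
        a = alpha_poly(t, self.deadline, self.beta)
        if self.role == "buyer":
            return self.lo + a * (self.hi - self.lo)
        return self.hi - a * (self.hi - self.lo)

    def utility(self, price):
        """Linear utility: 1 at the agent's best price, 0 at its reservation price."""
        span = self.hi - self.lo
        return (self.hi - price) / span if self.role == "buyer" else (price - self.lo) / span


def negotiate(buyer, seller):
    """Alternating offers with the buyer opening. Returns (price, t) or None."""
    last, t = None, 0
    while True:
        agent = buyer if t % 2 == 0 else seller
        if t > agent.deadline:
            return None                          # deadline passed: withdraw
        own = agent.offer(t)
        if last is not None and agent.utility(last) >= agent.utility(own):
            return last, t                       # accept the opponent's last offer
        last = own
        t += 1


buyer = Agent("buyer", lo=10.0, hi=20.0, deadline=20, beta=1.0)
seller = Agent("seller", lo=12.0, hi=25.0, deadline=20, beta=1.0)
print(negotiate(buyer, seller))   # agreement at about 16.55, reached at t = 14
```

With linear tactics (β = 1) both agents concede at a constant rate, so the agreement point depends only on the overlap of their intervals and their deadlines; lowering β towards Boulware behaviour delays concessions and, with the same deadlines, shifts the outcome in the patient agent's favour or pushes it past a deadline.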
The work first presents and investigates two decision mechanisms that are suitable for situations in which the agent has only limited knowledge, an existing heuristic-based approach and a novel decision model based on multistage fuzzy decision-making, and then proposes a mechanism for coordinating such strategies in more complex and realistic concurrent negotiation scenarios. We investigate the method of mixing heuristic-based decision functions, or tactics [46, 44], in order to create multi-tactic negotiation strategies in the absence of prior information. Although this method has the advantage of being able to react to a range of different factors simultaneously, such as the opponent's behaviour, the remaining time or the state of a resource, the mixing mechanism itself and its effect on the strategic concession behaviour of an agent have not been investigated previously. The traditional method of a linear combination of offers can exhibit quite complex and dynamic behaviour, but poses an important problem in that it cannot guarantee monotonic concession curves even in cases where all involved tactics are monotonic, i.e. they propose positive concessions, and mixing weights are static. Since this behaviour might not be desirable in many situations, alternative mixing mechanisms are proposed, based on linear combinations of either individual negotiation threads for each imitative tactic or single concessions. They guarantee monotonic concession curves for monotonic tactics, the thread-based mechanism for static weights and the concession-based mechanism for both static and dynamic weights.

A novel decision model for an agent's negotiation strategy based on multistage fuzzy decision-making is proposed. In this model, the agents' individual preferences are expressed via fuzzy goals and constraints, whilst the dynamics of the negotiation are modelled as a fuzzy Markov decision process that represents the relation between the strategic concession behaviours of the two agents involved. The offers and counteroffers of the agents correspond to state-action pairs in the negotiation process, so that individual fuzzy state transitions enable an agent to utilize limited knowledge about the concession behaviour of its opponent, for example, by using only a few reference cases. The problem of finding the best course of action to achieve the desired outcome can then be solved via fuzzy dynamic programming. By imposing the fuzzy constraints on the decision process, an agent is able to generate different concession behaviours depending on the chosen preferences. Furthermore, the fuzzy representation of the decision process in this negotiation context allows the decision strategy to be applied in many real-world scenarios in which the available information about agents' behaviours, preferences and constraints is imprecise.

Finally, a mechanism for the coordination of negotiation strategies in one-to-many bilateral concurrent negotiations is presented using a more realistic and complex example scenario in the domain of service-oriented computing.
In this scenario, a number of service level agreements need to be negotiated with service providers in order to establish a workflow-based composite service. The mechanism uses utility boundary decomposition to derive the negotiation limits for each atomic service agent and subsequently redistributes the surplus of successfully finished negotiations in order to increase the number of compound agreements. At the same time, the agents at the service level remain in control of their concession behaviour and are able to negotiate competitively. The example scenario further demonstrates the applicability of the decision-making approaches presented in this thesis. An experimental evaluation is presented to validate each of the decision mechanisms.
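Utility boundary decomposition and surplus redistribution can be sketched for the simplest possible case. The sketch assumes a single additive QoS attribute (total price) bounded by a client budget, splits that budget among the atomic services in proportion to hypothetical weights, and, whenever a negotiation finishes below its limit, hands the unused amount to the negotiations that are still open. The class `BudgetCoordinator`, the proportional rule and the service names `s1` to `s3` are illustrative assumptions; the mechanism in this thesis works with utility boundaries and several QoS aggregation functions, and this sketch does not handle failed negotiations.

```python
class BudgetCoordinator:
    """Hypothetical coordinator for one-to-many negotiations over an additive price."""

    def __init__(self, budget, weights):
        total = sum(weights.values())
        self.weights = dict(weights)
        # Decomposition: each atomic service receives a share of the end-to-end budget.
        self.limits = {s: budget * w / total for s, w in weights.items()}
        self.open = set(weights)

    def agreed(self, service, price):
        """Record an agreement and redistribute any surplus to still-open negotiations."""
        if service not in self.open:
            raise ValueError(f"{service} is not an open negotiation")
        if price > self.limits[service]:
            raise ValueError("agreement exceeds the service's current limit")
        self.open.remove(service)
        surplus = self.limits[service] - price
        self.limits[service] = price
        open_weight = sum(self.weights[s] for s in self.open)
        # If nothing is left open, the surplus simply stays unspent.
        if surplus > 0 and open_weight > 0:
            for s in self.open:
                self.limits[s] += surplus * self.weights[s] / open_weight
        return surplus


coord = BudgetCoordinator(100.0, {"s1": 2, "s2": 1, "s3": 1})
print(coord.limits)               # {'s1': 50.0, 's2': 25.0, 's3': 25.0}
coord.agreed("s1", 40.0)          # s1 settles 10.0 below its limit
print(coord.limits)               # {'s1': 40.0, 's2': 30.0, 's3': 30.0}
```

While any negotiation remains open, the limits always sum to the original budget, so the composite constraint is never violated; the remaining agents simply receive more room to concede, which is the intuition behind raising the number of compound agreements.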

1.1 Research Questions

The work presented in this thesis is driven by the following research questions concerning decision-making situations in which agents face the problem of limited knowledge during negotiation:

Mixing mechanisms for multi-tactic negotiation strategies with static and dynamic weights guaranteeing monotonic concession curves for monotonic tactics
1. What mechanisms can generate monotonic concession curves in multi-tactic negotiation strategies when monotonic tactics are mixed using static or dynamic weights?
2. How much do outcomes differ when using the alternative mixing mechanisms compared to the traditional method, and in which scenarios can an agent improve its utility?

A decision model for an agent's strategic concession behaviour in automated negotiation based on multistage fuzzy decision-making
1. How can an agent model the negotiation process as a multistage fuzzy decision problem when only limited knowledge about the concession behaviour of the opponent is available?
2. What are the advantages and disadvantages of the proposed multistage fuzzy decision model compared to the heuristic-based or other approaches, and in which scenarios can an agent gain in utility by using this approach?

A mechanism for coordinating negotiation strategies in concurrent negotiations based on utility boundary decomposition and surplus redistribution
1. How can negotiation strategies be efficiently coordinated in more complex negotiation scenarios with many bilateral concurrent negotiations?
2. Are the proposed negotiation decision models and mechanisms applicable in real-world domains such as service-oriented computing?

The above research questions will be answered in Chapter 6.
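To give the question of modelling negotiation as a multistage fuzzy decision problem a concrete shape, the sketch below implements the standard max-min backward recursion of fuzzy dynamic programming for a system with fuzzy state transitions, in the Bellman-Zadeh tradition. The negotiation interpretation is a hypothetical toy: a state is the remaining gap between the two agents' offers (0 means agreement), an action is the agent's own concession step, the fuzzy transitions encode an assumed belief that the opponent usually concedes one step, the fuzzy goal prefers small final gaps, and the fuzzy constraint penalises large concessions. None of the numbers come from this thesis, and `fuzzy_dp` is an illustrative name.

```python
from itertools import product


def fuzzy_dp(states, actions, transition, goal, constraints):
    """Backward max-min recursion for multistage fuzzy decision-making.

    transition[(x, u)][x2]: membership of next state x2 after action u in state x
    goal[x]: membership of a final state in the fuzzy goal
    constraints[t][u]: membership of action u in the stage-t fuzzy constraint
    Returns value[t][x] (membership of the expected goal) and policy[t][x].
    """
    horizon = len(constraints)
    value = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    value[horizon] = dict(goal)
    for t in range(horizon - 1, -1, -1):
        for x in states:
            best_u, best = None, -1.0
            for u in actions:  # ascending order: ties favour the smaller concession
                expected = max(min(transition[(x, u)].get(x2, 0.0), value[t + 1][x2])
                               for x2 in states)
                score = min(constraints[t][u], expected)
                if score > best:
                    best_u, best = u, score
            value[t][x], policy[t][x] = best, best_u
    return value, policy


# Hypothetical toy model: state = remaining gap between offers (0 means agreement),
# action = own concession step; the opponent is assumed to concede 1 step (mu = 1.0)
# or 0 steps (mu = 0.5), as might be read off a few reference cases.
states, actions = [0, 1, 2, 3], [0, 1, 2]
transition = {}
for x, u in product(states, actions):
    nxt = {}
    for k, mu in ((1, 1.0), (0, 0.5)):
        x2 = max(0, x - u - k)
        nxt[x2] = max(nxt.get(x2, 0.0), mu)
    transition[(x, u)] = nxt
goal = {0: 1.0, 1: 0.6, 2: 0.2, 3: 0.0}                      # prefer small final gaps
constraints = [{0: 1.0, 1: 0.7, 2: 0.3} for _ in range(2)]  # penalise large concessions

value, policy = fuzzy_dp(states, actions, transition, goal, constraints)
print(policy[0][3], policy[1][2], value[0][3])   # 0 1 0.7
```

With these numbers the agent holds its position in the first stage and concedes one step in the second, a Boulware-like pattern that emerges purely from combining the fuzzy constraint with the assumed opponent transitions; making the constraint stage-dependent would be one way to express other concession preferences.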

1.2 Contributions

The main contributions of this thesis are as follows:

1. Mixing mechanisms for multi-tactic negotiation strategies with static and dynamic weights guaranteeing monotonic concession curves for monotonic tactics
We investigate the dynamic behaviour of heuristic, multi-tactic negotiation strategies created by linear weighted combinations, and demonstrate that non-monotonicity in the concession curves of the agents can also occur when imitative and non-imitative tactics are mixed using static weights, even if all tactics involved are monotonic. We discuss the undesirable effects that this mixing technique can cause in negotiation situations with limited knowledge, and propose two new mixing mechanisms that solve this problem: the first is based on individual negotiation threads and the second on the single concessions of each tactic. We prove that these mechanisms guarantee monotonic concession curves for monotonic tactics, the first for static weights and the second also for dynamic weights. An experimental evaluation validates the proposed mechanisms and compares them against the traditional method. This work has been in part published in [114, 115, 112, 113].

2. A decision model for an agent's strategic concession behaviour in automated negotiation based on multi-stage fuzzy decision making
We propose a new decision model for an agent's strategic concession behaviour in automated negotiation based on multi-stage fuzzy decision making. In this model, the agent's preferences are modelled using a fuzzy goal and fuzzy constraints, while the fuzzy state transitions are created from limited and imprecise knowledge, for example, from only a few reference cases. We show how the fuzzy constraints enable an agent to impose different strategic preferences on the decision-making process in order to create different concession behaviours. We present the decision algorithms for the model and discuss its limitations and advantages compared to other approaches. We validate the model in a series of experiments with different strategy settings and negotiation deadlines, demonstrating that this modelling framework can provide utility gains in many scenarios in which only limited and uncertain knowledge is available. This work has been in part published in [116, 111].

3. A mechanism for coordinating negotiation strategies in concurrent negotiations based on utility boundary decomposition and surplus redistribution
Using a more realistic example scenario in the domain of service-oriented computing, we present a decision mechanism that enables the coordination of negotiation strategies in complex, one-to-many bilateral negotiations with limited knowledge. In the chosen scenario, a number of agents concurrently negotiate service level agreements with service providers in order to establish a workflow-based composite service. We show that the mechanism can increase the number of compound agreements by means of utility boundary decomposition and the redistribution of the surplus of successfully finished negotiations, while leaving control over the concession behaviour to the individual service agents in order to enable competitive negotiations. An experiment using the example SLA negotiation scenario demonstrates the applicability of the coordination mechanism and of the decision-making strategies proposed in this thesis. This work has been in part published in [110, 109].

1.3 Thesis Overview

The thesis is organised in six chapters; the remaining five are outlined below.

Chapter 2 introduces basic notions of automated negotiation and reviews related work on decision-making in single- and multi-issue negotiation, as well as various learning and reasoning models studied in this research area. It also points out a number of potential application areas and scenarios presented in the literature.

Chapter 3 investigates the dynamic aspects of heuristic, multi-tactic strategies that are suitable for decision-making in negotiations with limited available knowledge. It examines the monotonic concession behaviour of such strategies when the tactics involved are mixed using static or dynamic weights, and also when the tactics are of different types, such as imitative and non-imitative.
The problem of the uncontrolled occurrence of non-monotonic concession curves in the static case is demonstrated, and new mixing mechanisms are presented that solve this problem based on linear weighted combinations of the single concessions or of the individual negotiation threads of each of the imitative tactics involved. By means of descriptive examples and negotiation experiments, we show that such undesired behaviour can change the outcome or delay agreements significantly in many scenarios, and that the proposed mixing mechanisms guarantee monotonic concession curves for monotonic tactics.

Chapter 4 presents the multi-stage fuzzy decision model and shows how dynamic negotiation strategies are modelled when only limited knowledge about the negotiation partner's concession behaviour is available, for example, in the form of reference cases or a few past interactions. It is shown how the preferences of an agent are modelled using a fuzzy goal and fuzzy constraints, and how the concession behaviour of an opponent is represented by the fuzzy state transitions. Different negotiation strategies are modelled using time-dependent fuzzy constraints, which allow the agent to influence the course of action suggested by the reference cases. The solution method of fuzzy dynamic programming is presented, in addition to the decision-making algorithm for policy generation and the proposal of offers during the negotiation encounter. A number of negotiation scenarios with different deadlines, negotiation intervals and strategies validate the proposed decision model, followed by a discussion of the experimental results.

Chapter 5 presents the coordination mechanism for negotiation strategies in more complex and realistic scenarios with multiple concurrent bilateral negotiations. It illustrates an example scenario in a service-oriented computing environment, in which a number of agents negotiate service level agreements with service providers in order to establish a composite service while having only limited knowledge about the provider agents. The chapter presents the algorithms for the utility boundary decomposition and the generation of reservation limits for each atomic service agent, as well as for the redistribution of the surplus of successfully finished negotiations among the remaining negotiations.
Finally, an experiment demonstrates the improvement in the number of compound agreements and in utility that these methods can achieve, and shows that the decision mechanisms presented in this thesis are applicable in more complex negotiation scenarios.

Chapter 6 draws conclusions about the decision mechanisms presented, answers the research questions and discusses directions for future work in the area of decision-making in automated negotiation with limited knowledge.
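The static-weight effect summarised above for Chapter 3 can be reproduced with a minimal Python sketch. It assumes a buyer that mixes a linear time-dependent tactic with an absolute tit-for-tat tactic using a static weight, and assumes that in the traditional combination the imitative tactic builds on the agent's last sent (mixed) offer. The price range, deadline, seller offers and all function names are hypothetical. The `separate_thread=True` variant, in which the tit-for-tat tactic builds on its own previous tactic offer, is only a simplified reading of the idea of mixing based on individual negotiation threads, not the mechanism developed in Chapter 3.

```python
from typing import List

# Hypothetical buyer settings (not from the thesis): initial offer LOW,
# reservation price HIGH, deadline after DEADLINE rounds.
LOW, HIGH, DEADLINE = 10.0, 50.0, 10


def clamp(x: float) -> float:
    """Keep an offer within the buyer's acceptable range [LOW, HIGH]."""
    return min(max(x, LOW), HIGH)


def time_tactic(n: int, beta: float = 1.0) -> float:
    """Polynomial time-dependent tactic: concede from LOW towards HIGH by the deadline."""
    alpha = (min(n, DEADLINE) / DEADLINE) ** (1.0 / beta)
    return LOW + alpha * (HIGH - LOW)


def mixed_offers(opponent: List[float], w: float, separate_thread: bool) -> List[float]:
    """Mix the time tactic (weight w) with absolute tit-for-tat (weight 1 - w).

    separate_thread=False: tit-for-tat builds on the agent's last *mixed* offer.
    separate_thread=True:  tit-for-tat builds on its own previous tactic offer.
    """
    offers = [time_tactic(0)]
    tft_previous = offers[0]
    for n in range(1, len(opponent)):
        # The seller concedes by lowering its price; mirror that amount upwards.
        concession = max(opponent[n - 1] - opponent[n], 0.0)
        base = tft_previous if separate_thread else offers[-1]
        tft_previous = clamp(base + concession)
        offers.append(w * time_tactic(n) + (1.0 - w) * tft_previous)
    return offers


# One large concession by the seller, followed by no further concessions.
seller = [100.0, 60.0, 60.0, 60.0, 60.0]
traditional = mixed_offers(seller, w=0.5, separate_thread=False)
per_thread = mixed_offers(seller, w=0.5, separate_thread=True)
print(traditional)  # [10.0, 32.0, 25.0, 23.5, 24.75]
print(per_thread)   # [10.0, 32.0, 34.0, 36.0, 38.0]
```

Both tactics are monotonic on their own, yet the traditionally mixed offers drop from 32 to 25 once the seller stops conceding, because the imitative part is anchored to an offer that the time tactic had pulled upwards. In the per-thread variant the mixed offer is a fixed convex combination of two monotonic sequences and therefore stays monotonic.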

Chapter 2 Background and Preliminaries

This chapter provides the fundamentals of automated negotiation and discusses the problem of strategic decision-making in bilateral negotiations. It presents the game-theoretic background and the preliminaries of negotiation, such as the negotiation model, the negotiation thread and an agent's preferences, and introduces the basic heuristic-based decision model for the strategic concession-making of an agent in a negotiation. Various approaches proposed in related work for an agent's decision-making in different negotiation situations are reviewed. The chapter also discusses approaches in the field of Artificial Intelligence (AI) for learning and reasoning about the opponent's behaviour in order to make better decisions in negotiation situations in which preferences and decision models are private and the available information is limited or uncertain. It introduces the decision models of multi-stage fuzzy control that provide the basis for the multi-stage fuzzy decision model for negotiation examined in Chapter 4 of this thesis. Finally, important areas of potential application for distributed negotiation are outlined, and the experimental environment used to evaluate the decision strategies proposed in this thesis is presented.

2.1 Game-Theoretic Background

The field of game theory has laid the foundation for negotiation research from the economics perspective, and provides insight into the decision-making process of the parties, especially through its study of bargaining games. While game-theoretic research focuses on finding unique solutions that are optimal given the preferences of all agents and the set of their possible choices, it also aims at the analysis of equilibrium solutions and the strategies that lead to them. The outcomes of a bargaining game are often expressed in terms of utilities. The utility function of a player assigns a value to each possible outcome of the game, specifying how much the player prefers that outcome. A bargaining solution is then typically represented as a utility pair of both players, selected from the set of feasible utility pairs. In order to enable mathematical analysis, game theory often makes strict and simplifying assumptions; the most common are that players have complete knowledge about the game and its players, and that all players are unboundedly rational. The assumption of common knowledge implies that all players not only know the rules of the game, but also have full knowledge about the preferences of the other players, or at least about the beliefs regarding these preferences. The rationality assumption means that a player selects the strategy that maximizes its payoff from the space of all strategies, given all possible interactions (and strategies of the other players) and its beliefs. This implies that the agents are endowed with unlimited computational resources to allow such reasoning and the calculation of optimal decision strategies. The resulting intractability of many game-theoretic approaches makes their application impractical for many realistic negotiation situations [64]. Under the above assumptions, bargaining theory distinguishes between cooperative and non-cooperative approaches which, depending on whether an agreement is binding or not, focus on different aspects of bargaining games, such as optimal solutions given a set of axioms, or the equilibrium strategies given the decision-making process of the parties during the game. The next sections give a brief overview of both theories.
For more details, we refer to the excellent surveys and introductions on bargaining theory in relation to negotiation mechanisms in artificial intelligence in [80, 50, 64, 84].

Cooperative Bargaining Theory

In cooperative bargaining theory, the parties are assumed to be able to discuss the situation and agree on joint actions, while the agreement is assumed to be binding for both parties [97]. This means that an agreement is enforceable, for example, by a third party that can impose a penalty on any party deviating from the agreement.

Under this assumption, cooperative bargaining theory is able to focus on the space of possible outcomes of a bargaining game while leaving the process of negotiation unspecified. In other words, it abstracts away from the actual details of the game and the decision processes of all parties by defining a set of axioms which, by representing desirable properties, uniquely define a rational solution. Such an approach was first proposed by Nash [97], who defined the following axioms for a bargaining solution: (a) independence of the utility scale, (b) each outcome pair is rational and Pareto-efficient, (c) independence of irrelevant alternatives, and (d) both parties get the same utility in symmetric situations. The first axiom means that the final solution should not depend on the scale of the players' utility functions (i.e. the same outcome is obtained after a positive affine transformation of a utility function), since players may use different functions to represent their preferences. The second axiom states that a solution is rational if it gives each player a utility that is at least as large as its utility at the disagreement point, and that it is Pareto-efficient if no other solution can increase the utility of one player without making another player worse off. The third axiom states that if the set of feasible agreements is reduced but still contains the agreed solution, the solution remains unchanged; the removed alternatives are irrelevant. The last axiom holds for cases in which the parties have the same preference structure and, as a result, obtain the same utility. Nash proved that under these properties there is a unique Nash bargaining solution, which corresponds to the payoff pair $s = (x_1, x_2)$ that maximizes the so-called Nash product $(x_1 - d_1)(x_2 - d_2)$, where $x_1$ and $x_2$ are the payoffs of agents 1 and 2, respectively, for solution $s$, and $d_1$ and $d_2$ are their payoffs in case of a disagreement (the conflict point).
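To make the axiomatic solutions concrete, the following Python sketch selects the Nash, utilitarian and Kalai-Smorodinsky outcomes for the division of a surplus over a finite grid of splits. The surplus size, the utility functions and the function name are hypothetical. The Kalai-Smorodinsky outcome is approximated by maximising the smaller of the two normalised gains, which on a strictly decreasing utility frontier picks the point where both parties obtain the same fraction of their ideal gains.

```python
import math
from typing import Callable, Dict, List, Tuple

Utility = Callable[[float], float]


def bargaining_solutions(grid: List[float], u1: Utility, u2: Utility,
                         d: Tuple[float, float] = (0.0, 0.0)) -> Dict[str, float]:
    """Select axiomatic bargaining solutions from a finite grid of splits.

    Each grid point x is the share of the surplus allocated to player 1.
    """
    # Keep only individually rational outcomes (at least the conflict payoff).
    feasible = [x for x in grid if u1(x) >= d[0] and u2(x) >= d[1]]

    def gain1(x: float) -> float:
        return u1(x) - d[0]

    def gain2(x: float) -> float:
        return u2(x) - d[1]

    # Nash: maximise the product of gains over the conflict point.
    nash = max(feasible, key=lambda x: gain1(x) * gain2(x))
    # Utilitarian: maximise the sum of (assumed comparable) utilities.
    utilitarian = max(feasible, key=lambda x: u1(x) + u2(x))
    # Kalai-Smorodinsky: equalise the fractions of each player's ideal gain.
    ideal1 = max(gain1(x) for x in feasible)
    ideal2 = max(gain2(x) for x in feasible)
    ks = max(feasible, key=lambda x: min(gain1(x) / ideal1, gain2(x) / ideal2))
    return {"nash": nash, "utilitarian": utilitarian, "kalai_smorodinsky": ks}


# Hypothetical example: a surplus of 30 units; player 1 is risk-neutral,
# player 2 has a concave (risk-averse) utility; conflict point (0, 0).
grid = [i * 0.25 for i in range(121)]  # 0.0, 0.25, ..., 30.0
solutions = bargaining_solutions(grid, lambda x: x, lambda x: math.sqrt(30.0 - x))
print(solutions)  # {'nash': 20.0, 'utilitarian': 29.75, 'kalai_smorodinsky': 18.5}
```

The three solutions select different splits of the same surplus. The utilitarian outcome is not invariant to rescaling player 2's utility, which illustrates why it violates the first of Nash's axioms.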
In particular, Nash's third axiom of independence of irrelevant alternatives has been discussed controversially and, as a result, other axiomatic bargaining solutions have been proposed, such as the Kalai-Smorodinsky or the utilitarian solution [70]. The former replaces Nash's third axiom with an axiom of individual monotonicity. It uses the maximum utility that each party can feasibly obtain, relative to the disagreement point, and selects the Pareto-efficient outcome at which both parties obtain the same proportion of their maximum possible gains. The latter, the utilitarian solution, aims at maximizing social welfare, i.e. the sum of the individual players' utilities of an outcome. This implies that the first axiom of independence of the utility scale no longer holds and that the players' utilities must be comparable. Bargaining solutions have also been considered from the more practical viewpoint of an outcome's fairness given each partner's preferences, for example, by Raiffa [108]. Besides the assumption of a binding agreement, the axiomatic approaches of cooperative bargaining theory require full knowledge of the details of the game and the complete preferences of each party. This makes these rather theoretical approaches impractical for determining solutions in more realistic negotiation situations with no common knowledge and limited rationality. It should be noted that the notion of cooperativeness also appears in relation to Pareto-efficiency in bargaining situations with incomplete information and multiple issues. In such cases, the parties can aim to obtain an outcome on the Pareto frontier, for example, by making trade-off proposals. Such approaches are considered especially in the field of artificial intelligence and are discussed in more detail in Section

Non-Cooperative Bargaining Theory

In non-cooperative bargaining theory, the agreement is assumed to be non-binding, i.e. it cannot be enforced, for example, by a third party. The players in the game make decisions independently and, while they may be able to cooperate, any cooperation that occurs is self-enforcing. Non-cooperative bargaining theory therefore focuses on the negotiation process and its determining factors, such as the specific rules of the game, or protocol, and the decision apparatus the parties may use. Given the protocol and the set of the players' strategies, the aim is then to find the equilibrium solutions that determine rational outcomes of the game. Generally, a strategy is said to be an equilibrium strategy if the player has no incentive to deviate from it given the strategies of all other players. Common equilibrium concepts are, for example, dominant strategies, the Nash equilibrium or the subgame perfect equilibrium. While a dominant strategy is optimal in every situation, i.e. for any strategies of the other players, the Nash equilibrium represents a strategy combination in which no player can gain by unilaterally changing its strategy.
The subgame perfect equilibrium is a particular equilibrium concept for tree-like, extensive-form games, in which the strategies form a Nash equilibrium in every subgame. A fundamental setting in non-cooperative bargaining is that of dividing a surplus between two parties, which has led to the study of different protocols and games such as the Nash bargaining game, the ultimatum game, the monotonic concession protocol or the alternating offers game [13]. In the Nash bargaining game, two players simultaneously demand a share of the surplus without knowing the other player's demand. If the sum of the demands does not exceed the surplus, both players get what they requested; otherwise they receive only the disagreement payoff (conflict outcome). In this game, all Pareto-efficient outcomes represent Nash equilibria [98], as does the case in which both parties ask for the whole surplus. In the ultimatum game, one player proposes a split of the surplus, whereas the other player can only accept or refuse this proposal. In the latter case, neither player gets a share. The proposer thus has more bargaining power than its partner. The game has an infinite number of Nash equilibria but only one subgame perfect equilibrium, in which the proposing agent demands the whole surplus and the partner accepts [13]. The alternating offers game is a multi-stage extension of the ultimatum game: after the first player has proposed a share of the surplus, the second player can accept the offer or refuse it and make a counterproposal at the next stage. Conversely, if the first player refuses the second player's counteroffer, the first player proposes a new counteroffer at the following stage, and so on. The process finishes when one party agrees or a finite deadline is reached. A version of this game for a single issue with a finite number of alternatives and a finite deadline has been studied in [128], where the theoretical analysis is simplified by the assumption that neither party can increase its demands. Under the further assumptions of perfect rationality and complete information, optimal strategies are obtained by backward induction, starting with the last stage. Rubinstein [122] presents a variant of the alternating offers game with an infinite horizon and continuous alternatives. Because, in general, the game could go on forever if no player accepts an offer, two models with different kinds of discounting are analysed: the first with a fixed bargaining cost for each period, and the second with a fixed discount factor for each player. The discounting thus reflects how impatient a party is.
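Backward induction in the alternating offers game can be sketched in a few lines of Python. The sketch assumes a unit surplus, per-round discount factors `delta1` and `delta2`, a responder who accepts when indifferent, and a zero payoff if no agreement is reached by the last round; the function names are hypothetical. With one round it reduces to the ultimatum game, and as the horizon grows the result approaches Rubinstein's well-known closed form $(1 - \delta_2)/(1 - \delta_1 \delta_2)$ for the first proposer's share.

```python
def spe_share(rounds: int, delta1: float, delta2: float) -> float:
    """Subgame perfect share of a unit surplus for player 1, who proposes in round 0.

    Player 1 proposes in even rounds and player 2 in odd rounds; if no offer is
    accepted by the last round, both players receive 0.
    """
    if rounds < 1:
        raise ValueError("at least one round is required")
    proposer_share = 1.0  # last round: the proposer keeps everything (ultimatum game)
    for t in range(rounds - 2, -1, -1):
        # The responder in round t proposes in round t + 1, so it accepts any offer
        # worth at least its discounted share as the next proposer.
        responder_delta = delta2 if t % 2 == 0 else delta1
        proposer_share = 1.0 - responder_delta * proposer_share
    return proposer_share


def rubinstein_share(delta1: float, delta2: float) -> float:
    """Player 1's share in Rubinstein's infinite-horizon game with discount factors."""
    return (1.0 - delta2) / (1.0 - delta1 * delta2)


print(spe_share(1, 0.9, 0.9))      # 1.0: the ultimatum game
print(spe_share(201, 0.9, 0.9))    # approximately 0.526
print(rubinstein_share(0.9, 0.9))  # approximately 0.526
```

The more patient a player is (the closer its discount factor is to 1), the more it can demand, which reflects the role of discounting as a measure of impatience.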
It is shown that in this alternating offers game the Nash equilibrium is too weak a concept to identify a unique solution, since every partition of the surplus can be supported by a Nash equilibrium, whereas the concept of subgame perfect equilibrium yields a unique solution. However, the strong assumption of complete information still holds in this game. The monotonic concession protocol is more restrictive than the alternating offers game. In this protocol, the two players announce their proposals simultaneously. If the proposals overlap, i.e. a proposal matches or exceeds the other agent's demand, an agreement is reached. If the proposals do not overlap, the agents either make a concession or repeat their proposal from the previous round. If neither agent concedes, the negotiation ends and each party receives the conflict payoff. The unique characteristic of this protocol is that no party is allowed to make an offer that gives its counterpart a lower utility than its previous offer. Because of this property, at least one player has to concede in each round or a disagreement is reached, such that the process is finite if the minimum concession is fixed and larger than zero. In order to make monotonic concessions possible, the players need to have knowledge about each other's preferences, especially in cases with multiple issues, where the relative importance of the issues also matters. The above games and protocols provide the basis for the study of a wide range of decision strategies. For example, Rosenschein and Zlotkin [121] discuss strategies for the monotonic concession protocol in terms of stability and efficiency, where a pair of strategies is considered efficient if it reaches an agreement, and stable if it constitutes a Nash equilibrium. A protocol, such as that of the alternating offers game, may only define the rules of interaction between the parties while permitting each party to choose different kinds of decision strategies. On the other hand, a protocol may also constrain a strategy, in that the players have a limited range of choices, in order to enforce certain outcomes. An example is the Zeuthen strategy [150], which represents a rational strategy when using the monotonic concession protocol [53]. Using this strategy, a party concedes only if it has more to lose than the opponent in case of an immediate negotiation failure. In other words, the party that is less willing to risk a conflict should concede, and the amount of the concession needs to shift the balance of risk such that the opponent has to concede next. Although Zeuthen strategies are proven to achieve Pareto-optimal deals [53], they are not in equilibrium, as the parties have an incentive to deviate from the strategy at the last stage [121].
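The concession rule of the Zeuthen strategy can be illustrated with a small Python simulation of the monotonic concession protocol. It uses the usual definition of an agent's willingness to risk conflict, $(U_i(x_i) - U_i(x_j)) / U_i(x_i)$ (or 1 if $U_i(x_i) = 0$), where $x_i$ is the agent's own proposal and $x_j$ the opponent's. As a simplifying assumption, the conceding agent moves by a fixed one-unit step instead of the exact amount that shifts the risk balance, and both agents concede when their risks are equal. The surplus size, utility functions and function names are hypothetical.

```python
from typing import Callable, List, Tuple

Utility = Callable[[int], float]


def zeuthen_risk(u: Utility, own: int, other: int) -> float:
    """Willingness to risk conflict: share of utility lost by accepting the other proposal."""
    if u(own) == 0:
        return 1.0
    return (u(own) - u(other)) / u(own)


def monotonic_concession(n: int, u_a: Utility, u_b: Utility,
                         step: int = 1) -> Tuple[int, List[Tuple[int, int]]]:
    """Split n units; every proposal states the amount agent A receives.

    A starts by demanding everything (n), B by offering nothing (0). In each round the
    agent with the lower willingness to risk conflict concedes by `step` units; on a
    tie both concede. Returns the agreed amount for A and the history of proposals.
    """
    a, b = n, 0
    history = [(a, b)]
    while a > b:  # the proposals do not overlap yet
        r_a = zeuthen_risk(u_a, a, b)
        r_b = zeuthen_risk(u_b, b, a)
        if r_a <= r_b:
            a = max(a - step, b)  # A lowers its demand, never below B's offer
        if r_b <= r_a:
            b = min(b + step, a)  # B raises its offer, never above A's demand
        history.append((a, b))
    return a, history


# Hypothetical utilities: A is risk-neutral, B has a concave utility for its share.
deal, history = monotonic_concession(30, lambda x: x, lambda x: (30 - x) ** 0.5)
print(deal)  # 20
```

In this example the agreement coincides with the maximiser of the Nash product $x\sqrt{30 - x}$, in line with the known connection between Zeuthen's concession rule and the Nash bargaining solution.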
The bargaining procedure may also change when more than one issue is involved in the game. The issues may be negotiated simultaneously, separately, or issue by issue in a sequential manner. However, most approaches in non-cooperative bargaining suggest negotiating the issues sequentially. The bargaining games above assume that all parties have complete knowledge about each other's preferences in order to reduce the complexity of the game for the mathematical analysis of equilibrium strategies. It is more realistic, however, that the parties do not know the preferences of the other players, such as their reservation values, utility functions, risk attitudes or their individual evaluation of the issues. Such games are referred to as bargaining with incomplete information, and are typically modelled using a limited number of player types that are associated with different preference structures and beliefs that are unknown [50, 5]. Under this premise, there are basically two general approaches: mechanism design and sequential bargaining. Rather than modelling the game as a sequence of offers and counteroffers, mechanism design focuses on the solution concepts of a game in an abstract way, given the private information and the space of possible outcomes. This makes it possible to study the incentives of the players and the attainable outcomes, and to identify equilibrium solutions such as the Bayesian-Nash equilibrium [130]. Games modelled by the mechanism design approach are usually solved using mediated mechanisms [71] in which the players disclose their types. The second approach, sequential bargaining, considers the dynamic process of the offer exchange between the parties when either one party or both parties have private information about their preferences. In these settings, also referred to as one-sided and two-sided incomplete information games, different equilibrium concepts are studied, such as the sequential or Bayes-Nash equilibrium, for example, by Rubinstein for one-sided [123] and by Chatterjee and Samuelson [27] for two-sided incomplete information games with infinite horizons. The common assumption of all incomplete information games, however, is that the private information can be represented as a finite set of player types. The uncertainty over the particular type of a partner, in the form of a player's beliefs, is then represented by a probability distribution over all types, which is, again, common knowledge. Although this increases the complexity of the game, it still makes the computation of optimal solutions possible. In many realistic negotiations, however, the agents do not know such beliefs about the other agent's preferences when these preferences are private.
As a result, research in the field of artificial intelligence (AI) aims at relaxing the strict assumptions of game theory in order to enable automated negotiation in situations that are closer to the real world.

2.2 Negotiation Preliminaries

Negotiation, with its broad range of characteristics, approaches and phenomena, has been studied extensively from different perspectives and in many research areas, including the social sciences [106], economics [108, 83] and psychology [32]. There is particular interest in the automation of the negotiation process between self-interested software agents in order to facilitate decision-making and conflict resolution in distributed and autonomous computing systems. Consequently, there is potential for automated negotiation in many real-world applications such as e-commerce, task redistribution and scheduling, resource allocation and, more recently, service-oriented computing. While game theory provides insights into the decision-making process of agents, its strict assumptions, such as complete knowledge of preferences and beliefs and the full rationality of players, limit its application in more realistic settings. The field of artificial intelligence attempts to relax these assumptions in order to enable the design of more practical mechanisms. Despite the variety of research topics in automated negotiation, one can in general distinguish three areas, concerning the negotiation protocol, the objects under negotiation and the decision models applied for the offer proposal [64]:

Negotiation protocol: The rules of interaction between all agents involved in the negotiation are determined by the negotiation protocol. In general, it specifies what types of messages can be sent, to whom and when. This includes the permissible types, or roles, of participants, such as the negotiators or any relevant third parties, and the valid actions they may choose during the encounter. It also defines the possible states of the negotiation and the events that may change them, e.g. the acceptance of a proposal, no more bidders, or the closing of a negotiation.

Negotiation object: The negotiation object comprises the set of issues under negotiation over which an agreement is to be reached. Such issues may include, for example, price, quality or response time. In conjunction with the protocol, the negotiation object also determines the types of operations that can be performed on the object.
For example, if the content of the agreement is fixed, agents can only accept or reject it, whereas in the non-fixed case, agents are able to make counter-proposals in order to find a better-fitting agreement. The number of negotiation issues also influences the possible actions of an agent as well as the nature of the overall encounter. For example, in the case of a single issue, the negotiation is competitive, since a gain for one agent represents a loss for the other. In the case of multiple issues, on the other hand, agents can also cooperate in that they search for joint gains, i.e. outcomes that are closer to the Pareto frontier. In addition, it might also be permitted to change the structure of the object by dynamically adding or removing issues. In general, agents have preferences over the issues of the object and their possible outcomes, typically represented by a utility function that also defines the acceptance region of the agent.

Decision-making apparatus: The decision-making model determines which offers and counter-offers an agent proposes at each stage of the negotiation and thus specifies the negotiation strategy. The decision models can take into account a range of factors, such as the behaviour of the negotiation partner or other agents, the current time or negotiation round in the encounter, the state of a particular resource in the environment, or other outside options [85]. They may further include learning and reasoning capabilities, in that they use the agent's experience for decision-making while exploring the other agents' behaviours. The decision model acts in line with the negotiation protocol and also depends on the type of the negotiation object.

In relation to these topics, some further classifications are common in automated negotiation. For example, protocols generally fall into two categories, bilateral negotiation and auctions, based on the number of participants and the setting of their interactions, such as one-to-one (bargaining), one-to-many (bidding) or many-to-many (double auction). While in bargaining two parties, typically a buyer and a seller, exchange offers over the set of issues [122], auctions allow bidding by more than two participants, which involves requests for proposals and interactions among a number of buyers and sellers. For example, in open (e.g. English or Dutch) or sealed-bid (e.g. Vickrey or first-price sealed-bid) auctions, many buyers compete by bidding for a product sold by an auctioneer, whereas in double auctions several sellers and buyers submit offers and bids in order to find a match between them.
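The difference between the two sealed-bid formats mentioned above lies only in the price the winner pays, as the following Python sketch shows for a single item. The bidder names, bid values and the function name are hypothetical, and ties are broken in favour of the bidder listed first. A well-known property of the Vickrey format is that bidding one's true valuation is a dominant strategy, because the winner's own bid does not determine its price.

```python
from typing import Dict, Tuple


def sealed_bid_auction(bids: Dict[str, float], second_price: bool) -> Tuple[str, float]:
    """Single-item sealed-bid auction in which the highest bidder wins.

    First-price: the winner pays its own bid.
    Vickrey (second-price): the winner pays the second-highest bid.
    Ties are broken in favour of the bidder listed first in `bids`.
    """
    if not bids or (second_price and len(bids) < 2):
        raise ValueError("not enough bids")
    # sorted() is stable, so equal bids keep their original order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, highest = ranked[0]
    price = ranked[1][1] if second_price else highest
    return winner, price


bids = {"buyer1": 120.0, "buyer2": 150.0, "buyer3": 90.0}
print(sealed_bid_auction(bids, second_price=False))  # ('buyer2', 150.0)
print(sealed_bid_auction(bids, second_price=True))   # ('buyer2', 120.0)
```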
Other scenarios include multilateral bargaining, or one-to-many and many-to-many bilateral negotiations, which, however, usually employ protocols similar to the purely bilateral case. In the bilateral context, another typical distinction is whether the negotiating agents are competitive or cooperative. As opposed to the notion of cooperativeness in game theory (whether an agreement is enforceable or not), here the behaviour of an agent is the dominant concern. For example, in multi-issue negotiations rational agents should search for win-win situations, i.e. aim at solutions that are Pareto-optimal. A solution is Pareto-optimal if and only if no agent can increase its utility by deviating from this solution without decreasing the other agent's utility. This type of negotiation, in which the agents also cooperate by searching for joint gains, is referred to as integrative negotiation. In contrast, single-issue negotiations are similar to zero-sum games in game theory, as the gain of one party is the loss of the other. In this situation, also called positional bargaining, the agents are only interested in increasing their own utility and are therefore competitive. This type of negotiation is also referred to as distributive. However, in most multi-issue negotiations agents are also interested in increasing their own utility. Even when Pareto-optimality can be achieved (e.g. by means of a mediator), agents need to negotiate over the solutions along the Pareto frontier. In that sense, the behaviour of an agent in multi-issue negotiation can be cooperative and competitive at the same time. This is discussed in more detail in Section

The focus of research varies depending on the importance of the above topics in the considered negotiation context. For example, mechanism design focuses on the negotiation protocol and is concerned with the types of operations that can be performed on the negotiation object, while leaving out the strategic behaviour of agents through their decision models. In this thesis, we focus on the decision models of an agent in bilateral negotiation, in which the agents are competitive and have no information about the decision models and preferences of their opponents. Limited knowledge may only be derived from the offers exchanged during the current encounter, from previous interactions, or from knowledge of particular states in the environment. We also consider a more complex one-to-many scenario with concurrent bilateral negotiations (cf. Chapter 5).
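The contrast between integrative and distributive negotiation can be made concrete with a small Python sketch that enumerates the outcomes of a hypothetical two-issue negotiation and filters the Pareto-optimal ones. It assumes weighted additive utilities with linear per-issue scores; the issues, levels, weights and function names are illustrative assumptions. Level 0 denotes the lowest price and the shortest delivery time; the buyer cares mostly about price, the seller mostly about delivery time.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Outcome = Tuple[int, int]  # (price level, delivery-time level), each in {0, 1, 2}
ISSUES = ("price", "delivery_time")
LEVELS = 3


def additive_utility(weights: Dict[str, float],
                     prefers_high: Dict[str, bool]) -> Callable[[Outcome], float]:
    """Weighted additive utility with linear per-issue scores in [0, 1]."""
    top = LEVELS - 1

    def u(outcome: Outcome) -> float:
        total = 0.0
        for issue, level in zip(ISSUES, outcome):
            score = level / top if prefers_high[issue] else (top - level) / top
            total += weights[issue] * score
        return total

    return u


def pareto_optimal(outcomes: List[Outcome], u_a: Callable[[Outcome], float],
                   u_b: Callable[[Outcome], float]) -> List[Outcome]:
    """Keep the outcomes that no other outcome weakly improves for both and strictly for one."""
    def dominates(x: Outcome, y: Outcome) -> bool:
        return (u_a(x) >= u_a(y) and u_b(x) >= u_b(y)
                and (u_a(x) > u_a(y) or u_b(x) > u_b(y)))

    return [o for o in outcomes if not any(dominates(p, o) for p in outcomes)]


buyer = additive_utility({"price": 0.8, "delivery_time": 0.2},
                         {"price": False, "delivery_time": False})
seller = additive_utility({"price": 0.3, "delivery_time": 0.7},
                          {"price": True, "delivery_time": True})
outcomes = list(product(range(LEVELS), repeat=len(ISSUES)))
frontier = pareto_optimal(outcomes, buyer, seller)
print(frontier)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```

The "split the difference" outcome (1, 1) is not Pareto-optimal: trading a low price for a long delivery time, i.e. outcome (0, 2), makes both parties better off. Choosing among the remaining frontier outcomes, however, is still a distributive conflict.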
The next sections discuss the concepts and notation required in this negotiation context, such as the underlying negotiation model, the negotiation thread and the preference structures of an agent, as well as the assumptions made in this thesis.

Negotiation Model

This work focuses on bilateral negotiation where two agents $a$ and $b$ propose offers and counteroffers $x^{t_n}_{a \to b}$ and $x^{t_{n+1}}_{b \to a}$, respectively, at discrete time points $t_n, t_{n+1} \in Time$ on a set of issues $J = \{1, 2, \dots, k\}$, such as price or delivery time, with $k, n \in \mathbb{N}$. In the case of multiple issues, an offer $x^{t_n}_{a \to b}$ represents a vector of values rather than a single value. Similar to most of the existing work in bilateral negotiation [44, 81], the negotiation mechanism is based on Rubinstein's alternating offers bargaining protocol [122], where two agents exchange offers alternately until one party accepts or withdraws from the encounter. However, we adopt the model and notation of the service-oriented negotiation model introduced in [44], in which agents use scoring or utility functions to evaluate their opponent's proposals.

37 2.2. Negotiation Preliminaries

If an agent $a$ receives an offer $x^{t_n}_{b \to a}$ from agent $b$, agent $a$ assesses the offer and decides whether to accept it, to withdraw, or to propose a new offer. The agent assesses the received offer $x^{t_n}_{b \to a}$ using its utility function $U^a$, which assigns a degree of satisfaction to the offer value. If the utility $U^a(x^{t_n}_{b \to a})$ is at least as high as the utility of its potential counterproposal, i.e. the offer agent $a$ is going to propose at the next stage, then agent $a$ accepts $b$'s proposal. Otherwise, agent $a$ proposes the counteroffer $x^{t_{n+1}}_{a \to b}$. An agent withdraws if its deadline $t^a_{max}$ is reached or if it has no incentive to continue the negotiation, for example, when a similar agreement has already been reached with another provider. Based on this model [44], a participant's response to the opponent's offer is formally written as follows:

Definition 2.1 (Faratin et al. [44]). Given an agent $a$ and its associated utility function $U^a$, $a$'s response at time $t_{n+1}$ to agent $b$'s offer $x^{t_n}_{b \to a}$ proposed at time $t_n < t_{n+1}$ is defined as:

$$
response^a(t_{n+1}, x^{t_n}_{b \to a}) =
\begin{cases}
withdraw(a, b) & \text{if } t_{n+1} > t^a_{max} \\
accept(a, b, x^{t_n}_{b \to a}) & \text{if } U^a(x^{t_n}_{b \to a}) \ge U^a(x^{t_{n+1}}_{a \to b}) \\
offer(a, b, x^{t_{n+1}}_{a \to b}) & \text{otherwise.}
\end{cases}
\tag{2.1}
$$

The response results in one of the three specified actions withdraw, accept or offer. The actions accept and withdraw terminate the negotiation process, the former with and the latter without an agreement.

Negotiation Thread

The sequence of offers exchanged between two agents $a$ and $b$ until time $t_n \in Time$ is called the negotiation thread. It reflects the process of the negotiation encounter and thus represents the time series of the alternating offer proposals of both parties up to the current negotiation stage. The negotiation thread is defined formally as follows:

Definition 2.2 (Faratin et al. [44]).
A negotiation thread between two agents $a$ and $b$ at time $t_n \in Time$, denoted as $X^{t_n}_{a \leftrightarrow b}$, is any finite sequence of length $n$ of the form $(x^{t_1}_{a \to b}, x^{t_2}_{b \to a}, x^{t_3}_{a \to b}, \dots)$ with $t_1, t_2, \dots \le t_n$, where:

1. $t_{i+1} > t_i$, i.e. the sequence is ordered over time;
2. for each issue $j$, $x^{t_i}_{a \to b}[j] \in D^a_j$ and $x^{t_{i+1}}_{b \to a}[j] \in D^b_j$ with $i = 1, 3, 5, \dots$, where $D^a_j = [\min^a_j, \max^a_j]$ for quantitative issues;
3. optionally, the last element of the sequence is one of the particles $\{accept, withdraw\}$.

The agents extend the negotiation thread with their alternating offer proposals until one agent accepts or withdraws from the encounter (for example, when a deadline is reached). The negotiation thread is said to be active if $last(X^{t_n}_{a \leftrightarrow b}) \notin \{accept, withdraw\}$, where $last()$ is a function that maps the sequence $X^{t_n}_{a \leftrightarrow b}$ to its last element. The thread is assumed to be common knowledge, i.e. both parties are aware of the offers exchanged in the current encounter. In many realistic situations, this sequence of exchanged offers is the only precise knowledge the agents have about their opponent.

Agents' Preferences over Outcomes

In order to decide during the encounter whether to accept a proposal from the opponent or to make a new proposal, an agent needs to evaluate the received offer and its own potential counteroffers by assigning scores, or utility values, according to its preference structure. In general, a negotiation agent has a negotiation interval $D^a_j = [\min^a_j, \max^a_j]$ (with $D^a_j \subseteq D_j$) assigned to each issue $j$ under negotiation, whose bounds are the agent's most and least preferred values. The least preferred value is also called the reservation value and specifies the point up to which an agent is willing to make concessions. For instance, a client agent might want to negotiate a low price for a service with a provider agent, such that its reservation value would be $RV^c_{price} = \max^c_{price}$, whereas for the provider it is the opposite, with $RV^p_{price} = \min^p_{price}$. The negotiation interval hence determines the range of outcomes acceptable to an agent.
For multiple issues, this results in a $k$-dimensional space that describes the acceptance region of the agent over the range of issues. If the negotiation regions of both parties intersect, the agents can possibly reach an agreement. This zone, or region, is also called the zone of agreement. Figure 2.1 illustrates an example of the agreement zone of a single-issue negotiation between two agents $a$ and $b$ with different deadlines. The figure also shows the offer curves of the two agents and the agreement obtained if agent $b$ proposes the first offer. As shown, the agreement corresponds to the first offer after the two offer curves overlap.

Figure 2.1: Agreement zone for a single-issue negotiation example

The preference over the values within the acceptance region is defined by an agent's utility function

$$ U^a : \prod_{j \in J} D^a_j \to [0, 1] \tag{2.2} $$

where $J$ is the set of all issues under negotiation. The utility function orders all possible outcomes in the negotiation region by mapping the Cartesian product of the negotiation intervals of all issues to the unit interval. Given an offer, it assigns the degree of satisfaction to the offer's value within the agent's acceptable region. A widely used method is to assume that agents have an individual scoring or utility function assigned to each issue under negotiation,

$$ U^a_j : [\min^a_j, \max^a_j] \to [0, 1] \tag{2.3} $$

that is monotonically increasing or decreasing over the negotiation interval. The aggregated utility over all issues is then given by a weighted additive utility function

$$ U^a(x) = \sum_{j \in J} w^a_j \, U^a_j(x_j) \tag{2.4} $$

where the weight $w^a_j$ represents the relative importance of issue $j$ to agent $a$, with $\sum_{j} w^a_j = 1$. For simplicity, it is often assumed that the individual utility functions are linear. The additive utility treats each issue separately and thus assumes that all issues are independent. Despite the advantage of its simple application and straightforward creation by an agent's user, it is often argued that in more realistic situations agents negotiate over multiple issues that are interdependent and thus require more complex utility functions. Different types of such utility functions have been proposed in the literature, such as quadratic, exponential interdependent or constant elasticity of substitution functions [81]. Other non-linear utility models are explored in [60, 74].

Although computing systems are fast and enable an agent to run thousands of iterations in a very short amount of time, the encounters in automated negotiation are finite. The deadline $t^a_{max}$ of the agent hence becomes an important decision factor; it is therefore part of the agent's preferences and thus private. In addition, there may be situations where an agent is interested in obtaining early agreements or, similarly, in reaching an agreement with fewer messages exchanged. For that reason, the utility function may include some form of negotiation cost that incorporates the time at which an agreement is reached into the evaluation of the outcome. A usual method is to determine communication costs through the number of exchanged messages. For example, based on the length of the negotiation thread, Faratin et al. [42] propose the cost-adjusted utility as the difference between the intrinsic utility (e.g. (2.4) above) and a cost function $C^a = \tanh\left(|X_{a \leftrightarrow b}| / T\right)$, such that $U^a_{cost}(x) = U^a(x) - C^a$, where $T$ determines the rate of change of $\tanh()$. However, this method can produce negative utility values for outcomes before an agent's deadline, which is counter-intuitive because an agent would stop negotiating once it realizes that the possible outcomes from the current stage onward have negative utility values.
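The additive utility (2.4) and the tanh-based cost adjustment can be made concrete with a small Python sketch. It is illustrative only: the issue ranges, weights and the value of $T$ are hypothetical, the per-issue utilities are assumed to be linear as in the common simplification above, and the cost argument is read as the thread length divided by $T$.

```python
import math
from typing import Callable, Sequence, Tuple

Offer = Tuple[float, ...]  # one value per issue


def linear_issue_utility(lo: float, hi: float, increasing: bool) -> Callable[[float], float]:
    """A linear, monotonic per-issue utility U_j^a on [lo, hi], cf. (2.3)."""
    def u(v: float) -> float:
        s = (v - lo) / (hi - lo)  # relative position of v within [lo, hi]
        return s if increasing else 1.0 - s
    return u


def additive_utility(weights: Sequence[float],
                     issue_utils: Sequence[Callable[[float], float]]) -> Callable[[Offer], float]:
    """Weighted additive utility U^a(x) = sum_j w_j * U_j(x_j), cf. (2.4)."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("issue weights must sum to 1")

    def U(x: Offer) -> float:
        return sum(w * u(v) for w, u, v in zip(weights, issue_utils, x))
    return U


def tanh_cost_utility(U: Callable[[Offer], float], x: Offer,
                      thread_length: int, T: float) -> float:
    """Cost-adjusted utility U^a(x) - tanh(|X| / T); the result can be negative."""
    return U(x) - math.tanh(thread_length / T)


# Hypothetical buyer: price in [50, 100] (weight 0.7) and delivery days in [1, 10]
# (weight 0.3); lower values are preferred, so both issue utilities decrease.
buyer = additive_utility([0.7, 0.3], [linear_issue_utility(50, 100, False),
                                      linear_issue_utility(1, 10, False)])
```

With these values, `buyer((75, 1))` is about 0.65, while `tanh_cost_utility(buyer, (75, 1), 20, 10)` is about -0.31: after 20 exchanged messages the cost term already outweighs the intrinsic utility, which is the negative-utility effect criticised in the text.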
Another method, which produces non-negative utility values at all stages, is to use simple discount factors, such that $U^a_{cost}(x, t) = \delta^t U^a(x)$, where $\delta$ is the discount factor with $0 \le \delta < 1$, and $t$ represents negotiation rounds or, similarly, the number of messages sent by the agent. This is similar to the discount method proposed in [122]. The cost-adjusted utility values the same outcome higher when it is obtained at an earlier stage of the encounter than at a later stage. It allows outcomes to be assessed not only in terms of the agent's subjective preference, but also in terms of their timely achievement.

Depending on the type of utility function, different decision mechanisms may apply. In the case of an additive utility function, for example, issues can be negotiated separately (e.g. sequentially) due to the independence between all issues. Consequently, studies of concession-making strategies often employ simple linear utility models to investigate the effects and performance of such strategies. On the other hand, Ito et al. [59] have shown that strategies well-suited for linear utility models do not cope well with non-linear utility spaces, in which issues are highly interdependent. However, although more realistic in many situations, non-additive functions are more difficult to elicit. Since the focus of this thesis is the strategic concession behaviour of agents, we use simple utility models with linear and additive utility functions.

2.3 Decision-Making in Automated Negotiation

A large range of AI approaches for an agent's decision-making have been proposed and investigated for automated bilateral negotiation, ranging from heuristic-based decision functions to more complex learning and reasoning models. The aim of these approaches is to overcome the strict assumptions of common knowledge and rationality of players in game theory by providing decision strategies that are applicable in more realistic situations, for example, when the knowledge about the negotiation partner is limited or uncertain.

In general, when an agent $a$ generates an offer proposal $x^{t_{n+1}}_{a \to b}$, it has to decide on the amount of concession it makes. Agent $a$ proposes a concession if $U^a(x^{t_{n+1}}_{a \to b}) < U^a(x^{t_{n-1}}_{a \to b})$, i.e. the agent's own utility of its next offer is lower than the utility of its previous offer. Since the agents do not know each other's deadlines, the decision problem each agent faces is how much to concede at each negotiation stage in order to obtain high payoffs without failing to reach an agreement. In multi-issue negotiation, an agent can further decide whether to make a trade-off proposal or to manipulate the set of issues. A trade-off proposal of an agent has the same utility as its previous offer (i.e. it lies on the same indifference curve), such that $U^a(x^{t_{n+1}}_{a \to b}) = U^a(x^{t_{n-1}}_{a \to b})$.
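The distinction between a concession and a trade-off depends only on the agent's own utility of its previous and next offers. The Python sketch below encodes that comparison; the two-issue utility, its ranges and weights, the floating-point tolerance and the label "other" for a utility increase (a case the text does not name) are hypothetical choices for illustration.

```python
import math
from typing import Tuple

Offer = Tuple[float, ...]


def classify_move(u_prev: float, u_next: float, tol: float = 1e-9) -> str:
    """Classify a's next offer by comparing its own utility with that of its previous offer."""
    if math.isclose(u_next, u_prev, rel_tol=0.0, abs_tol=tol):
        return "trade-off"   # same indifference curve: U(next) == U(prev)
    if u_next < u_prev:
        return "concession"  # U(next) < U(prev)
    return "other"           # hypothetical label: the agent raised its own utility


# Hypothetical utility: price in [50, 100] and delivery days in [0, 10],
# both preferred low and weighted equally.
def U(x: Offer) -> float:
    price, days = x
    return 0.5 * (100 - price) / 50 + 0.5 * (10 - days) / 10
```

For example, moving from the offer (60, 6) to (70, 4) gives up 10 units of price in exchange for 2 days of faster delivery and keeps $U = 0.6$, so it is classified as a trade-off, whereas (70, 6) lowers the utility to 0.5 and is a concession. The tolerance is needed because equal utilities computed in floating point may differ in the last bits.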
Agents typically propose trade-offs that are more likely to be accepted by the opponent in order to increase the chance of an agreement. Such trade-off proposals also support the search for agreements that are closer to the Pareto frontier. When an agent adds or removes issues, it attempts to change the utility function of its counterpart by manipulating the set of issues [42]. The decision mechanisms may have different aims depending on the number of issues and the goal of the agent. For instance, while heuristic-based concession mechanisms focus on the tractability of decision strategies used by competitive agents, mediator-based approaches and trade-off mechanisms focus on socially optimal outcomes. The next sections review and discuss some of these approaches.

Heuristic-based Negotiation Tactics

Heuristic-based approaches provide approximate solutions to an agent's rational decision-making. They attempt to overcome the computational intractability of many decision and reasoning models by searching the negotiation space non-exhaustively. Since in many more realistic negotiation settings the agents do not have common knowledge or beliefs about each other's preferences and decision models, optimal strategies are hard to find, so the aim of heuristic-based approaches is to obtain good rather than optimal outcomes. These realistic assumptions allow the approach to be used in a wider range of application domains and enable the creation of a large range of different agent behaviours and architectures. A prominent approach, introduced by Faratin et al. [44], is to use decision functions, or so-called negotiation tactics, that enable an agent to generate offers and counteroffers at each stage of the encounter based on various factors, such as time, the state of a resource in the environment, or the concession behaviour of the opponent. As a result, this model allows the creation of different decision strategies which, for example, use the deadline of an agent as a decision factor, or allow some level of adaptation to the behaviour of the negotiation partner by imitation. A tactic is usually applied to one issue and thus determines the concession behaviour of an agent for this issue. The advantages of such decision functions are their straightforward use and the fact that they require only information that is available during the current encounter.
In addition, they provide the basis for obtaining more complex concession behaviour through the simple method of mixing tactics using linear weighted combinations, which are detailed in the subsection on mixing negotiation tactics below. In the following, we denote a negotiation tactic of an agent $a$ for issue $j$ as $\tau^a_j$. A tactic can be interpreted as a function mapping the mental state of the agent to the issue domain $D_j$, with $\tau^a_j : MS_a \to D_j$. The mental state can represent different factors that correspond to the state of knowledge an agent has about its environment. The next sections review examples of the negotiation tactics introduced by Faratin et al. [44], such as the time-, resource- and behaviour-dependent tactics.
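To show how a tactic plugs into the response rule of Definition 2.1, the following Python sketch passes the tactic in as a callback that produces the agent's potential counteroffer. It is a minimal sketch under simplifying assumptions: the mental state is reduced to the current time step, actions are encoded as tuples, the names `response`, `buyer_utility` and `buyer_tactic` are hypothetical, and the "no incentive to continue" reason for withdrawing is not modelled. The placeholder tactic simply raises the price by a fixed step; any of the tactics reviewed next could take its place.

```python
from typing import Callable, Tuple, Union

Offer = Tuple[float, ...]
Action = Union[Tuple[str], Tuple[str, Offer]]
# Hypothetical simplification: the mental state MS_a is reduced to the current time.
Tactic = Callable[[int], Offer]


def response(t_next: int, received: Offer, t_max: int,
             utility: Callable[[Offer], float], tactic: Tactic) -> Action:
    """Definition 2.1: agent a's reply at t_{n+1} to b's offer from t_n."""
    if t_next > t_max:                         # deadline exceeded: withdraw
        return ("withdraw",)
    counter = tactic(t_next)                   # a's potential counteroffer
    if utility(received) >= utility(counter):  # b's offer is at least as good
        return ("accept", received)
    return ("offer", counter)                  # otherwise propose the counteroffer


# Hypothetical buyer negotiating a price in [50, 100]; lower prices are better.
def buyer_utility(x: Offer) -> float:
    return (100 - x[0]) / 50


def buyer_tactic(t: int) -> Offer:
    # Placeholder tactic: start at 50 and raise the price by 5 per time step.
    return (50 + 5 * t,)
```

Given a seller offer of 70, the buyer replies at time 3 with the counteroffer 65 (utility 0.7 versus 0.6), accepts at time 5 because its own counteroffer of 75 would be worth only 0.5, and withdraws at time 11 once its deadline of 10 has passed.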

Time-dependent Tactics

Time-dependent tactics generate offers based on the current time in the encounter and the deadline of an agent. These tactics are completely independent of the opponent's behaviour and typically use polynomial (2.6) or exponential (2.7) monotonic decision functions to propose the next offer:

$$
x^{t_{n+1}}_{a \to b}[j] =
\begin{cases}
\min^a_j + \alpha^a_j(t)\,(\max^a_j - \min^a_j) & \text{if } U^a_j \text{ is decreasing} \\
\max^a_j - \alpha^a_j(t)\,(\max^a_j - \min^a_j) & \text{if } U^a_j \text{ is increasing}
\end{cases}
\tag{2.5}
$$

$$ \alpha^a_j(t) = \kappa^a_j + (1 - \kappa^a_j)\left(\frac{\min(t, t^a_{max})}{t^a_{max}}\right)^{1/\beta} \tag{2.6} $$

$$ \alpha^a_j(t) = e^{\left(1 - \frac{\min(t, t^a_{max})}{t^a_{max}}\right)^{\beta} \ln \kappa^a_j} \tag{2.7} $$

where $\alpha^a_j$ generates values between 0 and 1, which are mapped to the interval of the issue using (2.5), and $\beta$ and $\kappa^a_j$ determine the concession behaviour of the tactic and the first offer (at $t = 0$), respectively. Here $\beta$ defines how fast the agent reaches its reservation value with regard to its deadline. In general, three types of concession behaviour are distinguished:

- Boulware ($\beta < 1$): the agent makes small concessions at the beginning of the encounter and concedes towards its reservation value mainly close to the deadline;
- Linear ($\beta = 1$, only for the polynomial decision function): the agent concedes at a constant rate;
- Conceder ($\beta > 1$): the agent makes large concessions at the beginning of the encounter, quickly approaching its reservation value, and its concessions decrease towards the deadline.

The parameter $\kappa^a_j \in [0, 1]$ specifies the initial offer within the agent's negotiation interval $D^a_j$. Figure 2.2 shows the shape of the curves produced by the two decision functions for different $\beta$ settings. As can be seen, the two functions produce different curves for similar $\beta$ settings. For example, the linear setting $\beta = 1$ holds only for the polynomial function, whereas it already yields a Boulware curve for the exponential one.

Figure 2.2: Polynomial (left) and exponential (right) decision functions ($\beta_{poly} \in \{9, 4, 2, 1, 0.5, 0.2, 0.05\}$ and $\beta_{exp} \in \{20, 9, 5, 3, 1.8, 1, 0.5, 0.2\}$)

Resource-dependent Tactics

Resource-dependent tactics make offer proposals based on the amount of available resources, using the offer-generating function

$$ \alpha^a_j(t) = \kappa^a_j + (1 - \kappa^a_j)\, e^{-resource^a(t)} \tag{2.8} $$

where the function $resource^a(t)$ measures the quantity of a resource at time $t$ for agent $a$. Resources can be of any kind, such as the number $N^a(t)$ of negotiating agents or the number of messages exchanged during the negotiation. Since time can be regarded as a resource as well, time-dependent tactics are a special type of this family of tactics.

Behaviour-dependent Tactics

Behaviour-dependent tactics imitate the opponent's behaviour to some degree. The counterproposal is calculated proportionally based on the previous offers of the opponent given by the negotiation thread. Three types of tit-for-tat (TFT) tactics are proposed [44]: relative (2.10), random absolute (2.11) and average tit-for-tat (2.12). Each produces a raw value $\xi^{t_{n+1}}_{a \to b}[j]$ that is clipped to the agent's negotiation interval:

$$ x^{t_{n+1}}_{a \to b}[j] = \min\left(\max\left(\xi^{t_{n+1}}_{a \to b}[j], \min^a_j\right), \max^a_j\right) \tag{2.9} $$

$$ \xi^{t_{n+1}}_{a \to b}[j] = \frac{x^{t_{n-2\delta}}_{b \to a}[j]}{x^{t_{n-2\delta+2}}_{b \to a}[j]}\; x^{t_{n-1}}_{a \to b}[j] \tag{2.10} $$

$$ \xi^{t_{n+1}}_{a \to b}[j] = x^{t_{n-1}}_{a \to b}[j] + \left(x^{t_{n-2\delta}}_{b \to a}[j] - x^{t_{n-2\delta+2}}_{b \to a}[j]\right) + (-1)^s R(M) \tag{2.11} $$

$$ \xi^{t_{n+1}}_{a \to b}[j] = \frac{x^{t_{n-2\delta}}_{b \to a}[j]}{x^{t_n}_{b \to a}[j]}\; x^{t_{n-1}}_{a \to b}[j] \tag{2.12} $$

with random factor $R(M) \in [0, M]$, where $s$ specifies whether the value is increased or decreased by the random amount. The constraints $\delta \ge 1$ and $n > 2\delta$ define the applicability of each tactic, as at least two opponent offers and the agent's last offer are needed to compute the next counteroffer. For average tit-for-tat, $\delta$ denotes the window size rather than the delay steps as in (2.10) and (2.11). If $\delta = 1$, this tactic is similar to relative tit-for-tat, whereas larger window sizes with $\delta > 1$ result in larger concessions. For random absolute tit-for-tat in (2.11), $s$ is 0 or 1 if $U^a_j$ is decreasing or increasing, respectively.

Mixing Negotiation Tactics

A common method to generate negotiation strategies with more complex concession behaviour is to mix individual negotiation tactics by a linear weighted combination [44]. In this context, an agent's strategy determines which combination of tactics is used at each stage of the encounter to generate offers and counteroffers if the opponent's current offer is unsatisfactory. The weights assigned to the pure tactics may change during the encounter depending on the mental state $MS^t_a$ of an agent $a$ at time $t$. The mental state is a concept that represents the state of knowledge about the environment and other agents at a particular stage, including the agent's beliefs, goals or obligations. The set of all possible mental states is denoted as $MS_a$. A change of an agent's mental state may affect a negotiation strategy through the choice of the individual tactics and the change of weights for their mixture. The definition of a weighted counterproposal is recalled from Faratin et al. [44] as follows:

Definition 2.3.
Given a negotiation thread between agents $a$ and $b$ at time $t_n$, $X^{t_n}_{a \leftrightarrow b}$, over domain $D = D_1 \times \dots \times D_p$, with $last(X^{t_n}_{a \leftrightarrow b}) = x^{t_n}_{b \to a}$, and a finite set of $m$ tactics $T_a = \{\tau_i \mid \tau_i : MS_a \to D\}_{i \in \{1, \dots, m\}}$, a weighted counterproposal $x^{t_{n+1}}_{a \to b}$ is a linear weighted combination of the tactics given by a weight matrix

$$
\Gamma^{t_{n+1}}_{a \to b} =
\begin{pmatrix}
\gamma_{11} & \gamma_{12} & \cdots & \gamma_{1m} \\
\gamma_{21} & \gamma_{22} & \cdots & \gamma_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{p1} & \gamma_{p2} & \cdots & \gamma_{pm}
\end{pmatrix}
\tag{2.13}
$$

defined as

$$ x^{t_{n+1}}_{a \to b}[j] = \left(\Gamma^{t_{n+1}}_{a \to b}\, T_a(MS^{t_{n+1}}_a)\right)[j] \tag{2.14} $$

where $(T_a(MS^{t_{n+1}}_a))[i, j] = (\tau_i(MS^{t_{n+1}}_a))[j]$, with $\gamma_{ji} \in [0, 1]$ and $\sum_{i=1}^{m} \gamma_{ji} = 1$ for all issues $j$.

The weighted counterproposal extends the negotiation thread by appending $x^{t_{n+1}}_{a \to b}$, whereby each row in the matrix represents a weighted linear combination of $m$ tactics for one issue. In simpler terms, the next counterproposal for a particular issue $j$ can be written as $x^{t_{n+1}}_{a \to b}[j] = \sum_{i=1}^{m} \gamma_{ji} \tau_{ji}$. Different types of negotiation behaviour can be obtained by weighting a given set of tactics in different ways. The advantage of this method is that such multi-tactic negotiation strategies can incorporate and respond to a large range of factors at the same time, such as the behaviour of the opponent, the agent's deadline, or the state of a resource. The choice of different tactics and the dynamic adjustment of weights enable a limited but simple level of adaptation to the agent's environment and the other agent's behaviour during the encounter. An agent can generate complex concession behaviour by mixing simple tactics in various ways. Moreover, because most pure tactics use only observable information from the current encounter, mixed tactics are also easier to apply in more realistic negotiation situations with no common knowledge.

Mechanisms for Pareto-Efficient Negotiations

In multi-issue negotiations, when issues are negotiated simultaneously, an agent may be able to propose a solution that increases the utility of at least one agent without decreasing the utility of any other agent. Such proposals eventually lead to Pareto-optimal outcomes, i.e. solutions that cannot be changed to increase the utility of one agent without decreasing the utility of another. Besides the goal of maximizing the utility of a negotiation outcome, rational agents should therefore also seek such win-win solutions in multi-issue encounters, as they increase the chance of an agreement. However, such Pareto-optimal solutions are hard to find in situations without common knowledge. A common method is to introduce a trusted, unbiased mediator that, acting as a third party, supports the agents in finding Pareto-optimal settlements. While many mechanisms have been proposed in that realm, most are joint-gain-seeking methods, such as the improving directions method [38] or the constraint proposal method [37]. In the first method, the agents submit to the mediator only local information about a tentative agreement, in the form of their preferred directions (gradient directions); the mediator then generates a new tentative agreement using the set of jointly improving directions and some fairness criteria. This process is repeated until the agreement is Pareto-optimal. However, in many cases the agents first need to decide upon such an initial agreement, as it strongly influences the region of the Pareto frontier in which joint improvements are sought. In constraint proposal methods, the mediator iteratively adjusts a hyperplane through a chosen reference point, on which the agents announce their optimal alternatives at each step, until a joint tangent is found and the alternatives coincide. Although such methods are able to generate outcomes along the whole Pareto frontier [36] given a set of reference points, distributive negotiation is still necessary to decide on the finally chosen settlement. Other works on mediation include query learning for the elicitation of utility structures [79], bidding by participants with non-linear utility spaces [59], and a simulated annealing-based mediator for intractable contract spaces with binary attribute values [73].
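When all utility functions are known, as they might be to a trusted mediator, the Pareto-optimal outcomes of a finite set of candidate settlements can be found by discarding every outcome that is dominated by another. The Python sketch below does this by brute force. It is not the improving-directions or constraint-proposal method; it only illustrates the Pareto criterion those methods work towards, and the four outcomes and their utilities are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Outcome = TypeVar("Outcome")


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if utility vector u Pareto-dominates v: nobody worse off, someone better off."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_optimal(outcomes: Sequence[Outcome],
                   utilities: Sequence[Callable[[Outcome], float]]) -> List[Outcome]:
    """Brute-force O(n^2) filter keeping the outcomes that no other outcome dominates."""
    vecs = [tuple(U(o) for U in utilities) for o in outcomes]
    return [o for o, v in zip(outcomes, vecs)
            if not any(dominates(w, v) for w in vecs)]


# Hypothetical utilities of agents a and b for four candidate settlements.
u_a: Dict[str, float] = {"A": 0.9, "B": 0.6, "C": 0.5, "D": 0.4}
u_b: Dict[str, float] = {"A": 0.2, "B": 0.7, "C": 0.6, "D": 0.8}
```

Here `pareto_optimal(["A", "B", "C", "D"], [u_a.__getitem__, u_b.__getitem__])` returns `["A", "B", "D"]`. Settlement C is dropped because B is better for both agents, while A, B and D form the Pareto frontier along which the agents would still have to bargain distributively.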
In many situations, however, the requirement of a mediator limits the application in distributed and decentralized systems, where such a mediator may not be trusted or may simply not be available [81]. Moreover, although these approaches are able to efficiently find solutions on the Pareto frontier, the agents still require a distributive mechanism in order to agree on a starting solution or the final Pareto settlement. To overcome the requirement of a mediator, purely decentralized methods investigate how agents should behave in order to achieve Pareto-optimality in the absence of common knowledge and any third parties. In such a setting, it is hard even to find near-Pareto-optimal outcomes, and agents have only limited means to find them, for example, by iteratively making trade-off proposals that are more similar to the opponent's previous offers. In other words, when trading off issues, agents choose an offer that they believe is more preferable to the opponent on the same indifference curve, or iso-curve, of their current aspiration level, defined as the set of all offers $iso_a(\theta) = \{x \mid U^a(x) = \theta\}$ with the same utility value $\theta$. Faratin et al. [43] use fuzzy similarity to choose proposals on the iso-curve that are most similar to the last offer proposed by the opponent. Although this method also works for qualitative issues, it assumes linear utility spaces and requires a method for estimating the opponent's issue weights. To this end, [29] uses kernel density estimation to estimate these weights, but requires prior knowledge for good performance. Lai et al. [81] propose a trade-off mechanism for complex utility spaces in which an agent proposes the offer on its current iso-curve with the shortest distance to its partner's previous offer. The authors show that, if both parties iteratively use this mechanism, outcomes are close to the Pareto frontier, and even closer when the number of iterations is high and agents propose a number of close offers on their iso-curves, from which the opponent chooses the offer closest to its current aspiration level and in turn finds the closest counteroffer. Another approach is to estimate the opponent's utility structure, as proposed in [118, 117], where utility graphs for binary issues are updated during the encounter under the assumption that the maximal structure of possible interdependencies is known beforehand.

In all of the above methods, however, the performance criterion is the achievement of (or closeness to) Pareto-optimality. Since the agents are also interested in maximizing their utility of an outcome, they still need a strategy for their concession-making along their aspiration levels or indifference curves, or to negotiate over the Pareto set (along the Pareto frontier). To do this, the above methods usually assume simple bargaining tactics, e.g. based on time and the agent's deadline (cf. the time-dependent tactics above), or do not consider this distributive part of the negotiation. Consequently, in order to decide when to use a concession or a trade-off tactic, meta-strategies have been proposed [120, 119] which only concede to a lower aspiration level when a deadlock occurs, i.e. when the utility of two consecutive offers from the opponent does not improve for the agent. Multi-issue negotiations with incomplete information hence comprise cooperative elements, in terms of reaching Pareto-efficient outcomes, and competitive elements, in that the agents need a concession strategy that obtains high utilities while still reaching an agreement. In this thesis, we focus on the distributive part of negotiation by assuming that agents are competitive and do not disclose any information about their decision models or preferences.
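The idea behind the distance-based trade-off of Lai et al. [81] (stay on the own iso-curve, move towards the opponent's last offer) can be illustrated with a discretised Python sketch. This is a simplification, not their algorithm: the iso-curve $iso_a(\theta)$ is approximated by the points of a finite candidate grid whose utility is within a tolerance of $\theta$, issues are assumed to be normalised to [0, 1] so that Euclidean distance is meaningful, and the utility function and all numbers are hypothetical.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple

Offer = Tuple[float, ...]


def closest_on_iso_curve(candidates: Iterable[Offer],
                         utility: Callable[[Offer], float],
                         theta: float,
                         opponent_offer: Offer,
                         eps: float = 1e-6) -> Optional[Offer]:
    """Approximate iso_a(theta) by the candidates with |U^a(x) - theta| <= eps and
    return the one with the smallest Euclidean distance to the opponent's last offer."""
    on_curve = [x for x in candidates if abs(utility(x) - theta) <= eps]
    if not on_curve:
        return None  # the grid contains no point at this aspiration level
    return min(on_curve, key=lambda x: math.dist(x, opponent_offer))


# Hypothetical setting: two issues normalised to [0, 1]; a's utility increases in both.
def utility_a(x: Offer) -> float:
    return 0.5 * x[0] + 0.5 * x[1]


grid = list(itertools.product([i / 10 for i in range(11)], repeat=2))
```

With an aspiration level of 0.6 and an opponent offer of (0.1, 0.9), `closest_on_iso_curve(grid, utility_a, 0.6, (0.1, 0.9))` returns (0.2, 1.0): agent a keeps its own utility unchanged while moving as close as its iso-curve allows to what the opponent asked for. A finer grid or a continuous optimiser would be needed for realistic, non-linear utility spaces.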

Learning and Reasoning in Negotiation

Various research fields have been investigated to enable reasoning and learning of agents in automated negotiation, such as reinforcement learning [26], Bayesian learning [148], neural networks [103], regression analysis [19], case-based reasoning [93], Markov decision processes [96], evolutionary approaches [102] and fuzzy logic-based approaches [76]. The main focus is on creating or finding negotiation strategies that can anticipate some of the opponent's strategic parameters and adapt to the opponent's behaviour in order to improve the utilities of outcomes and the agreement rates. Essentially, all approaches can be classified by the information available and the assumptions made before the negotiation starts:

- The agent has experience or information from past interactions, e.g. in the form of historical cases. Typically, a large set of cases is required to enable reasoning.
- The agent has explicit (partial) knowledge about the negotiation environment, such as the domain or the negotiation partner. A general assumption is that the agent has a probability distribution over different instances of unknown parameters, for example, the reservation value.
- A predefined set of strategies or models is assumed, from which the partner might choose its strategy.

Even though agents typically utilize information gathered during the current encounter, in many scenarios they have no, or only little and uncertain, information about the environment and negotiation partners beforehand. However, many of the approaches require one or two of the above assumptions in order to learn or reason properly about partners. In the following, we briefly discuss some popular approaches studied in this field. We also refer to the excellent surveys of learning and reasoning approaches in [15] and [50].
A widely applied approach is reinforcement learning, where agents revise their strategies based on observed failure or success. In Q-learning [26, 103], the rewards received for the actions taken are used to update a Q-function, which estimates a ranking of state-action pairs. The Q-value is updated at every negotiation round for the chosen action $a$ taken in state $s$ as follows:

$$ Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] $$

where $\max_{a'} Q(s', a')$ is the expected utility of the next state $s'$, $\alpha$ is the learning rate representing the impact of the update value, and $r(s)$ is the immediate reward for state $s$. The factor $\gamma$ specifies how much the Q-values are discounted at each stage. The learning agent therefore has to explore the dynamic environment and the partner's behaviour by performing actions that are rewarded or punished. An agent applying this mechanism is able to improve its performance by using its experience to learn which tactics and actions should be used in which situations [15]. However, the disadvantage of this approach is that the algorithm converges slowly to near-optimal solutions and therefore needs a large number of trials. Furthermore, the agents need to balance trying new actions against using actions that have already proved to be good.

In Bayesian learning, the agent's beliefs about the environment and the participating agents are explicitly modelled in a probabilistic framework, using Bayesian reasoning for representation and updating [148, 147]. The beliefs are represented in the form of probability distributions, which are generated based on the knowledge acquired before the negotiation starts. The knowledge can be gained from various sources, such as the domain (e.g. demand for a resource), previous experiences or second-hand knowledge [15]. The beliefs about the partner can even contain its payoff structure, reservation values or negotiation style [148]. Based on the conditional probabilities of occurring events, such as received offers or changes in the environment, the probability distributions are updated using Bayes' rule. This can be formally written as follows.
Suppose that an agent has a priori knowledge about the likelihood of a set of hypotheses $H_i$ with $i = 1, 2, \dots, n$. Then, given the conditional knowledge about the probability that an event $e$ occurs, the agent's belief is updated with

$$P(H_i \mid e) = \frac{P(H_i)\, P(e \mid H_i)}{\sum_{k=1}^{n} P(e \mid H_k)\, P(H_k)}$$

where $P(H_i \mid e)$ is the posterior probability of the hypothesis $H_i$, $P(H_i)$ is the prior probability, and $P(e \mid H_i)$ is the conditional probability that event $e$ occurs given the hypothesis $H_i$. The major disadvantage of this approach is that domain-specific knowledge or empirical data about other players' behaviour are rarely available. Furthermore, the agent still needs negotiation strategies which utilize the probabilities of the belief structure: the approach does not provide the agent with a strategy for the negotiation process, e.g. in the form of its concession behaviour, but only with a probabilistic belief structure about particular parameters of the opponent (such as the reservation value). It can therefore also not represent models or beliefs about the negotiation partner's explicit behaviour.

The Bayesian inference method has also been applied in many other works on negotiation. For example, in [6] a Bayesian network models various negotiation contexts in order to help an agent find the best offer that is likely to be accepted. Hindriks and Tykhonov [56] apply Bayesian learning to learn a model of the opponent's issue preferences based on the exchanged offers, under the assumptions that part of the preference structure and the rationality of the bidding process are known. Tesauro [133] uses Bayesian inference and combinatorial search to estimate the expected value of a negotiation based on prior beliefs; the combinatorial search is required to search the game tree effectively for the best actions. In all of these approaches, the agent's beliefs in the form of prior probability distributions are assumed to be available.

Neural networks have also been investigated [103] in order to learn and predict the next proposals from the series of historical offers. The approach provides good results for medium to long deadlines, as the network needs to adapt to the new negotiation context online. As the network needs a number of pre-training steps and initializations in order to find well-performing weights, some knowledge about the opponent's behaviour is required in advance.
For that reason, the adaptation of the network takes much longer in situations where the negotiation partner plays strategies very different from those in the training set.

Regression analysis aims at finding the parameters of a given model, consisting of a single negotiation decision function or a combination of them, based on the negotiation thread of the current encounter. Assuming a limited set of tactics or strategies, the agent may also consider a number of models and use the best-fitting one for the prediction. Brzostowski and Kowalczyk [19] applied parametric non-linear regression analysis to predict the opponent's behaviour for static mixed strategies with time- and behaviour-dependent tactics. For polynomial decision functions, next offers could be predicted after 5 to 6 rounds, whereas exponential ones required at least 12 offers. Hou [57] uses the same techniques to predict pure tactics, and the author claims that reservation points and deadlines can be predicted using regression analysis. However, Brzostowski provided a formal proof in [21] demonstrating that even for pure tactics, such as simple time-dependent polynomial decision functions, it is impossible to derive reservation values or deadlines from the prediction. Although this approach does not need any empirical data before the negotiation starts, its disadvantages are the computationally expensive prediction and the assumption of a number of predefined models which the partner may use to specify its strategy.

Case-based reasoning (CBR) in negotiation captures and reuses previous negotiation cases. One of the first to use case-based reasoning in multi-agent negotiation was Sycara [131], where negotiation is performed through proposals and goal relaxations using solutions provided by the most similar cases. Wong et al. use concessions to capture episodic strategies and apply filters to find the best matches of buyer and seller concessions between the cases and the current encounter [139]. Matos and Sierra apply case-based reasoning in combination with fuzzy rules, where the cases are used to adjust the parameters and weights of combined decision functions [93]. This requires that not only the negotiation thread but also the applied negotiation strategies are captured, which in many cases inhibits the use of cases from other agents, especially if they apply different decision models. In all cases, successful negotiations are added to the case base for later retrieval. Typically a large number of cases is required to obtain good results, e.g. in marketplace scenarios where different agents may exhibit completely different behaviours. Another approach is possibilistic case-based reasoning, which is applied in [17] to predict successful negotiations for potential partners.
Based on the principle that similar problems require similar solutions, similarity degrees are derived for each case with regard to the current situation in order to obtain the qualitative expected utility for each potential partner. This approach also provides good results for a small number of cases. However, this method has not been applied to update and generate negotiation strategies during a negotiation encounter.

Since negotiation can be regarded as a sequential decision problem, it has been modelled as a Markov decision process (MDP) in [96] and [134]. An MDP is a stochastic process with observable states, which can change on the input of actions at discrete time steps. The Markov property assumes that the state transition of a system depends only on the current state and is independent of the history of states. The state transition is defined via a transition function giving the probability of the next state given the current state and action. Based on a reward scheme, optimal policies can be calculated which provide the optimal action to be taken in each state. In automated negotiation, the main difficulty appears to be how to define the state space and the transition matrix. Narayanan and Jennings [96] model the agent's own behaviour by defining the states in terms of resource availability, deadlines and reservation values. Depending on the opponent's offers, the algorithm proposes counteroffers by considering changes in those three dimensions. It is shown that agreements can be achieved much faster, but only when both agents use this algorithm. Teuteberg [134] defined the state space by a finite set of predefined tactics reflecting the behaviour of the partner. During the encounter, probabilities for the transition matrix are derived from the frequency of applied tactics and their changes. The major drawback is that a large number of negotiation rounds is needed to obtain sufficient empirical data for a meaningful state transition matrix.

Evolutionary approaches provide a good means to empirically learn the best negotiation strategies [102, 93, 94], as many negotiation models operate in a wide range of environments with a large number of parameters. Using genetic algorithms, the process works as follows. The strategy parameters are typically encoded as chromosomes, each of which represents a candidate solution. First, a random population of candidates is generated. By evaluating a fitness function, the fittest candidates are selected as parents for generating a new population. While some parents may be preserved, new candidates are created by crossover and mutation operations. The process then starts again with the newly created population.
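The genetic-algorithm loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup used in [102, 93, 94]: chromosomes are hypothetical lists of real-valued strategy parameters in [0, 1], selection keeps the fitter half, crossover is one-point, mutation is Gaussian, and the fitness function in the usage example is a made-up stand-in for the payoff a strategy would earn in simulated encounters.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[float]  # hypothetical encoding: strategy parameters in [0, 1]


def genetic_search(
    fitness: Callable[[Chromosome], float],
    n_genes: int,
    pop_size: int = 30,
    generations: int = 50,
    elite: int = 2,
    mutation_rate: float = 0.1,
    mutation_scale: float = 0.1,
    seed: int = 0,
) -> Tuple[Chromosome, List[float]]:
    """Evolve strategy parameters; return the best chromosome and the best
    fitness of every generation."""
    rng = random.Random(seed)
    # random initial population of candidate strategies
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # evaluate the fitness function and rank candidates, fittest first
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: max(2, pop_size // 2)]  # truncation selection
        # preserve the best parents unchanged (elitism)
        children = [list(c) for c in ranked[:elite]]
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]  # one-point crossover
            # Gaussian mutation, clipped back into [0, 1]
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, mutation_scale)))
                if rng.random() < mutation_rate
                else g
                for g in child
            ]
            children.append(child)
        population = children  # start again with the new population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


def toy_fitness(c: Chromosome) -> float:
    """Hypothetical fitness: strategies closest to (0.3, 0.7) score best."""
    return -((c[0] - 0.3) ** 2 + (c[1] - 0.7) ** 2)


best, history = genetic_search(toy_fitness, n_genes=2)
```

Because the best candidates are copied unchanged into the next generation, the best fitness never decreases from one generation to the next; without elitism, crossover and mutation could lose a good solution.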
The work in [102] applied genetic algorithms to automated negotiation before the idea of tactics and decision functions was proposed, so strategies were defined more simply in the form of threshold decision rules. However, the outcomes were evaluated along a number of different dimensions, such as joint outcomes, nearness to the efficient frontier and similarity to the outcomes of human negotiations. Matos et al. [94, 93] applied genetic algorithms to more advanced strategies, such as mixed strategies in [94], and to different architectures for automated negotiation, such as case-based reasoning or fuzzy rules in [93]. Although genetic algorithms provide a good means for evaluation, their application in real-world scenarios is limited by the need to search a large part of the strategy space.
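Of the learning approaches reviewed in this section, Q-learning and Bayesian learning rest on compact update rules: the Q-value update and Bayes' rule over a finite set of hypotheses, both stated earlier in this section. The sketch below implements the two rules as written there; the dictionary-based Q-table, the default value 0 for unseen state-action pairs, and the hypothetical "low"/"high" reservation-price hypotheses with made-up likelihoods in the usage example are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, reward: float,
             s_next: Hashable, next_actions: Iterable[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)].

    Unseen state-action pairs are assumed to have value 0.
    """
    best_next = max((q.get((s_next, b), 0.0) for b in next_actions), default=0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
    return q[(s, a)]


def bayes_update(prior: Dict[Hashable, float],
                 likelihood: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Posterior P(H_i | e) = P(H_i) P(e | H_i) / sum_k P(e | H_k) P(H_k)."""
    evidence = sum(prior[h] * likelihood[h] for h in prior)
    if evidence == 0.0:
        raise ValueError("the event has zero probability under all hypotheses")
    return {h: prior[h] * likelihood[h] / evidence for h in prior}


# Hypothetical usage: two hypotheses about the opponent's reservation price,
# updated after observing a large concession.
belief = bayes_update({"low": 0.5, "high": 0.5}, {"low": 0.8, "high": 0.2})
# belief["low"] is 0.8 (up to floating-point rounding)
```

Both rules need input the agent must already have: the Q-update needs many rewarded trials before the table becomes informative, and the Bayesian update needs priors and likelihoods, which is exactly the knowledge the disadvantages above refer to.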

Other approaches include modelling negotiation as a distributed constraint satisfaction problem (DCSP) or using some form of fuzzy logic. When negotiation is modelled as a DCSP, the negotiation domain is represented by a set of variables, and the set of constraints over these variables is partitioned between the parties. The aim is to find a solution by exchanging information in the form of offers until all constraints are satisfied. The constraints are adjusted based on all offers and therefore guide the search for the solution [77]. The classical DCSP considers constraints that can be precisely defined and fully satisfied [143], which may limit its applicability in many real-world negotiation problems, where preferences and constraints are imprecise and soft. As a result, the use of fuzzy constraints has been proposed, which is discussed among other fuzzy logic approaches in the next section.

Fuzzy Logic-based Approaches in Negotiation

Two major approaches of fuzzy logic-based reasoning have been employed for decision-making in automated negotiation: fuzzy if-then rules, and modelling the negotiation process as a fuzzy constraint satisfaction problem. Fuzzy if-then rules provide a flexible means to model a negotiation strategy using changes in the environment, for example the current market condition, or inputs from the negotiation partners such as their offers. The fuzzy rules can serve different purposes: they may determine the behaviour of the agent directly, for example in the form of its concessions, or they may adjust the parameters of an existing decision model, such as the weights and parameters of a combination of tactics, taking into account the agent's information and its mental state [93].
For example, a fuzzy rule acting on a number of input variables $x_1, \dots, x_n$, representing states in the environment or the mental state of an agent, can be formulated as follows [93]:

$$\text{Rule } i: \ \text{IF } x_1 \text{ is } A_{i1} \text{ AND } \dots \text{ AND } x_n \text{ is } A_{in} \text{ THEN } y \text{ is } B_i \tag{2.15}$$

where $y$ is a parameter of the negotiation strategy and $A_{i1}, \dots, A_{in}, B_i$ are linguistic values in the universe of discourse corresponding to particular settings of the agent's decision strategy parameters. Matos et al. [93] also showed that fuzzy rules may be employed to assist the reasoning of other approaches, such as case-based reasoning, in order to adapt the proposed strategy to changes in the environment.
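How a set of rules of the form (2.15) turns crisp inputs into a value of the strategy parameter $y$ can be sketched with a common Mamdani-style scheme: the minimum over the premises gives each rule's firing strength, each consequent is clipped at that strength, the rules are combined by the maximum, and the result is defuzzified by the centre of area on a grid. This inference scheme is an assumption for illustration and not necessarily the one used in [93]; the triangular membership functions and the two rules (time pressure driving the concession size) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Membership = Callable[[float], float]
Rule = Tuple[Sequence[Membership], Membership]  # (premises A_i1..A_in, consequent B_i)


def triangle(a: float, b: float, c: float) -> Membership:
    """Triangular membership function with feet a, c and peak b (a <= b <= c)."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu


def infer(rules: Sequence[Rule], inputs: Sequence[float],
          y_grid: Sequence[float]) -> float:
    """Evaluate 'IF x1 is A_i1 AND ... THEN y is B_i' rules; return a crisp y."""
    aggregated = [0.0] * len(y_grid)
    for premises, consequent in rules:
        # AND over the premises via the minimum t-norm gives the firing strength
        strength = min(mu(x) for mu, x in zip(premises, inputs))
        for k, y in enumerate(y_grid):
            # clip the consequent at the firing strength; combine rules by max
            aggregated[k] = max(aggregated[k], min(strength, consequent(y)))
    total = sum(aggregated)
    if total == 0.0:
        raise ValueError("no rule fired for these inputs")
    # centre-of-area defuzzification over the discrete grid
    return sum(y * m for y, m in zip(y_grid, aggregated)) / total


# Hypothetical rules on one input, the fraction of negotiation time elapsed:
#   IF time is early THEN concession is small
#   IF time is late  THEN concession is large
rules: List[Rule] = [
    ([triangle(0.0, 0.0, 0.6)], triangle(0.0, 0.0, 0.5)),
    ([triangle(0.4, 1.0, 1.0)], triangle(0.5, 1.0, 1.0)),
]
grid = [k / 100 for k in range(101)]
early, late = infer(rules, [0.2], grid), infer(rules, [0.9], grid)
# early < 0.5 < late: a larger concession is proposed close to the deadline
```

Replacing the minimum by the algebraic product, or the centre of area by the maximizing value, yields other common variants of the same scheme.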

In [127] the fuzzy rules are used to react flexibly to changing market conditions, such as different trading options, competition and deadlines, and to adjust the agents' concession-making strategies accordingly. Rather than adjusting the strategy parameters, that approach applies a fuzzy decision controller that uses the fuzzy rules to relax an agent's trading condition, i.e. its aspiration level, and applies different sets of rules under certain conditions, for example when an agent is under time pressure. Even though fuzzy rules provide a flexible and simple method for modelling negotiation behaviour, in many cases the modelling process requires human input or a sufficient amount of data for the initial rule generation, and is thus difficult to automate. This is even more so in dynamic environments, where the rules have to adapt quickly and automatically to changes in the environment and to different behaviours of other agents.

Negotiation has been modelled as a fuzzy constraint satisfaction problem in [91, 75, 76], where constraints, preferences and objectives are represented uniformly as fuzzy constraints distributed among the parties. The fuzzy constraints are represented by membership functions which define the degree to which a particular proposed solution satisfies the constraints. For example, in [76] a fuzzy relation $C^j(x^j)$ corresponds to the set of constraints $C^j = \{C^j_k\}$ of the $j$th party with $k = 1, \dots, m_j$, such that

$$C^j(x^j) = \bigwedge_{k = 1, \dots, m_j} C^j_k(x^j) \tag{2.16}$$

where $\wedge$ is a conjunctive combination operator. By exchanging their preferred solutions according to the level of constraint satisfaction, the agents iteratively relax their constraints. The fuzzy constraints can therefore be considered as fuzzy relations over all issues between the agents, which are iteratively relaxed while the parties exchange their preferred solutions in the form of offers [15].
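The conjunctive combination in (2.16) can be sketched by taking the minimum as the operator, so that an offer satisfies a party's constraints only as well as it satisfies the weakest one. In this sketch, offers are dictionaries of issue values, the two buyer constraints are hypothetical membership functions, and lowering an acceptance threshold stands in, as an assumption, for the iterative relaxation described above.

```python
from typing import Callable, Dict, List, Sequence

Offer = Dict[str, float]
FuzzyConstraint = Callable[[Offer], float]  # membership degree in [0, 1]


def satisfaction(offer: Offer, constraints: Sequence[FuzzyConstraint]) -> float:
    """Joint satisfaction of all constraints of one party, min as the operator."""
    return min(c(offer) for c in constraints)


def acceptable(offers: Sequence[Offer], constraints: Sequence[FuzzyConstraint],
               threshold: float) -> List[Offer]:
    """Offers whose joint satisfaction reaches the current threshold."""
    return [o for o in offers if satisfaction(o, constraints) >= threshold]


def clamp(v: float) -> float:
    return min(1.0, max(0.0, v))


# Hypothetical buyer constraints: "price is low" and "delivery is fast".
buyer = [
    lambda o: clamp((120 - o["price"]) / 40),  # 1 at or below 80, 0 from 120
    lambda o: clamp((10 - o["days"]) / 5),     # 1 within 5 days, 0 from 10 days
]
offers = [{"price": 90, "days": 4}, {"price": 110, "days": 2}, {"price": 85, "days": 9}]
for threshold in (0.7, 0.2):  # relaxing the threshold admits more offers
    print(threshold, acceptable(offers, buyer, threshold))
```

With threshold 0.7 only the first offer (satisfaction 0.75) qualifies; relaxing to 0.2 also admits the other two (0.25 and 0.2), because each is judged by its weakest constraint.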
The agents search for an agreement that satisfies the constraints of all agents, guided by the individual negotiation strategy of each party. In this sense, fuzzy constraint-based reasoning assists the search process by ordering and pruning the search space of each party, and maximizes the satisfaction level of the final agreement for all agents [76]. Another fuzzy constraint-based approach, which additionally uses fuzzy similarity to select the alternative most likely to be accepted by the opponent, is proposed in [82, 87]. This allows offers to be selected on the basis of the concession strategies proposed there, which, however, makes it difficult to apply other strategy models. Luo et al. [92, 90] use prioritised fuzzy constraints in order to express priority over issues and constraints such that fair deals can be found.

However, in their model the exchanged offers contain information about the particular constraints, which limits its use with traditional negotiation protocols where only information about the negotiation issues is exchanged.

2.4 Multistage Fuzzy Decision-Making

The process of negotiation can be considered a multistage decision process in which each agent needs to make a decision at each stage of the encounter in order to find a solution that satisfies both agents. In such encounters, an agent also needs to be able to incorporate limited knowledge about the opponent's concession behaviour, for example from a few reference cases, into its decision process while following a particular negotiation strategy at the same time. Multistage fuzzy control seems to be a good candidate for modelling such decision problems. Therefore, in the following sections we recall the basic concepts of fuzzy set theory and fuzzy decision-making with goals and constraints, and give a brief introduction to multistage fuzzy decision models for deterministic and stochastic systems. For a more thorough introduction to the subject matter we refer to the large body of available literature, especially [68], [144] and [7]. An excellent overview and introduction to fuzzy set theory and multistage fuzzy decision models can be found in [68].

Fuzzy Decision-Making

A fuzzy set $A$ in the universe of discourse $X$ is defined as a set of pairs

$$A = \{(\mu_A(x), x)\} \tag{2.17}$$

where $\mu_A : X \to [0, 1]$ is the membership function of $A$ and $\mu_A(x)$ determines the grade of membership of an element $x \in X$ in the fuzzy set $A$. While in conventional set theory elements either belong to a set or not, elements of a fuzzy set can belong to the set to a degree specified by the membership function. The universe of discourse $X$ is the set containing all possible elements.
A fuzzy set is therefore a set of pairs containing particular elements of the universe of discourse and their degrees of membership. Similar to conventional sets, fuzzy set theory has the basic operations of complement, intersection and union. The complement of a fuzzy set $A$ is written as $\bar{A}$ and is defined as

$$\mu_{\bar{A}}(x) = 1 - \mu_A(x) \tag{2.18}$$

for each $x \in X$. The intersection $A \cap B$ of two fuzzy sets $A$ and $B$ is defined by a so-called t-norm $t : [0, 1] \times [0, 1] \to [0, 1]$. The most widely used t-norm is the minimum

$$\mu_{A \cap B}(x) = \mu_A(x) \wedge \mu_B(x) \tag{2.19}$$

where the operator $\wedge$ represents the minimum operation, i.e. $a \wedge b = \min(a, b)$. A large number of other t-norms have been proposed in the literature, for example the algebraic product $\mu_{A \cap B}(x) = \mu_A(x) \cdot \mu_B(x)$ or the Łukasiewicz t-norm $\mu_{A \cap B}(x) = \max(0, \mu_A(x) + \mu_B(x) - 1)$. The union of two fuzzy sets $A$ and $B$ is written as $A \cup B$ and is defined in terms of an s-norm (or t-conorm) $s : [0, 1] \times [0, 1] \to [0, 1]$. The most widely used s-norm is the maximum

$$\mu_{A \cup B}(x) = \mu_A(x) \vee \mu_B(x) \tag{2.20}$$

where the operator $\vee$ is the maximum operation, i.e. $a \vee b = \max(a, b)$. Other s-norms are, for example, the probabilistic sum $\mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x) \cdot \mu_B(x)$ or the Łukasiewicz s-norm $\mu_{A \cup B}(x) = \min(\mu_A(x) + \mu_B(x), 1)$.

Another important concept in fuzzy set theory is the fuzzy relation between conventional sets. A fuzzy relation $R$ between two non-fuzzy sets $X$ and $Y$ is defined in the Cartesian product space $X \times Y$: $R = \{(\mu_R(x, y), (x, y))\}$ for each $(x, y) \in X \times Y$, with $\mu_R : X \times Y \to [0, 1]$. A binary fuzzy relation is thus a fuzzy set specifying the membership of element pairs in the relation between two non-fuzzy sets. Similarly, an n-ary fuzzy relation is defined in $X_1 \times \dots \times X_n$. A fuzzy composition $R \circ S$ combines two fuzzy relations $R$ in $X \times Y$ and $S$ in $Y \times Z$. For example, the max-min and max-product compositions are written as

$$\begin{aligned} \mu_{R \circ_{\max\text{-}\min} S}(x, z) &= \max_{y \in Y} \left[ \mu_R(x, y) \wedge \mu_S(y, z) \right] \\ \mu_{R \circ_{\max\text{-}\mathrm{prod}} S}(x, z) &= \max_{y \in Y} \left[ \mu_R(x, y) \cdot \mu_S(y, z) \right] \end{aligned} \tag{2.21}$$

for each $x \in X$, $z \in Z$. The following example demonstrates the composition of two binary fuzzy relations.

Example 2.1 (Fuzzy composition). Assume that $X = \{1, 2\}$, $Y = \{1, 2, 3\}$ and $Z = \{1, 2, 3, 4\}$. Given the fuzzy relations $R$ and $S$ below, the max-min composition $R \circ S$ is given by:

[Matrices: $R$ (rows $x$, columns $y$), $S$ (rows $y$, columns $z$), $R \circ S$ (rows $x$, columns $z$).]

Throughout the thesis, we use the basic intersection and union aggregations of fuzzy sets with the simple operators $\wedge$ and $\vee$, respectively. However, other t- and s-norms can be applied depending on the context and the decision problem.

A fuzzy decision problem can now be defined using a fuzzy goal and a fuzzy constraint. Assume that the set $X$ contains the elements of the decision problem, such as actions or options. A fuzzy goal is defined as a fuzzy set $G$ with the membership function $\mu_G : X \to [0, 1]$ that specifies the grade of membership of an option $x \in X$ in the fuzzy goal. Similarly, a fuzzy constraint is defined as a fuzzy set $C$ in the set of options $X$, such that $\mu_C(x) \in [0, 1]$ determines the membership grade of a particular option $x$ in the fuzzy constraint. Since the decision problem is typically "attain $G$ and satisfy $C$", the decision can be found by aggregating the two fuzzy sets. The fuzzy decision $D$ is then also a fuzzy set in the set of options $X$, which results from an aggregation operator $* : [0, 1] \times [0, 1] \to [0, 1]$ applied to $G$ and $C$ such that

$$\mu_D(x) = \mu_C(x) * \mu_G(x) \tag{2.22}$$

Because of the "and" connective in the decision problem "attain $G$ and satisfy $C$", a t-norm aggregation such as the minimum should be used here. The min-type fuzzy decision is then

$$\mu_D(x) = \mu_C(x) \wedge \mu_G(x). \tag{2.23}$$
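On finite universes, the operations above reduce to a few lines of code. The sketch below stores fuzzy relations as dictionaries keyed by element pairs and implements the t-norms and s-norms mentioned above, the max-min and max-product compositions (2.21), and the min-type fuzzy decision (2.23). The membership values in the usage example are hypothetical; they are not the values of Example 2.1.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Norm = Callable[[float, float], float]
Relation = Dict[Tuple[Hashable, Hashable], float]


# t-norms (intersection)
def t_min(a: float, b: float) -> float:
    return min(a, b)


def t_prod(a: float, b: float) -> float:
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)


# s-norms (union)
def s_max(a: float, b: float) -> float:
    return max(a, b)


def s_prob(a: float, b: float) -> float:
    return a + b - a * b


def s_lukasiewicz(a: float, b: float) -> float:
    return min(1.0, a + b)


def compose(r: Relation, s: Relation, xs: Iterable[Hashable],
            ys: Iterable[Hashable], zs: Iterable[Hashable],
            tnorm: Norm = t_min) -> Relation:
    """Max-t composition of R in X x Y and S in Y x Z
    (t_min gives max-min, t_prod gives max-product)."""
    ys, zs = list(ys), list(zs)
    return {(x, z): max(tnorm(r[(x, y)], s[(y, z)]) for y in ys)
            for x in xs for z in zs}


def min_decision(mu_c: Dict[Hashable, float],
                 mu_g: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Min-type fuzzy decision: mu_D(x) = mu_C(x) AND mu_G(x)."""
    return {x: t_min(mu_c[x], mu_g[x]) for x in mu_c}


# Hypothetical relations on X = {1, 2}, Y = {1, 2, 3}, Z = {1, 2, 3, 4}
X, Y, Z = [1, 2], [1, 2, 3], [1, 2, 3, 4]
R = {(x, y): v for x, row in zip(X, [[0.2, 0.8, 0.5], [1.0, 0.3, 0.6]])
     for y, v in zip(Y, row)}
S = {(y, z): v for y, row in zip(Y, [[0.4, 0.9, 0.0, 0.7],
                                    [0.6, 0.1, 1.0, 0.3],
                                    [0.5, 0.5, 0.2, 0.9]])
     for z, v in zip(Z, row)}
max_min = compose(R, S, X, Y, Z)                 # max_min[(1, 1)] == 0.6
max_prod = compose(R, S, X, Y, Z, tnorm=t_prod)  # max_prod[(1, 1)] is about 0.48
```

For the entry $(1, 1)$, the max-min composition takes $\max(0.2 \wedge 0.4,\ 0.8 \wedge 0.6,\ 0.5 \wedge 0.5) = 0.6$, i.e. the strongest chain from $x = 1$ through some $y$ to $z = 1$.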

Since the fuzzy decision is a fuzzy solution to the decision problem, we need the optimal non-fuzzy decision, i.e. the solution that maximizes the degree of membership in the fuzzy decision. The maximizing decision $x^* \in X$ is then defined as

$$\mu_D(x^*) = \max_{x \in X} \mu_D(x). \tag{2.24}$$

The maximizing decision is essentially a defuzzification of the fuzzy decision, and the maximum above is only the simplest such method. Depending on the decision problem, other more suitable methods may be used, such as the center-of-area method

$$x^* = \frac{\sum_{i=1}^{n} x_i\, \mu_D(x_i)}{\sum_{i=1}^{n} \mu_D(x_i)}. \tag{2.25}$$

However, for the multistage fuzzy decision problems in this thesis the maximizing decision is sufficient. The following example further demonstrates the fuzzy decision.

Example 2.2 (Fuzzy decision). Suppose that $X = \mathbb{R}$, the set of real numbers, the fuzzy goal is "$x$ should be much larger than 5" and the fuzzy constraint is "$x$ should be about 6". Both the fuzzy constraint and the fuzzy goal are shown in Figure 2.3. The coloured area in the figure represents the min-type fuzzy decision. The set of possible options is hence the interval $[5, 10]$, because the membership degree of the fuzzy decision is zero outside this interval. The maximizing decision is then $x^* = 7.5$. The value of the fuzzy decision $\mu_D(x) \in [0, 1]$ can also be interpreted as the satisfaction level, i.e. how well the fuzzy goal and the fuzzy constraint are satisfied. Intuitively, the maximizing decision has the highest satisfaction level.

Figure 2.3: Fuzzy decision

Similar to the decision problem with two fuzzy sets, a fuzzy decision problem can have multiple fuzzy goals and constraints, written as

$$D = C_1 \cap \dots \cap C_m \cap G_1 \cap \dots \cap G_n \tag{2.26}$$

where $D$ is the fuzzy decision in a fuzzy environment specified by $n$ fuzzy goals $G_1, \dots, G_n$ and $m$ fuzzy constraints $C_1, \dots, C_m$. Both fuzzy goals and fuzzy constraints are fuzzy sets in the set of options $X$. The maximizing decision can then again be used to find the optimal decision. In the next section we recall multistage fuzzy decision-making for different types of dynamic systems.

Multistage Fuzzy Decision Making in Deterministic and Stochastic Systems

We now consider decision problems that are more dynamic, in that a sequence of decisions has to be found that moves a system from the current state into a desired state. The discrete time moments at which decisions are made are called stages, while the input-output relationship of the system is also referred to as the system under control. In this context, the decisions are called controls or actions; in the following we use the latter interchangeably with decision. Assume that the state space of the system is $X = \{\sigma_1, \dots, \sigma_n\}$ and the action space is $U = \{\alpha_1, \dots, \alpha_m\}$. The decision process starts with an initial state of the system $x_0 \in X$, in which the action $u_0 \in U$ is subjected to a fuzzy constraint $\mu_{C^0}(u_0)$ and applied to the system. The system then moves to the next state $x_1 \in X$ at stage 1, which may be subjected to a fuzzy goal $\mu_{G^1}(x_1)$. The process repeats with state $x_1$, where the action $u_1$ is subjected to the fuzzy constraint $C^1$ so that state $x_2 \in X$ is attained, and so on. Suppose that we have a deterministic system under control whose state transitions are governed by the state transition function

$$x_{t+1} = f(x_t, u_t) \tag{2.27}$$

where $x_{t+1}, x_t \in X$ and $u_t \in U$, with $t = 0, 1, \dots$ being the discrete time points or stages. In addition, we assume that the decision process is finite and that a fuzzy goal is imposed only at the last stage $N$. We only consider multistage fuzzy decision models with finite termination times here because in the automated negotiation context the agents have deadlines. This means that the termination time is fixed and specified in advance. Since only one fuzzy goal $\mu_{G^N}(x_N)$ at the final stage is used, the focus of the decision process is to reach the best possible state at the end of the process. The decision process under these assumptions is illustrated in Figure 2.4.

Figure 2.4: Multistage fuzzy decision process

The fuzzy decision determines the performance of the multistage decision process, being the aggregate of the consecutive constraints at the stages and the fuzzy goal, such that

$$D(x_0) = C^0 * \dots * C^{N-1} * G^N, \tag{2.28}$$

where $*$ is the aggregation operator. The consecutively attained states are given by the state transition function (2.27) applied at each stage, i.e.

$$\begin{aligned} x_1 &= f(x_0, u_0) \\ x_2 &= f(x_1, u_1) = f(f(x_0, u_0), u_1) \\ &\;\;\vdots \\ x_N &= f(x_{N-1}, u_{N-1}) = f(f(\dots f(x_0, u_0), \dots, u_{N-2}), u_{N-1}). \end{aligned} \tag{2.29}$$

The fuzzy decision using the minimum is then given by

$$\mu_D(u_0, \dots, u_{N-1} \mid x_0) = \mu_{C^0}(u_0) \wedge \dots \wedge \mu_{C^{N-1}}(u_{N-1}) \wedge \mu_{G^N}(x_N), \tag{2.30}$$

where $x_N$ is uniquely determined by the initial state $x_0$ and the action trajectory $(u_0, \dots, u_{N-1})$ via the state transition function. The problem is now to find the optimal sequence of actions $u_0^*, \dots, u_{N-1}^* \in U$ that maximizes the fuzzy decision:

$$\mu_D(u_0^*, \dots, u_{N-1}^* \mid x_0) = \max_{u_0, \dots, u_{N-1}} \mu_D(u_0, \dots, u_{N-1} \mid x_0). \tag{2.31}$$

A number of algorithms have been proposed in the literature to solve this problem, such as dynamic programming, branch-and-bound, genetic algorithms and neural networks [69]. Among those, the dynamic programming solution, proposed in the seminal paper of Bellman and Zadeh [7], is the most widely used. For that reason, we briefly outline the dynamic programming solution in the following and refer to [69] for a more thorough discussion of other solution approaches. Using the state transition function, the above maximizing decision can also be written as

$$\mu_D(u_0^*, \dots, u_{N-1}^* \mid x_0) = \max_{u_0, \dots, u_{N-1}} \left[ \mu_{C^0}(u_0) \wedge \dots \wedge \mu_{C^{N-1}}(u_{N-1}) \wedge \mu_{G^N}(f(x_{N-1}, u_{N-1})) \right]. \tag{2.32}$$

For a given state $x_{N-1}$, the two rightmost terms depend only on the action $u_{N-1}$ at stage $N-1$ and not on any previous actions. This makes the application of dynamic programming possible, as the maximization can be divided into maximizing over the action sequence $u_0, \dots, u_{N-2}$ and maximizing over the action $u_{N-1}$. The same line of reasoning can be applied to the next term $\mu_{C^{N-2}}(u_{N-2})$, which depends only on the action $u_{N-2}$, such that the maximizing decision can be written as

$$\mu_D(u_0^*, \dots, u_{N-1}^* \mid x_0) = \max_{u_0, \dots, u_{N-3}} \Big[ \mu_{C^0}(u_0) \wedge \dots \wedge \mu_{C^{N-3}}(u_{N-3}) \wedge \max_{u_{N-2}} \big[ \mu_{C^{N-2}}(u_{N-2}) \wedge \max_{u_{N-1}} \big[ \mu_{C^{N-1}}(u_{N-1}) \wedge \mu_{G^N}(f(x_{N-1}, u_{N-1})) \big] \big] \Big] \tag{2.33}$$

This backward iteration can be repeated down to $u_0$ and therefore represents the dynamic programming solution of this problem. Based on this iteration, one can derive the recurrence equations for dynamic programming:

$$\mu_{G^{N-i}}(x_{N-i}) = \max_{u_{N-i}} \left[ \mu_{C^{N-i}}(u_{N-i}) \wedge \mu_{G^{N-i+1}}(x_{N-i+1}) \right] \tag{2.34}$$

$$x_{N-i+1} = f(x_{N-i}, u_{N-i}) \tag{2.35}$$

where $i = 1, \dots, N$ and $\mu_{G^{N-i}}(x_{N-i})$ can be regarded as a fuzzy goal at stage $t = N - i$ induced by the fuzzy goal at the next stage $t = N - i + 1$. The optimal sequence of actions is therefore given by the successive maximization over the actions $u_{N-i}$ with $i = 1, \dots, N$. Since each optimal action $u^*_{N-i}$ depends on the state $x_{N-i}$ at the same stage, the solution is expressed in terms of an optimal policy function $a_{N-i} : X \to U$ that assigns to each state the optimal action at stages $i = 1, \dots, N$, such that

$$u^*_{N-i} = a_{N-i}(x_{N-i}). \tag{2.36}$$

An optimal solution only exists if there is at least one action sequence for which $\mu_D(u_0, \dots, u_{N-1} \mid x_0) > 0$. The set $A = \{a_0, \dots, a_{N-1}\}$ then forms the optimal action strategy.

Let us now assume that, instead of being deterministic, the state transitions are governed by a conditional probability function

$$p(x_{t+1} \mid x_t, u_t), \tag{2.37}$$

where $x_t, x_{t+1} \in X$ and $u_t \in U$ with $t = 0, 1, \dots, N-1$. This corresponds to a Markov decision process in which the fuzzy goals and constraints, imposed at the respective stages of the process, represent the fuzzy environment in which a decision is to be found, given the time-invariant transition function and the fixed termination time. As for the deterministic system, we consider the case where the final outcome at the last stage $N$ is of most importance, so that only one fuzzy goal $G^N$ is imposed. The decision problem according to Bellman and Zadeh [7] is then to find an optimal sequence of controls $u_0^*, \dots, u_{N-1}^*$ that maximizes the probability of attaining the fuzzy goal subject to the fuzzy constraints, written as

$$\mu_D(u_0^*, \dots, u_{N-1}^* \mid x_0) = \max_{u_0, \dots, u_{N-1}} \left[ \mu_{C^0}(u_0) \wedge \dots \wedge \mu_{C^{N-1}}(u_{N-1}) \wedge E\mu_{G^N}(x_N) \right]. \tag{2.38}$$
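For small, finite state and action spaces, the Bellman-Zadeh backward recursion can be sketched directly. The function below computes the induced fuzzy goals and the policy functions $a_t$ stage by stage, using the minimum as the aggregation operator. As an assumption of this sketch, the transition model returns a distribution over successor states: a deterministic system $x_{t+1} = f(x_t, u_t)$ as in (2.27) is the special case that puts probability 1 on $f(x_t, u_t)$, which recovers (2.34) and (2.35), while a genuine distribution $p(x_{t+1} \mid x_t, u_t)$ as in (2.37) weights the goal values of the successor states by their probabilities. The states, actions and membership functions in the usage example are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable
Distribution = Dict[State, float]  # successor state -> probability


def fuzzy_dp(
    states: Sequence[State],
    actions: Sequence[Action],
    horizon: int,
    transition: Callable[[State, Action], Distribution],
    mu_c: Callable[[int, Action], float],
    mu_goal: Callable[[State], float],
) -> Tuple[List[Dict[State, Action]], Dict[State, float]]:
    """Backward recursion with min aggregation for a finite horizon N.

    Returns the policy functions a_0, ..., a_{N-1} (one dict per stage) and
    the induced fuzzy goal at stage 0, i.e. the best attainable degree of
    the fuzzy decision from each initial state.
    """
    goal = {x: mu_goal(x) for x in states}  # fuzzy goal G^N at the last stage
    policy: List[Dict[State, Action]] = [{} for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):  # stages N-1, ..., 0
        induced: Dict[State, float] = {}
        for x in states:
            best_u, best_val = None, -1.0
            for u in actions:
                # goal value of the successor state, weighted by its probability
                expected = sum(p * goal[y] for y, p in transition(x, u).items())
                val = min(mu_c(t, u), expected)  # constraint AND induced goal
                if val > best_val:
                    best_u, best_val = u, val
            policy[t][x] = best_u
            induced[x] = best_val  # induced fuzzy goal G^t
        goal = induced
    return policy, goal


# Hypothetical example: states 0..3 are levels of an offer; action 1 moves one
# level up (less preferred under the fuzzy constraint), action 0 stays.
policy, mu_g0 = fuzzy_dp(
    states=[0, 1, 2, 3],
    actions=[0, 1],
    horizon=2,
    transition=lambda x, u: {min(x + u, 3): 1.0},  # deterministic f(x, u)
    mu_c=lambda t, u: 1.0 if u == 0 else 0.8,
    mu_goal=lambda x: x / 3,
)
# From x_0 = 1 the optimal strategy moves up twice: policy[0][1] == 1,
# policy[1][2] == 1 and mu_g0[1] == 0.8.
```

From $x_0 = 1$, moving up twice reaches the goal level with degree $0.8 \wedge 0.8 \wedge 1 = 0.8$, whereas moving up only once yields $0.8 \wedge 2/3$; the minimum makes the weakest of constraints and goal decisive.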

For the probability of attaining the fuzzy goal $E\mu_{G^N}(x_N)$, Zadeh's definition of the non-fuzzy probability of a fuzzy event is used; however, other definitions may be applied instead [145, 140]. The fuzzy goal $\mu_{G^N}(x_N)$ is therefore regarded as a fuzzy event in $X$, whose conditional probability given the action $u_{N-1}$ and the state $x_{N-1}$ of the previous stage is

$$E\mu_{G^N}(x_N) = E\mu_{G^N}(x_N \mid x_{N-1}, u_{N-1}) = \sum_{x_N \in X} p(x_N \mid x_{N-1}, u_{N-1})\, \mu_{G^N}(x_N). \tag{2.39}$$

Because this notion is similar to that of expected utility, $E\mu_{G^N}(x_N)$ may also be called the expected fuzzy goal. Given this goal at stage $N$ and the constraint at stage $N-1$, the fuzzy decision at stage $N-1$ selects the optimal action for each state $x_{N-1}$. Consequently, the fuzzy decision for each state at stage $N-1$ may be regarded as a fuzzy goal $\mu_{G^{N-1}}$ at stage $N-1$, induced by the fuzzy goal $\mu_{G^N}$, that is used in the next iteration to find the optimal actions at stage $N-2$. The backward iteration, which is similar to the one for the deterministic system shown above, is repeated until all optimal actions $u^*_{N-1}, u^*_{N-2}, \dots, u^*_0$ are found. Using (2.37) to (2.39), the dynamic programming solution for this multistage decision problem is given according to [68, 7] by the following recurrence equations:

$$\mu_{G^{N-i}}(x_{N-i}) = \max_{u_{N-i}} \left[ \mu_{C^{N-i}}(u_{N-i}) \wedge E\mu_{G^{N-i+1}}(x_{N-i+1}) \right] \tag{2.40}$$

$$E\mu_{G^{N-i+1}}(x_{N-i+1}) = \sum_{x_{N-i+1} \in X} p(x_{N-i+1} \mid x_{N-i}, u_{N-i})\, \mu_{G^{N-i+1}}(x_{N-i+1}) \tag{2.41}$$

for $i = 1, \dots, N$. The solution is again expressed in terms of a policy function $u^*_t = a_t(x_t)$ with $t = 0, 1, \dots, N-1$, and $A = \{a_0, \dots, a_{N-1}\}$ being the optimal action strategy.

2.5 Application Areas for Automated Negotiation

The rapid development of computing systems and networks over the past decades has led to the emergence of more complex distributed and decentralized systems, such as the Grid [48], service-oriented computing [40] or, more recently, cloud computing [22], which increasingly demand more intelligent and reliable interaction mechanisms between dispersed software components. Automated negotiation is considered such a key mechanism, able to resolve conflicts between self-interested software agents. Artificial intelligence research has thus focused on the study and development of negotiation mechanisms and decision models, and on their practical use in more realistic situations, due to their potential in many real-world applications. Automated negotiation is primarily useful for systems characterized by some of the following properties:

Distribution: The system is composed of a number of software entities and resources which are loosely coupled, with dispersed ownership and control.
Decentralization: There is no central instance managing the system or parts of it. The individual software entities do not have global knowledge, except about the underlying protocols that govern the interactions.
Openness: Agents can enter and leave the system at any time. Entities have to adhere to the protocol(s) offered by the system in order to be able to interact with other entities.
Dynamic: The individual entities of the system exhibit different and changing behaviours as they interact and react to signals from the environment and other entities.
Autonomic behaviour: Software entities act autonomously and are able to make decisions (e.g. on behalf of their users). In doing so, they perceive their environment and utilize the available information to generate knowledge supporting their decision-making.

The examples of distributed systems mentioned above share some of these properties. In the following, we give a brief overview of the proposed application areas for automated negotiation, namely electronic commerce [137], supply chain management, task distribution and scheduling [15], resource allocation, and service composition and selection in service-oriented environments [28].
A prevalent application domain is that of e-commerce and e-markets, in which negotiation agents support sellers and buyers in finding trade agreements on economic goods or services [26, 107, 54]. Different types of negotiation frameworks have been proposed, ranging from agent-mediated approaches [55] for the flexible brokering and coordination of markets to bilateral negotiation in which the agents act on behalf of their users. In this context, negotiation is an efficient means of selecting consumers and providers, and also of avoiding deadlocks, compared with fixed-price systems. While a number of specific agent frameworks and architectures have been investigated for their applicability in this domain in many-to-many and one-to-many negotiation settings [102, 107], such techniques also serve as a basis for more advanced inter-organisational relationships, for example the formation of virtual organisations [101] or the management of supply chains. In the bilateral negotiation context, learning and reasoning models have been proposed in particular in the area of e-commerce [96, 25, 103, 102] to make agents more adaptable to changing market conditions and to different buyer and seller behaviours, and to increase their performance based on the knowledge gathered through their market interactions. Similar to the e-commerce domain, multi-agent approaches with negotiation interactions have been proposed for the management of supply chains among different entities and organisations [88]. While the aim is to provide infrastructures that react dynamically to changes in the supply chain and synchronize supply and demand [30], the proposed mechanisms support in particular the dynamic selection of suppliers and contracts [66, 47] and the planning and scheduling of tasks [88]. For example, Jiao et al. [66] present an agent-based multi-contract negotiation system for the coordination of a global manufacturing supply chain and provide a case study in the area of mobile phone manufacturing.
The mediated negotiation of complex supply chain contracts with large numbers of issues is considered in [47], whereas in the approach of Lopes et al. [88] the agents, representing supply chain activities, negotiate bilaterally with each other in order to execute their tasks. Similarly, Jennings et al. [65] use agents to support the negotiation between business units in order to manage tasks and resources along a business process, namely providing a customer with a quote for installing a delivery network for telecommunications services. The authors argue that, compared with existing workflow systems, their system is more robust and flexible with respect to run-time changes, context-dependent exception handling and the provisioning of resources. Another application example is resource allocation with autonomous negotiation agents in a Grid computing or service-based environment. In [86], a strategic negotiation mechanism is proposed to find agreements between resource providers and consumers in order to autonomously allocate and manage heterogeneous and decentralized Grid resources. Also in a Grid environment, Streitberger et al. [129] compare centralized market mechanisms such as auctions with the decentralized bargaining mechanism for the allocation of resources, and show that the decentralized methods perform better when the Grid network reaches a certain threshold, at the expense of a larger message count. Furthermore, An et al. [2] show that automated negotiation for dynamic resource allocation in service-based systems with multiple buyers and sellers performs better than combinatorial auction mechanisms or fixed-price models when agents are allowed to decommit from an agreement (at the cost of a penalty) and individual negotiations are carried out concurrently. In service-oriented systems, a critical issue for service consumers and service providers is to effectively reach agreements on the non-functional aspects of the service provision, also called the quality of service (QoS) [67]. Moreover, when services are dynamically composed to form complex service workflows, such service level agreements (SLAs) need to be established flexibly at run-time. For this reason, automated negotiation has recently been proposed and applied for negotiating quality of service parameters, such as price, response time or throughput, in order to establish SLAs [100]. For example, a novel framework for agent-based SLA negotiation in Web service compositions is proposed in [28, 67], in which the individual agents negotiate with service providers in order to select the best candidate provider for each service in the composition while considering the end-to-end constraints on the quality of service parameters. In a similar setting, Brzostowski et al. [16] discuss the decision-making of agents at the level of agent coordination, partner selection and negotiation strategies.
Another work, which focuses on the negotiation architecture for service markets rather than for service compositions, is proposed in [100]. There, the marketplace supports the negotiation process with mediation based on search algorithms. However, the automatic establishment of SLAs for complex service compositions is vital for enabling the dynamic composition, selection and enactment of services given the end-to-end requirements of the overall composite service. In Chapter 5 we focus on such a scenario, namely end-to-end QoS negotiation for SLA establishment in composite services. This involves compound multi-party negotiations in which the composite service provider concurrently negotiates with multiple candidates for each atomic service in the composition, selecting the one that best satisfies the atomic service's QoS preferences while ensuring that the end-to-end QoS requirements are also fulfilled. Using this scenario, we also demonstrate the applicability of the negotiation strategies presented in this work.

2.6 Simulation Environment and Experimental Evaluation

This section describes the simulation environment and the general setup for the experiments carried out in this work.

Simulation Environment

The negotiation system was implemented in Wolfram Mathematica. It supports bilateral agent negotiation in one-to-one and one-to-many settings and the testing of all strategies and tactics presented in this thesis. To facilitate the investigation of the behaviour of negotiation strategies and tactics, graphical user interfaces were created for single- and multi-issue negotiations that allow the user to change strategy parameters and to observe the offer and utility curves of encounters in real time (see Figure 2.5 for the single-issue interface). In addition, the negotiation threads of individual interactions can be captured and stored so that the agents can use them as reference cases, for example for the creation of the fuzzy model of the opponent presented later in this thesis.

General Settings for Experiments

The aim of the negotiation experiments is to evaluate the negotiation mechanisms proposed in this thesis and to test them in different environmental and strategic settings. In order to provide more realistic settings, we first need to simulate agents that exhibit different negotiation behaviours using various strategies. A common approach is to use heuristic-based tactics and mix them to simulate different agent behaviours. Similar to [42] and [21], we choose the heuristic-based time-dependent and behaviour-dependent tactics presented earlier in this chapter to create mixed strategies. The advantage of such strategies is that, because of their imitative component, they are able to partially adapt to the opponent's behaviour.

Figure 2.5: Interface for single-issue negotiations between two agents

In order to distinguish between different types of mixed strategies, or strategy groups, we use different types of concession behaviour of the time-dependent tactics, namely conceder, linear and boulware, and mix them with imitative tactics using different sets of weights, namely small, medium and large. In this way, a mixed strategy can be classified not only by the concession behaviour of its time-dependent tactics but also by how reactive it is towards the opponent, as determined by the weight set of its imitative tactics. Table 2.1 shows the chosen tactics and the sets of weights. Since the polynomial and exponential decision functions exhibit different offer curves for similar settings, the parameter $\beta$ differs between the particular concession sets. A set of mixed strategies can then be created as the Cartesian product of the individual sets of time-dependent, behaviour-dependent and weight settings.

Time-dependent, polynomial decision function
  Conceder: $PC = \{\beta \mid \beta \in \{3, 5, 7\}\}$
  Linear: $PL = \{\beta \mid \beta \in \{0.8, 1, 1.2\}\}$
  Boulware: $PB = \{\beta \mid \beta \in \{0.1, 0.3, 0.5\}\}$
Time-dependent, exponential decision function
  Conceder: $EC = \{\beta \mid \beta \in \{5, 7, 9\}\}$
  Linear: $EL = \{\beta \mid \beta \in \{2, 3, 4\}\}$
  Boulware: $EB = \{\beta \mid \beta \in \{0.3, 0.5, 0.7\}\}$
Behaviour-dependent
  Absolute TFT ($a$): $\delta = 1$, $R(M) = 0$
  Relative TFT ($r$): $\delta = 1$
Weights
  Small: $S = \{\gamma \mid \gamma \in \{0.1, 0.2, 0.3\}\}$
  Medium: $M = \{\gamma \mid \gamma \in \{0.4, 0.5, 0.6\}\}$
  Large: $L = \{\gamma \mid \gamma \in \{0.7, 0.8, 0.9\}\}$
Table 2.1: Parameters for strategy groups

For example, the set containing the polynomial time-dependent and imitative tactics is given by

$$ST = \{PC, PL, PB\} \times \{a, r\} \times \{S, M, L\} \qquad (2.42)$$

such that the set of possible strategy groups is

$$ST = \{PCaS, PCaM, PCaL, PCrS, PCrM, PCrL, PLaS, PLaM, PLaL, PLrS, PLrM, PLrL, PBaS, PBaM, PBaL, PBrS, PBrM, PBrL\} \qquad (2.43)$$

The initial letters indicate the respective group of mixed strategies. For example, PCaS denotes the strategy group containing polynomial conceder time-dependent and absolute tit-for-tat tactics mixed with small weights. Each strategy group represents a particular type of behaviour: agents playing strategies from the same strategy group cover a similar region of the space of all strategies. The above method for creating mixed strategies therefore allows us to simulate a wide range of different concession behaviours.

Besides the negotiation strategies, the negotiation environment also strongly affects the outcome of a negotiation. The negotiation environment is specified by the negotiation intervals and the deadlines of both agents. For example, if the intervals overlap only to a small degree and the agents have different deadlines, the zone of agreement is also small, so that fewer agreements may be achieved. We generate the negotiation intervals as follows. Given the interval $[\min^c_j, \max^c_j]$ for issue $j$ of a client agent, with $\max^c_j = \theta_j + \min^c_j$, the interval of the provider agent is given by $\min^p_j = \min^c_j + \Phi_j(\max^c_j - \min^c_j)$ and $\max^p_j = \min^p_j + (\max^c_j - \min^c_j)$, where $\Phi_j$ determines the degree of overlap between the two negotiation intervals and $\theta_j$ is the size of the intervals. Since the provider interval is shifted by $\Phi_j\theta_j$ relative to the client interval, the two intervals overlap over a length of $(1 - \Phi_j)\theta_j$. In order to test the negotiation strategies in different environments, we choose interval settings with small and large overlaps, so that $\Phi_j \in \{0.33, 0.66\}$.

As described earlier, negotiations in automated software systems are supposed to be finite, which makes the deadline of an agent an important decision factor. Although the deadline may be imposed by the system, in that it sets a pre-specified time limit for an interaction, the agents may also have their own limits. For that reason, we consider both variants in this thesis, in which agents have equal or different deadlines. Simulating real time in a software system is difficult due to the large number of possible conditions of the communication channel between the agents. Without loss of generality, the time measure used for all negotiation strategies can instead be based on the number of messages exchanged, which is a common approach in research on automated negotiation [44, 21]. In this thesis, we use negotiation rounds for all experiments, where one negotiation round consists of one offer proposal from each agent, starting with the agent that made the first proposal at the beginning of the encounter.

2.7 Summary

This chapter has introduced important notions in bilateral automated negotiation and has discussed related work on decision-making in negotiation from a game-theoretic and AI perspective.
It has shown that one of the key problems in bilateral encounters is to decide when and how to make concessions when the decision models and preferences of all parties are private and agents have only limited information for their decision-making, derived from the current encounter or a few interactions. It has also shown that even in the multi-issue case, in which joint gains are possible and the negotiation is not entirely distributive, the agents need a decision apparatus in order to make concessions. For that reason, we have introduced the heuristic-based model, in which tractable negotiation tactics are used to make concession decisions and mixed tactics are used to take a larger range of factors into account at the same time, such as the opponent's behaviour and the agent's deadline. Moreover, we have presented relevant existing work on the decision apparatus an agent can apply to model its negotiation strategy, such as reasoning and learning models, including Bayesian inference, evolutionary approaches and reinforcement learning. We have also presented fuzzy logic-based approaches for negotiation and introduced the model of multistage fuzzy decision-making, an extension of which is used in Chapter 4 to model negotiation strategies. Finally, the possible application areas studied and presented in the literature were outlined. With the aim of investigating and proposing negotiation decision strategies for situations in which only limited knowledge is available, the next two chapters investigate the heuristic-based approach of mixing tactics to create more complex concession behaviour and propose a novel decision model for an agent's negotiation strategy based on multistage fuzzy decision-making. Chapter 5 then demonstrates how such negotiation strategies can be coordinated in a more complex and realistic negotiation scenario with concurrent negotiations.

Chapter 3
Monotonic Mixing Mechanisms for Multi-Tactic Negotiation Strategies

This chapter presents and investigates heuristic approaches for creating multi-tactic negotiation strategies for an agent by mixing a set of pure tactics, or decision functions, at each stage of the negotiation using linear weighted combinations. This method, first proposed in [44], is able to dynamically generate complex concession behaviour by combining different types of decision functions, while the individual functions typically use only the limited information available in the current encounter, such as the offers exchanged, the number of available agents or the agent's deadline. For that reason, heuristic multi-tactic negotiation strategies are practical models for situations in which agents have no knowledge about the other parties' decision models and preferences, including their reservation values, deadlines and utility functions.

When an agent changes the weights of the linear combination during the encounter, it can create more dynamic negotiation strategies, which may also result in a non-monotonic sequence of offers. However, such non-monotonic offer curves can also occur at any time in static strategy settings, as a result of the dynamic effects of an agent system in which the agents use mixed strategies involving behaviour-dependent and -independent tactics, even though every tactic individually generates offers in a monotonic manner. Such effects are often undesirable: they can delay agreements, significantly change outcomes compared with monotonic offer curves, and may be difficult to control due to a high sensitivity to the strategy parameters. The automatic and uncontrolled occurrence of non-monotonicity in the offer curves of static mixed strategies with monotonic tactics differs from the case in which an agent intends to produce such non-monotonic behaviour by changing its strategy parameters during the encounter, and should therefore be avoided. As such behaviour can occur dynamically, it is also difficult for an agent to anticipate whether the resulting effects are beneficial or not.

After giving definitions of monotonic tactics, this chapter describes the dynamic effects that can result from mixing different types of tactics, such as imitative and non-imitative ones, using static or dynamic weights. It then proposes two new mixing mechanisms, one based on individual negotiation threads for all imitative tactics involved and one based on single concessions, which avoid the undesirable dynamic effects by guaranteeing monotonic concession behaviour: the first for static and the second also for dynamic mixing weights. An evaluation then compares the mixing mechanisms with the traditional linear weighted combination in situations in which either party or both parties use the new mixing mechanism. A number of examples throughout the chapter further illustrate the concession behaviour of the presented mechanisms.

In the following sections, the terms mixed strategy and multi-tactic strategy are used interchangeably. We also refer to an individual tactic as a pure or single tactic. The concept of a mixed strategy combining multiple tactics at a particular negotiation stage is similar to that of mixed strategies in many game-theoretic models. It should be noted, however, that in the context of this thesis the weights in the linear combination do not represent probabilities, but rather an agent's way of attaching importance levels to the tactics involved in terms of their contribution to the resulting concession behaviour.
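As a minimal sketch of the linear weighted combination (the helper name `mix` and the numbers are hypothetical illustrations, not the thesis' Mathematica implementation), the mixed offer is a weighted average of the proposals of the pure tactics. Treating the weights as importance levels means the agent proposes this average directly, rather than sampling a single tactic as it would if the weights were probabilities.

```python
from typing import Sequence


def mix(proposals: Sequence[float], weights: Sequence[float]) -> float:
    """Linear weighted combination of the offers proposed by the pure tactics.

    Hypothetical helper: the weights are importance levels (non-negative and
    summing to 1), so the result is a weighted average of the proposals.
    """
    if len(proposals) != len(weights):
        raise ValueError("exactly one weight per tactic is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, proposals))


# A time-dependent tactic proposes 12, an imitative tactic proposes 16.
offer = mix([12.0, 16.0], [0.25, 0.75])
print(offer)  # 15.0
```

Because the weights are non-negative and sum to one, the mixed offer always lies between the smallest and the largest proposal of the pure tactics; it does not, however, inherit their monotonicity, as the rest of this chapter shows.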
3.1 Dynamic Behaviour of Multi-tactic Strategies

Heuristic-based tactics are tractable decision models for the concession-making of agents, which typically have monotonic utility functions for the issues under negotiation. Under this premise, an agent makes a concession if its utility decreases with the new offer compared with its previous offer (cf. Section 2.3), thereby trying to make the offer more attractive to the opponent. When combining different types of tactics using a linear weighted combination, as shown in Section 2.3.2, an agent can create complex concession behaviour that takes into account a range of different factors, such as its deadline, the opponent's behaviour or the state of a resource in the environment. Moreover, by changing the weights of the individual tactics during the encounter, the agent can create more dynamic negotiation strategies in order to adapt its concession behaviour to different situations. However, when weights are changed dynamically, the sequence of proposed offers may become non-monotonic, for example when the weight is shifted in favour of a more stubborn tactic. Although such behaviour may be expected for dynamic weights, non-monotonic offer curves can also occur for static weights at any time, as a result of the dynamic interrelation of an agent system in which the agents use mixed strategies involving behaviour-dependent and -independent tactics, even when every tactic individually generates offers in a monotonic manner. In other words, negotiating agents using such mixed strategies constitute a dynamic system in which non-monotonic concession behaviour can emerge even when the agents' strategy settings and mixing weights are static and all involved tactics are monotonic, in the sense that each would propose concessions if it were applied on its own.

The resulting effects of such non-monotonic behaviour are often undesirable: they can delay agreements, significantly change outcomes compared with monotonic offer curves, and also affect the sensitivity of the strategy parameters, in that a small change of a parameter may result in a sudden and large change of outcomes. Furthermore, it is often argued [108, 42] that, when the agents have monotonic utility structures (cf. Section 2.2.3), the negotiation process should be designed so that agents make concessions or, if possible, seek joint improvements, for example in the form of trade-off proposals.
This also implies monotonic behaviour: an agent makes proposals such that its own utility for its next offer is equal to or lower than its own utility for its previous offer, i.e. $U^a(x^{t_{n+1}}_{a\to b}) \le U^a(x^{t_{n-1}}_{a\to b})$. In the following, we say that agents behave monotonically if they propose offers according to this principle. In single-issue negotiations, the agents typically have opposing utility structures, and although the exact utility functions are unknown, an agent can easily detect when the opponent tries to increase its own utility by proposing a non-monotonic sequence of offers. An agent behaving in such a way may therefore increase the risk that the opponent withdraws. In single-issue negotiations, we can say that an agent $a$ acts rationally if it concedes towards the last offer of its opponent, thereby trying to increase the opponent's utility, such that the sequence of its own utilities is monotonically decreasing.

In multi-issue negotiations, on the other hand, an offer of an agent $a$ with a higher aggregated utility for $a$ than its previous offer cannot easily be detected by the opponent, as the utility structures and the importance attached to the issues are unknown to each other. Consequently, if the opponent's utility for $a$'s last offer is lower than for $a$'s previous offers, the opponent may assume that agent $a$ made a trade-off proposal, and therefore cannot detect the cause of such non-monotonic behaviour. Nevertheless, in order to reach agreements faster, agents should behave monotonically, i.e. propose offers such that the sequence of their own utilities for the proposed offers is monotonically decreasing. It is also argued that non-monotonic behaviour under time constraints can be advantageous for agents, and whether automated negotiation should be designed so that monotonic behaviour is ensured is widely discussed in the research literature [138]. However, non-monotonicity in the sequence of proposed offers and their respective utilities can also result from the dynamic effects of an agent system in which two interacting agents use static mixed strategies. In such cases, the agent did not intend to produce non-monotonic behaviour, for example by changing its strategy parameters, which makes such behaviour undesirable because it can emerge unexpectedly at any time. For that reason, in the next sections we investigate the non-monotonic behaviour of multi-tactic negotiation strategies by first defining when a tactic is considered monotonic and then discussing the effects on the negotiation outcome by means of examples. Without loss of generality, we restrict the discussion to linear utility spaces in order to simplify the illustration of the dynamic effects of non-monotonic concession behaviour.
However, similar effects can be observed when multi-tactic strategies are used for concession-making with other monotonic utility structures.

Monotonicity of Negotiation Tactics

In order to discuss the interrelated dynamic concession behaviour of multi-tactic negotiation strategies, we need to characterise the concession behaviour of a pure tactic in terms of its monotonicity. In general, a tactic or decision function is considered monotonic if it produces a monotonic sequence of offers such that $x^{t_{i+1}} \ge x^{t_{i-1}}$ or $x^{t_{i-1}} \ge x^{t_{i+1}}$ if the utility is monotonically decreasing or increasing, respectively. This criterion can easily be applied to tactics which depend entirely on time (cf. Chapter 2). In a similar way, a tactic depending on a resource in the environment (cf. Chapter 2) can be characterised by the state of the resource at a certain time. The monotonicity criterion can then be used to determine whether the tactic produces a monotonic offer sequence when the state of the resource changes over time. In cases where a tactic is imitative towards the opponent's behaviour to some degree, the monotonicity of the opponent's sequence of offers also needs to be considered in addition to the above criterion. We then say that a pure imitative tactic is monotonic if the sequence of offers it generates is monotonic in the above sense whenever the opponent's sequence of offers is monotonic as well. For example, a mirroring tactic such as absolute tit-for-tat without a random factor copies the concessions of the opponent to the same degree, such that the sequence of proposed offers is monotonically increasing if the sequence of copied offers is monotonically decreasing.

In order to determine whether a mixed strategy generates a monotonic sequence of offers, we need to distinguish between only two general types of pure tactics, monotonic behaviour-dependent and monotonic behaviour-independent tactics, which are formally defined as follows:

Definition 3.1. Given a negotiation between agents $a$ and $b$, a monotonic behaviour-independent tactic $\tau^a_j(t_k)$ of agent $a$ for issue $j$ is a function generating offers at any times $t_k, t_i \in T_n$ such that $\tau^a_j(t_k) \ge \tau^a_j(t_i)$ if $U^a$ is decreasing, or $\tau^a_j(t_k) \le \tau^a_j(t_i)$ if $U^a$ is increasing, under the condition that $k, i \in \{1, 2, \ldots, n\}$ and $k > i$.

Definition 3.2.
Given a negotiation between agents $a$ and $b$ at time $t_n$, a monotonic behaviour-dependent tactic $\tau^a_j(X^{t_n}_{a\leftrightarrow b})$ generates an offer using any sequence $X^{t_n}_{a\leftrightarrow b} = (x^t_{a\leftrightarrow b})_{t \in T'_n}$, where $T'_n \subseteq T_n$ and $T_n = \{t_1, \ldots, t_n\}$, under the conditions that there exists at least one offer $x^{t_i}_{b\to a} \in D^b_j$ of agent $b$ in the sequence, and that $\tau^a_j(X^{t_n}_{a\leftrightarrow b}) \ge \tau^a_j(X^{t_{n-2}}_{a\leftrightarrow b})$ if the sequence of the opponent's offers $(x^t_{b\to a})_{t \in T'_n}$ and $U^a$ are both monotonically decreasing, or $\tau^a_j(X^{t_n}_{a\leftrightarrow b}) \le \tau^a_j(X^{t_{n-2}}_{a\leftrightarrow b})$ if the sequence of the opponent's offers $(x^t_{b\to a})_{t \in T'_n}$ and $U^a$ are both monotonically increasing.

As mentioned above, the first definition represents tactics that depend on a particular resource whose state may change over time. We denote this class of tactics by $\tau_{j,time}$ for issue $j$. In the simplest case, the tactic may depend on time or on the number of negotiation rounds. For instance, the polynomial and exponential time-dependent decision functions proposed by Faratin et al. [44] are such tactics, as they generate offers in a monotonically decreasing or increasing manner. However, in the case of a resource-dependent tactic, the resource may decrease and increase over time, such that a monotonic sequence of offers cannot be guaranteed. A behaviour-dependent tactic according to Definition 3.2 uses some of the historical offers of the opponent to propose counteroffers and preserves a monotonic offer sequence as long as the opponent's sequence of offers is monotonic as well. We denote the class of such imitative tactics by $\tau_{j,beh}$. For instance, the imitative tit-for-tat tactics from [44] shown in Chapter 2 fulfil this definition. However, once non-monotonicity is introduced by one partner, it can in turn cause a non-monotonic offer sequence of the opponent, depending on how much of the concessions are copied. Moreover, when monotonic tactics are mixed together, non-monotonic behaviour can emerge even when both agents apply only monotonic tactics, as we investigate in the next section.

Monotonicity of Multi-tactic Negotiation Strategies

This section investigates the non-monotonic behaviour of negotiation agents using mixed strategies with static and dynamic weights. Intuitively, non-monotonic behaviour can occur when an agent changes its strategy, e.g. the mixing weights, during the encounter. However, non-monotonic behaviour can also emerge when imitative and non-imitative tactics are mixed by a linear weighted combination without the agent changing its strategy, i.e. even with static strategy settings and mixing weights.
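To make Definitions 3.1 and 3.2 concrete, the following Python sketch checks both properties numerically for pure tactics applied on their own. It assumes the time-dependent functions in the form usually attributed to Faratin et al. [44], $\alpha(t) = \kappa + (1-\kappa)\,(\min(t, t_{max})/t_{max})^{1/\beta}$ (polynomial) and $\alpha(t) = e^{(1 - \min(t, t_{max})/t_{max})^{\beta} \ln \kappa}$ (exponential), for an agent whose utility decreases in the issue value. The values of $\kappa$ and $t_{max}$, the interval [10, 25], the opponent's offers and all function names are hypothetical.

```python
import math


def alpha_poly(t, t_max, beta, kappa=0.0):
    """Polynomial time-dependent function: rises from kappa (t = 0) to 1 (t = t_max)."""
    r = min(t, t_max) / t_max
    return kappa + (1 - kappa) * r ** (1 / beta)


def alpha_exp(t, t_max, beta, kappa=0.01):
    """Exponential time-dependent function; kappa must be > 0 because of the logarithm."""
    r = min(t, t_max) / t_max
    return math.exp((1 - r) ** beta * math.log(kappa))


def time_tactic(alpha, lo, hi):
    """Offer of an agent whose utility decreases in the issue value (e.g. a buyer)."""
    return lo + alpha * (hi - lo)


def abs_tft(opp_prev, opp_last, own_last):
    """Absolute tit-for-tat without random term: copy the opponent's last concession."""
    return own_last + (opp_prev - opp_last)


def is_monotonic(offers, utility_decreasing=True):
    """True if no offer moves back towards the agent's own preference."""
    steps = [b - a for a, b in zip(offers, offers[1:])]
    return all(s >= 0 for s in steps) if utility_decreasing else all(s <= 0 for s in steps)


# Definition 3.1: pure time-dependent tactics are monotonic for the beta values of Table 2.1.
t_max = 20
for beta in (0.1, 0.3, 0.5, 0.8, 1.0, 1.2, 3.0, 5.0, 7.0):
    poly = [time_tactic(alpha_poly(t, t_max, beta), 10, 25) for t in range(t_max + 1)]
    assert is_monotonic(poly)
for beta in (0.3, 0.5, 0.7, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0):
    expo = [time_tactic(alpha_exp(t, t_max, beta), 10, 25) for t in range(t_max + 1)]
    assert is_monotonic(expo)

# Definition 3.2: absolute TFT stays monotonic while the opponent's offers are monotonic.
seller = [30, 28, 25, 24, 20]  # hypothetical, monotonically decreasing
buyer = [10]
for prev, last in zip(seller, seller[1:]):
    buyer.append(abs_tft(prev, last, buyer[-1]))
print(buyer)  # [10, 12, 15, 16, 20]
assert is_monotonic(buyer)
```

Each pure tactic passes the check when applied alone; the point of the following example is that this no longer holds for their static linear mix.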
The following example shows how non-monotonic behaviour can emerge under static weights.

Example 3.1 (Non-monotonic concession behaviour for a single issue). Assume a single-issue negotiation between two agents $a$ and $b$ at time $t_n$, in which agent $a$ applies a mixed strategy with a static weight $\gamma$ that combines one time-dependent tactic $\tau^a_{time}(t_{n+1})$ and one imitative tactic. The imitative tactic is a simple (absolute) tit-for-tat tactic (cf. Chapter 2) that copies the concession behaviour of the partner and is given by $\tau^a_{beh}(x^{t_{n-2}}_{b\to a}, x^{t_{n-1}}_{a\to b}, x^{t_n}_{b\to a}) = x^{t_{n-2}}_{b\to a} - x^{t_n}_{b\to a} + x^{t_{n-1}}_{a\to b}$. Agent $a$'s next offer is hence

$$x^{t_{n+1}}_{a\to b} = \gamma\,\tau^a_{time}(t_{n+1}) + (1-\gamma)\,\tau^a_{beh}(x^{t_{n-2}}_{b\to a}, x^{t_{n-1}}_{a\to b}, x^{t_n}_{b\to a}) \qquad (3.1)$$

Assume further that, given an ongoing negotiation with the thread

$$(\ldots, x^{t_{n-2}}_{b\to a}, x^{t_{n-1}}_{a\to b}, x^{t_n}_{b\to a}) = (\ldots, 30, 10, 20), \qquad (3.2)$$

agent $a$'s next time-dependent proposal is $\tau^a_{time}(t_{n+1}) = 11$. With a mixing weight of $\gamma = 0.5$, the next counteroffer is $x^{t_{n+1}}_{a\to b} = 0.5 \cdot 11 + 0.5\,(30 - 20 + 10) = 15.5$. If agent $b$ replies with a comparatively small concession, $x^{t_{n+2}}_{b\to a} = 19$, and agent $a$'s next time-dependent proposal is $\tau^a_{time}(t_{n+3}) = 12$, then agent $a$'s response is lower than its previous offer and thus non-monotonic: $x^{t_{n+3}}_{a\to b} = 0.5 \cdot 12 + 0.5\,(20 - 19 + 15.5) = 14.25$.

In the above example, agent $a$ proposes an offer with a higher utility for itself than its previous offer as a result of the mixed strategy. This non-monotonic behaviour occurs even though the sequence of the opponent's offers is monotonic and all tactics in the mix are monotonic according to Definitions 3.1 and 3.2, as they individually propose concessions. If both agents have imitative tactics in their mix, a non-monotonic sequence of offers is likely to be copied to some degree, reproducing non-monotonicity in the sequence of the opponent's offers and vice versa. In addition, if the agents have opposing utility functions, a non-monotonic utility sequence of one agent also causes a non-monotonic utility sequence of the partner's offers. To answer the question of why and when this non-monotonicity occurs in such scenarios, we need to investigate whether a static mixed strategy can guarantee monotonic offer sequences when all its tactics are monotonic. To do so, it is sufficient to use a simple mixed strategy with two tactics, one behaviour-dependent and one behaviour-independent, in a single-issue negotiation.
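The arithmetic of Example 3.1 can be reproduced with a few lines of Python. This is a sketch that assumes the absolute tit-for-tat form and the static weight $\gamma$ of Eq. (3.1); the function names are hypothetical.

```python
def abs_tft(opp_prev, own_last, opp_last):
    """Absolute tit-for-tat (no random term): x^{t_{n-2}}_{b->a} - x^{t_n}_{b->a} + x^{t_{n-1}}_{a->b}."""
    return opp_prev - opp_last + own_last


def mixed_offer(gamma, tau_time, opp_prev, own_last, opp_last):
    """Eq. (3.1): static mix of one time-dependent proposal and absolute tit-for-tat."""
    return gamma * tau_time + (1 - gamma) * abs_tft(opp_prev, own_last, opp_last)


gamma = 0.5
# Thread (..., 30, 10, 20): b offered 30, a offered 10, b offered 20; tau_time(t_{n+1}) = 11.
x_n1 = mixed_offer(gamma, 11, 30, 10, 20)
# b concedes only from 20 to 19; tau_time(t_{n+3}) = 12.
x_n3 = mixed_offer(gamma, 12, 20, x_n1, 19)
print(x_n1, x_n3)   # 15.5 14.25
print(x_n3 < x_n1)  # True: a (whose utility decreases in the value) raised its own utility
```

Both tactics propose concessions on their own (11 to 12 for the time-dependent tactic, 20 to 16.5 as copied by tit-for-tat), yet their static mix moves backwards.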
Suppose that a buyer agent b uses the mixed strategy while negotiating with a seller agent s about a price and the following conditions hold: a) Agent b uses monotonic tactics τtime b and τbeh b according to definitions 1 and 2: τtime(t b n+1 ) τtime(t b n 1 ) and τbeh b (xt n 2 s b, xt n 1 b s, xtn s b ) = xt n 2 s b xtn s b + xt n 1 b s b) The opponent s proposes offers in a monotonic sequence: x tn s b xt n 2 s b The next offer for agent b using the static mixed strategy is written as follows: x t n+1 b s = γb τ b time(t n+1 ) + (1 γ b )(x t n 2 s b xtn s b + xt n 1 b s ) (3.3) 61

Chapter 3. Monotonic Mixing Mechanisms for Multi-Tactic Negotiation Strategies

with $t_n, t_{n+1}$ being points in time. Rearranging the above equation, the concession of the buyer agent is given by

$$ x^{t_{n+1}}_{b\rightarrow s} - x^{t_{n-1}}_{b\rightarrow s} = \gamma^b\left(\tau^b_{time}(t_{n+1}) - x^{t_{n-1}}_{b\rightarrow s}\right) + (1-\gamma^b)\left(x^{t_{n-2}}_{s\rightarrow b} - x^{t_n}_{s\rightarrow b}\right). \tag{3.4} $$

For the mixed strategy to produce a monotonic offer sequence, the buyer's concession has to be greater than or equal to zero, i.e. $x^{t_{n+1}}_{b\rightarrow s} - x^{t_{n-1}}_{b\rightarrow s} \geq 0$, such that

$$ \gamma^b\left(x^{t_{n-1}}_{b\rightarrow s} - \tau^b_{time}(t_{n+1})\right) \leq (1-\gamma^b)\left(x^{t_{n-2}}_{s\rightarrow b} - x^{t_n}_{s\rightarrow b}\right). \tag{3.5} $$

This condition, however, cannot be guaranteed, because it depends not only on the tactics of the agent but also on the opponent's concession relative to its previous offer. The agent's imitative tactic is therefore not independent of the non-imitative tactic, since it uses the agent's last offer in the current negotiation thread, which is the result of the mixed strategy rather than of the pure imitative tactic. Example 3.1 demonstrated this situation: agent b's concession $x^{t_n}_{b\rightarrow a} - x^{t_{n+2}}_{b\rightarrow a} = 20 - 19 = 1$ was smaller than the difference $x^{t_{n+1}}_{a\rightarrow b} - \tau^a_{time}(t_{n+3}) = 15.5 - 12 = 3.5$, so that $0.5 \cdot 3.5 > (1 - 0.5) \cdot 1$ and condition (3.5) is violated. As we can see, the occurrence of non-monotonic behaviour depends on a number of factors, such as the agent's mixing weights, the amount and change of the opponent's concessions, and the agent's behaviour-independent tactics. The condition under which non-monotonicity occurs can be derived in a similar way for other combinations of tactics. For example, using relative tit-for-tat instead of the above absolute tit-for-tat tactic, the corresponding condition is given by

$$ 1 - \gamma^b\,\frac{\tau^b_{time}(t_{n+1})}{x^{t_{n-1}}_{b\rightarrow s}} \leq (1-\gamma^b)\,\frac{x^{t_{n-2}}_{s\rightarrow b}}{x^{t_n}_{s\rightarrow b}}. \tag{3.6} $$

The uncontrolled occurrence of non-monotonic concession behaviour can result in a number of undesirable effects, such as delayed or failed agreements, outcomes that differ from those of monotonic offer sequences, and a high sensitivity to the strategy parameters.
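Conditions (3.5) and (3.6) can be turned into simple predicates that an agent could evaluate before sending an offer. The sketch below assumes a buyer (offers should not decrease) and strictly positive offers for the relative variant; the function names are illustrative and not part of the original model.

```python
def concession_abs_tft(gamma, tau_time, own_prev, opp_prev2, opp_last):
    """Buyer's concession according to Eq. (3.4) for an absolute tit-for-tat mix."""
    return gamma * (tau_time - own_prev) + (1 - gamma) * (opp_prev2 - opp_last)


def is_monotonic_abs_tft(gamma, tau_time, own_prev, opp_prev2, opp_last):
    """Condition (3.5): True if the next mixed offer does not fall below own_prev."""
    return gamma * (own_prev - tau_time) <= (1 - gamma) * (opp_prev2 - opp_last)


def is_monotonic_rel_tft(gamma, tau_time, own_prev, opp_prev2, opp_last):
    """Condition (3.6) for a mix with relative tit-for-tat (offers must be > 0)."""
    return 1 - gamma * tau_time / own_prev <= (1 - gamma) * opp_prev2 / opp_last


# Second step of Example 3.1: gamma = 0.5, tau_time = 12, previous own offer 15.5,
# opponent moved from 20 to 19.
print(concession_abs_tft(0.5, 12.0, 15.5, 20.0, 19.0))    # -1.25
print(is_monotonic_abs_tft(0.5, 12.0, 15.5, 20.0, 19.0))  # False
# First step of Example 3.1: the condition holds.
print(is_monotonic_abs_tft(0.5, 11.0, 10.0, 30.0, 20.0))  # True
```

Both predicates need the opponent's most recent concession, which is exactly why a static weight alone cannot guarantee monotonicity.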
In the following, we demonstrate such effects by means of examples, which will also be used in the subsequent sections to compare the proposed mixing mechanisms with the traditional linear weighted combination.

Example 3.2 Mixed strategies with large agreement zone
We assume a single-issue negotiation between a buyer b and a seller s with the intervals $min_b = 10$, $max_b = 25$, $min_s = 15$, $max_s = 30$, and equal deadlines $t^b_{max} = t^s_{max} = 20$. The buyer applies a mixed strategy with one time-dependent and one imitative tactic, whereas for the seller we consider two cases in which either a pure tactic or a mixed strategy is applied. The settings for the agents' strategies are as follows:
Buyer: mixed strategy γ = 0.3; time-dependent: polynomial β = 0.5; imitative: absolute tit-for-tat δ = 1, R(M) = 0.
Seller: mixed strategy γ = 0.4; time-dependent: polynomial β = 4; imitative: absolute tit-for-tat δ = 1, R(M) = 0.
Figure 3.1a shows the offer curves for the cases where the seller uses either the pure time-dependent tactic or the mixed strategy. In both cases the agreement is slightly delayed because of the non-monotonic offer curve produced by the buyer's mixed strategy, compared to situations in which the buyer applies the individual tactics only. This seems counter-intuitive, because the offer curve of the mix would be expected to lie between the offer curves of the pure tactics. In the case where the opponent also applies a mixed strategy with imitative tactics, the non-monotonic behaviour is reciprocated: the seller's mixed strategy copies the buyer's negative concessions and thus reproduces the non-monotonicity in its own sequence of offers.

The offer curves in Figure 3.1a of Example 3.2 show that non-monotonic offer curves can occur in simple negotiation scenarios with static mixed strategies. In that example, however, the buyer's non-monotonic behaviour results in an outcome with a higher utility gain for the buyer, since the seller's strategy is static (time-dependent) and does not react to the buyer's behaviour. On the other hand, in single-issue negotiations such behaviour may increase the risk of a failed agreement due to the opponent's withdrawal, or, at least, the opponent might change its own concession behaviour.
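The pure time-dependent tactics of Example 3.2 can be sketched with the polynomial decision function in the form commonly attributed to Faratin et al., α(t) = κ + (1 − κ)(min(t, t_max)/t_max)^{1/β}. This is an illustration under the assumption κ = 0, since the initial-offer constant is not stated in the example; the buyer moves from $min_b$ towards $max_b$ and the seller from $max_s$ towards $min_s$. Function names are hypothetical.

```python
def alpha_poly(t: float, t_max: float, beta: float, kappa: float = 0.0) -> float:
    """Polynomial time-dependent function alpha(t), ranging from kappa to 1."""
    return kappa + (1.0 - kappa) * (min(t, t_max) / t_max) ** (1.0 / beta)


def time_dependent_offer(t, t_max, beta, lo, hi, buyer, kappa=0.0):
    """Offer of a pure polynomial tactic: a buyer concedes upwards from lo,
    a seller concedes downwards from hi. Assumption: kappa = 0 by default."""
    a = alpha_poly(t, t_max, beta, kappa)
    return lo + a * (hi - lo) if buyer else lo + (1.0 - a) * (hi - lo)


# Example 3.2: buyer [10, 25] with beta = 0.5 (boulware),
# seller [15, 30] with beta = 4 (conceder), both with t_max = 20.
for t in (0, 5, 10, 15, 20):
    b = time_dependent_offer(t, 20, 0.5, 10.0, 25.0, buyer=True)
    s = time_dependent_offer(t, 20, 4.0, 15.0, 30.0, buyer=False)
    print(f"t={t:2d}  buyer={b:6.2f}  seller={s:6.2f}")
```

At t = 10 the boulware buyer has covered only a quarter of its range, while the conceder seller has already covered most of its own, which is the asymmetry that drives the non-monotonic mixes discussed above.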
The following example illustrates the effect of the resulting non-monotonic behaviour for the case where the two agents have a smaller zone of agreement.

Example 3.3 Mixed strategies with small agreement zone
In this example, the buyer and the seller use the same settings for the mixed strategy, imitative and time-dependent tactics as in Example 3.2, but with a smaller overlap of the negotiation intervals, with $min_b = 10$, $max_b = 25$, $min_s = 15$ and $max_s = 30$, and different deadlines $t^s_{max} = 15$ and $t^b_{max} = 20$. The resulting overall agreement zone is hence smaller than in the previous example. Figure 3.1b shows the offer curves for the cases where the seller uses either the pure time-dependent tactic or the mixed strategy. In both cases, no agreement can be obtained due to the non-monotonic offer curves produced by the buyer's mixed strategy.

Figure 3.1: Offer curves for Examples 3.2 and 3.3 when using the linear weighted combination or pure tactics. (a) Example 3.2 with seller using pure time-dependent tactic (left) or mixed strategy (right); (b) Example 3.3 with seller using pure time-dependent tactic (left) or mixed strategy (right). Curves: linear weighted combination, pure time-dependent, pure imitative.

It is important to note that in both examples the offer curve produced by the buyer's mixed strategy approaches the offer curve of the pure time-dependent tactic towards the end of the negotiation (cf. Figure 3.1). This seems counter-intuitive, since the buyer's mixing weight is γ = 0.3, i.e. in favour of the imitative tactic, so that the outcome of the mixed strategy should, in fact, be closer to the outcome of the pure imitative tactic. Therefore, the linear weighted combination of tactics does not represent a true mix of the imitative and non-imitative tactics in this scenario.

Another effect in mixed strategy scenarios is that the outcome may become highly sensitive to the strategy parameters as a result of the dynamic interrelation between the two agents, in the sense that small changes in the settings of one agent may result in a sudden and significant change in the negotiation outcome. In such cases, the described dynamics of the system make it difficult for an agent to control such a mixed strategy and to determine whether a strategy performs well or not.

Figure 3.2: Outcomes for different buyer strategy parameters when using linear weighted combinations of tactics. (a) Outcomes for buyer strategy settings β ∈ [0.01, 4] and γ ∈ [0, 1]; (b) outcomes for buyer strategy settings β = 0.8 and γ ∈ [0, 1]. Seller: mixed strategy γ = 0.25; time-dependent: polynomial β = 1; imitative: relative tit-for-tat δ = 1. Buyer: mixed strategy γ ∈ [0, 1]; time-dependent: polynomial β = 1; imitative: absolute tit-for-tat δ = 1, R(M) = 0.

Figure 3.2 shows the outcome range for different mixing weights and concession settings of the time-dependent tactic in the mix, for a negotiation example in which both agents use mixed strategies. In this example, the outcome is nearly the same for all of the buyer's mixed strategies with mixing weights of 0.6 and higher, whereas below this value the outcome may change abruptly even for small changes in the weight. This high sensitivity makes it difficult for an agent to apply such strategies in real-world scenarios, because the outcome depends not solely on the agent's own parameters but also on the opponent's settings, which are private information.

In single-issue negotiations, non-monotonic concession behaviour also results in a non-monotonic utility curve for the agent. Similar effects can be observed in multi-issue negotiations, as the following example demonstrates.

Example 3.4 Non-monotonic utility for multiple issues
Assume a negotiation between a buyer and a seller about two issues with the static strategy settings shown in Table 3.1. Figure 3.3 shows the offer curves and utility curves of both agents for two different mixing weights of the seller for issue 1.

Table 3.1: Negotiation settings for Example 3.4

                    Buyer agent                          Seller agent
Issue 1             min_1 = 10, max_1 = 25, w_1 = 0.7    min_1 = 15, max_1 = 30, w_1 = 0.5
  Mixed strategy    γ_1 = 0.3                            γ_1 ∈ {0.1, 0.12}
  τ_1,time          polynomial, β_1 = 5                  polynomial, β_1 = 1
  τ_1,beh           absolute tft, δ_1 = 1, R_1 = 0       absolute tft, δ_1 = 1, R_1 = 0
Issue 2             min_2 = 20, max_2 = 40, w_2 = 0.3    min_2 = 30, max_2 = 50, w_2 = 0.5
  Mixed strategy    γ_2 = 0.4                            γ_2 = 0.2
  τ_2,time          polynomial, β_2 = 2                  polynomial, β_2 = 0.3
  τ_2,beh           absolute tft, δ_2 = 1, R_2 = 0       relative tft, δ_2 = 1

The seller's offer curve changes rapidly when the seller changes its mixing weight for issue 1 by a small amount, and it becomes non-monotonic (Figure 3.3a and 3.3b). As both agents use imitative tactics in their mix and apply the traditional linear weighted combination, the non-monotonic behaviour of one agent is reproduced by the other. As a consequence, the offer and utility curves of both agents become non-monotonic, and the agreement is delayed. Figure 3.3 also illustrates that the agreement is delayed in comparison to a monotonic mixing mechanism (negotiation thread-based) that is introduced in Section 3.2. In situations where the agents have different deadlines, this behaviour might also result in a failed agreement. The example also demonstrates the high sensitivity of the parameters in such scenarios, which makes it difficult for an agent to find suitable strategy parameters, as the outcome utility may change significantly for slightly different settings. As shown in Figure 3.3b and 3.3d, the seller's and buyer's utilities change from $U_s = 0.26$ and $U_b = 0.26$ to $U_s = 0.4$ and $U_b = 0.16$, respectively, when the seller changes its weight $\gamma^s_1$ from 0.12 to 0.1.
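The aggregated utilities in Example 3.4 combine per-issue scores with the weights from Table 3.1. A minimal sketch, assuming the usual linear additive model with linear scoring functions, in which the buyer prefers low values and the seller high values on both issues (the scoring functions themselves are not spelled out in this section); all names are hypothetical.

```python
# Assumption: linear additive utility with linear per-issue scoring;
# the buyer prefers low values and the seller high values on both issues.
Issue = tuple[float, float, float]  # (min, max, weight)

BUYER: list[Issue] = [(10.0, 25.0, 0.7), (20.0, 40.0, 0.3)]
SELLER: list[Issue] = [(15.0, 30.0, 0.5), (30.0, 50.0, 0.5)]


def score(x: float, lo: float, hi: float, increasing: bool) -> float:
    """Linear score of a single issue value, clipped to [0, 1]."""
    v = (x - lo) / (hi - lo) if increasing else (hi - x) / (hi - lo)
    return min(1.0, max(0.0, v))


def utility(offer, issues, increasing: bool) -> float:
    """Weighted additive utility of a multi-issue offer."""
    return sum(w * score(x, lo, hi, increasing) for x, (lo, hi, w) in zip(offer, issues))


offer = (20.0, 35.0)
print(round(utility(offer, BUYER, increasing=False), 4))  # 0.7 * 1/3 + 0.3 * 0.25
print(round(utility(offer, SELLER, increasing=True), 4))  # 0.5 * 1/3 + 0.5 * 0.25
```

Because the utility is a weighted sum, a non-monotonic offer curve on a single heavily weighted issue is enough to make the aggregated utility sequence non-monotonic.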
The multi-issue negotiation example above demonstrates that a non-monotonic utility sequence of the agent's own offers can occur at any time when static mixed strategies with imitative and non-imitative tactics are used. Since the strategy parameters of the opponent are private information, it is difficult for an agent to anticipate such behaviour and to determine whether its effects are beneficial or not. Further, it should be noted that the uncontrolled occurrence of non-monotonic concession behaviour in static strategy settings differs from the case in which an agent deliberately chooses to behave non-monotonically, e.g. by changing its mixing weights, and should therefore be avoided.

Figure 3.3: Offer and utility curves for Example 3.4 using the traditional linear weighted combination or the negotiation thread-based mixing. (a) Buyer's and seller's offer curves for issues 1 and 2 and seller's mixing weight γ_1 = 0.12; (b) buyer's and seller's aggregated offer utilities for seller's mixing weight γ_1 = 0.12; (c) buyer's and seller's offer curves for issues 1 and 2 and seller's mixing weight γ_1 = 0.1; (d) buyer's and seller's aggregated offer utilities for seller's mixing weight γ_1 = 0.1.

Figure 3.4: Offer curves for Examples 3.2 and 3.3 when using the constrained linear weighted combination (compared to the traditional linear weighted combination). (a) Example 3.2 with seller using pure time-dependent tactic (left) or mixed strategy (right); (b) Example 3.3 with seller using pure time-dependent tactic (left) or mixed strategy (right).

Constrained linear weighted combination
The question remains how the occurrence of non-monotonic concession behaviour in static and dynamic mixed strategies can be avoided. Intuitively, a simple max- or min-constraint could be applied to the next offer proposal, such that

$$ x^{t_{n+1}}_{a\rightarrow b} = \begin{cases} \max\left(x^{t_{n+1}}_{a\rightarrow b},\, x^{t_{n-1}}_{a\rightarrow b}\right) & \text{if } U_a \text{ is decreasing},\\ \min\left(x^{t_{n+1}}_{a\rightarrow b},\, x^{t_{n-1}}_{a\rightarrow b}\right) & \text{if } U_a \text{ is increasing}. \end{cases} \tag{3.7} $$

This ensures that the agent's own utility does not increase compared to its previous offer. However, the offer curve may then abruptly become flat, so that the agent proposes the same offer over a long period of time, which may also increase the risk of the opponent's withdrawal. This is demonstrated in Figure 3.4, which shows the offer curves for Examples 3.2 and 3.3 when the constraint is applied to the mixed strategies, in comparison to the offer curves without the constraint. In all cases, the offer curve of the constrained mixed strategy approaches the offer curve of the unconstrained mixed strategy towards the end of the negotiation and leads to similar outcomes. Hence, the constrained linear weighted combination still does not represent a true mix of the imitative and non-imitative tactics in this scenario. For these reasons, we present in the next sections two alternative mixing mechanisms that produce monotonic offer and utility sequences in multi-tactic negotiation strategies, based either on individual negotiation threads for each imitative tactic involved or on single concessions.

3.2 Mixing based on Negotiation Threads
In the traditional mixing method, the imitative tactics in a mixed strategy are calculated from the last offer in the current negotiation thread. The imitative part of the strategy therefore does not represent an individually applied behaviour-dependent tactic. Another intuitive method is to use the last offers of each imitative tactic involved in the mix. This can be interpreted as using individual negotiation threads $X^{t_n}_{a\leftrightarrow b}[j,k]$, where k denotes the k-th behaviour-dependent tactic $\tau_{jk}(X^{t_n}_{a\leftrightarrow b}[j,k])$ for issue j. As a result, the offers of all imitative tactics have to be stored in order to be used in the calculation of the next proposals. Formally, the linear weighted combination of tactics can now be written as

$$ x^{t_{n+1}}_{a\rightarrow b}[j] = \sum_{i=1}^{l} \gamma_{ji}\,\tau_{ji}(t_{n+1}) + \sum_{k=l+1}^{m} \gamma_{jk}\,\tau_{jk}\left(X^{t_n}_{a\leftrightarrow b}[j,k]\right), \tag{3.8} $$

where m and l denote the total number of tactics and the number of behaviour-independent tactics, respectively.
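Eq. (3.8) changes only which offer the imitative tactic builds on. The following sketch, for a single issue with one time-dependent and one absolute tit-for-tat tactic (δ = 1, R(M) = 0), keeps a private thread for the imitative tactic; the class and method names are hypothetical, and the numbers are those of Example 3.1.

```python
class ThreadBasedMix:
    """Two-tactic mix in the spirit of Eq. (3.8): the imitative tactic is
    evaluated on its own thread of offers rather than on the mixed offers."""

    def __init__(self, gamma: float, first_offer: float) -> None:
        self.gamma = gamma                 # weight of the time-dependent tactic
        self.imitative_prev = first_offer  # last offer of the imitative thread

    def next_offer(self, time_offer: float, opp_prev2: float, opp_last: float) -> float:
        # Absolute tit-for-tat applied to the imitative tactic's own previous offer.
        imitative = self.imitative_prev + (opp_prev2 - opp_last)
        self.imitative_prev = imitative
        return self.gamma * time_offer + (1.0 - self.gamma) * imitative


mix = ThreadBasedMix(gamma=0.5, first_offer=10.0)
offers = [mix.next_offer(11.0, 30.0, 20.0),  # opponent moved 30 -> 20
          mix.next_offer(12.0, 20.0, 19.0)]  # opponent moved 20 -> 19
print(offers)  # [15.5, 16.5]
```

The only extra state compared with the traditional mix is `imitative_prev`, one stored offer per imitative tactic and issue.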
Unlike the traditional mixing method described in Section, this method can be regarded as a true linear weighted combination of tactics in which all tactics involved are independent of each other.

Theorem 1. The mixing mechanism using individual negotiation threads for each behaviour-dependent tactic results in a monotonic offer curve if monotonic tactics according to Definitions 1 and 2 are used with static weights for all tactics.

Proof. Let $X^{t_n}_{a\leftrightarrow b}$ be the negotiation thread at time $t_n$, with $x^{t_n}_{b\rightarrow a}[j]$ being the last offer and $x^{t_{n+1}}_{a\rightarrow b}[j]$ being the next counteroffer of agent a for issue j. If $U_a$ is decreasing and all $\gamma_{ji}, \gamma_{jk} \geq 0$, then, according to Definitions 1 and 2, $\gamma_{jk}\tau_{jk}(X^{t_n}_{a\leftrightarrow b}[j,k]) \geq \gamma_{jk}\tau_{jk}(X^{t_{n-2}}_{a\leftrightarrow b}[j,k])$ and $\gamma_{ji}\tau_{ji}(t_{n+1}) \geq \gamma_{ji}\tau_{ji}(t_{n-1})$. Since each term of the sum in (3.8) for the offer at $t_{n+1}$ is greater than or equal to the corresponding term for the offer at $t_{n-1}$, it follows that $x^{t_{n+1}}_{a\rightarrow b}[j] \geq x^{t_{n-1}}_{a\rightarrow b}[j]$. The same line of reasoning applies to an increasing utility function $U_a$.

Taking Example 3.1 from Section 3.1, we can calculate the offer sequence using the thread-based mechanism as follows. Given the thread $(\ldots, x^{t_{n-2}}_{b\rightarrow a}, x^{t_{n-1}}_{a\rightarrow b}, x^{t_n}_{b\rightarrow a}) = (\ldots, 30, 10, 20)$, agent a's next offer is the same as with the traditional mixing method: $\gamma\,\tau^a_{time}(t_{n+1}) + (1-\gamma)\,\tau^a_{beh}(x^{t_{n-2}}_{b\rightarrow a}, x^{t_{n-1}}_{a\rightarrow b}, x^{t_n}_{b\rightarrow a}) = 0.5 \cdot 11 + 0.5 \cdot 20 = 15.5$. After b's next offer of 19, the agent uses the imitative offer 20 from the previous step instead of the actual offer 15.5 for the imitative part of the mixed strategy, so that the imitative part is $20 - 19 + 20 = 21$ instead of 16.5. The new offer of agent a is then $x^{t_{n+3}}_{a\rightarrow b} = 0.5 \cdot 12 + 0.5 \cdot 21 = 16.5$, which is larger than the agent's previous offer and therefore results in a monotonic offer sequence.

Figure 3.5 shows the offer curves produced by the negotiation thread-based mechanism for Examples 3.2 and 3.3 in comparison to the offer curves of the traditional linear weighted combination. In Example 3.2, an agreement is reached earlier when using the thread-based mechanism, because the offer curve is now closer to the offer curve that the imitative tactic would produce if it were applied individually. This seems intuitive, as the buyer's mixing weight is 0.3 (and therefore in favour of the imitative tactic).
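Theorem 1 can also be checked empirically. The sketch below draws random non-decreasing time-dependent proposals and a random non-increasing opponent sequence (a buyer facing a conceding seller), applies the thread-based two-tactic mix with a static weight, and verifies that the resulting offers never decrease. It is a sanity check of the statement under these assumptions, not a proof; the helper names are hypothetical.

```python
import random


def thread_based_offers(gamma, time_offers, opp_offers, first_offer):
    """Offers of a thread-based mix (absolute tit-for-tat, delta = 1, R(M) = 0).
    opp_offers[i] and opp_offers[i + 1] are consecutive opponent offers."""
    imitative_prev, offers = first_offer, []
    for tau, (prev2, last) in zip(time_offers, zip(opp_offers, opp_offers[1:])):
        # The imitative thread only ever adds the opponent's (non-negative) concession.
        imitative_prev = imitative_prev + (prev2 - last)
        offers.append(gamma * tau + (1.0 - gamma) * imitative_prev)
    return offers


def is_non_decreasing(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


rng = random.Random(42)
for _ in range(1000):
    rounds = rng.randint(2, 15)
    gamma = rng.random()  # static weight in [0, 1)
    time_offers = sorted(rng.uniform(10, 25) for _ in range(rounds))
    opp_offers = sorted((rng.uniform(15, 30) for _ in range(rounds + 1)), reverse=True)
    offers = thread_based_offers(gamma, time_offers, opp_offers, first_offer=10.0)
    assert is_non_decreasing(offers), (gamma, time_offers, opp_offers)
print("thread-based offers were non-decreasing in all trials")
```

Replacing the private `imitative_prev` by the last mixed offer turns this back into the traditional method, for which such a check can fail, as Example 3.1 shows.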
As a result, the offer curves make larger concessions, because the seller's conceder time-dependent tactic is partially copied, which also leads to agreements in Example 3.3, in contrast to the traditional mixing method (cf. Figure 3.5b). Figure 3.3 shows the monotonic offer curves and the resulting monotonic utility sequence when both agents use this mixing mechanism in Example 3.4. Moreover, the offer curves of the negotiation thread-based mechanism do not change abruptly for small changes in the strategy settings, unlike those of the traditional mixing method. An agreement is reached earlier, with the outcome shifted in favour of the seller. A system of agents using this mixing mechanism does not exhibit the dynamic effects described in Section 3.1. However, the mechanism does not force the agent to propose offers in a monotonic manner. For instance, if the opponent still proposes offers in a non-monotonic sequence, an imitative tactic in the mix might still copy it to some degree. The agent may choose to strictly ensure monotonicity by applying a constraint C to the imitative tactic:

$$ C\left(\tau_{jk}\left(X^{t_n}_{a\leftrightarrow b}[j,k]\right),\, x^{t_{n-1}}_{a\rightarrow b}[j,k]\right), \tag{3.9} $$

where $C \equiv \max$ if $U_a$ is decreasing and $C \equiv \min$ if $U_a$ is increasing. Although the negotiation thread-based mechanism truly mixes the pure imitative and non-imitative tactics according to their weights, the individual imitative threads used by this method do not represent the actual negotiation thread. This seems counter-intuitive, as the offer curve and the outcome of the individually applied imitative tactics might indeed differ from those of the mixed strategy. Therefore, we propose another mixing mechanism based on combining single concessions in the next section.

Figure 3.5: Offer curves for Examples 3.2 and 3.3 when using the negotiation thread-based mixing (compared to the traditional linear weighted combination). (a) Example 3.2 with seller using pure time-dependent tactic (left) or mixed strategy (right); (b) Example 3.3 with seller using pure time-dependent tactic (left) or mixed strategy (right). Curves: traditional linear weighted combination, negotiation thread-based, pure time-dependent, pure imitative.

3.3 Mixing based on Single Concessions
Instead of mixing the individual offers generated by each tactic in the linear weighted combination, one can mix single concessions. Since the decision functions of most heuristic negotiation strategies, such as the tactics shown in Section 2.3.1, generate offers rather than concessions, a concession-based form needs to be derived first. For example, the concession proposed by the time-dependent functions in Section can be written as

$$ \Delta^{t_{n+1}}_{\tau_{ji}} = \tau_{ji}(t_{n+1}) - \tau_{ji}(t_{n-1}), \tag{3.10} $$

where i is a time-dependent tactic according to Definition 3.1. In the case of a behaviour-dependent decision function, the concession can be expressed as the difference between the behaviour-dependent offer to be proposed and the last offer proposed by the agent,

$$ \Delta^{t_{n+1}}_{\tau_{jk}} = \tau_{jk}\left(X^{t_n}_{a\leftrightarrow b}[j]\right) - x^{t_{n-1}}_{a\rightarrow b}[j], \tag{3.11} $$

with k being an imitative tactic according to Definition 3.2 (cf. imitative tactics in Section). For example, the concession-based form of the absolute tit-for-tat tactic is

$$ \Delta^{t_{n+1}}_{\tau_{abs\text{-}tft}} = x^{t_{n-2\delta}}_{b\rightarrow a} - x^{t_{n-2\delta+2}}_{b\rightarrow a} + (-1)^s R(M), \tag{3.12} $$

with δ, s and R(M) being the same as in Eq. (2.11) in Section. In a similar way, any decision function can be expressed in concession-based form, such that the linear weighted combination of tactics in concession-based form is given by

$$ x^{t_{n+1}}_{a\rightarrow b}[j] = x^{t_{n-1}}_{a\rightarrow b}[j] + \sum_{i=1}^{l} \gamma_{ji}\,\Delta^{t_{n+1}}_{\tau_{ji}} + \sum_{k=l+1}^{m} \gamma_{jk}\,\Delta^{t_{n+1}}_{\tau_{jk}}, \tag{3.13} $$

with m and l denoting the total number of tactics and the number of behaviour-independent tactics, respectively. In order to use concessions, at least two offers of the opponent are necessary. Any of the former mechanisms can be used for the initial offers, as they all propose the same offers in the first round.
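Eq. (3.12) only needs the opponent's recent offers. A minimal sketch, assuming the opponent's offers are kept in chronological order so that the last element is $x^{t_n}_{b\rightarrow a}$, the one before it $x^{t_{n-2}}_{b\rightarrow a}$, and so on; the random term R(M) of Eq. (2.11) is passed in as a plain number here (hypothetical parameter `r_m`).

```python
def abs_tft_concession(opp_offers, delta=1, s=0, r_m=0.0):
    """Concession of absolute tit-for-tat, Eq. (3.12):
    x^{t_{n-2delta}} - x^{t_{n-2delta+2}} + (-1)^s * R(M).

    opp_offers: opponent offers in chronological order; opp_offers[-1] is the
    latest one, so opp_offers[-(delta + 1)] lies delta rounds further back.
    r_m stands in for the random amount R(M) (assumption: supplied by the caller)."""
    if len(opp_offers) < delta + 1:
        raise ValueError("at least delta + 1 opponent offers are required")
    return opp_offers[-(delta + 1)] - opp_offers[-delta] + (-1) ** s * r_m


print(abs_tft_concession([30.0, 20.0]))                 # 10.0
print(abs_tft_concession([30.0, 20.0, 19.0]))           # 1.0
print(abs_tft_concession([30.0, 20.0, 19.0], delta=2))  # 10.0
```

The length check mirrors the remark that at least two opponent offers are needed before concessions can be used (for δ = 1).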
Since behaviour-independent tactics do not depend on the opponent's offers, their concessions are simply the difference $\tau_{ji}(t_{n+1}) - \tau_{ji}(t_{n-1})$ between the calculated offer at $t_{n+1}$ and the previous individual offer at $t_{n-1}$. For the imitative tactic we cannot follow the same line of reasoning because, as described in the previous section, the last offer of the individually applied imitative tactic is unknown. However, suppose that the agent switched to the pure imitative tactic at time $t_{n+1}$: its last offer would still be $x^{t_{n-1}}_{a\rightarrow b}$, and the next offer would be given by $\tau_{jk}(X^{t_n}_{a\leftrightarrow b}[j])$. We can therefore calculate the behaviour-dependent concession as the difference between the proposed imitative offer and the last offer of the agent. This approach provides monotonic offer curves similar to the negotiation thread-based mixing and also avoids non-monotonic aggregated utilities over time. The major advantage, however, is that a non-monotonic sequence of utilities is never introduced, even if the agent changes the weights of its tactics dynamically. This can be proven as follows:

Theorem 2. The mixing mechanism based on single concessions of pure tactics results in a monotonic offer curve (and therefore preserves a monotonic sequence of utilities) if monotonic tactics according to Definitions 1 and 2 are used by both parties.

Proof. Let $X^{t_n}_{a\leftrightarrow b}$ be the negotiation thread at time $t_n$, with $x^{t_n}_{b\rightarrow a}$ being the last offer and $x^{t_{n+1}}_{a\rightarrow b}$ being the next counteroffer of agent a. According to Definition 1, the behaviour-independent concession $\tau^a_{ji}(t_{n+1}) - \tau^a_{ji}(t_{n-1})$ is always greater than or equal to zero if $U_a$ is decreasing. The offer $\tau^a_{jk}(X^{t_n}_{a\leftrightarrow b}[j])$ proposed by the pure behaviour-dependent tactic for issue j is greater than or equal to the previous offer $x^{t_{n-1}}_{a\rightarrow b}[j]$ if monotonic tactics according to Definition 2 are used and the opponent never introduces non-monotonicity. The behaviour-dependent concession $\tau^a_{jk}(X^{t_n}_{a\leftrightarrow b}[j]) - x^{t_{n-1}}_{a\rightarrow b}[j]$ is therefore always greater than or equal to zero. For all weights $\gamma_{ji}, \gamma_{jk} \geq 0$, it follows that each term of the sums in Eq. (3.13) is non-negative, and hence $x^{t_{n+1}}_{a\rightarrow b}[j] \geq x^{t_{n-1}}_{a\rightarrow b}[j]$. The same line of reasoning applies to an increasing scoring function $U_a$.
For example, we can calculate the offer sequence of Example 3.1 in Section 3.1 using the concession-based mechanism as follows. Given the thread (..., 30, 10, 20), agent a's next offer is the same as with the traditional mixing method, even though the agent uses the individual concessions of each tactic: $x^{t_{n+1}}_{a\rightarrow b} = 10 + 0.5\,(11 - 10) + 0.5\,(20 - 10) = 15.5$. After b's next offer of 19, agent a mixes the time-dependent concession $12 - 11 = 1$ and the imitative concession $(20 - 19 + 15.5) - 15.5 = 1$ and generates the new offer $x^{t_{n+3}}_{a\rightarrow b} = 15.5 + 0.5 \cdot 1 + 0.5 \cdot 1 = 16.5$, which is larger than the agent's previous offer and thus yields a monotonic offer sequence. The result is the same as with the negotiation thread-based method.
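The worked example above, and the claim that weights may change between rounds, can be reproduced with a small sketch of Eq. (3.13) for one time-dependent and one absolute tit-for-tat tactic. As in the example, it assumes $\tau^a_{time}(t_{n-1}) = 10$; the function name and its parameters are hypothetical.

```python
def concession_step(own_prev, tau_next, tau_prev, opp_prev2, opp_last, gamma):
    """One offer of the concession-based mix, Eq. (3.13), for a buyer."""
    time_conc = tau_next - tau_prev                      # Eq. (3.10)
    imitative_offer = own_prev + (opp_prev2 - opp_last)  # absolute tft, delta = 1
    imitative_conc = imitative_offer - own_prev          # Eq. (3.11)
    return own_prev + gamma * time_conc + (1 - gamma) * imitative_conc


# Assumption (as in the worked example): tau_time(t_{n-1}) = 10.
x1 = concession_step(10.0, 11.0, 10.0, 30.0, 20.0, gamma=0.5)
x3 = concession_step(x1, 12.0, 11.0, 20.0, 19.0, gamma=0.5)
print(x1, x3)  # 15.5 16.5

# Dynamically changing weights: as long as both concessions are non-negative,
# any gamma in [0, 1] keeps the offer at or above the previous one.
for g in (0.0, 0.1, 0.5, 0.9, 1.0):
    assert concession_step(x1, 12.0, 11.0, 20.0, 19.0, gamma=g) >= x1
```

Unlike the thread-based sketch, no per-tactic state is kept: the agent's last real offer is the only anchor, which is what allows the weights to change freely between rounds.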

Figure 3.6: Offer curves for Example 3.3 when using the concession-based mixing (compared to the traditional linear weighted combination). (a) Seller using pure time-dependent tactic; (b) seller using mixed strategy.

The concession-based mixing mechanism produces offer curves similar to those of the negotiation thread-based mechanism for Example 3.2 with a single issue and Example 3.4 with multiple issues. For that reason, we do not show these offer curves here (see Figure 3.5 and Figure 3.3 for the offer curves of the thread-based mechanism). However, in Example 3.3 the concession-based mixing obtains an earlier agreement for the case where the seller uses a mixed strategy, with an outcome similar to that of the thread-based mixing. This is shown in Figure 3.6b. The concession-based mechanism also does not exhibit the high sensitivity to strategy parameters in the examples above. In contrast to the thread-based mixing, this mechanism requires no separate negotiation threads and produces monotonic offer curves even for dynamically changing weights. Therefore, an agent can change its strategy dynamically during the negotiation encounter and still ensure that it never introduces a non-monotonic sequence of offers. Like the previous method, the mechanism does not force the agent to propose offers in a monotonic manner, because an imitative tactic in the mix may still copy a non-monotonic sequence of the opponent's offers. The agent can strictly avoid such imitation by applying a constraint C to each imitative concession in (3.13):

$$ C\left(\tau_{jk}\left(X^{t_n}_{a\leftrightarrow b}[j]\right) - x^{t_{n-1}}_{a\rightarrow b}[j],\, 0\right), \tag{3.14} $$

where $C \equiv \max$ if $U_a$ is decreasing and $C \equiv \min$ if $U_a$ is increasing. In the next section, we evaluate the mixing mechanisms investigated in this chapter.
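The constraint (3.14) clips the imitative concession at zero. A sketch for the buyer of Example 3.1, assuming a hypothetical continuation in which the seller retracts from 20 to 22 after the buyer's offer of 15.5: without the constraint the buyer's offer drops, with it the buyer still concedes through its time-dependent tactic. The function name is hypothetical.

```python
def constrain_concession(concession: float, utility_decreasing: bool) -> float:
    """Eq. (3.14): C = max for a decreasing utility (buyer), min otherwise (seller)."""
    return max(concession, 0.0) if utility_decreasing else min(concession, 0.0)


own_prev, gamma = 15.5, 0.5
time_conc = 12.0 - 11.0       # time-dependent concession, Eq. (3.10)
imitative_conc = 20.0 - 22.0  # hypothetical: the seller retracted from 20 to 22

unconstrained = own_prev + gamma * time_conc + (1 - gamma) * imitative_conc
constrained = own_prev + gamma * time_conc + (1 - gamma) * constrain_concession(
    imitative_conc, utility_decreasing=True)
print(unconstrained, constrained)  # 15.0 16.0
```

For a seller, whose concessions are negative, the same helper uses min and discards positive (retracting) imitative moves instead.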

3.4 Evaluation
This section presents the results of a comparative evaluation of the mixing mechanisms discussed in this chapter with respect to their non-monotonic concession behaviour and its effects in different bilateral negotiation settings.

Experiment Settings
In order to analyse the individual concession behaviour of mixed strategies, we consider negotiations about a single issue (for example, price) between a client c and a provider p, both of which use mixed strategies. Because the number of possible mixes of tactics is infinite, we restrict the evaluation to an example mix of two tactics for each agent, one behaviour-dependent and one time-dependent, with static weights throughout the encounter. The tactics chosen are the polynomial decision function and absolute tit-for-tat, such that the set of strategy groups, as detailed in Section, is generated for the different settings and mixing weights as follows:

$$ ST = \{PC, PL, PB\} \times \{a\} \times \{S, M, L\} \tag{3.15} $$

The absolute tit-for-tat tactic is chosen because it is a symmetrical tactic, i.e. it has the same behaviour regardless of whether a buyer or a seller applies it, and it is also independent of the scale of the negotiation interval (cf. Section). The polynomial decision function is a good choice because its concession behaviour can be clearly classified into the groups conceder, linear and boulware, which simplifies the interpretation of the experimental results. The different mixing mechanisms are compared against each other in a setting where the provider randomly selects a strategy from the set ST while the client plays a particular strategy group from ST. This is similar to the client playing a particular strategy group against the average of all strategies in ST played by the provider. Furthermore, it is interesting to see how the agents perform if both use the same mixing mechanism or if only one agent uses a monotonic mixing mechanism.
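The strategy set (3.15) is a Cartesian product and can be enumerated directly. In this sketch the labels follow Table 3.2 (e.g. PCaS); reading S, M and L as small, medium and large mixing weights is an assumption, since their values are not given in this section. The random provider choice mirrors the experimental setup described above.

```python
import itertools
import random

TIME_TACTICS = ("PC", "PL", "PB")  # polynomial conceder, linear, boulware
IMITATIVE = ("a",)                 # absolute tit-for-tat
WEIGHTS = ("S", "M", "L")          # mixing-weight classes (assumed small/medium/large)

ST = ["".join(group) for group in itertools.product(TIME_TACTICS, IMITATIVE, WEIGHTS)]
print(ST)

# One experimental pairing: the client plays a fixed group, the provider a random one.
rng = random.Random(0)
client_group = "PBaS"
provider_group = rng.choice(ST)
print(client_group, "vs", provider_group)
```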
For that reason, we distinguish between two types of scenarios, one-sided and two-sided: in the former, the provider uses the traditional linear weighted combination of tactics while the client applies the different mixing mechanisms; in the latter, both agents use the same mechanism for each strategy group. In addition, we consider a more rational agent behaviour in which an agent withdraws from the negotiation if it detects non-monotonic concession behaviour of the opponent for more than two negotiation rounds. Since non-monotonicity only occurs in the one-sided scenarios, we only consider such rational behaviour for the one-sided case, and we call the two resulting scenario variants the one-sided without withdraw and the one-sided with withdraw scenario. The performance of the mixed strategies is measured using the average intrinsic utility $U_a$ with $a \in \{c, p\}$, the negotiation length $t_n$ and the agreement rate A (in %). The agents employ linear utility functions, which allows the effects of the different mixing mechanisms on the concession behaviour to be measured directly. We use bar charts to illustrate the performance of the individual mixing mechanisms (with the small dotted bars on top representing the standard deviation). In each diagram, a group of bars represents one strategy scenario, where the bars depict the mixing mechanisms from left (light) to right (dark) as follows:
1 - Linear weighted combination of tactics
2 - Constrained linear weighted combination
3 - Negotiation thread-based mechanism
4 - Concession-based mechanism
As described in Section, we focus on scenarios with more realistic settings. This means that the agents have only a partial overlap of their negotiation intervals, i.e. the zone of agreement is either small or large (Φ ∈ {0.33, 0.66}). In addition, the agents typically do not know their opponents' deadlines, as these are part of their private preferences, but an agent system may also impose a system-specific deadline on its negotiation interactions. Because of this, we distinguish between scenarios with equal and different deadlines in the evaluation.
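The withdrawal rule of the one-sided with withdraw scenario can be sketched as a counter over the opponent's offers. The text does not say whether the "more than two negotiation rounds" must be consecutive; this sketch assumes a cumulative count, and the function names are hypothetical.

```python
def backtracking_rounds(opp_offers, opp_utility_increasing: bool) -> int:
    """Number of rounds in which the opponent moved its offer in its own favour,
    i.e. made a negative concession."""
    count = 0
    for prev, cur in zip(opp_offers, opp_offers[1:]):
        if (cur > prev) if opp_utility_increasing else (cur < prev):
            count += 1
    return count


def should_withdraw(opp_offers, opp_utility_increasing: bool, limit: int = 2) -> bool:
    """Withdraw once non-monotonic behaviour was seen in more than `limit` rounds.
    Assumption: rounds are counted cumulatively, not consecutively."""
    return backtracking_rounds(opp_offers, opp_utility_increasing) > limit


# The client (buyer) watches the provider (seller), whose utility increases with price.
provider = [30.0, 28.0, 29.0, 27.0, 28.0, 26.0, 27.0]
print(backtracking_rounds(provider, True), should_withdraw(provider, True))  # 3 True
```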
In the evaluation, the negotiation environment settings are as follows:
Client: $t^c_{max} \in \{20, 25, 30, 35, 40\}$, $min_c \in \{10\}$, $max_c \in \{25\}$
Provider: $t^p_{max} \in \{20, 25, 30, 35, 40\}$, $min_p$ determined by the overlap Φ ∈ {0.33, 0.66}, $max_p \in \{min_p + 15\}$
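The deadline and overlap dimensions of this environment grid can be enumerated as follows. This is a sketch only: the mapping from Φ to $min_p$ is not stated in this section and is therefore omitted, so each tuple lists just (client deadline, provider deadline, Φ).

```python
import itertools

DEADLINES = (20, 25, 30, 35, 40)
OVERLAPS = (0.33, 0.66)  # values of Phi; min_p depends on Phi in a way not given here

settings = list(itertools.product(DEADLINES, DEADLINES, OVERLAPS))  # (t_c, t_p, Phi)
equal = [s for s in settings if s[0] == s[1]]
different = [s for s in settings if s[0] != s[1]]
print(len(settings), len(equal), len(different))  # 50 10 40

# e.g. the equal-deadline configurations for one overlap value
print([s for s in equal if s[2] == 0.33])
```

Splitting the grid this way matches the distinction between equal-deadline and different-deadline scenarios used in the following sections.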

Since it is also important to see how often non-monotonic concession curves occur when both agents use the traditional mixing mechanism, we first consider a simple example scenario and report the percentage of negotiations in which non-monotonicity occurred, as well as the degree of non-monotonicity measured in terms of utility. As a result, the following types of negotiation scenarios are considered:
- Non-monotonicity of concession curves
- Small overlap and equal deadlines
- Small overlap and different deadlines
- Large overlap and equal deadlines
- Large overlap and different deadlines
It should be noted that, as described in Section 2.3, in multi-issue negotiations the decision strategies that use heuristic tactics for the concession-making of an agent are typically applied either to each issue individually or along the indifference curves of the agent's utility function, potentially in combination with a trade-off mechanism [120]. Therefore, it is sufficient to focus the evaluation on a single issue. In addition, the space of possible combinations of mixed strategies for a number of issues and their respective tactics becomes intractably large and is thus difficult to evaluate. However, an initial evaluation for some examples can be found in [113]. The following sections present and discuss the experimental results for the settings described above.

Non-Monotonicity of Concession Curves
Before comparing the different mixing mechanisms, we are interested in when and to what degree non-monotonic behaviour emerges in static mixed strategies that use the traditional linear weighted combination of tactics. In order to demonstrate this, we choose an example scenario with equal deadlines and a large overlap of the negotiation intervals (Φ = 0.33).
Table 3.2 illustrates the rate (%) of negotiations in which non-monotonic offer curves occurred when both agents apply the traditional linear weighted combination of tactics for a particular strategy group from ST.

p / c   PCaS  PCaM  PCaL  PLaS  PLaM  PLaL  PBaS  PBaM  PBaL
PCaS      5%    4%    4%   32%   85%   53%   37%   98%   83%
PCaM     25%   26%   31%   56%  100%   62%   60%  100%   70%
PCaL     14%   19%   17%   58%   86%   51%   60%   96%   84%
PLaS     61%   79%   57%    0%    0%    0%   33%   72%   60%
PLaM     22%   78%   64%    0%    0%    0%    4%   24%   35%
PLaL     17%   46%   40%    0%    0%    0%   21%   45%   33%
PBaS     89%   86%   90%    0%    0%    0%    0%    0%    0%
PBaM     88%  100%   90%    0%   15%   10%    0%    0%    0%
PBaL     64%   72%   87%   11%    7%    0%    0%    0%    0%
Table 3.2: Non-monotonicity in negotiations

The numbers below the rate correspond to the maximum variation in terms of non-monotonicity that occurred, given as a utility measure for the provider (top) and client (bottom). As we can see, the dynamically emerging non-monotonic behaviour in static strategy settings is not a negligible side-effect in negotiation: in 38 % of all negotiations the agents exhibit non-monotonic offer curves. We can observe that in such scenarios the variation of non-monotonicity is higher when oppositional time-dependent tactics are applied in the mix, such as conceder against boulware. This corresponds to our observations in Section where the non-monotonicity occurred, for example, when one party uses a boulware tactic in the mix together with an imitative tactic and the opponent applies a conceder tactic. Accordingly, almost no non-monotonicity occurs in scenarios where both agents apply the same concession behaviour in the mix, e.g. conceder against conceder or boulware against boulware. It should also be noted that, since both agents apply the imitative absolute tit-for-tat tactic in their mix, the degree of non-monotonic concession behaviour is similar when both parties apply the same mixed strategies.

Scenario with Small Overlap and Equal Deadlines

In this scenario, both agents, the client and the provider, have equal deadlines chosen from the set {20, 25, 30, 35, 40} and a small overlap with Φ = 0.66. The utilities shown in Figures 3.7 to 3.9 suggest that the traditional linear weighted combination and its constrained version perform similarly, as do the thread-based and concession-based mixing mechanisms. This corresponds to the observations in the earlier examples (cf. Sections 3.1.3, 3.2 and 3.3), in which the monotonicity constraint did not have a large effect on the outcome of the traditional mechanism, and the thread- and concession-based mechanisms had similar concession curves. When agents continue negotiating in the one-sided scenario regardless of whether they detect non-monotonic concession behaviour of their opponent, the results are similar for the one-sided and the two-sided scenarios. This is a surprising result, since in the one-sided scenario non-monotonicity still occurs because the provider always applies the traditional method, whereas in the two-sided scenario no non-monotonic concession curves occur since both agents apply the monotonic mixing mechanisms. This suggests that the monotonic concession curve produced by the thread- and concession-based mechanisms already influences the negotiation outcome. In fact, the new mixing mechanisms generate monotonic concession curves that differ from that of the constrained mechanism, as the latter only constrains the concession curve of the traditional linear weighted combination and therefore obtains similar results. Since both agents have an imitative tactic in their mix, the monotonic concession curve of the client is copied to some degree, resulting in different outcomes for the new monotonic mixing mechanisms already in the one-sided scenario.

Figure 3.7: Client (left) and provider (right) average utilities in the one-sided without withdraw scenario (client uses different mixing mechanisms while provider always uses traditional mixing) with small overlap and equal deadlines
Figure 3.8: Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with small overlap and equal deadlines
Figure 3.9: Client (left) and provider (right) average utilities in the two-sided scenario (both agents use the same mixing mechanism) with small overlap and equal deadlines

The reason for the different monotonic concession behaviour is that the traditional and constrained mechanisms approach the time-dependent tactic towards the end of the negotiation in many settings, such that the mixed strategy reaches the reservation value of the agent. The thread- and concession-based mechanisms, on the other hand, truly mix both tactics, with the result that the reservation value of the mixed strategy is a mix of the reservation values of both tactics (cf. Sections 3.2 and 3.3). In the case of the imitative tactic, this depends on the amount of the opponent's concessions and the degree to which they are copied. In our scenario with absolute tit-for-tat, the reservation value (or maximum possible amount of copied concession) lies in the middle of the negotiation range and is therefore lower than the reservation value of the time-dependent tactic. This means that the mixed strategies using the thread- and concession-based mechanisms do not reach the reservation value of the agent towards the end in some settings, which may give the agent higher utilities in some scenarios but also a slightly lower rate of agreements in others (see Figures 3.10 and 3.12). However, if we assume that an agent withdraws from the negotiation when it detects non-monotonic concession behaviour, the utilities are significantly lower for all strategy groups, while the agreement rate is larger for the monotonic mixing mechanisms (see Figure 3.11).

Figure 3.10: Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with small overlap and equal deadlines
Figure 3.11: Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with small overlap and equal deadlines
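The reservation-value argument can be illustrated with a small worked example. The sketch below assumes a single price issue seen from the client (buyer) side, fixed mixing weights, and that each pure tactic in a thread-based mix never proposes beyond its own limit. The limit of 25 for the time-dependent tactic is the client's $\max^c$ from the settings; the tit-for-tat limit of 17.5 is a hypothetical value in the middle of the range, and the function name is likewise hypothetical.

```python
def thread_mix_limit(weights, tactic_limits):
    """Highest offer a buyer's thread-based mix can reach (illustrative sketch).

    Each pure tactic runs in its own thread and never proposes beyond its own
    limit, so a linear combination with fixed, non-negative weights summing
    to one cannot exceed the weighted sum of those limits.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * limit for w, limit in zip(weights, tactic_limits))


# Time-dependent tactic limited by the client's reservation value max^c = 25;
# hypothetical tit-for-tat saturation in the middle of the negotiation range.
print(thread_mix_limit([0.5, 0.5], [25.0, 17.5]))  # 21.25
print(thread_mix_limit([1.0, 0.0], [25.0, 17.5]))  # 25.0
```

With equal weights the mix stops at 21.25, below the reservation value of 25 that the traditional mix tends to reach near the deadline, which is the trade-off between higher utility and a slightly lower agreement rate described above.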

Figure 3.12: Average negotiation length (left) and agreement rates (right) for the two-sided scenario with small overlap and equal deadlines

Compared to traditional mixing, the monotonic mechanisms obtain higher utilities for the client in all strategy groups. The negotiation length is slightly smaller for the thread- and concession-based mechanisms in many strategy groups in the one-sided scenario without withdraw and in the two-sided scenario. In general, the thread- and concession-based mechanisms improve the client's utilities for the strategy groups with conceder time-dependent tactics in the mix compared to the traditional and constrained mechanisms, whereas the opposite holds for the provider. The two-sided scenario is similar: for the boulware tactics, the utilities are higher for the provider and lower for the client when using the thread- and concession-based methods.

Scenario with Small Overlap and Different Deadlines

In this scenario, the client and the provider have different deadlines chosen from the set {20, 25, 30, 35, 40} and a small overlap with Φ = 0.66. As a result, the agents fail to reach an agreement in many negotiations, so the utilities obtained by both agents are very low compared to the scenarios with equal deadlines or large overlap. Because the zone of agreement is very small, the performance in the one-sided scenario with withdraw is only slightly lower (cf. Figure 3.14).

Figure 3.13: Client (left) and provider (right) average utilities in the one-sided without withdraw scenario with small overlap and different deadlines
Figure 3.14: Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with small overlap and different deadlines
Figure 3.15: Client (left) and provider (right) average utilities in the two-sided scenario with small overlap and different deadlines

Figure 3.16: Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with small overlap and different deadlines
Figure 3.17: Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with small overlap and different deadlines
Figure 3.18: Average negotiation length (left) and agreement rates (right) for the two-sided scenario with small overlap and different deadlines

The client can benefit in almost all scenarios and strategy groups when using the thread- or concession-based mechanism, whereas the provider obtains higher utilities only for conceder time-dependent tactics in the mix when using the traditional or constrained mixing, in the one-sided scenario without withdraw and in the two-sided scenario. In addition, the thread- and concession-based mechanisms achieve a higher agreement rate in almost all scenarios and strategy groups, except for the conceder time-dependent tactics in the one-sided scenario without withdraw.

Scenario with Large Overlap and Equal Deadlines

In this scenario, the different mixing mechanisms are compared when both agents have equal deadlines chosen from the set {20, 25, 30, 35, 40} and the overlap is large with Φ = 0.33. As in the setting with small overlap and equal deadlines, the one-sided scenario without withdraw and the two-sided scenario yield similar results. We can also see that the new mixing mechanisms shift utility from one agent to the other when the agents apply oppositional concession behaviour in their time-dependent tactics. The client gains utility in the one-sided without withdraw and two-sided scenarios when it applies conceder time-dependent tactics in the mix with the thread- and concession-based mechanisms (cf. Figures 3.19 and 3.21), whereas the opposite holds for the provider. Similarly, the client loses utility when using the boulware tactics with the new mechanisms, while the provider gains utility. This corresponds to the findings from Section where the highest rate of non-monotonicity occurred for strategy groups with oppositional concession behaviour. The utility drops considerably for both agents when they withdraw from the negotiation after detecting non-monotonic concession behaviour, and the rate of agreements is also lower. Due to the large overlap and the equal deadlines, the agents reach a full agreement rate in the one-sided scenario without withdraw and in the two-sided scenario when using the traditional or constrained mixing method. In the same scenarios, the agreement rate is only slightly lower for some strategy groups when using the thread- or concession-based mechanisms. As explained in Section this is because the thread- and concession-based mechanisms treat the pure tactics individually, which may result in an overall lower reservation value for the mixed strategy compared to the traditional or constrained mixing methods.

Figure 3.19: Client (left) and provider (right) average utilities in the one-sided without withdraw scenario with large overlap and equal deadlines
Figure 3.20: Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with large overlap and equal deadlines
Figure 3.21: Client (left) and provider (right) average utilities in the two-sided scenario with large overlap and equal deadlines
Figure 3.22: Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with large overlap and equal deadlines
Figure 3.23: Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with large overlap and equal deadlines
Figure 3.24: Average negotiation length (left) and agreement rates (right) for the two-sided scenario with large overlap and equal deadlines
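The link between oppositional tactics and non-monotonic offers can be reproduced in a few lines. The following is a minimal sketch, not the thesis implementation. It assumes a single price issue, a client that mixes a boulware time-dependent tactic (the common polynomial form with κ = 0 and β = 0.2) with absolute tit-for-tat using the traditional linear weighted combination with fixed weights, and a hard-coded, hypothetical provider sequence whose concessions shrink quickly like a conceder's. The imitative part adds the provider's last concession to the client's own previous mixed offer; the function names are hypothetical.

```python
def boulware_offer(n: int, t_max: int, lo: float, hi: float, beta: float) -> float:
    """Time-dependent tactic for a buyer with kappa = 0; beta < 1 is boulware."""
    return lo + (hi - lo) * (min(n, t_max) / t_max) ** (1.0 / beta)


def traditional_mix(opponent_offers, w_time=0.5, t_max=10, lo=10.0, hi=25.0, beta=0.2):
    """Buyer offers from the traditional linear weighted combination of a
    time-dependent tactic and absolute tit-for-tat (fixed weights)."""
    offers = [lo]  # initial offer at the buyer's minimum
    for n in range(1, len(opponent_offers)):
        x_time = boulware_offer(n, t_max, lo, hi, beta)
        # absolute tit-for-tat: own previous (mixed) offer plus the
        # opponent's last concession
        concession = opponent_offers[n - 1] - opponent_offers[n]
        x_tft = offers[-1] + concession
        mixed = w_time * x_time + (1.0 - w_time) * x_tft
        offers.append(min(max(mixed, lo), hi))  # keep within [lo, hi]
    return offers


# Hypothetical conceder-like provider: large early concessions that shrink.
seller = [30.0, 24.0, 21.0, 20.0, 19.5, 19.25]
print([round(x, 3) for x in traditional_mix(seller)])
print([round(x, 3) for x in traditional_mix(seller, w_time=1.0)])
```

With equal weights, the client's offers climb to about 13.0 by the second round and then fall back to roughly 11.0, although both pure tactics are monotonic on their own; with `w_time=1.0` (the pure time-dependent tactic) the sequence increases throughout.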

Scenario with Large Overlap and Different Deadlines

In this scenario, the agents have different deadlines chosen from the set {20, 25, 30, 35, 40} with a large overlap of the negotiation intervals (Φ = 0.33). Because of the different deadlines, the agents obtain lower average utilities compared to the scenario with equal deadlines. Similar to the other scenarios, the one-sided setting with withdraw obtains lower utilities for both agents. Again, the client benefits from the thread- and concession-based mechanisms for strategy groups with conceder time-dependent tactics in the mix, whereas the opposite holds for the provider. Interestingly, both agents gain utility in the one-sided setting with and without withdraw when the client applies boulware tactics with the new mechanisms (cf. Figures 3.25 and 3.26). This, however, is not the case in the two-sided setting (Figure 3.27). Accordingly, the agreement rates are also significantly higher for the one-sided scenarios with and without withdraw in the case of boulware tactics and the thread- and concession-based mixing, but not in the two-sided scenario. As mentioned in the previous sections, the reason is that in many settings the reservation value of the overall mixed strategy is lower for the thread- and concession-based mechanisms than for the traditional or constrained methods, whose mixed strategies approach the time-dependent tactics in the mix towards the end of the negotiation.

Figure 3.25: Client (left) and provider (right) average utilities in the one-sided without withdraw scenario with large overlap and different deadlines
Figure 3.26: Client (left) and provider (right) average utilities in the one-sided with withdraw scenario with large overlap and different deadlines
Figure 3.27: Client (left) and provider (right) average utilities in the two-sided scenario with large overlap and different deadlines
Figure 3.28: Average negotiation length (left) and agreement rates (right) for the one-sided without withdraw scenario with large overlap and different deadlines
Figure 3.29: Average negotiation length (left) and agreement rates (right) for the one-sided with withdraw scenario with large overlap and different deadlines
Figure 3.30: Average negotiation length (left) and agreement rates (right) for the two-sided scenario with large overlap and different deadlines

3.5 Related Work

Based on the prominent negotiation tactics introduced in [44], different negotiation strategies have been proposed, which focus primarily on single families of tactics. For example, Fatima et al. [45, 46] investigate scenarios of single- and multi-issue negotiation in which agents have only partial information about each other and try to find optimal strategies that best exploit the opponent. While this work focuses on the effect of time, information states and discounting factors on the outcome, and compares the results to equilibrium solutions, it is limited to time-dependent tactics and does not consider mixed strategies. Faratin et al. provide evaluation results for pure, static and dynamic mixed strategies in [44, 42], with a focus on the influence of long- and short-term deadlines and of initial offers. Although the initial idea of mixing tactics by a linear weighted combination is proposed in that work, it investigates neither the resulting offer curves of the mixed strategies nor the non-monotonic effects. Matos et al. [94] propose applying genetic algorithms to determine the most successful mixed strategies, which evolve depending on the environment and the strategy of the opponent. Both approaches demonstrate that mixed strategies perform better than pure tactics in terms of gained utility and negotiation cycles, but do not investigate the mixing mechanism with respect to non-monotonic behaviour. Cardoso et al. [25] and Brzostowski et al. [19, 18] consider the mixing of different tactic families to evaluate adaptive strategies based on reinforcement learning and on heuristic predictive methods or regression analysis, respectively, with a focus on their negotiation outcomes only. Sierra and Ros [120] propose to let an agent make concessions through single or mixed tactics whenever a deadlock occurs, i.e. when the opponent's last offer does not improve the utility of its offer two steps before; otherwise, a trade-off tactic is used. However, utilities of offers may also decrease when single tactics are combined. The work presented here differs in that it focuses on the analysis of the mixing mechanism itself, and proposes new mechanisms that, in contrast to the commonly used linear weighted combination of tactics, generate monotonic sequences of offers and utilities during the negotiation, thereby leading to different outcomes compared to the non-monotonic offer curves.

3.6 Summary

In this chapter we have investigated a heuristic approach for creating multi-tactic negotiation strategies by mixing a set of pure tactics at each stage of the negotiation using linear weighted combinations. This approach was chosen because of its ability to dynamically generate complex concession behaviour by combining different types of simple decision functions, which require the agent to have no knowledge about the other's decision models and preferences and may use only the limited information available in the current encounter.
We have shown that, when using the traditional linear weighted combination, non-monotonic sequences of offers may occur even in static mixed strategies that involve behaviour-dependent and -independent tactics which individually generate offers in a monotonic manner. Such non-monotonic offer curves also lead to a non-monotonic sequence of an agent's own utilities, which is argued to be undesirable in automated negotiation. Furthermore, it has been shown that such non-monotonic concession behaviour can occur at any time as a result of the dynamic effects of an agent system in which the agents use mixed strategies with imitative tactics, and that these effects can delay agreements, significantly change outcomes, and make the outcomes highly sensitive to the strategy parameters compared to monotonic mixing mechanisms. Accordingly, we have proposed two new mixing mechanisms, the first based on individual negotiation threads for each imitative tactic involved and the second using the single concessions of each tactic. Based on definitions of monotonic behaviour-dependent and -independent tactics, it has been proven that the new mechanisms produce monotonic concession behaviour: the negotiation thread-based mechanism for static weights, and the concession-based mechanism also for dynamic weights. A number of examples have been used in this chapter to demonstrate the different concession behaviours of the mixing mechanisms, and the evaluation has shown that the proposed mechanisms can improve an agent's utilities in many negotiation scenarios.

Chapter 4 Multistage Fuzzy Decision-Making in Automated Negotiation

This chapter presents a novel approach for modelling negotiation strategies based on multistage fuzzy decision-making. The process of negotiation can be considered a multistage decision process in which two parties make decisions over multiple stages in order to find a mutually acceptable agreement that satisfies the preferences of both. During the decision process, the agents need to consider the concession behaviour of the opponent as well as their own preferences and conditions, while the decision models, preferences and deadlines of the agents are unknown to each other. Although heuristic-based multi-tactic strategies are suitable in such situations and can generate partially adaptable behaviour when mixing behaviour-dependent and -independent tactics, it is difficult for an agent not only to decide which tactics to choose but also to find the appropriate set of strategy parameters to achieve a particular behaviour, due to the infinite number of possible combinations and their sometimes unknown behavioural effects (see previous chapter). The pure tactics, on the other hand, typically represent only simple functions that react to time or a particular resource, or simply copy the opponent's behaviour to some degree, but are not able to utilize any knowledge an agent may have about the concession behaviour of the opponent. For that reason, we consider a multistage fuzzy decision approach for an agent's negotiation strategy in which the agent can use its limited knowledge, for example in the form of reference cases from a few past interactions, to generate a model with fuzzy state transitions about the possible concession behaviour of the opponent. In this decision model, the agent's preferences are modelled using a fuzzy goal and fuzzy constraints that impose the agent's soft preferences on the decision process, in order to obtain a strategy that considers both the limited knowledge about the opponent's concession behaviour and the agent's preferred strategy. The representation of the offer-response pattern in the form of the fuzzy transition model allows the agent to use dynamic programming algorithms to find the best course of action during the encounter.

In the subsequent sections, we first present the model for multistage decision-making with fuzzy state transitions, and show how an agent can use this model to create flexible negotiation strategies from limited knowledge in the form of a few reference cases, in order to negotiate competitively in environments in which agents can exhibit different strategic behaviours. We also show that an agent can use the fuzzy constraints to impose further soft preferences or conditions on the decision process, which then finds a course of action that takes into account both the limited knowledge about the concession behaviour and the agent's soft preferences. We use some examples to demonstrate the modelling of a negotiation strategy, and also show the decision algorithms of an agent applying this model. As in the previous chapter, the evaluation section validates the modelling approach and shows experimental results for different negotiation behaviours.

4.1 Model with Fuzzy State Transitions

The multistage fuzzy decision models for deterministic and stochastic systems from Section assume that the underlying model of the state transitions is known.
In our negotiation context this appears to be a strong assumption, because in many scenarios these probabilistic state transitions might not be obtainable by an agent, due to the limited amount of available information or the large number of negotiations needed to derive such a model. For that reason, we consider the case where the underlying state transitions are fuzzy and represent the possible state transitions of the system when a particular action is chosen. This corresponds to the viewpoint of possibility theory [146], where the possibility degree of a particular state transition reflects how plausible it is to attain a succeeding state given the state and action at the current stage [124]. The fuzzy representation of the state transitions allows an agent to fuzzify the observed concession behaviour and use it in the model for its decision-making. As a result, the knowledge required for the creation of the fuzzy transition model may be limited, for example taken from only a few reference cases, as shown in the next section. In such a fuzzy system, the state transition function is a conditional fuzzy relation with the membership function

$\mu_F(x_{t+1} \mid x_t, u_t)$    (4.1)

with $\mu_F: X \times U \times X \to [0, 1]$, assigning to each $x_t \in X$ and $u_t \in U$ a fuzzy value for the consecutive state $x_{t+1} \in X$. A similar model has been proposed by Kacprzyk [68], in which the transition model as well as the states and actions are fuzzy. However, in our negotiation context the agents exchange crisp offers, which makes modelling with fuzzy states and actions inapplicable. In addition, it has been shown that in that model, due to the infinite number of possible fuzzy states and actions during the backward iteration, finding a solution may become intractable and therefore requires interpolation between a limited number of fuzzy states and actions. We follow the approach where the state transitions are fuzzy but the states and actions in the decision process are crisp. Similar to the models described in we assume that the transition matrix is time-invariant and that the decision process has a finite termination time $N$. Since in a negotiation an agent is most interested in the final outcome of the encounter, one fuzzy goal $G^N$ is imposed at the final stage only, whereas fuzzy constraints $C^t$ may be imposed at each stage of the decision process, with $t = 0, \dots, N-1$. This model is thus a fuzzy or possibilistic Markov decision process in a fuzzy environment.
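One way to hold the conditional fuzzy relation of Eq. (4.1) in memory is a table keyed by the crisp state-action pair. The sketch below assumes integer-indexed states and actions and a nested dictionary layout (both hypothetical choices, as are the function names). It checks that all degrees lie in [0, 1] and evaluates the possibility that the next state falls into a crisp set of states, which in possibility theory is the maximum membership over that set. Unlike a probability distribution, the degrees for one state-action pair need not sum to one.

```python
from typing import Dict, Iterable, Tuple

# mu_f[(x_t, u_t)][x_next] = mu_F(x_next | x_t, u_t); missing entries mean 0.
FuzzyRelation = Dict[Tuple[int, int], Dict[int, float]]


def check_relation(mu_f: FuzzyRelation) -> None:
    """Raise if any membership degree lies outside [0, 1]."""
    for (x, u), row in mu_f.items():
        for x_next, degree in row.items():
            if not 0.0 <= degree <= 1.0:
                raise ValueError(f"mu_F({x_next} | {x}, {u}) = {degree} not in [0, 1]")


def possibility(mu_f: FuzzyRelation, x: int, u: int, targets: Iterable[int]) -> float:
    """Possibility that the next state lies in the crisp set `targets`."""
    row = mu_f.get((x, u), {})
    return max((row.get(y, 0.0) for y in targets), default=0.0)


# Hypothetical relation: state 0, action 0 (small concession) or 1 (large).
mu_f = {(0, 0): {0: 1.0, 1: 0.4}, (0, 1): {1: 0.8, 2: 0.6}}
check_relation(mu_f)
print(possibility(mu_f, 0, 1, {1, 2}))  # 0.8
print(possibility(mu_f, 0, 0, {2}))     # 0.0
```

Note that the row for state 0 and action 1 sums to 1.4, which is valid for a fuzzy relation.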
The decision problem is derived from the stochastic system under control described in Section where the optimal course of actions $u_0, \dots, u_{N-1}$ is sought that maximizes the expected fuzzy goal given the initial state $x_0$ and the fuzzy constraints over all stages:

$\mu_D(u_0, \dots, u_{N-1} \mid x_0) = \max_{u_0, \dots, u_{N-1}} \left[ \mu_{C^0}(u_0) \wedge \dots \wedge \mu_{C^{N-1}}(u_{N-1}) \wedge E\mu_{G^N}(x_N) \right]$    (4.2)

However, the expected goal $E\mu_{G^N}(x_N)$ clearly does not follow the same notion of the probability of attainment of a fuzzy event as in the probabilistic case. In fact, the expected goal contains the expected fuzzy values for each possible action at each state in the current stage, based on the particular decision criterion chosen by the agent. For example, in the case of the probabilistic system in a fuzzy environment the decision criterion is the probability of the attainment of a fuzzy event, whereas in a pure Markov decision process the decision criterion is the expected value or reward of future states and actions. A number of decision criteria have been proposed and discussed for qualitative frameworks, especially from the perspective of possibility theory [125, 34, 141], among which the most common are the optimistic and pessimistic qualitative expected utility, represented by the possibility and necessity of a fuzzy event, respectively, which have been shown to be counterparts to the expected utility in standard decision theory [35]. The optimistic criterion has also been applied by Kacprzyk [68] in the above-mentioned setting with fuzzy constraints, and we use the same criterion in the following. The expected goal can then be expressed using the optimistic qualitative criterion as follows:

$E\mu_{G^N}(x_N \mid x_{N-1}, u_{N-1}) = \max_{x_N \in X} \left[ \mu_F(x_N \mid x_{N-1}, u_{N-1}) \wedge \mu_{G^N}(x_N) \right]$    (4.3)

This corresponds to the max-min composition shown in Section. However, depending on the decision problem and context, different decision criteria may apply, such that other s-t norm compositions could also be used instead [68]. With the decision criterion above, we can now formulate the recurrence equations, similar to the stochastic system in Section, as follows:

$E\mu_{G^{N-i+1}}(x_{N-i+1}) = \max_{x_{N-i+1} \in X} \left[ \mu_F(x_{N-i+1} \mid x_{N-i}, u_{N-i}) \wedge \mu_{G^{N-i+1}}(x_{N-i+1}) \right]$    (4.4)

$\mu_{G^{N-i}}(x_{N-i}) = \max_{u_{N-i}} \left[ \mu_{C^{N-i}}(u_{N-i}) \wedge E\mu_{G^{N-i+1}}(x_{N-i+1}) \right]$    (4.5)

for $i = 1, \dots, N$. Since the expected goal is conditioned on the states and actions at stage $N-i$, it represents a fuzzy relation between $x_{N-i}$ and $u_{N-i}$ that gives the maximum expected possibility over the next states $x_{N-i+1}$; the correct notation for the expected goal is therefore $E\mu_{G^{N-i+1}}(x_{N-i}, u_{N-i})$. However, throughout this chapter we also use the simplified notation introduced by Kacprzyk [68] interchangeably.
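The recurrence in Eqs. (4.4) and (4.5) translates directly into backward induction over the stages. The following minimal sketch assumes finite, integer-indexed states and actions, dictionary-based membership functions (missing entries treated as 0), a time-invariant relation $\mu_F$, and ties between actions broken in favour of the first action listed; none of these details are prescribed by the model, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Relation = Dict[Tuple[int, int], Dict[int, float]]  # mu_F(x' | x, u)


def fuzzy_backward_induction(
    states: Sequence[int],
    actions: Sequence[int],
    mu_f: Relation,
    mu_goal: Dict[int, float],
    mu_constraints: List[Dict[int, float]],
) -> Tuple[List[Dict[int, int]], Dict[int, float]]:
    """Max-min dynamic programming for Eqs. (4.3)-(4.5).

    mu_goal is mu_{G^N}(x_N); mu_constraints[t] is mu_{C^t}(u_t) for
    t = 0..N-1. Returns the policies a_t(x_t) and mu_{G^0}(x_0).
    """
    n_stages = len(mu_constraints)
    goal = dict(mu_goal)  # mu_{G^{N-i+1}} from the previous iteration
    policies: List[Dict[int, int]] = [{} for _ in range(n_stages)]
    for t in reversed(range(n_stages)):  # t = N - i for i = 1..N
        new_goal: Dict[int, float] = {}
        for x in states:
            best_u, best_val = actions[0], -1.0
            for u in actions:
                row = mu_f.get((x, u), {})
                # Eq. (4.4): optimistic expected goal via max-min composition
                expected = max(min(row.get(y, 0.0), goal[y]) for y in states)
                # Eq. (4.5): combine with the stage constraint on the action
                value = min(mu_constraints[t].get(u, 0.0), expected)
                if value > best_val:  # strict '>' keeps the first best action
                    best_u, best_val = u, value
            policies[t][x] = best_u
            new_goal[x] = best_val
        goal = new_goal
    return policies, goal


# Hypothetical 2-stage example: states 0..2 (2 = most favourable opponent offer),
# actions 0 (small concession) and 1 (large concession).
states, actions = [0, 1, 2], [0, 1]
mu_f = {
    (0, 0): {0: 1.0, 1: 0.3}, (0, 1): {1: 0.8, 2: 0.5},
    (1, 0): {1: 1.0, 2: 0.3}, (1, 1): {2: 0.9},
    (2, 0): {2: 1.0},         (2, 1): {2: 1.0},
}
mu_goal = {0: 0.0, 1: 0.6, 2: 1.0}
mu_constraints = [{0: 1.0, 1: 0.7}, {0: 1.0, 1: 0.7}]  # prefer small concessions
policies, mu_g0 = fuzzy_backward_induction(states, actions, mu_f, mu_goal, mu_constraints)
print(policies)  # [{0: 1, 1: 0, 2: 0}, {0: 1, 1: 1, 2: 0}]
print(mu_g0)     # {0: 0.7, 1: 0.7, 2: 1.0}
```

At stage 0 the policy recommends the large concession only in the least favourable state 0, where the constraint's penalty (0.7) is outweighed by the higher possibility of reaching better states.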
Similar to the other models the solution is expressed in terms of a policy function u t = a t (x t ) with t = 0, 1,..., N 1 and A = {a 0,..., a N 1 } being the optimal action strategy. Based on the above recurrence equations a dynamic programming approach can be chosen to generate the action policies. For more details about decision-making in qualitative and semi-qualitative frameworks we refer to the vast amount literature including [68, 124, 125, 35, 142]. In 96

the next sections, we adapt this model to the bilateral negotiation process in order to find the course of counteroffers an agent proposes, given its limited knowledge about the opponent's concession behaviour.

4.2 Modelling Negotiation Strategies

The multistage fuzzy decision model from the previous section can be mapped directly to the decision process of an agent in bilateral negotiation. Figure 4.1 shows this decision process: the opponent's offers correspond to the states, the modelling agent's offers to the actions, and the conditional fuzzy relation represents the model that the agent has created on the basis of its limited knowledge about the opponent's behaviour. The agent's fuzzy constraints are imposed at each stage of the encounter, whereas the fuzzy goal is imposed only at the last stage, since the agent is most interested in the final outcome of the negotiation.

[Figure 4.1: Multistage fuzzy decision process of a negotiation agent]

In the next sections, we describe in more detail how an agent models the state and action spaces, creates the fuzzy transition model based on a few reference cases, and applies the fuzzy goal and fuzzy constraints. To avoid a conflict between the notation of the negotiation model from Section and that of the multistage fuzzy decision model from the previous section, we slightly change the notation for the offers exchanged: an offer from agent $a$ to agent $b$ is denoted as $o^{k_i}_{a \to b}$, where $k_i$ represents the discrete time points at which offers are proposed during the encounter. The negotiation thread is denoted as $NT$ instead of $X$, which
stands for the state space in this chapter. In addition, we use the term action instead of control in the following, since the opponent cannot be controlled, in the sense that the agent has incomplete knowledge about its underlying decision structure.

4.2.1 States and Actions

The state and action spaces relate to the offers exchanged during the encounter. Although there are various ways of modelling states and actions, for example using a response or imitation rate [116], the most straightforward approach is to use the offers directly. Both spaces have to be discrete: the state space covers the complete negotiation range, and the action space covers the agent's negotiation interval for the issue under negotiation. The state space $X$ is discretized as

$$X = \left\{ \frac{(l-1)(ub - lb)}{n-1} + lb \;\middle|\; l = 1, \dots, n \right\}, \tag{4.6}$$

where $n$ is the total number of states, and $ub$ and $lb$ are the upper and lower boundaries of the state space, respectively. For example, if agent $a$ applies this model and agent $b$ makes the first proposal, then the boundaries $ub$ and $lb$ of the state space correspond to the first offers $o^{k_1}_{b \to a}$ and $o^{k_2}_{a \to b}$, respectively. For the action space, the upper boundary is given by agent $a$'s reservation value $RV_a$, while the lower boundary is agent $a$'s first offer. For simplicity, we use the same discretization factor for both spaces, such that the cardinality $m$ of the action space is given in relation to the total number of states $n$ by

$$m = n \, \frac{RV_a - o^{k_2}_{a \to b}}{o^{k_1}_{b \to a} - o^{k_2}_{a \to b}}, \tag{4.7}$$

where $a$ can be a buyer or a seller agent. Since the bilateral negotiation model in Section and the multistage fuzzy decision model use different time indices, the sequence of offers in the negotiation thread is mapped into state-action form, such that the offers and counteroffers at times $k_i$ and $k_{i+1}$ correspond to the state and action at stage $t$.
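As a minimal sketch of Eqs. (4.6) and (4.7), the state and action spaces can be generated as follows. The function names are hypothetical, and rounding $m$ to the nearest integer is an assumption of this sketch, since Eq. (4.7) does not in general yield an integer. The numbers follow the client example of Section 4.4.

```python
def discretize(lb: float, ub: float, n: int) -> list[float]:
    """Eq. (4.6): n evenly spaced values from lb to ub (inclusive)."""
    return [(l - 1) * (ub - lb) / (n - 1) + lb for l in range(1, n + 1)]


def action_count(n: int, rv: float, own_first: float, opp_first: float) -> int:
    """Eq. (4.7): size m of the action space, using the same step as the
    state space. Rounding to an integer is an assumption of this sketch."""
    return round(n * (rv - own_first) / (opp_first - own_first))


# Client c: first offer 10, reservation value 25; provider p opens with 30.
n = 81                               # step of 0.25 over the range [10, 30]
X = discretize(10, 30, n)            # states: 10, 10.25, ..., 30
m = action_count(n, 25, 10, 30)      # 81 * 15 / 20 = 60.75, rounded to 61
U = discretize(10, 25, m)            # actions: 10, 10.25, ..., 25
```

Because Eq. (4.7) only uses a ratio of differences, the same function also works for a seller, whose reservation value lies below its first offer: both differences simply change sign.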
The sequence of the offers exchanged during the encounter is equivalent to the trajectory
$TR$ of states and actions, written as

$$TR = (x_0, u_0, x_1, u_1, \dots, x_{t-1}, u_{t-1}, x_t) \equiv \left(o^{k_1}_{b \to a}, o^{k_2}_{a \to b}, o^{k_3}_{b \to a}, o^{k_4}_{a \to b}, \dots, o^{k_{i-2}}_{b \to a}, o^{k_{i-1}}_{a \to b}, o^{k_i}_{b \to a}\right), \tag{4.8}$$

where the next offer $o^{k_{i+1}}_{a \to b}$ is the action $u_t$ of agent $a$ for the state $x_t$, which corresponds to the offer $o^{k_i}_{b \to a}$ from its opponent $b$. Thus, the action $u_t$ (offer $o^{k_{i+1}}_{a \to b}$) is the action sought at stage $t$. Since the state and action spaces are discrete while offers may be proposed in a different (continuous) space, offers are mapped to states $x_t$ and actions $u_t$ by

$$x_t = \arg\min_{\sigma \in X} \left| o^{k_i}_{b \to a} - \sigma \right|, \qquad u_t = \arg\min_{\alpha \in U} \left| o^{k_{i+1}}_{a \to b} - \alpha \right|, \tag{4.9}$$

if agent $a$ is using the multistage fuzzy decision model. Agent $a$ needs at least one offer from its opponent to make a decision according to the responsive negotiation model in Section, while the course of actions determined by its policy function then represents its negotiation strategy in response to the opponent's strategic concession behaviour.

4.2.2 Fuzzy State Transitions

The fuzzy transition matrix of an individual agent encodes its fuzzy knowledge about the possible concession behaviour of its opponent and the agent's responses that may lead to an agreement. Depending on the source and amount of an agent's knowledge, different methods may be used to create the fuzzy transition model. For example, an agent may use fuzzy rules that it has generated through a large number of interactions or received from an expert or other agents. Although various methods have been proposed for generating such fuzzy rules in the setting of negotiation [3], they require a large amount of data to generate a sufficient rule base. In most realistic situations in automated negotiation, however, such knowledge is simply not available due to the open and distributed nature of the agent system.
Therefore, we assume in this thesis that the agent uses a small number of reference cases, which it may take from past interactions, to obtain the fuzzy state transitions. If the agent
has no prior knowledge, the reference cases can also reflect the agent's preferred course of responses to a negotiation partner's concessions. Such reference cases reflect the range of actions over time in response to the opponent's offers and, in that sense, define a dynamic negotiation strategy over the agent's possible offers. Using only a limited number of reference cases, we use a similarity measure to create and update the agent's fuzzy transition matrix during the encounter. Let $NT[h]$ be the negotiation thread of case $h$; the thread can then be transformed into state-action form (cf. Section 4.2.1), giving the trajectory $TR[h]$:

$$TR[h] = \left(\sigma_{l_0[h]}[h], \alpha_{v_0[h]}[h], \dots, \alpha_{v_{N[h]-1}[h]}[h], \sigma_{l_{N[h]}[h]}[h]\right), \tag{4.10}$$

where $N[h]$ is the last stage, and $\sigma_{l_i[h]}[h]$ and $\alpha_{v_i[h]}[h]$ are the state and action of case $h$ at stage $i$, with $i = 0, \dots, N[h]$ for states and $i = 0, \dots, N[h]-1$ for actions. The indices $l_i[h] \in \{1, \dots, n\}$ and $v_i[h] \in \{1, \dots, m\}$ are the numbers of the states and actions of case $h$ at stage $i$. The trajectory of all states of case $h$ can also be written as $TR_X[h] = (\sigma_{l_0[h]}[h], \dots, \sigma_{l_{N[h]}[h]}[h])$ and, correspondingly, that of all actions as $TR_U[h] = (\alpha_{v_0[h]}[h], \dots, \alpha_{v_{N[h]-1}[h]}[h])$. As described in Section 4.1, the policy function recommends at least one action for each state in the state space. To create the necessary state transitions, we therefore need to interpolate the trajectory of each case such that it contains all states of the state space and each state is assigned a particular action. This implies that the last state $\sigma_{l_{N[h]}}$ is also assigned an action $\alpha_{v_{N[h]}}$, since it represents the agreement of case $h$, with $\sigma_{l_{N[h]}} = \alpha_{v_{N[h]}}$.
We choose linear interpolation and obtain the interpolated states $\sigma_{l_{i,j}[h]}$ and actions $\alpha_{v_{i,j}[h]}$ for all $i = 0, \dots, N[h]-1$ with

$$l_{i,j}[h] = \begin{cases} l_i[h] + j & \text{for } l_i[h] < l_{i+1}[h], \\ l_i[h] - j & \text{for } l_i[h] > l_{i+1}[h], \end{cases} \tag{4.11}$$

$$v_{i,j}[h] = \begin{cases} v_i[h] + j\,\delta_i[h] & \text{for } v_i[h] < v_{i+1}[h], \\ v_i[h] - j\,\delta_i[h] & \text{for } v_i[h] > v_{i+1}[h], \end{cases} \tag{4.12}$$

where $j = 0, \dots, |l_i[h] - l_{i+1}[h]| - 1$ and $\delta_i[h]$ is the interpolation factor for two consecutive actions in the trajectory:

$$\delta_i[h] = \frac{|v_i[h] - v_{i+1}[h]| - 1}{|l_i[h] - l_{i+1}[h]| - 1}. \tag{4.13}$$
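The interpolation of Eqs. (4.11) to (4.13) can be sketched for a single case as below. Indices are 0-based here, and rounding the interpolated action index to the nearest integer is an assumption, since Eq. (4.12) can produce fractional indices. The handling of repeated offers and single-step segments is a guard added for this sketch, not part of the equations.

```python
def interpolate_case(states: list[int], actions: list[int]) -> list[tuple[int, int]]:
    """Sketch of Eqs. (4.11)-(4.13) for one reference case.

    states  -- state indices l_0, ..., l_N of the case
    actions -- action indices v_0, ..., v_N (v_N is the agreement action)
    Returns the interpolated (l_ij, v_ij) pairs, followed by (l_N, v_N).
    """
    pairs: list[tuple[int, int]] = []
    for i in range(len(states) - 1):
        dl = abs(states[i] - states[i + 1])
        dv = abs(actions[i] - actions[i + 1])
        if dl == 0:
            continue  # repeated opponent offer: no states to fill in (assumption)
        # Eq. (4.13); a single-step segment only uses j = 0, so delta is irrelevant.
        delta = (dv - 1) / (dl - 1) if dl > 1 else 0.0
        s_dir = 1 if states[i] < states[i + 1] else -1                        # Eq. (4.11)
        a_dir = (actions[i] < actions[i + 1]) - (actions[i] > actions[i + 1])  # +1, -1 or 0
        for j in range(dl):
            pairs.append((states[i] + s_dir * j,
                          round(actions[i] + a_dir * j * delta)))              # Eq. (4.12)
    pairs.append((states[-1], actions[-1]))
    return pairs


# Offers 29.8 -> 28.2 (state indices 79 -> 73) answered by 10.3 -> 11.9
# (action indices 1 -> 8), as in the example of Section 4.4.
print(interpolate_case([79, 73], [1, 8]))
# [(79, 1), (78, 2), (77, 3), (76, 5), (75, 6), (74, 7), (73, 8)]
```

Here $\delta = (7 - 1)/(6 - 1) = 1.2$, so the six filled-in states receive action indices that rise from 1 to 7 before the next original pair (73, 8).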

Index $j$ hence depends on the number of interpolated states between two consecutive states in the state trajectory $TR[h]$. The interpolated state and action trajectories $TR_X[h]$ and $TR_U[h]$ can then be written as

$$\begin{aligned} TR_X[h] &= \left(\sigma_{l_{0,0}[h]}[h], \dots, \sigma_{l_{0,j}[h]}[h], \dots, \sigma_{l_{1,0}[h]}[h], \dots, \sigma_{l_{N[h],0}[h]}[h]\right), \\ TR_U[h] &= \left(\alpha_{v_{0,0}[h]}[h], \dots, \alpha_{v_{0,j}[h]}[h], \dots, \alpha_{v_{1,0}[h]}[h], \dots, \alpha_{v_{N[h],0}[h]}[h]\right). \end{aligned} \tag{4.14}$$

For the state transitions, we use the similarity between the trajectory of case $h$ and the current behaviour of the opponent, represented by the current trajectory $TR_X[curr]$ at time $t$:

$$sim_t(TR_X[h], TR_X[curr]) = \frac{1}{t+1} \sum_{i=0}^{t} \left(1 - \frac{\left|\sigma_{l_i[h]}[h] - x_i\right|}{\max_{h \in H} \sigma_{l_i[h]}[h] - \min_{h \in H} \sigma_{l_i[h]}[h]}\right) \tag{4.15}$$

for $i \le N[h]$, with $H$ being the set of all cases. The similarity values provide the fuzzy transitions for each case in comparison to the current negotiation and are updated in each negotiation round. If the current stage exceeds the last stage of a particular case, the last offer of that case is used instead. The transition matrix is created from an initially zero transition matrix $\mu(x_{t+1} \mid x_t, u_t) = 0_{n,m,n}$ for all $m$ actions and $n$ states using the similarity values:

$$\mu\left(\sigma_{l_{i+1}[h]} \mid \sigma_{l_{i,j}[h]}, \alpha_{v_{i,j}[h]}\right) = \max\left[sim_t(TR_X[h], TR_X[curr]),\; \mu\left(\sigma_{l_{i+1}[h]} \mid \sigma_{l_{i,j}[h]}, \alpha_{v_{i,j}[h]}\right)\right] \tag{4.16}$$

for all $i = 1, \dots, N[h]-1$. In scenarios where only reference cases are used for the state transitions, the expected fuzzy goal at each stage can be derived directly from all cases:

$$E\mu_{G_{i+1}}\left(x_{i+1} \mid x_i[h], u_i[h]\right) = \max_{h \in H} \left( sim_t(h) \wedge \mu_{G_{i+1}}(x_{i+1}[h]) \right) \tag{4.17}$$

for $i = t, t+1, \dots, N[h]-1$, where $t$ is the stage of the current negotiation and $\wedge$ denotes the minimum. This simplifies the recalculation of the expected goal at each stage with respect to the current similarity value. Thus, the computational effort is reduced, especially in scenarios where the fuzzy transition matrix is sparse due to a small number of cases.
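The similarity of Eq. (4.15) and the expected goal of Eq. (4.17) can be sketched as follows. The function names and the `(l, v, l_next)` triple layout are hypothetical. Clipping negative similarity terms at 0 and the rule for stages where all cases agree (zero denominator) are assumptions of this sketch, since Eq. (4.15) leaves both cases open.

```python
def stage_value(traj: list[float], i: int) -> float:
    """Offer of a case at stage i; past its last stage the last offer is used."""
    return traj[min(i, len(traj) - 1)]


def similarity(h: int, cases: list[list[float]], current: list[float]) -> float:
    """Eq. (4.15): mean normalised closeness of case h to the opponent's offers
    x_0, ..., x_t. Clipping at 0 and the zero-span rule are assumptions."""
    total = 0.0
    for i, x in enumerate(current):
        values = [stage_value(c, i) for c in cases]
        span = max(values) - min(values)
        dist = abs(stage_value(cases[h], i) - x)
        if span == 0:
            total += 1.0 if dist == 0 else 0.0
        else:
            total += max(0.0, 1.0 - dist / span)
    return total / len(current)  # len(current) == t + 1


def expected_goal(triples: list[list[tuple[int, int, int]]], sims: list[float],
                  goal: list[float], n: int, m: int) -> list[list[float]]:
    """Eq. (4.17) as in Algorithm 2, lines 6-15: triples[h] lists the
    (l, v, l_next) index triples of case h; goal[l] is the fuzzy goal of state l."""
    E = [[0.0] * m for _ in range(n)]
    for h, case in enumerate(triples):
        for l, v, l_next in case:
            E[l][v] = max(E[l][v], min(sims[h], goal[l_next]))
    return E


# Numbers from the example in Section 4.4: from state 0, case 1 leads via action 0
# to a state with goal 0.8, case 2 via action 1 to a state with goal 0.6.
E = expected_goal([[(0, 0, 1)], [(0, 1, 2)]], sims=[0.9, 0.5],
                  goal=[0.0, 0.8, 0.6], n=3, m=2)
print(E[0])  # [0.8, 0.5]
```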
To enable inference between the expected goal and the fuzzy constraint, the actions holding zero values in the possibility distribution over all actions need to be interpolated
for each state $\sigma_l$ as follows:

$$E\mu_{G_{t+1}}(x_{t+1} \mid \sigma_l, \alpha_v) = E\mu_{G_{t+1}}(\sigma_l, \alpha_v) = \frac{E\mu_{G_{t+1}}(\sigma_l, \alpha_{v_2}) - E\mu_{G_{t+1}}(\sigma_l, \alpha_{v_1})}{v_2 - v_1}\,(v - v_1) + E\mu_{G_{t+1}}(\sigma_l, \alpha_{v_1}), \tag{4.18}$$

under the condition that $v_1 < v < v_2$ and $E\mu_{G_{t+1}}(\sigma_l, \alpha_{v_1}), E\mu_{G_{t+1}}(\sigma_l, \alpha_{v_2}) > 0$, with $v, v_1, v_2 \in \{1, \dots, m\}$. If, however, the values for the boundary actions $\alpha_1$ or $\alpha_m$ are zero, we replace them by very small values greater than zero before applying the interpolation, so as to obtain a non-zero possibility distribution over the whole action space. The rationale behind this is that a limited number of cases may be sufficient to propose an agent's response. Since the expected goal then holds values for all states and actions after the interpolation, the model can also propose actions not covered by any of the reference cases. This approach therefore provides considerable flexibility for creating adaptive negotiation strategies. As mentioned previously, the above method is intended for situations in which the agent has only a few reference cases at its disposal. Although a large number of cases can also be used, other methods may be more efficient for creating and updating the transition matrix if the agent has a larger amount of pre-existing knowledge or beliefs about the opponent, for example in the form of a set of fuzzy rules. In the next sections, we discuss the fuzzy goal and the fuzzy constraints, which represent the agent's preferences over its opponent's offers (states) and its own offers (actions), respectively, and constitute the agent's means of directing the decision process.

4.2.3 Fuzzy Goal

The negotiation agent uses the fuzzy goal to specify its preferences over all states in the state space. The degree of membership in the fuzzy goal increases for states closer to the agent's initial offer, as these are preferable to states close to the opponent's initial offer.
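A linear fuzzy goal is one simple way to obtain this ordering. The sketch below assumes that shape (the text does not prescribe one) and contrasts it with a linear utility over the agent's own interval, using the intervals of the example in Figure 4.2: the client's interval is [10, 25] and the provider opens at 30. Both function names are hypothetical.

```python
def fuzzy_goal(x: float, own_first: float, opp_first: float) -> float:
    """Assumed linear fuzzy goal: 1 at the agent's first offer, falling to 0
    only at the opponent's first offer, so every other state stays reachable."""
    return abs(opp_first - x) / abs(opp_first - own_first)


def linear_utility(x: float, own_first: float, rv: float) -> float:
    """Linear utility over the agent's own interval; zero outside of it."""
    lo, hi = min(own_first, rv), max(own_first, rv)
    if not lo <= x <= hi:
        return 0.0
    return abs(rv - x) / abs(rv - own_first)


# Client: first offer 10, reservation value 25; provider's first offer 30.
for x in (10, 20, 27, 30):
    print(x, fuzzy_goal(x, 10, 30), linear_utility(x, 10, 25))
```

An offer of 27 lies outside the client's interval, so its utility is zero, whereas its fuzzy-goal membership (0.15) stays positive.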
The fuzzy goal is therefore similar to an agent's utility function in that it orders the possible outcomes of the negotiation by assigning values from the interval [0, 1] to the possible offers in the negotiation space. However, while the utility function orders all offers in the agent's negotiation interval and assigns zero to all other offers (outside the interval), the membership degrees of the fuzzy goal have to be non-zero for all states in the negotiation range except the initial
state, as otherwise a state might never be reached as an intermediate or final state during the backward iteration of the decision process. The fuzzy goal always covers the whole discretized negotiation range, i.e. the range between the agents' first offers. This difference between the utility function and the fuzzy goal therefore matters only if the agent's negotiation interval does not cover the whole negotiation range, i.e. if the agents' intervals overlap only partially. Figure 4.2 shows an example of a fuzzy goal and a utility function in the case where the agent's negotiation interval ([10, 25]) overlaps only partially with the other agent's interval ([15, 30]). If, however, the agent's negotiation interval fully overlaps the negotiation range, its utility function can be mapped directly to the fuzzy goal.

[Figure 4.2: Example fuzzy goal $\mu_{G_N}(x_N)$ (left) and utility function (right) for a partial overlap of negotiation intervals]

4.2.4 Fuzzy Constraints

Whilst the fuzzy goal represents a preference over the states, an agent can use the fuzzy constraints to impose a preference over its actions at each stage of the decision process. In this sense, the fuzzy constraints constitute a means to influence and direct the decision process based on the agent's soft preferences or any other factors, such as a particular resource in the agent's environment. Since time is an important factor in negotiation, it is natural to represent the fuzzy constraints in a time-dependent form (as shown in Section 4.1), in which a particular fuzzy constraint is assigned to each stage of the encounter. The fuzzy constraints thus constitute a time-dependent preference over the range of possible offers of the modelling agent. For simplicity, we use the
triangular membership function in the following:

$$\mu_{C_t}(x, a, b, c) = \begin{cases} 0 & x < a \text{ or } x > c, \\ \dfrac{x - a}{b - a} & a \le x \le b, \\ \dfrac{c - x}{c - b} & b < x \le c. \end{cases} \tag{4.19}$$

However, any other type of membership function may be used instead. The fuzzy constraints (an example is shown in Figure 4.3) do not need to cover the whole action space, because inference with the non-zero expected fuzzy goal always results in a non-zero fuzzy set. The effect of a fuzzy constraint can vary depending on the shape and the support of its fuzzy set. In general, the larger the support and area of the fuzzy constraint, the stronger the influence of the cases on the actions, and vice versa.

[Figure 4.3: Example fuzzy constraint $\mu_{C_t}(u_t)$]

For simplicity and ease of specification, constraints are typically normalized. However, at the beginning of a negotiation the other agent may propose offers close to its own initial offer, i.e. states with small membership degrees in the fuzzy goal, such that the membership degrees of all state-action pairs in the expected goal relation (Eq. (4.3)) also become small. As a result, normalized constraints may have little effect on the actions, because they may lie completely above the expected goal distribution over the actions for a particular state. Their influence is increased by scaling the fuzzy constraints down, e.g. to the maximum of the expected goal, before they are applied (cf. Eq. (4.4)):

$$\hat{\mu}_{C_t}(\alpha) = \mu_{C_t}(\alpha) \cdot \max_{\alpha' \in U,\, \sigma \in X} E\mu_{G_{t+1}}(\sigma, \alpha') \tag{4.20}$$
for all $\alpha \in U$. This method ensures that the individual constraints have a strong effect relative to the fuzzy transition matrix, and hence relative to the cases, during the encounter. The scaling factor for the constraints depends on the agent's preferences and can differ from the one shown above.

4.2.5 Modelling Different Negotiation Strategies with Fuzzy Constraints

The possibility of imposing time-dependent fuzzy constraints on the actions enables an agent to apply its own soft preferences or conditions to the decision-making process. In that sense, the course of actions generated during the backward induction considers both the limited knowledge about the opponent's concession behaviour in the fuzzy transition relation and the agent's own soft strategy. An agent can therefore use any strategy or conditions based on time, for example the time-dependent decision functions in Section, where $\mu_{C_t}(\alpha(t), a, b, c)$ represents the membership function with $t = 0, 1, \dots, N-1$ and $\alpha(t)$ corresponds to one of the decision functions. In addition, the height and support of the fuzzy constraints allow an agent to determine in what range and to what degree they influence the decision-making. For example, if an agent needs to make sure that its own strategy prevails towards the end of the negotiation, e.g. in order to approach its reservation value, the support of the fuzzy constraints can be decreased over time. Examples of such time-dependent soft strategies for the boulware and conceder polynomial functions are shown in Figures 4.4a and 4.4b, and an example of fuzzy constraints with decreasing support in Figure 4.4c.

In the case where the agent uses only the reference cases without any fuzzy constraints, the decision process may propose actions which, although they correspond to the offer-response patterns in the fuzzy transition relation, do not resemble the offers of the reference cases at that particular stage of the encounter.
This is because the fuzzy transition matrix contains knowledge about the relationship between the possible concession behaviour of the opponent and that of the agent, but not about the stage of the encounter at which a concession is made. In such situations, an agent can use the reference cases to create the fuzzy constraints over time, in order to ensure a concession behaviour that is more similar to the reference cases. Figure 4.4d shows such an example, in which the actions of two cases are used to generate the fuzzy
constraints. The similarity of the two cases to the current encounter is then used to determine the height of the individual fuzzy constraints for each case.

[Figure 4.4: Examples of different time-dependent fuzzy constraints: (a) boulware polynomial fuzzy constraints; (b) conceder polynomial fuzzy constraints; (c) fuzzy constraints with decreasing support; (d) fuzzy constraints based on example cases]

4.3 Decision Algorithms

The question remains how an individual agent applies the multistage fuzzy decision model to specify its negotiation strategy. In the following, we provide the decision algorithm for a client agent $c$ negotiating with a provider agent $p$, assuming that $p$ proposes the first offer. Algorithm 1 details the communication mechanism with the opponent in terms of the offer exchange during the negotiation (lines 6 to 28) according to Section, after the agent has created its state and action spaces and transformed the reference cases into state-action form (cf. Sections 4.2.1 and 4.2.2) using the first offers of both parties (lines 1 to 5). For simplicity, the agent uses the number of negotiation rounds to specify its negotiation deadline ($t^c_{max}$) for its withdrawal instead of
using a real-time measure. A negotiation round consists of one offer proposal by each agent and thus corresponds to one stage in the multistage fuzzy decision process. However, the agent may abort the negotiation after a timeout period (line 9), i.e. if it receives no response within a predefined threshold time. This timeout period naturally depends on the conditions and preferences of the system and the agent.

Algorithm 1 Decision algorithm of the client agent c
1: Exchange first offers $o^{k_1}_{p \to c}$ (= $x_0$) and $o^{k_2}_{c \to p}$ (= $u_0$)
2: Create action and state spaces from $x_0$ and $u_0$ ▷ cf. Eqs. (4.6), (4.7)
3: for all reference cases $h \in H$ do
4:     $TR[h] \leftarrow$ transform $TR[h]$ into state-action form
5: end for
6: end ← False
7: t ← 1
8: while end ≠ True do
9:     if $t > t^c_{max}$ or timeout then
10:        Withdraw from negotiation
11:        end ← True
12:    else
13:        $x_t$ ← next offer from opponent p
14:        if p accepts last offer $u_{t-1}$ of agent c then
15:            end ← True
16:        else
17:            a ← GETPOLICY(t, $TR_X[curr]$)
18:            $u_t \leftarrow a_t(x_t)$
19:            if $U_c(x_t) \ge U_c(u_t)$ then
20:                Accept last offer $x_t$ of p
21:                end ← True
22:            else
23:                Propose counteroffer $u_t$
24:            end if
25:        end if
26:    end if
27:    t ← t + 1
28: end while
29: end

Algorithm 2 details how the multistage fuzzy decision model is applied to obtain action policies throughout all stages of the negotiation encounter. It represents the fuzzy dynamic programming method, including the creation of the expected goal matrix from the reference cases (lines 2 to 15) and its interpolation.

Algorithm 2 Get action policy at stage t
1: procedure GETPOLICY(t, $TR_X[curr]$)
2:     $E\mu_{G_N}(x_N \mid x_{N-1}, u_{N-1}) \leftarrow 0_{n,m}$
3:     k ← N
4:     while k > t do
5:         k ← k − 1
6:         for all cases $h \in H$ do
7:             sim[h] ← $sim_t(TR_X[h], TR_X[curr])$
8:             for all $i = 1, \dots, N[h]-1$ do
9:                 g ← $\min(sim[h], \mu_{G_{k+1}}(\sigma_{l_{i+1}[h]}))$
10:                for all $j = 0, \dots, |l_i[h] - l_{i+1}[h]| - 1$ do
11:                    $E\mu_{G_{k+1}}(\sigma_{l_{i,j}[h]}, \alpha_{v_{i,j}[h]}) \leftarrow$
12:                        $\max(E\mu_{G_{k+1}}(\sigma_{l_{i,j}[h]}, \alpha_{v_{i,j}[h]}), g)$
13:                end for
14:            end for
15:        end for
16:        $\hat{\mu}_{C_k}(u_k) \leftarrow \mu_{C_k}(u_k) \cdot$
17:            $\max_{\alpha \in U, \sigma \in X}(E\mu_{G_{k+1}}(\sigma, \alpha))$
18:        for all $l = 1, \dots, n$ do
19:            Interpolate $E\mu_{G_{k+1}}(\sigma_l, u_k)$
20:            $\mu_{\sigma_l}(u_k) \leftarrow$
21:                $\min(\hat{\mu}_{C_k}(u_k), E\mu_{G_{k+1}}(\sigma_l, u_k))$ for all $u_k \in U$
22:            $\mu_{G_k}(\sigma_l) \leftarrow \max_{\alpha \in U}(\mu_{\sigma_l}(\alpha))$
23:            $a_k(\sigma_l) \leftarrow \arg\max_{\alpha \in U}(\mu_{\sigma_l}(\alpha))$
24:        end for
25:    end while
26:    a ← $\{a_k, \dots, a_{N-1}\}$
27:    return a
28: end procedure

It should be noted that the linear interpolation used to transform the reference cases into state-action form (line 4, Algorithm 1) and in the expected goal matrix (line 19, Algorithm 2) is straightforward (cf. Section 4.2.2) and is therefore not detailed here. Similar to traditional dynamic programming algorithms, the complexity of finding the next offer in each round is $O(n^2)$. For a detailed discussion of the storage (space) and time complexity of fuzzy dynamic programming, we refer to [41]. It should also be noted that, as described in Section 2.3, in multi-issue negotiations the decision strategies for an agent's concession-making can be applied either for each
issue individually or along the indifference curves of the agent's utility function in combination with a trade-off mechanism. In such cases, the multistage fuzzy decision approach can be used for the agent's concession-making. For simplicity, we focus on a single issue in the following and, in the next section, illustrate some example negotiations with an agent using the multistage fuzzy approach.

4.4 Negotiation Examples

In this section, we discuss some examples in which an agent uses the multistage fuzzy decision approach with different fuzzy constraint settings to propose offers in a negotiation encounter.

4.4.1 Agent using Reference Cases

First, the modelling agent uses only two reference cases to generate its fuzzy state transitions. The cases, shown in Figure 4.6a, need to be transformed into state-action form, as described in Section, based on the agent's discrete state and action spaces. For example, assume that the client and the provider have the intervals $min_c = 10$, $max_c = 25$ and $min_p = 15$, $max_p = 30$, and that the provider proposes the first offer $o^{k_1}_{p \to c} = 30$. The agent then discretizes the negotiation range using a discretization factor (e.g. 0.25), such that $X = \{10, 10.25, 10.5, \dots, 29.75, 30\}$ and $U = \{10, 10.25, \dots, 24.75, 25\}$. The state-action form of a case requires that each case covers the whole state space. For example, for two consecutive offers $o^{k_i}_{p \to c} = 29.8$ and $o^{k_{i+2}}_{p \to c} = 28.2$ and counteroffers $o^{k_{i+1}}_{c \to p} = 10.3$ and $o^{k_{i+3}}_{c \to p} = 11.9$ of case $h$, the state and action trajectories would be $TR_X[h] = (\dots, 29.75, 29.5, \dots, 28.5, 28.25, \dots)$ and $TR_U[h] = (\dots, 10.25, 10.5, \dots, 11.75, 12, \dots)$, respectively. Each state-action pair is assigned the next state from the original case. The cases in state-action form and their similarity to the current encounter are then used to generate the transition matrix or, more simply, the expected fuzzy goal at each stage directly.
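The mapping of Eq. (4.9) used in this example can be sketched as a nearest-grid-point lookup. `nearest` is a hypothetical helper, indices are 0-based, and ties are resolved towards the lower index, a choice the text does not specify.

```python
def nearest(offer: float, grid: list[float]) -> int:
    """Eq. (4.9): index of the grid value closest to the offer."""
    return min(range(len(grid)), key=lambda i: abs(grid[i] - offer))


X = [10 + 0.25 * i for i in range(81)]   # states 10, ..., 30
U = [10 + 0.25 * i for i in range(61)]   # actions 10, ..., 25

offers = [29.8, 28.2]          # provider offers of case h
counteroffers = [10.3, 11.9]   # client counteroffers of case h
state_idx = [nearest(o, X) for o in offers]           # [79, 73] -> 29.75, 28.25
action_idx = [nearest(o, U) for o in counteroffers]   # [1, 8]   -> 10.25, 12.0
```

These indices are the end points of the linear interpolation of Eqs. (4.11) to (4.13), which fills in the states 29.5, ..., 28.5 between them.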
For example, assume for a particular current state $\sigma_l$ that, according to the cases, the next states are $\sigma_{l_1}$ and $\sigma_{l_2}$ for the actions $\alpha_{v_1}$ and $\alpha_{v_2}$, respectively. Further assume that the fuzzy goal for the two states is $\mu_{G_N}(\sigma_{l_1}) = 0.8$ and $\mu_{G_N}(\sigma_{l_2}) = 0.6$. Now, if the similarities of the cases are $sim_t(1) = 0.9$ and $sim_t(2) = 0.5$, the expected fuzzy goal for state $\sigma_l$ and the
actions $\alpha_{v_1}$ and $\alpha_{v_2}$ is, according to (4.17),

$$E\mu_{G_N}(x_{t+1} \mid \sigma_l, \alpha_{v_1}) = \max_{h \in H}\left(sim_t(h) \wedge \mu_{G_N}(x_{t+1}[h])\right) = 0.9 \wedge 0.8 = 0.8$$

and $E\mu_{G_N}(x_{t+1} \mid \sigma_l, \alpha_{v_2}) = 0.5 \wedge 0.6 = 0.5$. The two singletons for state $\sigma_l$ are then used to interpolate the distribution in the expected fuzzy goal, while the inference of the similarity with the fuzzy goal is carried out for each state and all reference cases. Figure 4.5a shows such an example fuzzy set in the expected fuzzy goal.

[Figure 4.5: Inference example for expected fuzzy goal and fuzzy case constraints: (a) example fuzzy set for one state in the expected fuzzy goal based on the similarity of two cases; (b) inference of normalized fuzzy case constraints and the expected fuzzy goal (shown for one state)]

In the situation where the agent uses no fuzzy constraints, the similarity and the fuzzy goal determine which actions are chosen, so that the concession behaviour of the agent is similar to one of the cases. This is illustrated in Figures 4.6b and 4.6c, in which the agent uses the two cases from Figure 4.6a to generate its course of actions. As we can see, in both examples the outcome is close to the outcome of the cases. Since in our model the fuzzy transition relation does not contain information about the stage at which a concession is made, the agent can use the original reference cases to generate fuzzy constraints (as described in Section 4.2.5). Figure 4.5b shows an example in which fuzzy case constraints are inferred with the expected fuzzy goal for a particular state. As a result, the chosen actions are closer to the actual actions at the same stage in the reference cases (Figure 4.6d). The inference between the expected fuzzy goal and the fuzzy constraints, however, may yield different results depending on the height and support of the fuzzy sets.
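One backward-induction stage of Algorithm 2 (lines 16 to 23) can be sketched as follows: the scaling of Eq. (4.20), the zero-filling of Eq. (4.18) and the min-max inference illustrated in Figure 4.5b. The value `EPS` for zero boundary entries is a hypothetical choice, and the state-by-action layout `E[l][v]` is an assumption of this sketch.

```python
EPS = 1e-3  # hypothetical "very small value" for zero boundary entries


def fill_zeros(row: list[float]) -> list[float]:
    """Eq. (4.18): linearly interpolate zero entries of one state's row of the
    expected goal between the nearest non-zero neighbours."""
    row = list(row)
    if row[0] == 0:
        row[0] = EPS
    if row[-1] == 0:
        row[-1] = EPS
    known = [v for v, mu in enumerate(row) if mu > 0]
    for v1, v2 in zip(known, known[1:]):
        for v in range(v1 + 1, v2):
            row[v] = (row[v2] - row[v1]) / (v2 - v1) * (v - v1) + row[v1]
    return row


def stage_step(E: list[list[float]], constraint: list[float]) -> tuple[list[float], list[int]]:
    """Algorithm 2, lines 16-23, for every state l of one stage.
    E[l][v] is the expected goal, constraint[v] the stage's fuzzy constraint.
    Returns the fuzzy goal of each state and the index of its best action."""
    scale = max(max(row) for row in E)                   # Eq. (4.20)
    c_hat = [mu * scale for mu in constraint]
    goal, policy = [], []
    for row in E:
        row = fill_zeros(row)                            # line 19
        mu = [min(c, e) for c, e in zip(c_hat, row)]     # lines 20-21
        best = max(range(len(mu)), key=mu.__getitem__)   # line 23
        goal.append(mu[best])                            # line 22
        policy.append(best)
    return goal, policy


E = [[0.8, 0.0, 0.0, 0.5]]      # one state with two case singletons (0.8 and 0.5)
print(stage_step(E, [1.0, 1.0, 1.0, 1.0]))   # ([0.8], [0])
print(stage_step(E, [0.2, 0.6, 1.0, 1.0]))
```

With a flat constraint, the choice is left to the cases (action 0, which carries the higher expected goal). The second constraint, scaled down to 0.8, instead moves the choice to the interpolated action 2, an action not covered by either case.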
Figure 4.6d shows two action curves for different normalization levels of the fuzzy constraints (for the lower curve, the constraints have been normalized to the height of the expected fuzzy goal). In addition, because the values in the fuzzy goal decrease from stage to stage during the backward induction, the fuzzy goal needs to be normalized, as otherwise the similarity of the cases and the fuzzy constraints become less effective. However, as mentioned in [7], such normalization problems are not trivial, as they can strongly influence the calculated course of actions and therefore the negotiation strategy. The next section further demonstrates the effect of the fuzzy constraints.

[Figure 4.6: Example offer curves for an agent using two reference cases and case constraints: (a) reference cases of the client agent; (b) multistage fuzzy strategy without fuzzy case constraints (bottom) when the provider uses a polynomial time-dependent tactic with β = 2 (top); (c) multistage fuzzy strategy without fuzzy case constraints (bottom) when the provider uses a polynomial time-dependent tactic with β = 0.4 (top); (d) multistage fuzzy strategy with fuzzy case constraints and different normalization levels (bottom) when the provider uses a polynomial time-dependent tactic with β = 0.4 (top)]

4.4.2 Agent using Preferred Strategy

In addition to the reference cases, the agent may also have a preferred soft strategy over the course of actions. Using the same settings and reference cases as in the previous section, we show examples in which the modelling agent uses the time-dependent decision functions to create its fuzzy constraints, as shown in Section. An inference example for the expected fuzzy goal with the time-dependent fuzzy constraints is shown in Figure 4.7. Again, the agent can adjust the effect of the constraints via their support and height. Figure 4.8 illustrates how conceder and boulware fuzzy constraints change the offer proposals towards larger or smaller concessions. Whereas the conceder constraints in Figures 4.8a and 4.8b show little effect, as the reference cases are close to the constraints, the boulware fuzzy constraints pull the offer curves in the client's favour (Figures 4.8c and 4.8d). Depending on the agent's aim, i.e. whether to reach an agreement quickly or to negotiate more competitively in order to gain higher outcome utilities, an agent can apply its soft strategy to steer the concession behaviour encoded in the fuzzy transition relation towards its own preferences.

[Figure 4.7: Inference example for the expected fuzzy goal and fuzzy constraints of a preferred strategy]

[Figure 4.8: Example offer curves for an agent using two reference cases and time-dependent fuzzy constraints: (a) multistage fuzzy strategy (bottom) with conceder fuzzy constraints (polynomial with β = 2) and the provider using a polynomial time-dependent tactic with β = 2 (top); (b) conceder fuzzy constraints (β = 2), provider tactic with β = 0.4; (c) boulware fuzzy constraints (polynomial with β = 0.4), provider tactic with β = 2; (d) boulware fuzzy constraints (β = 0.4), provider tactic with β = 0.4]

4.4.3 Both Agents using Multistage Fuzzy Decision-Making

Another interesting question is what the offer curves look like when both agents use the multistage fuzzy decision approach, with either the same or different sets of reference cases. Figure 4.9a shows an example in which the two agents use the same set of cases from the previous section. The provider makes the first proposal and chooses the offer from the case which potentially obtains the highest utility. Once the provider has chosen that boulware case, the client recognizes the similarity with that case and chooses an action accordingly. As a result, both agents propose offers along that case. In Figure 4.9b, both agents have the same sets of cases but use boulware fuzzy constraints (polynomial with β = 0.4). Although both use the same fuzzy constraints, the outcome is still to the advantage of the provider due to the boulware reference case. Figure 4.9c shows an example in which the provider has a different set of cases.
As the sets of cases of both agents are almost symmetrical, the outcome of the negotiation is close to the midpoint of the negotiation range (in this case, the Nash point). This is illustrated in Figure 4.9d. In general, however, the outcome depends on the agents' reference cases and their sets of fuzzy constraints.
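The conceder and boulware fuzzy constraints used in these examples can be sketched by centring the triangular membership function of Eq. (4.19) on the offer of a polynomial decision function at each stage. The decision function below assumes the common polynomial form $\alpha(t) = k + (1-k)\,(\min(t, t_{max})/t_{max})^{1/\beta}$ of time-dependent tactics, which may differ in detail from the decision functions referred to in Section 4.2.5. The half-widths `w_start` and `w_end` are hypothetical parameters; shrinking them over time gives constraints with decreasing support, as in Figure 4.4c.

```python
def triangle(x: float, a: float, b: float, c: float) -> float:
    """Eq. (4.19): triangular membership with support [a, c] and peak b."""
    if x < a or x > c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a) if b > a else 1.0
    return (c - x) / (c - b)


def poly_alpha(t: float, t_max: float, beta: float, k: float = 0.0) -> float:
    """Assumed polynomial decision function: beta < 1 boulware, beta > 1 conceder."""
    return k + (1 - k) * (min(t, t_max) / t_max) ** (1 / beta)


def time_constraints(first: float, rv: float, t_max: int, beta: float,
                     w_start: float, w_end: float) -> list[tuple[float, float, float]]:
    """One triangle (a, b, c) per stage t = 0, ..., t_max - 1, centred on the
    decision function's offer; the half-width shrinks linearly from w_start to w_end."""
    out = []
    for t in range(t_max):
        peak = first + poly_alpha(t, t_max, beta) * (rv - first)
        w = w_start + (w_end - w_start) * t / max(t_max - 1, 1)
        out.append((peak - w, peak, peak + w))
    return out


# Client c: first offer 10, reservation value 25, deadline of 30 rounds.
boulware = time_constraints(10, 25, 30, beta=0.4, w_start=4.0, w_end=1.0)
conceder = time_constraints(10, 25, 30, beta=2.0, w_start=4.0, w_end=1.0)
print(boulware[15][1], conceder[15][1])   # boulware peak stays nearer to 10
```

The membership of a candidate action $\alpha$ at stage $t$ is then `triangle(alpha, *boulware[t])`, which can be passed as the constraint of that stage to the backward induction.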

[Figure 4.9: Example offer curves when both agents use the multistage fuzzy decision approach: (a) offer curves when both agents use the same reference cases; (b) offer curves when both agents use the same reference cases and boulware fuzzy constraints; (c) two reference cases used by the provider; (d) offer curves when the agents use different sets of reference cases]

4.5 Evaluation

This section evaluates the negotiation strategies generated by the multistage fuzzy decision-making model of this chapter.

4.5.1 Experiment Settings

Similar to the evaluation in Chapter 3, we consider single-issue negotiations between a client c and a provider p in order to compare the performance of an agent using the multistage fuzzy decision approach with that of an agent using mixed strategies. Because the two decision-making approaches use entirely different techniques and settings, we compare the multistage fuzzy strategy using two example reference cases with the average mixed strategy using static weights, based on either the traditional linear weighted combination or the concession-based mechanism from Section 3.3, while the opponent applies different types of mixed strategies. As before, the mixed strategies are created using one behaviour-dependent and one behaviour-independent tactic. For the set of mixed strategies ST, we choose the following tactics based on the settings detailed in Section 2.6.2:

$$ST = \{PC, PL, PB, EC, EL, EB\} \times \{a, r\} \times \{S, M, L\} \tag{4.21}$$

Playing the average mixed strategy is similar to playing all strategy groups from the set ST. To ensure that the strategies are tested in a wide range of negotiation settings, the client agents with the multistage fuzzy strategy and the average mixed strategy play against providers with a particular strategy group of the set ST. The strategies are hence compared for each strategy group in ST. Due to the different concession behaviours of mixed strategies using different mixing mechanisms, both the traditional and the concession-based mixing methods are chosen for comparison against the multistage fuzzy strategy. For simplicity and ease of analysis, the agent with the multistage fuzzy strategy uses the same reference cases as in the examples in Section 4.4. The cases are shown again in Figure 4.10. In addition, we are interested in how the performance changes when the multistage fuzzy agent applies boulware fuzzy constraints in order to improve its utility gain by negotiating more competitively. The boulware fuzzy constraints are created using the polynomial decision function with boulware β settings, as shown in Section. The performance of the strategies is measured using the average intrinsic utility $U_c$ of the client agent and the agreement rate A (in %).
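The strategy set of Eq. (4.21) is a Cartesian product of 6 × 2 × 3 = 36 strategy groups. The short sketch below enumerates them as labels of the form used in Figure 4.11 (e.g. PCaS, EBrL). It only illustrates the combinatorics; the tactics behind each label are defined in Section 2.6.2.

```python
from itertools import product

# Eq. (4.21): ST = {PC, PL, PB, EC, EL, EB} x {a, r} x {S, M, L}
ST = ["".join(parts) for parts in product(
    ["PC", "PL", "PB", "EC", "EL", "EB"], ["a", "r"], ["S", "M", "L"])]

print(len(ST))   # 36
print(ST[:3])    # ['PCaS', 'PCaM', 'PCaL']
```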
As in the evaluation in Chapter 3, the agents again employ linear utility functions, which allow negotiation outcomes to be measured directly within the negotiation interval, and we use bar charts to illustrate the performance of the strategies (the small dotted bars on top represent the standard deviation). The dark bars correspond to the multistage fuzzy strategy and the light bars to the average mixed strategy. As described previously, we focus on more realistic scenarios in which the agents' negotiation intervals overlap only partially, with $\Phi \in \{0.33, 0.66\}$. Because agents typically do not know their opponents' deadlines, which are part of their private preferences, while an agent system may also impose a system-specific deadline on negotiation interactions, we distinguish between scenarios with equal and with different deadlines.

Figure 4.10: Example cases for the multistage fuzzy strategy

The following types of negotiation scenarios are considered:
- Small overlap and equal deadlines
- Small overlap and different deadlines
- Large overlap and equal deadlines
- Large overlap and different deadlines

with the negotiation environment settings as follows:
- Client: $t^c_{max} = 30$, $min^c = 10$, $max^c = 25$
- Provider: $t^p_{max} \in \{20, 25, 30, 35, 40\}$, $min^p$ determined by the overlap $\Phi \in \{0.33, 0.66\}$, $max^p = min^p + 15$

The following sections present the experimental results and their discussion.

Scenario with Small Overlap and Equal Deadlines

In this scenario, the agents have equal deadlines and their intervals overlap only to a small degree, using the settings from the previous section. We compare the multistage fuzzy strategy, using the example cases shown in the previous section, with the average mixed strategy generated by either the traditional linear weighted combination (Figure 4.11a) or the concession-based mechanism (Figure 4.11b).
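The negotiation environment settings listed under Experiment Settings, together with the two performance measures, can be expressed compactly as follows. This is a minimal sketch under explicit assumptions: the exact formula for $min^p$ did not survive in the text, so the function `provider_min_for` is hypothetical and simply places $min^p$ so that the overlap $[min^p, max^c]$ covers a fraction $\Phi$ of the client's interval; the client is assumed to prefer low values (as for a price); and failed negotiations are assumed to contribute a utility of 0 to the average.

```python
from dataclasses import dataclass
from itertools import product

# Settings from the Experiment Settings list.
CLIENT_T_MAX = 30
CLIENT_MIN, CLIENT_MAX = 10.0, 25.0
PROVIDER_DEADLINES = [20, 25, 30, 35, 40]
OVERLAPS = [0.33, 0.66]      # Phi: small and large overlap
PROVIDER_WIDTH = 15.0        # max^p = min^p + 15


@dataclass(frozen=True)
class Scenario:
    provider_t_max: int
    phi: float
    provider_min: float
    provider_max: float

    @property
    def equal_deadlines(self) -> bool:
        return self.provider_t_max == CLIENT_T_MAX


def provider_min_for(phi: float) -> float:
    # HYPOTHETICAL placement: the overlap [min^p, max^c] covers a
    # fraction phi of the client's interval [min^c, max^c].
    return CLIENT_MAX - phi * (CLIENT_MAX - CLIENT_MIN)


def scenarios():
    """Yield every combination of provider deadline and overlap."""
    for t_max, phi in product(PROVIDER_DEADLINES, OVERLAPS):
        low = provider_min_for(phi)
        yield Scenario(t_max, phi, low, low + PROVIDER_WIDTH)


def client_utility(x: float) -> float:
    """Linear client utility, assuming the client prefers low values."""
    return (CLIENT_MAX - x) / (CLIENT_MAX - CLIENT_MIN)


def evaluate(outcomes):
    """Return (average utility U^c, agreement rate A in %).

    `outcomes` holds the agreed value of each run, or None if no agreement
    was reached. ASSUMPTION: failed runs count as utility 0 in the average.
    """
    agreed = [x for x in outcomes if x is not None]
    rate = 100.0 * len(agreed) / len(outcomes)
    avg_utility = sum(client_utility(x) for x in agreed) / len(outcomes)
    return avg_utility, rate


print(len(list(scenarios())))    # 10 provider configurations
print(evaluate([10.0, None]))    # (0.5, 50.0)
```

Counting failed runs as zero utility is one way to make the average utility fall as agreements are missed, which is the behaviour discussed for the scenarios with different deadlines.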

Figure 4.11: Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with small overlap and equal deadlines. (a) Average utility $U^c$ (top) and agreement rate $A$ when compared with the traditional mixed strategy; (b) average utility (top) and agreement rate when compared with the concession-based mixed strategy. [Bar charts per strategy group in $S_T$, PCaS to EBrL.]

In addition, Figure 4.12 shows the results when the agent with the multistage fuzzy strategy imposes the boulware fuzzy constraints on its decision strategy.

Figure 4.11 shows that the multistage fuzzy strategy performs better than the average mixed strategies based on the traditional linear weighted combination or the concession-based mechanism in most scenarios, except in cases where the opponent applies exponential boulware tactics in its mix. The reason is that the exponential decision function with boulware settings makes most of its concessions towards the end of the encounter and, when used in a mixed strategy with the traditional linear weighted combination, may cause a non-monotonic concession curve. Figure 4.11b shows the difference when the monotonic concession-based mechanism is used. In this setting, the multistage fuzzy strategy obtains the maximum agreement rate in all cases, and its utility is lower only compared with some of the boulware time-dependent tactics in the mix. When the agent imposes the boulware time-dependent fuzzy constraints, both the average utility and the agreement rates against the exponential boulware tactics improve (Figure 4.12), resulting in better performance in all strategy groups. This result is intuitive: because both agents have equal deadlines, the smaller concessions proposed by the multistage fuzzy strategy with the boulware fuzzy constraints achieve higher utilities without sacrificing the agreement rate.

Figure 4.12: Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with small overlap and equal deadlines. [Bar charts per strategy group in $S_T$.]

Scenario with Small Overlap and Different Deadlines

In this scenario we compare the multistage fuzzy strategy with the average mixed strategies using the traditional linear weighted combination and the concession-based mechanism when the agents have different deadlines and only a small overlap of their negotiation intervals. Figures 4.13 and 4.14 show that the multistage fuzzy strategy does not perform as well as in the scenario with equal deadlines. Although its average utility and agreement rate are still good compared with the average mixed strategy using the traditional method when the opponent uses conceder or linear tactics in the mix, the opposite holds for the other strategy groups. The reason is that, if the agents have different deadlines, boulware time-dependent tactics may miss the zone of agreement even though this zone exists. Moreover, even if the opponent makes concessions similar to a case captured by the multistage fuzzy agent, a different negotiation deadline results in a different negotiation behaviour. This emphasizes that time is an important factor in automated negotiation: a case captured by the multistage fuzzy agent represents not only the relation between the concession behaviours of the two agents, but also when these concessions were made during the negotiation. Because the agent has no knowledge of the opponent's deadline, it can only assume that the deadline is at least as late as the time at which the agreement was reached in that particular case. However, the individual concessions and the agreement point are the result of the dynamic concession behaviour of both agents, so the opponent's actual deadline may well differ from the one assumed by the multistage fuzzy agent.
As a result, when the agents have different deadlines, the utility and agreement rate may be lower than in the setting with equal deadlines. This effect is even more visible in Figure 4.14, where the multistage fuzzy agent uses the boulware fuzzy constraints: because these constraints result in smaller concessions, the agent misses a significant number of agreements. This is also reflected in the average utility, which decreases as the number of failed agreements grows.
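The deadline effect described above can be reproduced with two purely time-dependent agents. The sketch below is illustrative only: it uses the common polynomial decision function of Faratin et al., in which $\beta < 1$ gives boulware and $\beta > 1$ conceder behaviour (the thesis's own parameterisation may differ), a simplified alternating-offers protocol in which the receiver accepts an offer that is at least as good as its own current offer, and the same hypothetical placement of $min^p$ as $max^c - \Phi(max^c - min^c)$. The encounter ends when the earlier of the two deadlines has passed.

```python
def poly_alpha(t: float, t_max: float, beta: float, kappa: float = 0.0) -> float:
    """Polynomial time-dependent concession factor in [kappa, 1].

    beta < 1: boulware (concede late); beta > 1: conceder; beta == 1: linear.
    """
    return kappa + (1.0 - kappa) * (min(t, t_max) / t_max) ** (1.0 / beta)


def negotiate(client_beta, provider_beta, provider_t_max, phi, client_t_max=30):
    """Simplified alternating-offers encounter between two time-dependent agents.

    Returns (time, agreed value), or None if a deadline passes first.
    """
    c_min, c_max = 10.0, 25.0
    p_min = c_max - phi * (c_max - c_min)   # hypothetical overlap placement
    p_max = p_min + 15.0

    def client_offer(t):      # buyer: starts at c_min and concedes upwards
        return c_min + poly_alpha(t, client_t_max, client_beta) * (c_max - c_min)

    def provider_offer(t):    # seller: starts at p_max and concedes downwards
        return p_max - poly_alpha(t, provider_t_max, provider_beta) * (p_max - p_min)

    for t in range(min(client_t_max, provider_t_max) + 1):
        if t % 2 == 0:        # client proposes; provider accepts if not worse than its own demand
            x = client_offer(t)
            if x >= provider_offer(t):
                return t, x
        else:                 # provider proposes; client accepts if not worse than its own offer
            x = provider_offer(t)
            if x <= client_offer(t):
                return t, x
    return None               # the earlier deadline passed without agreement


print(negotiate(0.2, 1.0, 30, 0.33))  # boulware client, equal deadlines: agreement near t = 29
print(negotiate(0.2, 1.0, 20, 0.33))  # boulware client, provider deadline 20: None
print(negotiate(5.0, 1.0, 20, 0.33))  # conceder client, provider deadline 20: agreement
```

With equal deadlines the boulware client still reaches the agreement zone just before the end; with a shorter provider deadline the provider stops before the client has conceded enough, although an agreement zone exists.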

Figure 4.13: Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with small overlap and different deadlines. (a) Average utility $U^c$ (top) and agreement rate $A$ when compared with the traditional mixed strategy; (b) average utility (top) and agreement rate when compared with the concession-based mixed strategy. [Bar charts per strategy group in $S_T$, PCaS to EBrL.]

Figure 4.14: Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with small overlap and different deadlines. [Bar charts per strategy group in $S_T$.]

Scenario with Large Overlap and Equal Deadlines

In this scenario, the two agents have equal deadlines and their negotiation intervals overlap to a large degree. The multistage fuzzy strategy obtains a full agreement rate in all strategy scenarios with high average utilities (Figures 4.15a, 4.15b and 4.16). This result is not surprising, as the large overlap increases the size of the agreement zone and therefore makes it easier for an agent to negotiate competitively while still having a high chance of reaching an agreement. Only when the opponent makes large concessions, i.e. applies conceder time-dependent tactics in the mix, does the multistage fuzzy strategy obtain a lower average utility than the average mixed strategy in Figure 4.15, as it also makes relatively large concessions in this case. However, when the multistage fuzzy agent applies the boulware fuzzy constraints, its average utility improves considerably. Again, because of the equal deadlines and the large agreement zone, the multistage fuzzy agent gains utility in all strategy scenarios when applying the fuzzy constraints.

Figure 4.15: Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with large overlap and equal deadlines. (a) Average utility $U^c$ (top) and agreement rate $A$ when compared with the traditional mixed strategy; (b) average utility (top) and agreement rate when compared with the concession-based mixed strategy. [Bar charts per strategy group in $S_T$, PCaS to EBrL.]

Figure 4.16: Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with large overlap and equal deadlines. [Bar charts per strategy group in $S_T$.]

Scenario with Large Overlap and Different Deadlines

In this scenario the agents have different deadlines and a large overlap of their negotiation intervals. As in the scenarios with small overlap, the different deadlines result in missed agreements and therefore lower average utilities than in the scenario with equal deadlines. However, the multistage fuzzy strategy still obtains higher agreement rates and average utilities than the average mixed strategy in many strategy scenarios. It obtains lower agreement rates when the opponent uses boulware time-dependent tactics in mixed strategies based on the linear weighted combination (Figure 4.17a). This is again caused by the traditional mixing mechanism, which may produce non-monotonic concession curves in such situations.
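One plausible way in which a static-weight linear combination can yield a non-monotonic concession curve is sketched below. The tactic definitions are assumptions for illustration and are not the definitions from Chapter 3: the time-dependent tactic is the exponential function of Faratin et al. with boulware $\beta < 1$, and the behaviour-dependent tactic is an absolute tit-for-tat that adds the opponent's last concession to the agent's previous offer. While the opponent concedes, the tit-for-tat part lifts the mixed offer well above the boulware curve; once the opponent stops, the weighted mix is pulled back towards that curve, so the offer retreats.

```python
import math


def exp_alpha(t: float, t_max: float, beta: float, kappa: float = 0.01) -> float:
    """Exponential time-dependent concession factor; beta < 1 concedes late (boulware)."""
    return math.exp((1.0 - min(t, t_max) / t_max) ** beta * math.log(kappa))


def mixed_offers(opponent_concessions, w_time=0.5, beta=0.2, t_max=10,
                 x_min=10.0, x_max=25.0):
    """Client (buyer) offers from a static-weight mix of two tactics.

    ASSUMPTIONS (illustrative only): exponential boulware time-dependent tactic
    and absolute tit-for-tat behaviour-dependent tactic.
    """
    offers = [x_min]
    for n, concession in enumerate(opponent_concessions, start=1):
        x_time = x_min + exp_alpha(n, t_max, beta) * (x_max - x_min)
        x_tft = offers[-1] + concession      # mirror the opponent's last concession
        offers.append(w_time * x_time + (1.0 - w_time) * x_tft)
    return offers


def is_non_decreasing(xs):
    return all(b >= a for a, b in zip(xs, xs[1:]))


# The opponent concedes 2 units in each of the first three rounds, then stops.
offers = mixed_offers([2, 2, 2, 0, 0, 0])
time_only = [10.0 + exp_alpha(n, 10, 0.2) * 15.0 for n in range(11)]
print(is_non_decreasing(time_only))  # True: the pure boulware curve is monotonic
print(is_non_decreasing(offers))     # False: the mixed offer drops in round 4
```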

Figure 4.17: Results for the multistage fuzzy strategy and the average mixed strategies in the scenario with large overlap and different deadlines. (a) Average utility $U^c$ (top) and agreement rate $A$ when compared with the traditional mixed strategy; (b) average utility (top) and agreement rate when compared with the concession-based mixed strategy. [Bar charts per strategy group in $S_T$, PCaS to EBrL.]

By comparison, the concession-based mechanism in Figure 4.17b produces higher agreement rates and utilities for both the multistage fuzzy strategy and the average mixed strategy in the strategy groups involving boulware time-dependent tactics (with the former performing better in almost all boulware strategy groups). As expected, the multistage fuzzy agent obtains lower agreement rates when it applies the boulware fuzzy constraints (Figure 4.18). As in the scenario with small overlap, the smaller concessions of this strategy cause the agent to miss the zone of agreement in many cases because of the agents' different deadlines.

Figure 4.18: Average utility (top) and agreement rate of the multistage fuzzy strategy using boulware fuzzy constraints and the average mixed strategy using the traditional mechanism in the scenario with large overlap and different deadlines. [Bar charts per strategy group in $S_T$.]

It should be noted that the results in all four negotiation environment scenarios depend on the cases chosen for the multistage fuzzy strategy as well as on how the fuzzy constraints are modelled (cf. Section 4.2.5). In addition, an opponent's negotiation strategy may depend not only on the behaviour of its counterpart, but also on other factors such as its deadline or the state of a resource in the environment. Because the multistage fuzzy strategy captures the relationship between the modelling agent's and the opponent's concessions, it attempts to model the behaviour of the opponent in relation to the agent's own behaviour based on the limited knowledge it has observed. An opponent whose strategy also depends on other factors may behave differently from the modelled concession behaviour, leading to different results.

4.6 Related Work and Discussion

The problem of bilateral agent negotiation with limited or uncertain information about the opponent's strategy is known to be hard, and many approaches have been proposed to cope with it, ranging from simple if-then rules and heuristic tactics to more advanced learning and reasoning techniques [15]. Such adaptive negotiation mechanisms mostly assume that agents continually explore their environment and other agents' behaviour to gain experience from past interactions, or that they maintain explicit beliefs about the utilities, constraints and decision models of their opponents. There are two approaches that, like ours, use past cases to generate a negotiation strategy. Matos and Sierra [93] present a case-based reasoning approach that lets agents use past successful interactions to negotiate similar agreements by adjusting the parameters of combined decision functions according to the retrieved cases. This requires the parameter values of the applied strategies to be stored alongside the negotiation thread of each case, which prevents the cases from being used by agents with different individual decision models. Wong et al. [139] use observed concessions to capture past negotiation cases and apply certain filters to select the best one. Their approach differs from ours in that it does not allow for reasoning on, and interpolation between, the cases and the preferences of an agent.
The application of possibility theory to negotiation has been proposed in [17], where the choice of potentially beneficial negotiation partners is based on the expected qualitative utility, but without modelling the negotiation process itself as a fuzzy (or possibilistic) Markov decision process. Indeed, few such approaches exist so far. For example, Narayanan and Jennings [96] model the agent's behaviour by defining states in terms of resource availability, deadlines and reservation values, where counteroffers are proposed based on the opponent's offers and on changes in these three factors. They show that agreements can be reached much faster when both agents use this algorithm, but provide no results for cases where only one agent uses this strategy. Similar to our method, Teuteberg [134] models the behaviour of the opponent, but uses a probabilistic approach to generate the transition matrix based on a predefined set of opponent tactics. The major disadvantage of such an approach is the large number of negotiations required to obtain sufficient empirical data for reliable state transitions. Negotiation has also been modelled as a fuzzy constraint satisfaction problem [91], where constraints, preferences and objectives are represented uniformly as fuzzy sets that are distributed among the agents and iteratively relaxed during the exchange of offers [15]. The search process is guided by ordering and pruning the search space but still requires negotiation strategies for proposing offers [76].

Following the seminal paper of Bellman and Zadeh [7], decision-making in fuzzy environments has been studied and extended by many researchers, such as Kacprzyk [68], Iwamoto [61] and Dubois et al. [33], and has been applied in many areas including resource allocation, planning and scheduling [68]. The modelling of agent-based negotiation strategies using multistage fuzzy decision-making represents a new application of this model in the domain of automated negotiation and further demonstrates its ability to respond to opponents exhibiting different concession behaviours. The advantages of the multistage fuzzy decision approach result from the fuzzy representation of the state transitions, which allows an agent to use a limited number of reference cases to generate the transition model for its multistage decision-making during negotiation. The fuzzy constraints further enable an agent to impose a preferred soft strategy or conditions on the decision process, which provides flexibility for its application in different and more realistic negotiation scenarios. For example, an agent can generate the fuzzy constraints from the time-dependent decision functions, as shown earlier, to model different types of negotiation strategies and thus make itself more or less competitive.
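In the Bellman and Zadeh model, a decision is the intersection (minimum) of the fuzzy goal and the fuzzy constraints; with fuzzy state transitions, the multistage problem can be solved by a backward max-min recursion over an "expected fuzzy goal" per state and action. The sketch below is a generic illustration of that recursion in the style of fuzzy dynamic programming, not the algorithm of this thesis: it omits the case-based transition model and the interpolation step, and the toy states, actions and membership values are invented for illustration.

```python
def fuzzy_dp(goal, constraints, transition):
    """Backward max-min recursion for a multistage fuzzy decision problem.

    goal[x]             membership of terminal state x in the fuzzy goal
    constraints[t][u]   membership of action u in the fuzzy constraint at stage t
    transition[x][u][y] membership of the fuzzy transition x --u--> y
    Returns (membership of the fuzzy decision per initial state, policy[t][x]).
    """
    horizon = len(constraints)
    states = range(len(goal))
    value = list(goal)                       # fuzzy goal at the final stage
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        new_value, best_actions = [], []
        for x in states:
            scores = []
            for u, mu_c in enumerate(constraints[t]):
                # Expected fuzzy goal of (x, u): max-min composition of the
                # fuzzy transition with the goal carried back from stage t + 1.
                expected = max(min(transition[x][u][y], value[y]) for y in states)
                scores.append(min(mu_c, expected))   # Bellman-Zadeh: intersection = min
            u_best = max(range(len(scores)), key=scores.__getitem__)  # first maximum wins ties
            best_actions.append(u_best)
            new_value.append(scores[u_best])
        value, policy[t] = new_value, best_actions
    return value, policy


# Toy problem (invented numbers): state 0 = no agreement yet, 1 = agreement;
# action 0 = hold, 1 = concede. Conceding is only partially acceptable (0.6),
# mimicking a boulware-style fuzzy constraint.
goal = [0.2, 1.0]
constraints = [[1.0, 0.6], [1.0, 0.6]]
transition = [
    [[1.0, 0.1], [0.3, 0.9]],   # from state 0: hold mostly stays, concede mostly reaches 1
    [[0.0, 1.0], [0.0, 1.0]],   # state 1 is absorbing
]
value, policy = fuzzy_dp(goal, constraints, transition)
print(value)    # [0.6, 1.0]
print(policy)   # [[0, 0], [1, 0]]
```

With ties broken towards holding, the resulting policy postpones the concession in state 0 to the last stage, which is the kind of late concession a boulware fuzzy constraint is meant to encourage.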
The state-action form and the modelling of the decision problem as a fuzzy Markov decision process enable the use of traditional dynamic programming techniques to find the best course of action. An agent using this decision model can simultaneously take different factors in the environment and the opponent's behaviour into account in order to create more adaptive negotiation strategies for its concession-making during the encounter. On the other hand, a limitation of the multistage fuzzy decision approach in this thesis is its high computational cost, which increases with larger negotiation ranges and more negotiation issues because the state space needs to cover the whole negotiation space. In addition, the cost of the algorithm increases because of the interpolation required in the expected fuzzy goal for each state when only a small number of reference cases is available. When a large number of cases is available, it is therefore easier to fuzzify the cases into fuzzy rules and use them to create the fuzzy transition model; the similarity between the cases and the current encounter is then used to adjust the fuzzy case constraints.

Although the approach is able to model the opponent's concession behaviour successfully based on limited knowledge, it does not consider other possible factors of a negotiation strategy. For example, in many realistic scenarios the opponent's strategy depends not only on the behaviour of its counterpart, but also on factors such as a resource in the environment, a user's preferences or other outside options. The mixed strategies shown in Chapter 3 are examples of such strategies. An agent using the multistage fuzzy approach cannot take such factors into account, as they are usually private information. Other important factors of a negotiation strategy, such as the reservation value and the negotiation deadline, are likewise unknown to the negotiation partner. As a result, an agent with the multistage fuzzy strategy can make concessions to obtain an agreement given the possible concession behaviour of the opponent, but without knowing whether the opponent would concede beyond the agreement point of the particular cases (as shown in the evaluation in Section 4.5). Furthermore, if the negotiation partner changes its reservation value or deadline, its concession behaviour changes even though it uses the same concession strategy. A mechanism that anticipates the reservation values or negotiation deadlines of opponents would benefit the agent, but it requires domain knowledge or a large amount of historical information. Another option is to adjust the agent's own reservation value depending on the negotiation scenario and the domain.
This is discussed in more detail in the next chapter with a more complex example scenario in the domain of service-oriented computing.

4.7 Summary

In this chapter we have presented a novel approach for modelling an agent's negotiation strategy based on multistage fuzzy decision-making. In this approach, an agent's limited knowledge about the opponent's concession behaviour, for example in the form of reference cases from a few past interactions, is used to create a model with fuzzy state transitions, while the agent's soft preferences are represented by a fuzzy goal and fuzzy constraints. While the fuzzy constraints allow an agent to model different types of negotiation strategies by imposing its soft preferences on the decision process, the fuzzy transition model in the form of states and actions enables the use of traditional dynamic programming algorithms for finding the best course of action during the encounter. The decision algorithms of an agent using the multistage fuzzy decision model have been presented, together with negotiation examples that further illustrated the approach using example reference cases and different fuzzy constraint settings. The evaluation has validated the approach by comparing the multistage fuzzy decision approach with mixed strategies using the traditional mixing mechanism and the concession-based mechanism from Chapter 3. We have also discussed related work and the limitations of this model. In the next chapter, we present a more complex example scenario for automated concurrent negotiations, which is used to introduce a new coordination mechanism for negotiation strategies and to demonstrate the applicability of the decision strategies discussed in this thesis.

Chapter 5 Coordinating Strategies in Concurrent Automated Negotiations

In this chapter, we present a new decision mechanism that enables the coordination of negotiation strategies in more complex, one-to-many bilateral concurrent negotiations, using an example scenario in the domain of service-oriented computing. In this scenario, a number of agents concurrently negotiate with service providers about quality of service (QoS) parameters, such as delivery time, price or throughput, in order to establish service level agreements for a number of atomic services within a workflow-based composite service. The composite service provider typically has end-to-end QoS constraints over the overall composite service, given by the service consumer, which the individual atomic service agents need to consider in their encounters when negotiating towards their negotiation boundaries. In addition, the structure of the composite service influences how the individual QoS parameters are aggregated. We therefore propose in this chapter an algorithm for decomposing the utility boundaries based on the end-to-end QoS constraints provided by the service consumer, and for subsequently redistributing the surpluses from successfully finished negotiations among the remaining ones, taking the structure of the composite service into account. While the algorithm coordinates the negotiation boundaries of each atomic service agent, it leaves control over the concession behaviour to the individual agents so that they can use the limited knowledge they have acquired. It is also shown that the mechanism can increase the number of compound agreements by redistributing the surplus of successfully finished negotiations while still negotiating competitively. An experiment using a two-service scenario and the SLA negotiation example demonstrates the applicability of the decision-making strategies proposed in this thesis and of the coordination mechanism.

5.1 Composite Service Provisioning

The Service Oriented Computing paradigm has paved the way for a new service-oriented business model in dynamic business networks [11] that is referred to as Composite Service Provisioning [8, 51] or Service Aggregation [72]. In this model, service-based applications are rarely built using a single service but are instead composed by aggregating or bundling several component services to create dynamic business processes that span organisations and computing platforms [104]. This model allows service consumers, providers and composite service providers (CSP) to collaborate in highly distributed environments and to establish on-demand, short-term and dynamic business relationships based on their requirements, constraints and capabilities. This is particularly relevant in the context of Cloud Computing, a technology that aims to dynamically deliver on-demand IT resources based on Service Level Agreements [136]. In increasingly competitive business environments, service providers are interested in maximizing their profitability, and service requestors are interested in selecting the service providers that best meet their QoS standards. As a consequence, several scenarios can arise in the services market: several providers can offer functionally equivalent services at varying levels of quality; a given service provider can offer the same service at varying levels of quality [39]; and service requestors can have varying QoS preferences over a requested service (both atomic and composite).
Given the diversity and complexity of end-user needs and the multitude of service offerings, the mechanism chosen for establishing Service Level Agreements (SLAs) between service requestors and service providers (including composite service providers) can range from simple design-time service selection (based on static SLAs), through dynamic run-time service selection (based on the latest but fixed SLA offerings), to dynamic automated negotiation over the QoS requirements. Given the unique characteristics of composite service provisioning, including a dynamic service landscape, varying non-functional service capabilities and requirements, and changing user requests, service selection may not always be the best solution for SLA establishment. Even manual, human-based negotiation may be inefficient for negotiating and establishing SLAs for on-demand composite service provisioning. Automated negotiation, in contrast, is well suited to the on-the-fly adjustment of QoS requirements and offerings on the basis of the service consumers' needs and the service providers' load and current context. In the composite service provisioning process, negotiations have to be held with multiple service providers for each atomic service within the composition such that the end-to-end QoS requirements of the consumer are satisfied. We present an algorithm for the initial global utility decomposition and subsequent surplus redistribution, which takes into account the relative importance of each atomic service in the composition, the appropriate aggregation function for each QoS attribute, and the control flow pattern of the composite service. In scenarios involving multiple concurrent service negotiations, the preference structures over negotiation outcomes, the negotiation strategies and the deadlines are unknown to both the service consumers and the providers. This increases the risk of failed agreements, especially for more complex composite service structures. For that reason, we propose a mechanism that redistributes the surplus of a successfully finished negotiation among the remaining concurrent ones, thereby increasing the chances of reaching agreements for all atomic services within the composition. In addition, if negotiations are unsuccessful or services fail during provisioning, the atomic services can be renegotiated using our mechanism, taking into account the agreements already obtained.
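How the end-to-end value of a QoS attribute depends on the control flow can be illustrated with aggregation rules that are common in QoS-aware service composition: times add up along a sequence and the slowest branch dominates a parallel block, prices add up everywhere, and throughput is limited by the weakest service. This is a minimal sketch covering only sequence and parallel (AND) patterns; the aggregation functions used in this chapter may differ, and the classes `Atomic`, `Sequence` and `Parallel` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Atomic:
    name: str
    time: float        # delivery/response time
    price: float
    throughput: float


@dataclass
class Sequence:
    parts: List["Node"]


@dataclass
class Parallel:
    parts: List["Node"]


Node = Union[Atomic, Sequence, Parallel]


def aggregate(node: Node):
    """Return (time, price, throughput) of a (sub)composition."""
    if isinstance(node, Atomic):
        return node.time, node.price, node.throughput
    values = [aggregate(part) for part in node.parts]
    times = [v[0] for v in values]
    # Sequence: times add up; Parallel (AND): the slowest branch dominates.
    time = sum(times) if isinstance(node, Sequence) else max(times)
    price = sum(v[1] for v in values)        # every invoked service is paid for
    throughput = min(v[2] for v in values)   # the bottleneck limits the whole
    return time, price, throughput


# S1, then S2 and S3 in parallel, then S4 (invented QoS values).
workflow = Sequence([
    Atomic("S1", 2, 5, 100),
    Parallel([Atomic("S2", 3, 4, 80), Atomic("S3", 5, 6, 120)]),
    Atomic("S4", 1, 2, 90),
])
print(aggregate(workflow))  # (8, 17, 80)
```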
The proposed mechanism is a practical approach to efficiently coordinating concurrent service negotiations within complex workflows, enabling the iterative and interactive adjustment of the negotiation boundaries of each atomic service in a composition based on the performance and results of other concurrent negotiations. We demonstrate the usefulness of our coordination algorithm by evaluating it with the multi-tactic and multistage fuzzy negotiation strategies using the Specialised Property Search Scenario, and show that it obtains significantly higher agreement rates and utilities for a large range of negotiation behaviours. The algorithm presented in this chapter can equally be applied to QoS negotiation for the provisioning of resources, services and applications in any distributed environment, including the grid and the cloud.
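The idea of decomposing an end-to-end bound and then redistributing the surplus of a finished negotiation can be illustrated for the simplest case: a lower-is-better attribute (such as price) that is summed over all services. The sketch below is hypothetical and much simpler than the algorithm of this chapter, which works on utility boundaries and considers control-flow patterns and different aggregation functions. Here the bound is split in proportion to assumed importance weights, and when one negotiation agrees below its bound, the unused part is shared among the still-open negotiations in proportion to their weights, so the sum of agreed values and open bounds never exceeds the end-to-end constraint.

```python
from typing import Dict, Set, Tuple


def decompose(end_to_end: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split an end-to-end bound among services in proportion to their weights."""
    total = sum(weights.values())
    return {s: end_to_end * w / total for s, w in weights.items()}


def redistribute(bounds: Dict[str, float], weights: Dict[str, float],
                 open_services: Set[str], finished: str,
                 agreed: float) -> Tuple[Dict[str, float], Set[str]]:
    """Close `finished` at `agreed` and share its surplus among open negotiations."""
    if agreed > bounds[finished]:
        raise ValueError("agreement violates the service's bound")
    surplus = bounds[finished] - agreed
    new_bounds = dict(bounds)
    new_bounds[finished] = agreed
    remaining = open_services - {finished}
    total = sum(weights[s] for s in remaining)
    for s in remaining:                       # no-op if nothing remains open
        new_bounds[s] += surplus * weights[s] / total
    return new_bounds, remaining


# Invented example: end-to-end price budget of 100 over three services.
weights = {"S1": 2.0, "S2": 1.0, "S3": 1.0}
bounds = decompose(100.0, weights)            # S1: 50, S2: 25, S3: 25
bounds, still_open = redistribute(bounds, weights, set(weights), "S1", 40.0)
print(bounds)                                 # {'S1': 40.0, 'S2': 30.0, 'S3': 30.0}
```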

Figure 5.1: Composite service provisioning scenario. [Diagram: a service consumer (client) holds an SLA with the composite service provider, whose composite service consists of atomic services S1 to S6; each atomic service is bound by an SLA (SLA1 to SLA6) to one of several candidate service providers, e.g. S1 A, S1 B, S1 C.]

Definitions and Challenges

Common Definitions. Figure 5.1 shows a composite service provisioning scenario in which the CSP is a business entity that is adept at generating new business capabilities by aggregating services in innovative ways. A composite service is a logical collection of services that can collectively fulfil the functional requirements of the end consumers. Each indivisible and self-contained service in the composite service is referred to as an atomic service. Each atomic service belongs to a particular service type (a grouping of functionally equivalent services) and can be either abstract (before negotiation) or concrete (after successful negotiation). A number of service providers can offer functionally equivalent services, from which the CSP can choose the best candidate. The concept of service types has been proposed in several research projects and is widely published [8, 28, 24, 31]. Composite service provisioning generally occurs in two phases. The first phase involves the generation of the abstract process definition that defines a new business capability. This definition describes a logical collection of abstract services along with the control flow and data flow between them. Each composite service is composed according to various composition patterns and is in turn offered by the composite service provider to targeted