Bayesian-based Preference Prediction in Bilateral Multi-issue Negotiation between Intelligent Agents


Jihang Zhang, Fenghui Ren, Minjie Zhang
School of Computer Science and Software Engineering, University of Wollongong, Wollongong, NSW, Australia

Abstract

Agent negotiation is a form of decision making where two or more agents jointly search for a mutually agreed solution to a certain problem. In multi-issue negotiation, with information available about the agents' preferences, a negotiation may result in a mutually beneficial agreement. In a competitive negotiation environment, however, self-interested agents may not be willing to reveal their preferences, and this can increase the difficulty of negotiating a mutually beneficial agreement. In order to solve this problem, this paper proposes a Bayesian-based approach which can help an agent to predict its opponent's preference in bilateral multi-issue negotiation. The proposed approach employs Bayesian theory to analyse the opponent's historical offers and to approximately predict the opponent's preference over negotiation issues. A counter-offer proposition algorithm is also integrated into the prediction approach to help agents to propose mutually beneficial offers based on the prediction results. Experimental results indicate good performance of the proposed approach in terms of utility gain and negotiation efficiency.

Keywords: Multi-issue negotiation, opponent modelling, preference prediction, Bayesian learning

Corresponding author. Postal address: 1/6 Cassian Street, Keiraville, NSW.
Email addresses: jz718@uowmail.edu.au (Jihang Zhang), fren@uow.edu.au (Fenghui Ren), minjie@uow.edu.au (Minjie Zhang)

Preprint submitted to Knowledge-Based Systems, June 3, 2015

1. Introduction

Intelligent agents are encapsulated software entities that have the ability to make decisions autonomously in dynamic environments to meet their predesigned objectives [1, 2, 3]. In multi-agent systems, agents usually need to cooperate with each other in order to achieve certain goals in a shared environment. However, the agents may have conflicts about how to cooperate with each other to achieve these goals, and this involves negotiation. Agent negotiation is a form of decision making where agents jointly explore possible solutions in order to reach an agreement [4, 5, 6, 7]. In recent decades, agent negotiation technology has been widely developed to solve issues in different areas, such as business transactions in e-commerce [8, 9] and service management in cloud computing [10, 11]. With the support of agent negotiation technology, many operations which originally required human intervention can be conducted automatically and intelligently by autonomous agents, which can save large amounts of time and money.

Currently, one major research challenge in this area is opponent modelling [12, 13, 14, 15]. More precisely, during a negotiation, agents usually need to use a number of negotiation parameters (i.e. deadline, preference, reservation utility and concession strategy) to make wise decisions so that a win-win agreement can be reached. Some cooperative negotiation strategies have assumed that these negotiation parameters are public information. In a competitive environment (non-cooperative negotiation), however, self-interested agents usually keep their negotiation parameters secret in order to avoid being exploited by their opponents [16]. Without knowledge of their opponents' negotiation parameters, agents may have difficulty in adjusting their negotiation strategies properly to reach a win-win agreement.

In order to overcome this difficulty, prediction approaches have been integrated into agents' negotiation strategies in recent years to estimate opponents' negotiation parameters. In multi-issue negotiation, one of the most important negotiation parameters is the preference over negotiation issues, because preferences play a critical role in agents' utility gains and in the success rate of a negotiation. Precisely speaking, in multi-issue negotiation, an agent's preference indicates the agent's weighting over the different negotiation issues. A highly weighted issue contributes more utility than a low-weighted issue. During a multi-issue negotiation, an offer that an agent proposes should not only maximise its own utility, but also try to minimise the damage to its opponent's utility, so that the opponent agent will be more willing to accept the offer. In order to propose such an offer, agents need to know their opponents' preferences on negotiation issues. According to the opponent's preference, an agent can trade off negotiation issues. In other words, while an agent makes some concession on its opponent's highly weighted issues, it also tries to gain some payoff from the opponent's low-weighted issues, so that both agents can benefit from the offer.

In recent years, many different approaches have been proposed to help agents to predict their opponents' preferences. These include genetic algorithm-based prediction [17], statistical analysis-based prediction [18, 19] and machine learning-based prediction [20]. However, all these approaches have limitations. For example, the approaches in [18, 19] require previous negotiation data to make the prediction, and the approach in [20] may need a long training time before the prediction algorithm becomes effective.

In this paper, a bilateral multi-issue negotiation approach is proposed in order to overcome the above prediction limitations and to improve the negotiation results. The goal of the proposed negotiation approach, which can be employed by both agents, is to increase both agents' utilities. In the proposed negotiation approach, Bayesian theory is employed to predict the opponent's preference. The major contributions of the proposed approach are that (1) the proposed preference prediction algorithm does not require any previous negotiation data about the opponent to initialise the prediction: the prediction procedure is an online procedure and is based only on the analysis of the opponent's counter-offers proposed in the ongoing negotiation; and (2) the proposed approach integrates a counter-offer proposition algorithm, which is capable of trading issues effectively based on the predicted preference of the opponent. Therefore, both agents can increase their utilities from the mutually beneficial offer.

The rest of this paper is organised as follows. Section 2 presents the details of the proposed negotiation approach, including preference prediction and counter-offer proposition. Section 3 shows the experimental results of the proposed negotiation approach. Section 4 analyses this approach in a case study. Section 5 compares the results to some related work on multi-issue negotiation. Section 6 gives the conclusion and future work.

2. Negotiation Approach with Bayesian-based Preference Prediction

This section presents the proposed negotiation approach in detail and is divided into five subsections. Subsection 2.1 introduces the basic negotiation model used in the proposed approach. Subsection 2.2 introduces the basic technical terms used in the proposed approach. Subsection 2.3 presents the detail of agents' concession behaviour. Subsection 2.4 describes how to predict an opponent's preference based on Bayesian theory. Subsection 2.5 introduces the procedure of issue trade-off and counter-offer proposition.

2.1. The Basic Negotiation Model

In the proposed negotiation approach, certain assumptions are made. First, both agents have target utilities and negotiation deadlines. Second, there is no dependency between negotiation issues. Third, both agents follow the concession-based negotiation strategy to decrease their target utilities [21].

The proposed negotiation approach uses Rubinstein's alternating offer protocol as the agents' interaction rules during a negotiation [22]. More precisely, a negotiation process is divided into multiple rounds. During each negotiation round, after an agent receives an offer from its opponent, the agent first checks whether the current round has exceeded its negotiation deadline. If the deadline has not been reached yet, the agent concedes its target utility based on the parameters defined in its concession strategy (see Subsection 2.3 for detail), then uses its utility function to calculate the offer's payoff. Based on the calculation result, the agent decides whether to accept the offer. If the agent rejects the offer, it tries to predict its opponent's preference (see Subsection 2.4 for detail) and proposes a counter-offer based on the prediction result (see Subsection 2.5 for detail). The negotiation ends when the agent accepts the offer or its deadline is reached.

The proposed negotiation approach also applies the package deal procedure to propose offers in each negotiation round [23, 24]. The package deal procedure means that agents treat all negotiation issues as a package (offer) and negotiate all issues simultaneously. By using the package deal procedure, agents can effectively trade off negotiation issues and achieve a win-win agreement. The negotiation process in a negotiation round is depicted in Figure 1.
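As a rough illustration only, the per-round decision described above could be sketched in Python as follows. The function name `negotiation_round`, the string labels it returns and the three callables passed to it are hypothetical placeholders, not part of the proposed approach; they stand in for the concession strategy, the utility function and the prediction plus counter-offer step that the following subsections describe.

```python
from typing import Callable, Optional, Sequence, Tuple

Offer = Sequence[float]


def negotiation_round(
    t: int,
    deadline: int,
    received: Offer,
    target_utility: Callable[[int], float],
    utility: Callable[[Offer], float],
    propose: Callable[[int, Offer], Offer],
) -> Tuple[str, Optional[Offer]]:
    """One round from the receiving agent's point of view (illustrative)."""
    # Deadline check comes first: past the deadline the negotiation fails.
    if t > deadline:
        return ("quit", None)
    # Concede to this round's target utility, then evaluate the offer.
    target = target_utility(t)
    if utility(received) >= target:
        return ("accept", received)
    # Otherwise reject; `propose` stands in for prediction plus counter-offer.
    return ("counter", propose(t, received))
```

Passing the strategy pieces in as callables is only a convenience for the sketch: it keeps the round logic separate from whichever concession, utility and proposition functions an agent actually uses.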

Figure 1: Negotiation process in a single negotiation round

2.2. The Basic Negotiation Terms

The proposed negotiation model partially employs the multi-issue negotiation model proposed by Faratin et al. [21]. Let $i$ represent one of the negotiating agents and $\hat{i}$ represent its opponent, and let $j$ ($j \in \{1, \dots, n\}$) be one of the issues negotiated between the two agents. Let $x_j \in [min_j, max_j]$ be a value of issue $j$, where $min_j$ and $max_j$ represent the lower bound and the upper bound of $x_j$, respectively. Each agent has an evaluation function $E^i_j : [min_j, max_j] \to [0, 1]$ that maps the value of issue $j$ to a normalised value between 0 and 1. For example, a general and widely used evaluation function $E^i_j$ for agent $i$ on issue $j$ can be defined as:

$$E^i_j(x_j) = \frac{x_j - min_j}{max_j - min_j} \qquad (1)$$

Agent $i$'s preference is represented by $P^i = \{w^i_j\}$, which contains a weighting $w^i_j$ ($j \in \{1, \dots, n\}$) for each negotiation issue. The sum of all $w^i_j$ equals 1. According to the above terms, an agent's utility function can be defined by Equation (2):

$$U^i(X^t_{\hat{i} \to i}) = \sum_{j=1}^{n} w^i_j \, E^i_j(x^t_{\hat{i} \to i}[j]), \qquad (2)$$

where $X^t_{\hat{i} \to i}$ represents the offer proposed by opponent $\hat{i}$ to agent $i$ at time $t$ and $x^t_{\hat{i} \to i}[j]$ represents the value of issue $j$ in the offer $X^t_{\hat{i} \to i}$. The result of an agent's utility function is also a normalised value between 0 and 1. Let $t^i_{max}$ represent the deadline for agent $i$ to complete the negotiation. Agent $i$ also has a target utility $V^i_t$ ($V^i_t \in [0, 1]$) at time $t$, which is used to determine whether to accept an offer.

As described in the previous section, the negotiation protocol used in the proposed prediction approach is based on Rubinstein's alternating offer protocol. The formal procedure of our negotiation protocol is described as follows.

Step 1: At the beginning of a negotiation, after agent $i$ receives offer $X^{t-1}_{\hat{i} \to i}$ from its opponent $\hat{i}$, agent $i$ first compares the current time $t$ with $t^i_{max}$. If $t > t^i_{max}$, the procedure goes to Step 2, while if $t \le t^i_{max}$, the procedure goes to Step 3.

Step 2: Because the current negotiation time has already exceeded agent $i$'s negotiation deadline, agent $i$ terminates the negotiation and the negotiation fails.

Step 3: Because agent $i$ still has time for further negotiation, agent $i$ first concedes its target utility $V^i_t$ according to the concession strategy and then evaluates the offer $X^{t-1}_{\hat{i} \to i}$ by using its utility function $U^i(X^{t-1}_{\hat{i} \to i})$. The result is compared with the target utility $V^i_t$. If $V^i_t \le U^i(X^{t-1}_{\hat{i} \to i})$, the procedure goes to Step 4. If $V^i_t > U^i(X^{t-1}_{\hat{i} \to i})$, the procedure goes to Step 5.

Step 4: Because the utility of offer $X^{t-1}_{\hat{i} \to i}$ is greater than or equal to agent $i$'s target utility $V^i_t$, agent $i$ accepts the offer $X^{t-1}_{\hat{i} \to i}$ and the negotiation succeeds.

Step 5: Because the utility of offer $X^{t-1}_{\hat{i} \to i}$ is smaller than agent $i$'s target utility $V^i_t$, agent $i$ rejects the offer and tries to predict opponent $\hat{i}$'s preference. Finally, agent $i$ proposes a counter-offer $X^t_{i \to \hat{i}}$ to opponent $\hat{i}$.

The detail of the negotiation procedure is described in Algorithm 1 as follows.

Algorithm 1: Negotiation Procedure
 1: agent $i$ receives offer $X^t_{\hat{i} \to i}$
 2: if $t > t^i_{max}$ then
 3:     quits negotiation
 4: else
 5:     concedes $V^i_t$
 6:     calculates $U^i(X^t_{\hat{i} \to i}) = \sum_{j=1}^{n} w^i_j E^i_j(x^t_{\hat{i} \to i}[j])$
 7:     if $U^i(X^t_{\hat{i} \to i}) \ge V^i_t$ then
 8:         accepts offer $X^t_{\hat{i} \to i}$
 9:     else
10:         predicts and updates the estimation of opponent $\hat{i}$'s preference
11:         proposes counter-offer $X^t_{i \to \hat{i}}$
12:     end if
13: end if

In Algorithm 1, agent $i$ checks its deadline and decides whether to quit the negotiation (Lines 2-3). If the negotiation time does not exceed agent $i$'s deadline, agent $i$ concedes its target utility and evaluates the offer to decide whether to accept it (Lines 5-8). If agent $i$ does not accept the offer, agent $i$ predicts its opponent $\hat{i}$'s preference and then proposes a counter-offer (Lines 10-11).

2.3. Agents' Concession Behaviour

During negotiation, autonomous agents usually follow certain strategies to propose offers. We assume both agents use concession-based negotiation strategies and that the concession made by an agent is strongly related to negotiation time. As negotiation time increases, both agents must consider further concession on the negotiation issues [21, 25, 26]. Generally, agent $i$'s target utility is set to its maximum value (usually equal to 1) at the beginning of a negotiation. When the negotiation time reaches agent $i$'s deadline, the target utility must have decreased to the minimum value (usually equal to 0) that agent $i$ can accept, which can be defined as:

$$V^i_t = \begin{cases} V^i_{max} & \text{when } t = 0 \\ V^i_{min} & \text{when } t = t^i_{max} \end{cases} \qquad (3)$$

where $V^i_{max}$ and $V^i_{min}$ represent the maximum and minimum target utility of agent $i$, respectively. The agent's concession algorithm can be defined by Equation (4):

$$V^i_t = V^i_{max} - (V^i_{max} - V^i_{min}) \left( \frac{t}{t^i_{max}} \right)^{\alpha}, \qquad (4)$$

where the value of $\alpha$ determines the concession strategy, which can be classified as follows: (1) when $0 < \alpha < 1$, agent $i$ makes large concessions at the beginning of the negotiation and small concessions as the negotiation approaches its end; (2) when $\alpha = 1$, agent $i$ makes a constant degree of concession throughout the whole negotiation; and (3) when $\alpha > 1$, agent $i$ makes small concessions at the beginning of the negotiation but increases the degree of concession in later rounds. The detail of the agent's concession strategy is depicted in Figure 2.

Figure 2: Agent's concession strategy
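To make the three concession regimes concrete, the following minimal Python sketch evaluates Equation (4). The function name `target_utility` and its default arguments are illustrative choices, not names used by the authors.

```python
def target_utility(t: float, t_max: float, v_max: float = 1.0,
                   v_min: float = 0.0, alpha: float = 1.0) -> float:
    """Time-dependent target utility following Equation (4)."""
    if t_max <= 0:
        raise ValueError("t_max must be positive")
    # The fraction of elapsed time, raised to alpha, sets how much of the
    # range between v_max and v_min has been conceded by round t.
    return v_max - (v_max - v_min) * (t / t_max) ** alpha
```

For instance, with $V_{max} = 1$, $V_{min} = 0$ and $t_{max} = 10$, the target utility at $t = 2.5$ is $1 - 0.25^{0.5} = 0.5$ for $\alpha = 0.5$, $1 - 0.25 = 0.75$ for $\alpha = 1$ and $1 - 0.25^{2} = 0.9375$ for $\alpha = 2$, which matches the ordering of the three regimes described above.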

2.4. Bayesian-based Preference Prediction

As described in previous sections, agents' preferences are extremely important for issue trade-off in multi-issue negotiation. The purpose of issue trade-off is to propose an offer $X^t_{i \to \hat{i}}$ that not only maximises agent $i$'s own utility $U^i(X^t_{i \to \hat{i}})$, but also reduces the loss of opponent $\hat{i}$'s utility $U^{\hat{i}}(X^t_{i \to \hat{i}})$. For example, suppose that in a negotiation between agent $i$ and opponent $\hat{i}$ there are two issues, $j_1$ and $j_2$. For agent $i$, issue $j_1$'s weighting is 0.8 and issue $j_2$'s weighting is 0.2, while for opponent $\hat{i}$, issue $j_1$'s weighting is 0.1 and issue $j_2$'s weighting is 0.9. During the negotiation, if agent $i$ can propose an offer $X^t_{i \to \hat{i}}$ that requests more utility on $j_1$ but concedes utility on $j_2$, then both agent $i$ and opponent $\hat{i}$ can get more utility from this offer and a mutually beneficial agreement can be achieved. More formally, if agent $i$ receives an offer $X^{t-1}_{\hat{i} \to i}$ from opponent $\hat{i}$ at round $t-1$, rejects this offer and proposes a counter-offer $X^t_{i \to \hat{i}}$ at round $t$, the counter-offer $X^t_{i \to \hat{i}}$ must fulfil the following objectives to achieve a beneficial agreement for both agents:

$$\begin{cases} \text{Objective 1: } \max\; U^i(X^t_{i \to \hat{i}}) \text{ to } V^i_t \\ \text{Objective 2: } \min\; \left( U^{\hat{i}}(X^{t-1}_{\hat{i} \to i}) - U^{\hat{i}}(X^t_{i \to \hat{i}}) \right) \end{cases}$$

Objective 1 means that when agent $i$ proposes a mutually beneficial offer $X^t_{i \to \hat{i}}$, its utility gain from this offer, $U^i(X^t_{i \to \hat{i}})$, should be maximised up to its current round's target utility $V^i_t$. Objective 2 means that when agent $i$ proposes a mutually beneficial offer $X^t_{i \to \hat{i}}$, opponent $\hat{i}$'s utility gain from this offer, $U^{\hat{i}}(X^t_{i \to \hat{i}})$, should be close to the utility gain from the offer proposed by opponent $\hat{i}$ in the previous round, $U^{\hat{i}}(X^{t-1}_{\hat{i} \to i})$, so that opponent $\hat{i}$'s utility loss is minimised. Clearly, in order to propose a mutually beneficial offer that reaches the above objectives, we need to know opponent $\hat{i}$'s utility function $U^{\hat{i}}$.

According to Equation (2), there are two unknown parameters in $U^{\hat{i}}$: opponent $\hat{i}$'s weighting $w^{\hat{i}}_j$ on each negotiation issue and opponent $\hat{i}$'s evaluation function $E^{\hat{i}}_j$ of each negotiation issue. In the proposed negotiation approach, we assume that most of the issues negotiated between agents are conflict issues. Conflict issues here are issues for which increasing the value raises one agent's utility but decreases its opponent's utility. For example, when two agents negotiate over the price for a service, the seller agent will be happy to increase the price while the buyer agent will not. Therefore, opponent $\hat{i}$'s evaluation function on issue $j$ can be assumed by agent $i$ to be $1 - E^i_j(x^t_{\hat{i} \to i}[j])$, and opponent $\hat{i}$'s utility function can be estimated by agent $i$ as:

$$U^{\hat{i}}(X^t_{\hat{i} \to i}) = \sum_{j=1}^{n} w^{\hat{i}}_j \left( 1 - E^i_j(x^t_{\hat{i} \to i}[j]) \right) \qquad (5)$$

In order to calculate the final unknown parameter $w^{\hat{i}}_j$ in Equation (5), Bayesian theory is employed. Usually, Bayesian theory is used to calculate explicit probabilities for hypotheses. In Bayesian theory, there is a hypothesis space $H$, which contains a set of possible hypotheses, and the Bayesian rule is used to determine the most probable hypothesis among them [27]. The Bayesian rule can be defined as follows:

$$P(h \mid D) = \frac{P(D \mid h) P(h)}{P(D)}, \qquad (6)$$

where $h$ is one of the hypotheses in the hypothesis space $H$ and $D$ is the training dataset; $P(h)$ is the prior probability of the hypothesis $h$; $P(D)$ is the probability that the training dataset $D$ will be observed given no knowledge about which hypothesis $h$ holds; and $P(D \mid h)$ denotes the probability of observing dataset $D$ given that the hypothesis $h$ holds. Finally, $P(h \mid D)$ is the posterior probability, which represents the probability that hypothesis $h$ holds given the observed training dataset $D$. It reflects the confidence that hypothesis $h$ holds after the training dataset $D$ has been seen.

In the multi-issue negotiation field, the hypothesis space $H$ can be used to represent all possible rankings of the negotiation issues of opponent $\hat{i}$, and the training dataset $D$ consists of the offers and counter-offers in the negotiation [28]. After each negotiation round, agent $i$ must update its belief in each hypothesis $h$ in the hypothesis space $H$ according to the latest offer. Let $H_w$ denote the possible rankings of the negotiation issues of opponent $\hat{i}$ and $h_m$ ($m = 1, \dots, n$) represent one of the hypotheses (a ranking of the issues) that belongs to hypothesis space $H_w$. The weights of negotiation issues can be normalised by Equation (7) [28]:

$$w^m_j = \frac{2 r^m_j}{n(n+1)}, \qquad (7)$$

where $w^m_j$ represents the weighting of issue $j$ in hypothesis $h_m$ and $r^m_j$ denotes the ranking of issue $j$ in hypothesis $h_m$. The ranking starts from the least important issue and ends with the most important issue.
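The rank-to-weight mapping of Equation (7) and the opponent utility estimate of Equation (5) can be sketched in a few lines of Python. The function names below are hypothetical, and `all_rank_hypotheses` assumes that every ordering of the issues counts as one hypothesis, which is one possible reading of "all possible rankings".

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def rank_weights(ranks: Sequence[int]) -> List[float]:
    """Equation (7): rank r (1 = least important) becomes weight 2r / (n(n+1))."""
    n = len(ranks)
    return [2 * r / (n * (n + 1)) for r in ranks]


def all_rank_hypotheses(n: int) -> List[Tuple[int, ...]]:
    """Assumed hypothesis space: every ordering of the ranks 1..n."""
    return list(permutations(range(1, n + 1)))


def estimated_opponent_utility(weights: Sequence[float],
                               own_evals: Sequence[float]) -> float:
    """Equation (5): on conflict issues the opponent's evaluation is 1 - E."""
    return sum(w * (1.0 - e) for w, e in zip(weights, own_evals))
```

Because the ranks 1 to $n$ sum to $n(n+1)/2$, the weights produced by `rank_weights` always sum to 1. For three issues ranked (1, 2, 3) the weights are 1/6, 1/3 and 1/2.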

Before Bayesian theory can be applied, a uniform distribution is assigned to the hypotheses in the hypothesis space $H_w$. More precisely, if there are $n$ hypotheses in $H_w$, the prior probability of each hypothesis is set to $\frac{1}{n}$. During each round of negotiation, when a new offer is received from opponent $\hat{i}$, the Bayesian rule is used to calculate the posterior probability of each hypothesis $h_m$. The calculation is defined by Equation (8):

$$P(h_m \mid X^t_{\hat{i} \to i}) = \frac{P(X^t_{\hat{i} \to i} \mid h_m) P(h_m)}{\sum_{k=1}^{n} P(X^t_{\hat{i} \to i} \mid h_k) P(h_k)}, \qquad (8)$$

where $P(h_m)$ represents the latest probability of hypothesis $h_m$ and $P(h_m \mid X^t_{\hat{i} \to i})$ represents the posterior probability of hypothesis $h_m$ given that offer $X^t_{\hat{i} \to i}$ is proposed by opponent $\hat{i}$ and received by agent $i$ at time $t$. The only unknown parameter in Equation (8) is the conditional probability $P(X^t_{\hat{i} \to i} \mid h_m)$, which is the probability that opponent $\hat{i}$ offers $X^t_{\hat{i} \to i}$ given that hypothesis $h_m$ holds. We use Figure 3 to demonstrate the calculation process of $P(X^t_{\hat{i} \to i} \mid h_m)$.

Figure 3: Calculation of $P(X^t_{\hat{i} \to i} \mid h_m)$
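The update in Equation (8) is a standard discrete Bayesian update and can be sketched as below. The function name `bayes_update` is hypothetical, and the likelihood values in the usage note are arbitrary placeholders; how the approach actually obtains $P(X^t_{\hat{i} \to i} \mid h_m)$ is the subject of the discussion of Figure 3.

```python
from typing import List, Sequence


def bayes_update(priors: Sequence[float],
                 likelihoods: Sequence[float]) -> List[float]:
    """Equation (8): posterior is prior times likelihood, renormalised."""
    if len(priors) != len(likelihoods):
        raise ValueError("one likelihood is needed per hypothesis")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # the denominator of Equation (8)
    if total == 0:
        raise ValueError("all hypotheses have zero joint probability")
    return [j / total for j in joint]
```

Starting from a uniform prior over four hypotheses and the placeholder likelihoods 0.9, 0.3, 0.2 and 0.1, the posterior of the first hypothesis is $0.225 / 0.375 = 0.6$. Feeding each round's posterior back in as the next round's prior gives the round-by-round belief update described above.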

In Figure 3, the solid line represents opponent $\hat{i}$'s real concession line and the square points on this solid line represent opponent $\hat{i}$'s real target utility in each negotiation round. The dashed lines in Figure 3 represent the estimated concession lines of opponent $\hat{i}$, and the triangular points on these dashed lines are opponent $\hat{i}$'s estimated target utilities in each negotiation round. These estimated target utilities are calculated based on the preference hypotheses of opponent $\hat{i}$. For example, in Figure 3, opponent $\hat{i}$'s estimated target utilities on dashed line $l_{h_1}$ are calculated based on preference hypothesis $h_1$. The term $V^t_{h_m}$ represents opponent $\hat{i}$'s estimated target utility at time $t$ based on hypothesis $h_m$.

In order to calculate the conditional probability $P(X^t_{\hat{i} \to i} \mid h_m)$, the similarity between opponent $\hat{i}$'s real concession line and agent $i$'s estimated concession lines needs to be analysed. The estimated concession line with the highest similarity to opponent $\hat{i}$'s real concession line indicates that the hypothesis used to calculate it is the closest to opponent $\hat{i}$'s real preference. Opponent $\hat{i}$'s real target utility in each negotiation round can be estimated by Equation (4). In order to calculate the similarity between opponent $\hat{i}$'s real concession line and the estimated concession lines, regression analysis is used. The non-linear correlation can be calculated by Equation (9):

$$\gamma_{h_m} = \frac{\sum_{\tau=1}^{t} (V^{\hat{i}}_{\tau} - \bar{V}^{\hat{i}})(V^{\tau}_{h_m} - \bar{V}_{h_m})}{\sqrt{\sum_{\tau=1}^{t} (V^{\hat{i}}_{\tau} - \bar{V}^{\hat{i}})^2 \, \sum_{\tau=1}^{t} (V^{\tau}_{h_m} - \bar{V}_{h_m})^2}}, \qquad (9)$$

where $\gamma_{h_m}$ represents the non-linear correlation for hypothesis $h_m$, $t$ represents the current negotiation round, $V^{\hat{i}}_{\tau}$ represents opponent $\hat{i}$'s target utility at round $\tau$, $\bar{V}^{\hat{i}}$ represents the average value of opponent $\hat{i}$'s target utilities up to round $t$ and $\bar{V}_{h_m}$ represents the average value of opponent $\hat{i}$'s estimated target utilities calculated based on preference hypothesis $h_m$.
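A minimal sketch of the similarity measure follows, under the assumption that Equation (9) is the standard Pearson correlation coefficient between the two concession lines. The function name `concession_correlation` is hypothetical, and returning 0 for a flat series is a choice made for this sketch only; how $\gamma_{h_m}$ is converted into the conditional probability used in Equation (8) is not specified here and is therefore left out.

```python
import math
from typing import Sequence


def concession_correlation(real: Sequence[float],
                           estimated: Sequence[float]) -> float:
    """Pearson correlation between real and hypothesis-based target utilities."""
    n = len(real)
    if n != len(estimated) or n < 2:
        raise ValueError("need two series of equal length with >= 2 rounds")
    mean_r = sum(real) / n
    mean_e = sum(estimated) / n
    # Numerator: co-variation of the two concession lines around their means.
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(real, estimated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    denom = math.sqrt(var_r * var_e)
    # A flat line has no variation to correlate with; report 0 in that case.
    return cov / denom if denom > 0 else 0.0
```

Two lines that fall at proportional rates give a value close to 1, while a line that rises as the other falls gives a value close to -1.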
After calculating each $P(h_m \mid X^t_{\hat{i} \to i})$, agent $i$ uses the results to update the probability distribution of hypothesis space $H_w$. Finally, the hypothesis $h_m$ that has the maximum posterior probability is set as $h^{max}_m$, which represents the most believable preference hypothesis in the current negotiation round. The issue weightings in $h^{max}_m$ are used by agent $i$ to trade off issues and propose a counter-offer to opponent $\hat{i}$. The detail of the preference prediction procedure is illustrated in Algorithm 2, which is divided into four steps as follows.

Step 1: Agent $i$ uses Equation (4) to update opponent $\hat{i}$'s real target utility (Lines 1-2).

Step 2: If the negotiation round is the first round, agent $i$ initialises the hypothesis space $H_w$ (Lines 3-8).

Step 3: Agent $i$ calculates the non-linear correlation of each estimated concession line by using Equation (9) and applies the result to the Bayesian rule (Equation (8)) to calculate the posterior probability of each hypothesis $h_m$ (Lines 9-15).

Step 4: Agent $i$ chooses the hypothesis that has the maximum posterior probability and sets it as $h^{max}_m$ (Lines 16-17).

Algorithm 2: Preference prediction
 1: apply $t$ to concession equation $V^{\hat{i}}_t = V^{\hat{i}}_{max} - (V^{\hat{i}}_{max} - V^{\hat{i}}_{min}) \left( \frac{t}{t^{\hat{i}}_{max}} \right)^{\alpha}$
 2: update opponent $\hat{i}$'s target utility $V^{\hat{i}}_t$
 3: if $t = 0$ then
 4:     initialise hypothesis space $H_w$
 5:     for all $h_m \in H_w$ do
 6:         assign probability $\frac{1}{n}$ to $P(h_m)$
 7:     end for
 8: end if
 9: for all $h_m \in H_w$ do
10:     apply $h_m$ and $X^t_{\hat{i} \to i}$ to $U^{\hat{i}}(X^t_{\hat{i} \to i}) = \sum_{j=1}^{n} w^m_j (1 - E^i_j(x^t_{\hat{i} \to i}[j]))$
11:     calculate $V^t_{h_m}$
12:     calculate $\gamma_{h_m} = \frac{\sum_{\tau=1}^{t} (V^{\hat{i}}_{\tau} - \bar{V}^{\hat{i}})(V^{\tau}_{h_m} - \bar{V}_{h_m})}{\sqrt{\sum_{\tau=1}^{t} (V^{\hat{i}}_{\tau} - \bar{V}^{\hat{i}})^2 \sum_{\tau=1}^{t} (V^{\tau}_{h_m} - \bar{V}_{h_m})^2}}$
13:     calculate $P(h_m \mid X^t_{\hat{i} \to i}) = \frac{P(X^t_{\hat{i} \to i} \mid h_m) P(h_m)}{\sum_{k=1}^{n} P(X^t_{\hat{i} \to i} \mid h_k) P(h_k)}$
14:     save result of $P(h_m \mid X^t_{\hat{i} \to i})$
15: end for
16: choose the maximum $P(h_m \mid X^t_{\hat{i} \to i})$
17: set $h_m$ as $h^{max}_m$

2.5. Counter-Offer Proposition

As described above, the main purpose of preference prediction in our negotiation approach is to use the prediction result to trade off issues and then propose offers that are beneficial for both agent $i$ and opponent $\hat{i}$. According to Subsection 2.4, there are two objectives that agent $i$ must try to reach during its counter-offer proposition, which are (1) to maximise its own utility and (2) to minimise opponent $\hat{i}$'s utility loss. More precisely, the new counter-offer is obtained by adjusting the offer $X^{t-1}_{\hat{i} \to i}$ that was sent by opponent $\hat{i}$ at round $t-1$. In order to meet the first objective, agent $i$ increases the new counter-offer's utility to its next target utility $V^i_{t+1}$. Since it is assumed that the issues negotiated between agent $i$ and opponent $\hat{i}$ are conflict issues, every time agent $i$ gains utility from an issue, opponent $\hat{i}$ loses a certain amount of utility on this issue. In order to minimise such a utility loss for opponent $\hat{i}$ (the second objective), agent $i$ must choose an issue that has a high weighting for itself but a low weighting for opponent $\hat{i}$. For example, assume that agent $i$ and opponent $\hat{i}$ negotiate on three issues and that their weightings of these issues are listed in Table 1. By comparing agent $i$'s and opponent $\hat{i}$'s weightings, it is clear that issue $j_1$ is the most suitable issue for agent $i$ to use to increase its counter-offer's utility. This is because if agent $i$ uses issue $j_1$ to gain 0.1 utility, opponent $\hat{i}$ loses the least utility, whereas if agent $i$ uses issue $j_2$ or $j_3$ to gain 0.1 utility, opponent $\hat{i}$ loses more (0.25 utility in the case of $j_3$).

Table 1: Utility increase example 1 (columns: $j_1$, $j_2$, $j_3$; rows: agent $i$ weighting, opponent $\hat{i}$ weighting)

In detail, before the counter-offer proposition starts, agent $i$ needs to calculate the utility increasing ratio of each issue according to the preference prediction result for opponent $\hat{i}$, which can be calculated by Equation (10):

$$\eta_j = \frac{w^i_j}{w^{\hat{i}}_j}, \qquad (10)$$

where $\eta_j$ represents the utility increasing ratio of issue $j$ and $w^{\hat{i}}_j$ represents opponent $\hat{i}$'s weighting on issue $j$ in hypothesis $h^{max}_m$.

After calculating the utility increasing ratio of each issue, agent $i$ chooses the issue with the highest increasing ratio to start increasing its counter-offer's utility. The procedure to increase the utility of a particular issue depends on this issue's evaluation function $E^i_j(x^t_{i \to \hat{i}}[j])$. If the evaluation result of issue $j$ increases as $x^t_{i \to \hat{i}}[j]$ increases (see Figure 4 (b)), agent $i$ must try to increase the value of issue $j$ to gain more utility. On the contrary, if the evaluation result of issue $j$ increases as $x^t_{i \to \hat{i}}[j]$ decreases (see Figure 4 (a)), agent $i$ must try to decrease the value of issue $j$ to gain more utility.

Figure 4: Utility increasing example 2

In addition, the utility increase for each issue in the offer has a boundary, which is defined by the initial value of issue $j$ ($x^{ini}_j$). The value of $x^{ini}_j$ also depends on issue $j$'s evaluation function $E^i_j(x^t_{i \to \hat{i}}[j])$. If $E^i_j(x^t_{i \to \hat{i}}[j])$ is monotone decreasing (see Figure 4 (a)), the value of $x^{ini}_j$ equals issue $j$'s minimum acceptable value $min_j$. Conversely, if $E^i_j(x^t_{i \to \hat{i}}[j])$ is monotone increasing (see Figure 4 (b)), the value of $x^{ini}_j$ equals issue $j$'s maximum acceptable value $max_j$. When a negotiation issue has reached its utility increasing boundary $x^{ini}_j$, agent $i$ must choose another issue according to the issues' utility increasing ratios to propose the counter-offer.

The detail of the counter-offer proposition procedure is described in Algorithm 3, which is divided into four steps as follows.

Step 1: If the negotiation round is the first round, agent $i$ sets all issues' values to their initial values (Lines 1-5). Then, the procedure goes to Step 4.

Step 2: If the negotiation round is not the first round, agent $i$ initialises its counter-offer based on offer $x^{t-1}_{\hat{i} \to i}[j]$ (Lines 6-13). Then, the procedure goes to Step 3.

Step 3: After the counter-offer initialisation, agent $i$ needs to increase the counter-offer's utility to its next round's target utility $V^i_{t+1}$. The utility increase starts from the issue that has the highest utility increasing ratio (Lines 16-20). During the utility increase, agent $i$ needs to check whether the value of the issue has reached its utility increasing boundary (Lines 22-34). The utility increasing procedure stops when the counter-offer's utility equals $V^i_{t+1}$. Then, the procedure goes to Step 4.

Step 4: Agent $i$ sends the counter-offer $X^t_{i \to \hat{i}}$ to opponent $\hat{i}$ (Line 38).

3. Experiment

In this section, experimental results are presented and the performance of our negotiation approach is analysed. The experiments focus primarily on testing the improvement in agents' utility gain and negotiation time when employing the proposed prediction approach. The rest of this section is divided into two subsections. Subsection 3.1 describes the experimental settings and Subsection 3.2 shows the experimental results and performance analysis in three different experimental scenarios.

3.1. Experimental Setting

In the experiments, our negotiation approach was tested in three different scenarios, as shown in Table 2: (1) neither agent applies the preference prediction and the issue trade-off during the negotiation, (2) only one of the negotiating agents applies the preference prediction and the issue trade-off, and (3) both agents apply the preference prediction and the issue trade-off during the negotiation.

Table 2: Experimental Scenarios
Scenario    Preference prediction    Issue trade-off
1           No agent                 No agent
2           Agent 1                  Agent 1
3           Agent 1 & 2              Agent 1 & 2

In Scenarios 1 and 2, when an agent does not apply the preference prediction and the issue trade-off approaches that were described in Algorithms

Algorithm 3: Counter-offer Proposition
 1: if $t = 0$ then
 2:     for all $x^t_{i \to \hat{i}}[j] \in X^t_{i \to \hat{i}}$ do
 3:         set $x^t_{i \to \hat{i}}[j] = x^{ini}_j$
 4:     end for
 5: else if $0 < t \le t^i_{max}$ then
 6:     for all $x^t_{i \to \hat{i}}[j] \in X^t_{i \to \hat{i}}$ do
 7:         if $x^{t-1}_{\hat{i} \to i}[j] > max_j$ then
 8:             set $x^t_{i \to \hat{i}}[j] = max_j$
 9:         else if $x^{t-1}_{\hat{i} \to i}[j] < min_j$ then
10:             set $x^t_{i \to \hat{i}}[j] = min_j$
11:         else
12:             set $x^t_{i \to \hat{i}}[j] = x^{t-1}_{\hat{i} \to i}[j]$
13:         end if
14:     end for
15:     calculate $V^i_{t+1} = V^i_{max} - (V^i_{max} - V^i_{min}) \left( \frac{t+1}{t^i_{max}} \right)^{\alpha}$
16:     for all $w^m_j \in h^{max}_m$ do
17:         calculate $\eta_j = \frac{w^i_j}{w^m_j}$
18:     end for
19:     choose the issue that has the highest $\eta_j$
20:     set this issue as $k$
21:     while $U^i(X^t_{i \to \hat{i}}) < V^i_{t+1}$ do
22:         increase $x^t_{i \to \hat{i}}[k]$ by $\delta$ to make $U^i(X^t_{i \to \hat{i}}) = V^i_{t+1}$
23:         if $E^i_k(x^t_{i \to \hat{i}}[k])$ is monotone increasing then
24:             $\delta = \frac{V^i_{t+1} - U^i(X^t_{i \to \hat{i}})}{w^i_k} (max_k - min_k) + min_k - x^t_{i \to \hat{i}}[k]$
25:         else
26:             $\delta = \frac{V^i_{t+1} - U^i(X^t_{i \to \hat{i}})}{w^i_k} (min_k - max_k) + min_k - x^t_{i \to \hat{i}}[k]$
27:         end if
28:         if $(x^t_{i \to \hat{i}}[k] + \delta)$ exceeds $x^{ini}_k$ then
29:             set $x^t_{i \to \hat{i}}[k] = x^{ini}_k$
30:             choose the issue with the next highest $\eta_j$
31:             set this issue as $k$
32:         else
33:             set $x^t_{i \to \hat{i}}[k] = x^t_{i \to \hat{i}}[k] + \delta$
34:         end if
35:         update $x^t_{i \to \hat{i}}[k]$ in $X^t_{i \to \hat{i}}$
36:     end while
37: end if
38: send new offer $X^t_{i \to \hat{i}}$
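The core idea of the utility increasing loop in Algorithm 3 can be sketched in Python under simplifying assumptions. This is not the authors' exact procedure: the sketch works directly on agent $i$'s normalised evaluation values (each in [0, 1], with 1 playing the role of the boundary $x^{ini}_j$) instead of raw issue values, it assumes all weights are strictly positive, and the name `raise_to_target` is hypothetical.

```python
from typing import List, Sequence


def raise_to_target(evals: Sequence[float], own_w: Sequence[float],
                    opp_w: Sequence[float], target: float,
                    eps: float = 1e-9) -> List[float]:
    """Greedily raise own utility to `target`, cheapest issues for the opponent first."""
    evals = list(evals)
    # Visit issues by descending utility increasing ratio (Equation (10)).
    order = sorted(range(len(evals)),
                   key=lambda j: own_w[j] / opp_w[j], reverse=True)
    for k in order:
        utility = sum(w * e for w, e in zip(own_w, evals))
        gap = target - utility
        if gap <= eps:
            break  # target utility reached
        # Raise issue k just enough to close the gap, capped at the boundary;
        # if the cap is hit, the loop moves on to the next-best issue.
        evals[k] = min(1.0, evals[k] + gap / own_w[k])
    return evals
```

With own weights (0.5, 0.3, 0.2), predicted opponent weights (0.1, 0.3, 0.6) and all evaluations starting at 0.2, a target of 0.5 is reached by raising only the first issue to 0.8, the issue that is cheapest for the opponent. A target of 0.9 exhausts the first two issues and raises the third to 0.5.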

2 and 3, respectively, it will simply maximise its own utility without considering its opponent's utility. More precisely, when a self-interested agent proposes an offer, it randomly chooses the issues on which to raise its utility to its target utility.

For each experimental scenario, the settings of the negotiation issues and the agents' initial parameters are the same. An issue's minimum value ($min_j$) is randomly selected from 0 to 500 and its maximum value ($max_j$) is randomly selected from 1000 to 2000. The preference values ($w_j$) of all negotiation issues are random numbers between 0 and 1. An agent's minimum target utility ($V^{min}$) is randomly selected from 0 to 0.1 and its maximum target utility ($V^{max}$) is randomly selected from 0.9 to 1. The deadlines ($t^{max}$) for both agents are set to 1000 and their concession strategies ($\alpha$) are set to 1. The evaluation function ($E_j^i$) used by the agents is derived from Equation (1) and is defined as:

$$E_j^i(x_i^{t_i}[j]) = \frac{x_i^{t_i}[j] - x_j^{res}}{x_j^{ini} - x_j^{res}}, \tag{11}$$

where $x_i^{t_i}[j]$ represents the value of issue $j$, and $x_j^{ini}$ and $x_j^{res}$ represent agent $i$'s initial and reservation values on issue $j$, respectively. Like $x_j^{ini}$, the value of $x_j^{res}$ also depends on the shape of $E_j^i(x_i^{t_i}[j])$ (see Subsection 2.5 for details). The details of our experimental parameters are shown in Table 3 and Table 4.

Table 3: Parameters for Both Agents' Settings

| Agent | $V^{max}$ | $V^{min}$ | $t^{max}$ | $\alpha$ | $E_j^i(x_i^{t_i}[j])$ |
| agent 1 & 2 | [0.9, 1] | [0, 0.1] | 1000 | 1 | $\frac{x_i^{t_i}[j] - x_j^{res}}{x_j^{ini} - x_j^{res}}$ |

Table 4: Parameters for Negotiation Issues' Settings

| Issue | $max_j$ | $min_j$ | $w_j$ | $E_j^i(x_i^{t_i}[j])$ shape |
| issue 1 to 6 | [1000, 2000] | [0, 500] | [0, 1] | {monotone increase, monotone decrease} |

3.2. Experimental Results and Analysis

For each of the experimental scenarios, we measured an offer's utility when the offer was accepted by agent 1 and by agent 2. We also recorded the time needed by the agents to accept an offer in the three scenarios. By comparing the experimental results of the three scenarios, we can understand the overall performance of the preference prediction and issue trade-off algorithms in our negotiation approach. Furthermore, we tested our negotiation approach on different numbers of negotiation issues (from two to six issues), so that we could gain some insight into how the number of issues affects the performance of our negotiation approach. Since our negotiation approach employs Bayesian theory to predict the opponent's preference, the prediction results could be greatly affected by the opponent's preference and by the acceptance range on the negotiation issues. This could decrease the accuracy of our experimental results. To address this problem, the experiment in each scenario was repeated 1000 times and the average results were recorded, so that the reported results are robust and general. Figure 5 (a) shows the average utility when an offer was accepted by agent 1 in the three experimental scenarios. Figure 5 (b) presents the average utility when agent 2 accepted the offer. Figure 5 (c) shows the total utility of agent 1 and agent 2.
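The evaluation function of Equation (11) used in this experimental setting can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the weighted-sum utility with weights normalised to sum to 1 is an assumption, since the utility function itself is defined in an earlier section that is not reproduced here.

```python
def evaluate(x, x_ini, x_res):
    """Equation (11): linear evaluation of an issue value.

    Returns 1 at the agent's initial value x_ini and 0 at its
    reservation value x_res. Whether x_ini lies above or below x_res
    decides if the function is monotone increasing or decreasing.
    """
    if x_ini == x_res:
        raise ValueError("x_ini and x_res must differ")
    return (x - x_res) / (x_ini - x_res)


def utility(offer, weights, x_ini, x_res):
    """Weighted sum of per-issue evaluations.

    Assumption: weights are normalised here so that the result lies in
    [0, 1] whenever every issue value lies between x_res and x_ini.
    """
    total_weight = sum(weights)
    weighted = sum(
        w * evaluate(x, ini, res)
        for x, w, ini, res in zip(offer, weights, x_ini, x_res)
    )
    return weighted / total_weight


if __name__ == "__main__":
    # Issue 0 is monotone increasing for this agent, issue 1 decreasing.
    x_ini = [2000.0, 1000.0]
    x_res = [1000.0, 2000.0]
    print(utility([1500.0, 1500.0], [0.3, 0.7], x_ini, x_res))
```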

Figure 5: Agents' utility testing

In Figure 5 (a), we can see that when neither agent 1 nor agent 2 applied our negotiation approach, agent 1's utilities are below 0.55 from the 2-issue tests to the 6-issue tests. Figure 5 (b) shows a similar result for agent 2's utility tests. Such experimental results are not surprising, since when both self-interested agents try to maximise their own utilities, the negotiation usually will not end with a high-utility agreement. In Scenario 2, after agent 1 applied our negotiation approach, we can see from Figures 5 (a) and 5 (b) that not only did agent 1's utility increase noticeably, but agent 2's utility also increased slightly. Such experimental results indicate that although only agent 1 applied our negotiation approach, the mutually beneficial offers proposed by agent 1 could still help both agents to gain more utility from the final agreement. In Scenario 3, when both agents applied our negotiation approach, both agents' utilities increased significantly. Figure 5 (c) shows that when the number of issues is between 3 and 6, the agents' overall utility in Scenario 3 is higher than 1.52, while the overall utility in Scenario 2 is below […]. Based on the experimental results, we can conclude that our negotiation approach can help agents to predict their opponent's preferences by analysing historical counter-offers, and to produce mutually beneficial offers based on the prediction results.

Figure 6: The time needed when agent 1 and 2 reach an agreement

In addition to the agents' utilities, we also recorded the negotiation time that the agents needed to reach an agreement in the three experimental scenarios. From Figure 6 we can see that agents in Scenario 3 used the least time to complete the negotiation, while agents in Scenario 1 used the most time to reach an agreement. The experimental results indicate that the preference prediction in our negotiation approach can help an agent to gain a better understanding of its opponent's preference, and thus to propose an offer that satisfies its opponent before the agents concede too much of their target utilities. Although the preference prediction increases the computation time in each negotiation round, agents can still save time overall, because the prediction decreases the total number of negotiation rounds required to reach an agreement.

4. Case Study

In the previous section, we demonstrated that the proposed prediction approach can not only help both negotiation agents to increase their average utility gain but also decrease their average negotiation time over 1000 experiments in each of the three scenarios. In order to better understand how our negotiation approach affects agents' behaviour, a case study is presented in this section. This case study has three purposes: (1) to analyse the relationship between the preference prediction and the utility of agents' offers in each negotiation round, (2) to demonstrate that by using the result of preference prediction, agents can propose beneficial offers in each negotiation round and (3) to demonstrate that agents' utility gain from the negotiation agreement is close to the Nash equilibrium.

4.1. Case Study Setting

The case study was conducted in the three experimental scenarios. The parameters of the negotiation issues are listed in Table 5 and the agents' parameters are listed in Table 6. During each negotiation round, when agent 1 or agent 2 proposed an offer, this offer's utility was calculated by both agents' utility functions and the results were recorded. The difference between the predicted preference and the opponent agent's real preference was also recorded during each negotiation round. The preference difference is calculated by Equation (12):

$$\varphi_t^i = \sum_{j=1}^{n} \left| w_j^i - w_j^m \right|, \tag{12}$$

where $\varphi_t^i$ represents the preference difference between agent $i$'s prediction result and its opponent's real preference in round $t$, $w_j^i$ represents the opponent's real weighting of issue $j$ and $w_j^m$ represents agent $i$'s predicted weighting of issue $j$ in hypothesis $h_m$.
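The preference difference of Equation (12) amounts to a sum of absolute differences between two weight vectors. A minimal sketch is given below, assuming both weightings are given as equal-length lists; the function name is hypothetical.

```python
def preference_difference(real_weights, predicted_weights):
    """Sum over all issues of |w_j^i - w_j^m|, as in Equation (12).

    real_weights: the opponent's real weighting of each issue.
    predicted_weights: the weighting in the hypothesis h_m selected
    by the predicting agent.
    """
    if len(real_weights) != len(predicted_weights):
        raise ValueError("both weightings must cover the same issues")
    return sum(abs(r - p) for r, p in zip(real_weights, predicted_weights))


if __name__ == "__main__":
    # A perfect prediction gives 0; larger values mean a worse prediction.
    print(preference_difference([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))
```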

Table 5: The Settings for Negotiation Issues (columns: Issue No., MAX, MIN, Agent 1 preference, Agent 2 preference)

Table 6: The Settings for Negotiation Agents (columns: Agent, $V^{max}$, $V^{min}$, $t^{max}$, $\alpha$; rows: agent 1, agent 2)

In addition to the utilities of the agents' offers and the preference prediction difference in each negotiation round, a Nash equilibrium line is also calculated. Generally, a Nash equilibrium means that in a game each player knows the other players' strategies, and if each player chooses its best strategy in consideration of the other players' strategies, the current set of strategy choices and the corresponding payoffs constitute a Nash equilibrium [29]. In a negotiation, if an offer reaches the Nash equilibrium, this offer has the maximum total utility that the negotiation parties can gain considering each other's preferences. The purpose of recording this information is to see whether our prediction approach can help agents to reach the Nash equilibrium at the end of a negotiation. By analysing the change in both agents' utility gains in each negotiation round before and after our prediction approach is employed, we can further discover what effects the proposed prediction approach has on each negotiation agent.
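Under the reading used above, in which the Nash equilibrium offer is the one with the maximum total utility, such a reference offer can be computed directly when both agents' weights are known. The sketch below rests on simplifying assumptions that are not taken from the paper: the two agents have exactly opposed linear evaluations on every issue (agent 2's evaluation is 1 minus agent 1's), utilities are plain weighted sums, and all names are hypothetical.

```python
def max_total_utility_offer(weights_1, weights_2):
    """Return (evaluations, u1, u2) for the offer maximising u1 + u2.

    evaluations[j] is agent 1's evaluation of issue j in [0, 1];
    agent 2's evaluation is assumed to be 1 - evaluations[j].
    The total w1 * e + w2 * (1 - e) is linear in e for every issue,
    so the maximum lies at e = 1 when w1 >= w2 and at e = 0 otherwise.
    """
    evaluations = [
        1.0 if w1 >= w2 else 0.0 for w1, w2 in zip(weights_1, weights_2)
    ]
    u1 = sum(w * e for w, e in zip(weights_1, evaluations))
    u2 = sum(w * (1.0 - e) for w, e in zip(weights_2, evaluations))
    return evaluations, u1, u2


if __name__ == "__main__":
    # Each issue is settled in favour of the agent who weights it more.
    evaluations, u1, u2 = max_total_utility_offer(
        [0.6, 0.1, 0.3], [0.2, 0.5, 0.3]
    )
    print(evaluations, u1, u2, u1 + u2)
```

The sketch also shows why knowing the opponent's weights matters: the total-utility optimum assigns each issue to the agent that values it more, which an agent cannot do without a preference estimate.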

4.2. Case Study Results and Analysis

Figure 7: Case Study in Scenario 1

Figures 7 (a), 8 (a) and 9 (a) show agent 1's and agent 2's utility gains from the offer in each negotiation round in the three scenarios, respectively. The y-axis of these three figures represents an offer's utility evaluated by agent 1, while the x-axis represents the offer's utility evaluated by agent 2. In these charts, we can see three lines. The line with cross points was generated from agent 1's offers and the line with circle points was generated from agent 2's offers. The line with triangular points was generated by the Nash equilibrium offer in each negotiation round. Furthermore, the numbers marked on the utility lines and the Nash equilibrium line represent the negotiation round in which the offer was generated. In addition, Figures 7 (b), 8 (b) and 9 (b) show the differences between the preference prediction results and the opponent's real preference in each negotiation round in the three scenarios, respectively. In these three figures, the lines with cross points were generated by agent 1's prediction results and the lines with circle points were generated by agent 2's prediction results.

In Figure 7 (a), we can see that when neither agent applied our negotiation approach, the utilities that agent 1 and agent 2 could gain from the same offer were very different at the beginning of the negotiation. Taking the first negotiation round as an example, when agent 1 proposed an offer with a utility of 0.9 for itself, agent 2 could only gain a utility of 0.12 from this offer. In the second negotiation round, agent 2 proposed an offer with a utility of 0.88 for itself, from which agent 1 could only gain a utility of 0.46. Such an experimental result is not surprising, since at the beginning of the negotiation, agents' target utilities are usually close to 1. This means that agents need to extract as much utility as possible from the negotiation issues, which leaves little benefit to their opponents. We can see from Figure 7 (a) that as the negotiation proceeds, both utility lines extend in the same direction (towards the centre of the chart) and finally merge with each other at the 28th negotiation round. This is because both agents decreased their target utilities slightly in each negotiation round. Consequently, both agents could reach their target utilities and thus reach an agreement.
Although agent 1 and agent 2 reached an agreement before their deadlines, we can see that agent 1 and agent 2 only gained utilities of 0.50 and 0.48 from the final offer, respectively. This offer is clearly not close to the Nash equilibrium offer (the triangular point marked as 50). This is mainly because the agents did not know each other's preferences and randomly chose issues to increase their offers' utilities, which causes agent 1's and agent 2's utility lines to fluctuate more. For example, in the first three offers proposed by agent 1, we can see that agent 2's utility gain increased continually from 0.12 to 0.31, which indicates that agent 1 might accidentally have chosen issues with a high utility increasing ratio when proposing these three offers (see Equation (10) for details). In round seven, however, agent 1 proposed an offer from which agent 2 only gained a utility of 0.15, which indicates that agent 1 might have chosen issues with a low utility increasing ratio when proposing the offer in this negotiation round. The fluctuation of agent 1's and agent 2's utility lines significantly increased the time needed for the two lines to merge, which caused the merge point (the agreement) to deviate from the Nash equilibrium point.

Figure 8: Agent 1 and 2's utility in Scenario 2

In Figure 8 (a), we can see that after agent 1 applied our negotiation approach, the cross points on the utility line generated by agent 1's offers have a greater horizontal range and less fluctuation compared with the points' range in Figure 7 (a). The larger horizontal distance between successive cross points indicates that the utility that agent 2 gained from agent 1's offers increased considerably in each negotiation round. Such a result is due primarily to the fact that agent 1 predicted agent 2's preference before generating a counter-offer, so agent 1 could trade off negotiation issues by using Algorithm 3. More precisely, in Figure 8 (b), we can see that agent 1's preference prediction difference is quite large (0.452) in the first negotiation round, so that agent 2 could only get a utility of 0.16 from agent 1's first offer. However, in the next eight rounds, the prediction difference decreased continually from […] to […]. Correspondingly, agent 1's utility line has its most obvious horizontal increment during these eight negotiation rounds. After agent 1 found the preference hypothesis with the smallest difference (0.115) from agent 2's real preference, the horizontal increment of agent 1's utility line slowed down, and the line finally merged with agent 2's utility line at round 19. The merge point is much closer to the first Nash equilibrium point than the result in Scenario 1, mainly because agent 1's utility line has no fluctuation after the preference prediction and issue trade-off. Through further analysis of Figures 7 (a) and 8 (a), we can see that there is no significant difference between the utility lines generated by agent 2's offers in these two figures. This result is expected, since in both Scenarios 1 and 2, agent 2 did not apply our negotiation approach and thus only considered maximising its own utility.

Figure 9: Agent 1 and 2's utility in Scenario 3

Figure 9 (a) shows the utility change after both agents applied our prediction approach. In Figure 9 (b), we can see that agent 2 used eight rounds to find the preference hypothesis which is most similar to agent 1's real preference. As a result, agent 2's utility line has a large vertical increment in the first eight rounds. Such a change indicates that after agent 2 applied the prediction approach, agent 1 could also get more utility from the offers proposed by agent 2. By comparing the merge points of the two agents' utility lines in the three figures, we can also see that the merge point in Figure 9 (a) is closest to the upper right corner of the figure and is almost identical to the first Nash equilibrium offer. The closer the merge point is to the upper right corner of the figure, the closer agent 1's and agent 2's utilities are to 1 when the negotiation ends. This result indicates that both agents in Scenario 3 have the highest overall utility when they reach the agreement. Based on the above case study, we can confirm that our prediction approach can help agents to consistently propose mutually beneficial offers based on the preference prediction results and to reach the Nash equilibrium at the end of a negotiation.

5. Related Work

In this section, some related work on opponent prediction in automated negotiation is reviewed and the differences between our approach and the related work are analysed. In [30], Zeng and Sycara proposed a sequential decision making model for multi-issue negotiation, called Bazaar. This negotiation model used a Bayesian learning algorithm to predict the opponent's reservation value on certain negotiation issues. During the negotiation, once the agent receives information from its opponent or from the outside world, the agent updates its beliefs about the opponent agent's reservation value. Our negotiation approach also employs Bayesian theory to model the opponent agent. Instead of trying to predict the opponent's reservation value, however, our negotiation approach focuses on preference prediction, which plays an important role in the proposition of mutually beneficial offers in multi-issue negotiations. In [20], Soo and Hung used a machine learning approach to predict the opponent's preferences. Their approach is based on Q-learning, which is a model-free reinforcement learning technique.
In reinforcement learning, an agent has a set of actions. Each time an agent interacts with its environment by taking an action, the agent receives a reward. By analysing the reward, the agent learns whether the action is good or bad. In agent negotiation, if an opponent has rejected an offer, then this offer is marked as a negative instance for the Q-learning algorithm, while a counter-offer proposed by the opponent gives a positive reward. However, their negotiation approach assumes that the opponent's reservation price is public information, which is rarely the case in automated negotiation scenarios. Our negotiation approach does not require any information about the opponent's reservation price. In [19], Ros and Sierra introduced a simple statistical analysis to predict agents' preferences. They considered that issues with fewer changes during a negotiation are more important than those with more changes. For example, if an agent considers delivery time to be a high-preference issue, the agent may try to keep it as stable as possible, with only small changes. A more comprehensive preference prediction approach based on statistical analysis, called kernel density estimation (KDE), was proposed by Coehoorn and Jennings in [5]. In order to make a prediction, KDE needs offline processing of any data available about agents' previous negotiations for the provision of a particular service. From the processing result, a probability density function over the opponent's likely preferences for the various issues can be acquired. This function can be used by online learning to reflect new information from the ongoing negotiation. More precisely, the data that KDE uses for analysis are the offers and counter-offers in agents' previous negotiations. In particular, KDE can be used to analyse the offers that are proposed at the beginning and end of a negotiation. The authors assumed that a relatively small change in an issue in the offers at the beginning of the negotiation might indicate that this issue is more important to the opponent than other issues, while at the end of the negotiation, a relatively large concession on an issue might indicate that this issue is important. By comparing the difference in a negotiation issue between multiple offers, the opponent's preference over this issue might be estimated.
The advantage of their preference learning approach is the constant lookup of a prediction. However, one major problem of their learning approach is its requirement for offline analysis of prior negotiation data. For agents which encounter each other for the first time, this learning approach is not suitable. In our negotiation approach, an agent can predict its opponent's preference based only on the offers in the ongoing negotiation. In [31], Jonker and Robu proposed a model for integrative bilateral multi-issue negotiation in which all issues are negotiated simultaneously. In their negotiation model, both negotiation agents need to reveal partial preference information over some unimportant issues before a negotiation starts, but keep their preferences over important issues secret. During the negotiation, a heuristic guessing approach is used by the agent to analyse the