CHAPTER 3

IDENTIFICATION OF THE MOST PREFERRED ATTRIBUTE USING CONJOINT ANALYSIS

In a heterogeneous distributed computing environment, the weighted nodes scheduling principle assigns a number of requests to each server based on its serving capacity. The weights are assigned to each server based on one or more of its attributes. To design a weighted nodes scheduling technique, this research work uses the most preferred attribute of a server to measure its serving capacity. Conjoint analysis, a statistical method, is used to identify that attribute. The subsequent sections discuss conjoint analysis in detail with an illustration.

3.1 Introduction to Conjoint Analysis

Conjoint analysis, or stated preference analysis, is a statistical technique that originated in mathematical psychology. It is used in the social sciences and applied sciences, including marketing, product management and operations research. The analysis measures customers' preferences based on the attributes of a product [80, 81]. It quantifies each attribute's preference value using multiple linear regression, and it can also assess the different combinations of preferences that individuals use to choose a product or service. Conjoint analysis is widely used in market research to determine how people value the different attributes that make up an individual product or service.

Conjoint analysis estimates the level of influence of each attribute, called its part-worth utility. This numeric value is the quantified measure of the preference for an individual attribute. The part-worth utility is calculated from the preference (ranking) given to a defined set of combinations of attribute values [82, 83].
For example, when a customer buys a product, conjoint analysis makes it possible to derive how strongly attributes of that product, such as price, design and brand, influence the purchasing decision. After the customer's preferences for the attribute combinations are collected, conjoint analysis expresses the decision as a percentage for price, a percentage for design and a percentage for brand [84]. Conjoint analysis also quantifies the influence of each attribute relative to the other attributes.

3.2 Suitability of Conjoint Analysis for this Research

Although other preference analysis techniques are mentioned in the literature [75, 76], conjoint analysis is better suited to this research work because it allows the analysis to be performed on any number of attributes with relative ease. Conjoint analysis also offers a visual representation, which makes it easier to compare preferences between attributes. For these reasons, conjoint analysis is adopted in this research work.

3.3 Illustration of Conjoint Analysis

To perform conjoint analysis, a set of attributes from the domain of interest is chosen after careful study. Since this research focuses on the distributed computing environment, the pertinent attributes are identified from its structural components and presented in the next section.

3.3.1 Identification of Set of Attributes

In a distributed computing environment, the major structural components from which a set of attributes can be identified are shown in Figure 3.1.

i. Servers
ii. Users
iii. Requests
iv. Services
v. Network
Figure 3.1 Components of a distributed system (diagram with the elements Users, Requests, Scheduler, Network, Servers 1 to n, and Services)

The attributes having a major influence on request scheduling are listed below.

Attributes from Servers

i. Processor speed: The rate at which the processor executes instructions, commonly expressed by its clock frequency in megahertz or gigahertz.
ii. Load capacity: The number of parallel sessions or connections a server can handle. It is limited by the server's hardware configuration and the network.
iii. Bandwidth: The rate at which data is transferred from one point to another.
iv. Operating system: The software that manages computer resources and provides common services. Linux, Windows and Mac OS are a few examples.
v. Storage capacity: The maximum memory capacity of a server, which restricts the number of services it can process.
vi. Suitability of the server: The appropriateness of a server for processing a specialized request.
vii. Participation policy of the server: The condition under which a server participates in the distributed system to accept requests for processing. It can be based on a request's geographical origin or on the type of data handled by the requesting service.

Attributes from Requests

i. Arrival time: The time at which the request is received by the scheduler.
ii. Burst time: The time taken by the server to complete the submitted request. The burst time is distinct for each request.
iii. Demand: The consolidated number of requests collected between two time units t_i and t_{i+1}.
iv. Queue size: The size of the request pool that holds the requests received by the request scheduler.
v. Geographical proximity: The closeness of a server to the geographical origin of the request.
vi. Nature of the request: The type of request. Examples include a request for a multimedia service or a request for a printing service.

Attributes from Users

i. User type: The classification of users based on a criterion.
ii. Preference of node: The users' preference in choosing their servers.
Attributes from Network

i. Connection cost: The measure of network cost. It signifies the time taken for a byte of data to travel across the network from one point to another.
ii. Traffic rate: The rate of movement of data in the network. The rate varies with time and region.

3.3.2 Attributes Chosen for the Illustration

To design a scheduling principle from the perspective of the servers, this research work performs conjoint analysis on the attributes of the servers. From the literature, the following are observed to be the major attributes of a server that affect scheduling [16, 85].

i. Operating System
ii. Load capacity
iii. Memory

Attributes in conjoint analysis are also known as factors. The instances of each factor are called levels. Let the number of factors affecting the preference for a product be m and the number of levels of the i-th factor be n_i, where i varies from 1 to m. To start with, a set of factors and levels is chosen. In this work, conjoint analysis is illustrated on a scenario having two servers with three attributes. Table 3.1 shows the chosen factors and their levels for the analysis.

Table 3.1 Factors and levels

Factor                   Server 1    Server 2
Operating System (OS)    Windows     Linux
Load capacity (SC)       8 Users     6 Users
Memory (Mem)             4 GB        8 GB
3.3.3 Derivation of Factorial Combinations

From the factors and their levels, the factorial combinations are derived as follows. For k factors with two levels each, there are 2^k possible combinations, as represented in Figure 3.2.

Load capacity        8 Users            6 Users
Memory               4 GB     8 GB      4 GB     8 GB
OS    Windows        a        b         c        d
      Linux          e        f         g        h

Figure 3.2 Complete factorial design of product profiles

The full factorial design, whose entries are known as product profiles, is derived using all the possible combinations and presented in Table 3.2.

Table 3.2 Complete factorial combinations

Product profile    Factorial combination
a                  OS: Windows; Load capacity: 8 Users; Memory: 4 GB
b                  OS: Windows; Load capacity: 8 Users; Memory: 8 GB
c                  OS: Windows; Load capacity: 6 Users; Memory: 4 GB
d                  OS: Windows; Load capacity: 6 Users; Memory: 8 GB
e                  OS: Linux; Load capacity: 6 Users; Memory: 4 GB
f                  OS: Linux; Load capacity: 6 Users; Memory: 8 GB
g                  OS: Linux; Load capacity: 8 Users; Memory: 4 GB
h                  OS: Linux; Load capacity: 8 Users; Memory: 8 GB
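The enumeration of the full factorial design can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in this work: the dictionary `factors` and the function `full_factorial` are hypothetical names, the levels are copied from Table 3.1, and the order in which the combinations are produced is not meant to reproduce the lettering a to h.

```python
from itertools import product

# Factors and levels copied from Table 3.1; the dictionary layout is illustrative.
factors = {
    "OS": ["Windows", "Linux"],
    "Load capacity": ["8 Users", "6 Users"],
    "Memory": ["4 GB", "8 GB"],
}


def full_factorial(factors):
    """Return every combination that picks exactly one level per factor."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


profiles = full_factorial(factors)
print(len(profiles))  # number of product profiles, 2**k for k two-level factors
for profile in profiles:
    print(profile)
```

With three two-level factors the list holds 2^3 = 8 product profiles. The same function handles factors with more than two levels, in which case the count is the product of the level counts n_i.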
This illustration uses the multi-factor evaluation approach because the numbers of factors and levels are limited. If the total number of factorial combinations is too large, a limited number of product profiles can be selected using an orthogonal design [80, 87].

3.3.4 Ranking the Preferences

In this step, all the possible combinations are ranked. The most preferred combination is assigned rank 1, going up to 8 for the least preferred combination, as presented in Table 3.3. For this illustration, a survey was carried out using http://www.surveymonkey.com to collect the preferences for the attributes of the servers. The questionnaire used in the survey is presented in Appendix A. The participants in the survey were chosen to cover various levels of expertise. Table 3.3 shows the ranking of the attribute combinations obtained from the survey.

Table 3.3 Ranking the preferences

Product profile    Factorial combination         Preference
a                  Windows, 6 Users, 32 GB       8
b                  Linux, 6 Users, 32 GB         7
c                  Windows, 6 Users, 64 GB       4
d                  Linux, 6 Users, 64 GB         3
e                  Windows, 8 Users, 16 GB       6
f                  Linux, 8 Users, 16 GB         5
g                  Windows, 8 Users, 64 GB       2
h                  Linux, 8 Users, 64 GB         1

3.3.5 Assignment of Levels

To bring the factorial combinations into a mathematical model, the levels have to be coded. In this illustration, a value of -1 is assigned for the least
preferred level of an attribute and +1 for the most preferred level. For example, Windows is coded as -1 and Linux is coded as +1. The three attributes are treated as variables, each taking the value -1 or +1. The assignment of levels is extended to the consolidated product profiles. The codes vary depending on the number of levels of the factor. The list of combinations with their coded levels is known as the design matrix and is shown in Table 3.4.

Table 3.4 Design matrix with levels

Product profile    OS (X1)    Load capacity (X2)    Memory (X3)
a                  -1         -1                    -1
b                   1         -1                    -1
c                  -1          1                    -1
d                   1          1                    -1
e                  -1         -1                     1
f                   1         -1                     1
g                  -1          1                     1
h                   1          1                     1

3.3.6 Utility Graph Representation

To measure the difference between the ranks for each attribute, the full factorial design is represented graphically as a utility graph. Since this illustration considers only three attributes, the utility graph is constructed as a cube in which the rank of each product profile is represented as a point. Figure 3.3 shows all the ranks in the utility graph. It places the rank of each product profile against its factor levels, which is used for calculating the individual preference level of each attribute.
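The coding step lends itself to a short sketch. The Python fragment below generates the -1/+1 design matrix in the row order of Table 3.4 and checks that the columns are balanced and mutually orthogonal, which is the property that later allows each part-worth to be read off independently. The names `FACTORS` and `design_matrix` are hypothetical and only serve the illustration.

```python
from itertools import product

FACTORS = ["OS", "Load capacity", "Memory"]  # X1, X2, X3 in Table 3.4


def design_matrix(k):
    """Rows of -1/+1 codes for k two-level factors, first factor varying fastest."""
    # product() varies its last position fastest, so each tuple is reversed
    # to make X1 alternate on every row as in Table 3.4.
    return [tuple(reversed(row)) for row in product((-1, 1), repeat=k)]


rows = design_matrix(len(FACTORS))
for label, row in zip("abcdefgh", rows):
    print(label, row)

columns = list(zip(*rows))
# Balanced: every factor appears at -1 and at +1 equally often.
balanced = all(sum(col) == 0 for col in columns)
# Orthogonal: the dot product of any two different columns is zero.
orthogonal = all(
    sum(x * y for x, y in zip(columns[i], columns[j])) == 0
    for i in range(len(columns))
    for j in range(i + 1, len(columns))
)
print(balanced, orthogonal)
```

For factors with more than two levels a different coding scheme would be needed; this sketch covers only the two-level case used in the illustration.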
Figure 3.3 Utility graph of the factors (cube whose eight vertices carry the ranks 1 to 8; visible axis labels: OS, Memory)

3.3.7 Linear Regression Function

In this step, the preference for the levels of each factor is obtained such that the ranks of the product profiles estimated from the utility weights correlate highly with the original ranks of the product profiles. The rank for each factorial combination is defined using the multiple linear regression function [83, 86] given in equation 3.1,

$$\text{Rank} = \sum_{i=1}^{n} \text{Part\_worth of attribute } i \times \text{Level of attribute } i \tag{3.1}$$

where i = 1, 2, ..., n. Part-worth is written with an underscore for clarity in the notation. Using equation 3.1, the ranking is defined for the three factors as,

$$R = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + c \tag{3.2}$$

where $R$ is the rank of a product profile, $\beta_1$ is the part-worth of OS, $\beta_2$ is the part-worth of Load capacity, $\beta_3$ is the part-worth of Memory, $X_i$ is the level of attribute $i$ and $c$ is the adjustment factor.

To estimate the influence of the variables on one another, a multi-variate linear regression function is derived [83]. A system of linear equations using the coded
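combinations and the ranking for each combination is formed using the regression function. The linear equation for each rank is given in Table 3.5, where each row substitutes the coded levels of the corresponding product profile from Table 3.4 into equation 3.2.

Table 3.5 Multiple regression function

Rank    Regression function
8       $8 = -\beta_1 - \beta_2 - \beta_3 + c$
7       $7 = \beta_1 - \beta_2 - \beta_3 + c$
4       $4 = -\beta_1 + \beta_2 - \beta_3 + c$
3       $3 = \beta_1 + \beta_2 - \beta_3 + c$
6       $6 = -\beta_1 - \beta_2 + \beta_3 + c$
5       $5 = \beta_1 - \beta_2 + \beta_3 + c$
2       $2 = -\beta_1 + \beta_2 + \beta_3 + c$
1       $1 = \beta_1 + \beta_2 + \beta_3 + c$

3.3.8 Calculation of the Part-worth Utilities

The part-worth utility is calculated for each attribute. In the cube, the part-worth utility corresponds to the difference between the sum of the rank values of all points on the right side of the vertical plane and the sum of the rank values of all points on the left side of the vertical plane. Figure 3.4 shows the calculation of the part-worth utility for the attribute OS.

Figure 3.4 Utility graph for OS (cube of ranks divided by the plane perpendicular to the OS axis)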
In a similar way, the part-worth utilities are calculated for the other attributes, as shown in Figure 3.5 and Figure 3.6.

Figure 3.5 Utility graph for Load capacity (cube of ranks divided by the plane perpendicular to the Load capacity axis)

Figure 3.6 Utility graph for Memory (cube of ranks divided by the plane perpendicular to the Memory axis)

The part-worth utilities obtained from the utility graphs for all three attributes are listed in Table 3.6.

Table 3.6 Part-worth utilities

Factor              Part-worth utility
Operating System    -0.5
Load capacity       -2
Memory              -1
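The calculation behind Table 3.6 can be reproduced with a short Python sketch that uses only the coded levels of Table 3.4 and the ranks of Table 3.3. Two points are assumptions of this sketch rather than statements from the text: the difference of the two face sums is divided by the number of profiles, which is the ordinary least-squares slope for a balanced -1/+1 design and yields the values of Table 3.6, and the adjustment factor is taken to be the least-squares intercept, that is, the mean rank. The names `DESIGN`, `RANKS`, `fit_part_worths` and `estimate_rank` are hypothetical.

```python
# Coded levels (X1 = OS, X2 = Load capacity, X3 = Memory) from Table 3.4.
DESIGN = {
    "a": (-1, -1, -1), "b": (1, -1, -1), "c": (-1, 1, -1), "d": (1, 1, -1),
    "e": (-1, -1, 1), "f": (1, -1, 1), "g": (-1, 1, 1), "h": (1, 1, 1),
}
# Preference rank of each product profile from Table 3.3.
RANKS = {"a": 8, "b": 7, "c": 4, "d": 3, "e": 6, "f": 5, "g": 2, "h": 1}


def fit_part_worths(design, ranks):
    """Least-squares fit of rank = c + sum_j b_j * x_j for a balanced -1/+1 design."""
    n = len(design)
    k = len(next(iter(design.values())))
    intercept = sum(ranks.values()) / n  # mean rank, assumed adjustment factor
    slopes = []
    for j in range(k):
        # Sum of ranks on the face of the cube where factor j is at +1 ...
        high = sum(ranks[p] for p, x in design.items() if x[j] == 1)
        # ... and on the opposite face where factor j is at -1.
        low = sum(ranks[p] for p, x in design.items() if x[j] == -1)
        slopes.append((high - low) / n)  # division by n is an assumption
    return intercept, slopes


def estimate_rank(intercept, slopes, levels):
    """Evaluate the fitted regression function for one coded profile."""
    return intercept + sum(b * x for b, x in zip(slopes, levels))


intercept, slopes = fit_part_worths(DESIGN, RANKS)
estimated = {p: estimate_rank(intercept, slopes, x) for p, x in DESIGN.items()}
print(slopes)     # [-0.5, -2.0, -1.0]
print(intercept)  # 4.5
print(estimated)
```

Under these assumptions the slopes equal the part-worth utilities of Table 3.6, and the estimated rank of every profile equals its surveyed rank. The shortcut of comparing face sums is valid only because the design is balanced and orthogonal; for an unbalanced or fractional design a general least-squares solver would be needed.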
The ranks are calculated by substituting the part-worth utilities given in Table 3.6 into equation 3.1 as,

$$R = -0.5\,X_1 - 2\,X_2 - X_3 + c \tag{3.3}$$

where $R$ is the rank for a product profile, $X_i$ is the level of attribute $i$ and $c$ is the utility adjustment factor. The utility adjustment factor is chosen such that the sum of the errors between the actual preference values and the corresponding estimated preference values is minimum [80]. When the original preferences of the respondents are compared with the estimated preferences, the difference is found to be negligible: the calculated profile rankings match the actual rankings.

3.3.9 Calculation of the Relative Preferences

The relative preference of an attribute can be calculated from its part-worth utility. For an attribute with levels -1 and +1, the individual preference is given as,

$$P_i = n_i \, |\beta_i| \tag{3.4}$$

where $P_i$ is an integer representing the individual preference, $n_i$ is the number of levels of the attribute, $\beta_i$ is the part-worth utility of the attribute and i = 1, 2, ..., n. Using equation 3.4, the preference value for each attribute is computed as an integer and given in Table 3.7.

Table 3.7 Individual preference of the attributes

Attribute           Preference value
Operating System    1
Load capacity       4
Memory              2

The total variation of all the attributes is computed using the individual preference values of the attributes and is given as,
$$T = \sum_{i=1}^{n} P_i \tag{3.5}$$

where $T$ is the total variation of the attributes and $P_i$ is the variation of the individual attribute. The relative preference of an attribute is derived using the individual preference value $P$ and the total variation $T$. From equations 3.4 and 3.5, the relative preference is given as,

$$RP_i = \frac{P_i}{T} \times 100 \tag{3.6}$$

where $RP_i$ represents the percentage of relative preference of an attribute, $P_i$ is the preference of the attribute, $T$ is the total variation of the attributes and i = 1, 2, 3, ..., n. Using equation 3.6, the relative preferences are calculated as percentages and given in Table 3.8.

Table 3.8 Relative preferences of the attributes

Attribute           Relative preference
Operating system    14%
Load capacity       57%
Memory              29%

3.4 Summary

This chapter illustrated the method of performing conjoint analysis for a scenario having two servers with three attributes. This work used http://www.surveymonkey.com to carry out a survey in which experts ranked the attribute preferences. The questionnaire is presented in Appendix A. Conjoint analysis was carried out on the collected data, and Load capacity was identified as the most preferred attribute of the server. In a similar way, another scenario having four servers with three attributes was also analyzed using conjoint analysis. This scenario forms the basis of the discussions in the subsequent chapters.
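As a supplementary worked example for Section 3.3.9, the relative-preference calculation can be sketched in Python. The sketch assumes that the individual preference of an attribute is the number of its levels multiplied by the absolute value of its part-worth utility, an assumption chosen because it yields the integers of Table 3.7. The names `PART_WORTHS` and `relative_preferences` are hypothetical.

```python
# Part-worth utilities copied from Table 3.6.
PART_WORTHS = {"Operating System": -0.5, "Load capacity": -2.0, "Memory": -1.0}


def relative_preferences(part_worths, levels=2):
    """Return individual preferences, their total, and percentages per attribute."""
    # Assumed form: number of levels times the absolute part-worth utility.
    individual = {name: levels * abs(u) for name, u in part_worths.items()}
    total = sum(individual.values())
    relative = {name: 100 * p / total for name, p in individual.items()}
    return individual, total, relative


individual, total, relative = relative_preferences(PART_WORTHS)
for name in PART_WORTHS:
    print(f"{name}: preference {individual[name]:.0f}, relative {relative[name]:.0f}%")
most_preferred = max(relative, key=relative.get)
print(most_preferred)
```

With the values of Table 3.6 the individual preferences are 1, 4 and 2, the total is 7, and the rounded percentages are 14%, 57% and 29%, so Load capacity is the most preferred attribute, in agreement with Table 3.8.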