A Hierarchical Bayes Model for Ranked Conjoint Data


Prof. Dr. Thorsten Teichert
Edlira Shehu*

Hamburg, June 2005

*Correspondence to: Edlira Shehu, University of Hamburg, Chair for Marketing and Innovation, Von-Melle-Park 5, D Hamburg, Germany

Table of Contents

Abstract
1 Introduction
2 A Model of Ranked Conjoint Data
2.1 Rank inequalities
2.2 Relationship between metric preference functions and rankings
2.3 Relationship between error-free and empirical rankings
2.4 Stochastic model of rank orders
2.5 Probability of an observed ranking
3 Fit of estimation methods with model requirements
3.1 Adequacy of traditional estimation methods
3.2 A Hierarchical Bayes model for rank ordered conjoint data
    Concept
    Estimation and Modelfit
4 Exemplary Simulation Study
    Design
    Estimation and comparison of results
5 Summary and Conclusions
References
Tables
Figures
Footnotes

Abstract

This paper investigates the adequacy of the hierarchical Bayes (HB) model for rank-based conjoint data. While recent research has demonstrated the robustness of the HB model compared to traditional estimation methods for rank-based conjoint models, we conduct an in-depth analysis of the underlying reasons for these findings. To this end, we investigate the fundamental assumptions of rank-based conjoint models in order to explain the information loss inherent in ranking methods. In particular, we expose the ambiguity of rank orders and the resulting ambiguity of part-worth estimates. Propositions that capture the requisite traits postulated by the model of rank-based conjoint data are elaborated. We point out shortfalls of traditional estimation methods, which do not fulfill the requirements of rank-based conjoint data. In contrast, the HB model explicitly involves the stochastic modeling of deviations at the individual and super-population level. Thus we postulate that the HB model is superior to OLS and LINMAP. To investigate this hypothesis, a comparison of estimation accuracy is performed using simulated data. The simulation results show the superiority of the HB model in terms of minimizing the absolute deviations of part-worth estimates from their true values and, consequently, in generating parameters which reflect the true nature of the data.

Key words: Hierarchical Bayes, rank-based conjoint analysis, scale transformation, part-worth estimates.

1 Introduction

Since the early 1970s, conjoint analysis has been a popular method for measuring customers' preference structures. Different types of metric and non-metric scales are used to measure consumers' preferences for various product profiles, which form the basis for estimating the utility functions of the different attributes (Green, Srinivasan, 1990). Among other factors, the elegance of rank-order solutions supported the early diffusion of conjoint analysis (Green, Rao, 1971). It was assumed that more demanding evaluation tasks, e.g. ratings, overtax respondents' capabilities and thus lead to inconsistent answers. Although rating scales are being used more frequently (Wittink et al., 1994), the basic features of ranked conjoint data have not been seriously questioned until now. This paper compares hierarchical Bayes, OLS and LINMAP in their ability to derive part-worth estimates from ranked conjoint data. Recent research has demonstrated the robustness of HB compared to traditional estimation methods (such as OLS and LINMAP) for ranked conjoint data (Park, 2004). We seek to investigate the fundamental reasons for these findings. We apply a model-based approach to assessing rank-based conjoint techniques and investigate their basic characteristics. There seems to be a need for this model-based approach: On the one hand, evaluators can choose between various estimation techniques which rest on different assumptions, ranging from metric to choice interpretations (Chapman, Staelin, 1982) of ranked data. On the other hand, the empirical problems of cross-validating their outcomes are manifold (Bateson et al., 1987), and as yet there is no clear indication of the superiority of one estimation method. Simulation studies (Park, 2004; Darmon, Rouziès, 1994, 1991) show that the relative performance of different estimation methods depends on the conditions of the experimental settings.
Thus, there is a need for an in-depth analysis of the fundamental reasons behind these findings. Such an approach

should generate a reasonable and common basis for more general comparisons and should provide an update on the possibilities and limitations of rank-based conjoint analysis. In the following, the needed explanations are derived by a model-based approach. Several propositions are presented which characterize the model of ranked conjoint data. The loss of information resulting from the use of rank-order data is quantified. The fit of traditionally used evaluation methods is assessed in terms of meeting the model requirements. An alternative HB model is presented, which better utilizes the existing information through stochastic modeling of error terms at the individual and super-population levels. In this way, the HB model optimizes the scale transformation of observed rank orders into metric part-worth estimates. As a validity check, a comparison of the methodological adequacy of commonly used estimation methods and the HB model is carried out using simulated data. Shortcomings are analyzed and avenues for future research are outlined.

2 A Model of Ranked Conjoint Data

Conjoint analyses based on ranked data are characterized by a bi-directional transformation of scale: First, respondents implicitly assess the metric utility values of the stimuli presented. This establishes the basis for the articulated ranking. In a second step, statistical methods are applied to transform the ranking back into a combination of estimated metric part-worth values. Finally, goodness-of-fit is measured at the individual level as the match between the estimated and the observed ranking. This analytical framework of ranked conjoint data is visualized in [Figure 1]. The shadowed areas indicate information which remains hidden from the evaluator. The arrows indicate the sequence of the steps sketched above as undertaken in a conjoint experiment.
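The first half of this transformation, from latent metric utilities to an articulated ranking, can be made concrete with a short sketch. The following Python snippet is purely illustrative; the part-worths are the hypothetical values (+A = 40, +B = 35, +C = 25) of the three-attribute example discussed in Section 2.1, and the Gaussian noise stands in for the stochastic error term:

```python
import itertools
import random

# Hypothetical part-worths for the upper levels of three dichotomous
# attributes; lower levels carry the negated values.
PART_WORTHS = {"A": 40, "B": 35, "C": 25}

def utility(levels):
    """True metric utility of a stimulus, e.g. levels = (-1, -1, 1) for (-A,-B,C)."""
    return sum(sign * w for sign, w in zip(levels, PART_WORTHS.values()))

def articulated_ranking(sigma, rng):
    """Perceived ranking (worst to best): add N(0, sigma) noise to each
    stimulus utility, then sort by the perceived values."""
    stimuli = list(itertools.product((-1, 1), repeat=3))
    perceived = {s: utility(s) + rng.gauss(0, sigma) for s in stimuli}
    return sorted(stimuli, key=lambda s: perceived[s])

rng = random.Random(42)
error_free = articulated_ranking(sigma=0.0, rng=rng)
# With sigma = 0 the ranking mirrors the true utilities exactly:
# least preferred (-A,-B,-C), most preferred (A,B,C).
```

With a positive sigma, repeated calls occasionally reverse the ranks of stimuli with similar utilities, which is exactly the ambiguity analyzed in the remainder of Section 2.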

[Figure 1]

From this it can be seen that traditional estimation methods and their goodness-of-fit measures rely entirely on the rank-order data. This constitutes the basis for potential shortcomings: While the evaluator is interested in achieving estimation accuracy between estimated and true metric part-worth values, he bases his estimations on the goodness-of-fit between observed and estimated rankings. The latter measure may serve as a substitute for the former only if the scale transformation can be unambiguously reversed and if the estimation methods correctly replicate the transformation process of respondents, i.e. the model of ranked conjoint data. Therefore, we investigate the adequacy of estimation methods for ranked conjoint data in terms of their true estimation accuracy. This fit is of central importance for assessing both the adequacy of estimation methods and the information content of goodness-of-fit measures. Ranked conjoint data do not possess a metric anchorage, since they contain only information on relative values. They provide only the order of conjoint profiles and not their absolute positions on an underlying continuum. Furthermore, they are not error-free translations of the true metric utility values but are characterized by a stochastic error term. In the following, the basic patterns of rank inequalities are shown and their scale properties are examined. The ambiguities in relating preference functions to error-free rankings, and those in turn to empirical rankings, are subsequently analyzed. At each stage of the analysis, propositions formulate the requisite traits of evaluation procedures. Together, they specify a model of ranked conjoint data.

2.1 Rank inequalities

Rank-based conjoint analysis does not ask directly for stimulus utilities, but obtains relative statements on the preferability of each stimulus against all others. A metric anchorage

is missing, since rank neighbours only provide lower and upper bounds which are not fixed themselves but depend on the location of consecutive ranks. Thus, in mathematical terms, ranked data are only a set of consecutive inequalities. The ranking information can be segregated into all possible combinations of pairwise comparisons. The resulting set of inequalities contains all information which can be utilized to determine part-worth estimates. A ranking of n stimuli provides n(n-1)/2 inequalities. Most of them are redundant; few determine the limits of the part-worth estimates. For example, assume a full-factorial design with three dichotomous variables A, B and C, hence 2^3 = 8 stimuli, and a fully consistent rank order according to a utility function with part-worth values of +A = 40, +B = 35 and +C = 25 utility points for each upper level and the corresponding negative part-worth values for each lower level. This leads to the following error-free rank order:

R(-A,-B,-C) < R(-A,-B,C) < R(-A,B,-C) < R(A,-B,-C) < R(-A,B,C) < R(A,-B,C) < R(A,B,-C) < R(A,B,C)

From this rank order, a total of 28 pairwise comparisons can be gained, which contain the following non-redundant inequalities:

i. Comparisons of stimulus pairs differing in the level of a single variable, such as: R(-A,-B,-C) < R(-A,-B,C), i.e. -C < +C

ii. Comparisons of stimulus pairs differing in the levels of exactly two variables, such as: R(-A,-B,C) < R(-A,B,-C), i.e. +C < +B; and R(-A,B,C) < R(A,-B,C), i.e. +B < +A

iii. Comparisons of stimulus pairs differing in the levels of all three variables, such as: R(A,-B,-C) < R(-A,B,C), i.e. +A - B < +C

In combination, the ranking provides the following information: (a) 0 < +C < +B < +A and (b) A < B + C. These inequalities determine the part-worth estimates and summarize all information which can be gained from the above rank order. They convey an ambiguous relationship between ranked data and the underlying metric utility

function. This ambiguity is an essential, yet often overlooked, feature of ranked conjoint data. It can only be resolved if the inequalities can be recoded into a system of equations, which requires that the ranked data approximate an interval scale. Thus, a design should select equally spaced stimuli in order to induce interval-scale properties of the ranked input data. Orthogonal designs, however, balance the variable levels and not the differences of stimulus utilities. Coincidence of scales is consequently not inherently guaranteed. This was confirmed by previous analyses showing that ranked data do not necessarily approximate an interval scale when the number of stimuli is increased (Teichert, 1997). Furthermore, the interval-scale properties of stimulus utilities were shown to be affected differently depending on the type of the underlying preference function. This leads us to:

Proposition 1: Evaluation procedures should not blindly follow interval-scale assumptions.

2.2 Relationship between metric preference functions and rankings

Given that ranked data may fail to approximate an interval scale, the inequalities provided by ranked data cannot be unambiguously translated into a set of equations. For sets of inequalities, however, multiple solutions exist. Thus, different preference functions may well lead to the same ranking. This constitutes a well-known problem in operations research (Brockhoff, 1972) that was recognized by early scholars of conjoint analysis (Srinivasan, Shocker, 1973). Since rankings form a set of consecutive inequalities (see above), they provide lower and upper limits for variable estimates. Within these limits, all solutions are equally feasible. Therefore, alternative preference functions form estimation intervals. An estimation interval conditional on the other variable estimates can be assessed by modifying the estimate for a single independent variable as long as no rank reversal occurs. For the example used above, the part-worth values for the variables' upper levels, if normalized to 100 utility points, can vary within A ∈ [35; 49], B ∈ [26; 48] and C ∈ [3; 32]. The more variables and stimuli are included, the more complex comparisons become possible. This limits the ambiguity of the part-worth estimates, but does not lead to single point estimates. The following proposition is derived:

Proposition 2: Evaluation procedures should take the ambiguity of effect estimates into account and thus should provide estimation intervals.

2.3 Relationship between error-free and empirical rankings

Empirical rank data are not error-free translations of the true metric utility values but are characterized by a stochastic error term. Thus it is unlikely that the observed empirical rankings are identical with the true, error-free rankings. The observed ranking, however, is the only information provided by respondents. Both the number and the location of rank reversals are unknown and remain implicit within the estimation procedure. This constitutes a severe problem, because an observed ranking may be the result of different errors applied to different true rankings and thus to different underlying utility functions. For example, an observed faulty ranking of a true utility function may be identical with the error-free ranking of another (and thus "false") utility function. This point is elaborated through a simulation of the two most likely errors, i.e. the reversals between those ranks with the most similar metric utility values.¹ [Table 1] shows the (non-representative) findings of this assessment. It can be seen that the ability to detect even simple rank reversals is by no means guaranteed and that it varies largely. This can be explained by the (non-)existence of redundant information: A rank reversal remains hidden when it concerns the only inequality which provides the specific information. Reduced designs possess few degrees of freedom and consequently provide no redundant information. Accordingly, the ability to

detect rank reversals remains low even if more than one rank reversal occurs. In contrast, extended designs provide multiple, redundant pieces of information.

[Table 1]

Empirical data very often contain more than one rank reversal. Thus, inconsistencies are likely to remain unresolved. In fact, a Kendall's τ of 0.8 is often seen as sufficient to classify observed rankings as internally valid, which corresponds to 3 identified reversals of rank pairs in the case of 8 stimuli and 12 in the case of 16 stimuli. The larger the irresolvable inconsistencies, the less determined the estimate becomes, because the location of the reversals remains ambiguous. Furthermore, a misleading interpretation of observed rankings becomes more likely as the number of possible rank reversals increases. It can be concluded that it is unlikely that all rank reversals are accurately identified. This will inevitably lead to defects in both the estimation procedure and the applied goodness-of-fit measures when they rely on the fit between estimated and observed rankings. Based on the available information, an incorrect utility function might be estimated and even a perfect fit might be falsely diagnosed.

Proposition 3: Evaluation procedures should take into account the inability to detect rank reversals. Therefore, they should not utilize the fit between estimated and observed ranking as the only criterion for assessing estimation outcomes.

2.4 Stochastic model of rank orders

According to the basic idea of conjoint analysis, respondents should rank stimuli using a holistic and comparative assessment of stimulus utilities in accordance with their true utility function. Thus, it can be assumed that respondents assess stimulus utilities holistically with an error term centered on their true means and subsequently articulate their ranking in accordance with these judgments. This can be specified by adding a normally distributed error term to the

metric stimulus utility values. Other distributional patterns are conceivable; however, the normal distribution is in line with a comparable model for rating scales (Srinivasan, Basu, 1989) and can build on the arguments elaborated therein. [Figure 2] provides an illustration of this error model, which has been used for practical purposes in many simulation analyses of rank-order conjoint data (e.g. Darmon, Rouziès, 1994, 1991; Umesh, Mishra, 1990; Wittink, Cattin, 1981; Carmone, Green, 1981). It shows that the probability of observing a rank reversal can be assessed by calculating the integral of overlapping normal distributions. Given that the probability distributions of more than two neighbouring stimuli overlap, a joint probability integral has to be calculated. This dependence of probabilities is a major difference between ranked conjoint data and rating data, for which the assessment of one stimulus is independent of all others (Srinivasan, Basu, 1989).

[Figure 2]

Different assumptions about the decision-making process and the error term underlie the Luce and Suppes Ranking Choice Theorem (Luce, Suppes, 1965), which treats a ranking as a sequence of independent choice tasks. This view justifies applying the estimation method of rank explosion (Chapman, Staelin, 1982): It is assumed that respondents rank all stimuli from top to bottom by repeatedly choosing the most preferred stimulus. The chosen stimulus is excluded and the ranking/choice procedure is repeated from scratch for the remaining stimuli; this implies a new evaluation of the stimuli after each ranking/choice decision. In contrast, our model leads to dependent choice tasks: Given that we observe in [Figure 2] a reversal of ranks 5 and 6, then stimulus 5 (6) is likely to be consistently over- (under-)valued in the following pairwise comparisons as well.
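The behavioral difference between the two error models can be stated precisely in code. The following sketch is illustrative (the utilities and names are assumptions, not values from the paper): it contrasts a single noisy evaluation per stimulus followed by one sort, the dependent model favoured here, with the Luce and Suppes view of repeated choice rounds with fresh noise each time:

```python
import random

def rank_once(utilities, sigma, rng):
    """Dependent model: one perceived utility per stimulus, then a single
    sort (best first). A stimulus mis-perceived once stays mis-ranked
    relative to all others."""
    perceived = {s: u + rng.gauss(0, sigma) for s, u in utilities.items()}
    return sorted(perceived, key=perceived.get, reverse=True)

def rank_explosion(utilities, sigma, rng):
    """Luce/Suppes view: repeatedly choose the best of the remaining
    stimuli, re-drawing the noise in every round."""
    remaining = dict(utilities)
    ranking = []
    while remaining:
        noisy = {s: u + rng.gauss(0, sigma) for s, u in remaining.items()}
        best = max(noisy, key=noisy.get)
        ranking.append(best)
        del remaining[best]
    return ranking

rng = random.Random(7)
utils = {"s1": 10.0, "s2": 11.0, "s3": 30.0}
# Without error both models reproduce the true order s3, s2, s1.
assert rank_once(utils, 0.0, rng) == rank_explosion(utils, 0.0, rng)
```

With sigma > 0 the two functions diverge: in `rank_once` a single mis-perception propagates consistently through the whole ranking, whereas `rank_explosion` re-randomizes every round.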

Comparing both models, the Luce and Suppes approach seems less adequate for describing respondents' behavior. It would require respondents to reassess subsets of the n stimuli over and over again, leading to n(n+1)/2 - 1 comparisons, while simultaneously forgetting about possible misperceptions in the former stages and neglecting already ranked stimuli. This regards respondents as very persistent and extremely forgetful at the same time. In addition, respondents would neither be allowed to pre-sort the stimuli nor to re-evaluate their ranking, despite the fact that both aspects are often encouraged in interviewers' guidelines in order to enhance the reliability of answers. The normally distributed model better reflects the hypothesized process of respondents' assessments. In fact, [Figure 2] can be viewed as an analogy to an efficient assessment process: The locations of the stimulus probability distributions may be seen as the outcome of pre-sorting the stimuli on a continuous line. The respondent then refines his judgment by comparing the utilities of neighbouring stimuli, considering the better and the worse alternatives. Each stimulus is given exactly one perceived utility value, and the stimuli are ranked according to their resulting order on the x-axis. Overall, the model of normally distributed stimulus evaluations is in accordance with the basic idea of conjoint analysis. Thus, the further analysis will be based on this error model.

2.5 Probability of an observed ranking

The normally distributed error model allows us to calculate the probability of observing any possible ranking under the premise of a given metric preference function and a distributional assumption. Given a normally distributed error model and assuming a design with fixed endpoints, rank reversals are to be expected when neighbouring ranks possess similar metric utilities. They are unlikely when neighbouring ranks are highly different. Both the probability of

observing the true ranking and the number of rank reversals depend not only on the size of the error term but also on the metric differences between neighbouring stimuli. Thus, the ranked data by themselves do not contain all information which could be utilized by the evaluator. The stochastic model and the pattern of possibly underlying metric stimulus utilities provide further information.

Proposition 4: Evaluation procedures should relate the ranking information to the probable pattern of underlying metric utilities.

Based on this proposition, evaluation methods should not only aim at replicating the observed ranking, i.e. minimizing the number of rank reversals. Since the observed ranking should be viewed as a sequence of comparisons of equally normally distributed rank pairs, consideration should be given to the joint probability of observing both the correctly estimated rank pairs and the rank reversals. Alternative solutions should be valued accordingly. An estimate which diagnoses rank reversals of similar stimuli and correctly classifies highly different rank pairs should be preferred to an estimate which diagnoses a cluster of similar stimuli without a rank reversal. The latter estimate is questionable, since similar stimuli are likely to result in some rank reversals, while there is a high probability of obtaining the combination of rank reversals and correctly classified rank pairs from the first solution. To sum up, evaluation procedures should apply a two-sided optimization procedure which maximizes the joint probability of observing the ranking: (a) observing the correct rankings and (b) observing the rank reversals.

Proposition 5: Evaluation procedures should be based on the total probability of observing the actual ranking, that is, the probability of observing the specific combination of rank reversals and correctly ordered rank pairs.
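Proposition 5 can be illustrated numerically. The following Monte Carlo sketch estimates the total probability of observing a given ranking under the normally distributed error model; the utilities and error size are illustrative assumptions, not values from the paper:

```python
import random

def ranking_probability(true_utils, ranking, sigma, draws=20000, seed=1):
    """Monte Carlo estimate of P(observing `ranking` | true utilities, sigma):
    draw one noisy utility per stimulus, sort (worst first), count exact matches."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        noisy = {s: u + rng.gauss(0, sigma) for s, u in true_utils.items()}
        if sorted(noisy, key=noisy.get) == list(ranking):
            hits += 1
    return hits / draws

utils = {"s1": 10.0, "s2": 12.0, "s3": 30.0}   # s1 and s2 similar, s3 distant
p_true = ranking_probability(utils, ["s1", "s2", "s3"], sigma=3.0)
p_near = ranking_probability(utils, ["s2", "s1", "s3"], sigma=3.0)
p_far  = ranking_probability(utils, ["s3", "s2", "s1"], sigma=3.0)
# A reversal of the two similar stimuli (p_near) remains fairly likely,
# while a ranking that misplaces the clearly distinct s3 (p_far) is not.
```

An evaluation procedure that only counts rank reversals treats the near-reversal and the far-fetched ranking alike; weighting solutions by such joint probabilities is what Proposition 5 demands.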

3 Fit of estimation methods with model requirements

Traditional estimation procedures and their goodness-of-fit measures are directed towards replicating the observed ranking through the estimated one, since the observed ranking constitutes the only information provided by respondents. In our analysis, the main emphasis is on investigating the estimation accuracy of the traditional methods and the HB model in terms of fitting the fundamental features and the requisite traits of rank-based conjoint data. Therefore, while most research investigates the adequacy of estimation methods in terms of goodness-of-fit measures, such as predictive validity, we conduct a fundamental analysis focused on the true estimation accuracy of these methods ([Figure 1]).

3.1 Adequacy of traditional estimation methods

In the following, it is analyzed whether or not traditional estimation methods adequately reflect the propositions of the model of ranked conjoint data. Since the Luce and Suppes Ranking Choice Theorem has already been shown not to be in concurrence with the stochastic model applied here, the choice-based evaluation techniques are excluded. Instead, the analyses focus on metric ordinary least squares regression (OLS), the most frequently used evaluation method (Wittink et al., 1994), which is known for its robustness, and on the statistically more adequate, non-metric linear programming approach LINMAP (Shocker, Srinivasan, 1977). These two methods represent the metric and the non-metric evaluation alternatives. The findings can easily be applied to variations of these estimation approaches. First of all, traditional estimation procedures deliver point estimates. Therefore, they contain no information on the size of the estimation interval (which is against proposition 2). The metric estimation technique OLS assumes interval-scaled rank data (which is against proposition 1). The observed ranks are simply interpreted as metric numbers. A

unique solution is achieved by minimizing the squared sum of deviations between estimated and observed metric values. Thus, each rank reversal is valued as equally important (which is against proposition 4), and a rank reversal involving a skipped rank (e.g. a reversal of rank 1 and rank 3) is valued as more important than the sum of two reversals of neighbouring ranks. In minimizing the sum of squared errors, information from rank reversals as well as from correctly ordered rank pairs determines the estimation outcomes (proposition 5). In this regard, the OLS algorithm departs from a mere replication of the observed ranking (proposition 3), but it may falsely diagnose rank reversals even under error-free conditions. LINMAP makes no assumptions about the scale properties of the ranked data (proposition 1). It uses linear optimization to minimize the sum of the metric corrections which have to be added to force the estimated error-free stimulus values into the observed rank order. LINMAP will provide an estimate with no rank reversals as long as such a solution exists, since in this case the necessary metric corrections are zero and thus minimized. Accordingly, it is likely that rank reversals are falsely resolved (which is against proposition 3). Under error-free conditions, LINMAP leads to an estimate which falsely identifies ties in the extended design. In minimizing the metric distance of any rank reversal, LINMAP utilizes the information content of the different distances between rank pairs (proposition 4). However, the distance between two correctly classified stimuli is not analyzed (which is against proposition 5). Thus the algorithm stops as soon as it finds one of the possible solutions. The location of this solution within the estimation interval is arbitrary (which is against proposition 2) and depends on the initial values given to the algorithm.
Accordingly, Srinivasan and Shocker (1973) recognize that alternate optima may exist for their linear programming technique LINMAP and that they could be enumerated. However, there is no evidence of any application of such a procedure.
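The alternate-optima problem can be made concrete by brute force. The sketch below (a hypothetical grid search; unlike the normalized intervals quoted in Section 2.2, no rescaling to 100 utility points is applied) varies the part-worth +A while holding +B = 35 and +C = 25 fixed, and records the values for which the error-free ranking of the 2^3 = 8 stimuli from Section 2.1 is unchanged:

```python
import itertools

def ranking(part_worths):
    """Error-free ranking (worst to best) of all 2^3 stimuli under
    effect coding, i.e. levels in {-1, +1} per attribute."""
    stimuli = list(itertools.product((-1, 1), repeat=3))
    return sorted(stimuli,
                  key=lambda s: sum(x * w for x, w in zip(s, part_worths)))

REFERENCE = ranking((40, 35, 25))  # the rank order of the Section 2.1 example

def interval_for_a(b=35.0, c=25.0):
    """Grid-search half-integer values of +A (the offset avoids exact ties)
    that reproduce REFERENCE while +B and +C stay fixed."""
    candidates = [k + 0.5 for k in range(100)]
    feasible = [a for a in candidates if ranking((a, b, c)) == REFERENCE]
    return min(feasible), max(feasible)

lo, hi = interval_for_a()
# Every +A strictly between 35 and 60 reproduces the observed ranking,
# so a purely rank-based criterion cannot distinguish among these solutions.
```

Any algorithm that stops at the first feasible point, as LINMAP does, picks an arbitrary location inside this interval.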

The comparison between OLS and LINMAP reveals strengths and weaknesses of both methods, which requires a differentiated assessment of their relative performance: OLS wrongly relies on the interval-scale assumption, but correctly bases its evaluation on the entire ranking, i.e. it considers the patterns of metric differences between all rank pairs simultaneously. From these basic differences between the two estimation procedures we can derive suppositions about their relative performance: The interval-scale assumption causes OLS to overvalue the information provided by closely neighbouring ranks. Furthermore, the squared term causes skipped ranks to exert a high impact on the estimation outcomes. Highly distorted estimates are thus to be expected if closely neighbouring rank pairs cause skipped ranks. Accordingly, the assumptions of OLS seem to be best satisfied in the case of small designs and small error terms, as a skipping of closely neighbouring ranks is less likely to occur under these conditions. LINMAP, on the contrary, can be expected to underperform under the above-mentioned conditions. In cases with relatively small numbers of rank reversals, LINMAP provides less determined solutions because it does not fully utilize the information of correctly ordered rank pairs. This leads to an arbitrary choice of a solution within the larger estimation intervals of smaller designs. Furthermore, LINMAP is likely to favour suboptimal solutions which avoid rank reversals but cluster similar stimuli over more realistic solutions which admit rank reversals of similar stimuli and correctly classified stimuli with larger metric differences. To sum up, the estimation procedures, which are based on different propositions about the basic principles of ranked data, use different search algorithms and thus lead to divergent estimates. Thereby, neither of the estimation techniques fulfills all of the required traits

postulated by the model of conjoint data. This hints at the existence of systematic shortcomings of traditional estimation methods. Overall, the evaluator is confronted with a high degree of inherent uncertainty about the unambiguousness of estimates gained from rank-based conjoint analysis. When ranks are employed as the dependent variable, the usual statistical tests and goodness-of-fit measures are not strictly valid (Park, 2004). This clearly shows the need for developing evaluation schemes which better fulfill the propositions of the model of ranked conjoint data: (a) Improved goodness-of-fit measures should reflect the dual nature of conjoint data. They should not rely on the ranking alone, but should encompass the underlying metric stimulus utilities as well. Validity measures based on the normally distributed error model, as in the hierarchical Bayes model, constitute a more rigid test of model validity than commonly used measures. (b) Estimation procedures should be based on the error model of ranked conjoint data and should use information above and beyond simple point estimates. Individual estimates should reflect the inherent ambiguity of part-worth estimates and should provide probabilistic estimation intervals.

3.2 A Hierarchical Bayes model for rank ordered conjoint data

As a solution to the shortfalls of traditional methods, a hierarchical Bayes model for rank-ordered conjoint data is presented. Several studies have already applied this model in the analysis of conjoint experiments (Lenk et al., 1996; Allenby et al., 1995; Allenby, Ginter, 1995). By using the HB model, preference heterogeneity can be taken into account and prior knowledge can be incorporated into the estimation process. Other recent studies investigate the cross-validity of hierarchical Bayes approaches relative to traditional methods, such as OLS or latent segments, for rating-based and choice-based conjoint analysis models (Moore, 2004; Otter et

al., 2004; Andrews et al., 2002; Moore et al., 1998; Allenby et al., 1998; Lenk et al., 1996). In spite of the publication of several articles on the HB model, only one published study has analyzed the predictive validity of the hierarchical Bayes method for rank-based conjoint data (Park, 2004). The results of this study demonstrate the superiority of the hierarchical Bayes model over traditional estimation methods (OLS and LINMAP) in terms of predicted hit rates, while it is less robust in terms of RMSE (Park, 2004). However, this study provides no in-depth analysis of the fundamental reasons behind these findings. We investigate the estimation accuracy of the HB model in terms of fitting the fundamental features and the requisite traits of rank-based conjoint data. The Bayesian estimation technique is therefore considered from the perspective of optimizing the scale transformation process from observed rankings to metric part-worths. First, an HB model for ranked conjoint data is specified, followed by a description of the estimation of the part-worth values.

Concept

Bayesian estimators combine prior information about model parameters with information contained in the data to generate posterior distributions. The prior distributions of the individual part-worth parameters β_mnr of respondent r for the n-th level of attribute m are multivariate normal with means b_mn representing the aggregated estimates of the super-population:

β_mnr ~ N(b_mn, σ_r)

The standard deviation σ_r is drawn from an inverted Gamma distribution with prior values of the distribution parameters α and λ to be determined a priori. By assuming

individual-specific error variances, the model adjusts the individual estimators for differential use of the measurement scale (Lenk et al., 1996):

σ_r ~ IG(α, λ)

By combining the individual part-worth values β_mnr and the dummy variables d_mn(a) of the design matrix for stimulus a, we obtain the deterministic component of the utility, V_ar:

V_ar = Σ_m Σ_n β_mnr · d_mn(a)

The stochastic utilities U_ar are computed by compounding the deterministic utility components with an error term τ, in accordance with classic random utility theory. Information loss and uncertainty are explicitly modelled by the stochastic specification of error terms at both the individual (σ_r) and the super-population level (τ). Therefore, smaller distortions in the scale transformation of rankings into metric part-worth values are expected. The stochastic utilities U_ar are drawn from a multivariate normal distribution with the parameters V_ar and τ as mean values and standard deviations, respectively:

U_ar ~ N(V_ar, τ)

The observed rankings result from a deterministic transformation of the metric data into an ordinal rank order. Only a deterministic transformation is needed, since the error term is already considered in the generation of the metric utilities:

R_ar = Rank(U_ar)

On the basis of these assumptions, the complete hierarchical Bayes model can now be specified as a series of conditional distributions. A visualization of the HB model for rank-based conjoint data is provided in [Figure 3]. Here, deterministically modelled relations are represented by dashed lines and stochastic parameters are shown by arrows.

[Figure 3]

At the bottom of the hierarchy lies the stochastic distribution of the individual utility functions. The individual observations are considered replications of a higher-order utility function of the entire sample. This function is followed by successively higher levels of priors.

3.2 Estimation and Modelfit

The model is estimated by successively generating random draws from the conditional distributions. The analysis of the hierarchical Bayes model is conducted according to the principle of the Gibbs sampler. The basic idea of such methods is to construct a Markov chain and simulate it to obtain a sequence of draws which approximate the joint posterior (Gelfand et al., 1992; Gelfand, Smith, 1990). It has been shown that the conditional stochastic distributions converge to the joint probability distribution of all involved parameters (Gelfand, Smith, 1990). In this way, information concerning the total probability of observing the specific combination of rank reversals and correctly ordered rank pairs is taken into account during estimation (in accordance with proposition 5). First, prior values for the parameters α, λ, τ and b_mn have to be specified. If no prior information about these parameters is available, very diffuse settings can be applied (Allenby, Rossi, 1999). Starting from the defined prior values, the first iteration draws the standard deviations σ_r from the inverse Gamma distribution. The individual part-worth values β_mnr are then successively drawn from the respective multivariate normal distribution. The drawn parameters are used to obtain new values of the stochastic utilities and the rank orders.
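The iteration just described — draw σ_r, then β_mnr, then recompute utilities and rank orders — can be sketched as a Gibbs-style loop. The conditional updates below are simplified conjugate-style stand-ins chosen for illustration; they are not the paper's exact full conditionals, and all sizes and prior values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes: A stimuli, K dummy-coded part-worths, R respondents
A, K, R = 8, 5, 50
alpha, lam, tau = 3.0, 2.0, 1.0           # diffuse prior settings (assumed)
D = rng.integers(0, 2, size=(A, K))       # dummy design matrix
b = np.zeros(K)                           # prior values for b_mn
beta = np.tile(b, (R, 1))                 # initial individual part-worths

n_burn_in = 200
for it in range(n_burn_in):
    # 1) Draw sigma_r^2 from an inverse Gamma updated with squared deviations
    resid = ((beta - b) ** 2).sum(axis=1)
    sigma2 = (lam + 0.5 * resid) / rng.gamma(alpha + 0.5 * K, 1.0, size=R)
    # 2) Draw individual part-worths beta_mnr around the population means
    beta = b + rng.normal(size=(R, K)) * np.sqrt(sigma2)[:, None]
    # 3) Draw new population means b_mn given the individual draws
    b = rng.normal(beta.mean(axis=0), np.sqrt(sigma2.mean() / R))
    # 4) Recompute stochastic utilities and the implied rank orders
    U = beta @ D.T + rng.normal(0.0, tau, size=(R, A))
    ranks = (-U).argsort(axis=1).argsort(axis=1) + 1
```

After the burn-in phase, additional sweeps of the same loop would be retained to approximate the marginal posterior distributions of the parameters of interest.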

The metric utilities serve as the basis for the next iteration of the individual part-worth values (proposition 4). This procedure is repeated until the burn-in number of iterations, and thus convergence of the posterior distribution, is reached. Additional draws are then used to compute the marginal posterior distributions of the parameters of interest. In contrast to the point estimates delivered by the traditional estimation methods, the HB model delivers the whole distribution of the stochastically modelled part-worths at both the individual and the aggregate level (proposition 2). Furthermore, the HB model captures the stochastic nature and the distribution of the parameters in accordance with proposition 3. The problem of traditional goodness-of-fit measures, which tend to maximise the fit between observed and estimated rankings and hence to overestimate the true consistency, is thereby avoided. The estimation procedure is based on the underlying metric utilities and does not use the observed rankings; the latter are generated in the course of the iteration process depending on the metric utility values (proposition 1). This feature fundamentally distinguishes the estimation process of the HB model from that of the traditional estimation methods, which generate the metric utilities on the basis of the observed rankings. To sum up, the HB model fulfills all of the requisite traits postulated by the model of conjoint data. Its application should therefore provide improved estimators that correctly reflect the true state of nature ([Figure 1]).

4 Exemplary Simulation Study

This section presents a simulation conducted to exemplify the findings above. Simulations have the advantage of providing results with greater generalizability than real data. For this reason they have been widely used to assess different aspects of conjoint analysis (Park, 2004; Otter et al., 2004; Darmon, Rouzies, 1999; Leigh et al., 1984; Carmone et al.,

1978). Furthermore, simulations allow benchmarking the estimation outcomes against otherwise unknown true metric utility functions. Given our model-based approach, alternative goodness-of-fit criteria are needed, since the traditional ones rely on rank-based information rather than on the underlying pattern of metric utilities. Specifically, we employ the sum of absolute deviations of the estimated part-worth values from their true metric part-worth values to assess the adequacy of the estimation methods for rank-based conjoint data.

4.1 Design

We used linear-additive utility functions with five dichotomous attributes to construct the design matrices. Two design matrices were obtained from the fractional factorial design based on these product attributes: an orthogonal main-effect design with 8 stimuli and an extended design containing 16 stimuli. A normally distributed error term is added to the error-free preference values in order to simulate variation in the respondents' answers. Since the magnitude of the true error is unknown and context-dependent, the error term is varied over four levels. The range of this variation is selected so that it spans realistic dimensions: comparing the resulting estimated goodness-of-fit measures ([Table 2]) with cut-off levels applied in real-life conjoint applications (e.g. Mullet, Karson, 1986; Sattler, 1994), the highest error level simulates a sample with poor consistency, the medium level a reasonable one, the low level one with high consistency, and the lowest an error-free estimation. We generated three types of preference functions, based on previous analysis (Teichert, 1994), to account for different preference structures: a homogeneous, a heterogeneous and a dominant preference function. Through the combination of these three factors (type of preference function, type of design

and size of the error term), 24 constellations are obtained. For each constellation, 100 different utility functions are generated. The resulting data basis of 2,400 observations is used to assess the adequacy of the estimation techniques.

4.2 Estimation and comparison of results

The estimation methods discussed above (OLS, LINMAP and HB 2) were applied to the simulated utility functions. The results are reported in [Table 2]. The total deviations of the estimated part-worths from their true values are reported for each estimation technique (see above). In the lower part of the table, a comparative index is calculated as an indicator of the relative estimation precision.

[Table 2]

As expected, OLS performs best for small designs and small error terms. The HB model, however, shows lower deviations than OLS regardless of design size and error level. The positive values of the relative estimation adequacy over the whole range of simulated utility functions indicate the robustness of the HB model compared to OLS. This robustness is especially evident for extended designs with higher error levels, as can be seen in the increase of the relative adequacy from 3% for the main-effect design with a medium error term (ND(0,10)) up to 15% for the extended design at the same error level. LINMAP benefits from the increased number of profiles, since its estimation precision improves with the extended design 3. The relative performance of HB compared to LINMAP is ambivalent: while HB performs better for small designs regardless of the error level, LINMAP is superior for extended designs, especially at high error levels 4. It is, however, not uncommon to

observe little improvement of HB when increasing the number of profiles (Park, 2004; Lenk et al., 1996). To sum up, the superior estimation accuracy of the HB model versus OLS is demonstrated regardless of design size and error level. The superiority of the HB model over LINMAP is clearly visible for smaller designs.

5 Summary and Conclusions

This study aimed to provide a methodological frame for the estimation of rank-based conjoint data by defining propositions that adequate estimation techniques should fulfill. A model of ranked conjoint data was developed which provides an in-depth view of the scale properties of ranked data and of the hidden processes in ranked conjoint analyses (see [Figure 1]). Five basic propositions were generated which specify requirements for estimation procedures. Furthermore, the conceptual framework revealed general insights into the possibilities and limitations of rank-based conjoint analysis and provided valuable information on the origin of shortcomings observed in previous studies. Against this model-based background, the adequacy of the traditional methods was investigated. It could be shown that, by and large, the traditional estimation methods fail to meet most of the propositions postulated before. Thus the conceptual foundation of these methods possesses major shortcomings, if the proposed model of ranked conjoint data holds true. As an alternative, an HB model for ranked conjoint data was specified and its adequacy for ranked conjoint methods was shown. In contrast to the traditional estimation procedures introduced above, which tend to maximise the fit between observed and estimated rankings and hence to overestimate the true consistency, the HB model reflects the stochastic nature and the distribution of the parameters. Ambiguity in the scale transformation of rankings in

metric part-worth values should hence be diminished by explicitly modelling information loss and uncertainty. [Table 3] summarizes the findings regarding the fit of the estimation methods with the model of rank-based conjoint data. It can be seen that the shortfalls of the traditional estimation methods can be avoided by using the HB model, which meets all the requirements of rank-based conjoint models.

[Table 3]

These findings were validated by applying the traditional methods (OLS and LINMAP) and the presented HB model to simulated data. The methods were compared in terms of the accuracy of the estimated parameters, using the total deviation between the estimated and the true part-worths as the criterion. LINMAP and OLS possess in many respects complementary strengths and weaknesses. Accordingly, the analyses revealed a changing superiority: OLS performed better for smaller designs and smaller error terms, while LINMAP outperformed OLS for larger designs and larger error terms. The simulation results confirmed the expected superiority of the HB model in estimating part-worth parameters which reflect the true nature of the data. HB performed better than OLS regardless of design size and error level. Its superiority over LINMAP could be clearly shown for small designs. These findings correspond to those of previous studies, which have already demonstrated the robustness of the HB model for small designs (Lenk et al., 1996; Allenby et al., 1995). Relative shortfalls of the Bayesian approach are expected for extended designs. A topic of future research is therefore the implementation of more extensive simulations over a wider parameter space for large designs. An in-depth analysis of the

performance of the HB model for rank-based conjoint data through empirical application to real data is a further avenue for future research. In this paper we investigated the adequacy of a basic HB model for the estimation of rank-based conjoint data. The HB model leaves room for modelling additional, more complex effects thanks to the high flexibility of each hierarchical level: alternative distributions, such as the Gumbel distribution, can be employed for the error terms. The effect of lower response variability for extreme rank levels and higher response variability for interior ranks (an inverse U-shaped pattern) can, for example, be modelled by specifying different standard deviations of the error terms at the individual level. Likewise, prior information on the differing consistency of respondents can be incorporated into the estimation process by modelling covariates at the individual level and adequately specifying the individual-level error standard deviations. Improved part-worth estimates would thus be obtained. To conclude, the application of the HB model to rank-based conjoint data bears a high potential for generating estimators which correctly reflect the true nature of the data.
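The extensions mentioned above typically amount to small changes in the generative step of the model. As a hedged sketch under assumed names and parameters (this is an illustration of the idea, not the paper's specification), replacing the normal utility error with a Gumbel error and making individual error scales rank-dependent could look like this:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sizes: A stimuli, K dummy-coded part-worths, R respondents
A, K, R = 8, 5, 100
tau = 1.0
D = rng.integers(0, 2, size=(A, K))
beta = rng.normal(size=(R, K))

V = beta @ D.T                                   # deterministic utilities

# Extension 1: Gumbel-distributed utility errors instead of normal ones
U = V + rng.gumbel(0.0, tau, size=V.shape)
ranks = (-U).argsort(axis=1).argsort(axis=1) + 1

# Extension 2: rank-dependent error scales -- smaller variability at the
# extreme ranks, larger for interior ranks (inverse U-shaped pattern)
scale_by_rank = 1.0 - np.abs(ranks - (A + 1) / 2.0) / A
U2 = V + rng.gumbel(0.0, tau, size=V.shape) * scale_by_rank
```

In a full model these modified error terms would replace the normal draw in the utility-generation step, leaving the rest of the hierarchy unchanged.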

References

Allenby, G. M., Arora, N., & Ginter, J. L. (1998): On the Heterogeneity of Demand. Journal of Marketing Research, 35(3).
Allenby, G. M., Arora, N., & Ginter, J. L. (1995): Incorporating Prior Knowledge into the Analysis of Conjoint Studies. Journal of Marketing Research, 32.
Allenby, G. M., & Ginter, J. L. (1995): Using Extremes to Design Products and Segment Markets. Journal of Marketing Research, 32.
Allenby, G. M., & Rossi, P. E. (1999): Marketing Models of Consumer Heterogeneity. Journal of Econometrics, 89.
Andrews, R. L., Ansari, A., & Currim, I. S. (2002): Hierarchical Bayes versus Finite Mixture Conjoint Analysis Models: A Comparison of Fit, Prediction and Partworth Recovery. Journal of Marketing Research, 39(1).
Bateson, J., Reibstein, D., & Boulding, W. (1987): Conjoint Analysis Reliability and Validity: A Framework for Future Research. In: Houston, M. (Ed.), Review of Marketing. Chicago, IL: American Marketing Association.
Brockhoff, K. (1972): On Determining Relative Values. Zeitschrift für Operations Research, 16.
Carmone, F. J., & Green, P. E. (1981): Model Misspecification in Multiattribute Parameter Estimation. Journal of Marketing Research, 18.
Carmone, F. J., Green, P. E., & Jain, A. K. (1978): Robustness of Conjoint Analysis: Some Monte Carlo Results. Journal of Marketing Research, 15.
Chapman, R. G., & Staelin, R. (1982): Exploiting Rank Ordered Choice Set Data within the Stochastic Utility Model. Journal of Marketing Research, 19.
Darmon, R., & Rouziès, D. (1991): Internal Validity Assessment of Conjoint Estimated Attribute Importance Weights. Journal of the Academy of Marketing Science, 19(4).
Darmon, R., & Rouziès, D. (1994): Reliability and Internal Validity of Conjoint Estimated Utility Functions under Error-Free versus Error-Full Conditions. International Journal of Research in Marketing, 11.
Darmon, R., & Rouziès, D. (1999): Internal Validity of Conjoint Analysis Under Alternative Measurement Procedures. Journal of Business Research, 46.

Gelfand, A. E., & Smith, A. F. (1990): Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association, 85.
Gelfand, A. E., Smith, A. F., & Lee, T. M. (1992): Bayesian Analysis of Constrained Parameter and Truncated Data Problems Using Gibbs Sampling. Journal of the American Statistical Association, 87.
Green, P. E., & Rao, V. R. (1971): Conjoint Measurement for Quantifying Judgmental Data. Journal of Marketing Research, 8.
Green, P. E., & Srinivasan, V. (1990): Conjoint Analysis in Marketing: New Developments With Implications for Research and Practice. Journal of Marketing.
Hensher, D. (1994): Stated Preference Analysis of Travel Choices: The State of Practice. Transportation, 21.
Leigh, T. W., MacKay, D. B., & Summers, J. O. (1984): Reliability and Validity of Conjoint Analysis and Self-Explicated Weights: A Comparison. Journal of Marketing Research.
Lenk, P. J., DeSarbo, W. S., Green, P. E., & Young, M. R. (1996): Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs. Marketing Science, 15(2).
Luce, R. D., & Suppes, P. (1965): Preference, Utility, and Subjective Probability. In: Luce, R. D., Bush, R. R., & Galanter, E. (Eds.), Handbook of Mathematical Psychology, Volume 3. New York: John Wiley & Sons.
Moore, W. L. (2004): A Cross-Validity Comparison of Rating-Based and Choice-Based Conjoint Analysis Models. International Journal of Research in Marketing, 21.
Moore, W. L., Gray-Lee, J., & Louviere, J. J. (1998): A Cross-Validity Comparison of Conjoint Analysis and Choice Models at Different Levels of Aggregation. Marketing Letters, 9(2).
Mullet, G. M., & Karson, M. J. (1986): Percentiles of LINMAP Conjoint Indices of Fit for Various Orthogonal Arrays: A Simulation Study. Journal of Marketing Research, 23.
Otter, T., Tüchler, R., & Frühwirth-Schnatter, S. (2004): Capturing Consumer Heterogeneity in Metric Conjoint Analysis Using Bayesian Mixture Models. International Journal of Research in Marketing, 21.
Park, C. S. (2004): The Robustness of Hierarchical Bayes Conjoint Analysis Under Alternative Measurement Scales. Journal of Business Research, 57.
Sattler, H. (1994): Die Validität von Produkttests [The validity of product tests]. Marketing ZFP, 1.