PREFACE. The data files referred to in this text are all available on the student web site as part of this module.

M J XAVIER

This data analysis module was developed by Professor M.J. Xavier in conjunction with the textbook authors for the basis of class discussion rather than to illustrate either effective or ineffective marketing practice. Copyright 2004 by Houghton Mifflin Company. All rights reserved. Houghton Mifflin Company hereby grants you permission to print the Houghton Mifflin material contained in this work solely for use with the accompanying Houghton Mifflin textbook. All reproductions must include the Houghton Mifflin copyright notice, and no fee may be collected except to cover the cost of duplication. If you wish to make any other use of this material, including reproducing or transmitting the material or portions thereof in any form or by any electronic or mechanical means including any information storage or retrieval system, you must obtain prior written permission from Houghton Mifflin Company, unless such use is expressly permitted by federal copyright law. If you wish to reproduce material acknowledging a rights holder other than Houghton Mifflin Company, you must obtain permission from the rights holder. Address inquiries to College Permissions, Houghton Mifflin Company, 222 Berkeley Street, Boston, MA 02116-3764.

PREFACE

This practical guide on data analysis has been prepared specifically for business students majoring in marketing who have an aversion to numbers and statistical methods. The simple step-by-step approach used in the guide should enable students to gain insight into statistical tools and help them develop their skills in interpreting and making meaning out of numbers. The entire range of statistical tools is explained using a single data set from a questionnaire on the toothpaste market. The tools covered range from simple frequencies, mean, median, etc. to multivariate techniques such as factor, cluster and discriminant analysis. The questionnaire, the code sheet and the final report are all given in the appendix.

The first chapter, on simple analytical methods, starts with SPSS data preparation and goes on to explain the use of descriptive statistics to prepare summary results for each question in the survey data. It also highlights the use of charts for displaying data. The second chapter goes into the use of brand rating data for making snake charts, and the positioning of brands using factor analysis. The third chapter introduces the concept of the correlation coefficient and its use for getting the derived importance weights used in the construction of the Kano diagram. Chapter 4 uses the importance scores for benefits to do benefit segmentation using cluster analysis. Chapter 5 introduces correspondence analysis and its use for mapping brand-personality association data. The use of regression analysis in marketing research is explained in Chapter 6, which also explains the problem of multicollinearity and how to tackle it using factor analysis. Chapter 7 describes the use of discriminant analysis to find out brand drivers for different brands. Chapter 8 explains the use of multidimensional scaling for brand positioning. The data files referred to in this text are all available on the student web site as part of this module.
I am grateful to the graduate and undergraduate students who enrolled for the marketing research course during the Fall 2003 term for their co-operation in developing the questionnaire and collection of data. I am grateful to Dr. R Krishnan, Director Graduate Program and Dr. Norm Borin, Marketing Area Chair for their support and encouragement for this project. M J Xavier

CONTENTS

Chapter  Topic
1. Introduction and Simple Analytical Methods
2. Snake Chart, Factor Analysis and Brand Positioning
3. Kano Model
4. Cluster Analysis and Benefit Segmentation
5. Correspondence Analysis
6. Regression Analysis
7. Discriminant Analysis
8. Multidimensional Scaling

APPENDIX
Toothpaste Questionnaire
Code-sheet
Power Point Slides

Chapter 1
Introduction and Simple Analytical Methods

Objectives:
1. To understand Data View and Variable View in an SPSS data file.
2. To understand the difference between String and Numeric variables.
3. To become familiar with Labels and Value Labels.
4. To learn how to get frequencies of variables and get Pie or Bar charts of those frequencies.
5. To learn how to create a transformed variable and understand the difference between the raw variable and the transformed variable.
6. To learn how to calculate the mean, standard deviation, and variance of a variable using SPSS.
7. To understand how to make a cross-tabulation of two nominal variables and use the chi-square test to see whether the relationship between the two variables is significant or not.
8. To learn to use the Compare Means command and learn about the independent t-test.

Data View Versus Variable View:
Open the file 'descriptives.sav'. At the bottom left-hand corner you will see two buttons: DATA VIEW and VARIABLE VIEW. Click on VARIABLE VIEW. You will see the complete definition of each variable.

[Figure: Data View and Variable View]

String Versus Numeric Variables:
Study the variable definitions. Note that Nickname is a STRING VARIABLE. All others are NUMERIC VARIABLES. Go to Data View and see that string variables are made up of letters, or letters and numbers (alphanumeric), while numeric variables are made up of numbers only.

Labels and Value Labels:
Go to Variable View again and study the columns LABEL and VALUES. Click at the right-hand corner of the VALUES cell corresponding to the variable 'class'. A small window will open. These are the codes used for the variable 'class'. Now shift to Data View and then click:

    VIEW
    VALUE LABELS

You will notice that the labels corresponding to the numerical codes appear on the data sheet.

Frequencies and Charts:
Let us first try to understand the profile of respondents.

Let us start with the age profile. Run the following SPSS commands to get the distribution of respondents by age.

    ANALYZE
    DESCRIPTIVE STATISTICS
    FREQUENCIES
    Drag variable 'Age [q12a]' on to the VARIABLE(S) box
    CHARTS
    PIE CHARTS
    PERCENTAGES
    CONTINUE
    OK

Check that you get the following table and pie chart in a new window.

Age
                    Frequency   Percent   Valid Percent   Cumulative Percent
Under 18 years          58        82.9         82.9              82.9
18-24 years             12        17.1         17.1             100.0
Total                   70       100.0        100.0

[Pie chart: Age. Under 18 years 82.9%, 18-24 years 17.1%]
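For readers who want to see what the FREQUENCIES command is doing, the following is a minimal Python sketch, not SPSS itself. The list of answers is rebuilt from the counts in the Age table (58 and 12), and the function name frequency_table is purely illustrative.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, percent, cumulative percent) rows."""
    counts = Counter(values)
    n = sum(counts.values())
    rows = []
    cumulative = 0.0
    # most_common() lists the largest category first
    for category, count in counts.most_common():
        percent = 100.0 * count / n
        cumulative += percent
        rows.append((category, count, round(percent, 1), round(cumulative, 1)))
    return rows

# Answers rebuilt from the counts in the Age table (an assumption about
# how the raw column looks, not the actual data file).
ages = ["Under 18 years"] * 58 + ["18-24 years"] * 12

for row in frequency_table(ages):
    print(row)
```

Each row holds the same four figures that SPSS reports: frequency, percent and cumulative percent for one category.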

Now repeat the analysis with the other demographic variables, namely Household Income, Gender, and Race. You can drag all three variables to the VARIABLE(S) box and have the charts made simultaneously. Now do the frequencies for the other variables: awareness of brands and trial of brands. Then go to the variable Current Brand, change the chart from PIE to BAR, and see if you get the following chart.

[Bar chart: Current Brand. Y-axis: Percent (0 to 40); X-axis: Aqua-Fresh, Crest, Arm & Hammer, Colgate, Mentadent, Others]

Raw Variable Versus Transformed Variables:
Suppose we want to know how many brands a person is aware of on average. We cannot get this directly from the data; we need to create a new variable from the existing ones.

Try the following commands to create a new variable called 'aware', which is derived from other variables.

    TRANSFORM
    COMPUTE
    Type 'aware' in the TARGET VARIABLE box
    Move variable 'q01a' into the NUMERIC EXPRESSION box
    Click on +
    Move variable 'q01b' into the NUMERIC EXPRESSION box
    Click on +
    Move variable 'q01c' into the NUMERIC EXPRESSION box
    Click on +
    Move variable 'q01d' into the NUMERIC EXPRESSION box
    Click on +
    Move variable 'q01e' into the NUMERIC EXPRESSION box
    Click on +
    Move variable 'q01f' into the NUMERIC EXPRESSION box
    Click on +
    Move variable 'q01g' into the NUMERIC EXPRESSION box
    OK

Note that we are forming the numeric expression

    aware = q01a + q01b + q01c + q01d + q01e + q01f + q01g

Notice that SPSS has created a new column called 'aware'. While the original variables are called raw variables, the new one formed out of raw variables is called a transformed variable. Go to Variable View and type 'No. Of Brands Aware' in the LABEL column corresponding to the variable 'aware'. Now perform a frequency analysis on the new variable 'aware' and get a bar chart as shown below.

No. Of Brands Aware
          Frequency   Percent   Valid Percent   Cumulative Percent
2.00           1         1.4          1.4               1.4
3.00           4         5.7          5.7               7.1
4.00          30        42.9         42.9              50.0
5.00          28        40.0         40.0              90.0
6.00           6         8.6          8.6              98.6
7.00           1         1.4          1.4             100.0
Total         70       100.0        100.0
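The COMPUTE step is a row-by-row sum of the seven awareness items. The sketch below illustrates the idea in Python under two assumptions that the text does not state: that each item is coded 1 for "aware" and 0 for "not aware", and that the three respondents shown are made up for illustration. The helper name add_sum_variable is hypothetical.

```python
AWARENESS_ITEMS = ["q01a", "q01b", "q01c", "q01d", "q01e", "q01f", "q01g"]

def add_sum_variable(rows, source_names, target_name):
    """Add target_name to every row as the sum of the source variables."""
    for row in rows:
        row[target_name] = sum(row[name] for name in source_names)
    return rows

# Made-up respondents, assuming 1 = aware of the brand and 0 = not aware.
respondents = [
    dict(zip(AWARENESS_ITEMS, [1, 1, 1, 0, 1, 0, 0])),
    dict(zip(AWARENESS_ITEMS, [1, 1, 1, 1, 1, 1, 0])),
    dict(zip(AWARENESS_ITEMS, [0, 1, 0, 0, 1, 0, 0])),
]

add_sum_variable(respondents, AWARENESS_ITEMS, "aware")
print([row["aware"] for row in respondents])
```

The raw variables stay untouched; the transformed variable is simply an extra column derived from them.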

[Bar chart: No. Of Brands Aware. Y-axis: Percent (0 to 50); X-axis: 2.00 to 7.00]

In the same way, create a new variable called 'trial' (number of brands tried by each person) using the following expression.

    trial = q02a + q02b + q02c + q02d + q02e + q02f + q02g

Then find the frequency distribution of the number of brands tried.

Mean, Standard Deviation and Variance:
Note that the two new transformed variables, namely 'aware' and 'trial', are different from the other variables we have seen earlier. These are ratio-scaled variables, whereas the other variables are only nominally scaled. We shall see how to calculate the mean, standard deviation and variance for ratio-scaled data.
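For reference, the three statistics can also be computed directly from the frequency distribution of 'aware' obtained earlier. This is a minimal Python sketch using the standard library. The individual values are rebuilt from the frequency counts, and the sample (n - 1) formulas are used, which is what SPSS reports.

```python
import statistics

# Frequency distribution of 'aware' (value: number of respondents)
aware_frequencies = {2: 1, 3: 4, 4: 30, 5: 28, 6: 6, 7: 1}

# Expand the distribution back into one value per respondent
aware = [value for value, count in aware_frequencies.items() for _ in range(count)]

mean = statistics.mean(aware)
variance = statistics.variance(aware)   # sample variance, divides by n - 1
std_dev = statistics.stdev(aware)       # square root of the sample variance

print(len(aware), round(mean, 4), round(std_dev, 4), round(variance, 3))
```

The standard deviation is the square root of the variance, so only one of the two needs to be remembered.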

Try the following SPSS commands.

    ANALYZE
    DESCRIPTIVE STATISTICS
    DESCRIPTIVES
    Drag the variable 'aware' to the VARIABLE(S) box
    OPTIONS
    VARIANCE
    CONTINUE
    OK

You will get the following output.

Descriptive Statistics
                        N    Minimum   Maximum    Mean    Std. Deviation   Variance
No. Of Brands Aware     70     2.00      7.00    4.5286       .84650         .717
Valid N (listwise)      70

This shows that on average a respondent is aware of 4.53 brands and that the standard deviation of the same variable is 0.85.

Crosstabs and Chi-Square Test:
Suppose we want to know whether the use of a particular brand depends on whether the person is male or female. For this we need the type of analysis called cross-tabulation. Crosstabs is used to explore the relationship between two nominal or categorical variables. Try the following SPSS commands.

    ANALYZE
    DESCRIPTIVE STATISTICS
    CROSSTABS
    Drag Gender to ROW(S)
    Drag Current Brand to COLUMN(S)
    CELLS
    ROW
    CONTINUE
    STATISTICS
    CHI-SQUARE
    CONTINUE
    OK

Gender * Current Brand Cross-tabulation

                                            Current Brand
Gender                      Aqua-Fresh  Colgate   Crest   Mentadent  Arm & Hammer  Others    Total
Male     Count                   7         9        8         3           3           5        35
         % within Gender       20.0%     25.7%    22.9%      8.6%        8.6%       14.3%    100.0%
Female   Count                   3        11       17         3           0           1        35
         % within Gender        8.6%     31.4%    48.6%      8.6%         .0%        2.9%    100.0%
Total    Count                  10        20       25         6           3           6        70
         % within Gender       14.3%     28.6%    35.7%      8.6%        4.3%        8.6%    100.0%

It is a convention to keep the independent variable in the rows and the dependent variable in the columns, and to get row percentages in the cells. In this case, we are trying to explore whether gender has an impact on brand choice. To interpret the table, always look column-wise and see if the percentages vary drastically. In the Aqua-Fresh column, there is a larger percentage of males. Colgate has a marginally larger percentage of females. Crest has a substantially larger percentage of females than males. Mentadent has an equal following among males and females. Arm & Hammer appears to be an exclusively male brand. There appears to be some relationship between gender and brand used. To check whether the relationship is significant or not, we need to look at the chi-square value.

Chi-Square Tests
                                  Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square              10.707(a)     5          .058
Likelihood Ratio                12.230        5          .032
Linear-by-Linear Association     1.452        1          .228
N of Valid Cases                    70
a. 6 cells (50.0%) have expected count less than 5. The minimum expected count is 1.50.

A chi-square value of 10.707 at 5 degrees of freedom has a significance of 0.058, i.e. a confidence level of 94.2%. Normally we look for a confidence level of 95% or more (a significance of 0.05 or less). Strictly speaking, the relationship just misses this cut-off; as it is close to 95%, we can treat it as marginally significant. Note, however, that the chi-square test expects the expected cell frequencies to be 5 or more. Here 6 cells (50.0%) have expected counts below 5, so the result should be read with caution.
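The Pearson chi-square value can be reproduced by hand from the counts in the cross-tabulation. The following Python sketch does so. The expected count for a cell is (row total x column total) / N, and the statistic sums (observed - expected) squared / expected over all cells. The function name pearson_chi_square is illustrative. The significance value is not computed here because that needs the chi-square distribution, which is not part of the Python standard library.

```python
# Observed counts from the Gender * Current Brand table
# (columns: Aqua-Fresh, Colgate, Crest, Mentadent, Arm & Hammer, Others)
observed = {
    "Male":   [7, 9, 8, 3, 3, 5],
    "Female": [3, 11, 17, 3, 0, 1],
}

def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, list of expected counts)."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi_square = 0.0
    expected_counts = []
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            expected_counts.append(expected)
            chi_square += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected_counts

chi_square, df, expected_counts = pearson_chi_square(observed)
small_cells = sum(1 for e in expected_counts if e < 5)
print(round(chi_square, 3), df, small_cells, min(expected_counts))
```

The count of cells with an expected value below 5 and the minimum expected count correspond to footnote (a) of the SPSS output.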

The same rule of a significance level of 0.05 or less applies to all the tests that we are going to learn, be it the t-test, the F-test or any other test.

    Degrees of freedom = (no. of rows - 1) x (no. of columns - 1)

For the Gender by Current Brand table this gives (2 - 1) x (6 - 1) = 5. In the same way, construct cross-tabs for age, income, and race against Current Brand and check whether the relationships are significant using the chi-square values.

Compare Means:
Suppose we want to know whether the mean number of brands a respondent is aware of differs between males and females. We could use the following commands.

    ANALYZE
    COMPARE MEANS
    MEANS
    Drag variable 'aware' to DEPENDENT LIST
    Drag variable 'gender' to INDEPENDENT LIST
    OK

The following output will be obtained.

Report: No. Of Brands Aware
Gender     Mean      N    Std. Deviation
Male      4.6286    35       1.00252
Female    4.4286    35        .65465
Total     4.5286    70        .84650

4.6 and 4.4 are very close values. The difference between males and females appears to be very marginal. Now do the analysis with the other classification variables: race, income, age and class.

Independent t-test:
Suppose we want to know whether the difference of 0.2 in the number of brands that the male and female populations are aware of is statistically significant. For this we need to conduct a t-test.

Run the following SPSS commands.

    ANALYZE
    COMPARE MEANS
    INDEPENDENT-SAMPLES T TEST
    Drag variable 'aware' to TEST VARIABLE(S)
    Drag variable 'q12c' to GROUPING VARIABLE
    DEFINE GROUPS
    GROUP 1 (Type 1)
    GROUP 2 (Type 2)
    CONTINUE
    OK

Independent Samples Test: No. Of Brands Aware
                                              Equal variances   Equal variances
                                                  assumed         not assumed
Levene's Test for Equality of Variances
    F                                              3.608
    Sig.                                            .062
t-test for Equality of Means
    t                                               .988              .988
    df                                                68            58.535
    Sig. (2-tailed)                                 .327              .327
    Mean Difference                                .2000             .2000
    Std. Error Difference                         .20239            .20239
    95% Confidence Interval of the Difference
        Lower                                    -.20386           -.20504
        Upper                                     .60386            .60504

Our null hypothesis is that the means for males and females are the same. The alternative hypothesis is that the means are significantly different. As we do not know which one should be greater, we use a 2-tailed significance test. Notice that the t-value of 0.988 at 68 degrees of freedom has a significance of only 0.327. The corresponding confidence level is 67.3%, which is too low. Unless the significance level is less than 0.05, the mean values are not significantly different. Note that the degrees of freedom in this case are the number of observations minus two (70 - 2 = 68). Conduct the t-test for the other variables (age, race and income) to see whether the mean number of brands respondents are aware of varies by any of these categories.
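The t-value, standard error and degrees of freedom in the SPSS output can be reproduced from the group means, standard deviations and sample sizes in the Compare Means report. The following Python sketch shows both the pooled (equal variances assumed) and the Welch (equal variances not assumed) versions. The function name and the dictionary keys are illustrative. The significance value is left out because it needs the t distribution, which the standard library does not provide.

```python
import math

def t_test_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistics computed from group means, SDs and sizes."""
    diff = mean1 - mean2
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed (Welch): keep the variance terms separate.
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return {
        "mean_difference": diff,
        "se_pooled": se_pooled,
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }

# Male and female summary figures from the Compare Means report
result = t_test_from_summary(4.6286, 1.00252, 35, 4.4286, 0.65465, 35)
for name, value in result.items():
    print(name, round(value, 4))
```

With equal group sizes the two standard errors coincide, which is why both rows of the SPSS table show the same t-value and differ only in their degrees of freedom.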

Chapter 2
Snake Chart, Factor Analysis and Brand Positioning

Objectives:
1. To understand how to compute mean ratings of brands and to construct snake charts.
2. To learn how to run factor analysis and understand the following concepts:
   - variance explained
   - factor loading
   - eigenvalue
   - communality
   - rotation
   - factor score
3. To use factor analysis for brand positioning.

Data Structure:
Open the file 'factor.sav' and study the file structure. This new file has been created out of the master data file by rearranging the variables q06a01 to q06d11 as indicated below.

[Figure: Original Data. One row per respondent (1 to 70), with four blocks of columns side by side: q06a01...q06a11, q06b01...q06b11, q06c01...q06c11, q06d01...q06d11]

[Figure: Rearranged Data for Further Analysis. The four blocks (q06a01...q06a11, q06b01...q06b11, q06c01...q06c11, q06d01...q06d11) are stacked one below the other, with a Brand Code column in front]

Note that blank rows have been deleted in the new file and a new variable, brand code, has been created.

Snake Chart:
We need to calculate the mean ratings of brands before we can construct the snake chart. Use the following commands to obtain the mean ratings.

    ANALYZE
    COMPARE MEANS
    MEANS
    Highlight and drag 'q06_01' to 'q06_10' to DEPENDENT LIST
    Highlight 'brand' and drag to INDEPENDENT LIST
    OPTIONS
    Uncheck NUMBER OF CASES
    Uncheck STANDARD DEVIATION
    CONTINUE
    OK

Mean Ratings for Brands
                        Aquafresh   Colgate   Crest   Mentadent   Arm & Hammer   Others
Fighting Cavities          7.2        8.1      8.4       9.3          9.0          8.6
Whitening Teeth            6.1        6.9      7.6       8.1          8.3          6.2
Cleaning Stains            6.7        7.5      7.9       8.9          8.3          6.8
Good Taste                 6.8        6.9      8.2       9.3          6.0          9.0
Likeable Flavor            6.9        7.0      8.0       9.6          5.0          8.8
Freshening Breath          7.4        7.7      8.3       9.6          8.7          9.4
Brand Image                6.6        7.9      8.7       7.9          7.3          5.8
Attractive Packaging       7.0        7.0      7.8       8.7          7.7          7.2
Innovative features        7.1        7.0      7.8       8.3          7.3          6.0
Color                      6.4        6.8      7.4       7.4          8.0          8.4

(In the SPSS output the brands appear as rows and the attributes as columns; the table is transposed here to fit the page.)

Right-click on the table in the SPSS output, copy it and paste it onto an Excel worksheet. Highlight the relevant portions and click Chart Wizard. Choose Line, press Next and then Finish to get the following snake chart.

[Snake chart: mean rating (0.0 to 12.0) of each brand (Aqua-Fresh, Colgate, Crest, Mentadent, Arm & Hammer, Others) on the attributes Fighting Cavities, Whitening Teeth, Cleaning Stains/Tartar, Good Taste, Likeable Flavor, Freshening Breath, Brand Image, Color, Attractive Packaging, Innovative features/ingredients]

This chart can be used to study the relative positioning of brands on different attributes. We can see that Mentadent and Arm & Hammer are rated highly on selected attributes, while Crest scores consistently high ratings on all the attributes. Aquafresh has lower ratings and Colgate is stuck in the middle. Here the points are very cluttered and it is difficult to see finer distinctions. Factor analysis will help us do sharper positioning.
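The mean ratings behind the snake chart are group means: every rating row in the rearranged file carries a brand code, and the ratings are averaged within each brand. The Python sketch below illustrates this with a handful of made-up rows and only two attributes. The variable names follow the file ('brand', 'q06_01', 'q06_02'), but the values are hypothetical and are not taken from factor.sav.

```python
from collections import defaultdict

def mean_ratings_by_brand(rows, attributes):
    """Average each attribute within each brand, rounded to one decimal."""
    totals = defaultdict(lambda: [0.0] * len(attributes))
    counts = defaultdict(int)
    for row in rows:
        counts[row["brand"]] += 1
        for i, attribute in enumerate(attributes):
            totals[row["brand"]][i] += row[attribute]
    return {
        brand: [round(total / counts[brand], 1) for total in totals[brand]]
        for brand in totals
    }

# Hypothetical rating rows in the rearranged (one row per rating) layout
ratings = [
    {"brand": "Crest",   "q06_01": 8, "q06_02": 7},
    {"brand": "Crest",   "q06_01": 9, "q06_02": 8},
    {"brand": "Colgate", "q06_01": 8, "q06_02": 6},
]

means = mean_ratings_by_brand(ratings, ["q06_01", "q06_02"])
print(means)
```

Plotting one line per brand across the attributes, as the Excel Chart Wizard does, then gives the snake chart.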

Factor Analysis:
Factor analysis is used to understand the underlying dimensions of a set of variables that are highly correlated among themselves. Execute the following commands to get the factor analysis output.

    ANALYZE
    DATA REDUCTION
    FACTOR
    Highlight and drag 'q06_01' to 'q06_10' to VARIABLES
    DESCRIPTIVES
    Check COEFFICIENTS
    CONTINUE
    EXTRACTION
    SCREE PLOT
    CONTINUE
    ROTATION
    VARIMAX
    CONTINUE
    SCORES
    SAVE AS VARIABLES
    CONTINUE
    OPTIONS
    SORTED BY SIZE
    CONTINUE
    OK

Take a look at the Correlation Matrix and notice that the variables are correlated among themselves. For example, the correlation between good taste and likeable flavor is as high as 0.928. That the correlations are sufficient for conducting a factor analysis is confirmed by Bartlett's Test of Sphericity, which is significant.

Correlation Matrix
                                  (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)    (10)
 (1) Fighting Cavities           1.000    .385    .610    .221    .203    .450    .367    .142    .210    .228
 (2) Whitening Teeth              .385   1.000    .613    .278    .223    .403    .354    .198    .249    .501
 (3) Cleaning Stains/Tartar       .610    .613   1.000    .218    .161    .470    .461    .188    .314    .488
 (4) Good Taste                   .221    .278    .218   1.000    .928    .475    .313    .436    .303    .366
 (5) Likeable Flavor              .203    .223    .161    .928   1.000    .481    .276    .436    .300    .318
 (6) Freshening Breath            .450    .403    .470    .475    .481   1.000    .384    .349    .358    .390
 (7) Brand Image                  .367    .354    .461    .313    .276    .384   1.000    .576    .624    .373
 (8) Color                        .142    .198    .188    .436    .436    .349    .576   1.000    .650    .419
 (9) Attractive Packaging         .210    .249    .314    .303    .300    .358    .624    .650   1.000    .413
(10) Innovative features          .228    .501    .488    .366    .318    .390    .373    .419    .413   1.000

Now take a look at the variance explained matrix.

Total Variance Explained
                 Initial Eigenvalues                     Rotation Sums of Squared Loadings
Component    Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
    1        4.438       44.378         44.378        2.651       26.512         26.512
    2        1.577       15.773         60.152        2.392       23.915         50.428
    3        1.239       12.391         72.543        2.211       22.115         72.543
    4         .804        8.043         80.585
    5         .500        5.004         85.590
    6         .440        4.399         89.989
    7         .343        3.432         93.421
    8         .329        3.291         96.711
    9         .261        2.613         99.324
   10         .068         .676        100.000

Read "component" as "factor" in the table. Technically, the 10 original variables can be converted into 10 new factors which are orthogonal to each other (i.e., have zero correlation among them). The first such factor accounts for 44.378 percent of the variance in the original data, the second one accounts for 15.773 percent, and so on. In statistics, variance is information. As 72.543 percent of the information (variance) is summarized by the first three factors, it is enough to work with three factors. We shall see what the eigenvalue and rotation mean later.
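The percentage columns of the variance explained table follow directly from the eigenvalues. Because the analysis works on a correlation matrix, each of the 10 variables contributes one unit of variance, so a factor's share is its eigenvalue divided by 10. The short Python sketch below recomputes the columns from the printed eigenvalues. The function name variance_explained is illustrative, and small differences in the third decimal arise because the eigenvalues are already rounded.

```python
# Initial eigenvalues as printed in the Total Variance Explained table
eigenvalues = [4.438, 1.577, 1.239, 0.804, 0.500, 0.440, 0.343, 0.329, 0.261, 0.068]

def variance_explained(eigenvalues):
    """Return (eigenvalue, % of variance, cumulative %) for each factor."""
    n_variables = len(eigenvalues)
    rows = []
    cumulative = 0.0
    for eigenvalue in eigenvalues:
        percent = 100.0 * eigenvalue / n_variables
        cumulative += percent
        rows.append((eigenvalue, percent, cumulative))
    return rows

rows = variance_explained(eigenvalues)
for number, (eigenvalue, percent, cumulative) in enumerate(rows, start=1):
    print(number, eigenvalue, round(percent, 2), round(cumulative, 2))
```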

Now go to Data View in the SPSS data file. You will notice that three new variables, namely fact1_1, fact2_1 and fact3_1, have been added by the system. The values that these variables take are called factor scores. Basically, the original 10 inter-correlated variables have been converted into 3 new factors which are orthogonal to each other. To check the orthogonality, do the following analysis.

    ANALYZE
    CORRELATE
    BIVARIATE
    Highlight and drag 'fact1_1', 'fact2_1' and 'fact3_1' to VARIABLES
    OK

You will get the following output, which shows that the factors have zero correlation between them.

Correlations
                                                            REGR factor   REGR factor   REGR factor
                                                              score 1       score 2       score 3
REGR factor score 1 for analysis 1   Pearson Correlation       1             .000          .000
                                     Sig. (2-tailed)            .            1.000         1.000
                                     N                         199            199           199
REGR factor score 2 for analysis 1   Pearson Correlation       .000          1             .000
                                     Sig. (2-tailed)          1.000           .            1.000
                                     N                         199            199           199
REGR factor score 3 for analysis 1   Pearson Correlation       .000          .000          1
                                     Sig. (2-tailed)          1.000          1.000          .
                                     N                         199            199           199

Now the problem is to find out what these factors mean. The three new factors summarize the information present in the original ten variables. We need to establish which variables go into which factor. Look at the rotated component matrix.

Rotated Component Matrix (a)
                                          Component
                                      1        2        3
Cleaning Stains/Tartar              .878     .203     .014
Fighting Cavities                   .763     .051     .103
Whitening Teeth                     .758     .150     .126
Freshening Breath                   .550     .220     .484
Innovative features/ingredients     .482     .437     .234
Attractive Packaging                .154     .867     .115
Color                               .013     .830     .323
Brand Image                         .364     .757     .077
Likeable Flavor                     .098     .183     .950
Good Taste                          .152     .195     .933

The cells contain factor loadings, i.e., the correlation coefficients of the original variables with the new factors. To confirm this, conduct a correlation analysis of the first variable, Cleaning Stains/Tartar, with the three new factors (fact1_1, fact2_1 and fact3_1); you will get the first row of the rotated component matrix. The variable 'cleaning stains/tartar' has a correlation coefficient of 0.878 with the first factor, 0.203 with the second factor, and 0.014 with the third factor. This means that the variable 'cleaning stains/tartar' belongs to the first factor. In the same way, the other variables with high loadings in the column for factor 1 (highlighted in the SPSS output) belong to that factor. For the moment, ignore the variable 'innovative features/ingredients', as it has high loadings in two columns. Looking at the variables that go into each factor, we can name the factors Dental Hygiene, Visibility and Sensory Benefits.

Factor 1 (Dental Hygiene)    Factor 2 (Visibility)    Factor 3 (Sensory Benefits)
Cleaning Stains/Tartar       Attractive Packaging     Likeable Flavor
Fighting Cavities            Color                    Good Taste
Whitening Teeth              Brand Image
Freshening Breath

The variable 'innovative features/ingredients' has a high correlation with Dental Hygiene as well as with Visibility. Suppose a brand claims in its advertisements that it has a new ingredient that whitens the teeth: this contributes to Dental Hygiene as well as to the Visibility of the brand. That is how the variable features in two factors.
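The rule used above, placing each variable in the factor on which it loads most heavily, can be sketched in a few lines of Python. The sketch also sums the squared loadings of each variable across the three factors. This sum is the share of the variable's variance captured by the three factors, which SPSS reports as the communality. The loadings are copied from the rotated component matrix, and the function name describe_variable is illustrative.

```python
FACTOR_NAMES = ("Dental Hygiene", "Visibility", "Sensory Benefits")

# Loadings copied from the rotated component matrix
rotated_loadings = {
    "Cleaning Stains/Tartar":          (0.878, 0.203, 0.014),
    "Fighting Cavities":               (0.763, 0.051, 0.103),
    "Whitening Teeth":                 (0.758, 0.150, 0.126),
    "Freshening Breath":               (0.550, 0.220, 0.484),
    "Innovative features/ingredients": (0.482, 0.437, 0.234),
    "Attractive Packaging":            (0.154, 0.867, 0.115),
    "Color":                           (0.013, 0.830, 0.323),
    "Brand Image":                     (0.364, 0.757, 0.077),
    "Likeable Flavor":                 (0.098, 0.183, 0.950),
    "Good Taste":                      (0.152, 0.195, 0.933),
}

def describe_variable(loadings):
    """Return (factor with the largest absolute loading, sum of squared loadings)."""
    strongest = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    communality = sum(loading ** 2 for loading in loadings)
    return FACTOR_NAMES[strongest], communality

for variable, loadings in rotated_loadings.items():
    factor, communality = describe_variable(loadings)
    print(f"{variable:34s} {factor:18s} {communality:.3f}")
```

A purely mechanical rule puts 'innovative features/ingredients' under Dental Hygiene, but its two largest loadings (.482 and .437) are nearly equal, which is why the text treats it as belonging to two factors.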

Now take a look at the communalities matrix.

Communalities
                                    Initial   Extraction
Fighting Cavities                    1.000       .596
Whitening Teeth                      1.000       .613
Cleaning Stains/Tartar               1.000       .812
Good Taste                           1.000       .931
Likeable Flavor                      1.000       .945
Freshening Breath                    1.000       .585
Brand Image                          1.000       .712
Color                                1.000       .794
Attractive Packaging                 1.000       .789
Innovative features/ingredients      1.000       .478

Communalities refer to the amount of information that has been extracted from each variable. Notice that more than 90 percent of the information (variance) has been extracted from the variables good taste and likeable flavor, whereas less than 50 percent is extracted from innovative features/ingredients. If we work with a large number of variables, say more than 20, it may be a good idea to leave out variables with low communality while naming factors.

In the same way, the eigenvalues are directly proportional to the amount of variance explained by each factor. The sum of all eigenvalues always equals the total number of variables. Hence the proportion of variance explained by each factor can be calculated by dividing the corresponding eigenvalue by the total number of variables. Now take a look at the variance explained table to verify this. As the first eigenvalue is 4.438, the variance explained by the first factor can be calculated by dividing 4.438 by 10 (the total number of variables) and multiplying by 100, which gives 44.38 percent.

Now take a look at the scree plot.

[Figure: Scree Plot. Y-axis: Eigenvalue (0 to 5); X-axis: Component Number (1 to 10)]

The scree plot gives an idea of how many factors to extract. The rule normally applied is to stop where the arm bends. In this case that is at three factors. After three factors the curve gets flat, indicating that the gain will be marginal if we go beyond three factors. The default in SPSS is to stop when the eigenvalue falls below one.

To understand the concept of rotation, take a look at the unrotated component matrix. If we plot factor 2 and factor 3 we get the graph shown below.

[Figure: Unrotated loadings of the ten variables. X-axis: FACTOR2 (-.6 to .8); Y-axis: FACTOR3 (-.6 to .6)]

It is difficult to interpret this type of data, as the cluster of variables likeable flavor and good taste lies midway between factor 2 and factor 3. If we rotate the y-axis so that it passes through this cluster, we can interpret the y-axis as Sensory Benefits. In the same way, the x-axis can be rotated to pass through the cluster that corresponds to Dental Hygiene. Rotation is done to make the output easier to interpret. Note that the angle between the X and Y axes in this illustrative rotation is more than 90 degrees. If the angle is maintained at 90 degrees it is called an orthogonal rotation; otherwise it is known as an oblique rotation. Note that we used Varimax rotation, which is an orthogonal rotation method.

What we have achieved by conducting a factor analysis is that we have converted the original ten variables into 3 new factors. Now we can use these three new variables to do brand positioning. We can bring both the variables and the brands on to the same map.

Brand Positioning Using Factor Scores:
We shall now find the mean ratings of brands on the three new factors. First of all, go to Variable View and label the new factors as:
1. Dental Hygiene
2. Visibility
3. Sensory Benefits

Then compute the mean ratings by executing the following commands.

    ANALYZE
    COMPARE MEANS
    MEANS
    Highlight and drag 'Dental Hygiene', 'Visibility' and 'Sensory Benefits' to DEPENDENT LIST
    Drag 'brand' to INDEPENDENT LIST
    OPTIONS
    Highlight 'Number of Cases' and 'Standard Deviation' and send them back to STATISTICS
    CONTINUE
    OK

You will get the following output.

Brand            Dental Hygiene    Visibility    Sensory Benefits
Aqua-Fresh         -.5622039       -.1069474        -.1044542
Colgate             .0723164       -.0603488        -.2591244
Crest               .3170127        .2179352         .2061569
Mentadent           .7780827        .0560046         .9165143
Arm & Hammer        .9514446        .0165516        -.9228734
Others              .0827401       -.8240165        1.1187287
Total               .0000000        .0000000         .0000000

We already have the coordinates of the variables in the rotated component matrix. Create a combined table, which has the coordinates of both the brands and the attributes, as below.

Brand/Attribute                    Dental Hygiene   Visibility   Sensory Benefits
Aqua-Fresh                             -0.56           -0.11          -0.10
Colgate                                 0.07           -0.06          -0.26
Crest                                   0.32            0.22           0.21
Mentadent                               0.78            0.06           0.92
Arm & Hammer                            0.95            0.02          -0.92
Others                                  0.08           -0.82           1.12
Cleaning Stains/Tartar                  0.88            0.20           0.01
Fighting Cavities                       0.76            0.05           0.10
Whitening Teeth                         0.76            0.15           0.13
Freshening Breath                       0.55            0.22           0.48
Innovative features/ingredients         0.48            0.44           0.23
Attractive Packaging                    0.15            0.87           0.12
Color                                   0.01            0.83           0.32
Brand Image                             0.36            0.76           0.08
Likeable Flavor                         0.10            0.18           0.95
Good Taste                              0.15            0.20           0.93

Using this data, create a new SPSS file, 'factor1.sav'. To get the positioning map of the first two factors, use the following commands.

    GRAPHS
    SCATTER
    SIMPLE
    DEFINE
    Drag 'Dental Hygiene' to X-AXIS
    Drag 'Visibility' to Y-AXIS
    Drag 'Brand/Attribute' to LABEL CASES BY
    OPTIONS
    Check DISPLAY CHART WITH CASE LABELS
    CONTINUE
    OK

The resulting plot can be taken to PowerPoint and annotated as shown below. Note that the attributes are represented as vectors and the brands as points.

[Figure: Positioning map of brands (points) and attributes (vectors). X-axis: Dental Hygiene (-.6 to 1.0); Y-axis: Visibility (-1.0 to 1.0)]

While Arm & Hammer and Mentadent are seen as better on Dental Hygiene, Crest scores better on Visibility. In the same way, get the other two plots, namely Dental Hygiene versus Sensory Benefits and Visibility versus Sensory Benefits.

Chapter 3
Kano Model

Objectives:
1. To understand the basics of the Kano Model.
2. To learn how to calculate derived importance weights for attributes.
3. To learn how to plot the Kano Model and interpret it.

Data Files:
We will be using two different data files for this analysis.
1. cluster.sav
2. factor.sav

Stated Importance:
This is the mean importance rating given to the attributes by the respondents. To calculate the means, open the file 'cluster.sav' and run the following commands.

    ANALYZE
    DESCRIPTIVE STATISTICS
    DESCRIPTIVES
    Highlight and drag variables 'q05a' to 'q05j' to VARIABLE(S)
    OPTIONS
    Check DESCENDING MEANS
    CONTINUE
    OK

Descriptive Statistics
                                               N    Minimum   Maximum    Mean    Std. Deviation
Freshening Breath                              70     4.00      7.00    6.3000       .76802
Fighting Cavities                              70     1.00      7.00    6.1143      1.44004
Cleaning Stains/Tartar                         70     1.00      7.00    5.6714      1.34834
Whitening Teeth                                70     1.00      7.00    5.6429      1.41458
Good Taste                                     70     1.00      7.00    5.3429      1.50252
Likeable Flavor                                70     1.00      7.00    5.1714      1.52250
American Dental Association Recommendation     70     1.00      7.00    4.4857      1.88620
Innovative Feature/new ingredient              70     1.00      7.00    4.0714      1.36543
High Prestige Brand                            70     1.00      7.00    3.4714      1.50093
Attractive Packaging                           70     1.00      7.00    3.4286      1.68171
Valid N (listwise)                             70

You find that Freshening Breath is the most important attribute, with a mean rating of 6.3 on a 1-7 scale. Attractive Packaging is the least important attribute. To convert the means into importance weights, we need to normalize the means. Take the above table to Excel and find the sum of the means. Then calculate:

    Importance Weight = (Mean / Sum of Means) x 100

Develop the following stated importance weights table.

Stated Importance
Attribute                                       Mean    Importance Weight
Freshening Breath                               6.30         12.68
Fighting Cavities                               6.11         12.30
Cleaning Stains/Tartar                          5.67         11.41
Whitening Teeth                                 5.64         11.35
Good Taste                                      5.34         10.75
Likeable Flavor                                 5.17         10.41
American Dental Association Recommendation      4.49          9.03
Innovative Feature/new ingredient               4.07          8.19
High Prestige Brand                             3.47          6.98
Attractive Packaging                            3.43          6.90
Sum                                            49.70        100.00
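The normalization done in Excel amounts to dividing each mean by the sum of the means and multiplying by 100. The following Python sketch applies this to the means from the Descriptive Statistics table. The function name importance_weights is illustrative.

```python
# Mean importance ratings from the Descriptive Statistics table
stated_means = {
    "Freshening Breath": 6.3000,
    "Fighting Cavities": 6.1143,
    "Cleaning Stains/Tartar": 5.6714,
    "Whitening Teeth": 5.6429,
    "Good Taste": 5.3429,
    "Likeable Flavor": 5.1714,
    "American Dental Association Recommendation": 4.4857,
    "Innovative Feature/new ingredient": 4.0714,
    "High Prestige Brand": 3.4714,
    "Attractive Packaging": 3.4286,
}

def importance_weights(scores):
    """Rescale the scores so that they sum to 100."""
    total = sum(scores.values())
    return {name: 100.0 * score / total for name, score in scores.items()}

stated_weights = importance_weights(stated_means)
for name, weight in stated_weights.items():
    print(f"{name:45s} {weight:6.2f}")
```

Because the weights sum to 100, each one can be read as the attribute's percentage share of the total stated importance.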

Derived Importance:
To obtain the derived importance we are going to use the file 'factor.sav'. By correlating the ratings of the attributes with the overall rating, we get the derived importance of the attributes. Use the following commands.

    ANALYZE
    CORRELATE
    BIVARIATE
    Highlight variables 'q06_01' to 'q06_11' and drag to VARIABLES
    OK

You will get an 11 by 11 matrix of correlations. We are interested only in the last column, which has the correlation of each individual attribute with the overall rating. As correlations can range from -1 to +1, take the r² value as the derived importance. Once again, these values can be normalized by dividing each by the sum of all the r² values and multiplying by 100. The resultant table is given below.

Derived Importance Weights
Attribute                             r       r²     Importance Weight
Freshening Breath                    0.65    0.43         13.26
Good Taste                           0.63    0.39         12.22
Likeable Flavor                      0.59    0.35         10.84
Brand Image                          0.59    0.35         10.81
Innovative features/ingredients      0.57    0.33         10.22
Color                                0.57    0.32          9.96
Cleaning Stains/Tartar               0.54    0.29          9.04
Whitening Teeth                      0.53    0.28          8.77
Fighting Cavities                    0.52    0.27          8.46
Attractive Packaging                 0.45    0.21          6.41
Sum                                          3.21        100.00

Next, bring the stated and derived importance weights into a common table, and plot stated importance against derived importance to develop the Kano Model.
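The derived importance weights are obtained by squaring each correlation with the overall rating and then normalizing the squares so that they sum to 100. The following Python sketch starts from the r values as printed in the table. Since these are rounded to two decimals, the resulting weights differ slightly from the printed ones, which were presumably computed from unrounded correlations. The function name derived_weights is illustrative.

```python
# Correlation of each attribute rating with the overall rating,
# as printed (rounded to two decimals) in the table
correlations = {
    "Freshening Breath": 0.65,
    "Good Taste": 0.63,
    "Likeable Flavor": 0.59,
    "Brand Image": 0.59,
    "Innovative features/ingredients": 0.57,
    "Color": 0.57,
    "Cleaning Stains/Tartar": 0.54,
    "Whitening Teeth": 0.53,
    "Fighting Cavities": 0.52,
    "Attractive Packaging": 0.45,
}

def derived_weights(correlations):
    """Square each correlation and rescale the squares to sum to 100."""
    squared = {name: r ** 2 for name, r in correlations.items()}
    total = sum(squared.values())
    return {name: 100.0 * value / total for name, value in squared.items()}

weights = derived_weights(correlations)
for name, weight in weights.items():
    print(f"{name:34s} {weight:6.2f}")
```

Squaring removes the sign of the correlation, so an attribute with a strong negative association would count as just as important as one with an equally strong positive association.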

Stated Versus Derived Importance
Attribute                             Stated Importance   Derived Importance
Attractive Packaging                        6.90                 6.41
Cleaning Stains/Tartar                     11.41                 9.04
Fighting Cavities                          12.30                 8.46
Freshening Breath                          12.68                13.26
Good Taste                                 10.75                12.22
High Prestige Brand                         6.98                10.81
Innovative Feature/new ingredient           8.19                10.22
Likeable Flavor                            10.41                10.84
Whitening Teeth                            11.35                 8.77

Use the commands:

    GRAPHS
    SCATTER
    Drag stated importance to X-AXIS
    Drag derived importance to Y-AXIS
    Drag attribute to LABEL CASES BY
    OPTIONS
    Check DISPLAY CHART WITH CASE LABELS
    CONTINUE
    OK

On the graph, use 10 as the cut-off between high and low values of importance, and take the graph to PowerPoint to annotate it.

[Figure: Kano's Model. Scatter plot of Stated Importance (x-axis) against Derived Importance (y-axis), each split into Low and High at 10. Region labels: Delight Attributes (High Prestige Brand, Innovative Ingredients) and Minimum Expected Attributes (Whitening Teeth, Cleaning Stains/Tartar, Fighting Cavities). Other plotted points: Good Taste, Likeable Flavor, Freshening Breath, Attractive Packaging.]

Kano's Model

According to Kano's model, attributes that have a high stated and a low derived importance are Minimum Expected attributes. Whitening teeth, fighting cavities and cleaning stains/tartar are the minimum expected of a toothpaste. Attributes with a low stated and a high derived importance are called Delight attributes, and marketers should concentrate on these. In this study, innovative ingredients and brand image emerge as the delight attributes.

The remaining attributes are linear attributes, for which stated and derived importance agree. Linear attributes of high importance deserve attention: the most important attribute here is freshening breath, as both its stated and its derived importance are high. Linear attributes of low importance should not be over-engineered: in this case, spending too much on packaging may not produce commensurate returns.
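The quadrant reading of the chart can be written down as a small rule. The Python sketch below classifies each attribute from the Stated Versus Derived Importance table using the cut-off of 10 on both axes. The function name `kano_category`, the category labels for the two "linear" quadrants, and the treatment of a value exactly equal to 10 as High are assumptions made for this illustration.

```python
CUTOFF = 10.0  # cut-off between Low and High importance, as used on the chart

# attribute: (stated importance, derived importance), from the common table
importance = {
    "Attractive Packaging": (6.90, 6.41),
    "Cleaning Stains/Tartar": (11.41, 9.04),
    "Fighting Cavities": (12.30, 8.46),
    "Freshening Breath": (12.68, 13.26),
    "Good Taste": (10.75, 12.22),
    "High Prestige Brand": (6.98, 10.81),
    "Innovative Feature/new ingredient": (8.19, 10.22),
    "Likeable Flavor": (10.41, 10.84),
    "Whitening Teeth": (11.35, 8.77),
}


def kano_category(stated, derived, cutoff=CUTOFF):
    """Assign a Kano quadrant; values equal to the cut-off count as High."""
    high_stated = stated >= cutoff
    high_derived = derived >= cutoff
    if high_stated and not high_derived:
        return "Minimum expected"
    if high_derived and not high_stated:
        return "Delight"
    if high_stated and high_derived:
        return "Linear (high importance)"
    return "Linear (low importance)"


categories = {
    name: kano_category(stated, derived)
    for name, (stated, derived) in importance.items()
}
for name, category in categories.items():
    print(f"{name:35s} {category}")
```

With this cut-off the rule reproduces the grouping discussed in the text: three minimum expected attributes, two delight attributes, three linear attributes of high importance and one (packaging) of low importance.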

Chapter 4
Cluster Analysis and Benefit Segmentation

Objectives:
To learn how cluster analysis can be used for grouping of subjects.
To understand the difference between hierarchical clustering and k-means clustering.
To use SPSS to perform cluster analysis and interpret the results.
To learn how to use cluster analysis for benefit segmentation.

Cluster Analysis:

We shall use the cluster.sav file for this session. Let us first calculate the descriptives. Execute the following commands.

ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES
Drag variables q05a to q05j to VARIABLES
OK

If you sort the output in descending order of standard deviation you will get this output.

Attribute                                    N    Mean   Std. Deviation
American Dental Association Recommendation   70   4.49   1.89
Attractive Packaging                         70   3.43   1.68
Likeable Flavor                              70   5.17   1.52
Good Taste                                   70   5.34   1.50
High Prestige Brand                          70   3.47   1.50
Fighting Cavities                            70   6.11   1.44
Whitening Teeth                              70   5.64   1.41
Innovative Feature/new ingredient            70   4.07   1.37
Cleaning Stains/Tartar                       70   5.67   1.35
Freshening Breath                            70   6.30   0.77

Mean Rating of Benefits Sought
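Sorting by standard deviation is a way of shortlisting the attributes on which respondents disagree most. The Python sketch below repeats that sort on the table above and keeps the seven most variable attributes, which is the number the analysis goes on to use. The helper name `most_variable` is an illustrative name, not an SPSS feature.

```python
# (attribute, mean, standard deviation), copied from the table above
benefits = [
    ("American Dental Association Recommendation", 4.49, 1.89),
    ("Attractive Packaging", 3.43, 1.68),
    ("Likeable Flavor", 5.17, 1.52),
    ("Good Taste", 5.34, 1.50),
    ("High Prestige Brand", 3.47, 1.50),
    ("Fighting Cavities", 6.11, 1.44),
    ("Whitening Teeth", 5.64, 1.41),
    ("Innovative Feature/new ingredient", 4.07, 1.37),
    ("Cleaning Stains/Tartar", 5.67, 1.35),
    ("Freshening Breath", 6.30, 0.77),
]


def most_variable(rows, k):
    """Return the names of the k attributes with the largest std. deviation."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return [name for name, _mean, _sd in ranked[:k]]


selected = most_variable(benefits, 7)
print(selected)
```

An attribute such as Freshening Breath, which almost everyone rates highly, has a small standard deviation and therefore cannot separate respondents into groups, however important it is on average.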

From the output, it is clear that the seven variables with the highest standard deviation are:

American Dental Association Recommendation
Attractive Packaging
Likeable Flavor
Good Taste
High Prestige Brand
Fighting Cavities
Whitening Teeth

These are the attributes on which the opinions of the respondents vary the most. Hence for clustering and segmentation we shall use only these seven variables.

Hierarchical Clustering:

Let us start with hierarchical clustering. Execute the following commands.

ANALYZE → CLASSIFY → HIERARCHICAL CLUSTER
Highlight and drag the seven variables to VARIABLE(S)
Drag nickname to LABEL CASES BY
PLOTS → Check DENDROGRAM → Check NONE → CONTINUE

If you look at the output you will find a tree structure. If you leave out three cases, namely case 38 (nickname 'Warden'), case 2 ('Bud') and case 9 ('Hank'), there are three major branches. That gives us some idea of how many clusters to ask for when we go to K-means clustering.

K-Means Clustering:

In this method the respondents are allocated to clusters according to the number of clusters the researcher asks for. Based on the results of the hierarchical clustering we have decided to ask for three clusters.

ANALYZE → CLASSIFY → K-MEANS CLUSTER
Highlight and drag the seven variables to the VARIABLES box
Drag 'nickname' to LABEL CASES BY
NUMBER OF CLUSTERS: change from 2 to 3
SAVE → CLUSTER MEMBERSHIP → CONTINUE
OPTIONS → CLUSTER INFORMATION FOR EACH CASE

Look at FINAL CLUSTER CENTERS.

Attribute                                    Cluster 1   Cluster 2   Cluster 3
Attractive Packaging                         2.63        3.46 +      3.82 *
American Dental Association Recommendation   4.63 +      2.08        5.24 *
Likeable Flavor                              3.32        5.77 +      5.89 *
Good Taste                                   3.53        6.00 +      6.03 *
High Prestige Brand                          2.63        3.08 +      4.03 *
Whitening Teeth                              5.58        5.85 *      5.61 +
Fighting Cavities                            6.68 *      3.77        6.63 +

The highest value in each row (rank 1, filled yellow in the SPSS output) is marked * and the second highest (rank 2, filled green) is marked +.

Benefit Segmentation:

Cluster 1 members are primarily concerned with fighting cavities and are also interested in the American Dental Association recommendation. So the benefit sought is a 'medically proven cavity fighter'.

Cluster 2 is primarily interested in whitening teeth. Its members have also given relatively high ratings to likeable flavor and good taste. Though they rank second on attractive packaging and high prestige brand, those ratings are low in absolute terms. The benefit sought by this group is white teeth plus good taste and flavor.

Cluster 3 wants everything; its members look for a balanced paste that provides dental care and sensory benefits (taste and flavor).
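To see what a k-means procedure does with the ratings, the following Python sketch implements the basic assign-and-update loop (Lloyd's algorithm) using only the standard library. It is a generic illustration, not the SPSS routine: SPSS chooses its own initial centers, whereas here they are passed in explicitly, and the six two-variable "respondents" are made-up numbers, not data from cluster.sav.

```python
import math


def k_means(points, initial_centers, max_iter=100):
    """Basic k-means: alternate nearest-center assignment and mean update."""
    centers = [list(center) for center in initial_centers]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the solution is stable
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical ratings on two attributes (e.g. cavities, taste) for six people.
points = [(6.5, 3.0), (7.0, 3.5), (6.0, 3.0), (3.5, 6.0), (4.0, 6.5), (3.0, 6.0)]
labels, centers = k_means(points, initial_centers=[points[0], points[3]])
print(labels)
print(centers)
```

The returned centers play the same role as the FINAL CLUSTER CENTERS table: each is the mean rating of the cluster's members on every variable, which is what makes the clusters interpretable as benefit segments.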

So the benefit segments that we have derived are as follows:

1. Proven cavity fighter
2. Tasty, flavorful paste for white teeth
3. Balanced paste which provides dental care as well as good taste and flavor

From the table on the number of cases in each cluster we find that 19 are in cluster 1, 13 are in cluster 2 and 38 are in cluster 3. The majority (54%) of the people want a balanced paste, 27% want a cavity fighter, and 19% are for white teeth.

Number of Cases in Each Cluster

Cluster   1   19.000
          2   13.000
          3   38.000
Valid         70.000
Missing         .000

Now take a look at the cluster membership table. This gives the cluster membership of each individual. The same information is also stored in the SPSS data file as a newly created variable, qcl_1. Insert value labels for the new variable as given below:

1. Cavity Fighter
2. White Teeth
3. Balanced Paste

Cross Classification with Demographic Variables:

Cross tabulation of the new variable qcl_1 against race will produce the following table.

Crosstab: Cluster Number of Case by Race/Ethnicity

Cluster Number of Case                                  White    Others   Total
Cavity fighter    Count                                 16       3        19
                  % within Cluster Number of Case       84.2%    15.8%    100.0%
White teeth       Count                                 9        4        13
                  % within Cluster Number of Case       69.2%    30.8%    100.0%
Balanced Paste    Count                                 30       8        38
                  % within Cluster Number of Case       78.9%    21.1%    100.0%
Total             Count                                 55       15       70
                  % within Cluster Number of Case       78.6%    21.4%    100.0%
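The segment sizes and the "% within Cluster Number of Case" figures are simple proportions of the counts. The Python sketch below recomputes them from the crosstab counts; the names `crosstab`, `segment_share` and `within_cluster` are illustrative.

```python
# Counts from the crosstab, as cluster: (White, Others)
crosstab = {
    "Cavity fighter": (16, 3),
    "White teeth": (9, 4),
    "Balanced paste": (30, 8),
}

n_total = sum(sum(counts) for counts in crosstab.values())


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal."""
    return round(100 * part / whole, 1)


# Share of all respondents falling in each benefit segment.
segment_share = {
    cluster: pct(sum(counts), n_total) for cluster, counts in crosstab.items()
}

# Racial composition of each segment (each row adds up to about 100%).
within_cluster = {
    cluster: tuple(pct(count, sum(counts)) for count in counts)
    for cluster, counts in crosstab.items()
}

print(segment_share)
print(within_cluster)
```

Note the direction of the percentages: they describe the make-up of each segment, so they should be compared with the Total row (78.6% White, 21.4% Others) rather than with each other across columns.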

Whites are over-represented in the cavity fighter segment (84.2% of that segment against 78.6% of the whole sample), while non-whites are over-represented in the white teeth segment (30.8% against 21.4%). However, the chi-square test does not show a significant relationship between the benefit segments and race.

Chi-Square Tests

                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             1.036(a)   2    .596
Likelihood Ratio               1.005      2    .605
Linear-by-Linear Association    .097      1    .755
N of Valid Cases               70
a. 2 cells (33.3%) have expected count less than 5. The minimum expected count is 2.79.

The same kind of analysis can be done with age, income, and gender. The benefit segments versus current brand produced the following table.

Crosstab: Cluster Number of Case by Current Brand

Cluster Number of Case                                Aquafresh   Colgate   Crest    Others   Total
Cavity fighter    Count                               4           5         6        4        19
                  % within Cluster Number of Case     21.1%       26.3%     31.6%    21.1%    100.0%
White teeth       Count                               1           7         2        3        13
                  % within Cluster Number of Case     7.7%        53.8%     15.4%    23.1%    100.0%
Balanced Paste    Count                               5           8         17       8        38
                  % within Cluster Number of Case     13.2%       21.1%     44.7%    21.1%    100.0%
Total             Count                               10          20        25       15       70
                  % within Cluster Number of Case     14.3%       28.6%     35.7%    21.4%    100.0%

Aquafresh is over-represented among cavity fighters (21.1% of that segment against 14.3% overall), Colgate among white-teeth seekers (53.8% against 28.6%), and Crest in the balanced paste segment (44.7% against 35.7%). Once again the chi-square is not significant, so we need to take these results with a pinch of salt.
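The Pearson chi-square reported by SPSS can be reproduced by hand from the counts. The Python sketch below computes the statistic, the degrees of freedom and the smallest expected count for the race by cluster table. The function name `pearson_chi_square` is illustrative, and the p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2); for other degrees of freedom a statistics package would be needed.

```python
import math


def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, minimum expected count)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


# Rows: Cavity fighter, White teeth, Balanced paste; columns: White, Others.
race_by_cluster = [[16, 3], [9, 4], [30, 8]]
chi2, df, min_expected = pearson_chi_square(race_by_cluster)

# Closed-form upper-tail probability, valid only when df == 2.
p_value = math.exp(-chi2 / 2) if df == 2 else None

print(round(chi2, 3), df, round(min_expected, 2), p_value)
```

The minimum expected count matters because the chi-square approximation becomes unreliable when many cells have expected counts below 5, which is what the footnote under the SPSS table is warning about.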

Chi-Square Tests

                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             7.213(a)   6    .302
Likelihood Ratio               7.038      6    .317
Linear-by-Linear Association    .674      1    .412
N of Valid Cases               70
a. 6 cells (50.0%) have expected count less than 5. The minimum expected count is 1.86.

Chapter 5
Correspondence Analysis

Objectives:
To understand the basic nature of correspondence analysis.
To conduct correspondence analysis using SPSS and interpret the results.

What is Correspondence Analysis?

Correspondence analysis is typically used to get a graphical representation of contingency tables. Suppose we did a sample study in which we obtained the income level and the brand used by 36 respondents. The following table summarizes the responses.

                    Brand
Income              Brand A   Brand B   Brand C   Brand D   Total
Less than $1000     5         2         1         1         9
$1000 to $3000      2         4         1         2         9
$3001 to $5000      2         2         4         1         9
Above $5000         2         1         1         5         9
Total               11        9         7         9         36

Data Preparation:

To get a better insight into the relationship between income and brand used, we could use correspondence analysis. To run correspondence analysis using SPSS we need to have the data organized in the following format.

Income              Income Code   Brand     Brand Code   Frequency
Less than $1000     1             Brand A   1            5
Less than $1000     1             Brand B   2            2
Less than $1000     1             Brand C   3            1
Less than $1000     1             Brand D   4            1
$1000 to $3000      2             Brand A   1            2
$1000 to $3000      2             Brand B   2            4
$1000 to $3000      2             Brand C   3            1
$1000 to $3000      2             Brand D   4            2
$3001 to $5000      3             Brand A   1            2
$3001 to $5000      3             Brand B   2            2
$3001 to $5000      3             Brand C   3            4
$3001 to $5000      3             Brand D   4            1
Above $5000         4             Brand A   1            2
Above $5000         4             Brand B   2            1
Above $5000         4             Brand C   3            1
Above $5000         4             Brand D   4            5

SPSS Commands:

The same data has been used to create a data file corres1.sav. Use the following SPSS commands to run correspondence analysis.

DATA → WEIGHT CASES → WEIGHT CASES BY
Drag 'freq' to FREQUENCY VARIABLE
OK

ANALYZE → DATA REDUCTION → CORRESPONDENCE ANALYSIS
Drag 'income' to ROW VARIABLE
DEFINE RANGE → MINIMUM VALUE '1' → MAXIMUM VALUE '4' → UPDATE → CONTINUE
Drag 'brand' to COLUMN VARIABLE
DEFINE RANGE → MINIMUM VALUE '1' → MAXIMUM VALUE '4' → UPDATE → CONTINUE
OK
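The reshaping from the contingency table to the one-row-per-cell layout above is mechanical, and can be sketched in Python as follows. The function name `to_long_format` is illustrative; the codes 1 to 4 simply follow the row and column order of the table, as in the listing above.

```python
income_levels = ["Less than $1000", "$1000 to $3000", "$3001 to $5000", "Above $5000"]
brands = ["Brand A", "Brand B", "Brand C", "Brand D"]

# Contingency table: one row per income level, one column per brand.
counts = [
    [5, 2, 1, 1],
    [2, 4, 1, 2],
    [2, 2, 4, 1],
    [2, 1, 1, 5],
]


def to_long_format(table):
    """Turn a contingency table into (row code, column code, frequency) rows."""
    rows = []
    for income_code, row in enumerate(table, start=1):
        for brand_code, freq in enumerate(row, start=1):
            rows.append((income_code, brand_code, freq))
    return rows


long_rows = to_long_format(counts)
for income_code, brand_code, freq in long_rows:
    print(income_levels[income_code - 1], income_code,
          brands[brand_code - 1], brand_code, freq)
```

Each output row stands for several respondents at once, which is why the frequency column has to be declared as a case weight before the analysis is run.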

Interpretation of Results:

The data we used as input for the analysis is printed in the correspondence table.

Correspondence Table

                    BRAND
INCOME              Brand A   Brand B   Brand C   Brand D   Active Margin
Less than $1000     5         2         1         1         9
$1000 to $3000      2         4         1         2         9
$3001 to $5000      2         2         4         1         9
Above $5000         2         1         1         5         9
Active Margin       11        9         7         9         36

From the summary table we can infer that the first two dimensions account for 81.4% of the inertia. Inertia plays a role similar to that of the eigenvalues in factor analysis. It is enough to work with two dimensions.

Summary

                                                             Proportion of Inertia
Dimension   Singular Value   Inertia   Chi Square   Sig.      Accounted for   Cumulative
1           .427             .182                             .497            .497
2           .341             .116                             .317            .814
3           .261             .068                             .186            1.000
Total                        .367      13.201       .154(a)   1.000           1.000
a. 9 degrees of freedom

Correspondence analysis decomposes the original matrix into row and column points.

Overview Row Points(a)

Columns: Score 1 and Score 2 are the scores in Dimensions 1 and 2; Pt→Dim is the contribution of the point to the inertia of the dimension; Dim→Pt is the contribution of the dimension to the inertia of the point.

                                                        Pt→Dim          Dim→Pt
INCOME             Mass    Score 1   Score 2   Inertia   1       2       1       2       Total
Less than $1000    .250     .310      .801     .080      .056    .470    .128    .682    .810
$1000 to $3000     .250     .020      .245     .053      .000    .044    .001    .096    .097
$3001 to $5000     .250     .717     -.764     .106      .301    .428    .518    .469    .987
Above $5000        .250   -1.047     -.282     .127      .642    .058    .920    .053    .974
Active Total      1.000                        .367     1.000   1.000
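The total inertia in the Summary table is the Pearson chi-square of the correspondence table divided by the number of cases, and the Inertia column of the row points table is each row's share of that quantity. The Python sketch below checks this arithmetic; it does not perform the decomposition into dimensions, which requires a singular value decomposition and is left to SPSS. The function name `inertia_by_row` is illustrative.

```python
# Correspondence table: rows are income levels, columns are Brands A to D.
counts = [
    [5, 2, 1, 1],
    [2, 4, 1, 2],
    [2, 2, 4, 1],
    [2, 1, 1, 5],
]


def inertia_by_row(table):
    """Each row's chi-square contribution divided by the number of cases."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        chi_row = 0.0
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_row += (observed - expected) ** 2 / expected
        result.append(chi_row / n)
    return result


n_cases = sum(sum(row) for row in counts)
row_inertia = inertia_by_row(counts)
total_inertia = sum(row_inertia)
chi_square = total_inertia * n_cases

print([round(x, 3) for x in row_inertia])
print(round(total_inertia, 3), round(chi_square, 3))
```

Seen this way, inertia measures how far the table departs from independence of income and brand, and the dimensions of the map split that departure into parts of decreasing size.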

The Score in Dimension columns give the co-ordinates of the row categories in the joint plot. In the same way you will find the co-ordinates of the column categories in the next table.

Overview Column Points(a)

Columns: Score 1 and Score 2 are the scores in Dimensions 1 and 2; Pt→Dim is the contribution of the point to the inertia of the dimension; Dim→Pt is the contribution of the dimension to the inertia of the point.

                                                     Pt→Dim          Dim→Pt
BRAND           Mass    Score 1   Score 2   Inertia   1       2       1       2       Total
Brand A         .306     .198      .640     .068      .028    .367    .075    .627    .702
Brand B         .250     .283      .251     .059      .047    .046    .146    .092    .238
Brand C         .194     .720     -.960     .107      .236    .526    .402    .571    .972
Brand D         .250   -1.085     -.287     .133      .689    .060    .947    .053   1.000
Active Total   1.000                        .367     1.000   1.000

Using the row and column co-ordinates, the program produces a joint map, which is given below.

[Figure: Joint plot of row and column points. Dimension 1 on the x-axis, Dimension 2 on the y-axis. Legend: BRAND, INCOME.]

From the chart it is clear that each income category is associated with one brand: Less than $1000 with Brand A, $1000 to $3000 with Brand B, $3001 to $5000 with Brand C, and Above $5000 with Brand D. It also shows that Brands A and B are closer to each other than to the other brands.

Toothpaste Data:

Now let us turn our attention to the brand-personality association data collected in the toothpaste study. The data has been arranged in the file correspond.sav. Open the file, study the way the data is arranged, and run the following SPSS commands.

DATA → WEIGHT CASES → WEIGHT CASES BY
Drag 'freq' to FREQUENCY VARIABLE
OK

ANALYZE → DATA REDUCTION → CORRESPONDENCE ANALYSIS
Drag 'attri' to ROW VARIABLE
DEFINE RANGE → MINIMUM VALUE '1' → MAXIMUM VALUE '11' → UPDATE → CONTINUE
Drag 'brand' to COLUMN VARIABLE
DEFINE RANGE → MINIMUM VALUE '1' → MAXIMUM VALUE '3' → UPDATE → CONTINUE
OK

Notice that 100 percent of the inertia is accounted for by the first two dimensions. It is enough to work with two dimensions.

Summary

                                                              Proportion of Inertia         Confidence Singular Value
Dimension   Singular Value   Inertia   Chi Square   Sig.       Accounted for   Cumulative    Standard Deviation   Correlation 2
1           .286             .082                              .793            .793          .035                 .043
2           .146             .021                              .207            1.000         .036
Total                        .103      72.795       .000(a)    1.000           1.000
a. 20 degrees of freedom

The resulting correspondence map can be taken to PowerPoint and annotated as given below.

[Figure: Correspondence map of Brand and Personality Trait. Dimension 1 on the x-axis, Dimension 2 on the y-axis. Brands: Aquafresh, Colgate, Crest. Traits: Hedonist, Masculine, Fun Loving, Outgoing, Feminine, Sensuous, Romantic, Overcautious, Traditional, Ambitious, Achiever.]

Crest is seen to be used by Ambitious Achievers. Colgate is seen to be used by the Traditional, Overcautious, Masculine person. Aquafresh is seen as Fun-loving, Feminine and Outgoing. There is no brand available for Romantic, Sensuous types. Here is an opportunity for a new product.

Correspondence analysis is a powerful tool for visualization of data from contingency tables. It has no restrictions on the sample size or on the scale used. Association between any two categorical variables can be easily analyzed using this technique.

Chapter 6
Regression Analysis

Objectives:
To understand the meaning of regression.
To conduct simple regression and interpret the results.
To conduct multiple regression and interpret the results.
To understand the problem of multi-collinearity and a method to overcome the same.

Simple Regression:

Open the file regression.sav. Study the file structure: it is the same file that was used for factor analysis, along with three new variables holding the factor scores that we created using factor analysis. Variables q06_01 to q06_10 are the rating scores given to different brands on different attributes. Variable q06_11 corresponds to the overall rating given to different brands. We are going to fit regression equations with the overall rating as the dependent variable and the attribute ratings as independent variables.

Linear regression refers to the fitting of a linear mathematical model between one dependent variable and one or more independent variables. We shall first conduct a regression analysis using just two variables. Use the following SPSS commands to fit a regression model with Fighting Cavities (q06_01) as the independent variable and Overall Rating (q06_11) as the dependent variable.

ANALYZE → REGRESSION → LINEAR
Drag Fighting Cavities (q06_01) to INDEPENDENT
Drag Overall Rating (q06_11) to DEPENDENT
OK

Take a look at the Coefficients table.

Coefficients(a)

                          Unstandardized Coefficients   Standardized Coefficients
Model                     B        Std. Error           Beta                        t       Sig.
1   (Constant)            3.771    .477                                             7.905   .000
    Fighting Cavities      .505    .059                 .521                        8.576   .000
a. Dependent Variable: Overall rating

The unstandardized coefficients give the linear mathematical model:

$Y = 3.771 + 0.505X$

where Y is the Overall rating and X is the Fighting Cavities rating.

The strength of the relationship between the independent and dependent variables is given by the correlation coefficient R in the Model Summary table.

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .521(a)   .272       .268                1.21679
a. Predictors: (Constant), Fighting Cavities

The R² value of 0.272 is somewhat low; that is, Fighting Cavities alone explains about 27% of the variation in the overall rating. As a rule of thumb, an R² value of 0.7 and above is taken to signify a strong relationship. Even so, in the present case we cannot conclude that there is no relationship between the two variables, as the F value is significant in the ANOVA table and the t-value corresponding to the variable Fighting Cavities is significant in the Coefficients table.

ANOVA(b)

Model            Sum of Squares   df    Mean Square   F        Sig.
1   Regression   108.880          1     108.880       73.540   .000(a)
    Residual     291.672          197   1.481
    Total        400.553          198
a. Predictors: (Constant), Fighting Cavities
b. Dependent Variable: Overall rating
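The figures in the Model Summary and ANOVA tables are linked by simple arithmetic, which the following Python sketch reproduces from the sums of squares. It also wraps the fitted equation in a small function; the function name `predict_overall` and the example rating of 7 are illustrative choices, not part of the SPSS output.

```python
import math

# Values copied from the ANOVA table.
ss_regression, df_regression = 108.880, 1
ss_residual, df_residual = 291.672, 197
ss_total = 400.553

# R Square: share of the total variation explained by the model.
r_square = ss_regression / ss_total

# F: explained variance per df over unexplained variance per df.
ms_residual = ss_residual / df_residual
f_value = (ss_regression / df_regression) / ms_residual

# Std. Error of the Estimate: square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

# Adjusted R Square penalizes R Square for the number of predictors.
n = df_regression + df_residual + 1
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual


def predict_overall(fighting_cavities):
    """Predicted overall rating from the fitted simple regression."""
    return 3.771 + 0.505 * fighting_cavities


print(round(r_square, 3), round(adjusted_r_square, 3))
print(round(f_value, 2), round(std_error_estimate, 4))
print(predict_overall(7))
```

The slope of 0.505 means that a brand rated one point higher on fighting cavities is predicted to score about half a point higher overall, while the low R Square warns that individual predictions from this single variable will be imprecise.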

In the same way, conduct a simple regression of each rating variable with the dependent variable and interpret the results.

Multiple Regression:

Now we shall conduct regression analysis with all the 10 attribute ratings as independent variables.

ANALYZE → REGRESSION → LINEAR
Drag variables q06_01 to q06_10 to INDEPENDENT
Drag Overall Rating (q06_11) to DEPENDENT
OK

Note that the R² value improves dramatically, to 0.743.

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .862(a)   .743       .730                .74117
a. Predictors: (Constant), Innovative features/ingredients, Fighting Cavities, Likeable Flavor, Attractive Packaging, Whitening Teeth, Brand Image, Freshening Breath, Color, Cleaning Stains/Tartar, Good Taste

From the coefficients table, we can construct the following mathematical model to depict the relationship between the 10 independent variables and the overall rating.

$Y = 0.598 + 0.208X_1 + 0.097X_2 + 0.016X_3 + 0.109X_4 + 0.060X_5 + 0.150X_6 + 0.135X_7 + 0.154X_8 - 0.096X_9 + 0.120X_{10}$

The negative sign for variable nine (Attractive Packaging) indicates that it has a negative relationship with the overall rating. That is, with the other nine ratings held constant, a paste receiving a higher rating on attractive packaging is predicted to receive a slightly lower overall rating.
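The fitted multiple regression equation can be turned into a small prediction function, which also makes the meaning of the negative packaging coefficient concrete. The Python sketch below assumes that the ten ratings are supplied in the order X1 to X10 of the equation; the function name `predict_overall` and the example ratings (all 5s) are hypothetical values chosen for illustration.

```python
INTERCEPT = 0.598

# Coefficients of X1 to X10, in the order they appear in the equation.
# The ninth coefficient (Attractive Packaging) is the negative one.
COEFFICIENTS = [0.208, 0.097, 0.016, 0.109, 0.060,
                0.150, 0.135, 0.154, -0.096, 0.120]


def predict_overall(ratings):
    """Predicted overall rating for ten attribute ratings (X1 to X10)."""
    if len(ratings) != len(COEFFICIENTS):
        raise ValueError("expected one rating per attribute (10 values)")
    return INTERCEPT + sum(b * x for b, x in zip(COEFFICIENTS, ratings))


# A hypothetical brand rated 5 on every attribute.
baseline = predict_overall([5] * 10)

# The same brand with its packaging rating (X9) raised from 5 to 6.
better_packaging = predict_overall([5] * 8 + [6] + [5])

print(round(baseline, 3), round(better_packaging, 3))
```

Raising only the packaging rating lowers the prediction by 0.096, which is exactly what a coefficient of that size and sign says: each coefficient is the predicted change in the overall rating for a one-point change in that attribute with the other nine held fixed.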