To appear in: Moutinho and Hutcheson: Dictionary of Quantitative Methods in Management. Sage Publications

Factor Analysis

Introduction

Factor analysis attempts to identify the underlying structure in a data set by defining a small number of factors that capture the variation in the collected data. Factor analysis assumes that relationships between variables are due to the effects of underlying factors and that observed correlations are the result of variables sharing common causes. Describing a data set in terms of factors (or latent variables, as they are sometimes called) is often useful at a theoretical level, as it may identify the underlying processes which determined the correlations among the variables. This often allows a simpler and clearer interpretation of the relationships in the data. Factor analysis is also useful at a more practical level, as it reduces the number of variables in the data set, making model selection easier, particularly with respect to regression models.

A common example of the use of factor analysis is in the identification of attitudes through questionnaire research. In a large questionnaire, a number of questions are likely to address similar issues, which will lead to answers which are correlated. For example, answers of 'agree' or 'strongly agree' to questions such as 'It is important to preserve one's culture', 'I am prepared to die for my country' and 'It is good to take part in our traditional festivals' may lead one to conclude that a 'patriotism' factor is present. Here, patriotism is not a single measurable entity but a construct which is derived from the measurement of other, directly observable variables (the individual questions in the questionnaire). The patriotism factor explains some of the relationships between the variables and its identification can simplify the description of the data and help in our understanding of a complex relationship.
Postulating the existence of something called 'patriotism' through the identification of the factor explains the observed correlations between responses to numerous and varied situations.

Key Features

- Factor analysis attempts to identify the underlying structure in a data set in terms of its correlational structure.
- Factor analysis can be used to reduce a large number of variables to a smaller number of factors.
- Factor analysis can help to identify the underlying causal structure in the data set.
- Factor analysis can help in the further analysis of the data by reducing the number of variables and simplifying model selection.

General Principles of Factor Analysis

Research that attempts to investigate why consumers select certain supermarkets might ask consumers to rate a number of questions, for example:

- How important is it that the store has parking facilities?
- How important is a home-delivery service?
- How important is it that the store has cash-points available?
- How important is it that the store has a cafeteria?
- How important is it that the store has a petrol station?
- How important is it that the store has a friendly atmosphere?
- How important is it that the store has promotions?
- How important is it that the store offers value for money?

The multiple variables recorded in such a questionnaire might actually be a reflection of a simpler underlying structure in the data. For example, those consumers who use a car will more likely rate parking facilities and petrol stations as being important, whilst rating a home delivery service and value for money as less important and having no particular preference for cafeterias and value for money. Similarly, those consumers with little money are likely to rate value for money and the availability of promotions as being important, whilst rating a home-delivery service as being unimportant and having no particular preference for the atmosphere in the store.

An individual's response to a question is likely to depend on a number of underlying factors. This is represented in Equation 1, where the importance of car parking facilities is given as a function of a number of hypothetical factors. These factors are not gathered data, but are inferred from the relationships in the data set.

$$
\text{Importance of a car park} = \beta_1 F_{\text{attitude to car use}} + \beta_2 F_{\text{environmental concern}} + \beta_3 F_{\text{life style}} + \dots + \beta_k F_{\text{monetary concern}} \qquad \text{(Equation 1)}
$$

where Importance of a car park is a measured variable, $\beta_1$ to $\beta_k$ are regression weights and the $F$ terms are factors derived from the data.

The response to the question 'How important is it that the store has parking facilities?' may be related to the person's attitude to car use, their environmental concerns, their life-style, money considerations and other factors. Some of these factors will be more highly related to the question than others; this information is provided by the $\beta_1$ to $\beta_k$ coefficients. Equation 1 shows a recorded variable represented as a function of unrecorded factors.
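
As a rough illustration of Equation 1, the following Python sketch (using NumPy) simulates questionnaire-style responses as weighted sums of two unobserved factors plus item-specific noise. The factor names, the weights and the noise level are assumptions chosen for illustration only and are not taken from any real survey; the point is simply that items sharing a factor end up correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents = 2000

# Two hypothetical, unobserved factors (assumed names: car use, monetary concern).
factors = rng.standard_normal((n_respondents, 2))

# Hypothetical regression weights, playing the role of the beta coefficients.
# Rows: parking, petrol station, promotions, value for money.
weights = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.0, 0.8],
    [0.1, 0.7],
])

# Each measured variable is a weighted sum of the factors plus item-specific noise.
noise = 0.5 * rng.standard_normal((n_respondents, 4))
responses = factors @ weights.T + noise

# Correlations between the four simulated questionnaire items.
corr = np.corrcoef(responses, rowvar=False)
print(np.round(corr, 2))
```

In the printed correlation matrix, the two car-related items should correlate strongly with each other and only weakly with the two money-related items, which is the kind of pattern factor analysis works back from.
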
It is also easy to represent an unrecorded factor as a function of the recorded variables. This is shown in Equation 2, where a factor is defined in terms of the k questions asked in the questionnaire.

$$
\begin{aligned}
F_{\text{attitude to car use}} = {} & \beta_1\,(\text{How important is it that the store has parking facilities?}) \\
& + \beta_2\,(\text{How important is a home-delivery service?}) \\
& + \beta_3\,(\text{How important is it that the store has a cafeteria?}) \\
& + \dots \\
& + \beta_k\,(\text{How important is it that the store offers value for money?}) \qquad \text{(Equation 2)}
\end{aligned}
$$

where $F_{\text{attitude to car use}}$ is a factor derived from the data, $\beta_1$ to $\beta_k$ are regression weights and the (How important...) terms are measured variables.

The 'attitude to car use' factor in Equation 2 is identified through the relationships between the variables in the data. Further factors can be identified by investigating the relationships in the data that are uncorrelated with those factors that have already been identified. In practice, k factors can be derived from k variables, the difference being that the factors represent the correlational structure in the data whereas the variables represent the responses to the individual questions. The process of the initial identification of these factors (or components) is explained in detail in the chapter on
principal components analysis and will not be discussed further here.

Factor analysis is based on correlations, so continuous data are, technically speaking, needed. However, this requirement is often relaxed so that ordered data, particularly 5-point Likert scales, can be used (see Hutcheson and Sofroniou, 1999, for a full discussion of this issue). Indeed, in management research, factor analytic techniques are used predominantly on ordered responses collected from questionnaire research.

The components that represent the correlational structure in the data may be difficult to interpret meaningfully, as individual components are often associated with a number of variables. It is common for an individual component to load highly on many of the variables in the data set, which makes it difficult to identify the underlying structure. For example, Table 1 shows two components that have been derived from an analysis of educational data (see Hutcheson and Sofroniou, 1999). The table shows a list of 8 variables (the recorded data) and two components that have been derived using principal components analysis. The strength of the relationship between the variables and components is shown in the factor loading scores. From the table it can be seen that component 1 is more highly related to all of the variables than component 2, making it difficult to identify any meaningful distinction between the components.

Table 1: Two components derived from 8 correlated variables

Variables              Component 1    Component 2
Articulation
Comprehension
Coordination
Drawing
Memory
Motor Skill
Sentence Completion
Writing

It is useful to illustrate the relationships shown in Table 1 in a graphic. Figure 1 shows the component loadings and clearly shows the variables clustering into two groups (identified here as physical dexterity and linguistic competence). This is quite obvious in the graphic, but is not obvious from the information in the table.
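
The unrotated component loadings described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the analysis behind Table 1: the data are simulated (eight variables in two correlated clusters of four), and the function name `principal_component_loadings` is hypothetical. Components are taken as eigenvectors of the correlation matrix, and each loading is the eigenvector entry scaled by the square root of its eigenvalue.

```python
import numpy as np


def principal_component_loadings(data):
    """Return eigenvalues and component loadings of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # The sign of an eigenvector is arbitrary; make each column sum non-negative.
    signs = np.sign(eigenvectors.sum(axis=0))
    signs[signs == 0] = 1.0
    eigenvectors = eigenvectors * signs
    # Loadings: correlations between the variables and the components.
    loadings = eigenvectors * np.sqrt(eigenvalues)
    return eigenvalues, loadings


# Simulated data (an assumption for illustration): two clusters of four variables
# whose underlying factors share a common general influence.
rng = np.random.default_rng(1)
n = 2000
general = rng.standard_normal(n)
f1 = 0.6 * general + 0.8 * rng.standard_normal(n)
f2 = 0.6 * general + 0.8 * rng.standard_normal(n)
cluster1 = 0.8 * f1[:, None] + 0.6 * rng.standard_normal((n, 4))
cluster2 = 0.8 * f2[:, None] + 0.6 * rng.standard_normal((n, 4))
data = np.hstack([cluster1, cluster2])

eigenvalues, loadings = principal_component_loadings(data)
print(np.round(loadings[:, :2], 2))  # loadings on the first two components
```

With data of this kind, the first component would be expected to load fairly highly on all eight variables, while the second mainly contrasts the two clusters, which mirrors the interpretive difficulty noted for Table 1. The sketch also returns as many components as there are variables.
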

Figure 1: A graphical representation of the relationship between components and variables

The two groups of variables can be identified by redistributing the component loadings so that individual components load highly on relatively few variables. This can be considered as a process of rotating the axes of the graphic in Figure 1 so that the axes are drawn closer to the clusters. This process is called rotation and is illustrated in Figure 2. The axes of the graph in Figure 1 can be rotated so that they remain at 90 degrees to each other (orthogonal) or they can be rotated independently (oblique). Orthogonal rotation represents factors that are uncorrelated, whereas oblique rotation represents factors that are correlated.

There are a number of popular methods for identifying the rotations, ranging from ones that minimize the number of variables which load highly on each factor, in order to enhance the interpretability of the factors, to methods that minimize the number of factors needed to explain each variable, in order to provide simpler interpretations (refer to Kim and Mueller, 1994; Browne, 2001; and Bernaards and Jennrich, 2006, for discussions of rotation techniques). Although there are many rotation techniques available, in practice the different techniques tend to produce similar results when there is a large sample and the factors are relatively well defined (Fava and Velicer, 1992).

Applying rotations often results in a clearer differentiation between the components and enables the factors to be identified from the factor loadings. Table 2 shows the loadings for the example above before rotation and after an orthogonal and an oblique rotation were computed.
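
An orthogonal rotation can be sketched in a few lines of Python with NumPy. The example below uses the varimax criterion, one common orthogonal method, which is named here purely as an illustration; the source does not state which rotation produced Table 2. The loadings are hypothetical values invented for the sketch (they are not the values from Table 1), and the rotation angle for each pair of factors comes from the classic closed-form expression attributed to Kaiser.

```python
import numpy as np


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax rotation by pairwise planar rotations of the factor axes."""
    rotated = np.array(loadings, dtype=float)
    n_vars, n_factors = rotated.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(n_factors - 1):
            for j in range(i + 1, n_factors):
                x = rotated[:, i].copy()
                y = rotated[:, j].copy()
                u = x**2 - y**2
                v = 2.0 * x * y
                # Closed-form angle that maximises the varimax criterion for this pair.
                num = 2.0 * (u @ v) - 2.0 * u.sum() * v.sum() / n_vars
                den = (u @ u - v @ v) - (u.sum() ** 2 - v.sum() ** 2) / n_vars
                angle = 0.25 * np.arctan2(num, den)
                c, s = np.cos(angle), np.sin(angle)
                # Rotate both axes together, so they remain at 90 degrees.
                rotated[:, i] = c * x + s * y
                rotated[:, j] = -s * x + c * y
                largest_angle = max(largest_angle, abs(angle))
        if largest_angle < tol:
            break
    return rotated


# Hypothetical unrotated loadings for eight variables on two components:
# every variable loads on component 1, and component 2 contrasts two groups.
unrotated = np.array([
    [0.75, 0.45],
    [0.72, 0.50],
    [0.70, 0.40],
    [0.78, 0.42],
    [0.60, -0.55],
    [0.58, -0.60],
    [0.62, -0.58],
    [0.65, -0.50],
])

rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

After rotation, each hypothetical variable should load mainly on one of the two factors, while the sum of squared loadings for each variable (its communality) is unchanged, since the axes stay at 90 degrees to each other. The sign of a rotated factor is arbitrary.
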

Figure 2: A graphical representation of orthogonal and oblique rotation methods

Table 2: Factor loadings for unrotated components and rotated factors

Variables              Unrotated components    Orthogonal factors    Oblique factors
Articulation
Comprehension
Memory
Sentence Completion
Coordination
Drawing
Motor Skill
Writing

Table 2 clearly shows that the orthogonal and oblique rotation methods have identified the underlying physical dexterity and linguistic competence factors. The process of rotation has identified the clusters in the loading matrix and enabled the underlying structure in the data to be identified. It
should be noted that whilst the structure in these data may be quite obvious (particularly as restricting the analysis to two factors enabled simple graphics to be used), identifying the underlying structure when there are greater numbers of components and factors can be difficult, making the use of factor analysis essential for complex data structures.

Conclusion

Factor analysis is a very common technique in management research and is used extensively in the analysis of questionnaire data. It is particularly useful in identifying the underlying structure in data sets and can contribute theoretical insights into the research area. Factor analysis can also reduce the number of variables needed to represent relationships (it is commonly referred to as a data-reduction technique) and thereby provide benefits when the data are used for model-building (for example, generalised linear models).

Further Reading

Bernaards, C. A. and Jennrich, R. I. (2006). Gradient Projection Algorithms and Software for Arbitrary Rotation Criteria in Factor Analysis. Educational and Psychological Measurement, 65(5).
Browne, M. W. (2001). An Overview of Analytic Rotation in Exploratory Factor Analysis. Multivariate Behavioral Research, 36(1).
Fava, J. L. and Velicer, W. F. (1992). An empirical comparison of factor, image, component, and scale scores. Multivariate Behavioral Research, 27.
Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist: an introduction to generalized linear models. Sage Publications.
Kim, J. and Mueller, C. W. (1994). Factor Analysis: Statistical Methods and Practical Issues. In M. S. Lewis-Beck (editor), Factor Analysis and Related Techniques. International Handbooks of Quantitative Applications in the Social Sciences, Volume 5. Sage Publications.

Graeme Hutcheson
Manchester University

Nick Sofroniou
SI Research, Athens