Factor Analysis and Structural Equation Modeling: Exploratory and Confirmatory Factor Analysis


Hun Myoung Park
International University of Japan

1. Glance at an Example

Suppose you have a mental model with two concepts (constructs): economic value and moral value. These concepts are called latent variables or factors in the sense that they are not observable in reality. You cannot observe economic and moral values directly, although you have some sense of what these concepts mean.

Figure 1. A Measurement Model with Two Latent Variables

Now you want to measure both economic and moral values. Unfortunately, you cannot measure these concepts directly because they are not observable. There is, however, an alternative: measure the latent variables indirectly. Suppose you observe some phenomena that are assumed to manifest the two factors. That is, one group of phenomena is caused (influenced) by the economic value (factor) and the other group by the moral value (factor). These phenomena are captured by observed variables. In Figure 1, private ownership, government responsibility, and competition manifest (or are caused by) the economic value, whereas the moral value is manifested by homosexuality, legalized abortion, and assisted suicide. Notice that the variables from private ownership through assisted suicide are observable, but economic and moral values are not.

However, these causal relationships contain random components. For instance, private ownership (an observed variable) is explained by the economic value (a latent variable), but the economic value alone cannot explain all the variation in private ownership. The portion of variation that the economic value cannot explain is the random part of the manifest variable. This random component is labeled ε1 (or σ1). The impact of the latent variable on the observed variable is represented by β1. This relationship can be written in ordinary least squares (OLS) style as follows.

$$\text{Private ownership} = \alpha_1 + \beta_1 \cdot \text{economic value} + \varepsilon_1$$

2. What Do We Want to Know?

The question here is whether the mental model in your mind is correct. If your mental model is supported by statistical inference, you will have reliable measures of the economic and moral values. These latent variables are not observable but are measured indirectly through observed (manifest) variables. The implication of this method is that you can reduce data (draw only two variables out of six observed variables). Once you successfully reduce many observed variables to a few latent variables, you can use these latent variables as dependent and independent variables in quantitative methods like OLS. If your mental model turns out to be incorrect, you have to modify it and test it again.

Factor analysis is a data reduction technique that examines the relationship between observed and latent variables (factors). The model that links manifest variables to unobserved factors is called a measurement model. The related terminology is summarized below.

Term                 Alternative names               Role/Relationship
Latent variables     Factors, constructs, concepts   Cause (are manifested by) observed variables
Observed variables   Manifest variables              Manifest (are caused by) latent variables

The fundamental questions of factor analysis are (1) to what extent observed variables are significantly caused by factors (confirming your mental model, i.e., the measurement model) and (2) how to aggregate observed variables if they turn out to be significantly influenced by the corresponding latent variables. There are two approaches to confirming your mental model: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). EFA does not impose any constraints on the model, while CFA places substantive constraints. EFA is data driven, whereas CFA is theory driven. Once your measurement model is supported statistically, you may calculate factor scores of the latent variables on the basis of the factor analysis.
Alternatively, you can compute a factor-based score, for example, the average of the related observed variables (create a new variable that holds, for each subject, the mean of the three related variables). CFA output itself does not include factor scores.

3. Model Specification

Our mental model is represented as follows (Albright and Park 2009: 3). Assume that all latent and observed variables are mean centered (transformed into deviations from their means) in order to eliminate intercept terms.

$$
\begin{aligned}
x_1 &= \lambda_{11}\xi_1 + \delta_1 \\
x_2 &= \lambda_{21}\xi_1 + \delta_2 \\
x_3 &= \lambda_{31}\xi_1 + \delta_3 \\
x_4 &= \lambda_{42}\xi_2 + \delta_4 \\
x_5 &= \lambda_{52}\xi_2 + \delta_5 \\
x_6 &= \lambda_{62}\xi_2 + \delta_6
\end{aligned}
\qquad\text{or}\qquad X = \Lambda\xi + \delta
$$

in which X is the vector of observed variables, Λ (lambda) is the matrix of factor loadings λ connecting the ξi to the xi, ξ (ksi) is the vector of common factors, and δ (delta) is the vector of unique factors. It is assumed that the error terms have a mean of zero, E(δ) = 0, and that the common and unique factors are uncorrelated, E(ξδ′) = 0. This model is called a measurement model; it describes the relationship between latent variables and observed variables (observed variables are used to measure or estimate the latent variables). How latent variables (hypothetical constructs) directly or indirectly influence changes in other latent variables is specified by the structural model described below.

There are two types of measurement models: one for dependent observed variables, labeled Y, and the other for independent observed variables, labeled X. They are formally written as Y = Λyη + ε and X = Λxξ + δ, and their matrix arrangements look like this:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix}
=
\begin{bmatrix} \lambda^{y}_{11} & 0 \\ \lambda^{y}_{21} & 0 \\ \vdots & \vdots \\ 0 & \lambda^{y}_{p2} \end{bmatrix}
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{bmatrix},
\qquad
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{bmatrix}
=
\begin{bmatrix} \lambda^{x}_{11} \\ \lambda^{x}_{21} \\ \vdots \\ \lambda^{x}_{q1} \end{bmatrix}
\begin{bmatrix} \xi_1 \end{bmatrix}
+
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_q \end{bmatrix}
$$

A structural model describes the causal relationships between X and Y. More precisely, it specifies the causal relationships among the latent endogenous (η, eta) and exogenous (ξ, ksi) variables, describes the causal effects, and assigns the explained and unexplained variance (disturbance term). The formal expression of a structural model is η = Bη + Γξ + ζ, and its matrix arrangement looks like this:

$$
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
=
\begin{bmatrix} 0 & \beta_{12} \\ \beta_{21} & 0 \end{bmatrix}
\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}
+
\begin{bmatrix} \gamma_{11} \\ \gamma_{21} \end{bmatrix}
\begin{bmatrix} \xi_1 \end{bmatrix}
+
\begin{bmatrix} \zeta_1 \\ \zeta_2 \end{bmatrix}
$$

A structural equation model integrates measurement models and structural models. The basic notation used in structural equation modeling is summarized as follows.
η (eta) is an m × 1 random vector of latent dependent, or endogenous, variables.
ξ (ksi) is an n × 1 random vector of latent independent, or exogenous, variables.
y is a p × 1 vector of observed (endogenous) indicators of the dependent latent variables η.
x is a q × 1 vector of observed (exogenous) indicators of the independent latent variables ξ.
ε (epsilon) is a p × 1 vector of measurement errors in the observed endogenous variables y.
δ (delta) is a q × 1 vector of measurement errors in the observed exogenous variables x.
Λy (lambda y) is a p × m coefficient matrix of the regression of y on η.
Λx (lambda x) is a q × n coefficient matrix of the regression of x on ξ.
B (beta) is an m × m coefficient matrix of the η in the structural relationship (B has zeros on the diagonal, and I − B is required to be non-singular).
Γ (gamma) is an m × n coefficient matrix of the ξ in the structural relationship.
ζ (zeta) is an m × 1 vector of equation errors (residuals) in the structural relationship between η and ξ.

The following is an example of a structural equation model used in Byrne (1998: 38).
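The two model equations above can be made concrete numerically. The minimal sketch below, in plain Python, computes the covariance matrix implied by a measurement model, Σ = ΛΦΛ′ + Θ, which holds when ξ and δ are uncorrelated; Φ (the covariance matrix of ξ) and Θ (a diagonal matrix of unique variances) are symbols introduced here as assumptions. It also solves a two-equation structural model η = Bη + Γξ + ζ for η by rewriting it as (I − B)η = Γξ + ζ, which is why I − B must be non-singular. All parameter values are hypothetical, and the function names are illustrative.

```python
"""Sketch: model-implied covariance of a CFA and the reduced form of a small SEM.

All numbers are hypothetical illustration values, not estimates from the text.
"""


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_covariance(lam, phi, theta):
    """Sigma = Lambda * Phi * Lambda' + diag(theta) for X = Lambda xi + delta."""
    sigma = matmul(matmul(lam, phi), transpose(lam))
    for i, t in enumerate(theta):
        sigma[i][i] += t  # unique (error) variances add to the diagonal only
    return sigma


def solve_structural(b, gamma, xi, zeta):
    """Solve eta = B eta + Gamma xi + zeta for two endogenous variables.

    Rearranged as (I - B) eta = Gamma xi + zeta and solved by Cramer's rule.
    """
    a11, a12 = 1.0 - b[0][0], -b[0][1]
    a21, a22 = -b[1][0], 1.0 - b[1][1]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("I - B is singular; no unique solution for eta")
    rhs = [sum(g * x for g, x in zip(gamma[i], xi)) + zeta[i] for i in range(2)]
    eta1 = (rhs[0] * a22 - a12 * rhs[1]) / det
    eta2 = (a11 * rhs[1] - a21 * rhs[0]) / det
    return [eta1, eta2]


# Hypothetical two-factor measurement model: x1-x3 load on xi1, x4-x6 on xi2.
LAMBDA = [[1.0, 0.0], [0.8, 0.0], [0.6, 0.0],
          [0.0, 1.0], [0.0, 0.9], [0.0, 0.7]]
PHI = [[1.5, 0.3], [0.3, 2.0]]          # factor variances and covariance
THETA = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # unique variances of delta

SIGMA = implied_covariance(LAMBDA, PHI, THETA)

# Hypothetical structural model; B has zeros on its diagonal.
B = [[0.0, 0.2], [0.5, 0.0]]
GAMMA = [[0.4], [0.3]]
ETA = solve_structural(B, GAMMA, xi=[1.0], zeta=[0.1, -0.2])

if __name__ == "__main__":
    for row in SIGMA:
        print(" ".join(f"{v:6.3f}" for v in row))
    print("eta =", [round(v, 4) for v in ETA])
```

Note how the zero loadings in Λ make the implied covariance between an economic-type item and a moral-type item depend only on the factor covariance (for example, Σ[0][3] = 1 × 0.3 × 1).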

4. Descriptive Statistics

Once you obtain and clean data, you need to describe them and examine them carefully before conducting statistical inference. Although often ignored in practice, descriptive statistics provide important information and guidance for data analysis.

. sum *

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    privtown |      1200    3.508333    2.259244         -1         10
    govtresp |      1200      4.3075    2.700127         -1         10
     compete |      1200    3.438333     2.39847         -1         10
     homosex |      1200    4.663333    3.317755         -1         10
    abortion |      1200    4.323333    2.991306         -1         10
    euthanas |      1200        2.61    2.474807         -1         10

The correlations among the six observed variables suggest that private ownership and competition are closely related to each other (r = .4061), with government responsibility only weakly related to both, and that homosexuality, legalized abortion, and assisted suicide form another group. Government responsibility is also significantly related to homosexuality, legalized abortion, and assisted suicide, implying that this variable is related to the moral value (the second factor) as well.

. graph matrix privtown-euthanas, half

[Figure: scatterplot matrix (lower half) of privtown, govtresp, compete, homosex, abortion, and euthanas]

. pwcorr privtown-euthanas, sig

             | privtown govtresp  compete  homosex abortion euthanas
-------------+------------------------------------------------------
    privtown |   1.0000
             |
    govtresp |   0.0826   1.0000
             |   0.0042
             |
     compete |   0.4061   0.1113   1.0000
             |   0.0000   0.0001
             |
     homosex |  -0.0199   0.1402  -0.0076   1.0000
             |   0.4915   0.0000   0.7914
             |
    abortion |  -0.0106   0.1041  -0.0027   0.4833   1.0000
             |   0.7127   0.0003   0.9260   0.0000
             |
    euthanas |   0.0335   0.1062   0.0621   0.3589   0.4071   1.0000
             |   0.2455   0.0002   0.0314   0.0000   0.0000

5. Exploratory Factor Analysis

Factor analysis proceeds through (1) a mental model (measurement model), (2) data collection and cleaning, (3) data description (descriptive statistics), (4) factor extraction (determining the number of factors), (5) rotation (choosing a rotation method), (6) interpretation and labeling, and (7) calculation of factor scores. The core steps are extraction of the factors, determination of the number of meaningful factors, rotation, and creation of factor scores. Let us first fit an exploratory measurement model without any constraints.

. factor privtown-euthanas
(obs=1200)

Factor analysis/correlation                        Number of obs    =     1200
    Method: principal factors                      Retained factors =        2
    Rotation: (unrotated)                          Number of params =       11

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.13871      0.53063            0.9871       0.9871
        Factor2  |      0.60808      0.62757            0.5271       1.5142
        Factor3  |     -0.01949      0.10605           -0.0169       1.4973
        Factor4  |     -0.12554      0.08365           -0.1088       1.3885
        Factor5  |     -0.20919      0.02978           -0.1813       1.2072
        Factor6  |     -0.23897            .           -0.2072       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  854.44 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness
    -------------+--------------------+--------------
        privtown |   0.0463    0.5310 |       0.7158
        govtresp |   0.2025    0.1502 |       0.9364
         compete |   0.0714    0.5389 |       0.7045
         homosex |   0.6158   -0.0809 |       0.6143
        abortion |   0.6422   -0.0799 |       0.5812
        euthanas |   0.5467    0.0141 |       0.7010
    -------------------------------------------------

The Stata output above displays the eigenvalue of each factor and its proportion. A large drop between successive eigenvalues indicates the appropriate number of factors to extract, whereas a tiny difference (e.g., .10605 between the eigenvalues of factors 3 and 4) means that factors 3 and 4 are not clearly distinguished. Accordingly, there appear to be two factors in the measurement model. The factor loadings of the two factors are listed next (Albright and Park 2009: 3). For example, .0463 is the loading of private ownership on factor 1. The squared factor loading, .00214 = .0463², is the proportion of variance in an observed variable that is explained by that factor; the sum of the squared loadings across factors is the communality. That is, factor 1 explains only .21 percent of the variance in private ownership. Similarly, 28.1961 percent of the variance

of private ownership (= .5310²) is explained by factor 2. The last column is the uniqueness, the proportion of variance in an observed variable that is not explained by the retained factors; for instance, 71.58 percent of the variance in private ownership is explained by neither factor 1 nor factor 2. By definition, the sum of the squared factor loadings (the communality) and the uniqueness is 1: .0463² + .5310² + .7158 ≈ 1. Therefore, this mental model appears to fit private ownership poorly.

Cattell (1966: 26-27) suggests interpretability criteria for factor analysis:
(1) there are at least three variables (items) with significant loadings on each retained component (latent variable);
(2) the variables that load on a given component share the same conceptual meaning;
(3) the variables that load on different components seem to be measuring different constructs; and
(4) the rotated factor pattern demonstrates simple structure.
Simple structure here means that (1) most of the variables have relatively high factor loadings on only one component (factor) and near-zero loadings on the other components, and (2) most components have relatively high factor loadings for some variables and near-zero loadings for the remaining variables.

Stata can draw a factor loading plot and a scree plot that visualize the result of factor analysis. From the loading plot below, we can infer that there are two groups of observed variables (private ownership and competition versus homosexuality, legalized abortion, and assisted suicide), with government responsibility located in an ambiguous area between them. The scree plot implies that two factors are reasonable, since the eigenvalues change only marginally after the second factor.

. loadingplot
. screeplot

[Figures: factor loadings plot (Factor 1 vs. Factor 2) and scree plot of eigenvalues after factor]

The next step is to rotate the factor loadings to sharpen the distinction between factors.
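The arithmetic behind the loadings table can be checked by hand. The plain-Python sketch below copies the unrotated loadings and uniquenesses from the factor output, computes communalities, and confirms that communality plus uniqueness is about 1 and that the column sums of squared loadings reproduce the eigenvalues 1.13871 and .60808. It then post-multiplies the loadings by the varimax rotation matrix that Stata reports after rotate, varimax, illustrating that an orthogonal rotation redistributes the loadings between factors without changing any variable's communality. Small discrepancies come from the four-decimal rounding of the printed values; the variable and function names are illustrative.

```python
"""Check communalities and an orthogonal rotation using printed Stata output.

Values are copied (rounded to four decimals) from the factor and
rotate, varimax output; tiny mismatches are rounding error.
"""

ITEMS = ["privtown", "govtresp", "compete", "homosex", "abortion", "euthanas"]

# Unrotated principal-factor loadings (Factor1, Factor2) and uniquenesses.
LOADINGS = [[0.0463, 0.5310], [0.2025, 0.1502], [0.0714, 0.5389],
            [0.6158, -0.0809], [0.6422, -0.0799], [0.5467, 0.0141]]
UNIQUENESS = [0.7158, 0.9364, 0.7045, 0.6143, 0.5812, 0.7010]

# Varimax rotation matrix T as printed by Stata (rows/columns Factor1, Factor2).
T = [[0.9942, 0.1075], [-0.1075, 0.9942]]


def communality(row):
    """Sum of squared loadings: share of variance explained by all factors."""
    return sum(x * x for x in row)


def rotate(loadings, t):
    """Rotated loadings = unrotated loadings multiplied by T."""
    return [[sum(row[k] * t[k][j] for k in range(2)) for j in range(2)]
            for row in loadings]


ROTATED = rotate(LOADINGS, T)

# Column sums of squared loadings: eigenvalues before rotation, variances after.
COL_SS = [sum(row[j] ** 2 for row in LOADINGS) for j in range(2)]
ROT_COL_SS = [sum(row[j] ** 2 for row in ROTATED) for j in range(2)]

if __name__ == "__main__":
    for name, row, u, r in zip(ITEMS, LOADINGS, UNIQUENESS, ROTATED):
        h2 = communality(row)
        print(f"{name:9s} h2={h2:.4f} h2+u={h2 + u:.4f} "
              f"rotated=({r[0]:+.4f}, {r[1]:+.4f}) h2_rot={communality(r):.4f}")
    print("column SS unrotated:", [round(v, 4) for v in COL_SS])
    print("column SS rotated:  ", [round(v, 4) for v in ROT_COL_SS])
```

The rotated column sums (about 1.1326 and .6142) match the Variance column of the varimax output, and their total equals the total before rotation.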
There are two types of rotation: orthogonal and non-orthogonal (oblique). Varimax, quartimax, equamax, and parsimax are common orthogonal methods, whereas promax and quartimin are commonly used oblique methods.

. rotate, varimax

Factor analysis/correlation                        Number of obs    =     1200
    Method: principal factors                      Retained factors =        2
    Rotation: orthogonal varimax (Kaiser off)      Number of params =       11

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.13257      0.51836            0.9818       0.9818
        Factor2  |      0.61421            .            0.5324       1.5142
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  854.44 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness
    -------------+--------------------+--------------
        privtown |  -0.0111    0.5329 |       0.7158
        govtresp |   0.1852    0.1711 |       0.9364
         compete |   0.0130    0.5434 |       0.7045
         homosex |   0.6209   -0.0143 |       0.6143
        abortion |   0.6471   -0.0104 |       0.5812
        euthanas |   0.5420    0.0728 |       0.7010
    -------------------------------------------------

Factor rotation matrix

    --------------------------------
                 |  Factor1  Factor2
    -------------+------------------
         Factor1 |   0.9942   0.1075
         Factor2 |  -0.1075   0.9942
    --------------------------------

Rotation yields a different pattern matrix (factor loadings). It does not change the underlying information, such as the uniquenesses or the total variance explained by the two factors (the cumulative proportion is 1.5142 before and after rotation); it simply views the same solution from a different perspective. Let us draw a loading plot and a scree plot on the basis of the varimax rotation. These plots are not substantially different from those above (without rotation).

[Figures: factor loadings plot (Factor 1 vs. Factor 2; rotation: orthogonal varimax; method: principal factors) and scree plot of eigenvalues after factor]

Stata has a command to compare factor loadings before and after rotation.
Since the uniquenesses remain unchanged, only the factor loadings are compared.

. estat rotatecompare

Rotation matrix -- orthogonal varimax (Kaiser off)

    ------------------------------------
        Variable |  Factor1   Factor2
    -------------+----------------------
         Factor1 |   0.9942    0.1075
         Factor2 |  -0.1075    0.9942
    ------------------------------------

Factor loadings

    -----------------------------------------------------------
                 |        Rotated       |       Unrotated
        Variable |  Factor1   Factor2   |  Factor1   Factor2
    -------------+----------------------+----------------------
        privtown |  -0.0111    0.5329   |   0.0463    0.5310
        govtresp |   0.1852    0.1711   |   0.2025    0.1502
         compete |   0.0130    0.5434   |   0.0714    0.5389
         homosex |   0.6209   -0.0143   |   0.6158   -0.0809
        abortion |   0.6471   -0.0104   |   0.6422   -0.0799
        euthanas |   0.5420    0.0728   |   0.5467    0.0141
    -----------------------------------------------------------

Once you have extracted the factors successfully, you need to aggregate the information in the observed variables. Assuming the rotated measurement model is correct, you can obtain factor scores for the two factors by running the predict command. Stata will create two variables, factor1 and factor2, and add them to the current dataset.

. predict factor1 factor2
(regression scoring assumed)

Scoring coefficients (method = regression; based on varimax rotated factors)

    ----------------------------------
        Variable |  Factor1   Factor2
    -------------+--------------------
        privtown | -0.01448   0.36833
        govtresp |  0.07292   0.09930
         compete | -0.00202   0.37982
         homosex |  0.33748  -0.02270
        abortion |  0.36897  -0.02257
        euthanas |  0.26351   0.04362
    ----------------------------------

. sum factor1 factor2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     factor1 |      1200    1.06e-09     .777661  -1.583102   2.201795
     factor2 |      1200   -5.07e-10    .6507104   -1.70306   2.322467

Alternatively, you can calculate a factor-based score, that is, the average of the related observed variables. Note the following egen commands with the rowmean() function.

. egen f1 = rowmean(privtown govtresp compete)
. egen f2 = rowmean(homosex abortion euthanas)

When comparing factor scores and factor-based scores, we observe a big difference between the two sets of scores. Part of the difference is one of scale: factor scores are centered at zero, whereas factor-based scores keep the original scale of the items. Note also that the moral items load on the rotated factor 1 and the economic items on factor 2, so factor1 corresponds to f2 and factor2 to f1. Factor scores are recommended since they are based on the estimated factor model. Cattell (1966) says that a factor score (component score) is a linear composite of the optimally weighted observed variables, whereas a factor-based score is a linear composite of the variables that demonstrated meaningful loadings for the component in question (p. 31). Also see O'Rourke and Hatcher (2013: 72-74).

. sum factor1 f1 factor2 f2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     factor1 |      1200    1.06e-09     .777661  -1.583102   2.201795
          f1 |      1200    3.751389    1.666624         -1         10
     factor2 |      1200   -5.07e-10    .6507104   -1.70306   2.322467
          f2 |      1200    3.865556    2.299605         -1         10

Before moving forward, you need to check the reliability of the observed variables. You can calculate alpha (a measure of reliability) using the alpha command. As a rule of thumb, if alpha is larger than .8, the set of variables is assumed to measure the same factor. The first set of observed variables below yields an alpha of .4286, implying that the measurement model of the economic value does not fit the data well.
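For standardized items, Cronbach's alpha depends only on the number of items k and their average inter-item correlation r̄: α = k·r̄ / (1 + (k − 1)·r̄). The minimal plain-Python sketch below plugs in the pairwise correlations from the pwcorr output above. It should reproduce the test-scale alphas that the alpha commands (with the std option) report, and the alpha column for each item, which is the alpha of the scale with that item removed. The function name is illustrative.

```python
"""Standardized Cronbach's alpha from pairwise correlations (pwcorr output)."""
from itertools import combinations


def standardized_alpha(corr, items):
    """alpha = k * rbar / (1 + (k - 1) * rbar), rbar = mean inter-item correlation."""
    pairs = list(combinations(sorted(items), 2))
    rbar = sum(corr[p] for p in pairs) / len(pairs)
    k = len(items)
    return k * rbar / (1 + (k - 1) * rbar)


# Pairwise correlations copied from the pwcorr table (keys sorted alphabetically).
CORR = {
    ("compete", "govtresp"): 0.1113,
    ("compete", "privtown"): 0.4061,
    ("govtresp", "privtown"): 0.0826,
    ("abortion", "homosex"): 0.4833,
    ("euthanas", "homosex"): 0.3589,
    ("abortion", "euthanas"): 0.4071,
}

ECONOMIC = ["privtown", "govtresp", "compete"]
MORAL = ["homosex", "abortion", "euthanas"]

if __name__ == "__main__":
    for label, items in [("economic", ECONOMIC), ("moral", MORAL)]:
        print(f"{label}: alpha = {standardized_alpha(CORR, items):.4f}")
        for dropped in items:
            rest = [i for i in items if i != dropped]
            print(f"  without {dropped}: {standardized_alpha(CORR, rest):.4f}")
```

The economic items average only r̄ = .2000, which is why their alpha stays low even though private ownership and competition correlate at .4061.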

. alpha privtown govtresp compete, asis item std

Test scale = mean(standardized items)

                                                            average
                             item-test     item-rest       interitem
Item         |  Obs  Sign   correlation   correlation     correlation     alpha
-------------+-----------------------------------------------------------------
privtown     | 1200    +       0.7264        0.3278          0.1113      0.2003
govtresp     | 1200    +       0.5826        0.1156          0.4061      0.5777
compete      | 1200    +       0.7404        0.3516          0.0826      0.1527
-------------+-----------------------------------------------------------------
Test scale   |                                               0.2000      0.4286
-------------------------------------------------------------------------------

By contrast, homosexuality, legalized abortion, and assisted suicide are more reliable measures of the moral value (alpha = .6816).

. alpha homosex abortion euthanas, asis item std

Test scale = mean(standardized items)

                                                            average
                             item-test     item-rest       interitem
Item         |  Obs  Sign   correlation   correlation     correlation     alpha
-------------+-----------------------------------------------------------------
homosex      | 1200    +       0.7856        0.5020          0.4071      0.5786
abortion     | 1200    +       0.8062        0.5401          0.3589      0.5282
euthanas     | 1200    +       0.7531        0.4447          0.4833      0.6516
-------------+-----------------------------------------------------------------
Test scale   |                                               0.4164      0.6816
-------------------------------------------------------------------------------

6. Confirmatory Factor Analysis 1

Stata's sem command fits confirmatory factor analysis and structural equation models. You need to specify a measurement model within parentheses, as shown in the following command. Here the latent variable is Values (one latent variable). The arrow -> indicates a causal relationship between the latent and observed variables, and method(ml) tells Stata to use the maximum likelihood method for estimation.

. sem (Values -> privtown govtresp compete homosex abortion euthanas), method(ml)

Endogenous variables

Measurement:  privtown govtresp compete homosex abortion euthanas

Exogenous variables

Latent:       Values

Fitting target model:

Iteration 0:   log likelihood =  -17155.58
Iteration 1:   log likelihood =  -17154.46
Iteration 2:   log likelihood = -17154.399
Iteration 3:   log likelihood = -17154.399

Structural equation model                       Number of obs      =      1200
Estimation method  = ml
Log likelihood     = -17154.399

 ( 1)  [privtown]values = 1
------------------------------------------------------------------------------------
                     |                 OIM
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+--------------------------------------------------------------
Measurement          |
  privtown <-        |
              Values |          1  (constrained)
               _cons |   3.508333   .0651917    53.82   0.000      3.38056    3.636107
  govtresp <-        |
              Values |   .3519197   .0847335     4.15   0.000     .1858451    .5179943
               _cons |     4.3075   .0779135    55.29   0.000     4.154792    4.460208
  compete <-         |
              Values |   1.280656   .2595241     4.93   0.000     .7719986    1.789314
               _cons |   3.438333   .0692088    49.68   0.000     3.302687     3.57398
  homosex <-         |
              Values |   .1101116   .1216503     0.91   0.365    -.1283186    .3485418
               _cons |   4.663333   .0957354    48.71   0.000     4.475695    4.850971
  abortion <-        |
              Values |    .115648   .1105949     1.05   0.296     -.101114      .33241
               _cons |   4.323333   .0863156    50.09   0.000     4.154158    4.492509
  euthanas <-        |
              Values |   .2180622   .0865095     2.52   0.012     .0485068    .3876177
               _cons |       2.61   .0714118    36.55   0.000     2.470036    2.749964
---------------------+--------------------------------------------------------------
     var(e.privtown) |   3.415275   .3523314                      2.790056    4.180599
     var(e.govtresp) |   7.075966   .2967287                      6.517647    7.682112
      var(e.compete) |   2.984817   .5986811                      2.014594    4.422296
      var(e.homosex) |    10.9779   .4496889                      10.13098    11.89562
     var(e.abortion) |   8.917925   .3658081                      8.229018    9.664505
     var(e.euthanas) |   6.039458   .2517568                      5.565643     6.55361
         var(values) |   1.684681   .3586063                      1.110014    2.556861
------------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(9)   =    616.17, Prob > chi2 = 0.0000

Look at the parameter estimates. The loading of the latent variable on private ownership is constrained to 1 in order to set the scale (metric) of the latent variable; without such a constraint, the measurement model would be under-identified. The unstandardized coefficient on government responsibility is .3519 and its standard error is .0847 (p < .001). Therefore, the latent variable significantly causes government responsibility. By contrast, the coefficient of the latent variable on homosexuality is .1101 and its standard error is .1217 (p = .365). Therefore, the relationship between the latent variable and the observed variable homosexuality is questionable. The last line of the Stata output is the chi-square test of model fit. The null hypothesis is that the covariance matrix implied by the model equals the population covariance matrix, Σ = Σ(θ); the test compares the model-implied covariance matrix with the sample covariance matrix S.
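The p-value of this likelihood-ratio test is the upper-tail probability of a chi-square distribution with the model's degrees of freedom. A minimal standard-library Python sketch for integer degrees of freedom is below. Rather than calling a statistics package, it uses the recurrence for the regularized upper incomplete gamma function; the function name chi2_sf is illustrative (it mirrors the name of scipy.stats.chi2.sf, which is not used here).

```python
"""Upper-tail chi-square probability for integer degrees of freedom (stdlib only)."""
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), df a positive integer, x > 0.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function, a = df / 2 and y = x / 2.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        # Work on the log scale so very large x does not overflow y**a.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


if __name__ == "__main__":
    # One-factor model: LR test of model vs. saturated, chi2(9) = 616.17.
    print(f"one-factor model: p = {chi2_sf(616.17, 9):.3g}")
    # Two-factor model (Section 7): chi2(8) = 35.73.
    print(f"two-factor model: p = {chi2_sf(35.73, 8):.3g}")
```

For comparison, the .01 critical value of a chi-square with 9 degrees of freedom is about 21.67, far below 616.17; even the two-factor model's chi2(8) = 35.73 gives p of roughly .00002.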
The chi-square of 616.17 with 9 degrees of freedom is large enough to reject the null hypothesis at the .01 level (p < .0001). That is, the covariance matrix implied by the model differs significantly from the observed covariance matrix; the measurement model does not fit the data well. The parameter estimates need to be standardized to evaluate the impact of the latent variable on the observed variables substantively. Once you have fit a measurement model, replay the sem results with the standardized option.

. sem, standardized

Structural equation model                       Number of obs      =      1200
Estimation method  = ml
Log likelihood     = -17154.399

 ( 1)  [privtown]values = 1
------------------------------------------------------------------------------------
                     |                 OIM
        Standardized |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+--------------------------------------------------------------
Measurement          |
  privtown <-        |
              Values |   .5747456   .0584807     9.83   0.000     .4601256    .6893656
               _cons |   1.553523   .0428827    36.23   0.000     1.469474    1.637571
  govtresp <-        |
              Values |   .1692386   .0392781     4.31   0.000     .0922549    .2462222
               _cons |   1.595961   .0435272    36.67   0.000     1.510649    1.681272
  compete <-         |
              Values |   .6933293   .0750543     9.24   0.000     .5462257     .840433
               _cons |   1.434155   .0411136    34.88   0.000     1.353574    1.514736
  homosex <-         |
              Values |   .0430952   .0484466     0.89   0.374    -.0518584    .1380487
               _cons |   1.406155   .0407087    34.54   0.000     1.326368    1.485943
  abortion <-        |
              Values |   .0502016   .0490001     1.02   0.306    -.0458368      .14624
               _cons |   1.445902   .0412847    35.02   0.000     1.364985    1.526819
  euthanas <-        |
              Values |    .114414   .0461772     2.48   0.013     .0239083    .2049196
               _cons |   1.055067    .036016    29.29   0.000     .9844773    1.125657
---------------------+--------------------------------------------------------------
     var(e.privtown) |   .6696675    .067223                      .5500642    .8152768
     var(e.govtresp) |   .9713583   .0132947                      .9456475    .9977682
      var(e.compete) |   .5192945   .1040746                      .3506063     .769144
      var(e.homosex) |   .9981428   .0041756                      .9899922    1.006361
     var(e.abortion) |   .9974798   .0049198                      .9878837    1.007169
     var(e.euthanas) |   .9869094   .0105666                       .966415    1.007838
         var(values) |          1          .                             .           .
------------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(9)   =    616.17, Prob > chi2 = 0.0000

The standardized parameter estimate on private ownership is .5747 and its standard error is .0585 (p < .001). The coefficient on homosexuality is .0431 and its standard error is .0484 (p = .374). The chi-square test remains unchanged after standardization. The estat eqgof command displays the R-squared of each measurement equation. For instance, the R-squared of the latent variable on private ownership is .3303 = .5747². This figure is interpreted like the R-squared in OLS: the latent variable explains 33.03 percent of the variation in private ownership. The estat eqgof output suggests that the latent variable cannot sufficiently explain government responsibility (2.86%), homosexuality (.19%), legalized abortion (.25%), or assisted suicide (1.31%). Yes, this measurement model with one latent variable is really bad.
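The standardized loadings and R-squared values follow directly from the unstandardized estimates. For a single-factor model with loading λ, factor variance φ, and error variance θ, the model-implied variance of an indicator is λ²φ + θ, the standardized loading is λ√φ divided by the square root of that variance, and R² is the squared standardized loading. The minimal standard-library Python sketch below applies these formulas to the unstandardized estimates from the first sem output; its results should match the standardized coefficients above and the fitted, predicted, and R-squared columns of the estat eqgof output. The names used are illustrative.

```python
"""Standardized loadings and R-squared from unstandardized CFA estimates."""
import math

PHI = 1.684681  # var(values): estimated variance of the latent variable

# (loading, error variance) from the unstandardized one-factor sem output.
ESTIMATES = {
    "privtown": (1.0, 3.415275),  # loading constrained to 1
    "govtresp": (0.3519197, 7.075966),
    "compete": (1.280656, 2.984817),
    "homosex": (0.1101116, 10.9779),
    "abortion": (0.115648, 8.917925),
    "euthanas": (0.2180622, 6.039458),
}


def decompose(loading, error_var, phi=PHI):
    """Return (implied variance, standardized loading, R-squared)."""
    predicted = loading ** 2 * phi   # variance explained by the factor
    fitted = predicted + error_var   # model-implied total variance
    std_loading = loading * math.sqrt(phi / fitted)
    return fitted, std_loading, predicted / fitted


RESULTS = {name: decompose(*est) for name, est in ESTIMATES.items()}

if __name__ == "__main__":
    for name, (fitted, std, r2) in RESULTS.items():
        print(f"{name:9s} fitted={fitted:8.4f} std={std:.4f} R2={r2:.4f}")
```

The sketch makes the source of the poor fit visible: for homosex, abortion, and euthanas the error variance dwarfs λ²φ, so R² stays near zero.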
. estat eqgof

Equation-level goodness of fit

  --------------------------------------------------------------------------------
                |            Variance              |
        depvars |    fitted  predicted   residual  |  R-squared         mc        mc2
  --------------+----------------------------------+-----------------------------
  observed      |
       privtown |  5.099957   1.684681   3.415275  |   .3303325   .5747456   .3303325
       govtresp |  7.284609   .2086435   7.075966  |   .0286417   .1692386   .0286417
        compete |  5.747831   2.763014   2.984817  |   .4807055   .6933293   .4807055
        homosex |  10.99832    .020426    10.9779  |   .0018572   .0430952   .0018572
       abortion |  8.940457   .0225317   8.917925  |   .0025202   .0502016   .0025202
       euthanas |  6.119567   .0801085   6.039458  |   .0130906    .114414   .0130906
  --------------+----------------------------------+-----------------------------
        overall |                                  |   .5945024
  --------------------------------------------------------------------------------
  mc  = correlation between depvar and its prediction
  mc2 = mc^2 is the Bentler-Raykov squared multiple correlation coefficient

Stata can produce various goodness-of-fit measures. Several measures are used because the chi-square test does not always produce reliable results (for example, with large samples it tends to reject even models that fit reasonably well). You should report at least the chi-square, the RMSEA (root mean square error of approximation), and the CFI (comparative fit index).

. estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Likelihood ratio     |
          chi2_ms(9) |    616.175   model vs. saturated
            p > chi2 |      0.000
         chi2_bs(15) |    856.459   baseline vs. saturated
            p > chi2 |      0.000
---------------------+------------------------------------------------------
Population error     |
               RMSEA |      0.237   Root mean squared error of approximation
 90% CI, lower bound |      0.221
         upper bound |      0.253
              pclose |      0.000   Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Information criteria |
                 AIC |  34344.798   Akaike's information criterion
                 BIC |  34436.419   Bayesian information criterion
---------------------+------------------------------------------------------
Baseline comparison  |
                 CFI |      0.278   Comparative fit index
                 TLI |     -0.203   Tucker-Lewis index
---------------------+------------------------------------------------------
Size of residuals    |
                SRMR |      0.144   Standardized root mean squared residual
                  CD |      0.595   Coefficient of determination
----------------------------------------------------------------------------

RMSEA incorporates a penalty function for poor model parsimony (Brown 2006: 83-84). If RMSEA is less than .05, you can conclude that the measurement model fits the data well. CFI evaluates the fit of a user-specified solution in relation to a more restricted, nested baseline model (Brown 2006: 84). CFI ranges from 0 to 1; a CFI larger than .9 indicates a good fit. This measurement model with one latent variable does not fit the data well since (1) the chi-square is large (p < .0001), (2) the RMSEA of .237 is larger than .05, and (3) the CFI of .278 is smaller than .9.

What should you do next? Modification indices suggest which constraints to relax. Stata's postestimation command estat mindices produces modification indices.
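Before turning to the modification indices, note that the RMSEA, CFI, and TLI above, and the information criteria, can be reproduced from quantities in the Stata output. The minimal standard-library Python sketch below uses the standard formulas: RMSEA = √(max(χ²M − dfM, 0) / (dfM (N − 1))), CFI = 1 − max(χ²M − dfM, 0) / max(χ²B − dfB, χ²M − dfM, 0), TLI = (χ²B/dfB − χ²M/dfM) / (χ²B/dfB − 1), AIC = −2 log L + 2k, and BIC = −2 log L + k ln N, where M is the fitted model and B the baseline model. Programs differ on N versus N − 1 in the RMSEA denominator; with N = 1200 the difference is negligible. The parameter count k = 18 for the one-factor model (6 intercepts, 5 free loadings, 6 error variances, and 1 factor variance) is a tally of the free parameters in the output, not a number Stata prints. The two-factor values use chi2(8) = 35.726 from Section 7 with the same baseline chi-square; the function names are illustrative.

```python
"""Fit indices from chi-square statistics and the log likelihood (stdlib only)."""
import math

N = 1200
CHI2_B, DF_B = 856.459, 15  # baseline vs. saturated (same data for both models)


def rmsea(chi2, df, n=N):
    """Root mean square error of approximation (N - 1 form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_b=CHI2_B, df_b=DF_B):
    """Comparative fit index relative to the baseline model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2_b - df_b, chi2 - df, 0.0)
    return 1.0 - num / den


def tli(chi2, df, chi2_b=CHI2_B, df_b=DF_B):
    """Tucker-Lewis index; can fall below 0 for very poor models."""
    return (chi2_b / df_b - chi2 / df) / (chi2_b / df_b - 1.0)


def aic_bic(loglik, k, n=N):
    """Akaike and Bayesian information criteria for k free parameters."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)


MODELS = {"one factor": (616.175, 9), "two factors": (35.726, 8)}

if __name__ == "__main__":
    for name, (chi2, df) in MODELS.items():
        print(f"{name:11s} RMSEA={rmsea(chi2, df):.3f} "
              f"CFI={cfi(chi2, df):.3f} TLI={tli(chi2, df):.3f}")
    aic, bic = aic_bic(-17154.399, k=18)
    print(f"one factor  AIC={aic:.3f} BIC={bic:.3f}")
```

Because both indices depend on χ² − df, dropping the chi-square from 616.175 to 35.726 moves the RMSEA from about .237 to about .054 and the CFI from about .278 to about .967.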
. estat mindices

Modification indices

  --------------------------------------------------------------------------
                             |                                    Standard
                             |        MI     df   P>MI        EPC        EPC
  ---------------------------+----------------------------------------------
  cov(e.privtown,e.compete)  |   164.626      1   0.00   33.82137   10.59301
  cov(e.privtown,e.homosex)  |     5.607      1   0.02  -.5239465  -.0855686
  cov(e.privtown,e.abortion) |     4.455      1   0.03  -.4239463  -.0768186
  cov(e.privtown,e.euthanas) |     3.914      1   0.05  -.3738381  -.0823136
  cov(e.govtresp,e.homosex)  |    22.360      1   0.00   1.216426   .1380172
  cov(e.govtresp,e.abortion) |    11.569      1   0.00   .7888493   .0993046
  cov(e.govtresp,e.euthanas) |     9.754      1   0.00   .5990838   .0916422
  cov(e.compete,e.homosex)   |     9.224      1   0.00  -.8442552  -.1474875
  cov(e.compete,e.abortion)  |     9.358      1   0.00  -.7724137  -.1497129
  cov(e.homosex,e.abortion)  |   279.824      1   0.00   4.785221    .483627
  cov(e.homosex,e.euthanas)  |   154.265      1   0.00   2.934899   .3604414
  cov(e.abortion,e.euthanas) |   198.536      1   0.00   3.001695   .4090117
  --------------------------------------------------------------------------
  EPC = expected parameter change

The first index says that if you allow the errors of private ownership and competition to covary in the current measurement model, the chi-square is expected to decrease by about 164.626.

7. Confirmatory Factor Analysis 2

Now let us fit the original measurement model with two latent variables. The specification is that
(1) the economic value causes private ownership, government responsibility, and competition,

(2) the moral value causes homosexuality, legalized abortion, and assisted suicide, and (3) two latent variables economic and moral values are correlated: cov(economic*morals).. sem (Economic -> privtown govtresp compete) (Morals -> homosex abortion euthanas), method(ml) cov(economic*morals) Endogenous variables Measurement: privtown govtresp compete homosex abortion euthanas Exogenous variables Latent: Economic Morals Fitting target model: Iteration 0: log likelihood = -16864.451 Iteration 1: log likelihood = -16864.177 Iteration 2: log likelihood = -16864.175 Iteration 3: log likelihood = -16864.175 Structural equation model Number of obs = 1200 Estimation method = ml Log likelihood = -16864.175 ( 1) [privtown]economic = 1 ( 2) [homosex]morals = 1 ------------------------------------------------------------------------------------- OIM Coef. Std. Err. z P> z [95% Conf. Interval] -- Measurement privtown <- Economic 1 (constrained) _cons 3.508333.0651916 53.82 0.000 3.38056 3.636106 govtresp <- Economic.3318497.0857226 3.87 0.000.1638365.4998629 _cons 4.3075.0779135 55.29 0.000 4.154792 4.460208 compete <- Economic 1.420458.4544731 3.13 0.002.5297075 2.311209 _cons 3.438333.069209 49.68 0.000 3.302686 3.57398 homosex <- Morals 1 (constrained) _cons 4.663333.0957354 48.71 0.000 4.475695 4.850971 abortion <- Morals 1.021741.0787282 12.98 0.000.867437 1.176046 _cons 4.323333.0863156 50.09 0.000 4.154158 4.492509 euthanas <- Morals.6295779.0469439 13.41 0.000.5375695.7215863 _cons 2.61.0714118 36.55 0.000 2.470036 2.749964 -- var(e.privtown) 3.55354.5126355 2.678345 4.714721 var(e.govtresp) 7.114315.2960741 6.557057 7.718933 var(e.compete) 2.627709 1.000575 1.245828 5.542384 var(e.homosex) 6.313344.4211254 5.539632 7.19512 var(e.abortion) 4.049547.3855569 3.360189 4.880331 var(e.euthanas) 4.26259.2190258 3.854216 4.714233 var(economic) 1.54639.5138593.806247 2.965992 var(morals) 4.684978.4960753 3.806948 5.765516 -- cov(economic,morals).0768921.1201458 0.64 0.522 
-.1585895.3123736 ------------------------------------------------------------------------------------- LR test of model vs. saturated: chi2(8) = 35.73, Prob > chi2 = 0.0000 All standardized coefficients turn out statistically significant (discernable from zero). This two latent variable model appears to be better than one latent variable model.. sem, standardized

Structural equation model                       Number of obs     =       1200
Estimation method  = ml
Log likelihood     = -16864.175

 ( 1)  [privtown]Economic = 1
 ( 2)  [homosex]Morals = 1
-------------------------------------------------------------------------------------------
                          |                 OIM
Standardized              |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
Measurement               |
  privtown <-             |
                 Economic |   .5506522   .0898219     6.13   0.000     .3746045       .7267
                    _cons |   1.553527   .0428828    36.23   0.000     1.469478    1.637576
  govtresp <-             |
                 Economic |   .1528966   .0382189     4.00   0.000      .077989    .2278042
                    _cons |    1.59596   .0435272    36.67   0.000     1.510649    1.681272
  compete <-              |
                 Economic |   .7367749   .1181931     6.23   0.000     .5051206    .9684292
                    _cons |   1.434151   .0411136    34.88   0.000     1.353569    1.514732
  homosex <-              |
                   Morals |   .6526653   .0285085    22.89   0.000     .5967896    .7085411
                    _cons |   1.406155   .0407087    34.54   0.000     1.326368    1.485943
  abortion <-             |
                   Morals |   .7396307   .0294042    25.15   0.000     .6819996    .7972618
                    _cons |   1.445902   .0412847    35.02   0.000     1.364986    1.526819
  euthanas <-             |
                   Morals |   .5508621   .0281688    19.56   0.000     .4956524    .6060719
                    _cons |   1.055067    .036016    29.29   0.000     .9844772    1.125657
--------------------------+----------------------------------------------------------------
          var(e.privtown) |   .6967821   .0989213                      .5275371    .9203246
          var(e.govtresp) |   .9766226   .0116871                      .9539829    .9997996
           var(e.compete) |   .4571628   .1741635                      .2166666    .9646055
           var(e.homosex) |   .5740279   .0372131                      .5055352    .6518005
          var(e.abortion) |   .4529464   .0434965                      .3752372    .5467487
          var(e.euthanas) |   .6965509   .0310342                      .6383051    .7601117
            var(Economic) |          1          .                             .           .
              var(Morals) |          1          .                             .           .
--------------------------+----------------------------------------------------------------
     cov(Economic,Morals) |   .0285672   .0440346     0.65   0.517     -.057739    .1148735
-------------------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(8) = 35.73, Prob > chi2 = 0.0000
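The z statistics, p-values, and confidence intervals in these tables follow directly from each estimate and its OIM standard error. As a hand check, the minimal Python sketch below (an illustration, not Stata's own code) recomputes a few entries of the unstandardized output: a coefficient or covariance gets z = coef/SE, a two-sided normal p-value, and the symmetric interval coef ± 1.96 SE. The variance intervals are not symmetric; the sketch reproduces them by building the interval for log(variance) and transforming back, a construction assumed here because it matches the printed bounds. The function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald(coef, se):
    """z statistic, two-sided p-value, and symmetric 95% CI for one estimate."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, (coef - Z_975 * se, coef + Z_975 * se)


def variance_ci(var, se):
    """95% CI for a variance built on the log scale (assumed construction).

    By the delta method, SE[log(var)] is approximately se / var.
    """
    half_width = Z_975 * se / var
    return var * math.exp(-half_width), var * math.exp(half_width)


if __name__ == "__main__":
    # govtresp <- Economic in the unstandardized output
    print(wald(0.3318497, 0.0857226))
    # cov(Economic,Morals)
    print(wald(0.0768921, 0.1201458))
    # var(e.privtown)
    print(variance_ci(3.55354, 0.5126355))
```

The first call should reproduce z = 3.87 and the interval .1638365 to .4998629, and the last call the asymmetric interval 2.678345 to 4.714721.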
. estat eqgof

Equation-level goodness of fit

----------------------------------------------------------------------------------
             |            Variance              |
     depvars |    fitted  predicted   residual  |  R-squared         mc        mc2
-------------+----------------------------------+---------------------------------
observed     |                                  |
    privtown |   5.09993    1.54639    3.55354  |   .3032179   .5506522   .3032179
    govtresp |   7.28461    .170295   7.114315  |   .0233774   .1528966   .0233774
     compete |  5.747863   3.120154   2.627709  |   .5428372   .7367749   .5428372
     homosex |  10.99832   4.684978   6.313344  |   .4259721   .6526653   .4259721
    abortion |  8.940456   4.890908   4.049547  |   .5470536   .7396307   .5470536
    euthanas |  6.119567   1.856977    4.26259  |   .3034491   .5508621   .3034491
-------------+----------------------------------+---------------------------------
     overall |                                  |   .8883495
----------------------------------------------------------------------------------
mc  = correlation between depvar and its prediction
mc2 = mc^2 is the Bentler-Raykov squared multiple correlation coefficient

Chi-square decreased to 35.726, but its p-value is still smaller than .05. RMSEA is .054, slightly above the conventional .05 cutoff (pclose = .334, so the hypothesis that RMSEA <= .05 is not rejected), and CFI (.967) is high enough. This measurement model is better than the one-latent-variable model, but it needs further modification.

. estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Likelihood ratio     |
          chi2_ms(8) |     35.726   model vs. saturated
            p > chi2 |      0.000
         chi2_bs(15) |    856.459   baseline vs. saturated
            p > chi2 |      0.000
---------------------+------------------------------------------------------
Population error     |
               RMSEA |      0.054   Root mean squared error of approximation
 90% CI, lower bound |      0.037
         upper bound |      0.072
              pclose |      0.334   Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Information criteria |
                 AIC |  33766.349   Akaike's information criterion
                 BIC |  33863.061   Bayesian information criterion
---------------------+------------------------------------------------------
Baseline comparison  |
                 CFI |      0.967   Comparative fit index
                 TLI |      0.938   Tucker-Lewis index
---------------------+------------------------------------------------------
Size of residuals    |
                SRMR |      0.041   Standardized root mean squared residual
                  CD |      0.888   Coefficient of determination
----------------------------------------------------------------------------

The modification indices below suggest adding a path from the moral value to government responsibility (govtresp <- Morals), which would decrease the chi-square by approximately 25.098. The residual covariance between privtown and compete has an almost identical index (25.100), but its standardized expected parameter change (28.23) is implausibly large.

. estat mindices

Modification indices

----------------------------------------------------------------------------
                           |                                        Standard
                           |         MI     df   P>MI        EPC         EPC
---------------------------+------------------------------------------------
Measurement                |
  govtresp <-              |
                    Morals |     25.098      1   0.00   .2140615    .1716679
  -------------------------+------------------------------------------------
  euthanas <-              |
                  Economic |      6.667      1   0.01   .1736824    .0873082
---------------------------+------------------------------------------------
 cov(e.privtown,e.compete) |     25.100      1   0.00   86.26505    28.23031
 cov(e.govtresp,e.homosex) |     10.588      1   0.00    .715922    .1068241
 cov(e.homosex,e.abortion) |      6.667      1   0.01   26.53614    5.248131
----------------------------------------------------------------------------
EPC = expected parameter change

8. Confirmatory Factor Analysis 3

If your data set contains missing values, use the method(mlmv) option instead of method(ml). The following specification adds a path from the moral value to government responsibility, so govtresp now loads on both factors.

. sem (Economic -> privtown govtresp compete) (Morals -> homosex abortion euthanas govtresp), method(mlmv) cov(Economic*Morals) standardized

Structural equation model                       Number of obs     =       1200
Estimation method  = mlmv
Log likelihood     = -16646.235

 ( 1)  [privtown]Economic = 1
 ( 2)  [govtresp]Morals = 1
-------------------------------------------------------------------------------------------
                          |                 OIM
Standardized              |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
Measurement               |
  privtown <-             |
                 Economic |    .615487   .0997437     6.17   0.000      .419993    .8109811
                    _cons |   1.580339   .0433778    36.43   0.000      1.49532    1.665358
  govtresp <-             |
                 Economic |   .1543272   .0367104     4.20   0.000     .0823761    .2262783
                   Morals |   .1755322   .0334072     5.25   0.000     .1100554    .2410091
                    _cons |   1.599609   .0435951    36.69   0.000     1.514165    1.685054
  compete <-              |
                 Economic |   .6908745   .1110768     6.22   0.000     .4731679     .908581
                    _cons |    1.43715   .0411744    34.90   0.000      1.35645    1.517851
  homosex <-              |
                   Morals |   .6769805   .0273489    24.75   0.000     .6233776    .7305835
                    _cons |   1.467597   .0418744    35.05   0.000     1.385525    1.549669
  abortion <-             |
                   Morals |    .736724   .0274479    26.84   0.000     .6829271    .7905209
                    _cons |   1.470055   .0418022    35.17   0.000     1.388124    1.551985
  euthanas <-             |
                   Morals |   .5663112   .0271625    20.85   0.000     .5130738    .6195487
                    _cons |   1.074781   .0363942    29.53   0.000     1.003449    1.146112
--------------------------+----------------------------------------------------------------
          var(e.privtown) |   .6211757   .1227819                      .4216631    .9150891
          var(e.govtresp) |   .9456524   .0160201                      .9147692    .9775782
           var(e.compete) |   .5226925   .1534802                      .2939711    .9293684
           var(e.homosex) |   .5416973   .0370294                      .4737729    .6193601
          var(e.abortion) |   .4572378   .0404431                      .3844613    .5437905
          var(e.euthanas) |   .6792916   .0307648                      .6215924    .7423467
            var(Economic) |          1          .                             .           .
              var(Morals) |          1          .                             .           .
--------------------------+----------------------------------------------------------------
     cov(Economic,Morals) |  -.0051832   .0458986    -0.11   0.910    -.0951427    .0847764
-------------------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(7) = 9.92, Prob > chi2 = 0.1932
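The standardized estimates are rescaled versions of the unstandardized ones. For an indicator with a single loading λ on a factor with variance φ, the standardized loading is λ·sqrt(φ / var(x)), where var(x) is the indicator's model-implied variance (the fitted column of estat eqgof), and its standardized error variance is 1 minus the squared standardized loading. Because govtresp now loads on two correlated factors, its R-squared becomes λ1² + λ2² + 2·λ1·λ2·r, where λ1 and λ2 are its standardized loadings and r is the standardized factor covariance. The Python sketch below is only a hand check with hypothetical helper names, not how Stata computes anything internally; it plugs in numbers printed earlier: the first two-factor model (method(ml)) for the single-loading case and the model just fitted (method(mlmv)) for govtresp.

```python
import math


def standardized_loading(loading, factor_var, indicator_var):
    """Rescale a loading so both the factor and the indicator have unit variance."""
    return loading * math.sqrt(factor_var / indicator_var)


def factor_correlation(cov, var1, var2):
    """Standardized covariance (correlation) between two factors."""
    return cov / math.sqrt(var1 * var2)


def r_squared_two_factors(l1, l2, r):
    """R-squared of an indicator with standardized loadings l1, l2 on factors correlated r."""
    return l1 ** 2 + l2 ** 2 + 2.0 * l1 * l2 * r


# First two-factor model: unstandardized estimates and eqgof "fitted" variances
var_economic, var_morals = 1.54639, 4.684978
lam_privtown = standardized_loading(1.0, var_economic, 5.09993)
lam_compete = standardized_loading(1.420458, var_economic, 5.747863)
err_privtown = 1.0 - lam_privtown ** 2  # standardized var(e.privtown)
r_factors = factor_correlation(0.0768921, var_economic, var_morals)

# Model with govtresp <- Morals: standardized estimates for govtresp
r2_govtresp = r_squared_two_factors(0.1543272, 0.1755322, -0.0051832)

print(round(lam_privtown, 4), round(lam_compete, 4), round(err_privtown, 4))
print(round(r_factors, 4), round(r2_govtresp, 4), round(1.0 - r2_govtresp, 4))
```

These values should agree with the printed .5506522, .7367749, .6967821, and .0285672 for the first model, and with R-squared .0543476 and var(e.govtresp) .9456524 for the new one. The small negative factor covariance barely changes the govtresp R-squared, which is essentially the sum of its two squared loadings.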
. estat eqgof

Equation-level goodness of fit

----------------------------------------------------------------------------------
             |            Variance              |
     depvars |    fitted  predicted   residual  |  R-squared         mc        mc2
-------------+----------------------------------+---------------------------------
observed     |                                  |
    privtown |   5.01994   1.901675   3.118265  |   .3788243    .615487   .3788243
    govtresp |  7.267005   .3949446   6.872061  |   .0543476   .2331258   .0543476
     compete |  5.736214   2.737938   2.998276  |   .4773075   .6908745   .4773075
     homosex |  10.58109   4.849342   5.731749  |   .4583027   .6769805   .4583027
    abortion |  8.794909   4.773544   4.021365  |   .5427622    .736724   .5427622
    euthanas |  6.057372    1.94265   4.114722  |   .3207084   .5663112   .3207084
-------------+----------------------------------+---------------------------------
     overall |                                  |   .8890593
----------------------------------------------------------------------------------
mc  = correlation between depvar and its prediction
mc2 = mc^2 is the Bentler-Raykov squared multiple correlation coefficient

This measurement model shows better goodness-of-fit measures. The chi-square of 9.920 is small enough (p = .193) not to reject the null hypothesis that the model-implied covariance structure reproduces the population covariance structure. RMSEA (.019) is smaller than .05, and CFI (.997) is closer to 1.

. estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Likelihood ratio     |
          chi2_ms(7) |      9.920   model vs. saturated
            p > chi2 |      0.193
         chi2_bs(15) |    910.923   baseline vs. saturated
            p > chi2 |      0.000
---------------------+------------------------------------------------------
Population error     |
               RMSEA |      0.019   Root mean squared error of approximation
 90% CI, lower bound |      0.000
         upper bound |      0.043
              pclose |      0.987   Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Information criteria |
                 AIC |  33332.470   Akaike's information criterion
                 BIC |  33434.272   Bayesian information criterion
---------------------+------------------------------------------------------
Baseline comparison  |
                 CFI |      0.997   Comparative fit index
                 TLI |      0.993   Tucker-Lewis index
---------------------+------------------------------------------------------
Size of residuals    |
                  CD |      0.889   Coefficient of determination
----------------------------------------------------------------------------
Note: SRMR is not reported because of missing values.

. estat mindices

Modification indices

---------------------------------------------------------------
              |                                        Standard
              |         MI     df   P>MI        EPC         EPC
--------------+------------------------------------------------
Measurement   |
  euthanas <- |
     Economic |      5.084      1   0.02    .137277    .0769172
---------------------------------------------------------------
EPC = expected parameter change
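Most of the fit statistics reported by estat gof can be recomputed from the chi-square values, degrees of freedom, log likelihoods, and sample size shown in the two outputs. The Python sketch below uses common textbook formulas for the chi-square tail probability, RMSEA, CFI, TLI, AIC, and BIC; it is a hand check, not Stata's implementation, and the helper names are hypothetical. The parameter counts are tallied here rather than printed by Stata: 19 for the first two-factor model (6 intercepts, 4 free loadings, 6 error variances, 2 factor variances, 1 factor covariance) and 20 after adding govtresp <- Morals. Some programs use N rather than N - 1 in the RMSEA denominator; at N = 1200 the difference does not show at three decimals.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df (closed form)."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range((df - 1) // 2))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def rmsea(chi2, df, n):
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_b, df_b):
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_b - df_b, d_model)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0


def tli(chi2, df, chi2_b, df_b):
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2 / df) / (ratio_b - 1.0)


def aic_bic(loglik, k, n):
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(n)


N = 1200
# (chi2_ms, df_ms, chi2_bs, df_bs, log likelihood, free parameters)
models = {
    "two factors (ml)": (35.726, 8, 856.459, 15, -16864.175, 19),
    "plus govtresp <- Morals (mlmv)": (9.920, 7, 910.923, 15, -16646.235, 20),
}
results = {}
for name, (c, d, cb, db, ll, k) in models.items():
    aic, bic = aic_bic(ll, k, N)
    results[name] = {
        "p": chi2_sf(c, d),
        "rmsea": rmsea(c, d, N),
        "cfi": cfi(c, d, cb, db),
        "tli": tli(c, d, cb, db),
        "aic": aic,
        "bic": bic,
    }
    print(name, {key: round(v, 3) for key, v in results[name].items()})
```

Rounded to three decimals, the recomputed RMSEA, CFI, and TLI match the estat gof tables for both models, and the chi-square tail probability for chi2(7) = 9.920 comes out near the printed .193.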
