SAS/STAT 13.1 User s Guide. Introduction to Multivariate Procedures

Similar documents
Introduction to Multivariate Procedures (Book Excerpt)

SAS/STAT 14.1 User s Guide. Introduction to Categorical Data Analysis Procedures

Introduction to Categorical Data Analysis Procedures (Chapter)

What is DSC 410/510? DSC 410/510 Multivariate Statistical Methods. What is Multivariate Analysis? Computing. Some Quotes.

How to Get More Value from Your Survey Data

SAS/STAT 14.3 User s Guide The PSMATCH Procedure

Release Notes. JMP Genomics. Version 3.1

Center for Demography and Ecology

Masters in Business Statistics (MBS) /2015. Department of Mathematics Faculty of Engineering University of Moratuwa Moratuwa. Web:

Code Compulsory Module Credits Continuous Assignment

Discriminant models for high-throughput proteomics mass spectrometer data

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software

The Kruskal-Wallis Test with Excel In 3 Simple Steps. Kilem L. Gwet, Ph.D.

Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System

GETTING STARTED WITH PROC LOGISTIC

I am an experienced SAS programmer but I have not used many SAS/STAT procedures

What Is Conjoint Analysis? DSC 410/510 Multivariate Statistical Methods. How Is Conjoint Analysis Done? Empirical Example

Indian Institute of Technology Kanpur National Programme on Technology Enhanced Learning (NPTEL) Course Title Marketing Management 1

Getting Started with HLM 5. For Windows

Studying the Employee Satisfaction Using Factor Analysis

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

An Application of Categorical Analysis of Variance in Nested Arrangements

Phd Program in Transportation. Transport Demand Modeling. Session 4

MARKETING RESEARCH AN APPLIED APPROACH FIFTH EDITION NARESH K. MALHOTRA DANIEL NUNAN DAVID F. BIRKS. W Pearson

Common Space Analysis of Several Versions of The Wechsler Intelligence Scale For Children

CHAPTER IV DATA ANALYSIS

Using Factor Analysis to Generate Clusters of Agile Practices

SAS Human Capital Management 5.1. Administrator s Guide

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Mastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition

Discriminant Analysis Applications and Software Support

Statistics and Data Analysis

Advanced Tutorials. SESUG '95 Proceedings GETTING STARTED WITH PROC LOGISTIC

Learn What s New. Statistical Software

CONTRIBUTORY AND INFLUENCING FACTORS ON THE LABOUR WELFARE PRACTICES IN SELECT COMPANIES IN TIRUNELVELI DISTRICT AN ANALYSIS

GETTING STARTED WITH PROC LOGISTIC

Business Quantitative Analysis [QU1] Examination Blueprint

SAS. Activity-Based Management Adapter 6.1 for SAP R/3 User s Guide

From Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Full book available for purchase here.

A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design

* Project over a ten year time horizon,

5 CHAPTER: DATA COLLECTION AND ANALYSIS

Multivariate analysis of marketing data - applications for bricolage market

CHAPTER 7 ASSESSMENT OF GROUNDWATER QUALITY USING MULTIVARIATE STATISTICAL ANALYSIS

Statistics, Data Analysis, and Decision Modeling

SAS and Hadoop Technology: Overview

The prediction of economic and financial performance of companies using supervised pattern recognition methods and techniques

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Marketing Research: Uncovering Competitive Advantages

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN

Pairing Kano and Conjoint to Streamline New Product Development

What s New in ArcGIS Business Analyst Server 9.3

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Models in Engineering Glossary

Spreadsheets in Education (ejsie)

part 1 Marketing Research and the Research Process 1 part 5 Data Analysis and Interpretation 349 2, Determine Research Design 57

Linear State Space Models in Retail and Hospitality Beth Cubbage, SAS Institute Inc.

The Multivariate Dustbin

IBM SPSS Statistics: What s New

Using a Spatially Explicit Analysis Model to Evaluate Spatial Variation of Corn Yield

Multiple correspondence analysis

Statistics and Econometrics for Finance

IBM SPSS Statistics. Editions. Get the analytical power you need for better decision making. Why use IBM SPSS Statistics? IBM Analytics Solution Brief

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Two Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA

COMPARISON OF LOGISTIC REGRESSION MODEL AND MARS CLASSIFICATION RESULTS ON BINARY RESPONSE FOR TEKNISI AHLI BBPLK SERANG TRAINING GRADUATES STATUS

One Year Executive Program in Applied Business Analytics

IBM SPSS Statistics Editions

CRM On Demand. Oracle CRM On Demand for Partner Relationship Management Configuration Guide

J.P. Sjamsoedin. D.P.E. Saerang. Analyzing Customer Perception.

DATA MINING AND BUSINESS ANALYTICS WITH R

To appear in: Moutinho and Hutcheson: Dictionary of Quantitative Methods in Management. Sage Publications

Over the past 20 years, the

Statistics & Analysis. Confirmatory Factor Analysis and Structural Equation Modeling of Noncognitive Assessments using PROC CALIS

APPLICATION OF SEASONAL ADJUSTMENT FACTORS TO SUBSEQUENT YEAR DATA. Corresponding Author

RESIDUAL FILES IN HLM OVERVIEW

CHAPTER FIVE CROSSTABS PROCEDURE

Segmentation, Targeting and Positioning in the Diaper Market

Data Visualization. Prof.Sushila Aghav-Palwe

Informed Decision-Making in Exploratory Factor Analysis

Content Appendicises

Oracle Utilities Analytics Dashboards for Customer Analytics, Revenue Analytics, and Credit & Collections Analytics

SAS/STAT 14.3 User s Guide What s New in SAS/STAT 14.3

SPSS 14: quick guide

Environmental Determinants of Surface Water Quality Based on Environmetric Methods

Creative Commons Attribution-NonCommercial-Share Alike License

Partial Least Squares Structural Equation Modeling PLS-SEM

MdAIR Spring Institute. April 28, 2006

Segmentation and Targeting

Diagnosing and Changing Organizational Culture

LIFE CYCLE RELIABILITY ENGINEERING

THE STRATEGIC IMPORTANCE OF OLAP AND MULTIDIMENSIONAL ANALYSIS A COGNOS WHITE PAPER

FUNDAMENTALS OF QUALITY CONTROL AND IMPROVEMENT

Segmentation and Targeting

Biostatistics 307. Conjoint analysis and canonical correlation

STATISTICS. by DEBORAH J. RUMSEY, PhD with DAVID UNGER

Predictive Modeling with SAS Enterprise Miner : Practical Solutions for Business Applications, Third Edition For a hard-copy book:

IT Project Valuation Survey - Preliminary Report -

F u = t n+1, t f = 1994, 2005

Transcription:

SAS/STAT 13.1 User s Guide Introduction to Multivariate Procedures

This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2013. SAS/STAT 13.1 User s Guide. Cary, NC: SAS Institute Inc. Copyright 2013, SAS Institute Inc., Cary, NC, USA All rights reserved. Produced in the United States of America. For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others rights is appreciated. U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a) and DFAR 227.7202-4 and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government s rights in Software and documentation shall be only those set forth in this Agreement. SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414. December 2013 SAS provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our offerings, visit support.sas.com/bookstore or call 1-800-727-3228. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies.

Gain Greater Insight into Your SAS Software with SAS Books. Discover all that you need on your journey to knowledge and empowerment. support.sas.com/bookstore for additional books and resources. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 2013 SAS Institute Inc. All rights reserved. S107969US.0613

Chapter 9 Introduction to Multivariate Procedures Contents Overview: Multivariate Procedures............................... 177 Comparison of the PRINCOMP and FACTOR Procedures.................. 178 Comparison of the PRINCOMP and PRINQUAL Procedures................. 179 Comparison of the PRINCOMP and CORRESP Procedures................. 179 Comparison of the PRINQUAL and CORRESP Procedures.................. 180 Comparison of the TRANSREG and PRINQUAL Procedures................ 180 References........................................... 181 Overview: Multivariate Procedures The procedures discussed in this chapter investigate relationships among variables without designating some as independent and others as dependent. Principal component analysis and common factor analysis examine relationships within a single set of variables, whereas canonical correlation looks at the relationship between two sets of variables. The following is a brief description of SAS/STAT multivariate procedures: CORRESP PRINCOMP PRINQUAL FACTOR CANCORR performs simple and multiple correspondence analyses, with a contingency table, Burt table, binary table, or raw categorical data as input. Correspondence analysis is a weighted form of principal component analysis that is appropriate for frequency data. The results are displayed in plots and tables and are also available in output data sets. performs a principal component analysis and outputs standardized or unstandardized principal component scores. The results are displayed in plots and tables and are also available in output data sets. performs a principal component analysis of qualitative data and multidimensional preference analysis. The results are displayed in plots and are also available in output data sets. performs principal component and common factor analyses with rotations and outputs component scores or estimates of common factor scores. The results are displayed in plots and tables and are also available in output data sets. performs a canonical correlation analysis and outputs canonical variable scores. The results are displayed in tables and are also available in output data sets for plotting. Many other SAS/STAT procedures can also analyze multivariate data for example, the CATMOD, GLM, REG, CALIS, and TRANSREG procedures as well as the procedures for clustering and discriminant analysis.

178 Chapter 9: Introduction to Multivariate Procedures The purpose of principal component analysis (Rao 1964) is to derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible. Often a small number of principal components can be used in place of the original variables for plotting, regression, clustering, and so on. Principal component analysis can also be viewed as an attempt to uncover approximate linear dependencies among variables. The purpose of common factor analysis (Mulaik 1972) is to explain the correlations or covariances among a set of variables in terms of a limited number of unobservable, latent variables. The latent variables are not generally computable as linear combinations of the original variables. In common factor analysis, it is assumed that the variables are linearly related if not for uncorrelated random error or unique variation in each variable; both the linear relations and the amount of unique variation can be estimated. Principal component and common factor analysis are often followed by rotation of the components or factors. Rotation is the application of a nonsingular linear transformation to components or common factors to aid interpretation. The purpose of canonical correlation analysis (Mardia, Kent, and Bibby 1979) is to explain or summarize the relationship between two sets of variables by finding a small number of linear combinations from each set of variables that have the highest possible between-set correlations. Plots of the canonical variables can be useful in examining multivariate dependencies. If one of the two sets of variables consists of dummy variables generated from a classification variable, the canonical correlation is equivalent to canonical discriminant analysis (see Chapter 31, The CANDISC Procedure ). If both sets of variables are dummy variables, canonical correlation is equivalent to simple correspondence analysis. The purpose of correspondence analysis (Lebart, Morineau, and Warwick 1984; Greenacre 1984; Nishisato 1980) is to summarize the associations between a set of categorical variables in a small number of dimensions. Correspondence analysis computes scores on each dimension for each row and column category in a contingency table. Plots of these scores show the relationships among the categories. The PRINQUAL procedure obtains linear and nonlinear transformations of variables by using the method of alternating least squares (Young 1981) to optimize properties of the transformed variables covariance or correlation matrix. PROC PRINQUAL nonlinearly transforms variables, improving their fit to a principal component model. The name, PRINQUAL, for principal components of qualitative data, comes from the special case analysis of fitting a principal component model to nominal and ordinal scale of measurement variables (Young, Takane, and de Leeuw 1978). However, PROC PRINQUAL also has facilities for smoothly transforming continuous variables. All of PROC PRINQUAL s transformations are also available in the TRANSREG procedure, which fits regression models with nonlinear transformations. PROC PRINQUAL can also perform metric and nonmetric multidimensional preference (MDPREF) analyses (Carroll 1972) and produce plots of the results. Comparison of the PRINCOMP and FACTOR Procedures Although PROC FACTOR can be used for common factor analysis, the default method is principal components. PROC FACTOR produces the same results as PROC PRINCOMP except that scoring coefficients from PROC FACTOR are normalized to give principal component scores with unit variance, whereas PROC PRINCOMP by default produces principal component scores with variance equal to the corresponding eigenvalue. PROC PRINCOMP can also compute scores standardized to unit variance. Both procedures produce graphical results through ODS Graphics.

Comparison of the PRINCOMP and PRINQUAL Procedures 179 PROC PRINCOMP has the following advantages over PROC FACTOR: PROC PRINCOMP is slightly faster if a small number of components is requested. PROC PRINCOMP can analyze somewhat larger problems in a fixed amount of memory. PROC PRINCOMP can output scores from an analysis of a partial correlation or covariance matrix. PROC PRINCOMP is simpler to use. PROC FACTOR has the following advantages over PROC PRINCOMP for principal component analysis: PROC FACTOR produces more output. PROC FACTOR does rotations. If you want to perform a common factor analysis, you must use PROC FACTOR instead of PROC PRINCOMP. Principal component analysis should never be used if a common factor solution is desired (Dziuban and Harris 1973; Lee and Comrey 1979). Comparison of the PRINCOMP and PRINQUAL Procedures The PRINCOMP procedure performs principal component analysis. The PRINQUAL procedure finds linear and nonlinear transformations of variables to optimize properties of the transformed variables covariance or correlation matrix. One property is the sum of the first n eigenvalues, which is a measure of the fit of a principal component model with n components. Use PROC PRINQUAL to find nonlinear transformations of your variables or to perform a multidimensional preference analysis. Use PROC PRINCOMP to fit a principal component model to your data or to PROC PRINQUAL s output data set. PROC PRINCOMP produces a report of the principal component analysis, a number of graphical displays, and output data sets. PROC PRINQUAL produces only a few graphs and an output data set. Comparison of the PRINCOMP and CORRESP Procedures As summarized previously, PROC PRINCOMP performs a principal component analysis of interval-scaled data. PROC CORRESP performs correspondence analysis, which is a weighted form of principal component analysis that is appropriate for frequency data. If your data are categorical, use PROC CORRESP instead of PROC PRINCOMP. Both procedures produce graphical displays of the results with ODS Graphics. The plots produced by PROC CORRESP graphically show relationships among the categories of the categorical variables.

180 Chapter 9: Introduction to Multivariate Procedures Comparison of the PRINQUAL and CORRESP Procedures Both PROC PRINQUAL and PROC CORRESP can be used to summarize associations among variables measured on a nominal scale. PROC PRINQUAL searches for a single nonlinear transformation of the original scoring of each nominal variable that optimizes some aspect of the covariance matrix of the transformed variables. For example, PROC PRINQUAL could be used to find scorings that maximize the fit of a principal component model with one component. PROC CORRESP uses the crosstabulations of nominal variables, not covariances, and produces multiple scores for each category of each nominal variable. The main conceptual difference between PROC PRINQUAL and PROC CORRESP is that PROC PRINQUAL assumes that the categories of a nominal variable correspond to values of a single underlying interval variable, whereas PROC CORRESP assumes that there are multiple underlying interval variables and therefore uses different category scores for each dimension of the correspondence analysis. Scores from PROC CORRESP on the first dimension match the single set of PROC PRINQUAL scores (with appropriate standardizations for both analyses). Comparison of the TRANSREG and PRINQUAL Procedures Both the TRANSREG and PRINQUAL procedures are data transformation procedures that have many of the same transformations. These procedures can either directly perform the specified transformation (such as taking the logarithm of the variable) or search for an optimal transformation (such as a spline with a specified number of knots). Both procedures can use an iterative, alternating least squares analysis. Both procedures create an output data set that can be used as input to other procedures. PROC PRINQUAL displays relatively little output, whereas PROC TRANSREG displays many results. PROC TRANSREG has two sets of variables, usually dependent and independent, and it fits linear models such as ordinary regression and ANOVA, multiple and multivariate regression, metric and nonmetric conjoint analysis, metric and nonmetric vector and ideal point preference mapping, redundancy analysis, canonical correlation, and response surface regression. In contrast, PROC PRINQUAL has one set of variables, fits a principal component model or multidimensional preference analysis, and can also optimize other properties of a correlation or covariance matrix. PROC TRANSREG performs hypothesis testing and can be used to code experimental designs prior to their use in other analyses. PROC TRANSREG can also perform Box-Cox transformations and fit models with smoothing spline and penalized B-spline transformations. See Chapter 4, Introduction to Regression Procedures, for comparisons of the TRANSREG and REG procedures.

References 181 References Carroll, J. D. (1972), Individual Differences and Multidimensional Scaling, in R. N. Shepard, A. K. Romney, and S. B. Nerlove, eds., Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, volume 1, New York: Seminar Press. Dziuban, C. D. and Harris, C. W. (1973), On the Extraction of Components and the Applicability of the Factor Model, American Educational Research Journal, 10, 93 99. Greenacre, M. J. (1984), Theory and Applications of Correspondence Analysis, London: Academic Press. Lebart, L., Morineau, A., and Warwick, K. M. (1984), Multivariate Descriptive Statistical Analysis: Correspondence Analysis and Related Techniques for Large Matrices, New York: John Wiley & Sons. Lee, H. B. and Comrey, A. L. (1979), Distortions in a Commonly Used Factor Analytic Procedure, Multivariate Behavioral Research, 14, 301 321. Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979), Multivariate Analysis, London: Academic Press. Mulaik, S. A. (1972), The Foundations of Factor Analysis, New York: McGraw-Hill. Nishisato, S. (1980), Analysis of Categorical Data: Dual Scaling and Its Applications, Toronto: University of Toronto Press. Rao, C. R. (1964), The Use and Interpretation of Principal Component Analysis in Applied Research, Sankhy Na, Series A, 26, 329 358. Young, F. W. (1981), Quantitative Analysis of Qualitative Data, Psychometrika, 46, 357 388. Young, F. W., Takane, Y., and de Leeuw, J. (1978), The Principal Components of Mixed Measurement Level Multivariate Data: An Alternating Least Squares Method with Optimal Scaling Features, Psychometrika, 43, 279 281.