Statistics and Econometrics for Finance

Statistics and Econometrics for Finance

Series Editors: David Ruppert, Jianqing Fan, Eric Renault, Eric Zivot

More information about this series at http://www.springer.com/series/10377

This is the second part of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package. This volume focuses on multivariate analysis, forecasting techniques and research methods.

Abdulkader Aljandali

Multivariate Methods and Forecasting with IBM SPSS Statistics

Abdulkader Aljandali
Accounting, Finance and Economics Department
Regent's University London
London, UK

ISSN 2199-093X     ISSN 2199-0948 (electronic)
Statistics and Econometrics for Finance
ISBN 978-3-319-56480-7     ISBN 978-3-319-56481-4 (eBook)
DOI 10.1007/978-3-319-56481-4

Library of Congress Control Number: 2017939132

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

IBM SPSS Statistics is an integrated family of products that addresses the entire analytical process, from planning to data collection to analysis, reporting and deployment. It offers a powerful set of statistical and information analysis systems that runs on a wide variety of personal computers. As such, IBM SPSS (previously known as SPSS) is extensively used in industry, commerce, banking, local and national government, and education. Just a small subset of users of the package in the UK includes the major clearing banks, the BBC, British Gas, British Airways, British Telecom, Eurotunnel, GlaxoSmithKline, London Underground, the NHS, BAE Systems, Royal Dutch Shell, Unilever and W.H. Smith & Son. In fact, all UK universities and the vast majority of universities worldwide use IBM SPSS Statistics for teaching and research. It is certainly an advantage for a student in the UK to have knowledge of the package, since it obviates the need for an employer to provide in-house training.

There is no text at present that is specifically aimed at the undergraduate market in business studies and associated subjects such as finance, marketing and economics. Such subjects tend to have the largest numbers of enrolled students in many institutions, particularly in the former polytechnic sector. The author does not adopt an explicitly mathematical approach, but rather stresses the applicability of various statistical techniques to various problem-solving scenarios.

IBM SPSS Statistics offers all the benefits of the Windows environment: analysts can have many windows of different types open at once, enabling simultaneous working with raw data and results. Further, users may learn the logic of the program by choosing an analysis rather than having to learn the IBM SPSS command language. The last thing that students new to statistical methodology want is to have to learn a command language at the same time. There are many varieties of tabular output available, and the user may customise output using IBM SPSS script.

This book builds on a previous publication, Quantitative Analysis and IBM SPSS Statistics: A Guide for Business and Finance (Springer, 2016), which provided a gentle introduction to the IBM SPSS Statistics software for both students and professionals. This book is aimed at those who have had exposure to the program and intend to take their knowledge further. It is more advanced than the text mentioned above and will be beneficial to students in their final year of undergraduate study, master's students, researchers and professionals working in the areas of practical business forecasting or market research data analysis. This text will doubtless prove more sympathetic to the readership than the manuals supplied by IBM SPSS Inc.

London, UK
June 10th, 2017

Abdulkader Mahmoud Aljandali

Introduction

This is the second part of a two-part guide to the IBM SPSS Statistics computer package for business, finance and marketing students. This second part introduces multivariate regression, logistic regression and the Box-Jenkins methodology, alongside other multivariate and forecasting methods. Although the emphasis is on applications of the IBM SPSS Statistics software, the user needs to be aware of the statistical assumptions and rationale that underpin the correct and meaningful application of the techniques available in the package. Therefore, such assumptions are discussed, and methods of assessing their validity are described.

This second part of the IBM SPSS Statistics guide is itself divided into three parts. The first chapter of Part I introduces multivariate regression and the assumptions that underpin it. The chapter discusses multicollinearity and problems associated with the residuals. Two-variable regression and correlation are illustrated, and the assumptions underlying the regression method are stressed. Logistic and dummy regression models, in addition to functional forms of regression, are the subject matter of Chap. 2. The Box-Jenkins methodology, stationarity of data and the various steps that lead to the generation of mean equations are introduced in Chap. 3. The practical utility of time series methods is discussed. Exponential smoothing and naïve models (Chap. 4) conclude Part I.

Part II introduces multivariate methods such as factor analysis (Chap. 5), discriminant analysis (Chap. 6) and multidimensional scaling (Chap. 7). This part concludes with a chapter on the hierarchical log-linear analysis model (Chap. 8).

Part III comprises chapters that introduce popular concepts usually taught under research methods. Testing for dependence using the chi-square test is discussed in Chap. 9, while applications of parametric and non-parametric tests are presented in Chap. 10. Parametric methods make more rigid assumptions about the distributional form of the gathered data than do non-parametric methods. However, it must be recognised that parametric methods are more powerful when the assumptions underlying them are met. The book concludes with a review of the concept of current and constant prices in business and the effect they might have on the recording of data over time (Chap. 11).
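To make the parametric versus non-parametric trade-off concrete, here is a minimal sketch of the Chap. 10 workflow: check the assumptions first, then choose between the one-way ANOVA and its non-parametric counterpart, the Kruskal-Wallis test. The book carries these steps out through the IBM SPSS Statistics dialogue boxes; the sketch below uses Python with SciPy and invented share-price data purely for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical percentage price changes for three types of share (invented data).
blue_chip = rng.normal(loc=2.0, scale=1.5, size=30)
mid_cap = rng.normal(loc=2.8, scale=1.5, size=30)
small_cap = rng.normal(loc=3.5, scale=1.5, size=30)
groups = [blue_chip, mid_cap, small_cap]

# Assumption checks: Shapiro-Wilk for normality within each group,
# Levene's test for equality of variances across groups.
normal_ok = all(stats.shapiro(g).pvalue > 0.05 for g in groups)
variances_ok = stats.levene(*groups).pvalue > 0.05

if normal_ok and variances_ok:
    # Assumptions met: the parametric test is appropriate and more powerful.
    stat, p = stats.f_oneway(*groups)
    print(f"One-way ANOVA: F = {stat:.2f}, p = {p:.4f}")
else:
    # Assumptions violated: fall back on the distribution-free alternative.
    stat, p = stats.kruskal(*groups)
    print(f"Kruskal-Wallis: H = {stat:.2f}, p = {p:.4f}")

The conversion from current to constant prices reviewed in Chap. 11 can be sketched just as briefly: each current-price figure is divided by the price index for its period, with the index expressed relative to a base period of 100. The expenditure and index values below are invented for illustration; the book performs the equivalent calculation with the Compute Variable dialogue box.

# Deflating current-price (nominal) figures to constant (real) prices.
current_expenditure = [100.0, 108.0, 118.0, 130.0]  # current prices
price_index = [100.0, 104.0, 109.2, 115.8]          # base period = 100

real_expenditure = [spend / (index / 100.0)
                    for spend, index in zip(current_expenditure, price_index)]
print([round(value, 1) for value in real_expenditure])
# Any growth left in the real series reflects volume rather than price changes.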

Contents

Part I  Forecasting Models

1  Multivariate Regression
   1.1  The Assumptions Underlying Regression
        1.1.1  Multicollinearity
        1.1.2  Homoscedasticity of the Residuals
        1.1.3  Normality of the Residuals
        1.1.4  Independence of the Residuals
   1.2  Selecting the Regression Equation
   1.3  Multivariate Regression in IBM SPSS Statistics
   1.4  The Cochrane-Orcutt Procedure for Tackling Autocorrelation

2  Other Useful Topics in Regression
   2.1  Binary Logistic Regression
        2.1.1  The Linear Probability Model (LPM)
        2.1.2  The Logit Model
        2.1.3  Applying the Logit Model
        2.1.4  The Logistic Model in IBM SPSS Statistics
        2.1.5  A Financial Application of the Logistic Model
   2.2  Multinomial Logistic Regression
   2.3  Dummy Regression
   2.4  Functional Forms of Regression Models
        2.4.1  The Power Model
        2.4.2  The Reciprocal Model
        2.4.3  The Linear Trend Model

3  The Box-Jenkins Methodology
   3.1  The Property of Stationarity
        3.1.1  Trend Differencing
        3.1.2  Seasonal Differencing

        3.1.3  Homoscedasticity of the Data
        3.1.4  Producing a Stationary Time Series in IBM SPSS Statistics
   3.2  The ARIMA Model
   3.3  Autocorrelation
        3.3.1  ACF
        3.3.2  PACF
        3.3.3  Patterns of the ACF and PACF
        3.3.4  Applying an ARIMA Model
   3.4  ARIMA Models in IBM SPSS Statistics

4  Exponential Smoothing and Naïve Models
   4.1  Exponential Smoothing Models
   4.2  The Naïve Models

Part II  Multivariate Methods

5  Factor Analysis
   5.1  The Correlation Matrix
   5.2  The Terminology and Logic of Factor Analysis
   5.3  Rotation and the Naming of Factors
   5.4  Factor Scores in IBM SPSS Statistics

6  Discriminant Analysis
   6.1  The Methodology of Discriminant Analysis
   6.2  Discriminant Analysis in IBM SPSS Statistics
   6.3  Results of Applying the IBM SPSS Discriminant Procedure

7  Multidimensional Scaling (MDS)
   7.1  Types of MDS Model and Rationale of MDS
   7.2  Methods for Obtaining Proximities
   7.3  The Basics of MDS in IBM SPSS Statistics: Flying Mileages
   7.4  An Example of Nonmetric MDS in IBM SPSS Statistics: Perceptions of Car Models
   7.5  Methods of Computing Proximities
   7.6  Weighted Multidimensional Scaling in IBM SPSS, INDSCAL

8  Hierarchical Log-linear Analysis
   8.1  The Logic and Terminology of Log-linear Analysis
   8.2  IBM SPSS Statistics Commands for the Saturated Model
   8.3  The Independence Model
   8.4  Hierarchical Models
   8.5  Backward Elimination

Part III  Research Methods

9  Testing for Dependence
   9.1  Introduction
   9.2  Chi-Square in IBM SPSS Statistics

10  Testing for Differences Between Groups
    10.1  Introduction
    10.2  Testing for Population Normality and Equal Variances
    10.3  The One-Way Analysis of Variance (ANOVA)
    10.4  The Kruskal-Wallis Test

11  Current and Constant Prices
    11.1  HICP and RPI
    11.2  Current and Constant Prices

References

Index

List of Figures

Fig. 1.1   Linear regression: statistics
Fig. 1.2   Homoscedastic residuals
Fig. 1.3   Heteroscedastic residuals
Fig. 1.4   Positively correlated residuals
Fig. 1.5   Negatively correlated residuals
Fig. 1.6   Correlations between the regressor variables
Fig. 1.7   The linear regression dialogue box
Fig. 1.8   The linear regression: statistics dialogue box
Fig. 1.9   The linear regression: plots dialogue box
Fig. 1.10  The linear regression: save dialogue box
Fig. 1.11  Part of the output from the stepwise regression procedure
Fig. 1.12  A histogram of the regression residuals
Fig. 1.13  A plot of standardized residuals against predicted values
Fig. 1.14  A case by case analysis of the standardized residuals
Fig. 1.15  A plot of observed versus predicted values
Fig. 1.16  A plot of the regression residuals over time
Fig. 1.17  The SPSS syntax editor
Fig. 1.18  The C-O procedure in IBM SPSS syntax
Fig. 1.19  Output from the Cochrane-Orcutt procedure
Fig. 2.1   Home ownership and income (000s)
Fig. 2.2   Regression line when Y is dichotomous
Fig. 2.3   A plot of the logistic distribution function
Fig. 2.4   The logistic regression dialogue box
Fig. 2.5   The logistic regression: save dialogue box
Fig. 2.6   The logistic regression: options dialogue box
Fig. 2.7   The first six cases in the active data file
Fig. 2.8   Variables in the final logistic model
Fig. 2.9   The classification table associated with logistic regression
Fig. 2.10  The Hosmer-Lemeshow test

Fig. 2.11  The multinomial logistic regression dialogue box
Fig. 2.12  Scatterplot of tool life by tool type
Fig. 2.13  Part of the output from dummy regression
Fig. 2.14  A plot of residuals against predicted values
Fig. 2.15  Computation of the cross-product term
Fig. 2.16  The new data file
Fig. 2.17  Part of the output for dummy regression with a cross-product term
Fig. 2.18  Raw data
Fig. 2.19  Bivariate regression results
Fig. 2.20  A plot of average annual coffee consumption against average price
Fig. 2.21  A plot of lny against lnx
Fig. 2.22  Results of regressing lny against lnx
Fig. 2.23  The reciprocal model with asymptote
Fig. 2.24  UK increases in wage rates and unemployment
Fig. 2.25  Regression of increases in wage rates against the reciprocal of unemployment
Fig. 2.26  United States GDP, 1972-1991
Fig. 2.27  A plot of U.S.A. GDP over time
Fig. 2.28  Regression results for GDP against t
Fig. 3.1   Stock levels over time
Fig. 3.2   The create time series dialogue box
Fig. 3.3   The default variable name change
Fig. 3.4   The variable FIRSTDIF added to the active file
Fig. 3.5   A plot of first differences of the variable STOCK
Fig. 3.6   The autocorrelations dialogue box
Fig. 3.7   The autocorrelations: options dialogue box
Fig. 3.8   The ACF plot
Fig. 3.9   The PACF plot
Fig. 3.10  The ARIMA dialogue box
Fig. 3.11  The ARIMA criteria dialogue box
Fig. 3.12  The ARIMA save dialogue box
Fig. 3.13  Observed and predicted stock levels
Fig. 4.1   A company's monthly stock levels over time
Fig. 4.2   A plot of stock levels over time
Fig. 4.3   The exponential smoothing dialogue box
Fig. 4.4   The exponential smoothing: parameters dialogue box
Fig. 4.5   The exponential smoothing: save dialogue box
Fig. 4.6   The exponential smoothing: options dialogue box
Fig. 4.7   The active data file with forecasted and predicted values, plus residuals
Fig. 4.8   A plot of observed and predicted stock levels
Fig. 4.9   The compute variable dialogue box

Fig. 4.10  Lagged values of the variable LEVEL
Fig. 4.11  Computation of the residuals from the Naïve 1 model
Fig. 4.12  Creation of LAG12 and LAG24
Fig. 4.13  Forecasted and residual values from the Naïve 2 model
Fig. 4.14  Define graphs with multiple lines, Naïve 1 and 2
Fig. 4.15  Forecasts generated using the Naïve 1 and 2 models
Fig. 5.1   Inter-correlations between study variables
Fig. 5.2   The factor analysis dialogue box
Fig. 5.3   The eigenvalues associated with the factor extraction
Fig. 5.4   The communalities associated with the study variables
Fig. 5.5   Loadings of four variables on two factors
Fig. 5.6   The factor analysis: rotation dialogue box
Fig. 5.7   Unrotated and rotated factor loadings
Fig. 5.8   The factor analysis: factor scores dialogue box
Fig. 5.9   Factor scores added to the active file
Fig. 6.1   The discriminant analysis dialogue box
Fig. 6.2   The discriminant analysis: define ranges dialogue box
Fig. 6.3   The discriminant analysis: classification dialogue box
Fig. 6.4   IBM SPSS output from discriminant analysis
Fig. 6.5   Histogram of discriminant scores for the low population group
Fig. 6.6   Histogram of discriminant scores for the high population group
Fig. 6.7   The discriminant analysis: save dialogue box
Fig. 6.8   Results of discriminant analysis added to the working file
Fig. 7.1   A hypothetical MDS perceptual map
Fig. 7.2   Airmiles data
Fig. 7.3   The MDS dialogue box: data format
Fig. 7.4   The MDS: model dialogue box
Fig. 7.5   The MDS: options dialogue box
Fig. 7.6   MDS plot of intercity flying mileages
Fig. 7.7   IBM SPSS Statistics output for the airmiles data (AIRMILES.SAV)
Fig. 7.8   Scatterplot of raw data versus distances
Fig. 7.9   MDS map for a consumer's perceptions of car makes
Fig. 7.10  Output for MDS of car make similarities
Fig. 7.11  The MDS: create measure dialogue box
Fig. 7.12  MDS plot of intercity flying mileages using Manhattan distances
Fig. 7.13  The multidimensional scaling: shape of data dialogue box
Fig. 7.14  The MDS: model dialogue box for the store perception data

Fig. 8.1   The loglinear analysis: model dialogue box
Fig. 8.2   The model selection loglinear analysis dialogue box
Fig. 8.3   The loglinear analysis: options dialogue box
Fig. 8.4   IBM SPSS output for the saturated model
Fig. 8.5   The loglinear analysis: model dialogue box for main effects only
Fig. 8.6   IBM SPSS output for the unsaturated model
Fig. 8.7   A normal probability plot of residuals from the unsaturated model
Fig. 8.8   IBM SPSS output for the 4-way loglinear model
Fig. 8.9   Part of the results from backward elimination
Fig. 9.1   The Crosstabs dialogue box
Fig. 9.2   The Crosstabs: statistics dialogue box
Fig. 9.3   The Crosstabs: cell display dialogue box
Fig. 9.4   A crosstabulation of deposits and levels of satisfaction (Note: If there are three or more study variables, it is best not to use the chi-squared test of independence. There is a method called log-linear analysis which is available in IBM SPSS Statistics (please refer to Chap. 8))
Fig. 10.1  The explore dialogue box
Fig. 10.2  The explore: plots dialogue box
Fig. 10.3  Test of normality output
Fig. 10.4  Test of homogeneity of variance output
Fig. 10.5  Box plots of type of share * % change in price
Fig. 10.6  The one-way ANOVA box
Fig. 10.7  The one-way ANOVA: post hoc multiple comparisons box
Fig. 10.8  Part of the IBM SPSS Statistics output: ANOVA & multiple comparisons
Fig. 10.9  Kruskal-Wallis test output
Fig. 11.1  Current and constant prices data file
Fig. 11.2  Compute variable dialogue box: Pindex
Fig. 11.3  Price index variable added to the data file
Fig. 11.4  Real expenditures variable added to the data file
Fig. 11.5  Plot of real expenditures vs current expenditures

List of Tables

Table 1.1   Fictitious data
Table 2.1   Logistic estimate results for the Dietrich and Sorenson study
Table 9.1   Contingency table
Table 10.1  Types of shares * % change in shares