Statistics and Econometrics for Finance

Size: px
Start display at page:

Download "Statistics and Econometrics for Finance"

Transcription

1 Statistics and Econometrics for Finance Series Editors David Ruppert Jianqing Fan Eric Renault Eric Zivot More information about this series at

2 This is the second part of a two-part guide to quantitative analysis using the IBM SPSS Statistics software package. This volume focuses on multivariate analysis, forecasting techniques and research methods.

3 Abdulkader Aljandali Multivariate Methods and Forecasting with IBM SPSS Statistics

4 Abdulkader Aljandali Accounting, Finance and Economics Department Regent s University London London, UK ISSN X ISSN (electronic) Statistics and Econometrics for Finance ISBN ISBN (ebook) DOI / Library of Congress Control Number: Springer International Publishing AG 2017 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Printed on acid-free paper This Springer imprint is published by Springer Nature The registered company is Springer International Publishing AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

5 Preface IBM SPSS Statistics is an integrated family of products that addresses the entire analytical process, from planning to data collection to analysis, reporting and deployment. It offers a powerful set of statistical and information analysis systems that runs on a wide variety of personal computers. As such, IBM SPSS (previously known as SPSS) is extensively used in industry, commerce, banking and local and national government education. Just a small subset of users of the package in the UK includes the major clearing banks, the BBC, British Gas, British Airways, British Telecom, Eurotunnel, GlaxoSmithKline, London Underground, the NHS, BAE Systems, Royal Dutch Shell, Unilever and W.H. Smith & Son. In fact, all UK universities and the vast majority of universities worldwide use IBM SPSS Statistics for teaching and research. It is certainly an advantage for a student in the UK to have knowledge of the package since it obviates the need for an employer to provide in-house training. There is no text at present that is specifically aimed at the undergraduate market in business studies and associated subjects such as finance, marketing and economics. Such subjects tend to have the largest numbers of enrolled students in many institutions, particularly in the former polytechnic sector. The author is not going to adopt an explicitly mathematical approach, but rather will stress the applicability of various statistical techniques to various problem-solving scenarios. IBM SPSS Statistics offers all the benefits of the Windows environment as analysts can have many windows of different types open at once, enabling simultaneous working with raw data and results. Further, users may learn the logic of the program by choosing an analysis rather than having to learn the IBM SPSS command language. The last thing wanted by students new to statistical methodology is simultaneously to have to learn a command language. There are many varieties of tabular output available, and the user may customise output using IBM SPSS script. This book builds on a previous publication, Quantitative Analysis and IBM SPSS Statistics: A Guide for Business and Finance (Springer, 2016), which provided a gentle introduction to the IBM SPSS Statistics software for both students and v

6 vi Preface professionals. This book is aimed at those who have had exposure to the program and intend to take their knowledge further. This text is more advanced compared to the one above-mentioned and will be beneficial to students in their final year of undergraduate study, master s students, researchers and professionals working in the areas of practical business forecasting or market research data analysis. This text would doubtlessly be more sympathetic to the readership than the manuals supplied by IBM SPSS Inc. London, UK June 10 th 2017 Abdulkader Mahmoud Aljandali

7 Introduction This is the second part of a two-part guide to the IBM SPSS Statistics computer package for business, finance and marketing students. This, the second part of the guide, introduces multivariate regression, logistic regression, Box-Jenkins methodology alongside other multivariate and forecasting methods. Although the emphasis is on applications of the IBM SPSS Statistics software, there is a need for the user to be aware of the statistical assumptions and rationale that underpin correct and meaningful application of the techniques that are available in the package. Therefore, such assumptions are discussed, and methods of assessing their validity are described. This, the second part of the IBM SPSS Statistics guide, is itself divided into three sections. The first chapter of Part I introduces multivariate regression and the assumptions that underpin it. The chapter discusses the multicollinearity and residual problems. Two-variable regression and correlation are illustrated, and the assumptions underlying the regression method are stressed. Logistic and dummy regression models in addition to functional forms of regression are the subject matter of Chap. 2. The Box-Jenkins methodology, stationarity of data and various steps that lead to the generation of mean equations are introduced in Chap. 3. The practical utility of time series methods is discussed. Exponential smoothing and naïve models Chap. 4 conclude Part I. Part II introduces multivariate methods such as factor analysis (Chap. 5) discriminant analysis (Chap. 6) and multidimensional scaling (Chap. 7). This part concludes with a chapter on the hierarchical log-linear analysis model (Chap. 8). Part III comprises chapters that introduce popular concepts usually taught under research methods. Testing for dependence using the chi square test is discussed in Chap. 9, while applications on parametric and non-parametric tests are made available to the reader in Chap. 10. Parametric methods make more rigid assumptions about the distributional form of the gathered data than do non-parametric vii

8 viii Introduction methods. However, it must be recognised that parametric methods are more powerful when the assumptions underlying them are met. This book concludes by a review of the concept of constant and real prices in business and the effect it might have on the recording of data over time (Chap. 11).

9 Contents Part I Forecasting Models 1 Multivariate Regression The Assumptions Underlying Regression Multicollinearity Homoscedasticity of the Residuals Normality of the Residuals Independence of the Residuals Selecting the Regression Equation Multivariate Regression in IBM SPSS Statistics The Cochrane-Orcutt Procedure for Tackling Autocorrelation Other Useful Topics in Regression Binary Logistic Regression The Linear Probability Model (LPM) The Logit Model Applying the Logit Model The Logistic Model in IBM SPSS Statistics A Financial Application of the Logistic Model Multinomial Logistic Regression Dummy Regression Functional Forms of Regression Models The Power Model The Reciprocal Model The Linear Trend Model The Box-Jenkins Methodology The Property of Stationarity Trend Differencing Seasonal Differencing ix

10 x Contents Homoscedasticity of the Data Producing a Stationary Time Series in IBM SPSS Statistics The ARIMA Model Autocorrelation ACF PACF Patterns of the ACF and PACF Applying an ARIMA Model ARIMA Models in IBM SPSS Statistics Exponential Smoothing and Naïve Models Exponential Smoothing Models The Naïve Models Part II Multivariate Methods 5 Factor Analysis The Correlation Matrix The Terminology and Logic of Factor Analysis Rotation and the Naming of Factors Factor Scores in IBM SPSS Statistics Discriminant Analysis The Methodology of Discriminant Analysis Discriminant Analysis in IBM SPSS Statistics Results of Applying the IBM SPSS Discriminant Procedure Multidimension Scaling (MDS) Types of MDS Model and Rationale of MDS Methods for Obtaining Proximities The Basics of MDS in IBM SPSS Statistics: Flying Mileages An Example of Nonmetric MDS in IBM SPSS Statistics: Perceptions of Car Models Methods of Computing Proximities Weighted Multidimensional Scaling in IBM SPSS, INDSCAL Hierchical Log-linear Analysis The Logic and Terminology of Log-linear Analysis IBM SPSS Statistics Commands for the Saturated Model The Independence Model Hierarchical Models Backward Elimination

11 Contents xi Part III Research Methods 9 Testing for Dependence Introduction Chi-Square in IBM SPSS Statistics Testing for Differences Between Groups Introduction Testing for Population Normality and Equal Variances The One-Way Analysis of Variance (ANOVA) The Kruskal-Wallis Test Current and Constant Prices HICP and RPI Current and Constant Prices References Index

12 List of Figures Fig. 1.1 Linear regression: statistics... 6 Fig. 1.2 Homoscedastic residuals... 7 Fig. 1.3 Heteroscedastic residuals... 7 Fig. 1.4 Positively correlated residuals... 9 Fig. 1.5 Negatively correlated residuals Fig. 1.6 Correlations between the regressor variables Fig. 1.7 The linear regression dialogue box Fig. 1.8 The linear regression: statistics dialogue box Fig. 1.9 The linear regression: plots dialogue box Fig The linear regression: save dialogue box Fig Part of the output from the stepwise regression procedure Fig A histogram of the regression residuals Fig A plot of standardized residuals against predicted values Fig A case by case analysis of the standardized residuals Fig A plot of observed versus predicted values Fig A plot of the regression residuals over time Fig The SPSS syntax editor Fig The C-O procedure in IBM SPSS syntax Fig Output from the Cochrane-Orcutt procedure Fig. 2.1 Home ownership and income ( 000 s) Fig. 2.2 Regression line when Y is dichotomous Fig. 2.3 A plot of the logistic distribution function Fig. 2.4 The logistic regression dialogue box Fig. 2.5 The logistic regression: save dialogue box Fig. 2.6 The logistic regression: options dialogue box Fig. 2.7 The first six cases in the active data file Fig. 2.8 Variables in the final logistic model Fig. 2.9 The classification table associated with logistic regression Fig The Hosmer-Lemeshow test xiii

13 xiv List of Figures Fig The multinomial logistic regression dialogue box Fig Scatterplot of tool life by tool type Fig Part of the output from dummy regression Fig A plot of residuals against predicted values Fig Computation pf the cross-product term Fig The new data file Fig Part of the output for dummy regression with a cross product term Fig Raw data Fig Bivariate regression results Fig A plot of average annual coffee consumption against average price Fig A plot of lny against lnx Fig Results of regressing lny against lnx Fig The reciprocal model with asymptote Fig UK increases in wage rates and unemployment Fig Regression of increases in wage rates against the reciprocal of unemployment Fig United States GDP, Fig A plot of U.S.A. GDP over time Fig Regression results for GDP against t Fig. 3.1 Stock levels over time Fig. 3.2 The create time series dialogue box Fig. 3.3 The default variable name change Fig. 3.4 The variable FIRSTDIF added to the active file Fig. 3.5 A plot of first differences of the variable STOCK Fig. 3.6 The autocorrelations dialogue box Fig. 3.7 The autocorrelations: options dialogue box Fig. 3.8 The ACF plot Fig. 3.9 The PACF plot Fig The ARIMA dialogue box Fig The ARIMA criteria dialogue box Fig The ARIMA save dialogue box Fig Observed and predicted stock levels Fig. 4.1 A company s monthly stock levels over time Fig. 4.2 A plot of stock levels over time Fig. 4.3 The exponential smoothing dialogue box Fig. 4.4 The exponential smoothing: parameters dialogue box Fig. 4.5 The exponential smoothing: save dialogue box Fig. 4.6 The exponential smoothing: options dialogue box Fig. 4.7 The active data file with forecasted and predicted values, plus residuals Fig. 4.8 A plot of observed and predicted stock levels Fig. 4.9 The compute variable dialogue box... 89

14 List of Figures xv Fig Lagged values of the variable LEVEL Fig Computation of the residuals from the Naïve 1 model Fig Creation of LAG12 and LAG Fig Forecasted and residual values from the Naïve 2 model Fig Define graphs with multiple lines, Naïve 1 and Fig Forecasts generated using the Naïve 1 and 2 models Fig. 5.1 Inter-correlations between study variables Fig. 5.2 The factor analysis dialogue box Fig. 5.3 The eigenvalues associated with the factor extraction Fig. 5.4 The communalities associated with the study variables Fig. 5.5 Loadings of four variables on two factors Fig. 5.6 The factor analysis: rotation dialogue box Fig. 5.7 Unrotated and rotated factor loadings Fig. 5.8 The factor analysis: factor scores dialogue box Fig. 5.9 Factor scores added to the active file Fig. 6.1 The discriminant analysis dialogue box x Fig. 6.2 The discriminant analysis: define ranges dialogue box Fig. 6.3 The discriminant analysis: classification dialogue box Fig. 6.4 IBM SPSS output from discriminant analysis Fig. 6.5 Histogram of discriminant scores for the low Fig. 6.6 population group Histogram of discriminant scores for the high population group Fig. 6.7 The discriminant analysis: save dialogue box Fig. 6.8 Results of discriminant analysis added to the working file Fig. 7.1 A hypothetical MDS perceptual map Fig. 7.2 Airmiles data Fig. 7.3 The MDS dialogue box: data format Fig. 7.4 The MDS: model dialogue box Fig. 7.5 The MDS: options dialogue box Fig. 7.6 MDS plot of intercity flying mileages Fig. 7.7 IBM SPSS statistics output for the airmiles data (AIRMILES.SAV) Fig. 7.8 Scatterplot of raw data versus distances Fig. 7.9 MDS map for a consumer s perceptions of car makes Fig Output for MDS of car make similarities Fig The MDS: create measure dialogue box Fig Fig Fig MDS plot of intercity flying mileages using Manhattan distances The multidimensional scaling: shape of data dialogue box The MDS: model dialogue box for the store perception data

15 xvi List of Figures Fig. 8.1 The loglinear analysis: model dialogue box Fig. 8.2 The model selection loglinear analysis dialogue box Fig. 8.3 The loglinear analysis: options dialogue box Fig. 8.4 IBM SPSS output for the saturated model Fig. 8.5 The loglinear analysis: model dialogue box for main effects only Fig. 8.6 IBM SPSS output for the unsaturated model Fig. 8.7 A normal probability plot of residuals from the unsaturated model Fig. 8.8 IBM SPSS for the 4-way loglinear model Fig. 8.9 Part of the results from backward elimination Fig. 9.1 The Crosstabs dialogue box Fig. 9.2 The Crosstabs: statistics dialogue box Fig. 9.3 The Crosstabs: cell display dialogue box Fig. 9.4 A Crosstabulation of deposits and levels of satisfaction (Note: If there are three or more study variables, it is best not to use the chi-squared test of independence. There is a method called log-linear analysis which is available in IBM SPSS Statistics (please refer to Chap. 8)) Fig The explore dialogue box Fig The explore: plots dialogue box Fig Test of normality output Fig Test of homogeneity of Variance output Fig Box plots of type of share * % change in price Fig The one-way ANOVA box Fig The one-way ANOVA: post hoc multiple comparisons box Fig Part of the IBM SPSS statistics output: ANOVA & multiple comparisons Fig Kruskal-Wallis test output Fig Current and constant prices data file Fig Compute variable dialogue box Pindex Fig Price index variable added to the data file Fig Real expenditures variable added to the data file Fig Plot of real expenditures vs current expenditures

16 List of Tables Table 1.1 Fictious data... 4 Table 2.1 Logistic estimate results for the Dietrich and Sorenson study Table 9.1 Contingency table Table 10.1 Types of shares * % change in shares xvii