Researchjournali's Journal of Mathematics


Vol. 3 No. 3 May 2016 ISSN

Comparing and Contrasting the Linear Regression Analysis Theory and Pearson's Product Moment Correlation Analysis Theory in Inferential Statistics

Dr. Eldard Ssebbaale Mukasa (PhD)
Dr. Martha Kibukamusoke (PhD)

ABSTRACT

This article compares and contrasts the correlation and regression theories in inferential statistics. It applies the two theories to the relationship between customer services and volumes of sales at Kamwanyi B restaurant in Kabalagala, Kampala, Uganda. Data were collected through structured and unstructured interviews, using a structured interview guide administered to the restaurant's managers, who are also its owners. The analysis was done in an MS Excel spreadsheet: correlation and regression analyses were run, and the two gave similar results, with $r = 0.95$ and $R^2 = 0.95$, which implied a very high positive relationship between customer services and volumes of sales at the restaurant. Hypotheses were tested for both the correlation coefficient ($r$) and the regression slope ($b_1$) at a significance level of 0.01 (99% confidence). In both cases the null hypothesis was rejected in favour of the alternative, which affirmed a strong positive linear relationship between the independent variable (IV) and the dependent variable (DV).

Key Words: Correlation, Regression, Sample, Level of significance (alpha), Test statistic, Hypothesis testing, Theory, Comparing and contrasting.

1.0 INTRODUCTION

1.1 PRESUMPTIONS

A presumption is a set of postulates, propositions, or accepted facts that tries to give a plausible and coherent account of cause-and-effect (causal) relationships among a group of observed phenomena. A theory, on the other hand, is based upon a hypothesis and backed by evidence. A theory presents a concept or idea that is testable. In science, a theory is not merely a guess; it is a fact-based framework for describing a phenomenon. This article is intended to compare and contrast two theories/models.
The theories considered are the regression analysis theory and the correlation analysis theory, together with their application in the business environment.

1.2 BACKGROUND OF THE THEORIES (CORRELATION AND REGRESSION)

The two theories try to describe the relationship between variables and the level of association between them, which we establish by testing hypotheses at a given level of significance (alpha). For this comparison we take alpha to be 0.01 (99% confidence).

REGRESSION ANALYSIS THEORY

Regression analysis is a mathematical procedure for studying a linear relationship. The theory assumes a general regression model, stated thus:

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

This model tries to explain the relationship between the independent and dependent variables. Moreover, it emphasizes the linear relationship between the variables, framed as cause and effect. The relationship is established by estimating a set of regression coefficients through the Ordinary Least Squares (OLS) approach. The coefficients considered here are $\alpha, \beta_1, \beta_2, \dots, \beta_k$. They are always estimated from sample data consisting of observed values of the independent and dependent variables. The estimates are the values that minimize the sum of squared residuals, which makes the mean residual zero and the standard deviation of the residual term as small as possible. The results are presented as a prediction/estimation equation, usually in the form:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$$

where the residual $U = Y - \hat{Y}$ is the part of each observed value that the equation leaves unexplained.

CORRELATIONS ANALYSIS THEORY

Correlation is a measure of the linear relationship between two or more variables. It is an approach used to measure the strength and direction of the relationship. Correlation is not concerned with the causes of the relationship. The correlation coefficient for the population is denoted by rho ($\rho$), whereas the sample correlation coefficient is denoted by $r$. The relationship of association is either positive or negative at any one time, but not both. The value of $\rho$ or $r$ ranges between -1 and +1, approaching either limit when the relationship is very strong in that direction. The value tends to zero (0) as the relationship of association diminishes.
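As a minimal sketch of the two procedures described above, the Python below fits a single-predictor line by ordinary least squares and computes Pearson's $r$. The function names `ols_fit` and `pearson_r` and the six data points are hypothetical illustrations; they are not the restaurant's figures and not the MS Excel procedure used in this article.

```python
import math


def ols_fit(x, y):
    """Return (a, b1): the intercept and slope that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    a = y_bar - b1 * x_bar
    return a, b1


def pearson_r(x, y):
    """Return Pearson's product moment correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: service rank (1 to 4) and sales in thousands of US$.
service = [1, 2, 3, 4, 3, 4]
sales = [11.0, 15.0, 17.5, 19.0, 17.0, 20.0]

a, b1 = ols_fit(service, sales)
r = pearson_r(service, sales)
residuals = [yi - (a + b1 * xi) for xi, yi in zip(service, sales)]

print(f"fitted line: Y_hat = {a:.3f} + {b1:.3f} * X")
print(f"Pearson r = {r:.4f}")
print(f"sum of residuals = {sum(residuals):.2e}")  # zero up to rounding error
```

The slope carries the units of the data (sales per unit of service rank), while $r$ is unit-free, which is one practical difference between the two measures.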
1.3 CONTEXT OF COMPARISON BETWEEN REGRESSION AND CORRELATION THEORIES

The article is intended to examine the similarities and the contrasts between the two theories when we test hypotheses about the relationship between variables. It does so by looking at the correlation coefficient ($r$), the slope coefficients ($b$) of the regression model, the overall hypothesis of the regression model, and the possibility of cause and effect.

1.4 WHY COMPARE THE TWO THEORIES?

The major reasons for the comparison are to establish how the outputs of the two theories relate when each is used to estimate relationships among variables, and to establish when each model should be used and why: under which circumstances, what is being measured, and of what importance. The comparison is also intended to establish what kind of conclusions should be drawn whenever either model is used.

2.0 DESCRIPTION AND THE USES OF THE THEORIES

In both theories at least two variables must be considered, so that the analyst can establish a relationship between them using the theories/models in question.

2.1 IDENTIFICATION OF VARIABLES TO COMPARE AND CONTRAST USING THE THEORIES

In this scenario there are basically two major variables, with two minor ones. Sales volume at the restaurant is the dependent variable, whereas customer care is the independent variable. The reader will note that there are variables not considered here that are nevertheless vital to the theories under discussion: the intervening and the bridging variables.

2.2 ANALYSIS OF THE RELATIONSHIP BETWEEN THE VARIABLES USING THE THEORIES

In our analysis we need to establish whether there is a relationship between sales volumes in restaurants and the quality of customer care provided by the attendants. The analysis has the following objectives: to record the volumes of sales for a period of 6 months; to identify the quality of customer services provided by the restaurant attendants; and to assess the relationship between sales volumes and customer care at the restaurant (Kamwanyi B, Kabalagala).

The performance of the restaurant over the 6 months is summarized in the table below.

Month        Sales volume (US$)    Customer service (ranked from 1 to 4)
July         12,000                1  Poor
August       16,000                2  Fair
September    17,000                3  Good
October      18,000                4  Excellent
November     16,                   Good
December     19,000                4  Excellent

Source: Kamwanyi B Restaurant, Kabalagala (2014)

Using MS Excel we have carried out an analysis under the two theories, and the outputs are given below. We have assumed the level of significance to be alpha = 0.01 (99% confidence).

SUMMARY OUTPUT

Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations          6

ANOVA
              df    SS    MS    F    Significance F
Regression
Residual
Total

              Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%    Lower 99.0%    Upper 99.0%
Intercept
X Variable

Correlations Output
                    Volumes of sales    Customer care
Volumes of sales    1
Customer care                           1

FURTHER ANALYSIS AND INTERPRETATION OF THE OUTPUT (REGRESSION AND CORRELATION ANALYSIS THEORIES)

REGRESSION ANALYSIS OUTPUT INTERPRETATION

We shall fit the regression model from the output and give a simple interpretation of it. The model will be in the form:

$$\hat{Y} = a + b_1 X_1 + \dots + b_k X_k$$

With a single independent variable we then have $Y_{pred} = a + b_1 X_1 + e$, with $a$ and $b_1$ read from the Coefficients column of the output. This implies that in the absence of desirable customer services the restaurant would still earn US$ 10, (the intercept $a$), but for every unit of improved customer service it would add a further US$ 1, (the slope $b_1$) to its volume of sales.

ANALYSIS AND INTERPRETATION OF THE OUTPUT OF CORRELATION THEORY

                  Sales volumes    Customer care
Sales volumes     1
Customer care                      1

In the output, $r$ measures the relationship of association between sales volumes and the customer services offered by the restaurant. Its value implies a high, positive linear relationship of association between customer services and volumes of sales for the period under study.

3.0 HYPOTHESIS TESTING OF THE SAMPLE OUTPUTS

3.1 INFERENCE ABOUT THE SLOPES OF THE REGRESSION THEORY AND CORRELATION COEFFICIENTS

REGRESSION THEORY INFERENCE

In this section we further test the results above by testing a hypothesis about the slope of the fitted regression line and the population slope. The hypotheses are:

H0: There is no significant relationship between sales and customer care at the restaurant.
HA: There exists a significant relationship between sales and customer care at the restaurant.

The level of significance is alpha = 0.01, where $b_1$ is the sample regression slope coefficient, $\beta_1$ is the hypothesized slope, $s_{b_1}$ is the estimator of the standard error of the slope, and the degrees of freedom are d.f. = n - 2 = 4.

We shall test the hypothesis that there is no significant relationship between the sales and the customer services of the restaurant.

From the table above we read the slope value ($b_1$), its standard error ($s_{b_1}$) and the t statistic (t Stat), where by computation

$$t = \frac{b_1 - \beta_1}{s_{b_1}}$$

We compare the t statistic with the critical value $t_{\alpha/2,\,n-2} = 4.604$. Alternatively, we compare the p-value from the output with the value of alpha (0.01). In the two comparisons we note that the critical value (4.604) is less than the t statistic, and the p-value is less than the value of alpha (0.01). We therefore reject the null hypothesis and accept the alternative hypothesis, and conclude that there is evidence of a relationship between the independent and dependent variables.

We could alternatively compare the level of significance (alpha = 0.01) with the Significance F value. We note that Significance F is less than the level of significance, so we again reject the null hypothesis and accept the alternative.

CORRELATION THEORY INFERENCE

We shall test the hypothesis about the relationship of association between volumes of sales and customer services at the restaurant. We are testing a claim about the population parameter ($\rho$), using the sample statistic ($r$).

Hypotheses

H0: $\rho = 0$ (no correlation)
HA: $\rho \neq 0$ (correlation exists)

Test statistic (with n - 2 degrees of freedom)

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
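The two test statistics above can be sketched side by side in Python. This is an illustration under stated assumptions: the helper names (`centred_sums`, `slope_t`, `correlation_t`) and the six data points are hypothetical, and only the critical value 4.604 comes from the text. With one independent variable the two statistics come out numerically equal, so both tests lead to the same decision.

```python
import math

# Two-tailed critical t for alpha = 0.01 with n - 2 = 4 degrees of freedom,
# as quoted in the text.
T_CRITICAL = 4.604


def centred_sums(x, y):
    """Return the means and the centred sums of squares and cross-products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, s_xx, s_yy, s_xy


def slope_t(x, y, beta1=0.0):
    """t = (b1 - beta1) / s_b1, with n - 2 degrees of freedom."""
    n = len(x)
    x_bar, y_bar, s_xx, _, s_xy = centred_sums(x, y)
    b1 = s_xy / s_xx
    a = y_bar - b1 * x_bar
    sse = sum((yi - (a + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)  # standard error of the slope
    return (b1 - beta1) / s_b1


def correlation_t(x, y):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    n = len(x)
    _, _, s_xx, s_yy, s_xy = centred_sums(x, y)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Hypothetical data: service rank (1 to 4) and sales in thousands of US$.
service = [1, 2, 3, 4, 3, 4]
sales = [11.0, 15.0, 17.5, 19.0, 17.0, 20.0]

t_slope = slope_t(service, sales)
t_corr = correlation_t(service, sales)
reject_h0 = abs(t_slope) > T_CRITICAL

print(f"t for slope = {t_slope:.4f}, t for r = {t_corr:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```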

Looking at the outcomes, we conclude as follows.

Conclusion

We reject H0 (there is no significant relationship between sales and customer service) at the 99% level of confidence, and accept HA (there exists a significant relationship between sales and customer services). We shall say that there is evidence to show that a significant relationship exists between sales volumes of restaurants and customer services worldwide.

3.2 COMPARISONS AND CONTRASTS BETWEEN THE TWO THEORIES

COMPARING THE TWO THEORIES

Specifically, from the two theories run above we note that:

In both theories, H0 was rejected and HA accepted.

With a single independent variable, as in the regression analysis above, the Multiple R reported by the regression is equal in magnitude to the coefficient of correlation $r$.

Again in the single independent variable case (as in this model), the coefficient of determination is

$$R^2 = r^2$$

where $R^2$ is the coefficient of determination and $r$ is the simple correlation coefficient.

Neither regression nor correlation analysis can be interpreted as establishing cause-and-effect relationships. They can indicate only how, or to what extent, variables are associated with each other. The correlation coefficient ($r$) measures only the degree of linear association between two variables. Any conclusion about a cause-and-effect relationship must be based on the judgment of the analyst, with other factors assumed to be constant.

CONTRASTS BETWEEN THE TWO THEORIES

When we look at the regression theory, we can carry out further analyses and obtain other interpretations from the output, which is not the case for correlation theory.
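The single-predictor identity between the coefficient of determination and the squared correlation coefficient, noted in the comparison above, can be derived in a few lines. The shorthand $S_{xx}$, $S_{yy}$ and $S_{xy}$ for the centred sums of squares and cross-products is an assumption introduced for this sketch and is not notation used elsewhere in the article; SSR and SST denote the explained and total variation.

```latex
\begin{align*}
S_{xx} &= \sum_i (X_i - \bar{X})^2, \qquad
S_{yy} = \sum_i (Y_i - \bar{Y})^2, \qquad
S_{xy} = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) \\
b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \\
SSR &= \sum_i (\hat{Y}_i - \bar{Y})^2 = b_1^2\, S_{xx} = \frac{S_{xy}^2}{S_{xx}}, \qquad
SST = S_{yy} \\
R^2 &= \frac{SSR}{SST} = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = r^2
\end{align*}
```

The third line uses $\hat{Y}_i - \bar{Y} = b_1 (X_i - \bar{X})$, which holds because the least squares line passes through $(\bar{X}, \bar{Y})$.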

In a further analysis the slope ($b_1$), which may also be called the gradient of the line, the marginal physical product, or perhaps the marginal propensity to improve customer services, takes on an interval given by the formula

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

with (n - 2) degrees of freedom. The output reports this interval at both the 95% and 99% levels of confidence, whereas the correlation coefficient takes on only a single value, as in the case above.

It can further be concluded from the output table, using $R^2$, the coefficient of multiple determination, that 95% of the variation in volumes of sales (Y) is explained by all the customer services (X) variables taken together, as indicated by the formula below:

$$R^2_{Y.12} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST}$$

The value of $R^2$ never decreases when a new X variable is added to the model, because SST is determined by the sales volume (Y) values alone while the explained variation can only grow. This becomes a disadvantage when comparing models.

On the other hand, the coefficient of determination $R^2$ is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. In this case about 95% of the variation in the volumes of sales is explained by the customer services at the restaurant. This leaves 5% of the variation in volumes of sales to be explained by other variables (say intervening and probably extraneous ones), which are beyond the explanation of this theory and beyond the investigations of the researcher unless a further analysis is carried out.
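The interval for the slope and the SSR/SST ratio described above can be sketched as follows. This is a hedged illustration: the function names `slope_interval` and `r_squared` and the data are hypothetical, the 99% critical value 4.604 is the one quoted in this article, and the 95% critical value 2.776 is the standard t-table entry for 4 degrees of freedom, which the article does not state.

```python
import math

T_99 = 4.604  # two-tailed critical t, alpha = 0.01, df = 4 (quoted in the article)
T_95 = 2.776  # two-tailed critical t, alpha = 0.05, df = 4 (standard table value, assumed here)


def fit(x, y):
    """Return (a, b1, s_xx, y_bar) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1, s_xx, y_bar


def slope_interval(x, y, t_crit):
    """Return (b1, lower, upper) for the interval b1 +/- t_crit * s_b1."""
    n = len(x)
    a, b1, s_xx, _ = fit(x, y)
    sse = sum((yi - (a + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
    return b1, b1 - t_crit * s_b1, b1 + t_crit * s_b1


def r_squared(x, y):
    """Return SSR / SST, the share of total variation explained by the line."""
    a, b1, _, y_bar = fit(x, y)
    ssr = sum((a + b1 * xi - y_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr / sst


# Hypothetical data: service rank (1 to 4) and sales in thousands of US$.
service = [1, 2, 3, 4, 3, 4]
sales = [11.0, 15.0, 17.5, 19.0, 17.0, 20.0]

b1, lo99, hi99 = slope_interval(service, sales, T_99)
_, lo95, hi95 = slope_interval(service, sales, T_95)
r2 = r_squared(service, sales)

print(f"b1 = {b1:.3f}")
print(f"95% interval: ({lo95:.3f}, {hi95:.3f})")
print(f"99% interval: ({lo99:.3f}, {hi99:.3f})")
print(f"R^2 = SSR / SST = {r2:.4f}")
```

The 99% interval is wider than the 95% interval because it uses the larger critical value; an interval that excludes zero agrees with rejecting the null hypothesis about the slope at the matching level of significance.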
The other variables may be, for example, prices, location, seasonal variations and others.

3.3 CONCLUSIONS

The objective of the article was to compare and contrast the theory of regression analysis and Pearson's product moment correlation coefficient analysis in terms of use and application, taking the example of customer services and volumes of sales in a restaurant setting. The findings have revealed that the two theories have several similarities and a few differences. Both are commonly used to measure linear relationships between independent and dependent variables among observations of interest. Regression analysis considers more of the causes and the form of the relationship among the variables studied, in this case the volumes of sales in relation to the customer care given to clients attending the restaurant. Pearson's correlation analysis concentrates on the degree, or magnitude, and the direction of the relationship among the variables under study. The two theories are often used interchangeably, depending on the analyst's level of experience and knowledge of the subject.

Both theories belong to parametric statistics, so a number of assumptions must be satisfied before they are used. Among them, sample data are used, and within the data the residuals are assumed to be normally distributed with mean 0 and constant variance (standard deviation 1 once the residuals are standardized).

In our case the test results indicate a very strong relationship between the two variables in question, namely customer services and volumes of foodstuff sold at the Kamwanyi B restaurant. When variables are perfectly related, which is the extreme case, the correlation coefficient takes the value $r = \pm 1$, which implies that $r^2 = 1$. In regression, the measure of a perfect linear relationship is $R^2 = 1$ (coefficient of determination). In such a situation, for both theories, the perfect relationship is symbolized by $r^2 = R^2 = 1$.

We usually deal with sampled data, and we are interested in the validity of the designs and tools of data collection and in the reliability of our results, so we may need to test the correlation coefficient for significance. In correlation theory we may use a t-test for hypothesis testing when the sample size is less than 30, under several established assumptions, or a normal distribution (Z) test for bigger samples. For regression we may use the Significance F test whenever we are testing the significance of the overall model, or compare the p-value with the level of significance to assure ourselves that the sample slope ($b_1$) represents the population slope ($\beta_1$) well enough to support reliable conclusions. We may therefore say that the two theories are different and can be used differently when analyzing the same data set.