Regression Analysis I & II

Quantitative Methods for Business
Skander Esseghaier

Data for this session is available in Data Regression I & II.

In this session, you will learn:
- How to read and interpret a regression output
    - How much does the regression explain? (R-Square)
    - Is the regression significant? (F-Test)
    - Is any coefficient significant? (T-Test)
- How to deal with some key issues you may be confronted with when using regression models
    - Multicollinearity
    - Categorical independent variables (dummy variable regression)
- How to identify the most important variables in a regression analysis

Meddicorp
- Meddicorp sells medical supplies to hospitals, clinics, and doctors' offices.
- The company currently markets in 3 regions of the US: South, West, and Midwest.
- Meddicorp management is concerned with the effectiveness of a new bonus program for its sales force.
- Management wants to know if there is a relationship between sales and bonuses in 1999.

What Do the Data Say?

SALESPERSON   SALES $   BONUS $
          1       964       231
          2       893       236
          3      1057       272
          4      1183       291
          5      1420       282
          6      1548       321
          7      1580       294
          8      1072       306
          9      1078       238
         10      1123       271
         11      1305       333
         12      1552       262
         13      1040       236
         14      1045       250
         15      1102       233
         16      1225       272
         17      1508       267
         18      1564       277
         19      1635       312
         20      1159       293
         21      1203       268
         22      1294       310
         23      1468       291
         24      1584       289
         25      1125       273

Measures of Association
Scatter plot: it describes the relationship between two variables graphically.
[Figure: scatter plot "Relationship between Sales Performance and Bonus Received"; x-axis: Sales in $ (800 to 1700), y-axis: Bonus in $ (200 to 340)]

How Strong is the Association?
[Figure: the same scatter plot of Bonus in $ against Sales in $]

Measures of Association
[Figure: scatter plot of Advertising (300 to 700) against Sales in $ (800 to 1700)]
The two variables must have the same number of observations (paired variables).

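Measures of Association
Covariance and Correlation: measures that numerically summarize the strength of the linear relationship between the two variables.

$$\mathrm{Cov}(X,Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{s_X \, s_Y}$$

Excel: COVAR(data1,data2)
Excel: CORREL(data1,data2)
Note: Excel's COVAR divides by n (population covariance); COVARIANCE.S uses the n - 1 divisor shown here. CORREL gives the same result either way.

The two variables, say X and Y, must have the same number of observations (paired variables).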
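The two formulas can be checked directly on the Meddicorp data. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the slides. The SALES and BONUS values are the rounded figures from the data table, so the result should land close to, but not exactly on, the 0.565 that the correlation table later in the deck reports for SALES and BONUS.

```python
import math

# SALES ($) and BONUS ($) for the 25 salespeople, copied from the data table.
sales = [964, 893, 1057, 1183, 1420, 1548, 1580, 1072, 1078, 1123, 1305, 1552,
         1040, 1045, 1102, 1225, 1508, 1564, 1635, 1159, 1203, 1294, 1468, 1584,
         1125]
bonus = [231, 236, 272, 291, 282, 321, 294, 306, 238, 271, 333, 262,
         236, 250, 233, 272, 267, 277, 312, 293, 268, 310, 291, 289,
         273]


def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor used in the slide."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same number of observations)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_corr(x, y):
    """Correlation = covariance divided by the two sample standard deviations."""
    s_x = math.sqrt(sample_cov(x, x))
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)


print(f"Cov(SALES, BONUS)  = {sample_cov(sales, bonus):.1f}")
print(f"Corr(SALES, BONUS) = {sample_corr(sales, bonus):.3f}")
```

The covariance depends on the units of both variables, while the correlation does not, which is the advantage the next slide points out.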

Measures of Association
The advantage that the correlation has over the covariance is that the correlation is always between -1 and +1.
- Corr(X,Y) = -1: perfect negative linear relationship
- Corr(X,Y) = +1: perfect positive linear relationship
- Corr(X,Y) = 0: no linear relationship
All other values of correlation are judged in relation to these three values.

Measures of Association
[Figure: four example scatter plots, each labeled with its covariance and correlation]
- Covar(X,Y) = 11.88, Corr(X,Y) = 0.91
- Covar(X,Y) = -943.163, Corr(X,Y) = -0.82
- Covar(X,Y) = 85.43, Corr(X,Y) = 0.20
- Covar(X,Y) = -0.00058, Corr(X,Y) = -0.0038

The Basic Idea Behind Regression
Suppose you believe Salary is related to Education, Experience & Gender.
- Measure of association: Corr(Salary, Education), Corr(Salary, Experience)
- Hypothesis testing (gender discrimination?): Average Salary for Males versus Average Salary for Females

The Basic Idea Behind Regression
If we believe Salary is related to Education, Experience and Gender, can we come up with some weighted sum of Education, Experience and Gender that will help us predict Salary with as little error as possible?

Salary = b0 + b1*Education + b2*Experience + b3*Gender

Goal of regression: come up with the weights b0, b1, b2, b3 that will predict Salary with the least possible error using the variables Education, Experience and Gender.

Gender discrimination context: suppose b3 is significantly different from zero. Then we can more conclusively say that there is possible gender discrimination, because we have separated out the effects of education and experience.
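The slides do not spell out what "least possible error" means. Assuming ordinary least squares, which is the criterion behind standard regression output, the weights are chosen to minimize the sum of squared prediction errors over the n observations. The index i and the error term e_i are notation introduced here for illustration:

```latex
\min_{b_0, b_1, b_2, b_3} \; \sum_{i=1}^{n} e_i^2,
\qquad
e_i = \text{Salary}_i - \left( b_0 + b_1\,\text{Education}_i
      + b_2\,\text{Experience}_i + b_3\,\text{Gender}_i \right)
```

Squaring keeps positive and negative errors from cancelling and penalizes large misses more heavily than small ones.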

Uses of Regression
- Describing and understanding relationships: Sales = 20,000 - 3.2 Price + 0.1 Advertising
- Forecasting and predicting a new observation: what will sales be if I change advertising or price?
- Adjusting the independent variables: how much should I change advertising or price?
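As a small illustration of the forecasting use, the example equation above can be wrapped in a function and evaluated at different settings. This is only a sketch: the price and advertising values below are hypothetical inputs chosen for the example, not data from the slides.

```python
def predict_sales(price, advertising):
    """Example equation from the slide: Sales = 20,000 - 3.2 Price + 0.1 Advertising."""
    return 20_000 - 3.2 * price + 0.1 * advertising


# Hypothetical scenario: price held at 100, advertising raised from 1,000 to 1,500.
baseline = predict_sales(price=100, advertising=1_000)
more_advertising = predict_sales(price=100, advertising=1_500)

print(f"Predicted sales at baseline:        {baseline:,.0f}")
print(f"Predicted sales with more ads:      {more_advertising:,.0f}")
print(f"Change from 500 more advertising:   {more_advertising - baseline:,.0f}")
```

The change in predicted sales equals the advertising coefficient times the change in advertising, which is how each coefficient is read: the effect of one variable with the other held fixed.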

Back to Meddicorp
- Meddicorp management is concerned with the effectiveness of a new bonus program for its sales force.
- There seems to be a relationship between sales and bonuses in 1999.
- But sales should be adjusted for other factors. For example, if Meddicorp advertised heavily, that would increase sales, so sales should be corrected for advertising expenses to get the true effect of sales force performance.

Meddicorp Regression

Regression Analysis: SALES versus ADV, BONUS

The regression equation is
SALES = - 516 + 2.47 ADV + 1.86 BONUS

Predictor      Coef   SE Coef       T       P
Constant     -516.2     191.1   -2.70   0.013
ADV          2.4715    0.2762    8.95   0.000
BONUS        1.8587    0.7184    2.59   0.017

(T column: T-stats. P column: P-value for 2-tailed test.)

R-Sq = 85.3%   R-Sq(adj) = 84.0%   (R-Square)

Analysis of Variance
Source           DF        SS       MS       F       P
Regression        2   1066070   533035   64.01   0.000
Residual Error   22    183212     8328
Total            24   1249282

(F and P columns: F-Test.)
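The three headline quantities in this output can be reproduced from the other numbers printed in it, which is a useful way to see what each one measures. The sketch below uses only figures from the output above; the variable names are illustrative.

```python
# Coefficients and standard errors as printed in the output: name -> (Coef, SE Coef).
coefficients = {
    "Constant": (-516.2, 191.1),
    "ADV": (2.4715, 0.2762),
    "BONUS": (1.8587, 0.7184),
}

# T-stat = coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_regression, ss_residual, ss_total = 1066070, 183212, 1249282
df_regression, df_residual = 2, 22

# R-Square = share of the total variation in SALES explained by the regression.
r_square = ss_regression / ss_total

# F = mean square for regression divided by mean square for residual error.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

for name, t in t_stats.items():
    print(f"T({name}) = {t:.2f}")
print(f"R-Sq = {r_square:.1%}")
print(f"F    = {f_stat:.2f}")
```

Small differences in the last digit can appear because the printed coefficients and sums of squares are themselves rounded.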

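What is the R-Square Here?
[Figure: three Y-versus-X scatter plots, labeled R-Sq = 1, R-Sq < 1, and R-Sq << 1]

And Here?
[Figure: three Y-versus-X scatter plots, labeled R-Sq = 1, R-Sq < 1, and R-Sq << 1]

What About Here?
[Figure: three Y-versus-X plots, labeled R-Sq = 1, R-Sq = 1, and "R-Square is meaningless"]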

What if R-Square = 0?
Does it imply that there is no relationship between the variables?

No Linear Relationship
[Figure: Y-versus-X plot with a curved pattern]
There is a relationship between the variables but not a linear one: R-Sq = 0

Types of Non-Linear Relationships
[Figure: two Y-versus-X plots showing non-linear patterns]

No Relationship Between the Variables
[Figure: Y-versus-X plot with no pattern]
No relationship implies R-Sq = 0 (but R-Sq = 0 implies no linear relationship only!)
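A tiny constructed example makes the point concrete. In the sketch below, Y is completely determined by X (Y = X squared), yet the correlation, and therefore the R-Square of a straight-line fit, is zero. The data are made up for illustration and are not from the Meddicorp case.

```python
import math

# Constructed data: Y depends perfectly on X, but through a curve, not a line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]


def correlation(a, b):
    """Pearson correlation of two paired lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cross = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cross / math.sqrt(ss_a * ss_b)


r = correlation(x, y)
# With a single predictor, R-Square of the linear fit equals the squared correlation.
print(f"Corr(X, Y) = {r:.3f}, R-Sq = {r ** 2:.3f}")
```

The positive and negative halves of the curve cancel exactly, so a straight line explains nothing even though the relationship is perfect.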

Meddicorp: More Variables
Suppose Meddicorp believes that the sales of a representative could be a function of:
- its market share; and
- the sales of the largest competitor in its territory.
What would be your research hypothesis for these variables?

Meddicorp: Interpret Output

Regression Analysis: SALES versus ADV, BONUS, MKTSHR, COMPET

The regression equation is
SALES = - 594 + 2.51 ADV + 1.91 BONUS + 2.68 MKTSHR - 0.119 COMPET

Predictor      Coef   SE Coef       T       P
Constant     -594.3     260.5   -2.28   0.034
ADV          2.5113    0.3153    7.96   0.000
BONUS        1.9074    0.7450    2.56   0.019
MKTSHR        2.678     4.661    0.57   0.572   (not significant)
COMPET      -0.1192    0.3743   -0.32   0.753   (not significant)

R-Sq = 85.8%   R-Sq(adj) = 82.9%

Analysis of Variance
Source           DF        SS       MS       F       P
Regression        4   1071426   267857   30.12   0.000
Residual Error   20    177856     8893
Total            24   1249282
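Adding MKTSHR and COMPET raises R-Sq slightly (85.3% to 85.8%) but lowers R-Sq(adj) (84.0% to 82.9%). The sketch below shows why, assuming the usual definition of adjusted R-Square, which the slides do not state explicitly: it divides each sum of squares by its degrees of freedom, so extra predictors that explain little are penalized. The sums of squares come from the two Analysis of Variance tables.

```python
def r_square(ss_residual, ss_total):
    """Plain R-Square: 1 minus the unexplained share of total variation."""
    return 1 - ss_residual / ss_total


def adjusted_r_square(ss_residual, ss_total, n, k):
    """Adjusted R-Square for n observations and k predictors (excluding the constant)."""
    return 1 - (ss_residual / (n - k - 1)) / (ss_total / (n - 1))


N = 25              # salespeople
SS_TOTAL = 1249282  # same in both models

# Residual sums of squares from the two outputs.
two_var = {"k": 2, "ss_residual": 183212}   # ADV, BONUS
four_var = {"k": 4, "ss_residual": 177856}  # ADV, BONUS, MKTSHR, COMPET

for label, model in (("2 predictors", two_var), ("4 predictors", four_var)):
    plain = r_square(model["ss_residual"], SS_TOTAL)
    adjusted = adjusted_r_square(model["ss_residual"], SS_TOTAL, N, model["k"])
    print(f"{label}: R-Sq = {plain:.1%}, R-Sq(adj) = {adjusted:.1%}")
```

Plain R-Square can never fall when a variable is added, so the adjusted version is the more honest guide when comparing models with different numbers of predictors.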

Meddicorp: In Search of More Predictors
Meddicorp wonders if sales could be better explained by including not only the current bonus but also the salesperson's bonus from last year.

Including Last Year's Bonus to Explain Sales

Regression Analysis: SALES versus ADV, BONUS, LAST YR BONUS

The regression equation is
SALES = - 511 + 2.45 ADV + 1.60 BONUS + 0.28 LAST YR BONUS

Predictor      Coef   SE Coef       T       P
Constant     -511.0     196.3   -2.60   0.017
ADV          2.4528    0.2915    8.41   0.000
BONUS         1.598     1.257    1.27   0.217
LAST YR       0.279     1.089    0.26   0.801

BONUS is no longer significant.

R-Sq = 85.4%   R-Sq(adj) = 83.3%

Analysis of Variance
Source           DF        SS       MS       F       P
Regression        3   1066639   355546   40.88   0.000
Residual Error   21    182643     8697
Total            24   1249282

Multicollinearity
Caused by high correlation between the RHS (right-hand-side, i.e. independent) variables.

Correlations: SALES, ADV, BONUS, LAST YR BONUS

            SALES     ADV   BONUS
ADV         0.899
            0.000
BONUS       0.565   0.415
            0.003   0.039
LAST YR     0.587   0.472   0.847   (too high)
            0.002   0.017   0.000

Cell Contents: Pearson correlation
               P-Value

Detecting and Correcting Multicollinearity
- Detecting multicollinearity
    - Look at correlations
    - Large F-Stat, but poor T-Stat
- Correcting multicollinearity
    - Keep only one of the highly correlated variables

More on Detecting Multicollinearity
Look at Variance Inflation Factors (VIF)
- Set this option in Minitab when estimating the regression
- Look at the largest VIF
- VIF = 1 indicates no multicollinearity
- VIF > 5 is considered very bad
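The slides do not define the VIF. Under the standard definition, assumed here, the VIF of predictor j is 1 / (1 - R_j squared), where R_j squared comes from regressing predictor j on all the other predictors. When there are only two predictors, R_j squared is simply their squared correlation, which gives a quick back-of-the-envelope check using the 0.847 correlation between BONUS and LAST YR reported earlier. The function name is illustrative.

```python
def variance_inflation_factor(r_square_j):
    """VIF = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others."""
    if not 0 <= r_square_j < 1:
        raise ValueError("R-square must be in [0, 1)")
    return 1 / (1 - r_square_j)


# Uncorrelated predictors: no inflation.
print(f"VIF at R_j^2 = 0:   {variance_inflation_factor(0.0):.2f}")

# Two-predictor approximation for BONUS and LAST YR (correlation 0.847).
corr_bonus_last_year = 0.847
approx_vif = variance_inflation_factor(corr_bonus_last_year ** 2)
print(f"Approximate VIF for BONUS / LAST YR: {approx_vif:.2f}")

# The 'very bad' threshold VIF > 5 corresponds to R_j^2 above 0.8.
print(f"VIF at R_j^2 = 0.8: {variance_inflation_factor(0.8):.2f}")
```

This approximation ignores ADV, so it will not match the VIF column in the three-predictor output exactly, but it lands in the same neighborhood as the 3.5 and 3.8 reported there.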

Correcting for Multicollinearity

Regression Analysis: SALES versus ADV, BONUS, LAST YR BONUS

The regression equation is
SALES = - 511 + 2.45 ADV + 1.60 BONUS + 0.28 LAST YR BONUS

Predictor      Coef   SE Coef       T       P   VIF
Constant     -511.0     196.3   -2.60   0.017
ADV          2.4528    0.2915    8.41   0.000   1.3
BONUS         1.598     1.257    1.27   0.217   3.5
LAST YR       0.279     1.089    0.26   0.801   3.8

S = 93.26   R-Sq = 85.4%   R-Sq(adj) = 83.3%

Analysis of Variance
Source           DF        SS       MS       F       P
Regression        3   1066639   355546   40.88   0.000
Residual Error   21    182643     8697
Total            24   1249282

Source    DF    Seq SS
ADV        1   1010324
BONUS      1     55746
LAST YR    1       569

Modeling Categorical Variables
- Meddicorp believes that sales are often a function of a region's attractiveness, and the bonus analysis should correct for that: is bonus related to sales beyond regional attractiveness?
- How to include regions (South, West, Midwest) in the regression?
- Create dummy variables.

Including Dummies

Regression Analysis: SALES versus ADV, BONUS, South, West, Midwest

* Midwest is highly correlated with other X variables
* Midwest has been removed from the equation

The regression equation is
SALES = 441 + 1.36 ADV + 0.970 BONUS - 259 South - 211 West

Predictor      Coef   SE Coef       T       P
Constant      440.9     206.7    2.13   0.045
ADV          1.3608    0.2623    5.19   0.000
BONUS        0.9703    0.4815    2.02   0.058
South       -259.43     48.47   -5.35   0.000
West        -210.82     37.49   -5.62   0.000

R-Sq = 94.7%   R-Sq(adj) = 93.6%

Can use only n-1 dummy variables if there are n categories; if you use all n, it creates multicollinearity.
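The n-1 rule can be sketched in a few lines. In the example below, Midwest is the omitted (baseline) region, matching the output above, so the South and West coefficients are read as differences from Midwest. The function names and the ADV and BONUS inputs are hypothetical, chosen only to illustrate the coding.

```python
REGIONS = ["South", "West", "Midwest"]
BASELINE = "Midwest"  # omitted category, as in the regression output


def region_dummies(region):
    """Return the n - 1 dummy variables (0 or 1) for a region; baseline gets all zeros."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return {name: int(region == name) for name in REGIONS if name != BASELINE}


def predicted_sales(adv, bonus, region):
    """Fitted equation from the output, using the printed coefficients."""
    dummies = region_dummies(region)
    return (440.9 + 1.3608 * adv + 0.9703 * bonus
            - 259.43 * dummies["South"] - 210.82 * dummies["West"])


# Hypothetical salesperson with ADV = 500 and BONUS = 270 placed in each region.
for region in REGIONS:
    print(region, region_dummies(region), round(predicted_sales(500, 270, region), 1))
```

Holding ADV and BONUS fixed, the predictions differ across regions only by the dummy coefficients, which is why a third dummy for Midwest would add no new information and would be perfectly collinear with the constant and the other two.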

Assessing the Importance of the Variables
- Standardized coefficient for Xi when Y is regressed on it:
  standardized coeff = unstandardized coeff * stdev of Xi / stdev of Y
- Standardized coefficients are also called beta coefficients.
- Compute in Excel if you like:
    - compute the standard deviation for regular variables using stdev(.)
    - compute the standard deviation for dummy variables using stdeva(.)
- Warning: compute betas only for significant coefficients.
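The formula can be tried out on BONUS as a minimal sketch. The BONUS values are the rounded figures from the data table, the standard deviation of SALES is recovered from the Total sum of squares in the regression output (SS = 1249282 with 24 degrees of freedom), and 0.9703 is the BONUS coefficient from the regression with regional dummies. The function name is illustrative.

```python
import math
import statistics

# BONUS ($) for the 25 salespeople, copied from the data table.
bonus = [231, 236, 272, 291, 282, 321, 294, 306, 238, 271, 333, 262,
         236, 250, 233, 272, 267, 277, 312, 293, 268, 310, 291, 289,
         273]


def standardized_coef(unstandardized_coef, stdev_x, stdev_y):
    """Beta coefficient = unstandardized coefficient * stdev of X / stdev of Y."""
    return unstandardized_coef * stdev_x / stdev_y


# Sample standard deviation of SALES from the ANOVA table: sqrt(Total SS / (n - 1)).
stdev_sales = math.sqrt(1249282 / 24)
stdev_bonus = statistics.stdev(bonus)

beta_bonus = standardized_coef(0.9703, stdev_bonus, stdev_sales)
print(f"stdev(SALES) = {stdev_sales:.1f}, stdev(BONUS) = {stdev_bonus:.1f}")
print(f"Beta coefficient for BONUS = {beta_bonus:.3f}")
```

Because the beta coefficient is unit-free, it can be compared across variables measured on different scales; the result here should be close to the 0.121 reported for BONUS on the next slide.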

Most Important Variables for Meddicorp

Regression Analysis: SALES versus ADV, BONUS, South, West, Midwest

The regression equation is
SALES = 441 + 1.36 ADV + 0.970 BONUS - 259 South - 211 West

Predictor      Coef   SE Coef       T       P   Beta Coef
Constant      440.9     206.7    2.13   0.045
ADV          1.3608    0.2623    5.19   0.000       0.442
BONUS        0.9703    0.4815    2.02   0.058       0.121
South       -259.43     48.47   -5.35   0.000      -0.521
West        -210.82     37.49   -5.62   0.000      -0.439

R-Sq = 94.7%   R-Sq(adj) = 93.6%

Takeaways
- T-stats greater than 2 in absolute value and p-values < 0.05 indicate that a variable in a regression is significant at the 5% level. This implies that the variable can explain variation in the dependent variable.
- With a large number of independent variables, don't put them all into the regression in one shot. If there are high correlations among some variables, pick one of these for the regression to avoid multicollinearity.
- If there are n categories in a categorical variable, use n-1 dummy variables.
- Compute standardized (beta) coefficients to check the importance of variables.