Business Statistics (BK/IBA) Tutorial 4 Exercises


Instruction

In a tutorial session of 2 hours, we will obviously not be able to discuss all questions. Therefore, the following procedure applies:
- we expect students to prepare all exercises in advance;
- we will discuss only a selection of exercises;
- exercises that were not discussed during class are nevertheless part of the course;
- students can indicate their wish list of exercises to be discussed during the session;
- teachers may invite students to answer questions, orally or on the blackboard.

We further understand that your time is limited, and in particular that your time between lecture and tutorial may be limited. In case you have no time to prepare everything, we kindly advise you to give priority to the exercises that are indicated with the icon. This does not mean that the other questions are not relevant!

7A+B Several μs: comparison + Several μs and medians: more issues

Q1 (based on Doane & Seward, 4/E, 11.7 and 11.13)
Semester GPAs are compared for seven randomly chosen students in each class level at Oxnard University.
a. Do the data show a significant difference in mean GPAs?
b. Which pairs of mean GPAs differ significantly (4 majors)?

Excel output for ANOVA

Anova: Single Factor

SUMMARY
Groups           Count  Sum    Average  Variance
Accounting       7      19,84  2,…      …
Finance          7      21,17  3,…      …
Human Resources  7      22,69  3,…      …
Marketing        7      23,6   3,…      …

ANOVA
Source of Variation  SS   df  MS  F  P-value  F crit
Between Groups       1,…  …   …   …  …        …
Within Groups        2,…  …   …
Total                3,…  …

SPSS output for post-hoc

BS 1 Tutorial 4
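The group sums and counts in the Excel SUMMARY table are enough to recover the group means and the between-groups sum of squares for Q1. The following is a minimal sketch under that assumption. The variable names are illustrative, and the within-groups part cannot be recomputed because the variances are not legible in the output above.

```python
# Group sums and group size as they appear in the Excel SUMMARY table.
group_sums = {
    "Accounting": 19.84,
    "Finance": 21.17,
    "Human Resources": 22.69,
    "Marketing": 23.60,
}
n_per_group = 7

# Group means and the grand mean.
group_means = {name: total / n_per_group for name, total in group_sums.items()}
grand_mean = sum(group_sums.values()) / (n_per_group * len(group_sums))

# Between-groups sum of squares: n * (group mean - grand mean)^2, summed over groups.
ss_between = sum(n_per_group * (m - grand_mean) ** 2 for m in group_means.values())
df_between = len(group_sums) - 1
ms_between = ss_between / df_between

for name, m in group_means.items():
    print(f"{name}: mean GPA = {m:.3f}")
print(f"SS between = {ss_between:.4f}, df = {df_between}, MS between = {ms_between:.4f}")
```

The resulting SS between starts with 1, which agrees with the leading digit that survives in the ANOVA table.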

Answer Q1: a. Yes; b. the mean score for Marketing differs significantly from the mean score for Accounting.

Q2
The retailing manager of a supermarket chain wants to determine whether product location has any effect on the sale of pet toys. Three different aisle locations are considered: front, middle, and rear. A random sample of 18 stores is selected, with 6 stores randomly assigned to each aisle location. The size of the display area and price of the product are constant for all stores. At the end of a one-month trial period, the sales volumes (in thousands of dollars) of the product in each store were as follows (and are stored in the file Locate).

Aisle Location
Front  Middle  Rear
8,6    3,2     4,6
7,2    2,4     6,0
5,4    2,0     4,0
6,2    1,4     2,8
5,0    1,8     2,2
4,0    1,6     2,8

a. At the 0.05 level of significance, is there evidence of a significant difference in mean sales among the various aisle locations?
b. If appropriate, which aisle locations appear to differ significantly in mean sales?
c. At the 0.05 level of significance, is there evidence of a significant difference in the variation in sales among the various aisle locations?
d. What should the retailing manager conclude? Fully describe the retailing manager's options with respect to aisle locations.

Test of Homogeneity of Variances
SALES
Levene Statistic  df1  df2  Sig.
2,…               …    …    ,142

ANOVA
SALES
                Sum of Squares  df  Mean Square  F       Sig.
Between Groups  48,…            …   …,222        14,105  ,000
Within Groups   25,…            …   …,717
Total           74,…            …
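The one-factor ANOVA for Q2 can be recomputed directly from the eighteen sales figures in the table above. The following is a minimal sketch in plain Python. The function name `one_way_anova` is illustrative, and the Levene statistic is computed here as an ANOVA on absolute deviations from the group means, which is assumed to be the variant SPSS reports.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_stat)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

# Sales (thousands of dollars) per aisle location, copied from the table above.
sales = {
    "front": [8.6, 7.2, 5.4, 6.2, 5.0, 4.0],
    "middle": [3.2, 2.4, 2.0, 1.4, 1.8, 1.6],
    "rear": [4.6, 6.0, 4.0, 2.8, 2.2, 2.8],
}

ssb, ssw, dfb, dfw, f_stat = one_way_anova(list(sales.values()))
print(f"SS between = {ssb:.3f} (df {dfb}), SS within = {ssw:.3f} (df {dfw})")
print(f"F = {f_stat:.3f}")

# Levene-type statistic: the same ANOVA applied to |x - group mean|.
abs_dev = [[abs(x - sum(g) / len(g)) for x in g] for g in sales.values()]
levene_w = one_way_anova(abs_dev)[4]
print(f"Levene statistic = {levene_w:.3f}")
```

The F statistic agrees with the value 14,105 that survives in the SPSS ANOVA table, and the Levene statistic starts with 2, as in the homogeneity table.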

Multiple Comparisons
Dependent Variable: sales
Tukey HSD
(I) locate  (J) locate  Mean Difference (I-J)  Std. Error  Sig.
front       middle      4,0000*                ,7566       ,0003
front       rear        2,3333*                ,7566       ,0195
middle      front       -4,0000*               ,7566       ,0003
middle      rear        -1,6667                ,7566       ,1031
rear        front       -2,3333*               ,7566       ,0195
rear        middle      1,6667                 ,7566       ,1031
*. The mean difference is significant at the .05 level.

sales
Tukey HSD (a)
                Subset for alpha = .05
locate  N       1       2
middle  6       2,067
rear    6       3,733
front   6               6,067
Sig.            ,103    1,000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 6,000.

Answer Q2: a. Yes; b. μfront differs significantly from μmiddle and μrear; c. No; d. The front aisle is best.

Q3 (based on Doane & Seward, 4/E, 16.10)
The results shown below are mean productivity measurements (average number of assemblies completed per hour) for a random sample of workers at each of three work stations.
a. At α = .05, is there a difference in median productivity?
b. Use one-factor ANOVA to compare the means.
c. Do you reach the same conclusion?
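The Tukey HSD conclusions for Q2 (the aisle-location table above) can be checked by hand from the group means and the standard error of a mean difference. The following is a minimal sketch, assuming a studentized range critical value of 3.67 for 3 groups and 15 error degrees of freedom at α = .05. That value comes from a standard table, not from the SPSS output.

```python
import math
from itertools import combinations

# Group means from the homogeneous-subsets table.
means = {"middle": 2.067, "rear": 3.733, "front": 6.067}

# Standard error of a difference between two means, from the Tukey table.
se_diff = 0.7566

# ASSUMPTION: studentized range critical value q(.05; k = 3, df = 15) from a table.
q_crit = 3.67

# SE of a difference is sqrt(2 * MSW / n), so sqrt(MSW / n) = se_diff / sqrt(2).
hsd = q_crit * se_diff / math.sqrt(2)

# A pair differs significantly when its absolute mean difference exceeds the HSD.
significant = {
    (a, b) for a, b in combinations(means, 2) if abs(means[a] - means[b]) > hsd
}

print(f"HSD = {hsd:.3f}")
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    verdict = "significant" if (a, b) in significant else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.3f} -> {verdict}")
```

Only the two comparisons involving the front aisle exceed the threshold, which matches the starred rows in the table.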

Q4 (based on Berenson 11/E, 11.38)
A student team in a business statistics course performed an experiment to investigate the time required for pain-relief tablets to dissolve in a glass of water. The factor of interest is the temperature of the water (hot or cold). The experiment consisted of 12 replicates for each of the two temperature levels. The following data show the time a tablet took to dissolve (in seconds) for the 24 tablets used in the experiment:

At the 0.05 level of significance,
a. Is there an effect due to water temperature? Use the output from SPSS:
b. Can you redo this analysis with a t-test?

Answer Q4: a. there is a temperature effect; b. the analysis can be repeated with an independent samples t-test.
Answer Q3: a. reject H0; b. reject H0; c. the same.
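The answer to Q4b rests on the fact that, with only two groups, the one-factor ANOVA F statistic equals the square of the pooled-variance t statistic. The following is a minimal sketch of that identity. The dissolve times are made-up illustrative numbers, not the data of the exercise, which are not legible in this transcription.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def anova_f(a, b):
    """One-factor ANOVA F statistic for two groups (df between = 1)."""
    grand = mean(a + b)
    ss_between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    ss_within = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    return ss_between / (ss_within / (len(a) + len(b) - 2))

# HYPOTHETICAL dissolve times in seconds, for illustration only.
hot = [12.1, 11.4, 13.0, 12.5]
cold = [20.3, 19.8, 21.1, 20.6]

t_stat = pooled_t(hot, cold)
f_stat = anova_f(hot, cold)
print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

Because F = t², the two tests give the same two-sided p-value, so they lead to the same decision.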

8A+B Simple regression analysis + Multiple regression analysis and other issues

Q1 (based on Doane & Seward, 4/E, 12.7)
a. Interpret the slope of the fitted regression HomePrice = 125,000 + 150 SquareFeet.
b. What is the prediction for HomePrice if SquareFeet = 2,000?
c. Would the intercept be meaningful if this regression applies to home sales in a certain subdivision, different from the one used to find the regression equation?

Q2 (based on Doane & Seward, 4/E, 12.13)
The regression equation HomePrice = 51.3 + 2.61 Income was estimated from a sample of 34 cities in the eastern United States. Both variables are in thousands of dollars. HomePrice is the median selling price of homes in the city, and Income is median family income for the city.
a. Interpret the slope.
b. Is the intercept meaningful? Explain.
c. Make a prediction of HomePrice when Income = 50 and also when Income = 100.
d. Given: R² = … What is the meaning of that?
(Data are from Money Magazine 32, no. 1 [January 2004], pp. …)

Q3 (based on Doane & Seward, 4/E, 12.26)
A regression was performed using data on 16 randomly selected charities in …. The variables were Y = expenses (millions of dollars) and X = revenue (millions of dollars).
a. Write the fitted regression equation.
b. Construct a 95 percent confidence interval for the slope.
c. Perform a right-tailed t test for zero slope at α = .05. State the hypotheses clearly.
(Data are from Forbes 172, no. 12, p. 248, and …)

SUMMARY OUTPUT

Regression Statistics
Multiple R         0,…
R Square           0,…
Adjusted R Square  0,…
Standard Error     14,…
Observations       16

ANOVA
            df  SS  MS        F  Significance F
Regression  …   …   …         …  …,07289E-08
Residual    …   …   …,13245
Total       …   …

           Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  7,…           …,0403          …       …        …          …
Revenue    0,9467        0,0936          …       …        …          …

Answer Q3: a. Y = … X; b. … β …; c. reject H0.

Answer Q2: a. Increasing the median income by $1,000 raises the median home price by $2,610; b. If median income is zero, then the model suggests that median home price is $51,300; c. $181,800 and $312,300; d. 34% of the variance of HomePrice is explained by the model.

Answer Q1: a. Increasing the size of a home by 1 square foot increases the price by $150. b. HomePrice = $125,000 + ($150 × 2,000) = $425,000. c. The intercept might be interpreted as the value of the lot without a home. But the range of values for SquareFeet does not include zero, so it would be dangerous to extrapolate to SquareFeet = 0.
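The predictions in Q1 and Q2 and the slope inference in Q3 are short calculations. The following is a minimal sketch: the Q1 and Q2 coefficients are taken from the answers above, the Q3 slope and standard error from the Revenue row of the output, and the two t critical values (df = 16 − 2 = 14) from a standard t table, which is an assumption not shown in the source.

```python
def predict(intercept, slope, x):
    """Fitted value of a simple linear regression."""
    return intercept + slope * x

# Q1: HomePrice in dollars, SquareFeet = 2,000.
price_q1 = predict(125_000, 150, 2_000)

# Q2: both variables in thousands of dollars, Income = 50 and 100.
prices_q2 = [predict(51.3, 2.61, income) for income in (50, 100)]

# Q3: slope and standard error from the Revenue row, n = 16 observations.
b1, se_b1, n = 0.9467, 0.0936, 16
df = n - 2

# ASSUMPTION: table values t(.025; 14) = 2.145 and t(.05; 14) = 1.761.
t_two_sided = 2.145
t_right_tail = 1.761

ci_slope = (b1 - t_two_sided * se_b1, b1 + t_two_sided * se_b1)
t_stat = b1 / se_b1                      # test of H0: beta1 <= 0 vs H1: beta1 > 0
reject_h0 = t_stat > t_right_tail

print(f"Q1 prediction: ${price_q1:,}")
print(f"Q2 predictions (thousands): {prices_q2[0]:.1f} and {prices_q2[1]:.1f}")
print(f"Q3 95% CI for slope: [{ci_slope[0]:.4f}, {ci_slope[1]:.4f}]")
print(f"Q3 t = {t_stat:.2f} with df = {df}, reject H0: {reject_h0}")
```

The Q1 and Q2 results match the stated answers, and the Q3 test statistic is far above the right-tail critical value, in line with the answer "reject H0".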

Q4
Use a linear regression model to explain the height (Dutch: "lengte") of female premaster students (…) in terms of their shoe size (Dutch: "schoenmaat"). Below you find some computer output, based on a random sample of these students.

Predicted values for: Lengtecm
Schoenmaat  Predicted  95% Confidence Interval (lower, upper)  95% Prediction Interval (lower, upper)

a. Determine the theoretical and the estimated model belonging to the given output.
b. It is claimed that the slope in this model is larger than 2. Test this hypothesis (α = 1%).
c. Is this a useful model in order to predict the height of female premaster students? (Perhaps you have seen a footprint in the snow; is it useful (using this model) to predict the height of the person concerned?)
d. You see a footprint of size 38 in the snow and looking up you see in the distance a (female) premaster student just walking away. Give a relevant 95% interval for the height of this (female) premaster student.

e. The next day you see another footprint of size 38. Give a relevant 95% interval for the average height of all (female) premaster students with shoe size 38.
f. Calculate a 90% confidence interval for the constant in the regression model.

Answer Q4: a. Theoretical model: Yi = β0 + β1xi + εi, with εi ~ N(0, σ²); estimated model: Y = b0 + b1x = … X; b. reject H0; c. not very useful; d. […, …]; e. […, …]; f. [24.467, 78.305].

Q5
A consumer products company wants to measure the effectiveness of different types of advertising media in the promotion of its products. Specifically, the company is interested in the effectiveness of radio advertising and newspaper advertising (including the cost of discount coupons). A sample of 22 cities with approximately equal populations is selected for study during a test period of one month. Each city is allocated a specific expenditure level both for radio advertising and for newspaper advertising. The sales of the product (in thousands of dollars) and also the levels of media expenditure (in thousands of dollars) during the test month are recorded, with the following results:

SPSS results:

a. State the multiple regression equation (description of the model including assumptions and the estimated model).
b. Interpret the meaning of the slopes, b1 and b2, in this problem.
c. Interpret the meaning of the regression coefficient, b0.
d. Which type of advertising is more effective? Explain.
e. Determine whether there is a significant relationship between sales and the two independent variables (radio advertising and newspaper advertising) at the 0.05 level of significance.
f. Interpret the meaning of the p-value.
g. Compute the coefficient of multiple determination, R², and interpret its meaning.
h. Find the adjusted R² and interpret its meaning.
i. Is there evidence that the slope coefficient for radio advertisements is more than 10 at α = 0.05?

Answer Q5: a. Y = β0 + β1x1 + β2x2 + ε with ε ~ N(0, σ²) and Y = b0 + b1x1 + b2x2 = … X1 … X2; d. newspaper advertising is more effective; e. there is evidence of a significant linear relationship; i. there is evidence that the coefficient for radio advertisements is larger than 10.

Old exam questions

Q1 22 May 2017, Q1i-1k
(i) We wish to establish a demand function Q = a bp, using measurement data in 6 different years. Data are below; some part has been erased.

Write down the estimated demand function.
(j) See (i). Give the upper bound of the 95% confidence interval for the slope coefficient for price. (2 decimals)
(k) See (i). Give the value of the usual test statistic for testing the overall significance of the model. (1 decimal)
(l) See (i). Give the value of R². (2 decimals)

Answer Q1: i. Q = … P; j. …; k. … or 12.8; l. …

Q2 29 March 2017, Q3
The effect of alcohol and drugs on learning achievements is a subject of intense research. A group of test subjects is asked to do a test exam, with a score between 1 (low) and 10 (high). Researchers want to find out how this relates to their use of alcohol and drugs in the week before the test was taken. For instance, student #2 reported that he used 13 alcoholic beverages and 4 times drugs in the week before the test. Results are shown below (take care: these tables use a decimal point). Some parts of the output have been suppressed. In all questions, define all symbols, except when you use standard symbols (such as R²).

(a) State the theoretical model analyzed, as well as its practical relevance (numerical) and statistical significance (numerical and the null hypothesis). Interpret practical relevance in a few words. Define all non-standard symbols.
(b) Find a 95% confidence interval for the slope coefficient for alcohol.
(c) Before taking the exam, Bob wants to relax by taking either an alcoholic beverage or a drug. Given that he likes to obtain a high grade, what can he best take: alcohol or drugs? Explain why.
(d) Test, at α = 5%, if the hypothesis that the slope coefficient for drugs is equal to 0.2 can be rejected. Use the five-step procedure.

Answer Q2:
(a) Theoretical model: Y = β0 + β1x1 + β2x2 + ε, where X1 is the use of alcohol and X2 is the use of drugs. Statistical significance: the p-value of this model is 0.001, with H0: β1 = β2 = 0. Practical relevance: R² = SSR/SST = … This means that 53% of the variance in the exam score is explained by alcohol and drugs use.
(b) CIβ1;0.95 = […, …].
(c) Bob can best take a drug, not alcohol.
(d) Do not reject H0: there is no evidence that the slope coefficient for drugs differs from 0.2.

Q3 29 March 2017, Q2c
A study focuses on the speed of typing WhatsThat messages, split by age group. We ask random persons in three age groups to type a standard message, and observe the time required (seconds). Sample size, mean of the typing time and standard deviation of the typing time are reported below. What is the name of the test we use to find out if the differences in mean typing times of the three age groups are significant, and what is the critical value of the usual test statistic at α = 0.05? (4 points)

Answer Q3: Analysis of variance, the critical value is …. Or: Kruskal-Wallis, the critical value is ….
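For the Kruskal-Wallis alternative in Q3, the test statistic for three groups is compared with a chi-square critical value with 3 − 1 = 2 degrees of freedom. The following is a minimal sketch that uses the closed form available for 2 degrees of freedom, where the upper-tail probability is exp(−x/2). The function name is illustrative. The ANOVA critical value is not computed, because it depends on the sample sizes, which are not legible in this transcription.

```python
import math

def chi2_critical_df2(alpha):
    """Upper-tail critical value of the chi-square distribution with 2 df.

    For 2 degrees of freedom P(X > x) = exp(-x / 2), so x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

groups = 3
df = groups - 1
critical_value = chi2_critical_df2(0.05)

print(f"df = {df}, chi-square critical value at alpha = 0.05: {critical_value:.3f}")
```

For other numbers of groups the closed form does not apply, and the critical value would be read from a chi-square table.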