Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT


STAT 512 EXAM I

Name (7 pts)

Problem    Points    Score

USE YOUR TIME WISELY.
SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT.
WRITE LEGIBLY. ANYTHING UNREADABLE WILL NOT BE GRADED.
GOOD LUCK!

1. A manufacturing company is interested in the relationship between the number of orders processed per month (ORDERS) and the total work hours (HOURS1) needed to handle them. Use the attached SAS output at the end of the exam and the scatterplot below to answer the following questions.

(a) (4 pts) What is the response variable and what is the explanatory variable in this analysis?

(b) (6 pts) Write down the least squares regression line and describe the relationship between ORDERS and HOURS1 in terms that the CEO of this company (she has taken only an introductory statistics course) would understand.

(c) (5 pts) One of the cases used to fit this model was (3000, 8065). What is the residual for this case?

(d) (5 pts) Suppose the CEO immediately dismisses the results of this analysis, saying that with zero orders, the model should predict zero hours. Describe how you would respond to the CEO's concern.

(e) (5 pts) Construct a 95% confidence interval for the slope and interpret, in words, what this confidence interval tells you.

(f) (6 pts) The CEO thinks that each 6 additional orders should result in an average increase of 18 additional work hours. Express this statement in terms of a hypothesis test (state the null and alternative hypotheses). Does this seem reasonable given the results of this study at the α = 0.05 level? Show your work.
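The arithmetic behind parts (c), (e), and (f) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, every numeric value is a made-up placeholder rather than a number from the SAS output, and the t critical value is assumed to be read from a t table with n - 2 degrees of freedom. In part (f), 18 hours per 6 orders corresponds to a slope of 3 hours per order.

```python
def residual(y_obs, b0, b1, x):
    """Observed response minus the fitted value b0 + b1 * x."""
    return y_obs - (b0 + b1 * x)


def slope_ci(b1, se_b1, t_crit):
    """Two-sided confidence interval for the slope: b1 +/- t_crit * se(b1)."""
    half_width = t_crit * se_b1
    return (b1 - half_width, b1 + half_width)


def slope_t_stat(b1, se_b1, beta1_null):
    """t statistic for testing H0: beta1 = beta1_null."""
    return (b1 - beta1_null) / se_b1


# Placeholder estimates (assumptions for illustration, NOT the exam's output).
b0, b1, se_b1 = 500.0, 2.5, 0.2
t_crit = 2.101  # 97.5th percentile of t with 18 df, assuming n = 20 cases

# Part (c): residual for the case (3000, 8065) under the placeholder fit.
print(residual(8065, b0, b1, 3000))  # 8065 - (500 + 2.5 * 3000) = 65.0

# Part (e): 95% confidence interval for the slope.
lo, hi = slope_ci(b1, se_b1, t_crit)
print(round(lo, 4), round(hi, 4))

# Part (f): H0: beta1 = 3 versus Ha: beta1 != 3; reject when |t| > t_crit.
t = slope_t_stat(b1, se_b1, 3.0)
print(abs(t) > t_crit)
```

Equivalently, the two-sided test in part (f) rejects at α = 0.05 exactly when 3 falls outside the 95% confidence interval from part (e).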

(g) (6 pts) The company expects 2800 orders next month. This happens to be the same ORDERS value as observation #4 in the output. Use the information provided to construct an appropriate 95% interval for the number of work hours next month.

(h) (3 pts) What is the estimate of the Pearson correlation coefficient between ORDERS and HOURS1? Show your work.
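For parts (g) and (h), the sketch below shows one way to combine the quantities that PROC REG reports. It assumes the interval wanted in (g) is a prediction interval for a single new month, so the standard error combines "Std Error Mean Predict" with "Root MSE". The function names are hypothetical and all numbers are placeholders, not values from the exam output.

```python
import math


def prediction_interval(y_hat, se_mean, root_mse, t_crit):
    """Interval for one new observation at an x value already in the data.

    The standard error for a new observation adds the error variance
    (root_mse ** 2) to the variance of the estimated mean (se_mean ** 2).
    """
    se_pred = math.sqrt(root_mse ** 2 + se_mean ** 2)
    return (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)


def pearson_r(r_square, slope):
    """In simple linear regression, r = sign(slope) * sqrt(R-Square)."""
    return math.copysign(math.sqrt(r_square), slope)


# Placeholder values (assumptions for illustration only).
y_hat, se_mean, root_mse = 7500.0, 30.0, 40.0
t_crit = 2.101  # 97.5th percentile of t with 18 df, assuming n = 20 cases

lower, upper = prediction_interval(y_hat, se_mean, root_mse, t_crit)
print(round(lower, 2), round(upper, 2))

print(round(pearson_r(0.81, 2.5), 4))
```

A confidence interval for the mean response at 2800 orders would use `se_mean` alone and would therefore be narrower.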

2. Short answer questions. Each part is unrelated.

(a) (6 pts) Explain when and why multicollinearity among explanatory variables is a concern.

(b) (9 pts) A new graduate student at Purdue comes to you because he has concerns about his analysis. Specifically, his response variable does not look Normally distributed. He says a friend told him to not worry about the Normality of the response variable because the linear regression procedure is robust, but he does not understand what this means.

i. Describe to the graduate student what is meant by a procedure being robust.

ii. Comment on his concern about the Normality of the response variable and what you would do to check the assumptions underlying his analysis. Feel free to suggest some graphical or numerical summaries for the graduate student to check.
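One common numerical summary related to part (a) is the variance inflation factor, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below covers only the special case of two predictors, where R_j² is the squared correlation between them. The function names and the data are invented for illustration.

```python
import math


def pearson_corr(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_corr(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictor values (not from the exam).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

print(round(pearson_corr(x1, x2), 4))
print(round(vif_two_predictors(x1, x2), 4))
```

The VIF grows without bound as the correlation between the predictors approaches 1 in absolute value, which is when the standard errors of the individual slope estimates become badly inflated.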

(c) (10 pts) For each of the following simple linear regression scenarios, draw a representative scatterplot (you can choose the number of cases to put in the figures).

i. SS(Model) = SS(Total)

ii. SS(Error) = SS(Total)

iii. F test is significant but R

iv. Example where weighted least squares would be appropriate to use.
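Scenarios i and ii can be checked numerically with the decomposition SS(Total) = SS(Model) + SS(Error). The sketch below fits a least squares line by hand and returns the three sums of squares. The function name and both small data sets are made up for illustration.

```python
def sums_of_squares(x, y):
    """Return (SS_model, SS_error, SS_total) for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # least squares slope
    b0 = y_bar - b1 * x_bar   # least squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ss_model = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return ss_model, ss_error, ss_total


x = [1, 2, 3, 4, 5]

# Scenario i: every point lies exactly on a sloped line, so SS(Error) = 0.
print(sums_of_squares(x, [3, 5, 7, 9, 11]))

# Scenario ii: the fitted slope is exactly zero, so SS(Model) = 0.
print(sums_of_squares(x, [1, 3, 2, 3, 1]))
```

In the first data set all of the variation in y is explained by the line; in the second the fitted line is horizontal at the mean of y, so none of it is.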

3. A set of movies released in 2008 was randomly drawn from the Internet Movie Database (IMDb) to see if information available soon after a movie's theatrical release can successfully predict total U.S. revenue (USRevenue). Information collected includes

i. the movie's budget (Budget);

ii. opening weekend revenue (Opening);

iii. the number of theaters (Theaters) the movie was in for the opening weekend;

iv. the movie's IMDb rating at the end of the first week (Opinion), which is on a 1 to 10 scale (10 being best).

All dollar amounts are measured in millions of U.S. dollars. Use the attached SAS output to answer the following questions.

(a) (10 pts) Use the scatterplot matrix to describe the pairwise relationships of each explanatory variable and the response variable. Which predictor appears to be the best single predictor of total U.S. revenue? Would you possibly consider any transformations of variables before fitting? If so, what would they be and why?

(b) (5 pts) Using the model selection output, write down the fitted regression line for the best model based on adjusted R². Explain what this fitted line tells you about the relationship between the explanatory variables and the revenue.

(c) (4 pts) Get Smart was released in 2008 and had a budget of $80 million, was shown in 3911 theaters grossing $38.7 million during the first weekend, and had an IMDb rating of 6.8. Using the model in part (b), what was its expected U.S. revenue?

(d) (5 pts) Fit diagnostics for one model are shown to you. Are there any issues that concern you? If so, describe what you might do to further investigate these issues.

(e) (4 pts) Suppose you were interested in a formal F test to compare the model with Opening, Opinion, and Budget as explanatory variables to the model with just Opening and Opinion. What is the value of the F statistic and what are its degrees of freedom?
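The comparison in part (e) is a general linear (partial) F test of a full model against a reduced model nested within it. A minimal sketch of the computation follows. The function name is hypothetical, and the error sums of squares and degrees of freedom are placeholders, not values from the exam's SAS output.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """Partial F statistic for nested models.

    sse_* are error sums of squares and df_* are error degrees of freedom.
    Returns (F, numerator df, denominator df).
    """
    num_df = df_reduced - df_full
    f_stat = ((sse_reduced - sse_full) / num_df) / (sse_full / df_full)
    return f_stat, num_df, df_full


# Placeholder values (assumptions for illustration only): the reduced model
# drops one predictor, so its error df is one larger than the full model's.
f_stat, num_df, den_df = partial_f(
    sse_reduced=1200.0, df_reduced=47, sse_full=1000.0, df_full=46
)
print(round(f_stat, 4), num_df, den_df)
```

When the reduced model drops a single predictor, as with Budget here, this F statistic equals the square of that predictor's t statistic in the full model, with 1 numerator degree of freedom.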

t distribution table: area to the left of t(df), by degrees of freedom (df)

RESULTS FOR PROBLEM #1

The REG Procedure

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE          R-Square
Dependent Mean    Adj R-Sq
Coeff Var

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept
orders                                                                <.0001

Output Statistics

Obs    Dep Var hours1    Predicted Value    Std Error Mean Predict