Only check assumptions when I explicitly ask you to.

Name:
Check one: Section 1 (Tues-Thurs. 10:30-noon)    Section 2 (Tues-Thurs. 1:30-3:00)

Statistics 102
Midterm #2
November 15, pm

This exam is closed book. You may have two pages of notes. You may use a calculator. You must write the exam using pen (not pencil). Show all your work. Statistical tables are attached. There are blank pages at the end of the exam if you need more room.

Question   Total Points   Points Received
1          10
2          11
3          12
4          8
5          18
6          6
Total      65

Only check assumptions when I explicitly ask you to.

15 pages (including 2 blank pages at the end in case you need extra space for answers) + 5 pages of statistical tables.

1) The following is a list of some of the statistical methods that you have learned about since midterm #1:
- χ² test for independence
- χ² goodness-of-fit test
- One-way ANOVA
- Two-way ANOVA
- Randomized Block Design ANOVA
- Simple linear regression

For each of the situations described below, state the technique (from the list above) that you believe is appropriate. If none are appropriate, state "none of the above." No calculations are required.

a) [2 points] In a study to determine whether prayer helps coronary-care patients recover (Philadelphia Inquirer, Jan. 7, 2001 p.j5), 45 patients were randomly assigned to receive standard treatment, standard treatment + herbal supplements, or standard treatment + prayer (where by prayer we mean that other people pray for the patients without their knowledge). Fifteen patients were assigned to each treatment regimen. Patients not receiving herbal supplements were given placebo pills, so the study was blind, meaning that the patients didn't know which treatment group they were in. Doctors recorded the number of patients with adverse outcomes in each group and want to know whether the rate of adverse outcomes differs across the 3 treatments.

b) [2 points] A snack foods company is interested in improving the shelf life of its tortilla chips. Twenty batches of tortillas (each batch containing 1 pound of tortillas) were made under each of four different formulations (containing different preservatives). The batches were then stored, all under the same conditions, and each batch was checked daily for freshness (tortillas were kept in resealable containers). For each batch, the shelf life (in days) until the product was deemed no longer fresh enough was recorded. The company wants to determine whether there is any difference in average shelf life among the four formulations.

c) [2 points] An owner of a fast-food restaurant is interested in finding out the best strategies for retaining employees. He wonders how much employee turnover is wage-related (e.g., employees quitting to get a better-paying job elsewhere). On the other hand, he thinks that maybe fast-food employees just get tired of fast-food jobs and quit to get a different type of job. In an effort to get some answers, he obtains data from the local association of fast-food operators. These data give, for each of 50 local fast-food restaurants, the average wage and the quit rate (quits per 100 employees). The owner wants to decide whether higher wages are associated with lower quit rates.

d) [2 points] Securities analysts evaluate and compare industry sectors. One of the variables used in this analysis is the variance of the percentage growth in net income for the previous year. An analyst obtains data (say, from Forbes) on the percentage growth in net income for a random sample of firms from the banking and energy sectors of the U.S. economy. The analyst wants to know whether the variability in the growth rates of net income differs between the two sectors.

e) [2 points] The operations manager for an appliance manufacturer wishes to determine the optimal length of time for the washing cycle of a particular household clothes washer model. An experiment is designed to measure the effect of wash-cycle length on the amount of dirt removed from standard household laundry loads. This experiment involves washing several laundry loads at each of four wash-cycle lengths (18, 20, 22, and 24 minutes) and recording the amount of dirt removed (e.g., in micrograms). However, since the brand of detergent used might also affect the amount of dirt removed, it is decided that the wash cycles will be tested using four different detergent brands (A, B, C, and D). So, ultimately, 20 loads are washed under each of the 16 possible length/brand combinations (for a total of 320 loads). The goal is to recommend one overall best wash-cycle length (regardless of the detergent used) for the manufacturer to put on the label of the clothes washer.

2) [11 points] Research published in Accounting, Organizations and Society (vol. 19, 1994) investigated whether the effect of different performance evaluation styles (PES) on the level of job-related tension is affected by trust. Three performance evaluation styles were considered, each related to the way in which accounting information is used for the purpose of evaluation: budget-constrained (BC), profit-conscious (PC), and the nonaccounting style (NA). A questionnaire was administered to 215 managers working in 18 Australian organizations. It measured the performance evaluation style of each manager's superior, the manager's job-related tension, and the manager's level of trust (low, medium, or high) in his or her superior. A partial ANOVA table of the analysis of these data appears below.

Source df SS MS F P-value PES 1.09 Trust PES*Trust 8.86 Error Total

a) Complete the ANOVA table.
b) What can you conclude from this ANOVA table? (List all relevant conclusions.)
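Completing a two-way ANOVA table with interaction follows fixed bookkeeping: each factor's df is its number of levels minus 1, the interaction df is the product of the two factor dfs, the error df is n minus the number of cells, each MS is SS/df, and each F is the effect's MS divided by the error MS. The minimal Python sketch below encodes those rules, assuming three PES levels, three trust levels and n = 215 as stated in the question; the function name and the sums of squares passed in are hypothetical placeholders, not the study's values. P-values are not computed here; they would come from F tables.

```python
def complete_two_way_anova(a, b, n_total, ss_a, ss_b, ss_ab, ss_error):
    """Fill in the df, MS and F columns of a two-way ANOVA table with interaction.

    a, b     : number of levels of factors A and B (e.g. PES and Trust)
    n_total  : total number of observations
    ss_*     : sums of squares for A, B, A*B and Error
    Assumes every one of the a*b cells has data and that the sums of
    squares add up to the total (balanced design or sequential SS).
    """
    df = {
        "A": a - 1,
        "B": b - 1,
        "A*B": (a - 1) * (b - 1),
        "Error": n_total - a * b,
        "Total": n_total - 1,
    }
    ss = {"A": ss_a, "B": ss_b, "A*B": ss_ab, "Error": ss_error}
    ss["Total"] = sum(ss.values())
    ms = {src: ss[src] / df[src] for src in ("A", "B", "A*B", "Error")}
    # Each F ratio compares an effect's mean square with the error mean square.
    f = {src: ms[src] / ms["Error"] for src in ("A", "B", "A*B")}
    return df, ss, ms, f


# Hypothetical sums of squares, chosen only to illustrate the arithmetic.
df, ss, ms, f = complete_two_way_anova(a=3, b=3, n_total=215,
                                       ss_a=4.0, ss_b=2.0, ss_ab=8.0, ss_error=206.0)
print(df)  # {'A': 2, 'B': 2, 'A*B': 4, 'Error': 206, 'Total': 214}
print(f)   # {'A': 2.0, 'B': 1.0, 'A*B': 2.0}
```

Working backward from a partially filled table uses the same identities rearranged, for example SS = MS × df or MS = F × MSE.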

3) [12 points] The owner of a single-family home would like to predict her electricity bill for the coming winter so she can budget accordingly. She has data from the previous 49 months (she kept all her electric bills!). For each month she knows the average temperature (in degrees Fahrenheit) and how many kilowatt-hours of electricity she consumed. She also has long-range weather forecasts of the expected average temperature in each of the coming winter months. To estimate the relationship between average temperature and electricity consumption, she fits the regression model below. (Some results have been purposefully blanked out with ??????.) Questions are continued on the next page.

Bivariate Fit of kilowatt By temp
[Plot: kilowatt (vertical axis) against temp (horizontal axis), with the fitted line]

Linear Fit
kilowatt = … + … temp

Summary of Fit
RSquare
RSquare Adj
Root Mean Square Error
Mean of Response
Observations (or Sum Wgts)

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model                                          ???????
Error                                          Prob > F
C. Total                                       ??????

Parameter Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept                          33.39     <.0001
temp                               ?????     ?????

a) Do the regression diagnostics indicate any problems with this regression model? Comment on both the residual plot and the normal probability plot. If transformations are warranted, list two transformations you would try.

For the remaining questions, ignore any problems with the regression assumptions indicated by the regression diagnostics. That is, go ahead and interpret the regression statistics (for the purposes of this question) even if you think the regression model should be improved.

b) Is average monthly temperature a useful predictor of monthly electricity consumption? Justify your answer.
c) Interpret the slope of the regression line.
d) Predict electricity consumption for a month when the average temperature is 90°F. Why should you not trust this prediction?
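Parts (b) to (d) rest on a few standard simple-regression quantities. The minimal Python sketch below computes them from raw (x, y) pairs; the function names and the five toy data points are hypothetical and are not the homeowner's data. It shows the slope's t Ratio (with n - 2 df), the fact that the model F Ratio equals t² in simple regression, and a range check that flags a prediction made outside the observed x values.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1*x plus the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    mse = sse / (n - 2)              # square of the Root Mean Square Error
    se_b1 = math.sqrt(mse / sxx)     # standard error of the slope
    t_b1 = b1 / se_b1                # t Ratio for the slope, n - 2 df
    return {
        "b0": b0,
        "b1": b1,
        "r2": 1 - sse / sst,
        "t_b1": t_b1,
        "F": t_b1 ** 2,              # in simple regression, F Ratio = t**2
        "x_range": (min(x), max(x)),
    }


def predict(fit, x_new):
    """Point prediction plus a flag saying whether x_new lies inside the fitted x range."""
    lo, hi = fit["x_range"]
    return fit["b0"] + fit["b1"] * x_new, lo <= x_new <= hi


# Toy data (hypothetical, not the homeowner's bills).
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(fit["b0"], 3), round(fit["b1"], 3), round(fit["r2"], 3))  # 2.2 0.6 0.6
y_hat, inside = predict(fit, 10)
print(round(y_hat, 3), inside)  # 8.2 False: x = 10 is outside the observed range 1..5
```

The slope's P-value would be read from a t distribution with n - 2 degrees of freedom (tables or the JMP output).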

4) [8 points] A company that markets videotaped continuing-education programs for the financial industry studied how to increase the number of customers who agree to purchase the programs. The sales representatives wanted to determine which initial approach results in more sales of the program tapes:
(1) mailing the prospective customer a sales-information videotape containing a preview of the programs,
(2) sending a letter containing a web link where the prospective customer can obtain more information, including previews of the programs, or
(3) telephoning prospective customers (with a tape sent subsequently only to those customers who are interested).
They randomly assigned 50 prospective customers to each strategy (i.e., there were a total of 150 prospective customers in this experiment). Data from this study on the percentage of customers who ultimately made a purchase are shown below.

Does the sales approach seem to have an effect on the proportion of tapes purchased? Use α = …. Report the P-value of your test. Check any necessary assumptions. Which approach would you recommend? Show your work.

Sales Approach   Videotape   Letter   Phone Call
Purchased        34%         24%      12%

Use the back of the page if you need more room for your answer.
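A minimal Python sketch of a Pearson chi-square test of independence for this design, assuming the percentages correspond to 17, 12 and 6 purchasers out of 50 customers per approach (34%, 24% and 12% of 50). The function name is hypothetical. The P-value shortcut `math.exp(-stat / 2)` is exact only because this 3 × 2 table has 2 degrees of freedom; other df would need tables or a statistics library.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Counts implied by 34%, 24% and 12% of 50 customers per approach.
# Rows: Videotape, Letter, Phone Call; columns: purchased, did not purchase.
table = [[17, 33], [12, 38], [6, 44]]
stat, df, expected = chi_square_independence(table)
# With df = 2 the chi-square upper tail is exactly exp(-x/2).
p_value = math.exp(-stat / 2)
print(round(stat, 2), df, round(p_value, 3))  # 6.78 2 0.034
print(min(min(row) for row in expected))      # smallest expected count, about 11.67
```

The smallest expected count is the quantity to compare against the usual rule of thumb (every expected count at least 5).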

5) [18 points] The National Science Foundation conducts a large biennial survey of scientists and engineers in the U.S. who have at least a bachelor's degree. Analyses of a portion of the data from the 1997 survey are presented on the following pages. In particular, these analyses look at the salary, education, and job satisfaction of the engineers who were surveyed in 1997. Based on these analyses, answer the following questions (all necessary output is provided; some results have been purposefully blanked out with ????).

When answering questions:
- state the necessary hypotheses,
- quote the test statistic from the JMP output,
- state the distribution of the test statistic,
- quote the P-value from the JMP output (or compute it from the tables if necessary),
- state and interpret the conclusion.
Clearly indicate which output you are using to answer each question (the analyses are labeled).

Note that job satisfaction (JOBSATIS) categories are 1 = very satisfied, 2 = somewhat satisfied, 3 = somewhat dissatisfied, 4 = very dissatisfied. Education degree (DGRDG) represents the highest degree obtained; its levels are 1 = Bachelor's, 2 = Master's, 3 = Doctorate, 4 = Professional. The salary variable is SALARP, and the gender variable is GENDER (M, F).

a) Do men and women have the same distribution across the job satisfaction categories? If not, what is the difference? Check any necessary assumptions.
b) Is salary related to job satisfaction for both men and women? Or is it related only for men? Or only for women?
c) Does education (DGRDG) have the same effect on salary for both men and women?

One more question on page 13.

Display A
Display B

Display C (for women only)
Display D (for men only)

Oneway Analysis of SALARP By JOBSATIS
[Plot: SALARP by JOBSATIS]

Oneway Anova
Summary of Fit
Rsquare
Adj Rsquare
Root Mean Square Error
Mean of Response
Observations (or Sum Wgts)

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
JOBSATIS
Error
C. Total

Means for Oneway Anova
Level   Number   Mean   Std Error   Lower 95%   Upper 95%
Std Error uses a pooled estimate of error variance.

Display E

Display F (men and women analyzed together)
Display G (men and women analyzed together) ????????
Display H ?????

Display I
Display J (both sets of output below)

Y = SALARP

Summary of Fit
RSquare
RSquare Adj
Root Mean Square Error
Mean of Response
Observations (or Sum Wgts)

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio
Model                                          ?????
Error                                          Prob > F
C. Total                                       ??????

Parameter Estimates
Term                 Estimate   Std Error   t Ratio   Prob>|t|
Intercept                                             <.0001
GENDER[F]
DGRDG[1]
DGRDG[2]
DGRDG[3]                                              <.0001
GENDER[F]*DGRDG[1]
GENDER[F]*DGRDG[2]
GENDER[F]*DGRDG[3]

Effect Tests
Source         Nparm   DF   Sum of Squares   F Ratio   Prob > F
GENDER
DGRDG                                                  <.0001
GENDER*DGRDG
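Displays C and D appear to be one-way ANOVAs of salary by job-satisfaction level, run separately for women and for men, and JMP notes that their group standard errors use a pooled estimate of error variance. A minimal Python sketch of those computations from raw observations; the function name and the three small toy groups are hypothetical, and none of the survey data are reproduced here.

```python
import math


def one_way_anova(groups):
    """One-way ANOVA F statistic, df, group means and pooled standard errors."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between, df_within = k - 1, n - k
    mse = ss_within / df_within
    f_ratio = (ss_between / df_between) / mse
    # Pooled standard error of each group mean: sqrt(MSE / n_i)
    pooled_se = [math.sqrt(mse / len(g)) for g in groups]
    return f_ratio, (df_between, df_within), means, pooled_se


# Toy data (hypothetical): three small groups of observations.
f_ratio, dfs, means, se = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_ratio, dfs, means)  # 27.0 (2, 6) [2.0, 5.0, 8.0]
```

The F Ratio is then referred to an F distribution with (k - 1, n - k) degrees of freedom.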

6) [6 points] Men's Fitness magazine rated Philadelphia as the nation's least-fit city (January 2000). Suppose you want to do your own study to see if you can confirm this finding. Unfortunately, you are limited to conducting a survey only in Philadelphia. On the bright side, you have access to information from various national sources, such as the National Institutes of Health (NIH). From these sources you determine that Body Mass Index (BMI), the measure used to determine whether someone is overweight, is 25 on average for adult Americans, with a standard deviation of 7.4. Furthermore, suppose these sources tell you that the BMI of adult Americans follows a normal distribution (with mean 25 and standard deviation 7.4).

Suppose you hire a survey firm to interview 200 Philadelphians. The firm has each respondent measure and report their BMI. These data are reported below in a table showing the number of respondents who were underweight (BMI < 19), healthy (19 ≤ BMI < 25), and overweight (BMI ≥ 25):

Category        Underweight   Healthy          Overweight
BMI range       BMI < 19      19 ≤ BMI < 25    BMI ≥ 25
# respondents

a) Suppose that you only have access to the table of data above (i.e., you don't have access to the raw data). Test, as best you can, the hypothesis that the distribution of BMI for Philadelphians is normal with mean 25 and standard deviation 7.4 (or something as close to this hypothesis as is possible given the data above). Be sure to check the necessary assumptions of your test. State the P-value of your test.
b) If you had access to the raw data (i.e., the list of 200 BMI measurements), what graphical display could you use to check whether the measurements are normally distributed?
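With only category counts available, one approach is a χ² goodness-of-fit test whose category probabilities come from the hypothesized Normal(25, 7.4) distribution. A minimal Python sketch of that setup, using `math.erf` for the normal CDF; the function names are hypothetical, and because the observed counts are not shown in this copy of the table, the sketch computes only the hypothesized probabilities and provides the test statistic as a function.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def bmi_category_probs(mu=25.0, sigma=7.4):
    """Probabilities of the three BMI categories under the national normal model."""
    p_under = normal_cdf(19, mu, sigma)          # BMI < 19
    p_over = 1 - normal_cdf(25, mu, sigma)       # BMI >= 25
    p_healthy = 1 - p_under - p_over             # 19 <= BMI < 25
    return [p_under, p_healthy, p_over]


def chi_square_gof(observed, probs):
    """Pearson goodness-of-fit statistic; no parameters estimated, so df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1, expected


probs = bmi_category_probs()
print([round(p, 4) for p in probs])  # [0.2087, 0.2913, 0.5]
```

To finish the test, pass the three observed counts from the table to `chi_square_gof(observed, probs)`; with three categories and no estimated parameters, df = 2, so the P-value is `math.exp(-stat / 2)`. For n = 200 the expected counts are roughly 41.7, 58.3 and 100, which is what the "every expected count at least 5" check is applied to.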

