Intro to Statistics for the Social Sciences
Fall, 2017, Dr. Suzanne Delaney
Extra Credit Assignment

Name:
Lab Session:
CID Number:

Instructions: You have been hired as a statistical consultant by Donald, a used car dealer, to help him understand his business better. You will complete six types of analyses for Donald: confidence intervals, t-test, one-way ANOVA, correlation, simple regression, and multiple regression. You will then present those findings in a memorandum that summarizes them in a way that Donald would understand. Hand in this worksheet and your perfect and professional-looking memorandum to receive credit for this assignment. This worksheet will help you to complete the assignment. To receive credit, the memo must include each of these analyses along with the appropriate graphs (with your name in the title of each graph). This assignment will be graded and, if perfect, can earn up to 10 points toward your final grade. This is equivalent to raising a test score by one full grade.

The database can be found on our class website: Donald's used car data

Finding Descriptive Statistics

First we will focus on the mileage (miles on odometer) data. Use the Data Analysis package in Excel to find the Descriptive Statistics for Mileage.

Step 1: Open database
Step 2: Open Data Analysis menu
Step 3: Choose Descriptive Statistics
  Be sure to click the box for Summary statistics

The average mileage =
The standard deviation for mileage =
The best (highest, or max) mileage =
The worst (lowest, or min) mileage =
The number of cars (count) =
The standard error of mean for mileage =

Calculating Confidence Intervals

Find the standard error of the mean by dividing the standard deviation by the square root of the number of cars:
Show your work here = _ =
Did you find the same value as what is listed for Standard Error?
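
If you would like to double-check the standard error outside of Excel, the division described above can be sketched in a few lines of Python. This is only a sketch: the mileage values are made-up placeholders (not Donald's data), and the function name `standard_error` is an illustrative choice. It assumes the sample standard deviation (dividing by n - 1), which is what Excel's Descriptive Statistics tool reports.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    sd = statistics.stdev(values)      # sample SD (n - 1 in the denominator)
    return sd / math.sqrt(len(values))

# Hypothetical odometer readings in miles; replace with the real Mileage column.
mileage = [10000, 20000, 30000, 40000, 50000]

print("count =", len(mileage))
print("standard deviation =", round(statistics.stdev(mileage), 2))
print("standard error of mean =", round(standard_error(mileage), 2))
```

If the number printed for the standard error differs from Excel's, the most likely cause is a different selection of rows rather than a different formula.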
Help Donald find the confidence intervals for car mileage. Remember, a confidence interval allows you to guess the mean of the population from the mean of a sample. By guessing a range we are more likely to be correct in our guess (even though our guess is a range rather than just a specific number).

For this problem you'll need the following descriptive statistics from above:
The average mileage =
The standard error of mean for mileage =
Critical z for 95% Confidence interval = _ (see table to right)
Critical z for 99% Confidence interval = _ (see table to right)

Step 3: Find the scores that border the middle 95%
  $x = \bar{x} \pm z\,\sigma_{\bar{x}}$
  Also written as: mean ± (z score)(standard error of the mean)
Step 4: Find the scores that border the middle 99%
  $x = \bar{x} \pm z\,\sigma_{\bar{x}}$
  Also written as: mean ± (z score)(standard error of the mean)

95% Confidence Interval lower boundary raw score is _
95% Confidence Interval upper boundary raw score is _
(Please input values into drawing on the right.)

99% Confidence Interval lower boundary raw score is _
99% Confidence Interval upper boundary raw score is _
(Please input values into drawing on the right.)
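
The boundary calculation (mean plus or minus the critical z times the standard error) can be sketched in Python as a self-check. The mean and standard error below are placeholder numbers, not Donald's values, and `confidence_interval` is an illustrative name. The critical z is looked up from the standard normal distribution rather than read from a table.

```python
from statistics import NormalDist

def confidence_interval(mean, sem, level):
    """Return (lower, upper) boundaries for a two-sided z-based interval.

    level is the confidence level as a proportion, e.g. 0.95 or 0.99.
    """
    # Two-sided critical z: the middle `level` of the curve leaves
    # (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem

# Hypothetical values; use the average and standard error from your own output.
mean_mileage = 20000.0
sem_mileage = 500.0

for level in (0.95, 0.99):
    low, high = confidence_interval(mean_mileage, sem_mileage, level)
    print(f"{level:.0%} interval: {low:.1f} to {high:.1f}")
```

Notice that the 99% interval comes out wider than the 95% interval: to be more confident, the guess has to cover a bigger range.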

Creating a Histogram

Donald wants to see the actual curve for his data (mileage), so let's create it and print it out for him. (Be sure to include the histogram, properly labeled including your name, in your memo.) Again we will use the mileage (miles on the odometer) data.

Step 1: Open database
  Choose tab on bottom called: Mileage and Bins
Step 2: Open Data Analysis menu
Step 3: Choose Histogram
  For the Input Range select all of the data for Mileage
  For the Bin Range select the data for Mileage Bin
  Be sure to click the box for Chart Output
Step 4: Clean up your Histogram chart
  Delete the label that reads Frequency (click and delete)
  Delete the last row in the table (More, 0)
  Select and then right-click on the histogram bars
  Choose Format Data Series and set Gap Width to zero
Step 5: Adjust the labels and print graph
  It should look like this:
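
To see what the Histogram tool is counting, here is a minimal Python sketch of the binning. It assumes each bin value is an upper limit (a car is counted in the first bin whose value is greater than or equal to its mileage) and that anything above the last bin lands in the "More" row. The mileage and bin values are placeholders, and `histogram_counts` is an illustrative name.

```python
from bisect import bisect_left

def histogram_counts(values, bins):
    """Count values per bin, treating each bin value as an inclusive upper limit.

    Returns one count per bin plus a final count for values above the
    largest bin (the "More" row).
    """
    edges = sorted(bins)
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_left finds the first edge that is >= v.
        counts[bisect_left(edges, v)] += 1
    return counts

# Hypothetical data; replace with the Mileage and Mileage Bin columns.
mileage = [5000, 10000, 10001, 25000, 40000]
mileage_bins = [10000, 20000, 30000]

counts = histogram_counts(mileage, mileage_bins)
for label, count in zip(mileage_bins + ["More"], counts):
    print(label, count)
```

When the largest bin is at least as big as the highest mileage, the "More" count is zero, which is why Step 4 has you delete that row.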

Completing a t-test hypothesis test

Donald wants to know whether car price is affected by the number of doors a car has. So he compares car price with number of doors (there are only two choices: 2-door and 4-door), which means there are only two levels of the independent variable. So, you decide to complete a t-test with an alpha of _ {tea for two and two for tea}. (Be sure to include the bar graph showing both means, properly labeled including your name, in your memo.)

Independent variable (IV):
Number of levels of IV (what are they?):
Quasi or True experiment:
Dependent variable:
Level of measurement of DV:
Between or within participant design:
One or Two-tailed test:

Step 1: Open database
  Choose tab on bottom called: t-test doors & price
Step 2: Open Data Analysis menu
  Choose t-Test: Two-Sample Assuming Equal Variances
  Be sure the data are sorted by Doors so that all of the 2-door cars are listed before all of the 4-door cars
  (Careful, this next bit can be tricky: be sure that you select data from column A mileage because that is our DV)
  For the Variable 1 Range select the data for Mileage but just for those cars with 2 doors (should be about 190 cars)
  For the Variable 2 Range select the data for Mileage but just for those cars with 4 doors (should be about 614 cars)
  (Careful to notice that you are entering 2-door first, then 4-door, so 2-door will appear first on the Excel output)
Step 3: Interpret output
  Average price for 2-door cars & 4-door cars: _
  State the alpha level:
  Value for the observed t-statistic (called t Stat):
  Value for the critical t-statistic:
  Value for the degrees of freedom:
  What is the p value:
  Was it a significant difference: yes no
  Should he reject the null hypothesis? yes no
  Should he report the p < 0.05? yes no
  Report finding in proper form:
Step 4: Let's draw a bar graph of the two means.
  Should look like this:
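
For reference, the observed t and the degrees of freedom in an equal-variance two-sample test can be reproduced by hand. The Python sketch below shows one way to do it, assuming two independent groups of prices; the prices are made-up placeholders and `pooled_t` is an illustrative name. The critical t and the p value are not computed here, so read those from the Excel output or a t table.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Observed t and degrees of freedom for a two-sample test assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    df = n1 + n2 - 2
    # Pooled variance: each sample variance weighted by its own degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical prices in dollars; group 1 is entered first, like Variable 1 in Excel.
two_door_prices = [21000, 23500, 26000, 24000, 22500]
four_door_prices = [19500, 21000, 20500, 22000, 18000, 20000]

t_stat, df = pooled_t(two_door_prices, four_door_prices)
print(f"t({df}) = {t_stat:.3f}")
```

The sign of t depends on which group is entered first, which is why the worksheet asks you to keep 2-door as Variable 1.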

Completing an ANOVA hypothesis test

Donald wants to know whether car price is affected by the size of engine (4, 6, versus 8 cylinders). So he compares car price with size of engine (there are three levels of the independent variable: 4, 6, versus 8 cylinders). So, you decide to complete an ANOVA with an alpha of _. (Be sure to include the bar graph showing all three means, properly labeled including your name, in your memo.)

Independent variable (IV):
Number of levels of IV (what are they?):
Quasi or True experiment:
Dependent variable:
Level of measurement of DV:
Between or within participant design:

Step 1: Open database
  Choose tab on bottom called: Engine size and price
Step 2: Open Data Analysis menu
Step 3: Choose ANOVA: Single Factor
  We have three columns, one for each level of the independent variable. The data have been rearranged so that Excel can complete the ANOVA.
  For Input Range select all three columns. Some cells will be blank because we have a different number of cars in each category; that's okay.
  Remember to choose labels, and click the appropriate box
Step 4: Interpret output
  Average price for 4, 6 and 8-cylinder cars:
  State the alpha level:
  Value for the observed F-statistic:
  Value for the critical F-statistic:
  Degrees of freedom between and within:
  What is the p value:
  Was it a significant difference: yes no
  Should he reject the null hypothesis? yes no
  Should he report the p < 0.05? yes no
  Report finding in proper form:
Step 5: Let's draw a bar graph of the three means.
  Should look like this:
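
The observed F and the two degrees of freedom in a single-factor ANOVA come from comparing the variability between the group means with the variability within the groups. A minimal Python sketch of that arithmetic follows. The prices are placeholders, `one_way_anova` is an illustrative name, and the groups may have different numbers of cars. The critical F and p value are left to the Excel output.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: each score around its own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical prices in dollars for 4-, 6-, and 8-cylinder cars.
four_cyl = [17000, 18500, 16000, 17500]
six_cyl = [20000, 21500, 19000, 22000, 20500]
eight_cyl = [38000, 41000, 36500]

f_stat, df_b, df_w = one_way_anova(four_cyl, six_cyl, eight_cyl)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```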

Completing a Correlation

Donald wants to know whether car price is related to mileage. Both of these variables are numeric and he is looking for a relationship. (Be sure to include the scatterplot, properly labeled including your name, in your memo.)

Step 1: Open database
  Choose tab on bottom called: Mileage&Price
Step 2: Create a scatter plot and label properly
  Should look like this:
Step 3: Open Data Analysis and choose Correlation
Step 4: Interpret Correlation
  Value of observed r:
  Degrees of freedom:
  Critical r:
  Was it a significant difference: yes no
  Should he reject the null hypothesis? yes no
  Should he report the p < 0.05? yes no
  Construct a summary using proper formatting _
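
The observed r that Excel's Correlation tool reports is the Pearson correlation, and its degrees of freedom are the number of pairs minus two. As a self-check, the calculation can be sketched in Python like this. The mileage and price pairs are placeholders and `pearson_r` is an illustrative name; the critical r still comes from a table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (mileage, price) pairs; replace with the Mileage&Price columns.
mileage = [10000, 20000, 30000, 40000, 50000]
price = [21500, 19000, 18500, 15500, 14500]

r = pearson_r(mileage, price)
df = len(mileage) - 2   # degrees of freedom for a correlation: pairs minus 2
print(f"r({df}) = {r:.3f}")
```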

Completing a Simple regression

Donald wants to know whether he can predict price better if he knows how many miles the car has on it. (Be sure to include the scatterplot, properly labeled and including the regression equation and your name, in your memo.)

Step 1: Create a scatterplot using the same data as you did when completing the correlation
Step 2: Find the regression line
  Highlight the data points by clicking on one of the dots, then right-click to get the Add Trendline option and choose it.
  Also be sure to click on the Display Equation on Chart option.
  Also be sure to click on the Display R-squared value on chart option.
  Clean up the font so that it looks like this:
Step 3: Interpret regression
  Degrees of freedom:
  Value of correlation coefficient (r):
  Value of regression coefficient (b):
  Value of y intercept (a):
  Cars with more miles on the engine would tend to have _ price. (Higher or lower?)
  Cars with few miles on the engine would tend to have _ price. (Higher or lower?)
  What is the regression equation? Y =
  Interpret slope: for each additional mile, what would we predict to happen to price?
  Interpret y intercept:
  If a car had 30,000 miles, what would Donald predict the price to be?
  What is the r² for this problem?
  Please interpret the r²: _ (Hint: The proportion of total variance of the price of a car...)
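
The trendline equation and R-squared that Excel displays come from an ordinary least-squares fit. A minimal Python sketch of that fit, including a prediction at 30,000 miles, is shown here. The data are placeholders (so the numbers will not match Donald's), and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares line Y = a + b*X; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                       # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x             # intercept: predicted Y when X is 0
    r_squared = sxy ** 2 / (sxx * syy)  # proportion of variance in Y accounted for by X
    return a, b, r_squared

# Hypothetical (mileage, price) pairs; replace with the real columns.
mileage = [10000, 20000, 30000, 40000, 50000]
price = [21500, 19000, 18500, 15500, 14500]

a, b, r2 = fit_line(mileage, price)
print(f"Y = {a:.2f} + ({b:.4f})(mileage), r squared = {r2:.3f}")
print(f"Predicted price at 30,000 miles: {a + b * 30000:.2f}")
```

A negative slope means each additional mile lowers the predicted price by that many dollars.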

Completing a Multiple regression

Donald wants to know whether he can predict the price of the cars better if he knows both how many miles the car has on it and how big the engine is. (This analysis will not include a graph, so be sure to include the Excel output, including your name, in your memo.)

Step 1: Identify the predicted variable (DV):
  Identify the two predictor variables (IVs): & _
Step 2: Open database
  Choose tab on bottom called: Price, mileage and car size
Step 3: Create a correlation matrix
  Open Data Analysis menu and choose Correlation
  We have three columns; select all three for the Input Range
Step 4: Open Data Analysis menu and choose Regression
  We have three columns. Price is first and is our predicted Y variable (choose this column for Input Y Range)
  Mileage and cylinder are next and are our two X variables (choose both columns together for Input X Range)
Step 5: Interpret regression
  What was your regression coefficient for Intercept ; Is the p < 0.05?
  What was your regression coefficient for Mileage ; Is the p < 0.05?
  What was your regression coefficient for Cylinder ; Is the p < 0.05? _
  What is your regression equation _
  Y = a + b1x1 + b2x2 or Y = a + b1 (mileage) + b2 (car size)

Interpreting slopes:
For each additional mile that the car is driven (as X goes up by 1), the predicted price of the car (Y) will decrease. If we increase mileage by 1 full point and hold the other independent variable constant, we can estimate a decrease of _ in price.
For each increase in engine size (from 4 to 6, or 6 to 8 cylinders), the predicted price of the car (Y) will increase. If we increase engine size by 1 full point (so X goes up by 1) and hold the other independent variable constant, we can estimate an increase of _ in price.

Your output should look like this:
d17f_sbs200_extracredit_summary_prototypical_designs
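
To see where the intercept and the two slopes in Y = a + b1x1 + b2x2 come from, the least-squares solution for exactly two predictors can be written out directly. The Python sketch below does that with placeholder data (generated from a known equation so the result is easy to check); `fit_two_predictors` is an illustrative name. It does not compute the p values, so those still come from the Excel Regression output.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of Y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2   # zero if the two predictors are perfectly correlated
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Hypothetical data built from price = 10000 - 0.1*mileage + 2000*cylinders.
mileage = [10000, 20000, 30000, 40000, 50000, 15000]
cylinders = [4, 6, 4, 8, 6, 8]
price = [10000 - 0.1 * m + 2000 * c for m, c in zip(mileage, cylinders)]

a, b1, b2 = fit_two_predictors(mileage, cylinders, price)
print(f"Y = {a:.2f} + ({b1:.4f})(mileage) + ({b2:.2f})(cylinders)")
```

Each slope is read while holding the other predictor constant, which matches the wording used for interpreting slopes in this worksheet.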