Lecture 8: Introduction to sampling distributions

Statistics 101, Mine Çetinkaya-Rundel, February 9, 2012

Announcements
Due: Quiz 3, Monday morning at 8am.
Office hours change: Monday's office hours are moved to Tuesday after class, 2:30pm to 4:30pm.

Recap: Review question
Which of the following is false?
(a) Z scores can be calculated for observations that don't come from a normal distribution.
(b) Z tables cannot be used to calculate percentiles for observations that don't come from a normal distribution.
(c) The median of a right-skewed distribution has a negative Z score.
(d) The mean of any distribution always marks the 50th percentile.

1. Variability in estimates

pewresearch.org/pubs/2191/young-adults-workers-labor-market-pay-careers-advancement-recession

Margin of error
41% ± 2.9%: We are 95% confident that 38.1% to 43.9% of the public believe young adults, rather than middle-aged or older adults, are having the toughest time in today's economy.
49% ± 4.4%: We are 95% confident that 44.6% to 53.4% of ... year olds have taken a job they didn't want just to pay the bills...
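Each interval above is simply the point estimate minus and plus the margin of error. A minimal R sketch of that arithmetic (the function name `moe_interval` is made up for illustration):

```r
# Turn a point estimate and its margin of error into an interval.
# Inputs are percentages, as reported in the Pew summary.
moe_interval <- function(estimate, margin) {
  c(lower = estimate - margin, upper = estimate + margin)
}

moe_interval(41, 2.9)  # 38.1 to 43.9
moe_interval(49, 4.4)  # 44.6 to 53.4
```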

Parameter estimation
We are often interested in population parameters. Since complete populations are difficult (or impossible) to collect data on, we use point estimates from samples to estimate parameters.
Point estimates vary from sample to sample, and quantifying how they vary gives a way to estimate the margin of error associated with a point estimate.
But before we get to quantifying the variability among samples, let's try to understand how and why point estimates vary from sample to sample.
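The distinction between a parameter and a point estimate can be illustrated with a small R sketch. It assumes a hypothetical population of 203 simulated counts (`rpois()` is only a placeholder, not the class data), treats the population mean as the parameter, and compares it with the means of three random samples of 10.

```r
set.seed(8)  # fixed seed so the run is reproducible

# HYPOTHETICAL population: 203 simulated counts, not the real class data
population <- rpois(203, lambda = 3)

# The parameter: knowable here only because we made up the whole population
mu <- mean(population)

# Three point estimates, each the mean of a random sample of 10 (with replacement)
estimates <- replicate(3, mean(sample(population, size = 10, replace = TRUE)))

mu
estimates
estimates - mu  # the error changes from one sample to the next
```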

Activity: Estimating the number of exclusive relationships
We would like to estimate the average number of exclusive relationships stats students have been in, and we actually have the population data.
[Figure: population data; x-axis: number of exclusive relationships]

Activity (cont.): Sampling scheme
Sample ten students, with replacement, and record the number of exclusive relationships they reported.
If you have your computer with you, use RStudio to generate 10 random row numbers between 1 and 203:
sample(1:203, size = 10, replace = TRUE)
If not, use three 10-sided dice, rolling until you get ten numbers between 1 and 203.
Find the sample mean and record it.
If we randomly select observations from this data set, which values are most likely to be selected, and which are least likely?
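One round of the activity might be sketched in R as follows. The vector `relationships` is a hypothetical stand-in generated with `rpois()`; the actual survey responses are not part of this transcript. `sample()` draws each row number with equal probability, whereas rounding `runif()` output, as in `round(runif(n = 10, min = 1, max = 203))`, would make rows 1 and 203 about half as likely as the others, since only half-width intervals round to them.

```r
set.seed(2012)

# HYPOTHETICAL stand-in for the survey: one count per student, 203 students
relationships <- rpois(203, lambda = 3)

# Step 1: draw 10 row numbers between 1 and 203, with replacement.
# sample() gives every row the same chance of being picked.
rows <- sample(1:203, size = 10, replace = TRUE)

# Step 2: look up the sampled students' answers
sampled_values <- relationships[rows]

# Step 3: the sample mean is the point estimate for this sample
x_bar <- mean(sampled_values)

rows
sampled_values
x_bar
```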

Activity (cont.): Population

Activity (cont.): Example
> sample(1:203, size = 10, replace = TRUE)
Sample mean: ( ... ) / 10 = 4.1

Clicker question
Click the appropriate letter for the mean your sample yielded.
(a) 0 ≤ x̄ ≤ 2
(b) 2 < x̄ ≤ 3
(c) 3 < x̄ ≤ 4
(d) 4 < x̄ ≤ 5
(e) 5 < x̄ < 7
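Sorting sample means into the clicker bins is exactly what R's `cut()` does. In this sketch only 4.1 comes from the example mean above; the other means are invented for illustration.

```r
# Sample means to classify: 4.1 is the slide's example, the rest are invented
sample_means <- c(1.5, 2.5, 4.1, 6.0)

# Breaks 0, 2, 3, 4, 5, 7 give the bins [0, 2], (2, 3], (3, 4], (4, 5], (5, 7].
# include.lowest = TRUE closes the first bin at 0. Note that cut() also
# includes 7 in the last bin, while option (e) excludes it.
choice <- cut(sample_means,
              breaks = c(0, 2, 3, 4, 5, 7),
              labels = c("a", "b", "c", "d", "e"),
              include.lowest = TRUE)

choice
table(choice)  # how many students would click each letter
```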

Sampling distribution
What you just constructed is called a sampling distribution.
What are the shape and center of this distribution?
Based on this distribution, what do you think the true population average is?
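Repeating the sampling step many times and keeping each sample mean builds a simulated sampling distribution, much like the class's clicker tally. A sketch, again assuming a hypothetical `rpois()` population rather than the class data:

```r
set.seed(101)

# HYPOTHETICAL population of 203 counts (placeholder, not the class data)
population <- rpois(203, lambda = 3)

# Each replicate repeats the activity: draw 10 values with replacement,
# keep only their mean.
n_reps <- 5000
sample_means <- replicate(
  n_reps,
  mean(sample(population, size = 10, replace = TRUE))
)

# Center: the average of the sample means is close to the population mean
mean(population)
mean(sample_means)

# Shape and spread: five-number summary plus mean
summary(sample_means)
```

In RStudio, `hist(sample_means)` draws the corresponding histogram, so its shape can be compared with the class's results.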

Number of Duke games attended
Population:
[Figure: histogram; x-axis: number of Duke games attended, y-axis: frequency]

Number of Duke games attended
Sampling distribution, n = 10:
[Figure: histogram; x-axis: sample means from samples of n = 10, y-axis: frequency]

Number of Duke games attended
Sampling distribution, n = 30:
[Figure: histogram; x-axis: sample means from samples of n = 30, y-axis: frequency]

Number of Duke games attended
Sampling distribution, n = 70:
[Figure: histogram; x-axis: sample means from samples of n = 70, y-axis: frequency]
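The three sampling distributions differ mainly in their spread. The sketch below assumes a hypothetical right-skewed population (`rgeom()`, not the real Duke games data) and sampling with replacement. For n = 10, 30, and 70, it compares the simulated standard deviation of the sample means with the theoretical value σ/√n.

```r
set.seed(9)

# HYPOTHETICAL right-skewed population standing in for "Duke games attended";
# the real survey values are not in this transcript.
population <- rgeom(203, prob = 0.15)

# Simulated sampling distribution of the mean for samples of size n
sim_means <- function(n, reps = 5000) {
  replicate(reps, mean(sample(population, size = n, replace = TRUE)))
}

sizes <- c(10, 30, 70)

# Simulated standard error: SD of the simulated sample means
se_sim <- sapply(sizes, function(n) sd(sim_means(n)))

# Theoretical standard error for sampling with replacement: sigma / sqrt(n),
# where sigma is the population SD computed with divisor N
sigma <- sqrt(mean((population - mean(population))^2))
se_theory <- sigma / sqrt(sizes)

data.frame(n = sizes, se_sim = se_sim, se_theory = se_theory)
```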

Number of Duke games attended
Clicker question
The mean of the sampling distribution is 5.75, and the standard deviation of the sampling distribution (also called the standard error) is ... . Which of the following is the most reasonable guess for the 95% confidence interval for the true average number of Duke games attended by stats students?
(a) 5.75 ± 0.75
(b) 5.75 ± ...
(c) 5.75 ± ...
(d) cannot tell from the information given
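A common rule of thumb, assuming the sampling distribution is roughly normal, is that a 95% interval extends about two standard errors on either side of the center (more precisely 1.96, from `qnorm(0.975)`). The slide's SE value is missing from the transcript, so in the sketch below `se = 1` is a hypothetical placeholder and the function name `approx_ci_95` is made up.

```r
# Approximate 95% interval: center ± z* × SE, with z* = qnorm(0.975), about 1.96.
# Assumes the sampling distribution is roughly normal.
approx_ci_95 <- function(center, se) {
  z_star <- qnorm(0.975)
  c(lower = center - z_star * se, upper = center + z_star * se)
}

# 5.75 is the slide's center; se = 1 is a HYPOTHETICAL placeholder because
# the slide's SE value is missing from the transcript.
approx_ci_95(5.75, se = 1)
```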