Notes on PS2


1 Notes on PS2

Mike Sances (MIT)
April 2, 2012

2 Interpreting Regression: Coefficient

regress success_rate dist
[Stata output: coefficient table and fit statistics; numeric values lost in transcription]

Interpret the coefficient estimate for distance.

3 Interpreting Regression: Coefficient

regress success_rate dist

Interpret the coefficient estimate for distance.
A one-foot increase in distance is associated with a 4.1 percentage point decrease in the success rate.

4 Interpreting Regression: Coefficient

regress success_rate dist

Interpret the coefficient estimate for distance.
A one-foot increase in distance is associated with a 4.1 percentage point decrease in the success rate.
A one-foot increase in distance is associated with a 4.1% decrease in the success rate.

5 Interpreting Regression: Coefficient

regress success_rate dist

Interpret the coefficient estimate for distance.
A one-foot increase in distance is associated with a 4.1 percentage point decrease in the success rate.
A one-foot increase in distance is associated with a 4.1% decrease in the success rate.
No! A 4.1% decrease in the success rate is a relative change: the rate falls by 0.041 × (current rate), not by a flat 4.1 percentage points.

6 Interpreting Regression: Coefficient

regress success_rate dist

Interpret the coefficient estimate for distance.
A one-foot increase in distance is associated with a 4.1 percentage point decrease in the success rate.
A one-foot increase in distance is associated with a 4.1% decrease in the success rate.
No! A 4.1% decrease in the success rate is a relative change: the rate falls by 0.041 × (current rate), not by a flat 4.1 percentage points.
In general: a one-unit increase in X_k is associated with a change of β̂_k in Y.
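The percentage-point vs. percent distinction is easy to check numerically. A minimal Python sketch, using the 0.64 success rate from the 5-foot putt example later in these notes as the baseline:

```python
# Percentage points vs. percent, starting from a 0.64 success rate.
baseline = 0.64

pp_decrease = baseline - 0.041         # 4.1 percentage point (absolute) decrease
pct_decrease = baseline * (1 - 0.041)  # 4.1 percent (relative) decrease

print(round(pp_decrease, 5))   # 0.599
print(round(pct_decrease, 5))  # 0.61376
```

The two answers differ, and the regression coefficient corresponds to the first one: an absolute shift in the rate.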

7 Interpreting Regression: Confidence Interval

regress success_rate dist

Interpret the confidence interval.

8 Interpreting Regression: Confidence Interval

regress success_rate dist

Interpret the confidence interval.
If we were to repeatedly sample from the population and run this regression in each sample, our confidence intervals would contain the true value of β in 95% of those samples.

9 Interpreting Regression: Confidence Interval

regress success_rate dist

Interpret the confidence interval.
If we were to repeatedly sample from the population and run this regression in each sample, our confidence intervals would contain the true value of β in 95% of those samples.
This gives us a measure of how uncertain we are about our estimate.
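The "repeated sampling" statement can be verified by simulation. A sketch in Python; the data-generating values (slope, noise scale) are made up for illustration, not taken from the problem set:

```python
import numpy as np

rng = np.random.default_rng(0)
true_beta = -4.1          # hypothetical true slope, chosen for illustration
n, reps = 100, 2000
covered = 0

for _ in range(reps):
    x = rng.uniform(0, 20, n)
    y = 90 + true_beta * x + rng.normal(0, 9.2, n)
    xc = x - x.mean()
    b = (xc @ y) / (xc @ xc)               # OLS slope estimate
    resid = y - (y.mean() + b * xc)        # residuals from the fitted line
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))  # slope std. error
    if b - 1.96 * se <= true_beta <= b + 1.96 * se:
        covered += 1

print(covered / reps)  # close to 0.95
```

Across many random samples, roughly 95% of the intervals b ± 1.96·SE cover the true β, which is exactly what the interpretation above claims.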

10 Interpreting Regression: Standard Error of Regression

regress success_rate dist

Interpret the Standard Error of Regression (SER; Stata calls it Root MSE).

11 Interpreting Regression: Standard Error of Regression

regress success_rate dist

Interpret the Standard Error of Regression (SER; Stata calls it Root MSE).
On average, in-sample predictions will be off the mark by about...

12 Interpreting Regression: Standard Error of Regression

regress success_rate dist

Interpret the Standard Error of Regression (SER; Stata calls it Root MSE).
On average, in-sample predictions will be off the mark by about 9.2 percentage points.

13 Interpreting Regression: Standard Error of Regression

regress success_rate dist

Interpret the Standard Error of Regression (SER; Stata calls it Root MSE).
On average, in-sample predictions will be off the mark by about 9.2 percentage points.
How good is the SER (9.2 percentage points) here?

14 Interpreting Regression: Standard Error of Regression

Imagine you were playing golf and you found yourself 5 feet from the hole. Then the model tells us that your probability of success is

α̂ + β̂ × 5 = 0.845 − 0.041 × 5 = 0.845 − 0.205 = 0.64

and the SER tells us that you can expect this prediction to be off the mark by about 9.2 percentage points.
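A quick check of that arithmetic; the intercept 0.845 is the value implied by the slide's slope of −0.041 and its prediction of 0.64 at 5 feet:

```python
beta_hat = -0.041   # 4.1 percentage point drop per foot, as a proportion
alpha_hat = 0.845   # implied intercept: 0.64 - beta_hat * 5
ser = 0.092         # Root MSE: about 9.2 percentage points

pred_5ft = alpha_hat + beta_hat * 5
print(round(pred_5ft, 3))                          # 0.64
print(round(pred_5ft - ser, 3), round(pred_5ft + ser, 3))  # rough one-SER band
```

So the point prediction is 64%, but a typical miss of 9.2 percentage points means the realized rate could easily land anywhere in the mid-50s to low-70s.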

15 Observational Studies vs Randomization

In observational studies, what are the two main problems researchers face with internal validity?

16 Observational Studies vs Randomization

In observational studies, what are the two main problems researchers face with internal validity?
Problem 1: Confounding

17 Observational Studies vs Randomization

In observational studies, what are the two main problems researchers face with internal validity?
Problem 1: Confounding
Problem 2: Reverse Causation

18 Observational Studies vs Randomization

In observational studies, what are the two main problems researchers face with internal validity?
Problem 1: Confounding
Problem 2: Reverse Causation
Why do experiments overcome these two problems?

19 Observational Studies vs Randomization

In observational studies, what are the two main problems researchers face with internal validity?
Problem 1: Confounding
Problem 2: Reverse Causation
Why do experiments overcome these two problems?
Random assignment to the treatment.

20 Observational Studies vs Randomization

In observational studies, what are the two main problems researchers face with internal validity?
Problem 1: Confounding
Problem 2: Reverse Causation
Why do experiments overcome these two problems?
Random assignment to the treatment.
Random sampling is not sufficient. Why not?
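Why assignment, not sampling, is what matters can be seen in a small simulation. This sketch uses made-up numbers: the treatment has no true effect, but a confounder drives both treatment take-up and the outcome. The observational comparison stays biased no matter how large the (randomly drawn) sample; the coin-flip assignment does not:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
confounder = rng.normal(size=n)

# Observational: units with high confounder values select into treatment
t_obs = (confounder + rng.normal(size=n) > 0).astype(float)
# Experimental: treatment assigned by coin flip, independent of the confounder
t_rand = rng.integers(0, 2, size=n).astype(float)

def diff_in_means(t, y):
    return y[t == 1].mean() - y[t == 0].mean()

# True treatment effect is zero; the confounder does all the work
y_obs = 0.0 * t_obs + 2 * confounder + rng.normal(size=n)
y_rand = 0.0 * t_rand + 2 * confounder + rng.normal(size=n)

print(diff_in_means(t_obs, y_obs))    # large, purely confounding
print(diff_in_means(t_rand, y_rand))  # close to zero
```

Random assignment makes the treated and untreated groups alike on the confounder (observed or not), so the difference in means recovers the true effect; random sampling alone leaves the selection problem intact.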

21 Random Sampling vs Random Assignment

Say we have a model like this:

Y_i = α + β_1 X_1i + β_2 X_2i + u_i

We're interested in the relationship between X_1 and Y. However, we aren't able to observe X_2, which is correlated with both X_1 and Y.

22 Random Sampling vs Random Assignment

Say we have a model like this:

Y_i = α + β_1 X_1i + β_2 X_2i + u_i

We're interested in the relationship between X_1 and Y. However, we aren't able to observe X_2, which is correlated with both X_1 and Y.
What type of variable is X_2?

23 Random Sampling vs Random Assignment

Say we have a model like this:

Y_i = α + β_1 X_1i + β_2 X_2i + u_i

We're interested in the relationship between X_1 and Y. However, we aren't able to observe X_2, which is correlated with both X_1 and Y.
What type of variable is X_2?
A confound, confounder, or omitted variable.

24 Random Sampling vs Random Assignment

Say we have a model like this:

Y_i = α + β_1 X_1i + β_2 X_2i + u_i

We're interested in the relationship between X_1 and Y. However, we aren't able to observe X_2, which is correlated with both X_1 and Y.
What type of variable is X_2?
A confound, confounder, or omitted variable.
Confounding means we will not be able to estimate β_1 without bias, even with an infinite and random sample.

25 Random Sampling vs Random Assignment

The next slide shows regression estimates of β_1 when randomly sampling from a model like this:

Y_i = α + β_1 X_1i + β_2 X_2i + u_i

I set β_1 = 0 and β_2 = 2 and allow for some small correlation between X_1 and X_2. The black line is the estimate of β_1 for each random sample; I increase the sample size by 1 for each draw, starting with a sample of 10. The red line is the true value of β_1.
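The simulation described above can be sketched as follows; the correlation strength and error scales are my own choices, not necessarily those behind the original figure:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, beta2 = 0.0, 2.0   # true values, as on the slide

def estimate_beta1(n):
    """Regress Y on X1 alone, omitting the confounder X2."""
    x2 = rng.normal(size=n)
    x1 = 0.3 * x2 + rng.normal(size=n)   # small correlation with X2
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    x1c = x1 - x1.mean()
    return (x1c @ y) / (x1c @ x1c)       # OLS slope from the short regression

# The estimates settle down as n grows, but not at the true value of 0:
# the omitted-variable bias does not shrink with sample size.
for n in (10, 100, 1_000, 100_000):
    print(n, estimate_beta1(n))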

26 Random Sampling vs Random Assignment

[Figure: regression estimate of β_1 (y-axis) plotted against sample size (x-axis)]