BIOSTATS 640 Spring 2016 At Your Request! Stata Lab #2 Basics & Logistic Regression


1. Start a log
2. Read in a dataset
3. Familiarize yourself with the data
4. Create 1/2 Variables when you want to use command tab2
5. Create 0/1 Variables when you want to use commands cc, cs
6. Fit a Logistic Regression Model
7. Compare Models Side-by-Side
8. Perform a Likelihood Ratio Test
9. Regression Diagnostics for Logistic Regression: Numerical
   a. Numerical Measures of Fit Using fitstat
   b. Test of Model Adequacy Using linktest
   c. Test of Overall Goodness-of-Fit Using lfit
10. Regression Diagnostics for Logistic Regression: Graphical
   a. Plot of ROC Curve Using lroc
   b. Plot of Standardized Residuals versus Observation Number
   c. Plot of Influential Observations Using Cook's Distances

Teaching\stata\stata version 14\stata version 14 SPRING 2016\Stata Lab 2 Basics and Logistic Regression 2016.docx Page 1 of 23

1. Start a log

Tip - Always keep a log of your Stata session. Start a log of your session, taking care to save it as a .log file and not as a .smcl file.

Launch Stata. From the main menu at upper left: FILE > LOG > BEGIN
   At the file format drop-down: choose log
   At Where: choose a folder you will remember!
   At Save as: name your log

2. Read in a data set

From the public course website page ASSIGNMENTS, download the data set illeetvilaine.dta. In Stata, open this data set. From the main menu at upper left: FILE > OPEN. Browse the folders on your computer and choose illeetvilaine.dta.

3. Familiarize yourself with the dataset

Stata offers several commands for describing a dataset, including describe and codebook. From the command window, use the help command to learn about describe and codebook and their various options. Next, play around with various choices to see which descriptions you like best!

. describe

Contains data from /Users/cbigelow/Desktop/1. Teaching/web640/datasets/illeetvilaine.dta
  obs:   975
 vars:   7          21 Mar :25
 size:   27,

variable name   storage type   display format   value label   variable label
case            float          %9.0g                           Case status (1=case, 0=control)
age             float          %9.0g
agegp           float          %9.0g            agegp         Age group
tob             float          %9.0g                           Tobacco consumption gm/day
tobgp           float          %9.0g            tobgp         Grouped tobacco consum.
alc             float          %9.0g                           Alcohol consumption gm/day
alcgp           float          %9.0g            alcgp         Grouped alcohol consum.
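The menu steps in sections 1 and 2 also have command equivalents, which are handy if you later want to rerun the session from a do-file. The lines below are a minimal sketch, not part of the original lab. They assume that illeetvilaine.dta has been saved in Stata's current working directory, and lab2_session is only a placeholder name for the log.

```stata
* Command equivalents of the menu steps in sections 1 and 2.
* Assumption: illeetvilaine.dta is in the current working directory.
* Assumption: "lab2_session" is a placeholder log name.

* The text option writes a plain .log file rather than a .smcl file.
log using "lab2_session.log", text replace

* Open the data set, discarding any data currently in memory.
use "illeetvilaine.dta", clear

* The same descriptions shown in section 3.
describe
codebook, compact
```

At the end of the session, the command log close closes the log file.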

. codebook, compact

Variable   Obs   Unique   Mean   Min   Max   Label
case                                          Case status (1=case, 0=control)
age
agegp                                         Age group
tob                                           Tobacco consumption gm/day
tobgp                                         Grouped tobacco consum.
alc                                           Alcohol consumption gm/day
alcgp                                         Grouped alcohol consum.

. label list

[Output: the value labels agegp, tobgp (gm/day), and alcgp (gm/day); the individual categories are not preserved here.]

Ille-et-Vilaine Data: Illustration

Suppose we are interested in the 2x2 table cross-classification of heavy smoking (30+ gm/day versus other) and case status (esophageal cancer case versus control):

                              Disease (Esophageal Cancer)
Exposure (Heavy Smoking)      Yes          No
Yes (30+ gm/day)
No

Stata has commands cc and cs for epidemiological analyses of 2x2 tables. The layout of the output is nice. Cases are in row 1 (controls in row 2) and exposed are in column 1 (non-exposed in column 2).

4. Create 1/2 Variables when you want to use command tab2

Why the fuss? Answer - Sometimes the arrangement of rows and columns in a 2x2 table is not what you expected or wanted!

tab2
   Stata will order the rows and columns according to the numeric values of the row and column variables.
   For a 0/1 variable, row 1 will be the value 0 row. Row 2 will be the value 1 row.
   For a 1/2 variable, row 1 will be the value 1 row. Row 2 will be the value 2 row.
   Columns are ordered similarly.

cc, cs
   Stata assumes that you are using 0/1 variables here, with 1=event and 0=non-event.
   Stata will order the rows and columns according to event, with event being the first row (or column).
   Thus, row 1 will be the value 1=event row. Row 2 will be the value 0=non-event row.
   Columns are ordered similarly.

The variable tobgp has four values (1, 2, 3, and 4) representing increasing levels of smoking. Create a variable that you name exposure12, defined as follows:

   exposure12 = 1 if tobgp = 4
              = 2 if tobgp = 1, 2, or 3

The variable case is a 0/1 variable denoting case status. Create a variable that you name case12, defined as follows:

   case12 = 1 if case = 1
          = 2 if case = 0

Tip - Always check your variable creation work. E.g., issue the command tab2 tobgp exposure12.

. * Create "1/2" variables when you want to use command tab2
. * 1/2 measure of heavy smoking (1=30+ gm/day versus 2=other)
. * Exposure will be heavy smoking defined as tobgp=4 (30+ gm/day)
. generate exposure12=tobgp
. recode exposure12 (1=2) (2=2) (3=2) (4=1)
(exposure12: 739 changes made)
. label define exposure12f 2 "other" 1 "heavy"
. label values exposure12 exposure12f

. * "1/2" variable for case status (1=case versus 2=other)
. generate case12=case
. recode case12 (0=2)
(case12: 775 changes made)
. label define case12f 2 "control" 1 "case"
. label values case12 case12f
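The generate, recode, label define, label values sequence above can also be collapsed into a single recode call per variable. This is a sketch of that alternative, not the lab's own method. The names exposure12_alt and case12_alt are hypothetical, chosen so that the variables created above are left untouched.

```stata
* One-step alternative: recode into a new variable and attach value
* labels in the same command.
* exposure12_alt and case12_alt are hypothetical names.
recode tobgp (4 = 1 "heavy") (1/3 = 2 "other"), generate(exposure12_alt)
recode case (1 = 1 "case") (0 = 2 "control"), generate(case12_alt)

* Check the new variables against their sources.
tab2 tobgp exposure12_alt
tab2 case case12_alt
```

Because generate() leaves tobgp and case unchanged, a coding mistake can be fixed by dropping the new variable and rerunning the recode.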

. * Check variable creations
. tab2 tobgp exposure12

-> tabulation of tobgp by exposure12

[Output table: rows = Grouped tobacco consum. gm/day; columns = exposure12 (heavy, other, Total); counts not preserved.]

. * tab2 with 1/2 variables - more to your liking?
. tab2 exposure12 case12

-> tabulation of exposure12 by case12

[Output table: rows = exposure12 (heavy, other, Total); columns = case12 (case, control, Total); counts not preserved.]

Nice. Heavy exposure is now row 1 and cases are now in column 1.

5. Create 0/1 Variables when you want to use commands cc, cs

What is this about? The command cc is for case-control designs and cs is for cohort designs!

. * Create "0/1" variables when you want to use commands cc, cs
. * 0/1 measure of heavy smoking (1=30+ gm/day versus 0=other)
. * Exposure will be heavy smoking defined as tobgp=4 (30+ gm/day)
. generate exposure01=tobgp
. recode exposure01 (1=0) (2=0) (3=0) (4=1)
(exposure01: 975 changes made)
. label define exposure01f 0 "other" 1 "heavy"
. label values exposure01 exposure01f

. * "0/1" variable for case status (1=case versus 0=other)
. * This already exists as the variable case

. * Check variable creations
. tab2 tobgp exposure01

-> tabulation of tobgp by exposure01

[Output table: rows = Grouped tobacco consum. gm/day; columns = exposure01 (other, heavy, Total); counts not preserved.]

. * Use the command cc for case-control. Use the command cs for cohort.
. cc case exposure01

[Output: 2x2 table of Cases and Controls by Exposed and Unexposed, with Total and Proportion Exposed, followed by the point estimate and 95% confidence interval for the odds ratio (exact), the attributable fraction among the exposed (exact), the attributable fraction in the population, and chi2(1) with Pr>chi2; values not preserved.]

eststo, estout, esttab

Stata has a set of commands for saving the results of fitting models (eststo) and then using the saved results to produce a side-by-side comparison of models (estout and esttab). To save (for later use) the results of fitting the current model, the command is:

   eststo yourchoiceofname
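The commands eststo, estout, and esttab are not built into Stata; they belong to the user-written estout package distributed through SSC, so they need a one-time installation. The sketch below assumes an internet connection is available. The stored name crude is hypothetical and only illustrates the fit-then-store pattern.

```stata
* One-time install of the user-written estout package
* (provides eststo, estout, and esttab).
* Assumption: an internet connection is available.
ssc install estout, replace

* Pattern: fit a model, then store its results under a name you choose.
* "crude" is a hypothetical name.
logistic case exposure01
eststo crude
```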

6. Fit a Logistic Regression Model

Ille-et-Vilaine Data: Illustration

After creating another 3 new variables for illustration purposes, we will fit 4 logistic regression models. In #7, we'll then compare them.

   Model 1: Predictors = heavy drinking, age
   Model 2: Predictors = heavy smoking, age
   Model 3: Predictors = heavy drinking, heavy smoking, age
   Model 4: Predictors = heavy drinking, heavy smoking, drinking x smoking interaction, age

6a) Create 3 new variables:
   i) alcohol_80plus = 0/1 measure of alcohol use >= 80 gm/day
   ii) smoking_30plus = 0/1 measure of tobacco use >= 30 gm/day
   iii) drinker_smoker = interaction of heavy drinking and heavy smoking

. * HEAVY DRINKER: Create alcohol_80plus = 0/1 measure of alcohol use >=80 gm/day
. generate alcohol_80plus=alcgp
. recode alcohol_80plus (1=0) (2=0) (3=1) (4=1)
(alcohol_80plus: 975 changes made)
. label define alcoholf 0 "< 80 gm/day" 1 "80+ gm/day"
. label values alcohol_80plus alcoholf
. label variable alcohol_80plus "Heavy Drinker"

. * Check variable creation
. tab2 alcgp alcohol_80plus

-> tabulation of alcgp by alcohol_80plus

[Output table: rows = Grouped alcohol consum. gm/day; columns = Heavy Drinker (< 80 gm/day, 80+ gm/day, Total); counts not preserved.]

. * HEAVY SMOKER: Create smoking_30plus = 0/1 measure of tobacco use >=30 gm/day
. generate smoking_30plus=tobgp
. recode smoking_30plus (1=0) (2=0) (3=0) (4=1)
(smoking_30plus: 975 changes made)
. label define smokingf 0 "< 30 gm/day" 1 "30+ gm/day"
. label values smoking_30plus smokingf

. * Check variable creation
. numlabel, add
. tab2 tobgp smoking_30plus

-> tabulation of tobgp by smoking_30plus

[Output table: rows = Grouped tobacco consum. gm/day; columns = smoking_30plus (0. < 30 gm/day, 1. 30+ gm/day, Total); counts not preserved.]

. * INTERACTION: Create drinker_smoker = interaction of heavy drinking and heavy smoking
. generate drinker_smoker=alcohol_80plus*smoking_30plus
. label variable drinker_smoker "Interaction alcohol*smoking"

6b) Model 1: Predictors = heavy drinking, age. After fit, issue the command: eststo model1

. * MODEL 1
. * Logistic Regression Heavy Drinking Alone - adjusted for age
. logistic case alcohol_80plus i.agegp

Logistic regression     Number of obs = 975
[Output: LR chi2(6), Prob > chi2, Log likelihood, and Pseudo R2, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for alcohol_80plus, agegp, and _cons; values not preserved.]

. * ESTSTO to save results for later comparison
. eststo model1

6c) Model 2: Predictors = heavy smoking, age. After fit, issue the command: eststo model2

. * MODEL 2
. * Logistic Regression Heavy Smoking Alone - adjusted for age
. logistic case smoking_30plus i.agegp

Logistic regression     Number of obs = 975
[Output: LR chi2(6), Prob > chi2, Log likelihood, and Pseudo R2, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for smoking_30plus, agegp, and _cons; values not preserved.]

. * ESTSTO to save results for later comparison
. eststo model2

6d) Model 3: Predictors = heavy drinking, heavy smoking, age. After fit, issue: eststo model3

. * MODEL 3
. * Logistic Regression Heavy Drinking and Heavy Smoking - adjusted for age
. logistic case alcohol_80plus smoking_30plus i.agegp

Logistic regression     Number of obs = 975
[Output: LR chi2(7), Prob > chi2, Log likelihood, and Pseudo R2, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for alcohol_80plus, smoking_30plus, agegp, and _cons; values not preserved.]

. * ESTSTO to save results for later comparison
. eststo model3

6e) Model 4: Predictors = heavy drinking, heavy smoking, drinking x smoking interaction, age. Issue: eststo model4

. * MODEL 4
. * Logistic Regression Heavy Drinking and Heavy Smoking PLUS INTERACTION - adjusted
. logistic case alcohol_80plus smoking_30plus i.agegp drinker_smoker

Logistic regression     Number of obs = 975
[Output: LR chi2(8), Prob > chi2, Log likelihood, and Pseudo R2, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for alcohol_80plus, smoking_30plus, agegp, drinker_smoker, and _cons; values not preserved.]

. * ESTSTO to save results for later comparison
. eststo model4
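Model 4 uses a product term built by hand (drinker_smoker). Stata's factor-variable notation can express the same model without creating that variable. The lines below are a sketch of this alternative, not the lab's own approach, and model4_fv is a hypothetical name for the stored results.

```stata
* Factor-variable alternative to the hand-made product term.
* The ## operator enters both main effects and their interaction.
logistic case i.alcohol_80plus##i.smoking_30plus i.agegp

* "model4_fv" is a hypothetical name for the stored results.
eststo model4_fv
```

Since both predictors are 0/1, the interaction term here is the same product of the two indicators as drinker_smoker, so the fit should match Model 4. Only the row labels in the output differ.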

7. Compare Models Side-by-Side

estout, esttab

Stata has 2 commands for producing a side-by-side comparison of models (estout and esttab). Use estout for simple reporting. Use esttab for tests of significance. Because we have fit 4 models at this point, the basic commands are:

   estout model1 model2 model3 model4, option option option
   esttab model1 model2 model3 model4, option option option

By default, Stata will give you a tabular summary of the betas. Of course, there are options you can give, depending on what comparison of the 4 models you want to make. Here are some examples:

estout
   Use this option                           If you want
   , prehead("titleyouprovide")              A title on your table
   , eform                                   Odds ratios

esttab
   Use this option                           If you want
   , stats(n chi2 bic, star(chi2))           Chi square tests of Null: beta = 0
   , eform stats(n chi2 bic, star(chi2))     Odds ratios & chi square tests of Null: OR = 1

7a) Simple: estout to display the regression coefficients (betas). Use option prehead( ) to obtain a title.

. estout model1 model2 model3 model4, prehead("Logistic Regression of Esophageal Cancer - BETA's")

Logistic Regression of Esophageal Cancer - BETA's

[Output table: one column of b per model (model1, model2, model3, model4); rows = alcohol_80~s, the agegp levels, smoking_30~s, drinker_sm~r, and _cons; values not preserved.]

Nice! We see the betas.

7b) estout to display estimated odds ratios [ exp(beta) ]: use option eform

. estout model1 model2 model3 model4, eform prehead("Logistic Regression of Esophageal Cancer - ODDS RATIO's")

Logistic Regression of Esophageal Cancer - ODDS RATIO's

[Output table: one column of b per model (model1, model2, model3, model4); values not preserved. The rows are annotated as follows.]

   alcohol_80~s    This is 0/1 heavy drinker
   1b.agegp        This is the reference age group
   2.agegp and higher agegp levels
   smoking_30~s    This is 0/1 heavy smoker
   drinker_sm~r    Interaction: heavy drinking x smoking
   _cons           Intercept

The option eform stands for exponentiated coefficients. Thus, these are the odds ratios.

7c) esttab to display chi square tests of Null: beta=0. Use option stats( n chi2 bic, star(chi2) )

. * BETAs with chi square statistics
. esttab model1 model2 model3 model4, stats(n chi2 bic, star(chi2)) prehead("Logistic Regression of Esophageal Cancer - BETA's")

Logistic Regression of Esophageal Cancer - BETA's

                     (1)          (2)          (3)          (4)
                    case         case         case         case
case
alcohol_80~s     1.654***                   1.634***     1.613***
                  (8.74)                     (8.49)       (7.99)
1b.agegp
                   (.)          (.)          (.)          (.)
2.agegp
                  (1.45)       (1.72)       (1.71)       (1.73)
3.agegp          3.200**      3.648***     3.500***     3.533***
                  (3.13)       (3.56)       (3.37)       (3.38)
4.agegp          3.706***     4.177***     4.029***     4.063***
                  (3.64)       (4.09)       (3.90)       (3.90)
5.agegp          3.966***     4.412***     4.394***     4.425***
                  (3.88)       (4.30)       (4.22)       (4.21)
6.agegp          3.959***     4.085***     4.269***     4.306***
                  (3.72)       (3.84)       (3.95)       (3.95)
smoking_30~s                  1.438***     1.384***     1.316***
                               (5.02)       (4.50)       (3.55)
drinker_sm~r
                                                         (0.33)
_cons              ***          ***          ***          ***
                 (-5.00)      (-5.04)      (-5.35)      (-5.33)
n
chi2               ***        145.7***     219.2***     219.3***
bic

t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

This tabulation shows the betas. Underneath each beta is the statistic (beta/standard error). esttab labels these "t statistics"; for a logistic regression they are the z statistics shown in the logistic output.

7d) esttab to display OR, 95% CI, and chi square tests of Null: OR=1. Use option eform and stats( n chi2 bic, star(chi2) ). Do this for models 1-3 ONLY!

. esttab model1 model2 model3, stats(n chi2 bic, star(chi2)) eform ci prehead("Logistic Regression of Esophageal Cancer - ODDS RATIO's")

Logistic Regression of Esophageal Cancer - ODDS RATIO's

                       (1)               (2)               (3)
                      case              case              case
case
alcohol_80~s       5.228***                            5.122***
                [3.608,7.576]                       [3.512,7.469]
1b.agegp
                    [1,1]             [1,1]             [1,1]
2.agegp
                [0.580,37.82]     [0.777,50.54]     [0.763,52.06]
3.agegp            24.54**           38.39***          33.11***
                [3.304,182.3]     [5.162,285.5]     [4.336,252.9]
4.agegp            40.70***          65.17***          56.19***
                [5.529,299.5]     [8.825,481.3]     [7.409,426.1]
5.agegp            52.78***          82.45***          80.93***
                [7.107,391.9]     [11.04,616.0]     [10.51,623.1]
6.agegp            52.42***          59.45***          71.42***
                [6.503,422.6]     [7.369,479.6]     [8.588,593.9]
smoking_30~s                         4.211***          3.990***
                                  [2.403,7.382]     [2.185,7.287]
n
chi2                 ***             145.7***          219.2***
bic

Exponentiated coefficients; 95% confidence intervals in brackets
* p<0.05, ** p<0.01, *** p<0.001
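The link between the betas in 7c and the odds ratios in 7d is simply OR = exp(beta). A quick arithmetic check with display, using two values that appear in both tables, is sketched below. The results should agree with the tabulated odds ratios only up to rounding, because the betas in 7c are themselves shown to three decimals.

```stata
* Model 1, heavy drinking: beta = 1.654 in 7c, odds ratio = 5.228 in 7d.
display exp(1.654)

* Model 2, heavy smoking: beta = 1.438 in 7c, odds ratio = 4.211 in 7d.
display exp(1.438)
```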

8. Perform a Likelihood Ratio Test

A likelihood ratio test is performed to assess the statistical significance of the interaction of heavy drinking and heavy smoking in the model, controlling for age and the main effects of each of heavy drinking and heavy smoking. Thus,

   Model reduced: Predictors = age, heavy drinking, heavy smoking
   Model full:    Predictors = age, heavy drinking, heavy smoking + (drinking x smoking)

Doing this requires 5 commands:
   i) fit the reduced model
   ii) save the results of the reduced model fit (e.g. eststo reduced)
   iii) fit the full model
   iv) save the results of the full model fit (e.g. eststo full)
   v) to perform the likelihood ratio test, issue the command: lrtest reducedname fullname

. * Reduced model
. logistic case i.agegp smoking_30plus alcohol_80plus

Logistic regression     Number of obs = 975
[Output: LR chi2(7), Prob > chi2, Log likelihood, and Pseudo R2, annotated with (-2) Log likelihood Reduced Model, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for agegp, smoking_30plus, alcohol_80plus, and _cons; values not preserved.]

. estimates store reduced

. * Full model
. logistic case i.agegp smoking_30plus alcohol_80plus drinker_smoker

Logistic regression     Number of obs = 975
[Output: LR chi2(8), Prob > chi2, Log likelihood, and Pseudo R2, annotated with (-2) Log likelihood Full Model, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for agegp, smoking_30plus, alcohol_80plus, drinker_smoker, and _cons; values not preserved.]

. estimates store full

. lrtest reduced full

Likelihood-ratio test                     LR chi2(1) = 0.11
(Assumption: reduced nested in full)      Prob > chi2 =

CHECK: [(-2) ln L reduced] - [(-2) ln L full] should equal the LR chi2(1) statistic reported by lrtest. Here the two match!
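The CHECK can be carried out inside Stata rather than by hand. The sketch below assumes the two fits were stored as reduced and full, as in the log above. The scalar names ll_reduced, ll_full, and lr are hypothetical.

```stata
* Pull the log likelihoods out of the two stored fits.
* Assumption: the fits were saved as "reduced" and "full", as above.
estimates restore reduced
scalar ll_reduced = e(ll)
estimates restore full
scalar ll_full = e(ll)

* LR = [(-2) ln L reduced] - [(-2) ln L full]; 1 degree of freedom here
* because the full model adds one term (the interaction).
scalar lr = (-2)*scalar(ll_reduced) - (-2)*scalar(ll_full)
display "LR chi2(1)  = " scalar(lr)
display "Prob > chi2 = " chi2tail(1, scalar(lr))
```

As a rough guide, chi2tail(1, 0.11) is about 0.74, so a statistic of 0.11 on 1 degree of freedom is far from significant.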

9. Regression Diagnostics for Logistic Regression: Numerical

Preliminary - Install the suite of commands in the package SPost.

. * Step 1: Install SPost Using net install
. net install spost9_ado
checking spost9_ado consistency and verifying not already installed...
installing into /Users/cbigelow/Library/Application Support/Stata/ado/plus/...
installation complete.

. * Step 2: Now obtain all the ancillary files
. net get spost9_do
checking spost9_do consistency and verifying not already installed...
copying into current directory...
copying st9all.do
copying st9ch2tutorial.do
copying st9ch3estimate.do
copying st9ch4binary.do
copying st9ch5ordinal.do
copying st9ch6nomcase.do
copying st9ch7nomalt.do
copying st9ch8count.do
copying st9ch9other.do
copying binlfp2.dta
copying couart2.dta
copying gsskidvalue2.dta
copying nomocc2.dta
copying ordwarm2.dta
copying science2.dta
copying sciwork.dta
copying travel2.dta
copying travel2case.dta
copying wlsrnk.dta
ancillary files successfully copied.

Summary

Now you have a model that is your candidate final model. There are lots of further explorations you can do to assess whether this really is a good final model.

Ille-et-Vilaine Data: Illustration

Having retained the null hypothesis in our likelihood ratio test of the interaction of heavy smoking and heavy drinking, our candidate final model contains: heavy drinking, heavy smoking, and age.

. * Before requesting any diagnostics of a model, you must have fit it.
. logistic case i.agegp smoking_30plus alcohol_80plus

Logistic regression     Number of obs = 975
[Output: LR chi2(7), Prob > chi2, Log likelihood, and Pseudo R2, followed by the odds ratio, standard error, z, P>|z|, and 95% confidence interval for agegp, smoking_30plus, alcohol_80plus, and _cons; values not preserved.]

. ***** 9a) Numerical measures of fit using command FITSTAT
. fitstat

Measures of Fit for logistic of case

[Output: Log-Lik Intercept Only, Log-Lik Full Model, D(967), LR(7), Prob > LR, McFadden's R2, McFadden's Adj R2, ML (Cox-Snell) R2, Cragg-Uhler (Nagelkerke) R2, McKelvey & Zavoina's R2, Efron's R2, Variance of y*, Variance of error, Count R2, Adj Count R2, AIC, AIC*n, BIC, BIC', BIC used by Stata, and AIC used by Stata; values not preserved.]

PARTIAL KEY:
   Log-Lik Intercept Only: This is the log likelihood for the intercept-only model.
   Log-Lik Full Model: This is the log likelihood for the current model.
   LR(7): This is the likelihood ratio chi square statistic, which tests whether the current model predicts better than the intercept-only model.
   Prob > LR = .0001: This is the p-value for the LR(7) test.
   Then there is a series of pseudo-R2 measures.
   Finally, there is a series of information criterion measures that are used to compare different models.

. ***** 9b) Test of Model Adequacy Using command LINKTEST
. linktest

-- iteration output omitted --

Logistic regression     Number of obs = 975
[Output: LR chi2(2), Prob > chi2, Log likelihood, and Pseudo R2, followed by the coefficient, standard error, z, P>|z|, and 95% confidence interval for _hat, _hatsq, and _cons; values not preserved.]

WHAT TO LOOK FOR: We expect _hat to be highly significant (a small p-value). Evidence of a GOOD FIT is reflected in a NON-SIGNIFICANT _hatsq. Here the p-value for _hatsq is .934. This suggests good model adequacy.

. ***** 9c) Test of Overall Goodness of Fit Using command LFIT
. lfit, group(10) table

Logistic model for case, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
(There are only 9 distinct quantiles because of ties)

[Output table: columns = Group, Prob, Obs_1, Exp_1, Obs_0, Exp_0, Total; values not preserved.]

   number of observations = 975
   number of groups = 9
   Hosmer-Lemeshow chi2(7) = 4.43
   Prob > chi2 = 0.7291

WHAT TO LOOK FOR: Evidence of OVERALL GOODNESS OF FIT is reflected in a NON-SIGNIFICANT p-value. Here the Hosmer-Lemeshow test p-value is .7291. This suggests good overall fit.
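fitstat is a user-written command, and lfit is an older command name. Stata's built-in postestimation commands offer close counterparts. The sketch below is an alternative route, not the lab's own commands, and assumes the candidate final model is refit first.

```stata
* Refit the candidate final model so the estat commands apply to it.
logistic case i.agegp smoking_30plus alcohol_80plus

* Hosmer-Lemeshow goodness-of-fit test on (up to) 10 groups,
* with the table of observed and expected counts.
estat gof, group(10) table

* Built-in AIC and BIC for the current model.
estat ic
```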

10. Regression Diagnostics for Logistic Regression: Graphical

. ***** 10a) Plot of ROC Curve using LROC
. predict xb, xb
. lroc

Logistic model for case
   number of observations = 975
   area under ROC curve = 0.8119

WHAT TO LOOK FOR: Classification that is no better than a coin toss is represented by the 45 degree reference line. Evidence of GOOD FIT is reflected in an ROC curve that lies above the 45 degree reference line. Area under the ROC curve = .8119 says that, for a randomly chosen case and a randomly chosen control, there is an estimated 81% chance that the model assigns the higher predicted probability to the case. It is a measure of discrimination, not the percent of observations correctly classified.
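The area under the ROC curve and the percent correctly classified are different quantities: the latter depends on the probability cutoff used to call an observation a case. The sketch below shows one way to obtain both with built-in commands. The cutoff of 0.5 is an assumption chosen for illustration, not a value taken from the lab.

```stata
* Refit the candidate final model.
logistic case i.agegp smoking_30plus alcohol_80plus

* Area under the ROC curve, without redrawing the graph.
lroc, nograph

* Classification table, including the percent correctly classified.
* Assumption: a cutoff of 0.5 for calling an observation a case.
estat classification, cutoff(0.5)
```

With 200 cases among 975 observations, a 0.5 cutoff may classify few observations as cases, so it is worth trying other cutoffs before interpreting the percent correctly classified.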

. ***** 10b) Plot of Y=Standardized Residual versus X=Observation Number
. predict std_residual, rs
. label variable std_residual "Standardized Residual"
. generate index=_n
. label variable index "Observation Number"
. graph twoway (scatter std_residual index, msymbol(d)), xlabel(0(100)1000) ylabel(-4(2)4) title("Plot of Standardized Residuals versus Observation Number") xtitle("Observation Number") ytitle("Standardized Residual") yline(0) caption("stdresidual.png", size(vsmall))

WHAT TO LOOK FOR: Think of standardized residuals as Z-scores, approximately. We'd like the majority to be within 1.96 of the expected value of 0. Values outside this range are potentially extreme.
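The plot shows where the large residuals are; listing them shows which observations they belong to. The sketch below assumes std_residual and index were created as in 10b, and uses the same 1.96 yardstick mentioned above.

```stata
* How many standardized residuals fall outside +/- 1.96?
* !missing() is needed because Stata treats missing values as larger
* than any number, so they would otherwise pass the "> 1.96" test.
count if abs(std_residual) > 1.96 & !missing(std_residual)

* Show the observations behind those residuals.
list index case agegp std_residual if abs(std_residual) > 1.96 & !missing(std_residual)
```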

. ***** 10c) Plot of Influential Observations: Y=Cook versus X=Observation Number
. predict cook, dbeta
. label variable cook "Cook Distance"
. graph twoway (scatter cook index, msymbol(d)), xlabel(0(100)1000) title("Plot of Cook Distance versus Observation Number") xtitle("Observation Number") ytitle("Cook Distance") caption("cook.png", size(vsmall))

WHAT TO LOOK FOR: Look for an even ribbon of Cook distance values with no spikes.
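The dbeta option of predict returns Pregibon's delta-beta influence statistic, the logistic regression analogue of Cook's distance. Spikes in the plot can be identified by listing the largest values. The sketch below assumes cook and index exist as created above. Showing the top 10 is an arbitrary choice for illustration.

```stata
* Sort from largest to smallest influence value.
gsort -cook

* Show the 10 most influential observations (10 is an arbitrary choice).
list index case agegp smoking_30plus alcohol_80plus cook in 1/10

* Restore the original observation order.
sort index
```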


Soci Statistics for Sociologists University of North Carolina Chapel Hill Soci708-001 Statistics for Sociologists Fall 2009 Professor François Nielsen Stata Commands for Module 11 Multiple Regression For further information on any command

More information

Categorical Data Analysis

Categorical Data Analysis Categorical Data Analysis Hsueh-Sheng Wu Center for Family and Demographic Research October 4, 200 Outline What are categorical variables? When do we need categorical data analysis? Some methods for categorical

More information

. *increase the memory or there will problems. set memory 40m (40960k)

. *increase the memory or there will problems. set memory 40m (40960k) Exploratory Data Analysis on the Correlation Structure In longitudinal data analysis (and multi-level data analysis) we model two key components of the data: 1. Mean structure. Correlation structure (after

More information

(LDA lecture 4/15/08: Transition model for binary data. -- TL)

(LDA lecture 4/15/08: Transition model for binary data. -- TL) (LDA lecture 4/5/08: Transition model for binary data -- TL) (updated 4/24/2008) log: G:\public_html\courses\LDA2008\Data\CTQ2log log type: text opened on: 5 Apr 2008, 2:27:54 *** read in data ******************************************************

More information

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro.

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro. Biostatistics 208 Lecture 1: Overview & Linear Regression Intro. Steve Shiboski Division of Biostatistics, UCSF January 8, 2019 1 Organization Office hours by appointment (Mission Hall 2540) E-mail to

More information
