Stata version 12. Lab Session 2 April 2013


Stata version 12
Lab Session 2
April 2013

1. Probability Calculations (p-values and such)
   (a) Binomial
   (b) Chi Square
   (c) F
   (d) Hypergeometric (Central)
   (e) Normal
   (f) Poisson
   (g) Student t
2. Categorical: Single 2x2 Table
   (a) Cohort Study Design
   (b) Case-Control Study Design
3. Categorical: K 2x2 Tables
   (a) Cohort Study Design
   (b) Case-Control Study Design
   (c) Graph of OR (95% CI) Over Stratification Variable
4. Categorical: Test of Trend
   (a) 2xC Table
   (b) RxC Table
5. Logistic Regression
   (a) Estimation and Hypothesis Tests
   (b) Graphical Assessments

(mac)\teaching\stata\stata version 12\stata v 12 lab session 2.doc 4/17/2013 Page 1 of 25

1. Probability Calculations (p-values and such)

Preliminary: Download the module probcalc
This user-created module is used for the following distributions: binomial, Poisson, and normal. Type the following in the command window.

. ssc install probcalc

(a) Binomial Distribution

Binomial(n, pi): Probability of exactly k events, Pr[X = k]
probcalc b ntrials pi exactly k

. * Binomial(n=20, pi=.03) Prob[X=2]
. probcalc b 20 .03 exactly 2
Distribution: Binomial n=20 p=.03 option:exactly x=2
[Output: P(X=2); value not shown]

Binomial(n, pi): Probability of at most k events, Pr[X <= k]
probcalc b ntrials pi atmost k

. * Binomial(n=20, pi=.03) Prob[X <= 2]
. probcalc b 20 .03 atmost 2
Distribution: Binomial n=20 p=.03 option:atmost x=2
[Output: P(X=0), P(X=1), P(X=2), then P(X<=2) by pmf Method 1 and by cdf Method 2; values not shown]

Binomial(n, pi): Probability of less than k events, Pr[X < k]
probcalc b ntrials pi atmost k-1

. * Binomial(n=20, pi=.03) Prob[X < 2]
. probcalc b 20 .03 atmost 1
Distribution: Binomial n=20 p=.03 option:atmost x=1
[Output: P(X=0), P(X=1), then P(X<=1) by pmf Method 1 and by cdf Method 2; values not shown]

Binomial(n, pi): Probability of at least k events, Pr[X >= k]
probcalc b ntrials pi atleast k

. * Binomial(n=20, pi=.03) Prob[X >= 2]
. probcalc b 20 .03 atleast 2
Distribution: Binomial n=20 p=.03 option:atleast x=2
P(X=2)=
-- output omitted --
P(X=20)=3.487e-31
[Output: P(X>=2) by pmf Method 1 and by cdf Method 2; values not shown]

Binomial(n, pi): Probability of more than k events, Pr[X > k]
probcalc b ntrials pi atleast k+1

. * Binomial(n=20, pi=.03) Prob[X > 2]
. probcalc b 20 .03 atleast 3
Distribution: Binomial n=20 p=.03 option:atleast x=3
P(X=3)=
-- output omitted --
P(X=20)=3.487e-31
[Output: P(X>=3) by pmf Method 1 and by cdf Method 2; values not shown]

(b) Chi Square Distribution

Chi Square (degrees of freedom = df): Probability[Y < y] is the same as Probability[Y <= y], because the distribution is continuous.
display chi2(df,y)

. * Pr[Chi square df=2 <= 1.5]
. display chi2(2,1.5)

Chi Square (degrees of freedom = df): Probability[Y > y] is the same as Probability[Y >= y]
display chi2tail(df,y)

. * Pr[Chi square df=2 >= 1.5]
. display chi2tail(2,1.5)

Chi Square (degrees of freedom = df): Solution for pth quantile
display invchi2(df,p)

. * Chi Square df=2: Solution for 97.5th percentile
. display invchi2(2,.975)

(c) F Distribution

F (degrees of freedom = df1 and df2): Probability[Y < y] is the same as Probability[Y <= y]
display F(df1,df2,y)

. * Pr[F(df=2,6) < 2.3]
. display F(2,6,2.3)

F (degrees of freedom = df1 and df2): Probability[Y > y] is the same as Probability[Y >= y]
display Ftail(df1,df2,y)

. * Pr[F(df=2,6) > 2.3]
. display Ftail(2,6,2.3)

F (degrees of freedom = df1 and df2): Solution for pth quantile
display invFtail(df1,df2,1-p)

. * F with df=2,6: Solution for 95th percentile
. display invFtail(2,6,.05)
. * F with df=2,6: Solution for 5th percentile
. display invFtail(2,6,.95)

(d) Hypergeometric Distribution (Central)

                 Disease
Exposure      Yes      No
Yes            a        b       n = total # exposed
No             c        d
               K = total # with disease       N = grand total

Hypergeometric (N total, K disease, n exposed): Probability[exactly a with exposure AND disease]
display hypergeometricp(N,K,n,a)

. * Pr[Hypergeometric N=259, K=4, n=23, a=2]
. display hypergeometricp(259,4,23,2)

Hypergeometric (N total, K disease, n exposed): Probability[a or fewer with exposure AND disease]
display hypergeometric(N,K,n,a)

. * Pr[Hypergeometric N=259, K=4, n=23, a<=2]
. display hypergeometric(259,4,23,2)

Hypergeometric (N total, K disease, n exposed): Probability[a or more with exposure AND disease]
Tips: (1) Use this for p-values; and (2) note that a needs to be reduced by 1.
display 1 - hypergeometric(N,K,n,a-1)

. * Pr[Hypergeometric N=259, K=4, n=23, a>=2]
. display 1-hypergeometric(259,4,23,1)
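The reason a is reduced by 1 in the upper-tail calculation can be checked directly. The following is a minimal sketch using only the built-in functions hypergeometric() and hypergeometricp(); the scalar names upper_cdf and upper_sum are illustrative and not part of the handout. With K=4 diseased, a can be at most 4, so the upper tail Pr[a >= 2] is the sum of three point probabilities.

```stata
* Pr[a >= 2] for N=259, K=4, n=23, computed two ways
* (1) complement of the cumulative probability Pr[a <= 1]
scalar upper_cdf = 1 - hypergeometric(259, 4, 23, 1)
* (2) direct sum of the point probabilities for a = 2, 3, 4
scalar upper_sum = hypergeometricp(259, 4, 23, 2) + hypergeometricp(259, 4, 23, 3) + hypergeometricp(259, 4, 23, 4)
display "1 - cdf at a-1 : " upper_cdf
display "sum of pmf     : " upper_sum
```

The two displayed values should agree. Subtracting hypergeometric(259,4,23,2) instead would drop the term for a=2 and give Pr[a >= 3].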

(e) Normal Distribution (mean=mu, standard deviation=sigma)

Normal(mu, sigma), between: Probability[a < X < b] is the same as Probability[a <= X <= b]
probcalc n mu sigma between a b

. * Pr[Normal(mu=100, sigma=15) is between 85 and 115]
. probcalc n 100 15 between 85 115
Distribution: Normal mean:100 s.d.:15 option:between
[Output: cdf Method, P(85<=X<115); value not shown]

Normal(mu, sigma), at most: Probability[X < b] is the same as Probability[X <= b]
probcalc n mu sigma atmost b

. * Pr[Normal(mu=100, sigma=15) is at most 115]
. probcalc n 100 15 atmost 115
Distribution: Normal mean:100 s.d.:15 option:atmost x=115
[Output: cdf Method, P(X<=115); value not shown]

Normal(mu, sigma), at least: Probability[X > a] is the same as Probability[X >= a]
probcalc n mu sigma atleast a

. * Pr[Normal(mu=100, sigma=15) is at least 85]
. probcalc n 100 15 atleast 85
Distribution: Normal mean:100 s.d.:15 option:atleast x=85
[Output: cdf Method, P(X>=85); value not shown]

Normal(mu, sigma): Solution for pth quantile
display mu+sigma*invnormal(p)

. * Normal(mu=100, sigma=15): Solution for 95th percentile
. display 100+15*invnormal(.95)
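If probcalc is not installed, the same normal probabilities can be obtained by standardizing and calling Stata's built-in normal() function. This is a sketch of that alternative, not the handout's method; the scalar names are illustrative.

```stata
* X ~ Normal(mu=100, sigma=15); standardize with z = (x - mu)/sigma
* Pr[85 <= X <= 115]
scalar p_between = normal((115-100)/15) - normal((85-100)/15)
* Pr[X <= 115]
scalar p_atmost = normal((115-100)/15)
* Pr[X >= 85]
scalar p_atleast = 1 - normal((85-100)/15)
* 95th percentile
scalar q95 = 100 + 15*invnormal(.95)
display p_between
display p_atmost
display p_atleast
display q95
```

Because 85 and 115 are one standard deviation either side of the mean, the first value is the familiar "about 68%" of the normal curve, and Pr[X <= 115] equals Pr[X >= 85] by symmetry.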

(f) Poisson Distribution

Poisson(mu): Probability of exactly k events, Pr[X = k]
probcalc p mu exactly k

. * Pr[Poisson(mu=1.8) = 6]
. probcalc p 1.8 exactly 6
Distribution: Poisson mu=1.8 option:exactly x=6
[Output: P(X=6); value not shown]

Poisson(mu): Probability of at most k events, Pr[X <= k]
probcalc p mu atmost k

. * Pr[Poisson(mu=1.8) <= 6]
. probcalc p 1.8 atmost 6
Distribution: Poisson mu=1.8 option:atmost x=6
P(X=0)=
-- output omitted --
P(X=6)=
[Output: P(X<=6) by pmf Method 1 and by cdf Method 2; values not shown]

Poisson(mu): Probability of less than k events, Pr[X < k]
probcalc p mu atmost k-1

. * Pr[Poisson(mu=1.8) < 6]
. probcalc p 1.8 atmost 5
Distribution: Poisson mu=1.8 option:atmost x=5
P(X=0)=
-- output omitted --
P(X=5)=
[Output: P(X<=5) by pmf Method 1 and by cdf Method 2; values not shown]

Poisson(mu): Probability of at least k events, Pr[X >= k]
probcalc p mu atleast k

. * Pr[Poisson(mu=1.8) >= 6]
. probcalc p 1.8 atleast 6
Distribution: Poisson mu=1.8 option:atleast x=6
P(X=0)=
-- output omitted --
[Output: P(X>=6) by pmf Method 1 and by cdf Method 2; values not shown]

Poisson(mu): Probability of more than k events, Pr[X > k]
probcalc p mu atleast k+1

. * Pr[Poisson(mu=1.8) > 6]
. probcalc p 1.8 atleast 7
Distribution: Poisson mu=1.8 option:atleast x=7
Note: For Poisson ''at least'' questions, the sum of the lower tail pmf's is subtracted from one. So only variates less than x are reported below.
P(X=0)=
-- output omitted --
P(X=6)=
[Output: P(X>=7) by pmf Method 1 and by cdf Method 2; values not shown]
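The binomial and Poisson probabilities above can also be cross-checked without probcalc, using Stata's built-in functions binomialp(), binomial(), binomialtail(), poissonp(), poisson(), and poissontail(). The sketch below is an alternative route offered for checking, not the handout's method. Note the argument order: the binomial functions take (n, k, pi) and the Poisson functions take (mu, k).

```stata
* Binomial(n=20, pi=.03)
* Pr[X = 2]
display binomialp(20, 2, .03)
* Pr[X <= 2] and Pr[X < 2] = Pr[X <= 1]
display binomial(20, 2, .03)
display binomial(20, 1, .03)
* Pr[X >= 2] and Pr[X > 2] = Pr[X >= 3]
display binomialtail(20, 2, .03)
display binomialtail(20, 3, .03)

* Poisson(mu=1.8)
* Pr[X = 6]
display poissonp(1.8, 6)
* Pr[X <= 6] and Pr[X < 6] = Pr[X <= 5]
display poisson(1.8, 6)
display poisson(1.8, 5)
* Pr[X >= 6] and Pr[X > 6] = Pr[X >= 7]
display poissontail(1.8, 6)
display poissontail(1.8, 7)
```

As a consistency check against the probcalc output shown earlier, binomialp(20, 20, .03) is .03 raised to the 20th power, which matches the reported P(X=20)=3.487e-31.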

(g) Student-t Distribution

Student-t (degrees of freedom = df): Probability[T < t] is the same as Probability[T <= t]
display 1 - ttail(df,t)

. * Pr[Student-t(df=12) < 2.1]
. display 1-ttail(12,2.1)

Student-t (degrees of freedom = df): Probability[T > t] is the same as Probability[T >= t]
display ttail(df,t)

. * Pr[Student-t(df=12) > 2.1]
. display ttail(12,2.1)

Student-t (degrees of freedom = df): Solution for pth quantile
display invttail(df,1-p)

. * Student-t(df=12): Solution for 97.5th percentile
. display invttail(12,.025)
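For the continuous distributions (chi square, F, Student-t), two relationships are worth verifying: the lower and upper tail probabilities sum to 1, and each quantile function inverts its probability function. The sketch below checks both with built-in functions only. Stata function names are case-sensitive, so the F functions are written F(), Ftail(), and invFtail() with a capital F.

```stata
* Lower tail + upper tail = 1
display chi2(2, 1.5) + chi2tail(2, 1.5)
display F(2, 6, 2.3) + Ftail(2, 6, 2.3)

* Quantile round trips: feeding a quantile back in returns the probability
* chi square: invchi2() takes the lower-tail probability
display chi2(2, invchi2(2, .975))
* F and t: invFtail() and invttail() take the upper-tail probability
display Ftail(2, 6, invFtail(2, 6, .05))
display ttail(12, invttail(12, .025))
```

The first two lines should display 1, and the last three should display .975, .05, and .025 respectively (up to rounding).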

2. Categorical: Single 2x2 Table

Preliminary
We will use the following data for both (a) Cohort Study Design and (b) Case-Control Study Design.

                       Disease (Lung Cancer)
Exposure (Smoking)      Yes      No
Yes                       9      31
No                        2      47

Create a data set called single2x2.dta and save it.

. generate smoking=.
. generate lungca=.
. generate tally=.

-- Click on the data editor to enter the data. Close the window. Data will not be lost! --

. label define smokingf 1 "Smoker" 0 "Non-smoker"
. label values smoking smokingf
. label define lungcaf 1 "Cancer" 0 "Healthy"
. label values lungca lungcaf

. * Use command expand to create data set with individual records
. expand tally
(85 observations created)

. * Check data before saving
. tab2 smoking lungca
-> tabulation of smoking by lungca
[Output: smoking (Non-smoker, Smoker) by lungca (Healthy, Cancer) with row and column totals; counts not shown]

. save "/Users/carolbigelow/Desktop/single2x2.dta"
file /Users/carolbigelow/Desktop/single2x2.dta saved
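As an alternative to typing the four rows into the data editor, the tallied table can be entered with the input command, which is easier to reproduce in a do-file. This is a sketch of that alternative, assuming the same coding as above (1=smoker/cancer, 0=non-smoker/healthy); it is not how the handout created the file.

```stata
clear
* One row per cell of the 2x2 table: exposure, disease, cell count
input smoking lungca tally
1 1 9
1 0 31
0 1 2
0 0 47
end
label define smokingf 1 "Smoker" 0 "Non-smoker"
label values smoking smokingf
label define lungcaf 1 "Cancer" 0 "Healthy"
label values lungca lungcaf
* expand replaces each row with tally copies of itself
expand tally
tab2 smoking lungca
```

The four cell counts sum to 89, so expand adds 85 records to the original 4, which agrees with the "(85 observations created)" message above.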

(a) Cohort Study Design

. * Estimation of Relative Risk (RR) and Fisher Exact Test
. * cs DISEASEVARIABLE EXPOSUREVARIABLE, exact
. cs lungca smoking, exact
[Output: table of cases, noncases, total, and risk by exposed vs. unexposed (smoking); point estimates with 95% confidence intervals for the risk difference, risk ratio, attributable fraction among the exposed, and attributable fraction in the population; 1-sided and 2-sided Fisher's exact P; values not shown]

. * Estimation of Relative Risk (RR) and Fisher Exact Test IMMEDIATE VERSION
. * csi DISEASED-EXPOSED DISEASED-NOTEXPOSED HEALTHY-EXPOSED HEALTHY-NOTEXPOSED, exact
. csi 9 2 31 47, exact
[Output: same layout as for cs above; values not shown]

(b) Case-Control Study Design

. * Estimation of Odds Ratio (OR) and Fisher Exact Test
. * cc DISEASEVARIABLE EXPOSUREVARIABLE, exact
. cc lungca smoking, exact
[Output: table of cases, controls, and total by exposed vs. unexposed, with the proportion exposed; point estimates with exact 95% confidence intervals for the odds ratio and the attributable fraction among the exposed, plus the attributable fraction in the population; 1-sided and 2-sided Fisher's exact P; values not shown]

. * Estimation of Odds Ratio (OR) and Fisher Exact Test IMMEDIATE VERSION
. * cci DISEASED-EXPOSED DISEASED-NOTEXPOSED HEALTHY-EXPOSED HEALTHY-NOTEXPOSED, exact
. cci 9 2 31 47, exact
[Output: same layout as for cc above; values not shown]
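The point estimates reported by csi and cci can be reproduced by hand from the four cell counts, which is a useful check that the counts were entered in the right order. In the sketch below, the scalar names are illustrative, and the stored results r(rr) and r(or) are read back after each immediate command.

```stata
* Cell counts: a=9 (cancer, smoker), b=2 (cancer, non-smoker),
*              c=31 (healthy, smoker), d=47 (healthy, non-smoker)
* Risk ratio = [a/(a+c)] / [b/(b+d)]
scalar rr_hand = (9/(9+31)) / (2/(2+47))
* Odds ratio = (a*d)/(b*c)
scalar or_hand = (9*47)/(2*31)

quietly csi 9 2 31 47
scalar rr_cmd = r(rr)
quietly cci 9 2 31 47
scalar or_cmd = r(or)

display "RR by hand = " rr_hand "   RR from csi = " rr_cmd
display "OR by hand = " or_hand "   OR from cci = " or_cmd
```

With a rare outcome the odds ratio approximates the risk ratio; here lung cancer is not especially rare among the smokers (9 of 40), so the two estimates differ noticeably.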

3. Categorical: K 2x2 Tables

Preliminary
The data are provided for you in a data set called coffeemi_full.dta. It consists of K=4 2x2 tables. The exposure of interest is coffee (1 = 5 or more cups/day, 0 = fewer than 5 cups/day). The outcome is MI (1=yes, 0=no). The stratifying variable is cigarette smoking (1=former smoker, 2=1-14 cigarettes/day, 3=35-44 cigarettes/day, 4=45+ cigarettes/day).

[Stratum tables: for each of the four strata (Stratum 1: former smoker; Stratum 2: 1-14 cigarettes/day; Stratum 3: 35-44 cigarettes/day; Stratum 4: 45+ cigarettes/day), a 2x2 table of cups of coffee per day (5 or more vs. fewer than 5) by MI vs. control; counts not shown]

Input the data set coffeemi_full.dta and check.

. clear
. use "

(a) Cohort Study Design

. * Check
. * table DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable) row column
. table mi coffee, by(smoking) row col
[Output: within each stratum of smoking, counts of Non-MI and MI by cups of coffee per day (fewer than 5, 5+ cups), with row and column totals; counts not shown]

. * Compact display of % experiencing DISEASE because MI is coded 1=disease 0=healthy
. tabulate smoking coffee, summarize(mi) means
[Output: means of MI (myocardial infarction) by stratum of smoking and cups of coffee per day, with totals; values not shown]

. * Stratified analysis of Relative Risk (Event=mi) w Exposure (Coffee)
. * cs DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. cs mi coffee, by(smoking)
[Output: RR with 95% confidence interval and M-H weight for each stratum of smoking; crude and M-H combined RR; test of homogeneity (M-H), chi2(3); values not shown]

(b) Case-Control Study Design

. sort smoking
. * Stratified analysis of Odds Ratio (Event=mi) w Exposure (Coffee)
. * cc DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. cc mi coffee, by(smoking)
[Output: OR with exact 95% confidence interval and M-H weight for each stratum of smoking; crude (exact) and M-H combined OR; test of homogeneity (M-H), chi2(3); values not shown]
Test that combined OR = 1: Mantel-Haenszel chi2(1) = 1.65

. * Test of Trend in Odds Ratio (Event=mi, Exposure=coffee) over increasing strata
. * tabodds DISEASEVARIABLE STRATUMVARIABLE, or
. tabodds mi smoking, or
[Output: odds ratio, chi2, P>chi2, and 95% confidence interval for each level of smoking; test of homogeneity (equal odds), chi2(3); score test for trend of odds, chi2(1); values not shown]

(c) Graph of OR (95% CI) Over Stratification Variable

. * Obtain stratum specific OR and 95% CI limits
. * mhodds DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. mhodds mi coffee, by(smoking)
Maximum likelihood estimate of the odds ratio
Comparing coffee==1 vs. coffee==0
by smoking
[Output: odds ratio, chi2(1), P>chi2, and 95% confidence interval for each stratum of smoking; Mantel-Haenszel estimate controlling for smoking; test of homogeneity of ORs (approx), chi2(3); values not shown]

. **** Create a new "little" data set containing the information to be plotted
. clear
. generate or=.
. generate high=.
. generate low=.
. generate smoking=.

Click on the DATA EDITOR icon to enter the data. Close this window when done.

. * graph twoway (scatter or STRATUMVARIABLE) (rcap low high STRATUMVARIABLE)
. graph twoway (scatter or smoking, msymbol(d)) (rcap low high smoking), yline(1, lwidth(thin) lpattern(dash) lcolor(black)) xlabel(0 "Overall" 1 "Former" 2 "1-14 cigs" 3 "35-44 cigs" 4 "45+ cigs", angle(45)) title("Relative Odds Myocardial Infarction") subtitle("Associated with High Coffee Consumption") ytitle("Odds Ratio, 95% CI") legend(off) caption("mi_or.png", size(vsmall))

4. Categorical: Test of Trend

Preliminary
The data are provided for you in a data set called esophageal_cancer.dta. It is in tabular form, so you will need to use the command expand. This data set has n=117 observations on cases (#=25) of esophageal cancer and controls (#=92). The exposure of interest is alcohol consumption (g/day) at 4 levels, from 0-39 g/day through two intermediate levels to 120+ g/day.

. use "
. expand tally
(224 observations created)

(a) 2xC Table
Use the command tabodds when you have case-control data (1=cases, 0=controls) and more than 2 levels of exposure.

. * Test of Trend for 2xC table
. * tabodds CASEVARIABLE EXPOSUREVARIABLE
. tabodds case alcohol
[Output: cases, controls, odds, and 95% confidence interval at each level of alcohol; test of homogeneity (equal odds), chi2(3); score test for trend of odds, chi2(1); values not shown]

(b) RxC Table
The command tabodds will not work (you will get an error message) if the number of rows is more than 2. For testing trend in the more general setting of an RxC table, use the command nptrend.

. * Test of Trend for RxC table must use command nptrend
. sort alcohol
. nptrend case, by(alcohol)
[Output: score, number of observations, and sum of ranks at each level of alcohol; values not shown]
z = 4.57
Prob > |z| =

5. Logistic Regression

Preliminary
The data are provided for you in a data set called depress_small.dta. This data set has n=294 observations, including 50 cases of depression (depressed=1). For this illustration, we consider 3 predictors: female gender (female=1), alcohol (drinker=1), and age.

. clear
. use "

Descriptives and Creation of Indicator Variables

. summarize
[Output: Obs, Mean, Std. Dev., Min, and Max for sex, age, cases, drink, and depressed; values not shown]

. * recode 1/2 variables to 0/1 variables. Note: I happen to know the codes already.
. * SEX: males (1->0), females (2->1)
. recode sex (1=0) (2=1), generate(female)
. label variable female "female"
. * DRINK: Non-drinker (2->0)
. recode drink (2=0), generate(drinker)
. label variable drinker "drinker"
. * AGE: Create 0/1 indicators for quartiles
. centile age, centile(25 50 75)
[Output: 25th, 50th, and 75th centiles of age with binomial-interpolation 95% confidence intervals; values not shown]
* Lower (upper) confidence limit held at minimum (maximum) of sample

. recode age (18/28=1) (28/42.5=2) (42.5/59=3) (59/89=4), generate(age_quartile)
. generate age1828=(age_quartile==1)
. generate age2843=(age_quartile==2)
. generate age4359=(age_quartile==3)
. generate age5989=(age_quartile==4)

(a) Estimation and Hypothesis Tests

. * Fit model: Command logit yields betas. Command logistic yields ORs.
. logistic depressed female drinker age2843 age4359 age5989
Logistic regression                Number of obs = 294
[Output: LR chi2(5), Prob > chi2, log likelihood, and pseudo R2; odds ratio, standard error, z, P>|z|, and 95% confidence interval for female, drinker, age2843, age4359, age5989, and _cons; values not shown]

. * Display predicted probabilities of event=depressed using command adjust w option pr
. adjust, by(female) pr
Dependent variable: depressed   Equation: depressed   Command: logistic
Variables left as is: drinker, age2843, age4359, age5989
[Output: estimated pr[depression] by female. Males and females, on average, each have an estimated pr[depression]; values not shown]

. adjust, by(female drinker) pr
Dependent variable: depressed   Equation: depressed   Command: logistic
Variables left as is: age2843, age4359, age5989
[Output: estimated pr[depression] by female and drinker, including the estimate for females who drink; values not shown]
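The remark that logit yields betas while logistic yields ORs reflects two displays of one fitted model: each odds ratio is exp(beta). The sketch below illustrates this with the auto data set that ships with Stata, used here only as a stand-in because depress_small.dta is not reproduced in this handout; the outcome foreign and predictor mpg are assumptions of the example, not variables from the lab.

```stata
* Stand-in data: auto.dta ships with Stata (not the lab's depression data)
sysuse auto, clear

* logit reports coefficients (log odds ratios)
quietly logit foreign mpg
scalar b_logit = _b[mpg]

* logistic fits the same model but reports odds ratios;
* _b[] still holds the coefficient, so OR = exp(_b[])
quietly logistic foreign mpg
scalar b_logistic = _b[mpg]
scalar or_mpg = exp(_b[mpg])

display "beta from logit    = " b_logit
display "beta from logistic = " b_logistic
display "odds ratio         = " or_mpg
```

For the lab's model, the same idea would apply after the logistic command above, for example by displaying exp(_b[female]) and comparing it with the Odds Ratio column.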

. * Wald test of age modeled using 3 0/1 indicators
. test (age2843=0) (age4359=0) (age5989=0)
 ( 1) [depressed]age2843 = 0
 ( 2) [depressed]age4359 = 0
 ( 3) [depressed]age5989 = 0
chi2( 3) = 3.65
Prob > chi2 =
Not significant. We can probably drop age.

. * LR test of age modeled using 3 0/1 indicators
. * full model
. quietly: logistic depressed female drinker age2843 age4359 age5989
. estimates store full
. * reduced model
. quietly: logistic depressed female drinker
. estimates store reduced
. lrtest full reduced
Likelihood-ratio test                      LR chi2(3) = 3.78
(Assumption: reduced nested in full)       Prob > chi2 =

. * nice tabular display of fit of several models, showing betas and SE(beta)
. quietly: logistic depressed female
. estimates store modelf
. quietly: logistic depressed drinker
. estimates store modeld
. quietly: logistic depressed age2843 age4359 age5989
. estimates store modelage
. quietly: logistic depressed female drinker
. estimates store reduced
. quietly: logistic depressed female drinker age2843 age4359 age5989
. estimates store full
. estimates table modelf modeld modelage reduced full, b(%7.2f) se(%7.2f) stfmt(%7.4g)
[Output: coefficients with standard errors (legend: b/se) for female, drinker, age2843, age4359, age5989, and _cons under models modelf, modeld, modelage, reduced, and full; values not shown]
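The statistic reported by lrtest is twice the difference in log likelihoods between the full and reduced models, referred to a chi square distribution with degrees of freedom equal to the number of dropped terms. The sketch below reproduces that arithmetic by hand. It uses the auto data set shipped with Stata as a stand-in (an assumption of this example, since depress_small.dta is not reproduced here), with one dropped predictor rather than three.

```stata
* Stand-in data: auto.dta ships with Stata (not the lab's depression data)
sysuse auto, clear

* Full model
quietly logit foreign mpg weight
scalar ll_full = e(ll)
estimates store full

* Reduced model drops weight (1 degree of freedom)
quietly logit foreign mpg
scalar ll_red = e(ll)
estimates store reduced

* LR test by command, saving its results
lrtest full reduced
scalar lr_cmd = r(chi2)
scalar p_cmd = r(p)

* LR test by hand: 2*(lnL_full - lnL_reduced), upper tail of chi2(1)
scalar lr_hand = 2*(ll_full - ll_red)
scalar p_hand = chi2tail(1, lr_hand)
display "LR chi2(1) by hand = " lr_hand "   p = " p_hand
```

For the lab's test of age, the p-value for the reported statistic can be recovered the same way with display chi2tail(3, 3.78).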

(b) Regression Diagnostics

. * Must first fit model to be evaluated
. logistic depressed female drinker age2843 age4359 age5989
Logistic regression                Number of obs = 294
[Output: same model fit and odds ratio table as in (a); values not shown]

. * Hosmer Lemeshow Goodness of Fit Test
. estat gof, group(10)
Logistic model for depressed, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
(There are only 9 distinct quantiles because of ties)
number of observations = 294
number of groups = 9
Hosmer-Lemeshow chi2(7) = 2.45
Prob > chi2 =

. * Link Test for Omitted Predictors
. linktest
-- output omitted --
Logistic regression                Number of obs = 294
[Output: LR chi2(2), Prob > chi2, log likelihood, and pseudo R2; coefficient, standard error, z, P>|z|, and 95% confidence interval for _hat, _hatsq, and _cons; values not shown]
_hatsq is not significant (NS). Good.

. * Identify observations with standardized residuals > 2
. predict std_residual, res
. label variable std_residual "Standardized Residual"
. * I don't have an id variable so I am creating one here for illustration purposes
. generate id=_n
. list id std_residual drinker female age if abs(std_residual)>2
. * none > 2. Let's try 1 for illustration purposes.
. list id std_residual drinker female age if abs(std_residual)>1
[Output: id, std_residual, drinker, female, and age for the listed observations; values not shown]
-- rest of output omitted --

. * Plot of Cook's Distance
. predict cook, dbeta
. label variable cook "Cook's Distance"
. scatter cook id, mlabel(id) msize(1) mlabsize(2) jitter(*10) title("Plot of Cook's Distance") subtitle("by Study ID") caption("cooks.png")


More information

Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies

Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies Belen Alejos Ferreras Centro Nacional de Epidemiología Instituto de Salud Carlos III 19 de Octubre

More information

CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2

CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2 CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2 GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis

More information

Interpreting and Visualizing Regression models with Stata Margins and Marginsplot. Boriana Pratt May 2017

Interpreting and Visualizing Regression models with Stata Margins and Marginsplot. Boriana Pratt May 2017 Interpreting and Visualizing Regression models with Stata Margins and Marginsplot Boriana Pratt May 2017 Interpreting regression models Often regression results are presented in a table format, which makes

More information

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved.

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved. AcaStat How To Guide AcaStat Software Copyright 2016, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Frequencies... 3 List Variables... 4 Descriptives... 5 Explore Means...

More information

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found

More information

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis In any longitudinal analysis, we can distinguish between analyzing trends vs individual change that is, model

More information

Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian. Preliminary Data Screening

Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian. Preliminary Data Screening r's age when 1st child born 2 4 6 Density.2.4.6.8 Density.5.1 Sociology 774: Regression Models for Categorical Data Instructor: Natasha Sarkisian Preliminary Data Screening A. Examining Univariate Normality

More information

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models Statistical Modelling for Social Scientists Manchester University January 20, 21 and 24, 2011 Graeme Hutcheson, University of Manchester Modelling categorical variables using logit models Software commands

More information

Categorical Data Analysis for Social Scientists

Categorical Data Analysis for Social Scientists Categorical Data Analysis for Social Scientists Brendan Halpin, Sociological Research Methods Cluster, Dept of Sociology, University of Limerick June 20-21 2016 Outline 1 Introduction 2 Logistic regression

More information

*STATA.OUTPUT -- Chapter 13

*STATA.OUTPUT -- Chapter 13 *STATA.OUTPUT -- Chapter 13.*small example of rank sum test.input x grp x grp 1. 4 1 2. 35 1 3. 21 1 4. 28 1 5. 66 1 6. 10 2 7. 42 2 8. 71 2 9. 77 2 10. 90 2 11. end.ranksum x, by(grp) porder Two-sample

More information

MNLM for Nominal Outcomes

MNLM for Nominal Outcomes MNLM for Nominal Outcomes Objectives Introduce the MNLM as an extension of the BLM Derive the model as a nonlinear probability model Illustrate the difficulties in interpretation due to the large number

More information

Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS

Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals heavily

More information

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway. Statistical Modelling for Business and Management J.E. Cairnes School of Business & Economics National University of Ireland Galway June 28 30, 2010 Graeme Hutcheson, University of Manchester Luiz Moutinho,

More information

Using Excel s Analysis ToolPak Add-In

Using Excel s Analysis ToolPak Add-In Using Excel s Analysis ToolPak Add-In Bijay Lal Pradhan, PhD Introduction I have a strong opinions that we can perform different quantitative analysis, including statistical analysis, in Excel. It is powerful,

More information

Chapter 2 Part 1B. Measures of Location. September 4, 2008

Chapter 2 Part 1B. Measures of Location. September 4, 2008 Chapter 2 Part 1B Measures of Location September 4, 2008 Class will meet in the Auditorium except for Tuesday, October 21 when we meet in 102a. Skill set you should have by the time we complete Chapter

More information

Checking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap

Checking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap Checking the model Linearity Normality Constant variance Influential points Covariate overlap 1 Checking the model: linearity Average value of outcome initially assumed to be linear function of continuous

More information

Basics of Stata language

Basics of Stata language Basics of Stata language Nicola Orsini, PhD Associate Professor of Medical Statistics Department of Public Health Sciences Karolinska Institutet 2018 Aims This course helps to get familiar with Stata language

More information

Lecture 2a: Model building I

Lecture 2a: Model building I Epidemiology/Biostats VHM 812/802 Course Winter 2015, Atlantic Veterinary College, PEI Javier Sanchez Lecture 2a: Model building I Index Page Predictors (X variables)...2 Categorical predictors...2 Indicator

More information

Center for Demography and Ecology

Center for Demography and Ecology Center for Demography and Ecology University of Wisconsin-Madison A Comparative Evaluation of Selected Statistical Software for Computing Multinomial Models Nancy McDermott CDE Working Paper No. 95-01

More information

X. Mixed Effects Analysis of Variance

X. Mixed Effects Analysis of Variance X. Mixed Effects Analysis of Variance Analysis of variance with multiple observations per patient These analyses are complicated by the fact that multiple observations on the same patient are correlated

More information

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro.

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro. Biostatistics 208 Lecture 1: Overview & Linear Regression Intro. Steve Shiboski Division of Biostatistics, UCSF January 8, 2019 1 Organization Office hours by appointment (Mission Hall 2540) E-mail to

More information

Tutorial #7: LC Segmentation with Ratings-based Conjoint Data

Tutorial #7: LC Segmentation with Ratings-based Conjoint Data Tutorial #7: LC Segmentation with Ratings-based Conjoint Data This tutorial shows how to use the Latent GOLD Choice program when the scale type of the dependent variable corresponds to a Rating as opposed

More information

********************************************************************************************** *******************************

********************************************************************************************** ******************************* 1 /* Workshop of impact evaluation MEASURE Evaluation-INSP, 2015*/ ********************************************************************************************** ******************************* DEMO: Propensity

More information

. *increase the memory or there will problems. set memory 40m (40960k)

. *increase the memory or there will problems. set memory 40m (40960k) Exploratory Data Analysis on the Correlation Structure In longitudinal data analysis (and multi-level data analysis) we model two key components of the data: 1. Mean structure. Correlation structure (after

More information

This is a quick-and-dirty example for some syntax and output from pscore and psmatch2.

This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. It is critical that when you run your own analyses, you generate your own syntax. Both of these procedures have very

More information

17.871: PS3 Key. Part I

17.871: PS3 Key. Part I 17.871: PS3 Key Part I. use "cces12.dta", clear. reg CC424 CC334A [aweight=v103] if CC334A!= 8 & CC424 < 6 // Need to remove values that do not fit on the linear scale. This entails discarding all respondents

More information

Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).)

Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) There are two data sets; each as the same treatment group of 185 men. JTRAIN2 includes 260

More information

B. Kedem, STAT 430 SAS Examples SAS3 ===================== ssh tap sas82, sas <--Old tap sas913, sas <--New Version

B. Kedem, STAT 430 SAS Examples SAS3 ===================== ssh tap sas82, sas <--Old tap sas913, sas <--New Version B. Kedem, STAT 430 SAS Examples SAS3 ===================== ssh abc@glue.umd.edu, tap sas82, sas

More information

Examples of Using Stata v11.0 with JRR replicate weights Provided in the NHANES data set

Examples of Using Stata v11.0 with JRR replicate weights Provided in the NHANES data set Examples of Using Stata v110 with JRR replicate weights Provided in the NHANES 1999-2000 data set This document is designed to illustrate comparisons of methods to use JRR replicate weights sometimes provided

More information

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING

JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING JMP TIP SHEET FOR BUSINESS STATISTICS CENGAGE LEARNING INTRODUCTION JMP software provides introductory statistics in a package designed to let students visually explore data in an interactive way with

More information

CHECKING INFLUENCE DIAGNOSTICS IN THE OCCUPATIONAL PRESTIGE DATA

CHECKING INFLUENCE DIAGNOSTICS IN THE OCCUPATIONAL PRESTIGE DATA PLS 802 Spring 2018 Professor Jacoby CHECKING INFLUENCE DIAGNOSTICS IN THE OCCUPATIONAL PRESTIGE DATA This handout shows the log from a Stata session that examines the Duncan Occupational Prestige data

More information

Multilevel/ Mixed Effects Models: A Brief Overview

Multilevel/ Mixed Effects Models: A Brief Overview Multilevel/ Mixed Effects Models: A Brief Overview Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised March 27, 2018 These notes borrow very heavily, often/usually

More information

THE GUIDE TO SPSS. David Le

THE GUIDE TO SPSS. David Le THE GUIDE TO SPSS David Le June 2013 1 Table of Contents Introduction... 3 How to Use this Guide... 3 Frequency Reports... 4 Key Definitions... 4 Example 1: Frequency report using a categorical variable

More information

PubHlth Introduction to Biostatistics. 1. Summarizing Data Illustration: STATA version 10 or 11. A Visit to Yellowstone National Park, USA

PubHlth Introduction to Biostatistics. 1. Summarizing Data Illustration: STATA version 10 or 11. A Visit to Yellowstone National Park, USA PubHlth 540 - Introduction to Biostatistics 1. Summarizing Data Illustration: Stata (version 10 or 11) A Visit to Yellowstone National Park, USA Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook

More information

Trunkierte Regression: simulierte Daten

Trunkierte Regression: simulierte Daten Trunkierte Regression: simulierte Daten * Datengenerierung set seed 26091952 set obs 48 obs was 0, now 48 gen age=_n+17 gen yhat=2000+200*(age-18) gen wage = yhat + 2000*invnorm(uniform()) replace wage=max(0,wage)

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design

A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design Robert A. Vierkant, Terry M. Therneau, Jon L. Kosanke, James M. Naessens Mayo Clinic, Rochester, MN ABSTRACT A matched

More information

A Little Stata Session 1

A Little Stata Session 1 A Little Stata Session 1 Following is a very basic introduction to Stata. I highly recommend the tutorial available at: http://www.ats.ucla.edu/stat/stata/default.htm When you bring up Stata, you will

More information

Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups

Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 We saw that the

More information

CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware

CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis

More information

Logistic Regression Analysis

Logistic Regression Analysis Logistic Regression Analysis What is a Logistic Regression Analysis? Logistic Regression (LR) is a type of statistical analysis that can be performed on employer data. LR is used to examine the effects

More information

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria Multilevel Mixed-Effects Generalized Linear Models in aaaa Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria SUMMARY - Theoretical Fundamentals of Multilevel Models. - Estimation of Multilevel Mixed-Effects

More information

Getting Started With PROC LOGISTIC

Getting Started With PROC LOGISTIC Getting Started With PROC LOGISTIC Andrew H. Karp Sierra Information Services, Inc. 19229 Sonoma Hwy. PMB 264 Sonoma, California 95476 707 996 7380 SierraInfo@aol.com www.sierrainformation.com Getting

More information

RACE 616: Advance Analysis in Medical Research Jan 5 th Feb 7 th 2017

RACE 616: Advance Analysis in Medical Research Jan 5 th Feb 7 th 2017 RACE 616: Advance Analysis in Medical Research Jan 5 th Feb 7 th 2017 Ammarin Thakkinstian, Ph.D. Section for Clinical Epidemiology and Biostatistics (CEB) Email: ammarin.tha@mahidol.ac.th http://www.ceb-rama.org

More information

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney Name: Intro to Statistics for the Social Sciences Lab Session: Spring, 2015, Dr. Suzanne Delaney CID Number: _ Homework #22 You have been hired as a statistical consultant by Donald who is a used car dealer

More information

DAY 2 Advanced comparison of methods of measurements

DAY 2 Advanced comparison of methods of measurements EVALUATION AND COMPARISON OF METHODS OF MEASUREMENTS DAY Advanced comparison of methods of measurements Niels Trolle Andersen and Mogens Erlandsen mogens@biostat.au.dk Department of Biostatistics DAY xtmixed:

More information

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data Intro to Statistics for the Social Sciences Fall, 2017, Dr. Suzanne Delaney Extra Credit Assignment Instructions: You have been hired as a statistical consultant by Donald who is a used car dealer to help

More information

Applying Regression Analysis

Applying Regression Analysis Applying Regression Analysis Jean-Philippe Gauvin Université de Montréal January 7 2016 Goals for Today What is regression? How do we do it? First hour: OLS Bivariate regression Multiple regression Interactions

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON4137 Applied Micro Econometrics Date of exam: Thursday, May 31, 2018 Grades are given: June 15, 2018 Time for exam: 09.00 to 12.00 The problem set covers

More information

Biostatistics 208 Data Exploration

Biostatistics 208 Data Exploration Biostatistics 208 Data Exploration Dave Glidden Professor of Biostatistics Univ. of California, San Francisco January 8, 2008 http://www.biostat.ucsf.edu/biostat208 Organization Office hours by appointment

More information

Module 202 Statistical methods in epidemiology. Study guide and Reader

Module 202 Statistical methods in epidemiology. Study guide and Reader Epidemiology Module 202 Statistical methods in epidemiology Study guide and Reader Sept 2011 v4.0 These study materials for the distance learning Epidemiology course have been prepared by the London School

More information

Basic Statistics, Sampling Error, and Confidence Intervals

Basic Statistics, Sampling Error, and Confidence Intervals 02-Warner-45165.qxd 8/13/2007 5:00 PM Page 41 CHAPTER 2 Introduction to SPSS Basic Statistics, Sampling Error, and Confidence Intervals 2.1 Introduction We will begin by examining the distribution of scores

More information

Introduction to Categorical Data Analysis Procedures (Chapter)

Introduction to Categorical Data Analysis Procedures (Chapter) SAS/STAT 12.1 User s Guide Introduction to Categorical Data Analysis Procedures (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 12.1 User s Guide. The correct bibliographic

More information

Telecommunications Churn Analysis Using Cox Regression

Telecommunications Churn Analysis Using Cox Regression Telecommunications Churn Analysis Using Cox Regression Introduction As part of its efforts to increase customer loyalty and reduce churn, a telecommunications company is interested in modeling the "time

More information

SAS/STAT 14.1 User s Guide. Introduction to Categorical Data Analysis Procedures

SAS/STAT 14.1 User s Guide. Introduction to Categorical Data Analysis Procedures SAS/STAT 14.1 User s Guide Introduction to Categorical Data Analysis Procedures This document is an individual chapter from SAS/STAT 14.1 User s Guide. The correct bibliographic citation for this manual

More information

Module 7: Multilevel Models for Binary Responses. Practical. Introduction to the Bangladesh Demographic and Health Survey 2004 Dataset.

Module 7: Multilevel Models for Binary Responses. Practical. Introduction to the Bangladesh Demographic and Health Survey 2004 Dataset. Module 7: Multilevel Models for Binary Responses Most of the sections within this module have online quizzes for you to test your understanding. To find the quizzes: Pre-requisites Modules 1-6 Contents

More information

Timing Production Runs

Timing Production Runs Class 7 Categorical Factors with Two or More Levels 189 Timing Production Runs ProdTime.jmp An analysis has shown that the time required in minutes to complete a production run increases with the number

More information

Risk-adjustment procedures and graphical representations of outcome rates for institutional comparisons Jacopo Lenzi

Risk-adjustment procedures and graphical representations of outcome rates for institutional comparisons Jacopo Lenzi Risk-adjustment procedures and graphical representations of outcome rates for institutional comparisons Jacopo Lenzi Italian Stata Users Group Meeting November 15, 2018 Bologna Introduction An overriding

More information

This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation indicator.

This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation indicator. Analysis Example-Stata 11.0 sgmediation Command with Survey Data Correction March 25, 2011 This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation

More information

SUDAAN Analysis Example Replication C6

SUDAAN Analysis Example Replication C6 SUDAAN Analysis Example Replication C6 * Sudaan Analysis Examples Replication for ASDA 2nd Edition * Berglund April 2017 * Chapter 6 ; libname d "P:\ASDA 2\Data sets\nhanes 2011_2012\" ; ods graphics off

More information

The Multivariate Regression Model

The Multivariate Regression Model The Multivariate Regression Model Example Determinants of College GPA Sample of 4 Freshman Collect data on College GPA (4.0 scale) Look at importance of ACT Consider the following model CGPA ACT i 0 i

More information

Florida. Difference-in-Difference Models 8/23/2016

Florida. Difference-in-Difference Models 8/23/2016 Florida Difference-in-Difference Models Bill Evans Health Economics 8/25/1997, State of Florida settles out of court in their suits against tobacco manufacturers Awarded $13 billion over 25 years Use $200m

More information

SUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below.

SUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below. 450-2 Winter 2008 Problem Set #1: SUGGESTED SOLUTIONS The results are attached below. 1. The balanced panel contains larger firms (sales 120-130% bigger than the full sample on average), which are more

More information

CHAPTER FIVE CROSSTABS PROCEDURE

CHAPTER FIVE CROSSTABS PROCEDURE CHAPTER FIVE CROSSTABS PROCEDURE 5.0 Introduction This chapter focuses on how to compare groups when the outcome is categorical (nominal or ordinal) by using SPSS. The aim of the series of exercise is

More information

Mixed Mode Surveys in Business Research: A Natural Experiment. Dr Andrew Engeli March 14 th 2018

Mixed Mode Surveys in Business Research: A Natural Experiment. Dr Andrew Engeli March 14 th 2018 Mixed Mode Surveys in Business Research: A Natural Experiment Dr Andrew Engeli March 14 th 2018 Structure of todays presentation The general context The natural experiment Resources Conclusion Coverage

More information