MNLM for Nominal Outcomes

1 MNLM for Nominal Outcomes

Objectives
Introduce the multinomial logit model (MNLM) as an extension of the binary logit model (BLM)
Derive the model as a nonlinear probability model
Illustrate the difficulties in interpretation due to the large number of parameters and comparisons
Introduce graphical methods that make interpretation simpler

Nominal LHS \ 1

2 (Rethinking) the BLM

The BLM describes the relative probability of one outcome compared to a base or reference outcome.
For example, being in the labor force compared to being out of the labor force.

3 Think of the BLM as having two sets of β's

One set is associated with y=1 compared to y=0.
The other set is associated with y=0 compared to y=1.
Only J-1 sets are estimated; with J=2 outcomes, that is a single set.

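4 Binary Logit Model (new notation)

$$\ln\frac{\Pr(y=1\mid\mathbf{x})}{\Pr(y=0\mid\mathbf{x})} = \mathbf{x}\boldsymbol{\beta}$$

$$\ln\frac{\Pr(y=A\mid\mathbf{x})}{\Pr(y=B\mid\mathbf{x})} = \mathbf{x}\boldsymbol{\beta}_{A|B}$$

For a model with three independent variables:

$$\ln\frac{\Pr(y=A\mid\mathbf{x})}{\Pr(y=B\mid\mathbf{x})} = \beta_{0,A|B} + \beta_{1,A|B}x_1 + \beta_{2,A|B}x_2 + \beta_{3,A|B}x_3$$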

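5 The probability that y=1 (or A)

$$\Pr(y=1\mid\mathbf{x}) = \frac{\exp(\mathbf{x}\boldsymbol{\beta})}{1+\exp(\mathbf{x}\boldsymbol{\beta})} \qquad \Pr(y=A\mid\mathbf{x}) = \frac{\exp(\mathbf{x}\boldsymbol{\beta}_{A|B})}{1+\exp(\mathbf{x}\boldsymbol{\beta}_{A|B})}$$

The probability that y=0 (or B)

$$\Pr(y=0\mid\mathbf{x}) = \frac{1}{1+\exp(\mathbf{x}\boldsymbol{\beta})} \qquad \Pr(y=B\mid\mathbf{x}) = \frac{1}{1+\exp(\mathbf{x}\boldsymbol{\beta}_{A|B})}$$

Question (for you)
Which is the base (or reference) category?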

6 Three outcome categories

Consider y with categories L, S, and P:
L = Labor
S = Skilled
P = Professional

7 Think of this as three BLMs

The effect of Ed on the odds of L versus S:

$$\ln\frac{\Pr(L\mid Ed)}{\Pr(S\mid Ed)} = \beta_{0,L|S} + \beta_{1,L|S}\,Ed$$

For S versus P:

$$\ln\frac{\Pr(S\mid Ed)}{\Pr(P\mid Ed)} = \beta_{0,S|P} + \beta_{1,S|P}\,Ed$$

8 Question (for you)

What about the remaining comparison?

9 Redundancy

Using the property $\ln(a/b) = \ln(a) - \ln(b)$:

$$\begin{aligned}
\ln\frac{\Pr(L\mid Ed)}{\Pr(P\mid Ed)} &= \ln\Pr(L\mid Ed) - \ln\Pr(P\mid Ed) + 0 \\
&= \ln\Pr(L\mid Ed) - \ln\Pr(P\mid Ed) + \left[\ln\Pr(S\mid Ed) - \ln\Pr(S\mid Ed)\right] \\
&= \left[\ln\Pr(L\mid Ed) - \ln\Pr(S\mid Ed)\right] + \left[\ln\Pr(S\mid Ed) - \ln\Pr(P\mid Ed)\right] \\
&= \ln\frac{\Pr(L\mid Ed)}{\Pr(S\mid Ed)} + \ln\frac{\Pr(S\mid Ed)}{\Pr(P\mid Ed)}
\end{aligned}$$

10 Thus, if we add equations 1 and 2, we get 3:

$$\ln\frac{\Pr(L\mid Ed)}{\Pr(S\mid Ed)} + \ln\frac{\Pr(S\mid Ed)}{\Pr(P\mid Ed)} = \ln\frac{\Pr(L\mid Ed)}{\Pr(P\mid Ed)}$$

11 Logical Relationship

You can find a coefficient for any comparison from a pair of other coefficients:

$$\beta_{L|P} = \beta_{L|S} + \beta_{S|P}$$
$$\beta_{L|S} = \beta_{L|P} - \beta_{S|P}$$
$$\beta_{S|P} = \beta_{L|P} - \beta_{L|S}$$
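A quick numerical check of these identities can be sketched in Stata. The coefficient values below are hypothetical, chosen only for illustration; they are not estimates from any model in these notes, and the scalar names are made up for this sketch.

```stata
* Hypothetical education coefficients (illustration only, not estimates)
scalar b_LS = -0.40            // L versus S
scalar b_SP = -0.70            // S versus P

* The third contrast is implied by the other two
scalar b_LP = b_LS + b_SP      // L versus P
display "b(L|P) = " b_LP

* Each coefficient can be recovered from the other pair
assert reldif(b_LS, b_LP - b_SP) < 1e-12
assert reldif(b_SP, b_LP - b_LS) < 1e-12
```

Run as a do-file, the two assert lines pass silently when the identities hold.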

12 But

Why won't the results from separate BLMs match those from the MNLM exactly?

13 A Minimal Set of Coefficients

For J outcomes, there are J-1 comparisons.
Different software might compute different minimal sets.

Question (for you)
Is this a problem?

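14 The MNLM as a Probability Model

Let y have J nominal outcomes numbered 1 through J.
$\Pr(y_i = m \mid \mathbf{x}_i)$ is a function of $\mathbf{x}_i\boldsymbol{\beta}_m$.
Take the exponential, $\exp(\mathbf{x}_i\boldsymbol{\beta}_m)$, to ensure that the probabilities are non-negative.
Divide by $\sum_{j=1}^{J}\exp(\mathbf{x}_i\boldsymbol{\beta}_j)$ to make the probabilities sum to 1.

Which results in:

$$\Pr(y_i = m \mid \mathbf{x}_i) = \frac{\exp(\mathbf{x}_i\boldsymbol{\beta}_m)}{\sum_{j=1}^{J}\exp(\mathbf{x}_i\boldsymbol{\beta}_j)}$$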

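15 Identification

One of the β's is constrained to equal zero. For example,

$$\Pr(y_i = m \mid \mathbf{x}_i) = \frac{\exp(\mathbf{x}_i\boldsymbol{\beta}_m)}{\sum_{j=1}^{J}\exp(\mathbf{x}_i\boldsymbol{\beta}_j)} \quad\text{where } \boldsymbol{\beta}_1 = \mathbf{0}$$

Can be written as:

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1+\sum_{j=2}^{J}\exp(\mathbf{x}_i\boldsymbol{\beta}_j)}$$

$$\Pr(y_i = m \mid \mathbf{x}_i) = \frac{\exp(\mathbf{x}_i\boldsymbol{\beta}_m)}{1+\sum_{j=2}^{J}\exp(\mathbf{x}_i\boldsymbol{\beta}_j)} \quad\text{for } m > 1$$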
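The probability formulas can be illustrated with a small Stata sketch. The linear predictors below are hypothetical values for J = 3 outcomes, not estimates, and the scalar names are invented for this example. The last lines show why a constraint is needed: adding the same constant to every linear predictor leaves the probabilities unchanged.

```stata
* Hypothetical linear predictors x*b for J = 3 outcomes (illustration only)
scalar xb1 = 0        // outcome 1 is the base: beta_1 = 0, so x*beta_1 = 0
scalar xb2 = 0.5      // assumed value
scalar xb3 = -1.0     // assumed value

* Exponentiate, then divide by the sum so the probabilities add to 1
scalar denom = exp(xb1) + exp(xb2) + exp(xb3)
scalar p1 = exp(xb1)/denom
scalar p2 = exp(xb2)/denom
scalar p3 = exp(xb3)/denom
display "Pr(y=1) = " %6.4f p1 "  Pr(y=2) = " %6.4f p2 "  Pr(y=3) = " %6.4f p3
assert abs(p1 + p2 + p3 - 1) < 1e-12

* Adding the same constant to every x*b leaves the probabilities unchanged,
* which is why one set of coefficients must be constrained
scalar c = 2
scalar p2_shift = exp(xb2 + c)/(exp(xb1 + c) + exp(xb2 + c) + exp(xb3 + c))
assert abs(p2 - p2_shift) < 1e-12
```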

16 The Data

1982 General Social Survey
A sample of 337 currently employed men

17 Outcome

Respondents were asked to indicate their occupation.
These occupations were recoded to correspond to the categories used by Schmidt and Strauss (1975) in an early application of the MNLM.

Five occupation categories:
Menial jobs
Blue-collar jobs
Craft jobs
White-collar jobs
Professional jobs

18 Descriptive Information

. usecda cda_nomocc2
. codebook, compact

Variable   Label
occ        Occupation
white      Race: 1=white 0=nonwhite
ed         Years of education
exper      Years of work experience

[Obs, Unique, Mean, Min, and Max columns not preserved in the transcription.]

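19 . sum

[Output: Obs, Mean, Std. Dev., Min, and Max for occ, white, ed, and exper; values not preserved in the transcription.]

. tab occ

[Output: Freq., Percent, and Cum. for Menial, BlueCol, Craft, WhiteCol, Prof, and Total; values not preserved in the transcription.]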

20 Descriptive Table

Name    Description
OCC     Occupation: M1=menial; B2=blue collar; C3=craft; W4=white collar; P5=professional
WHITE   Race: 1=white; 0=another race
ED      Education: Number of years of formal education
EXP     Possible years of work experience: Age minus years of education minus 5

[Mean, StdDev, Min, and Max columns not preserved in the transcription.]

Note: N=337. OCC has marginal percentages 9, 21, 25, 12, and 33 for M1, B2, C3, W4, and P5, respectively.

21 Estimating the MNLM

. mlogit occ i.white ed exper, base(1) nolog

Multinomial logistic regression    Number of obs = 337

[Output: LR chi2(12), Prob > chi2, Log likelihood, and Pseudo R2, followed by Coef., Std. Err., z, P>|z|, and 95% Conf. Interval for 1.white, ed, exper, and _cons in the BlueCol and Craft equations. Menial is the base outcome. Values not preserved in the transcription.]

22 [Output continued: the same columns for 1.white, ed, exper, and _cons in the WhiteCol and Prof equations. Values not preserved in the transcription.]

23 Questions (for you)

What does the coefficient for ed under Prof represent?

Using this minimal set of coefficients, how would we calculate:
the effect of education on the log-odds of having a Professional occupation compared to a Blue Collar occupation?
the effect of white on the log-odds of having a White-Collar occupation compared to having a Professional occupation?
the effect of experience on the log-odds of having a Blue Collar occupation compared to having a Craft occupation?

Hint: $\beta_{P|W} = \beta_{P|M} - \beta_{W|M}$
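One way to check hand calculations like these is Stata's lincom, which forms linear combinations of the estimated coefficients and reports a standard error. The sketch below assumes the spost13 package is installed so that usecda can load the course data; the equation names (BlueCol, Craft, WhiteCol, Prof) follow the value labels shown in the output above.

```stata
* Assumes spost13 is installed so that usecda can load the course data
clear
usecda cda_nomocc2
quietly mlogit occ i.white ed exper, base(1) nolog

* Prof versus BlueCol for ed: b(P|M) - b(B|M)
lincom [Prof]ed - [BlueCol]ed

* WhiteCol versus Prof for white: b(W|M) - b(P|M)
lincom [WhiteCol]1.white - [Prof]1.white

* BlueCol versus Craft for exper: b(B|M) - b(C|M)
lincom [BlueCol]exper - [Craft]exper
```

Each lincom result should agree with the corresponding row of the listcoef output for the same pair of outcomes.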

24 Interpretation

Even a simple MNLM has a lot of parameters.
Too often, the MNLM is estimated, the parameters are listed, and statistical significance is noted, while the magnitudes and even directions of the effects are ignored.

We will consider:
Factor change in the odds (odds ratio)
Predicted probabilities

25 Factor change coefficients

For a model with three independent variables,

$$\Omega_{m|n}(\mathbf{x}, x_2) = e^{\beta_{0,m|n}}\, e^{\beta_{1,m|n}x_1}\, e^{\beta_{2,m|n}x_2}\, e^{\beta_{3,m|n}x_3}$$

A change of one unit in $x_2$ can be measured by the ratio of the odds:

$$\frac{\Omega_{m|n}(\mathbf{x}, x_2+1)}{\Omega_{m|n}(\mathbf{x}, x_2)} = \frac{e^{\beta_{0,m|n}}\, e^{\beta_{1,m|n}x_1}\, e^{\beta_{2,m|n}x_2}\, e^{\beta_{2,m|n}}\, e^{\beta_{3,m|n}x_3}}{e^{\beta_{0,m|n}}\, e^{\beta_{1,m|n}x_1}\, e^{\beta_{2,m|n}x_2}\, e^{\beta_{3,m|n}x_3}} = e^{\beta_{2,m|n}}$$

26 Interpretation

For a unit change in $x_k$, the odds of m versus n are expected to change by a factor of $\exp(\beta_{k,m|n})$, holding all other variables constant.

For a standard deviation change in $x_k$, the odds are expected to change by a factor of $\exp(\beta_{k,m|n}\, s_k)$, holding all other variables constant.
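As a rough illustration of these two interpretations, the Stata sketch below converts a logit coefficient into factor changes. The coefficient value is hypothetical; the standard deviation of ed (2.946) is the value reported by listcoef in these notes. The scalar names are invented for this example.

```stata
* Hypothetical coefficient for ed in the comparison of m versus n
scalar b  = 0.30      // assumed value, not an estimate
scalar sd = 2.946     // SD of ed as reported by listcoef in these notes

* Factor change in the odds for a unit and for a standard deviation change
display "Factor change, +1 unit: " %6.3f exp(b)
display "Factor change, +1 SD:   " %6.3f exp(b*sd)

* The same information expressed as a percent change in the odds
display "Percent change, +1 unit: " %6.1f 100*(exp(b) - 1)
```

A factor above 1 means the odds of m versus n increase; a factor below 1 means they decrease.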

27 Computing factor change

. listcoef, help

mlogit (N=337): Factor change in the odds of occ

Variable: 1.white (sd=0.276)

[Output: b, z, P>|z|, e^b, and e^bstdx for each of the 20 ordered pairs of outcomes, from Menial vs BlueCol through Prof vs WhiteCol; values not preserved in the transcription.]

28 Variable: ed (sd=2.946)

[Output: the same columns for the same 20 ordered pairs of outcomes; values not preserved in the transcription.]

29 Variable: exper (sd=13.959)

[Output: the same columns for the same 20 ordered pairs of outcomes; values not preserved in the transcription.]

b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bstdx = exp(b*SD of X) = change in odds for SD increase in X

30 Question (for you)

Were your coefficient calculations correct?

For any pair of contrasts (e.g., $\beta_{k,m|n}$ and $\beta_{k,n|m}$):
How are the b coefficients related?
How are the e^b coefficients related?

31 Computing all contrasts at a given p value and for one x

. listcoef white, pval(.05)

mlogit (N=337): Factor Change in the Odds of occ when P>|z| < 0.05

Variable: white
Odds comparing Alternative 1 to Alternative 2

[Output: b, z, P>|z|, e^b, and e^bstdx for the four contrasts listed: Menial-Prof, Craft-Prof, Prof-Menial, and Prof-Craft; values not preserved in the transcription.]

Question (for you)
Is the variable white a significant predictor of occupational class?

32 Predicted probabilities

As before, predict, mtable, mchange, mgen, and margins + mlincom can be used.

33 Something new: a discrete change plot

The steps:
Run mlogit, for example: mlogit occ exper ed i.white
Run mchange, for example: mchange, atmeans

34 . quietly mlogit occ i.white ed exper, base(1)
. mchange

mlogit: Changes in Pr(y)    Number of obs = 337
Expression: Pr(occ), predict(outcome())

[Output: changes in Pr(y) with p-values for Menial, BlueCol, Craft, WhiteCol, and Prof. Rows: white (1 vs 0); ed (+1 cntr, +SD cntr, Marginal); exper (+1 cntr, +SD cntr, Marginal). Followed by the average predictions, Pr(y|base), for each outcome. Values not preserved in the transcription.]

35 Discrete Change Plot

[Figure: discrete change plot. Rows: white (1 vs 0), ed (SD change), exper (SD change). Letters mark the change for each outcome. x-axis: Marginal Effect on Outcome Probability. Note: Job: 1=Menial 2=BlCol 3=Craft 4=WhCol 5=Prof.]

36 . mchangeplot 1.white ed exper, ///
>     note(job: M=Menial B=BlCol C=Craft W=WhCol P=Prof)

37 Adding a confidence interval (CI) to the discrete change (DC)

. margins, at(white=(0 1)) atmeans post

Adjusted predictions    Number of obs = 337
Model VCE : OIM
Expression : Pr(occ==Menial), predict()
1._at : white = 0, ed = (mean), exper = (mean)
2._at : white = 1, ed = (mean), exper = (mean)

[Output: Margin, Delta-method Std. Err., z, P>|z|, and 95% Conf. Interval for each _at; values not preserved in the transcription.]

. mlincom (2-1)

[Output: lincom, pvalue, ll, and ul; values not preserved in the transcription.]

38 Getting the Odds Ratio out of the doghouse

Discrete change does not indicate the dynamics among the dependent outcomes.
For example, a decrease in education increases the probability of both blue-collar and craft jobs. But how does it affect the odds of a person choosing a craft job relative to a blue-collar job?
To answer this question, consider the factor change in the odds.


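40 [Table: for x1 through x4, the columns $\beta_{B|A}$, $\exp(\beta_{B|A})$, and p-value; values not preserved in the transcription.]

[Figure: odds ratio plot with one row per variable (x1 through x4), each plotting the letters A and B. Top axis: Factor Change Scale Relative to Category A. Bottom axis: Logit Coefficient Scale Relative to Category A.]

41 [Figure: the same odds ratio plot for x1 through x4. Top axis: Factor Change Scale Relative to Category A. Bottom axis: Logit Coefficient Scale Relative to Category A.]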

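42 Consider a hypothetical model with three outcomes:

Logit coefficients for x1, x2, and x3, by comparison:

Comparison   Reported for each variable
B|A          $\beta_{B|A}$, $\exp(\beta_{B|A})$, p
C|A          $\beta_{C|A}$, $\exp(\beta_{C|A})$, p
C|B          $\beta_{C|B}$, $\exp(\beta_{C|B})$, p

[Numeric values not preserved in the transcription.]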


45 . listcoef, help

mlogit (N=337): Factor Change in the Odds of occ

Variable: white
Odds comparing Alternative 1 to Alternative 2

[Output: b, z, P>|z|, e^b, and e^bstdx for each of the 20 ordered pairs of outcomes, from BlueCol-Craft through Menial-Prof; values not preserved in the transcription.]

46 Variable: ed
Odds comparing Alternative 1 to Alternative 2

[Output: the same columns for the same 20 ordered pairs of outcomes; values not preserved in the transcription.]

47 Variable: exper
Odds comparing Alternative 1 to Alternative 2

[Output: the same columns for the same 20 ordered pairs of outcomes; values not preserved in the transcription.]

48 Odds Ratio Plot

49 What do you see?

Question (for you)
Note the different ordering of categories for the different variables.
Would the ordinal logit model (OLM) allow for this different ordering? Why or why not?

50 Why predicted probabilities remain important

While the factor change in the odds is constant across the levels of all variables, the discrete changes get larger or smaller at different values of the variables.
E.g., if the odds increase by a factor of ten but the current odds are 1 in 10,000, then the substantive impact is small.
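The arithmetic behind this example can be sketched in Stata. The sketch reads "1 in 10,000" as odds of 1/10,000, which is an assumption about the intended meaning, and the scalar names are invented for illustration.

```stata
* Odds before and after a factor change of ten
scalar odds0 = 1/10000          // current odds, read as 1/10,000 (assumption)
scalar odds1 = 10*odds0         // odds after increasing by a factor of ten

* Convert odds to probabilities: p = odds/(1 + odds)
scalar p0 = odds0/(1 + odds0)
scalar p1 = odds1/(1 + odds1)
display "Pr before       = " %9.6f p0
display "Pr after        = " %9.6f p1
display "Discrete change = " %9.6f p1 - p0

* The probability moves by less than one tenth of a percentage point
assert p1 - p0 < 0.001
```

The odds ratio of ten is large, yet the change in the probability is tiny because the starting probability is so close to zero.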

51 Putting it all together

Incorporate information about the discrete change in the probability by making the height of each letter in the odds ratio plot proportional to the square root of the DC.

[Figure: odds ratio plot with discrete changes. Rows: white (1 vs 0), ed (SD increase), exper (SD increase). Top axis: Odds Ratio Scale Relative to Category Prof. Bottom axis: Logit Coefficient Scale Relative to Category Prof. Note: Job: M=Menial B=BlColl C=Craft W=WhColl P=Prof.]

52 Stata Code

. //OR plot
. mlogitplot, amount(sd) symbols(m B C W P) mcolor(rainbow) ///
>     note(job: M=Menial B=BlColl C=Craft W=WhColl P=Prof) ///
>     min(-3) max(0.5) gap(.5)
. graph export mnlm-02-orplot.emf, replace

. //DC and OR combined
. mlogitplot, amount(sd) symbols(m B C W P) mcolor(rainbow) ///
>     note(job: M=Menial B=BlColl C=Craft W=WhColl P=Prof) ///
>     min(-3) max(0.5) gap(.5) meffect
. graph export mnlm-03-dcorplot.emf, replace

53 Testing that a Variable Has No Effect

The hypothesis that $x_k$ does not affect the dependent variable can be written as:

$$H_0\colon\ \beta_{k,B|M} = \beta_{k,C|M} = \beta_{k,W|M} = \beta_{k,P|M} = 0$$

54 LR test using lrtest

. quietly mlogit occ i.white ed exper, base(1) nolog
. estimates store base
. quietly mlogit occ i.white exper, base(1) nolog
. estimates store noed
. lrtest base noed

Likelihood-ratio test
(Assumption: noed nested in base)
[LR chi2(4) and Prob > chi2 values not preserved in the transcription.]

Wald test using test

. quietly mlogit occ i.white ed exper, base(1)
. test ed

( 1) [Menial]o.ed = 0
( 2) [BlueCol]ed = 0
( 3) [Craft]ed = 0
( 4) [WhiteCol]ed = 0
( 5) [Prof]ed = 0
Constraint 1 dropped

[chi2( 4) and Prob > chi2 values not preserved in the transcription.]
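The LR statistic reported by lrtest is twice the difference between the two log likelihoods, with degrees of freedom equal to the number of constrained coefficients (here the four ed coefficients). A minimal sketch of the same calculation by hand follows; it assumes spost13 is installed so that usecda can load the course data, and the scalar names are invented for illustration.

```stata
* Assumes spost13 is installed so that usecda can load the course data
clear
usecda cda_nomocc2

* Log likelihood of the full model
quietly mlogit occ i.white ed exper, base(1) nolog
scalar ll_full = e(ll)

* Log likelihood of the model that drops ed
quietly mlogit occ i.white exper, base(1) nolog
scalar ll_noed = e(ll)

* LR statistic and its p-value with 4 degrees of freedom
scalar lr = 2*(ll_full - ll_noed)
display "LR chi2(4) = " %8.3f lr "   Prob > chi2 = " %6.4f chi2tail(4, lr)
```

The displayed values should agree with the lrtest output for the same pair of models.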

55 Either, using mlogtest

. quietly mlogit occ white ed exper, base(1)
. mlogtest ed, lr wald

**** Likelihood-ratio tests for independent variables (N=337)
Ho: All coefficients associated with given variable(s) are 0.
[chi2, df, and P>chi2 for ed; values not preserved in the transcription.]

**** Wald tests for independent variables (N=337)
Ho: All coefficients associated with given variable(s) are 0.
[chi2, df, and P>chi2 for ed; values not preserved in the transcription.]

56 Testing that outcome categories can be combined

The hypothesis that P and W are indistinguishable is

$$H_0\colon\ \beta_{1,P|W} = \beta_{2,P|W} = \beta_{3,P|W} = 0$$

57 A Wald test using mlogtest

. mlogtest, combine

**** Wald tests for combining outcome categories
Ho: All coefficients except intercepts associated with given pair of outcomes are 0 (i.e., categories can be collapsed).

[Output: chi2, df, and P>chi2 for each of the 10 pairs of categories, from Menial-BlueCol through WhiteCol-Prof; values not preserved in the transcription.]

58 An LR test using mlogtest

. mlogtest, lrcom

**** LR tests for combining outcome categories
Ho: All coefficients except intercepts associated with given pair of outcomes are 0 (i.e., categories can be collapsed).

[Output: chi2, df, and P>chi2 for each of the 10 pairs of categories, from Menial-BlueCol through WhiteCol-Prof; values not preserved in the transcription.]

59 Question (for you)

Do you notice any logical inconsistencies?

60 Specification Searches

Given the complexities in interpreting the MNLM, it is tempting to search for a more parsimonious model constructed by excluding variables or combining outcome categories.
Tests for combining categories, and tests that all coefficients for a variable are zero, can guide a specification search, but great care is required to avoid over-fitting or misfitting the model.

61 Independence of Irrelevant Alternatives

For a model with outcome categories M, N, and L:
In the MNLM, the odds of M compared to N do not depend on L.

$$\frac{\Omega_{m|n}(\mathbf{x}, x_2+1)}{\Omega_{m|n}(\mathbf{x}, x_2)} = \frac{e^{\beta_{0,m|n}}\, e^{\beta_{1,m|n}x_1}\, e^{\beta_{2,m|n}x_2}\, e^{\beta_{2,m|n}}\, e^{\beta_{3,m|n}x_3}}{e^{\beta_{0,m|n}}\, e^{\beta_{1,m|n}x_1}\, e^{\beta_{2,m|n}x_2}\, e^{\beta_{3,m|n}x_3}} = e^{\beta_{2,m|n}}$$

In other words, outcome L is irrelevant to the comparison of M to N.
This property is called the independence of irrelevant alternatives (IIA).
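The reason the third outcome drops out can be seen by dividing the two MNLM probabilities, using the probability formula given earlier in these notes. This is a short worked step rather than part of the original slides:

```latex
\frac{\Pr(y = m \mid \mathbf{x})}{\Pr(y = n \mid \mathbf{x})}
  = \frac{\exp(\mathbf{x}\boldsymbol{\beta}_m) \big/ \sum_{j=1}^{J}\exp(\mathbf{x}\boldsymbol{\beta}_j)}
         {\exp(\mathbf{x}\boldsymbol{\beta}_n) \big/ \sum_{j=1}^{J}\exp(\mathbf{x}\boldsymbol{\beta}_j)}
  = \exp\!\left(\mathbf{x}\left[\boldsymbol{\beta}_m - \boldsymbol{\beta}_n\right]\right)
```

The common denominator, which is the only place the other outcomes enter, cancels, so the odds of m versus n involve only the coefficients for m and n.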

62 McFadden's classic example of IIA

A person has two choices: $\Pr(\text{car}) = 1/2$ and $\Pr(\text{red bus}) = 1/2$.

Odds of taking the car versus the red bus are

$$\frac{\Pr(\text{car})}{\Pr(\text{red bus})} = \frac{1/2}{1/2} = 1$$

63 A new bus company opens with identical service to the red bus. IIA requires:

$\Pr(\text{car}) = 1/3$; $\Pr(\text{red bus}) = 1/3$; $\Pr(\text{blue bus}) = 1/3$

So that the original odds can be maintained:

$$\frac{\Pr(\text{car})}{\Pr(\text{red bus})} = \frac{1/3}{1/3} = 1$$

64 But what makes sense is:

$\Pr(\text{car}) = 1/2$; $\Pr(\text{red bus}) = 1/4$; $\Pr(\text{blue bus}) = 1/4$

But this violates IIA:

$$\frac{\Pr(\text{car})}{\Pr(\text{red bus})} = \frac{1/2}{1/4} = 2$$

65 This implies that the MNLM should only be used in cases where the outcome categories can plausibly be assumed to be distinct.
Red bus and blue bus can be viewed as "perfect substitutes."
Care in specifying the model to involve distinct outcomes that are not substitutes for one another seems to be reasonable advice.
But many reviewers like to see formal tests of IIA.

66 Formal tests of IIA

Hausman-type test: a comparison of two estimators of the same parameter.
One estimator is consistent and efficient if the null hypothesis is true.
The second estimator is consistent but inefficient.

Question (for you)
What would be a consistent but inefficient estimator?

67 . set seed 112
. mlogtest, iia

**** Hausman tests of IIA assumption (N=337)
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted    evidence
BlueCol    for Ho
Craft
WhiteCol
Prof

Note: If chi2<0, the estimated model does not meet asymptotic assumptions of the test.

**** suest-based Hausman tests of IIA assumption (N=337)
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted    evidence
BlueCol    for Ho
Craft      against Ho
WhiteCol   for Ho
Prof       for Ho

[chi2, df, and P>chi2 columns not preserved in the transcription.]

68 **** Small-Hsiao tests of IIA assumption (N=337)
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted    evidence
BlueCol    against Ho
Craft      against Ho
WhiteCol   for Ho
Prof       against Ho

[lnL(full), lnL(omit), chi2, df, and P>chi2 columns not preserved in the transcription.]

69 . set seed [value not preserved]
. mlogtest, iia

**** Hausman tests of IIA assumption (N=337)
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted    evidence
BlueCol    for Ho
Craft
WhiteCol
Prof

Note: If chi2<0, the estimated model does not meet asymptotic assumptions of the test.

**** suest-based Hausman tests of IIA assumption (N=337)
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted    evidence
BlueCol    for Ho
Craft      against Ho
WhiteCol   for Ho
Prof       for Ho

[chi2, df, and P>chi2 columns not preserved in the transcription.]

70 **** Small-Hsiao tests of IIA assumption (N=337)
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

Omitted    evidence
BlueCol    for Ho
Craft      for Ho
WhiteCol   for Ho
Prof       for Ho

[lnL(full), lnL(omit), chi2, df, and P>chi2 columns not preserved in the transcription.]

71 Case-specific vs. alternative-specific models

Sometimes we want to model nominal outcomes as a function of decision-maker characteristics (e.g., education, experience, age as predictors of occupational class). These predictors are referred to as case-specific.

Other times we want to model nominal outcomes as a function of alternative-specific characteristics (e.g., income, hours worked, number of years required for each occupational class). These predictors are referred to as alternative-specific.

72 Conditional logit model

The conditional logit model (CLM) uses alternative-specific data to model multiple nominal categories as alternatives to one another.
Transportation alternatives: car, bus, bike, taxi

Alternative-specific variables measure aspects of each different alternative.
How long does it take to get to class with each alternative?
How much does it cost to get to class with each alternative?

The CLM estimates a single parameter for each variable that translates the value (or cost) of that alternative into a probability of choosing that alternative.

73 CLM example

Continuing the transportation example, imagine that we had three different transportation options. For one independent variable (time to class) and four observations, our data would look like this:

[Data table not preserved in the transcription.]
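A layout of this kind, with one row per case-alternative pair, can be sketched in Stata. Every value below is invented for illustration and is not the table from the original slide; the variable names follow the asclogit example later in these notes (id, mode, time, choice), and nchosen is a helper name made up for this sketch.

```stata
* Hypothetical data: 4 cases x 3 alternatives, all values invented for illustration
clear
input id str5 mode time choice
1 "Train" 20 0
1 "Bus"   35 0
1 "Car"   15 1
2 "Train" 25 1
2 "Bus"   40 0
2 "Car"   30 0
3 "Train" 30 0
3 "Bus"   20 1
3 "Car"   45 0
4 "Train" 15 0
4 "Bus"   25 0
4 "Car"   10 1
end
list, sepby(id) noobs

* Each case must have exactly one chosen alternative
bysort id: egen nchosen = total(choice)
assert nchosen == 1
```

Note that time varies across the three rows of a single case, which is what makes it an alternative-specific variable.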

74 MNLM vs. CLM

In the MNLM, coefficients for a variable differ for each outcome. Values for a variable are the same across outcomes for a given observation (e.g., we have only one measure for each observation).

In the CLM, coefficients for a variable are the same for each outcome. Values for a variable differ for each outcome within the same observation.
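The contrast can be written side by side. The MNLM line repeats the probability formula from earlier in these notes; the CLM line uses the symbols $\mathbf{z}_{im}$ (the values of the alternative-specific variables for observation i and alternative m) and $\boldsymbol{\gamma}$ (the single coefficient vector), which are notation assumed here for illustration rather than taken from the slides:

```latex
\text{MNLM:}\quad
\Pr(y_i = m \mid \mathbf{x}_i)
  = \frac{\exp(\mathbf{x}_i\boldsymbol{\beta}_m)}{\sum_{j=1}^{J}\exp(\mathbf{x}_i\boldsymbol{\beta}_j)}
\qquad
\text{CLM:}\quad
\Pr(y_i = m \mid \mathbf{z}_i)
  = \frac{\exp(\mathbf{z}_{im}\boldsymbol{\gamma})}{\sum_{j=1}^{J}\exp(\mathbf{z}_{ij}\boldsymbol{\gamma})}
```

In the MNLM the outcome subscript sits on the coefficients; in the CLM it sits on the data.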

75 The transportation example: CLM vs. MNLM

MNLM
The effect of time differs for each mode of transport.
The amount of time is the same for each mode of transport.

CLM
The effect of time is the same for each mode of transport.
The amount of time differs for each mode of transport.

76 . usecda cda_travel4
. asclogit choice time, alt(mode) case(id) nolog

Alternative-specific conditional logit    Number of obs = 456
Case variable: id                         Number of cases = 152
Alternative variable: mode                Alts per case: min = 3, avg = 3.0, max = 3

[Output: Wald chi2(1), Prob > chi2, and Log likelihood, followed by Coef., Std. Err., z, P>|z|, and 95% Conf. Interval for time (mode equation) and for _cons in the Bus and Car equations. Train is the base alternative. Values not preserved in the transcription.]

77 Another example: Occupational attainment

MNLM
Race, education, and experience affect the odds that individuals have different occupations.
For a given individual, the values of the regressors are the same for all outcomes (i.e., the value of race doesn't vary depending on which occupation we are examining).

CLM
The regressors are the costs and benefits of each occupation.
For each observation, the present value of full-time employment is computed for each occupation.
The effect of the present value is the same across all occupations, but the present value of holding that occupation differs by occupation.
o For example, the present value of a professional occupation will exceed the present value of a menial occupation, thus making a professional occupation more likely, all else being equal.

78 The CLM and MNLM reflect different aspects of the processes by which individuals choose or attain occupations, or choose modes of transportation.
Selection of the appropriate model should be driven by specification of the process you are interested in modeling.

79 Also, note that both the CLM and the MNLM require IIA as a fundamental assumption (i.e., that the odds of being in any one category, or choosing any one outcome, relative to another do not depend on the other outcomes).

80 Other models for nominal outcomes

Multinomial probit:
Can allow for alternative- and case-specific predictors in the same model; the IIA assumption can be relaxed.
Requires modeling of the variance structure (e.g., unstructured, exchangeable, etc.).
Requires simulated ML estimates (or MCMC).
o In Stata: mprobit; asmprobit
o In R: MNP package (mlogit perhaps?)

81 Nested logit:
Allows for a nested structure of outcomes (i.e., allows the errors of certain outcomes to be correlated conditional on group membership).
Can include alternative- and case-specific predictors; the IIA assumption is relaxed.
Uses full ML.
o In Stata: nlogit
o In R: mlogit; MNP packages

82 End MNLM


Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 We saw that the

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

Week 10: Heteroskedasticity

Week 10: Heteroskedasticity Week 10: Heteroskedasticity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline The problem of (conditional)

More information

Categorical Data Analysis for Social Scientists

Categorical Data Analysis for Social Scientists Categorical Data Analysis for Social Scientists Brendan Halpin, Sociological Research Methods Cluster, Dept of Sociology, University of Limerick June 20-21 2016 Outline 1 Introduction 2 Logistic regression

More information

Longitudinal Data Analysis, p.12

Longitudinal Data Analysis, p.12 Biostatistics 140624 2011 EXAM STATA LOG ( NEEDED TO ANSWER EXAM QUESTIONS) Multiple Linear Regression, p2 Longitudinal Data Analysis, p12 Multiple Logistic Regression, p20 Ordered Logistic Regression,

More information

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis In any longitudinal analysis, we can distinguish between analyzing trends vs individual change that is, model

More information

This is a quick-and-dirty example for some syntax and output from pscore and psmatch2.

This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. It is critical that when you run your own analyses, you generate your own syntax. Both of these procedures have very

More information

Biostatistics 208 Data Exploration

Biostatistics 208 Data Exploration Biostatistics 208 Data Exploration Dave Glidden Professor of Biostatistics Univ. of California, San Francisco January 8, 2008 http://www.biostat.ucsf.edu/biostat208 Organization Office hours by appointment

More information

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors. Analyzing Complex Survey Data: Some key issues to be aware of Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 20, 2018 Be sure to read the Stata Manual s

More information

SUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below.

SUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below. 450-2 Winter 2008 Problem Set #1: SUGGESTED SOLUTIONS The results are attached below. 1. The balanced panel contains larger firms (sales 120-130% bigger than the full sample on average), which are more

More information

Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).)

Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) There are two data sets; each as the same treatment group of 185 men. JTRAIN2 includes 260

More information

You can find the consultant s raw data here:

You can find the consultant s raw data here: Problem Set 1 Econ 475 Spring 2014 Arik Levinson, Georgetown University 1 [Travel Cost] A US city with a vibrant tourist industry has an industrial accident (a spill ) The mayor wants to sue the company

More information

Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian. Preliminary Data Screening

Sociology 7704: Regression Models for Categorical Data Instructor: Natasha Sarkisian. Preliminary Data Screening r's age when 1st child born 2 4 6 Density.2.4.6.8 Density.5.1 Sociology 774: Regression Models for Categorical Data Instructor: Natasha Sarkisian Preliminary Data Screening A. Examining Univariate Normality

More information

Trunkierte Regression: simulierte Daten

Trunkierte Regression: simulierte Daten Trunkierte Regression: simulierte Daten * Datengenerierung set seed 26091952 set obs 48 obs was 0, now 48 gen age=_n+17 gen yhat=2000+200*(age-18) gen wage = yhat + 2000*invnorm(uniform()) replace wage=max(0,wage)

More information

Analyzing CHIS Data Using Stata

Analyzing CHIS Data Using Stata Analyzing CHIS Data Using Stata Christine Wells UCLA IDRE Statistical Consulting Group February 2014 Christine Wells Analyzing CHIS Data Using Stata 1/ 34 The variables bmi p: BMI povll2: Poverty level

More information

All analysis examples presented can be done in Stata 10.1 and are included in this chapter s output.

All analysis examples presented can be done in Stata 10.1 and are included in this chapter s output. Chapter 9 Stata v10.1 Analysis Examples Syntax and Output General Notes on Stata 10.1 Given that this tool is used throughout the ASDA textbook this chapter includes only the syntax and output for the

More information

Correlated Random Effects Panel Data Models

Correlated Random Effects Panel Data Models NONLINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Why Nonlinear Models? 2. CRE versus

More information

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use "k:mydirectory,

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use k:mydirectory, Table XTMIXED Procedure in STATA with Output Systolic Blood Pressure, 2001. use "k:mydirectory,. xtmixed sbp nage20 nage30 nage40 nage50 nage70 nage80 nage90 winter male dept2 edu_bachelor median_household_income

More information

Stata v 12 Illustration. One Way Analysis of Variance

Stata v 12 Illustration. One Way Analysis of Variance Stata v 12 Illustration Page 1. Preliminary Download anovaplot.. 2. Descriptives Graphs. 3. Descriptives Numerical 4. Assessment of Normality.. 5. Analysis of Variance Model Estimation.. 6. Tests of Equality

More information

Applied Econometrics

Applied Econometrics Applied Econometrics Lecture 3 Nathaniel Higgins ERS and JHU 20 September 2010 Outline of today s lecture Schedule and Due Dates Making OLS make sense Uncorrelated X s Correlated X s Omitted variable bias

More information

Lecture 2a: Model building I

Lecture 2a: Model building I Epidemiology/Biostats VHM 812/802 Course Winter 2015, Atlantic Veterinary College, PEI Javier Sanchez Lecture 2a: Model building I Index Page Predictors (X variables)...2 Categorical predictors...2 Indicator

More information

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found

More information

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria Multilevel Mixed-Effects Generalized Linear Models in aaaa Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria SUMMARY - Theoretical Fundamentals of Multilevel Models. - Estimation of Multilevel Mixed-Effects

More information

Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa

Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa Presented by Kodjo Kondo PhD Candidate, UNE Business School Supervisors Emeritus Prof. Euan

More information

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models Statistical Modelling for Social Scientists Manchester University January 20, 21 and 24, 2011 Graeme Hutcheson, University of Manchester Modelling categorical variables using logit models Software commands

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Univariate Statistics Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved Table of Contents PAGE Creating a Data File...3 1. Creating

More information

Producer Preferences and Characteristics in Biomass Supply Chains. Ira J. Altman Southern Illinois University-Carbondale

Producer Preferences and Characteristics in Biomass Supply Chains. Ira J. Altman Southern Illinois University-Carbondale Producer Preferences and Characteristics in Biomass Supply Chains Ira J. Altman Southern Illinois University-Carbondale Tom G. Johnson University of Missouri-Columbia Wanki Moon Southern Illinois University-Carbondale

More information

Survival analysis. Solutions to exercises

Survival analysis. Solutions to exercises Survival analysis Solutions to exercises Paul W. Dickman Summer School on Modern Methods in Biostatistics and Epidemiology Cison di Valmarino, Treviso, Italy June/July 2010 Exercise solutions 1 (a) The

More information

Midterm Exam. Friday the 29th of October, 2010

Midterm Exam. Friday the 29th of October, 2010 Midterm Exam Friday the 29th of October, 2010 Name: General Comments: This exam is closed book. However, you may use two pages, front and back, of notes and formulas. Write your answers on the exam sheets.

More information

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway. Statistical Modelling for Business and Management J.E. Cairnes School of Business & Economics National University of Ireland Galway June 28 30, 2010 Graeme Hutcheson, University of Manchester Luiz Moutinho,

More information

Factor Analysis and Structural Equation Modeling: Exploratory and Confirmatory Factor Analysis

Factor Analysis and Structural Equation Modeling: Exploratory and Confirmatory Factor Analysis Factor Analysis and Structural Equation Modeling: Exploratory and Confirmatory Factor Analysis Hun Myoung Park International University of Japan 1. Glance at an Example Suppose you have a mental model

More information

Milk Data Analysis. 1. Objective: analyzing protein milk data using STATA.

Milk Data Analysis. 1. Objective: analyzing protein milk data using STATA. 1. Objective: analyzing protein milk data using STATA. 2. Dataset: Protein milk data set (in the class website) Data description: Percentage protein content of milk samples at weekly intervals from each

More information

for var trstprl trstlgl trstplc trstplt trstep: reg X trust10 stfeco yrbrn hinctnt edulvl pltcare polint wrkprty

for var trstprl trstlgl trstplc trstplt trstep: reg X trust10 stfeco yrbrn hinctnt edulvl pltcare polint wrkprty for var trstprl trstlgl trstplc trstplt trstep: reg X trust10 stfeco yrbrn hinctnt edulvl pltcare polint wrkprty -> reg trstprl trust10 stfeco yrbrn hinctnt edulvl pltcare polint wrkprty Source SS df MS

More information

*STATA.OUTPUT -- Chapter 13

*STATA.OUTPUT -- Chapter 13 *STATA.OUTPUT -- Chapter 13.*small example of rank sum test.input x grp x grp 1. 4 1 2. 35 1 3. 21 1 4. 28 1 5. 66 1 6. 10 2 7. 42 2 8. 71 2 9. 77 2 10. 90 2 11. end.ranksum x, by(grp) porder Two-sample

More information

Notes on PS2

Notes on PS2 17.871 - Notes on PS2 Mike Sances MIT April 2, 2012 Mike Sances (MIT) 17.871 - Notes on PS2 April 2, 2012 1 / 9 Interpreting Regression: Coecient regress success_rate dist Source SS df MS Number of obs

More information

Never Smokers Exposure Case Control Yes No

Never Smokers Exposure Case Control Yes No Question 0.4 Never Smokers Exosure Case Control Yes 33 7 50 No 86 4 597 29 428 647 OR^ Never Smokers (33)(4)/(7)(86) 4.29 Past or Present Smokers Exosure Case Control Yes 7 4 2 No 52 3 65 69 7 86 OR^ Smokers

More information

DAY 2 Advanced comparison of methods of measurements

DAY 2 Advanced comparison of methods of measurements EVALUATION AND COMPARISON OF METHODS OF MEASUREMENTS DAY Advanced comparison of methods of measurements Niels Trolle Andersen and Mogens Erlandsen mogens@biostat.au.dk Department of Biostatistics DAY xtmixed:

More information

PSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508

PSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508 PSC 508 Jim Battista Univ. at Buffalo, SUNY Dummies Dummy variables Sometimes we want to include categorical variables in our models Numerical variables that don t necessarily have any inherent order and

More information

********************************************************************************************** *******************************

********************************************************************************************** ******************************* 1 /* Workshop of impact evaluation MEASURE Evaluation-INSP, 2015*/ ********************************************************************************************** ******************************* DEMO: Propensity

More information

Timing Production Runs

Timing Production Runs Class 7 Categorical Factors with Two or More Levels 189 Timing Production Runs ProdTime.jmp An analysis has shown that the time required in minutes to complete a production run increases with the number

More information

Chapter 2 Part 1B. Measures of Location. September 4, 2008

Chapter 2 Part 1B. Measures of Location. September 4, 2008 Chapter 2 Part 1B Measures of Location September 4, 2008 Class will meet in the Auditorium except for Tuesday, October 21 when we meet in 102a. Skill set you should have by the time we complete Chapter

More information

Checking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap

Checking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap Checking the model Linearity Normality Constant variance Influential points Covariate overlap 1 Checking the model: linearity Average value of outcome initially assumed to be linear function of continuous

More information

The Potential Determinants of German Firms Technical Efficiency: An Application Using Industry Level Data

The Potential Determinants of German Firms Technical Efficiency: An Application Using Industry Level Data The Potential Determinants of German Firms Technical Efficiency: An Application Using Industry Level Data by Oleg Badunenko and Andreas Stephan March, 2004 Abstract Stochastic Frontier Analysis is employed

More information

Introduction of STATA

Introduction of STATA Introduction of STATA News: There is an introductory course on STATA offered by CIS Description: Intro to STATA On Tue, Feb 13th from 4:00pm to 5:30pm in CIT 269 Seats left: 4 Windows, 7 Macintosh For

More information

Week 11: Collinearity

Week 11: Collinearity Week 11: Collinearity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Regression and holding other

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

Nested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods

Nested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods Multilevel Cross-Classified and Multi-Membership Models Don Hedeker Division of Epidemiology & Biostatistics Institute for Health Research and Policy School of Public Health University of Illinois at Chicago

More information

. *increase the memory or there will problems. set memory 40m (40960k)

. *increase the memory or there will problems. set memory 40m (40960k) Exploratory Data Analysis on the Correlation Structure In longitudinal data analysis (and multi-level data analysis) we model two key components of the data: 1. Mean structure. Correlation structure (after

More information

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney Name: Intro to Statistics for the Social Sciences Lab Session: Spring, 2015, Dr. Suzanne Delaney CID Number: _ Homework #22 You have been hired as a statistical consultant by Donald who is a used car dealer

More information

rat cortex data: all 5 experiments Friday, June 15, :04:07 AM 1

rat cortex data: all 5 experiments Friday, June 15, :04:07 AM 1 rat cortex data: all 5 experiments Friday, June 15, 218 1:4:7 AM 1 Obs experiment stimulated notstimulated difference 1 1 689 657 32 2 1 656 623 33 3 1 668 652 16 4 1 66 654 6 5 1 679 658 21 6 1 663 646

More information

Journal of Asian Scientific Research

Journal of Asian Scientific Research Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 A METAFRONTIER PRODUCTION FUNCTION FOR ESTIMATION OF TECHNICAL EFFICIENCIES OF WHEAT FARMERS UNDER DIFFERENT

More information

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data

= = Name: Lab Session: CID Number: The database can be found on our class website: Donald s used car data Intro to Statistics for the Social Sciences Fall, 2017, Dr. Suzanne Delaney Extra Credit Assignment Instructions: You have been hired as a statistical consultant by Donald who is a used car dealer to help

More information

Module 7: Multilevel Models for Binary Responses. Practical. Introduction to the Bangladesh Demographic and Health Survey 2004 Dataset.

Module 7: Multilevel Models for Binary Responses. Practical. Introduction to the Bangladesh Demographic and Health Survey 2004 Dataset. Module 7: Multilevel Models for Binary Responses Most of the sections within this module have online quizzes for you to test your understanding. To find the quizzes: Pre-requisites Modules 1-6 Contents

More information

BIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points]

BIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points] Prof. Brent Coull TA Shira Mitchell BIO 226: Applied Longitudinal Analysis Homework 2 Solutions Due Thursday, February 21, 2013 [100 points] Purpose: To provide an introduction to the use of PROC MIXED

More information

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro.

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro. Biostatistics 208 Lecture 1: Overview & Linear Regression Intro. Steve Shiboski Division of Biostatistics, UCSF January 8, 2019 1 Organization Office hours by appointment (Mission Hall 2540) E-mail to

More information

Semester 2, 2015/2016

Semester 2, 2015/2016 ECN 3202 APPLIED ECONOMETRICS 3. MULTIPLE REGRESSION B Mr. Sydney Armstrong Lecturer 1 The University of Guyana 1 Semester 2, 2015/2016 MODEL SPECIFICATION What happens if we omit a relevant variable?

More information

3. Stated choice analyses

3. Stated choice analyses 3. Stated choice analyses 3.1 Stated and revealed preference data Revealed preference (RP) data: These data are based on actual decisions and choices in real-world situations, i.e. individuals reveal their

More information

X. Mixed Effects Analysis of Variance

X. Mixed Effects Analysis of Variance X. Mixed Effects Analysis of Variance Analysis of variance with multiple observations per patient These analyses are complicated by the fact that multiple observations on the same patient are correlated

More information

Analyzing Ordinal Data With Linear Models

Analyzing Ordinal Data With Linear Models Analyzing Ordinal Data With Linear Models Consequences of Ignoring Ordinality In statistical analysis numbers represent properties of the observed units. Measurement level: Which features of numbers correspond

More information

Survey commands in STATA

Survey commands in STATA Survey commands in STATA Carlo Azzarri DECRG Sample survey: Albania 2005 LSMS 4 strata (Central, Coastal, Mountain, Tirana) 455 Primary Sampling Units (PSU) 8 HHs by PSU * 455 = 3,640 HHs svy command:

More information

CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2

CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2 CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2 GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis

More information

ROBUST ESTIMATION OF STANDARD ERRORS

ROBUST ESTIMATION OF STANDARD ERRORS ROBUST ESTIMATION OF STANDARD ERRORS -- log: Z:\LDA\DataLDA\sitka_Lab8.log log type: text opened on: 18 Feb 2004, 11:29:17. ****The observed mean responses in each of the 4 chambers; for 1988 and 1989.

More information

Chapter 5 Regression

Chapter 5 Regression Chapter 5 Regression Topics to be covered in this chapter: Regression Fitted Line Plots Residual Plots Regression The scatterplot below shows that there is a linear relationship between the percent x of

More information

Biophysical and Econometric Analysis of Adoption of Soil and Water Conservation Techniques in the Semi-Arid Region of Sidi Bouzid (Central Tunisia)

Biophysical and Econometric Analysis of Adoption of Soil and Water Conservation Techniques in the Semi-Arid Region of Sidi Bouzid (Central Tunisia) Biophysical and Econometric Analysis of Adoption of Soil and Water Conservation Techniques in the Semi-Arid Region of Sidi Bouzid (Central Tunisia) 5 th EUROSOIL INTERNATIONAL CONGRESS 17-22 July 2016,

More information

Econometric Analysis Dr. Sobel

Econometric Analysis Dr. Sobel Econometric Analysis Dr. Sobel Econometrics Session 1: 1. Building a data set Which software - usually best to use Microsoft Excel (XLS format) but CSV is also okay Variable names (first row only, 15 character

More information

Continuous Improvement Toolkit

Continuous Improvement Toolkit Continuous Improvement Toolkit Regression (Introduction) Managing Risk PDPC Pros and Cons Importance-Urgency Mapping RACI Matrix Stakeholders Analysis FMEA RAID Logs Break-even Analysis Cost -Benefit Analysis

More information

Experiment Outcome &Literature Review. Presented by Fang Liyu

Experiment Outcome &Literature Review. Presented by Fang Liyu Experiment Outcome &Literature Review Presented by Fang Liyu Experiment outcome 1. Data from JD Sample size: 1) Data contains 3325 products in 8 days 2) There are 2000-3000 missing values in each data

More information

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Statistics 101 Thomas Leininger June 18, 2013 Types of outliers in linear regression Types of outliers How do(es) the outlier(s) influence

More information

The Multivariate Regression Model

The Multivariate Regression Model The Multivariate Regression Model Example Determinants of College GPA Sample of 4 Freshman Collect data on College GPA (4.0 scale) Look at importance of ACT Consider the following model CGPA ACT i 0 i

More information

Mixed Mode Surveys in Business Research: A Natural Experiment. Dr Andrew Engeli March 14 th 2018

Mixed Mode Surveys in Business Research: A Natural Experiment. Dr Andrew Engeli March 14 th 2018 Mixed Mode Surveys in Business Research: A Natural Experiment Dr Andrew Engeli March 14 th 2018 Structure of todays presentation The general context The natural experiment Resources Conclusion Coverage

More information

Module 6 Case Studies in Longitudinal Data Analysis

Module 6 Case Studies in Longitudinal Data Analysis Module 6 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation SISCR 2018 July 24, 2018 Learning objectives This module will focus on the design of longitudinal

More information

Correlation and Simple. Linear Regression. Scenario. Defining Correlation

Correlation and Simple. Linear Regression. Scenario. Defining Correlation Linear Regression Scenario Let s imagine that we work in a real estate business and we re attempting to understand whether there s any association between the square footage of a house and it s final selling

More information