6/3/15

Panel Data Analysis
Dr. Adalbert Wilhelm
e-mail: university.de
Visiting professor at the Università degli Studi di Cagliari, June 2015

Session 1: Panel data - Definitions

Points of disagreement:
- Time series: typically used to denote measurements taken of one case at multiple times
  - e.g. daily stock market price of Apple Inc.
  - Better name: Time series cross-sectional data (TSCS)
- Panel: In practice, sometimes different people may be asked in each wave, but all respondents belong to the panel and hence comparability over time is ensured. Individuals typically represent a social subgroup and hence are considered to be exchangeable.
- Cohort: Here, we really try to observe the same people at each wave. Shared experience defined at start. Over time some respondents might drop out.

Panel data or Time Series Cross Section data?
- Narrow definition:
  - Panel: always the same cases
  - TSCS: varying cases
- Wide definition:
  - Used interchangeably
  - Repeated observations on a (variable) set of units

In the following, I will mostly go for the wide definition. The narrow definition is sometimes needed for specific analysis purposes and for some statistical properties of estimates and tests.

Panel data or Time Series Cross Section data
- Examples:
  - Economic performance (GDP) of n countries over T years
  - Democracy level of n countries over T years
  - Regime transition of n countries over T years
  - Share of education budget in n districts over T years
  - Opinion of n persons surveyed over 5*T years
  - Vote share of governing coalition in n countries over T elections
  - Rebel violence against n peacekeeping operations over T_i years
- Balanced and unbalanced panel data
- Large n & small T (sometimes called a cross-section panel, often found in microeconomics)
- Small n & large T (sometimes called a time-series panel, often found in macroeconomics)
- Medium n & medium T

Panel data or Time Series Cross Section data
- In Comparative Politics
  - Mostly fixed and small n
  - Medium but expandable T
- In economics
  - Large n, small T
  - Medium, but expandable n, small and growing T
- Three issues
  - Temporal dependence: weak/strong
  - Heterogeneous units: omitted variable bias
  - Heteroscedasticity: variance often varies over time or across units

Panel data or Time Series Cross Section data
(C. Adolph, U Washington)
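The distinction between balanced and unbalanced panels (a common T versus unit-specific T_i) can be checked mechanically from the (unit, period) index. The following is a minimal sketch in plain Python; the function name `panel_dimensions` and the toy country codes and years are made up for illustration and do not come from the slides.

```python
from collections import defaultdict


def panel_dimensions(index_pairs):
    """Summarise a panel from its (unit, period) index pairs.

    Returns (n, t_by_unit, balanced): the number of units, the number of
    observed periods per unit (the T_i), and whether every unit is
    observed in every period that occurs anywhere in the data.
    """
    periods_by_unit = defaultdict(set)
    for unit, period in index_pairs:
        periods_by_unit[unit].add(period)

    all_periods = set().union(*periods_by_unit.values())
    t_by_unit = {unit: len(periods) for unit, periods in periods_by_unit.items()}
    balanced = all(periods == all_periods for periods in periods_by_unit.values())
    return len(periods_by_unit), t_by_unit, balanced


# Hypothetical toy index: two countries, three years each.
balanced_index = [(c, y) for c in ("DE", "IT") for y in (2001, 2002, 2003)]
# Dropping one country-year makes the panel unbalanced (T_i differs by unit).
unbalanced_index = balanced_index[:-1]

print(panel_dimensions(balanced_index))
print(panel_dimensions(unbalanced_index))
```

In the unbalanced case the per-unit counts differ, which is the situation the T_i notation in the peacekeeping example refers to.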

Panel data analysis
- Why use panel data?
  - allows controlling for variables you cannot observe or measure (individual heterogeneity)
    - related to individuals/cases: e.g. cultural factors, differences in business practices, state laws and regulations (time-invariant)
    - related to time: Zeitgeist, global policies (case-invariant)
  - more data, which might make inference more precise
    - more informative data
    - more variability
    - less collinearity among variables
    - more degrees of freedom
    - more efficiency
  - better able to study dynamics of adjustment
    - in particular with strict panel data
  - identify and measure effects that are simply not detectable in pure cross-section or time-series data (in particular, if variables don't change over time)
  - allows constructing more complicated models
  - more accurate measurements

Panel data analysis
- Limitations of panel data?
  - design and data collection more complex and more expensive
  - distortion of measurement errors
    - faulty responses etc.
    - inconsistencies over time
  - selectivity problems
    - self-selectivity
    - non-response
    - attrition
  - short time-series dimension
    - asymptotics typically will depend on number of cases, not on time span

- a panel of 10 firms observed from 1935 to 1954
  - total number of observations: 200
  - number of different variables: 5, of which two are identifiers
    - firm: company under consideration
    - year: the year of observation
    - inv: gross investment
    - value: value of the firm
    - capital: stock of plant and equipment
  - Source: Online complements to Baltagi (2001). http://
  - References:
    - Baltagi, B. H. (2001) Econometric Analysis of Panel Data, 2nd ed., John Wiley and Sons.

Data formats for panel data
- case-period data set (long format)
- 200 rows, 5 columns (one row per case and period)

Data formats for panel data
- case-level data set (wide format)
- now 10 rows, 302 columns (one row per case)
- Data are often stored in wide format; R mostly requires long format for analysis

Data formats for panel data
- The R package plm provides a dedicated panel data frame, pdata.frame
- now 200 rows, 3 columns plus rownames
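The long and wide layouts described above hold the same information and can be converted into each other. The sketch below illustrates the reshape in plain Python rather than in R; the helper names `long_to_wide` and `wide_to_long`, the "variable.period" column naming, and all numbers are assumptions made for this example, not the actual data values or the plm interface.

```python
def long_to_wide(rows, id_key, time_key, value_keys):
    """Reshape case-period rows (long) into one record per case (wide).

    Each wide record gets one column per (variable, period) pair, named
    "<variable>.<period>", e.g. "inv.1935".
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_key], {id_key: row[id_key]})
        for key in value_keys:
            record[f"{key}.{row[time_key]}"] = row[key]
    return list(wide.values())


def wide_to_long(records, id_key, time_key):
    """Inverse reshape; assumes periods are integers such as years."""
    long_rows = {}
    for record in records:
        for column, value in record.items():
            if column == id_key:
                continue
            variable, period = column.rsplit(".", 1)
            row = long_rows.setdefault(
                (record[id_key], period),
                {id_key: record[id_key], time_key: int(period)},
            )
            row[variable] = value
    return list(long_rows.values())


# Made-up numbers for two firms and two years, NOT the real data values.
long_rows = [
    {"firm": 1, "year": 1935, "inv": 10.0, "value": 100.0},
    {"firm": 1, "year": 1936, "inv": 12.0, "value": 110.0},
    {"firm": 2, "year": 1935, "inv": 20.0, "value": 200.0},
    {"firm": 2, "year": 1936, "inv": 21.0, "value": 190.0},
]

wide_rows = long_to_wide(long_rows, "firm", "year", ["inv", "value"])
print(len(long_rows), "long rows ->", len(wide_rows), "wide rows")
```

The row count shrinks to the number of cases while the column count grows with the number of periods, which is the pattern behind the row and column counts quoted on the slides.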

[Figure: Heterogeneity across firms. inv plotted by factor(firm)]

[Figure: Heterogeneity across time. inv plotted by factor(year)]

Panel data: To pool or not to pool
- general panel data linear model, indexed by individuals (country, group) and time
- too many parameters to be estimated
- pooling
  - borrowing strength across units in estimating parameters
  - imposing restrictions on parameters
    - relatedness
    - being constant across units or time
- trade-off between flexibility to cover different sources of heterogeneity and imposing commonality to improve precision of estimates

Statistical models and pooling
- "All models are wrong. But some models are useful." George E.P. Box
- Panel data offers a rich choice of modeling decisions, in particular which parameters to pool and which to separate out
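The equation for the general panel data linear model is not preserved in this transcript. A common way of writing it, given here as an assumption in standard textbook notation rather than as the slide's exact formula, lets both the intercept and the slopes vary over individuals i and time points t:

```latex
\[
  y_{it} = \alpha_{it} + \beta_{it}^{\top} x_{it} + u_{it},
  \qquad i = 1, \dots, n, \quad t = 1, \dots, T .
\]
% With K regressors this has n * T * (K + 1) coefficients
% but only n * T observations, so it cannot be estimated
% without restrictions such as pooling.
```

Written this way, "too many parameters to be estimated" is a matter of counting: every observation carries its own intercept and slope vector, so some coefficients have to be held constant across units, across time, or both.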

Pooling and partial pooling
- General panel data linear model, indexed by individuals (country, group) and time
- Pooled model (pooling over individuals and time)
  - Looks not too different from an ordinary least squares (OLS) model
  - In fact, OLS is an extremely simplifying variant of the pooled model
  - in contrast to OLS, panel models are variance-component models
- Based on the pooled model we put further specifications on the error terms
- Most panel data applications use a one-way error component model for the disturbances,
  - either for time-invariant heterogeneity in individuals
  - or for case-invariant heterogeneity in time
- these models are often called unobserved effects models
- we continue the model derivation mostly with time-invariant heterogeneity in individuals
- Unobserved effects model (separate error term for each individual)
  - models individual heterogeneity that is constant over time
  - Can be estimated in two ways: as fixed effects or as random effects
  - Estimation as fixed effects (within or least squares dummy variable)
  - Estimation as random effects
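The pooled model and the one-way error component model described above can be written out as follows. The slides' own formulas are not preserved, so the symbols used here (alpha, beta, mu_i, lambda_t, epsilon_it) are assumed standard notation, not necessarily the lecturer's.

```latex
% Pooled model: one intercept and one slope vector for all i and t
\[
  y_{it} = \alpha + \beta^{\top} x_{it} + u_{it}
\]

% One-way error component for time-invariant individual heterogeneity
\[
  u_{it} = \mu_i + \varepsilon_{it}
  \quad\Longrightarrow\quad
  y_{it} = \alpha + \beta^{\top} x_{it} + \mu_i + \varepsilon_{it}
\]

% One-way error component for case-invariant heterogeneity in time
\[
  u_{it} = \lambda_t + \varepsilon_{it}
\]
```

Treating the mu_i as parameters to be estimated gives the fixed effects approach; treating them as draws from a distribution that are uncorrelated with the regressors gives the random effects approach.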

- Estimation as fixed effects
  - between
    - equivalent to OLS estimation of time-averaged data
  - within or least squares dummy variable
    - different parametrizations
      - level
      - dmean
      - dfirst

- a panel of 10 firms observed from 1935 to 1954
  - total number of observations: 200
  - number of different variables: 5, of which two are identifiers
    - firm: company under consideration
    - year: the year of observation
    - inv: gross investment
    - value: real value of the firm (shares outstanding)
    - capital: real value of stock of plant and equipment
  - Source: Online complements to Baltagi (2001). http://
  - References:
    - Baltagi, B. H. (2001) Econometric Analysis of Panel Data, 2nd ed., John Wiley and Sons.

General panel model
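The between and within estimators named above can be illustrated with a single regressor. This is a minimal sketch in plain Python, not the plm implementation; the function names and the toy data are assumptions made for the example. The data are constructed so that y = mu_i + 2 x holds exactly, with unit effects mu_i that are negatively related to x.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def group_by_unit(units, values):
    """Collect the values belonging to each unit."""
    grouped = defaultdict(list)
    for unit, value in zip(units, values):
        grouped[unit].append(value)
    return grouped


def between_slope(units, x, y):
    """OLS on the time-averaged data: one (x mean, y mean) pair per unit."""
    gx, gy = group_by_unit(units, x), group_by_unit(units, y)
    x_means = [sum(gx[u]) / len(gx[u]) for u in gx]
    y_means = [sum(gy[u]) / len(gy[u]) for u in gx]
    return ols_slope(x_means, y_means)


def within_slope(units, x, y):
    """OLS on data demeaned by unit; same slope as with unit dummies (LSDV)."""
    gx, gy = group_by_unit(units, x), group_by_unit(units, y)
    x_dev = [xi - sum(gx[u]) / len(gx[u]) for u, xi in zip(units, x)]
    y_dev = [yi - sum(gy[u]) / len(gy[u]) for u, yi in zip(units, y)]
    return ols_slope(x_dev, y_dev)


# Toy data: unit A has mu = 10, unit B has mu = -20, true slope is 2.
units = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [12.0, 14.0, 16.0, -12.0, -10.0, -8.0]

print("pooled :", ols_slope(x, y))
print("between:", between_slope(units, x, y))
print("within :", within_slope(units, x, y))
```

Only the within estimator recovers the slope of 2 here; the pooled and between slopes are negative because they absorb the unit effects, which is the omitted variable problem that fixed effects estimation is meant to address.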

- Estimation as random effects
  - can be tackled as a generalized least squares (GLS) problem
  - various feasible GLS estimators are equivalent to OLS on partially demeaned data
  - direct inversion of the variance-covariance matrix is not recommended due to its large dimensions
  - various numerical approaches, four of which are implemented in the R package plm
    - swar: from Swamy and Arora (1972), default
    - walhus: from Wallace and Hussain (1969)
    - amemiya: from Amemiya (1971)
    - nerlove: from Nerlove (1971)

Random effects models for Grunfeld data
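The statement that feasible GLS amounts to OLS on partially demeaned data can be made concrete for a balanced one-way model. The sketch below uses the standard textbook weight theta = 1 - sqrt(sigma2_eps / (T * sigma2_mu + sigma2_eps)); the slide's own formula is not preserved, so this notation and the function names are assumptions. The variance components are taken as given here, whereas in practice they have to be estimated, which is where the four methods listed above differ.

```python
import math


def re_theta(sigma2_eps, sigma2_mu, n_periods):
    """Partial-demeaning weight for a balanced one-way random effects model.

    sigma2_eps: variance of the idiosyncratic error
    sigma2_mu:  variance of the individual effect
    n_periods:  number of time periods T per unit
    """
    return 1.0 - math.sqrt(sigma2_eps / (n_periods * sigma2_mu + sigma2_eps))


def partially_demean(values, theta):
    """Subtract theta times the unit mean from one unit's observations."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


# Special cases of the weight:
# no individual-effect variance -> theta = 0 -> plain pooled OLS
print(re_theta(1.0, 0.0, 4))
# individual-effect variance dominating -> theta close to 1 -> within estimator
print(re_theta(1.0, 1e6, 4))

# theta = 0 leaves the data unchanged, theta = 1 demeans them fully.
print(partially_demean([1.0, 2.0, 3.0], 0.0))
print(partially_demean([1.0, 2.0, 3.0], 1.0))
```

Applying `partially_demean` to y and to each regressor within every unit and then running OLS gives the random effects estimate for the chosen variance components, without inverting the full variance-covariance matrix.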

- Hausman test to decide between fixed and random effects
  - null hypothesis: random effects
  - alternative hypothesis: fixed effects
  - basically tests whether the unique errors are correlated with the regressors
  - compares the estimates of both models
- however, the Hausman test is debated and needs discussion prior to its use
  - the fixed effects model imposes testable restrictions on parameters: check the validity of these restrictions first
  - the random effects model assumes exogeneity of all regressors with respect to the individual effects
  - the fixed effects model allows for endogeneity of all regressors with respect to the individual effects
  - in fixed effects models all time-invariant characteristics of the individuals are collinear with the entity dummies
- "...the crucial distinction between fixed and random effects is whether the unobserved individual effect embodies elements that are correlated with the regressors in the model, not whether these effects are stochastic or not" [Greene, 2008, p. 183]
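The comparison of both sets of estimates is usually summarised in a single test statistic. The form below is the standard textbook version of the Hausman statistic, stated here as an assumption because the slides do not show a formula; the hats denote estimates and K is the number of coefficients being compared.

```latex
\[
  H = \bigl(\hat\beta_{FE} - \hat\beta_{RE}\bigr)^{\top}
      \Bigl[\widehat{\operatorname{Var}}\bigl(\hat\beta_{FE}\bigr)
          - \widehat{\operatorname{Var}}\bigl(\hat\beta_{RE}\bigr)\Bigr]^{-1}
      \bigl(\hat\beta_{FE} - \hat\beta_{RE}\bigr)
  \;\sim\; \chi^2_{K}
  \quad \text{under } H_0 .
\]
```

A large value of H means the two estimators disagree by more than sampling variation would explain, which speaks against the random effects assumption that the individual effects are uncorrelated with the regressors.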

Summary
- Panel data structures
- one-way error component models
- unobservable individual effects model
- fixed effects models
  - between
  - within
- random effects models
- Hausman test