Training and productivity: evidence for US manufacturing industries

By Facundo Sepúlveda
Departamento de Economía, Universidad de Santiago de Chile. Alameda 3363. Estación Central. Santiago, RM. Chile. e-mail: facundo.sepulveda@fsp.cl

Abstract

We use a panel of two-digit manufacturing industries to examine the role of formal training programs in productivity growth and wage growth. We find evidence of positive and decreasing effects of on-the-job training (OJT) on human capital accumulation, and therefore on productivity. We find, however, only weak evidence that OJT affects wage growth, suggesting that the firm appropriates most of the benefits from OJT programs. Off-the-job training, on the other hand, has no effects on industrial productivity or wages.

JEL Classifications: D92, J24, L60

1 Introduction

Skill acquisition through formal training programs is believed to have large positive effects on workers' future wages. While this effect has been widely documented, there is much less evidence that training has any effect on firm or industry level performance outcomes, such as productivity or profits. Identifying such effects is important for two reasons. From an individual worker perspective, it is now clear that the evidence on the wage effects of training cannot be used to infer the extent of its productivity effects. Theory predicts that only under stringent conditions on both the general nature of training (e.g. Becker, 1964) and the absence of labor market frictions (Acemoglu and Pischke, 1999) will both effects coincide. Identification of such effects therefore needs to be done through a primal (e.g. production function) approach, rather than a factor price approach. From an aggregate perspective, a large branch of the literature on economic growth, beginning with Lucas (1988), identifies human capital (HC) accumulation, of which training is one channel, as the engine of growth, but so far the evidence is limited to HC accumulation through formal education (e.g. Barro and Lee, 1993).

In this paper, we use a panel of US manufacturing industries that covers the period 1988 to 1998 to examine the effects of formal training programs on productivity growth and wage growth. We estimate the parameters of a production function in the spirit of Burnside et al (1995). In our model, human capital augmented labor is an input in production, and training affects labor, and therefore aggregate productivity, indirectly as an investment in the stock of human capital. Because we use a framework rooted in capital theory, we are able to characterize the technology that converts training into new human capital in a way that is directly relevant for the literature on endogenous growth spawned by Lucas (1988). At the same time, our results can be compared to the findings on the wage effects of training as reported in, for instance, Lowenstein and Spletzer (1998) and Barron et al (1989).

There is a small literature based on US data that attempts to estimate directly the productivity effects of training, so far with limited success. Bartel (1994) uses a two year panel of US firms and finds that training has no discernible effects on labor productivity in the entire sample; a significant effect can be found only for firms that implement new training programs.
Black and Lynch (1995) and Black and Lynch (1996), also using a two year firm panel, are unable to find significant effects of their main training variables on firm sales. A few papers, among which Barron et al (1989), Bishop (1994), and Lowenstein and Spletzer (1998), find positive and significant effects using subjective measures of individual productivity, but given the difficulty of comparing such measures across workers and firms, it is unclear how to interpret these results. Work based on European data, where training is more prevalent, reports consistently positive effects. Schonewille (2001) uses a small dataset of British industries to estimate a production function, and reports significant effects when using the variable hours in training, while Barrett and O'Connell (1999), using data on a nationally representative sample of 1000 Irish firms, find a positive and significant point estimate of the elasticity of labor productivity with respect to the variable training days/total employment. In France, Carriou and Jeger (1997) find positive effects of training on value added using a large panel of firms, and Kramarz and Delame (1997) also find positive effects for some categories of workers, using a smaller dataset.[1]

[1] In a related paper, Ichniowski et al (1997) argue that the productivity effects of training should be studied alongside other complementary human resource practices, such as flexible job arrangements and employment security, but the NLSY contains little information about such practices.

We see these papers as suffering from two types of problems. First, they do not address the issue that training is a choice variable for the firms, and is therefore endogenous to the determination of productivity growth. Second, they treat training as a stock of human capital, which is an input in the production function, instead of an investment in new skills.

This paper is closer in scope and methods to Dearden et al (2006). In that study, the authors use a yearly panel of British industries to examine the effects of training incidence on value added per worker and wages. After instrumenting current training with (functions of) its past values, as proposed by Arellano and Bond (1991), the authors report that the rates of return for training are much higher, by as much as 6 percentage points, than previously estimated. This paper differs from Dearden et al (2006) in a number of dimensions. First, we use US data. Because of the different labor market institutions in the UK, it is unclear to what extent their results can be extrapolated to the US. Second, while their dataset is constructed from a survey that asks respondents about their training incidence in the previous month, we are able to construct a quarterly panel by aggregating detailed individual histories of training incidence and hours. This allows us, in particular, to disaggregate training into on-the-job (or in-house) training programs, which are likely to contain a stronger component of firm specific skills, and off-the-job training programs, which should comprise mostly training in general skills. Finally, while Dearden et al (2006) use training as a measure of the stock of HC, a simplifying assumption used in most of the literature, we use it as a measure of investments in HC, which is the correct interpretation within a capital theory framework.

Our findings can be summarized as follows: we find evidence of positive and decreasing effects of current on-the-job training on current productivity growth, with a large concavity parameter. Off-the-job training has no effects on productivity growth, and we are unable to identify robust effects of any training measures on wage growth.

The next Section of the paper derives an estimable model, examines the identification problems that may arise, and proposes an estimation approach. Then, Section 3 describes the data used, Section 4 presents and discusses the results, and Section 5 concludes.

2 Model

To examine the effects of training on productivity we begin with a general production function with human capital and two types of inputs.

$$Y_t = A_t\, P(K_t, H_t L_t, M_t, E_t) \tag{1}$$

Where $Y_t$ is production, $A_t$ is a stochastic productivity shock, $K_t$ is capital, $H_t$ and $L_t$ stand for human capital and labor input respectively, $M_t$ represents materials, and $E_t$ is energy consumption. All variables are dated at time $t$. Because we have no data on $\{K_t, M_t, H_t\}$, we need to impose more structure on the problem before $P(\cdot)$ can be estimated. This structure takes the form of restrictions on $P(\cdot)$ and on the law of motion for $H$. The restrictions on $P(\cdot)$ can be summarized as follows:

Restriction 1: $P(\cdot) = p(K_t, M_t, E_t)\,(H_t L_t)^{\alpha}$

Restriction 2: $p(\cdot) = \min\{z(K_t, M_t),\, E_t^{\psi}\}$

Restriction 1 says that HC enhanced labor is Cobb-Douglas separable, an assumption made implicitly or explicitly in all studies that estimate a linearized production function (see e.g. Basu, 1996, and references therein). Restriction 2 is a proportionality assumption between energy use and a function of capital and material inputs, and includes the case where electricity, materials and capital services are used in fixed proportions, as in Burnside et al (1995). We use the yearly KLEMS dataset, described in Jorgenson and Stiroh (2000), to test this restriction. To do this, we study a linear regression of $E_t$ on $K_t$, $M_t$, time, and time interactions with the explanatory variables. The time interactions should be no different from zero if Restriction 2 holds. Table 1 displays the results. In column 1, the model is estimated using data from 1958 to 1996.
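The logic of this test can be sketched on synthetic data. The snippet below is only an illustration under stated assumptions: the input paths, the coefficients 2 and 3, and the helper names `ols`, `design_row` and `simulate` are all hypothetical and are not taken from the KLEMS data or from Table 1. It regresses energy on capital, materials, time and the two time interactions, once when energy is used in fixed proportions and once when the energy/capital ratio drifts over time.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design_row(K, M, t):
    # Regressors: constant, K, M, time, K x time, M x time.
    return [1.0, K, M, t, K * t, M * t]


def simulate(drift, n=40):
    """Synthetic series; drift is the per-period change in the E/K ratio."""
    X, y = [], []
    for t in range(n):
        K = 10.0 + (7 * t) % 13  # arbitrary, non-trending input paths
        M = 5.0 + (5 * t) % 11
        X.append(design_row(K, M, float(t)))
        y.append((2.0 + drift * t) * K + 3.0 * M)  # hypothetical energy use
    return X, y


fixed = ols(*simulate(drift=0.0))      # fixed proportions
drifting = ols(*simulate(drift=0.05))  # E/K ratio changes over time
print("K x time coefficient, fixed proportions:", round(fixed[4], 6))
print("K x time coefficient, drifting ratio:", round(drifting[4], 6))
```

With fixed proportions the interaction coefficients are zero up to rounding error, while a drifting ratio shows up directly in the capital-time interaction, which is the pattern the test looks for.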
We obtain clear evidence that electricity and capital are substitutes in the long run, as shown by the significant coefficient on the year-capital interaction. We might hope that in the shorter time span covered by our data (1987 to 1998) there is less scope for substitution, so in columns 2 and 3 we run the regressions using data from 1987 to 1996. The model in column 2, without the time variables, shows a clearly defined function $z(\cdot)$. We then add the time interactions in column 3, and none of the interactions is statistically significant. Note that the coefficient on the year variable is immaterial here, as it would be picked up by time dummies in the main regressions.

Since we cannot completely rule out substitutability between electricity and capital, it is worth discussing its potential effects on our estimates. Given the short time span of our data, we are mostly concerned about substitution along the business cycle. In expansions, the relative price of capital $p_K/p_E$ increases with respect to its level in recessions, which promotes the use of energy saving capital goods. In this case, $K_t$ would increase (decrease) more in expansions (recessions) than would be measured by the changes in $E_t$. The variation in $K_t$ not picked up by $E_t$ will most probably be assigned, in a regression, to the labor input which, just as capital, is strongly procyclical. The effects on the estimates of training should be opposite, as training is countercyclical, so substitution between $K$ and $E$ will bias the training estimates towards zero.[2]

[2] The potential bias created by another variable not in our data, R&D, is less clear. While opportunity cost considerations induce some R&D to be countercyclical, other factors such as limited property rights tend to make it procyclical. See Ouyang (2007) and Barlevy (2007) for a discussion.

Because we do not observe HC, but we observe investments in HC in the form of training, we assume that the technology to produce new HC is homogeneous of degree one in the previous period stock of HC.

Restriction 3: $H_t = H_{t-1}\, f(\mathbf{TR}_t, exp_t, \delta_t)$

Where $\mathbf{TR}_t$ is the vector $\{TR_t, TR_{t-1}, \ldots\}$, whose inclusion reflects the fact that we do not know which lags of $TR$ are arguments of the function $f$; $exp_t$ is a variable measuring experience; and $\delta_t$ represents the rate of depreciation of human capital. We can approximate the rate of growth of HC as $\Delta h_t = \log H_t - \log H_{t-1}$. Using the function in Restriction 3, this implies:

$$\Delta h_t = \log f(\mathbf{TR}_t, exp_t, \delta_t). \tag{2}$$

For the empirical implementation, we take $f$ to be differentiable and obtain a Taylor expansion of expression (2) around a value $\{\mathbf{TR}_0, exp_0, \delta_0\}$. While the appropriate order of the expansion and the number of lags in $\mathbf{TR}_t$ are an empirical issue, an expansion quadratic in $TR$ and $exp$ with $\mathbf{TR}_t = TR_t$ will turn out to be sufficient. In this case, $\Delta h_t$ specializes to

$$
\begin{aligned}
\Delta h_t ={} & \log f + \frac{f_{TR}}{f}(TR_t - TR_0) + \frac{f_{TR^2} f - f_{TR}^2}{2 f^2}(TR_t - TR_0)^2 \\
& + \frac{f_{exp}}{f}(exp_t - exp_0) + \frac{f_{exp^2} f - f_{exp}^2}{2 f^2}(exp_t - exp_0)^2 + \frac{f_{\delta}}{f}(\delta_t - \delta_0)
\end{aligned} \tag{3}
$$

$$\equiv \text{Constant} + \theta_1 TR_t + \theta_2 TR_t^2 + \theta_3 exp_t + \theta_4 exp_t^2 + \theta_5 \delta_t \tag{4}$$

Where, in (3), $f$ and its derivatives are evaluated at $\{TR_0, exp_0, \delta_0\}$.

In this paper, we will be concerned with testing a number of hypotheses about the function $f$. To begin with, we are interested in testing the following hypotheses:

Hypothesis 1: $f(\mathbf{TR}_t, exp_t, \delta_t) = f(OJT, OFFJT, exp_t, \delta_t)$

Hypothesis 2: $f(\mathbf{TR}_t, exp_t, \delta_t) = f(TR_t, exp_t, \delta_t)$

Where $OJT$ is on-the-job training and $OFFJT$ is off-the-job training. Hypothesis 1 says that OJT, which can be associated with firm-specific training, and OFFJT, which represents investments in general human capital, have differentiated effects on productivity. The results in the literature point to insignificant effects of training programs that can be associated with firm-specific skills, but significant effects of some training programs associated with general skills. Barrett and O'Connell (1999) address this particular question, and find that training in general skills has positive productivity effects, while training in firm specific skills has insignificant effects. In Black and Lynch (1995), the only training variable that helps explain log(firm sales) is the percentage of formal training outside working hours. We believe that the results on OJT are puzzling and unsatisfactory, in that measures of firm specific training should have large and positive productivity effects. Training programs of this type impose large costs on firms (see Training, 2002, for a measure of direct costs), so one would expect them to have large expected benefits.
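The quadratic expansion in (3) can be checked numerically for the training argument alone. The sketch below assumes, purely for illustration, a power-function technology $f(TR) = a + b\,TR^{\eta}$ with made-up parameter values; the names `A`, `B`, `ETA`, `thetas` and `log_f_quadratic` are hypothetical. It computes the two normalized coefficients from (3) and compares the approximation with the exact value of $\log f$ near the expansion point.

```python
import math

# Hypothetical accumulation technology f(TR) = A + B * TR**ETA.
A, B, ETA = 1.0, 0.5, 0.15


def f(tr):
    return A + B * tr ** ETA


def f1(tr):
    # First derivative of f with respect to TR.
    return B * ETA * tr ** (ETA - 1)


def f2(tr):
    # Second derivative of f with respect to TR.
    return B * ETA * (ETA - 1) * tr ** (ETA - 2)


def thetas(tr0):
    """Coefficients on (TR - TR0) and (TR - TR0)^2 in the expansion of log f."""
    theta1 = f1(tr0) / f(tr0)
    theta2 = (f2(tr0) * f(tr0) - f1(tr0) ** 2) / (2 * f(tr0) ** 2)
    return theta1, theta2


def log_f_quadratic(tr, tr0):
    theta1, theta2 = thetas(tr0)
    d = tr - tr0
    return math.log(f(tr0)) + theta1 * d + theta2 * d ** 2


tr0 = 0.07  # expansion point, chosen arbitrarily for the illustration
for tr in (0.06, 0.07, 0.08):
    print(tr, math.log(f(tr)), log_f_quadratic(tr, tr0))
```

For an increasing and concave $f$ the linear coefficient is positive and the quadratic one is negative, which is the sign pattern Hypothesis 3 looks for in the estimated model.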
Hypothesis 2 says that only current levels of training have an effect on human capital accumulation. Models of HC accumulation in continuous time (e.g. Lucas, 1988) do share this feature, but in most models in discrete time the stock of HC follows $H_t = g(H_{t-1}, TR_{t-1})$, implying a one period lag between learning and HC accumulation. An example of $g$ without learning delays is $g(\cdot) = (1-\delta)H_{t-1} + TR_t^{\gamma_1}$, used by DeJong and Ingram (2001) when studying the cyclical properties of human capital accumulation. An example with delays is $g(\cdot) = \delta H_{t-1} + \theta\, TR_{t-1} K_{t-1}^{\alpha} H_{t-1}^{1-\alpha}$, used by de Gregorio (1996) (eq. (3)) when studying the interactions between borrowing constraints and human capital driven growth. While this timing question may have limited importance for long run growth analysis, it probably does affect the business cycle properties of HC accumulation.

We are also interested in testing whether the function $f$ is increasing and concave in training, as assumed in most of the literature.

Hypothesis 3: $f$ is increasing ($f_{TR} \equiv \partial f/\partial TR > 0$) and concave ($f_{TR^2} \equiv \partial^2 f/\partial TR^2 < 0$).

In Hypothesis 3, we aim to quantify the marginal effects, as well as the degree of decreasing returns to training programs. Note that because we will use the approximation in (3) to the function $f$, we will only be able to identify normalized first and second derivatives: $f_{TR}/f$ and $f_{TR^2}/f$.

An estimable linear model can be derived from eq. (1) if we impose restrictions 1 to 3. Taking logarithms on both sides of eq. (1), using restrictions 1 to 3, and time differencing yields

$$
\begin{aligned}
\Delta y_t &= \psi \Delta e_t + \alpha \Delta h_t + \alpha \Delta l_t + \Delta a_t \\
&= \psi \Delta e_t + \alpha(\text{constant} + \theta_1 TR_t + \theta_2 TR_t^2 + \theta_3 exp_t + \theta_4 exp_t^2 + \theta_5 \delta_t) + \alpha \Delta l_t + \Delta a_t \\
&\equiv \beta_0 + \beta_1 \Delta e_t + \beta_2 TR_t + \beta_3 TR_t^2 + \beta_4 exp_t + \beta_5 exp_t^2 + \beta_6 \delta_t + \beta_7 \Delta l_t + \Delta a_t
\end{aligned} \tag{5}
$$

where $\Delta x_t \equiv \log X_t - \log X_{t-1}$ is the log difference operator, and we use eq. (4) in the second step. Equation (5) can be estimated with the data at hand. We turn now to the properties of the productivity shock $A_t$. We assume that it follows

$$A_{i,t+1} = (1+g_i)^t\, A_{i,t}^{\rho}\, \mu_i\, \epsilon_{i,t+1}. \tag{6}$$

Where $\epsilon_{i,t}$ is an i.i.d. random variable, $\rho < 1$ is an autocorrelation factor, $g_i$ is a sector specific growth rate, and $\mu_i$ is a sector specific time invariant effect on productivity.
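The implication of (6) for the growth rate of the shock can be verified by simulation. The sketch below is an illustration only: the parameter values and the function name `simulate_log_shock` are hypothetical. It simulates the log of the process in (6) for one sector and checks that the first difference of the log shock equals the sector growth rate plus $\rho$ times its own lag plus the change in the log innovation, up to the approximation $\log(1+g) \approx g$.

```python
import math
import random


def simulate_log_shock(g, rho, log_mu, periods, seed=0):
    """Simulate a_t = log A_t from eq. (6) for a single sector."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 0.01) for _ in range(periods + 1)]  # log innovations
    a = [0.0]  # a_0 is an arbitrary starting value
    for t in range(periods):
        a.append(t * math.log(1 + g) + rho * a[t] + log_mu + eps[t + 1])
    return a, eps


g, rho = 0.005, 0.8  # hypothetical growth rate and autocorrelation
a, eps = simulate_log_shock(g, rho, log_mu=0.1, periods=50)
da = [a[t] - a[t - 1] for t in range(1, len(a))]  # da[t - 1] holds Delta a_t

# Largest gap between Delta a_t and g + rho * Delta a_{t-1} + Delta eps_t.
max_gap = max(
    abs(da[t - 1] - (g + rho * da[t - 2] + eps[t] - eps[t - 1]))
    for t in range(2, len(a))
)
print("largest deviation from the differenced law of motion:", max_gap)
```

The only remaining deviation is the difference between $g$ and $\log(1+g)$, which is negligible for growth rates of this size.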
This structure is general enough to allow for a discussion of the identification problems that arise in the model. An expression for $\Delta a_t$ can be found from (6), using the approximation $\log(1+g) \approx g$. Using this error structure we can write the complete model, which we call M1.

[M1]

$$\Delta y_t = \beta_0 + \beta_1 \Delta e_t + \beta_2 TR_t + \beta_3 TR_t^2 + \beta_4 exp_t + \beta_5 exp_t^2 + \beta_6 \delta_t + \beta_7 \Delta l_t + \Delta a_t \tag{7}$$

$$\Delta a_t = g_i + \rho \Delta a_{t-1} + \Delta \epsilon_t \tag{8}$$

Where eq. (7) is just a restatement of eq. (5) and eq. (8) is the corresponding error structure. Two types of identification problems need to be addressed in the context of model M1. The first is that the error term may display an industry-specific time invariant effect $g_i$ that is correlated with the regressors. In this case M1 has the fixed effects structure, and OLS on (7) is inconsistent. The second issue is that all the regressors are arguably endogenous, since they are chosen by firms after observing the productivity shock. In particular, there is evidence that training is countercyclical (Sepulveda, 2002), so that firms assign their workers to training programs in times of low productivity. That labor supply and electricity use are highly procyclical has been widely documented.[3]

[3] Another concern is that $\Delta a_t$ displays serial correlation, since $\Delta a_t$ and $\Delta a_{t-1}$ share the term $\epsilon_{t-1}$, and in addition if $\rho \neq 0$. We have a strong prior of high autocorrelation ($\rho \in (0, 1)$) in the stochastic productivity shock, following the literature on the measurement of the Solow residual. OLS or Fixed Effects estimation would still be consistent in this case, but we need to estimate standard errors that are robust to general forms of serial correlation. In this paper we use a Huber-White estimator of the covariance matrix, where we allow for arbitrary within industry autocorrelation. In some cases however, the number of industry groups was smaller than the number of instruments, and the estimates could not be made robust to within group autocorrelation.

In this paper, we use past levels of the explanatory variables as instruments. Valid instruments need to be correlated with the endogenous variables and uncorrelated with the error. The first condition can be tested empirically, and is discussed in Section 4. Whether the instruments satisfy the second condition has to be defended on a priori grounds. Formally, this condition can be written

$$E[\varepsilon_t X_{t-j}] = 0, \quad j > 0 \tag{9}$$

Where $\varepsilon_t$ is the error term of the estimating equation, and $X_t$ is the vector of explanatory variables. In model M1 for instance, $\varepsilon_t = \Delta a_t$ and $X_t = \{\Delta e_t, \Delta l_t, TR_t, exp_t, \delta_t\}$. When instrumental variable estimation relies on predetermined variables as instruments, the identifying restriction is that the error is not serially correlated at lags $j$ ($0 < j < \infty$) or higher (see, e.g. Bond, 2002). This rules out an autoregressive term in the error structure, but still allows a finite moving average component. In the error term (eq. (6)), we have assumed an autoregressive term of order one, as well as an industry specific growth rate of productivity $g_i$. These imply that the error structure of model M1 (eq. (8)) is composed of an industry specific time invariant effect ($g_i$), an AR(1) term $\rho \Delta a_{t-1}$, and an MA(1) term $\epsilon_t - \epsilon_{t-1}$, so condition (9) is violated both through the AR component and the (possible) fixed effect.[4]

[4] An important issue is the order of the autocorrelation term. In expression (6) we have assumed, for simplicity of exposition, a (log) AR component of order one, but since we use quarterly data, it is sensible to assume that errors are autocorrelated at least at lags 1 and 4. In Section 4 we use tests of overidentifying restrictions, such as the Hansen test, as well as tests of autocorrelation in the residuals, to discriminate between competing assumptions about the error term.

To address the problem of autocorrelation in the errors, we quasi-difference (7): we subtract $\rho$ times its first lag and then add $\rho \Delta y_{t-1}$ to both sides.[5] Rearranging, we obtain a model that is dynamically complete:

[5] We assume, for ease of exposition, a first order AR term, as in expression (8), derived from expression (6).

[M2]

$$
\begin{aligned}
\Delta y_t ={} & \beta_0(1-\rho) + \rho \Delta y_{t-1} + \beta_1 \Delta e_t - \rho\beta_1 \Delta e_{t-1} \\
& + \beta_2 TR_t - \rho\beta_2 TR_{t-1} + \beta_3 TR_t^2 - \rho\beta_3 TR_{t-1}^2 \\
& + \beta_4 exp_t - \rho\beta_4 exp_{t-1} + \beta_5 exp_t^2 - \rho\beta_5 exp_{t-1}^2 \\
& + \beta_6 \delta_t - \rho\beta_6 \delta_{t-1} + \beta_7 \Delta l_t - \rho\beta_7 \Delta l_{t-1} \\
& + (\Delta a_t - \rho \Delta a_{t-1})
\end{aligned} \tag{10}
$$

$$\Delta a_t - \rho \Delta a_{t-1} = g_i + \Delta \epsilon_t \tag{11}$$

The structure of this model is similar to that studied by Blundell and Bond (2000). In model M2, once a fixed effects transformation is applied, the explanatory variables lagged 3 or more periods are uncorrelated with the error term, so $X_{t-j}$, for $j = 3, 4, \ldots$, are valid instruments. We will also make use of two instruments used by Burnside et al (1995), following Hall (1988): the growth rate of oil prices and the growth rate of money supply, as these variables are arguably exogenous to industry variations in the use of inputs.

The IV approach used here was first proposed by Anderson and Hsiao (1982). The more commonly used System GMM estimator (see Arellano and Bond, 1991) is not a practical alternative in our framework, where training represents an investment flow, as it would imply differencing eq. (7), and therefore using second differences of the data on electricity, production, and hours. Indeed, we found that our system GMM estimates on eq. (10) were not robust and had large standard errors. We believe that differencing data that has already been log differenced eliminates too much information to be of any use: lagged differences in $X_t$ were very weakly correlated with current training, and past levels of $X_t$ were also uncorrelated with current differences in training. In the next section we describe the data used.

3 Data

We construct a quarterly panel of 2 digit industrial sectors, from Q1:1988 to Q1:1998, using three sources. We obtain data on training and educational attainment from the National Longitudinal Survey of Youth 1979 (NLSY79) dataset, data on production and electricity consumption from the Board of Governors of the Federal Reserve System, and data on hours worked from the Bureau of Labor Statistics. Appendix 1 contains the list of industries in our dataset.

Our use of industry rather than firm level data warrants a discussion of the potential aggregation bias involved. Our main concern is the exit of non performing firms from an industry. Such firms would tend to have limited learning capabilities, and therefore the effects of training on productivity and wages would also be small. This exit behavior could in principle bias our training estimates upwards, but such bias would arise only in the effects of lagged training, of which, as we report in the next section, we find none.

The NLSY79 follows 12,686 individuals that were aged 24 to 31 in 1988. We construct the training variables by aggregating monthly training histories from these respondents.[6]
The NLSY79 contains a comprehensive description of training careers, recording a wealth of information on up to five formal training spells per year, including the type of training program. We separate training programs according to whether they took place on the job or outside of it. In doing so, we hope to separate programs designed to teach skills that are strongly related to the worker's current occupation from those that are not.

[6] We exclude respondents who are younger than 25 years of age, as well as those participating in apprenticeships or government sponsored training programs.

Table 2 describes which types of training programs are considered on-the-job training (OJT) and which are considered off-the-job training (OFFJT). While a worker may use OJT related skills in a different firm, such skills should have a larger firm specific component than those related to OFFJT. Training programs run by an external company are classified in Table 2 as on-the-job training whenever they are held at the workplace. Since we have little guidance as to whether this training is firm specific or not, we do a robustness check by reproducing our main results after reclassifying such programs as OFFJT.

We now describe our measure of training incidence. We begin by creating an individual incidence variable: the number of months a worker participated in training in a given quarter (0, 1, 2 or 3). In order to obtain the measures of training used in the paper, we aggregate over individuals and compute averages of OJT, OFFJT and aggregate training incidence (AGGT = OJT + OFFJT) within an industry/quarter cell. The incidence estimate reported here is therefore larger than if the standard incidence measure (participated/did not) were used. Both measures would coincide only if all training programs had a one month duration. We finally keep track of the number of observations per cell (nobs), so that regressions can be properly weighted. We exclude cells constructed with fewer than 30 observations of respondents who report working in manufacturing in a given quarter.

The measures of turnover, wages, education, and experience are also obtained from the NLSY. We use the exit rates from an industry as our turnover variable, and use the (log) average hourly wage rate in an industry/quarter cell as our wage variable. Our measure of education is the average highest grade completed (hgc) of workers in an industry/quarter cell, and the experience variable (exp) is constructed by aggregating the individual experience measure $age - hgc$ within an industry/quarter cell.
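The construction of the cell-level training measures can be sketched as follows. This is a minimal illustration, not the code used for the paper: the record layout, the industry codes, the quarter labels and the function name `cell_averages` are hypothetical, and the respondent records are made up.

```python
from collections import defaultdict

MIN_OBS = 30  # cells built from fewer respondents are dropped


def cell_averages(records, min_obs=MIN_OBS):
    """Average months in training per respondent within industry/quarter cells.

    records: iterable of (industry, quarter, ojt_months, offjt_months), where
    each months value is 0, 1, 2 or 3 for one respondent in one quarter.
    """
    sums = defaultdict(lambda: [0, 0, 0])  # OJT total, OFFJT total, nobs
    for industry, quarter, ojt, offjt in records:
        cell = sums[(industry, quarter)]
        cell[0] += ojt
        cell[1] += offjt
        cell[2] += 1
    cells = {}
    for key, (ojt, offjt, nobs) in sums.items():
        if nobs >= min_obs:  # keep only cells with enough respondents
            cells[key] = {
                "OJT": ojt / nobs,
                "OFFJT": offjt / nobs,
                "AGGT": (ojt + offjt) / nobs,
                "nobs": nobs,  # kept so that regressions can be weighted
            }
    return cells


# Made-up example: 40 respondents in one cell, 10 in another.
records = (
    [("20", "1990Q1", 3, 0)] * 2      # two respondents in OJT all quarter
    + [("20", "1990Q1", 0, 1)]        # one respondent with a month of OFFJT
    + [("20", "1990Q1", 0, 0)] * 37   # no training
    + [("22", "1990Q1", 1, 0)] * 10   # too few respondents, cell is dropped
)
cells = cell_averages(records)
print(cells)
```

Because the individual variable counts months rather than participation, a respondent who trains for a whole quarter contributes three times as much as one who trains for a single month, which is why this measure exceeds the standard participated/did not incidence.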
The original data on industrial production and electricity use were obtained in index form, where the base year is 1997. The original source for hours is the BLS indexes of aggregate weekly hours in manufacturing industries. Since we could only obtain hours data in index form, we were unable to subtract from this series the hours spent in on-the-job training, but as workers spend on average a mere 1.35 hours a month in OJT, we believe that this is a minor issue.[7]

[7] Data on industrial production: Tables A to IE in http://www.federalreserve.gov/releases/g17/table1_2.htm; data on electricity use, not seasonally adjusted, from http://www.federalreserve.gov/releases/g17/kwh.htm; data on hours: series id EEU32200040 to EEU32390040 from http://www.bls.gov.

Figure 1 shows the time series of averages across industries for the main variables we use. Note that the series are not adjusted for seasonality, since this procedure (in any of its forms) would have included in any one observation the error terms of other observations. Because we are estimating a technological relationship, seasonal changes in demand do not by themselves create a hurdle. On the other hand, if seasonal changes in production reflect in part seasonal differences in productivity growth, this can be easily accommodated in the present framework.

Table 3 shows descriptive statistics for all variables. The values for OJT and OFFJT imply that an equivalent of 2.3% (0.07/3) of workers participate full time (three months per quarter) in an OJT program, and 1.4% in an OFFJT program. Note also that each observation on training is constructed from as many as 426 or as few as 30 observations on respondents who reported working in industry $i$ in quarter $t$. Data on production, hours worked, and electricity use are in growth rates.

4 Results

This Section presents the estimation results. We first test hypotheses 1-3 regarding the productivity effects of training. Then, we use the same framework to examine the wage effects, and finally we discuss the results using a simple model of the firm.

4.1 Training and productivity

Table 4 shows the results of estimating eq. (7) by Instrumental Variables, in columns 1 to 4, and also reports results from estimation by OLS and fixed effects. Robust t-statistics are in parentheses. Equation (7) is estimated adding also lag 4 of $X_t = \{Prod_{t-1}, elec_t, hours_t, TR_t, exp_t, turnover_t\}$ to the right hand side, to account for seasonality. Our choice of the set of instruments was guided both by the Hansen test of overidentifying restrictions, and the Arellano-Bond (AB) test of autocorrelation.
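Why instrumenting with past values matters when training is countercyclical can be illustrated on synthetic data. The sketch below rests on simplifying assumptions that are not the paper's specification: a single regressor, a true coefficient of 1, an i.i.d. productivity shock (so that the first lag of training is already a valid instrument, whereas the serially correlated errors in the text call for deeper lags), and hypothetical names `simulate` and `cov`.

```python
import random


def simulate(n, beta=1.0, phi=0.7, lam=0.5, seed=1):
    """Synthetic panel-free example: training falls when the shock is high."""
    rng = random.Random(seed)
    tr_prev, rows = 0.0, []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)  # productivity shock, i.i.d. by assumption
        tr = phi * tr_prev - lam * a + rng.gauss(0.0, 1.0)  # countercyclical
        rows.append((beta * tr + a, tr, tr_prev))  # (outcome, TR_t, TR_{t-1})
        tr_prev = tr
    return rows


def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


rows = simulate(20000)
y, tr, tr_lag = (list(col) for col in zip(*rows))

beta_ols = cov(tr, y) / cov(tr, tr)         # biased towards zero
beta_iv = cov(tr_lag, y) / cov(tr_lag, tr)  # lagged training as instrument
print("OLS:", round(beta_ols, 3), "IV:", round(beta_iv, 3))
```

In this setting OLS understates the effect of training because training and the shock are negatively correlated, while the estimate that uses lagged training as an instrument stays close to the true coefficient.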
For the baseline model shown here, the Hansen test does not reject the null of validity of the set of instruments (p = 0.52), and the AB test shows no autocorrelation in the errors at lag 6 (p = 0.16) and higher. We then use lags 6 to 12 of $X_t$ as instruments, as well as lags 0 to 4 of changes in the (log) price of oil and the money supply (M2), which are used as standard instruments.[8] Seasonal and quarter dummies, and lagged levels of the explanatory variables, are omitted from the Table.

[8] In most of the baseline regressions, these standard instruments were dropped due to collinearity.

In this Table we perform a general functional form search for $f$ that will shed light on hypotheses 1-3. Note first that the coefficient on hours worked is higher, and the coefficient on electricity is smaller, than those reported in the literature (see for instance Table 4, col. 2 in Burnside et al, 1995), but these estimates are consistent across regressions, and they are similar to the estimates obtained using the annual KLEMS dataset.

Hypothesis 1: The first column suggests that aggregate training is a poor measure of productivity enhancing activities, as the coefficient on AGGT is not significant. Adding a quadratic, and more or fewer lags and powers, did not change this result, nor did interaction terms between lags. The next two models use OFFJT (IV2) and OJT (IV3) as the training measure. In IV2, OFFJT is not significant, and no lags or powers of off-the-job training turned out to be significant, so this variable was excluded from the estimation. For IV3, the coefficient on OJT is not significant either, but adding a quadratic (IV4) makes both the linear and quadratic coefficients significant. Our preferred specification is then IV4. For comparison, we add in columns 5 and 6 results for the model estimated by OLS and fixed effects. Note that the coefficients on training increase substantially when the endogeneity of the explanatory variables is accounted for. Since training is countercyclical, it is negatively correlated with productivity growth, generating a downward bias in the estimates. Moreover, in some sectors training may be higher due to unmeasured factors that also contribute to productivity growth, causing the FE estimates of training to be smaller than the OLS estimates.[9]

[9] We added hours in on-the-job training to the baseline specification, which contains OJT incidence. These two variables could be interpreted as the effect of increasing training on the extensive margin (incidence) by adding more workers to training programs, versus increasing training on the intensive margin (hours) by training the same number of workers for a longer time. Hours in training, however, do not seem to add any information, as coefficients are invariably insignificant. The coefficients on hours in training are also insignificant when hours substitutes for incidence in the baseline specification. These results suggest large measurement errors in hours in training, so we omit this variable from the analysis.

Note that in the baseline regressions in columns 1 to 4, experience and education are never significant. This is probably due to the strong correlation of both variables with time, and the use of time dummies in all regressions. That OFFJT has no discernible effects on productivity growth is somewhat puzzling, as the literature examining the wage effects of training consistently finds positive wage effects of this type of skill acquisition (see, e.g. Lowenstein and Spletzer, 1998). One possible explanation is that the OFFJT programs in our data have mainly a signaling role, which more productive workers use to increase their wages. An alternative explanation, of course, is that our data on OFFJT contain too much noise to be of use.

We further assess the robustness of these results by reclassifying the training category "Seminar or training program outside of work" as OJT, and running models IV2 and IV3 with the redefined training measures, labelled $OFFJT_{alt}$ and $OJT_{alt}$. The results, displayed in Table 5, are qualitatively similar to those with the baseline definitions of on-the-job and off-the-job training: the coefficients on OJT are significant, while those on OFFJT are not.

Hypothesis 2: We expect that training has a lagged effect on productivity but, as became clear in the discussion above, we find that only current quarterly measures of training affect current human capital. One possibility is that a quarter is too long a period to pick up this effect. To obtain evidence on this point, we disaggregate OJT into a measure of OJT in the first month of the quarter ($OJT_{first}$), and OJT in months 2 and 3 ($OJT_{late}$). Regression IV4 was then replicated with $OJT_{first}$ and $OJT_{late}$ (and quadratics) replacing OJT. The results show a coefficient of 1.1 on $OJT_{first}$ (p = 0.06), and a positive (0.64) but not significant (p = 0.22) coefficient on $OJT_{late}$, suggesting indeed a learning lag shorter than a quarter. Quadratics have negative and non significant coefficients.

Hypothesis 3: We now examine Hypothesis 3 in the context of the baseline model (IV4, Table 4).
As noted in section 2, the coefficients on training and training squared in that model need to be interpreted as the reduced form coefficients of a linear quadratic approximation around $\log f(TR_t, exp_t, \delta_t)$, but we are really interested in the concavity (in $TR$) of the function $f$ itself. In evaluating this approximation, it makes sense to use $TR_0 = \overline{TR}$, but because the exposition is simpler we use $TR_0 = 0$ to explain the procedure. In this case, we have $\beta_{TR} = \alpha \frac{f_{TR}}{f}$ and $\beta_{TR^2} = \alpha \frac{f_{TR^2} f - f_{TR}^2}{2f^2}$, so $\frac{\beta_{TR}^2 + 2\beta_{TR^2}\beta_{hours}}{\beta_{TR}} \propto \frac{f_{TR^2}}{f_{TR}}$ gives us the sign of $f_{TR^2}$ ($f_{TR}$ is positive since $\beta_{TR} > 0$). We obtain a similar expression when expanding around the mean. The point estimate of this expression is negative, and a Wald test rejects the null that it is equal to zero (p = 0.056) against the alternative that it is negative, so we find evidence in favor of hypothesis 3 (see footnote 10). Using the above expression, we obtain a measure of concavity defined by $\frac{f_{TR^2}(\overline{TR})\,\overline{TR}}{f_{TR}(\overline{TR})}$, which takes a value of -0.85. If the function $f$ were a simple power function $f(TR_t) = a + b\,TR_t^{\eta}$, this measure would equal $\eta - 1$, which implies a coefficient $\eta$ of 0.15.

Since we would like to compare our results with those in Dearden et al (2006), it is important to assess whether the differences are due to the data or to the modeling and estimation choices. To do so, we must estimate a model similar to the baseline model in that paper (see footnote 11). One such model is

$$\log(\text{value added})_t = \beta_0 + \beta_1 elec_t + \beta_2 hours_t + \beta_3 TR_t + \beta_4 exper_t + \beta_5 exper_t^2 + \beta_6 hgc_t + \beta_7 turnover_t + \epsilon_t \quad (12)$$

with an error structure similar to that of model M1 (eq. (8)). Equation (12) is a levels equation in hours, electricity, and training. Training is then used as a measure of the stock of human capital, while in the model used in this paper training is a flow that augments the stock of human capital. We believe our specification has the advantage of being better rooted in theory, but at the same time it requires differencing the dependent variable and a number of explanatory variables, and is therefore prone to amplifying the measurement error that most likely exists in these variables. A model in levels, by contrast, requires for identification that training be an unbiased measure of the unobserved part of the stock of human capital. As in the original paper, we estimate this model by System GMM, adding lags one and four of all variables to the right hand side. Table 6 summarizes the results. In both specifications the dependent variable is log Prod. Note that aggregate training (column 1) has no discernible effect on productivity, but OJT (column 2) has a positive and statistically significant effect, just as in our baseline specification.
A 10 percentage point increase in OJT results in a 0.39% increase in productivity, a full order of magnitude smaller than both the estimate in our baseline specification (3.7% for a similar increase) and the 6% estimate by Dearden et al (Table 2, p.412).

A concern with the previous results is that neither N nor T might be large enough in our data to justify appealing to the central limit theorem. Judson and Owen (1996) use Monte Carlo methods to study the performance of, among others, the Anderson-Hsiao and System GMM estimators in datasets with dimensions similar to ours, which are relatively common in macroeconomics. Their estimates for the bias on the regression coefficients range from 0 to 0.5% of the coefficient value when the Anderson-Hsiao approach is used, for N = 20 and T = 20. When the System GMM approach is used, the authors' estimates of the bias when N = 20 and T = 20 range from 0.4 to 4% (Table 4, p.13). We find these results reassuring in that the estimators used here are appropriate given the dimensions of our dataset.

10 If the approximation is evaluated around the mean, we have $\beta_{TR^2} = \alpha \frac{f_{TR^2} f - f_{TR}^2}{2f^2}$ and $\beta_{TR} = \alpha \frac{f_{TR}}{f} - 2\overline{TR}\,\beta_{TR^2}$, where the function $f$ and its derivatives are evaluated at $TR = \overline{TR}$.

11 The baseline model estimated in that paper is $\log(Prod)_t = \beta_0 + \beta_1 TR_t + \beta_2 (\text{capital/worker}) + \beta_3 (\text{hours/worker}) + (\text{OTHER CONTROLS})_t + \epsilon_t$, where OTHER CONTROLS include occupation, experience, and firm size.

4.2 Training and wages

The same model used to examine the productivity effects of training can be used to study its effects on wage growth. Table 7 reports the results of estimating model M2, with log(Prod) replaced by log(wages). The results show a coefficient on OJT that is statistically insignificant in the baseline regression (IV1). Adding OJT_sq, or using more or fewer lags of $X_t$ as instruments, did not modify this result. By contrast, the OLS estimates show a small but positive and significant coefficient. Since OLS biases the estimates of training towards zero, this result does have some relevance. Although we had wished for more precision in the estimates for training, we believe that the estimate on OJT may reflect in part the firm specific character of the skills learned in in-shop training programs. Following Becker (1964), we should observe that firms pay for the costs associated with the acquisition of firm specific skills, and also receive all the benefits from it.
In this case, the coefficient on the training variable in a wage equation should be zero. The point estimate of the coefficient on OJT in regression IV1 is not robust to small changes in the instrument set, varying from -0.2 to 0.4, and is never significantly different from zero. However, note that we have no direct evidence about the firm specific versus general skill content of the training programs in OJT. An alternative explanation for these IV results is that our data on wages are simply not of good quality. To address this point we use our data set to run the models developed in Dearden et al (2006). In that paper, the authors report a positive and significant coefficient on the training variable when estimating a (log) wage equation, similar to that in footnote 11, by System GMM. We replicate these results in Table 8. Note that in this case both AGGT and OJT (but not OFFT) have a positive and significant effect on wage levels. Moreover, the coefficient on AGGT is quite similar to that of Dearden et al (0.31 and 0.35 respectively). The estimate of the effects of OJT is larger in this wage equation than in the productivity equation (column 1, Table 6). It is worth noting that, if we take these results as reflecting the effects of training on wage and productivity growth, they only imply that wages increased more than production as a result of training, and therefore that training increased the share of the wage bill in output over this period.

4.3 Discussion

It is time to take stock of the results so far. We have obtained two sets of estimates that predict very different quantitative productivity and wage effects of training, which in turn are different from the results by Dearden et al. In this subsection we discuss these effects in the light of an optimizing model. We conclude that both our baseline estimates in the productivity and wage equations, and those by Dearden et al, imply very similar estimates of the costs of training programs. By contrast, the System GMM results on US data imply implausibly small estimates of these costs. Evaluated at the mean, a 10 percentage point increase in training increases productivity by 3.7% in our baseline estimates (Table 4, column 4), by 6% in the estimates by Dearden et al (Table 2, p. 412), and by 0.39% when the Dearden model and methods are used in our data (Table 6). The same increase in training would have no effect on wages in our baseline specification (Table 7), and would raise wages by 3.5% in the Dearden paper (Table 2, p.
412), and 2.5% in our alternative model estimated by System GMM (Table 8).

To obtain a better intuition for the plausibility of these effects, we examine a version of the firm's problem. In our model, firms capture a fraction $s$ of the benefits of training, and pay a cost $C$ per hour of training. Training depreciates through obsolescence and has effects for only $T$ periods. A simple way to capture this is to define human capital as $H_t = \sum_{i=0}^{T-1} f(TR_{t-i})$. Because we assume that the production function has constant returns to scale, we can express all variables on a per hour basis. Current per hour capital services and training, as well as next period's human capital, are chosen to maximize

$$\Pi = \sum_{t=0}^{\infty} \left(\frac{1}{1+r}\right)^t \left( y(H_t) - C_t TR_t - w_t (1-s) H_t \right) \quad (13)$$

s.t.

$$H_t = \sum_{i=0}^{T-1} f(TR_{t-i}) \quad (14)$$

where $r$ is the interest rate, and $H_0$ is given. Note that the firm pays a wage on a fraction $(1-s)$ of the worker's human capital, and therefore obtains a share $s$ of the benefits of training, which last for $T$ periods. The model then captures both the sharing of benefits from higher human capital between workers and firms and the obsolescence of human capital. The variable $TR_t$ stands for training per hour of labor, of which our training variable (OJT) is a close measure, and $C$ is the cost of an hour of training. This model is consistent with a number of market arrangements, as it only assumes that (a) all on-the-job training is chosen (but not necessarily financed) by firms, and (b) the choice of training is consistent with profit maximization. In particular, it is consistent with the influential model of Acemoglu and Pischke (1999), where switching jobs is costly, so firms have an incentive to finance even general types of training since they will eventually capture part of the benefits. However, because we use the firm's problem described in eqs (13)-(14) as an accounting device, where the parameters are to be calibrated from data, it will not be necessary to take a stand on the nature of the economic environment that gives rise to training. From the first order conditions we obtain an approximate balanced growth relationship between the share of benefits appropriated by the firm, the elasticity of HC with respect to training $f'/f$, and the costs of training (see footnote 12):

$$\frac{1-\beta^T}{1-\beta}\,\frac{f'(TR)}{f(TR)}\,s = x \quad (15)$$

where $x$ is the cost of an hour of training in terms of the wage rate, and we use $\beta \equiv (1+r)^{-1}$. This expression says that, for the firm, the benefits it obtains from an extra hour of training are equal to the costs it bears from this extra hour.
The cost for the firm is represented by $x$, and the total benefits of training are $\sum_{i=0}^{T-1} \beta^i \frac{f'(TR)}{f(TR)} = \frac{1-\beta^T}{1-\beta}\,\frac{f'(TR)}{f(TR)}$, of which a fraction $s$ is appropriated by the firm.

12 See appendix 2 for a complete description of the problem.

Note that our Mincerian estimate of the effect of training in the wage equation pins down the share $s$ in expression (15). To see this, note that this estimate, which we can call $\beta_{TR,W}$, represents the proportion of (marginal) wages appropriated by the worker. Since wages are, in the US and UK, approximately 0.7 of output, the fraction of extra output appropriated by workers following a training spell is $0.7\beta_{TR,W}$, and therefore $s = 1 - 0.7\beta_{TR,W}$. An estimate of $\frac{f'(TR)}{f(TR)}$ is also readily available from the productivity equations (see footnote 13). By contrast, the only information we have on the costs of training programs and on the obsolescence of training skills is from indirect sources, but we can use our knowledge of $s$, $\beta$ and $\frac{f'(TR)}{f(TR)}$ to calibrate expression (15), infer what levels of costs of training programs are predicted by the three sets of estimates, and discuss whether these costs are reasonable based on the available evidence.

The NLSY survey has no information on the direct costs of training programs, but a firm level survey conducted every year by Training Magazine, a publication that caters to the training industry, has a wealth of information on such costs. The survey report, Training (2002), states that, on average, firms provide about 50 hours per year of on-the-job training to employees, or about 2% of total hours, and training budgets average 5% of the wage bill. This implies that the direct costs of an hour of training are about 2.5 times those of an hour of wages (5% of the wage bill spent on 2% of hours). In addition, firms pay for most of the time spent in training, so adding up direct and indirect costs gives an $x$ of 3.5. Figure 2 uses expression (15) to plot the unit costs of training $x$ as a function of the lifespan of training skills $T$. In this figure, we take $\beta = 0.99$. The three estimates of $s$ are quite similar: they are 0.76 (Dearden), 0.83 (System GMM on our data), and 1 (baseline).
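The calibration of expression (15) can be sketched numerically as follows. This is only an illustrative sketch, not the procedure used to produce Figure 2: the function and variable names are hypothetical, and it assumes that a 10 percentage point increase in training corresponds to a change of 0.1 in $TR$, so that the 3.7%, 6% and 0.39% productivity effects translate into values of 0.37, 0.60 and 0.039 for $\alpha f'(TR)/f(TR)$, and the 0%, 3.5% and 2.5% wage effects into values of 0, 0.35 and 0.25 for $\beta_{TR,W}$. The values $\alpha = 0.7$ and $\beta = 0.99$ are those stated in the text.

```python
# Sketch: back out the unit cost of training x from expression (15),
#   x = [(1 - beta^T) / (1 - beta)] * [f'(TR)/f(TR)] * s.
# All names are illustrative assumptions, not taken from the paper.

ALPHA = 0.7        # value used in the text to recover f'/f from alpha * f'/f
WAGE_SHARE = 0.7   # wages as a fraction of output, used in s = 1 - 0.7 * beta_TR_W
BETA = 0.99        # per-period discount factor used for Figure 2


def discounted_horizon(T, beta=BETA):
    """Closed form of sum_{i=0}^{T-1} beta^i."""
    return (1.0 - beta ** T) / (1.0 - beta)


def firm_share(wage_coef, wage_share=WAGE_SHARE):
    """Share s of the benefits of training kept by the firm."""
    return 1.0 - wage_share * wage_coef


def implied_cost(prod_coef, wage_coef, T, alpha=ALPHA, beta=BETA):
    """Cost of an hour of training, in hours of wages, implied by (15).

    prod_coef is alpha * f'/f from a productivity equation and
    wage_coef is beta_TR_W from the matching wage equation.
    """
    f_ratio = prod_coef / alpha  # f'(TR) / f(TR)
    return discounted_horizon(T, beta) * f_ratio * firm_share(wage_coef)


def observed_cost(budget_share=0.05, hours_share=0.02):
    """Survey-based x: direct cost per training hour plus the paid hour itself."""
    direct = budget_share / hours_share  # about 2.5 hours of wages
    return direct + 1.0


# (prod_coef, wage_coef) pairs under the assumptions stated in the lead-in.
ESTIMATES = {
    "baseline": (0.37, 0.0),
    "Dearden et al": (0.60, 0.35),
    "System GMM, our data": (0.039, 0.25),
}

if __name__ == "__main__":
    print("survey-based x:", round(observed_cost(), 2))
    for label, (prod_coef, wage_coef) in ESTIMATES.items():
        s = firm_share(wage_coef)
        x = implied_cost(prod_coef, wage_coef, T=8)
        print(f"{label}: s = {s:.3f}, implied x at T = 8: {x:.2f}")
```

Under these assumptions, the implied costs at $T = 8$ come out close to 4.1 (baseline), 5.0 (Dearden et al) and 0.36 (System GMM on our data), which can be compared with the survey-based value of 3.5. Calling `implied_cost` over a range of `T` traces out a cost schedule of the kind plotted in Figure 2.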
Moreover, even though the estimates in our baseline equation and those in Dearden et al (2006) are quite different, they imply very similar costs of training. For $T = 8$, the cost of an hour of training is 4.1 hours of wages in our baseline model, and 5.0 in Dearden et al (2006). This occurs because, although the returns from training are smaller in our data, the share of these returns appropriated by the firms is larger. For our System GMM estimates, the figure is about 0.36. Since firms pay for most of the training costs, and OJT programs are most probably held during working hours, a unit of training time should cost at least its equivalent in wages, which implies $x \geq 1$. This leads us to believe that our own System GMM estimates of the effects of training are implausibly low, while our baseline estimates, and those by Dearden et al (2006), are reasonable for small $T$: for $T$ larger than 20 (5 years), both cost estimates are larger than 10.

13 From the productivity equations we obtain $\alpha \frac{f'(TR)}{f(TR)}$, and use $\alpha = 0.7$ to retrieve $\frac{f'(TR)}{f(TR)}$.

5 Conclusion

Using an industry level dataset that covers the period 1988-97, we first examined a number of hypotheses regarding the technology used to produce new human capital from investments in formal training programs. We found that on-the-job training increases productivity at the industry level, while off-the-job training programs have no effects. We also obtained evidence of concavity of the production function for human capital in current levels of on-the-job training. While we could not identify any effects of training programs on wage growth in our baseline model, a model where training is used as a proxy for human capital shows positive effects of both on-the-job and aggregate training on wage growth. Using a simple model of a firm that decides on training and labor input, we discussed what implications these results have for the costs of training programs. While our baseline estimates of these costs are reasonable, and are comparable to those in the closely related paper of Dearden et al (2006), we found that the costs implied by our own estimates of the model where training proxies for human capital are implausibly low.

In this paper we have characterized the productivity effects of one channel of learning, namely formal training programs, and their effects on wage growth. It is unclear, however, whether most learning occurs via this channel. In the US, in any given year about ten percent of workers enroll in such programs, but at the same time productivity growth is pervasive and affects in one way or another a large majority of workers.
The main alternative channel for learning new skills is learning by doing, and we believe that a theory that links both channels of learning, training and learning by doing, would be important to fully understand the process through which workers and firms become more productive. We plan to take up this issue in future work.

Acknowledgments

I wish to thank Daiji Kawaguchi, Gerhard Glomm, Alison Booth, and Rodrigo Aranda, as well as the editor and two referees, and seminar participants at SECHI, the University of Adelaide and the RSSS, Australian National University, for useful comments. All remaining errors are mine.

Funding

Comisión Nacional de Investigación Científica y Tecnológica (Anillos soc12/2007); Departamento de Investigaciones Científicas y Tecnológicas at the University of Santiago (DICYT 030762SP).

References

Acemoglu, D. and Pischke, S. (1999) Beyond Becker: training in imperfect labor markets, The Economic Journal, 109, 112-42.

Anderson, T.W. and Hsiao, C. (1982) Formulation and estimation of dynamic models using panel data, Journal of Econometrics, 18, 47-82.

Arellano, M. and Bond, S. (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, Review of Economic Studies, 58, 277-97.

Barlevy, G. (2007) On the cyclicality of research and development, American Economic Review, 97, 1131-64.

Barrett, A. and O'Connell, P. (1999) Does training generally work? The returns to in-company training, Working Paper 99-51, Institute for the Study of Labor (IZA).

Barro, R. and Lee, J.W. (1993) International comparisons of educational attainment, Journal of Monetary Economics, 32, 363-94.

Barron, J., Black, D., and Lowenstein, M. (1989) Job matching and on-the-job training, Journal of Labor Economics, 7, 1-19.

Bartel, A.P. (1994) Productivity gains from the implementation of employee training programs, Industrial Relations, 33, 411-25.

Basu, S. (1996) Procyclical productivity: increasing returns or cyclical utilization?, The Quarterly Journal of Economics, 111, 719-51.

Becker, G. (1964) Human capital: a theoretical and empirical analysis with special reference to education, Columbia University Press, New York, NY.

Bishop, J. (1994) The impact of previous training on productivity and wages, in L. Lynch (ed.) Training and the private sector: international evidence, National Bureau of Economic Research Comparative Labor Market Series.