Measuring Human Capital in Higher Education


Pietro Giorgio Lovaglio, Gianmarco Vacca and Stefano Verzillo

Pietro Giorgio Lovaglio, University of Bicocca-Milan and CRISP, piergiorgio.lovaglio@unimib.it
Gianmarco Vacca, University of Bicocca-Milan; gianvacca88@gmail.com
Stefano Verzillo, University of Milan and CRISP; stefano.verzillo@unimi.it

Abstract

The concept of Human Capital (HC) can be defined, from an economic viewpoint, as a stock variable representing the capacity of an individual to produce a sustained flow of income as a result of investment in (higher) education and work experience. This paper focuses on the empirical estimation of the latent variable HC, composed of two principal dimensions, Educational HC and Work Experience HC, within a realistic structural model that allows causal relationships among endogenous and exogenous indicators and takes into account the possible effects of concomitant indicators. To this end, new administrative archives and a novel methodological approach, called Generalized Redundancy Analysis, are used. The methodology is applied to estimate the HC of graduates of two universities in the Milan area in the early stages of their working career. The empirical results confirm the structure of the Italian job market, where work experience and the economic background of the family of origin play significant roles in explaining the economic performance of graduates in the labour market, contrary to investment in HC through higher education.

Keywords: human capital, redundancy analysis, concomitant indicators

1 Introduction

Over the last 50 years, the concept of HC has been systematically developed in the economic literature ([2]). Although most of the existing literature considers human capital to be generated by both education and labour market experience, it de facto uses these dimensions as proxies for the underlying individual stock of HC without attempting any quantitative estimation.

Especially when comparing different educational systems, economic researchers consider the human capital production function as part of an income maximization problem in which the schooling endowment (years or educational levels) represents a proxy of HC ([6]). By contrast, the amount of schooling should be considered an endogenous variable, possibly affected by other unobservable dimensions which together build up individual HC. Under this assumption, HC should be considered a complex construct composed of various dimensions, not always directly observable, that cannot be measured without loss of accuracy by a single attribute or by a set of attributes ([8]).

Dagum ([4]), in the Encyclopedia of Statistical Sciences, defined HC as a latent (i.e. non-observable) stock variable that represents the capacity of an individual (household, nation) to generate a sustained flow of earned income. In addition, the OECD report ([12]) defined HC as the knowledge, skills, competencies and attributes embodied in individuals that are relevant to economic activity.

In this framework, considering HC as a latent or composite variable, in accordance with recent economic theory ([5], [9]), means establishing a causal path for investment in HC that links formative observed variables (the causes of the latent or composite variable, such as investment in education) and reflective observed variables (the effects of the latent or composite variable, such as economic performance). In accordance with the HC literature, HC may be specified as a bi-dimensional latent composite (LC): Educational HC, whose unobservable scores depend on the formative indicators measuring education and academic performance, and work experience HC (hereinafter called Working HC), which measures the HC accumulated in the labour market during the working career. Further, since both HC dimensions have an impact on economic performance in the labour market, blocks of formative and reflective indicators have to be specified. However, exogenous covariates not directly linked to the latent components may often affect the causal relationships.
Indeed, such covariates could have a causal impact on the observed endogenous variables and/or on the LC. When this is the case, they are hereinafter called concomitant indicators. To this end, a set of concomitant/external indicators may be included in the model, with a causal impact on economic results and/or on Educational HC. Such indicators typically reflect opportunity factors in the formation of Educational HC, cultural elements and/or the family socio-economic background. Moreover, recent economic literature ([1], [3]) has begun to study the impact of the amount and quality of academic research on labour market outcomes for graduates, such as employability or economic advantages. The aim of this paper is to specify and fit a structural equation model that adequately represents the mechanism of HC accumulation during higher education aimed at improving economic performance. Section 2 presents the methodology; Section 3 illustrates the application to graduates of two universities in the Milan area in the early years of their working career; Section 4 offers conclusions.

2 Measurement model for the Latent Variable HC

Since the objective is to model HC as an LC with formative, reflective and concomitant indicators, recent methodologies such as Extended Redundancy Analysis (ERA, see [13]) and its extension, Generalized Redundancy Analysis (GRA, see [11], [14]), may be used. ERA only allows the inclusion of observed covariates having direct effects on endogenous variables, but excludes concomitant indicators. GRA overcomes the

drawbacks of the ERA method with direct effects, allowing concomitant indicators to be specified and the model to be fitted by minimising a well-defined loss function. Assuming q observed exogenous variables (X), p observed endogenous variables (Y), k concomitant indicators (T) and d composites (F), direct effects can be accommodated in the following SEM-RA specification:

$Y = T A_Y' + F A' + E, \qquad F = X W + T W_T$   (1)

under the restrictions $\mathrm{diag}(F'F) = I_d$ and $d \le \min(k+q, p)$, where $W$, $W_T$ and $A_Y$, $A$ are matrices of weights and loadings, respectively, and $E$ is the error matrix. In order to separate the contribution of strictly formative and concomitant indicators, instead of the measurement model for F in Eq. (1) we adopt an equivalent specification:

$F = X \tilde{W} + \tilde{T} W_T, \qquad \mathrm{diag}(F'F) = I_d$   (2)

in which X is orthogonal to $\tilde{T}$, where $\tilde{T} = T - X X^{+} T$, $\tilde{W} = W + X^{+} T W_T$ and $X^{+}$ is the Moore-Penrose generalized inverse of X obtained through the singular-value decomposition of X. Substituting Eq. (2) into Eq. (1), the associated loss function is

$SS[Y - (T A_Y' + F A')] = SS[Y - (T A_Y' + X \tilde{W} A' + \tilde{T} W_T A')]$   (3)

where $SS[Z] = \mathrm{trace}(Z'Z)$. Since $\tilde{W}$, $W_T$ and/or $A$ may contain prescribed, fixed (zero) elements, depending on the specified model, the minimization of the loss function (3) cannot be achieved in closed form. Hence, we use an iterative method, employing the alternating least-squares (ALS) algorithm developed by Kiers and ten Berge ([7]). In the algorithm, the matrices $A_Y$, $\tilde{W}$, $W_T$ and $A$ are alternately updated in four steps until convergence is reached. In the first step, we update $\tilde{W}$ independently of $W_T$ for fixed $A$ and $A_Y$. In the second step, $W_T$ is updated for fixed $A$ and $A_Y$, with $\tilde{W}$ obtained in the first step. Further, through Eq. (2) we can obtain the weight matrix $W$ in its original specification (as in Eq. 1) by the back-transformation $W = \tilde{W} - X^{+} T W_T$.
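The equivalence between the two measurement models, and hence between the two sides of the loss function (3), can be checked numerically. The following is only a minimal sketch on random matrices: the dimensions, the random seed and all variable names are illustrative assumptions, and the normalisation of F is ignored because it does not affect the identity being checked. It is not the authors' implementation of GRA.

```python
import numpy as np

# Hypothetical sizes: n units, q formative, k concomitant,
# d composites, p endogenous variables (illustration only).
rng = np.random.default_rng(0)
n, q, k, d, p = 50, 3, 2, 2, 4

X = rng.normal(size=(n, q))      # formative indicators
T = rng.normal(size=(n, k))      # concomitant indicators
Y = rng.normal(size=(n, p))      # endogenous variables
W = rng.normal(size=(q, d))      # weights of X (original specification)
W_T = rng.normal(size=(k, d))    # weights of T
A = rng.normal(size=(p, d))      # loadings of the composites
A_Y = rng.normal(size=(p, k))    # direct effects of T on Y

# Moore-Penrose inverse of X (numpy computes it through the SVD).
X_pinv = np.linalg.pinv(X)

# Reparameterisation: T_tilde is the part of T orthogonal to X.
T_tilde = T - X @ X_pinv @ T
W_tilde = W + X_pinv @ T @ W_T

# Composites under the original and the reparameterised specification.
F_orig = X @ W + T @ W_T
F_repar = X @ W_tilde + T_tilde @ W_T


def ss(Z):
    """Sum of squares SS[Z] = trace(Z'Z)."""
    return np.trace(Z.T @ Z)


# Both sides of the loss function.
loss_orig = ss(Y - (T @ A_Y.T + F_orig @ A.T))
loss_repar = ss(Y - (T @ A_Y.T + X @ W_tilde @ A.T + T_tilde @ W_T @ A.T))

# Back-transformation to the original weights.
W_back = W_tilde - X_pinv @ T @ W_T

print(np.allclose(X.T @ T_tilde, 0.0))   # X is orthogonal to T_tilde
print(np.allclose(F_orig, F_repar))      # same composites
print(np.isclose(loss_orig, loss_repar)) # same loss value
print(np.allclose(W_back, W))            # original weights recovered
```

Because X is orthogonal to the transformed concomitant block, the least-squares update of the transformed weights of X does not depend on the weights of T, which is what allows the first two steps of the algorithm to be carried out separately.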
In the third step, $A$ is updated for fixed $A_Y$ and fixed weight matrices, whereas in the fourth step $A_Y$ is updated for fixed $A$ and fixed weight matrices. The four steps are alternated until convergence is reached. Since weights and loadings have to be estimated without destroying their structure (e.g., the zero elements in the weight matrices, which depend on the observed variables that define each composite), each step first obtains least-squares estimates of the sub-matrices that contain only the non-zero elements, by eliminating the rows/columns corresponding to the zero elements, and then rebuilds the full estimated matrices by restoring the zero elements to their original positions. Finally, GRA allows the evaluation of the total fit of a hypothesized model, measured by the total variance of the observed endogenous variables explained by the exogenous variables. In the presence of concomitant indicators, since they belong to the exogenous variables just as formative indicators do, the fit index Ψ becomes

$\Psi = 1 - SS[\tilde{Y} - (T A_Y' + F A')] \, / \, SS[\tilde{Y}]$   (4)

where $\tilde{Y}$ includes all endogenous variables. These may be represented by a single block (see Eq. 1, where $\tilde{Y} = Y$) or by multiple blocks (when, apart from the strictly endogenous variables Y, a component $f_1$ has a causal impact on a block of formative variables $X_2$; here the endogenous block may be represented by $\tilde{Y} = [X_2 \; Y]$). This fit index Ψ ranges

from 0 to 1. The larger the fit value, the larger the share of variance of the endogenous variables explained by the exogenous variables.

3 Application

This section describes the main steps to specify and fit the HC accumulated during higher education by graduates enrolled in two universities of the Milan area, and its effect on the labour market during the recent economic crisis. Data are drawn from the combination of different institutional databases. The administrative archives of two (blinded) universities of the Milan area compose the first dataset, which includes socio-demographic characteristics and academic performance information for all graduates who obtained their degree in [...]. We considered only the postgraduate level (two further years in addition to the undergraduate level), for the following faculties: Law, Economics, Political Sciences, Sociology, Natural Sciences and Psychology. The archive of the Employment Centres is the second source; it offers information on workers who are active, after graduation, in the labour market of the Milan area. The third dataset is the Italian National Revenue Agency archive, which adds data on annual gross earned income for workers residing in this area. Finally, different institutional and web sources (MIUR, CIVR and ISI Web of Knowledge) were searched to obtain standard measures of the quantity and impact of the research produced by researchers in both universities. Following the arguments of Section 2, the structural model specified for the HC application is depicted in Figure 1, and the observed variables are listed in Table 1. Note that all variables refer to individual graduates, with some exceptions: variables $x_4$ to $x_8$ are aggregated by faculty, and specifically variables $x_4$ to $x_7$ refer only to teaching and research staff (assistant, associate and full professors) affiliated with the graduate's faculty.
Since $x_4$ to $x_8$ are very highly correlated, an aggregate measure of the quality of research in the faculty ($c_1$), the first principal component of these variables, is used. The data consist of 2,838 graduates in 2007 whose career paths, in terms of declared income and working experience (days from graduation to the first occupation, and working saturation, defined as the number of days worked over the potential number of working days in each year), were observed until 2009 (declared income at 2010). Academic variables refer to the year of graduation (2007), the observed labour market information refers to the period [...], whereas earned incomes from 2007 to 2009 refer to annual declarations from 2008 to [...]. The selected graduates are mainly male (62%); 97% are between 23 and 28 years old, 63% have a scientific high-school background, 92% obtained a degree in science, economics or technical faculties, 57% come from families with low-medium economic status and 63% are full-time students. The low values of gross earned income (mean = 18,806, median = 20,270, first (third) quartile = 11,322 (25,232)) reflect post-graduation incomes for graduates at the very beginning of their careers. Further, one (two) years after graduation 33% (39%) of graduates had a permanent contract. To fit the structural model, two GRA models are estimated: the first without the concomitant indicator and the second with the specified concomitant indicator. Table 2 illustrates the results (estimated standardized parameters and p-values,

obtained with nonparametric bootstrap re-sampling using 500 replications). Table 2 reports both the full models, involving all structural relationships, and the final models, which report the estimated parameters only for relationships significant at the 0.10 level.

Table 1: Specific indicators for Educational HC and Working HC

Formative indicators for Educational HC ($f_1$)
$x_1$  Expected graduation age / Graduation age
$x_2$  Graduation mark - cohort graduation mark (graduates in the same year and faculty)
$x_3$  Condition at graduation (1 = full-time student, 0 = student worker)
$x_4$  Average number of citations per researcher in the faculty in 2007
$x_5$  Average impact factor (5 years Journal Citation Report) per researcher in the faculty
$x_6$  Number of projects financed by MIUR over the period [...] in the faculty
$x_7$  Number of ISI publications per researcher in the faculty in 2007
$x_8$  Faculty percentage of foreign students in 2007

Formative indicators for Working HC ($f_2$)
$s_1$  Number of days from graduation to the first occupation
$s_2$  Saturation of the observed career (worked days / potential working days)
$s_3$  Saturation variation from [...] to [...]

Concomitant indicator
$t_1$  Economic status of the family of origin (10 ordinal categories)

Reflective indicators for $f_1$ and $f_2$
$y_1$  Last (declared) gross annual income
$y_2$  Average daily income (self-employed)
$y_3$  Average daily income (subordinate work)

Figure 1: Path diagram for the multi-dimensional HC. Structural model without the concomitant indicator $t_1$ (right) and with the concomitant indicator $t_1$ (left).

Interestingly, GRA selects the same significant paths in both models (with and without $t_1$), and the estimated coefficients for parameters that do not involve the concomitant indicator are quite robust across the two. Concerning the interpretation of the final model without $t_1$, the formative indicators for Educational HC ($f_1$) have signs in accordance with expectations. However, only

indicators such as individual ability, measured by graduation mark ($x_2$), and being a full-time student ($x_3$) are significant, meaning that only those indicators succeed in generating the linear composite Educational HC (by means of component weights) that has a significant effect on the specified endogenous variables. By contrast, age at graduation ($x_1$) and the quality of academic research of the faculty's teaching staff ($c_1$) are not statistically significant.

Table 2: GRA full and final models with and without the concomitant indicator

Columns (each with Est. and Sign.): Full Model (no concomitant), Final Model (no concomitant), Full Model (with concomitant), Final Model (with concomitant). Only the "<0.001" significance entries are legible in this transcription; the estimates, the other p-values and the FIT (Ψ) values are missing. An underscore marks effects not included in the models without the concomitant indicator.

Effect                Legible Sign. entries
$x_1 \to f_1$         none
$x_2 \to f_1$         <0.001 in all four models
$x_3 \to f_1$         <0.001 in all four models
$c_1 \to f_1$         none
$s_1 \to f_2$         none
$s_2 \to f_2$         <0.001 in all four models
$s_3 \to f_2$         <0.001 in three models
$f_1 \to y_1$         <0.001 in all four models
$f_1 \to y_2$         none
$f_1 \to y_3$         none
$f_2 \to y_1$         <0.001 in all four models
$f_2 \to y_2$         none
$f_2 \to y_3$         none
$t_1 \to f_1$         _ (no concomitant); none
$t_1 \to y_1$         _ (no concomitant); <0.001 in both models with concomitant
$t_1 \to y_2$         _ (no concomitant); none
$t_1 \to y_3$         _ (no concomitant); none
FIT (Ψ)               values missing

A second piece of empirical evidence is the inverse relationship between Educational HC and the entire block of endogenous variables, although only the effect on $y_1$ is significant, whereas Working HC ($f_2$) shows a large, direct impact on earnings. The large effect of Working HC on income, together with the advantage enjoyed by graduates who worked while studying, indicates that the labour market in the early stage of the career path rewards work experience, including that gained during academic studies, whereas investment in Educational HC (and thus academic performance and quality of institutions) carries a negative monetary premium (this negative sign was also confirmed by fitting the same model on full-time students only). Although this may appear counterintuitive, to our knowledge no empirical research on Italian data using actual incomes (rather than incomes declared in sample surveys) exists to contradict this evidence. Stratifying graduates by faculty may provide further evidence. However, another main aspect was considered.
As typically occurs with administrative data, since the specified model can only be estimated for graduates who are employed and have a positive

income, if they are not a random sample of all graduates (both employed and unemployed), then the estimated monetary effect of education will be biased. The available data substantiate a negative selection in our sample of wage earners: workers sort themselves into lower-paying work experiences, in accordance with previous research based on Italian administrative archives ([11], [9]). Specifically, the variable $x_2$ has mainly negative values, meaning that the selected graduates with observed income have graduation marks (significantly) lower than those of the entire cohort of 7,204 graduates who, for various reasons, do not appear in the National Revenue Agency database after graduation ([...]) and thus have missing income. The comparison of the two groups of graduates on other characteristics, such as graduation age and time to completion of studies, confirms this picture. Moreover, using the well-known two-step approach proposed by Heckman (see [6]), we found a negative and strongly significant effect of the inverse Mills ratio, added as a regressor in the Mincerian wage equation (where log-income depends on age and its square, gender, graduation mark, faculty, saturation of the observed career and age at graduation) to control for sample selection, indicating serious negative selection bias. If the available graduates sort themselves into lower-paying work experiences (whereas other graduates may attain the highest level of education, further specialisations or apprenticeships) and have worse academic performance, the negative impact of investment in HC on earnings seems reasonable.
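For readers unfamiliar with the correction term used above, the inverse Mills ratio at a value z of the first-step (probit) selection index is φ(z)/Φ(z), the standard normal density divided by the standard normal distribution function. The sketch below only illustrates this formula with the Python standard library; the function names and the example index values are hypothetical and are not taken from the paper's data or estimation code.

```python
import math


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z):
    """phi(z) / Phi(z) for a selected unit with selection index z.

    Intended for moderate values of z; for very negative z, Phi(z)
    underflows towards zero and the ratio is numerically unstable.
    """
    return norm_pdf(z) / norm_cdf(z)


# Hypothetical selection-index values, for illustration only.
for z in (-1.0, 0.0, 1.0):
    print(z, round(inverse_mills_ratio(z), 4))
```

The ratio decreases as the selection index grows, so graduates who are less likely to be observed with an income receive a larger correction term in the second-step wage equation.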
Concerning the formation of Working HC, positive and significant effects are associated with the working saturation indexes, in terms of both saturation of the whole observed career and saturation variation ($s_2$, $s_3$), whereas the time to the first occupation ($s_1$) is not significant. This means that higher working saturation and larger improvements in it generate larger Working HC component scores, which have significant and positive effects on income. To sum up, both HC dimensions show significant but opposite effects on declared income ($y_1$), with a larger, positive effect exerted by Working HC. In a second step of the analysis, the economic status of the family of origin ($t_1$) is included as a concomitant indicator (left part of Figure 1) and the model is estimated using GRA. The last four columns of Table 2 show that family background has no significant impact on Educational HC, whereas it has a strong effect on declared income; therefore, since the family economic background acts only as an external covariate (rather than as a concomitant indicator) with a direct effect on the endogenous variables, the parameters of the final GRA model reflect the estimated relationships net of the family background effect. Specifically, for the selected graduates, parents' economic status significantly affects annual income, whereas it has a positive but marginal effect on the accumulation of Educational HC. Further, note that, controlling for family economic background, the impact of Educational HC on annual income ($f_1 \to y_1$) becomes weaker, as expected, whereas the effect of Working HC on income does not change. Finally, both models have similar fit.

4 Conclusions

In this paper our main purpose was to estimate HC as a multidimensional latent variable composed of education and work experience dimensions in structural equation models, using GRA, a generalized version of ERA that allows direct effects and/or concomitant indicators, typically relevant in applications, to be specified.
The inclusion of such effects provides meaningful and realistic structural equation models in accordance with economic theory. As the application has shown, the effects of self-selection have to be

accurately taken into account in HC applications, especially when administrative data are used. The second innovative feature of this study is the use and integration of different institutional administrative archives: the university archives, the database of the National Revenue Agency, the database of the Employment Centres, and institutional and web bibliometric databases used to measure, albeit as a proxy, the quantity and impact of the research produced by teaching staff. The merged database collects several indicators on the labour market, but only limited information on individual characteristics and the household of origin, such as parental education. Other limitations of this study concern mainly the restricted number of universities and faculties examined and, especially, the very short observation period for the observed career (two years after graduation). Ideally, following cohorts of graduates over some decades and across different universities would give a more complete picture. Moreover, which observed indicators really measure graduates' HC remains an open question. However, the application has shown the feasibility of a consistent way to estimate graduates' HC and its impact on the labour market in the first stage of their professional career. The empirical results not only confirm the weak effect of academic HC on economic performance in the Italian labour market, but also show a worse picture for recent graduates observed at the beginning of their careers during the recent economic crisis.

References

1. Aghion, P., Dewatripont, M., Hoxby, C., Mas-Colell, A., Sapir, A.: Governance and performance of research universities: evidence from Europe and U.S. NBER 14851 (2009)
2. Becker, G.S.: Human Capital. Columbia University Press and NBER, New York (1964)
3. Ciriaci, D., Muscio, A.: University choice, research quality and graduates' employability: Evidence from Italian national survey data. AlmaLaurea Working Papers No. 49 (2010)
4. Dagum, C.: Human Capital. Encyclopedia of Statistical Sciences. Wiley & Sons, 1-12 (2004)
5. Dagum, C., Vittadini, G., Lovaglio, P.G.: Formative indicators and effects of a causal model for household human capital with application. Economet. Rev. 26(5) (2007)
6. Heckman, J.J., Lochner, L.J., Taber, C.: Explaining rising wage inequality: explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents. Rev. Econ. Stat. 1, pp. 1-58 (1998)
7. Kiers, H.A., ten Berge, J.M.: Alternating least squares algorithms for simultaneous components analysis with equal component weight matrices for all populations. Psychometrika 54 (1989)
8. Le, T., Gibson, J., Oxley, L.: Cost- and income-based measures of human capital. J. Econ. Surv. 17 (2003)
9. Lovaglio, P.G.: Process of accumulation of Italian human capital. Struct. Change Econ. Dynam. 19 (2008)
10. Lovaglio, P.G., Verzillo, S., Mezzanzanica, M.: Estimation of educational returns using university and labor market administrative archives. Adv. App. Stat. 23(2) (2011)
11. Lovaglio, P.G., Vittadini, G.: Component analysis for structural equation models with concomitant indicators. In: Giudici, P., Ingrassia, S., Vichi, M. (eds.) Statistical Models for Data Analysis. Springer, in press (2013)
12. OECD: Human Capital Investment: an International Comparison. Centre for International Research and Innovation, Paris (1998)
13. Takane, Y., Hwang, H.: An extended redundancy analysis and its applications to two practical examples. Comput. Stat. Data Anal. 49(3) (2005)
14. Vacca, G.: Evaluation of the economical impact of human capital with extended redundancy. University of Bicocca-Milan master thesis (2013)