
1 Labour Market Evaluation: Theory and Practice Seamus McGuinness 20th November 2015

2 Why is Evaluation Necessary? Evaluation assesses the extent to which policy initiatives are achieving their expected targets and goals. Drawing on this, the evaluator identifies the nature of any shortfalls in either programme delivery or the stated objectives. Value for money from the perspective of the taxpayer is also likely to be a dominant feature of any evaluation. Evaluation fulfils a vital policy-challenge role within society, helping to ensure that policy is evidence based and that ineffective programmes are modified or closed.

3 What are the most common forms of labour market evaluation? Generally, labour economists tend to focus on impact evaluation (is the programme achieving its desired impacts?); process evaluation (is the programme being delivered as intended?) is less common. However, in practice most impact evaluations will also consider the efficiency of programme delivery and implementation. The bulk of impact evaluations focus on labour market programmes designed to improve outcomes related to employment, earnings and labour market participation.

4 Main Barriers to Effective Independent Evaluation Lack of an evaluation culture: policy makers may view evaluation as a threat and actively seek a less rigorous form of assessment. The organisation being evaluated has the power to set the terms of reference and is invariably involved in choosing the evaluating body. Stemming from this, little consideration is often given to evaluation at the programme design and implementation stage (often leaving no viable control group against which to assess the counterfactual). Data constraints: a lack of available and linkable administrative datasets also makes proper evaluation difficult.

5 Measuring a programme's impact Not at all straightforward: there have been instances where different researchers have arrived at very different conclusions regarding a programme's impact. We basically need to know what would have happened to individuals had the programme not been in place, i.e. we attempt to measure the counterfactual. There are various methods for estimating the counterfactual; however, they all generally rely on measuring the difference in outcomes between people participating in the programme (the treatment group) and those eligible for the programme but not participating in it (the control group).

6 The Selection Problem Comparison of a treatment and control group is not straightforward, as substantial differences may exist between the two groups that must be factored out; assignment to either group is rarely random. Such differences can also arise as a consequence of ineffective control group construction. Non-random selection refers to the possibility that (a) programme administrators engaged in picking winners in order to ensure the programme's success, or (b) more capable individuals were more likely to put themselves forward for intervention. Failure to account for this will result in a serious over-estimate of the programme's effectiveness.
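The picking-winners problem can be made concrete with a small simulation (a hypothetical illustration, not data from any programme discussed here: the ability distribution, outcome equation and effect size are all invented). Assignment that favours more capable individuals inflates a naive treatment/control comparison well beyond the true programme effect.

```python
import random

random.seed(42)

# Hypothetical setup: each person has an underlying "ability" that raises
# their employment chances with or without the programme; the true
# programme effect is +0.10.
TRUE_EFFECT = 0.10

def employed(ability, treated):
    """Simulate an employment outcome as a biased coin flip."""
    p = 0.3 + 0.4 * ability + (TRUE_EFFECT if treated else 0.0)
    return random.random() < p

population = [random.random() for _ in range(100_000)]  # ability ~ U(0, 1)

# "Picking winners": administrators assign the more able half to treatment.
treated = [a for a in population if a > 0.5]
control = [a for a in population if a <= 0.5]

rate_t = sum(employed(a, True) for a in treated) / len(treated)
rate_c = sum(employed(a, False) for a in control) / len(control)

# The naive comparison bundles the ability gap (about 0.4 * 0.5 = 0.20)
# with the true effect, so it lands near 0.30 rather than 0.10.
naive_estimate = rate_t - rate_c
print(f"naive estimate: {naive_estimate:.3f}, true effect: {TRUE_EFFECT}")
```

The same logic drives both forms of non-random selection on the slide: whether administrators pick winners or the able self-select, the treatment group starts from a better position, and the raw comparison credits that head start to the programme.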

7 Programme design and control group delivery Piloting the programme. Rolling out the programme to different areas at different times. Ensuring access to administrative data on the targeted population (for instance, Live Register data). Keeping records of unsuccessful applicants to the programme in instances where the demand for programme places exceeds supply.

8 Ineffective Control Group Construction In evaluating the National Employment Action Plan (NEAP) in 2005, Indecon consultants compared a treatment group of 1,000 NEAP claimants (by definition first-time claimants) with a control group of 225 unemployed (non-NEAP) individuals taken from the ECHPS, 58% of whom were already long-term unemployed at the initial point of observation. By definition, none of the NEAP treatment group will have been long-term unemployed. Indecon then compared the unemployment rates of the control and treatment groups 24 months down the line and concluded that the treatment group fared much better and that the NEAP programme was therefore effective. Does this represent a like-for-like comparison?

9 Methods Used for Overcoming the Selection Problem Difference-in-Difference Estimator: a two-period estimator that requires the treatment to be introduced in the second time period. More powerful, as it will eradicate non-random selection based on unobserved attributes (picking winners, etc.). Matching Estimators: try to match control and treatment group members on observable characteristics (education, age, labour market history, etc.) to ensure a like-for-like comparison (consider the earlier NEAP example), but may still be prone to unobserved influences. Other methods exist, such as controlled experiments, but these are rarely seen in the context of labour market evaluation.

10 Difference in Difference Period 1: outcome Y (say earnings) is determined by observable characteristics X (age, education, labour market experience, etc.) and unobservable factors that do not change over time, I (innate ability, motivation, etc.). Period 2: the outcome variable is determined as in period 1, but say a labour market training programme (a treatment, T) is now present. By differencing across the same individuals in the two periods we can both isolate T and remove the impact of time-invariant (and often unobserved) factors.

11 Difference in Difference
Y_it0 = βX_it0 + I_i
Y_it1 = βX_it1 + I_i + δT_t1
Y_it1 − Y_it0 = β(X_it1 − X_it0) + δT_t1

12 Example of a difference in difference approach Say we plan to introduce a new unemployment activation measure in June 2013 in County Dublin. Our control group would be the rest of the country, which was not to receive the measure (until perhaps 2014). We would estimate a model comparing exits from unemployment in Dublin with the rest of Ireland over both periods. The extent of any change in the margin of difference in Dublin exit rates (relative to the rest of Ireland) over the two periods is interpreted as the impact of the programme.

13 Model Estimation
Y = β0 + β1·dt + β2·T + β3·(dt × T) + ε
β3 = (Y_treat2 − Y_treat1) − (Y_cont2 − Y_cont1)
dt is a dummy variable for the treatment group (Dublin area) and will pick up any differences between the treatment and control groups prior to the policy change. T is a dummy variable for time period 2 and measures the extent to which the value of Y rose or fell in period 2 independent of anything else. dt × T = 1 for those individuals in the treatment group receiving the intervention in the second period; its coefficient is therefore the measure of the impact of the policy.
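The double difference (the treatment group's change minus the control group's change) can be illustrated with a toy calculation. These are hypothetical exit rates, not the actual Dublin figures: Dublin is given a permanently higher baseline, both regions share a common period-2 trend, and the programme adds +0.05 in period 2 for Dublin only.

```python
# Hypothetical exit rates from unemployment (illustrative numbers only).
BASE = {"dublin": 0.20, "rest": 0.15}
TREND = 0.03        # common macro shift in period 2
PROGRAMME = 0.05    # true treatment effect

def exit_rate(region, period):
    rate = BASE[region]
    if period == 2:
        rate += TREND               # hits both regions
        if region == "dublin":
            rate += PROGRAMME       # hits the treatment region only
    return rate

# Double difference: (Y_treat2 - Y_treat1) - (Y_cont2 - Y_cont1).
# The baseline gap (0.05) and the common trend (0.03) both cancel out.
did = (exit_rate("dublin", 2) - exit_rate("dublin", 1)) \
    - (exit_rate("rest", 2) - exit_rate("rest", 1))

print(f"DiD estimate: {did:.3f}")  # prints "DiD estimate: 0.050"
```

A naive Dublin-versus-rest comparison in period 2 would report 0.13 (baseline gap plus trend plus programme); a naive before/after comparison within Dublin would report 0.08. Only the double difference isolates the 0.05 programme effect.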

14 Difference in Difference A really powerful tool for eradicating unobserved bias (picking winners, self-selection, etc.). Requires little data. Requires that policy be implemented in a rolled-out fashion, e.g. across regions and across time, which is not always appreciated by policy makers. Is it sufficient to deal with selection bias on observables?

15 Propensity Score Matching This technique allows us to deal explicitly with the problem of differences in the characteristic make-up of the control and treatment groups, which have the potential to bias our estimate of the programme impact. For example, say we have an active labour market programme aimed at reducing unemployment and the control group contains a higher proportion of long-term unemployed. Failure to control for this will upwardly bias the estimated programme impact, as the control group, almost by definition, have lower likelihoods of labour market success even before any programme effect. Basically, the chances are that if you compare the proportions of both groups in employment at a future point in time, in the absence of any labour market programme, the treatment group will have performed better. Thus the problem we must confront is that the estimated programme impact may simply be driven up by, or entirely attributable to, differences in the characteristic make-up of our control and treatment groups.

16 What does PSM do? It is a method that allows us to match the treatment and control groups on the basis of observable characteristics to ensure we are making a like-for-like comparison. After matching has been completed, we simply compare the mean outcomes (e.g. employment rates) of the control and treatment groups to see which is highest.

17 How do we match? We estimate a probit (1,0) model of treatment group membership. This identifies the main characteristics that separate the control group from the treatment group. Every member of the control and treatment groups is then given a probability of being assigned to the treatment group based on their characteristics. Each member of the treatment group is then matched with a member of the control group with a similar probability score; it can be shown that matching on the probability score is equivalent to matching on the actual characteristics. This process ensures that the treatment and control groups are similar in terms of their observable characteristics.
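The matching steps above can be sketched in a few lines (a simplified, hypothetical illustration: a single invented covariate, a logistic model standing in for the probit, and nearest-neighbour matching with replacement):

```python
import math
import random

random.seed(7)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical data: one covariate x (say, scaled prior unemployment
# duration). Treatment assignment depends on x, so the raw groups differ.
people = []
for _ in range(2000):
    x = random.random()
    is_treated = random.random() < sigmoid(2 * x - 1)  # selection on x
    people.append((x, is_treated))

treat = [x for x, t in people if t]
control = [x for x, t in people if not t]

# Step 1: fit a logistic model of treatment on x by gradient ascent
# (the slides describe a probit; logistic is a stand-in with the same logic).
b0 = b1 = 0.0
for _ in range(500):
    g0 = g1 = 0.0
    for x, t in people:
        resid = t - sigmoid(b0 + b1 * x)
        g0 += resid
        g1 += resid * x
    b0 += 0.001 * g0
    b1 += 0.001 * g1

def score(x):
    """Estimated propensity of treatment given x."""
    return sigmoid(b0 + b1 * x)

# Step 2: match each treated person to the control observation with the
# nearest propensity score (nearest neighbour, with replacement).
scored_controls = [(score(x), x) for x in control]
matched = [min(scored_controls, key=lambda c: abs(c[0] - score(x)))[1]
           for x in treat]

# Matching should shrink the covariate gap between the groups.
raw_gap = sum(treat) / len(treat) - sum(control) / len(control)
matched_gap = sum(treat) / len(treat) - sum(matched) / len(matched)
print(f"covariate gap before: {raw_gap:.3f}, after matching: {matched_gap:.3f}")
```

After matching, the treated and matched-control means of x are close, so a comparison of their outcomes would no longer be confounded by x. In practice one would match on many covariates at once, which is exactly what collapsing them into a single propensity score makes feasible.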

18 Matching Clearly, again, a powerful tool, and the most effective for tackling the sample selection problem. It requires a lot of data, and additional checks are needed to ensure that matching was successful and that all observable differences between the control and treatment groups were eradicated. It does not deal with unobserved bias.

19 Carrots without sticks: an evaluation of active labour market policy in Ireland Seamus McGuinness, Philip O'Connell & Elish Kelly

20 Overview This study focuses on assessing the effectiveness of the Job Search Assistance (JSA) component of the National Employment Action Plan (NEAP). The NEAP was Ireland's principal tool for activating unemployed individuals back into the labour market. Under the NEAP, individuals registering for unemployment benefit were automatically referred to FÁS for an interview after 13 weeks on the system. The FÁS interview aimed to help claimants back into work through advice and placement, and to refer others for further training. Individuals with previous exposure to the NEAP, i.e. those with a previous history of unemployment, were excluded and would not be referred to FÁS a second time. The NEAP was distinct in an international sense in that it was characterised by an almost complete absence of monitoring and sanctions. Unusually, it did not appear to hinge on the principle of mutual obligation.

21 Evaluation Objectives To assess the extent to which individuals participating in the NEAP were more likely to find employment relative to non-participants. To assess the extent to which individuals in receipt of both interview and training had enhanced employment prospects relative to those in receipt of an interview only (the impact of training). We are going to focus on the effectiveness of the referral and interview process.

22 Problem 1: No control group? Selection under the NEAP was automated and universal. If all claimants are automatically sent for interview at week 13 of their claim, then how can we construct a counterfactual? (Remember, the counterfactual assesses what happens to individuals in the absence of the programme.) The only people not exposed to the programme are those already in employment by week 13. This rules difference-in-difference out for a start. The problem illustrates very clearly that the need for proper evaluation was not a major consideration in the programme's design or implementation.

23 What can we do? The only option is to utilise the fact that individuals with previous exposure to the NEAP can't access it again (a totally counter-intuitive rule, as basically those most in need of support were being excluded from the outset). We take as an initial control group individuals who had exposure to the NEAP more than two years prior to the study and whose contact was limited to a FÁS interview. Given the time lapse and changing macroeconomic conditions, any advice received by the control group should have declined in relevance, allowing some assessment of the impact of the programme. Still, even if the above were true, we are left with a selection problem: prior to the study, all of the control group will have had a previous unemployment spell of at least 13 weeks, whereas none of the treatment group will. This difference cannot be eradicated by matching, and our estimates are unlikely to be free of bias.

24 Construction of the Evaluation Data The dataset for the NEAP evaluation was built by linking: profiling questionnaire information for the claimant population (issued June to September 2006); the weekly population of Live Register claimants (September 2006 to June 2008); Live Register claimant closure files; and FÁS event histories.

25 New Control Group Found? On linking the data, we found that around 25% of new claimants were not being referred by the DSP to FÁS after 13 weeks of unemployment, despite these individuals having no previous exposure to the NEAP. We needed to establish what was going on here: were we missing something in terms of the referral process and, if not, what factors were driving the omission and were they random? A list containing the PPS numbers of our potential new control group was sent to the DSP for validation.

26 Validation checks The DSP confirmed that these individuals had fallen through the net; no concrete explanation was found. Most likely, individuals were not referred when the number of referrals in a DSP office exceeded the slots in the local FÁS office, and they were subsequently overlooked when slots became available. Even before we begin, we have uncovered major problems with programme processes, i.e. 25% of potential claimants excluded and a further 25% missed. A clear example of how process evaluation becomes a component of an impact evaluation.

27 The control group: a natural experiment? From the entire claimant population in the NEAP evaluation database:
Treatment Group: new clients qualifying for NEAP intervention, and intervened with, who were on the register for at least 20 weeks (N = 4,111).
Control Group I: new clients qualifying for NEAP intervention but not contacted, who were on the register for at least 20 weeks (N = 1,678).
Control Group II: previous NEAP clients with light interventions and a two-year gap, who were on the register for at least 20 weeks (N = 3,074).

28 Data and methods In terms of econometrics, we estimate probit and matching models augmented by additional checks for unobserved heterogeneity bias. All models contain a wide range of controls for educational attainment, health, location attributes, access to transport, age, marital status, labour market history, etc., that were available to us as a consequence of the profiling data.

29 How random are our control groups? Is there a selection problem? [Table: gender and age-band breakdowns for the total sample, treatment group, control group I and control group II; values not recovered in transcription.]

30 [Table continued: marital status, children, education/training, literacy/numeracy problems, English proficiency, apprenticeship and transportation breakdowns for the total sample, treatment group, control group I and control group II; values not recovered in transcription.]

31 [Table continued: employment history (employed in last month, last year, last 5 years, over 6 years ago, never employed) for the total sample, treatment group, control group I and control group II; values not recovered in transcription.]

32 What are the descriptives telling us? The treatment group and control group I look very similar, which suggests that the process that generated control group I was random in nature. There are more substantial differences between the treatment group and control group II, in that the latter tends to be more disadvantaged in terms of observable characteristics. There is potential for selection bias here.

33 [Figure: Kaplan-Meier survival estimates for the treatment group, control group I and control group II.]

34 PSM Estimates (standard errors in parentheses; point estimates not recovered in transcription)
FÁS Interview (Nearest Neighbour) / FÁS Interview (Kernel)
Control Group I & II (Model 1): (0.018)*** / (0.013)***
Control Group I (Model 2): (0.022)*** / (0.017)***
Control Group II (Model 3): (0.028) / (0.020)

35 Summary and conclusions - I Strong and consistent evidence that JSA delivered under the NEAP was highly ineffective and actively reduced transitions off the Live Register to employment. Two possibilities arise: (i) claimants received poor advice, or (ii) claimants relaxed the intensity of their job search on learning of the absence of monitoring and sanctions. The advice explanation is not supported by the results, as we would expect the negative impact to fall away in medium-term models as claimants adjust their behaviour.

36 Summary and conclusions - II We conclude that participants attending the interview quickly learnt that their prior fears with respect to the extent of job-search monitoring and sanctions were unjustified, and consequently lowered their job-search activity levels. Note: - The analysis was found to be robust to the influences of both sample selection and unobserved heterogeneity. - Strong negative JSA effects were also generated using other estimation techniques (Cox proportional hazard model).

37 An Evaluation of the Back to Education Allowance Elish Kelly, Seamus McGuinness, John Walsh. Economic and Social Research Institute Report Launch Seminar, 3 November 2015

38 An activation programme aimed at raising the education and skill levels of social welfare recipients to help them progress into employment. The BTEA is a second-chance education opportunities scheme: Second-level (SLO) and Third-level (TLO) full-time courses. Education institutions fall under the remit of the DES, while the DSP administers the payment. Eligibility criteria: a qualifying benefit payment (e.g., jobseeker's, one-parent family, etc.); duration of this payment (3/9 months); age (21/24); commencing the first year of a course that will lead to a QQI accreditation; acceptance onto a qualifying course; progression in educational qualifications. Weekly payment: the rate varies according to when the course commenced and a person's means.

39 Between 2007 and 2012, spending on the BTEA scheme more than trebled, from €64.1m to €199.5m, while the number of recipients grew from approximately 6,000 to 25,000. [Figure: BTEA expenditure (€ thousands) and number of BTEA recipients. Source: Statistical Information on Social Welfare Services (Department Publications).]

40 1. Impact of participating in an SLO or TLO BTEA programme on helping participants to transition to employment on completion of their course. 2. Impact of participating in an SLO or TLO BTEA programme on helping participants to pursue another education, training or employment placement programme. 3. Impact of participating in an SLO or TLO BTEA programme on keeping individuals out of unemployment on completion of their course.

41 Data: a big improvement! Anonymised data were provided to the ESRI by the DSP from its new Jobseekers Longitudinal Dataset (JLD). The JLD tracks the social welfare claim, employment, training and activation programme episodes of all individuals who have made a jobseeker or one-parent family payment claim since 2004. The JLD was created through the amalgamation of 5 administrative data sources. It is a rich dataset for conducting counterfactual impact evaluations because of the level of individual detail within it. Nevertheless, there are still some data limitations, e.g., educational attainment, the exact qualification or course pursued, course duration, completion, accreditation, etc. This evaluation is a pathfinder with regard to the use of the JLD to evaluate the Department's remaining activation programmes.

42 Methodology I: Counterfactual Analysis We want to know what would happen to an unemployed individual if he/she had not participated in a BTEA option, i.e. we want to measure the counterfactual. There are various methods for estimating the counterfactual, but they all generally rely on measuring the difference in outcomes between people participating in the programme (the treatment group) and those eligible for participation who did not participate (the control group).

43 Methodology III: BTEA Separate evaluations were conducted for the SLO and TLO BTEA options: 1. Overall participation in an SLO or TLO programme. 2. Level of attendance (< 1 year, 1 year, etc.). Evaluated in terms of Live Register status in June 2012 and June 2014, time points selected to ensure the analysis was not affected by lock-in effects. The treatment group comprised individuals who entered the programme in or around September 2008, while the control group comprised similar individuals on the Live Register at that time who did not enter the programme. Propensity Score Matching (PSM) techniques were employed, which is standard in impact evaluations of public policies.


45 Exited from Unemployment (June 2012 / June 2014)
2008 SLO Participants: Overall -30.5*** / -25.4***
SLO Level of Attendance:
< 1 Year: -26.3*** / -21.7***
1 Year: -30.6*** / -22.7***
2 Years: -33.7*** / -29.0***
3 Years: -42.4*** / [not recovered]
4-5 Years: [values not recovered]***
2008 TLO Participants: Overall -19.9*** / -14.0***
TLO Level of Attendance:
Up to and including 1 Year: -18.8*** / -19.6***
2 Years: -14.3*** / [not recovered]
3 Years: -29.4*** / -16.1***
4-5 Years: [values not recovered]***

46 Exit to Employment (June 2012 / June 2014)
2008 SLO Participants: Overall -38.0*** / -29.3***
SLO Level of Attendance:
< 1 Year: -28.9*** / -36.9***
1 Year: -33.5*** / -30.6***
2 Years: -31.1*** / -38.6***
3 Years: -42.6*** / [not recovered]
4-5 Years: [values not recovered]***
2008 TLO Participants: Overall -23.1*** / -13.7***
TLO Level of Attendance:
Up to and including 1 Year: -21.1*** / -18.4***
2 Years: -19.6*** / [not recovered]
3 Years: -34.7*** / -17.7***
4-5 Years: [values not recovered]***

47 Summary and Conclusions - I The objective of the BTEA scheme is to raise the education and skill levels of social welfare recipients to help them progress into employment. For those individuals who entered the BTEA scheme in Sept/Oct 2008, the evaluation results indicate that the BTEA was not effective in achieving this objective: relative to a control group of similar unemployed individuals, participants in both components of the BTEA scheme (SLO and TLO) were substantially less likely to be in employment four and six years after entry into their respective BTEA programmes. The one exception to this result was TLO participants who received BTEA support for two years: in terms of employment prospects, such individuals were no different from the control group. Various sensitivity checks were conducted and the results held firm.