Note of the British Academy/GSR evaluation event, 12th July 2017
Evaluation: impact, challenges and complexity

The Government Social Research Network and the British Academy convened a half-day workshop with civil servants, academics and other evaluation specialists to discuss evaluation methodologies, generating evidence of impacts, and challenges relating to complexity. The purpose of the workshop was to discuss the frequent pitfalls in commissioning, designing and delivering robust evaluation, and to explore good methodologies that can be employed when evaluating policy programmes.

Jenny Dibden (Head of the Government Social Research Service)

Jenny provided an overview of the GSR and the importance of evaluation in Government. In particular she mentioned the following key elements for successful evaluation in Government:
- Good communication between evaluators and policy colleagues;
- Having a good counterfactual;
- Triangulating evidence from different sources;
- Engagement with key stakeholders;
- Getting evaluation on the agenda early on.

Professor Oriana Bandiera FBA (Chair of the event)

Oriana spoke about impact evaluations and ways to engage academia at appropriate stages of an evaluation. She thought that the British Academy could play a role in bringing the right people together. She gave examples of Government policies around the world that have been evaluated and produced unexpected negative results, and described how Governments responded to those results. She outlined some ingredients for successful collaboration between Government and academia:
- Having a shared interest in the policy being evaluated;
- Starting the evaluation early;
- Keeping the evaluation going beyond the end of the policy to take account of longer-term impacts;
- Embedding researchers in the policy context.

She also noted that Randomised Controlled Trials (RCTs) don't solve everything and highlighted some of the challenges of the RCT approach, including: a) large samples are needed; b) uptake is still optional; c) drop-outs are possible; d) there can be spillover from other interventions; e) an RCT may be politically infeasible.

Julian Barr (Vice President of the UK Evaluation Society)

Julian spoke about the added benefit of evaluation. He emphasised the importance of trustworthy evaluation, i.e. evaluation that is unbiased, precise, conceptually sound and repeatable. He referred to the Maryland scale, on which an evaluation design is rated from 1 (a correlation between an intervention and an outcome) to 5 (a randomised controlled trial). Whilst RCTs may be considered the gold standard, the level of rigour required should depend on the type of intervention or policy being evaluated. For example, where the cost of an intervention is low and the risk of a negative outcome is not high, there is an argument that a less rigorous evaluation design is appropriate. He advised putting evaluation resource where:
- Uncertainty about the outcome is high;
- Large samples of people are involved;
- The uptake pathways are clear;
- The costs are proportionate.

He noted that complexity had driven innovation in evaluation and that there may be other approaches to evaluation which do not involve a counterfactual, for example:
1) A generative framework, e.g. theory-based approaches such as Theory of Change evaluation.
2) A comparative framework, e.g. case study based approaches such as Qualitative Comparative Analysis.
3) A participatory framework.

Siobhan Campbell (Head of the Cross-Government Evaluation Group)

Siobhan spoke about the different functions of evaluation: accountability, to demonstrate a policy's efficacy and value for money, and learning, to understand what works and why. As part of this, she discussed some of the common pitfalls, tensions and challenges in Government evaluations. Some of the key themes from Siobhan's talk included:

- The need to be clear from the outset why an evaluation is being conducted, with explicit learning objectives and an understanding of how the findings will be used. In this way the evaluation will be useful, rather than simply a tick-box exercise.
- In order to be useful and used, evaluation needs to be timely: planned early, built into the policy implementation and producing results in line with policy decision points.
- Learning from evaluation should be more than just "does it work?". We need to know what parts work, in what contexts and why, for the learning from the evaluation to be genuinely useful (especially in the face of changing policies).
- The importance of policy ownership of the evaluation. Evaluation is one of the government policy profession's core policy delivery skills, and must be seen as an essential tool for successful policy delivery.
- The benefit of making in-flight adjustments to the policy or initiative during implementation as evaluation evidence emerges.
- The need for government evaluators to work collaboratively with external evaluators (whilst acknowledging some of the challenges around the procurement of evaluation work and whether evaluators should be embedded or independent).
- The importance of having a shared understanding of the policy objectives across the policy team and with the internal and external evaluators. Developing the theory of change as the initial step in an evaluation helps to achieve this.

Siobhan also noted that the Magenta Book is currently being revised and is expected to be published at a later date.

Case study discussions:

1) NHS England New Care Models evaluation

The New Care Models programme is a large NHS transformation project aiming to improve the health and wellbeing of patients, and the efficiency of the NHS. As many as 50 'vanguards' (or pilots) were established across England to implement new ways of working across primary care, secondary care, elderly care homes and social care. Each vanguard differs in terms of both the care model being implemented to meet the needs of its patient population and its local approach to evaluation. This case study provides a brief overview of the central approach to evaluation (which is multifaceted) and introduces some of the challenges faced in evaluating this vast and complex transformation project. Such challenges include:
- Measuring impact: the practicalities of establishing a good evaluation design.
- Having central influence over evaluation design when evaluations are commissioned locally.
- Managing the pressure to provide positive signs of impact at the expense of learning (which would include both positive and negative results).
- Dealing with the need to provide results "yesterday".
- Synthesising the large amounts of data and evidence from the different vanguards.

Some solutions to the challenges presented included:
- Revisiting initial logic models to understand the key questions of interest, and using these as a framework for synthesising the wealth of evidence.
- Acknowledging that there may be different key questions for different audiences, e.g. politicians, commissioners and local deliverers.
- Using the newly established Community of Practice as an enabler to facilitate the sharing of good evaluation practice (where evaluations are commissioned locally, peer support rather than central enforcement may work better) and the drawing together of the evidence.
- When faced with the pressure to provide results "yesterday", presenting the narrative on why change cannot be seen immediately.
- Accepting that it may not be possible to synthesise the evidence from all the evaluations/interventions; instead, mapping out the key interventions and prioritising those of most interest.

2) DCLG Homelessness: designing an impact evaluation of interventions to prevent rough sleeping

DCLG is currently running a Homelessness Prevention Programme, which includes £20 million allocated to projects tackling rough sleeping. A number of these projects are aiming to take a 'No First Night Out' approach - in other words, preventing people from sleeping rough - by taking a data-driven approach to identifying people at risk. DCLG would like to evaluate whether the right people have been identified and whether the interventions are effective. Where these projects are being run in areas with a case-level data collection system on rough sleepers, projects are asked to use a scaled risk assessment (no, low, medium, high). This will enable DCLG to evaluate whether people who do not meet the risk level needed to receive an intervention go on to sleep rough, and whether the extent to which people then sleep rough is associated with their level of risk (see the illustrative sketch at the end of this case study). This will provide an indication of whether the risk assessment process is effective. However, we cannot see a way of designing a robust impact evaluation of the interventions. There are significant challenges involved in evaluating any preventative work, but these are exacerbated when considering the target group, on whom there is only very limited data.

The problem discussed here was the difficulty of conducting a robust impact evaluation. Possible solutions included:
- Data pooling between different organisations.
- Establishing good baseline data.
- Having a good, multidisciplinary advisory group to sound out ideas with.
- Trying a different approach to evaluation, such as Qualitative Comparative Analysis.
- Looking at the impact on precursors to/indicators of homelessness rather than homelessness itself, which is challenging to measure.
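As a minimal sketch of the check described above, the snippet below tabulates later rough-sleeping rates by assessed risk band and tests whether outcome and risk band are associated. It is purely illustrative: the file name, column names and the choice of a chi-square test are assumptions made for the example, not part of DCLG's actual data or design.

```python
# Illustrative sketch only: does later rough sleeping rise with the scaled
# risk assessment (no/low/medium/high)? File and column names are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical case-level extract: one row per person assessed,
# with columns assumed to be 'risk_band' and 'slept_rough' (0/1).
cases = pd.read_csv("case_level_extract.csv")

band_order = ["no", "low", "medium", "high"]
cases["risk_band"] = pd.Categorical(cases["risk_band"], categories=band_order, ordered=True)

# Number assessed and rate of later rough sleeping within each risk band.
rates = cases.groupby("risk_band", observed=False)["slept_rough"].agg(["count", "mean"])
rates = rates.rename(columns={"count": "n_assessed", "mean": "rough_sleeping_rate"})
print(rates)

# Simple test of association between risk band and outcome
# (a trend test or ordinal model would use the band ordering more fully).
table = pd.crosstab(cases["risk_band"], cases["slept_rough"])
chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

Whatever test is used, this kind of tabulation only indicates whether the risk assessment discriminates; as the group noted, it does not by itself amount to a robust impact evaluation of the interventions.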

3) DWP Health-led trials

This case study examines the challenges of designing large-scale, mixed-methods trials in a local healthcare setting with multiple delivery partners. It presents the challenges of reconciling differing views on what the trials should be seeking to achieve and their level of ambition; agreeing a design; modelling expected outcomes; and working towards implementation.

What are the health-led trials? Individual Placement and Support (IPS) is a 'place then train' model of supporting people with health conditions back into work that is well evidenced when working with people with severe and complex mental health conditions. The success of IPS in this setting has prompted policy makers to seek to understand whether its success can be replicated with people with mild to moderate mental and physical health conditions. The health-led trials are being run in two areas of the country, the West Midlands Combined Authority and the Sheffield City Region, to test IPS with these new groups using an RCT approach with supporting process and economic evaluation activity.

Questions to consider:
- Where do the key risks for the evaluation lie: in design or implementation?
- To what extent do the realities of working in a cross-departmental setting and with a range of local delivery partners and stakeholders require choices that compromise the study design?
- How big an issue is contamination?
- How do you balance these various competing priorities and risks to deliver a successful trial?

This case study also considers the importance of governance, communications and leadership in navigating a public policy trial landscape. Evaluators and policy makers need to work together in order to ensure the success of complex projects. As shown in the logic model, the trial includes different layers, ranging from the activities of the intervention, through the actions resulting from those activities, to the different resulting outcomes. It is a combination of factors that leads to the outcomes (not necessarily linear). The question that arises is how to evaluate something so complex? A mix of different methodologies is used (e.g. RCT, cost-benefit analysis, etc.).

Challenges:
- Health and welfare sectors: how to combine two different cultures? This needs relationship building and personal interactions.
- Local leadership and how local councils talk to each other. It is easier to implement activities in areas where there is strong leadership; where the culture is more collaborative, it proves more difficult.

Risks:
- Where there is strong leadership, the risk is losing momentum if the key person leaves. Contingency planning and adequate recording and documentation are therefore important to ensure continuity.
- Risk of losing touch with what is actually happening on the ground after design, as there is much reliance on local authorities to report back. Issues are not always flagged, as there can be financial impacts if the project is shown to fail.
- Coordination risk: proper governance and resourcing are important, including a project management office, and the wider delivery structure is much bigger than the evaluation team.
- Implementation risk: the variety of local partners upon whom the success of the project relies is wide; it is important that communications about the evaluation are delivered effectively and repeatedly, and with consideration of local politics.
- Contamination risk: there is plenty of scope for local deliverers to distort the trial, whether knowingly or otherwise; it is important that researchers go out into the field to make the case with local deliverers.

Mitigation:
- There needs to be an appropriate framework for each step of the project.
- There also needs to be programme oversight covering design, implementation, evaluation, communications, etc.
- How to avoid contamination? Record processes and changes, and create an open space to discuss thoughts.

4) DWP Group Work Trial

This case study examines why some trials fail and the challenges of delivering a trial in a complex environment. It discusses how good trial design and active trial management can help to ensure the successful delivery of a trial.

What is Group Work? The intervention, developed in the US and called JOBS II, is an employment support package delivered over 5 x 4-hour sessions within one week. The support is designed to generate social engagement within a group (between job seekers) to develop and share skills and experiences. The discussions and exercises, facilitated by a trainer, are designed to improve psychological resilience to the stresses of job search.

Can a good trial design overcome these issues? In a word, yes. However, active trial management is required to mitigate the many other potential issues that could cause a trial to fail, such as:
- Adherence to the process;
- Staff buy-in/lack of engagement;
- Operational pressures; and
- Challenge to equipoise.

The case study also focuses on the management of such issues within the Group Work Trial.

Questions to consider:
- What steps have you taken to ensure the successful delivery of the trial?
- What other factors have emerged in live running that risk trial failure, and how are these managed?
- What is the role of active trial integrity management?

This group discussed the Group Work trial of an employment support package aimed at improving health and employment outcomes. Some of the themes discussed included:
- Challenging the assumptions around power calculations (see the illustrative sketch after this list).
- Are we being overly concerned about ethical issues?
- How can we make randomisation work?
- Flow samples can be an issue with a cluster design, as there is not an exactly defined cohort because the cohort will increase.
- Getting buy-in from those who are central to making an RCT work.
- The culture around the trial.
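Since the assumptions behind power calculations were one of the themes raised, the sketch below shows, purely as an example, how the required sample size for a two-arm individually randomised trial with a binary outcome follows from the assumed baseline rate, effect, significance level and power. None of the figures are from the Group Work Trial; they are illustrative assumptions, and a clustered design would need a further design-effect adjustment.

```python
# Illustrative sketch only: two-arm sample-size calculation for a binary
# outcome (e.g. being in employment at follow-up). All inputs are assumed.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control = 0.30   # assumed outcome rate without the intervention
p_treated = 0.35   # assumed outcome rate with the intervention
alpha = 0.05       # two-sided significance level
power = 0.80       # desired power

effect = proportion_effectsize(p_treated, p_control)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(effect_size=effect,
                                         alpha=alpha,
                                         power=power,
                                         ratio=1.0,
                                         alternative="two-sided")
print(f"Approximate participants needed per arm: {n_per_arm:.0f}")

# A cluster-randomised variant would inflate this by a design effect,
# roughly 1 + (average cluster size - 1) * intra-cluster correlation.
```

The arithmetic itself is routine; as the discussion suggested, it is the inputs (the assumed baseline rate, the smallest effect worth detecting, and the likely drop-out and contamination) that deserve challenge.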

Question and answer session

Q1) There is often a personal/ethical dilemma if the evaluation flags up something that was not expected: what should we do in these circumstances?
A1) A good relationship between the commissioner and the evaluator is essential in order for the evaluator to have the confidence to raise this. An early commitment to publishing regardless of outcome can help to overcome this, as can publishing a pre-analysis report.

Q2) How can we evaluate policies when they are constantly changing?
A2) This is a reality and the panel were sympathetic to the problem. It may be helpful to plot a timeline of changes and the outcomes. Also, the pace of evaluations needs to change so that learning cycles become quicker.

Q3) Non-counterfactual approaches can be difficult to communicate to policymakers, who tend to like concrete answers such as "something improved by x%". How do we manage this?
A3) Again, having a good relationship between policymakers and the evaluator is crucial here. The panel also highlighted the importance of ensuring that policymakers buy into the theory of change of the policy/intervention early on when thinking about evaluation.

Q4) One of the key themes of today has been that all evaluations are different. Can we establish what the key components should be?
A4) The Magenta Book will help with this by providing greater clarity on definitions and methodologies, geared towards what policymakers need to know from evaluations.

Q5) The profession of an evaluator has been noted today: what is this?
A5) In some other countries you can become a chartered evaluator, but we haven't gone down that route in England. It is more important to train those who commission evaluations. It is more than just a checklist of skills.

Q6) How much should trials be actively managed or left to evolve?
A6) It depends on what question is being asked. Often evaluations start with a methodology rather than a clear objective, which may be the wrong approach. One suggestion is to stick to the evaluation design, and then evaluate the evaluation.

For more information about this event and the series of British Academy/Government Social Research activities, please email policy@britac.ac.uk.