The evaluation effect and the unintended consequences of measuring impact


Laterite Working Paper Series - Working Paper no. 3

[DRAFT VERSION. NOT TO BE QUOTED OR CIRCULATED]

The evaluation effect and the unintended consequences of measuring impact

HOW DATA COLLECTION BOOSTS OUTCOMES IN A COFFEE AGRONOMY TRAINING PROGRAM IN RWANDA

November 2013

Sachin Gathani, Maria Paula Gomez, Ricardo Sabates, Dimitri Stoelinga *

* We would like to thank TechnoServe and in particular Nupur Parikh, Paul Steward, Juma Kibiciyu, and Carol Hemmings, for their support and advice. Many thanks also to Henriette Hanicotte for reviewing the paper.

ABSTRACT

In this paper we exploit a quasi-experimental set-up to study how surveying in a large-scale coffee agronomy training program in Rwanda led to substantial improvements in farmer performance levels and altered the way beneficiaries experienced the project. We call this the evaluation effect. We find that surveying led to surprisingly large increases in farmer participation levels in the project and also to improved project outcomes. The higher the frequency of the surveying, the larger the impact and the longer the duration of the effect, despite decreasing returns over time. Interestingly, we find that the evaluation effect had a disproportionate impact on farmers that did not attend the training sessions regularly, suggesting that surveying can be an effective tool to target and lift the participation levels of the worst performers. These results have useful implications for (i) the design of development projects and (ii) program evaluation. We argue that governments and organizations investing in development projects should use evaluation not just as a tool to fulfill reporting requirements but also as an effective way to improve project outcomes, in particular for the worst performers. For program evaluation, especially when high-frequency data collection is required, our findings imply that it is necessary to account for the potential evaluation effect in the experimental design.

1. Introduction

This paper takes a slightly different twist on impact evaluation, focusing not on the effects of a specific intervention but rather on the effect of the evaluation itself, which is usually something we cannot observe. More specifically, we study how surveying efforts in a large-scale coffee agronomy training program in Rwanda, meant to enable an objective evaluation of the project, led to substantial improvements in farmer performance levels and altered the way beneficiaries experienced the project. We call this the evaluation effect and study some of its properties. The results have useful implications for (i) the design of development projects and (ii) program evaluation. In this paper we argue that governments and organizations investing in development projects should use evaluation not just as a tool to fulfill reporting requirements but also as an effective way to improve project outcomes, in particular for the worst performers. For program evaluation, especially when high-frequency data collection is required, our findings imply that it is necessary to account for the potential evaluation effect in the experimental design. This is because data collection does not affect the treatment and control groups in the same way.

It is usually impossible to observe the evaluation effect because there is no counterfactual. By definition we only have information on people from whom data is collected and, in most cases, data is collected in the same way for everyone in a given sample. The research team stumbled upon the findings presented in this paper by chance while working on an impact evaluation of TechnoServe's Agronomy Training program in Rwanda 1. The reason it was possible to observe the evaluation effect at work in this case is that TechnoServe collected a very large amount of data on every farmer in the project and had an effective monitoring and evaluation (M&E) system in place with a unique identifier for each farmer. More specifically, TechnoServe collected individual participation data for farmers in each training session, and this in each consecutive project Cohort (every year TechnoServe selected a new Cohort of farmers in different cooperatives to participate in the project). In each Cohort, TechnoServe created two random samples of farmers for evaluation purposes: a yield sample, from which yield data and agronomy practices data were collected on a regular basis (coffee trees in Rwanda produce over a seven-month period, so regular monitoring was required to get a decent estimate of yield levels); and a best practice

1 Initial results were presented in our report for TechnoServe: Laterite, Independent Assessment of TechnoServe's Coffee Agronomy Program in Rwanda, Final Report, February

sample, from which farmers' agronomy practices data was collected once or twice per year. While TechnoServe's monitoring and evaluation strategy was not designed for this specific purpose, it did give us an almost perfect quasi-experimental design to test: (i) the effect of high- and low-frequency data collection on participation levels in the program; and (ii) how high- versus low-frequency data collection affects desired project outcomes, in particular the adoption of best practices.

The impact of surveying on people's behaviour and decisions has been widely studied in the market research literature. However, most studies in this area have focused on the effect market research has on consumers' future purchasing behavior and attitudes towards a particular brand. For example, Dholakia and Morwitz (2002), Levav et al. (2006), Morwitz et al. (1993) and Morwitz et al. (2004) found that asking questions about intentions to buy or about brand perceptions led to significant increases in purchases, and that these effects were long-lived, lasting between two months and one year. Additional findings include: (i) that frequent surveying can lead to polarizing effects; (ii) that easy-to-represent behaviors lead to more pronounced evaluation effects; and (iii) that subjects with previous experience with the specific product are less affected by frequent surveys (Morwitz et al., 1993 and Levav et al., 2006).

In economic and social research, few have studied the impact of surveying. To the best of our knowledge, the only major study that looked into the question of whether being surveyed changes the behavior of interviewees, and subsequently program outcomes, was a series of experiments conducted by Zwane et al. (2010). The authors studied the effect of surveying in five different socio-economic programs in developing countries - three health programs and two micro-lending programs - and found that in three of the five cases studied, frequent surveys led to higher program effects. In the three health programs, surveying led to an increase in the use of water treatment products and a higher take-up of medical insurance. More frequent surveying on reported diarrhea also led to biased estimates of the impact of improved source water quality.

The results from all these studies show that surveying can change the behavior of individuals. Three potential channels through which this happens have been identified: (1) the Hawthorne effect, where behavior changes as a result of being observed as part of an experiment (McCarney et al., 2007); (2) the mere-measurement effect, where the future behavior of subjects changes as a consequence of being asked specific questions about intentions and predicted behavior, in particular when it comes to brands and purchasing criteria (Dholakia and Morwitz, 2002, Levav et al., 2006, Morwitz et al., 1993 and Morwitz et al., 2004); and (3) the reminder effect, where the simple act of asking people about a particular action serves as a reminder (Zwane et al., 2010, Karlan et al., 2010).

In line with the results in the literature on this subject, we find that surveying had a surprisingly large impact on project participation rates and project outcomes in a coffee agronomy training program in Rwanda. Farmers that were surveyed did significantly better, both in terms of participation and the implementation of best farming practices, than farmers that were not surveyed. The higher the frequency of the surveying, the larger the impact and the longer the duration of the effect. The effect works in the same way as an impulse response function, where the impulse has decreasing returns over time, and where we observe a gradual decay of the effect. Interestingly, we find that the evaluation effect had a disproportionate impact on farmers that did not attend the training sessions regularly, suggesting that M&E can be an effective tool to target and lift the participation levels of the worst performers. We interpret this evaluation effect as resulting either from the increased motivation that comes with more face-to-face interaction between project M&E staff and beneficiaries (one manifestation of the Hawthorne effect) or from the fact that data collection serves as a reminder to farmers that they need to attend training sessions and implement certain best practices.

The structure of the paper is as follows: we first briefly describe the TechnoServe Coffee Agronomy program, its M&E system and the nature of the data we are working with. We then provide an overview of our methodology and the research questions we are trying to answer, before presenting the results we obtain for both high- and low-frequency surveying. We conclude with the implications of these findings for project design and program evaluation.

2. TechnoServe's Coffee Agronomy Program in Rwanda

Background

TechnoServe is an international nonprofit organization that promotes business solutions to poverty. One of TechnoServe's focus areas is the development of coffee value chains. TechnoServe's Coffee Agronomy program in Rwanda, part of a larger East Africa Coffee Initiative, started in 2009 and was designed to increase farmer productivity through a two-year training program focused on best coffee farming practices. At the time of writing, TechnoServe's Rwanda Coffee Agronomy program was in its fifth and last year of operations and more than 30,000 farmers had already completed (or were in the process of completing) the two-year training program.

The coffee agronomy program is targeted at farmers that are members of - or that live in the vicinity of - newly established cooperatives that meet a certain set of criteria, including their management and administrative structure as well as local geo-climatic conditions. Each year, TechnoServe ranks newly established cooperatives in the country on this set of pre-established criteria and selects the top 5-10 ranking cooperatives to participate in the program. Members - or non-member coffee farmers that live in the vicinity - are then invited to register (i.e. self-select) into the program. Each year a new batch - or what we refer to as a Cohort - of about 10,000 farmers is added to the program. Although there is no random selection at the cooperative level, successive project Cohorts are relatively similar on average: they are not geographically concentrated, cooperatives are selected using the same set of criteria and hence are likely to have similar characteristics on average, and finally farmers self-select into the program, so the risk of selection bias at the farmer level is small 2. In this paper we focus separately on the 2009, 2010, and 2011 Cohorts of the program and, not surprisingly, find very similar results in each 3.

The two-year program consists of 17 to 18 training sessions, one session per month in the first year of training and one session every two months in the second year of training. Training is delivered by a Farmer Trainer (i.e. a person who has been trained by TechnoServe to administer the training) to small groups of about 30 farmers in a very

2 For more details on the composition of Cohorts, see Laterite, Independent Assessment of TechnoServe's Coffee Agronomy Program in Rwanda, Final Report, February
3 Full data for Cohort 2011 was not yet available at the time of writing and randomization within the Cohort was not very successful. We nevertheless use Cohort 2011 to test whether the type of surveyor has a large effect on the results obtained.

structured and hands-on way. The training takes place in the plot of a focal farmer, who is elected by participant farmers within a community to serve as a focal point for the program. The curriculum, which has been consistent across Cohorts, focuses on a number of known sustainable coffee farming practices - or best practices - that improve the productivity of coffee trees and reduce their cyclicality.

Monitoring and evaluation in the program

In order to measure the performance and impact of the agronomy training program on coffee yields and best practice adoption, TechnoServe put in place a very elaborate M&E system that consistently collected three types of data on project beneficiaries: (i) attendance data for training sessions on all farmers in the program (along with gender, cooperative affiliation and the training group they belong to); (ii) best practice adoption data on a randomly selected group of farmers (the "best practice sample"); and (iii) yield and best practice adoption data on a separate randomly selected group of farmers to measure the productivity of coffee trees (the "yield sample"). A unique farmer identifier links all the samples. Over the years TechnoServe staff collected over a million data points on the performance of project beneficiaries. The mode and frequency of data collection varied for each of these data collection processes and is summarized in TABLE 1 below:

TABLE 1: TYPE AND FREQUENCY OF DATA COLLECTION

Attendance Data
- Data collected: attendance to training sessions
- Sample: entire training population
- Sample size: all farmers in a Cohort (between 5,000-10,000 per Cohort)
- Frequency of data collection: every training session

Best-Practice Adoption Data
- Data collected: adoption rates for 15 best practices
- Sample: randomly selected sample of high-attendance farmers
- Sample size: 500 to 1,000 farmers, depending on Cohort
- Frequency of data collection: twice per year (during and after training)

Yield Data
- Data collected: daily weight of cherry harvest
- Sample: randomly selected sample of farmers (selection method depends on Cohort)
- Sample size: approximately 300 farmers per Cohort
- Frequency of data collection: at least once per month during the coffee season (during and after training)
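To make this data structure concrete, the following minimal sketch shows one way the three datasets can be linked through the unique farmer identifier. It is purely illustrative: all file and column names are hypothetical, and it is not TechnoServe's actual M&E system.

```python
import pandas as pd

# Hypothetical inputs mirroring the three data collection processes in TABLE 1;
# file and column names are illustrative, not TechnoServe's actual schema.
attendance = pd.read_csv("attendance.csv")        # farmer_id, session, attended (0/1)
best_practice = pd.read_csv("best_practice.csv")  # farmer_id, round, bp1..bp11, fertilizer flags
yields = pd.read_csv("yields.csv")                # farmer_id, month, kg_cherry

# One attendance column per training session, one row per farmer.
attend_wide = (attendance
               .pivot_table(index="farmer_id", columns="session",
                            values="attended", aggfunc="max")
               .reset_index())

# Total cherry harvest per farmer from the monthly yield calendars.
yield_totals = yields.groupby("farmer_id", as_index=False)["kg_cherry"].sum()

# The unique farmer identifier links attendance, best practice and yield data.
panel = (attend_wide
         .merge(best_practice, on="farmer_id", how="left")
         .merge(yield_totals, on="farmer_id", how="left"))
```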

Low-frequency data collection: in this paper, low-frequency data collection refers to the collection of best practice data from a randomly selected group of high-attendance farmers that we call the best practice sample. TechnoServe tracks farmers in the best practice sample on 11 best coffee-farming practices and the use of 4 types of fertilizer, as summarized in TABLE 2 below. Data on best-practice adoption is collected twice per year (starting in year 2 of the training program), once in the March to June period and once between the months of July and November (referred to as round 2). During the data collection process, TechnoServe staff (either a Farmer Trainer or a data collector) visits a farmer's plot with a checklist and inspects the field, the trees and the farmer's records. The whole process takes a matter of minutes. This inspection enables the staff to check whether the farmer has mulched and weeded his/her plot, pruned and rejuvenated the trees, provided enough shade for the trees, taken steps to control erosion, and composted, as well as whether the trees are well nourished and whether the farmer has kept good records. The data collector then asks some pre-determined questions to test the farmer's knowledge of Integrated Pest Management, one of the focus areas of the TechnoServe curriculum, asks whether the farmer owns the protective equipment required to safely use pesticides, and records what fertilizer the farmer uses, including NPK, compost, zinc/borium and lime. In each Cohort, there are approximately 800 farmers in the best practice sample.

TABLE 2: PERFORMANCE VARIABLES ON WHICH FARMERS ARE TRACKED

Best practices
- Record keeping (BP1)
- Mulching (BP2)
- Weeding (BP3)
- Nutrition (BP4)
- Composting (BP5)
- Rejuvenation (BP6)
- Pruning (BP7)
- Safe use of pesticides (BP8)
- IPM (BP9)
- Erosion control (BP10)
- Shade management (BP11)

Fertilizer usage
- Compost
- NPK
- Zinc/borium
- Lime

High-frequency data collection: high-frequency data collection refers to the regular collection of yield and best-practice data from a randomly selected group of farmers that we call the yield sample. During the coffee season (from March through to August/September), farmers in the yield sample receive monthly visits from TechnoServe staff. They are given scales to estimate daily coffee production and trained to record it in a calendar. Towards the end of each month, TechnoServe staff visit to collect the completed coffee production calendar for the corresponding month and provide farmers with a calendar for the subsequent month. In addition to these monthly visits, TechnoServe's trainers or data collectors visit the coffee farm once per year to survey the number of trees on the farm - thereby enabling the M&E team to calculate yield levels - and once or twice per year to collect information on best practice adoption. Some plots are also randomly spot-checked by TechnoServe business advisors 4 to ensure that the data collection efforts are resulting in accurate production estimates. Finally, farmers are asked to sign a contract with TechnoServe confirming they have received the scales and committing to keeping daily records of coffee production. This data collection system amounts to regular and structured interactions between project staff and farmers in the yield sample. This is not the case for regular farmers, who only interact with project staff at the monthly training sessions. In each Cohort, there are approximately 300 farmers in the yield sample.

Another dimension of the evaluation to take into account is who collected the data, since the identity of the data collectors is an important factor in program evaluation design. In Cohorts 2009 and 2010, attendance, yield and best practice data was collected solely by Farmer Trainers. Two or three Farmer Trainers were placed in each cooperative and were responsible not only for training and running demonstration plots, but also for data collection. TechnoServe received some criticism for the fact that the trainers themselves - with a stake in the success of the program - were also the ones collecting performance data on their trainees. This led TechnoServe to change the system from 2011 onwards (corresponding to Cohort 2011), at which point a team of independent enumerators was hired and trained to collect the same information. This break in the data collection system between Cohorts 2010 and 2011 also makes it possible to test whether the results we obtain differ significantly depending on who collected the data.

4 TechnoServe Business Advisors oversaw project activities in a number of cooperatives, while TechnoServe Trainers based at the cooperative level delivered the actual training and took care of much of the data collection effort.

3. Methodology

While it might not have been intentional, the design of TechnoServe's M&E system provides us with an almost perfect experiment to: (i) test the effect of high- and low-frequency data collection on attendance rates over time; and (ii) test the effect of high-frequency data collection on best practice adoption at the end of the training period.

Estimating impact on attendance rates. Given that session-by-session attendance data is available on the entire training population, and that the best practice and yield samples were randomly selected from that population (amongst farmers that achieved certain attendance criteria), all that is needed to measure the impact of data collection on attendance rates is to compare the attendance rates of a treatment group, consisting of either best practice sample or yield sample farmers, to those of a control group, consisting of all the remaining farmers that comply with the relevant selection criteria. The only difference between the two groups should in theory be the fact that the treatment group was subject to data collection and more frequent interactions with project staff (this is what we refer to as the treatment), while farmers in the control group only interacted with project staff during training sessions. We will see that in practice selection into the yield and best practice samples was not fully random and that cooperative members and male farmers were slightly over-represented in both. We correct for these small baseline distortions, which do not in any way affect our conclusions, using inverse propensity score weighting (see Hirano et al., 2003). The propensity score is calculated using Kernel Regularized Least Squares (KRLS), an innovative machine learning technique developed by Hainmueller and Hazlett (2012) and Ferwerda et al. (2013) that allows researchers to tackle regression and classification problems without imposing strong functional assumptions 5.

We estimate impact using a simple difference-in-difference strategy, where the baseline consists of average pre-treatment attendance rates and the endline of average attendance rates during or after the data collection period. The length of these periods, both before and during the intervention, varies depending on the Cohort and the sample. The baseline can consist of one or multiple training sessions, as can the treatment itself. Note that this implies that the results we obtain for different Cohorts and samples are not directly comparable, even though in practice we find that the estimated impact of data collection is consistent across Cohorts and data collection frequencies. Formally, we can specify this difference-in-difference equation as follows:

$Y_{it} = \beta_0 + \beta_1 POST_t + \beta_2 TREAT_i + \beta_3 (POST_t \times TREAT_i) + \gamma' X_i + \epsilon_{it}$

where $Y_{it}$ is the average attendance over the period of interest, $POST_t$ is a dummy variable for the second time period corresponding to the data collection period, $TREAT_i$ is a dummy variable for whether a farmer belongs to the treatment group or not, $\beta_3$ - the coefficient that multiplies the interaction term - is our difference-in-difference estimate, and $X_i$ is a list of covariates to control for baseline differences between treatment and control (including gender, whether the farmer is a cooperative member or not, the size of the training group they belong to, and which cooperative area they are in). More details on how the control groups were constituted are presented in the ensuing sections, where we focus on the impact of both high- and low-frequency data collection efforts on attendance rates in Cohorts 2009 and 2010.

Estimating impact on best practice adoption. It is also possible to test whether high-frequency data collection led to improved agronomic practices by comparing best practice adoption rates in the yield sample to best-practice adoption rates in the best practice sample. Both are randomly selected groups of farmers from the same population and are therefore comparable. The treatment effect here is the difference between the impact of high-frequency data collection and that of low-frequency data collection. Assuming that surveying actually affects best practice adoption rates - whether it is high- or low-frequency surveying - this estimate would result in an under-estimation of the actual treatment effect. We only make the comparison at one point in time - i.e. at the end of the project - as sample sizes and the time lapse between subsequent data collection periods are too small to enable a difference-in-difference analysis. Given that the selection criteria for the yield and best practice samples are slightly different, it is necessary to adjust the inclusion criteria for the best practice samples so that they better match the corresponding yield sample. We also correct for baseline differences between the treatment and control groups using inverse propensity weighting, where the propensity scores are calculated using Kernel Regularized Least Squares. More details on how the control groups were adjusted can be found in the ensuing sections.

5 Ferwerda et al. (2013)
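For concreteness, the sketch below shows what this two-step procedure could look like in code. It is a minimal illustration with hypothetical file and variable names (farmer_panel.csv, attend, post, treat, coop, and so on), and it substitutes scikit-learn's kernel ridge regression for a full Kernel Regularized Least Squares implementation - a closely related kernel-based estimator, not the exact one used for our estimates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.kernel_ridge import KernelRidge

# Hypothetical panel: one row per farmer per period, with columns farmer_id,
# attend (average attendance rate), post (0/1 data collection period),
# treat (0/1 sample membership), female, member, group_size, coop.
df = pd.read_csv("farmer_panel.csv")

# Step 1: propensity scores from baseline covariates. Kernel ridge regression
# of the treatment dummy stands in for KRLS (RBF kernel, ridge penalty).
base = df[df["post"] == 0].copy()
X = base[["female", "member", "group_size"]].to_numpy(dtype=float)
krr = KernelRidge(kernel="rbf", alpha=1.0).fit(X, base["treat"].to_numpy(dtype=float))
p = np.clip(krr.predict(X), 0.01, 0.99)  # clip so weights stay finite

# Step 2: inverse propensity weights (1/p for treated, 1/(1-p) for controls),
# then the weighted difference-in-difference regression with standard errors
# clustered by cooperative; 'post:treat' is the estimate of interest.
base["w"] = np.where(base["treat"] == 1, 1 / p, 1 / (1 - p))
df = df.merge(base[["farmer_id", "w"]], on="farmer_id")
m = smf.wls("attend ~ post * treat + female + member + group_size + C(coop)",
            data=df, weights=df["w"]).fit(cov_type="cluster",
                                          cov_kwds={"groups": df["coop"]})
print(m.params["post:treat"], m.bse["post:treat"])
```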

Main research questions. The main research objective of this exercise is to determine to what extent data collection alters the behavior of participants and the outcomes of TechnoServe's coffee agronomy program in Rwanda. The available data and the structure of TechnoServe's M&E system also enable us to look further into which types of farmers are affected the most, whether the frequency of data collection matters, and lastly whether the identity of the data collector makes a big difference. These questions enhance our understanding of the mechanisms through which data collection affects - or does not affect - participant behavior and performance. We start by presenting the results obtained on the impact of high-frequency data collection in Cohorts 2009 and 2010, in which Farmer Trainers collected the data. We then look at the case of low-frequency data collection in the same Cohorts, before comparing all results to Cohort 2011, where data collection was outsourced to an independent team of enumerators recruited by TechnoServe.

4. Results

The effect of high-frequency data collection on farmer participation in the program

We study the effect of regular data collection efforts on farmer attendance rates in the TechnoServe coffee agronomy program by exploiting a quasi-experimental set-up resulting from the fact that farmers that were randomly selected into the yield sample received frequent visits from project trainers during the coffee season (we call this the treatment group), while farmers that were not included in any survey samples did not interact with the trainers or other project staff outside the monthly training sessions (we call this the control group).

The effect of high-frequency data collection on farmer participation in Cohorts 2009 and 2010

Yield data was collected from a randomly selected group of farmers in both Cohorts 2009 and 2010. In Cohort 2009, farmers that received the treatment (i.e. high-frequency data collection) were randomly selected amongst farmers that had attended at least 8 out of 11 sessions in year one of the program 6. The fact that selection into the yield sample - or what we will refer to as the treatment group - was done after one year of training makes it possible to provide clear visual evidence of how the intervention affected attendance rates in Cohort 2009. In Cohort 2010, farmers in the yield sample were randomly selected amongst farmers that had attended the first session of the program. In both cases we show that high-frequency data collection led to a significant increase in attendance rates of about 13 percentage points. We call this the evaluation effect.

In Cohort 2009, we measure the impact of high-frequency data collection efforts on project attendance rates by comparing the average attendance rates of treatment farmers in each training session to those of a control group, consisting of all non-treatment farmers that had attended at least 8 sessions in year 1 of training. Given that randomization was done at the cooperative level (the strata), we ensure that standard errors are clustered by cooperative. We also eliminate focal farmers from the sample, who have artificially high attendance rates given that the training is conducted in their coffee plot. Finally, we use inverse propensity score weighting to adjust for a bias towards cooperative members and male farmers in the treatment group and to balance the cooperative distribution of the treatment and control groups. We measure the propensity using Kernel Regularized Least Squares. As can be seen in TABLE 3, where we compare treatment and control groups, these adjustments do not in any way affect pre- and post-treatment results.

While pre-treatment differences between the adjusted treatment group and the control group are negligible, both in terms of attendance rates and individual farmer characteristics (see TABLE 3), we find that attendance rates in the treatment group increased substantially during the data collection period and remained high thereafter. FIGURE A depicts average attendance rates for the treatment and control groups from session 1 to the very end of the project for Cohort 2009 at session 17 (between January 2009 and November 2010). The attendance rates of the two groups follow almost identical patterns until the beginning of the data collection period in the treatment group, which started in session 13 (corresponding to March 2010) and ended after session 15 (corresponding to August/September 2010) 7.

6 Out of 294 farmers in the treatment group, only 19 (6.5%) had attended fewer than 8 sessions (between 5 and 7). We drop this group of 19 farmers from the treatment sample as they appear to have been included by error in the randomization process.
7 Some farmers/cooperatives still had coffee production in the month of September, but for the bulk of farmers in the treatment and control groups the coffee season ended in August.

TABLE 3: FARMER CHARACTERISTICS AND ATTENDANCE RATES IN HIGH-FREQUENCY TREATMENT AND CONTROL GROUPS (COHORT 2009)

Columns: Treatment (adjusted) | Treatment (not adjusted) | Control | Treatment-Control
- Average attendance, first 11 sessions (pre-treatment; condition: at least 8 of 11 attended): 75.45% | 75.46% | 75.62% | -0.18%
- Average attendance, session 12 (pre-treatment, no condition): 78.96% | 79.37% | 78.47% | 0.49%
- Average attendance, sessions 13-15 (treatment period): 89.79% | 89.68% | 76.86% | 12.93%***
- Average attendance, sessions 16-17 (post-treatment): 85.24% | 86.11% | 75.15% | 10.10%***
- Female farmers (% of sample): 30.48% | 26.19% | 31.38% | -0.90%
- Cooperative members (% of sample): 23.11% | 29.76% | 22.27% | 0.84%
- Average farmer group size (# farmers per training group)
- Sample size

*** statistically significant at the 1% level

FIGURE A: Comparing attendance rates in treatment and control groups (Cohort 2009). [Line chart of attendance rates (30%-100%) for the adjusted treatment and control groups across training sessions s1-s17, with the data collection period (sessions 13-15) marked.]
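As a quick check on magnitudes, the difference-in-differences implied by TABLE 3 can be computed directly from the adjusted treatment column: (89.79% - 76.86%) during the treatment period, minus (75.45% - 75.62%) pre-treatment, gives approximately 13.1 percentage points, matching the regression-based point estimate reported below.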

Average farmer attendance rates - which had been almost identical in both groups before the treatment, at an average rate of about 83% - increased to 89.8% in the treatment group during the data collection period, but decreased to 76.9% in the control group. These results suggest that the treatment (high-frequency data collection) led to an increase in farmer attendance rates of 13 percentage points during the treatment period (our difference-in-difference point estimate is 13.1 percentage points). The effect holds and is statistically significant at the 1% level when controlling for differences in the individual characteristics of treatment and control group farmers, farmer attendance rates in year one, the size of the training groups farmers belong to, and which cooperative they are in (see Technical Annexes).

Interestingly, the evaluation effect persists even after the end of the treatment period. While average attendance rates in the treatment group dropped by a full 7 percentage points between sessions 15 and 16 - i.e. between August and September 2010, when the coffee season and the yield data collection period came to an end - they nevertheless remained about 10 percentage points higher than in the control group. Average attendance rates in the last two sessions of the project, held between September and December 2010, were 85.2% in the treatment group compared to just 75.2% in the control group. This suggests that 7 months of data collection, from March 2010 through to August 2010, had effects on average attendance rates in the treatment group for at least 4 months after the end of the data collection period. We provide some additional statistics on the durability of the evaluation effect in the section on low-frequency data collection. These results are in line with other similar studies. For example, Dholakia and Morwitz (2002) found that the effect of measuring consumer satisfaction increases for several months after surveying and can persist a year later. Zwane et al. (2010) also found long-lasting effects of surveying on respondents' behaviors.

The effect of high-frequency data collection on farmer participation in Cohort 2010

The link between regular and structured data collection efforts and farmer attendance levels is confirmed using attendance data from the 2010 Cohort, where we find that farmers that were randomly assigned to the yield sample at the beginning of the project had significantly higher attendance rates than farmers that were not assigned to any project sample. In the 2010 Cohort, farmers in the yield sample were randomly selected amongst farmers that had attended session 1 of the training program 8. Here we call the treatment group all farmers that were randomly assigned to the yield sample and attended session 1 of the program; the control group consists of all non-treatment farmers in Cohort 2010 that also attended session 1 of the training program. We also exclude from the control group all farmers that were part of the best practice sample, who were subject to low-frequency data collection during the training period. Data collection on yield levels in Cohort 2010 started in session 3 of the program, which corresponds to the month of March 2010. High-frequency data collection covered the March to June periods in both years of the training program (corresponding to sessions 3-6 of the project in year 1 and sessions 12 to 14 in year 2).

As in the case of Cohort 2009, we find that the initial attendance rates and composition of the treatment and control groups - where we balance the two groups using propensity score weighting, calculated using Kernel Regularized Least Squares - are relatively similar (see TABLE 4). In particular, we cannot reject the null hypothesis that pre-treatment or baseline attendance rates in session 2 (for which we apply no condition) were the same in the treatment and control groups.

TABLE 4: FARMER CHARACTERISTICS AND ATTENDANCE RATES IN HIGH-FREQUENCY TREATMENT AND CONTROL GROUPS (COHORT 2010)

Columns: Treatment (adjusted) | Treatment (not adjusted) | Control | Treatment-Control
- Average attendance in session 2 (no condition): 82.83% | 82.94% | 82.42% | 0.41%
- Average attendance after data collection starts (sessions 3 to 18): 89.13% | 89.10% | 75.94% | 13.19%***
- Female farmers (% of sample): 28.20% | 28.43% | 28.55% | -0.35%
- Cooperative members (% of sample): 28.31% | 28.76% | 24.48% | 3.84%^
- Average farmer group size (# farmers per training group)
- Sample size

^ not significant at the 10% level; *** statistically significant at the 1% level

8 We eliminate from the sample of 350 treatment farmers 18 farmers that did not attend session 1 and that were added to the treatment group in session 2.

FIGURE A: Comparing attendance rates in treatment and control groups in Cohort 2010. [Line chart of average attendance rates (60%-100%) for the treatment and control groups across training sessions s1-s18.]

As soon as the regular collection of both yield and best practice data for the treatment group started in session 3, in March 2010, attendance rates in the treatment group increased significantly: from 82.8% in session 2 to an average of 89.1% for the remaining sessions of the project, compared to just 75.9% in the control group. Attendance rates in the treatment group were therefore on average about 13 percentage points higher than in the control group (our difference-in-difference point estimate is 12.8 percentage points) throughout the remainder of the two-year program, which covered two separate coffee seasons (see FIGURE B).

The difference between treatment and control varied quite a lot over time, however. In both years, data on yield levels was only collected during the coffee season, which for farmers in Cohort 2010 started in March, peaked in the month of May, and ended in September. Only a handful of farmers in the Cohort were still producing coffee in the months of July, August and September, which means that data collection on yield production was also scaled back during that period. We can clearly see this in FIGURE B, which depicts the difference in attendance rates between the treatment and control group over time in years 1 and 2 of the program. The two camel humps in the curve correspond to the beginning and end of the coffee seasons and the corresponding scaling-up and down of the data collection efforts. During the two data collection periods the difference in attendance rates between treatment and control increased rapidly, with decreasing returns; after the peak of the coffee season was reached, the difference started decreasing again before settling at a post-treatment minimum. In between coffee seasons and data collection periods, the difference between the treatment and control group increases not because treatment farmers are attending more, but because the post-treatment decrease in attendance is greater in the control group than in the treatment group as a result of the effect of time (attendance rates in the program dropped with time, especially in the immediate aftermath of the coffee season).

FIGURE B: Difference between treatment and control group (Cohort 2010). [Line chart of the difference in attendance rates (0%-18%) by session s1-s18, with the year 1 and year 2 coffee seasons marked.]

Based on the evolution of relative attendance rates in the treatment and control groups over time, we can derive the following two properties of the evaluation effect in the case of TechnoServe's Coffee Agronomy program in Rwanda:

The treatment effect persists for several months beyond the treatment period, in Cohort 2010 as well. In year one, for example, data collection stopped in September 2010, but the evaluation effect was still present and growing in January 2011, four months later. The difference between the treatment and control groups even increased between data collection periods in Cohort 2010, because attendance rates dropped less in the treatment group than in the control group.

High-frequency data collection has decreasing returns over time. In year 1, for example, the first month of data collection led to a 9.6 percentage point increase in attendance rates, the second to an additional 3.7 percentage points, and the third to an additional 0.7 percentage points, while the marginal effect turned negative in month four.
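The second property can be read directly off the session-level gap between the two groups. A minimal sketch, again with hypothetical file and column names, of how marginal effects of this kind are computed:

```python
import pandas as pd

# Hypothetical input: one row per group per session, with columns
# session (1..18), group ("treatment" or "control"), attend_rate (in %).
sessions = pd.read_csv("session_attendance.csv")

# Treatment-control gap in attendance at each session.
rates = sessions.pivot(index="session", columns="group", values="attend_rate")
gap = rates["treatment"] - rates["control"]

# Marginal effect of an additional month of data collection = first difference
# of the gap; decreasing returns appear as shrinking increments (e.g. the
# +9.6, +3.7, +0.7, then negative pattern described above for Cohort 2010).
marginal = gap.diff()
print(marginal)
```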

High-frequency data collection affects low-attendance farmers the most

A very interesting feature of the evaluation effect is that, in the case of TechnoServe's agronomy program, it affected low-attendance farmers the most. There is a clear, positive and statistically significant association between farmer attendance levels in years one and two of the program (see FIGURES D and F). The more sessions a farmer attended in year one, the more sessions he/she was likely to attend in year two. The effect of the treatment was to lift the participation levels of low-attendance farmers the most, both in Cohorts 2009 and 2010. High-participation farmers in year one remained high-participation farmers in year two after the treatment, while low-participation farmers that would otherwise have remained low-participation farmers also became high-participation farmers in year two. This can clearly be seen in FIGURES E and G, which depict the difference in attendance rates between treatment and control group farmers based on their attendance levels in year 1. The treatment effect on attendance levels disappears for farmers with high initial attendance levels, but increases exponentially for farmers with low participation levels. In Cohort 2009, for example, farmers that had attended only 7 sessions in year one saw their year two attendance rates increase by 25%, compared to just 1% for farmers that had attended all 11 sessions in year one. Note that the difference for Cohort 2009 is the actual treatment effect, as we are comparing average attendance rates after the intervention (i.e. after session 12) with attendance rates before the intervention. This is not the case for Cohort 2010, where the intervention started in session 3.

FIGURE C: Year 1 and Year 2 attendance rates in Cohort 2009. [Plot of attendance in year 2 (# sessions) against attendance in year 1 (# sessions).]
FIGURE D: Year 1 and Year 2 attendance rates in Treatment and Control (Cohort 2009). [Plot of year 2 attendance against year 1 attendance for the control group, the treatment group, and the implied treatment effect.]

FIGURE E: Year 1 and Year 2 attendance rates in Cohort 2010. [Plot of attendance in year 2 (# sessions) against attendance in year 1 (# sessions).]
FIGURE F: Year 1 and Year 2 attendance rates in Treatment and Control (Cohort 2010). [Plot of year 2 attendance against year 1 attendance for the control group, the treatment group, and the implied treatment effect.]

This suggests that high-frequency field visits and more face-to-face interaction between program staff and beneficiaries can have a large and significant effect on average project outcomes, as they lift the performance of low-participation beneficiaries the most. This finding strengthens the case for better linking project design to M&E, utilizing M&E not only as a way to determine whether the project is achieving its objectives, but as an explicit intervention to improve project outcomes. A potential explanation for this result is that frequent surveys increase awareness of the importance of attending training sessions. Farmers who have high attendance levels from the beginning of the program already know the importance of these sessions, and their behavior is not significantly altered by the data collector. Farmers who do not have high attendance levels may only realize the importance of the training sessions after receiving a visit from the data collector, even though none of the surveys directly verify or explicitly reference attendance at the training sessions. This hypothesis is supported by the market research literature, where it has been found that the measurement effect is less pronounced for people who have prior experience with the product (Morwitz et al., 1993).

High-frequency data collection also affects best practice adoption rates in Cohorts 2009 and 2010

So far we have been looking at the link between data collection and farmer participation levels in the program. Can we also find evidence that the data collection efforts led to improved best practice adoption rates in the treatment group (which was the main objective of the project)? Assuming that TechnoServe's training sessions were useful and led to better outcomes 9, we would expect to find that farmers in the treatment group also had higher best practice adoption rates. Additional learning could also have taken place outside the sessions, through more face-time between project beneficiaries and project staff. Moreover, the mere presence of TechnoServe staff could have provided farmers with the necessary motivation - the nudge - to implement what they had learned during the training sessions (the Hawthorne effect).

TABLE 5: ALTERING THE CONTROL GROUPS TO ENSURE THAT FARMER SELECTION CRITERIA BETTER MATCH THE TREATMENT GROUPS

Cohort 2009
- Treatment group: random selection of farmers that attended at least 8 out of the first 11 sessions (yield sample)
- Control group: all farmers in the best practice sample that attended at least 8 out of the first 11 training sessions (the best practice sample consists of a random selection of farmers that attended at least 8 out of 17 sessions during the training program)

Cohort 2010
- Treatment group: random selection of farmers that attended session 1 of the program (yield sample)
- Control group: all farmers in the best-practice sample that attended session 1 of the program, excluding 4 cooperatives that are not represented in the treatment group (farmers in the best-practice sample were randomly selected amongst farmers that had attended at least 4 out of the first 8 sessions of the program)

To test this hypothesis we compare best-practice adoption rates amongst farmers in the yield sample (who received high-frequency data collection) to those of farmers in the best practice sample (who were subject to low-frequency data collection) in both Cohorts 2009 and 2010. Comparing the yield and best-practice samples does not amount to a perfect experiment because: (i) farmers in the yield and best practice samples were selected in random but slightly different ways; (ii) both groups were subject to evaluation, albeit of different intensity; and (iii) we only have one observation in time, i.e. no baseline. We alter the best practice samples in Cohorts 2009 and 2010, as presented in TABLE 5, to make them more comparable to their corresponding yield samples. In this section we continue to refer to the yield samples in both Cohorts as the treatment group, but call the farmers in the best-practice samples the control group. Note that by comparing outcomes for farmers in the

9 Laterite, Independent Assessment of TechnoServe's Coffee Agronomy Program in Rwanda, Final Report, February

yield and best-practice samples, we are estimating the difference between the effect of high-frequency data collection and the effect of low-frequency data collection on best-practice adoption rates. It is therefore an under-estimation of the actual effect of high-frequency data collection.

Treatment effects aside, after correcting for the cooperative distribution of farmers, we would expect farmers in the control and treatment groups to be relatively similar in Cohort 2009 (see TABLE 5, above). Farmers in both the treatment and control groups were randomly selected amongst farmers that attended at least 8 out of the first 11 sessions of the training program. The comparison between treatment and control farmers in 2009 is therefore quasi-experimental, the main difference between the two groups being the frequency with which data was collected. This is not the case for farmers in Cohort 2010, where we would expect the control group to perform better. While treatment farmers were selected amongst farmers that had attended session 1 of the program, control group farmers were randomly selected amongst farmers that attended at least 4 out of the first 8 sessions and also attended session 1. The difference between the end-of-project best practice adoption rates of the treatment and control groups is therefore an under-estimation of the actual treatment effect (here the treatment is the difference between being subject to high-frequency rather than low-frequency data collection). Best practice adoption rates for both Cohorts were measured in November 2011, after the end of the training period.

TABLE 6: DIFFERENCE IN BEST PRACTICE ADOPTION RATES BETWEEN FARMERS IN THE TREATMENT AND CONTROL GROUPS (NOVEMBER 2011 DATA)

Columns: Cohort 2009 | Cohort 2010
- Treatment (% of best practices adopted): 70.4% | 71.0%
- Control (% of best practices adopted): 65.0% | 66.2%
- Difference: 5.4%*** | 4.8%***
- Difference with individual and cooperative-level controls (see Technical Annexes): 4.8%*** | 3.2%***
- Sample size (Control)
- Sample size (Treatment)

*** significant at the 1% level
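As a concrete rendering of the comparison in TABLE 6, the sketch below computes each farmer's adoption share over the 15 tracked practices (the 11 best practices plus the 4 fertilizer types from TABLE 2) and the weighted difference between groups. File and column names are hypothetical.

```python
import pandas as pd

# Hypothetical endline table: one row per farmer, with 0/1 adoption flags
# bp1..bp11 and fert1..fert4, a treatment dummy treat, and an IPW weight w.
bp = pd.read_csv("endline_best_practice.csv")

practice_cols = [f"bp{i}" for i in range(1, 12)] + [f"fert{i}" for i in range(1, 5)]
bp["adoption_share"] = bp[practice_cols].mean(axis=1)  # share of the 15 practices adopted

# Propensity-weighted mean adoption share by group; the raw treatment-control
# difference corresponds to the "Difference" row of TABLE 6.
sums = (bp.assign(wx=bp["adoption_share"] * bp["w"])
          .groupby("treat")[["wx", "w"]].sum())
shares = sums["wx"] / sums["w"]
print(shares, shares.loc[1] - shares.loc[0])
```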

In both Cohorts 2009 and 2010, we find that treatment farmers performed significantly better than control group farmers 10. The best-practice adoption rates among farmers in the treatment group were about 4 percentage points higher than in the control group after controlling for individual characteristics, group size, and which cooperative farmers belong to. More detailed comparisons between the treatment and control groups are provided in Annex 1 and show that farmers that were subject to high-frequency evaluation performed better on almost every single individual best practice. This was particularly true for composting, integrated pest management, NPK utilization and record keeping, where the increases in adoption rates were of the order of 10 percent or more.

A higher attendance rate amongst treatment farmers was not, however, the only vector of increased performance in the treatment group. We confirm the findings from the previous sections on attendance rates, but find that higher attendance rates in the treatment group do not fully explain the differences in best practice adoption between the treatment and control groups. Farmers in the treatment groups had significantly higher attendance rates than farmers in the control groups (85.0% vs 82.2% in Cohort 2009 and 87.7% vs 82.2% in Cohort 2010), but the treatment effect was robust to holding farmer attendance rates constant in both Cohorts (see Technical Annexes). Another fact that reinforces this point is that in Cohort 2009 the difference in the performance of the treatment and control groups persisted for at least one year after the end of the training period. The best-practice data we are working with for Cohort 2009 was collected in late 2011, one year after the end of the two-year training period that finished in December 2010. TechnoServe continued to collect yield and best practice data from Cohort 2009 to monitor whether training would have a lasting effect on coffee farmer performance levels. This suggests that post-training data collection visits had an effect on best practice adoption, even though we cannot disentangle the effect of past data collection visits from the effect of data collection in 2011.

These findings present strong evidence that: (i) high-frequency data collection led not only to higher farmer participation levels but also to significantly better project outcomes; and (ii) frequent data collection affected project outcomes in two ways: indirectly, through higher farmer attendance rates, and directly, through frequent face-to-face interaction between the beneficiary and the project staff. This face-to-face interaction could have led to more

10 We measure best practice adoption as the share of the 11+4=15 best practices that farmers have adopted (see TABLE 2 for the list of all best practices).