46266

The World Bank
PREMnotes
AUGUST 2008, NUMBER 123
POVERTY

FROM THE POVERTY REDUCTION AND ECONOMIC MANAGEMENT NETWORK

Assessing Our Work on Impact Evaluation

Markus Goldstein, Emmanuel Skoufias, and Ariel Fiszbein

Over the last several years, the World Bank has increasingly engaged in impact evaluations as a means of building evidence for results. During this process, the Bank has also produced an extensive variety of knowledge products. However, there are several institutional and resource issues that constrain the effectiveness of our impact evaluation work. This brief outlines recent gains in the Bank's work on impact evaluation, highlights several issues, and proposes some options to continue improving and expanding the Bank's efforts in this area.

Introduction

Demand for evidence of the results of development assistance is increasing. Donors and country partners alike are keen to see demonstrable results from the programs and interventions that are supported through Bank assistance. In addition, there is increasing scrutiny outside of the donor community. For example, academics are arguing for more rigorous evaluations [1] and the Center for Global Development is spearheading an independent evaluation initiative. Finally, as the Bank scales up its interventions, the ability to demonstrate the impact of aid on development outcomes is critical for attracting additional support, particularly from bilateral donors. In response to these and other pressures, the Bank has increased its emphasis on the use of monitoring and evaluation tools.

Among monitoring and evaluation techniques, impact evaluation is an important tool for showing the effect of interventions. The rigorous nature of impact evaluation provides a clearly attributable measure of the effect of interventions and hence is a powerful tool to show their value. In addition, the rigor of this approach strengthens the argument for the portability of lessons, allowing the program design and evaluation results in one country to influence others.

Box 1: Impact evaluation: definition and methods

Impact evaluations compare the outcomes of a program against a counterfactual that shows what would have happened to beneficiaries without the program. The counterfactual corresponds to a group that is as similar as possible (in observable and unobservable dimensions) to those benefiting from the program. The counterfactual group is established by either assigning the program randomly (experimental designs) or through a range of statistical techniques (quasi-experimental techniques). Unlike other forms of evaluation, impact evaluations allow attributing observed changes in outcomes to the program being evaluated.

While in many cases randomization does reduce the number of assumptions necessary to draw causal inferences, it is not feasible or desirable in all situations. The general best-practice approach at the Bank has been to use whichever experimental or quasi-experimental technique works best and to be explicit about the limitations of the results obtained.

Given the power of this tool, the Bank is supporting an increasing number of impact evaluations (figure 1). Moreover, the knowledge generated through impact evaluation is a public good, which is an important rationale for Bank support of impact evaluation of interventions that are not directly funded by the Bank. Of the ongoing and completed impact evaluations shown in figure 1, over one third are cases where the Bank is engaged in the impact evaluation, but is not providing financing for the intervention being evaluated. [2] Thus there are two overlapping roles for impact evaluation within the Bank: first, as a tool to evaluate our own work, and second, as a tool and service we can provide for others. As bilateral donors and national governments increase their demand for rigorous evidence, we can expect this latter role to assume increased importance.

Figure 1: World Bank Impact Evaluations, by Year and Status
[Stacked bar chart. Vertical axis: number of impact evaluations. Horizontal axis: year and status (before 2004, after 2004, ongoing). Series: Bank projects and non-Bank projects.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

A number of key actors have helped to promote this large increase in impact evaluation over the past few years. The Development Impact Evaluation (DIME) initiative [3] of the Chief Economist has organized and helped coordinate clusters of evaluation on similar topics, while the Poverty Group of the Poverty Reduction and Economic Management (PREM) Network, through the Thematic Group on Poverty Impact Analysis, Monitoring, and Evaluation, has helped develop staff capacity through a wide range of knowledge products. The Bank's Development Research Group (DECRG) provides additional capacity building and is currently undertaking a significant number of impact evaluations. The Results Secretariat of the Operations Policy and Country Services Network (OPCRX) led the establishment of impact evaluation as a recognized AAA product, thereby providing a formal venue for this work to be recognized.

A large share of current evaluations has been initiated in the networks and the Regions. On the network side, Human Development (HD) is leading a large program of impact evaluation and regional workshops. Regions are pursuing a range of strategies, from the centralized Africa Impact Evaluation Initiative to the more decentralized efforts of South Asia (SAR), East Asia and the Pacific (EAP), and Latin America and the Caribbean (LAC).

Central Issues

Although the number of impact evaluations is growing overall, some Regions and networks are more active than others. Given that knowledge gaps about what works and what does not exist in all sectors and Regions, there is a fair amount of work to be done to encourage staff to include impact evaluation in their work, especially in the Sustainable Development Network (SDN), Private Sector Development Network (PSD), and the Europe and Central Asia Region (ECA). Most ongoing impact evaluations are in the social sectors (figure 2), which reflects not only the support provided by the HD Network, but also that there is more of an evaluation tradition in these areas and that the projects are more amenable to impact evaluation techniques. Appendix 1 shows the breakdown of impact evaluations by sector against lending by sector for FY05-07.

Figure 2: Impact Evaluations (Ongoing) by Category
[Pie chart. Education, ECD, and Training 24%; Health, Nutrition, and Population 19%; CCT and other Social Protection 11%; CDD/Social Funds 11%; Other infrastructure 11%; Private Sector Development and Microfinance 9%; Urban Upgrading 6%; Agriculture and Environment 4%; Employment 2%; Youth Programs 2%; Other 1%.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

The Regional picture is also a skewed one. Africa is the leader with 61 ongoing [4] evaluations, followed by SAR (39), LAC (32), and EAP (23). Middle East and North Africa (MENA) and ECA have 3 evaluations each. [5] Appendix 2 provides a lending comparison for the regional breakdown of evaluations. A significant reason for Africa's lead is the Africa Impact Evaluation Initiative, which provides dedicated staff for promoting impact evaluation in the region.

While this large number of ongoing impact evaluations is encouraging, many of them are still in early stages (figure 3), which makes it essential that we maintain quality and continuity going forward. Figure 3 shows that, of the 225 evaluations now in progress in the Bank, 138 have no tangible commitment; that is, they have not reached the point of collecting baseline data (appendix 3 provides a breakdown of the status of ongoing evaluations by Region and by sector). In addition, the typical impact evaluation will take 4-5 years to produce findings. These ongoing evaluations are potentially a rich pipeline of evidence, but their uneven status also raises some concerns. Therefore, it is essential that we maintain continuity and quality as evaluations move along. A number of factors make quality control a challenge. First, most impact evaluations will outlast the average tenure of a task team leader (TTL) and there is no guarantee that the next TTL will be committed to this activity. This is compounded by the fact that Regional and sectoral support staff for impact evaluation are thin (if they exist at all) and thus are incapable of guiding all ongoing evaluations through to completion. [6] Finally, despite the creation of impact evaluation as a product line, there is no central mechanism to review and monitor the quality of these evaluations.

Figure 3: Status of World Bank Impact Evaluations
[Bar chart. Number of impact evaluations by status. Under discussion: 74; Evaluation designed: 64; Baseline data collected: 60; Follow-up data collected: 21; Analysis in progress: 6.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

There is insufficient strategic direction to decisions about which Bank interventions get evaluated, especially outside of the HD Network. Current practice relies mostly on the initiative and interest of individual TTLs. Objective criteria for which interventions should be evaluated could include the following: (i) an intervention that is innovative in delivery or content, (ii) an intervention that will inform the evolution of sector strategy, (iii) an intervention that involves significant risk due to its size, (iv) the application of an intervention that has worked elsewhere to a new context (such as conditional cash transfers in Africa), and (v) the feasibility of developing a counterfactual. The solution here is for sector/regional management teams to review the pipeline of projects that meet these criteria and facilitate the evaluations.

Limited government ownership is a possible constraint to the expansion and effective use of impact evaluation. The big exception to this point comes from Latin America, where countries such as Mexico and Colombia press the Bank for help on impact evaluation. These requests include demands for the evaluation of individual interventions and also Bank support for building national results management capabilities. These demands for national capacity are instructive because they suggest a model for making this work: building a role for impact evaluation into work on national monitoring and evaluation systems. Note also that impact evaluation is an area of business in which we are experiencing increasing demand from our middle-income clients (for example in LAC, but also countries such as Indonesia).

Options to Help Meet Impact Evaluation Challenges

Any strategy to address these issues will likely need to include a number of underlying attributes. First, it will be important to support the current groundswell of impact evaluations initiated by project TTLs. Such support can be delivered through appropriate incentives and resources, accompanied by a strategic direction for new or underserved areas and a commitment to maintaining quality. Second, a strategy for impact evaluations will need to balance twin, and potentially conflicting, goals: (i) country ownership, and (ii) the Bank's need to identify priority areas for impact evaluation based on a global perspective. Finally, the strategy must ensure that lessons from impact evaluations (successes and failures both) are widely disseminated and used for policy. Below, we describe six components of a potential strategy to support impact evaluations.

1. Identify priorities for evaluation

Identifying priorities requires work on both sides of the Bank's organizing structure. Networks, through a process they determine, should identify priorities for clusters of evaluations based on gaps identified in their sector strategies. On the other side, country teams should identify priorities based on country dialogue or country program issues. Cross-sectoral coordination can be provided by DIME. For example, one of the current DIME clusters is local governance, which brings together impact evaluations from Social Development (SDV), the Least Developed Countries Expert Group (LEG), PREM, and DECRG.

2. Ensure quality

A three-pronged approach could support the delivery of high-quality impact evaluations:

a. Staff capacity to supervise and execute impact evaluations needs to be strengthened (discussed below). Given that the best approach to impact evaluation involves a close and ongoing engagement between the project team and the evaluation team from the start of project design, this training is critical to ensuring quality.

b. Support teams could be put in place. These could be located in the network VPUs or in network-Region teams, as in the example discussed in Box 2 on Africa's education initiative.

c. The Bank's review process can be used to monitor quality as the evaluation is carried out. The impact evaluation guidelines call for a concept note and a peer review. In addition, this being a form of AAA, regional chief economists should be made responsible for ensuring quality. Currently, much of this feedback process has been going on through informal meetings. A more formal process could help ensure that, with the growing number of impact evaluations, individual evaluations get the feedback they need.

3. Maintain continuity in the face of staff transitions

Bank staff engage in impact evaluation in two main ways: supervising the execution of the evaluation (and using the results) and doing the evaluation (especially the analysis) themselves. In the latter case, the main issue for continuity will be ensuring that the replacement for the evaluator is sufficiently skilled to continue the work on the evaluation. This suggests a human resource issue: making sure that there is a sufficient skill base across regions and sectors (or in central units such as DECRG). In the case where the staff member is supervising an external evaluator, the issue is again one of capacity, but this is easier to rectify through the type of training we identify below, perhaps combined with some back-stopping support from sectoral support teams.

Box 2: The Africa Program for Education Impact Evaluation

The Africa Program for Education Impact Evaluation provides one model for sectoral-regional teams on impact evaluation. Under an initiative funded by the Education for All Trust Fund, the program solicited country and TTL interest in education evaluations. Eleven countries expressed interest in participating. Program management will be provided by the regional impact evaluation team, with supervision and support from the Africa education group. In addition, a technical support team will be set up to provide help with issues in impact evaluation design, household and facility surveys, sampling, and costing of interventions. This structure will support government evaluation and evaluation research teams, which work in collaboration with Bank TTLs.

4. Build capacity

As part of the process of increasing the quantity and scope of evaluations, it would be desirable to provide additional training to Bank staff and client governments, in addition to developing global learning materials such as the Doing Impact Evaluation series. [7] Training for Bank staff could be provided at two levels. The first level would be to provide a more intuitive explanation of impact evaluation concepts and methods coupled with practical considerations of how evaluation could be used for policy (this level could be something akin to the current PREM week course). This level would equip staff to supervise and intelligently consume impact evaluations. The second level would build on the graduate school training of staff to develop their technical skills to actually carry out evaluations (this level could be like an expanded version of the current DECRG one-week module).

In terms of developing the skills of client governments to supervise and consume impact evaluations, we could provide training similar to that used for Bank staff. Integrating this training with the development of national results management capabilities could help to not only place the tool of impact evaluation in context, but also enhance the demand for a broader monitoring and evaluation system. To build client capacity for executing impact evaluations, we propose that each Bank-sponsored impact evaluation be required to involve 1-2 local researchers and a local research institution. Given the current rate of Bank evaluations, this learning-by-doing approach should yield a worldwide cadre of 200-300 trained researchers in the next few years, providing significant progress towards one of the goals outlined at the Hanoi Roundtable on Results this past year. As these researchers interact with policy makers, they will function as powerful domestic advocates for improved monitoring and evaluation more generally.

5. Learn from the results

Putting the lessons of individual evaluations together heightens their utility for policy making. Given their alignment with sectoral strategies, network teams would likely have a clear role in analyzing, aggregating, and disseminating the results of impact evaluations. However, the presence of cross-sectoral clusters of activities aimed at a common problem suggests the need for a cross-sectoral vehicle to coordinate this knowledge accumulation and dissemination, and DIME exists to play this role. This cross-sectoral learning mechanism could also be well placed to tackle the task of comparing results across interventions, as well as evaluating programs that intervene in multiple sectors simultaneously. For example, we have clear evidence that conditional cash transfers (CCTs) increase enrollment and have effects on health and poverty. This has no doubt contributed to the proliferation of these programs. But what we do not know is whether CCTs are the most cost-effective way to get these benefits. In addition, a more comprehensive understanding of the impacts of interventions will require a broader perspective and examination of the major binding constraints facing economic actors. For example, interventions such as CCTs to increase the demand for schooling will be less effective where the supply of schooling is constrained. It is important that we understand the constraints on the supply side and the demand side. Finally, markets may fail for reasons entirely unrelated to individual firms and people, so it is important to understand that the impact of interventions is conditional on the structure of markets. For example, it may be difficult to design interventions to match workers with jobs if labor markets are constrained by regulations such as hiring and firing restrictions or binding legislated minimum wages.

This analysis and lesson collecting should be complemented by steps to make sure that the individual results are quickly and easily available. To this end, the Poverty Reduction Group within PREM currently maintains a searchable Web-based database of over 200 completed Bank and non-Bank impact evaluations. Demand for this data seems fairly strong, with over 44,000 hits last year. However, with the increasing number of evaluations both inside and outside the Bank in coming years (and the current reliance on soon-to-expire trust fund support), additional support will be needed to make sure that this database remains current.

6. Commit more resources and incentives for more and better impact evaluation

Current incentives and resources for impact evaluation are inadequate. While impact evaluation now exists as an AAA product, not many management units are devoting significant resources to this activity. Given that the evaluation program will be driven in part by knowledge gaps identified in their sector strategies, networks (including their regional affiliates) should make some of their existing resources available for impact evaluation. In addition, at the staff level, there is currently a lack of coherent incentives to initiate evaluations. The result is that most of the TTLs who engage in evaluation either do so on their own initiative or at the request of their immediate manager or the client government. These incentives should be altered to provide the proper signals to staff.

Both the capacity development and staffing of sector support teams will require the commitment of resources in terms of funding and staff time. Project and government funds are often used to fund evaluation activities once a project is underway, but the management of different evaluation programs, pre-program evaluation data collection, and training will all require other sources of funds. While the recent Spanish Trust Fund (SIEF) received by the HD Network provides one example of ways to mobilize new funds for impact evaluation support, there are some restrictions. These include limits on sectoral coverage and lack of support for capacity development originating from Bank headquarters. The fact that over 160 applications for a total of US$40 million were received in response to the latest SIEF call for proposals also provides evidence of a clear resource constraint (SIEF will provide US$6 million in this round). The upcoming PREM-IEG impact evaluation conference can provide a venue to mobilize donors around this issue and develop a more flexible and broader source of financial support.

The lessons provided by impact evaluations are a global public good. For example, as we have seen with conditional cash transfer programs, countries can learn quite a lot from each other's rigorously analyzed experiences. This public good aspect provides a powerful argument for some sort of international subsidy.

Appendix 1: Lending and Ongoing Impact Evaluations, by Sector

Figure A1.1: World Bank Lending by Bank Sector
[Pie chart. Percentages as labeled in the source, in order: Industry and Communication; Law and Justice, and Public Administration 15%; Agriculture, Fishing, and Forestry 18%; Health and Other Social Services 21%; Education 14%; Finance 14%; Energy and Mining, Transport, Water, Urban Upgrading 18%.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

Figure A1.2: Impact Evaluations by Bank Sector
[Pie chart. Industry and Communication; Law and Justice and Public Administration 2%; Agriculture, Fishing, and Forestry 11%; Health and Other Social Services 27%; Education 30%; Finance 11%; Energy and Mining, Transport, Water, Urban Upgrading 19%.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

Appendix 2: Lending and Impact Evaluations, by Region

Figure A2.1: World Bank Lending by Region
[Pie chart. Percentages as labeled in the source, in order: SAR 23%; AFR 24%; MENA 4%; EAP 16%; LAC 18%; ECA 15%.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

Figure A2.2: Impact Evaluations (Ongoing) by Region
[Pie chart. AFR 38%; SAR 24%; LAC 20%; EAP 14%; MENA 2%; ECA 2%.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

Appendix 3: Impact Evaluation Status by Region and Sector

Figure A3.1: Status of Ongoing Impact Evaluations by Category
[Stacked bar chart. Vertical axis: number of impact evaluations. Horizontal axis: status (Under Discussion, Evaluation Designed, Baseline Data Collected, Follow-up Data Collected, Analysis in Progress). Series: Agriculture & Environment; Community Driven Development/Social Funds; Conditional Cash Transfers and other Social Protection; Education, Early Childhood Dev. & Training; Health, Nutrition & Population; Private Sector Development & Microfinance; Urban Upgrading; Other Infrastructure; Youth Programs; Other.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

Figure A3.2: Status of Ongoing Impact Evaluations by Region
[Stacked bar chart. Vertical axis: number of impact evaluations. Horizontal axis: status (Under Discussion, Evaluation Designed, Baseline Data Collected, Follow-up Data Collected, Analysis in Progress). Series: SAR, MENA, LAC, ECA, EAP, AFR.]
Source: World Bank Impact Evaluation Database at: http://www.worldbank.org/impactevaluation.

Endnotes

1. See the recent debate on aid in the Boston Review: http://bostonreview.net/ndf.html#Aid.

2. All graphs and data cited are current as of July 2008.

3. All graphs and data cited are current as of July 2008.

4. "Ongoing" refers to the status of impact evaluations where, at minimum, the evaluation has been designed and not to evaluations that are just under discussion. See color coding in Figure 3.

5. Calculations based on information in Figure 3 and Appendix 2.

6. The only exceptions to this are evaluations where DECRG staff are involved, as they have a longer time horizon and are more likely to push the next TTL to continue the evaluation.

7. The Doing Impact Evaluation series presents general methodological guidance (e.g. Data for Impact Evaluations) as well as sectoral methods notes (e.g. Conducting Impact Evaluations in Urban Transport). The complete series can be found at http://www.worldbank.org/impactevaluation.

This note series is intended to summarize good practices and key policy findings on PREM-related topics. The views expressed in the notes are those of the authors and do not necessarily reflect those of the World Bank. PREMnotes are widely distributed to Bank staff and are also available on the PREM Web site (http://www.worldbank.org/prem). If you are interested in writing a PREMnote, email your idea to Madjiguene Seck at mseck@worldbank.org. For additional copies of this PREMnote please contact the PREM Advisory Service at x87736. PREMnotes are edited and laid out by Grammarians, Inc.

Prepared for World Bank staff