Guidance Manual for Independent Evaluation Group Validators
Implementation Completion and Results Report Reviews for Development Policy Financing
Last Revision: April 2024

Contents
Abbreviations
Introduction
Guidance Manual
Section 1. Information on Operation or Programmatic Series
Section 2. Objectives and Pillars or Policy Areas of Operation or Programmatic Series
Section 3. Relevance of Design
Section 4. Rating the Relevance of Results Indicators (RIs) criteria
Section 5. Achievement of Objectives (Efficacy)
Section 6. Outcome
Section 7. Risk to Development Outcomes
Section 8. Assessment of Bank Performance
Section 9. Other Impacts
Section 10. Quality of the Implementation Completion and Results Report
Section 11. Ratings
Section 12. Lessons
Conducting the Task Team Leader Interview as Part of the Implementation Completion and Results Report Review Exercise

Boxes
Box 1.1. An Example of a Parsed Complex or Compound Project Development Objective (PDO)
Box 1.2. Numerical Scores for Prior Action Relevance Ratings

Figure
Figure 1.1. Calculating the Overall Outcome Rating

Tables
Table 1.1. Numbering and Listing Prior Actions in a Programmatic Series: An Example from Mauritania
Table 1.2. Assessing Relevance of a Prior Action or Set of Related Prior Actions
Table 1.3. Ratings Methodology: Deriving the Overall Rating from Subratings
Table 1.4. Rating the Relevance of Results Indicators
Table 1.5. Sample Table on Results Indicators (Required)
Table 1.6. Step 1: Assigning Achievement Ratings to Each Results Indicator
Table 1.7. Step 2: Rating Efficacy at the Objective Level
Table 1.8. Rating Bank Performance
Table 1.9. Example of a Ratings Summary Table

Abbreviations
CPF  Country Partnership Framework
DPO  development policy operations
FCV  fragility, conflict, and violence
HS  highly satisfactory
HU  highly unsatisfactory
IBRD  International Bank for Reconstruction and Development
ICR  Implementation Completion and Results Report
ICRR  Implementation Completion and Results Report Review
IDA  International Development Association
IT  indicative trigger
IEG  Independent Evaluation Group
MDB  Multilateral Development Bank
MS  moderately satisfactory
MU  moderately unsatisfactory
PA  prior actions
PDO  project development objectives
PPAR  Project Performance Assessment Report
RI  results indicator
S  satisfactory
SCD  Systematic Country Diagnostic
TTL  task team leader
U  unsatisfactory

Introduction

The Implementation Completion and Results Report (ICR) is one of the World Bank’s main instruments for project- and operation-level self-evaluation. It is prepared by World Bank staff within six months of the close of every project funded by the International Development Association (IDA) and the International Bank for Reconstruction and Development (IBRD) or, in the case of a series of programmatic development policy operations, within six months after closing of the final operation in the series.

The Implementation Completion and Results Report Review (ICRR), conducted by the Independent Evaluation Group (IEG), is an independent, desk-based, critical validation of the evidence, results, and ratings of the ICR in relation to the project’s design documents. It also assesses additional dimensions of the ICR to help promote staff learning.
Based on the evidence provided in the ICR and an interview with the task team leader at closing of the operation(s),[1] IEG validates the ICR findings and adjusts the ratings appropriately, based on the evaluation criteria agreed with Operations Policy and Country Services. IEG reviews all ICRs.

This manual provides guidance to evaluators preparing ICRRs on ICRs for development policy financing operations. It provides guidance for and gives examples of how to structure ICRRs with respect to content, presentation, and ratings. It also provides guidance on the preparation of ICRRs for development policy operations (DPOs) in countries affected by fragility, conflict, and violence to better reflect their particular characteristics and realities and make the ICRR a better tool for learning. Although this guidance manual does not focus on writing style, the ICRR should comply with IEG’s writing style guidelines found in the Independent Evaluation Group Style Guide.

[1] If the Implementation Completion and Results Report Review is for a programmatic series, questions may arise that can be answered only by previous task team leaders, who should then be interviewed.

Guidance Manual

Section 1. Information on Operation or Programmatic Series

Section 1 is filled in automatically by the system. Make sure your name appears as the evaluator. Note any missing fields.

Section 2. Objectives and Pillars or Policy Areas of Operation or Programmatic Series

2a. Objectives

Section 2a should describe the project development objectives (PDOs) of the operation or series.

Step 1: The formal PDO for the operation or series should be indicated in this section. The formal PDO is that which appears in the operation’s Financing Agreement and/or the Project Appraisal Document (PAD). If the PDO in the Financing Agreement differs in any way from that in the program document, the difference should be noted. If no formal PDO is stated in either the Financing Agreement or the PAD, this should be noted.
In lieu of a formal PDO, the PDO identified in the Implementation Completion and Results Report (ICR) should be described. For a programmatic series, describe any changes or evolution in the PDOs across operations. There should be no assessment of the PDO in this section; it is purely descriptive.

Step 2: When necessary, it may be useful to “parse” the PDO to arrive at the underlying de facto objectives. Sometimes, the PDO consists of several distinct objectives (that is, it may contain different objectives that either are loosely related or require policy actions in separate and distinct areas). If so, you should articulate the parsed objectives for the purpose of the Implementation Completion and Results Report Review (ICRR) validation. It may be useful to review the prior actions (PAs) to inform the best articulation of parsed objectives.
• Example: The PDO “improve access to education and energy and foster financial inclusion” should be parsed into “improve access to education,” “improve access to energy,” and “foster financial inclusion.”
• Example: If the PDO is “promotion of fiscal consolidation,” and the operation supports reforms on both the spending and revenue sides, the PDO could be parsed into revenue and expenditure components (for example, “control government spending” and “increase revenue mobilization”). See Mato Grosso Fiscal Adjustment Sustainability DPL (P164588).

In some cases, you may find that the PDO is set at too high a level or has overly broad objectives (for example, “support inclusive growth”). In such a case, articulating a credible results chain linking the set of PAs to the associated PDO can be difficult. You may need to restate the PDO as de facto objectives that better align with the scope and ambition of the PAs.
Step 3: After any parsing, the ICRR text should state, “For the purpose of this ICRR, the objectives of the operation/series (against which outcomes will be assessed) are taken to be:” After this, the parsed, de facto PDOs are listed (see box 1.1).

Box 1.1. An Example of a Parsed Complex or Compound Project Development Objective (PDO)

PDO: (i) strengthening the policy framework to support state effectiveness, private investment, and social inclusion; and (ii) improving the policy and institutional framework for public financial management.

For the purpose of this Implementation Completion and Results Report Review, the PDOs of the operation/series (against which outcomes will be assessed) are taken to be:
• Strengthen the policy framework to support private investment
• Strengthen the policy framework to support social inclusion
• Improve the policy and institutional framework for public financial management.

Source: Independent Evaluation Group.

Step 4: For section 3b, you will prepare a table that maps the full list of PAs associated with the operation(s) to the parsed objectives from step 3.

2b. Pillars or Policy Areas

For the purposes of the ICRR, the terms pillars and policy areas have the same meaning and are used interchangeably. They refer to the area of reform required to support achievement of each objective. The text in section 2b is limited to describing the pillars of the operation as expressed in the program document.

2c. Comments on Program Cost, Financing, and Dates

This section describes the amount and source of financing of the operation or program (IDA grant, IBRD, and so on), the approval date of the operation (or dates if a programmatic series), the date(s) it became effective, and the closing date. Specify the amount disbursed, and explain any discrepancies between the amount approved and the amount disbursed.
With development policy financing, because most operations are disbursed in a single tranche, differences are almost always due to exchange rate fluctuations between the approval and disbursement dates. If differences are large, you should seek additional information from the task team leader (TTL) during the standard ICRR interview. For a large movement in the exchange rate, the ICRR could note the movement between the approval and disbursement dates. This information can be found on the International Monetary Fund web page “Exchange Rate Archives by Month” at https://www.imf.org/external/np/fin/data/param_rms_mth.aspx.

Section 3. Relevance of Design

3a. Relevance of Objectives

Section 3a discusses the relevance of each objective (as parsed and described in 2a). The objectives of the operation (or series) are expected to contribute to country-specific development objectives and should reflect reform priorities as identified in diagnostic or analytical work. The discussion of the relevance of objectives should address the following questions:
• Are the objectives relevant to tackling country-specific development constraints as identified in the Systematic Country Diagnostic (SCD) or other relevant analytical work (for example, Financial Sector Assessment Program, Debt Management Performance Assessment, Public Expenditure and Financial Accountability, Public Investment Management Assessment, analytical work from other multilateral development banks (MDBs), and academic work from research institutions and/or agencies)?
• Are the objectives relevant to the country’s development strategy and the priorities set out in the Country Partnership Framework (CPF)?
  o The discussion and assessment of relevance should go beyond simply noting that objectives are consistent or aligned with the CPF or the country’s development plan. The text should assess the extent to which the objectives of the operation(s) would address priority country-specific challenges (for example, as identified in the SCD or other diagnostic or academic work, including that of other MDBs and research institutions). In effect, it should assess why the operation is a good use of scarce World Bank resources. An objective may be relevant if it responds to a significant shock or development not foreseen when the SCD, CPF, or country-specific national development plan or strategy was prepared.
• Are the objectives important enough to warrant direct World Bank involvement?
• Is the level at which the objectives are set appropriate, given the depth and scope of the reforms supported? (Generic objectives pitched at too high a level often lack specificity and extend well beyond the scope of the PAs.) If objectives are too high level and ambitious to be credibly achieved by the PAs of the development policy operations (DPOs), this should be noted.
  o Note: The ICRR does not evaluate the ambitiousness of the objectives. However, the ambition of objectives should be consistent with the scope and ambition of PAs; that is, it should be feasible for the reforms supported by the PAs to make a meaningful contribution to achievement of the objective(s), for example, by addressing important preconditions for reform progress. When PAs in support of a PDO are few and narrowly focused, the PDO should be similarly focused. For example, if PAs are limited to reforms in a single sector, a PDO that seeks “economic transformation of the economy” would be considered too broad or at too high a level.
For countries affected by fragility, conflict, and violence (FCV), the discussion of the relevance of objectives may also cover the following points:
• The extent to which the objectives are realistic and achievable over the life of the operation or programmatic series, given the FCV country context;
• The extent to which the objectives are consistent with the approach, strategies, and priorities identified in the Risk and Resilience Assessment or similar analysis. For example, in an FCV context, DPOs often have objectives that seek to strengthen a country’s institutions or institutional capacity or build resilience. Where this is the case, it should be noted in the discussion of the relevance of objectives;
• Whether the focus of the operation or programmatic series is sufficiently narrow so as not to overtax the limited capacity of the country’s institutions; and
• The extent to which the use of a DPO rather than an investment project is justified. For example, DPOs are seldom the best instrument for building technical capacity unless they are complementary to other efforts targeted at capacity building.

3b. Relevance of Prior Actions

Section 3b assesses the relevance of PAs in supporting achievement of the policy objectives (as parsed in section 2a). The text should address the following questions:
• Does the PA (individually or in combination with other PAs) address constraints to achievement of the associated objective?
• Does the PA make a substantive and credible contribution to achieving that objective?

You should assess the credibility of the results chain that runs from each PA (or set of related PAs) to the relevant (parsed) objective. Note that a PA may be relevant to more than one objective. To facilitate understanding of the program’s design, PAs should be grouped by objective, and each PA should be listed as it appears in the program document(s); that is, PAs should not be paraphrased.
To help organize the discussion, each PA should be assigned a distinct number. Table 1.1 shows the recommended format for listing and numbering PAs. Numbering is straightforward for a single-operation DPO. However, when the relevance of PAs for a programmatic series is being assessed, analysis can be facilitated by organizing PAs under each DPO, as in table 1.1. In this example of a programmatic series with two objectives, the first operation has four PAs, and the second has three PAs. The PAs are numbered from 1 to 7 and listed in order, with PAs that are part of the same results chain next to each other. Where the PAs are renumbered in the ICRR, it is helpful to include the original numbering for ease of reference, for example, PA7 (DPO2-PA1).

Table 1.1. Numbering and Listing Prior Actions in a Programmatic Series: An Example from Mauritania

PDO 1: Improve domestic revenue mobilization
• PA1 (DPO 1): Minister of Finance has issued an order introducing the benchmark tax model for tax exemptions, and has published it in the official gazette, and has compiled a tax exemption registry for firms benefiting from tax exemptions under the 1982 Investment Code and the 1966 Free Zone Area law.
• PA2 (DPO 2): Ministry of Economy and Finance, based on a policy communique to the Council of Ministers, has notified the companies in full breach of their investment agreements that their tax and customs incentives, awarded under the 2012 Investment Code, will be revoked, effective January 1, 2018.
• PA3: The Ministry of Economy and Finance has adopted the legal provisions for a comprehensive transfer pricing documentation and disclosure requirements as well as an [sic] effective anti-abuse provisions, which limit an entity’s net interest deductions to a fixed percentage of its profit, measured using earnings before interest, taxes, depreciation and amortization.

PDO 2: Increase efficiency of public spending
• PA4: The Council of Ministers has issued a decree creating an institutional framework for the evaluation, selection, and execution of public investment projects, and has published it in the official gazette.
• PA5: The Council of Ministers has approved the budget law proposal for 2017 that includes an integrated public investment budget with combined domestic and foreign financed projects.
• PA6 (DPO 1): The Minister of Economy and Finance has issued an executive circular requiring the expansion of the automated expenditure-chain system (RACHAD) to include all eligible EPAs in Nouakchott beginning January 1, 2017.
• PA7 (DPO 2): Minister of Economy and Finance has issued a policy communique instructing expansion of the treasury management system (RACHAD) to encompass revenues and expenditures of all eligible public agencies starting January 1, 2018, to reduce fiscal risks and enable budgetary savings.

Source: Independent Evaluation Group 2021.
Note: DPO = development policy operation; EPA = administrative government agency; PA = prior action.

In assessing PA relevance, PAs are not expected to be sufficient in themselves to achieve objectives, but they are expected to move meaningfully along the results chain from the PA to the associated objective in the specific country context. Assign a relevance rating for each PA based on a six-point scale, from 1 for highly unsatisfactory (HU) to 6 for highly satisfactory (HS; see table 1.2 and box 1.2). When PAs are clearly part of the same results chain (for example, complementary or subsequent steps in achieving the associated goal), you may assess them collectively.

You should provide the following information to justify the assessment and the assignment of each rating, drawing on information contained in the Project Appraisal Document or ICR.
• Results chain.
How the PA, in the country context (and considering known constraints), is expected to make meaningful progress toward the achievement of the relevant objective.[2]
• The rating for each PA should be noted in the paragraph in which its relevance is assessed (but numerical scores should not be included in the text). Where PAs are assessed together (that is, are part of the same results chain), the write-up can be consolidated into a single paragraph, but the distinct ratings for each PA should be articulated.

Ratings and justification should reflect the following points:
• The clarity and credibility of the results chain linking the PA(s) to achievement of the relevant objective
• The extent to which the PA(s) is expected to
  o Address meaningful constraints to achievement of the objective(s); and
  o Make a substantive and credible contribution to achieving the objective(s).
• The extent to which the expected impact of the PA(s) in making progress toward the achievement of the objective(s) is contingent on subsequent actions not contained in the programmatic series

[2] For example, “By establishing detailed reporting on budget outcomes, PA1 is expected to support Uruguay’s implementation of a results-based budgeting framework to strengthen accountability and transparency in the budget process.”

Indicative Triggers (IT): The relevance of indicative triggers is not assessed. In a programmatic DPO series, the indicative triggers normally signify planned PAs for subsequent operations in the series. Occasionally, an indicative trigger may be dropped. In such a case, the relevance of the IT is not assessed. However, where an IT was dropped, the evaluator should note in the pertinent PA relevance write-up whether the effectiveness of the PA depended on the subsequent completion or follow-through of the IT that was dropped.

Table 1.2. Assessing Relevance of a Prior Action or Set of Related Prior Actions

Rating scale, from highest to lowest: Highly Satisfactory | Satisfactory | Moderately Satisfactory | Moderately Unsatisfactory | Unsatisfactory | Highly Unsatisfactory

Clarity and credibility of the results chain (descriptors in order from highest to lowest rating):
• There is an explicit, comprehensive, and convincing results chain linking the PA(s) to the achievement of the PDO, grounded in credible analytical work at the country level (and incorporating lessons learned from similar operations or experiences).
• A credible results chain linking the PA(s) to achievement of the PDO is outlined but not explicitly described or grounded in credible analytical work.
• The description of the results chain linking the PA(s) to achievement of the PDO is only partly convincing.
• The description of the results chain linking the PA(s) to achievement of the PDO is unconvincing.
• There is no reference to a results chain linking the PA(s) to achievement of the PDO.

Importance of PA to achievement of outcome (descriptors in order from highest to lowest rating):
• The PA(s) is the dominant factor in the achievement of the PDO.
• The PA(s) makes a major contribution to the achievement of the relevant PDO.
• The PA(s) makes a moderate contribution to the achievement of the relevant PDO.
• The PA(s) makes a minor contribution to the achievement of the relevant PDO.
• The PA(s) makes no discernible contribution to the achievement of any PDO.

Source: Independent Evaluation Group.
Note: PA = prior action; PDO = project development objective.

In an FCV context, the following should also inform the discussion and rating of the relevance of a PA (or set of related PAs):
• Is the PA consistent with the approach, strategies, and priorities identified in the Risk and Resilience Assessment or similar analysis? Does it show an awareness of underlying fragility and conflict dynamics and the need to strengthen public institutions?
• Is the number of PAs (and policy areas) appropriate, given the capacity and implementation constraints?
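The averaging method set out under the next heading (box 1.2 and table 1.3) can be sketched in a few lines of Python. This is an illustrative sketch only, not an IEG tool: the function name `overall_relevance_rating`, the optional `weights` argument, and the choice to round an average ending in exactly .5 upward are assumptions made here, because the manual says only that decimals are rounded up or down as appropriate.

```python
from math import floor

# Numerical scores from box 1.2.
SCORES = {"HS": 6, "S": 5, "MS": 4, "MU": 3, "U": 2, "HU": 1}
LABELS = {score: label for label, score in SCORES.items()}


def overall_relevance_rating(ratings, weights=None):
    """Return (average score, overall rating label) for a list of PA ratings.

    ratings: rating labels such as ["S", "MS", "MU"].
    weights: optional positive weights, one per PA (assumption: hypothetical
             argument for the judgment-based reweighting the manual allows).
             Equal weights are used by default, which gives the simple average.
    """
    if not ratings:
        raise ValueError("at least one PA rating is required")
    if weights is None:
        weights = [1.0] * len(ratings)
    if len(weights) != len(ratings):
        raise ValueError("one weight is required per PA rating")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    average = sum(SCORES[r] * w for r, w in zip(ratings, weights)) / sum(weights)
    # Assumption: round to the nearest whole score, with .5 rounded up.
    rounded = floor(average + 0.5)
    return average, LABELS[rounded]


# The eight PA ratings used in table 1.3.
average, overall = overall_relevance_rating(
    ["S", "MS", "MU", "HU", "U", "MU", "U", "S"]
)
print(average, overall)
```

With the eight ratings from table 1.3, the sketch gives an average of 3.125, which converts back to MU. If unequal weights are passed, the manual still requires that the reweighting be made explicit in the ICRR and credibly justified.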
Determining the Overall Prior Action Relevance Ratings

To determine the overall relevance rating for PAs, first convert all PA ratings to their numerical scores (see box 1.2). The default approach is to assign equal weight to each PA (that is, the overall relevance rating is the simple average of the individual PA relevance ratings). In some cases, one or more PAs may be considered more important than others. If so, you may use judgment to assign those PAs a higher weight, but the reweighting should be made explicit and a credible justification provided. Box 1.2 can be used again to convert that final score back to the rating scale of HS to HU, with decimals rounded up or down as appropriate (see table 1.3).

Box 1.2. Numerical Scores for Prior Action Relevance Ratings

Highly satisfactory (HS) = 6
Satisfactory (S) = 5
Moderately satisfactory (MS) = 4
Moderately unsatisfactory (MU) = 3
Unsatisfactory (U) = 2
Highly unsatisfactory (HU) = 1

Source: Independent Evaluation Group.

Table 1.3. Ratings Methodology: Deriving the Overall Rating from Subratings

PA No. | Rating on HS to HU Scale | Rating on Six-Point Scale
1 | S | 5
2 | MS | 4
3 | MU | 3
4 | HU | 1
5 | U | 2
6 | MU | 3
7 | U | 2
8 | S | 5
Average | | 3.125
Converted back to rating scale of HS to HU | MU |

Source: Independent Evaluation Group.
Note: HS = highly satisfactory; HU = highly unsatisfactory; MS = moderately satisfactory; MU = moderately unsatisfactory; PA = prior action; S = satisfactory; U = unsatisfactory.

Section 4. Rating the Relevance of Results Indicators (RIs) criteria

Table 1.4. Rating the Relevance of Results Indicators

Rating scale, from highest to lowest: Highly Satisfactory | Satisfactory | Moderately Satisfactory | Moderately Unsatisfactory | Unsatisfactory | Highly Unsatisfactory

Likely impact of the PA in support of PDO(s) (descriptors in order from highest to lowest rating):
• The RI (alone or in conjunction with other RIs) fully and adequately measures the impact of the PA(s) on progress toward achievement of the targeted outcome through reference to a clear and credible results chain.
• The RI (alone or in conjunction with other RIs) is mostly adequate to measure the impact of the PA(s) on progress toward achievement of the targeted outcome through reference to a clear and credible results chain.
• The RI (alone or in conjunction with other RIs) partly measures the impact of the PA(s) on progress toward achievement of the targeted outcome, but its link to the PDO is unclear.
• The RI (alone or in conjunction with other RIs) only peripherally measures the impact of the PA(s) or is not clearly relevant to achievement of the PDO, or both.
• The RI is not relevant to the impact of the PA(s) toward the achievement of the PDO.

Clarity of RI definition, data source, and data availability (descriptors in order from highest to lowest rating):
• (i) The definition and calculation of the RI are clearly explained in program documentation. (ii) There are credible baseline data and a clear target; the sources of data to calculate the RI are clearly indicated. (iii) The RI is used to regularly monitor progress toward achievement of the target during implementation of the programmatic series and at the time the ICR is produced.
• (i) The definition and calculation of the RI are clearly explained in program documentation. (ii) There are credible baseline data and a clear target; the sources of data to calculate the RI are clearly indicated. (iii) Credible data are available to measure achievement of the target at the time the ICR is produced.
• (i) The definition and calculation of the RI are explained in program documentation, but its calculation is unclear or not in appropriate units. (ii) There are credible baseline data and a clear target; the sources of data to calculate the RI are clearly indicated. (iii) Credible data are available to measure achievement of the target at the time the ICR is produced.
• (i) The definition and calculation of the RI are not clearly explained in program documentation. (ii) There are clear baseline data and a target, but sources for data to calculate the RI are vague. (iii) The RI uses data that are either not credible or not available to assess achievement of the target at the time the ICR is produced.
• (i) RIs are not defined in program documentation. (ii) Data for either the baseline or target are missing, and data sources are not indicated. (iii) The RI uses data that are not available to assess achievement of the target at the time the ICR is produced.

Source: Independent Evaluation Group.
Note: The relevance of RIs is judged within the country context. In countries affected by fragility, conflict, and violence, the availability of regularly updated data for measuring progress may be limited, and you may need to augment the RIs with qualitative indicators. ICR = Implementation Completion and Results Report; PA = prior action; PDO = project development objective; RI = results indicator.

An RI that measures progress toward the objective but does not capture the impact of a PA is not considered relevant for the purposes of the assessment.
Example: In a case where the objective was raising domestic tax revenues, the PA was an increase in the value-added tax rate, and the RI measured the revenue to gross domestic product ratio, the RI would be considered moderately unsatisfactory, because although it captures the impact of that PA, it is also influenced by many other factors (for example, increases in other taxes, improved compliance). A better RI would be value-added tax collections.

Relevance also requires that each RI be clearly defined, including the associated data source and how the RI is calculated.

Finally, RIs that capture the impact of PAs but are not connected to an objective through a coherent results chain are not considered relevant for the purposes of the assessment.

Example: In a case where the PA is increased funding for a program providing cash transfers to households conditional on children’s school attendance, an RI measuring the increase in the number of beneficiaries of the cash transfer program would adequately capture one impact of the PA. However, if the relevant objective is to ensure better funding and targeting of programs for people living in poverty, the RI would not adequately capture the targeting element. Without another indicator capturing targeting, the relevance of the RI would be considered moderately unsatisfactory.

In an FCV context, institution building is critical. One or more RIs in this context would generally be expected to capture some aspect of this objective. The absence of indicators measuring progress toward this objective (whether explicit or not) should be noted.

Required Table in Section 4

Section 4 of the ICRR should list the RIs as described in the program document. For ease of understanding the results chain (and for assessing efficacy later), group these by objective (as parsed in section 2).
Section 4 should include a table that contains information on both RI relevance and RI efficacy ratings (to be discussed in the following section; see table 1.5 for an example).[3] The table should contain the following columns:
• RI number and description
• PA(s) for which the RI is intended to capture impact
• Rating of RI relevance (see table 1.4 for guidance on rating RI relevance)
• The baseline and target values of the RI from the program document, including associated years
• Most recent data on RI (and date of observation)
• Assessment of actual change in RI relative to targeted change
  o Example: If the operation envisioned an increase in a particular RI from 40 to 100, the targeted increase is 60. If over the course of the operation, the RI increased to 70, the actual increase is 30. In the table, you should note that only one-half of the planned change was achieved.
• RI achievement rating

[3] Results indicator baseline and target values (and associated dates) are included in the table, although that information is not discussed until the discussion of efficacy in section 5. The table should note the status of the indicator at the target date in the last column. Often, this information is contained in a table in the Implementation Completion and Results Report and can be directly imported, although the information may need to be reorganized.

In a programmatic series, list only the RIs and targets in place at approval for the last operation of the series (RIs that are dropped should be excluded). For an RI that was used in several operations but whose target value changed, focus on the RI target for the last operation in the series.
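The “actual change relative to targeted change” column described in the list above is simple arithmetic. The following sketch reproduces the 40-to-100 example; the function name `share_of_targeted_change` is hypothetical, and raising an error when the target equals the baseline is an assumption, since the manual does not address that case. Because both differences are signed, the same formula also works when the target is a decrease.

```python
def share_of_targeted_change(baseline, target, actual):
    """Return the actual change in an RI as a fraction of the targeted change."""
    targeted_change = target - baseline
    if targeted_change == 0:
        # Assumption: with no targeted change, the ratio is undefined.
        raise ValueError("target equals baseline; no change was targeted")
    return (actual - baseline) / targeted_change


# Example from the text: targeted increase of 60 (40 to 100),
# actual increase of 30 (40 to 70).
share = share_of_targeted_change(baseline=40, target=100, actual=70)
print(f"{share:.0%} of targeted change achieved")  # prints "50% of targeted change achieved"
```

A result above 1.0 means the target was exceeded, and a negative result means the RI moved in the opposite direction from the target.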
You should still make note (in the text) of RIs that were dropped or changed during the life of the series (this should also be noted in the section on Bank Performance: Implementation in discussing the adaptation of the series over time), but the assessment of relevance (and efficacy) should be based on only the final set of RIs and targets.

The criteria for assigning relevance ratings to RIs are described in table 1.4. These ratings and their justification are discussed in the text. The overall relevance rating for RIs is determined in the same way as for PAs, mapping individual ratings to numerical scores and then taking the unweighted average of the scores. This average is then mapped back to the associated rating after rounding up or down as appropriate. Record the overall relevance rating at the end of section 4.

Table 1.5. Sample Table on Results Indicators (Required)
RI Description (Assigning a Number to Each RI) | Associated PA(s) | RI Relevance | Baseline (Including Units and Date) | Target (Including Units and Date) | Actual Value as of Target Date (a) | Actual Change in RI Relative to Targeted Change | Most Recent RI Value Available (If Not Target Date) | RI Achievement Rating
Objective 1: Increase domestic revenue mobilization
RI1: Tax revenue (percentage of GDP) | PA1 | MS | 17 (2015) | 18.2 (2019) | Actual 18.8 (2019) | More than 100% of targeted change | 19.0 (2020) | High
RI2: Public enterprises’ and agencies’ extrabudgetary spending and carry-forwards (percentage of GDP) | PA2 | HU | 1.2 (2016) | 0.2 (2018) | Actual 0.5 (2019) | 70% of targeted change; (no data for superior indicator available) | | [Substantial] (b)
Objective 2: Increase private sector participation in nonextractives sector
RI3: Executive PPP Unit has reviewed and assessed PPP projects according to new regulatory framework | PA4 | S | 0 (2016) | Half of PPP portfolio projects reviewed by PPP unit (2018) | Actual: 100% of proposed (2018) | More than 100% of targeted change | 100% (2020) | High
RI4: Increase in the number of formal properties titled | PA5 | S | 27,168 (2015) | 31,000 (2018) | Actual 29,275 (2018) | 55% of targeted change | 32,130 | Modest
Source: Independent Evaluation Group.
Note: GDP = gross domestic product; HU = highly unsatisfactory; MS = moderately satisfactory; MU = moderately unsatisfactory; PA = prior action; PPP = public-private partnership; RI = results indicator; S = satisfactory.
a. For a programmatic series, if the RI was dropped before the final approved operation in the series, use “Dropped” in place of “Actual.”
b. RI achievement ratings in brackets, where the RI relevance is MU or lower, reflect achievement ratings that may have been adjusted, as discussed in the efficacy section (see the guidelines in table 1.6).

Section 5. Achievement of Objectives (Efficacy)

Section 5 evaluates the extent to which the objectives of the operation or series have been achieved or are expected to be achieved in the near future. Efficacy is defined as the extent to which the objective has been achieved as a result of the PAs supported by the operation(s). Begin by assessing achievement of the target for each RI.

Step 1: Assign an achievement rating to each RI using the four-point rating scale in table 1.6. The rating is based on the change in the RI relative to the targeted change (not relative to the RI’s target value). If you determined in the RI relevance section that an RI does not adequately capture the impact of a PA, progress toward the associated objective, or both, or if data for the RI are not credible, you should adjust the achievement rating downward (unless other relevant evidence is produced). If data for the RI are not available, the RI targets should be considered not achieved (that is, negligible).

Example: Consider an objective to increase agricultural productivity in citrus fruits and corn, and a PA to give fertilizer vouchers to producers of these two products. The RI was “bushels of corn produced,” with a targeted increase of 2 million bushels per year.
The targeted change was achieved. However, the evaluator identified two shortcomings of the RI: (i) the RI focused only on the output side of production (whereas productivity has both an input and output dimension), and (ii) the RI captured only corn production. Because the RI did not adequately measure progress toward the productivity objective or capture the intended impact of the PA on citrus fruit production, the evaluator should downgrade the achievement rating unless additional information can more satisfactorily verify the intended PA impact toward the objective.

Table 1.6. Step 1: Assigning Achievement Ratings to Each Results Indicator
Rating | Description
High | RI target met or exceeded for the indicator, and RI relevance is rated HS or S. The assessment can be informed by additional evidence.
Substantial | At least two-thirds of the targeted change in the RI was realized by the target date, and RI relevance is rated MS or higher. The assessment can be informed by additional evidence.
Modest | Less than two-thirds but more than 25 percent of the targeted change in the RI was realized by the target date, and/or RI relevance is rated MU. The assessment can be informed by additional evidence.
Negligible | Twenty-five percent or less of the targeted change in the RI was realized by the target date, and/or RI relevance is rated U or HU. When there is insufficient evidence to assess the achievement of the target, and no credible additional evidence is presented, the target is considered “not verified,” which is equivalent to “negligible.”
Source: Independent Evaluation Group.
Note: HS = highly satisfactory; HU = highly unsatisfactory; MS = moderately satisfactory; MU = moderately unsatisfactory; RI = results indicator; S = satisfactory; U = unsatisfactory.

If the ICR or the TTL provides additional relevant evidence of progress toward achievement of a particular objective as a result of a PA, [4] you may consider this in assessing achievement.
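As a rough illustration, the Step 1 rules in table 1.6 can be sketched in Python. The sketch is not part of the ICRR methodology or any IEG tool: the function names are hypothetical, and treating RI relevance as a ceiling on the achievement rating is only one reading of the table’s “and/or” wording. It also ignores additional evidence, which a validator may use to adjust the result.

```python
# Minimal sketch of table 1.6 (Step 1). All names here are hypothetical.
LEVELS = ["Negligible", "Modest", "Substantial", "High"]

# Assumption: RI relevance acts as a ceiling on the achievement rating
# (HS/S allow High, MS allows Substantial, MU allows Modest, U/HU Negligible).
RELEVANCE_CEILING = {"HS": 3, "S": 3, "MS": 2, "MU": 1, "U": 0, "HU": 0}


def share_of_targeted_change(baseline, target, actual):
    """Actual change as a share of the targeted change (not of the target value)."""
    targeted_change = target - baseline
    if targeted_change == 0:
        raise ValueError("target equals baseline; targeted change is undefined")
    return (actual - baseline) / targeted_change


def achievement_rating(baseline, target, actual, ri_relevance):
    """Rate one RI before any adjustment for additional evidence."""
    if actual is None:  # no data available: target treated as not achieved
        return "Negligible"
    share = share_of_targeted_change(baseline, target, actual)
    if share >= 1:          # target met or exceeded
        level = 3
    elif share >= 2 / 3:    # at least two-thirds of the targeted change
        level = 2
    elif share > 0.25:      # more than 25 percent but less than two-thirds
        level = 1
    else:                   # 25 percent or less
        level = 0
    return LEVELS[min(level, RELEVANCE_CEILING[ri_relevance])]


# The manual's example: baseline 40, target 100, actual 70.
print(share_of_targeted_change(40, 100, 70))  # 0.5
print(achievement_rating(40, 100, 70, "S"))   # Modest
```

Because the share is a ratio of signed changes, the same function handles indicators that are meant to decrease (for example, a target that lowers a spending ratio).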
You may choose to include additional evidence in the assessment, although you are under no obligation to expend significant effort in locating it. This can include further discussions with the TTL or the project team, or sourcing supervision reports and decision meeting minutes where applicable. Record these ratings in the final column of table 1.5 (RI Achievement Rating).

Step 2: Determine objective-level efficacy. Create a separate section for each objective. Under each objective, summarize the intended outcomes from the objective (the changes expected in the RIs, where RIs are relevant), noting results achieved relative to targeted results and highlighting where RIs were not appropriate for capturing progress. If other relevant evidence is available, describe it here. For each objective, look at the set of RI achievement ratings and compute the objective-level efficacy score using the rating methodology shown in table 1.7 (a six-point scale from HU to HS). Report the objective-level efficacy rating at the end of the section.

Table 1.7. Step 2: Rating Efficacy at the Objective Level
Rating | Description
Highly satisfactory | Achievement of all RI targets is rated high.
Satisfactory | Achievement of most RI targets is rated substantial or above (a); no RI target is rated negligible.
Moderately satisfactory | Achievement of at least half of RI targets is rated modest or above; fewer than one-third of RI targets are rated negligible.
Moderately unsatisfactory | Achievement of most RI targets is rated modest or below (a); at least one RI target is rated negligible.
Unsatisfactory | Achievement of most RI targets is rated negligible (a); the remainder are rated no higher than modest.
Highly unsatisfactory | Achievement of all RI targets is rated negligible.
Source: Independent Evaluation Group.
Note: RI = results indicator.
a. Most is defined as two-thirds or more.

These rating definitions should cover the majority of situations.
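The definitions in table 1.7 can be sketched as a small Python function that returns the categories a set of RI achievement ratings fits. This is an illustration under stated assumptions, not an IEG tool: the function name is hypothetical, and where one category is a special case of another (for example, a set in which all targets are rated high also meets the satisfactory definition), the sketch assumes the more specific category applies. An empty result, or more than one category, signals that judgment is needed.

```python
from fractions import Fraction

# Ordinal scale for RI achievement ratings (table 1.6), lowest to highest.
ACHIEVEMENT_ORDER = {"negligible": 0, "modest": 1, "substantial": 2, "high": 3}


def candidate_objective_ratings(ri_ratings):
    """Return the table 1.7 categories that fit a list of RI achievement ratings.

    Hypothetical helper: an empty list or more than one entry means the
    validator must exercise judgment.
    """
    if not ri_ratings:
        raise ValueError("at least one RI achievement rating is required")
    levels = [ACHIEVEMENT_ORDER[r.lower()] for r in ri_ratings]
    n = len(levels)

    def share(predicate):
        return Fraction(sum(1 for lv in levels if predicate(lv)), n)

    most = Fraction(2, 3)  # table note a: "most" means two-thirds or more
    negligible = share(lambda lv: lv == 0)

    # Assumption: the most specific matching category takes precedence.
    if all(lv == 3 for lv in levels):
        return ["HS"]
    if all(lv == 0 for lv in levels):
        return ["HU"]
    if share(lambda lv: lv >= 2) >= most and negligible == 0:
        return ["S"]
    if negligible >= most and all(lv <= 1 for lv in levels):
        return ["U"]

    candidates = []
    if share(lambda lv: lv >= 1) >= Fraction(1, 2) and negligible < Fraction(1, 3):
        candidates.append("MS")
    if share(lambda lv: lv <= 1) >= most and negligible > 0:
        candidates.append("MU")
    return candidates


print(candidate_objective_ratings(["High", "Substantial", "Modest"]))           # ['S']
print(candidate_objective_ratings(["Modest", "Modest", "Modest", "Negligible"]))  # ['MS', 'MU']
print(candidate_objective_ratings(["High", "Negligible"]))                      # []
```

Exact fractions are used so that boundary cases such as exactly two-thirds or exactly one-third are not distorted by floating-point rounding.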
In the rare situation where the achievement of RI targets fits into more than one category, you should exercise judgment, taking into account the relevance of the RIs, existence of additional relevant evidence, and the extent to which there are gaps in the results framework measuring progress toward the project development objectives as a result of the prior actions.

Step 3: The overall efficacy rating draws on the efficacy ratings for each objective. To calculate the overall efficacy rating, convert the efficacy ratings for each objective to numerical scores using the mapping in box 1.2 (if scores were rounded up or down, revert to the original scores up to two decimal places). Average the efficacy scores across objectives, and map the average back to the ratings (rounding up or down as appropriate). The overall efficacy rating is an unweighted average of the objective-level efficacy ratings.

[4] See the Conducting the Task Team Leader Interview as Part of the ICRR Exercise section of this manual.

Note: In an FCV context, flexibility may be needed in assessing efficacy, particularly for a situation of conflict. The level of uncertainty and volatility in the underlying context may make it unrealistic to expect all RI targets to be achieved. However, it may be difficult to anticipate ex ante which RI targets or pillars will be achieved. Moreover, the availability of credible and timely data may be limited. This may suggest the need for greater attention to qualitative data, lower-level outcomes, and proxies in assessing progress toward objectives.

Section 6. Outcome

The rating for overall outcome is determined using figure 1.1. The write-up should briefly summarize the findings on relevance of PAs and on efficacy. It should note the main strengths and shortcomings that contributed to those two ratings. For example, you could point out that the overall outcome rating was brought down by the low relevance of PAs.

Figure 1.1. Calculating the Overall Outcome Rating
Relevance of Prior Actions (rows) / Achievement of Objective (Efficacy) (columns) | HS | S | MS | MU | U | HU
HS | HS | S | MS | MU | U | HU
S | HS | S | MS | MU | U | HU
MS | S | S | MS | MU | U | HU
MU | MU | MU | MU | MU | U | HU
U | MU | MU | MU | U | U | HU
HU | U | U | U | HU | HU | HU
Source: Independent Evaluation Group.
Note: HS = highly satisfactory; HU = highly unsatisfactory; MS = moderately satisfactory; MU = moderately unsatisfactory; S = satisfactory; U = unsatisfactory.

Section 7. Risk to Development Outcomes

The discussion of the risks to development outcomes should highlight the risks to sustaining the development outcomes achieved. It should not highlight the ex ante risks to the achievement of the PDO as noted in the program document. [5] Identify which outcomes are at risk of not being sustained, and explain the nature of the risks that threaten their sustainability.

Example: Institutional capacity. “Lack of commitment to reform in some parts of the government” (ICR, p. 33) could inhibit effective implementation of some measures initiated during the DPL series, such as creation of a risk management unit in DGT. Indeed, the third DPL was added to what was originally planned as two operations in part because more time was needed to meet the triggers. This risk is being mitigated through continued support by the World Bank team to the relevant ministries with respect to “improving the quality of tax policy and tax administration, as well as improving the quality of central government and subnational public spending” (World Bank 2023).

Discuss developments or actions taken that could mitigate risks of policy reversal or erosion of progress achieved. If a subsequent supporting World Bank operation or International Monetary Fund program is in place, for example, discuss whether (and how) it supports the sustainability of the outcomes achieved.

[5] The assessment of the adequacy of the identification and discussion of the ex ante risks in the program document is covered in the Implementation Completion and Results Report Review section on Bank Performance: Design and Preparation (section 8a).

Section 8. Assessment of Bank Performance

Bank performance is assessed for (i) the design and preparation of the operation or series (that is, up to approval of the operation or the first operation in a series) and (ii) implementation of the operation or series (that is, after approval of the operation or the first operation in a programmatic series). The overall Bank performance rating is an average of the ratings for 8a and 8b. For DPOs (particularly stand-alone DPOs), 8a is more important because there is no implementation.

8a. Design and Preparation

Section 8a should cover the following points:

• The extent to which World Bank staff have drawn on lessons learned from prior experience in design of the operation or series. These lessons should be clearly identified and could be either from the country in question or from similar operations or activities in other countries.
• The adequacy of the analytical underpinnings of PAs and RIs (including their role in articulating the underlying results chain). For example, are the assumptions underpinning the theory of change based on sound and rigorous analysis that is relevant to the country context? Is the theory of change based on clearly identified diagnostic findings?
• The extent to which the program document identified the main risks and constraints to achieving PDOs and the quality and depth of the discussion of the main risks. The assessment should also include consideration of the credibility and coherence of the mitigating measures identified to reduce the risks. For example, where institutional capacity constraints in a government posed risks to implementation, was technical support from the World Bank or other development partners envisioned?
• The extent to which the operation drew on consultations with relevant major stakeholders and development partners or envisioned collaboration, as appropriate (for example, where other development partners were involved in similar support).

For FCV countries, the assessment should also cover the following factors:

• The extent to which lessons learned from prior experience in FCV contexts informed program design.
• The adequacy of analytical underpinnings of the operation in the specific FCV situation in which the operation is being implemented, including with respect to the key drivers of fragility. This could include work done by both the World Bank and other development partners.
• The extent to which the operation identified possible negative impacts on drivers of fragility and conflict. For example, did evaluators draw on a Poverty and Social Impact Analysis of the reforms supported by the PAs to identify risks that could increase instability or violence?
• The extent to which the World Bank proactively supported efforts to mitigate or reduce risks identified ex ante. In FCV situations, weaknesses in technical and institutional capacity may pose particularly important risks to the ability of the authorities to implement supported reforms. Where this is the case, the World Bank should have had a strategy to address these shortcomings through parallel technical assistance, training, or project support provided directly or by development partners.
• The extent to which design of the operation drew on consultations and cooperation with major stakeholders and development partners (when necessary). In an FCV context, this may extend beyond traditional development partners (for example, United Nations agencies or humanitarian, diplomatic, and security actors may be critical partners).

8b. Bank Performance: Implementation

Implementation refers to the period after approval of the operation or the first operation in a programmatic series.
Consider the following questions:

• Is there evidence of ongoing monitoring of progress toward achievement of targets using the results framework (for example, aide-mémoire, notes to file)? This is particularly important for a programmatic series, in which progress toward RI targets should be monitored regularly. To enable this, the selection of RIs should take into account the availability of data during the implementation of the series (not just at closing).
• In the case of a programmatic series, were triggers, targets, or RIs adapted appropriately to lessons learned or changes in underlying conditions, risks, operational priorities, or unexpected events after approval?
• Were the identified mitigation measures for addressing risks to achievement of the PDO (for example, technical capacity constraints, ownership concerns) implemented?
• Was there stakeholder and donor coordination where needed? In FCV situations, this might include (where appropriate) humanitarian, diplomatic, and security actors.
• Was there an effort to identify new and emerging risks to the achievement of the PDOs?

The ratings guidance for Bank performance is shown in table 1.8.

Table 1.8. Rating Bank Performance
Rating columns, from left to right: Highly Satisfactory | Satisfactory | Moderately Satisfactory | Moderately Unsatisfactory | Unsatisfactory | Highly Unsatisfactory. Descriptors under each criterion are listed in that order; where a criterion has fewer than six descriptors, a descriptor spans adjacent rating columns.

Prior experience and lessons learned
• The design of the operation or series explicitly drew on prior experience and lessons learned.
• The design of the operation or series referenced prior experience and lessons learned.
• The design incorporated limited prior experience and analytical and diagnostic work, if relevant.
• The design made no reference to the incorporation of prior experience or lessons learned.

Identification and mitigation of risks to achievement of PDOs
• Highly Satisfactory: The operation contained a meaningful discussion of the major risks to achievement of PDOs, articulated credible mitigating measures, and incorporated them in the design of the operation.
• Satisfactory: The operation discussed some of the major risks to achievement of PDOs and articulated credible mitigating measures.
• Moderately Satisfactory: The operation discussed specific risks to achievement of PDOs, but only a subset of the mitigating measures were credible and substantive.
• Moderately Unsatisfactory: The operation contained a discussion of risks to achievement of PDOs at a general level, but key risks were missed. Mitigating measures were discussed but were largely superficial or not implemented.
• Unsatisfactory: The operation contained a superficial and incomplete discussion of risks to achievement of PDOs. Mitigating measures were not discussed.
• Highly Unsatisfactory: There was no discussion of risks to achievement of PDOs or of mitigating measures.

Consultation with major stakeholders
• The operation was informed by consultation with all major stakeholders.
• The operation was informed by consultation with most major stakeholders.
• The operation was informed by consultation with only some of the major stakeholders.
• Few stakeholders were consulted in the design of the operation.

Coordination with development partners
• There was close cooperation and coordination with all major development partners.
• There was close cooperation and coordination with major development partners.
• There was limited cooperation and coordination with major development partners.
• There was minimal cooperation or coordination with major development partners.
• There was no cooperation or coordination with major development partners.

Monitoring
• There is credible evidence (for example, reports, aide-mémoire) of regular monitoring of progress toward achievement of targets for all results indicators.
• There is evidence (for example, reports, aide-mémoire) of periodic monitoring of progress toward achievement of targets for most results indicators.
• There is evidence (for example, reports, aide-mémoire) of periodic monitoring of progress toward achievement of targets for a few results indicators.
• There is no evidence of monitoring of progress toward targets for results indicators before series completion.

Adaptation
• Highly Satisfactory: Circumstances and priorities changed, and the series was adapted appropriately and explicitly to lessons learned.
• Satisfactory: Circumstances and priorities changed, and some elements of the series were adapted to lessons learned.
• Moderately Satisfactory: Changed circumstances or lessons learned resulted in modest adaptation of the series.
• Moderately Unsatisfactory: Changed circumstances or lessons learned resulted in insufficient adaptation of the series; the rationale for changes was not explained.
• Unsatisfactory: Changed circumstances or lessons learned resulted in minimal adaptation of the series, with little explanation for the changes.
• Highly Unsatisfactory: Changed circumstances or lessons learned did not result in any meaningful adaptation of the series.

Source: Independent Evaluation Group.
Note: PDO = project development objective.

Section 9. Other Impacts

Frequently, operations will have significant impacts, both positive and negative, in addition to those explicitly identified in the program document. These include social, gender, poverty, climate, environmental, and conflict-related impacts. It is important that actual observed impacts be identified in the ICR. Note that this section is not a description of expected impacts identified in the program document but a discussion of actual impacts.
You should draw on the ICR to identify these other impacts, noting when evidence is absent or inconsistent. Where no such assessment appears in the ICR, note this in the ICRR. Failure to identify and discuss other impacts should negatively influence the Independent Evaluation Group (IEG) rating of the quality of the ICR. This is particularly the case when social, gender, poverty, climate, and environmental impacts were expected (for example, they are identified in the program document) but are not discussed in the ICR.

For FCV countries, “other impacts” may include disproportionate impacts on aggrieved, excluded, or vulnerable groups; gender-based violence; and possible implications for fragility and conflict drivers. It is important to assess possible FCV risks that may be exacerbated by policy actions (for example, reforms to subsidies or tariffs).

Section 10. Quality of the Implementation Completion and Results Report

Because the ICRR is largely based on the information found in the ICR, the reliability of IEG’s ratings depends critically on the accuracy and quality of the evidence it provides. For this reason, IEG rates the quality of the ICR, taking into account the following criteria:

• Internal consistency. Does the ICR present a coherent narrative of the program that flows logically?
• Quality of evidence. Does the ICR present an adequate and robust evidence base to support the achievements reported, including in annexes or appendixes? Does the evidence come from credible sources, and is it appropriately referenced and presented in a concise fashion?
• Quality of analysis. Has there been sufficient and balanced interrogation of the evidence and clear linking of evidence to interventions and outcomes through a coherent results chain?
• Quality of lessons learned. Are the lessons formulated in the ICR supported by the evidence and findings of the ICR? Are they operationally relevant (that is, can they be drawn on to concretely influence future behavior)? Are they focused on what can be derived from experience with the operation, or have they been overly generalized? In general, lessons based on evidence from a single country cannot be extended to other countries or groups of countries.
• Outcome orientation. Is it clear how better results could have been achieved or what should be done differently in the future to improve impact?
• Consistency with guidelines. Does the report follow the ICR guidelines and methodology (for example, with regard to structure and ratings)?
• Conciseness. Does the ICR focus on critical information and evidence, or is it overly descriptive, containing information unnecessary for self-evaluation?

Section 11. Ratings

The ratings summary table lists and compares the ratings of World Bank staff (ICR) and IEG (ICRR) for outcome, Bank performance, relevance of results indicators, and quality of ICR (table 1.9). The IEG ratings are automatically generated from those entered in earlier sections of the ICRR. Wherever ICR and IEG ratings for outcome or Bank performance differ, you should briefly note the source of the difference.

Table 1.9. Example of a Ratings Summary Table
Ratings | ICR | IEG | Reason for Disagreement or Comments
Outcome | Satisfactory | Moderately satisfactory | Weak relation between some PAs and outcomes and some unclear results indicators reduced efficacy rating and hence the rating for overall outcome.
Bank performance | Satisfactory | Satisfactory |
Relevance of results indicators | n.a. | Moderately unsatisfactory |
Quality of ICR | n.a. | Substantial |
Source: Independent Evaluation Group.
Note: ICR = Implementation Completion and Results Report; IEG = Independent Evaluation Group; PA = prior action.

Section 12. Lessons

Each ICR presents lessons to inform future efforts. ICRs for programs that do not achieve their objectives often produce some of the most valuable lessons.
IEG, in the context of the ICRR, reviews the lessons articulated by staff and assesses them for clarity, coherence, and value added. You should identify the most pertinent lessons from the ICR and redraft them for clarity or to better reflect the findings of the ICRR. You should note where lessons do not appear well grounded in the evidence and analysis presented in the ICR.

You may also include lessons that emerge from the ICRR that are not identified in the ICR. These should meet the same standard of quality, specificity, and rigor that is expected in the ICR. Avoid identifying generic lessons. Lessons should be distinct from findings or recommendations but should highlight the key factors that affected performance and outcomes. Lessons can be positive or negative but should actually emerge from an operation’s experience. Pitch lessons at the right level (not too specific, not too generic) so that they provide valuable insights for follow-up or similar operations in the sector or subsector, the country, or other countries.

Project Performance Assessment Report (PPAR) recommendations: The PPAR assesses projects for two purposes: to improve the performance of World Bank projects by identifying lessons from experience, and to ensure the integrity of the World Bank’s self-evaluation process and verify that the World Bank’s work is producing the expected results. A PPAR is a project evaluation, not a validation, and draws on new evidence and analysis. PPARs rely on a mixed methods approach that usually includes (but is not limited to) literature review, portfolio analysis, and a country mission involving site visits and semistructured interviews with different stakeholders. Where the evaluator finds satisfactory grounds for further inquiry and for additional lessons to be learned from an operation, a recommendation can be made for a PPAR.
Conducting the Task Team Leader Interview as Part of the Implementation Completion and Results Report Review Exercise

As part of the ICRR drafting exercise, you will conduct an interview with the last TTL of the operation. The purpose of the meeting is twofold: (i) to gain a better understanding of the project experience to improve the accuracy and quality of IEG’s ICRRs and (ii) to ensure due process by providing the project TTL and the IEG ICR reviewer an opportunity to discuss the project experience. The meeting is explicitly not intended to discuss any possible ICRR ratings.

This meeting is conducted before IEG sends the draft ICRR to the Global Practice. The meeting with the TTL is different from the meeting that the Global Practice might request to discuss the draft ICRR after receiving it from IEG (see point 4 for further details on the timing of the meeting).

The meeting should be held with the last TTL of the project or, in the case of a programmatic series, the TTL of the final project. The meeting should not be held with the ICR author alone, unless the last TTL and the ICR author are the same person, or the last TTL specifically delegates to the ICR author the responsibility for the meeting on behalf of the Global Practice. If the last TTL of the project is no longer employed with the World Bank, you should, in consultation with the ICRR coordinator, contact the concerned sector manager for an alternative suggestion. It is up to the project TTL to invite other Global Practice staff at their discretion.

The meeting should be conducted only after you have prepared an advanced draft of the ICRR and after the feedback on the first draft is received from the panel reviewer. When submitting the draft to the panel reviewer, you are expected to indicate in the relevant sections of the draft ICRR that information will be sought to substantiate the assessment, and to include the list of questions that you intend to ask.
You should inform the meeting participant(s) that additional information obtained during the meeting and their comments may be used in the ICRR.

You should focus on missing or ambiguous information in the ICR that is necessary to answer IEG’s evaluative questions, including any additional evidence that may be needed to substantiate the ratings. For example, an ICR often states that an RI target will be achieved by a specified date that is later than the ICR’s publication date. In the TTL interview, you should ask for confirmation and evidence that the target was achieved. The ICR may have contradictory data in different sections. If so, the TTL interview is a chance to ask for the correct data. Finally, the ICR may mention that other development partners supported the reform agenda, without providing detail. The TTL interview is an opportunity to ask for details.

You should use the meeting to confirm your understanding of the project context, gain a better understanding of the factors that might explain the project’s performance (good or bad), and probe what the project TTL might have done differently had they had the option.

References

World Bank. 2020. “Mauritania—Mauritania DPO.” Implementation Completion and Results Report Review ICRR0021978, Independent Evaluation Group, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/197021622122628474/Mauritania-Mauritania-DPO.

World Bank. 2022. “Brazil—Mato Grosso Fiscal Adjustment and Environmental Sustainability Development Policy Loan (English).” Implementation Completion and Results Report ICR5960, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/099718412222222365/BOSIB056d6241a0e10a8a1028c9e46fe079.

World Bank. 2023. “Indonesia—IDN Fiscal Reform DPL (P156655).” Implementation Completion and Results Report Review ICRR0022624, Independent Evaluation Group, World Bank, Washington, DC. https://documents1.worldbank.org/curated/en/099630002032238414/pdf/P1566550ccf9b40520b6c00eebba5cfaf58.pdf.